Age, DNA Methylation and the Malignant Potential of the Serrated Neoplasia Pathway

Lochlan John Fennell B. Biomed Sci

A thesis submitted for the degree of Doctor of Philosophy at The University of Queensland in 2020

Faculty of Medicine

ORCID: 0000-0003-3214-3527


Abstract

Colorectal cancer is the third most common cancer in Australia and is responsible for the deaths of over four thousand Australians each year. There are two overarching molecular pathways leading to colorectal cancer. The conventional pathway, which is responsible for ~75% of colorectal cancer diagnoses, occurs in a step-wise manner. It is the consequence of a series of genetic alterations, including mutations of tumour suppressor genes and gross chromosomal abnormalities. This pathway has been extensively studied over the past three decades.

The serrated neoplasia pathway is responsible for the remaining colorectal cancers. This pathway is triggered by oncogenic BRAF mutation, and these cancers accumulate epigenetic alterations as they progress to invasive cancer. DNA methylation is important in serrated neoplasia; however, the extent and role of DNA methylation in the initiation and progression of serrated lesions are not clear. DNA methylation accumulates in tissues with age, and advanced serrated lesions and cancers occur almost exclusively in elderly patients. How this methylation affects serrated lesions is unknown. In this thesis I set out to address three key research questions related to DNA methylation, age and serrated colorectal neoplasia. First, what is the extent of DNA methylation in colorectal cancers? Second, does age-related hypermethylation, namely that occurring at the loci encoding tumour suppressor genes, increase the risk of serrated colorectal neoplasia? Third, if so, how can this be reconciled with the existence of early onset serrated colorectal cancer?

In the first chapter of this thesis, I examine the DNA methylation and transcriptional architecture of 216 colorectal cancer samples collected consecutively at the Royal Brisbane and Women’s Hospital. Clustering analysis of DNA methylation data revealed five distinct subtypes of colorectal cancer: two clusters with high levels of DNA methylation, two with intermediate levels of DNA methylation, and one devoid of DNA methylation alterations. This study highlighted striking associations between DNA methylation and patient age, gender and tumour location. Oncogenic BRAF mutations were common in CIMP-H1 but rare in CIMP-H2. These analyses were robustly validated using data from The Cancer Genome Atlas (TCGA). Using exome capture sequencing data from TCGA, we assessed the frequency of mutations in genes encoding epigenetic regulators in cancers of each methylation subtype. The frequency of mutation in epigenetic regulators increased concomitantly with increasing genomic hypermethylation, and the spectrum of mutations differed by methylation subtype. Mutations in epigenetic regulators may provide insight into the epigenetic aberrancies observed in these highly methylated cancers. 

In the second chapter of this thesis, I investigate how age, age-associated DNA methylation and the associated transcriptional changes influence the risk of serrated neoplastic transformation. This work uses a murine model of serrated neoplasia driven by oncogenic Braf. First, the DNA methylation landscape of wild type animals was assessed over their lifespan using reduced representation bisulphite sequencing. This revealed pervasive age-associated DNA methylation, with a tendency for methylation to accumulate at the promoters of genes encoding WNT signalling regulators. Activating the Braf oncogene at wean induced a remarkable acceleration of DNA methylation accumulation, with even stronger enrichment at WNT signalling loci. Lastly, we activated the Braf mutation in mice either at wean or at nine months of age, and maintained both groups for five months. Animals induced at nine months showed a 10.5-fold relative risk of advanced serrated lesions compared with the younger animals, despite the same period of oncogenic exposure. These data directly implicate the aging process in determining the malignant threat of Braf induced neoplasia. Methylation analysis of these animals revealed differences at loci identified as age-associated and at those accelerated by oncogenic Braf. This study strongly implicates age and DNA methylation as risk factors for serrated neoplastic progression, and may have implications for patient surveillance.

In the third chapter of this thesis, I attempt to elucidate the role of mutations in WNT signalling genes in serrated colorectal cancers, hypothesising that genetic alterations of WNT regulators underpin early onset BRAF mutant cancers. Large exome sequencing data sets from earlier studies were collated to establish a cohort of 199 BRAF mutant colorectal cancers with somatic mutation data. This study revealed a mosaic of WNT pathway mutations, including several potential novel driver genes. APC mutations in the setting of BRAF mutant cancers were associated with poor patient outcome and, importantly, younger age at diagnosis. These data indicate that mutation of APC could provide an alternative avenue for progression to invasive cancer, abrogating the need for extensive DNA methylation alterations.


To conclude, this thesis has comprehensively evaluated the role of DNA methylation and age in the risk of serrated colorectal neoplasia. These data improve our understanding of the role of DNA methylation in colorectal carcinogenesis and may provide an impetus for investigating patient-centric surveillance regimens according to the age of patients at index colonoscopy.


Declaration by author

This thesis is composed of my original work, and contains no material previously published or written by another person except where due reference has been made in the text. I have clearly stated the contribution by others to jointly-authored works that I have included in my thesis.

I have clearly stated the contribution of others to my thesis as a whole, including statistical assistance, survey design, data analysis, significant technical procedures, professional editorial advice, financial support and any other original research work used or reported in my thesis. The content of my thesis is the result of work I have carried out since the commencement of my higher degree by research candidature and does not include a substantial part of work that has been submitted to qualify for the award of any other degree or diploma in any university or other tertiary institution. I have clearly stated which parts of my thesis, if any, have been submitted to qualify for another award.

I acknowledge that an electronic copy of my thesis must be lodged with the University Library and, subject to the policy and procedures of The University of Queensland, the thesis be made available for research and study in accordance with the Copyright Act 1968 unless a period of embargo has been approved by the Dean of the Graduate School.

I acknowledge that copyright of all material contained in my thesis resides with the copyright holder(s) of that material. Where appropriate I have obtained copyright permission from the copyright holder to reproduce material in this thesis and have sought permission from co-authors for any jointly authored works included in the thesis.


Publications included in this thesis

1) Fennell, L., Dumenil, T., Wockner, L., Hartel, G., Nones, K., Bond, C., Borowsky, J., Liu, C., McKeone, D., Bowdler, L., Montgomery, G., Klein, K., Hoffmann, I., Patch, A.-M., Kazakoff, S., Pearson, J., Waddell, N., Wirapati, P., Lochhead, P., Imamura, Y., Ogino, S., Shao, R., Tejpar, S., Leggett, B., Whitehall, V., 2019. Integrative Genome-Scale DNA Methylation Analysis of a Large and Unselected Cohort Reveals 5 Distinct Subtypes of Colorectal Adenocarcinomas. Cellular and Molecular Gastroenterology and Hepatology 8, 269–290. https://doi.org/10.1016/j.jcmgh.2019.04.002

2) Fennell, L.J., Kane, A., Liu, C., McKeone, D., Fernando, W., Su, C., Bond, C., Jamieson, S., Dumenil, T., Patch, A.-M., Kazakoff, S.H., Pearson, J.V., Waddell, N., Leggett, B., Whitehall, V.L.J., 2020. APC Mutation Marks an Aggressive Subtype of BRAF Mutant Colorectal Cancers. Cancers 12, 1171.


Submitted manuscripts included in this thesis

1) Fennell, L.J., Kane, A., Liu, C., McKeone, D., Hartel, G., Bond, C.E., Bettington, M., Leggett, B.A. and Whitehall, V.L.J., 2020. Braf mutation induces rapid neoplastic transformation in the aged and extensively hypermethylated intestinal epithelium. Submitted.

Other publications during candidature

1) Kane A., Liu, C., Fennell, L.J., McKeone, D., Bond, C., Pollock, P., Young, G., Leggett, B., and Whitehall, V. 2021. Aspirin reduces the incidence of metastasis in a preclinical study of Braf mutant serrated colorectal neoplasia. In Press, British J Cancer. Contribution: Bioinformatic analysis of RNA-Seq data.

2) Kane, A., Fennell, L. J., Liu, C., Borowsky, J., Mckeone, D., Bond, C., Kazakoff, S., Patch, A., Koufariotis, L., Pearson, J., Waddell, N., Leggett, B., and Whitehall, V. 2019. Alterations in signaling pathways that accompany spontaneous transition to malignancy in a mouse model of BRAF mutant microsatellite stable colorectal cancer. In Press, Neoplasia. Contribution: Bioinformatic analysis of RNA-Seq data.

3) Liu, C., Fennell, L.J., Bettington, M.L., Walker, N.I., Dwine, J., Leggett, B.A. & Whitehall, V.L.J. 2019. DNA Methylation changes that precede onset of dysplasia in advanced sessile serrated adenomas. Clinical Epigenetics. Contribution: Bioinformatic analysis of EPIC Array data.

4) Lannagan, T., Lee, Y., Wang, T., Roper, J., Bettington, M., Fennell, L. J., Vrbanac, L., Somashekar, R., Gieniec, K., Yang, M., Ng, J. Q., Suzuki, N., Ichinose, M., Wright, J., Kobayashi, H., Putoczki, T., Hayakawa, Y., Leedham, S., Abud, H., Yilmaz, O., Marker, J., Klebe, S., Wirapati, P., Mukherjee, S., Tejpar, S., Leggett, B., Whitehall, V., Worthley, D. L. & Woods, S. 2018. Genetic editing of colonic organoids provides a molecularly distinct and orthotopic preclinical model of serrated carcinogenesis. Gut. Contribution: Development and application of microsatellite instability assay.

5) Fennell, L. J., Jamieson, S., Mckeone, D., Corish, T., Rohdmann, M., Furner, T., Bettington, M., Liu, C., Kawamata, F., Bond, C., Van De Pols, J., Leggett, B. & Whitehall, V. 2018. MLH1–93 G/a polymorphism is associated with MLH1 promoter methylation and protein loss in dysplastic sessile serrated adenomas with BRAF(V600E) mutation. BMC Cancer, 18, 35. Contribution: Conceptualisation, study oversight, experimentation, analysis and manuscript preparation.

6) Bond, C. E., Liu, C., Kawamata, F., Mckeone, D. M., Fernando, W., Jamieson, S., Pearson, S.-A., Kane, A., Woods, S. L., Lannagan, T. R. M., Somashekar, R., Lee, Y., Dumenil, T., Hartel, G., Spring, K. J., Borowsky, J., Fennell, L., Bettington, M., Lee, J., Worthley, D. L., Leggett, B. A. & Whitehall, V. L. J. 2018. Oncogenic BRAF mutation induces DNA methylation changes in a murine model for human serrated colorectal neoplasia. Epigenetics, 13, 40-48. Contribution: Animal and molecular experiments.

7) Kawamata, F., Patch, A.-M., Nones, K., Bond, C., Mckeone, D., Pearson, S.-A., Homma, S., Liu, C., Fennell, L., Dumenil, T., Hartel, G., Kobayasi, N., Yokoo, H., Fukai, M., Nishihara, H., Kamiyama, T., Burge, M. E., Karapetis, C. S., Taketomi, A., Leggett, B., Waddell, N. & Whitehall, V. 2018. Copy number profiles of paired primary and metastatic colorectal cancers. Oncotarget, 9, 3394-3405. Contribution: Tissue processing and molecular experiments.

8) Borowsky, J., Dumenil, T., Bettington, M., Pearson, S.-A., Bond, C., Fennell, L., Liu, C., Mckeone, D., Rosty, C., Brown, I., Walker, N., Leggett, B. & Whitehall, V. 2017. The role of APC in WNT pathway activation in serrated neoplasia. Modern Pathology, 31, 495. Contribution: Bioinformatic analysis.

9) Fennell, L. J., Clendenning, M., Mckeone, D. M., Jamieson, S. H., Balachandran, S., Borowsky, J., Liu, J., Kawamata, F., Bond, C. E., Rosty, C., Burge, M. E., Buchanan, D. D., Leggett, B. A. & Whitehall, V. L. J. 2017. RNF43 is mutated less frequently in Lynch Syndrome compared with sporadic microsatellite unstable colorectal cancers. Familial Cancer, 1-7. Contribution: Conceptualisation, experimentation, analysis and manuscript preparation.


Contributions by others to the thesis

As specified at the start of each chapter.


Statement of parts of the thesis submitted to qualify for the award of another degree

No works submitted towards another degree have been included in this thesis.

Research Involving Human or Animal Subjects

Work with human specimens in this thesis was approved by the QIMR Berghofer Human Research Ethics Committee (P460 and P773). Animal work was approved by the QIMR Berghofer Animal Ethics Committee (P1208 and P2178).


Acknowledgements

This thesis would not have been possible if not for the support of my mentors, my colleagues, my peers, my friends and my family. I am immensely grateful to have you.

I would specifically like to acknowledge my supervisors, Associate Professor Vicki Whitehall and Professor Barbara Leggett. Your guidance on matters scientific, professional and personal has been formative in shaping me as a scientist and as an individual. Your close mentorship and constant availability were immensely helpful.

I would like to thank all the members of the Conjoint Gastroenterology Laboratory. To Diane, Alex, John, Cath, Chang, Jen, Basit, Fayth and the other members of the lab over the years, I am grateful for your contributions to my thesis and for the positive environment you created. My PhD would have been much more challenging had I not had the opportunity to work with such an outstanding group of people.

I am grateful for the support of staff in the various scientific services departments at QIMR. To Andrew, Sang-Hee and the histology team, I owe a debt of gratitude for the way in which you handled hundreds of samples from my project; the service was always excellent and highly professional. The same thanks go to Paul and his team in sample processing and sequencing.


Financial support

This research was supported by an Australian Government Research Training Program Scholarship, and Top-Up awards from QIMR Berghofer Medical Research Institute, Australian Rotary Health and Tour De Cure. This research was also supported by project grants from the National Health and Medical Research Council (#1050455 and #1063105).

Keywords

Colorectal cancer, Gastroenterology, DNA methylation, Sessile serrated lesion, BRAF, Genomics, Transcriptomics


Australian and New Zealand Standard Research Classifications (ANZSRC)

ANZSRC code: 111202, Cancer Diagnostics, 50%
ANZSRC code: 111203, Cancer Genetics, 50%

Fields of Research (FoR) Classification

FoR code: 1112, Oncology and Carcinogenesis (100%)


Conflict of Interest Declarations

I declare no conflicts of interest on behalf of myself and all other contributors to this thesis.


Table of Contents

Age, DNA Methylation and the Malignant Potential of the Serrated Neoplasia Pathway

Abstract

Declaration by author

Publications included in this thesis

Submitted manuscripts included in this thesis

Other publications during candidature

Contributions by others to the thesis

Statement of parts of the thesis submitted to qualify for the award of another degree

Research Involving Human or Animal Subjects

Acknowledgements

Financial support

Keywords

Australian and New Zealand Standard Research Classifications (ANZSRC)

Fields of Research (FoR) Classification

Conflict of Interest Declarations

Table of Contents

Abbreviations

Introduction

Chapter One: Integrative Genome-Scale DNA Methylation Analysis of a Large and Unselected cohort reveals five distinct subtypes of Colorectal Adenocarcinomas

Chapter Two: Braf mutation induces rapid neoplastic transformation in the aged and extensively hypermethylated intestinal epithelium

Chapter Three: APC Mutation marks an aggressive subtype of BRAF mutant colorectal cancers

Chapter Four: Discussion

Appendix 1 – Epigenetic regulator genes interrogated in Thesis Chapter 1

Abbreviations

AJCC: American Joint Committee on Cancer
CBC: Crypt base columnar
ACF: Aberrant Crypt Foci
KRAS: KRAS proto-oncogene, GTPase
BRAF: B-Raf proto-oncogene, serine/threonine kinase
APC: APC regulator of WNT signaling pathway
TP53: Tumor protein p53
HP: Hyperplastic polyp
SSL: Sessile serrated lesion
TSA: Traditional serrated adenoma
GCHP: Goblet-cell rich hyperplastic polyp
MVHP: Microvesicular hyperplastic polyp
CIMP: CpG Island Methylator Phenotype
DNA: Deoxyribonucleic Acid
MLH1: mutL homolog 1
CIN: Chromosomal instability
EGFR: Epidermal growth factor receptor
ERBB2: erb-b2 receptor tyrosine kinase 2
PIK3CA: phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha
BUB1: BUB1 mitotic checkpoint serine/threonine kinase
BUB3: BUB3 mitotic checkpoint protein
STAG2: stromal antigen 2
MDM2: MDM2 proto-oncogene
PML: PML nuclear body scaffold
ATM: ATM serine/threonine kinase
ATR: ATR serine/threonine kinase
CDKN2A: cyclin dependent kinase inhibitor 2A
CDK4: cyclin dependent kinase 4
CDK6: cyclin dependent kinase 6
RB: Retinoblastoma protein
BCL2: BCL2 apoptosis regulator
BAX: BCL2 associated X, apoptosis regulator
BAK: BCL2 antagonist/killer 1
APAF-1: apoptotic peptidase activating factor 1
VEGFA: vascular endothelial growth factor A
Polα: DNA polymerase alpha
Polδ: DNA polymerase delta
Polε: DNA polymerase epsilon
MSH2: mutS homolog 2
MSH6: mutS homolog 6
PMS2: PMS1 homolog 2, mismatch repair system component
EXO1: exonuclease 1
MSI: Microsatellite Instability
MAPK: Mitogen Activated Protein Kinase
MAPKK: Mitogen Activated Protein Kinase Kinase
MAPKKK: Mitogen Activated Protein Kinase Kinase Kinase
RAS: Ras GTPase
RAF: Rapidly accelerated fibrosarcoma
SHC: SHC-transforming protein
GRB2: growth factor receptor bound protein 2
SOS: SOS Ras/Rac guanine nucleotide exchange factor
GDP: Guanosine diphosphate
GTP: Guanosine triphosphate
PI3K: Phosphoinositide 3-kinase
PI-4,5-P2: Phosphatidylinositol-4,5-bisphosphate
PIP3: Phosphatidylinositol-3,4,5-trisphosphate
PI-3-P: Phosphatidylinositol-3-phosphate
PI-3,4-P2: Phosphatidylinositol-3,4-bisphosphate
AKT: AKT serine/threonine kinase
PTEN: Phosphatase and tensin homolog
LRP5/6: LDL receptor related protein 5/6
GSK3β: Glycogen synthase kinase 3 beta
SFRP: Secreted frizzled-related protein
CACNA1G: Calcium voltage-gated channel subunit alpha1 G
SOCS1: Suppressor of cytokine signaling 1
RUNX3: RUNX family transcription factor 3
NEUROG1: Neurogenin 1
IGF2: Insulin like growth factor 2
CRABP1: Cellular retinoic acid binding protein 1
H1, H2, H3: Histone 1, 2, 3
H3K27me3: Trimethylation of histone H3 at lysine 27
SETD1B: SET domain containing 1B, histone lysine methyltransferase
H3K4: Lysine 4 on Histone H3
KMT2C: Lysine methyltransferase 2C
H3K4me1: Monomethylation of histone H3 at lysine 4
CMS (1,2,3,4): Consensus molecular subtype (1,2,3,4)
MYC: MYC proto-oncogene, bHLH transcription factor
TGFBR2: Transforming growth factor beta receptor 2
RNF43: Ring finger protein 43
ZNRF3: Zinc and ring finger 3
ESR1: Estrogen receptor 1
GATA5: GATA binding protein 5
HIC1: HIC ZBTB transcriptional repressor 1
HPP1: Hyperpigmentation, progressive, 1


Introduction

Colorectal cancer in Australia

Colorectal cancer is a leading cause of morbidity and mortality in Australia. In 2019 alone, more than 16,000 cases of colorectal cancer were diagnosed, making colorectal cancer the second most frequently diagnosed cancer in both males and females1 (Figure 1A and 1B). Despite public health efforts aimed at primary prevention and early detection of premalignant colonic lesions, the age-standardised incidence rate of colorectal cancer has remained relatively constant over the past two decades (Figure 1C)1. The age-standardised incidence rate in 2015 was 57.4 cases per 100,000 people, and the risk of developing colorectal cancer by age 85 remains approximately one in twelve1.
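Age-standardised rates such as the one quoted above are usually calculated by direct standardisation. Each age-specific rate is weighted by the share of that age band in a fixed standard population, so that populations with different age structures can be compared. The Python sketch below illustrates this calculation. The function name, age bands, rates and weights are hypothetical placeholders and do not reproduce the data underlying Figure 1. This sketch also does not identify which standard population was used for the published figures.

```python
from typing import Sequence


def age_standardised_rate(
    age_specific_rates: Sequence[float],
    standard_population: Sequence[float],
) -> float:
    """Directly age-standardised rate (per 100,000).

    age_specific_rates: observed rate per 100,000 in each age band.
    standard_population: size (or weight) of each age band in the chosen
        standard population, listed in the same band order (an assumption
        of this sketch; real analyses use a published standard).
    """
    if len(age_specific_rates) != len(standard_population):
        raise ValueError("rates and weights must cover the same age bands")
    total_weight = sum(standard_population)
    if total_weight <= 0:
        raise ValueError("standard population must have a positive size")
    # Weighted average of the age-specific rates, using the standard
    # population (not the observed population) as the weights.
    weighted_sum = sum(r * w for r, w in zip(age_specific_rates, standard_population))
    return weighted_sum / total_weight


# Hypothetical three-band example: rates per 100,000 and standard-population counts.
rates = [5.0, 60.0, 300.0]          # illustrative only, e.g. <50, 50-74, 75+
weights = [70_000, 25_000, 5_000]
print(round(age_standardised_rate(rates, weights), 1))  # prints 33.5
```

The illustrative inputs give a rate of 33.5 per 100,000. The standard population stays fixed, which means that a change in the standardised rate between years reflects changing age-specific risk and not simply population ageing.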


Figure 1: A and B: The number of cancer diagnoses in 2019, stratified by primary cancer site and gender. C: The age-standardised incidence rate of colorectal cancer from 1985 through 2015 (data from reference 1).

The survivability of colorectal cancer is largely dependent on the stage at which the cancer is diagnosed. Colorectal cancer is staged using the TNM system devised by the American Joint Committee on Cancer (AJCC). The three elements assessed in TNM staging are:

- the degree of tumour invasion at the primary site (T);
- the extent of lymph node involvement (N);
- the extent of distant metastatic disease (M).

Local disease, that is, cancers with N and M stages of zero, has an excellent 5-year survival rate of between 88-100%1. By contrast, cancers with N > 0 have a 5-year survival rate of ~60%, and metastatic cancers (M > 0) have a dismal 5-year survival rate of less than 10%. Colorectal cancer mortality rates, considered as a whole, have been decreasing in most western countries, including Australia1, the United States2,3 and the UK4. Much of the decline in colorectal cancer mortality can be attributed to improved screening and surveillance regimens, which aim to detect precancerous colonic lesions and early stage cancers2. These programs rely on a distinct pathobiological sequence of events that precedes invasive cancer development, providing windows of opportunity for early intervention. This sequence of events begins with a single dysfunctional colonic crypt.
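The relationship between the N and M categories and prognosis described above can be written as a simple decision rule. The Python sketch below is a simplified illustration under stated assumptions: N and M are supplied as integers, and T stage and AJCC substages are ignored, so it is not a full AJCC stage grouping. The function and dictionary names are hypothetical. The survival figures are those quoted in the text, not a clinical reference.

```python
def disease_extent(n_stage: int, m_stage: int) -> str:
    """Collapse TNM N and M categories into the three groups discussed in the text.

    Simplifying assumptions: N and M are integers (0 = absent, > 0 = present);
    T stage and AJCC substages (e.g. N1a, M1b) are ignored.
    """
    if n_stage < 0 or m_stage < 0:
        raise ValueError("N and M stages must be non-negative")
    if m_stage > 0:
        return "metastatic"     # any distant metastasis takes precedence
    if n_stage > 0:
        return "node-positive"  # lymph node involvement, no distant spread
    return "local"              # N0 M0


# Approximate 5-year survival figures as quoted in the paragraph above.
FIVE_YEAR_SURVIVAL = {
    "local": "88-100%",
    "node-positive": "~60%",
    "metastatic": "<10%",
}

for n, m in [(0, 0), (2, 0), (1, 1)]:
    group = disease_extent(n, m)
    print(f"N{n} M{m}: {group}, 5-year survival {FIVE_YEAR_SURVIVAL[group]}")
```

The M category is checked first because a cancer with distant metastases falls into the poorest prognostic group regardless of its nodal status.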

The biology of the intestinal epithelium

The colonic epithelium is lined by millions of repeating crypt-like structures that invaginate the intestinal wall. These structures sit atop layers of blood vessels and serve primarily to facilitate rapid nutrient absorption and electrolyte transport5,6. The colonic crypt can be divided into three distinct compartments: the stem zone, the transit amplifying zone (or proliferative compartment), and the mature zone. The stem zone is populated by ~15 crypt base columnar (CBC) cells5. These cells possess inherent characteristics of stem cells, including the ability to cycle continuously, to self-renew and to give rise to distinct lineages of cells5,7,8. CBC cells give rise to daughter cells that enter a period of transit amplification. During this period cells proliferate rapidly, but ultimately terminally differentiate into mature colonic epithelial cells7. These mature cells are eventually shed into the gut lumen and die through a process known as anoikis9.

This well-organised cascade is crucial to the functionality of the gut5. Because colonic stem cells are highly proliferative and continuously cycling, they are at risk of accumulating genetic lesions. Further, the microenvironment of the gut is populated by microorganisms and metabolites that can damage the epithelial layer of the intestine. Therefore, the intestinal stem cell, if inadequately protected, is acutely susceptible to developing neoplasia.

The structure of the crypt offers protection to the continuously cycling stem cells residing at its base. The mature cells, which do not replicate, bear the brunt of these environmental factors, whereas the stem cells at the base of the crypt are shielded from such exposures. This protects the integrity of the stem cells. As a result, damaged epithelial cells facing the lumen can be constantly replaced by newly differentiated epithelial cells migrating up from the crypt base. Kaiko and colleagues10 lent credence to this hypothesis by demonstrating that butyrate, a common bacterial metabolite, is metabolised by mature enterocytes and rarely reaches the intestinal stem cells at the crypt base. However, should butyrate access the stem cells at the base of the crypt, it markedly decreases proliferation and wound repair (Kaiko et al, 2016)10. Thus, the crypt structure and composition are uniquely suited to the intestinal environment. However, dysregulation of signalling and mutation of certain genes can trigger neoplastic transformation of the intestinal epithelium11.


Adenomas, Polyps and Lesions: Precursors to Colorectal Cancer

The earliest macroscopic manifestation of colorectal neoplasia is the formation of an adenoma. Colorectal adenomas can be broadly classified as either conventional or serrated.

1. Aberrant Crypt Foci

The term aberrant crypt foci (ACF) is used to describe an abnormal outgrowth, or thickening, of multiple crypts in close proximity. Bird12 first reported aberrant crypt foci in a carcinogen-induced tumour model. In that study, Bird injected animals with azoxymethane and observed abnormal thickening of some colonic crypts in methylene blue-stained tissue sections12. Other chemical carcinogens, such as 1,2-dimethylhydrazine, have been observed to induce a similar effect13. Carcinogen-induced ACF also appear to harbour oncogenic mutations, including those in the KRAS proto-oncogene, GTPase (KRAS). Aberrant crypt foci have since been identified in human colonic specimens16–18. Molecularly, ACF bear striking similarities to colorectal adenomas. Approximately 25-40% of aberrant crypt foci with a conventional morphology harbour KRAS mutations19,20. By contrast, ~60% of those with a serrated morphology harbour mutations in the B-Raf proto-oncogene, serine/threonine kinase (BRAF)20. As with adenomas and cancers, mutations in KRAS and BRAF are mutually exclusive20. Few ACF harbour a mutation in APC regulator of WNT signaling pathway (APC), suggesting that APC mutation may be the impetus for adenoma formation from ACF.

2. Conventional Adenomas

Conventional adenomas are estimated to be present in ~50% of adults aged >50 years21. These polyps are relatively infrequent in younger patients, becoming more prevalent with advancing age22. They are initiated by biallelic loss of APC, and acquire KRAS and/or tumour protein p53 (TP53) mutations as they progress to cancer11. Conventional adenomas can be categorised as tubular, villous or tubulovillous according to their histological appearance. The latter term refers to lesions containing compartments of both villous and tubular morphology21,23. The tubular adenoma has <25% villous histology and is characterised by elongated, hyperchromatic nuclei. These lesions are always dysplastic but present the lowest malignant threat of all conventional adenomas24. Indeed, patients who have only tubular adenomas of <10 mm removed at colonoscopy appear to be at no greater risk of future colorectal cancer than the general population24. Villous histology is indicative of an advanced adenoma25. Adenomas with >75% villosity are classified as villous adenomas26, and those with between 25% and 75% villosity as tubulovillous adenomas26. Villous architecture can be described as ‘fronds covered in neoplastic cells’27. In practice, villous adenomas are extraordinarily rare.
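The villosity thresholds above amount to a three-way classification rule. The following minimal Python sketch assumes that the villous fraction has already been estimated as a percentage by a pathologist. The function name is hypothetical. Counting exactly 25% and 75% as tubulovillous is an assumption about how the boundaries are applied.

```python
def classify_conventional_adenoma(villous_percent: float) -> str:
    """Assign a conventional adenoma subtype from its villous fraction (%).

    Thresholds follow the text: <25% tubular, 25-75% tubulovillous,
    >75% villous. Counting exactly 25% and 75% as tubulovillous is an
    assumption about how the boundaries are applied.
    """
    if not 0 <= villous_percent <= 100:
        raise ValueError("villous_percent must be between 0 and 100")
    if villous_percent < 25:
        return "tubular"
    if villous_percent <= 75:
        return "tubulovillous"
    return "villous"


for estimate in (10, 50, 90):
    print(estimate, classify_conventional_adenoma(estimate))
```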

The term dysplasia describes an abnormal pattern of cellular growth and development. It is a definitively neoplastic change and signifies a heightened risk of malignant progression. Conventional adenomas are classified as having low- or high-grade dysplasia27. Adenomas with low-grade dysplasia may show more subtle cellular abnormalities, such as hyperchromatic (darkly staining) nuclei23. Adenomas with high-grade dysplasia show more marked abnormalities, such as crowding, loss of cellular polarity, and the presence of poorly differentiated cells at the luminal surface. This distinction is important, as adenomas with low-grade dysplasia are low-risk lesions, whereas those with high-grade dysplasia are considered advanced. Adenomas with high-grade dysplasia are at a significantly higher risk of malignant progression. Approximately 30-35% of conventional adenomas with high-grade dysplasia harbour a focus of invasive cancer, compared with just 6% of polyps with low-grade dysplasia28. Approximately 75-85% of all colorectal cancers arise from conventional adenomas11. The remaining cancers arise from serrated lesions, through the serrated neoplasia pathway29.

3. Serrated Lesions

A subset of sporadic colorectal cancers arises not from conventional adenomas but from serrated lesions29. These lesions include hyperplastic polyps (HP), sessile serrated lesions (SSL) and traditional serrated adenomas (TSA), each of which harbours distinct molecular and morphological features29–32. The hyperplastic polyp is thought to present the lowest risk of the serrated lesions33. This lesion is diagnosed when the histological criteria for a sessile serrated lesion are not met26. Endoscopically, hyperplastic polyps are usually flat, sessile and diminutive in appearance, and predominate in the distal colon and rectum33–35. Two variants of hyperplastic polyp have been proposed: the goblet-cell rich hyperplastic polyp (GCHP) and the microvesicular hyperplastic polyp (MVHP)34. The former is characterised by surface tufting and an enrichment for goblet cells. These polyps are usually small and have more subtle serrations, and as a result are more difficult to diagnose. The MVHP, in contrast, harbours more pronounced serration of the affected crypts and a prominent microvesicular appearance34,36.

Although the malignant threat of hyperplastic polyps is controversial and thought to be limited29,33,34, SSLs and TSAs possess significant malignant potential. SSLs can be histologically distinguished from HPs by three features:

- exaggerated luminal serration37,38;
- lateral spreading and dilatation of the crypt base38;
- dystrophic goblet cells38.

The minimum diagnostic criterion for a sessile serrated lesion is the presence of at least one dilated, horizontally branched crypt26. Endoscopically, most sessile serrated lesions are flat or sessile in appearance37,39. Approximately 60% of lesions have a mucinous cap39, and ~50% are marked by a ring of debris39. In contrast to hyperplastic polyps, most SSLs present in the proximal colon34. The indistinct appearance of SSLs presents challenges for detection, and the sessile serrated lesion detection rate varies substantially between operators and centres40. SSLs are initiated by point mutation of the BRAF proto-oncogene29 and develop the CpG island methylator phenotype (CIMP, described in detail below). Most then acquire DNA methylation at the mutL homolog 1 (MLH1) promoter in the course of progressing to invasive cancer29,41.

The final serrated lesion is the traditional serrated adenoma. TSAs are very rare and, in most large series, represent <1% of all detected lesions30,42. TSAs feature “typical” cytology, ectopic crypts and slit-like serrations43. Typical cytology describes the presence of narrow, penicillate nuclei on a background of bright eosinophilic cytoplasm32,43. The natural history of these lesions is variable: some clearly arise de novo, whereas others arise from precursor polyps32. The precursor polyps that can give rise to TSAs include HPs and SSLs32. TSAs arising from precursor polyps are less likely to have a focus of serrated dysplasia or carcinoma, and usually represent less advanced lesions than de novo TSAs32. TSAs mutate either KRAS or BRAF, but rarely both32. They develop CIMP but do not methylate MLH1, and therefore remain microsatellite stable32. TSAs are thought to underlie BRAF mutant microsatellite stable cancers, which carry a dismal prognosis44. Given the frequency of advanced histological and molecular features, TSAs are considered high-risk lesions.

Molecular genetics of colorectal cancer


Carcinogenesis is an intricate, step-wise process typified by the accumulation of genetic and epigenetic alterations that inactivate the various mechanisms designed to prevent malignancy. Cancers of the colorectum can arise through several distinct molecular pathways.

1. The Chromosomal Instability Pathway

The chromosomal instability (CIN) pathway gives rise to ~70% of colorectal cancers11. Cancers of the chromosomal instability pathway harbour a mosaic of somatic copy number alterations45. These alterations are often large, and can span entire chromosomes45. The natural by-product of these events is the gain or loss of copies of the genes encoded within these regions. One or both copies of tumour suppressor genes are often lost45. When one copy is lost, the locus is said to have lost heterozygosity, and the functionality of the underlying gene depends on the remaining copy46,47, which is often targeted by mutation47 or epimutation48. Loss of both copies describes a deep deletion and can fully inactivate the gene. Aberrant gain of copy number is also common in CIN cancers and can result in overexpression of the affected gene. This frequently occurs at loci which harbour oncogenic potential, such as the epidermal growth factor receptor (EGFR) and the erb-b2 receptor tyrosine kinase 2 (ERBB2)49.
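The copy number categories described above (gain, loss of heterozygosity and deep deletion) can be expressed as a simple lookup. The following is a minimal sketch, assuming integer copy number calls relative to a diploid baseline of two copies; the function name and the example calls are hypothetical and are not drawn from any dataset.

```python
def describe_copy_number(copies: int, baseline: int = 2) -> str:
    """Label an integer copy number call using the categories described above.

    Hypothetical helper: assumes a diploid baseline of two copies per locus.
    """
    if copies < 0:
        raise ValueError("copy number cannot be negative")
    if copies == 0:
        # Both copies lost: a deep (homozygous) deletion fully inactivates the gene.
        return "deep deletion"
    if copies < baseline:
        # One copy lost: the locus has lost heterozygosity, and the remaining
        # copy is exposed to inactivation by mutation or epimutation.
        return "loss of heterozygosity"
    if copies > baseline:
        # Extra copies can drive overexpression, e.g. of EGFR or ERBB2.
        return "copy gain"
    return "copy neutral"


# Made-up copy number calls, for illustration only.
example_calls = {"APC": 1, "SMAD4": 0, "ERBB2": 6, "KRAS": 2}
for gene, copies in example_calls.items():
    print(f"{gene}: {copies} copies -> {describe_copy_number(copies)}")
```

Integer copy number alone cannot detect copy-neutral loss of heterozygosity, which requires allele-specific data.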

Cancers arising in the chromosomal instability pathway are thought to be initiated by biallelic loss of the APC tumour suppressor gene50,51. APC is a critical regulator of the WNT/β-Catenin pathway52,53. The protein is a member of the β-Catenin destruction complex, which, when activated, patrols the cytoplasm and degrades β-Catenin52,53. This prevents the nuclear translocation of β-Catenin and transcription of WNT target genes, which are involved in stemness and proliferation52. APC mutation is common in conventional adenomas54 and colorectal cancers49, but rare in serrated lesions54. Activating mutations in KRAS are the next event in the molecular cascade, and occur more frequently as adenomas become increasingly advanced (~10% of adenomas with low grade dysplasia versus ~45% of adenomas with high grade dysplasia)55. Progression to cancer is then driven by mutation of TP53 and/or phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha (PIK3CA)45.

Concurrent with the accumulation of mutations in driver genes is a growing burden of aneuploidy and aberrant copy number alterations45,56. Chromosomal instability can arise through several mechanisms. Chromosome segregation is the process during mitosis that distributes the replicated sister chromatids to the daughter cells57. This process is guarded by an important cell cycle checkpoint called the spindle assembly checkpoint57. The spindle assembly checkpoint delays the onset of anaphase if the chromosomes are not appropriately attached for distribution to the daughter cells, and consequently acts as a guard against aneuploidy57,58. The functionality of the spindle assembly checkpoint relies on several proteins. The checkpoint is activated when kinetochores are not attached to the spindle microtubules58. This triggers the formation of the mitotic checkpoint complex, which in turn inhibits the anaphase-promoting complex/cyclosome (APC/C), thus preventing progression to anaphase58. This arrests the cell in prometaphase until all kinetochores are attached to the spindle fibres. Mutations in genes encoding members of the mitotic checkpoint complex are rare in sporadic colorectal cancer; however, mutations in the BUB1 mitotic checkpoint serine/threonine kinase (BUB1) and BUB3 mitotic checkpoint protein (BUB3) genes, which are both important elements of this complex, are linked to aneuploidy and familial colorectal cancer59.

Another proposed mechanism for CIN is defective cohesion, the process by which sister chromatids are held together. Barber et al60 screened the human homologues of yeast genes that encode proteins involved in chromosome segregation, and identified recurrent mutations in five genes that are specifically related to chromatid cohesion. Moreover, stromal antigen 2 (STAG2), a component of the cohesin complex, is highly expressed in normal colonic mucosa, but lost in ~30% of colorectal cancer specimens. Loss of STAG2 has also been shown to induce aneuploidy and chromosomal missegregation61,62.

2. TP53 Pathway

TP53 is the most frequently mutated gene in human cancer63 and dysregulation of TP53 signalling is a key event in the chromosomal instability pathway. TP53 has multifaceted roles in preventing cancer. In normal cells at a basal state, TP53 is expressed at low levels64; however, TP53 is rapidly deployed in response to cellular stressors. These stressors include hypoxia65, DNA damage66, oncogenic signalling67, replicative stress68 and oxidative stress69, amongst other stimuli.


The MDM2 proto-oncogene (MDM2) is an E3 ubiquitin ligase that tags TP53 for proteasomal degradation70. MDM2 acts as an ON-OFF switch for the TP53 pathway: if binding of MDM2 to the TP53 N-terminus is inhibited or otherwise impaired, TP53 protein can accumulate and activate cell stress responses70. Abrogation of MDM2 activity is the primary conduit for TP53 stabilisation and activation, and this is achieved through several mechanisms. For example, ARF, which responds to oncogenic insults, can directly interact with MDM2 and prevent TP53 ubiquitination71, thus stabilising TP53. ARF can also sequester MDM2 to the nucleoli, thus providing spatial separation between MDM2 and TP5371. PML nuclear body scaffold (PML), a protein that is activated by ATM serine/threonine kinase (ATM)/ATR serine/threonine kinase (ATR) dependent DNA damage sensing, also regulates MDM2 via nucleolar sequestration72. ATM can also directly phosphorylate MDM2 at serine 394. This post-translational modification prevents MDM2 from ubiquitinating TP53, and thereby stabilises TP53 protein73.

The best described of all of TP53’s functions is the ability of the protein to induce cell cycle arrest64. TP53 arrests the cell cycle by inducing transcription of cyclin dependent kinase inhibitor 2A (CDKN2A), the gene encoding p16 and p14ARF. The p16 protein is an inhibitor of cyclin dependent kinase 4 (CDK4) and cyclin dependent kinase 6 (CDK6). Induction of p16, and inhibition of CDK4/6, prevents the phosphorylation of the retinoblastoma protein (RB)71. Phosphorylated RB decouples from the E2F transcription factor, which binds target genes that facilitate the transition from the G1 phase of the cell cycle to the synthesis (S) phase74. Thus, TP53 induction of p16 abrogates the transcription of E2F target genes and arrests the cell cycle at G1.

In addition to inducing cell cycle arrest, TP53 can respond to stressors by triggering apoptosis. The primary means by which TP53 induces apoptosis is via activation of the intrinsic apoptosis pathway75. Here, TP53 induces the transcription of pro-apoptotic genes in the BCL2 apoptosis regulator (BCL2) family75. The pro-apoptotic BCL2 homology 3 (BH3)-only proteins inhibit the pro-survival members of the BCL2 family. Consequently, the BCL2 associated X, apoptosis regulator (BAX) and BCL2 antagonist/killer 1 (BAK) proteins become activated and mitochondrial outer membrane permeabilization is induced75. Cytochrome c escapes the mitochondria and binds apoptotic peptidase activating factor 1 (APAF-1), driving its oligomerization into the apoptosome, which subsequently activates the caspase cascade, and apoptosis ensues75,76.


TP53 is mutated in 4-16% of adenomas with low-grade dysplasia77, ~50% of adenomas with high-grade dysplasia77, and 60-70% of colorectal carcinomas49. Most mutations reside in the DNA binding domain of TP5364 and probably function by inhibiting the DNA binding capacity of the protein64. Although the spectrum of mutations in TP53 is wide, four mutational hotspots have been reported (R175, R248, R273, R282)49. Mutations at codon R175 account for ~10% of all TP53 mutations49,64. This mutation induces a conformational change in the tertiary structure of the protein, preventing wild-type-like DNA binding78,79. TP53 mutations that behave in this manner are called conformational mutations or class II TP53 mutations78. The other class of missense mutation in TP53 comprises the DNA contact mutations, or class I TP53 mutations. These mutations directly affect the ability of TP53 to recognise and bind DNA78. Mutations of this class include the R273 mutation, which accounts for ~6% of TP53 mutations. In contrast with most tumour suppressor genes, most mutations in TP53 are missense mutations, and only ~25% of TP53 mutations are truncating. Truncating mutations are the third class of TP53 mutation, and are usually null, loss-of-function mutations78,80.
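The three-class scheme described above can be sketched as a small classifier over protein-change strings. This is a minimal illustration, assuming a simplified notation (reference residue, codon, then an alternative residue, `*` for nonsense or `fs` for frameshift, e.g. `R175H`). Only R175 and R273, the two hotspots whose class is named in the text, are mapped; the lookup table, function name and example changes are hypothetical and are not a curated TP53 resource.

```python
import re
from typing import Optional

# Only the two hotspots whose class is named in the text are mapped here;
# this lookup is illustrative, not a curated TP53 classification resource.
MISSENSE_CLASS = {
    175: "class II (conformational)",
    273: "class I (DNA contact)",
}

# Simplified protein-change notation: reference residue, codon, then an
# alternative residue, "*" (nonsense) or "fs" (frameshift), e.g. "R175H".
PROTEIN_CHANGE = re.compile(r"([A-Z])(\d+)([A-Z]|\*|fs)")


def classify_tp53(change: str) -> Optional[str]:
    """Return a coarse TP53 mutation class, or None if the notation is unrecognised."""
    match = PROTEIN_CHANGE.fullmatch(change.strip())
    if match is None:
        return None
    _ref, codon, alt = match.groups()
    if alt in ("*", "fs"):
        # Nonsense and frameshift changes truncate the protein (class III).
        return "class III (truncating)"
    return MISSENSE_CLASS.get(int(codon), "missense, class not assigned")


# Hypothetical changes, for illustration only.
for change in ("R175H", "R273C", "R213*", "L130fs", "V143A"):
    print(change, "->", classify_tp53(change))
```

Missense changes outside the two mapped codons are deliberately left unassigned rather than guessed.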

Reduced endogenous functionality of TP53 is but one consequence of many of the TP53 mutations reported in the literature. Intriguingly, many TP53 mutations also have gain-of-function effects80. Seminal work by Dittmer and colleagues81 demonstrated that many of the common p53 mutations observed in human cancer enhance the in vivo tumour forming capacity of cancer cells lacking p53, indicating a clear gain-of-function over the null phenotype81. Moreover, gain-of-function mutants reportedly induce the expression of a wide array of tumour-promoting genes80, including vascular endothelial growth factor A (VEGFA)82 and EGFR83.

3. Polε Mutation, Defective Mismatch Repair and The Hypermutation Pathway

Cell division and proliferation require the duplication of nuclear material. DNA is copied through a process called DNA replication, in which DNA polymerases synthesise new, complementary strands of DNA using an existing single-stranded DNA molecule as a template. The primary polymerase enzymes involved in DNA replication belong to the B family of polymerases84,85, which comprises DNA polymerase alpha (Polα), delta (Polδ), and epsilon (Polε)84. Each of these enzymes replicates DNA with high fidelity, introducing approximately one error per 10⁴ to 10⁵ nucleotides incorporated86. When mismatches are introduced, they can be immediately remedied by Polδ and Polε, both of which have 3’ exonuclease domains that proofread and excise errors in newly synthesised DNA86. However, should these processes fail, the cell has one remaining guard against genome instability: the DNA mismatch repair system.
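To put the quoted error rate in perspective, the expected number of misincorporations per genome copy is simply the per-nucleotide error rate multiplied by the genome length. The sketch below assumes a haploid human genome of roughly 3.1 × 10⁹ base pairs; that figure and the function name are illustrative assumptions rather than values from this thesis.

```python
# Approximate haploid human genome size in base pairs; an assumption used only
# to put the per-nucleotide error rates quoted in the text into perspective.
HAPLOID_GENOME_BP = 3.1e9


def expected_errors(error_rate: float, genome_bp: float = HAPLOID_GENOME_BP) -> float:
    """Expected number of misincorporated nucleotides in one copy of the genome."""
    return error_rate * genome_bp


for rate in (1e-4, 1e-5):
    print(f"error rate {rate:.0e}: ~{expected_errors(rate):,.0f} errors per genome copy")
```

At these rates a single uncorrected copy of the genome would carry on the order of 31,000 to 310,000 errors. This is why the additional layers of proofreading and mismatch repair are needed to keep replication errors rare.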

The DNA mismatch repair pathway is a DNA repair mechanism that detects and repairs base-base mismatches and small insertion-deletion loops (indels) that arise during DNA replication87,88. Mismatch repair begins when the MutSα heterodimer, which consists of mutS homolog 2 (MSH2) and mutS homolog 6 (MSH6), binds to a mismatch and recruits the MutLα heterodimer88, which is comprised of MLH1 and PMS1 homolog 2, mismatch repair system component (PMS2). The MutLα complex nicks the recently synthesised strand of DNA on the 5’ and 3’ aspect of the mismatched nucleotide88. The error-containing segment of this strand is then removed via exonuclease 1 (EXO1) activity, strand displacement or the 3’ exonuclease activity of Polδ and Polε. The repair process is completed when a replacement fragment of DNA is synthesised by the polymerase enzymes and ligated to the remaining DNA by DNA ligase88.

DNA mismatch repair is highly efficient at repairing the mismatches that are most commonly introduced by polymerases, and for which replicative 3’ exonuclease proofreading is least efficient88. This reciprocity, whereby the errors most frequently introduced are those most efficiently repaired, is extremely effective in preventing the introduction of most replicative errors into the genome88; however, it lacks functional redundancy and therefore leads to a highly hypermutable phenotype should one of these elements fail. Mutations in genes encoding the DNA polymerase enzymes89,90 and failure of DNA mismatch repair both occur recurrently in colorectal cancer91,92.

Defective DNA mismatch repair results in microsatellite instability (MSI), an umbrella term for the accumulation of mutations in repetitive regions of the genome, which was first reported in 199393,94. Some years later, aided by the molecular cloning of mismatch repair genes, mutation or epimutation of these genes was mechanistically associated with microsatellite instability95–98. Lynch syndrome (formerly hereditary non-polyposis colorectal cancer) is a familial syndrome caused by germline mutation of one of the mismatch repair genes99. Lynch syndrome accounts for ~3% of all colorectal cancers, and is estimated to occur in ~1 in 300 persons100. Microsatellite instability is also common amongst sporadic colorectal cancers99. Sporadic microsatellite unstable cancers tend to arise in the proximal colon of patients presenting at an advanced age99. In contrast to Lynch syndrome cancers, most sporadic microsatellite unstable cancers do not arise from mutation of one of the mismatch repair genes99, but rather via epigenetic silencing of the MLH1 gene97,98. The mutation burden of colorectal cancers with MSI is much higher than that of microsatellite stable (MSS) cancers49. Microsatellite instability also has clinical implications. Increased mutation burden correlates with increased neoantigen load and therefore improved response to immune checkpoint therapy101,102. By contrast, patients with MSI cancers respond poorly to conventional 5-fluorouracil based therapy, as functional mismatch repair is required for its cytotoxic effect103.

Cancers with polymerase proofreading defects have an even more pronounced mutator phenotype90,104, and tend to accumulate thousands of mutations. Classical POLE and POLD1 mutations occur in the exonuclease proofreading domain and alter the 3D structure of the polymerase105, thereby profoundly disrupting the proofreading capacity of the enzyme90. Like MSI cancers, POLE exonuclease domain mutant cancers have a heightened neoantigen burden and are acutely susceptible to immune checkpoint blockade106.

Canonical Signalling pathway activity in colorectal neoplasia

Several canonical signalling pathways become dysregulated as colorectal neoplasia progresses. These pathways govern crucial processes including proliferation (MAPKinase, PI3K/AKT), apoptosis (PI3K/AKT, TP53/RB) and stemness and differentiation (WNT).

1. The Mitogen Activated Protein Kinase Signalling Cascade (MAPK)

The MAPKinase signalling pathway is an evolutionarily conserved, three-tiered cascade of protein kinases with terminal functions controlling proliferation, migration and inflammation107. G-proteins, in response to upstream stimuli, bind and activate MAPKKK molecules, initiating a sequential cascade of phosphorylation events through MAPKK and MAPK molecules to effect a response107. The RAS-GTPase (RAS)/Rapidly accelerated fibrosarcoma (RAF) pathway is the best characterised of the MAPK pathways. This particular rendition of the MAPK pathway is triggered when extracellular epidermal growth factor (EGF) ligands bind EGFR. EGFR is a transmembrane receptor tyrosine kinase that forms a dimer when bound by EGF108. Dimerized EGFR autophosphorylates tyrosine residues in its C-terminal domain109. Phosphorylated EGFR recruits SHC-transforming protein (SHC) and growth factor receptor bound protein 2 (GRB2) to the plasma membrane, and GRB2 subsequently recruits SOS Ras/Rac guanine nucleotide exchange factor (SOS). SOS is a guanine nucleotide exchange factor110 that interacts with the KRAS protein, catalysing the exchange of bound guanosine diphosphate (GDP) for guanosine triphosphate (GTP). GTP loading acts as a molecular switch, activating the KRAS molecule, which binds to and stimulates RAF proteins (MAPKKKs)107. RAF proteins phosphorylate MEK1/2 (MAPKKs), which then phosphorylate ERK1/2 (MAPKs)107. ERK1/2 interacts with a plethora of transcription factors and other enzymes to induce proliferation, cell growth, migration and evasion of apoptosis107.

Several levels of this cascade are the subject of (epi)genetic or transcriptomic dysregulation in colorectal cancers; however, mutations of KRAS and BRAF are the most common. KRAS is the most frequently mutated gene in the RAS/RAF pathway. Mutations in KRAS are usually confined to hotspots at codons 12, 13, 61, 117 and 146. These hotspots encode residues of the nucleotide binding pocket, the region of the protein that binds GDP or GTP111. Mutations at residues encoding the nucleotide binding pocket function to maintain GTP-bound KRAS. This is achieved via two mechanisms: i) by impairing the hydrolysis of GTP (codons 12, 13 and 61), preventing KRAS from reverting to its basal, inactive state, or ii) by promoting nucleotide exchange (codons 117 and 146), increasing the ratio of GTP-bound KRAS to GDP-bound KRAS111. Mutations at these residues convert KRAS into an oncoprotein that continuously binds RAF proteins, and induces hyperactivation of the RAS/RAF pathway.
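The two mechanistic groups of KRAS hotspots described above can be captured in a lookup table and used to tally mutations across a set of tumours. This is a minimal sketch, assuming missense changes are written in a simplified single-letter notation such as `G12D`; the function name and the list of calls are hypothetical.

```python
import re
from collections import Counter

# Hotspot codons grouped by the two mechanisms described in the text.
KRAS_MECHANISM = {
    12: "impaired GTP hydrolysis",
    13: "impaired GTP hydrolysis",
    61: "impaired GTP hydrolysis",
    117: "enhanced nucleotide exchange",
    146: "enhanced nucleotide exchange",
}


def kras_mechanism(change: str) -> str:
    """Map a missense change such as "G12D" to its hotspot mechanism group."""
    match = re.fullmatch(r"[A-Z](\d+)[A-Z]", change.strip())
    if match is None:
        raise ValueError(f"unrecognised missense notation: {change!r}")
    return KRAS_MECHANISM.get(int(match.group(1)), "non-hotspot")


# Hypothetical KRAS calls from a small cohort, for illustration only.
calls = ["G12D", "G12V", "G13D", "Q61H", "K117N", "A146T", "D33E"]
print(Counter(kras_mechanism(c) for c in calls))
```

Keying the table on codon rather than on the full amino acid change reflects the text's point that the mechanism is determined by which residue of the nucleotide binding pocket is altered.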

BRAF is a MAPKKK protein and proto-oncogene that is activated by KRAS and signals to MEK1/2 via phosphorylation. Mutation of BRAF occurs in 10-15% of colorectal cancers and is a marker of the serrated neoplasia pathway31,112. Approximately 70% of BRAF mutations are a valine to glutamic acid substitution at codon 600 (V600E). Codon 600 resides in the activation segment of the kinase domain. The V600E mutation induces a 10-15 fold increase in kinase activity relative to wild type BRAF113. This is accompanied by a sustained elevation in the phosphorylation of ERK1/2, confirming constitutive signalling by mutant BRAF113.


2. The PI3K/AKT signalling pathway

The phosphoinositide 3-kinase (PI3K) pathway is comprised of several intracellular lipid kinases that function to control proliferation, apoptosis and other key cellular processes114. PI3 kinases fall broadly into three classes according to protein structure and substrate specificity114,115. Class I PI3Ks primarily convert phosphatidylinositol-4,5-bisphosphate (PI-4,5-P2) to phosphatidylinositol-3,4,5-trisphosphate (PIP3), and class II PI3Ks generate phosphatidylinositol-3-phosphate (PI-3-P) and phosphatidylinositol-3,4-bisphosphate (PI-3,4-P2)115. Class III PI3Ks only generate PI-3-P. Class I PI3Ks are heterodimeric complexes that contain both regulatory and catalytic subunits115. Class II and III PI3Ks, in contrast, lack regulatory subunits. Class I PI3Ks are further dichotomised into classes IA and IB according to the specific regulatory and catalytic subunits present115. Class IA PI3Ks are the members of the PI3K family most frequently associated with colorectal cancer114. Class IA PI3Ks are activated by receptor tyrosine kinases, G-protein coupled receptors and oncoproteins including members of the RAS family114. Activation of PI3K results in downstream protein kinase B (AKT) signalling and pro-tumourigenic cellular adaptations, such as enhanced proliferation and decreased apoptosis.

The phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha gene (PIK3CA) encodes p110α, a catalytic subunit of class IA PI3Ks. PIK3CA mutations occur in 25-30% of colorectal cancers and, unlike other putative driver mutations such as those in KRAS and BRAF, often co-occur with mutations in other driver genes112. There are three hotspot mutation sites in the PIK3CA gene: codons 542, 545 and 1047. Mutation of these residues accounts for 45% of all PIK3CA mutations, indicating positive selection at these loci. Mutations at codons 542 and 545 occur in the helical domain of the protein and disrupt binding with the corresponding regulatory subunit, which usually serves to inhibit kinase activity114, thus increasing lipid kinase activity and downstream pro-tumourigenic signalling. However, helical domain mutants depend on cross-talk from the MAPKinase pathway in the form of binding by RAS-GTP116. Codon 1047 resides in the kinase domain of the protein, and mutation of this residue also increases lipid kinase activity. In contrast to helical domain mutants, the PI3K/AKT inducing potential of kinase domain mutants depends on interaction with the regulatory subunit and is independent of RAS-GTP116.
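The contrasting dependencies of helical and kinase domain hotspots described above can be summarised in a small data structure. This is a minimal sketch that encodes only the properties stated in the text. The class name, field names and lookup function are hypothetical illustrations, not an established annotation schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pik3caHotspot:
    """Properties of a PIK3CA hotspot as summarised in the text."""
    domain: str
    needs_ras_gtp: bool             # dependent on binding by RAS-GTP
    needs_regulatory_subunit: bool  # dependent on the regulatory subunit


HELICAL = Pik3caHotspot("helical", needs_ras_gtp=True, needs_regulatory_subunit=False)
KINASE = Pik3caHotspot("kinase", needs_ras_gtp=False, needs_regulatory_subunit=True)

PIK3CA_HOTSPOTS = {542: HELICAL, 545: HELICAL, 1047: KINASE}


def pik3ca_hotspot(codon: int) -> Optional[Pik3caHotspot]:
    """Return hotspot properties for a codon, or None for non-hotspot codons."""
    return PIK3CA_HOTSPOTS.get(codon)


for codon in (545, 1047, 88):
    hotspot = pik3ca_hotspot(codon)
    if hotspot is None:
        print(f"codon {codon}: not one of the three hotspots")
    else:
        print(f"codon {codon}: {hotspot.domain} domain, "
              f"RAS-GTP dependent={hotspot.needs_ras_gtp}")
```

Because the two mutant classes have opposite dependencies, a frozen record per domain keeps the mapping explicit and avoids encoding the contrast as scattered conditionals.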


Although PIK3CA mutation accounts for most of the aberrant PI3K/AKT signalling observed in colorectal cancer, other, less frequent alterations have also been described. These include overexpression of AKT genes117,118, loss of phosphatase and tensin homolog (PTEN)119, and overexpression/amplification of upstream receptors.

3. The WNT signalling pathway

The WNT signalling pathway regulates stemness, cell polarity, proliferation and differentiation. It is crucial in maintaining the stem cell niche of the intestine120, and becomes deregulated in most colorectal cancers52. WNT signalling comprises two main variants: canonical and non-canonical WNT signalling. Canonical WNT signalling is initiated when WNT ligands bind Frizzled receptors on the cell surface121, inducing receptor dimerization with LDL receptor related protein 5/6 (LRP5/6). This results in the recruitment of Dishevelled121 and the phosphorylation of LRP5/6. Axin, a key component of the β-catenin destruction complex, is sequestered to the plasma membrane, preventing complex formation. As such, β-catenin is stabilized and translocates to the nucleus, where it shifts transcription to favour stemness and proliferation121. This process is important for stem cell maintenance in the normal intestinal epithelium. However, the canonical WNT signalling pathway is hijacked in >80% of colorectal cancers52.

The aforementioned β-catenin destruction complex is comprised of several proteins. APC, glycogen synthase kinase 3 beta (GSK3β), AXIN1/2, and casein kinase 1 alpha (CK1α) form a complex that phosphorylates β-catenin, marking the protein for proteasomal degradation121. APC is a critical subunit of the complex, and loss of APC results in the inability of the cell to degrade β-catenin. Truncating mutations and genomic deletions of the APC gene are very common in colorectal cancers and precursor lesions. In The Cancer Genome Atlas colorectal cancer project, 75% of colorectal cancers were reported to harbour deleterious alterations in APC, most of which were truncating mutations. Moreover, Borowsky and colleagues reported APC mutations in 89% of conventional adenomas54, and 15-30% of advanced serrated lesions54. Although much less common, mutations of other elements of canonical WNT signalling have been reported. These include mutations in β-catenin itself, which are usually confined to the residues that become phosphorylated, effectively blocking degradation122. AXIN1/2, other members of the destruction complex, are also mutated in some gastrointestinal malignancies123.

WNT signalling can also become dysregulated following epigenetic alterations (reviewed in Sermen et al 2014124). The secreted frizzled-related proteins (SFRPs) are a family of secreted proteins capable of binding WNT ligands125. When released into the extracellular environment, these proteins bind free WNT ligands, preventing their interaction with Frizzled receptors and abrogating canonical WNT activation125. This interaction represents a mechanism for homeostatic control of local WNT signalling. The genes encoding the SFRP proteins can become methylated and silenced during colorectal neoplasia126,127, increasing the availability of free extracellular WNT ligand and promoting canonical WNT signalling.

Epigenetic Alterations in Colorectal Cancers

This review of the molecular genetics of colorectal cancer has thus far focused on the genetic events that underpin colorectal cancer. These have included mutations in driver genes, and phenomena such as chromosomal and microsatellite instability. These events alter the genetic material of the cell. Epigenetics describes alterations that do not change the sequence of the genome, but rather modulate gene expression through modifications either directly on the DNA, such as DNA methylation128, or on other nuclear elements, such as histones129. These alterations shift the state of the chromatin, making it more or less permissive to active transcription.

1. The CpG Island Methylator Phenotype

Approximately fifteen percent of colon cancer cases arise as a direct result of an epigenetic phenotype, namely the CpG island methylator phenotype (CIMP). CIMP describes the widespread accumulation of DNA methylation throughout the genome128,130. DNA methylation is the covalent addition of a methyl group to the fifth carbon of cytosine, forming 5-methylcytosine, and preferentially occurs at CpG dinucleotides, particularly those in CpG islands, which are CpG dense regions of the genome131. DNA methylation in certain regulatory regions can alter the transcription of nearby genes132. In colorectal cancers with CIMP, important tumour suppressor genes are silenced via this mechanism. One such gene is CDKN2A, which encodes two negative regulators of the cell cycle, p16 and p14ARF71. The p16 protein acts as an inducer of senescence in response to oncogenic stimuli133,134, and hence has an important role to play in restraining tumourigenesis. Promoter methylation of CDKN2A is associated with reduced transcript expression135,136, advanced stage and diminished survival137,138, although evidence supporting poorer overall survival is somewhat contradictory139. Methylation of the MLH1 gene is another important event in CIMP-driven colorectal tumourigenesis. MLH1 encodes a mismatch repair protein87 (as reviewed earlier), and is a target for cancer-specific DNA methylation98, resulting in gene silencing and microsatellite instability99. This is an early event in CIMP-driven neoplasia and occurs before invasive cancer develops41. In the serrated neoplasia pathway, loss of MLH1 occurs abruptly at the transition from a non-dysplastic sessile serrated adenoma to one with cytological dysplasia41. Given that progression to cancer follows rapidly after the development of dysplasia, it is probable that the loss of MLH1 is a key event promoting malignancy in this context.
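"CpG dense" can be made concrete with two simple sequence statistics: GC content and the ratio of observed to expected CpG dinucleotides. The sketch below assumes the widely used thresholds of a length of at least 200 bp, GC content above 50% and an observed/expected CpG ratio above 0.6. These defaults and the function names are illustrative assumptions and not the definition used elsewhere in this thesis.

```python
from typing import Tuple


def cpg_metrics(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    # Count CG dinucleotides on the given strand.
    cpg_count = sum(1 for i in range(length - 1) if seq[i:i + 2] == "CG")
    gc_fraction = (c_count + g_count) / length
    # Expected CpGs if C and G were placed independently: (C * G) / length.
    obs_exp = cpg_count * length / (c_count * g_count) if c_count and g_count else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_length: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply commonly used CpG island thresholds (assumed defaults)."""
    gc_fraction, obs_exp = cpg_metrics(seq)
    return len(seq) >= min_length and gc_fraction > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CG" * 100))  # CpG-dense, 200 bp
print(looks_like_cpg_island("AT" * 100))  # contains no CpG dinucleotides
```

The observed/expected ratio matters because CpG dinucleotides are depleted across most of the genome. A region can therefore be GC rich without being CpG dense.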

Studies of the CpG island methylator phenotype have evolved with improved technologies, but a consistent definition of the phenotype is still lacking. For example, in one of the earliest studies on CIMP, Weisenberger and colleagues130 evaluated the DNA methylation of 195 loci across 295 primary colon cancer specimens. This study identified a cluster of colorectal cancers displaying a distinct DNA methylation profile, an increased frequency of BRAF mutation, enrichment for location in the proximal colon and a predilection for female patients130. The authors proposed a panel of five markers (calcium voltage-gated channel subunit alpha1 G (CACNA1G), suppressor of cytokine signaling 1 (SOCS1), RUNX family transcription factor 3 (RUNX3), neurogenin 1 (NEUROG1), and insulin like growth factor 2 (IGF2)) for diagnosing CIMP via methylation specific PCR, where methylation of ≥3 markers is sufficient to categorise a tumour as CIMP+130. Ogino et al140 proposed a different CIMP marker panel that included CACNA1G and NEUROG1, but assessed MLH1, CDKN2A and cellular retinoic acid binding protein 1 (CRABP1) in lieu of the remaining markers from Weisenberger et al130. This study also identified a strong association between CIMP and BRAF mutation. The same group used these markers to identify a group of cancers with an attenuated CIMP-like phenotype, termed CIMP-low141.
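The marker-panel rule described above (CIMP+ when at least three of five markers are methylated) can be sketched as follows. This is a minimal illustration, assuming that per-marker methylation calls have already been made upstream; how each marker is called methylated is outside this sketch. The function name and the example tumour are hypothetical.

```python
from typing import Mapping, Sequence

# Five-marker panel attributed in the text to Weisenberger and colleagues.
WEISENBERGER_PANEL = ("CACNA1G", "IGF2", "NEUROG1", "RUNX3", "SOCS1")


def classify_cimp(methylated: Mapping[str, bool],
                  panel: Sequence[str] = WEISENBERGER_PANEL,
                  threshold: int = 3) -> str:
    """Call CIMP+ when at least `threshold` panel markers are methylated.

    `methylated` holds per-marker calls that are assumed to have been made
    already; markers outside the panel are ignored.
    """
    missing = [marker for marker in panel if marker not in methylated]
    if missing:
        raise ValueError(f"no methylation call for: {', '.join(missing)}")
    n_methylated = sum(1 for marker in panel if methylated[marker])
    return "CIMP+" if n_methylated >= threshold else "CIMP-"


# Hypothetical tumour with three of the five markers methylated.
tumour = {"CACNA1G": True, "IGF2": False, "NEUROG1": True, "RUNX3": True, "SOCS1": False}
print(classify_cimp(tumour))
```

Making the panel and threshold parameters highlights how definitional choices affect CIMP calls. Passing the Ogino et al marker set requires calls for MLH1, CDKN2A and CRABP1 instead, and the same tumour may then be classified differently.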


Jia et al142 identified as many as 16 different definitions of CIMP in the literature, and this variability partially explains the conflicting evidence regarding the prognostic significance of CIMP. Some studies support the premise that CIMP status is a positive prognostic indicator143, while others indicate that the prognostic impact of CIMP is location dependent144 or influenced by microsatellite instability145. A unified definition of CIMP and a more comprehensive understanding of the phenotype would resolve this shortcoming.

There have been several studies in the genomics era that have attempted to understand the wider context of the CpG island methylator phenotype. Both The Cancer Genome Atlas Consortium49 and Hinoue et al146 leveraged microarray technology to assay DNA methylation on a larger scale than was previously possible with low-throughput PCR based methods. The array used in these studies simultaneously assessed the DNA methylation levels of ~27,000 CpG sites, most of which were in the promoters of known genes. In both studies, sample clustering based on the most informative CpG sites yielded four distinct clusters of colorectal cancers (CIMP-High, CIMP-Low, CIMP-Negative 1, CIMP-Negative 2)49,146. In keeping with earlier works, these analyses identified an enrichment for female sex, proximal colonic location and BRAF mutation in CIMP-High cancers. CIMP-Low cancers had the highest proportion of KRAS mutant cancers and a slight, but non-significant, enrichment for occurring in males. This accords with the work of Ogino and colleagues141. The two CIMP-Negative clusters could be discriminated by the high frequency of TP53 mutations in CIMP-Negative 1 (65%) compared with CIMP-Negative 2 (16%). Although CIMP-High cancers occurred at the highest average age (71 years), this was not significantly higher than the remaining subtypes. These studies provide a more accurate snapshot of the promoter methylation landscape of CIMP; however, the technology is limited by the number and location of its probes. It is not clear how CIMP affects the DNA methylation landscape of non-promoter CpGs, nor those in regulatory regions such as enhancers. Moreover, previous studies have linked CIMP to advanced age130, but Hinoue et al146 did not observe this phenomenon in their genome-scale study. Cancers in this study occurred at a high average age overall, and it is possible that sampling bias was a confounder.

2. Epigenetic modifiers and Colorectal Cancer


Other epigenetic marks are frequently altered in colorectal carcinogenesis. In the mammalian nucleus, ~146 base pairs of DNA are wound around a histone octamer to form a structure called the nucleosome147–149. The histone octamer is comprised of two copies each of the core histone proteins H2A, H2B, H3 and H4149. Mutations in genes encoding histones150–152, or post-translational modification of their tails153, can influence how tightly nucleosomes are packed together. Transcriptional machinery cannot access DNA within tightly compacted nucleosomal architecture, and as a result genes in these regions are usually not expressed154. It follows that nucleosome conformation, mediated by the histone proteins, is instructive for gene expression.

The N-terminal regions of histone proteins form the histone tails155. The histone tails contain a series of residues that can be post-translationally modified to change the compaction of the chromatin. For example, trimethylation of lysine 27 on histone H3 (H3K27me3) is associated with inaccessible chromatin and repressed gene expression153, while acetylation of the same residue (H3K27ac) is positively associated with gene expression and open chromatin153. Histone modifications, like DNA methylation, are deposited and removed by enzymes. Histone methyltransferases and histone demethylases add and remove methyl groups from histone tails156. Histone acetyltransferases and deacetylases perform a similar function for the acetyl mark157. Mutations in the genes that encode these proteins can abrogate their ability to catalyse these reactions. In colorectal cancer, a number of epigenetic modifiers have been reported as mutated or lost. SET domain containing 1B, histone lysine methyltransferase (SETD1B) encodes a histone lysine methyltransferase that methylates H3 at lysine 4 (H3K4)158. Truncating mutations in SETD1B occur in ~35% of microsatellite unstable cancers, and these mutations are associated with concomitant loss of expression159. Likewise, lysine methyltransferase 2C (KMT2C) encodes a histone methyltransferase that deposits monomethylation of histone H3 at lysine 4 (H3K4me1), typically at enhancer elements, and is commonly mutated and lost in colorectal cancers160,161. These studies highlight the propensity for epigenetic regulators to be mutated in colorectal cancer; however, to date there has been no systematic analysis of the mutational landscape of these genes, nor consideration of their impact in concert with dysregulation of DNA methylation.

The Consensus Molecular Subtypes of Colorectal Cancer and Precision Medicine


There have been numerous attempts to leverage genomics to identify subtypes of colorectal cancer that provide prognostic information and predict treatment efficacy, the most prominent of which is consensus molecular subtyping (CMS)162. By analysing transcriptomic data from 4,151 cancer samples, Guinney et al162 identified four distinct subtypes of colorectal cancer, CMS1 through CMS4. CMS1, or the MSI/immune subtype, which comprised 14% of the total cohort, was enriched for microsatellite unstable, CIMP-High and BRAF mutant cancers (Table 1). At the transcript level, these cancers displayed marked enrichment of immune-associated processes. In keeping, cell type deconvolution of CMS1 transcriptomes shows an enrichment of tumour-resident immune cells163 and heightened T-cell reactivity164. These characteristics imply that CMS1 cancers may respond well to immune checkpoint inhibition. CMS2 cancers (the canonical subtype) have an unstable genomic architecture with frequent somatic copy number alterations, harbour APC and TP53 mutations, and show high activity of canonical signalling pathways, including the MYC proto-oncogene, bHLH transcription factor (MYC) and WNT signalling pathways162. A subsequent analysis by Berg et al165 indicated that upregulation of these pathways results from DNA copy number gains rather than from substantial cross-talk between tumour epithelial cells and the microenvironment165. The CMS3 subtype shows transcriptional dysregulation of metabolic processes162. These cancers often harbour KRAS mutation, have a moderate mutation burden, and are often CIMP-Low (Table 1)162. The final consensus molecular subtype, CMS4, is described as the mesenchymal subtype162. These cancers upregulate several processes involved in the epithelial-to-mesenchymal transition, angiogenesis and remodelling of the extracellular matrix (Table 1)162. In keeping, CMS4 cancers are more often metastatic and are associated with poor patient outcomes162.
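The published CMS classifier is not reproduced here. As a hedged illustration of the general idea of assigning a tumour to the subtype whose expression profile it most resembles, the sketch below uses a simple nearest-centroid rule based on Pearson correlation. The four features, the centroid values and the 0.15 minimum-correlation threshold are all invented for illustration (loosely echoing the qualitative pattern in Table 1) and are not taken from Guinney et al162.

```python
from math import sqrt

# Hypothetical feature order: pathway scores, e.g. z-scored signature means.
FEATURES = ("immune", "wnt_myc", "metabolic", "emt_tgfb")

# Invented centroids for illustration only; these are NOT published CMS centroids.
CENTROIDS = {
    "CMS1": [1.5, -0.5, -0.5, 0.0],
    "CMS2": [-1.0, 1.5, -0.5, -1.0],
    "CMS3": [-0.5, 0.0, 1.5, -1.0],
    "CMS4": [0.0, -1.0, -0.5, 1.5],
}


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def assign_subtype(sample, centroids=CENTROIDS, min_r=0.15):
    """Return the best-correlated subtype (or 'unclassified' if no centroid
    reaches min_r), together with the per-subtype correlations."""
    scores = {name: pearson(sample, centroid) for name, centroid in centroids.items()}
    best = max(scores, key=scores.get)
    label = best if scores[best] >= min_r else "unclassified"
    return label, scores


label, scores = assign_subtype([1.2, -0.3, -0.6, 0.1])
print(label)  # CMS1
```

The threshold means that samples correlating weakly with every centroid are left unclassified rather than forced into a subtype; the consensus classification itself likewise left a subset of tumours unassigned.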


Table 1 – Consensus molecular subtypes of colorectal cancer162

Characteristic                             CMS1       CMS2   CMS3       CMS4
Molecular phenotypes
  Somatic copy number alterations (SCNA)   +          +++    ++         +++
  Mutation load                            +++        -      ++         -
  CpG island methylator phenotype          +++        -      +          -
Mutational profiles
  BRAF mutation                            ++         -      -          -
  KRAS mutation                            +          +      ++         +
  APC mutation                             ++         +++    ++         ++
  TP53 mutation                            +          +++    ++         +++
Signalling pathway activity
  WNT                                      Moderate   High   Moderate   Low
  MYC                                      Moderate   High   Moderate   Low
  TGF-beta                                 High       Low    Low        High
  Immune activation                        High       Low    Moderate   Moderate
  EMT pathway                              Moderate   Low    Low        High
Clinical features
  5-year survival (%)                      75         82     80         66
  Tumour side (% right)                    75         25     50         42

Footnote: - : 0–25%; + : 25–50%; ++ : 50–75%; +++ : 75–100%
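The symbols in Table 1 encode frequency bins. A minimal sketch of that encoding follows. Because the footnote's ranges share their end points, the sketch assumes that each bin includes its lower bound and excludes its upper bound, with 100% falling in the top bin. The function name is hypothetical.

```python
def frequency_symbol(percent: float) -> str:
    """Map the percentage of cancers with a feature to a Table 1 symbol.

    Assumed bins (lower bound inclusive, upper bound exclusive):
      '-'   : 0-25%
      '+'   : 25-50%
      '++'  : 50-75%
      '+++' : 75-100% (100% included)
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie between 0 and 100")
    for upper, symbol in ((25, "-"), (50, "+"), (75, "++")):
        if percent < upper:
            return symbol
    return "+++"


print([frequency_symbol(p) for p in (10, 25, 60, 100)])  # ['-', '+', '++', '+++']
```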

Models of Colorectal Carcinogenesis

Investigating colorectal cancer using patient specimens alone is limited to observational studies that can identify correlations, but not causal relationships. Various model systems have been established to interrogate the underlying biology and mechanisms driving colorectal cancer, including in vitro models such as cell lines and organoids, and in vivo model systems, including transgenic and xenograft models.

1. In Vitro Models of Colorectal Cancer

In vitro modelling of colorectal cancer has leaned heavily on cell lines derived from patient specimens. The molecular features of colorectal cancer cell lines broadly represent some of the molecular subtypes observed in clinical practice166. Each cell line has defining characteristics, including its molecular alterations, growth pattern and morphology. Berg and colleagues characterised the genome and transcriptome of 34 frequently used colorectal cancer cell lines166. In keeping with previous studies of patient specimens, microsatellite unstable and hypermutated colorectal cancer cell lines upregulated pathways involved in inflammation and immune response and were classified as CMS1. The remaining cell lines were dispersed across CMS2 to CMS4. In concordance with human data, hypermutability and chromosomal instability in cell lines were mutually exclusive, with the number of exonic variants inversely correlating with the proportion of the genome harbouring aberrant DNA copy number166. Cell lines are therefore an important tool for colorectal cancer research, but they have some inherent limitations. For example, while driver mutations that are common in patient specimens are also identified in cancer cell lines49,166,167, the transcriptomic profiles of cell lines and primary samples are markedly different. Gillet et al168 examined the expression of 380 genes in a series of primary specimens and cancer cell lines from various tissue types and identified strong expression-based clustering that discriminated between primary samples and cell lines in each tissue type, highlighting the transcriptional differences between these in vitro models and the disease they purport to model. Moreover, the transcriptional profiles of cell lines from different cancers bore more similarity to each other than they did to patient specimens of the same cancer type168. Furthermore, establishing cell lines from patient samples is laborious and prone to failure, and may introduce bias, as not all primary specimens can be cultured using these methods. Advances in stem cell biology and the advent of organoids have addressed many of these limitations.
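One common way to quantify an inverse relationship of the kind described between mutation count and copy number burden is Spearman's rank correlation. The sketch below is illustrative only: the six cell lines and their values are invented, and the statistic Berg et al166 actually used is not specified here.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values for six hypothetical cell lines.
exonic_variants = [1200, 950, 300, 150, 120, 90]
fraction_genome_cna = [0.02, 0.05, 0.18, 0.30, 0.35, 0.41]
print(round(spearman(exonic_variants, fraction_genome_cna), 3))  # -1.0
```

A rank-based measure is a reasonable choice here because mutation counts in hypermutated lines are heavily skewed, and Spearman's rho is insensitive to that skew.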

Intestinal organoids are three-dimensional models derived from tissue-resident stem cell populations that, under appropriate culture conditions and with access to key growth factors, persist and grow into structures reminiscent of the gut169. Organoids can be grown from both cancer and normal mucosal samples and retain the genetic architecture of the parent specimen during long-term culture169. In contrast to cell lines, organoids can be readily established from most cancer and normal samples, with an establishment efficiency of ~90%170. Establishing organoids from tissue stem cells requires a specific combination of niche factors. Sato et al identified three key factors, WNT, EGF and Noggin, that must be present to sustain organoid growth171. WNT stimulation is critical for crypt proliferation and can be achieved by supplementing the culture medium with R-Spondin, a potent WNT agonist171. Sustained proliferation depends on the epidermal growth factor (EGF) signalling pathway, which is activated by adding exogenous EGF ligand171. Lastly, Noggin, a bone morphogenetic protein (BMP) antagonist, induces an expansion in the number of crypts and underpins the successful passaging of organoids171.

Organoids are well suited to cancer modelling and have been used extensively to explore the genetic alterations that underpin colorectal cancers arising via different molecular pathways. As mentioned earlier, colorectal cancers arise via two molecular pathways: the conventional pathway and the serrated neoplasia pathway. Drost and colleagues used CRISPR-Cas9 genome engineering to introduce a sequence of genetic alterations into healthy intestinal organoids to investigate tumour initiation and progression172. Beginning with organoids engineered to express the oncogenic KRASG12D mutation, Drost and colleagues sequentially knocked out APC, TP53 and SMAD4, producing organoids capable of growing in culture medium lacking niche growth factors172. On xenotransplantation, the quadruple mutant organoids formed invasive carcinomas characterised by marked aneuploidy and chromosomal mis-segregation, features of the conventional pathway of colorectal carcinogenesis172. Lannagan et al173 adopted a similar step-wise approach to recapitulate serrated colorectal carcinogenesis. Here, the authors derived organoids from BrafCA;VillinCreERT2 mice, which, upon treatment with 4-hydroxytamoxifen, express BrafV637E, the murine analogue of the human BRAFV600E mutation. Engineered mutation of transforming growth factor beta receptor 2 (Tgfbr2), ring finger protein 43 (Rnf43), zinc and ring finger 3 (Znrf3), p16Ink4a and Mlh1 was sufficient to produce tumours when the organoids were orthotopically injected into immunodeficient mice173, and these tumours had characteristic features of serrated neoplasia, including a mucinous endoscopic appearance and serrated cytology173.
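The link between organoid niche factors and the mutations introduced by Drost and colleagues can be summarised with a simplified model. The sketch assumes, as a deliberate simplification, that each alteration makes one niche factor dispensable by acting on the pathway that factor normally controls: APC loss activates WNT signalling downstream of R-Spondin, KRASG12D activates MAPK signalling downstream of the EGF receptor, and SMAD4 loss disables the BMP signalling that Noggin antagonises. TP53 loss is modelled as having no niche-factor equivalent. The names and mapping are illustrative, not a published model.

```python
# Niche factors described by Sato et al and the pathway each one supports.
NICHE_FACTORS = {
    "R-Spondin": "WNT activation",
    "EGF": "EGFR/MAPK activation",
    "Noggin": "BMP inhibition",
}

# Simplified, assumed mapping from a driver alteration to the niche factor it
# makes dispensable. TP53 loss has no entry: in this toy model it does not
# replace any niche factor.
BYPASSES = {
    "APC loss": "R-Spondin",
    "KRAS G12D": "EGF",
    "SMAD4 loss": "Noggin",
}


def required_niche_factors(genotype):
    """Return the niche factors a hypothetical organoid genotype still needs."""
    bypassed = {BYPASSES[alteration] for alteration in genotype if alteration in BYPASSES}
    return set(NICHE_FACTORS) - bypassed


quadruple = {"KRAS G12D", "APC loss", "TP53 loss", "SMAD4 loss"}
print(sorted(required_niche_factors(set())))      # ['EGF', 'Noggin', 'R-Spondin']
print(sorted(required_niche_factors(quadruple)))  # []
```

Under these assumptions, wild-type organoids need all three factors, while the quadruple mutant needs none, consistent with growth in medium lacking niche factors.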

2. Animal Models of Colorectal Cancer

Whilst both organoids and traditional cell culture models are valuable resources for cancer research, both lack the physiological context of an intact organism. In vivo modelling using animals can address this. There are two main classes of animal models used in colorectal cancer research: genetically engineered animals that have lost a tumour suppressor gene or express an oncogene, and models in which cancer is induced by administration of a chemical carcinogen.


Transgenic animal models are useful for investigating the effects of specific genes on colorectal carcinogenesis. TP53 and APC are among the most frequently mutated genes in colorectal cancer49, and numerous transgenic mouse models have been engineered to explore their roles. Tp53R172H and Tp53R270H mice develop spontaneous intestinal tumours174, whereas Tp53null animals do not, indicating that tumour formation driven by missense Tp53 mutations probably occurs via a gain-of-function mechanism174. In keeping, Schwitalla and colleagues noted that Tp53 deletion restricted to intestinal enterocytes was not, in isolation, sufficient to generate tumours175. However, chemical carcinogen-induced tumour incidence was markedly increased by loss of Tp53175, indicating that Tp53 loss modifies the risk of tumour initiation, but only in the presence of other driver alterations. In contrast, Apc mutant models consistently develop adenomas176 and, in some models, invasive carcinoma177,178. The earliest Apc mutant murine model is the ApcMin/+ mouse, developed by Moser, Pitot and Dove179 through germline mutagenesis with the potent chemical mutagen ethylnitrosourea. Moser and colleagues identified progeny of the mutagenesis experiment that developed numerous intestinal lesions similar to human colorectal adenomas179. This line was named Min to describe the multiple intestinal neoplasia phenotype of these mice. Linkage analysis later mapped the causal mutation to the Apc locus180 and identified it as a nonsense mutation at codon 850. ApcMin/+ mice develop a high number of intestinal lesions, but these are rarely invasive179; the short lifespan of these animals is instead linked to malnutrition induced by the high intestinal lesion burden179. This shortened lifespan restrains adenomas from acquiring the alterations required for malignant progression. The limitation was overcome by crossing males from different genetic backgrounds with ApcMin/+ females. The resulting F1 progeny had a much longer lifespan and developed fewer intestinal tumours; however, the tumours that did develop were predominantly invasive adenocarcinomas181.

The MAPK signalling cascade is commonly dysregulated in colorectal cancer by mutation of either KRAS or BRAF. Both mutations constitutively activate MAPK signalling and drive cell proliferation, but they can also produce distinct phenotypes. Murine models of Braf and Kras mutation have sought to delineate these differences. Feng et al182 showed that inducing KrasG12D mutation in the murine intestine results in crypt serration and histological abnormalities akin to hyperplastic polyps, but does not result in adenoma formation, nor does it expand the pool of intestinal stem cells182. Kras mutation also increases the number of goblet cells at the expense of Paneth cells182. The lack of tumour-initiating ability of KrasG12D is a consistent finding across several studies182,183. However, KrasG12D mutation does appear to influence tumour progression in the setting of Apc loss183. When KrasG12D animals were crossed with Apc2lox14 mice, the offspring developed more numerous and more dysplastic tumours than animals bearing the Apc2lox14 alteration alone183. These animals also had poorer survival, and their adenocarcinomas were uniformly poorly differentiated183. Kras mutation therefore appears to modify the trajectory of Apc-driven neoplasia. In keeping, KrasG12D mutation also increased the likelihood of metastasis and reduced animal survival on an Apcnull/Tp53null background184.

In contrast to Kras mutation, there have been far fewer studies of murine models of Braf mutation. One notable model is the BrafV637;VillinCre model developed by Rad and colleagues185. In this model, the conditional BrafV637 allele is activated by Cre recombinase-mediated DNA recombination. These mice were crossed with VillinCre animals to provide a source of Cre recombinase. Expression of the Cre gene is driven by the Villin promoter, so Cre is only expressed where Villin is expressed, principally the intestinal epithelium186. Rad et al185 reported that all animals on the BrafV637;VillinCre background developed intestinal hyperplasia, most developed adenomas histologically similar to human traditional serrated adenomas, and a small number (11%) developed invasive carcinoma. Of particular interest in this study was the assessment of microsatellite instability. In humans, BRAF mutant serrated neoplasia progresses to cancer either via the acquisition of MLH1 methylation and, consequently, microsatellite instability, or via other genetic alterations while maintaining functional DNA mismatch repair29. The former occurs in ~60% of BRAF mutant cancers and the latter in the remaining 40%29. Rad et al185 reported that the majority of Braf mutant lesions were either MSI-Low or MSI-H. Intriguingly, our group did not recapitulate microsatellite instability using a similar model187, indicating that the propensity to develop microsatellite instability in the setting of Braf mutation may be strain dependent. This is in keeping with human studies linking a specific polymorphism in the MLH1 promoter to hypermethylation in BRAF mutant sessile serrated lesions188. By crossing BrafV637;VillinCre mice with p16null animals, Rad et al185 also showed that p16 restrains malignant progression of advanced murine serrated adenomas. In humans, CDKN2A is often methylated and silenced in BRAF mutant cancers135.

Our laboratory used a similar model (the BrafCA;VillinCreERT2 model) to investigate whether Braf mutation could induce the CpG island methylator phenotype. Our model differed in that the Cre gene was fused to ERT2, a modified estrogen receptor ligand-binding domain that retains Cre in the cytoplasm. When tamoxifen binds the estrogen receptor domain of the fusion protein, Cre recombinase translocates to the nucleus and recombines the V637 allele of Braf, allowing temporal control of mutation induction. When the mutation was induced at weaning, we observed intestinal hyperplasia by 10 days post-recombination and progressively more severe serrated neoplasia as animals aged187. We observed a concomitant increase in DNA methylation at 94 colon cancer-related genes as Braf mutant animals aged187, indicating that prolonged exposure to oncogenic Braf is sufficient to generate a CIMP-like phenotype. This model has been used extensively in the research described in the following chapters.
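The temporal control that a CreERT2 fusion provides can be illustrated with a toy model. The sketch below is hypothetical and simplified: it assumes that recombination happens only on a day when the Villin promoter is active in the cell and tamoxifen is present, that recombination is 100% efficient, and that once it has occurred the recombined Braf allele is permanent, since a DNA recombination event is not reversed when tamoxifen is withdrawn.

```python
from dataclasses import dataclass


@dataclass
class EpithelialCell:
    """Toy cell in a hypothetical BrafCA;VillinCreERT2 animal."""
    villin_promoter_active: bool  # is the VillinCreERT2 transgene expressed?
    braf_recombined: bool = False

    def expose(self, tamoxifen_present: bool) -> None:
        # CreERT2 enters the nucleus only when ligand is bound, and it is only
        # present where the Villin promoter drives its expression.
        if self.villin_promoter_active and tamoxifen_present:
            self.braf_recombined = True  # recombination is irreversible


def braf_timeline(villin_promoter_active, tamoxifen_days, n_days):
    """Daily Braf allele state for one cell over n_days (assumed 100% efficiency)."""
    cell = EpithelialCell(villin_promoter_active)
    timeline = []
    for day in range(n_days):
        cell.expose(tamoxifen_present=day in tamoxifen_days)
        timeline.append("V637E" if cell.braf_recombined else "conditional")
    return timeline


print(braf_timeline(True, {2}, 5))   # ['conditional', 'conditional', 'V637E', 'V637E', 'V637E']
print(braf_timeline(False, {2}, 5))  # ['conditional', 'conditional', 'conditional', 'conditional', 'conditional']
```

The two calls contrast a Villin-expressing intestinal cell with a cell outside the Villin expression domain: tamoxifen alone does not recombine the allele, and in the expressing cell the mutation persists after the dosing day.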

The Aging Epigenome

Epigenetic modifications are also influenced by the aging process189–191. Age-associated DNA methylation is perhaps the best example of how the epigenetic landscape can be a function of patient age. Age-associated DNA methylation refers to changes in the methylation status of certain CpG sites over the natural lifespan of an organism189. Many gene loci contain promoter CpGs that are prone to age-associated DNA methylation. Estrogen receptor 1 (ESR1), the gene encoding the oestrogen receptor, is one of the earliest reported examples of age-associated DNA methylation in the non-neoplastic colonic mucosa192. ESR1 becomes progressively hypermethylated throughout the aging process, and notably this locus is also hypermethylated in most colorectal cancers, suggesting some overlap between age-associated and cancer-associated DNA methylation192. The methylation status of other genes, such as GATA binding protein 5 (GATA5), HIC ZBTB transcriptional repressor 1 (HIC1), hyperpigmentation, progressive, 1 (HPP1) and SFRP1193, also shifts with age. The process does not appear to be purely stochastic, and shows some tissue and gene pathway specificity194,195. Maegawa and colleagues195 showed that age-associated methylation of certain loci preferentially occurs in particular tissue types. For example, Gpr37 becomes increasingly hypermethylated in the aging murine small intestine, but not in the lung, liver, kidney or spleen195.
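Whether a CpG site shows age-associated methylation is commonly assessed by regressing its methylation level on age across individuals. A minimal sketch follows, assuming methylation is measured as a beta value (the fraction of molecules methylated, between 0 and 1) and using invented data for one hypothetical CpG. It fits an ordinary least-squares line: a positive slope indicates age-associated hypermethylation and a negative slope age-associated hypomethylation.

```python
def fit_age_trend(ages, betas):
    """Ordinary least-squares fit of methylation beta value on age.

    Returns (slope per year, intercept)."""
    if len(ages) != len(betas) or len(ages) < 2:
        raise ValueError("need at least two paired observations")
    n = len(ages)
    mean_age = sum(ages) / n
    mean_beta = sum(betas) / n
    s_xx = sum((a - mean_age) ** 2 for a in ages)
    s_xy = sum((a - mean_age) * (b - mean_beta) for a, b in zip(ages, betas))
    if s_xx == 0:
        raise ValueError("ages must not all be identical")
    slope = s_xy / s_xx
    return slope, mean_beta - slope * mean_age


# Invented colonic mucosa samples for a single hypothetical CpG site.
ages = [20, 30, 40, 50, 60, 70]
betas = [0.10, 0.14, 0.18, 0.22, 0.26, 0.30]
slope, intercept = fit_age_trend(ages, betas)
print(f"{slope:.4f} per year")  # 0.0040 per year
```

In practice such per-CpG fits are run across many sites, usually with adjustment for covariates and for multiple testing; the sketch shows only the core calculation.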

However, some CpG sites within the genome accumulate or lose DNA methylation in an age-dependent but tissue-agnostic manner196,197. Using 7,844 non-cancerous samples derived from 51 tissue types, Horvath198 developed the first “epigenetic clock”. The term epigenetic clock was coined to describe the ability to predict chronological age from the DNA methylation state of a set of CpGs. Horvath used penalised (elastic net) regression to identify 353 CpG sites that were powerful predictors of chronological age in non-diseased tissues. The resulting predictor estimates age accurately, with a median absolute error of 3.6 years198. In addition, Horvath showed that all cancer types display a marked acceleration of epigenetic age198. In the years since this seminal study, accelerated epigenetic age has been associated with numerous pathologies, including solid and haematological malignancies199,200, cardiovascular disease199,201 and Alzheimer’s disease202.
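The structure of a Horvath-style clock can be sketched in a few lines. As generally described for Horvath's clock, the elastic net is fitted not to age itself but to a transformed age that is logarithmic up to an adult age of 20 years and linear thereafter, and the model output is converted back into years with the inverse transform. In the sketch below, the three CpG identifiers, their weights and the intercept are invented placeholders (the real clock uses 353 fitted coefficients). Age acceleration is taken simply as DNAm age minus chronological age, which is an assumption; residual-based definitions are also widely used.

```python
import math

ADULT_AGE = 20  # age at which the transform switches from log to linear scale


def transform_age(age):
    """Log-linear age transform applied before fitting the elastic net."""
    if age <= ADULT_AGE:
        return math.log(age + 1) - math.log(ADULT_AGE + 1)
    return (age - ADULT_AGE) / (ADULT_AGE + 1)


def inverse_transform_age(value):
    """Map the model's linear predictor back to an age in years."""
    if value <= 0:
        return (ADULT_AGE + 1) * math.exp(value) - 1
    return value * (ADULT_AGE + 1) + ADULT_AGE


def dnam_age(betas, intercept, weights):
    """Weighted sum of CpG beta values, followed by the inverse transform."""
    linear = intercept + sum(w * betas[cpg] for cpg, w in weights.items())
    return inverse_transform_age(linear)


# Invented placeholder model with three hypothetical CpGs (not Horvath's coefficients).
WEIGHTS = {"cpg_a": 2.0, "cpg_b": -1.5, "cpg_c": 1.0}
INTERCEPT = -0.5

sample = {"cpg_a": 0.6, "cpg_b": 0.2, "cpg_c": 0.4}
predicted = dnam_age(sample, INTERCEPT, WEIGHTS)
chronological = 30
print(round(predicted, 1), round(predicted - chronological, 1))  # 36.8 6.8
```

The logarithmic segment lets the model capture the rapid methylation changes of childhood, while the linear segment covers the slower drift of adulthood; both branches meet at zero when age equals 20.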

Several environmental factors can modify DNA methylation and epigenetic age. For example, aspirin, a non-steroidal anti-inflammatory drug that is often prescribed for the primary prevention of coronary heart disease203 and has protective effects against colorectal adenoma formation204, modulates age-associated colonic DNA methylation205. In post-menopausal women, hormone replacement therapy also shifted the DNA methylation profile of the normal colonic mucosa205. In keeping, menopause is associated with an acceleration of epigenetic age206. Cigarette smoking also alters DNA methylation and accelerates epigenetic age207,208. This may be particularly relevant to the risk of serrated neoplasia, which features CIMP29, disproportionately affects females130,146, occurs at a higher average age29,31,130, and is strongly associated with smoking209.


Conclusion and Key Questions

Colorectal cancer is underpinned by a complex sequence of molecular events that occur along a continuum from normal colonic epithelium to invasive cancer. Mutation of the APC tumour suppressor gene gives rise to conventional adenomas, which acquire mutations in KRAS and TP53 and accumulate gross chromosomal instability as they progress to invasive cancer. Cancers of the serrated pathway are initiated by mutation of the BRAF proto-oncogene, which triggers the formation of a sessile serrated lesion; at some point on their path to malignancy, these lesions acquire DNA methylation and other epigenetic alterations. The extent of these DNA methylation events, especially in non-promoter genomic elements, is not entirely clear. Recent studies have shown the impact of age on the DNA methylation profile of the normal colonic mucosa, but it has not yet been established whether these age-related DNA methylation events have consequences for the risk of serrated colorectal neoplasia. Therefore, in this thesis I seek to resolve three fundamental questions relating to colorectal cancer biology. First, what is the true extent of the CpG island methylator phenotype? Second, does age-related hypermethylation, particularly at loci encoding tumour suppressor genes, increase the risk of serrated colorectal neoplasia? And, if so, how can this be reconciled with the existence of early-onset serrated colorectal cancer?

To address these questions, I formulate the following aims and hypotheses:

Aims and Hypotheses

Hypotheses

1. The DNA methylation profile of colorectal cancers will be heterogeneous, and high-throughput DNA methylation profiling will uncover novel subtypes of colorectal cancer.

2. Age-associated DNA methylation elevates the risk of BRAF mutation-induced neoplastic progression.


3. Early-onset BRAF mutant colorectal cancers acquire early genetic alterations in WNT signalling to progress to malignancy, bypassing the need for extensive DNA methylation at WNT signalling regulators.

Aims

1. To evaluate the DNA methylation profile of a large series of unselected colorectal cancers using high-throughput microarray technology

2. To investigate the role of aging and DNA methylation in determining the malignant potential of the serrated neoplasia pathway

3. To survey the landscape of WNT signalling pathway alterations in BRAF mutant colorectal cancers


References

1. Australian Institute of Health and Welfare. Cancer in Australia: In brief 2019. (Australian Institute of Health and Welfare).
2. Zauber, A. G. The impact of screening on colorectal cancer mortality and incidence: has it really made a difference? Dig. Dis. Sci. 60, 681–691 (2015).
3. Siegel, R. L., Miller, K. D. & Jemal, A. Cancer statistics, 2020. CA: A Cancer Journal for Clinicians 70, 7–30 (2020).
4. Marshall, D. C. et al. Trends in UK regional cancer mortality 1991-2007. Br. J. Cancer 114, 340–347 (2016).
5. Gehart, H. & Clevers, H. Tales from the crypt: new insights into intestinal stem cells. Nature Reviews Gastroenterology & Hepatology 16, 19–34 (2019).
6. Kiela, P. R. & Ghishan, F. K. Physiology of Intestinal Absorption and Secretion. Best Pract Res Clin Gastroenterol 30, 145–159 (2016).
7. Umar, S. Intestinal stem cells. Curr Gastroenterol Rep 12, 340–348 (2010).
8. Barker, N. et al. Identification of stem cells in small intestine and colon by marker gene Lgr5. Nature 449, 1003–1007 (2007).
9. Dufour, G. et al. Human intestinal epithelial cell survival and anoikis. Differentiation state-distinct regulation and roles of protein kinase B/Akt isoforms. J. Biol. Chem. 279, 44113–44122 (2004).
10. Kaiko, G. E. et al. The Colonic Crypt Protects Stem Cells from Microbiota-Derived Metabolites. Cell 165, 1708–1720 (2016).
11. Nguyen, L. H., Goel, A. & Chung, D. C. Pathways of Colorectal Carcinogenesis. Gastroenterology 158, 291–302 (2020).
12. Bird, R. P. Observation and quantification of aberrant crypts in the murine colon treated with a colon carcinogen: preliminary findings. Cancer Lett. 37, 147–151 (1987).
13. Caderni, G. et al. Characterisation of aberrant crypt foci in carcinogen-treated rats: association with intestinal carcinogenesis. Br J Cancer 71, 763–769 (1995).
14. Stopera, S. A., Murphy, L. C. & Bird, R. P. Evidence for a ras gene mutation in azoxymethane-induced colonic aberrant crypts in Sprague-Dawley rats: earliest recognizable precursor lesions of experimental colon cancer. Carcinogenesis 13, 2081–2085 (1992).


15. Vivona, A. A. et al. K-ras mutations in aberrant crypt foci, adenomas and adenocarcinomas during azoxymethane-induced colon carcinogenesis. Carcinogenesis 14, 1777–1781 (1993).
16. Roncucci, L., Stamp, D., Medline, A., Cullen, J. B. & Robert Bruce, W. Identification and quantification of aberrant crypt foci and microadenomas in the human colon. Human Pathology 22, 287–294 (1991).
17. Adler, D. G. et al. Endoscopic identification and quantification of aberrant crypt foci in the human colon. Gastrointestinal Endoscopy 56, 657–662 (2002).
18. Takayama, T. et al. Aberrant Crypt Foci: Detection, Gene Abnormalities, and Clinical Usefulness. Clinical Gastroenterology and Hepatology 3, S42–S45 (2005).
19. Chan, A. O.-O. et al. CpG Island Methylation in Aberrant Crypt Foci of the Colorectum. The American Journal of Pathology 160, 1823–1830 (2002).
20. Rosenberg, D. W. et al. Mutations in BRAF and KRAS Differentially Distinguish Serrated versus Non-Serrated Hyperplastic Aberrant Crypt Foci in Humans. Cancer Res 67, 3551–3554 (2007).
21. Strum, W. B. Colorectal Adenomas. N Engl J Med 374, 1065–1075 (2016).
22. Bettington, M. et al. Sessile Serrated Adenomas in Young Patients may have Limited Risk of Malignant Progression. Journal of Clinical Gastroenterology 53, e113 (2019).
23. Hall, J. F. Management of Malignant Adenomas. Clin Colon Rectal Surg 28, 215–219 (2015).
24. Cottet, V. et al. Long-term risk of colorectal cancer after adenoma removal: a population-based cohort study. Gut 61, 1180–1186 (2012).
25. Kuipers, E. J. et al. Colorectal cancer. Nat Rev Dis Primers 1, 15065 (2015).
26. Bosman, F. T., Carneiro, F., Hruban, R. H. & Theise, N. D. WHO classification of tumours of the digestive system: WHO Classification of Tumours, Volume 3. (International Agency for Research on Cancer, 2010).
27. Brown, I. et al. Polypectomy and local resections of the colorectum structured reporting protocol. (Royal Australasian College of Pathologists).
28. O’Brien, M. J. et al. The National Polyp Study. Patient and polyp characteristics associated with high-grade dysplasia in colorectal adenomas. Gastroenterology 98, 371–379 (1990).


29. Leggett, B. & Whitehall, V. Role of the serrated pathway in colorectal cancer pathogenesis. Gastroenterology 138, 2088–2100 (2010).
30. Spring, K. J. et al. High prevalence of sessile serrated adenomas with BRAF mutations: a prospective study of patients undergoing colonoscopy. Gastroenterology 131, 1400–1407 (2006).
31. Kambara, T. et al. BRAF mutation is associated with DNA methylation in serrated polyps and cancers of the colorectum. Gut 53, 1137–1144 (2004).
32. Bettington, M. L. et al. A clinicopathological and molecular analysis of 200 traditional serrated adenomas. Modern Pathology 28, 414–427 (2015).
33. Jass, J. R. Hyperplastic polyps and colorectal cancer: is there a link? Clinical Gastroenterology and Hepatology 2, 1–8 (2004).
34. Crockett, S. D. & Nagtegaal, I. D. Terminology, Molecular Features, Epidemiology, and Management of Serrated Colorectal Neoplasia. Gastroenterology 157, 949–966.e4 (2019).
35. Murakami, T., Sakamoto, N. & Nagahara, A. Endoscopic diagnosis of sessile serrated adenoma/polyp with and without dysplasia/carcinoma. World J Gastroenterol 24, 3250–3259 (2018).
36. Qazi, T. M. et al. Epidemiology of Goblet Cell and Microvesicular Hyperplastic Polyps. American Journal of Gastroenterology 109, 1922–1932 (2014).
37. Rex, D. K. et al. Serrated Lesions of the Colorectum: Review and Recommendations From an Expert Panel. Am J Gastroenterol 107, 1315–1330 (2012).
38. Rosty, C. & Bettington, M. Serrated colorectal polyps and polyposis. Diagnostic Histopathology 20, 30–37 (2014).
39. Tadepalli, U. S. et al. A morphologic analysis of sessile serrated polyps observed during routine colonoscopy (with video). Gastrointest. Endosc. 74, 1360–1368 (2011).
40. Payne, S. R. et al. Endoscopic Detection of Proximal Serrated Lesions and Pathologic Identification of Sessile Serrated Adenomas/Polyps Vary on the Basis of Center. Clinical Gastroenterology and Hepatology 12, 1119–1126 (2014).
41. Bettington, M. et al. Clinicopathological and molecular features of sessile serrated adenomas with dysplasia or carcinoma. Gut 66, 97–106 (2017).
42. Carr, N. J., Mahajan, H., Tan, K. L., Hawkins, N. J. & Ward, R. L. Serrated and non-serrated polyps of the colorectum: their prevalence in an unselected case series and correlation of BRAF mutation analysis with the diagnosis of sessile serrated adenoma. J. Clin. Pathol. 62, 516–518 (2009).
43. McCarthy, A. J., Serra, S. & Chetty, R. Traditional serrated adenoma: an overview of pathology and emphasis on molecular pathogenesis. BMJ Open Gastroenterol 6, (2019).
44. Bond, C. E. & Whitehall, V. L. J. How the BRAF V600E Mutation Defines a Distinct Subgroup of Colorectal Cancer: Molecular and Clinical Implications. Gastroenterol Res Pract 2018, (2018).
45. Pino, M. S. & Chung, D. C. The chromosomal instability pathway in colon cancer. Gastroenterology 138, 2059–2072 (2010).
46. Makishima, H. & Maciejewski, J. P. Pathogenesis and Consequences of Uniparental Disomy in Cancer. Clin Cancer Res 17, 3913–3923 (2011).
47. Knudson, A. G. Two genetic hits (more or less) to cancer. Nature Reviews Cancer 1, 157–162 (2001).
48. Arnold, C. N. et al. APC promoter hypermethylation contributes to the loss of APC expression in colorectal cancers with allelic loss on 5q. Cancer Biol. Ther. 3, 960–964 (2004).
49. Muzny, D. M. et al. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
50. Rowan, A. J. et al. APC mutations in sporadic colorectal tumors: A mutational “hotspot” and interdependence of the “two hits”. PNAS 97, 3352–3357 (2000).
51. Fearon, E. R. & Vogelstein, B. A genetic model for colorectal tumorigenesis. Cell 61, 759–767 (1990).
52. Zhan, T., Rindtorff, N. & Boutros, M. Wnt signaling in cancer. Oncogene 36, 1461–1473 (2017).
53. Schneikert, J. & Behrens, J. The canonical Wnt signalling pathway and its APC partner in colon cancer development. Gut 56, 417–425 (2007).
54. Borowsky, J. et al. The role of APC in WNT pathway activation in serrated neoplasia. Mod Pathol 31, 495–504 (2018).
55. Juárez, M. et al. KRAS and BRAF somatic mutations in colonic polyps and the risk of metachronous neoplasia. PLoS One 12, (2017).


56. Baker, D. J., Jin, F., Jeganathan, K. B. & van Deursen, J. M. Whole chromosome instability caused by Bub1 insufficiency drives tumorigenesis through tumor suppressor gene loss of heterozygosity. Cancer Cell 16, 475–486 (2009).
57. Potapova, T. & Gorbsky, G. J. The Consequences of Chromosome Segregation Errors in Mitosis and Meiosis. Biology (Basel) 6, (2017).
58. Lara-Gonzalez, P., Westhorpe, F. G. & Taylor, S. S. The Spindle Assembly Checkpoint. Current Biology 22, R966–R980 (2012).
59. de Voer, R. M. et al. Germline mutations in the spindle assembly checkpoint genes BUB1 and BUB3 are risk factors for colorectal cancer. Gastroenterology 145, 544–547 (2013).
60. Barber, T. D. et al. Chromatid cohesion defects may underlie chromosome instability in human colorectal cancers. Proc Natl Acad Sci U S A 105, 3443–3448 (2008).
61. Solomon, D. A. et al. Mutational Inactivation of STAG2 Causes Aneuploidy in Human Cancer. Science 333, 1039–1043 (2011).
62. Kim, M. S., Kim, S. S., Je, E. M., Yoo, N. J. & Lee, S. H. Mutational and expressional analyses of STAG2 gene in solid cancers. Neoplasma 59, 524–529 (2012).
63. Bailey, M. H. et al. Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 173, 371–385.e18 (2018).
64. Kastenhuber, E. R. & Lowe, S. W. Putting p53 in Context. Cell 170, 1062–1078 (2017).
65. Sermeus, A. & Michiels, C. Reciprocal influence of the p53 and the hypoxic pathways. Cell Death Dis 2, e164 (2011).
66. Lakin, N. D. & Jackson, S. P. Regulation of p53 in response to DNA damage. Oncogene 18, 7644–7655 (1999).
67. Ruiz, L. et al. Characterization of the p53 Response to Oncogene-Induced Senescence. PLOS ONE 3, e3230 (2008).
68. Benedict, B. et al. Loss of p53 suppresses replication-stress-induced DNA breakage in G1/S checkpoint deficient cells. eLife 7,
69. Liu, D. & Xu, Y. p53, Oxidative Stress, and Aging. Antioxid Redox Signal 15, 1669–1678 (2011).
70. Nag, S., Qin, J., Srivenugopal, K. S., Wang, M. & Zhang, R. The MDM2-p53 pathway revisited. J Biomed Res 27, 254–271 (2013).

71. Ozenne, P., Eymin, B., Brambilla, E. & Gazzeri, S. The ARF tumor suppressor: structure, functions and status in cancer. Int. J. Cancer 127, 2239–2247 (2010). 72. Bernardi, R. et al. PML regulates p53 stability by sequestering Mdm2 to the nucleolus. Nat. Cell Biol. 6, 665–672 (2004). 73. Carr, M. I., Roderick, J. E., Gannon, H. S., Kelliher, M. A. & Jones, S. N. Mdm2 Phosphorylation Regulates its Stability and has Contrasting Effects on Oncogene and Radiation-Induced Tumorigenesis. Cell Rep 16, 2618–2629 (2016). 74. Giacinti, C. & Giordano, A. RB and cell cycle progression. Oncogene 25, 5220–5227 (2006). 75. Aubrey, B. J., Kelly, G. L., Janic, A., Herold, M. J. & Strasser, A. How does p53 induce apoptosis and how does this relate to p53-mediated tumour suppression? Cell Death & Differentiation 25, 104–113 (2018). 76. Green, D. R. Apoptotic Pathways: Ten Minutes to Dead. Cell 121, 671–674 (2005). 77. Hao, X. P. et al. The spectrum of p53 mutations in colorectal adenomas differs from that in colorectal carcinomas. Gut 50, 834–839 (2002). 78. Freed-Pastor, W. A. & Prives, C. Mutant p53: one name, many proteins. Genes Dev. 26, 1268–1286 (2012). 79. Cho, Y., Gorina, S., Jeffrey, P. D. & Pavletich, N. P. Crystal structure of a p53 tumor suppressor-DNA complex: understanding tumorigenic mutations. Science 265, 346–355 (1994). 80. Oijen, M. G. C. T. van & Slootweg, P. J. Gain-of-Function Mutations in the Tumor Suppressor Gene p53. Clin Cancer Res 6, 2138–2145 (2000). 81. Dittmer, D. et al. Gain of function mutations in p53. Nat. Genet. 4, 42–46 (1993). 82. Kieser, A., Weich, H. A., Brandner, G., Marmé, D. & Kolch, W. Mutant p53 potentiates induction of vascular endothelial growth factor expression. Oncogene 9, 963–969 (1994). 83. Ludes-Meyers, J. H. et al. Transcriptional activation of the human epidermal growth factor receptor promoter by human p53. Mol. Cell. Biol. 16, 6009–6019 (1996). 84. Garcia-Diaz, M. & Bebenek, K. Multiple functions of DNA polymerases. CRC Crit Rev Plant Sci 26, 105–122 (2007).

85. Garg, P. & Burgers, P. M. J. DNA polymerases that propagate the eukaryotic DNA replication fork. Crit. Rev. Biochem. Mol. Biol. 40, 115–128 (2005). 86. Kunkel, T. A. Evolving views of DNA replication (in)fidelity. Cold Spring Harb. Symp. Quant. Biol. 74, 91–101 (2009). 87. Li, G.-M. Mechanisms and functions of DNA mismatch repair. Cell Research 18, 85–98 (2008). 88. Kunkel, T. A. & Erie, D. A. Eukaryotic Mismatch Repair in Relation to DNA Replication. Annual Review of Genetics 49, 291–313 (2015). 89. Temko, D. et al. Somatic POLE exonuclease domain mutations are early events in sporadic endometrial and colorectal carcinogenesis, determining driver mutational landscape, clonal neoantigen burden and immune response. The Journal of Pathology 245, 283–296 (2018). 90. Palles, C. et al. Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nature Genetics 45, 136–144 (2013). 91. Malkhosyan, S. R., Yamamoto, H., Piao, Z. & Perucho, M. Late onset and high incidence of colon cancer of the mutator phenotype with hypermethylated hMLH1 gene in women. Gastroenterology 119, 598 (2000). 92. Yamamoto, H., Sawai, H., Weber, T. K., Rodriguez-Bigas, M. A. & Perucho, M. Somatic Frameshift Mutations in DNA Mismatch Repair and Proapoptosis Genes in Hereditary Nonpolyposis Colorectal Cancer. Cancer Res 58, 997–1003 (1998). 93. Thibodeau, S. N., Bren, G. & Schaid, D. Microsatellite instability in cancer of the proximal colon. Science 260, 816–819 (1993). 94. Ionov, Y., Peinado, M. A., Malkhosyan, S., Shibata, D. & Perucho, M. Ubiquitous somatic mutations in simple repeated sequences reveal a new mechanism for colonic carcinogenesis. Nature 363, 558–561 (1993). 95. Miyaki, M. et al. Germline mutation of MSH6 as the cause of hereditary nonpolyposis colorectal cancer. Nat. Genet. 17, 271–272 (1997). 96. Aquilina, G. et al. A mismatch recognition defect in colon carcinoma confers DNA microsatellite instability and a mutator phenotype. Proc. Natl. Acad. Sci. U.S.A. 91, 8905–8909 (1994).

97. Kane, M. F. et al. Methylation of the hMLH1 promoter correlates with lack of expression of hMLH1 in sporadic colon tumors and mismatch repair-defective human tumor cell lines. Cancer Res. 57, 808–811 (1997). 98. Veigl, M. L. et al. Biallelic inactivation of hMLH1 by epigenetic gene silencing, a novel mechanism causing human MSI cancers. Proc Natl Acad Sci U S A 95, 8698–8702 (1998). 99. Boland, C. R. & Goel, A. Microsatellite Instability in Colorectal Cancer. Gastroenterology 138, 2073–2087.e3 (2010). 100. Boland, P. M., Yurgelun, M. B. & Boland, C. R. Recent progress in Lynch syndrome and other familial colorectal cancer syndromes. CA: A Cancer Journal for Clinicians 68, 217–231 (2018). 101. Llosa, N. J. et al. The vigorous immune microenvironment of microsatellite instable colon cancer is balanced by multiple counter-inhibitory checkpoints. Cancer Discov 5, 43–51 (2015). 102. Sahin, I. H. et al. Immune checkpoint inhibitors for the treatment of MSI-H/MMR-D colorectal cancer and a perspective on resistance mechanisms. British Journal of Cancer 121, 809–818 (2019). 103. Iwaizumi, M., Tseng-Rogenski, S. & Carethers, J. M. DNA mismatch repair proficiency executing 5-fluorouracil cytotoxicity in colorectal cancer cells. Cancer Biol Ther 12, 756–764 (2011). 104. Shinbrot, E. et al. Exonuclease mutations in DNA polymerase epsilon reveal replication strand specific mutation patterns and human origins of replication. Genome Res 24, 1740–1750 (2014). 105. Parkash, V. et al. Structural consequence of the most frequently recurring cancer-associated substitution in DNA polymerase ε. Nature Communications 10, 373 (2019). 106. Wang, F. et al. Evaluation of POLE and POLD1 Mutations as Biomarkers for Immunotherapy Outcomes Across Multiple Cancer Types. JAMA Oncol 5, 1504–1506 (2019). 107. Dhillon, A. S., Hagan, S., Rath, O. & Kolch, W. MAP kinase signalling pathways in cancer. Oncogene 26, 3279–3290 (2007). 108. Wee, P. & Wang, Z. Epidermal Growth Factor Receptor Cell Proliferation Signaling Pathways. Cancers (Basel) 9, (2017).

109. Ferguson, K. M. A structure-based view of Epidermal Growth Factor Receptor regulation. Annu Rev Biophys 37, 353–373 (2008). 110. Chardin, P. et al. Human Sos1: a guanine nucleotide exchange factor for Ras that binds to GRB2. Science 260, 1338–1343 (1993). 111. Haigis, K. M. KRAS Alleles: The Devil Is In The Detail. Trends Cancer 3, 686–697 (2017). 112. The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012). 113. Davies, H. et al. Mutations of the BRAF gene in human cancer. Nature 417, 949–954 (2002). 114. Zhang, J., Roberts, T. M. & Shivdasani, R. A. Targeting PI3K Signaling as a Therapeutic Approach for Colorectal Cancer. Gastroenterology 141, 50–61 (2011). 115. Engelman, J. A., Luo, J. & Cantley, L. C. The evolution of phosphatidylinositol 3-kinases as regulators of growth and metabolism. Nature Reviews Genetics 7, 606–619 (2006). 116. Zhao, L. & Vogt, P. K. Helical domain and kinase domain mutations in p110α of phosphatidylinositol 3-kinase induce gain of function by different mechanisms. Proc Natl Acad Sci U S A 105, 2652–2657 (2008). 117. Rychahou, P. G. et al. Akt2 overexpression plays a critical role in the establishment of colorectal cancer metastasis. Proc Natl Acad Sci U S A 105, 20315–20320 (2008). 118. Roy, H. K. et al. AKT proto-oncogene overexpression is an early event during sporadic colon carcinogenesis. Carcinogenesis 23, 201–205 (2002). 119. Molinari, F. & Frattini, M. Functions and Regulation of the PTEN Gene in Colorectal Cancer. Front Oncol 3, (2014). 120. Perochon, J., Carroll, L. R. & Cordero, J. B. Wnt Signalling in Intestinal Stem Cells: Lessons from Mice and Flies. Genes (Basel) 9, (2018). 121. MacDonald, B. T., Tamai, K. & He, X. Wnt/β-catenin signaling: components, mechanisms, and diseases. Dev Cell 17, 9–26 (2009). 122. Kim, S. & Jeong, S. Mutation Hotspots in the β-Catenin Gene: Lessons from the Human Cancer Genome Databases. Mol Cells 42, 8–16 (2019). 
123. Mazzoni, S. M. & Fearon, E. R. AXIN1 and AXIN2 Variants in Gastrointestinal Cancers. Cancer Lett 355, 1–8 (2014).

124. Serman, L., Martic, T. N., Serman, A. & Vranic, S. Epigenetic alterations of the Wnt signaling pathway in cancer: a mini review. Bosn J Basic Med Sci 14, 191–194 (2014). 125. Mii, Y. & Taira, M. Secreted Wnt “inhibitors” are not just inhibitors: Regulation of extracellular Wnt by secreted Frizzled-related proteins. Development, Growth & Differentiation 53, 911–923 (2011). 126. Suzuki, H. et al. Epigenetic inactivation of SFRP genes allows constitutive WNT signaling in colorectal cancer. Nat Genet 36, 417–422 (2004). 127. Yu, J. et al. Association between SFRP promoter hypermethylation and different types of cancer: A systematic review and meta-analysis. Oncol Lett 18, 3481–3492 (2019). 128. Weisenberger, D. J., Liang, G. & Lenz, H.-J. DNA methylation aberrancies delineate clinically distinct subsets of colorectal cancer and provide novel targets for epigenetic therapies. Oncogene 37, 566–577 (2018). 129. Qin, J., Wen, B., Liang, Y., Yu, W. & Li, H. Histone Modifications and their Role in Colorectal Cancer (Review). Pathol. Oncol. Res. (2019). doi:10.1007/s12253-019-00663-8. 130. Weisenberger, D. J. et al. CpG island methylator phenotype underlies sporadic microsatellite instability and is tightly associated with BRAF mutation in colorectal cancer. Nat. Genet. 38, 787–793 (2006). 131. Deaton, A. M. & Bird, A. CpG islands and the regulation of transcription. Genes Dev. 25, 1010–1022 (2011). 132. Baylin, S. B. DNA methylation and gene silencing in cancer. Nature Clinical Practice Oncology 2, S4–S11 (2005). 133. Herranz, N. & Gil, J. Mechanisms and functions of cellular senescence. J Clin Invest 128, 1238–1246 (2018). 134. Prieur, A., Besnard, E., Babled, A. & Lemaitre, J.-M. p53 and p16 INK4A independent induction of senescence by chromatin-dependent alteration of S-phase progression. Nature Communications 2, 473 (2011). 135. Trzeciak, L., Hennig, E., Kolodziejski, J., Nowacki, M. & Ostrowski, J. Mutations, methylation and expression of CDKN2a/p16 gene in colorectal cancer and normal colonic mucosa. Cancer Letters 163, 17–23 (2001).

136. Schneider-Stock, R. et al. Differences in loss of p16INK4 protein expression by promoter methylation between left- and right-sided primary colorectal carcinomas. Int. J. Oncol. 23, 1009–1013 (2003). 137. Esteller, M., Levine, R., Baylin, S. B., Ellenson, L. H. & Herman, J. G. MLH1 promoter hypermethylation is associated with the microsatellite instability phenotype in sporadic endometrial carcinomas. Oncogene 17, 2413–2417 (1998). 138. Lee, M. et al. Prognostic value of p16INK4a and p14ARF gene hypermethylation in human colon cancer. Pathology - Research and Practice 202, 415–424 (2006). 139. Shima, K. et al. Prognostic significance of CDKN2A (p16) promoter methylation and loss of expression in 902 colorectal cancers: Cohort study and literature review. International Journal of Cancer 128, 1080–1094 (2011). 140. Ogino, S. et al. CpG island methylator phenotype (CIMP) of colorectal cancer is best characterised by quantitative DNA methylation analysis and prospective cohort studies. Gut 55, 1000–1006 (2006). 141. Ogino, S., Kawasaki, T., Kirkner, G. J., Loda, M. & Fuchs, C. S. CpG Island Methylator Phenotype-Low (CIMP-Low) in Colorectal Cancer: Possible Associations with Male Sex and KRAS Mutations. J Mol Diagn 8, 582–588 (2006). 142. Jia, M., Gao, X., Zhang, Y., Hoffmeister, M. & Brenner, H. Different definitions of CpG island methylator phenotype and outcomes of colorectal cancer: a systematic review. Clin Epigenet 8, 25 (2016). 143. Ogino, S. et al. CpG island methylator phenotype, microsatellite instability, BRAF mutation and clinical outcome in colon cancer. Gut 58, 90–96 (2009). 144. Bae, J. M., Kim, J. H., Cho, N.-Y., Kim, T.-Y. & Kang, G. H. Prognostic implication of the CpG island methylator phenotype in colorectal cancers depends on tumour location. British Journal of Cancer 109, 1004–1012 (2013). 145. Dahlin, A. M. et al. The Role of the CpG Island Methylator Phenotype in Colorectal Cancer Prognosis Depends on Microsatellite Instability Screening Status. Clin Cancer Res 16, 1845–1855 (2010). 146. Hinoue, T. et al. Genome-scale analysis of aberrant DNA methylation in colorectal cancer. Genome Res. 22, 271–282 (2012).

147. Khorasanizadeh, S. The Nucleosome: From Genomic Organization to Genomic Regulation. Cell 116, 259–272 (2004). 148. Davey, C. A., Sargent, D. F., Luger, K., Maeder, A. W. & Richmond, T. J. Solvent mediated interactions in the structure of the nucleosome core particle at 1.9 Å resolution. J. Mol. Biol. 319, 1097–1113 (2002). 149. Luger, K., Mäder, A. W., Richmond, R. K., Sargent, D. F. & Richmond, T. J. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389, 251–260 (1997). 150. Nacev, B. A. et al. The expanding landscape of ‘oncohistone’ mutations in human cancers. Nature 567, 473–478 (2019). 151. Wan, Y. C. E., Liu, J. & Chan, K. M. Histone H3 Mutations in Cancer. Curr Pharmacol Rep 4, 292–300 (2018). 152. Schwartzentruber, J. et al. Driver mutations in histone H3.3 and chromatin remodelling genes in paediatric glioblastoma. Nature 482, 226–231 (2012). 153. Bannister, A. J. & Kouzarides, T. Regulation of chromatin by histone modifications. Cell Research 21, 381–395 (2011). 154. Allshire, R. C. & Madhani, H. D. Ten principles of heterochromatin formation and function. Nature Reviews Molecular Cell Biology 19, 229–244 (2018). 155. Erler, J. et al. The Role of Histone Tails in the Nucleosome: A Computational Study. Biophys J 107, 2911–2922 (2014). 156. Hyun, K., Jeon, J., Park, K. & Kim, J. Writing, erasing and reading histone lysine methylations. Experimental & Molecular Medicine 49, e324 (2017). 157. Turner, B. M. Histone acetylation as an epigenetic determinant of long-term transcriptional competence. Cell. Mol. Life Sci. 54, 21–31 (1998). 158. Ardehali, M. B. et al. Drosophila Set1 is the major histone H3 lysine 4 trimethyltransferase with role in transcription. The EMBO Journal 30, 2817–2828 (2011). 159. Choi, Y. J. et al. Frameshift mutation of a histone methylation-related gene SETD1B and its regional heterogeneity in gastric and colorectal cancers with high microsatellite instability. Hum. Pathol. 45, 1674–1681 (2014). 160. Je, E. M., Lee, S. H., Yoo, N. J. & Lee, S. H. Mutational and expressional analysis of MLL genes in gastric and colorectal cancers with microsatellite instability. Neoplasma 60, 188–195 (2012).

161. Rampias, T. et al. The lysine-specific methyltransferase KMT2C/MLL3 regulates DNA repair components in cancer. EMBO Rep 20, (2019). 162. Guinney, J. et al. The consensus molecular subtypes of colorectal cancer. Nature Medicine 21, 1350–1356 (2015). 163. Becht, E. et al. Immune and Stromal Classification of Colorectal Cancer Is Associated with Molecular Subtypes and Relevant for Precision Immunotherapy. Clin. Cancer Res. 22, 4057–4066 (2016). 164. van den Bulk, J. et al. Neoantigen-specific immunity in low mutation burden colorectal cancers of the consensus molecular subtype 4. Genome Medicine 11, 87 (2019). 165. Berg, K. C. G. et al. Gene expression profiles of CMS2-epithelial/canonical colorectal cancers are largely driven by DNA copy number gains. Oncogene 38, 6109–6122 (2019). 166. Berg, K. C. G. et al. Multi-omics of 34 colorectal cancer cell lines - a resource for biomedical studies. Molecular Cancer 16, 116 (2017). 167. Gillet, J.-P., Varma, S. & Gottesman, M. M. The clinical relevance of cancer cell lines. J. Natl. Cancer Inst. 105, 452–458 (2013). 168. Gillet, J.-P. et al. Redefining the relevance of established cancer cell lines to the study of mechanisms of clinical anti-cancer drug resistance. Proc Natl Acad Sci U S A 108, 18708–18713 (2011). 169. Drost, J. & Clevers, H. Organoids in cancer research. Nat. Rev. Cancer 18, 407–418 (2018). 170. van de Wetering, M. et al. Prospective derivation of a living organoid biobank of colorectal cancer patients. Cell 161, 933–945 (2015). 171. Sato, T. et al. Single Lgr5 stem cells build crypt-villus structures in vitro without a mesenchymal niche. Nature 459, 262–265 (2009). 172. Drost, J. et al. Sequential cancer mutations in cultured human intestinal stem cells. Nature 521, 43–47 (2015). 173. Lannagan, T. R. M. et al. Genetic editing of colonic organoids provides a molecularly distinct and orthotopic preclinical model of serrated carcinogenesis. Gut 68, 684–692 (2019). 174. Olive, K. P. et al. Mutant p53 gain of function in two mouse models of Li-Fraumeni syndrome. Cell 119, 847–860 (2004).

175. Schwitalla, S. et al. Loss of p53 in Enterocytes Generates an Inflammatory Microenvironment Enabling Invasion and Lymph Node Metastasis of Carcinogen-Induced Colorectal Tumors. Cancer Cell 23, 93–106 (2013). 176. Washington, K. & Zemper, A. E. D. Apc-related models of intestinal neoplasia: a brief review for pathologists. Surgical and Experimental Pathology 2, 11 (2019). 177. Colnot, S. et al. Colorectal cancers in a new mouse model of familial adenomatous polyposis: influence of genetic and environmental modifiers. Laboratory Investigation 84, 1619–1630 (2004). 178. Robanus-Maandag, E. C. et al. A new conditional Apc-mutant mouse model for colorectal cancer. Carcinogenesis 31, 946–952 (2010). 179. Moser, A. R., Pitot, H. C. & Dove, W. F. A dominant mutation that predisposes to multiple intestinal neoplasia in the mouse. Science 247, 322–324 (1990). 180. Su, L. K. et al. Multiple intestinal neoplasia caused by a mutation in the murine homolog of the APC gene. Science 256, 668–670 (1992). 181. Halberg, R. B. et al. Long-lived Min Mice Develop Advanced Intestinal Cancers through a Genetically Conservative Pathway. Cancer Res 69, 5768–5775 (2009). 182. Feng, Y. et al. Mutant Kras Promotes Hyperplasia and Alters Differentiation in the Colon Epithelium But Does Not Expand the Presumptive Stem Cell Pool. Gastroenterology 141, 1003–1013.e10 (2011). 183. Haigis, K. M. et al. Differential effects of oncogenic K-Ras and N-Ras on proliferation, differentiation and tumor progression in the colon. Nat Genet 40, 600–608 (2008). 184. Boutin, A. T. et al. Oncogenic Kras drives invasion and maintains metastases in colorectal cancer. Genes Dev 31, 370–382 (2017). 185. Rad, R. et al. A Genetic Progression Model of BrafV600E-Induced Intestinal Tumorigenesis Reveals Targets for Therapeutic Intervention. Cancer Cell 24, 15–29 (2013). 186. Madison, B. B. et al. Cis elements of the villin gene control expression in restricted domains of the vertical (crypt) and horizontal (duodenum, cecum) axes of the intestine. J. Biol. Chem. 277, 33275–33283 (2002). 187. Bond, C. E. et al. Oncogenic BRAF mutation induces DNA methylation changes in a murine model for human serrated colorectal neoplasia. Epigenetics 13, 40–48 (2018).

188. Fennell, L. J. et al. MLH1–93 G/a polymorphism is associated with MLH1 promoter methylation and protein loss in dysplastic sessile serrated adenomas with BRAFV600E mutation. BMC Cancer 18, (2018). 189. Teschendorff, A. E., West, J. & Beck, S. Age-associated epigenetic drift: implications, and a case of epigenetic thrift? Hum Mol Genet 22, R7–R15 (2013). 190. Issa, J.-P. Aging and epigenetic drift: a vicious cycle. J Clin Invest 124, 24–29 (2014). 191. Sen, P., Shah, P. P., Nativio, R. & Berger, S. L. Epigenetic mechanisms regulating longevity and aging. Cell 166, 822–839 (2016). 192. Issa, J.-P. J. et al. Methylation of the oestrogen receptor CpG island links ageing and neoplasia in human colon. Nature Genetics 7, 536–540 (1994). 193. Worthley, D. L. et al. DNA methylation within the normal colorectal mucosa is associated with pathway-specific predisposition to cancer. Oncogene 29, 1653–1662 (2010). 194. Waki, T., Tamura, G., Sato, M. & Motoyama, T. Age-related methylation of tumor suppressor and tumor-related genes: an analysis of autopsy samples. Oncogene 22, 4128–4133 (2003). 195. Maegawa, S. et al. Widespread and tissue specific age-related DNA methylation changes in mice. Genome Res 20, 332–340 (2010). 196. Teschendorff, A. E. et al. Age-dependent DNA methylation of genes that are suppressed in stem cells is a hallmark of cancer. Genome Res 20, 440–446 (2010). 197. Christensen, B. C. et al. Aging and environmental exposures alter tissue-specific DNA methylation dependent upon CpG island context. PLoS Genet. 5, e1000602 (2009). 198. Horvath, S. DNA methylation age of human tissues and cell types. Genome Biol 14, R115 (2013). 199. Perna, L. et al. Epigenetic age acceleration predicts cancer, cardiovascular, and all-cause mortality in a German case cohort. Clin Epigenet 8, 64 (2016). 200. Zheng, Y. et al. Blood Epigenetic Age may Predict Cancer Incidence and Mortality. EBioMedicine 5, 68–73 (2016). 201. Roetker, N. S., Pankow, J. S., Bressler, J., Morrison, A. C. & Boerwinkle, E. Prospective Study of Epigenetic Age Acceleration and Incidence of Cardiovascular Disease Outcomes in the ARIC Study (Atherosclerosis Risk in Communities). Circulation: Genomic and Precision Medicine 11, e001937 (2018).

202. Levine, M. E., Lu, A. T., Bennett, D. A. & Horvath, S. Epigenetic age of the pre-frontal cortex is associated with neuritic plaques, amyloid load, and Alzheimer’s disease related cognitive functioning. Aging (Albany NY) 7, 1198–1211 (2015). 203. Bibbins-Domingo, K. Aspirin Use for the Primary Prevention of Cardiovascular Disease and Colorectal Cancer: U.S. Preventive Services Task Force Recommendation Statement. Annals of Internal Medicine (2016). 204. Dubé, C. et al. The Use of Aspirin for Primary Prevention of Colorectal Cancer: A Systematic Review Prepared for the U.S. Preventive Services Task Force. Annals of Internal Medicine (2007). 205. Noreen, F. et al. Modulation of Age- and Cancer-Associated DNA Methylation Change in the Healthy Colon by Aspirin and Lifestyle. J Natl Cancer Inst 106, (2014). 206. Levine, M. E. et al. Menopause accelerates biological aging. PNAS 113, 9327–9332 (2016). 207. Wu, X. et al. Effect of tobacco smoking on the epigenetic age of human respiratory organs. Clinical Epigenetics 11, 183 (2019). 208. Yang, Y. et al. Smoking-Related DNA Methylation is Associated with DNA Methylation Phenotypic Age Acceleration: The Veterans Affairs Normative Aging Study. Int J Environ Res Public Health 16, (2019). 209. Lee, J. Y. et al. Association Between Cigarette Smoking and Alcohol Consumption and Sessile Serrated Polyps in Subjects 30 to 49 Years Old. Clinical Gastroenterology and Hepatology 17, 1551-1560.e1 (2019).

Chapter One: Integrative Genome-Scale DNA Methylation Analysis of a Large and Unselected cohort reveals five distinct subtypes of Colorectal Adenocarcinomas

As published in Cellular and Molecular Gastroenterology and Hepatology:

Fennell L.J, Dumenil T, Wockner L, Hartel G, Nones K, Bond C.E, Borowsky J, Liu C, McKeone D, Bowdler L, Montgomery G, Klein K, Hoffmann I, Patch AM, Kazakoff S, Pearson J, Waddell N, Wirapati P, Lochhead P, Imamura Y, Ogino S, Shao R, Tejpar S, Leggett B, Whitehall V. Integrative genome-scale DNA Methylation analysis of a large and unselected cohort reveals five distinct subtypes of Colorectal Adenocarcinomas. Cellular and Molecular Gastroenterology and Hepatology, 2019, In Press

Supplementary materials are available from: https://doi.org/10.5281/zenodo.4560888

Relevance to Thesis Aims:

Hypothesis: The DNA methylation profile of colorectal cancers will be heterogeneous, and high throughput DNA methylation profiling will uncover novel subtypes of colorectal cancers.

Aim: To evaluate the DNA methylation profile of a large series of unselected colorectal cancers using high throughput microarray technology.

Here we collected a large series of unselected, consecutively sampled colorectal cancers from a tertiary hospital in Brisbane, Australia. The unselected nature of this cohort allowed us to perform a comprehensive and unbiased assessment of the DNA methylation landscape of colorectal cancers, and to identify a number of novel associations between clinical and molecular features and DNA methylation and transcriptional profiles. We had hypothesised that such a large and unbiased experiment would identify novel subtypes of colorectal cancer, and indeed this proved to be the case. We identified five subtypes of colorectal cancer, including two subtypes of CIMP-High cancers; this bifurcation of CIMP-High is further supported by clinical and molecular associations. Further, this study showed a stepwise increase in patient age concordant with DNA methylation subtype. This, together with then-unpublished data (now Liu et al 2019, Gastroenterology), prompted the second hypothesis described in this thesis: that age-associated DNA methylation elevates the risk of BRAF mutation-induced neoplastic progression.

Author Contributions

Author | Bioinformatic Analysis | Statistical Analysis | Experimental Work | Conceptualisation | Drafting the Manuscript | Editing the Manuscript
Fennell L.J | 70.0% | 75.0% | 20.0% | 33.0% | 90.0% | 16.0%
Dumenil T | 2.5% | - | 30.0% | - | - | 2.0%
Wockner L | 2.5% | 2.0% | - | - | - | 2.0%
Hartel G | - | 12.0% | - | - | - | 2.0%
Nones K | 2.5% | - | - | - | - | 2.0%
Bond C.E | - | - | 6.0% | - | - | 2.0%
Borowsky J | - | - | 7.0% | - | - | 2.0%
Liu C | - | - | 7.0% | - | - | 2.0%
McKeone D | - | - | 20.0% | - | - | 2.0%
Bowdler L | 2.5% | 2.0% | - | - | - | 2.0%
Montgomery G | - | 2.5% | - | - | - | 2.0%
Klein K | 2.5% | 2.0% | - | - | - | 2.0%
Hoffmann I | 2.5% | 2.0% | - | - | - | 2.0%
Patch AM | 2.5% | - | - | - | - | 2.0%
Kazakoff S | 2.5% | - | - | - | - | 2.0%
Pearson J | 2.5% | - | - | - | - | 2.0%
Waddell N | 2.5% | - | - | - | - | 2.0%
Wirapati P | 2.5% | - | - | - | - | 2.0%
Lochhead P | - | - | 4.0% | - | - | 2.0%
Imamura Y | - | - | 3.0% | - | - | 2.0%
Ogino S | - | - | 3.0% | - | - | 2.0%
Shao R | - | 2.5% | - | - | - | 2.0%
Tejpar S | 2.5% | - | - | - | - | 2.0%
Leggett B | - | - | - | 33.0% | - | 20.0%
Whitehall V | - | - | - | 33.0% | 10.0% | 20.0%

Synopsis

Here we have identified five molecularly and clinically relevant subtypes of the CpG island methylator phenotype in colorectal cancer. For the first time we demonstrate that CIMP-High cancers segregate into distinct subgroups, which display different frequencies of BRAF and KRAS mutation. These CIMP subtypes are associated with important clinical and molecular features, are correlated with mutations in different epigenetic regulator genes and show a marked relationship with patient age.

Abstract

BACKGROUND AND AIMS: Colorectal cancer is an epigenetically heterogeneous disease; however, the extent and spectrum of the CpG Island Methylator Phenotype (CIMP) are not clear.

METHODS: Genome-scale methylation and transcript expression were measured using Illumina HM450 DNA methylation and HT12 V3 expression microarrays in 216 unselected colorectal cancers, and findings were validated using TCGA 450K and RNA-Seq data. Mutations in epigenetic regulators were assessed using CIMP-subtyped Cancer Genome Atlas exomes.
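For readers less familiar with HM450 data, per-CpG methylation is usually summarised as a beta value: the methylated signal divided by the total signal plus a small offset. The sketch below assumes the offset of 100 used in Illumina's GenomeStudio convention and adds the base-2 logit (M-value) transform often used for statistics. The function names are hypothetical, and this is not necessarily the preprocessing pipeline used in this study.

```python
import math


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Beta value for one CpG probe: M / (M + U + offset).

    Negative background-corrected intensities are clipped to zero, so the
    result always lies in [0, 1). The offset of 100 is an assumed default.
    """
    m = max(meth, 0.0)
    u = max(unmeth, 0.0)
    return m / (m + u + offset)


def m_value(beta: float, eps: float = 1e-6) -> float:
    """Base-2 logit of a beta value; eps keeps betas of 0 or 1 finite."""
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


if __name__ == "__main__":
    b = beta_value(meth=900.0, unmeth=0.0)  # 900 / (900 + 0 + 100)
    print(b, m_value(b))
```

The offset mainly stabilises probes with low total signal; for bright probes it has little effect on the beta value.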

RESULTS: CIMP-High cancers dichotomised into CIMP-H1 and CIMP-H2 based on methylation profile. KRAS mutation was significantly associated with CIMP-H2 cancers, but not with CIMP-H1 cancers. Congruent with increasing methylation, there was a stepwise increase in patient age from 62 years in the CIMP-Negative subgroup to 75 years in the CIMP-H1 subgroup (P<0.0001). CIMP-H1 cancers were predominantly consensus molecular subtype 1 (CMS1; 70%), whilst CMS3 was over-represented in the CIMP-H2 subgroup (55%). PRC2-marked loci were subject to significant gene body methylation in CIMP cancers (P<1.6×10⁻⁷⁸). We identified oncogenes susceptible to gene body methylation and Wnt pathway antagonists resistant to gene body methylation. CIMP cluster-specific mutations were observed in chromatin remodeling genes, such as those of the SWI/SNF and CHD gene families.

CONCLUSION: There are five clinically and molecularly distinct subgroups of colorectal cancer. We show a striking association between CIMP and age, gender and tumor location, and identify a previously unrecognized role for gene body methylation in the progression of serrated neoplasia. These data support our recent findings that CIMP is uncommon in young patients and that BRAF-mutant polyps in young patients may have limited potential for malignant progression.

Background

Colorectal cancer is a heterogeneous disease characterized by distinct genetic and epigenetic changes that drive proliferative activity and inhibit apoptosis. The conventional pathway to colorectal cancer is distinguished by APC mutation and chromosomal instability, and accounts for approximately 75% of sporadic cancers 1,2. The remaining colorectal cancers arise from serrated polyps and have activating mutations in the BRAF proto-oncogene, frequent microsatellite instability (MSI), and the CpG Island Methylator Phenotype (CIMP) 2,3.

The development of CIMP is critical in the progression of serrated neoplasia 3. It is well established that CIMP can result in the silencing of key genes important for tumor progression, including the tumor suppressor gene CDKN2A and the DNA mismatch repair gene MLH1 4,5. Silencing of MLH1 by promoter hypermethylation impairs mismatch repair function, which leads to microsatellite instability 5. CIMP can be detected using a standardized marker panel to stratify tumors as CIMP-high, CIMP-low or CIMP-negative 3. Activation of the mitogen-activated protein kinase (MAPK) signaling pathway due to BRAF mutation is strongly associated with CIMP-high. CIMP-high cancers frequently arise proximal to the splenic flexure and are more common in elderly female patients 2,3, whilst CIMP-low cancers have been associated with KRAS mutation 6,7.
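As an illustration of panel-based stratification, the sketch below counts methylated markers from the five-gene Weisenberger panel (CACNA1G, IGF2, NEUROG1, RUNX3, SOCS1) that the Results use to determine CIMP status. The cut-off of at least 3 of 5 methylated markers for CIMP-high follows Weisenberger et al.; treating 1 to 2 methylated markers as CIMP-low and 0 as CIMP-negative is an assumption made for illustration. How each marker is called methylated (for example a PMR or beta-value threshold) is left to the caller, and `call_cimp` is a hypothetical helper.

```python
from typing import Mapping

# Five-marker CIMP panel of Weisenberger et al. (2006).
WEISENBERGER_PANEL = ("CACNA1G", "IGF2", "NEUROG1", "RUNX3", "SOCS1")


def call_cimp(methylated: Mapping[str, bool], high_min: int = 3) -> str:
    """Classify a tumor from per-marker methylation calls.

    Assumed rule (illustrative): at least `high_min` methylated markers is
    CIMP-high, 1 to high_min - 1 is CIMP-low, and none is CIMP-negative.
    """
    missing = [gene for gene in WEISENBERGER_PANEL if gene not in methylated]
    if missing:
        raise ValueError(f"missing marker calls: {missing}")
    n_methylated = sum(bool(methylated[gene]) for gene in WEISENBERGER_PANEL)
    if n_methylated >= high_min:
        return "CIMP-high"
    if n_methylated >= 1:
        return "CIMP-low"
    return "CIMP-negative"


if __name__ == "__main__":
    calls = {"CACNA1G": True, "IGF2": False, "NEUROG1": True,
             "RUNX3": True, "SOCS1": False}
    print(call_cimp(calls))  # three of five markers methylated
```

Raising an error for a missing marker, rather than treating it as unmethylated, avoids silently down-grading tumors with failed assays.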

More recently, consensus molecular subtyping (CMS) was proposed for classifying colorectal cancers based on transcriptional signatures. Guinney and colleagues identified four major molecular subtypes (CMS1 - CMS4) 8. CMS1, or MSI immune subtype, is characterized by MSI, BRAF mutation and enhanced immunogenicity. CMS2 can be distinguished by chromosomal instability and WNT pathway perturbations. CMS3, or metabolic subtype, is characterized by KRAS mutation, CIMP-low status and infrequent copy number alterations. CMS4, or mesenchymal subtype, shows high copy number aberrations, activation of the transforming growth factor-β signaling cascade, stromal infiltration and the worst overall survival. The relationship between CIMP and CMS subtypes is currently unclear.

Methylation is not a phenomenon exclusive to neoplasia. Changes in the epigenome also occur with age and in response to environmental factors 9,10. We have previously shown that the promoter regions of certain genes become increasingly methylated in normal colonic mucosa with age 9. CIMP-high cancers are identified primarily in older patients 2; hence, age-related hypermethylation might prime the intestinal epigenome for serrated neoplasia-type colorectal cancers. Methylation is also critical in the progression of serrated pathway precursors to invasive cancer, primarily through methylation of MLH1 at the transition to dysplasia 11,12. Thus, the natural history of the cancer within the colorectum may dictate the methylation profile of the cancer once malignancy develops.

DNA methylation alone can be insufficient to induce transcriptional repression 13. Gene repression is also associated with repressive histone marks such as H3K27me3 14, which is deposited by Polycomb repressive complex 2 (PRC2). Histone tail modifications are regulated by three classes of proteins: writers, which add a modification; erasers, which remove histone marks; and readers, which recognize and bind specific modifications. Mutations in genes encoding these epigenetic regulators have been shown to occur frequently in cancer 15. Whilst DNA methylation is classically associated with gene silencing, the relationship between DNA methylation and histone modifications has not been fully elucidated, nor has the role of somatic mutations in the enzymes that catalyze these epigenetic processes been comprehensively examined.

In this study, we define the extent and spectrum of DNA methylation changes occurring in colorectal cancers and relate this to key clinical and molecular events characteristic of defined pathways of tumor progression. We investigate the role of DNA methylation in the modulation of gene transcription, and assess mutation of genes encoding epigenetic regulatory proteins.


Results

Clinical and molecular features of the consecutive cohort in comparison to The Cancer Genome Atlas cohort

Genome-wide DNA methylation levels were assessed in 216 unselected colorectal cancers (Table 1). The mean age of patients at surgery was 67.9 years. Twenty-nine of 216 (13.4%) cancers had a BRAF V600E mutation, and 75 of 216 (34.7%) were mutated at KRAS codons 12 or 13. BRAF and KRAS mutations were mutually exclusive. Patients with BRAF-mutant cancers were significantly older than patients with BRAF wild-type cancers (mean age 74.9 vs 66.9 years, P=0.01). TP53 was mutated in 77/185 (41.6%) cancers. MSI was significantly associated with BRAF mutation (18/29 BRAF-mutant vs 9/187 BRAF wild-type cancers, P<0.0001). Using the Weisenberger panel to determine CIMP status 3, 24/216 (11.1%) cancers were CIMP-high, 44/216 (20.4%) were CIMP-low and 148/216 (68.5%) were CIMP-negative. CIMP-high was significantly associated with BRAF mutation (19/29 BRAF-mutant vs 5/186 BRAF wild-type cancers, P<0.0001), and CIMP-low was significantly associated with KRAS mutation (26/75, 34.7% of KRAS-mutant vs 18/141, 12.8% of KRAS wild-type cancers, P<0.001).
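The 2×2 comparisons in this paragraph (for example, MSI in BRAF-mutant versus BRAF wild-type cancers) can be reproduced from the reported counts. The chapter does not name the test used for these particular comparisons, so the sketch below assumes a two-sided Fisher's exact test; `fisher_exact_2x2` is a hypothetical pure-Python helper, not the analysis code used in this study.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, two-sided P value). The two-sided P value sums
    the probabilities of all tables with the same margins that are no more
    likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Rows: BRAF mutant / BRAF wild-type; columns: MSI / MSS (counts from the text).
odds, p = fisher_exact_2x2(18, 29 - 18, 9, 187 - 9)
print(f"odds ratio = {odds:.1f}, P = {p:.2e}")
```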

We collected 32 matched non-cancerous mucosal samples from a subset of patients in the consecutive cohort. The mean age of these patients was 68.9 years, which was not significantly different from the mean age of the wider cohort (P=0.71).


Table 1: Clinicopathological details of the 216 colorectal adenocarcinomas stratified by methylation-based CIMP cluster, measured on Illumina HM450 arrays using the 5,000 most variable CpG sites that were not hypermethylated in normal mucosal tissue. P values were obtained using ANOVA for continuous variables and χ² tests for categorical variables.

| | Total | CIMP-H1 | CIMP-H2 | CIMP-L1 | CIMP-L2 | CIMP-Neg | P value |
|---|---|---|---|---|---|---|---|
| n | 216 | 23 | 22 | 52 | 66 | 53 | |
| Mean age (years) | 67.9 | 75.2 | 73.4 | 70.1 | 66.8 | 61.9 | <0.0001 |
| Gender: male | 100 (46.3%) | 5 (21.7%) | 9 (40.9%) | 24 (46.2%) | 35 (53.0%) | 27 (50.9%) | 0.11 |
| Gender: female | 116 (53.7%) | 18 (78.3%) | 13 (59.1%) | 28 (53.8%) | 31 (47.0%) | 26 (49.1%) | |
| Site: proximal | 75/213 (35.2%) | 19 (82.6%) | 13 (59.1%) | 20 (39.2%) | 15 (23.4%) | 8 (15.1%) | <0.0001 |
| Site: distal | 96/213 (45.1%) | 4 (17.4%) | 6 (27.3%) | 21 (41.2%) | 32 (50.0%) | 33 (62.3%) | |
| Site: rectal | 42/213 (19.7%) | 0 | 3 (13.6%) | 10 (19.6%) | 17 (26.6%) | 12 (22.6%) | |
| CIMP status: CIMP-High | 24 (11.1%) | 16 (69.6%) | 3 (13.6%) | 3 (5.8%) | 2 (3.0%) | 0 | <0.0001 |
| CIMP status: CIMP-Low | 44 (20.4%) | 6 (26.1%) | 13 (59.1%) | 16 (30.8%) | 8 (12.1%) | 1 (1.9%) | |
| CIMP status: CIMP-Neg | 148 (68.5%) | 1 (4.3%) | 6 (27.3%) | 33 (63.5%) | 56 (84.8%) | 52 (98.1%) | |
| KRAS mutant | 75 (34.7%) | 4 (17.4%) | 12 (54.5%) | 34 (65.4%) | 19 (28.8%) | 7 (13.2%) | <0.0001 |
| BRAF mutant | 29 (13.4%) | 17 (73.9%) | 2 (9.1%) | 6 (11.5%) | 4 (6.0%) | 0 (0%) | <0.0001 |
| TP53 mutant | 77/185 (41.6%) | 12/21 (57.1%) | 6/21 (28.6%) | 18/45 (40.0%) | 22/54 (40.7%) | 19/44 (43.2%) | 0.45 |
| MSI | 26 (12.0%) | 11 (47.8%) | 1 (4.8%) | 8 (15.4%) | 6 (9.1%) | 0 | <0.0001 |
| MSS | 190 (88.0%) | 12 (52.2%) | 21 (95.2%) | 44 (84.6%) | 60 (90.9%) | 53 (100%) | |
| CMS1 | 35 (16.2%) | 16 (69.6%) | 4 (18.2%) | 5 (9.6%) | 9 (13.6%) | 1 (1.9%) | <0.0001 |
| CMS2 | 68 (31.5%) | 0 | 4 (18.2%) | 10 (19.2%) | 30 (45.5%) | 24 (45.3%) | |
| CMS3 | 53 (24.5%) | 3 (13.0%) | 12 (54.5%) | 21 (40.4%) | 10 (15.2%) | 7 (13.2%) | |
| CMS4 | 60 (27.8%) | 4 (17.4%) | 2 (9.1%) | 16 (30.8%) | 17 (25.8%) | 21 (39.6%) | |
| Stage I | 30/111 | 0/15 | 5/11 (45.5%) | 8/30 (26.7%) | 13/35 (37.1%) | 4/20 (20.0%) | 0.15 |
| Stage II | 33/111 | 7/15 (46.7%) | 1/11 (9.1%) | 10/30 (33.3%) | 10/35 (28.6%) | 5/20 (25.0%) | |
| Stage III | 34/111 | 6/15 (40.0%) | 4/11 (36.4%) | 7/30 (23.3%) | 11/35 (31.4%) | 6/20 (30.0%) | |
| Stage IV | 14/111 | 2/15 (13.3%) | 1/11 (9.1%) | 5/30 (16.7%) | 1/35 (2.9%) | 5/20 (25.0%) | |
| LINE1 | 70.3 | 68.75 | 68.96 | 72.05 | 70.45 | 69.67 | 0.38 |
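Table 1 reports χ² P values for categorical features. As an illustrative sketch (not the study's analysis code), the Pearson χ² test of independence for tumor site by methylation cluster can be computed from the counts in the table. To stay dependency-free, the hypothetical `pearson_chi2` helper uses the closed-form χ² survival function, which exists only for even degrees of freedom; a general implementation would use the regularized incomplete gamma function instead.

```python
from math import exp, factorial


def pearson_chi2(table):
    """Pearson chi-squared test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, P value). The P value uses the
    closed-form survival function for even df:
        P(X > x) = exp(-x/2) * sum_{i < df/2} (x/2)**i / i!
    Odd df raise ValueError in this sketch.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df % 2:
        raise ValueError("closed-form P value only implemented for even df")
    half = stat / 2
    p = exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))
    return stat, df, p


# Tumor site by methylation cluster (Table 1); columns CIMP-H1, H2, L1, L2, Neg.
site_by_cluster = [
    [19, 13, 20, 15, 8],   # proximal
    [4, 6, 21, 32, 33],    # distal
    [0, 3, 10, 17, 12],    # rectal
]
stat, df, p = pearson_chi2(site_by_cluster)
print(f"chi2 = {stat:.1f}, df = {df}, P = {p:.1e}")
```

With three sites and five clusters the test has 8 degrees of freedom, and the resulting P value falls well below 0.0001, in line with the value reported for site in Table 1.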


Methylation-based clustering reveals five subtypes of colorectal cancer with distinct clinical and molecular features

We examined the extent and spectrum of DNA methylation changes in these 216 colorectal cancers using Illumina HumanMethylation450 BeadChip arrays. Recursively partitioned mixture model (RPMM) clustering on the 5,000 most variable probes identified five clusters (Figure 1): two clusters with high levels of methylation, which we designated CIMP-H1 and CIMP-H2; two clusters with intermediate levels of methylation, CIMP-L1 and CIMP-L2; and a single cluster with low levels of methylation, CIMP-Neg. Owing to the composition of the 450K array and our criteria for probe exclusion (see Methods), >98% of these 5,000 probes were hypermethylated in cancer versus normal mucosa in each of CIMP-H1 to CIMP-L2, and in CIMP-Neg 73.38% of them were more methylated in tumor than in normal mucosa. Only 336 of these probes were included on the earlier HumanMethylation27 array, highlighting the utility of the additional probes on the 450K array. Mean age increased significantly and stepwise across clusters, concordant with increasing genomic methylation (CIMP-Neg: 61.9 years, CIMP-L2: 66.8 years, CIMP-L1: 70.1 years, CIMP-H2: 73.4 years, CIMP-H1: 75.2 years, P<0.0001) (Table 1).
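The probe selection step that precedes clustering can be sketched as follows, assuming beta values are held in plain dictionaries keyed by probe ID. The exact rule for excluding probes already methylated in normal mucosa is given in the Methods and is not reproduced here; the `normal_max` cut-off, the function name and the probe IDs are hypothetical placeholders. The RPMM clustering itself (an R package) is not shown.

```python
from statistics import mean, pvariance


def select_variable_probes(tumor_betas, normal_betas, n_top=5000, normal_max=0.2):
    """Return the n_top most variable probes across tumors, skipping probes
    that are already methylated in normal mucosa.

    tumor_betas / normal_betas: dict mapping probe ID -> list of beta values.
    normal_max: hypothetical stand-in for the Methods' exclusion rule.
    """
    eligible = [
        probe for probe in tumor_betas
        if mean(normal_betas[probe]) < normal_max
    ]
    # Rank eligible probes by the variance of their beta values across tumors.
    eligible.sort(key=lambda probe: pvariance(tumor_betas[probe]), reverse=True)
    return eligible[:n_top]


# Toy data with hypothetical probe IDs.
tumors = {
    "cg01": [0.05, 0.90, 0.10, 0.85],  # variable, unmethylated in normal
    "cg02": [0.40, 0.45, 0.42, 0.44],  # stable across tumors
    "cg03": [0.10, 0.95, 0.05, 0.90],  # variable but methylated in normal
    "cg04": [0.10, 0.60, 0.20, 0.50],
}
normals = {"cg01": [0.05, 0.08], "cg02": [0.10, 0.12],
           "cg03": [0.70, 0.75], "cg04": [0.06, 0.09]}
print(select_variable_probes(tumors, normals, n_top=2))
```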


Figure 1: Methylation heatmap of 216 unselected colorectal cancers using the 5,000 most variable β values at CpG sites that were not hypermethylated in normal mucosal tissue. Clustering was performed using the RPMM R package and identified five distinct clusters, termed CIMP-H1, CIMP-H2, CIMP-L1, CIMP-L2 and CIMP-Neg. This structure was recapitulated in the TCGA cohort.


The CIMP-H1 subgroup comprised 23/216 (10.6%) of all cancers and was enriched for female patients (18/23, 78.3%, P<0.0001) and for tumors located proximal to the splenic flexure (19/23, 82.6%, P<0.0001). We observed no association between stage at diagnosis and methylation cluster. The CIMP-H1 cluster was strikingly enriched for cancers with features characteristic of serrated neoplasia, including BRAF mutation (17/23, 73.9%, P<0.0001), CIMP-H status determined using the Weisenberger marker panel (16/23, 69.6%, P<0.0001), MSI (11/23, 47.8%) and the CMS1 consensus molecular subtype (16/23, 69.6%, P<0.0001) (Table 1, Figure 1). TP53 was mutated in 12/21 (57.1%) CIMP-H1 cancers.

CIMP-H2 cancers also frequently arose in the proximal colon (13/22, 59.1%). CIMP-H2 cancers were more often KRAS mutant than CIMP-H1 cancers (54.5% vs 17.4%) and were less often TP53 mutant than the rest of the cohort (28.6%). MSI was infrequent in these cancers (4.8%). The frequency of the metabolic CMS3 subtype was higher in CIMP-H2 than in the other CIMP subtypes (54.5%). CIMP-H2 cancers were significantly less likely than CIMP-H1 cancers to be classified as CIMP-high using the Weisenberger MethyLight panel (13.6% vs 69.6%, P<0.001).

CIMP-L1 cancers were significantly enriched for KRAS mutation (65.4%, P<0.0001) and occurred equally in the distal and proximal colon. These cancers were rarely MSI (15.4%) and were often of the CMS3 (40.4%) or CMS4 (30.8%) subtype. KRAS mutation was relatively infrequent in CIMP-L2 cancers (28.8%) compared with CIMP-H2 and CIMP-L1 cancers, and CIMP-L2 cancers were significantly enriched for distal colonic and rectal locations (50.0% and 26.6%, respectively, P<0.0001). The proportion of CMS2 cancers was significantly higher in CIMP-L2 than in CIMP-H1, CIMP-H2 and CIMP-L1 cancers (P<0.001). CIMP-Neg cancers had the highest frequency of distal colonic location (62.3%) and arose in patients with the lowest mean age (61.9 years). We did not identify BRAF mutation in any CIMP-Neg cancer. CMS2 and CMS4 were the most frequent CMS subtypes in CIMP-Neg cancers (45.3% and 39.6%, respectively), and the proportion of CMS4 was higher in CIMP-Neg cancers than in any other subtype (P<0.001). We sequenced hotspots in exons 11 and 15 of BRAF, codon 61 of KRAS and exon 18 of EGFR in CIMP-H1 and CIMP-H2 cancers that were wild-type at BRAF V600E and KRAS codons 12 and 13, but did not identify any mutations in these regions.


Validation of the association between CIMP Subtype and clinical and molecular features in the Cancer Genome Atlas

DNA methylation was previously measured using the HumanMethylation450 array in 392 colorectal cancers from The Cancer Genome Atlas (TCGA) project 16. We observed several differences between the TCGA cohort and the consecutive RBWH cohort. The mean age at diagnosis was significantly lower in the TCGA cohort (64.5 vs 67.9 years, P<0.01), and male patients were slightly overrepresented (199/373, 53.4%). The anatomical distribution of cancers also differed: TCGA cancers were significantly more often proximal than RBWH cancers (47.0% vs 35.2%, P<0.01) and less often located in the distal colon (40.3% vs 45.1%, P<0.01) or rectum (12.7% vs 19.7%, P<0.01).

There were also many similarities between the TCGA and RBWH cohorts. The frequency of BRAF mutation in TCGA (9.4%) was not significantly different from that in the RBWH cohort (13.4%). Likewise, there was no significant difference in the frequency of KRAS mutation (40.1% vs 34.7% for the TCGA and RBWH cohorts, respectively) or in the proportion of microsatellite unstable cancers (15.9% vs 12.0%, P=0.1).

We adopted the same clustering method to identify subtypes in the TCGA cohort. Of the 5,000 most variable probes in the TCGA cohort, 61% were also among the 5,000 most variable probes in the Brisbane cohort. Despite underlying differences in the clinical and molecular features of the two cohorts, unsupervised clustering using the same methods as in the RBWH cohort also identified five distinct CIMP clusters in the TCGA series (Table 2 and Figure 1). There was a similarly striking association between CIMP subtype and patient age (P<0.0001). As in the RBWH cohort, increasing CIMP in the TCGA cohort was associated with proximal colonic location (P<0.0001) and was inversely correlated with distal and rectal locations (P<0.0001 and P<0.05, respectively). KRAS mutations followed a similar bell-shaped distribution across CIMP subtypes, being most common in CIMP-L1 cancers (48/81, 59.3%) and least common in CIMP-H1 (5/19, 26.3%) and CIMP-Neg cancers (21/102, 20.6%). Notably, KRAS mutation was more common in CIMP-H2 than in CIMP-H1 cancers in the TCGA cohort (43.6% vs 26.3%).
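The 61% concordance between cohorts is a set overlap between the two top-5,000 lists. A minimal sketch, assuming per-probe variances of beta values have already been computed for each cohort; the function name and toy probe IDs are hypothetical.

```python
def top_n_overlap(variance_a, variance_b, n_top=5000):
    """Fraction of cohort A's n_top most variable probes that are also among
    cohort B's n_top most variable probes.

    variance_a / variance_b: dict mapping probe ID -> variance of beta values.
    """
    top_a = set(sorted(variance_a, key=variance_a.get, reverse=True)[:n_top])
    top_b = set(sorted(variance_b, key=variance_b.get, reverse=True)[:n_top])
    return len(top_a & top_b) / len(top_a)


# Toy variances for hypothetical probes in two cohorts.
cohort_a = {"probe_1": 0.09, "probe_2": 0.08, "probe_3": 0.01, "probe_4": 0.005}
cohort_b = {"probe_1": 0.07, "probe_3": 0.06, "probe_2": 0.001, "probe_4": 0.002}
print(top_n_overlap(cohort_a, cohort_b, n_top=2))
```

When both lists hold exactly `n_top` probes the fraction is symmetric, so it does not matter which cohort is treated as the reference.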


Table 2: Clinicopathological and molecular details of 374 colorectal adenocarcinomas from The Cancer Genome Atlas stratified by CIMP subtype. P values were obtained using ANOVA for continuous variables and χ² tests for categorical variables, and represent the P value for an association between all subtypes and the feature in question.

| | Total | CIMP-H1 | CIMP-H2 | CIMP-L1 | CIMP-L2 | CIMP-Neg | P value |
|---|---|---|---|---|---|---|---|
| n | 374 | 19 (5.1%) | 39 (10.4%) | 81 (21.7%) | 133 (35.6%) | 102 (27.3%) | |
| Mean age (years) | 64.5 | 72.2 | 67.8 | 66.5 | 64.5 | 57.1 | <0.0001 |
| Gender: male | 199 | 7 (36.8%) | 21 (53.8%) | 47 (58.0%) | 74 (55.6%) | 50 (49.5%) | ns |
| Gender: female | 174 | 12 (63.2%) | 18 (46.2%) | 34 (42.0%) | 59 (44.4%) | 51 (50.5%) | |
| Site: proximal | 167 | 17 (100%) | 28 (84.8%) | 53 (67.9%) | 53 (40.8%) | 16 (16.5%) | <0.0001 |
| Site: distal | 143 | 0 | 4 (12.1%) | 18 (23.1%) | 57 (43.8%) | 64 (65.9%) | |
| Site: rectal | 45 | 0 | 1 (3.0%) | 7 (9.0%) | 20 (15.4%) | 17 (17.5%) | |
| BRAF mutant | 35 | 10 (52.6%) | 19 (48.7%) | 5 (6.2%) | 1 (0.8%) | 0 | <0.0001 |
| KRAS mutant | 150 | 5 (26.3%) | 17 (43.6%) | 48 (59.3%) | 59 (44.4%) | 21 (20.6%) | <0.0001 |
| TP53 mutant | 234 | 10 (52.6%) | 19 (48.7%) | 44 (54.3%) | 85 (63.9%) | 76 (74.5%) | 0.01 |
| MSI | 51 | 10 (52.6%) | 17 (50%) | 11 (16.7%) | 7 (6.2%) | 6 (6.7%) | <0.0001 |
| MSS | 269 | 9 (47.4%) | 17 (50%) | 55 (83.3%) | 105 (93.8%) | 83 (93.3%) | |
| CMS1 | 42 | 10 (58.8%) | 20 (69%) | 9 (14.3%) | 3 (2.8%) | 0 (0%) | <0.0001 |
| CMS2 | 121 | 2 (11.8%) | 1 (3.4%) | 25 (39.7%) | 48 (45.3%) | 45 (51.1%) | |
| CMS3 | 45 | 4 (23.5%) | 4 (13.8%) | 16 (25.4%) | 14 (13.2%) | 7 (8%) | |
| CMS4 | 95 | 1 (5.9%) | 4 (13.8%) | 13 (20.6%) | 41 (38.7%) | 36 (40.9%) | |
| Stage I | 54 | 3 (15%) | 9 (23.7%) | 16 (20.8%) | 11 (8.7%) | 15 (16%) | <0.01 |
| Stage II | 133 | 9 (45%) | 18 (47.4%) | 32 (41.6%) | 50 (39.4%) | 24 (25.5%) | |
| Stage III | 119 | 5 (25%) | 11 (28.9%) | 20 (26%) | 46 (36.2%) | 37 (39.4%) | |
| Stage IV | 50 | 3 (15%) | 0 (0%) | 9 (11.7%) | 20 (15.7%) | 18 (19.1%) | |


In both cohorts, CMS2 cancers were most frequent in the CIMP-L2 (TCGA: 45.3%, RBWH: 45.5%) and CIMP-Neg (TCGA: 51.1%, RBWH: 45.3%) clusters. Likewise, CIMP-Neg cancers were strongly enriched for the CMS4 subtype in both cohorts (TCGA: 40.9%, RBWH: 39.6%).

In contrast to the RBWH cohort, CIMP-H1 cancers were less frequent overall in TCGA (5.1% vs 10.6%), and BRAF mutation was associated with both CIMP-H1 and CIMP-H2 (CIMP-H1, TCGA: 52.6%, RBWH: 73.9%; CIMP-H2, TCGA: 48.7%, RBWH: 9.1%). Perhaps as a consequence of this higher frequency of BRAF mutation, MSI was significantly more frequent in TCGA CIMP-H2 cancers (50%). While we did not identify any association between stage and CIMP subtype in the RBWH cohort, late-stage disease was significantly associated with decreasing CIMP in the TCGA cohort (stage IV: CIMP-H1 15%, CIMP-H2 0%, CIMP-L1 11.7%, CIMP-L2 15.7%, CIMP-Neg 19.1%, P<0.01).

The colorectal cancer methylome is altered in comparison to normal mucosa

We identified differentially methylated probes (DMPs) in each cluster relative to 32 normal mucosal samples matched to a subset of cancers in the unselected series (Table 3). In all four CIMP clusters (CIMP-H1, -H2, -L1 and -L2), differentially hypermethylated CpG sites greatly outnumbered hypomethylated sites (Table 3). By contrast, in the CIMP-Neg cluster, hypomethylation was more common than hypermethylation. Probe hypermethylation was most frequent in the CIMP-H1 cluster, which had 21,168 hypermethylated probes within 5,165 unique CpG islands. Of these islands, 4,333 were also hypermethylated in CIMP-H2, whilst 832 were uniquely hypermethylated in CIMP-H1. An additional 523 CpG islands were uniquely hypermethylated in the CIMP-H2 cluster relative to CIMP-H1. CIMP-H2 had the highest number of hypomethylation events of all clusters (P<0.0001), the majority occurring in open sea regions of the genome. Next, we examined the impact of the chosen beta value change threshold on the number of differential methylation events detected. Raising the threshold to 0.3 substantially reduced the number of DMPs identified (to 47.1%, 47.8%, 24.9%, 13.4% and 5.8% of the probes identified at 0.2 for CIMP-H1 to CIMP-Neg, respectively). Raising it to 0.4 produced a more drastic reduction (to 18.9%, 19.5%, 4.1%, 1.2% and 0.3% of the probes identified at 0.2, respectively). There was a significant relationship between CIMP subtype and the magnitude of the DMPs identified (P<0.0001).
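The threshold comparison above depends on how DMPs are called. Table 3 defines a DMP as an absolute beta value difference >0.2 with an FDR-corrected P value <0.01. The sketch below assumes the Benjamini-Hochberg procedure for the FDR correction (the method is not named here) and uses hypothetical per-probe statistics to show how raising `min_delta` from 0.2 to 0.3 and 0.4 shrinks the DMP list; the function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_dmps(delta_beta, pvalues, min_delta=0.2, max_fdr=0.01):
    """Indices of probes with |mean beta difference| > min_delta and FDR < max_fdr."""
    fdr = benjamini_hochberg(pvalues)
    return [i for i, (d, q) in enumerate(zip(delta_beta, fdr))
            if abs(d) > min_delta and q < max_fdr]


# Hypothetical tumor-minus-normal beta differences and raw P values.
delta = [0.45, 0.32, 0.25, -0.22, 0.05, 0.38]
pvals = [1e-8, 1e-6, 1e-5, 1e-4, 1e-9, 0.2]
baseline = call_dmps(delta, pvals, min_delta=0.2)
for cutoff in (0.2, 0.3, 0.4):
    kept = call_dmps(delta, pvals, min_delta=cutoff)
    print(f"|delta beta| > {cutoff}: {len(kept)} DMPs "
          f"({100 * len(kept) / len(baseline):.0f}% of the 0.2 set)")
```

Probes that are highly significant but have small effect sizes (index 4 above) never pass, which is why stricter cut-offs trade false positives for missed events.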


Table 3: Distribution of differentially methylated probes relative to CpG islands, compared with normal mucosal tissue, for cancers stratified by CIMP cluster. Differential methylation was defined as an absolute beta value change of >0.2 and an FDR-corrected P value <0.01 compared with 32 normal samples. '+' denotes differential hypermethylation and '-' denotes differential hypomethylation.

| CpG location | CIMP-H1 + | CIMP-H1 - | CIMP-H2 + | CIMP-H2 - | CIMP-L1 + | CIMP-L1 - | CIMP-L2 + | CIMP-L2 - | CIMP-Neg + | CIMP-Neg - |
|---|---|---|---|---|---|---|---|---|---|---|
| Island | 21011 | 204 | 19651 | 426 | 11297 | 118 | 5685 | 127 | 754 | 162 |
| South shore | 3196 | 586 | 3003 | 1359 | 1253 | 426 | 513 | 284 | 78 | 242 |
| North shore | 4745 | 890 | 4641 | 1885 | 2095 | 617 | 911 | 420 | 184 | 346 |
| South shelf | 229 | 743 | 181 | 1620 | 83 | 574 | 49 | 331 | 19 | 238 |
| North shelf | 280 | 738 | 259 | 1660 | 92 | 591 | 58 | 342 | 35 | 246 |
| Open sea | 2056 | 8396 | 1721 | 15575 | 647 | 6812 | 297 | 4189 | 104 | 3428 |
| Total | 31517 | 11557 | 29453 | 22525 | 15467 | 9138 | 7513 | 5693 | 1174 | 4662 |


We compared the probes that were differentially hypermethylated (versus normal mucosa) in the RBWH cohort with those differentially hypermethylated in the TCGA cohort and found a remarkable degree of overlap. In CIMP-H1, 80.2% of differentially hypermethylated loci were detected in both cohorts. Of the remaining 7,481 probes, 6,009 were detected only in the TCGA cohort and 1,472 only in the RBWH cohort. We hypothesized that the beta value cut-off (>0.2 mean beta value difference versus normal) may have filtered out many of the probes detected in only one cohort. Indeed, of the 7,481 DMPs detected in one cohort only, 98.5% were significantly differentially methylated relative to normal colonic mucosa in the other cohort but had been excluded there by the beta value difference cut-off. This was consistent across all CIMP subtypes.

Events recognized in two independent cohorts are likely to be bona fide differential methylation events. These data indicate that selecting an appropriate beta value difference cut-off is critical, and that stringent cut-offs may substantially increase the type II error rate when reporting differentially methylated events.
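The cross-cohort comparison above amounts to set operations on probe IDs. A minimal sketch, assuming each cohort provides two sets: probes passing both filters (FDR and beta value difference) and probes passing the FDR filter alone. Computing the shared fraction over the union of both DMP sets is an assumption, since the denominator is not stated; all names and probe IDs are hypothetical.

```python
def compare_cohort_dmps(dmps_a, dmps_b, significant_a, significant_b):
    """Summarize agreement between two cohorts' DMP calls.

    dmps_*: sets of probe IDs passing both the FDR and beta value filters.
    significant_*: sets of probe IDs passing the FDR filter alone.
    """
    shared = dmps_a & dmps_b
    only_a, only_b = dmps_a - dmps_b, dmps_b - dmps_a
    single = only_a | only_b
    # A cohort-specific DMP is "rescued" if the other cohort also found it
    # significant but dropped it at the beta value difference cut-off.
    rescued = (only_a & significant_b) | (only_b & significant_a)
    return {
        "shared_fraction": len(shared) / len(dmps_a | dmps_b),
        "single_cohort": len(single),
        "rescued_fraction": len(rescued) / len(single) if single else 0.0,
    }


# Toy example: p5 is significant in cohort A but below its beta cut-off;
# p4 is not significant at all in cohort B.
dmps_a = {"p1", "p2", "p3", "p4"}
dmps_b = {"p1", "p2", "p3", "p5"}
significant_a = dmps_a | {"p5"}
significant_b = set(dmps_b)
summary = compare_cohort_dmps(dmps_a, dmps_b, significant_a, significant_b)
print(summary)
```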

CIMP subtypes are associated with different stromal immune cell composition

We hypothesized that CIMP subtypes may differ in their stromal cell composition, and used CIBERSORT 17 to deconvolute the relative composition of immune cells in the tumor microenvironment. CIMP-H1 cancers were enriched for M1 macrophages compared with all other CIMP subtypes except CIMP-L2 (P<0.01 vs CIMP-H2; P=0.02 vs CIMP-L1; P=0.01 vs CIMP-Neg). CIMP-H2 cancers were enriched for resting memory CD4 T cells (P<0.01) and depleted of M1 macrophages (P=0.01). Mast cells were inversely associated with DNA methylation subtype: they contributed least to the immune microenvironment in CIMP-H1 cancers and increased stepwise from CIMP-H1 to CIMP-Neg (P=0.01). Natural killer cells, by contrast, were associated with CIMP-H cancers (ANOVA P<0.05) but did not differ between CIMP-H1 and CIMP-H2.

CIMP-H1 and CIMP-H2 cancers can be delineated by expression profiles

To examine the extent to which CIMP-H1 and CIMP-H2 cancers are transcriptionally distinct, we analyzed differential expression for each cluster relative to normal mucosa using Illumina HT-12 expression arrays. We then performed single-sample gene set enrichment analysis (ssGSEA) 18 to evaluate enrichment of the Hallmark gene sets 19 in individual samples (FDR-corrected P<0.05). We identified 10 gene sets significantly enriched in CIMP-H1 cancers, 7 of which were related to the immune response (Figure 2). The bile acid metabolism gene set was significantly enriched in CIMP-H2 cancers, and this difference was independent of BRAF and KRAS status (least-squares regression with BRAF status, KRAS status and CIMP subtype as effects, P=0.028). In TCGA we did not identify any significant differences in immune response or bile acid metabolism gene sets, which may reflect the higher frequency of BRAF-mutant MSI cancers among TCGA CIMP-H2 cancers.
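ssGSEA scores one sample at a time by ranking its genes and asking whether members of a gene set sit near the top of that ranking. The sketch below is a simplified, unnormalized version of the rank-weighted random-walk score introduced by Barbie and colleagues (weight exponent 0.25); it is not the implementation used in this study, and the function and gene names are hypothetical.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one sample.

    expression: dict gene -> expression value in ONE sample.
    Genes are ordered from highest to lowest expression. Walking down the
    list, the rank-weighted in-set ECDF and the uniform out-of-set ECDF are
    accumulated, and the score is the sum of their differences.
    """
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    # Rank weight: the highest-expressed gene has rank n, the lowest rank 1.
    weights = {gene: (n - i) ** alpha for i, gene in enumerate(ordered)}
    hits = [gene for gene in ordered if gene in gene_set]
    n_miss = n - len(hits)
    if not hits or n_miss == 0:
        raise ValueError("gene set must be a non-empty proper subset of the genes")
    hit_weight = sum(weights[g] for g in hits)
    p_hit = p_miss = score = 0.0
    for gene in ordered:
        if gene in gene_set:
            p_hit += weights[gene] / hit_weight
        else:
            p_miss += 1.0 / n_miss
        score += p_hit - p_miss
    return score


# Toy sample: gene g19 is the most highly expressed, g0 the least.
expr = {f"g{i}": float(i) for i in range(20)}
top_set = {"g19", "g18", "g17"}
bottom_set = {"g0", "g1", "g2"}
print(ssgsea_score(expr, top_set), ssgsea_score(expr, bottom_set))
```

Sets concentrated among highly expressed genes score positive and sets concentrated among lowly expressed genes score negative; the published method additionally normalizes scores across samples before comparing groups.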


Figure 2: Differentially regulated hallmark gene sets between CIMP-H1 and CIMP-H2 cancers as assessed by single-sample gene set enrichment analysis. IL, interleukin; ssGSEA, single sample gene set enrichment analysis.


Relationship between promoter hypermethylation and gene transcriptional activity

To determine how often promoter hypermethylation controls transcription of the downstream gene, we examined transcript levels of genes whose promoters were hypermethylated relative to normal mucosa. Although promoter methylation was most common in the CIMP-H1 and CIMP-H2 clusters (Figure 3A), these subgroups had the lowest proportion of genes in which hypermethylation was accompanied by reduced transcript expression (13.9% and 15.6%, respectively). This inverse relationship continued through CIMP-L1 (18.9%) and CIMP-L2 (19.9%) to the CIMP-Neg cancers, in which 22.7% of hypermethylated promoters were associated with reduced transcription (P<0.0001, Figure 3B). We observed a similar relationship between promoter methylation and gene transcription in TCGA cancers, although in the TCGA cohort the proportion of methylated genes associated with transcriptional repression did not differ between CIMP subtypes.

To explain the apparent lack of association between gene expression and methylation in CIMP-H cancers, we asked whether genes that become methylated in these clusters were already lowly expressed in normal colonic tissue. We calculated the percentile of expression of each gene in normal colonic mucosa and found that the median percentile for genes methylated in CIMP-H1 cancers was 42.9, below the 50th percentile, indicating that genes methylated in CIMP-H1 cancers generally have relatively low basal expression in normal tissue.
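The percentile calculation above can be sketched as follows, assuming mean expression per gene in normal mucosa is available as a dictionary. Here a gene's percentile is the number of genes with strictly lower expression divided by (n - 1), scaled to 0 to 100; the percentile convention actually used is not specified, so this definition and the gene names are assumptions.

```python
from bisect import bisect_left
from statistics import median


def expression_percentiles(normal_expression):
    """Percentile (0-100) of each gene's expression among all genes in normal mucosa.

    A gene's percentile is the count of genes with strictly lower expression
    divided by (n - 1); tied genes share the same percentile.
    """
    values = sorted(normal_expression.values())
    n = len(values)
    if n < 2:
        raise ValueError("need at least two genes")
    return {gene: 100 * bisect_left(values, v) / (n - 1)
            for gene, v in normal_expression.items()}


# Hypothetical mean expression in normal mucosa.
normal_mucosa = {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 3.0, "GENE_D": 4.0, "GENE_E": 5.0}
pct = expression_percentiles(normal_mucosa)
methylated_in_cluster = ["GENE_A", "GENE_B", "GENE_D"]
median_pct = median(pct[g] for g in methylated_in_cluster)
# A median below 50 means the methylated genes tend to be lowly expressed in normal tissue.
print(median_pct)
```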


Figure 3: (A) Number of differentially methylated promoters in each CIMP cluster versus the cohort of normal mucosal samples. (B) Proportion of methylation events within each cluster that were associated with gene repression at the transcript level.


We reasoned that loci methylated and repressed in multiple CIMP clusters may be genes that are important for cancer development. Strikingly, of the 1,273 genes that were methylated and repressed in at least one CIMP cluster, 82.3% were methylated and repressed in two or more CIMP clusters, 16.9% in three or more and 8.0% in all four CIMP clusters (excluding CIMP-Neg). We identified 21 tumor suppressor genes, as annotated in the NCG6.0 database, that were recurrently methylated and silenced in ≥3 CIMP subtypes (Table 4).
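The recurrence percentages above come from counting, for each gene, how many CIMP clusters methylate and repress it. A minimal sketch with hypothetical cluster-to-gene sets (CIMP-Neg excluded, as in the text); the function name is hypothetical.

```python
from collections import Counter


def recurrence_summary(silenced_by_cluster, thresholds=(2, 3, 4)):
    """silenced_by_cluster: dict cluster -> set of genes methylated AND repressed.

    Returns (number of genes silenced in >= 1 cluster,
             {k: fraction of those genes silenced in >= k clusters}).
    """
    counts = Counter(gene for genes in silenced_by_cluster.values() for gene in set(genes))
    n_genes = len(counts)
    fractions = {k: sum(1 for c in counts.values() if c >= k) / n_genes
                 for k in thresholds}
    return n_genes, fractions


silenced = {
    "CIMP-H1": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "CIMP-H2": {"GENE_A", "GENE_B", "GENE_C"},
    "CIMP-L1": {"GENE_A", "GENE_B"},
    "CIMP-L2": {"GENE_A"},
}
print(recurrence_summary(silenced))
```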


Table 4: Tumor suppressor genes that were recurrently methylated and repressed in ≥3 CIMP subtypes

| Gene | Description | HGNC ID |
|---|---|---|
| PCDH9 | protocadherin 9 | HGNC:8661 |
| CDO1 | cysteine dioxygenase type 1 | HGNC:1795 |
| MAL | mal, T cell differentiation protein | HGNC:6817 |
| EPB41L3 | erythrocyte membrane protein band 4.1 like 3 | HGNC:3380 |
| AKAP12 | A-kinase anchoring protein 12 | HGNC:370 |
| NDRG4 | NDRG family member 4 | HGNC:14466 |
| LIFR | LIF receptor alpha | HGNC:6597 |
| SCUBE2 | signal peptide, CUB domain and EGF like domain containing 2 | HGNC:30425 |
| TMEFF2 | transmembrane protein with EGF like and two follistatin like domains 2 | HGNC:11867 |
| DUSP26 | dual specificity phosphatase 26 | HGNC:28161 |
| C2orf40 | chromosome 2 open reading frame 40 | HGNC:24642 |
| SFRP1 | secreted frizzled related protein 1 | HGNC:10776 |
| UCHL1 | ubiquitin C-terminal hydrolase L1 | HGNC:12513 |
| IKZF1 | IKAROS family zinc finger 1 | HGNC:13176 |
| CADM2 | cell adhesion molecule 2 | HGNC:29849 |
| CXCL12 | C-X-C motif chemokine ligand 12 | HGNC:10672 |
| IRF4 | interferon regulatory factor 4 | HGNC:6119 |
| ZBTB16 | zinc finger and BTB domain containing 16 | HGNC:12930 |
| CHFR | checkpoint with forkhead and ring finger domains | HGNC:20455 |
| SLIT2 | slit guidance ligand 2 | HGNC:11086 |
| ZFP82 | ZFP82 zinc finger protein | HGNC:28682 |


Polycomb-Repressive Complex 2 occupancy at hypermethylated CpGs is inversely correlated with global hypermethylation

SUZ12 occupancy is a surrogate for Polycomb Repressive Complex 2 (PRC2) occupancy, and in embryonic stem cells it has been shown to be associated with transcriptional repression of hypermethylated loci 6,20. Consistent with this, the proportion of SUZ12-occupied regions containing methylated CpG sites increased with increasing CIMP cluster (P<0.0001, Figure 4A). Conversely, and in keeping with our findings for promoter methylation, the proportion of hypermethylated loci that overlapped SUZ12-occupied sites decreased with increasing CIMP cluster (P<0.0001, Figure 4B). This further supports our finding that, whilst DNA hypermethylation occurs more frequently with increasing CIMP cluster, individual methylation events are more likely to result in gene silencing in CIMP-Neg cancers.
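Figure 4 rests on two different overlap fractions: the share of SUZ12-occupied regions containing at least one hypermethylated probe (panel A) and the share of hypermethylated probes falling inside an occupied region (panel B). A minimal sketch of both, assuming half-open genomic intervals and probe positions on the same genome build; the coordinates and function name are hypothetical.

```python
from bisect import bisect_left


def overlap_stats(regions, probe_positions):
    """regions: list of (chrom, start, end) occupied intervals, half-open.
    probe_positions: list of unique (chrom, pos) for hypermethylated probes.

    Returns (fraction of regions containing >= 1 probe,
             fraction of probes inside >= 1 region).
    """
    by_chrom = {}
    for chrom, pos in probe_positions:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    regions_hit = 0
    probes_inside = set()  # a set avoids double counting overlapping regions
    for chrom, start, end in regions:
        positions = by_chrom.get(chrom, [])
        lo, hi = bisect_left(positions, start), bisect_left(positions, end)
        if hi > lo:
            regions_hit += 1
        probes_inside.update((chrom, p) for p in positions[lo:hi])
    return regions_hit / len(regions), len(probes_inside) / len(probe_positions)


regions = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
probes = [("chr1", 150), ("chr1", 199), ("chr1", 300), ("chr2", 10)]
hit_frac, probe_frac = overlap_stats(regions, probes)
print(hit_frac, probe_frac)
```

The two fractions can move in opposite directions: a cluster with many more hypermethylated probes can hit more occupied regions (panel A rises) while a smaller share of its probes fall inside them (panel B falls).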


Figure 4: (A) Proportion of SUZ12-occupied regions in hESC1 cells that contained hypermethylated probes in respective CIMP clusters. (B) Proportion of differential hypermethylation events that overlapped with Polycomb Repressive Complex-2 (PRC2)-occupied regions.


CIMP-H1 and CIMP-H2 promoter methylation is defined by the enrichment of distinct transcription factor binding sites

Transcription factor binding sites often contain CpG dinucleotides and can therefore be targets of DNA methylation, which may explain some of the effects of methylation on transcription. To explore whether DNA methylation is targeted to specific transcription factor binding sites, we performed a motif enrichment analysis with CentriMo 21 on the 2 kb region immediately upstream of hypermethylated genes. A total of 128 binding motifs were significantly enriched in both CIMP-H1 and CIMP-H2 cancers; a further 323 were enriched only in CIMP-H1 cancers and 330 only in CIMP-H2 cancers. SMAD4 and FOXP3 motifs (adjusted P=1.2x10-24 and 4.1x10-23, respectively) were the most significantly enriched in CIMP-H1 cancers, whereas SPDEF, FLI1 and NKX6 motifs (adjusted P=7.2x10-30, 1.1x10-16 and 3.5x10-16, respectively) were the most significantly enriched in CIMP-H2 cancers. Table 5 presents the top enriched consensus binding sites exclusive to CIMP-H1 and CIMP-H2.
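Extracting the 2 kb window upstream of each hypermethylated gene must respect strand, because upstream lies at lower coordinates for plus-strand genes and at higher coordinates for minus-strand genes. A minimal sketch, assuming 0-based, half-open (BED-style) coordinates with `tss` as the first transcribed base; the function name is hypothetical, and the extracted sequences would then be supplied to CentriMo, which is not reproduced here.

```python
def upstream_window(chrom, tss, strand, length=2000):
    """Half-open (chrom, start, end) window of `length` bp immediately
    upstream of a TSS given as a 0-based coordinate."""
    if strand == "+":
        # Upstream of a plus-strand gene lies at lower coordinates.
        return chrom, max(0, tss - length), tss
    if strand == "-":
        # Upstream of a minus-strand gene lies at higher coordinates;
        # the TSS base itself is excluded.
        return chrom, tss + 1, tss + 1 + length
    raise ValueError("strand must be '+' or '-'")


print(upstream_window("chr3", 5000, "+"))
print(upstream_window("chr3", 5000, "-"))
```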


Table 5: Motifs that were most significantly and exclusively enriched at methylated promoters in CIMP-H1 and CIMP-H2.

| CIMP-H1 motif name | Motif | Raw P value | Adjusted P value | CIMP-H2 motif name | Motif | Raw P value | Adjusted P value |
|---|---|---|---|---|---|---|---|
| Smad4 | TGTCTRGM | 1.2E-21 | 1.2E-24 | SPDEF_DBD_2 | GTGGTCCCGGATTAT | 7.2E-33 | 7.2E-30 |
| FOXP3_DBD | RTAAACA | 4.1E-20 | 4.1E-23 | UP00142_1 | VNTAATTAATTAABGSG | 2.4E-20 | 2.4E-17 |
| FOXP3 | RTAAACA | 4.1E-20 | 4.1E-23 | FLI1_full_2 | ACCGGAAATCCGGT | 1.1E-19 | 1.1E-16 |
| POU2F2_DBD_2 | HWTRMATATKCAWA | 4.5E-19 | 4.5E-22 | UP00200_1 | GVWAATTAATTAMYBBG | 3.5E-19 | 3.5E-16 |
| Zscan4_primary | DHNATGTGCACAYAHWN | 1.2E-18 | 1.3E-21 | NHLH1_DBD | CGCAGCTGCS | 2.1E-18 | 2.1E-15 |
| HOXC10 | GTCRTAAAAH | 1.3E-18 | 1.3E-21 | ERG_full_2 | ACCGGAWATCCGGT | 4.8E-18 | 4.8E-15 |
| Bbx_secondary | HVWNNGTTAACASHNRV | 3.1E-16 | 3.1E-19 | MA0680.1 | TAATCGATTA | 8.7E-18 | 8.6E-15 |
| Foxc1_DBD_1 | GTAAAYAAACA | 1.3E-15 | 1.3E-18 | PAX7_DBD | TAATYRATTA | 1.4E-16 | 1.4E-13 |


Gene bodies of Wnt pathway antagonists are resistant to methylation

We further explored gene bodies that were unmethylated but contained >10 CpG island probes, and performed pathway analysis to identify pathways devoid of gene body methylation. Six pathways were significantly enriched amongst these genes, of which the WNT signaling pathway was the most heavily enriched (Figure 5). PCDHA6, PCDHGA2, PCDHA7 and PCDHA2, which contained 36, 15, 10 and 20 gene body CpG island probes, respectively, were all unmethylated. These protocadherins have been implicated in the regulation of WNT signaling and may act as tumor suppressor genes. Likewise, AXIN1, a gene critical to the β-catenin destruction complex, contained 11 unmethylated intragenic CpG island (CGI) probes, and TCF3, a WNT pathway repressor, contained 19. We considered whether gene body methylation within WNT antagonists could alter gene transcription; however, we did not observe any differences in the expression of these genes versus normal mucosa, nor were they expressed in normal mucosal tissue. In the remaining WNT genes we did not identify any consistent expression changes.
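The gene selection preceding the pathway analysis can be sketched as a filter over gene-body CpG island probes. This assumes a precomputed probe-to-gene mapping and a set of probes called hypermethylated in either CIMP-H cluster; the function, gene and probe names are hypothetical, and the pathway enrichment step is not shown.

```python
def unmethylated_cgi_gene_bodies(probe_gene, hypermethylated, min_probes=10):
    """Genes with more than `min_probes` gene-body CpG island probes, none of
    which is hypermethylated.

    probe_gene: dict probe ID -> gene symbol (gene-body CGI probes only).
    hypermethylated: set of probe IDs hypermethylated in either CIMP-H cluster.
    """
    probes_per_gene = {}
    for probe, gene in probe_gene.items():
        probes_per_gene.setdefault(gene, []).append(probe)
    return sorted(
        gene for gene, probes in probes_per_gene.items()
        if len(probes) > min_probes and not any(p in hypermethylated for p in probes)
    )


probe_gene = {f"a_{i}": "GENE_A" for i in range(11)}        # 11 probes, all unmethylated
probe_gene.update({f"b_{i}": "GENE_B" for i in range(12)})  # 12 probes, one methylated
probe_gene.update({f"c_{i}": "GENE_C" for i in range(5)})   # too few probes
print(unmethylated_cgi_gene_bodies(probe_gene, {"b_3"}))
```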


Figure 5: Pathways significantly enriched for genes that contained CpG islands that were devoid of methylation in both CIMP-H clusters. VEGF, vascular endothelial growth factor.


Oncogenes are significantly more likely than tumor suppressor genes to undergo gene body methylation in CIMP-H1 and CIMP-H2 cancers

Gene body methylation is positively correlated with gene expression 22. To evaluate whether gene body methylation is a phenomenon enriched in the oncogenes of CIMP-H cancers or is driven more non-specifically by CIMP itself, we examined hypermethylation of gene body CpG islands, defined as at least two probes in the CpG island being hypermethylated relative to normal (P<0.01) together with a mean absolute beta value difference versus normal of >0.2. In total, 239 genes were annotated as known oncogenes and 239 as known tumor suppressor genes in the NCG6.0 cancer gene database 23. Of these, 121 tumor suppressors and 116 oncogenes had a CpG island within the gene body that was probed on the array. In CIMP-H1 cancers, 21.5% (20.2% in TCGA) of oncogenes had significant gene body methylation relative to normal; by comparison, significantly fewer tumor suppressor genes underwent gene body methylation (12.4% in the RBWH cohort, P<0.05; 8.1% in TCGA, P<0.001). Likewise, gene body methylation was significantly more likely to occur in oncogenes than in tumor suppressor genes in CIMP-H2 cancers (23.3% vs 11.6%, P=0.01). Expression of five of these oncogenes in each cluster differed significantly from normal mucosa (FEV, BCL2 and KIT downregulated and PAX3 and SND1 upregulated in CIMP-H1; LMO2 and CTNND2 downregulated and SND1, CTNNA2 and TLX1 upregulated in CIMP-H2). Table 6 presents the oncogenes with significantly higher gene body methylation in CIMP-H1 and CIMP-H2 cancers than in normal colonic mucosa.
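The gene body CpG island rule above can be written as a small predicate. The text does not say whether the mean absolute beta value difference is averaged over all probes in the island or only over the significant ones; this sketch assumes all probes, and the function name and example values are hypothetical.

```python
from statistics import mean


def gene_body_methylated(island_probes, min_sig_probes=2, p_cut=0.01, min_delta=0.2):
    """Apply the gene-body CpG island rule to one island.

    island_probes: list of (p_value, delta_beta) for every probe in the island,
    where delta_beta = mean tumor beta - mean normal beta.
    The island is called methylated if at least `min_sig_probes` probes are
    hypermethylated (positive delta_beta with P < p_cut) AND the mean absolute
    delta_beta across the island's probes exceeds `min_delta`.
    """
    n_sig = sum(1 for p, d in island_probes if p < p_cut and d > 0)
    mean_abs_delta = mean(abs(d) for _, d in island_probes)
    return n_sig >= min_sig_probes and mean_abs_delta > min_delta


island_a = [(0.001, 0.35), (0.004, 0.30), (0.2, 0.05)]               # passes both criteria
island_b = [(0.001, 0.35), (0.3, 0.02), (0.5, 0.01)]                 # only one significant probe
island_c = [(0.001, 0.21), (0.002, 0.22), (0.4, 0.0), (0.6, 0.01)]   # mean |delta| too small
print([gene_body_methylated(i) for i in (island_a, island_b, island_c)])
```

The proportion of oncogenes or tumor suppressor genes with gene body methylation is then the number of genes whose island passes this rule divided by the number of genes with a probed gene-body island.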


Table 6: Oncogenes with significantly higher methylation within the body of the gene

CIMP-H1

| Gene | Expression | Description |
|---|---|---|
| FEV | Down-regulated | FEV, ETS transcription factor |
| BCL2 | Down-regulated | BCL2, apoptosis regulator |
| KIT | Down-regulated | KIT proto-oncogene receptor tyrosine kinase |
| PAX3 | Up-regulated | paired box 3 |
| SND1 | Up-regulated | staphylococcal nuclease and tudor domain containing 1 |
| LMO2 | No difference | LIM domain only 2 |
| RSPO3 | No difference | R-spondin 3 |
| CTNND2 | No difference | catenin delta 2 |
| TLX3 | No difference | T cell leukemia homeobox 3 |
| SIX1 | No difference | SIX homeobox 1 |
| HOXC13 | No difference | homeobox C13 |
| LMO1 | No difference | LIM domain only 1 |
| ZNF521 | No difference | zinc finger protein 521 |
| SALL4 | No difference | spalt like transcription factor 4 |
| ZEB1 | No difference | zinc finger E-box binding homeobox 1 |
| PREX2 | No difference | PI-3,4,5-trisphosphate dependent Rac exchange factor 2 |
| OLIG2 | No difference | oligodendrocyte transcription factor 2 |
| SMO | No difference | smoothened, frizzled class receptor |
| FLT3 | No difference | fms related tyrosine kinase 3 |
| GATA2 | No difference | GATA binding protein 2 |
| TLX1 | No difference | T cell leukemia homeobox 1 |
| TAL1 | No difference | TAL bHLH transcription factor 1, erythroid differentiation factor |
| CACNA1D | No difference | calcium voltage-gated channel subunit alpha1 D |
| MYOD1 | No difference | myogenic differentiation 1 |
| CTNNA2 | No difference | catenin alpha 2 |
| CHST11 | No difference | carbohydrate sulfotransferase 11 |
| NR4A3 | No difference | nuclear receptor subfamily 4 group A member 3 |

CIMP-H2

| Gene | Expression | Description |
|---|---|---|
| LMO2 | Down-regulated | LIM domain only 2 |
| CTNND2 | Down-regulated | catenin delta 2 |
| SND1 | Up-regulated | staphylococcal nuclease and tudor domain containing 1 |
| CTNNA2 | Up-regulated | catenin alpha 2 |
| TLX1 | Up-regulated | T cell leukemia homeobox 1 |
| PREX2 | No difference | PI-3,4,5-trisphosphate dependent Rac exchange factor 2 |
| RSPO3 | No difference | R-spondin 3 |
| RET | No difference | ret proto-oncogene |
| LMO1 | No difference | LIM domain only 1 |
| FLT3 | No difference | fms related tyrosine kinase 3 |
| CACNA1D | No difference | calcium voltage-gated channel subunit alpha1 D |
| WWTR1 | No difference | WW domain containing transcription regulator 1 |
| CHST11 | No difference | carbohydrate sulfotransferase 11 |
| PAX3 | No difference | paired box 3 |
| FLT4 | No difference | fms related tyrosine kinase 4 |
| CXCR4 | No difference | C-X-C motif chemokine receptor 4 |
| TLX3 | No difference | T cell leukemia homeobox 3 |
| TAL1 | No difference | TAL bHLH transcription factor 1, erythroid differentiation factor |
| SIX1 | No difference | SIX homeobox 1 |
| HOXC11 | No difference | homeobox C11 |
| OLIG2 | No difference | oligodendrocyte transcription factor 2 |
| MYOD1 | No difference | myogenic differentiation 1 |
| ZEB1 | No difference | zinc finger E-box binding homeobox 1 |
| HOXC13 | No difference | homeobox C13 |
| ZNF521 | No difference | zinc finger protein 521 |
| SMO | No difference | smoothened, frizzled class receptor |
| GATA2 | No difference | GATA binding protein 2 |
| NR4A3 | No difference | nuclear receptor subfamily 4 group A member 3 |


Loci marked by the PRC2 complex in human embryonic stem cells are prone to gene body methylation during cancer development

PRC2 marking in human embryonic stem (hES) cells has previously been shown to overlap significantly with promoter hypermethylation in colorectal cancers 6. We hypothesized that a similar phenomenon would occur for gene body hypermethylation. In CIMP-H1 and CIMP-H2 cancers, 30.59% and 31.04% of loci marked with H3K27me3 in hES cells developed significant gene body hypermethylation (Table 7; P = 1.34 × 10⁻²⁸⁰ for CIMP-H1 and P = 2.5 × 10⁻³⁰⁰ for CIMP-H2). We observed a lesser, but still highly significant, overlap between H3K27me3-marked loci and gene body methylation in CIMP-L1 (13.1%, P = 6.11 × 10⁻¹²²) and CIMP-L2 (8.5%, P = 1.6 × 10⁻⁷⁸) cancers, but did not observe any significant overlap in CIMP-Neg cancers, likely because gene body methylation is scarce in these cancers. We observed similar overlaps for EED, SUZ12 and PRC2 targets.


Table 7: Overlap between genes marked by the PRC2 complex and H3K27me3 in hES cells and genes that undergo significant gene body methylation during colorectal cancer development. The overlap fraction is the number of methylated gene bodies (k) divided by the number of genes carrying each respective mark in hES cells (K), i.e. k/K. FDR-corrected P values were obtained by modeling a hypergeometric distribution (k-1, K, N-K, n; where k is the number of genes methylated in each cluster, K is the number of genes in the gene set, N is the total number of genes in the background universe, and n is the number of genes in the query set) using the Compute Overlaps tool on the GSEA web portal with the Ben-Porath gene sets, which were obtained through ChIP-on-chip analysis of human embryonic stem cells.

| Gene set name | CIMP-H1 overlap fraction | CIMP-H1 FDR p-value | CIMP-H2 overlap fraction | CIMP-H2 FDR p-value | CIMP-L1 overlap fraction | CIMP-L1 FDR p-value | CIMP-L2 overlap fraction | CIMP-L2 FDR p-value |
|---|---|---|---|---|---|---|---|---|
| BENPORATH_ES_WITH_H3K27ME3 | 30.59% | 1.34E-280 | 31.04% | 2.50E-300 | 13.06% | 6.11E-122 | 8.50% | 1.60E-78 |
| BENPORATH_EED_TARGETS | 30.70% | 3.91E-267 | 31.07% | 1.12E-284 | 12.81% | 8.75E-112 | 8.66% | 8.47E-77 |
| BENPORATH_SUZ12_TARGETS | 30.92% | 5.05E-264 | 30.73% | 9.67E-273 | 12.91% | 1.29E-110 | 8.48% | 2.02E-72 |
| BENPORATH_PRC2_TARGETS | 37.27% | 1.04E-218 | 38.04% | 8.59E-235 | 16.41% | 4.56E-98 | 11.04% | 2.05E-66 |
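The one-sided overlap test described in the Table 7 caption can be illustrated with a short, self-contained Python sketch. It computes log10 P(X ≥ k) for a hypergeometric variable, which is the upper tail evaluated at k-1 in the caption's notation; working in log space avoids underflow for p-values as small as those in Table 7. The function names are hypothetical, this is not the GSEA portal's own code, and the FDR correction the portal applies across gene sets is not reproduced.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def overlap_log10_pvalue(k: int, K: int, N: int, n: int) -> float:
    """log10 P(X >= k), X ~ Hypergeometric(N genes, K in the gene set, n in the query).

    Summing the upper tail from k equals evaluating it at (k - 1), as in the
    caption's (k-1, K, N-K, n) notation. Log space keeps p-values as small as
    those in Table 7 from underflowing to zero.
    """
    x_min = max(k, 0, n - (N - K))  # smallest feasible overlap that is >= k
    x_max = min(K, n)               # largest feasible overlap
    if x_min > x_max:
        return float("-inf")        # an overlap of k or more is impossible
    log_denom = log_comb(N, n)
    log_terms = [log_comb(K, x) + log_comb(N - K, n - x) - log_denom
                 for x in range(x_min, x_max + 1)]
    top = max(log_terms)            # log-sum-exp for numerical stability
    log_p = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return log_p / math.log(10)


# Toy example: 2 genes of a 4-gene set fall in a 3-gene query drawn from 10 genes.
print(overlap_log10_pvalue(2, 4, 10, 3))  # log10(1/3)
```

In this toy case the exact tail probability is (36 + 4) / 120 = 1/3, so the function returns log10(1/3); the corresponding overlap fraction k/K would be 2/4.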


Epigenetic regulator gene mutations are common in The Cancer Genome Atlas cancers

Mutations in epigenetic modifier genes have previously been shown to modulate transcriptional profiles in cancer 15. We assessed the mutational frequency of 719 epigenetic regulator genes in cancers from the TCGA COAD and READ projects, grouped by the CIMP subtypes identified earlier. For these analyses, we included only mutations that were truncating in nature (nonsense or indels), were predicted to alter splicing, or were predicted by PolyPhen to be deleterious.
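The variant filter described above can be sketched in Python as follows. The consequence labels, the `Variant` record and the helper names are hypothetical stand-ins for whatever annotation fields a real pipeline (for example a TCGA MAF file) provides. Treating both "possibly" and "probably damaging" PolyPhen calls as deleterious is an assumption, since the text does not say which PolyPhen classes were kept, and indels are kept regardless of frame, following the wording above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

# Hypothetical consequence labels; real annotation files use their own vocabularies.
# Indels are kept regardless of frame, following the wording in the text.
TRUNCATING = {"nonsense", "frameshift_del", "frameshift_ins", "inframe_del", "inframe_ins"}
SPLICE = {"splice_site"}
# Assumption: which PolyPhen classes count as deleterious is not stated in the text.
DAMAGING_POLYPHEN = {"possibly_damaging", "probably_damaging"}


@dataclass
class Variant:
    sample: str
    gene: str
    consequence: str                # e.g. "nonsense", "missense", "splice_site"
    polyphen: Optional[str] = None  # e.g. "benign", "probably_damaging"


def is_high_impact(v: Variant) -> bool:
    """Truncating/indel, splice-altering, or PolyPhen-damaging missense variants."""
    if v.consequence in TRUNCATING or v.consequence in SPLICE:
        return True
    return v.consequence == "missense" and v.polyphen in DAMAGING_POLYPHEN


def samples_with_hit(variants: Iterable[Variant], genes: Set[str]) -> Set[str]:
    """Samples carrying at least one high-impact variant in the given genes."""
    return {v.sample for v in variants if v.gene in genes and is_high_impact(v)}


variants = [
    Variant("S1", "ARID1A", "frameshift_del"),
    Variant("S2", "KMT2D", "missense", "benign"),
    Variant("S3", "TET1", "missense", "probably_damaging"),
]
hits = samples_with_hit(variants, {"ARID1A", "KMT2D", "TET1"})
print(sorted(hits))  # ['S1', 'S3']
```

Dividing the number of samples returned by the cohort size gives proportions of the kind reported below (for example 347/374 cancers with at least one deleterious mutation).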

Overall, 92.8% of cancers (347/374) had a deleterious mutation in at least one epigenetic regulator gene. Such a mutation was present in 94.7% of CIMP-H1 and 100% of CIMP-H2 cancers. The proportions of CIMP-L1, CIMP-L2 and CIMP-Neg cancers with deleterious mutations in these genes were slightly lower (93.8%, 89.5% and 93.1%, respectively), but were not significantly different from those of CIMP-H1 or CIMP-H2. Of the 719 genes interrogated, 688 (95.7%) were mutated in at least one cancer.

Figure 6 shows the most commonly mutated epigenetic regulators in each cluster. Mutations were least common in CIMP-Neg cancers, and increasing global methylation was associated with a concordant increase in epigenetic mutational load. However, when we accounted for microsatellite instability, there was no significant relationship between CIMP cluster and epigenetic mutation frequency, indicating that the differences observed between CIMP clusters may be driven by the increasing frequency of microsatellite instability in clusters with higher genomic methylation.


Figure 6: High-impact mutations in epigenetic regulator genes are frequent in cancers with higher genomic methylation. Del, deletion; Ins, insertion


CIMP-H1 and H2 subtypes have similar mutational patterns in epigenetic regulator genes

We examined the top 25 mutated epigenetic regulator genes in CIMP-H1 and CIMP-H2 to identify mutational targets common to both CIMP-H subtypes and those exclusive to either CIMP-H1 or CIMP-H2. This comparison was not confounded by MSI, which was equally represented in the two subtypes (53% of CIMP-H1, 50% of CIMP-H2). Of the genes appearing in either top 25 list, 31.6% were present in both. These shared genes included four histone lysine methyltransferases (SETD1B, KMT2A, KMT2B and KMT2D), the SWI/SNF complex member ARID1A and the chromodomain helicase DNA-binding gene CHD7. Thirteen genes were among the top 25 mutated epigenetic regulators in CIMP-H1 but not CIMP-H2; these included the DNA demethylases TET1 (mutated in 15.8% of CIMP-H1 vs 10.3% of CIMP-H2 cancers) and TET3 (26.3% vs 10.3%). Mutations in the histone lysine demethylase KDM2B were enriched in CIMP-H1 cancers (36.8% vs 7.7% of CIMP-H2 cancers, P=0.01).
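The section reports P values for two-group mutation frequency comparisons (for example KDM2B in CIMP-H1 versus CIMP-H2) without naming the test used. Fisher's exact test is a common choice for such 2×2 comparisons, so the following Python sketch assumes that test; the function name is hypothetical, and the mutated and wild-type counts per subtype would have to come from the data (they are not reproduced here).

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two subtypes; columns are mutated and wild-type counts.
    The p-value sums the probabilities of every table with the same margins
    that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x mutated cases falling in row 1.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # A small relative tolerance keeps floating-point ties with p_obs in the sum.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857 (= 17/35)
```

The check value is the textbook "lady tasting tea" table, whose two-sided p-value is 17/35; it is used only to confirm the arithmetic, not as a result from this study.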

In contrast, thirteen genes were among the top 25 mutated epigenetic regulators in CIMP-H2 but not CIMP-H1. The nuclear receptor corepressor NCOR1 was mutated in 20.5% of CIMP-H2 cancers compared with 5.3% of CIMP-H1 cancers, and the cohesin loading factor NIPBL was mutated in 15.4% of CIMP-H2 cancers but in no CIMP-H1 cancer.


Epigenetic regulator gene mutation exclusivity supports the dichotomization of CIMP-L clusters

We used a similar approach (the top 25 mutated epigenetic regulator genes) to investigate whether CIMP-L1 and CIMP-L2 cancers also target similar epigenetic regulators for somatic mutation. Eleven epigenetic regulator genes were among the top 25 in both CIMP-L1 and CIMP-L2. These included the histone lysine methyltransferases KMT2B and KMT2C, although both were mutated less frequently in CIMP-L2 cancers (KMT2B, CIMP-L1: 11.8% vs CIMP-L2: 5.7%; KMT2C, CIMP-L1: 10.5% vs CIMP-L2: 6.5%); these differences were not statistically significant. There was a non-significant trend (P=0.06) towards more frequent ASH1L mutation in CIMP-L1 (13.2%) than in CIMP-L2 cancers (4.9%). Fourteen genes were among the top 25 mutated epigenetic regulators of CIMP-L1 or CIMP-L2 alone. SETD1B, a histone lysine methyltransferase commonly mutated in CIMP-H cancers, was mutated in six CIMP-L1 cancers but in only a single CIMP-L2 cancer (P<0.01). Likewise, ARID1A was recurrently mutated in CIMP-L1 cancers (9.2%) but significantly less often in CIMP-L2 cancers (1.6%, P<0.01).

The SWI/SNF chromatin remodeling complex is frequently mutated in CIMP-H1, CIMP-H2 and CIMP-L1 cancers

Next, we examined the SWI/SNF complex (SMARCA2, ARID1A, ARID1B, ARID2, PBRM1, SMARCB1 and SMARCA4) for high-impact somatic mutations. Mutations in any SWI/SNF subunit occurred in 19.06% of cancers. ARID1A mutation was the most frequent genetic alteration of the complex (6.7%). We observed several recurrently mutated positions in ARID1A, including six frameshift deletions at codon 2141, four deletions at codon 1850 and three deletions at codon 1072 (Figure 7). ARID2 was mutated in 6% of cancers but, unlike ARID1A, had no recurrently mutated positions. The distribution of mutations between CIMP subtypes was significantly skewed towards subtypes with higher overall methylation (P<0.0001). SWI/SNF mutations were observed in 50% of CIMP-H1 cancers and 38.5% of CIMP-H2 cancers. In CIMP-L1, 26.3% of cancers carried a SWI/SNF mutation and, in contrast to CIMP-H1 and CIMP-H2, the most frequently mutated member of the complex was SMARCA4 (11%). The R885C mutation was observed in three CIMP-L1 cancers. Mutations in SWI/SNF subunits were significantly less prevalent in CIMP-L2 and CIMP-Neg cancers (10.6% and 11.6%, respectively; P<0.0001).
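Recurrently mutated positions such as the ARID1A codon 2141 deletions can be identified by tallying mutations per gene, codon and mutation type. The Python sketch below is a minimal illustration: the record layout is hypothetical, the threshold of three is an assumption suggested by the text treating three deletions at codon 1072 as recurrent, and the toy records mirror the ARID1A counts quoted above plus two hypothetical, non-recurrent ARID2 entries.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# One record per high-impact mutation in one cancer: (gene, codon, mutation type).
Mutation = Tuple[str, int, str]


def recurrent_positions(mutations: Iterable[Mutation], min_count: int = 3) -> Dict[Mutation, int]:
    """Return positions mutated at least `min_count` times, most frequent first."""
    counts = Counter(mutations)
    return {key: n for key, n in counts.most_common() if n >= min_count}


toy = (
    [("ARID1A", 2141, "frameshift_del")] * 6
    + [("ARID1A", 1850, "del")] * 4
    + [("ARID1A", 1072, "del")] * 3
    + [("ARID2", 100, "nonsense"), ("ARID2", 250, "nonsense")]  # hypothetical
)
for (gene, codon, kind), n in recurrent_positions(toy).items():
    print(gene, codon, kind, n)
```

Keying on the mutation type as well as the codon keeps, for example, a frameshift and a missense change at the same codon as separate hotspots; dropping the type from the key would merge them.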


Figure 7: High impact mutations in ARID1A are common in colorectal adenocarcinomas. Del, deletion; Ins, insertion.


Synthetic lethality within the SWI/SNF complex has previously been established 24. CIMP-H1, CIMP-H2 and CIMP-L1 cancers may therefore be more vulnerable to treatments targeting the remaining, unmutated subunits of the SWI/SNF complex. To test whether one SWI/SNF mutation confers dependency on other SWI/SNF subunits in vitro, we correlated exome-capture data from 15 colorectal cancer cell lines 25 with cell line dependency data from Meyers et al. 26. Five cell lines had a truncating ARID1A mutation, and these were significantly more dependent on ARID1B expression for survival (0.31 vs 0.06, P<0.05).
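The text does not state which test produced P<0.05 for the ARID1B dependency comparison. With only 15 cell lines, one option is an exact permutation test, which enumerates every way of labeling five of the 15 lines as mutant (3,003 labelings). The Python sketch below assumes that test, assumes a higher score means greater dependency (as the 0.31 vs 0.06 values suggest), and uses hypothetical names and scores; it illustrates the idea rather than reproducing the analysis performed.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def exact_permutation_pvalue(mutant: Sequence[float], other: Sequence[float]) -> float:
    """One-sided exact permutation test that the mutant group has the higher mean.

    Every way of labeling len(mutant) of the pooled cell lines as "mutant" is
    enumerated; the p-value is the fraction of labelings whose mean difference
    is at least as large as the observed one.
    """
    pooled = list(mutant) + list(other)
    k, n_other = len(mutant), len(other)
    total = sum(pooled)
    observed = mean(mutant) - mean(other)
    hits = n_labelings = 0
    for idx in combinations(range(len(pooled)), k):
        s = sum(pooled[i] for i in idx)
        diff = s / k - (total - s) / n_other
        n_labelings += 1
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return hits / n_labelings


# Hypothetical dependency scores: 5 mutant and 10 other cell lines (15 in total).
mutant = [0.50, 0.45, 0.40, 0.35, 0.30]
other = [0.01 * i for i in range(10)]
p = exact_permutation_pvalue(mutant, other)  # 1/3003
```

Because the five hypothetical mutant lines are the five most dependent, only the observed labeling is as extreme, giving the smallest attainable p-value of 1/3003. Full enumeration is practical at this sample size; larger panels would need random sampling of labelings instead.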

The frequency of mutations in chromodomain helicase DNA-binding genes is associated with DNA methylation

Chromodomain helicase DNA-binding (CHD) genes constitute another family of chromatin remodelers. High-impact CHD family mutations were present in 22.4% of colorectal cancers in the TCGA and were markedly more common in CIMP-H1 and CIMP-H2 cancers, in which family members were mutated in 50% and 51.3% of cases, respectively. CHD7 was the most frequently altered gene in CIMP-H1 (33% of cancers), and CHD8 in CIMP-H2 (22%). CHD mutations were less common, but still frequent, in CIMP-L1 cancers (19.7%), in which CHD4 was the most commonly mutated gene (8%). The frequency of CHD mutations generally declined with decreasing DNA methylation, although CIMP-L2 cancers (11.7%) had a lower frequency than CIMP-Neg cancers (15%).
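Family-level frequencies such as these (and the SWI/SNF frequencies above) count a cancer once if any family member carries a high-impact mutation. A minimal Python sketch of that tally follows; the input layout, function name and toy samples are hypothetical, and passing the SWI/SNF subunit list instead of the CHD family gives the analogous SWI/SNF frequencies.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

CHD_FAMILY = {f"CHD{i}" for i in range(1, 10)}  # CHD1 to CHD9


def family_frequency_by_cluster(
    family_hits: Iterable[Tuple[str, str]],  # (sample, gene) pairs for high-impact mutations
    cluster_of: Dict[str, str],              # sample -> CIMP cluster label
    family: Set[str],
) -> Dict[str, float]:
    """Fraction of samples in each cluster with at least one mutation in `family`."""
    mutated = {sample for sample, gene in family_hits if gene in family}
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for sample, cluster in cluster_of.items():
        totals[cluster] += 1
        counts[cluster] += int(sample in mutated)  # each sample counted at most once
    return {cluster: counts[cluster] / totals[cluster] for cluster in totals}


clusters = {"S1": "CIMP-H1", "S2": "CIMP-H1", "S3": "CIMP-Neg", "S4": "CIMP-Neg"}
toy_hits = [("S1", "CHD7"), ("S1", "CHD8"), ("S3", "ARID1A")]
print(family_frequency_by_cluster(toy_hits, clusters, CHD_FAMILY))
# {'CIMP-H1': 0.5, 'CIMP-Neg': 0.0}
```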

We examined the CHD genes for recurrently mutated positions. At the CHD7 locus, which was mutated in 5.5% of cancers, we observed five frameshift deletions (D2988fs del 3) at the 3’ end of the gene. This mutation has been observed in a number of colorectal cancer cell lines. For CHD3, CHD4 and CHD9, we observed recurrently mutated positions at R540fs del 16, R975H and F760fs del 16, respectively.

Discussion

Remodeling of the epigenome is fundamental to colorectal cancer progression, and the DNA methylation landscape is one of the most commonly altered epigenetic features during carcinogenesis. Here, we aimed to better understand the extent and heterogeneity of aberrant DNA methylation in colorectal cancers, and to characterize the interplay between DNA methylation, somatic variation in epigenetic regulator genes, and gene transcription. Through genome-scale interrogation of the largest unselected, consecutive series of colorectal cancers to date, we identified five clinically and molecularly distinct DNA methylation subtypes. These subtypes are highly correlated with key clinical and molecular features, including patient age, tumor location, microsatellite instability and oncogenic MAPK pathway mutations. We show that cancers with high DNA methylation more frequently carry mutations in genes involved in epigenetic regulation, particularly those implicated in chromatin remodeling.

Hinoue and colleagues previously reported four colorectal cancer methylation subgroups by assessing 125 colorectal cancers using Illumina 27K DNA methylation arrays 27. In the present study, we have considerably increased the power to identify subgroups based on differential methylation by studying 216 unselected cancers using the Illumina 450K DNA methylation platform, which assesses more than ten times as many CpG sites and can therefore identify methylation subtypes more robustly. A major difference in our study is the identification of two discrete CIMP-High subtypes, CIMP-H1 and CIMP-H2. Dichotomizing CIMP-H cancers identified a homogeneous CIMP-H1 subgroup with an average patient age of 75 years and a striking over-representation of female patients and of BRAF mutant cancers arising in the proximal colon. The newly identified CIMP-H2 subtype encompasses more KRAS mutant cancers than CIMP-H1, and the majority of cancers in this subtype would be classified as CIMP-Low by the five-marker CIMP panel proposed by Weisenberger et al., whereas our genome-scale analyses of both our cohort and the TCGA indicate that they are CIMP-High. Together, the CIMP-H1/H2 clusters represent 21% of our unselected cohort and 16.3% of the TCGA cohort. Collectively, these findings indicate that CIMP is more prevalent than previously thought, and that classification using existing panels may not identify all CIMP-High colorectal cancers.

We observed a consistent increase in patient age across CIMP clusters, from 62 years in CIMP-Neg cancers to 75 years in CIMP-H1 cancers. This contrasts with the findings of Hinoue et al. 6. The variance in our assay was mostly contained in uniquely mapping probes that were not present on the 27K array employed by Hinoue et al. Numerous studies have demonstrated age-related methylation in different tissues 9,28,29, and we have previously identified hypermethylated loci even in the colons of patients with no history of colonic disease 9. In the present study, we detected a significant correlation between methylation and patient age. After removing all probes that were significantly hypermethylated in normal mucosal tissue, we still observed distinct, age-linked clustering. This association was reproduced in cancers from The Cancer Genome Atlas.
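The statistic behind the correlation between methylation and patient age is not specified here. A rank-based (Spearman) correlation between age and a per-sample methylation summary, such as the mean beta value across probes, is one reasonable choice; the Python sketch below assumes that approach, and the summary, function names and values are all hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical patient ages and per-sample mean methylation (beta) values.
ages = [48, 55, 61, 70, 79]
mean_beta = [0.21, 0.24, 0.24, 0.31, 0.38]
print(round(spearman_rho(ages, mean_beta), 3))  # 0.975
```

A rank-based statistic is less sensitive than Pearson's r to a few extreme methylation values; significance would still need to be assessed separately (for example by permutation).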


The subtype with the highest degree of methylation (CIMP-H1) was strongly associated with mutations in the BRAF oncogene. BRAF mutations are a hallmark of the serrated neoplasia pathway, and their presence indicates that these cancers probably arose in serrated precursor lesions. We have previously shown that the colonoscopic incidence of sessile serrated adenomas does not differ between patients in their 30s and much older patients, whereas BRAF mutant cancers are restricted to older individuals 30, suggesting that these BRAF mutant polyps may have limited malignant potential in young patients. We have also reported a striking association between patient age and CIMP in sessile serrated adenomas 31. Here we report that the vast majority of BRAF mutant cancers in both the RBWH and TCGA cohorts are CIMP-H and arise in older individuals. Collectively, these findings suggest that sessile serrated adenomas may be relatively benign in young patients, whereas in older patients with more advanced DNA methylation changes in the colon, the risk of progression to cancer may be significantly greater. We recently recapitulated this process in a murine model of serrated neoplasia, showing that early onset Braf mutation leads to the temporal accumulation of DNA methylation and ultimately to malignancy 32. Additional studies are necessary to fully determine the natural history of BRAF mutant cancers and to elucidate the determinants of malignant potential, in order to inform the development of patient-centric surveillance for young and older patients who present with sessile serrated adenomas.

Differential CpG island and shore hypermethylation were the most frequently observed methylation events in the study. Probes on the north and south CpG shelves, as well as those in the open seas, were frequently hypomethylated across most cancers. The implications of hypomethylated CpG dinucleotides outside of CpG islands are unclear. We did not observe any relationship between hypomethylation and gene transcription; however, hypomethylation of specific regions of the genome may affect chromatin accessibility elsewhere and hence modulate transcription in trans. Open sea hypomethylation was also the most frequent methylation event in CIMP-Neg cancers. These are predominantly conventional pathway cancers with a high degree of chromosomal instability. One hypothesis that may explain this association is that hypomethylation outside of CpG islands predisposes to copy number changes in these cancers 33,34. Functional studies are necessary to explore the implications of shelf and open sea hypomethylation and whether it is relevant to the development of these cancers.


There were marked differences in transcriptional deregulation of key cancer-related pathways between methylation clusters. CIMP-H1 cancers activated several immune pathways, including those involved in the interferon response, inflammatory response and complement signaling, consistent with the over-representation of CMS1 cancers in this group. This is likely due to the higher mutational burden in these cancers, largely driven by the increased incidence of epigenetically induced microsatellite instability. MSI cancers have been associated with greater immune infiltrate, and hence some of this signaling may originate in stromal immune cells rather than from within the tumor itself 35. In the RBWH cohort, CIMP-H2 cancers were uniquely enriched for altered bile acid metabolism, consistent with the previously described silencing of the farnesoid X receptor, a bile acid receptor, in KRAS mutant cancers 36. Bile acids are more concentrated in the proximal colon, and their metabolism is influenced by the gut microbiome 37. The increased bile acid metabolism signaling in this group may identify a subset of cancers that arose due to aberrant bile acid accumulation. We did not observe such an effect in the TCGA cohort, possibly because of the higher frequency of BRAF mutant MSI cancers among TCGA CIMP-H2 cancers. A better understanding of the role of bile acid signaling in KRAS mutant cancers of the proximal colon may have therapeutic implications for this subgroup.

Paradoxically, despite observing less differential methylation, we observed an increase in gene silencing that correlated with promoter hypermethylation in the least methylated cancer clusters. This may indicate that promoter hypermethylation in CIMP-L1/2 and CIMP-Neg cancers is more specifically selected on the basis of a functional advantage. Alternatively, the increased frequency of mutations in epigenetic regulators in CIMP-H1/2 cancers may reduce the capacity to induce gene repression at certain loci, for example through loss of a repressive histone modifying enzyme or mutation of locus-specific repressive transcription factors. Methylation alone may be insufficient to induce gene repression in certain instances; instead, appropriate chromatin remodeling and histone modifications, such as deposition of the repressive PRC2 mark (H3K27me3), may be required in tandem with methylation changes to reduce gene expression. Indeed, we showed that PRC2 occupancy was most frequently related to transcriptionally repressed and methylated genes in the CIMP-Neg subgroup. We also observed instances of promoter methylation that correlated with increased gene transcription. It is possible that some transcription factors preferentially bind methylated DNA 38, and that binding sites for these transcription factors become available following promoter methylation. These data may indicate that the genomic context of methylation is important in determining whether gene expression changes occur. In TCGA, however, we were unable to discern any significant differences between CIMP subtypes in the proportion of methylated genes that were also repressed. This may be due to technological differences between the array-based methods used to evaluate gene transcription in the current study and the RNA-sequencing based methods employed in TCGA. Direct comparisons between the expression values derived from these studies are difficult and should be approached with caution.

A major novel finding of the current study is that gene body methylation may be a major driver of serrated tumorigenesis, and that this may be mediated by H3K27me3 histone marks. Gene body hypermethylation has recently been correlated with increased oncogene expression 22. Here we identified many well-characterized oncogenes, such as BCL2 and ZEB1, whose gene bodies were methylated in CIMP-H1/2 cancers, and noted a significant preference for gene body methylation of oncogenes over tumor suppressor genes. We also identified Wnt pathway antagonists that are resistant to gene body methylation. In the present study, we did not identify distinct transcriptional differences in these Wnt pathway antagonists. Gene body methylation may affect other aspects of the transcriptional process that were not assessed in this study, such as splicing and isoform switching. Alternatively, this gene body methylation may be a stochastic consequence of the overall increase in aberrant DNA methylation in these cancers.

The epigenome is regulated by proteins that interact with histones or DNA. We assessed the coding sequence of 719 epigenetic regulator genes in the TCGA dataset. The chromodomain helicase DNA-binding (CHD) family was a frequent mutational target in CIMP-H1 cancers. Recently, Fang et al. showed that CHD8 operates in a transcriptional repression complex to direct methylation in the setting of BRAF mutation 39. In the current study, BRAF and CHD8 mutations were associated with CIMP-H1. These data suggest that CHD8 mutation may enhance repression complex activity in the setting of BRAF mutation, resulting in hypermethylation. Moreover, CHD8 has been associated with the CTCF protein, which is essential for promoter-enhancer looping and regional insulation. CHD8 mutations may influence CIMP by decreasing the ability of CTCF to insulate regions of the genome, which could encourage the spread of methylation throughout the genome 40. We also report frequent mutations in other members of the CHD family. CHD7 was the most frequently mutated CHD gene, and some positions in the CHD7 locus were recurrently mutated. Tahara and colleagues identified mutations in CHD7 and CHD8 in 42% of CIMP1 colorectal cancers 41. The functional consequences of CHD7 mutations are unclear. In pancreatic ductal adenocarcinoma, CHD7 expression has been shown to correlate with gemcitabine sensitivity 42. The most commonly mutated CHD gene in CIMP-L1 cancers was CHD4. Xia et al. (2017) 43 proposed an oncogenic role for CHD4 through facilitating the hypermethylation of tumor suppressor genes. In contrast, Li et al. (2018) 32 showed that CHD4 mutations that promote protein degradation enhance stemness and contribute to the progression of endometrial cancers via the TGF-β signaling cascade. Indeed, we identified three mutations at the R975H hotspot of CHD4 studied by Li et al. (2018), and a number of other mutations predicted to be damaging. Our data cannot determine whether these mutations promote the hypermethylation proposed by Xia et al. (2017), supporting an oncogenic role for the protein, or whether the enhanced protein degradation and increased stemness proposed by Li et al. are their predominant consequence.

Chromatin remodeling is an essential process whereby condensed heterochromatin is modified in a context-specific manner to give rise to regions of euchromatin that can be actively transcribed. Chromatin remodeling is driven by a series of complexes that enzymatically catalyze reactions that modify histone tails and, in turn, modulate chromatin accessibility. In mammalian cells, five key chromatin remodeling complexes predominate: the chromodomain helicase DNA-binding (CHD) complex, the INO80 complex, the SWI/SNF complex, the ISWI complex and the NuRD complex 44. Here, we examined the frequency of mutations in the SWI/SNF complex, which has previously been shown to be perturbed in various cancers. Half of CIMP-H1 and >25% of CIMP-H2 and CIMP-L1 cancers harbored somatic SWI/SNF mutations predicted to be deleterious. We hypothesized that mutation of one subunit of the complex would increase the reliance of the cancer on other, otherwise redundant, subunits. To test this hypothesis, we combined public colorectal cancer cell line dependency data with mutational data, and identified a strong dependency on ARID1B following genetic perturbation of ARID1A. These data support the investigation of SWI/SNF inhibitors to exploit the synthetic lethality presented by SWI/SNF mutations in CIMP-L1 cancers. While we have shown associations between genomic methylation and SWI/SNF mutations, and between SWI/SNF mutations and synthetic lethality, functional causation is difficult to infer from our study. Collectively, these data indicate a need for further functional experiments to elucidate the role of these mutations in the carcinogenic process of CIMP-H1, CIMP-H2 and CIMP-L1 cancers, and to determine whether the potential synthetic lethalities they create can be exploited.

We leveraged publicly available DNA methylation data from The Cancer Genome Atlas project to validate findings in our consecutive cohort. Key findings, including relationships between CIMP subtype and age, proximal location, BRAF mutation and KRAS mutation, were also identified in the TCGA data. In our unselected, consecutively collected series, we observed a strong relationship between BRAF mutation and CIMP-H1, and between KRAS mutation and CIMP-H2. While BRAF mutation was still enriched in TCGA CIMP-H1 cancers, and KRAS mutation in CIMP-H2 cancers, we observed a higher proportion of BRAF mutant CIMP-H2 cancers in the TCGA cohort. This increased proportion of BRAF mutant CIMP-H2 cancers skewed the TCGA CIMP-H2 subtype towards microsatellite instability and the CMS1 subtype. Notably, >40% of CIMP-H2 cancers in the validation cohort are KRAS mutant, and of these, the majority are microsatellite stable and follow CMS patterns similar to those observed in our consecutive series. The discrepancies between the two cohorts may be due to differences in cohort composition. The mean patient age in our study was 3.4 years higher than in the TCGA cohort. In our cohort, cancers most often arose in the distal colon, as is typical for colorectal cancers 45, whereas the TCGA cohort had a marked over-representation of proximal cancers (47.7%).

It is important to recognize the limitations of our study. First, samples were collected consecutively only where sufficient tissue was available for DNA and RNA analyses. This excludes very small cancers and cancers in patients for whom surgery was not possible. This introduces a slight bias; however, it is standard practice and unavoidable in studies of this nature. As technologies improve and analyses become possible on smaller amounts of tissue, it will be important to replicate the key findings of this study. Moreover, because we collected fresh tissue, we were not able to assess tumor purity. One alternative would have been to analyze formalin-fixed paraffin-embedded (FFPE) samples, which would permit accurate histological assessment of sample purity. While the Illumina HM450 platform and newer platforms such as the EPIC arrays are amenable to FFPE-derived DNA, co-extraction of high-quality RNA from FFPE tissue remains challenging. The findings of this study are largely correlative, and we cannot infer causation from our data. In-depth mechanistic follow-up is necessary to fully examine many of the key associations identified in the present study.


Another limitation of our study is the use of normal mucosal samples from patients with cancer. Field DNA methylation defects have been reported in colorectal cancer 46, so we cannot exclude the possibility that field defects affected our analysis. In addition, we performed all analyses on bulk tissue samples, and therefore profiled the DNA methylome and transcriptome of an aggregate of cells that includes epithelial, immune and stromal cells. The interplay between these cell types is crucial, and some of the expression and methylation differences observed here may be driven by any of the cell types in the bulk sample.

Conclusion

The past decade has heralded an era in which the importance of the cancer epigenome is increasingly recognized, and in which treatments targeting different epigenetic modifications are entering the clinic and improving patient outcomes. While early strategies targeting epigenetic modifications in colorectal cancers have largely proved ineffective, it has become apparent that a comprehensive understanding of the epigenetic drivers of cancer will be crucial for the rational design of clinical trials and the development of precision medicine strategies. Here we have identified five clinically and molecularly distinct subgroups based on a comprehensive assessment of a large, unselected series of colorectal cancer methylomes. We have validated these subtypes in an additional cohort of 374 cancers from TCGA. In contrast to earlier studies, we identify two clinically and molecularly distinct CIMP-H clusters. We observe a striking association between genomic methylation and age, which further supports investigation of the epigenetic clock in serrated neoplasia risk. We identify an association between gene body methylation and CIMP-H cancers, which may be mediated by H3K27me3 histone marks. Our interrogation of the coding regions of epigenetic regulatory genes shows that they are frequently mutated in colorectal cancers, and that this may be partially influenced by the degree of genomic methylation. Our analyses have identified potentially druggable vulnerabilities in cancers of different methylation subtypes. Inhibitors targeting synthetic lethalities, such as SWI/SNF component inhibitors for cancers with ARID mutations, should be evaluated, as these agents may benefit certain patient subsets.


Methods

Patient samples

Colorectal cancer (N = 216) and matched normal (N = 32) samples were obtained from patients undergoing surgery at the Royal Brisbane and Women’s Hospital, Brisbane, Australia, in a consecutive manner between 2009 and 2012. Tissue was snap-frozen in liquid nitrogen to preserve sample integrity. Written informed consent was obtained from each patient. The study protocol was approved by the Royal Brisbane and Women’s Hospital and QIMR Berghofer Medical Research Institute Research Ethics Committees. The Cancer Genome Atlas (TCGA) colon adenocarcinoma (COAD) exome and methylation data (N = 278) were used for independent validation 16.

DNA and mRNA extractions

DNA and mRNA were simultaneously extracted from approximately 30 mg of homogenized tissue using the AllPrep DNA/RNA Kit (Qiagen, Australia) in accordance with the manufacturer’s protocols. Double-stranded DNA concentration was assessed using the PicoGreen quantitation assay (Molecular Probes, USA). mRNA quality was measured using the Bioanalyzer 2100 platform (Agilent, USA). Microarray analysis was performed on samples with an RNA integrity number (RIN) >7.

Molecular characterization of cancer samples

Cancer sample DNA was analyzed for the BRAF V600E mutation using allelic discrimination as previously reported 47. In addition, we assayed mutations in KRAS codons 12 and 13, and TP53 exons 4 to 8, using previously reported methods 48,49. We assessed CIMP status by methylation-specific PCR using the five-marker panel (CACNA1G, IGF2, NEUROG1, RUNX3 and SOCS1) proposed by Weisenberger et al. 3. Samples were considered CIMP-high if ≥3 markers were methylated, CIMP-low if 1 or 2 markers were methylated, and CIMP-negative if no markers were methylated. MSI was assessed using the criteria of Nagasaka et al. 50, where instability in ≥1 mononucleotide marker and ≥1 additional non-mononucleotide marker from the marker set reported by Boland et al. 51 was indicative of MSI; the remainder were classified as microsatellite stable (MSS). LINE1 methylation was assessed by pyrosequencing as per Irahara et al. 52. CIMP-high cancers that were wild-type for both KRAS and BRAF at hotspot codons were Sanger sequenced for BRAF exons 11 and 15 (exon 11: forward 5’-TTCCTGTATCCCTCTCAGGCA-3’, reverse 5’-AAAGGGGAATTCCTCCAGGTT-3’; exon 15: forward 5’-GGAAAGCATCTCACCTCATCCT-3’, reverse 5’-TAGAAAGTCATTGAAGGTCTCAACT-3’), KRAS codon 61 (forward 5’-TCCAGACTGTGTTTCTCCCTTC-3’, reverse 5’-TGAGATGGTGTCACTTTAACAGT-3’), and EGFR exon 18 (forward 5’-ATGTCTGGCACTGCTTTCCA-3’, reverse 5’-ATTGACCTTGCCATGGGGTG-3’).
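The CIMP and MSI calling rules described above reduce to simple marker counts. The sketch below is illustrative only: it assumes per-marker calls (methylated, or unstable) have already been made, the function names are hypothetical, and the panel constant lists the five Weisenberger et al. markers as published in reference 3. BAT25 and BAT26, the mononucleotide markers of the Boland et al. panel, are used in the usage example.

```python
# Five-marker CIMP panel as published by Weisenberger et al. (reference 3).
WEISENBERGER_PANEL = ("CACNA1G", "IGF2", "NEUROG1", "RUNX3", "SOCS1")


def cimp_status(methylated_markers):
    """Classify CIMP from the set of panel markers scored as methylated."""
    n = sum(1 for marker in WEISENBERGER_PANEL if marker in methylated_markers)
    if n >= 3:
        return "CIMP-high"
    if n >= 1:
        return "CIMP-low"
    return "CIMP-negative"


def msi_status(unstable_markers, mononucleotide_markers):
    """MSI if >=1 mononucleotide AND >=1 non-mononucleotide marker are unstable."""
    mono = sum(1 for m in unstable_markers if m in mononucleotide_markers)
    other = sum(1 for m in unstable_markers if m not in mononucleotide_markers)
    return "MSI" if mono >= 1 and other >= 1 else "MSS"


if __name__ == "__main__":
    mono = {"BAT25", "BAT26"}
    print(cimp_status({"CACNA1G", "IGF2", "SOCS1"}))  # CIMP-high
    print(msi_status({"BAT26", "D2S123"}, mono))       # MSI
    print(msi_status({"BAT25", "BAT26"}, mono))        # MSS
```

Under this rule, instability restricted to mononucleotide markers alone is classified as MSS.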

DNA methylation microarray

Genome-scale DNA methylation was measured using the HumanMethylation450 BeadChip array (Illumina, USA), which interrogates cytosine methylation at >480,000 CpG sites. DNA (500 ng) was bisulphite converted using the EZ-96 DNA Methylation Kit (Zymo Research, USA) as per the manufacturer’s protocol. Whole-genome amplification and enzymatic fragmentation were performed on the converted DNA, which was subsequently hybridized to the array at 48°C for 16 hours. Arrays were scanned using the iScan System (Illumina, USA).

Gene expression microarray

Gene expression levels for over 47,000 transcripts were measured for all samples using the HumanHT-12 v3 Expression BeadChip array (Illumina, USA). Total mRNA (500 ng) was reverse-transcribed, amplified and biotinylated using the TotalPrep-96 RNA Amplification Kit (Illumina, USA). The labelled cRNA (750 ng) was hybridized to the array, followed by washing, blocking, and staining with streptavidin-Cy3. Arrays were scanned on the iScan System and the data were extracted using GenomeStudio Software (Illumina, USA).

Data analysis


Methylation microarray data were checked for quality against parameters provided by Illumina using the GenomeStudio software package. IDAT files were read into the R environment using Limma 53. We used subset-quantile within array normalization (SWAN) to correct for biases resulting from the Type I and Type II probe designs on the array. We used the BEclear R package to assess probe-level batch effects and excluded probes that were significantly batch-affected (n=1,072) from downstream analysis. We also removed probes with a detection P > 0.05 in >50% of samples, probes on the X or Y chromosomes, probes where the CpG site was within 10 bp of a single nucleotide polymorphism, and probes that mapped ambiguously to the genome. After filtering, 377,612 probes remained and were used for subsequent analyses.
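The probe-level exclusion criteria above can be expressed as a single filter. This is a minimal Python sketch, not the pipeline used here: it assumes each probe already carries its detection P values and hypothetical boolean annotation flags (SNP proximity, ambiguous mapping, batch effect), which in practice would come from the array manifest and the BEclear output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Probe:
    """Per-probe QC annotations (record layout and field names are hypothetical)."""
    probe_id: str
    chrom: str
    detection_p: List[float]          # one detection P value per sample
    snp_within_10bp: bool = False     # CpG site within 10 bp of a known SNP
    ambiguous_mapping: bool = False   # probe maps to more than one genomic location
    batch_affected: bool = False      # flagged by a batch-effect test such as BEclear


def passes_qc(probe: Probe, p_cutoff: float = 0.05, max_fail_fraction: float = 0.5) -> bool:
    # Drop probes whose detection P exceeds the cutoff in more than half of samples.
    failed = sum(1 for p in probe.detection_p if p > p_cutoff)
    if failed / len(probe.detection_p) > max_fail_fraction:
        return False
    # Drop sex-chromosome probes.
    if probe.chrom in {"chrX", "chrY"}:
        return False
    # Drop SNP-adjacent, ambiguously mapped and batch-affected probes.
    return not (probe.snp_within_10bp or probe.ambiguous_mapping or probe.batch_affected)


def filter_probes(probes: List[Probe]) -> List[Probe]:
    return [p for p in probes if passes_qc(p)]
```

A probe failing detection in exactly half of the samples is retained, since the criterion is "more than 50%".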

The recursively partitioned mixture model (RPMM) clustering method 54 was used for unsupervised clustering. To capture cancer-specific methylation, we followed the methods employed by The Cancer Genome Atlas 55. DNA methylation drift with age has been characterized in a number of normal and cancerous tissues 10. To limit confounding from age-related methylation, probes with a mean β value of >0.3 in normal samples were excluded from clustering analysis. A total of 144,542 probes were unmethylated (mean β value <0.3) in normal mucosa; of these, the 5,000 probes with the greatest variance in the tumor samples were selected for clustering. The RPMM clustering method is particularly suited to the analysis of methylation data generated from the HumanMethylation450 array because output β values fall between 0 and 1 and can be modelled as a mixture of beta distributions 54. For validation, we accessed level 1 DNA methylation data from The Cancer Genome Atlas project and performed an identical analysis.
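The two-step probe selection that precedes RPMM clustering can be sketched as follows. This assumes β values are held in plain dictionaries keyed by probe ID (a hypothetical layout) and does not implement RPMM itself, which is available as the RPMM R package; it only reproduces the filter (mean β <0.3 in normal mucosa) and the variance ranking.

```python
import statistics


def select_clustering_probes(normal_beta, tumour_beta, max_normal_mean=0.3, n_top=5000):
    """Return up to n_top probe IDs for clustering.

    normal_beta / tumour_beta: dict mapping probe ID -> list of beta values,
    one per normal or tumour sample (hypothetical input layout; each list
    needs at least two values for the variance calculation).
    """
    # Step 1: keep probes unmethylated in normal mucosa (mean beta < 0.3),
    # removing loci already methylated in normal tissue, e.g. through age-related drift.
    unmethylated = [
        pid for pid, values in normal_beta.items()
        if statistics.fmean(values) < max_normal_mean
    ]
    # Step 2: rank the remaining probes by sample variance across tumours.
    ranked = sorted(
        unmethylated,
        key=lambda pid: statistics.variance(tumour_beta[pid]),
        reverse=True,
    )
    return ranked[:n_top]
```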

For motif analysis, the CentriMo tool was used 21. CentriMo identifies overrepresented motifs within sequences and correlates these with known DNA-protein binding motifs 21. β values were transformed to M values using M = log2[β/(1−β)]. For differential methylation analysis versus the subset of normal mucosal samples, a probe was considered differentially methylated in a comparison if its Benjamini-Hochberg adjusted P value was <0.05 and it had an average absolute Δβ ≥ 0.2 versus normal mucosal samples. For examination of methylation in oncogenes and tumor suppressor genes, we consulted the NCG6.0 cancer gene database 23. For these analyses we included only cancer genes that were annotated in NCG6.0 without ambiguity (i.e., not annotated as both tumor suppressor genes and oncogenes) and that were probed on the array. When describing the context of DNA methylation, we describe CpGs as residing in CpG “shores” if they are positioned within 2 kb of a CpG island, in CpG “shelves” if they are positioned 2-4 kb from a CpG island, and in the “open sea” if they are >4 kb from the nearest CpG island. CpGs were further described as north or south based on whether they lay upstream (5’/north) or downstream (3’/south) of the nearest CpG island.
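The transformation and thresholds in this paragraph can be illustrated with a short sketch: the M-value transform, Benjamini-Hochberg adjustment, the combined FDR and |Δβ| rule for calling a probe differentially methylated, and the shore/shelf/open-sea classification. The clamping constant in beta_to_m is an assumption added to avoid log(0); in practice the per-probe P values would come from a moderated test such as limma's, which is not reproduced here, and CpG context would normally be taken from the array annotation.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """M = log2(beta / (1 - beta)); beta is clamped away from 0 and 1 (assumption)."""
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of adjusted values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def is_differentially_methylated(adj_p, mean_beta_group, mean_beta_normal,
                                 fdr=0.05, min_abs_delta=0.2):
    """FDR < 0.05 and average absolute delta-beta >= 0.2 versus normal mucosa."""
    return adj_p < fdr and abs(mean_beta_group - mean_beta_normal) >= min_abs_delta


def cpg_context(pos, island_start, island_end):
    """Classify a CpG relative to its nearest CpG island (forward-strand coordinates)."""
    if island_start <= pos <= island_end:
        return "Island"
    if pos < island_start:
        distance, side = island_start - pos, "N"  # upstream (5')
    else:
        distance, side = pos - island_end, "S"    # downstream (3')
    if distance <= 2000:
        return side + "_Shore"
    if distance <= 4000:
        return side + "_Shelf"
    return "OpenSea"
```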

Expression data were preprocessed and normalized using quantile normalization with the Limma R package. For between-group comparisons, the empirical Bayes function was used, and P values were adjusted for multiple testing using the Benjamini-Hochberg method 56 to control the false discovery rate (FDR). We examined gene expression in the TCGA by accessing level 3 expression data in FPKM format from the Genomic Data Commons. We used Limma to perform a voom transformation to account for heteroscedasticity and examined differential expression against normal colonic mucosal samples using the same methods as employed in the consecutive series. We considered an FDR of 0.05 to be the threshold for significance. For integrated expression and methylation analysis, genes were considered methylated if at least one probe within 2 kb upstream of the gene transcription start site (TSS) was differentially methylated by FDR and had an average Δβ ≥ 0.2 at that site. If a gene met this criterion and had a significant FDR-corrected P value for the cancer versus normal expression comparison, it was predicted to be influenced by methylation. Single-sample gene set enrichment analysis (ssGSEA) was used for between-group comparisons of transcriptomes 18. We used the CIBERSORT algorithm to compute the relative proportion of stromal cells within each subtype 17.
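The rule used to flag genes as methylation-influenced can be sketched as below. Assumptions: the Gene and ProbeResult records are hypothetical, probe P values are already BH-adjusted, Δβ is cancer minus normal (so only gains of methylation qualify under the "Δβ ≥ 0.2" criterion), and the 2 kb window is taken strand-aware upstream of the TSS. The sketch does not check the direction of the expression change, since the text does not specify one.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    tss: int
    strand: str  # "+" or "-"


@dataclass
class ProbeResult:
    chrom: str
    pos: int
    adj_p: float       # BH-adjusted P value, cancer vs normal
    delta_beta: float  # mean beta (cancer) minus mean beta (normal)


def upstream_window(gene, width=2000):
    """Genomic interval covering `width` bp upstream of the TSS, strand-aware."""
    if gene.strand == "+":
        return gene.tss - width, gene.tss
    return gene.tss, gene.tss + width


def promoter_methylated(gene, probes, fdr=0.05, min_delta=0.2):
    """True if at least one probe in the upstream window passes FDR and delta-beta."""
    start, end = upstream_window(gene)
    return any(
        p.chrom == gene.chrom and start <= p.pos <= end
        and p.adj_p < fdr and p.delta_beta >= min_delta
        for p in probes
    )


def methylation_influenced_genes(genes, probes, expr_adj_p, fdr=0.05):
    """Genes with a methylated upstream probe AND significant differential expression."""
    return [
        g.name for g in genes
        if promoter_methylated(g, probes, fdr) and expr_adj_p.get(g.name, 1.0) < fdr
    ]
```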

The CMSclassifier R package was used to classify cancers into consensus molecular subtypes (CMS) as previously reported 8.

To examine the mutational frequency of epigenetic regulators, level 3 somatic variant data were downloaded from the Genomic Data Commons portal. Silent variants were discarded, and epigenetic regulator genes were subset using the EpiFactors database (Thesis Appendix 1) 57. We assessed the potential pathogenicity of missense mutations using the PolyPhen2 tool 58. PolyPhen2 predicts the functional effects of missense mutations by examining how evolutionarily conserved the affected residue is, and computes the likelihood that the substitution will induce a structural change. Only variants predicted to be probably or possibly damaging were retained; variants predicted to be benign were excluded from these analyses.
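The variant filtering steps can be sketched over MAF-style records. This assumes GDC-style column names (Hugo_Symbol, Variant_Classification, and a PolyPhen field formatted like "probably_damaging(0.998)"); treat these as assumptions and check them against the actual file. Because PolyPhen2 only scores missense changes, the sketch also assumes that non-silent, non-missense variants (for example, truncating mutations) are retained without a PolyPhen filter.

```python
DAMAGING = {"probably_damaging", "possibly_damaging"}


def polyphen_class(value):
    """Strip the numeric score from a PolyPhen string such as 'benign(0.001)'."""
    return value.split("(", 1)[0] if value else ""


def filter_epigenetic_variants(variants, epigenetic_genes):
    """Keep non-silent variants in epigenetic regulators; missense only if damaging.

    variants: iterable of dicts using MAF-style keys (assumed column names).
    epigenetic_genes: set of gene symbols, e.g. taken from EpiFactors.
    """
    kept = []
    for v in variants:
        if v["Hugo_Symbol"] not in epigenetic_genes:
            continue
        cls = v["Variant_Classification"]
        if cls == "Silent":
            continue
        if cls == "Missense_Mutation" and polyphen_class(v.get("PolyPhen", "")) not in DAMAGING:
            continue  # benign or unscored missense variants are dropped
        kept.append(v)
    return kept
```

Usage, with toy records: a nonsense mutation and a probably damaging missense mutation in epigenetic regulators survive, while silent, benign and non-regulator variants are dropped.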


PRC2 and Methylation overlap analysis

Polycomb occupancy was inferred from SUZ12 ChIP-seq data from hESC1 cells analysed as part of the ENCODE consortium 59. SUZ12 was chosen as a surrogate for PRC2 occupancy, as previous studies indicate that it is an essential subunit of the PRC2 complex 20,60. The overlap function within BEDTools 61 was used to overlap the probes differentially methylated in each cluster (versus normal) with regions bound by SUZ12 in hESC1 cells, producing a list of regions where methylation and PRC2 occupancy co-occurred.
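BEDTools performs the overlap step directly; the sketch below only illustrates the underlying logic in Python, assuming SUZ12 peaks in BED half-open coordinates and probe CpG positions as 0-based single bases. Sorting peaks by start and keeping a running maximum of peak ends lets each probe be tested with one binary search, including when peaks are nested. The function names and input tuples are hypothetical.

```python
import bisect
from collections import defaultdict


def build_peak_index(peaks):
    """peaks: iterable of (chrom, start, end) in BED half-open coordinates."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        # prefix_max_end[k] = largest end among the first k+1 peaks (sorted by start)
        prefix_max_end, running = [], float("-inf")
        for _, e in intervals:
            running = max(running, e)
            prefix_max_end.append(running)
        index[chrom] = (starts, prefix_max_end)
    return index


def overlaps_any(index, chrom, start, end):
    """True if the half-open query [start, end) overlaps at least one indexed peak."""
    if chrom not in index:
        return False
    starts, prefix_max_end = index[chrom]
    k = bisect.bisect_left(starts, end)  # number of peaks starting before the query ends
    return k > 0 and prefix_max_end[k - 1] > start


def probes_in_peaks(probes, peaks):
    """probes: iterable of (probe_id, chrom, pos) with 0-based CpG positions."""
    index = build_peak_index(peaks)
    return [pid for pid, chrom, pos in probes if overlaps_any(index, chrom, pos, pos + 1)]
```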

Synthetic Lethality Analysis

Cell line dependency data from Meyers et al. 26 were correlated with colorectal cancer cell line mutation data 25. Synthetic lethal relationships were inferred if a high-impact mutation (a truncating or splice-site mutation) occurred in one subunit of a molecular complex and the cell line had relatively higher dependence values on other subunits when compared with cell lines that lacked the mutation. Cell lines were grouped into those with and without a mutation in a specific gene, and a Student’s t-test was performed on the dependence values of every other subunit within the complex.
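The grouping and test described above can be sketched with the standard library. The input layout and function names are hypothetical, and only the pooled-variance Student's t statistic and its degrees of freedom are computed; a P value would come from the t distribution (for example scipy.stats.ttest_ind with equal_var=True). With CERES-style scores such as those of Meyers et al., more negative values indicate stronger dependence, so a synthetic-lethal partner would be expected to show a negative t when mutant lines are listed first.

```python
import math
import statistics


def students_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def complex_partner_tests(mutated_gene, complex_members, high_impact, dependency):
    """Compare dependency on each partner subunit between mutant and non-mutant lines.

    high_impact: dict cell line -> set of genes carrying truncating/splice-site mutations
    dependency:  dict cell line -> dict gene -> dependency score
    """
    mutant = [cl for cl in dependency if mutated_gene in high_impact.get(cl, set())]
    other = [cl for cl in dependency if mutated_gene not in high_impact.get(cl, set())]
    results = {}
    for partner in complex_members:
        if partner == mutated_gene:
            continue
        results[partner] = students_t(
            [dependency[cl][partner] for cl in mutant],
            [dependency[cl][partner] for cl in other],
        )
    return results
```

Each group needs at least two cell lines for the variance to be defined.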

Statistical analysis

For statistical analyses, a combination of software was used, including R and GraphPad Prism 7. Fisher’s exact test was used for hypothesis testing on 2×2 contingency tables. Pearson’s chi-squared test was used to compare contingency tables larger than 2×2. Student’s t-test or the Wilcoxon rank-sum test was used to compare continuous variables where appropriate. One-way analysis of variance (ANOVA) was used for comparisons of continuous variables across >2 groups.
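Fisher's exact test for a 2×2 table can be computed exactly from hypergeometric probabilities. This minimal sketch implements the common two-sided definition (summing all tables with the same margins that are no more probable than the one observed); it is an illustration rather than the R or GraphPad Prism implementation used in this study, and the small relative tolerance is an assumption to guard against floating-point ties.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def pmf(x):
        # Probability of a table with x in the top-left cell, margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = pmf(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)
```

For the table [[3, 1], [1, 3]] this returns 34/70 (about 0.486).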


References

1 Fearon, E. R. & Vogelstein, B. A genetic model for colorectal tumorigenesis. Cell 61, 759-767, doi:10.1016/0092-8674(90)90186-I (1990).
2 Leggett, B. & Whitehall, V. Role of the Serrated Pathway in Colorectal Cancer Pathogenesis. Gastroenterology 138, 2088-2100, doi:10.1053/j.gastro.2009.12.066 (2010).
3 Weisenberger, D. J. et al. CpG island methylator phenotype underlies sporadic microsatellite instability and is tightly associated with BRAF mutation in colorectal cancer. Nature Genetics 38, 787-793, doi:10.1038/ng1834 (2006).
4 Guan, R. J., Fu, Y., Holt, P. R. & Pardee, A. B. Association of K-ras mutations with p16 methylation in human colon cancer. Gastroenterology 116, 1063-1071, doi:10.1016/S0016-5085(99)70009-0 (1999).
5 Herman, J. G. et al. Incidence and functional consequences of hMLH1 promoter hypermethylation in colorectal carcinoma. Proceedings of the National Academy of Sciences of the United States of America 95, 6870-6875 (1998).
6 Hinoue, T. et al. Genome-scale analysis of aberrant DNA methylation in colorectal cancer. Genome Research 22, 271-282, doi:10.1101/gr.117523.110 (2012).
7 Ogino, S., Kawasaki, T., Kirkner, G. J., Loda, M. & Fuchs, C. S. CpG Island Methylator Phenotype-Low (CIMP-Low) in Colorectal Cancer: Possible Associations with Male Sex and KRAS Mutations. The Journal of Molecular Diagnostics: JMD 8, 582-588, doi:10.2353/jmoldx.2006.060082 (2006).
8 Guinney, J. et al. The consensus molecular subtypes of colorectal cancer. Nature Medicine 21, 1350, doi:10.1038/nm.3967 (2015).
9 Worthley, D. L. et al. DNA methylation within the normal colorectal mucosa is associated with pathway-specific predisposition to cancer. Oncogene 29, 1653-1662, doi:10.1038/onc.2009.449 (2010).
10 Horvath, S. DNA methylation age of human tissues and cell types. Genome Biology 14, R115, doi:10.1186/gb-2013-14-10-r115 (2013).
11 Bettington, M. et al. The serrated pathway to colorectal carcinoma: current concepts and challenges. Histopathology 62, 367-386, doi:10.1111/his.12055 (2013).
12 Bettington, M. et al. Clinicopathological and molecular features of sessile serrated adenomas with dysplasia or carcinoma. Gut (2015).
13 Ford, E. E. et al. Frequent lack of repressive capacity of promoter DNA methylation identified through genome-wide epigenomic manipulation. bioRxiv (2017).
14 Allis, C. D. & Jenuwein, T. The molecular hallmarks of epigenetic control. Nature Reviews Genetics 17, 487, doi:10.1038/nrg.2016.59 (2016).
15 Wang, Y. et al. Mutations of epigenetic regulatory genes are common in thymic carcinomas. Scientific Reports 4, 7336, doi:10.1038/srep07336 (2014).
16 The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330, doi:10.1038/nature11252 (2012).
17 Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nature Methods 12, 453, doi:10.1038/nmeth.3337 (2015).


18 Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102, 15545-15550, doi:10.1073/pnas.0506580102 (2005).
19 Liberzon, A. et al. The Molecular Signatures Database Hallmark Gene Set Collection. Cell Systems 1, 417-425, doi:10.1016/j.cels.2015.12.004 (2015).
20 Nayak, V., Xu, C. & Min, J. Composition, recruitment and regulation of the PRC2 complex. Nucleus 2, 277-282, doi:10.4161/nucl.2.4.16266 (2011).
21 Bailey, T. L. & Machanick, P. Inferring direct DNA binding from ChIP-seq. Nucleic Acids Research 40, e128, doi:10.1093/nar/gks433 (2012).
22 Yang, X. et al. Gene Body Methylation Can Alter Gene Expression and Is a Therapeutic Target in Cancer. Cancer Cell 26, 577-590, doi:10.1016/j.ccr.2014.07.028 (2014).
23 Repana, D. et al. The Network of Cancer Genes (NCG): a comprehensive catalogue of known and candidate cancer genes from cancer sequencing screens. bioRxiv, 389858, doi:10.1101/389858 (2018).
24 St. Pierre, R. & Kadoch, C. Mammalian SWI/SNF complexes in cancer: emerging therapeutic opportunities. Current Opinion in Genetics & Development 42, 56-67, doi:10.1016/j.gde.2017.02.004 (2017).
25 Mouradov, D. et al. Colorectal Cancer Cell Lines Are Representative Models of the Main Molecular Subtypes of Primary Cancer. Cancer Research 74, 3238 (2014).
26 Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nature Genetics 49, 1779, doi:10.1038/ng.3984 (2017).
27 Hinoue, T. et al. Genome-scale analysis of aberrant DNA methylation in colorectal cancer. Genome Research (2011).
28 Johnson, A. A. et al. The Role of DNA Methylation in Aging, Rejuvenation, and Age-Related Disease. Rejuvenation Research 15, 483-494, doi:10.1089/rej.2012.1324 (2012).
29 Steegenga, W. T. et al. Genome-wide age-related changes in DNA methylation and gene expression in human PBMCs. AGE 36, 9648, doi:10.1007/s11357-014-9648-x (2014).
30 Bettington, M. et al. Sessile Serrated Adenomas in Young Patients may have Limited Risk of Malignant Progression. Journal of Clinical Gastroenterology 53 (2019).
31 Liu, C. et al. CpG Island Methylation in Sessile Serrated Adenomas Increases With Age, Indicating Lower Risk of Malignancy in Young Patients. Gastroenterology 155, 1362-1365.e1, doi:10.1053/j.gastro.2018.07.012 (2018).
32 Bond, C. E. et al. Oncogenic BRAF mutation induces DNA methylation changes in a murine model for human serrated colorectal neoplasia. Epigenetics 13, 40-48, doi:10.1080/15592294.2017.1411446 (2018).
33 Rodriguez, J. et al. Chromosomal instability correlates with genome-wide DNA demethylation in human primary colorectal cancers. Cancer Research 66, 8462-9468, doi:10.1158/0008-5472.CAN-06-0293 (2006).
34 Sheaffer, K. L., Elliott, E. N. & Kaestner, K. H. DNA hypomethylation contributes to genomic instability and intestinal cancer initiation. Cancer Prevention Research (Philadelphia, Pa.) 9, 534-546, doi:10.1158/1940-6207.CAPR-15-0349 (2016).
35 Xiao, Y. & Freeman, G. J. The Microsatellite Instable (MSI) Subset of Colorectal Cancer is a particularly good candidate for checkpoint blockade immunotherapy. Cancer Discovery 5, 16-18, doi:10.1158/2159-8290.CD-14-1397 (2015).
36 Bailey, A. M. et al. FXR silencing in human colon cancer by DNA methylation and KRAS signaling. American Journal of Physiology - Gastrointestinal and Liver Physiology 306, G48-G58, doi:10.1152/ajpgi.00234.2013 (2014).
37 Lee, M. S., Menter, D. G. & Kopetz, S. Right Versus Left Colon Cancer Biology: Integrating the Consensus Molecular Subtypes. J Natl Compr Canc Netw 15, 411-419 (2017).
38 Chatterjee, R. & Vinson, C. CpG methylation recruits sequence specific transcription factors essential for tissue specific gene expression. Biochimica et Biophysica Acta 1819, 763-770, doi:10.1016/j.bbagrm.2012.02.014 (2012).

39 Fang, M., Ou, J., Hutchinson, L. & Green, M. R. The BRAF Oncoprotein Functions Through the Transcriptional Repressor MAFG to Mediate the CpG Island Methylator Phenotype. Molecular Cell 55, 904-915, doi:10.1016/j.molcel.2014.08.010 (2014).
40 Kemp, C. J. et al. CTCF haploinsufficiency destabilizes DNA methylation and predisposes to cancer. Cell Reports 7, 1020-1029, doi:10.1016/j.celrep.2014.04.004 (2014).
41 Tahara, T. et al. Colorectal carcinomas with CpG island methylator phenotype 1 frequently contain mutations in chromatin regulators. Gastroenterology 146, 530-538.e5, doi:10.1053/j.gastro.2013.10.060 (2014).
42 Colbert, L. E. et al. CHD7 expression predicts survival outcomes in patients with resected pancreatic cancer. Cancer Research 74, 2677-2687, doi:10.1158/0008-5472.CAN-13-1996 (2014).
43 Xia, L. et al. CHD4 Has Oncogenic Functions in Initiating and Maintaining Epigenetic Suppression of Multiple Tumor Suppressor Genes. Cancer Cell 31, 653-668.e7, doi:10.1016/j.ccell.2017.04.005 (2017).
44 Langst, G. & Manelyte, L. Chromatin Remodelers: From Function to Dysfunction. Genes (Basel) 6, 299-324, doi:10.3390/genes6020299 (2015).
45 Gomez, D., Dalal, Z., Raw, E., Roberts, C. & Lyndon, P. J. Anatomical distribution of colorectal cancer over a 10 year period in a district general hospital: is there a true “rightward shift”? Postgraduate Medical Journal 80, 667, doi:10.1136/pgmj.2004.020198 (2004).
46 Bernstein, C., Nfonsam, V., Prasad, A. R. & Bernstein, H. Epigenetic field defects in progression to cancer. World Journal of Gastrointestinal Oncology 5, 43-49, doi:10.4251/wjgo.v5.i3.43 (2013).
47 Benlloch, S. et al. Detection of BRAF V600E Mutation in Colorectal Cancer: Comparison of Automatic Sequencing and Real-Time Chemistry Methodology. The Journal of Molecular Diagnostics: JMD 8, 540-543, doi:10.2353/jmoldx.2006.060070 (2006).
48 Whitehall, V. L. J. et al. Oncogenic PIK3CA mutations in colorectal cancers and polyps. International Journal of Cancer 131, 813-820, doi:10.1002/ijc.26440 (2012).
49 Bond, C. E. et al. p53 mutation is common in microsatellite stable, BRAF mutant colorectal cancers. International Journal of Cancer 130, 1567-1576, doi:10.1002/ijc.26175 (2012).
50 Nagasaka, T. et al. Mutations in both KRAS and BRAF may contribute to the methylator phenotype in colon cancer. Gastroenterology 134, 1950-1960.e1, doi:10.1053/j.gastro.2008.02.094 (2008).
51 Boland, C. R. et al. A National Cancer Institute Workshop on Microsatellite Instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer. Cancer Research 58, 5248-5257 (1998).
52 Irahara, N. et al. Precision of Pyrosequencing Assay to Measure LINE-1 Methylation in Colon Cancer, Normal Colonic Mucosa, and Peripheral Blood Cells. The Journal of Molecular Diagnostics: JMD 12, 177-183, doi:10.2353/jmoldx.2010.090106 (2010).
53 Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43, e47, doi:10.1093/nar/gkv007 (2015).
54 Houseman, E. A. et al. Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions. BMC Bioinformatics 9, 365, doi:10.1186/1471-2105-9-365 (2008).
55 The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of gastric adenocarcinoma. Nature 513, 202, doi:10.1038/nature13480 (2014).
56 Benjamini, Y. & Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57, 289-300 (1995).
57 Medvedeva, Y. A. et al. EpiFactors: a comprehensive database of human epigenetic factors and complexes. Database (Oxford) 2015, bav067, doi:10.1093/database/bav067 (2015).
58 Adzhubei, I., Jordan, D. M. & Sunyaev, S. R. Predicting Functional Effect of Human Missense Mutations Using PolyPhen-2. Current Protocols in Human Genetics 76, 7.20.1-7.20.41, doi:10.1002/0471142905.hg0720s76 (2014).


59 The ENCODE Project Consortium. An Integrated Encyclopedia of DNA Elements in the Human Genome. Nature 489, 57-74, doi:10.1038/nature11247 (2012).
60 van Kruijsbergen, I., Hontelez, S. & Veenstra, G. J. C. Recruiting polycomb to chromatin. The International Journal of Biochemistry & Cell Biology 67, 177-187, doi:10.1016/j.biocel.2015.05.006 (2015).
61 Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842, doi:10.1093/bioinformatics/btq033 (2010).


Chapter Two: Braf mutation induces rapid neoplastic transformation in the aged and extensively hypermethylated intestinal epithelium

Fennell, L.J., Kane, A., Liu, C., McKeone, D., Hartel, G., Bond, C., Bettington, M., Leggett, B.A. and Whitehall, V.L.J. Submitted/under revision. Supplementary materials are available at: https://doi.org/10.5281/zenodo.4560888

Relevance to thesis aims: Hypothesis: Age-associated DNA methylation elevates the risk of BRAF mutation-induced neoplastic progression. Aim: To investigate the role of aging and DNA methylation in determining the malignant potential of the serrated neoplasia pathway. Here we attempt to elucidate the role of aging in determining the neoplastic potential of the Braf mutation. We first show that age-associated DNA methylation occurs in the small intestine of mice, and that these alterations are enriched at WNT signalling pathway genes. Next, we induced Braf mutation in animals at wean and examined how their DNA methylation profile changed over the course of aging. We observed a pervasive acceleration of age-related DNA methylation, which was also enriched at WNT signalling loci. Lastly, we activated the Braf mutation in aged animals and compared the frequency of murine serrated adenomas with that in young animals exposed to mutant Braf for the same period of time. We report that, after an identical period of exposure to the oncogene, aged animals are at a significantly higher risk of spontaneous neoplastic progression than young animals. These data directly address the second aim of this thesis.


Author Contributions:

Author | Conceptualisation | Bioinformatic Analysis | Statistical Analysis | Experimental Work | Drafting the Manuscript | Editing the Manuscript
Fennell L.J. | 70% | 100% | 25% | 70% | 100% | 22%
Kane A. | - | - | 10% | - | - | 4%
Liu C. | - | - | 10% | - | - | 4%
McKeone D. | - | - | 20% | - | - | 4%
Fernando W. | - | - | 10% | 5% | - | 4%
Su C. | - | - | 10% | - | - | 4%
Bond C. | - | - | 5% | - | - | 4%
Jamieson S. | - | - | 5% | - | - | 4%
Dumenil T. | - | - | 5% | - | - | 4%


Abstract

Objective: Sessile serrated lesions (SSL) are common in both young and old individuals, but the BRAF mutant cancers arising from them occur predominantly in the elderly. DNA methylation is uncommon in SSL from young patients. Here we interrogate the role of aging and DNA methylation in SSL initiation and progression. Design: We used an inducible model of Braf mutation to direct recombination of the oncogenic BrafV637E allele to the murine intestine. Braf mutation was activated after periods of aging, and histological, DNA methylation and gene expression analyses were performed thereafter. We also investigated DNA methylation alterations in human SSLs. Results: Inducing Braf mutation in aged mice was associated with a 10-fold relative risk of serrated lesions compared with young mice. Methylation analysis revealed extensive differences in age-associated DNA methylation between animals induced at 9 months versus at wean, with relatively little differential Braf-specific methylation, implicating age-associated rather than Braf-specific DNA methylation in the heightened risk. DNA methylation at WNT pathway genes scales with age, and Braf mutation accelerated age-associated DNA methylation. In human SSLs, increased epigenetic age was associated with high-risk serrated colorectal neoplasia. Conclusions: SSLs arising in the aged intestine are at a significantly higher risk of spontaneous neoplastic progression. These findings support a new conceptual model for serrated neoplasia whereby the risk of progression is related to the milieu of (epi)genetic alterations in the intestinal epithelium at the time of BRAF mutation, rather than the length of time since polyp initiation. This has implications for surveillance and for chemopreventive strategies targeting the epigenome.


Summary Box

What is already known about this subject?

Sessile serrated lesions have distinct malignant potential. They are a common colonoscopic finding in patients of all ages, but the cancers arising from them occur almost exclusively in the elderly. Extensive DNA methylation is common in advanced SSLs and cancers. It is not clear whether SSLs lie dormant in the colon for decades, acquiring these alterations, or whether spontaneously occurring SSLs rapidly acquire these alterations and are at risk of rapid neoplastic progression.

What are the new findings?

Here we show that Braf mutation can induce rapid neoplastic transformation in the aged and extensively aberrantly methylated intestinal epithelium. This shifts the current paradigm, in which risk is thought to correlate with the length of time since SSL initiation, to one in which risk depends on the age of the patient at onset. We link age-associated DNA methylation alterations, which occur in the normal intestinal epithelium, with the heightened risk profile of SSLs presenting in elderly patients.

How might it impact on clinical practice in the foreseeable future?

This study indicates that the malignant potential of SSLs in young patients is limited, and that SSLs arising spontaneously in the elderly intestine can undergo rapid neoplastic transformation. This has implications for surveillance of patients previously diagnosed with SSLs, and suggests that older patients may benefit from closer surveillance.


Introduction

It is well established that the important subgroup of BRAF mutant colorectal cancers arises from sessile serrated lesions (SSL), previously called sessile serrated adenomas. With improvements in colonoscopic detection and diagnostic methods, SSL are now recognized to be common lesions1-4. Unlike conventional adenomas, which are much more common in older individuals, SSL are equally represented across the age spectrum and are often found in individuals in their 30s and 40s 5. However, the BRAF mutant cancers arising from SSL occur at an older mean age than BRAF wild type cancers arising from conventional adenomas, and they rarely occur before 50 years of age outside the setting of serrated polyposis syndrome5, 6. This paradox suggests that the risk of malignant transformation of an SSL may depend on the age of the individual in which it is present as well as on the characteristics of the lesion itself. Surveillance intervals for patients presenting with SSL are currently guided by criteria established for conventional adenomas7, 8, which have a very different natural history, morphology and molecular profile compared to SSL9-11. A better understanding of the drivers of SSL progression, and particularly the role of age, may help develop guidelines to inform surveillance intervals for patients following diagnosis of an SSL.

SSL are uniquely hallmarked by BRAF mutation and widespread DNA methylation changes termed the CpG Island Methylator Phenotype (CIMP)12, 13. These extensive methylation changes facilitate the silencing of key tumor suppressor genes necessary for malignant transformation, including genes in the WNT signaling pathway, such as the SFRP family, and DNA repair genes such as MLH1. Approximately half of all SSLs are CIMP-positive, and the vast majority of advanced SSLs with dysplasia are CIMP-positive, demonstrating the necessity of CIMP for progression to malignancy11. We have previously reported a striking association between CIMP and increased patient age amongst histologically indistinguishable SSL14, suggesting a biological explanation for why age may be a major risk factor for progression of SSL. It remains unclear whether the highly methylated SSLs identified in older patients have dwelled for a significant period of time, perhaps decades, or whether methylation alterations occur rapidly in spontaneously forming SSLs in the aged intestine. Resolving this distinction is key to translating this knowledge into the clinic.

A causative versus synergistic role for mutant BRAF and CIMP has been debated6, 15-18. In a mouse model, we have recently demonstrated that prolonged exposure to intestinal Braf mutation in vivo induces a CIMP-like methylator phenotype, preceding the development of murine serrated adenomas and ultimately invasive cancer19. In the wildtype intestine, low levels of DNA methylation accumulate with age at many of the same loci found to be altered with Braf mutation19, 20. This suggests that the aged intestine may provide the necessary epigenetic milieu for more rapid accumulation of DNA methylation following oncogenic mutation of Braf in older individuals. Methylation at CIMP-specific loci has also been shown to accumulate with increasing passage in vitro in both cell line15 and organoid systems18.

This accumulation of DNA methylation with age has been attributed to ‘epigenetic drift’21, 22, which occurs as a result of stochastic errors in DNA maintenance methylation23, 24. Epigenetic drift is not randomly distributed and appears to have an affinity for specific CpGs25. This has allowed the development of epigenetic clocks that can reliably predict an organism’s age based on the DNA methylation state of a subset of CpGs throughout its genome26, 27. The epigenetic clock has provided a model for biological aging, and increases in a person’s ‘epigenetic age’ relative to their chronological age have been associated with an increased risk of disease28, 29 and an overall increase in all-cause mortality30, 31.

Here we have utilized the conditionally active BrafV637E murine model to evaluate the effects of somatic oncogenic signaling on epigenetic drift and tumor formation across the age spectrum. We have exposed young and aged mice to mutant Braf for the same time period to provide insights into how the risk of serrated neoplasia is influenced by age.


Results

Temporal CpG island hypermethylation in the intestinal epithelium of wildtype mice

To establish the effects of aging on the intestinal methylome in our murine model, we examined the DNA methylation status of the normal, non-neoplastic intestinal mucosa of wildtype mice (n=25) aged between 24 days and 20 months post wean. Using reduced representation bisulfite sequencing (RRBS), we captured 1,027,330 CpG sites, each covered by >10 reads in all samples (including the samples analysed in the following sections). We identified 82,468 CpG sites (8.03%) where DNA methylation was significantly associated with biological aging (Type-A sites; FDR-corrected P<0.05, R² range 0.31-0.98; Figure 1A, Supplementary Figure 1). Type-A (age-associated) CpGs were predominantly located in exonic, intergenic, intronic and promoter regions (Figure 1B). Age-associated hypomethylation was frequent in exonic, intergenic and intronic regions, but uncommon in promoter regions (Figure 1B).
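The Type-A calling described above is, per CpG, a coverage filter followed by a linear regression of methylation on age. The minimal stdlib sketch below assumes methylation is expressed as a fraction per sample and that ages are numeric; the function names are hypothetical. It returns only the slope and R²; significance testing (for example via scipy.stats.linregress, followed by Benjamini-Hochberg correction across CpGs) is omitted.

```python
import statistics


def passes_coverage(read_counts, min_reads=10):
    """Keep a CpG only if every sample has more than `min_reads` reads."""
    return all(count > min_reads for count in read_counts)


def age_regression(ages, methylation):
    """Ordinary least-squares slope and R^2 of methylation regressed on age.

    A positive slope indicates progressive hypermethylation with age,
    a negative slope progressive hypomethylation.
    """
    mean_x = statistics.fmean(ages)
    mean_y = statistics.fmean(methylation)
    sxx = sum((x - mean_x) ** 2 for x in ages)
    syy = sum((y - mean_y) ** 2 for y in methylation)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, methylation))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, r_squared
```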


Figure 1: A) DNA methylation profiling of wild-type mouse intestinal DNA. Linear regression analysis identified significant age-associated DNA methylation at ~8% of CpG sites. Positive slope values indicate progressive DNA hypermethylation with age. B) Age-associated DNA hypermethylation (positive slope; increases with age) is most common in exons, introns, intergenic regions and gene promoters. Hypomethylation (negative slope; decreases with age) is common in exons, introns and intergenic regions but extremely rare in gene promoters. C and D) Age-associated DNA methylation of the colon cancer tumour suppressors Cdkn2a (C) and Dkk3 (D). Each line represents a single CpG site in the promoter region.

Tumour suppressor genes and regulators of intestinal differentiation are putative targets of intestinal age-associated DNA methylation

We cross-referenced the loci with age-associated promoter hypermethylation against known tumour suppressor genes in the TSGene 2.0 database and identified Type-A promoter methylation of 105 tumour suppressor genes. These included previously reported colonic tumour suppressors such as Cdkn2a (Figure 1C), Dkk3 (Figure 1D), Sfrp2 and Dcc. Type-A methylation in gene promoters was enriched for genes involved in biological processes associated with differentiation (P=1.19×10⁻⁴⁰, observed/expected (O/E) ratio: 1.92) and development (P=2.05×10⁻⁴⁸, O/E ratio: 1.75). By contrast, immune-related genes were underrepresented (P=1.77×10⁻⁹, O/E ratio: 0.39). Next, we performed pathway-level enrichment analysis using pathways curated by PANTHER32. This identified significant enrichment of Type-A methylation at genes in eight pathways, including the WNT signaling cascade (P=9.11×10⁻⁴, O/E ratio: 2.09, Table 1) and Slit/Robo signaling (P=5.37×10⁻⁴, O/E ratio: 6.42, Table 1).
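The observed/expected (O/E) ratios quoted above compare two frequencies: how often a pathway appears among the methylated genes, and how often it appears in the reference gene list. PANTHER's overrepresentation test reports this ratio as fold enrichment and pairs it with a Fisher's exact (or binomial) P-value. The sketch below reproduces the arithmetic using a one-sided hypergeometric upper tail, which is equivalent to a one-sided Fisher's exact test for over-representation. The gene counts in the usage example are hypothetical. The code is an illustration, not the PANTHER implementation.

```python
from math import comb


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed/expected ratio (k/n) / (K/N).

    N: genes in the reference list; K: reference genes annotated to the pathway;
    n: genes in the query list (e.g. genes with Type-A promoter methylation);
    k: query genes annotated to the pathway.
    """
    return (k / n) / (K / N)


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n), i.e. over-representation."""
    total = comb(N, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return hits / total


if __name__ == "__main__":
    # Hypothetical counts: 40 of 1,200 methylated genes fall in a pathway
    # that contains 300 of 20,000 reference genes (18 expected).
    print(round(fold_enrichment(40, 1200, 300, 20000), 2))
    print(hypergeom_upper_tail(40, 1200, 300, 20000))
```

An O/E ratio below 1 is read the same way in reverse, as with the immune-related genes above. That case would be tested with the lower tail instead.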


Table 1: Type-A DNA methylation at gene promoters targets specific signaling pathways. Type-A CpGs were identified as those significantly associated with age in the intestine of wild-type animals from weaning. CpGs were assessed by regression analyses, and pathway enrichment was determined using the PANTHER enrichment tool.

Pathway name | Fold enrichment | FDR-corrected P-value
Axon guidance mediated by Slit/Robo | 6.42 | 5.37×10⁻⁴
Ionotropic glutamate receptor pathway | 3.87 | 4.18×10⁻³
Cadherin signaling pathway | 2.94 | 6.71×10⁻⁵
Metabotropic glutamate receptor group III pathway | 2.87 | 0.0345
Heterotrimeric G-protein signaling pathway-Gi alpha and Gs alpha mediated | 2.37 | 5.67×10⁻³
Alzheimer disease-presenilin pathway | 2.3 | 0.0368
Heterotrimeric G-protein signaling pathway-Gq alpha and Go alpha mediated | 2.27 | 0.0476
Wnt signaling pathway | 2.09 | 9.11×10⁻⁴


Next, we performed gene expression analysis by RNA-Seq on the normal, non-neoplastic intestinal mucosa of wild-type animals aged 10 days (n=3) and 14 months (n=4) to evaluate the transcriptional consequences of Type-A methylation changes. In 10-day-old wild-type animals, Type-A genes were more likely to be expressed than genes in the wider transcriptome (Supplementary Figure S2A,B), and the average expression of Type-A genes was higher than that of non-Type-A genes (Supplementary Figure S2C). Of the 1,208 Type-A genes, 16.8% were not expressed (mean FPKM = 0) and a further 56.8% were lowly expressed (FPKM < 1). We performed differential expression analysis on expressed Type-A genes between 10-day-old and 14-month-old wild-types to identify transcriptional changes that may have been caused by temporal methylation changes. We identified differential expression of 14.5% of Type-A genes (Figure 2, Supplementary Table S1), including the key WNT signaling pathway genes Sfrp2 (logFC: -6.81, P=1.65×10⁻⁶) and Wnt7b (logFC: -3.7, P=0.034). None of the differentially downregulated Type-A genes belonged to the cadherin signaling cascade.
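The expression bins used above (not expressed, FPKM = 0; lowly expressed, FPKM < 1) rest on the standard FPKM normalisation: reads are scaled by transcript length in kilobases and by library size in millions. The sketch below assumes that standard definition. It also assumes that per-gene mean FPKM values across samples have already been computed. Gene identifiers in any usage are hypothetical.

```python
from typing import Dict


def fpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Fragments (reads) per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def expression_class(mean_fpkm: float) -> str:
    """Bins used in the text: not expressed (0), lowly expressed (<1), expressed (>=1)."""
    if mean_fpkm == 0:
        return "not expressed"
    if mean_fpkm < 1:
        return "lowly expressed"
    return "expressed"


def class_fractions(mean_fpkms: Dict[str, float]) -> Dict[str, float]:
    """Fraction of genes (e.g. the Type-A gene list) falling into each bin."""
    counts: Dict[str, int] = {}
    for value in mean_fpkms.values():
        label = expression_class(value)
        counts[label] = counts.get(label, 0) + 1
    return {label: c / len(mean_fpkms) for label, c in counts.items()}
```

Applied to the 1,208 Type-A genes, class_fractions would return the not-expressed and lowly expressed proportions reported above. Differential expression itself was tested with DESeq2 on raw counts, not on FPKM.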


Figure 2: RNA-Seq was performed on 10-day-old (n=3) and 14-month-old (n=3) wildtype animals to assess transcript expression. Differential expression of Type-A DNA methylation genes in the small intestine of 10-day-old versus 14-month-old wild-type animals was assessed using DESeq2. P-values were adjusted using the false discovery rate method to limit Type I errors. Negative log fold changes represent downregulation, and positive log fold changes represent upregulation, in 14-month-old wild-types.


Braf mutation potentiates age-related DNA methylation changes

Next, we sought to assess the effects of prolonged oncogenic MAPK signaling on age-associated DNA methylation. We introduced the intestine-specific oncogenic BrafV637E mutation in animals at wean and assessed temporal changes in DNA methylation of the non-neoplastic intestinal mucosa from 10 days to 14 months post activation (n=37). DNA methylation was significantly associated with age at 16.8% of CpG sites (172,217 CpG sites, FDR<0.05, R²=0.18–0.97), which included 81.3% of the Type-A CpGs identified in wild-type animals. We hypothesised that the rate of DNA methylation change at age-associated loci common to both wild-type and Braf mutant animals may differ according to Braf mutation status. Of the 67,056 shared age-associated loci, 46,581 CpG sites (69.5%, Supplementary Figure 1) showed significantly different rates of methylation accumulation over time. We denote these Type-AB CpGs (age-associated but modified by oncogenic Braf mutation). CpGs that change over time in the setting of Braf mutation, but not in the wild-type intestine, were termed Type-B CpGs (temporal but specific to Braf mutation); these are discussed in the following sections. Of the Type-AB CpGs, 91.2% accumulated DNA methylation at a greater rate with Braf mutation, whilst 8.8% accumulated methylation changes more slowly (Figure 3A depicts the rate of change at these CpG sites).
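The comparison of methylation rates between wild-type and Braf mutant animals (the analysis of covariance in Figure 3) asks whether a single slope can fit both genotypes at a given CpG. The sketch below computes the homogeneity-of-slopes F statistic for that question. It assumes a simple per-CpG linear model of methylation on age with a genotype-by-age interaction, which may differ in detail from the model fitted here. The helper names are hypothetical.

```python
from typing import List, Tuple


def _sums(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Centred sums of squares and cross-products: (Sxx, Sxy, Syy)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return sxx, sxy, syy


def slope_difference_f(groups: List[Tuple[List[float], List[float]]]) -> Tuple[float, int, int]:
    """ANCOVA test for unequal slopes (age x genotype interaction) at one CpG.

    groups: one (ages, methylation) pair per genotype.
    Full model: each group has its own intercept and slope.
    Reduced model: separate intercepts but one common (pooled) slope.
    Returns (F, numerator df, denominator df).
    """
    stats = [_sums(x, y) for x, y in groups]
    # Residual sum of squares when every group keeps its own slope.
    rss_full = sum(syy - sxy * sxy / sxx for sxx, sxy, syy in stats)
    if rss_full <= 0.0:
        raise ValueError("perfect fit: F statistic undefined")
    # Residual sum of squares with a single pooled within-group slope.
    pooled_sxx = sum(s[0] for s in stats)
    pooled_sxy = sum(s[1] for s in stats)
    rss_reduced = sum(s[2] for s in stats) - pooled_sxy * pooled_sxy / pooled_sxx
    n_total = sum(len(x) for x, _ in groups)
    df_num = len(groups) - 1
    df_den = n_total - 2 * len(groups)
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den
```

The P-value would then be read from the F distribution with (df_num, df_den) degrees of freedom, for example with scipy.stats.f.sf. That step is omitted to keep the sketch dependency-free. Each group's own slope, Sxy/Sxx, gives the %/day rates plotted in Figure 3A.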


Figure 3: A) Braf mutation induces a widespread acceleration in the rate of Type-A DNA methylation changes. Shown is the rate of change of DNA methylation (%/day) at CpG sites where the rate differed significantly between BrafV637E and wild-type animals, as assessed by analysis of covariance (Type-AB CpGs: age-associated but modified by Braf). The red line depicts a hypothetical equal rate of change between groups. The bottom-left and top-right quadrants show concordant hypomethylation and hypermethylation, respectively, occurring at different rates according to Braf status. The top-left and bottom-right quadrants show methylation events that are discordant by Braf status. Dark blue ellipses indicate the density of CpGs at a given location on the graph; closely spaced ellipses indicate many overlapping data points. B) The genomic distribution of Type-AB CpGs. C) The genomic distribution of Type-B CpGs.


Among the gene promoters associated with these 46,581 CpG sites, we observed enrichment for genes in the cadherin signaling pathway (O/E: 4.41, P=4.32×10⁻⁷, Table 2), the WNT signaling pathway (O/E: 2.65, P=2.29×10⁻⁴) and heterotrimeric G-protein signaling (O/E: 3.22, P=8.97×10⁻⁴, Table 2). Enrichment for WNT signaling was stronger in Type-AB promoters than in Type-A promoters (O/E 2.65 vs 2.09). This suggests that, following Braf mutation, accelerated age-associated DNA methylation is selected for at WNT signaling pathway gene promoters.


Table 2: Type-AB DNA methylation at gene promoters targets specific signaling pathways. Type-AB CpGs were identified using a two-step process. First, we identified CpGs that were significantly associated with age in the intestine of both wild-type and Braf mutant animals from weaning. From this subset of Type-A CpGs, we then used analysis of covariance to identify those that gain or lose DNA methylation at significantly different rates according to mutation status. CpGs were assessed by regression analyses, and pathway enrichment was determined using the PANTHER enrichment tool.

Pathway name | Fold enrichment | FDR-corrected P-value
Axon guidance mediated by Slit/Robo | 6.81 | 1.31×10⁻²
Axon guidance mediated by netrin | 5.45 | 1.29×10⁻²
Ionotropic glutamate receptor pathway | 5.34 | 1.70×10⁻³
Cadherin signaling pathway | 4.41 | 4.32×10⁻⁷
5HT1 type receptor mediated signaling pathway | 4.34 | 3.73×10⁻²
Metabotropic glutamate receptor group III pathway | 4.01 | 1.15×10⁻²
Heterotrimeric G-protein signaling pathway-Gi alpha and Gs alpha mediated | 3.22 | 8.97×10⁻⁴
Wnt signaling pathway | 2.65 | 2.29×10⁻⁴


Intestinal epigenetic age is increased by prolonged exposure to oncogenic Braf

Epigenetic age is the predicted biological age of a tissue, based on the DNA methylation level of CpGs that are predictive of chronological age in normal tissues. Having identified a robust acceleration of age-associated DNA methylation in Braf mutant intestinal samples, we next assessed whether the epigenetic age of the intestinal epithelium was similarly accelerated. We estimated epigenetic age using two previously validated epigenetic age prediction models (Stubbs et al33, Meer et al34). As expected, both predictors yielded age estimates in our wild-type animals that were strongly correlated with chronological age (R²: 0.91 for Stubbs, 0.97 for Meer; Figure 4A, 4B), although both models lacked precision. The Stubbs model tended to underpredict the epigenetic age of wild-type animals (ratio of chronological age to epigenetic age (CA/EA): 1.83, RMSE: 184.54). By contrast, the Meer model tended to overpredict their age but was generally more accurate (CA/EA: 0.76, RMSE: 117.02). Intestinal epigenetic aging was significantly accelerated in Braf mutant mucosa using both the Stubbs model (CA/EA: 0.65 vs 1.83, P<2.2×10⁻¹⁶, Figure 4A) and the Meer model (CA/EA: 0.39 vs 0.76, P<2.2×10⁻¹⁶, Figure 4B). We also built an in-house, tissue-specific epigenetic clock model, which estimated the epigenetic age of wild-type animals much more precisely. Epigenetic age predicted by this model showed the same marked acceleration revealed by the two published models (Figure 4C, ANCOVA P<2.2×10⁻¹⁶). Thus, constant exposure to oncogenic Braf induces a marked and sustained acceleration of epigenetic aging.
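The two summary statistics used above to compare clocks are the chronological-to-epigenetic age ratio (CA/EA) and the root mean squared error (RMSE) of predicted versus chronological age. The text does not state how CA/EA was aggregated across animals. The sketch below assumes it is the mean of per-sample ratios, and the function names are hypothetical.

```python
from math import sqrt
from typing import List


def ca_ea_ratio(chronological: List[float], epigenetic: List[float]) -> float:
    """Mean per-sample ratio of chronological to epigenetic age (assumed aggregation).

    Values > 1 mean the clock underpredicts age; values < 1 mean epigenetic
    age runs ahead of chronological age (apparent accelerated aging).
    """
    ratios = [c / e for c, e in zip(chronological, epigenetic)]
    return sum(ratios) / len(ratios)


def rmse(chronological: List[float], epigenetic: List[float]) -> float:
    """Root mean squared error of predicted (epigenetic) versus chronological age."""
    n = len(chronological)
    return sqrt(sum((e - c) ** 2 for c, e in zip(chronological, epigenetic)) / n)
```

On this reading, the drop in CA/EA from 1.83 to 0.65 (Stubbs model) means that Braf mutant mucosa is predicted to be older than its chronological age, whereas wild-type mucosa is predicted to be younger.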


Figure 4: Epigenetic age versus chronological age in the Braf mutant and wild-type intestine. Epigenetic age is persistently accelerated by prolonged exposure to oncogenic BrafV637E. Epigenetic age was estimated using the models of Stubbs et al (A) and Meer et al (B), and a model built by elastic net regression of age on methylation data from the intestine of wild-type animals (C). Analysis of covariance was performed to test whether the rate of epigenetic ageing differed between BrafV637E and wild-type animals.


Braf mutation specific DNA methylation alterations

Next, we analysed the DNA methylation alterations that occurred over time but were exclusive to the Braf mutant intestinal mucosa. These represented 61% of all temporal DNA methylation alterations that occur in the presence of Braf mutation (105,561 CpGs, Supplementary Figure 1). As these events are specific to oncogenic Braf, we term them Type-B methylation events to distinguish them from age-associated methylation (Type-A, Type-AB). Type-B methylation has a genomic distribution similar to that of Type-A methylation (Figure 1B, 3C). Of all Type-B hypermethylation events, 17,919 mapped to the promoter regions of 3,984 genes. Type-B methylation is enriched at cancer-associated signaling pathways (Table 3), including the WNT signaling pathway (O/E: 1.82, P=7.34×10⁻⁵), the angiogenesis pathway (O/E: 2.0, P=6.89×10⁻⁴), the TGF-beta signaling cascade (O/E: 1.92, P=0.03) and EGF signaling (O/E: 1.76, P=0.04). We identified 310 tumour suppressor genes affected by Type-B methylation, representing 17.26% of all tumour suppressor genes in the TSGene 2.0 database.
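The three methylation classes are defined by set membership, as summarised in Supplementary Figure 1. The sketch below encodes those rules with one stated assumption: CpGs that are age-associated in wild-type mucosa but are not Braf-modified are labelled Type-A, including any that are not age-associated in Braf mutant mucosa. The function name and CpG identifiers are hypothetical.

```python
from typing import Set


def classify_cpg(cpg: str,
                 age_assoc_wt: Set[str],
                 age_assoc_braf: Set[str],
                 rate_differs: Set[str]) -> str:
    """Assign a CpG to the classes used in this chapter (hypothetical helper).

    age_assoc_wt:   CpGs significantly associated with age in wild-type mucosa
    age_assoc_braf: CpGs significantly associated with age in Braf mutant mucosa
    rate_differs:   shared CpGs whose rate of change differs by genotype (ANCOVA)
    """
    in_wt = cpg in age_assoc_wt
    in_braf = cpg in age_assoc_braf
    if in_wt and in_braf and cpg in rate_differs:
        return "Type-AB"   # age-associated, rate modified by Braf
    if in_wt:
        return "Type-A"    # assumption: all remaining wild-type age-associated CpGs
    if in_braf:
        return "Type-B"    # temporal only in the presence of Braf mutation
    return "other"         # not identified in the temporal analyses
```

The same rules are what place the differentially methylated CpGs of the aged-versus-young comparison into the Type-A, Type-AB, Type-B and "other" categories.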


Table 3: Type-B DNA methylation at gene promoters targets specific signaling pathways. Type-B CpGs were identified as those significantly associated with age in the Braf mutant intestine from weaning but not associated with age in wild-type animals. CpGs were assessed by regression analyses, and pathway enrichment was determined using the PANTHER enrichment tool.

Pathway name | Fold enrichment | FDR-corrected P-value
Alpha adrenergic receptor signaling pathway | 3.07 | 0.033
Alzheimer disease-presenilin pathway | 2.52 | 1.51×10⁻⁵
Ionotropic glutamate receptor pathway | 2.51 | 0.019
Alzheimer disease-amyloid secretase pathway | 2.42 | 7.90×10⁻³
Heterotrimeric G-protein signaling pathway-Gi alpha and Gs alpha | 2.07 | 3.52×10⁻⁴
Gonadotropin-releasing hormone receptor pathway | 2.04 | 1.42×10⁻⁵
Angiogenesis | 2 | 6.89×10⁻⁴
Heterotrimeric G-protein signaling pathway-Gq alpha and Go alpha | 1.96 | 9.78×10⁻³
TGF-beta signaling pathway | 1.92 | 0.033
PDGF signaling pathway | 1.89 | 8.94×10⁻³
Cadherin signaling pathway | 1.88 | 7.37×10⁻³
Wnt signaling pathway | 1.82 | 7.34×10⁻⁵
EGF receptor signaling pathway | 1.76 | 0.040
Integrin signalling pathway | 1.7 | 0.018


Expression analysis of hypermethylated Type-B genes revealed significant differential expression of 472 genes between 10-day and 14-month Braf mutant animals (Supplementary Table S2), corresponding to 11.8% of all Type-B genes. This is consistent with our findings on the frequency of gene repression by methylation in human colon cancers6. Most differentially expressed Type-B loci were downregulated in 14-month Braf mutant animals, suggesting methylation-induced gene silencing (Supplementary Figure S3A). Supplementary Figures S3B and S3C show the differential expression of the Type-B genes Cstf2t and Fbn2. Differentially expressed Type-B genes included 21 WNT signaling pathway genes (Table 4).


Table 4: Differentially expressed Type-B WNT signaling pathway genes between BrafW-10D and BrafW-14M animals. WNT signaling was significantly enriched amongst differentially expressed Type-B genes; this table lists the differentially expressed genes in this pathway.

Gene | FDR-corrected P-value | Log2 fold change
Cdh6 | 3.99×10⁻³ | -4.45
Pcdh9 | 9.42×10⁻⁶ | -4.10
Fzd3 | 1.05×10⁻² | -2.73
Kremen2 | 2.53×10⁻⁴ | -2.56
Tcf7l1 | 2.35×10⁻² | -2.32
Pygo1 | 8.96×10⁻³ | -2.12
Fat4 | 6.75×10⁻³³ | -2.04
Pcdh18 | 2.18×10⁻⁹ | -1.87
Cdh3 | 3.52×10⁻² | -1.84
Dchs1 | 3.01×10⁻¹⁸ | -1.79
Wnt5a | 1.16×10⁻³ | -1.62
Cdh11 | 6.70×10⁻¹⁰ | -1.51
Pcdh7 | 1.51×10⁻⁹ | -1.49
Frzb | 1.05×10⁻² | -1.27
Sfrp1 | 5.07×10⁻⁴ | -1.24
Plcb1 | 7.05×10⁻³ | -0.86
Fzd5 | 3.03×10⁻² | 0.50
Tcf7 | 3.35×10⁻² | 1.06
Wnt6 | 3.27×10⁻² | 1.75
Wnt10a | 1.04×10⁻² | 4.47


We assessed the human CIMP panel genes for Type-B methylation and found that Igf2 and Neurog1 undergo significant Type-B methylation (Supplementary Table S3). We did not observe evidence of Type-B methylation at Cacna1g or Socs1. Cacna1g, however, is an age-associated locus whose methylation is accelerated by oncogenic Braf (Type-AB). We did not capture any CpGs in the Runx3 promoter. Of the five human CIMP panel genes, two, Igf2 and Runx3, were significantly downregulated versus wild-type samples (ANOVA P<0.0001 and P=6.59×10⁻⁶, respectively; Supplementary Figure S4).

The differential effects of Braf mutation in the aged intestine

We have previously shown that prolonged exposure to mutant Braf from wean can induce murine serrated lesions35. Based on our finding that extensive DNA methylation alterations accumulate with age (Figure 1A), we hypothesised that the intestine of an elderly mouse would more rapidly develop serrated neoplasia following induction of the Braf mutation. To test this hypothesis, we aged animals for 9 months before activating mutant Braf for a period of five months (Braf9-14). We compared both histology and DNA methylation with animals in which Braf was activated at wean for an identical period of time (BrafW-5).

Figure 5: A) The aged intestine is primed for Braf-induced spontaneous neoplastic transformation. Braf mutation rarely induces murine serrated lesions in animals exposed for five months from wean, whereas mutation for the same interval after 9 months of ageing frequently induces murine serrated lesions. B) Representative example of a murine serrated adenoma in the intestine of a Braf9-14 mouse. This protuberant lesion projects from the small intestinal surface, forming numerous club-shaped papillae lined by a variety of cell types. At the base, the cells are cuboidal, contain clear cytoplasm and resemble the mucin-rich cells seen in human sessile serrated lesions. These transition to slender cells with abundant pink cytoplasm, which resemble human traditional serrated adenomas. Finally, the tips of the papillae are lined by crowded, hyperchromatic cells with overt dysplastic cytology.

Following induction of mutant Braf for 5 months from wean, animals rarely developed murine serrated lesions (Figure 5A). We compared these animals (BrafW-5, n=24) with animals aged for 9 months before induction of the mutation for the same 5-month period (Braf9-14, n=32). The incidence of murine serrated lesions was 4.2% in BrafW-5 animals and significantly higher, at 43.8%, in Braf9-14 mice (relative risk: 10.5, Fisher's exact P<0.001, Figure 5A), despite exposure to oncogenic Braf for the same duration. A representative murine serrated lesion is shown in Figure 5B. These data suggest that the risk of Braf-induced transformation depends on age, and on the underlying epigenetic alterations present when Braf is mutated, rather than on the length of exposure alone. To determine whether the increased risk of lesion development in aged animals may be due to DNA methylation alterations, we performed genome-wide DNA methylation analysis on intestinal mucosa from a subset of these animals (BrafW-5 n=9, Braf9-14 n=12). In total, 68,385 CpG sites were differentially methylated, of which 72.2% were more methylated in Braf9-14 animals. We next classified loci identified in the earlier analyses into three types: Type-A (age-associated, independent of Braf mutation), Type-AB (age-associated, but the rate of methylation accumulation differs between Braf mutant and wild-type animals) and Type-B (occurring exclusively in the setting of Braf mutation). Loci not identified in the earlier analyses were classed as 'other'. Forty-three percent of differential methylation occurred at Type-AB loci, 16% at Type-A loci, 15% at Type-B loci and 26% at unclassified loci (Figure 6A).
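The relative risk and Fisher's exact P-value reported above can be checked from a 2×2 table of lesion counts. The counts used here are back-calculated from the reported percentages and group sizes, not taken from the raw data: 14 of 32 Braf9-14 mice (43.8%) and 1 of 24 BrafW-5 mice (4.2%). The functions are a plain-Python sketch, not the software used for the original analysis.

```python
from math import comb


def relative_risk(cases_exposed: int, n_exposed: int,
                  cases_unexposed: int, n_unexposed: int) -> float:
    """Risk in the exposed (aged) group divided by risk in the unexposed (young) group."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    # Rows: Braf9-14 (14 with lesions, 18 without), BrafW-5 (1 with, 23 without).
    print(relative_risk(14, 32, 1, 24))
    print(fisher_exact_two_sided(14, 18, 1, 23))
```

With these back-calculated counts, the relative risk is exactly 10.5. The two-sided P-value is roughly 8×10⁻⁴, consistent with the reported P<0.001.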


Figure 6: The distribution of differentially methylated CpG sites by A) methylation type and B) methylation type and genomic context. Negative refers to differentially hypomethylated and positive to differentially hypermethylated CpGs.


Type-A, Type-AB and Type-B differential methylation was predominantly hypermethylation (65.4%, 96.3% and 72.1%, respectively), whereas unclassified methylation events were more likely to be hypomethylation (37% hypermethylated). The distribution of methylation events across genomic features was similar for Type-A and Type-B differential methylation (Figure 6B). By contrast, Type-AB methylation was predominantly hypermethylation across all genomic features, and Type-AB hypomethylation at gene promoters was extremely rare. We performed pathway analysis on differentially methylated genes classified as Type-A, Type-AB or Type-B. We observed no significant pathway-level enrichment for Type-A loci that were differentially methylated between BrafW-5 and Braf9-14 animals (Table 5). Type-B differentially methylated genes were enriched for G-protein signaling, glutamate receptor signaling and the cadherin signaling pathway (Table 5). Similarly, Type-AB differentially methylated genes were associated with cadherin and glutamate receptor signaling. Notably, Type-AB differentially methylated genes were also strongly enriched for the WNT signaling cascade (Table 5).


Table 5: PANTHER pathway enrichment for differentially methylated genes in BrafW-5 versus Braf9-14 animals. Enrichment analysis was performed on genes with at least one differentially methylated CpG site mapping to the gene's promoter region, stratified by methylation type.

Methylation type | Pathway name | Fold enrichment | P-value | FDR-corrected P-value
Type-A | No significant pathway enrichment | | |
Type-AB | Ionotropic glutamate receptor pathway | 5.9 | 1.34×10⁻⁴ | 5.56×10⁻³
Type-AB | Cadherin signaling pathway | 4.88 | 2.81×10⁻⁸ | 4.66×10⁻⁶
Type-AB | Wnt signaling pathway | 2.78 | 2.30×10⁻⁵ | 1.27×10⁻³
Type-B | Ionotropic glutamate receptor pathway | 6.69 | 5.71×10⁻⁵ | 3.16×10⁻³
Type-B | Metabotropic glutamate receptor group III pathway | 5.65 | 6.56×10⁻⁵ | 2.72×10⁻³
Type-B | Heterotrimeric G-protein signaling pathway-Gi alpha and Gs | 4.29 | 2.94×10⁻⁶ | 4.87×10⁻⁴
Type-B | Heterotrimeric G-protein signaling pathway-Gq alpha and Go | 3.88 | 2.35×10⁻⁴ | 7.79×10⁻³
Type-B | Cadherin signaling pathway | 3.32 | 4.62×10⁻⁴ | 0.0128


We also compared the methylation profile of the non-neoplastic intestinal mucosa of Braf9-14 animals with murine serrated lesions (MSLs) with that of Braf9-14 animals without MSLs (n=6 per group). Differential methylation analysis identified 2,490 CpG sites that differed between these groups, 72.5% of which were significantly more methylated in Braf9-14 animals with MSLs (FDR<0.05, absolute methylation difference >10%, Supplementary Figure S5). Of these hypermethylated CpG sites, 177 mapped to 92 protein-coding genes, and enrichment analysis revealed an association with WNT signaling genes (P=9.07×10⁻⁴). Gene expression profiling, analysed by single-sample gene set enrichment analysis (ssGSEA), revealed substantial downregulation of the WNT signaling cascade in Braf9-14 animals compared with BrafW-5 mice (P=0.02, Figure 7A and 7B). This provides further support for a role of the WNT signaling pathway in the differential risk of neoplastic transformation.
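Single-sample gene set enrichment analysis (ssGSEA) scores a pathway within each sample separately. Genes are ranked by expression, and the score sums the difference between two running distributions: one over genes in the set, weighted by rank, and one over genes outside it. The sketch below follows the Barbie et al. (2009) formulation, as implemented by the GSVA package with its default rank weight of 0.25. The thesis does not state which implementation produced Figure 7. The "normalized" score shown there is assumed to be the raw score divided by the range of scores across samples, as GSVA does by default. Gene names in the test are hypothetical.

```python
from typing import Dict, Iterable


def ssgsea_score(expression: Dict[str, float], gene_set: Iterable[str],
                 alpha: float = 0.25) -> float:
    """Raw single-sample GSEA score for one sample (Barbie et al. style sketch).

    expression: gene -> expression value in this sample.
    gene_set:   genes in the pathway (e.g. WNT signaling genes).
    alpha:      exponent applied to rank weights (0.25 assumed, as in GSVA).
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    n = len(ranked)
    members = set(gene_set) & set(ranked)
    if not members or len(members) == n:
        raise ValueError("gene set must overlap the ranking and leave genes outside it")
    # Rank weight: the highest-expressed gene gets rank n, the lowest gets rank 1.
    weights = {g: (n - i) ** alpha for i, g in enumerate(ranked)}
    hit_total = sum(weights[g] for g in members)
    miss_total = n - len(members)
    hit_cum = miss_cum = 0.0
    score = 0.0
    for g in ranked:
        if g in members:
            hit_cum += weights[g] / hit_total   # weighted fraction of set genes seen
        else:
            miss_cum += 1.0 / miss_total        # fraction of non-set genes seen
        score += hit_cum - miss_cum             # accumulate the running difference
    return score
```

A pathway whose genes sit near the top of a sample's ranking scores positive, and one concentrated at the bottom scores negative. Lower WNT scores in Braf9-14 mucosa therefore indicate repression of the pathway.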

Figure 7: The WNT signaling pathway is repressed in the hyperplastic epithelium of Braf9-14 mice. Pathway activity was assessed by single-sample gene set enrichment analysis of transcript expression data obtained by RNA-Seq, using the WNT_SIGNALING and KEGG_WNT_SIGNALLING gene sets. Data presented are normalized ssGSEA scores.


Epigenetic drift is accelerated in human sessile serrated lesions

We also examined the effects of BRAF mutation on epigenetic drift and epigenetic aging in human samples. To do so, we combined three publicly available DNA methylation data sets. The first contained 232 normal colonic mucosal samples (GSE113904)36 from healthy individuals free of neoplasia, and was used to train a model to predict epigenetic age. The second contained 149 colonic mucosal samples (GSE101764)37 from colonic mucosa adjacent to cancer, and was used to test the accuracy of the model. The third consisted of 80 BRAF mutant sessile serrated lesions (SSLs; E-MTAB-7854)38. Forty of these SSLs had a focus of dysplasia, which was macrodissected and discarded; the remaining 40 had no evidence of dysplasia but were nonetheless CIMP-High lesions. We regard the former as “high risk” SSLs and the latter as “moderate risk” SSLs. Our human model of intestinal epigenetic age was constructed using elastic net regression on probes shared by all three cohorts (n=376,280). The model identified 238 CpG sites that were predictive of age. The mean error was 0.34 years in the training dataset and 2.94 years in the test dataset. When the model was applied to DNA methylation data from moderate risk SSLs, epigenetic age was a marked 11.0 years greater than chronological age (P<0.0001, Supplementary Figure S6A). This analysis did not include high risk SSLs. When we applied our model to these lesions (n=40), we observed a further elevation in epigenetic age compared with moderate risk SSLs (mean difference: 4.76 years, P=0.08, Supplementary Figure S6B). Although this difference did not reach statistical significance, it suggests that epigenetic age may inform the transformation risk of SSLs.
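Elastic net regression, used here to build the human clock and the in-house murine clock, fits a linear model of age on CpG methylation. It combines an L1 penalty, which drives most coefficients to exactly zero, with an L2 penalty that stabilises correlated CpGs. This is how a compact panel, such as the 238 predictive CpGs, can emerge from several hundred thousand probes. The thesis does not name the software used; packages such as glmnet or scikit-learn's ElasticNet are common choices. The pure-Python coordinate-descent sketch below minimises the objective documented for scikit-learn's ElasticNet, using the same alpha and l1_ratio parameterisation. The penalty values, and "age acceleration" defined as predicted minus chronological age, are illustrative assumptions.

```python
from typing import List, Tuple


def _soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator used by the L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net(X: List[List[float]], y: List[float], alpha: float = 0.1,
                    l1_ratio: float = 0.5, max_iter: int = 10000,
                    tol: float = 1e-10) -> Tuple[float, List[float]]:
    """Coordinate descent minimising
    (1/2n)||y - b0 - Xb||^2 + alpha*l1_ratio*||b||_1 + 0.5*alpha*(1-l1_ratio)*||b||^2.

    X: samples x CpGs methylation matrix; y: chronological ages.
    Returns (intercept, coefficients); most coefficients end up exactly zero.
    """
    n, p = len(y), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre predictors and response so the intercept drops out of the updates.
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]   # residuals while all coefficients are zero
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(max_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue              # constant CpG carries no information
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = _soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta


def predict_age(intercept: float, beta: List[float], sample: List[float]) -> float:
    """Epigenetic age: intercept plus the weighted sum of methylation values."""
    return intercept + sum(b * x for b, x in zip(beta, sample))


def age_acceleration(predicted: float, chronological: float) -> float:
    """Positive values mean epigenetic age exceeds chronological age."""
    return predicted - chronological
```

In practice, alpha and l1_ratio would be chosen by cross-validation on the training cohort, and accuracy would be checked on the held-out cohort, as was done with the normal and cancer-adjacent mucosa datasets.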
Discussion

This study shows that prolonged exposure to mutant Braf markedly increases age-associated epigenetic drift. In the context of BRAF mutant human colorectal cancer, these findings help explain why the risk of neoplastic transformation increases dramatically with age. The conventional model of colorectal tumorigenesis describes a multistep process whereby somatic mutations sequentially accumulate to promote tumor development over time. The data presented here support a new paradigm pertinent to serrated neoplasia. In this paradigm, the risk of progression is influenced by the milieu of age-associated epigenetic alterations in intestinal cells at the time of BRAF mutation, rather than by the length of time since polyp initiation. These findings have implications for understanding the natural history of serrated neoplasia, for informing SSL surveillance strategies relative to age and for the rational design of chemopreventive strategies that target the epigenome. Recently, using the same model presented in the current study, we showed that induction of the Braf mutation results in consistent DNA methylation changes analogous to CIMP in human colorectal cancer19. Here we have examined both normal and Braf mutant intestinal mucosa in much greater depth. Profiling of the murine intestinal methylome revealed age-associated DNA methylation changes at 8.0% of the >1 million CpGs examined. Most loci associated with epigenetic drift were also associated with Braf mutation, but these loci accumulated DNA methylation significantly faster in the Braf mutant intestine. DNA hypomethylation is also common during both aging and tumourigenesis39. It is important to note that the technology used in this study (RRBS) enriches for CpG islands, and that CpGs outside these regions tend to become demethylated with age39. We report that age-associated loci overlap significantly with the WNT signaling pathway and with genes associated with development and differentiation. Moreover, we identified >100 tumour suppressor gene promoters that accumulate DNA methylation in a temporal manner. WNT signaling is crucial in serrated colorectal carcinogenesis40, and silencing of tumour suppressor genes by DNA methylation is a well-established means of escaping normal cellular regulation. We therefore hypothesise that age-associated DNA methylation at the promoters of these genes primes the aged intestine for more rapid neoplastic transformation once a Braf mutation is acquired and an SSL is formed. This supports our conceptual model of the evolution of CIMP arising in the setting of BRAF mutation14. It has been proposed that there are two DNA methylation patterns within the colon, Type-A and Type-C methylation41, 42.
Type-A methylation occurs normally with age, whereas Type-C methylation describes the DNA methylation patterns observed in cancer42. We prefer the more precise nomenclature of Type-B methylation, describing Braf-induced alterations. In this study, we report that the majority of loci that become methylated with age do so significantly more rapidly in the presence of mutant Braf. We therefore propose that Type-A methylation should now be dichotomized into two classes. The first is pure Type-A methylation, which occurs at the same rate regardless of the presence of Braf mutation. The second is Braf-modified Type-A methylation (Type-AB), which refers to DNA methylation that normally occurs throughout the aging process but is accelerated by oncogenic exposure. On this basis, some methylation previously described as Type-C (which we now refer to as Type-B) may have resulted from this acceleration process. Such methylation would be an extreme form of Type-A methylation, more accurately described as Type-AB methylation. We hypothesize that in humans, given an unlimited lifespan, many of the DNA methylation alterations observed in BRAF mutant cancers may arise in the absence of BRAF mutation, albeit over a significantly longer period. The functional consequences of Type-A and Type-AB methylation in the setting of serrated neoplasia remain unclear, and further study is warranted to elucidate their role in priming the intestinal epithelium for neoplastic transformation.

We next examined tumor burden in animals exposed to mutant Braf for 5 months, either from wean or after ageing to 9 months. Aged mice had a 10.5-fold higher relative risk of developing a murine serrated lesion than mice exposed to mutant Braf from wean. We did not observe invasive cancer in this study. It therefore remains possible that aging facilitates the acquisition of overt cytological dysplasia in serrated lesions, but that some dwell time is still required for malignant progression. In humans, dysplastic sessile serrated lesions progress to malignancy in ~12 months11.

Comparison of the DNA methylation landscapes of these groups revealed highly divergent epigenetic profiles. Braf9-14 animals displayed extensive hypermethylation, predominantly at Type-AB loci. We observed some hypermethylation of Type-B loci, suggesting that aging predisposes certain genomic regions to rapid methylation acquisition following activation of oncogenic Braf. Pathway analysis of the affected genes did not reveal enrichment for cancer-associated pathways, and we therefore consider it improbable that Type-B methylation differences explain the increased risk of neoplasia in the aged intestine. In contrast, Type-AB methylation differences were enriched for the WNT signaling pathway. This enrichment among the differentially methylated Type-AB loci was stronger than the overall WNT enrichment of Type-AB loci (fold enrichment 2.78 vs 2.65). These data suggest that the existing epigenetic milieu is a critical determinant of tumorigenicity following oncogenic mutation of Braf. In keeping with this, when we compared Braf9-14 mice with and without MSLs, mice with MSLs showed significant hypermethylation at gene promoters, especially those encoding WNT signaling regulators. Transcriptomic profiling of the Braf9-14 intestine, in comparison with that of BrafW-5 animals, revealed significant repression of WNT signaling. This repression may heighten the selective pressure for pro-transformation (epi)genetic alterations in WNT signaling, which may ultimately result in neoplastic transformation. Consistent with this, studies of cell lines over time have shown the acquisition of DNA methylation changes at CIMP target genes, both in wildtype cells and following mutation of Braf15.

In vitro assessment of colonic organoids has also shown that DNA methylation changes accumulate over time and may create an environment more permissive to oncogenic transformation18. The in vivo study presented here validates these findings and addresses a key limitation of this earlier work: the possibility that the culture environment contributed to the effects observed by Tao et al18. Our study indicates that this is not the case and lays a foundation for clinical studies in this area. Regarding the mechanism by which Braf accelerates DNA methylation, we identify three hypotheses. The first, as proposed by Tao et al18, is that cells harbouring DNA methylation alterations acquired during aging have a growth advantage and continue to be selected in the presence of Braf mutation, increasing the overall level of DNA methylation in the sample. The second is that Braf mutation induces a process that directly influences DNA methylation (such as oxidative stress, or via MAFG as suggested by Fang et al16). The third is that Braf accelerates age-associated DNA methylation alterations as a consequence of oncogene-induced proliferation and mitosis, so that the tissue “ages” more rapidly than the wild-type intestine. Discerning between these competing (or potentially co-occurring) hypotheses is an important area of future research.

We also examined the impact of epigenetic drift in human samples. We developed an epigenetic clock model for the human colon using two independent data sets, then applied it to a series of SSLs and observed an epigenetic age acceleration of 11 years. Moreover, DNA extracted from the non-dysplastic compartment of human SSLs with a focus of dysplasia had an elevated epigenetic age compared with age-matched human SSLs with no evidence of cytological dysplasia. These data suggest that advanced epigenetic age may mark an increased risk of neoplastic progression.
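Epigenetic age acceleration can be defined in more than one way, and the paragraph above does not specify which definition underlies the 11-year figure. The Python sketch below is illustrative only and contrasts two common definitions: the simple difference between predicted epigenetic age and chronological age, and the residual from a least-squares fit of epigenetic age on chronological age. The function names and toy ages are hypothetical.

```python
from statistics import mean

def acceleration_difference(epigenetic, chronological):
    """Age acceleration as predicted epigenetic age minus chronological age."""
    return [e - c for e, c in zip(epigenetic, chronological)]

def acceleration_residual(epigenetic, chronological):
    """Age acceleration as residuals from an ordinary least-squares fit of
    epigenetic age on chronological age, fitted to the same samples."""
    mx, my = mean(chronological), mean(epigenetic)
    sxx = sum((c - mx) ** 2 for c in chronological)
    sxy = sum((c - mx) * (e - my) for c, e in zip(chronological, epigenetic))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [e - (intercept + slope * c) for e, c in zip(epigenetic, chronological)]

# Hypothetical toy data: every sample is predicted to be 10 years "older"
chrono = [50.0, 60.0, 70.0]
epi = [60.0, 70.0, 80.0]
print(acceleration_difference(epi, chrono))  # [10.0, 10.0, 10.0]
print(acceleration_residual(epi, chrono))    # [0.0, 0.0, 0.0]
```

A uniform offset is absorbed by the fitted intercept, so the residual definition reports no acceleration for a group that is uniformly "older", whereas the difference definition reports the full offset. Comparisons against an age-matched reference group therefore usually rely on the difference, or on residuals from a model fitted to the reference samples only.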

In this study, we aged mice to 9 months, equivalent to middle age in humans, prior to activating oncogenic Braf. Future studies could age animals for much longer to mimic elderly humans before inducing the Braf mutation for even shorter periods of time. In addition, further validation in human specimens is required, especially in polyps where the dwell time can be calculated from a previous clearing colonoscopy. As the CpG island methylator phenotype is an indelible component of human BRAF mutant cancers, it was the focus of our investigations. However, Lee-Six and colleagues43 recently showed a pervasive accumulation of genetic alterations that scales with age in the colonic crypts of healthy participants, reporting that ~1% of colonic crypts of middle-aged participants contained driver mutations43. It is possible that the age-associated risk of serrated neoplasia is also influenced by the accumulation of genetic lesions with age, and that some of these genetic alterations may subvert the cellular processes restraining transformation. It is likely that a combination of age-associated events contributes to the risk we have observed in the present study.

Conclusion

Here we have comprehensively evaluated epigenetic drift in the murine intestine and shown that prolonged exposure to mutant Braf markedly accelerates the rate of epigenetic drift. We have provided evidence that the WNT signaling pathway is a putative target of epigenetic drift. We have demonstrated in vivo that mutation of Braf in an aged intestine dramatically increases the rate of neoplasia compared with the same mutation occurring in a young intestine, and have shown striking methylation differences between the intestinal epithelium of animals induced at a young age and those induced at an older age. These findings have implications for our understanding of the development of serrated neoplasia across the age spectrum in humans, and may inform the rational development of personalized surveillance guidelines according to the age of the individual. The reversibility of epigenetic alterations also provides an impetus for investigating chemopreventive strategies that slow epigenetic drift to attenuate cancer development.


Methods

Murine model of serrated neoplasia

We recapitulated Braf mutant serrated neoplasia in vivo by crossing animals carrying the conditionally activated BrafV637E allele with VillinCreERT2 animals, as previously reported19. Intraperitoneal administration of tamoxifen (75 mg/kg) at 14 days after birth directs intestine-specific recombination of the BrafV637E allele, which is analogous to the human BRAFV600E mutation. Animals were genotyped using the method reported by Bond et al19. Braf mutation was activated at weaning (W) and animals were sacrificed at a range of timepoints across the murine lifespan (referred to as BrafW-X, where X is the time elapsed since induction of the Braf mutation, in days (D), weeks (W) or months (M)) (Table 6). We also examined the morphological implications of aging prior to Braf mutation by activating oncogenic Braf at nine months of age for a period of five months (Braf9-14). We assessed the histology, DNA methylation and gene expression of the intestinal epithelium of these animals (Table 6). Extended methods are available as supplementary materials. We confirmed efficient recombination of the mutant Braf allele by polyacrylamide gel electrophoresis of DNA extracted from intestinal samples (Supplementary Figure S7).

Extended Methods

Sample collection and assessment

The first 10 cm of the small intestine was excised and preserved in formalin-fixed, paraffin-embedded blocks for histological evaluation; in our experience (Bond et al 2018), most murine serrated lesions are concentrated in this region. Intestinal epithelial cells were scraped with a scalpel blade from the mucosa of the next 10 cm and snap-frozen for molecular analyses, with the remainder of the intestine preserved for histological assessment. If macroscopic murine serrated lesions were identified in the tissue section reserved for DNA methylation analysis, care was taken to collect tissue >1 cm from the margin of the lesion. Four-micron sections were cut and stained with hematoxylin and eosin using standard methods. Histological features were assessed by a specialist gastrointestinal pathologist (CL), who previously established the histological criteria for assessing murine serrated neoplasia23. DNA was extracted from the frozen samples using the Qiagen AllPrep Mini Kit (Qiagen, CA, USA) as per the manufacturer’s protocol. DNA was quantified using the Qubit dsDNA BR assay kit (Thermo Fisher, MA, USA).


Reduced representation bisulfite sequencing and analysis

We assessed genome-scale DNA methylation using the NuGEN Ovation RRBS Methyl-Seq system. In brief, DNA is digested with the methylation-insensitive restriction enzyme MspI, which cuts at CCGG motifs and yields fragments of ~250-300 bp that correspond to regions of high CpG content. Sequencing these fragments allows the assessment of hundreds of thousands to millions of CpG sites without sequencing the entire genome.

RRBS sequencing preprocessing and data analysis

We performed single-end 100 bp sequencing of libraries on a NovaSeq 6000 S1 flow cell with a target of 30 million reads per sample. Output BCL files were converted to FASTQ format and demultiplexed using bcl2fastq2. Adapters were removed and reads trimmed for poor quality using Trim Galore and the NuGEN diversity adaptor trimming script (trimRRBSdiversityAdaptCustomers.py). Reads were aligned to the mm10 reference genome using Bismark (v0.20.0)64. Positions with poor coverage (<10X in any sample) were filtered out, and methylation fractions were extracted using the methylKit R package (v1.8.1)65. Epigenetic age was modelled by elastic net regression using the glmnet R package with an alpha of 0.5; the cv.glmnet function was used for ten-fold cross-validation and identification of the minimum lambda. For human samples, the model was trained using data from Luebeck et al (2019)47 and validated using data from Barrow et al (2018)48. Murine epigenetic age was assessed using two external epigenetic age models and a model developed in-house. For analysis with the Stubbs et al epigenetic clock, clock sites were extracted and missing clock sites imputed using the k-nearest neighbor method; data were then quantile normalized together with the data from Stubbs et al prior to epigenetic age estimation. For analysis with the epigenetic clock proposed by Meer et al, clock sites were extracted, and sites missing from any individual sample were excluded from all samples prior to epigenetic age estimation. Our in-house murine epigenetic age model was constructed with the same elastic net approach (glmnet, alpha of 0.5, ten-fold cross-validation with cv.glmnet to identify the minimum lambda) and was trained on wild-type animals aged 24 days to 20 months. The coefficients generated from these analyses are available as supplementary materials (Supplementary Table S4).
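The elastic net used for these epigenetic clocks minimises a squared-error loss plus a mixture of L1 and L2 penalties; in glmnet's parameterisation, alpha = 0.5 weights the two penalties equally. As an illustration only (the analyses themselves used glmnet in R), the Python sketch below implements cyclic coordinate descent on glmnet's Gaussian objective and selects the lambda with the lowest cross-validated mean squared error, analogous to cv.glmnet's minimum lambda. It simplifies glmnet in ways that matter in practice: predictors are centred but not standardised, folds are assigned deterministically rather than at random, and no lambda path or warm starts are used. All function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net(X, y, lam, alpha=0.5, max_iter=1000, tol=1e-8):
    """Minimise (1/2n)||y - b0 - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)
    by cyclic coordinate descent. X is a list of rows (samples x CpG sites)."""
    n, p = len(y), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centring absorbs the intercept
    resid = [v - y_mean for v in y]                              # residuals while all b_j = 0
    col_ss = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # a constant site carries no information
            # (1/n) * x_j . (partial residual that excludes site j)
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            step = new - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * step
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    intercept = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return intercept, beta

def predict(intercept, beta, row):
    return intercept + sum(x * b for x, b in zip(row, beta))

def cv_lambda_min(X, y, lambdas, alpha=0.5, k=10):
    """Return the lambda with the lowest k-fold cross-validated mean squared error."""
    n = len(y)
    folds = [list(range(f, n, k)) for f in range(k)]  # deterministic interleaved folds
    best_lam, best_mse = None, float("inf")
    for lam in lambdas:
        sq_err = 0.0
        for test_idx in folds:
            if not test_idx:
                continue
            held_out = set(test_idx)
            train = [i for i in range(n) if i not in held_out]
            b0, b = elastic_net([X[i] for i in train], [y[i] for i in train], lam, alpha)
            sq_err += sum((y[i] - predict(b0, b, X[i])) ** 2 for i in test_idx)
        mse = sq_err / n
        if mse < best_mse:
            best_lam, best_mse = lam, mse
    return best_lam

# Hypothetical toy data: "age" depends linearly on the first site only
X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [2.0 * i + 1.0 for i in range(20)]
lam = cv_lambda_min(X, y, [1e3, 1e-6])
print(lam, elastic_net(X, y, lam))
```

In a real clock the columns are CpG methylation fractions and the rows are samples; sites whose coefficients are shrunk exactly to zero drop out of the model, which is why an elastic net clock retains only a subset of sites.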
CpG sites were annotated using the annotatePeaks.pl script from the HOMER software suite.

RNA-Sequencing and data analysis

Transcript expression was assessed in a subset of animals (Table 6) by RNA sequencing. Total RNA sequencing libraries were prepared using the TruSeq Stranded Total RNA with Ribo-Zero Gold kit (Illumina, CA, USA) with 500 ng of input total RNA. Input RNA quality was assessed using the TapeStation platform (Agilent, CA, USA), and RNA fragmentation and PCR cycling were optimized as per the manufacturer’s protocol. Libraries were sequenced on an Illumina NextSeq 550 platform to an average target depth of 50 million 75 bp paired-end reads per sample. For data processing, including quality control, adaptor trimming, alignment (to mm10) and transcript quantification, we used the nf-core/rnaseq pipeline (v1.4.2). As the libraries are stranded, the --reverseStranded flag was used. In brief, this pipeline trims adaptors with Trim Galore, aligns reads with STAR, marks duplicates and calculates mean fragment size using Picard, and quantifies gene expression using featureCounts from the Subread package. Quality control metrics are produced using RSeQC, FastQC, Qualimap, Preseq, dupRadar, edgeR and MultiQC, and these reports were reviewed for pre- and post-alignment quality control. Counts generated by featureCounts were normalized for feature length and library size (FPKM) using countToFPKM. Differential expression analysis was performed using DESeq2.
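FPKM scales each gene's fragment count by its length in kilobases and by the library size in millions of fragments. A minimal Python sketch follows. It assumes, as we understand countToFPKM to do, that the length used is an effective length (feature length minus mean fragment length plus one) and that library size is the sum of counts in the sample; both are assumptions worth checking against the package documentation, and the function name is hypothetical.

```python
import math

def fpkm(counts, feature_lengths, mean_fragment_length):
    """FPKM for a single sample.

    Assumed effective length = feature length - mean fragment length + 1;
    assumed library size = sum of all counts in the sample.
    """
    library_size = sum(counts)
    values = []
    for count, length in zip(counts, feature_lengths):
        effective_length = length - mean_fragment_length + 1
        if effective_length <= 0:
            values.append(math.nan)  # feature shorter than a fragment: left undefined here
        else:
            values.append(count * 1e9 / (effective_length * library_size))
    return values

# Hypothetical toy sample: two genes, mean fragment length from Picard = 251 bp
print(fpkm([100, 300], [1250, 2250], 251))
```

The factor of 1e9 combines the per-kilobase (1e3) and per-million-fragments (1e6) scalings.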

Statistical analysis

Statistical analyses were performed in R and JMP (v13) and are reported throughout the manuscript. For the ANCOVA comparing wild-type and Braf mutant animals, the model was: Chronological Age ~ Epigenetic Age * Braf Status. Differential DNA methylation and transcript expression were assessed using methylKit and DESeq2, respectively. For enrichment analyses we used the PANTHER enrichment tool, with gene lists from the Gene Ontology database (release date: 08/10/2019) and the PANTHER pathways database (release date: 03/12/2019). Where appropriate, P values were corrected for multiple testing using the false discovery rate method. Unless otherwise indicated in the text, FDR corrections were applied across the entire dataset. Where FDR corrections were applied to a subset of the data, subsetting was hypothesis-driven and performed without inspecting the underlying values or analyses. P<0.05 was the threshold for statistical significance.
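The false discovery rate method referred to above is most commonly the Benjamini-Hochberg step-up procedure (R's p.adjust with method "BH", for which "fdr" is an alias). Assuming that is the variant used, the Python sketch below reproduces the adjustment for illustration; the function name is hypothetical.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices from smallest to largest P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity with a running minimum
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

Restricting the input list to a hypothesis-driven subset, as described above, simply changes m, the number of tests over which the correction is spread.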


Table 6: Summary of timepoints and analysis performed on the respective samples

Group; Reduced Representation Bisulphite Sequencing (DNA Methylation); RNA-Seq; Histopathological Assessment

BrafW-10D 5 3
BrafW-10W 4
BrafW-5M 9 4 24
BrafW-8M 3
BrafW-10M 4
BrafW-14M 4 4
Braf9-14M 12 4 32
Wt10D 3 3
Wt10W 3
Wt5M 3 3
Wt8M 4
Wt10M 4
Wt14M 4 3
Wt20M 5


References

1. Spring KJ, Zhao ZZ, Karamatic R, et al. High prevalence of sessile serrated adenomas with BRAF mutations: a prospective study of patients undergoing colonoscopy. Gastroenterology 2006;131:1400-7.
2. Carr NJ, Mahajan H, Tan KL, et al. Serrated and non-serrated polyps of the colorectum: their prevalence in an unselected case series and correlation of BRAF mutation analysis with the diagnosis of sessile serrated adenoma. J Clin Pathol 2009;62:516-8.
3. Pohl H, Srivastava A, Bensen SP, et al. Incomplete polyp resection during colonoscopy-results of the complete adenoma resection (CARE) study. Gastroenterology 2013;144:74-80 e1.
4. Bettington M, Walker N, Rahman T, et al. High prevalence of sessile serrated adenomas in contemporary outpatient colonoscopy practice. Intern Med J 2017;47:318-323.
5. Bettington M, Brown I, Rosty C, et al. Sessile Serrated Adenomas in Young Patients may have Limited Risk of Malignant Progression. J Clin Gastroenterol 2018.
6. Fennell L, Dumenil T, Wockner L, et al. Integrative Genome-Scale DNA Methylation Analysis of a Large and Unselected Cohort Reveals 5 Distinct Subtypes of Colorectal Adenocarcinomas. Cell Mol Gastroenterol Hepatol 2019;8:269-290.
7. Lieberman DA, Rex DK, Winawer SJ, et al. Guidelines for colonoscopy surveillance after screening and polypectomy: a consensus update by the US Multi-Society Task Force on Colorectal Cancer. Gastroenterology 2012;143:844-857.
8. Fan C, Younis A, Bookhout CE, et al. Management of Serrated Polyps of the Colon. Curr Treat Options Gastroenterol 2018;16:182-202.
9. Leggett B, Whitehall V. Role of the serrated pathway in colorectal cancer pathogenesis. Gastroenterology 2010;138:2088-100.
10. Bettington M, Walker N, Clouston A, et al. The serrated pathway to colorectal carcinoma: current concepts and challenges. Histopathology 2013;62:367-86.
11. Bettington M, Walker N, Rosty C, et al. Clinicopathological and molecular features of sessile serrated adenomas with dysplasia or carcinoma. Gut 2017;66:97-106.
12. Weisenberger DJ, Siegmund KD, Campan M, et al. CpG island methylator phenotype underlies sporadic microsatellite instability and is tightly associated with BRAF mutation in colorectal cancer. Nat Genet 2006;38:787-93.
13. Kambara T, Simms LA, Whitehall VL, et al. BRAF mutation is associated with DNA methylation in serrated polyps and cancers of the colorectum. Gut 2004;53:1137-44.


14. Liu C, Bettington ML, Walker NI, et al. CpG Island Methylation in Sessile Serrated Adenomas Increases With Age, Indicating Lower Risk of Malignancy in Young Patients. Gastroenterology 2018;155:1362-1365 e2.
15. Hinoue T, Weisenberger DJ, Pan F, et al. Analysis of the association between CIMP and BRAF in colorectal cancer by DNA methylation profiling. PLoS One 2009;4:e8357.
16. Fang M, Ou J, Hutchinson L, et al. The BRAF oncoprotein functions through the transcriptional repressor MAFG to mediate the CpG Island Methylator phenotype. Mol Cell 2014;55:904-915.
17. Minoo P, Baker K, Goswami R, et al. Extensive DNA methylation in normal colorectal mucosa in hyperplastic polyposis. Gut 2006;55:1467-74.
18. Tao Y, Kang B, Petkovich DA, et al. Aging-like Spontaneous Epigenetic Silencing Facilitates Wnt Activation, Stemness, and Braf(V600E)-Induced Tumorigenesis. Cancer Cell 2019;35:315-328 e6.
19. Bond CE, Liu C, Kawamata F, et al. Oncogenic BRAF mutation induces DNA methylation changes in a murine model for human serrated colorectal neoplasia. Epigenetics 2017:1-20.
20. Maegawa S, Hinkal G, Kim HS, et al. Widespread and tissue specific age-related DNA methylation changes in mice. Genome Res 2010;20:332-40.
21. Teschendorff AE, West J, Beck S. Age-associated epigenetic drift: implications, and a case of epigenetic thrift? Hum Mol Genet 2013;22:R7-R15.
22. Veitia RA, Govindaraju DR, Bottani S, et al. Aging: Somatic Mutations, Epigenetic Drift and Gene Dosage Imbalance. Trends Cell Biol 2017;27:299-310.
23. Gentilini D, Garagnani P, Pisoni S, et al. Stochastic epigenetic mutations (DNA methylation) increase exponentially in human aging and correlate with X chromosome inactivation skewing in females. Aging (Albany NY) 2015;7:568-78.
24. Shah S, McRae AF, Marioni RE, et al. Genetic and environmental exposures constrain epigenetic drift over the human life course. Genome Res 2014;24:1725-33.
25. Rakyan VK, Down TA, Maslau S, et al. Human aging-associated DNA hypermethylation occurs preferentially at bivalent chromatin domains. Genome Res 2010;20:434-9.
26. Han Y, Eipel M, Franzen J, et al. Epigenetic age-predictor for mice based on three CpG sites. Elife 2018;7.
27. Horvath S. DNA methylation age of human tissues and cell types. Genome Biol 2013;14:R115.
28. Horvath S, Ritz BR. Increased epigenetic age and granulocyte counts in the blood of Parkinson's disease patients. Aging (Albany NY) 2015;7:1130-42.


29. Levine ME, Hosgood HD, Chen B, et al. DNA methylation age of blood predicts future onset of lung cancer in the women's health initiative. Aging (Albany NY) 2015;7:690-700.
30. Christiansen L, Lenart A, Tan Q, et al. DNA methylation age is associated with mortality in a longitudinal Danish twin study. Aging Cell 2016;15:149-54.
31. Perna L, Zhang Y, Mons U, et al. Epigenetic age acceleration predicts cancer, cardiovascular, and all-cause mortality in a German case cohort. Clin Epigenetics 2016;8:64.
32. Mi H, Muruganujan A, Ebert D, et al. PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res 2019;47:D419-D426.
33. Stubbs TM, Bonder MJ, Stark AK, et al. Multi-tissue DNA methylation age predictor in mouse. Genome Biol 2017;18:68.
34. Meer MV, Podolskiy DI, Tyshkovskiy A, et al. A whole lifespan mouse multi-tissue DNA methylation clock. Elife 2018;7.
35. Bond C, Liu C, Kawamata F, et al. Oncogenic BRAF Mutation Induces Widespread DNA Hypermethylation in a Murine Model for Human Serrated Colorectal Neoplasia, In Lorne Cancer Conference, Lorne, 2017.
36. Luebeck GE, Hazelton WD, Curtius K, et al. Implications of Epigenetic Drift in Colorectal Neoplasia. Cancer Res 2019;79:495-504.
37. Barrow TM, Klett H, Toth R, et al. Smoking is associated with hypermethylation of the APC 1A promoter in colorectal cancer: the ColoCare Study. J Pathol 2017;243:366-375.
38. Liu C, Fennell LJ, Bettington ML, et al. DNA methylation changes that precede onset of dysplasia in advanced sessile serrated adenomas. Clin Epigenetics 2019;11:90.
39. Christensen BC, Houseman EA, Marsit CJ, et al. Aging and environmental exposures alter tissue-specific DNA methylation dependent upon CpG island context. PLoS Genet 2009;5:e1000602.
40. Borowsky J, Dumenil T, Bettington M, et al. The role of APC in WNT pathway activation in serrated neoplasia. Modern Pathology 2018;31:495-504.
41. Kawakami K, Ruszkiewicz A, Bennett G, et al. DNA hypermethylation in the normal colonic mucosa of patients with colorectal cancer. Br J Cancer 2006;94:593-8.
42. Toyota M, Issa JP. CpG island methylator phenotypes in aging and cancer. Semin Cancer Biol 1999;9:349-57.


43. Lee-Six H, Olafsson S, Ellis P, et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 2019;574:532-537.


Chapter Three: APC Mutation marks an aggressive subtype of BRAF mutant colorectal cancers

As published in Cancers

Fennell, L.J., Kane, A., Liu, C., McKeone, D., Fernando, W., Su, C., Bond, C., Jamieson, S., Dumenil, T., Patch, A.M., Kazakoff, S.H., Pearson, J.V., Waddell, N., Leggett, B., Whitehall, V. APC mutation marks an aggressive subtype of BRAF mutant colorectal cancers. Cancers, 2020.

Supplementary Materials are available at: https://doi.org/10.5281/zenodo.4560888

Relevance to thesis aims

Hypothesis: Early-onset BRAF mutant colorectal cancers acquire early genetic alterations of WNT signalling to progress to malignancy, bypassing the need for extensive DNA methylation at WNT signalling regulators.

Aim: To survey the landscape of WNT signalling pathway alterations in BRAF mutant colorectal cancers.

At the outset of this thesis, we acknowledged that if aging were a major contributor to serrated neoplasia, there must be an explanation for the rare early-onset BRAF mutant cancers that we observe. In Chapter Two, we identified pervasive accumulation of methylation at WNT signalling loci and hypothesised that this may be the conduit for the increased risk of neoplasia with age. In the conventional pathway, WNT signalling is activated by mutations in the APC gene. Here, we examined whether truncating mutations in APC, or mutations in other WNT pathway loci, occur more frequently in earlier-onset BRAF mutant cancers. We show that BRAF mutant cancers occurring earlier in life are significantly more likely to harbour truncating APC mutations, and that this combination of mutations confers an aggressive phenotype. We validated the aggressive nature of Braf and Apc co-mutation in a murine model, showing decreased survival and a markedly increased polyp burden.


Author Contributions

Author | Bioinformatic Analysis | Statistical Analysis | Experimental Work | Conceptualisation | Drafting the Manuscript | Editing the Manuscript
Fennell L.J | 70% | 100% | 25% | 70% | 100% | 22%
Kane A | - | - | 10% | - | - | 4%
Liu C | - | - | 10% | - | - | 4%
McKeone D | - | - | 20% | - | - | 4%
Fernando W | - | - | 10% | 5% | - | 4%
Su C | - | - | 10% | - | - | 4%
Bond C | - | - | 5% | - | - | 4%
Jamieson S | - | - | 5% | - | - | 4%
Dumenil T | - | - | 5% | - | - | 4%
Patch AM | 10% | - | - | - | - | 4%
Kazakoff S | 10% | - | - | - | - | 4%
Pearson J | 5% | - | - | - | - | 4%
Waddell N | 5% | - | - | - | - | 4%
Leggett B | - | - | - | 12.5% | - | 10%
Whitehall V | - | - | - | 12.5% | - | 20%


Abstract

Background: WNT activation is a hallmark of colorectal cancer. BRAF mutation is present in 15% of colorectal cancers, and the role of mutations in WNT signaling regulators in this context is unclear. Here we evaluate the mutational landscape of WNT signaling regulators in BRAF mutant cancers.

Methods: We performed exome sequencing on 24 BRAF mutant colorectal cancers and analysed these data in combination with 175 publicly available BRAF mutant colorectal cancer exomes. We assessed the somatic mutational landscape of WNT signaling regulators, and performed hotspot and driver mutation analyses to identify potential drivers of WNT signaling. The effects of Apc and Braf mutation were modelled in vivo using ApcMin/+ and BrafV637E/Villin-CreERT2/+ mice, respectively.

Results: RNF43 was the most frequently mutated WNT signaling regulator (41%). Mutations in the beta-catenin destruction complex occurred in 48% of cancers. Hotspot analyses identified potential cancer driver genes in the WNT signaling cascade, including MEN1, GNG12 and WNT16. Truncating APC mutation was identified in 20.8% of cancers and was associated with early age at diagnosis (P<2×10⁻⁵), advanced stage (P<0.01) and poor survival (P=0.026). ApcMin/+/BrafV637E animals had more numerous and larger small intestinal (SI) and colonic lesions (P<0.0001 and P<0.05, respectively), and markedly reduced survival (median survival: 3.2 months, P=8.8×10⁻²¹) compared with animals with Apc or Braf mutation alone.

Conclusions: The WNT signaling axis is frequently mutated in BRAF mutant colorectal cancers. WNT16 and MEN1 may be novel drivers of aberrant WNT signaling in colorectal cancer. Co-mutation of BRAF and APC generates an extremely aggressive neoplastic phenotype that is associated with poor patient outcome.


Background

Colorectal cancer is a heterogeneous disease that arises through two main molecular pathways. The conventional pathway, which accounts for 75-80% of all colorectal cancer diagnoses, is initiated by biallelic inactivation of APC and progresses to cancer via mutations in KRAS and alterations of the TP53 gene. By contrast, the serrated neoplasia pathway is initiated by activating mutations in BRAF and often progresses to malignancy via MLH1 hypermethylation, microsatellite instability and a plethora of epigenetic alterations. At the transition to dysplasia, serrated lesions usually acquire mutations that increase WNT signaling. Sessile serrated lesions (SSLs) acquire missense APC mutations1 and truncating RNF43 mutations2. In traditional serrated adenomas (TSAs), common WNT pathway aberrations include RSPO3 fusions3,4, CTNNB1 mutations3 and APC mutations3.

In the normal intestinal epithelium, the WNT signaling cascade supports stemness, differentiation and development. Appropriate levels of WNT signaling are maintained intracellularly by the β-catenin destruction complex, which consists of AXIN, APC, GSK3β and CK1α. The destruction complex phosphorylates β-catenin in the cytosol, targeting it for ubiquitination and subsequent proteasomal degradation. In the absence of a functional destruction complex, β-catenin translocates to the nucleus and forms a complex with TCF/LEF transcription factors and p300 to activate the expression of genes supporting the stem phenotype. Constitutive WNT signaling is deleterious to the cell, and thus in the absence of exogenous stimuli the β-catenin destruction complex patrols the cytosol and degrades β-catenin. WNT signaling is activated by the binding of extracellular WNT ligands to Frizzled receptors on the cell surface. This triggers sequestration of the destruction complex at the cell membrane and allows β-catenin to build up, enter the nucleus and activate WNT target genes.

Approximately 45-50% of BRAF mutant cancers show dysregulated WNT signaling1, suggesting that the WNT signaling pathway is important in serrated colorectal neoplasia. In conventional colorectal carcinogenesis, WNT signaling is dysregulated via truncating mutations of APC and loss of 5q21, the region in which the APC gene resides5. This dysregulation occurs very early in the evolution of conventional adenomas. However, numerous studies have indicated that BRAF mutation is almost never identified in such APC mutated adenomas, even when they develop advanced histological features6,7. This suggests that BRAF and APC mutations are largely mutually exclusive in conventional adenomas.


In the serrated neoplasia pathway, where the initiating mutation is BRAF, WNT signaling commonly becomes dysregulated only when the benign polyp transitions to malignancy. Truncating RNF43 mutations may alter WNT signaling, but these are predominantly present in mismatch repair deficient BRAF mutant cancers2,8, and it is controversial whether RNF43 mutation affects canonical WNT signaling9. Epigenetic silencing of WNT pathway members is another possible mechanism for altering canonical WNT pathway activity. Methylation of SFRP genes increases WNT signaling10 and is common in colorectal cancer11. Similarly, DNA methylation-induced inactivation of DKK genes, which encode antagonists of WNT signaling, occurs in ~20% of all colon cancers12. The proportion of WNT signaling dysregulation attributable to APC mutation in this setting is not well established.

Here we have conducted a large-scale genomic analysis of the somatic mutations that underlie WNT signaling activation in BRAF mutant colorectal cancer. We hypothesise that WNT signaling activation in BRAF mutant cancers is heterogeneous, and that a mosaic of alterations underpins WNT signaling to achieve a “just-right” level of pathway activation.


Methods

Cohorts included in the study

We assessed the somatic mutational landscape of 199 BRAF mutant cancers from four distinct sources: The Cancer Genome Atlas project (n=51)7,13, the Dana-Farber Cancer Institute (Giannakis et al 2016, n=111)14, the Clinical Proteomic Tumor Analysis Consortium (Suhas et al 2019, n=13)15, and additional BRAF mutant cancers sequenced as part of this study (methods detailed below, n=24). For survival analyses, we also included targeted sequencing data from the Memorial Sloan Kettering Cancer Center (Yaeger et al 2018, n=76)16. This dataset was limited to a panel of genes and was therefore excluded from other analyses. Tumor mutation burden was similar across cohorts (Supplementary Figure 1). BRAF wild-type cancers (n=512) were included for comparison of mutational profiles between BRAF mutant and wild-type cancers. Clinicopathological details of the samples included in this study and mutational data are available as supplementary materials (Supplementary Tables S1 and S2).

DNA extraction, library preparation and exome sequencing of local samples

Cancer and germline samples were obtained from patients at the Royal Brisbane and Women’s Hospital, Brisbane, Australia at the time of surgery. All participants gave written, informed consent prior to participating in the study, and the study was approved by the QIMR Berghofer Human Research Ethics Committee (P460, P773). DNA was isolated from whole blood using the salt precipitation method as previously reported17. Cancer samples were snap-frozen in liquid nitrogen and DNA extracted using the AllPrep DNA/RNA/Protein Mini Kit (QIAGEN, Germany) as previously reported18. Exome sequencing libraries were generated using the Agilent SureSelect Human All Exon V4+UTR capture platform (Agilent, CA, USA). Libraries were sequenced to a target depth of 200-fold coverage in a 100 bp paired-end sequencing run on an Illumina HiSeq 2000 instrument. Sequence reads were trimmed using Cutadapt (v1.9)19 and aligned to the GRCh37 reference with BWA-MEM (v0.7.12)20. Alignments were duplicate-marked with Picard (v1.129, https://broadinstitute.github.io/picard/) and coordinate-sorted using Samtools (v1.1)21. Single nucleotide substitution variants were detected using a dual calling strategy combining qSNP (v2.0)22 and the GATK HaplotypeCaller (v3.3-0)23. The HaplotypeCaller was also used to call short indels of ≤50 bp. Initial read filtering for all variants required a minimum of 35 alignment matches in the CIGAR string, 3 or fewer mismatches in the MD field, and a mapping quality greater than 10. High-confidence variants were selected with: a minimum coverage of 8 reads in the control data and 12 reads in the tumour data; at least 5 variant-supporting reads in which the variant was not within the first or last 5 bases; at least 4 of these 5 reads with unique start positions; identification of the variant in reads of both sequencing directions; and a distance of at least 5 bp from any mono-nucleotide run of 7 or more bases. Variants were annotated with gene feature information and transcript or protein consequences using SnpEff (v4.0e)24. Sequencing and QC metrics are reported in Supplementary Table S3.
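The high-confidence criteria listed above translate into a per-variant filter. The Python sketch below is an illustrative reading of those criteria, not the qSNP or GATK implementation. The SupportingRead record, the function names, the definition of distance to a homopolymer, and our interpretation of "at least 4 of these 5 reads with unique start positions" (at least four distinct start positions among the qualifying reads) are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SupportingRead:
    start: int           # alignment start position (for the unique-start criterion)
    variant_offset: int  # 0-based position of the variant base within the read
    read_length: int
    reverse: bool        # True if the read aligned to the reverse strand

def homopolymer_distance(ref: str, pos: int, min_run: int = 7) -> Optional[int]:
    """Distance (bp) from ref[pos] to the nearest mono-nucleotide run of
    >= min_run bases; 0 if pos lies inside such a run, None if there is none."""
    best = None
    i = 0
    while i < len(ref):
        j = i
        while j < len(ref) and ref[j] == ref[i]:
            j += 1
        if j - i >= min_run:  # the run spans ref[i:j]
            if i <= pos < j:
                d = 0
            elif pos < i:
                d = i - pos
            else:
                d = pos - (j - 1)
            best = d if best is None else min(best, d)
        i = j
    return best

def is_high_confidence(normal_depth: int, tumour_depth: int,
                       reads: List[SupportingRead], ref: str, pos: int) -> bool:
    if normal_depth < 8 or tumour_depth < 12:
        return False
    # Keep supporting reads in which the variant is not in the first or last 5 bases
    inner = [r for r in reads if 5 <= r.variant_offset < r.read_length - 5]
    if len(inner) < 5:
        return False
    # Require at least 4 distinct start positions among those reads
    if len({r.start for r in inner}) < 4:
        return False
    # Require support from both sequencing directions
    if {r.reverse for r in inner} != {True, False}:
        return False
    # Reject variants closer than 5 bp to a homopolymer run of >= 7 bases
    d = homopolymer_distance(ref, pos)
    return d is None or d >= 5
```

Here `ref` is a hypothetical window of reference sequence around the variant and `pos` is the variant's index within that window.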

Assessing the somatic mutational landscape of WNT regulators

To assess the somatic mutational landscape of WNT signaling regulators, we downloaded mutation annotation format (MAF) files for each cohort from the Genomic Data Commons (TCGA cohort7,13), cBioPortal (CPTAC cohort15) and published supplementary materials (DFCI cohort14), and generated them from our own analysis of the Royal Brisbane and Women’s Hospital cases. The MAF files were concatenated to form a combined MAF file comprising 924,366 entries from 1,411 samples. BRAF V600E mutant samples were subset from the larger dataset, yielding a total of 320,431 variants from 199 samples. As we sought to investigate WNT signaling, we further selected only genes that were members of the REACTOME signaling by WNT gene set (n=327 genes). The final dataset comprised 5,327 nonsynonymous variants in WNT signaling loci from 199 samples. Variants were analysed using the maftools R package25. Cancer drivers were predicted using two orthogonal approaches (OncodriveCLUST26: default parameters as implemented in maftools; OncodriveFML27: scores, CADD v1.3; signature, computed by sample; remaining parameters, default). Driver mutation analyses were performed on the entire set of variants (not limited to the WNT signaling gene set) to accurately model the background mutational processes and avoid statistical biases. Results from non-WNT loci were then discarded, and FDR corrections were performed on the remaining P values pertaining to genes in the WNT signaling pathway. Somatic interactions (co-occurring and mutually exclusive mutations) were identified by performing Fisher’s exact test on pairs of genes.
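The subsetting steps described above (restricting the pooled MAF to BRAF V600E mutant samples, then to non-synonymous variants in the REACTOME WNT gene set) can be sketched in Python as follows. Column names follow GDC/vcf2maf-style MAF files (Tumor_Sample_Barcode, Hugo_Symbol, Variant_Classification, HGVSp_Short). Identifying V600E samples through HGVSp_Short, the exact set of non-synonymous classes, and the tiny gene set in the usage line are assumptions for illustration; the study itself used maftools in R.

```python
import csv

# Classes treated as non-synonymous here; intended to mirror maftools' default
# non-synonymous set, which should be verified before reuse.
NON_SYNONYMOUS = {
    "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site", "Translation_Start_Site",
    "Nonsense_Mutation", "Nonstop_Mutation", "In_Frame_Del", "In_Frame_Ins",
    "Missense_Mutation",
}

def read_maf(path):
    """Read a tab-delimited MAF file, skipping '#' comment/version lines."""
    with open(path, newline="") as handle:
        lines = [line for line in handle if not line.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))

def braf_v600e_samples(rows):
    """Samples carrying BRAF p.V600E, identified via the HGVSp_Short column."""
    return {r["Tumor_Sample_Barcode"] for r in rows
            if r["Hugo_Symbol"] == "BRAF" and r.get("HGVSp_Short") == "p.V600E"}

def select_wnt_variants(rows, wnt_genes):
    """Non-synonymous variants in WNT genes from BRAF V600E mutant samples."""
    samples = braf_v600e_samples(rows)
    return [r for r in rows
            if r["Tumor_Sample_Barcode"] in samples
            and r["Hugo_Symbol"] in wnt_genes
            and r["Variant_Classification"] in NON_SYNONYMOUS]

# Usage (hypothetical file name and a tiny stand-in for the 327-gene REACTOME set):
# rows = read_maf("combined.maf")
# wnt_variants = select_wnt_variants(rows, {"APC", "RNF43", "AXIN2", "CTNNB1"})
```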

Murine model of Apc and Braf mutation

To model the effects of Apc and Braf mutation on colorectal neoplasia, we used two murine models. The ApcMin/+ mouse carries a mono-allelic mutation at codon 851 and recapitulates human germline APC mutation; in both humans and mice, progression is governed by loss of the remaining allele. Our second model, the BrafCA/CA/Villin-CreERT2/+ mouse28,29, is an inducible model of Braf mutant colorectal neoplasia. Recombination of the mutant Braf V637E allele is induced at 2 weeks of age by a single intraperitoneal injection of tamoxifen (75 mg/kg). The Braf V637E allele is the murine analogue of the human BRAF V600E mutation. To model the combined effects of Apc and Braf mutation, we crossed ApcMin/+ mice with BrafCA/CA/Villin-CreERT2/+ mice. Animals were monitored biweekly for signs of distress and humanely euthanized when such signs were identified, as per our approved protocol (QIMR Berghofer Animal Ethics Committee; P1208). For survival analysis, animals euthanized due to distress were deemed deceased, whereas animals that reached the prescribed experimental endpoint without signs of distress were deemed to have survived and were censored. At sacrifice, the gastrointestinal tract from oesophagus to rectum was removed, cleaned and opened longitudinally. Macroscopic lesions were bisected to obtain both molecular and histological data. Matched normal hyperplastic tissue was taken a minimum of five centimetres from the site of the lesion. Histological assessment of lesions and lesion counts were performed by specialist gastrointestinal anatomical pathologists on haematoxylin and eosin stained sections from formalin-fixed, paraffin-embedded blocks.
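Under the censoring rule above, each animal contributes a follow-up time and an event flag (1 if euthanized for distress, 0 if censored at the experimental endpoint). The Kaplan-Meier estimator and median survival can then be computed as in the Python sketch below. This is an illustrative re-implementation, not the software used in the study; the function names and toy follow-up times are hypothetical, and the median is taken as the first time at which estimated survival falls to 0.5 or below, a common convention.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate as a list of (time, survival) steps.
    events[i] is 1 for an observed death (euthanized for distress), 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored animals both leave the risk set
    return curve

def median_survival(curve):
    """First time at which estimated survival is <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical follow-up in months for five animals; the third and fifth are censored
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 0])
print(curve, median_survival(curve))
```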

Sanger Sequencing

Sanger sequencing was performed to assess the G7 repeat tract of WNT16. PCR conditions were as follows: 1X GoBuffer (Promega, USA), 2.5 mM MgCl2, 0.25 mM dNTPs, 0.25 µM forward primer (5’ GGCAACATGACAGAGTGTTCC 3’), 0.25 µM reverse primer (5’ GCCATACTGGACATCATCGG 3’), 0.25 µM Syto9, 1 U GoTaq DNA polymerase (Promega, USA) and 50 ng DNA. Cycling conditions were an initial hold at 95°C for two minutes; 40 cycles of 95°C for 30 seconds, 60°C for 30 seconds and 72°C for 45 seconds; and a final hold at 72°C for five minutes after cycle 40. Sequencing was performed as per Fennell et al30.
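
Scoring a read across the WNT16 G7 tract comes down to measuring the homopolymer length and asking whether the change from seven bases is a multiple of three. The sketch below is a hypothetical illustration, not the analysis used here. It assumes the base calls have already been resolved to a single allele (heterozygous Sanger traces show overlapping peaks downstream of an indel and need separate deconvolution), and its flanking bases are placeholders, not the WNT16 reference sequence.

```python
import re


def longest_g_run(sequence):
    """Length of the longest run of consecutive G bases in a read."""
    runs = re.findall(r"G+", sequence.upper())
    return max((len(run) for run in runs), default=0)


def classify_g7_tract(sequence, reference_length=7):
    """Classify a read spanning a G7 tract by its homopolymer length.

    Assumes the read covers only the tract plus short G-free flanks, so the
    longest G run is the tract itself (an illustrative simplification).
    """
    delta = longest_g_run(sequence) - reference_length
    if delta == 0:
        return "wild-type"
    if delta % 3 == 0:
        return "in-frame indel"
    return "frameshift"


# Hypothetical G-free flanks around the tract; not the real WNT16 sequence.
for read in ["ACT" + "G" * 7 + "TCA", "ACT" + "G" * 6 + "TCA", "ACT" + "G" * 8 + "TCA"]:
    print(read, classify_g7_tract(read))
```

A one-base deletion (G6) or insertion (G8) shifts the reading frame, which is the class of hotspot indel described for WNT16 in the Results.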

Statistical Analysis

All statistical analyses were performed in Microsoft R Open (v3.5.1). Student's t-tests were used for hypothesis testing of continuous variables. Logistic regression was used to model the probability of mutation as a function of patient age. The likelihood-ratio test was used to assess associations with categorical variables, and Fisher's exact test to test for mutual exclusivity of mutations in gene pairs.

Results

The somatic mutation landscape of WNT signaling in BRAF mutated cancers

To assess the degree of variation in genetic alterations of WNT signaling pathway genes, we collated whole exome sequencing data of BRAF mutant colorectal cancers from three previously published studies7,14,15 and combined these with 24 samples sequenced in-house (total n=199). We limited whole exome sequencing variants to genes in the WNT signaling cascade, as defined by the REACTOME signaling by WNT gene set (n=327 genes). The mean number of WNT pathway mutations per sample was 16.7±13.6 and was highly correlated with overall tumour mutation burden (P=4.08×10⁻⁸⁵, r²=0.86). RNF43 was the most commonly mutated gene (41%, Figure 1), and 35.3% of samples had a truncating mutation in RNF43. KMT2D, TRRAP and APC were mutated in 33%, 30% and 28% of samples, respectively (Figure 1).
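
The subsetting described in the Methods (BRAF V600E samples first, then REACTOME WNT genes, then non-synonymous variants) can be sketched with the Python standard library. The column names are standard MAF fields. The non-synonymous class list is assumed to mirror maftools' defaults, and the records below are hypothetical.

```python
import csv
import io
from collections import Counter

# Assumed to match maftools' default non-synonymous variant classes.
NONSYNONYMOUS = {
    "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site", "Translation_Start_Site",
    "Nonsense_Mutation", "Nonstop_Mutation", "In_Frame_Del", "In_Frame_Ins",
    "Missense_Mutation",
}


def wnt_variants_in_braf_v600e(maf_text, wnt_genes):
    """Return non-synonymous WNT-gene variants from BRAF V600E samples only."""
    rows = list(csv.DictReader(io.StringIO(maf_text), delimiter="\t"))
    # Step 1: samples carrying BRAF V600E anywhere in the combined MAF.
    braf_samples = {
        r["Tumor_Sample_Barcode"] for r in rows
        if r["Hugo_Symbol"] == "BRAF" and r["HGVSp_Short"] == "p.V600E"
    }
    # Steps 2 and 3: keep WNT-pathway genes and non-synonymous classes.
    return [
        r for r in rows
        if r["Tumor_Sample_Barcode"] in braf_samples
        and r["Hugo_Symbol"] in wnt_genes
        and r["Variant_Classification"] in NONSYNONYMOUS
    ]


maf_text = "\n".join([
    "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Classification\tHGVSp_Short",
    "BRAF\tS1\tMissense_Mutation\tp.V600E",
    "RNF43\tS1\tFrame_Shift_Del\tp.G659fs",
    "APC\tS1\tSilent\tp.T1493T",
    "APC\tS2\tNonsense_Mutation\tp.R1450*",
    "BRAF\tS3\tMissense_Mutation\tp.V600E",
    "TTN\tS3\tMissense_Mutation\tp.A100T",
])
variants = wnt_variants_in_braf_v600e(maf_text, {"RNF43", "APC"})
print(Counter(r["Tumor_Sample_Barcode"] for r in variants))
```

In this toy input, sample S2 is dropped because it lacks BRAF V600E, the silent APC change in S1 is dropped as synonymous, and S3 remains a BRAF mutant sample that simply contributes no WNT variants.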

Figure 1: The somatic mutation landscape of WNT signaling regulators in BRAF mutant colorectal cancers. The 30 most frequently mutated genes in the WNT pathway are depicted. Each column corresponds to a single cancer. Bar colour indicates the type of mutation (grey = wild-type). The barplot at the top of the figure shows the number of WNT pathway mutations in each sample. The plot on the right of the figure shows the number of mutations in each gene, colour-coded by mutation type. Microsatellite instability status is indicated below the mutation plot.

The β-catenin destruction complex is an important regulator of canonical WNT signaling and comprises APC, AXIN1, AXIN2 and GSK3β (Figure 2). We next evaluated how frequently mutations occur in any component of this complex or in CTNNB1 (β-catenin) itself. Overall, 48% of cancers had mutations in at least one of these five genes. Mutations in APC and AXIN2, but not AXIN1, tended toward mutual exclusivity. Missense mutations in CTNNB1 have been reported to render the protein resistant to ubiquitin-mediated destruction. CTNNB1 mutations occurred in 9% of samples, and missense CTNNB1 mutations were mutually exclusive with truncating APC mutations.

APC is a classical tumour suppressor gene, in which loss of both copies is required for complete loss of function. We next assessed the frequency of double- and triple-hit mutational events in APC. Of cancers with truncating APC mutations, 50% also harboured a second truncating mutation in APC, and one cancer had three truncating mutations in APC. When missense mutations were included in the assessment of potential double-hit events, this percentage rose only to 54%. These data suggest that missense mutations are not selected for as second hits at the APC locus, and raise questions regarding their functional impact.

Figure 2: Mutations in the β-catenin destruction complex. Each column corresponds to a single cancer and each row to a single gene.

Somatic mutation interaction analysis identifies co-mutated WNT signaling loci and mutual exclusivity of truncating APC and RNF43 mutations

We performed somatic mutation interaction analyses to identify gene pairs whose mutations are mutually exclusive and those whose mutations tend to co-occur. We found evidence for co-occurring mutations in 222 gene pairs (Figure 3, Supplementary Table 4). As truncating mutations are more likely to affect the final protein, we next examined somatic interactions between truncating mutations in WNT pathway genes (Figure 3). We identified statistical evidence for somatic interactions between 75 gene pairs, 96% of which were co-occurring (Supplementary Table 4). Truncating APC mutations were mutually exclusive with truncating mutations in both RNF43 (P=0.0003, OR: 0.20) and ZNRF3 (P=0.001, OR: 0). Truncating AMER1 mutations were mutually exclusive with RNF43 mutations (P=0.043, OR: 0.12).
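
Each test in the interaction analysis reduces a gene pair to a 2×2 table of samples (both genes mutated, first gene only, second gene only, neither). A minimal, hypothetical Python sketch of a two-sided Fisher's exact test on that table follows. Note that it reports the sample odds ratio, whereas R's fisher.test reports a conditional maximum-likelihood estimate, so odds ratios from the two can differ slightly.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test on the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table no more probable than the observed one (small tolerance, as in R).
    p = sum(prob(x) for x in support if prob(x) <= p_observed * (1 + 1e-7))
    return min(p, 1.0)


def pair_interaction(samples_gene1, samples_gene2, all_samples):
    """Build the co-mutation table for two genes; return (P value, sample odds ratio)."""
    both = len(samples_gene1 & samples_gene2)
    only1 = len(samples_gene1 - samples_gene2)
    only2 = len(samples_gene2 - samples_gene1)
    neither = len(all_samples) - both - only1 - only2
    p = fisher_exact_2x2(both, only1, only2, neither)
    # Odds ratio below 1 suggests mutual exclusivity; above 1, co-occurrence.
    if only1 * only2 == 0:
        odds = float("inf") if both * neither > 0 else float("nan")
    else:
        odds = (both * neither) / (only1 * only2)
    return p, odds


# Hypothetical cohort of ten samples; the two genes are mutated in disjoint halves.
samples = {f"S{i}" for i in range(10)}
gene1 = {f"S{i}" for i in range(5)}
gene2 = {f"S{i}" for i in range(5, 10)}
print(pair_interaction(gene1, gene2, samples))
```

Two genes mutated in disjoint halves of ten samples give an odds ratio of 0 and P≈0.008, the pattern that would be read as mutual exclusivity.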

Figure 3: Somatic interaction analysis reveals mutually exclusive and significantly co-occurring mutations between gene pairs. Co-occurring mutations are indicated by green squares and mutually exclusive mutations in purple. Colour intensity is proportional to −log10(P-value). P-values were determined using Fisher's exact test. Bold stars indicate P<0.05 and non-bold stars P<0.1.

The WNT signaling mutational landscape of BRAF mutant microsatellite unstable cancers differs from BRAF mutant microsatellite stable cancers

It is well established that loss of DNA mismatch repair function results in a hypermutable phenotype (microsatellite instability, MSI), and that this is common in BRAF mutant colorectal cancers. We next examined mutations in WNT regulators by microsatellite instability status to identify whether different regulators are targeted in these contexts. We identified fifty genes that were significantly differentially mutated according to microsatellite instability status. As expected, all of these genes were more frequently mutated in MSI cancers. Table 1 shows the 15 most significantly differentially mutated WNT signaling regulators between MSI and microsatellite stable (MSS) cancers. This analysis reveals genes that are mutated exclusively in MSI cancers (e.g. ITPR2, EP300, DVL2) and genes that are mutated significantly more frequently in MSI cancers (e.g. TRRAP, RNF43, ZNRF3). The most frequently mutated genes differ markedly between MSI and MSS cancers (Figure 4). The most frequently mutated genes in MSI cancers were RNF43 (49%), KMT2D (48%) and TRRAP (41%); in contrast, APC was the most frequent mutational target in MSS cancers (21%), followed by RNF43 (20%), HECW1 (10%) and a tail of less frequently mutated genes (Figure 4).

Table 1: WNT signalling regulators that are differentially mutated according to microsatellite instability status in BRAF mutant colorectal cancers

Gene      Samples mutated (MSI)   Samples mutated (MSS)   Adjusted P value   Odds ratio (MSI/MSS)
KMT2D     51                      4                       1.15E-06           13.03657408
CHD8      36                      1                       1.15E-05           30.45914642
ITPR2     30                      0                       1.68E-05           NA
CREBBP    40                      3                       3.65E-05           11.57936393
ITPR3     41                      4                       9.91E-05           8.88801381
BCL9L     34                      2                       9.91E-05           13.77175126
EP300     25                      0                       1.32E-04           NA
USP34     33                      2                       1.46E-04           13.18635308
DVL2      23                      0                       2.29E-04           NA
TRRAP     43                      6                       2.86E-04           6.194964561
SCRIB     22                      0                       3.48E-04           NA
RNF43     52                      12                      2.09E-03           3.900624188
FZD8      19                      0                       2.24E-03           Inf
ZNRF3     30                      3                       2.34E-03           7.558622414

Figure 4: The mutational landscape of WNT signaling regulators by microsatellite instability status. Note this analysis is limited to cancers with microsatellite instability status available (n=167).

WNT signalling regulators are differentially mutated in BRAF mutant cancers in comparison to BRAF wild type cancers

To establish whether the WNT mutational landscape of colorectal cancers differs according to BRAF status, we examined the mutational profile of WNT signalling genes in a further 512 BRAF wild type colorectal cancers from The Cancer Genome Atlas. The most frequently mutated genes differ markedly in BRAF wild type cancers (Figure 5, Figure 1). APC was mutated in 82% of these cancers, compared with just 28% of BRAF mutant cancers. Of all WNT signalling regulators assessed in this study, 110 were significantly differentially mutated between BRAF mutant and wild type cancers (Table 2), and 99% of these were more frequently mutated in BRAF mutant cancers. This is likely due to the higher frequency of MSI in BRAF mutant cancers, and to the high frequency of mutations in the highly penetrant APC gene in BRAF wild type cancers, which reduces the selective pressure on other WNT regulators.

Table 2: Comparison of the mutational frequency of WNT signalling regulators in BRAF mutant cancers versus BRAF wild type cancers

Gene      Samples mutated (BRAF mutant)   Samples mutated (BRAF wild type)   Odds ratio (mutant/wild type)   Adjusted P value
APC       57                              420                                0.09                            7.18E-39
RNF43     82                              29                                 11.52                           2.90E-26
ZNRF3     38                              7                                  16.84                           3.67E-14
ITPR3     51                              25                                 6.65                            3.27E-12
KMT2D     66                              49                                 4.64                            2.53E-11
TRRAP     60                              40                                 5.04                            2.53E-11
CREBBP    51                              37                                 4.38                            9.61E-09
DVL2      24                              6                                  11.45                           6.25E-08
FZD8      20                              3                                  18.77                           8.60E-08
CHD8      45                              32                                 4.34                            8.92E-08
TERT      19                              3                                  17.74                           2.34E-07
LRP5      29                              13                                 6.49                            2.72E-07
WNT16     25                              9                                  7.96                            3.74E-07
SCRIB     28                              16                                 5.03                            1.10E-05
ROR2      29                              18                                 4.64                            1.10E-05
BCL9L     39                              33                                 3.51                            1.76E-05
AKT1      16                              4                                  11.00                           1.95E-05
SOX3      17                              5                                  9.38                            1.99E-05
WNT4      13                              2                                  17.66                           3.38E-05

Figure 5: The mutational landscape of WNT signalling regulators in BRAF wild type cancers.

Mutation clustering analysis reveals mutational hotspots in nine WNT signaling genes

We next sought to identify driver genes using the OncodriveCLUST algorithm. This method identifies potential driver genes by positional clustering and operates on the assumption that clusters of mutations, or “mutational hotspots”, are more likely to occur in oncogenes. In keeping with previous studies2,8,30,31, RNF43 was identified as a putative cancer driver (P=0.07). Somatic mutations in MEN1, a familial cancer predisposition gene whose loss has been associated with genome-wide hypermethylation, were also identified as putative drivers. MEN1 was mutated in 4% of samples (8/199, P<0.001), and most of the identified mutations were frameshift deletions at R521. Moreover, WNT16 and GNG12 were implicated as potential cancer drivers (P=0.06 and P=0.0006, respectively).
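
OncodriveCLUST favours genes whose mutations pile up at nearby protein positions. The sketch below captures only that core idea (recurrent positions grouped when they lie within a small distance of each other) and is a simplified, hypothetical heuristic, not OncodriveCLUST itself. It omits the algorithm's cluster scoring and its comparison with a background model built from synonymous mutations, and the thresholds are illustrative assumptions.

```python
from collections import Counter


def find_hotspot_clusters(positions, min_count=2, max_gap=5):
    """Group recurrently mutated codons lying within max_gap of each other.

    min_count and max_gap are illustrative thresholds, not OncodriveCLUST defaults.
    """
    counts = Counter(positions)
    recurrent = sorted(p for p, n in counts.items() if n >= min_count)
    clusters = []
    for pos in recurrent:
        # Extend the current cluster if this codon is close to its last member.
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    total = len(positions)
    return [
        {
            "start": c[0],
            "end": c[-1],
            "mutations": sum(counts[p] for p in c),
            "fraction": sum(counts[p] for p in c) / total,
        }
        for c in clusters
    ]


# Hypothetical codon positions for one gene: a pile-up near codon 165 plus scattered hits.
positions = [165] * 4 + [167] * 2 + [10, 300]
print(find_hotspot_clusters(positions))
```

A gene whose mutations concentrate in one short window, such as the single-codon WNT16 hotspot described below, yields a cluster containing most of its mutations; scattered mutations yield none.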

We used OncodriveFML, an orthogonal computational method that predicts cancer drivers based on the predicted functional impact of mutations, to identify other potential driver genes that do not necessarily harbour clusters of mutations. This analysis identified 11 potential cancer drivers in the WNT signaling cascade, three of which were also identified by OncodriveCLUST (RNF43: P=7.22×10⁻⁶; MEN1: P=0.02; GNG12: P=0.012). Other genes identified include members of the β-catenin destruction complex (APC: P=7.22×10⁻⁶; AXIN1: P<0.01; AXIN2: P=0.0001), ZNRF3 (P=7.22×10⁻⁶), SOX9 (P=7.22×10⁻⁶), BCL9L (P<0.001), PYGO2 (P<0.001) and WNT11 (P=0.045).

WNT16 harbours a mutational hotspot at G165. This codon resides in a G7 repeat tract that was the site of frameshift indels in 15 cancers. We used Sanger sequencing to orthogonally validate the presence of WNT16 hotspot mutations in BRAF mutant cancers (n=79) and identified frameshift mutations in 20.2% (16/79) of cancers (Supplementary Figure S2). Clinicopathological analyses revealed that WNT16 mutant cancers were associated with MSI (mutant: 100% vs wild-type: 61.9%, P=0.0054), right-sided tumours (mutant: 100% vs wild-type: 86.8%, P=0.038) and earlier stage at diagnosis (mutant: 87.6% vs wild-type: 63.2%, P=0.029). After adjusting for covariates that may influence cancer stage (MSI, tumour side, sex, age, CIMP), WNT16 mutations were marginally associated with earlier stage at diagnosis (P=0.05).

Mutations in WNT signaling regulators have prognostic implications

To determine whether frequently mutated WNT regulators, or those identified as putative cancer drivers, are relevant to patient outcome, we performed survival analysis of patients whose cancers carried mutations in the five most frequently mutated WNT signaling regulators (RNF43, KMT2D, TRRAP, APC, CREBBP) or in those identified by driver analysis (MEN1, GNG12, WNT16, AXIN1, AXIN2, ZNRF3, SOX9, BCL9L, PYGO2 and WNT11). Mutations in four genes (APC, RNF43, CREBBP and WNT16) were associated with survival (Table 3). Mutations in WNT16 and APC were significantly associated with shorter overall survival, whereas mutations in CREBBP and RNF43 were associated with prolonged survival (Table 3).

Table 3: Univariate survival analysis of the most frequently mutated WNT signalling regulators and WNT signalling regulators identified as potential drivers by computational analysis. Differences in overall survival were assessed using the log-rank test.

Median survival (days)
           All mutations                     Truncating mutations
Gene       Wild type   Mutant   P value      Wild type   Mutant   P value
WNT16^     2134        547      0.001        2134        547      0.001
RNF43*     752         2047     0.01         934         2047     0.04
CREBBP*    958         2134     0.01         961         NA       0.14
APC*       1390        504      0.03         1390        504      0.03
WNT11^     2134        188      0.059        2134        188      0.059
AXIN2*     958         1158     0.26         961         1158     0.21
KMT2D*     958         1503     0.27         961         1503     0.4
AXIN1*     958         1503     0.35         961         1158     0.86
SOX9*      961         2047     0.49         1158        2047     0.88
GNG12^     2134        NA       0.52         2134        1503     0.93
MEN1^      NA          2134     0.54         NA          2134     0.54
PYGO2^     NA          1818     0.63         NA          1818     0.5
ZNRF3^     2134        2047     0.65         2134        2047     0.9
TRRAP^     NA          2047     0.8          2134        NA       0.81
*n=109; ^n=50; NA = indeterminable.
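
The P values in Table 3 come from the log-rank test, which at each death time compares the deaths observed in one group with the number expected if both groups shared the same hazard. A minimal, dependency-free Python sketch of the two-group test follows. The follow-up times are hypothetical, and a tested implementation (for example R's survival package) would be used in practice.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 for death and 0 for censored."""
    data = [(t, e, "a") for t, e in zip(times_a, events_a)]
    data += [(t, e, "b") for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        at_risk = [(ti, e, g) for ti, e, g in data if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == "a")
        deaths = sum(1 for ti, e, _ in at_risk if e and ti == t)
        deaths_a = sum(1 for ti, e, g in at_risk if e and ti == t and g == "a")
        # Observed minus expected deaths in group a at this time point.
        observed_minus_expected += deaths_a - deaths * n_a / n
        if n > 1:
            variance += deaths * (n_a / n) * (1 - n_a / n) * (n - deaths) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical follow-up in days (1 = died, 0 = censored).
mutant_days, mutant_events = [120, 300, 504, 610, 900], [1, 1, 1, 0, 1]
wildtype_days, wildtype_events = [400, 1100, 1390, 1800, 2100], [1, 0, 1, 0, 0]
print(logrank_test(mutant_days, mutant_events, wildtype_days, wildtype_events))
```

Censored patients contribute to the at-risk counts up to their last follow-up but never to the death counts, which is how animals or patients without an event still inform the comparison.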

Co-mutation of APC and BRAF represents a unique and aggressive subtype of BRAF mutant cancers

We next evaluated the relationship between BRAF mutation and APC mutation in further detail to characterize the clinical and molecular correlates of this subtype of cancers. We supplemented the 199 BRAF mutant exomes assessed above with 76 BRAF mutant cancers that were subjected to targeted sequencing as part of Yaeger et al 201816. A truncating APC mutation was present in 20% of BRAF mutant cancers. We examined whether there was a relationship between age at diagnosis and APC mutation by logistic regression analysis. The probability of a truncating APC mutation occurring in a BRAF mutant cancer decreased markedly with age, from ~60% in patients diagnosed at age 40 to <10% in patients diagnosed at >90 years of age (logistic regression P=3.74×10⁻⁷). The mean age of patients with a BRAFV600E/APCTruncated cancer was significantly lower than that of patients with a BRAFV600E/APCMissense cancer (61 vs 72 years, P=2.03×10⁻⁵, Table 4) and of patients with a BRAFV600E/APCWild-type cancer (61 vs 71 years, P=9.3×10⁻⁶). BRAFV600E/APCTruncated cancers were more likely to be left-sided than BRAFV600E/APCMissense cancers (24.5% vs 4.2%, P=0.02, Table 4). There was no difference in the frequency of CIMP compared with either APC missense or APC wild-type cancers. In total, 42.3% of BRAFV600E/APCTruncated cancers were microsatellite unstable, significantly fewer than BRAFV600E/APCMissense cancers (91.3%, P=5.3×10⁻⁵, Table 4) and numerically fewer than BRAFV600E/APCWild-type cancers (53.8%, P=0.14).
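
The age trend above comes from a logistic regression of truncating APC status on age at diagnosis. A minimal, dependency-free Python sketch of such a model, fitted by Newton-Raphson with age centred for numerical stability, is shown below. The ages and outcomes are hypothetical, not patient data, and a standard routine such as R's glm would normally be used.

```python
import math


def fit_logistic(ages, outcomes, iterations=25):
    """Fit logit P(outcome = 1) = b0 + b1 * (age - mean age) by Newton-Raphson."""
    mean_age = sum(ages) / len(ages)
    xs = [a - mean_age for a in ages]
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, outcomes):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # observed information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = [g0, g1].
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0 += s0
        b1 += s1
        if abs(s0) < 1e-10 and abs(s1) < 1e-10:
            break
    return b0, b1, mean_age


def predicted_probability(model, age):
    b0, b1, mean_age = model
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * (age - mean_age))))


# Hypothetical cohort: 1 = truncating APC mutation present, 0 = absent.
ages = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90]
truncating_apc = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
model = fit_logistic(ages, truncating_apc)
print(predicted_probability(model, 40), predicted_probability(model, 90))
```

A negative age coefficient (b1) corresponds to the declining probability of truncating APC mutation with age described above.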

BRAFV600E/APCTruncated cancers were aggressive, with 67.3% of patients presenting with metastatic disease (stage III/IV). In contrast, only 36.4% of BRAFV600E/APCMissense and 45.7% of BRAFV600E/APCWild-type cancers presented at stage III/IV (P=0.01 and P=0.002 versus BRAFV600E/APCTruncated, respectively). BRAFV600E/APCTruncated cancers that were also microsatellite stable were further enriched for late-stage disease: 100% of these patients presented with metastatic disease (stage III or IV), and 88% with distant metastases (stage IV).

Furthermore, univariable survival analysis indicated that BRAFV600E/APCTruncated cancers had a significantly poorer median survival (504 days vs 1390 days, log-rank P=0.026; n=32 and n=78 for truncating mutant and wild-type, respectively; Figure 6A). The five-year survival of BRAFV600E/APCTruncated patients was 12%; by contrast, the five-year survival of BRAFV600E/APCWild-type patients was 42%. This effect was more pronounced in microsatellite stable cancers (Supplementary Figure S3A) than in microsatellite unstable cancers (Supplementary Figures S3B and S3C). We performed multivariable survival analysis including age at diagnosis, gender, stage and microsatellite instability as potential prognosticators. Using Cox proportional hazards regression, microsatellite instability status and gender were significantly and independently associated with survival. Truncating APC mutation trended toward an independent negative prognostic effect, but this did not reach statistical significance (Table 5, P=0.17). Collectively, these data indicate that activating mutation of BRAF together with truncating mutation of APC represents an aggressive subtype of colorectal cancer that occurs at a relatively young age compared with BRAF mutant cancers more generally.
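
Median survival and five-year survival are both read off a Kaplan-Meier curve, and a median that is never reached is why some entries in Table 3 are "NA". The following small Python sketch uses hypothetical follow-up times; it illustrates the estimator and is not the analysis code used here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] steps of the Kaplan-Meier estimator."""
    steps = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if e and ti == t)
        survival *= 1.0 - deaths / at_risk
        steps.append((t, survival))
    return steps


def survival_at(steps, time):
    """Estimated survival probability at a given time (step function)."""
    s = 1.0
    for t, value in steps:
        if t <= time:
            s = value
    return s


def median_survival(steps):
    """Earliest time at which survival falls to 50% or below; None if never reached."""
    for t, value in steps:
        if value <= 0.5 + 1e-12:
            return t
    return None


# Hypothetical follow-up in days (1 = died, 0 = censored).
days = [150, 504, 620, 900, 1400, 1900, 2100]
died = [1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(days, died)
print(median_survival(curve), survival_at(curve, 5 * 365.25))
```

Censoring shrinks the at-risk set without producing a downward step, so a cohort with heavy censoring can end follow-up above 50% survival and report no median.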

Table 4: Clinical and molecular characteristics of BRAF mutant cancers with APC mutation

                                       APC status                                   P-value¹
Characteristic        All (n)        Truncating   Missense    Wild-type     Trunc. vs Mis.   Trunc. vs WT   Mis. vs WT
Age, mean             n=273          60.8         72.4        70.6          2.03×10⁻⁵        9.3×10⁻⁶       0.34
Sex: male             87 (31.8%)     18 (32%)     6 (24%)     63 (33%)      0.48             0.86           0.36
Sex: female           187 (68.2%)    39 (68%)     19 (76%)    129 (67%)
Tumour side: left     42 (16.6%)     13 (25%)     1 (4%)      28 (16%)      0.02             0.16           0.08
Tumour side: right    211 (83.4%)    40 (75%)     23 (96%)    148 (84%)
Stage I               32 (12.9%)     3 (6%)       2 (9%)      27 (16%)      0.01             0.002          0.32
Stage II              93 (37.7%)     14 (27%)     12 (55%)    67 (39%)
Stage III             59 (23.9%)     10 (19%)     6 (27%)     43 (25%)
Stage IV              63 (25.5%)     25 (48%)     2 (9%)      36 (21%)
CIMP: high            128 (81.0%)    20 (83%)     18 (95%)    90 (78%)      0.23             0.57           0.05
CIMP: negative        30 (19.0%)     4 (17%)      1 (5%)      25 (22%)
MSI status: MSI       136 (54.8%)    22 (42%)     21 (91%)    93 (54%)      5.3×10⁻⁵         0.14           0.0002
MSI status: MSS       112 (45.2%)    30 (58%)     2 (9%)      80 (46%)

¹P-values were obtained using the likelihood-ratio test for categorical variables and Student's t-test for continuous variables. All statistical tests were two-tailed.
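
For a 2×2 comparison, the likelihood-ratio test used for the categorical variables in Table 4 is the G-test with one degree of freedom. The Python sketch below applies it to the tumour-side counts from Table 4 (truncating APC: 13 left, 40 right; missense APC: 1 left, 23 right) and gives a P value of roughly 0.02, in line with the reported value. It is an illustration rather than the authors' code, and larger tables such as stage would need a chi-square tail with more degrees of freedom.

```python
import math


def likelihood_ratio_2x2(a, b, c, d):
    """G-test (likelihood-ratio chi-square) of independence for [[a, b], [c, d]]."""
    table = [[a, b], [c, d]]
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = rows[i] * cols[j] / n
            if observed > 0:  # a zero cell contributes 0 (0 * log 0 taken as 0)
                g += 2.0 * observed * math.log(observed / expected)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p


# Tumour side from Table 4: truncating APC (left, right) vs missense APC (left, right).
g, p = likelihood_ratio_2x2(13, 40, 1, 23)
print(f"G = {g:.2f}, P = {p:.3f}")
```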

Table 5: Cox proportional hazards analysis of survival of BRAF mutant cancers.

Variable                              Hazard ratio   95% CI       P value
Microsatellite instability (MSS)      2.41           1.18-4.95    0.016
Gender (female)                       1.93           1.04-3.57    0.0373
APC (truncating)                      1.63           0.80-3.32    0.1744
Stage (III/IV)                        1.56           0.66-3.69    0.3083
Age (<50)                             1.2            0.53-2.71    0.6545
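
Table 5 reports a multivariable Cox model, but the mechanics are easier to see with a single covariate. The following Python sketch maximises the Cox partial likelihood by Newton-Raphson, using the Breslow convention for tied event times, on hypothetical data. It is a one-covariate illustration only; in practice a tested implementation such as R's survival package would be used, as it handles several covariates and reports confidence intervals.

```python
import math


def cox_single_covariate(times, events, x, iterations=50):
    """Newton-Raphson fit of a one-covariate Cox model (Breslow ties).

    Returns the log hazard ratio for a one-unit increase in x.
    """
    beta = 0.0
    for _ in range(iterations):
        score = 0.0
        information = 0.0
        for i, (t_i, e_i) in enumerate(zip(times, events)):
            if not e_i:
                continue
            # Risk set: everyone still under observation at this event time.
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            weights = [math.exp(beta * x[j]) for j in risk]
            w_sum = sum(weights)
            mean_x = sum(w * x[j] for w, j in zip(weights, risk)) / w_sum
            mean_x2 = sum(w * x[j] ** 2 for w, j in zip(weights, risk)) / w_sum
            score += x[i] - mean_x
            information += mean_x2 - mean_x ** 2
        if information == 0:
            break
        step = score / information
        beta += step
        if abs(step) < 1e-10:
            break
    return beta


# Hypothetical data: x = 1 for a truncating mutation, 0 otherwise; all events observed.
times = [1, 2, 3, 5, 4, 6, 7, 8]
events = [1] * 8
x = [1, 1, 1, 1, 0, 0, 0, 0]
beta = cox_single_covariate(times, events, x)
print(f"hazard ratio: {math.exp(beta):.2f}")
```

A hazard ratio above 1 means a higher instantaneous risk of death for carriers, which is how the values in the first column of Table 5 are read.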

Figure 6: Survival analysis of A: BRAF mutant human cancers by the presence or absence of truncating APC mutation; B: Apc, Braf and Apc/Braf mutant murine models. P-values are univariate and derived from the log-rank test. C-E: Assessment of the number and size of lesions in Apc, Braf and Apc/Braf mutant mouse models. C: Total lesions in the small intestine; D: Total number of lesions in the colon and caecum; E: Mean size of lesions in the colon and caecum.

Mutation of Braf in ApcMin/+ mice results in massive polyp load, rapid disease progression and poor survival

To determine whether we could recapitulate the apparently aggressive phenotype of BRAF and APC co-mutation, we crossed inducible BrafV637E mutant mice with ApcMin/+ mice. The Braf mutation was induced at weaning in ApcMin/+ mice, and we compared the number of lesions per animal and survival with those of mice carrying only the mutant Braf allele or only the mutant Apc allele. We assessed differences in survival between ApcMin/+ (n=29), BrafV637E (n=15) and ApcMin/+/BrafV637E mice (n=22). Animals were regarded as having survived, and were censored, if they were healthy at the time of sacrifice; animals were regarded as deceased if they had to be euthanized due to illness. All Braf mutant animals (100%) survived to 12 months, as did 81.25% of Apc mutant animals. Mutation of both Braf and Apc significantly reduced survival (P=8.8×10⁻²¹, Figure 6B). The median survival of animals with both Apc and Braf mutation was 3.2 months, and no animal with both mutations survived longer than six months. We assessed polyp load by microscopic enumeration. Animals with Braf or Apc mutation alone developed an average of 4.6 and 16.55 polyps in the small intestine, respectively, whereas animals with both mutations developed significantly more small intestinal lesions (P<0.0001, Figure 6C). Animals with Braf or Apc mutation alone rarely developed colonic or caecal lesions (mean lesions per mouse: 0.11 and 1.1, respectively; Figure 6D). In contrast, dual mutation of Apc and Braf resulted in an average of 59.82 colonic/caecal lesions per animal (P<0.0001, Figure 6D). We did not observe a significant difference in lesion size in the small intestine between groups; however, lesions in the colon and caecum were significantly larger in animals bearing both Apc and Braf mutations (P<0.0001, Figure 6E). Lesions had a morphology reminiscent of human conventional adenomas rather than dysplastic serrated lesions.

Discussion

Here we have investigated the role of somatic mutation in shaping the WNT signaling landscape of colorectal cancers bearing BRAF mutation. We have shown that 48% of BRAF mutant cancers harbour a mutation in at least one member of the β-catenin destruction complex or in CTNNB1. Other common modes of activation include mutation of RNF43 and ZNRF3. We have identified a number of novel mutations that may alter the WNT signaling landscape of these cancers. These include MEN1, a known WNT pathway tumour suppressor, and WNT16, a WNT ligand that may act as an antagonist of ligand-mediated WNT activation. Both MEN1 and WNT16 harbour hotspot frameshift mutations that were identified as potential drivers by computational analysis. Truncating mutations of RNF43 were mutually exclusive with truncating mutations of APC. We examined the clinical and molecular correlates of BRAF mutant cancers bearing truncating APC mutations, which occurred in 20% of samples. These cancers were predominantly microsatellite stable and late stage, and occurred at a mean age approximately 10 years lower than that of APC wild-type BRAF mutant cancers. Survival analysis revealed a significantly poorer prognosis for patients with this subtype. Using murine models of Apc and Braf mutation, we showed that mutating both genes results in a severe phenotype with massive lesion burden: animals had a median survival of 3.2 months, and no animal bearing both mutations survived longer than six months. Collectively, these data indicate that mutation of both BRAF and APC results in an aggressive, rapidly progressing cancer phenotype that confers a poor prognosis.

WNT signaling underpins colorectal carcinogenesis. In the conventional pathway, WNT signaling is usually activated via bi-allelic inactivation of the APC tumour suppressor gene at the beginning of the tumourigenic process. However, the mechanisms governing WNT pathway activation in the serrated neoplasia pathway, which is uniquely marked by BRAF mutation, are less clear. In the present study, we sought to identify WNT signaling genes that are mutated in the context of BRAF mutant serrated colorectal neoplasia. We obtained exome sequencing data from 175 BRAF mutant colorectal cancers from four previously published studies7,14,15,32 and sequenced a further 24 BRAF mutant samples collected locally. Our analyses revealed a mosaic of mutations in WNT signaling regulators, including well-known regulators such as RNF43, APC, AXIN2 and ZNRF3, and identified significant mutual exclusivity between truncating mutations of RNF43 and APC. Mutual exclusivity of mutations in these genes has been previously reported8; however, it has not been clear whether this was, in part, due to the abundance of RNF43 mutations in BRAF mutant cancers and the relative rarity of APC mutations in this context. Here we have shown that the mutual exclusivity persists within BRAF mutant cancers, indicating that it likely reflects a relationship between APC and RNF43 mutations rather than between BRAF and APC mutations. Likewise, ZNRF3 mutation was mutually exclusive with truncating mutation of APC, although this association was present only when missense mutations were excluded. It is possible that acquiring a truncating APC mutation in this context is disadvantageous to tumour progression. If so, mutation of RNF43/ZNRF3 may create a dependency on functional APC, and exploiting this dependency on a canonical tumour suppressor gene may represent a novel therapeutic strategy for patients with RNF43 mutated cancers.

We next examined the exome sequencing data to identify potential novel drivers of WNT signaling activation in colorectal cancer. We adopted a mutational clustering-based approach to identify potential cancer drivers based on the presence of mutational hotspots, as implemented in the OncodriveCLUST algorithm26. Reassuringly, RNF43, which has two mutational hotspots2,8,30, was identified as a cancer driver. RSPO fusions, which have been implicated in WNT dysregulation in serrated lesions and cancers4,33, could not be identified because gene fusions are not reliably detected by exome sequencing. Given the previously reported frequency of RSPO fusions, it is likely that some cancers in this cohort harboured such fusions. We identified eleven other potential cancer driver genes in the WNT signaling cascade. MEN1 was mutated in eight samples, and most mutations were frameshift alterations at codon R521. Germline MEN1 mutations cause multiple endocrine neoplasia type 1, a tumour predisposition syndrome, and MEN1 has also been identified as a tumour suppressor gene in a number of other tumour types, including parathyroid tumours34, entero-pancreatic neuroendocrine cancers35 and carcinoids36. Interestingly, especially in the context of highly methylated BRAF mutant cancers, loss of MEN1 has been associated with aberrant DNMT1 activity and an altered DNA methylation landscape. To our knowledge, MEN1 alterations have not been previously reported in colorectal cancer, nor is colorectal cancer a typical presentation of MEN1 syndrome. It is possible that these patients carried an underlying germline MEN1 mutation and that the mutations identified in this study were second hits at the locus.

WNT16 was also identified as a potential cancer driver gene. As a WNT ligand, WNT16 is a seemingly unlikely candidate tumour suppressor. However, Nalesso et al37 showed that while WNT16 was capable of binding Fzd receptors and activating canonical WNT signaling, the degree of activation was significantly lower than that induced by the more abundant WNT3A. TOPFlash assays showed that co-stimulation with WNT3A and WNT16 resulted in significantly less canonical WNT activation than stimulation with WNT3A alone37. Thus, WNT16 appears to act as a competitive inhibitor at Fzd receptors that helps maintain WNT signaling homeostasis. In cancer, loss of WNT16 may facilitate excessive canonical WNT activation by removing competition with more potent WNT ligands, such as WNT3A and WNT8. Together with our data indicating a poorer prognosis for patients with WNT16 mutant cancers, this suggests that WNT16 acts as a tumour suppressor. Inhibitors of WNT ligand secretion, such as porcupine inhibitors, are currently being trialled in colorectal and other solid tumours38. Cancers that lack WNT16 are prone to excessive ligand-dependent WNT activation37 and may represent a subset of patients that could benefit from this therapy. Indeed, as many WNT16 mutations occur on a background of RNF43 alterations, which have been shown to confer sensitivity to porcupine inhibitors31, mutation of both genes could further sensitize cells to this class of drugs. Although we do not present data in support of this hypothesis in this study, the frequency with which WNT16 is mutated in BRAF mutant colorectal cancers may provide an impetus for investigating whether WNT16 has a role in determining sensitivity to WNT ligand inhibitors.

We recently assessed a series of 80 BRAF mutant cancers1 and identified truncating APC mutations in 11% of these cancers. It has been postulated that truncating APC mutation is uncommon in the context of pre-existing BRAF mutation because the resulting dysregulation of WNT signaling is too profound in this cellular context. This is consistent with the model proposed by Albuquerque et al39, and suggests that mutations such as those in RNF43/ZNRF3 provide a “just-right” level of WNT signaling that confers a selective advantage. However, the present study has confirmed that a minority of BRAF mutant cancers do carry a truncating APC mutation. The higher proportion of cancers bearing both APC and BRAF mutation in the present study may reflect selection bias, as this series includes a higher proportion of late-stage microsatellite stable cancers.

APC mutations were much more common in the minority of BRAF mutant cancers diagnosed at a younger age. The mean age of patients with BRAF mutant cancers harbouring a truncating APC mutation was approximately 10 years lower than that of patients with APC wild-type cancers (60.8 vs 70.6 years). These cancers were more likely to be microsatellite stable and to present with metastatic disease. The median survival of patients with both BRAF and APC mutation was 64% lower than that of patients with BRAF mutation alone, and patients with both mutations had a five-year survival rate of 12%. BRAF mutant microsatellite stable cancers are known to confer a particularly poor prognosis40, in part due to the low neoantigen burden and poor immunogenicity of these cancers41. Although APC mutation in BRAF mutant cancers was not significantly associated with poorer prognosis on multivariable analysis, APC mutation was a prognosticator on univariable analysis. Furthermore, there was a strong, borderline significant trend toward a still worse prognosis for BRAF mutant MSS cancers with APC mutations compared with those without. Hence, these cancers appear to be highly aggressive and to occur earlier in life. We used murine models of Apc and Braf mutation to examine the interaction between these mutations, and the consequences of mutating both genes, on polyp development and overall survival. We observed massive polyp loads and markedly reduced survival in animals bearing both mutations. Whereas 100% of Braf mutant and 81.25% of Apc mutant animals survived to 12 months, no animal with both mutations survived past six months, and median survival was reduced to a mere 3.2 months. Although the animals did not develop invasive cancer and instead died due to polyp load, these data indicate that co-mutation of Apc and Braf in enterocytes induces rapid neoplastic alterations and an overt proliferative phenotype. Collectively, these data provide strong evidence that mutation of both APC and BRAF, while uncommon in humans, generates a remarkably aggressive neoplastic phenotype.

It is difficult to resolve how these particular cancers evolved. Both APC and BRAF mutations are tumour-initiating events28,42,43 and give rise to different precursor lesions44. Moreover, the cells of origin of APC-initiated and BRAF-initiated polyps are hypothesized to differ: APC-initiated lesions adhere to the “top-down” model45, whereas BRAF mutant lesions are initiated in the stem cell compartment46. Methylation profiling of BRAF mutant and APC mutant cancers supported this model, showing that BRAF mutant cancers had a methylation profile reminiscent of the intestinal stem cell46. In our study, we observed no difference in the frequency of CIMP between BRAF mutant cancers with and without APC mutation, suggesting that these dual-mutant cancers may have arisen in the stem cell compartment, and therefore may have been initiated by BRAF mutation and acquired an APC mutation at a later stage. However, it is difficult to reconcile this with the rarity of APC mutation in BRAF mutant precursor lesions1. It is possible that, upon acquiring an APC mutation, progression to cancer is rapid, so that lesions in a transitional state are rarely identified; this would fit with the aggressive phenotype of these cancers. An alternative hypothesis, supported by an age at diagnosis similar to that of conventional pathway cancers18 and by the morphology of our murine adenomas, is that these polyps are initiated by APC mutation and subsequently acquire a BRAF mutation. If this is the case, such lesions must progress extremely rapidly to cancer, as they are very rarely identified in large series of conventional adenomas.


The mechanisms by which APC mutation induces aggressiveness in BRAF mutant cancers are not clear. Aberrant WNT signaling appears to be a prerequisite for the acquisition of dysplasia in traditional serrated adenomas3 and sessile serrated lesions29,47. Traditional serrated adenomas, which are by nature dysplastic, are thought to have transitioned from hyperplastic polyps or sessile serrated lesions, and this transition coincides with the acquisition of WNT pathway alterations such as RSPO fusions4 and RNF43 mutations3. In sessile serrated lesions that acquire dysplasia, RNF43 mutations are extremely common (>50%)47, but they are rare in lesions that are not yet dysplastic. This is an important distinction, as lesions with dysplasia are thought to progress to invasive cancer within 12 months48. Alterations to RNF43 are dependent on DNA methylation induced loss of MLH1 and mismatch repair function. In sessile serrated lesions, DNA methylation may develop over a protracted period, prolonging the sojourn of these lesions before dysplasia and ultimately malignancy49. BRAF mutation has been reported to induce senescence, which can be overcome through silencing of the CDKN2A locus50. It has previously been reported that APC loss in the intestine does not require loss of p21 to generate adenomas and does not induce senescence51. It is possible that APC mutations, in the context of BRAF mutation, facilitate escape from oncogene-induced senescence while simultaneously satisfying the requirement for WNT signaling, allowing a rapid transition to dysplasia and invasive cancer. This may explain why precancerous lesions with both APC and BRAF mutations are rarely identified, and why cancers with these mutations present at a younger age.

Our study has comprehensively assessed the somatic mutation landscape of WNT signaling regulators in a large series of BRAF mutant colorectal cancers; however, several limitations remain. As we assessed publicly accessible sequencing data from various sources, the sequencing depth, bioinformatic pipelines and variant filtering methods were not uniform across cohorts. This may result in the under- (or over-) reporting of certain variants. Our study was also limited in its ability to assess the role of the CpG island methylator phenotype in WNT signaling. CIMP was assessed in some samples in our study cohorts, but the method of assessment was not uniform across samples. For example, CIMP in the TCGA cancers was assessed by microarray-based clustering, whereas other studies used qPCR-based marker panels. Moreover, many WNT signaling genes are putative methylation targets. Further study is necessary to assess the role of CIMP in shaping WNT signaling in BRAF mutant colorectal cancer. Although our study is the largest to date to assess WNT regulators in BRAF mutant cancers, we did not have a sufficiently large sample size to draw conclusions about the prognostic implications of more rarely mutated genes, such as MEN1 and GNG12. Mutations in hotspots, or in relatively small genes, could be assessed in much larger cohorts using less expensive genotyping technologies to determine whether mutations in these genes are relevant to patient outcome.

Conclusions

In conclusion, we have conducted a comprehensive survey of the somatic mutational landscape shaping WNT signaling in BRAF mutant serrated colorectal neoplasia. The mutational landscape of WNT signaling regulators is a mosaic underpinned by mutations in key driver genes, such as RNF43 and APC. Mutations of RNF43 and APC are mutually exclusive. We identified potential cancer driver genes in the WNT signaling axis. MEN1 has previously been implicated in cancers of endocrine origin, but has not been identified as a tumour suppressor gene in colorectal cancer. We identified a hotspot mutation in MEN1 that affects 4% of BRAF mutant cancers. We also identified WNT16 as a potential driver gene by mutational hotspot analysis. WNT16 is a competitive inhibitor of canonical WNT signaling, and mutation of WNT16 is common in BRAF mutant cancers. BRAF mutant colorectal cancers with truncating APC mutations tended to arise earlier in life and presented at a significantly later stage. These cancers are extremely aggressive, and survival of patients with both BRAF and APC mutation is poor (12% 5-year survival). In vivo modelling of Apc and Braf mutation revealed a dramatically increased tumour burden, with a median survival of 3.2 months for animals carrying both mutations. Therefore, we conclude that co-mutation of BRAF and APC in colorectal cancers is conducive to an aggressive phenotype.
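The statement that RNF43 and APC mutations are mutually exclusive rests on a test for under-representation of co-mutated cancers. The text does not name the test used; a common choice is a one-sided Fisher's exact test on the 2x2 table of mutation status, sketched below in Python with only the standard library. The function name and argument layout are hypothetical. The returned value is the hypergeometric probability of observing as few co-mutated samples as seen, or fewer, given the table's row and column totals.

```python
from math import comb


def exclusivity_p_value(both: int, only_a: int, only_b: int, neither: int) -> float:
    """One-sided Fisher's exact test for mutual exclusivity of two genes.

    Arguments are the four cells of the 2x2 mutation table:
    both    - samples mutant for gene A and gene B
    only_a  - samples mutant for gene A only
    only_b  - samples mutant for gene B only
    neither - samples wild-type for both genes
    Returns P(X <= both), where X is the hypergeometric number of
    co-mutated samples expected by chance given the table margins.
    """
    n = both + only_a + only_b + neither
    mutant_a = both + only_a  # row total: gene A mutant
    mutant_b = both + only_b  # column total: gene B mutant
    denominator = comb(n, mutant_b)
    p_value = 0.0
    for k in range(both + 1):
        # Ways to place k gene-B mutants among the gene-A mutant samples
        # and the remaining mutant_b - k among the gene-A wild-type samples.
        p_value += comb(mutant_a, k) * comb(n - mutant_a, mutant_b - k) / denominator
    return p_value
```

With hypothetical counts of 0 co-mutated, 10 A-only, 10 B-only and 0 double wild-type samples, `exclusivity_p_value(0, 10, 10, 0)` returns 1/184756 (about 5.4e-6). In a real analysis the p-values for many gene pairs would also need multiple-testing correction.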


References
1. Borowsky J, Dumenil T, Bettington M, et al. The role of APC in WNT pathway activation in serrated neoplasia. Mod Pathol. 2018;31(3):495-504. doi:10.1038/modpathol.2017.150
2. Bond CE, McKeone DM, Kalimutho M, et al. RNF43 and ZNRF3 are commonly altered in serrated pathway colorectal tumorigenesis. Oncotarget. 2016;7(43):70589-70600. doi:10.18632/oncotarget.12130
3. Hashimoto T, Ogawa R, Yoshida H, et al. Acquisition of WNT Pathway Gene Alterations Coincides With the Transition From Precursor Polyps to Traditional Serrated Adenomas. Am J Surg Pathol. 2019;43(1):132-139. doi:10.1097/PAS.0000000000001149
4. Sekine S, Yamashita S, Tanabe T, et al. Frequent PTPRK-RSPO3 fusions and RNF43 mutations in colorectal traditional serrated adenoma. J Pathol. 2016;239(2):133-138. doi:10.1002/path.4709
5. Rowan AJ, Lamlum H, Ilyas M, et al. APC mutations in sporadic colorectal tumors: A mutational “hotspot” and interdependence of the “two hits.” Proc Natl Acad Sci U S A. 2000;97(7):3352-3357. doi:10.1073/pnas.97.7.3352
6. Kambara T, Matsubara N, Nagao A, et al. Mutations in BRAF, KRAS, and APC, and CpG island methylation: Alternative pathways to colorectal cancer. Cancer Res. 2006;66(8 Supplement):81.
7. The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012;487(7407):330-337. doi:10.1038/nature11252
8. Giannakis M, Hodis E, Mu XJ, et al. RNF43 is frequently mutated in colorectal and endometrial cancers. Nat Genet. 2014;46(12):1264-1266. doi:10.1038/ng.3127
9. Tu J, Park S, Yu W, et al. The most common RNF43 mutant G659Vfs*41 is fully functional in inhibiting Wnt signaling and unlikely to play a role in tumorigenesis. Sci Rep. 2019;9(1):1-12. doi:10.1038/s41598-019-54931-3
10. Suzuki H, Watkins DN, Jair K-W, et al. Epigenetic inactivation of SFRP genes allows constitutive WNT signaling in colorectal cancer. Nat Genet. 2004;36(4):417-422. doi:10.1038/ng1330
11. Suzuki H, Gabrielson E, Chen W, et al. A genomic screen for genes upregulated by demethylation and histone deacetylase inhibition in human colorectal cancer. Nat Genet. 2002;31(2):141-149. doi:10.1038/ng892
12. Aguilera O, Fraga MF, Ballestar E, et al. Epigenetic inactivation of the Wnt antagonist DICKKOPF-1 (DKK-1) gene in human colorectal cancer. Oncogene. 2006;25(29):4116-4121. doi:10.1038/sj.onc.1209439
13. Liu Y, Sethi NS, Hinoue T, et al. Comparative Molecular Analysis of Gastrointestinal Adenocarcinomas. Cancer Cell. doi:10.1016/j.ccell.2018.03.010
14. Giannakis M, Mu XJ, Shukla SA, et al. Genomic Correlates of Immune-Cell Infiltrates in Colorectal Carcinoma. Cell Rep. 2016;15(4):857-865. doi:10.1016/j.celrep.2016.03.075
15. Vasaikar S, Huang C, Wang X, et al. Proteogenomic Analysis of Human Colon Cancer Reveals New Therapeutic Opportunities. Cell. 2019;177(4):1035-1049.e19. doi:10.1016/j.cell.2019.03.030
16. Yaeger R, Chatila WK, Lipsyc MD, et al. Clinical Sequencing Defines the Genomic Landscape of Metastatic Colorectal Cancer. Cancer Cell. doi:10.1016/j.ccell.2017.12.004
17. Miller SA, Dykes DD, Polesky HF. A simple salting out procedure for extracting DNA from human nucleated cells. Nucleic Acids Res. 1988;16(3):1215.
18. Fennell L, Dumenil T, Wockner L, et al. Integrative Genome-Scale DNA Methylation Analysis of a Large and Unselected Cohort Reveals Five Distinct Subtypes of Colorectal Adenocarcinomas. Cell Mol Gastroenterol Hepatol. April 2019. doi:10.1016/j.jcmgh.2019.04.002
19. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1):10-12. doi:10.14806/ej.17.1.200
20. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997 [q-bio]. May 2013. http://arxiv.org/abs/1303.3997. Accessed February 6, 2020.
21. Li H, Handsaker B, Wysoker A, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078-2079. doi:10.1093/bioinformatics/btp352
22. Kassahn KS, Holmes O, Nones K, et al. Somatic Point Mutation Calling in Low Cellularity Tumors. PLOS ONE. 2013;8(11):e74380. doi:10.1371/journal.pone.0074380
23. McKenna A, Hanna M, Banks E, et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297-1303. doi:10.1101/gr.107524.110
24. Cingolani P, Platts A, Wang LL, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012;6(2):80-92. doi:10.4161/fly.19695
25. Mayakonda A, Lin D-C, Assenov Y, Plass C, Koeffler HP. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 2018;28(11):1747-1756. doi:10.1101/gr.239244.118
26. Tamborero D, Gonzalez-Perez A, Lopez-Bigas N. OncodriveCLUST: Exploiting the Positional Clustering of Somatic Mutations to Identify Cancer Genes. Bioinformatics. doi:10.1093/bioinformatics/btt395
27. Mularoni L, Sabarinathan R, Deu-Pons J, Gonzalez-Perez A, López-Bigas N. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biol. 2016;17(1):128. doi:10.1186/s13059-016-0994-0
28. Bond CE, Liu C, Kawamata F, et al. Oncogenic BRAF mutation induces DNA methylation changes in a murine model for human serrated colorectal neoplasia. Epigenetics. 2018;13(1):40-48. doi:10.1080/15592294.2017.1411446
29. Kane AM, Fennell LJ, Liu C, et al. Alterations in signaling pathways that accompany spontaneous transition to malignancy in a mouse model of BRAF mutant microsatellite stable colorectal cancer. Neoplasia. 2020;22(2):120-128. doi:10.1016/j.neo.2019.12.002
30. Fennell LJ, Clendenning M, McKeone DM, et al. RNF43 is mutated less frequently in Lynch Syndrome compared with sporadic microsatellite unstable colorectal cancers. Fam Cancer. 2018;17(1):63-69. doi:10.1007/s10689-017-0003-0
31. Jiang X, Hao HX, Growney JD, et al. Inactivating Mutations of RNF43 Confer Wnt Dependency in Pancreatic Ductal Adenocarcinoma. Proc Natl Acad Sci U S A. doi:10.1073/pnas.1307218110
32. Seshagiri S, Stawiski EW, Durinck S, et al. Recurrent R-spondin Fusions in Colon Cancer. Nature. doi:10.1038/nature11282
33. Sekine S, Ogawa R, Hashimoto T, et al. Comprehensive characterization of RSPO fusions in colorectal traditional serrated adenomas. Histopathology. 2017;71(4):601-609. doi:10.1111/his.13265
34. Yuan Z, Sánchez Claros C, Suzuki M, et al. Loss of MEN1 Activates DNMT1 Implicating DNA Hypermethylation as a Driver of MEN1 Tumorigenesis. Oncotarget. doi:10.18632/oncotarget.7279
35. Agarwal SK. The future: genetics advances in MEN1 therapeutic approaches and management strategies. Endocr Relat Cancer. 2017;24(10):T119-T134. doi:10.1530/ERC-17-0199
36. Marini F, Falchetti A, Luzi E, Tonelli F, Brandi ML. Multiple Endocrine Neoplasia Type 1 (MEN1) Syndrome. In: Riegert-Johnson DL, Boardman LA, Hefferon T, Roberts M, eds. Cancer Syndromes. Bethesda (MD): National Center for Biotechnology Information (US); 2009. http://www.ncbi.nlm.nih.gov/books/NBK7029/. Accessed January 20, 2020.
37. Nalesso G, Thomas BL, Sherwood JC, et al. WNT16 antagonises excessive canonical WNT activation and protects cartilage in osteoarthritis. Ann Rheum Dis. 2017;76(1):218-226. doi:10.1136/annrheumdis-2015-208577
38. Rodon J, Argilés G, Connolly RM, et al. Abstract CT175: Biomarker analyses from a phase I study of WNT974, a first-in-class Porcupine inhibitor, in patients (pts) with advanced solid tumors. Cancer Res. 2018;78(13 Supplement):CT175. doi:10.1158/1538-7445.AM2018-CT175
39. Albuquerque C, Breukel C, van der Luijt R, et al. The ‘just-right’ signaling model: APC somatic mutations are selected based on a specific level of activation of the β-catenin signaling cascade. Hum Mol Genet. 2002;11(13):1549-1560. doi:10.1093/hmg/11.13.1549
40. Pai RK, Jayachandran P, Koong AC, et al. BRAF-mutated, Microsatellite-stable Adenocarcinoma of the Proximal Colon: An Aggressive Adenocarcinoma With Poor Survival, Mucinous Differentiation, and Adverse Morphologic Features. Am J Surg Pathol. 2012;36(5):744-752. doi:10.1097/PAS.0b013e31824430d7
41. Bolzacchini E, Cerutti R, Digiacomo N, et al. Difference in immune infiltration in MSI and MSS BRAF mutant colorectal cancer. J Clin Oncol. 2018;36(15_suppl):e15624. doi:10.1200/JCO.2018.36.15_suppl.e15624
42. Fearon ER, Vogelstein B. A genetic model for colorectal tumorigenesis. Cell. 1990;61(5):759-767. doi:10.1016/0092-8674(90)90186-I
43. Rad R, Cadiñanos J, Rad L, et al. A Genetic Progression Model of BrafV600E-Induced Intestinal Tumorigenesis Reveals Targets for Therapeutic Intervention. Cancer Cell. 2013;24(1):15-29. doi:10.1016/j.ccr.2013.05.014
44. Pai RK, Bettington M, Srivastava A, Rosty C. An Update on the Morphology and Molecular Pathology of Serrated Colorectal Polyps and Associated Carcinomas. Mod Pathol. doi:10.1038/s41379-019-0280-2
45. Shih I-M, Wang T-L, Traverso G, et al. Top-down morphogenesis of colorectal tumors. Proc Natl Acad Sci U S A. 2001;98(5):2640-2645. doi:10.1073/pnas.051629398
46. Bormann F, Rodríguez-Paredes M, Lasitschka F, et al. Cell-of-Origin DNA Methylation Signatures Are Maintained during Colorectal Carcinogenesis. Cell Rep. 2018;23(11):3407-3418. doi:10.1016/j.celrep.2018.05.045
47. Hashimoto T, Yamashita S, Yoshida H, et al. WNT Pathway Gene Mutations Are Associated With the Presence of Dysplasia in Colorectal Sessile Serrated Adenoma/Polyps. Am J Surg Pathol. 2017;41(9):1188-1197. doi:10.1097/PAS.0000000000000877
48. Bettington M, Walker N, Rosty C, et al. Clinicopathological and molecular features of sessile serrated adenomas with dysplasia or carcinoma. Gut. 2017;66(1):97-106. doi:10.1136/gutjnl-2015-310456
49. Liu C, Bettington ML, Walker NI, et al. CpG Island Methylation in Sessile Serrated Adenomas Increases With Age, Indicating Lower Risk of Malignancy in Young Patients. Gastroenterology. 2018;155(5):1362-1365.e2. doi:10.1053/j.gastro.2018.07.012
50. Bennecke M, Kriegl L, Bajbouj M, et al. Ink4a/Arf and Oncogene-Induced Senescence Prevent Tumor Progression during Alternative Colorectal Tumorigenesis. Cancer Cell. 2010;18(2):135-146. doi:10.1016/j.ccr.2010.06.013
51. Cole AM, Ridgway RA, Derkits SE, et al. p21 loss blocks senescence following Apc loss and provokes tumourigenesis in the renal but not the intestinal epithelium. EMBO Mol Med. 2010;2(11):472-486. doi:10.1002/emmm.201000101

Chapter Four: Discussion

Colorectal cancer is the third most commonly diagnosed cancer in Australia and there is an urgent need to better understand the molecular underpinnings of the disease to further treatment and prevention efforts. The advent of next generation sequencing and the decreasing cost of high throughput molecular assays in the past decade have facilitated the rapid expansion of our understanding of the genomics and transcriptomics of colorectal cancers. The epigenomics of colorectal cancer is substantially understudied in comparison to its genomic and transcriptomic counterparts. Therefore, this thesis set out to answer the following overarching question: What is the role of epigenomics in colorectal cancer development and progression?

To address this question, we collected >200 samples from the Royal Brisbane and Women’s Hospital and embarked on a project to survey the DNA methylation landscape of colorectal cancers and the potential transcriptomic consequences of this DNA methylation. In the years preceding this project, other groups had attempted similar studies. The first semi-high-throughput study of the DNA methylation landscape of colorectal cancers1 was published in 2006. In this study, Weisenberger and colleagues evaluated the DNA methylation state of 195 loci in a cohort of 295 primary colorectal cancers. This seminal work identified a striking relationship between CIMP, the CpG island methylator phenotype, and BRAF mutation. Although this study was comprehensive by the standards of the mid-2000s, it was limited to a small portion of the genome. Understanding the true nature of the colorectal cancer methylome would require technology able to assess the entire genome in a high-throughput and unbiased manner. The Illumina HumanMethylation27 BeadChip microarray platform was released in 2008. This technology significantly lowered the technical barriers to assessing DNA methylation in a high-throughput manner, allowing the DNA methylation state of thousands of individual CpGs to be measured simultaneously. It was used by The Cancer Genome Atlas2 and by Hinoue et al3 to evaluate the colorectal cancer methylome. In the former, DNA methylation was surveyed in 236 cancer and 42 adjacent normal mucosal samples. Using recursively partitioned mixture model clustering, Muzny et al2 identified four clusters of colorectal cancers. The first cluster, CIMP-High, represented the classical CpG island methylator phenotype and was enriched for BRAF mutant samples. The second cluster displayed less overall DNA methylation and was reminiscent of a more attenuated phenotype; the authors termed this cluster CIMP-Low. Finally, Muzny and colleagues identified two clusters with relatively few DNA methylation alterations. Hinoue et al3 assessed 125 colorectal cancer samples and similarly identified four clusters.

While more comprehensive than previous studies, both Muzny et al2 and Hinoue et al3 had several limitations. First, although both included a large cohort of samples, neither cohort was collected consecutively or in an unselected manner. As such, it is difficult to gauge interpatient DNA methylation heterogeneity from these studies. Second, both studies assessed DNA methylation by HM27 microarray. This technology was state-of-the-art at the time these studies were conducted, but was limited to 27,000 CpG sites, the vast majority of which were located in CpG islands at the proximal promoters of protein coding genes. It is estimated that the human genome contains ~28,000 CpG islands4 and ~28 million CpG sites4; therefore, these studies assessed ~0.1% of the entire methylome. Lastly, we now know that the DNA methylation state of normal cells and tissues is dynamic and changes over the adult lifespan5,6. To accurately assess cancer-specific alterations, it is necessary to first identify those that occur in the aged, but physiologically normal, colonic mucosa. The study presented in Chapter 2 of this thesis was designed to address these knowledge gaps.
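The ~0.1% figure above follows directly from the two approximate counts quoted. A short Python check, using the text's rounded figures (27,000 array CpGs and ~28 million genomic CpGs) rather than exact probe or genome counts; the function and constant names are illustrative only:

```python
def methylome_coverage(assayed_cpgs: int, genome_cpgs: int) -> float:
    """Fraction of all genomic CpG sites interrogated by an assay."""
    return assayed_cpgs / genome_cpgs


HM27_CPGS = 27_000          # approximate CpG sites on the HM27 array (rounded, from the text)
GENOME_CPGS = 28_000_000    # approximate CpG sites in the human genome (rounded, from the text)

coverage = methylome_coverage(HM27_CPGS, GENOME_CPGS)
print(f"HM27 covers {coverage:.2%} of genomic CpG sites")  # prints "HM27 covers 0.10% of genomic CpG sites"
```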

We designed a study using our tumour bank of samples, which were collected from the RBWH in an essentially consecutive and unselected manner. In doing so, we hoped to gain a clear understanding of the heterogeneity of both DNA methylation and transcription amongst colorectal cancers in an Australian setting. In 2012, Illumina released the HumanMethylation450 array platform. This was a significant improvement on the earlier HumanMethylation27 microarray, able to accurately measure the methylation of ~16 times as many CpG sites as was previously possible, including ~90% of the content covered by the HumanMethylation27 array. Moreover, the HumanMethylation450 array covered >96% of CpG islands with at least one probe, and was expanded to include features absent from the HumanMethylation27 array, including non-CpG island associated CpGs, FANTOM4 promoters, DNase hypersensitivity sites and miRNA promoters. This technology was well suited to our overall aim of improving our knowledge of the colorectal cancer methylome and moving beyond the confines of methylation within proximal gene promoters.

For consistency and reproducibility, we adopted methodologies similar to those employed by Hinoue et al3 for analysing these data. We did, however, make one subtle but key change: to capture cancer-specific DNA methylation alterations and identify subtypes of cancers, we discarded all probes showing age-associated alterations in the subset of normal mucosal samples that we assayed simultaneously, so as to avoid confounding by epigenetic drift. This filtering step was designed to reduce the possibility that patient age, rather than cancer-specific alterations, would drive clustering. When we performed recursively partitioned mixture model clustering on the most variable of the remaining probes, we identified five consensus subtypes of colorectal cancer (Figure 1). This contrasts with both Hinoue et al3 and Muzny et al2, where four subtypes were detected. We identified two distinct clusters of cancers with high levels of DNA methylation, CIMP-H1 and CIMP-H2, two with intermediate levels of DNA methylation, and one that was relatively devoid of methylation alterations. The key difference between our study and previous studies is the dichotomisation of CIMP-High into two distinct subtypes.
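The filtering described above, removing probes whose methylation tracks age in normal mucosa and then keeping the most variable remaining probes for clustering, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis pipeline: the text does not specify how age association was tested, so a simple Pearson correlation cutoff stands in for it, and `select_cancer_probes`, `r_cutoff` and `n_top` are hypothetical names and defaults. The clustering step itself (recursively partitioned mixture modelling) is not reproduced.

```python
import numpy as np


def select_cancer_probes(normal_beta, normal_age, tumour_beta, r_cutoff=0.3, n_top=1000):
    """Hypothetical sketch: drop age-associated probes, keep the most variable in cancers.

    normal_beta: (n_probes, n_normal) beta values from normal mucosa
    normal_age:  (n_normal,) patient ages for those normal samples
    tumour_beta: (n_probes, n_tumour) beta values from cancers
    Returns indices of up to n_top probes, most variable first.
    """
    normal_beta = np.asarray(normal_beta, dtype=float)
    age = np.asarray(normal_age, dtype=float)
    age_z = (age - age.mean()) / age.std()
    centred = normal_beta - normal_beta.mean(axis=1, keepdims=True)
    sd = normal_beta.std(axis=1)
    # Pearson correlation of each probe with age (population SDs throughout).
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (centred @ age_z) / (age.size * sd)
    # Probes constant in normal mucosa give NaN; treat them as not age-associated.
    keep = np.where(np.abs(np.nan_to_num(r)) < r_cutoff)[0]
    variances = np.asarray(tumour_beta, dtype=float)[keep].var(axis=1)
    order = np.argsort(variances)[::-1][:n_top]
    return keep[order]
```

The cutoff-based filter is deliberately simple; a regression model with multiple-testing control would be a more defensible way to flag age-associated probes in practice.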

Figure 1: The five subtypes of colorectal cancer identified in Fennell et al7

Our CIMP-H1 cluster was strongly enriched for BRAF mutations, location within the proximal colon, female gender and advanced age. This is largely consistent with an origin in the serrated neoplasia pathway8. The serrated neoplasia pathway is uniquely marked by BRAF mutation8, which occurs at a high frequency in the putative precursor lesion, the sessile serrated lesion9. These lesions are most often identified in the ascending colon10,11 and show a preponderance for female patients12.

The second subtype of highly methylated cancers, CIMP-H2, had not been previously described. These cancers were associated with female gender, proximal location and advanced age, but all to a lesser extent than CIMP-H1. The most striking difference between CIMP-H1 and CIMP-H2 was the frequency of BRAF and KRAS mutation. In contrast to CIMP-H1, CIMP-H2 cancers rarely carried BRAF mutations; instead, they tended to activate the MAPK pathway via mutations in the KRAS oncogene. They were almost universally microsatellite stable, and expression profiling revealed an enrichment for the metabolic (CMS3) expression subtype. By comparison, ~50% of CIMP-H1 cancers displayed microsatellite instability, and only 13% were of the metabolic expression subtype. Thus CIMP-H2 clearly represented a novel entity.

While it was clear that CIMP-H1 cancers had their origins in serrated neoplasia, the natural history of CIMP-H2 cancers was not immediately apparent. Their preponderance for the proximal colon and their highly methylated nature suggest that these cancers may also arise via the serrated neoplasia pathway. However, KRAS mutations are extremely uncommon in sessile serrated lesions13, whereas BRAF mutations occur in a majority of these lesions13–15. Hence it is unlikely that CIMP-H2 cancers arise from sessile serrated lesions. Traditional serrated lesions represent a rare and aggressive subtype of colonic lesion. They are histologically characterised by luminal serrations that sit atop an architecture reminiscent of conventional adenomas16,17. In contrast to the sessile serrated lesion, KRAS mutations are relatively common in the traditional serrated lesion, occurring in approximately 50-70% of cases17, and these lesions are reported to occur in both the proximal and distal colon17. It is possible that CIMP-H2 cancers originate in KRAS mutant traditional serrated lesions. To further explore this hypothesis, one could evaluate the DNA methylation and transcriptional landscape of advanced traditional serrated lesions; their methylation and transcriptional architecture may indicate whether they are the likely polyp of origin for these cancers. We considered such an analysis in this thesis; however, interrogation of these polyps is notoriously difficult for a number of reasons, not least their rarity. Traditional serrated lesions represent <2% of all colonic polyps17, and obtaining fresh tissue is fraught with ethical considerations, as most, if not all, of the tissue is required for histopathological diagnosis and taking fresh samples could compromise the diagnosis. Nevertheless, as detection of these lesions improves and technology for methylation and expression analysis becomes more amenable to archival, fixed samples, this analysis will become more feasible.

Our expression analysis of CIMP-H1 and CIMP-H2 cancers provided further evidence of this dichotomy. CIMP-H1 cancers strongly overexpress genes associated with various inflammatory and immune-related processes. This is consistent with the hypermutator phenotype present in ~50% of CIMP-H1 cancers, which arises from hypermethylation and loss of MLH1. Loss of MLH1 abrogates effective DNA mismatch repair18,19 and results in a slew of somatic mutations2, the number and spectrum of which are positively associated with activated tumour infiltrating lymphocytes20. Recent studies have indicated that this phenomenon is probably due to the increased neoantigen burden in microsatellite unstable cancers21. Patients with microsatellite unstable cancers and a high neoantigen burden respond well to immune checkpoint blockade, with many achieving durable responses22. The subset of mismatch repair deficient colorectal cancers that are CIMP-H1 may also benefit from combination therapy targeting both the epigenome and the immune system. Low-dose DNA methyltransferase inhibitors can induce the “viral mimicry” phenomenon, in which heavily methylated endogenous retroviruses become unmethylated and expressed following DNA methyltransferase inhibition, inducing an anti-viral cellular response23. These findings have led to trials combining DNA methyltransferase inhibitors (e.g. azacitidine or decitabine) with immune checkpoint blockade. These trials are ongoing24, and it is likely that the DNA methylation subtype of a patient’s cancer, in combination with microsatellite instability status, will affect the efficacy of these therapeutic strategies.

In CIMP-H2 we identified a strong enrichment for overexpression of genes involved in bile acid metabolism. Primary bile acids are synthesised from cholesterol in the liver. The two primary bile acids produced are cholic acid and chenodeoxycholic acid25. These are further metabolised to the secondary bile acids deoxycholic and lithocholic acid by the microorganisms that inhabit the gut25. High expression of the bile acid metabolism pathway is indicative of high concentrations of bile acid in the microenvironment surrounding the cancer and may provide insight into the aetiology of these cancers. Intestinal bile acids have been associated with a number of cancer-initiating processes26,27. Secondary bile acids produced in the intestine can increase reactive oxygen species, resulting in oxidative stress and DNA damage28. Administration of deoxycholic acid in an animal model of colorectal carcinogenesis has been reported to significantly increase the frequency of Kras mutations29. In 2009, Payne and colleagues30 reported that 5/6 animals fed a diet supplemented with 0.2% deoxycholic acid developed adenomas, two of which possessed serrated morphology. However, the authors did not examine Kras mutation status, and it is unclear whether this was the mechanism of action. When considered in concert with the high frequency of KRAS mutation in CIMP-H2 cancers, one could hypothesise that excessive secondary bile acid production could initiate these cancers.

The association between DNA methylation subtype and age was a curious one. We had taken extra care to reduce confounding by age-associated DNA methylation when analysing cancer samples, yet we still observed a step-wise increase in mean patient age from CIMP-Negative cancers through to CIMP-H1. Given our initial filtering steps, we were confident that this was biologically meaningful and not a mere artefact of the spectrum of patient ages included in the study. It had previously been established that both BRAF mutant colorectal cancers31,32 and CIMP-High colorectal cancers tend to occur at a higher average age than wild-type and CIMP-Negative cancers33. However, a step-wise relationship such as the one shown here had not been previously reported. It is not entirely clear whether advancing age predisposes to certain subtypes of cancer, or whether age-associated DNA methylation patterns alter the environment to favour the formation of certain types of cancer. We favour the latter. The chromosomal instability (CIN) phenotype describes large-scale copy number alterations, in which tumour suppressor genes are frequently deleted and oncogenes amplified. Fewer than 30% of CIMP-High cancers display chromosomal instability; by comparison, ~70% of CIMP-Negative cancers are chromosomally unstable34. Seminal work by Eden et al35 showed that genomic hypomethylation creates a permissive environment for aneuploidies. Likewise, Rodriguez et al36 reported that the degree of hypomethylation correlates with the degree of DNA damage. This probably relates to the chromatin states induced by genomic hypermethylation versus hypomethylation. DNA hypermethylation is a key component of the heterochromatin formation process37. Heterochromatic DNA is highly condensed, in contrast to euchromatic DNA, which is loosely packed38. Heterochromatin is more resistant to DNA damage, owing to the tight compaction of its nucleosomes39. In contrast, euchromatin is more susceptible to DNA damaging agents, including radiation39,40. It is possible that age-associated DNA methylation provides a safe haven from chromosomal instability by inducing the formation of heterochromatin at frequently altered genomic locations. This would explain the low frequency of CIN in cancers with high DNA methylation and advanced age.
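The step-wise increase in mean patient age from CIMP-Negative to CIMP-H1 noted above is an ordered trend across subtypes. The text does not say how the trend was assessed; one simple option, sketched here in plain Python, is Spearman's rank correlation between an ordinal subtype code (hypothetically 0 for CIMP-Negative up to 4 for CIMP-H1, with the intermediate subtypes in between) and patient age. Ties, which are unavoidable for subtype codes, receive average ranks. Only the coefficient is computed; a p-value would need a permutation test or a library routine such as SciPy's `scipy.stats.spearmanr`. Function names are illustrative.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

A rank-based statistic only asks whether age rises with subtype order, not whether the increments are equal, which matches the "step-wise" description without assuming a linear age effect.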

DNA methylation and the post-translational modification of histone tails occur via reactions catalysed by epigenetic enzymes. A feature of Chapter one was the integrative analysis of mutations in the genes encoding these enzymes with DNA methylation subtypes. We had hypothesised that the epigenetic dysregulation of CIMP-H cancers would extend to other elements of the epigenome, such as histone modifications, in part due to heightened mutation burdens in genes encoding epigenetic regulators. Sequencing exomes or genomes in large numbers is costly, and The Cancer Genome Atlas2 consortium had previously sequenced >400 colorectal cancer exomes. We elected to analyse the TCGA cancer exomes with accompanying methylation data, which allowed the cancers to be classified into our five DNA methylation subtypes. This also served as validation of our methylation data analyses. In keeping with our hypothesis, epigenetic regulators were mutated with high frequency in cancers with extensive DNA methylation, and the number of mutations in these genes decreased step-wise with decreasing genomic methylation. In CIMP-H1 and CIMP-H2 we observed a preponderance of truncating mutations in SETD1B. SETD1B is a histone methyltransferase that acts on lysine 4 of the histone H3 tail (H3K4)41. H3K4 methylation is one of many chromatin marks associated with active transcription of euchromatin42. We also identified an enrichment of DNA methylation at bivalent chromatin, which carries both the repressive H3K27me3 and the active H3K4me3 marks. Bivalent domains are poised and usually repressed, but can respond to stimuli and become actively transcribed. Loss of H3K4me3 at these regions may irreversibly repress the underlying chromatin. This has been reported in embryonic stem cells following loss of MLL2, a histone lysine methyltransferase that also methylates histone H3 at lysine 4 (H3K4)43. Thus, selection for mutations in histone methyltransferases in CIMP-H cancers might occur as a mechanism to promote the silencing of genes in previously bivalent chromatin domains. This could be explored experimentally, via knockout models, or observationally using human samples with mutations in these genes. However, examining histone modifications in clinical samples is difficult, as these assays require fresh tissue. For this reason, we did not explore histone modifications in our DNA methylation study. The role of most of these mutations remains to be determined; however, our study has provided a useful catalogue of mutations with respect to CIMP that may be used in future functional studies.
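The subtype-level comparison described above amounts to tallying mutations per tumour and summarising them within each methylation subtype. The sketch below is illustrative only and is not the pipeline used in the thesis: it assumes mutation calls arrive as (sample, gene, Variant_Classification) tuples in the style of TCGA MAF files, uses an arbitrary five-gene subset of the Appendix 1 list, treats nonsense, frameshift and splice-site calls as truncating, and runs on invented toy data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical illustration: a handful of genes taken from the Appendix 1 list
EPI_GENES = {"SETD1B", "KMT2D", "ARID1A", "CREBBP", "EP300"}

# MAF Variant_Classification values treated here as truncating (an assumption)
TRUNCATING = {"Nonsense_Mutation", "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site"}

Mutation = Tuple[str, str, str]  # (sample, gene, variant classification)


def mean_epi_mutations(mutations: Iterable[Mutation],
                       subtype: Dict[str, str]) -> Dict[str, float]:
    """Mean number of epigenetic-regulator mutations per tumour in each subtype."""
    counts: Dict[str, int] = defaultdict(int)
    for sample, gene, _ in mutations:
        if gene in EPI_GENES and sample in subtype:
            counts[sample] += 1
    groups: Dict[str, List[int]] = defaultdict(list)
    for sample, group in subtype.items():
        groups[group].append(counts[sample])  # tumours without hits contribute 0
    return {g: sum(v) / len(v) for g, v in groups.items()}


def truncating_fraction(mutations: Iterable[Mutation], subtype: Dict[str, str],
                        gene: str = "SETD1B") -> Dict[str, float]:
    """Fraction of tumours per subtype with at least one truncating mutation in `gene`."""
    hit = {s for s, g, vc in mutations if g == gene and vc in TRUNCATING}
    groups: Dict[str, List[bool]] = defaultdict(list)
    for sample, group in subtype.items():
        groups[group].append(sample in hit)
    return {g: sum(v) / len(v) for g, v in groups.items()}


# Toy input (invented sample IDs and calls, not TCGA data)
subtype = {"T1": "CIMP-H1", "T2": "CIMP-H1", "T3": "CIMP-H2", "T4": "CIMP-Neg"}
mutations = [
    ("T1", "SETD1B", "Frame_Shift_Del"), ("T1", "KMT2D", "Missense_Mutation"),
    ("T2", "ARID1A", "Nonsense_Mutation"), ("T3", "SETD1B", "Missense_Mutation"),
    ("T4", "APC", "Nonsense_Mutation"),
]
print(mean_epi_mutations(mutations, subtype))
print(truncating_fraction(mutations, subtype))
```

A formal test for the step-wise trend across ordered subtypes would be layered on top of these per-subtype summaries.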

In 2018, our laboratory published two studies on the association between age and sessile serrated lesions44,45. The first was an analysis of the prevalence of sessile serrated lesions by patient age45. In keeping with previous studies46, the prevalence of conventional adenomas increases with advancing patient age45, as does the prevalence of colorectal cancer45,47. Given the high average age of patients diagnosed with BRAF mutant cancers, one would have expected a similar trend in the prevalence of sessile serrated lesions. Surprisingly, Bettington et al45 reported that sessile serrated lesions are equally prevalent across the age spectrum. This suggests that sessile serrated lesions in the colon of a young individual carry a limited risk of neoplastic transformation. As DNA methylation is inextricably linked to the progression of BRAF mutant colorectal cancers, one potential explanation for the apparently low risk of sessile serrated lesions in young patients is the absence of widespread DNA methylation alterations. Liu et al44 revealed that the prevalence of CIMP in sessile serrated lesions increased with age. Whether CIMP develops over a protracted period in a lesion initiated in the colon of a young patient, or instantaneously in a lesion arising in an elderly patient, remained unclear. It was this question that prompted the study described in Chapter 2.

To resolve this critical distinction we used the inducible BrafCA/Villin CreERT2 murine model48. In this model, the oncogenic Braf V637E mutation (the murine equivalent of human BRAF V600E) is induced by a single intraperitoneal injection of tamoxifen. Using this model, we previously demonstrated that widespread DNA methylation alterations occur over a protracted period of time if the oncogenic Braf mutation is induced at wean48. These animals developed murine serrated lesions (MSLs) by 8 months, and a subset progressed to cancer by 14 months48. This study recapitulated the scenario in which Braf mutation occurs early in life, and confirmed that prolonged oncogenic signalling eventually results in DNA methylation alterations and serrated neoplasia. However, it was still not clear whether the age at which the Braf mutation occurs influences how long neoplasia takes to develop. This is important, as current surveillance guidelines do not take age into account49. In chapter two, we activated the Braf mutation either at wean or after nine months of normal aging, and in both cases aged the animals for a further five months. When Braf was activated at wean, we rarely observed MSLs at five months. In contrast, activating Braf after nine months of normal aging, for the same five-month period, resulted in a relative risk of MSLs of 10.5, with >40% of animals developing at least one MSL. These data strongly indicate that age alters the risk of Braf mutation-induced neoplastic transformation. Moreover, the current data support the hypothesis that BRAF mutation and BRAF mutant sessile serrated lesions may have limited malignant potential when they occur in young patients45.
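The relative risk quoted above is the proportion of animals with at least one MSL after induction at nine months, divided by the same proportion after induction at wean. The sketch below computes it from a 2×2 table and adds a log-scale Wald 95% confidence interval as one conventional choice. The group sizes are not given in this section, so the counts in the example are hypothetical, and the interval method is an assumption rather than necessarily the analysis used in the mouse study.

```python
import math
from typing import Tuple


def relative_risk(a: int, n1: int, c: int, n0: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Relative risk of >=1 lesion in an exposed group versus a reference group.

    a, n1 -- animals with >=1 MSL and total animals, exposed group (e.g. induced when aged)
    c, n0 -- animals with >=1 MSL and total animals, reference group (e.g. induced at wean)
    Returns (RR, lower, upper) with a Wald interval computed on log(RR).
    """
    if a == 0 or c == 0:
        raise ValueError("log-scale interval is undefined when a group has no events")
    rr = (a / n1) / (c / n0)
    # Standard error of log(RR) for two independent binomial proportions
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts, for illustration only (not the thesis data)
rr, lower, upper = relative_risk(a=9, n1=20, c=1, n0=21)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With very few events in the reference group the interval becomes wide, so a relative risk of this kind is best read alongside the raw proportions.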

As DNA methylation is a critical determinant of neoplastic potential in the serrated pathway44, and MSLs occurred almost exclusively in older animals in our study, we sought to investigate how the DNA methylation landscape of the intestine changes over time. We hypothesised that age-associated DNA methylation alterations may predispose to neoplastic transformation. Age-associated DNA methylation is not a new concept, and early studies of DNA methylation and cancer took care to discriminate age-associated alterations from those occurring exclusively in cancer50,51. For example, SFRP1 and SFRP2 encode secreted frizzled-related proteins, which compete with cell-surface frizzled receptors for WNT ligands. These loci reportedly undergo progressive DNA hypermethylation with age52. Other genes, such as HPP1 and GATA5, are also frequently altered with aging53.

To investigate age-associated DNA methylation alterations we performed reduced representation bisulphite sequencing (RRBS). RRBS is a restriction enzyme-based sequencing assay that relies on enzymatic digestion of DNA, size selection and bisulphite conversion. MspI is the most commonly used restriction enzyme for RRBS. This enzyme recognises the C^CGG motif, where ^ is the cut site, and cleaves regardless of methylation at the internal CpG54. Approximately 2% of the genome is digested into fragments <500 bp. These fragments are usually CpG dense, and as a result it is possible to size select and economically sequence more than one million CpG sites. Sequencing reads are aligned to the murine genome and the methylation status of each CpG is inferred bioinformatically: bisulphite converts unmethylated cytosines to uracil (read as thymine), whereas methylated cytosines are protected and remain cytosine, so the methylation percentage at a site is the proportion of reads reporting a cytosine. We applied this technology to wild type murine intestinal samples from wean to 20 months of age and performed linear regression of methylation percentage against age at single nucleotide resolution. This analysis revealed widespread DNA methylation accumulation with age.
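The RRBS steps above can be sketched in a few lines of Python. This is a toy sketch, not the pipeline used in this thesis: it digests a sequence in silico at MspI sites on one strand, applies a size window, computes per-CpG methylation from C and T read counts, and fits an ordinary least-squares slope of methylation against age for a single CpG. Real pipelines also handle both strands, read alignment and coverage filters; the 40-500 bp window and all example numbers are assumptions.

```python
import re
from typing import List, Tuple


def mspi_fragments(seq: str) -> List[str]:
    """In silico MspI digest: cut every C^CGG site between the two cytosines."""
    seq = seq.upper()
    # Zero-width lookahead so that adjacent or overlapping sites are all found
    cuts = [m.start() + 1 for m in re.finditer(r"(?=CCGG)", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[s:e] for s, e in zip(bounds, bounds[1:]) if e > s]


def size_select(frags: List[str], lo: int = 40, hi: int = 500) -> List[str]:
    """Keep fragments inside a size window (the 40-500 bp bounds are assumptions)."""
    return [f for f in frags if lo <= len(f) <= hi]


def methylation_percent(c_reads: int, t_reads: int) -> float:
    """Per-CpG methylation: reads still reporting C over all informative reads (C + T)."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no informative reads at this CpG")
    return 100.0 * c_reads / total


def age_slope(ages: List[float], meth: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit meth = intercept + slope * age; returns (slope, intercept)."""
    n = len(ages)
    mx, my = sum(ages) / n, sum(meth) / n
    sxx = sum((x - mx) ** 2 for x in ages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ages, meth))
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy example: two MspI sites flanking a 58 bp fragment
toy = "TTCCGG" + "A" * 54 + "CCGGTT"
kept = size_select(mspi_fragments(toy))
print([len(f) for f in kept])

# Hypothetical C/T read counts for one CpG in samples aged 1, 6 and 12 months
ages = [1.0, 6.0, 12.0]
meth = [methylation_percent(c, t) for c, t in [(2, 18), (6, 14), (10, 10)]]
print(age_slope(ages, meth))
```

A positive slope marks a CpG that gains methylation with age; genome-wide, the same fit is repeated at every covered CpG and the resulting p-values are corrected for multiple testing.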


We also evaluated the effects of Braf mutation on epigenetic age. Epigenetic age is a biological measure of age that can be estimated from the DNA methylation levels of certain CpG sites. In disease-free tissues, epigenetic age correlates with chronological age; however, in certain disease settings, epigenetic age can be much higher than the chronological age of the patient5,60,61. We applied two models of epigenetic age62,63. While the predicted epigenetic ages differed between the models, both revealed a remarkable acceleration of epigenetic age in Braf mutant tissues compared with age-matched wild type samples. It is not clear how Braf mutation accelerates either age-associated DNA methylation or epigenetic age, but there are several potential mechanisms. The most obvious hypothesis is that oncogenic Braf increases proliferation and the number of cell divisions, and consequently epigenetic age and DNA methylation drift. Such a mechanism would be consistent with the role of Braf as an oncogene and as a member of the MAPKinase signalling cascade. However, it is difficult to reconcile this with the very nature of the epigenetic clock. The multi-tissue epigenetic clocks employed in this study were modelled using various tissue types, each with differing proliferative capacities. For example, the Meer et al63 model includes cerebral, haematological, muscle, splenic, liver and heart tissues. Despite varying rates of proliferation and mitosis, this model still accurately predicts chronological age in each of these tissues. Thus, proliferation alone does not completely explain the acceleration induced by Braf. It is also unclear whether other oncogene-driven models display the same phenotype. For example, KRAS mutation is common in colorectal cancer, and KRAS acts directly upstream of BRAF in the MAPKinase signalling cascade. While Kras mutant mouse models do exist64, they have not been employed to investigate intestinal DNA methylation.
Determining the precise mechanism of Braf-induced acceleration of epigenetic drift, and whether it is specific to the Braf oncogene or a class-wide effect, would improve both our understanding of the mechanics of epigenetic drift and our understanding of how age-related epigenetic phenomena influence oncogenesis more broadly.
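Multi-tissue clocks such as those of Stubbs et al62 and Meer et al63 are, in general terms, penalised linear models over a selected set of CpG sites. The sketch below shows only that general form, plus one common definition of age acceleration (predicted minus chronological age). The CpG identifiers, weights and intercept are invented for illustration; they are not the coefficients of either published clock, which may also apply transformations to age.

```python
from typing import Dict


def predict_epigenetic_age(meth: Dict[str, float], weights: Dict[str, float],
                           intercept: float) -> float:
    """Linear clock: intercept + sum of weight * methylation fraction over clock CpGs.

    A KeyError is raised if the sample lacks a clock CpG (no imputation here).
    """
    return intercept + sum(w * meth[cpg] for cpg, w in weights.items())


def age_acceleration(predicted: float, chronological: float) -> float:
    """Predicted minus chronological age; positive means 'epigenetically older'."""
    return predicted - chronological


# Hypothetical three-CpG clock: identifiers, weights and intercept are invented
weights = {"cpg_A": 20.0, "cpg_B": -10.0, "cpg_C": 5.0}
intercept = 2.0  # months

wild_type = {"cpg_A": 0.30, "cpg_B": 0.50, "cpg_C": 0.20}
braf_mutant = {"cpg_A": 0.60, "cpg_B": 0.40, "cpg_C": 0.40}

for label, sample in [("wild type", wild_type), ("Braf mutant", braf_mutant)]:
    predicted = predict_epigenetic_age(sample, weights, intercept)
    print(label, round(predicted, 2), round(age_acceleration(predicted, 5.0), 2))
```

Both toy samples are treated as five months old; because the made-up weights reward methylation at cpg_A and cpg_C, the sample with more methylation at those sites is predicted to be epigenetically older, which is the pattern an age-matched comparison of clock outputs is designed to detect.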

Gene level analysis highlighted an enrichment of age-associated alterations in the promoters of genes involved in WNT signalling. The potential implications of this finding must be considered within the broader context of WNT signalling and its relationship with colorectal cancer. The WNT signalling pathway is highly dysregulated in colorectal cancers. In the conventional pathway, ~80% of colorectal cancers display nuclear accumulation of β-catenin, a hallmark of aberrant WNT signalling55. This is usually a consequence of loss of the APC tumour suppressor gene55,56. Approximately 50% of BRAF mutant colorectal cancers show nuclear β-catenin, yet only a small number appear to have APC mutation55,57. Moreover, nuclear β-catenin is rare in non-dysplastic SSLs, but common (60%) in those that have acquired cytological dysplasia55. Therefore, alterations of WNT signalling appear to underpin the development of dysplasia and invasive BRAF mutant cancer, but through a mechanism largely independent of APC. In chapter two, we report pervasive age-related methylation at WNT signalling genes. As dysplastic SSLs and invasive BRAF mutant cancers predominantly occur in the elderly15, we hypothesised that these epigenetic alterations may influence WNT signalling in this context. We also observed that oncogenic Braf accelerated certain age-associated DNA methylation alterations. In support of a role for methylation at WNT pathway gene promoters as a driver of progression, WNT signalling genes were over-represented among the loci at which Braf mutation altered the rate of DNA methylation accumulation. Other mechanisms that may contribute to elevated WNT signalling in these lesions are mutations in RNF43, which are common in BRAF mutant MSI cancers58, and RSPO fusions, which occur in some traditional serrated adenomas59. It is likely that a combination of these genetic and epigenetic events maintains a “just-right” level of WNT signalling that permits progression to cancer.
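Enrichment of a pathway such as WNT signalling among genes with age-associated promoter methylation is commonly assessed with a one-sided hypergeometric (over-representation) test. The sketch below assumes that test; it is not necessarily the method used for the gene level analysis in Chapter 2, and the example counts are hypothetical. It gives the same tail probability as `scipy.stats.hypergeom.sf(k - 1, N, K, n)` in SciPy's parametrisation, but uses only the standard library.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k): chance of seeing at least k pathway genes.

    N -- background genes tested
    K -- background genes annotated to the pathway (e.g. WNT signalling)
    n -- genes in the hit list (e.g. with age-associated promoter methylation)
    k -- hit-list genes annotated to the pathway
    """
    if not (0 <= k <= min(n, K) and 0 <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Pathway fraction in the hit list relative to the pathway fraction in the background."""
    return (k / n) / (K / N)


# Hypothetical counts, for illustration only
print(enrichment_p(k=30, n=400, K=150, N=15000), fold_enrichment(30, 400, 150, 15000))
```

Exact integer arithmetic via `math.comb` avoids the underflow that a naive floating-point product of binomial coefficients would suffer for genome-scale counts.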

While most BRAF mutant cancers arise in patients >65 years of age7,15, some are diagnosed in patients in their 40s and 50s. As the precursor lesions (SSLs) in patients of this age are unlikely to have accumulated enough DNA methylation to permit progression44, we hypothesised that an alternative mechanism might explain their progression to cancer. We recently showed that prolonged exposure to Braf mutation from wean results in MSLs after ~8-10 months48, and that these lesions usually harboured mutations in Ctnnb1, the gene encoding β-catenin65. Reasoning that genetic alterations in WNT signalling might underpin the progression of early onset BRAF mutant cancers, we performed exome capture sequencing on human BRAF mutant cancers and combined these data with public somatic mutation data. In keeping with our hypothesis, we identified numerous somatic mutations in WNT signalling genes57. Strikingly, BRAF mutant cancers occurring in patients <60 years of age were significantly more likely to harbour truncating mutations in APC, and the likelihood of a truncating APC mutation increased with decreasing age at diagnosis. Thus, it appears that early onset BRAF mutant cancers dysregulate WNT signalling in a manner reminiscent of the conventional pathway34. These cancers had several other striking features, including a high frequency of metastatic disease and poorer patient outcome. We confirmed the aggressive nature of these cancers in vivo by crossing the Braf model described earlier with the ApcMin mouse, observing a markedly increased polyp burden and dismal survival. From these data, we conclude that BRAF mutant cancers with APC mutation represent a distinct subtype of colorectal cancer. They occur earlier than is typical of BRAF mutant cancers, dysregulate WNT via APC mutation and are remarkably aggressive.
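A relationship between age at diagnosis and the probability of a truncating APC mutation can be modelled with logistic regression. The sketch below fits a single-predictor model by Newton-Raphson in pure Python. It is a generic illustration, not necessarily the model reported in the published study57, and the example ages and mutation calls are invented.

```python
import math
from typing import List, Tuple


def fit_logistic(ages: List[float], outcome: List[int],
                 iters: int = 50) -> Tuple[float, float]:
    """Maximum-likelihood fit of logit P(y = 1) = b0 + b1 * age by Newton-Raphson.

    Ages are centred internally for numerical stability; the returned (b0, b1)
    are on the original age scale. Perfectly separated data have no finite MLE.
    """
    centre = sum(ages) / len(ages)
    xs = [a - centre for a in ages]
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, outcome):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # score (gradient of the log-likelihood)
            g1 += (y - p) * x
            h00 += w             # Fisher information entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det  # inverse 2x2 information times score
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) < 1e-10 and abs(step1) < 1e-10:
            break
    return b0 - b1 * centre, b1


def prob(b0: float, b1: float, age: float) -> float:
    """Fitted probability of the outcome at a given age."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))


# Invented example: age at diagnosis and truncating APC status (1 = present)
ages = [45, 50, 55, 60, 65, 70, 75, 80]
apc = [1, 1, 0, 1, 0, 0, 1, 0]
b0, b1 = fit_logistic(ages, apc)
print(f"slope per year = {b1:.3f}; P(APC) at 50 = {prob(b0, b1, 50):.2f}, at 75 = {prob(b0, b1, 75):.2f}")
```

A negative slope b1 means the odds of a truncating APC mutation fall with each additional year of age at diagnosis.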


Conclusion and Future Directions

This thesis began with three questions: 1) What is the true extent of the CpG island methylator phenotype? 2) Does age-related hypermethylation, and namely that occurring at the loci encoding tumour suppressor genes, increase the risk of serrated colorectal neoplasia? 3) How can we reconcile this with the existence of early onset serrated colorectal cancer? We designed three studies to address these fundamental questions of colorectal cancer biology. In the first chapter of this thesis, we conducted a large-scale survey of DNA methylation in colorectal cancers, showing vast interpatient DNA methylation heterogeneity and identifying five distinct subtypes of colorectal cancer by DNA methylation profiling. These included the novel CIMP-H2 subtype, a hypermethylator group of cancers with a high frequency of KRAS mutation. This study included analysis of genomic regions that were inaccessible to earlier studies due to technological constraints, and provided a more comprehensive portrait of the DNA methylation landscape of cancers of the colorectum. It also identified a strong association between age and DNA methylation subtype, providing an impetus for investigating how age-related DNA methylation might affect neoplastic transformation.

As BRAF mutation is strongly associated with CIMP, we explored this in a murine model of serrated neoplasia driven by oncogenic Braf. This study showed that age-related methylation in the normal intestine accumulates at specific loci and is enriched at WNT signalling gene promoters. Braf mutation, when induced at wean, accelerated the accumulation of this methylation. Next, we showed that inducing the Braf mutation in the aged intestine results in rapid neoplastic transformation compared with inducing it in a young animal for the same period of time. These data strongly link aging and age-related methylation to the risk of neoplastic progression in the serrated pathway.

Finally, in chapter three we address how BRAF mutant cancers that occur earlier in life arise. We hypothesised that WNT activation would also be necessary in these cancers, but that this would occur via genetic mutation rather than DNA methylation. To address this we collated exome sequencing data from multiple sources to form a large cohort of BRAF mutant cancers. We observed a striking relationship between the probability of a cancer bearing a truncating APC mutation and age at diagnosis, with patients diagnosed younger more likely to harbour cancers with truncating mutations in APC. This concords with our hypothesis that genetic, rather than epigenetic mechanisms underpin these cancers. Furthermore, these cancers appear to be highly aggressive.


In this thesis we have shown an age-related risk of sessile serrated lesion transformation in mice; however, it remains to be seen whether this is also the case in human patients. This research question is well beyond the scope of the current thesis, and would be best explored through clinical studies in which different surveillance intervals are trialled in younger versus older patients. Alternatively, this could be addressed using human colon-derived organoids: the BRAF mutation could be introduced by genetic engineering into organoids from young and old patients, and the tumourigenic properties of the two compared. This approach may be hampered by the low efficiency of the CRISPR-Cas9 system in creating single nucleotide edits, and by the potential difficulty of maintaining organoid cultures from older patients.

Epigenetic alterations can be modified by inhibiting the enzymes responsible for their deposition. Thus, chemoprevention is an arena for future research. DNA methylation inhibitors are currently approved as chemotherapy for haematological malignancies, but they have poor toxicity profiles and are probably not well suited to chemoprevention. However, a number of natural compounds may inhibit DNA methylation and are well tolerated. Testing these compounds in a Braf mutant murine model would determine the efficacy of this approach and provide data to support clinical studies.

The role of APC mutation in serrated neoplasia is fertile ground for additional research. For many years, it has been assumed that BRAF and APC mutations are largely mutually exclusive. However, we have shown that this is not necessarily the case, and that when these mutations do occur together, patient outcome is poor. It remains to be seen how effective APC mutation is as a prognostic biomarker in BRAF mutant cancers, and a much larger cohort of patients would be required to examine this fully. As most APC mutations reside in the mutation cluster region of the gene, this would be a feasible undertaking given access to sufficient samples. Additionally, it is not entirely clear how these cancers arise. Are they APC-initiated conventional adenomas that acquired a BRAF mutation, or BRAF mutant serrated lesions that selected for an APC mutation? As whole genome sequencing becomes more commonplace, and these cancers are identified in greater numbers, emerging mutation timing methods could be applied to resolve this.

In conclusion, this thesis has comprehensively interrogated the role of DNA methylation in colorectal cancer. It has highlighted the relevance of age-associated DNA methylation in colorectal carcinogenesis, and demonstrated the heightened risk of Braf-driven neoplasia in the aged intestine. This thesis has identified APC mutation as a key event in the transformation of early onset BRAF mutant colorectal cancers, and provided evidence that these cancers are highly aggressive. These findings enhance our understanding of the evolution of colorectal cancers and may improve precision medicine for colorectal cancer.

References

1. Weisenberger, D. J. et al. CpG island methylator phenotype underlies sporadic microsatellite instability and is tightly associated with BRAF mutation in colorectal cancer. Nat. Genet. 38, 787–793 (2006).
2. Muzny, D. M. et al. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
3. Hinoue, T. et al. Genome-scale analysis of aberrant DNA methylation in colorectal cancer. Genome Res. 22, 271–282 (2012).
4. Lövkvist, C., Dodd, I. B., Sneppen, K. & Haerter, J. O. DNA methylation in human epigenomes depends on local topology of CpG sites. Nucleic Acids Res 44, 5123–5132 (2016).
5. Teschendorff, A. E., West, J. & Beck, S. Age-associated epigenetic drift: implications, and a case of epigenetic thrift? Hum Mol Genet 22, R7–R15 (2013).
6. Issa, J.-P. Aging and epigenetic drift: a vicious cycle. J Clin Invest 124, 24–29 (2014).
7. Fennell, L. et al. Integrative Genome-Scale DNA Methylation Analysis of a Large and Unselected Cohort Reveals 5 Distinct Subtypes of Colorectal Adenocarcinomas. Cellular and Molecular Gastroenterology and Hepatology 8, 269–290 (2019).
8. Leggett, B. & Whitehall, V. Role of the serrated pathway in colorectal cancer pathogenesis. Gastroenterology 138, 2088–2100 (2010).
9. Spring, K. J. et al. High prevalence of sessile serrated adenomas with BRAF mutations: a prospective study of patients undergoing colonoscopy. Gastroenterology 131, 1400–1407 (2006).
10. Min, Y. W. et al. Prevalence of proximal colon serrated polyps in a population at average risk undergoing screening colonoscopy: a multicenter study. Clin Res Hepatol Gastroenterol 36, 604–608 (2012).
11. Crockett, S. D. & Nagtegaal, I. D. Terminology, Molecular Features, Epidemiology, and Management of Serrated Colorectal Neoplasia. Gastroenterology 157, 949–966.e4 (2019).
12. Limketkai, B. N., Lam-Himlin, D., Arnold, M. A. & Arnold, C. A. The cutting edge of serrated polyps: a practical guide to approaching and managing serrated colon polyps. Gastrointestinal Endoscopy 77, 360–375 (2013).
13. Rosty, C. et al. Phenotype and Polyp Landscape in Serrated Polyposis Syndrome: A Series of 100 Patients from Genetics Clinics. Am J Surg Pathol 36, 876–882 (2012).
14. Kambara, T. et al. BRAF mutation is associated with DNA methylation in serrated polyps and cancers of the colorectum. Gut 53, 1137–1144 (2004).


15. Bettington, M. et al. Clinicopathological and molecular features of sessile serrated adenomas with dysplasia or carcinoma. Gut 66, 97–106 (2017).
16. Chetty, R. Traditional serrated adenoma (TSA): morphological questions, queries and quandaries. Journal of Clinical Pathology 69, 6–11 (2016).
17. Pai, R. K., Bettington, M., Srivastava, A. & Rosty, C. An update on the morphology and molecular pathology of serrated colorectal polyps and associated carcinomas. Mod Pathol 32, 1390–1415 (2019).
18. Esteller, M., Levine, R., Baylin, S. B., Ellenson, L. H. & Herman, J. G. MLH1 promoter hypermethylation is associated with the microsatellite instability phenotype in sporadic endometrial carcinomas. Oncogene 17, 2413–2417 (1998).
19. Kuismanen, S. A., Holmberg, M. T., Salovaara, R., de la Chapelle, A. & Peltomäki, P. Genetic and Epigenetic Modification of MLH1 Accounts for a Major Share of Microsatellite-Unstable Colorectal Cancers. The American Journal of Pathology 156, 1773–1779 (2000).
20. Tougeron, D. et al. Tumor-infiltrating lymphocytes in colorectal cancers with microsatellite instability are correlated with the number and spectrum of frameshift mutations. Modern Pathology 22, 1186–1195 (2009).
21. Germano, G. et al. Inactivation of DNA repair triggers neoantigen generation and impairs tumour growth. Nature 552, 116–120 (2017).
22. Le, D. T. et al. PD-1 Blockade in Tumors with Mismatch-Repair Deficiency. New England Journal of Medicine 372, 2509–2520 (2015).
23. Roulois, D. et al. DNA-demethylating agents target colorectal cancer cells by inducing viral mimicry by endogenous transcripts. Cell 162, 961–973 (2015).
24. Lenz, H.-J. Guadecitabine and Nivolumab in Treating Refractory Metastatic Colorectal Cancer - ClinicalTrials.gov. https://clinicaltrials.gov/ct2/show/NCT03576963.
25. Staels, B. & Fonseca, V. A. Bile Acids and Metabolic Regulation. Diabetes Care 32, S237–S245 (2009).
26. Nguyen, T. T., Ung, T. T., Kim, N. H. & Jung, Y. D. Role of bile acids in colon carcinogenesis. World J Clin Cases 6, 577–588 (2018).
27. Ajouz, H., Mukherji, D. & Shamseddine, A. Secondary bile acids: an underrecognized cause of colon cancer. World J Surg Oncol 12, 164 (2014).
28. Powolny, A., Xu, J. & Loo, G. Deoxycholate induces DNA damage and apoptosis in human colon epithelial cells expressing either mutant or wild-type p53. Int. J. Biochem. Cell Biol. 33, 193–203 (2001).


29. Narahara, H. et al. K-ras point mutation is associated with enhancement by deoxycholic acid of colon carcinogenesis induced by azoxymethane, but not with its attenuation by all-trans-retinoic acid. Int. J. Cancer 88, 157–161 (2000).
30. Payne, C. M., Holubec, H., Bhattacharyya, A. K., Bernstein, C. & Bernstein, H. Exposure of mouse colon to dietary bile acid supplement induces sessile adenomas. Inflamm Bowel Dis 16, 729–730 (2010).
31. Iacopetta, B., Li, W. Q., Grieu, F., Ruszkiewicz, A. & Kawakami, K. BRAF mutation and gene methylation frequencies of colorectal tumours with microsatellite instability increase markedly with patient age. Gut 55, 1213 (2006).
32. Malkhosyan, S. R., Yamamoto, H., Piao, Z. & Perucho, M. Late onset and high incidence of colon cancer of the mutator phenotype with hypermethylated hMLH1 gene in women. Gastroenterology 119, 598 (2000).
33. Samowitz, W. S. et al. Evaluation of a Large, Population-Based Sample Supports a CpG Island Methylator Phenotype in Colon Cancer. Gastroenterology 129, 837–845 (2005).
34. Pino, M. S. & Chung, D. C. The Chromosomal Instability Pathway in Colon Cancer. Gastroenterology 138, 2059–2072 (2010).
35. Eden, A., Gaudet, F., Waghmare, A. & Jaenisch, R. Chromosomal instability and tumors promoted by DNA hypomethylation. Science 300, 455 (2003).
36. Rodriguez, J. et al. Chromosomal Instability Correlates with Genome-wide DNA Demethylation in Human Primary Colorectal Cancers. Cancer Res 66, 8462–8468 (2006).
37. Rose, N. R. & Klose, R. J. Understanding the relationship between DNA methylation and histone lysine methylation. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms 1839, 1362–1372 (2014).
38. Allshire, R. C. & Madhani, H. D. Ten principles of heterochromatin formation and function. Nature Reviews Molecular Cell Biology 19, 229–244 (2018).
39. Cann, K. L. & Dellaire, G. Heterochromatin and the DNA damage response: The need to relax. Biochem. Cell Biol. 89, 45–60 (2011).
40. Cowell, I. et al. γH2AX foci form preferentially in euchromatin after ionising-radiation. PLoS ONE (2007) doi:10.1371/journal.pone.0001057.
41. Roguev, A. et al. The Saccharomyces cerevisiae Set1 complex includes an Ash2 homologue and methylates histone 3 lysine 4. The EMBO Journal 20, 7137–7148 (2001).
42. Yang, W. & Ernst, P. Distinct functions of H3K4 methyltransferases in normal and malignant hematopoiesis. Curr Opin Hematol 24, 322–328 (2017).
43. Mas, G. et al. Promoter bivalency favors an open chromatin architecture in embryonic stem cells. Nat. Genet. 50, 1452–1462 (2018).

44. Liu, C. et al. CpG Island Methylation in Sessile Serrated Adenomas Increases With Age, Indicating Lower Risk of Malignancy in Young Patients. Gastroenterology 155, 1362–1365.e2 (2018).
45. Bettington, M. et al. Sessile Serrated Adenomas in Young Patients may have Limited Risk of Malignant Progression. Journal of Clinical Gastroenterology 53, e113 (2019).
46. Kim, H. Y. et al. Age-specific prevalence of serrated lesions and their subtypes by screening colonoscopy: a retrospective study. BMC Gastroenterol 14, 82 (2014).
47. Favoriti, P. et al. Worldwide burden of colorectal cancer: a review. Updates Surg 68, 7–11 (2016).
48. Bond, C. E. et al. Oncogenic BRAF mutation induces DNA methylation changes in a murine model for human serrated colorectal neoplasia. Epigenetics 13, 40–48 (2018).
49. Cancer Council Australia Surveillance Colonoscopy Guidelines Working Party. Clinical practice guidelines for surveillance colonoscopy. https://wiki.cancer.org.au/australia/Guidelines:Colorectal_cancer/Colonoscopy_surveillance.
50. Toyota, M. et al. CpG island methylator phenotype in colorectal cancer. PNAS 96, 8681–8686 (1999).
51. Issa, J.-P. J. et al. Methylation of the oestrogen receptor CpG island links ageing and neoplasia in human colon. Nature Genetics 7, 536–540 (1994).
52. Belshaw, N. J. et al. Patterns of DNA methylation in individual colonic crypts reveal aging and cancer-related field defects in the morphologically normal mucosa. Carcinogenesis 31, 1158–1163 (2010).
53. Worthley, D. L. et al. DNA methylation within the normal colorectal mucosa is associated with pathway-specific predisposition to cancer. Oncogene 29, 1653–1662 (2010).
54. Waalwijk, C. & Flavell, R. A. MspI, an isoschizomer of hpaII which cleaves both unmethylated and methylated hpaII sites. Nucleic Acids Res 5, 3231–3236 (1978).
55. Borowsky, J. et al. The role of APC in WNT pathway activation in serrated neoplasia. Mod Pathol 31, 495–504 (2018).
56. Rowan, A. J. et al. APC mutations in sporadic colorectal tumors: A mutational “hotspot” and interdependence of the “two hits”. PNAS 97, 3352–3357 (2000).
57. Fennell, L. J. et al. APC Mutation Marks an Aggressive Subtype of BRAF Mutant Colorectal Cancers. Cancers 12, 1171 (2020).
58. Bond, C. E. et al. RNF43 and ZNRF3 are commonly altered in serrated pathway colorectal tumorigenesis. Oncotarget 7, 70589–70600 (2016).
59. Sekine, S. et al. Comprehensive characterization of RSPO fusions in colorectal traditional serrated adenomas. Histopathology 71, 601–609 (2017).

60. Levine, M. E., Lu, A. T., Bennett, D. A. & Horvath, S. Epigenetic age of the pre-frontal cortex is associated with neuritic plaques, amyloid load, and Alzheimer’s disease related cognitive functioning. Aging (Albany NY) 7, 1198–1211 (2015).
61. Perna, L. et al. Epigenetic age acceleration predicts cancer, cardiovascular, and all-cause mortality in a German case cohort. Clin Epigenet 8, 64 (2016).
62. Stubbs, T. M. et al. Multi-tissue DNA methylation age predictor in mouse. Genome Biology 18, 68 (2017).
63. Meer, M. V., Podolskiy, D. I., Tyshkovskiy, A. & Gladyshev, V. N. A whole lifespan mouse multi-tissue DNA methylation clock. eLife 7, e40675 (2018).
64. Feng, Y. et al. Mutant Kras Promotes Hyperplasia and Alters Differentiation in the Colon Epithelium But Does Not Expand the Presumptive Stem Cell Pool. Gastroenterology 141, 1003–1013.e10 (2011).
65. Kane, A. M. et al. Alterations in signaling pathways that accompany spontaneous transition to malignancy in a mouse model of BRAF mutant microsatellite stable colorectal cancer. Neoplasia 22, 120–128 (2020).


Appendix 1 – Epigenetic regulator genes interrogated in Thesis Chapter 1

A1CF ARID4B BCOR ACTB ARNTL BCORL1 ACTL6A ARRB1 BMI1 ACTL6B ASF1A BPTF ACTR3B ASF1B BRCA1 ACTR5 ASH1L BRCA2 ACTR6 ASH2L BRCC3 ACTR8 ASXL1 BRD1 ADNP ASXL2 BRD2 AEBP2 ASXL3 BRD3 AICDA ATAD2 BRD4 AIRE ATAD2B BRD7 ALKBH1 ATF2 BRD8 ANKRD32 ATF7IP BRD9 ANP32A ATM BRDT ANP32B ATN1 BRE ANP32E ATR BRMS1 APBB1 ATRX BRMS1L APEX1 ATXN7 BRPF1 APOBEC1 ATXN7L3 BRPF3 APOBEC2 AURKA BRWD1 APOBEC3A AURKB BRWD3 APOBEC3B AURKC BUB1 APOBEC3C BABAM1 C11orf30 APOBEC3D BAHD1 C14orf169 APOBEC3F BANP C17orf49 APOBEC3G BAP1 CARM1 APOBEC3H BARD1 CBX1 ARID1A BAZ1A CBX2 ARID1B BAZ1B CBX3 ARID2 BAZ2A CBX4 ARID4A BAZ2B CBX5


CBX6 CHTOP DNMT3B CBX7 CHUK DNMT3L CBX8 CIR1 DNTTIP2 CCDC101 CIT DOT1L CDC6 CLNS1A DPF1 CDC73 CLOCK DPF2 CDK1 CRB2 DPF3 CDK17 CREBBP DPPA3 CDK2 CSNK2A1 DPY30 CDK3 CSRP2BP DR1 CDK5 CTBP1 DTX3L CDK7 CTBP2 DZIP3 CDK9 CTCF E2F6 CDY1 CTCFL EED CDY1B CTR9 EHMT1 CDY2A CUL1 EHMT2 CDY2B CUL2 EID1 CDYL CUL3 EID2 CDYL2 CUL4A EID2B CECR2 CUL4B ELP2 CENPC CUL5 ELP3 CHAF1A CXXC1 ELP4 CHAF1B DAPK3 ELP5 CHD1 DAXX ELP6 CHD1L DDB1 ENY2 CHD2 DDB2 EP300 CHD3 DDX21 EP400 CHD4 DDX50 EPC1 CHD5 DEK EPC2 CHD6 DMAP1 ERBB4 CHD7 DNAJC1 ERCC6 CHD8 DNAJC2 EXOSC1 CHD9 DND1 EXOSC2 CHEK1 DNMT1 EXOSC3 CHRAC1 DNMT3A EXOSC4

EXOSC5 HAT1 HUWE1 EXOSC6 HCFC1 IKBKAP EXOSC7 HCFC2 IKZF1 EXOSC8 HDAC1 IKZF3 EXOSC9 HDAC10 ING1 EYA1 HDAC11 ING2 EYA2 HDAC2 ING3 EYA3 HDAC3 ING4 EYA4 HDAC4 ING5 EZH1 HDAC5 INO80 EZH2 HDAC6 INO80B FAM175A HDAC7 INO80C FAM175B HDAC8 INO80D FBL HDAC9 INO80E FBRS HDGF JADE1 FBRSL1 HELLS JADE2 FOXA1 HIF1AN JADE3 FOXO1 HINFP JAK2 FOXP1 HIRA JARID2 FOXP2 HIRIP3 JDP2 FOXP3 HJURP JMJD1C FOXP4 HLCS JMJD6 GADD45A HLTF KANSL1 GADD45B HMG20A KANSL2 GADD45G HMG20B KANSL3 GATAD1 HMGB1 KAT2A GATAD2A HMGN1 KAT2B GATAD2B HMGN2 KAT5 GFI1 HMGN3 KAT6A GFI1B HMGN4 KAT6B GLYR1 HMGN5 KAT7 GSE1 HP1BP3 KAT8 GSG2 HR KDM1A GTF2I HSPA1A KDM1B GTF3C4 HSPA1B KDM2A

KDM2B MBD1 MYO1C KDM3A MBD2 MYSM1 KDM3B MBD3 NAA60 KDM4A MBD4 NAP1L1 KDM4B MBD5 NAP1L2 KDM4C MBD6 NAP1L4 KDM4D MBIP NASP KDM4E MBTD1 NAT10 KDM5A MCRS1 NBN KDM5B MDC1 NCL KDM5C MEAF6 NCOA1 KDM5D MECP2 NCOA2 KDM6A MEN1 NCOA3 KDM6B MGA NCOA6 KDM7A MGEA5 NCOR1 KDM8 MINA NCOR2 KEAP1 MLLT1 NEK6 KMT2A MLLT10 NEK9 KMT2B MLLT6 NFRKB KMT2C MORF4L1 NFYB KMT2D MORF4L2 NFYC KMT2E MOV10 NIPBL L3MBTL1 MPHOSPH8 NOC2L L3MBTL2 MRGBP NPAS2 L3MBTL3 MSH6 NPM1 L3MBTL4 MSL1 NPM2 LAS1L MSL2 NSD1 LBR MSL3 NSL1 LEO1 MST1 OGT LRWD1 MTA1 PADI1 MAP3K7 MTA2 PADI2 MAPKAPK3 MTA3 PADI3 MASTL MTF2 PADI4 MAX MUM1 PAF1 MAZ MYBBP1A PAGR1

PAK2 POLE3 PRMT8 PARG PPARGC1A PRMT9 PARP1 PPM1G PRPF31 PARP2 PPP2CA PRR14 PARP3 PPP4C PSIP1 PAXIP1 PPP4R2 RAD51 PBK PRDM1 RAD54B PBRM1 PRDM11 RAD54L PCGF1 PRDM12 RAD54L2 PCGF2 PRDM13 RAG1 PCGF3 PRDM14 RAG2 PCGF5 PRDM16 RAI1 PCGF6 PRDM2 RARA PCNA PRDM4 RB1 PDP1 PRDM5 RBBP4 PELP1 PRDM6 RBBP5 PHC1 PRDM7 RBBP7 PHC2 PRDM8 RBX1 PHC3 PRDM9 RCC1 PHF1 PRKAA1 RCOR1 PHF10 PRKAA2 RCOR3 PHF12 PRKAB1 REST PHF13 PRKAB2 RING1 PHF14 PRKAG1 RLIM PHF19 PRKAG2 RMI1 PHF2 PRKAG3 RNF168 PHF20 PRKCA RNF2 PHF20L1 PRKCB RNF20 PHF21A PRKCD RNF40 PHF8 PRKDC RNF8 PHIP PRMT1 RPS6KA3 PIWIL4 PRMT2 RPS6KA4 PKM PRMT5 RPS6KA5 PKN1 PRMT6 RRP8 POGZ PRMT7 RSF1

RUVBL1 SIN3B SS18L2 RUVBL2 SIRT1 SSRP1 RYBP SIRT2 STK4 SAFB SIRT6 SUDS3 SAP130 SIRT7 SUPT16H SAP18 SKP1 SUPT3H SAP25 SMARCA1 SUPT6H SAP30 SMARCA2 SUPT7L SAP30L SMARCA4 SUV39H1 SATB1 SMARCA5 SUV39H2 SATB2 SMARCAD1 SUV420H1 SCMH1 SMARCAL1 SUV420H2 SCML2 SMARCB1 SUZ12 SCML4 SMARCC1 SYNCRIP SENP1 SMARCC2 TADA1 SENP3 SMARCD1 TADA2A SET SMARCD2 TADA2B SETD1A SMARCD3 TADA3 SETD1B SMARCE1 TAF1 SETD2 SMEK1 TAF10 SETD3 SMEK2 TAF12 SETD5 SMYD1 TAF1L SETD6 SMYD2 TAF2 SETD7 SMYD3 TAF3 SETD8 SMYD4 TAF4 SETDB1 SNAI2 TAF5 SETDB2 SP1 TAF5L SETMAR SP100 TAF6 SF3B1 SP140 TAF6L SF3B3 SPEN TAF7 SFMBT1 SPOP TAF8 SFMBT2 SRCAP TAF9 SFPQ SRSF1 TAF9B SHPRH SRSF3 TBL1XR1 SIN3A SS18L1 TDG

TDRD3 UBE2D3 WDR82 TDRD7 UBE2E1 WHSC1 TDRKH UBE2H WHSC1L1 TET1 UBE2N WSB2 TET2 UBE2T YAF2 TET3 UBN1 YEATS2 TEX10 UBR2 YEATS4 TFDP1 UBR5 YWHAB TFPT UBR7 YWHAE TLE1 UCHL5 YWHAZ TLE2 UHRF1 YY1 TLE4 UHRF2 ZBTB16 TLK1 UIMC1 ZBTB33 TLK2 USP11 ZBTB7C TNP1 USP12 ZCWPW1 TNP2 USP15 ZFP57 TONSL USP16 ZGPAT TOP2A USP17L2 ZHX1 TOP2B USP21 ZMYM2 TP53 USP22 ZMYM3 TP53BP1 USP3 ZMYND11 TRIM16 USP36 ZMYND8 TRIM24 USP44 ZNF217 TRIM27 USP46 ZNF516 TRIM28 USP49 ZNF532 TRIM33 USP7 ZNF541 TRRAP UTY ZNF592 TSSK6 VDR ZNF687 TTK VPS72 ZNF711 TYW5 VRK1 ZNHIT1 UBE2A WAC ZRANB3 UBE2B WDR5 ZZZ3 UBE2D1 WDR77
