Dedicated to my father 1952 – 2013

Abstract

While the aetiology and pathogenesis of cancer are variable and dependent on the initial cellular and carcinogenic environment, all manifestations of the disease are unified by a gross dysfunction of normal epigenetic control. Recent technological advances have revealed epigenomic disruption of large domains that encompass multiple genes, as well as a deregulation of spatial and temporal control of DNA in the nucleus. Here I consolidate these findings by identifying large genomic domains that are epigenetically and transcriptionally activated in prostate tumourigenesis, a phenomenon we termed Long Range Epigenetic Activation (LREA). These regions contain oncogenes, miRNAs and multiple prostate cancer biomarkers, such as prostate cancer antigen 3 (PCA3) and the prostate specific antigen (PSA). LREA regions are characterised by an increase of histone modifications H3K4me3 and H3K9ac with a simultaneous depletion of H3K27me3 at promoters. While I found little evidence of CpG island hypomethylation causing gene activation, I identified hypermethylation of CpG islands associated with gene activation and differential promoter usage in prostate cancer.

The presence of both epigenetically activated and repressed domains in prostate cancer indicates the deregulation of higher-order, “long-range” processes such as chromatin looping or the timing of DNA replication. Using the Kallikrein gene family locus, I describe the presence of chromatin loops anchored by the CTCF protein that commonly demarcate the limits of this cancer-specific regional epigenetic modulation. To investigate how replication timing can influence the cancer epigenome, I optimised and carried out the high-resolution “Repli-seq” technique, which details the time of replication for all genomic loci. I show that epigenetic gene activation and repression, at both individual genes and domains, is associated with a shift in replication timing between normal and cancer cells. I identify the presence of a cancer-specific signature of replication timing, as well as a replication timing shift at the common prostate oncogene, ERG. Genome-wide hypomethylation, which commonly affects cancer DNA, is shown to occur exclusively at DNA that replicates late in both normal and cancer cells.

Together, the work presented here provides a significant and novel contribution to understanding the nature of epigenetic regulation and the consequences of its deregulation in carcinogenesis.

Acknowledgements

To begin, I would like to thank my supervisor Prof Sue Clark. The entirety of the research this thesis represents, from its conception and realisation to its documentation here, exists due to her unfailing support and guidance. Moreover, while it may have indulged my idle temperament, her friendship and collegiality are amongst my most valued memories of the Garvan. I would also like to thank my co-supervisor Dr Clare Stirzaker, for always taking the time to care for me, even when I didn’t realise I needed it.

I am grateful to the Garvan Institute for creating such a unique and welcoming research environment for young scientists, and Dr Alessandra Bray for her many efforts to mitigate the consequences of my failing to properly read guidelines. The research presented here relied on the expertise and generosity of many people at the Garvan. Specifically, I would like to thank: Dr Marcel Coolen and Dr Liz Caldon for their assistance in experimental design; Dr Mark Robinson, Dr Nicola Armstrong and Dario Strbenac for their critical bioinformatics support and analyses, as well as their patience; Jenny Song, Dr Fatima Valdes and Dr Phillippa Taberlay for all of their experimental data which contributed to this thesis.

I would like to acknowledge all of the Clark lab, past and present, for creating a welcoming and stimulating environment away from home. To all my fellow PhD students, I felt grateful and elevated by your company. I would like to specifically thank Dr Warick Locke for his undying and beloved enmity, and Zena Kassir for her treasured and undeserved friendship. It would be remiss not to thank Aaron Statham, for his contributions to this thesis, which span bioinformatics support, experimental guidance and youthful mentorship, and for his valued friendship during my years at the Garvan.

I am thankful to my friends, for their generous hearts and continual affections that my many and varied flaws should never have warranted. I am most grateful to Giselle and Aurélie, who honoured me with their company and whose love and understanding I will always cherish.

I would like to thank my brother for his life-long rivalry and for his persisting successes that have always spurred me to higher ground. I would like to thank my mother for her unqualified love, for giving me the audacity to never shy from myself or from life’s challenges. Finally, I would like to thank my father, for his unshakable belief in a rational universe, in spite of all the evidence that can be borne; for his grace and quiet insight; and for his unwavering support of all my studies, even when they mattered the least.

Financial Support and Publications

I gratefully acknowledge the following scholarships, prizes and financial support received during my PhD:

 2009 - 2012: Australian Postgraduate Award, University of New South Wales  2009 – 2012: Rising Star Award, Faculty of Medicine, University of New South Wales  2009 – 2011: Cancer Institute NSW Research Scholar Award  2009: The 19th St Vincent’s and Mater Health Sydney Research Symposium Oral Presentation Prize  2009 – 2012: The Garvan Institute Postgraduate Supplementary Scholarship

Publications:

• Bert, S.A., Robinson, M.D., Strbenac, D., Statham, A.L., Song, J.Z., Hulf, T., Sutherland, R.L., Coolen, M.W., Stirzaker, C., and Clark, S.J. (2013). Regional activation of the cancer genome by long-range epigenetic remodeling. Cancer Cell 23, 9-22.

Conference Publications:

• Bert, S.A., Robinson, M.D., Strbenac, D., Statham, A.L., Song, J.Z., Coolen, M.W., Stirzaker, C., and Clark, S.J. Gene Activation Occurs In Epigenetically Controlled Domains In Cancer, presented at:
    • 19th St Vincent’s and Mater Health Sydney Research Symposium, Sydney, Australia 2009. Oral presentation.
    • 4th Asian Epigenomics Meeting, Genome Institute Singapore, Singapore 2009. Poster presentation.
    • Lorne Genome Conference, Lorne, Australia 2010. Poster presentation.
    • 5th PACRIM Breast and Prostate Cancer Meeting, Gold Coast, Australia 2011. Poster presentation.

CONTENTS

CHAPTER 1: Introduction
1.1 The regulated genome
1.2 The epigenetic code
1.2.1 Histone modifications
1.2.2 DNA methylation
1.2.2.1 CpG Island Methylation
1.2.2.2 Low CpG Density Methylation
1.3 Domains of regulation
1.3.1 Domains of sequence composition
1.3.2 Histone modification domains
1.3.3 CpG methylation domains
1.3.4 Nuclear domains
1.3.4.1 Lamina associated domains
1.3.4.2 Active domains
1.3.4.3 Globular structure of the genome
1.3.5 DNA replication domains
1.4 Cancer, the diseased genome
1.4.1 DNA methylation in tumourigenesis
1.4.2 Chromatin remodelling in tumourigenesis
1.4.2.1 Histone methylation and cancer
1.4.2.2 Histone acetylation and cancer
1.4.3 Domain level epigenomic remodelling in cancer
1.5 Justification & Aims

CHAPTER 2: Methods
2.1 Cell lines and culture
2.1.1 LNCaP cell line: Lymph Node Cancer of the Prostate
2.1.2 PrEC cell line: Prostate Epithelial Cells
2.2 Nucleic acid extraction
2.2.1 DNA extraction
2.2.2 RNA extraction
2.3 Nucleic acid quantification
2.3.1 Spectrophotometric quantification
2.3.2 Fluorometric quantification
2.4 Nucleic acid treatments
2.4.1 Bisulphite conversion
2.4.2 cDNA synthesis
2.5 DNA purification
2.5.1 Phenol chloroform / ethanol purification
2.5.2 Promega Wizard purification kits
2.6 PCR based techniques
2.6.1 PCR
2.6.2 Bisulphite PCR
2.6.3 Quantitative Real-Time PCR
2.7 Chromatin immunoprecipitation
2.7.1 Fixation and sonication
2.7.2 Immunoprecipitation
2.7.3 Chromatin Immunoprecipitation Sequencing (ChIP-Seq)
2.8 Whole genome methylation studies
2.8.1 MBD2 Capture Sequencing (MBDCap-Seq)
2.8.2 Infinium Human Methylation450 BeadChips (450K arrays)
2.8.3 Methylated DNA immunoprecipitation (MeDIP)
2.9 Plasmid preparation and single molecule sequencing
2.9.1 Mini-prep plasmid preparation
2.9.2 Capillary separation sequencing

CHAPTER 3: Long Range Epigenetic Activation
3.1 Introduction
3.2 Aim
3.3 Methods
3.3.1 Expression arrays
3.3.2 ChIP on chip
3.3.3 Bioinformatics
3.3.3.1 Identification of transcriptionally activated regions
3.3.3.2 Oncomine data mining
3.3.3.3 Heat maps
3.3.3.4 Copy number analysis
3.3.3.5 Pairs analysis
3.3.3.6 Significance plots
3.3.3.7 Genomic feature density
3.4 Results
3.4.1 Identifying activated domains in prostate cancer
3.4.2 Validation of activated regions
3.4.3 Activated domains harbour cancer related genes
3.4.4 Epigenomic analyses of activated domains
3.4.5 Activated domains are epigenetically deregulated
3.4.6 Epigenetic changes occur in unison
3.5 Discussion

CHAPTER 4: DNA Methylation and Gene Activation
4.1 Introduction
4.2 Aim
4.3 Methods
4.3.1 Clonal Bisulphite Sequencing (CBS)
4.3.2 RNA based sequencing
4.3.2.1 RNA-Seq
4.3.2.2 CAGE-Seq
4.3.3 Transcription factor binding analysis
4.4 Results
4.4.1 Validation of gene-specific hypomethylation
4.4.2 CpG island methylation in LREA domains
4.4.3 KLK4 and cancer specific hypomethylation
4.4.4 CpG island methylation and gene activation
4.4.5 PrEC cells are a model of normal CpG methylation
4.5 Discussion

CHAPTER 5: Chromatin Looping and Kallikreins
5.1 Introduction
5.2 Aim
5.3 Methods
5.3.1 3C protocol
5.3.1.1 Cell fixation and nuclei extraction
5.3.1.2 Restriction endonuclease digestion and ligation
5.3.2 Bacterial Artificial Chromosomes (BAC)
5.4 Results
5.4.1 Chromosome Conformation Capture
5.4.2 3C primer design
5.4.3 Bacterial Artificial Chromosomes (BAC)
5.4.3.1 BAC isolation and identification
5.4.3.2 3C primer validation using BACs
5.4.4 3C PCR optimisation
5.4.5 KLK 3C analysis in PrEC and LNCaP
5.4.6 KLK looping in cancer cell lines
5.5 Discussion

CHAPTER 6: Repli-Seq Optimisation
6.1 Introduction
6.2 Aim
6.3 Methods
6.3.1 BrdU pulse labelling and ethanol fixation
6.3.2 Propidium iodide staining and FACS sorting
6.3.3 BrdU immunoprecipitation
6.3.4 dsDNA Klenow extension and sonication
6.4 Results
6.4.1 Repli-Seq
6.4.2 BrdU labelling time
6.4.3 Immunoprecipitation antibody optimisation
6.4.4 qPCR Amplicon Validation
6.4.5 dsDNA reconstitution
6.4.6 Repli-Seq with LNCaP and PrEC
6.5 Discussion

CHAPTER 7: Replication Timing in Prostate Cancer
7.1 Introduction
7.2 Aim
7.3 Methods
7.3.1 Illumina sequencing and processing
7.3.2 Replication timing value processing
7.3.3 ENCODE data processing
7.3.4 Epigenetic promoter quantification
7.3.5 gNOME-Seq WGBS processing
7.3.6 K-means clustering
7.3.7 Principal Component Analysis
7.4 Results
7.4.1 Repli-Seq data and processing
7.4.2 Genome-wide visualisation of Repli-Seq data
7.4.3 Differential replication timing in cancer cells
7.4.4 Replication timing and gene expression
7.4.5 Replication timing and histone modifications
7.4.6 Replication timing and DNA methylation
7.4.6.1 DNA methylation at gene promoters
7.4.6.2 Genome-wide DNA methylation
7.4.7 Replication timing and clusters of change
7.4.8 Replication at regions of long range epigenetic reprogramming
7.4.9 Cancer specific signature of replication timing
7.4.10 Cancer genes and replication timing
7.5 Discussion

CHAPTER 8: Conclusion
8.1 Key findings
8.2 Future directions
8.3 Concluding remarks

CHAPTER 9: Appendix
9.1 Primers
9.2 Identification of LREA domains
9.3 Oncomine region validation

References

FIGURES

Figure 1.1 The nucleosome and its modifications
Figure 1.2 CpG methylation at gene loci
Figure 1.3 Domains of genomic regulation
Figure 3.1 Identification of activated domains in prostate cancer cells
Figure 3.2 Identified regions were assessed for copy number amplifications
Figure 3.3 Transcriptional activation was validated with q-RT-PCR
Figure 3.4 Oncomine prostate cancer studies validation
Figure 3.5 Genomic characteristics of activated domains
Figure 3.6 Confirmation of the LNCaP specific translocation of the ETV1 locus
Figure 3.7 Concordant epigenetic modification at the KLK locus
Figure 3.8 Generalised epigenetic changes in activated domains
Figure 3.9 Blocks of epigenetic change surrounding the TSS
Figure 3.10 Clusters of epigenetic changes in activated regions
Figure 3.11 Epigenetic change in cancer is influenced by neighbouring genes
Figure 3.12 Model of typical LREA epigenetic changes
Figure 4.1 Proposed cancer-specific DNA methylation changes
Figure 4.2 DNA methylation validation of 6 LREA genes using clonal bisulphite sequencing
Figure 4.3 CBS validation of SINE methylation in C15orf21
Figure 4.4 Methylation distribution at promoter associated CpG islands
Figure 4.5 Relationship between gene expression and DNA methylation in LREA regions
Figure 4.6 Methylation and expression of the KLK4 locus
Figure 4.7 Methylation and expression of KLK4 in clinical samples
Figure 4.8 Genome wide relationship between gene expression and CpG island methylation
Figure 4.9 Group I: border methylation of CpG islands
Figure 4.10 Group II: full methylation of CpG islands with alternate TSS
Figure 4.11 ES cell methylation comparison
Figure 4.12 Hypermethylation of CpG islands and gene activation in cancer
Figure 5.1 KLK LREA and LRES domains
Figure 5.2 Characterisation of the KLK region
Figure 5.3 Validation of CTCF ChIP-Seq
Figure 5.4 Chromosome Conformation Capture methodology
Figure 5.5 3C Bait and probe design
Figure 5.6 BAC design and maxi-prep validation
Figure 5.7 3C primer validation
Figure 5.8 3C DNA optimisation
Figure 5.9 KLK chromatin looping in PrEC and LNCaP
Figure 5.10 KLK looping in MCF7 and K562 cell lines
Figure 5.11 Model of KLK looping
Figure 6.1 Asynchronous replication
Figure 6.2 The Repli-Seq protocol
Figure 6.3 Optimisation of BrdU labelling conditions
Figure 6.4 BrdU immunoprecipitation optimisation
Figure 6.5 Validating replication timing amplicons
Figure 6.6 Optimising Klenow extension conditions
Figure 6.7 Representative LNCaP PI FACS sort
Figure 6.8 Final qPCR Validation of Repli-Seq protocol
Figure 7.1 Quantification process of Repli-Seq samples
Figure 7.2 Replicates confirm Repli-Seq reproducibility
Figure 7.3 Public Repli-Seq datasets have similar WA distributions
Figure 7.4 Replication timing in the normal and cancer prostate genome
Figure 7.5 Classifying differentially replicated loci
Figure 7.6 Visualising differentially replicated regions
Figure 7.7 Replication timing at genes
Figure 7.8 Changing replication time correlates with a change in gene expression
Figure 7.9 Chromatin changes and changing replication timing at genes
Figure 7.10 Replication timing and promoter methylation
Figure 7.11 Comparing promoter methylation, replication and expression
Figure 7.12 Changing replication time and changing promoter methylation
Figure 7.13 Whole genome methylation and time of replication
Figure 7.14 Changing methylation and the time of replication
Figure 7.15 Genomic location of replication timing and methylation
Figure 7.16 Clusters of epigenetic changes at genes with altered replication timing
Figure 7.17 Replication timing at LREA regions
Figure 7.18 Replication timing at LRES regions
Figure 7.19 Comparison of LRES and LREA average replication
Figure 7.20 PCA clustering of public Repli-Seq data
Figure 7.21 ERG replicates later and is hypomethylated in LNCaP
Figure 7.22 Model of chromatin modifications and replication timing
Figure 7.23 Model of DNA methylation and replication timing

TABLES

Table 3.1 Oncomine dataset characteristics
Table 3.2 Summary of activated regions in prostate cancer
Table 4.1 Transcription factor binding motifs identified in Group I genes
Table 5.1 KLK gene expression values
Table 6.1 Putative BrdU-IP control amplicons
Table 6.2 Summary of LNCaP and PrEC Repli-Seq experiments
Table 7.1 Publicly available Repli-Seq datasets
Table 7.2 Summary table of Repli-Seq samples
Table 7.3 Cancer genes with changed replication timing

CHAPTER 1: INTRODUCTION

CHAPTER 1 Introduction

1.1 The regulated genome

All cellular function ultimately relies on the appropriate regulation of DNA. The precise nature of this regulation is determined by the requirements of a cell in the context of an organism and its development. Generally speaking, the primary function of DNA is to be transcribed as mRNA, which in turn codes for proteins. These proteins, alongside functional RNA units, are able to supply both the structure and function of all cellular systems. The final objective of this so-termed Central Dogma of Molecular Biology is to provide an appropriate environment for DNA to be replicated and passed on to future generations, as necessitated by natural selection. This function is centrally mediated by a vast range of proteins, made possible by the incredible diversity of conformations that are adopted by these molecular structures. In order to ensure an appropriate cellular environment for replicative success, the correct array of RNA and proteins needs to be present to respond to both internal and external stimuli. These include the effectors of multi-cellular differentiation and the mechanical structures of DNA replication itself. As RNA and proteins can be functionally active, it is essential that only the DNA which codes for desired products is “switched on” while DNA that codes for unwanted products is “switched off”.

These “on” and “off” switches are part of a large and significantly unexplored network of nuclear controlling factors known as epigenetics. Epigenetics is historically ill-defined and much debated, but has come, non-exclusively, to refer to nuclear processes that regulate or perpetuate the function and output of DNA without altering the DNA sequence itself. Unlike the sequence of genomic DNA, which remains essentially unchanged throughout an organism’s lifespan and across all contained cell types, the epigenetic state of a cell changes upon differentiation and is cell-type specific. Traditionally, epigenetics has referred either to the directed methylation of DNA at cytosine bases, known as CpG methylation, or to chemical modifications of the histone proteins with which DNA is commonly associated. Increasingly, the term has come to include the larger spatial and temporal organisational processes of the genome that are associated with cell-type specificity. Together these epigenetic forces are able to maintain genomic stability, encourage and direct accurate DNA replication, as well as maintain appropriate transcriptional programs. This thesis concerns the consequences of disrupted epigenomic control in the cancer cell, and as such begins with a description of the relevant contributors to the normal and cancer epigenome.

1.2 The epigenetic code

1.2.1 Histone modifications

DNA is commonly found in eukaryotic cells associated with an octameric protein complex made up of 4 pairs of histone proteins, typically H2A, H2B, H3 and H4 [1]. 146 bp of DNA is wrapped around this octamer [2], in a nucleoprotein structure known as a nucleosome. Nucleosomes are separated by a variable length of linker DNA, and the nucleosomal array can then be compacted by several orders of magnitude into higher-order chromatin structures [3]. The degree of this compaction can be used to distinguish the highly condensed and silent DNA of “heterochromatin” from the more accessible and transcriptionally active “euchromatin” [4]. Constitutive heterochromatin, which is generally found at the telomeres and centromeres of chromosomes, is heavily enriched in repetitive and transposable DNA elements [5] and remains in a silent heterochromatic state throughout development. Facultative heterochromatin, by contrast, is present at genomic regions that are developmentally regulated and as such can interconvert with euchromatin with the transition to transcriptional activation [6]. The recognition and establishment of both heterochromatin and euchromatin is facilitated by chemical modifications of the N-terminal tails of the histone proteins, which protrude from the core globular domain of the nucleosome [7,8]. These chemical modifications include acetylation, methylation, ubiquitination and phosphorylation, and can broadly be defined as either “active” or “repressive” depending on their location on the histone protein (Figure 1.1).

Figure 1.1 The nucleosome and its modifications. Left: DNA wraps around 4 pairs of histone proteins to create the nucleosome, from which the N-terminal tails of the histones protrude. Right: commonly studied chemical modifications of the H3 N-terminus which confer “active” and “repressive” epigenetic properties are indicated.

Active histone modifications are those associated with accessible chromatin, active transcription or active and potentiated gene promoters and enhancers. Histone acetylation was traditionally thought to encourage transcriptional activity by negating the ionic attraction between nucleosomes and DNA, encouraging an open chromatin conformation [9,10]. More recently, acetylation of specific lysine residues within H3 and H4 has been shown to exert direct effects on transcription. Acetylation of H3 lysine 9 (H3K9ac), for example, is found exclusively at the promoter region of genes [11] and is significant in recruiting the essential basal transcription protein complex TFIID [12], while acetylation of H3 lysine 27 (H3K27ac) is a mark of active enhancer elements [13]. Trimethylation of H3 lysine 4 (H3K4me3) is a prominent modification associated with active promoters. Overlapping significantly with H3K9ac [14], this modification is generally confined to gene promoters and is instrumental in the formation of the transcriptional pre-initiation complex [15]. Highlighting the sensitivity of these histone tail modifications, monomethylation, but not trimethylation, of H3 lysine 4 (H3K4me1) is found only at active and poised gene enhancer elements [13]. In addition to H3K4me3, another common active histone methylation modification is trimethylation of H3 lysine 36 (H3K36me3), found specifically at exons and implicated in transcriptional elongation and alternative splicing [16].

While the majority of investigated histone modifications are involved in active and poised gene expression [17], methylations of histone 3 at both lysine 9 (H3K9me2) and lysine 27 (H3K27me3) are commonly implicated in genic repression. H3K9me2 is classically associated with domains of facultative heterochromatin [18] and has recently been implicated in gene silencing at the nuclear periphery [19,20]. H3K27me3, by comparison, is maintained by the Polycomb repressive complex 2 (PRC2) [21] and is essential for the transcriptional repression of proliferative and developmentally regulated genes [22]. The mechanisms by which this modification exerts a repressive effect are still being elucidated [22]; recent reports have implicated H3K27me3 in recruiting repressive complexes, inhibiting transcriptional initiation and elongation, as well as altering nucleosome dynamics [23].

Histone modifications are frequently described as acting in a combinatorial fashion, and protein complexes with multiple histone modification recognition sites have been identified [24,25]. Combinatorial histone modification “signatures” are able to distinguish between functionally similar elements of the genome, such as enhancers and promoters, at different states of activity. Enhancers, for example, can be characterised as either active or poised if marked by H3K4me1/H3K27ac or H3K4me1/H3K27me3 respectively [26,27]. Similarly, gene promoters can be simultaneously marked by the opposing modifications H3K4me3 (active) and H3K27me3 (repressive). While promoters marked simply by H3K4me3 or H3K27me3 follow an expected course of transcriptional activation and repression respectively, a significant subset of gene promoters are marked by both modifications on individual histone proteins [28]. The highest proportion of this “bivalent” pattern is observed in embryonic stem (ES) cells and marks genes that subsequently resolve to either an active or inactive state during differentiation via the loss of one of these histone marks [29]. Epigenetic modifications can additionally act combinatorially through locational exclusivity. While this exclusivity is expected and observed [17] for different modifications of the same amino acid (for example, H3K9ac and H3K9me2), antagonism is also seen between H3K27me3 and H3K36me3 [30], as well as between H3K4me3 and DNA CpG methylation [31,32].
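The combinatorial signatures described above can be summarised as a small lookup. The following Python sketch is illustrative only: the function name `classify_element`, the state labels and the representation of marks as a set of strings are assumptions made for this example, and in practice such calls depend on thresholded ChIP signal rather than simple presence or absence of a mark.

```python
def classify_element(kind, marks):
    """Return a coarse activity state for a promoter or enhancer.

    kind  : "promoter" or "enhancer"
    marks : iterable of histone mark names detected at the element
    (Hypothetical helper; labels follow the signatures named in the text.)
    """
    marks = set(marks)
    if kind == "enhancer":
        # Both enhancer states share H3K4me1; the second mark decides.
        if "H3K4me1" not in marks:
            return "unclassified"
        if "H3K27ac" in marks:
            return "active"
        if "H3K27me3" in marks:
            return "poised"
        return "unclassified"
    if kind == "promoter":
        active = "H3K4me3" in marks
        repressed = "H3K27me3" in marks
        if active and repressed:
            return "bivalent"  # both marks present together
        if active:
            return "active"
        if repressed:
            return "repressed"
        return "unclassified"
    raise ValueError(f"unknown element kind: {kind!r}")


if __name__ == "__main__":
    print(classify_element("promoter", ["H3K4me3", "H3K27me3"]))
    print(classify_element("enhancer", ["H3K4me1", "H3K27ac"]))
```

If an enhancer carries both H3K27ac and H3K27me3 in a mixed cell population, this sketch reports it as active; that precedence is an arbitrary choice for the example.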

Figure 1.2 CpG methylation at gene loci. Upper: CpG methylation patterns found at intergenic and genic regions in healthy cells, with exonic and intronic regions indicated. Lower: promoter associated CpG islands are typically unmethylated in healthy cells. These islands can become hypermethylated in tumourigenesis, associated with gene repression.

1.2.2 DNA methylation

DNA methylation refers to the addition of a methyl group to cytosine nucleotides. In eukaryotes, this occurs predominantly in the context of cytosines which are followed by a guanine (CpG), but has additionally been observed in non-CpG contexts in embryonic stem cells [33]. While CpG methylation is perhaps the most widely studied epigenetic modification, its precise function in the nucleus remains controversial. As methylated cytosines can undergo spontaneous deamination, giving rise to a substituted thymine nucleotide, CpG dinucleotides are strikingly under-represented throughout most of the genome [34]. This general CpG scarcity is contrasted by the presence of CpG islands, which are regions of the genome containing the expected distribution of CpG sites, often found at gene promoters [34,35]. In a normal cellular environment, promoter associated CpG islands are almost always completely unmethylated [33,36] (Figure 1.2); this is in contrast to intragenic CpG islands, of which approximately 1/5 are methylated [36]. CpG-poor intergenic regions by comparison are typically very highly methylated. Intragenic methylation, or “gene-body” methylation, is known to increase with the level of gene expression [33], with methylation being higher in exons compared to introns [37] (Figure 1.2). Aberrations of methylation deposition during tumourigenesis are widely studied. In cancer cells, CpG-poor regions are known to undergo hypomethylation [38,39] and CpG islands can become hypermethylated [40], a phenotype associated with gene silencing [41] (Figure 1.2). The details of these changes, as well as other cancer associated epigenetic deviations, will be discussed later in this chapter.
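The under-representation of CpG dinucleotides is commonly expressed as an observed-to-expected ratio, where the expected count assumes that C and G occur independently. The Python sketch below illustrates the idea; the function name `cpg_obs_exp` is hypothetical, and the formula used (number of CpG × sequence length / (number of C × number of G)) is the widely used Gardiner-Garden and Frommer definition, which is an assumption here since the text above does not state how islands were defined.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio for a DNA sequence.

    Assumed formula: (#CpG * length) / (#C * #G).
    Returns 0.0 when the sequence has no C or no G, since the
    expected count is then zero and the ratio is undefined.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c == 0 or g == 0:
        return 0.0
    return cpg * n / (c * g)


if __name__ == "__main__":
    # Toy sequences only; real analyses use windows of hundreds of bases.
    for s in ("CCGG", "CATG", "CGCGCGCG"):
        print(s, cpg_obs_exp(s))
```

A ratio close to 1 corresponds to the "expected distribution" of CpG sites described for CpG islands, whereas values well below 1 reflect the depletion seen across most of the genome.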

1.2.2.1 CpG Island Methylation

The exact function of DNA methylation is still being determined and is known to vary depending on genomic context. CpG islands are a conspicuous genomic element, being found at 70% of annotated gene promoters, a set that encompasses approximately half of all mapped islands [42]. While there is an apparent genomic distinction between promoter and non-promoter CpG islands, recent research has suggested that many (if not all) non-promoter islands can also be sites of transcriptional initiation [43]. Unmethylated CpG islands are prone to spontaneous deposition of the transcriptionally active H3K4me3 histone mark [44,45], providing a scaffold for further recruitment of the transcriptional complex. As H3K4me3 and DNA methylation are reported to be mutually exclusive at CpG islands [46,47], the enduring presence of DNA methylation would inhibit the ability of promoters to acquire an active histone conformation. This relationship between promoter associated CpG island methylation and gene silencing has been extensively documented in both cancer cells [41,48] and in normal cellular differentiation [49,50]. While DNA methylation of CpG islands can directly preclude transcription, its normal function is more likely to be the maintenance of a repressive state rather than its primary initiation [51]. CpG islands found intergenically, and not at promoters or intragenically, exhibit the highest level of methylation in normal somatic cells [52]. This is likely a silencing mechanism for enhancers, intergenic ncRNA transcripts [53], and alternative promoters and their subsequent mRNA isoforms [36].

1.2.2.2 Low CpG Density Methylation

In contrast to CpG islands, methylation in CpG-poor areas of the genome does not appear to function homogeneously. Methylation at pericentromeric repeat elements has been implicated in centromere stability [54,55], and is known to be disrupted in the Immunodeficiency, Centromere instability and Facial anomalies (ICF) syndrome [56,57]. In the context of retrotransposons and retroviral elements that are widespread in the genome, DNA methylation is thought to silence these parasitic regions during gametogenesis [58-61], although the mechanisms are still unknown. In contrast to these functions, and as alluded to previously, DNA methylation in CpG-poor gene-body regions of the genome associates with transcriptional activity. While the function of this relationship is still being elucidated, higher methylation levels in exons compared to introns, as well as the association between DNA methylation and H3K36me3 [62], have suggested a role in the regulation of splice junctions [37,63]. The relationship between gene-body methylation and the degree of transcription is most pronounced in cells that are highly mitotic, whereas slow dividing cells exhibit comparable methylation between high and low expressing genes [64]. This has led to speculation that the loss of methylation is a passive process encouraged by cell division in repressed regions of the genome.

1.3 Domains of regulation

The genome and epigenome are often considered in the context of individual functional units such as genes, promoters, enhancers and retrotransposons. There is, however, an increasing wealth of research that describes the genome as a developmentally regulated collection of large domains that can co-ordinately regulate or influence genomic transcription and stability. This concept of discrete genomic domains first coincided with the identification of cytogenetic staining bands65; the differential uptake of staining dyes over large regions of the genome implicated variations in the composition or conformation of DNA. This early recognition has been expanded to include domains of sequence composition, epigenetic character, nuclear structure associations, replication timing and physical interconnectivity. Importantly, the regulation of all of these aspects appears to be cooperative, ensuring the stability and propagation of a defined nuclear state66.

Figure 1.3 Domains of genomic regulation. Repressive and permissive compartments refer to the two self-interacting aspects of the nucleus identified by Hi-C. Domains are described separately and in exclusive compartments to illustrate general trends rather than absolute relationships. A) DNA associated with the lamin polymer network at the nuclear periphery is epigenetically repressed and known as lamina associated domains (LADs). B) Large organised chromatin K9 domains (LOCKs) are gene poor and transcriptionally repressed regions of the genome that are homogeneously marked with H3K9me2 modifications. C) A/T isochores are long stretches of DNA (>200 kb) that have predominantly A and T nucleotide content and are generally gene depleted. D) Partially methylated domains (PMDs) have an average CpG methylation of less than 70% and cover ~40% of the genome. E) Multiple and distinct gene loci loop into protein dense foci that are enriched with surface RNA polymerase II, known as transcription factories, from where nascent RNA is actively transcribed. F) G/C isochores are long stretches of DNA that are composed mainly of G and C nucleotides, commonly found in active regions of the genome. G) The permissive compartment of the genome is enriched with CpG islands and gene loci. A large range of transcriptional output is observed from contained genes.

1.3.1 Domains of sequence composition

Even without interrogating the physical and temporal nature of genomic organisation, it is readily apparent that different sequences, both coding and non-coding, align to different genomic domains. Domains of different nucleotide content were identified by caesium sulphate ultracentrifugation long before the genome was sequenced67. These domains were termed isochores, and reflected long DNA stretches (>200 kb) that contained relatively homogeneous quantities of A/T or C/G nucleotides (Figure 1.3.C,F). G/C rich isochores were found to have a higher density of genes68, CpG islands69 and SINE retrotransposons, while being depleted of LINE retrotransposons70. Additionally, genes in these high G/C isochores were identified as having higher levels of transcription as well as significantly shorter introns71. Together this evidence was suggestive of large genomic domains that were discernible not only on the basis of gene and nucleotide content, but also on their regulatory potential in the nucleus.
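
As an illustration of how sequence composition can be summarised at the domain scale, the following minimal sketch computes the G/C fraction in non-overlapping windows of a sequence. It is not the ultracentrifugation approach described above, and the function names, the window size and the 0.41 cut-off used to label a window are assumptions made for this example only.

```python
def gc_fraction_windows(sequence, window):
    """Return (start, gc_fraction) for each full, non-overlapping window.

    Bases other than A, C, G and T (for example N) are ignored when
    computing the fraction; windows with no called bases are skipped.
    """
    results = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window].upper()
        called = sum(chunk.count(base) for base in "ACGT")
        if called == 0:
            continue
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / called))
    return results


def label_window(gc_fraction, cutoff=0.41):
    """Label a window as G/C rich or A/T rich.

    The cutoff is a hypothetical value chosen for illustration; it is
    not taken from the text above.
    """
    return "GC-rich" if gc_fraction >= cutoff else "AT-rich"


# Toy usage with an 8 bp window; real isochores span >200 kb.
toy = "GGCCGGCC" + "ATATATAT"
windows = gc_fraction_windows(toy, window=8)
labels = [label_window(fraction) for _, fraction in windows]
```

On genomic data the window would be set to the hundreds-of-kilobase scale, and adjacent windows with the same label could be merged to approximate isochore-like domains.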

1.3.2 Histone modification domains

Large scale epigenetic regulation of genomic loci was first recognised at the tightly developmentally regulated Hox clusters. The polycomb and trithorax protein complexes, which are responsible for the deposition of H3K27me3 and H3K4me3 respectively, were in fact first identified for their role in regulating Hox gene expression in Drosophila melanogaster72,73. In vertebrates, developmentally determined large scale enrichment of both of these epigenetic modifications encompasses several genes in these essential clusters74,75. The significance of this domain organisation was underscored by an investigation showing that an artificially “split” Hox cluster was inappropriately regulated74. In addition to domains at Hox loci, large “BLOCs” (broad local enrichments) of the repressive H3K27me3 modification have been identified throughout the mouse genome at gene and SINE rich regions76. These domains of H3K27me3 are known to increase in size during differentiation and selectively encompass developmental genes62. Heterochromatic histone modifications H3K9me2 and H3K9me3 have shown similar domain expansion during differentiation62,77. These domains, termed LOCKs (large organised chromatin K9 modifications), can cover up to 46% of the genome and are found in gene poor genomic regions (Figure 1.3.B). Genes found within LOCKs are generally transcriptionally repressed, and this inactivity, as well as the formation of LOCKs themselves, can be abrogated by knockdown of the G9a histone methyltransferase20,77.

1.3.3 CpG methylation domains

The recent advent of whole genome bisulphite sequencing has facilitated the appreciation of large scale DNA methylation domains in the genome33. These investigations revealed the presence of partially methylated domains (PMDs, Figure 1.3.D) in differentiated cells, defined as contiguous regions of the genome where the average CpG methylation was less than 70%. These PMDs have a mean size of 153 kb, cover 38% of each chromosome and are generally fully methylated in ES cells. Of particular note is that in differentiated cells that were induced into a pluripotent stem cell state (iPSCs), PMDs became fully methylated78, indicating the importance of this methylation phenotype in pluripotency. While genes found in PMDs are generally lowly expressed, they are also more likely to become repressed in the cancer state, accompanied by widespread low CpG density hypomethylation79. These regions were also heavily enriched with heterochromatic LOCKs80, and while no studies have looked specifically at PMDs and isochores, the lowest G/C isochores are known to have reduced methylation81. Although PMDs are involved in developmental and pathological processes, there is a scarcity of studies addressing their formation, regulation and functional significance.
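
The PMD definition given above (contiguous regions with an average CpG methylation below 70%) can be sketched as a simple windowed calculation. This is only an illustrative outline and not the procedure used in the cited studies: the 10 kb window, the function name `call_pmds` and the rule of merging directly adjacent low-methylation windows are assumptions made for this example.

```python
def call_pmds(positions, methylation, window=10_000, threshold=0.70):
    """Call candidate partially methylated domains on one chromosome.

    positions   : CpG coordinates (integers)
    methylation : methylation fraction (0 to 1) for each CpG
    window      : bin size in bp (hypothetical choice for illustration)
    threshold   : mean methylation below which a bin is called partial

    Returns a list of (start, end) intervals built by merging directly
    adjacent bins whose mean CpG methylation is below the threshold.
    Bins without any CpG interrupt a domain.
    """
    bins = {}
    for pos, level in zip(positions, methylation):
        bins.setdefault(pos // window, []).append(level)

    domains = []
    for index in sorted(bins):
        mean_level = sum(bins[index]) / len(bins[index])
        if mean_level >= threshold:
            continue
        start, end = index * window, (index + 1) * window
        if domains and domains[-1][1] == start:
            # Extend the previous domain when the bins are contiguous.
            domains[-1] = (domains[-1][0], end)
        else:
            domains.append((start, end))
    return domains


# Toy usage: two adjacent low-methylation bins flanked by methylated bins.
cpg_positions = [100, 5_000, 12_000, 18_000, 25_000, 31_000]
cpg_levels = [0.90, 0.95, 0.50, 0.60, 0.40, 0.90]
pmds = call_pmds(cpg_positions, cpg_levels)
```

In practice a minimum domain length and a minimum CpG count per window would likely be needed to avoid calling short, sparsely covered regions.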

1.3.4 Nuclear domains

The discovery that telomeres would preferentially associate with the nuclear periphery occurred early in cytological experimentation82 and first established the concept of nuclear localisation being significant in chromatin organisation. These observations were bolstered over a century later by experiments showing the preference for the nuclear interior exhibited by active genetic loci83. It is now broadly recognised that while heterochromatic, gene poor and A/T rich loci are generally peripherally localised, the genetically dense and G/C rich euchromatic DNA is predominantly found towards the nuclear interior84,85 (Figure 1.3.G).

1.3.4.1 Lamina associated domains

The nuclear envelope is the structure that separates the cytoplasm from the nucleus. On the inside face of this envelope is the nuclear lamina, a fibrillar network of lamin polymers and associated proteins which is known to have functions in signal transduction as well as chromatin organisation86. Mutations that lead to disrupted Lamin A/C formation can cause premature ageing in a syndrome known as Hutchinson-Gilford Progeria Syndrome. This syndrome is notable for its loss of peripherally localised heterochromatin87 as well as reduced levels of H3K27me388. The lamina network can bind both directly89 and indirectly90 to histone proteins, and is considered to be a physical scaffold for perinuclear chromatin (Figure 1.3.A). Genome-wide mapping of so-termed lamina-associated domains (LADs) has revealed that these regions cover ~40% of the genome. LADs were characterised as gene and CpG island depleted, and enriched in repressive chromatin modifications H3K27me3 and H3K9me219,91. As gene deserts are predominantly associated with LADs, it was proposed that these genomic regions may function to provide a defined nuclear localisation for proximate genic domains19. Interestingly, these domains also significantly coincided with the previously described PMDs92. While LADs are often gene poor, they can contain genes that are repositioned to or from the lamina during development with expected shifts in transcription93. The mechanistic events which lead to heterochromatin formation and transcriptional silencing at the nuclear periphery are still being defined, but the repressed state is likely maintained by positive feedback between localisation and histone modification. Specifically, transcriptional repression was observed when genic loci were artificially tethered to the nuclear periphery94, and in a separate study, loci enriched with H3K9me2 were more likely to spontaneously associate with the nuclear lamina95.

1.3.4.2 Active domains

While inactive heterochromatic genomic domains interact with the nuclear lamina, the spatial confinement of active domains is not as well documented. Euchromatic DNA is generally centrally located within a nucleus96 and this is associated with H3K9ac marked chromatin97. While there is a strong relationship between gene density and central localisation84, broad radial positioning is known to vary during differentiation and development with associated changes in gene expression98,99. This understanding is complemented by the presence of “transcription factories”, foci in the nucleus characterised by nascent transcription and a high enrichment of transcription factors and active RNA polymerase II (RNAPII)100 (Figure 1.3.E). At these factories, loci from distinct parts of the linear genome can share a common nuclear location and be co-ordinately transcribed, a quality potentially mediated by CpG islands at gene promoters101. Interestingly, a study has demonstrated that even in the absence of RNAPII these foci can still form, suggesting that active transcription is not required to direct this sub-nuclear localisation102.

1.3.4.3 Globular structure of the genome

Recent technologies based around the “Chromosome Conformation Capture” (3C) technique quantify the physical interactions between any two genomic loci. While 3C, which will be discussed further in Chapter 5, investigates specific genomic interactions at a high resolution, techniques such as Hi-C103 and ChIA-PET104 have resolved a broad genome-wide perspective of 3D nuclear topology. Previously, individual chromosomes had been ascribed distinct locations in the nucleus known as chromosome territories, which were thought to contain independent units of transcriptional and epigenetic regulation105. Hi-C experiments have refined this understanding such that two large self-interacting compartments are described within the nucleus, and loci from one compartment do not interact with the other103 (Figure 1.3). Unsurprisingly, these compartments distinguish DNA that is highly genic and transcriptionally active from DNA which is transcriptionally repressive. Similar to LADs, LOCKs and BLOCs of H3K27me3, while these compartments correlate generally with static genomic features such as gene density, they are still prone to change through development, associated with an altered epigenetic state103. Interestingly, nuclear compartments associated with a repressive environment were shown to have tighter interactions103 and be less mobile than the active counterpart106. This finding is suggestive of a fixed repressive scaffold, like that of the nuclear lamina as previously described.

In addition to the expansive nuclear compartments described, other 3C-based technologies have identified more refined sub-territory structures. Using ChIA-PET, a genome-wide 3C technique which assesses DNA interactions mediated by a specified protein, investigators identified several looping chromatin domains bound by the insulator protein CTCF107. Both epigenetically active and repressive looping domains, marked by H3K4me2 and H3K9me3/H3K27me3 respectively, were described in ES cells. Other studies used high-sequencing depth Hi-C to characterise individual units of genomic organisation known as topologically associating domains (TADs)108. These megabase sized domains are defined as regions of the genome that are highly self-interacting with clearly delineated borders. While specific TADs were found to encompass units of heterochromatic H3K9me2, the topological structures were unaffected by knockdown of the G9a methyltransferase109. This result suggests that TADs mark boundaries of heterochromatin spread rather than being defined by existing chromatin boundaries. Interestingly, these domains retain their form through development, even with associated changes in contained chromatin structure109. This understanding, combined with the observation that several TADs can fall under larger genomic domains such as LADs108, suggests that TADs might represent basic units of genome architecture110.

1.3.5 DNA replication domains

The replication of DNA during S-phase of the cell cycle is a tightly developmentally controlled process. In a given cell type a specific genomic locus will always replicate at the same time point in S-phase, a quality known as “replication timing” which is typically characterised as either early or late. DNA replication is initiated at replication origins, sites where the chromatin opens and from which the DNA replication complex can extend bi-directionally; within mammalian nuclei, these origins are usually found at transcription start sites111. Of the many thousand replication origins present, only a fraction are active in a given cell112 or at a given time113,114. The replication timing of a locus is therefore defined by the point in S-phase at which a nearby origin is activated. While the precise mechanisms that determine when a dormant origin is activated are still being determined, a clear correlation between the level of transcription from associated transcription start sites and early replication has been observed111. Given this association, it is no surprise that early replicating loci are typically highly genic and transcriptionally active, while late replicating regions are generally gene poor and lowly transcribed115. Additionally, late replicating regions have been associated with LOCKs116,117, LADs118,119, A/T isochores120, PMDs92 as well as the aforementioned Hi-C repressive compartment116. Similar to these regions, “replication domains” of consistent early or late replication exhibit a size range of 200 kb to 2 Mb and are prone to redistribution and consolidation during differentiation115. These changes during differentiation are known to occur in concert, with regions that change in replication timing being reflected in changes to their sub-nuclear localisation115,121 as well as heterochromatin state122.
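
The early/late characterisation of replication timing and the notion of replication domains can be illustrated with a toy calculation. The sketch below assumes that read counts from an early and a late S-phase fraction are available for consecutive genomic bins, scores each bin as a log2 ratio of early to late signal, and merges neighbouring bins of the same class into domains. The two-fraction design, the pseudocount, the bin size and the function names are assumptions made for illustration and do not describe any specific published protocol.

```python
import math


def timing_score(early, late, pseudocount=1.0):
    """log2 ratio of early to late signal; positive values indicate early replication."""
    return math.log2((early + pseudocount) / (late + pseudocount))


def timing_domains(scores, bin_size):
    """Merge consecutive bins with the same early/late call into domains.

    Returns a list of (start, end, label) tuples. Bins with a score of
    exactly zero are treated as late, an arbitrary choice for this sketch.
    """
    domains = []
    for index, score in enumerate(scores):
        label = "early" if score > 0 else "late"
        start = index * bin_size
        end = start + bin_size
        if domains and domains[-1][2] == label:
            # Same call as the previous bin: extend that domain.
            domains[-1] = (domains[-1][0], end, label)
        else:
            domains.append((start, end, label))
    return domains


# Toy usage with four 50 kb bins.
early_counts = [30, 25, 5, 3]
late_counts = [10, 5, 20, 30]
scores = [timing_score(e, l) for e, l in zip(early_counts, late_counts)]
domains = timing_domains(scores, bin_size=50_000)
```

Comparing such domain calls between two cell types, bin by bin, gives a simple way to flag loci whose timing call differs between them.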

While there is a clear association between replication timing and the other nuclear domains discussed, causative relationships have remained elusive. For example, while H3K9me2 is the primary characteristic of LOCKs and is enriched in late replicating loci, a reduction in the modification via knockdown of its catalyst G9a fails to alter the time of replication20. Similarly, forced repositioning of an early replicating locus to the nuclear periphery failed to impose a late replication phenotype123. Clues about the nature of this co-regulation emerged with the development of micro-injection studies, where plasmids could be injected directly into cell nuclei at different stages of S-phase. These investigations showed that DNA that was initially marked by hypo-acetylated chromatin would become hyper-acetylated if it replicated early in S-phase, and remain hypo-acetylated if it replicated late in S-phase124,125. Conversely, separate investigations revealed that ectopic histone acetylation was sufficient to advance the replication program of late replicating loci126,127. These results indicate the presence of a positive feedback loop, whereby acetylated chromatin promotes early replication, which in turn encourages hyper-acetylation. Interestingly, as alterations of repressive histone methylation failed to cause a shift in replication timing20,127, it appears that histone acetylation specifically, rather than active chromatin generally, is causally associated with the temporal regulation of replication. The apparent relationship between late replication and heterochromatin domains is therefore mediated either by replication as a superior process or by a third, unknown mechanism.

1.4 Cancer, the diseased genome

Cancer is by nature a genetic disease, manifested by cells that have lost the regulatory processes which define rates of cell division and cell death. This eventuates in ectopic growths that can metastasise and eventually compromise essential physiological systems. The initial triggers of malignant growth are commonly thought to be DNA mutations, either caused by external mutagens such as tobacco smoke and UV radiation or through naturally occurring processes like erroneous DNA replication and repair. Only mutations in genes involved in key cellular pathways can promote tumourigenesis and these are known as “driver” mutations. Around 140 driver genes (genes that can contain driver mutations) are currently described, and these belong to 12 cellular pathways that can confer a selective growth advantage when disrupted128. Many mutations that facilitate cancerous growth are found in pathways concerned with DNA repair and apoptosis. Inactivation of these pathways leads to the common cancer phenotype of gross chromosomal aberrations, which includes amplifications, deletions and translocations as well as heightened numbers of DNA mutations128,129. These events can in turn lead to further oncogenic events through missense and nonsense mutations, chromosomal deletions of tumour suppressor genes, amplification of oncogenes as well as the creation of novel oncogenic gene fusions. Interestingly, the rate of DNA mutations and chromosomal rearrangements in cancer is not necessarily higher than in the normal cell; rather, the cell’s tolerance for these disturbances is greatly increased. In addition to genetic abnormalities, malignant cells exhibit gross epigenomic disruption, which can in turn stimulate tumour progression. The lack of proper epigenetic control provides cancer cells with a characteristic nuclear heterogeneity that aids replication in a competitive micro-environment. The manifestation and consequence of these changes, as well as potential effectors, are discussed below.

1.4.1 DNA methylation in tumourigenesis

Alterations in DNA methylation across the cancer genome have been widely investigated, implicating both hyper- and hypomethylation in tumour progression. Hypomethylation was one of the earliest observed cancer epigenetic phenotypes, where researchers identified hypomethylation of both individual genes130 and total nuclear levels131 compared to normal genomes. Hypomethylation was initially attributed to repeat elements throughout the genome39, the demethylation of which has been implicated in alternative transcript activation132 and overexpression of oncogenes133. More recent genome-wide studies have shown that the bulk of cancer-specific hypomethylation occurs in large genomic blocks that lose methylation in tumourigenesis and correspond with other “repressive” genomic features such as LADs, LOCKs and PMDs79,80,92. Interestingly, a knockdown of DNMT3A, a DNA methyltransferase implicated in haematological malignancies134,135 and embryogenesis136, was shown to induce a genome-wide hypomethylated state, irrespective of other genomic elements137. This observation suggests that DNMT3A typically protects non-repressive genomic domains from widespread malignant hypomethylation. While the underlying mechanism that results in hypomethylated domains is still disputed, the domains are known to increase in their degree of hypomethylation with cancer progression80, suggestive of a gradual decline in methylation control rather than a sudden genomic shift caused by active demethylation. Although the functional consequences of widespread hypomethylation are unknown, artificial genomic hypomethylation can induce tumourigenesis in mice138. This tumourigenic quality is potentially mediated through a mutually exclusive presence of DNA methylation and heterochromatic histone modifications, such that hypomethylation permits the formation of large repressive domains and further transcriptional disruption79,139,140.

In addition to genome-wide hypomethylation, cancer cells notoriously exhibit hypermethylation of CpG islands throughout the genome. The frequency of this hypermethylation is variable, depending on tumour type141,142, progression143 and prognosis144, and it affects only a subset of islands92,145. The most well studied outcome of cancer-specific hypermethylation is gene silencing. This was initially documented in classical tumour-suppressor genes such as Rb146 and p16147 and this epigenetic silencing has been equated with traditional genetic mutations in furthering tumourigenesis148. The majority of CpG islands remain unmethylated in malignant cells, with the molecular forces that direct this specific selection remaining unclear. A subset of cancers exhibit a CpG Island Methylator Phenotype (CIMP), whereby a widespread and coordinated set of genes become hypermethylated and can define tumour subtypes. This was first observed in colorectal cancer141 but has since been described in prostate149, pancreatic150, ovarian151 and liver152 cancers, amongst others142. The nature of these coordinated hypermethylation events is suggestive of a directed and not stochastic mechanism of deregulation. A potential effector of CIMP is IDH1153, a gene commonly mutated in cancer154 and found to actively define CIMP in gliomas155 and leukaemia156. Mutated IDH1 is thought to promote hypermethylation through inhibition of TET proteins156,157, a family of proteins involved in active demethylation of CpG sites158,159 which are also mutated in cancer160. The predictable methylation of CIMP tumours can have predictable oncogenic consequences, as demonstrated in colorectal cancer where CIMP associated hypermethylation of MLH1 leads to microsatellite instability and further malignant transformation161.

Hypermethylation of CpG islands also commonly occurs in cancer irrespective of a coordinated CIMP phenotype. The mechanisms of this generalised deregulation, as well as the methylation pressures that select only a subset of islands, are questions of current debate. While certain cancers possess clear genetic causes of a deregulated methylome, such as those described above for DNMT3A134,135, IDH1155,156 and TET2160, such mutations are not as yet apparent for the remainder of cancers, even with associated methylation defects. This suggests one of two conclusions: either we have not fully discovered potential genetic initiators of a deregulated methylome, or this characteristic methylome is inherent in tumourigenesis. In support of this latter hypothesis, De Carvalho and colleagues identified a set of genes whose hypermethylation is essential for survival of cancer cells162. The majority of hypermethylated CpG islands however appear to be “passenger” events163, and are a product of the cancer cell’s penchant for methylation rather than having oncogenic qualities. The criteria by which some CpG islands become hypermethylated while others are protected appear to be determined by local chromatin architecture164, transcription factor occupancy165 and DNA sequence context166. Occupancy of CpG islands by transcription factors is protective against de novo methylation over large regions167 and could underlie the variation in CpG island methylation specificity between cancer types, as driven by cell type specific transcription factor repertoires. While resistant CpG islands tend to be bound by ubiquitous transcription factors and associated with basic cellular function165, CpG islands at genes that are developmentally regulated and enriched with bivalent H3K4me3/H3K27me3 modifications embryonically have a far greater propensity for hypermethylation in cancer168,169.

1.4.2 Chromatin remodelling in tumourigenesis

While there are common patterns of DNA methylation in cancer development, the deregulation of chromatin and histone structures in cancer reveals a far greater diversity. This is due to the multitude of nuclear factors required for the “writing, editing and reading” of histone methylation, acetylation and phosphorylation as well as the presence of histone variants and chromatin remodelling enzymes, all of which have representative genes implicated in tumourigenesis170-172.

1.4.2.1 Histone methylation and cancer

Oncogenic mutations and chromosomal rearrangements of genes involved in histone methylation deposition are commonly reported173,174. Indeed, global levels of specific histone methylations are known to vary (both increasing and decreasing) in diverse cancer types175-180. Two of the most widely studied oncogenic methyltransferases are Enhancer of Zeste Homologue 2 (EZH2) and Mixed Lineage Leukaemia (MLL). More than 50 translocations involving the H3K4me3 methyltransferase MLL have been identified181, being found in 80% of infant leukaemias182. Different classes of MLL mutations impart different epigenetic defects. A common tandem duplication, for example, ectopically increases histone acetylation and H3K4me3 at the otherwise developmentally restricted HoxA locus183,184, potentiating malignant transformation185. While MLL exerts influence over a small set of genes, the oncogenic actions of EZH2 can lead to epigenetic repression of a range of tumour suppressor genes, including BRCA1186, p57187 and E-cadherin188. EZH2 is the catalytic subunit of the polycomb repressive complex 2 (PRC2) and is essential for H3K27me3 mediated gene silencing across the genome189. Interestingly, while it is amplified in a variety of cancers including prostate190, breast191 and gastric cancer188, it can also be prone to inactivating mutations in lymphoma192. EZH2 is typically lowly expressed in fully differentiated tissues190 when compared to embryonic cells193, leading to speculation that over-expression in cancer represents a return to a stem-cell like chromatin conformation of developmental genes194.

1.4.2.2 Histone acetylation and cancer

Similar to histone methylation, the role of deregulated histone acetylation in cancer development has been widely investigated. Global loss of the histone acetylation modification is found in a range of cancers and can occur at many different modification sites, including H3K9ac178,179, H3K4ac195, H3K27ac196 and H4K16ac197. Levels of histone acetylation are dynamically regulated through the competing activities of the histone deacetylase (HDAC) and histone acetyltransferase (HAT) protein families, each of which has members that are prone to deregulation in cancer. HATs are a broad range of proteins that have diverse functions, targeting histone tails, histone cores as well as non-histone proteins198. While genetic disruptions in cancer can potentiate their qualities as oncogenes199,200, more commonly they are observed to have a tumour suppressive role, with inactivating chromosomal translocations and coding mutations being described for many disparate HAT genes198,201,202. The most notable example lies in the CBP/p300 family of HATs, which are mutated in a number of malignancies203,204. Germ-line deletions in either p300 or CBP cause Rubinstein-Taybi Syndrome, characterised by an increased risk of childhood tumour development205,206. Furthermore, the oncogenic adenovirus protein E1A exerts its transforming qualities by binding p300 and displacing the HAT from its typical targets, resulting in genome-wide hypoacetylation and cancer development207.

Histone hypoacetylation in cancer is also commonly associated with aberrant HDAC activity, with increases in expression of HDACs being identified in a wide range of tumour types208,209. Additionally, oncogenic fusion proteins seen in haematological malignancies are able to recruit HDACs to multiple genomic loci, causing ectopic transcriptional repression210-212. The significance of HDAC activity in tumourigenesis is testified by the success of HDAC inhibitors (HDACi) as chemotherapeutic agents. These agents can increase levels of histone acetylation213 and halt cancer development through cell growth arrest and induction of apoptosis214. The precise mechanisms that underlie the action of these drugs are still being investigated, but are likely to involve hyperacetylation of both histone and non-histone proteins214.

1.4.3 Domain level epigenomic remodelling in cancer

Given the significance of histone modifications in normal genetic regulation and their evident disruption in the cancer state, there is a surprising paucity of studies that address the specific genome-wide distribution and redistribution of these marks during tumourigenesis215. The few studies that have investigated this question have provided sometimes conflicting results. For example, while Wen and colleagues77 suggest that heterochromatic LOCKs are diminished in tumourigenesis, Hon and colleagues79 argue that heterochromatic H3K9me2 and H3K27me3 are increased in genomic regions that undergo hypomethylation. In support of this latter observation, our laboratory identified large regions of the cancer genome that become epigenetically repressed in multiple cancer types139,140,216, a phenomenon termed Long Range Epigenetic Silencing (LRES). Studies of the epigenomic changes that occur during the epithelial to mesenchymal transition (EMT) show a reduction in heterochromatic modifications and an enrichment in euchromatin217, potentially explaining previous discordant results as being reflective of different stages of tumourigenic development.

Although there is little consensus on the exact nature of epigenomic disruption in tumourigenesis, there are many potentially causal nuclear aberrations that are common and well described. Indeed, one of the earliest cellular observations of malignant cells, made in 1860, was of atypical nuclear size and morphology in pharyngeal cancer218. Almost a century later, George Papanicolaou entrenched the diagnostic value of atypical nuclei by pioneering the now common “Pap Test” for screening of cervical cancer219. Pleomorphic nuclei, which show variation in nuclear shape and size, are common in cancer and are representative of a deregulated nuclear environment. The cause of this pleomorphism is likely multifactorial and tumour specific. Implicated structures include the nuclear envelope, the nuclear matrix, nucleoli as well as generalised chromatin conformation, all of which can be disrupted in the cancer state220-223. The deregulation of the nuclear envelope in particular can lead to profound nuclear disturbances in cancer224. This is due to its essential roles in regulating cell division, acting as a gatekeeper for cytoplasmic-nuclear signalling and, most pertinently, as a scaffold for appropriate genetic and epigenetic control, as discussed previously. Many nuclear envelope genes, and in particular the lamina genes, are mis-expressed in diverse cancer types225-229. Despite the apparent connections between the nuclear envelope and epigenetic regulation, and between a deregulated nuclear envelope and malignant transformation, there are presently no studies that sufficiently characterise how the nuclear envelope is implicated in gross epigenomic aberrations in cancer.

1.5 Justification & Aims

The many manifestations of epigenetic control are essential to appropriate gene expression and cellular identity. Disruptions of these regulatory networks are found as both cause and consequence in many different disease states. In cancer this disruption is particularly fundamental to the disease’s pathophysiology, occurring early in transformation and potentiating massive transcriptional adaptability. Despite this, only a small subset of epigenetic changes are well described in cancer. These traditionally include repressive hypermethylation of CpG islands, widespread CpG hypomethylation and global variations in the absolute quantity of histone modifications. Due to the diversity of epigenetic mechanisms and the heterogeneity of cancer types, many questions remain to be answered regarding large scale epigenomic changes that affect the entire nucleus, as well as epigenetic changes that occur at individual loci.

The general aim of this thesis, therefore, is to characterise epigenomic changes that occur in cancer, using prostate cancer as a model system. I intend to investigate traditional epigenetic modifications of DNA methylation and histone modification at an epigenomic level, as well as more contemporary epigenetic processes that regulate spatial and temporal aspects of DNA function.

Specific aims are as follows:

Aim 1 (Chapter 3) The Clark laboratory, of which I am a part, has previously documented Long Range Epigenetic Silencing (LRES) in prostate cancer. Here, I investigate the presence and nature of the converse of LRES, specifically characterising regions of the genome that are uniformly epigenetically activated in cancer, namely by Long Range Epigenetic Activation (LREA).

Aim 2 (Chapter 4) Gene repression associated with CpG hypermethylation is commonly reported. Here I aim to explore the relationship between CpG methylation and gene activation in tumourigenesis, and to specifically address whether promoter demethylation in cancer is associated with oncogene activation.

Aim 3 (Chapter 5) Although chromatin looping is a vital aspect of genetic regulation, its presence and potential dysfunction in cancer has not been reported. Here I aim to examine the incidence of chromatin looping at a model locus that undergoes significant epigenetic remodelling in the cancer state.

Aim 4 (Chapter 6 and 7) Replication timing is a tightly regulated developmental process that affects the entire genome. Here I aim to first optimise the technologies required to properly interrogate replication timing, then to assess potential variation that occurs between normal and cancer cell states and, finally, to examine the epigenetic manifestations of an altered replication program in cancer.

CHAPTER 2: METHODS

CHAPTER 2 Methods

2.1 Cell lines and culture

2.1.1 LNCaP cell line: Lymph Node Cancer of the Prostate

LNCaP cells were maintained in T-medium (Gibco) supplemented with foetal bovine serum (10%) (ThermoTrace), L-Glutamine (20 mM) (#25030, Gibco) and a Penicillin/Streptomycin solution (50000 units Penicillin/50 mg Streptomycin) (#15070, Gibco) at 37 °C and 5% CO2. Cells were passaged at 80% confluence via trypsinisation with Trypsin-EDTA (0.05%) (#25300, Gibco) at 37 °C for 5 min. Detached cells were centrifuged at 800 g for 5 min, the supernatant was discarded and the cell pellet was resuspended in supplemented T-medium to inactivate the remaining Trypsin. The number of cells was ascertained with a haemocytometer and new cultures were seeded as a single cell suspension at 1 × 10⁴ cells/cm².

2.1.2 PrEC cell line: Prostate Epithelial Cells

PrEC cells (#CC-2555, Cambrex Bio Science) were maintained in PrEBM medium (#CC-3165, Clonetics) that was supplemented with SingleQuots growth supplements (containing BPE, Hydrocortisone, hEGF, Epinephrine, Transferrin, Insulin, Retinoic Acid, Triiodothyronine, GA-1000) (#CC-4177, Clonetics). Cells at 80% confluency were rinsed with HEPES-BSS (#CC-5022, Clonetics) and trypsinised with Trypsin-EDTA (0.025%) (#CC-5012, Clonetics) at 25 °C for 5 min. Trypsin was neutralised with a Trypsin-Neutralising Solution (TNS) (#CC-5002). Detached cells were centrifuged at 800 g for 5 min, the supernatant was discarded and the cell pellet was resuspended in supplemented PrEBM. The number of cells was ascertained with a haemocytometer and new cultures were seeded as a single cell suspension at 2.5 × 10³ cells/cm².

2.2 Nucleic acid extraction

2.2.1 DNA extraction

Cells in adherent culture were harvested via trypsin and centrifuged at 800 g for 5 min. The supernatant was removed and cells were resuspended in PBS (#10010049, Life Technologies). DNA was extracted using the QIAamp DNA Mini kit (#51304, QIAGEN) according to manufacturer’s instructions.

2.2.2 RNA extraction

Cells (~1-10 × 10⁶) in adherent culture were harvested via trypsin and centrifuged at 800 g for 5 min. The supernatant was removed and cells were resuspended in 1 ml TRIzol (#15596018, Life Technologies), followed by repeated pipetting and vortexing to homogenise. The sample was incubated for 5 min at room temperature before the addition of 200 µl of chloroform, with subsequent vigorous shaking and incubation at room temperature for 2-3 min. Centrifugation at 12,000 g for 15 min at 4 °C separated the solution into two phases, a lower organic phenol-chloroform phase and an upper aqueous phase which contained RNA. The upper phase was transferred to a fresh tube and RNA was precipitated with the addition of 500 µl of isopropanol, incubation at room temperature for 10 min and centrifugation at 12,000 g for 10 min at 4 °C. The RNA pellet was washed with 75% ethanol, air dried and resuspended in RNase free water. Isolated RNA was stored at -80 °C.

2.3 Nucleic acid quantification

2.3.1 Spectrophotometric quantification

DNA and RNA were generally quantified using the NanoDrop spectrophotometer (Thermo Scientific), according to the manufacturer’s instructions.

2.3.2 Fluorometric quantification

For instances where very accurate quantification was required the Qubit 2.0 Fluorometer (Life Technologies) was used. For general DNA quantification the dsDNA BR Assay kit (#Q32850) was used. For low DNA concentration (assay range 0.2-100 ng) the dsDNA HS Assay kit (#Q32851) was used. For single strand DNA quantification, the ssDNA Assay kit (#Q10212) was used. Quantification was carried out according to the manufacturer’s instructions. Briefly, 1-10 µl of the sample to be analysed was put into a Qubit Assay Tube (#Q32856). The concentrated assay reagent supplied with the kit was diluted 1:200 with the appropriate supplied dilution buffer, which was added to samples to a total of 200 µl. Measurements were conducted using the Qubit 2.0 Fluorometer, using supplied DNA standards to calibrate the device.
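The dilution arithmetic in this protocol can be made explicit with a short sketch. It assumes that the fluorometer reading refers to the concentration in the 200 µl assay tube, so that the concentration of the original sample is recovered by multiplying by 200 / (sample volume). The function names are hypothetical and are not part of any instrument software.

```python
ASSAY_VOLUME_UL = 200.0  # total volume per assay tube, as stated in the text
REAGENT_DILUTION = 200   # concentrated reagent is diluted 1:200 in buffer


def working_solution(n_tubes):
    """Return (reagent_ul, buffer_ul) for n_tubes of working solution.

    Assumption: a full 200 µl is budgeted per tube, which slightly
    over-provisions because the sample itself occupies 1-10 µl.
    """
    total = n_tubes * ASSAY_VOLUME_UL
    reagent = total / REAGENT_DILUTION
    return reagent, total - reagent


def stock_concentration(tube_reading, sample_ul):
    """Back-calculate the concentration of the undiluted sample.

    tube_reading: concentration measured in the assay tube (any unit).
    sample_ul: volume of sample added to the tube (1-10 µl in the text).
    """
    if not 1 <= sample_ul <= 10:
        raise ValueError("sample volume outside the 1-10 µl range used here")
    return tube_reading * ASSAY_VOLUME_UL / sample_ul
```

For example, a tube reading of 0.5 ng/µl obtained from 2 µl of sample would correspond to 50 ng/µl in the original sample under these assumptions.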

2.4 Nucleic acid treatments

2.4.1 Bisulphite conversion

Bisulphite conversion of genomic DNA was carried out according to the Clark method as previously described230. 1 µg of genomic DNA was incubated with DNA lysis buffer containing 2 µg tRNA, 280 ng/µl Proteinase K (#19131, QIAGEN) and 1% SDS (#L3771, Sigma-Aldrich) in a total volume of 18 µl at 37 °C for 1 hr. 2 µl of 3 M NaOH was added, the samples were vortexed and incubated at 37 °C for 15 min, 90 °C for 2 min and kept on ice. 208 µl of saturated sodium metabisulphite (#31448, Sigma-Aldrich) and 12 µl of 10 mM quinol (#H9003, Sigma-Aldrich) were added, and the sample was incubated at 55 °C for 8 hr. The sample was cleaned using the Wizard DNA Clean-up kit as described (Section 2.5.2) and eluted into 50 µl H2O before being desulphonated with the addition of 5.5 µl 3 M NaOH and incubation at 37 °C for 15 min. Bisulphite converted DNA was precipitated using ethanol precipitation as described (Section 2.5.1), using 10 mg/ml tRNA as a carrier and 5 M ammonium acetate (C2H3O2NH4) as the salt. Purified DNA was dissolved in 20-50 µl H2O and stored at -20 °C.

2.4.2 cDNA synthesis

cDNA was synthesised from purified RNA using the SuperScript III First-Strand Synthesis System (#18080-051, Life Technologies). 1 µg total RNA was combined with 150 ng random hexamers and 1 µl of 10 mM dNTPs in a total volume of 13 µl, made up with H2O. Samples were incubated at 65 °C for 5 min and then kept on ice for at least 1 min. 1 µl of 0.1 M DTT, 4 µl of 5x 1st strand buffer, 1 µl RNase Recombinant Inhibitor (40 units/µl) and 1 µl SuperScript III Reverse Transcriptase were added. The sample was incubated at 25 °C for 5 min, 50 °C for 1 hr and 70 °C for 15 min. The sample was diluted to an appropriate concentration using H2O and newly synthesised cDNA was stored at -20 °C.

2.5 DNA purification

2.5.1 Phenol chloroform / ethanol purification

A volume of phenol/chloroform (#P3803, Sigma-Aldrich) equal to the initial volume of the sample was added. Samples were vortexed and centrifuged at 12,000 g for 5 min, separating into a lower organic phase and an upper aqueous phase that contained the DNA. The aqueous phase was removed to a new tube and the same volume of chloroform (#C2432, Sigma-Aldrich) was added to the samples, which were centrifuged at 12,000 g for 5 min. The upper phase was again removed and the samples were subjected to ethanol precipitation.

Ethanol precipitation can be modulated in several ways depending on the nature of the initial sample. In all cases 2.5 volumes of ice cold 100% ethanol was added to each sample.

• For routine precipitations, sodium acetate was added (0.3 M final concentration)
• For solutions that contain SDS, sodium chloride was added (0.2 M final concentration)
• For solutions containing dNTPs, ammonium acetate was added (2 M final concentration)
• To aid in the precipitation of low concentration solutions, 1 µl Glycogen or 1 µl of 10 mg/ml tRNA was used as a ‘carrier’

Samples with the appropriate salt and carrier added were mixed by inversion, and stored at -20 °C for at least 20 min. Samples were centrifuged at 14,000 g for 30 min, and the supernatant was removed. The pellet was washed with 2 volumes of 70% ethanol and centrifuged at 14,000 g for 20 min. The supernatant was removed, and the pellet was allowed to air dry until all ethanol had evaporated. Samples were resuspended in either H2O or TE buffer (#T11493, Life Technologies).

2.5.2 Promega Wizard purification kits

To purify DNA from PCR and other enzymatic reactions, as well as to extract DNA from agarose gel visualisations, the Wizard SV Gel and PCR Clean-Up System (#A2982, Promega) was used according to the manufacturer’s protocols.

2.6 PCR based techniques

2.6.1 PCR

PCR was generally performed in triplicate to account for amplification bias using primers listed in Appendix 9.1. PCR was performed in a 20 µl volume using the Platinum Taq DNA Polymerase kit (#10966-018, Life Technologies). Briefly, the following components were added on ice: 0.2 mM dNTP mixture (#U1511, Promega), 1.5 mM MgCl2, 1 unit Platinum Taq DNA Polymerase, 1X supplied PCR Buffer, 0.2 µM forward and reverse primers. DNA was generally added as 10 ng per reaction but was modulated to specific assay requirements. H2O was added to a final volume of 20 µl. PCR was performed using an Eppendorf Mastercycler under the following conditions:

Temperature    Time (min:sec)    Cycles
95 °C          10:00             1
95 °C          00:20             40
X °C           00:30
72 °C          00:30
4 °C           hold

X °C represents the specific annealing temperature of individual primer sets.
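As an illustration, the reagent volumes for a single 20 µl reaction follow from C1V1 = C2V2. The final concentrations are those listed in Section 2.6.1 (with the primer concentration taken as 0.2 µM each). The stock concentrations below are assumptions chosen for the example and should be replaced with the values of the reagents actually in use.

```python
REACTION_UL = 20.0

# Final concentrations from Section 2.6.1 (primer value read as 0.2 µM).
FINAL = {"buffer_x": 1.0, "mgcl2_mM": 1.5, "dntp_mM": 0.2, "primer_uM": 0.2}
TAQ_UNITS = 1.0

# ASSUMED stock concentrations (not stated in the text).
STOCK = {"buffer_x": 10.0, "mgcl2_mM": 50.0, "dntp_mM": 10.0, "primer_uM": 10.0}
TAQ_UNITS_PER_UL = 5.0


def reaction_volumes(template_ul=1.0):
    """Return the volume (µl) of each component for one 20 µl reaction."""
    # V_stock = C_final / C_stock * V_final for each concentration-based reagent
    vol = {name: FINAL[name] / STOCK[name] * REACTION_UL for name in FINAL}
    # Forward and reverse primers are added separately at the same volume
    vol["primer_fwd"] = vol["primer_rev"] = vol.pop("primer_uM")
    vol["taq"] = TAQ_UNITS / TAQ_UNITS_PER_UL
    vol["template"] = template_ul
    # Water makes up the remainder of the reaction
    vol["water"] = REACTION_UL - sum(vol.values())
    return {name: round(v, 3) for name, v in vol.items()}
```

Multiplying each volume by the number of reactions (plus a small excess) would give a master mix for triplicate reactions.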

2.6.2 Bisulphite PCR

Bisulphite PCR was carried out essentially as described in Section 2.6.1 with important modifications. Primers were designed against the bisulphite converted genome where all cytosine nucleotides (C) are changed to a thymine (T), according to the primer design rules outlined in Clark et al230. As the bisulphite treatment protocol converts all cytosines except for those that are modified with a methyl group, it is unknown in the context of primer design whether a CpG dinucleotide is in fact CpG or TpG. As such, primers were designed to be exclusive of CpG sites in the endogenous sequence. Due to the changed sequence of both template and primer, the conditions of the PCR reaction were also modified:

Temperature    Time (min:sec)    Cycles
95 °C          05:00             1
95 °C          00:45             5
X °C           01:30
72 °C          02:00
95 °C          00:45             25
X °C           01:30
72 °C          01:30
4 °C           hold

X °C represents the specific annealing temperature of individual primer sets.
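The primer design constraint described in this section can be illustrated with a minimal in silico conversion. The sketch assumes complete conversion, considers the top strand only, and assumes that methylation occurs only at CpG sites; the function names are hypothetical and this is not the primer design procedure of the cited method.

```python
def bisulphite_convert(seq):
    """Return the top-strand sequence expected after conversion and PCR.

    Non-CpG cytosines are written as T. The C of each CpG is written as Y
    (the IUPAC code for C or T) because its methylation state is unknown.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        if base == "C":
            is_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            converted.append("Y" if is_cpg else "T")
        else:
            converted.append(base)
    return "".join(converted)


def primer_avoids_cpg(genomic_site):
    """True if a candidate primer site (genomic sequence) contains no CpG."""
    return "CG" not in genomic_site.upper()
```

For instance, `bisulphite_convert("ACGTCCA")` gives `"AYGTTTA"`: the ambiguous Y marks the position that a primer should avoid, which is what `primer_avoids_cpg` checks on the unconverted sequence.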

2.6.3 Quantitative Real-Time PCR

Quantitative Real-Time PCR (q-RT-PCR) was carried out using the LightCycler 480 Real-Time PCR System (Roche), the ABI PRISM 7900HT Sequence Detection System (Life Technologies) or the CFX384 Real-Time PCR Detection System (Bio-Rad). Reactions were carried out in 10 µl volumes using optically appropriate 384 well plates.

For quantification assays that relied on SYBR green fluorescence the Power SYBR Green PCR Master Mix (#436769, Life Technologies), primers (200 nM) and DNA or cDNA (1-100 ng) were added. For assays that relied on TaqMan technology the TaqMan Gene Expression Master Mix (#4369016, Life Technologies), primers (200 nM), DNA or cDNA (1-100 ng) and TaqMan MGB Probe (100 nM) were used. For the control TaqMan assay that quantified ubiquitously expressed 18S rRNA, a 20X TaqMan Gene Expression Assay (Life Technologies) was used in place of individual primers and a TaqMan probe. Amplification conditions are outlined below:

Temperature    Time (min:sec)    Cycles
95 °C          10:00             1
95 °C          00:15             35
60 °C          01:00
4 °C           hold

2.7 Chromatin immunoprecipitation

Carried out with the assistance of Senior Research Assistant Jenny Song from the Clark Lab.

2.7.1 Fixation and sonication

Adherent cells were cultured on 10 cm dishes to 80% confluency. Protein cross-linking was carried out by the addition of 37% formaldehyde to a 1% concentration and incubation at room temperature for 10 min. Fixation was stopped with the addition of glycine (125 M) and a further 5 min incubation at room temperature. The media and the fixative were aspirated and the cells were washed twice with cold PBS containing protease inhibitors (1 mM phenylmethylsulfonyl fluoride (#P7626, Sigma-Aldrich), 1 µg/ml aprotinin (#109811532001, Roche), 1 µg/ml pepstatin A (#P5318, Sigma-Aldrich)). Washed cells were scraped and centrifuged at 2,000 g for 4 min; the supernatant was removed and the fixed cells were resuspended in SDS lysis buffer at a concentration of 1-2 × 10⁶ cells per 200 µl aliquot.

Sonication was carried out using a Branson Sonifier 250 (Emerson Industrial Automation) at an amplitude of 30%, 0.9 seconds on / 0.1 seconds off for 9 seconds in total. These conditions were repeated for 4 cycles until the majority of chromatin was in the 200-500 bp size range as determined by gel electrophoresis.

2.7.2 Immunoprecipitation

Immunoprecipitation was carried out with the Chromatin Immunoprecipitation Assay Kit (#170295, Merck Millipore) according to the manufacturer’s protocol. Complexes were immunoprecipitated with 10 µl of antibody specific for: acetylated lysine 9 of histone H3 (H3K9ac) (#06-599, Merck Millipore), trimethylated lysine 4 of histone H3 (H3K4me3) (#ab8580, Abcam), trimethylated lysine 27 of histone H3 (H3K27me3) (#07-449, Merck Millipore), dimethylated lysine 9 of histone H3 (H3K9me2) (#ab1220, Abcam), trimethylated lysine 36 of histone H3 (H3K36me3), 5-methylcytosine (meDIP) (#NA81, Millipore), CCCTC-binding factor (CTCF) (#07-279, Millipore). “No antibody” controls were included for each ChIP assay and input samples were processed in parallel. Antibody/protein complexes were collected by either salmon sperm DNA/protein A agarose slurry or Protein A/G PLUS agarose beads (#sc-2003, Santa Cruz) depending on the specified antibody, and samples were washed several times. Immunoprecipitated complexes were eluted with 1% SDS and 0.1 M NaHCO3, followed by proteinase K for 1 hr. DNA was purified by phenol/chloroform extraction and ethanol precipitation as described, before being resuspended in H2O.

2.7.3 Chromatin Immunoprecipitation Sequencing (ChIP-Seq)

Next generation sequencing of immunoprecipitated products (10 - 20 ng) was carried out at the Ramaciotti Centre for Gene Function Analysis, Australia, (H3K27me3) and Beijing Genomics Institute, China (H3K36me3, H3K4me3, CTCF), sequencing centres. Sequencing was carried out using 50 bp single-end reads on an Illumina Genome Analyzer IIx (H3K27me3) and the Illumina HiSeq 2000 (H3K36me3, H3K4me3, CTCF). Sequence reads were mapped to the hg18 reference genome using Bowtie231 with up to three mismatches. Reads that mapped more than once to a single genomic location were filtered from further analysis.

2.8 Whole genome methylation studies

Methylation studies for LNCaP and PrEC DNA were carried out using a number of different technologies over the course of my studies, with the assistance of Senior Research Assistant, Jenny Song and Research Officer, Shalima Nair from the Clark Lab.

2.8.1 MBD2 Capture Sequencing (MBDCap-Seq)

Methylated DNA was isolated using the MethylMiner Methylated DNA Enrichment Kit (#ME10025, Life Technologies) as described by Nair et al232. Briefly, 1 µg of genomic DNA was sonicated to 100-500 bp and then added to a solution containing MBD-Biotin conjugated to Dynabeads M-280 Streptavidin which had been previously prepared. This solution was then incubated for 1 hr on a rotating mixer at room temperature and washed three times with supplied Bind/Wash buffer. Bound methylated DNA was eluted with High Salt Elution Buffer (2000 mM NaCl). DNA was purified using ethanol precipitation as described (Section 2.5.1), using sodium acetate and 1 µl glycogen as a carrier. DNA was resuspended in H2O. 10 ng of captured DNA was sent to the Ramaciotti Centre for Gene Function Analysis (University of New South Wales) for library preparation and sequencing using the Illumina Genome Analyzer II.

2.8.2 Infinium Human Methylation450 BeadChips (450K arrays)

Methylation analysis was performed using Infinium HumanMethylation450 BeadChips. Genomic DNA (500 ng) was hybridised to the HumanMethylation450 BeadChips as a service conducted by the Australian Genome Research Facility (AGRF) (Westmead Millennium Institute). Hybridisations were performed in triplicate and raw data was processed using the ‘minfi’ Bioconductor package. For CpG island analysis, beta values representing methylation at individual probes were averaged within ±500 bp of the centre of each CpG island located within 2.5 kb of a TSS.
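The CpG island summary described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the analysis: all coordinates are taken to lie on a single chromosome, the distance to a TSS is measured from the island edges, and the function name and input layout are hypothetical.

```python
from statistics import mean


def island_methylation(islands, tss_positions, probes,
                       flank=500, max_tss_dist=2500):
    """Average probe beta values around the centre of TSS-proximal islands.

    islands: list of (name, start, end) tuples.
    tss_positions: list of TSS coordinates.
    probes: list of (position, beta) tuples.
    Returns {island_name: mean beta} for islands with at least one probe.
    """
    result = {}
    for name, start, end in islands:
        # Keep only islands within max_tss_dist of any TSS (edge-based).
        near_tss = any(start - max_tss_dist <= tss <= end + max_tss_dist
                       for tss in tss_positions)
        if not near_tss:
            continue
        centre = (start + end) // 2
        betas = [beta for pos, beta in probes if abs(pos - centre) <= flank]
        if betas:
            result[name] = mean(betas)
    return result
```

Islands without a probe inside the window are simply omitted from the result, which is one possible design choice; reporting them as missing values would be equally reasonable.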

2.8.3 Methylated DNA immunoprecipitation (MeDIP)

MeDIP was performed as previously described by Nair et al232. Briefly, 4 µg of sonicated genomic DNA (300-500 bp) in IP buffer (10 mM NaPO4 pH 7.0, 140 mM NaCl, 0.05% Triton X-100) was incubated with 10 µg of antibody against 5-methylcytosine (#NA81, Calbiochem) overnight in a total volume of 500 µl. DNA/antibody complexes were precipitated with 80 µl Protein A/G PLUS agarose beads, washed three times in IP buffer at 4 °C and twice with 1 ml TE buffer at room temperature. DNA was eluted with 1% SDS/0.1 M NaHCO3, and purified with phenol/chloroform extraction followed by ethanol precipitation. Samples were resuspended in 30 µl H2O and maintained at -20 °C.

2.9 Plasmid preparation and single molecule sequencing

2.9.1 Mini-prep plasmid preparation

Escherichia coli containing the plasmid of interest were grown on LB agar containing appropriate selective antibiotics for 12-16 hr at 37 °C. Individual colonies were picked using a sterile toothpick and used to inoculate 2 ml of LB medium containing appropriate antibiotics. Cultures were grown for 12-16 hr at 37 °C at 220 rpm. The bacteria were pelleted by centrifugation at 12,000 g for 30 sec, and the supernatant was discarded. The pellet was resuspended in 100 µl cold ‘Solution I’ (50 mM Glucose, 25 mM Tris-Cl pH 8.0, 10 mM EDTA). 200 µl of ‘Solution II’ (0.2 M NaOH, 1% SDS) was added and the solution was mixed via inversion and stored on ice. 150 µl of ‘Solution III’ (3 M CH3CO2K, 5 M anhydrous CH3CO2H) was added and the samples were vortexed and stored on ice for 3-5 min before being centrifuged at 12,000 g for 5 min at 4 °C. The supernatant was transferred to a new tube and plasmid DNA was purified by phenol chloroform extraction with ethanol purification (Section 2.5.1). DNA was resuspended in 50 µl TE buffer (pH 8.0) containing RNase (20 µg/ml) and was incubated at 37 °C for 30 min.

2.9.2 Capillary separation sequencing

Plasmids generated by mini- or maxi-preps were prepared for sequencing with the BigDye Terminator v3.1 Cycle Sequencing Kit (#4337455, Life Technologies) according to manufacturer’s protocol and commonly using the SP6 promoter primer sequence. DNA was purified using ethanol purification as described (Section 2.5.1) with the addition of EDTA (7.5 mM). Capillary sequencing was carried out by the Australian Cancer Research Foundation (ACRF) (Garvan Institute of Medical Research) using a 3100 Genetic Analyser (Life Technologies).

CHAPTER 3: LONG RANGE EPIGENETIC ACTIVATION

CHAPTER 3 Long Range Epigenetic Activation

3.1 Introduction

Gene expression is regulated through coordination of DNA binding factors at individual genetic loci, as well as through domain-level processes that govern wider transcriptional potential. Although both epigenetic processes are disrupted in tumourigenesis, only those changes that affect CpG islands and involve epigenetic repression are well characterised. Inroads to a greater understanding of domain-level epigenetic deregulation in cancer have been achieved by the Clark laboratory and others with the discovery of Long Range Epigenetic Silencing (LRES) in multiple cancer types139,140,216,233. Regions of LRES are typically defined as genomic domains where contiguous genes are transcriptionally silenced and associated with repressive epigenetic modifications in cancer. While the form of this epigenetic repression is not uniform, common LRES characteristics include the loss of H3K9ac at gene promoters with a coincident gain of repressive histone modifications and DNA methylation of CpG islands139,140,216. Repressive modifications such as H3K9me2 and H3K27me3 were found throughout LRES regions and marked a subset of genes that were mutually-exclusive of hypermethylated CpG islands. Importantly, LRES regions contain multiple tumour-suppressor genes and overlap significantly with genomic regions that are commonly deleted in cancer140. The mechanisms that underlie LRES are still a matter of some debate. Early studies from our group assert that CpG island genes that lost transcriptional activity would subsequently lose protection from cancer-associated DNA hypermethylation. Upon hypermethylation, these regions could then “spread” repressive chromatin to neighbouring loci causing an LRES phenotype234. Alternatively, Hsu and colleagues216 demonstrate epigenetic repression of LRES regions mediated by the presence of distinct looping structures that are modulated by oestrogen stimulation.

As discussed in the Introduction (Chapter 1), well described nuclear domains are often repressive in nature, as evidenced by LOCKs, BLOCs, LADs and PMDs. In cancer research the trend towards repressive epigenetics was bolstered by early studies of CpG island hypermethylation and gene silencing of tumour suppressor genes147. Characterisation of LRES and the commonly investigated polycomb repressive complex235,236 have further reinforced cancer epigenetics as a field dominated by aberrant chromatin silencing. As a deregulated nucleus is an essential quality of cancer development, it is expected that processes that promote active epigenetic conformations, or those that suppress them, would also become disrupted. Ectopic epigenetic activation in cancer is traditionally represented by hypomethylation of repeat elements potentiating transposon237 and oncogene133 transcription. Oncogenic and active deregulation of chromatin has been identified at select loci, most notably at the HoxA locus when associated with amplification of the MLL methyltransferase183. There are, however, no studies that systematically assess cancer cells for genomic domains that become epigenetically activated during tumourigenesis. The research described in this chapter has now been published in Cancer Cell: “Regional activation of the cancer genome by long-range epigenetic remodelling” (2013)145.

3.2 Aim

The aim of this chapter is to identify and epigenetically characterise domains that are transcriptionally activated in prostate cancer.

Specific aims:

i) Using expression microarrays, to identify transcriptionally activated domains in prostate cancer using LNCaP and PrEC cell lines as a model for cancer and normal prostate cells respectively.
ii) To validate the identified transcriptionally activated domains using q-RT-PCR and publicly available clinical datasets.
iii) To characterise transcriptionally activated domains with reference to genomic and clinically relevant qualities.
iv) To describe the epigenetic changes that occur in these domains using previously generated datasets for histone modifications and DNA methylation in LNCaP and PrEC cells.

3.3 Methods

3.3.1 Expression arrays

300 ng of TRIzol purified RNA was labelled according to Affymetrix GeneChip Whole Transcript Sense Target Labelling Assay Manual (P/N 701880 Rev. 4, Affymetrix) and hybridised to GeneChip Human Gene 1.0ST arrays (#902112, Affymetrix) according to manufacturer’s instructions. Data was pre-processed using RMA238 and differential expression was calculated using Limma239.

3.3.2 ChIP on chip

50 ng of Chromatin Immunoprecipitated DNA and input DNA was amplified using GenomePlex Complete Whole Genome Amplification Kit (#WGA2, Sigma) according to manufacturer’s instructions. Reactions were cleaned using the GeneChip Sample Cleanup Module (#900371, Affymetrix) and fragmented and labelled according to Affymetrix Chromatin Immunoprecipitation Assay Protocol #702238 Revision 3. GeneChip Human Promoter 1.0R arrays (#900777, Affymetrix) were hybridised using the GeneChip Hybridisation Wash and Stain kit (#900720) according to manufacturer’s instructions. Model-based Analysis of Tiling-arrays (MAT)240 was used to analyse the array data, using a 1 kb smoothing window and biological duplicates. Signals were normalised to input to correct for copy number changes between the cell lines.

3.3.3 Bioinformatics

Bioinformatic analyses described below were primarily carried out by Mark Robinson, a postdoctoral statistician in the Clark lab, with the assistance of research assistant Dario Strbenac and PhD student Aaron Statham.

3.3.3.1 Identification of transcriptionally activated regions

Replicate array data from LNCaP and PrEC were processed using RMA238, and moderated t-statistics representing differential expression between the two samples were calculated using Limma239 for each represented gene. Domains were identified by first defining a ‘core region’ where the median t-statistic over 5 genes was greater than 4 for two sequential gene windows. These core regions were then extended bi-directionally to encompass flanking genes that were also assigned positive t-statistics for change in expression.
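The identification procedure can be illustrated with a short sketch. It is not the code used for the analysis and rests on several assumptions that the text does not specify: windows are centred on each gene, a core is a run of at least two consecutive genes whose centred median exceeds the threshold, extension stops at the first flanking gene with a non-positive t-statistic, and overlapping domains are merged. The function name is hypothetical.

```python
from statistics import median


def find_activated_domains(t_stats, window=5, threshold=4.0, min_run=2):
    """Find activated domains from t-statistics in chromosomal gene order.

    Returns a list of (first_gene_index, last_gene_index), both inclusive.
    """
    n = len(t_stats)
    half = window // 2
    # Mark genes whose centred sliding-window median exceeds the threshold.
    passing = [False] * n
    for c in range(half, n - half):
        passing[c] = median(t_stats[c - half:c + half + 1]) > threshold

    domains = []
    c = 0
    while c < n:
        if not passing[c]:
            c += 1
            continue
        run_end = c
        while run_end + 1 < n and passing[run_end + 1]:
            run_end += 1
        if run_end - c + 1 >= min_run:  # core: sequential passing windows
            start, end = c, run_end
            # Extend in both directions over genes with positive t-statistics.
            while start > 0 and t_stats[start - 1] > 0:
                start -= 1
            while end < n - 1 and t_stats[end + 1] > 0:
                end += 1
            if domains and start <= domains[-1][1]:
                domains[-1] = (domains[-1][0], max(end, domains[-1][1]))
            else:
                domains.append((start, end))
        c = run_end + 1
    return domains
```

With the t-statistics `[-1, 0.5, 5, 6, 5, 7, 6, 5, 1, -2, 0.3]`, the core spans genes 2 to 7 and is extended by one gene on each side, giving the single domain `(1, 8)`.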

3.3.3.2 Oncomine data mining

Changes in Oncomine data sets were called at ‘probe set level’ between local prostate cancer and matched benign prostate samples and data was remapped to the appropriate hg18 build.

3.3.3.3 Heat maps

Heat maps were created using the ‘blockStats’ procedure in the Repitools R software package241. Briefly, enrichment of chromatin modifications as assayed by GeneChip Human Promoter 1.0R arrays was summarised in 1000 bp blocks around the transcription start site (-2500 to -1500, -1500 to -500, -500 to +500, +500 to +1500). Testing of changes in individual regions was carried out using the ‘geneSetTest’ function in Limma239.
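The block summary can be sketched in a few lines. This is a simplified stand-in for illustration only, not the ‘blockStats’ implementation: it assumes that each block is summarised by the mean probe score, that blocks are half-open intervals, and that coordinates are flipped for minus-strand genes. The function name is hypothetical.

```python
# Block boundaries relative to the TSS, as listed in the text.
BLOCKS = [(-2500, -1500), (-1500, -500), (-500, 500), (500, 1500)]


def summarise_blocks(probes, tss, strand="+"):
    """Mean probe score in each promoter block around one TSS.

    probes: list of (genomic_position, score) tuples.
    Returns one value per block, or None where a block has no probes.
    """
    sums = [0.0] * len(BLOCKS)
    counts = [0] * len(BLOCKS)
    for pos, score in probes:
        # Position relative to the TSS, in the direction of transcription.
        rel = pos - tss if strand == "+" else tss - pos
        for k, (lo, hi) in enumerate(BLOCKS):
            if lo <= rel < hi:
                sums[k] += score
                counts[k] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Applying the function to every gene in a region of interest yields a genes-by-blocks matrix, which is the form of data a heat map would display.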

3.3.3.4 Copy number analysis

To assess the prevalence of copy number changes between LNCaP and PrEC datasets, all GeneChip Human Promoter 1.0R probe-level data for genomic DNA input samples was processed using RMA238. Change in copy number was determined as the average difference between the LNCaP and PrEC datasets.

3.3.3.5 Pairs analysis

Statistical analysis of enrichment scores for neighbouring epigenetic changes was completed as previously described140. Cut-offs of epigenetic change were set as a t-statistic of change in LNCaP compared to PrEC, greater than or less than 2 for chromatin marks, and a Z-score greater than or less than 2 for MBDCap-Seq data.

3.3.3.6 Significance plots

Plots of tiling array signal across the promoter were summarised over gene sets of interest using the ‘significancePlots’ procedure in the Repitools R package241. Briefly, the smoothed tiling array signal is summarised for probes in each sequential region of the promoter in each gene set. Significance boundaries are derived by repeating the same computation on many random gene sets, and the 2.5 percentile, 50 percentile and 97.5 percentile are plotted alongside the gene set of interest.
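The random gene set comparison can be illustrated as follows. The sketch is not the ‘significancePlots’ implementation. It assumes that a gene set is summarised by the mean signal in each promoter region, that random sets are drawn without replacement, and that percentiles are taken by nearest rank; the function names are hypothetical.

```python
import random
from statistics import mean


def nearest_rank(sorted_vals, q):
    """Value at percentile q (0-100) of a sorted list, nearest-rank style."""
    idx = round(q / 100 * (len(sorted_vals) - 1))
    return sorted_vals[idx]


def null_bands(signal, set_size, n_random=1000, seed=0):
    """Empirical 2.5/50/97.5 percentile bands from random gene sets.

    signal: dict mapping gene -> list of smoothed signal values, one per
            promoter region (all lists the same length).
    Returns one (p2.5, p50, p97.5) tuple per promoter region.
    """
    rng = random.Random(seed)
    genes = list(signal)
    n_regions = len(signal[genes[0]])
    draws = [[] for _ in range(n_regions)]
    for _ in range(n_random):
        chosen = rng.sample(genes, set_size)
        for r in range(n_regions):
            draws[r].append(mean(signal[g][r] for g in chosen))
    bands = []
    for values in draws:
        values.sort()
        bands.append(tuple(nearest_rank(values, q) for q in (2.5, 50, 97.5)))
    return bands
```

A gene set of interest whose mean signal in a region falls outside the outer two values would be regarded as differing from random expectation at approximately the 5% level.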

3.3.3.7 Genomic feature density

The density of genomic features such as repeat elements and gene density was calculated for each region of interest and normalised to 1 Mb of genomic sequence. These are visualised against a boxplot representing the distribution of 10,000 random non-centromeric regions (the same size as the regions of interest), which were again normalised to a 1 Mb density. The whiskers of the boxplot denote 2.5 percentile and 97.5 percentile boundaries, thus empirically representing significance of 5% extremities.

3.4 Results

LNCaP and PrEC cell lines were used as a model system to identify regions of the genome that were transcriptionally activated in prostate cancer. PrEC (Prostate Epithelial Cells) are a commercially available primary culture of normal epithelial cells that are both Androgen Receptor and PSA negative242. LNCaP (Lymph Node Cancer of the Prostate) cells243 are derived from a lymph-node metastasis of a human prostate adenocarcinoma obtained in 1977. The cell line is androgen responsive, PSA positive and hypotetraploid244. DU-145245 and PC-3246 cells were also derived from prostate cancer metastases and are used herein for the purposes of experimental validation. Transcription was assayed across these cell lines using Affymetrix Human Gene 1.0ST expression arrays. Data was pre-processed using RMA238, and moderated t-statistics that represented the change in expression from PrEC to LNCaP were calculated for each represented gene. A positive t-statistic indicates an increase in expression from PrEC to LNCaP, with the magnitude of the statistic correlating with the significance of this increase.

3.4.1 Identifying activated domains in prostate cancer

To characterise regions of transcriptional activation in LNCaP cells, the median t-statistic over a sliding window of 5 genes was calculated as a representation of localised up- and down-regulation of consecutive genes. Regions were initially identified by defining a ‘core’ region, where the median t-statistic was above 4. These core regions were then extended bi-directionally to include flanking genes that also had positive t-statistics for a change in expression, thus identifying additional loci where transcriptional activation occurs. Forty-three activated domains were found using this process.

An example of this identification process is given in Figure 3.1 for genes on chromosome 7. In Figure 3.1.A, the median t-statistic is plotted for loci across the entire chromosome, where it is clear that only two regions fit the given criteria of an activated domain. Similar plots for all chromosomes are featured in Appendix 9.2. Figure 3.1.B further elaborates on the applied extension principles where 5 additional genes are attached to the initial “core” region, as indicated by the dashed arrow.

Figure 3.1 Identification of activated domains in prostate cancer cells A) The median t-statistic representing change in expression from PrEC to LNCaP over 5 genes is plotted for every locus on chromosome 7 (red line). The dotted line marks a median t-statistic of 4, above which activated regions were identified. B) An identified activated region is magnified. The blue bars indicate the calculated t-statistic of change in expression for each gene labelled on the x-axis. The red line is a magnified portion of the median t-statistic shown in A).


The presence of copy number amplifications, which are common in prostate cancer247, could cause apparent regional up-regulation of transcription. To exclude these domains, we used genomic DNA inputs that had been hybridised to Affymetrix Promoter 1.0R arrays to estimate copy number changes between LNCaP and PrEC. Seven regions were removed from further analysis as they showed significantly increased (p<0.05) DNA content in LNCaP cells (Figure 3.2), leaving 35 unamplified and transcriptionally activated regions. Surprisingly, two domains showed a significant loss of DNA copy number (4q31.1 and 14q11.1-q11.2, Figure 3.2) despite regional gene activation. The 35 remaining regions were thus considered bona fide activated domains and were used for the remaining analyses. The characteristics of these regions are summarised in Table 3.2 and elaborated further in this chapter.

Figure 3.2 Identified regions were assessed for copy number amplifications Using input DNA hybridised to Affymetrix Promoter 1.0R arrays, gene-level estimates of changes in copy number between PrEC and LNCaP were calculated and are plotted here as the corresponding log10 p-value. The red line represents a p-value of 0.05, above which changes are considered significant.

3.4.2 Validation of activated regions

To confirm the existence and veracity of these activated domains, several approaches were taken. First, gene expression of transcripts from 4 example domains was validated with q-RT-PCR, using RNA from LNCaP and PrEC cells. SYBR green was used to fluorescently monitor amplification, and primers that interrogate the 18S transcript were used for normalisation purposes. From the obtained data it was apparent that all genes in the chosen regions were highly up-regulated in LNCaP when compared to PrEC (Figure 3.3). It was therefore concluded that array based quantification of gene expression accurately represented the relative amount of RNA in these cells.
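The text states only that q-RT-PCR values were normalised to 18S. As a hedged illustration of how such a normalisation is commonly computed, the sketch below uses the 2^-ΔΔCt convention; this convention, the function name and the assumption of 100% amplification efficiency are choices made here and may differ from the calculation actually used.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target transcript in a sample versus a control.

    Each Ct is first normalised to the reference transcript (here 18S),
    then the sample is compared with the control (here LNCaP vs PrEC).
    Assumes perfect doubling per cycle, which is an idealisation.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)
```

For example, a target that crosses threshold 5 cycles earlier in LNCaP than in PrEC, with identical 18S values, corresponds to a 32-fold increase under these assumptions.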

Figure 3.3 Transcriptional activation was validated with q-RT-PCR Data in the left panels represents Affymetrix expression data, with the grey box indicating values for which expression is considered background. Note that these values represent log2 changes in expression. Data on the right shows q-RT-PCR validation of corresponding genes, normalised to 18S in each sample.


Second, to examine the persistence of these activated domains in other prostate cancer model systems, we carried out gene expression studies using DU145 and PC3 on the Affymetrix Human Gene 1.0ST platform. Data obtained was processed as with LNCaP and PrEC. Of the activated regions identified in LNCaP, 43% were consistently activated in PC3 cells and 57% were activated in the DU145 cell line when compared to normal PrEC cells (Table 3.2).

Study Name | Normal | Cancer | Reference
Dhanasekaran | 12 | 36 | Delineation of prognostic biomarkers in prostate cancer248
Lapointe | 41 | 71 | Gene expression profiling identifies clinically relevant subtypes of prostate cancer249
Liu | 13 | 44 | Sex-determining region Y box 4 is a transforming oncogene in human prostate cancer cells250
Luo | 15 | 15 | Human prostate cancer and benign prostatic hyperplasia: molecular dissection by gene expression profiling251
Singh | 50 | 52 | Gene expression correlates of clinical prostate cancer behavior252
Tomlins | 28 | 49 | Integrative molecular concept modeling of prostate cancer progression253
Vanaja | 8 | 32 | Transcriptional silencing of zinc finger protein 185 identified by expression profiling is associated with prostate cancer progression254
Varambally | 6 | 13 | Integrative genomic and proteomic analysis of prostate cancer reveals signatures of metastatic progression255
Welsh | 9 | 25 | Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer256
Yu | 5 | 6 | Gene expression alterations in prostate cancer predicting tumor aggression and preceding development of malignancy257

Table 3.1 Oncomine dataset characteristics Expression data from 11 large clinical studies were obtained from the Oncomine database. ‘Normal’ and ‘Cancer’ refer to the number of respective samples in each study.

Finally, to investigate if these regions were transcriptionally up-regulated in clinically relevant prostate cancer samples, we analysed gene expression from eleven large datasets obtained from the Oncomine database248-258. These datasets were selected as they contained gene expression data for both tumour and (unmatched) normal samples. The datasets and their respective number of samples are clarified in Table 3.1, interrogating a total of 187 normal and 343 cancer tissues. Matching cancer and normal probes from individual datasets were called as unchanged, up-regulated in cancer or down-regulated in cancer, and remapped to hg18. These comparative values were plotted for all aforementioned transcriptionally activated domains in LNCaP, as can be seen for representative examples in Figure 3.4. Mapped changes from these Oncomine datasets for all regions can be seen in Appendix 9.3. Using this approach I found that 74% (26/35) of identified domains are consistently activated in clinical prostate cancer (summarised in Table 3.2), suggesting a commonality of genesis.
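The summary figure can be checked against the “% Up in Oncomine” column of Table 3.2. The cutoff used to call a domain “consistently activated” is not stated in the text; the 50% threshold below is an assumption, chosen because it reproduces the reported 26 of 35 domains. The variable names are illustrative only.

```python
# "% Up in Oncomine" values for regions 1-35, transcribed from Table 3.2.
pct_up = [70, 68.8, 66.7, 70, 40, 60, 62.5, 50, 50, 66.7,
          55.6, 16.7, 80, 58.3, 50, 55.6, 40, 60, 75, 44.4,
          50, 0, 70, 100, 66.7, 90, 50, 100, 64.3, 33.3,
          50, 0, 46.4, 62.5, 28.6]

CUTOFF = 50  # assumed threshold, not stated in the thesis

n_regions = len(pct_up)
n_activated = sum(p >= CUTOFF for p in pct_up)
fraction_activated = n_activated / n_regions
mean_pct_up = sum(pct_up) / n_regions

print(n_activated, n_regions, round(100 * fraction_activated), round(mean_pct_up, 2))
```

Under this assumed cutoff the count matches the 26/35 (74%) quoted above, and the column mean matches the 55.77 given in the table's “Average” row.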

Figure 3.4 Oncomine prostate cancer studies validation Summarised data extracted from 11 Oncomine prostate cancer studies is plotted for 4 example transcriptionally up-regulated domains. Red, green and grey boxes represent probe-sets with increased, decreased, or unchanged expression respectively. Coloured boxes are aligned to the RefSeq transcripts below the plot. Note, due to the nature of these array datasets, not all genes present are represented in the Oncomine database.

Region | Band | Chromosome Coordinates | Size (kb) | # Genes | % Up in Oncomine | miRNA/Oncogene | Genes
1 * ^ | 1q21.2 | chr1: 149,035,308 - 149,234,738 | 199.4 | 5 | 70 | ARNT | CTSK, ARNT, SETDB1, LASS2, ANXA9
2 * ^ | 1q23.3 | chr1: 159,357,388 - 159,457,113 | 99.7 | 8 | 68.8 | | DEDD, UFC1, USP21, PPOX, B4GALT3, ADAMTS4, NDUFS2, FCER1G
3 * ^ | 3q13.2-q13.31 | chr3: 114,948,598 - 115,164,901 | 216.3 | 3 | 66.7 | | ATP6V1A, GRAMD1C, ZDHHC23
4 * ^ | 3q13.33 | chr3: 121,797,818 - 122,748,178 | 950.4 | 6 | 70 | | NDUFB4, HGD, RABL3, GTF2E1, STXBP5L, POLQ
5 | 4q13.2 | chr4: 68,995,776 - 70,396,212 | 1400.4 | 11 | 40 | | TMPRSS11E, UGT2B17, UGT2B15, TMPRSS11E, UGT2B15, UGT2B10, UGT2A3, UGT2B11, UGT2B7, UGT2B28, UGT2B4
6 * | 4q22.1 | chr4: 89,398,960 - 89,848,709 | 449.7 | 5 | 60 | | PPM1K, HERC6, HERC5, PIGY, HERC3
7 * | 4q31.1 | chr4: 140,594,411 - 141,294,683 | 700.3 | 5 | 62.5 | | RAB33B, SETD7, MGST2, H3F3A, MAML3
8 ^ | 5p13.2 | chr5: 36,139,171 - 36,724,193 | 585 | 5 | 50 | mir-580 | LMBRD2, SKP2, C5orf33, RANBP3L, SLC1A3
9 ^ | 6q21 | chr6: 110,608,037 - 111,453,996 | 846 | 8 | 50 | | CDC40, C6orf186, DDO, SLC22A16, CDC2L6, AMD1, GTF3C6, BXDC1
10 * ^ | 7p12.2-p12.1 | chr7: 49,947,565 - 53,224,113 | 3276.5 | 10 | 66.7 | IKZF1 | ZPBP, LOC100130988, LOC100130988, IKZF1, FIGNL1, DDC, LOC100129427, GRB10, COBL, DKFZp564N2472
11 * ^ | 7q21.13-q21.2 | chr7: 89,621,625 - 91,577,925 | 1956.3 | 9 | 55.6 | AKAP9 | STEAP1, STEAP2, C7orf63, GTPBP10, CLDN12, PFTK1, FZD1, MTERF, AKAP9
12 * ^ | 8p21.2-p21.1 | chr8: 27,224,916 - 27,528,288 | 303.4 | 4 | 16.7 | | PTK2B, CHRNA2, EPHX2, CLU
13 * ^ | 8q21.13 | chr8: 80,685,869 - 82,186,858 | 1501 | 5 | 80 | | STMN2, HEY1, MRPS28, TPD52, ZBTB10
14 | 9q21.13-q21.2 | chr9: 78,190,253 - 79,453,043 | 1262.8 | 9 | 58.3 | | RFK, GCNT1, PCA3, PRUNE2, FOXB2, LOC645225, VPS13A, LOC642947, GNA14
15 ^ | 10q11.21 | chr10: 43,201,071 - 43,464,332 | 263.3 | 4 | 50 | | HNRNPF, ZNF239, ZNF485, ZNF32
16 * ^ | 11p15.1-p14.3 | chr11: 20,365,679 - 25,566,989 | 5201.3 | 10 | 55.6 | FANCF | PRMT3, SLC6A5, NELL1, ANO5, SLC17A6, FANCF, GAS2, SVIP, LUZP2, LOC554234
17 * ^ | 12p11.21 | chr12: 31,118,046 - 31,773,319 | 655.3 | 8 | 40 | | DDX11, OVOS2, FAM60A, FLJ13224, DENND5B, AK3L1, C12orf72, AMN1
18 * ^ | 12q14.2-q14.3 | chr12: 63,084,500 - 63,801,383 | 716.9 | 6 | 60 | mir-548c | XPOT, TBK1, RASSF3, GNS, TBC1D30, WIF1
19 * ^ | 12q21.31 | chr12: 80,177,487 - 82,052,212 | 1874.7 | 4 | 75 | | PPFIA2, CCDC59, C12orf26, TMTC2
20 * ^ | 12q23.2 | chr12: 100,073,402 - 100,979,977 | 906.6 | 10 | 44.4 | | SLC5A8, UTP20, ARL1, SPIC, MYBPC1, CHPT1, SYCP3, GNPTAB, DRAM, CCDC53
21 * | 13q12.12 | chr13: 22,653,091 - 23,779,210 | 1126.1 | 7 | 50 | | SGCG, SACS, TNFRSF19, MIPEP, PCOTH, FLJ46358, SPATA13
22 * ^ | 14q11.1-q11.2 | chr14: 18,447,594 - 19,251,915 | 804.3 | 6 | 0 | | OR11H1, A26C2, LOC440157, OR11H1, A26C2, OR11H1
23 * ^ | 14q13.3-q21.1 | chr14: 36,218,829 - 37,752,019 | 1533.2 | 6 | 70 | | SLC25A21, MIPOL1, FOXA1, C14orf25, TTC6, SSTR1
24 | 15q11.2 | chr15: 22,705,169 - 22,888,117 | 182.9 | 3 | 100 | | SNRPN, SNRPN, SNRPN
25 * ^ | 15q21.1 | chr15: 43,561,968 - 43,689,685 | 127.7 | 3 | 66.7 | C15orf21 | SLC30A4, C15orf21, PLDN
26 | 16p12.2 | chr16: 20,542,060 - 20,768,491 | 226.4 | 5 | 90 | | ACSM1, THUMPD1, ACSM3, EXOD1, LOC81691
27 * ^ | 19p13.2 | chr19: 8,036,287 - 8,293,278 | 257 | 5 | 50 | | FBN3, LASS4, CD320, NDUFA7, RPS28
28 * | 19q13.33 | chr19: 56,020,357 - 56,105,806 | 85.5 | 5 | 100 | KLK2 | KLK15, KLK3, KLK2, KLKP1, KLK4
29 * ^ | 20q11.21-q11.22 | chr20: 31,334,602 - 31,737,871 | 403.3 | 8 | 64.3 | | C20orf114, CDK5RAP1, SNTA1, CBFA2T2, NECAB3, C20orf144, C20orf134, E2F1
30 * ^ | 22q11.21 | chr22: 16,973,242 - 17,159,474 | 186.2 | 5 | 33.3 | | TUBA8, USP18, DKFZP434B061, LOC728212, GGT3P
31 * ^ | 22q11.21 | chr22: 18,681,799 - 19,092,752 | 411 | 10 | 50 | | DGCR6L, LOC728212, DGCR6L, TMEM191B, PI4KAP2, RIMBP3B, LOC728212, LOC728212, USP18, ZNF74
32 * | Xp11.23 | chrX: 48,978,871 - 49,347,307 | 368.4 | 13 | 0 | | CCDC22, FOXP3, PPP1R3F, GAGE4, GAGE13, GAGE12C, GAGE13, GAGE12C, GAGE12B, GAGE12B, GAGE12C, GAGE12C, PAGE1
33 * ^ | Xp22.22 | chrX: 52,689,836 - 54,488,645 | 1798.8 | 20 | 46.4 | mir-98/let-7f / SSX2 | SSX7, SSX2, SSX2, SPANXN5, XAGE5, XAGE3, FAM156A, FAM156A, GPR173, TSPYL2, JARID1C, IQSEC2, SMC1A, RIBC1, HSD17B10, HUWE1, PHF8, FAM120C, WNK3, TSR2
34 * ^ | Xq11.1 | chrX: 62,435,851 - 63,342,349 | 906.5 | 4 | 62.5 | | LOC645251, SPIN4, ARHGEF9, FAM123B
35 * ^ | Xq28 | chrX: 151,033,182 - 151,688,896 | 655.7 | 12 | 28.6 | mir-767/mir-105 | MAGEA5, MAGEA10, GABRA3, GABRQ, MAGEA6, CSAG2, MAGEA2, MAGEA12, CSAG1, MAGEA2, CSAG2, MAGEA3
Average | | | 926.81 | 7.06 | 55.77 | |

Table 3.2 Summary of activated regions in prostate cancer ‘*’ and ‘^’ indicate regions that also contain genes up-regulated in PC3 or DU145 respectively, compared to PrEC cells. Underlined genes indicate those within gene families, bold genes indicate those that have CpG island promoters. Tumour genes are denoted by the Wellcome Trust Sanger Institute Cancer Genome Project.


3.4.3 Activated domains harbour cancer related genes

By using this thorough methodology I identified 35 transcriptionally up-regulated regions that harboured 251 genes. These regions span 32.5 Mb in total (approximately 1% of the genome), range in size from 85.5 kb to 5.2 Mb and have a mean size of approximately 1 Mb. Activated domains were identified on all chromosomes except 2, 17, 18, 21 and the Y chromosome, with chromosomes 7, 11 and 12 having the highest coverage. Each region contained an average of 7 genes, with a mean 5.96 fold change in expression in LNCaP compared to PrEC cells (Table 3.2).

To further characterise distinguishing features of these regions, I compared the occurrence of repeat elements as well as that of gene and CpG island density with a random genomic distribution (Figure 3.5). With specific exceptions, I found no general enrichment of these features across the identified activated regions. I did however find that specific regions 27 (19p13.2) and 29 (20q11.21-q11.22) were enriched for SINE elements; region 35 (Xq28) was enriched for LINE elements and regions 28 (19q13.33) and 30 (22q11.21) were enriched for simple repeats (p<0.05, Figure 3.5).
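The comparison against a random genomic distribution (10,000 permutations with 2.5th and 97.5th percentile bounds, as described for Figure 3.5) can be illustrated with the following sketch. It is a simplified stand-in, not the analysis code: it treats the genome as a single linear coordinate space, and all function and parameter names are hypothetical.

```python
import random
from bisect import bisect_left, bisect_right

def density_per_mb(sorted_positions, start, end):
    """Number of features with start <= position <= end, scaled to 1 Mb."""
    count = bisect_right(sorted_positions, end) - bisect_left(sorted_positions, start)
    return count / ((end - start) / 1e6)

def empirical_bounds(sorted_positions, genome_length, region_size,
                     n_perm=10000, seed=0):
    """2.5th and 97.5th percentile of feature density in random regions.

    Assumes n_perm is large enough (at least 40) for the percentile
    indices below to be meaningful.
    """
    rng = random.Random(seed)
    densities = []
    for _ in range(n_perm):
        start = rng.randrange(0, genome_length - region_size)
        densities.append(density_per_mb(sorted_positions, start, start + region_size))
    densities.sort()
    lower = densities[(25 * n_perm) // 1000]
    upper = densities[(975 * n_perm) // 1000 - 1]
    return lower, upper

def is_enriched(observed_density, bounds):
    """True when the observed density lies above the random upper bound."""
    return observed_density > bounds[1]
```

A region whose normalised feature density falls outside the returned bounds would correspond to an empirical p-value below 0.05 in this simplified setting.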

Notably, of the 35 activated domains I identified, 15% contained gene clusters (Table 3.2). These included the MAGE (Xq28: region 35) and GAGE cancer-testis antigen (Xp11.23: region 32) families, UDP-Glucuronosyltransferases type 2 family genes (UGT2) (4q13.2: region 5), as well as genes from the Kallikrein gene family (KLK) (19q13.33: region 28). Excitingly, we found that several prostate cancer associated genes were also located within these activated regions. Of particular interest was that two of the most sensitive prostate cancer biomarkers, KLK3 (also known as the Prostate Specific Antigen)259 and PCA3 (Prostate Cancer Antigen 3)260,261, were located within regions 28 and 14 respectively. Transcriptional quantification of both these genes is included in Figure 3.3. Loci encoding miRNAs were also found within identified domains, and included let-7f, a member of the well-studied let-7 family, which possesses tumour suppressor and antigrowth activity in prostate cancer262, and miR-98, which has been reported to potentially target EZH2263 (Table 3.2). In addition, these activated domains harboured several genes, including C15orf21, KLK2 and MIPOL1, which are implicated in translocations and gene fusions with Ets-transcription factors in prostate cancer264-266. As the translocation of the Ets-transcription factor ETV1 into an intron of MIPOL1 (14q13.3-q21.1: region 23) is known to occur in LNCaP cells, I sought to replicate this finding to ensure the appropriateness of our model. Using primers that specifically amplify the translocated product, I confirmed that this genomic event occurs in LNCaP cells and not in PrEC cells (Figure 3.6).

Figure 3.5 Genomic characteristics of activated domains Abundance of genomic features found within the boundaries of each activated domain was normalised to a region size of 1 Mb. “Random” represents 10,000 randomised permutations, with the red dashed line demarcating the 2.5th and 97.5th percentiles of this randomisation, serving as an empirical p-value of 0.05 for any feature abundance found outside this range.


Figure 3.6 Confirmation of the LNCaP specific translocation of the ETV1 locus Primers used corresponded to a locus upstream of ETV1 on chromosome 7, and to the first intron of MIPOL1 on chromosome 14. PCR confirmed a fusion product in DNA from LNCaP, but not from PrEC. GSTP1 primers were used as a positive PCR control for an expected negative fusion PCR product in PrEC.

Cancer related genes that are not specifically associated with prostate cancer were also identified in these activated domains. These include IKZF1 (7p12.2-p12.1: region 10)267, FANCF (11p15.1-p14.3: region 16)268, ARNT (1q21.2: region 1)269, AKAP9 (7q21.13-q21.2: region 11)270, and SSX2 (Xp22.22: region 33)271 (Table 3.2). Functional annotation clustering of genes in activated domains using DAVID analysis272,273 failed to demonstrate significant enrichment for any tumour associated pathways.

3.4.4 Epigenomic analyses of activated domains

To assess if these 35 activated regions also exhibited significant epigenomic changes, we undertook chromatin immunoprecipitation (ChIP) coupled with array hybridization in a combined technique known as ChIP-on-chip. The ChIP protocol was carried out against several histone modifications, including H3K9ac, H3K4me3, H3K9me2 and H3K27me3. Immunoprecipitated DNA was hybridized to Affymetrix Human Promoter 1.0R arrays, thus facilitating a genome wide, promoter focused quantification of epigenetic enrichment.

Genome wide DNA methylation was determined via a technique known as MBDCap-Seq. MBDCap-Seq uses the Methyl-Binding Domain of the MBD2 protein as a surrogate for an immunoprecipitating antibody, with the enriched material being subsequently sequenced, as described in Nair et al.232.

Figure 3.7 Concordant epigenetic modification at the KLK locus Histone modifications (H3K9ac, H3K4me3, H3K27me3, and H3K9me2) and DNA methylation profiles (MBDCap-seq) are shown for each TSS from the Kallikrein gene subfamily in region 28 (chromosome 19). For each gene and each modification, the enrichment over input status is shown (green, PrEC; red, LNCaP; black, differential LNCaP-PrEC). Black arrows mark the TSS for each gene. Note that KLKP1 has no probes on the Affymetrix promoter array.


ChIP-on-chip and MBDCap-Seq signals for the 85 kb Kallikrein region (19q13.33: region 28) can be seen in Figure 3.7, where enrichment in LNCaP and PrEC and the differential between them are represented. It is readily apparent that a broad restructuring of the epigenetic landscape occurs in this region, leading to a highly activated genomic locus. Specifically, levels of H3K9ac are increased across all represented gene promoters while the polycomb-associated H3K27me3 modification is simultaneously reduced. Changes in H3K9me2 and H3K4me3 were more discrete, with losses of H3K9me2 at the promoter of KLK2, and gains of H3K4me3 at the promoter of KLK4. There was minimal change of DNA methylation in regions of low CpG density (non-CpG islands), or in regions of high CpG density, including CpG_28 (in the body of KLK15), which remained methylated, and CpG_35 (4.6 kb upstream of KLK15), which remained unmethylated. Specific analysis of methylation in this region will be discussed further in Chapter 4.

3.4.5 Activated domains are epigenetically deregulated

To expose the epigenetic mechanisms that underscore the activation of these 35 identified domains, we collectively analysed histone modification enrichment at the 251 contained gene promoters. Using the ‘significancePlots’ function from the Repitools R package241, plots were generated that describe the relative enrichment of a given histone modification around the TSS for a given set of loci. Specifically, 1000 randomised sets of 251 TSSs had the average enrichment for a given modification calculated for each genomic coordinate relative to the TSS. The 2.5th percentile, 97.5th percentile and median of these random calculations were plotted in Figure 3.8. The average enrichment for a given modification was then calculated for the gene list of interest, here represented by the 251 genes found within activated domains. Where the line that represents the gene list of interest departs from the plotted random distribution, it denotes an empirically significant difference at that relative coordinate. Using this approach, a significant enrichment of the active H3K9ac modification was seen +/- 1000 bp from the start of transcription. H3K27me3 reveals a general depletion over the entire locus, while H3K4me3 and H3K9me2 only show focal regions of enrichment and depletion respectively (Figure 3.8).

Figure 3.8 Generalised epigenetic changes in activated domains The black line represents the average signal change of all genes in activated regions, smoothed over 600 bp and plotted in 100 bp increments over the TSS (-7000 bp to +2500 bp). The shaded region represents 95% of 1000 randomised gene lists of the same size (bounded by the 2.5th and 97.5th percentiles of the distribution); the dotted line indicates the median of this randomisation. The shaded region therefore represents the internal limits of an empirical p-value of 0.05 for each coordinate.
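The randomised gene-set envelope described above can be sketched in a few lines. This is a Python illustration of the idea only; it is not the Repitools ‘significancePlots’ implementation, and the function names, the data layout (one list of enrichment values per gene, one value per coordinate relative to the TSS) and the percentile indexing are assumptions made for the example.

```python
import random
from statistics import mean, median

def random_envelope(profiles, set_size, n_sets=1000, seed=0):
    """Per-coordinate (2.5th percentile, median, 97.5th percentile) of the
    mean enrichment across randomly drawn gene sets of a fixed size.

    Assumes n_sets is at least 40 so that the percentile indices are valid.
    """
    rng = random.Random(seed)
    n_pos = len(profiles[0])
    simulated = [[] for _ in range(n_pos)]
    for _ in range(n_sets):
        sample = rng.sample(profiles, set_size)
        for k in range(n_pos):
            simulated[k].append(mean(gene[k] for gene in sample))
    envelope = []
    for values in simulated:
        values.sort()
        lower = values[(25 * n_sets) // 1000]
        upper = values[(975 * n_sets) // 1000 - 1]
        envelope.append((lower, median(values), upper))
    return envelope

def significant_positions(set_mean_profile, envelope):
    """Indices where the gene list of interest leaves the random envelope."""
    return [k for k, (lower, _, upper) in enumerate(envelope)
            if set_mean_profile[k] < lower or set_mean_profile[k] > upper]
```

Coordinates returned by `significant_positions` correspond to the points at which the plotted line for the gene list of interest would sit outside the shaded region.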

While these data suggest there are general epigenetic mechanisms of regional activation in the identified domains, they do not preclude different subsets of regions being differently regulated. To evaluate this possibility, we first summarized enrichment for each epigenetic modification in 1000 bp blocks surrounding the transcription start site for each gene in the activated domains. Specific blocks were designed to cover the promoter region at: -2500 bp -> -1500 bp, -1500 bp -> -500 bp, -500 bp -> +500 bp, +500 bp -> +1500 bp. This methodology is illustrated in Figure 3.9 for the PRUNE2 gene (9q21.13-q21.2: region 14), where it is clear that these promoter blocks gain active histone modifications and lose repressive modifications.

Figure 3.9 Blocks of epigenetic change surrounding the TSS Differential enrichment (LNCaP-PrEC) of H3K9ac, H3K4me3, H3K27me3 and H3K9me2 is shown for individual probes around the TSS of PRUNE2. “Blocks” of change are calculated by summarising this differential in 1000 bp divisions; the degree of change is indicated by the solid, coloured rectangles.
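The block summarisation around the TSS can be sketched as below. The block boundaries follow the text; summarising each block by the mean of its probe scores, the half-open interval convention and the function name are assumptions made for illustration.

```python
from statistics import mean

# Promoter blocks relative to the TSS, in bp, as listed in the text.
BLOCKS = [(-2500, -1500), (-1500, -500), (-500, 500), (500, 1500)]

def summarise_blocks(probes, tss, strand="+"):
    """Summarise probe-level differential scores into promoter blocks.

    probes: iterable of (genomic_position, differential_score) pairs.
    Returns one value per block (None where a block has no probes).
    Assumption: a block is summarised by the mean of its probe scores.
    """
    summary = []
    for lower, upper in BLOCKS:
        scores = []
        for position, score in probes:
            # Orient coordinates so that positive values are downstream.
            relative = position - tss if strand == "+" else tss - position
            if lower <= relative < upper:
                scores.append(score)
        summary.append(mean(scores) if scores else None)
    return summary
```

Applied per gene and per modification, this yields the four values per promoter that a block-wise heat map would display.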

To characterize each region’s domain level epigenetic profile, we computed p-values that represented the significance with which the genes within a region changed for a given chromatin modification in a particular direction. Using this calculation, we designated which regions had significantly changed over all the contained genes for a given epigenetic mark. I identified that epigenetic changes can occur in blocks of consecutive genes, and that this is often found in the context of multiple epigenetic changes in various combinations (Figure 3.10). For example, region 5 (4q13.2; UDP-Glucuronosyltransferase gene family), region 23 (14q13.3; includes SLC25A21, MIPOL1, FOXA1, C14orf25, TTC6, and SSTR1) and region 28 (19q13.33; Kallikrein gene family) all display a regional exchange of the repressive H3K27me3 modification for the active H3K9ac (Figure 3.10.A). In contrast, region 4 (3q13.33), region 9 (6q21) and region 13 (8q21.13) all show global increases in both H3K9ac and H3K4me3, exhibiting an epigenetic reinforcement phenotype (Figure 3.10.B). In addition to those regions that exhibit combinations of epigenetic alterations, many activated regions show global changes predominantly in only one epigenetic mark. For example, regions 14 (9q21.13-q21.2) and 26 (16p12.2) are specifically depleted of H3K27me3 (Figure 3.10.C), regions 3 (3q13.2-q13.31) and 35 (Xq28) are depleted in H3K9me2 (Figure 3.10.D), while regions 11 (7q21.13-q21.2) and 25 (15q21.1) show a significant gain of H3K9ac (Figure 3.10.E).
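The statistical test behind the region-level p-values is not specified in the text. Purely as an illustration of how a directional, region-wide change could be scored, the sketch below uses a one-sided sign test; this choice of test and the function name are assumptions and should not be read as the method used in this chapter.

```python
from math import comb

def sign_test_p(n_changed_in_direction, n_genes):
    """One-sided binomial tail probability P(X >= k), X ~ Binomial(n, 0.5).

    Interpreted here as the chance that at least k of n genes in a region
    change in the same direction if gains and losses were equally likely.
    """
    k, n = n_changed_in_direction, n_genes
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
```

For a six-gene region in which every gene gains a given mark, this gives 1/64 (about 0.016); with five of six genes it gives 7/64 (about 0.109).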

Figure 3.10 Clusters of epigenetic changes in activated regions Histone modification changes between LNCaP and PrEC cells are plotted across the TSS for each gene in activated domains, and various chromatin modes are identified. (A) Gain of H3K9ac and loss of H3K27me3. (B) Gain in both H3K9ac and H3K4me3. (C–E) Concordant change in only one histone mark (H3K27me3, H3K9me2, or H3K9ac). Each row represents a single gene (named). Heat maps are divided into four blocks showing changes in model-based analysis of tiling-array (MAT) scores at fixed intervals (-2,500 to -1,500, -1,500 to -500, -500 to +500, and +500 to +1,500 bp) relative to the TSS (arrow), for each modification. Relative scale is shown in the bottom panel (green, loss; red, gain; black, not represented on the array). Dotted boxes indicate regions with significant change between LNCaP and PrEC (p < 0.1).


3.4.6 Epigenetic changes occur in unison

To ascertain if regional epigenetic deregulation in cancer is confined to domains of transcriptional activation or describes a more general epigenomic phenotype, I investigated whether the changes individual genes experience are influenced by the local epigenetic environment. To this end, we quantified the incidence of epigenomic remodelling in adjacent genes in LNCaP as compared to PrEC cells over the whole genome. Figure 3.11 shows, in blocks surrounding the TSS, the observed number of adjacent genes that experience the same significant directional epigenetic change compared to an expected random distribution. A significant directional change was designated as a t-statistic, representing the differential enrichment in LNCaP compared to PrEC, of greater than +2 or less than -2 for a gain or loss of a given epigenetic mark respectively. I found that at each position relative to the TSS, all of the assessed epigenetic alterations exhibited a significantly higher frequency of occurring in neighbouring pairs than would be expected by chance (p<0.01, Figure 3.11). Both increases and decreases of H3K9ac and H3K27me3 were found to be significantly enriched over the entire promoter sequence. Pairs of H3K4me3 and H3K9me2 changes, by contrast, were found to generally be enriched at the TSS or immediately downstream. Surprisingly, decreases in DNA methylation were found to have the highest significance for enrichment of changes of any epigenetic mark (p < 1 x 10^-43 over the whole promoter).
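The neighbouring-pair analysis can be illustrated with a small permutation sketch. The +/-2 cutoff follows the text; the use of label shuffling to obtain the expected count, the treatment of all genes as one ordered list (ignoring chromosome boundaries) and the function names are assumptions for the example rather than the procedure actually used.

```python
import random

def call_change(t_stat, cutoff=2.0):
    """+1 for a gain (t > cutoff), -1 for a loss (t < -cutoff), else 0."""
    if t_stat > cutoff:
        return 1
    if t_stat < -cutoff:
        return -1
    return 0

def count_concordant_pairs(calls, direction):
    """Number of adjacent gene pairs that both change in `direction`."""
    return sum(1 for a, b in zip(calls, calls[1:])
               if a == direction and b == direction)

def neighbour_enrichment(t_stats, direction, n_perm=1000, seed=0):
    """Observed count, permutation-based expected count and empirical p."""
    rng = random.Random(seed)
    calls = [call_change(t) for t in t_stats]
    observed = count_concordant_pairs(calls, direction)
    shuffled = list(calls)
    counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any relationship with gene order
        counts.append(count_concordant_pairs(shuffled, direction))
    expected = sum(counts) / n_perm
    p_value = (1 + sum(c >= observed for c in counts)) / (n_perm + 1)
    return observed, expected, p_value
```

With this approach the smallest attainable p-value is 1/(n_perm + 1), so very small p-values such as those quoted above would require an analytical test rather than a modest number of permutations.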

Figure 3.11 Epigenetic change in cancer is influenced by neighbouring genes Observed and expected counts of the incidence of neighbouring, similar epigenetic changes are plotted for all interrogated modifications at 4 positions surrounding the TSS (+/- 500bp for each indicated position). For H3K9ac, H3K9me2, H3K27me3 and H3K4me3, counts of changes were considered to be significant when the t-statistic was greater (increasing change; red) or less than (decreasing change; green) a value of +/-2 respectively for change from PrEC to LNCaP. For MBDCap-Seq, values were determined as being greater or less than a Z-score of +/-2.


3.5 Discussion

The study of epigenomics in cancer has traditionally concentrated on aberrations that eventuate in gene silencing. This has in large part been due to early discoveries of CpG island hypermethylation with associated gene repression. In addition, many founding studies of epigenomics in cancer development have looked towards a deregulated polycomb complex, which is known to effect gene silencing. The Clark laboratory has also contributed to this inclination, detailing in high resolution the existence of large regions of the genome that are epigenetically silenced through LRES. The hallmarks of LRES, as elaborated previously, are an enrichment of silencing modifications H3K9me2 and H3K27me3, and depletion of active modifications such as H3K9ac. Given that genes are both transcriptionally activated and repressed in tumourigenesis, I aimed to investigate the possibility of a significantly activated epigenomic landscape in cancer.

The approach undertaken in this chapter has detailed several important discoveries. First, through integration of gene expression, chromatin and DNA methylation genome-wide profiles, I identified 35 regions of transcriptional activation that are significantly epigenetically remodelled in prostate cancer. Due to this characteristic, we classified these as domains of Long Range Epigenetic Activation (LREA). These LREA regions harbour 251 genes in LNCaP and include multiple gene families and tumour related genes. Loci that code for miRNAs were also discovered within LREA regions, including cancer related miRNAs such as let-7f and miR-98, the latter of which is known to target the polycomb protein EZH2263. Of particular consequence is that two of the most prominent prostate cancer biomarkers, KLK3 (PSA) and PCA3, were found embedded in LREA regions. Although KLK3 can be expressed in normal prostate tissues, it is highly over-expressed in cancer. The epigenetics of this transcriptional progression are yet to be adequately described. While it is known that this gene is regulated by androgens via histone acetylation274, we describe for the first time that the locus undergoes dramatic and large scale epigenetic remodelling. PCA3 is a noncoding RNA that is extraordinarily prostate cancer specific275. Little is known about the transcriptional regulation of PCA3, but reports suggest its regulation is independent of the overlapping PRUNE2 gene276. Here, it is shown that the surrounding PRUNE2 locus becomes depleted of the repressive H3K27me3 modification in cancer cells, indicating that epigenetic remodelling may contribute to the biomarker’s prostate cancer specific activation.

In addition to these known biomarkers, we identified several genes and loci in these remodelled LREA regions that are commonly involved in oncogenic Ets transcription factor translocations in prostate cancer. KLK2 is known to create fusions with ETV4264, while C15orf21 and MIPOL1 are implicated in genomic translocations with ETV1266. While the translocation event between ETV1 and MIPOL1 occurs in LNCaP cells (Figure 3.6), fusion events involving KLK2 and C15orf21 are not apparent. I hypothesise that the presence of these loci in ectopically accessible LREA regions may potentially favour genetic instability and consequently prime these genes for genomic rearrangement in prostate tumourigenesis.

The second key finding of the study is that LREA domains showed extensive changes in chromatin remodelling and could be divided into two prominent modes of histone modification alterations. Mode 1 is characterised by an exchange of repressive for active histone modifications. Specifically, we see an enrichment of the active histone modification H3K9ac and a depletion of the repressive polycomb H3K27me3 modification (Figure 3.12). This was particularly prominent in the exemplified Kallikrein region (19q13.33: region 28, Figure 3.7), the mechanisms of which will be discussed in greater detail in Chapter 5. Mode 2 is categorized by an enrichment of multiple active modifications, specifically those of H3K9ac and H3K4me3 (Figure 3.12), indicating a regionally controlled directive towards active transcription and accessibility.

The identified characteristics of LREA domains are of particular interest when compared to the previously discovered LRES domains. Similar to LREA domains, LRES was identified in the context of the LNCaP cell model by the Clark laboratory140. These domains were distinguished by the loss of H3K9ac as well as the gain of repressive modifications such as H3K9me2 and H3K27me3. This discovery, in conjunction with the data outlined in this chapter, suggests that both LREA and LRES occur in the same cell and are typified by similar yet contrary epigenomic transitions. Furthermore, this suggestion presumes the possibility of a common deregulated pathway in prostate tumourigenesis, which can result in domains of transcriptional activation and repression occurring simultaneously.

Figure 3.12 Model of typical LREA epigenetic changes Two main modes of combinatorial epigenetic change were identified in LREA regions. Mode 1 is characterised by an exchange of the repressive polycomb modifications for active histone acetylation. Mode 2 demonstrates an enrichment of multiple active epigenetic motifs in the cancer epigenome.


Finally, evidence suggesting regional genomic deregulation is further reinforced by the “pairwise” analysis of chromatin change, showing that neighbouring promoters undergo similar epigenetic remodelling in cancer (Figure 3.11). Importantly, neighbouring remodelling was significant for both enrichment and depletion of all the studied epigenetic modifications. The cause of this regional bi-directional epigenetic remodelling is still unclear, but it is possible that factors that typically organise the genome into epigenetically and transcriptionally appropriate domains themselves become deregulated. There are currently several organisational processes that I hypothesise could be causally related to the establishment of LREA and LRES in cancer, of which disruption of chromatin looping and replication timing are key candidates and which will be discussed further in this thesis (Chapter 5 and Chapters 6-7 respectively).


CHAPTER 4: DNA METHYLATION AND GENE ACTIVATION

CHAPTER 4 DNA Methylation and Gene Activation


4.1 Introduction

DNA methylation at CpG dinucleotides is functionally related to gene expression via a number of pathways (see Section 1.2.2). Traditionally this relationship consisted of gene silencing mediated by hypermethylation of CpG islands at gene promoters277; a mechanism of silencing restricted to a subset of developmentally controlled genes, imprinted regions and X-chromosome inactivation in normal cells278. With the advent of technologies that could sensitively assess genomic DNA methylation279 such as whole genome bisulphite sequencing (WGBS), enrichment based sequencing and custom microarrays, a relationship between increased DNA methylation of gene bodies and gene activation became apparent33,37. This correlation occurs preferentially at exons and is exclusive of intragenic CpG islands, whose methylation potentially serves to regulate ectopic and alternative promoter usage36. Thus, in normal cellular development the deposition and maintenance of DNA methylation may play varying roles depending on genomic context.

The development of cancer, however, is witness to a series of anomalous methylation transitions (Figure 4.1, Section 1.4.1). Genome-wide hypomethylation, initially reported by Feinberg et al.131, is one of the primary epigenetic aberrations found in tumours and is traditionally attributed to demethylation of the pervasive LINE-1 elements, as well as other repeat sequences39,280. In contrast, and as has been elaborated previously, promoter associated CpG islands exhibit widespread hypermethylation and gene silencing. Due to this, and as many genes become transcriptionally up-regulated in cancer, it was widely hypothesised that hypomethylation of CpG sites could be associated with gene activation. In support of this hypothesis, demethylation of repeats has been causally implicated in the activation of alternative transcripts132 and over-expression of the CSF1R oncogene133 in cancer. In addition, CpG demethylation of gene promoters has been shown for several individual genes in cancer, including R-RAS281 and cancer-testis antigens, such as the MAGE, GAGE and XAGE families282,283. There are as yet no definitive genome-wide studies that assess the relationship between cancer specific DNA hypomethylation and gene activation. In this chapter I intend to investigate the prevalence of this hypomethylation at CpG islands in prostate cancer. The research presented here has now been published in Cancer Cell: “Regional activation of the cancer genome by long-range epigenetic remodelling” (2013)145.

Figure 4.1 Proposed cancer-specific DNA methylation changes While hypermethylation of CpG islands with associated gene silencing is well documented in prostate cancer, the identification of hypomethylated CpG islands and gene activation has remained elusive.


4.2 Aim

The aim of this chapter is to document the relationship between gene activation and alterations in CpG island methylation in the prostate cancer genome.

Specific aims:
i) To identify and validate promoter-associated CpG islands that become hypomethylated in LNCaP cells.
ii) To quantitatively examine DNA methylation of CpG islands found within LREA domains.
iii) To investigate the relationship between CpG island hypermethylation and gene activation in prostate cancer.

4.3 Methods

4.3.1 Clonal Bisulphite Sequencing (CBS)

Bisulphite specific PCR (Section 2.6.2) was carried out and products were purified using the Promega Wizard PCR purification system as described (Section 2.5.2). Purified DNA was eluted with 50 μl TE buffer and maintained at 4°C. DNA was ligated into the appropriate vector using the pGEM-T Easy Vector System (#A1360, Promega). Briefly, 1.5 μl of PCR product was added to 25 ng pGEM-T Easy vector, 1.5 Weiss U T4 DNA ligase and 1X Rapid Ligation Buffer in a total volume of 5 μl. The ligation mixture was gently mixed and incubated at room temperature for 1 hr. Samples were subsequently maintained at 4°C.

2 μl ligated DNA was transformed by heat shock into 48 μl E. coli DH5αSEM competent cells by incubation on ice for 20 min, at 42°C for 45 sec and on ice for a further 2 min. 950 μl sterile LB was added to each sample, and cells were incubated at 37°C with 150 rpm shaking for 1.5 hr. 50 μl cells were streaked on to LB-agar plates which contained freshly added 40 μg/ml IPTG (#1882, Bio Vectra), 50 μg/ml X-gal (#1161, Bio Vectra) and 100 μg/ml Ampicillin (#A9393, Sigma-Aldrich). Plates were incubated at 37°C for 16-24 hr. White colonies, which denoted successful ligation and transformation, were picked for amplification and purification using the Miniprep procedure (Section 2.9.1).

Purified plasmids were sequenced according to Section 2.9.2, using the Sp6 promoter primer sequence. Successfully sequenced products were analysed using the BiQ Analyzer software tool (Max Planck Institut Informatik)284. This software package aligns bisulphite converted PCR products to the unconverted genomic sequence and filters out products with a non-CpG cytosine conversion of less than 90%. Unmethylated CpG sites are identified by the conversion of the cytosine to a thymine in the sequenced product.
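The filtering and calling logic described above can be illustrated with a minimal Python sketch. This is not the BiQ Analyzer implementation; it assumes each clone read has already been aligned to the unconverted reference without gaps, and the function name `analyse_clone` and its parameters are hypothetical.

```python
def analyse_clone(reference, read, min_conversion=0.90):
    """Return per-CpG methylation calls for one aligned clone.

    Assumption: `reference` (unconverted genomic sequence) and `read`
    (bisulphite-converted clone sequence) are the same length and gap-free.
    Returns None when the non-CpG conversion rate is below `min_conversion`.
    """
    reference, read = reference.upper(), read.upper()
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to equal length")

    converted = unconverted = 0
    calls = []  # True = methylated CpG, False = unmethylated CpG
    for i, base in enumerate(reference):
        if base != "C":
            continue
        is_cpg = i + 1 < len(reference) and reference[i + 1] == "G"
        if is_cpg:
            if read[i] == "C":
                calls.append(True)    # cytosine protected from conversion
            elif read[i] == "T":
                calls.append(False)   # cytosine converted, read as T
            # any other base is treated as uninformative and skipped
        else:
            if read[i] == "T":
                converted += 1
            elif read[i] == "C":
                unconverted += 1

    informative = converted + unconverted
    # Assumption: a clone with no non-CpG cytosines cannot fail the filter.
    rate = converted / informative if informative else 1.0
    return calls if rate >= min_conversion else None
```

For example, `analyse_clone("ACGTCCATCGAC", "ACGTTTATTGAT")` passes the filter and reports the first CpG as methylated and the second as unmethylated.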

4.3.2 RNA based sequencing

4.3.2.1 RNA-Seq

RNA was extracted from PrEC and LNCaP cells using TRIzol reagent (Section 2.2.2) and integrity was confirmed using an Agilent Bioanalyzer. 1 μg was sequenced by single-end 75 bp directional sequencing using an Illumina Genome Analyzer IIx at the Ramaciotti Centre for Gene Function Analysis, Australia. Reads were mapped to the hg18 human genome build using TopHat285, with RefSeq as the reference transcriptome. Transcripts were assembled with Cufflinks231 using default parameters.

4.3.2.2 CAGE-Seq

RNA was extracted from PrEC and LNCaP cells using TRIzol reagent (Section 2.2.2) and integrity was confirmed using an Agilent Bioanalyzer. CAGE-Seq was performed at Riken Laboratories, Japan through Geneworks, Adelaide, Australia using 50 μg total RNA. 29 bp reads derived from CAGE tags were mapped to the hg18 human genome build using Bowtie231, allowing up to three mismatches.


4.3.3 Transcription factor binding analysis

Regions that showed hypermethylation in LNCaP were manually identified from “Group I” genes. For each region, the TRANSFAC database (Biobase Biological Databases) was used to locate putative transcription factor binding sites (TFBSs) using the MATCH program with preset cut-offs that minimised false positives. We compared the frequency of TFBS occurrence in the set of regions to a control set comprising 1000 CpG islands within 2.5 kb of a TSS. Significance was assigned using the Wilcoxon rank sum test and TFBSs were filtered to those that occurred at least 8 times.
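The comparison and filtering step can be sketched in Python as follows. This is an illustrative reconstruction, not the analysis code used in this chapter: it assumes the input is a list of per-region hit counts for each motif, uses a normal approximation to the Wilcoxon rank sum test with tie correction, and the function names `rank_sum_test` and `enriched_motifs` are hypothetical.

```python
import math
from collections import Counter


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (z, p); z > 0 means values in x tend to be larger than in y.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = sorted(list(x) + list(y))

    # Assign mid-ranks so that tied values share the average of their ranks.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j

    w = sum(rank_of[v] for v in x)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance < 1e-12:          # every value identical: no evidence either way
        return 0.0, 1.0
    z = (w - n1 * (n + 1) / 2) / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))


def enriched_motifs(region_hits, control_hits, min_total=8, alpha=0.05):
    """Keep motifs with at least `min_total` hits that are enriched over control."""
    selected = []
    for motif, counts in region_hits.items():
        total = sum(counts)
        if total < min_total:
            continue
        z, p = rank_sum_test(counts, control_hits[motif])
        if z > 0 and p < alpha:
            selected.append((motif, total, p))
    return sorted(selected, key=lambda row: row[2])
```

The normal approximation is chosen here only to keep the sketch dependency-free; an exact test or a statistics library would be preferable for small region sets.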


4.4 Results

4.4.1 Validation of gene-specific hypomethylation

To initially investigate the relationship between gene activation and promoter associated methylation, we used a technique known as MeDIP-chip (Methylated DNA ImmunoPrecipitation, Section 2.8.3). MeDIP involves the immunoprecipitation of methylation-enriched material via a monoclonal antibody directed against the 5-methylcytosine modified nucleotide. Like the previously described ChIP-on-chip protocol, MeDIP-chip interrogates MeDIP DNA with high density arrays, in this instance, the Affymetrix GeneChip Human Promoter 1.0R array. To identify methylated regions of interest, I used a set of LREA gene promoters as a model for gene activation in prostate cancer. MeDIP-chip data for LNCaP and PrEC was visualised using the IGV browser and promoters that underwent methylation remodelling were identified via visual inspection. In this manner I identified 6 genes in LREA regions that exhibited hypomethylation in LNCaP compared to PrEC (Figure 4.2). Hypomethylation was observed at both promoter associated CpG islands (XPOT, TBK1, FOXA1, SLC30A4) as well as at low CpG density regions (GRB10, C15orf21).

To validate the methylation status of these loci, I used Clonal Bisulphite Sequencing (CBS). CBS takes advantage of the bisulphite conversion of the genome, where all cytosine nucleotides are converted to uracil nucleotides286. This is true for all cytosines regardless of sequence context, with the exception of those with either a 5-hydroxymethyl- or 5-methyl- covalent modification, which remain as a cytosine230,287. After bisulphite conversion, PCR is carried out in triplicate with primers designed against the bisulphite-converted genome at the genomic loci of interest. The resulting PCR reaction contains a mixture of all the amplicons that represent, via differentially converted cytosines, the endogenous state of methylation. The mixture of amplicons was then ligated into the pGEM-T easy vector and transformed into E. coli DH5αSEM competent cells. Transformed colonies were separated on LB agar plates, such that each colony amplified an individual molecule derived from the initial bisulphite PCR. Plasmids were isolated from 12-24 clones using a miniprep procedure and sequenced using Capillary Separation Sequencing. Full details of this method are provided in Section 4.3.1.

Figure 4.2 DNA methylation validation of 6 LREA genes using clonal bisulphite sequencing. The left panels indicate the enrichment of MeDIP-chip signal in PrEC (green) and LNCaP (red) at the promoter region of candidate genes. The indicated arrow represents the annotated RefSeq TSS with the green boxes below indicating the location of CpG islands. The right panels represent CBS results for PrEC and LNCaP, each line indicating the methylation status of an individual PCR clone.

Using CBS I tested the methylation status of gene promoters identified as hypomethylated in the MeDIP-chip data (Figure 4.2). I found that nearly all sequences of GRB10, XPOT, TBK1, FOXA1 and SLC30A4 were unmethylated in both PrEC and LNCaP samples. While hypomethylation could therefore not be validated at these gene promoters, the hypomethylated status of the promoter of C15orf21 was confirmed. As C15orf21 is involved in oncogenic translocations in prostate cancer266, I sought to further characterise any LNCaP specific methylation variations. Figure 4.3 shows the location of several LINE and SINE elements upstream of the C15orf21 promoter. As demethylation of repeat elements is known to be a causal factor in the activation of oncogenic elements in cancer132,133, I used CBS to assess the methylation status of these loci. The results of this investigation, however, revealed that these upstream elements are methylated in both the normal and cancer model cell lines (Figure 4.3), and are thus unlikely to be involved in the transcriptional regulation of this gene.

Figure 4.3 CBS validation of SINE methylation in C15orf21 MeDIP-chip enrichment is shown in PrEC (green) and LNCaP (red) for all represented probesets across the locus. The location of SINE and LINE repeat elements are indicated by black boxes.


Figure 4.4 Methylation distribution at promoter associated CpG islands. Methylation enrichment at CpG islands within 2.5 kb of a TSS was obtained by MBDCap-seq and normalised to the fully methylated SssI sample. The frequency of a given methylation score is plotted vertically; the number of bins is calculated using “Sturges’ formula”288.

4.4.2 CpG island methylation in LREA domains

As the majority of hypomethylated promoters identified by MeDIP-chip were unmethylated in both PrEC and LNCaP, I concluded that, for my purposes, MeDIP-chip was not an accurate indicator of relative CpG methylation at gene promoters. To further assess relative levels of methylation at activated gene promoters, we utilised the MBDCap-Seq methodology (Section 2.8.1). MBDCap-Seq was used to measure enrichment of CpG methylation across the genome in LNCaP and PrEC cells as well as on fully methylated SssI treated DNA. While this method is an annotation independent technique, as with many other sequencing technologies, it is known to have a bias for CpG dense regions232. To quantify these data, therefore, enrichment was summarised for the 20,254 CpG islands in the genome using all reads that were +/- 500 bp from the centre of a given island. To control for variability in endogenous mappability, the MBDCap-Seq enrichment of SssI treated DNA was also calculated at CpG islands. The signal obtained from PrEC and LNCaP was normalised to the fully methylated data to obtain a final ‘score’ of methylation289. For analysis pertinent to gene transcription, CpG islands were confined to those found at gene promoters, defined as being within 2.5 kb of an annotated transcription start site. Using this definition we quantified methylation at 13,193 CpG islands. I found that the majority of these CpG islands were unmethylated in both the LNCaP and PrEC methylome (Figure 4.4).
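The per-island summary can be illustrated with a short Python sketch. The exact scoring formula is given in the cited reference289 and is not reproduced here; the version below simply assumes that the score is the depth-scaled read count of the sample divided by that of the SssI control, and that each read is represented by a single genomic position. Function names and parameters are hypothetical.

```python
def count_reads_near_centre(read_positions, island_start, island_end, flank=500):
    """Count reads whose position lies within +/- `flank` bp of the island centre."""
    centre = (island_start + island_end) // 2
    return sum(1 for pos in read_positions if abs(pos - centre) <= flank)


def methylation_score(sample_count, sssi_count, sample_depth, sssi_depth):
    """Ratio of depth-scaled sample signal to the fully methylated SssI signal.

    Assumption: a simple ratio; returns None where the SssI control has no
    reads, since such an island cannot be assessed.
    """
    if sssi_count == 0:
        return None
    return (sample_count / sample_depth) / (sssi_count / sssi_depth)


# Toy usage: an island spanning 1000-2000 has its centre at 1500.
reads = [100, 900, 1500, 1600, 2500]
n_reads = count_reads_near_centre(reads, 1000, 2000)
score = methylation_score(n_reads, sssi_count=8,
                          sample_depth=1_000_000, sssi_depth=2_000_000)
```

Dividing by the SssI signal is what corrects for regions that capture or map poorly even when fully methylated.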

To investigate the relationship between gene activation and the methylation status of associated CpG islands, an FDR adjusted p-value representing the change in methylation between PrEC and LNCaP was calculated. Figure 4.5 plots the log10 of this p-value against the t-statistic representing change in expression from PrEC to LNCaP for the 133 promoter-associated CpG islands in LREA regions. Dashed red lines represent the log10 of the significance threshold, a p-value of 0.05. Regardless of the significance in change of expression, 75% of CpG islands at LREA gene promoters do not exhibit a significantly altered methylation status in LNCaP. Of all LREA promoter islands, only 2% show a significant change with the anticipated hypomethylated phenotype, while 23% become significantly hypermethylated irrespective of an associated change in expression (Figure 4.5).
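As a worked illustration of the x-axis used in Figure 4.5, the Python sketch below adjusts a list of p-values and converts each to a signed log10 value. The text does not state which FDR procedure was applied, so the Benjamini-Hochberg method is an assumption here, and the function names are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def signed_log10(p_adj, methylation_change):
    """Positive for islands that gain methylation, negative for those that lose it."""
    magnitude = -math.log10(max(p_adj, 1e-300))  # guard against log10(0)
    return magnitude if methylation_change > 0 else -magnitude


# The dashed significance lines then sit at +/- this value (about 1.30).
threshold = -math.log10(0.05)
```

Under this convention an island is called significantly hypermethylated when its signed value exceeds `threshold`, and significantly hypomethylated when it falls below `-threshold`.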

Figure 4.5 Relationship between gene expression and DNA methylation in LREA regions

The t-statistic of expression change from PrEC to LNCaP (y-axis) is plotted against the +/-log10 of the FDR adjusted p-value representing significance of difference in methylation (x-axis, +/- for islands that lose or gain methylation respectively) for all CpG islands within 2.5 kb of a TSS found within LREA regions. The dashed red lines indicate a +/-log10 of a p-value of 0.05. Two KLK4 associated CpG islands are highlighted with red circles.


4.4.3 KLK4 and cancer specific hypomethylation

The only CpG island that exhibited a high degree of hypomethylation in LREA regions was CpG_27, an island found within the KLK4 gene. This was in stark contrast to the KLK4 promoter-associated island, CpG_22. Using CBS to validate the methylation of these islands in PrEC and LNCaP cells, I found that CpG_22 was methylated at 80% of sites in the LNCaP genome and remained essentially unmethylated in PrEC cells with only 2% of sites methylated. Conversely, CpG_27 exhibited the expected hypomethylated phenotype with 73% of sites methylated in PrEC cells and 1% in LNCaP (Figure 4.6.A). As methylation switching associated with gene activation was particularly unusual, I investigated the epigenetic status of these islands in H1 human embryonic stem (hES) cells to determine their regulatory state in early development. We extracted pertinent ChIP-Seq data (H3K27me3 and H3K4me3) as well as whole genome Bisulphite-seq methylation data from Lister et al. ‘Human DNA methylomes at base resolution show widespread epigenomic differences’ (2009)33. Figure 4.6.A shows that similarly to PrEC, CpG_22 is unmethylated in hES cells but is also bivalently marked with H3K4me3 and H3K27me3. In contrast, CpG_27 has neither of the bivalent marks and is completely hypermethylated in hES.

As the methylation status of CpG islands at gene promoters and gene bodies has been implicated in the regulation of transcription36, I wanted to investigate the transcriptional potential of the KLK4 gene. To this end, I designed q-RT-PCR primers that selectively amplify the different isoforms of KLK4 associated with the CpG islands (Figure 4.6.B). Specifically, primers were designed to cover exons 1 to 2, 2 to 3 and 3 to 4. I found that in both PrEC and LNCaP cells, expression originating from the second exon, located 666 bp upstream of CpG_27, was 100 fold higher than that originating from the first annotated exon. Expression originating from both exons was significantly higher in LNCaP than it was in PrEC cells.


Figure 4.6 Methylation and expression of the KLK4 locus A) Data from MBDCap-Seq is plotted for PrEC (blue) and LNCaP (red). DNA methylation, H3K27me3 and H3K4me3 data for the KLK4 locus from H1 ES cells33, is represented below. The bottom panels indicate CBS results confirming the differential methylation for the two CpG islands in PrEC and LNCaP. B) q-RT-PCR is used to investigate the origin of transcription in KLK4. Primers designed to interrogate exons 1-2, 2-3 and 3-4 are indicated below the representation of the KLK4 transcript. Expression is normalised to the 18S transcript and plotted on a log scale.


Figure 4.7 Methylation and expression of KLK4 in clinical samples A) CBS was used to validate the methylation status of 6 representative tumour and normal clinical samples, the number above each set of results indicate the sample designation. The upper and lower panels show the methylation status of CpG_22 and CpG_27 amplicons respectively. B) Percent methylation of each sample at both amplicons, calculated as # methylated sites / # total sites x 100. * denotes a significant difference (p<0.05) between the two groups. C) q-RT-PCR quantification of transcription from exon 2 of KLK4 is normalised to GAPDH for normal and tumour samples and 18S for cell line samples.


This relationship between the hypomethylation of CpG_27 and expression originating from the nearby exon 2 was further investigated using a panel of clinical prostate tumour (n=3) and normal samples (n=3) obtained from the Garvan Prostate Cancer Tumour Bank. Using CBS to assess the methylation status in these samples, hypermethylation of CpG_22 was again observed, with an average of 15% methylation in normal samples and 49.4% in tumour samples (Figure 4.7.A,B). Significant cancer-specific hypomethylation was identified at CpG_27 (p<0.05, two-sided unpaired t-test) with an average of 58% methylation in normal samples and 15% in cancer samples (Figure 4.7.A,B). Clonally distinct methylation patterns were distinguished in these samples, suggesting they represent a heterogeneous population of normal and cancer cells. Transcription originating from exon 2 of KLK4 was assessed using q-RT-PCR for all clinical samples (Figure 4.7.C). Tumour samples revealed higher levels of KLK4 expression compared to normal samples, indicating a correlation between transcriptional output and hypomethylation of CpG_27, as previously demonstrated in LNCaP and PrEC (Figure 4.6).
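The percent methylation values quoted here follow the formula given in the Figure 4.7 caption (# methylated sites / # total sites x 100). A minimal Python sketch of that calculation is shown below; the clone data are invented for illustration and the function name is hypothetical.

```python
from statistics import mean


def percent_methylation(clones):
    """Percent methylation across all CpG sites of all clones in one sample.

    Each clone is a sequence of 1 (methylated) and 0 (unmethylated) calls.
    """
    methylated = sum(sum(clone) for clone in clones)
    total = sum(len(clone) for clone in clones)
    if total == 0:
        raise ValueError("no CpG sites to summarise")
    return 100.0 * methylated / total


# Invented example: two samples with two clones of four CpG sites each.
sample_a = [[1, 1, 0, 0], [1, 0, 0, 0]]   # 3 of 8 sites methylated
sample_b = [[1, 1, 1, 0], [1, 1, 0, 0]]   # 5 of 8 sites methylated
per_sample = [percent_methylation(sample_a), percent_methylation(sample_b)]
group_average = mean(per_sample)
```

Pooling sites across clones before dividing weights every CpG call equally, which matches the caption's formula; averaging per-clone percentages would give the same result only when all clones cover the same number of sites.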

4.4.4 CpG island methylation and gene activation

Through investigation of methylation changes in LREA genes I have discussed specific examples of hypomethylation associated with gene activation, but noted that this was not as common as expected. To more fully explore the relationship between gene expression and CpG island methylation in PrEC and LNCaP, I plotted change in gene expression against the significance of change in methylation, for all promoter-associated CpG islands in the genome (Figure 4.8). As indicated in Figure 4.4, the majority of these CpG islands are unmethylated in both PrEC and LNCaP. As such, 76.5% of interrogated islands fail to exhibit a significant change in methylation between PrEC and LNCaP cells. Of those that do change, and in a similar tendency to LREA regions (Figure 4.5), 21% of CpG islands were hypermethylated in LNCaP compared to the 2.5% that were hypomethylated. The two CpG islands associated with KLK4 are highlighted in Figure 4.8 as an indication of the rarity of hypomethylated CpG islands affiliated with a hyper-activated gene promoter. Of the CpG islands that were hypermethylated in LNCaP (FDR adjusted p<0.05), 15% of the associated genes adhered to the accepted dogma and exhibited transcriptional repression (t-statistic < -4). In contrast to this dogma, however, and in support of the trend exhibited by LREA genes, I found that 5% of CpG islands that were significantly hypermethylated were associated with genes that were significantly transcriptionally activated (Figure 4.8, shaded area, t-statistic > 4).

Figure 4.8 Genome wide relationship between gene expression and CpG island methylation

The t-statistic of expression change from PrEC to LNCaP (y-axis) is plotted against the +/-log10 of the FDR adjusted p-value representing significance of difference in methylation (x-axis, +/- for islands that lose or gain methylation respectively) for all CpG islands within 2.5 kb of a TSS in the genome. The vertical dashed red lines indicate a +/-log10 of a p-value of 0.05. The horizontal dashed red lines indicate a t-statistic of change in expression > 4 or < -4. The two KLK4 associated CpG islands are highlighted as red circles. The shaded quadrant indicates those hypermethylated CpG islands found at genes that have a t-statistic of change in expression > 4.
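The partition of islands used for Figure 4.8 can be sketched in Python as below. The thresholds (FDR adjusted p < 0.05 and |t| > 4) are taken from the text; the function names, the input layout and the example values are assumptions made for illustration only.

```python
from collections import Counter


def classify_island(p_adj, methylation_change, expression_t,
                    alpha=0.05, t_cutoff=4.0):
    """Return (methylation class, expression class) for one CpG island."""
    if p_adj >= alpha:
        methylation = "unchanged"
    elif methylation_change > 0:
        methylation = "hypermethylated"
    else:
        methylation = "hypomethylated"

    if expression_t > t_cutoff:
        expression = "activated"
    elif expression_t < -t_cutoff:
        expression = "repressed"
    else:
        expression = "unchanged"
    return methylation, expression


def tally(islands):
    """Count islands per class; `islands` holds (p_adj, change, t) tuples."""
    return Counter(classify_island(*island) for island in islands)


# Invented values: the first island falls in the shaded quadrant of Figure 4.8.
example = [(0.01, 0.3, 5.2), (0.5, 0.3, 5.2), (0.001, -0.4, -6.0)]
counts = tally(example)
```

Islands classified as `("hypermethylated", "activated")` correspond to the shaded quadrant examined in the remainder of this section.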


To understand the potential mechanisms that underlie this unexpected observation, I utilised a number of sequencing datasets that had more recently been generated in the Clark laboratory. To precisely quantify the transcriptional output and exon usage of all genes in LNCaP and PrEC, I used RNA-Seq data. This technology involves deep sequencing of all poly-adenylated RNA that has been reverse transcribed to cDNA, thus mapping all transcription in the genome. To complement this dataset, I used data from the transcription start site mapping technique known as CAGE-Seq (Cap Analysis of Gene Expression)290,291. The CAGE methodology incorporates a biotinylation of the 5’ cap found at the 5’ end of pre-mRNA. Molecules containing a biotinylated 5’ cap are enriched with the aid of streptavidin and are subsequently sequenced. This technique thus enriches and identifies those sequences found at the 5’ end of RNA molecules. Finally, H3K4me3-Seq was also used to identify active promoters across the genome. H3K4me3-Seq combines chromatin immunoprecipitation against H3K4me3 with deep sequencing. For all details regarding these sequencing methodologies, see Section 4.3.2 and Section 2.7.3.

By combining these datasets with MBDCap-Seq, and through visual inspection of those CpG islands that are hypermethylated and associated with activated transcripts (Figure 4.8, shaded area), I discovered that the CpG islands contained in this area fell into one of two dominant patterns of methylation. Group I describes promoters where hypermethylation occurs at the borders of CpG islands and methylation is exclusive of the transcription start site (Figure 4.9). This was the dominant group of methylated and activated promoters, being found at 85% of loci. Of particular interest was the mutual exclusivity between H3K4me3 and DNA methylation at these promoters; the former appearing to “protect” the TSS from encroaching methylation. The activity of the annotated TSS was confirmed by the presence of both CAGE-Seq and RNA-Seq signal. Figure 4.9 shows 4 examples of Group I methylation. The hypermethylation signal can be seen as either unidirectional, as with WNK3, MMP16 and PRUNE2, or, as is the case with IQGAP2, to enclose the activated TSS from both sides. PRUNE2 is a biologically notable inclusion in this category, as this gene is in an LREA region, contains the PCA3 non-coding biomarker and is one of the most highly up-regulated genes in the genome.

Figure 4.9 Group I: border methylation of CpG islands. Group I genes showed a gain of DNA methylation at a flanking region of a CpG island associated with an activated TSS. Data for: MBDCap-Seq is plotted in blue (PrEC) and red (LNCaP), H3K4me3 is plotted in green (PrEC) and orange (LNCaP). RNA-Seq and CAGE-Seq profiles are plotted below the RefSeq transcripts for each gene.

Group II methylation at these transcriptionally up-regulated genes was characterised by hypermethylation that enveloped the annotated TSS, with a shift in the observed TSS (Figure 4.10). Altered transcriptional initiation was defined by CAGE-Seq peaks that were distal to the fully methylated annotated site. These were further confirmed by the presence of H3K4me3 signal in LNCaP as well as supportive RNA-Seq. Figure 4.10 documents three examples of Group II promoters where the new TSS is internal to the annotated transcript (TRIM36, ALOX15, MPP2) and one where transcription begins 40 kb upstream (TBC1D30). Interestingly, in three of these examples (TRIM36, TBC1D30, MPP2), transcription begins at alternate start sites that have unmethylated CpG islands, suggesting a commonality in the gene’s regulatory machinery compared to the endogenous state. Group II pattern methylation was found at 10% of the CpG island promoters that were hypermethylated and transcriptionally up-regulated.

CHAPTER 4: DNA METHYLATION AND GENE ACTIVATION

Figure 4.10 Group II: full methylation of CpG islands with alternate TSS Group II genes showed extensive DNA methylation that enveloped the local TSS, associated with alternative promoter usage. Data for: MBDCap-Seq is plotted in blue (PrEC) and red (LNCaP), H3K4me3 is plotted in green (PrEC) and orange (LNCaP). RNA-Seq and CAGE-Seq profiles are plotted below the RefSeq transcripts for each gene.

While Group II methylation was congruous with the current understanding of transcriptional regulation, Group I methylation warranted further investigation. As hypermethylation of CpG island borders was associated with an augmentation of gene expression, we undertook a motif analysis on these specific regions to unearth potentially enriched binding partners that could serve to repress transcription. Using the TRANSFAC292 database, a curated list of transcription factor motifs, I compared the frequency of motifs found in Group I methylated regions against 1000 randomised sets of CpG island sequences in the genome. Motifs that were significantly enriched (Wilcoxon rank sum test, p<0.05) and occurred at least 8 times in assessed regions are represented in Table 4.1. As the functional annotation of these binding factors was represented by both activators and repressors293-303, it appeared that these methylated regions do not exclusively harbour elements to reduce expression. Of note, however, was that the most commonly found binding site in these regions was for ZF5. ZF5 is a ubiquitous zinc finger transcriptional repressor, which has been shown to regulate the expression of c-myc304, and is potentially methylation-sensitive305. The validation of the selective binding of these repressive factors remains a further study to be continued by the Clark lab.

Gene Symbol   TRANSFAC TFBS Symbol   #    p-value         Function
VENTX         V$XVENT1_01            12   9.11 x 10^-7    Repressor
MSX1          V$MSX1_01              13   1.02 x 10^-4    Repressor
TEAD2         V$ETF_Q6               12   2.09 x 10^-4    Activator
SREBF1        V$SREBP1_02            8    1.04 x 10^-3    Activator
HAND1         V$HAND1E47_01          14   1.45 x 10^-3    Activator/Repressor
TEF           V$TEF1_Q6              13   4.31 x 10^-3    Activator
KLF12         V$AP2REP_01            10   1.08 x 10^-2    Repressor
SP1           V$SP1_Q6               10   1.27 x 10^-2    Activator
ZF5           V$ZF5_B                24   1.40 x 10^-2    Repressor
EP300         V$P300_01              14   3.55 x 10^-2    Activator

Table 4.1 Transcription factor binding motifs identified in Group I genes. # refers to the number of times a TFBS was found in the specified regions.

4.4.5 PrEC cells are a model of normal CpG methylation

Finally, to investigate whether methylome comparisons conducted between LNCaP and PrEC models accurately represent a divergence from a normal state, prostate cell lines were compared to H1 ES cells. To assess relative methylation in PrEC, LNCaP and H1 cells, Infinium HumanMethylation450 BeadChip (450K) arrays were used (Section 2.8.2). 450K arrays rely on bisulphite conversion to interrogate 485,000 high relevance methylation sites at single nucleotide resolution across the genome. Using the ‘minfi’ Bioconductor package306, ‘Beta’ values were calculated that normalised probe assayed methylation to a scale of 0 to 1. To quantify the methylation at relevant CpG islands, all Beta values for probes found +/- 500 bp from the centre of the island were averaged. Figure 4.11.A compares the differences between LNCaP and PrEC to the differences between LNCaP and H1 at all promoter associated CpG islands, such that a positive value indicates a gain in methylation in the LNCaP state on both axes. I found that PrEC and ES cells share a “normal” methylome compared to the LNCaP cancer methylome (R²=0.804). A comparison of CpG islands found in Group I and II methylated regions (Figure 4.11.B) shows that they too exhibited similar changes from both PrEC and ES (R²=0.69). This indicates that the gain of aberrant methylation associated with activated transcription in these regions is likely to be cancer specific.

Figure 4.11 ES cell methylation comparison. Differences in methylation between LNCaP and H1 hES, and LNCaP and PrEC, are plotted for all promoter associated CpG islands (A) and promoters in Group I and II methylated regions (B). Methylation differences are calculated as the difference of the average 450K ‘Beta’ value calculated +/-500 bp from the centre of each island. Group I and Group II CpG islands are coloured green and red respectively. Group I differences are calculated as the average methylation over the specifically methylated flanking area (see Figure 4.9).
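The island-level summary and the R² comparison can be illustrated with a small Python sketch. It does not reproduce the minfi workflow; it assumes Beta values have already been computed and are supplied as (position, Beta) pairs, and that R² is the squared Pearson correlation between the two sets of differences. All names and values are hypothetical.

```python
def island_beta(probes, island_start, island_end, flank=500):
    """Average Beta over probes within +/- `flank` bp of the island centre."""
    centre = (island_start + island_end) / 2
    values = [beta for pos, beta in probes if abs(pos - centre) <= flank]
    return sum(values) / len(values) if values else None


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Invented example: one island spanning 1000-2000 with three nearby probes.
probes = [(1400, 0.2), (1600, 0.4), (2600, 0.9)]
beta = island_beta(probes, 1000, 2000)   # the probe at 2600 is excluded

# Per-island differences (LNCaP minus PrEC, LNCaP minus H1) would then be
# compared across islands, for example:
lncap_minus_prec = [0.1, 0.5, 0.3]
lncap_minus_h1 = [0.1, 0.3, 0.5]
agreement = r_squared(lncap_minus_prec, lncap_minus_h1)
```

A high R² between the two difference vectors indicates that LNCaP departs from PrEC and from H1 at the same islands and in the same direction.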


4.5 Discussion

DNA methylation was one of the first proposed and earliest validated mechanisms of epigenetic control307,308. Many disease states can be attributed to or defined by aberrations in the methylation program309. These include developmental disorders driven by a loss of imprinting such as Prader-Willi syndrome309 as well as the evidently deregulated methylome found in the cancer state. Here I have presented both discrete and genome-wide investigations into the state of methylation at the sites of gene activation in prostate cancer.

As hypermethylated CpG islands are implicated in gene silencing in tumourigenesis48,310, I investigated the hypothesis that gene activation could also be associated with CpG island hypomethylation in prostate cancer. First, I investigated genes in LREA regions as candidate activated genes, identifying KLK4 as a highly transcriptionally activated gene associated with a hypomethylated CpG island located 2.5 kb downstream of the canonical TSS. In addition, upstream of the canonical KLK4 TSS I found a completely hypermethylated CpG island. Clinical prostate cancer and normal cells also exhibited this unusual pattern of hypo- and hypermethylated islands flanking the TSS. In these clinical samples I found that the hypomethylated CpG island was associated with increased expression of the KLK4 transcript. Hypomethylation of CpG islands in cancer has been documented previously311,312, although only in a select few examples. Instead, this hypomethylation has seemingly been restricted to non-dense repeat elements and gene bodies313. This understanding highlights the novelty of the thorough hypomethylation shown at KLK4. Due to the associated increase in expression observed at the non-canonical exon 2 TSS of KLK4, I propose that cancer-specific hypomethylation of CpG islands with transcriptional up-regulation may be more common than otherwise reported, its identification being generally obscured by the location of such islands at non-canonical promoters. A thorough investigation into the prevalence of such sites is, while both warranted and pertinent, beyond the scope of this project.


Figure 4.12 Hypermethylation of CpG islands and gene activation in cancer Group I methylation is characterised by hypermethylation of CpG island border regions demarcated by H3K4me3, potentially inhibiting repressor element binding, thereby activating ectopic gene expression. Group II methylation shows extensive hypermethylation of CpG islands, resulting in a change of promoter with associated H3K4me3 enrichment.

Second, to further investigate the relationship between gene activation and CpG methylation, I examined the methylation status of all promoter-associated CpG islands with respect to the associated change in expression in LNCaP. The key finding of this analysis was that promoter CpG island hypermethylation is much more commonly associated with gene activation than hypomethylation. While an increase in methylation is known to correspond to higher gene expression, this has previously been confined to low CpG density, intergenic regions33. In this study I found hypermethylation of CpG-rich promoters was also associated with active genes and could be categorised into one of two profiles (Figure 4.12).

Group I regions were defined by DNA methylation of the flanking regions of CpG islands, which was mutually exclusive of the H3K4me3 marked TSS (Figure 4.12). This methylation pattern is markedly different from previously reported cancer-specific methylation of CpG island shores, which occurs in more CpG deplete regions up to 2 kb distant from the island and is associated with gene repression314. The basis of this hypermethylated augmentation to gene expression posed an interesting question. I proposed that repressive proteins could be excluded from binding to promoter elements due to the increase in the CpG methylation. Using the TRANSFAC database, I show that many of the methylated sites in Group I regions harbour both repressive and activating protein binding sites. As this aspect of the study was done in silico, any conclusive understanding would require further experimental validation to determine whether any of the transcription factors are bona fide targets of the border CpG island regions.

Group II methylated regions by contrast exhibited extensive CpG island hypermethylation across the annotated TSS, resulting in ectopic gene activation from alternative promoters in the cancer state (Figure 4.12). These alternative promoters were identified through the presence of CAGE-Seq signal, as well as a significant increase in H3K4me3. Notably, LNCaP-specific initiation sites were found at unmethylated CpG islands. This suggests that for these genes the presence of a CpG island is a prerequisite for their appropriate genetic regulation. Intragenic CpG islands have been previously reported to harbour promoter or enhancer activity36,52; their methylation functioning to inhibit inappropriate transcription36. While these studies show that intergenic CpG island methylation can inhibit incorrect transcription in normal cells, the same is not true for CpG islands found at canonical promoters, as these are constitutively unmethylated in normal cells. The study presented here therefore extends this understanding to encompass these CpG islands, suggesting that the factors that prevent the annotated TSS from becoming methylated are lost in the cancer state. Given that these genes are defined by a significant increase in expression, I propose that with hypermethylation of the canonical promoter, the associated transcripts have lost their endogenous regulatory inhibition and that this contributes to the cancer phenotype.


CHAPTER 5: CHROMATIN LOOPING AND KALLIKREINS

CHAPTER 5 Chromatin Looping and Kallikreins


5.1 Introduction

The genome can be spatially defined by several nuclear processes. These include attachment to the nuclear envelope, the nuclear matrix or nucleoli as well as protein directed connections between distal DNA elements. This spatial arrangement of DNA is intimately involved in the establishment and maintenance of active and repressive epigenomic environments, as well as ensuring accurate DNA replication and cell division. Physical connections between DNA elements, referred to as “chromatin loops”, are implicated in a number of regulatory processes. Loops can bring both transcriptional enhancers315 and repressive elements316 into direct contact with gene promoters in response to external stimuli, as well as linking the transcriptional initiation and termination sites of a gene to facilitate recycling of transcriptional machinery317. Pertinently, larger looping regions can also act to physically demarcate transcriptionally distinct domains of the genome. While the particular mechanisms that direct chromatin looping are still unclear and are likely to involve multiple nuclear factors, a potential role for non-coding RNA in targeting DNA sequences has been identified318.

Looped DNA structures are physically linked by a number of protein partners, including polymerases, transcription factors319 and “insulator” factors such as the CCCTC-binding factor (CTCF). CTCF is a key genomic organiser that is almost completely conserved between birds and humans320 and whose absence results in early embryonic lethality321. It has between 15,000 and 25,000 binding sites in the human genome, most of which are commonly occupied in diverse cell types322. CTCF was originally thought to have multiple roles depending on variable DNA motif binding323, acting as a transcriptional activator324, repressor325 or as an insulator that could block the action of enhancers. This insulator activity of CTCF is best known for its role in differentially regulating the H19/Igf2 imprinted locus326. More recent investigations have extended this role genome-wide, where the presence of CTCF binding sites was found to reduce the correlation of transcription between two genes on either side of the site327. CTCF is known to demarcate large genomic and epigenomic domains; the protein was identified at the boundaries of H3K27me3 repressive domains322, LOCKs77, LADs328 as well as PMDs92. These diverse roles of CTCF are unified by its role as a primary intermediary in genome-wide chromatin looping and topology. Indeed CTCF, alongside the Cohesin and Mediator complexes, has been detected at 80% of all looping interaction sites329. When levels of CTCF are artificially depleted the number of physical interactions within megabase scale DNA domains increases330, suggesting that CTCF is necessary to specify as well as restrict looping structures. Using ChIA-PET (chromatin interaction analysis by paired-end tag) technology, which interrogates all looping structures mediated by a specified linking protein, Handoko and colleagues107 identified 5 looping domains that were distinguished through epigenetic profiling. Specifically they found CTCF loops could form barriers between active (H3K4me1, H3K4me2, H3K36me3) and repressive (H3K9me3, H3K20me3, H3K27me3) domains, could form loops encompassing enhancer elements (H3K4me1 and H3K4me2) as well as forming loops which included active and repressive domains themselves. Loops that included active chromatin were found to span smaller distances (<200 kb) when compared to those that included repressive elements. Relative distribution of CTCF binding sites alludes to the variable nature of its transcriptional regulation; CTCF deplete regions that were flanked by CTCF sites often include gene families and other genes that are transcriptionally co-regulated, while CTCF-rich regions contain genes that have multiple alternate promoters and require finer transcriptional control331.

As discussed in Chapter 3 of this thesis, as well as in previously described investigations139,140,216, the cancer genome is subject to widespread regional epigenetic disruption. The ability of chromatin loops to both demarcate and encapsulate epigenetically homogenous regions107,322 makes their deregulation an attractive explanation for both LREA and LRES. CTCF is mutated in multiple cancer types332-334 and abrogation of its binding is implicated in cancer-specific chromatin alterations335-337. In spite of this, current evidence suggesting misdirected CTCF mediated loops could impart large epigenomic defects in cancer is still largely provisional. While Forn and colleagues demonstrate that a loss of CTCF binding is associated with an LRES phenotype233, they neglect to investigate a specific looping mechanism. Hsu and colleagues report an LRES-type region that is epigenetically silenced by oestrogen-receptor mediated looping when exposed to oestrogen in normal cells, and show that this silencing is maintained in the absence of loops in the cancer state216. There are, however, no studies that specifically address the role of chromatin looping in the establishment of large scale epigenetic disruptions in tumourigenesis. The KLK family locus contains an archetypal LREA domain (Section 3.4.4) which is directly juxtaposed by an LRES region (Figure 5.1). It therefore provides an excellent model with which to address the role of CTCF in chromatin looping and the establishment of aberrant epigenomic domains in prostate cancer.

Figure 5.1 KLK LREA and LRES domains. The KLK family locus harbours both LREA and LRES domains. The LREA domain is characterised by enriched H3K9ac and deplete H3K27me3 in LNCaP compared to PrEC, while the LRES domain demonstrates the opposite phenotype. The locations of RefSeq genes are indicated at the top. Epigenetic modifications represent ChIP-on-chip data for LNCaP (red) and PrEC (blue), as discussed in Chapter 3.


5.2 Aim

Using the Kallikrein (KLK) family as a model locus, I aim to explore the relationship between CTCF-mediated chromatin looping and the establishment of both LREA and LRES in prostate cancer.

Specific aims:
i) To design and optimise required parameters for a chromosome conformation capture (3C) assay
ii) To carry out the 3C assay using a pre-created 3C library
iii) To assess the prevalence of chromatin looping at the KLK locus in public datasets

5.3 Methods

5.3.1 3C protocol

The creation of 3C libraries in LNCaP and PrEC was primarily carried out by Dr Phillippa Taberlay, a postdoctoral fellow in the Clark laboratory.

5.3.1.1 Cell fixation and nuclei extraction

Adherent cells were cultured under normal conditions until 80% confluency. Cells were lifted with Trypsin-EDTA (0.05%) (#25300, Gibco) at 37 °C, and pipetted vigorously to remove clumping cells. Trypsin was quenched with media and cells were centrifuged at 1500 rpm for 3 min. The supernatant was discarded and the cells were fixed in 1% formaldehyde at room temperature for 10 min. Fixation was stopped by the addition of 1 M Glycine, followed by incubation for 5 min at room temperature and 10 min on ice. Cells were centrifuged at 1500 rpm for 5 min and washed with PBS before being resuspended in 1 ml Lysis Buffer (100 mM Tris-HCl pH 8.0, 30 mM MgCl2, 100 mM NaCl, 1 mM EDTA, 5% Igepal) per 5 x 10⁶ cells. Cells were incubated on ice for 15 min and then dounce homogenised with three cycles of 10 strokes and 1 min incubation on ice.


5.3.1.2 Restriction endonuclease digestion and ligation

Cells were centrifuged for 5 min at 3000 rpm and washed twice with 500 μl ice-cold NEB Buffer 3 (100 mM NaCl, 50 mM Tris-HCl, 10 mM MgCl2, 1 mM DTT, pH 7.9). Cells were resuspended and aliquoted in 362 μl NEB Buffer 3 per 5 x 10⁶ cells; 3.8 μl 10% SDS was added and incubated at 65 °C for 10 min. Samples were immediately kept on ice and 44 μl 10% Triton X-100 was carefully added and mixed. Chromatin was digested overnight at 37 °C with the addition of 400U BglII (#R0144S, New England Biolabs).* BglII was inactivated with the addition of 86 μl 10% SDS and incubation at 65 °C for 30 min, after which samples were put on ice. Dilute ligation was carried out in a final volume of 7.61 ml containing 1% Triton X-100, 1X Ligation Buffer (#B0202S, New England Biolabs), 100 μg/ml BSA, 1 mM ATP. 4000U of T4 DNA ligase (#M0202S, New England Biolabs) was added and samples were incubated at room temperature for 6 hr. 50 μl of 10 mg/ml Proteinase K was added and samples were incubated overnight at 65 °C, followed by an additional 50 μl of 10 mg/ml Proteinase K and a 2 hr incubation at 65 °C. DNA was purified using Phenol/Chloroform extraction using sodium acetate as the requisite salt, followed by an ethanol precipitation (Section 2.5.1). DNA was resuspended in 25 μl TE with RNase A (40 μg/ml) and incubated at 37 °C for 15 min.

5.3.2 Bacterial Artificial Chromosomes (BAC)

BAC constructs CTD-2342A18 and CTC-771P3 (#96012, Life Technologies), which harboured the KLK locus, were received as glycerol stocks of the HS996, “GeneHogs®” strain of E. coli. Clonal colonies were separated and individual colonies were used to inoculate 500 ml LB medium containing 12.5 μg/ml chloramphenicol. Maxipreps were carried out using QIAGEN Plasmid Maxi Kit (#12163, QIAGEN) according to manufacturer’s protocol with described modifications for low-copy number, large size plasmids. BAC DNA was resuspended in TE buffer, and quantified using Qubit dsDNA BR Assay Kit (#Q32853, Life Technologies).

* BglII was selected as an appropriate restriction enzyme for the 3C assay through in silico digestion of the KLK region, which demonstrated desired resolution of restriction fragments.

3C BAC control libraries were created by first combining all isolated BAC constructs in equimolar ratios. This mixture was digested using 2U BglII per 1 μg BAC DNA at 37 °C for 3 hr. DNA was purified using phenol/chloroform extraction followed by ethanol precipitation using sodium acetate as the required salt, as described (Section 2.5.1). The digested BAC mixture was re-ligated using T4 DNA ligase at 16 °C overnight and DNA was purified as in the prior step. Purified DNA was resuspended in TE buffer and stored at -20 °C.
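Because an equimolar mixture contains equal numbers of molecules rather than equal masses, the mass contributed by each BAC scales with its length. The following is a minimal sketch of that arithmetic only. It assumes the approximate insert sizes quoted for these constructs in Section 5.4.3.1 (129 kb and 269 kb) and an assumed vector backbone length of roughly 7.5 kb; the function name, the backbone figure and the 1 μg total are illustrative and are not taken from the protocol.

```python
# Illustrative sketch: mass of each construct needed for an equimolar mixture.
# BACKBONE_BP is an assumed approximate vector length, not a value from the thesis.
BACKBONE_BP = 7_500


def equimolar_masses(lengths_bp, total_ng):
    """Split total_ng so that every construct contributes the same number of molecules.

    For double-stranded DNA, mass per molecule is proportional to length,
    so each construct's share of the total mass is its share of the total length.
    """
    total_len = sum(lengths_bp.values())
    return {name: total_ng * length / total_len for name, length in lengths_bp.items()}


# Approximate insert sizes (Section 5.4.3.1) plus the assumed backbone.
constructs = {
    "CTC-771P3": 129_000 + BACKBONE_BP,
    "CTD-2342A18": 269_000 + BACKBONE_BP,
}

masses = equimolar_masses(constructs, total_ng=1000.0)
for name, ng in masses.items():
    print(f"{name}: {ng:.1f} ng")
```

Under these assumptions the larger construct makes up roughly two thirds of the total mass, which is why mixing by equal mass would over-represent the smaller BAC.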


5.4 Results

LREA region 28 (19q13.33) includes 5 genes of the KLK gene family: KLK15, KLK3, KLK2, KLKP1 and KLK4. As discussed previously in this thesis, this genic region is of particular biological interest, with KLK3 being an essential prostate cancer biomarker, KLK2 being implicated in oncogenic translocations with Ets transcription factors, and KLK4 being uniquely associated with a transcriptionally activated and hypomethylated CpG island. In addition to these characteristics, the genomic locus of the KLK LREA region shows the remarkable quality of being immediately juxtaposed by a region of LRES (Figure 5.2, Table 5.1). The adjacent LRES region contains 10 genes, ranging from KLK5 to KLK14, with an average 2.4 fold reduction in gene expression as assayed by Affymetrix GeneChip arrays (Table 5.1). This LRES region shows clear epigenetic remodelling in LNCaP cells compared to PrEC, with an evident enrichment of repressive modifications H3K27me3 and H3K9me2 throughout (Figure 5.2).

Table 5.1 KLK gene expression values Expression was quantified using Affymetrix GeneChip arrays for all KLK genes using LNCaP and PrEC RNA. Fold change refers to fold change from PrEC to LNCaP. Green rows indicate a 2 fold increase in LNCaP; light pink rows indicate a decrease in expression in LNCaP between 0 and 2 fold; dark pink rows indicate at least 2 fold decrease in expression in LNCaP. KLK genes that are considered LREA and LRES genes are indicated.


Figure 5.2 Characterisation of the KLK region. At the top, an ideogram of Chromosome 19 shows the location of the region, beneath which are the locations of all RefSeq genes. The locations of the two BACs used for 3C normalisation are shown. In red and green blocks the BglII fragmentation of regions of interest for 3C are indicated, and are labelled with a number for future discrimination. Blue circles beneath BglII fragments indicate those assayed by 3C. Pale lines running vertically throughout the figure are representative of this BglII fragmentation, while the thick red line indicates the location of the ‘Bait’ 3C fragment. The relative enrichment of epigenetic modifications assayed by ChIP-on-chip is shown in the centre of the figure for PrEC (blue), LNCaP (red) and the difference (orange). At the bottom of the figure, signal for both RNA-Seq and CTCF-Seq are represented. Boxed regions in CTCF tracks indicate the location of validated peaks (see Figure 5.3).


Reports have recently shown that domains of consistent epigenetic modifications can be encompassed by chromatin loops anchored at CTCF sites107. I therefore sought to determine whether the neighbouring LREA and LRES regions found at the KLK locus were caused by a deregulated mechanism of chromatin looping. To begin, CTCF ChIP-seq, carried out by Jenny Song, a senior research assistant in the Clark laboratory, revealed discrete binding across the wider KLK region (Figure 5.2, bottom panel). Specifically, CTCF peaks were observed sporadically within the LREA and LRES regions, as well as strong peaks of enrichment between the two regions. Before this dataset was used for further experimentation, CTCF enrichment was validated using qPCR for three chosen peaks (highlighted in Figure 5.2). This assessment revealed significant enrichment of CTCF at all three loci when normalised to ChIP input material (Figure 5.3).

Figure 5.3 Validation of CTCF ChIP-Seq. qPCR was carried out on CTCF ChIP material at three loci in the KLK region, the locations of which are indicated in the bottom panel of Figure 5.2. Error bars indicate +/- SD.


5.4.1 Chromosome Conformation Capture

To investigate DNA looping structures in the KLK region a technique called Chromosome Conformation Capture (3C) was used. This technique is described in Figure 5.4.A. Briefly, DNA is treated with 1% formaldehyde, causing all proximal proteins and DNA to become reversibly covalently cross-linked. If two segments of DNA are looped together in the endogenous state they will therefore become covalently attached. Cross-linked DNA is then digested using a restriction endonuclease and re-ligated in dilute conditions. As DNA that was initially proximal in the cell remains covalently bound, a dilute ligation ensures that only free DNA ends around these contacts become ligated. To detect the presence of these novel ligation products, PCR primers are designed near the free ends created by endonuclease digestion. This PCR reaction will only be successful if the two ends where the primers were designed are successfully ligated together. Traditionally, a single “Bait” primer is assayed against a panel of “Test” primers, which probes for the interaction of one digested fragment against a wide range of genomic locations.

Figure 5.4.B illustrates the results of theoretical 3C experiments. In the first example, a loop exists between the Bait fragment and 9 Test fragments while in the second example there is no loop present. Primers that face the ends of both the Bait fragment and each Test fragment are used for a PCR reaction and the relative amount of product is shown on the right. Test fragments 1 and 2 always show a small enrichment, as fragments that are genomically close to the Bait have a higher than background chance of forming cross-linked products. Test fragment 8, however, only shows a PCR product when there is a loop present. Importantly, in order to properly assess the presence of a looping product an investigator needs to include genomically proximal test fragments. In this example, Test fragments 7 and 9 provide a background for fragment 8 to rise above, which is suggestive of a looping product.
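The reasoning in this theoretical example can be sketched as a simple local-enrichment test, in which a Test fragment is flagged only when its signal rises above both of its immediate neighbours. This is an illustrative sketch rather than the analysis used in this study: the signal values, the two-fold threshold and the function name are all invented for the purpose of the example.

```python
def local_peaks(signal, min_ratio=2.0):
    """Return 1-based indices of interior fragments whose signal exceeds
    both immediate neighbours by at least min_ratio (a hypothetical threshold)."""
    peaks = []
    for i in range(1, len(signal) - 1):
        background = max(signal[i - 1], signal[i + 1])
        if background > 0 and signal[i] >= min_ratio * background:
            peaks.append(i + 1)
    return peaks


# Invented relative PCR signal for Test fragments 1 to 9.
# Fragments 1 and 2 are elevated simply because they lie close to the Bait.
with_loop = [8.0, 4.0, 1.0, 0.5, 0.5, 0.4, 0.4, 3.0, 0.4]
without_loop = [8.0, 4.0, 1.0, 0.5, 0.5, 0.4, 0.4, 0.4, 0.4]

print(local_peaks(with_loop))     # [8]
print(local_peaks(without_loop))  # []
```

Note that the proximity signal at fragments 1 and 2 is not flagged, because it decays smoothly from the Bait and never rises above its neighbours; this is the reason flanking fragments must be assayed alongside any candidate interaction.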


Figure 5.4 Chromosome Conformation Capture methodology A) A hypothetical endogenously looped structure is shown, contacts between the two strands of DNA are highlighted by green and red blocks. Cross-linking using formaldehyde covalently links the two contact points, indicated by the “zig-zag” line. Following digestion with an appropriate endonuclease, only products that were covalently bound remain proximally located. After T4 ligation, these proximal fragments are connected together. PCR amplification using primers located on either side of the restriction/ligation site (blue line) facilitates the final validation of endogenous proximity. B) A complete analysis of chromatin looping requires the assessment of many potential contact points within a large DNA locus. In the example given the top section shows a hypothetical example of a looping structure and the resulting PCR output, while the bottom shows the output if no loop is present.


5.4.2 3C primer design

The generation of 3C libraries, which includes the 3C protocol until ligation, is described in Section 5.3.1. Libraries were prepared for both PrEC and LNCaP cell lines. The digesting endonuclease used in this library generation was BglII, a “6-cutter” enzyme that recognises the AGATCT palindromic sequence. DNA for the region that encompasses both the KLK LREA and LRES regions as well as all major CTCF peaks, spanning genes ACPT to CTU1, was in silico digested using the BglII recognition motif. Identified fragments larger than 200 bp were given sequential BglII fragment identifiers 1-59 (Figure 5.2). The Bait fragment that was to be compared to all other digested products was Fragment 27. This fragment lies in between the KLK LREA and LRES regions and includes a significant and conserved CTCF peak. Figure 5.5 shows the location of the Bait primer sequence (also shown by a red line in Figure 5.2), indicating its orientation towards the proximal BglII restriction site. A TaqMan probe was also designed to be included after the end of this Bait primer (Figure 5.5) such that it could be used to quantify the product of the 3C PCR reaction.
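An in silico digestion of this kind can be sketched in a few lines. The example below is a minimal illustration, not the tool used for this analysis: it locates every AGATCT site, cuts after the first base of the site (BglII cleaves A^GATCT), and assigns sequential identifiers to fragments larger than 200 bp. The function names and the toy sequence are hypothetical.

```python
import re

BGLII_SITE = "AGATCT"  # BglII recognition sequence
CUT_OFFSET = 1         # BglII cleaves after the first A (A^GATCT)


def digest(seq, site=BGLII_SITE, cut_offset=CUT_OFFSET):
    """Return half-open (start, end) coordinates of fragments from a complete digest."""
    seq = seq.upper()
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def number_fragments(fragments, min_len=200):
    """Assign sequential identifiers (from 1) to fragments longer than min_len bp."""
    kept = [(start, end) for start, end in fragments if end - start > min_len]
    return dict(enumerate(kept, start=1))


# Toy sequence with two BglII sites; it is not derived from the KLK locus.
toy_seq = "C" * 250 + "AGATCT" + "G" * 50 + "AGATCT" + "T" * 300

fragments = digest(toy_seq)
print(fragments)                    # [(0, 251), (251, 307), (307, 612)]
print(number_fragments(fragments))  # {1: (0, 251), 2: (307, 612)}
```

In the toy example the 56 bp middle fragment is dropped by the size filter, so the identifiers run over the retained fragments only.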

Figure 5.5 3C Bait and probe design BglII digested “Fragment 27” contains one of the two major CTCF peaks that divide the LREA and LRES regions and was denoted as the Bait fragment. The Bait primer sequence (blue) faces towards the 5’ end of the digested fragment. The TaqMan probe (red) was designed against the most 5’ part of this fragment and is inclusive of part of the BglII restriction site, ensuring ligation specificity.


Test primers were designed to span the KLK region such that they interrogated fragments that contained CTCF sites as well as gene TSSs. Primers were designed within 100 bp of, and facing towards, BglII restriction sites. This ensures that upon 3C ligation a viable PCR product that does not exceed 150 bp would be formed using both the Test and Bait primer sets. With these constraints, and with the integration of restrictions imposed by consistent primer design, 26 fragments across the KLK region had Test primers designed. The sequence of these primers can be found in Appendix 9.1 and locations are indicated by blue circles in Figure 5.2.

5.4.3 Bacterial Artificial Chromosomes (BAC)

5.4.3.1 BAC isolation and identification

The 3C protocol necessitates comparatively large amounts of DNA to be used in PCR reactions. This is due to the large number of pairwise combinations of fragments that can be fused during the digestion and ligation process, thus creating a very low ratio of desired sequence to the background. Because of this impediment, it is not desirable to validate and optimise primers using 3C library material. BACs are large bacterial plasmids that can contain DNA inserts of up to 350 kb which can be used for 3C primer validation purposes. Specifically, one or more BAC constructs that cover the genomic region of interest are commercially acquired and amplified using plasmid amplification methodologies. These constructs are digested using the 3C assay appropriate restriction enzyme and then re-ligated. This creates a library that represents all possible 3C products of the desired genomic region that endogenously do not exist in other contexts.
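To give a sense of scale for the pairwise combinations mentioned above, the short sketch below counts the unordered pairs that can be drawn from a set of restriction fragments. It is a simplification offered only as illustration: it uses the 59 numbered BglII fragments of the KLK region as the example input, ignores fragment orientation and self-ligation, and the function name is hypothetical. The number of possible products genome-wide would be far larger.

```python
from math import comb


def fragment_pairs(n_fragments):
    """Number of unordered pairs of distinct fragments that could be ligated together."""
    return comb(n_fragments, 2)


# The 59 numbered BglII fragments in the KLK region, used here purely as an example.
n_pairs = fragment_pairs(59)
print(n_pairs)  # 1711
```

Even within this single locus, any one Bait-Test junction is therefore only one of well over a thousand possible products, which illustrates why a BAC-derived library is a more practical template for primer validation.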

The BAC constructs used in this study are CTC-771P3 (chr19: 55,921,271-56,049,985; 129 kb) and CTD-2342A18 (chr19: 56,058,065-56,327,410; 269 kb), which together cover the entirety of the KLK region (Figure 5.2). Each of these BAC constructs used the pBeloBAC11 backbone vector and was received in the HS996, “GeneHogs®” strain of E. coli as a glycerol stock (Life Technologies).

Each BAC construct was clonally amplified using the QIAGEN Plasmid Maxi Kit. To validate the presence of the appropriate BAC insert in the amplified products, two approaches were used: i) end sequencing using vector primer sequences, ii) PCR of internal amplicons. The first approach entails capillary sequencing of product obtained via BigDye PCR (Section 2.9) using primers for sequences found on the pBeloBAC11 backbone vector. Specifically, identification of ~300 bp of the 5’ and 3’ end of the BAC sequences could be gathered using Sp6 and T7 primer sequences respectively. The second approach involved a PCR reaction of amplicons that would theoretically be found internally in the BAC construct. CTCF ChIP primers used in Figure 5.3 were again used for this purpose; their locations in the BAC constructs can be seen in Figure 5.6.A. This dual approach allows validation of the presence of the ends as well as the generalised internal sequences of the BAC constructs.

End sequencing of CTC-771P3 using the Sp6 primer and CTD-2342A18 using both Sp6 and T7 primers showed correct genomic positioning of both amplified constructs (Figure 5.6.A). Internal amplicon PCR of CTC-771P3 using the “CTCF-1” sequence (67 bp amplicon) showed the expected amplification (Figure 5.6.B, left). Conversely, PCR of CTD-2342A18 for “CTCF-2” (61 bp amplicon) and “CTCF-3” (69 bp amplicon) only showed amplification for “CTCF-2”, implying the absence of the latter sequence (Figure 5.6.B, right). Thus, the amplified CTD-2342A18 construct appeared to consist of the extreme ends of the BAC, but only a portion of the internal structure. Several more clones were amplified and the same characteristic was observed in each (data not shown). To determine whether the acquired CTD-2342A18 containing bacterial stock constitutively exhibited this absence, a maxi-prep was carried out without an initial clonal separation. This plasmid amplification would therefore contain all represented plasmids in the bacterial population. To more fully characterise this population, 10 qPCR “Walking” primers were designed that spanned the entirety of the BAC insert. In addition, a primer set that spanned the contained Bait restriction sequence was included (3C Constant, Figure 5.6.C). Following “Stock” maxi-prep isolation and SYBR qPCR using these primers, I showed that while the internal amplicons were in fact present in the plasmid population, they were present at a 1000-fold reduction (Figure 5.6.C). Although BACs are generally capable of maintaining large DNA inserts338, it appeared that in this instance there is endogenous instability of the construct leading to a common deletion.

Figure 5.6 BAC design and maxi-prep validation. A) BACs CTC-771P3 and CTD-2342A18 cover the KLK genes and some surrounding genic loci. Locations of internal CTCF ChIP amplicons are indicated on each BAC. The 5’ end of CTC-771P3 and 5’ and 3’ ends of CTD-2342A18 are focused, showing the location of sequenced ends of maxi-prep products relative to the mapped UCSC sequence. For both 5’ ends the Sp6 primer was used for sequencing, while the T7 primer was used for the 3’ end of CTD-2342A18. B) Internal amplicons located in A) were run on a 1.5% agarose gel. CTCF-1 was carried out in duplicate and the appropriate size amplicon was seen in CTC-771P3 material; CTCF-2 and CTCF-3 PCR were carried out in triplicate on CTD-2342A18 material, but only CTCF-2 product was seen. C) The location of 10 “Walking” primers designed against the length of CTD-2342A18 is indicated, as is the position of the “3C Constant” amplicon. The “3C Constant” primer pair includes the Bait primer and a primer facing the opposing direction on the 3’ end of Fragment 26. SYBR green qPCR was carried out on a maxi-prep of CTD-2342A18 “Stock” material, and values were normalised to “3C Constant” results. Error bars indicate +/- SD.

5.4.3.2 3C primer validation using BACs

In spite of this deletion, the CTD-2342A18 plasmid population amplified and isolated directly from the bacterial stock was sufficient for the desired 3C amplicon validation. CTD-2342A18 was combined with CTC-771P3 in equimolar ratios and digested using the BglII endonuclease. The digested BAC mixture was then re-ligated using T4 DNA ligase, creating a theoretical library of all possible 3C products in the genomic regions covered by the BAC constructs (Section 5.3.2). This material was assessed via TaqMan qPCR to quantify the presence of successful restriction and ligation products. Each qPCR reaction contained the constant Bait primer, one Test primer and the TaqMan probe identified in Figure 5.5. As the TaqMan probe is designed against the Bait fragment, this ensures that only products that are a consequence of ligation with the Bait restriction site would be amplified. Concurrently with these reactions, the CTCF-1 ChIP amplicon was probed with SYBR green for purposes of normalising the PCR reaction. From Figure 5.7 it is clear that all the designed Test-Bait primer pairs successfully amplify the validation library material. Additionally, this successful amplification suggests that the Bait fragment, which is located in the deleted region of CTD-2342A18, is in fact present in these reactions. The fact that the wider deleted region shows substantially less signal is again testament to its marked depletion in the stock CTD-2342A18 material. Importantly, as all of the assessed 3C primers were successful in this validation library they were therefore considered suitable for the final 3C assessment of LNCaP and PrEC DNA.

5.4.4 3C PCR optimisation

Figure 5.7 3C primer validation. 3C primer pairs (Test/Bait) were validated using CTD-2342A18 and CTC-771P3 maxi-prep material that had been combined, digested with BglII and re-ligated. TaqMan qPCR was carried out using the probe designed in Figure 5.5. For fragments where two primers were designed (in instances of design uncertainty), “RV” (reverse) and “FW” (forward) denotes which end of the specified fragment the sequence refers to. Error bars indicate +/- SD.

3C DNA for LNCaP and PrEC was quantified by standard curve using CTCF-1 and CTCF-2 ChIP primers (data not shown). In order to determine a functional DNA range suitable for 3C qPCR, multiple dilutions of each preparation were assessed with expected positive, expected negative and control primer pairs. Fragment 25 was used as a positive interacting fragment as it is proximal to the Bait primer on Fragment 27 (Figure 5.2). Fragment 8 was used as a negative interacting fragment as it is distal to the Bait fragment and not predicted to show a looping interaction (Figure 5.2). Control primer pairs were designed against the “3C Constant” amplicon, as previously described (Figure 5.6). Figure 5.8 shows PCR amplifications of both LNCaP and PrEC material in 2X dilution series for the three amplicons tested, carried out in duplicate. “3C Constant” showed amplification throughout the dilution series and was indicative of the sufficiency of the DNA quantities used. For PrEC and LNCaP, 17 ng and 20 ng respectively showed amplification in duplicate for the positive Fragment 25 but not for the negative Fragment 8. 20 ng was therefore selected as an appropriate amount of 3C material for further analyses.

5.4.5 KLK 3C analysis in PrEC and LNCaP

TaqMan qPCR was carried out on 20 ng of PrEC and LNCaP 3C DNA using primers designed against all specified fragments. ChIP amplicon “CTCF-1” was used to quantitatively normalise the output of the assay in both samples. Figure 5.9 shows the successful amplification of all primer sets in both PrEC and LNCaP 3C DNA. “Loops” were designated on the basis of localised enrichment of crosslinked product when compared to surrounding interactions339. From these data it is apparent that the same three looping structures are present in both LNCaP and PrEC samples. Loop 1 extends from the Bait to CTCF peak B, and is inclusive of the entire LREA region as well as upstream genes C19orf4 and KLK1. Loop 2 extends towards CTCF peaks G/H and includes LRES genes KLK5 to KLK10. This division is of note as these genes are the subset of the KLK LRES genes that are particularly repressed in LNCaP. This is clearly shown by LNCaP and PrEC RNA-Seq enrichment in Figure 5.9, as well as summarised Affymetrix GeneChip array data in Table 5.1. The final looping product observed (Loop 3) was between the Bait and CTCF peaks I/J and included KLK11 to KLK14, constituting the remainder of the KLK LRES genes. These results suggest that CTCF-bound looping structures demarcate regions of distinct epigenomic and transcriptional regulation and that these structures are present regardless of the nature of the contained regulation, being found unchanged in both PrEC and LNCaP cell lines.

Figure 5.8 3C DNA optimisation The 3C Constant Fragment (93 bp), Fragment 25 (86 bp) and Fragment 8 (85 bp) were amplified using PCR and visualised using a 1.5% agarose gel. A 2x dilution series was used for both LNCaP and PrEC, starting from 160 ng and 68 ng respectively.
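As a minimal sketch, the 2x dilution series described for Figure 5.8 can be enumerated from the stated starting amounts (160 ng for LNCaP, 68 ng for PrEC). The function name and the number of dilution steps are illustrative assumptions and are not taken from the protocol.

```python
def twofold_series(start_ng, steps):
    """Return `steps` DNA amounts (ng), halving at each step from `start_ng`."""
    return [start_ng / (2 ** i) for i in range(steps)]

# Starting amounts are from the text; the number of steps (4) is assumed.
lncap = twofold_series(160, 4)
prec = twofold_series(68, 4)
```

Under these assumptions, the 20 ng LNCaP and 17 ng PrEC amounts quoted in the text correspond to the fourth and third points of their respective series.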


Figure 5.9 KLK chromatin looping in PrEC and LNCaP Top panels: TaqMan qPCR was carried out using all 3C primer pairs and resultant values were normalised to “CTCF-1” amplification values for each cell type. These normalised values are plotted against the genomic distance from the “Bait” locus. Error bars represent +/- SD. Where enrichment above surrounding genomic loci was seen, loops were designated. These loops are plotted above the location of RefSeq genes across the KLK region. Bottom panels: RNA-Seq and CTCF ChIP-seq enrichment is shown for PrEC (blue) and LNCaP (red). The location of all major CTCF peaks is annotated below.
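The normalisation and loop designation described for Figure 5.9 can be illustrated with a short sketch. The text states only that values were normalised to CTCF-1 and that loops were designated where enrichment exceeded that of surrounding loci; it gives no numerical rule. The comparison with the two flanking fragments, the 2-fold threshold and all values below are therefore assumptions made purely for illustration.

```python
def normalise(raw, reference):
    """Divide each raw 3C qPCR value by the reference amplicon value."""
    return [v / reference for v in raw]

def local_enrichment(values, i):
    """Ratio of values[i] to the mean of its two flanking values."""
    flank_mean = (values[i - 1] + values[i + 1]) / 2
    return values[i] / flank_mean

# Hypothetical raw values ordered by distance from the Bait; not thesis data.
raw = [8.0, 2.0, 6.0, 1.0, 1.0]
norm = normalise(raw, reference=4.0)

# Interior fragments only; the 2-fold threshold is an assumed cut-off.
candidates = [i for i in range(1, len(norm) - 1)
              if local_enrichment(norm, i) > 2.0]
```

Here only the third fragment stands out from its neighbours, which is the kind of localised enrichment that would prompt closer inspection as a candidate loop.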

5.4.6 KLK looping in cancer cell lines

By performing 3C in LNCaP and PrEC I established the presence of an unaltered looping structure in the prostate cancer cell model that encompassed both the KLK LREA and LRES domains. To investigate whether this structure was present in other cell types, I used publicly available RNA-Seq340, CTCF-Seq and CTCF ChIA-PET data for MCF-7 breast cancer and K562 myelogenous leukaemia cell lines341-343. Chromatin interaction analysis with paired-end tag sequencing (ChIA-PET) is a technique that identifies looping chromatin interactions genome-wide at an anchored nuclear protein of interest343. In this instance, loops anchored by CTCF were identified. In MCF-7 cells, RNA is highly expressed throughout the LRES region, including KLK5 – KLK10, and this is associated with loops correlating to “Loop 2” and “Loop 3” identified previously (Figure 5.10). While this pattern of gene expression is quite similar to that found in PrEC cells, “Loop 1” is absent in this breast cancer model. In contrast, K562 cells, which show low expression throughout both KLK LRES and LREA regions, show evidence of both “Loop 1” and “Loop 2”. This cell line additionally has a looping structure that spans from the intra-LRES border of “Loop 2” to the external border of the region. Due to the limitations of the 3C methodology, the presence of this loop was not interrogated in the described LNCaP and PrEC model system. Its inclusion in K562 cells, however, suggests that this secondary LRES loop may be independent of genomic segmentation.

Figure 5.10 KLK looping in MCF-7 and K562 cell lines Chromatin loops identified by 3C in LNCaP and PrEC cells are shown in the top panel. CTCF ChIA-PET, CTCF-Seq and RNA-Seq data for K562 (bottom panels) and MCF-7 (middle panels) cells were extracted from the UCSC Genome Browser. ChIA-PET data is visualised as vertical lines representing each end of a chromatin loop, which are linked by paired-end sequencing, indicated by horizontal lines. ChIA-PET interactions with fewer than 3 paired reads were removed from visualisation as recommended by the original investigators343.
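The read-support filter applied to the ChIA-PET data in Figure 5.10 can be sketched as follows. The record layout, field names and coordinates are hypothetical; only the threshold of 3 paired reads comes from the figure legend.

```python
from typing import NamedTuple

class Loop(NamedTuple):
    start: int         # coordinate of one anchor (hypothetical values)
    end: int           # coordinate of the other anchor
    paired_reads: int  # paired-end tags supporting the interaction

def filter_loops(loops, min_reads=3):
    """Keep interactions supported by at least `min_reads` paired reads."""
    return [lp for lp in loops if lp.paired_reads >= min_reads]

loops = [Loop(100, 900, 1), Loop(200, 1500, 3), Loop(300, 700, 7)]
kept = filter_loops(loops)
```

Interactions supported by exactly 3 paired reads are retained, since the legend removes only those with fewer than 3.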


5.5 Discussion

The KLK region presents an ideal model with which to study how domain-wide epigenetic modifications may be regulated in both cancer and normal nuclear environments. At this locus, an LREA region is directly juxtaposed to a region of LRES; the former is characterised by sequential genes showing an enrichment in active histone modifications (H3K9ac, H3K4me3) and depletion of repressive modifications (H3K27me3), while the latter is distinguished by the reverse pattern. The nuclear mechanisms that direct this widespread epigenetic remodelling across domains in prostate cancer remain unclear. One possible regulatory structure that could be prone to epigenetic malformation in tumourigenesis is that of chromatin looping. This process is known to be involved in discriminating large epigenomic domains344 and is strikingly malleable at regions of developmentally controlled transcriptional regulation345.

The involvement of looping structures in the transcriptional regulation of co-localised genes has been extensively documented in Hox A-D clusters345-347, α- and β-globin gene loci348,349, as well as the Th2 cytokine loci350. These regions are characterised by two common criteria. First, they are all part of gene families; although this could reflect a bias in which loci have attracted investigation, it is more likely necessitated by coordinated regulation. Second, the looping identified, which is typically involved in gene activation, is distinguished by individual genes looping into centres of common regulation. At the α-globin loci, for example, a subset of activated genes loop into a central domain where they interact with distal regulatory elements348. The transcriptional activation exhibited in these cores is likely driven by “transcription factories”, a term for focal regions of RNA polymerase II commonly found in the nucleus93,351. The epigenetics that underpin this transcriptional activation have not been fully described for all these looping regions. LREA-like widespread enrichment of chromatin modifications, however, has been documented at the Th2 cytokine locus350 as well as the Hox A cluster318.

Figure 5.11 Model of KLK looping A) Loops identified by 3C in LNCaP and PrEC are indicated below representative CTCF-Seq enrichment and RefSeq genes. LREA and LRES KLK genes are coloured green and red respectively. B) The hypothesised looping structure of the KLK region in PrEC and LNCaP is illustrated. Green and red coloured regions indicate regions of epigenetic activation and repression respectively.

The looping-associated transcriptional and epigenetic regulation identified at the KLK locus is distinct from these previously investigated gene families. First, both regions of transcriptional repression and transcriptional activation are looped to the same CTCF-bound locus in both PrEC and LNCaP cells (Figure 5.11.B), in contrast to the previous examples, which were typically associated with gene activation. Second, instead of individual genes and regulatory elements being looped to a central domain to effect transcriptional control, looping identified in the KLK region serves to demarcate large regions of independent epigenetic activation and repression (Figure 5.11.B). This form of looping had been previously identified by Handoko and colleagues107 using CTCF ChIA-PET. Specifically, this study found that of the 1,480 identified intra-chromosomal CTCF-mediated loops in pluripotent cells, 12% and 11% were characterised by an enrichment of active (H3K4me1, H3K4me2, H3K36me3) and repressive (H3K9me3, H3K27me3) modifications respectively. Importantly, these markers were identified within the looping structures and not at the site of looping itself.

The results of the KLK study presented here contribute to this understanding by demonstrating how these loops, while clearly encompassing domains of consistent epigenetic enrichment, may be present regardless of the nature of the epigenetic state. I show that within the KLK domain, the same looping structures are present in both LNCaP and PrEC even as the contained chromatin undergoes significant remodelling (Figure 5.9). These loops were also identified in breast cancer and leukaemia cell lines (Figure 5.10). Together these observations suggest the presence of functional units of chromatin looping, which may undergo coordinated remodelling in tumourigenesis.

Similar functional units of looping have recently been described as Topologically Associating Domains (TADs)108,109. TADs were defined as sub-megabase units of the genome that preferentially self-interact and remain constant during differentiation and through X-inactivation. Significantly, while some TADs show a broad and consistent epigenetic characteristic, the borders of the TAD unit remain unchanged even if the specified epigenetic modification is deregulated109. This feature again suggests that a looped region of the genome can determine the borders of epigenetic remodelling, rather than the inverse. While TADs showed a demonstrably high enrichment of CTCF at border regions108, they are specifically defined as domains of high self-interaction. Thus, while in this thesis I have identified major looping structures that broadly demarcate the KLK region, there is a strong likelihood of finer inter-domain interactions that actively define their particular and uniquely coordinated epigenetic regulation. With this consideration, the described KLK loops still further the understanding of TADs by demonstrating the maintenance of domain-level interactions even through the rigours of tumourigenesis. Hi-C experiments are now being performed by the Clark Lab across normal and cancer cell lines to determine if this is a more generalised cancer phenomenon.


CHAPTER 6: REPLI-SEQ OPTIMISATION

CHAPTER 6 Repli-Seq optimisation


6.1 Introduction

DNA is replicated during S-phase of the cell cycle in a tightly coordinated and developmentally regulated fashion. The time that a given locus is replicated, referred to as the “replication time”, is non-random and correlates with several cellular attributes such as nucleotide content, transcriptional potential, nuclear spatiality and epigenetic character. Studies investigating this genetic quality have been ongoing for several decades, with the pace of understanding closely mirroring technological development. Early researchers exploited the high resolution afforded by β-emission of tritium labelled thymidine, which was coupled with autoradiography to infer replication dynamics. Indeed, these pioneering studies founded the now common understanding of how chromosomes replicate, separate and exchange material over the course of cell division352,353. Using this technique researchers revealed the potential for asynchronous replication between homologous X chromosomes354 as well as between distant loci on a single chromosome355,356, thus establishing replication timing as a directed process in the nucleus. Figure 6.1, adapted from “Asynchronous duplication of human chromosomes and the origin of the sex chromatin” by Morishima and colleagues, 1962357, serves as a dramatic example of the capability of these founding techniques to resolve asynchronous replication of the two X chromosomes. Further investigations using tritium labelled thymidine revealed that replication timing changes are developmentally controlled, and that these changes correlate with changes in gene expression358,359.

While these early studies demonstrate several fundamental properties of replication dynamics and its relationship to genetic control, they relied on qualitative inspection of differential karyotypes and so were significantly limited in their resolution. The development of Southern blotting360 and fluorescent in-situ hybridisation (FISH)361, however, facilitated the assessment of replication changes at the level of individual loci. To use Southern blotting for the purposes of replication identification, cells first needed to be separated on the basis of their DNA content. For this purpose, bromodeoxyuridine (BrdU), a synthetic thymidine analogue, was exploited. Cells were first synchronised with metaphase-arresting chemical agents and then labelled with BrdU at different periods following synchronisation release. Due to the different chemical composition of BrdU compared to endogenous nucleotides, populations that incorporated the analogue could be separated using caesium chloride centrifugation. This isolation of DNA at certain points in the cell cycle, when coupled with Southern blotting, therefore allowed researchers to quantify replication timing of individual genetic loci362,363.

Figure 6.1 Asynchronous X chromosome replication Tritium labelled thymidine based assays reveal gross asynchronous replication of X chromosomes. Figure and legend adapted from Morishima et al, 1962357.

The incorporation of BrdU in newly replicating DNA has enabled several distinct techniques. In addition to the aforementioned use in caesium chloride density gradients, BrdU was essential in revealing “replication bands” through its inhibition of Hoechst 33258 fluorescence364. The most sensitive utilisation however, and indeed a primary focus for this chapter, is the use of anti-BrdU antibodies for immunoprecipitation (BrdU-IP)365. The initial protocol describing this technique first pulse labelled cell populations with BrdU, such that growing cells would incorporate BrdU wherever replication was occurring. This was followed by caesium chloride centrifugation to separate cells based on progression through S-phase, which were then subjected to BrdU-IP366. These precipitates therefore contained all the DNA that was being replicated at a specified stage in S-phase. To identify the BrdU incorporated DNA in these isolates, Vassilev and colleagues utilised the recently developed PCR protocol to amplify DNA of interest with subsequent Southern blotting. This technique was further optimised by using fluorescence-activated cell sorting (FACS) to more precisely fractionate S-phase populations367, and then later with the use of q-RT-PCR to quantify immunoprecipitated DNA368.

The more recent advent of whole-genome technologies, such as micro-arrays and next generation sequencing, has now allowed researchers to fully interrogate the output of the BrdU-IP protocol. Using comparative genomic hybridisation (CGH) arrays that compared BrdU-IP material from early and late S-phase, Hiratani and colleagues created the first replication timing profile that mapped an entire mammalian genome115. This technique has since been applied to human pluripotent cells116, acute lymphoblastic leukaemia (ALL) cancer cells369 as well as cell line DNA. Although CGH arrays provided reasonable genomic coverage of replication domains, the use of next generation sequencing to probe BrdU-IP material has several clear advantages. Next generation sequencing typically provides greater genomic coverage than CGH arrays and is able to interrogate traditionally confounding repetitive regions. Additionally, as CGH-based replication assessment relies on the relative enrichment of “early” and “late” material to quantify replication, the technique is unable to distinguish loci that replicate asynchronously from those that have poor signal due to deletion, amplification or other confounding issues; this limitation does not affect sequencing-based assessment. BrdU-IP combined with next generation sequencing, a technique termed “Repli-Seq”, was first developed by Hansen and colleagues118 to assess plasticity between replication profiles in diverse cell types.
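The limitation of ratio-based assessment noted above can be illustrated with a small sketch. The function, the pseudocount and the signal values are hypothetical and are not drawn from any of the cited studies; the point is only that a ratio of early to late signal cannot separate a locus with strong signal in both fractions from one with no signal in either.

```python
import math

def log2_early_late(early, late, pseudocount=1.0):
    """log2 ratio of early to late signal; the pseudocount (assumed)
    avoids division by zero when a locus has no signal."""
    return math.log2((early + pseudocount) / (late + pseudocount))

# Hypothetical signals: one locus replicating in both fractions,
# one locus with no signal in either (e.g. deleted).
both_fractions = log2_early_late(200, 200)
no_signal = log2_early_late(0, 0)
```

Both loci return the same ratio, whereas per-fraction read counts from sequencing would differ between them.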


In this chapter, I describe the optimisation and validation of the Repli-Seq protocol required to assess the replication profiles of the LNCaP and PrEC genomes. In Chapter 7, I use Repli-Seq to determine if there is a cancer-associated change in replication timing and if this correlates with a change in the cancer epigenome.


6.2 Aim

To optimise, perform and validate the Repli-Seq protocol on LNCaP and PrEC cells.

Specific aims:
i) To optimise BrdU pulse labelling and immunoprecipitation conditions.
ii) To validate the relevancy of literature derived “control amplicons” in prostate cell systems.
iii) To perform and validate the Repli-Seq protocol.

6.3 Methods

6.3.1 BrdU pulse labelling and ethanol fixation

PrEC and LNCaP cells were grown to 80% confluency in T150 flasks (#CLS430825, Corning). Media was removed by aspiration and replenished with fresh medium containing 50 µM BrdU (#B5002, Sigma-Aldrich). As BrdU is light sensitive, all further steps were carried out away from direct light sources. Cells were incubated at 37°C with 5% CO2 for 2 hr, rinsed gently with 12 ml cold PBS and detached using 10 ml Trypsin-EDTA at 37°C. Cells were transferred to a 50 ml falcon tube. Flasks were washed with 10 ml of media and this was added to the trypsinized cells. BrdU labelled cells were centrifuged at 500 g, 5 min at 4°C. The supernatant was aspirated and the cells were resuspended in 2.5 ml cold PBS. 7.5 ml of ice-cold 100% ethanol was added quickly to the cells and pipetted up and down to ensure a single cell suspension. BrdU labelled and ethanol fixed cells were then stored at -20°C for up to 6 months.

6.3.2 Propidium iodide staining and FACS sorting

4 × 10⁶ BrdU labelled and ethanol fixed cells were centrifuged at 700 g for 5 min and the supernatant discarded. Cells were resuspended in PBS with 1% FBS, centrifuged again and resuspended in 4 ml PBS with 1% FBS (1 × 10⁶ cells/ml). 1 mg/ml propidium iodide (#P4170, Sigma-Aldrich) was added to a final concentration of 50 µg/ml and incubated overnight at 4°C in the dark. 10 mg/ml RNase A was added to a final concentration of 250 µg/ml and the sample was incubated for ½ hr at room temperature in the dark. Samples were kept on ice prior to and during FACS sorting. Flow cytometry was performed using a BD Influx Cell Sorter (BD Biosciences) with the assistance of Dr Chris Brownlee and Nikki Alling, operators of the Flow Facility at the Garvan Institute. Forward and side scatter were used to select a population of cells free of cell debris and doublets. Cells were sorted using a 488 nm excitation into the 6 fractions shown in Figure 6.2.C. Sorting continued until ~50,000 cells had been collected for each nominated fraction.

To isolate DNA from sorted cell populations, 1 ml of SDS-PK buffer (50 mM Tris-HCl, 10 mM EDTA, 1 M NaCl, 0.5% SDS, 0.2 mg/ml proteinase K, 0.05 mg/ml glycogen) was added per 100,000 cells. Samples were incubated at 56°C for 2 hr and then overnight at -20°C. DNA was isolated using phenol/chloroform extraction with isopropanol as described previously, and resuspended in TE buffer.
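The stock volumes implied by the propidium iodide and RNase A final concentrations in this section can be worked out with the standard dilution relation, C_stock × V = C_final × (V_sample + V). The sketch below is illustrative only: the function name is hypothetical, the protocol does not state the volumes actually pipetted, and whether the added volume was accounted for is an assumption.

```python
def stock_volume_ul(stock_ug_per_ml, final_ug_per_ml, sample_ul):
    """Volume of stock (µl) to add to `sample_ul` so that the mixture
    reaches `final_ug_per_ml`, accounting for the added volume itself."""
    return final_ug_per_ml * sample_ul / (stock_ug_per_ml - final_ug_per_ml)

# Concentrations are from the text; the 4 ml sample volume is the
# resuspension volume stated for the stained cells.
pi_ul = stock_volume_ul(1000, 50, 4000)               # 1 mg/ml PI stock
rnase_ul = stock_volume_ul(10000, 250, 4000 + pi_ul)  # 10 mg/ml RNase A stock
```

Ignoring the added volume gives the simpler estimate of 200 µl of propidium iodide stock; the difference is about 5%.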

6.3.3 BrdU immunoprecipitation

Isolated DNA was divided into 200 l aliquots in 1.5 ml hydrophobic microtubes (#1210, Scientific Specialties Inc.) and sonicated for 30 min, 30 sec on/off on high power using a Bioruptor (Diagenode), until DNA had a mean size of ~700 bp. Samples were immunoprecipitated using up to 100,000 cells per reaction, in a volume made up to 500 l using TE. Samples were incubated at 95oC for 5 min, and then cooled on ice for 2 min. 5 l of each sample was taken at this stage for input normalisation. 60 l of 10X IP buffer (100 mM NaH2PO4 (pH 7.0), 1.5 M NaCl, 0.5% Triton X-100) was added to each sample, which were then vortexed and had 80 l of 12.5 g/ml anti-BrdU antibody (#555627, BD Pharmingen) added. Samples were incubated with turning for 20 min at room temperature in the dark. 35 g of rabbit anti-mouse IgG (#M7023, Sigma-Aldrich) was added and the samples were again incubated with turning for 20 min at room temperature. The samples were centrifuged for 20 min, 16,000 g at 4oC, and the supernatant was carefully

119

CHAPTER 6: REPLI-SEQ OPTIMISATION removed. 750 l of cold 1X IP buffer was added, the samples were centrifuged and the supernatant was again removed. Pellets were resuspended in 200 l digestion buffer (50 mM Tris HCl (pH 8.0), 10 mM EDTA, 0.5% SDS) with freshly added proteinase K (0.25 mg/ml) and incubated overnight at 37oC. 100 l of fresh digestion buffer with proteinase K was added and samples were incubated for a further 60 min at 56oC. DNA was extracted using two phenol/chloroform extractions and an ethanol precipitation, using glycogen and

NH4C2H3O2. DNA was washed with 70% ethanol and resuspended in 30 l TE. DNA was quantified using the ssDNA Qubit assay kit (#Q10212, Life Technologies) and stored at -20oC until required.

6.3.4 dsDNA Klenow extension and sonication

After BrdU-immunoprecipitation, DNA was single stranded (ssDNA). To reconstitute the second strand the Random Primers DNA Labelling System (#18187-013, Life Technologies) was used. Each sample was made up to 54 µl, heated to 95°C for 5 min and then cooled on ice for 2 min. 4 µl each of 0.5 mM dATP, dCTP, dGTP, dTTP and 30 µl of Random Primers Buffer Mix (0.67 M HEPES, 0.17 M Tris HCl, 17 mM MgCl2, 33 mM 2-mercaptoethanol, 1.33 mg/ml BSA, 18 OD260 units/ml oligodeoxyribonucleotide primers (hexamers)) was added to each sample and mixed briefly. 2 µl of Klenow Fragment (large fragment, 3 U/µl) was added; the samples were mixed gently and incubated for 2 hr at 37°C. 10 µl of Stop Buffer (0.5 M EDTA, pH 8.0) was used to stop the reaction, and the samples were made up to 150 µl with H2O.
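As a simple check of the volumes listed for the Klenow extension, the reaction can be tallied as below. The component volumes are those stated in the text (assuming micro-litre units throughout); the dictionary keys and variable names are illustrative.

```python
# Volumes (µl) as listed for the Klenow extension reaction.
reaction = {
    "sample": 54,
    "dNTPs (4 nucleotides x 4 µl)": 4 * 4,
    "Random Primers Buffer Mix": 30,
    "Klenow Fragment": 2,
}
reaction_ul = sum(reaction.values())   # volume during the 2 hr incubation
stopped_ul = reaction_ul + 10          # after adding 10 µl Stop Buffer
water_ul = 150 - stopped_ul            # H2O needed to reach 150 µl
```

On these figures the incubation volume is 102 µl and 38 µl of H2O brings the stopped reaction to 150 µl.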

Sonication was carried out on ice with a Sonifier 250 (Branson Ultrasonics), using 6 cycles of on/off: 0.5 sec/1 sec for a total of 10 seconds, until the DNA had a mean size of 200-500 bp. DNA was purified using a Wizard SV Gel and PCR Clean-Up System (#A9281, Promega) and eluted in 40 µl H2O. Quantification was carried out using the dsDNA High Sensitivity Qubit assay kit (#Q32854, Life Technologies).


Figure 6.2 The Repli-Seq protocol


6.4 Results

The Repli-Seq protocol utilised in this investigation was broadly adapted from Hansen et al, “Sequencing newly replicated DNA reveals widespread plasticity in human replication timing”118, with modifications inspired by Azuara et al, “Profiling of DNA replication timing in unsynchronized cell populations”370 and Ryba et al, “Genome-scale analysis of replication timing: from bench to bioinformatics”371. The protocol was designed and optimised with the assistance of Dr Liz Caldon, a senior postdoctoral fellow in the Cancer Division, Garvan Institute.

6.4.1 Repli-Seq

The Repli-Seq protocol is described generally in Figure 6.2. Briefly, cells in exponential growth phase are incubated for a short period (pulse labelled) with the thymidine analogue bromodeoxyuridine (BrdU). BrdU is therefore incorporated wherever DNA replication is currently occurring in a given cell (Figure 6.2.A). The cells are immediately fixed in ethanol, which halts further cellular activity and allows the cells to be stained with propidium iodide (PI) (Figure 6.2.B). PI is a DNA intercalating molecule that fluoresces when bound to nucleic acid. This allows identification of cells within a population that have an amount of DNA ranging from n (G1 phase) to 2n (G2 phase). In this instance, ethanol fixed and PI stained cells were separated into 6 populations of cells using a BD Influx Cell Sorter. These populations represented G1 and G2 phases, as well as 4 intermediate fractions of S phase (Figure 6.2.C). In each specified sample, BrdU was present wherever DNA was being replicated at that part of the cell cycle. Following cell sorting, DNA was purified from each sample, sonicated into ~700 bp fragments (Figure 6.2.D) and subjected to BrdU immunoprecipitation (Figure 6.2.E). This immunoprecipitation selectively enriches for DNA that contains incorporated BrdU, thus isolating the DNA that was replicating in an identified part of the cell cycle. Before proceeding, it was necessary to validate the enrichment of BrdU immunoprecipitated material using a panel of amplicons of DNA that is known to consistently replicate at different times in the cell cycle (Figure 6.2.F). Finally, as the DNA recovered from BrdU immunoprecipitation is single stranded (ssDNA), a Klenow extension with random hexamers was used to reconstitute the second strand prior to next generation sequencing (NGS, Figure 6.2.G).
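The division of PI-stained cells into G1, four S-phase fractions and G2 can be sketched as a simple binning of PI intensity. This is an illustration only: the actual sort gates were set on the cell sorter (Figure 6.2.C), and the equal-width S-phase gates, the assumption that the G2 peak lies at twice the G1 peak, and the tolerance value are all hypothetical.

```python
def assign_fraction(pi, g1_peak, tolerance=0.1):
    """Assign a cell to G1, S1-S4 or G2 from its PI intensity.

    Assumes the G2 peak sits at twice the G1 peak and that cells within
    `tolerance` (as a fraction of the G1 peak) of either peak belong to it.
    """
    lower = g1_peak * (1 + tolerance)   # upper edge of the assumed G1 gate
    upper = g1_peak * (2 - tolerance)   # lower edge of the assumed G2 gate
    if pi <= lower:
        return "G1"
    if pi >= upper:
        return "G2"
    width = (upper - lower) / 4         # four equal-width S-phase gates
    return "S" + str(min(4, int((pi - lower) // width) + 1))

# Hypothetical PI intensities with the G1 peak at 100 arbitrary units.
labels = [assign_fraction(pi, g1_peak=100) for pi in (100, 115, 135, 155, 185, 200)]
```

With these example intensities each cell falls into a different one of the six fractions, in order from G1 to G2.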

The parameters that were necessary to optimise over the course of this investigation are discussed below.

6.4.2 BrdU labelling time

The progression through the cell cycle and the length of S phase in particular is known to vary between different cell types372,373. Accordingly, the length of time that growing cells are pulse labelled with BrdU varies between replication timing studies370,374, ranging from 30 min to 2 hr370,371. Shorter incubation periods label more discrete regions of the genome but are thought to increase the level of ‘noise’ during subsequent BrdU immunoprecipitations. Conversely, a pulse label that is too long has the risk of labelled DNA being integrated into subsequent daughter cells. To investigate whether a 2 hr BrdU pulse label is too long for this study, I incubated growing LNCaP and PrEC cells with BrdU for 1 hr and 2 hr time periods prior to ethanol fixation and the addition of FITC conjugated anti-BrdU antibody (#347583, BD Biosciences). Cells were stained with PI and analysed using a FACSCanto II flow cytometer (Figure 6.3). This analysis shows that ~13% of cells have incorporated BrdU in LNCaP cells compared to ~6% in PrEC populations, representing the assessable S-phase population. Importantly, no cell populations were observed in the far upper left of these analyses for either 1 hr or 2 hr incubations. This population would represent cells that have integrated BrdU and have also progressed through G2 phase. As both conditions yielded appropriate distribution, a 2 hr BrdU incubation was used for remaining experiments.


Figure 6.3 Optimisation of BrdU labelling conditions FACS analysis was carried out using a FACSCanto II flow cytometer, using detectors for FITC (BrdU, y-axis) and PI (x-axis). Both LNCaP and PrEC cells that had been incubated with BrdU for either 1 hr or 2 hr are shown. Cell populations in the upper gate of each plot represent those cells that are going through S-phase (progression along the x-axis) and have integrated BrdU (higher on the y-axis).

6.4.3 Immunoprecipitation antibody optimisation

The protocol in Section 6.3.3 describes in detail the immunoprecipitation procedure that isolates DNA fragments that have integrated BrdU. As there were discrepancies in the available protocols concerning the amount of anti-BrdU antibody needed and the length of primary antibody incubation, I carried out the protocol while varying these conditions. LNCaP material that had been pulse labelled with BrdU for 2 hr, sonicated to 500 bp and DNA purified using QiaAmp Mini Kit was used as source material. The conditions tested were either 40 µl or 80 µl of a 12.5 µg/ml anti-BrdU solution and an incubation time of either 20 min or 40 min. The measure of immunoprecipitation efficacy was the amount of DNA recovered from the assay compared to the input material. Under these experimental conditions, a 20 min incubation with 80 µl of anti-BrdU antibody was clearly the optimal condition (Figure 6.4) and was used for further experiments.

Figure 6.4 BrdU immunoprecipitation optimisation BrdU pulse labelled LNCaP DNA was incubated with varying amounts of anti-BrdU antibody for varying amounts of time during the BrdU immunoprecipitation protocol (described in Section 6.3.3). All conditions were carried out in duplicate. The red bar indicates the negative IgG only control, which did not use any primary antibody. The green bars indicate the conditions that yielded the highest amount of DNA.

6.4.4 qPCR Amplicon Validation

Validation of the BrdU immunoprecipitation protocol was carried out using q-PCR. The replication program is recognised to vary between different cell types118 and as such the identity of DNA that would definitively replicate early or late in PrEC and LNCaP was an unknown. To counter this, a panel of putative ‘early’ and ‘late’ DNA amplicons that were previously suggested371 to exhibit replication constancy were used (Table 6.1). Mitochondrial DNA was used as a general control of immunoprecipitation as it is known to replicate throughout the cell cycle. q-PCR with the selected primers was carried out on LNCaP material that had been sorted into 6 phases of the cell cycle and undergone BrdU immunoprecipitation. Figure 6.5 shows the relative enrichment of all amplicons detailed in Table 6.1. Candidate early replicating amplicons clearly show increased enrichment in S1 and S2 sorted fractions, while candidate late amplicons were generally increased at S4 and G2 fractions. In contrast, mitochondrial DNA showed enrichment for all the sorted fractions. As they exhibited the greatest signal to noise ratio, BMP1 and DPPA2 were used for all future validations as markers of early and late replication respectively.

Gene      Gene Name                                 Putative Replication Time
HBA1      Haemoglobin, alpha 1                      Early
MMP15     Matrix metalloproteinase 15               Early
BMP1      Bone morphogenetic protein 1              Early
HBB       Beta globin                               Late
PTGS2     Prostaglandin-endoperoxide synthase 2     Late
NETO1     Neuropilin and tolloid-like 1             Late
SLITRK6   SLIT and NTRK-like family, 6              Late
ZFP42     Zinc finger protein 42                    Late
DPPA2     Developmental pluripotency associated 2   Late
MITO      Mitochondrial DNA                         Throughout S-phase

Table 6.1 Putative BrdU-IP control amplicons

6.4.5 dsDNA reconstitution

While double stranded DNA (dsDNA) is required for standard NGS, the DNA isolated from the BrdU immunoprecipitation protocol is single stranded (ssDNA). In order to reconstitute the complementary DNA strand, a Klenow extension with random hexamers as priming sequences was used. The Klenow fragment is a large protein fragment derived from the DNA polymerase I protein of E. coli. It has maintained the original protein’s polymerase activity but lost its 5'→3' exonuclease function375. In order to use this dsDNA synthesis, I needed to optimise different aspects of the protocol. I show that DNA amplification occurs in a linear fashion, dependent on the length of the 37°C incubation (Figure 6.6.A). Similarly, the dsDNA output is directly proportional to the amount of input ssDNA (Figure 6.6.B). Accordingly, I used a 2 hr incubation and a 10 ng ssDNA input for all future experiments.

Figure 6.5 Validating replication timing amplicons qPCR was used to amplify amplicons presumed to replicate early or late in the cell cycle. BrdU labelled LNCaP material that had been sorted into the 6 nominated fractions (coloured) and immunoprecipitated was used as source material. qPCR data was normalised to immunoprecipitation input samples. Error bars represent +/- SD.
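The input normalisation applied to the qPCR data in Figure 6.5 is not given as a formula in the text. One common convention, shown here as an assumption and not as the calculation actually used, is the percent-input method: the input Ct is adjusted for the fraction of sample set aside and then compared with the immunoprecipitate Ct, assuming product doubles with each cycle. The Ct values below are hypothetical.

```python
import math

def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input recovered in the IP, from qPCR Ct values.

    The input Ct is first adjusted to what a 100% input would give,
    then compared with the IP Ct assuming perfect doubling per cycle.
    """
    adjusted_input_ct = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (adjusted_input_ct - ct_ip)

# 5 µl of a 500 µl sample is set aside as input in Section 6.3.3 (1%).
# The Ct values are hypothetical and only illustrate the arithmetic.
value = percent_input(ct_ip=27.0, ct_input=25.0, input_fraction=5 / 500)
```

In this example the immunoprecipitate crosses threshold two cycles after the 1% input, which corresponds to a recovery of 0.25% of input.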

An important consideration for all procedures that follow an immunoprecipitation is to ensure appropriate enrichment is maintained. In this instance, I was concerned that the Klenow extension could be biased against specific sequence composition. Figure 6.6.C shows that the initial S1 and S4 fractionated input material displays the expected ratio between BMP1 (early replicating) and DPPA2 (late replicating) amplicons. After a 2 hr Klenow extension, a ten-fold enrichment of early and late replicating material is clearly maintained in both S1 and S4 samples, thus validating this approach.

Following a Klenow extension, I observed that the size range of treated material increased significantly, from a band centred at 400 bp to a DNA smear that spanned the length of a 1.5% agarose gel (Figure 6.6.D). I hypothesised that this expansion of fragment size was due to DNA concatemers caused by the Klenow extension. In order to reduce the size range to the 200-500 bp required for NGS, Klenow treated products were sonicated using a Sonifier 250 probe sonicator, as described (Section 6.3.4, Figure 6.6.D).

Figure 6.6 Optimising Klenow extension conditions A) 8 ng of ssDNA underwent Klenow extension at 37 °C for the different time points specified. dsDNA was quantified using dsDNA High Sensitivity Qubit assay kit. B) Different amounts of ssDNA were incubated with Klenow for 2 hr and the resultant dsDNA was quantified. C) qPCR was carried out against “S1” and “S4” fractionated LNCaP material before (green) and after (blue) Klenow extension. The ratio of the BMP1 and DPPA2 amplicons provides a quantity of relative early to late material. D) Resonication of dsDNA material. Shown is a UV imaged 1.5% agarose gel separation. The left lane shows a 100 bp size ladder with the size of bands indicated. Material that had been BrdU-immunoprecipitated (“Original”) showed a size range of 200-700 bp. This material upon Klenow labelling (“Klenow”) shows an increase in large DNA fragments. The right lane shows Klenow labelled DNA that has been re-sonicated using a Sonifier 250 probe.

6.4.6 Repli-Seq with LNCaP and PrEC

LNCaP and PrEC cells were pulse labelled with BrdU for 2 hr, before being fixed with ethanol, stained with PI and treated with RNase A. FACS sorting was carried out in duplicate for each population as this was the step that was likely to introduce the most variability within the Repli-Seq protocol. Cells were sorted into 6 fractions of the cell cycle that encompassed S-phase (Figure 6.7), and between 30,602 (LNCaP S3.A) and 726,108 (PrEC G1.B) cells were recovered in each fraction (Table 6.2). Due to normal cell cycle dynamics, cells within G1 and G2 represented the greatest populations. DNA was purified from these sorted samples using a phenol/chloroform extraction and an average of 75 ng/10,000 cells was recovered for PrEC cells, and 125 ng/10,000 cells for LNCaP cells, reflecting the differences in ploidy between the two cell types.

Figure 6.7 Representative LNCaP PI FACS sort BrdU labelled and ethanol fixed LNCaP cells were stained with PI and treated with RNase A. PI fluorescence under 488 nm excitation was used to separate cells into the 6 fractions of the cell cycle indicated here (G1, S1-S4, G2). In this example, ~400,000 cells were isolated from “G1” phase and ~50,000 cells were recovered from each of the remaining fractions.

DNA isolated from the 24 fractions was immunoprecipitated, Klenow treated and validated as previously described. Successful qPCR validation of the final solutions is indicated in Figure 6.8, where appropriate relative enrichment of BMP1 and DPPA2 is shown for each fraction. Following successful validation, 38 µl of Klenow treated material remained. Of this, 15 µl was sent to the University of Southern California Epigenome Centre Data Production Facility for 50 bp single end sequencing. The amount of DNA ranged from 16-47 ng, and is detailed in Table 6.2.

Figure 6.8 Final qPCR Validation of Repli-Seq protocol A) qPCR was carried out on duplicate, sorted PrEC and LNCaP samples that had been BrdU Immunoprecipitated. The relative amounts of BMP1 and DPPA2 amplicons were used as a measure of early to late replication. The y-axis uses a log10 scale. B) qPCR was repeated on the same samples following Klenow dsDNA synthesis.

Table 6.2 Summary of LNCaP and PrEC Repli-Seq experiments A and B groupings indicate the parallel duplicate experiments carried out. The left two green columns indicate the quantity of cells found in each sorted fraction and the DNA isolated from each population of cells. “Post-IP DNA” refers to the amount of ssDNA recovered from BrdU immunoprecipitation. The percentage in the brackets refers to the yield compared to the original DNA input. “Post-Klenow DNA” refers to dsDNA quantified after Klenow extension. “DNA sequenced” is the amount of DNA found in 15 µl of Klenow extended product, and is the amount of DNA that was used for NGS.

6.5 Discussion

With the conclusion of this protocol I have successfully carried out the Repli-Seq methodology on PrEC and LNCaP cells. The Repli-Seq protocol was initially developed by Hansen and colleagues118 with a suite of different cell types and cell models. Unlike the protocol described in this chapter, the published method used DAPI (4′,6-diamidino-2-phenylindole) fluorescence to separate cells instead of PI, which precludes ethanol fixation. In addition to this dissimilarity, a number of aspects of the protocol were either obscure or not present in the published material. For these reasons, an extensive optimisation of the Repli-Seq protocol was carried out, in order to be certain of proper enrichment of distinct DNA in temporally isolated cell populations. Here I show optimisation of BrdU pulse labelling time, BrdU antibody incubation conditions and dsDNA synthesis. Throughout I have ensured that the appropriate enrichment of late and early amplicons was maintained. This is an essential aspect of all enrichment-based analyses, and its undertaking will facilitate success throughout the NGS process and downstream analyses as discussed in the following chapter.

CHAPTER 7: REPLICATION TIMING IN PROSTATE CANCER

CHAPTER 7 Replication Timing in Prostate Cancer

7.1 Introduction

DNA replication proceeds through S-phase as a strictly defined and developmentally regulated process such that all DNA is replicated exactly once. In prokaryotes, which contain small circular genomes, the replication complex binds to a single well characterised DNA element and proceeds bidirectionally until the entire genome has been duplicated376,377. Eukaryotic genomes are typically much larger and can contain many thousands of origins of replication. With the notable exception of Saccharomyces cerevisiae378 and related species, eukaryotes do not have a conspicuous DNA sequence from which replication begins. In metazoans, there are several proposed genomic characteristics of origins, including the presence of CpG islands379, AT-rich DNA380, gene promoters381, nucleosome free regions382 and specific histone signatures379,383,384. In both yeast and metazoan genomes, only a subset of origins bound by the origin recognition complex (ORC) are actually involved in DNA replication385,386. The activation of these sites is asynchronous and can occur at any stage in S-phase. The distance between a given DNA locus and an activated origin, as well as the stage of S-phase at which the origin is activated, together determine the “replication time” of the locus.
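The last point of the paragraph above can be illustrated with a toy calculation. The following is a minimal sketch only: it assumes a constant fork speed and that a locus is replicated by whichever fork arrives first. The function name, the origin coordinates and the fork speed are all hypothetical values chosen for illustration and are not taken from this thesis.

```python
def replication_time(locus_kb, origins, fork_speed_kb_per_min=1.5):
    """Earliest fork arrival time (minutes into S-phase) at locus_kb.

    origins: list of (position_kb, activation_time_min) for activated origins.
    fork_speed_kb_per_min is an illustrative assumption, not a measured value.
    """
    return min(
        t_fire + abs(locus_kb - pos) / fork_speed_kb_per_min
        for pos, t_fire in origins
    )


# Hypothetical example: one early-firing and one late-firing origin.
origins = [(0.0, 30.0), (900.0, 240.0)]
early_locus = replication_time(150.0, origins)  # served by the early origin
late_locus = replication_time(800.0, origins)   # served by the late origin
```

In this sketch both the distance term and the activation time contribute to the result, so a locus close to a late-firing origin can still replicate later than a more distant locus near an early-firing origin.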

The nuclear factors that temporally regulate the activation of origins therefore define and coordinate the replication timing profile of a cell. While uncovering these factors in metazoan cells has proven a challenge, several clues have recently emerged. Studies using S. cerevisiae have suggested a resource-limiting mechanism of temporal origin activation, whereby replication proteins that are found in low abundance associate first with origins for which they have a high affinity and then with those for which they have a low affinity, thus establishing a programmed order of replication387-389. Although the mechanism of this dynamic origin affinity has yet to be reported, it is likely to reflect the local chromatin composition at early and late replicating origins, which are enriched in open acetylated histones390,391 and heterochromatin392 respectively.

In metazoan cells no such resource limiting mechanism has been identified, but the possibility and likelihood of its existence remains. The double-strand break response protein RIF1 is potentially implicated in this process. In mammalian cells, RIF1 exhibits temporally dependent enrichment at chromocentres393 and participates in chromatin loop modulation394. Importantly, knock-down of the gene results in global alterations to the replication program, a phenotype suggested394 to be mediated by S-phase sequestration of replication origins via chromatin loops. Histone modifications are an additional source of origin selectivity. In one example, the ectopic hyper- or hypo-acetylation of the β-globin locus, through tethering of histone modifiers, results in significant shifts to earlier and later replication respectively395. Genome-wide studies additionally demonstrate HAT inhibition leading to increased late replication origin activity and HDAC inhibition promoting early replication of heterochromatin127. A recent investigation by Dellino and colleagues has revealed that the majority of ORC bound origins are actively transcribed, and that while early replicating origins are associated with highly expressed coding RNA, late activated origins preferentially associate with poorly expressed ncRNA111. Although these transcription start sites may promote replication initiation, origins found within a gene itself are silenced by active transcription396. The relationship between origin selection, transcription, histone modifications and chromatin looping in metazoan systems is incompletely investigated and is likely to present ongoing complexity to researchers.

The malleable mechanisms of origin selection lead to a replication timing profile that is widely dynamic118, such that it can respond to developmental cues115. The need for such a program is not immediately apparent. For example, prokaryotes and certain eukaryotes, such as Drosophila melanogaster397 and Xenopus laevis398,399 embryos, maintain appropriate replication programs without resorting to temporal distinction. Current modelling suggests a dynamic replication timing program is required for proper transmission of developmentally regulated epigenetic signalling400,401. The part of the epigenome that is constituted by direct interactions to the DNA, such as CpG methylation, histone modifications, nucleosome accessibility and chromatin looping, needs to be faithfully reassembled after DNA replication. While these factors can often self-propagate after replication, as is the case with CpG methylation402 and some histone modifications403-405, replication timing provides and augments a spatially supportive nuclear environment for epigenetic replication. Broadly speaking, early replicating DNA is highly genic, transcriptionally active and epigenetically permissive, while late replicating DNA is characteristically the converse115,118,406. Replication timing can contribute to these phenotypes through two important mechanisms: i) through spatially isolating chromatin compartments at the point of replication, ii) through epigenetic modifiers that associate with the replication fork and nascent DNA.

Firstly, domains of similar replication timing are known to occupy spatially distinct regions in the nucleus. This quality has been described since early observations of inactivated X chromosomes which both replicate later and more peripherally than their active counterparts357. Contemporary studies utilising Hi-C and 4C technologies have identified early and late replicating DNA as occupying the two discrete nuclear compartments that correlate with active and repressive DNA116,122. The hierarchical relationship between replication timing, spatial isolation and epigenetic character is difficult to discern. Logically however, the coordination of replication initiation through physical connections allows for a local nuclear environment surrounding newly replicated DNA which is disposed to a particular epigenetic character. The second proposed method of epigenetic control relies on the direct association between chromatin modifying enzymes and the replication complex. Many examples of such interactions exist, and include such enzymes as histone methyltransferases (G9A407, SETDB1408,409), histone acetylases (p300410), histone deacetylases (HDAC1, HDAC2411,412) and DNA methyltransferases (DNMT1413). Microinjection studies, where plasmids were directly injected into dividing cells, exhibit nucleosomal repackaging with either acetylated or deacetylated histones depending on the time of replication124. Histone H3 is highly acetylated prior to incorporation in newly replicated DNA414,415, and its deacetylated state in late replicating plasmids is consistent with the identification of HDAC2 exclusively at late replicating foci416. In a similar result, SETDB1, a histone methyltransferase responsible for the heterochromatin H3K9me3 modification, is found at target loci only at their time of replication in mid-to-late S-phase408. These results suggest a model whereby the time of replication has a direct influence on the epigenetic state of replicated DNA. Varying enrichment of other chromatin modifying factors at the replication fork during the course of S-phase, while plausible, has not been reported.

Widespread transcriptional changes required for cellular differentiation are accompanied by extensive epigenomic remodelling that is supported by an altered replication program. The β-globin locus provided early proof of this principle417, where differentiation into erythroid cells was accompanied by a switch from late to early replication and heightened epigenetic accessibility418,419. Using genome-wide replication protocols, Hiratani and colleagues elegantly assessed the changing replication and epigenomic landscape during the course of neuronal differentiation115. This study revealed that 20% of the genome was subject to replication change from the embryonic state, consolidating smaller embryonic replication domains into larger terminally differentiated domains; a phenotype characteristically similar to polycomb domains420. iPS cells were shown to have an indistinguishable profile from embryonic cells, indicative of a fundamental role for the replication program in pluripotency. These differentiation associated replication changes were further correlated with spatial rearrangement, quantified by distance to the nuclear periphery, as well as transcriptional capacity. Similar to LAD transcriptional regulation91, genes that changed from late to early replication (or lamina associated to unassociated) reveal a potential for gene activation rather than a direct correlation. Interestingly, genes with CpG island promoters were much less influenced by surrounding replication timing changes, suggesting the presence of “strong” transcriptional promoters which could dictate local epigenetic environments. A separate study demonstrates that ~50% of the genome, primarily at gene-depleted regions, is subject to replication timing plasticity between differentiated cell types, and that changes in these regions correlate more with chromatin accessibility than transcriptional output118.

As with many epigenetic factors, the replication timing program has a role in tumourigenesis and is prone to disruption. Late replicating regions are known to have an increased mutation rate in human development421, and this trend continues in cancer development where these regions are predisposed to single nucleotide substitutions422,423. Genomic amplifications and deletions are preferentially replicated in early and late S-phase respectively422, and oncogenic fusions often occur when two loci are co-replicated and spatially co-localised424,425. In healthy cells, asynchronous replication of homologous alleles is generally only exhibited by imprinted genes and the X-chromosomes. Cancer cells by contrast show asynchrony for several loci, including tumour suppressor genes TP53426 and RB1, as well as oncogenes such as MYC427. The direct effects of asynchronous replication on cancer development are unclear; however, a shift to late replication has been associated with abnormal spindle assembly and chromosomal instability428,429. There is currently only a single study assessing genome-wide changes of replication profiles in cancer, specifically analysing cancers of haemopoietic origin369. This investigation interrogated multiple primary samples and cell lines representing acute myeloid leukaemia (AML) and acute lymphoblastic leukaemia (ALL), revealing that up to 18% of the genome was subject to tumour associated plasticity. These plastic regions demonstrated heightened replication heterogeneity when compared to normal haemopoietic cells, suggesting the typical controls of replication initiation had been deregulated. Of particular interest was the existence of replication timing “fingerprints” for both leukaemia subtypes as well as leukaemia in general, irrespective of the cell of origin. The hypothesis of a common early event in the disruption of replication timing potentiating leukaemogenesis or even tumourigenesis was subsequently raised but left unexplored.

In this chapter I use Repli-Seq to address whether replication timing is disrupted in LNCaP prostate cancer cells and whether this is associated with the mechanisms of transcriptional and epigenetic remodelling that occur in LRES and LREA regions.

7.2 Aim

I intend to assess the variability of the replication timing program in prostate cancer using LNCaP and PrEC cell lines and explore the epigenetic manifestations that this variability promotes.

Specific aims:
i) To validate and process data obtained from the Repli-Seq protocol described in Chapter 6.
ii) To isolate and describe genome-wide differences in the replication timing program between LNCaP and PrEC cells.
iii) To correlate changes in replication timing with changes in gene expression and histone modifications.
iv) To assess the relationship between replication timing and cancer-associated CpG island hypermethylation and genome-wide hypomethylation.
v) To identify epigenetically distinct classes of replication timing changes and characterise temporal changes at LRES and LREA regions.
vi) To identify cancer specific signatures of replication timing variation and evaluate changes at clinically relevant genetic loci.

7.3 Methods

7.3.1 Illumina sequencing and processing

Carried out with the assistance of Dr Elena Zotenko, a senior bioinformatics postdoctoral fellow in the Clark Lab. 15-50 ng of double stranded DNA was sent to USC Epigenome Centre Data Production Facility for 50 bp single end sequencing using the Illumina HiSeq 2000. Samples were indexed such that 6 samples were run per sequencing lane, and between 14.9 and 31.9 million reads were obtained (Table 7.2). Sequence reads were mapped to the hg18 human genome build using Bowtie, allowing up to 3 mismatches. Sequences were excluded if they failed to align to hg18, mapped to multiple locations in the genome or were clonal sequences.
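The three exclusion criteria can be sketched as a simple classification pass over the aligned reads. This is an illustrative sketch rather than the pipeline that was actually run: the input format (a sequence paired with its number of reported alignments) is hypothetical, and keeping the first copy of a clonal sequence while discarding later copies is an assumption.

```python
from collections import Counter


def classify_reads(reads):
    """Split reads into kept and excluded categories.

    reads: iterable of (sequence, n_alignments) tuples (hypothetical format).
    Returns (kept_sequences, counts) where counts tallies each category.
    """
    counts = Counter()
    seen = set()
    kept = []
    for seq, n_alignments in reads:
        if n_alignments == 0:
            counts["unaligned"] += 1
        elif n_alignments > 1:
            counts["multiple"] += 1
        elif seq in seen:
            # Assumption: the first copy is retained, later copies are clonal.
            counts["clonal"] += 1
        else:
            seen.add(seq)
            kept.append(seq)
            counts["unique"] += 1
    return kept, counts


example_reads = [("ACGT", 1), ("ACGT", 1), ("TTTT", 0), ("GGGG", 3), ("CCCC", 1)]
kept, counts = classify_reads(example_reads)
```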

7.3.2 Replication timing value processing

Carried out with the assistance of Dr Nicola Armstrong, a postdoctoral fellow in the Clark Lab. This data processing has been primarily adapted from Hansen et al. “Sequencing newly replicated DNA reveals widespread plasticity in human replication timing”118.

The density of sequence reads was calculated for each cell-cycle fraction using a 50 kb sliding window in 1 kb intervals across the genome. Genomic regions were excluded from further analysis if they contained either more than 20 reads in a 150 bp window or fewer than 50 reads in a 50 kb window. For comparison purposes, regions excluded in a replicate from one cell line were subsequently removed from the analysis. The remaining read densities were then normalised to 1 million reads per cell cycle fraction. To create Percent Normalised Density Values (PNDV), the 50 kb smoothed value was converted to a percentage of the total signal at each 1 kb locus for all 6 fractions for a given sample. This value therefore represents the percent of replication occurring at a given time point, as well as reducing the impact of copy number and mappability induced variation.
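The PNDV calculation can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: reads are taken to be already counted in 1 kb bins for a single chromosome, the two read-count exclusion filters are omitted, and window edges are zero-padded. The function name and array layout are illustrative only. Because scaling and smoothing are both linear, applying the 1 million read normalisation before the sliding window gives the same result as applying it afterwards.

```python
import numpy as np


def pndv(counts_1kb, window_kb=50):
    """Percent Normalised Density Values from binned read counts.

    counts_1kb: array of shape (6, n_bins), one row per fraction
    (G1, S1, S2, S3, S4, G2), with n_bins >= window_kb.
    Returns an array of the same shape; each column sums to 100.
    """
    counts = np.asarray(counts_1kb, dtype=float)
    # Scale every fraction to 1 million reads.
    scaled = counts / counts.sum(axis=1, keepdims=True) * 1e6
    # Mean over a 50 kb window, evaluated at every 1 kb bin.
    kernel = np.ones(window_kb) / window_kb
    smooth = np.vstack([np.convolve(row, kernel, mode="same") for row in scaled])
    total = smooth.sum(axis=0, keepdims=True)
    # Bins with no signal in any fraction are reported as NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(total > 0, 100.0 * smooth / total, np.nan)
```

Expressing each fraction as a share of the per-locus total is what removes copy number effects: a locus amplified in the cancer genome gains reads in every fraction, but its percentages are unchanged.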

To create a single value representing the time of replication for a given sample, the following formula was used:

Weighted Average = (0.917*G1)+(0.750*S1)+(0.583*S2)+(0.417*S3)+(0.250*S4)+(0*G2)

In this calculation of the Weighted Average (WA), G1, S1 etc. refer to the previously calculated PNDV values for a given locus. The formula for this transformation was obtained from the ENCODE method for “Replication Timing by Repli-Seq”. WA values represent the time of replication, where a higher WA was indicative of an earlier time of replication. These values were used for all genome-wide analyses of replication time.
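As an illustration, the weighted average formula given above can be applied to an array of PNDVs in a single matrix product. This is a sketch only; the constant and function names are hypothetical, and the PNDVs are assumed to be percentages ordered G1, S1, S2, S3, S4, G2.

```python
import numpy as np

# Weights from the formula above, in fraction order G1, S1, S2, S3, S4, G2.
WA_WEIGHTS = np.array([0.917, 0.750, 0.583, 0.417, 0.250, 0.0])


def weighted_average(pndv):
    """Collapse PNDVs of shape (6, n_loci) into one WA value per locus."""
    return WA_WEIGHTS @ np.asarray(pndv, dtype=float)


# Three hypothetical loci: all signal in G1, evenly spread, all signal in G2.
example = np.array([
    [100.0, 100.0 / 6, 0.0],
    [0.0,   100.0 / 6, 0.0],
    [0.0,   100.0 / 6, 0.0],
    [0.0,   100.0 / 6, 0.0],
    [0.0,   100.0 / 6, 0.0],
    [0.0,   100.0 / 6, 100.0],
])
wa = weighted_average(example)
```

With percentage inputs the WA is bounded between 0 (all signal in G2, latest) and 91.7 (all signal in G1, earliest), and a locus with signal spread evenly across all six fractions falls close to the middle of that range.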

7.3.3 ENCODE data processing

Carried out with the assistance of Dr Nicola Armstrong. Publicly available Repli-Seq datasets used in this study were made available via the ENCODE consortium118,430. These data were created by the University of Washington ENCODE group and include the following cell lines:

Cell Line   Cell Type
K562        Myelogenous Leukaemia
HeLa-S3     Cervical Carcinoma
HepG2       Liver Carcinoma
MCF7        Breast Adenocarcinoma
SK-N-SH     Neuroblastoma
HUVEC       Umbilical Vein Endothelial Cells
NHEK        Epidermal Keratinocytes
IMR90       Foetal Lung Fibroblasts
BJ          Foreskin Fibroblasts
GM12878     B-Lymphocyte
GM06990     B-Lymphocyte
GM12801     B-Lymphocyte
GM12812     B-Lymphocyte
GM12813     B-Lymphocyte
BG02ES      Embryonic Stem Cell

Table 7.1 Publicly available Repli-Seq datasets

Raw sequence data was downloaded from UCSC and mapped to hg18 in the same manner as our own sequence data, described above. Further, PNDV and WA values were created following the same protocol detailed in Section 7.3.2.

7.3.4 Epigenetic promoter quantification

Carried out with the assistance of Aaron Statham, a PhD student in the Clark Lab. To summarise ChIP-Seq enrichment for H3K4me3 and H3K27me3 at promoter sequences, normalised counts +/- 500 bp of all RefSeq TSS were tabulated. ChIP-on-chip data for H3K9me2 and H3K9ac is represented as a t-statistic calculated via ‘BlockStats’ in the Repitools R package241. Methylation values were taken from 450K array data, and were defined to be the average Beta value of all probes located within +/- 500bp of the TSS for all RefSeq genes (see Section 2.8.2).
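The promoter methylation summary described above amounts to averaging the Beta values of all probes that fall within 500 bp of a TSS. A minimal sketch follows, assuming probes are supplied as position and Beta value pairs on the same chromosome as the TSS; the function name and input format are hypothetical.

```python
def promoter_beta(tss, probes, flank=500):
    """Mean Beta value of probes within +/- flank bp of a TSS.

    probes: list of (position, beta) on the same chromosome as the TSS.
    Returns None when no probe falls inside the window.
    """
    betas = [beta for pos, beta in probes if abs(pos - tss) <= flank]
    return sum(betas) / len(betas) if betas else None


# Hypothetical promoter with two probes inside the window and one outside.
value = promoter_beta(10_000, [(9_600, 0.2), (10_400, 0.6), (12_000, 0.9)])
```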

7.3.5 gNOME-Seq WGBS processing

gNOME-Seq on LNCaP and PrEC was performed by Dr Fatima Valdes (postdoctoral fellow in the Clark Lab) and Dr Phillippa Taberlay (senior postdoctoral fellow in the Clark Lab) as previously described by Kelly et al.431 Bisulphite sequencing reads were processed and aligned to human genome build hg19 using a custom pipeline with the assistance of Aaron Statham. Whole Genome Bisulphite Sequencing (WGBS) CpG specific methylation was assessed by quantifying the incidence of cytosine and thymine nucleotides at the location of the endogenous cytosine nucleotide in all WCG sites in the genome. WCG sites were used in order to remove the incidence of GpCpG sites, as GpC methylation is impossible to distinguish from CpG methylation in this instance. CpG methylation was then defined as the ratio of cytosine to thymine (C/T) where greater than 5 reads were present.
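The WCG restriction and the read depth filter can be sketched as below. This is not the custom pipeline referred to above. It scans the forward strand only, and it reports methylation as the fraction of reads retaining a cytosine, C/(C+T), which is the usual convention and is an assumption here about how the C/T ratio in the text was computed. Both function names are hypothetical.

```python
import re


def wcg_cytosines(seq):
    """0-based positions of the C in every WCG (W = A or T), forward strand."""
    # A lookahead is used so that every position is tested independently.
    return [m.start() + 1 for m in re.finditer(r"(?=[AT]CG)", seq.upper())]


def methylation_level(c_reads, t_reads, min_reads=5):
    """Fraction of reads retaining C at a site, or None if depth is too low."""
    depth = c_reads + t_reads
    if depth <= min_reads:  # the text requires greater than 5 reads
        return None
    return c_reads / depth


# GCG and CCG contexts are skipped; only ACG and TCG are reported.
sites = wcg_cytosines("GCGACGTCGCCG")
```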

As previously calculated WA values were aligned to hg18 and gNOME-Seq was aligned to hg19, the batch coordinate conversion tool ‘liftOver’ was used via UCSC432. Using this tool, >99% of WA values in 1 kb hg18 intervals were assigned hg19 coordinates, and more than 99.9% of them retained a size of 1 kb. To directly compare replication time and methylation, gNOME-Seq WGBS data was averaged over, and assigned to, each of these represented 1 kb windows that contained WA data.
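The final averaging step can be sketched as follows. For simplicity this sketch assumes windows that sit on a regular 1 kb grid on a single chromosome, whereas windows lifted from hg18 to hg19 will generally not be grid-aligned and would need to be matched by their actual start and end coordinates. The function name and input format are hypothetical.

```python
from collections import defaultdict


def window_means(sites, window=1000):
    """Average per-site methylation within fixed-size windows.

    sites: iterable of (position, methylation) on one chromosome.
    Returns {window_start: mean methylation} for windows with >= 1 site.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for pos, meth in sites:
        start = (pos // window) * window
        sums[start][0] += meth
        sums[start][1] += 1
    return {start: total / n for start, (total, n) in sums.items()}


means = window_means([(100, 0.5), (900, 1.0), (1500, 0.0)])
```

Restricting the result to windows that also carry a WA value is then a matter of intersecting the returned keys with the lifted WA windows.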

7.3.6 K-means-clustering

Cluster analysis aims to partition the observations/data into groups so that pairwise dissimilarities between members of the same cluster are smaller than those between members of different clusters. K-means clustering is a popular method of clustering that uses a prespecified number of clusters ("K"). The observations are then assigned to the K clusters such that, within each cluster, the squared Euclidean distance between each observation and the cluster mean is minimised.
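The assignment and update steps can be sketched with Lloyd's algorithm, the standard iterative procedure for K-means. This is a generic illustration and not necessarily the implementation used for the analyses in this chapter; the random initialisation, the iteration cap and the function name are assumptions.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm. X: (n_samples, n_features). Returns (labels, centres)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every observation to every centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centre to the mean of its members; keep empty clusters fixed.
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centres = kmeans(points, k=2)
```

Because the result depends on the starting centres, it is common practice to repeat the procedure from several random starts and keep the solution with the smallest total within-cluster distance.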

7.3.7 Principal Component Analysis

Principal component analysis is an unsupervised method that finds linear combinations of the original variables or columns of a matrix X, that best explain the variation in the dataset433. It is a way to identify patterns in high dimensional data; the first few principal components may reveal structure and can be easily graphically represented. By definition, the first principal component is the eigenvector corresponding to the largest eigenvalue of the covariance matrix of the data set and the second principal component is the eigenvector corresponding to the second largest eigenvalue. The specific goal of using PCA in this thesis was to summarise patterns of correlations among observed replication timing data from different cell lines.
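The eigenvector definition can be made concrete with a short NumPy sketch. This is a generic illustration of PCA by eigendecomposition of the covariance matrix; the function name is hypothetical and the orientation of the input matrix (observations in rows, variables in columns) is an assumption rather than a description of how the replication timing matrix was arranged.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA by eigendecomposition of the covariance matrix.

    X: (n_samples, n_features).
    Returns (scores, components, explained_variance).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre each variable
    cov = np.cov(Xc, rowvar=False)          # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]          # columns are principal axes
    return Xc @ components, components, eigvals[order]


# Points lying exactly on the line y = 2x: all variance is on one axis.
scores, components, variance = pca([[0, 0], [1, 2], [2, 4], [3, 6]])
```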

7.4 Results

7.4.1 Repli-Seq data and processing

Table 7.2 Summary table of Repli-Seq samples PrEC and LNCaP were sorted into 6 fractions in duplicate. DNA (ng) refers to the amount of DNA sent for sequencing at the USC Epigenome Centre. ‘Total reads’ refers to the number of raw reads obtained for each sample; ‘unique reads’ refers to the final number that were suitable for further analyses.

Six propidium iodide (PI) sorted fractions of LNCaP and PrEC BrdU pulse labelled material were immunoprecipitated using an anti-BrdU antibody in duplicate (24 samples in total, see Chapter 6). As immunoprecipitated DNA was single stranded after BrdU-immunoprecipitation, it was necessary to synthesise double-stranded DNA using Klenow extension. The full protocol can be found in Chapter 6. Double stranded DNA was quantified using the dsDNA High Sensitivity Qubit DNA assay, and a sufficient quantity (> 10 ng) of the samples was sent to the University of Southern California (USC) Epigenome Centre Data Production Facility (Table 7.2) for ‘library prep’ and high throughput sequencing. Samples were prepared at USC such that 6 samples could be barcoded separately and run combined in a lane of the Hi-Seq system (Illumina). Barcoding entails annealing a specific primer to the 5’ ends of the sequences of a given sample, such that after sequencing all samples with this ‘barcode’ can be easily identified. This allowed for several samples to be run simultaneously in a single sequencing lane. 50 bp single end sequencing was carried out for all 24 samples, yielding 679 million reads in total that were mapped to the hg18 human genome build (Table 7.2). Sequences that failed to align to the genome (unaligned), that aligned in more than one locus (multiple) or that were clonal in composition (clonal; i.e. several reads with the exact same sequence) were excluded from further analyses. An average of 18 million unique reads was accordingly obtained for each sample. It is of note that all the samples exhibited a higher than normal rate of unaligned sequences. While there was still an adequate degree of coverage, it is possible that this high amount of unaligned reads was caused by a protracted Klenow extension potentially causing DNA concatemers.

To convert the sequencing reads to meaningful replication data, the approach described by Hansen et al118 was adapted and is partly summarised in Figure 7.1. To calculate the local genomic density of the reads in each fraction, the mean of a 50 kb sliding window was calculated in 1 kb intervals. These local densities were then normalised to 1 million reads for each sample to facilitate appropriate integration of the cell cycle fractions. To allow for copy number variations and sequence mappability and bias, all calculated sequence densities at a given locus were converted to a percentage of the total signal over all 6 fractions at that locus for a given sample. Essentially, this converted all signal in a sorted fraction to a percentage of replication that occurs at that time point at a specified locus. These values are referred to as Percent Normalised Density Values (PNDV). Visualisation of PNDV values using the Integrative Genomics Viewer (IGV) allows for ready inspection of replication fork progression. For example, in Figure 7.1 (middle panel), the presence of replication initiation zones is highlighted by red arrows. The signal progression makes it apparent that the replication fork moves along the DNA fibre and that the replication machinery moves in a bidirectional manner. This culminates where two facing replication forks meet with the cessation of further DNA replication, indicated by blue arrows.

Figure 7.1 Quantification process of Repli-Seq samples Upper panel: unique reads were mapped to the hg18 genome build and visualised using the Integrative Genomics Viewer (IGV) for 6 cell-sorted fractions of a PrEC sample. Middle panel: PNDV values were calculated and visualised in IGV. Red arrows indicate regions of replication initiation; blue arrows indicate regions of replication fork termination. Bottom panel: Weighted Average (WA) values calculated for each 1 kb locus are plotted as a continuous line. Higher values represent earlier replication and lower values represent later replication, as visualised in the top panels.

As there were 6 PNDVs that represented replication at a single locus, these were then converted to a Weighted Average (WA) using the following formula:

WA = (0.917*G1)+(0.750*S1)+(0.583*S2)+(0.417*S3)+(0.250*S4)+(0*G2)

This formula was obtained from ENCODE methods for “Replication Timing by Repli-Seq”, and generates a single value which represents the time of replication for every locus in the genome (Figure 7.1). With this calculation, a higher number equates to an earlier time of replication (more signal in G1 and S1) while a lower number equates to a later time of replication (more signal in S4 and G2).

Figure 7.2 Replicates confirm Repli-Seq reproducibility Left panels: cell sorted replicates are contrasted for all replication fractions via PNDV summarised values in PrEC and LNCaP. r² correlation coefficient values are indicated in the respective panels. Right panels: replicate PNDV were summarised as WA values for PrEC and LNCaP and contrasted.

Figure 7.3 Public Repli-Seq datasets have similar WA distributions The frequency of WA scores that were previously determined in 1 kb blocks across the genome is summarised as a kernel density estimate, thus visualising the relative distribution of a given WA value. LNCaP and PrEC values are represented by red and green lines respectively. 12 ENCODE Repli-Seq datasets of different cell lineages are visualised as pink lines.

Sample replicates were introduced at the stage of cell sorting in the Repli-Seq protocol. Because this process is operator controlled, it is thought to be the point of greatest potential variability in the full protocol. As described, PNDVs were calculated for each locus in each sample. To assess the level of variability in the replicate samples, PNDV values were compared for all loci in all duplicate samples (Figure 7.2). A high degree of correlation was observed across all sorted fractions, with a minimum r2 value of 0.948. Furthermore, WA values, which combined all 6 PNDV fractions, had an r2 value of 0.995 and 0.996 for replicates of LNCaP and PrEC respectively (Figure 7.2). Due to this high degree of correlation, for all further analyses the WA values of the two replicates were averaged to create a single value for each locus in LNCaP and PrEC separately.
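The replicate comparison and averaging step could be sketched as follows. This is an illustration only: it assumes that r2 denotes the squared Pearson correlation coefficient, and the function names and the five WA values per replicate are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def average_replicates(rep1, rep2):
    """Per-locus mean of two replicate WA tracks."""
    return [(a + b) / 2 for a, b in zip(rep1, rep2)]


# Hypothetical WA values at five loci in two replicates of one cell type.
rep1 = [80.0, 62.5, 40.0, 15.0, 70.0]
rep2 = [78.0, 65.0, 42.0, 13.0, 71.0]

r2 = r_squared(rep1, rep2)
merged = average_replicates(rep1, rep2)
```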

As a final measure of the quality and success of the Repli-Seq protocol, the distribution of WA scores obtained was compared to publicly available datasets (Table 7.1). Repli-Seq data for 12 different cell-types118,430 were obtained from the ENCODE database and processed to create WA scores as above. It is clear from Figure 7.3 that the distributions of WA at all loci in LNCaP and PrEC are comparable to the distributions found in this established ENCODE dataset, thus validating our analytical and experimental approach.

Figure 7.4 Replication timing in the normal and cancer prostate genome Repli-Seq derived PNDV values of all cell fractions are plotted for each chromosome. The name and size of each chromosome is shown above the respective chromosomal ideogram. As detailed in the bottom right legend, green values represent PrEC and red values represent LNCaP. Regions that replicate earlier in LNCaP (Early Blocks) are highlighted in blue; regions that replicate later in LNCaP (Late Blocks) are highlighted in magenta.

7.4.2 Genome-wide visualisation of Repli-Seq data

Next, to display replication timing profiles genome-wide, the PNDV signal obtained for all cell-cycle fractions on all chromosomes in LNCaP and PrEC was plotted (Figure 7.4). These data clearly reveal the previously described115,118 ‘peaks’ that represent replication initiation zones. These are found generally in early S-phase and are distinguished by signal moving in opposite directions, representative of the bidirectional replication fork. Signal appears to propagate away from these initiation zones, throughout S-phase or until it contacts a replication complex moving in the opposite direction, forming ‘troughs’ in the visualised data. These data therefore describe very clearly delineated replication origins in the LNCaP cancer genome, despite its reportedly complex genomic structure and hyperploidy434.

Figure 7.5 Classifying differentially replicated loci A) The distribution of WA differences at each locus between the two replicates of either PrEC or LNCaP is plotted as a histogram. B) The distribution of WA differences between LNCaP and PrEC is plotted. Dotted lines indicate a ΔWA of +/- 25. Values that fall outside this range are coloured red, and signify loci that are significantly changed in the time of replication.

7.4.3 Differential replication timing in cancer cells

We were interested in identifying regions of the cancer genome where there were differences in replication timing. To define a difference in the WA scores between LNCaP and PrEC that would represent a significant and real change in replication between the two samples, the frequency of differences exhibited between the replicates of each cell type was plotted (Figure 7.5.A). The maximum WA difference exhibited between these two sets of replicates was 21.5. This value was therefore a measure below which a difference in WA could be considered a technical artefact. To be more conservative, we defined a locus to have an altered replication time if the absolute difference in WA between LNCaP and PrEC was more than 25. Figure 7.5.B shows the distribution of differences between the WA of LNCaP and PrEC. These differences are clearly centred at zero, emphasising that the majority of loci remain unchanged, with respect to replication timing, between the two cell types. The WA difference of +/- 25 is visually demarcated, above and below which loci are deemed to replicate earlier and later in LNCaP respectively.

To identify larger regions in the genome where the time of replication is altered in LNCaP, we first merged all loci that had a WA difference of greater than 25 into regions. These regions were then consolidated into a single block if the distance between them was less than 50 kb. This process yielded 301 ‘Late Blocks’ with an average size of 293 kb and covering 2.8% of the genome. Likewise, 246 ‘Early Blocks’ were found that had an average size of 311 kb and covered 2.5% of the genome. All identified blocks are indicated in Figure 7.4. Figure 7.6.A shows an example of 3 regions that replicate later in LNCaP and Figure 7.6.B shows a region that replicates earlier in LNCaP. From these representative figures it is clear that genes can be located in these differentially replicated blocks.
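The block-calling procedure can be sketched as a single pass over sorted 1 kb loci. The ΔWA cut-off of 25 and the 50 kb merging distance are taken from the text; the function name and input layout are hypothetical, and the sketch additionally assumes that Early and Late Blocks are built independently, so that a block may be extended across intervening loci of the opposite class.

```python
def call_blocks(loci, threshold=25.0, max_gap=50_000):
    """Group differentially replicated loci into Early and Late Blocks.

    loci: list of (chrom, start, end, delta_wa) sorted by chrom and start,
          where delta_wa = WA(LNCaP) - WA(PrEC).
    Returns (early_blocks, late_blocks), each a list of (chrom, start, end).
    """
    blocks = {"early": [], "late": []}
    for chrom, start, end, delta_wa in loci:
        if delta_wa > threshold:
            kind = "early"
        elif delta_wa < -threshold:
            kind = "late"
        else:
            continue  # replication time considered unchanged
        current = blocks[kind]
        # Adjacent loci (gap 0) and regions closer than max_gap are merged.
        if current and current[-1][0] == chrom and start - current[-1][2] < max_gap:
            current[-1] = (chrom, current[-1][1], end)
        else:
            current.append((chrom, start, end))
    return blocks["early"], blocks["late"]


# Hypothetical 1 kb loci with their change in WA.
example_loci = [
    ("chr1", 0, 1_000, 30.0),
    ("chr1", 1_000, 2_000, 28.0),
    ("chr1", 2_000, 3_000, 5.0),
    ("chr1", 40_000, 41_000, 31.0),
    ("chr1", 200_000, 201_000, 40.0),
    ("chr1", 300_000, 301_000, -33.0),
    ("chr2", 0, 1_000, -27.0),
]
early_blocks, late_blocks = call_blocks(example_loci)
```

In this example the first three early loci are consolidated into one block because the gaps between them are below 50 kb, while the locus at 200 kb forms a separate block.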

Figure 7.6 Visualising differentially replicated regions PNDV for PrEC and LNCaP are shown in the upper panels. The summarised WA values are indicated beneath. A) Regions that replicate later in LNCaP (ΔWA < -25) are highlighted in pink. B) Regions that replicate earlier in LNCaP (ΔWA > +25) are highlighted in blue. The location of RefSeq transcripts is indicated beneath the WA plot.

Figure 7.7 Replication timing at genes A) Distribution of replication timing (WA) at all gene promoters for LNCaP (red) and PrEC (green), n=16,276. B) Box and whisker plot of RNA-Seq values (RPKM) for genes that replicate early (highest 2% of WA) and late (lowest 2% of WA) in PrEC (green) and LNCaP (red).

7.4.4 Replication timing and gene expression

In order to understand the effect a changing replication program has on gene expression, it was necessary to first assess the endogenous relationship of these two characteristics in LNCaP and PrEC. The replication time of each gene was defined to be the WA at the TSS locus. As replication profiles were calculated using a smoothed 50 kb window and generally exhibit megabase-scale constancy, the WA at the beginning of a gene was thought to be an accurate representation of the WA of the whole transcript. Figure 7.7.A shows that the majority of genes have an early replicating profile in both LNCaP and PrEC. While this is a general quality of genic regions, there are still many genes that replicate later in both PrEC and LNCaP. Figure 7.7.B details the range of RNA-Seq expression values (RPKM) exhibited by the earliest and latest replicating 2% of genes in the genome, corresponding to 325 gene loci for each class. As expected, genes that replicate late show constitutive gene repression. By contrast, genes that replicate early are far more plastic in their gene expression, tending towards higher expression, suggesting a ‘permissive’ chromatin environment.
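A minimal sketch of how a gene could be assigned a replication time and how the extreme 2% classes could be selected is given here. The 1 kb bin size and the 2% fraction come from the text; the function names, the dictionary keyed by (chromosome, bin start) and the example values are hypothetical.

```python
def wa_at_tss(chrom, tss, wa_bins, bin_size=1000):
    """Return the WA of the 1 kb bin that contains the TSS (None if absent)."""
    bin_start = (tss // bin_size) * bin_size
    return wa_bins.get((chrom, bin_start))


def replication_extremes(gene_wa, fraction=0.02):
    """Return (earliest, latest) gene lists for the given fraction of genes."""
    ranked = sorted(gene_wa, key=gene_wa.get)  # ascending WA: latest first
    k = int(len(ranked) * fraction)
    return ranked[len(ranked) - k:], ranked[:k]


# Hypothetical WA track and gene set.
wa_bins = {("chr1", 5000): 72.4}
tss_wa = wa_at_tss("chr1", 5321, wa_bins)

gene_wa = {f"gene{i}": float(i) for i in range(100)}
earliest, latest = replication_extremes(gene_wa)

# With the 16,276 promoters analysed, 2% corresponds to 325 genes per class.
n_per_class = int(16276 * 0.02)
```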

Figure 7.8 Changing replication time correlates with a change in gene expression A) The difference in time of replication (ΔWA, LNCaP-PrEC) is plotted against the difference in gene expression (Δ RNA-Seq RPKM, LNCaP-PrEC) for all RefSeq promoter loci. Dotted lines indicate a ΔWA +/- 25 outside of which a significant change in replication time is denoted and points are coloured red. B) Box and whisker plot of the change in expression for all genes that replicate later or earlier in LNCaP compared to PrEC (ΔWA +/- 25). The two groups are further divided into those that contain CpG islands at the promoter locus. The asterisk denotes a significant difference between later and earlier replicating groups (p<0.05).

Genes found in either Late or Early Blocks were considered to have a significantly altered replication profile in LNCaP. Using this definition, 350 and 143 genes were found to have a later or earlier profile in LNCaP compared to PrEC respectively (red dots, Figure 7.8.A). As the majority of genes replicate early in the normal cell (Figure 7.7.A), it is expected that more genes would replicate later than earlier in the cancer state. Figure 7.8 compares the change in expression of genes using RNA-Seq (Section 4.3.2) to the change in replication time. Genes that replicate earlier showed a significantly higher change in expression than those genes that replicate later in LNCaP (Figure 7.8.B). I found that genes that replicate later in LNCaP show a reduction in expression regardless of whether they were associated with a promoter CpG island or not. In contrast, only genes that had CpG islands showed an increase in expression if they replicate earlier in LNCaP (Figure 7.8.B). This was not, however, an absolute relationship, as many genes did not show any change in expression regardless of a changing replication time, again suggesting that the replication time facilitates rather than dictates the transcriptional environment.

Figure 7.9 Chromatin changes and changing replication timing at genes The change in replication time (ΔWA) is contrasted to the change in chromatin enrichment for H3K4me3 (ChIP-Seq), H3K9ac (ChIP-on-chip), H3K27me3 (ChIP-Seq) and H3K9me2 (ChIP-on-chip). For ChIP-Seq assayed modifications, values represent the difference in average enrichment over the promoter locus (LNCaP-PrEC; +/- 1kb of the TSS). For ChIP-on-chip assayed modifications, values represent the t-statistic of change in enrichment at the promoter locus. Dotted lines indicate a ΔWA +/- 25 outside of which a significant change in replication time is denoted and points are coloured red. Right hand panels: box and whisker plots represent changes in a given chromatin modification for genes that significantly change their time of replication (ΔWA +/- 25), asterisks denote p<0.05.

7.4.5 Replication timing and histone modifications

The relationship between an altered replication time in LNCaP and respective changes in the chromatin state of promoters relative to PrEC was investigated using a similar method. Previously used ChIP-on-chip datasets for H3K9me2 and H3K9ac (Chapter 3) as well as ChIP-Seq data for H3K4me3 and H3K27me3 were investigated in Figure 7.9. Both of the active modifications, H3K4me3 and H3K9ac, exhibited the expected positive correlation between a shift to earlier replication (increased WA) and increased promoter enrichment in LNCaP cells. Interestingly, the inverse relationship appeared to have a stronger association, with genes that replicate later in cancer exhibiting a larger depletion than the corresponding enrichment in early genes. By contrast, H3K27me3 was strongly enriched at gene promoters found in Late Blocks but depleted at promoters in Early Blocks. However, H3K9me2 failed to show pronounced change at genes that replicated either early or late, suggesting either that there is a minimal relationship between replication and H3K9me2 or that the quality of enrichment in this dataset is low.

7.4.6 Replication timing and DNA methylation

Unlike the chromatin modifications studied, DNA methylation is known to have variable influence depending on its genomic context (Section 1.2.2). Specifically, DNA methylation found in intergenic regions has a suite of different functions compared to methylation found at gene promoters. Methylation at gene promoters can itself be demarcated by the existence of CpG islands or not, both of which can silence gene expression41,435. Furthermore, methylation at gene bodies is known to correlate with transcription in differentiated cells33,436. Due to these diverse functions, I examined the relationship between replication timing and CpG methylation in the context of CpG island promoters, non-island promoters and at a whole genome level.

7.4.6.1 DNA methylation at gene promoters

Figure 7.10.A displays the differential presence of CpG islands at early and late regions in PrEC and LNCaP. CpG island containing promoters are shown to have a significant preference for early replicating genes in both cell types (p<0.01, Chi-square with Yates correction), being found at more than 80% of early promoters. Late replicating genes by contrast appear to have a significant depletion of CpG islands (p<1x10^-4) compared to the expected distribution. To investigate the relationship between methylation and replication timing at gene promoters, data from Infinium HumanMethylation450 BeadChip (450K) arrays were used for PrEC and LNCaP. 450K methylation data were used in preference to MBDCap-Seq data, due to the latter’s exclusive preference for CpG islands and other CpG dense regions. Figure 7.10.B compares the 450K methylation ‘Beta’ value (mean +/- 500 bp of TSS) with the time of replication (WA) for all gene promoters. The variation in methylation levels found at extremely early or late replicating genes in LNCaP (defined as genes in the top or bottom 2% of the relevant WA distribution) is highlighted by density plots along the vertical axes. These density plots emphasise the major methylation populations identified in each category.

For CpG island containing promoters, in both PrEC and LNCaP, and at both early (blue lines) and late (red lines) replicating regions, promoters are generally completely unmethylated (Figure 7.10.B, top panels). For promoters that do not have a CpG island, methylation exhibits a bimodal distribution, being either methylated or unmethylated at early replicating genes in both PrEC and LNCaP (Figure 7.10.B, bottom panels, blue lines). Late replicating genes in PrEC demonstrate a bimodal distribution of methylation, with a preference towards having a highly methylated promoter. In stark contrast, however, late replicating genes in LNCaP were almost exclusively unmethylated (Figure 7.10.B, bottom panels, red lines). This demonstrates a cancer-specific distinction in the relationship between DNA methylation and replication timing at non-CpG island promoters.
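For illustration, a Chi-square test with Yates correction on a 2x2 table of promoter counts could be computed as sketched here. The formula is the standard continuity-corrected statistic with one degree of freedom; the function name and the counts in the example table are hypothetical and are not the values underlying Figure 7.10.A.

```python
from math import erfc, sqrt


def chi_square_yates(a, b, c, d):
    """Yates-corrected Chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p_value). With 1 degree of freedom the upper tail
    probability of the Chi-square distribution equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs at least one count")
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / denominator
    return chi2, erfc(sqrt(chi2 / 2))


# Small check table with a value that is easy to verify by hand.
chi2_small, p_small = chi_square_yates(10, 20, 20, 10)

# Hypothetical counts: rows are early promoters vs all remaining promoters,
# columns are promoters with vs without a CpG island.
chi2_early, p_early = chi_square_yates(270, 55, 10000, 5951)
```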

Figure 7.10 Replication timing and promoter methylation A) Barplot representing the % of promoters that contain CpG islands for the latest and earliest 2% (n=325) of genes in PrEC (green) and LNCaP (red). B) Each plot contrasts the time of replication (WA) against the level of methylation (450K array, Beta value) at either CpG island or non-island genes in PrEC and LNCaP. Higher Beta values are indicative of higher CpG methylation. The distribution of methylation at the earliest (blue) and latest (red) replicating genes is plotted as a density plot on the side of each figure.

To examine the relationship between DNA methylation, the time of replication and gene expression, three-dimensional plots comparing the methylation status of each promoter against the WA and the square-root of the representative RNA-Seq expression were created (Figure 7.11.A). As previously, this was done separately for CpG island and non-CpG island containing promoters. For LNCaP and PrEC, high expression at CpG island promoters correlated both with an early time of replication as well as the presence of an unmethylated promoter. This is revealed by expression being the highest when both of these conditions are met. The corollary of this observation is that neither of these factors is in itself sufficient to promote high expression. For example, a CpG island gene that is unmethylated will only be highly expressed if it is also replicated early. Similarly, an early replicating gene will only be expressed if it is unmethylated. For genes that replicate late and are unmethylated, as well as genes that replicate early and are highly methylated, high levels of expression are not observed.

As in Figure 7.10.B, there is a much greater diversity of methylation profiles shown at non-CpG island promoters (Figure 7.11.A, lower panels). In these instances, with particular reference to PrEC methylation, expression appears to show plasticity at early replicating genes regardless of the methylation status of the promoter. This is in contrast to late replicating genes, which exhibit very low expression. This figure additionally demonstrates that promoters that do not contain a CpG island are generally less transcribed than those that do contain an island (Figure 7.11.A lower panels, Figure 7.11.B).

Figure 7.11 Comparing promoter methylation, replication and expression A) Methylation (450K array, Beta values) is compared to the time of replication (WA) for CpG-island genes and non-island genes in PrEC and LNCaP. The height of each bar represents the level of gene expression (RNA-Seq, square root RPKM). Black and red colours are used to aid visualisation of distance, representing the time of replication. B) Box and whisker plots represent the level of expression (RNA-Seq, square root RPKM) at CpG island genes and non-island genes in PrEC (green) and LNCaP (red).

To assess how a changing replication program affects the methylation status of a gene promoter, I compared the change between PrEC and LNCaP of these two variables in Figure 7.12. As expected, and as discussed previously in this thesis, CpG islands were almost exclusively hypermethylated in cancer (Figure 4.8, Figure 7.12.A). This positive change appeared to occur equally at promoters that replicated either earlier or later in LNCaP (Figure 7.12.A, right panel). As hypermethylation of CpG island promoters is associated with gene silencing and an earlier replication program is associated with gene activation, the expression values of the genes that fit both of these criteria were compared in LNCaP and PrEC (Figure 7.12.B). Hypermethylation was defined as a Beta-value change of greater than 0.2 (representing a methylation change of 20%), and is indicated by a blue horizontal line in Figure 7.12.A. Expression was compared both for genes that are hypermethylated and replicate later in LNCaP (Box i, Figure 7.12.A) and genes that are hypermethylated and replicate earlier in LNCaP (Box ii, Figure 7.12.A). Genes that replicated later were found to be either lowly expressed in both LNCaP and PrEC, or to exhibit a marked reduction in expression in LNCaP (Figure 7.12.B.i). In contrast, genes that replicated earlier and were hypermethylated showed either an unchanged low-level expression or an increase in expression in LNCaP (Figure 7.12.B.ii).

As hypermethylated genes are traditionally thought to be transcriptionally repressed in cancer, the identity and methylation profile of the two most highly activated genes were sought (Figure 7.12.B.ii, blue circles). Interestingly, they were identified as NCAM2 and LIN7A, both of which belong to ‘Group I’ methylated genes discussed earlier in this thesis (Section 4.4.4)145. Briefly, Group I genes are a class of genes that show hypermethylation at the borders of promoter associated CpG islands with a concomitant ectopic activation of cancer gene expression. The exclusive nature of activated and repressed genes belonging to either earlier or later replicating regions respectively, in spite of the hypermethylated state of the CpG island promoter region (Figure 7.12.B), further supports the transcriptional influence of the replication program.
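The selection of genes for Box i and Box ii of Figure 7.12.A can be sketched as a simple two-threshold filter. The ΔBeta cut-off of 0.2 and the ΔWA cut-off of 25 are taken from the text; the function name, the gene labels and their values are hypothetical.

```python
def classify_island_gene(delta_wa, delta_beta, wa_cut=25.0, beta_cut=0.2):
    """Assign a CpG island gene to Box 'i', Box 'ii' or neither (None).

    delta_wa:   WA(LNCaP) - WA(PrEC)
    delta_beta: Beta(LNCaP) - Beta(PrEC) from the 450K array
    """
    if delta_beta <= beta_cut:
        return None  # not hypermethylated
    if delta_wa < -wa_cut:
        return "i"   # hypermethylated and later replicating
    if delta_wa > wa_cut:
        return "ii"  # hypermethylated and earlier replicating
    return None      # hypermethylated, replication time unchanged


# Hypothetical genes: (delta_wa, delta_beta).
example_genes = {
    "geneA": (-31.0, 0.45),
    "geneB": (28.0, 0.30),
    "geneC": (40.0, 0.05),
    "geneD": (3.0, 0.60),
}
boxes = {name: classify_island_gene(*values)
         for name, values in example_genes.items()}
```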

Unlike their CpG island containing counterparts, non-CpG island promoters exhibit a wide range of changing methylation profiles, exhibiting both hypo- and hypermethylation in LNCaP. This state is reflected in the changing replication profile, with later replicating promoters being significantly less methylated (p<0.05) than promoters that replicate earlier (Figure 7.12.C). These results therefore complement those found in Figure 7.10; not only are genes that replicate late devoid of methylation at low CpG density promoters, but those genes that replicate later in cancer also actively lose methylation in the cancer cells.

[Figure 7.12 image. Panel A: CpG island promoters. Panel B: (i) hypermethylated and later replicating, n = 38; (ii) hypermethylated and earlier replicating, n = 25; x-axis: PrEC RNA-seq (sqrt). Panel C: non-CpG island promoters. Panels A and C, x-axis: Change in Replication Time (LNCaP - PrEC, ΔWA), with box plots for Later Replication (ΔWA < -25) and Earlier Replication (ΔWA > 25).]

Figure 7.12 Changing replication time and changing promoter methylation A) Change in replication time (ΔWA) is contrasted against the change in methylation (Δ450K Beta value) for genes with CpG islands. The vertical dotted lines indicate a ΔWA +/- 25 and the horizontal blue line indicates a ΔBeta value of +0.2, above which islands are denoted as hyper-methylated. Genes located in shaded areas marked as i and ii are further explored in B), comparing the RNA-Seq expression values (square root RPKM) for PrEC and LNCaP. Blue circles indicate the 2 genes that are highly over-expressed in LNCaP compared to PrEC. C) Change in replication time (ΔWA) is contrasted against the change in methylation (Δ450K Beta value) for genes without CpG island promoters. The distribution of methylation at genes that are highly changed in time of replication (ΔWA +/- 25) is documented by box and whisker plots on the right panels; asterisks denote p<0.05.

7.4.6.2 Genome-wide DNA methylation

The 450K array assay was designed specifically to interrogate the methylation status of genes and other distal regulatory elements. As the bulk of the methylome lies in intragenic regions, it was necessary to adopt an additional experimental approach. To this end, CpG methylation was analysed from gNOME-Seq studies carried out in the Clark laboratory on LNCaP and PrEC cells (Section 7.3.5). gNOME-Seq, as first described by Kelly et al431, is a novel technique designed primarily to investigate the prevalence of nucleosome-depleted regions in the genome. This is achieved by treating cellular DNA with the M.CviPI GpC methyltransferase followed by whole genome bisulphite sequencing (WGBS). As the GpC methyltransferase is only able to access those nucleotides that are not protected by nucleosomes or transcription factors, only those loci that are ‘free’ are subsequently methylated. Because this GpC methylation is measured by WGBS, the native CpG methylation is advantageously interrogated at the same time. Unfortunately, WGBS and gNOME-Seq both show bias against CpG islands and other CpG dense regions (Figure 7.13.A). While this precluded its use for the prior CpG island methylation studies carried out in this thesis, gNOME-Seq was well suited to investigate the relationship between replication timing and methylation at inter- and intra-genic regions of low CpG density, as these regions are missed with 450K or MBDCap-Seq approaches.

To isolate CpG methylation data, those CpG loci found in the context of GpCpG were first excluded, as the origin of this methylation is impossible to ascertain. Methylation was then determined by dividing the number of reads at a given locus that contained cytosine by the total number of reads, for all loci that contained greater than 5 reads. This yielded approximately 9.6 and 10.5 million CpG sites in LNCaP and PrEC respectively that could be accurately assessed for CpG methylation. In order to investigate the relationship between methylation and replication, the methylation score was averaged in the same 1 kb blocks that had previously been assigned a WA value (Section 7.4.1) across the genome. This value is plotted against the WA for the 2,489,163 assessable 1 kb loci for both PrEC and LNCaP in Figure 7.13.B. To facilitate interpretation of this visualisation, a density diagram demonstrating the relative quantity of loci containing a given methylation score for early and late loci is plotted in Figure 7.13.C. For this purpose, late replicating loci were defined as having a WA score of less than 20 and early replicating loci were defined as having a WA score above 75 (see footnote 2). This accounted for approximately 10% of loci at either extremity. These data suggest that in PrEC, all represented loci are generally moderately to highly methylated, with a small reduction in methylation at very late replicating loci. In LNCaP however, it becomes strikingly clear that a completely unmethylated population dominates those loci that replicate late. By contrast, early replicating loci exhibit the same highly methylated phenotype seen in PrEC.
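The methylation scoring described above can be sketched in a few lines. The GpCpG exclusion, the coverage requirement of more than 5 reads and the 1 kb averaging follow the text; the function names, the tuple layout of the input and the example sites are hypothetical, and the sketch assumes that each site has already been annotated with its read counts.

```python
def is_gpcpg(sequence, c_index):
    """True if the cytosine at c_index is preceded by a G (GpCpG context)."""
    return c_index > 0 and sequence[c_index - 1].upper() == "G"


def bin_methylation(sites, bin_size=1000, min_reads=5):
    """Average CpG methylation per bin.

    sites: iterable of (chrom, position, c_reads, total_reads, in_gpcpg),
           where c_reads counts reads that retained a cytosine.
    Returns {(chrom, bin_start): mean methylation score}.
    """
    scores = {}
    for chrom, position, c_reads, total_reads, in_gpcpg in sites:
        if in_gpcpg or total_reads <= min_reads:
            continue  # ambiguous origin or insufficient coverage
        key = (chrom, (position // bin_size) * bin_size)
        scores.setdefault(key, []).append(c_reads / total_reads)
    return {key: sum(values) / len(values) for key, values in scores.items()}


# Hypothetical CpG sites.
example_sites = [
    ("chr1", 100, 9, 10, False),
    ("chr1", 450, 3, 6, False),
    ("chr1", 700, 4, 4, False),   # dropped: 5 reads or fewer
    ("chr1", 900, 8, 8, True),    # dropped: GpCpG context
    ("chr1", 1500, 0, 12, False),
]
binned = bin_methylation(example_sites)
```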

Footnote 2: Note that this value differs from the previous cut-off of ΔWA +/- 25, which refers to a significant change in replication timing from PrEC to LNCaP. A WA of 20 and 75 refers to late and early replication respectively within each cell type.

Figure 7.13 Whole genome methylation and time of replication A) CpG representation in gNOME-Seq WGBS is compared for sites found in CpG islands, in CpG shores (2kb surrounding CpG islands) and the remainder of the genome. A value of 1 indicates expected representation. B) Replication time (WA) is contrasted against gNOME-Seq WGBS Methylation score (# of reads with Cytosine / # of reads) for the 2,489,163 1kb blocks that had both Repli-Seq and gNOME-Seq coverage in the genome. “Late” and “Early” dotted lines are at a WA of 20 and 75 respectively, representing ~10% of replication extremities. C) The frequency of gNOME-Seq methylation for late (red) and early (blue) replicating loci in PrEC and LNCaP.

Figure 7.14 Changing methylation and the time of replication A) The time of replication in PrEC and LNCaP is plotted against the change in methylation (ΔmCG, LNCaP - PrEC) in the cancer cell. Values above the red dotted line indicate hypermethylation (ΔmCG > 0.5); values below the blue dotted line indicate hypomethylation (ΔmCG < -0.5). B) The distributions of the time of replication (WA) of loci that were indicated as being hyper- (25,517 values) or hypo- (519,297 values) methylated in A) are plotted. C) Left panel: the change in replication time (ΔWA, LNCaP - PrEC) is plotted against the change in methylation for all represented loci. Vertical dotted lines indicate significant change in replication (ΔWA +/- 25), horizontal lines indicate significant change in methylation (ΔmCG +/- 0.5). Top-right panel: the distribution of the change in replication time (ΔWA) of loci that were indicated as being hyper- (red) or hypo- (blue) methylated. Bottom-right panel: the number of loci from A) that were categorised as hypermethylated and later replicating (i), hypermethylated and earlier replicating (ii), hypomethylated and later replicating (iii), hypomethylated and earlier replicating (iv).

The comparative absence of methylation at later replicating loci in LNCaP compared to PrEC suggested that these loci were becoming hypomethylated. Differential methylation was calculated as the difference between LNCaP and PrEC for all represented 1 kb blocks. Of the 2,489,163 genomic 1 kb blocks that had sufficient methylation coverage, 519,297 became hypomethylated (ΔmCG < -0.5) while only 25,517 became hypermethylated (ΔmCG > 0.5). Hypomethylation was associated with later replication in both PrEC and LNCaP, although this was more notable in the cancer state (Figure 7.14.A-B). Hypermethylated loci in contrast were preferentially found at early replicating regions, consistent with enrichment of CpG islands in these regions (Figure 7.10.A). The fact that PrEC cells show the same relationship between differential methylation and time of replication indicates that the proclivity for these changes is established prior to tumourigenesis. This is further corroborated by a comparison of the change in replication time against the change in methylation, which demonstrates that hypomethylated and hypermethylated loci tend to exhibit an unchanged time of replication (Figure 7.14.C, density plot). Where a significant change in replication time (ΔWA +/- 25) coincided with a change in methylation (ΔmCG +/- 0.5), it was commonly observed only at hypomethylated regions that also replicate later (Figure 7.14.C, bar plot).

Finally, I investigated the genomic location of early and late replicating regions in PrEC and LNCaP, as well as DNA that was hypo- or hyper-methylated in LNCaP compared to PrEC (Figure 7.15). In concordance with previous observations, I show that both late replicating and hypomethylated regions are specifically enriched at intergenic regions. In contrast, DNA that replicates early or is hypermethylated is strongly enriched at exonic DNA.
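The four-way categorisation of 1 kb loci by change in methylation and change in replication time (categories i to iv of Figure 7.14.C) can be sketched as follows. The ΔmCG cut-off of 0.5 and the ΔWA cut-off of 25 are taken from the text; the function name and the example loci are hypothetical.

```python
from collections import Counter

CATEGORY = {
    ("hyper", "later"): "i",
    ("hyper", "earlier"): "ii",
    ("hypo", "later"): "iii",
    ("hypo", "earlier"): "iv",
}


def classify_locus(delta_wa, delta_mcg, wa_cut=25.0, mcg_cut=0.5):
    """Return 'i'..'iv' for loci changed in both measures, otherwise None."""
    if delta_mcg > mcg_cut:
        methylation = "hyper"
    elif delta_mcg < -mcg_cut:
        methylation = "hypo"
    else:
        return None
    if delta_wa < -wa_cut:
        timing = "later"
    elif delta_wa > wa_cut:
        timing = "earlier"
    else:
        return None
    return CATEGORY[(methylation, timing)]


# Hypothetical loci: (delta_wa, delta_mcg), both calculated as LNCaP - PrEC.
example_loci = [(-30.0, -0.8), (-40.0, -0.6), (35.0, 0.7),
                (2.0, -0.9), (-28.0, 0.1), (30.0, -0.55)]
counts = Counter(classify_locus(wa, mcg) for wa, mcg in example_loci)
```

Loci that change in only one of the two measures are returned as None, mirroring the observation that most hypo- and hypermethylated loci do not change their time of replication.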

Figure 7.15 Genomic location of replication timing and methylation A) DNA that replicated late (WA<20) or early (WA>75) in PrEC and LNCaP was assigned a genomic location as either ‘Exonic’, ‘Intronic’ or ‘Intergenic’. The number of 1 kb bins that were assigned a given annotation was normalised to the expected genomic frequency (y-axis). B) DNA that was either hypo- or hyper-methylated (ΔmCG +/- 0.5) was assessed in a similar fashion.

    CHAPTER 7: REPLICATION TIMING IN PROSTATE CANCER

    7.4.7 Replication timing and clusters of change

    To investigate whether genes that change their time of replication exhibit distinct modes of epigenetic reprogramming, we used K-means clustering (see Section 7.3.6 for details). This method identifies combinations of chromatin modifications that are altered together or share a common enrichment. For consistency, only epigenetic modifications that have been investigated via genome sequencing were considered. To represent active chromatin we examined relative enrichment of H3K4me3-Seq and H3K36me3-Seq, while H3K27me3-Seq represented repressive chromatin. DNA methylation was investigated via MBDCap-Seq, which preferentially inspects CpG island methylation. The genes examined consisted of those that were replicated at significantly different times in LNCaP compared to PrEC (see Section 7.4.4), accounting for 143 genes that replicate earlier and 350 that replicate later. Chromatin changes from PrEC to LNCaP were quantified in 100 bp windows, +/- 2500 bp from the ENCODE annotated TSS. Five clusters were generated independently for genes that replicate earlier and for genes that replicate later, and these were ranked according to the average change in expression (difference in square-root RPKM from RNA-Seq) for each group (Figure 7.16).

    For the 143 genes that replicate earlier in LNCaP, three combinations of distinct epigenetic modifications were apparent (Figure 7.16.A). Cluster 1 (19 genes) represents those genes that have a marked enrichment of H3K36me3 without any change of the other marks. Cluster 2 (13 genes) is defined by an increase in H3K4me3 with a concurrent depletion of H3K27me3. H3K27me3 is also depleted in Cluster 5 (18 genes), associated with an increase in DNA methylation. DNA hypermethylation is generally associated with gene silencing and, as such, Cluster 5 exhibited the lowest, but still positive, change in gene expression.

    Genes that replicated later in LNCaP were also found to have three main combinations of epigenetic alteration (Figure 7.16.B). Both Clusters 2 (31 genes) and 4 (36 genes) were shown to have pronounced enrichment of H3K27me3, while Clusters 4 and 5 (39 genes) exhibited a simultaneous depletion of H3K4me3. While Cluster 4 showed a moderate enrichment in the MBDCap-Seq signal, the lack of general association of late replicating genes and hypermethylation again suggests that CpG island hypermethylation is independent of the time of replication (Figure 7.10.B). The presence of the enrichment or depletion of H3K4me3 and H3K27me3 in different combinations demonstrates that an altered replication program is associated with differential mechanisms of epigenetic reprogramming, rather than directing a specific remodelling program.

    Figure 7.16 Clusters of epigenetic changes at genes with altered replication timing A) Genes that replicate earlier in LNCaP (ΔWA > +25) and B) genes that replicate later in LNCaP (ΔWA < -25) were each clustered into 5 groups using K-means clustering. Clustering was based on change in enrichment (LNCaP-PrEC) calculated in 100 bp windows, +/- 2.5 kb from the TSS, for MBDCap-Seq, H3K4me3-Seq, H3K36me3-Seq and H3K27me3-Seq. Clusters were ordered by mean change in RNA-Seq (Δsqrt RPKM, LNCaP-PrEC), the values of which are plotted on the right. The blue-red colour scale at the bottom indicates the relative change in LNCaP compared to PrEC.
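    The clustering and ranking procedure described in this section can be illustrated with a minimal sketch. It is not the analysis code used for Figure 7.16 (that is described in Section 7.3.6): the feature values and expression changes below are invented toy numbers, each gene is reduced to one value per mark rather than fifty 100 bp windows per mark, and the deterministic farthest-point seeding is an assumption made only so the example is reproducible.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iter=100):
    """Lloyd's K-means with farthest-point seeding (an assumption of this
    sketch, chosen because it is deterministic)."""
    centres = [list(rows[0])]
    while len(centres) < k:
        # Next seed: the row farthest from all centres chosen so far.
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centres))
        centres.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centre.
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centres[c]))
                      for r in rows]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rank_clusters(labels, delta_expr):
    """Order cluster ids by mean expression change, largest first."""
    groups = {}
    for lab, d in zip(labels, delta_expr):
        groups.setdefault(lab, []).append(d)
    return sorted(groups, key=lambda g: sum(groups[g]) / len(groups[g]),
                  reverse=True)


# Toy change-in-enrichment vectors (LNCaP minus PrEC), in the column order
# [MBDCap, H3K4me3, H3K36me3, H3K27me3]. All values are hypothetical.
changes = [
    [0.0, 0.0, 2.0, 0.0],   # H3K36me3 gain only
    [0.1, 0.0, 1.8, 0.0],
    [0.0, 1.5, 0.0, -1.5],  # H3K4me3 gain with H3K27me3 loss
    [0.0, 1.7, 0.1, -1.4],
    [1.2, 0.0, 0.0, -1.6],  # methylation gain with H3K27me3 loss
    [1.0, 0.1, 0.0, -1.8],
]
# Hypothetical change in sqrt(RPKM) for the same six genes.
delta_sqrt_rpkm = [1.0, 1.2, 2.0, 2.4, 0.3, 0.1]

labels = kmeans(changes, k=3)
cluster_order = rank_clusters(labels, delta_sqrt_rpkm)
```

    In this toy case the three patterns are recovered as separate clusters, and `cluster_order` lists them from the largest to the smallest mean expression change, which mirrors how the clusters in Figure 7.16 are ordered.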

    7.4.8 Replication at regions of long range epigenetic reprogramming

    Changing replication time was associated with different combinations of epigenetic modifications at gene promoters. As this characteristic is similar to the different modes of gene activation described at both Long Range Epigenetic Activation (LREA, Chapter 3) and Long Range Epigenetic Silencing (LRES)140 regions, we investigated the relationship between replication timing and long range epigenetic deregulation. Both LREA and LRES regions had a significantly different mean WA score compared to a random distribution of similarly sized domains (Figure 7.17.A, Figure 7.18.A). The average WA for all LREA and LRES regions in LNCaP and PrEC cells is also plotted in Figure 7.19, where earlier replication of LREA regions and later replication of LRES regions are easily discernible. As expected, LREA regions exhibited a significantly increased average WA, whereas LRES regions exhibited a significantly reduced WA. LREA region 19 (12q21.31) is highlighted in Figure 7.17.B, revealing a strong overlap with a replication initiation zone in LNCaP that is absent in PrEC. Conversely, LRES region 7 (1q32.2) intersects with an initiation zone in PrEC that is absent in LNCaP (Figure 7.18.B). These LRES and LREA example regions partially overlap with changing regions rather than intersecting them completely. Using the previously defined Early and Late Blocks (Section 7.4.3), we found that LRES regions had a significant overlap with Late Blocks (p<0.01) and LREA regions had a significant overlap with Early Blocks (p<0.001), when compared to a random distribution of similarly sized regions.
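    The comparison against randomly placed, similarly sized regions can be sketched as follows. This is an illustration only: the statistical test behind the p-values in Figures 7.17 and 7.18 is not stated in this section, so an empirical (randomisation) p-value is swapped in here, and the WA track, region coordinates and function names are all hypothetical.

```python
import random


def region_mean(track, start, length):
    """Mean WA over `length` consecutive bins beginning at `start`."""
    return sum(track[start:start + length]) / length


def empirical_p(track, regions, n_random=2000, seed=1):
    """One-sided empirical p-value that the regions have a higher mean WA
    (earlier replication) than size-matched, randomly placed regions.
    For regions expected to replicate later, count `sim <= observed`."""
    rng = random.Random(seed)
    observed = sum(region_mean(track, s, n) for s, n in regions) / len(regions)
    hits = 0
    for _ in range(n_random):
        # Redraw every region at a random start, keeping its length.
        sim = sum(
            region_mean(track, rng.randrange(len(track) - n + 1), n)
            for _, n in regions
        ) / len(regions)
        if sim >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible p of zero.
    return observed, (hits + 1) / (n_random + 1)


# Hypothetical WA track of 1,000 bins: late-replicating background (WA 20)
# with two early-replicating stretches (WA 80).
wa_track = [20.0] * 1000
for i in list(range(100, 110)) + list(range(500, 510)):
    wa_track[i] = 80.0

# Hypothetical activated regions as (start_bin, length_in_bins).
lrea_like = [(100, 10), (500, 10)]
observed, p_value = empirical_p(wa_track, lrea_like)
```

    Keeping the region lengths fixed while randomising only their positions is what makes the null distribution "similarly sized"; with real data the random placements would also need to respect chromosome boundaries and unmappable sequence.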

    Figure 7.17 Replication timing at LREA regions A) The average replication time at each LREA region is plotted. The ‘Random’ distribution refers to 10,000 randomised LREA sized regions. The p-value of the difference in the distribution of LREA regions compared to ‘Random’ is indicated. The circled LREA region is exemplified in B).

    Figure 7.18 Replication timing at LRES regions A) The average replication time at each LRES region is plotted. The ‘Random’ distribution refers to 10,000 randomised LRES sized regions. The p-value of the difference in the distribution of LRES regions compared to ‘Random’ is indicated. The circled LRES region is exemplified in B).

    Figure 7.19 Comparison of LRES and LREA average replication The average replication time (WA) of each LREA (green) and LRES (blue) region in LNCaP (y-axis) and PrEC (x-axis) is plotted. The red dashed line indicates an equal score of replication in each cell line. Regions above and below the line replicate earlier and later in LNCaP respectively.

    7.4.9 Cancer specific signature of replication timing

    A previous study demonstrated that replication timing profiles of different leukaemia cells were more similar to each other than they were to their respective progenitor cell type369. In that study, the authors postulate the existence of a leukaemia-specific signature of replication timing change. In an effort to extend these findings to encompass cells of many different lineages, Principal Component Analysis (PCA) was carried out using our LNCaP and PrEC data together with the publicly available ENCODE Repli-Seq data118,430 as described previously (Section 7.4.1). The cell lineages include: normal cultured primary cells HUVEC and NHEK; established fibroblast cell lines BJ and IMR90; Epstein-Barr Virus (EBV) transformed normal lymphoblastoid cells GM06990, GM12801, GM12812, GM12813 and GM12878; embryonic stem cell line BG02ES; and cancer cell lines HelaS3, MCF7, SK-N-SH, HepG2 and K562 (Table 7.1). PCA identifies standardised linear combinations of the original variables that can be used to summarise the data437. In this instance, it allows us to identify groupings of samples based on their summarised WA score of replication timing.
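    As a rough illustration of how PCA groups samples by their WA profiles, the sketch below computes the first principal component score of each sample. It makes several assumptions that are not taken from the thesis: the WA values and sample names are invented, columns are centred but not scaled, and the leading component is found by power iteration on the sample-by-sample Gram matrix rather than with a statistics package.

```python
import math


def pc1_scores(samples, n_iter=200):
    """Score of each sample on the first principal component.

    `samples` is a list of equal-length WA profiles (one per sample,
    one value per genomic bin)."""
    n = len(samples)
    n_bins = len(samples[0])
    # Centre every bin (column) on its mean across samples.
    means = [sum(s[j] for s in samples) / n for j in range(n_bins)]
    centred = [[s[j] - means[j] for j in range(n_bins)] for s in samples]
    # Gram matrix X X^T: small (n x n) even when there are many bins.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centred]
            for r1 in centred]
    # Power iteration from a fixed start; it converges to the leading
    # eigenvector provided the start is not orthogonal to it.
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n))
                     for i in range(n))
    # PC scores are U * S, i.e. eigenvector entries times sqrt(eigenvalue).
    return [math.sqrt(eigenvalue) * x for x in v]


# Hypothetical WA profiles over six bins for four made-up samples.
names = ["normal_A", "normal_B", "cancer_A", "cancer_B"]
wa_profiles = [
    [80, 75, 20, 25, 60, 30],
    [78, 77, 22, 24, 62, 28],
    [40, 45, 60, 65, 58, 32],
    [42, 43, 62, 63, 61, 29],
]
scores = pc1_scores(wa_profiles)
```

    The sign of a principal component is arbitrary, so only the relative positions of the samples are meaningful: here the two "normal" profiles fall on one side of the first component and the two "cancer" profiles on the other.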

    Figure 7.20 PCA clustering of public Repli-Seq data WA values for PrEC and LNCaP replicates were combined with WA calculated using ENCODE datasets and assessed using Principal Component Analysis. Samples are identified by colour and number using the key on the right.

    Using this analysis, cell types were visually divided into 4 major clusters (Figure 7.20). The first cluster consisted of normal epithelial cells, including the two replicates of PrEC, as well as epidermal and endothelial cell types. The second apparent cluster contained IMR90 and BJ cells, which were representative of all the normal fibroblasts assayed. The third and most disparate cluster consisted of the EBV transformed normal lymphoblast cells. Within the scope of this study, however, we cannot discern whether these replication differences are due to cell type or to the EBV immortalisation method. Finally, all assayed cancer cells were found in Cluster 4. As might be expected from a collection of different cancer cell lines, this group showed greater spread than the other identified groupings. This characteristic notwithstanding, the fact that all cancer cells, regardless of cell of origin, clustered separately from normal differentiated cells is suggestive of a cancer specific reprogramming event. Unexpectedly, the one assayed embryonic stem cell line was found to associate with the cancer grouping. This observation is contrary to previous reports369, which suggest that ES cells are more likely to cluster with in vitro differentiated cell types. In the results presented here, however, the presence of the ES cells within the cancer cell classification suggests a replication signature in common with highly proliferative undifferentiated cells.

    Table 7.3 Cancer genes with changed replication timing Cancer genes were identified from the Wellcome Trust Cancer Genome Project438 in regions that significantly change replication time. Tumour Type and Mutation Type were directly informed from this census. “Translocation” mutation type indicates that the specified gene is involved in oncogenic genomic rearrangements. “Deletion” indicates that the specified gene can be deleted in tumours, and “Sequence” refers to missense, nonsense, frameshift and splice-site mutations.

    7.4.10 Cancer genes and replication timing

    In order to identify biologically relevant genes in loci that change replication timing in LNCaP, the 350 genes that replicate later and 143 that replicate earlier (Section 7.4.4) were parsed for cancer related genes as denoted by the Wellcome Trust Cancer Genome Project438. This assessment revealed 10 cancer genes that replicate later in LNCaP and 3 that replicate earlier (Table 7.3). Interestingly, of the 10 genes that replicate later, 90% were implicated in tumourigenic translocations in their respective cancer types. Of the identified genes, the only gene specifically implicated in prostate cancer pathogenesis was Ets Related Gene (ERG). ERG fusions with TMPRSS2 are the most commonly identified chromosomal rearrangement found in prostate cancer biopsies439, and specific isoforms are associated with an aggressive prognosis440. Here I show for the first time that this gene replicates later in the LNCaP cancer state compared to PrEC cells. This changed replication state is clearly documented in Figure 7.21.A. As I have demonstrated that late replication is often associated with hypomethylation (Section 7.4.6.2), and hypomethylation is thought to potentiate chromosomal instability441-443, I investigated the methylation status of the ERG locus.

    Methylation was quantified in 1 kb blocks using gNOME-Seq WGBS over the entire ERG locus. The values obtained were compared between PrEC and LNCaP (Figure 7.21.B), where it is evident that methylation is substantially reduced in LNCaP cells. This was then confirmed on closer visual inspection at the ERG promoter (Figure 7.21.C). Here we see both a reduction in methylation over the entire sequence, as well as an absence of representation in the associated CpG island. While these observations raise the possibility that a later replication time and a hypomethylated locus are involved in oncogenic ERG translocations, the fact that this rearrangement does not occur in LNCaP implies that they are not sufficient in themselves. Rather, I hypothesise that these qualities potentiate the locus for oncogenic translocation, but that completion requires a more responsive nuclear environment involving other, unknown factors. In order to explore this theory more fully, a thorough investigation of replication timing in clinical prostate samples would be required.
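    The 1 kb block summary used for Figure 7.21.B can be sketched as below. The read counts are invented, the positions are relative rather than real ERG coordinates, and pooling methylated and total reads within each block is an assumption of this sketch; the thesis may instead have averaged per-site methylation ratios.

```python
from collections import defaultdict


def block_methylation(sites, block_size=1000):
    """Pooled methylation fraction per block.

    `sites` holds (position, methylated_reads, total_reads) per CpG."""
    meth = defaultdict(int)
    total = defaultdict(int)
    for pos, m, t in sites:
        block = pos // block_size  # index of the 1 kb block
        meth[block] += m
        total[block] += t
    return {b: meth[b] / total[b] for b in total if total[b] > 0}


# Hypothetical CpG calls at the same relative positions in each cell type.
prec_sites = [(100, 9, 10), (600, 8, 10), (1200, 7, 10),
              (1900, 9, 10), (2500, 2, 10)]
lncap_sites = [(100, 3, 10), (600, 2, 10), (1200, 1, 10),
               (1900, 3, 10), (2500, 2, 10)]

prec_blocks = block_methylation(prec_sites)
lncap_blocks = block_methylation(lncap_sites)

# Blocks covered in both samples where LNCaP is less methylated than PrEC,
# i.e. the points that would fall below the x=y line of a scatter plot.
shared = sorted(set(prec_blocks) & set(lncap_blocks))
lower_in_lncap = [b for b in shared if lncap_blocks[b] < prec_blocks[b]]
```

    Restricting the comparison to blocks covered in both samples avoids treating missing coverage as a change in methylation.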

    Figure 7.21 ERG replicates later and is hypomethylated in LNCaP A) PNDV representation for LNCaP and PrEC of the ERG locus. The location of RefSeq annotated transcripts is shown below the main plot. B) gNOME-Seq methylation data was summarised in 1 kb blocks across the ERG locus. The size and relative location of these blocks is demonstrated with green and red boxes. A scatter plot contrasts the level of methylation found in LNCaP to PrEC, using these summarised values. Points below the x=y line are indicative of 1 kb blocks that have less methylation in LNCaP compared to PrEC. C) gNOME-Seq sequence reads are shown for the ERG promoter locus; the locations of the TSS and CpG island are shown. Grey bars indicate reads that either have no CpG sites or have GpCpG sites. Red bars indicate methylated CpG sites and blue bars indicate unmethylated CpG sites.

    7.5 Discussion

    The replication timing programs of many cell types have previously been catalogued using both array based methods116,444 and Repli-Seq118. In addition, a recent study has documented the extent of replication changes that occur in paediatric leukaemia when compared to normal B- and T- cell populations369. The investigation carried out in this thesis presents, for the first time, a high-resolution study of the epigenetic changes that accompany replication timing aberrations in the development of cancer and of how these changes correlate with an altered transcriptome.

    Previous studies have ascertained that ~50% of the genome is subject to replication timing plasticity in development118 (using Repli-Seq), while 8-17% of the genome was shown to exhibit an altered profile in leukaemic samples when compared to normal lymphoblasts369 (using array based analyses). In our assessment of replication timing in the prostate cancer genome we found 5.3% of the genome to be significantly altered from PrEC to LNCaP. The discrepancy between the two cancer estimates is likely due to the systematic differences in experimental approach (Repli-Seq vs array based analysis); however, a biological distinction between leukaemia and prostate cancer cannot be ruled out. Given that a destabilised replication mechanism has been implicated in early stages of tumourigenesis445, it was somewhat surprising that such a high degree of replication constancy (>90%) was observed between PrEC and LNCaP (Figure 7.5.B). This observation suggests that the basic mechanisms that control the replication program are not themselves deregulated in prostate tumourigenesis. This is further reinforced by the observation that LNCaP cells exhibit typical relationships between replication timing and gene density (Figure 7.7.A), CpG island density (Figure 7.10.A) and gene expression (Figure 7.7.B)446,447.

    Figure 7.22 Model of chromatin modifications and replication timing Each number and transcript represents recognisable patterns of epigenetic and transcriptional modifications. The light blue ‘Early Replicating DNA’ box represents a permissive nuclear environment, while the light red ‘Late Replicating DNA’ box represents a repressive environment. 1) Enrichment of H3K4me3 at early promoters; 2) An enrichment of H3K27me3 at late promoters; 3) An enrichment of H3K4me3 and a depletion of H3K27me3 at early promoters; 4) An enrichment of H3K9ac at early promoters; 5) the presence of inactive promoters in early replicating DNA.

    While the replication program itself appears to be generally maintained in the cancer state, I show that the changes that do occur correlate with both epigenetic and transcriptional alterations. First, I show that a changing replication program correlates with the expected shift in transcriptional activity, with genes that replicate later being transcriptionally repressed in LNCaP (Figure 7.8). In addition, a changing epigenetic landscape is also observed, with genes that replicate later becoming depleted of the activating modifications H3K4me3 and H3K9ac, and acquiring the repressive H3K27me3 polycomb modification (Figure 7.9). The acquisition of H3K27me3 at late replicating regions contrasts with observations made by Gilbert and colleagues115,116, but confirms those made in earlier ENCODE studies430. Furthermore, I highlighted that discrete combinations of epigenetic modifications are associated with a changing replication program and transcriptional output (Figure 7.16). H3K4me3 and H3K27me3 were acquired and lost both independently (Figure 7.22: 1-2) and combinatorially (Figure 7.22: 3). The enrichment and depletion of H3K9ac at genes that replicate earlier and later was also observed (Figure 7.22: 4); however, the micro-array platform precluded the assessment of its co-regulation with the other investigated chromatin modifications, as discussed (Section 7.4.7). Finally, I observed strict gene repression in late replication domains but saw greater plasticity in the expression of genes found in early domains (Figure 7.7.B, Figure 7.22: 5). Together these observations suggest that early and late replicating domains represent permissive and repressive transcriptional nuclear environments. This is further supported by the restricted presence of LREA and LRES regions to those parts of the genome that replicate earlier and later respectively (Figure 7.17, Figure 7.18, Figure 7.19). The fact that none of the investigated epigenetic modifications actively define a replication time, but are instead found exclusively in a particular replication time, lends support to compartmentalisation of permissive and repressive domains within the nucleus. To expound by way of example, a given gene that replicated earlier in LNCaP would be shifted to a ‘permissive’ domain, where it could either remain inactive or be transcriptionally activated by diverse epigenetic mechanisms. This theory of replication time being correlated with large domains of open and closed chromatin has been previously suggested116 using Hi-C whole genome looping data103. To more fully elucidate how this relationship is manifest in prostate tumourigenesis a similar Hi-C experiment would be required, which is beyond the scope of the work presented here.

    The second set of conclusions drawn from this study concern the interaction between replication timing and cancer specific methylation phenotypes. As discussed previously, hypermethylation of CpG islands and hypomethylation of low CpG density loci is typical in tumourigenesis39. Here I show that CpG island methylation appears to function largely independently of replication timing and changes thereof. These islands are largely unmethylated in both PrEC and LNCaP, at both early and late replicating regions (Figure 7.10.B). If a change in methylation occurs then it is almost exclusively a hypermethylation event and this change occurs regardless of any changes in the time of replication (Figure 7.12.A, Figure 7.23: 2-3, 6-7). Conclusions of independent activity are corroborated by the observation that both CpG island methylation and the time of replication correlate with transcriptional activity (Figure 7.11). While there is a paucity of literature that investigates this relationship, one study has shown that CpG island promoters that shift their time of replication do not show significant changes in gene expression115, suggesting that the regulation of these genes can ‘overcome’ the influence of the replication program. The data presented here, however, contradict this conclusion, as CpG island genes are shown to be more prone to transcriptional changes when associated with a change in replication timing (Figure 7.8.B).

    Figure 7.23 Model of DNA methylation and replication timing A) and B) represent DNA replication that is constant early or constant late respectively. Gene promoters that are underlined represent CpG islands. A - 1), 4) Low CpG density promoters are either methylated or unmethylated at early replicating DNA. A - 2), 3) CpG islands either become methylated or remain unmethylated at early replicating DNA. B - 5), 8) Low CpG density promoters either lose methylation or remain unmethylated at late replicating DNA. B - 6), 7) CpG islands either become methylated or remain unmethylated at late replicating DNA.

    CpG methylation is regulated by different nuclear mechanisms in distinct genomic contexts. Here I show that the genome-wide hypomethylation of low CpG density regions found in tumourigenesis79,131 is associated with late replicating DNA. Specifically, I show that late replicating DNA, while showing a reduced methylation signature in PrEC, is completely unmethylated in LNCaP, both genome wide (Figure 7.13) and at gene promoters (Figure 7.10). In contrast, DNA that replicates early is highly methylated in both cell types (Figure 7.13). By looking specifically at changing methylation with reference to the time of replication (Figure 7.14), I show that hypomethylation in LNCaP preferentially occurs at DNA that replicates late in both PrEC and LNCaP (Figure 7.23.B: 5,8). The fact that this occurs in DNA that already replicates late in PrEC (in contrast to DNA that shifts in replication time) indicates that a propensity to lose methylation is an inherent property of late replicating DNA in our cancer model.

    These findings are in agreement with the previously reported association between late replicating DNA and Lamina Associated Domains (LADs)118, which are regions of the genome that are found at the nuclear periphery and are associated with repressive chromatin328. LADs are known to significantly intersect with large domains of hypomethylation in cancer92, thus indirectly associating hypomethylation with late replication. In addition, late replicating DNA has been shown to become increasingly hypomethylated at gene bodies with increasing cell divisions64, as well as being associated with hypomethylation of heterochromatin in normal fibroblasts and lymphocytes448. Here I show for the first time that hypomethylation associated with prostate cancer can be directly associated with late replicating DNA, as this characteristic is almost completely absent at early replicating loci (Figure 7.14.B). This observation, in conjunction with the understanding that late replicating DNA loses methylation over many cell cycles64, suggests that hypomethylation observed in cancer is a passive consequence of a significantly increased rate of cell division. This conclusion raises the question of what difference between early and late replicating DNA allows this passive demethylation to occur.

    Finally, the observation that there is a cancer specific signature of replication timing (Figure 7.20) has several ramifications. While there have been some initial investigations into the presence of cell-specific replication signatures, they have generally been limited in scope. One study116 demonstrated that ES cells and iPS cells cluster independently of neural precursor cells (NPCs) and normal lymphoblasts, while another369 showed that leukaemia cells, regardless of cell of origin, cluster separately from their precursors. These studies therefore suggest that ES cells and cancer cells of a specific type have different replication programs to normal cells throughout the body. The findings presented here extend these conclusions to suggest that all cancers share a commonality in the regulation of their replication timing programs. As the assessed ES sample clustered with cancer cells and not with the included normal samples, it raises the possibility that a cancer-specific replication program is representative of that found in undifferentiated cells. In addition, a potentially useful outcome of this PCA is the high disparity between the EBV transformed lymphoblastoid cells and all other tested samples. This could be due either to the cellular identity of the cells or to the EBV transformation process. If due to the latter, then this could provide a unique tool to grossly alter the replication timing program of a cell in an experimental setting.

    CHAPTER 8 Conclusion

    Mechanisms of epigenetic and transcriptional control have been debated since the discovery of DNA itself. Due to the inherent molecular scale and massive diversity of epigenetic processes, a complete understanding of how these factors interact with the cellular environment, with DNA and with each other remains a distant yet attainable goal. As an example of ongoing challenges, the NIH Roadmap Epigenomics Mapping Consortium449 currently considers 30 histone modifications and variants as experimentally valid epigenetic markers. These represent a small subset of the greater than 100 post translational modifications recently identified on core histone proteins450, the majority of which have never been directly investigated due to the lack of appropriate technology. Considering that histone modifications and other epigenetic marks often effect genomic regulation through combinatorial signalling25, the potential for complexity and nuance is enormous. During the course of this thesis and the research it represents, the field of epigenetics has shifted dramatically and gained unprecedented momentum as it has moved from a locus-specific approach to genome-wide studies. As is common in science, this progression is strongly driven by the development and refinement of new technologies. In this instance the rapidly increasing availability of next generation sequencing has facilitated genome-wide investigation of histone modifications, DNA methylation and its variants, transcriptional output, nuclear spatial and temporal configurations, as well as many non-epigenetic regulatory mechanisms. With this glut of new data, a new resolution of nuclear biology has emerged, of multiple and intersecting domain-level epigenetic processes that coordinate with transcription factors and other DNA binding proteins to correctly maintain genomic function.

    Tumourigenesis is fundamentally characterised by persistent cell division promoted by inadequate genetic regulation. By investigating the form and consequence of these inadequacies, epigenetic researchers can highlight therapeutically relevant sensitivities in the cancer cell as well as reveal basic molecular relationships that are essential in healthy cellular development. To this end, the broad aim of my thesis was to describe multiple aspects of epigenetic regulation in prostate cancer using LNCaP and PrEC cells as a model system. By integrating these data with publicly available datasets, I aspired to uncover novel mechanisms underlying the cancer cell nuclear phenotype and, in turn, shed light on normal cellular function.

    8.1 Key findings

    The first key finding in this thesis, and the principal subject of a recent manuscript, “Regional activation of the cancer genome by long-range epigenetic remodelling”145, was the identification of Long Range Epigenetically Activated (LREA) domains in the prostate cancer genome. As the discovery of these domains was primarily carried out in LNCaP cells and validated using the Oncomine database, it is unlikely that this set exhaustively describes all prostate cancer LREA regions. With this caveat, the presence of multiple prostate biomarkers and sites of common oncogenic translocations in the LREA regions described here serves to substantiate their clinical significance as a common and predictable event in prostate tumourigenesis. These biomarkers include the widely recognised prostate specific antigen (KLK3), and the more recent non-coding prostate cancer antigen 3 (PCA3). The inclusion of several gene loci involved in frequent and oncogenic Ets transcription factor translocations266 provided a peculiar challenge: of the three identified, only one (MIPOL1) was actively rearranged in the LNCaP cell model. I therefore suggested that LREA type remodelling could provide a destabilising force that potentiated such re-arrangements, rather than defining them.

    Prior to this investigation, only domains of cancer-specific epigenetic silencing (LRES) had been identified140. By revealing the presence of LREA domains, which are found in juxtaposition to LRES domains in individual cancer types, I detail an epigenomic environment in which the specification of large scale epigenetic character is disrupted. LREA domains are generally characterised by an enrichment of H3K9ac and either a concomitant depletion of the polycomb H3K27me3 or an enrichment of H3K4me3 (Figure 3.12). The significance of this character was highlighted by contrast to LRES domains, which were typically depleted of H3K9ac and enriched in H3K9me2 and H3K27me3. As both types of long range epigenetic modulation were found in the same cancer cell model, it suggested that the deposition of both active and repressive marks itself was not deregulated, but rather that an overarching process dictating the specific localisation of the marks was impaired. One potential mechanism for this observation, involving changes in chromatin architecture, was detailed in Chapter 5. Using the Kallikrein gene locus, which harboured adjacent LREA and LRES domains, I identified multiple looping structures anchored by the CTCF protein. These looped structures clearly demarcate both LREA and LRES regions in PrEC and LNCaP cells. Interestingly, there was no discernible distinction in the loops between the cancer and normal cell state in spite of massive epigenetic remodelling, suggesting the looped domains represented epigenomic modules that were prone to cancer associated remodelling (Figure 5.11).

    Continuing my investigation into epigenetically activated domains, I sought to explore the cancer-specific relationship between CpG island methylation and up-regulated gene transcription. An initial enquiry seeking hypomethylated islands at gene promoters yielded results similar to previous studies92. As the majority of promoter associated CpG islands are typically and specifically unmethylated in normal methylomes, the presence of hypomethylated islands in cancer is predictably uncommon. In spite of this, a hypomethylated intronic CpG island was identified at the KLK4 gene locus, associated with up-regulated transcription initiating downstream of the annotated start site. In contrast to this limited example, and contrary to the well accepted dogma of CpG island hypermethylation and gene silencing, I describe many more genes that were both significantly up-regulated and displayed CpG island hypermethylation in LNCaP cells. Further analysis revealed two patterns of aberrant methylation associated with gene activation. Hypermethylation either occurred at the borders of promoter CpG islands, potentially obstructing inhibitory protein signalling and thereby augmenting gene expression, or fully encompassed the promoter islands including the TSS, in association with ectopic expression from an alternative promoter (Figure 4.12). The mechanism(s) that lead to these alterations in cancer are not clear, but could involve a similar process to the more widespread hypermethylation of CpG islands with gene repression. The existence of these hypermethylated and transcriptionally activated genes, however, serves as a compelling reminder that the function and regulation of DNA methylation and its aberrations in cancer are still poorly understood.

    Finally, through profiling the replication timing program in prostate cancer I detailed several novel observations. While there are other studies investigating replication timing changes in cancer369, the work presented here represents the first detailed investigation that fully incorporates and compares epigenomic and transcriptional datasets. Perhaps the most surprising discovery pertains to the high degree of similarity between PrEC and LNCaP replication programs, regardless of the significant genomic distortion of the cancer cell. By comparing these generated profiles with publicly available datasets, however, I show that the replication programs of all cancer cells exhibit a greater similarity to each other than normal cell models. This suggests the very interesting possibility of an altered replication program that is specified by the tumourigenesis process itself. Of the domains that do exhibit temporal replication changes in prostate cancer, epigenomic alterations largely conformed to the characteristic early and late replication representing compartments of permissive and repressive chromatin respectively. Of specific interest to our laboratory was that LREA and LRES regions exhibit shifts to earlier and later replication respectively. The tumourigenic sequence of events however, that is, whether chromatin, gene expression or replication changes occur sequentially or simultaneously, remains obscured. An analysis of DNA methylation revealed that the CpG island hypermethylation phenotype that is common in cancer is not associated with replication timing. Given the strong association between replication timing and spatial compartmentalisation, this result can be generalised such that the regulation of CpG island methylation is independent of the broad three dimensional epigenetic regulation that typically governs the nucleus. 
By contrast, I observed a very strong association between widespread cancer-specific hypomethylation and DNA that replicated late in both the normal and cancer genomes. As replication timing in the normal cell could predict the occurrence of hypomethylation in the cancer state, a causal relationship between the two characteristics is highly likely. The causes and consequences of extensive hypomethylation in cancer are a matter of current debate. The presence of the commonly translocated ERG439 locus in a domain that switches to late replication with associated hypomethylation, however, supports the hypothesis that such genetic loci are particularly susceptible to these oncogenic events.

    8.2 Future directions

    The basic purpose of epigenetic research is to determine the molecular mechanisms that facilitate selective transcription, encourage replication fidelity and maintain genomic integrity. Ultimately a network of causal relationships must be determined. One example is discovering the foundations of DNA methylation selectivity and the proteins responsible, as well as the functional consequences of this widespread modification. The scale of this task is evidenced by the >40,000 publications currently indexed by PubMed under “DNA methylation”, coupled with our deficiency in understanding its most basic qualities. A primary contributor to this delayed comprehension is the limited availability of sensitive technologies that can appropriately and directly interrogate molecular-scale epigenetic processes. Before we can assemble the puzzle, we must know the nature of the pieces. Accordingly (and tautologically), the era of epigenetic research in which this thesis was undertaken governs the nature of its enquiry and conclusions. The recent ubiquity of genome-wide analysis afforded by microarrays and next generation sequencing has greatly increased our observational capacity. These technologies are still limited by their reliance on other molecular biology protocols, their own inherent biases and the intrinsic requirement that researchers have a specific modification for which they are assaying. In spite of this, the wealth of data generated by the community at large and international collaborations such as the NIH Roadmap Epigenomics Mapping Consortium449 and the International Human Epigenome Consortium (IHEC)451 will provide the raw material of research for decades to come. This era of epigenomics, and as such the nature of this thesis, is therefore significantly characterised by its observational intent.

    The aim of this thesis was to describe multiple aspects of epigenetic regulation in prostate cancer with the intent of further understanding the nuclear mechanisms that contribute to tumourigenesis. The utility of LNCaP and PrEC cell lines as a model for prostate tumourigenesis is exemplified in epigenetic research by allowing multiple layers of biological data to be generated and directly integrated. While this primary aim has been appropriately realised, there are still many more regulatory networks that remain to be explored. The study of regional chromatin activation in LREA, for example, would benefit from expansion to other prostate cancer cell systems, including primary cell isolates, as well as other cancer types. The prevalence and specific genomic localisation of LREA regions in distinct cancer populations would further our appreciation of how these regions become targeted for deregulation. Given the striking delineation of LREA and LRES regions via chromatin looping at the Kallikrein gene locus, further investigations incorporating genome-scale approaches such as Hi-C or ChIA-PET would serve to test the proposed looped modules of chromatin remodelling. Additional experimentation to directly investigate cancer-associated changes in genomic spatiality, such as interrogating lamina association via ChIP and nuclear compartmentalisation via FISH, would provide further insight into the physical characteristics of LREA and LRES domains.

    To advance the conclusions drawn here regarding replication timing and cancer progression there are several promising avenues that could be pursued. The identification of a cancer-specific signature in replication timing suggests the presence of a cancer-specific origin selection mechanism. Isolating these regions and interrogating their epigenetic, transcriptional and other bound-protein disposition in diverse cell types would aid our understanding of replication mechanisms in both the healthy and cancer state. The replication “map” detailed in this thesis for LNCaP and PrEC cells provides the framework for higher resolution epigenetic studies. For example, by interrogating the differential enrichment of epigenetic modifiers during the course of S-phase against the reference map created here, we can identify the factors that bind to DNA before, during and after replication. This could reveal an exciting mechanistic insight into how epigenetic networks are maintained and replicated, and how they may be disrupted in cancer cells, as we are mindful that it is not just the DNA sequence that is replicated in each cell division but the matching suite of epigenetic marks. Applying this approach to the regulators of DNA methylation, the DNMT and TET families, combined with genome-wide assays for 5-hydroxymethylcytosine452,453, would help resolve how late replicating regions undergo hypomethylation in cancer.

    Understanding causal relationships between epigenetic manifestations and the mechanisms of their action is notoriously difficult. To gain mechanistic insight into the observations described in this thesis there are two promising approaches that could be adopted. The first approach is to artificially disrupt the action of epigenetic effectors. This can be achieved with compounds such as 5-aza-deoxycytidine and trichostatin A, which inhibit DNMTs454 and HDACs455 respectively, as well as transient or stable knock-downs of genes involved in epigenetic regulation. Interrupting the organisation of epigenetic modifications and then assessing the resulting changes would allow a sequence of causality to be inferred. Potential targets include CTCF and RAD21 in the chromatin looping pathway329,330, LMNB1 in lamina-associated domain determination456 and RIF1 in replication origin selection394. The second approach entails monitoring epigenetic changes over the course of in vitro malignant transformation. This is commonly achieved through viral introduction of oncogenes followed by periods of selection457 or through direct carcinogen exposure of prostate cells in culture458,459. Alternative models that utilise extended cultures of epithelial cells to model early carcinogenesis have also been explored in the Clark laboratory460. Such model systems could reveal the temporal order of epigenomic alterations, and thus offer clues to their causal hierarchy. Examples include the sequential relationships between hypermethylation of CpG island border regions, transcriptional activation and H3K4me3 deposition; between changes in replication timing and the development of LREA and LRES; and between the acquisition of genomic rearrangements and DNA hypomethylation. While these model systems, like all model systems used in biology, do not perfectly imitate cancer cells in vivo, their utilisation in epigenetic research could provide novel observations essential for understanding the basic mechanisms of this pervasive disease.

    8.3 Concluding remarks

    The aim of this thesis was to characterise epigenetic changes that occur in prostate cancer, with a focus on epigenetic gene activation and domain-level regulatory processes. To exhaustively describe all the epigenetic aberrations that occur in cancer cells is a task requiring a generation of scientific enquiry. The final outcome will be an understanding of the drivers of tumourigenesis and, hopefully, of the exploitable sensitivities that such knowledge might present. The scale of this task is highlighted by the basic nature of the questions that remain: What are all the epigenetic modifications and their manifestations? What are the protein effectors of epigenetic control and what is the mechanism of their specificity? What are the consequences of epigenetic deregulation in cancer development? What are the initial causes of this epigenetic disruption? International consortia such as the International Human Epigenome Consortium (IHEC)451, the International Cancer Genome Consortium (ICGC)461, The Cancer Genome Atlas (TCGA)462, the NIH Roadmap Epigenomics Mapping Consortium449 and especially the ENCODE project463 are all making sustained progress in resolving these contested issues. However, key epigenetic breakthroughs made only in the last few years regarding DNA methylation dynamics and deposition33,36,464, drivers and regulators of pluripotency107,465-467, and the varied and essential role of ncRNAs468,469, are a daunting testament to the breadth of discoveries yet to be made that could challenge current paradigms. The work presented here marks a small but significant contribution to the global effort to understand the nature of genetic regulation and the consequences of its corruption in disease.


    CHAPTER 9: APPENDIX

    CHAPTER 9 Appendix


    9.1 Primers

    All PCR primers used in this thesis are described in this table. “Locus” refers to the gene or specific location assayed. “Primer Sequence” indicates forward and reverse primer pairs, or, in the instance of 3C experimentation, a single primer sequence.

    Locus | Primer Sequence | Amplicon Coordinates (hg18) | Assay Type | Figure
    RFK | CATAGGATGGAACCCATATTACAAGAA / AGTCCTCTTTGAAGGTATGCATGAT | chr9:78,190,253-78,199,264 | q-RT-PCR | 3.3
    GCNTA | TCAGCACTAAGTGATTCAGACTTTCC / GCTGCAACGGCATCTTGA | chr9:78,263,888-78,312,152 | q-RT-PCR | 3.3
    PRUNE2 | CGTGCTAAAGGAGATTCTCCAAGA / GGTCATCCACTTGAAAAGAATGC | chr9:78,416,112-78,710,823 | q-RT-PCR | 3.3
    PCA3 | AGAAATAGCAAGTGCCGAGAA / CTTATTTCTCACCTCTGTATCATCAGG | chr9:78,569,174-78,592,285 | q-RT-PCR | 3.3
    FOXB2 | CTCGTACAGCGACCAAAAACC / TGTGTGTGCTCGCGGTAGTAG | chr9:78,824,391-78,825,689 | q-RT-PCR | 3.3
    VPS13A | CAAAGCAACAGGAACTGAAAAGAATAG / CCTGTTTTTCCGGCAGATGT | chr9:79,010,767-79,014,186 | q-RT-PCR | 3.3
    GNA14 | CGCTAAGGATACAGTATGTGTGTGAA / CTTGTCCACTTCCACTTCTCTGATT | chr9:79,228,368-79,453,043 | q-RT-PCR | 3.3
    SLC25A21 | CTTTCGAATGATTTTCCAAATGG / CAGCCAAGATAGGTGGCAGAA | chr14:36,218,829-36,711,616 | q-RT-PCR | 3.3
    MIPOL1 | GTAAAATGAGAATAACTGCAGAAGAAATG / AAGCTCCTGCTCTAACCGTTTG | chr14:36,736,907-37,090,215 | q-RT-PCR | 3.3
    FOXA1 | AAGATGGAAGGGCATGAAACC / GACCGGGACGGAGGAGTAG | chr14:37,128,942-37,134,240 | q-RT-PCR | 3.3
    C14orf25 | AGCGTGCTCACAAACCACACT / CAACTTCGTGAATTCTGGATAAGC | chr14:37,150,207-37,580,397 | q-RT-PCR | 3.3
    TTC6 | CAGAGCATTATGTTACACCAAGATAAGG / CAAGAAGCAGCACAATTCCATAATC | chr14:37,334,226-37,381,247 | q-RT-PCR | 3.3
    SSTR1 | AGTCTGGAGGTTGCGCACTT / CGGCTCTGGACTGGTAAATGA | chr14:37,746,955-37,752,019 | q-RT-PCR | 3.3
    C15orf21 | GAACCCTGCTAGAGCCATCAA / CCGTGAGGCCAGGAACCT | chr15:43,590,626-43,636,220 | q-RT-PCR | 3.3
    SLC30A4 | CACTTCAGGAAAATCTACTGCCATAG / CTGTACTTCCTCCCATTTAGATGAACTT | chr15:43,561,970-43,602,294 | q-RT-PCR | 3.3
    PLDN | GAAAAGAGATGCTGATGCTTCATG / TCCAACTCTTCTTTTTGCCTCTTC | chr15:43,666,709-43,689,201 | q-RT-PCR | 3.3
    KLK15 | TGGCTTCTCCTCACTCTCTCCTT / CCTTCCAGCAACTTGTCACCAT | chr19:56,020,357-56,026,591 | q-RT-PCR | 3.3
    KLK3 | TGTGGGTCCCGGTTGTCTT / CCCAGCCTCCCACAATCC | chr19:56,049,983-56,055,832 | q-RT-PCR | 3.3
    KLK2 | ACAGCTGCCCATTGCCTAAA / GTGTCTTCAGGCTCAAACAGGTT | chr19:56,068,501-56,075,635 | q-RT-PCR | 3.3
    KLKP1 | CATCCTCACTGGGTGCTCACTAC / CTATGGTGCTGGCTAGTTGATCA | chr19:56,077,164-56,084,253 | q-RT-PCR | 3.3
    KLK4 | GAGGGCACGACCAGAAGGA / AAGACACAAGGCCCTGCAAGT | chr19:56,101,420-56,105,806 | q-RT-PCR | 3.3
    ETV1-fusion | ACATTTTGCTAACCCCTCTTCTATCT | chr7:14,164,374 | DNA | 3.6
    MIPOL1-fusion | AGATGCCATTTATTTTTGAGGTTGA | chr14:37,055,248 | DNA | 3.6
    GRB10 | AAGGTTGGGTAATGTAAGAGAGTTAT / CCTAACAACATATATTCTTAATAAAACTACTTA | chr7:50,827,880-50,828,097 | Bisulphite | 4.2
    XPOT | TGTATGTTGGGAGTTGTAGGTTTT / ACCTACACTCCAAATAATCAACCAA | chr12:63,084,387-63,084,823 | Bisulphite | 4.2
    TBK1 | GAGGGGGTGTTTTTAGATATTTTT / CTACACACACCCCAAAAACAATA | chr12:63,131,998-63,132,450 | Bisulphite | 4.2
    FOXA1 | TAGGTGTTGTGATTTATTTGTTTTTAGT / CCTAAATCCCTCCCCAAAAA | chr14:37,134,647-37,134,981 | Bisulphite | 4.2
    C15orf21 | GTAGTGGTGTAATTTTAGTTTATTGTAATT / CTTTCTAAAAATCATTAACTCCTTACAA | chr15:43,591,027-43,591,326 | Bisulphite | 4.2
    SLC30A4 | GTTTAGAATTTGTTAGATTTGGGGAGT / AACTCCCAACCCTATCCTCAAA | chr15:43,601,990-43,602,259 | Bisulphite | 4.2
    C15orf21 SINE 1 | TATAATGTTTATTTTGGAATGGATTTT / TATAATTTAAAAAATTTTCCCATAACTAAA | chr15:43,593,490-43,593,919 | Bisulphite | 4.3
    C15orf21 SINE 2 | AGTGGTGGTTTATATTTGTAATTTTAGTAT / AAAAAACTTCTACACAATAAAAACATAATTA | chr15:43,594,870-43,595,217 | Bisulphite | 4.3
    KLK4 CpG 22 | TTTTTTTGTATGTGGAGTTAAGTTATTGTT / CAAAAAAACACAAAAACATTCCTACC | chr19:56,107,714-56,108,088 | Bisulphite | 4.6, 4.7
    KLK4 CpG 27 | GGGTAGAATATGTTGGGGTGGTAT / GAGAAGTTTTTTTAGGAGATGGGTT | chr19:56,103,461-56,103,621 | Bisulphite | 4.6, 4.7
    KLK4 exon 1-2 | GGCCACAGCAGGAAATCC / GCAGTCCTCGCCGTTTATGAT | chr19:56,101,420-56,105,806 | q-RT-PCR | 4.6
    KLK4 exon 2-3 | TGCTGTCAGCCGCACACT / TTGGTCGGCCTCAAGACTGT | chr19:56,101,420-56,105,806 | q-RT-PCR | 4.6
    KLK4 exon 3-4 | GGAACTCTTGCCTCGTTTCTG / ACTGCAGCACGGTAGGCATT | chr19:56,101,420-56,105,806 | q-RT-PCR | 4.6
    KLK CTCF-1 | TTGTCCTGGGAGGGCCTTA / AGGAGTAAAACTGCTATGAACAAACG | chr19:56,033,537-56,033,603 | qPCR | 5.3, 5.6
    KLK CTCF-2 | GGGAAAGGGACAAAAAATGAAA / CCCCCTTGTGGCACGTT | chr19:56,101,452-56,101,512 | qPCR | 5.3
    KLK CTCF-3 | TGCGGTTAGTGGCTACCATATTG / TCCAACTTCCATGATAGGTAATTAGATAGT | chr19:56,216,251-56,216,319 | qPCR | 5.3
    CTD-23422A18 WALK1 | CTGTCCCAGGAAACCCAGATT / TTGTTGTCTCAGGCCAGATAGC | chr19:56,069,033-56,069,104 | qPCR | 5.6
    CTD-23422A18 WALK2 | TCTAGAGGAAGGAAGTCTTAAGGATGA / CGCATTACAGAACCAATTCCAA | chr19:56,083,674-56,083,754 | qPCR | 5.6
    CTD-23422A18 WALK3 | GGGAAAGGGACAAAAAATGAAA / CCCCCTTGTGGCACGTT | chr19:56,101,452-56,101,512 | qPCR | 5.6
    CTD-23422A18 WALK4 | ACGTGGGTGCGTGAATCAT / CAATTTGGATGCTAAGTTTTTTCAGA | chr19:56,113,745-56,113,836 | qPCR | 5.6
    CTD-23422A18 WALK5 | ATATTCAAGCACTGGAGGACCTTAG / GCCGCCTTCCATCTTTCTC | chr19:56,143,808-56,143,888 | qPCR | 5.6
    CTD-23422A18 WALK6 | CAGTGAGCCAAGATTGCATCA / TGGCACAACGATGTTTTTTTTC | chr19:56,172,053-56,172,133 | qPCR | 5.6
    CTD-23422A18 WALK7 | TGCGGTTAGTGGCTACCATATTG / TCCAACTTCCATGATAGGTAATTAGATAGT | chr19:56,216,251-56,216,319 | qPCR | 5.6
    CTD-23422A18 WALK8 | TGTAGAATTGCAACTCACAGTCGAT / AACTCTTGATATGCGGTCAAAAGA | chr19:56,243,996-56,244,076 | qPCR | 5.6
    CTD-23422A18 WALK9 | GCGCAGTGTGATCTGTAGTCTCAT / TGTGTAACTGTCGTTCAGAATGGA | chr19:56,280,563-56,280,643 | qPCR | 5.6
    CTD-23422A18 WALK10 | CATGCACCCTGTAGCCCATT / CCTCACGGAGCTCATTTAATATGA | chr19:56,304,733-56,304,813 | qPCR | 5.6
    3C KLK constant | GCATATCGGAAGTGCTCAGTAAAAA | chr19:56,120,051 | 3C qPCR | 5.6, 5.7, 5.8, 5.9
    3C KLK Fragment 1 | AACCCAGGAGGTGAAAGAAAGTT | chr19:55,976,821 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 2 RV | TTTGGAAATGGAATCTTGCTTTG | chr19:55,976,822 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 2 FW | TGCACTCTAGCCTGGGCAATA | chr19:55,980,265 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 3 | CAGATGTCCTGCGGACTCTTG | chr19:55,988,819 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 4 | ACGTGGGCTCCAATATCCAA | chr19:55,988,820 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 5 | GAAAGAATGTATAATCCTATCAGTGGAAA | chr19:56,000,404 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 7 | GAATCTGTGGTTAGCCTGATCTTGA | chr19:56,019,621 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 8 | CCAGATCCCCATATGTGAAACC | chr19:56,025,128 | 3C qPCR | 5.7, 5.8, 5.9
    3C KLK Fragment 9 | CTCTTACCAGGGTCTCCCAAAA | chr19:56,032,167 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 13 | CCACGCAGCAGATGAGCAT | chr19:56,043,551 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 14 | CACTCCCAACCCAGAATCCA | chr19:56,049,449 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 18 | TTGTACAGCAGATAGCCTTGCAA | chr19:56,064,780 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 23 | CAGCATTTACAGCTTACCTTCACGTA | chr19:56,087,013 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 25 | CTCAGCCTCTGTGCCTTCTGT | chr19:56,106,288 | 3C qPCR | 5.7, 5.8, 5.9
    3C KLK Fragment 31 | TGCCGACCACCTTGATTTCT | chr19:56,136,693 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 33 | GCAAATCAGTGCCCAAGAAAGT | chr19:56,163,442 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 36 | AGGGCACCATGGCAAGATC | chr19:56,177,440 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 39 | TTGAGATTCACAAGCAGGAGTCA | chr19:56,194,907 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 40 | CGGGACTCCACCCCTTGT | chr19:56,197,786 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 41 | TGTGGCTCTGCCCAGCAT | chr19:56,209,918 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 42 | TGACCAGGCCCTCTCCAA | chr19:56,220,893 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 46 | CGGTCAAAAGACTAAATTACAACACATT | chr19:56,244,028 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 49 | GAGTCAGGGAGAGGGAGAGTGAA | chr19:56,260,579 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 54 RV | CCATATAAAACCTAGACCCCTCCTAAAT | chr19:56,275,685 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 54 FW | TCAAGGAGTGTCTGGTGTTTATACG | chr19:56,289,836 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 56 RV | CAGGAGGCTGAGGCTTGAAC | chr19:56,290,119 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 56 FW | CGAAGCCTTTCATTCTTTCCATA | chr19:56,297,450 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 58 | GTCTCCACAGATTGCCAAAGG | chr19:56,298,818 | 3C qPCR | 5.7, 5.9
    3C KLK Fragment 59 | AACAAAAGGCACCCTGTGCTT | chr19:56,306,191 | 3C qPCR | 5.7, 5.9
    HBA1 | GACCCTCTTCTCTGCACAGCTC / GCTACCGAGGCTCCAGCTTAAC | chr16:163,452-163,616 | qPCR | 6.5
    MMP15 | CAGGCCTCTGGTCTCTGTCATT / AGAGCTGAGAAACCACCACCAG | chr16:56,626,887-56,627,135 | qPCR | 6.5
    BMP1 | GATGAAGCCTCGACCCCTAGAT / ACCCGTCAGAGACGAACTTGAG | chr8:22,108,162-22,108,338 | qPCR | 6.5, 6.6, 6.9
    HBB | CCTGAGGAGAAGTCTGCCGTTA / GAACCTCTGGGTCCAAGGGTAG | chr11:5,204,572-5,204,812 | qPCR | 6.5
    PTGS2 | GTTCTAGGCTGGTGTCCCATTG / CTTTCTGTACTGCGGGTGGAAC | chr1:184,911,112-184,911,341 | qPCR | 6.5
    NETO1 | GGAGGTGGAATGCTAGGGACTT / GCTGAGTGTGGCCTTAAGAGGA | chr18:68,634,393-68,634,678 | qPCR | 6.5
    SLITRK6 | GGAGAACATGCCTCCACAGTCT / GTCCTGGAAGTTGAGTGGATGG | chr13:85,267,666-85,267,946 | qPCR | 6.5
    ZFP42 | CTTGTGGGGACACCCAGATAAG / AACCACCTCCAGGCAGTAGTGA | chr4:189,160,677-189,160,909 | qPCR | 6.5
    DPPA2 | AGGTGGACAGCGAAGACAGAAC / GGCCATCAGCAGTGTCCTAAAC | chr3:110,503,485-110,503,652 | qPCR | 6.5, 6.6, 6.9
    Mitochondria | CCTAGGAATCACCTCCCATTCC / GTGTTTAAGGGGTTGGCTAGGG | chrM:15,371-15,538 | qPCR | 6.5


    9.2 Identification of LREA domains

    t-statistics representing changes in expression from LNCaP to PrEC were calculated for every gene in the genome. These plots represent the median t-statistic over 5 genes for each gene on every chromosome. Regions designated as transcriptionally activated are marked with a number. “Blue dots” are regions that were omitted for copy number variation. “Red dots” are regions with a median t-statistic above 4 for only a single gene (as opposed to two or more consecutive genes), and therefore failed to meet the criteria for further analysis. This figure is an extension of Figure 3.1.
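The selection criteria in this legend (a median t-statistic over 5 genes, a threshold of 4, and two or more consecutive genes) can be illustrated with a minimal Python sketch. It is not the code used in the thesis. It assumes the per-gene t-statistics are already computed and ordered by genomic position on one chromosome, that the 5-gene window is centred on each gene, and that windows are truncated at chromosome ends. The function names and the example values are hypothetical, and the exclusion of copy-number-variable regions is not modelled.

```python
from statistics import median


def rolling_median(values, window=5):
    """Median over a window centred on each position (truncated at the ends; an assumption)."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def candidate_regions(t_stats, window=5, threshold=4.0, min_run=2):
    """Return inclusive (start, end) gene indices for runs of at least `min_run`
    consecutive genes whose smoothed t-statistic is above `threshold`."""
    smoothed = rolling_median(t_stats, window)
    regions, start = [], None
    # A sentinel value below any threshold closes a run that reaches the last gene.
    for i, value in enumerate(smoothed + [float("-inf")]):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                regions.append((start, i - 1))
            start = None
    return regions


# Hypothetical per-gene t-statistics for one chromosome, in genomic order.
example = [0.5, 1.0, 5.2, 6.1, 4.8, 5.5, 0.3, -1.0, 0.2, 4.9, 0.1, 0.0]
print(candidate_regions(example))
```

With these hypothetical values, the isolated high t-statistic at index 9 is smoothed away by the median, while the run at indices 2 to 5 survives. A gene whose smoothed value exceeds the threshold on its own, without a qualifying neighbour, is dropped, which corresponds to the “red dot” case in the legend.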


    9.3 Oncomine region validation

    Summarised data extracted from 11 Oncomine prostate cancer studies are plotted for the 35 LREA domains, aligned to genes below the plot248-258. Red, green and grey boxes represent probe-sets with increased expression, decreased expression, or unchanged expression (or expression below detection) respectively. Due to the nature of array datasets, not all genes present in LREA regions are represented in the selected Oncomine studies. This figure is an extension of Figure 3.4 and Table 3.1.

    [Figure: Oncomine region validation plots, Regions 19 to 35]


    REFERENCES


    1 Simpson, R. T. & Bustin, M. Histone Composition of Chromatin Subunits Studied by Immunosedimentation. Biochemistry 15, 4305-4312 (1976).
    2 Luger, K., Mader, A. W., Richmond, R. K., Sargent, D. F. & Richmond, T. J. Crystal structure of the nucleosome core particle at 2.8 A resolution. Nature 389, 251-260 (1997).
    3 Grigoryev, S. A. Nucleosome spacing and chromatin higher-order folding. Nucleus-Austin 3, 493-499 (2012).
    4 Heitz, E. Das Heterochromatin der Moose. I Jahrb Wiss Botanik 69, 762-818 (1928).
    5 Martens, J. H., O'Sullivan, R. J., Braunschweig, U., Opravil, S., Radolf, M., Steinlein, P. & Jenuwein, T. The profile of repeat-associated histone lysine methylation states in the mouse epigenome. Embo Journal 24, 800-812 (2005).
    6 Trojer, P. & Reinberg, D. Facultative heterochromatin: is there a distinctive molecular signature? Mol Cell 28, 1-13 (2007).
    7 Berger, S. L. The complex language of chromatin regulation during transcription. Nature 447, 407-412 (2007).
    8 Strahl, B. D. & Allis, C. D. The language of covalent histone modifications. Nature 403, 41-45 (2000).
    9 Lee, D. Y., Hayes, J. J., Pruss, D. & Wolffe, A. P. A positive role for histone acetylation in transcription factor access to nucleosomal DNA. Cell 72, 73-84 (1993).
    10 Garcia-Ramirez, M., Rocchini, C. & Ausio, J. Modulation of chromatin folding by histone acetylation. J Biol Chem 270, 17923-17928 (1995).
    11 Liang, G. et al. Distinct localization of histone H3 acetylation and H3-K4 methylation to the transcription start sites in the human genome. Proc Natl Acad Sci U S A 101, 7357-7362 (2004).
    12 Agalioti, T., Chen, G. & Thanos, D. Deciphering the transcriptional histone acetylation code for a human gene. Cell 111, 381-392 (2002).
    13 Creyghton, M. P. et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci U S A 107, 21931-21936 (2010).
    14 Bernstein, B. E. et al. Genomic maps and comparative analysis of histone modifications in human and mouse. Cell 120, 169-181 (2005).
    15 Lauberth, S. M., Nakayama, T., Wu, X., Ferris, A. L., Tang, Z., Hughes, S. H. & Roeder, R. G. H3K4me3 interactions with TAF3 regulate preinitiation complex assembly and selective gene activation. Cell 152, 1021-1036 (2013).
    16 Wagner, E. J. & Carpenter, P. B. Understanding the language of Lys36 methylation at histone H3. Nature Reviews Molecular Cell Biology 13, 115-126 (2012).
    17 Wang, Z. et al. Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet 40, 897-903 (2008).
    18 Trojer, P. & Reinberg, D. Facultative heterochromatin: Is there a distinctive molecular signature? Molecular Cell 28, 1-13 (2007).


    19 Guelen, L. et al. Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453, 948-U983 (2008).
    20 Yokochi, T. et al. G9a selectively represses a class of late-replicating genes at the nuclear periphery. Proceedings of the National Academy of Sciences of the United States of America 106, 19363-19368 (2009).
    21 Cao, R. & Zhang, Y. The functions of E(Z)/EZH2-mediated methylation of lysine 27 in histone H3. Current Opinion in Genetics & Development 14, 155-164 (2004).
    22 Margueron, R. & Reinberg, D. The Polycomb complex PRC2 and its mark in life. Nature 469, 343-349 (2011).
    23 Simon, J. A. & Kingston, R. E. Occupying chromatin: Polycomb mechanisms for getting to genomic targets, stopping transcriptional traffic, and staying put. Mol Cell 49, 808-824 (2013).
    24 Wang, Z. & Patel, D. J. Combinatorial readout of dual histone modifications by paired chromatin-associated modules. J Biol Chem 286, 18363-18368 (2011).
    25 Rando, O. J. Combinatorial complexity in chromatin structure and function: revisiting the histone code. Curr Opin Genet Dev 22, 148-155 (2012).
    26 Rada-Iglesias, A., Bajpai, R., Swigut, T., Brugmann, S. A., Flynn, R. A. & Wysocka, J. A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279-283 (2011).
    27 Zentner, G. E., Tesar, P. J. & Scacheri, P. C. Epigenetic signatures distinguish multiple classes of enhancers with distinct cellular functions. Genome Res 21, 1273-1283 (2011).
    28 Voigt, P. et al. Asymmetrically modified nucleosomes. Cell 151, 181-193 (2012).
    29 Mikkelsen, T. S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553-560 (2007).
    30 Schmitges, F. W. et al. Histone Methylation by PRC2 Is Inhibited by Active Chromatin Marks. Molecular Cell 42, 330-341 (2011).
    31 Kim, J. H. et al. Deep sequencing reveals distinct patterns of DNA methylation in prostate cancer. Genome research 21, 1028-1041 (2011).
    32 Balasubramanian, D. et al. H3K4me3 inversely correlates with DNA methylation at a large class of non-CpG-island-containing start sites. Genome Medicine 4 (2012).
    33 Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315-322 (2009).
    34 Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860-921 (2001).
    35 Gardiner-Garden, M. & Frommer, M. CpG islands in vertebrate genomes. J Mol Biol 196, 261-282 (1987).
    36 Maunakea, A. K. et al. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 466, 253-257 (2010).
    37 Laurent, L. et al. Dynamic changes in the human methylome during differentiation. Genome Res 20, 320-331 (2010).


    38 Gama-Sosa, M. A., Slagel, V. A., Trewyn, R. W., Oxenhandler, R., Kuo, K. C., Gehrke, C. W. & Ehrlich, M. The 5-methylcytosine content of DNA from human tumors. Nucleic Acids Res 11, 6883-6894 (1983).
    39 Ehrlich, M. DNA methylation in cancer: too much, but also too little. Oncogene 21, 5400-5413 (2002).
    40 Jones, P. A., Wolkowicz, M. J., Rideout, W. M., 3rd, Gonzales, F. A., Marziasz, C. M., Coetzee, G. A. & Tapscott, S. J. De novo methylation of the MyoD1 CpG island during the establishment of immortal cell lines. Proc Natl Acad Sci U S A 87, 6117-6121 (1990).
    41 Baylin, S. B. & Herman, J. G. DNA hypermethylation in tumorigenesis: epigenetics joins genetics. Trends Genet 16, 168-174 (2000).
    42 Illingworth, R. et al. A novel CpG island set identifies tissue-specific methylation at developmental gene loci. Plos Biology 6, 37-51 (2008).
    43 Deaton, A. M. & Bird, A. CpG islands and the regulation of transcription. Genes Dev 25, 1010-1022 (2011).
    44 Clouaire, T. et al. Cfp1 integrates both CpG content and gene activity for accurate H3K4me3 deposition in embryonic stem cells. Genes Dev 26, 1714-1728 (2012).
    45 Thomson, J. P. et al. CpG islands influence chromatin structure via the CpG-binding protein Cfp1. Nature 464, 1082-1086 (2010).
    46 Weber, M., Hellmann, I., Stadler, M. B., Ramos, L., Paabo, S., Rebhan, M. & Schubeler, D. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet 39, 457-466 (2007).
    47 Ooi, S. K. et al. DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA. Nature 448, 714-717 (2007).
    48 Sharma, S., Kelly, T. K. & Jones, P. A. Epigenetics in cancer. Carcinogenesis 31, 27-36 (2010).
    49 Meissner, A. et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454, 766-U791 (2008).
    50 Mohn, F. et al. Lineage-specific polycomb targets and de novo DNA methylation define restriction and potential of neuronal progenitors. Molecular Cell 30, 755-766 (2008).
    51 Lock, L. F., Takagi, N. & Martin, G. R. Methylation of the Hprt gene on the inactive X occurs after chromosome inactivation. Cell 48, 39-46 (1987).
    52 Illingworth, R. S. et al. Orphan CpG islands identify numerous conserved promoters in the mammalian genome. PLoS Genet 6 (2010).
    53 Mancini-DiNardo, D., Steele, S. J. S., Levorse, J. M., Ingram, R. S. & Tilghman, S. M. Elongation of the Kcnq1ot1 transcript is required for genomic imprinting of neighboring genes. Genes & Development 20, 1268-1282 (2006).
    54 Okano, M., Bell, D. W., Haber, D. A. & Li, E. DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell 99, 247-257 (1999).
    55 Xu, G. L. et al. Chromosome instability and immunodeficiency syndrome caused by mutations in a DNA methyltransferase gene. Nature 402, 187-191 (1999).


    56 Moarefi, A. H. & Chedin, F. ICF syndrome mutations cause a broad spectrum of biochemical defects in DNMT3B-mediated de novo DNA methylation. J Mol Biol 409, 758-772 (2011).
    57 Hansen, R. S., Wijmenga, C., Luo, P., Stanek, A. M., Canfield, T. K., Weemaes, C. M. R. & Gartler, S. M. The DNMT3B DNA methyltransferase gene is mutated in the ICF immunodeficiency syndrome. Proceedings of the National Academy of Sciences of the United States of America 96, 14412-14417 (1999).
    58 Bourc'his, D. & Bestor, T. H. Meiotic catastrophe and retrotransposon reactivation in male germ cells lacking Dnmt3L. Nature 431, 96-99 (2004).
    59 Yoder, J. A., Walsh, C. P. & Bestor, T. H. Cytosine methylation and the ecology of intragenomic parasites. Trends in Genetics 13, 335-340 (1997).
    60 Walsh, C. P., Chaillet, J. R. & Bestor, T. H. Transcription of IAP endogenous retroviruses is constrained by cytosine methylation. Nature genetics 20, 116-117 (1998).
    61 Popp, C. et al. Genome-wide erasure of DNA methylation in mouse primordial germ cells is affected by AID deficiency. Nature 463, 1101-U1126 (2010).
    62 Hawkins, R. D. et al. Distinct epigenomic landscapes of pluripotent and lineage-committed human cells. Cell Stem Cell 6, 479-491 (2010).
    63 Maunakea, A. K., Chepelev, I., Cui, K. & Zhao, K. Intragenic DNA methylation modulates alternative splicing by recruiting MeCP2 to promote exon recognition. Cell Res (2013).
    64 Aran, D., Toperoff, G., Rosenberg, M. & Hellman, A. Replication timing-related and gene body-specific methylation of active human genes. Hum Mol Genet (2010).
    65 Babu, A. & Verma, R. S. Chromosome structure: Euchromatin and heterochromatin. Int Rev Cytol 108, 1-60 (1987).
    66 Bickmore, W. A. & van Steensel, B. Genome architecture: domain organization of interphase chromosomes. Cell 152, 1270-1284 (2013).
    67 Thiery, J. P., Macaya, G. & Bernardi, G. An analysis of eukaryotic genomes by density gradient centrifugation. J Mol Biol 108, 219-235 (1976).
    68 Saccone, S., Federico, C. & Bernardi, G. Localization of the gene-richest and the gene-poorest isochores in the interphase nuclei of mammals and birds. Gene 300, 169-178 (2002).
    69 Jabbari, K. & Bernardi, G. CpG doublets, CpG islands and Alu repeats in long human DNA sequences from different isochore families. Gene 224, 123-128 (1998).
    70 Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860-921 (2001).
    71 Versteeg, R. et al. The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome research 13, 1998-2004 (2003).
    72 Lewis, E. B. A gene complex controlling segmentation in Drosophila. Nature 276, 565-570 (1978).
    73 Jurgens, G. A Group of Genes-Controlling the Spatial Expression of the Bithorax Complex in Drosophila. Nature 316, 153-155 (1985).

    74 Soshnikova, N. & Duboule, D. Epigenetic temporal control of mouse Hox genes in vivo. Science 324, 1320-1323 (2009). 75 Montavon, T. & Duboule, D. Chromatin organization and global regulation of Hox gene clusters. Philos Trans R Soc Lond B Biol Sci 368, 20120367 (2013). 76 Pauler, F. M. et al. H3K27me3 forms BLOCs over silent genes and intergenic regions and specifies a histone banding pattern on a mouse autosomal chromosome. Genome research 19, 221-233 (2009). 77 Wen, B., Wu, H., Shinkai, Y., Irizarry, R. A. & Feinberg, A. P. Large histone H3 lysine 9 dimethylated chromatin blocks distinguish differentiated from embryonic stem cells. Nat Genet 41, 246-250 (2009). 78 Lister, R. et al. Hotspots of aberrant epigenomic reprogramming in human induced pluripotent stem cells. Nature 471, 68-73 (2011). 79 Hon, G. C. et al. Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer. Genome Res 22, 246-258 (2012). 80 Hansen, K. D. et al. Increased methylation variation in epigenetic domains across cancer types. Nat Genet 43, 768-775 (2011). 81 Varriale, A. & Bernardi, G. Distribution of DNA methylation, CpGs, and CpG islands in human isochores. Genomics 95, 25-28 (2010). 82 Rabl, C. Über Zelltheilung. Morphol. Jahrbuch 10, 214-230 (1885). 83 Spector, D. L. Macromolecular Domains within the Cell-Nucleus. Annual Review of Cell Biology 9, 265-315 (1993). 84 Kupper, K. et al. Radial chromatin positioning is shaped by local gene density, not by gene expression. Chromosoma 116, 285-306 (2007). 85 Misteli, T. Beyond the sequence: cellular organization of genome function. Cell 128, 787-800 (2007). 86 Gruenbaum, Y., Margalit, A., Goldman, R. D., Shumaker, D. K. & Wilson, K. L. The nuclear lamina comes of age. Nat Rev Mol Cell Biol 6, 21-31 (2005). 87 Goldman, R. D. et al. Accumulation of mutant lamin A causes progressive changes in nuclear architecture in Hutchinson-Gilford progeria syndrome. 
Proc Natl Acad Sci U S A 101, 8963-8968 (2004). 88 Shumaker, D. K. et al. Mutant nuclear lamin A leads to progressive alterations of epigenetic control in premature aging. Proceedings of the National Academy of Sciences of the United States of America 103, 8703-8708 (2006). 89 Goldberg, M., Harel, A., Brandeis, M., Rechsteiner, T., Richmond, T. J., Weiss, A. M. & Gruenbaum, Y. The tail domain of lamin Dm0 binds histones H2A and H2B. Proc Natl Acad Sci U S A 96, 2852-2857 (1999). 90 Ye, Q. Interactions between Heterochromatin Proteins and an Integral Protein of the Nuclear-Envelope Inner Membrane. Molecular Biology of the Cell 6, 1167-1167 (1995). 91 Peric-Hupkes, D. et al. Molecular Maps of the Reorganization of Genome-Nuclear Lamina Interactions during Differentiation. Mol Cell 38, 603-613 (2010).

    92 Berman, B. P. et al. Regions of focal DNA hypermethylation and long-range hypomethylation in colorectal cancer coincide with nuclear lamina-associated domains. Nat Genet 44, 40-46 (2012). 93 Ragoczy, T., Bender, M. A., Telling, A., Byron, R. & Groudine, M. The locus control region is required for association of the murine beta-globin locus with engaged transcription factories during erythroid maturation. Genes Dev 20, 1447-1457 (2006). 94 Reddy, K. L., Zullo, J. M., Bertolino, E. & Singh, H. Transcriptional repression mediated by repositioning of genes to the nuclear lamina. Nature 452, 243-247 (2008). 95 Kind, J. et al. Single-Cell Dynamics of Genome-Nuclear Lamina Interactions. Cell 153, 178-192 (2013). 96 Boyle, S., Gilchrist, S., Bridger, J. M., Mahy, N. L., Ellis, J. A. & Bickmore, W. A. The spatial organization of human chromosomes within the nuclei of normal and emerin-mutant cells. Human molecular genetics 10, 211-219 (2001). 97 Strasak, L., Bartova, E., Harnicarova, A., Galiova, G., Krejci, J. & Kozubek, S. H3K9 acetylation and radial chromatin positioning. J Cell Physiol 220, 91-101 (2009). 98 Kosak, S. T., Skok, J. A., Medina, K. L., Riblet, R., Le Beau, M. M., Fisher, A. G. & Singh, H. Subnuclear compartmentalization of immunoglobulin loci during lymphocyte development. Science 296, 158-162 (2002). 99 Parada, L. A., McQueen, P. G. & Misteli, T. Tissue-specific spatial organization of genomes. Genome Biol 5, R44 (2004). 100 Iborra, F. J., Pombo, A., Jackson, D. A. & Cook, P. R. Active RNA polymerases are localized within discrete transcription 'factories' in human nuclei. Journal of Cell Science 109, 1427-1436 (1996). 101 Gushchanskaya, E. S., Markova, E. N., Razin, S. V. & Kantidze, O. L. Unmethylated CpG islands are clustered inside the interphase human cell nuclei. Doklady Biochemistry and Biophysics 443, 123-126 (2012). 102 Palstra, R. J., Simonis, M., Klous, P., Brasset, E., Eijkelkamp, B. & de Laat, W. 
Maintenance of Long-Range DNA Interactions after Inhibition of Ongoing RNA Polymerase II Transcription. PLoS ONE 3 (2008). 103 Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289-293 (2009). 104 Fullwood, M. J. et al. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature 462, 58-64 (2009). 105 Cremer, T., Cremer, M., Dietzel, S., Muller, S., Solovei, I. & Fakan, S. Chromosome territories--a functional nuclear landscape. Curr Opin Cell Biol 18, 307-316 (2006). 106 Yaffe, E. & Tanay, A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nature genetics 43, 1059-U1040 (2011). 107 Handoko, L. et al. CTCF-mediated functional chromatin interactome in pluripotent cells. Nat Genet 43, 630-638 (2011).

    108 Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376-380 (2012). 109 Nora, E. P. et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381-385 (2012). 110 Nora, E. P., Dekker, J. & Heard, E. Segmental folding of chromosomes: A basis for structural and regulatory chromosomal neighborhoods? Bioessays (2013). 111 Dellino, G. I. et al. Genome-wide mapping of human DNA-replication origins: Levels of transcription at ORC1 sites regulate origin selection and replication timing. Genome research 23, 1-11 (2013). 112 Doksani, Y., Bermejo, R., Fiorani, S., Haber, J. E. & Foiani, M. Replicon dynamics, dormant origin firing, and terminal fork integrity after double-strand break formation. Cell 137, 247-258 (2009). 113 Mechali, M. Eukaryotic DNA replication origins: many choices for appropriate answers. Nat Rev Mol Cell Biol 11, 728-738 (2010). 114 Huberman, J. A. & Riggs, A. D. Autoradiography of chromosomal DNA fibers from Chinese hamster cells. Proc Natl Acad Sci U S A 55, 599-606 (1966). 115 Hiratani, I. et al. Global reorganization of replication domains during embryonic stem cell differentiation. PLoS Biol 6, e245 (2008). 116 Ryba, T. et al. Evolutionarily conserved replication timing profiles predict long-range chromatin interactions and distinguish closely related cell types. Genome Res (2010). 117 Wu, R., Terry, A. V., Singh, P. B. & Gilbert, D. M. Differential subnuclear localization and replication timing of histone H3 lysine 9 methylation states. Molecular Biology of the Cell 16, 2872-2881 (2005). 118 Hansen, R. S. et al. Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc Natl Acad Sci U S A 107, 139-144 (2010). 119 O'Keefe, R. T., Henderson, S. C. & Spector, D. L. 
Dynamic organization of DNA replication in mammalian cell nuclei: spatially and temporally defined replication of chromosome-specific alpha-satellite DNA sequences. J Cell Biol 116, 1095-1110 (1992). 120 Costantini, M. & Bernardi, G. Replication timing, chromosomal bands, and isochores. Proc Natl Acad Sci U S A 105, 3433-3437 (2008). 121 Takebayashi, S., Dileep, V., Ryba, T., Dennis, J. H. & Gilbert, D. M. Chromatin-interaction compartment switch at developmentally regulated chromosomal domains reveals an unusual principle of chromatin folding. Proc Natl Acad Sci U S A 109, 12574-12579 (2012). 122 Takebayashi, S., Dileep, V., Ryba, T., Dennis, J. H. & Gilbert, D. M. Chromatin-interaction compartment switch at developmentally regulated chromosomal domains reveals an unusual principle of chromatin folding. Proceedings of the National Academy of Sciences of the United States of America 109, 12574-12579 (2012).

    123 Ebrahimi, H., Robertson, E. D., Taddei, A., Gasser, S. M., Donaldson, A. D. & Hiraga, S. I. Early initiation of a replication origin tethered at the nuclear periphery. Journal of Cell Science 123, 1015-1019 (2010). 124 Lande-Diner, L., Zhang, J. & Cedar, H. Shifts in replication timing actively affect histone acetylation during nucleosome reassembly. Mol Cell 34, 767-774 (2009). 125 Zhang, J., Xu, F., Hashimshony, T., Keshet, I. & Cedar, H. Establishment of transcriptional competence in early and late S phase. Nature 420, 198-202 (2002). 126 Conti, C. et al. Inhibition of Histone Deacetylase in Cancer Cells Slows Down Replication Forks, Activates Dormant Origins, and Induces DNA Damage. Cancer Res (2010). 127 Casas-Delucchi, C. S. et al. Histone hypoacetylation is required to maintain late replication timing of constitutive heterochromatin. Nucleic Acids Res 40, 159-169 (2012). 128 Vogelstein, B., Papadopoulos, N., Velculescu, V. E., Zhou, S., Diaz, L. A., Jr. & Kinzler, K. W. Cancer genome landscapes. Science 339, 1546-1558 (2013). 129 Jackson, S. P. & Bartek, J. The DNA-damage response in human biology and disease. Nature 461, 1071-1078 (2009). 130 Feinberg, A. P. & Vogelstein, B. Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature 301, 89-92 (1983). 131 Feinberg, A. P., Gehrke, C. W., Kuo, K. C. & Ehrlich, M. Reduced genomic 5-methylcytosine content in human colonic neoplasia. Cancer Res 48, 1159-1161 (1988). 132 Wolff, E. M. et al. Hypomethylation of a LINE-1 promoter activates an alternate transcript of the MET oncogene in bladders with cancer. PLoS Genet 6, e1000917 (2010). 133 Lamprecht, B. et al. Derepression of an endogenous long terminal repeat activates the CSF1R proto-oncogene in human lymphoma. Nat Med 16, 571-579 (2010). 134 Ley, T. J. et al. DNMT3A mutations in acute myeloid leukemia. N Engl J Med 363, 2424-2433 (2010). 135 Walter, M. J. et al. 
Recurrent DNMT3A mutations in patients with myelodysplastic syndromes. Leukemia 25, 1153-1158 (2011). 136 Okano, M., Bell, D. W., Haber, D. A. & Li, E. DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell 99, 247-257 (1999). 137 Raddatz, G., Gao, Q., Bender, S., Jaenisch, R. & Lyko, F. Dnmt3a protects active chromosome domains against cancer-associated hypomethylation. PLoS Genet 8, e1003146 (2012). 138 Gaudet, F. et al. Induction of tumors in mice by genomic hypomethylation. Science 300, 489-492 (2003). 139 Frigola, J., Song, J., Stirzaker, C., Hinshelwood, R. A., Peinado, M. A. & Clark, S. J. Epigenetic remodeling in colorectal cancer results in coordinate gene suppression across an entire chromosome band. Nat Genet 38, 540-549 (2006).

    140 Coolen, M. W. et al. Consolidation of the cancer genome into domains of repressive chromatin by long-range epigenetic silencing (LRES) reduces transcriptional plasticity. Nat Cell Biol 12, 235-246 (2010). 141 Toyota, M., Ahuja, N., Ohe-Toyota, M., Herman, J. G., Baylin, S. B. & Issa, J. P. CpG island methylator phenotype in colorectal cancer. Proc Natl Acad Sci U S A 96, 8681-8686 (1999). 142 Issa, J. P. CpG island methylator phenotype in cancer. Nat Rev Cancer 4, 988-993 (2004). 143 Beggs, A. D., Jones, A., El-Bahwary, M., Abulafi, M., Hodgson, S. V. & Tomlinson, I. P. Whole-genome methylation analysis of benign and malignant colorectal tumours. J Pathol 229, 697-704 (2013). 144 Issa, J. P. Methylation and prognosis: of molecular clocks and hypermethylator phenotypes. Clin Cancer Res 9, 2879-2881 (2003). 145 Bert, S. A. et al. Regional activation of the cancer genome by long-range epigenetic remodeling. Cancer Cell 23, 9-22 (2013). 146 Greger, V., Passarge, E., Hopping, W., Messmer, E. & Horsthemke, B. Epigenetic changes may contribute to the formation and spontaneous regression of retinoblastoma. Hum Genet 83, 155-158 (1989). 147 Gonzalez-Zulueta, M., Bender, C. M., Yang, A. S., Nguyen, T., Beart, R. W., Van Tornout, J. M. & Jones, P. A. Methylation of the 5' CpG island of the p16/CDKN2 tumor suppressor gene in normal and transformed human tissues correlates with gene silencing. Cancer Res 55, 4531-4535 (1995). 148 Jones, P. A. & Laird, P. W. Cancer epigenetics comes of age. Nat Genet 21, 163-167 (1999). 149 Maruyama, R. et al. Aberrant promoter methylation profile of prostate cancers and its relationship to clinicopathological features. Clin Cancer Res 8, 514-519 (2002). 150 Ueki, T., Toyota, M., Sohn, T., Yeo, C. J., Issa, J. P., Hruban, R. H. & Goggins, M. Hypermethylation of multiple genes in pancreatic adenocarcinoma. Cancer Res 60, 1835-1839 (2000). 151 Strathdee, G., Appleton, K., Illand, M., Millan, D. W., Sargent, J., Paul, J. & Brown, R. 
Primary ovarian carcinomas display multiple methylator phenotypes involving known tumor suppressor genes. Am J Pathol 158, 1121-1127 (2001). 152 Shen, L., Ahuja, N., Shen, Y., Habib, N. A., Toyota, M., Rashid, A. & Issa, J. P. DNA methylation and environmental exposures in human hepatocellular carcinoma. J Natl Cancer Inst 94, 755-761 (2002). 153 Hughes, L. A. et al. The CpG Island Methylator Phenotype: What's in a Name? Cancer Res 73, 5858-5868 (2013). 154 Kang, M. R. et al. Mutational analysis of IDH1 codon 132 in glioblastomas and other common cancers. Int J Cancer 125, 353-355 (2009). 155 Turcan, S. et al. IDH1 mutation is sufficient to establish the glioma hypermethylator phenotype. Nature 483, 479-483 (2012).

    156 Figueroa, M. E. et al. Leukemic IDH1 and IDH2 mutations result in a hypermethylation phenotype, disrupt TET2 function, and impair hematopoietic differentiation. Cancer Cell 18, 553-567 (2010). 157 Xu, W. et al. Oncometabolite 2-hydroxyglutarate is a competitive inhibitor of alpha-ketoglutarate-dependent dioxygenases. Cancer Cell 19, 17-30 (2011). 158 Ito, S. et al. Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Science 333, 1300-1303 (2011). 159 Guo, J. U., Su, Y., Zhong, C., Ming, G. L. & Song, H. Emerging roles of TET proteins and 5-hydroxymethylcytosines in active DNA demethylation and beyond. Cell Cycle 10, 2662-2668 (2011). 160 Delhommeau, F. et al. Mutation in TET2 in myeloid cancers. N Engl J Med 360, 2289-2301 (2009). 161 Imai, K. & Yamamoto, H. Carcinogenesis and microsatellite instability: the interrelationship between genetics and epigenetics. Carcinogenesis 29, 673-680 (2008). 162 De Carvalho, D. D. et al. DNA methylation screening identifies driver epigenetic events of cancer cell survival. Cancer Cell 21, 655-667 (2012). 163 Kalari, S. & Pfeifer, G. P. in Adv Genet Vol. 70 (eds Herceg Zdenko & Ushijima Toshikazu) 277-308 (Academic Press, 2010). 164 Ohm, J. E. et al. A stem cell-like chromatin pattern may predispose tumor suppressor genes to DNA hypermethylation and heritable silencing. Nat Genet 39, 237-242 (2007). 165 Gebhard, C. et al. General transcription factor binding at CpG islands in normal cells correlates with resistance to de novo DNA methylation in cancer cells. Cancer Res 70, 1398-1407 (2010). 166 McCabe, M. T., Lee, E. K. & Vertino, P. M. A multifactorial signature of DNA sequence and polycomb binding predicts aberrant CpG island methylation. Cancer Res 69, 282-291 (2009). 167 Lienert, F., Wirbelauer, C., Som, I., Dean, A., Mohn, F. & Schubeler, D. Identification of genetic elements that autonomously determine DNA methylation states. Nat Genet 43, 1091-1097 (2011). 
168 Schlesinger, Y. et al. Polycomb-mediated methylation on Lys27 of histone H3 pre-marks genes for de novo methylation in cancer. Nat Genet 39, 232-236 (2007). 169 Easwaran, H. et al. A DNA hypermethylation module for the stem/progenitor cell signature of cancer. Genome Res 22, 837-849 (2012). 170 Dawson, M. A. & Kouzarides, T. Cancer epigenetics: from mechanism to therapy. Cell 150, 12-27 (2012). 171 Plass, C., Pfister, S. M., Lindroth, A. M., Bogatyrova, O., Claus, R. & Lichter, P. Mutations in regulators of the epigenome and their connections to global chromatin patterns in cancer. Nat Rev Genet 14, 765-780 (2013). 172 Suva, M. L., Riggi, N. & Bernstein, B. E. Epigenetic reprogramming in cancer. Science 339, 1567-1570 (2013).

    173 Greer, E. L. & Shi, Y. Histone methylation: a dynamic mark in health, disease and inheritance. Nat Rev Genet 13, 343-357 (2012). 174 Albert, M. & Helin, K. Histone methyltransferases in cancer. Semin Cell Dev Biol 21, 209-220 (2010). 175 Hamamoto, R. et al. SMYD3 encodes a histone methyltransferase involved in the proliferation of cancer cells. Nat Cell Biol 6, 731-740 (2004). 176 Nakazawa, T. et al. Global histone modification of histone H3 in colorectal cancer and its precursor lesions. Hum Pathol 43, 834-842 (2012). 177 Yap, D. B. et al. Somatic mutations at EZH2 Y641 act dominantly through a mechanism of selectively altered PRC2 catalytic activity, to increase H3K27 trimethylation. Blood 117, 2451-2459 (2011). 178 Elsheikh, S. E. et al. Global histone modifications in breast cancer correlate with tumor phenotypes, prognostic factors, and patient outcome. Cancer Res 69, 3802-3809 (2009). 179 Barlesi, F. et al. Global histone modifications predict prognosis of resected non small-cell lung cancer. J Clin Oncol 25, 4358-4364 (2007). 180 Seligson, D. B. et al. Global levels of histone modifications predict prognosis in different cancers. Am J Pathol 174, 1619-1628 (2009). 181 Meyer, C. et al. The MLL recombinome of acute leukemias. Leukemia 20, 777-784 (2006). 182 Krivtsov, A. V. & Armstrong, S. A. MLL translocations, histone modifications and leukaemia stem-cell development. Nat Rev Cancer 7, 823-833 (2007). 183 Dorrance, A. M. et al. Mll partial tandem duplication induces aberrant Hox expression in vivo via specific epigenetic alterations. J Clin Invest 116, 2707-2716 (2006). 184 Milne, T. A., Briggs, S. D., Brock, H. W., Martin, M. E., Gibbs, D., Allis, C. D. & Hess, J. L. MLL targets SET domain methyltransferase activity to Hox gene promoters. Mol Cell 10, 1107-1117 (2002). 185 Argiropoulos, B. & Humphries, R. K. Hox genes in hematopoiesis and leukemogenesis. Oncogene 26, 6766-6776 (2007). 186 Gonzalez, M. E. et al. 
Downregulation of EZH2 decreases growth of estrogen receptor-negative invasive breast carcinoma and requires BRCA1. Oncogene 28, 843-853 (2009). 187 Yang, X. et al. CDKN1C (p57) is a direct target of EZH2 and suppressed by multiple epigenetic mechanisms in breast cancer cells. PLoS ONE 4, e5011 (2009). 188 Fujii, S. & Ochiai, A. Enhancer of zeste homolog 2 downregulates E-cadherin by mediating histone H3 methylation in gastric cancer cells. Cancer Sci 99, 738-746 (2008). 189 Cao, R. et al. Role of histone H3 lysine 27 methylation in Polycomb-group silencing. Science 298, 1039-1043 (2002). 190 Varambally, S. et al. The polycomb group protein EZH2 is involved in progression of prostate cancer. Nature 419, 624-629 (2002).

    191 Collett, K. et al. Expression of enhancer of zeste homologue 2 is significantly associated with increased tumor cell proliferation and is a marker of aggressive breast cancer. Clin Cancer Res 12, 1168-1174 (2006). 192 Morin, R. D. et al. Somatic mutations altering EZH2 (Tyr641) in follicular and diffuse large B-cell lymphomas of germinal-center origin. Nat Genet 42, 181-185 (2010). 193 O'Carroll, D., Erhardt, S., Pagani, M., Barton, S. C., Surani, M. A. & Jenuwein, T. The polycomb-group gene Ezh2 is required for early mouse development. Mol Cell Biol 21, 4330-4336 (2001). 194 Simon, J. A. & Lange, C. A. Roles of the EZH2 histone methyltransferase in cancer epigenetics. Mutat Res 647, 21-29 (2008). 195 Chen, Y. W., Kao, S. Y., Wang, H. J. & Yang, M. H. Histone modification patterns correlate with patient outcome in oral squamous cell carcinoma. Cancer 119, 4259-4267 (2013). 196 Roche, J. et al. Global Decrease of Histone H3K27 Acetylation in ZEB1-Induced Epithelial to Mesenchymal Transition in Lung Cancer Cells. Cancers (Basel) 5, 334-356 (2013). 197 Fraga, M. F. et al. Loss of acetylation at Lys16 and trimethylation at Lys20 of histone H4 is a common hallmark of human cancer. Nat Genet 37, 391-400 (2005). 198 Di Cerbo, V. & Schneider, R. Cancers with wrong HATs: the impact of acetylation. Brief Funct Genomics 12, 231-243 (2013). 199 Santer, F. R. et al. Inhibition of the acetyltransferases p300 and CBP reveals a targetable function for p300 in the survival and invasion pathways of prostate cancer cell lines. Mol Cancer Ther 10, 1644-1655 (2011). 200 Debes, J. D., Sebo, T. J., Lohse, C. M., Murphy, L. M., Haugen, D. A. & Tindall, D. J. p300 in prostate cancer proliferation and progression. Cancer Res 63, 7638-7640 (2003). 201 Wang, J. et al. Conditional MLL-CBP targets GMP and models therapy-related myeloproliferative disease. Embo Journal 24, 368-381 (2005). 202 Pasqualucci, L. et al. Inactivating mutations of acetyltransferase genes in B-cell lymphoma. 
Nature 471, 189-195 (2011). 203 Iyer, N. G., Ozdag, H. & Caldas, C. p300/CBP and cancer. Oncogene 23, 4225-4231 (2004). 204 Gayther, S. A. et al. Mutations truncating the EP300 acetylase in human cancers. Nat Genet 24, 300-303 (2000). 205 Hennekam, R. C. Rubinstein-Taybi syndrome. Eur J Hum Genet 14, 981-985 (2006). 206 Miller, R. W. & Rubinstein, J. H. Tumors in Rubinstein-Taybi syndrome. Am J Med Genet 56, 112-115 (1995). 207 Ferrari, R., Pellegrini, M., Horwitz, G. A., Xie, W., Berk, A. J. & Kurdistani, S. K. Epigenetic reprogramming by adenovirus e1a. Science 321, 1086-1088 (2008). 208 Ropero, S. & Esteller, M. The role of histone deacetylases (HDACs) in human cancer. Mol Oncol 1, 19-25 (2007). 209 Barneda-Zahonero, B. & Parra, M. Histone deacetylases and cancer. Molecular Oncology 6, 579-589 (2012).

    210 Lin, R. J., Nagy, L., Inoue, S., Shao, W., Miller, W. H., Jr. & Evans, R. M. Role of the histone deacetylase complex in acute promyelocytic leukaemia. Nature 391, 811-814 (1998). 211 Grignani, F. et al. Fusion proteins of the retinoic acid receptor-alpha recruit histone deacetylase in promyelocytic leukaemia. Nature 391, 815-818 (1998). 212 Wang, J., Hoshino, T., Redner, R. L., Kajigaya, S. & Liu, J. M. ETO, fusion partner in t(8;21) acute myeloid leukemia, represses transcription by interaction with the human N-CoR/mSin3/HDAC1 complex. Proc Natl Acad Sci U S A 95, 10860-10865 (1998). 213 Plumb, J. A. et al. Pharmacodynamic response and inhibition of growth of human tumor xenografts by the novel histone deacetylase inhibitor PXD101. Mol Cancer Ther 2, 721-728 (2003). 214 Khan, O. & La Thangue, N. B. HDAC inhibitors in cancer biology: emerging mechanisms and clinical applications. Immunol Cell Biol 90, 85-94 (2012). 215 Timp, W. & Feinberg, A. P. Cancer as a dysregulated epigenome allowing cellular growth advantage at the expense of the host. Nat Rev Cancer 13, 497-510 (2013). 216 Hsu, P. Y. et al. Estrogen-mediated epigenetic repression of large chromosomal regions through DNA looping. Genome Res (2010). 217 McDonald, O. G., Wu, H., Timp, W., Doi, A. & Feinberg, A. P. Genome-scale epigenetic reprogramming during epithelial-to-mesenchymal transition. Nat Struct Mol Biol 18, 867-874 (2011). 218 Beale, L. Examination of sputum from a case of cancer of the pharynx and the adjacent parts. Arch Med 2, 1860-1861 (1860). 219 Papanicolaou, G. N. & Traut, H. F. Diagnosis of uterine cancer by the vaginal smear. New York, 46 (1943). 220 Derenzini, M. & Ploton, D. Interphase nucleolar organizer regions in cancer cells. Int Rev Exp Pathol 32, 149-192 (1991). 221 Zaidi, S. K. et al. Nuclear microenvironments in biological control and cancer. Nat Rev Cancer 7, 454-463 (2007). 222 Toumpanaki, A. et al. 
Two-dimensional electrophoretic analysis of nuclear matrix proteins in human colon adenocarcinoma. Ultrastruct Pathol 33, 83-91 (2009). 223 Dey, P. Nuclear margin irregularity and cancer: a review. Anal Quant Cytol Histol 31, 345-352 (2009). 224 Chow, K. H., Factor, R. E. & Ullman, K. S. The nuclear envelope environment and its cancer connections. Nat Rev Cancer 12, 196-209 (2012). 225 Capo-chichi, C. D., Cai, K. Q., Smedberg, J., Ganjei-Azar, P., Godwin, A. K. & Xu, X. X. Loss of A-type lamin expression compromises nuclear envelope integrity in breast cancer. Chin J Cancer 30, 415-425 (2011). 226 Willis, N. D. et al. Lamin A/C is a risk biomarker in colorectal cancer. PLoS ONE 3, e2988 (2008). 227 Machiels, B. M., Broers, J. L., Raymond, Y., de Ley, L., Kuijpers, H. J., Caberg, N. E. & Ramaekers, F. C. Abnormal A-type lamin organization in a human lung carcinoma cell line. Eur J Cell Biol 67, 328-335 (1995).

    228 Agrelo, R. et al. Inactivation of the lamin A/C gene by CpG island promoter hypermethylation in hematologic malignancies, and its association with poor survival in nodal diffuse large B-cell lymphoma. J Clin Oncol 23, 3940-3947 (2005). 229 Helfand, B. T., Wang, Y., Pfleghaar, K., Shimi, T., Taimen, P. & Shumaker, D. K. Chromosomal regions associated with prostate cancer risk localize to lamin B-deficient microdomains and exhibit reduced gene transcription. J Pathol 226, 735-745 (2012). 230 Clark, S. J., Statham, A., Stirzaker, C., Molloy, P. L. & Frommer, M. DNA methylation: bisulphite modification and analysis. Nat Protoc 1, 2353-2364 (2006). 231 Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25 (2009). 232 Nair, S. S. et al. Comparison of methyl-DNA immunoprecipitation (MeDIP) and methyl-CpG binding domain (MBD) protein capture for genome-wide DNA methylation analysis reveal CpG sequence coverage bias. Epigenetics 6, 34-44 (2011). 233 Forn, M. et al. Long range epigenetic silencing is a trans-species mechanism that results in cancer specific deregulation by overriding the chromatin domains of normal cells. Mol Oncol 7, 1129-1141 (2013). 234 Clark, S. J. Action at a distance: epigenetic silencing of large chromosomal regions in carcinogenesis. Hum Mol Genet 16 Spec No 1, R88-95 (2007). 235 Sparmann, A. & van Lohuizen, M. Polycomb silencers control cell fate, development and cancer. Nat Rev Cancer 6, 846-856 (2006). 236 Bracken, A. P. & Helin, K. Polycomb group proteins: navigators of lineage pathways led astray in cancer. Nat Rev Cancer 9, 773-784 (2009). 237 Bird, A. DNA methylation patterns and epigenetic memory. Genes Dev 16, 6-21 (2002). 238 Irizarry, R. A., Bolstad, B. M., Collin, F., Cope, L. M., Hobbs, B. & Speed, T. P. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 31, e15 (2003). 239 Smyth, G. K. 
Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3, Article3 (2004). 240 Johnson, W. E., Li, W., Meyer, C. A., Gottardo, R., Carroll, J. S., Brown, M. & Liu, X. S. Model-based analysis of tiling-arrays for ChIP-chip. Proc Natl Acad Sci U S A 103, 12457-12462 (2006). 241 Statham, A. L., Strbenac, D., Coolen, M. W., Stirzaker, C., Clark, S. J. & Robinson, M. D. Repitools: an R package for the analysis of enrichment-based epigenomic data. Bioinformatics 26, 1662-1663 (2010). 242 Sobel, R. E., Wang, Y. & Sadar, M. D. Molecular analysis and characterization of PrEC, commercially available prostate epithelial cells. In Vitro Cell Dev Biol Anim 42, 33-39 (2006). 243 Horoszewicz, J. S. et al. The LNCaP cell line--a new model for studies on human prostatic carcinoma. Prog Clin Biol Res 37, 115-132 (1980).

    244 Horoszewicz, J. S. et al. LNCaP model of human prostatic carcinoma. Cancer Res 43, 1809-1818 (1983). 245 Stone, K. R., Mickey, D. D., Wunderli, H., Mickey, G. H. & Paulson, D. F. Isolation of a human prostate carcinoma cell line (DU 145). Int J Cancer 21, 274-281 (1978). 246 Kaighn, M. E., Narayan, K. S., Ohnuki, Y., Lechner, J. F. & Jones, L. W. Establishment and characterization of a human prostatic carcinoma cell line (PC-3). Invest Urol 17, 16-23 (1979). 247 Bubendorf, L. et al. Survey of gene amplifications during prostate cancer progression by high throughput fluorescence in situ hybridization on tissue microarrays. Cancer research 59, 803-806 (1999). 248 Dhanasekaran, S. M. et al. Delineation of prognostic biomarkers in prostate cancer. Nature 412, 822-826 (2001). 249 Lapointe, J. et al. Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proc Natl Acad Sci U S A 101, 811-816 (2004). 250 Liu, P. et al. Sex-determining region Y box 4 is a transforming oncogene in human prostate cancer cells. Cancer Res 66, 4011-4019 (2006). 251 Luo, J. et al. Human prostate cancer and benign prostatic hyperplasia: molecular dissection by gene expression profiling. Cancer Res 61, 4683-4688 (2001). 252 Singh, D. et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1, 203-209 (2002). 253 Tomlins, S. A. et al. Integrative molecular concept modeling of prostate cancer progression. Nat Genet 39, 41-51 (2007). 254 Vanaja, D. K., Cheville, J. C., Iturria, S. J. & Young, C. Y. Transcriptional silencing of zinc finger protein 185 identified by expression profiling is associated with prostate cancer progression. Cancer Res 63, 3877-3882 (2003). 255 Varambally, S. et al. Integrative genomic and proteomic analysis of prostate cancer reveals signatures of metastatic progression. Cancer Cell 8, 393-406 (2005). 256 Welsh, J. B. et al. 
Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Res 61, 5974-5978 (2001). 257 Yu, Y. P. et al. Gene expression alterations in prostate cancer predicting tumor aggression and preceding development of malignancy. J Clin Oncol 22, 2790-2799 (2004). 258 Dhanasekaran, S. M. et al. Molecular profiling of human prostate tissues: insights into gene expression patterns of prostate development during puberty. FASEB J 19, 243-245 (2005). 259 Lilja, H., Ulmert, D. & Vickers, A. J. Prostate-specific antigen and prostate cancer: prediction, detection and monitoring. Nat Rev Cancer 8, 268-278 (2008). 260 Deras, I. L. et al. PCA3: a molecular urine assay for predicting prostate biopsy outcome. J Urol 179, 1587-1592 (2008). 261 Bussemakers, M. J. et al. DD3: a new prostate-specific gene, highly overexpressed in prostate cancer. Cancer Res 59, 5975-5979 (1999).

    262 Wang, M., Hu, Y., Amatangelo, M. D. & Stearns, M. E. Role of ribosomal protein RPS2 in controlling let-7a expression in human prostate cancer. Mol Cancer Res 9, 36-50 (2011). 263 Alajez, N. M. et al. Enhancer of Zeste homolog 2 (EZH2) is overexpressed in recurrent nasopharyngeal carcinoma and is regulated by miR-26a, miR-101, and miR-98. Cell Death Dis 1, e85 (2010). 264 Hermans, K. G., Bressers, A. A., van der Korput, H. A., Dits, N. F., Jenster, G. & Trapman, J. Two unique novel prostate-specific and androgen-regulated fusion partners of ETV4 in prostate cancer. Cancer Res 68, 3094-3098 (2008). 265 Hermans, K. G. et al. Truncated ETV1, fused to novel tissue-specific genes, and full-length ETV1 in prostate cancer. Cancer Res 68, 7541-7549 (2008). 266 Tomlins, S. A. et al. Distinct classes of chromosomal rearrangements create oncogenic ETS gene fusions in prostate cancer. Nature 448, 595-599 (2007). 267 Mullighan, C. G. et al. Deletion of IKZF1 and prognosis in acute lymphoblastic leukemia. N Engl J Med 360, 470-480 (2009). 268 Lim, S. L. et al. Promoter hypermethylation of FANCF and outcome in advanced ovarian cancer. Br J Cancer 98, 1452-1456 (2008). 269 Salomon-Nguyen, F., Della-Valle, V., Mauchauffe, M., Busson-Le Coniat, M., Ghysdael, J., Berger, R. & Bernard, O. A. The t(1;12)(q21;p13) translocation of human acute myeloblastic leukemia results in a TEL-ARNT fusion. Proc Natl Acad Sci U S A 97, 6757-6762 (2000). 270 Ciampi, R. et al. Oncogenic AKAP9-BRAF fusion is a novel mechanism of MAPK pathway activation in thyroid cancer. Journal of Clinical Investigation 115, 94-101 (2005). 271 de Leeuw, B., Balemans, M., Olde Weghuis, D. & Geurts van Kessel, A. Identification of two alternative fusion genes, SYT-SSX1 and SYT-SSX2, in t(X;18)(p11.2;q11.2)-positive synovial sarcomas. Hum Mol Genet 4, 1097-1099 (1995). 272 Huang da, W., Sherman, B. T. & Lempicki, R. A. 
Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37, 1-13 (2009). 273 Huang da, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4, 44-57 (2009). 274 Jia, L. et al. Genomic androgen receptor-occupied regions with different functions, defined by histone acetylation, coregulators and transcriptional capacity. PLoS ONE 3, e3645 (2008). 275 de Kok, J. B. et al. DD3(PCA3), a very sensitive and specific marker to detect prostate tumors. Cancer research 62, 2695-2698 (2002). 276 Salagierski, M., Verhaegh, G. W., Jannink, S. A., Smit, F. P., Hessels, D. & Schalken, J. A. Differential expression of PCA3 and its overlapping PRUNE2 transcript in prostate cancer. Prostate (2009). 277 Klose, R. J. & Bird, A. P. Genomic DNA methylation: the mark and its mediators. Trends Biochem Sci 31, 89-97 (2006).

278 Shen, L. et al. Genome-wide profiling of DNA methylation reveals a class of normally methylated CpG island promoters. PLoS Genet 3, 2023-2036 (2007).
279 Stirzaker, C., Taberlay, P. C., Statham, A. L. & Clark, S. J. Mining cancer methylomes: prospects and challenges. Trends in Genetics.
280 Chalitchagorn, K. et al. Distinctive pattern of LINE-1 methylation level in normal tissues and the association with carcinogenesis. Oncogene 23, 8841-8846 (2004).
281 Nishigaki, M. et al. Discovery of aberrant expression of R-RAS by cancer-linked DNA hypomethylation in gastric cancer using microarrays. Cancer Res 65, 2115-2124 (2005).
282 Grunau, C. et al. Frequent DNA hypomethylation of human juxtacentromeric BAGE loci in cancer. Genes Chromosomes Cancer 43, 11-24 (2005).
283 Lim, J. H., Kim, S. P., Gabrielson, E., Park, Y. B., Park, J. W. & Kwon, T. K. Activation of human cancer/testis antigen gene, XAGE-1, in tumor cells is correlated with CpG island hypomethylation. Int J Cancer 116, 200-206 (2005).
284 Bock, C., Reither, S., Mikeska, T., Paulsen, M., Walter, J. & Lengauer, T. BiQ Analyzer: visualization and quality control for DNA methylation data from bisulfite sequencing. Bioinformatics 21, 4067-4068 (2005).
285 Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105-1111 (2009).
286 Clark, S. J., Harrison, J., Paul, C. L. & Frommer, M. High sensitivity mapping of methylated cytosines. Nucleic Acids Res 22, 2990-2997 (1994).
287 Huang, Y., Pastor, W. A., Shen, Y., Tahiliani, M., Liu, D. R. & Rao, A. The behaviour of 5-hydroxymethylcytosine in bisulfite sequencing. PLoS ONE 5, e8888 (2010).
288 Sturges, H. A. The choice of a class interval. Journal of the American Statistical Association 21, 65-66 (1926).
289 Riebler, A. et al. BayMeth: Improved DNA methylation quantification for affinity capture sequencing data using a flexible Bayesian approach. arXiv preprint arXiv:1312.3115 (2013).
290 de Hoon, M. & Hayashizaki, Y. Deep cap analysis gene expression (CAGE): genome-wide identification of promoters, quantification of their expression, and network inference. Biotechniques 44, 627-628, 630, 632 (2008).
291 Kodzius, R. et al. CAGE: cap analysis of gene expression. Nat Methods 3, 211-222 (2006).
292 Matys, V. et al. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 31, 374-378 (2003).
293 Courey, A. J. & Tjian, R. Analysis of Sp1 in vivo reveals multiple transcriptional domains, including a novel glutamine-rich activation motif. Cell 55, 887-898 (1988).
294 Roth, C., Schuierer, M., Gunther, K. & Buettner, R. Genomic structure and DNA binding properties of the human zinc finger transcriptional repressor AP-2rep (KLF12). Genomics 63, 384-390 (2000).
295 Klocke, I. S. et al. The human gene ZFP161 on 18p11.21-pter encodes a putative c-myc repressor and is homologous to murine Zfp161 (Chr 17) and Zfp161-rs1 (X chr) (vol 43, pg 156, 1997). Genomics 45, 633-633 (1997).
296 Knofler, M. et al. Human Hand1 basic helix-loop-helix (bHLH) protein: extra-embryonic expression pattern, interaction partners and identification of its transcriptional repressor domains. Biochem J 361, 641-651 (2002).
297 Morin, S., Pozzulo, G., Robitaille, L., Cross, J. & Nemer, M. MEF2-dependent recruitment of the HAND1 transcription factor results in synergistic activation of target promoters. J Biol Chem 280, 32272-32278 (2005).
298 Woloshin, P., Song, K., Degnin, C., Killary, A. M., Goldhamer, D. J., Sassoon, D. & Thayer, M. J. MSX1 inhibits myoD expression in fibroblast x 10T1/2 cell hybrids. Cell 82, 611-620 (1995).
299 Horton, J. D., Goldstein, J. L. & Brown, M. S. SREBPs: activators of the complete program of cholesterol and fatty acid synthesis in the liver. J Clin Invest 109, 1125-1131 (2002).
300 Zhang, H. et al. TEAD transcription factors mediate the function of TAZ in cell growth and epithelial-mesenchymal transition. J Biol Chem 284, 13355-13362 (2009).
301 Ogryzko, V. V., Schiltz, R. L., Russanova, V., Howard, B. H. & Nakatani, Y. The transcriptional coactivators p300 and CBP are histone acetyltransferases. Cell 87, 953-959 (1996).
302 Ishiji, T. et al. Transcriptional enhancer factor (TEF)-1 and its cell-specific co-activator activate human papillomavirus-16 E6 and E7 oncogene transcription in keratinocytes and cervical carcinoma cells. Embo Journal 11, 2271-2281 (1992).
303 Gao, H., Le, Y., Wu, X., Silberstein, L. E., Giese, R. W. & Zhu, Z. VentX, a novel lymphoid-enhancing factor/T-cell factor-associated transcription repressor, is a putative tumor suppressor. Cancer Res 70, 202-211 (2010).
304 Numoto, M. et al. Transcriptional repressor ZF5 identifies a new conserved domain in zinc finger proteins. Nucleic Acids Res 21, 3767-3775 (1993).
305 Fang, F., Fan, S., Zhang, X. & Zhang, M. Q. Predicting methylation status of CpG islands in the human brain. Bioinformatics 22, 2204-2209 (2006).
306 Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140 (2010).
307 Holliday, R. & Pugh, J. E. DNA Modification Mechanisms and Gene Activity during Development. Science 187, 226-232 (1975).
308 Riggs, A. D. X-Inactivation, Differentiation, and DNA Methylation. Cytogenetics and Cell Genetics 14, 9-25 (1975).
309 Robertson, K. D. DNA methylation and human disease. Nat Rev Genet 6, 597-610 (2005).
310 Jones, P. A. DNA methylation errors and cancer. Cancer Res 56, 2463-2467 (1996).
311 Gupta, A., Godwin, A. K., Vanderveer, L., Lu, A. & Liu, J. Hypomethylation of the synuclein gamma gene CpG island promotes its aberrant expression in breast carcinoma and ovarian carcinoma. Cancer Res 63, 664-673 (2003).
312 Schenk, T., Stengel, S., Goellner, S., Steinbach, D. & Saluz, H. P. Hypomethylation of PRAME is responsible for its aberrant overexpression in human malignancies. Genes Chromosomes Cancer 46, 796-804 (2007).
313 Ross, J. P., Rand, K. N. & Molloy, P. L. Hypomethylation of repeated DNA sequences in cancer. Epigenomics 2, 245-269 (2010).
314 Irizarry, R. A. et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet 41, 178-186 (2009).
315 Tolhuis, B., Palstra, R.-J., Splinter, E., Grosveld, F. & de Laat, W. Looping and Interaction between Hypersensitive Sites in the Active β-globin Locus. Molecular Cell 10, 1453-1465 (2002).
316 Tiwari, V. K., McGarvey, K. M., Licchesi, J. D., Ohm, J. E., Herman, J. G., Schubeler, D. & Baylin, S. B. PcG proteins, DNA methylation, and gene repression by chromatin looping. PLoS Biol 6, 2911-2927 (2008).
317 Tan-Wong, S. M., French, J. D., Proudfoot, N. J. & Brown, M. A. Dynamic interactions between the promoter and terminator regions of the mammalian BRCA1 gene. Proc Natl Acad Sci U S A 105, 5160-5165 (2008).
318 Wang, K. C. et al. A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature 472, 120-124 (2011).
319 Lan, X. et al. Integration of Hi-C and ChIP-seq data reveals distinct types of chromatin linkages. Nucleic Acids Res 40, 7690-7704 (2012).
320 Ohlsson, R., Renkawitz, R. & Lobanenkov, V. CTCF is a uniquely versatile transcription regulator linked to epigenetics and disease. Trends in Genetics 17, 520-527 (2001).
321 Heath, H. et al. CTCF regulates cell cycle progression of αβ T cells in the thymus. The EMBO Journal 27, 2839-2850 (2008).
322 Cuddapah, S., Jothi, R., Schones, D. E., Roh, T. Y., Cui, K. & Zhao, K. Global analysis of the insulator binding protein CTCF in chromatin barrier regions reveals demarcation of active and repressive domains. Genome Res 19, 24-32 (2009).
323 Filippova, G. N. et al. An exceptionally conserved transcriptional repressor, CTCF, employs different combinations of zinc fingers to bind diverged promoter sequences of avian and mammalian c-myc oncogenes. Molecular and cellular biology 16, 2802-2813 (1996).
324 Vostrov, A. A. & Quitschke, W. W. The zinc finger protein CTCF binds to the APBbeta domain of the amyloid beta-protein precursor promoter. Evidence for a role in transcriptional activation. J Biol Chem 272, 33353-33359 (1997).
325 Köhne, A. C., Baniahmad, A. & Renkawitz, R. NeP1: A Ubiquitous Transcription Factor Synergizes with v-ERBA in Transcriptional Silencing. Journal of molecular biology 232, 747-755 (1993).
326 Hark, A. T., Schoenherr, C. J., Katz, D. J., Ingram, R. S., Levorse, J. M. & Tilghman, S. M. CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature 405, 486-489 (2000).
327 Xie, X., Mikkelsen, T. S., Gnirke, A., Lindblad-Toh, K., Kellis, M. & Lander, E. S. Systematic discovery of regulatory motifs in conserved regions of the human genome, including thousands of CTCF insulator sites. Proc Natl Acad Sci U S A 104, 7145-7150 (2007).
328 Guelen, L. et al. Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453, 948-951 (2008).
329 Phillips-Cremins, J. E. et al. Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell 153, 1281-1295 (2013).
330 Zuin, J. et al. Cohesin and CTCF differentially affect chromatin architecture and gene expression in human cells. Proc Natl Acad Sci U S A 111, 996-1001 (2014).
331 Kim, T. H. et al. Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell 128, 1231-1245 (2007).
332 Filippova, G. N. et al. Tumor-associated zinc finger mutations in the CTCF transcription factor selectively alter its DNA-binding specificity. Cancer research 62, 48-52 (2002).
333 Aulmann, S., Bläker, H., Penzel, R., Rieker, R. J., Otto, H. F. & Sinn, H. P. CTCF gene mutations in invasive ductal breast cancer. Breast cancer research and treatment 80, 347-352 (2003).
334 Kandoth, C. et al. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67-73 (2013).
335 Méndez-Catalá, C. F. et al. A novel mechanism for CTCF in the epigenetic regulation of bax in breast cancer cells. Neoplasia (United States) 15, 898-912 (2013).
336 Landan, G. et al. Epigenetic polymorphism and the stochastic formation of differentially methylated regions in normal and cancerous tissues. Nat Genet 44, 1207-1214 (2012).
337 Peng, Z., Shen, R., Li, Y. W., Teng, K. Y., Shapiro, C. L. & Lin, H. J. Epigenetic repression of RARRES1 is mediated by methylation of a proximal promoter and a loss of CTCF binding. PLoS ONE 7, e36891 (2012).
338 Shizuya, H., Birren, B., Kim, U. J., Mancino, V., Slepak, T., Tachiiri, Y. & Simon, M. Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc Natl Acad Sci U S A 89, 8794-8797 (1992).
339 Dekker, J. The three 'C's of chromosome conformation capture: controls, controls, controls. Nat Methods 3, 17-21 (2006).
340 Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28, 511-515 (2010).
341 Fullwood, M. J. et al. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature 462, 58-64 (2009).
342 Fullwood, M. J., Han, Y., Wei, C. L., Ruan, X. & Ruan, Y. Chromatin interaction analysis using paired-end tag sequencing. Curr Protoc Mol Biol Chapter 21, Unit 21 15 21-25 (2010).
343 Li, G. et al. ChIA-PET tool for comprehensive chromatin interaction analysis with paired-end tag sequencing. Genome Biol 11, R22 (2010).
344 Hou, C. & Corces, V. G. Throwing transcription for a loop: expression of the genome in the 3D nucleus. Chromosoma 121, 107-116 (2012).
345 Noordermeer, D., Leleu, M., Splinter, E., Rougemont, J., De Laat, W. & Duboule, D. The dynamic architecture of Hox gene clusters. Science 334, 222-225 (2011).
346 Ferraiuolo, M. A. et al. The three-dimensional architecture of Hox cluster silencing. Nucleic Acids Res 38, 7472-7484 (2010).
347 Chambeyron, S., Da Silva, N. R., Lawson, K. A. & Bickmore, W. A. Nuclear re-organisation of the Hoxb complex during mouse embryonic development. Development 132, 2215-2223 (2005).
348 Bau, D. et al. The three-dimensional folding of the alpha-globin gene domain reveals formation of chromatin globules. Nat Struct Mol Biol 18, 107-114 (2011).
349 Dostie, J. et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res 16, 1299-1309 (2006).
350 Cai, S., Lee, C. C. & Kohwi-Shigematsu, T. SATB1 packages densely looped, transcriptionally active chromatin for coordinated expression of cytokine genes. Nat Genet 38, 1278-1288 (2006).
351 Sutherland, H. & Bickmore, W. A. Transcription factories: gene expression in unions? Nat Rev Genet 10, 457-466 (2009).
352 Taylor, J. H., Woods, P. S. & Hughes, W. L. The organization and duplication of chromosomes as revealed by autoradiographic studies using tritium-labeled thymidine. Proceedings of the National Academy of Sciences of the United States of America 43, 122 (1957).
353 Taylor, J. H. Sister chromatid exchanges in tritium-labeled chromosomes. Genetics 43, 515 (1958).
354 Taylor, J. H. Nucleic acid synthesis in relation to the cell division cycle. Annals of the New York Academy of Sciences 90, 409-421 (1960).
355 Ficq, A. & Pavan, C. Autoradiography of polytene chromosomes of Rhynchosciara angelae at different stages of larval development. (1957).
356 Wimber, D. E. Asynchronous replication of deoxyribonucleic acid in root tip chromosomes of Tradescantia paludosa. Experimental Cell Research 23, 402-407 (1961).
357 Morishima, A., Grumbach, M. M. & Taylor, J. H. Asynchronous duplication of human chromosomes and the origin of sex chromatin. Proceedings of the National Academy of Sciences of the United States of America 48, 756 (1962).
358 Stambrook, P. J. & Flickinger, R. A. Changes in chromosomal DNA replication patterns in developing frog embryos. Journal of Experimental Zoology 174, 101-113 (1970).
359 Hsu, T., Schmid, W. & Stubblefield, E. DNA replication sequences in higher animals. The role of chromosomes in development (ed. M. Locke), 82-112 (1964).
360 Southern, E. M. Detection of specific sequences among DNA fragments separated by gel electrophoresis. Journal of molecular biology 98, 503-517 (1975).
361 Selig, S., Okumura, K., Ward, D. & Cedar, H. Delineation of DNA replication time zones by fluorescence in situ hybridization. The EMBO Journal 11, 1217 (1992).
362 Goldman, M. A., Holmquist, G. P., Gray, M. C., Caston, L. A. & Nag, A. Replication timing of genes and middle repetitive sequences. Science 224, 686-692 (1984).
363 Holmquist, G., Gray, M., Porter, T. & Jordan, J. Characterization of Giemsa dark- and light-band DNA. Cell 31, 121-129 (1982).
364 Latt, S. A. Fluorescence analysis of late DNA replication in human metaphase chromosomes. Somatic cell genetics 1, 293-321 (1975).
365 Vassilev, L. & Russev, G. Purification of nascent DNA chains by immunoprecipitation with anti-BrdU antibodies. Nucleic acids research 16, 10397 (1988).
366 Vassilev, L. & Johnson, E. M. Mapping initiation sites of DNA replication in vivo using polymerase chain reaction amplification of nascent strand segments. Nucleic acids research 17, 7693-7705 (1989).
367 Ten Hagen, K. G., Gilbert, D. M., Willard, H. F. & Cohen, S. N. Replication timing of DNA sequences associated with human centromeres and telomeres. Molecular and cellular biology 10, 6348-6355 (1990).
368 Perry, P. et al. A Dynamic Switch in the Replication Timing of Key Regulator Genes in Embryonic Stem Cells upon Neural Induction. Cell Cycle 3, 1645-1650 (2004).
369 Ryba, T. et al. Abnormal developmental control of replication timing domains in pediatric acute lymphoblastic leukemia. Genome Res (2012).
370 Azuara, V. Profiling of DNA replication timing in unsynchronized cell populations. Nat Protoc 1, 2171-2177 (2006).
371 Ryba, T., Battaglia, D., Pope, B. D., Hiratani, I. & Gilbert, D. M. Genome-scale analysis of replication timing: from bench to bioinformatics. Nat Protoc 6, 870-895 (2011).
372 Cha, R. S. & Kleckner, N. ATR homolog Mec1 promotes fork progression, thus averting breaks in replication slow zones. Science 297, 602-606 (2002).
373 Resnitzky, D., Gossen, M., Bujard, H. & Reed, S. I. Acceleration of the G(1)/S Phase-Transition by Expression of Cyclin-D1 and Cyclin-E with an Inducible System. Molecular and cellular biology 14, 1669-1679 (1994).
374 Azuara, V. et al. Heritable gene silencing in lymphocytes delays chromatid resolution without affecting the timing of DNA replication. Nat Cell Biol 5, 668-674 (2003).
375 Klenow, H. & Henningsen, I. Selective elimination of the exonuclease activity of the deoxyribonucleic acid polymerase from Escherichia coli B by limited proteolysis. Proc Natl Acad Sci U S A 65, 168-175 (1970).
376 Tomizawa, J. & Selzer, G. Initiation of DNA synthesis in Escherichia coli. Annual review of biochemistry 48, 999-1034 (1979).
377 Cairns, J. The bacterial chromosome and its manner of replication as seen by autoradiography. Journal of molecular biology 6, 208-IN205 (1963).
378 Brewer, B. J. & Fangman, W. L. The localization of replication origins on ARS plasmids in S. cerevisiae. Cell 51, 463-471 (1987).
379 Cayrou, C. et al. Genome-scale analysis of metazoan replication origins reveals their organization in specific but flexible sites defined by conserved features. Genome research 21, 1438-1449 (2011).
380 Wang, L., Lin, C. M., Lopreiato, J. O. & Aladjem, M. I. Cooperative sequence modules determine replication initiation sites at the human β-globin locus. Human molecular genetics 15, 2613-2622 (2006).
381 MacAlpine, H. K., Gordân, R., Powell, S. K., Hartemink, A. J. & MacAlpine, D. M. Drosophila ORC localizes to open chromatin and marks sites of cohesin complex loading. Genome research 20, 201-211 (2010).
382 Eaton, M. L., Galani, K., Kang, S., Bell, S. P. & MacAlpine, D. M. Conserved nucleosome positioning defines replication origins. Genes & Development 24, 748-753 (2010).
383 Jin, C., Zang, C., Wei, G., Cui, K., Peng, W., Zhao, K. & Felsenfeld, G. H3.3/H2A.Z double variant–containing nucleosomes mark 'nucleosome-free regions' of active promoters and other regulatory regions. Nature genetics 41, 941-945 (2009).
384 Fu, H. et al. Methylation of histone H3 on lysine 79 associates with a group of replication origins and helps limit DNA replication once per cell cycle. PLoS genetics 9, e1003542 (2013).
385 Lebofsky, R., Heilig, R., Sonnleitner, M., Weissenbach, J. & Bensimon, A. DNA replication origin interference increases the spacing between initiation events in human cells. Molecular Biology of the Cell 17, 5337-5345 (2006).
386 Friedman, K. L., Brewer, B. J. & Fangman, W. L. Replication profile of Saccharomyces cerevisiae chromosome VI. Genes to Cells 2, 667-678 (1997).
387 Mantiero, D., Mackenzie, A., Donaldson, A. & Zegerman, P. Limiting replication initiation factors execute the temporal programme of origin firing in budding yeast. The EMBO Journal 30, 4805-4814 (2011).
388 Tanaka, S., Nakato, R., Katou, Y., Shirahige, K. & Araki, H. Origin Association of Sld3, Sld7, and Cdc45 Proteins Is a Key Step for Determination of Origin-Firing Timing. Current Biology 21, 2055-2063 (2011).
389 Douglas, M. E. & Diffley, J. F. X. Replication Timing: The Early Bird Catches the Worm. Current Biology 22, R81-R82 (2012).
390 Lantermann, A. B., Straub, T., Stralfors, A., Yuan, G. C., Ekwall, K. & Korber, P. Schizosaccharomyces pombe genome-wide nucleosome mapping reveals positioning mechanisms distinct from those of Saccharomyces cerevisiae. Nat Struct Mol Biol 17, 251-257 (2010).
391 Vogelauer, M., Rubbi, L., Lucas, I., Brewer, B. J. & Grunstein, M. Histone acetylation regulates the time of replication origin firing. Mol Cell 10, 1223-1233 (2002).
392 Knott, S. R., Viggiani, C. J., Tavare, S. & Aparicio, O. M. Genome-wide replication profiles indicate an expansive role for Rpd3L in regulating replication initiation timing or efficiency, and reveal genomic loci of Rpd3 function in Saccharomyces cerevisiae. Genes Dev 23, 1077-1090 (2009).
393 Cornacchia, D. et al. Mouse Rif1 is a key regulator of the replication-timing programme in mammalian cells. Embo Journal 31, 3678-3690 (2012).
394 Yamazaki, S., Ishii, A., Kanoh, Y., Oda, M., Nishito, Y. & Masai, H. Rif1 regulates the replication timing domains on the human genome. Embo Journal 31, 3667-3677 (2012).
395 Goren, A., Tabib, A., Hecht, M. & Cedar, H. DNA replication timing of the human beta-globin domain is controlled by histone modification at the origin. Genes Dev 22, 1319-1324 (2008).
396 Ghosh, M., Liu, G., Randall, G., Bevington, J. & Leffak, M. Transcription factor binding and induced transcription alter chromosomal c-myc replicator activity. Molecular and cellular biology 24, 10193-10207 (2004).
397 Blumenthal, A. B., Kriegstein, H. J. & Hogness, D. S. The units of DNA replication in Drosophila melanogaster chromosomes. Cold Spring Harb Symp Quant Biol 38, 205-223 (1974).
398 Hyrien, O. & Mechali, M. Chromosomal replication initiates and terminates at random sequences but at regular intervals in the ribosomal DNA of Xenopus early embryos. Embo Journal 12, 4511-4520 (1993).
399 Laskey, R. A. & Harland, R. M. Replication origins in the eucaryotic chromosome. Cell 24, 283-284 (1981).
400 Alabert, C. & Groth, A. Chromatin replication and epigenome maintenance. Nat Rev Mol Cell Biol 13, 153-167 (2012).
401 Whitehouse, I. & Smith, D. J. Chromatin dynamics at the replication fork: there's more to life than histones. Curr Opin Genet Dev 23, 140-146 (2013).
402 Gruenbaum, Y., Cedar, H. & Razin, A. Substrate and sequence specificity of a eukaryotic DNA methylase. Nature 295, 620-622 (1982).
403 Hansen, K. H. et al. A model for transmission of the H3K27me3 epigenetic mark. Nat Cell Biol 10, 1291-1300 (2008).
404 Hathaway, N. A., Bell, O., Hodges, C., Miller, E. L., Neel, D. S. & Crabtree, G. R. Dynamics and memory of heterochromatin in living cells. Cell 149, 1447-1460 (2012).
405 Canzio, D. et al. Chromodomain-mediated oligomerization of HP1 suggests a nucleosome-bridging mechanism for heterochromatin assembly. Mol Cell 41, 67-81 (2011).
406 Julienne, H., Zoufir, A., Audit, B. & Arneodo, A. Human Genome Replication Proceeds through Four Chromatin States. PLoS Comput Biol 9, e1003233 (2013).
407 Esteve, P. O. et al. Direct interaction between DNMT1 and G9a coordinates DNA and histone methylation during replication. Genes Dev 20, 3089-3103 (2006).
408 Loyola, A. et al. The HP1alpha-CAF1-SetDB1-containing complex provides H3K9me1 for Suv39-mediated K9me3 in pericentric heterochromatin. EMBO Rep 10, 769-775 (2009).
409 Sarraf, S. A. & Stancheva, I. Methyl-CpG binding protein MBD1 couples histone H3 methylation at lysine 9 by SETDB1 to DNA replication and chromatin assembly. Mol Cell 15, 595-605 (2004).
410 Hasan, S., Hassa, P. O., Imhof, R. & Hottiger, M. O. Transcription coactivator p300 binds PCNA and may have a role in DNA repair synthesis. Nature 410, 387-391 (2001).
411 Rowbotham, S. P. et al. Maintenance of silent chromatin through replication requires SWI/SNF-like chromatin remodeler SMARCAD1. Molecular Cell 42, 285-296 (2011).
412 Milutinovic, S., Zhuang, Q. & Szyf, M. Proliferating cell nuclear antigen associates with histone deacetylase activity, integrating DNA replication and chromatin modification. J Biol Chem 277, 20974-20978 (2002).
413 Schermelleh, L. et al. Dynamics of Dnmt1 interaction with the replication machinery and its role in postreplicative maintenance of DNA methylation. Nucleic Acids Res 35, 4301-4312 (2007).
414 Loyola, A., Bonaldi, T., Roche, D., Imhof, A. & Almouzni, G. PTMs on H3 variants before chromatin assembly potentiate their final epigenetic state. Mol Cell 24, 309-316 (2006).
415 Sobel, R. E., Cook, R. G., Perry, C. A., Annunziato, A. T. & Allis, C. D. Conservation of deposition-related acetylation sites in newly synthesized histones H3 and H4. Proc Natl Acad Sci U S A 92, 1237-1241 (1995).
416 Rountree, M. R., Bachman, K. E. & Baylin, S. B. DNMT1 binds HDAC2 and a new co-repressor, DMAP1, to form a complex at replication foci. Nat Genet 25, 269-277 (2000).
417 Dhar, V., Skoultchi, A. I. & Schildkraut, C. L. Activation and repression of a beta-globin gene in cell hybrids is accompanied by a shift in its temporal replication. Molecular and cellular biology 9, 3524-3532 (1989).
418 Forrester, W., Epner, E., Driscoll, M., Enver, T., Brice, M., Papayannopoulou, T. & Groudine, M. A deletion of the human beta-globin locus activation region causes a major alteration in chromatin structure and replication across the entire beta-globin locus. Genes & Development 4, 1637-1649 (1990).
419 Groudine, M., Kohwi-Shigematsu, T., Gelinas, R., Stamatoyannopoulos, G. & Papayannopoulou, T. Human fetal to adult hemoglobin switching: changes in chromatin structure of the beta-globin gene locus. Proceedings of the National Academy of Sciences 80, 7551-7555 (1983).
420 Zhou, V. W., Goren, A. & Bernstein, B. E. Charting histone modifications and the functional organization of mammalian genomes. Nat Rev Genet 12, 7-18 (2011).
421 Stamatoyannopoulos, J. A., Adzhubei, I., Thurman, R. E., Kryukov, G. V., Mirkin, S. M. & Sunyaev, S. R. Human mutation rate associated with DNA replication timing. Nat Genet 41, 393-395 (2009).
422 De, S. & Michor, F. DNA replication timing and long-range DNA interactions predict mutational landscapes of cancer genomes. Nat Biotechnol 29, 1103-1108 (2011).
423 Schuster-Bockler, B. & Lehner, B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 488, 504-507 (2012).
424 Yaffe, E., Farkash-Amar, S., Polten, A., Yakhini, Z., Tanay, A. & Simon, I. Comparative analysis of DNA replication timing reveals conserved large-scale chromosomal architecture. PLoS Genet 6, e1001011 (2010).
425 Wijchers, P. J. & de Laat, W. Genome organization influences partner selection for chromosomal rearrangements. Trends Genet 27, 63-71 (2011).
426 Amiel, A., Litmanovich, T., Gaber, E., Lishner, M., Avivi, L. & Fejgin, M. D. Asynchronous replication of p53 and 21q22 loci in chronic lymphocytic leukemia. Human genetics 101, 219-222 (1997).
427 Amiel, A. et al. Temporal differences in replication timing of homologous loci in malignant cells derived from CML and lymphoma patients. Genes, Chromosomes and Cancer 22, 225-231 (1998).
428 Chang, B. H., Smith, L., Huang, J. & Thayer, M. Chromosomes with delayed replication timing lead to checkpoint activation, delayed recruitment of Aurora B and chromosome instability. Oncogene 26, 1852-1861 (2007).
429 Donley, N. & Thayer, M. J. in Seminars in cancer biology. 80-89 (Elsevier).
430 Thurman, R. E., Day, N., Noble, W. S. & Stamatoyannopoulos, J. A. Identification of higher-order functional domains in the human ENCODE regions. Genome Res 17, 917-927 (2007).
431 Kelly, T. K., Liu, Y., Lay, F. D., Liang, G., Berman, B. P. & Jones, P. A. Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules. Genome Res (2012).
432 Meyer, L. R. et al. The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res (2012).
433 Abdi, H. & Williams, L. J. Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics 2, 433-459 (2010).
434 Gibas, Z., Becher, R., Kawinski, E., Horoszewicz, J. & Sandberg, A. A. A high-resolution study of chromosome changes in a human prostatic carcinoma cell line (LNCaP). Cancer Genet Cytogenet 11, 399-404 (1984).
435 Han, H., Cortez, C. C., Yang, X., Nichols, P. W., Jones, P. A. & Liang, G. DNA methylation directly silences genes with non-CpG island promoters and establishes a nucleosome occupied promoter. Hum Mol Genet 20, 4299-4310 (2011).
436 Ball, M. P. et al. Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells. Nat Biotechnol 27, 361-368 (2009).
437 Mardia, K. V., Kent, J. T. & Bibby, J. M. Multivariate analysis. (Academic Press, 1979).
438 Bamford, S. et al. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer 91, 355-358 (2004).
439 Mosquera, J. M. et al. Prevalence of TMPRSS2-ERG fusion prostate cancer among men undergoing prostate biopsy in the United States. Clin Cancer Res 15, 4706-4711 (2009).
440 Wang, J., Cai, Y., Ren, C. & Ittmann, M. Expression of variant TMPRSS2/ERG fusion messenger RNAs is associated with aggressive prostate cancer. Cancer Res 66, 8347-8351 (2006).
441 Li, J. et al. Genomic hypomethylation in the human germline associates with selective structural mutability in the human genome. PLoS Genet 8, e1002692 (2012).
442 Maslov, A. Y. et al. 5-Aza-2'-deoxycytidine-induced genome rearrangements are mediated by DNMT1. Oncogene (2012).
443 Daskalos, A. et al. Hypomethylation of retrotransposable elements correlates with genomic instability in non-small cell lung cancer. Int J Cancer 124, 81-87 (2009).
444 Ryba, T. et al. Replication timing: a fingerprint for cell identity and pluripotency. PLoS Comput Biol 7, e1002225 (2011).
445 Bester, A. C. et al. Nucleotide deficiency promotes genomic instability in early stages of cancer development. Cell 145, 435-446 (2011).
446 Woodfine, K. et al. Replication timing of the human genome. Hum Mol Genet 13, 191-202 (2004).
447 Maric, C. & Prioleau, M. N. Interplay between DNA replication and gene expression: a harmonious coexistence. Curr Opin Cell Biol 22, 277-283 (2010).
448 Suzuki, M. et al. Late-replicating heterochromatin is characterized by decreased cytosine methylation in the human genome. Genome Res 21, 1833-1840 (2011).
449 Bernstein, B. E. et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol 28, 1045-1048 (2010).
450 Tan, M. et al. Identification of 67 histone marks and histone lysine crotonylation as a new type of histone modification. Cell 146, 1016-1028 (2011).
451 Jones, P. A. et al. Moving AHEAD with an international human epigenome project. Nature 454, 711-715 (2008).
452 Tahiliani, M. et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324, 930-935 (2009).
453 Hackett, J. A., Sengupta, R., Zylicz, J. J., Murakami, K., Lee, C., Down, T. A. & Surani, M. A. Germline DNA demethylation dynamics and imprint erasure through 5-hydroxymethylcytosine. Science 339, 448-452 (2013).
454 Creusot, F., Acs, G. & Christman, J. K. Inhibition of DNA methyltransferase and induction of Friend erythroleukemia cell differentiation by 5-azacytidine and 5-aza-2'-deoxycytidine. J Biol Chem 257, 2041-2048 (1982).
455 Yoshida, M., Kijima, M., Akita, M. & Beppu, T. Potent and specific inhibition of mammalian histone deacetylase both in vivo and in vitro by trichostatin A. Journal of Biological Chemistry 265, 17174-17179 (1990).
456 Shah, P. P. et al. Lamin B1 depletion in senescent cells triggers large-scale changes in gene expression and the chromatin landscape. Genes Dev 27, 1787-1799 (2013).
457 Berger, R. et al. Androgen-induced differentiation and tumorigenicity of human prostate epithelial cells. Cancer research 64, 8867-8875 (2004).
458 Hayward, S. W. et al. Malignant transformation in a nontumorigenic human prostatic epithelial cell line. Cancer research 61, 8135-8142 (2001).
459 Achanzar, W. E., Diwan, B. A., Liu, J., Quader, S. T., Webber, M. M. & Waalkes, M. P. Cadmium-induced malignant transformation of human prostate epithelial cells. Cancer research 61, 455-458 (2001).
460 Hinshelwood, R. A. & Clark, S. J. Breast cancer epigenetics: normal human mammary epithelial cells as a model system. Journal of molecular medicine 86, 1315-1328 (2008).
461 International Cancer Genome Consortium et al. International network of cancer genome projects. Nature 464, 993-998 (2010).

