The Chromatin-Looping Factor ZNF143 is Genetically Altered and Promotes the Oestrogen Response in Breast Cancer

by

Aislinn Treloar

A thesis submitted in conformity with the requirements for the degree of Master of Science Department of Medical Biophysics University of Toronto

© Copyright by Aislinn Treloar 2016

The Chromatin-Looping Factor ZNF143 is Genetically Altered and Promotes the Oestrogen Response in Breast Cancer

Aislinn Treloar

Master of Science

Department of Medical Biophysics University of Toronto

2016

Abstract

Oestrogen signalling in breast cancer (BrCa) cells relies on chromatin loops that connect distal regulatory elements bound by the oestrogen receptor 1 (ESR1) to target promoters. We show that chromatin-looping factors, including CTCF, ZNF143 and RAD21, are genetically altered in BrCa. Expanding on the function of CTCF and cohesin in BrCa, we demonstrate that ZNF143 binds promoters of most early-response oestrogen target genes connected to distal regulatory elements in ESR1-positive BrCa cells. Its chromatin occupancy is unaffected by oestrogen stimulation, supporting a stable three-dimensional genomic architecture within the oestrogen response. Its loss abrogates the oestrogen-induced transcriptional response and growth of BrCa cells. Furthermore, we show that the overexpression of looping factors within ESR1-positive BrCa patients associates with a worse clinical outcome. Overall, our results suggest that ZNF143 is a new critical effector of the oestrogen response and highlight the contribution of the chromatin looping machinery to ESR1-positive BrCa development.

Acknowledgments

During the process of writing this thesis, I received help and support from many people. I would like to express my gratitude to them all in this acknowledgment.

Foremost, I would like to thank my supervisor Dr. Mathieu Lupien for giving me the opportunity to work on this project and who has been so generous with his time, knowledge and encouragement throughout the completion of this thesis.

My sincerest gratitude goes to the current and past members of the Lupien lab who contributed to my project. I thank Dr. Xue Wu and Dr. Xiaoyang Zhang for performing the ZNF143 ChIP-sequencing experiment, Dr. Nadia Penrod for performing RNA-sequencing analysis using the Cufflinks suite and both Dr. Swneke Bailey and Parisa Mazoorei for their guidance in learning several forms of data analysis. Thanks to everyone in the lab for their encouragement, assistance and friendship during the pursuit of this degree.

I would like to thank the members of my supervisory committee, Dr. Brad Wouters and Dr. Senthil Muthuswamy for their helpful suggestions on this project.

Finally, I would like to acknowledge my friends and family for their encouragement and support along the way.

Table of Contents

ABSTRACT

ACKNOWLEDGMENTS

TABLE OF CONTENTS

1 INTRODUCTION

2 METHODS

3 RESULTS

3.1 ZNF143 BINDING OCCURS PRINCIPALLY AT PROMOTERS, INCLUDING AT THE MAJORITY OF THOSE OCCUPIED BY ESR1

3.2 ZNF143 DIRECTLY REGULATES OESTROGEN-TARGET GENE TRANSCRIPTION IN ESR1-POSITIVE BREAST CANCER CELLS

3.3 GENETIC ALTERATIONS IN CHROMATIN LOOPING FACTORS ARE FREQUENTLY OCCURRING AND TEND TOWARDS MUTUAL EXCLUSIVITY

3.4 DIFFERENTIAL EXPRESSION OF THE CHROMATIN LOOPING MACHINERY IS CLINICALLY RELEVANT AND RELATES TO GENETIC ALTERATIONS IN BREAST CANCER

4 DISCUSSION & FUTURE DIRECTIONS

5 SUPPLEMENTARY TABLES & FIGURES

5.1 SUPPLEMENTARY TABLES

Supplementary Table 1: E2-regulated genes in MCF-7 cells

Supplementary Table 2: GSEA analysis of genes down-regulated upon ZNF143 depletion under E2 stimulation

Supplementary Table 3: Genetic alterations tend towards mutual exclusivity (TCGA 2012)

Supplementary Table 4: Genetic alterations tend towards mutual exclusivity (TCGA Provisional Data Set)

Supplementary Table 5: Fold Change mRNA Expression upon Genetic Alteration of Chromatin Looping Factors (TCGA Provisional Data Set)

Supplementary Table 6: Genetic alterations tend towards mutual exclusivity in lung squamous cell carcinoma (TCGA, Provisional)

Supplementary Table 7: Genetic alterations tend towards mutual exclusivity in bladder urothelial carcinoma (TCGA, 2014)

Supplementary Table 8: Genetic alterations tend towards mutual exclusivity in liver hepatocellular carcinoma (TCGA Provisional Data Set)

Supplementary Table 9: Primers used in this study

5.2 SUPPLEMENTARY FIGURES

5 REFERENCES

1 Introduction

Chromatin looping is involved in transcriptional regulation

Gene expression is tightly controlled by spatially and temporally organized transcriptional programs to determine individual cell identity and function (Spitz and Furlong 2012). This is achieved through the coordinated interplay between transcription factor-bound promoters and distal regulatory elements, known as enhancers, via long-range chromatin interactions that physically connect enhancers to their target promoter regions (Lieberman-Aiden et al. 2009; Fullwood et al. 2009; Sanyal et al. 2012; Jin et al. 2013; Heidari et al. 2014; Rao et al. 2014; Plank & Dean 2014; Kron et al. 2014). The vast majority of long-range chromatin interactions are established during cellular differentiation and participate in the coordinated activation of cell-type specific transcriptional programs; however, a small subset can form in response to external stimuli (Fraser et al. 2009; Sanyal et al. 2012; Jin et al. 2013; Rao et al. 2014).

Enhancer-promoter loops are mediated by chromatin-looping factors

The machinery regulating the formation of chromatin interactions consists of the DNA-binding CCCTC-binding factor (CTCF) and zinc finger protein 143 (ZNF143), and the non-DNA-binding cohesin complex, which must be recruited to the chromatin (Cubeñas-Potts & Corces 2015).

CTCF: CTCF is a ubiquitous and essential 11-zinc finger protein whose sequence is highly conserved (Klenova et al. 1993; Filippova et al. 1996; Burcin et al. 1997; Fedoriw et al. 2004). It binds tens of thousands of sites in the genome (Chen et al. 2012), of which a small proportion are ultra-conserved between mammalian species (Schmidt et al. 2012) while 50-60% show cell-type specificity (Barski et al. 2007; Kim et al. 2007; Chen et al. 2008; Cuddapah et al. 2009). Although vertebrate CTCF has traditionally been implicated as an insulator protein, recent reports support CTCF’s role as a chromatin-looping factor. It regulates genome topology as a looping factor through two main mechanisms. First, CTCF contributes to the partitioning of the genome into regulatory blocks, or topologically associated domains (TADs) (Nora et al. 2012; Phillips-Cremins et al. 2013; Rao et al. 2014; Lupianez et al. 2015; Guo et al. 2015; Barutcu et al. 2015; Ji et al. 2015). Chromatin interactions
enrich within these TADs but are relatively rare across TAD boundaries (Dixon et al. 2012; Hou et al. 2012; Sofueva et al. 2013). These sites may thus restrict enhancer-promoter interactions and establish functional domains of gene expression. In addition to this role in TAD boundaries, CTCF is also implicated in the cell-type specific interactions that occur within TADs. Enhancer elements are enriched for CTCF binding (Song et al. 2011; DeMare et al. 2013), indicating that a subset of CTCF sites may be important in shaping cell-type specific transcriptional programs. A significant overlap observed between cell-type specific CTCF binding sites and enhancer elements in the human genome supports this (Barski et al. 2007). Interaction analysis in 3 cell lines indicates that distal fragments that loop to promoter fragments are enriched for CTCF and active enhancer histone tail modifications (Sanyal et al. 2012). Furthermore, interaction analyses focused on CTCF binding sites in several cell types indicate that CTCF-bound fragments interact with gene promoters (Heidari et al. 2015; Guo et al. 2012) and that enhancer-promoter looping may require CTCF (Hirayama et al. 2012). These results indicate that CTCF function may be context-dependent, either defining TAD boundaries, or targeting cell-specific enhancers for looping events.

The cohesin complex: The cohesin complex is made up of several subunits, Rad21, SMC1A, SMC3 and either STAG1 or STAG2. Together, they form a ring-like structure approximately 40nm in diameter that encircles DNA fibers (Losada 2014). The complex contributes to DNA replication, as it promotes restart of replication forks that stall at regions that are difficult to replicate, has a role in facilitating double-strand break repair by homologous recombination (Remeseiro et al. 2012; Carretero et al. 2013) and is also involved in sister chromatid cohesion during mitosis (Uhlmann 2004). As the cohesin complex lacks a DNA-binding domain, 60-80% of cohesin binding sites are dependent on CTCF for the complex’s recruitment to the chromatin (Wendt et al. 2008; Schmidt et al. 2010), where it is lost upon siRNA-mediated depletion of CTCF expression (Parelho et al. 2008; Wendt et al. 2008; Hou et al. 2010). CTCF and cohesin occupancy strongly correlate with long-range interactions (Sanyal et al. 2012; Phillips-Cremins et al. 2013; Heidari et al. 2014) and the depletion of either of these factors destabilizes chromatin interactions (Splinter et al. 2006; Hadjur et al. 2009; Hou et al. 2010; Kagey et al. 2010; Sofueva et al. 2013; Li et al. 2013; Phillips-Cremins et al. 2013; Zuin et al. 2014). Cohesin may also have a CTCF-independent role in chromatin looping as it occupies a subset of active enhancer-promoter interactions that lack CTCF (Schmidt et al. 2010; Kagey et al. 2010; Phillips-Cremins et al. 2013). These sites often harbour the Mediator complex, a transcription co-activator, and
Nipbl, cohesin’s loading factor, in embryonic cells (Kagey et al. 2010); however, the role for either of these factors in loop formation has neither been fully characterized in these cells, nor suggested in other cell types.

ZNF143: ZNF143 is a ubiquitously expressed 7 zinc-finger protein that has been described as a vertebrate transcriptional activator involved in RNA pol II-dependent transcription (Myslinski et al. 1998; Faresse et al. 2012). Although ZNF143’s requirement for growth and development is not well known in mammals, it is essential for normal development in zebrafish (Halbig et al. 2012). Analysis of ZNF143 binding across 4 mammalian genomes suggests that ZNF143 constitutes one of the most widespread transcription factor binding sites in mammalian promoters (Myslinski et al. 2006), with binding in approximately 2000 mammalian protein-coding genes. Its enrichment in chromatin interaction anchors along with the cohesin complex and CTCF indicates that it may be involved in chromatin looping (Rao et al. 2014; Bailey et al. 2015; Heidari et al. 2015). Indeed, ZNF143 binding at gene promoters relates directly to cell-specific interactions, and the disruption of ZNF143 binding to the chromatin through genetic variation decreases chromatin interaction frequency and diminishes target-gene expression (Bailey et al. 2015).

Oestrogen signalling relies on chromatin-looping

Chromatin looping is a key mechanism in oestrogen receptor 1 (ESR1)-mediated transcriptional regulation in breast tumours that express the oestrogen receptor (classified as “luminal”) (Pan et al. 2008; Fullwood et al. 2009; Li et al. 2012; Zhang et al. 2010; Li et al. 2013). Accordingly, CTCF and the cohesin complex have been implicated in the ESR1-mediated response of breast cancer cells to oestrogen in a genome-wide manner. CTCF-bound sites involved in chromatin interactions demarcate the oestrogen response of the TFF1 and TFF3 genes, with CTCF silencing preventing E2-upregulation of gene expression. In addition, depletion of the cohesin complex subunit Rad21 abrogates the oestrogen response by interfering with the formation of chromatin interactions at oestrogen-regulated genes (Li et al. 2013) and by blocking growth of breast cancer cells (Schmidt et al. 2010). Both CTCF and the cohesin complex occupy genomic regions where ESR1 is recruited following oestrogen stimulation (Schmidt et al. 2010; Ross-Innes et al. 2011; Hah et al. 2013). These sites are primarily distal to promoters, and the factors that target promoters for ESR1-mediated long-range chromatin interactions
have yet to be elucidated. The characterization of ZNF143 in several cell types as a promoter-bound looping factor suggests that it may provide for this integral component of ESR1 signalling.

In the present study we identify ZNF143 as a key regulator of ESR1-signalling that enriches at oestrogen-responsive gene promoters involved in chromatin interactions. We show that it is required for oestrogen-induced gene transcription and for the luminal breast cancer growth response to oestrogen stimulation. We show that the chromatin interaction machinery, including CTCF, ZNF143 and the cohesin complex, is genetically altered in breast tumours and that the expression of these factors is clinically relevant in luminal breast cancer patients.

2 Methods

Cell Culture: MCF-7 cells were cultured as previously described (Magnani et al. 2011). Briefly, cells were maintained in DMEM (Life Technologies) supplemented with 10% FBS and 1% Penicillin-Streptomycin.

siRNA Transfection of MCF-7 breast cancer cells: MCF-7 cells were maintained in phenol red-free DMEM medium (Life Technologies) supplemented with 10% heat-inactivated CDT-FBS, 1% Penicillin-Streptomycin and 1mM Sodium Pyruvate prior to transfection, as described previously (Lupien et al. 2008). Following two days of oestrogen starvation, cells were transfected with siZNF143 #1 (Ambion siRNA ID: s15192), siZNF143 #2 (Ambion siRNA ID: s15194), or a negative scrambled control siRNA (Ambion cat: 4404020). Transfection was performed using the RNAiMAX reagent (Invitrogen) according to the manufacturer’s instructions. For cell proliferation assays, cell number or O.D. (562nm) was determined every 24hrs post E2-stimulation (10µM 17β-oestradiol). For expression and protein assays, RNA was extracted 3hrs following stimulation with 10µM 17β-oestradiol.

RNA preparation/collection and real-time PCR: Total RNA extraction was performed using the RNeasy Plus Mini Kit (Qiagen) according to the manufacturer’s instructions. cDNA was synthesized using the iScript cDNA synthesis kit (Bio-Rad) and real-time PCR was performed using the SensiFAST SYBR No-ROX master mix (BioLine) and primers listed in Supplementary Table 9.

RNA-seq: RNA-seq reads were mapped to the human reference genome (hg19) using TopHat2 with default parameters (Kim et al. 2013). The Cufflinks suite was used to assemble transcripts, estimate transcript abundance, and test for differential expression (Trapnell et al. 2010). Specifically, differential expression was called using CuffDiff2 (Trapnell et al. 2013) with the following parameters: -u to correct for reads mapping to more than one location and -b to correct for fragment bias (Roberts et al. 2011).
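The options named above correspond roughly to command-line calls of the following shape. This is a minimal Python sketch that only assembles argument lists and does not run anything; every file name, the index prefix and the output directories are hypothetical placeholders, and only the `-u` and `-b` options are taken from the text.

```python
from typing import List


def tophat2_command(index_prefix: str, reads_fastq: str, out_dir: str) -> List[str]:
    """Build a TopHat2 call that keeps the default alignment parameters."""
    return ["tophat2", "-o", out_dir, index_prefix, reads_fastq]


def cuffdiff_command(gtf: str, genome_fasta: str, bams_a: List[str],
                     bams_b: List[str], out_dir: str) -> List[str]:
    """Build a Cuffdiff call with multi-read (-u) and fragment-bias (-b) correction."""
    return [
        "cuffdiff",
        "-o", out_dir,
        "-u",                 # correct for reads mapping to more than one location
        "-b", genome_fasta,   # fragment-bias correction takes the genome FASTA
        gtf,
        ",".join(bams_a),     # replicates of the first condition, comma-separated
        ",".join(bams_b),     # replicates of the second condition, comma-separated
    ]


if __name__ == "__main__":
    # Placeholder inputs, for illustration only.
    print(" ".join(tophat2_command("hg19_index", "sample.fastq", "tophat_out")))
    print(" ".join(cuffdiff_command("genes.gtf", "hg19.fa",
                                    ["veh_rep1.bam", "veh_rep2.bam"],
                                    ["e2_rep1.bam", "e2_rep2.bam"],
                                    "cuffdiff_out")))
```

Keeping the commands as lists rather than shell strings makes it straightforward to pass them to `subprocess.run` later without quoting problems.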

TCGA and BCCRC Xenograft data for breast cancer genetic alterations:

The DNA copy number, RNA-sequencing and genetic alteration datasets of 482 TCGA published (TCGA 2012), 962 TCGA provisional (2015) and 29 BCCRC Xenograft (Eirew et al. 2014) breast cancer samples used in this work were obtained from the cBio Cancer Genomics Portal (Cerami et al. 2012; Gao et al. 2013). The copy number of each gene was generated by the GISTIC (Genomic Identification of Significant Targets in Cancer) algorithm (Beroukhim et al. 2007). Somatic mutation data were obtained from whole-genome shotgun sequencing for the BCCRC Xenograft dataset (Eirew et al. 2014) and exome sequencing for the TCGA Published and Provisional datasets (TCGA 2012). Figures were recreated using GraphPad Prism Version 5.00 for Mac OS X (GraphPad Software, San Diego California USA, www.graphpad.com). Datasets are available through cBioPortal (http://www.cbioportal.org).

ChromHMM/Genomic distribution of binding:

The ChromHMM algorithm (Ernst et al. 2011) was used to train a multivariate Hidden Markov Model. ChIP-Seq reads for eight histone modifications in E2-stimulated MCF-7 cells (H3K27ac, H3K9ac, H3K4me3, H3K14ac, H3K4me1, H3K27me3, H3K9me3 and H3K36me3) (http://genome.ucsc.edu/ENCODE/downloads.html) were transformed into binary values using a default 200-bp bin size, and the LearnModel function was used to learn models and to assign each genomic position to one of twelve chromatin states. Biological relevance for each state was assigned based on previous studies (Ernst et al. 2011).
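The binarisation step can be pictured with a small sketch: reads are counted in fixed 200-bp bins and each bin is then marked present or absent. The code below is a simplified illustration only. ChromHMM derives its own threshold from a background model, whereas the fixed cut-off used here is a stand-in, and the function names and toy read positions are hypothetical.

```python
from typing import Iterable, List

BIN_SIZE = 200  # default bin size stated in the text


def bin_counts(read_starts: Iterable[int], chrom_length: int,
               bin_size: int = BIN_SIZE) -> List[int]:
    """Count reads per fixed-width bin along one chromosome (0-based positions)."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < chrom_length:  # ignore reads that fall off the chromosome
            counts[pos // bin_size] += 1
    return counts


def binarize(counts: List[int], threshold: int) -> List[int]:
    """Mark a bin 1 when its count reaches the (assumed, fixed) threshold."""
    return [1 if c >= threshold else 0 for c in counts]


if __name__ == "__main__":
    toy_reads = [0, 10, 199, 200, 450]  # hypothetical read start positions
    counts = bin_counts(toy_reads, chrom_length=600)
    print(counts, binarize(counts, threshold=2))
```

One such binary track per histone modification forms the input from which the multivariate model learns chromatin states.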

ChIP-seq

ChIP assays were performed as previously described (Lupien et al. 2010). Briefly, five million cells were crosslinked with 1% formaldehyde and lysed. Chromatin was sonicated with a Bioruptor and immunoprecipitated with 4µg anti-ZNF143 (Novus Biologicals H00007702-M01), followed by reverse crosslinking. DNA was extracted using the MinElute PCR purification kit (Qiagen 28004), processed for library construction using the NEB ChIP-seq library prep reagent set (E6200S), and then sequenced on a HiSeq 2000. Sequencing reads were aligned to hg19 by Bowtie and ZNF143 binding peaks were called by MACS2.0 (Zhang et al. 2008). All assays were conducted in duplicate. Duplicates were pooled to improve peak detection, as the duplicates for each condition were highly correlated with one another (R = 0.844 vehicle; R = 0.928 E2-stimulation) (Supplementary Figure 1).
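Replicate agreement of this kind can be summarised with a Pearson correlation over per-region read counts. The sketch below is a plain-Python illustration; the function name and the toy counts are hypothetical, and how regions were defined for the reported R values is not specified here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long vectors with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical read counts per region in two replicates.
    rep1 = [12, 40, 7, 95, 33]
    rep2 = [10, 44, 9, 90, 30]
    print(round(pearson_r(rep1, rep2), 3))
```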

Statistical Analysis: P-values were determined using a two-tailed Student’s t-test. Kaplan-Meier analyses (http://www.kmplot.com) were used to assess differences in overall survival (Györffy 2010). Prism and OmniGraffle were used for figures.

Overlap Analysis and genome structure correction (GSC): The intersections between the binding sites for ZNF143, ESR1 and ChIA-PET regions were computed using the BEDTools software package (Quinlan et al. 2010). Overlapping binding sites were defined as sharing at least one base pair. The genome structure correction (GSC) statistic (Birney et al. 2007; Bickel et al. 2011) was run to establish the significance of the overlap of ZNF143 with E2-regulated promoters involved in loops. The software was run using region fraction = 0.2, sub-region fraction -S = 0.4 and -bm as the test statistic. All DNase I hypersensitive sites in E2-stimulated (10nM E2, 45min) MCF-7 cells were used as the null list (He et al. 2012).
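The overlap rule used for the intersections can be stated compactly in code. The sketch below assumes BED-style coordinates (0-based, half-open), under which two intervals on the same chromosome share at least one position when each starts before the other ends. It is a naive illustration with hypothetical names and toy intervals, not a substitute for BEDTools.

```python
from typing import List, Tuple

# (chromosome, start, end) in 0-based, half-open BED-style coordinates
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True when the two intervals share at least one position."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def sites_with_overlap(query: List[Interval],
                       reference: List[Interval]) -> List[Interval]:
    """Return query sites overlapping any reference site (quadratic, for clarity)."""
    return [q for q in query if any(overlaps(q, r) for r in reference)]


if __name__ == "__main__":
    # Hypothetical binding sites for two factors.
    factor_a = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
    factor_b = [("chr1", 199, 250), ("chr1", 400, 500), ("chr2", 50, 100)]
    print(sites_with_overlap(factor_a, factor_b))
```

Under the half-open convention, intervals that merely touch end to start, such as `(300, 400)` and `(400, 500)`, do not count as overlapping.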

Gene Set Enrichment Analysis

Biological pathways disrupted by ZNF143 depletion in MCF-7 cells were investigated by gene set enrichment analysis (GSEA) using the software available from the Broad Institute (Subramanian et al. 2005; Mootha et al. 2003). The hallmark gene sets were used for enrichment analysis (www.broadinstitute.org/gsea).

3 Results

3.1 ZNF143 binding occurs principally at promoters, including at the majority of those occupied by ESR1

The role for CTCF and the cohesin complex in ESR1 signalling has been established in breast cancer cells (Liu and Cheung 2014). However, ZNF143’s contribution to the progression of this cancer type is unknown. We therefore determined the genome-wide chromatin binding profile of ZNF143 in ESR1-positive breast cancer cells through chromatin immunoprecipitation (ChIP) followed by massively parallel sequencing (ChIP-seq) in MCF-7 cells. This was performed before (vehicle control) and after stimulation with 17β-oestradiol (E2), a potent form of oestrogen (Supplementary Figure 1). We identified a total of 76,802 and 73,798 high-confidence (p ≤ 1.0 × 10⁻⁵) ZNF143 chromatin-binding sites in vehicle and E2-stimulated conditions, respectively. We observe a strong correlation (R = 0.809) in ZNF143’s binding affinity for the chromatin under these two conditions (Figure 1A, upper panel), suggesting that ZNF143 is bound to the chromatin prior to E2 stimulation in ESR1-positive breast cancer cells and that ZNF143 remains stably bound at these sites following hormonal stimulation. The ZNF143 binding regions in MCF-7 breast cancer cells employed for all subsequent analyses were those identified using the merged sequencing files.

To determine the genomic distribution of ZNF143 binding events, we first defined chromatin states in MCF-7 cells based on the ChromHMM algorithm (Ernst and Kellis 2012) using the ChIP-Seq data for Histone 3 Lysine 27 acetylation (H3K27ac), H3K9ac, H3K14ac, Histone 3 Lysine 4 trimethylation (H3K4me3), H3K4me1, H3K27me3, H3K9me3 and H3K36me3 generated in E2-stimulated MCF-7 cells (Joseph et al. 2010; Magnani et al. 2013; Li et al. 2013). We applied a 12-state chromatin model to segment the genome and then grouped predicted functional elements as promoters, enhancers, transcribed, repressed, CTCF or no signal regions (Supplementary Figure 2). In agreement with reports revealing a strong enrichment for ZNF143 binding at promoters in other cell types (Bailey et al. 2015), we find 38,320 (43%) ZNF143-bound sites at promoters (Figure 1A, lower panel). An additional 16,232 sites (18%) map to enhancers, 3,173 sites (3.6%) to transcribed regions, 2,073 sites (2.3%) to repressed regions, 7,410 (8.3%) to CTCF regions, and the remaining 22,486 sites (26%) fall in regions with no signal in our ChromHMM segmentation model (Figure 1A, lower panel). The bias of ZNF143 binding at promoters is further highlighted by its significantly increased binding intensity at promoters over other genomic regions (Figure 1B).
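The genomic distribution quoted above can be recomputed from the site counts. The short sketch below does this in Python under one assumption made here, namely that the six categories are exhaustive and non-overlapping, so that the denominator is simply their sum; the dictionary and function names are illustrative.

```python
from typing import Dict

# Site counts per chromatin-state group, as quoted in the text.
ZNF143_SITES_BY_STATE: Dict[str, int] = {
    "promoter": 38320,
    "enhancer": 16232,
    "transcribed": 3173,
    "repressed": 2073,
    "CTCF": 7410,
    "no signal": 22486,
}


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each category, using the sum of all categories as denominator."""
    total = sum(counts.values())
    return {state: 100.0 * n / total for state, n in counts.items()}


if __name__ == "__main__":
    for state, pct in percentages(ZNF143_SITES_BY_STATE).items():
        print(f"{state:12s} {pct:5.1f}%")
```

With this denominator (89,694 sites) the promoter share is about 42.7% and the no-signal share about 25.1%, the latter slightly below the 26% quoted.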

ESR1 recruitment to the chromatin following E2-stimulation in MCF-7 cells has previously been reported to occur primarily away from promoters (Carroll et al. 2006; Lin et al. 2007; Welboren et al. 2009). These assessments were performed based on the position of annotated genes across the reference human genome as opposed to chromatin states. Indeed, using this approach, we map fewer than 5% (1,168) of ESR1 binding sites to promoters (Figure 1C, upper panel). However, the proportion of ESR1-bound promoters increases to 27.4% (6,406 sites) when using chromatin state to define genomic elements in MCF-7 cells (Figure 1C, lower panel). ESR1 binding is still predominantly (33.8%; 7,907 sites) found at enhancers (Figure 1C) and further assessment of ESR1 called peaks indicates that only 10% of binding sites are within ±2.5kb of the TSS of coding genes; this is in line with reports of a subset of ESR1 binding sites that are called in proximity of promoters when co-localized with CTCF (Ross-Innes et al. 2011). The majority of ESR1 sites that are called in promoter regions are so far unannotated and do not fall within ±2.5kb of the TSS of lncRNAs or coding genes.

Comparing ZNF143 and ESR1 binding profiles reveals over 8,671 shared binding sites (Supplementary Figure 3). This translates into 72% of ESR1-bound promoter regions occupied by ZNF143 prior to and after E2-stimulation (4,782 of 6,647 ESR1 promoter-bound regions) (Figure 1D). Over 36% of ESR1-bound enhancers are also occupied by ZNF143 in MCF-7 cells, but transcribed, CTCF and repressed ESR1-bound regions show minimal overlap with ZNF143 binding sites (15%, 4% and 18% of ESR1 sites, respectively) (Figure 1D). We find that ESR1 binding is significantly stronger at enhancer regions than promoters occupied by ZNF143 (p = 1 × 10⁻³). In contrast, ZNF143 binds the ESR1-bound promoters with greater affinity (p < 1 × 10⁻³) compared to enhancers (Figure 1E). These results are in line with primary ESR1 binding occurring at enhancers, while primary ZNF143 binding occurs at promoters.

Figure 1

Figure 1: Genome-Wide Binding of ZNF143 Suggests a Role in E2-Induced ESR1 Recruitment to Promoter Elements

Scatterplot comparing read count for ZNF143 binding in MCF-7 breast cancer cells treated with vehicle or E2 (10µM) for 45 minutes. Scatterplot was generated with SeqMonk (http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/) from ZNF143 ChIP-sequencing data. R, Pearson’s correlation, was calculated from all data points (Upper Panel). Genomic distribution of ZNF143 binding events in MCF-7 cells. Chromatin states were annotated by the ChromHMM algorithm (Supplemental Figure S2) (Ernst et al. 2012) (Lower Panel) (A). Boxplots representing the signal intensity (MACS score) for ZNF143 at promoter, enhancer, transcribed, CTCF and no-signal regions of the genome. P-value was calculated using the Mann-Whitney test (B). Global genomic distribution of ESR1 binding events called in MCF-7 cells under E2 stimulation. Genomic elements were annotated by the Cis-Regulatory Element Annotation System (CEAS) web application (cistrome.org/ap) or by the ChromHMM algorithm (Supplemental Figure S2) (C). Histogram illustrating the proportion of ESR1-bound sites that are occupied by ZNF143 at promoters, enhancers, transcribed, repressed and CTCF regions (D). Boxplots showing the signal intensity of ZNF143 and ESR1 at shared binding sites called in promoters or enhancers, as defined by ChromHMM. P-values were calculated using the Mann-Whitney test (E). (*p<0.05; ** p<0.01; *** p<0.001)

3.2 ZNF143 directly regulates oestrogen-target gene transcription in ESR1-positive breast cancer cells

To determine how ZNF143 occupancy relates to chromatin interactions at E2-regulated promoters, we first defined E2-responsive genes as genes whose expression, measured by RNA-sequencing (RNA-seq), is significantly altered following E2-stimulation for three hours (FC > 1.5; p < 0.05). This identified 194 E2-regulated genes (Supplementary Table 1). We then mined the RNA Polymerase II (RNA Pol II) Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) dataset (Li et al. 2012) to identify E2-upregulated gene promoters that form chromatin interactions. A total of 223 promoters ascribed to the 194 E2-regulated genes were assessed in this analysis. We identified 137 (61%) E2-upregulated genes with at least one chromatin interaction anchored at the promoter (±2.5 kilobases (kb) from the Transcription Start Site (TSS)) (Figure 2A, upper panel). ZNF143 binds to 110 (80%) of the 137 E2-regulated gene promoters, a significantly higher proportion (P < 1.0 × 10⁻⁴) than expected by chance (Figure 2A, lower panel). These results indicate ZNF143 binding is enriched at E2-regulated gene promoters involved in chromatin interactions, supporting a direct contribution of ZNF143 to chromatin interactions associated with the E2-response in breast cancer.
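The definition of an E2-responsive gene amounts to a two-part filter on fold change and p-value. The sketch below illustrates it in Python with thresholds of 1.5 and 0.05. Treating the fold-change threshold symmetrically for induced and repressed genes and adding a pseudocount before taking the ratio are assumptions made here, and the gene names and values are invented toy data.

```python
import math
from typing import List, NamedTuple


class GeneResult(NamedTuple):
    gene: str
    fpkm_vehicle: float
    fpkm_e2: float
    p_value: float


def log2_fold_change(g: GeneResult, pseudocount: float = 1.0) -> float:
    """log2(E2 / vehicle); the pseudocount (an assumption) avoids division by zero."""
    return math.log2((g.fpkm_e2 + pseudocount) / (g.fpkm_vehicle + pseudocount))


def e2_regulated(genes: List[GeneResult], min_fc: float = 1.5,
                 alpha: float = 0.05) -> List[str]:
    """Names of genes changing more than min_fc in either direction with p < alpha."""
    cutoff = math.log2(min_fc)
    return [g.gene for g in genes
            if abs(log2_fold_change(g)) > cutoff and g.p_value < alpha]


if __name__ == "__main__":
    toy = [
        GeneResult("GENE_A", 9.0, 29.0, 0.001),   # induced, significant
        GeneResult("GENE_B", 9.0, 11.0, 0.001),   # change too small
        GeneResult("GENE_C", 29.0, 9.0, 0.010),   # repressed, significant
        GeneResult("GENE_D", 9.0, 29.0, 0.200),   # induced, not significant
    ]
    print(e2_regulated(toy))
```

Working on the log2 scale lets a single cut-off cover both induction and repression.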

We then further delineated the requirement for ZNF143 in the E2 response, specifically in the regulation of E2 target gene expression. We performed RNA-seq following E2-stimulation in MCF-7 cells depleted of ZNF143 using siRNAs. We found that 91 (47%) of the 194 E2-regulated genes were no longer responsive to E2-stimulation following ZNF143 depletion (Figure 2B-C). Among the genes whose expression remained responsive to E2-stimulation, 32 genes (16%) showed significantly reduced expression (p < 0.05) following ZNF143 depletion compared to control cells (Figure 2B-C). The impact of ZNF143 depletion on E2-target gene regulation relates to ZNF143 binding intensity at promoters; we found that genes unaffected by the loss of ZNF143 (No Change gene category) showed weak ZNF143 binding at their promoters whereas genes that showed reduced expression or loss of induction to E2-stimulation following ZNF143 depletion harboured significantly stronger ZNF143 binding at their promoters in control conditions (p = 4.0 × 10⁻³; p = 7.1 × 10⁻³) (Figure 2D). These findings suggest that the genes that are unaffected by the loss of ZNF143 do not require ZNF143 at their promoters to regulate expression whereas the genes that exhibit either loss of induction or reduced expression when ZNF143 is lost do rely on ZNF143 occupancy at their promoters. We further interrogated patterns of mRNA expression in MCF-7 cells depleted of ZNF143 by performing Gene Set Enrichment Analysis (GSEA) for genes whose expression was significantly down-regulated upon ZNF143 depletion (FC > 1.5, p < 0.05) under E2-stimulated conditions. This revealed enrichment for early and late oestrogen stimulation gene sets (p = 7.21 × 10⁻¹⁵ and p = 3.92 × 10⁻⁹) (Supplementary Table 2).

MCF-7 cells rely on E2-stimulation to activate the ESR1-regulated gene expression program, which mediates their re-entry into the cell cycle and drives proliferation. We therefore assessed the requirement for ZNF143 in the E2-induced growth of MCF-7 luminal breast cancer cells following its siRNA-based depletion (Figure 2E). ZNF143 depletion significantly impaired growth of MCF-7 cells under E2-stimulation (Figure 2F), supporting its central role in the oestrogen response. Overall, these results show that ZNF143 is required for the ESR1-mediated transcriptional response and for the E2-induced growth response of luminal breast cancer cells.

Figure 2

[Figure 2, panels A-F. Axis labels recovered from the figure: proportion of bound, looping E2-regulated promoters for ZNF143 and the null list; -Log10 (p-value) against mRNA fold change, E2 / vehicle treated (Log2 transformed); FPKM under vehicle and E2 for the No Change (37% of genes), Loss of Induction (47% of genes) and Reduced Expression (16% of genes) categories; ZNF143 signal intensity (MACS score); ZNF143 mRNA level (normalized to actin); growth (520nm absorbance relative to Day 0) against time (days).]

Figure 2: ZNF143 is Required for the E2-Induced Transcriptional Response

Proportion of E2-responsive gene promoters (±2.5kb from TSS) that are involved in chromatin interactions associated with RNA polymerase II in E2-stimulated MCF-7 cells (upper panel). Scatterplot illustrating the proportion of these promoters that exhibit ZNF143 binding (lower panel) (A). Volcano plot showing changes in gene expression of E2-responsive genes upon E2 stimulation in cells depleted of ZNF143 (pink) or transfected with scrambled siRNA (green). Each circle represents one gene. The log fold change in gene expression in E2-stimulated versus vehicle treated conditions is represented on the x-axis. The y-axis shows the -log10 of the p-value. A p-value of 0.05 and a fold change of 1.5 are indicated by grey lines. Quadrants 1-6 are indicated (B). Case examples of genes that fall within quadrants 1 and 5 that illustrate gene expression changes under vehicle and E2-stimulated conditions. (RPKM: reads per kilobase per million). P-value calculated using Student’s t-test (C). Boxplots illustrating signal intensity of ZNF143 at promoters of genes that show no change, reduced expression or loss of E2-induction upon ZNF143 depletion. P-value calculated using Kruskal-Wallis test (D). ZNF143 depletion via siRNA significantly reduces its mRNA levels 48hrs post-transfection. P-value calculated using Student’s unpaired t-test (E). MCF-7 breast cancer cells depleted of ZNF143 fail to grow in response to E2 stimulation compared to control cells. P-value calculated using Student’s unpaired t-test (F) (*p<0.05; ** p<0.01; *** p<0.001; NS = Not Significant)


3.3 Genetic alterations in chromatin looping factors are frequently occurring and tend towards mutual exclusivity

Pan-cancer analyses led by The Cancer Genome Atlas (TCGA) have revealed that CTCF is significantly mutated in several cancers, including breast cancer (Lawrence et al. 2014). Together with the role we established for ZNF143 in ESR1 signalling in luminal breast cancer cells, these reports prompted us to assess the extent of genetic alterations in the known chromatin-looping factors in breast cancer. We investigated the frequency of genetic alterations in CTCF, ZNF143 and the cohesin complex subunits (RAD21, STAG1, STAG2, SMC1A and SMC3) in the published and provisional breast cancer datasets from TCGA as well as in the patient-derived xenograft (PDX) populations from the British Columbia Cancer Research Centre (BCCRC) (TCGA 2012; Eirew et al. 2014). Collectively, the chromatin-looping factors harbour genetic alterations (amplification, deletion or somatic mutation) in at least 22% of primary breast tumours and up to 65% of PDXs derived from primary and metastatic breast tumours (Figure 3A-B). Although we observe a trend towards mutual exclusivity between genetic alterations in the chromatin-looping factors across samples from the TCGA provisional and published datasets, 3 gene pairs do show significant co-occurrence in the TCGA provisional dataset (Figure 3C, Supplementary Tables 3-4).
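As an illustration of how a pairwise tendency towards mutual exclusivity or co-occurrence can be quantified, the sketch below computes a log odds ratio and a one-sided Fisher's exact test from a 2x2 table of alteration calls for two genes. It is a minimal example under stated assumptions: the function names, the use of the natural logarithm and the sample counts are hypothetical and chosen for illustration, and they are not taken from the analysis pipeline or data used in this thesis.

```python
from math import comb, log


def log_odds_ratio(both, only_a, only_b, neither):
    """Natural-log odds ratio for a 2x2 table of alteration calls in two genes.

    Negative values point towards mutual exclusivity, positive values towards
    co-occurrence. A zero cell makes the ratio unbounded (or undefined).
    """
    numerator = both * neither
    denominator = only_a * only_b
    if numerator == 0 and denominator == 0:
        return float("nan")
    if numerator == 0:
        return float("-inf")
    if denominator == 0:
        return float("inf")
    return log(numerator / denominator)


def fisher_one_sided(both, only_a, only_b, neither, alternative="less"):
    """One-sided Fisher's exact test computed from the hypergeometric distribution.

    alternative="less" asks whether co-altered samples are rarer than expected
    (mutual exclusivity); alternative="greater" asks whether they are more
    common than expected (co-occurrence).
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    n = both + only_a + only_b + neither
    altered_a = both + only_a
    altered_b = both + only_b
    lowest = max(0, altered_a + altered_b - n)
    highest = min(altered_a, altered_b)

    def pmf(k):
        # Probability that exactly k samples are altered in both genes,
        # given the two marginal totals.
        return comb(altered_a, k) * comb(n - altered_a, altered_b - k) / comb(n, altered_b)

    ks = range(lowest, both + 1) if alternative == "less" else range(both, highest + 1)
    return sum(pmf(k) for k in ks)


# Hypothetical counts for 100 tumours (illustration only, not thesis data).
table = dict(both=1, only_a=9, only_b=19, neither=71)
lor = log_odds_ratio(**table)
p_exclusive = fisher_one_sided(**table, alternative="less")
print(f"log odds ratio = {lor:.3f}, one-sided p = {p_exclusive:.3f}")
```

In this sketch a negative log odds ratio with a non-significant p-value would be read as a tendency, rather than a significant finding, towards mutual exclusivity.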

Finally, we assessed the relevance of these genetic alterations to breast cancer subtype as defined by the PAM50 gene expression classifier (Parker et al. 2009) in the TCGA published and provisional datasets. Although genetic alterations to ZNF143 and CTCF seem to be enriched in luminal cancers (3/3 and 15/18, respectively) in the published dataset, this observation is not reproduced in the provisional dataset; genetic alterations in each of the chromatin-looping factors are also found in the other breast cancer subtypes. These results suggest that, in addition to the chromatin-looping machinery being altered in luminal cancers, where it contributes to ESR1 signalling, the dysregulation of these factors may also contribute to disease in the other breast cancer subtypes.


Figure 3

[Figure 3 panels: alteration frequency (%) per factor (CTCF, RAD21, STAG1, STAG2, SMC1A, SMC3, ZNF143) by alteration type (amplification, mutation, deletion, multiple alterations) in the TCGA published, TCGA provisional and BCCRC xenograft datasets; overall alteration frequency (%) per dataset; co-occurrence between factors; PAM50 subtype (Luminal, HER2-Enriched, Basal-Like, Normal-Like, NA) of altered tumours (%) in the TCGA published and provisional datasets.]

Figure 3: Chromatin-Looping Factors are Genetically Altered in Breast Cancer. Copy number alterations and somatic mutations in CTCF, ZNF143 and five members of the cohesin complex were analyzed in three separate breast cancer datasets. DNA amplification, deletion, mutation or multiple alterations are indicated (A). Overall frequency of genetic alterations in the chromatin looping factors specified in A, in three breast cancer data sets (B). For each breast cancer data set, the proportion of samples with genetic alterations in a single factor versus in multiple factors is indicated by the strength of colour (C). Breast cancer subtype of tumours harbouring a genetic alteration in a chromatin looping factor; subtype determined by gene expression of 50 genes (PAM50) (D). (*p<0.05; ** p<0.01; *** p<0.001)


3.4 Differential gene expression of the chromatin looping machinery is clinically relevant and relates to genetic alterations in breast cancer

We determined the relevance of genetic alterations in chromatin-looping factors to their expression by segregating breast tumour samples based on the GISTIC-based copy-number alteration score (Mermel et al. 2011). We observe a significant decrease in ZNF143 expression in tumours harbouring heterozygous and homozygous deletions compared to diploid cases (p=2.0×10⁻³ and p<1.0×10⁻³, respectively) (Figure 4A), while copy number gains correlate with significantly increased ZNF143 expression (p<1.0×10⁻³). Similar results were obtained for all other chromatin-looping factors with the exception of STAG2 (Figure 4A; Supplementary Table 5). These results suggest that copy number variations in the chromatin-looping factors directly impact their expression.
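The comparison of expression between copy-number groups can be sketched as follows: a fold change is derived from log2 expression values and the two groups are compared with a Mann-Whitney U test. This is an illustrative sketch only. The fold change is assumed here to be two raised to the difference of the group means of log2 values, the p-value uses a normal approximation without tie or continuity correction, and the expression values are hypothetical; none of these choices is confirmed as the procedure behind Figure 4A or Supplementary Table 5.

```python
from math import erfc, sqrt
from statistics import mean


def fold_change_from_log2(group, reference):
    """Linear fold change between two groups of log2 expression values."""
    return 2 ** (mean(group) - mean(reference))


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    No tie or continuity correction is applied, so this is only a rough guide
    for small or heavily tied samples.
    """
    n1, n2 = len(x), len(y)
    # U counts the (x, y) pairs in which the x value is larger; ties count 0.5.
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return u, p_value


# Hypothetical log2 expression values (illustration only, not thesis data).
het_loss = [9.0, 9.2, 9.1, 8.9, 9.3, 8.8]
diploid = [10.0, 10.1, 9.9, 10.2, 10.3, 9.8]
fc = fold_change_from_log2(het_loss, diploid)
u, p = mann_whitney_u(het_loss, diploid)
print(f"fold change = {fc:.2f}, U = {u}, p = {p:.4f}")
```

A fold change below 1 with a small p-value would correspond to reduced expression in the copy-number loss group relative to diploid tumours.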

To address the clinical relevance of differential expression of looping factors in luminal breast cancer, we performed Kaplan-Meier analysis using the METABRIC dataset, which consists of close to 2,000 expression profiles from independent, clinically annotated breast cancer samples (Curtis et al. 2012). Using the KMplot tool (http://kmplot.com/private/) (Györffy et al. 2010), we segregated samples based on low versus high expression levels for each chromatin-looping factor. Kaplan-Meier curves focused on overall survival reveal that ESR1-positive breast cancer patients with elevated ZNF143 expression do worse than those with low expression levels (p=1.1×10⁻³) (Figure 4B). This observation is valid across all ESR1-positive breast cancer patients as well as within the Luminal A or Luminal B subtypes (p=4.3×10⁻² and p=7.0×10⁻⁴, respectively) (Figure 4B). This suggests that ZNF143 expression does not simply discriminate Luminal A from Luminal B cancers, but that high expression correlates with more aggressive disease within each subtype. The association of elevated gene expression with more aggressive breast cancers is also observed for other chromatin-looping factors. For instance, the overall survival of breast cancer patients whose tumours express high levels of CTCF or the cohesin subunit RAD21 is worse than that of patients whose tumours express these factors at lower levels (p=6.3×10⁻⁶ and p=1.3×10⁻³, respectively, across all ESR1-positive breast cancer patients). CTCF expression discriminates poor outcome within both the Luminal A and Luminal B subtypes (p=4.3×10⁻⁴ and p=7.7×10⁻⁴, respectively) (Figure 4B), while RAD21 expression is only predictive in the more aggressive Luminal B subtype (p=0.027) (Supplementary Figure 5). These results suggest that the expression levels of these chromatin-looping factors are relevant to the clinical outcome of ESR1-positive breast cancer patients.
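To make the survival analysis concrete, the following sketch splits samples into low and high expression groups and computes a Kaplan-Meier survival estimate for a group. It is a simplified illustration and not the KMplot implementation: the median cut-off, the function names and the toy follow-up data are assumptions, and the logrank test and hazard ratio reported in the figures are not reproduced here.

```python
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median expression."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability) steps.

    times  : follow-up time of each patient
    events : True if death was observed at that time, False if censored
    """
    ordered = sorted(zip(times, events))
    at_risk = len(ordered)
    survival = 1.0
    curve = []
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time point.
        while i < len(ordered) and ordered[i][0] == t:
            deaths += ordered[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data in years (illustration only, not thesis data).
groups = split_by_median([1.0, 2.0, 3.0, 4.0])
curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
print(groups)
print(curve)
```

Censored patients leave the risk set without lowering the survival estimate, which is why the curve only steps down at observed events.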


Figure 4

[Figure 4 panels: mRNA expression (log2) of ZNF143, CTCF, Rad21, STAG1, STAG2, SMC1A and SMC3 by copy-number status (HomDel, HetLoss, Diploid, Gain, Amp); Kaplan-Meier curves (probability versus time in years) for low versus high ZNF143 and CTCF expression in ERα-positive, Luminal A and Luminal B patients, with hazard ratios (HR) and logrank P-values.]

Figure 4: Chromatin Looping Factor Expression is Affected by Genetic Alterations and is Clinically Relevant in Luminal Breast Cancer. Box-and-whisker plots showing mRNA expression for the chromatin looping factors assessed in A-C that have altered copy number status, as determined from GISTIC. Gene mutation status and mRNA expression were analyzed using publicly available data obtained through the cBioPortal for Cancer Genomics. P-value is calculated using Mann-Whitney test (*p<0.05; ** p<0.01; *** p<0.001) (A). Kaplan-Meier plots derived from the METABRIC data set (Curtis et al. 2012) evaluating overall survival in ESR1-positive breast cancer patients (n=1486), Luminal A patients (n=825) and Luminal B (n=668) patients, stratified by ZNF143 or CTCF expression. Data were obtained from the Kaplan-Meier plotter breast cancer survival analysis database (Györffy 2010). Hazard ratios (HR) and logrank P-values are displayed (B).


4 Discussion & Future Directions

Chromatin interactions regulate transcriptional networks that drive differentiation and cell-specific responses to stimuli (Fraser et al. 2009; Sanyal et al. 2012; Jin et al. 2013; Rao et al. 2014). Dysregulation of transcriptional regulation mechanisms and the consequent changes to gene expression networks are central to tumourigenesis and disease progression (Kolch et al. 2015). Using three different studies characterizing genetic alterations in breast tumours, we show that the factors known to regulate chromatin loops (ZNF143, CTCF and the subunits of the cohesin complex) are frequently genetically altered. Furthermore, using an independent dataset, we show that elevated expression of these factors typifies aggressive ESR1-positive breast tumours. These results expand on the reported significant mutational load in CTCF and the subunits of the cohesin complex in some solid and haematological cancers (Lawrence et al. 2014). We find that genetic alterations in chromatin-looping factors tend towards mutual exclusivity in breast cancer, an observation shared with lung squamous cell carcinoma (TCGA, Provisional), bladder urothelial carcinoma (TCGA 2014) and liver hepatocellular carcinoma (TCGA, Provisional) (Supplementary Tables 6-8). The tendency towards mutual exclusivity in mutations targeting chromatin-looping factors supports previous reports showing that distinct mutations identified in tumours can converge on proteins involved in a central oncogenic pathway (Leiserson et al. 2015). In addition to mutations that directly target genes, genetic variation in the sequences of regulatory elements can alter the binding of the looping machinery to chromatin and predispose the cell to improper gene expression. For instance, CTCF/cohesin binding sites are frequently mutated in or adjacent to the CTCF motif in colorectal cancers (Katainen et al. 2015), and deletion of CTCF/cohesin co-bound sites in embryonic stem cells (ESCs) alters the interaction frequencies of enhancers with proximal genes and can dysregulate their expression (Dowen et al. 2014). These findings indicate that looking beyond the genetic alterations in looping factors to changes that occur in regulatory elements harbouring ZNF143 and CTCF binding motifs in breast cancer may further elucidate the extent to which chromatin-looping factor activity is affected in tumours.

Recent studies suggest that looping structures are heterogeneous and are mediated by different combinations of chromatin interaction proteins (De Laat and Duboule 2013). Chromatin interactions between two CTCF-bound regions are associated with long (>1Mb), stable interactions that are conserved across cell types and partition the genome into regulatory blocks (Zhang et al. 2012; Phillips-Cremins et al. 2013; Heidari et al. 2014; Rao et al. 2014; Dowen et al. 2014). Interactions that connect enhancer and promoter elements are associated with shorter, more cell-type specific interactions and generally occur within CTCF-CTCF interaction boundaries (Schmidt et al. 2010; Kagey et al. 2010; DeMare et al. 2013; Heidari et al. 2014; Dowen et al. 2014; Bailey et al. 2015). They are believed to be mediated by CTCF at distal sites and ZNF143 at promoters (Cubeñas-Potts and Corces 2015). The dynamics of these enhancer-promoter interactions are subject to debate, particularly for those that mediate the transcriptional response to external stimuli. Chromatin Conformation Capture (3C) assays in MCF-7 cells depleted of ESR1 indicate that this receptor is required at specific chromatin interaction anchors to activate transcription of certain E2-regulated genes (Pan et al. 2008; Fullwood et al. 2009). Hi-C analysis in stimulated MCF-7 cells also indicates that E2-stimulation increases the frequency of interactions (Mourad et al. 2014), with the largest changes in interaction frequency occurring at ESR1 binding sites. Finally, locus-specific studies have indicated that E2-stimulation can increase the frequency of specific promoter-enhancer interactions (Li et al. 2013). Taken together, these results would suggest that chromatin interactions involving ESR1 are transient and are induced by E2-stimulation. These studies, however, stand in contrast to reports indicating that enhancer-promoter interactions are established during cellular differentiation, do not require transcription to remain stable and are mostly pre-formed prior to external stimulation (Fraser et al. 2009; Sanyal et al. 2012; Jin et al. 2013; Rao et al. 2014). Our work shows that ZNF143 is present at promoters interacting with distal ESR1-bound sites and is required for the ESR1-mediated response to E2-stimulation. ZNF143 is present at E2-regulated gene promoters prior to stimulation and remains bound upon hormone treatment. A similar observation was reported for CTCF in MCF-7 breast cancer cells (Ross-Innes et al. 2011). Together, these results are consistent with chromatin-looping factors stably bookmarking the chromatin interaction anchors to predispose the cells to a transcriptional response guided by a three-dimensional chromatin configuration. Interestingly, the ZNF143 motif is found at the majority of mammalian promoters (Myslinski et al. 2006) and CTCF binding is highly conserved across tissues (Schmidt et al. 2012). Furthermore, the chromatin-looping machinery is genetically altered in each breast cancer subtype (Figure 3D) as well as in many different cancer types (Supplementary Figure 6), indicating that the looping machinery contributes to cell-specific transcriptional regulation. Therefore, how the looping machinery establishes cell-type specific interactions in a normal cell and how alterations to these interactions affect the regulation of gene expression in the context of distinct cancers remain important questions to be addressed.

The ESR1-mediated transcriptional response is a major driver of growth and proliferation in a majority of diagnosed breast cancer cases (Tyson et al. 2011) and is accordingly an important focus for targeted therapy (Renoir et al. 2013). Although the treatment of breast cancer has benefitted tremendously from the generation of therapies against ESR1 activity, disease relapse continues to pose a challenge due to intrinsic or acquired drug resistance. A better understanding of the mechanism of ESR1 signalling may provide alternative treatment avenues. Our work identifies the chromatin-looping machinery, inclusive of ZNF143, as a central player in ESR1 signalling in breast cancer cells. As targeting zinc-finger proteins, either ZNF143 or CTCF, is technically challenging, a comprehensive identification of the components of the chromatin-looping machinery is warranted. In Drosophila, accessory proteins (CP190, Rad21, Mdg4, CapH2, condensin factor, Fs1h-L and L3mbt) that interact with and assist DNA-binding chromatin-looping factors have been identified (Moshkovich et al. 2011; Gurudatta et al. 2013; Kellner et al. 2013; Van Bortle et al. 2014; Vogelmann et al. 2014). However, the characterization of chromatin-looping accessory proteins has been less extensive in mammals. In humans, the cohesin complex proteins serve as accessory proteins to the chromatin-looping factors (Cubeñas-Potts and Corces 2015), and recent reports suggest that the Mediator complex may assist in the formation of chromatin interactions. The Mediator is enriched at enhancer-promoter contact sites, and MED1 or MED12 subunit depletion alters chromatin interactions in embryonic stem cells (Kagey et al. 2010; Phillips-Cremins et al. 2013). Additional factors such as Nipbl, which can load cohesin on the chromatin, and the CTCF-interacting proteins YY1, Kaiso, CHD8, PARP1, MAZ, JUND, nucleophosmin, PRDM5 and TFII-I might also prove critical in targeting the chromatin-looping machinery in cancer (Kagey et al. 2010; Cubeñas-Potts and Corces 2015). Finally, enhancer RNAs (eRNAs) have been suggested to participate in the formation and stability of chromatin interactions (Li et al. 2013; Hah et al. 2013). However, chromatin loops can still exist in the absence of eRNA production (Hah et al. 2013). Although our work focuses on enhancer-promoter loops, delineating the full complement of factors involved in the different types of interactions may provide a better understanding of the mechanism through which chromatin interactions regulate gene transcription.


Overall, we establish the chromatin-looping factor ZNF143 as a key regulator of ESR1 signalling in luminal breast cancer cells. We show that the chromatin-looping machinery, inclusive of ZNF143, is altered in over 20% of breast tumours and that the expression of these looping factors is of clinical relevance to breast cancer patients. This work expands our understanding of the mechanism that drives two-thirds of breast cancer cases. The prevalence of alterations to the chromatin-interaction machinery across multiple cancer types indicates that these factors likely contribute to transcriptional regulation beyond ESR1 signalling. Thus, in addition to established genetic and epigenetic contributions, the three-dimensional architecture of the chromatin may prove to be another important regulator of the altered gene transcription that is a hallmark of cancer.


5 Supplementary Tables & Figures

5.1 Supplementary Tables

Supplementary Table 1: E2-regulated genes in MCF-7 cells

E2 Up-Regulated Genes E2 Down-Regulated Genes

EN2 KAZN RARA TPPP3 FAT4 NAT8L ANKRD2 TNFSF10 PFKFB3 SYNE3 RET NUPR1 DPM2,FAM102A CELSR2 COLEC12 SLC22A1 CBFA2T3 IGFBP4 PCDH9 FUT4 SYBU A4GALT DRD1 KBTBD8 MYEOV TNS1 DEPDC7 TMEM201 FOS DLC1 NOV P2RY2 SIAH2 RAB37 BPIFA4P EPS15L1 GAB2 KCNV1 HCAR2,HCAR3 GEMIN5 ZNF703 CYP26A1 AMER1 PITX1 BIRC3 TM4SF18 ADAT2 SCAND1 SEC14L2 C8orf4 LAMA3 ASB13 SCNN1G KCNJ8 TET2 FAM178B MYC HMOX1 ITPK1 PADI3 MYBL1 BTN3A3 FOSL2 CALCR CISH TNFRSF11B GADD45B C5orf4 HEY2 SOX9 KCNK6 RAPGEFL1 ANKRD33B CA12 ARID5A AMZ1 PROCA1 HPDL STX1B CD22 WDR35 HSPB8 NPY1R TMEM156 RTN4RL1 ARTN MYB C10orf2 CYP1B1-AS1 PPM1K C5AR1 PRSS23 SLC22A5 RBM24 HEY1 RAB3A XBP1 PDZK1 ARHGEF37 NCOR2 C1orf226,NOS1AP SGK223 ADARB2 PLD6 HS3ST3B1 TGFB2 TBKBP1 SGK1 RASGRP1 EPHA4

POLR1B MPPED2 GPR68 SALL4 ONECUT2 RNF223 KCNF1 DIO2 ADAMTSL5 PKIB GPR132 BMF C1QTNF6 SEMA6A NR5A2 FAM46B COL12A1 ZNF488 SNORA71B EPHB3 CAMKK1 PEX11A FMN1 CCNG2 SLC7A5 TSKU PGR EDARADD

MED13L ISM1 C5AR2 SLC9A3R1 STC2 FOXN1 TMED8 NAV2 ALX4 FOXC1 MBOAT1 KIAA0226L AMOTL1 TFF1 KCNK5 TGM2 PABPN1L EGR3 KDM4B AGR3 CXCL12 SLC2A1 CCDC88C C8orf46 NRIP1 SLC2A4 SLITRK4 USP35 SIGLEC15 PEBP4 WIPF1 STC1 TMPRSS3 TFAP4 SMOX GREB1 ANKRD1 KCNQ4 FMN1 OTUB2 TIFA TMPRSS3 CARD10,MFNG FAM86B3P LOC100507584 TRIM31 SNORA75 ARHGEF26 HEG1 RPS17L PPARGC1B C8orf44,C8orf44-SGK3,SGK3 RERG JAK2 SNORD15A


Supplementary Table 2: GSEA analysis of genes down-regulated upon ZNF143 depletion under E2 stimulation

Gene Set Name P-Val

Early Oestrogen Response 7.21e-15

Hypoxia 7.21e-15

Late Oestrogen Response 3.92e-9

MTORC1 signalling 3.92e-9

Fatty Acid Metabolism 9.99e-9


Supplementary Table 3: Genetic alterations tend towards mutual exclusivity (TCGA 2012)

Factor 1 Factor 2 P-value Log Odds Ratio Association

CTCF RAD21 0.243 -1.071 Tendency towards mutual exclusivity

RAD21 STAG2 0.394 <-3 Tendency towards mutual exclusivity

RAD21 SMC1A 0.538 <-3 Tendency towards mutual exclusivity

ZNF143 RAD21 0.628 <-3 Tendency towards mutual exclusivity

CTCF STAG2 0.795 <-3 Tendency towards mutual exclusivity

CTCF SMC1A 0.858 <-3 Tendency towards mutual exclusivity

ZNF143 CTCF 0.892 <-3 Tendency towards mutual exclusivity

STAG1 STAG2 0.904 <-3 Tendency towards mutual exclusivity

CTCF SMC3 0.927 <-3 Tendency towards mutual exclusivity

STAG1 SMC1A 0.935 <-3 Tendency towards mutual exclusivity

ZNF143 STAG1 0.951 <-3 Tendency towards mutual exclusivity

STAG2 SMC1A 0.951 <-3 Tendency towards mutual exclusivity

ZNF143 STAG2 0.963 <-3 Tendency towards mutual exclusivity

STAG1 SMC3 0.967 <-3 Tendency towards mutual exclusivity

ZNF143 SMC1A 0.975 <-3 Tendency towards mutual exclusivity

STAG2 SMC3 0.975 <-3 Tendency towards mutual exclusivity

SMC1A SMC3 0.983 <-3 Tendency towards mutual exclusivity

ZNF143 SMC3 0.988 <-3 Tendency towards mutual exclusivity


Supplementary Table 4: Genetic alterations tend towards mutual exclusivity (TCGA Provisional Data Set)

Factor 1 Factor 2 P-value Log Odds Ratio Association

ZNF143 RAD21 0.222 -1.131 Tendency towards mutual exclusivity

CTCF RAD21 0.443 -0.210 Tendency towards mutual exclusivity

RAD21 SMC3 0.545 -0.248 Tendency towards mutual exclusivity

CTCF SMC3 0.691 <-3 Tendency towards mutual exclusivity

STAG1 SMC3 0.737 <-3 Tendency towards mutual exclusivity

ZNF143 STAG2 0.781 <-3 Tendency towards mutual exclusivity


Supplementary Table 5: Fold Change mRNA Expression upon Genetic Alteration of Chromatin Looping Factors (TCGA Provisional Data Set)

Factor  Homozygous Deletion  Heterozygous Loss  Gain  Amplification

ZNF143  FC = 0.69  FC = 0.91  FC = 1.2  FC = 1.2
        P-val = 0.002  P-val < 0.0001  P-val < 0.0001  P-val = NS

CTCF  FC = 0.69  FC = 0.91  FC = 1.2  FC = 1.2
      P-val = 0.004  P-val < 0.0001  P-val = 0.0004  P-val = 0.0038

Rad21  FC = 0.73  FC = 1.6  FC = 2.7  NA
       P-val = 0.002  P-val < 0.0001  P-val < 0.0001

STAG1  FC = 0.82  FC = 1.1  FC = 1.4  NA
       P-val = 0.0001  P-val = 0.0004  P-val = NS

STAG2  FC = 1.3  FC = 1.0  FC = 1.0  FC = 1.4
       P-val = NS  P-val = NS  P-val = NS  P-val = 0.02

SMC1A  FC = 0.62  FC = 1.1  FC = 1.2  FC = 2.4
       P-val = 0.01  P-val = 0.015  P-val < 0.0001  P-val < 0.0001

SMC3  FC = 0.44  FC = 0.8  FC = 1.2  FC = 0.7
      P-val < 0.0001  P-val < 0.0001  P-val < 0.0001  P-val = NA


Supplementary Table 6: Genetic alterations tend towards mutual exclusivity in lung squamous cell carcinoma (TCGA, Provisional)

Factor 1 Factor 2 P-value Log Odds Ratio Association

SMC3 STAG1 0.268 <-3 Tendency towards mutual exclusivity

STAG1 STAG2 0.325 <-3 Tendency towards mutual exclusivity

CTCF STAG1 0.691 <-3 Tendency towards mutual exclusivity

RAD21 SMC1A 0.691 <-3 Tendency towards mutual exclusivity

SMC1A STAG2 0.729 <-3 Tendency towards mutual exclusivity

RAD21 SMC3 0.751 <-3 Tendency towards mutual exclusivity

ZNF143 RAD21 0.783 <-3 Tendency towards mutual exclusivity

RAD21 STAG2 0.783 <-3 Tendency towards mutual exclusivity

SMC3 STAG2 0.783 <-3 Tendency towards mutual exclusivity

ZNF143 STAG2 0.812 <-3 Tendency towards mutual exclusivity

CTCF RAD21 0.923 <-3 Tendency towards mutual exclusivity

ZNF143 CTCF 0.934 <-3 Tendency towards mutual exclusivity

CTCF STAG2 0.934 <-3 Tendency towards mutual exclusivity

ZNF143 STAG1 0.733 -0.014 Tendency towards mutual exclusivity

RAD21 STAG1 0.664 -0.203 Tendency towards mutual exclusivity


Supplementary Table 7: Genetic alterations tend towards mutual exclusivity in bladder urothelial carcinoma (TCGA, 2014)

Factor 1 Factor 2 P-value Log Odds Ratio Association

RAD21 STAG2 0.297 -0.998 Tendency towards mutual exclusivity

CTCF RAD21 0.460 <-3 Tendency towards mutual exclusivity

CTCF STAG2 0.504 <-3 Tendency towards mutual exclusivity

SMC1A STAG2 0.580 <-3 Tendency towards mutual exclusivity

STAG1 STAG2 0.665 <-3 Tendency towards mutual exclusivity

RAD21 SMC3 0.736 <-3 Tendency towards mutual exclusivity

SMC3 STAG2 0.763 <-3 Tendency towards mutual exclusivity

ZNF143 RAD21 0.858 <-3 Tendency towards mutual exclusivity

ZNF143 STAG2 0.874 <-3 Tendency towards mutual exclusivity

CTCF STAG1 0.886 <-3 Tendency towards mutual exclusivity

SMC1A STAG1 0.908 <-3 Tendency towards mutual exclusivity

CTCF SMC3 0.923 <-3 Tendency towards mutual exclusivity

SMC1A SMC3 0.938 <-3 Tendency towards mutual exclusivity

SMC3 STAG1 0.953 <-3 Tendency towards mutual exclusivity

ZNF143 CTCF 0.961 <-3 Tendency towards mutual exclusivity

ZNF143 SMC1A 0.969 <-3 Tendency towards mutual exclusivity

ZNF143 STAG1 0.976 <-3 Tendency towards mutual exclusivity

ZNF143 SMC3 0.984 <-3 Tendency towards mutual exclusivity


Supplementary Table 8: Genetic alterations tend towards mutual exclusivity in liver hepatocellular carcinoma (TCGA Provisional Data Set)

Factor 1 Factor 2 P-value Log Odds Ratio Association

RAD21 STAG1 0.319 <-3 Tendency towards mutual exclusivity

RAD21 SMC3 0.506 <-3 Tendency towards mutual exclusivity

CTCF RAD21 0.569 -0.432 Tendency towards mutual exclusivity

CTCF SMC1A 0.830 <-3 Tendency towards mutual exclusivity

CTCF STAG1 0.830 <-3 Tendency towards mutual exclusivity

SMC1A STAG1 0.876 <-3 Tendency towards mutual exclusivity

ZNF143 CTCF 0.895 <-3 Tendency towards mutual exclusivity

CTCF SMC3 0.895 <-3 Tendency towards mutual exclusivity

ZNF143 STAG1 0.924 <-3 Tendency towards mutual exclusivity

SMC1A SMC3 0.924 <-3 Tendency towards mutual exclusivity

SMC3 STAG1 0.924 <-3 Tendency towards mutual exclusivity

CTCF STAG2 0.929 <-3 Tendency towards mutual exclusivity

SMC1A STAG2 0.949 <-3 Tendency towards mutual exclusivity

STAG1 STAG2 0.949 <-3 Tendency towards mutual exclusivity

ZNF143 SMC3 0.954 <-3 Tendency towards mutual exclusivity

ZNF143 STAG2 0.969 <-3 Tendency towards mutual exclusivity

SMC3 STAG2 0.969 <-3 Tendency towards mutual exclusivity


Supplementary Table 9: Primers used in this study

Gene Forward Primer Reverse Primer

Actin 5’ GGACTTCGAGCAAGAGATGG 3’ 5’ AGCACTGTCTTGGCGTACAG 3’

ZNF143 5’ CGCAGTCTGACACCATCTTG 3’ 5’ CCAATCATTCCAGTACCTGCT 3’

RARA 5’ AGGCTACCACTATGGGGTCA 3’ 5’ CGGGTCACCTTGTTGATGAT 3’

XBP1 5’ CTGAGCCCCGAGGAGAAG 3’ 5’ TGTTCCAGCTCACTCATTCG 3’

CA12 5’ GTGGTGTCCATTTGGCTTTT 3’ 5’ CAAGGTCCTTCCTGGATGTG 3’

ERBB3 5’ TCCTTCCTGCAGTGGATTCG 3’ 5’ CATCTCGGTCCCTCACGATG 3’

CCND1? 5’ TTGTTCAAGCAGCGAGTCCC 3’ 5’ CTGTTCCTCGCAGACCGAAG 3’


5.2 Supplementary Figures

Supplementary Figure 1

[Supplementary Figure 1 panels: ZNF143 Binding (Vehicle), R = 0.884; ZNF143 Binding (E2), R = 0.928; each panel plots Replicate 1 against Replicate 2.]

Supplementary Figure 1: ZNF143 ChIP-sequencing replicates show a high degree of reproducibility. Scatterplots depicting read count for ZNF143 binding in MCF-7 breast cancer cells treated with vehicle or E2 (10µM) for 45 minutes in two separate experiments. Plot was generated with SeqMonk from ZNF143 ChIP-sequencing data. R, Pearson’s correlations, were calculated from all data points.
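For reference, the Pearson correlation reported for the replicate comparison can be computed from paired read counts as sketched below. This is a generic illustration and not the SeqMonk implementation; the function name and the toy read counts are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical read counts per region in two replicates (illustration only).
replicate_1 = [12, 40, 7, 95, 33]
replicate_2 = [15, 38, 9, 90, 30]
r = pearson_r(replicate_1, replicate_2)
print(f"R = {r:.3f}")
```

Values of R close to 1 indicate that regions with high read counts in one replicate also have high read counts in the other.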


Supplementary Figure 2


Supplementary Figure 3


Supplementary Figure 4

[Supplementary Figure 4 panels: expression (FPKM) of ZNF143, ESR1 and FOXA1 in siScr and siZNF143 cells under vehicle and E2 conditions, annotated with fold changes (FC) and significance.]

Supplementary Figure 4: Depletion of ZNF143 mRNA does not affect ESR1 or FOXA1 expression under vehicle or E2-stimulated conditions. Gene expression changes in cells transfected with siZNF143 or scrambled siRNA under vehicle and E2-stimulated conditions (RPKM: reads per kilobase per million). (*p<0.05; ** p<0.01; *** p<0.001; NS = Not Significant)


Supplementary Figure 5

[Kaplan-Meier plots: panels ERα Positive, Luminal A and Luminal B; RAD21 expression low vs high; x-axis: Time (years); y-axis: Probability. Reported statistics: HR = 1.4 (1.1 − 1.6), logrank P = 0.0013; HR = 1.23 (0.91 − 1.67), logrank P = 0.18; HR = 1.3 (1.0 − 1.7), logrank P = 0.027.]

Supplementary Figure 5: Rad21 expression is relevant to clinical outcome of luminal breast cancer patients

Kaplan-Meier plots derived from the METABRIC data set (Curtis et al. 2012) evaluating overall survival in ESR1-positive breast cancer patients (n=1486), Luminal A patients (n=825) and Luminal B patients (n=668), stratified by RAD21 expression. Data were obtained from the Kaplan-Meier plotter breast cancer survival analysis database (Györffy 2010). Hazard ratios (HR) and logrank P-values are displayed.


Supplementary Figure 6

[Bar chart: Alteration Frequency (%) for each cancer dataset (TCGA and other studies).]

Supplementary Figure 6: Chromatin Looping Factors are Genetically Altered in Many Cancers. Overall frequency of genetic alterations in the chromatin looping factors (ZNF143, CTCF and the cohesin subunits) in datasets available through the cBioPortal database (http://www.cbioportal.org).


5 References

Bailey SD, Zhang X, Desai K, Aid M, Corradin O, Cowper-Sal Lari R, Akhtar-Zaidi B, Scacheri PC, Haibe-Kains B, Lupien M. 2015. ZNF143 provides sequence specificity to secure chromatin interactions at gene promoters. Nat Commun 2: 6186. Doi: 10.1038/ncomms7186.

Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K. 2007. High-resolution profiling of histone methylations in the human genome. Cell 129(4): 823-837.

Barutcu AR, Lajoie BR, McCord RP, Tye CE, Hong D, Messier TL, Browne G, van Wijnen AJ, Lian JB, Stein JL. 2015. Chromatin interaction analysis reveals changes in small chromosome and telomere clustering between epithelial and breast cancer cells. Genome Biol 16: 214.

Bickel PJ, Boley N, Brown JB, Huang H, Zhang NR. 2010. Subsampling methods for genomic inference. Annals of Applied Statistics 4(4): 1660-1697.

Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE. 2007. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447(7146): 799-816.

Beroukhim R, Getz G, Nghiemphu L, Barretina J, Hsueh T, Linhart D, Vivanco I, Lee JC, Huang JH, Alexander S. 2007. Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma. PNAS 104(50): 20007-20012.

Burcin M, Arnold R, Lutz M, Kaier B, Runge D, Lottspeich F, Filippova GN, Lobanenkov VV, Renkawitz R. 1997. Negative protein 1, which is required for function of the chicken lysozyme gene silencer in conjunction with hormone receptors, is identical to the multivalent zinc finger repressor CTCF. Mol Cell Biol 17: 1281-1288.

Carretero M, Ruiz-Torres M, Rodriguez-Corsino M, Barthelemy I, Losada A. 2013. Pds5B is required for cohesion establishment and Aurora B accumulation at centromeres. EMBO J 32: 2938-2949.

Carroll JS, Meyer CA, Song J, Li W, Geistlinger TR, Eeckhoute J, Brodsky AS, Keeton EK, Fertuck KC, Hall GF et al. 2006. Genome-wide analysis of estrogen receptor binding sites. Nature Genet 38(11): 1289-1297.

Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, Jacobsen A, Byrne CJ, Heuer ML, Larsson E. 2012. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov 2(5): 401-404.

Chen X, Xu H, Yuan P, Fang F, Huss M, Vega VB, Wong E, Orlov YL, Zhang W, Jiang J et al. 2008. Integration of External Signaling Pathways with the Core Transcriptional Network in Embryonic Stem Cells. Cell 133(6): 1106-1117.


Cubeñas-Potts C, Corces VG. 2015. Architectural proteins, transcription, and the three-dimensional organization of the genome. FEBS Lett. Epub ahead of print.

Cuddapah S, Jothi R, Schones DE, Roh TY, Cui K, Zhao K. 2009. Global analysis of the insulator binding protein CTCF in chromatin barrier regions reveals demarcation of active and repressive domains. Genome Res 19: 24-32.

Curtis C, Shah SP, Chin SF, Turashvili G, Rueda OM, Dunning MJ, Speed D, Lynch AG, Samarajiwa S, Yuan Y et al. 2012. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486: 346-352.

DeMare LE, Leng J, Cotney J, Reilly SK, Yin J, Sarro R, Noonan JP. 2013. The genomic landscape of cohesin-associated chromatin interactions. Genome Res 23(8): 1224-1234.

Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu JS, Ren B. 2012. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485(7398): 376-380.

Dowen JM, Fan ZP, Hnisz D, Ren G, Abraham BJ, Zhang LN, Weintraub AS, Schuijers J, Lee TI, Zhao K et al. 2014. Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes. Cell 159(2): 374-387.

Eirew P, Steif A, Khattra J, Ha G, Yap D, Farahani H, Gelmon K, Chia S, Mar C, Wan A. 2015. Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution. Nature 518(7539): 422-426.

Engel N, West AG, Felsenfeld G, Bartolomei MS. 2004. Antagonism between DNA hypermethylation and enhancer-blocking activity at the H19 DMD is uncovered by CpG mutations. Nature Genet 36: 883-888.

Ernst J, Kellis M. 2012. ChromHMM: automating chromatin-state discovery and characterization. Nat Methods 9: 215-216.

Faresse NJ, Canella D, Praz V, Michaud J, Romanascano D, Hernandez N. 2012. Genomic Study of RNA Polymerase II and III SNAPc-Bound Promoters Reveals a Gene Transcribed by Both Enzymes and a Broad Use of Common Activators. PLoS Genetics. DOI: 10.1371/journal.pgen.1003028.

Fraser J, Rousseau M, Shenker S, Ferraiuolo MA, Hayashizaki Y, Blanchette M, Dostie J. 2009. Chromatin conformation signatures of cellular differentiation. Genome Biol 10(4): R37. Doi: 10.1186/gb-2009-10-4-r37.

Fedoriw AM, Stein P, Svoboda P, Schultz RM, Bartolomei MS. 2004. Transgenic RNAi reveals essential function for CTCF in H19 gene imprinting. Science 303: 238-240.

Filippova GN, Fagerlie S, Klenova EM, Myers C, Dehner Y, Goodwin G, Neiman PE, Collins SJ, Lobanenkov VV. 1996. An exceptionally conserved transcriptional repressor, CTCF, employs different combinations of zinc fingers to bind diverged promoter sequences of avian and mammalian c-myc oncogenes. Mol Cell Biol 16: 2802-2813.


Fullwood MJ, Liu MH, Pan YF, Liu J, Xu H, Mohamed YB, Orlov YL, Velkov S, Ho A, Mei PH et al. 2009. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature 462(4729): 58-64.

Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, Sun Y, Jacobsen A, Sinha R, Larsson E. 2013. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal 6(269): pl1. Doi: 10.1126/scisignal.2004088.

Guo Y, Xu Q, Canzio D, Shou J, Li J, Gorkin DU, Jung I, Wu H, Zhai Y, Tang Y et al. 2015. CRISPR Inversion of CTCF Sites Alters Genome Topology and Enhancer/Promoter Function. Cell 162(4): 900-910.

Gurudatta BV, Yang J, Van Bortle K, Donlin-Asp PG, Corces VG. 2013. Dynamic changes in the genomic localization of DNA replication-related element binding factor during the cell cycle. Cell Cycle 12: 1605-1615.

Györffy B, Lanczky A, Eklund AC, Denkert C, Budczies J, Li Q, Szallasi Z. 2010. An online survival analysis tool to rapidly assess the effect of 22,277 genes on breast cancer prognosis using microarray data of 1,809 patients. Breast Cancer Res Treat 123(3): 725-731.

Hadjur S, Williams LM, Ryan NK, Cobb BS, Sexton T, Fraser P, Fisher AG, Merkenschlager M. 2009. Cohesins form chromosomal cis-interactions at the developmentally regulated IFNG locus. Nature 460(7253): 410-413.

Hah N, Murakami S, Nagari A, Danko CG, Kraus WL. 2013. Enhancer transcripts mark active estrogen receptor binding sites. Genome Res 23(8): 1210-1223.

Halbig KM, Levken AC, Kunkel GR. 2012. The transcriptional activator ZNF143 is essential for normal development in zebrafish. BMC Mol Biol 13(3). DOI: 10.1186/1471-2199-13-3.

He HH, Meyer CA, Chen MW, Jordan VC, Brown M, Liu XS. 2012. Differential DNase I hypersensitivity reveals factor-dependent chromatin dynamics. Genome Res 22(6): 1015-1025.

Heidari N, Phanstiel DH, He C, Grubert F, Jahanbani F, Kasowski M, Zhang MQ, Snyder M. 2014. Genome-wide map of regulatory interactions in the human genome. Genome Res 24: 1905-1917.

Hirayama T, Tarusawa E, Yoshimura Y, Galjart N, Yagi T. 2012. CTCF is required for neural development and stochastic expression of clustered Pcdh genes in neurons. Cell Rep 2(2): 345-357.

Hou C, Dale R, Dean A. 2010. Cell type specificity of chromatin organization mediated by CTCF and cohesin. PNAS 107(8): 3651-3656.

Hou C, Li L, Qin ZS, Corces VG. 2012. Gene density, transcription, and insulators contribute to the partitioning of the Drosophila genome into physical domains. Mol Cell 48(3): 471-484.

Ji X, Dadon DB, Powell BE, Fan ZP, Borges-Rivera D, Shachar S, Weintraub AS, Hnisz D, Pegoraro G, Lee TI et al. 2015. 3D Chromosome Regulatory Landscape of Human Pluripotent Cells. Cell Stem Cell. DOI: 10.1016/j.stem.2015.11.007.


Jin F, Li Y, Dixon JR, Selvaraj S, Ye Z, Lee AY, Yen CA, Schmitt D, Espinoza CA, Ren B. 2013. A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature 503: 290-294.

Joseph R, Orlov YL, Huss M, Sun W, Kong SL, Ukil L, Pan YF, Li G, Lim M, Thomsen JS. 2010. Integrative model of genomic factors for determining binding site selection by estrogen receptor-α. Mol Syst Biol 6(456). Doi: 10.1038/msb.2010.109.

Kagey MH, Newman JJ, Bilodeau S, Zhan Y, Orlando DA, van Berkum NL, Ebmeier CC, Goossens J, Rahl PB, Levine SS. 2010. Mediator and cohesin connect gene expression and chromatin architecture. Nature 467: 430-435.

Katainen R, Dave K, Pitkänen E, Palin K, Kivioja T, Välimäki N, Gylfe AE, Ristolainen H, Hänninen UA, Cajuso T et al. 2015. CTCF/cohesin-binding sites are frequently mutated in cancer. Nature Genet 47: 818-821.

Kellner WA, Van Bortle K, Li L, Ramos E, Takenaka N, Corces VG. 2013. Distinct isoforms of the Drosophila Brd4 homologue are present at enhancers, promoters and insulator sites. Nucleic Acids Res 41: 9274-9283.

Kim TH, Abdullaev ZK, Smith AD, Ching KA, Loukinov DI, Green RD, Zhang MQ, Lobanenkov VV, Ren B. 2007. Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell 128(6): 1231-1245.

Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. 2013. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14: R36. Doi: 10.1186/gb-2013-14-4-r36.

Klenova EM, Nicolas RH, Paterson HF, Carne AF, Heath CM, Goodwin GH, Neiman PE, Lobanenkov VV. 1993. CTCF, a conserved nuclear factor required for optimal transcriptional activity of the chicken c-myc gene, is an 11-Zn-finger protein differentially expressed in multiple forms. Mol Cell Biol 13: 7612-7624.

Kolch W, Halasz M, Granovskaya M, Kholodenko BN. 2015. The dynamic control of signal transduction networks in cancer cells. Nat Rev Cancer 15: 515-527.

Kron KJ, Bailey SD, Lupien M. 2014. Enhancer alterations in cancer: a source for a cell identity crisis. Genome Med 6(9): 77-89.

Lawrence MS, Stojanov P, Mermel CH, Robinson JT, Garraway LA, Golub TR, Meyerson M, Gabriel SB, Lander ES, Getz G. 2014. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505: 495-501.

Leiserson MDM, Vandin F, Wu HT, Dobson JR, Eldridge JV, Thoman JT, Papoutsaki A, Kim Y, Niu B, McLellan M et al. 2015. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nature Genet 47: 106-114.


Li G, Ruan X, Auerbach RK, Sandhu KS, Zheng M, Wang P, Poh HM, Goh Y, Lim J, Zhang J et al. 2012. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 148(1-2): 84-89.

Li W, Notani D, Ma Q, Tanasa B, Nunez E, Chen AY, Merkurjev D, Zhang J, Ohgi K, Song X et al. 2013. Functional roles of enhancer RNAs for oestrogen-dependent transcriptional activation. Nature 498: 516-523.

Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO. 2009. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326(5950): 289-293.

Lin CY, Vega VB, Thomsen JS, Zhang T, Kong SL, Xie M, Chiu KP, Lipovich L, Barnett DH, Stossi F et al. 2007. Whole-genome cartography of estrogen receptor α binding sites. PLoS Genet 3(6): e87. Doi: 10.1371/journal.pgen.0030087.

Liu MH and Cheung E. 2014. Estrogen receptor-mediated long-range chromatin interactions and transcription in breast cancer. Mol Cell Endocrinol 382(1): 624-632.

Lupien M, Eeckhoute J, Meyer CA, Wang Q, Zhang Y, Li W, Carroll JS, Liu XS, Brown M. 2008. FoxA1 translates epigenetic signatures into enhancer-driven lineage-specific transcription. Cell 132(6): 958-970.

Losada A. 2014. Cohesin in cancer: chromosome segregation and beyond. Nat Rev Cancer 14: 389-393.

Lupien M, Meyer CA, Bailey ST, Eeckhoute J, Cook J, Westerling T, Zhang X, Carroll JS, Rhodes DR, Liu XS. 2010. Growth factor stimulation induces a distinct ERα cistrome underlying breast cancer endocrine resistance. Genes Dev 24: 2219-2227.

Lupianez DG, Kraft K, Heinrich V, Krawitz P, Brancati F, Klopocki E, Horn D, Kayserili H, Opitz JM, Laxova R et al. 2015. Disruptions of topological domains cause pathogenic rewiring of gene-enhancer interactions. Cell 161(5): 1012-1025.

Magnani L, Ballantyne EB, Zhang X, Lupien M. 2011. PBX1 genomic pioneer function drives ERα signaling underlying progression in breast cancer. PLoS Genet 7(11): e1002368. Doi: 10.1371/journal.pgen.1002368.

Magnani L, Stoeck A, Zhang X, Lánczky A, Mirabella AC, Wang TL, Gyorffy B, Lupien M. 2013. Genome-wide reprogramming of the chromatin landscape underlies endocrine therapy resistance in breast cancer. Proc Natl Acad Sci USA 110(16): E1490-9.

Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstråle M, Laurila E et al. 2003. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genet 34: 267-273.


Mourad R, Hsu PY, Liran J, Shen C, Koneru P, Lin H, Liu Y, Nephew K, Huang TH, Li L. 2014. Estrogen Induces Global Reorganization of Chromatin Structure in Human Breast Cancer Cells. PLOSone 10(3): e0118237. Doi:10.1371/journal.pone.0118237.

Mermel CH, Schumacher SE, Hill B, Meyerson ML, Beroukhim R, Getz G. 2011. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol 12: R41. Doi: 10.1186/gb-2011-12-4-r41.

Moshkovich N, Nisha P, Boyle PJ, Thompson BA, Dale RK, Lei EP. 2011. RNAi-independent role for Argonaute2 in CTCF/CP190 chromatin insulator function. Genes Dev 25: 1686-1701.

Myslinski E, Krol A, Carbon P. 1998. ZNF76 and ZNF143 are two human homologs of the transcriptional activator Staf. J Biol Chem 273(34): 21998-2006.

Myslinski E, Gérard MA, Krol A, Carbon P. 2006. A genome scale location analysis of human Staf/ZNF143-binding sites suggests a widespread role for human Staf/ZNF143 in mammalian promoters. J Biol Chem 281(52): 39953-39962.

Nora EP, Lajoie BR, Schulz EG, Giorgetti L, Okamoto I, Servant N, Piolot T, van Berkum NL, Meisig J, Sedat J et al. 2012. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485(7398): 381-385.

Pan YF, Senali Abayratna Wansa KD, Liu MH, Zhao B, Hong SZ, Tan PY, Lim KS, Bourque G, Liu ET, Cheung E. 2008. Regulation of Estrogen Receptor-mediated Long Range Transcription via Evolutionarily Conserved Distal Response Elements. J Biol Chem 283: 32977-32988.

Parelho V, Hadjur S, Spivakov M, Leleu M, Sauer S, Gregson HC, Jarmuz A, Canzonetta C, Webster Z, Nesterova T. 2008. Cohesins Functionally Associate with CTCF on Mammalian Chromosome Arms. Cell 132: 422-433.

Parker JS, Mullins M, Cheang MCU, Leung S, Voduc D, Vickery T, Davies S, Fauron C, He X, Hu Z et al. 2009. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 27: 1160-1167.

Phillips-Cremins JE, Sauria ME, Sanyal A, Gerasimova TI, Lajoie BR, Bell JS, Ong CT, Hookway TA, Guo C, Sun Y et al. 2013. Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell 153(6): 1281-1295.

Plank JL & Dean A. 2014. Enhancer Function: Mechanistic and Genome-Wide Insights come Together. Molecular Cell 55(1): 5-14.

Quinlan AR and Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6): 841-842.

Rao SSP, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, Sanborn AL, Machol I, Omer AD, Lander ES et al. 2014. A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping. Cell 159(7): 1665-1680.


Remeseiro S, Cuadrado A, Gomez-Lopez G, Pisano DG, Losada A. 2012. A unique role of cohesin-SA1 in gene regulation and development. EMBO J 31(9): 2090-2102.

Renoir JM, Marsaud V, Lazennac G. 2013. Estrogen receptor signaling as a target for novel breast cancer therapeutics. Biochem Pharmacol 85(4): 449-465.

Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L. 2011. Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biol 12(3): R22. Doi: 10.1186/gb-2011-12-3-r22.

Ross-Innes CS, Brown GD, Carroll JS. 2011. A co-ordinated interaction between CTCF and ER in breast cancer cells. BMC Genomics 12: 593-603.

Rubio ED, Reiss DJ, Welcsh PL, Disteche CM, Filippova GN, Baliga NS, Aebersold R, Ranish JA, Krumm A. 2008. CTCF physically links cohesin to chromatin. PNAS 105(24): 8309-8314.

Sanyal A, Lajoie BR, Jain G, Dekker J. 2012. The long-range interaction landscape of gene promoters. Nature 489(7414): 109-113.

Schmidt D, Schwalie PC, Ross-Innes CS, Hurtado A, Brown GD, Carroll JS, Flicek P, Odom DT. 2010. A CTCF-independent role for cohesin in tissue-specific transcription. Genome Res 20(5): 578-588.

Schmidt D, Schwalie PC, Wilson MD, Ballester B, Gonçalves A, Kutter C, Brown GD, Marshall A, Flicek P, Odom DT. 2012. Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages. Cell 148(1-2): 335-348.

Shen Y, Yue F, McCleary DF, Ye Z, Edsall L, Kuan S, Wagner U, Dixon J, Lee L, Lobanenkov VV et al. 2012. A map of the cis-regulatory sequences in the mouse genome. Nature 488: 116-120.

Sofueva S, Yaffe E, Chan WC, Georgopoulou D, Vietri Rudan M, Mira-Bontenbal H, Pollard SM, Schroth GP, Tanay A, Hadjur S. 2013. Cohesin-mediated interactions organize chromosomal domain architecture. EMBO J 32(24): 3119-3129.

Splinter E, Heath H, Kooren J, Palstra RJ, Klous P, Grosveld F, Galjart N, de Laat W. 2006. CTCF mediates long-range chromatin looping and local histone modification in the beta-globin locus. Genes Dev 20(17): 2349-2354.

Song L, Zhang Z, Grasfeder LL, Boyle AP, Giresi PG, Lee BK, Sheffield NC, Graf S, Huss M, Keefe D et al. 2011. Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity. Genome Res 21(10): 1757-1767.

Spitz F, Furlong EE. 2012. Transcription factors: from enhancer binding to transcriptional control. Nature Rev Genet 13: 613-626.

Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES. 2005. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS 102(43): 15545-15550.


The Cancer Genome Atlas Research Network. 2012. Comprehensive molecular portraits of human breast tumors. Nature 490(7418): 61-70.

The Cancer Genome Atlas Research Network. 2014. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature 507: 315-322.

Tyson JJ, Baumann WT, Chen C, Verdugo A, Tavassoly I, Wang Y, Weiner LM, Clarke R. 2011. Dynamic modeling of oestrogen signalling and cell fate in breast cancer cells. Nat Rev Cancer 11: 523-532.

Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. 2010. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28(5): 511-515.

Trapnell C, Hendrickson DG, Sauvageau M, Goff L, Rinn JL, Pachter L. 2013. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol 31(1): 46-53.

Uhlmann F. 2004. The mechanism of sister chromatid cohesion. Exp Cell Res 296(1): 80-85.

Van Bortle K, Nichols MH, Li L, Ong CT, Takenaka N, Qin ZS, Corces VG. 2014. Insulator function and topological domain border strength scale with architectural protein occupancy. Genome Biol 15: R82.

Vogelmann J, Le Gall A, Dejardin S, Allemand F, Gamot A, Labesse G, Cuvier O, Negre N, Cohen-Gonsaud M, Margeat E et al. 2014. Chromatin insulator factors involved in long-range DNA interactions and their role in the folding of the Drosophila genome. PLoS Genet 10: e1004544.

Welboren WJ, van Driel MA, Janssen-Megens EM, van Heeringen SJ, Sweep FCGJ, Span PN, Stunnenberg HG. 2009. ChIP-Seq of ERα and RNA polymerase II defines genes differentially responding to ligands. EMBO J 28(10): 1418-1428.

Wendt KS, Yoshida K, Itoh T, Bando M, Koch B, Schirghuber E, Tsutsumi S, Nagae G, Ishihara K, Mishiro T. 2008. Cohesin mediates transcriptional insulation by CCCTC-binding factor. Nature 451(7180): 796-801.

Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W et al. 2008. Model-based analysis of ChIP-Seq (MACS). Genome Biol 9(9): R137. Doi: 10.1186/gb-2008-9-9-r137.

Zhang Y, Liang J, Li Y, Xuan C, Wang F, Wang D, Shi L, Zhang D, Shang Y. 2010. CCCTC-binding factor acts upstream of FOXA1 and demarcates the genomic response to estrogen. J Biol Chem 285(37): 28604-28613.

Zuin J, Dixon JR, van der Reijden MI, Ye Z, Kolovos P, Brouwer RW, van de Corput MP, van de Werken HJ, Knoch TA, van IJcken WF et al. 2014. Cohesin and CTCF differentially affect chromatin architecture and gene expression in human cells. Proc Natl Acad Sci USA 111(3): 996-1001.
