University of Connecticut OpenCommons@UConn

Doctoral Dissertations University of Connecticut Graduate School

1-8-2019 Characterization of Interaction in Mammalian Cells Jufen Zhu University of Connecticut - Storrs, [email protected]

Follow this and additional works at: https://opencommons.uconn.edu/dissertations

Recommended Citation Zhu, Jufen, "Characterization of Chromatin Interaction in Mammalian Cells" (2019). Doctoral Dissertations. 2053. https://opencommons.uconn.edu/dissertations/2053 Characterization of Chromatin Interaction in Mammalian Cells

Jacqueline Jufen Zhu, Ph.D.

University of Connecticut, 2019

Abstract

Higher-order chromatin organization in cell nucleus is mysterious. Microscopy and high- throughput sequencing technologies have been applied to reveal the connections between genome structure and transcription, which is the main topic of my thesis. Higher-order chromatin organization differs across cell types and species, giving an insight into disease genesis and tumorigenesis. Despite an enormous progress has been made recently in epigenomic study, lots of things remain unknown.

My dissertation reviews the current understanding of epigenomics, characterizes chromatin interaction in mammalian cells using different technologies, investigates chromatin reorganization in breast cancer cells, analyzes integrated genomic and epigenomic data in breast cancer metastasis and discusses results and future directions.

Dissertation Jacqueline Jufen Zhu

Characterization of Chromatin Interaction in Mammalian Cells

Jacqueline Jufen Zhu

B.E., Nanjing Agricultural University, 2010

M.S., Chinese Academy of Sciences, 2013

A Dissertation

Submitted in Partial Fulfillment of the

Requirements for the Degree of

Doctor of Philosophy

at the University of Connecticut

2019

i Dissertation Jacqueline Jufen Zhu

Copyright by

Jacqueline Jufen Zhu

2019

ii Dissertation Jacqueline Jufen Zhu

Approval Page

Doctoral of Philosophy Dissertation

Characterization of Chromatin Interaction in Mammalian Cells

Presented by Jacqueline Jufen Zhu, M.S.

Major Advisor______Yijun Ruan

Associate Advisor______Blanka Rogina

Associate Advisor______James Li

Associate Advisor______Derya Unutmaz

Associate Advisor______Jeff Chuang

University of Connecticut

2019

iii Dissertation Jacqueline Jufen Zhu

Acknowledgements

Time flies. It feels like only days ago that I came here to start my PhD study, though actually it is about time to say goodbye. Lots of memories, good or bad, happy or sad, emerge.

Five years ago, I decided to pursue a PhD degree in the US to continue my exploration in biology, philosophy and the universe. I knew it would be a journey to a whole new world, but I did not know I would have met so many great people and lived an unforgettable life with them around. I am so grateful to be here to have a chance to improve my understanding of human, the world and the universe with the company of everyone I know.

I would like to thank my supervisor Yijun for his help on my projects and life. Without him, I could not have learnt so many cutting-edge technologies in biological research; I could not have collaborated with scientists in different fields outside the institute; I could not have realized career life is this tough.

I would like to thank my collaborators, Kent group and Dariusz group. We had a good time working together, communicating with each other and exchanging ideas.

I would like to thank my committee members, Blanka, James, Derya and Jeff for their cooperation. As the committee chair, Blanka provided sufficient guidance for every step to my graduation. James gave lots of constructive suggestions.

I would like to thank the department of Genetics and Genomic Sciences, Dr. Babara

Kream, the graduate school and University of Connecticut.

I would like to thank my two best girl friends, Xi and Ping. Xi is the first person I met and knew in the US. She is my classmate and roommate. We lived together for years, traveled to

New York City for New Year, to Miami at Christmas, complained about school and life, did shopping at weekend, talked about everything about us and helped each other out. Till now, I

iv Dissertation Jacqueline Jufen Zhu know she is one I can go to whenever and wherever I am. Ping is one of my co-workers in the lab. She is my mentor teaching me experiments in the lab, but more like a sister. She is thoughtful and caring. I am so lucky to have her during my PhD life.

I would like to thank my parents and family. My parents always trust on me, on my choices and decisions. They are always on my side supporting and encouraging me.

I would like to thank my mother-in-law, my sister-in-law and her family.

I would like to thank everyone in the lab. Most of us work happily together. Even though there were conflicts and misunderstandings, but I know this is real life. Especially, I enjoy the companies of Emaly, Zhonghui and Minji.

I would like to thank my friends in the first year of my PhD life. We did crazy things together and had a good time.

I would like to thank so many other friends, Xin, Cathy, who listened to me and helped me relieve the stress.

I would like to thank myself. I made my effort to complete the courses, to do the research and to publish papers. I am grateful I did things that I think are right. I am not regretful at all of everything I have done.

Last but not the least, I would like to thank my best friend, my love, my soulmate and my hubby, Albert. At the first sight, he fell in love with me. At the second sight, I knew he is my destiny. He taught me programming and bioinformatics. He shared his experience of pursuing a

PhD degree with me. He filled his life with dream, my life with love and our life with responsibility. I feel I can feel his dream. I feel I am lucky to have him. I feel I could give my everything to him. I am happy of feeling so.

v Dissertation Jacqueline Jufen Zhu

Table of Contents

Chapter 1: Introduction ...... 1 1.1 Overview ...... 2 1.2 Literature Review ...... 3 The genome structure in three-dimension ...... 3 Methods to study chromatin organization ...... 9 Key regulators of genome structure ...... 12 Chromatin reorganization in diseases ...... 14 New insights into genome structure ...... 16 1.3 Hypothesis and Specific Aims ...... 17 Project 1 Visualization of chromatin loop in human cells ...... 17 Project 2 Identification of metastasis-associated chromatin interaction and gene network ...... 18 Chapter 2: Super-resolution imaging of single chromatin loop in human lymphoblastoid cells ...... 28 2.1 Introduction ...... 29 2.2 Materials and Methods ...... 31 Cell culture ...... 31 Nuclei isolation ...... 31 Coverslip coating ...... 31 DNA FISH ...... 31 iPALM imaging ...... 31 Image pre-processing ...... 32 Spots identification...... 32 Algorithms for connecting identified spots ...... 33 Spline Interpolation ...... 34 2.3 Results ...... 35 Chromatin landscape of target region and visualization by DNA FISH ...... 35 Fine structures of target chromatin revealed by 3D super-resolution iPALM imaging ...... 36 Reconstruction of chromatin targets from iPALM images ...... 39 Correlation of physical coordinates with chromatin contact mapping data ...... 44 2.4 Discussion ...... 52 Chapter 3: Investigation of enhancer regulation on metastasis by chromatin interaction in mouse cells lines ...... 54 3.1 Introduction ...... 55

vi Dissertation Jacqueline Jufen Zhu

3.2 Materials and Methods ...... 57 Cell Lines...... 57 Benzonase accessible chromatin (BACh) sequencing analysis ...... 57 Sequencing and data analysis of Benzonase-treated samples ...... 58 Long Read ChIA-PET and data processing ...... 59 RNA-seq data analysis ...... 59 Cell line specific enhancer identification ...... 59 Fosb knockdown ...... 60 CldnE knockout experiment ...... 60 qRT-PCR ...... 61 Western blot ...... 61 Wound healing and cell migration assay ...... 61 3.3 Results ...... 63 Identification of metastasis-associated and their regulatory elements and factors in 4T1 cell line series ...... 63 Gain or loss of functions of metastasis-associated genes by enhancer regulation ...... 67 Multitasking-enhancers regulate more than one metastasis-associated gene...... 73 3.4 Discussion ...... 79 Chapter 4: Identification of a core metastasis susceptibility network by integrating genetic, epigenetic and chromatin interaction analysis ...... 82 4.1 Introduction ...... 83 4.2 Materials and Methods ...... 85 Computing environment ...... 85 Identification of polymorphic BACh sites...... 85 Effect of Pioglitazone treatment on pulmonary metastases ...... 85 Effect of Dexamethasone treatment on pulmonary metastases ...... 86 4.3 Results ...... 87 Correlation of the number of BACh sites in the 4T1 cell line series with metastatic potential...... 87 Identification of polymorphic BACh sites and associated genes ...... 92 Disease and function analysis of metastasis susceptibility candidates ...... 100 Canonical pathway analysis of metastasis susceptibility candidates ...... 104 Metastasis associated genes predict patient prognosis ...... 106 4.4 Discussion ...... 108 Chapter 5: Conclusions, limitations and future directions...... 111 5.1 Summary of the findings ...... 112

vii Dissertation Jacqueline Jufen Zhu

5.2 Limitation of the studies...... 114 5.3 Future directions ...... 116

viii Dissertation Jacqueline Jufen Zhu

Table of Figures

Figure 1.1 The organization of DNA within the chromatin structure ...... 4 Figure 1.2 core particle ...... 4 Figure 1.3 Different Levels of Genome Organization ...... 8 Figure 1.4 Overview of 3C-derived methods ...... 11 Figure 2.1 Experimental design and acquired images ...... 37 Figure 2.2 Landscape of the chromatin loop ...... 38 Figure 2.3 Image characterization and modeling ...... 40 Figure 2.4 iPALM image processing algorithm ...... 41 Figure 2.5 Characterization of volumes and distances between spots for each image ...... 42 Figure 2.6 Comparisons of image models with sequencing data ...... 45 Figure 2.7 Comparisons of image models with sequencing data for Image 1 ...... 46 Figure 2.8 Comparisons of image models with sequencing data for Image 2 ...... 47 Figure 2.9 Comparisons of image models with sequencing data for Image 3 ...... 48 Figure 2.10 Comparisons of image models with sequencing data for Image 4 ...... 49 Figure 2.11 Comparisons of image models with sequencing data for Image 5 ...... 50 Figure 2.12 Comparisons of image models with sequencing data for Image 6 ...... 51 Figure 3.1 Characterization of cell up-regulated genes and associated enhancers in 4T1, 4T07 and 67NR...... 65 Figure 3.2 Binding motif analysis ...... 66 Figure 3.3 Characterization of 4T1-specific enhancers ...... 68 Figure 3.4 Characterization of 67NR-specific enhancers ...... 69 Figure 3.5 Functional validation of CldnE knockout by CRISPR ...... 75 Figure 3.6 Characterization of CldnE knockouts ...... 77 Figure 3.7 Chromatin interaction between MycE and metastasis-associated genes in trans ...... 81 Figure 4.1 Characterization of chromatin accessibility by BAChs ...... 89 Figure 4.2 Evaluation and characterization of BACh sites ...... 91 Figure 4.3 Gene identification by ChIA-PET and polymorphic BACh analysis ...... 96 Figure 4.4 Venn diagrams of BACh sites, RNAPII binding peak and SNPs ...... 97 Figure 4.5 Disease and functional pathway analysis ...... 101 Figure 4.6 Pathway analysis of 338 identified genes ...... 105 Figure 4.7 Distinct chromatin interactions and survival analysis on identified metastasis- associated genes ...... 107

ix Dissertation Jacqueline Jufen Zhu

Table of Tables

Table 2.1 Characterization of images ...... 43 Table 2.2 Linear length of image models...... 43 Table 3.1 Primers for genotyping ...... 60 Table 3.2 Primers for qRT-PCR ...... 62 Table 3.3 Multiple enhancer regulation on metastasis-associated genes in 4T1 cells ...... 69 Table 3.4 Multiple enhancer regulation on metastasis-associated genes in 67NR cells ...... 71 ...... 77 Table 3.5 Multi-tasking enhancers in 4T1 cells ...... 77 Table 3.6 Multi-tasking enhancers in 67NR cells ...... 78 Table 3.7 Co-localization frequency of MycE and metastasis associated genes ...... 81 Table 4.1 Results of Integrated DHS, SNP and ChIA-PET screening strategy ...... 97 Table 4.2 Putative Common Core Metastasis Susceptibility Genes...... 98 Table 4.3 Significantly enriched promoter motifs in core metastasis susceptibility candidate genes ...... 99 ...... 105

x Dissertation Jacqueline Jufen Zhu

Chapter 1: Introduction

1 Dissertation Jacqueline Jufen Zhu

Chapter 1

1.1 Overview

Chromatin interaction is recently studied to reveal the molecular mechanisms of transcription regulation and disease development in mammalian cells by using high-throughput sequencing and microscopic imaging technologies. The investigation in chromatin interaction might provide novel insights into cancer therapeutics and precision medicine.

In this chapter, I will review the current understanding of chromatin interaction and genome organization, the molecular mechanism of transcription regulation by DNA elements in different cell types including normal and cancer cells and the cutting-edge technologies in studying chromatin biology. Then I will outline the hypothesis and specific aims of my thesis projects.

2 Dissertation Jacqueline Jufen Zhu

1.2 Literature Review

Cell nucleus contains the whole genome, stores information and decides cell fate. The mammalian genome comprises of meters of DNA. How the DNA is organized in the tens of microns cell nucleus is not fully known. The chromatin structure is dynamic over time to deliver accurate command for cell function. Therefore, the disorder of chromatin reorganization could lead to gene dysregulation and cellular abnormalities, which is often an implication of diseases.

I will review studies in chromatin organization form different perspectives and its implications in cancer development especially breast cancer.

The genome structure in three-dimension

How chromatin is organized in eukaryotic cell nucleus is a historically mysterious question. It is known DNA is packed in different levels to allow meters long linear DNA condensed in micrometers nucleus. In 1970s, the repeating “beads on string” structure was observed by electron microscopy and were stated as the first level structural and functional unit for chromatin compaction1–5 (Figure 1.1). It was firstly proposed that the nucleosome contained around 200 bp of DNA2,6, but later it was found more accurately by X-ray crystal structure analysis about 146 bp of DNA were wrapped around the octamer to form nucleosomes7 (Figure 1.2), which were connected by dozens of bp of linker DNA, appearing as a “beads on string” structure8. The 10 nm DNA fiber is then folded into higher- order chromatin structures that allows further compaction fitting into the nucleus. The higher- order genome structure is elusive for tens of years. Although the 30 nm chromatin fiber structure was observed and proposed to be the next level structure of 10 nm fiber5,9–11, it remains debatable since it was only observed in vitro8,12,13. Even less is known about the higher-order

3 Dissertation Jacqueline Jufen Zhu structure other than the putative 30 nm fiber. Therefore, more advanced technologies for in vivo analysis are highly desired to decipher the higher-order genome structure.

Figure 1.2 Nucleosome core particle Ribbon traces for the 146-bp DNA Figure 1.1 The organization of DNA within phosphodiester backbones (brown and the chromatin structure turquoise) and eight histone main The lowest level of organization is the chains (blue: H3; green: H4; yellow: H2A; red: nucleosome, in which two superhelical turns of H2B). The views are down the DNA superhelix DNA are wound around the outside of a histone axis for the left particle and perpendicular to it octamer. Nucleosomes are connected to one for the right particle. For both particles, the another by short stretches of linker DNA. pseudo-twofold axis is aligned vertically with The figure and legend are cited from (Luger et the DNA center at the top. al., 1997)8 The figure and legend are cited from (Felsenfeld and Groudine, 2003)7

With the development and application of DNA in situ hybridization (ISH) and fluorescence in situ hybridization (FISH) combined with microscopy, the whole were managed to be painted and territories (CTs) (Figure 1.3) were clearly observed14–16 proving the DNA fibers from different chromosomes are not intermingling like spaghetti but compartmentalized into discrete domains in interphase nucleus. After then, this chromosome painting technology was applied to study chromatin organization at chromosomal level in different cell types. The major goal is to examine the non-randomness of CT arrangement and identify its connections to gene transcription and cellular function. For example, the studies of gene-rich (Chromosome (Chr) 19, Chr17, Chr11) and gene-poor (Chr18, Chr13,

ChrY) chromosomes indicated gene-rich CTs were in more irregular shape (ellipsoid-like) than

4 Dissertation Jacqueline Jufen Zhu gene-poor CTs17. Another example is the CTs of Chr18 and Chr19 were delineated and compared in lymphoblastoid cells across primate species. It was consistently found in all the primate species examined Chr18 were positioned closer to nuclear periphery, while Chr19 was more toward the centroid of the nucleus18, implying a radial positioning of genes in cell nucleus for transcription regulation, which is consistent with the finding that nuclear lamina (Figure 1.3) or lamina-associated domains (LADs) at nuclear periphery are more heterochromatic and associated with transcriptional repression19. Furthermore, the relative position of certain gene and the CT can be transcription relevant20. During the differentiation process of mouse embryonic stem cells (ESCs), it was observed Hoxb genes got activated by extruding out of the

CT, while stayed in the CT before activation21. Broadly, it has been suggested transcriptionally active genes are more often found on the periphery of CT20,22. There then came a speculation of the inter-chromosomal domain (ICD) that the CT boundaries and the space between adjacent

CTs are decondensed and preferentially transcriptionally active with RNA transcripts released directly23; and it was reported ICD can facilitate information exchange and cooperation of gene regulation between chromosomes24. However, it does not mean gene cannot transcribe within

CTs25.

Recently, high-throughput next-generation sequencing technology provided an advantageous platform for genome structure characterization. The most hotly used are chromatin immunoprecipitation sequencing (ChIP-seq) of histone marks, DNase hypersensitive sites (DHS) and transposase-accessible chromatin using sequencing (ATAC-seq)26 to examine chromatin state and accessibility; high-throughput chromosome conformation capture (Hi-C)27 and chromatin interaction analysis using pair-end tag sequencing (ChIA-PET)28 to genome-widely map long-range chromatin interaction (usually from several kilobases to hundreds of kilobases)

5 Dissertation Jacqueline Jufen Zhu that can be reconstructed and remodeled to simulate the whole picture of the genome. Chromatin state or accessibility is an important indicator of the activity of gene or DNA element. For instances, Histone 3 lysine 27 acetylation (H3K27ac) distinguishes active enhancers from poised enhancers29, while Histone 3 lysine 4 trimethylation (H3K4me3) is enriched in active promoters30. However, it does not reflect the 3D structure of the genome as it cannot detect the connectivity of individual chromatin sites. Chromatin interactions suggests a looping structure which means two chromatin loci can physically come together even they are far away in linear genomic distance (Figure 1.3). Chromatin interaction is usually mediated by protein factors, either transcriptionally or structurally related. Transcription-associated chromatin interaction may form transcription factories to efficiently facilitate gene transcription31,32 (Figure 1.3).

Overall, the interactions can be promoter-promoter, distal DNA element-distal DNA element or promoter-distal DNA element31,33. Of them, enhancer-promoter is the most well-studied type of interaction. Recently, the concept of super-enhancer was established as a region composing multiple enhancers involved in cellular identity bound by a series of master transcription factors34. Theoretically, by reconstructing and remodeling the genome-wide interactions, we will be able to depict any compartment and even the whole picture of the genome35,36. According to the chromatin interaction frequency of cell population, a structure with mega bases of DNA was defined as topologically associating domain (TAD) (Figure 1.3) that is conserved across cell types in a given organism37,38, but may differ between cell lineages36,39. Within TAD, chromatin contacts occured remarkably more frequent than those outside40. Extensive studies have found

TAD plays important roles in transcription regulation and DNA replication41,42.

Similar to TAD, insulated neighborhood was proposed to be the mechanistic basis of

TAD. Insulated neighborhood is defined as a CCCTC-binding factor (CTCF)-CTCF confined

6 Dissertation Jacqueline Jufen Zhu chromatin loop, with at least one gene inside, CTCF binding at both anchor sites, co-bound by cohesin. It provides an insulated region for specific enhancer-promoter interaction and transcription. Based on the definition, at least 50% of the TADs are insulated neighborhoods in mammalian cells43.

7 Dissertation Jacqueline Jufen Zhu

Figure 1.3 Different Levels of Genome Organization From top to bottom: Level 1: Chromosomes occupy distinct subregions of the nucleus known as chromosome territories (CTs). Individual chromosomes are indicated by different colors. Level 2: Transcriptionally inactive regions are enriched at the nuclear periphery, where they contact the nuclear lamina (red). Actively transcribed genes often colocalize at RNA polymerase II transcription factories (yellow). These and other instances of colocalization between regions with similar transcriptional activity may provide the physical basis for the observations of A and B compartments in C data. Level 3: Topological domains, or topologically associating domains (TADs), are regions of frequent local interactions separated by boundaries across which interactions are less frequent. CTCF binding sites and other sequence features are enriched at TAD boundaries. Note that CTCF also binds within TADs. Cohesin is often present at TAD boundaries, although it is not shown here. Level 4: Transcriptional regulation depends on long-range interactions between cis-regulatory elements such as enhancers (light red) and promoters (light yellow). These cis-regulatory interactions are facilitated by including transcription factors (TFs; blue), cofactors such as Mediator (Med; red) and Cohesin (purple ring), and RNA polymerase II (Pol II; yellow). The figure and legend are cited from (Gorkin et al., 2014)128

8 Dissertation Jacqueline Jufen Zhu

Methods to study chromatin organization

The most often used methods to investigate chromatin organization are DNA FISH combined with microscopy, high-throughput sequencing combined with molecular library construction and genome editing including clustered regularly interspaced short palindromic repeats (CRISPR)44 or transcription activator-like effector nucleases (TALEN)45 systems.

FISH assays have been applied to stain interest chromatin regions, sizing from dozens of kilobases to the whole chromosome, followed by microscopic imaging. The two most critical factors affecting staining capabilities and imaging qualities are DNA probes and microscopic resolution. Probes for CT and whole chromosome are already well-established. However, to study ultrastructure such as chromatin interaction, which usually spans hundreds of kilobases of

DNA with two interacting loci of only several kilobase length at the anchors, regulator bacterial artificial chromosome (BAC) originated probes can hardly fulfill the requirement as each BAC probe stains about 200 kb DNA lacking accuracy and specificity. To overcome this shortage of regular probes, oligopaints have been developed with advantages in staining accuracy, specificity and flexibility46. Basically, oligopaints can be designed to stain any desired chromatin regions by pooling microarray probes. It has been successfully applied to distinguish homologous chromosomes47 and evaluate chromatin state48. Regarding microscopy technologies, the trend is to bypass the diffraction barrier, break the resolution limit of 200 nm inherent in conventional microscopy, and achieve super resolution. Though electron microcopy has higher resolution than conventional microscopy, it is not easy to apply in studying chromatin structure due to the limitation of labeling. I will only review super-resolution light microcopy in this chapter. Of all the super-resolution microscopic technologies, there are three main methods that have been widely used: structural illumination microscopy (SIM)49, stimulated emission depletion

9 Dissertation Jacqueline Jufen Zhu

(STED)50 and photoactivated localization microscopy/stochastic optical reconstruction microscopy (PALM/STORM)51,52. Beyond the basic two-dimension (2D) imaging, 3D SIM53,

3D STED54, 3D STORM55–57 and interferometric PALM (iPALM)58 have been developed to be more powerful for 3D genome study47,59–62. Recently, these super-resolution imaging technologies were largely utilized in investigating chromatin structure. Epigenetic states63,64 and

TAD65 inferred by sequencing data were imaged and analyzed by STORM, validating the predictions made from sequencing data. Whereas, the biggest limitation of imaging-based methods is low-throughput that only a few cells can be imaged at a time.

Next-generation sequencing provided an efficient platform for genome structure study with cell populations (Figure 1.4). As mentioned previously, ChIP-seq with histone mark antibodies are very commonly used to investigate the distribution of certain chromatin state, active, inactive or poised. Similarly, DNase and ATAC-seq are also used to identify chromatin state, but can only classify it into two states, either open (transcriptionally active) or closed

(transcriptionally inactive). More importantly, proximity ligation-based technologies came out to capture chromatin contacts and map its 3D organization66. Chronologically, chromosome conformation capture (3C)67 was firstly developed to capture one-to-one interactions. After then, circular chromosome conformation capture (4C)68 and chromosome conformation capture carbon copy (5C)69 came out at about the same time to identify one-to-many and many-to-many chromatin contacts, respectively. In 2009, Hi-C27 and ChIA-PET28 were firstly reported to capture chromatin interactions genome-widely in an all-to-all manner. The differences between

Hi-C and ChIA-PET is that Hi-C captures all the chromatin interactions while ChIA-PET selectively captures a subset of interactions mediated by interest protein factors. Both Hi-C and

ChIA-PET have nonnegligible shortcomings: Hi-C needs billions of sequencing reads and ChIA-

10 Dissertation Jacqueline Jufen Zhu

PET requires millions of cells. To overcome these difficulties, novel technologies have been developed recently such as HiChIP70, Capture Hi-C71 and others.

Figure 1.4 Overview of 3C-derived methods An overview of the 3C-derived methods that are discussed is given. The horizontal panel shows the cross-linking, digestion, and ligation steps common to all of the ‘‘C’’ methods. The vertical panels indicate the steps that are specific to separate methods. The figure and legend are cited from (Wit and Laat, 2012)66

11 Dissertation Jacqueline Jufen Zhu

Besides DNA FISH imaging and genome sequencing, genome editing tools such as

TALEN and CRISPR44 have also be used to validate chromatin interaction inferred by sequencing data. Specifically, multigene co-transcription, enhancer regulation on gene promoter and CTCF mediated repressive chromatin loop43,72,73 were validated functionally using

TALEN74,75 and CRISPR/Cas976,77 for genetic editing. More interestingly, other than genetic editing, CRISPR/dCas9 can be applied in epigenetic modification on target chromatin site, in order to modulate gene transcription78. In the other way around, functional CRISPR screening enables to map chromatin interaction and identify enhancers79–83, providing a novel insight into genome structure investigation.

Key regulators of genome structure

Genome architecture is established with known and unknown factors involved. The most well-studied protein factors driving chromatin organization are CTCF, cohesion and RNA polymerase II (RNAPII). Recently, Yin Yang 1 (YY1) was reported to regulate enhancer- promoter interaction. In addition, some long non-coding RNA (lncRNA) such as X-inactive specific transcript (Xist) and functional intergenic repeating RNA element (Firre) also play important roles in genome structure conformation.

CTCF is a highly conserved protein from fly to human. 3D genome studies found CFCF is a key genome organizer that mediates chromatin contacts, bridging the gap between the working mechanism and its contradictorily diverse functions. As a gene activator, CTCF can tether promoters with their enhancers or other promoters, forming a transcription complex to promote gene expression33,84,85. As an insulator, CTCF was observed to enrich at the boundaries of TAD36,37, demarcating the TADs from interacting with each other, thus, inhibited information exchange between enhancers and promoters to repress gene transcription. It was proved by

12 Dissertation Jacqueline Jufen Zhu interrupting the TAD boundary, genes expressed ectopically43,86,87. Furthermore, ChIA-PET analysis showed CTCF binding motif orientation is related to transcription35, connecting delicate

CTCF positioning with gene activity.

Cohesin is another important architectural regulator functioning similarly with CTCF. It is a protein complex composed of subunits Smc1A, Smc3, Rad21 and Stag1/2. Cohesin colocalize heavily with CTCF locating at the boundaries of TAD37 and mediating chromatin interaction associated with gene activation or repression. However, cohesin can bind chromatin site independently of CTCF. For example, cohesin was found to interact with Mediator to contribute to chromatin looping and co-occupy active gene promoters and enhancers in ESCs88.

Despite the similar functions as genome organizers, cohesin and CTCF affect chromatin organization differentially. A general loss of local chromatin interactions by disrupting cohesin was observed, but the TADs remain intact. Whereas, the depletion of CTCF not only reduced interactions within TADs but also increased interactions across TADs89. Furthermore, distinct groups of genes become dysregulated upon depletion of cohesin and CTCF89.

RNAPII is an essential subunit of transcription factors for gene transcription in eukaryotic cells. ChIA-PET analysis showed RNAPII mediates transcriptionally related chromatin interaction. In most of the cases, RNAPII mediated chromatin interactions are shorter than and residing in CTCF/cohesin mediated ones35. It is often used to identify potential enhancers and examine enhancer reorganization by studying RNAPII mediated chromatin loops.

YY1 is a transcription factor in mammalian cells90–92. Only recently, YY1 was found to be involved in chromatin interactions between enhancers and promoters across cell types93.

Xist is a well-known lncRNA functioning to inactivate one of the X chromosomes in female cells. Xist express and spread over in cis to silence the whole X chromosome. It is still

13 Dissertation Jacqueline Jufen Zhu not clear how Xist inactivates X chromosome. Comprehensive Xist interactome study revealed

Xist repelled architectural factors such as cohesin from the inactive X chromosome (Xi), directing a specific chromosome conformation of Xi94.

Firre is a structural component of nuclear matrix expressed in X chromosome. However, its role in shaping genome structure is poorly understood. Firre was found to express locally at X chromosome but interact in trans with genes at autosomes95, suggesting a modulation mechanism across chromosomes. Additionally, by ChIA-PET analysis, super long chromatin loops projected out from Firre genomic locus to both sides only at active X chromosome (Xa) in

GM12878 cells35, consistent with the knowledge that Xi and Xa are in different chromosome conformation, implying the contribution of Firre.

Chromatin reorganization in diseases

Higher-order chromatin organization is one of the critical factors driving gene transcription, which is different across cell types and species. It is natural to assume chromatin is reorganized in disease cells. Studying disease-associated genome structures reveals fundamental mechanisms underlying disease genesis, providing insights into therapeutics. I will review the reorganization in diseases from different levels of CT, TAD, insulated neighborhood and enhancer-promoter interaction.

In immune-deficiency centromeric instability facial anomalies (ICF) syndrome, PAR2 genes were reorganized to localize toward the outside of Xi and Chromosome Y (ChrY) CTs, accompanying with loss of DNA methylation on Xi96. This is a direct reflection of connections between CT reorganization and disease. Indirectly, translocations prefers to occur between neighboring chromosomes instead of chromosomes that are far away from each other97,98. Hence, translocation-associated disease is attributed to the organization of CTs to some degree.

14 Dissertation Jacqueline Jufen Zhu

TAD is a highly self-interacting chromatin domain discussed previously. It maintains frequent interactions inside these discrete compartments but prevents interactions across them.

Disruption of TADs can cause ectopic gene expression and diseases. For instance, the altered chromatin structure at WNT6/IHH/EPHA4/PAX3 locus spanned by TADs is the cause of a distinct human limb malformation. By disrupting the TAD boundary in normal mice, enhancers and promoters ectopically rewired across TADs, resulting pathogenic phenotypes, suggesting a molecular mechanism for limb malformation86. Another example is about T-cell acute lymphoblastic leukemia (T-ALL). The perturbation of insulated neighborhood boundaries in non-malignant cell activated T-ALL proto-oncogenes73. Somatic mutations, DNA hypermethylation and structural variations at the anchor sites of insulated neighborhoods were found to contribute to oncogenesis73,99,100.

Enhancer-promoter interaction is the most straightforward structure regulating gene transcription. Enhancer malfunction could be the cause of diseases. Genetic or epigenetic variations such as point mutations101 and aberrant methylation102,103 on enhancers were found to contribute to cancer development and tumor progression. In colorectal cancer71 and breast cancer104, enhancer arrangements were observed changed compared with normal cells. A region containing a cluster of enhancers occupied by Mediator and other master transcription factors was found to suggest cellular identify and defined as super-enhancer34. Super-enhancers were widely studied in a variety of diseases and cancers such as lymphoma105,106, myeloma107 and T-

ALL108. The acquisition of super-enhancers at oncogenes during the process of tumor development could be the driver pathogenesis109.

15 Dissertation Jacqueline Jufen Zhu

New insights into genome structure

Despite higher-order genome structure has been studied for about half a century, lots of unknown remains.

It is hard to ignore that the limitations of current technologies stopping us from diving deeper into the genome structure. Though DNA probing and imaging allows us to visualize the chromatin organization directly, the resolution is relatively low. We can hardly handle to resolve the DNA arrangement at kilobase level, not to say base level. The advantage of sequencing methods is that it gives a better resolution than imaging. However, it requires a relatively large number of cells. What we have acquired from sequencing data is an averaged information but not accurate for each cell individual, which ignored heterogeneity across single cells.

Many studies focused on those key regulators on genome organization, but may overestimate their function. As Denes Hnisz reviewed110, TADs and gene transcription inside were not affected upon CTCF depletion by some studies, suggesting gene activation or repression play important roles in genome arrangement in CTCF-independent manner. This debated observation suggested the key regulators may not be truly “key” at all times but only under particular circumstances.

An interesting argument is whether the genome structure is the cause or consequence of gene transcription, or both. Based on current knowledge, it seems the answer is both. It is like a circuit, the arrangement of the genome somehow determines the gene transcription, vice versa, the transcribed gene functions to further shape the genome structure, till an internal balance achieved. The balance can be broken by operational mistakes such as genetic mutations or external stimuli. The cell will search to re-achieve another round of balance, till die.

16 Dissertation Jacqueline Jufen Zhu

1.3 Hypothesis and Specific Aims

Project 1 Visualization of chromatin loop in human cells

How meters long chromatin is packed in micrometer-scale cell nucleus is mysterious.

The 3D structure of the genome is of fundamental importance for gene regulation and cellular function. Methods to study 3D genome include molecular biology tools combined with next generation sequencing and DNA staining combined with microscopic imaging. Regarding sequencing, ChIA-PET and Hi-C technologies are most often used to capture higher-order genome organization in kilo base resolution. Chromatin imaging can directly visualize genome positioning using DNA FISH followed by microscopy.

Recent studies in genomics based on sequencing technologies inferred the very basic functional chromatin folding structures known as chromatin loop. Long-range chromatin interactions take place in nuclei are often mediated by protein factors at the loop anchor loci.

Due to theoretical and technical limitation of conventional microscope, the chromatin loop cannot be resolved. To prove the existence of chromatin loop and elucidate its finer structure, super-resolution microscopy is a powerful technology to be applied. iPALM is one of the most well-established super-resolution microscopies that can resolve chromatin 3D structure at single molecule level.

Hypothesis/Rational

Hi-C and ChIA-PET contact map data showed chromatin loops at various genomic loci and length scales. After examining the whole genome, the target chromatin loop region was selected at chromosome 14, which contains a high frequency loop that is about 13kb long and mediated by both CTCF and RNAPII in GM12878 cells. Super-resolution microscopy such as

STORM has been recently used to image functional chromatin domains TAD and epigenetic

17 Dissertation Jacqueline Jufen Zhu states63,65. iPALM can achieve even higher resolution than STORM axially, which makes it promising to resolve finer genome structures such as chromatin loop. By visualizing the target chromatin loop, I expected to see a major looping structure inferred ChIA-PET and Hi-C data.

Possibly, there could be more complex structures beside the major loop.

Specific Aim

Aim Super-resolution imaging of a distinct chromatin loop in human lymphoblastoid cells.

I applied super-resolution microscopy iPALM to image a distinct chromatin loop inferred by

ChIA-PET in human lymphoblastoid GM12878 cells. I then processed the images for 3D modeling and compared it with genomic data for evaluation.

Project 2 Identification of metastasis-associated chromatin interaction and gene network

Beyond studying chromatin interaction in normal cells, we also aim to investigate chromatin interaction and genome structures in breast cancer cells and identify metastasis- associated structures. Implied by connections between DNA elements, we can identify novel genes and investigate the underlying molecular mechanisms in diseases and cancer.

Breast cancer lethality is primarily due to tumor metastasis to other organs such as the lungs, liver and brain. Despite this, relatively little is understood about the etiology and molecular mechanism of metastasis. Although previous works have identified genes whose dysregulation is likely to be involved in metastasis111–119, more needs to be studied in a comprehensive way.

One promising lead comes from the aspect of DNA element regulation on transcription. It has been shown that chromatin interactions between gene promoters and distal enhancers can regulate gene expression and establish cellular identity, the disruptions in which can lead to

18 Dissertation Jacqueline Jufen Zhu disease states31,86,120. By analyzing the global transcription and chromatin interaction in three mouse mammary tumor-derived cell lines (4T1, 4T07 and 67NR) that vary in metastatic potential121,122 although derived from the same tumor, we can identify metastasis-associated genes and their regulatory elements. Further exploration on the functional consequences by disrupting specific interactions using a genome-editing approach will provide more evidences to prove the role of the regulatory elements.

By integrating polymorphic differences in mouse strains with differing metastatic capacities with chromatin accessibility and interaction information to investigate the metastatic cascade, we are able to identify genes involved in disseminated cell to metastatic lesion. By implementing pathway analysis and validation, it will provide novel insights into the mechanism of metastatic colonization for potential clinical intervention.

Specific Aim

Aim 1 Investigation of enhancer regulation on metastasis by chromatin interaction in mouse cells lines. This aim is to interrogate enhancer regulation on metastasis associated genes by characterizing genome-wide transcription and chromatin interaction combined with genome editing validation.

Aim 1a Profiling transcription, chromatin accessibility and chromatin interaction of a series of mouse cell lines. With 4T1, 4T07 and 67NR cells, I profiled the gene transcription by

RNA-Seq, characterized chromatin accessibility by benzonase-accessible chromatin (BACh) analysis, and mapped chromatin interactions by ChIA-PET, specifically examining interactions mediated by RNAPII.

Hypothesis/Rationale: I used a mouse cell model, a series of cell lines 4T1, 4T07 and 67NR derived from the same spontaneous mammary tumor of BALB/c mouse that show various

19 Dissertation Jacqueline Jufen Zhu degrees of metastatic potential from high to low123. RNA-seq, BACh and ChIA-PET analysis gave us a very comprehensive view of all the three cell lines to study metastasis, given the hypothesis of chromatin reorganization leading cancer metastasis.

Aim 1b Identification of metastasis-associated transcription and chromatin interaction. By comparative analysis of datasets from Aim 1a for the three cell lines, I identified differential expressed genes especially 4T1 up-regulated genes that are most likely metastasis associated.

Combining the identified genes with BACh and ChIA-PET data, I further identified chromatin interactions between potential enhancers and gene promoters associated in metastasis in 4T1 cells.

Hypothesis/Rationale: With the differences of metastatic predispositions in 4T1, 4T07 and

67NR cells though from the same tumor, comparative analysis of their transcription, chromatin accessibility and chromatin interaction reveal the higher-order regulatory mechanism for gene transcription.

Aim 1c Validation of enhancer function in metastasis. Informed by the results from the Aim

1b, I disrupted the target chromatin interactions by knocking out the enhancer in metastatic 4T1 cells using CRISPR/Cas9 system to interrogate transcriptional and phenotypic changes in cell adhesion and migration.

Hypothesis/Rationale: Informed by the results from Aim 1a and Aim 1b, enhancers are candidate targets in transcription regulation. Thus, it is meaningful to validate enhancer function by knocking it out to interrupt the interaction with target genes. In 4T1 cells, knocking out the identified enhancer from Aim 1b could suppress target gene expression and metastatic phenotypes.

20 Dissertation Jacqueline Jufen Zhu

Aim 2 Identification of a core metastasis susceptibility network by integrating genetic, epigenetic and chromatin interaction analysis.

Aim 2a Identification of polymorphic BACh sites that potentially promote metastasis. I identified 4T1-specific BACh sites based on BACh analysis mentioned in Aim 1. In parallel, I compared the single nucleotide polymorphisms (SNPs) in five pairs high- and low-metastatic mouse strains (BALB/cj (high) and SEA/GnJ (low); AKR/J (high) and RF/J (low); C57BL/10J

(high) and C57BR/J (low); C57BL/10J (high) and C58/J (low); CAST/Ei (high) and MOLF/Ei

(low)) to filter for the SNPs specifically located in low metastasis strains that potentially suppress metastasis. To enrich for chromatin sites affecting metastasis progression, I identified

4T1-specific BACh sites that have filtered low metastasis associated SNPs residing in.

Hypothesis/Rationale: Based on human studies, it indicates that the majority of disease- associated polymorphisms are non-coding124 and therefore likely mediate their effects through subtle changes in gene transcription125,126. Previously study showed the five low- metastasis mouse strains were derived from a highly metastatic strain FVB/NJ inbred to other mouse strains.

We assumed the acquired low metastatic potential were due to the differed genetic background of the parental mouse strains. Different mouse strain pairs that are close phylogenetically but with high- and low- metastasis were used for SNP analysis. SNPs unique in low metastatic strains are implied to associate with metastasis, therefore, the alteration of which in metastatic cells is one probable reason to cause metastasis susceptibility. The polymorphic BACh sites mapped out by BACh sites with those low-metastasis-associated SNPs inside in metastatic mouse cell were identified to be metastasis associated via promoter or non-promoter regulation on target genes.

21 Dissertation Jacqueline Jufen Zhu

Aim 2b Identification of candidate metastasis susceptibility genes and pathways. I mapped out the connectivity between potential regulatory elements and gene promoters using RNAPII

ChIA-PET. Combined with identified polymorphic BACh sites that associate with metastasis susceptibility, I identified candidate metastasis susceptibility genes in each pair of mouse strains.

By intersecting the genes identified from the five pairs of mouse strains, I got a core set of metastasis susceptibility gene candidates. To better understand the potential function of these candidate genes, I performed pathway analysis to categorize the functions and build up the network for metastasis development.

Hypothesis/Rationale: Genes associated with identified polymorphic BACh sites in Aim 2a would be enriched for factors necessary for the conversion of dormant solitary tumor cells to a proliferative metastatic lesion. Based on this hypothesis, the application of ChIA-PET can improve the specificity to identify genes and pathways associated with metastasis. Previously, putative candidate genes were identified by proximity to DHS and subsequent analysis indicated that these candidates were enriched for factors and mechanisms associated with progression in both mouse models and human breast cancer patients. Despite the success of these studies, two major limitations were present that decreased the specificity of this analysis. First, whole genome

DHS data does not permit refinement or filtering of candidate genes for specific points along the metastatic cascade, specifically at the most clinically relevant point of the dormant-to- proliferative transition. Second, identification of candidate genes by proximity to DHS sites introduces significant false positive and false negative data, since it is known that chromatin structure can bring distant enhancers into contact with promoters by looping. By mapping the chromatin interactions such as enhancer-promoter interactions, it is more specific and trustable to identify the genes connected with polymorphic BACh sites.

22 Dissertation Jacqueline Jufen Zhu

Aim 2c Validation of identified genes and pathways. To investigate the role of genes and pathways within the identified candidates in Aim 2b, I selected glucocorticoid receptor signaling for validation. I implanted 6DT1 cells into mice and treated them with dexamethasone to activate glucocorticoid signaling. By comparing the local and metastatic tumor burden between dexamethasone-treated and control mouse group, the influences of glucocorticoid signaling pathway was evaluated.

Hypothesis/Rationale: As a set of genes and pathways were identified to be associated with metastasis using genetic and epigenetic tools in Aim 2b, within which, some are novel that have never been reported. It is meaningful to further validate their important roles by animal experiment.

23 Dissertation Jacqueline Jufen Zhu

1.4 Background and Significance

Recent 3D genome studies demonstrated genomic distance is not always consistent with physical distance in vivo, linear arrangement of the genome cannot fully represent the mechanism responsible for transcription regulation. Techniques that combine biochemistry with high-throughput sequencing such as Hi-C and ChIA-PET have been used to map chromatin loops that physically interact at long distances, and these interaction maps have been used to infer the 3D structures of the genome. Comparative analysis of different cells showed that the higher-order genome organization such as chromatin interactions could lead to cell-specific transcription by shaping nuclear topology, and suggested enhancer-promoter interactions significantly enriched for cell-specific functions.

Due to limitations in imaging resolution inherent to conventional microscopy, the spatial organization of higher-order chromatin structure has historically been difficult or impossible to directly observe. With the application of super-resolution microscopy, people started to see functional genome structures inferred by sequencing data. For example, STORM combined with

DNA FISH has been used to image chromatin folding of TAD and different epigenetic states inferred by Hi-C63,65. However, more detailed structures beyond TAD such as chromatin loop are still elusive. As one of the most well-established super-resolution light microscopies that can achieve the best resolution up to 20 nm axially, iPALM will provide the best chance to resolve the most fundamental functional chromatin unit.

Breast cancer is the most common malignancy and the second leading cause of cancer- related deaths among women (https://seer.cancer.gov/statfacts/html/breast.html). Although recent evidence suggests that some improvement has also been seen over the years for patients with metastatic disease127, disseminated tumors remain the proximal cause of the majority of breast

24 Dissertation Jacqueline Jufen Zhu cancer deaths. Improved understanding of the unique dependencies and vulnerabilities of disseminated tumor cells and their resulting macroscopic lesions would therefore have the potential to improve patient survival by exploitation of metastasis-specific targets and pathways to eliminate life threatening metastasis.

Molecularly, metastasis results from ectopic gene expression that is affected by genetic and epigenetic events. Therefore, comprehensive study is necessary by combining genetic and epigenetic analysis based on genome-wide sequencing technologies.

A central aim in this study is to dig out the regulatory elements especially enhancers and understand the mechanism of distinct genome organization that programs the cellular function and biological processes. Studies in genome organization and gene regulation have provided new insights into transcription regulatory mechanisms and disease development109,128,129. Via chromatin looping enhancers can affect the expression of target genes located far away in linear distance. It has been reported that disease-associated variation is especially enriched in enhancers of disease-relevant cell types, and cancer cells generate enhancers at oncogenes and other genes important in tumor pathogenesis. Hence, higher-order chromatin conformation could be a critical determinant in breast cancer metastasis.

As a result of genetic screen in mouse models, SNPs that are found to be associated with differential metastatic capacity can be utilized as “fingerprints” on the genome to indicate which genes, regardless of previous knowledge or known functionality, play important roles in tumor progression. These “fingerprints” can subsequently be used to retrospectively construct the probable biology along the metastatic cascade on which genes or DNA elements act without direct observation. Based on a genetic screen that demonstrated a significant role of inherited

SNPs in the progression of a transgene-induced model of mammary tumor metastasis130,131, a

25 Dissertation Jacqueline Jufen Zhu number of genes have been implicated in the predisposition of this mammary tumor model to form pulmonary metastatic lesions115,132,133, several of which have also been potentially implicated in human metastasis susceptibility133–137. Although these studies have implicated new genes in tumor progression, each appears to contribute relatively small amounts of the phenotypic variance, suggesting that metastasis susceptibility is a complex trait that is the combined result of many different genes acting additively or interactively. In order to more comprehensively and specifically interrogate the targets genome-widely, it is promising to utilize sequencing tools that map the epigenomic features combined with SNPs to identify genes and pathways that are associated with metastasis but missed in traditional genetic studies.

26 Dissertation Jacqueline Jufen Zhu

1.5 Outline

This thesis consists two projects to study chromatin interaction in mammalian cells. Each aim of the projects is described and discussed in the following chapters.

Chapter 2: Super-resolution imaging of a distinct chromatin loop in human lymphoblastoid cells

Chapter 3: Investigation of enhancer regulation on metastasis by chromatin interaction in mouse cells lines

Chapter 4: Identification of a core metastasis susceptibility network by integrating genetic, epigenetic and chromatin interaction analysis.

The last chapter concludes the results and findings and discusses the limitations and future directions.

Chapter 5: Conclusions, limitations and future directions

27 Dissertation Jacqueline Jufen Zhu

Chapter 2: Super-resolution imaging of single

chromatin loop in human lymphoblastoid cells

28 Dissertation Jacqueline Jufen Zhu

Chapter 2

2.1 Introduction

Chromatin organization is a fundamental mechanism for gene transcription regulation.

DNA is condensed in different levels in cell nucleus. The lower level 10 nm DNA fiber is folded into higher-order chromatin structures that allows further compaction fitting into the nucleus which tightly associate with transcription regulation and cellular function. The higher-order genome structure is elusive for tens of years. Advanced technologies for in vivo analysis are highly desired to decipher the higher-order genome structure. Recently, technologies combining biochemistry and high-throughput sequencing such as Hi-C and ChIA-PET have been developed to genome-widely characterize long-range interaction (usually from several kilobase to hundreds kilobase) that is considered as the basis of higher-order chromatin organization in order to map the whole genome structure. Chromatin interactions suggests a DNA looping and folding conformation that distant chromatin loci can physically contact with each other. Based on mapped chromatin interactions, complex structures with mega bases of DNA such as TAD and CTCF-mediated chromatin contact domain (CCD) have been proposed and defined. More importantly, these sequencing inferred structures were proved to be involved in transcription regulation and disease development. Thus, they have critical biological importance. However, the visualization of these hypothesized higher-order chromatin structures in situ remains difficult mainly due to resolution limitation of conventional microscopy. Though electron microscopy had been used to visualize the ultrastructures and 3D organization of chromatin directly138,139, it lacked specificity with the restriction of labeling methods to be used in investigating transcription associated chromatin structures. Fortunately, super-resolution light microscopy has made it possible to achieve the goal. Recently, STORM has been used to image chromatin

29 Dissertation Jacqueline Jufen Zhu folding of TAD and different epigenetic states inferred by Hi-C63,65. With this success, there comes a more intriguing challenge to visualize a distinct chromatin loop. In this study, we applied super-resolution microscopy iPALM combined with FISH to visualize a specific chromatin loop that very frequently occurred in human lymphoblastoid GM12878 cells inferred by Hi-C and ChIA-PET chromatin contact data, which enabled us to resolve the very fine chromatin looping structures within 33 kilobases of DNA.

The iPALM imaging was done in Janelia campus, Howard Hughes Medical Institute, helped by Jesse Arron. The chromatin modeling and analysis were mostly done by Zofia Parteka and Przemyslaw Szalaj in University of Warsaw.

30 Dissertation Jacqueline Jufen Zhu

2.2 Materials and Methods

Cell culture

GM12878 cells were cultured in RPMI 1640 with 2mM L-glutamine and 15% fetal bovine serum at 37 °C.

Nuclei isolation

Nuclei EZ Prep Nuclei Isolation Kit from Sigma was used to isolate nuclei from GM12878 cells.

Coverslip coating

Coverslips were incubated in 1M KOH for 20min, washed with water, coated with 0.01% poly-

L-lysine (Sigma) for 20min, rinsed with water, dried for 30min for later use.

DNA FISH

Isolated nuclei were added to attach to poly-L-lysine coated coverslips, fixed with 4% PFA for

10min at room temperature, washed with 1x phosphate-buffered saline (PBS), permeabilized with ice-cold methanol for 10min, washed with 1 x PBS, dehydrated with 75%, 85% and 100% ethanol for 2min each, dried at 60°C for 1h. FISH probes were mixed with hybridization buffer and added to prepared nuclei, denatured at 80°C for 5min, incubated in a humid chamber at 37°C for overnight. Nuclei were washed with 50% formamide/2 x SSC at room temperature for 10min, followed by 2xSSC for 10min, 0.2 x SSC at 55°C for 10min and then large volume of 2 x SSC till imaging. iPALM imaging

Samples were imaged in standard STORM buffer by iPALM (Shtengel et al., 2009). Isolated nuclei were adhered to 25mm round coverslips containing gold nanorod particles that act as calibration standards and alignment/drift fiducial markers. These were prepared as described in

(https://www.sciencedirect.com/science/article/pii/B978012420138500015X). Briefly, coverslips were

31 Dissertation Jacqueline Jufen Zhu washed for 3 hours at 80 degrees C in a 5:1:1 solution of H2O:H2O2:NH3OH, rinsed copiously, and coated with poly-L-lysine. After further washing, gold nanorods (Nanopartz, Inc) were adhered to poly-L-lysine coated coverslips, washed again, and coated with ca. 50nm SiO2 using a Denton vacuum evaporator. Samples were mounted in dSTORM buffer140, containing tris buffered saline, pH 8, 100mM mercapto ethanolamine, 0.5 mg/mL glucose oxidase, 40 ug/mL catalase, and 10% (w/v) glucose (all from Sigma). An 18mm coverslip was adhered atop the bottom coverslip, sealed, mounted in the iPALM, and imaged as described above.

Image pre-processing iPALM images were reconstructed via localization of blinking fluorophores over 25,000 frames across each of three Electron-multiplying charge-coupled device (EMCCD) cameras. Gold nanoparticles act as fiducial markers that allow for (1) spatial registration of the three cameras using a full affine transformation, and (2) calibration of the z-position response of the system. After localization, images were filtered to only include localizations with <30nm uncertainty in all three dimensions. The gold fiducial particles also allow for drift correction in all three dimensions, and for correcting any sample tilt to within 30nm error. Final images were rendered as image stacks, reflecting the fluorophore density and uncertainty of localization, or exported as ASCII delimited text files for further analysis.

Spots identification

Connected component in graph theory is a set of nodes (subgraph) such that all of the nodes are connected by vertices. In image processing we are looking at voxels with different brightness levels. We defined a connected component as a part of the image in which all of the voxels form an area with brightness level above given threshold. By changing the brightness cutoff level, we can split an image into different sets of connected spots (Figure 2.4).

32 Dissertation Jacqueline Jufen Zhu

Algorithms for connecting identified spots

To comprehensively model the peak joining possibility, we applied three different algorithms to render the models. After that we apply spline Interpolation to smooth the models.

Nearest neighbor (NN) algorithm Nearest neighbor algorithm is solving shortest path problem.

Shortest path problem in graph theory is the problem of finding a path between two nodes such that the distance between them is minimized. Algorithm is starting from given point, searching the nearest point in surroundings, connecting them and again searching for the nearest neighbor and this way connecting a whole set of points. After analyzing the genomic data, we found out that examined loop is in between two subdomains. We assumed that flanking regions will be far apart from each other with the loop in the middle. We run two NN simulations each starting from one of two points that were furthest apart in space. Thus, this program will generate two possible image models.

Traveling salesman problem solver (TSP) algorithm We used implementation of greedy algorithm finding one of best solutions for this NP-hard problem. We treat our set of points as graph nodes, and distances between them as vertices. At the beginning each vertex is a separate path of length 1. In each step we are finding two closest disconnected paths, and we connect them into one. This step is repeated until there is just one path left. Greedy TSP solver gives highly non-optimal results therefore after connecting all paths into one we run optimization algorithm. Optimization tries to rearrange points in the path to improve the solution. After finding the shortest path we are simply deleting the longest connection between two points. This way we get the shortest path between two points that are the furthest from each other.

Neighbor joining (NJ) algorithm Neighbor Joining is an agglomerative clustering method used in bioinformatics for creation of phylogenetic trees. In each step the algorithm is searching for a

33 Dissertation Jacqueline Jufen Zhu pair of points that are the closest to each other and connecting them in one. This step is repeated until there is no unconnected points left. We used this approach to connect sets of points from iPALM images.

Spline Interpolation

This is a type of interpolation which uses splines, special polynomials, to construct new set of data points. We used it to smooth structures after connecting spots from iPALM images.

34 Dissertation Jacqueline Jufen Zhu

2.3 Results

Chromatin landscape of target region and visualization by DNA FISH

Hi-C and ChIA-PET data both showed chromatin loops at various genomic loci and length scales. After we examined the whole genome, we selected the target genomic region in chromosome 14, which contains a high frequency loop that is about 13kb long and mediated by both CTCF and RNAPII in GM12878 cells (Figure 2.1B and Figure 2.2). It is located at the T- cell receptor alpha (TCRA) locus where V(D)J recombination takes place during T-lymphocyte development and has been observed and studied in mouse T cells141–143. Specifically, a CTCF- and cohesin-binding site is located between TCRA locus and the neighboring housekeeping gene

DAD1, functioning as an insulator, as the depletion of cohesin led to an increased transcription of DAD1144. Interaction between TCRA enhancer and Dad1 was shown to occur both in mouse

T- and B-lymphocytes, although in the latter it was slightly weaker143,144. The conservation of

TCRA locus in mouse and human145 implies a similar chromatin conformation in human lymphocytes with mouse. Expectedly, in our GM12878 ChIA-PET data, one anchor of the loop overlapped with TCRA enhancer, the other anchor is situated in Dad1 gene body (Figure 2.1B), which is similar to the observation in mouse T cells, suggesting the insulation function of our target loop in human cells, adding biological meaning to the study on its detailed folding conformation.

To keep the integrity of the loop in consideration of staining efficiency, we extended

10kb at both sides. Basically, we were investigating a 33kb chromatin region that contains an interaction loop inside that is shown in both chromatin interaction heat map (Figure 2.1A) and genome browser (Figure 2.1B). The surrounding epigenomic landscape of this region is shown in

(Figure 2.2). By genomic modeling, we reconstructed the conformation of the target based on

35 Dissertation Jacqueline Jufen Zhu sequencing data (Figure 2.1C). However, the chromatin structure inferred by population data lacks specificity and accuracy of individual cells or alleles, which led us to interrogate it by super-resolution imaging. Oligopaints probe46 for DNA FISH was designed to target on the chromatin specifically by avoiding DNA repeats region, resulting a staining density of about 10 oligos/kb, allowing us to visualize the target chromatin as a dot by conventional microscope, though lacking details of the structuring. The composition of each probe oligo was shown in

Figure 2.1D.

Fine structures of target chromatin revealed by 3D super-resolution iPALM imaging

We applied iPALM58 to samples labeled with Atto647N, achieving the resolution at single molecule level (Figure 2.1E). We acquired six good quality images of the target chromatin

(Figure 2.1E) for following analysis.

Briefly, samples were imaged at 30-50ms exposure, under 3kW/cm2 of 647nm laser excitation and 100W/cm2 405nm laser activation for 25,000 frames to capture blinking Atto

647N molecules. Data was imported into the PeakSelector software package (Janelia Research

Campus), which registers each of the three camera images with respect to each other, calibrates the intensity across each camera as a function of z-position, and localizes each blinking molecule in 3D. Further, fiducial nanoparticles embedded in the coverslip allow for drift correction after acquisition and localization to maximize image resolution. By pre-processing, localizations were filtered based on localization uncertainty in all three dimensions and data were rendered as

3D .tif stacks or exported as vectorized datasets for further analysis.

Each image (Figure 2.1E) shows a one copy of the 33 kb chromatin target. The well- separated red spots represent single fluorescent molecules bound along the chromatin. From the

36 Dissertation Jacqueline Jufen Zhu images, we can tell the spots are not randomly distributed but conformed in some order to build featured structure.

A chr14:22.7Mb – 24.05Mb D

23.018Mb – 23.048Mb

E

B

CTCF

RNAPII

C

Figure 2.1 Experimental design and acquired images (A) Representation of the target chromatin loop by Hi-C heatmap. The upper heatmap shows a larger view of the chromatin loop inside. The lower heatmap shows the exact target region for visualization, within which, the blue and green dots indicate RNAPII mediated and CTCF mediated chromatin loops, respectively, inferred by ChIA-PET. (B) Browser view of CFCF and RNAPII mediated chromatin loop of the target region. (C) Modeled chromatin structure using Hi-C and ChIA-PET data. (D) Probe design for DNA FISH and schematics of iPALM imaging. (E) Six acquired images for the distinct chromatin loop by iPALM.

37 Dissertation Jacqueline Jufen Zhu

chr14:20,000,000-26,500,000 ChIA-PET HiC

CTCF loop

CCD

CTCF peak

RNAPII loop

RNAPII peak

chr14:22,700,000-24,050,000

CTCF loop

CCD

CTCF peak

RNAPII loop

RNAPII peak

chr14:23,018,000-23,048,000

CTCF loop

CCD

CTCF peak

RNAPII loop

RNAPII peak

Figure 2.2 Landscape of the chromatin loop The left panels are the browser views of ChIA-PET data for the chromatin loop and its surroundings. The middle panels are ChIA-PET heatmaps for the corresponding chromatin loop and surrounding regions. The right panels are Hi-C heatmaps for the corresponding regions.

38 Dissertation Jacqueline Jufen Zhu

Reconstruction of chromatin targets from iPALM images

We modeled the chromatin structure by extracting and connecting all the spots in the images using different algorithms. All six images were analyzed to get confident single molecule spots from pre-processed images (Figure 2.3A). Basically, we extracted significant signals, measured brightness of images in relative luminosity units which range from 0 to 255.

Brightness threshold was chosen manually to cut off noises. We then started from the lowest brightness level (threshold level) and increasing the value by ten in each step, until we reached maximum brightness of the image (Figure 2.4). In this way we examined different sets of spots of signals at each step, that allowed us to identify all spots desired. The number of spots varied between different images, ranging from 42 to 110 (Table 2.1), which is much smaller than the total number of probe oligos for each copy of chromatin target. A couple of reasons can cause this under-stained effect: the chromatin cannot be ideally fully stained due to the FISH efficiency; some signals can get lost during data processing. We then characterized each image by spot count and volume, as well as the nearest distance between spots (Figure 2.3B, 2.3C, 2.5). The average dot count of each image is 68, and volume is 0.005 µm3 (Table 2.1). The nearest distance enriched in the spectrum of 0-60 nm. Intriguingly, each structure of the target looks different from another, indicating the heterogeneity of the genome across cells, at this higher- order level.

As there is no obvious way to determine the orientation of the target chromatin loop from the staining, we used three different algorithms to simulate and reconstruct the chromatin conformation. Each set of points was connected using NN, NJ and TSP algorithms (Figure 2.3D,

E, F, G). The obtained structures were smoothed using cubic spline interpolation. Four simulated structures were generated for each image. We then measured the linear length of these modeled

39 Dissertation Jacqueline Jufen Zhu structures (Table 2.2). Considering the probing density is in average of 10 fluorophore/kb, it is about able to resolve up to the lower level “beads on string” structure with around 150 bp per unit, even though the DNA is under-stained. As previously reported, the first level of DNA compaction from backbone to histone modified “beads on string” structure is about five to ten-fold condensation in size8. Therefore, our target 33 kb chromatin is estimated to be around

2244 nm to 4488 nm by first level compaction. Compared with our modeled DNA length, most of them are within the estimated range, only two of the images have shorter length, which may suggest a truncated chromatin structure due to the limitation of imaging penetration depth.

A B C

140

Z

20 180 50 Y X 225 60

D E F G

Neighbor Joining Nearest Neighbor 1 Nearest Neighbor 2 Traveling Salesman

Figure 2.3 Image characterization and modeling (A) Signal extraction from image 2. Each red spot indicates one fluorophore tagged with DNA probe oligos. (B) 3D volume of Image 2. (C) Distance distribution of the nearest spots in Image 2. (D) (E) (F) (G) Image modeling with different algorithms for Image 2.

40 Dissertation Jacqueline Jufen Zhu

Figure 2.4 iPALM image processing algorithm The upper panel is a schematic illustration to extract probe spots from pre- precessed. The lower panel is a processing example of Image 2

41 Dissertation Jacqueline Jufen Zhu

Image 1 Image 2 Image 3 140 140 140

Z Z Z 20 20 180 20 160 50 Y 60 80 Y 20 X 225 60 Y X 60 160 X 0 160

Image 4 Image 5 Image 6 140 140

Z Z Z

20 20 20 200 120 220 80 60 100 Y Y Y X 60 X X 160 60 140 300 100

Figure 2.5 Characterization of volumes and distances between spots for each image

42 Dissertation Jacqueline Jufen Zhu

Table 2.1 Characterization of images

Image 1 Image 2 Image 3 Image 4 Image 5 Image 6 Spot count 50 80 42 73 51 110

Volume (nm3) 3,906,301 6,077,427 2,753,653 4,716,177 1,993,389 11,434,228

Table 2.2 Linear length of image models

Linear length of the models for images (nm)

Algorithm Image 1 Image 2 Image 3 Image 4 Image 5 Image 6

NJ 2297.63 2569.35 3150.5 3481.72 1699.51 1897.23 2842.1 3218.16 1775.18 1967.29 4751.28 5346.97

NN1 2143.24 2320.08 2420.73 2651.38 1428.13 1562.75 2621.51 3092.21 1589.24 1743.19 3581.4 3829.22

NN2 1968.2 2119.16 2531.7 2730.49 1432.12 1606.68 2402.78 2592.97 1590.37 1742.47 3716.58 4013.24

TSP 1848.5 2038.64 2422.66 2596.36 1267.72 1383.92 2238.86 2404.38 1312.37 1417.93 3364.43 3623.48

43 Dissertation Jacqueline Jufen Zhu

Correlation of physical coordinates with chromatin contact mapping data

To assess the concordance between imaging and sequencing data, we analyzed the Hi-C data for the loop region. Based on the location of contact anchors we defined four contact categories: loop (with both anchors contained within the loop region), flank (with both anchors located in flank regions), loop-flank (with one anchor in loop region and the other in one of the flanks) and inter-flank (with anchors located in flanks on both sides of the loop) (Figure 2.6A).

As expected, we noted that the interaction frequency of the contacts within loop or flank as seen in Hi-C heatmap was significantly higher than for the loop-flank and inter-flank (Figure

2.6C, 7, 8, 9, 10, 11, 12). Considering this effect is to some extent due to higher frequency of contacts between regions located close to each other, we removed the shortest contacts from the analysis and observed very similar patterns.

We also analyzed the relation between physical distance as measured in imaging-based structures between pairs of points and the frequency of contacts between the corresponding loci.

We assumed that chromatin loops are highly flexible and that they can take different conformations across time and different cells. Thus, to identify the expected patterns and variability we used molecular dynamics simulations to model a number of toy regions with varying pattern of contacts. We modeled the loop region using a beads-on-a-string model with

340 equidistant points. A random walk structure was used as the initial configuration. The simulation was run for 1000 steps, and the first 200 steps were discarded. Then we extracted every twentieth frame and analyzed them both individually and collectively. As a result, most of models has shorter distance in loop regions compared to the other three categories (Figure 2.7,

2.8, 2.9, 2.10, 2.11, 2.12). The imaging models generally exhibited similar patterns, but there

44 Dissertation Jacqueline Jufen Zhu was large variability between different methods and images. This suggests that the chromatin loops are very flexible and dynamic.

For each model of the images we plotted the 3D distance between pairs of points against the contact frequency from the corresponding Hi-C heatmap bins and genomic distance between them (Figure 2.7, 2.8, 2.9, 2.10, 2.11, 2.12). In all the cases we observed that contacts representing different types are relatively well-separated. In general, loop-flank and inter-flank contacts are characterized by longer 3D distance and lower Hi-C frequency, whereas flank and loop contacts tend to have smaller distances and higher contact frequency.

Looking back to all the models made from different algorithms, we found the NJ models are the least matched with the hypothesis that the loop region has the shortest pair-wise physical distances, while the NN models are the most matched, suggesting NN algorithm works the best in this structure modeling.

A B C

Figure 2.6 Comparisons of image models with sequencing data (A) Schematics of the target chromatin loop inferred by ChIA-PET data. (B) The chromatin contact information characterized by 3D distance in different regions of the chromatin loop indicated in (A) for Image 2 modeled by nearest neighbor 1. (C) 3D scatter plot for chromatin contacts characterized by 3D distances in nearest neighbor1 modeled Image 2 and Hi-C frequency data.

45 Dissertation Jacqueline Jufen Zhu

Neighbor Joining Nearest Neighbor 1 Nearest Neighbor 2 Traveling Salesman

Figure 2.7 Comparisons of image models with sequencing data for Image 1 The top panels are image models from different algorithms. The middle panels are 3D distance plot of the models. The bottom panels are the chromatin contact within the target region characterized by 3D distance and Hi-C frequency.

46 Dissertation Jacqueline Jufen Zhu

Neighbor Joining Nearest Neighbor 1 Nearest Neighbor 2 Traveling Salesman

Figure 2.8 Comparisons of image models with sequencing data for Image 2 The top panels are image models from different algorithms. The middle panels are 3D distance plot of the models. The bottom panels are the chromatin contact within the target region characterized by 3D distance and Hi-C frequency.

47 Dissertation Jacqueline Jufen Zhu

Neighbor Joining Nearest Neighbor 1 Nearest Neighbor 2 Traveling Salesman

Figure 2.9 Comparisons of image models with sequencing data for Image 3 The top panels are image models from different algorithms. The middle panels are 3D distance plot of the models. The bottom panels are the chromatin contact within the target region characterized by 3D distance and Hi-C frequency.

48 Dissertation Jacqueline Jufen Zhu

Neighbor Joining Nearest Neighbor 1 Nearest Neighbor 2 Traveling Salesman

Figure 2.10 Comparisons of image models with sequencing data for Image 4 The top panels are image models from different algorithms. The middle panels are 3D distance plot of the models. The bottom panels are the chromatin contact within the target region characterized by 3D distance and Hi-C frequency.

49 Dissertation Jacqueline Jufen Zhu

Neighbor Joining Nearest Neighbor 1 Nearest Neighbor 2 Traveling Salesman

Figure 2.11 Comparisons of image models with sequencing data for Image 5 The top panels are image models from different algorithms. The middle panels are 3D distance plot of the models. The bottom panels are the chromatin contact within the target region characterized by 3D distance and Hi-C frequency.

50 Dissertation Jacqueline Jufen Zhu

Neighbor Joining Nearest Neighbor 1 Nearest Neighbor 2 Traveling Salesman

Figure 2.12 Comparisons of image models with sequencing data for Image 6 The top panels are image models from different algorithms. The middle panels are 3D distance plot of the models. The bottom panels are the chromatin contact within the target region characterized by 3D distance and Hi-C frequency.

51 Dissertation Jacqueline Jufen Zhu

2.4 Discussion

This is the first time a candidate chromatin loop is investigated and visualized using super-resolution microscope. Though we expected to see a major loop in the target chromatin region inferred by ChIA-PET data, we observed more complex looping structures in each individual image that differed from each other. There are a variety of possible reasons for the observation. 1. The chromatin loop could be heterogeneous between cells and even alleles, or dynamic other than static in different cell stages. The captured loop suggested by ChIA-PET data just indicated one possible conformation that occurred most frequently at that region. It does not mean other conformations could not happen as other types of conformations could be too rare to be captured. Our imaging data is not inclusive enough due to the limitation of the sample size to reflect the frequencies of each type of the looping incidences. 2. There are structures that cannot be captured by molecular strategies. Both ChIA-PET and Hi-C are based on a hypothesis that the chromatin interactions are mediated by protein factors. In other words, the chromatin that has no protein binding cannot be captured and modeled, though we cannot deny there are ultra-complex twisting and tangling for DNA packing in the nucleus.

This study allows a direct visualization of the chromatin looping in a specific region that we clearly see the physical winding of DNA, though there are limitations hindering us from interpreting the observations more comprehensively. For instances, we are not able to exactly map the iPALM images to the corresponding genome coordinates; we have not got a chance to image the non-looping regions inferred by genome sequencing data; a larger sample size would be helpful bridging the gap of the comparisons between individual by imaging and population by sequencing.

52 Dissertation Jacqueline Jufen Zhu

The imaging data is more direct and straightforward for revealing chromatin conformation than the sequencing data. Therefore, the observations or findings that are not in or against our expectations from the sequencing data are not frustrating but even more suggestive to the current understanding the chromatin looping.

As a pilot study, the experimental design could be further improved. As stated previously, we were not able to distinguish the chromatin loop region and extended tails of the loop by single color DNA FISH staining visualized by iPALM. In the future, dual color can be used to stain the loop and tails separately, increasing the accuracy of the conformation corresponding to the coordinates, decreasing the difficulties in the following modeling. To inclusively capture the possible chromatin structures and estimate the distribution of the occurrence frequencies, a larger sample size is needed. Furthermore, negative controls could be used for comparison, such as a non-looping region in the same cell line or the same target region with a non-looping structure in another cell line inferred by sequencing data. Besides the 13 kb chromatin loop at Chr14, it is interesting to investigate other chromatin loops at different length scales in different chromosomes. The fundamental principle of chromatin looping should be pursued.

53 Dissertation Jacqueline Jufen Zhu

Chapter 3: Investigation of enhancer regulation on

metastasis by chromatin interaction in mouse cells

lines

54 Dissertation Jacqueline Jufen Zhu

Chapter 3

3.1 Introduction

Metastasis is often found in breast cancer, which is the major cause to mortality.

However, the mechanism underlying is not clear. Studying the genome organization of cancer cells can reveal higher order regulatory mechanism of tumorigenesis and metastasis and in order to suggest clinical cure. Identification of these critical targets, however, has been challenging due to the complex nature of the metastatic process. In breast cancer, patients can survive for years, even decades, before the manifestation of metastatic disease146. During the years between the diagnosis of the primary tumor and metastatic relapse, occult tumor cells have remained undetected until some currently undefined and unobserved event results in their conversion to a proliferative secondary tumor. Since millions of cells can be shed by the primary tumor per day147 and dissemination can commence early in tumor evolution148,149, current hypotheses suggest that the colonization step - the conversion from quiescent to actively growing tumor cells

- is likely the rate-limiting step in the metastatic cascade. However, as indicated above, the quiescent-to-proliferative switch can occur anytime over a period of years to decades. This, coupled with the estimate that less than 1 in 10,000 tumor cells is likely to make the conversion150, makes direct visualization of the mechanisms and pathways involved very challenging, even within model organisms.

Results from genomic studies indicate genome organization within the nucleus is nonrandom, which suggests important functions in gene expression151,152. Higher-order genome structure such as chromosome territory reorganization and gene repositioning were found to be associated with human diseases and cancers17,96,153,154. More recently, studies showed genome architecture mediate interactions between regulatory elements such as enhancers and gene

55 Dissertation Jacqueline Jufen Zhu promoters to allow delicate gene regulation31,106,109,155. However, since genomic distance is not always consistent with physical distance in vivo, linear arrangement of the genome cannot fully represent the mechanism responsible for transcription regulation. Techniques that combine biochemistry with high-throughput sequencing such as Hi-C and ChIA-PET have been used to genome-widely map long-range chromatin interactions. Bound by transcription factor complexes, enhancers can be brought proximal to promoters to enhance gene transcription. Disruption of these 3D structures can lead to enhancer malfunction resulting in inappropriate gene expression, potentially leading to diseases and cancers73,86. It has been reported that disease-associated variation is especially enriched in enhancers of disease-relevant cell types, and cancer cells generate enhancers at oncogenes and other genes important in tumor pathogenesis109,156. Hence, higher-order chromatin conformation could be a critical driver in breast cancer metastasis.

CRISPR-based technology validated the importance of one set of enhancer-promoter interactions on target gene expression. Moreover, the analysis has been broadened to include five genetically related but metastatically divergent strain pairs which allows the identification of a core set of common metastasis susceptibility genes and molecular pathways. The results of these studies implicated inherited variation in metabolic pathways as important common mediators of metastatic progression. Furthermore, pre-existing therapeutics are implicated as either potential pro- or anti-metastatic agents that might be used or avoided to reduce subsequent metastatic burden. These studies therefore provide novel insights into a potential core set of processes associated with breast cancer progression that might be exploited to reduce the burden and mortality of breast cancer metastatic disease, as well as important insights into the epigenetic regulation of metastatic progression.

56 Dissertation Jacqueline Jufen Zhu

3.2 Materials and Methods

Cell Lines

4T1, 4T07 and 67NR30 were obtained from the laboratory of Lalage Wakefield (NCI) and confirmed to be of BALB/c mouse origin by microsatellite genotyping. All cell lines were mycoplasma-free.

Benzonase accessible chromatin (BACh) sequencing analysis

Cells were expanded in DMEM with 10% FBS and P/S at 37°C to obtain 8–10 x107 cells/condition/each experiment. BACh analysis was performed as previously described with minor modifications157. Briefly, cells were collected by centrifugation, washed twice in ice cold cellular wash buffer (20 mM TrisHCl pH 7.5, 137 mM NaCl, 1 mM EDTA , 10 mM sodium butyrate, 10 mM sodium orthovanadate, 2 mM sodium fluoride, protease inhibitor cocktail

(Roche)) and resuspended in (40 million cells/ml) hypertonic lysis buffer (20 mM TrisHCl pH

7.5, 2 mM EDTA, 1 mM EGTA, 0.5% glycerol, 20 mM sodium butyrate, 2 mM sodium orthovanadate, 4 mM sodium fluoride, protease inhibitor cocktail (Roche)). Cells were distributed in 500µl aliquots in 1.5 ml tubes and followed by the addition of 500µl of nuclease digestion buffer (40 mM TrisHCl pH 8.0, 6 mM MgCl2, 0.3% NP-40, 1% Glycerol) containing a

3-fold dilution (from 0.125 units/ml to 6U units/ml) of Benzonase (Calbiochem/EMD). This was mixed gently and incubated for three minutes at 37°C. Reactions were terminated by the addition of EDTA (10 mM final) and SDS (0.75% final). Proteinase K was added to a final concentration of 0.5 mg/ml and incubated overnight at 45°C. DNA fragments of 100-500 bp from a chromatin digestion were purified over sucrose gradients158 and precipitated in 0.1 volume NaAc and 0.7 volume isopropanol.

57 Dissertation Jacqueline Jufen Zhu

Sequencing and data analysis of Benzonase-treated samples

DNA was sequenced using either Illumina HiSeq2000 (TrueSeq V3 chemistry) or NextSeq500

(TrueSeq High output V2 chemistry) sequencers at the Advanced Technology Research Facility

(ATRF), National Cancer Institute (NCI-Frederick, MD, USA). The sequence reads were generated as either 50-mer or 75-mer (trimmed to 50-mer by trimmomatic software before alignment), and tags are then aligned to the UCSC mm9 reference genome assembly using Eland or Bowtie2. All the samples had good quality with over 94% of the bases having Q30 or above with 25~50 million raw reads per sample. Regions of enriched tags termed hotspots have been called using DNase2Hotspots algorithm159 (http://sourceforge.net/projects/dnase2hotspots) with

FDR of 0% with minor updates. Tag density values were normalized to 10 million reads. For comparison of BACh profiles across multiple cell lines, percent reference peak coverage (PRC) measure was proposed and employed to ensure compatible levels of digestion by Benzonase in multiple cell lines. To calibrate PRC, commonly represented hotspot sites were identified as reference peaks based on mouse (mm9) DNaseq data from ENCODE. ENCODE narrowPeak definition files (DNaseI Hypersensitivity by Digital DNaseI from ENCODE/University of

Washington) for a total of 133 samples were downloaded from the UCSC golden path web site: http://hgdownload.cse.ucsc.edu/goldenPath/mm9/encodeDCC/wgEncodeUwDnase/. A total of

8,587 peaks were found to be present in all 133 samples. By assuming that the most commonly accessible sites may also be present in our optimally digested samples, PRC was obtained as a fraction of reference peaks presented in each sample to evaluate the level of digestion by

Benzonase. Two biological replicates with acceptable PRC were selected and pooled for each cell line. When samples showed bimodality in their tag density distribution due to elevated noise, we dropped hotspots that belonged to a group with low tag density and low PRC by applying

58 Dissertation Jacqueline Jufen Zhu threshold to maximum hotspot tag density values. Each hotspot was classified in relation to genomic features, promoter, exon, , distal upstream and downstream classes, based on the annotation information on refGene.txt downloaded from http://hgdownload.cse.ucsc.edu/goldenpath/mm9/database/. The classification of each BACh was used to generate lineage plots for different BACh categories. Comparison of the relative position of the retained versus lost BACh were shown by plotting empirical cumulative distribution

(ECDF) of the absolute distance to nearest transcription start site (TSS) from the mid-point of a hotspot from two groups of BACh: BACh lost, (67NR, 4T07, 4T1) : (+, -, -) or (+, +, -), and

BACh retained, (67NR, 4T07, 4T1) : (+, +, +), as malignancy increases. The two distributions were compared by two sample Kolmogorov-Smirnov test (K-S test)159.

Long Read ChIA-PET and data processing

Long read ChIA-PET was performed using Tn5 transposase to tag DNA for long tag sequencing by Illumina Nextseq. ChIA-PET data was processed by a customized ChIA-PET data processing pipeline54. Detailed protocol can be found in the paper “Long-read ChIA-PET for base-pair resolution mapping of haplotype-specific chromatin interactions”160.

RNA-seq data analysis

DESeq2 was used to analyze the RNA-seq data for 4T1, 4T07 and 67NR cells to identify differentially expressed genes. The cut-off used was 3-fold change of normalized gene expression (baseMean), Padj < 0.01. By pair-wise comparison, over-expressed genes were identified for each cell line compared to the other two cell lines.

Cell line specific enhancer identification

Based on identified over-expressed gene in 4T1, 4T07 and 67NR and RNAPII-mediated chromatin interactions, genomic loci (promoter region excluded) that distantly interact with the

59 Dissertation Jacqueline Jufen Zhu gene promoters that have RNAPII binding peaks were filtered out as candidate enhancers. By comparative analysis and subtractive strategy, the enhancers that specifically occurs in each cell line but not overlap with any other regulatory elements interacting with promoters in the other cell lines were further identified as cell line specific enhancers.

Fosb knockdown

DsiRNA for Fosb was designed and produced by IDT with sequence1: rArCrArArUrCrUrArUrArUrCrGrUrUrGrArGrArArUrUrCTG and sequence2: rCrArGrArArUrUrCrUrCrArArCrGrArUrArUrArGrArUrUrGrUrArA. Cells were transfected using Lipofectamine RNAiMAX and then incubated for 72h for harvest (ThermoFisher

Scientific).

CldnE knockout experiment

Cas9 transgene was first introduced by infecting cells with lentivirus derived from plasmid lentiCas9-Blast (gift from Feng Zhang; Addgene plasmid # 52962), followed by Blasticidin selection, creating the 4T1_Cas9 cell lines. Then two plasmids expressing guides targeting upstream (guide sequence: GAGTGAGTTCTCAGACAGCCA) and downstream (guide sequence: GGTGTCTGTCCTTTACAGCA) of the CldnE were co-transfected with three knock- in/knock-out plasmids each carrying an EF1a promoter driven antibiotic resistance gene casette

(Hygro, Puro, Neo) flanked by 1kb homology arms upstream and downstream of the CldnE deletion region, into the Cas9-expressing 4T1_Cas9 cells. To select for multiple allelic knockouts, cells were selected simultaneously with hygromycin, puromycin, and geneticin.

Single clone knockout was isolated by flow cytometry and genotyped by PCR and Sanger sequencing. Primers for genotyping can be found in Table 3.1.

Table 3.1 Primers for genotyping Name Sequence

60 Dissertation Jacqueline Jufen Zhu

CldnE_mutant F CGGAACTTCAGTTGACTGCCTC CldnE_mutant R ctgtgttctggcggcaaacc CldnE_wildtype F CGGAACTTCAGTTGACTGCCTC CldnE_wildtype R CTATTGCAGGCTCAGGGCTAATC

qRT-PCR

Gene expression of CldnE knockouts was measured by qRT-PCR. Primers are attached in Table

3.2.

Western blot

Three CldnE knockout strains and 4T1_Cas9 as control were used in this assay. The experiment was done following the general protocol from BIO-RAD. The gel image was analyzed using FIJI to quantify the protein expression by measuring the raw intensity of the protein bands.

Wound healing and cell migration assay

Three CldnE knockout strains and 4T1_Cas9 as control were used in this assay. Culture dish with silicone insert were purchased from Ibidi (Cat. # 80206). The same number of cells for each strain was seeded in the inserts and after 24 hours culture a monolayer of cells and a defined 500 um cell free gap was formed. Images of the gap were captured every two hours till eight hours with 10x objective of a widefield microscope (Leica DMi8). Size of the gaps at each time point was measured using FIJI. The cell migration speed was calculated with a linear trend.

61 Dissertation Jacqueline Jufen Zhu

Table 3.2 Primers for qRT-PCR Name Sequence Actb F CTCTGGCTCCTAGCACCATGAAGA Actb R GTAAAACGCAGCTCAGTAACAGTCCG Cldn3 F ACCAACTGCGTACAAGACGAG Cldn3 R CAGAGCCGCCAACAGGAAA Cldn4 F GTCCTGGGAATCTCCTTGGC Cldn4 R TCTGTGCCGTGACGATGTTG Abhd11 F ACTCCCAAGCCCGAAAAC Abhd11 R CGTATCCAAGTTCAGTCTCCAG Fosb F AGGCAGAGCTGGAGTCGGAGAT Fosb R GCCGAGGACTTGAACTTCACTCG Cdh1 F CACCTGGAGAGAGGCCATGT Cdh1 R TGGGAAACATGAGCAGCTCT Wnt7a F TGAAGAGGACCCAGTGACAGG Wnt7a R GGCGTACTGGTGTGTGTTGT

62 Dissertation Jacqueline Jufen Zhu

3.3 Results

Identification of metastasis-associated genes and their regulatory elements and factors in 4T1 cell line series

The tumors from these three cell lines - 67NR, 4T07 and 4T1 - span the malignant spectrum when inoculated into the mammary fat pad of BALB/c mice, with 67NR remaining localized, 4T07 disseminating to and remaining as single cells in the lung, and 4T1 forming macroscopic pulmonary metastatic lesions. In order to examine the transcription associated with metastasis, we performed RNA-seq with 4T1 cell series and identified 743, 620 and 968 up- regulated genes in 4T1, 4T07 and 67NR, respectively, by comparative analysis (Figure 3.1A).

4T1 up-regulated genes were specifically enriched in tight junction and lipid biosynthesis (Figure

3.1B). Though tight junction usually is thought as a key component of maintaining cell adhesion and tissue integrity to suppress metastasis of invasive tumor cells161,162, in 4T1 cell model, the activated or enhanced expression of tight junction genes including Cdh1, Cldn3 and Cldn4 might function to facilitate the cell colonization instead of building a barrier for secondary tumor formation. In addition, many other 4T1 up-regulated genes identified here such as Wnt7a, Il1a and Hoxa cluster have also been reported previously to be related to cancer metastasis163–167.

To answer the question how these metastasis-associated genes were regulated in the perspective of higher-order genome organization, we analyzed the RNAPII ChIA-PET data to characterize chromatin interactions with both anchors located at BACh to identify cell line specific enhancers that were interacting with these genes. Specifically, 261 cell specific enhancers were identified to connect with 175 colonization-associated genes and 567 cell specific enhancers were interacting with 265 colonization suppressing-associated genes in 4T1 and 67NR, respectively. By analyzing this subset of enhancers and their targeting gene

63 Dissertation Jacqueline Jufen Zhu promoters, we found differential enrichment of protein-binding motifs at gene promoter and enhancer region for both classes of genes (Figure 3.1C, Figure 3.2B). Enhancer region were enriched for Fos:Jun (AP-1) family protein binding, while gene promoter region were mainly enriched for Sp1 family protein. AP-1 family is known as important protein factors that are crucial for human breast cancer metastasis168–170, but the regulation via enhancers hasn’t been stated yet. Therefore, our study suggested its bidirectional role in regulating metastasis, that is, it could either promote or attenuate metastasis progression in different cell lines.

To validate Fos:Jun family protein functions in regulating colonization-associated genes in 4T1 cells, we performed RNAi to knock down a 4T1 up-regulated gene Fosb. The gene expression of candidate targets that have Fos:Jun binding motif at their enhancer region (Figure

3.2A) were examined by qRT-PCR in Fosb knockdown cells where Fosb was down-regulated about three fold (Figure 3.1D). Cldn3 and Cdh1 expression were observed to significantly decrease, while Cldn4 and Wnt7a remained at the same expression level (Figure 3.1D) that might be due to the binding of other Fos:Jun family proteins instead of Fosb. This result added evidences to prove that Fos:Jun family protein regulate colonization-associated genes by binding at enhancer sites.

64 Dissertation Jacqueline Jufen Zhu

B A 4T1 4T07 67NR FDR = 0.05

743 4T1

620 4T07 GO keywords GO

968

67NR

-log10(FDR) Fold enrichment D C Fos:Jun 1.2 Enhancer enriched 1

e-value: 5.3e-10 0.8 site count: 59 0.6 RNAiNegCtr RNAiFosb Sp1 0.4 Promoter 0.2 enriched 0 e-value: 3.9e-23, site count: 60 Fosb Cldn3 Cldn4 Cdh1 Wnt7a

Figure 3.1 Characterization of cell up-regulated genes and associated enhancers in 4T1, 4T07 and 67NR (A) Heap map shows the differentially expressed genes in 4T1, 4T07 and 67NR cells. By comparing gene expression between cell lines using DESeq2, the conditions of defining gene upregulation are log (fold change) > 1.5, FDR < 0.01. Red color indicates upregulation. Blue color 2 indicates downregulation. (B) The enriched functions by analysis of cell line up-regulated genes are shown with FRD (left side) and fold enrichment (right side). (C) Motif analysis was done with 276 identified BACh sites located within enhancer region as well as their interacting promoters of 175 in 4T1 cells. The enhancers are enriched for Fos:Jun protein family binding, while the promoters are enriched for SP1 family binding. Specifically, 59 enhancer sites are found to have Fos:Jun binding motif and 60 sites have SP1 binding motif. (D) Gene expression level was measured in Fosb knock-down cells. Metastasis-associated genes Cldn3 and Cdh1 were significantly down-regulated in Fosb knock-down cells by RNAi.

65 Dissertation Jacqueline Jufen Zhu

A Coordinate of BACh sites Sequence P-value Interacting gene within enhancer region chr5:135443014-135443425 GGAGGCAGGT GTGACTCA CCCCT 1.53e-5 Cldn3, Cldn4

chr8:109139675-109140400 TGTCCCACTC CTGACTCA GAGTGTGACG 4.59e-5 Cdh1

chr6:91319591-91320683 GGCATTATGT ATGAGTCA AAGAATTTTC 1.23e-4 Wnt7a

… … … …

B Fos:Jun

Enhancer enriched

e-value: 3.9e-87 site count: 143 Sp1 Promoter enriched

e-value: 3.3e-30, site count: 69

Figure 3.2 Binding motif analysis (A) Examples of BACh sites located within enhancers that interact with metastasis-associated genes and have Fos:Jun binding motif. Red color indicate Fos:Jun binding motif sequence. (B) Motif analysis was done with 591 identified BACh sites located within enhancer region as well as their interacting promoters of 265 in 67NR cells. The enhancers are enriched for Fos:Jun protein family binding, while the promoters are enriched for SP1 family binding. Specifically, 143 BACh sites are found to have Fos:Jun binding motif and 69 sites have SP1 binding motif.

66 Dissertation Jacqueline Jufen Zhu

Gain or loss of functions of metastasis-associated genes by enhancer regulation

Enhancer interaction not only implicated important roles of certain transcription factors, but also suggested an accurate regulatory mechanism of metastasis-associated genes. Some of the identified genes were connected with more than one enhancer (Figure 3.3A, 3.4A, Table 3.3 and Table 3.4), building up a delicate and complex regulation system, which further implied their importance in metastasis. For instance, Wnt7a was observed to be involved in a variety of cancers171–173. In 4T1 cells, Wnt7a was found to promote breast tumor invasion and metastasis via potentiating TGFb receptor signaling163. Comparatively, our study revealed Wnt7a promoter was connecting to three potential enhancers located 20kb to 130kb away, either within gene body or at non-coding region (Figure 3.3B). Another example comes in 67NR cells. A chemokine gene Ccl7 that is broadly studied and reported to be associated with cancer progression174,175 was highly expressed in 67NR cells, while had no expression in 4T1 cells. Within a 200 kb region,

Ccl7 promoter was interacting with eight enhancers that were only present in 67NR cells (Figure

3.4B). The gain or loss of enhancers would be one of the reasons for gain or loss of function of metastasis associated genes.

67 Dissertation Jacqueline Jufen Zhu

A C

Enhancer/Gene Gene/Enhancer B D

Figure 3.3 Characterization of 4T1-specific enhancers (A) Enhancer distribution per gene in 4T1 cells was characterized. 261 identified enhancers are connecting with 175 up-regulated genes. Most of the genes are interacting with one enhancer, but the others interact with multiple enhancers. (B) The example shows a gene interacting with multiple enhancers. 4T1-upregulated gene Wnt7a is observed to interact with three enhancers. Green line indicates Wnt7a gene promoter site. Red lines indicate enhancer sites. (C) Gene distribution per enhancer in 4T1 cells is characterized. Most of the enhancers are connecting with only one 4T1-upregulated gene, but 12 of them are interacting with two identified genes and the other one is interacting with four genes. (D) The example shows an enhancer interacting with multiple genes. An enhancer is interacting with Cldn3, Cldn4 and Abdh11 directly. Cldn3 and Cldn4 are only expressed in 4T1 cells compared to 4T07 and 67NR. Though Abhd11 is not that significantly up-regulated in 4T1, decreased expression can be observed in 4T07 and 67NR. Green line indicates gene promoter sites. Red line indicates the enhancer.

68 Dissertation Jacqueline Jufen Zhu

A C 67NR 67NR

Enhancer/Gene Gene/Enhancer B D

Figure 3.4 Characterization of 67NR-specific enhancers (A) Enhancer distribution per gene in 67NR cells is characterized. 567 identified enhancers are connecting with 265 up-regulated genes. Most of the genes are interacting with one enhancer, but the others interact with multiple enhancers. (B) The example shows a gene interacting with multiple enhancers. 67NR-upregulated gene Ccl7 is interacting with eight enhancers. Green line indicates Ccl7 gene promoter site. Red lines indicate enhancer sites. (C) Gene distribution per enhancer in 67NR cells is characterized. Most of the enhancers are connecting with only one 67NR-upregulated gene, but 29 of them are interacting with two identified genes. (D) The example shows an enhancer interacting with multiple genes. An enhancer is interacting with Mdm2 and Rab1b directly. Mdm2 and Rab1b are significantly up-regulated in 67NR cells compared to 4T1 and 4T07. Green line indicates gene promoter sites. Red line indicates the enhancer.

69 Dissertation Jacqueline Jufen Zhu

Table 3.3 Multiple enhancer regulation on metastasis-associated genes in 4T1 cells Gene Symbol Enhancer Count Gene Symbol Enhancer Count Hes1 6 Fbln2 2 Cdh1 4 Gm14023 2 Endod1 4 Gm14207 2 Krt8 4 Gm15051 2 5430407P10Rik 3 Has2 2 Bdh1 3 Has2as 2 Cd24a 3 Hs3st1 2 Cgnl1 3 Inhba 2 Cldn4 3 Itga2 2 Cnnm4 3 K230010J24Rik 2 Epcam 3 Krt80 2 Filip1l 3 Lamc2 2 Gm16046 3 Lrp10 2 Pdgfb 3 Myo1d 2 Rbm47 3 Nuak2 2 Ripk4 3 Phldb2 2 Sorbs2 3 Plekha2 2 Tmprss11e 3 Pmepa1 2 Unc5b 3 Rin3 2 Wnt10a 3 Rnpep 2 Wnt7a 3 Sdc1 2 2010204K13Rik 2 Sema3f 2 5830416P10Rik 2 Susd1 2 Bhlhe40 2 Tmem51 2 Cldn3 2 Tnfaip3 2 Ctgf 2 Tuba4a 2 Dll4 2 Ubtd2 2 Dnaja4 2 Vegfc 2 Dock5 2 Wwc2 2 Epha1 2

70 Dissertation Jacqueline Jufen Zhu

Table 3.4 Multiple enhancer regulation on metastasis-associated genes in 67NR cells Enhancer Enhancer Enhancer Gene Symbol Gene Symbol Gene Symbol Count Count Count Plk2 16 Adm 3 Dnhd1 2 Ereg 10 Adrb2 3 Dysfip1 2 Gdnf 10 Amacr 3 Eml1 2 Jun 8 Ankrd57 3 Fam71f1 2 B430305J03Rik 8 Apcdd1 3 Fhl4 2 Ccl7 8 Aqp5 3 Foxd1 2 Pde2a 8 Atf1 3 Gm12981 2 2810474O19Rik 7 Csf1 3 Gm15608 2 Fgfr1 6 Dclk1 3 Gm17249 2 Gas1 6 Dhx29 3 Gm17443 2 2900052L18Rik 5 Fgf2 3 Gm2115 2 Bahcc1 5 Glipr1 3 Gpc4 2 Cdkn2a 5 Gm12824 3 Gpx8 2 Mex3b 5 Gm16508 3 Gypc 2 Sparc 5 Gm9949 3 Irx1 2 Xdh 5 Hrh1 3 Irx5 2 1700023H06Rik 4 Itga7 3 Kcnh1 2 A830082K12Rik 4 Kcnj2 3 Kitl 2 Abcb1b 4 Msx2 3 Layn 2 B230206L02Rik 4 Nt5dc3 3 Marcks 2 Cadm1 4 Osbpl5 3 Mrgprf 2 Cav1 4 Pak1 3 Naip2 2 Ccdc109b 4 Pcdh7 3 Nrg1 2 Chst1 4 Pik3r5 3 Pdpn 2 Dcun1d3 4 Pmp22 3 Plcl1 2 Dnajb4 4 Ppap2a 3 Prss42 2 Ecm1 4 Rap1b 3 Ptgs1 2 Gm11216 4 Rrm2b 3 Qrfp 2 Gm12056 4 S100a7a 3 Rab30 2 Gm9958 4 Sbno2 3 Rnf11 2 Id2 4 Tbx3 3 Rusc2 2 Lrrk2 4 Thbd 3 Sema5a 2 Lyrm1 4 Tmem74 3 Sesn2 2 Ndst3 4 Tspan17 3 Siglecg 2 Otud1 4 Zbtb20 3 Snora2b 2 Prrx1 4 1700012B09Rik 2 Snord123 2

71 Dissertation Jacqueline Jufen Zhu

Rgs2 4 9130023H24Rik 2 Tgfb1i1 2 Snai2 4 A430110N23Rik 2 Tgfbr3 2 1600021P15Rik 3 A930001C03Rik 2 Tigd5 2 1700086P04Rik 3 Aen 2 Tmem45a 2 2610037D02Rik 3 Ankrd55 2 Tusc1 2 4632434I11Rik 3 Ap3b2 2 Usp14 2 9230114K14Rik 3 Atp13a4 2 Vgll3 2 Abca4 3 Cpne2 2 Zfpm2 2

72 Dissertation Jacqueline Jufen Zhu

Multitasking-enhancers regulate more than one metastasis-associated gene

In addition to multiple enhancers being associated with a single gene, enhancers were identified that interacted and potentially regulated multiple metastasis-associated genes. In this study, we identified 13 and 29 multitasking-enhancers in 4T1 and 67NR cells, respectively

(Figure 3.3C, Figure 3.4C, Table 3.5, Table 3.6). Multitasking-enhancers are expected to be more powerful than single-tasking enhancers, since they have the potential to regulate metastasis-associated genes and pathways in a more efficient manner. Figure 3.3D and Figure

3.4D show examples of multitasking enhancers in 4T1 and 67NR.

Claudin are the most important component of cell tight junction, which stand out in 4T1 cells by gene ontology analysis. They are known to associate to breast tumor malignancy176–181.

By comparison, Cldn3 and Cldn4 were highly expressed in 4T1 cells, while no expression was detected in 4T07 and 67NR cells. A multitasking-enhancer was identified and named as CldnE interacting with Cldn3 and Cldn4 genes, as well as an abhydrolase gene Abhd11 in 4T1 cells

(Figure 3.3D), suggesting its important role in metastasis regulation. To interrogate whether the onco-enhancer is functional other than only structural, we implemented CRISPR/Cas9 to knock out the target enhancer CldnE (chr5: 135441137-135443563). Before knocking out, we prepared a 4T1_Cas9 cell line by lentivirus infection on 4T1 to constitutively express Cas9 protein. Then we transfected 4T1_Cas9 cells to replace CldnE with drug-resistant genes by homology-directed repair (HDR) (Figure 3.5A) for following knockouts selection. Considering the aneuploidy of

4T1 cells, we designed three different drug-resistant gene donors for HDR to guarantee at least three copies of CldnE were simultaneously knocked out in a single 4T1_Cas9 cell by drug selection. When we got a puromycin-, geneticin- and hygromycin-resistant knockout cell population, flow cytometry was applied to sort them into single cells. We then randomly picked

73 Dissertation Jacqueline Jufen Zhu ten single cell colonies that were genotyped (Figure 3.6A, 3.6B) to have interrupted CldnE for transcription quantification. Expectedly, by qRT-PCR, we observed a dramatic decrease of the target gene expression (Figure 3.5B), validated the regulatory function of multitasking-enhancer

CldnE. We also performed Western blot to investigate the protein expression of Cldn3 and Cldn4 in three single cell CldnE knockout clones. Compared with the control cells, one of clones had

Cldn3 and Cldn4 protein dramatically down-regulated; the other two clones also showed decreased expression though not that significant except P1C10 expressed about the same level of

Cldn4 with wildtype cells (Figure 3.6C). The statistics showed us Cldn3 protein was significantly down-regulated to 0.6-fold of the control and Cldn4 was about 0.8 fold.

Furthermore, to examine if metastasis potential has been changed within these CldnE knockouts, we performed wound healing assay, which showed the knockouts migrated significantly slower than the control (Figure 3.5C, 3.5D), without observing a significant difference in cell proliferation. In addition, mouse implantation was conducted by injecting CldnE knockouts or control cells into nude mice. 30 incidences in lung (ten incidences for each cell line) were observed to count for micrometastasis in two CldnE knockouts and one control cell line. The micrometastasis is defined as metastatic seeds that have not yet development into visible macrometastatic tumors but can be observed as speckles. Though no significant difference was seen in macrometastatic pulmonary tumors (Figure 3.6D), micrometastasis was observed by

H&E staining significantly less frequently occurred (P-value = 0.02235) in CldnE knockouts

(Figure 3.5E). Therefore, we proved the pivotal role of Cldn3, Cldn4 genes and provided insights into the potential therapeutics in cancer metastasis by targeting on enhancers.

74 Dissertation Jacqueline Jufen Zhu

A C

B D

E

Figure 3.5 Functional validation of CldnE knockout by CRISPR (A) The schematic shows CRISPR knock-out strategy. HDR are introduced to both sides of the CldnE enhancer region. Three different antibiotics are knocked in when enhancers are knocked out to increase the probability of enhancer knock-out at more than three alleles in single cell by further antibiotics selection. (B) qPCR is done to measure the target gene expression in control and ten single cell clone CldnE knockouts. Cldn3, Cldn4 and Abhd11 are all significantly down-regulated in the knockouts. “*” indicates P-value is less than 0.05 by t-test. (C-D) Wound healing and cell migration assay. (C) Three single cell clone knockouts are selected for wound healing and cell migration assay. Images were taken every two hours till eight hours from the start. Here only 0-, 4- and 8-hour images are shown. Compared with control cells, knockout cells are observed to migrate slower. (D) Migration speed is calculated based on the measurement from the images. The control cells migrate at the speed of 116 μm/h, while knockouts migrate at 62 μm/h. (E) Micrometastasis was counted in knockouts and control cells. Ten incidences were observed for each cell lines. One incidence is one speckle observed in H&E staining.

75 Dissertation Jacqueline Jufen Zhu

A C

4T1_Cas9 4T1_Cas9_koCldnE_P1C104T1_Cas9_koCldnE_P2D24T1_Cas9_koCldnE_P3D4

Cldn3 1 0.72 0.77 0.35

Cldn4 1 1.06 0.85 0.56

Gapdh

1.2

1

0.8

0.6 4T1_Cas9 4T1_Cas9_koCldnE 0.4

0.2

0 Cldn3 Cldn4 B

D

76 Dissertation Jacqueline Jufen Zhu

Figure 3.6 Characterization of CldnE knockouts (A) Gel picture shows CldnE wildtype and knock-out DNA fragments. The upper bands indicate CldnE knockout genotype, which only shows up in the ten single cell clone knockouts but not in

control. The lower bands indicate CldnE wildtype genotype, which shows up in all the samples, suggesting CldnE knock-out is not complete in the ten knockout cells. (B) CldnE knockout genotype is sequenced. The sequence spans the EF1α promoter in the targeting constructs to CldnE flanking region. (C) The upper panel shows Western Blot result of Cldn3, Cldn4 and Gapdh in 4T1_Cas cells and CldnE knockouts. The raw intensities of protein bands are measured using FIJI. Normalized by Gapdh, Cldn3 and Cldn4 expressed in CldnE knockouts are compared with 4T1_Cas9. The number underlying each band indicates the relative intensity of the band. The lower panel shows the statistics of Cldn3 and Cldn4 protein level in CldnE knockouts. (D) Primary, pulmonary metastatic tumor weight were measured in mouse implanted with knockouts and controls. Normalized metastasis was calculated.

Table 3.5 Multi-tasking enhancers in 4T1 cells Enhancer Coordinate Gene Count chr6:52023797-52028187 4 chr12:103004008-103005806 2 chr12:82719533-82721279 2 chr14:56342201-56344158 2 chr15:37049761-37051327 2 chr15:56552071-56554508 2 chr15:56672483-56673912 2 chr2:119136894-119138322 2 chr2:119139504-119140698 2 chr2:129236520-129240516 2 chr5:135430733-135444258 2 chr7:141450421-141454102 2 chr9:110582929-110584694 2

77 Dissertation Jacqueline Jufen Zhu

Table 3.6 Multi-tasking enhancers in 67NR cells Enhancer Coordinate Gene Count chr10:117303670-117305698 2 chr11:120113330-120115578 2 chr11:120165218-120167654 2 chr11:120167679-120170193 2 chr11:120176495-120179043 2 chr11:120213812-120216549 2 chr13:113650957-113653080 2 chr13:113991211-113994366 2 chr14:31716840-31718849 2 chr15:32251683-32253349 2 chr15:32444308-32445350 2 chr18:62209005-62211249 2 chr18:62321109-62323590 2 chr18:62362078-62364487 2 chr3:32407170-32410547 2 chr4:132000670-132003197 2 chr4:140982579-140984296 2 chr4:95527757-95529526 2 chr5:18422249-18423650 2 chr7:127030594-127032230 2 chr7:127104239-127105575 2 chr7:127240748-127242263 2 chr7:135487713-135489437 2 chr7:87274020-87276902 2 chr7:99863142-99865504 2 chr7:99980076-99982095 2 chr8:33211357-33212580 2 chr8:33310000-33311769 2 chr9:110683374-110685628 2

78 Dissertation Jacqueline Jufen Zhu

3.4 Discussion

Though the epithelial to mesenchymal transition is usually considered as a marker of metastasis, 4T1 mouse cell model showed a totally different mechanism. Metastatic 4T1 cell has more epithelial features while non-metastatic 67NR is more likely to be in mesenchymal stage.

This phenomenon demonstrates the mesenchymal features are not always the markers or drivers for metastasis. Though theoretically, the mesenchymal cells are more likely to metastasize, other steps of metastasis can also be crucial such as colonization. 4T1 cells have tighter cell junction, likely to circulate in colony and colonize to the secondary tumor site more easily than 67NR cells.

It is interesting that Fos:Jun was found to be involved in enhancer regulation in both metastatic and non-metastatic cells. It is hard to know if the same Fos:Jun proteins function oppositely in different cells or different Fos:Jun proteins take care of different metastatic potentials.

Enhancer has been well studied to regulate genes and associate with diseases. However, multitasking enhancer regulating cancer development is rarely studied. These powerful enhancers could be the potential targets for therapeutics.

By analyzing chromatin interactions between DNA elements and gene promoters, we should be able to identify other regulatory elements such as silencers and insulators besides enhancers. Enhancers are well-studied and found to play important roles in activating or enhancing gene transcription. While, we know relatively little about silencers, which functions in the opposite to enhancers, silencing or weakening gene transcription. Insulators work to insulate certain connections between gene and regulatory elements or between different chromatin domains. Genome architectural proteins such CTCF and cohesin are often seen to bind with insulators to cooperate insulation. For examples, the boundaries of TADs or insulated

79 Dissertation Jacqueline Jufen Zhu neighborhood domains function as insulators to regulate transcription. Therefore, CTCF ChIA-

PET could be used to examine insulator distribution. Similar to enhancer analysis, comparative analysis of chromatin interactions in 4T1 cell line series could identify metastasis-associated silencers and insulators or chromatin domains that participate in metastasis progression.

I mainly focused on intra-chromosomal chromatin interactions in this thesis. However, inter-chromosomal interactions also exist and play important roles in genome organization and cellular function. To more comprehensively understand the associations of chromatin organization and gene transcription, inter-chromosomal interactions between promoters and regulatory elements should not be ignored. Preliminarily, I have observed interactions between metastasis-associated genes and multi-tasking enhancers inter-chromosomally by DNA FISH

(Figure 3.7A, Table 3.7). In metastatic 4T1 cells, the co-localization frequency of a distinct Myc enhancer (named MycE) and other metastasis associated genes in different chromosomes is significantly higher than in non-metastatic 67NR cells and normal NIH3T3 cells (Table 3.7) as well as controls in 4T1, whereas the differences of co-localization frequency between target pairs and controls in 67NR is not as significant as in 4T1 (Table 3.7), indicating an involvement of

MycE in regulating metastasis associated genes in trans in metastatic cells. Furthermore, we have validated the function of MycE by CRISPR knockout. In MycE knockouts, metastasis associated genes such as Cldn3, Cldn4, Cdh1 and Wnt7a were significantly down-regulated as well as Myc

(Figure 3.7B). Considering Myc may have influences on other genes, we overexpressed Myc in

MycE knockouts (Figure 3.7C) and measured the expression of candidate MycE regulating genes again. The results showed Cdh1 expression was not rescued in Myc overexpressed MycE knockouts (Figure 3.7C) validating MycE is promoting metastasis-associated gene expression at least on Cdh1.

80 Dissertation Jacqueline Jufen Zhu

A MycE MycE Cldn3,4 Cldn3,4 f

4T1 67NR

B C

6 Myc 1.2 Cdh1 5 1 4T1_Cas9 4T1_Cas9 4 0.8 3 4T1_Cas9_koMycE 0.6 4T1_Cas9_koMycE 2 _Ctr 0.4 _Ctr 4T1_Cas9_koMycE 4T1_Cas9_koMycE 1 _Myc 0.2 _Myc 0 0 P1C3 P2D2 P2B6 P1C3 P2D2 P2B6

Figure 3.7 Chromatin interaction between MycE and metastasis-associated genes in trans (A) The co-localization of MycE and Cldn3,4. (B) Gene expression in MycE knockouts. By qRT-PCR measurement of transcription in 13 MycE knockouts, Cldn3, Abhd11, Cdh1 and Wnt7a were significantly down-regulated as well as Myc. (C) Gene expression in Myc overexpressed MycE knockouts. In Myc overexpressed 4T1_Cas9_koMycE_Myc cells, Cdh1 expression were not rescued compared with control 4T1_Cas9_knoMycE_Ctr and its parental 4T1_Cas9 cells by qRT-PCR.

Table 3.7 Co-localization frequency of MycE and metastasis associated genes MycE- MycECtr- MycECtr- MycE- MycE- MycECtr- MycE- MycECtr- Cell Cldn CldnCtr Cldn CldnCtr Wnt7a Wnt7a Cdh1 Cdh1

4T1 24.4% 5.3% 10.3% 13.6% 23.2% 9.0% 22.5% 6.4%

67NR 12.0% 15.4% 6.9% 10.7% 18.3% 13.5% 19.6% 10.7%

NIH3T3 6.8% 7.8%

81 Dissertation Jacqueline Jufen Zhu

Chapter 4: Identification of a core metastasis

susceptibility network by integrating genetic,

epigenetic and chromatin interaction analysis

82 Dissertation Jacqueline Jufen Zhu

Chapter 4

4.1 Introduction

Unbiased screens, such as meiotic genetic susceptibility analysis, provides one method to begin to implicate pathways and genes in difficult-to-observe phenomenon. Genetic screens can be performed to identify inherited polymorphisms that affect the final outcome of a phenotype, such as overt metastatic disease, without the need for pre-existing hypotheses about where in the metastatic cascade a particular factor acts. SNPs that are found to be associated with differential metastatic capacity in tumor progression can be used to retrospectively construct the probable biology along the metastatic cascade on which a gene or genes act without direct observation, much like clues used to reconstruct events at a crime scene.

Based on this premise, we developed a genetic screen that demonstrated a significant role of inherited polymorphisms in the progression of a transgene-induced model of mammary tumor metastasis130,131. We utilized a strategy that integrated a subtractive SNP screen between closely- related inbred strains with differing metastatic potentials to identify polymorphisms that were enriched for metastasis suppressing functions126. To further filter the SNPs, we subsequently focused only on SNPs that mapped within DHS of a publicly available mouse mammary tumor cell line. This strategy was based on human studies that indicate that the majority of disease- associated polymorphisms are non-coding124 and therefore likely mediate their effects through subtle changes in gene transcription125,126. Putative candidate genes were identified by proximity to polymorphic DHS sites and subsequent analysis indicated that these candidates were enriched for factors and mechanisms associated with progression in both mouse models and human breast cancer patients126.

83 Dissertation Jacqueline Jufen Zhu

Despite the success of these studies, there are limitations such as the ignorance of distal regulatory elements that connect with certain metastasis associated genes. Therefore, we refine the integrated strategy by specifically focusing on candidate genes enriched for the quiescent-to- proliferative conversion by using BACh sites identified in three cell lines of the 4T1 cell line series, which represent different stages along the metastatic cascade182. Furthermore, BACh-to- gene assignment was refined by the use of RNAPII ChIA-PET analysis, which specifically maps chromatin interactions mediated by RNA polymerase II, to provide physical evidence of polymorphic BACh interaction with gene promoters.

The work in this chapter was mostly done by Kent Hunter group in National Cancer

Institute. I have contributed in doing the chromatin interaction related analysis.

84 Dissertation Jacqueline Jufen Zhu

4.2 Materials and Methods

Computing environment

All computations at NCI were performed on the NIH helix/biowulf system, documentation of which is available at https://helix.nih.gov. We used R computing environment, Perl scripts,

Bedtools, and UCSC liftOver for most of the analyses.

Identification of polymorphic BACh sites

The workflow consisted of the following: 1) the BACh data were filtered for the regions overlapping with polymorphic sites. Since the BACh data were generated in Genome Build mm9, we used UCSC mm9 snp128 data to restrict the BACh sites. 2) Vcf were filtered to retain the SNPs that overlap with the BACh present in the 4T1, 4T07 and 67NR cell lines. 3) SNPs were removed in the BACh that are present in the mouse FVB/NJ strain.

Effect of Pioglitazone treatment on pulmonary metastases

1x105 cells/mouse of the mammary tumor cell line 6DT1 were orthotopically implanted into the fourth mammary fat pad of female FVB/NJ mice (6-8 weeks old). Ten days post-implantation, all of the animals were combined into a single cage then randomly assigned to treatment or control group by alternating assignment to new cages. Pioglitazone (100µM) or vehicle (DMSO,

0.17%) was added to drinking water 10 days post tumor implantation and available to mice ad libitum. Drinking water with Pioglitazone or DMSO was refreshed every week. Tumor growth was monitored and animals were euthanized 28 days post implantation. Tumors and lungs were evaluated for weight and surface metastases respectively. The experiment was performed twice and the results were analyzed using the Mann-Whitney test and the results of the independent experiments combined using the weighted Z-method183. All procedures were performed as approved by the NCI-Bethesda Animal Care and Use Committee.

85 Dissertation Jacqueline Jufen Zhu

Effect of Dexamethasone treatment on pulmonary metastases

1x105 cells/mouse of the mammary tumor cell line 6DT1 were orthotopically implanted into the fourth mammary fat pad of female FVB/NJ mice (6-8 weeks old). 10 days post-implantation all of the animals were combined into a single cage, then randomly assigned to treatment or control group by alternating assignment to new cages. Dexamethasone was administered by intraperitoneal injection at 5 mg/kg of body weight in PBS daily for 21 days. At the end of the treatment period, animals were euthanized and tumor weights and pulmonary metastatic lesions enumerated. The experiment was performed twice and the results of the Mann-Whitney test were combined the weighted Z-method183. All procedures were performed as approved by the

NCI-Bethesda Animal Care and Use Committee.

86 Dissertation Jacqueline Jufen Zhu

4.3 Results

Correlation of the number of BACh sites in the 4T1 cell line series with metastatic potential

To identify BACh that are associated with the colonization of metastatic secondary sites, analysis of three of the cell lines from the mouse 4T1 mammary tumor cell line panel was performed. BACh analysis was performed to identify metastatic 4T1 cell specific sites (Figure

4.1A). Based on the hypothesis, genes associated with these BACh would be enriched for factors necessary for the conversion of dormant solitary tumor cells to a proliferative multicellular metastatic lesion.

Biological duplicate BACh analysis was performed on each cell line. Analysis comparing the replicate samples indicated a high degree of reproducibility (R>= 0.90) (Figure 4.2A). The data for the replicate experiments were pooled and the BACh patterns for the three cell lines examined. Comparison of the BACh between the different cell lines revealed a total of 3580

4T1-specific BACh sites that might be associated with metastasis susceptibility genes (Figure

4.1B). In addition, an unexpected significant inverse correlation was observed between the malignancy of the cell lines and the number of BACh, with a total of 43,112 BACh observed in the benign 67NR cell line, 38,106 in 4T07, and 27,156 in the fully metastatic 4T1 cell line. To confirm that the loss of BACh was not due to a technical artifact, the data were analyzed for a set of reference BACh in each cell line (Figure 4.2A, 4.2B and Figure 4.1C). The analysis indicated that approximately 90% of the reference BACh were successfully captured in each cell line, indicating that the decrease in BACh across the cell lines was not likely due to the relative efficiency of BACh capture in each line (Figure 4.1C).

87 Dissertation Jacqueline Jufen Zhu

The relative position of the BACh to known TSS was then examined to determine if specific classes of BACh were progressively lost during increased metastatic progression. The number of BACh sites associated with promoter elements - defined as regions 2.5 kb upstream of the transcriptional start site to 2.5 kb downstream - was approximately equivalent across all three cell lines. This suggested that approximately the same number of genes were transcriptionally active across the cell lines; subsequent analysis of RNA-seq data was consistent with this interpretation. In addition, RNA content per cell was not found to be significantly different among the three cell lines, indicating that a general decrease in transcription had not occurred. In contrast to the promoter BACh, BACh within , overlapping exons, or residing either upstream or downstream of TSS all showed progressive loss across the cell lines. Analysis of the relative position of the retained versus lost BACh revealed that BACh farther away from transcriptional start sites were preferentially lost as malignancy increased (Figure 4.2D).

Stratification by genomic position further revealed that preferential loss of distant BACh was restricted to BACh outside of gene bodies, suggesting a progressive closure of potential distant enhancer elements as the mammary tumor cell lines gained metastatic potential.

88 Dissertation Jacqueline Jufen Zhu

A B

C

Figure 4.1 Characterization of chromatin accessibility by BAChs (A) Schematic of the different classes of Benzonase accessible chromatin sites (BAChs) for the 4T1 cell line series. Left is a depiction of the tumor phenotypes in three cell lines, with the localized 67NR tumor at the top, the disseminating but not metastatic 4T07 in the middle, and the highly metastatic 4T1 tumor at the bottom. BACh sites are represented by the green peaks. The 4T1-specific BAChs are indicated by the red box. (B) Venn diagram of the BACh site overlap in the 67NR, 4T07 and 4T1 cell lines. (C) Lineage plots of BAChs in the 4T1 cell line series. Dark blue represents sites present in 67NR that persist or reoccur in 4T07 and 4T1. Brown indicates BACh sites that appear in 4T07 that persist in 4T1. Light blue indicates BACh sites that appear specifically in 4T1. Numbers in the lineage plots indicate the number of BACh sites in each category for the three cell lines. The annotation for each genomic and sub-genomic analysis is indicated above the individual lineage plots.

89 Dissertation Jacqueline Jufen Zhu

A B

C D

90 Dissertation Jacqueline Jufen Zhu

Figure 4.2 Evaluation and characterization of BACh sites (A) BACh site replication, overlap and genomic annotation data for the duplicate samples for 67NR, 4T07 and 4T1 cell line. The left figure in each row shows the correlation of the BACh for each duplicate sample. The Venn diagram in the center displays the overlap of BACh identified in the cell line replicates. The histogram on the right shows the overlap of DHS with respect to gene features. (B) Lineage plots of BACh in the 4T1 cell line series. Dark blue represents sites present in 67NR that persist or reoccur in 4T07 and 4T1. Brown indicates BACh sites that appear in 4T07 that persist in 4T1. Light blue indicates BACh sites that appear specifically in 4T1. Numbers in the lineage plots indicate the number of BACh sites in each category for the three cell lines. The annotation for each genomic and sub-genomic analysis is indicated above the individual lineage plots. (C) BACh sites evaluation by PRC. A total of 8,587 peaks were found to be present in all 133 samples. PRC was obtained to evaluate the level of digestion by Benzonase. Our pooled samples with PRC of 90% or greater indicate that all 8,587 sites were present in the sample. Two biological replicates with acceptable PRC were selected and pooled for each cell line, and each pooled data set showed over 90% PRC. (D) Plot of the relative position of persistent or lost Benzonase accessible chromatin sites (BAChs) to transcriptional start sites (TSS) across the 4T1 cell line series. BACh sites that are maintained across the cell lines are represented by the blue line, while lost BAChs are represented by the red line.

91 Dissertation Jacqueline Jufen Zhu

Identification of polymorphic BACh sites and associated genes

Next, we sought to filter the 3580 4T1-specific BACh for those likely to be associated with inherited susceptibility to developing metastatic disease. Previously, we demonstrated that a subtractive strategy comparing phylogenetically closely-related mouse strains with different metastatic capacities was an effective method of identifying potential metastasis-associated genes. Here, by focusing on the 3580 4T1-specific BACh sites and multiple paired high- and low-metastatic mouse strains, we further refined this strategy to identify a common set of metastasis-associated candidates enriched for genes potentially involved in the dormant-to- proliferative colonization switch. Five different mouse strain pairs were selected for this analysis based on close phylogenetic relationships and positions on either end of the inherited metastatic efficiency spectrum (Figure 4.3A, 4.3B).

To identify polymorphic BACh that might influence the transcription of metastasis- associated genes, SNPs present in the low metastatic strain were extracted from those present in its closely-related highly metastatic relative to enrich for variants that suppress metastatic disease

(Figure 4.3C). To specifically enrich for those polymorphisms affecting transcription of potential colonization-suppressing genes, the variants were filtered for those that fall within the 3580 4T1- specific BACh. This resulted in 1053-7396 SNPs present in 339-2254 BACh sites for the five strain pair comparisons (Table 4.1).

It is known that enhancer elements can be at a considerable distance from their target genes and chromosomal looping can result in enhancer-promoter contacts that reach over intervening genes. Polymorphic BACh-gene assignments based only on proximity therefore likely results in a significant degree of both false-positive and false-negative associations. To improve the accuracy of candidate gene discovery we therefore performed RNAPII ChIA-PET in

92 Dissertation Jacqueline Jufen Zhu the 4T1 cell line identify RNAPII mediated chromosomal contacts between polymorphic BACh and active or poised promoters. Polymorphic BACh were used to identify initial intra- chromosomal loops connecting the BACh with distant positions. The initial ChIA-PET end pairs were then used to seed a walking strategy to identify gene promoters in close proximity to polymorphic BACh via intra-chromosomal looping (Figure 4.3C). A total of 358,508 paired end tags were identified by the RNAPII ChIA-PET analysis in 4T1 cells, of which 85,853 represented intra-chromosomal looping events. Interestingly, a similar reduction of the number of RNAPII binding peaks was observed across the 4T1 cell line series as seen for the BACh sites, with 29971 RNAPII binding peaks in 4T1, 41809 in 4T07 and 40513 in 67NR cells (Figure

4.4A). Similar trend of BACh sites and RNAPII binding peaks, indicates a positive correlation of chromatin accessibility and transcriptional activity.

Applying the polymorphic BACh/ChIA-PET walking strategy to each of the five mouse strain pairs resulted in the identification of 587-2123 genes (Table 4.1) that were in contact with polymorphic BACh and whose expression might therefore be influenced by the inherited polymorphisms. The resulting gene lists for each of the mouse strain pair subtractions were then compared to identify a core set of candidate metastasis susceptibility genes. 338 genes were found to be in common in the five separate analyses, potentially representing a common set of functions necessary for metastatic progression in the PyMT mouse mammary tumor model

(Table 4.2; Figure 4.3D). To determine whether a significant fraction of these genes might represent identity by descent due to the shared inheritance of common inbred mouse strains, the overlap of polymorphic BACh and SNPs was also examined. Only nine polymorphic BACh were common to the five different strain combinations (Figure 4.4B), and no SNPs were shared

93 Dissertation Jacqueline Jufen Zhu

(Figure 4.4C), indicating that different inherited polymorphisms and haplotypes were tagging the same set of genes through long range intra-chromosomal architecture.

To determine whether the core metastasis susceptibility candidate genes might share common transcriptional regulatory elements, promoter motif analysis was performed using the

GREAT tool. Five motifs were observed to be over-represented in all five strain comparisons, including the previously characterized Sp1 and ELK1 transcription factor binding sites (Table

4.3), which is consistent of the promoter binding motif analysis in Chapter 3 (Figure 3.1C). The three additional sequence motifs have not previously been associated with transcription factor binding. These motifs may therefore represent binding sites for novel transcription factors with important roles in metastatic breast cancer progression (Table 4.3).

94 Dissertation Jacqueline Jufen Zhu

A C

B

D

95 Dissertation Jacqueline Jufen Zhu

Figure 4.3 Gene identification by ChIA-PET and polymorphic BACh analysis (A) Phylogenetic relation of the pairs of high- (blue) and low- (red) metastatic mouse strains used in the analysis are shown. (B) Histogram of relative metastatic capacity in the selected strain pairs across the full inbred strain survey. Y-axis represents the average number of metastases observed per animal. X-axis indicates the identity of the inbred strain bred to the PyMT model. The number of animals analyzed in each breeding is indicated below the x-axis. Blue bars indicate strains that displayed no significant difference in metastatic capacity compared to the original FVB background of the PyMT mammary tumor model. Red bars indicated strains that showed statistically significant reduction of metastatic disease, calculated by pairwise Mann-Whitney t tests. (C) Schematic of the analytic genome-wide strategy used to identify potential metastasis susceptibility genes. Single nucleotide polymorphisms that distinguish the phylogenetically highly related high and low strains were filtered for those that fall within 4T1-specific BACh sites and are therefore most likely associated with alterations in transcriptional activity. The polymorphic BACh sites were then used to nucleate a walking strategy with the RNAPII ChIA-PET looping data to identify gene promoters that were in contact with the BACh site. (D) Venn diagram showing 338 genes in common were identified as having their transcriptional start sites tagged by the polymorphic BACh-ChIA-PET walking strategy in all five strain pair combinations.

96 Dissertation Jacqueline Jufen Zhu

A 67NR 13877

4T1 4522 18734

11940

4T07 B

C

Figure 4.4 Venn diagrams of BACh sites, RNAPII binding peak and SNPs (A) Similar amount of shared and unique sites was identified in BACh sites and RNAPII binding peaks. (B) Analysis of the overlap of polymorphic DHS in the five comparisons revealed only 9 polymorphic BACh in common. (C) SNP analysis in the polymorphic BACh revealed that no variants were common for the five strain pair comparisons.

97 Dissertation Jacqueline Jufen Zhu

Table 4.1 Results of Integrated DHS, SNP and ChIA-PET screening strategy MOLF Mouse Strain RF subtracted C58 subtracted C57BR subtracted SEA subtracted subtracted by Pairs by AKR by C57BL10 by C57BL10 by BALB CAST SNPs in DHS 1053 1234 1236 2571 7396 Polymorphic DHS 339 373 399 802 2254 Tagged TSS 587 616 690 1155 2123

Table 4.2 Putative Common Core Metastasis Susceptibility Genes Gene Symbol 1110008P14Rik Csnk2b Gstp1 Il1a Pacsin3 Src 1700092M07Rik Ctsw Gstp2 Il1b Pak6 Srrm2 1810009A15Rik Cyb561a3 Gtf2h4 Incenp Pcnxl3 Ssfa2 1810055G02Rik D030056L22Rik Guca1b Ints5 Pfdn4 Ssna1 4933433C11Rik D17H6S53E H2-D1 Itprip Phf19 Ssrp1 Abhd16a D17H6S56E-5 H2-K1 Jmjd8 Pigyl Sssca1 Abt1 D730039F16Rik H2-Q4 Kat5 Pitpnm1 Stx5a Adrm1 Dagla H2-T23 Kremen2 Pkmyt1 Surf4 Agpat2 Ddah2 H2-T24 Lama5 Pmepa1 Surf6 Ahnak Ddb2 Hat1 Lnpep Pnpla7 Suv420h1 AI462493 Ddr1 Hcfc1r1 Lpin2 Polr2g Synj2 AI837181 Ddx39b Hist1h1a Lrrc26 Ppp1r18 Taf6l Alkbh3 Decr2 Hist1h1b Lrrn4cl Ppp2r5b Tceb2 Anapc2 Dhx16 Hist1h1c Lrsam1 Procr Tgif1 Anxa1 Dnmt3b Hist1h1d Lsm2 Prrc2a Thbs4 Ap5b1 Doc2g Hist1h1e Ltbp3 Psat1 Thoc6 Apom Dpf2 Hist1h1t Ly6g5c Psmd5 Tle4 Arfgap2 Drap1 Hist1h2ab Ly6g6c Psors1c2 Tmem138 Arhgap40 Dus3l Hist1h2ac Ly6g6f Rab31 Tmem151a Arhgef28 Dusp5 Hist1h2ad Lyst Rab3il1 Tmem179b Arpc5l Ecsit Hist1h2ae Lzts2 Rabepk Tmem203 Arrdc1 Edem2 Hist1h2af Manbal Rad9a Tmem210 Asb13 Eef1g Hist1h2ai Map3k11 Rbl1 Tmem223 Atat1 Efemp2 Hist1h2ak Mdc1 Rbm39 Tmem258 Atp5e Efna5 Hist1h2an Med22 Rela Tmem8 Atp6v1g2 Ehbp1l1 Hist1h2bb Minpp1 Rgs19 Tnfrsf12a Atrn Ehmt1 Hist1h2bc Mroh8 Rin1 Tor4a Atrnl1 Ehmt2 Hist1h2be Mrpl11 Ring1 Tprn AU023871 Eif1ad Hist1h2bf Mrpl28 Rnaseh2c Traf1 Axin1 Elof1 Hist1h2bg Mrpl41 Rnf208 Tsga10ip B3gat3 Eml3 Hist1h2bh Mrps18b Rnf24 Ttc9c B3gnt1 Fads3 Hist1h2bm Msh5 Rom1 Tubb4b B4galt7 Fam166a Hist1h2bp Mta2 Romo1 Tubb5

98 Dissertation Jacqueline Jufen Zhu

Bag6 Fam69b Hist1h2br Mtl5 Rpl35 Tut1 Banf1 Fam89b Hist1h3a Mus81 Rpl7a Ubxn1 Blcap Fbxw2 Hist1h3b Mybpc3 Rpl9-ps6 Uqcrb Brd3 Fen1 Hist1h3c Myl12a Rpn2 Urm1 Brms1 Fibp Hist1h3d Myl12b Rps21 Vars Cables2 Flot1 Hist1h3e Ncs1 Rrbp1 Vars2 Camsap1 Flywch2 Hist1h3f Ndor1 Sart1 Vwa7 Ccdc147 Fnbp4 Hist1h3g Ndufv1 Scd2 Wdr43 Ccdc64b Fosl1 Hist1h3h Nelfb Scgb1a1 Wdr5 Ccdc85b Frmd8 Hist1h3i Nelfe Scyl1 Wdr74 Ccdc94 Fth1 Hist1h4a Neu1 Serinc5 Wdr85 Cd248 Gal3st3 Hist1h4b Neurl1b Serpinb6a Xrn2 Cdk2ap2 Ganab Hist1h4c Nfe2l2 Sf3b2 Yif1a Cds2 Ggta1 Hist1h4d Nfkbil1 Sgms1 Zbtb3 Cep110 Ghrh Hist1h4f Nfs1 Skiv2l Zc3h6 Cep250 Gm10197 Hist1h4h Nrarp Slc25a12 Zc3h8 Cfl1 Gm10762 Hist1h4j Nrm Slc25a45 Zfp184 Chka Gm10767 Hist1h4k Nudt8 Slc29a2 Zfp213 Ckap2l Gm10800 Hnrnpa3 Nxf1 Slc34a3 Zfp653 Cldn6 Gm16181 Hnrnpul2 Ola1 Slc3a2 Zmynd19

Cldn9 Gm711 Hsd17b12 Olfr1424 Smc3

Clic1 Gm757 Hspa1b Osbpl2 Smndc1

Cnn1 Golga1 Hspa5 Ovol1 Snx32

Commd7 Gpank1 Ier3 P2rx3 Snx9

Table 4.3 Significantly enriched promoter motifs in core metastasis susceptibility candidate genes Hyper FDR Q-Val RF C58 C57BR SEA MOLF Transcription Factor Motif subtracted subtracted subtracted subtracted subtracted from AKR from C57BL10 from C57BL10 from BALB from CAST (no known TF) KRCTCNNNNMANAGC 1.90E-28 7.68E-28 2.03E-26 4.58E-20 2.93E-15 Sp1 GGGCGGR 1.61E-06 0.028 2.72E-15 6.40E-13 2.53E-22 (no known TF) TTTNNANAGCYR 4.64E-15 1.45E-15 6.36E-12 2.63E-11 8.76E-06 ELK1, member of ETS oncogene SCGGAAGY 0.027 0.032 0.0065 9.88E-08 1.02E-05 family (no known TF) GATTGGY 0.00064 0.0099 0.0035 0.0034 2.68E-05

99 Dissertation Jacqueline Jufen Zhu

Disease and function analysis of metastasis susceptibility candidates

Literature searches were performed for the 338 candidate metastasis susceptibility genes to investigate previous associations with metastatic disease. Two genes (Brms1, Mta2) among the 338 have previously been extensively studied for their role in metastatic disease184,185.

Moreover, an additional 30 genes have previously been implicated in phenotypes associated with metastatic disease, including invasion, migration, epithelial-to-mesenchymal transition (EMT) and prognosis, consistent with the screen enriching for metastasis-associated factors.

To better understand the potential molecular and cellular functions represented from the

338 genes, Ingenuity Pathway Analysis was performed. Disease and Function analysis revealed two major categories associated with the putative core gene set (Figure 4.5A; Table 4.4). The first category was formation of nucleosomes and DNA replication. The second association was with diabetes mellitus. shRNA-mediated knockdown of peroxisome proliferator-activated receptor (PPAR), a target molecule for diabetes, was therefore performed and found to increase metastatic capacity in a tumor-autonomous manner (Figure 4.5B, 4.5C). Treatment of animals with Pioglitizone, a PPAR antagonist, reduced metastatic capacity of the tumors (Figure 4.5D,

4.5E). Together, these studies provide support that the metabolic diabetes mellitus axis may be part of the common breast cancer metastatic machinery that might be exploitable for clinical intervention of metastatic colonization.

100 Dissertation Jacqueline Jufen Zhu

A.

B. Pulmomonary Metastases D. Pulmomonary Metastases

100 p= 0.026 150 p= 0.023

80 nodules nodules 100 60

40 metastatic metastatic 50 20 ung ung fl fl 0 0 #o #o e rol h197 s ont crambl C S ioglitazone P

C. Tumor weight E. Tumor weight

2.0 2.5 n.s.

) n.s. ) 2.0 (g 1.5 (g

1.5 1.0 Weight Weight 1.0 0.5

Tumor 0.5 Tumor

0.0 0.0 e rol h197 s ont crambl C S ioglitazone P

Figure 4.5 Disease and functional pathway analysis (A) Disease and Function results of the Ingenuity Pathway Analysis of the core 338 genes. (B -E) Effects of Pparg (panels B and C) or Pioglitizone (panels D and E) on pulmonary metastases (B and D) and the primary tumor growth (C and E) of the 6DT1 allograft mammary tumor cells. The figure represents the combined results of two independent experiments for each experimental condition. P-values were calculated for each experiment by two-tailed Mann Whitney test, using Graph Pad Prism v 7, and subsequently combined using the weighted Z-method.

101 Dissertation Jacqueline Jufen Zhu

Table 4.4 IPA Disease and Function Analysis of Core Metastasis Susceptibility Genes Diseases or Functions Categories p-Value Annotation Cancer, Organismal Injury and Abnormalities, Reproductive breast or ovarian cancer 8.77E-03 System Disease Cancer, Organismal Injury and Abnormalities breast or ovarian carcinoma 2.04E-02 Gene Expression transcription of RNA 1.39E-02 Immunological Disease systemic autoimmune syndrome 3.06E-06 Cancer, Endocrine System Disorders, Organismal Injury and ovarian cancer 2.09E-02 Abnormalities, Reproductive System Disease Connective Tissue Disorders, Inflammatory Disease, Skeletal Rheumatic Disease 5.97E-05 and Muscular Disorders Metabolic Disease glucose metabolism disorder 3.05E-03 Connective Tissue Disorders, Inflammatory Disease, inflammation of joint 1.90E-05 Inflammatory Response, Skeletal and Muscular Disorders Endocrine System Disorders, Gastrointestinal Disease, diabetes mellitus 1.01E-03 Metabolic Disease Inflammatory Disease chronic inflammatory disorder 4.33E-04 Endocrine System Disorders, Gastrointestinal Disease, insulin-dependent diabetes mellitus 3.73E-11 Immunological Disease, Metabolic Disease Connective Tissue Disorders, Immunological Disease, Inflammatory Disease, Inflammatory Response, Skeletal and rheumatoid arthritis 2.07E-05 Muscular Disorders Protein Synthesis polymerization of protein 8.84E-06 Antimicrobial Response, Inflammatory Response antimicrobial response 1.36E-04 Cell Death and Survival cell death of connective tissue cells 2.86E-02 Cellular Assembly and Organization, DNA Replication, formation of nucleosomes 3.03E-18 Recombination, and Repair Cellular Development, Cellular Growth and Proliferation, proliferation of fibroblast cell lines 3.00E-03 Connective Tissue Development and Function Cancer cell transformation 6.82E-03 Post-Translational Modification, Protein Synthesis heterotetramerization of protein 2.73E-17 Cancer, Organismal Injury and Abnormalities, Reproductive cervical tumor 1.47E-02 System Disease Embryonic Development size of embryo 6.42E-03 Organismal Injury and Abnormalities, Reproductive System noninflammatory cervical disorder 2.86E-02 Disease Antimicrobial Response, Inflammatory Response antiviral response 1.30E-03 Lipid Metabolism, Small Molecule Biochemistry metabolism of phospholipid 9.33E-04 Lipid Metabolism, Small Molecule Biochemistry synthesis of phospholipid 2.16E-04 Cell Death and Survival cell death of melanoma cell lines 5.25E-03 Organismal Injury and Abnormalities, Tissue Morphology size of lesion 6.22E-03 Cell Cycle G2 phase 9.55E-03 DNA Replication, Recombination, and Repair repair of DNA 1.74E-02 Cancer, Organismal Injury and Abnormalities invasive cancer 2.36E-02 Cell Death and Survival apoptosis of lung cancer cell lines 3.13E-03 Cell Death and Survival killing of cells 4.80E-03

102 Dissertation Jacqueline Jufen Zhu

Cardiovascular Disease, Organismal Injury and size of infarct 2.88E-03 Abnormalities, Tissue Morphology cell proliferation of cervical cancer Cellular Development, Cellular Growth and Proliferation 9.37E-03 cell lines Cell Death and Survival cell death of sarcoma cell lines 1.94E-02 Cellular Development, Hematological System Development differentiation of megakaryocytes 1.12E-05 and Function, Hematopoiesis, Tissue Development Antimicrobial Response, Inflammatory Response antibacterial response 2.83E-03 Cancer, Organismal Injury and Abnormalities, Reproductive invasive breast cancer 2.55E-02 System Disease Carbohydrate Metabolism, Lipid Metabolism, Small synthesis of phosphatidic acid 8.21E-03 Molecule Biochemistry Carbohydrate Metabolism, Lipid Metabolism, Molecular concentration of phosphatidic acid 1.21E-02 Transport, Small Molecule Biochemistry Cellular Response to Therapeutics sensitivity of cells 1.40E-02 Cell Cycle, DNA Replication, Recombination, and Repair DNA recombination 2.41E-02 Respiratory Disease chronic pulmonary disease 2.60E-02 cell cycle progression of fibroblast Cell Cycle, Connective Tissue Development and Function 1.29E-03 cell lines Cell Cycle, DNA Replication, Recombination, and Repair recombination of cells 5.00E-03 Cell Cycle, Cellular Development, Connective Tissue senescence of fibroblast cell lines 5.48E-03 Development and Function Organismal Development abnormal morphology of tail 5.98E-03 Gene Expression binding of NFkB binding site 7.70E-03 Cancer, Cell Death and Survival, Organismal Injury and cell death of osteosarcoma cells 1.50E-02 Abnormalities, Tumor Morphology Cancer, Dermatological Diseases and Conditions, Organismal papilloma 1.65E-02 Injury and Abnormalities Cancer, Organismal Injury and Abnormalities, Tissue quantity of tumor 2.46E-02 Morphology Cell Cycle G2 phase of tumor cell lines 2.91E-02 DNA Replication, Recombination, and Repair breakage of DNA 1.02E-03 DNA Replication, Recombination, and Repair, Gene methylation of DNA 5.14E-03 Expression Organismal Injury and Abnormalities, Tissue Morphology volume of lesion 6.06E-03 Lipid Metabolism, Molecular Transport, Small Molecule accumulation of triacylglycerol 1.36E-02 Biochemistry Cell Cycle, DNA Replication, Recombination, and Repair homologous recombination of cells 1.36E-02 Dermatological Diseases and Conditions, Organismal Injury hyperkeratosis 1.48E-02 and Abnormalities Hereditary Disorder, Neurological Disease, Organismal Injury spinocerebellar ataxia 1.60E-02 and Abnormalities Connective Tissue Development and Function, Tissue quantity of white adipose tissue 1.80E-02 Morphology Cancer, Organismal Injury and Abnormalities tumorigenesis of benign tumor 2.24E-02 abnormal morphology of Organismal Development 2.85E-02 blastocyst

103 Dissertation Jacqueline Jufen Zhu

Canonical pathway analysis of metastasis susceptibility candidates

The core gene set was enriched for molecules associated with cell-cell and cell-matrix interactions as well as motility and related downstream signaling pathways (Figure 4.6A). In addition, in agreement with previous results126, aryl hydrocarbon receptor signaling was also implicated, as well as glucocorticoid receptor signaling, inflammatory and other immunological components.

104 Dissertation Jacqueline Jufen Zhu

Figure 4.6 Pathway analysis of 338 identified genes Canonical Pathway results of Ingenuity Pathway Analysis of the 338 core gene set. Relative strength of the p value of pathway association is indicated by the shade of red, with the darker red indicating a smaller p value. The range of p values observed is indicated at the top of the figure.

105 Dissertation Jacqueline Jufen Zhu

Metastasis associated genes predict patient prognosis

As stated previously, we have identified 743 genes over-expressed in metastatic 4T1 cells and 338 metastasis susceptibility genes that potentially promote metastasis progression by comparative analysis of cell lines and mouse strains, respectively. We then further identified 25 genes that are shared with these two gene lists: Anxa1, Arrdc1, Cldn6, Ctsw, Hist1h1b,

Hist1h2ab, Hist1h2bb, Hist1h2bc, Hist1h2be, Hist1h2bm, Hist1h2bp, Hspa1b, Il1a, Il1b, Incenp,

Nrarp, Ovol1, Pak6, Pmepa1, Procr, Rom1, Scd2, Tmem223, Tprn, Tubb4b. Almost all these genes show trends towards poorer patient survival if they are amplified, consistently with positive gene expression correlation with imcreased malignancy across 4T1 cell line series.

Among these genes Tmem223 was significantly associated with patient prognosis when the gene was amplified (Figure 4.7 A, C). Other genes, such as Nrarp, trended, but were not significantly likely due to the relatively low fraction of patients with gene amplification (Figure 4.7B, D).

These data are consistent, however, with the ability of this strategy to identify important metastasis-associated genes that individually may contribute to poor prognosis in small subgroups of at-risk breast cancer patients.

106 Dissertation Jacqueline Jufen Zhu

A C

B D

Figure 4.7 Distinct chromatin interactions and survival analysis on identified metastasis- associated genes (A) Chromatin interactions with Tmem223 gene in 4T1 cell series. The green line indicates gene promoter. The red lines indicate enhancers. (B) Survival analysis of Tmem223 in humanbreast cancer patients. (C) Chromatin interaction with Nrarp gene in 4T1 cell series. The green line indicates gene promoter. The red line indicates enhancer. (D) Survival analysis of Nrarp in human breast cancer patients.

107 Dissertation Jacqueline Jufen Zhu

4.4 Discussion

Metastasis remains a significant problem for the management of breast cancer. Mortality can be attributed to metastatic disease and its sequelae, rather than the effects of the originating primary tumor. However, despite the importance of this process for patient outcome, relatively little is known regarding the events that drive the metastatic cascade. A major reason for this ignorance is the difficulty of obtaining appropriate tissue samples to perform in-depth genomic characterizations. Metastatic lesions are usually not surgically removed, and those that are resected are usually confounded by treatment186 which may select for events associated with therapy resistance rather metastasis formation. In addition, biopsies of metastatic lesions in untreated patients at diagnosis often suffer from low tumor content187 or necrosis186. Furthermore, most metastatic lesions are not accessible to biopsy and repeated biopsies of multiple lesions within individuals increases patient morbidity. As an added challenge, since metastasis within patients exhibit significant heterogeneity at both the biomarker188–190 and genomic levels191,192, selective biopsies may not capture the full complexity of events associated with tumor progression.

Intriguingly, analysis of patient samples to date has not provided convincing evidence for the existence of metastasis driver genes193–196. This may be due to limited numbers of samples screened. However, studies to date suggest relatively few new coding mutations in metastases compared to the primary tumor, and no obvious recurrent mutations in genes or pathways have yet been described. Current efforts to address the question of metastasis driver genes in human cancer are underway, utilizing biopsy or archival materials (reviewed in197). These studies, however, are limited to a relatively small number of pre-selected genes and will therefore not have the ability to identify novel and/or unexpected events. Furthermore, these studies also do

108 Dissertation Jacqueline Jufen Zhu not address the role of non-coding events or epigenetic events that may underlie metastatic ability.

Mouse models overcome many of these limitations and offer many advantages. (i) a readily available, renewable resource of tissue enables additional technologies to be applied to a genetically equivalent set of tissues over time; (ii) the ability to observe the full natural history of metastatic development, which is not possible in human cancers due to surgical resection and application of neo-adjuvant and adjuvant therapies; (iii) the existence of syngeneic allograft models that can be utilized to rapidly directly test hypotheses generated from genomic characterization; and (iv) sophisticated genetic engineering capacities allow examination of hypotheses in whole animals, including factors that may function through somatic microenvironmental effects rather than direct effects on tumor cells, an area of research that is precluded in most human studies. Data generated from exploitation of animal models therefore provide an important vehicle for hypothesis generation and testing that can subsequently be examined and verified in more restrictive human experimental systems.

The core gene set of 338 genes are significantly enriched in metabolic, cell-cell contact and immune functions. Consistent with previous studies126,195,198,199, small molecule agents implicated from these analyses to impact these molecular processes were found to significantly affect metastatic progression, both positively and negatively, in these model systems. If these results are translatable into human patients, they might provide opportunities to complement current adjuvant therapies to further decrease or delay the occurrence of metastatic disease.

The integrated strategy presented here provides a novel method to identify potentially important genes within the metastatic cascade. Encouragingly, a number of the genes in this core set have previously been associated with metastatic breast cancer, including Mta2185 and

109 Dissertation Jacqueline Jufen Zhu

Brms1184. Further validation and characterization of additional genes within the core 338 gene set will likely provide evidence for additional metastasis susceptibility genes and valuable clues to the molecular mechanisms driving metastatic colonization.

110 Dissertation Jacqueline Jufen Zhu

Chapter 5: Conclusions, limitations and future

directions

111 Dissertation Jacqueline Jufen Zhu

Chapter 5

5.1 Summary of the findings

Chromatin interaction and genome structure have been widely studied to explore the regulatory mechanism of gene transcription. However, still little is known about the role of chromatin organization in cancer progression. In my thesis, I have characterized the chromatin interaction in mammalian cells using different methods including super-resolution imaging and high-throughput sequencing. Besides the visualization of chromatin loop, I also interrogated the chromatin reorganization in breast cancer metastasis. Beyond, I integrated the chromatin interaction with polymorphic BACh sites to identify metastasis susceptibility genes and network.

The chromatin looping was inferred by DNA sequencing data. However, nobody has ever seen exactly how the chromatin loops into functionally regulatory interactions. We applied the super-resolution microscopy iPALM to image the 3D structure of the distinct 13 kb chromatin loop up to 20 nm resolution laterally and axially. Unexpectedly, we observed a complex looping structure instead of a single chromatin loop inferred by ChIA-PET data. Though the structures of six individual images are positively correlated with Hi-C data, the ultrastructure of each structure is very different across cells, suggesting the heterogeneity of the chromatin looping that can hardly be detected by sequencing methods.

As chromatin organization is associated with gene transcription and disease development.

We have characterized the chromatin interaction in mouse breast cancer cell lines harboring differed metastatic potential to investigate the enhancer regulation in metastasis progression. By comprehensive comparison of transcription, chromatin accessibility and chromatin interaction in

4T1 cell series, we identified metastasis-associated genes and enhancers. The enhancer distribution on metastasis-associated genes revealed a delicate regulatory system that one gene

112 Dissertation Jacqueline Jufen Zhu can be regulated by multiple enhancers and one enhancer can also regulate multiple genes in metastasis. The multi-tasking enhancers connecting with multiple metastasis-associated genes were validated by CRISPR knockout to play important roles in regulating gene transcription and metastatic phenotypes, providing potential candidates for therapeutics in breast cancer metastasis.

For breast cancer metastasis study, we further introduced mouse strains for genetic screening to identify metastasis susceptibility associated SNPs. By integrating chromatin accessibility and interaction with identified SNPs, we identified a core set of metastasis susceptibility associated genes. This set of genes include not only previously identified genes but also novel ones that associate with metastasis. Pathway analysis suggested a couple of canonical signaling pathways involved in metastasis progress. Of them, glucocorticoid receptor pathway was validated to promote metastasis by mouse implantation experiment resulting higher metastasis potential treated with dexamethasone.

Taken together of breast cancer analysis in both mouse cell lines and mouse strains,

Tmem223 and Nrarp were identified to be associated with metastasis progression consistently.

The survival rate analysis in human breast cancer patient showed the alterations on Tmem223 and Nrarp led to poorer survival compared to no alterations, indicating the critical importance of fundamental studies.

113 Dissertation Jacqueline Jufen Zhu

5.2 Limitation of the studies

The super-resolution imaging on chromatin interaction provided novel insights into the chromatin looping conformation. However, the structure of the chromatin interaction is not fully unraveled due to the limitations of the methods. Firstly, DNA FISH used to stain the target chromatin region including the loop and the tails is in single color so that we are not able to distinguish the loop and the tails, giving more difficulties to the following modeling analysis.

Secondly, the penetration depth of iPALM microscopy is only 600 nm, which is theoretically sufficient to cover the target chromatin region, but taking the whole nucleus into account, the penetration depth might not be able to capture the intact target region. Thirdly, the DNA FISH staining efficiency is not ideal to allow all the probe oligos to attach to the chromatin. Ideally, we have 336 fluorophores in total to hybridized with the 33 kb DNA in a density of about 10 fluorophore/kb, which enables the detection of DNA with at least one fluorophore per nucleosome. However, in fact we only got about 70 fluorophores in average for each copy of the chromatin, decreasing the labeling density about five-fold. Lastly, due to the efficiency of iPALM, limited images were acquired that might not fully reflect all the possible conformation of the chromatin loop, which are not representative or conclusive enough.

In breast cancer metastasis study, we utilized the genetic screening for pairs of mouse strains to identify metastasis susceptibility associated SNPs, and integrated the SNPs with chromatin accessibility and interactions characterized in 4T1 mouse cell line series to identify metastasis susceptibility associated genes. Due to the limitations and difficulties of ChIA-PET technology and mouse strain cell collection, it is not realistic so far to characterize chromatin interaction in mouse strains instead of cell lines. Though 4T1 cell line series are from tumors in

BALB/cJ mouse strain, the genetic background and genome organization of this in vitro model

114 Dissertation Jacqueline Jufen Zhu can be different from mouse strains studied in this thesis. Transcriptionally, 4T1 cell series are triple negative; and the EMT is reversely correlated to metastasis, which is against the well- established metastasis mechanism, suggesting 4T1 model could possibly have a distinct molecular mechanism regulating metastasis progression. Therefore, the transcription, chromatin accessibility and interaction in mouse strains may not be fully recapitulated by the analysis of

4T1 cell lines.

In my study, I interrogate the chromatin interaction and gene regulation in mouse models.

To be more suggestive for understanding the mechanism in human and disease development, similar strategies could be applied in human cell studies.

115 Dissertation Jacqueline Jufen Zhu

5.3 Future directions

Regarding technologies used to study genome biology, high throughput DNA sequencing leads an efficient way to study eukaryotic genome by capturing tons of pieces of chromatin interactions. Despite the advantages, limitations are nonnegligible. Currently, sequencing technologies characterize the genome of a cell population, so that we get an averaged or the most frequently occurred chromatin conformation, which lacks individual specificity. Addressing the genome organization of a single cell would be a trend in the future. Different from sequencing technologies, imaging is a complementary method that allows investigation in single cell, but with a shortcoming of low throughput. Besides the improvement in resolution, a high throughput operation in visualizing chromatin conformation could be pursued.

In the aspect of biological significance of genome study, characterizing genome organization in primary human cells would be more informative and suggestive for finding cures to diseases. The future goal is to be able to dig out genome structure associated driving factors for disease genesis in each patient individual if desired, and rescue the abnormality using genome or epigenome editing tools.

In terms of the philosophy beyond biology, I have got inspired through my doctoral study.

I have been studying biology for 12 years, majoring in bio-engineering, microbiology and medical sciences for bachelor, master and PhD, respectively. I like sciences, including math, physics, chemistry, etc. Unlike the subjects that are quantitative, biology is hardly predictable due to the nature of lives. “Life is like a box of chocolate, you never know what you’re gonna get.” This is the most fun part for me, driving me to try to get closer to the truth of life. The ultimate goal for me to study biology is to find the universal rules for the living organisms, to answer the questions of who I am and where I am going. All the living organisms tend to live

116 Dissertation Jacqueline Jufen Zhu better by cooperation following certain rules. Microorganisms cooperate to let some of them to evolve prior to the others. Cells in multicellular organisms cooperate to differentiate into different parts. Nucleus and organelles cooperate within the cell to maintain cellular function.

Chromatin cooperate in the genome to regulate gene transcription. Even a slight change of the chromatin organization could lead to a severely different result. More important to me, it is not to cure diseases, not to understand the biology itself, but to find the balance and achieve the harmony between the nature and the mankind.

117 Dissertation Jacqueline Jufen Zhu

Appendices

DNA Deoxyribonucleic acid RNA Ribonucleic acid ISH In situ hybridization FISH Fluorescence in situ hybridization CT Chromosome territory Chr Chromosome LAD Lamina-associated domain ICD Interchromosomal domain ESC Embryonic stem cell BAC Bacterial artificial chromosome ChIP-seq Chromatin immunoprecipitation sequencing ATAC-seq Transposase-accessible chromatin using sequencing 3C Chromosome conformation capture 4C Circular chromosome conformation capture 5C Chromosome conformation capture carbon copy Hi-C High-throughput chromosome conformation capture ChIA-PET Chromatin interaction analysis using pair-end tag sequencing H3K27ac Histone 3 lysine 27 acetylation H3K4me3 Histone 3 lysine 4 trimethylation 2D Two-dimension 3D Three-dimension DHS DNase Hypersensitive site BACh Benzonase accessible chromatin TAD Topologically associating domain CTCF CCCTC-binding factor RNAPII RNA polymerase II YY1 Ying yang 1 TALEN Transcription activator-like effector nucleases CRISPR Clustered regularly interspaced short palindromic repeats STORM Stochastic optical reconstruction microscopy PALM Photoactivated localization microscopy iPALM Interferometric Photoactivated localization microscopy SIM Structural illumination microscopy STED Stimulated emission depletion lncRNA Long non-coding RNA Xist X-inactive specific transcript Firre Functional intergenic repeating RNA element Xi Inactive X chromosome Xa Active X chromosome ICF Instability facial anomalies T-ALL T-cell acute lymphoblastic leukemia SNP Single nucleotide polymorphism CCD CTCF-mediated chromatin contact domains TCRA T-cell receptor alpha

118 Dissertation Jacqueline Jufen Zhu

PBS Phosphate-buffered saline KOH Potassium hydroxide SSC Saline-sodium citrate EMCCD Electron-multiplying charge-coupled device NN Nearest neighbor NJ Neighbor joining TSP Traveling salesman problem solver TSS Transcription start site qRT-PCR Quantitative reverse transcription polymerase chain reaction HDR Homology-directed repair H&E hematoxylin and eosin PRC Percent reference peak coverage ECDF Empirical cumulative distribution K-S test Kolmogorov-Smirnov test EMT Epithelial-to-mesenchymal transition PPAR Peroxisome proliferator-activated receptor

119 Dissertation Jacqueline Jufen Zhu

References 1. Olins, A. L. & Olins, D. E. Spheroid Chromatin Units (ν Bodies). Science (80-. ). 183, 330–332 (1974). 2. Kornberg, R. D. Chromatin Structure: A Repeating Unit of and DNA. Science (80-. ). 24, 868–871 (1974). 3. Oudet, P., Gross-Bellard, M. & Chambon, P. Electron microscopic and biochemical evidence that chromatin structure is a repeating unit. Cell 4, 281–300 (1975). 4. Bustin, M., Goldblatt, D. & Sperling, R. Chromatin structure visualization by immunoelectron microscopy. Cell 7, 297–304 (1976). 5. Finch, J. T. & Klug, A. Solenoidal model for superstructure in chromatin. Proc. Natl. Acad. Sci. 73, 1897–1901 (1976). 6. Finch, J. T., Noll, M. & Kornberg, R. D. Electron microscopy of defined lengths of chromatin. Proc. Natl. Acad. Sci. 72, 3320–3322 (1975). 7. Luger, K., Mader, A. W., Richmond, R. K., Sargent, D. F. & Richmond, T. J. Crystal structure of the nucleosome core particle at 2.8A˚ resolution. Nature 389, 251–260 (1997). 8. Felsenfeld, G. & Groudine, M. Controlling the double helix. Nature 421, 448–453 (2003). 9. Thoma, F., Koller, T. & Klug, A. Involvement of histone H1 in the organization of t... [J Cell Biol. 1979] - PubMed result. J. Cell Biol. 83, 403–27 (1979). 10. McGhee, J. D., Rau, D. C., Charney, E. & Felsenfeld, G. Orientation of the nucleosome within the higher order structure of chromatin. Cell 22, 87–96 (1980). 11. Worcel, A., Strogatz, S. & Riley, D. Structure of chromatin and the linking number of DNA. Proc. Natl. Acad. Sci. 78, 1461–1465 (1981). 12. van Holde, K. & Zlatanova, J. Chromatin fiber structure: Where is the problem now? Semin. Cell Dev. Biol. 18, 651–658 (2007). 13. Maeshima, K., Hihara, S. & Eltsov, M. Chromatin structure: Does the 30-nm fibre exist in vivo? Curr. Opin. Cell Biol. 22, 291–297 (2010). 14. Schardin, M., Cremer, T., Hager, H. D. & M. Lang. Specific staining of human chromosomes in Chinese hamster x man hybrid cell lines demonstrates interphase chromosome territories. Hum. Genet. 71, 281–287 (1985). 15. Pinkel D Straume t Gray JW. Cytogenetic analysis using quantitative, high-sensitivity, fluorescence hybridization. Genetics 83, 2934–2938 (1986). 16. P, L., T, C., J, B., L, M. & DC., W. Delineation of individual human chromosomes in metaphase and interphase cells by in situ suppression hybridization using recombinant DNA libraries. Hum. Genet. 80, 224–234 (1988). 17. Sehgal, N. et al. Gene density and chromosome territory shape. Chromosoma 123, 499– 513 (2014). 18. Neusser, M. et al. Evolutionary conservation of chromosome territory arrangements in cell nuclei from higher primates. Proc. Natl. Acad. Sci. 99, 4424–429 (2002). 19. Zhao, R., Bodnar, M. S. & Spector, D. L. Nuclear Neighborhoods and Gene Expression. Curr. Opin. Genet. Dev. 19, 172–179 (2009). 20. Fritz, A. et al. Chromosomes at Work: Organization of Chromosome Territories in the Interphase Nucleus. 117, 9–19 (2017). 21. Chambeyron, S. & Bickmore, W. A. Chromatin decondensation and nuclear reorganization of the HoxB locus upon induction of transcription. Genes Dev. 18, 1119– 1130 (2004). 22. Cremer, T. et al. Role of chromosome territories in the functional compartmentalization of

120 Dissertation Jacqueline Jufen Zhu

the cell nucleus. Cold Spring Harb. Symp. Quant. Biol. 58, 777–792 (1993). 23. RM, Z., UR, M., A, K., T, C. & P., L. Evidence for a nuclear compartment of transcription and splicing located at chromosome domain boundaries. Chromosom. Res. 1, 93–106 (1993). 24. Foster, H. A. et al. Relative proximity of chromosome territories influences chromosome exchange partners in radiation-induced chromosome rearrangements in primary human bronchial epithelial cells. Mutat. Res. - Genet. Toxicol. Environ. Mutagen. 756, 66–77 (2013). 25. Cremer, T. & Cremer, M. Chromosome territories. Cold Spring Harb. Perspect. Biol. 2, 1–22 (2010). 26. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin , DNA-binding proteins and nucleosome position. 10, (2013). 27. Lieberman-aiden, E. et al. Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the . 33292, 289–294 (2009). 28. Fullwood, M. J. et al. An oestrogen-receptor- a -bound human chromatin interactome. Nature 461, 58–64 (2009). 29. Creyghton, M. P. et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl. Acad. Sci. U. S. A. 107, 21931–21936 (2010). 30. Liang, G. et al. Distinct localization of acetylation and H3-K4 methylation to the transcription start sites in the human genome. Proc. Natl. Acad. Sci. 101, 7357–7362 (2004). 31. Li, G. et al. Extensive Promoter-centered Chromatin Interactions Provide a Topological Basis for Transcription Regulation. 148, 84–98 (2012). 32. Stees, J., Varn, F., Huang, S., Strouboulis, J. & Bungert, J. Recruitment of Transcription Complexes to Enhancers and the Role of Enhancer Transcription. Biology (Basel). 1, 778– 793 (2012). 33. Handoko, L. et al. CTCF-mediated functional chromatin interactome in pluripotent cells. Nat. Genet. 43, 630–638 (2011). 34. Whyte, W. A. et al. Master Transcription Factors and Mediator Establish Super-Enhancers at Key Cell Identity Genes. Cell 153, 307–319 (2013). 35. Tang, Z. et al. CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription. Cell 163, 1611–1627 (2015). 36. Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014). 37. Dixon, J. R. et al. Topological Domains in Mammalian Genomes Identified by Analysis of Chromatin Interactions. Nature 485, 376–380 (2012). 38. Dixon, J. R. et al. Chromatin architecture reorganization during stem cell differentiation. Nature 518, 331–336 (2015). 39. Dowen, J. M. et al. Control of Cell Identity Genes Occurs in Insulated Neighborhoods in Mammalian Chromosomes. Cell 159, 374–387 (2014). 40. Dixon, J. R., Gorkin, D. U. & Ren, B. Chromatin Domains: The Unit of Chromosome Organization. Mol. Cell 62, 668–680 (2016). 41. Pope, B. D. et al. Topologically associating domains are stable units of replication-timing regulation. Nature 515, 402–405 (2014). 42. Hu, J. et al. Chromosomal Loop Domains Direct the Recombination of Antigen Receptor

121 Dissertation Jacqueline Jufen Zhu

Genes. Cell 164, 947–959 (2015). 43. Hnisz, D., Day, D. S. & Young, R. A. Insulated neighborhoods: structural and functional units of mammalian gene control. Cell 165, 255–269 (2016). 44. Ran, F. A. et al. Genome engineering using the CRISPR-Cas9 system. 8, 2281–2308 (2013). 45. Boch, J. TALEs of genome targeting. Nat. Biotechnol. 29, 135–136 (2011). 46. Beliveau, B. J. et al. Versatile design and synthesis platform for visualizing genomes with Oligopaint FISH probes. Proc. Natl. Acad. Sci. 109, 21301–21306 (2012). 47. Beliveau, B. J. et al. Single-molecule super-resolution imaging of chromosomes and in situ haplotype visualization using Oligopaint FISH probes. Nat. Commun. 6, 1–13 (2015). 48. Boettiger, A. N. et al. chromatin folding for different epigenetic states. Nature 529, 418– 422 (2016). 49. M. G. L. GUSTAFSSON. Surpassing the lateral resolution limit by a factor of two using structured illumination microscopy. J. Microsc. 198, 82–87 (2000). 50. Klar, T. A., Jakobs, S., Dyba, M., Egner, A. & Hell, S. W. Fluorescence microscopy with diffraction resolution barrier broken by stimulated emission. Proc. Natl. Acad. Sci. 97, 8206–8210 (2000). 51. Michael J. Rust, M. B. and X. Z. Stochastic optical reconstruction microscopy (STORM) provides sub-diffraction-limit image resolution. Nat. Methods 3, 793–795 (2006). 52. Betzig, E. et al. Imaging Intracellular Fluorescent Proteins at Nanometer Resolution. Science (80-. ). 313, 1642–1645 (2006). 53. Lothar Schermelleh, Peter M. Carlton, Sebastian Haase, Lin Shao, Lukman Winoto, Peter Kner, Brian Burke, M. Cristina Cardoso, David A. Agard, Mats G. L Gustafsson, H. L. and J. W. S. Subdiffraction Multicolor Imaging of the Nuclear Periphery with 3D Structured Illumination Microscopy. Science (80-. ). 320, 1332–1336 (2008). 54. Hein, B., Willig, K. I. & Hell, S. W. Stimulated emission depletion (STED) nanoscopy of a fluorescent protein-labeled organelle inside a living cell. Proc. Natl. Acad. Sci. 105, 14271–14276 (2008). 55. Huang, B., Jones, S. A., Brandenburg, B. & Zhuang, X. Whole-cell 3D STORM reveals interactions between cellular structures with nanometer-scale resolution. 5, 1047–1052 (2008). 56. Huang, B., Wang, W., Bates, M. & Zhuang, X. Three-dimensional Super-resolution Imaging by Stochastic Optical Reconstruction Microscopy. Science (80-. ). 319, 810–813 (2008). 57. Xu, K., Babcock, H. P. & Zhuang, X. Dual-objective STORM reveals three-dimensional filament organization in the actin cytoskeleton. 9, (2012). 58. Gleb Shtengela, James A. Galbraithb, Catherine G. Galbraithc, Jennifer Lippincott- Schwartzd, Jennifer M. Gilletted, Suliana Manleyd, Rachid Sougratd, Clare M. Watermane, Pakorn Kanchanawonge, Michael W. Davidsonf, Richard D. Fettera, and H. F. H. Interferometric fluorescent super-resolution microscopy resolves 3D cellular ultrastructure. Proc. Natl. Acad. Sci. 106, 3125–3130 (2008). 59. Yejun Wang, Shovamayee Maharana, M. D. W. & G. V. S. Super-resolution microscopy reveals decondensed chromatin structure at transcription sites. Sci. Rep. 4, 1–7 (2014). 60. Smeets, D. et al. Three-dimensional super-resolution microscopy of the inactive X chromosome territory reveals a collapse of its active nuclear compartment harboring distinct Xist RNA foci. 7, 1–27 (2014).

122 Dissertation Jacqueline Jufen Zhu

61. Lakadamyali, M. & Cosma, M. P. Advanced microscopy methods for visualizing chromatin structure. FEBS Lett. 589, 3023–3030 (2015). 62. Dong, B. et al. Superresolution intrinsic fluorescence imaging of chromatin utilizing native, unmodified nucleic acids for contrast. Proc. Natl. Acad. Sci. 113, 9716–9721 (2016). 63. Boettiger, A. N. et al. Super-resolution imaging reveals distinct chromatin folding for different epigenetic states. Nature 529, 418–422 (2016). 64. Xu, J. et al. Super-Resolution Imaging of Higher-Order Chromatin Structures at Different Epigenomic States in Single Mammalian Cells. Cell Rep. 24, 873–882 (2018). 65. Wang, S. et al. Spatial organization of chromatin domains and compartments in single chromosomes. Science (80-. ). 353, 598–602 (2016). 66. Wit, E. De & Laat, W. De. A decade of 3C technologies : insights into nuclear organization. Genes Dev. 26, 11–24 (2012). 67. Dekker, J., Rippe, K., Dekker, M. & Kleckner, N. Capturing Chromosome Conformation. Science (80-. ). 295, 1306–1312 (2002). 68. Zhao, Z. et al. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nat. Genet. 38, 1341–1347 (2006). 69. Dostie, J. et al. Chromosome Conformation Capture Carbon Copy ( 5C ): A massively parallel solution for mapping interactions between genomic elements. Genome Res. 16, 1299–1309 (2006). 70. Mumbach, M. R. et al. HiChIP: Efficient and sensitive analysis of protein-directed genome architecture. (2016). 71. Jager, R. et al. Capture Hi-C identifies the chromatin interactome of colorectal cancer risk loci. Nat. Commun. 6, 1–9 (2014). 72. Guo, Y. et al. CRISPR-mediated deletion of prostate cancer risk-associated CTCF loop anchors identifies repressive chromatin loops. Genome Biol. 1–17 (2018). doi:10.1186/s13059-018-1531-0 73. Hnisz, D. et al. Activation of proto-oncogenes by disruption of chromosome neighborhoods. 351, 1454–1458 (2016). 74. Fanucchi, S., Shibayama, Y., Burd, S., Weinberg, M. S. & Mhlanga, M. M. Chromosomal Contact Permits Transcription between Coregulated Genes. Cell 155, 606–620 (2013). 75. Kieffer-Kwon, K.-R. et al. Interactome Maps of Mouse Gene Regulatory Domains Reveal Basic Principles of Transcriptional Regulation. Cell 155, 1507–1520 (2013). 76. Meyer, M. B., Benkusky, N. A. & Pike, J. W. Selective Distal Enhancer Control of the Mmp13 Gene Identified through Clustered Regularly Interspaced Short Palindromic Repeat ( CRISPR ) Genomic Deletions *. 290, 11093–11107 (2015). 77. Li, Y., Rivera, C. M., Ishii, H., Jin, F. & Selvaraj, S. CRISPR Reveals a Distal Super- Enhancer Required for Sox2 Expression in Mouse Embryonic Stem Cells. 1–17 (2014). doi:10.1371/journal.pone.0114485 78. Hilton, I. B. et al. Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. 33, (2015). 79. Fulco, C. P. et al. Systematic mapping of functional enhancer-promoter connections with CRISPR interference. Science (80-. ). 354, 769–773 (2016). 80. Lopes, R., Korkmaz, G. & Agami, R. Applying CRISPR-Cas9 tools to identify and characterize transcriptional enhancers. Nat. Rev. Mol. Cell Biol. 17, 597–604 (2016).

123 Dissertation Jacqueline Jufen Zhu

81. Diao, Y. et al. A new class of temporarily phenotypic enhancers identified by CRISPR/Cas9-mediated genetic screening. Genome Res. 26, 397–405 (2016). 82. Klein, J. C., Chen, W., Gasperini, M. & Shendure, J. Identifying Novel Enhancer Elements with CRISPR-Based Screens. ACS Chem. Biol. 13, 326–332 (2018). 83. Han, R. et al. Functional CRISPR screen identifies AP1-associated enhancer regulating FOXF1 to modulate oncogene-induced senescence. Genome Biol. 19, 1–13 (2018). 84. Ong, C. & Corces, V. G. CTCF : an architectural protein bridging genome topology and function. Nat. Publ. Gr. 15, 234–246 (2014). 85. Splinter, E. et al. CTCF mediates long-range chromatin looping and local histone modification in the ␤-globin locus. Genes Dev. 20, 2349–2354 (2006). 86. Lupiáñez, D. G. et al. Disruptions of Topological Chromatin Domains Cause Pathogenic Rewiring of Gene-Enhancer Interactions. Cell 161, 1012–1025 (2015). 87. Narendra, V. et al. CTCF establishes discrete functional chromatin domains at the Hox clusters during differentiation. 347, 1017–1022 (2015). 88. Kagey, M. H. et al. Mediator and Cohesin Connect Gene Expression and Chromatin Architecture Michael. Nature 467, 430–435 (2010). 89. Zuin, J., Dixon, J. R., Reijden, M. I. J. A. Van Der, Ye, Z. & Kolovos, P. Cohesin and CTCF differentially affect chromatin architecture and gene expression in human cells. 111, (2014). 90. Shi, Y., Seto, E., Chang, L. S. & Shenk, T. Transcriptional repression by YY1, a human GLI-Krüppel-related protein, and relief of repression by adenovirus E1A protein. Cell 67, 377–388 (1991). 91. Yang, W. M., Yao, Y. L. & Seto, E. The Fk506-binding protein 25 functionally associates with histone deacetylases and with transcription factor YY1. EMBO J. 20, 4814–4825 (2001). 92. Thiaville, M. M. & Kim, J. Oncogenic Potential of Yin Yang 1 Mediated Through Control of Imprinted Genes. Crit Rev Oncog 16, 199–209 (2011). 93. Weintraub, A. S. et al. YY1 Is a Structural Regulator of Enhancer-Promoter Loops. Cell 171, 1573–1588.e28 (2017). 94. Minajigi, A. et al. A comprehensive Xist interactome reveals cohesin repulsion and an RNA-directed chromosome conformation. 1–19 (2015). 95. Hacisuleyman, E. et al. Topological organization of multichromosomal regions by the long intergenic noncoding RNA Firre. Nat. Publ. Gr. 21, 198–206 (2014). 96. Matarazzo, M. R., Boyle, S., Esposito, M. D. & Bickmore, W. A. Chromosome territory reorganization in a human disease with altered DNA methylation. Proc. Natl. Acad. Sci. 104, 16546–16551 (2007). 97. Manuscript, A. Spatial genome organization in the formation of chromosomal translocations. Karen J. Meaburn Tom Misteli Evi Soutoglou 17, 80–90 (2007). 98. TomMisteli. Higher-order Genome Organization in Human Disease. Cold Spring Harb. Perspect. Biol. 2, (2010). 99. Flavahan, W. A. et al. Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature 529, 110–114 (2016). 100. Gröschel, S. et al. A single oncogenic enhancer rearrangement causes concomitant EVI1 and GATA2 deregulation in Leukemia. Cell 157, 369–381 (2014). 101. Herz, H.-M., Hu, D. & Shilatifard, A. Enhancer Malfunction in Cancer. Mol. Cell 53, 859–866 (2014).

124 Dissertation Jacqueline Jufen Zhu

102. Bell, R. E. et al. Enhancer methylation dynamics contribute to cancer plasticity and patient mortality. Genome Res. 26, 601–611 (2016). 103. Taberlay, P. C., Statham, A. L., Kelly, T. K., Clark, S. J. & Jones, P. A. Reconfiguration of nucleosome-depleted regions at distal regulatory elements accompanies DNA methylation of enhancers and insulators in cancer. 24, 1421–1432 (2014). 104. Franco, H. L. et al. Enhancer transcription reveals subtype-specific gene expression programs controlling breast cancer pathogenesis. Genome Res. 28, 159–170 (2018). 105. Qian, J. et al. B Cell Super-Enhancers and Regulatory Clusters Recruit AID Tumorigenic Activity. 1–14 (2014). doi:10.1016/j.cell.2014.11.013 106. Chapuy, B. et al. Discovery and Characterization of Super-Enhancer-Associated Dependencies in Diffuse Large B Cell Lymphoma. Cancer Cell 24, 777–790 (2013). 107. Hoke, H. A. et al. Selective Inhibition of Tumor Oncogenes by Disruption of Super- Enhancers. (2013). doi:10.1016/j.cell.2013.03.036 108. Bcrs, N., Bcrs, N., Bcrs, N., Guire, M.- & Myb, B. Cancer by super-enhancer. 455, 1291– 1293 (2014). 109. Hnisz, D. et al. Super-Enhancers in the Control of Cell Identity and Disease. Cell 155, 934–947 (2013). 110. Hnisz, D. & Young, R. A. New Insights into Genome Structure: Genes of a Feather Stick Together. Mol. Cell 67, 730–731 (2017). 111. Driouch, K., Landemaine, T., Sin, S., Wang, S. & Lidereau, R. Gene arrays for diagnosis , prognosis and treatment of breast cancer metastasis. Clin. Exp. Metastasis 575–585 (2007). doi:10.1007/s10585-007-9110-x 112. Eckhardt, B. L. et al. Genomic Analysis of a Spontaneous Model of Breast Cancer Metastasis to Bone Reveals a Role for the Extracellular Matrix. 3, 1–13 (2005). 113. Lou, Y. et al. Epithelial – Mesenchymal Transition ( EMT ) Is Not Sufficient for Spontaneous Murine Breast Cancer Metastasis. 2755–2768 (2008). doi:10.1002/dvdy.21658 114. Weigelt, B., Peterse, J. L. & Veer, L. J. van’t. Breast cancer metastasis: markers and models. Nat. Rev. Cancer 5, 591–602 (2005). 115. Faraji, F. et al. An integrated systems genetics screen reveals the transcriptional structure of inherited predisposition to metastatic disease. 227–240 (2014). doi:10.1101/gr.166223.113.24 116. Minafra, L. et al. Gene Expression Profiling of Epithelial – Mesenchymal Transition in Primary Breast Cancer Cell Culture. 2184, 2173–2183 (2014). 117. Charafe-jauffret, E. et al. Breast Cancer Cell Lines Contain Functional Cancer Stem Cells with Metastatic Capacity and a Distinct Molecular Signature. 1302–1314 (2009). doi:10.1158/0008-5472.CAN-08-2741 118. Kang, Y. et al. A multigenic program mediating breast cancer metastasis to bone. 3, 537– 549 (2003). 119. Tao, K., Fang, M., Alroy, J. & Sahagian, G. G. Imagable 4T1 model for the study of late stage breast cancer. BMC Cancer 8, 1–20 (2008). 120. Wamstad, J. A., Wang, X., Demuren, O. O. & Boyer, L. A. Distal enhancers : new insights into heart development and disease. Trends Cell Biol. 24, 294–302 (2014). 121. Aslakson, C. J. & Miller, F. R. Selective Events in the Metastatic Process Defined by Analysis of the Sequential Dissemination of Subpopulations of a Mouse Mammary Tumor1. 1399–1405 (1992).

125 Dissertation Jacqueline Jufen Zhu

122. Yang, J. et al. Twist , a Master Regulator of Morphogenesis , Plays an Essential Role in Tumor Metastasis Ben Gurion University of the Negev. Cell 117, 927–939 (2004). 123. Wendt, M. K., Smith, J. A. & Scheimann, W. P. Transforming Growth Factor-β-Induced Epithelial-Mesenchymal Transition Facilitates Epidermal Growth Factor-Dependent Breast Cancer Progression. Oncogene 29, 6485–6498 (2011). 124. Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012). 125. Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 488, 75–82 (2012). 126. Bai, L. et al. An Integrated Genome-Wide Systems Genetics Screen for Breast Cancer Metastasis Susceptibility Genes. PLoS Genet. 12, 1–18 (2016). 127. Sundquist, M., Brudin, L. & Goran Tejler. Improved survival in metastatic breast cancer 1985-2016. The Breast https://doi.org/10.1016/j.breast.2016.10.005 (2016). 128. Gorkin, D. U., Leung, D. & Ren, B. The 3D Genome in Transcriptional Regulation and Pluripotency. Stem Cell 14, 762–775 (2014). 129. Lee, T. I. & Young, R. A. Transcriptional Regulation and Its Misregulation in Disease. Cell 152, 1237–1251 (2013). 130. Lifsted, T. et al. Identification of inbred mouse strains harboring genetic modifiers of mammary tumor age of onset and metastatic progression. Int. J. Cancer 77, 640–644 (1998). 131. Lancaster, M., Rouse, J. & Hunter, K. Modifiers of mammary tumor progression and metastasis on mouse chromosomes 7, 9, and 17. Mamm. Genome 16, 120–126 (2005). 132. Park, Y. et al. Sipa1 is a candidate for the metastasis efficiency modifier locus Mtes1. Nat. Genet. 37, 1055–1062 (2007). 133. Ha, N. H., Long, J., Cai, Q., Shu, X. O. & Hunter, K. W. The Circadian Rhythm Gene Arntl2 Is a Metastasis Susceptibility Gene for Estrogen Receptor-Negative Breast Cancer. PLoS Genet. 12, 1–20 (2016). 134. Crawford, N. P. S. et al. Germline polymorphisms in SIPA1 are associated with metastasis and other indicators of poor prognosis in breast cancer. Breast Cancer Res. 8, 1–9 (2006). 135. Brooks, R. et al. Polymorphisms in MMP9 and SIPA1 are associated with increased risk of nodal metastases in early stage cervical cancer. Gynecol. Oncol. 116, 1–16 (2011). 136. Pei, R. et al. Association of SIPA1 545 C > T polymorphism with survival in Chinese women with metastatic breast cancer. Front. Med. China 7, 138–142 (2013). 137. Hsieh, S. M., Look, M. P., Sieuwerts, A. M., Foekens, J. A. & Hunter, K. W. Distinct inherited metastasis susceptibility exists for different breast cancer subtypes: A prognosis study. Breast Cancer Res. 11, 1–9 (2009). 138. Mahamid, J. et al. Visualizing the molecular sociology at the HeLa cell nuclear periphery. Science (80-. ). 351, 969–972 (2016). 139. Ou, H. D. et al. ChromEMT: Visualizing 3D chromatin structure and compaction in interphase and mitotic cells. Science (80-. ). 357, 1–32 (2017). 140. Dempsey, G. T., Vaughan, J. C., Chen, K. H., Bates, M. & Zhuang, X. Evaluation of fluorophores for optimal performance in localization-based super-resolution imaging. Nat. Methods 1027–1036 (2011). 141. Seitan, V. et al. Cohesin-based chromatin interactions enable regulated gene expression within pre-existing architectural compartments. Genome Res. 23, 2066–77 (2013). 142. Elizabeth Ing-Simmons et al. Spatial enhancer clustering and regulation of enhancer-

126 Dissertation Jacqueline Jufen Zhu

proximal genes by cohesin Elizabeth. Genome Res. 504–513 (2015). doi:10.1101/gr.184986.114.8 143. Shih, H.-Y. et al. Tcra gene recombination is supported by a Tcra enhancer- and CTCF- dependent chromatin hub. Proc. Natl. Acad. Sci. 109, E3493–E3502 (2012). 144. Seitan, V. C. et al. A role for cohesin in T cell receptor rearrangement and thymocyte differentiation. Nature 476, 467–471 (2012). 145. Glusman, G. et al. Comparative genomics of the human and mouse T cell receptor loci. Immunity 15, 337–349 (2001). 146. Zhang, X. H.-F., Giuliano, M., Trivedi, M. V., Schiff, R. & Osborne, C. K. Metastasis Dormancy in Estrogen Receptor-Positive Breast Cancer Xiang. Clin. Cancer Res. 19, doi:10.1158/1078-0432.CCR-13-0838 (2013). 147. Butler, T. P. & Gullino, P. M. Quantitation of Cell Shedding into Efferent Blood of Mammary Adenocarcinoma. Cancer Res. 512–516 (1975). 148. Gert Riethmüller & A, C. K. Early cancer cell dissemination and late metastatic relapse: clinical reflections and biological approaches to the dormancy problem in patients. Semin. Cancer Biol. 11, 307–311 (2001). 149. Hüsemann, Y. et al. Systemic Spread Is an Early Step in Breast Cancer. Cancer Cell 13, 58–68 (2008). 150. Hill, R., Chambers, A., Ling, V. & Harris, J. Dynamic heterogeneity: rapid generation of metastatic variants in mouse B16 melanoma cells. Science (80-. ). 224, 998–1001 (1984). 151. Cremer, T. et al. Chromosome territories--a functional nuclear landscape. Curr. Opin. Cell Biol. 18, 307–316 (2006). 152. Hübner, M. R., Eckersley-Maslin, M. A. & Spector, D. L. Chromatin organization and transcriptional regulation. Curr. Opin. Genet. Dev. 23, 89–95 (2013). 153. Chromosome territory positioning of conserved homologous chromosomes in different primate species. 367–375 (2006). doi:10.1007/s00412-006-0064-6 154. Lima, A. The Chromosome Territory of Human Oncogenes. 6, 349–354 (1986). 155. Taylor, P. et al. Transcription regulation by distal enhancers. 37–41 doi:10.4161/trns.20720 156. Alinikula, J. & Schatz, D. G. Super-Enhancer Transcription Converges on AID. Cell 159, 1490–1492 (2014). 157. Grøntved, L. et al. Rapid genome-scale mapping of chromatin accessibility in tissue. Epigenetics and Chromatin 5, 1–12 (2012). 158. John, S. et al. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nat. Genet. 264–268 (2011). 159. Baek, S., Sung, M.-H. & Hager, G. L. Quantitative analysis of genome-wide chromatin remodeling. Methods Mol. Biol. 433–441 (2012). 160. Li, X. et al. Long-read ChIA-PET for base-pair-resolution mapping of haplotype-specific chromatin interactions. Nat. Protoc. 899–915 (2017). 161. Martin, T. A. & Jiang, W. G. Loss of tight junction barrier function and its role in cancer metastasis. Biochim. Biophys. Acta - Biomembr. 1788, 872–891 (2009). 162. Brennan, K., Offiah, G., McSherry, E. A. & Hopkins, A. M. Tight junctions: A barrier to the initiation and progression of breast cancer? J. Biomed. Biotechnol. 2010, (2010). 163. Avgustinova, A. et al. Tumour cell-derived Wnt7a recruits and activates fibroblasts to promote tumour aggressiveness. Nat. Commun. 7, 1–14 (2016). 164. Lewis, A. M., Varghese, S., Xu, H. & Alexander, H. R. Interleukin-1 and cancer

127 Dissertation Jacqueline Jufen Zhu

progression: The emerging role of interleukin-1 receptor antagonist as a novel therapeutic agent in cancer treatment. J. Transl. Med. 4, 1–12 (2006). 165. Mace, K. A., Hansen, S. L., Myers, C., Young, D. M. & Boudreau, N. HOXA3 induces cell migration in endothelial and epithelial cells promoting angiogenesis and wound repair. J. Cell Sci. 118, 2567–2577 (2005). 166. Wardwell-ozgo, J. et al. HOXA1 drives melanoma tumor growth and metastasis and elicits an invasion gene expression signature that prognosticates clinical outcome. Oncogene 33, 1017–1026 (2014). 167. Chen, Y. et al. MiRNA-135a promotes breast cancer cell migration and invasion by targeting HOXA10. BMC Cancer 12, (2012). 168. Kharman-Biz, A. et al. Expression of activator protein-1 (AP-1) family members in breast cancer. BMC Cancer 13, 1 (2013). 169. Ndlovu, M. N. et al. Hyperactivated NF- B and AP-1 Transcription Factors Promote Highly Accessible Chromatin and Constitutive Transcription across the Interleukin-6 Gene Promoter in Metastatic Breast Cancer Cells. Mol. Cell. Biol. 29, 5488–5504 (2009). 170. Zhao, C. et al. Genome-wide profiling of AP-1-regulated transcription provides insights into the invasiveness of triple-negative breast cancer. Cancer Research 74, (2014). 171. MacLean, J. A., King, M. L., Okuda, H. & Hayashi, K. WNT7A regulation by miR-15b in ovarian cancer. PLoS One 11, 1–12 (2016). 172. Huang, X. et al. Wnt7a activates canonical Wnt signaling, promotes bladder cancer cell invasion, and is suppressed by MIR-370-3p. J. Biol. Chem. 293, 6693–6706 (2018). 173. Calvo, R. et al. Altered HOX and WNT7A expression in human lung cancer. Proc. Natl. Acad. Sci. 97, 12776–12781 (2000). 174. Liu, Y., Cai, Y., Liu, L., Wu, Y. & Xiong, X. Crucial biological functions of CCL7 in cancer. PeerJ 6, e4928 (2018). 175. Lee, Y. S. et al. Crosstalk of CCL7 and CCR3 promotes metastasis of colon cancer cells via MAPK signaling pathways. Eur. J. Cancer 61, S58 (2016). 176. Shang, X., Lin, X., Alvarez, E., Manorek, G. & Howell, S. B. Tight Junction Proteins Claudin-3 and Claudin-4 Control Tumor Growth and Metastases. Neoplasia 14, 974-IN22 (2012). 177. Nichols, L. S., Ashfaq, R. & Iacobuzio-Donahue, C. A. Claudin 4 Protein Expression in Primary and Metastatic Pancreatic Cancer: Support for Use as a Therapeutic Target. Am. J. Clin. Pathol. 121, 226–230 (2004). 178. Jiwa, L. S. et al. Upregulation of Claudin-4, CAIX and GLUT-1 in distant breast cancer metastases. BMC Cancer 14, 864 (2014). 179. Huang, J. et al. Claudin-1 enhances tumor proliferation and metastasis by regulating cell anoikis in gastric cancer Jie. Oncotarget 6, 1652–1665 (2014). 180. Tabaries, S. et al. Claudin-2 Promotes Breast Cancer Liver Metastasis by Facilitating Tumor Cell Interactions with Hepatocytes. Mol. Cell. Biol. 32, 2979–2991 (2012). 181. Dahiya, N., Becker, K. G., Wood, W. H., Zhang, Y. & Morin, P. J. Claudin-7 is frequently overexpressed in ovarian cancer and promotes invasion. PLoS One 6, (2011). 182. Aslakson, C. J., Rak, J. W., Miller, B. E. & Miller, F. R. Differential influence of organ site on three subpopulations of a single mouse mammary tumor at two distinct steps in metastasis. Int. J. Cancer 47, 466–472 (1991). 183. Whitlock, M. C. Combining probability from independent tests: The weighted Z-method is superior to Fisher’s approach. J. Evol. Biol. 18, 1368–1373 (2005).

128 Dissertation Jacqueline Jufen Zhu

184. Seraj, M. J., Samant, R. S., Verderame, M. F. & Welch, D. R. Functional Evidence for a Novel Human Breast Carcinoma Metastasis Suppressor, BRMS1, Encoded at Chromosome 11q13. Cancer Res. 60, 2764–2769 (2000). 185. Covington, K. R. et al. Metastasis Tumor Associated Protein 2 Enhances Metastatic Behavior and is Associated with Poor Outcomes in Estrogen Receptor-Negative Breast Cancer. Breast cancer Res. 141, 375–384 (2013). 186. Vakiani, E. et al. Comparative genomic analysis of primary versus metastatic colorectal carcinomas. J. Clin. Oncol. 30, 2956–2962 (2012). 187. Pandis, N. et al. Cytogenetic comparison of primary tumors and lymph node metastases in breast cancer patients. Genes Chromosom. Cancer 22, 122–129 (1998). 188. Krøigård, A. B., Larsen, M. J., Thomassen, M. & Kruse, T. A. Molecular Concordance Between Primary Breast Cancer and Matched Metastases. Breast J. 22, 420–430 (2016). 189. Hoefnagel, L. D. C. et al. Discordance in ERα, PR and HER2 receptor status across different distant breast cancer metastases within the same patient. Ann. Oncol. 24, 3017– 3023 (2013). 190. Lindström, L. S. et al. Clinically used breast cancer markers such as estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 are unstable throughout tumor progression. J. Clin. Oncol. 30, 2601–2608 (2012). 191. Kuukasjärvi, T. et al. Genetic Heterogeneity and Clonal Evolution Underlying Development of Asynchronous Metastasis in Human Breast Cancer. Cancer Res. 1597– 1604 (1997). 192. Haffner, M. C. et al. Tracking the clonal origin of lethal prostate cancer. J. Clin. Invest. 123, 4918–4922 (2013). 193. Jacob, L. S. et al. Metastatic competence can emerge with selection of pre- existing oncogenic alleles without a need of new mutations Leni. Cancer Res. 75, 3713–3719 (2015). 194. Zhao, Z.-M. et al. Early and multiple origins of metastatic lineages within primary tumors. Proc. Natl. Acad. Sci. 113, 2140–2145 (2016). 195. McCreery, M. Q. et al. Evolution of metastasis revealed by mutational landscapes of chemically induced skin cancers. 21, 1514–1520 (2015). 196. Yachida, S. et al. Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature 1114–1117 (2010). 197. Zardavas, D. et al. The AURORA initiative for metastatic breast cancer. Br. J. Cancer 111, 1881–1887 (2014). 198. Yang, H. et al. Caffeine suppresses metastasis in a transgenic mouse model: a prototype molecule for prophylaxis of metastasis. Clin. Exp. Metastasis 21, 719–735 (2005). 199. Qamri, Z. et al. Synthetic cannabinoid receptor agonists inhibit tumor growth and metastasis of breast cancer. Mol. Cancer Ther. 8, 3117–3129 (2009).

129