NOVEL BIFUNCTIONAL MOLECULES REDIRECT CHROMATIN-MODIFYING MACHINERY TO STUDY & CONTROL GENE EXPRESSION

Anna Chiarella

A dissertation submitted to the faculty at the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Genetics and Molecular Biology Department in the School of Medicine.

Chapel Hill 2018

Approved by:

Nathaniel Hathaway

Shawn Ahmed

Stephen Frye

Brian Strahl

David Williams

Ó 2018 Anna Chiarella ALL RIGHTS RESERVED

ii ABSTRACT

Anna Chiarella: Novel bifunctional molecules redirect chromatin-modifying machinery to control gene expression (Under the direction of Nathaniel Hathaway)

Somatic cells in the human body contained roughly six feet of DNA. This genomic information needs to be condensed to fit within a microscopic nucleus, as well as properly interpreted by cellular machinery. The specific and controlled condensing of the genome allows for discrete cell types with unique functions ranging from blood cells that carry oxygen through the body, pancreatic cells that release hormones, and neural cells that send electric signals. Chromatin refers to the DNA sequence wrapped around histone protein octamers. Differing levels of chromatin compaction throughout the genome contribute to cell specificity and function. Loosely compact chromatin is associated with high levels of gene expression, whereas tightly compact chromatin is associated with inactive genes. While there are several layers of control that regulate chromatin compaction, two of the most widely studied mechanisms are DNA and histone tail modifications. These modifications are controlled and interpreted by a series of proteins that deposit, remove, or recognize and respond to the modification. The complexity of these modifications increases when you take into account all the types of modifications, residues, and downstream proteins that are involved. The importance of studying these complex processes is highlighted by the frequency of chromatin dysregulation in human diseases. Over the past few decades, it has become increasingly clear that the chromatin environment is irregular in many disease contexts, leading to the development of chromatin-based therapies in the clinic. To study and control the regulation of the chromatin environment in normal and disease contexts, we designed a novel chemical-based system to redirect specific chromatin modifying machinery in a targeted and reversible manner. By maintaining temporal and gene-specific control, we can begin to understand how these writer, reader, and eraser proteins function, as a means to develop personalized and targeted therapeutics.

iii To my amazing and dependable friends, family, and fiancé. Thank you for supporting me through my PhD work and so much more.

iv ACKNOWLEDGEMENTS

I would like to acknowledge the past and present, part-time and full-time, members of the

Hathaway lab. This team of researchers have encouraged me for the past four and a half years and I greatly appreciate their efforts. I would especially like to thank my mentor for guiding me through every stage of my graduate school career, from editing publications, fellowships, and travel grant applications, to providing me with mentoring experience and career advice. I would not be where I am now without his support and confidence in my abilities.

I would also like to acknowledge the people who served on my dissertation committee: Drs.

Shawn Ahmed (chair), Stephen Frye, Brian Strahl, and David Williams. Thank you for attending all of my committee meetings (especially the early morning meetings). More than being present, thank you for providing invaluable research and career advice. You have all contributed to the work in this dissertation with your experimental ideas and guidance. I also appreciate being able to ask your input and advice frequently throughout my graduate school career, not exclusively during committee meetings. Thank you to Shawn Ahmed for serving as my chair and meeting with me to discuss future plans and big-picture visions for this work. And thank you to David Williams for serving as not only a committee member, but as my clinical co-mentor in the Translational Medicine Certificate Program.

Shadowing in the hematopathology department and being a member of the Program has been an important and meaningful part of my time at UNC.

Throughout my dissertation work, I had the pleasure of contributing to several different projects and working with many talented scientists. The pilot CEM story was a close collaboration with Jian Jin and Kyle Butler, the synthetic chemists who developed and shared the molecules. The collaboration continued expanded make a more modular system of regulating the chromatin by utilizing the CRISPR-

Cas system. This expansion progressed in a large part thanks to a very talented undergraduate, April

Wang, who contributed to years of trouble-shooting and optimizing. This work included testing the activating CEMs in a variety of different biological contexts, protein engineering with the Cas9 and Cpf1

v systems, and testing different modes of expression. We also worked together to present the CEMs platform in a video-based paper to help other labs incorporate this work in their research.

Once we demonstrated the functionality of the repressive and activating CEMs, we began extensively targeting disease relevant genes (TP53, AR, let7-c, CXCR4, MYC1, etc). This work brought about a new collaboration with the Crona Lab. Their expertise in disease pathways and model systems are called upon almost daily. In addition, their welcoming attitude made the development of the new lab space as seamless and exciting as possible.

Major expansions of the CEMs system were made possible with a new, hard-working member of the Hathaway lab, Dongbo Lu. The team-based research, expansion of the snack station, and co- mentoring experience made the work more efficient and enjoyable. When the data is inconclusive, unexpected, or exciting, it has been great having a research partner to discuss with.

Several other CEM-related projects were done in a very collaborative environment, including the optimization of the dCas9 protein (Liu lab, Pengda Liu and Siyuan Su), implementation of a new

CIP-based system (Meier Lab and Elise Sloey), incorporation of an orthogonal and AAV compatible dCas9 system (Peter Voorhees), and comparison of the chromatin modifying dCas9-based recruitment systems in a high-throughput manner (Shama Birla). Thank you to all of you for your contributions towards this collaborative work.

I would next like to acknowledge the other collaborative projects that contributed to my learning experiences during my PhD. The Frye/James lab (Kim Barnash, Franky Potjewyd, Jarod Waybright) allowed me the pleasure of learning about their exciting work pertaining to a new chromatin modifying complex. This collaboration provided me experience with trouble-shooting a new recruitment strategy to study mechanism of action. It has also provided us with the knowledge to incorporate a new complex into our CEMs system in the future. With the Asokan lab (Victoria Madigan), I was able to apply my chromatin knowledge in a new context. Through this collaboration, I learned about a gene delivery system, with which I had little familiarity, from leaders of the AAV field. This work has, and will continue to play a major role shaping the CEMs disease-relevant work.

Lastly, investigation of our chromatin-based system relies on the ability to study what is occurring on DNA at the chromatin level. Our work would not be as efficient and accurate had it not

vi been for our collaboration with the Pattenden lab (Samantha Pattenden, Austin Quimby, and others that contributed to the nanodroplets work). This collaboration led to an exciting project and publication, demonstrating the use of their cavitation reagents to shear DNA and study chromatin biology. The method we developed has been instrument for our ongoing research.

vii TABLE OF CONTENTS

LIST OF TABLES………………………………………………………………………………………….. vii LIST OF FIGURES………………………………………………………………………………………...viii

LIST OF ABBREVIATIONS……………………………………………………………………...…….....xiii

CHAPTER 1: INTRODUCTION TO WRITING, READING & ERASING THE HUMAN

EPIGENOME, IN NAUTE & SYNTHETICALLY……………………………………….………. 1

Section 1.1: Chromatin Biology………………………………….…………...……….……….... 1

Section 1.2: Disease Relevance…….………………………………….……...………...... 8

Section 1.3: Synthetic Control of Chromatin………………………………………...... 18

Section 1.4: Bifunctional Molecules and CIP……………………….………………….……...24

Section 1.5: Conclusions and Future Perspectives...…………………………………...... … 25

References……………………………………………………………………………………….. 28

CHAPTER 2: TARGETED GENE REPRESSION USING NOVEL BIFUNCTIONAL MOLECULES TO HARNESS ENDOGENOUS HISTONE DEACETYLATION

ACTIVITY…………………………………………………………………………..…………….. 46

Introduction…………………………………………………………………….…………………. 46

Results……………………………………………………………………………………………. 48

Discussion…………………………………………………………………………………...... 53

Methods…………………………………………………………………………...……………… 55

References…………………………………………………………………………………….…. 59

Figures……………………………………………………………………………………………. 62

Supplemental References……………………………………………………………………….86

CHAPTER 3: REPRESSING GENE TRANSCRIPTION BY REDIRECTING CELLULAR

MACHINERY WITH CHEMICAL EPIGENETIC MODIFIERS………………………………. 87

Introduction……………………………………………………………………………….………. 87

Protocol…………………………………………………………………………….………….….. 89

Representative Results…………………………………………………………..…….……….. 99

viii Discussion………………………………………………………………………………………. 101

References……………………………………………………………………………………… 104

Figures…………………………………………………………………………………………... 105

CHAPTER 4: GENE SPECIFIC ACTIVATION DRIVE BY SMALL MOLECULES……...…...... 110

Introduction…………………………………………………………………………………...... 110

Results…………………………………………………………………………………………... 111

Conclusions……………………………………………………………………………………...114

Methods…………………………………………………………………………………………. 115

References……………………………………………………………………………………… 119

Figures…………………………………………………………………………………………... 121

CHAPTER 5: CAVITATION ENHANCEMENT INCREASES THE EFFICIENCY AND CONSISTENCY OF CHROMATIN FRAGMENTATION FROM FIXED CELLS FOR DOWNSTREAM QUANTITATIVE APPLICATIONS………………………………………...141

Introduction……………………………………………………………………………………… 142

Experimental Procedures……………………………………………………………………… 142

Results and Discussion………………………………………………………………………... 146

References……………………………………………………………………………………… 151

Figures………………………………………………………………………………………...… 153

CHAPTER 6: DISCUSSION……………………………………………………………………………. 168

Conclusions……………………………………………………………………………...……... 168

Future Directions……………………………………………………………………………….. 170

References……………………………………………………………………………………… 174

ix LIST OF TABLES

Table 1.1 - Histone and DNA writers, readers, and modified in diseases …………………...…...... 10

Table 1.2 - Genes with modified chromatin environments in a diseased context………...………...12

Table 1.3 - Inhibitors of chromatin writers, readers, and erasers for translational studies………... 17

Table 5.S1 - ……………………………………………………………………………………………… 167

x LIST OF FIGURES

Figure 2.1 - Design and structure of chemical epigenetic modifiers (CEMs)…………...... 62

Figure 2.2 - CEM23 reversibly and specifically represses GFP expression………………………... 63

Figure 2.3 - Fluorescent microscopy images and colony categorization for CEM-treated cells.…. 64

Figure 2.4 - H3K27Ac chromatin immunoprecipitation at the Cia:Oct4 locus………………………. 65

Figure 2.S1 - …………………………………………………………….………………………………… 66

Figure 2.S2 - ………………………………………………………….…………………………………… 70

Figure 2.S3 - ……………………………………………………….……………………………………… 73

Figure 2.S4 - ………………………………………………………………………………………………. 76

Figure 2.S5 - Synthesis of compounds 1-3. …………………………………………………………… 78

Figure 2.S6 - Synthesis of compounds 4, CEM23, CEM42………………………………………...…79

Figure 2.S7 - Synthesis of compounds 5, 6, CEM36………………………………………………….. 81

1 Figure 2.S8 - H NMR spectra of CEM23………………………………………………………………. 83

1 Figure 2.S9 - H NMR spectra of CEM42………………………………………………………………. 84

1 Figure 2.S10 - H NMR spectra of CEM36…………………………………………………………...... 85

Figure 3.1 - CEMs bind to the CiA:Oct4 locus through recruitment to the FKBP

and tether endogenous epigenetic machinery……………………………………………… 105

Figure 3.2 - Fluorescence images of mouse embryonic stem cells (mESCs) with

a CiA:Oct4-GFP reporter……………………………………...... ………………………..…… 106

Figure 3.3 - Flow cytometry analysis quantitates a decreased expression of

GFP in mESCs upon CEM23 treatment………………..……………………………………. 107

Figure 3.4 - Sonication of chromatin is uniform, and a smear is visible between

200 and 500 bp……………….………………………………………………………………… 108

Figure 3.5 - CEM23 treatment causes a decrease in H3K27ac at the CiA locus………………… 109

Figure 4.1 - dCas9 CEMa system recrutis endogenous chromatin modifying

machinery to regulate transcriptional activity……………………………………………..… 121

Figure 4.2 - Protein engineering and synthetic chemistry explored to optimize dCas9–CEMa

system…………………………………………………………………………………………… 122

xi Figure 4.3 - Optimized dCas9–CEMa effectively reactivates three diverse mammalian genes…...123

Figure 4.S1 - …………………………………………………………….……………………………….. 125

Figure 4.S2 - dCas9-CEMa gRNA optimization………………………….……………………………. 135

Figure 4.S3 - dCas9-HA control…………………………………………………………….……………136

Figure 4.S4 - A dCas9-SunTag approach adapted to the dCas9-CEMa system……………………137

Figure 4.S5 - CEMa in combination with an HDACi…………………………………………………… 138

Figure 4.S6 - dCas9-p300 time course at MYOD1……………………………………………………. 139

Figure 4.S7 - ……………………………………………………………………………………………... 140

Figure 5.1 - Establishment of the CiA system as a tool assessing chromatin changes…………….153

Figure 5.2.- Efficiency of chromatin fragmentation in the presence and absence of nanodroplets..154

Figure 5.3 - G9a ChIP from HEK 293T cells………………………………...... 155

Figure 5.4 - ChIP-qPCR results using traditional sonication compared to using nanodroplets……156

Figure 5.S1 - …………………………..…………………………………………………………………. 157

Figure 5.S2 - ……………………………………………………………………………………………... 158

Figure 5.S3 - ..……………………………………………………………………………………………. 159

Figure 5.S4 - ………………………………………………………………………………………………160

Figure 5.S5 - …………………………………………………………………………………….……….. 161

Figure 5.S6 - ………………………………………………………………………………………………163

Figure 5.S7 - ……………………………………………………………………………………………... 164

Figure 5.S8 - ………………………………………………………………………………………………165

Figure 5.S9 - ………………………………………………………………………………..……………. 166

xii LIST OF ABBREVIATIONS

µm microns

2OG 2-oxoglutarate

AD Activation domain

ADA2 Adenosine deaminase 2

AID Auxin-induced degradation

ALL Acute lymphoblastic leukemia

AML Acute myeloid leukemia

AOD Amine oxidase

AR Androgen

ASCL1 Achaete-Scute family BHLH 1

BET Bromodomain and Extra-Terminal bp Base pair

BRD Bromodomain

BRDi Bromodomain inhibitor

Cas CRSIRP-associated

CD Catalytic domain

CE Core enhancer

CEM Chemical epigenetic modifiers

CEMa CEM activating

CEMr CEM repressing

ChIP Chromatin immunoprecipitation

CiA Chromatin in vivo assay

CIP Chemical inducer of proximity

CML Chronic myeloid leukemia

CMML Chronic myelomonocytic leukemia

CRISPR Clustered regulatory interspaced short palindromic repeats

CTCL Cutaneous T-cell lymphoma

xiii CTD Carboxy-terminal domain

CV Coefficient of variation

CXCR4 C-X-C chemokine receptor type 4 dCas9 Deactivated Cas9

DE Distal enhancer

DLBCL Diffuse large B-cell lymphoma

DMEM Dulbecco’s modified Eagle’s medium

DMF dimethyl formamide

DMSO Dimethyl sulfoxide

DNMT DNA methyl transferase

DNMTi DNMT inhibitor

Dox Deoxycyclin

DRR Distal regulatory region

DSPC 1,2-distearoyl-sn-glycero-3-phosphocholine

DSPE-PEG2000 1,2-distearoyl-sn-glycero-3-phosphoethanolamine-N-methoxy (polyethylene-glycol)-2000 eGFP Enhanced Green Fluorescent Protein enChIP Engineered DNA-binding molecule-mediated chromatin immunoprecipitation

ESI Electrospray ionization

EtOAc ethyl acetate

FACS Fluorescence activated cell sorting

FAD Flavin adenine dinucleotide

FBS Fetal bovine serum

FISH Fluorescence in situ hybridization

FKBP12 FK506 binding protein 12

FL Follicular lymphoma

FRB FKBP12 rapamycin binding

GCN5 General Control Non-Repressible 5

xiv GNAT GCN5 related N-acetyltransferase

GR

HAT Histone acetyl transferase

HCC Hepatocellular carcinoma

HDAC Histone deacetylase

HDACi HDAC inhibitor

HDM Histone demethylase

HDR Homology-directed repair

HEK Human embryonic kidney

HMT Histone methyltransferase

HP1a Heterochromatin Protein 1 alpha

HRMS High resolution mass spectra

HS2 Hypersensitive site 2

IAP Intracisternal A-type particles

IGR Intergenic region

Indel Insertion or deletion iPSC Induced pluripotent stem cells

JmjC Jumonji C

LIF leukemia inhibitor factor

LSD Lysine specific demthylase

MBP Methyl CpG binding proteins

MBD Methyl CpG binding domain

MCP MS2 coat protein

MDS Myelodysplastic syndrome mESC Mouse embryonic stem cell mL Milliliter mM Millimolar

MPN Myeloproliferative neoplasm

xv MYOD1 Myogenic differentiation

NCOR corepressor complex

NE Neuroendocrine

NEAA Non-essential amino acids

NHEJ Nonhomologous end joining nm nanometer

NMC NUT midline carcinoma

NMR Nuclear magnetic resonance

NT Non- Targeting

NuRD Nucleosome remodeling and deacetylase

PBS Phosphate-buffered saline

PCAF P300/CBP-associated factor

PDAC Pancreatic duct adenocarcinoma

PE Proximal enhancer

PEG Polyethylene glycol

PEI Polyethylenimine

PIC Preinitiation complex

Pol II RNA polymerase II

PTCL Peripheral T-cell lymphoma qPCR Quantitative PCR

RTS Rothmund-Thomson syndrome

RNAi RNA interference

Sa Staphylococcus aureus

SAHA Suberanilohydroxamic acid

SAM S-adenosyl-1-methionine scFv Single chain variable fragment scFNA scaffold sgRNA

SFCA Surfactant-free cellulose acetate

xvi sgRNA single guide RNA

Sp Streptococcus pyogenes

TALENs Transcription activator-like effector nucleases

T-ALL T-cell ALL

TET Ten eleven translocation

TMP Trimethoprim

TRD Transcription repression domain

TSA Trichostatin A

TSS Transcriptional start site

VP160 VP16x10

VPR VP64-Rta-p65

ZFN nucleases

ZnF Zinc finger proteins

xvii LIST OF SYMBOLS

a Alpha b Beta

° Degree

µ Micro

D Delta

xviii CHAPTER 1: INTRODUCTION TO WRITING, READING & ERASING THE HUMAN EPIGENOME:

IN NATURE & SYNTHETICALLY

Section 1.1: Chromatin Biology

Somatic cells in humans contain 46 chromosomes composed of both DNA and histone proteins. If assembled in a single linear strand together these three billion base pairs of DNA would be roughly six feet long1. Specific segments of base pairs code for genes that are “read” by the cellular machinery to create something functional (protein, micro RNA, long non-coding RNA, etc.). Expression of distinct sets of genes allows variety of cell types with specific functions to be constructed from a single DNA blueprint2. For example, in mammals the red blood cells that transport oxygen throughout the circulatory system are functionally and phenotypically different from neuronal cells that reside in the brain. For the entirety of the three billion base pairs to fit into a 10 µM nucleus, while still allowing for certain genes to be expressed, DNA needs to be folded in a very organized and specific manner.

One aspect of cells with distinct gene expression profiles is altered levels of chromatin compaction. Chromatin is composed of DNA and eight histone protein cores (dimers of H2A, H2B, H3 and H4). Epigenetics is the study of how heritable changes in the chromatin and gene expression occur in the absence of DNA sequence changes3. The building block of chromatin is called the nucleosome and is made up of 146 base pairs (bp) of DNA, wrapped around a histone octamer4. Chromatin is found in a spectrum between two broad states, heterochromatin and euchromatin. Heterochromatin is the tightly compacted portion of chromatin associated with transcriptional repression5. Euchromatin is a more open form of chromatin, typically containing genes that are being actively transcribed6. The cell regulates chromatin compaction through ATP-dependent nucleosome remodeling, deposition of histone protein variants, DNA methylation, and posttranslational modifications of the N-terminus of histone tails7. In addition to allowing for proper gene expression, these processes appear to work in concert to properly regulate DNA replication, DNA damage repair, and nuclear organization6,8–10.

1 ATP-dependent chromatin remodeling enzymes, often referred to as SWI/SNF-related or Snf2- related enzymes, are able to physically alter the structure of chromatin at specific genomic regions. In eukaryotes, Snf2 proteins contain a Walker box, a conserved sequence of seven of amino acid motifs11.

The Snf2-related proteins are then divided into one of 24 subfamilies (20 present in humans)12. These protein complexes catalyze small-scale chromatin transformations including (1) physically sliding nucleosome units along the DNA to change nucleosome density (2) altering the nucleosome DNA wrapped around the histone octamer (3) exchanging or removing the octamer subunits13–15. To accomplish this, the remodeling enzymes need to bind the structurally complex nucleosome, disturb histone-DNA interactions, and physically restructure the nucleosome13. In humans, the proteins that have the greatest sequence homology to Snf2 and participate in nucleosome positioning are

SMARCA4/BRG1, SMARCA2/BRM, SMARCA5/hSNFR2H, SMARCA1/ hSNF2L, CHD1, CHD2,

CHD3, CHD4, CHD5, CHD6, CHD7, CHD7, CHD8, CHD9, LSH, PASG, SMARCA6, ALC1/CHD1L.

Based on work done in Drosophila, it is likely that nucleosome repositioning occurs in sequential steps of 1 bp per ATP molecule hydrolyzed16. The proteins that are involved in exchanging the histone dimers are DOMO1, SWR1, EP400, INO80, ETL1/SMARCAD1. Other chromatin remodeling members, whose functions are not well characterized, include RAD54L, RAD54B, ATRX, RAD54L2, SMARCA3,

SNF2L3, TTF2, SHPRH, TAF172/BTAF1, CSB, RAD26, PICH, RAD26L, ERCC6L2,

SMARCAL1/HARP, and ZRANB313. ATPase remodeling enzymes, often in association with accessory proteins, form chromatin remodeling complexes. These complexes fall into the SWI/SNF (BAF and

PBAF), ISWI (NURF, CHRAC, ACF, NORC, RSF, WICH), CHD (CHD1, NuRD), or INO80 (ION80,

SRCAP, Tip6) family of complexes17. Often these complexes are able to perform tradition “remodeling” functions as well as control histone tail post-translational modifications. For example, the NuRD complex (a member of the CHD family) includes HDAC1 and HDAC2, which are able deacetylate the histone tails and contribute to the compaction of the nucleosomes14. Because of the diversity and complexity of several of these complexes, research is still undergoing to understand how these complexes form and function in different genomic contexts. Two additional processes that regulate chromatin compaction are DNA and posttranslational histone tail modifications. Protruding out of the nucleosomes are the N-terminal histone tails that are subjected to post- translational chemical

2 modifications, whereas DNA bases (cytosine and adenine) can be directly modified with chemical groups18–21. The combination of histone and DNA modifications constitute the epigenetic code, with some highly studied modifications being associated with high or low levels of gene expression22.

Currently, four different cytosine modifications have been observed: methylation, hydroxy-methylation, formylation, and carboxylation23. For histone tail modifications, there have been over ten modifications identified, with even more currently being validated and characterized24,25. To orchestrate the histone code throughout the genome, there are a series of proteins involved. The “writer” enzymes are capable of modifying the nucleotide base or histone tail residues, whereas the “eraser” enzymes remove the modifications. “Reader” proteins comprise domains able to recognize specific DNA or histone modifications, and are able to directly or indirectly initiate downstream changes like recruitment of transcription factors, co-repressors, co-activators, and RNA polymerase7. While there are several possible modifications (phosphorylation, ubiquitination, arginine methylation, etc.), two of the most widely studied are methylation (of DNA and lysine residues on histone tails) and acetylation (of multiple lysines on histone tails)26.

The term DNA methylation is most often associated with the formation of 5-methylcytosine in the context of a cytosine-guanine (CpG) dinucleotide. DNA methylation is strongly associated with gene repression27,28. The mark is deposited by DNA methyltransferases (DNMTs), enzymes that initiate the transfer of a methyl group from S-adenosyl-1-methionine (SAM) to cytosine at the 5’ position29. The mammalian DNMT family includes DNMT1, DNMT2, DNMT3A, DNMT3B, and DNMT3L26. DNMT1 is considered the maintenance methyltransferase, as it propagates the methylation pattern from the parental to daughter strand after DNA replication30,31. Importantly, DNA methylation has been shown to propagate through cell replication32. The main function of DNMT2 is catalyzing the methylation of tRNAs, which increases tRNA stability33. The de novo methyltransferases are DNMT3A and DNMT3B, as they are able to methylate both hemi-methylated and un-methylated DNA. They are critical for introduction of DNA methylation throughout gametogenesis and the early stages of embryogenesis27,34.

Lastly, DNMT3L lacks the C-terminal domain needed for methyltransferase activity, but rather functions as a regulatory factor for de novo methylation, working in concert with DNMT3A and DNMT3B to increase activity35,36. The proteins that recognize DNA methylation, the methyl CpG binding proteins

3 (MBPs), include zinc finger (ZnF) proteins, SRA domain proteins, and methyl CpG binding domain

(MBD) proteins37. Upon recognition of DNA methylation, gene silencing is initiated in part through recruitment of chromatin remodeling complexes. For example, MBD2 (one of 11 MBD members) has a MBD domain and a transcription repression domain (TRD). It functions as a member of the nucleosome remodeling and deacetylase (NuRD) complex to silence genes26,38. Other proteins, like

SETDB1 and 2, contain a MBD domain but are unable to bind to 5-methylcytosine. Specifically,

SETDB1 and 2 do serve other functions that contribute to heterochromatin formation, including methylating histones39. UHRF1, a SRA family member, recognizes specifically hemimethylated CpG sites and functions with DNMT1 in the maintenance of DNA methylation2,40–42. DNA demethylation can occur either passively or actively. If DNMT1 activity is lacking, successive rounds of replication will passively cause a reduction in DNA methylation43. For active demethylation, the ten-eleven translocation (TET) family of proteins is required. Much remains to be discovered about this family of enzymes, but it has been shown that they are able to catalyze the oxidation steps of methylated cytosine, ultimately removing the methylation23. Though there are exceptions, DNA methylation is strongly associated with transcriptionally silenced genes.

Histone tail mono-, di-, and tri-methylation of the lysine amino acid has been highly studied, and unlike DNA methylation, its influence on transcriptional activity is strongly context dependent44. For example, histone 3 lysine 36 trimethylation (H3K36me3) is found in the gene body of transcriptionally active genes, whereas histone 3 lysine 27 trimethylation (H3K27me3) is associated with gene repression and found near the transcriptional start sites (TSS)44,45. The writers of mono-, di-, and tri- methylation of the histone tails are generically called histone lysine methyltransferases (KMTs). They transfer methyl groups from the methyl donor, adenosyl-methionine. More specifically, KMTs are split into two families based on the presence (or lack) of SET domain26. The mammalian SET-containing enzymes include the H3K4 methyltransferases (SMYD1, SMYD1, MLL1-5, SET1A, SET1B, ASH1L,

ASH2L, PRDM9, SET7)46,47, the H3K9 methyltransferases (EHMT1/GLP, EHMT2/G9a, SETDB1,

SETDB2, SUV39H1, SUV39H2, PRDM2, PRDM3, PRDM8, PRDM16)8,48–50, the H3K27 methyltransferases (EZH1, EZH2, NSD2, NSD3)51–53, H3K36 methyltransferases (NSD1, SMYD2,

SETD2)54, and the H4K20 methyltransferases (SET8, SUV4-20H1, SUV4-20H2)55. While there are

4 other methyltransferases in this broad SET-containing family, these are some of the more characteristic enzymes. The only highly studied KMT that lacks the SET domain is the H3K79 methyltransferase,

DOT156–58. The proteins that recognize lysine methylation contain one or multiple of the highly characterized reader domains: chromodomain59, chromobarrel domain60, double chromodomain61,

PWWP domain62, plant homeodomain63, WD40 domain64, malignant brain tumor domain65, and tudor domain26,66. Some of these reader domains have been discovered in proteins alongside other reader domains (i.e. UHRF1 recognizes hemimethylated DNA and H3K9me3)67 or alongside catalytic domains

(i.e. WDR5 recognizes H3K4me2 and initiates H3K4me3)64. Histone tail demethylation is performed by one of two ways. Lysine specific demethylase (LSD) members, LSD1/KDM1A and LSD2/KDM1B, contain an amine oxidase (AOD) domain responsible for demethylase activity in a flavin adenine dinucleotide (FAD)-dependent manner68. The second family of demethylation enzymes is the Jumonji

C (JmjC) domain containing enzymes69. This larger family is further split into seven subfamilies (called

KDM2-8) and all members function by using the co-substrate 2-oxoglutarate (2OG) and dioxygen with cofactor Fe(II)70–74. In the past few decades, a lot of research has elucidated the various histone demethylases and the substrates upon which they act75.

High levels of histone acetylation are associated with euchromatin and active transcription76,77.

The lysine residues of histone tails are prone to acetylation, causing a buffering of the positively charged lysine. This overall reduction in charge, ultimately leads to a decreased association between the negatively charged DNA and the histone subunits25,78–80. In addition, downstream recruitment effects occur because of the presence of the acetylation mark. Lysine acetylation marks are deposited from acetyl co-A by histone acetyl transferases (HATs). The two HAT families are classified based on cellular location. Type A HATs act on chromatin in the nucleus, whereas Type B HATS act on newly translated histones in the cytoplasm. The Type A HATs are further classified into five subtypes based on their structural and functional characteristics: GNAT, MYST, p300/CREB, TAF250, SRC/NCoA26. The main

Type B HAT is HAT1, which forms a complex with histone chaperones and is responsible for depositing

H4K5ac and H4K12ac81. The GNAT (General Control Non-Repressible 5 (GCN5) related N- acetyltransferase) family contains a P300/CBP-associated factor (PCAF) domain, an adenosine deaminase 2 (ADA2) domain, and a bromodomain and its members are KAT2a/Gcn5L and KAT2B

5 (PCAF)82. The MYST (Moz, YBF2/SAS3, SAS2, TIP60) family is characterized by an acetyl CoA binding site and a zinc finger motif and includes KAT5/TIP60, KAT6A/MOZ/MYST3, KAT6B/MORF/MYST4,

KAT7/HBO1/MYST2, and KAT8/hMOF/MYST183. The p300/CBP family of HATs, KAT3A/CBP and

KAT3B/p300, has been associated with all four histone subunits and contains several domains including the KIX domain, a bromodomain, cysteine-histidine rich domains, and a HAT domain84. The

TAF250 HATs associate with general transcription factors and stabilize transcriptional machinery85.

Lastly, the SRC/NCoA family of HATs recruit coactivators with histone methyltransferase and acetyltransferase activity86.

To date, three domains have been identified to bind acetylated histones, the bromodomains

(BRD), double PHD finger, and Yeats domain26. Bromodomain containing proteins are the most abundant acetyl lysine mark readers, and are classified into 8 families based on similarities in structure and sequence26,87,88. All bromodomains have a conserved motif of four helices and interhelical ZA and

BC loops89,90. The bromodomains of BRD family I are found in PCAF, GCN5L2, FALZ, and CECR2.

These domains recognize H3K14ac, H4K8ac, and H4K16ac26. BRD family II are found in the family members of the Bromodomain and Extra-Terminal Domain (BET) proteins BRD2, BRD3, BRD4 and

BRDT. The BET proteins bind H4K12ac, H3K14 ac, H4K5ac, and H4K8ac. Several ubiquitously expressed proteins contain BRD family III including EP300, CREBBP, WDR9, PHIP, BRD8B, BAZ1B,

BRWD3. BRD family IV is found in ATAD2, BRD1, BRD7, BRD9, BRPF1A, BRPF1B, BRPF3, and

KIAA1240. BRD family V includes BAZ2A, BAZ2B, LOC93349, SP100, SP110, SP140, TIF1a, and

TRIM33, and is characterized by a tandem PHD/BRD. BRD family VI includes MLL and TRIM28. BRD family VII contains ZMYND11, TAF1, TAF1L, WDR9, BRWD3, and PHIP. Lastly, BRD family VIII contains ASHIL, SMARCA2, SMARCA4, and PB126. To date, there are five proteins shown to contain a double PHD finger domain involved in acetyl-lysine recognition, MOZ/KAT6A, MORF/KAT6B, DPF1,

DPF2, and DPF391. The YEATS domain is a relatively new domain currently being further characterized. It has recently been shown that the YEATS domain-containing protein, AF9, binds histone lysine acetylation and crotonylation92. The enzymes that carry out the removal of lysine acetylation, ultimately increasing chromatin compaction and gene repression are called histone deacetylases (HDACs), and are subdivided into class I, IIa, IIb, III, and IV26,93,94. HDAC Class I, IIa, IIb,

6 and IV function in a zinc-dependent manner, whereas class III, or sirtuin, HDACs function through a

NAD+ dependent manner26,93,95. Class I consists of HDAC1, 2, 3, and 8, all of which reside predominantly in the nucleus and are found in most tissue types. All but HDAC8 function in conjunction with a repressive complex. For example, HDAC1 and HDAC2 are part of the Sin3, NuRD, CoREST, and PRC2 complexes, and HDAC3 functions as part of the NCoR or SMRT complex96. Class II HDACs are capable of shuttling in and out of the nucleus. HDAC4, 5, 7 and 9 are the members of Class IIa and are responsive to cellular signaling pathways due to their longer N-terminal region26. HDAC6 and

HDAC10 are the members of Class IIb. Uniquely, HDAC6 contains two catalytic deacetylation domains and functions as a tubulin deacetylase97. Not much is known about HDAC10, but it has been suggested that it functions in cell cycle regulation93. The sole member of HDAC Class IV is HDAC11, about which not much is currently known aside from its sequence similarity to Class I HDACs but absence from repressive complexes like Sin3 and CoREST26,93. Class III HDACs, or the SIRTs, include SIRT1-

SIRT726,95. SIRT1 deacetylases both histones (at H1K16 and H4K26) and other proteins, and so it moves between the cytoplasm and nucleus98–100. SIRT2 is predominantly localized to the cytoplasm but can shuttle into the nucleus and deacetylate H4K16 and H3K5698,101. Similarly, SIRT3 is predominately localized to the mitochondria, but can shuttle to the nucleus to deacetylate the same residues26,98. SIRT4 and SIRT5 do not currently have any known histone substrates98. SIRT6 is found in the nucleus and acts on H3K9ac and H3K56ac. Lastly, SIRT7 acts on H3K18ac in the nucleus26.

While the presence of histone tail acetylation has been known for decades, the complexity of its regulation and influence on the cell is still being actively studied.

The close relationship and interplay between chromatin biology and transcriptional regulation stems from two processes, physical accessibility of the DNA based on the level of compaction and the addition or removal of chromatin modifications by transcriptional machinery. At the eukaryotic promoter,

RNA polymerase II (Pol II) binds and, along with other general transcription factors, forms the preinitiation complex (PIC)102. Importantly, the chromatin environment is important for all these steps, from Pol II binding and PIC formation, to elongation and termination. It has been shown that changes in nucleosome net charge, influenced by acetylation levels, controls how easily DNA can be displaced by the transcriptional machinery103. In addition, H4K16ac prevents 30 nm fiber formation, a compact

7 form of chromatin, thereby allowing for increased overall accessibility of the DNA to the transcriptional machinery104. Within 30 bp of transcription Pol II is phosphorylated at the carboxy-terminal domain

(CTD). This is an important step for the recruitment of chromatin modifying machinery and proper elongation105. The phosphorylated CTD recruits histone tail modifying writers and erasers. For example,

Pol II-mediated Set2 occurs during elongation to deposit H3K36me3, followed by hypoacetylation of the gene body106. While this process is more complex than what has been described, these examples illustrate how transcriptional regulation and the chromatin environment are linked. Because of this interaction, changes in the function, recruitment, or expression of chromatin modifying proteins can have a major impact on the expression of essential genes107.

Section 1.2: Disease Relevance

A flood of discoveries over the past decade have cemented the driving role of disruption to chromatin regulatory pathways leading to many human diseases. Advances in technology and cleaver new approaches have paved the way for understanding how these regulatory processes are dysregulated in diseased settings, what the downstream consequences are, and how compounds can be developed to exploit or correct these changes. Many clinical trials using epigenetic-based therapies are ongoing and a handful of epigenetic based drugs have been approved, yet more targeted and less toxic approaches are needed.

Over the past several decades, both expression changes and sequence mutations have been found for chromatin regulatory machinery, however much work remains on detailing the early epigenetic mechanisms driving these diseases. In addition, global and gene-specific changes in histone or DNA modifications have been associated with multiple diseases, but the specific driver orchestrating these epigenetic changes is not clear. It is also not always certain whether a cancer-specific change is a clinically-irrelevant downstream effect, a potential biomarker for evaluating disease progression, or a central driver or addiction of the disease that might be worth targeting therapeutically. To answer this question, the mechanisms of action and the direct consequences of the dysregulation need to be elucidated. In order to study these disease and develop targeted, epigenetic-based therapies, it is important to understand what disease have epigenetic changes that can be potentially exploited.

8 A wide range of diseases have been identified to have mis-expression of a chromatin regulatory enzyme with potential clinical relevancy. For example, multiple groups implicated overexpression of a polycomb group family member, BMI-1, in the initiation and progression of certain types of lymphoma108–110. It was shown that the zinc finger motif (specifically the RING finger) is required for

BMI-1’s oncogenic effects110. Additionally, BMI-1 has been more recently implicated in the tumor initiation of hepatocellular carcinoma (HCC)111 and the cell proliferation and migration of pancreatic duct adenocarcinoma (PDAC)112. While the complete mechanism of action is not known, it is evident that BMI-1 overexpression, alone, is not enough to cause tumor formation. In the PDAC study, for instance, a mutant KRAS (G12D) is required for malignant transformation in vitro112. These works implicate BMI-1 as a potential therapeutic target for a combinatory drug approach. Other chromatin regulatory proteins found to be mutated or mis-expressed in pre-clinical or clinical models of disease are shown in Table 1.

9 Table 1: Histone and DNA writers, readers, and modified in diseases.

Mutated writers, Alteration Disease (s) readers, and erasers Missense, Frameshift, Nonsense, DNMT3A AML113, MDS114 Splice site TET1 Translocation AML115 TET2 Missense, Nonsense, Frameshift MPN116, AML117, MDS118, CMML119 Translocation, Nonsense, Missense, AML120121, DLBCL122, MDS123, CBP/KAT3A Frameshift RTS124 CML125, Pancreatic126, Translocation, Nonsense, Missense, P300/KAT3B Colorectal126, Breast126, DLBCL122, Frameshift AML127 MOZ/KAT6A Translocations AML127,128 MORF/KAT6B/MYST4 Translocations AML120, Uterine Leiomyomata129 BRD2 Unknown ALL130 BRD3 Translocation, Missense Midline Carcinoma131, Lung132 BRD4 Translocation Midline Carcinoma133 TRIM33 Translocation Lung134 Partial Tandem Duplication, KMT2A/MLL1 AML135, ALL136 Translocation Medulloblastoma137, Breast138, KMT2B/MLL2 Nonsense, Frameshift, Missense Renal139, DLBCL140, Prostate141, FL140, Lung142 KMT2X/MLL3 Nonsense Medulloblastoma137, Breast138

KMT3A/SETD2 Nonsense, Frameshift, Missense Renal139, Breast138

KMT3B/NSD1 Translocation AML143 NSD2 Translocation, Missense Multiple Myeloma144, ALL145 NSD3 Translocation AML146 KMT6/EZH2 Missense DLBCL147, MPN148, MDS149 KDM5A/JARID1A Translocation AML150 KDM5C/JARID1C Nonsense, Frameshift, Splice site Renal139 Deletion, Nonsense, Frameshift, AML151, Renal151, Oesophageal151, KDM6A/UTX Splice site MM151, CML151 PHF6 Deletion, Missense T-ALL152, AML153 BRD8 Missense, Nonsense Liver154 DNMT1 Nonsense, Missense Colon155 HDAC2 Frameshift Colon156 HDAC9 Missense Prostate157 PRDM9 Nonsense, Missense Head and Neck158 SETD2 Frameshift, Nonsense, Splicing site Glioblastoma159, Renal160 SETD1A Nonsense Breast138

10 In most cases, when a chromatin regulatory protein has altered gene expression, localization, and/or functionality (caused by a mutation, modified chromatin environment, chromosomal translation, etc.) the chromatin environment and expression levels of dozens to thousands of genes can be transformed. The downstream gene expression changes are often what causes the direct disease etiology, rather than the specific chromatin modifying machinery directing the changes. For example,

CpG island hyper-methylation occurs in subtypes of breast and ovarian cancer at the BRCA1 locus161,162. BRCA1 is a tumor-suppressor gene (TSG) involved in DNA damage repair, the silencing of which can cause an increased mutation rate and genomic instability in the cancer163. While hyper- methylation of the promoter of TSGs is a common cancer-related phenomenon164–166, the mechanism of action is still not completely understood. DNMTs have been implicated in the inactivation of TSGs in several tumor types35,167,168, however it is not likely that the overexpression of DNMTs, alone, is enough.

It has been suggested that recruitment of the DNMTs (by transcription factors, co-repressors,

H3K9me3, lncRNAs etc) is also required165,169,170. Other disease-relevant genes that have identified with changes in the chromatin environment are shown in Table 2.

11 Table 2: Genes with modified chromatin environments in a diseased context.

Gene Chromatin change Disease (s) Reference TP53 Promoter hypermethylation Glioblastoma 171 CDKN2A Promoter hypermethylation Burkitt’s lymphoma 172 Increased H3K9me2 Liver cancer 173 TSSC3 Promoter hypermethylation Osteosarcoma 174 FSHD Decreased CpG methylation FSHD 175 miR-181c Promoter hypermethylation Glioblastoma 176 Sat2 Loss of H3K4me3 Leukemia 177 p21 Decreased H3ac and H4ac Bladder cancer 178 IL6 H3K9me2 Type 1 Diabetes 179 Decreased H3K27me3 / TAL1 T-ALL 180 Increased H3K27ac TMPRSS4 Decreased DNA methylation Lung Cancer 181 RASSF1A De Novo DNA methylation Breast Cancer 170 STAT1/MyD88 Increased H3K9ac Type 1 Diabetes 182

12 In some cases, it may be advantageous to pharmacologically target downstream gene activity with altered chromatin environments, rather than the mutated chromatin regulatory machinery that is causes the gene expression changes. Unfortunately, when chromatin regulatory machinery is dysregulated in some manner, there are likely changes in the expression of multiple genes contributing to disease progression. In addition, many overexpressed proteins are not able to be targeted due to a lack of specific and potent inhibitors. There are also cases where a gene is repressed (i.e. promoter hyper-methylation of TSG) and needs to be re-expressed. In cases of multiple gene targets, inadequate inhibitors, or gene repression, controlling the chromatin regulatory machinery in a global manner may be the best approach currently available.

With the understanding that epigenetic changes can contribute to disease initiation, growth, and/or therapeutic resistance, small molecules have been developed to study epigenetic machinery and to regulate its activity. The idea of using epigenetic agents in combination with the standard-of- care therapy regiment has been around for over 40 years183. Though epigenetic inhibitors and chemical probes have been used this, the mechanism of interaction was not able to be determined directly until very recently. In part due to novel techniques in structural biology and mass spectrometry, more direct questions can be answered so that potent, target-specific inhibitors can be synthesized.

DNA methylation changes are a hallmark of cancer cells and present as a very desirable target.

DNMT inhibitors (DNMTi) are divided into “nucleoside” and “non-nucleoside” inhibitors, and were first synthesized in the 1960’s. The first two inhibitors synthesized, Azacytidine (5-azacyidine) and

Decitabine (5-aza-2’-deoxycytidine) fall into the nucleoside category because they mimic the structure of cytidine, incorporate into the DNA strands and form irreversible interactions with DNMTs, causing

DNMT degradation184. Both drugs were FDA approved to treat patients with acute myeloid leukemia

(AML), and Azacytidine is also approved to treat patients with myelodysplastic syndrome (MDS)185.

Since then, other nucleoside inhibitors (cytidine and cytosine analogues) have been synthesized

(Guadecitabine, SGI-110, CP-4200, 4’-Thio-2’-deoxycytidine, RX-3117), and are at the clinical stage for treatment of HCC, MDS, AML, metastatic pancreatic cancer, and advanced solid tumors. Other inhibitors that are in the non-nucleoside category exist and function by a different mechanism. For example, procainamide (used to treat cardiac arrhythmias) binds to the CpG regions of DNA to block

13 DNMT activity186. SGI-1027, another non-nucleoside inhibitor, was found to inhibit DNMT activity in colon cancer cells and reactivate the previously hyper-methylated MLH1 and P16187. Currently, there aren’t any published inhibitors of DNA methylation readers or DNA-demethylation, though inhibitors of these kind would potentially have a strong clinical relevance.

Changes in the lysine methylation at specific genes has an important role in affecting the expression of many genes important in tumorigenesis. For this reason, inhibitors of histone lysine methyltransferases, demethylases, and reader proteins are currently being synthesized and tested in pre-clinical and clinical models. The first G9a selective inhibitor synthesized was BIX-01294, with the goal of affecting levels of methylation at H3K9188. To improve upon BIX-01294, UNC0638 was developed and was shown to hypersensitize cancers to lower doses of chemotherapies189,190. Another

G9a inhibitor, A-366 was shown to be effective against leukemia191. Several inhibitors of EZH1 and

EZH2 (subunits of PRC2 that catalyze H3K27 methylation) exist and are classified based on structure into pyridine-indazole scaffolds (EPZ005687, UNC1999, GSK343), pyridine-phenyl scaffolds

(EPZ006088, EPZ6438), and pyridine-indole scaffolds (GSK126, CP1-1205, EI1). Several of these inhibitors are active in clinical trials. For example, EPZ6438 (aka Tazemetostat) is currently being tested in patients with B-cell lymphoma192. To achieve the opposite effect, lysine demethylase inhibitors targeting LSD1 and the JmjC domain have been synthesized and are at the clinical or preclinical stage.

To target LSD1, a reversible inhibitor, HCI-2509, was developed and found to be effective against a subtype of neuroblastoma cells193. Additionally, a Tranylcypromine analogue inhibitor of LSD is currently in a clinical trial to treat AML26. Several MBT family genes (readers of lysine methylation) have reportedly been over-expressed in cancers, suggesting a therapeutic need for inhibitors. UNC926, a

L3MBTL1 inhibitor, and UNC1215, a L3MBTL3, inhibitor have been synthesized and validated194,195.

Because methylation of lysine residues on histone tails can have contrasting effects on gene expression, the need for target-specific inhibitors is especially important.

Inhibitors of lysine acetylation writers, erasers, and readers have the largest presence in preclinical and clinical trials26,89,93,95,98,183,196–212. To target histone acetyltransferases, HAT inhibitors are being identified and characterized. TH1834 is a novel Tip60 inhibitor that has shown pro-apoptotic effects in breast cancer cell lines213. Recently, Lasko et al. discovered a potent and selective p300/CBP

14 inhibitor, designated A-485. With high resolution, crystallography, A-485 was shown to bind the catalytic active site of p300, and compete with acetyl-CoA. It was further shown to inhibit the transcriptional program in prostate cancer, thereby decreasing tumor growth197. Currently, five HDAC inhibitors (HDACi) have already been FDA approved, with several more in clinical trials. The first HDACi discovered was Trichostatin A (TSA), a naturally occurring inhibitor of zinc-dependent HDACs, though it has yet to be used clinically214. Since then, several more hydroxamic acid derivatives were discovered that function as pan-HDACi and have been approved for clinical use including Vorinostat for cutaneous

T-cell lymphoma, Belinostat for peripheral T-cell lymphoma, and Panobinostat for multiple myeloma215–

217. Other chemical classes of HDACi have also been synthesized and tested in disease settings, including the benzamide, romidepsin, and short chain fatty acid classes209. Some HDACi have found to be class specific, which offers the advantage over pan inhibitors of being able to cause a more specific effect or ask more directed questions. For example, entinostat is selective for class 1 HDACs, and has been shown to be effective at treating Hodgkin’s lymphoma198,199. For targeting readers of histone tail acetylation, interactors of the bromodomain represent the largest category of compounds currently in preclinical or clinical trials. Perhaps the most widely used are JQ1 and I-Bet762, the first potent and selective BET bromodomain inhibitors (BRDi)204,218,219. I-Bet762 is currently being tested in a clinical trial for NUT midline carcinoma. More recently, OTX015 is another BRDi currently being tested in patients with hematologic malignanies26. In all, these inhibitors represent useful tools for understand chromatin regulatory pathways as well as potentially life-saving therapeutics.

One disadvantage of using epigenetic based therapies is that many inhibitors target multiple chromatin regulatory proteins. Secondly, targeting chromatin regulatory machinery in a cell-wide manner can affect expression of dozens, if not hundreds of other genes. During the first few decades of clinical trials, high levels of toxicity were observed when patients were treated with HDACi220. This was likely due to poor bioavailability and/or off-target binding effects183,221. For example, the degradation rate of butyrate and phenylbutyrate required doses greater than 400 mg/kg/day220, but it was also shown that these compounds also inhibited phosphorylation and methylation of non-histone substrates222. VPA has been previously shown to cause birth defects, like neural tube closure defects, in recently pregnant patients223. More recently, clinical trials with prostate cancer patients using HDAC

15 and DNMT3 inhibitors have shown high levels of toxicity. For example, Vorinostat in combination with a chemotherapy (docetaxel) was terminated due to toxicities including neutropenia, peripheral neuropathy, and gastrointestinal bleeding224. Other documented toxicities in prostate cancer patients included fatigue, nausea, vomiting, thrombocytopenia, and weight loss225. DNMT and HDAC inhibitors have been shown to induce expression at cryptic transcriptional start sites and upregulate endogenous retroviruses. This presents previously unexpected, and likely unpredictable off-target side effects226.

While improvements in creating potent drugs to reduce off-target toxicities have been made, being able to affect the chromatin environment of specific genes or limit drug exposure to cancer cells over healthy cells are areas being actively pursued in research. Table 3 displays previously synthesized inhibitors of DNA and histone readers, writers, and erasers. For more examples of epigenetic-based dyregulation in disease and therapeutic strategies please review Biswas et al.26, Jones et al.227, Feinberg et al.228, and Portela et al.229.

16

Table 3: Inhibitors of chromatin writers, readers, and erasers for translational studies.

Inhibitors of writers, Chromatin target Status of compound Reference readers, erasers UNC926 MBT domain Preclinical 195 UNC1215 MBT domain Preclinical 194 MS37452 CBX7 Preclinical for prostate cancer 230 CF16 Pygo 2 Preclinical 231 I-BET762 BET bromodomain Clinical for NMC (Phase 1) NCT01587703 JQ1 BET bromodomain Pre-clinical for squamous carcinoma 204 MK-8628 BET bromodomain AML and DLBCL (Phase 1) NCT02698189 Bizine LSD1 Preclinical for lung and prostate 232 Namoline LSD1 Preclinical for prostate 233 Vorinostat HDAC Approved for CTCL 215 Belinostat HDAC Approved for PTCL 234 Panobinostat HDAC Approved for multiple myeloma 235 Quisinostat HDAC Clinical for ovarian cancer (Phase 2) NCT02948075 Entinostat HDAC Clinical for NE tumor (Phase 2) NCT03211988 Mocetinostat HDAC Clinical for Lymphoma (Phase 2) NCT02282358 Azacytidine DNMT Approved for MDS 236 Clinical for rhabdoid tumors, INI1- Tazemetostat EZH2 NCT02601937 negative tumors (Phase 1) EPZ6438 EZH2 Clinical for lymphoma 192 UNC0638 G9a Preclinical for breast cancer 189 AZ-505 SMYD2 Preclinical for polycystic kidney 237 EPZ-5676 Dot1L Clinical for leukemia NCT01684150

17

Section 1.3: Synthetic Control of Chromatin

One way that researches learn the more about the function of any given RNA or protein is by changing expression levels by synthetic technique. Prior to recent advances in genome and epigenome editing in the mammalian genome, an early technique that evolved to study gene functionality is RNA interference (RNAi). Here, a biological process has been repurposed for silencing mRNA in a relatively quick and efficient manner. Unfortunately, RNAi is a does not reduce protein levels to that achieved with a gene knock-out. In addition, the RNAi is generally transient in its effectiveness, inconsistent between sequences, cell types, and labs, and can occur at off-target sites238,239. Since the development of custom zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regulatory interspaced short palindromic repeats (CRISPR)/ CRISPR-associated (Cas) systems, gene editing has been possible. These transformative technologies enable modification of the genome directly by deleted genes (through nonhomologous end joining (NHEJ) creating insertion deletion (Indel) events or by modifying/inserting specific sequences or genes (through inducing homology-directed repair (HDR) and supplying a homology-containing donor template). NHEJ and

HDR are two pathways in the DSB repair process, with NHEJ involving the ligation of broken DNA ends, typically leading to a small insertion/deletion and/or knock-inducing frame-shirt, whereas HDR involves the incorporation of the donor molecule to insert a transgene or nucleotide substitution240. To edit the genome, these DSB and repair pathways are initiated by fusing sequence-specific DNA-binding domains (ZFN or TALEN) to the Fok1restriction endonuclease, or by designing a sequence-specific single guide RNA (sgRNA) to recruit a Cas endonuclease241–245. These techniques have paved the way for genome editing and advanced the fields of basic science and translational research.

ZFNs were the first of these technologies to be developed for editing the mammalian genome, with the most commonly used type being the Cys2-His2 zinc-finger domain. Each zinc-finger consists of roughly 30 amino acids that form a bba conformation, with the a-helix capable of binding 3 base pairs of DNA. The optimized system uses six zinc fingers targeting 18 bases, theoretically providing a specific within 68 million bases of DNA246. Zinc finger domains have been optimized to recognize most of the possible nucleotide triplets, such that pre-designed zinc finger modules can be linked together

18 through cloning techniques and used target a desired sequence. Once linked to the Fok1 endonuclease, the zinc fingers are able to induce a DSB in a gene-specific manner. The major disadvantage of this technique is that it is a time consuming and labor-insensitive process. The alternative way to use this technology is by purchasing pre-made constructs to target your gene-of- interest, though they are typically expensive. TALE proteins are naturally occurring in bacteria and contain a series of 33-35 amino acid repeat domains that recognize a single DNA base pair. Similar to zinc fingers, TALE repeats are strung together to recognize specific DNA sequences247. While this single-base-pair approach allows for more design flexibility than the triple nucleotide binding of the zinc fingers, the cloning process is more arduous and expensive. The most recent technique adapted for mammalian genome editing is the CRISPR-Cas system, which was discovered and adapted from bacteria and archaea. Serving as their native immune system from foreign phages and plasmid DNA, the CRISPR system includes a Cas nuclease. The most commonly used CRISPR system uses a Cas9 protein that is recruited to a specific DNA sequence by way of a DNA-binding RNA. For Cas9- recruitment to a complementary sequence fragment, a CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA) are required. A stemloop portion interacts with the Cas9 enzyme, and a complementary sequence hybridizes with the DNA of interest. Recently, a chimeric crRNA-tracrRNA can been created, termed sgRNA. A sequence complementary to the region of interest, typically 20 base pairs in length, is ligated into a sgRNA-expressing plasmid. To design the 20 base pair sequence, the only essential targeting component is the presence of the protospacer-adjacent motif (PAM) at the

3’ end of the DNA complementary region248 With these modified sgRNA, gene editing can be performed with merely the Cas9 enzyme and the sgRNA in an quick and inexpensive manner249. The ZFN, TALEN, and CRISPR systems have all been modified for use in several model systems, including human, zebrafish, mouse, worm, plant, and fly241,244,248,250–258. While these gene-editing technologies have been valuable for studying genotypic functions, there remains issues with low-efficient gene editing and irreversible off-target effects.

To induce the recruitment of various effectors and domains in a controlled and specific manner, a nuclease deficient Cas9 has been made. By creating a point mutation in each of the nuclease domains, RuvC1 (D10A) and HNH (H841A), the endonuclease activity is abolished. The deactivated

19 Cas9 (dCas9) is still recruited to the genomic region of interest by the sgRNA , but it will not induce a

DSB253. Dozens of labs have added adapter proteins or domains to dCas9 for a wide range of purposes.

Shortly after the creation of the optimized dCas9, several labs began fusing activating domains to the dCas9 protein with the aim of activating endogenous gene expression. Rather than inferring the activity of a gene product by exogenously expressing the gene, these techniques allow for a more biologically relevant means of directly increasing gene expression. Maeder et al. began by using a dCas9-VP64 fusion to over-express human NTF3 and VEGFA with a dCas9-VP64 construct. VP64 is a string of four VP16 fusions (well-characterized transcriptional activators). They also showed that co- expressing multiple gRNAs to target the same gene at different locations can increase gene activation in a synergistic manner259. Other groups quickly began testing other mammalian genes using dCas9 directly fused to activatong machinery, including VP64244,260,261, VP160 (VP16x10)262, VP192263, p65

AD (activation domain)260, VPR (VP64-Rta-p65)264,265, Tet1 CD (catalytic domain)266–268, and the p300 core domain269.

In addition to testing activating fusions of dCas9, labs began demonstrating the ability to efficiently decrease expression with dCas9 fused to proteins and domains associated with gene repression260,261,265. Piloting work by Gilbert et al. used a dCas9 fused to KRAB (a strongly repressive domain),270 decreased expression of a fluorescent reporter 15-fold, and demonstrated the gene- specificity of the repression with RNA-sequencing. They next applied this system to target endogenous genes (CXCR4 and CD71) in cell lines stably expressing dCas9 or dCas9-KRAB260. To re-write the

DNA methylation landscape at a specific gene, labs have used dCas9-DNMT fusions. Vojta et al. used a dCas9 fused to DNMT3A and demonstrated the ability to regulate the CpG methylation status in a roughly 35 base pair window from the sgRNA-binding sequence. In addition, they showed the expected decreased in expression of the endogenous genes in a specific manner271.

Not only can the genes of interest be regulated through targeting of the promoter, but dCas9 has also been shown to effectively control expression through targeting of non-promoter regulatory elements. The Gershbach lab and colleagues targeted core enhancers (CE), distal regulatory regions

(DRR), proximal enhancers (PE), and distal enhancers (DE) with dCas9 direct fusions (dCas9-VP64 and dCas9-p300). For activating MYOD with sgRNAs targeting the CE, the DDR, or promoter, dCas9-

20 p300 core was more effective than dCas9-VP64. In addition, when gRNAs were used targeting the hypersensitive site 2 (HS2) enhancer region of the human b-globin locus dCas9-p300 was able to significantly overexpress multiple downstream hemoglobin genes simultaneously272. This group also published work targeting the same HS2 enhancer with a dCas9-KRAB and demonstrated the ability to significantly decrease expression of multiple globin genes. Lastly, they also showed that dCas9-KRAB recruitment induced induce H3K9me3 (at the enhancer region to which the dCas9 was recruited) and decreased chromatin accessibility (at the enhancer and promoter region)273. Kearns et al. also demonstrated that cell type-specific regulatory elements can be controlled through recruitment of histone demethylase activity with dCas9-LSD1274. Lastly, a dCsa9-based high-throughput screen was recently performed with dCas9-p300 (for activation) and dCas9-KRAB (for repression) to study regulatory elements in their native context. By targeting the dCas9 fusions to DNase I hypersensitive sites (or regions of open chromatin) surrounding a gene of interest, gain- and loss-of-function screens were performed. Not only were previously identified regulatory elements validated for the b-globin locus, but previously unappreciated regulatory elements of HER2 were identified275.

The next advancement to occur with dCas9-based editing is the incorporation of indirect effector recruitment. For example, the “dCas9-SunTag” system has greatly increased the number of epigenetic modulators that can be tethered to a dCas9. The SunTag component is a repeating peptide array that can recruit antibody-fusion (termed single chain variable fragment (scFv) fusions)276. Suntag adapters capable of recruited 4, 5, 10 or 24 scFv-fusions have been created as a way to increase the effector efficiency (i.e. with scFv-VP64276, scFv-TET1267, scFv-DNMT3A277). A second indirect recruitment technique that increases the recruitment power of dCas9 is the use of scaffolding sgRNAs

(scRNAs) in place of the tradition sgRNAs. The scRNAs provide extra recruitment power by the addition of extra stem loops, while maintaining the dCas9-recruitment function. Three scRNA varieties were made and compared with MS2 stemloops (to recruit MCP proteins), PP7 stemloops (to recruit PCP proteins), and com stemloops (to recruit Com proteins). By creating MCP-, PCP-, and Com- VP64 fusions, it was shown that activation through scRNAs was more effective than the dCas9-VP64 direct fusions. Having two stemloops of identical or differing stem loops was also found to increase recruitment power278. By taking into account the crystal structure of the interaction between dCas9 and

21 sgRNA, the MS2 system was further optimized. It was shown that incorporating the MS2 stemloops within the dCas9-recruiting stem loops, rather than tacking the additional loops downstream of the original stemloops, the recruitment power with MCP fused to VP64 was 14-fold more efficient (for

ASCL1 targeting). The authors hypothesized that this could be the result of improved MS2 positioning or degradation protection of the integrated MS2 stem loops by dCas9. It was also shown that engaging multiple activators, for example through dCas9-VP64 and MCP-P65, was more effective than when the dCas9 and MCP are fused to the same effector279. Gene activation with MCP-TETcd was also found to be effective, through active DNA demethylation280. Chavez et al. conducted a side-by-side comparison of several commonly used dCas9-direct fusion, scRNA, and SunTag activation strategies in several human, mouse, and fly cell lines. Unsurprisingly, the relative effectiveness between the different modes of recruitment appeared to be dependent on effector, cell line, and sgRNA location281.

To control the expression and activity of dCas9, several labs began designing and optimizing dCas9 constructs that are chemically responsive. To control the expression level of the full dCas9 construct, Conklin and collogues used a Dox-inducible promoter to control dCas9 expression. They showed that dCas9-KRAB expression was highly dependent on Dox, as well as reversible within two days282. Conversely, Adli and colleagues demonstrated the ability to turn off dCas9 expression by the addition of a chemical. By repurposing a plant-based technology called auxin-induced degradation

(AID), dCas9 proteins fused to a specific degron are rapidly and reversibly degraded in the presence of auxin. Within 6 hours of auxin treatment, roughly 90% of the dCas9-p300 expression was reduced283.

Groups have demonstrated the ability to control the epigenome-wide programming of mammalian cells. For example, Chakraborty et al. used the dCas9 system to stably overexpress endogenous MYOD1 in mouse embryonic fibroblasts, successfully reprogramming the cells into skeletal myocytes284. Another group activated several gene simultaneously and was able to induce differentiate of human induced pluripotent stem cells (iPSC) into neuronal cells 264. To control differentiation in a tunable and reversible manner, a group used a doxycylin (Dox)-inducible dCas9 fusion to repress pluripotent-related genes (, OCT4, NANOG) in iPSCs. They further demonstrated that knockdown of pluripotent genes with dCas9-KRAB-expressing-cells resulted in greater than 99% of gene expression (in a uniform and consistent manner among the cell population),

22 whereas gene knockout with cells expressing Cas9 resulted in 30-40% of the cell maintaining gene expression and induced a less uniform population of differentiating cells282. Lastly, Balboa et al. and

Liu et al. used strong activators and targeted genes like OCT4, SOX2, NANOG, and LIN28A to reprogram mouse embryonic fibroblasts263 and human skin firboblasts285 into iPSCs.

To study the organization and dynamics of chromatin within the mammalian nucleus, a group used an enhanced GFP (eGFP)- tagged dCas9 to label and image telomeres and individual coding genes in live cells. The florescent intensity levels and localization (indicative of dCas9 and telomere or gene presence) were validated with fluorescence in situ hybridization (FISH). In addition, they demonstrated the ability to track the movement of telomeres and the spatial organization of individual gene loci in real time286.

Combining the dCas9 technology with engineered DNA-binding molecule-mediated chromatin immunoprecipitation (enChIP) has been shown to be an effective tool to study DNA-protein and DNA-

RNA interactions at a given gene of interest. In addition, the specificity of dCas9-DNA binding can be assessed in an unbiased manner. First, a flag-tagged dCas9 was targeted an endogenous gene with co-expression of a sgRNA, crosslinked the cells, sonicated, and performed immunoprecipitation (IP) with an anti-Flag antibody to isolate DNA that formed an interaction with dCas9. Once the DNA is purified, downstream DNA sequencing, RNA sequencing, or mass spectrometry can be performed to determine interaction specificity and potential RNA and protein interactions at a given gene287. This type of tool was recently optimized and further characterized by Liu et al. using a Biotinylated-dCas9 in vivo. The group aggressively investigated the human b-globin locus and studied 3D chromosomal interactions, chromatin modifications, interactions with regulatory proteins, as well as how these characteristics are affected by cellular state and differentiation. With this highly sensitive method, previously unidentified regulatory elements of the b-globin locus were identified and fully characterized288. For further reading of dCas9-based chromatin regulation please read Brocken et al.289 and Thakore et al290.

While the dCas9 and other CRISPR-related systems are continuously being improved and functionally expanded, one major disadvantage of the current systems is that for effective activation or repression, fusions to exogenous effectors must be used. The over expression of these constructs,

23 especially the chromatin regulator-based fusions could be causing currently undetected off-target effects. In addition, the presence of the dCas9 at the targeted sequence could be creating non- biologically relevant interactions. Lastly, the effect that these fusions have on gene expression are unidirectional, activation or repression. The ability to redirect endogenous chromatin regulatory machinery in a specific and reversible manner would reduce exogenous-based off-target effects, potential steric hindrance from fusion proteins, and allow for investigation of naturally occurring chromatin regulatory processes.

Section 1.4: Bifunctional Molecules and CIPs

The fields of synthetic biology and synthetic chemistry have created new tools that are now being applied to the study and control of chromatin biology. Some chemical inducers of proximity (CIP), are a class of bifunctional molecules that bring two proteins, or activity domains, in physical proximity of each other in order to initiate a new biological response. CIPs have paved the way for scientist to control complex processes in a chemically dependent manner.

The first CIP to be synthesized was a homodimer of FK506 (FK1012), which brings together two FK506-binding-protein (FKBP12) molecules291. FK506, alone, can also bind FKBP from one side and calcineurin on the other. Another FKBP-interacting molecule is rapamycin capable of bringing together FKBP12 and the FKBP12-rapamycin-binding (FRB) domain of the TOR kinase. Three other commonly used CIPs are plant-based molecules that have been repurposed in mammalian system for

CIP and have been shown to be non-toxic. Abscisic acid functions with the Pyl receptor and phosphatase ABI1, gibberellin induces proximity of GID1 and GA1, and indole-3-acetic acid (auxin, which has been previously discussion for dCas9 proteosomal degradation) can recruit the Cul1 complex to proteins with an auxin-inducible degron292. Since the discovery and characterization of these

CIPs, groups from multiple fields have repurposed them in diverse ways.

One way labs have repurposed CIPs, is through the redirection of chromatin regulatory machinery in order to ask mechanistic questions in a direct manner. The pilot study for this type of investigation resulted in the development of the chromatin in vivo assay (CiA). To study chromatin regulatory machinery, one of the murine Oct4 loci was engineering to contain a Gal4 DNA binding array.

24 By exogenously expressing Gal4-FKBP (to serve as the DNA anchor) and FRB fused to a protein-of- interest (i.e. heterochromatin protein 1a, HP1a) recruitment to Oct4 can be temporally controlled by rapamycin treatment293.

With the development of CRISPR/Cas9, labs have incorporated CIPs to control the recruitment of dCas9 fusions. Expanding upon the original CiA system, a dCas9-based FKBP and FRB system was optimized and tested with rapamycin. Using ms2-FKBP and FRB- fusions (i.e. FRB-HP1, FRB-BAF), reversible control of the chromatin environment of endogenous mammalian genes was demonstrated.

For example, upon recruitment of HP1 to the CXCR4 locus with the dCas9 system, H3K9me3 was enriched and mRNA levels were decreased294. In Zetsche et al., the authors split a dCas9-VP64 fusion into two different constructs, each with their own promoter, and added a FRB domain to the N-terminus side and FKBP to the C-terminus side. Upon treatment with 200 nM of rapamycin, the level of activation of the target gene (ASCL), was comparable to the full-length dCas9-VP64, and 57-fold more effective than the untreated cells with the split dCas9295. The Qi lab and colleagues incorporated CIP systems in a very intricate way. By creating dCas9-AB1 and dCas9-GAI fusions, there were able to recruit activating (VPR, SunTag-VP64) or repressing (KRAB) machinery in a chemically dependent manner with PYL1 and GID1 fusions. The ABA- and GA- inducible recruitment strategies were mutually exclusive and the effects were reversible296. Lastly, a split SunTag system was created whereby the scFv was fused to FKBP12. VP64 was fused to a mutated FRB domain (T3089L), such that rapamycin analogue AP21967 could bind the FRB mutant but not the endogenous TOR protein. Activation of their

BFP reporter was found to be roughly 140-fold upon treatment with AP21967297. Corson et al.298 and

Stanton et al.292 describe some of the other diverse ways labs have used CIP systems in chemical biology research.

Section 1.5: Conclusions and Future Perspectives

The study of chromatin biology represents a very complex relationship with many levels of regulation. While this overview focuses on DNA and histone tail modifications, there are several other components involved in chromatin compaction and gene expression, including histone variants, nucleosome occupancy, long non-coding RNAs, transcription factor recruitment, localization within the

25 nucleus, and chromatin looping with distal regulatory elements. With the work presented in this thesis, we are studying the contribution that DNA and histone tail modifications have on the chromatin environment and gene expression in a specific and biologically relevant way, so as to not exclude these other pathways. In addition to studying how these modifications and their regulator proteins function in a healthy system, we are also interested in how this process goes array in a disease context. It has become increasingly clear that dysregulation of the global chromatin environment can confer a growth advantage for disease, such as cancer. Thus, by redirecting chromatin regulatory machinery to control the expression of disease-relevant genes we can study the consequences of and reverse the effects of these mis-expressed genes. To redirect endogenous chromatin modifying machinery, we have designed, tested, and optimized bifunctional molecules, called chemical epigenetic modifiers

(CEMs)299,300.

CEMs are synthesized using FK506, a linker region of varying lengths, and an inhibitor or chemical probes that binds to chromatin regulatory machinery. One advantage of using FK506 is that it has an exocyclic alkene group capable of functioning as a chemical handle. Guo et al. described a thiol-ene ‘click’ reaction that opens the doors for the synthesis of several FK506 derivatives including bifunctional molecules that also recruit endogenous histone modifying enzymes301. To recruit histone- modifying proteins, we have used inhibitors, including FDA-approved HDAC and HAT inhibitors. At low nanomolar concentrations CEMs recruit either HDACs or HATs to the targeted gene locus and releasing the proteins, creating a cloud of the enzymes with full, uninhibited activity. Recruiting endogenous proteins in this small-molecule approach, rather than exogenously over-expressing fusions of chromatin regulatory machinery, allows us to manipulate and investigate the physiological epigenetic environment, and swap bifunctional molecules to exert different effects. This system also allows the proteins to freely move proximal to the locus, rather than being held to the DNA and sterically hindered. The series of CEMs that recruit repressive machinery is called CEMr, whereas the activating series is called CEMa. Using these molecules, we study the complex dynamics and interplay of specific histone modifications and chromatin environments in a reversible, biologically relevant manner.

One of the ways that we study the effect induced by recruitment of specific chromatin regulatory machinery, is by performing chromatin immunoprecipitation (ChIP). The technique allows us to

26 determine the relative enrichments of other chromatin regulatory machinery, transcriptional machinery, or histone tail modifications. Briefly, the steps of ChIP include formaldehyde fixing cells, shearing the chromatin into small fragments, incubating with an antibody specific for the protein or modification of interest, reverse crosslinking, followed by a downstream quantitative analysis (i.e. next generation sequencing or quantitative real time PCR). A major bottleneck of this protocol is the chromatin sonication, which is one of the most sensitive, time consuming, and variable steps. To improve the consistency and efficiency of this step, we have developed and tested a sonication enhancing technique based on sonically active nanodroplets. With this optimized method, we have optimized efficiency by 16-fold and reduced variation 2.7-fold302.

To study chromatin biology and its relevant to human disease, we have incorporated synthetic biology and chemical biology to design and optimize novel platforms of investigation. With our CEMa and CEMr technology, we can precisely control the recruitment of chromatin regulatory machinery, which in turn influences the chromatin environment and level of gene expression. By repurposing cavitation reagents, we have also created a new ChIP protocol and demonstrated its increased effectiveness. With our optimized ChIP protocol, we have begun testing the CEM-mediated effects on the chromatin environment.

27 REFERENCES:

1. Han, L., Su, B., Li, W.-H. & Zhao, Z. CpG island density and its correlations with genomic features in mammalian genomes. Genome Biol. 9, R79 (2008).

2. Margueron, R. & Reinberg, D. Chromatin structure and the inheritance of epigenetic information. Nat. Rev. Genet. 11, 285–296 (2010).

3. Clark, S. J., Lee, H. J., Smallwood, S. A., Kelsey, G. & Reik, W. Single-cell epigenomics: powerful new methods for understanding gene regulation and cell identity. Genome Biol. 17, 72 (2016).

4. Luger, K., Mäder, A. W., Richmond, R. K., Sargent, D. F. & Richmond, T. J. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389, 251–260 (1997).

5. Grewal, S. I. S. & Moazed, D. Heterochromatin and Epigenetic Control of Gene Expression. Science (80-. ). 301, 798–802 (2003).

6. Wallrath, L. L., Vitalini, M. W. & Elgin, S. C. R. Fundamentals of Chromatin. 529–552 (2014). doi:10.1007/978-1-4614-8624-4

7. Gardner, K. E., Allis, C. D. & Strahl, B. D. Operating on chromatin, a colorful language where context matters. J. Mol. Biol. 409, 36–46 (2011).

8. Pinheiro, I. et al. Prdm3 and Prdm16 are H3K9me1 methyltransferases required for mammalian heterochromatin integrity. Cell 150, 948–960 (2012).

9. Gonzalez-Sandoval, A. et al. Perinuclear Anchoring of H3K9-Methylated Chromatin Stabilizes Induced Cell Fate in C. elegans Embryos. Cell 163, 1333–1347 (2015).

10. De Koning, L. et al. Heterochromatin protein 1alpha: a hallmark of cell proliferation relevant to clinical oncology. EMBO Mol. Med. 1, 178–191 (2009).

11. Gorbalenya, A. E. & Koonin, E. V. Helicases: amino acid sequence comparisons and structure-function relationships. Curr. Opin. Struct. Biol. 3, 419–429 (1993).

12. Flaus, A., Martin, D. M. A., Barton, G. J. & Owen-Hughes, T. Identification of multiple distinct Snf2 subfamilies with conserved structural motifs. Nucleic Acids Res. 34, 2887–2905 (2006).

13. Narlikar, G. J., Sundaramoorthy, R. & Owen-Hughes, T. Mechanisms and functions of ATP- dependent chromatin-remodeling enzymes. Cell 154, 490–503 (2013).

14. Runge, J. S., Raab, J. R. & Magnuson, T. Epigenetic Regulation by ATP-Dependent Chromatin-Remodeling Enzymes: SNF-ing Out Crosstalk. Current Topics in Developmental Biology 117, (Elsevier Inc., 2015).

15. Henikoff, S., Furuyama, T. & Ahmad, K. Histone variants, nucleosome assembly and epigenetic inheritance. Trends Genet. 20, 320–326 (2004).

16. Blosser, T. R., Yang, J. G., Stone, M. D., Narlikar, G. J. & Zhuang, X. Dynamics of nucleosome remodelling by individual ACF complexes. Nature 462, 1022–1027 (2009).

17. Hota, S. K. & Bruneau, B. G. ATP-dependent chromatin remodeling during mammalian development. Development 143, 2882–2897 (2016).

28 18. Müller-Ott, K. et al. Specificity, propagation, and memory of pericentric heterochromatin. Mol. Syst. Biol. 10, 746 (2014).

19. Luger, K. & Richmond, T. J. The histone tails of the nucleosome. Curr. Opin. Genet. Dev. 8, 140–146 (1998).

20. Strahl, B. D. & Allis, C. D. The language of covalent histone modifications. Nature 403, 41–45 (2000).

21. Jeltsch, A. & Jurkowska, R. Z. New concepts in DNA methylation. Trends Biochem. Sci. 39, 310–318 (2014).

22. Jenuwein, T. & Allis, C. D. Translating the Histone Code. Science (80-. ). 293, 1074–1080 (2001).

23. Ito, S. et al. Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5- carboxylcytosine. Science 333, 1300–3 (2011).

24. Kebede, A. F., Schneider, R. & Daujat, S. Novel types and sites of histone modifications emerge as players in the transcriptional regulation contest. doi:10.1111/febs.13047

25. Bannister, A. J. & Kouzarides, T. Regulation of chromatin by histone modifications. Cell Res. 21, 381–95 (2011).

26. Biswas, S. & Rao, C. M. Epigenetic tools (The Writers, The Readers and The Erasers) and their implications in cancer therapy. Eur. J. Pharmacol. 837, 8–24 (2018).

27. El-Osta, a. & Wolffe, a. P. DNA methylation and histone deacetylation in the control of gene expression: Basic biochemistry to human development and disease. Gene Expr. 9, 63–75 (2000).

28. Stewart, M. D., Li, J. & Wong, J. Relationship between Histone H3 Lysine 9 Methylation , Transcription Repression , and Heterochromatin Protein 1 Recruitment Relationship between Histone H3 Lysine 9 Methylation , Transcription Repression , and Heterochromatin Protein 1 Recruitment. Mol. Cell. Biol. 25, 2525–2538 (2005).

29. Santi, D. V, Norment, A. & Garrett, C. E. Covalent bond formation between a DNA-cytosine methyltransferase and DNA containing 5-azacytosine. Proc. Natl. Acad. Sci. U. S. A. 81, 6993–7 (1984).

30. Detich, N., Ramchandani, S. & Szyf, M. A Conserved 3′-Untranslated Element Mediates Growth Regulation of DNA Methyltransferase 1 and Inhibits Its Transforming Activity. J. Biol. Chem. 276, 24881–24890 (2001).

31. Dhe-paganon, S., Syeda, F. & Park, L. DNA methyl transferase 1 : regulatory mechanisms and implications in health and disease. J. Biochem. Mol. Biol. 2, 58–66 (2011).

32. Wigler, M., Levy, D. & Perucho, M. The somatic replication of DNA methylation. Cell 24, 33–40 (1981).

33. Kaiser, S. et al. The RNA methyltransferase Dnmt2 methylates DNA in the structural context of a tRNA. RNA Biol. 14, 1241–1251 (2017).

34. Okano, M., Bell, D. W., Haber, D. a & Li, E. DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell 99, 247–57 (1999).

29 35. Gowher, H., Liebert, K., Hermann, A., Xu, G. & Jeltsch, A. Mechanism of stimulation of catalytic activity of Dnmt3A and Dnmt3B DNA-(cytosine-C5)-methyltransferases by Dnmt3L. J. Biol. Chem. 280, 13341–8 (2005).

36. Jurkowska, R. Z., Jurkowski, T. P. & Jeltsch, A. Structure and Function of Mammalian DNA Methyltransferases. ChemBioChem 12, 206–222 (2011).

37. Moore, L. D., Le, T. & Fan, G. DNA Methylation and Its Basic Function. Neuropsychopharmacology 38, 23–38 (2013).

38. Lai, A. Y. & Wade, P. A. Cancer biology and NuRD: a multifaceted chromatin remodelling complex. Nat. Rev. Cancer 11, 588–96 (2011).

39. Du, Q., Luu, P.-L., Stirzaker, C. & Clark, S. J. Methyl-CpG-binding domain proteins: readers of the epigenome. Epigenomics 7, 1051–1073 (2015).

40. Bostick, M. et al. UHRF1 Plays a Role in Maintaining DNA Methylation in Mammalian Cells. Science (80-. ). 317, 1760–1764 (2007).

41. Sharif, J. et al. The SRA protein Np95 mediates epigenetic inheritance by recruiting Dnmt1 to methylated DNA. Nature 450, 908–912 (2007).

42. Rothbart, S. B. et al. Multivalent histone engagement by the linked tandem tudor and PHD domains of UHRF1 is required for the epigenetic inheritance of DNA methylation. Genes Dev. 27, 1288–1298 (2013).

43. Sadakierska-Chudy, A., Kostrzewa, R. M. & Filip, M. A comprehensive view of the epigenetic landscape part I: DNA methylation, passive and active DNA demethylation pathways and histone variants. Neurotox. Res. 27, 84–97 (2015).

44. Vermeulen, M. et al. Quantitative Interaction Proteomics and Genome-wide Profiling of Epigenetic Histone Marks and Their Readers. Cell 142, 967–980 (2010).

45. Ho, J. W. K. et al. Comparative analysis of metazoan chromatin organization. Nature 512, 449–52 (2014).

46. Gu, B. & Lee, M. Histone H3 lysine 4 methyltransferases and demethylases in self-renewal and differentiation of stem cells. Cell Biosci. 3, 39 (2013).

47. Development, S. A central role of H3K4me3 extended chromatin domains in gene regulation. 4–7 (2016).

48. Völkel, P. & Angrand, P.-O. The control of histone lysine methylation in epigenetic regulation. Biochimie 89, 1–20 (2007).

49. Li, H. et al. The Histone Methyltransferase SETDB1 and the DNA Methyltransferase DNMT3A Interact Directly and Localize to Promoters Silenced in Cancer Cells. 281, 19489–19500 (2006).

50. Keniry, A. et al. Setdb1-mediated H3K9 methylation is enriched on the inactive X and plays a role in its epigenetic silencing. Epigenetics Chromatin 9, 16 (2016).

51. Li, Y. et al. The Target of the NSD Family of Histone Lysine Methyltransferases Depends on the Nature of the Substrate. J. Biol. Chem. 284, 34283–34295 (2009).

52. Margueron, R. et al. Ezh1 and Ezh2 Maintain Repressive Chromatin through Different

30 Mechanisms. Mol. Cell 32, 503–518 (2008).

53. Hansen, K. H. et al. A model for transmission of the H3K27me3 epigenetic mark. Nat. Cell Biol. 10, 1291–1300 (2008).

54. Wagner, E. J. & Carpenter, P. B. Understanding the language of Lys36 methylation at histone H3. Nat. Rev. Mol. Cell Biol. 13, 115–126 (2012).

55. Jørgensen, S., Schotta, G. & Sørensen, C. S. Histone H4 lysine 20 methylation: key player in epigenetic regulation of genomic integrity. Nucleic Acids Res. 41, 2797–806 (2013).

56. Stulemeijer, I. J. E. et al. Dot1 histone methyltransferases share a distributive mechanism but have highly diverged catalytic properties. Sci. Rep. 5, 1–11 (2015).

57. Vlaming, H. & van Leeuwen, F. The upstreams and downstreams of H3K79 methylation by DOT1L. Chromosoma (2016). doi:10.1007/s00412-015-0570-5

58. Dindar, G., Anger, A. M., Mehlhorn, C., Hake, S. B. & Janzen, C. J. Structure-guided mutational analysis reveals the functional requirements for product specificity of DOT1 enzymes. Nat. Commun. 5, 5313 (2014).

59. Jacobs, S. A. & Khorasanizadeh, S. Structure of HP1 chromodomain bound to a lysine 9- methylated histone H3 tail. Science 295, 2080–2083 (2002).

60. Yap, K. L. & Zhou, M.-M. Keeping it in the family: diverse histone recognition by conserved structural folds. Crit. Rev. Biochem. Mol. Biol. 45, 488–505 (2010).

61. Marfella, C. G. A. & Imbalzano, A. N. The Chd family of chromatin remodelers. Mutat. Res. 618, 30–40 (2007).

62. Dhayalan, A. et al. The Dnmt3a PWWP Domain Reads Histone 3 Lysine 36 Trimethylation and Guides DNA Methylation. J. Biol. Chem. 285, 26114–26120 (2010).

63. Sanchez, R. & Zhou, M.-M. The PHD finger: a versatile epigenome reader. Trends Biochem. Sci. 36, 364–72 (2011).

64. Trievel, R. C. & Shilatifard, A. WDR5, a complexed protein. Nat. Struct. Mol. Biol. 16, 678–680 (2009).

65. Trojer, P. & Reinberg, D. Beyond histone methyl-lysine binding: How malignant brain tumor (MBT) protein L3MBTL1 impacts chromatin structure. Cell Cycle 7, 578–585 (2008).

66. Brien, G. L. et al. Polycomb PHF19 binds H3K36me3 and recruits PRC2 and demethylase NO66 to embryonic stem cell genes during differentiation. Nat. Struct. Mol. Biol. 19, 1273– 1281 (2012).

67. Cheng, J. et al. Structural Insight into Coordinated Recognition of Trimethylated Histone H3 Lysine 9 (H3K9me3) by the Plant Homeodomain (PHD) and Tandem Tudor Domain (TTD) of UHRF1 (Ubiquitin-like, Containing PHD and RING Finger Domains, 1) Protein. J. Biol. Chem. 288, 1329–1339 (2013).

68. Maiques-Diaz, A. & Somervaille, T. C. LSD1: biologic roles and therapeutic targeting. Epigenomics 8, 1103–16 (2016).

69. Walport, L. J., Hopkinson, R. J. & Schofield, C. J. Mechanisms of human histone and nucleic acid demethylases. Curr. Opin. Chem. Biol. 16, 525–534 (2012).

31

70. Klein, B. J. et al. The histone-H3K4-specific demethylase KDM5B Binds to its substrate and product through distinct PHD fingers. Cell Rep. 6, 325–335 (2014).

71. Aranda, S., Mas, G. & Croce, L. Di. Regulation of gene transcription by Polycomb proteins. 2, 1–15 (2015).

72. Labbé, R. M., Holowatyj, A. & Yang, Z.-Q. Histone lysine demethylase (KDM) subfamily 4: structures, functions and therapeutic potential. Am. J. Transl. Res. 6, 1–15 (2013).

73. Becker, J. S., Nicetto, D. & Zaret, K. S. H3K9me3-Dependent Heterochromatin: Barrier to Cell Fate Changes. Trends Genet. 32, 29–41 (2016).

74. Walport, L. J. et al. Human UTY(KDM6C) Is a Male-specific N ϵ -Methyl Lysyl Demethylase. J. Biol. Chem. 289, 18302–18313 (2014).

75. Dimitrova, E., Turberfield, A. H. & Klose, R. J. Histone demethylases in chromatin biology and beyond. EMBO Rep. 16, 1620–1639 (2015).

76. Katan-Khaykovich, Y. & Struhl, K. Dynamics of global histone acetylation and deacetylation in vivo: rapid restoration of normal histone acetylation status upon removal of activators and repressors. Genes Dev. 16, 743–752 (2002).

77. Struhl, K. Histone acetylation and transcriptional regulatory mechanisms. Genes Dev. 12, 599–606 (1998).

78. Bauer, W. R., Hayes, J. J., White, J. H. & Wolffe, A. P. Nucleosome Structural Changes Due to Acetylation. J. Mol. Biol. 236, 685–690 (1994).

79. Marushige, K. Activation of chromatin by acetylation of histone side chains. Proc. Natl. Acad. Sci. U. S. A. 73, 3937–41 (1976).

80. Li, G. & Zhu, P. Structure and organization of chromatin fiber in the nucleus. FEBS Lett. 589, 2893–2904 (2015).

81. Benson, L. J. et al. Properties of the Type B Histone Acetyltransferase Hat1. J. Biol. Chem. 282, 836–842 (2007).

82. Roth, S. Y., Denu, J. M. & Allis, C. D. Histone Acetyltransferases. Annu. Rev. Biochem. 70, 81–120 (2001).

83. Avvakumov, N. & Côté, J. The MYST family of histone acetyltransferases and their intimate links to cancer. Oncogene 26, 5395–5407 (2007).

84. Dancy, B. M. & Cole, P. A. Protein Lysine Acetylation by p300/CBP. Chem. Rev. 115, 2419– 2452 (2015).

85. Bhattacharya, S. et al. Structural and functional insight into TAF1-TAF7, a subcomplex of transcription factor II D. Proc. Natl. Acad. Sci. U. S. A. 111, 9103–8 (2014).

86. York, B. & O’Malley, B. W. Steroid Receptor Coactivator (SRC) Family: Masters of Systems Biology. J. Biol. Chem. 285, 38743–38750 (2010).

87. Sanchez, R., Meslamani, J. & Zhou, M. M. The bromodomain: From epigenome reader to druggable target. Biochim. Biophys. Acta - Gene Regul. Mech. 1839, 676–685 (2014).

32 88. Biology, a C. S. C. The Bromodomain : A New Target in Emerging Epigenetic The Bromodomain : A New Target in Emerging Epigenetic Medicine. (2015). doi:10.1021/acschembio.5b00831

89. Ferri, E., Petosa, C. & McKenna, C. E. Bromodomains: structure, function and pharmacology of inhibition. Biochem. Pharmacol. (2015). doi:10.1016/j.bcp.2015.12.005

90. Zeng, L. & Zhou, M. M. Bromodomain: An acetyl-lysine binding domain. FEBS Lett. 513, 124– 128 (2002).

91. Sabari, B. R., Zhang, D., Allis, C. D. & Zhao, Y. Metabolic regulation of gene expression through histone acylations. Nat. Rev. Mol. Cell Biol. 18, 90–101 (2017).

92. Li, Y. et al. Molecular Coupling of Histone Crotonylation and Active Transcription by AF9 YEATS Domain. Mol. Cell 62, 181–193 (2016).

93. de Ruijter, A. J. M., van Gennip, A. H., Caron, H. N., Kemp, S. & van Kuilenburg, A. B. P. Histone deacetylases (HDACs): characterization of the classical HDAC family. Biochem. J. 370, 737–749 (2003).

94. Livyatan, I. & Meshorer, E. The HDAC interaction network. Mol. Syst. Biol. 9, 671 (2013).

95. Schemies, J., Uciechowska, U., Sippl, W. & Jung, M. NAD+-dependent histone deacetylases (sirtuins) as novel therapeutic targets. Med. Res. Rev. 30, 861–889 (2010).

96. Yang, X.-J. & Seto, E. Collaborative spirit of histone deacetylases in regulating chromatin structure and gene expression. Curr. Opin. Genet. Dev. 13, 143–53 (2003).

97. Boyault, C., Sadoul, K., Pabion, M. & Khochbin, S. HDAC6, at the crossroads between cytoskeleton and cell signaling by acetylation and ubiquitination. Oncogene 26, 5468–5476 (2007).

98. Stunkel, W. & Campbell, R. M. Sirtuin 1 (SIRT1): The Misunderstood HDAC. J. Biomol. Screen. 16, 1153–1169 (2011).

99. Tanno, M., Sakamoto, J., Miura, T., Shimamoto, K. & Horio, Y. Nucleocytoplasmic Shuttling of the NAD + -dependent Histone Deacetylase SIRT1. J. Biol. Chem. 282, 6823–6832 (2007).

100. Vaquero, A. et al. Human SirT1 Interacts with Histone H1 and Promotes Formation of Facultative Heterochromatin. Mol. Cell 16, 93–105 (2004).

101. Frye, R. A. Phylogenetic Classification of Prokaryotic and Eukaryotic Sir2-like Proteins. Biochem. Biophys. Res. Commun. 273, 793–798 (2000).

102. Thomas, M. C. & Chiang, C.-M. The General Transcription Machinery and General Cofactors. Crit. Rev. Biochem. Mol. Biol. 41, 105–178 (2006).

103. Reinke, H. & Hörz, W. Histones Are First Hyperacetylated and Then Lose Contact with the Activated PHO5 Promoter. Mol. Cell 11, 1599–1607 (2003).

104. Shogren-Knaak, M. et al. Histone H4-K16 Acetylation Controls Chromatin Structure and Protein Interactions. Science (80-. ). 311, 844 LP-847 (2006).

105. Buratowski, S. The CTD code. Nat. Struct. Biol. 10, 679 (2003).

106. Joshi, A. A. & Struhl, K. Eaf3 Chromodomain Interaction with Methylated H3-K36 Links

33 Histone Deacetylation to Pol II Elongation. Mol. Cell 20, 971–978 (2005).

107. Li, B., Carey, M. & Workman, J. L. The Role of Chromatin during Transcription. Cell 128, 707– 719 (2007).

108. Bea, S. et al. BMI-1 Gene Amplification and Overexpression in Hematological Malignancies Occur Mainly in Mantle Cell Lymphomas. Cancer Res. 61, 2409–2412 (2001).

109. Haupt, Y., Bath, M. L., Harris, A. W. & Adams, J. M. bmi-1 transgene induces lymphomas and collaborates with in tumorigenesis. Oncogene 8, 3161–4 (1993).

110. Alkema, M. J., Jacobs, H., van Lohuizen, M. & Berns, A. Perturbation of B and T cell development and predisposition to lymphomagenesis in EμBmi1 transgenic mice require the Bmi1 RING finger. Oncogene 15, 899–910 (1997).

111. Zhang, R. et al. Dysregulation of Bmi1 promotes malignant transformation of hepatic progenitor cells. Oncogenesis 5, e203 (2016).

112. Chen, S. et al. Bmi1 combines with oncogenic KRAS to induce malignant transformation of human pancreatic duct cells in vitro. Tumor Biol. (2016). doi:10.1007/s13277-016-4840-5

113. Ley, T. J. et al. DNMT3A Mutations in Acute Myeloid Leukemia. N. Engl. J. Med. 363, 2424– 2433 (2010).

114. Fenaux, P. et al. Efficacy of azacitidine compared with that of conventional care regimens in the treatment of higher-risk myelodysplastic syndromes: a randomised, open-label, phase III study. Lancet Oncol. 10, 223–232 (2009).

115. Lorsbach, R. B. et al. TET1, a member of a novel protein family, is fused to MLL in acute myeloid leukemia containing the t(10;11)(q22;q23). Leukemia 17, 637 (2003).

116. Delhommeau, F. et al. Mutation in TET2 in Myeloid Cancers. N. Engl. J. Med. 360, 2289–2301 (2009).

117. Cimmino, L., Abdel-Wahab, O., Levine, R. L. & Aifantis, I. TET Family Proteins and Their Role in Stem Cell Differentiation and Transformation. Cell Stem Cell 9, 193–204 (2011).

118. Langemeijer, S. M. C. et al. Acquired mutations in TET2 are common in myelodysplastic syndromes. Nat. Genet. 41, 838–842 (2009).

119. Moran-Crusio, K. et al. Tet2 loss leads to increased hematopoietic stem cell self-renewal and myeloid transformation. Cancer Cell 20, 11–24 (2011).

120. Panagopoulos, I. et al. Fusion of the MORF and CBP genes in acute myeloid leukemia with the t(10;16)(q22;p13). Hum. Mol. Genet. 10, 395–404 (2001).

121. Sobulo, O. M. et al. MLL is fused to CBP, a histone acetyltransferase, in therapy-related acute myeloid leukemia with a t(11;16)(q23;p13.3). Proc. Natl. Acad. Sci. U. S. A. 94, 8732–8737 (1997).

122. Pasqualucci, L. et al. Analysis of the Coding Genome of Diffuse Large B-Cell Lymphoma. Nat. Genet. 43, 830–837 (2011).

123. Taki, Tomohiko; Sako, Masahiro; Tsuchida, Masahiro; Hayashi, Y. The t(11; 16)(q23; p13) Translocation in Myelodysplastic Syndrome Fuses the MLL Gene to the CBP Gene. (2018).

34 124. Petrif, F. et al. Rubinstein-Taybi syndrome caused by mutations in the transcriptional co- activator CBP. Nature 376, 348 (1995).

125. Laï, J. L., Jouet, J.-P., Bauters, F. & Deminatti, M. Chronic myelogenous leukemia with translocation (8;22): Report of a new case. Cancer Genet. Cytogenet. 17, 365–366 (1985).

126. Gayther, S. A. et al. Mutations truncating the EP300 acetylase in human cancers. Nat. Genet. 24, 300 (2000).

127. Kitabayashi, I. et al. Fusion of MOZ and p300 histone acetyltransferases in acute monocytic leukemia with a t(8;22)(p11;q13) chromosome translocation. Leukemia 15, 89 (2001).

128. Huntly, B. J. P. et al. MOZ-TIF2, but not BCR-ABL, confers properties of leukemic stem cells to committed murine hematopoietic progenitors. Cancer Cell 6, 587–596 (2004).

129. Moore, S. D. P. et al. Uterine Leiomyomata with t(10;17) Disrupt the Histone Acetyltransferase. Cancer Res. 64, 5570 LP-5577 (2004).

130. Liu, S. et al. Targeting STAT5 in Hematologic Malignancies through Inhibition of the Bromodomain and Extra-Terminal (BET) Bromodomain Protein BRD2. Mol. Cancer Ther. 13, 1194 LP-1205 (2014).

131. French, C. A. et al. BRD–NUT oncoproteins: a family of closely related nuclear proteins that block epithelial differentiation and maintain the growth of carcinoma cells. Oncogene 27, 2237 (2007).

132. Imielinski, M. et al. Mapping the Hallmarks of Lung Adenocarcinoma with Massively Parallel Sequencing. Cell 150, 1107–1120 (2012).

133. French, C. A. et al. Midline carcinoma of children and young adults with NUT rearrangement. J. Clin. Oncol. 22, 4135–4139 (2004).

134. Drilon, A. et al. Response to Cabozantinib in Patients with RET Fusion-Positive Lung Adenocarcinomas. Cancer Discov. (2013).

135. Elzamly, S. et al. Acute myeloid leukemia with KMT2A-SEPT5 translocation: A case report and review of the literature. SAGE Open Med. Case Reports 6, 2050313X17750334 (2018).

136. Winters, A. C. & Bernt, K. M. MLL-Rearranged Leukemias—An Update on Science and Clinical Approaches. Front. Pediatr. 5, 4 (2017).

137. Parsons, D. W. et al. The Genetic Landscape of the Childhood Cancer Medulloblastoma. Science (80-. ). 331, 435 LP-439 (2011).

138. Stephens, P. J. et al. The landscape of cancer genes and mutational processes in breast cancer. Nature 486, 400 (2012).

139. Dalgliesh, G. L. et al. Systematic sequencing of renal carcinoma reveals inactivation of histone modifying genes. Nature 463, 360 (2010).

140. Morin, R. D. et al. Frequent mutation of histone modifying genes in non-Hodgkin lymphoma. Nature 476, 298–303 (2011).

141. Grasso, C. S. et al. The mutational landscape of lethal castration-resistant prostate cancer. Nature 487, 239 (2012).

35 142. Kandoth, C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013).

143. Hollink, I. H. I. M. et al. NUP98/NSD1 characterizes a novel poor prognostic group in acute myeloid leukemia with a distinct HOX gene expression pattern. Blood 118, 3645 LP-3656 (2011).

144. Stec, I. et al. WHSC1, a 90 kb SET domain-containing gene, expressed in early development and homologous to a Drosophila dysmorphy gene maps in the Wolf-Hirschhorn syndrome critical region and is fused to IgH in t(4;14) multiple myeloma. Hum. Mol. Genet. 7, 1071–1082 (1998).

145. Jaffe, J. D. et al. Global chromatin profiling reveals NSD2 mutations in pediatric acute lymphoblastic leukemia. Nat. Genet. 45, 1386–1391 (2013).

146. Rosati, R. et al. NUP98 is fused to the NSD3 gene in acute myeloid leukemia associated with t(8;11)(p11.2;p15). Blood 99, 3857 LP-3860 (2002).

147. Dubois, S. et al. Targeted EZH2 Inhibitors in Diffuse Large B-Cell Lymphoma (DLBCL): Immunohistochemical and Mutational Profiles of Patients May Determine Candidates for Treatment. Blood 124, 1656 LP-1656 (2014).

148. Shih, A. H., Abdel-Wahab, O., Patel, J. P. & Levine, R. L. The role of mutations in epigenetic regulators in myeloid malignancies. Nat. Rev. Cancer 12, 599 (2012).

149. Bejar, R. et al. Clinical Effect of Point Mutations in Myelodysplastic Syndromes. N. Engl. J. Med. 364, 2496–2506 (2011).

150. Gough, S. M., Slape, C. I. & Aplan, P. D. NUP98 gene fusions and hematopoietic malignancies: common themes and new biologic insights. Blood 118, 6247–6257 (2011).

151. van Haaften, G. et al. Somatic mutations of the histone H3K27 demethylase, UTX, in human cancer. Nat. Genet. 41, 521–523 (2009).

152. Van Vlierberghe, P. et al. PHF6 mutations in T-cell acute lymphoblastic leukemia. Nat. Genet. 42, 338 (2010).

153. Van Vlierberghe, P. et al. PHF6 mutations in adult acute myeloid leukemia. Leukemia 25, 130 (2010).

154. Fujimoto, A. et al. Whole-genome sequencing of liver cancers identifies etiological influences on mutation patterns and recurrent mutations in chromatin regulators. Nat. Genet. 44, 760 (2012).

155. Kanai, Y., Ushijima, S., Nakanishi, Y., Sakamoto, M. & Hirohashi, S. Mutation of the DNA methyltransferase (DNMT) 1 gene in human colorectal cancers. Cancer Lett. 192, 75–82 (2003).

156. Ropero, S. et al. A truncating mutation of HDAC2 in human cancers confers resistance to histone deacetylase inhibition. Nat. Genet. 38, 566 (2006).

157. Berger, M. F. et al. The genomic complexity of primary human prostate cancer. Nature 470, 214–220 (2011).

158. Stransky, N. et al. The Mutational Landscape of Head and Neck Squamous Cell Carcinoma. Science 333, 1157–1160 (2011).

36

159. Fontebasso, A. M. et al. Mutations in SETD2 and genes affecting histone H3K36 methylation target hemispheric high-grade gliomas. Acta Neuropathol. 125, 659–669 (2013).

160. Piva, F. et al. BAP1, PBRM1 and SETD2 in clear-cell renal cell carcinoma: molecular diagnostics and possible targets for personalized therapies. Expert Rev. Mol. Diagn. 15, 1201–1210 (2015).

161. Esteller, M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat. Rev. Genet. 8, 286 (2007).

162. The Cancer Genome Atlas Network. Integrated Genomic Analyses of Ovarian Carcinoma. Nature 474, 609–615 (2011).

163. Toyota, M. & Suzuki, H. 11 - Epigenetic Drivers of Genetic Alterations. in Epigenetics and Cancer, Part A (eds. Herceg, Z. & Ushijima, T. B. T.-A. in G.) 70, 309–323 (Academic Press, 2010).

164. Herman, J. G. Hypermethylation of tumor suppressor genes in cancer. Semin. Cancer Biol. 9, 359–367 (1999).

165. Kazanets, A., Shorstova, T., Hilmi, K., Marques, M. & Witcher, M. Epigenetic silencing of tumor suppressor genes: Paradigms, puzzles, and potential. Biochim. Biophys. Acta - Rev. Cancer 1865, 275–288 (2016).

166. Dawson, M. a. & Kouzarides, T. Cancer epigenetics: From mechanism to therapy. Cell 150, 12–27 (2012).

167. Cz, F. N. et al. Genes methylated by DNA methyltransferase 3b are similar in mouse intestine and human colon cancer. Prevention 121, 2–6 (2011).

168. Baylin, S. B. & Jones, P. A. A decade of exploring the cancer epigenome — biological and translational implications. Nat. Rev. Cancer 11, 726 (2011).

169. Mutskov, V. & Felsenfeld, G. Silencing of transgene transcription precedes methylation of promoter DNA and histone H3 lysine 9. EMBO J. 23, 138–149 (2004).

170. Strunnikova, M. et al. Chromatin Inactivation Precedes De Novo DNA Methylation during the Progressive Epigenetic Silencing of the RASSF1A Promoter Chromatin Inactivation Precedes De Novo DNA Methylation during the Progressive Epigenetic Silencing of the RASSF1A Promoter †. Mol.Cell.Biol. 25, 3923–3933 (2005).

171. Jesionek-Kupnicka, D. et al. TP53 Promoter Methylation in Primary Glioblastoma: Relationship with TP53 mRNA and Protein Expression and Mutation Status. DNA Cell Biol. 33, 217–226 (2014).

172. Robaina, M. C. S. et al. Quantitative analysis of CDKN2A methylation, mRNA, and p16INK4a protein expression in children and adolescents with Burkitt lymphoma: Biological and clinical implications. Leuk. Res. 39, 248–256 (2015).

173. Kondo, Y. et al. Alterations of DNA methylation and histone modifications contribute to gene silencing in hepatocellular carcinomas. Hepatol. Res. 37, 974–983 (2007).

174. Li, Y., Huang, Y., Lv, Y., Meng, G. & Guo, Q.-N. Epigenetic regulation of the pro-apoptosis gene TSSC3 in human osteosarcoma cells. Biomed. Pharmacother. 68, 45–50 (2014).

37 175. Cabianca, D. S. et al. A Long ncRNA Links Copy Number Variation to a Polycomb/Trithorax Epigenetic Switch in FSHD Muscular Dystrophy. Cell 149, 819–831 (2012).

176. Ayala-Ortega, E. et al. Epigenetic silencing of miR-181c by DNA methylation in glioblastoma cell lines. BMC Cancer 16, 226 (2016).

177. Fraga, M. F. et al. Loss of acetylation at Lys16 and trimethylation at Lys20 of histone H4 is a common hallmark of human cancer. Nat. Genet. 37, 391 (2005).

178. Richon, V. M., Sandhoff, T. W., Rifkind, R. A. & Marks, P. A. Histone deacetylase inhibitor selectively induces p21(WAF1) expression and gene-associated histone acetylation. Proc. Natl. Acad. Sci. U. S. A. 97, 10014–10019 (2000).

179. Miao, F. et al. Lymphocytes From Patients With Type 1 Diabetes Display a Distinct Profile of Chromatin Histone H3 Lysine 9 Dimethylation : An Epigenetic Study in Diabetes. Diabetes 57, 3189–3198 (2008).

180. Navarro, J. M. et al. Site- and allele-specific polycomb dysregulation in T-cell leukaemia. Nat. Commun. 6, (2015).

181. Villalba, M. et al. Epigenetic alterations leading to TMPRSS4 promoter hypomethylation and protein overexpression predict poor prognosis in squamous lung cancer patients. (2016).

182. Jancar, S. & Serezani, C. H. Imbalance between HDAC and HAT activities drives aberrant STAT1/MyD88 expression in macrophages from type 1 diabetic mice. 31, 334–339 (2018).

183. Göttlicher, M. et al. Valproic acid defines a novel class of HDAC inhibitors inducing differentiation of transformed cells. EMBO J. 20, 6969–78 (2001).

184. Christman, J. K. 5-Azacytidine and 5-aza-2′-deoxycytidine as inhibitors of DNA methylation: Mechanistic studies and their implications for cancer therapy. Oncogene 21, 5483–5495 (2002).

185. Derissen, Ellen J.B., Beunen, Jos H., Schellens, J. H. M. Concise Drug Review: Azacitidine and Decitabine. Oncologist 18, 619–624 (2013).

186. Villar-garea, A., Fraga, M. F., Espada, J. & Cells, C. Procaine Is a DNA-demethylating Agent with Growth-inhibitory Effects in Human Cancer Cells Procaine Is a DNA-demethylating Agent with Growth-inhibitory Effects in Human. 4984–4989 (2003).

187. Datta, J. et al. A new class of quinoline-based DNA hypomethylating agents reactivates tumor suppressor genes by blocking DNA methyltransferase 1 activity and inducing its degradation. Cancer Res. 69, 4277–4285 (2009).

188. Kubicek, S. et al. Reversal of H3K9me2 by a Small-Molecule Inhibitor for the G9a Histone Methyltransferase. Mol. Cell 25, 473–481 (2007).

189. Vedadi, M. et al. A chemical probe selectively inhibits G9a and GLP methyltransferase activity in cells. Nat. Chem. Biol. 7, 566–574 (2011).

190. Agarwal, P. & Jackson, S. P. G9a inhibition potentiates the anti-tumour activity of DNA double- strand break inducing agents by impairing DNA repair independent of status. Cancer Lett. 380, 467–475 (2016).

191. Pappano, W. N. et al. The Histone Methyltransferase Inhibitor A-366 Uncovers a Role for G9a/GLP in the Epigenetics of Leukemia. PLoS One 10, e0131716 (2015).

38

192. Kuntz, K. W. et al. The Importance of Being Me: Magic Methyls, Methyltransferase Inhibitors, and the Discovery of Tazemetostat. J. Med. Chem. 59, 1556–1564 (2016).

193. Gupta, S. et al. Reversible LSD1 inhibition with HCI-2509 induces the p53 gene expression signature and disrupts the MYCN signature in high-risk neuroblastoma cells. Oncotarget 9, 9907–9924 (2018).

194. James, L. I. et al. Discovery of a chemical probe for the L3MBTL3 methyl-lysine reader domain HHS Public Access Author manuscript. Nat Chem Biol 9, 184–191 (2013).

195. Martin Herold, J. et al. Small Molecule Ligands of Methyl-Lysine Binding Proteins. doi:10.1021/jm200045v

196. MacDonald, J. L. & Roskams, a. J. Epigenetic regulation of nervous system development by DNA methylation and histone deacetylation. Prog. Neurobiol. 88, 170–183 (2009).

197. Lasko, L. M. et al. Discovery of a selective catalytic p300/CBP inhibitor that targets lineagespecific tumours (Nature (2017) DOI: 10.1038/nature24028). Nature 558, E1 (2018).

198. Shen, L. et al. Class I histone deacetylase inhibitor entinostat suppresses regulatory T cells and enhances immunotherapies in renal and prostate cancer models. PLoS One 7, e30815 (2012).

199. Pili, R. et al. Phase I study of the histone deacetylase inhibitor entinostat in combination with 13-cis retinoic acid in patients with solid tumours. Br. J. Cancer 106, 77–84 (2012).

200. Greve, G. et al. The pan-HDAC inhibitor panobinostat acts as a sensitizer for erlotinib activity in EGFR-mutated and -wildtype non-small cell lung cancer cells. BMC Cancer 15, 947 (2015).

201. Hay, D. A. et al. Discovery and optimization of small-molecule ligands for the CBP/p300 bromodomains. J. Am. Chem. Soc. 136, 9308–9319 (2014).

202. Dhalluin, C. et al. Structure and ligand of a histone acetyltransferase bromodomain. Nature 399, 491–496 (1999).

203. Shi, J. & Vakoc, C. R. The Mechanisms behind the Therapeutic Activity of BET Bromodomain Inhibition. Mol. Cell 54, 72–736 (2014).

204. Filippakopoulos, P. et al. Selective inhibition of BET bromodomains. Nature 468, 1067–1073 (2010).

205. Sanchez, R., Meslamani, J. & Zhou, M.-M. The bromodomain: From epigenome reader to druggable target. Biochim. Biophys. Acta - Gene Regul. Mech. 1839, 676–685 (2014).

206. Demont, E. H. et al. 1,3-Dimethyl Benzimidazolones Are Potent, Selective Inhibitors of the BRPF1 Bromodomain. ACS Med. Chem. Lett. 5, 1190–1195 (2014).

207. Seal, J. et al. Identification of a novel series of BET family bromodomain inhibitors: Binding mode and profile of I-BET151 (GSK1210151A). Bioorganic Med. Chem. Lett. 22, 2968–2972 (2012).

208. Picaud, S. et al. Generation of a selective small molecule inhibitor of the CBP/p300 bromodomain for leukemia therapy. Cancer Res. 75, 5106–5120 (2015).

209. New, M., Olzscha, H. & La Thangue, N. B. HDAC inhibitor-based therapies: Can we interpret

39 the code? Mol. Oncol. 6, 637–656 (2012).

210. Zappasodi, R. et al. Pleiotropic antitumor effects of the pan-HDAC inhibitor ITF2357 against c- Myc-overexpressing human B-cell non-Hodgkin lymphomas. Int. J. Cancer 135, 2034–2045 (2014).

211. Jia, H. et al. Histone deacetylase (HDAC) inhibitors targeting HDAC3 and HDAC1 ameliorate polyglutamine-elicited phenotypes in model systems of Huntington’s disease. 46, 351–361 (2013).

212. Bantscheff, M. et al. Chemoproteomics profiling of HDAC inhibitors reveals selective targeting of HDAC complexes. Nat. Biotechnol. 29, 255–265 (2011).

213. Gao, C. et al. Rational design and validation of a Tip60 histone acetyltransferase inhibitor. Sci. Rep. 4, 5372 (2015).

214. Yoshida, M., Kijima, M., Akita, M. & Beppu, T. Potent and specific inhibition of mammalian histone deacetylase both in vivo and in vitro by trichostatin A. J. Biol. Chem. 265, 17174–9 (1990).

215. Mann, B. S., Johnson, J. R., Cohen, M. H., Justice, R. & Pazdur, R. FDA Approval Summary: Vorinostat for Treatment of Advanced Primary Cutaneous T-Cell Lymphoma. Oncologist 12, 1247–1252 (2007).

216. Jimeno, A. & McDermott, J. Belinostat for the treatment of peripheral T-cell lymphomas. Drugs of Today 50, 337 (2014).

217. Stiff, A. et al. Histone Deacetylase Inhibitors Enhance the Therapeutic Potential of Reovirus in Multiple Myeloma. Mol. Cancer Ther. 15, 830–841 (2016).

218. Nicodeme, E. et al. Suppression of inflammation by a synthetic histone mimic. Nature 468, 1119–1123 (2010).

219. Mirguet, O. et al. Discovery of epigenetic regulator i-bet762: Lead optimization to afford a clinical candidate inhibitor of the bet bromodomains. J. Med. Chem. 56, 7501–7515 (2013).

220. Warrell, R. P., He, L.-Z., Richon, V., Calleja, E. & Pandolfi, P. P. Therapeutic Targeting of Transcription in Acute Promyelocytic Leukemia by Use of an Inhibitor of Histone Deacetylase. JNCI J. Natl. Cancer Inst. 90, 1621–1625 (1998).

221. Kijima, M., Yoshida, M., Sugita, K., Horinouchi, S. & Beppu, T. Trapoxin, an antitumor cyclic tetrapeptide, is an irreversible inhibitor of mammalian histone deacetylase. J. Biol. Chem. 268, 22429–35 (1993).

222. Newmark, H. L. & Young, C. W. Butyrate and phenylacetate as differentiating agents: practical problems and opportunities. J. Cell. Biochem. Suppl. 22, 247–53 (1995).

223. Nau, H., Hauck, R. S. & Ehlers, K. Valproic acid-induced neural tube defects in mouse and human: aspects of chirality, alternative drug development, pharmacokinetics and possible mechanisms. Pharmacol. Toxicol. 69, 310–21 (1991).

224. Schneider, B. J. et al. Phase I study of vorinostat (suberoylanilide hydroxamic acid, NSC 701852) in combination with docetaxel in patients with advanced and relapsed solid malignancies. Invest. New Drugs 30, 249–257 (2012).

225. Rathkopf, D. E. et al. A phase 2 study of intravenous panobinostat in patients with castration-

40 resistant prostate cancer. Cancer Chemother. Pharmacol. 72, 537–544 (2013).

226. Brocks, D. et al. DNMT and HDAC inhibitors induce cryptic transcription start sites encoded in long terminal repeats. Nat. Genet. 49, 1052–1060 (2017).

227. Jones, P. A., Issa, J. P. J. & Baylin, S. Targeting the cancer epigenome for therapy. Nat. Rev. Genet. 17, 630–641 (2016).

228. Feinberg, A. P., Koldobskiy, M. A., Biology, C. & Institutet, K. Epigenetic modulators, modifiers, and mediators in cancer etiology and progression. 17, 284–299 (2016).

229. Portela, A. & Esteller, M. Epigenetic modifications and human disease. Nat. Biotechnol. 28, 1057–1068 (2010).

230. Ren, C. et al. Small Molecule Modulators of Methyl-Lysine Binding for the CBX7 Chromodomain. Chem. Biol. 22, 161–168 (2015).

231. Miller, T. C. R. et al. Competitive Binding of a Benzimidazole to the Histone-Binding Pocket of the Pygo PHD Finger. ACS Chem. Biol. 9, 2864–2874 (2014).

232. Prusevich, P. et al. A Selective Phenelzine Analogue Inhibitor of Histone Demethylase LSD1. ACS Chem. Biol. 9, 1284–1293 (2014).

233. Willmann, D. et al. Impairment of prostate cancer cell growth by a selective and reversible lysine-specific demethylase 1 inhibitor. Int. J. Cancer 131, 2704–2709 (2012).

234. Lee, H.-Z. et al. FDA Approval: Belinostat for the Treatment of Patients with Relapsed or Refractory Peripheral T-cell Lymphoma. Clin. Cancer Res. 21, 2666 LP-2670 (2015).

235. Laubach, J. P., Moreau, P., San-Miguel, J. F. & Richardson, P. G. Panobinostat for the Treatment of Multiple Myeloma. Clin. Cancer Res. 21, 4767 LP-4773 (2015).

236. Kaminskas, E., Farrell, A. T., Wang, Y.-C., Sridhara, R. & Pazdur, R. FDA Drug Approval Summary: Azacitidine (5-azacytidine, VidazaTM) for Injectable Suspension. Oncol. 10, 176– 182 (2005).

237. Li, L. X. et al. Lysine methyltransferase SMYD2 promotes cyst growth in autosomal dominant polycystic kidney disease. J. Clin. Invest. 127, 2751–2764 (2017).

238. Sharp, P. a. RNA interference--2001. Genes Dev 15, 485–490 (2001).

239. Mittal, V. Improving the efficiency of RNA interference in mammals. Nat. Rev. Genet. 5, 355– 365 (2004).

240. Wyman, C. & Kanaar, R. DNA Double-Strand Break Repair: All’s Well that Ends Well. Annu. Rev. Genet. 40, 363–383 (2006).

241. Cermak, T. et al. Efficient design and assembly of custom TALEN and other TAL effector- based constructs for DNA targeting. Nucleic Acids Res. 39, e82–e82 (2011).

242. Urnov, F. D., Rebar, E. J., Holmes, M. C., Zhang, H. S. & Gregory, P. D. Genome editing with engineered zinc finger nucleases. Nat. Rev. Genet. 11, 636 (2010).

243. Mali, P., Esvelt, K. M. & Church, G. M. Cas9 as a versatile tool for engineering biology. Nat. Methods 10, 957–63 (2013).

41 244. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–6 (2013).

245. Ran, F. A. et al. Genome engineering using the CRISPR-Cas9 system. Nat. Protoc. 8, 2281– 2308 (2013).

246. Liu, Q., Segal, D. J., Ghiara, J. B. & Barbas, C. F. Design of polydactyl zinc-finger proteins for unique addressing within complex genomes. Proc. Natl. Acad. Sci. 94, 5525 LP-5530 (1997).

247. Boch, J. et al. Breaking the Code of DNA Binding Specificity of TAL-Type III Effectors. Science (80-. ). 326, 1509 LP-1512 (2009).

248. Wiedenheft, B., Sternberg, S. H. & Doudna, J. A. RNA-guided genetic silencing systems in bacteria and archaea. Nature 482, 331 (2012).

249. Cho, S. W., Kim, S., Kim, J. M. & Kim, J.-S. Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease. Nat. Biotechnol. 31, 230 (2013).

250. Perez-Pinera, P., Ousterout, D. G., Brown, M. T. & Gersbach, C. A. Gene targeting to the ROSA26 locus directed by engineered zinc finger nucleases. Nucleic Acids Res. 40, 3741– 3752 (2012).

251. Zu, Y. et al. TALEN-mediated precise genome modification by homologous recombination in zebrafish. Nat. Methods 10, 329 (2013).

252. Sugi, T. Genome Editing in C. elegans and Other Nematode Species. Int. J. Mol. Sci. 17, 295 (2016).

253. Qi, L. S. et al. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152, 1173–83 (2013).

254. McDonald, J. I. et al. Reprogrammable CRISPR/Cas9-based system for inducing site-specific DNA methylation. Biol. Open bio.019067 (2016). doi:10.1242/bio.019067

255. Maeder, M. L. et al. Robust, synergistic regulation of human gene expression using TALE activators. Nat. Methods 10, 243–5 (2013).

256. Li, J.-F. et al. Multiplex and homologous recombination-mediated plant genome editing via guide RNA/Cas9. Nat. Biotechnol. 31, 688–691 (2013).

257. Doyon, Y. et al. Heritable targeted gene disruption in zebrafish using designed zinc-finger nucleases. Nat. Biotechnol. 26, 702 (2008).

258. Zhang, F. et al. High frequency targeted mutagenesis in Arabidopsis thaliana using zinc finger nucleases. Proc. Natl. Acad. Sci. U. S. A. 107, 12028–12033 (2010).

259. Maeder, M. L. et al. CRISPR RNA-guided activation of endogenous human genes. Nat Methods 10, 977–979 (2013).

260. Gilbert, L. A. et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell 154, 442–51 (2013).

261. Jost, M. et al. Combined CRISPRi/a-Based Chemical Genetic Screens Reveal that Rigosertib Is a Microtubule-Destabilizing Agent. Mol. Cell 68, 210–223.e6 (2017).

262. Cheng, A. W. et al. Multiplexed activation of endogenous genes by CRISPR-on, an RNA- guided transcriptional activator system. Cell Res. 23, 1163–71 (2013).

42

263. Balboa, D. et al. Conditionally Stabilized dCas9 Activator for Controlling Gene Expression in Human Cell Reprogramming and Differentiation. Stem Cell Reports 5, 448–459 (2015).

264. Chavez, A. et al. Highly efficient Cas9-mediated transcriptional programming. Nat. Methods 12, 326–328 (2015).

265. Le Sage, C. et al. Dual direction CRISPR transcriptional regulation screening uncovers gene networks driving drug resistance. Sci. Rep. 7, 1–10 (2017).

266. Choudhury, S. R., Cui, Y., Lubecka, K., Stefanska, B. & Irudayaraj, J. CRISPR-dCas9 mediated TET1 targeting for selective DNA demethylation at BRCA1 promoter. Oncotarget 7, 46545–46556 (2016).

267. Morita, S. et al. Targeted DNA demethylation in vivo using dCas9-peptide repeat and scFv- TET1 catalytic domain fusions. Nat. Biotechnol. 34, 1060–1065 (2016).

268. Liu, S. et al. Editing DNA methylation in the mammalian genome. Cell 167, 233–247 (2016).

269. Hilton, I. B. et al. Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat. Biotechnol. 33, 510–517 (2015).

270. Lupo, A. et al. KRAB-Zinc Finger Proteins: A Repressor Family Displaying Multiple Biological Functions. Curr. Genomics 14, 268–78 (2013).

271. Vojta, A. et al. Repurposing the CRISPR-Cas9 system for targeted DNA methylation. Nucleic Acids Res. 1–14 (2016). doi:10.1093/nar/gkw159

272. Hilton, I. B. et al. Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat. Biotechnol. 33, 510–517 (2015).

273. Thakore, P. I. et al. Highly specific epigenome editing by CRISPR-Cas9 repressors for silencing of distal regulatory elements. Nat. Methods advance on, (2015).

274. Kearns, N. A. et al. Functional annotation of native enhancers with a Cas9 -histone demethylase fusion demethylase fusion. Nat. Methods 12, 401–403 (2015).

275. Klann, T. S. et al. CRISPR-Cas9 epigenome editing enables high-throughput screening for functional regulatory elements in the human genome. Nat. Biotechnol. 35, 561–568 (2017).

276. Tanenbaum, M. E., Gilbert, L. A., Qi, L. S., Weissman, J. S. & Vale, R. D. A Protein-Tagging System for Signal Amplification in Gene Expression and Fluorescence Imaging. Cell 159, 635–646 (2014).

277. Pflueger, C. et al. A modular dCas9-SunTag DNMT3A epigenome editing system overcomes pervasive off-target activity of direct fusion dCas9-DNMT3A constructs. Genome Res. 28, 1193–1206 (2018).

278. Zalatan, J. G. et al. Engineering complex synthetic transcriptional programs with CRISPR RNA scaffolds. Cell 160, 339–50 (2015).

279. Konermann, S. et al. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 517, 583–588 (2015).

280. Xu, X. et al. A CRISPR-based approach for targeted DNA demethylation. Cell Discov. 2, 16009 (2016).

43

281. Chavez, A. et al. Comparison of Cas9 activators in multiple species. Nat. Methods 13, 563– 567 (2016).

282. Mandegar, M. A. et al. CRISPR Interference Efficiently Induces Specific and Reversible Gene Silencing in Human iPSCs. Cell Stem Cell 18, 541–553 (2016).

283. Kuscu, C. et al. Temporal and Spatial Epigenome Editing Allows Precise Gene Regulation in Mammalian Cells. J. Mol. Biol. 1–11 (2018). doi:10.1016/j.jmb.2018.08.001

284. Chakraborty, S. et al. A CRISPR/Cas9-based system for reprogramming cell lineage specification. Stem Cell Reports 3, 940–947 (2014).

285. Liu, P., Chen, M., Liu, Y., Qi, L. S. & Ding, S. CRISPR-Based Chromatin Remodeling of the Endogenous Oct4 or Sox2 Locus Enables Reprogramming to Pluripotency. Cell Stem Cell 22, 252–261.e4 (2018).

286. Chen, B. et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell 155, 1479–1491 (2013).

287. Fujita, T. & Fujii, H. Efficient isolation of specific genomic regions and identification of associated proteins by engineered DNA-binding molecule-mediated chromatin immunoprecipitation (enChIP) using CRISPR. Biochem. Biophys. Res. Commun. 439, 132– 136 (2013).

288. Liu, X. et al. In Situ Capture of Chromatin Interactions by Biotinylated dCas9. Cell 170, 1028– 1043.e19 (2017).

289. Brocken, D. J. W., Tark-Dame, M. & Dame, R. T. dCas9: A Versatile Tool for Epigenome Editing. Curr. Issues Mol. Biol. 15–32 (2018). doi:10.21775/cimb.026.015

290. Thakore, P. I., Black, J. B., Hilton, I. B. & Gersbach, C. A. Editing the epigenome: technologies for programmable transcription and epigenetic modulation. Nat. Methods 13, 127–137 (2016).

291. Spencer, D. M., Wandless, T. J., Schreiber, S. L. & Crabtree, G. R. Controlling signal transduction with synthetic ligands. Science (80-. ). 262, 1019 LP-1024 (1993).

292. Stanton, B. Z., Chory, E. J. & Crabtree, G. R. Chemically induced proximity in biology and medicine. Science (80-. ). 359, (2018).

293. Hathaway, N. a. et al. Dynamics and memory of heterochromatin in living cells. Cell 149, 1447–1460 (2012).

294. Braun, S. M. G. et al. Rapid and reversible epigenome editing by endogenous chromatin regulators. Nat. Commun. 8, (2017).

295. Zetsche, B., Volz, S. E. & Zhang, F. A split-Cas9 architecture for inducible genome editing and transcription modulation. Nat. Biotechnol. 33, 139–142 (2015).

296. Gao, Y. et al. Complex transcriptional modulation with orthogonal and inducible dCas9 regulators. Nat. Methods 13, 1043–1049 (2016).

297. Ma, D., Peng, S. & Xie, Z. Integration and exchange of split dCas9 domains for transcriptional controls in mammalian cells. Nat. Commun. 7, 13056 (2016).

298. Corson, T. W., Aberle, N. & Crews, C. M. Design and Applications of Bifunctional Small

44 Molecules: Why Two Heads Are Better Than One. ACS Chem. Biol. 3, 677–692 (2008).

299. Butler, K. V., Chiarella, A. M., Jin, J. & Hathaway, N. A. Targeted Gene Repression Using Novel Bifunctional Molecules to Harness Endogenous Histone Deacetylation Activity. ACS Synth. Biol. 7, 38–45 (2018).

300. Chiarella, A. M., Wang, T. A., Butler, K. V., Jin, J. & Hathaway, N. A. Repressing Gene Transcription by Redirecting Cellular Machinery with Chemical Epigenetic Modifiers. J. Vis. Exp. 1, e58222–e58222 (2018).

301. Guo, Z.-F., Zhang, R. & Liang, F.-S. Facile functionalization of FK506 for biological studies by the thiol–ene ‘click’ reaction. RSC Adv. 4, 11400 (2014).

302. Chiarella, A. M. et al. Cavitation Enhancement Increases the Efficiency and Consistency of Chromatin Fragmentation from Fixed Cells for Downstream Quantitative Applications. Biochemistry 57, 2756–2761 (2018).

45 CHAPTER 2: TARGETED GENE REPRESSION USING NOVEL BIFUNCTIONAL MOLECULES TO

HARNESS ENDOGENOUS HISTONE DEACETYLATION ACTIVITY1

INTRODUCTION:

Specific control of gene expression programs is critical for proper mammalian development and allows for the creation of differentiated tissues from a pluripotent starting state. Much of this control comes from regulation at the level of chromatin, where the presence or absence of chemical marks on

DNA and histone tails regulate expression of the associated gene by directing the binding of transcription factors, corepressors, coactivators, and other biomolecules.1,2 These epigenetic pathways are commonly disrupted in many human diseases including cancer.3 Thus, both to understand human biology and to develop new therapeutic strategies for treating diseases driven by epigenetic dysregulation, it is important to develop new technologies capable of synthetically modulating epigenetic processes at individual genes or sets of genes. Site-specific editing of these epigenetic marks can be accomplished with dCas9 or TALE mediated recruitment of epigenetic enzymes.4–6 In one example, a small guide RNA (sgRNA) is capable of recruiting a dCas9-p300 fusion protein to target genes, causing histone acetylation and transcription activation.7 These new approaches have proved to be extremely useful opening up new avenues for the study of epigenetics and could turn out to be a whole new class of therapeutic for personalized medicine. One major drawback of using a genetic editing technology as a therapeutic or an in vivo tool, is off target or continuous recruitment resulting in permanent genetic mutation at undesirable loci.8 As an alternative, we considered a small molecule

______1This chapter previously appeared as an article in ACS Synthetic Biology. The original citation is as follow: Kyle V. Butler, Anna M. Chiarella, Jian Jin, and Nathaniel A. Hathaway. “Targeted Gene Repression Using Novel Bifunctional Molecules to Harness Endogenous Histone Deacetylation Activity” ACS Synthetic Biology 2018 7 (1), 38-45

46 platform for epigenome editing. Small molecules have some advantages over genetic methods as a therapeutic and a research tool: treating cells with small molecules is technically easy and requires little optimization, they are typically easy to synthesize, they can be active in vivo, effects are often dose sensitive, and most importantly their effects are reversible. Disadvantages also exist for small molecules: their functions are typically more difficult to engineer, their cellular effects may be less profound, and they may have more off-target effects than genetic methods.

Using bivalent small molecules to direct proteins to DNA has been investigated.9 “Small molecule transcription factors” featuring hairpin polyamide DNA binding motifs linked to histone deacetylase inhibitors were used to activate transcription of DNA.10 Because polyamides are difficult to synthesize and the effects likely result from a combination of on-target and off-target effects, this approach is not ideal. Similar examples have been reported that use polyamides or transcription factor ligands as the DNA binding ligand and coactivator-binding molecules as a transcriptional activation domain.9 In another example, glucocorticoid receptor (GR) ligands tethered to FK506 were used to recruit a histone deacetylase (HDAC) 1 – FKBP fusion protein to DNA, causing down-regulation of GR- controlled genes.11 Recently, Liszczak et al. used native chemical ligation to attach small molecules to dCas9, allowing CRISPR guided targeting of the molecules to DNA.12 This system did not cause any changes in gene expression unless the cells were co-transfected with a VP64-Brd4 fusion protein. Thus far, none of these methods have been shown to specifically change both the epigenetic state and the expression of a target gene.

Controlling chemically induced proximity between corepressor complexes and DNA-bound transcription factors requires a bivalent ligand capable of strong interactions with both the transcription factor and a corepressor complex. Components of corepressor complexes with high-affinity ligands include the histone deacetylase (HDAC) enzymes and EZH2.13,14 We chose HDAC inhibitors as the ligands for many reasons: they are potent, easy to synthesize, and tolerate the attachment of long linker groups while still being cell-permeable.15 Also, the isoform selectivity and residence time can be tuned by modifying the ligand.16,17 HDAC enzymes reverse the histone acetyl marks associated with open

47 chromatin and active transcription.18,19 It may seem counterintuitive to use enzyme inhibitors to recruit a complex that enzymatically modifies histones, but HDACs exist in complexes that contain proteins capable of silencing genes through many different mechanisms, and some HDACs are catalytically inactive pseudoenzymes that still bind inhibitors.20,21 Furthermore, HDAC enzymes can exist in complexes with other HDAC isoforms that would not be inhibited by the small molecule.22 HDAC inhibitors bind to corepressor complexes such as CoREST, NuRD, and Sin3 with unique selectivity profiles, depending on the chemical structure and HDAC isoform bound.15,23 By tuning the inhibitor structure, it may be possible to recruit specific corepressor complexes to the target gene.

Additionally, these bifunctional molecules are derived from FDA-approved HDAC inhibitors that have been well studied and characterized. Suberanilohydroxamic acid (SAHA) and several other epigenetic inhibitors are widely used in the clinic.24,25 Though these drugs are effective modulators of certain cellular pathways, their cell-wide effects cause high toxicity and modulate the expression of many genes.26 We describe a targeted, gene-specific application of these molecules. This technology can control gene expression in a dose sensitive and reversible manner, allowing us to gain exquisite control over gene expression through modulation of acetylation levels at the target gene promoter.

RESULTS:

Design of chemical epigenetic modifiers:

We used the previously described Chromatin in vivo assay (CiA) cell line to develop bivalent ligands capable of epigenome editing, which we call chemical epigenetic modifiers (CEMs).27 This cell line is derived from mouse embryonic stem cells (mESC). One copy of the Pou5F locus is modified to include a Gal4 DNA-binding array upstream of the transcriptional start site, and the downstream Oct4 gene is replaced with enhanced GFP (eGFP). Because the eGFP gene is highly expressed, this system is sensitive to the recruitment of gene silencing complexes, and the change in gene expression can be monitored by flow cytometry.

48

The mESCs were lentivirally infected with a plasmid expressing an FKBP-Gal4 fusion protein, which serves as our DNA-anchor to recruit the CEMs (Figure 1a). This was ideal for development of the CEM system because FK506, a natural product that forms a high affinity interaction with FKBP, can serve as the transcription factor ligand.28 Our control cell line expresses the unmodified Gal4 protein, removing the ability to bind and recruit FK506.

The CEMs were constructed by attaching FK506 to an HDAC inhibitor through a short linker using click chemistry, so as not to disturb the stereocenters of FK506 (Figure 1b).28,29 HDAC inhibitors can be attached to linkers or solid supports through the solvent-exposed aryl region with little impact on potency. Kozikowski et al. reported click-chemistry compatible, triazole HDAC inhibitors with low- nM potency at HDACs 1, 2, 3, 6, and 10.30 We used this template to create CEM42 and CEM23. Both

CEMs have the same HDAC-recruiting moiety, but we wanted to investigate if the linker would have an impact on CEM-efficiency. CEM42 contains a relatively long 3X PEG linker and CEM23 contains a propyl linker (Figure 1b).

Recruitment of HDAC activity can efficiently repress the CiA:Oct4 locus:

To serve as a positive control for HDAC-mediated repression, we developed fusion proteins between Gal4 and 8 different HDAC isozymes. HDAC3-Gal4 caused repression in 38.8% of the cells

(Figure 2a), while the other fusions did not have any repressive effects.

Dose response experiments identified the optimal concentration of CEM23. We observed repressive activity with 10nM of CEM23. The level of repression peaked around 100nM. Doses went as high as 10µM, after which significant cell death occurred and an accurate level of repression was not obtainable (Supplementary Figure S1a). We followed FKBP-Gal4 cells treated with 100nM

CEM23 for 72 hours. The cell population reached a maximum of 48% GFP(-) after 48 hours, and sustained this level through 72 hours (Figure 2c). In cells transfected with Gal4, which cannot recruit

49 the CEMs, 100nM of CEM23 did not cause a significant increase in the GFP(-) population (Figure 2b).

The Gal4 control experiment proves that any changes in gene expression are not due to the general gene-silencing effect of the HDAC inhibitors, but are only possible when the inhibitor is localized at the target gene. It also shows that the gene silencing is not caused by Gal4 blocking progression of RNA polymerase. We next exposed the cells to CEM23 with and without an excess of FK506 (the component of CEM23 that binds to FKBP).

To demonstrate the stability and reversibility of CEM23 effects, we sorted CEM23-treated cells using fluorescence activated cell sorting (FACS) (Supplementary Figure S2). After 48 hours of 100nM

CEM23 treatment, the cells were separated into positive and negative populations. The GFP(-) was plated with or without continuous CEM23 exposure and maintained for evaluation. The negatively sorted cells were assessed with flow cytometry 3 days after FACS. The cells that were not treated with

CEM23 regained almost full GFP expression (92.3% GFP+) and 71.8% of the CEM23 treated populated remained GFP(-) (Figure 2d).

To investigate the ability of the CEMs to be washed out and dosed back into the cells, we re- exposed a portion of the negatively sorted cells that were not given post-sort CEM-treatment and had regained GFP-expression. After 48 hours post reintroduction of 100nM CEM23, 78% of the population repressed GFP (Figure 2e). These results demonstrate the specificity and reversibility of our system to tune gene expression in a chemically-depend manner through endogenous HDAC recruitment.

CEM42, featuring a longer linker region, behaved similarly to CEM23 in the FKBP-Gal4 cells. At 100nM, after 48 hours, CEM42 causing 41.1% of the cells to reside in the GFP(-) state, with no significant effects on transcription for the Gal4 control cells (Supplementary Figure S1b).

Knowing that recruitment of HDAC3 was likely to be responsible for gene repression, we synthesized CEM36, based on a different chemotype reported to be selective for HDAC1/331 (Figure

1b). CEM36 features a benzamide zinc chelating group that is known to have a very long residence time at the enzyme relative to the hydroxamic acid zinc chelating groups of CEM23 and CEM42, but is

50 less potent. After treatment of FKBP-Gal4 cells with 100nM CEM36 for 2 days, 29.9% of cells were

GFP(-), with no effect in Gal4 control cells (Supplementary Figure S1c).

To confirm that the bifunctional CEMs repress GFP, rather than one of the components alone, we tested the CEMs with FK506. Our results show that with a 10X excess of FK506, the repressive effects of CEM23 are removed, so FK506 is competing with CEM23 for the same FKBP binding site

(Supplementary Figure S1d). We also tested the effect that the individual components have on GFP expression. FK506 alone did not alter expression at 100nM and 1000nM concentrations. SAHA did have a slightly repressive effect at 1000nM, but no repression was observed at 100nM (Supplementary

Figure S1e).

CEM-mediated repression of eGFP occurs in a variegated and whole-colony fashion:

mESCs are an adherent, colony-forming cell line, thus we hypothesized that these characteristics were influencing the ability of cells to uptake the CEMs and thereby limited the level of overall gene repression. To characterize the mode of the repression, we took fluorescent microscopy images of the HDAC3-Gal4 and CEM-treated cell line. In an unbiased, blinded manner we counted

~200 mESC colonies for each line and classified them into three categories: GFP-Off, GFP-On, and

GFP-Variegated. Colonies were labeled as “GFP Off” if all cells within the colony were off or expressing background-level GFP expression. Colonies in which all the observable cells were expressing GFP relatively brightly were categories as “GFP-On”. “GFP-Variegated” colonies had cells within the colonies that maintained high GFP expression, in an otherwise GFP(-) colony. Examples of cells in these categories are shown in Supplementary Figure S3a. Cell lines were grown and counted in duplicate, then averaged (Supplementary Figure S3b).

The Gal4-expressing and FKBP-Gal4 expressing cell lines were treated with 100nM of CEM23,

CEM36, or CEM42 for 48 hours. Images were taken throughout areas of the wells using a High Content

Fluorescent Microscope (Figure 3a,b). HDAC3-Gal4 cell lines served as a positive control for HDAC3-

51 recruitment. In the HDAC3-Gal4 cells, an average of 43% of cells were GFP-On, 25% were GFP Off and 32% were variegated. As expected, the GFP expression of the Gal4-infected colonies was not significantly affected by CEM treatment. Of the FKBP-Gal4 cells treated with CEM23, 43% of the colonies were GFP-On, 32% were GFP-Off, and 25% were variegated, similar to the HDAC3-Gal4 cell line. The standard deviation between replicates was low when comparing the percentage of GFP-off colonies, whereas the percentages varied more between GFP-On and GFP-Variegated. This could be due to differences in what was determined to be fully on versus a variegated phenotype. Nevertheless, some colonies are remaining fully on or fully off, which does not support the hypothesis that CEM-to- cell accessibility is limited by colony formation, resulting in an incomplete suppression of GFP expression. To validate cell morphology, representative images of the cell lines and treatment condition representative images are shown with a fluorescent microscope, which allowed phase microscopy of the colonies (Figure 3c and Supplementary Figure S3c).

H3K27ac levels decrease upon CEM-recruitment:

To examine the effects of CEM-mediated HDAC recruitment on chromatin environment, we performed chromatin immunoprecipitation (ChIP) followed by quantitative RT-PCR (qPCR). H3K27ac is a histone mark associated with active promoters and is regulated in part by HDAC3 activity.32 We tested for differences in H3K27ac at the Oct4 locus of the HDAC3-Gal4 cell line, the Gal4 cell line, and the FKBP-Gal4 cell line with and without 48 hours of 100nM CEM23 exposure (Figure 4a).

The normalized level of H3K27ac enrichment was significantly higher in the Gal4 cell line compared to the HDAC3-Gal4 cell line when tested at regions 489 base pairs (bp) and 738 bp downstream from the transcriptional start site (TSS) (* = p<0.05, n=4) (Figure 4b). We next tested the

H3K27ac levels at the same location in the FKBP-Gal4 cell lines with and without CEM23 treatment.

Our results show that CEM23 decreased H3K27ac levels at 489, 738, and 1199 bp downstream of the

TSS (Figure 4c, * = p<0.05, ** = p<0.005, n=5). The primer set at 169 bp upstream of the TSS did not

52 show a significant change between the Gal4 cells and HDAC3-Gal4 cells, nor between the FKBP-Gal4

+/- CEM23 cells (Supplementary Figure S4a,b). This is not surprising, as regions this close to the

TSS are typically largely devoid of nucleosomes. The results of H3K27ac levels in the four cell lines at

-169, 489, 738, and 1199 bp from the TSS are summarized in Figure 4d. These data support our hypothesis that our CEM-mediated gene repression occurs through endogenous HDAC recruitment.

To confirm and further characterize the observed changes in histone tail acetylation, FKBP-Gal4 cells treated with 100nM of CEM23 for 48 hours were separated into positive and negative populations by FACS. We expected that the decreased acetylation levels observed in the CEM23-treated cells were contributed by the cells with decreased GFP expression (as observed by flow cytometry and microscopy). We measured the H3K27ac levels in the GFP(+) and GFP(-) population from three separate sorting experiments. We compared H3K27ac enrichment levels between the GFP(+) and

GFP(-) populations. As expected, H3K27ac levels in the GFP(+) population were significantly higher than the GFP-negative population. Additionally, H3K27ac levels in the untreated FKBP-Gal4 cells were not significantly different from the FKBP-Gal4 + CEM23 positively sorted cells. And, H3K27ac levels in the FKBP-Gal4 CEM23-treated cells were not significantly different from the FKBP-Gal4 + CEM23 negatively sorted cells (Supplementary Figure S4c). It should be expected that the sorted GFP(+) population has acetylation levels similar to the unsorted FKBP-Gal4 control cells, and that the sorted

GFP(-) population has acetylation levels similar to the unsorted FKBP-Gal4 CEM23 treated cells. For all ChIP reactions, qPCR was done with primer pairs along the Oct4 locus (Supplementary Figure

S4d). H3K27ac enrichment levels were normalized to primers recognizing an intergenic region (IGR).

These data confirm what we expected and show that within the CEM23-treated cells, the GFP(-) cells have lower H3K27ac levels than the GFP(+) cells.

DISCUSSION:

CEM compounds cause specific, targeted gene suppression by modifying the local chromatin.

The mechanism of action for these compounds is direct recruitment of HDAC-containing corepressor

53 complexes to the FKBP-Gal4 transcription factor. Evidence supporting this claim is: the known ability of similar HDAC inhibitors to bind HDAC-containing corepressor complexes,23 the reduction of transcription and H3K27 acetylation at the target gene, and the recapitulation of the drug-induced phenotype by a Gal4-HDAC3 fusion protein.

Suberanilohydroxamic acid (SAHA) derivatives like CEM23 and CEM42 are known to bind to multiple corepressor complexes, and have particularly high affinity towards CoREST, Sin3, and the

HDAC3-containing nuclear receptor corepressor complex (NCOR).23,33 HDAC3 does not need to be catalytically active to cause gene repression in cells, and represses transcription independent of direct deacetylation, dependent on association with NCOR.34 When a CEM recruits HDAC3/NCOR to a gene,

HDAC3/NCOR can suppress transcription even while HDAC3’s catalytic activity is blocked. HDAC3 is the only HDAC enzyme known to reside in the NCOR complex,35 so it is unclear how deacetylation takes place while the NCOR HDAC3 enzyme is engaged by an inhibitor. Because we observe the associated deacetylation in proximity to our FKBP-Gal4 DNA anchor, we are likely recruiting uninhibited

HDACs to the gene locus. This could be happening through indirect recruitment of inhibited HDACs that our CEMs are recruiting, through direct recruitment whereby the reversible HDAC-CEMs interaction is allowing for previously inhibited HDACs to be active, or a mixture of both conditions. We hypothesize that we generate a cloud of HDACs in the proximity of the gene that are actively deacetylating and altering the chromatin environment in a chemically-dependent manner.

Less selective, hydroxamic acid-containing HDAC inhibitors like CEM23 were most effective in our assays. Slow dissociating benzamide compounds like CEM36 were not as effective, perhaps because these compounds have less affinity for the enzyme.15,16 Future medicinal chemistry campaigns will more exhaustively explore alternative chromatin modification pathways by changing the chromatin activity portion of the bifunctional CEMs to include chemical probes that bind other complexes, for example polycomb repressive complex factors or heterochromatin pathway proteins, since the direct tethering of either pathway has been linked to gene repression.27,36

54 The CEM compounds could be used in other cell lines that contain Gal4 binding arrays at a gene of interest, or could be adapted for use with other genetic engineering systems. dCas9-CRISPR is currently the most programmable system for epigenome editing, and we are working to adapt dCas9 to work with the CEMs. Because the dose of small molecules can be precisely controlled, the CEMs could have value as part of a CRISPR-directed epigenome editing platform.

Drugs that target the addition, removal, or recognition of histone modifications, such as HDAC inhibitors, do this across the genome. The HDAC inhibitors SAHA and depsipeptide have been shown to change the expression of up to 22% of genes.26 HDAC inhibitors also modulate acetylation of many non-histone proteins.37 This lack of selectivity causes side effects, and thus, most of these drugs are reserved for use in cancers for which there aren’t many other therapeutic options.38 By tethering epigenetic modulator drugs to a DNA-targeting group, the pharmacology of these drugs is constrained to the target locus, providing a robust effect at the target gene with minimal off-target effects.

The success of this work, and other results,11 raises the possibility that purely small molecule epigenome editing is possible. An HDAC inhibitor could be tethered to a glucocorticoid or androgen receptor (AR) ligand, and be used to repress genes bound by these receptors. An AR receptor ligand

– HDAC inhibitor conjugate could be a therapy for androgen-independent prostate cancer, where the

AR inappropriately activates target genes with changes to the epigenome.39 Because of the difficulties in delivering proteins or genes in vivo, bivalent small molecules may be the easiest way to achieve targeted in vivo epigenome editing.

METHODS:

Cell lines and infection:

The CiA mouse embryonic stem cells, previously described in Hathaway et al., 2012, were grown on

0.1% gelatin coated tissue culture dishes in high-glucose DMEM (Corning, 10-013-CV) supplemented with FBS serum (Gibo, 26140-079), 10mM HEPES (Corning, 25-060-Cl), NEAA (Gibco, 11140-050),

55 Pen/Strep, 55μM 2-Mercaptoethanol, and 1:500 LIF conditioned media produced from LIF-1Cα (COS) cells. Low passage (22-30) mESC were used. Cells were typically passaged at a density of 3-4 x 106 cells per 10 cm plate, fed every day, and split every 2-3 days. Lentivirus production of mESC infection was done using 293T LentiX cells (Clontech). Low passage cells were plated onto 15 cm cells such that they were 80% confluent 24 hours later. Each plate was transfected with 18μg of the plasmid of interest, 13.5μg of the Gag-Pol expressing plasmid, and 4.5μg of the VSV-G envelope expressing plasmid. PEI transfection was done and 60 hours after transfection, virus was spun down at 20,000

RPM for 2 ½ hours and then added to the mESCs in combination with 5 μg/mL Polybrene (Santa Cruz, sc-134220). Selection of lentiviral constructs was done with either puromycin (1.5μg/mL) or blasticidin

(7.5μg/mL). CEM and FK506 were diluted in DMSO (Sigma D2650) and kept at -20oC added to the cells during cell passaging and the cells were given fresh media and compound daily.

Construct Design:

The plasmids expressing the Gal4-DBD control and FKBP-Gal4 were previously constructed and can be found on Addgene (44176 and 44245, respectively). HDAC3-Gal4 was constructed by stitching

PCR. The HDAC3 was amplified from mESC cDNA and the Gal4 was amplified from Addgene #44176.

Both pieces were stitched into a lentivirus backbone with EF1-α promoter driving HDAC3-Gal4 and a

PGK promoter driving a resistance gene. The primers for the stitching pcr are as follows: forward primer

(AC044) tgaggatccgcggccgcgccaccatggccaagaccgtggcg ; middle forward primer (AC045) cgacaaggaaagtgatgtggagattatgaagctactgtcttctatcgaac ; middle reverse primer (AC046) gttcgatagaagacagtagcttcataatctccacatcactttccttgtcg ; reverse primer (AC047) agagccggcgcggccgcctacgatacagtcaactgtctttgacc.

Flow cytometry and FACS:

Flow cytometry analysis and fluorescence activated cell sorting were done with the University of North

Carolina Flow Facility. The cells were washed, trypsinized, and plated into a 96 well format (at a density of about 2,000 cells/μL) for analysis in the Intellicyt iQue screener PLUS. UNC Flow Core Staff

56 conducted the FACS with the FACSAria II. 107 cells were harvested for ChIP immediately after the sorted populations were acquired.

Image acquisition and quantification:

The images taken for quantification were taken with the GE IN Cell Analyzer 2200 24 hours and 48 hours after compound exposure. Ten images, randomly dispersed throughout the well were taken per cell line. For each cell line treatment condition, 147-213 colonies were counted and characterized as

GFP On, GFP Off, or GFP Variegated. The percentage of each category was determined. This was done in duplicate with freshly infected and selected cell lines. Percentages for each category were averaged. The brightness/contrast of the brightfield images was uniformly adjusted in ImageJ FIJI.

Background artifact in the GPF-fluorescent images was uniformly removed in FIJI with a sliding paraboloid with a rolling ball radius of 10 pixels.

Confirmatory images were taken of the cells 24 and 48 hours after compound exposure with the Leica

Olympus IX71. Phase images were not edited. Fluorescent images were edited in FIJI with a sliding paraboloid with a rolling ball radius of 20 pixels. 5 images were taken per replicate per condition, and representative images are shown.

ChIP-qPCR:

For each sample, cells were trypsinized for 8-10 minutes, trypsin was quenched with 10mL of ES media, and 107 cells were obtained. Cells were spun down, washed with 10mL PBS, fixed for 12 minutes in a mix of formaldehyde (to a final concentration of 1%) and Fix Buffer (50mM HEPES pH 8.0, 1mM EDTA,

0.5mM EGTA, 100mM NaCl), and then quenched by glycine (final concentration of 0.125M). Cells were incubated on ice, spun down at 1000xg for 5 minutes. Nuclei were prepared by consecutive washes with Rinse 1 Buffer (50mM HEPES pH 8.0, 140mM NaCl, 1mM EDTA, 10% glycerol, 0.5% NP40, 0.25%

Triton X100), Rinse 2 Buffer (10mM Tris pH 8.0, 1mM EDTA, 0.5mM EGTA, 200mM NaCl) and

Shearing Buffer (0.1% SDS, 1mM EDTA, 10mM Tris HCl pH 8). After a final spin down, cells were resuspended in 100μL Shearing Buffer and 1 x protease inhibitor cocktail (Calbiochem), nanodroplets

57 (generous gift from Samantha Pattenden40, and sonicated in an E110 Covaris Sonicator for 4 minutes or until DNA was sheared to between 70-500bp (as confirmed by agarose gel).

After sonication, the rest of the protocol was performed with a ChIP-IT High Sensitivity Kit (Active Motif,

53040) and an H3K27ac antibody was used for the pull down (Abcam, ab4729).

58 REFERENCES:

1. Zhou, V. W., Goren, A. & Bernstein, B. E. (2011) Charting histone modifications and the functional organization of mammalian genomes. Nat. Rev. Genet. 12, 7–18.

2. Bannister, A. J. & Kouzarides, T. (2011) Regulation of chromatin by histone modifications. Cell Res. 21, 381–395.

3. Feinberg, A. P., Koldobskiy, M. A. & Göndör, A. (2016) Epigenetic modulators, modifiers and mediators in cancer aetiology and progression. Nat. Rev. Genet. 17, 284–99.

4. Gao, Y. et al. (2016) Complex transcriptional modulation with orthogonal and inducible dCas9 regulators. Nat. Methods Epub Oct 24, 2016. DOI: 10.1038/nmeth.4042.

5. Kungulovski, G. & Jeltsch, A. (2016) Epigenome Editing: State of the Art, Concepts, and Perspectives. Trends in Genetics 32, 101–113.

6. Qi, L. S. et al. (2013) Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152, 1173–1183.

7. Hilton, I. B. et al. (2015) Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat. Biotechnol. 33, 510–7.

8. Lanphier, E., Urnov, F., Haecker, S. E., Werner, M. & Smolenski, J. (2015) Don’t edit the human germ line. Nature 519, 410–411.

9. Højfeldt, J. W., Van Dyke, A. R. & Mapp, A. K. (2011) Transforming ligands into transcriptional regulators: building blocks for bifunctional molecules. Chem. Soc. Rev. 40, 4286–94.

10. Xiao, X., Yu, P., Lim, H.-S., Sikder, D. & Kodadek, T. (2007) A Cell-Permeable Synthetic Transcription Factor Mimic. Angew. Chemie Int. Ed. 46, 2865–2868.

11. Højfeldt, J. W. et al. (2014) Bifunctional ligands allow deliberate extrinsic reprogramming of the glucocorticoid receptor. Mol. Endocrinol. 28, 249–59.

12. Liszczak, G. P. et al. (2017) Genomic targeting of epigenetic probes using a chemically tailored Cas9 system. PNAS 114, 681–686.

13. Kim, W. et al. (2013) Targeted disruption of the EZH2–EED complex inhibits EZH2-dependent cancer. Nat. Chem. Biol. 9, 643–650.

14. Dokmanovic, M., Clarke, C. & Marks, P. A. (2007) Histone deacetylase inhibitors: overview and perspectives. Mol. Cancer Res. 5, 981–989.

15. Becher, I. et al. (2014) Chemoproteomics reveals time-dependent binding of histone deacetylase inhibitors to endogenous repressor complexes. ACS Chem. Biol. 9, 1736–1746.

16. Wagner, F. F. et al. (2015) Kinetically Selective Inhibitors of Histone Deacetylase 2 (HDAC2) as Cognition Enhancers. Chem. Sci. 6, 804–815.

17. Kalin, J. H. & Bergman, J. A. (2013) Development and therapeutic implications of selective histone deacetylase 6 inhibitors. Journal of Medicinal Chemistry 56, 6297–6313.

18. Zhang, H., Gao, L., Anandhakumar, J. & Gross, D. S. (2014) Uncoupling Transcription from Covalent Histone Modification. PLoS Genet. 10.

59 19. Delcuve, G. P., Khan, D. H. & Davie, J. R. (2012) Roles of histone deacetylases in epigenetic regulation: emerging paradigms from studies with inhibitors. Clin. Epigenetics 4, 5.

20. Lahm, A. et al. (2007) Unraveling the hidden catalytic activity of vertebrate class IIa histone deacetylases. Proc. Natl. Acad. Sci. U. S. A. 104, 17335–40.

21. Harris, L. G. et al. (2016) Evidence for a non-canonical role of HDAC5 in regulation of the cardiac Ncx1 and Bnp genes. Nucleic Acids Res. 44, 3610–3617.

22. Kelly, R. D. W. & Cowley, S. M. (2013) The physiological roles of histone deacetylase (HDAC) 1 and 2: complex co-stars with multiple leading parts. Biochem. Soc. Trans. 41, 741–9.

23. Bantscheff, M. et al. (2011) Chemoproteomics profiling of HDAC inhibitors reveals selective targeting of HDAC complexes. Nat. Biotechnol. 29, 255–65.

24. Marks, P. (2007) Discovery and development of SAHA as an anticancer agent. Oncogene 26, 1351–1356.

25. Zwergel, C., Stazi, G., Valente, S. & Mai, A. (2016) Histone Deacetylase Inhibitors : Updated Studies in Various Epigenetic - Related Diseases. J. Clin. Epigenetics 2, 1–15.

26. Peart, M. J. et al. (2005) Identification and functional significance of genes regulated by structurally different histone deacetylase inhibitors. Proc. Natl. Acad. Sci. U. S. A. 102, 3697– 3702.

27. Hathaway, N. a. et al. (2012) Dynamics and memory of heterochromatin in living cells. Cell 149, 1447–1460.

28. Guo, Z.-F., Zhang, R. & Liang, F.-S. (2014) Facile functionalization of FK506 for biological studies by the thiol–ene ‘click’ reaction. RSC Adv. 4, 11400.

29. Hong, V., Presolski, S. I., Ma, C. & Finn, M. G. (2009) Analysis and optimization of copper- catalyzed azide-alkyne cycloaddition for bioconjugation. Angew. Chemie - Int. Ed. 48.

30. Chen, Y. et al. (2008) A series of potent and selective, triazolylphenyl-based histone deacetylases inhibitors with activity against pancreatic cancer cells and Plasmodium falciparum. J. Med. Chem. 51, 3437–3448.

31. Rai, M. et al. (2010) Two new pimelic diphenylamide HDAC inhibitors induce sustained frataxin upregulation in cells from Friedreich’s ataxia patients and in a mouse model. PLoS One 5.

32. You, S.-H. et al. (2013) Nuclear receptor co-repressors are required for the histone- deacetylase activity of HDAC3 in vivo. Nat. Struct. Mol. Biol. 20, 182–187.

33. Zhang, J., Kalkum, M., Chait, B. T. & Roeder, R. G. (2002) The N-CoR-HDAC3 nuclear receptor corepressor complex inhibits the JNK pathway through the integral subunit GPS2. Mol. Cell 9, 611–623.

34. Sun, Z. et al. (2013) Deacetylase-Independent function of HDAC3 in transcription and metabolism requires nuclear receptor corepressor. Mol. Cell 52, 769–782.

35. Urvalek, A. M. & Gudas, L. J. (2014) Retinoic acid and histone deacetylases regulate epigenetic changes in embryonic stem cells. J. Biol. Chem. 289, 19519–19530.

36. Bintu, L. et al. (2016) Dynamics of epigenetic regulation at the single-cell level. Science (80-. ).

60 351, 720–724.

37. Seto, E. & Yoshida, M. (2014) Erasers of histone acetylation: The histone deacetylase enzymes. Cold Spring Harb. Perspect. Biol. 6.

38. Subramanian, S., Bates, S. E., Wright, J. J., Espinoza-Delgado, I. & Piekarz, R. L. (2010) Clinical toxicities of histone deacetylase inhibitors. Pharmaceuticals 3, 2751–2767.

39. Wang, Q. et al. (2009) Androgen Receptor Regulates a Distinct Transcription Program in Androgen-Independent Prostate Cancer. Cell 138, 245–256.

40. Kasoji, S. K. et al. (2015) Cavitation enhancing nanodroplets mediate efficient DNA fragmentation in a bench top ultrasonic water bath. PLoS One 10.

61 Figures:

Figure 1. Design and structure of chemical epigenetic modifiers (CEMs). (a) A Gal4-FKBP fusion binds to a 5X Gal4 binding array at the promoter of the CiA:Oct4 locus. A bivalent FK506-HDAC inhibitor binds to FKBP and recruits HDAC corepressor complexes. (b) Structures of CEM compounds described here.

62

Figure 2. CEM23 reversibly and specifically represses GFP expression. (a) Flow cytometry data for

ES cells expressing either Gal4 or Gal4-HDAC3. (b) ES cells expressing Gal4 were treated with either 0 or 100 nM CEM23 for 48 hours. (c) ES cells expressing Gal4-FKBP were treated with CEM23 over 72 hours. (d) ES cells treated with 100nM CEM23 were sorted to keep only GFP(-) cells. These cells were treated with either 0 or 100nM CEM23 for 5 additional days, and analyzed on days 5 or 7.

(e) CEM23 treated, GFP(-) sorted cells were washed out for five days, treated with either 0 or 100nM

CEM23 for two days, and analyzed by flow cytometry. All histograms are representative plots of multiple experiments.

63

Figure 3. Fluorescent microscopy images and colony categorization for CEM-treated cells.

Fluorescence images were taken of ES cells and colonies were characterized as being GFP On, GFP

Off, or a variegated state (b). Representative images are shown (c).

64

Figure 4. H3K27Ac chromatin immunoprecipitation at the Cia:Oct4 locus. Quantitative RT-PCR was performed at four sites at the nucGFP locus. Quantitative RT-PCR compared Gal4 to HDAC3-Gal4 cell lines

(b) and Gal4-FKBP cells to Gal4-FKBP with 100nM CEM23 (c) using an H3K27ac antibody. Results at all four regions are summarized (d). Significance was calculated using Student’s t test with n = 4 for B and n =

5 for C. * p > 0.05; ** p > 0.005

65

Supplemental Figure S1. (a) Overlays of histograms of dose response for CEM23 (0, 10, 50, 100,

1000, 10000nM) with legend showing colors for each dose. (b) Overlays of histograms of CEM42 +/- compound for Gal4 and Gal4-FKBP cell lines. (c) Overlays of histograms of CEM36 +/- compound for

Gal4 and Gal4-FKBP cell lines.

66

Supplemental Figure S1d. Overlays of histograms for the Gal4 cell line and FKBP-Gal4 cell line with 100nM of CEM23, with and without 1μM FK506.

67

Supplemental Figure S1e. Overlays of histograms for the FKBP-Gal4 cell line with FK506 or suberoylanilide hydroxamic acid (SAHA).

68

Supplemental Figure S1e. Median fluorescences for each cell line and experimental condition.

69

Supplemental Figure S2a.The gating of FKBP-Gal4 + 100nM CEM23 cells and gating of

HDAC3-Gal4 cells for Fluorescence Activated Cell Sorting (FACS) for replicate 1 of ChIP experiments.

70

Supplemental Figure S2b.The gating of FKBP-Gal4 + 100nM CEM23 cells and gating of

HDAC3-Gal4 cells for Fluorescence Activated Cell Sorting (FACS) for replicate 2 of ChIP experiments.

71

Supplemental Figure S2c.The gating of FKBP-Gal4 + 100nM CEM23 cells and gating of

HDAC3-Gal4 cells for Fluorescence Activated Cell Sorting (FACS) for replicate 3 of ChIP experiments.

72

Supplemental Figure S3a. Representative images of colonies characterized as GFP-Off, GFP-On, or GFP-Variegated.

73

Supplemental Figure S3b. The raw data of colony characterization into GFP-Off, GFP-On, GFP-

Variegated of 10 experimental conditions for two replicates.

74

Supplemental Figure S3c. Representative images of Phase and GFP Fluorescence for Gal4 and

FKBP-Gal4 cell lines exposed to CEMs.

75

Supplemental Figure S4. Quantitative RT-PCR comparing Gal4 to HDAC3-Gal4 cell lines (a) and

Gal4-FKBP cells to Gal4-FKBP with 100nM CEM23 (b) using an H3K27ac antibody and qPCR primers 169 basepairs upstream of the transcriptional start site. Quantitative RT-PCR compared

Gal4-FKBP cells, Gal4-FKBP with 100nM CEM23, and Gal4-FKBP with 100nM CEM23 sorted into positive and negative populations, using an H3K27ac antibody and qPCR primers 738 basepairs downstream of the transcriptional start site (c). Primer sequences used for qPCR in Figure 4 and

Supplemental Figure 4A-C (d).

76 Supplemental Text 1. Chemistry General Procedures. FK506 was purchased from Selleck. All other chemicals were purchased from Sigma-Aldrich or Fisher Scientific. HPLC spectra for all compounds were acquired using an Agilent 1200 Series system with DAD detector. Analytical HPLC chromatography was performed on a 2.1×150 mm Zorbax 300SB-C18 5 μm column with water containing 0.1% formic acid as solvent A and acetonitrile containing 0.1% formic acid as solvent B at a flow rate of 0.4 mL/min. The gradient program was as follows: 1% B (0−1 min), 1−99% B (1−4 min), and 99% B (4−8 min). High resolution mass spectra (HRMS) data were acquired in positive ion mode using an Agilent G1969A API-TOF with an electrospray ionization (ESI) source. Flash column chromatography was performed on a Teledyne ISCO CombiFlash Rf system equipped with a variable wavelength UV detector and a fraction collector using RediSep Rf normal phase silica columns.

Microwave reactions were performed using a Discover SP CEM. Nuclear Magnetic Resonance

(NMR) spectra were acquired on a Bruker DRX-600 spectrometer with 600 MHz for proton (1H NMR) and 150 MHz for carbon (13C NMR); chemical shifts are reported in ppm (δ). Preparative HPLC was performed on Agilent Prep 1200 series with UV detector set to 254 nm. Samples were injected onto a

Phenomenex Luna 75 x 30 mm, 5 μm, C18 column at room temperature. The flow rate was 30 mL/min. A linear gradient was used with 10% (or 50%) of MeOH (A) in H2O (with 0.1 % TFA) (B) to

100% of MeOH (A). HPLC was used to establish the purity of target compounds. All final compounds had > 95% purity using the HPLC methods described above. NMR integration marked as n.d. indicates that integration could not be determined due to large numbers of protons overlapping with solvent peaks.

77

Supplemental Figure 5. Synthesis of compounds 1-3. (2) FK506-mercaptopropanamide propylazide: FK506-mercaptopropionic acid (Compound 1) was prepared as described.1 Compound 1

(182 mg, 0.2 mmol) in DMF (1 ml) was treated sequentially with EDCI-HCl (25 mg, 0.25 mmol),

HOAT (41 mg, 0.3 mmol), 3-Azido-1-propanamine (25 mg, 0.25 mmol), and DIPEA (70 ul, 0.4 mmol).

This was stirred for 24 h and purified by HPLC. Yield: 82 mg, 41%. TOF-HRMS (m/z) found (calcd.)

+ 1 for [C50H81N5O13S + H] : 992.5634 (992.5630). H NMR (600 MHz, Chloroform-d) δ 6.09 (s, 1H), 5.37

(s, 1H), 5.14 - 4.95 (m, 2H), 4.63 (m, 1H), 4.44 (d, J = 13.7 Hz, 1H), 3.94 (d, J = 10.3 Hz, 1H), 3.70 (d,

J = 9.6 Hz, 1H), 3.65 - 3.54 (m, 1H), 3.46 - 3.25 (m, 17H), 3.07 - 2.97 (m, 2H), 2.86 - 2.73 (m, 3H),

2.54 (dt, J = 12.5, 7.0 Hz, 3H), 2.49 - 2.22 (m, 7H), 2.22 - 1.85 (m, 11H), 1.85 - 1.74 (m, 6H), 1.74 -

1.23 (m, 20H), 1.16 - 0.81 (m, 13H). (3) Compound 1 (200 mg, 0.27 mmol), 11-Azido-3,6,9-

78 trioxaundecan-1-amine (51 ul, 0.26 mmol), EDCI-HCl (53 mg, 0.28 mmol), DIPEA (0.094 ml, 0.54 mmol), and HOAT (51 mg, 0.33 mmol) were stirred in 2 ml DCM for 24 h. The product was purified by

+ HPLC. Yield: 143 mg, 48%. TOF-HRMS (m/z) found (calcd.) for [C55H91N5O16S + H] : 1110.6260

(1110.6260). 1H NMR (600 MHz, Chloroform-d) δ 6.43 (d, J = 15.8 Hz, 1H), 5.36 (s, 1H), 5.26 - 4.91

(m, 3H), 4.61 (s, 1H), 4.44 (d, J = 13.8 Hz, 1H), 4.04 - 3.84 (m, 1H), 3.71 - 3.57 (m, 12H), 3.48 - 3.29

(m, 14H), 3.02 (d, J = 11.7 Hz, 2H), 2.81 (t, J = 7.3 Hz, 2H), 2.75 (d, J = 15.9 Hz, 1H), 2.54 (s, 2H),

2.47 (t, J = 7.4 Hz, 2H), 2.31 (s, 2H), 2.19 - 1.26 (m, n.d.), 1.11 - 0.82 (m, 13H).

Supplemental Figure 6. Synthesis of compounds 4, CEM23, CEM42. (4) N1-(3-Ethynylphenyl)-N8- hydroxyoctanediamide: Methyl 8-((3-ethynylphenyl)amino)-8-oxooctanoate2 (250 mg, 0.87 mmol) was

79 dissolved in 3 ml of MeOH and 3 ml of THF, then treated with KOH (97 mg, 1.74 mmol, dissolved in a minimal amount of water) and NH2OH (50% aqueous solution, 0.57 ml, 8.7 mmol), and stirred 24 h.

The reaction was neutralized with 1M HCl and the product was extracted with DCM. Purification by

+ HPLC gave the product. Yield: 43 mg, 17%. TOF-HRMS (m/z) found (calcd.) for [C16H20N2O3 + H] :

1 289.1655 (289.1552). H NMR (600 MHz, DMSO-d6) δ 10.34 (s, 1H), 9.98 (s, 1H), 8.67 (s, 1H), 7.79

(s, 1H), 7.55 (d, J = 8.2 Hz, 1H), 7.30 (t, J = 7.9 Hz, 1H), 7.13 (d, J = 7.6 Hz, 1H), 4.17 (s, 1H), 2.29 (t,

J = 7.5 Hz, 2H), 1.94 (t, J = 7.4 Hz, 2H), 1.57 (t, J = 7.4 Hz, 2H), 1.52 - 1.45 (m, 2H), 1.27 (s, 4H). 13C

NMR (151 MHz, DMSO-d6) δ 171.93, 169.49, 139.92, 129.56, 126.58, 122.34, 122.20, 119.97, 83.84,

80.89, 40.44, 36.78, 32.64, 28.80, 25.43, 25.33. General procedure for click reactions with

FK506-azides: The azide (1 equiv, 0.02 mmol) and the alkyne (1 equiv) were dissolved in 1 ml tBuOH. This was treated sequentially with TBTA (2 mg), copper sulfate pentahydrate (0.05 equiv of a

0.1M solution in water), and sodium ascorbate (0.2 equiv of a 0.1M solution in water). The reaction was stirred for 24 h then purified by HPLC. (CEM23) The product was prepared from 10 and 3 by the general procedure for click reactions. Yield: 13 mg, 56%. TOF-HRMS (m/z) found (calcd.) for

+ 1 [C66H101N7O16S + H] : 1280.7011 (1280.7104). H NMR (600 MHz, Methanol-d4) δ 8.36 (s, 1H), 8.03

(s, 1H), 7.58 (dd, J = 16.2, 7.9 Hz, 2H), 7.40 (t, J = 7.9 Hz, 1H), 5.29 - 5.09 (m, 2H), 4.54 (m, 2H),

4.36 (d, J = 13.5 Hz, 1H), 3.97 (m, 1H), 3.79 - 3.68 (m, 1H), 3.63 (m, 3H), 3.50 - 3.39 (m, 6H), 3.36

(m, 3H), 3.02 (m, 2H), 2.89 - 2.65 (m, 4H), 2.55 (m, 2H), 2.52 - 2.25 (m, 7H), 2.25 - 1.98 (m, 10H),

1.94 (m, 3H), 1.88 - 1.71 (m, 9H), 1.71 - 1.18 (m, 23H), 1.18 - 1.06 (m, 1H), 1.06 - 0.80 (m, 10H).

HPLC Purity: >95%, tR = 4.78 min. (CEM42) The product was prepared from 4 and 3 by the general procedure for click reactions. Yield: 13 mg, 56%. TOF-HRMS (m/z) found (calcd.) for [C71H111N7O19S

+ 1 + H] : 1398.7731 (1398.7734). H NMR (600 MHz, Methanol-d4) δ 8.37 (s, 1H), 8.06 (s, 1H), 7.57 (d,

J = 7.8 Hz, 2H), 7.41 (t, J = 7.9 Hz, 1H), 5.27 - 5.19 (m, 2H), 4.69 - 4.61 (m, 2H), 3.97 (t, J = 5.0 Hz,

2H), 3.78 - 3.71 (m, 1H), 3.69 - 3.51 (m, 9H), 3.49 - 3.39 (m, 7H), 3.39 - 3.34 (m, overlaps with solvent), 3.04 (s, 1H), 2.73 (t, J = 7.2 Hz, 2H), 2.55 - 2.25 (m, 9H), 2.23 - 1.86 (m, 10H), 1.82 - 1.03

(m, 31H), 1.02 - 0.80 (m, 10H). HPLC Purity: >95%, tR = 5.08 min.

80

Supplemental Figure 7. Synthesis of compounds 5, 6, CEM36. (5) Methyl 6-(4- ethynylbenzamido)hexanoate: 4-Ethynylbenzoic acid (500 mg, 3.42 mmol), methyl 6-aminohexanoate

HCl salt (743 mg, 4.11 mmol), EDCI-HCl (795 mg, 5.13 mmol), HOAT (697 mg, 5.13 mmol), and

DIPEA (1.49 ml, 8.55 mmol) were stirred in 12 ml of a 1:3 mixture of DMF: DCM. Aqueous workup followed by silica gel purification (20 to 100% gradient of EtOAc in hexane) gave the product. Yield:

+ 1 487 mg, 52%. TOF-HRMS (m/z) found (calcd.) for [C16H19NO3 + H] : 274.2439 (274.1443). H NMR

(600 MHz, Chloroform-d) δ 7.75 (d, J = 8.0 Hz, 2H), 7.57 (d, J = 8.0 Hz, 2H), 6.24 (s, 1H), 3.69 (s,

3H), 3.49 (q, J = 6.7 Hz, 2H), 3.21 (s, 1H), 2.36 (t, J = 7.3 Hz, 2H), 1.71 - 1.65 (m, 4H), 1.44 (t, J = 7.8

Hz, 2H). (6) N-(6-((2-Aminophenyl)amino)-6-oxohexyl)-4-ethynylbenzamide: Compound 4 (340 mg,

1.25 mmol) was dissolved in THF (20 ml). To this was added aqueous LiOH (355 mg LiOH dissolved in 20 ml H2O), and the mixture was stirred 24 h. The mixture was acidified with 2N HCl and the product was extracted with EtOAc. The EtOAc extracts were concentrated to give 6-(4- ethynylbenzamido)hexanoic acid. Yield: 93%, 301 mg. TOF-HRMS (m/z) found (calcd.) for

C15H17NO3 (M): [M+H]+, 260.1286 (260.1287). 6-(4-ethynylbenzamido)hexanoic acid (100 mg, 0.38

81 mmol), o-phenylenediamine (206 mg, 1.90 mmol), EDCI (66 mg, 0.42 mmol), and HOAT (58 mg, 0.42 mmol) were stirred in 8 ml of a 1:3 mixture of DMF:DCM for 24 h. The reaction was taken up in sat. aq. NaHCO3 and the product was extracted with EtOAc. The product was purified by silica gel chromatography (eluting first with a gradient of 50 to 100% EtOAc in hexane, followed by 0 to 20%

MeOH in DCM). Yield: 50 mg, 38% for final step. TOF-HRMS (m/z) found (calcd.) for [C21H23N3O2 +

H]+: 350.1954 (350.1869). 1H NMR (600 MHz, Chloroform-d) δ 7.74 (d, J = 8.4 , 2H), 7.47 (m, 2H),

7.21 (d, 1H), 7.04 (q, J = 8.0 Hz, 1H), 6.88 - 6.77 (m, 2H), 3.50 - 3.44 (m, 2H), 3.20 (s, 1H), 2.45 (t, J

= 7.1 Hz, 2H), 1.78 (p, J = 7.2 Hz, 2H), 1.66 (m, 2H), 1.48 (m, J = 7.8 Hz, 2H). 13C NMR (151 MHz,

DMSO-d6) δ 171.49, 165.64, 142.31, 135.15, 132.01, 127.84, 126.10, 125.71, 124.62, 123.94,

116.55, 116.27, 83.33, 83.13, 40.45, 36.12, 29.30, 26.59, 25.49. (CEM36) The product was prepared from 2 and 5 by the general procedure for click reactions, except that the product was purified by silica gel chromatography (gradient of 0 to 20% MeOH in DCM) rather than by HPLC. Exposure to acid used in HPLC buffer causes the product to decompose. Yield: 41 mg, 78%. TOF-HRMS (m/z)

+ found (calcd.) for [C76H114N8O18S + H] : 1459.8060 (1459.8050). HPLC Purity: >95%, tR = 4.88 min.

1 H NMR (600 MHz, Methanol-d4) δ 8.51 - 8.46 (m, 1H), 7.93 (q, J = 8.0 Hz, 4H), 7.11 - 7.06 (m, 1H),

7.05 - 7.00 (m, 1H), 6.85 (s, 1H), 6.73 - 6.67 (m, 1H), 5.22 (s, 2H), 4.65 (dd, J = 26.9, 5.5 Hz, 3H),

4.36 (d, J = 13.7 Hz, 1H), 4.00 - 3.94 (m, 3H), 3.68 - 3.50 (m, 13H), 3.48 - 3.28 (m, n.d.), 3.02 (d, J =

17.0 Hz, 2H), 2.72 (t, J = 7.2 Hz, 3H), 2.47 (ddd, J = 28.7, 14.1, 6.8 Hz, 7H), 2.41 - 1.89 (m, 13H),

1.82 - 1.37 (m, 31H), 0.98 - 0.85 (m, 11H).

82 Supplemental Figure 8. 1H NMR spectra of CEM23.

83

Supplemental Figure 9. 1H NMR spectra of CEM42.

84

Supplemental Figure 10. 1H NMR spectra of CEM36.

85 SUPPLEMENTAL REFERENCES:

1. Guo, Z.-F., Zhang, R. & Liang, F.-S. (2014) Facile functionalization of FK506 for biological

studies by the thiol–ene ‘click’ reaction. RSC Adv. 4, 11400.

2. Chen, Y. et al. (2008) A series of potent and selective, triazolylphenyl-based histone

deacetylases inhibitors with activity against pancreatic cancer cells and Plasmodium

falciparum. J. Med. Chem. 51, 3437–3448.

86 CHAPTER 3: REPRESSING GENE TRANSCRIPTION BY REDIRECTING CELLULAR

MACHINERY WITH CHEMICAL EPIGENETIC MODIFIERS2

Introduction:

Chromatin consists of DNA wrapped around histone octamer proteins that form the core nucleosome particle. Regulation of chromatin compaction is an essential mechanism for proper DNA repair, replication, and expression1,2,3. One way in which cells control the level of compaction is through the addition or removal of various post-translational histone tail modifications. Two such modifications include (1) lysine acetylation, which is most commonly associated with gene activation, and (2) lysine methylation, which can be associated with either gene activation or repression, depending on the amino acid context. The addition of the acetylation and methylation marks is catalyzed by histone acetyltransferases (HATs) and histone methyltransferases (HMTs), respectively, whereas the removal of the mark is done by histone deacetylases (HDACs) and histone demethylases (HDMs), respectively4,5. Although the existence of these proteins has been known for decades, many mechanisms of how chromatin-modifying machinery works to properly regulate gene expression remain to be defined. Since chromatin regulatory processes are dysregulated in many human diseases, new mechanistic insights could lead to future therapeutic applications.

The chromatin in vivo assay (CiA) is a recently described technique that uses a chemical inducer of proximity (CIP) to control chromatin- modifying machinery recruitment to a specific locus6.

This technology has been used to study an expanding list of chromatin dynamics, including histone- modifying proteins, chromatin remodelers, and transcription factors6,7,8,9. CIP-based chromatin

______2This chapter previously appeared as an article in The Journal of Visualized Experiments. The original citation is as follow: Chiarella, A. M., Wang, T. A., Butler, K. V., Jin, J., Hathaway, N. A. “Repressing Gene Transcription by Redirecting Cellular Machinery with Chemical Epigenetic Modifiers.” J. Vis. Exp.(139)

87 tethering of exogenously expressed proteins has also been extended past CiA to modulate non- modified genetic loci by use with a deactivated Clustered Regularly Interspaced Short Palindromic

Repeats (CRISPR)-associated protein (dCas9)10,11. In the CiA system, mESCs have been modified to express a GFP reporter gene at the Oct4 locus, a highly expressed and strictly regulated area of the mESC genome. Previously, this system has been applied to describe the dose-dependent and reversible recruitment of exogenously expressed heterochromatin protein 1 (HP1), an enzyme that binds H3K9me3 and propagates repressive marks (e.g., H3K9me3) and DNA methylation6. To accomplish this type of recruitment strategy, the mESCs are infected with a plasmid that expresses an

FK506-binding protein (FKBP) linked to a Gal4, which is a DNA-protein anchor that binds to a Gal4- binding array upstream of the GFP reporter. The cells are also infected with an FKBP-rapamycin- binding domain (FRB) connected to HP1. When the mESCs are exposed to low nanomolar concentrations of the CIP, rapamycin, both FRB and FKBP, are brought together at the target locus.

Within days of rapamycin treatment, expression of the target gene is repressed, as evidenced by fluorescence microscopy and flow cytometry, and histone acetylation is decreased, shown by ChIP and bisulfite sequencing. While the development and validation of this approach have been significant advances in the field of chromatin research and regulation, one drawback is the required exogenous expression of chromatin- modifying machinery.

To make the technology based on recruitment of only endogenous physiologically-relevant enzymes and make a more modular system, we designed and characterized novel bifunctional molecules, termed CEMs (Figure 1)12. One component of the CEMs includes FK506 which, similar to rapamycin, binds tightly and specifically to FKBP. Thus, the CEMs will still be recruited to the Oct4 locus in the CiA mESCs. The other component of the CEMs is a moiety that binds endogenous chromatin-modifying machinery. In a pilot study, we tested CEMs that contain HDAC inhibitors. While the concept of using an inhibitor to recruit HDACs seems counter-intuitive, the inhibitor is nonetheless able to recruit HDAC activity to the gene of interest. This is accomplished by (1) redirecting the HDAC to the locus, releasing the enzyme, and increasing the density of un-inhibited HDACs in the area and

(2) maintaining HDAC inhibition at the locus, but recruiting repressive complexes that bind to the

88 inhibited HDACs, or (3) a combination of both. In a previous study, we showed that the CEMs were able to successfully repress the GFP reporter in a dose- and time-dependent manner, as well as in a manner that was rapidly reversible (i.e., within 24 hours)12. We characterized the ability of the CEM technology presented here to control gene expression using fluorescence microscopy, flow cytometry, and the ability to control the chromatin environment using ChIP-qPCR6. Here, we describe a method for using and characterizing the CEMs, which will facilitate the adaptation of this system to answer additional questions related to chromatin biology.

Protocol:

1) Cell Line Culture for Producing Lentivirus

1. Grow fresh, low-passage (less than passage 30) 293T human embryonic kidney (HEK) cells

in high-glucose Dulbecco's modified Eagle's medium (DMEM) base media supplemented with

10% fetal bovine serum (FBS), 10 mM HEPES, 1x non-essential amino acids (NEAA), Pen/

Strep, 2-mercaptoenthanol in a 37 °C incubator with 5% CO2. Split the cells every 3 - 5 d and

before they become > 95% confluent.

2. Passage 293T HEK cells with 18 x 106 per 15-cm plate (one 15-cm plate for each virus

produced).

1. For a 15-cm format, aspirate the old media, add 10 mL of 1x phosphate-buffered

saline (PBS), aspirate the PBS, and add 3 mL of 0.05% trypsin. Incubate the cells in

trypsin for 8 min at 37 °C, rocking and tapping the cells halfway through the incubation.

NOTE: The goal of this step is to get the cells into a single-cell suspension.

3. When the cells have reached 60% - 80% confluency the following day, perform a

polyethylenimine (PEI) transfection with 13.5 µg of the psPAX2 plasmid, 4.5 µg of the pMD2.G

89 plasmid, 18 µg of lentiviral-compatible delivery vector with the gene of interest (i.e., FKBP-

Gal4), 108 µL of PEI, and 1.8 mL of transfection media.

1. Gently mix the DNA, PEI, and transfection media in a 15-mL conical tube, incubate

the sample at room temperature for 15 min, and add it dropwise to the 15-cm plate.

NOTE: Please use appropriate safety measures and follow all institutional and

laboratory safety procedures after this step, as all reagents will contain a live virus.

4. After 16 h of transfection and incubation in a 37 °C incubator with 5% CO2, aspirate the

media and gently add fresh 293 HEK media (prewarmed in a 37 °C water bath) to the 15-cm

plates. Incubate the cells in a 37 °C incubator with 5% CO2 for 48 h after the media change.

The virus is then ready to be harvested.

2) Infection of Mouse Embryonic Stem Cells (mESC) with Lentivirus

1. Grow low-passage (less than passage 35), feeder-free adapted CiA mESCs in high-glucose

DMEM base media supplemented with 20% ESC-grade FBS, 10 mM HEPES, 1x NEAA,

Pen/Strep, 2-mercaptoenthanol, and 1:500 leukemia inhibitor factor (LIF) in a 37 °C incubator

with 5% CO2.

NOTE: The LIF is obtained from the supernatant of COS LIF-producing cells, which are

collected in batch and frozen at -80 °C.

1. Grow cells on plates that have been preincubated for 1 - 3 h with 0.1% gelatin in 1x

PBS, and subsequently washed with 1x PBS. Feed the cells daily and split them every

2 - 3 d, depending on their density.

90 NOTE: Morphologically, embryonic stem cell (ES) colonies should be small, round, and

distinct from each other.

2. Passage mESC into a 12-well plate format (100,000 cells/well) 1 d prior to infection.

1. Count the cells with a hemocytometer. For cells grown in a 10-cm format, aspirate

the old media, add 5 mL of 1x PBS, aspirate the PBS, and add 1 mL of 0.25% trypsin.

Incubate the cells in trypsin for 8 min at 37 °C, rocking and tapping the cells halfway

through the incubation. The goal of this step is to get the cells into a single-cell

suspension.

3. 48 h after changing the media from the 293T15-cm viral packaging cells from step 1.4,

remove and transfer the supernatant to a 50-mL conical tube.

4. Centrifuge the supernatant at 300 x g for 5 min to pellet cell debris.

5. Filter the supernatant through a 0.45-µm membrane.

NOTE: Make sure to use surfactant-free cellulose acetate (SFCA) membranes, as some other

materials can retain virus and significantly reduce yields.

6. Concentrate the virus with ultracentrifugation, using a centrifuge with an SW32 rotor, and

spin it at 20,000 rpm (~72,000 x g) for 2.5 h at 4 °C.

7. While the virus concentrates under ultracentrifugation, treat the CiA mESCs in the 12-well

plate with fresh ES media containing 5 µg/mL of polybrene.

91 8. When the virus concentration has finished, carefully aspirate the supernatant and suspend

the virus pellet in 100 µL of 1x PBS (avoid excess bubbles).

9. Add the virus and 1x PBS to a 1.5-mL microfuge tube. Vortex/shake the tube at 300 x g (or

at the lowest setting) to fully suspend the virus. Spin down the tube in a mini-tabletop centrifuge

for 5 - 10 s to remove bubbles.

10. Add 30 µL of the virus to each well of a 12-well plate. Swirl the plate and then centrifuge

the plate at 1,000 x g for 20 min.

NOTE: The amount of virus added can be varied depending on the viral titer, which is inversely

related to delivery construct size.

1. Alternatively, flash freeze the virus with liquid nitrogen and store it at -80 °C.

However, freezing the virus will lower the viral titer.

11. Place the 12-well plate back in the 37 °C incubator and change the media the following

morning (~16 h later) with fresh ES media (as described in step 2.1).

12. After 48 h of infection, select for cells that integrated the viral plasmid by adding the

appropriate antibiotic (i.e., 1.5 µg/mL of puromycin, 10 µg/mL of blasticidin, etc.). Change the

media daily and wash the cells with 1x PBS if a majority of the cells are floating/dead. Keep the

selection media on the cells for 72 - 96 h in a 37 °C incubator with 5% CO2.

3) Chemical Epigenetic Modifiers (CEM)s Preparation and Treatment

1. Suspend the powdered CEM into dimethyl sulfoxide (DMSO) to a stock concentration of 1

mM and a working concentration of 100 µM.

92

2. After the cells have undergone the selection for 72 - 96 h, split the mESCs into a 12-well

format with 100,000 cells/well. Incubate the cells in a 37 °C incubator with 5% CO2 for 24 h.

Afterward, prepare a media solution of 100 nM CEM with ES media. Use this media to feed the

CiA mESC with 1 mL/well. To serve as a control, keep a well of cells without any added CEMs.

1. Change the media by aspirating old media and adding 1 mL/well of freshly prepared

CEM-containing or CEM-free ES media every 24 h for the duration of an experiment.

4) Analysis of the Expression by Microscopy and Flow Cytometry

1. After 48 hours of CEM treatment, image the cells without CEM treatment and the cells treated

with 100 nM CEM using a fluorescence microscope. Take representative images using phase

or brightfield to record the mESC morphology at 20X magnification; the cells should have

formed round colonies, with a few differentiated cells. Under the FITC fluorescence channel,

image both cell conditions.

2. Isolate the control and CEM-treated cells for flow cytometry by aspirating the media, washing

the cells with 1 mL of 1x PBS, aspirating the PBS, and adding 0.25 mL of 0.25% trypsin with

EDTA for 8 - 10 min.

3. Confirm that the cells are no longer in large clumps by looking in the microscope. Use a 20X

magnification. If the cells do not appear to be in a single-cell suspension, gently pipette up and

down with a P1000 pipette to fully trypsinize the cells.

4. Quench the trypsin with 1 mL of fresh media and resuspend the cells at high speed with a

serological pipette (or using a P1000 pipette and tip) until the cells are in a single-cell

suspension.

93

5. Spin down the cells at 300 x g for 5 min. Aspirate the supernatant. Wash with 1x PBS and

recentrifuge the cells. Aspirate the PBS supernatant.

6. Resuspend the cell pellet in 200 mL of FACS buffer (1 mM ethylenediaminetetraacetic acid

[EDTA], 1x PBS, and 0.2% bovine serum albumin [BSA]).

1. The total volume of FACS buffer will be dependent on how many cells are present,

but the concentration should be 1,000 - 3,000 cells/ µL. Count the number of cells in 3

samples and average the concentration of cells. Add more FACS buffer if needed.

7. Perform flow cytometry on > 50,000 cells with the suspended cells, keeping the cells on ice

when they are not being analyzed. Use a 530/40 filter (FITC, AF488, GFP, etc.) for recording

the changes in expression. Determine the appropriate fluorescent voltage using an untreated

control cell sample and set the voltage to have the fluorescent peak be in the middle of the

recorded intensity spectrum. Run the samples at a sheath speed of around 5,000 cells/s.

8. Export the data and analyze the FCS files on a flow cytometry program (i.e., FlowJo) by

gating first on forward scatter vs. side scatter for live cells and then on forward scatter height

vs. area for singlets. Then, visualize GFP channel data on a histogram to determine the relative

GFP expression in the targeted and control populations.

5) Analysis of Chromatin by Chromatin Immunoprecipitation (ChIP) Followed by qPCR

1. Split the mESCs into a 10-cm plate with 3 million cells and 10 mL of ES media (one 10-cm

plate for each experimental or control condition). The following day, add 10 mL of CEM-

containing or CEM-free media. After 48 hours of CEM treatment, aspirate the old media. Wash

and aspirate the cells with 5 mL of 1x PBS. Next, dissociate the cells with 0.25% trypsin, and

94 quench the trypsin with 10 mL of ES media. Count and isolate a sample of 10 million cells per

condition.

2. Spin down each of the 10 million cell samples at 300 x g for 5 min.

3. Resuspend each pellet in 10 mL of 1x PBS and recentrifuge the cells at 300 x g for 5 min.

4. Resuspend each pellet in 10 mL of CiA Fix buffer (50 mM pH 8, 1 mM EDTA pH 8, 0.5 mM

ethylene glycol-bis[ß-aminoethyl ether]-N,N,N',N'- tetraacetic acid [EGTA] pH 8, and 100 mM

sodium chloride [NaCl]). Add 1 mL of 11% formaldehyde solution and immediately invert the

samples 5x - 10x. Incubate the sample at room temperature for 10 min.

5. Add 0.5 mL of 2.5 M glycine. Invert the samples immediately and incubate the sample on

ice for 5 min.

6. Centrifuge the sample at 1,200 x g for 5 min at 4 °C. Aspirate the supernatant.

7. Resuspend the pellet in 10 mL of CiA Rinse 1 (50 mM HEPES pH 8, 140 mM NaCl, 1 mM

EDTA pH 8, 10% glycerol, 0.5% Nonidet P-40 [NP40], and 0.25% Triton X100). Incubate the

sample on ice for 10 min. Centrifuge it at 1,200 x g for 5 min at 4 °C.

8. Resuspend the pellet in 10 mL of CiA Rinse 2 (10 mM tris[hydroxymethyl]aminomethane

[Tris] pH 8, 1 mM EDTA pH 8, 0.5 mM EGTA pH 8, and 200 mM NaCl). Centrifuge the sample

at 1,200 x g for 5 min at 4 °C.

9. Gently add 5 mL of CiA shearing buffer (0.1% SDS, 1 mM EDTA, and 10 mM Tris pH 8)

along the sides of the conical tube to wash off any salt without disrupting the pellet. Centrifuge

the sample at 1,200 x g for 3 min at 4 °C. Repeat the wash step.

95

10. Resuspend the pellet in 90 µL of CiA shearing buffer and fresh protease inhibitors (use a

mixture of leupeptin, chymostatin, and pepstatin A dissolved together at 10mg/mL each in

DMSO to make a 1,000x stock solution, making a final concentration of 10 µg/mL). Add 10 µL

of sonication-enhancing nanodroplets while keeping everything on ice13,14.

11. Transfer 100 µL of the solution to a glass tube and cap the tube.

12. Sonicate the samples for 3.5 min (determined based on the cell line and number of cells).

To avoid overheating the samples, process the samples in 2-min-maximum cycle times if the

total sonication time exceeds 2 min.

Note: The focused ultrasonicator used here functions at a high frequency (500 kHz). Run the

machine at an acoustic power of 55 W. The sonication duration should be predetermined by

each individual lab.

13. Transfer each sonicated chromatin sample to a new 1.5-mL microcentrifuge tube. Spin the

tubes at 8,000 x g for 2 min at 4 °C.

14. To analyze the shearing efficiency and to quantify the relative amount of chromatin, follow

the ChIP kit manufacturer's protocol.

NOTE: Determine the DNA concentration spectrophotometrically (e.g., using a DNA nanodrop

instrument), and run ~200 ng of DNA on a 1% agarose gel to verify that the chromatin is

sonicated to roughly 200 - 500 basepairs (bp) in size.

15. To perform immunoprecipitation and isolate the chromatin, follow the ChIP kit

manufacturer's protocol.

96

NOTE: For the immunoprecipitation of samples with higher DNA concentrations, warm the

elution buffer at 37 °C and elute the samples 2x with 30 µL of elution buffer (spinning for 1 min

each time).

16. To perform qPCR, load the reagents into a 384-well plate. Add 5 µL of 2x asymmetrical

cyanine dye master mix, 2.5 µM of each primer, water, and 0.1 - 10 ng of DNA for each reaction.

1. Design primers to amplify 50 - 100 bp amplicons, and CT values are normalized to

a house-keeping gene, intracisternal A-type particles (IAP).

NOTE: For further qPCR methods, see the kit manufacturer's protocol. The primer set

used to target the Oct4 locus was:

(F) CACATGAAGCAGCACGACTT; (R) CCTTGAAGAAGATGGTGCGC. The control

primer set used to target IAP was: (F) ATTTCGCCTAGGACGTGTCA; (R)

ACTCCATGTGCTCTGCCTTC.

2. Based on the primers and the desired product, run the qPCR with 40 cycles of an

annealing temperature of 60 °C for 1 min.

NOTE: The resulting data was quantitated using the delta-delta Ct comparisons

between samples, with CiA:Oct4 normalized over IAP. The final analysis results are

displayed as averages of triplicate experimental samples, with the error represented

as the standard deviation from the mean.

17. Gently add 5 mL of CiA shearing buffer (0.1% SDS, 1 mM EDTA, and 10 mM Tris pH 8)

along the sides of the conical tube to wash off any salt without disrupting the pellet. Centrifuge

the sample at 1,200 x g for 3 min at 4 °C. Repeat the wash step.

97

18. Resuspend the pellet in 90 µL of CiA shearing buffer and fresh protease inhibitors (use a

mixture of leupeptin, chymostatin, and pepstatin A dissolved together at 10mg/mL each in

DMSO to make a 1,000x stock solution, making a final concentration of 10 µg/mL). Add 10 µL

of sonication-enhancing nanodroplets while keeping everything on ice13,14.

19. Transfer 100 µL of the solution to a glass tube and cap the tube.

20. Sonicate the samples for 3.5 min (determined based on the cell line and number of cells).

To avoid overheating the samples, process the samples in 2-min-maximum cycle times if the

total sonication time exceeds 2 min.

Note: The focused ultrasonicator used here functions at a high frequency (500 kHz). Run the

machine at an acoustic power of 55 W. The sonication duration should be predetermined by

each individual lab.

21. Transfer each sonicated chromatin sample to a new 1.5-mL microcentrifuge tube. Spin the

tubes at 8,000 x g for 2 min at 4 °C.

22. To analyze the shearing efficiency and to quantify the relative amount of chromatin, follow

the ChIP kit manufacturer's protocol.

NOTE: Determine the DNA concentration spectrophotometrically (e.g., using a DNA nanodrop

instrument), and run ~200 ng of DNA on a 1% agarose gel to verify that the chromatin is

sonicated to roughly 200 - 500 basepairs (bp) in size.

23. To perform immunoprecipitation and isolate the chromatin, follow the ChIP kit

manufacturer's protocol.

98

NOTE: For the immunoprecipitation of samples with higher DNA concentrations, warm the

elution buffer at 37 °C and elute the samples 2x with 30 µL of elution buffer (spinning for 1 min

each time).

24. To perform qPCR, load the reagents into a 384-well plate. Add 5 µL of 2x asymmetrical

cyanine dye master mix, 2.5 µM of each primer, water, and 0.1 - 10 ng of DNA for each reaction.

1. Design primers to amplify 50 - 100 bp amplicons, and CT values are normalized to

a house-keeping gene, intracisternal A-type particles (IAP).

NOTE: For further qPCR methods, see the kit manufacturer's protocol. The primer set

used to target the Oct4 locus was:

(F) CACATGAAGCAGCACGACTT; (R) CCTTGAAGAAGATGGTGCGC. The control

primer set used to target IAP was: (F) ATTTCGCCTAGGACGTGTCA; (R)

ACTCCATGTGCTCTGCCTTC.

2. Based on the primers and the desired product, run the qPCR with 40 cycles of an

annealing temperature of 60 °C for 1 min.

NOTE: The resulting data was quantitated using the delta-delta Ct comparisons

between samples, with CiA:Oct4 normalized over IAP. The final analysis results are

displayed as averages of triplicate experimental samples, with the error represented

as the standard deviation from the mean.

Representative Results:

We recently developed CEMs and demonstrated that this technology can be applied to regulate gene expression and the chromatin environment at a reporter locus in a dose-dependent and reversible

99 manner. In Figure 1, a model of the lead CEM, CEM23, is shown. HDAC machinery is recruited to the reporter locus by the HDAC inhibitor which, in this case, is the GFP reporter inserted at the Oct4 locus.

We sought to characterize the CEM system at the Oct4 locus in CiA mESCs. Because the cells express

GFP, it was possible to quickly visualize the expression of the reporter with fluorescence microscopy.

The cells were imaged after 48 hours of treatment with 100 nM of CEM23, along with untreated cells.

The phase images show healthy mESCs, which is important because unhealthy or differentiated mESCs would indicate a non-specific GFP repression. The fluorescence images show a bright GFP expression in the control cells and a reduced GFP expression in mESCs treated with CEM23.

Representative images are displayed in Figure 2.

To quantify the changes in the GFP expression, flow cytometry was used. Again, the CiA mESCs were treated with 100 nM of CEM23 for 48 hours. Experimental cells treated with CEM23 and control cells were prepared for flow cytometry, using > 100,000 cells per sample. Consistently, we observed a >

30% decrease in GFP-expressing cells among those treated with CEMs. A representative histogram is shown in Figure 3.

Once evidence of CEM-induced GFP gene expression decrease was shown, we tested for changes in the chromatin environment using ChIP. One important factor for preparing samples for ChIP is the extent of chromatin sonication. To have the samples as consistent and properly sheared as possible,

10 million cells were sonicated for 3.5 minutes to obtain chromatin between 200 and 500 bp in size

(Figure 4). Because we hypothesized that the repressive CEMs were binding to and recruiting HDAC proteins and repressive complexes to the reporter, we tested for changes in histone tail acetylation.

Histone 3 Lysine 27 acetylation (H3K27ac) is commonly found along the transcriptional start site (TSS) of genes. We performed ChIP with an antibody for H3K27ac and then performed a qPCR with primer sets upstream and downstream of the TSS. The results show that a 48-hour treatment with 100 nM of

CEM23 decreased the level of H3K27ac at the target locus when compared to control cells not treated with CEMs (p < 0.01, two-tailed Student's t-test with three biological replicates, Figure 5).

100

Discussion:

Here, we described the recently developed CEM system being applied to regulate gene expression and chromatin environment at a specific gene in a dose-dependent manner. We provide an accurate method to study the dynamics involved in regulating gene expression through the selective recruitment of specific endogenous chromatin regulatory proteins. This is a highly modular technology that can be applied to investigate how different protein- and chromatin-modifying complexes work in concert to properly regulate the chromatin environment, as well as to study how these processes are dysregulated, with the benefit of achieving gene specificity.

Here, three ways were demonstrated to visualize changes in the chromatin structure and gene expression with new technology. After perturbation of specific targeted loci with CEMs, real-time changes to gene expression could be analyzed by fluorescence microscopy. Then, changes in gene expression levels could be measured more quantitatively with flow cytometry at defined endpoints.

Finally, changes to posttranslational epigenetic marks on chromatin were examined by ChIP; specifically, in this case, H3K27ac. We did not test HDAC recruitment with ChIP-qPCR because of the diversity of HDACs that act in concert at a single locus. It is likely that a heterogenous population of

HDAC enzymes was recruited and it would, therefore, be technically difficult to test them all by ChIP.

In a previously published work, we have demonstrated that the direct recruitment of HDAC3 by a GAL4 fusion caused similar activity on gene expression and chromatin modification by ChIP12. If a non- fluorescent reporter is being modulated, changes in gene expression can also be measured by (1) an

RNA expression analysis with reverse transcriptase qPCR or (2) protein expression with western blotting, or imaging and conducting flow cytometry with fluorescent secondary antibodies.

In relation to current CIP-based techniques like the CiA system12, this CEM technology has the advantage of recruiting endogenous epigenetic modulators to the target gene. Redirecting the cell's own machinery creates a more physiologically relevant means of modulating expression. Exogenously

101 expressing master enzymes, like HATs and transcription factors, might result in side effects from their increased expression, especially while not being actively recruited to the gene locus.

While this article shows the functionality of this novel system with CEMs composed of HDAC inhibitors, several other inhibitors or protein recruiters can be synthesized in place of the HDAC inhibitor. To achieve gene activation, HAT inhibitor- or HDMT inhibitor-based CEMs can be synthesized. To repress and then overexpress the same gene, it is possible to treat the cells with a repressive CEM, wash them with regular media, and then add an activating CEM. This would allow for a clean system to study the dynamics of recruiting repressive and activating complexes to the same genomic region. When designing future CEMs, several characteristics need to be considered, including cell permeability, inhibitor potency, structure, and inhibitory kinetics. The linker length between FK506 and the recruiter moiety will influence the permeability of the CEMs, as well as the position of the recruited protein. When comparing two CEMs that differed only in linker length, the CEM with the shorter linker was more effective12. While it has not been tested directly here, the higher effectiveness is a result of greater cell permeability. Choosing inhibitors with chemical structures amenable to modification without disrupting the portion that binds the active site of the target is also important.

The potency and specificity were also considered by selecting inhibitors with a high potency and specificity for a given class of proteins. The kinetics of inhibition may influence the effectiveness of the

CEMs. CEMs comprised of inhibitors with slow on-off rates tend to be less effective.

One constraint of the current CEM technology is the requirement of a Gal4-binding array at the target gene locus. To recruit CEMs to a region other than the Oct4 CiA locus, any of the hundreds of engineered Gal4 upstream activation sequence (UAS) lines available to the scientific community can be used. One way the system will be made more modular is by incorporating a deactivated Cas9

(dCas9) with chemical epigenetic enzyme recruitment10,11,15. This nuclease-dead protein from the

CRISPR-Cas9 system will still be recruited to any region of the genome to which a guide RNA (gRNA) is designed, but it will not cut the DNA. Using a dCas9-FKBP fusion, the FKBP component will recruit

102 the CEMs where the dCas9 is recruited, greatly increasing the ease and flexibility of potential target genes. Imagine using this technology to target disease-relevant genes as a potential therapeutic or a means by which to reversibly and temporally study disease mechanisms at the chromatin level.

103

REFERENCES:

1. Alabert, C., Groth, A. Chromatin replication and epigenome maintenance. Nature Reviews Molecular Cell Biology. 13 (3), 153-167 (2012).

2. Venkatesh, S., Workman, J. L. Histone exchange, chromatin structure and the regulation of transcription. Nature Reviews Molecular Cell Biology. 16 (3), 178-189 (2015).

3. Bannister, A. J., Kouzarides, T. Regulation of chromatin by histone modifications. Cell Research. 21 (3), 381-395 (2011).

4. Gräff, J., Tsai, L.-H. Histone acetylation: molecular mnemonics on the chromatin. Nature Reviews Neuroscience. 14 (2), 97-111 (2013).

5. Verdin, E., Ott, M. 50 years of protein acetylation: from gene regulation to epigenetics, metabolism and beyond. Nature Reviews Molecular Cell Biology. 16 (4), 258-264 (2015).

6. Hathaway, N. A. et al. Dynamics and memory of heterochromatin in living cells. Cell. 149 (7), 1447- 1460 (2012).

7. Ren, J., Hathaway, N. A., Crabtree, G. R., Muegge, K. Tethering of Lsh at the Oct4 locus promotes gene repression associated with epigenetic changes. Epigenetics. 13 (2), 173-181 (2018).

8. Stanton, B. Z., Hodges, C. et al. Smarca4 ATPase mutations disrupt direct eviction of PRC1 from chromatin. Nature Genetics. 49 (2), 282-288 (2017).

9. Koh, A. S., Miller, E. L. et al. Rapid chromatin repression by Aire provides precise control of immune tolerance. Nature Immunology. 19 (2), 162-172 (2018).

10. Chen, T., Gao, D. et al. Chemically Controlled Epigenome Editing through an Inducible dCas9 System. Journal of the American Chemical Society. 139 (33), 11337-11340 (2017).

11. Braun, S. M. G. et al. Rapid and reversible epigenome editing by endogenous chromatin regulators. Nature Communications. 8 (1), 560 (2017).

12. Butler, K. V., Chiarella, A. M., Jin, J., Hathaway, N. A. Targeted Gene Repression Using Novel Bifunctional Molecules to Harness Endogenous Histone Deacetylation Activity. ACS Synthetic Biology. 7 (1) (2018).

13. Kasoji, S. K. et al. Cavitation Enhancing Nanodroplets Mediate Efficient DNA Fragmentation in a Bench Top Ultrasonic Water Bath. PLoS ONE. 10 (7), e0133014 (2015).

14. Chiarella, A. M. et al. Cavitation enhancement increases the efficiency and consistency of chromatin fragmentation from fixed cells for downstream quantitative applications. Biochemistry. 57 (19), 2756- 2761 (2018).

15. Gao, Y. et al. Complex transcriptional modulation with orthogonal and inducible dCas9 regulators. Nature Methods. 13 (12), 1043-1049 (2016).

104

Figures:

Figure 1: CEMs bind to the CiA:Oct4 locus through recruitment to the FKBP and tether endogenous epigenetic machinery. The model of the CEM system shows Gal4-FKBP as the protein- to-DNA anchor. The FK506 portion of the CEM binds to FKBP and the HDAC inhibitor recruits endogenous HDAC proteins to repress GFP.

105

Figure 2: Fluorescence images of mouse embryonic stem cells (mESCs) with a CiA:Oct4-GFP reporter. Fluorescence microscopy images show a decrease in GFP expression upon CEM treatment.

The top panel shows phase and fluorescent images of mESCs grown with untreated media. The bottom panel shows phase and fluorescent of the mESCs treated with 100 nM of CEM23 for 48 hours. The images adapted with permission from ACS Synthetic Biology12. Copyright (2018) American Chemical

Society

106

Figure 3: Flow cytometry analysis quantitates a decreased expression of GFP in mESCs upon

CEM23 treatment. mESCs without CEM23 (blue) were compared with mESCs treated with 100 nM of

CEM23 for 48 hours (red).

107

Figure 4: Sonication of chromatin is uniform, and a smear is visible between 200 and 500 bp. To perform ChIP-qPCR, the chromatin from mESCs was sonicated. Each sample consisted of approximately 10 million cells, which were sonicated for 3.5 minutes, to produce similarly- sheared chromatin samples between 200 and 500 bp in length.

108

Figure 5: CEM23 treatment causes a decrease in H3K27ac at the CiA locus. ChIP-qPCR was performed to test for changes in the chromatin environment. After a 48-hour treatment with 100 nM of

CEM23, a decrease in H3K27ac was observed at CiA:Oct4 (**p < 0.01, two-tailed Student's t-test with three biological replicates. The error bars represent the standard deviation).

109 CHAPTER 4: GENE SPECIFIC ACTIVATION DRIVEN BY SMALL MOLECULES

Introduction:

The eukaryotic genome is organized and packaged into chromatin with varying degrees of compaction, which contributes to the regulation of gene expression. A complex network of protein- protein and DNA-protein interactions ensures the proper levels of gene expression. Disruptions to this regulatory network drives many human diseases including cancer1,2. An important contributing factor that sculpts the chromatin landscape is post-translational histone tail modifications. Lysine acetylation is one such modification that has both biophysical and indirect protein-recruitment effects. Protein families of writers (histone acetyl transferases, HATs), erasers (histone deacetylases, HDACs), and reader proteins control the deposition, removal, and down-stream gene expression changes associated with this mark3,4.

Several labs have previously demonstrated the power of recruiting exogenous chromatin modifying machinery as a way to control expression levels in a gene-specific manner5–11. With major advances in the CRISPR associated protein 9 (Cas9) and deactivated Cas9 (dCas9) technology, the ability to precisely induce changes in expression has rapidly evolved. We were inspired by these studies and sought to incorporate recent advances in the fields of chemical biology to redirect endogenous chromatin modifying machinery in a dose-dependent and chemically-dependent manner.

We previously demonstrated the power of Chemical Epigenetic Modifiers (CEM) and their ability to control the chromatin landscape and repress genes in a specific and reversible manner12. In this study, we report for the first time CEM activating (CEMa) molecules which recruit activating chromatin modifying machinery. Our CEMa family includes CEM87, CEM88, and CEM114 that bind to activating chromatin modifying enzymes with various binding efficiencies. The warhead portion of the

CEM used to recruit chromatin modifying machinery was previously published inhibitors of HATs or lysine reader proteins (Figure 1A.) CEM87 was created with iBet762, shown to inhibit BRD2, BRD3,

110 and BRD413 (Figure 1B, Supplementary Figure 1A, B). CEM88 was created with a 1,3-dimethyl benzimidazolone, previously shown to inhibit the BRPF1 bromodomain (Figure 1C, Supplementary

Figure 1A, C)14. Lastly, CEM114 was created with “compound 33”, previously shown to inhibit CBP

(Figure 1D, Supplementary Figure 1A, D)15. We retained the FKBP-interacting moiety (FK506) and the peg-linker in the bifunctional molecule, as they have functioned successfully in other previously synthesized CEMs to interact with the FK506 Binding Protein (FKBP). We also show for the first time the coupling of this new synthetic technique to be compatible with dCas9-FKBP-based systems, allowing us to direct the CEMs to virtually any gene of interest. By recruiting endogenous cellular machinery associated with euchromatin and increased gene expression, we hypothesized that the

CEMs would increase transcriptional activity in a dose- and chemically-dependent manner.

Results:

To test for changes in gene expression, we transfected Human Embryonic Kidney (HEK) 293T cells with a BFP reporter gene downstream of a tre3g promoter and performed flow cytometry on at least 100,000 cells after co-expressing the gRNA and dCas9 machinery9. As a benchmark control for gene activation, we used a dCas9-p300 fusion protein known to increase expression11. Using a gRNA that targets the tre3g promoter at six interspaced sites, the dCas9-p300 increased BFP expression 14- fold compared to the non-targeting (NT) control gRNA (Figure 2A, p< 0.05). We then tested activating the reporter gene with a plasmid expressing dCas9 (N-terminus) –FKBPx2 (C-terminus) and tested three of our predicted activating CEMs. After 48 hours of exposure to 200nM of CEM87, CEM88 and

CEM114, BFP expression significantly increased 7.5, 1.8, 1.6- fold, respectively, compared to untreated cells (Figure 2B, p< 0.005, 0.000001, 0.05, respectively).

Optimization of the dCas9-CEMa system was done by testing gRNAs targeted to areas along the span of the promoter and increasing the density of recruitment, as this has been previously shown to have a great impact on dCas9 efficiency16–18. Cells were transfected with plasmids expressing the

BFP reporter, one or multiple gRNA, and dCas9, and were either treated with 200nM of CEM87 for 48 hours or given regular media. Compared to the untreated control cells, the CEM-treated cells expressing the 6x-tre3g gRNA showed a 5-fold increase in BFP expression (Supplemental Figure S2,

111 p< 0.0005). The cells expressing a gRNA designed at a single site, further into the gene body showed a 2-fold increase (Supplemental Figure S2, p< 0.00000005). We also tested a multiplexing gRNA plasmid, whereby the cells would express the 6x-tre3g gRNA and the single site gRNA19. This multiplex gRNA led to an activation of 4-fold (Supplemental Figure S2, p< 0.0005). It is possible that the less- effective single-site gRNA led the chromatin modifying machinery away from the promoter and drove down the overall effectiveness of the system. For future optimization experiments, we exclusively used the 6x-tre3g gRNA plasmid. To serve a positive control, we also transfected cells with dCas9-HA and the targeting 6x-tre3g gRNA. As expected, the addition of CEM87 at 200 nM did not significantly affect the BFP expression (Supplemental Figure S3)

Next, we sought to increase the number of FKBP molecules being recruited to the gene of interest with the idea that increased CEM-recruitment would yield a more effective and highly dose- able system. We adapted the dCas9 “SunTag” system to be compatible with the CEMa technology20,21.

A dCas9-SunTagx10 plasmid was used that expresses an array of 10 yeast-specific GCN4 (gene control protein 4) peptides from the C-terminus of the dCas9. We co-transfected with a single chain variable fragment (scFv), made to be GCN4-specific) fused to FKBP. Theoretically, as many as 10

FKBP molecules per dCas9 would be recruited to our gene of interest, increasing the CEMa recruitment power. With the adapted SunTag method and the tre3g x6 gRNA, 48 hours of 200 nM CEM87 exposure increased expression 16-fold compared to untreated cells (Supplemental Figure S4, p< 0.001).

Equimolar CEM88 and CEM114 increased expression 2-fold (Supplemental Figure S4, p< 0.005 and p< 0.05, respectively).

Another protein-engineering approach that we used to optimize the dCas9-CEMa system was incorporating ms2-compatible gRNAs. The ms2-gRNAs have a modified stem-loop, capable of recruiting both a dCas9-fusion as well as a bacteriophage MS2 coat protein (MCP)-fusion7,22. By using an MCP-FKBP fusion, we increase the number of FKBPs by 2-fold (as two MCP fusions can bind one stem loop) and physically increase the proximity of recruited CEMa to the chromatin. Compared to the untreated cells, CEM87 increased BFP expression 18-fold (p< 0.0000005). With our most efficient system (comprised of the dCas9-FKBPx2; ms2-FKBP; and the tre3g x6 gRNA), we re-tested the other

CEMa compounds to determine whether their activating efficiency also improved. Equivalent

112 concentrations of CEM88 increased expression 5-fold (p< 0.0005), and CEM114 increased expression

6-fold (p< 0.001) compared to cells expressing the NT gRNA (Figure 2C).

With the optimized dCas9 system and the most efficient CEMa, CEM87, we performed a dose- curve and determined the ideal concentration for maximum expression to be 200nM (Figure 2D). Since activity is governed by amount of small molecule ligand added, with this technique we can intricately control gene expression obtaining 2-fold activation at 10 nM, p< 0.05 and 8-fold activation at 50 nM, p<

0.001. At concentrations of 500 nM and above, the cells appeared less healthy as assessed by cell density and morphology (data not shown). Thus, dCas9-CEMa platform does not only have control of gene activity but by varying compound dose between 0 and 200nM the level of gene expression can be differentially modulated. This could be very useful for target validation studies, which is an area of the drug development pipeline in desperate need of new tools if clinical success rates are to be improved23,24.

We also asked what would happen with a dual compound exposure of (1) the gene-specific system of activation with CEM87 and (2) a globally effective HDACi. Upon treatment 2.5 µM of entinostat, a previously characterized HDAC inhibitor, and 200 nM of CEM87 with the dCas9-FKBPx2, ms2-FKBP recruitment system, we observed 34-fold gene activation compared to untreated cells

(Supplemental Figure S5, p< 0.0005)25. While the purpose of the CEMs system is to target transcriptional activity at a gene-specific level, these data to suggest a synergistic effect with complimentary chromatin-regulatory pathways.

We next sought to adapt our plasmid-optimized dCas9-CEMa system to target endogenous mammalian genes. Because we were targeting an endogenous gene, rather than hundreds of transiently expressing reporter plasmids, we chose to infect the dCas9 machinery (dCas9-p300, dCas9-

FKBPx2 and/or MCP-FKBP) into HEK 293T cells. After the cells have been stably selected for integration of the dCas9- and MCP- expressing plasmids, we transfect the gRNAs for the desired gene target. To determine the time point at which we observe more efficient dCas9 regulation by this method, we transfected cells expressing dCas9-p300 with gRNAs for myogenic differentiation (MYOD1), extracted the RNA and performed qRT-PCR after 24, 48 and 72 hours post transfection. MYOD1 was an ideal initial target for this question because it has been previously shown to be capable of modulation

113 by transiently expressed dCas9-p30011. With dCas9-p300 infected and the gRNAs transfected, we found that dCas9-mediated gene regulation was most efficient after 48 hours (Supplemental Figure

S6). To test the ability of CEM87 to activate endogenous genes, we transfected cells expressing dCas9-

FKBPx2 and ms2-FKBP with gRNA plasmids targeting MYOD1 (or NT gRNA for the control). After 48 hours of CEM87 exposure, mRNA levels increased 13-fold compared to untreated cells (Figure 3A, p<

0.01), whereas the cells expressing the NT gRNAs showed no significant changes in expression upon

CEM treatment (Figure 3A, p> 0.05).

To determine the versatility of the dCas9-CEMa system, we tested a second endogenous gene,

C-X-C chemokine receptor type 4 (CXCR4), that has also been previously regulated by other labs using a dCas9 technology5,7,26,27. After 48 hours of CEM87 exposure, expression increased 10-fold compared to untreated cells (Figure 3B, p< 0.005), where-as the cells expressing the NT gRNA had a less than

2-fold change in CXCR4 expression (Figure 3B, p< 0.05). Lastly, we targeted endogenous Achaete-

Scute family BHLH Transcription factor 1 (ASCL1) with previously tested and optimized gRNAs7,21,22.

Upon 48 hours of CEM87-treatment, mRNA levels of ASCL1 increased 22-fold compared to untreated cells (Figure 3C, p< 0.00005), compared to a less than 4-fold increase observed in the NT samples

(Figure 3C, p< 0.05).

Conclusions:

In summary, we have designed, synthesized, and optimized the use of novel bifunctional CEMs capable of robustly activating endogenous genes in a dose-dependent, gene-specific manner. By adapting the CEMa technology to dCas9 targeting constructs, we can theoretically target any gene in the genome by strategic gRNA designing. We have demonstrated the ability to control the chromatin landscape and induce changes in the expression of endogenous mammalian genes in a direct, biologically-relevant manner. This dCas9-CEMa technology paves the way for us and others to target disease relevant genes to ask specific, targeted questions about disease mechanisms of action. Since our gene activation platform is chemically activated in a dose dependent manner, it will be useful in target validation work for visualization of trends between phenotype and gene dosage over a wide

114 range of target gene concentrations. Additionally, we hope to develop the dCas9-CEMa platform as a novel therapeutic approach to changing gene expression levels driving human disease.

Methods:

Chemical Synthesis: See supplemental Figure 1A-D. CEMa compounds were diluted in DMSO

(Sigma D2650) and kept dry at −20 °C.

Plasmid design: To construct dCas9-FKBPx1 and -FKBPx2, we digested Addgene #61425 at the

BamHI and BsrG1 sites. To insert the FKBPx1 (from Addgene #44245) and FKBPx2 (from N160) fusion, we used In-Fusion (#639650) with the following primers: FKBPx1- Forward (5’

GGCGGCCGCTGGATCCGGCGTGCAGGTGGAGACTAT) Reverse (5’

CTCCACTGCCTGTACATTCCAGTTTTAGAAGCTCCACATC) ; FKBPx2- Forward (5’

CTCCACTGCCTGTACATTCCAGTTTTAGAAGCTCCACATC) Reverse (5’

GGCGGCCGCTGGATCCGGGGTCCAAGTTGAAACCATTA). To create a dCas9-SunTagx10 construct with a BFP fusion, we used Addgene #60903, digested with Not1 and Sbf1 restriction enzymes, and re-inserted the WPRE and NLS domains from the original plasmid using In-Fusion and primers: Forward (5’ GGTCCGATGGATCTACAGCGGCCGGGTGGAGGTCCAAAAAAGAAAAGG)

Reverse (5’ CAGTGATCGATCCCTGCAGGGCGGGGAGGCGGCCCAA). To create a multiplex parent plasmid capable of expression the tre3g x6 gRNA and the BFP x1 gRNA from a single plasmid, we used Addgene #52963 and digested with Not1 and PspX1 restriction enzymes. We CIP treated the backbone (NEB, M0290S) and PNK annealed (NEB, M0201S) the following oligos (oligo

1: 5’ GGCCACTAGTCTCTGGAGACGAAACGTCTCTCTAGCCC ; oligo 2: 5’

TCGAGGGCTAGAGAGACGTTTCGTCTCCAGAGACTAGT). After cleaning the reactions, we ligated

(NEB, M020S) the annealed insert and the CIP-treated backbone. For the multiplexing system, the insert plasmids were created and digested as previously described19 and the multiplex parent 2_3 plasmid was digested with BsmB1, CIP treated, then ligated together.

115 Cell culture: Low passage (p30-40) HEK 293T cells were cultured with high-glucose DMEM (Corning,

10-013-CV) supplemented with 10% FBS serum (Atlantic Biologicals, S11550), 10 mM HEPES

(Corning, 25-060-Cl), NEAA (Gibco, 11140-050), Pen/Strep, and 55 μM 2-Mercaptoethanol. Cells were passaged every 2 - 5 days and maintained at 10-90% confluency in an incubator at 37 °C and 5% CO2.

Cell Transfection: For Figure 2 and supplemental data, HEK 293T cells were split into 12-well plates with 100,000 cells per well. The next day, PEI (polyethlenimine, Polysciences #23966-1) transfection was done with 1 µg of DNA, 3 µL of PEI, and 100 µL of Optimem (Gibco, 31985070). For Figures 2A,

2B, S2, and S3 we transfected with 0.1 µg BFP reporter; 0.7 µg dCas9-X; 0.2 µg gRNA. For Figures

1C, 1D, S4, and S5 we transfected with 0.1 µg BFP reporter; 0.5 µg dCas9-X; 0.2 µg scFv-FKBP or ms2-FKBP; 0.2 µg gRNA. Media was change 16 hours after transfection with fresh media, with or without CEMa addition. After 48 hours of CEMa exposure, cells were isolated for flow cytometry analysis. For Figure 3 and supplemental data, HEK 293T cells were split into 6-well plates with 200,000 cells per well. The next day, PIE transfection was done with 3 µg of DNA, 9 µL of PEI, and 200 µL of

Optimem. For Figures 3A-C and S6, 3 µg of gRNA was added in equal ratio (if applicable). Media was change 16 hours after transfection with fresh media, with or without CEMa addition. After 48 hours of

CEMa exposure, cells were isolate for RNA extraction.

Flow cytometry: Flow cytometry was performed with the Attune Nxt as previously described28. The data presented represents three separate transfections and significance was determined with the two- tailed student’s t-test. Confidence interval is 95%. Variance between two conditions being compared was calculated, and equal or unequal variance was considered. The degrees of freedom were calculated to be sample size (N) - 1. Error bars represent standard deviation. A representative gating strategy is shown in Supplemental Figure S7.

Lentiviral infection: Lentivirus production for HEK 293T infection was done using LentiX 293T cells

(Clontech). Low passage cells (8-20) were plated onto 15 cm cells such that they were 70% confluent

24 hours later. Each plate was transfected with 18 μg of the plasmid of interest (dCas9-X), 13.5 μg of

116 the Gag-Pol expressing plasmid (Addgene #12260), and 4.5 μg of the VSV-G envelope expressing plasmid (Addgene #12259). PEI transfection was done and 60 hours after transfection, the virus was spun down at 20, 000 rpm for 2 1/2 hours at 4°C and then added to the HEK 293Ts in combination with 10 μg/mL Polybrene (Santa Cruz, sc-134220). The selection of lentiviral constructs was done with either hygromycin (200 μg/mL) or blasticidin (12 μg/mL).

RNA extraction: For Figures 3 and Supplemental Figure S6, cells from the 6-well plates were isolated.

Cells were washed with 1X PBS, disassociated with 0.05% trypsin, quenched with media, centrifuged and washed with 1X PBS. RNA extraction was performed with a RNeasy Plus Mini Kit (Qiagen, 74134)

TM and the relative enrichment of mRNA was quantified with the RNA-to-CT 1-step kit (Thermo Fisher

Scientific, 4389986). Three biological replicates were performed for RNA extraction. The qRT-PCR results represent two technical replicates for each of the three biological replicates. Quartile analysis was done to exclude outliers and significance was determined with the two-tailed student’s t-test.

Confidence interval is 95%. Variance between two conditions being compared was calculated, and equal or unequal variance was considered. The degrees of freedom were calculated to be sample size

(N) - 1. Error bars represent standard deviation.

Table of primers for qRT-PCR and source:

Target Forward (5’-3’) Reverse (5’-3’) Source

Myod1 CTCTCTGCTCCTTTGCCACA GTGCTCTTCGGGTTTCAGGA doi:

10.1038/nmeth.26

00

Cxcr4 ACTACACCGAGGAAATGGGC CCCACAATGCCAGTTAAGAA doi:

T GA 10.1038/s41467-

017-00644-y

Ascl1 CGCGGCCAACAAGAAGATG CGACGAGTAGGATGAGACCG doi:

10.1038/s41467-

017-00644-y

117 Gapdh CAATGACCCCTTCATTGACC TTGATTTTGGAGGGATCTCG doi:

10.1038/nmeth.26

00

Table of gRNAs and source:

Target: Protospacer Sequence (5’-3’) Source

Tre3g x6 TACGTTCTCTATCACTGATA doi: 10.1038/nmeth.4042

BFP x1 TACAAACTTGGGTCGAATT This study

Myod1 gRNA #1 CCTGGGCTCCGGGGCGTTT doi: 10.1038/nmeth.2600

Myod1 gRNA #2 GGCCCCTGCGGCCACCCCG doi: 10.1038/nmeth.2600

Myod1 gRNA #3 CTCCCTCCCTGCCCGGT AG doi: 10.1038/nmeth.2600

Myod1 gRNA #4 AGGTTTGGAAAGGGCGTGC doi: 10.1038/nmeth.2600

CXCR4 gRNA #1 GCAGACGCGAGGAAGGAGGGCGC doi: 10.1038/s41467-017-00644-y

CXCR4 gRNA #2 CCGACCACCCGCAAACAGCA doi: 10.1038/s41467-017-00644-y

CXCR4 gRNA #3 GCCTCTGGGAGGTCCTGTCCGGCTC doi: 10.1038/s41467-017-00644-y

ASCL1 gRNA #1 CGGGAGAAAGGAACGGGAGG doi: 10.1038/s41467-017-00644-y

ASCL1 gRNA #2 TCCAATTTCTAGGGTCACCG doi: 10.1038/s41467-017-00644-y

ASCL1 gRNA #3 AAGAACTTGAAGCAAAGCGC doi: 10.1038/s41467-017-00644-y

118 REFERENCES:

1. Dawson, M. A. The cancer epigenome: Concepts, challenges, and therapeutic opportunities. Science (80-. ). 355, 1147–1152 (2017). 2. MacDonald, I. A. & Hathaway, N. A. Epigenetic roots of immunologic disease and new methods for examining chromatin regulatory pathways. Immunol. Cell Biol. 93, 261–70 (2015) 3. Zeng, L. & Zhou, M. M. Bromodomain: an acetyl-lysine binding domain. FEBS Lett. 513, 124– 8 (2002). 4. de Ruijter, A. J. M. et al. Histone deacetylases (HDACs): characterization of the classical HDAC family. Biochem. J. 370, 737–49 (2003). 5. Gao, Y. et al. Complex transcriptional modulation with orthogonal and inducible dCas9 regulators. Nat. Methods 13, 1043–1049 (2016). 6. Chen, T. et al. Chemically Controlled Epigenome Editing through an Inducible dCas9 System. J. Am. Chem. Soc. 139, 11337–11340 (2017). 7. Braun, S. M. G. et al. Rapid and reversible epigenome editing by endogenous chromatin regulators. Nat. Commun. 8, (2017). 8. Gao, D. & Liang, F.-S. Chemical Inducible dCas9-Guided Editing of H3K27 Acetylation in Mammalian Cells. in 429–445 (2018). doi:10.1007/978-1-4939-7774-1_24 9. Ma, D. et al. Integration and exchange of split dCas9 domains for transcriptional controls in mammalian cells. Nat. Commun. 7, 13056 (2016). 10. Shrimp, J. H. et al. Chemical Control of a CRISPR-Cas9 Acetyltransferase. ACS Chem. Biol. 13, 455–460 (2018). 11. Hilton, I. B. et al. Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat. Biotechnol. 33, 510–517 (2015). 12. Butler, K. V. et al. Targeted Gene Repression Using Novel Bifunctional Molecules to Harness Endogenous Histone Deacetylation Activity. ACS Synth. Biol. 7, 38–45 (2018). 13. Chung, C. et al. Discovery and Characterization of Small Molecule Inhibitors of the BET Family Bromodomains. J. Med. Chem. 54, 3827–3838 (2011). 14. Demont, E. H. et al. 1,3-Dimethyl Benzimidazolones Are Potent, Selective Inhibitors of the BRPF1 Bromodomain. ACS Med. Chem. Lett. 5, 1190–1195 (2014). 15. Hay, D. A. et al. Discovery and Optimization of Small-Molecule Ligands for the CBP/ p300 Bromodomains. doi:10.1021/ja412434f 16. Radzisheuskaya, A. et al. Optimizing sgRNA position markedly improves the efficiency of CRISPR/dCas9-mediated transcriptional repression. Nucleic Acids Res. 44, e141 (2016). 17. Lawhorn, I. E. B. et al. Evaluation of sgRNA Target Sites for CRISPR-Mediated Repression of TP53. PLoS One 9, e113232 (2014). 18. Braun, C. J. et al. Versatile in vivo regulation of tumor phenotypes by dCas9-mediated transcriptional perturbation. Proc. Natl. Acad. Sci. U. S. A. 113, E3892-900 (2016). 19. Sakuma, T. et al. Multiplex genome engineering in human cells using all-in-one CRISPR/Cas9 vector system. Sci. Rep. 4, 4–9 (2014). 20. Tanenbaum, M. E. et al. A Protein-Tagging System for Signal Amplification in Gene Expression and Fluorescence Imaging. Cell 159, 635–646 (2014).

119 21. Chavez, A. et al. Comparison of Cas9 activators in multiple species. Nat. Methods 13, 563– 567 (2016). 22. Konermann, S. et al. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 517, 583–588 (2015). 23. Begley, C. G. & Ellis, L. M. Raise standards for preclinical cancer research. Nature 483, 531– 533 (2012). 24. Hay, M. et al. Clinical development success rates for investigational drugs. Nat. Biotechnol. 32, 40–51 (2014). 25. Saito, A. et al. A synthetic inhibitor of histone deacetylase, MS-27-275, with marked in vivo antitumor activity against human tumors. Med. Sci. 96, 4592–4597 (1999). 26. Stepper, P. et al. Efficient targeted DNA methylation with chimeric dCas9–Dnmt3a–Dnmt3L methyltransferase. Nucleic Acids Res. 45, 1703–1713 (2017). 27. Gilbert, L. A. et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell 154, 442–51 (2013). 28. Chiarella, A. M. et al. Cavitation Enhancement Increases the Efficiency and Consistency of Chromatin Fragmentation from Fixed Cells for Downstream Quantitative Applications. Biochemistry 57, 2756–2761 (2018).

120 Figures:

Figure 1. dCas9 CEMa system recrutis endogenous chromatin modifying machinery to regulate transcriptional activity. (A) Overall dCas9-CEMa strategy showing gRNAs that are designed to target a specific gene-of-interest using HEK 293T cells as a model system. Changes in transcriptional activity are evident by increases in fluorescent protein levels and relative mRNA quantification. dCas9-FKBP is introduced by transfection, or by infection of lentiviral delivery units. The CEMa molecules are brought to the gene-of-interest through interaction of the FK506 moiety with FKBP. Blue lines represent mRNA.

A chemical linker combines this recruitment event with an interchangeable warhead, specific activating chromatin-modifying machinery can be recruited included (B) BRD2, 3, 4 (C) BRF1 and (D) CBP/p300.

121

Figure 2. Protein engineering and synthetic chemistry explored to optimize dCas9–CEMa system.

HEK 293T cells were transfected with a reporter plasmid that carried a BFP gene (driven by a Tre3G promoter that has low expression) along with a sgRNA targeting the Tre3G promoter and the dCas9.

All samples were measured by flow cytometry 48-hrs after treatment. (A) As a positive control, dCas9- p300 was transfected, and showed a ~14-fold increase in BFP expression. (B) dCas9-FKBPx1 was transfected and was combined with 3 different CEMa compounds that were formulated with different chemical warheads. (CEM87= binds Brd -2,-3,-4; CEM88= binds BRPF1 which recruits MOZ/MORF histone acetyltrransferases, CEM114= binds CBP/p300). The most effective compound was CEM87, which demonstrated ~7-fold increased expression. (C) dCas9 with tandem FKBP repeats and additional

FKBP recruited to MS2 RNA-binding-sites introduced into the sgRNA, and added with the same CEMa set. Again, CEM87 was the most active with ~18-fold increased expression. (D) Using the optimized

122 dCas9-CEMa system and the most effective to-date CEMa, CEM87, a dose curve was performed.

These data demonstrate that transcriptional activity, as measured by mean fluorescence, can be tightly controlled with CEMa concentration.

123

Figure 3. Optimized dCas9–CEMa effectively reactivates three diverse mammalian genes. Overall optimized strategy in HEK 293T cell lines with stably expressing dCas9-FKBPx2 and ms2-FKBP machinery. We transfected gRNAs that were designed to target endogenous mammalian genes. After

48 hours of treatment with 200 nM CEM87, RNA was extracted, qRT-qPCR was performed and results were normalized to Gapdh. (A) dCas9-CEMa causes ~13-fold activation of MYOD1. (B) dCas9-CEMa causes ~10-fold activation of CXCR4. (C) dCas9-CEMa causes ~22-fold activation of ASCL1.

124 Supplemental Figure S1A. Chemistry General Procedures. FK506 was purchased from

Selleckchem. All other chemicals were purchased from Sigma-Aldrich or Fisher Scientific. HPLC spectra for all compounds were acquired using an Agilent 1200 Series system with DAD detector.

Analytical HPLC chromatography was performed on a 2.1×150 mm Zorbax 300SB-C18 5 μm column with water containing 0.1% formic acid as solvent A and acetonitrile containing 0.1% formic acid as solvent B at a flow rate of 0.4 mL/min. The gradient program was as follows: 1% B (0−1 min), 1−99%

B (1−4 min), and 99% B (4−8 min). High resolution mass spectra (HRMS) data were acquired in positive ion mode using an Agilent G1969A API-TOF with an electrospray ionization (ESI) source. Flash column chromatography was performed on a Teledyne ISCO CombiFlash Rf system equipped with a variable wavelength UV detector and a fraction collector using RediSep Rf normal phase silica columns.

Microwave reactions were performed using a Discover SP CEM. Nuclear Magnetic Resonance (NMR) spectra were acquired on a Bruker DRX-600 spectrometer with 600 MHz for proton (1H NMR) and 150

MHz for carbon (13C NMR); chemical shifts are reported in ppm (δ). Preparative HPLC was performed on Agilent Prep 1200 series with UV detector set to 254 nm. Samples were injected onto a Phenomenex

Luna 75 x 30 mm, 5 μm, C18 column at room temperature. The flow rate was 30 mL/min. A linear gradient was used with 10% (or 50%) of MeOH (A) in H2O (with 0.1 % TFA) (B) to 100% of MeOH (A).

HPLC was used to establish the purity of target compounds. All final compounds had > 95% purity using the HPLC methods described above.

125 N N N N N EDCI, HOBT, N DIPEA, propargylamine, DCM HO N O HN N O O O

Cl Cl

Supplemental Figure S1B. (S)-2-(6-(4-Chlorophenyl)-8-methoxy-1-methyl-4H- benzo[f][1,2,4]triazolo[4,3-a][1,4]diazepin-4-yl)-N-(prop-2-yn-1-yl)acetamide:

(S)-2-(6-(4-Chlorophenyl)-8-methoxy-1-methyl-4H-benzo[f][1,2,4]triazolo[4,3-a][1,4]diazepin-4- yl)acetic acid (purchased from a commercial vendor) (35 mg, 0.09 mmol), propargylamine (23 µL, 0.36 mmol), EDCI-HCl (21 mg, 0.13 mmol), HOBT (25 mg, 0.18 mmol), and DIPEA 31 µL, 0.18 mmol) were all added to 4 mL dichloromethane (DCM) and stirred 24 h. The product was purified by HPLC. Yield:

35 mg, 90%. 1H NMR (600 MHz, Chloroform-d) δ 7.52 (d, J = 8.3 Hz, 2H), 7.43 (d, J = 8.9 Hz, 1H),

7.35 (d, J = 8.2 Hz, 2H), 7.26 - 7.23 (m, 1H), 6.90 (d, J = 2.8 Hz, 1H), 4.67 (t, J = 7.0 Hz, 1H), 4.23 -

+ + 3.98 (m, 2H), 3.83 (s, 3H), 3.52 - 3.39 (m, 2H), 2.70 (s, 3H). TOF-HRMS (m/z) [M+H] for C23H21ClN5O2 calculated 434.1378, found 434.1396.

126 OH O N N N OH O O HN N O O O O O H S N O N3 + O H N O

O Cl O O H HO A CuSO4 O Sodium ascorbate TBTA, tBuOH OH O

N N N OH O O O O O HN N O H S N O N H O O N N N O

O O O Cl HO H O

CEM87: (S)-2-(6-(4-Chlorophenyl)-8-methoxy-1-methyl-4H-benzo[f][1,2,4]triazolo[4,3-a][1,4]diazepin-

4-yl)-N-(prop-2-yn-1-yl)acetamide (15 mg, 0.035 mmol), compound A (shown above, synthesized as previously described1) (39 mg, 0.035 mmol), were dissolved in 1 mL tert-butanol. TBTA (2 mg), copper sulfate pentahydrate (0.05 equiv of a 0.1M solution in water), and sodium ascorbate (0.2 equiv of a

0.1M solution in water) were added to the solution. The reaction was stirred for 24 h then purified by

1 HPLC. Yield: 8 mg, 15%. H NMR (600 MHz, Methanol-d4) δ 7.99 (s, 1H), 7.75 (d, J = 8.9 Hz, 1H), 7.51

(d, J = 8.3 Hz, 2H), 7.48-7.25 (m, 3H), 6.94 (s, 1H), 5.23 (m, 2H), 4.73-4.48 (m, 5H), 3.90 (m, 2H), 3.86

(m, 2H), 3.78-3.38 (m, n.d. - integration obscured by solvent peak), 3.04 (m, 3H), 2.72 (t, J = 7.2 Hz,

2H), 2.67 (s, 3H), 2.51 (s, 2H), 2.44 (t, J = 7.3 Hz, 2H), 2.36 (s, 2H), 2.18 (m, 5H), 2.04 (m, 2H), 1.94

(m, 3H), 1.83 - 1.58 (m, 11H), 1.32 (m, 17H), 0.92 (m, 11H). TOF-HRMS (m/z) [M+H]+ for

+ C78H112ClN10O18S calculated 1543.7560, found 1543.7565. HPLC tR = 4.78 min.

127 O O HO LiOH O O Br O O THF / H2O O OH K2CO3, DMF rt, 2 days

Supplemental Figure S1C. 3-(prop-2-yn-1-yloxy)benzoic acid: To the solution of methyl 3- hydroxybenzoate (760 mg, 5 mmol) in DMF (10 mL) was added K2CO3 (1.38 g, 10 mmol). Then, 3- bromoprop-1-yne (1.49 mL, 10 mmol) was added, and the reaction was stirred at room temperature for

2 days. Water and ethyl acetate (EtOAc) were added, and the aqueous phase was extracted with EtOAc

(3 x 20 mL). The combined organic layers were washed with water, brine, and dried over Na2SO4. The solvent was removed and dried under vacuum. The crude product (950 mg, 5 mmol), which was used for next step without further purification, was dissolved in THF / H2O (25 mL / 25 mL). Then, LiOH (1.0 g, 43 mmol) was added, and the reaction was stirred at room temperature overnight. The reaction mixture was then treated with 4 N HCl until pH = 1, and extracted with EtOAc (3 x 30 mL), dried over

Na2SO4. The solvent was removed under reduced pressure to obtained the title compound (801 mg, yield 91% over two steps). 1H NMR (600 MHz, Chloroform-d) δ 7.79 (dt, J = 7.7, 1.2 Hz, 1H), 7.73 (dd,

J = 2.9, 1.5 Hz, 1H), 7.44 (t, J = 7.9 Hz, 1H), 7.30 – 7.20 (m, 1H), 4.78 (d, J = 2.4 Hz, 2H), 2.58 (t, J =

2.3 Hz, 1H).

128 O H N 2 N O N N N N O O SOCl2 O reflux O O OH Cl Pyridine, DCM, rt NH N O

N-(1,3-dimethyl-2-oxo-6-(piperidin-1-yl)-2,3-dihydro-1H-benzo[d]imidazol-5-yl)-3-(prop-2-yn-1- yloxy)benzamide: 3-(prop-2-yn-1-yloxy)benzoic acid (200 mg, 1.13 mmol) was dissolved in excess

SOCl2, and the reaction was heated to reflux for 2 h. Then, the mixture was cooled and concentrated to generate 3-(prop-2-yn-1-yloxy)benzoyl chloride for the next step without purification. To a solution of

5-amino-1,3-dimethyl-6-(piperidin-1-yl)-1H-benzo[d]imidazole-2(3H)-one (shown above, synthesized

2 as previously described ) (323 mg, 1.24 mmol) in CH2Cl2 (10 mL) was added pyridine (0.114 mL, 1.41 mmol) at room temperature. Then, 3-(prop-2-yn-1-yloxy)benzoyl chloride (220 mg, 1.13 mmol) was added and the reaction mixture was stirred for 1 h at this temperature. The reaction was diluted with

DCM and washed by saturated NaHCO3 solution, and the aqueous phase was extracted with DCM.

The combined organics were washed with water, dried over Na2SO4. The solvent was removed under reduced pressure and the residue was purified by flash chromatography on silica gel (50%-100% of

EtOAc in Hexanes), and to obtain the product as white solid (294 mg, yield 62%). 1H NMR (600 MHz,

Methanol-d4) δ 8.19 (s, 1H), 7.57 – 7.45 (m, 3H), 7.29 – 7.22 (m, 1H), 7.16 (s, 1H), 4.84 (d, J = 2.4 Hz,

2H), 3.41 (d, J = 4.8 Hz, 6H), 3.01 (t, J = 2.4 Hz, 1H), 2.88 (t, J = 5.2 Hz, 4H), 1.79 (p, J = 5.7 Hz, 4H),

+ + 1.65 (brs, 2H). HRMS (m/z) [M + H] for C24H27N4O3 calculated 419.2078, found 419.2155.

129 OH O

O OH O O N N O O O H S N O N3 + O H O N O NH N O O O O HO H A CuSO4 O Sodium ascorbate TBTA, tBuOH OH O

O OH O O N N O O O H S N O N H O N N O N O NH N O O O O HO H O

Supplemental Figure S1C. CEM88: Compound A1 (39 mg, 0.035 mmol) and N-(1,3-dimethyl-2-oxo-

6-(piperidin-1-yl)-2,3-dihydro-1H-benzo[d]imidazol-5-yl)-3-(prop-2-yn-1-yloxy)benzamide (15 mg,

0.035 mmol) were dissolved in tert-Butanol (1 mL). TBTA (2 mg, 10 mol %), copper sulfate pentahydrate

(0.035 mL, 10 mol %, 0.1 M), and sodium ascorbate (0.17 mL, 0.2 equiv.) were added, and the reaction was stirred at room temperature for 24 h. The reaction mixture was purified by preparative HPLC (10%-

100% methanol / 0.1% TFA in H2O) to afford CEM88 as white solid in TFA salt form (38 mg, yield 71%).

1 H NMR (600 MHz, Methanol-d4) δ 8.19 (s, 1H), 7.83–7.57 (m, 2H), 7.55–7.44 (m, 2H), 7.36–7.30 (m,

2H), 5.30 (s, 2H), 5.24–5.08 (m, 2H), 4.63–4.58 (m, 3H), 4.33 (d, J = 13.6 Hz, 1H), 4.06–3.87 (m, 3H),

3.74–3.67 (m, 1H), 3.63–3.53 (m, 11H), 3.55–3.36 (m, 19H), 3.07–2.92 (m, 2H), 2.79 (dd, J = 14.4, 5.8

Hz, 1H), 2.68 (t, J = 7.2 Hz, 2H), 2.46 (s, 2H), 2.40 (t, J = 7.3 Hz, 2H), 2.37–2.23 (m, 2H), 2.21–2.07

(m, 3H), 2.06–1.68 (m, 18H), 1.67–1.56 (m, 8H), 1.56–1.26 (m, 10H), 1.25–1.16 (m, 1H), 1.09 (q, J =

+ + 12.9, 12.4 Hz, 1H), 0.99–0.79 (m, 11H). HRMS (m/z) [M+H] for C79H118N9O19S calculated 1528.8259, found 1528.8327. HPLC tR = 4.76 min.

130 O O O O O BnCl, K2CO3 O H DMAP, DMF, rt H PPh3 BnO HO BnO F F F

1) Pd/C, H2 O EtOAc LiOH O THF / H2O 2) O O Br O OH F K2CO3, DMF o F 60 C, 12 h

Supplemental Figure S1D. 3-(3-fluoro-4-(prop-2-yn-1-yloxy)phenyl)propanoic acid: To a solution of 3-fluoro-4-hydroxybenzaldehyde (1.40 g, 10 mmol) in dimethyl formamide (DMF) (20 mL) was added benzyl chloride (1.39 mL, 12 mmol), K2CO3 (2.04 g, 15 mmol), and DMAP (10 mg). The reaction was stirred overnight at room temperature. Water and EtOAc were added, and the aqueous phase was extracted with EtOAc (3 x 20 mL). The combined organic layers were washed with water, brine, and dried over Na2SO4. The solvent was removed under reduced pressure, and the residue was purified by flash chromatography on silica gel (0%-10% of EtOAc in Hexane). The product 4-(benzyloxy)-3- fluorobenzaldehyde was obtained as white solid (1.33 g, yield 58%). 4-(benzyloxy)-3- fluorobenzaldehyde (1.33 g, 5.8 mmol) and methyl (triphenylphosphoranylidene) acetate (2.12 g, 6.4 mmol) were dissolved in DCM (20 mL). The resulting mixture was stirred at room temperature for 3 days. The solvent was removed and purified by flash chromatography on silica gel (0%-25% of EtOAc in Hexane). Methyl 3-(4-(benzyloxy)-3-fluorophenyl)acrylate was obtained as white solid (1.55 g, yield

93%) and dissolved in EtOAc (50 mL). Pd/C (200 mg) was added and the reaction was stirred under

H2 for 18 h. The Pd/C was filtered through Celite, and the filtrate was concentrated. The crude product was dissolved in DMF (20 mL). Then, 3-bromoprop-1-yne (0.72 mL, 6.5 mmol) and K2CO3 (1.1 g, 8.1 mmol) were added. The reaction was stirred at 60 oC overnight. Water and EtOAc were added, and the aqueous phase was extracted with EtOAc (3 x 20 mL). The combined organic layers were washed with water, brine, and dried over Na2SO4. The solvent was removed and purified by flash chromatography on silica gel (0%-50% of EtOAc in Hexane). Methyl 3-(3-fluoro-4-(prop-2-yn-1-yloxy)phenyl)propanoate was obtained as white solid (892 mg, yield 70%). Methyl 3-(3-fluoro-4-(prop-2-yn-1-

131 yloxy)phenyl)propanoate (850 mg, 3.6 mmol) was dissolved in THF/H2O (8 mL/8 mL). LiOH (432 mg,

18 mmol) was added, and the reaction was stirred at room temperature overnight. The reaction was treated with 2 N HCl until pH = 1, extracted with EtOAc (3 x 30 mL), dried over Na2SO4. The solvent

1 was removed to get the title product (754 mg, yield 94%). H NMR (600 MHz, Methanol-d4) δ 7.10 (t, J

= 8.5 Hz, 1H), 7.00 (dd, J = 25.3, 10.2 Hz, 2H), 4.77 (q, J = 3.5 Hz, 2H), 3.02 – 2.92 (m, 1H), 2.87 (d,

J = 7.4 Hz, 2H), 2.64 – 2.54 (m, 2H).

132 N O H2N N O O HN T3P, DIPEA, EtOAc, N o O OH + MW, 150 C, 1 h O N F N F N O O

Supplemental Figure S1D. 4-(2-(5-(3,5-dimethylisoxazol-4-yl)-2-(3-fluoro-4-(prop-2-yn-1- yloxy)phenethyl)-1H-benzo[d]imidazol-1-yl)ethyl)morpholine: 3-(3-fluoro-4-(prop-2-yn-1- yloxy)phenyl)propanoic acid (53 mg, 0.24 mmol) and 4-(3,5-dimethylisoxazol-4-yl)-N1-(2- morpholinoethyl)benzene-1,2-diamine (shown above, synthesized as previously described3) (63 mg,

0.2 mmol) were dissolved in EtOAc (0.7 mL). T3P (0.578 mL, 1 mmol) and DIPEA (0.041 mL, 0.24 mmol) were added, then, the reaction was heated to 150 oC for 10 minutes in microwave reactor (MW).

Water was added to quench the reaction, and the aqueous phase was extracted with EtOAc (3 x 5 mL), dried over Na2SO4. Solvent was removed and the resulting mixture was purified by preparative HPLC

(10%-100% methanol / 0.1% TFA in H2O) to afford title compound as white solid in TFA salt form (41

1 mg, yield 41%). H NMR (600 MHz, Methanol-d4) δ 8.04 (d, J = 8.6 Hz, 1H), 7.74 (d, J = 1.4 Hz, 1H),

7.60 (dd, J = 8.6, 1.5 Hz, 1H), 7.19 – 7.11 (m, 2H), 7.03 (d, J = 8.3 Hz, 1H), 4.78 (d, J = 2.4 Hz, 2H),

3.94 (s, 4H), 3.58 (t, J = 7.8 Hz, 2H), 3.54 – 3.48 (m, 2H), 3.35 (d, J = 9.1 Hz, 4H), 3.31 (p, J = 1.6 Hz,

+ + 4H), 2.98 (t, J = 2.5 Hz, 1H), 2.44 (s, 3H), 2.27 (s, 3H). HRMS (m/z) [M + H] for C29H32FN4O3 calculated

503.2453, found 503.2442.

133 OH O

OH O O N O O O O N H S N O N3 O H + O N N O F O O O N HO H A O O

CuSO4 Sodium ascorbate OH TBTA, tBuOH O

N OH O O O N O O O H S N O N H N O N N O N O F N O O O HO H O O

Supplemental Figure S1D. CEM114: CEM114 was synthesized following the standard procedures for preparing CEM88 from Compound A1 (36 mg, 0.032 mmol) and 4-(2-(5-(3,5-dimethylisoxazol-4-yl)-2-

(3-fluoro-4-(prop-2-yn-1-yloxy)phenethyl)-1H-benzo[d]imidazol-1-yl)ethyl)morpholine (20 mg, 0.032 mmol), TBTA (1 mg, 10 mol %), copper sulfate pentahydrate (0.032 mL, 10 mol %, 0.1 M), and sodium ascorbate (0.16 mL, 0.2 equiv) in tert-Butanol (1 mL). The reaction mixture was purified by preparative

HPLC (10%-100% methanol / 0.1% TFA in H2O) to afford CEM114 as white solid in TFA salt form (23

1 mg, yield 45%). H NMR (600 MHz, Methanol-d4) δ 8.16 (d, J = 2.8 Hz, 1H), 7.98 (dd, J = 8.6, 3.2 Hz,

1H), 7.72 (s, 1H), 7.58 (dd, J = 8.6, 1.5 Hz, 1H), 7.19 (td, J = 8.5, 5.3 Hz, 1H), 7.15 – 7.10 (m, 1H), 7.01

(t, J = 8.6 Hz, 1H), 5.24 – 5.12 (m, 4H), 4.66 – 4.57 (m, 3H), 4.28 (d, J = 13.4 Hz, 1H), 4.06 – 3.94 (m,

1H), 3.89 (dd, J = 12.9, 7.8 Hz, 7H), 3.75 – 3.66 (m, 1H), 3.63 – 3.51 (m, 13H), 3.48 (t, J = 5.5 Hz, 3H),

3.43 – 3.36 (m, 8H), 3.26 – 3.11 (m, 7H), 3.01 (ddd, J = 15.0, 9.6, 4.1 Hz, 1H), 2.92 (t, J = 12.4 Hz,

1H), 2.80 (dd, J = 14.3, 5.3 Hz, 1H), 2.70 (t, J = 7.2 Hz, 2H), 2.51 – 2.40 (m, 8H), 2.28 (s, 5H), 2.23 –

1.97 (m, 6H), 1.95 – 1.68 (m, 9H), 1.68 – 1.30 (m, 17H), 1.24 – 1.14 (m, 1H), 1.13 – 1.04 (m, 1H), 1.00

+ + – 0.77 (m, 11H). HRMS (m/z) [M+H] for C84H123N9O19S calculated 1612.8634, found 1612.8475.

HPLC tR = 4.27 min.

134

Supplemental Figure S2. dCas9-CEMa gRNA optimization. HEK 293T cells were transfected with dCas9-FKBPx2, a BFP reporter, and one of 3 gRNAs sets as indicated. Treated cells were exposed to media with 200 nM of CEM87 for 48 hours at which point flow cytometry was used to quantify changes in BFP gene expression. Upon CEMa exposure, the tre3g x6 gRNA increased expression 5-fold, the

BFP x1 gRNA increased expression 2-fold, and the multiplexing plasmid that co-expressed both gRNAs increased expression 4-fold.

135

Supplemental Figure S3. dCas9-HA control. To serve as a control for CEMa exposure upon dCas9 and gRNA recruitment, HEK 293T cells were transfected with dCas9-HA (with no FKBP recruitment site), tre3g x6 gRNA, and the BFP reporter. Treated cells were exposed to 200 nM of CEM87 for 48 hours. Flow cytometry showed no significant changes in gene expression as measured by BFP (p

>0.05).

136

Supplemental Figure S4. A dCas9-SunTag approach adapted to the dCas9-CEMa system. To optimize the dCas9-CEMa system and explore other recruitment strategies, we tested the efficiency of the dCas9-SunTagx10 approach. HEK 293T cells were transfected with dCas9-SunTag10, a scFv

(GCN4) - FKBP, tre3g x6 gRNA, and a BFP reporter. Cells were treated with one of three CEMa compounds as indicated at 200 nM. After 48 hours of treatment, flow cytometry was performed to measure BFP expression. CEM87 induced a 16-fold increase in expression compared to untreated cells. CEM88 and CEM114 induced a 2-fold increase in expression compared to untreated cells.

137

Supplemental Figure S5. CEMa in combination with an HDACi. Using the optimized the dCas9-

CEMa system (dCas9-FKBPx2, ms2-FKBP, tre3g x6 gRNA, BFP reporter), the effects of a global epigenetic inhibitor in combination with a gene-specific approach was tested. After the cells were transfected, they were treated with 200 nM of CEM87 and 2.5 μM of Entinostat. BFP expression increased 34-fold compared to untreated cells as measured by flow cytometry.

138

Supplemental Figure S6. dCas9-p300 time course at MYOD1. To determine the optimal timepoint post transfection, RNA extraction and qRT-PCR was performed on HEK 293T cells infected with dCas9-p300, and transfected MYOD1-targeting or control gRNA as indicated. Compared to the NT- gRNA cells, dCas9-p300 increased MYOD1 expression 15-fold 24 hours after transfection (p < 0.01),

52-fold 48 hours after transfection (p <0.05), and 44-fold 72 hours after transfection (p <0.01).

139 A.

B.

Supplemental Figure S7. The above histograms represent the gating scheme that is performed on all flow cytometry-based experiments. (A) The FSC-area (A) and SSC-A gate is used to select live, round cells. (B) The SSC-A and SS-height (H) gate is used to select single cells.

140 CHAPTER 5: CAVITATION ENHANCEMENT INCREASES THE EFFICIENCY AND

CONSISTENCY OF CHROMATIN FRAGMENTATION FROM FIXED CELLS FOR DOWNSTREAM

QUANTITATIVE APPLICATIONS3

Introduction:

Investigating DNA-protein interactions is essential for the study of cellular processes including gene regulation, DNA replication, DNA repair, and nucleosomal organization. Chromatin immunoprecipitation (ChIP) is a key method used to determine the genomic location of specific proteins or posttranslational marks at individual loci or genome-wide1, 2. A typical workflow incorporating ChIP is as follows: (1) fixation of cells with formaldehyde, which crosslinks chromatin-bound proteins to DNA;

(2) isolation of the crosslinked chromatin from cell nuclei; (3) nuclei lysis and fragmentation of chromatin into 200-500 base pair (bp) fragments by sonication; (4) enrichment of the protein target and its associated DNA sequences from the sheared lysates using antibody immunoprecipitation; (5) reversal of crosslinks and isolation of DNA; and (6) downstream quantitative analysis of enriched DNA sequences by quantitative PCR (qPCR) or next-generation sequencing.

Random, unbiased fragmentation of chromatin is an important step in protocols incorporating

ChIP analysis1, 3. Since ChIP followed by qPCR or next-generation sequencing determines protein localization to specific regions of the genome, chromatin must be fragmented to 200-500 bp segments2,

4, 5. If the fragments are too small or degraded, the DNA sequence will be lost from downstream analyses. If the fragments are too large, it is impossible to map protein occupancy to a narrow region of the genome. Therefore, variability in the fragmentation technique between samples can bias data ______3This chapter previously appeared as an article in Biochemistry. The original citation is as follow: Anna M. Chiarella, Austin L. Quimby, Marjan Mehrab-Mohseni, Brian Velasco, Sandeep K. Kasoji, Ian J. Davis, Paul A. Dayton, Nathaniel A. Hathaway, and Samantha G. Pattenden. “Cavitation Enhancement Increases the Efficiency and Consistency of Chromatin Fragmentation from Fixed Cells for Downstream Quantitative Applications” Biochemistry 2018 57 (19), 2756-2761.

141 analyses6.

Despite the importance of sonication to the success of the ChIP-based assays, chromatin fragmentation remains an inefficient and inconsistent process, with few recent innovations in the development of chromatin fragmentation technology6-8. Proper chromatin fragmentation is a bottleneck in current protocols, often adding a day to processing times when handling large sample sets. To address this issue, we applied a novel cavitation enhancement reagent to improve fragmentation of chromatin from fixed cell nuclei. These biologically inert nanodroplets are composed of a liquid perfluorocarbon stabilized with a phospholipid monolayer shell9, 10. The droplets have an average diameter of 250 nanometers (nm). Once exposed to ultrasound energy, they vaporize into microbubbles of ~1–5 microns (µm) in diameter resulting in an intense, prolonged cavitation and release of mechanical energy that can be transferred to chromatin causing greater fragmentation than sonication alone10-12.

To assess the performance of nanodroplet reagent in the sonication step of a typical protocol using ChIP, we use the Chromatin in vivo Assay (CiA)13. We chose this assay as it was a previously characterized robust method that can generate significant changes to enrichment of posttranslational marks at a well-defined genomic locus in mammalian cells. This system allows us to recruit chromatin modifying proteins to DNA binding elements inserted in the promoter of a specific gene and then to determine changes in the chromatin environment. Here, we recruited a fragment of heterochromatin protein 1 alpha (HP1) which stimulates heterochromatin formation and performed ChIP of histone posttranslational modifications followed by the quantitative polymerase chain reaction (ChIP-qPCR).

We show that nanodroplets increase efficiency of chromatin fragmentation time by roughly 16-fold, while maintaining chromatin quality for downstream analysis by ChIP-qPCR. In practice, this method can reduce processing time for mouse embryonic stem cells from >38 minutes to <3 minutes, representing a significant advance over current techniques.

142 Experimental Procedures:

Cell culture and infection

The CiA mouse embryonic stem cells were cultured and infected as previously described13.

Rapamycin was diluted in molecular grade ethanol and stored at -20°C. HEK 293T cells were cultured as previously described14.

Fluorescent microscopy and image analysis

Representative images of the CiA mouse embryonic stem cells were imaged using a fluorescence microscope prior to flow cytometry analysis and ChIP-qPCR. The brightness/contrast of the brightfield images was uniformly adjusted in ImageJ FIJI. The background artifact in the GPF- fluorescent images was uniformly removed in FIJI with a sliding paraboloid with a rolling ball radius of

10 pixels.

Flow cytometry

Flow cytometry was performed at the University of North Carolina Flow Facility. The cells were washed in 1XPBS, trypsinized, and resuspended in FACS buffer (1X PBS, 0.2% BSA, 1 millimolar (mM)

EDTA). For each replicate > 100,000 cells were analyzed and gated as shown in Figure S1.

Preparation of fixed cell nuclei

The optimal sonication time for the ChIP-qPCR protocol was determined through titration over a series of sonication time points. For consistency purposes, a fixed number of cells were treated with formaldehyde, nuclei were isolated, prepped for sonication, and analyzed using a standard protocol.

The mESCs were prepared by trypsinization on 15 centimeter (cm) plates containing ~40 million cells (mESCs). Cells were transferred to a 15 milliliter (mL) conical tube and pelleted at 300 x g for 5 minutes followed by resuspension in 15 mL of 1x phosphate buffered saline (PBS). Cells were centrifuged at 300 x g for 5 minutes and the pellet was resuspended in 10 mL fixation buffer (50 mM

HEPES (pH 8.0), 1 mM EDTA (pH 8.0), 0.5 mM EGTA (pH 8.0), 100 mM NaCl). Crosslinking was

143 performed with the addition of 1 mL of 11% formaldehyde (final concentration 1%) followed by rotation for 10 minutes at room temperature. To quench the formaldehyde crosslinking, 0.5 mL of 2.5 molar (M) glycine (125 mM final concentration) was added and rapidly mixed by inversion.

For the HEK 293T cells, formaldehyde was added to the media of a 15 cm plate containing ~2 million cells to a final concentration of 1%. Plates were then placed on a plate shaker for 10 minutes at room temperature. To quench the formaldehyde crosslinking, glycine was added to a final concentration of 125 mM, followed by a 5 minute incubation at room temperature with agitation. Cells were scraped from the plate using a rubber policeman and transferred to a 50 mL conical tube on ice. Cells were pelleted by centrifugation at 300 x g for 5 minutes and the supernatant was discarded.

The pellets from both cell types were place on ice for the remaining preparation steps. Cells were pelleted by centrifugation at 4°C and 1,200 x g for 5 minutes, the pellet was washed in 10 mL of

1x PBS and re-centrifuged under the same conditions. The cell pellets (without supernatant) were flash frozen in liquid nitrogen and stored at -80°C. Nuclei were isolated from the cell pellet by resuspension in 10 mL of Rinse Buffer #1 (50 mM HEPES (pH 8.0), 140 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5%

NP40, and 0.25% Triton X 100), incubation on ice for 10 minutes, and followed by centrifugation at

1,200 x g at 4°C for 5 minutes. The pellet was resuspended in 10 mL of Rinse Buffer #2 (10 mM Tris

(pH 8.0), 1 mM EDTA, 0.5 mM EGTA, 200 mM NaCl) and centrifuged at 1,200 x g and 4°C for 5 minutes. To wash residual salts from the side of the 15 mL conical tube, 5 mL of Shearing Buffer [0.1%

SDS, 1 mM EDTA (pH 8.0), 10 mM Tris-HCl (pH 8.0)] was carefully added without disturbing the pellet.

The samples were spun at 1,200 x g at 4°C for 3 minutes, the supernatant was discarded, and the wash step was repeated.

Sonication of fixed nuclei

Sonication was performed in a Covaris E110 instrument with a customized holder designed to hold glass tubes (Fisher C4008-632R) that were crimp sealed with caps (Fisher C4008-2A). Isolated cell nuclei were resuspended in 90 microliters (µl) of Shearing Buffer (supplemented with 1x protease inhibitor) per 10 million mESCs and 500,000 HEK 293T cells. The resuspended cells were carefully mixed by pipetting and transferred to the glass sonication tubes on ice.

144 The nanodroplet cavitation reagent (MegaShear™, Triangle Biotechnology) was stored at -

20°C and was prepared by defrosting on wet ice prior to use. Briefly, this reagent is formulated by the encapsulation of liquid decafluorobutane in a lipid shell in a buffer of phosphate buffered saline, glycerine (15% v/v), and propylene glycol (5% v/v). The encapsulation lipid was composed of 1,2- distearoyl-sn-glycero-3-phosphocholine (DSPC) (Avanti Polar Lipids, Alabaster, AL, USA) stabilized by

1,2-distearoyl-sn-glycero-3-phosphoethanolamine-N-methoxy (polyethylene-glycol)-2000 (DSPE-

PEG2000) (Avanti Polar Lipids, Alabaster, AL, USA) in a 9 to 1 molar ratio. Further details of the formulation process are provided by Kasoji et al9.

Ten µl of nanodroplets or 10 µl of Shearing Buffer (traditional) was added to the appropriate tubes, the tubes were inverted to mix, followed by a brief centrifugation to remove liquid from the sides of the tube. Samples were sonicated at 4°C for 5 minutes followed by at least 5 minutes of rest for the total designated time using the settings 20% duty cycle, intensity = 8, and 200 cycles per burst.

Extraction and quantitation of total DNA for determination of fragmentation efficiency

Following sonication, the cell lysate containing chromatin was transferred to a 1.5 mL microfuge tube and centrifuged full speed (>18,000 x g) at 4°C for 5 minutes. The supernatant was transferred to a 0.2 mL PCR tube, and the pellet was resuspended in 100 μL Shearing buffer and transferred to another PCR tube. Next, 2 μL 10 µg/µL RNase (Qiagen) was added to each pellet and supernatant sample, followed by incubation at 37°C for 30 minutes. Protein was removed by addition of 2 μL of 20 mg/mL proteinase K (Worthington), samples were mixed by inversion and briefly centrifuged. Samples were incubated at 55°C for 30 minutes followed by 65°C overnight to reverse the formaldehyde crosslinks (~14 hours). DNA was extracted from both pellet and supernatant samples using silica matrix columns (Zymo Research ChIP DNA clean and concentrate kit, D5201) according to manufacturer’s instructions and eluted in a 25 µL volume of 10 mM Tris, pH 8.0. The concentration of DNA in the pellet and supernatant samples was determined using fluorometry (Qubit® dsDNA High Sensitivity Assay Kit,

Invitrogen). To visualize the DNA and determine fragmentation efficiency, electrophoresis was performed on 9 μL (36% of total volume) of eluted supernatant DNA per lane loaded on a 1.2% agarose

145 gel. Average peak fragment size was determined using the Agilent TapeStation 2200 HS D1000 and

D5000 kits according to manufacturer’s instructions.

ChIP assay

ChIP-qPCR was performed using the ChIP-IT High Sensitivityâ kit (Active Motif #53040) according to the manufacturer’s instructions, except that inputs were purified using a clean and concentrate column (Zymo Research #D5205), and the ChIP samples were eluted twice with 30 µL each for a total of 60 µL of elution buffer for the H3K4me3 and H3K9me3 ChIP assays. For each ChIP assay, the equivalent of 5 million cells was used for mESCs and 1 million cells for HEK293T cells.

These samples were analyzed by qPCR using primers and methods as previously described (Table

S1)15. The following antibodies were used: G9a (Abcam #40542), H3K4me3 (Active Motif #39915), and

H3K9me3 (Active Motif #39161).

Results and Discussion:

The Chromatin in vivo Assay as a system for testing the quality of chromatin fragmented in the presence of nanodroplets

The Chromatin in vivo Assay (CiA) is designed to control the chromatin environment at a specific gene locus (Figure 1A)13. This platform is ideal for testing the quality of nanodroplet- fragmented chromatin because it uses well defined methods to induce enrichment of posttranslational marks at the CiA:Oct4 locus in mouse embryonic stem cells (mESCs). In this system HP1a is rapidly tethered using the chemical inducer of proximity (CIP) rapamycin13. When HP1a is recruited to the

CiA:Oct4 promoter region it induces histone methyltransferases (HMT) to catalyze tri-methylation on histone H3 lysine-9 (H3K9me3)16-19. This posttranslational mark then will recruit endogenous HP1 enzymes to propagate heterochromatin formation mostly symmetrically across the promoter and gene body of the targeted allele20. These domains lead to chromatin compaction and gene repression21.

ChIP-qPCR can be used to quantify the extent of H3K9me313. In our system CIP-rapamycin mediated recruitment of HP1a for 5-days results in a complete heterochromatin domain formed at the CiA:Oct4

146 target locus marked by H3K9me313. In this report, we used this well-characterized system to investigate the quality of chromatin fragmented using nanodroplets.

Our experimental system compares enrichment by ChIP-qPCR of H3K9me3 at the CiA:Oct4 locus between two conditions, euchromatin (no CIP-HP1a) and heterochromatin (CIP-HP1a recruitment for 5 days). To confirm that recruitment of HP1a results in the expected gene repression, the cells were exposed to 3nM of CIP-rapamycin for 5 days and GFP expression was quantified and imaged. Flow cytometry results revealed a decrease in GFP positive cells in CIP-HP1a treated samples from 99.8% to 19.4% (Figure 1B, Figure S1). Cells were also imaged with fluorescent microscopy, which confirms the flow cytometry data (Figure 1C). Taken together these results show that recruitment of HP1a causes gene repression at the CiA:Oct4 locus, as previously described13.

Determining efficiency of nanodroplet-mediated chromatin fragmentation using fixed mouse embryonic stem cells and human embryonic kidney (HEK) 293T cells

To determine if cavitation enhancement could alleviate the sonication bottleneck in sample processing for ChIP, we measured chromatin fragmentation efficiency in the presence or absence of nanodroplets in two different cell lines. Cell nuclei were prepared from formaldehyde crosslinked mESCs or HEK293T cells, sonicated in the presence (nanodroplets) or absence (traditional) of the nanodroplet reagent, followed by purification of total soluble DNA, which was used to measure chromatin fragmentation efficiency (Figure 2A, Experimental Procedures). DNA was visualized on an agarose gel (Figure S2), and peak average fragment size was quantitated using an Agilent

TapeStation (Figure 2B-C). Agarose gel electrophoresis demonstrated that samples sonicated in the presence of nanodroplets reached the desired ChIP fragment size distribution of 200-500 bp in less time than traditional sonication for both cell lines (Figure S2). Quantitation of DNA size by TapeStation revealed that samples sonicated with nanodroplets reached a mean peak fragment size of 318 ± 34.7 bp after just 2.3 minutes sonication time for mESCs (Figure 2B, Figure S3-5) and 239 ± 39.4 bp after

3 minutes sonication time for HEK 293T cells (Figures 2C, Figure S6-8). In contrast, traditional samples did not achieve the targeted 200-500 bp range until 38.4 minutes sonication time (mean peak fragment size of 301 ± 27.2 bp) for mESCs (Figure 2B), and 48 minutes sonication time (347 ± 152.3

147 bp) for HEK 293T cells (Figure 2C). Therefore, the addition of nanodroplets to the sonication mixture decreases chromatin fragmentation time from formaldehyde crosslinked cells by 16.7-fold for mESCs, and 16-fold for HEK 293T cells, both of which represent a similarly significant reduction in sample processing time.

ChIP of G9a lysine methyltransferase can be successfully performed on chromatin fragmented from HEK 293T cells in the presence of nanodroplets.

After determining optimal times for chromatin fragmentation using either traditional or nanodroplet-mediated sonication, we tested HEK 293T cell chromatin quality by performing ChIP-qPCR for lysine methyltransferase protein G9a, and a IgG control. The qPCR amplification regions were selected from previously published G9a ChIP-next generation sequencing (ChIP-seq) data (Figure 3A-

B)14. Chromatin preparation and sonication steps were performed as described (Experimental

Procedures). To minimize variability, the post-sonication ChIP protocol was performed using a commercially available kit (Experimental Procedures). We determined that sonication for 3 minutes with nanodroplets and 48 minutes without nanodroplets (Figures 2C) was appropriate to fragment HEK

293T cell chromatin to a range of 200-500 bp. We used these conditions to prepare chromatin for ChIP and confirmed fragmentation by subjecting ChIP input samples to agarose gel electrophoresis (Figure

S9A), and Agilent TapeStation (Figure 3C). Despite 48 minutes sonication, we were unable to fragment all of the traditionally sonicated inputs within the desired 200-500 bp range. Nanodroplet-mediated sonication produced a lower average peak bp size distribution (243 ± 86 bp) compared to the traditional method (482 ± 180 bp) with a similar coefficient of variation (CV) for both methods (nanodroplet CV =

35%; traditional CV = 37%). ChIP-qPCR was performed for G9a or IgG (background) at the G9a positive PGSF1 locus (Figure 3A) and the BC006361 G9a negative locus (Figure 3B) as determined from published G9a ChIP-seq data14. The large variation in input fragment size coupled with the long,

48-minute sonication time resulted in poor G9a ChIP signal over background for the traditional samples and no significant difference in G9a occupancy between positive and negative control regions (Figure

3D). In contrast, samples sonicated for only 3 minutes in the presence of nanodroplets resulted in a more robust G9a ChIP signal over background with a 6-fold difference between G9a occupancy at the

148 positive and negative regions (p = 0.03). Therefore, the addition of nanodroplets to the sonication of fixed HEK 293T cells results in a 16-fold decrease in sonication time compared to the traditional method and produces chromatin of sufficient quality to perform G9a ChIP-qPCR with a significant signal over background.

Chromatin fragmented using nanodroplets faithfully recapitulates ChIP-qPCR data for analysis of histone tail modifications in the CiA system.

Following confirmation that we could perform ChIP from HEK 293T chromatin fragmented in the presence of nanodroplets, we further tested chromatin quality from mESCs by performing ChIP- qPCR for histone posttranslational modifications that undergo a dynamic change following heterochromatin formation in the CiA:Oct4 system13. Chromatin was prepared as described

(Experimental Procedures) and sonicated for 2.3 minutes when nanodroplets were used, and 38.4 minutes for the traditional method (Figure 2B). To confirm fragment size, the ChIP input samples were subjected to agarose gel electrophoresis where they appeared to be fragmented within the desired range (Figure S9B). When fragmentation of these input samples was quantified by TapeStation, however, nanodroplet-mediated sonication produced an overall 2.7-fold reduction in variance of DNA size distribution (coefficient of variation (CV) = 27%) compared to the traditional method (CV = 72%)

(Figure 4A). The CV values were calculated using 6 data points for nanodroplet reagent samples and

5 data points for the traditional method since one traditional sonication data point produced a peak DNA fragment size >1,500 bp, which is outside of the linear range of the high sensitivity TapeStation assay.

Average peak fragment sizes for traditional sonication were: 614 ± 511 control and 927 ± 346 with rapamycin, and for sonication with nanodroplets average peak fragment sizes were: 277 ± 83 control and 275 ± 48 with rapamycin. There was no significant change in average peak fragment size after rapamycin treatment. Overall, in addition to decreasing sonication time, nanodroplets also increased fragmentation consistency by 2.7-fold between input samples in mESCs.

The ChIP assay was performed as described for HEK 293T cells. ChIP antibodies for H3K4me3 were used as a marker of active gene transcription, whereas antibodies for H3K9me3 were used to mark transcriptional repression as previously described13. ChIP was performed using three biological

149 replicates in the presence or absence of nanodroplets, and qPCR experiments had two technical replicates for each of the three biological replicate samples using three primer sets spanning the

CiA:Oct4 locus (Figure 4B). Following recruitment of HP1a by treatment of cells with rapamycin,

H3K4me3 enrichment was significantly decreased at all CiA:Oct4 locus regions tested. As expected, the opposite trend was observed for H3K9me3 levels following HP1a recruitment (Figure 4C). Unlike the ChIP-qPCR results for HEK 293T cells (Figure 3D), the ChIP-qPCR results for our mESC CiA:Oct4 system13 recapitulated previous findings equally well for samples treated with or without nanodroplets.

Overall, our results indicate that the addition of a nanodroplet cavitation enhancing reagent to the sonication mixture does not alter the conclusions of previously-published data in HEK 293T cells or the mESC CiA:Oct4 system13, but does allow for a 16-fold increase in chromatin fragmentation efficiency, and a 2.7-fold reduction in chromatin fragmentation variance in the inducible mESC CiA:Oct4 system.

Conflict of Interest Statement

Authors Samantha Pattenden and Paul Dayton of this publication have equity ownership in Triangle

Biotechnology, Inc., to which the following technologies used or evaluated in this paper have been licensed: nanodroplets (US Patent number 9,427,410 and US Patent Application 14,432,747). Authors

Samantha Pattenden and Paul Dayton are inventors of US Patent Application 14,432,747 and author

Paul Dayton is inventor of US Patent number 9,427,410.

150 REFERENCES:

1. Aparicio, O., Geisberg, J. V., and Struhl, K. (2004) Chromatin immunoprecipitation for determining the association of proteins with specific genomic sequences in vivo, Curr Protoc Cell Biol Chapter 17, 17.17.11-17.17.23.

2. Kuo, M. H., and Allis, C. D. (1999) In vivo cross-linking and immunoprecipitation for studying dynamic Protein:DNA associations in a chromatin environment, Methods 19, 425-433.

3. Schoppee Bortz, P. D., and Wamhoff, B. R. (2011) Chromatin immunoprecipitation (ChIP): revisiting the efficacy of sample preparation, sonication, quantification of sheared DNA, and analysis via PCR, PLoS ONE 6, e26015.

4. Ren, B., and Dynlacht, B. D. (2004) Use of chromatin immunoprecipitation assays in genome- wide location analysis of mammalian transcription factors, Methods Enzymol 376, 304-315.

5. Orlando, V. (2000) Mapping chromosomal proteins in vivo by formaldehyde-crosslinked- chromatin immunoprecipitation, Trends Biochem Sci 25, 99-104.

6. Meyer, C. A., and Liu, X. S. (2014) Identifying and mitigating bias in next-generation sequencing methods for chromatin biology, Nat Rev Genet 15, 709-721.

7. Teytelman, L., Ozaydin, B., Zill, O., Lefrancois, P., Snyder, M., Rine, J., and Eisen, M. B. (2009) Impact of chromatin structures on DNA processing for genomic analyses, PLoS ONE 4, e6700.

8. Hengen, P. N. (1997) Shearing DNA for genomic library construction, Trends in Biochemical Sciences 22, 273-274.

9. Kasoji, S. K., Pattenden, S. G., Malc, E. P., Jayakody, C. N., Tsuruta, J. K., Mieczkowski, P. A., Janzen, W. P., and Dayton, P. A. (2015) Cavitation Enhancing Nanodroplets Mediate Efficient DNA Fragmentation in a Bench Top Ultrasonic Water Bath, PLoS One 10, e0133014.

10. Sheeran, P. S., Wong, V. P., Luois, S., McFarland, R. J., Ross, W. D., Feingold, S., Matsunaga, T. O., and Dayton, P. A. (2011) Decafluorobutane as a Phase-Change Contrast Agent for Low- Energy Extravascular Ultrasonic Imaging, Ultrasound in Medicine and Biology 37, 1518-1530.

11. Sheeran, P. S., and Dayton, P. A. (2012) Phase-change contrast agents for imaging and therapy, Current pharmaceutical design 18, 2152-2165.

12. Sheeran, P. S., Luois, S. H., Mullin, L. B., Matsunaga, T. O., and Dayton, P. A. (2012) Design of ultrasonically-activatable nanoparticles using low boiling point perfluorocarbons, Biomaterials 33, 3262-3269.

13. Hathaway, N. A., Bell, O., Hodges, C., Miller, E. L., Neel, D. S., and Crabtree, G. R. (2012) Dynamics and memory of heterochromatin in living cells, Cell 149, 1447-1460.

14. Simon, J. M., Parker, J. S., Liu, F., Rothbart, S. B., Ait-Si-Ali, S., Strahl, B. D., Jin, J., Davis, I. J., Mosley, A. L., and Pattenden, S. G. (2015) A Role for Widely Interspaced Zinc Finger (WIZ) in Retention of the G9a Methyltransferase on Chromatin, J Biol Chem 290, 26088-26102.

15. Butler, K. V., Chiarella, A. M., Jin, J., and Hathaway, N. A. (2017) Targeted Gene Repression Using Novel Bifunctional Molecules to Harness Endogenous Histone Deacetylation Activity, ACS Synth Biol.

16. Fritsch, L., Robin, P., Mathieu, J. R. R., Souidi, M., Hinaux, H., Rougeulle, C., Harel-Bellan, A., Ameyar-Zazoua, M., and Ait-Si-Ali, S. (2010) A Subset of the Histone H3 Lysine 9

151 Methyltransferases Suv39h1, G9a, GLP, and SETDB1 Participate in a Multimeric Complex, Molecular Cell 37, 46-56.

17. Peters, A. H., Kubicek, S., Mechtler, K., O'Sullivan, R. J., Derijck, A. A., Perez-Burgos, L., Kohlmaier, A., Opravil, S., Tachibana, M., Shinkai, Y., Martens, J. H., and Jenuwein, T. (2003) Partitioning and plasticity of repressive histone methylation states in mammalian chromatin, Mol Cell 12, 1577-1589.

18. Rea, S., Eisenhaber, F., O'Carroll, D., Strahl, B. D., Sun, Z. W., Schmid, M., Opravil, S., Mechtler, K., Ponting, C. P., Allis, C. D., and Jenuwein, T. (2000) Regulation of chromatin structure by site-specific histone H3 methyltransferases, Nature 406, 593-599.

19. Schultz, D. C., Ayyanathan, K., Negorev, D., Maul, G. G., and Rauscher, F. J., 3rd. (2002) SETDB1: a novel KAP-1-associated histone H3, lysine 9-specific methyltransferase that contributes to HP1-mediated silencing of euchromatic genes by KRAB zinc-finger proteins, Genes Dev 16, 919-932.

20. Maison, C., and Almouzni, G. (2004) HP1 and the dynamics of heterochromatin maintenance, Nature reviews. Molecular cell biology 5, 296-304.

21. Peters, A. H., Mermoud, J. E., O'Carroll, D., Pagani, M., Schweizer, D., Brockdorff, N., and Jenuwein, T. (2002) Histone H3 lysine 9 methylation is an epigenetic imprint of facultative heterochromatin, Nat Genet 30, 77-80.

152 Figures:

Figure 1. Establishment of the CiA system as a tool assessing chromatin changes. Mouse ES cells were infected with constructs expressing FKBP-Gal4 and FRB-HP1. Rapamycin (a chemical inducer of proximity) brings together FKBP and FRB at the Gal4 domain upstream of eGFP (A) as shown in cartoon form. (B) Flow cytometry results of infected cells treated with 3nM of rapamycin. (C) Fluorescent images of infected cells treated with 3nM of rapamycin.

153

Figure 2. Efficiency of chromatin fragmentation in the presence and absence of nanodroplets. (A) Nanodroplets increase the number of cavitation events when exposed to ultrasound during the sonication phase of the chromatin isolation protocol. (B-C) Peak DNA fragment size (bp) after sonication was determined by Agilent TapeStation 2200 analysis for mESC (B) and HEK 293T (C) cells. Data from each replicate in base pairs (bp) size at each DNA fragment peak without (–) or with (+) nanodroplets is shown as a heat map. The mean value of all three replicates is also indicated. The peak bp size for each sample was also plotted as a function of time without (blue dots) or with (red dots) nanodroplets. The mean peak bp size for each condition is indicated by a horizontal line. Error bars represent the standard deviation of 3 biological replicates except HEK 293T cells traditional, 24 minutes, which represents 2 replicates.

154

Figure 3. G9a ChIP from HEK 293T cells. ChIP-qPCR regions were selected from Simon et al.,14 to include a G9a positive region (A) and a G9a negative region (B). Amplicons are indicated by a grey vertical bar. (C) The peak base pair (bp) size of each input sample without (Traditional, red dots; mean = 482 ± 180 bp; CV = 37%) or with (Nanodroplets, purple dots mean = 243 ± 86 bp; CV = 35%) nanodroplets was measured by Agilent TapeStation. The mean peak bp size for each condition is indicated by a horizontal black line. Error bars represent the average of 3 biological replicates. (D) G9a ChIP-qPCR showing the fold change of G9a signal over IgG signal with nanodroplets (n=3) or without nanodroplets (n=2). P-values are indicated; ns indicates no significant difference.

155

Figure 4. ChIP-qPCR results using traditional sonication compared to using nanodroplets. Control (dots) and rapamycin-treated (triangles) cells were sonicated with and without nanodroplets (n=3). (A) The peak base pair (bp) size of each input sample without (Traditional, red, CV = 72%) or with (Nanodroplets, purple, CV = 27%) nanodroplets was measured by Agilent TapeStation. The mean peak bp size for each condition is indicated by a horizontal black line. (B) ChIP-qPCR was performed using primers positioned along the reporter gene locus as indicated. (C) Rapamycin treated cells were compared to control cells with the traditional and nanodroplet sonication methods using antibodies for H3K9me3 and H3K4me3 (* = p<0.05; ** = p<0.01; *** = p<0.005; **** = p<0.001).

156

Supplemental Figure 1. ES cells were gated to select for a healthy, single-cell population. The same gating scheme was applied to all samples.

157

Supplemental Figure 2. Nanodroplets decrease sonication times for HEK 293T and mouse embryonic stem cells. DNA fragment distribution following sonication of (A) HEK 293T cells and (B) mouse embryonic stem cells was determined by agarose gel electrophoresis. These gels are representative of three biological replicates. Base pair size of the DNA ladder is indicated to the left of the image.

158

159

160

161

Supplemental Figures 3-5. Agilent TapeStation data showing DNA fragment size distribution for mESCs represented in Figure 2B for each of 3 biological replicates. Sonication time (minutes) is indicated on the left for both Traditional and Nanodroplet methods. Peak fragment size called by the TapeStation software is indicated by a vertical blue line with the base pair value listed above.

162

163

164

Supplemental Figures 6-8. Agilent TapeStation data showing DNA fragment size distribution for HEK 293T cells represented in Figure 2C for each of 3 biological replicates. Sonication time (minutes) is indicated on the left for both Traditional and Nanodroplet methods. Peak fragment size called by the TapeStation software is indicated by a vertical blue line with the base pair value listed above.

165

Supplemental Figure 9. DNA fragment distribution of ChIP input samples for (A) HEK 293T cells (n=3 for each condition) and (B) mESCs (n=6 for each sonication condition) was determined by agarose gel electrophoresis. Samples treated with rapamycin are indicated. Replicate number and sonication time (minutes) is indicated for each condition. Base pair size of the 1 Kilobase DNA ladder is indicated to the left of the image.

166 Table S1.

Primer Name Forward Reverse

PGSF1 CTTCAGCACGAGGGACAGCT CTGTCCGGATATTTGGTGCT BC006361 TTCTCCAACTTTGGAAGCCCAGGA TGTCTCCTTCTAGGCCCTCACAAT -169 CTAGAGGATCCGAGGACCAATTG ACCTTCAAGGTCCTCTCACC 738 CACATGGTCCTGCTGGAGTT ATCTAGAGTCGCGGCCGG 1199 GTGATGGGTCAGCAGGGCT TCCGATTCCAGGCCCACCT IAP ATTTCGCCTAGGACGTGTCA ACTCCATGTGCTCTGCCTTC

167 Chapter 6: Discussion

Conclusions:

Since the 1970’s, the concept of chromatin and DNA having a highly ordered structure has been known, with the first microscopic observation of chromatin appearing as “beads on a string”1–3.

With advancements in technology, the structure of chromatin has become increasingly clear4. Thanks to leaders in the field of chromatin biology, the mechanistic consequences of various chromatin are beginning to be understood5. Much of the early work on chromatin regulation and gene expression has been done by associating observations of the chromatin environment with upstream or downstream observations. For example, labs have mutated or deleted a known histone tail writer or erasers and then tested for transcriptional changes. One drawback of this approach is that many histone tail writers and erasers have non-histone substrates, so the downstream effects could be an indirect effect of a non-chromatin based changed. Technology enabling directly recruit endogenous chromatin readers, writers, and erasers with temporal precision will allow researchers to investigate (1) how the chromatin modifying machinery is directly effecting the chromatin environment and what histone tail modifications are increasing or decreasing (2) what other transcriptional or chromatin machinery are being recruited as a consequence (3) if there is a change in the transcriptional activity of the gene of interest (4) what cell-wide consequences results from the up- or down- regulation of this one gene in its native genomic location.

To create a system capable of redirecting endogenous chromatin regulatory machinery to a specific gene, we began combining the fields of chromatin biology, synthetic biology, and synthetic chemistry. Chemical Epigenetic Modifiers (CEMs) are able to be recruited in a gene-specific manner by incorporating FK506, a small molecule with low nanomolar binding and specificity to a protein domain, FKBP6. By fusing FKBP to a DNA binding domain, we are able to target to CEMs to a specific gene or subset of genes. The original gene-specific recruitment was achieved using a Gal4-FKBP

168 fusion and inserting a Gal4 binding array in proximity to the transcriptional start site (TSS) of our gene of interest7. While this system was found to be very efficient, we sought to make the technology more modular and be able to target mammalian genes of interest without the requirement of gene editing (to insert the Gal4 array). To accomplish this, we incorporated CRISPR/Cas9 technology. A deactivated version of Cas9 (dCas9) is capable of being recruited to virtually any region of the genome to which a guide RNA (gRNA) is designed, but will not cut the DNA8. Using a dCas9-FKBP fusions, among other dCas9-related fusions (ms2-FKBP, scFv-FKBP), we demonstrate functionality with our CEMs.

The other portion of the bifunctional CEM is the component capable of binding specific chromatin regulatory machinery. To demonstrate the proof of principle for these novel compounds we focused on recruiting machinery related to histone tail acetylation. After synthesizing and testing over

20 CEMs, we found three activating CEMs (CEMa) and three repressing CEMs (CEMr). The CEMa molecules were synthesized with derivatives shown to bind histone acetyl transferases (HAT) and lysine acetylation reader proteins (hypothesized to recruit downstream writers or transcriptional activators). CEMr molecules were synthesized with derivatives of histone deacetylase (HDAC) inhibitors, most of which were hypothesized to recruit Class I HDACs.

With these novel CEMs, we demonstrated the ability to control gene expression in a specific, dose-responsive, reversible manner. Following our proof-of-principle study, we created a video protocol so that other labs could incorporate this new CEMs technology into their own research. We also demonstrated the ability to enact changes in the chromatin environment, with our lead CEMr decreasing levels of H3K27ac7. In order to efficiently and accurately investigate what CEM-induced changes were occurring in the chromatin environment, we also developed an optimized ChIP protocol. This new protocol can now be used by anyone in the field interested in studying chromatin. Because we completed years of research to invent, thoroughly trouble-shoot, optimize, and characterize the CEMs system, there are a lot of directions that this work may now take. The initial platform for investigation and control of chromatin regulatory processes is complete, yet changes can still be made including the

(1) genes-of-interest (2) cellular context of investigation (3) gene-targeting CRISPR machinery (4) endogenous protein recruiter portion of the CEM and (5) gene target portion of the CEM.

169 Future Directions:

The CEMr and CEMa systems have evolved into a highly modular platform that can be repurposed to target virtually any gene or regulatory element. We have begun working to target genes with disease significance. For example, our CEMr set of compounds allows us to target genes that are driving human cancer such as oncogenes. We hypothesize that if we target such genes with CEMr, we can decrease expression of the gene and induce death in the cancer cells at a greater efficiency than non-cancerous cells. We have started by targeting the MYC1 gene in human malingancies. c-Myc is a well-studied oncogene that is mutated or overexpressed in most human tumor types. It has been implicated in affecting tumor cell DNA replication and transcription, growth, cellular metabolism, stem- like characteristics, and tumor microenvironment9–11. Furthermore, it has also been suggested that

MYC1 is, in part, regulated by histone acetylation levels. Bromodomain inhibitors, iBet-762 and JQ1, reduced MYC expression and caused an anti-tumor effect in Myc-dependent cancers12,13. This data makes MYC1 an ideal target for our targeted CEM-based therapy. By using gRNAs previously optimized and tested for MYC114, we will treat MYC-addicted cancer models with CEMr. We hypothesize that this will recruit HDACs to the loci, decrease MYC1 expression, and induce cell death in a targeted and specific manner. We will begin by optimizing MYC repression in 293T HEK cells, a non-disease setting so as to not confound optimization results with induction of cell death. Once we have compared the different dCas9-based recruitment systems, and performed time-course and dose- curve experiments with our set of CEMr, we will begin repressing MYC in disease models of Burkitt’s lymphoma (Daudi cells), acute promyelocytic leukemia (HL60 cells), and prostate cancer (LnCaps) which are models of cancers with driven by increased c-Myc activity15–17. We will next characterize cell death at different doses of CEMr and correspond that with decreasing levels of MYC1 expression. We will also investigate the direct downstream effects of CEM-mediated HDAC recruitment on the chromatin environment of the MYC locus (including changes in histone marks and recruitment of transcriptional/chromatin machinery). We will then test any effects on cell viability and global gene expression profiles. If successful, we will begin transferring our studies to mouse models of MYC-driven cancers in order to move a step closer to a new therapeutic.

170 Another way that we will improve and continually evolve the CEM platform is by modifying the

CRISPR-based gene targeting system that we use to anchor epigenetic activity. By expanding our

CEM-compatible gene-targeting machinery we will be able to explore using different gene delivery platforms, as well as regulating multiple genes orthogonally. Adeno-associated virus (AAV) is a gene- delivery system with many characteristics that make it ideal for use in the clinic, including little to no toxicity in small and large animal models and some level of organ-specific targeting18. One drawback of this delivery system is that the DNA being delivered needs to be less than 5kb, which can be very limiting considering the Streptococcus pyogenes (Sp) dCas9 that has been used for the CEM system is 4.2 kb. With the addition of FKBPx2 (642 bp) and the promoter (221 bp), the construct exceeds the payload limit. This size limit also prevents the ability to co-express an antibiotic selection (for isolating cells that are expressing the Cas9 machinery), the gRNA(s) (which in many cases is several), or the ms2- or scFv-fusions for increased CEM recruitment. Recently, a series of domain deletions of

CRISPR-related proteins were created and tested. A mini-version of the Sp-dCas9 (3.5 kb) was shown to be roughly 50% functional, and a mini-version of the Staphylococcus aureus (Sa) dCas9 (2 kb) was about 40% functional19. These experiments were done with a VPR activating domain, and it is possible that the “mini” dCas9-FKBP maintains close to 100% functionality with CEMs relative to the regular- sized version. This is something that we are very eager to test, because with a 2 kb version of Cas9 we could co-express several other recruitment components simultaneously with an AAV delivery method. In addition, by incorporating the CEMs system with Sa-dCas9 we will be able to target multiple genes orthogonally. An orthogonal system was previously demonstrate using the Sa-dCas9 and Sp- dCas9 to target and activate one gene, while simultaneously targeting and repressing another gene20.

With our current system, we would need to control the expression of gRNA (Sa vs Sp-compatible) based on the chemical with which we are treating the cell, or we could employ another chemical recruitment system (i.e. rapamycin or ABA),21 to recruit different regulatory machinery to different genes. For example, we could use the CEMa system to activate one gene, and the ABA system to repress a different gene.

The bifunctional CEMs can be modified to control a diverse range of cellular proceses. The linker between FK506 and the chromatin machinery recruiter is a polyethylene glycol (PEG) linker. In

171 terms of delivery, the size and length of the PEG linker can influence chemical stability and permeability22,23. For the specific purposes of the CEMs, the length may play a major role in the steric characteristics and influence how well the recruited chromatin modifying machinery is able to exert an effect on the chromatin environment of the gene locus of interest. In the future, changing the size and composition of the linker will need to be explored in relation to its permeability and CEM-related function. For example, CEM23 and CEM42 are both comprised of FK506 and the HDAC inhibitor SAHA, yet CEM23, which has a shorter linker is more effective7. We are currently unsure of whether this is due to the changes in CEM permeability, or specific to the recruited HDACs and how they are being redirected.

The endogenous protein recruiter portion of the CEMs (i.e. SAHA derivative, iBet-762 derivative) can be substituted. Frequently, there are new chemical probes discovered which we could design and synthesize a CEM24–27. Our current series of CEMs mainly focuses on proteins that recognize and/or regulate histone tail acetylation, but designing CEMs capable of redirecting histone methylation or DNA methylation readers, writers, and erasers could greatly benefit and expand the scope of what CEM technology can study. Thus, novel CEMs can be used to exert a specific and predictable effect on the chromatin environment of a gene-of-interest, or they could be used to study the effect that previously uncharacterized proteins have on chromatin.

Lastly, the DNA recruiting portion of the CEMs, FK506, can be substituted. For example, the recently published dTAG chemical system uses a molecule that binds to a mutated FKBP domain

(FKBP F36V), but not the endogenous FKBP domain28. By replacing our current FK506 with the portion of dTAG that binds FKBP F36V, we can prevent any off-targeting binding that is occurring with our current CEMs. If the CEMs are more-specifically being recruited to the dCas9-related FKBP fusions, the CEMs may exhibit higher efficiency. In addition, replacing the FK506 with a trimethoprim (TMP) we can design CEMs that bind to DHFR, rather than FKBP. A set of CEM function with FKBP fusions and a second set functional with DHFR would allow for dual activities in the Sa-dCas9 and Sp-dCas9 orthogonal systems (i.e. Sa-dCas9-FKBP and Sp-dCas9-DHFR). DHFR is a bacterial specific domain that binds TMP. It has also bene shown that a photocage can be added to TMP to prevent TMP-DHFR interaction and effectively control recruitment with light29. This would add an extra layer of control to our

172 CEMs system. For example, by exposing an organ or set of tumorigenic cells to light at the specified wavelength, we can have time and localization control of the CEM-mediated activity.

As we learn more about chromatin dysregulation in clinical settings our need for improved technology to ask complex and directed questions increases. The fields of synthetic biology and chemical biology have exponentially expanded over the past several decades. We believe the technological advances we have made will aid us in our continued investigation of chromatin biology.

Hopefully, combined with other’s work in this emerging field we can contribute to progression of these important and exciting fields bringing us new technologies for biological exploration.

173 REFERENCES:

1. Olins, A. L. & Olins, D. E. Spheroid Chromatin Units (ν Bodies). Science (80-. ). 183, 330 LP- 332 (1974).

2. Luger, K., Mäder, A. W., Richmond, R. K., Sargent, D. F. & Richmond, T. J. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389, 251–260 (1997).

3. Liegro, I. Di & Cognetti, G. Chromatin structure. Bolletino di Zool. 45, 337–351 (1978).

4. Olins, D. E. & Olins, A. L. Chromatin history: our view from the bridge. Nat. Rev. Mol. Cell Biol. 4, 809 (2003).

5. Jenuwein, T. & Allis, C. D. Translating the Histone Code. Science (80-. ). 293, 1074–1080 (2001).

6. Guo, Z.-F., Zhang, R. & Liang, F.-S. Facile functionalization of FK506 for biological studies by the thiol–ene ‘click’ reaction. RSC Adv. 4, 11400 (2014).

7. Butler, K. V., Chiarella, A. M., Jin, J. & Hathaway, N. A. Targeted Gene Repression Using Novel Bifunctional Molecules to Harness Endogenous Histone Deacetylation Activity. ACS Synth. Biol. 7, 38–45 (2018).

8. Hilton, I. B. et al. Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat. Biotechnol. 33, 510–517 (2015).

9. Nesbit, C. E., Tersak, J. M. & Prochownik, E. V. MYC oncogenes and human neoplastic disease. Oncogene 18, 3004–3016 (1999).

10. Carroll, P. A., Freie, B. W., Mathsyaraja, H. & Eisenman, R. N. The MYC transcription factor network: balancing metabolism, proliferation and oncogenesis. Front. Med. 12, 412–425 (2018).

11. Gabay, M., Li, Y. & Felsher, D. W. MYC Activation Is a Hallmark of Cancer Initiation and Maintenance. Cold Spring Harb. Perspect. Med. 4, a014241 (2014).

12. Delmore, J. E. et al. BET Bromodomain Inhibition as a Therapeutic Strategy to Target c-Myc. Cell 146, 904–917 (2011).

13. Wyce, A. et al. Inhibition of BET bromodomain proteins as a therapeutic approach in prostate cancer. Oncotarget 4, 2419–2429 (2013).

14. O’Geen, H. et al. DCas9-based epigenome editing suggests acquisition of histone methylation is not sufficient for target gene repression. Nucleic Acids Res. 45, 9901–9916 (2017).

15. Collins, S. J. The HL-60 promyelocytic leukemia cell line: proliferation, differentiation, and cellular oncogene expression. Blood 70, 1233 LP-1244 (1987).

16. Nag, A. & Smith, R. G. Amplification, rearrangement, and elevated expression of c-myc in the human prostatic carcinoma cell line LNCaP. Prostate 15, 115–122 (1989).

17. Li, Z. et al. A global transcriptional regulatory role for c-Myc in Burkitts lymphoma cells. Proc. Natl. Acad. Sci. 100, 8164 LP-8169 (2003).

18. Mingozzi, F. & High, K. A. Therapeutic in vivo gene transfer for genetic disease using AAV: progress and challenges. Nat. Rev. Genet. 12, 341 (2011).

174

19. Ma, D., Peng, S., Huang, W., Cai, Z. & Xie, Z. Rational Design of Mini-Cas9 for Transcriptional Activation. ACS Synth. Biol. 7, 978–985 (2018).

20. Gao, Y. et al. Complex transcriptional modulation with orthogonal and inducible dCas9 regulators. Nat. Methods 12, 1–9 (2016).

21. Stanton, B. Z., Chory, E. J. & Crabtree, G. R. Chemically induced proximity in biology and medicine. Science (80-. ). 359, (2018).

22. Turecek, P. L., Bossard, M. J., Schoetens, F. & Ivens, I. A. PEGylation of Biopharmaceuticals: A Review of Chemistry and Nonclinical Safety Information of Approved Drugs. J. Pharm. Sci. 105, 460–475 (2016).

23. Gursahani, H., Riggs-Sauthier, J., Pfeiffer, J., Lechuga-Ballesteros, D. & Fishburn, C. S. Absorption of Polyethylene Glycol (PEG) Polymers: The Effect of PEG Size on Permeability. J. Pharm. Sci. 98, 2847–2856 (2009).

24. James, L. I. et al. Discovery of a chemical probe for the L3MBTL3 methyl-lysine reader domain HHS Public Access Author manuscript. Nat Chem Biol 9, 184–191 (2013).

25. Herold, J. M. et al. Small Molecule Ligands of Methyl-Lysine Binding Proteins. J. Med. Chem. 54, 2504–2511 (2011).

26. Hay, D. A. et al. Discovery and Optimization of Small-Molecule Ligands for the CBP/ p300 Bromodomains. doi:10.1021/ja412434f

27. Finley, A. & Copeland, R. a. Small Molecule Control of Chromatin Remodeling. Chem. Biol. 21, 1196–1210 (2014).

28. Nabet, B. et al. The dTAG system for immediate and target-specific protein degradation. Nat. Chem. Biol. 14, 431–441 (2018).

29. Ballister, E. R., Aonbangkhen, C., Mayo, A. M., Lampson, M. A. & Chenoweth, D. M. Localized light-induced protein dimerization in living cells using a photocaged dimerizer. Nat. Commun. 5, 1–9 (2014).

175