Integrative Genome-Wide Analysis of DNA Methyltransferase 3A Reveals a Novel Model of Transcriptional Regulation Modulated by Long Non-coding RNAs

Albert D. Yu

A thesis in partial fulfilment of the requirements for the degree of

Master of Philosophy

School of Biotechnology and Biomolecular Sciences

Faculty of Science

August 2015


Abstract

LncRNAs are a class of nucleic acids that have been widely demonstrated to be involved in transcriptional regulation through epigenetic mechanisms such as DNA methylation and chromatin remodelling. Though previously shown to be directly involved in histone modification by conferring locus specificity to histone methyltransferases and acetylases, their involvement in DNA methylation is largely unappreciated. Our group has previously demonstrated that a lncRNA interacts with the de novo methyltransferase DNMT3A, and mediates its recruitment. We hypothesized that this mechanism may regulate DNMT3A recruitment globally in the genome. By integrating ChIP-seq, RNA-seq, and RIP-seq genome-wide assays, I identify two genes – BIM and PDCD4 – that demonstrate bivalent modes of DNMT3A regulation through antisense lncRNAs emanating from their first intron, BIM-as and PDCD4-as1. Truncation of the mRNA-overlapping region of BIM-as and PDCD4-as1 results in decreased DNMT3A localization, without affecting promoter methylation or cognate mRNA expression. Surprisingly, RNAi suppression of PDCD4-as1 results in increased DNMT3A recruitment and PDCD4 mRNA expression without affecting PDCD4 promoter methylation. Finally, I demonstrate that loss of DNMT3A can occur despite accumulation of H3K9me3. My results support previous findings that DNMT3A has a wide breadth of context-dependent activity, but demonstrate novel molecular mechanisms that govern its function.

Table of Contents

ABSTRACT

INDEX OF TABLES

INDEX OF FIGURES

ACKNOWLEDGEMENTS

ABBREVIATIONS

INTRODUCTION
  LITERATURE REVIEW
  THE BIOLOGY OF LONG NON-CODING RNAS
  LNCRNAS FORM COMPLEXES WITH PRC2 TO MEDIATE REPRESSION
  LNCRNAS FORM COMPLEXES WITH WDR5 TO MEDIATE GENE ACTIVATION
  LNCRNAS FORM COMPLEXES WITH A VARIETY OF POTENTIAL PARTNERS
  HNRNP ACTIVITY IS MEDIATED BY LNCRNAS
  LNCRNAS ARE COMPETITIVE INHIBITORS OF PROTEINS
  LNCRNA TRANSCRIPTION IS INVOLVED IN B CELL DIVERSITY
  VIRAL-ENCODED LNCRNAS MEDIATE PATHOGENICITY
  LNCRNAS ARE FREQUENTLY DYSREGULATED IN CANCER
  CONCLUSION
  DNA METHYLATION
  DNA METHYLTRANSFERASES
  DNA METHYLTRANSFERASES INTERACT WITH LNCRNAS
  TECHNOLOGY USED IN THIS THESIS
  RNA-SEQ AND CHIP-SEQ REVEAL CHROMATIN BINDING AND REGULATORY POTENTIAL PROFILES
  USE OF RIP-SEQ TO REVEAL POTENTIAL MODULATORS OF DNMT3A RECRUITMENT
  EXPERIMENTAL MODULATION OF LNCRNAS
  HYPOTHESIS
  AIMS

METHODS
  CELLS AND TISSUE CULTURE
  GENOME BROWSERS
  GENERATION AND ANALYSIS OF GENOME-WIDE DNMT3A INTERACTION DATA
  RNA-SEQ
  RIP-SEQ
  CHIP-SEQ
  VALIDATION OF RIP-SEQ BY RIP-QPCR
  CHIP-QPCR FOR VALIDATION AND EXPERIMENTATION
  DESIGN AND CLONING OF SHRNAS
  TRANSFECTION AND ANALYSIS OF SHRNAS
  DESIGN AND CLONING OF SMALL GUIDE RNAS
  DESIGN AND CLONING OF HOMOLOGY DONOR VECTORS/PROMOTERS
  CAS9-MEDIATED GENERATION AND VALIDATION OF KNOCKOUT MUTANTS
  BISULFITE CONVERSION AND METHYLATION ANALYSIS

RESULTS
  CHIP-SEQ OF DNMT3A
  DNMT3A KNOCKDOWN RNA-SEQ
  DNMT3A RIP-SEQ
  INTEGRATING CHIP-SEQ WITH RNA-SEQ DATA
  IDENTIFYING GENES OF INTEREST
  VALIDATION OF SEQUENCING BY CHIP AND RIP-QPCR
  KNOCKOUT OF LNCRNAS
  PROMOTER DYNAMICS
  INVESTIGATION OF SLIT2
  INTERROGATION OF PDCD4
  INTERROGATION OF BIM

DISCUSSION
  DNMT3A INTERACTS WITH NON-CODING RNAS
  DNMT3A MAY INTERACT WITH FIRRE
  DNMT3A RECRUITMENT IS CONTINGENT ON A NON-CODING RNA TRANSCRIPT
  DNMT3A IS MODULATED BY PDCD4-AS1 IN A BIMODAL FASHION
  PDCD4 MAY BE SUBJECT TO ACTIVE DEMETHYLATION AND METHYLATION CYCLING
  DNMT3A IS RECRUITED BY BIM-AS INDEPENDENTLY OF H3K9ME3

FUTURE DIRECTIONS AND CONCLUSIONS
  DNMT3A MAY BE MODULATED IN A NUMBER OF POSSIBLE FASHIONS
  INVESTIGATION IN DNMT3A INTERACTION IS NECESSARY
  ICLIP WILL REVEAL DNMT3A BINDING DOMAINS ON RNA
  MANY MYSTERIES ON DNMT3A WILL HAVE TO BE ANSWERED

REFERENCES

Index of Tables

Table 1 DNMT3A RIP Primers
Table 2 DNMT3A ChIP Primers
Table 3 shRNA Targets
Table 4 U6M2 Colony PCR and Sequencing
Table 5 RT-qPCR primers
Table 6 sgRNA target sites
Table 7 HDR Template Donor Vector Cloning Primers
Table 8 Cas9 Excision Validation Primers
Table 9 Bisulfite-PCR Primers
Table 10 Top ChIP-seq Enriched Genes
Table 11 Most highly differentially expressed RNA-seq genes
Table 12 Top hits from two approaches to RIP analysis
Table 13 Gene candidates for experimental interrogation

Index of Figures

Figure 1 Popular lncRNA nomenclature
Figure 2 Common mechanisms of lncRNA regulation.
Figure 3 Select representations of specific lncRNA function.
Figure 4 Two gene excision based methods of lncRNA perturbation.
Figure 5 Features of the U6M2 RNAi expression vector.
Figure 6 Features of the pCDNA-H1 vector.
Figure 7 Features of the Shen Group HDR template donor vector.
Figure 8 Features of our Cas9.BFP vector.
Figure 9 GO Terms of Genes Enriched in ChIP-seq.
Figure 10 CummeRbund: Depletion of DNMT3A in RNA-seq Data
Figure 11 CummeRbund: PCA plot of conditions and replicates
Figure 12 of RNA-seq genes.
Figure 13 Gene ontology of RIP-seq genes.
Figure 14 BETA-Plus: Functional prediction of DNMT3A activity.
Figure 15 Functional GO categories of genes predicted to be regulated by DNMT3A
Figure 16 UCSC Genome Browser: PDCD4.
Figure 17 FANTOM5 Zenbu: PDCD4.
Figure 18 UCSC genome browser: BIM
Figure 19 FANTOM5 Zenbu: BIM.
Figure 20 ICGC Data Portal: BIM.
Figure 21 FANTOM5 Zenbu: SLIT2.
Figure 22 UCSC genome browser: DLG5 and EPHA5.
Figure 23 ChIP-qPCR of DNMT3A
Figure 24 RIP-qPCR of DNMT3A.
Figure 25 Cas9 design strategy for knockout of candidate lncRNAs.
Figure 26 Gel: Truncation and Promoter Excision of PDCD4-as1
Figure 27 Gel: Truncation and Promoter Excision of BIM-as.
Figure 28 Gel: Promoter Excision of SLIT2
Figure 29 BFP and mCherry expression of HDR Template/Promoter Clones.
Figure 30 qPCR of SLIT2 genes following ablation.
Figure 31 qPCR of PDCD4 genes following ablation.
Figure 32 ChIP of DNMT3A at the PDCD4 locus following ablation.
Figure 33 Lollipop chart of PDCD4 locus methylation following ablation
Figure 34 qPCR of BIM genes following ablation
Figure 35 DNMT3A ChIP at the BIM locus following ablation.
Figure 36 Lollipop chart of methylation of BIM locus following ablation.
Figure 37 H3K9me3 ChIP at the BIM locus following ablation.
Figure 38 IGV: Firre ChIP-seq peak.
Figure 39 Some models of DNMT3A-lncRNA interaction.

Acknowledgements

I’d like to thank Kevin Morris for his creativity and support in my endeavor, and for indulging me in my ideas and engaging in invaluable dialogue that guided me at critical junctures throughout my project. This whole thing is owed to you, Kevin – thank you for making this possible.

Thank you to Per Johnsson and Rosie Duttsie, whose work formed the foundation of my own and to whom I always looked back for guidance, even if they weren’t aware of it themselves. Thank you to Shirley Liu for taking me into her group and Yiwen Chen for teaching me the fundamentals of high throughput data analysis – lessons that carried me through my project. Thank you to Philip Uren for his assistance in Piranha and RIP-seq analysis – my RIP-seq data would’ve gone unanalyzed if it weren’t for your efforts to help a stranger.

Thanks must go to the whole Morris lab, past and present, for their assistance throughout my degree. Matthew Clemson, for your lab savvy and expertise, without which the lab probably wouldn’t exist. Caio, Jess, Nick, David, Leura, Chris, and Rosie, and also everyone down the hall – for creating a hospitable environment that made all these years bearable – even warm – and for all the support and advice that they’ve rendered throughout. Big thank you to Zichen Wang, for always challenging my ideas and forming the other half of a dialectic that propelled the maturation of my ideas and knowledge. Thank you to the visiting students and the undergrads, Erin Harvey, Jesus Castro, Le Thai, and Cristina Diego for helping shoulder the load of interrogating and validating seemingly insurmountable tracts of data. Your willingness to learn was encouraging and revitalizing, and the speed at which you all learned was simply astounding.

Thank you to Erin Buczynski, for your unwavering support and understanding – helping to shoulder my burden in times of stress and celebrating with me in times of success. Without your support, I wouldn’t have made it here today. Your support has been more than I could dream of.

Finally, big thank you to my family for their endless support in my work. To my grandmother, Guanghua Yin, for raising me to be the person I am today. To my sister, Julia Yu, for whatever. To my parents, Junying Yuan and Qiang Yu, for their resources and advice that have, more than once, helped steer the direction of my work, and for proofreading the monstrous construct you see before you now. You two are the reason I do science.

Abbreviations

5caC 5-carboxylcytosine
5fC 5-formylcytosine
5mC 5-methylcytosine
5hmC 5-hydroxymethylcytosine
ADD ATRX-DNMT3-DNMT3L
as Antisense
asRNA Antisense ribonucleic acid
BCL2L11 B-cell lymphoma-2-like protein 11 (also known as BIM)
BFP Blue fluorescent protein
BIM B-cell lymphoma-2 interacting mediator of cell death
bp Base pair
CAGE Cap analysis of gene expression
Cas9 Clustered regularly interspaced short palindromic repeats associated protein 9
CBX7 Chromobox homolog 7
cDNA Complementary DNA
CEBPA CCAAT/enhancer-binding protein alpha
CGI Cytosine Phosphate Guanine Island
ChIP Chromatin Immunoprecipitation
ChIRP Chromatin isolation by RNA purification
CLIP Cross-linking immunoprecipitation
CMV Cytomegalovirus
CpG Cytosine Phosphate Guanine
CRISPR Clustered regularly interspaced short palindromic repeats
CTCF CCCTC-binding factor
DLG5 Discs Large Homolog 5
DNA Deoxyribonucleic acid
DNMT Deoxyribonucleic acid methyltransferase
dsRNA Double-stranded ribonucleic acid
DMEM Dulbecco’s Modified Eagle Medium
DTT Dithiothreitol
EBER2 Epstein-Barr virus encoded ribonucleic acid 2
EBV Epstein-Barr virus
ecCEBPA Extra-coding CCAAT/enhancer-binding protein alpha
EDTA Ethylenediaminetetraacetic acid
EPHA5 Ephrin type-A receptor 5
EST Expressed sequence tag
EtOH Ethanol
FBS Fetal bovine serum
FC Fold change
GO Gene ontology

H#K#me# Histone# Lysine# with #methylation
H3T3 Histone 3 Threonine 3
HCl Hydrochloric Acid
HEK293 Human embryonic kidney 293
HEPES 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid
HDAC1 Histone Deacetylase 1
hnRNP Heterogeneous nuclear ribonucleoprotein
HOTAIR Homeobox transcript antisense RNA
HOTTIP HOXA transcript at the distal tip
HOX Homeotic
HP1 Heterochromatin protein 1
iCLIP Individual-nucleotide resolution cross-linking and immunoprecipitation
iDRiP Identification of direct RNA interacting proteins
IgG Immunoglobulin G
ICGC International Cancer Genome Consortium
KCl Potassium chloride
KD Knockdown
KO Knockout
LB Lysogeny broth
lincRNA Long intergenic non-coding ribonucleic acid
lncRNA Long non-coding ribonucleic acid
M-MLV Moloney Murine Leukemia Virus
mRNA Messenger ribonucleic acid
miRNA Micro ribonucleic acid
NaDOC Sodium Deoxycholate
NaCl Sodium chloride
nt Nucleotide
ODN Oligonucleotide
paRNA Promoter-associated ribonucleic acid
PAM Protospacer adjacent motif
PAR-CLIP Photoactivatable-Ribonucleoside-Enhanced Crosslinking and Immunoprecipitation
PARTICLE Promoter of MAT2A-antisense radiation induced circulating lncRNA
PBS Phosphate buffered saline
PCA Principal Component Analysis
PCR Polymerase chain reaction
PDCD4 Programmed cell death 4
PIPES Piperazine-N,N′-bis(2-ethanesulfonic acid)
PTEN Phosphatase and tensin homolog
PTENpg1 Phosphatase and tensin homolog pseudogene 1
PSG Penicillin streptomycin glutamine
PRC2 Polycomb repressive complex 2
qPCR Quantitative polymerase chain reaction

RAP RNA antisense purification
RCF Relative centrifugal force
RIP RNA immunoprecipitation
RIPA Radioimmunoprecipitation assay
RNA Ribonucleic acid
RNAi Ribonucleic acid interference
RNAse Ribonuclease
RPM Rotations per minute
RT Reverse transcribe/transcriptase
rRNA Ribosomal ribonucleic acid
SDS Sodium dodecyl sulfate
SETDB1 SET Domain Bifurcated 1
sgRNA Small guide ribonucleic acid
shRNA Small hairpin ribonucleic acid
siRNA Small interfering ribonucleic acid
SLIT2 Slit homolog 2
SOC Super optimal broth with catabolite repression
TDG Thymine-DNA glycosylase
TE Tris-ethylenediaminetetraacetic acid
TET Ten-eleven translocation methylcytosine dioxygenase
TGF Transforming growth factor
TSS Transcription start site
UTR Untranslated region
UV Ultraviolet
XIST X-inactive specific transcript

Introduction

Literature Review

Note: This section is adapted from my published review [1].

The Biology of Long Non-Coding RNAs

The era wherein non-protein coding DNA was dismissed exclusively as junk DNA has long since passed; it has now been widely demonstrated that vast swathes of these non-protein coding regions exhibit a variety of functional and regulatory roles [2]. The extent to which these genomic regions are functional is controversial – high estimates claim as much as 80%, though others are quick to refute this figure [3,4]. Nevertheless, many of these genomic elements have been extensively investigated and validated – not least of which are long non-coding RNAs (lncRNAs). lncRNAs are arbitrarily designated as any non-protein coding transcript greater than 200 nucleotides. They’re typically classified based on their orientation and position relative to protein coding genes and/or their origin or nature. The specific classification of lncRNAs is still a matter of some debate, although some common classifications with respect to protein coding genes are shown in Figure 1 [5,6].


Figure 1 Some popular lncRNA nomenclature. lincRNAs refer to non-coding genes located distal to other protein coding genes. paRNAs refer to lncRNAs emanating from the promoter and overlapping the gene. Some lncRNAs emanate from within the intronic region of a protein coding gene, sometimes overlapping exons as well. asRNAs are located proximal to a protein coding gene but are transcribed from the opposite strand. asRNAs can be non-overlapping and driven by a bidirectional promoter, or they can emanate from within an intronic region of the gene and overlap one or more exons.

The lncRNA’s point of origin also often plays a role in its characterization. Pseudogenic lncRNAs, arising from formerly protein-coding genes, are typically referred to as pseudogenes, rather than lncRNAs [7]. Enhancer RNAs (eRNAs) are typically unspliced, non-polyadenylated transcripts emanating from enhancer regions [8,9]. Occasionally, but not always, the classification of a lncRNA plays a role in its function. Some exert their regulatory action through transcriptional gene regulation, such as HOTAIR, XIST, and PTENpg1 antisense [10-12]. Others regulate mRNA stability, such as PTENpg1 sense, KRAS1P, and MYLKP1 [13,14]. Looking beyond the handful that have already been thoroughly explored, however, thousands of other mammalian lncRNAs have been found to be associated with protein coding genes involved in a wide variety of activities – not least of which are genes related to immune responses [15]. Current research has discovered a considerable number of lncRNAs associated with disease states – the global dysregulation of gene expression in cancer being an obvious and prominent target of consideration. What might otherwise be a silent or lowly transcribed lncRNA may find new purpose in the pathogenesis of cancer (reviewed in [16]). In a similar vein, irregular lncRNA expression has been associated with genetic disorders as well [17,18]. LncRNAs have even been implicated in the pathogenesis of and response to bacterial and viral infections such as Theiler’s Disease and HIV [19,20]. It might be taken as somewhat of a surprise, then, that relatively few lncRNAs associated with the immune response have been the subject of study, and even fewer have been thoroughly explored functionally and mechanistically – but this is not altogether unexpected, as genetic and epigenetic regulation by lncRNAs is very much a budding field. When taking into consideration the diverse array of roles already demonstrated for various lncRNAs, it should be clear that a comprehensive understanding of immunity demands inquiry into lncRNAs, and that such a pursuit is worthwhile given the significant and diverse forms of regulatory action already identified among lncRNAs.

This chapter will aim to provide an overview of lncRNAs that have already been implicated in the regulation of the immune system with regard to possible paradigms governing their mode of action and the means of their discovery. Furthermore, this chapter will also examine some lncRNAs encoded by viruses that play a role in their pathogenesis, highlighting the importance of lncRNA-mediated regulation across a wide variety of functions.

A common mechanism attributed to lncRNA function is the ability to modulate expression by altering the transcriptional activity of a target gene. Currently, a wide variety of mechanisms have been identified by which lncRNAs exert such regulatory action, and as a result, lncRNA functionality has defied strict categorization into broad global paradigms; however, certain mechanisms have been demonstrated with notable frequency that may or may not eventually prove to be a global regulatory function.

LncRNAs form complexes with PRC2 to mediate gene repression

Figure 2 Common mechanisms of lncRNA regulation. (A) lncRNAs can act as competitive inhibitors of DNA-binding transcription factors and transcriptional repressors. GAS5 prevents glucocorticoid receptor localization to cognate genes, initiating growth arrest. Lethe binds RelA and blocks its transcriptional action. PACER sequesters the repressive p50 homodimer and allows the p50–p65 heterodimer to activate COX-2 expression [18]. (B) lncRNAs can act in cis to recruit WDR5/MLL to mediate cognate gene activation. NeST recruits the WDR5/MLL complex to increase IFN-γ during Salmonella enterica infection. (C) PRC2 can be recruited both in cis and in trans to modulate gene repression. LRRFIP1 and PRC2 bind to a lncRNA encoded upstream of the TNF promoter to induce closed chromatin at the TNF promoter [1].

To date, the vast majority of lncRNA functionality has been attributed to some form of association with proteins, often playing a critical role in their function (reviewed in [21]). The exact nature of this association tends to vary – especially given the dearth of mechanistic research – but certain patterns have emerged nonetheless. Though RNA binding domains are quite common, their diversity confers a level of RNA specificity through secondary structure or sequence. Many lncRNAs have been found to operate as protein scaffolds – capable of bringing into proximity disparate proteins and protein complexes with RNA binding domains. One well characterized example is HOTAIR, which represses the HOXD locus in trans from 40 kilobases away [10]. The 5’ domain of HOTAIR binds the polycomb repressive complex 2 (PRC2), while the 3’ domain binds the LSD1/CoREST/REST complex [22] (Figure 2). Independently, both have been found to play some role in epigenetic repression through the deposition of histone 3 lysine 27 trimethylation marks (H3K27me3) or the removal of activating epigenetic marks; the advantage of joining them in a single complex may involve the compounded repressive effect, or allowing one complex to benefit from the locus specificity of the other, or both. Regardless, HOTAIR presents a well-characterized example by which other lncRNAs may be gauged.

Indeed, many others have been found to operate in a similar fashion – including some that modulate the immune system. An unnamed lncRNA was found to interact with transcriptional repressor LRRFIP1, along with PRC2 subunits SUZ12 and EZH2, to drive H3K27me3-based repression of TNF [23] (Figure 2). Shi et al. first identified LRRFIP1 as a transcriptional repressor of TNF and showed that it was also an RNA binding protein. They demonstrated that LRRFIP1 localization to the TNF promoter was contingent upon interaction with a lncRNA encoded upstream of the TNF promoter, as well as complex formation with EZH2 and SUZ12. LRRFIP1 is not currently considered a canonical member of PRC2 – so it’s possible that it may be analogous to the LSD1/CoREST/REST complex in interacting with this unnamed lncRNA. It would not be unreasonable to speculate that LRRFIP1 functionality is contingent on the lncRNA forming a scaffold between it and PRC2, allowing PRC2 to take advantage of LRRFIP1’s specificity to the TNF locus to exert its repressive effect, as it does with HOTAIR.

Another lncRNA, lnc-IL7R, was found to be a negative regulator of LPS-induced expression of E-selectin, VCAM-1, IL-6, and IL-8 [24]. Though the authors did not demonstrate any particular protein-binding partner of lnc-IL7R, they did find lnc-IL7R to be responsible for the deposition of H3K27me3 marks as well – however, they found no interaction with PRC2 members. Though many lncRNAs have been found to interact with PRC2, many other proteins have been shown to work in concert with lncRNAs to induce transcriptional repression [12,25]. Though PRC2 interaction is a tempting course for establishing a global paradigm (some 20% of lncRNAs have been found to associate with it), there remains a strong likelihood that other means exist by which lncRNAs may exert some repressive action [26].

Kambara et al. performed an RNA-seq screen to identify lncRNAs upregulated in primary hepatocytes in response to IFN-α stimulation – mimicking the innate immune response against Hepatitis C virus (HCV) infection [27]. They identified around 200 annotated putative lncRNAs upregulated in response to stimulation, but focused their efforts on the most highly upregulated lncRNA – lncRNA-CMPK2. They found that lncRNA-CMPK2 is regulated by the JAK-STAT signaling pathway, and that it is involved in the downregulation of the interferon response. Knockdown of lncRNA-CMPK2 in HCV infected cells resulted in a significant decrease in HCV RNA and upregulation of antiviral IFN-stimulated genes (ISGs). Furthermore, lncRNA-CMPK2 was found to be upregulated in HCV patients – positioning lncRNA-CMPK2 as a potential target for HCV treatment. Though they did not determine the exact molecular mechanism, they did find that lncRNA-CMPK2 was localized to the nucleus and that it was involved in transcriptional repression – meaning it may very well interact with some of the protein binding partners described earlier.

LncRNAs form complexes with WDR5 to mediate gene activation

Though the lncRNAs discussed thus far have primarily been involved in the formation of silent heterochromatin, lncRNAs have been found to be involved in the development and maintenance of active chromatin as well. A particularly notable example of this is HOTTIP, a lncRNA responsible for the maintenance of histone H3 lysine 4 trimethylation (H3K4me3 – a mark of active chromatin, not to be confused with H3K27me3 discussed earlier) across the HOXA locus [28]. HOTTIP interacts with the adaptor protein WDR5 and histone methyltransferase mixed-lineage leukemia (MLL) to maintain active chromatin across the HOXA locus. Later investigation found that WDR5 activity is dependent on lncRNA binding, positioning WDR5 as a likely candidate for a global mechanism of lncRNA-mediated gene activation [29] (Figure 2). At least one such lncRNA has been found to modulate the immune response as well: nettoie Salmonella pas Theiler’s (NeST) is a lncRNA adjacent to the IFN-γ locus that is expressed in T cells [19]. NeST was found to interact with WDR5 to confer an increase in H3K4me3 at the Ifng locus (Figure 2). Curiously, while the expression of NeST led to greater resistance to Salmonella enterica infection, it also caused increased susceptibility to Theiler’s virus. The authors speculate that this is not related to some sort of contradictory function of the lncRNA, but rather that it’s a reflection of the importance of timing and magnitude in an immune response. NeST and HOTTIP are both considered to operate in cis – they lie adjacent to their target gene, and as such, any activity of the WDR5-MLL complex following recruitment can likely be attributed to simple proximity. Yang et al., however, found that WDR5 bound RNAs without binding to the genomic loci from which they were transcribed – suggesting that WDR5 may be recruited in trans, in a similar fashion to HOTAIR and PRC2 [29].

LncRNAs form complexes with a variety of potential protein partners

Figure 3 Select representations of specific lncRNA function. (a) NRON acts as a scaffold binding NFAT and other proteins to maintain NFAT’s localization in the cytoplasm. Upon T-cell activation, NRON is released from NFAT and NFAT is transported into the nucleus. (b) NEAT1 acts as an essential structural component of paraspeckles, which sequesters repressive protein SFPQ1 from the IL-8 promoter.

Though WDR5 and PRC2 are two well-characterized examples of lncRNA partners, evidence is constantly emerging for interactions with other proteins. One such lncRNA, noncoding repressor of NFAT (NRON), acts as a scaffold to prevent nuclear translocation of nuclear factor of activated T cells (NFAT) [30]. NRON binds scaffold protein IQ motif containing GTPase activating protein (IQGAP) and three kinases (casein kinase 1, glycogen synthase kinase 3, dual specificity tyrosine phosphorylation regulated kinase) to maintain NFAT in a phosphorylated state and prevent its import into the nucleus (Figure 3). Ca2+ stimulation induces the dephosphorylation of NFAT and its release from NRON and associated proteins. The NFAT family of proteins plays a significant role in T-cell function and immune response; the discovery of NRON highlights an underappreciated role for lncRNAs in immune modulation.

hnRNP activity is mediated by lncRNAs

Another potential class of lncRNAs comprises those interacting with heterogeneous nuclear ribonucleoproteins (hnRNPs). Two have been identified to modulate immune responses – lincRNA-cox2 and THRIL [31,32]. Though a thorough understanding of their action has yet to be fully realized, enough evidence has been accumulated for confidence in their functionality. LincRNA-cox2 has received attention on two separate occasions: in one study, it was the most differentially regulated lincRNA following stimulation of Toll-like receptor Tlr4 in bone-marrow-derived dendritic cells [15]. Recently, lincRNA-cox2 was also identified to be heavily differentially expressed following stimulation of bone-marrow-derived macrophages with synthetic bacterial lipopeptide Pam3CSK4 [31]. THRIL was identified in a similar fashion – it was singled out from a microarray screen of THP1-derived macrophages stimulated with Pam3CSK4 [32]. Silencing of lincRNA-cox2 resulted in the upregulation of several genes involved in immune response, including Ccl5, Cx2c11, Ccrl, Irf7, Oas1a, Oas1l, Oas2, Ifi204, and Isg15. Interestingly, lincRNA-cox2 silencing also led to the upregulation of other genes, including Tlr1, Il6, and Il23a. Though the authors did not differentiate between direct targets of lincRNA-cox2 and downstream effects, it’s clear that lincRNA-cox2 plays a significant role in regulating the immune response. THRIL, on the other hand, was found to have a direct interaction with the promoter of TNFα. Similar to lincRNA-cox2, THRIL knockdown under Pam3CSK4 stimulation was found to differentially regulate a number of immune response genes – including Il-8, Cxcl10, Ccl1, and Csf1. Perhaps by serendipity, THRIL was found to interact directly with the TNFα promoter in complex with hnRNP-L. Indeed, Li et al. demonstrated that THRIL knockdown reduced hnRNP-L localization to the TNFα promoter. Though Carpenter et al. discovered a similar interaction between lincRNA-cox2 and hnRNP A/B and A2/B1, they did not demonstrate any direct interaction between the lncRNA/hnRNP complex and a gene promoter. This does not mean that no interaction is occurring, however; hnRNPs have traditionally been considered to be primarily involved in mRNA processing, but hnRNP-A/B and hnRNP-L have both been implicated in transcriptional repression through direct promoter interaction as well [33,34]. At least two other lncRNAs have been described to interact with hnRNPs, suggesting that hnRNPs may also prove to be a potent and overlooked source of lncRNA-mediated functionality [35,36].

LncRNAs are competitive inhibitors of proteins Protein-lncRNA interactions extend beyond lncRNA scaffolding. One such lncRNA is Growth-Arrest-Specific Transcript 5(GAS5): GAS5 was first identified as an overexpressed gene in mice with an increased susceptibility neural tube defects, and was later found to be essential for cell cycle control and in T cells37,38. GAS5 acts as a competitive inhibitor of the transcription factor glucocorticoid receptor (GR) by binding to the DNA-binding domain and preventing its localization to glucocorticoid responsive elements (GRE)39(Figure 2). Though the examples discussed thus far involved direct interaction of a lncRNA with the chromatin of its cognate gene, lncRNA-based transcriptional regulation is not necessarily contingent on direct chromatin interaction. Indeed, immune response regulation has demonstrated itself to be a surprisingly bountiful source of lncRNA-related transcriptional regulation by sequestration of regulatory proteins. Interestingly, the most common subject of regulation tends to be a subunit of NF-κB. For example, Lethe is a non-coding RNA found to be involved in negative feedback of RelA – a subunit of NF-κB responsible for transcriptional activation40(Figure 2). Rapicavoli et al. identified Lethe through its activation by TNFα and IL-1β expression, as well as through anti-inflammatory agent dexamethasone treatment. Interestingly, they also found that Lethe expression did not increase when exposed to microbial components. This particular phenomenon becomes quite coherent when considering Lethe’s function: Lethe is expressed by

22 RelA-mediated transcription upon TNFα-driven activation, but it proceeds to bind RelA and sequester it away from DNA. It seems likely, then, that Lethe plays a role in the regulation of the immune response by down-regulating inflammation in the absence of inflammatory agents. Meanwhile, the lncRNA p50-associated COX-2 extragenic RNA (PACER) offers a diametric example: PACER has been found to sequester p50, a repressive subunit of NF-κB. Krawczyk and Emerson found that PACER expression promoted active chromatin through p300 histone acetyltransferase recruitment and increased RNA polymerase II occupancy of the COX-2 promoter (Figure 2). To explain its mechanism, they found that PACER co-precipitated with p50, and that PACER knockdown resulted in increased p50 homodimer occupancy of DNA upon LPS stimulation in U937-derived macrophages. By occluding p50 homodimers, PACER allows p50-p56 heterodimers to bind and activate COX-2 expression. Lethe and PACER provide us two examples of inflammatory mediation by lncRNAs through NF-κB regulation. A frequently cited model of lncRNA regulation is that lncRNAs act to confer some form of loci specificity – however, Lethe seems to defy this notion – it acts to reduce RelA binding to several NF-κB regulated genes, including Il6, Sod2, Il8 and Nfkbia. It would be curious to see if there are other lncRNAs that behave in a similar fashion to Lethe, or if Lethe acts as a global layer of regulation for all targets of NF-κB regulation. It would also be interesting to see whether PACER confers a similar brand of non-specific activation. Though not quite of the same mode of action as PACER or Lethe, nuclear enriched abundant transcript1 (NEAT1) also facilitates transcription of immunity genes through sequestration of a transcriptional repressor41. Saha et al. 
first identified a lncRNA expressed in the mouse central nervous system upon infection by Japanese encephalitis virus or rabies virus, which they termed virus-inducible ncRNA (VINC)42. Imamura et al. followed up on the investigation of VINC, now termed NEAT1, and found that its expression was induced by poly I:C – an immunostimulant simulating viral infection – through TLR3, and noted that it was linked to large paraspeckle formation. Indeed, NEAT1 has been demonstrated to be an integral architectural component of nuclear paraspeckles43. They proceeded to identify that NEAT1 was, by some mechanism, required for the transcription of IL8. Combined with the knowledge that paraspeckles tend to contain transcriptional regulators, including the transcriptional repressor SFPQ and NONO, Imamura et al. speculated that NEAT1-induced paraspeckles might act to sequester SFPQ from the IL8 promoter (Figure 3). Supporting their hypothesis, they found that poly I:C stimulation led to decreased occupancy of SFPQ at the IL8 promoter and increased occupancy within paraspeckles. NEAT1 regulation of the immune response is admittedly further downstream than the other examples discussed thus far, but it is nevertheless an intriguing and important source of regulation and response to viral stimulus.

LncRNA transcription is involved in B cell diversity

Two particularly striking examples of lncRNA-related regulation have the curious feature of being similar in both mechanism and function. Both examples present evidence for a global mechanism of B-cell biogenesis – they demonstrate that lncRNA transcription is involved in both V(D)J recombination and activation-induced deaminase (AID)-mediated somatic hypermutation (SHM) and class switch recombination (CSR): processes that serve to promote B and/or T cell diversity. Though lncRNAs are generated during regulatory steps of these processes, current evidence seems to suggest that the lncRNAs themselves play no particular role in regulation; instead, it is hypothesized that the act of their transcription forms the regulatory mechanism. V(D)J recombination involves somatic recombination between an assorted set of variable (V), diversity (D), and joining (J) genetic loci to produce antibodies that recognize antigens as needed. Bolland et al. discovered an antisense transcript emanating from the VH region expressed in a strictly regulated fashion during VH to DJH recombination44. They hypothesize that this intergenic antisense transcription
opens up the chromatin to facilitate VH to DJH recombination by promoting histone acetylation, and that therefore the act of transcription itself plays the functional role. Eight years later, Verma-Gaur et al. developed an explanation for V(D)J recombination in B-lineage progenitors based on transcription factories – a hypothesis that states that transcription occurs in spatially fixed clusters of RNA polymerase II to which the chromatin localizes, rather than the other way around as is traditionally thought45,46. They found that antisense transcription occurred primarily around Pax5-activated intergenic repeat (PAIR) elements in the V region. A conformation capture experiment revealed that this antisense transcription occurred proximally to the ordinarily distal Eµ enhancer – suggesting that antisense transcription at the V region causes it to localize at the same transcription factory as the Eµ region. Indeed, they found that a reduction of antisense transcription via YY1 knockout eliminated V and Eµ region co-localization – but that benzimidazole (DRB) treatment to repress transcriptional elongation did not, supporting the idea that this spatial mediation was governed by transcription itself. Following V(D)J recombination, immunoglobulin heavy-chain genes experience another series of recombination events that further promote antibody diversity – SHM acts to introduce novel V region variety, while CSR allows the antibody to alter its isotype. Both processes are mediated by AID, and Pefanis et al. reason that AID recruitment to target loci is dependent on antisense transcription of RNA exosome transcripts47. They demonstrate that deletion of the Exosc3 subunit of the RNA exosome complex reduces levels of CSR and SHM while increasing the expression and extending the length of transcription start site-associated antisense transcripts that also act as RNA exosome substrates (xTSS-RNA).
This, coupled with the finding that xTSS-RNAs are most strongly expressed at genes that are targeted by AID, suggests that xTSS-RNA expression is critical for AID-mediated mutation and/or translocation. Pefanis et al. hypothesize that divergent transcription coupled with inefficient RNA exosome-mediated degradation of a paused antisense transcript prolongs transcription-mediated R-loops and grants increased stability to replication protein A (RPA)-maintained ssDNA structures, which are required by AID as a substrate.

Viral-encoded lncRNAs mediate pathogenicity

Thus far, we have discussed a number of lncRNAs that have been demonstrated to be expressed and functional in response to immune challenge – most often following artificial LPS stimulation. These lncRNAs present an important and novel means of regulating induced immune responses, and studying them attempts to recapitulate host-pathogen interactions from a host-centric perspective. A comprehensive understanding of host-pathogen interaction demands extensive investigation of the other side of the field as well – and indeed, lncRNAs have also been demonstrated to be integral to pathogen function, impacting host interaction in a variety of ways. One such lncRNA was recently described for HIV-1. This HIV-1-encoded antisense RNA appears to be involved in establishing viral latency20. This antisense lncRNA emanates from the 3’ UTR across the HIV-1 genome and localizes to the 5’ UTR, recruiting epigenetic remodeling complexes containing DNMT3A – a de novo DNA methyltransferase that has been previously implicated in epigenetic silencing12 and RNA-directed transcriptional gene silencing48. Another similar example is polyadenylated nuclear RNA (PAN) expressed from Kaposi’s sarcoma-associated herpesvirus (KSHV)49,50. Like the HIV-1-encoded antisense lncRNA, PAN has been found to localize to its own genome. Unlike the HIV-1 antisense, however, PAN has been found to typically be responsible for the activation of transcription programs in KSHV, to such an extent that repression of PAN resulted in the repression of almost all KSHV transcription (although it may also be involved in repression to some degree)50. Interestingly, PAN has been found to have a function on the host genome as well; like many lncRNAs discussed in this review, PAN has been found to associate with
PRC2 (EZH2 and Suz12 specifically) to induce heterochromatin formation – it has been implicated in the downregulation of genes encoding interferon, interleukin-18, alpha interferon 16, and RNase L. The idea that a viral lncRNA is capable of modulating its host immune system is not altogether surprising, but it does illuminate vast new avenues of investigation. PAN is not the first viral lncRNA reported to interact with its host: in 2007, Reeves et al.51 found that human cytomegalovirus (HCMV) encodes a 2.7 kb non-coding transcript named β2.7. They found that β2.7 interacts with the genes associated with retinoid/interferon-induced mortality-19 (GRIM-19) subunit of mitochondrial complex I, preventing the relocation of complex I following apoptotic stimuli, maintaining the integrity and function of the mitochondria, and allowing the virus to complete its life cycle. One set of interesting examples is the subgenomic flavivirus RNAs (sfRNAs) produced by flaviviruses such as West Nile virus, dengue virus, yellow fever virus, tick-borne encephalitis virus, and Japanese encephalitis virus. sfRNAs are viral transcripts approximately 300-500 nucleotides long, first identified in Murray Valley encephalitis virus52. Flaviviral full-length genomic RNA contains no poly(A) tail, but it does bear a 5’ cap – leaving it vulnerable to decapping and degradation by XRN1, a 5’-3’ exoribonuclease53,54. XRN1 does indeed degrade flaviviral RNA, but it halts at the 5’ end of the 3’ UTR, producing these sfRNA bodies. The 5’ end of the 3’ UTR has been found to form a compact secondary structure of interwoven pseudoknots arranged in a ring-like formation, physically preventing progression of XRN1 and producing the sfRNA55. Not only is the sfRNA resistant to XRN1 degradation, but it sequesters XRN1 as well, causing an accumulation of uncapped but stable cellular mRNA, dysregulating overall gene expression within the cell, and protecting other flaviviral RNAs from degradation56.
Loss of sfRNA results in decreased plaque formation and replication, highlighting the importance of sfRNAs for flaviviral cytopathicity and pathogenicity53. Furthermore, the poor replication of West Nile virus deficient in sfRNA production is rescued by knockdown of interferon-related genes57. Given that increased mRNA stability is likely to have a greater impact on short-lived mRNA, including interferon genes, this finding is not altogether surprising. To date, this mechanism is unique, but its existence highlights the multitude of potential roles to be played by other lncRNAs, viral or otherwise. Many, though not all, examples described thus far consist of lncRNAs that exert their regulatory action through some form of chromatin localization. However, all mechanisms of chromatin localization discussed thus far rely on the sequence specificity of the protein, rather than the RNA itself. One particularly attractive hypothesis on lncRNA localization features its sequence conferring specificity in a base-pairing interaction with RNA or DNA; to date, however, only one example of this base-pairing interaction has been described. Epstein-Barr virus (EBV) encodes a lncRNA – EBV-encoded RNA 2 (EBER2) – that localizes transcription factor PAX5 to the terminal repeat (TR) region of the EBV genome58. Lee et al. found that, in spite of PAX5 consensus sequences occurring in the TR, EBER2 localization to the TR occurred independently of the presence of PAX5, and PAX5 localization was dependent on EBER2. Instead, they found that the TR encodes an RNA sequence that forms a stable RNA-RNA duplex with EBER2, and demonstrated that EBER2 localization was contingent on the presence of a nascent TR RNA transcript. Thus, EBER2 recruits PAX5 by binding to an actively transcribed RNA species – a method of localization that appears logical, but has not been previously demonstrated.

LncRNAs are Frequently Dysregulated in Cancer

If we are to accept the biological relevance of lncRNAs, the potential consequence of their dysregulation should be obvious. Given the breadth of function attributed to lncRNAs thus far, it should come as no surprise that lncRNAs stand to play important roles in cancer development and treatment. A recent genome-wide study identified 8,487 lncRNAs associated with cancer, with many cancers displaying a
highly specific lncRNA expression signature for their subtype59. Concordant with the current infancy of lncRNA research, few of these have been extensively investigated. HOTAIR, mentioned earlier, was found to be dramatically overexpressed in breast cancer metastases, and highly expressed in primary tumors60. Gupta et al. demonstrated that HOTAIR overexpression in epithelial cancer cells promoted a metastatic phenotype by remodeling nuclear architecture to more closely resemble that of embryonic fibroblasts through altering PRC2 occupancy. Another lncRNA, lncRNA-ATB, promotes hepatocellular carcinoma metastasis by acting as a competing endogenous RNA (ceRNA)61. Briefly, the ceRNA hypothesis claims that a network of RNA species exerts regulatory influence upon one another by competing for regulation by the same miRNA species through miRNA recognition sites on their 3’ UTR62. lncRNA-ATB may very well serve as a supporting example for this hypothesis; it was demonstrated to sequester members of the miR-200 family from Zeb1 and Zeb2, thereby promoting an epithelial-mesenchymal transition and subsequently driving cellular invasion. The full extent of lncRNA involvement in cancer cannot be encompassed within this section – however, the examples discussed thus far serve to illustrate the breadth of non-coding RNA functionality and the possible consequences of its dysregulation.

Conclusion

Though the examples discussed so far have typically been the subject of dedicated scrutiny in signal transduction, lncRNA research in particular has thrived in the genomic era. Many of the lncRNAs discussed earlier were first identified in genomic and transcriptomic screens, and it is likely that many more remain to be discovered. lincRNA-Cox2, for example, was first identified in a screen intended to identify lincRNAs and assign them putative function15. It was the most highly differentially expressed gene out of 20 upon Tlr4 stimulation of CD11C+ bone-marrow-derived
dendritic cells – 80% of which reside in clusters regulated by NFκB; the potential for the other 19 lincRNAs to exert some regulatory mechanism remains unexplored. Another study used identification of lincRNAs expressed under LPS stimulation as a demonstration for a lincRNA identification algorithm – ten lincRNAs with putative function were identified, but not characterized any further63. Studies often justify their investigation of a single gene by merit of that gene being most differentially expressed upon the perturbation of a system9,24,27,31,32,40,64. It is not entirely unreasonable to assume that high differential expression implies significant contribution, but the inverse is not necessarily true – low differential expression does not necessarily imply insignificance. lncRNA kinetics is a largely unexplored field, and limiting studies to the extremes of expression potentially overlooks significant mechanisms of lncRNA regulation. For example, instances of antisense-mediated transcriptional repression tend to be threshold dependent, such that even a relatively low amount of antisense transcription would be enough to shut down sense strand transcription65. The wide tracts of data that have already been generated are ripe with potential discovery, and choosing to investigate a less differentially expressed gene is by no means a concession or compromise.

DNA Methylation

DNA methylation in eukaryotes refers to the modification of cytosine residues with a methyl group, and constitutes a significant epigenetic mark involved in gene regulation. In vertebrates, methylation primarily occurs on CpG dinucleotides, which are statistically underrepresented in the genome and cluster near transcription start sites66. At its discovery, DNA methylation was found to functionally correspond to gene repression, and modern popular conceptions often continue to define it as a repressive mark67,68. For promoter methylation, this often holds true: gene promoters contain strings of CpG dinucleotides, termed CpG islands (CGIs), and methylation of CGIs has indeed long been demonstrated to be indicative of robust transcriptional repression; however, actively transcribed genes
have also been demonstrated to contain CGIs in their promoter that exhibit a dynamic cycle of methylation and demethylation69,70. Active demethylation is mediated by the ten-eleven translocation (TET) family of proteins and thymine-DNA glycosylase (TDG). The TET family converts 5-methylcytosine (5mC) into 5-hydroxymethylcytosine (5hmC), then 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), which are subsequently removed by TDG71,72. High-resolution genome-wide mapping of 5mC, 5hmC, and 5fC indicates that transcriptional regulation may be modulated by 5hmC and 5fC marks poised to be demethylated – however, methylation turnover kinetics has proven a challenging dynamic to investigate73,74. Thus, the straightforward notion that DNA methylation is strictly a repressive mark, even in promoters, has been opened up for debate.

DNA methyltransferases

DNA methylation – that is to say, 5mC – is established by the de novo methyltransferases of the DNA methyltransferase 3 (DNMT3) family, and maintained by DNMT1. DNMT3A and DNMT3B are independent methyltransferases that both associate with DNMT3L to deposit de novo 5mC marks75,76. Interestingly, DNMT3A and DNMT3B have also been demonstrated to operate as demethylases and remove 5hmC – a function that may also play a role in methylation cycling69,77. A prominent question that has driven investigations on DNMT3A and DNMT3B is the mechanism by which these proteins recognize the sequences that they methylate. Current models of DNMT3A recruitment largely suggest that its localization is driven by recognition of proteins that bear their own sequence specificity in some fashion. Several models hypothesize that DNMT3A recruitment may be dependent on local histone modifications, as histone methylation and DNA methylation have been found to be linked in several organisms78. DNMT3A and DNMT3B have been demonstrated to complex with HP1, which recognizes the heterochromatin mark
H3K9me379. Alternatively, another study reports that DNMT3A instead interacts directly with H3K4, competing with HP1 for H3K4 interaction80. Another finding indicates that the ADD domain of DNMT3A recognizes and preferentially interacts with the H3 histone tail (residues 1-19) when K4 is unmethylated, but not with H3K4me381. Furthermore, Zhang et al. describe that H3T3 phosphorylation disrupts ADD domain binding. Transcription factor interactions have been demonstrated to confer sequence specificity upon DNMT3A as well. Indeed, one study identified 69 transcription factors that interact with DNMT3A – suggesting that transcription factors may also have a place in the network governing DNA methylation82. Interestingly, DNMT3A has also been demonstrated to exert a repressive effect independent of its DNA methyltransferase activity. DNMT3A was shown to interact with the transcriptional repressor RP58 and bolster its repressive effect through its association with HDAC183.

DNA methyltransferases interact with lncRNAs

DNMT1 has been shown to interact with a non-coding RNA emanating from the CEBPA gene locus84. This RNA, termed extra-coding CEBPA (ecCEBPA), has a TSS ~1 kb upstream of the CEBPA TSS and overlaps the gene, terminating ~1 kb downstream. ecCEBPA is preferentially bound by DNMT1 over CEBPA DNA and sequesters it from the CEBPA locus, thereby inhibiting its activity. Di Ruscio et al. also conducted a genome-wide study and determined that this sort of RNA-DNMT1 interaction is a global phenomenon, finding that RNA species that associated with DNMT1 were typically unmethylated at their cognate loci. In a similar fashion, another study found that DNMT3A was amenable to modulation by RNA as well. A transcript antisense to the E-Cadherin promoter was demonstrated to bind and subsequently inhibit the catalytic domain of DNMT3A, while two other RNA species – an antisense to the EF1a promoter and an siRNA targeted to NUP153 mRNA – bound to the allosteric site of DNMT3A without
affecting its catalytic activity85. The latter two RNA molecules have been previously demonstrated to induce transcriptional repression at their cognate loci, although one study was retracted86. In further support of this mechanism, however, the Tsix RNA transcript has also been described to form a complex with DNMT3A to induce transient transcriptional silencing at the Xist locus to mediate X inactivation87. Our group has previously demonstrated that an antisense to a PTEN pseudogene – PTENpg1α – is responsible for recruiting DNMT3A to the PTEN locus and inducing transcriptional repression, albeit independently of DNA methylation12. We thereby hypothesized that lncRNA-mediated recruitment of DNMT3A may operate as a global mechanism of transcriptional repression.

Technology used in this thesis

RNA-Seq and ChIP-Seq reveal chromatin binding and regulatory potential profiles

RNA sequencing is a comprehensive means of assessing gene expression across virtually any tissue type or condition. It is commonly used to determine differential expression driven by disease conditions or treatments, as it represents a powerful means to dissect the consequences of such challenges on biological operations. In theory, it could be used to determine the effect of transcriptional regulation as well: depletion of a transcriptional regulator would naturally affect the genes under its purview. However, differentiating between direct targets of regulation and cascading downstream effects is virtually impossible with RNA-seq data alone. Chromatin immunoprecipitation (ChIP) sequencing involves immunoprecipitating a protein of interest along with any chromatin regions in complex with it. By limiting sequencing to said chromatin regions, one may reveal, with reasonable resolution, specific genomic regions to which the protein of interest is bound. This, too, serves to reveal potential loci at which a protein may exert some transcriptional regulation; however, binding does not necessarily imply regulatory action. A protein exerting regulatory action may be contingent on collaboration with other regulatory proteins or may require certain cellular conditions amenable to its operation. Furthermore, gene regulation may also occur distal to a gene’s proximal promoter – up to hundreds of kilobases away88. By integrating transcriptomic data from RNA-seq with chromatin binding affinity from ChIP-seq, one stands to better clarify the regulatory impact of a protein of interest by assessing the functional impact of its association with a chromatin region. We elected to conduct these two experiments in parallel with the intent of determining targets of DNMT3A regulation, and to assess the impact of DNMT3A on gene expression.
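
The integration logic described above can be illustrated with a minimal sketch. It is not the analysis used in this thesis: the gene names, coordinates, and the 100 kb window are hypothetical values chosen only to show how a differential expression list and a peak list might be intersected to shortlist candidate direct targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int


def has_nearby_peak(gene, peaks, window):
    """Return True if any peak on the gene's chromosome lies within `window` bp of its TSS."""
    for peak in peaks:
        if peak.chrom != gene.chrom:
            continue
        if peak.start <= gene.tss <= peak.end:
            distance = 0  # the TSS sits inside the peak
        else:
            distance = min(abs(gene.tss - peak.start), abs(gene.tss - peak.end))
        if distance <= window:
            return True
    return False


def candidate_direct_targets(genes, peaks, de_genes, window=100_000):
    """Genes that are both differentially expressed and bound near their TSS."""
    return [g.name for g in genes
            if g.name in de_genes and has_nearby_peak(g, peaks, window)]


# Hypothetical toy data (not taken from the thesis datasets).
genes = [Gene("GENE_A", "chr1", 10_000),
         Gene("GENE_B", "chr1", 900_000),
         Gene("GENE_C", "chr2", 5_000)]
peaks = [Peak("chr1", 9_500, 10_200), Peak("chr2", 300_000, 301_000)]
de_genes = {"GENE_A", "GENE_B"}

targets = candidate_direct_targets(genes, peaks, de_genes)
print(targets)
```

In this toy case only GENE_A is both differentially expressed and bound near its TSS; GENE_B changes in expression without a nearby peak, which is the kind of hit that could reflect a downstream rather than a direct effect.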

Use of RIP-seq to reveal potential modulators of DNMT3A recruitment

RNA immunoprecipitation (RIP) is, in principle, identical to ChIP, except that it precipitates associated RNA molecules instead of DNA. However, RIP-seq presents its own set of challenges. Unlike ChIP-seq, RIP-seq typically preserves the integrity of transcripts bound to the immunoprecipitated protein without fragmentation. The exact binding site is therefore difficult to determine – often impossible in the absence of clear motifs or secondary structure. The relative unpopularity of RIP-seq also confounds analysis – compared to ChIP-seq, bioinformatics analysis of RIP-seq experiments is relatively immature. Nevertheless, RIP-seq is a technique occasionally employed to determine the identity of RNA species potentially engaged in a regulatory relationship with proteins of interest – especially lncRNAs84,89. We sought to conduct a RIP-seq with the hope of identifying more lncRNAs involved in the regulation of DNMT3A.

Experimental Modulation of lncRNAs

The functional interrogation of any gene often begins with detecting the effect of its suppression on related biological function. Suppression of lncRNAs, however, presents a unique set of challenges compared to mRNA.

RNA interference

RNA interference (RNAi), a popular method of investigation through sequestration and degradation of RNA species via small interfering RNAs (siRNA) or their precursor, small hairpin RNAs (shRNA), is appropriate for some lncRNA species – but the dependence of RNAi on DICER and RISC, which primarily localize in the cytoplasm, complicates its ability to target nuclear transcripts, which form an abundant proportion of non-coding RNAs2,90. Recent studies have suggested that RNAi machinery does operate in the nucleus, but general efforts at targeting nuclear transcripts appear to experience considerably reduced efficacy91. Other studies have reported that RNAi can occur in the nucleus as well, but through a pathway distinct from cytosolic RNAi86,92. This variety of RNAi is driven by siRNAs inducing transcriptional gene silencing through interactions with heterochromatin-inducing factors and DNA methyltransferases. However, the conditions that make a particular gene or siRNA molecule amenable to silencing have yet to be fully characterized. Yet more studies claim that siRNA and dsRNAs are capable of inducing transcriptional activation as well, further illustrating the increasingly complex network of siRNA activity in the nucleus93,94. Nevertheless, it is due to its general inefficacy that RNAi is not a favored course for targeting nuclear lncRNAs.

Antisense Oligonucleotides

One popular approach for suppressing lncRNAs involves the use of antisense oligonucleotides (ODN). ODNs operate by binding to a complementary RNA sequence and degrading it through RNase H, which targets RNA:DNA heteroduplexes. RNase H is active in the nucleus, making it a strong option for functional interrogation of lncRNAs. Several chemistries have been explored to improve specificity and stability of ODNs – but one chemistry of particular interest is the morpholino. Morpholino ODNs do not activate the RNase H pathway; instead, they sterically block the complementary sequence and prevent protein association
with that region. Thus, morpholinos offer a means to interrogate the function of specific regions of an RNA molecule58.

CRISPR/Cas9

With the emergence of simple and cost-effective genome engineering through CRISPR/Cas9 technology, manipulating a lncRNA’s genetic locus has proven to be an effective method of interrogating its activity. Unlike protein-coding genes, lncRNAs cannot be suppressed by a frameshift mutation – thus, other strategies must be adopted (reviewed in 95). Intergenic lncRNAs – lincRNAs – present one of the more conceptually simple solutions: simply delete the entire lincRNA by targeting Cas9 to regions flanking the gene. Regions as large as 1 Mb have been deleted by Cas996 – few, if any, genes would be too large for this strategy to be viable. However, some caution must be taken before adopting this strategy: unintended consequences may be incurred if the target genomic region contains regulatory elements affecting other genes (such as with enhancer RNAs), or contains multiple transcripts emanating from within the same region. One cautionary tale can be found in the case of Egfl797. The authors of the original study generated an Egfl7 knockout mouse, inadvertently deleting mir-126 as well, which resides in the seventh intron of Egfl7. Consequently, several phenotypes driven by mir-126 deletion were mistakenly attributed to loss of Egfl7.
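
The Egfl7 case suggests a simple precaution before designing a flanking deletion: list every annotated feature that the planned deletion interval would remove. The sketch below illustrates that check under stated assumptions. The feature names and coordinates are hypothetical placeholders, and a real check would read them from an annotation file rather than a hand-written dictionary.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Closed-interval overlap test for two features on the same chromosome."""
    return a_start <= b_end and b_start <= a_end


def features_hit_by_deletion(deletion, features):
    """Return the names of all annotated features touched by a planned deletion.

    deletion: (chrom, start, end)
    features: mapping of feature name -> (chrom, start, end)
    """
    chrom, start, end = deletion
    return [name for name, (f_chrom, f_start, f_end) in features.items()
            if f_chrom == chrom and overlaps(start, end, f_start, f_end)]


# Hypothetical annotation: a target lincRNA hosting a small intronic miRNA.
features = {
    "target_lincRNA": ("chr9", 1_000, 9_000),
    "intronic_miRNA": ("chr9", 4_000, 4_085),
    "neighbour_gene": ("chr9", 20_000, 30_000),
}
planned_deletion = ("chr9", 900, 9_100)

hit = features_hit_by_deletion(planned_deletion, features)
print(hit)
```

Any name in the result other than the intended target is a potential confounder of the kind described for mir-126, and would argue for a narrower deletion or a different suppression strategy.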


Figure 4 Two gene excision based methods of lncRNA perturbation. (a) CRISPR sgRNAs are directed towards excising regulatory elements upstream of the lncRNA’s TSS, thereby ablating transcription factor recruitment and transcription. Truncation of the lncRNA is also a possible approach. The truncated gene is repaired through the non-homologous end joining pathway. (b) sgRNAs are directed to induce a double stranded break shortly downstream of the lncRNA TSS, and a donor vector containing homology arms to the break site flanking selection markers and a Poly(A) site is provided as a template to guide homology-directed repair of the target region. The Poly(A) site induces premature termination of the lncRNA, thereby preventing expression of the full transcript.

A less invasive means of gene ablation involves the deletion of the promoter region (Figure 4a). This approach has the added benefit of potentially being able to target lncRNAs with TSSs in the introns of other genes – but can be complicated by the often enigmatic nature of gene promoters. Determining the exact character of a promoter can be difficult, and as a result, deletion of the promoter often fails to achieve total loss of expression. Furthermore, while deleting intronic regions will not likely affect the function of the associated protein, it may alter its expression if the intronic region contains any regulatory elements driving its expression98. Nevertheless, any unintended effects can often be resolved through rigorous experimentation, and partial ablation may be sufficient to achieve desired consequences. Incidentally, truncation through partial deletion of the lncRNA may also lend insight into sequence-specific functionality. An attractive alternative to gene deletion instead adopts an additive approach: premature termination of transcription can be achieved through the insertion of a poly(A) site (Figure 4b)99. Whether or not this approach is less invasive than genomic deletion can be context dependent; it allows regulatory elements to remain in place, but it disrupts the local chromatin architecture. It may be worth employing both methods to account for any unintended effects incurred by the other.

Hypothesis

Antisense lncRNAs confer loci specificity upon DNMT3A in a sequence-specific manner and modulate its activity in epigenetic modification and transcriptional regulation.

Aims

1. Analyze ChIP-seq, RIP-seq, and RNA-seq data to determine lncRNAs of interest and validate them with qPCR
2. Develop and compare RNAi and Cas9-mediated excision-based methods of gene suppression and determine their effect on expression of the lncRNA’s cognate gene
3. Determine impact of lncRNA suppression on DNMT3A localization
4. Determine impact of lncRNA suppression on DNA methylation

Methods

Cells and Tissue Culture

Human Embryonic Kidney 293 cells (HEK293) were provided to our group courtesy of Professor Merlin Crossley on the 3rd of April, 2013, originally sourced from the American Type Culture Collection (HEK293 ATCC® CRL-1573™). Seed stocks were generated upon reception and stored in liquid nitrogen, and used to replace working cultures after a maximum of 15-20 passages. Knockout cell lines were generated from working cultures at no greater than 3 passages following their thaw, and seed stocks of knockout lines were preserved in the same fashion. Working cells were cultured in a 1:1 mixture of Dulbecco’s Modified Eagle Medium and Ham’s F-12 nutrient mixture (DMEM F-12 Ham) (Sigma-Aldrich) supplemented with 10% fetal bovine serum (FBS) (Life Technologies) and 1x Penicillin-streptomycin-glutamine (PSG) (Life Technologies). Cells were incubated at 37°C with 5% CO2, and passaged every 3-5 days by removing residual media components with a phosphate-buffered saline (PBS) wash and detaching them with a 0.25% trypsin-EDTA solution in serum-free media (Sigma-Aldrich).

Genome Browsers

The UCSC genome browser was used to identify gene annotations, EST data, and CpG islands, to visualize our ab initio transcriptome, and to examine ENCODE and modENCODE transcription factor and histone ChIP-seq data100. Basic local alignment was also conducted for primers, shRNA, and sgRNA targets on UCSC. The FANTOM5 Zenbu browser was used to examine FANTOM5 CAGE-seq data and TSS prediction101. The ICGC was used to examine mutations across cancer types in BIM-as102.

Generation and Analysis of Genome-Wide DNMT3A Interaction Data

RNA-seq

RNA-Seq data were trimmed using Trim Galore! and mapped to hg19 using Tophat2, with replicate samples aligned independently. Aligned reads were quantified by Cufflinks, and differential gene expression events on combined replicates were called using Cuffdiff. Data quality and replicate correlation were investigated using the CummeRbund package. All steps were conducted using default parameters.

RIP-seq

RIP-Seq data were trimmed with sickle and aligned to hg19 using Tophat2, and subsequently analyzed using the Piranha and Zagros software packages103,104. Reads were separated into 25 nucleotide bins and leveraged against the input sample as a covariate in a zero-truncated negative binomial regression model with p value thresholding and multiple hypothesis correction disabled. The parameters used were as follows: -s -p 1.0 -c -z 25 -u 0 -v -l. The resulting output was then subjected to a manual Bonferroni correction: 118,542 peaks were called, setting the p value threshold at 0.05/118,542 ≈ 4.2 × 10⁻⁷. Coordinates that survived manual thresholding were then intersected with the GENCODE GRCh37.p13 annotation set, and coordinates more than 2 kb away from an annotated gene were removed, as manual examination determined the vast majority of these intergenic coordinates to correspond to rRNA repeats. Remaining hits were bi-directionally extended to 150 nucleotides if the output length was under 150 nucleotides, so that secondary structure would be better represented. Secondary structure probability was calculated using Thermo, which uses the ViennaRNA package to calculate base-pairing probability without using RNAfold’s calculation, and motif identification was conducted through Zagros. Piranha and Zagros were designed for use in CLIP-Seq peak calling and motif analysis, respectively. CLIP-Seq involves a partial RNase digestion and UV cross-linking to improve the resolution of the protein-binding site on its target transcript. This form of peak calling would normally be redundant in RIP-Seq, because RIP-Seq protocols typically leave the total transcript intact, providing no clue towards the protein-binding site. However, because our RIP-Seq protocol features a mild sonication, we hypothesized that a fragment containing the DNMT3A binding site may be at least mildly overrepresented in our sample, justifying the use of peak-calling software over traditional RNA-Seq analysis.
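
The two numerical steps described above, the manual Bonferroni threshold and the extension of short hits to 150 nucleotides, can be sketched as follows. This is an illustration rather than the script used for the analysis: the function names are hypothetical, and the even split of the extension between the two flanks (with clipping at coordinate 0) is an assumption, since the text states only that hits were extended bi-directionally.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p value threshold after Bonferroni correction."""
    return alpha / n_tests


def extend_to_min_length(start, end, min_length=150):
    """Extend the half-open interval [start, end) on both sides to at least min_length.

    Assumption: the missing length is split as evenly as possible between the
    two flanks; anything clipped at coordinate 0 is added to the right flank.
    """
    length = end - start
    if length >= min_length:
        return start, end
    deficit = min_length - length
    left = deficit // 2
    right = deficit - left
    new_start = max(0, start - left)
    clipped = left - (start - new_start)  # bases that could not be added on the left
    return new_start, end + right + clipped


# Evaluating the quotient stated in the text: 0.05 / 118,542 tests.
threshold = bonferroni_threshold(0.05, 118_542)
print(f"{threshold:.2e}")

# A single 25 nt bin is padded out to 150 nt.
print(extend_to_min_length(1_000, 1_025))
```

Evaluated as written, 0.05 divided by 118,542 gives roughly 4.2 × 10⁻⁷. Hits already 150 nucleotides or longer are returned unchanged by the extension step.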

Given the vast variation currently found in RIP-Seq analysis methods, however, RIP-Seq and RNA-Seq analyses were also conducted independently, courtesy of Martin Smith of the Garvan Institute. Briefly, he conducted ab initio transcriptome assembly using Trinity and used Salmon to quantify transcripts against the reference assembly generated by Trinity. Hits of interest were taken from both outputs and experimentally investigated. RNA-Seq, ChIP-Seq, and RIP-Seq samples were generated by Per Johnsson of the Karolinska Institute, Sweden, and library prep and sequencing were conducted courtesy of the Ramaciotti Centre. Stranded RNA-Seq and unstranded RIP-Seq libraries were generated with the TruSeq RNA library prep kit (Illumina), and ChIP-Seq libraries were generated with the TruSeq ChIP sample prep kit.

ChIP-Seq

Raw reads were trimmed as paired-end reads with Trim Galore! using the following parameters and mapped to hg19 using bowtie2. Aligned reads were sorted with samtools in preparation for peak calling. Peaks were called on mapped, sorted reads by macs2 with broad peak calling enabled and a q value threshold of 0.05, using the following parameters: -g hs -n Peaks -B --broad -q 0.05¹⁰⁵. Peaks were then assessed for regulatory potential using BETA-minus. ChIP-seq and RNA-seq data were integrated using BETA-plus to determine regulatory potential.

Validation of RIP-Seq by RIP-qPCR

12 million HEK293 cells were suspended in 1 ml of a 1% formaldehyde in PBS solution in order to cross-link protein-RNA interactions. Cells were incubated at room temperature for 10 minutes on a rotating platform, after which 50µl 2.5M glycine was added and cells were incubated for a further 5 minutes. Cells were then spun down at 800 RCF for 5 minutes and washed twice with ice-cold PBS. Cross-linked and washed cells were re-suspended in 180µl RIP cell lysis buffer (20mM HEPES pH 7.0, 150mM NaCl, 1mM EDTA pH 7.0, 1% Igepal CA630, 0.5% NaDOC, 0.2% SDS, 1x Complete Inhibitor [Roche], 0.25µl RNaseOUT [Invitrogen], 10mM DTT), passed through a 25-gauge needle 20 times, and incubated on ice for 10 minutes. The resulting cell lysate was sonicated on a QSonica Q700 cup horn sonicator at amplitude 4, 30 seconds on, 30 seconds off, for a total process time of 1 minute and 30 seconds. Sonicated lysate was centrifuged at 12,000 RCF at 4°C, and the supernatant was diluted in 1620µl RIP Dilution Buffer (20mM HEPES pH 7.0, 150mM NaCl, 1mM EDTA pH 7.0, 1% Igepal CA630, 1x Complete Protease Inhibitor [Roche], 2.25µl RNaseOUT [Invitrogen], 1.62µl 10mM DTT). Meanwhile, two sets of 300µl Protein G Dynabeads (Invitrogen) were washed twice with 1ml RIP Dilution Buffer and re-suspended in their original volume. One set was set aside for pre-clearing, while the other set was blocked in RIP Dilution Buffer with 0.3 mg/ml BSA and 1 mg/ml Salmon Sperm DNA for 4 hours. Diluted lysate was pre-cleared with 300µl Protein G Dynabeads and 18µg Normal Rabbit IgG (Cell Signaling Technology) for 2 hours. Pre-cleared lysate was divided into 3 aliquots of 600µl, each of which was subsequently divided into 270µl, 270µl, and 60µl portions reserved for the experimental immunoprecipitation, IgG control immunoprecipitation, and input, respectively. 
3µg antibody – either DNMT3A (ab2850, Abcam) or Normal Rabbit IgG (Cell Signaling Technology, 2729) – was added to the appropriate samples and incubated for 3 hours at 4°C on a rotating platform. Blocked Dynabeads were washed twice with 1ml RIP Dilution Buffer and resuspended in their original volume, and 50µl of blocked Dynabeads were added to each immunoprecipitation sample. Beads and sample were rotated at 4°C for 30 minutes, whereupon the supernatant was removed and the beads were washed six times, 15 minutes per wash, with 1ml of the following buffers in order: RIP Cell Lysis Buffer, Low Salt RIP Wash Buffer (20mM HEPES pH 7.0, 150mM NaCl, 1mM EDTA pH 7.0, 0.5% Igepal CA630), High Salt RIP Wash Buffer (20mM HEPES pH 7.0, 500mM NaCl, 1mM EDTA pH 7.0, 0.5% Igepal CA630), LiCl RIP Wash Buffer (20mM HEPES pH 7.0, 0.25M LiCl, 1mM EDTA pH 7.0, 0.5% Igepal CA630, 0.5% NaDOC), and twice more with Low Salt RIP Wash Buffer. 120µl Elution Buffer (10mM HEPES pH 7.0, 150mM NaCl, 1mM EDTA pH 7.0, 2% SDS, 10mM DTT, 1.6µg Proteinase K) was then added to the beads and 60µl of a 2x concentrate Elution Buffer was added to the input, and all samples were incubated on a Thermomixer (Eppendorf) at 65°C, 1000 RPM, for 2 hours. 500µl TRIzol (Life Technologies) was then added to each sample and the total volume was transferred into Phase Lock Gel Heavy (5 Prime) tubes. 200µl chloroform was added and samples were shaken for 30 seconds, incubated at room temperature for 10 minutes, and spun at 4°C, 18,000 RCF, for 20 minutes. The aqueous phase was transferred into a new tube and an equal volume of isopropanol and 1µl Glycoblue (Ambion) were added, whereupon samples were precipitated at -20°C overnight. Samples were then spun at 4°C, 18,000 RCF, for 20 minutes, and the supernatant was removed. The RNA pellet was washed twice with 70% EtOH and re-suspended in 25µl 1x DNase Buffer (Ambion). 1µl TurboDNase (Ambion) was added to each sample and incubated at 37°C for 45 minutes. 
75µl 100% EtOH was then added to each sample, and samples were subsequently bound to an RNeasy minElute column (Qiagen) and washed according to protocol – with the exception of wash buffers being incubated on the column for 5 minutes prior to centrifugation. RNA was eluted off columns twice with 15µl 50°C RNase free water and purity was gauged on a Nanodrop 1000. 11µl of each sample was reverse transcribed with Superscript IV (Invitrogen) according to protocol with random pentadecamers. cDNA was analyzed on a Viia7 (Applied Biosystems) quantitative thermal cycler using KAPA Sybr Master Mix (KAPA) and the primers listed in Table 1.

Target  Forward Primer  Reverse Primer
PDCD4  GCTACTTGGAAAGCTGAGGTG  TAGCCCTTCTGCCCATTAGA
BIM  GAGGAAGTTGTTGGAGGAGAATAG  CTCGCCACTTGTCCTTGTT
SLIT2  CACTTTGAGAAGGACGAACCA  CACCTGATGGCAAAGGTAGAA
DGL5  GGCACGGCCTTTGACAAGA  GGCTGTGGAGTGTGTGGTAG
EPHA5  CACCCCAGTGTTTGCAGCAT  ATTCGCAGCAACTTCCACTG

Table 1 Primers used to detect lncRNAs bound to DNMT3A, and to measure their expression as well.

ChIP-qPCR for validation and experimentation

1.5 × 10⁶ to 3 × 10⁶ cells grown in 6-well plates in triplicate per condition were cross-linked with 54µl 37% Formaldehyde and incubated for 10 minutes on a rotating platform. Cross-linking was quenched by the addition of 100µl 2.5M Glycine and incubation for a further 5 minutes. Cells were then washed twice with 1ml cold PBS, harvested with a cell scraper, and centrifuged for 5 minutes at 800 RCF at 4°C. Residual PBS was removed and cells were re-suspended in 1ml Farnham Lysis Buffer (5mM PIPES, 85mM KCl, 0.5% NP40, 1x Complete Protease Inhibitor [Roche]). Cells were run through a 25-gauge needle 20 times, incubated for 10 minutes on ice, and centrifuged for 5 minutes at 800 RCF, 4°C. The supernatant was removed and cell lysate was re-suspended in 180µl Nuclei Lysis Buffer (50mM Tris-HCl pH 8.1, 10mM EDTA, 1% SDS, 1x Complete Protease Inhibitor) and incubated for 10 minutes before being sonicated on a Qsonica Q700 cup horn sonicator at amplitude 16, 30 seconds on, 30 seconds off, total process time 12 minutes. Sonicated lysate was centrifuged for 10 minutes at 18,000 RCF, 4°C, and supernatant was removed and diluted into 1,620µl ChIP Dilution Buffer (0.01% SDS, 1.1% Triton X-100, 1.2mM EDTA, 16.7mM Tris-HCl pH 8.1, 167mM NaCl, 1x Complete Protease Inhibitor). Each sample was pre-cleared with 6µg Normal Rabbit IgG and 30µl Protein G Dynabeads, washed twice with ChIP Dilution Buffer, for 2 hours at 4°C on a rotating platform. Supernatant was then removed from the magnetic beads and each sample was divided into volumes of 810µl, 810µl, and 180µl for the experimental immunoprecipitation, IgG control immunoprecipitation, and input, respectively. 1µg of antibody – either DNMT3A or Normal Rabbit IgG – was added to each sample and incubated overnight on a rotating platform at 4°C. 30µl washed Protein G Dynabeads per sample were also blocked overnight in 0.3mg/ml BSA. Blocked Dynabeads were washed twice with ChIP Dilution Buffer, added to each sample, and incubated for 30 minutes on a rotating platform at 4°C. Supernatant was then removed and Dynabeads were washed 6 times, 15 minutes and 1 ml buffer per wash, in the following order: Nuclei Lysis Buffer, Low Salt Immune Complex Wash Buffer (0.1% SDS, 1% Triton X-100, 2mM EDTA, 20mM Tris-HCl pH 8.1, 150mM NaCl), High Salt Immune Complex Wash Buffer (0.1% SDS, 1% Triton X-100, 2mM EDTA, 20mM Tris-HCl pH 8.1, 500mM NaCl), LiCl Immune Complex Wash Buffer (0.25M LiCl, 1% Igepal CA630, 1% NaDOC, 1mM EDTA, 10mM Tris-HCl pH 8.1), Low Salt Immune Complex Wash Buffer, and TE buffer (10mM Tris-HCl pH 8.1, 1mM EDTA). 120µl Elution Buffer (10mM HEPES pH 7.0, 150mM NaCl, 1mM EDTA pH 7.0, 2% SDS, 10mM DTT, 1.6µg Proteinase K) was then added to the beads, 60µl of a 2x concentrate Elution Buffer was added to the input, and all samples were incubated on a Thermomixer (Eppendorf) at 65°C, 1000 RPM, for 4 hours. Samples were then separated on a magnetic rack, 600µl Buffer PB (Qiagen) and 10µl Sodium Acetate pH 5.1 were added to each sample, and samples were bound to a PCR purification column (Qiagen). Purification was carried out according to manufacturer’s protocol, with the exception of Buffer PE being incubated on the column for 5 minutes prior to centrifugation. Samples were eluted twice with 30µl 50°C Buffer EB. Samples were then analyzed via qPCR as previously described with the primers described in Table 2.
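The text does not state how qPCR signal was normalized. One common approach for a design with separate IP, IgG, and input fractions is the percent-of-input calculation, sketched below purely as an assumption. The function name and the Ct values are hypothetical; only the 180µl input and 810µl IP volumes come from the protocol, and the sketch assumes 100% amplification efficiency and equal handling of all fractions after the split.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent-of-input for one immunoprecipitation sample.

    input_fraction is the input volume divided by the IP volume.
    Assumes perfect doubling per PCR cycle (100% efficiency).
    """
    # Shift the input Ct to the value expected had the input been as large as the IP
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


# Volumes from the ChIP protocol: 180 µl input versus 810 µl per immunoprecipitation
INPUT_FRACTION = 180.0 / 810.0

# Hypothetical Ct values, for illustration only
dnmt3a = percent_input(ct_ip=27.5, ct_input=24.0, input_fraction=INPUT_FRACTION)
igg = percent_input(ct_ip=31.0, ct_input=24.0, input_fraction=INPUT_FRACTION)
print(f"DNMT3A: {dnmt3a:.2f}% of input; IgG: {igg:.2f}% of input")
```

Comparing the DNMT3A value with the IgG value at the same locus gives a rough measure of enrichment over background.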

Target  Forward Primer  Reverse Primer
PDCD4  GGAAGCCTAAGGTTCTGGTTA  CTCGCTGTTCTCTGTGACTTT
BIM  GGTGGTCCCTTTACAGAGTTT  GGAGGTGGATCGGGAAATG
SLIT2  CACTTTGAGAAGGACGAACCA  CACCTGATGGCAAAGGTAGAA
EPHA5  CGGATCAAGGGTGTCGAGAG  GGCTGCAAAGAACCTTCACC
DGL5  GACGCCGTTCAGGTAGAGAA  CTGCTGCTCAAGCTGCTCTT

Table 2 Primers targeted towards promoter regions to detect chromatin bound to DNMT3A.

Design and cloning of shRNAs

shRNAs were designed and cross-referenced on Invitrogen’s Block-iT RNAi designer and the Whitehead Institute’s siRNA design page¹⁰⁶,¹⁰⁷. Selected targets are described in Table 3.

Target  Sequence
PDCD4sh1  GGGATGGGAGCGTTGGGAAA
PDCD4sh2  TGAGTCCCTGTCCTAGCCCT
BIMsh1  GATGTTCCATTTCCCGATCC
BIMsh2  GCTTTCTTCTGCGTGGTTCT
SLIT2sh1  GCGAAGTGCATGTGTGTTTGT
SLIT2sh2  GCGTCCGTCCTATTGTTTAGA
SLIT2sh3  GTGCTGACTAGTGGATATTTC

Table 3 shRNA target sites selected through the aforementioned design tools. Sense/antisense pairs were ordered and cloned into U6M2.
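The antisense strand for each target in Table 3 is its reverse complement, which can be derived as sketched below. The sketch covers only the core target sequence. Any loop sequence or restriction-site overhangs needed for cloning are not specified in the text and are deliberately left out; the helper name is hypothetical.

```python
# Translation table mapping each base to its Watson-Crick partner
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


# BIMsh1 target site from Table 3
target = "GATGTTCCATTTCCCGATCC"
antisense = reverse_complement(target)
print(target, antisense)
```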


Figure 5 Features of the U6M2 RNAi expression vector. BglII and KpnI restriction sites are located downstream of a U6 promoter, which drives expression of the shRNA. Vector also contains a neomycin resistance cassette for population selection.

Sense-antisense oligonucleotide pairs were ordered from IDT and cloned into the U6M2 expression vector (Figure 5). 2µl of each oligonucleotide diluted to 100µM was phosphorylated with T4 Polynucleotide Kinase (PNK) (NEB) according to manufacturer’s instructions; pairs were subsequently mixed and annealed in a thermal cycler at 98°C, ramped down at 0.8°C per minute to 20°C. Meanwhile, pCDNA3-U6M2 was sequentially digested using BglII (NEB) and KpnI (NEB) and run through a PCR purification column (Qiagen) in between digestions. Digested U6M2 was treated with Antarctic phosphatase (NEB) according to manufacturer’s instructions and purified once more. Annealed oligonucleotides were diluted 1:25 and 1µl was ligated with 50ng digested U6M2 with T4 Ligase (NEB) according to manufacturer’s instructions. 5µl of ligation reaction was added to 50µl CaCl2 competent cells and incubated for 30 minutes on ice. Competent cells were then heat-shocked at 42°C for 30 seconds, incubated on ice for 2 minutes, and then diluted with 350µl S.O.C media (NEB) and recovered for 1 hour at 37°C in a shaking incubator. The entire transformation reaction was plated on LB agar plates supplemented with ampicillin (Sigma-Aldrich) and incubated overnight at 37°C. 2-3 colonies per plate were picked into 50µl H2O and incubated at 98°C for 10 minutes. 2µl of each sample was subjected to PCR with 8µl JumpStart Taq Readymix, 0.8µl F, 0.8µl R, and 4.4µl H2O, under the following cycling conditions: 95°C for 5 minutes, (95°C for 30 seconds, 53°C for 30 seconds, 72°C for 30 seconds) for 25 cycles, 72°C for 2 minutes. PCR primers are listed in Table 4.
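When screening several colonies at once, the per-reaction volumes for the colony PCR can be scaled into a master mix. The sketch below is a convenience calculation, not part of the original protocol: the 10% pipetting overage and the function name are assumptions, and template is added to each tube separately.

```python
# Per-reaction volumes (µl) for the colony PCR, excluding the 2 µl of template
COLONY_PCR_UL = {
    "JumpStart Taq Readymix": 8.0,
    "forward primer": 0.8,
    "reverse primer": 0.8,
    "H2O": 4.4,
}
TEMPLATE_UL = 2.0


def master_mix(per_reaction: dict[str, float], n_reactions: int,
               overage: float = 0.10) -> dict[str, float]:
    """Scale per-reaction volumes to n_reactions, with a fractional overage."""
    scale = n_reactions * (1.0 + overage)
    return {name: round(volume * scale, 2) for name, volume in per_reaction.items()}


# Final volume of one reaction: master mix components plus template
reaction_volume = sum(COLONY_PCR_UL.values()) + TEMPLATE_UL

# Example: 3 colonies from each of 7 hypothetical plates
mix = master_mix(COLONY_PCR_UL, n_reactions=21)
for name, volume in mix.items():
    print(f"{name}: {volume} µl")
```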

Target  Forward Primer  Reverse Primer
U6M2 Colony PCR/Sequencing  AAGGTCGGGCAGGAAGAG  CGCCACTGTGCTGGATATCT

Table 4 Primers used for colony PCR and sequencing of sequences cloned into the U6M2 vector.

Transfection and Analysis of shRNAs

0.6 × 10⁶ HEK293 cells were plated into 6-well plates in triplicate per condition and grown overnight. The following day, 2.5µg shRNA was complexed with 3µl Lipofectamine 3000 (Life Technologies) and 5µl P3000 in 250µl Opti-MEM, added to each well drop-wise, and incubated for 48 hours. Following 48 hours, RNA was extracted with the Qiagen RNeasy kit according to manufacturer’s protocol, treated with Turbo DNase, and inactivated with EDTA according to protocol. RNA was quantified on a Qubit 2.0 (Invitrogen) with the RNA BR reagent and 400-1000ng was reverse transcribed into cDNA with M-MLV reverse transcriptase and random hexamers according to manufacturer’s protocol. cDNA was then amplified and measured via qPCR as previously described. Primers for qPCR were designed on IDT PrimerQuest with the intent of targeting regions exclusive to either the mRNA or the putative lncRNA and avoiding common regions between the two whenever possible. Primers targeted to PDCD4 mRNA span multiple exons in a region far downstream of the promoter common to all splice isoforms of PDCD4. Of the two asRNA primer sets, one targets the second exon, located in the promoter of PDCD4, and the other is intended to span the exons of PDCD4. Primers targeted to Exon 1 could not be designed due to its exceptionally GC-rich nature. BIM and SLIT2 mRNA primers were designed in a similar fashion to PDCD4; however, because we lack the exact boundaries of the transcripts, primers were targeted either to the promoter region (with both BIM and SLIT2) or to the first intron (with just BIM). All primers are listed in Table 5.

Target  Forward Primer  Reverse Primer
PDCD4 mRNA  GGGAAGGCTAGTCATAGAGAGA  GACATACTCAGAAGCACGGTAG
PDCD4 asRNA  GCTACTTGGAAAGCTGAGGTG  TAGCCCTTCTGCCCATTAGA
PDCD4 asRNA Exon-Span  CGCCTACTGGGTTTCATCAT  TGGCTCACGCTTGTAATCTC
BIM mRNA  CCCTTTCTTGGCCCTTGTT  CAGAAGGTTGCTTTGCCATTT
BIM asRNA  GAGGAAGTTGTTGGAGGAGAATAG  CTCGCCACTTGTCCTTGTT
BIM pasRNA  GGTGGTCCCTTTACAGAGTTT  GGAGGTGGATCGGGAAATG
SLIT2 mRNA  CTGGGTGAAGTCGGAATATAAGG  CACAGGACCTTGACAGGTAAAT
SLIT2 pRNA  CACTTTGAGAAGGACGAACCA  CACCTGATGGCAAAGGTAGAA

Table 5 Primers used to target features of various genes under investigation. Note: BIM pasRNA is the region of BIM-as overlapping the BIM promoter, while BIM asRNA is the region overlapping the intron.

Design and Cloning of small guide RNAs

sgRNAs were designed and cross-referenced between MIT’s CRISPR off-targeting analysis tool and the Broad Institute’s sgRNA design tool¹⁰⁸,¹⁰⁹. Input sequences were designed based on regions flanking the estimated promoter region of lncRNAs of interest, demarcated by CTCF regions when possible. sgRNAs scoring higher than 70 on MIT’s tool and greater than 0.35 on the Broad Institute’s tool were selected, truncated to 17 nucleotides based on findings of improved specificity, and ordered as sense/antisense oligonucleotide pairs from IDT (Table 6).

Target  5’ sgRNA  3’ sgRNA
PDCD4  AATTCAACCTCCGCCAGA  GGCAGCCGCAAGATGGG
BIM  GTTATATGGGAAAAGGG  ACGGCGGAGCTTGGCGG
SLIT2  AGTTTGTAACTCCAGCG  AGCGGAGACGCTTGCAT

Table 6 sgRNA target sites selected from the Broad Institute and MIT web tools.
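The selection rule for sgRNAs (MIT score above 70, Broad score above 0.35, truncation to 17 nucleotides) can be expressed as a simple filter. The candidate sequences and scores below are invented for illustration, and the class and function names are hypothetical. The text does not say which end was trimmed, so the sketch assumes the 5’ end is removed and the PAM-proximal 3’ bases are kept.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    protospacer: str    # 20-nt guide sequence, 5'->3', without the PAM
    mit_score: float    # specificity score from the MIT tool
    broad_score: float  # on-target score from the Broad Institute tool


def select_and_truncate(candidates: list[Candidate], mit_min: float = 70.0,
                        broad_min: float = 0.35, length: int = 17) -> list[str]:
    """Keep guides passing both score cutoffs and truncate them to `length` nt."""
    selected = []
    for c in candidates:
        if c.mit_score > mit_min and c.broad_score > broad_min:
            # Assumption: trim from the 5' end, keeping the PAM-proximal bases
            selected.append(c.protospacer[-length:])
    return selected


# Hypothetical candidates, not taken from the thesis
candidates = [
    Candidate("ACGTACGTACGTACGTACGT", mit_score=85.0, broad_score=0.50),
    Candidate("TTTTGGGGCCCCAAAATTTT", mit_score=60.0, broad_score=0.90),
    Candidate("GGGGCCCCAAAATTTTGGGG", mit_score=90.0, broad_score=0.20),
]
print(select_and_truncate(candidates))
```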


Figure 6 Features of the pCDNA-H1 vector. Two BsmBI cloning sites are located between an H1 promoter and a chimeric sgRNA sequence.

Oligonucleotides were diluted to 100µM and phosphorylated and annealed as previously described. Meanwhile, the pCDNA3.1-H1sgRNA vector (Figure 6) was digested with BsmBI (NEB) according to manufacturer’s protocol, dephosphorylated, and purified as previously described. Oligonucleotides and digested H1sgRNA vector were ligated and transformed as previously described. Three successful transformants per plate were picked into 1 ml LB media cultures, miniprepped (Qiagen) according to protocol, and sequenced with BigDye Terminator 3.1 according to protocol. Sequence-verified clones were then picked into 50-100ml LB media and midi-prepped with a Midi-prep Plasmid Plus kit (Qiagen) according to the high yield protocol.

Design and Cloning of Homology Donor Vectors/Promoters

Figure 7 Features of the Shen Group HDR template donor vector. Puromycin resistance cassette and mCherry cassette are separated by a T2A linker and flanked by FLP recombination sites. 3xSV40 Poly(A) and a BGH Poly(A) are located downstream of the Puromycin/mCherry cassette. Homology arms intended towards the lncRNA HAUNT flank the knock-in sequence, but are not used for my purposes.

800-1000bp homology arms were designed flanking the cut site of the 3’ sgRNA. An mCherry/Puro/4xPoly(A) cassette-bearing plasmid was provided courtesy of Xiaohua Shen’s group at Tsinghua University (Figure 7). pUC19 was used as the cloning vector. Primers amplifying homology arms were designed to have 30-40bp extended sequences overlapping pUC19’s EcoRV restriction site and the mCherry/Puro/4xPoly(A) cassette. The mCherry/Puro/4xPoly(A) cassette was amplified from its vector by PCR using Q5 polymerase according to standard reaction protocol with Hi-GC buffer, an annealing temperature of 62°C, and an extension time of 1 minute; PCR products were subsequently digested with DpnI to remove remaining plasmid. pUC19 was linearized by PCR using Phusion Hi-GC master mix according to protocol with an annealing temperature of 62°C and an extension time of 45 seconds; PCR product was subsequently digested with DpnI. Homology arms were amplified using 2-step PCR with Q5 polymerase supplemented with Hi-GC buffer, a variable amount of DMSO, and an extension time of 1 minute. All primers used can be seen in Table 7. 
Target  Sequence  % DMSO
pUC19 F  gctgggagaccgtgact  0
pUC19 R  ggttgtatcacctccgcca
4xPA F  ggggatcctctagagtcgac  0
4xPA R  gggtaccgagctcgaattc
PDCD4 HA1 F  ttgtaaaacgacggccagtgaattcgagctcggtaccc cgacgaagttcgcgcagggatg  10
PDCD4 HA1 R  aagttatgatatcatcgatgtgactagtcacggtctcccagc agggggcgggggcggagg
PDCD4 HA2 F  atacgaagttatagatctggcggaggtgatacaacc cccatcttgcggctgcctccgag  10
PDCD4 HA2 R  gcttgcatgcctgcaggtcgactctagaggatcccc gccgagtttgcagaacgccctgtg
BIM HA1 F  ttgtaaaacgacggccagtgaattcgagctcggtaccc aagagccacccctggccgcg  10
BIM HA1 R  aagttatgatatcatcgatgtgactagtcacggtctcccagc ggcgccgcttgtcgccg
BIM HA2 F  atacgaagttatagatctggcggaggtgatacaacc atgcaagcgtctccgctcgc  3
BIM HA2 R  gcttgcatgcctgcaggtcgactctagaggatcccc tcgcgaggaccaacccagtcc
SLIT2 HA1 F  tgtaaaacgacggccagtgaattcgagctcggtaccc actgaccacggtgtggttcagtg  3
SLIT2 HA1 R  aagttatgatatcatcgatgtgactagtcacggtctcccagc ccgccaagctccgccgtc
SLIT2 HA2 F  atacgaagttatagatctggcggaggtgatacaacc cggcggtggtggtggct  3
SLIT2 HA2 R  gcttgcatgcctgcaggtcgactctagaggatcccc ggcgtactggagaatggctcggt

Table 7 Primers used to clone donor vectors to serve as a template for homology-directed repair. The % DMSO value is listed once per primer pair, on the forward primer row. pUC19 primers were designed to linearize pUC19. 4xPA primers were used to amplify the mCherry/Puro/4xPA cassette from the Shen group’s donor vector. Remaining primers were designed to amplify homology arms from genomic DNA, with 30-40 bp sequences overlapping either pUC19 or the 4xPA cassette for Gibson assembly.

Homology arms were gel purified using the Qiaquick Gel Extraction kit. PCR product was loaded onto a 0.8% agarose gel and run at 90 V for 45 minutes. Bands corresponding to the desired product were excised and dissolved in 55°C buffer QG. Dissolved gel slices were loaded onto the column and washed according to protocol with the following exceptions: wash buffers were incubated on the column for 5 minutes, and columns were washed twice with buffer PE. DNA was eluted twice with 20µl of 55°C buffer EB, incubated on the column for 5 minutes each time. Homology arms, pUC19, and mCherry/Puro/4xPA were quantified and 0.2 pmols of each were measured out. An equal volume of 2x Gibson Assembly Master Mix (NEB) was added, and the reaction was incubated at 50°C for 1 hour and transformed into NEB 5-alpha cells, which were grown and midi-prepped with the Qiagen midi-prep kit according to protocol. Sequencing and PCR validation were conducted with M13 primers, as previously described.
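Measuring out 0.2 pmol of each fragment requires converting moles to mass from the fragment length. The sketch below uses the common approximation of 650 g/mol per base pair of double-stranded DNA. The cassette and homology arm lengths are placeholders rather than values from the thesis, and the pUC19 figure is the full plasmid length, which may differ slightly from the linearized PCR product.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair; approximate average for dsDNA


def pmol_to_ng(pmol: float, length_bp: int) -> float:
    """Mass in ng of `pmol` picomoles of a double-stranded fragment."""
    # pmol * (g/mol) gives picograms; divide by 1000 for nanograms
    return pmol * length_bp * AVG_BP_MASS / 1000.0


# Fragment lengths in bp; all but pUC19 are illustrative placeholders
fragments_bp = {
    "homology arm 1": 900,
    "homology arm 2": 900,
    "mCherry/Puro/4xPA cassette": 3000,
    "pUC19 backbone": 2686,
}

for name, length in fragments_bp.items():
    print(f"{name}: {pmol_to_ng(0.2, length):.1f} ng for 0.2 pmol")
```

Dividing each mass by the measured concentration of the corresponding fragment gives the volume to pipette.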

Cas9-mediated Generation and Validation of Knockout Mutants

Figure 8 Features of our Cas9 vector. A Cas9-BFP fusion protein is expressed from an SFFV promoter, along with 4 nuclear localization signals. This vector is also amenable to lentiviral expression.

HEK293 cells were plated at a density of 0.6 × 10⁶ cells per well in a 6-well plate, with a single replicate per condition and one untransfected non-fluorescent control. A Cas9.BFP construct (Figure 8) was complexed with 3µl Lipofectamine 3000, 6µl P3000, 250µl Opti-MEM, and the guide RNA pair at a ratio of 1:1:1 of Cas9:sgRNA1:sgRNA2, for a total of 3µg of plasmid DNA. Transfection complex was added drop-wise to each well and incubated for 48 hours. Following 48 hours, transfected cells were isolated into single colonies in one of two ways.

In the original method, used only for PDCD4-as1 knockout cells, the transfected heterogeneous population was diluted to 5 cells per ml in DMEM F-12 Ham supplemented with 20% FBS and 1x PSG, and plated at a quantity of 100µl per well into a 96-well plate. In the second method, used for BIM and SLIT2, the heterogeneous population was sorted by BFP expression into single-cell colonies in a 96-well plate and grown in DMEM F-12 Ham supplemented with 1x PSG, 20% FBS, and 40% 24-hour conditioned media. The untransfected population was used as an autofluorescent blank against which the transfected cells were sorted. Single cells from both methods were grown for 2-3 weeks, until wells with yellowed media – indicating growth – could be identified. Colonies with yellowed media were treated with trypsin and transferred into 6-well plates, where they were cultured for an additional 72-96 hours before half the well was passaged into fresh media and the other half removed for genomic DNA extraction via a DNeasy extraction kit (Qiagen), conducted according to manufacturer’s protocol on a QIACube. Genomic DNA was subjected to PCR with specific primers flanking the deleted region (Table 8) under the following conditions: 8µl 2x Phusion HiGC MasterMix (NEB), 0.8µl Forward Primer, 0.8µl Reverse Primer, 2µl Template, and 4.4µl H2O, incubated at 98°C for 1 minute, (98°C for 10 seconds, variable annealing temperature for 15 seconds, 72°C for 1:30 minutes) for 35 cycles, and 72°C for 2 minutes.

Target  Forward Primer  Reverse Primer  Anneal
PDCD4  CTTTCCACGTTTCACTCCTCTC  GCGAATCTTCCTCACAGGATTT  67°C
BIM  CAGTGATTGGGCGTAGGAG  AGTCTATTCGTGTGTGCTCTTT  59°C
SLIT2  GCATGAAGACCAACATCCTCTA  CGGGTAGTCCTTCTCCTTCTA  64°C

Table 8 Primers flanking the target excision site for each gene targeted.
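For the limiting-dilution method, plating 100µl of a 5 cells/ml suspension gives an average of 0.5 cells per well. The expected proportion of empty, single-cell, and multi-cell wells can be estimated with a Poisson model, as sketched below. This assumes cells are distributed independently and that every plated cell can form a colony, neither of which is stated in the text.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of observing exactly k events for a Poisson-distributed count."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# 5 cells per ml, 0.1 ml plated per well
mean_cells = 5.0 * 0.1

p_empty = poisson_pmf(0, mean_cells)
p_single = poisson_pmf(1, mean_cells)
p_multi = 1.0 - p_empty - p_single

# Among wells that received at least one cell, the fraction seeded by a single cell
clonal_fraction = p_single / (1.0 - p_empty)

print(f"empty: {p_empty:.3f}, single: {p_single:.3f}, multiple: {p_multi:.3f}")
print(f"fraction of occupied wells that are clonal: {clonal_fraction:.3f}")
```

Under these assumptions a minority of growing wells would derive from more than one cell, which is one reason genotyping each colony by PCR remains necessary.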

Successful knockouts were passaged into T75 flasks and used in RT-qPCR and ChIP analysis as previously described. Cells were also transfected with a 1:1:1 ratio of Cas9:3’ sgRNA:Donor vector, and treated the same as above, except they were sorted for mCherry as well as BFP.


Bisulfite Conversion and Methylation Analysis

Genomic DNA was extracted from approximately 5 × 10⁶ HEK293 cells, Cas9-mediated knockout cells as described above, or shRNA-treated cells using the DNeasy kit (Qiagen). shRNA-treated cells were treated with the appropriate shRNA as previously described, with the following deviation: after 24 hours, cells were passaged into fresh media containing 800µg/ml Gentamicin and grown for an additional 48 hours. Genomic DNA was extracted from surviving cells as previously described. Genomic DNA was then subjected to bisulfite conversion with the Epitect Bisulfite Conversion kit (Qiagen) according to manufacturer’s protocol. Bisulfite-converted DNA was amplified with primers specific to the gene of interest in a 45°C-65°C gradient PCR under the following conditions: 12.5µl Jumpstart Taq Readymix, 1.25µl F, 1.25µl R, 5µl 5M Betaine, and 5µl H2O, incubated at 95°C for 2 minutes, (95°C for 30 seconds, variable annealing temperature for 30 seconds, 72°C for 30 seconds) for 40 cycles, and 72°C for 2 minutes. Primers were designed through Methprimer with parameters set for Bisulfite PCR with CpG Island detection turned on (Table 9)¹¹⁰.

Target  Forward Primer  Reverse Primer
PDCD4 1  TTTTAGTTAGTTTTAGGAGTTATATG  CTATTTATTTTTATTTTCTTCTACCCAATA
PDCD4 2  TTTTTATTTTTTAGTATTGTTTTTTTT  ATTTATTTTTATTTTCTTCTACCCAATAAC
BIM 1  AAATTTATGGGTTTTTTTTATATTG  TATTCCTATACAACCTTACAACCTC
BIM 2  TTTTGGTAGAGATAGAAAGGGATAT  ATAAACACTTTAAAAAACAAAAATC

Table 9 Primers intended for amplification of bisulfite-converted DNA. These are targeted towards CpG islands located within the promoter of their associated gene.

PCR products from wells producing visible bands were pooled and purified using the Qiagen MinElute PCR Purification kit according to manufacturer’s protocol with an elution volume of 15µl using 50°C Buffer EB. Purified PCR products were ligated into a pGEM-T Easy (Promega) vector according to manufacturer’s protocol, plated on LB plates supplemented with 150 mg/ml ampicillin (Sigma-Aldrich), 15 mg/ml X-Gal (Sigma-Aldrich), and 1mM IPTG (Sigma-Aldrich), and incubated overnight at 37°C. 5-8 white colonies were selected from each plate and grown in 1ml media overnight at 37°C with shaking at 200 RPM. Cultures were then mini-prepped using the Qiagen Spin Miniprep kit according to protocol and sequenced with BigDye Terminator 3.1 (Thermofisher).
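Once clones are sequenced, methylation at each CpG can be read off by comparing the clone with the unconverted reference: a cytosine that survives conversion was methylated, whereas a thymine at that position indicates an unmethylated cytosine. The sketch below illustrates that logic. It is not the analysis tool used in the thesis, the function names and sequences are hypothetical, and it assumes each clone is already aligned to the reference on the same strand with no insertions or deletions.

```python
def cpg_positions(reference: str) -> list[int]:
    """0-based positions of the cytosine in every CpG of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_calls(reference: str, clone: str) -> dict[int, bool]:
    """Call each CpG in one clone: True if methylated, False if unmethylated.

    Positions showing any base other than C or T are skipped as uninformative.
    """
    calls = {}
    for pos in cpg_positions(reference):
        base = clone[pos].upper()
        if base == "C":
            calls[pos] = True    # protected from conversion, so methylated
        elif base == "T":
            calls[pos] = False   # converted, so unmethylated
    return calls


def methylation_fraction(reference: str, clones: list[str]) -> dict[int, float]:
    """Fraction of informative clones methylated at each CpG position."""
    totals = {pos: [0, 0] for pos in cpg_positions(reference)}
    for clone in clones:
        for pos, methylated in methylation_calls(reference, clone).items():
            totals[pos][0] += int(methylated)
            totals[pos][1] += 1
    return {pos: meth / n for pos, (meth, n) in totals.items() if n}


# Toy example: two CpGs in the reference, two hypothetical clones
reference = "ACGTTCGA"
clones = ["ACGTTTGA", "ATGTTTGA"]
print(methylation_fraction(reference, clones))
```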

Results

As discussed earlier, previous work conducted by Per Johnsson indicated that DNMT3A was recruited to the PTEN promoter through the action of PTENpg1-alpha, an antisense transcript of a PTEN pseudogene. I hypothesized that other genes may be regulated by DNMT3A in a similar manner. To investigate this possibility, I first aimed to discover the targets of DNMT3A regulation and/or the loci bound by DNMT3A. I then sought to determine if there was an antisense transcript or other non-coding RNA transcript emanating in proximity to any DNMT3A-bound loci. Finally, I endeavored to determine whether these non-coding RNA species were responsible for regulating DNMT3A recruitment in any fashion. In order to determine the targets of DNMT3A regulation and localization, two experiments were conducted: RNA-Seq and ChIP-Seq. The ChIP-Seq experiment aimed to determine the genomic sites to which DNMT3A is localized, by means discussed earlier in the introduction. The RNA-Seq experiment was conducted in order to determine genes regulated by DNMT3A. By overlaying the two datasets, two aims were achieved. First, I clarified the RNA-Seq data by revealing which genes were direct targets of DNMT3A, as opposed to those dysregulated downstream. Second, I revealed the extent to which DNMT3A localization affects gene expression.

ChIP-Seq of DNMT3A

GeneSymbol  Score
LOC100507420  9.282
LOC101927421  6.566
LINC01411  5.342
MIR3666  4.579
ZNF703  4.57
ERLIN2  4.492
PRSS23  4.429
LOC102723701  4.425
HOXC4  4.373
FLJ12825  4.327
LOC101929622  4.262

Table 10 The most highly scoring targets of DNMT3A ChIP-seq according to BETA-minus. Scores are calculated based on distance to the associated gene.

35 million reads in the DNMT3A chromatin immunoprecipitation samples and 69 million reads in the input samples were aligned with bowtie2 against hg19, with a >95% alignment rate across all samples. Peak calling was conducted through MACS2 with a q-value threshold of 0.05 and broad-peak calling turned on, but otherwise under the default parameters. Calling both narrow and broad peaks accounts for the possible processive action of DNMT3A, and leaves open the possibility of a future comparison between genes where DNMT3A exhibits processive activity and genes where DNMT3A appears to localize to a specific region. First, peaks were called separately on the biological replicates to determine the level of correlation between them. wigCorrelate returned a Pearson’s R correlation of 0.614201 – a borderline result, but sufficient when coupled with experimental validation. After merging the replicates and calling peaks once more, combined narrow and broad peaks totaled 8051. Regulation prediction by BETA-minus identified 4502 possible targets of DNMT3A (Table 10). BETA-predicted targets were subjected to gene ontology analysis on DAVID.
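The replicate comparison rests on Pearson’s correlation coefficient computed over paired signal values. A minimal sketch of that statistic is given below. It does not reproduce wigCorrelate’s handling of genomic intervals, and the two signal vectors are invented values standing in for per-bin coverage from each replicate.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-bin signal for two replicates
replicate_1 = [0.0, 1.2, 3.4, 0.5, 2.2, 0.1]
replicate_2 = [0.3, 0.9, 2.1, 1.5, 2.6, 0.0]
print(f"r = {pearson_r(replicate_1, replicate_2):.3f}")
```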

[Figure 9: bar chart; y-axis: -log2(p value), scale 0 to 100. GO terms shown: GO:0003002~regionalization; GO:0060173~limb development; GO:0048736~appendage development; GO:0003700~transcription factor activity; GO:0048598~embryonic morphogenesis; GO:0007389~pattern specification process; GO:0001501~skeletal system development; GO:0030528~transcription regulator activity; GO:0043565~sequence-specific DNA binding; GO:0048568~embryonic organ development; GO:0016481~negative regulation of transcription; GO:0009952~anterior/posterior pattern formation; GO:0010629~negative regulation of gene expression; GO:0048706~embryonic skeletal system development; GO:0006357~regulation of transcription from RNA polymerase II promoter]

Figure 9 The fifteen most highly enriched GO terms from functional annotation clustering on DAVID based on ChIP-seq targets predicted by BETA-minus.

The most statistically significant gene ontology terms enriched from our ChIP-seq are those involved in various systems of development, which is consistent with reports on DNMT3A’s essential role in embryonic development and cellular differentiation (Figure 9)⁷⁵,¹¹¹. DNMT3A activity in HEK293 cells may therefore be driving their cancer-like phenotype, suggesting that DNMT3A regulation in HEK293 cells may represent the breadth of possible dysregulations it may undergo in cancer and offering some support for HEK293 as a model to study DNMT3A functionality.

DNMT3A Knockdown RNA-seq

RNA-seq was conducted in triplicate on cells treated with DNMT3A-targeted siRNA and on cells treated with scrambled siRNA. A total of 102 million reads across treatment and control samples were independently mapped to hg19 by Tophat2. Transcripts were assembled and their abundances estimated by Cufflinks, and differential expression events were called by Cuffdiff. Quality control was conducted through the cummeRbund package to verify data quality, correlation across replicates, and efficacy of DNMT3A knockdown.

Figure 10 Graph indicating DNMT3A expression in DNMT3A-depleted HEK293 cells according to RNA-seq analysis. Knockdown and control samples are plotted against FPKM, indicating significant depletion in siRNA treated cells.

Individual gene analysis in cummeRbund confirms significant DNMT3A knockdown in the siRNA-treated sample (X3A) relative to control (X3ACon): a roughly 50% decrease in expression (Figure 10).


Figure 11 First and second principal components representing overall gene expression of three biological replicates each of DNMT3A-depleted and control HEK293 cells. Biological replicates show good clustering and separation from the other condition, indicating overall fidelity of the experiment.

A PCA plot containing the first and second components verified strong clustering between replicates within samples and strong separation on the second component, but not the first component (Figure 11). Strong separation between each condition suggests the fidelity of the experiment.

GeneID  log(FC)  P Value
ENTPD8  3.85242  5.00E-05
TBX10  3.79399  5.00E-05
GRIN1  3.38512  5.00E-05
CEND1  3.32864  5.00E-05
ART5  3.21039  5.00E-05
CNIH2  3.05683  5.00E-05
RAB26  2.99526  5.00E-05
OPRL1  2.97761  0.00035
EFCAB4A  2.96359  5.00E-05

GeneID  log(FC)  P Value
CDH13  -2.5478  5.00E-05
NMNAT2  -2.67687  5.00E-05
ACTBL2  -2.72971  5.00E-05
HULC  -2.90775  0.0023
KIAA0408,SOGA3  -2.92056  5.00E-05
ADAMTS16  -2.9689  5.00E-05
SYP  -3.05598  5.00E-05
CPA4  -3.83959  5.00E-05
BEX1  -4.63625  5.00E-05
CHGB  -5.6006  0.00265

Table 11 Differential effects of DNMT3A knockdown. Top: Ten most downregulated genes. Bottom: Ten most upregulated genes.

Differential regulation analysis with Cuffdiff identified 4142 downregulated and 3895 upregulated genes upon loss of DNMT3A (Table 11). XIST, a gene previously reported to be regulated by DNMT3A, experienced a ~20% loss in expression, which is concordant with the previous finding that DNMT3A initially silences XIST, but is required for a subsequent increase in expression87. Concurring with our group’s previous finding, PTEN is also negatively regulated by DNMT3A: loss of DNMT3A induces a ~30% increase in PTEN expression. These findings serve to validate the fidelity of our RNA-seq experiment. Unfortunately, TFF1 and ERα, genes reported to be positively regulated by DNMT3A through methylation cycling, were not detected at all69,70.

Figure 12 Gene ontology of RNA-seq terms separated by upregulation or downregulation following DNMT3A knockdown. Seven most significant terms are represented.

The most confident gene ontology terms enriched from differentially expressed genes consist primarily of broad functional categories (Figure 12). This is likely a consequence of downstream genes affected by loss of DNMT3A representing a diverse array of functional terms, causing dramatically increased confidence in broader categories while obscuring more specific categories further down the list. Further investigation on RNA-seq gene ontology could be conducted, but integrating ChIP-seq data offers a more attractive alternative for investigating DNMT3A regulation.

DNMT3A RIP-seq

Gene ID      Score      P Value
RPL13A       357        8.89E-13
RPL13AP5     357        8.89E-13
RPL23A       335        5.07E-13
RNVU1-19     259.333    5.87E-13
RPS3         193        5.74E-13
SSTR5-AS1    175        5.29E-13
RPL31        153        5.69E-13
RPS17        148        6.80E-13
SNORD4A      139        6.08E-13
LMOD3        114        5.06E-13

Gene ID      Log(FC)        P Value
-            16.17092008    4.43E-10
RPL27        12.86956356    4.44E-08
RPS12        12.91432561    4.61E-08
SLIT2        12.96289303    5.07E-08
RPL37        13.09673024    5.16E-08
RBM39        12.96138477    5.18E-08
XIST         12.75906258    5.94E-08
SMARCE1      12.7941516     6.51E-08
TMEM241      13.18186854    7.91E-08
MATR3        12.63639251    8.14E-08

Table 12 Top ten hits from two means of RIP-seq analysis. Top: My analysis, conducted through Piranha peak-calling software. Bottom: Analysis conducted by Martin Smith, based on Salmon quantification from an ab initio transcriptome.

A total of 80 million reads from DNMT3A RNA immunoprecipitation samples and 100 million reads from input were mapped to hg19 by Tophat2. Peak calling was conducted through Piranha by sorting reads into 25 nt bins and calling peaks based on bin abundance. The p value threshold and multiple hypothesis correction were disabled during peak calling; the Bonferroni-corrected p value threshold was then determined to be 0.05/118542 total peaks called = 4.2 × 10^-7 (Table 12).
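The Bonferroni threshold described above is a single division of the family-wise error rate by the number of tests. The following is a minimal sketch using the 0.05 level and the 118542 peaks reported in the text; the function names are illustrative and this is not the code used for the analysis.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def passes_bonferroni(p_value: float, alpha: float, n_tests: int) -> bool:
    """True when a raw (uncorrected) p value survives Bonferroni correction."""
    return p_value <= bonferroni_threshold(alpha, n_tests)


# Values taken from the text: alpha = 0.05 and 118542 peaks called by Piranha.
threshold = bonferroni_threshold(0.05, 118542)
print(f"{threshold:.2e}")  # 4.22e-07
```

Applying the cutoff after peak calling, rather than inside the peak caller, keeps every uncorrected peak available for later inspection.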

Another independent analysis was conducted by Martin Smith of the Garvan Institute. A stranded ab initio transcriptome was constructed through Trinity using total reads from the RNA-seq experiment and served as a set of reference transcripts for quantification of RNA immunoprecipitation and input reads by Salmon (Table 10). The two differential RIP-seq analyses produced gene sets with only approximately 25% overlap. Both sets predominantly featured ribosomal proteins, which were largely discarded as potential genes of interest. Interestingly, Martin’s analysis included XIST, a transcript that has never been determined to interact with DNMT3A despite many investigations into XIST’s interactome112,113. As mentioned in the introduction, TSIX has been previously reported to interact with DNMT3A to modulate XIST expression, but DNMT3A interaction with XIST has never been reported before. This result may call this analysis approach into question, or it may describe a previously unreported interaction between DNMT3A and XIST.
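The text does not state how the overlap between the two hit lists was computed, so the sketch below shows two common definitions side by side. The gene identifiers are hypothetical placeholders rather than hits from either analysis, and both function names are illustrative.

```python
def overlap_fraction(set_a, set_b):
    """Shared genes as a fraction of the smaller set (one possible definition)."""
    a, b = set(set_a), set(set_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def jaccard_index(set_a, set_b):
    """Shared genes as a fraction of all genes seen in either set."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical placeholder identifiers, not genes from either RIP-seq analysis.
piranha_hits = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
salmon_hits = {"GENE_D", "GENE_E", "GENE_F", "GENE_G"}

print(overlap_fraction(piranha_hits, salmon_hits))         # 0.25
print(round(jaccard_index(piranha_hits, salmon_hits), 3))  # 0.143
```

The two measures can differ substantially for the same pair of lists, so a reported overlap percentage is easier to interpret when the definition is stated alongside it.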

[Figure 13 plot: -log2(p value) by GO term for GO:0030529~ribonucleoprotein complex, GO:0043228~non-membrane-bounded organelle, and GO:0043232~intracellular non-membrane-bounded organelle]

Figure 13 The only three enriched GO terms from genes associated with peaks called by Piranha.

Gene ontology revealed few significantly enriched terms (Figure 13) – of the three that were enriched, all describe ribosomal function in some way. Ribosomal proteins and RNA are possible non-specific interactors, so it’s likely that no biological process is enriched in RIP targets. This suggests that DNMT3A interaction with RNA is not involved in any specific biological process; if DNMT3A is indeed recruited by lncRNAs, then genes involved in such a process are involved in a myriad of functions.

Integrating ChIP-seq with RNA-seq data

BETA-plus was used to determine the regulatory potential of DNMT3A on its ChIP-seq determined binding sites. Briefly, significant Cuffdiff outputs are first overlaid with ChIP-seq peaks, and a certain percentile of differentially regulated genes is used to determine regulatory potential while the remaining genes are used as background against which regulatory potential is measured. Scoring is based on degree of differential expression and distance from the peak.
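The distance component of this scoring can be sketched as follows. This is an illustration only. It assumes the exponential distance decay described in the published BETA method, in which each peak within 100 kb of a TSS contributes exp(-(0.5 + 4Δ)), where Δ is the peak-to-TSS distance divided by 100 kb. The function name, the window size, and the example coordinates are assumptions, and the ranking by differential expression that is combined with this score is not shown.

```python
import math


def regulatory_potential(tss, peak_positions, window=100_000):
    """Sum distance-weighted contributions of peaks within `window` bp of a TSS.

    Assumed weighting: exp(-(0.5 + 4 * delta)), with delta the peak-to-TSS
    distance expressed as a fraction of the window.
    """
    score = 0.0
    for position in peak_positions:
        distance = abs(position - tss)
        if distance <= window:
            delta = distance / window
            score += math.exp(-(0.5 + 4.0 * delta))
    return score


# Hypothetical coordinates: one peak at the TSS and one 50 kb downstream.
tss = 1_000_000
near = regulatory_potential(tss, [1_000_000])
far = regulatory_potential(tss, [1_050_000])
print(near > far)  # True
```

Under this weighting a peak sitting directly on the TSS contributes exp(-0.5), and the contribution falls off steeply with distance, so promoter-proximal peaks dominate a gene's score.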

Figure 14 Functional prediction of DNMT3A activity. Upregulate refers to genes upregulated upon loss of DNMT3A, while downregulate refers to genes downregulated. (a) Functional prediction based on top 30% differentially regulated genes. (b) Top 50%. (c) Top 95%.

When the threshold is set to measure the 50% most highly upregulated and downregulated genes proximal to ChIP-seq peaks, a significant difference (p = 9.24 × 10^-4) between genes upregulated upon loss of DNMT3A and background genes can be observed. This affirms the current understanding of DNMT3A as a gene suppressor. Interestingly, however, genes that are predicted to be upregulated by DNMT3A are nearly significant over background (p = 0.0561). This prompted me to resubmit the data through BETA-plus with the threshold set to the top 30% of differentially expressed genes to see if DNMT3A-upregulated genes might be restricted to a smaller subset of genes. When only 30% of genes are represented, DNMT3A is predicted to upregulate its target genes with a p value that approaches significance even further (p = 0.051). To contrast, I ran BETA-plus with the top 95% differentially expressed genes represented: DNMT3A downregulates these genes with even greater significance (p = 1.3 × 10^-4), while confidence in DNMT3A upregulation is lost entirely (p = 0.673) (Figure 14). These results indicate that, on the majority of genes it targets, DNMT3A does indeed exert some form of transcriptional repression. In addition, DNMT3A may exert some positive regulation on a relatively small subset of genes or exert a relatively small effect as well. Though p values should not be taken as absolute indication of truth or falsehood, manipulating the threshold may offer some insight into the nature of DNMT3A’s activity. Through gradual restriction of the differential expression threshold, confidence in positive regulation increases, which indicates that increased thresholding is causing us to approach the true set of genes positively regulated by DNMT3A. Though confidence is higher in DNMT3A’s repressive activity, DNMT3A may very well operate as a transcriptional activator as well.

Surprisingly, across all significance thresholds, more genes were found to be upregulated by DNMT3A than downregulated: 510 upregulated compared to 455 downregulated at the 50% threshold. These findings correspond to those of a previous publication, which found that DNMT3A was responsible for upregulating 536 genes and downregulating 406 genes in postnatal neural stem cells114.

Figure 15 Functional GO categories of genes predicted to be upregulated by DNMT3A (left) and genes predicted to be downregulated by DNMT3A (right)

Gene ontology terms enriched in BETA-plus predictions revealed that DNMT3A operates to upregulate many genes that are involved in neural development, while also downregulating genes involved in other developmental programs (Figure 15). Despite being classified as embryonic kidney cells, some studies have indicated that HEK293 cells more closely resemble neurons115. These findings may support that sentiment while also reasserting DNMT3A as a key molecular driver of development across a number of different tissues.

Identifying genes of interest

To determine genes worth experimentally validating, we manually analyzed each RIP-Seq hit for non-coding RNA transcripts. We selected loci of interest that fulfilled one of two criteria: they either contained non-coding antisense transcription, or they contained sense transcription occurring upstream of the annotated gene based on the Trinity transcriptome assembled by Martin. We then cross-referenced these RIP-Seq genes of interest with ChIP-Seq and RNA-Seq data. From these analyses, we identified 5 genes that satisfy our selection criteria (Table 13). Of these genes, 2 contain an overlapping antisense transcript emanating from the first intron, 1 contains a promoter-associated transcript, and 2 contain a non-overlapping antisense transcript emanating from a bidirectional promoter. The latter 2 lacked either a ChIP-seq peak or a RIP-seq peak across all datasets; however, they did contain features that motivated some further investigation and also functioned as a negative control.

GeneSymbol       ChIP Score    MartinRIP Log(FC)    AlbertRIP Score    RNA Seq FC
PDCD4            0.411         N/A                  4.4*               1.01
BCL2L11 (BIM)    0.056         1.74                 35                 1.15
SLIT2            0.475         12.96                21.67              1.62
EPHA5            0.086         N/A                  1.8*               1.27
DLG5             N/A           1.85                 7*                 0.91

Table 13 *Excluded following Bonferroni correction. Table of hits from integrative analysis of ChIP-seq, RIP-seq, and RNA-seq datasets with respective scores compiled, including RIP scores from both Martin and my own RIP analysis.
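The cross-referencing step can be illustrated with a small filter over the Table 13 values. This is a sketch rather than the procedure actually used, which was manual. The field names, the encoding of N/A as `None`, and the rule that a Bonferroni-excluded score does not count as RIP support are all assumptions made for the example.

```python
# Scores transcribed from Table 13. None stands for "N/A"; piranha_excluded
# marks scores flagged with an asterisk (excluded after Bonferroni correction).
candidates = {
    "PDCD4": {"chip": 0.411, "rip_salmon": None, "rip_piranha": 4.4,
              "piranha_excluded": True, "rna_fc": 1.01},
    "BCL2L11": {"chip": 0.056, "rip_salmon": 1.74, "rip_piranha": 35.0,
                "piranha_excluded": False, "rna_fc": 1.15},
    "SLIT2": {"chip": 0.475, "rip_salmon": 12.96, "rip_piranha": 21.67,
              "piranha_excluded": False, "rna_fc": 1.62},
    "EPHA5": {"chip": 0.086, "rip_salmon": None, "rip_piranha": 1.8,
              "piranha_excluded": True, "rna_fc": 1.27},
    "DLG5": {"chip": None, "rip_salmon": 1.85, "rip_piranha": 7.0,
             "piranha_excluded": True, "rna_fc": 0.91},
}


def has_chip_peak(record):
    """A gene counts as ChIP-supported when a ChIP score is present."""
    return record["chip"] is not None


def has_rip_support(record):
    """RIP support from either analysis; excluded Piranha scores do not count."""
    piranha_ok = record["rip_piranha"] is not None and not record["piranha_excluded"]
    return record["rip_salmon"] is not None or piranha_ok


supported = sorted(
    gene for gene, record in candidates.items()
    if has_chip_peak(record) and has_rip_support(record)
)
print(supported)  # ['BCL2L11', 'SLIT2']
```

Under these strict assumptions only BIM and SLIT2 are supported by both assays, which makes explicit that the remaining candidates were retained on other grounds.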


Figure 16 ChIP-seq peaks as represented by bedgraph outputs from MACS2 visualized in IGV. (a) BCL2L11 AKA BIM locus revealing a modest peak within the first intron of BIM and potentially within the promoter as well. (b) PDCD4 locus revealing a modest peak in the promoter and the first intron, possibly correlating to the promoter of PDCD4-as1 as well. (c) SLIT2 locus revealing a large peak in the promoter. (d) EPHA5 and NR_034138 AKA EPHA5-as1 locus revealing large peaks across the body of both genes, potentially overlapping promoter elements driving transcription of either or both genes. (e) DLG5 and DLG5-as1 locus revealing peaks spread across the locus.

Figure 17 UCSC genome browser with Gencode comprehensive, Refseq, and UCSC gene annotation tracks visible. (a) PDCD4-as1 transcript as indicated by Gencode comprehensive gene annotation. (b) PDCD4-as1 transcript as indicated by Refseq and UCSC genes.

Figure 18 FANTOM5 Zenbu browser displaying Gencode gene annotations, CAGE data, and predicted TSSs at the PDCD4 promoter locus. (a) Antisense transcription and TSSs concordant with the Gencode annotation of PDCD4-as1. (b) Lack of antisense transcription or TSSs where Refseq and UCSC have annotated PDCD4-as1.

Programmed cell death 4 (PDCD4) was one of the original genes we identified in genome-wide HeLa studies. Work conducted by a previous student determined that PDCD4-as1 was bound to DNMT3A in HeLa cells, but was unable to determine, in a statistically significant fashion, the role it plays in modulating DNMT3A. Though excluded following multiple comparisons correction in my RIP analysis and excluded altogether in Martin’s, legacy work conducted on PDCD4 drove its reemergence as a gene of interest. Examination of ChIP-seq data in the Integrative Genomics Viewer (IGV) reveals peaks scattered throughout the promoter and first intron of PDCD4 (Figure 16), suggesting a possible mode of interaction beyond simple promoter methylation of PDCD4 alone. Investigation on the UCSC browser revealed two independent but overlapping transcripts referred to as PDCD4-as1: one unspliced transcript driven by a bidirectional promoter, and one spliced transcript overlapping PDCD4’s first intron and exon. The former was annotated by Refseq and UCSC genes, the latter in Gencode’s comprehensive gene set (Figure 17). I elected to base my investigation on Gencode’s version of PDCD4-as1 for two reasons. First, Gencode’s comprehensive set has been reported to bear greater transcriptional complexity and better representation of lncRNAs116. Second, investigation on the FANTOM5 Zenbu browser supports an antisense transcription start site occurring in the first intron of PDCD4, coinciding with Gencode’s PDCD4-as1 annotation, but not in the promoter (Figure 18). PDCD4 has been previously demonstrated to exert a tumor suppressive influence by promoting apoptosis through inhibiting translation factor eIF4A117,118. Indeed, PDCD4 has been previously demonstrated to be downregulated in breast, prostate and hepatocellular carcinoma119-121. Taken together, these elements piqued my interest in investigating PDCD4’s mode of regulation.


Figure 19 UCSC genome browser positioned at the BIM (listed here as BCL2L11) promoter locus with our ab initio transcriptome, Gencode genes, and Human ESTs displayed. (a) Antisense transcription indicated by our ab initio transcriptome constructed from the totality of our RNA-seq reads. (b) ESTs supporting antisense transcription in this locus.

Figure 20 FANTOM5 Zenbu browser displaying CAGE data, predicted transcription start sites, and Gencode genes at the BIM (listed here as BCL2L11) promoter locus. Although no antisense TSS is predicted, a small amount of antisense transcription is shown to occur in the first intron of BIM.


Figure 21 ICGC browser centered around the TSS of BIM (listed here as BCL2L11). A cluster of mutations (bottom) has been reported within the first intron of BIM, from which BIM-as is believed to emanate.

BCL2L11, also known as and henceforth referred to as BIM, was a highly-enriched hit from the Piranha RIP-Seq analysis and plays a potent role in promoting apoptosis, positioning it as a prime gene for investigation with respect to its biological role122. BIM is a major player in apoptosis, functioning as an important pro-apoptotic BH3-only member of the Bcl-2 family in cancer development and also drawing clinical interest through its role in cancer treatment; deficiencies in BIM expression have been known to confer resistance to cancer therapy123. Upon further investigation in the UCSC Genome Browser, several ESTs supporting antisense transcription were discovered: AA854054, AA890664, AI671148, BF591082, AI656279, and AA015748 (Figure 19). Inspecting FANTOM5 CAGE data on the Zenbu Genome Browser supported a relatively small amount of antisense transcription occurring, beginning within the first intron of BIM (Figure 20). From this, I hypothesized that the locus encoding BIM contains an antisense transcript, henceforth referred to as BIM-as, potentially responsible for DNMT3A interaction with the BIM promoter. Investigating BIM in the International Cancer Genome Consortium (ICGC) genome browser reveals a cluster of mutations occurring within the first intron of BIM (Figure 21). Although these mutations are indicated as low impact, this may be an overlooked consequence of their intronic localization: they may very well be mutations altering the expression or content of BIM-as, subsequently altering regulation of BIM. The ICGC Data Portal indicates that BIM is mutated in ~30% of melanomas, and ~20% of malignant lymphomas and esophageal cancers. In epithelial cancers, BIM phosphorylation suppresses its apoptotic function and confers a pro-survival phenotype upon BIM124.

Since the loss of apoptosis sensitivity is known as a hallmark of cancers125, if BIM-as serves to regulate BIM in any fashion, suppression of BIM expression by these cancer-related mutations in BIM-as may potentially act as a driver of cancer, further emphasizing BIM-as as a gene worth investigating. ChIP-Seq data reveals a minor peak representing DNMT3A localization at the BIM promoter (Figure 16). Furthermore, BIM was also upregulated 1.15-fold upon DNMT3A suppression. Thus, BIM emerges as a candidate for further investigation.

Figure 22 FANTOM5 Zenbu browser positioned around the promoter region of SLIT2, displaying Gencode genes, CAGE data, and TSS predictions. Although no TSS is predicted upstream of SLIT2, a large amount of transcription can be seen occurring – magnitudes larger than the predicted SLIT2 TSS.

The Trinity transcriptome revealed that transcription was occurring upstream of the SLIT2 gene. The presence of reads mapping into the intronic regions of SLIT2 suggests the possibility that this transcript may also be discontinuous with the SLIT2 mRNA. A previous study has demonstrated that genes containing an untranslated extended 5’ UTR, termed a promoter-associated RNA (paRNA), were susceptible to siRNA-mediated gene silencing by some mechanism of this extended UTR126. I hypothesized that this extended transcript, henceforth referred to as SLIT2-paRNA, may itself be a promoter-associated RNA, and may be responsible for recruitment of DNMT3A in regulation of SLIT2. CAGE data from FANTOM5 supports the presence of an extended 5’ UTR (Figure 22). SLIT2 had a modest ChIP-Seq peak and was upregulated 1.62-fold upon DNMT3A knockdown, roughly inversely proportional to the level of suppression experienced by DNMT3A (Figure 16). Though far from definitive, having all three experiments correspond with one another led me to further investigate SLIT2 as a gene possibly regulated by DNMT3A.

Figure 23 UCSC genome browser displaying Gencode comprehensive gene annotations centered around (top) DLG5 and (bottom) EPHA5 promoter regions. Both DLG5 and EPHA5 contain non-overlapping asRNAs.

DLG5 was initially thought to contain a ChIP-seq peak, but upon further examination of the analysis, its ChIP-seq peak was found to be located several kilobases away and no significant peak was found proximal to its locus (Figure 16); nevertheless, DLG5 was able to serve as a negative control for our validation experiments. EPHA5 did contain a ChIP-seq peak, but scored very low on my RIP analysis and not at all on Martin’s, serving as another negative control (Figure 16). However, both genes contain non-overlapping annotated antisense transcripts emanating from a bidirectional promoter (Figure 23), which implicated them as potential genes of interest according to our selection criteria. Thus, we proceeded to investigate the two genes under the possibility that our sequencing experiments may be flawed.

Validation of Sequencing By ChIP and RIP-qPCR

Figure 24 Graph displaying results from chromatin immunoprecipitation of DNMT3A. Chromatin was precipitated using antibodies for DNMT3A and IgG. CT values from 3A-IP and IgG-IP were normalized against input, and fold-change of 3A-IP over IgG-IP was determined.

Deep sequencing is a powerful tool for detecting genome-wide events; however, the technology has yet to be perfected, and confidence in its results stands to benefit from experimental validation. To this end, I conducted ChIP-qPCR and RIP-qPCR to attempt to replicate the deep-sequencing findings in a low-throughput, gene-specific fashion. Following formaldehyde-crosslinked ChIP of DNMT3A, qPCR was conducted to detect whether genomic promoter regions of interest were bound to DNMT3A. qPCR confirmed significant enrichment of the promoter regions of PDCD4, BIM, SLIT2, and EPHA5, but not DLG5 (Figure 24). DLG5 was thereby removed as a candidate for investigation, adding to the credibility of our ChIP-seq experiment in the process. SLIT2 also demonstrates relatively low enrichment, suggesting possible reconsideration of SLIT2 as a candidate.
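The normalization described in the Figure 24 caption (Ct values normalized against input, then fold-change of the DNMT3A IP over the IgG IP) can be sketched as below. The sketch assumes roughly 100% amplification efficiency, so that one cycle corresponds to a two-fold difference, and it omits any input dilution factor because that factor cancels in the final ratio. The function names and Ct values are hypothetical.

```python
def relative_to_input(ct_ip, ct_input):
    """Signal of an IP relative to input, assuming a doubling per PCR cycle."""
    return 2.0 ** (ct_input - ct_ip)


def fold_over_igg(ct_target_ip, ct_igg_ip, ct_input):
    """Fold enrichment of the specific IP over the IgG control IP."""
    target = relative_to_input(ct_target_ip, ct_input)
    background = relative_to_input(ct_igg_ip, ct_input)
    return target / background


# Hypothetical Ct values for one primer pair.
print(fold_over_igg(ct_target_ip=26.0, ct_igg_ip=29.0, ct_input=24.0))  # 8.0
```

Because both IPs are scaled to the same input, the result reduces to 2 raised to the Ct difference between the IgG and DNMT3A IPs; the input step matters mainly when comparing loci or samples with different starting amounts.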

Figure 25 Graph displaying results from RNA immunoprecipitation of DNMT3A. RNA was precipitated using antibodies for DNMT3A and IgG. CT values from 3A-IP and IgG-IP were normalized against input, and fold-change of 3A-IP over IgG-IP was determined.

Following formaldehyde-crosslinked RIP of DNMT3A, qPCR was conducted to detect putative non-coding RNA species of interest bound to DNMT3A. Interaction with DNMT3A was confirmed for PDCD4-as1, BIM-as, and SLIT2 paRNA, but not DLG5-as1 or EPHA5-as1 (Figure 25). EPHA5-as1 is thereby also disqualified as a gene of interest. At this point, both genes with non-overlapping asRNAs have been eliminated. Though this sample is not large enough to draw meaningful conclusions from, it seems that DNMT3A is interacting with lncRNA species that overlap an mRNA. It’s also worth noting that the relative proportion of enrichment, with BIM-as being the most enriched and PDCD4-as1 being the least, is concordant with the scoring in my RIP-seq analysis. PDCD4-as1 was originally excluded after multiple hypothesis correction, suggesting that a more relaxed significance threshold may be warranted on my RIP-seq analysis.

Knockout of lncRNAs

Figure 26 Design strategy for knockout of candidate lncRNAs. lncRNA truncation and promoter deletion from within the first intron of their cognate mRNAs were used to ablate (A)PDCD4-as1 and (B)BIM-as, while promoter deletion was used to target (C)SLIT2 paRNA.

Figure 27 Gel demonstrating excision of PDCD4-as1 with Cas9. Wildtype and total transfected population are in the first two lanes, indicating the sizes expected for the intact and excised PDCD4-as1 locus. The other ten lanes contain single colonies obtained from limiting dilution. Lanes 3, 7, and 9 indicate a total knockout of the PDCD4-as1 locus. The character of the artifact in lane 1 is unknown.

In order to knock out PDCD4-as1, I co-transfected two sgRNAs targeted towards regions flanking the promoter and first exon of PDCD4-as1 (Figure 26) with Cas9 into HEK293 cells and diluted the cells to 0.1 cells per well in a 96-well plate. After 2-3 weeks, I checked the plate for growth. 13 surviving colonies were found following limiting dilution of PDCD4 knockout cells in transiently transfected plates, 10 of which survived the transition into 6-well plates. Of the 10 colonies that survived to be genotyped, 2 remained similar to wildtype, approximately 4 showed a heterozygous deletion, 3 showed a homozygous deletion, and 1 showed an unexpected deletion event that may inspire further investigation at a later date (Figure 27). Colony 7 was selected for further investigation, while colonies 3 and 9 were frozen as liquid nitrogen stocks to serve as a CRISPR off-target effects control if future experimentation demands it. Colonies 6 and 8 were frozen into liquid nitrogen stocks as CRISPR negative controls, and colonies 4 and 5 were frozen to potentially compare homozygous versus heterozygous deletions of the gene in the future.
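The rationale for seeding at 0.1 cells per well can be made concrete with a Poisson model. The sketch below assumes that cells distribute independently across wells and that every seeded well grows out; both are idealizations not stated in the text, and the function names are illustrative.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k cells in a well when the average is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def limiting_dilution_summary(cells_per_well, n_wells):
    """Expected occupied wells, and the share of occupied wells seeded by one cell."""
    p_empty = poisson_pmf(0, cells_per_well)
    p_single = poisson_pmf(1, cells_per_well)
    p_occupied = 1.0 - p_empty
    return {
        "expected_occupied_wells": n_wells * p_occupied,
        "fraction_clonal": p_single / p_occupied,
    }


summary = limiting_dilution_summary(cells_per_well=0.1, n_wells=96)
print(round(summary["expected_occupied_wells"], 1))  # 9.1
print(round(summary["fraction_clonal"], 2))          # 0.95
```

At this density most wells stay empty, but roughly 95% of the wells that do receive cells are expected to start from a single cell, which is the property that makes the resulting colonies usable for genotyping.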

Figure 28 Gel depicting Cas9 excision of BIM-as. Left-most lanes contain WT and total transfected population, demonstrating expected sizes for the intact and truncated BIM-as gene locus. Other samples are surviving single colonies following BFP-sorting. None appeared to demonstrate homozygous loss, with the possible exception of colony 7.

To knock out BIM-as, I co-transfected HEK293 cells with sgRNAs designed to delete a large portion of the first intron, where Martin’s transcriptome assembly indicates unannotated antisense transcription is occurring (Figure 26). Rather than conduct a limiting dilution, I instead sorted single cells into 96-well plates based on BFP, which is fused to our Cas9 construct. 25 surviving colonies were found following single-cell sorting for BFP of the heterogeneous BIM-deletion population in 96-well plates. Of those 25 surviving colonies, 23 survived the transition into 6-well plates. 10 colonies were screened, 2 of which experienced no deletion, 5 of which experienced heterozygous deletions, and 2 of which experienced unpredicted deletion events (Figure 28). Colonies 2, 7, and 10 were selected for further investigation, with colony 10 serving as the basis for most downstream investigations. Colonies 1 and 4 were frozen to reserve as negative controls.

Figure 29 Gel depicting Cas9 excision of SLIT2 paRNA promoter. Left-most lane contains WT SLIT2, demonstrating expected sizes for the intact SLIT2 paRNA promoter locus. Remaining samples represent surviving single colonies following BFP-sorting. Surprisingly, the WT displays two bands, the larger of the two being consistent throughout all samples. This is believed to be an artifact, indicating sample 5 as a homozygous deletion.

Because the SLIT2 paRNA is contiguous with the mRNA transcript, I designed sgRNAs upstream of the SLIT2 paRNA TSS as indicated by our transcriptome and downstream of a CTCF boundary indicated by the UCSC genome browser (Figure 26). I used BFP-sorting for SLIT2 knockout as well. 5 surviving colonies were found following single-cell sorting for BFP of the heterogeneous SLIT2-deletion population in 96-well plates, and all 5 populations survived the transition into 6-well plates. Of these 5 populations, 2 experienced no deletion event, 1 experienced a possible homozygous deletion event, 1 experienced a possible heterozygous deletion event, and 1 experienced an unexpected deletion event. It’s worth noting that even in the wildtype, HEK293 cells appear to have at least two variations of the SLIT2 promoter, one of them potentially lacking at least one PAM required for Cas9-targeting. This may be a characteristic of the HEK293 cell line or, alternatively, a PCR artifact; it was not further investigated, as it was deemed not pertinent to our investigation of chromatin dynamics.

Promoter dynamics

Figure 30 Flow cytometry sorting for BFP and mCherry expression following transfection of Cas9.BFP, one sgRNA, and HDR template donor vector. (A) Wildtype negative control demonstrates the extent of autofluorescence in HEK293 cells. (B) BFP expresses strongly (left), but the SLIT2 homology arm, the region directly upstream of the SLIT2 paRNA TSS, is incapable of driving mCherry expression (right). (C) The PDCD4-as1 promoter is able to drive expression of mCherry (right) at a level equal to or higher than BFP (left). (D) The BIM-as promoter is able to drive mCherry expression just barely above autofluorescence levels (right). BFP is also expressed (right).

Another strategy I intend to employ for suppressing lncRNA function is to insert a poly(A) cassette to cause premature termination of the lncRNA. Although knock-in of this transgene has yet to be achieved, an incidental finding was attained through these efforts. The poly(A) cassette includes mCherry and puromycin resistance cassettes upstream of the poly(A) sequences; because the homology arms I was using to guide homology-directed repair were flanking the TSS of my lncRNAs of interest, I was also cloning the promoter of the lncRNA upstream of a fluorescent reporter. To isolate transgenic clones, I sorted for both BFP and mCherry, which also served to measure the efficacy of the putative promoter. The PDCD4-as1 promoter drove mCherry expression most strongly, nearly on par with the CMV-promoter driven Cas9.BFP (Figure 30). The BIM-as promoter was also able to drive mCherry expression, though with significantly less efficacy than PDCD4-as1 (Figure 30). This may be because the knock-in target site in BIM-as is more distal to the TSS than in PDCD4-as1, although the previous failures to detect the existence of BIM-as do suggest that it is not a highly expressed transcript. The SLIT2 promoter, on the other hand, failed to drive mCherry expression at all; in fact, cells transfected with the knock-in complex exhibited less autofluorescence than the negative control (Figure 30). This seems to suggest that the SLIT2 region I deleted may not be sufficient for driving transcription.

In order to determine the efficacy of lncRNA ablation across different approaches, I performed RT-qPCR under vehicle-transfected, shRNA-transfected, and vehicle-transfected Cas9-deleted conditions on my genes of interest. Through these experiments, I also hope to reveal the relationship, if any, between the putative non-coding RNA and its cognate mRNA.

Our original hypothesis claimed that these lncRNAs were responsible for DNMT3A recruitment, which, as a de novo methyltransferase, would deposit methyl marks on CpG islands across its target promoter, thus suppressing expression of the cognate gene. However, this is dependent on the idea that DNA methylation is inversely correlated with gene expression, when the reality can often be more complex127. As it stands, I approached these experiments from an agnostic perspective, with no expectation for loss or increase in mRNA expression.

Investigation of SLIT2

Figure 31 qPCR of (a) SLIT2 mRNA and (b) SLIT2 paRNA following promoter excision by Cas9 and suppression with an shRNA.

RT-qPCR of SLIT2-suppressed conditions reveals a concordant relationship between SLIT2 mRNA and paRNA expression: loss of mRNA is correlated with a loss of paRNA (Figure 31). Furthermore, Cas9-mediated deletion of the putative promoter region and shRNA-mediated knockdown of the promoter transcript result in a similar phenotype of decreased expression of both transcripts. These results indicate that the region deleted by Cas9 was indeed responsible for some transcriptional activation, though perhaps not enough to drive transcription on its own. Distinguishing between the two transcripts, the paRNA and the mRNA, is difficult, however, as they are continuous with one another. Three results drove me to suspend investigation into SLIT2: the low ChIP enrichment, the promoter’s inability to drive an independent transcript, and the inability to distinguish between the mRNA and the alleged paRNA with results generated thus far.
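The text does not state how RT-qPCR Ct values were converted to relative expression. One widely used convention is the 2^-ΔΔCt calculation, sketched below purely as an assumption; the use of a reference gene, the function name, and all Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target transcript versus control by the 2^-ddCt method.

    Each target Ct is first normalized to a reference gene measured in the
    same sample, then the treated sample is compared with the control.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the target crosses threshold one cycle later
# after treatment while the reference gene is unchanged.
print(relative_expression(25.0, 18.0, 24.0, 18.0))  # 0.5
```

A value below 1 indicates reduced expression relative to control, and a value above 1 indicates increased expression, such as the 1.15-fold and 1.62-fold changes quoted elsewhere in this chapter.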

Interrogation of PDCD4

Figure 32 qPCR of PDCD4 (a) mRNA and (b) asRNA following PDCD4-as1 truncation and promoter deletion and suppression by shRNA. (c) Expression of PDCD4-as1 as measured by intron-spanning primers, specific to the spliced isoform.

qPCR confirms the efficacy of both the small hairpin and the Cas9-mediated knockout of PDCD4-as1, although to different degrees (Figure 32). The shRNA exhibits greater efficacy in reducing expression of the second exon (presumably incorporating both spliced and unspliced transcripts) and the spliced isoform, relative to the effect generated by Cas9-mediated knockout. shRNA-mediated knockdown appears to have a greater effect on exon 2 than on the spliced isoform, suggesting that the shRNA may play a role in regulating the unspliced isoform as well. Interestingly, shRNA knockdown of PDCD4-as1 appears to cause increased expression of the PDCD4 mRNA, while Cas9-mediated knockout appears to play no significant role at all. The effect size of the shRNA knockdown, however, is quite small: it resulted in only a 1.3-fold increase of the mRNA. These results suggest that PDCD4-as1 may indeed be responsible for the recruitment of repressive elements to the PDCD4 promoter, and in a manner dependent on exon 1. Furthermore, that the shRNA exerted an effect at all implies that PDCD4-as1 functions to recruit transcriptional repressors from the cytosol, due to the inability of shRNAs to operate in the nucleus.

Figure 33 ChIP of DNMT3A at the PDCD4 promoter locus following PDCD4-as1 truncation and promoter excision and shRNA-mediated suppression.

In accordance with the RT-qPCR results, ChIP-qPCR also reveals a divergent phenotype of DNMT3A localization between shRNA-mediated knockdown and Cas9-mediated knockout (Figure 33). DNMT3A localization to the PDCD4 promoter is heavily diminished following Cas9-mediated knockout, whereas shRNA-mediated knockdown nearly doubles DNMT3A localization. It is not altogether surprising that loss of DNMT3A localization does not correlate to any significant change in mRNA expression, as DNMT3A is only responsible for the deposition of novel methyl marks; interestingly, however, increased DNMT3A localization appears to be correlated to increased mRNA expression. These results imply that DNMT3A may be conferring transcriptional activation through an unanticipated mechanism.

Figure 34 Lollipop chart indicating methylation state at the PDCD4 promoter in PDCD4-as1 knockout HEK293 cells, PDCD4-as1 suppressed cells, and wildtype HEK293 cells.

Bisulfite analysis reveals no significant difference in methylation state at the PDCD4 locus between wildtype HEK293 cells, PDCD4-as1 shRNA-treated cells, and PDCD4-as1 knockout cells (Figure 34). Interestingly, despite DNMT3A localizing to the PDCD4 promoter in wildtype HEK293 cells, the wildtype promoter is nevertheless almost entirely unmethylated. This implies that DNMT3A-driven DNA methylation is not contingent on lncRNA-driven localization alone; the mechanics driving DNA methylation may well involve a more complex molecular network. On the other hand, it may also suggest that 5mC is a transient modification at the PDCD4 promoter, and that a more complex network of methylation-driven regulation is at play here.
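
The comparison of methylation state between conditions can be made concrete with a small sketch that summarizes clone-level bisulfite data of the kind shown in a lollipop chart. The encoding is an assumption made for illustration: each sequenced clone is a string with one character per CpG, "1" for methylated and "0" for unmethylated. The function name and the example clones are hypothetical and are not data from this thesis.

```python
def methylation_summary(clones):
    """Summarize clone-level bisulfite sequencing calls.

    clones: list of equal-length strings, one per sequenced clone,
            one character per CpG ("1" methylated, "0" unmethylated).
    Returns (per_site_fraction, overall_fraction).
    """
    if not clones:
        raise ValueError("at least one clone is required")
    n_sites = len(clones[0])
    if any(len(clone) != n_sites for clone in clones):
        raise ValueError("all clones must cover the same CpG sites")

    # Fraction of clones methylated at each CpG position.
    per_site = [
        sum(clone[i] == "1" for clone in clones) / len(clones)
        for i in range(n_sites)
    ]
    # Fraction of all CpG calls that are methylated.
    overall = sum(clone.count("1") for clone in clones) / (n_sites * len(clones))
    return per_site, overall


if __name__ == "__main__":
    # Hypothetical clones covering four CpG sites.
    example = ["1100", "1000", "1110", "0000"]
    per_site, overall = methylation_summary(example)
    print("per-CpG methylated fraction:", per_site)
    print("overall methylated fraction:", overall)
```

With only a handful of clones per condition, these fractions are coarse, so small differences between conditions should not be over-interpreted.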

Interrogation of BIM

Figure 35 qPCR of BIM (A) mRNA (B) Exon-overlapping asRNA and (C) Promoter-overlapping asRNA following truncation and promoter excision of BIM-as.

None of the shRNAs tested caused a significant difference in the expression of either BIM mRNA or asRNA, suggesting either the need to redesign my shRNAs targeted to BIM, or that BIM-as is a nuclear-specific transcript that shRNAs cannot access. In contrast, Cas9-mediated knockout of the asRNA demonstrated a total reduction of the asRNA – logical, as the targeted region no longer exists – and a 30% reduction of transcripts emanating from the promoter region of BIM (Figure 35). This indicates a failure to knock out the promoter, but success in producing a truncated lncRNA. The mRNA, by comparison, only experienced a mild reduction in expression – just 15%. This could represent a concordant relationship between asRNA and mRNA expression – in which case, the asRNA may be responsible for recruiting transcriptional activators. On the other hand, it may also be a modest consequence of manipulation of the genomic locus – an unintentional deletion of regulatory elements may also have driven the loss of expression.

Figure 36 DNMT3A ChIP at the BIM promoter following BIM-as truncation and promoter deletion.

ChIP-qPCR analysis reveals a 3-fold loss of DNMT3A at the BIM promoter following BIM-as truncation (Figure 36). As with PDCD4, excision-mediated ablation of the antisense transcript has driven a loss of DNMT3A localization at the BIM promoter, implicating BIM-as as a potential recruiter of DNMT3A. Unlike PDCD4, however, where loss of DNMT3A had no effect on mRNA expression, loss of DNMT3A at the BIM promoter drove a mild decrease in expression – which, as mentioned earlier, may be a consequence of the chromatin excision.

Figure 37 Lollipop chart indicating methylation state at the BIM promoter in BIM-as knockout HEK293 cells and wildtype HEK293 cells.

Bisulfite-PCR reveals no significant difference in methylation of the BIM promoter following truncation of BIM-as (Figure 37). Unlike PDCD4, the BIM promoter is almost entirely methylated in wildtype HEK293 cells – suggesting that loss of DNMT3A localization does not lead to loss of methylation. This is not entirely surprising, as DNMT3A is a de novo methyltransferase, rather than a maintenance one. Nevertheless, continued localization to the BIM promoter despite the presence of DNA methylation supports our hypothesis that an independent mechanism still operates to recruit it. One previously suggested mechanism for DNMT3A localization involves an interaction with H3K9me3. One study states that DNMT3A interacts with the protein HP1α, which in turn recognizes H3K9me3 through its chromodomain79, and another claims DNMT3A interacts with a G9a/GLP/MPP8 complex to simultaneously deposit 5mC and H3K9 methylation128. Both studies claim that DNMT3A binding is in some fashion driven by methylated H3K9 recognition, so I investigated whether H3K9me3 levels differed between wildtype and BIM-as truncated HEK293 cells.

Figure 38 H3K9me3 ChIP at the BIM promoter in wildtype HEK293 cells vs BIM-as knockout cells.

Formaldehyde-crosslinked ChIP of H3K9me3 revealed a 7-fold increase in localization around the BIM promoter in BIM-as truncated cells (Figure 38). Interestingly, this contradicts previous reports that DNMT3A is recruited through H3K9me3 recognition, encouraging our efforts to seek an alternative mechanism of DNMT3A recruitment. However, this may also provide an explanation for the loss of BIM mRNA expression – H3K9me3 is typically indicative of constitutive heterochromatin.

Discussion

To reiterate work conducted thus far: DNMT3A has been previously described to modulate PTEN in a trans-acting fashion through an interaction with PTENpg112. The nature of this interaction, wherein PTENpg1 bears antisense sequence similarity to the promoter and 5’ UTR of the PTEN parent gene, led us to hypothesize that other natural antisense or promoter-associated lncRNAs may also modulate their cognate gene in a cis-acting fashion. This hypothesis drove us to generate and analyze three genome-wide datasets in the hopes of identifying a global mechanism driving DNMT3A recruitment and epigenetic modification. The results described here significantly extend our understanding of the mechanism of DNMT3A and the role of lncRNAs in its regulatory network.

DNMT3A interacts with non-coding RNAs

Our hypothesis proposes that a non-coding RNA overlapping the promoter of a gene is responsible for recruiting DNMT3A to its cognate locus. Our RIP-seq experiment, though intended to identify non-coding RNA candidates potentially responsible for such recruitment, instead revealed a host of mRNAs and rRNAs with which DNMT3A interacts, and few lncRNAs, not including our positive controls and previously reported results. This is far from indicative of a failed experiment, however. The lack of positive controls isn’t entirely surprising from a technique as untested and unrefined as RNA immunoprecipitation – it may simply be a case of excessive background in the prepared sample, false negatives arising from excessively rigorous statistical analyses, or any number of stochastic processes driven by human error. RIP-qPCR demonstrated enrichment of at least one positive control – PDCD4-as1 – so I decided to accept the hits as they are, while acknowledging that they may not represent the full breadth of DNMT3A-RNA interaction. The enrichment of ribosomal components is unsurprising given their sheer abundance within the cell. The number of mRNAs may be an interesting phenomenon in itself – for example, it’s possible that these mRNAs themselves may be responsible for recruiting DNMT3A, though this is a hypothesis for another project, and will be discussed further below. Instead, I investigated the possibility that either unannotated non-coding transcription was occurring, or annotated antisense transcription was being misidentified as the sense transcript, as the two cannot be distinguished in our unstranded library. Indeed, I identified several loci matching that description. BIM contained unannotated antisense transcription emanating from the first intron, while SLIT2 had an extended 5’ UTR – though possibly also an independent transcript. These two transcripts, along with the PDCD4 antisense transcript, which also emanates from the first intron, were all validated via qPCR. Two other genes I chose to investigate, DLG5 and EPHA5, both contained non-overlapping antisense transcripts emanating from a bidirectional promoter. However, neither DLG5-as1 nor EPHA5-as1 was demonstrated to interact with DNMT3A based on qPCR. These findings affirm our hypothesis that DNMT3A does indeed interact with lncRNAs.

DNMT3A may interact with Firre

Figure 39 Visualization of DNMT3A ChIP macs2-outputted bedgraph files in IGV. Top: DNMT3A ChIP-seq peaks. Middle: Input peaks. Bottom: FIRRE gene locus.

An interesting hit from my RIP analysis is the lincRNA Firre. Firre has been previously described to mediate nuclear architecture in a 5-megabase domain around its locus129. Hacisuleyman et al. found that Firre interacts with hnRNPU to recruit trans-chromosomal domains to a region proximal to the Firre locus, and hypothesize that these domains form a subcompartment of genes intended to undergo concordant regulation. They also hypothesize that, in order to recruit some trans-chromosomal domains, hnRNPU forms a complex with Firre and other as-of-yet unidentified sequence-specific proteins – which could indeed include DNMT3A. Confounding this possibility is the finding that of the five genes they found to interact with Firre – Slc25a12, Ypel4, Eef1a1, Atf4, and Ppp1r10 – none were found to also interact with DNMT3A in our ChIP-seq experiment. This, of course, does not necessarily quash our interest in this matter – Hacisuleyman et al. also mention 34 other Firre localization sites that do not overlap with mRNAs. With the knowledge that DNMT3A engages in transposon silencing, this appears even more likely, and makes the idea that a mechanism exists that allows DNMT3A to concordantly regulate multiple loci simultaneously a rather appealing one130. Reinforcing this hypothesis is the finding that a DNMT3A ChIP-seq peak occurs near the transcription start site of the Firre gene – a site that Firre is known to localize to as well (Figure 39). Further investigation into DNMT3A’s interaction with Firre may be worthwhile.

DNMT3A recruitment is contingent on a non-coding RNA transcript

To verify the role played by these lncRNAs in recruiting DNMT3A and modulating gene expression, I devised two approaches for lncRNA ablation: RNA suppression via shRNAs, and gene excision via Cas9. Neither method is without its flaws, though both offer their own advantages. The Cas9-mediated gene excision or promoter deletion of the lncRNA is a conceptually elegant method of lncRNA ablation that has gained considerable traction in the wake of CRISPR’s popularization95,99. The idea, mentioned earlier, is simple: by deleting the lncRNA or its promoter from its endogenous locus, it can no longer be expressed. For lincRNAs, this is a procedure without many complex considerations – lincRNAs, intergenic by nature, are amenable to deletion without affecting nearby genes. My own putative lncRNA targets, however, either share a promoter with their cognate gene (SLIT2), or emanate from its first intron (PDCD4, BIM), a region known to contain regulatory elements98. However, DNMT3A is not known to recognize DNA motifs – so any loss or gain of DNMT3A localization in the absence of a region driving or encoding a lncRNA is revelatory in some fashion regardless of the outcome. The Cas9-mediated genomic excision consistently reduced or eliminated the expression of the lncRNA across all three genes investigated. In PDCD4-as1 knockouts, most of the first exon and part of the estimated promoter of PDCD4-as1 were deleted, successfully reducing expression of the remaining second exon by 30%. In spite of this relatively small effect, however, DNMT3A localization to the PDCD4 promoter was eliminated almost entirely. From this, we can infer that some feature of this genomic region is essential for the recruitment of DNMT3A. In BIM-as knockouts, lncRNA expression was also successfully ablated – eliminated entirely within the deleted region, and decreased by ~30% in the remaining transcript. Because BIM-as is an unannotated transcript, a large portion of BIM-as’ first intron was deleted to effectively capture the BIM-as promoter. Loss of this region resulted in a 3-fold loss of DNMT3A localization, which also coincided with a 7-fold increase in H3K9me3. H3K9me3 has been previously reported to recruit DNMT3A, though my results indicate that H3K9me3-mediated recruitment may involve a more complex network than previously thought. Though there remain other alternative hypotheses to explore, this slightly strengthens the possibility that BIM-as is playing a role in DNMT3A recruitment.

Figure 40 Some models that may govern DNMT3A interaction with lncRNAs and recruitment to genomic loci. (A) DNMT3A may interact with asRNAs, which forms an RNA:RNA duplex with the nascent cognate mRNA or paRNA. (B) The asRNA may interact with two sets of proteins: one of which includes DNMT3A, the other which includes other proteins that confer loci specificity. (C) DNMT3A interacts with the lncRNA, which forms an RNA:DNA triplex via Hoogsteen base-pairing to confer loci specificity to DNMT3A. All models likely involve other protein players, whose identities are currently unknown.

Based on these two findings, I propose four possible models of DNMT3A recruitment mediated by lncRNAs. The first possible model states that DNMT3A interacts with the nascent asRNA transcript, which confers loci specificity upon it. This is a well-characterized model of DNA methylation in plants, and has been previously reported to modulate transcriptional regulation at least once in mammals as well. A 2010 study demonstrated that the nascent asRNA ANRIL recruits CBX7 – a member of the polycomb repressive complex 1 – and induces chromatin remodeling on its cognate locus131.

The second model states that DNMT3A interacts with the mature asRNA transcript, which then base pairs with the lncRNA’s cognate mRNA to localize DNMT3A to its genomic locus by forming an RNA duplex (Figure 40). This model has only been demonstrated once before: the EBV lncRNA EBER2, discussed in the introduction58. Nevertheless, the demonstration that this mode of recruitment can occur raises the likelihood that it occurs elsewhere. The third model hypothesizes that DNMT3A interacts with the mature asRNA transcript, which then forms an RNA:DNA triplex with its cognate genomic locus (Figure 40). This model has been described at least three times, all in very similar capacities: a noncoding RNA complexes with regulatory proteins and recruits them to one or more loci for which it has triplex formation affinity. It should be noted that conventional base pairing does not drive triplex formation; instead, it’s determined by the presence of complementary nucleic acid motifs that form Hoogsteen and reverse Hoogsteen hydrogen bonds. Though this in theory could drive a plethora of trans-acting interactions, only one of the three characterized triplex-forming lncRNAs is thought to operate in trans: the lncRNA MEG3 was found to not only regulate TGF-β in trans, but also several other TGF-β pathway genes132. The other two regulate their cognate loci: PARTICLE is a radiation-induced antisense lncRNA that regulates its sense gene, MAT2A, while the somewhat broadly titled pRNA is expressed upstream of the rRNA transcription start site and is required for methylation of the rRNA promoter133,134. Interestingly, pRNA exerts its methylating action via recruitment of DNMT3B. It should be mentioned that these two transcripts may also act in trans, even if their regulation is specific to a single gene, as it’s difficult to demonstrate that a particular transcript is only acting on the allele from which it was transcribed.
Protein scaffolding, possibly the most common mode of lncRNA functionality described thus far, is unlikely based on my results – at least in the loci I investigated. In this model, two sets of proteins interact with the lncRNA: one set bears some catalytic function, while the other confers loci specificity (Figure 40). Because DNMT3A localization at the loci I’ve investigated is dependent on an asRNA emanating from that locus, it seems unlikely that another factor is involved in conferring specificity.

DNMT3A is modulated by PDCD4-as1 in a bimodal fashion

Efforts to suppress PDCD4-as1 through two modes resulted in a serendipitous finding: the shRNA-mediated knockdown and the Cas9-mediated gene excision, despite both successfully ablating PDCD4-as1 expression, had a differential impact on DNMT3A localization. Promoter and first exon excision eliminated DNMT3A localization almost entirely, while the shRNA-mediated knockdown increased DNMT3A binding by ~30%. Moreover, increased DNMT3A localization was correlated with a proportional (also ~30%) increase in PDCD4 mRNA expression in the absence of increased promoter methylation, while decreased localization had no effect on gene expression. RNA-seq data indicated that PDCD4 experienced no change in expression upon depletion of DNMT3A. These results, taken together, suggest that PDCD4 is positively regulated by DNMT3A, but is not subject to DNMT3A regulation in its native state. PDCD4-as1, on the other hand, is negatively regulated by DNMT3A – it experienced a 2-fold increase in expression upon DNMT3A depletion. However, my experiments demonstrate that PDCD4-as1 also interacts with DNMT3A. Several possible models may be proposed to account for this phenomenon. First, the nascent and mature PDCD4-as1 transcripts may exhibit bimodal functionality. The nascent PDCD4-as1 transcript may recruit DNMT3A to negatively regulate its own expression in a fashion dependent on its first exon. The mature PDCD4-as1 transcript, on the other hand, may catalytically inhibit DNMT3A and prevent its localization. Localization of the mature transcript bears on this hypothesis; the reasonable efficiency of the shRNA suggests that it was operating on a cytoplasmic transcript. If PDCD4-as1 can modulate cytoplasmic DNMT3A activity, it may function as a global modulator of DNMT3A. Another possibility is that the genomic locus excised by Cas9 may itself contain some DNMT3A recruitment elements, while the PDCD4-as1 transcript is responsible for inhibiting DNMT3A localization. This model of bimodal regulation has been previously demonstrated by the lncRNA HAUNT, whose locus operates as an enhancer of HOXA expression, while the lncRNA itself induces heterochromatin formation and transcriptional repression at the HOXA locus99. Though no global DNMT3A recruitment motif has ever been reported, it’s also possible that a DNMT3A-interacting protein required for its recruitment may recognize a motif in that region. Alternatively, increased recruitment of DNMT3A and gene activation may not relate to the endogenous locus at all, but may instead be an unintended experimental side effect: siRNAs and dsRNAs, intermediaries in shRNA processing, have been demonstrated to cause activation of specific genes in an Ago2-dependent manner93,94. The extent to which this mechanism occurs and the conditions required to provoke it are largely unknown, making this possibility a potentially challenging one to unravel. Though I have presented three possibilities here, in the absence of further information, the speculative possibilities are endless. More experiments to deconvolute the molecular network at play are to be conducted, which will be discussed later.

PDCD4 may be subject to active demethylation and methylation cycling

A curious finding is that increased DNMT3A localization corresponds to increased PDCD4 expression. I mentioned in the introduction a mechanism by which this is possible: active demethylation, wherein 5mC is progressively oxidized before being reestablished and subject to oxidation once more. A more recent study described that active demethylation did correlate with an accumulation of active histone marks, though not necessarily transcriptional activation135. One of the original methylation cycling studies demonstrated DNMT3A to be a key player in gene activation mediated by cycling, and they, as well as another study, demonstrated the ability of DNMT3A to remove 5hmC in vitro69,77. Therefore, it may be that DNMT3A, rather than suppressing the PDCD4 locus, is engaging in deamination of 5hmC. Alternatively, 5mC marks may serve as the bottleneck in this particular instance of methylation cycling, and DNMT3A may be depositing transient 5mC marks that are rapidly cleared by demethylation machinery, subsequently driving increased transcription. Bisulfite analysis of the PDCD4 promoter’s CpG island provides some insight on this matter. 5mC marks are largely depleted in both wildtype and small-hairpin transfected conditions, though not entirely. This can be reconciled with both possibilities mentioned earlier: DNMT3A may indeed be engaging in a function independent of its methyltransferase activity – in which case, 5hmC analysis would likely be more informative in determining whether DNMT3A is deaminating 5hmC. This is an experiment currently under way. Alternatively, the presence of lingering 5mC marks may also suggest that DNMT3A is still engaging in the deposition of 5mC – however, not enough alleles were sequenced in this particular experiment to determine if this is true. Increased localization of TET family proteins would lend credence to this hypothesis, as it would suggest increased active demethylation. It may be worth experimentally modulating TET protein levels to gauge their role in PDCD4 expression. It’s also worth mentioning that the lack of any impact from loss of DNMT3A does not necessarily indicate that DNMT3A plays a minimal role in regulating PDCD4 transcription – its absence may be compensated for by, for example, DNMT3B.

DNMT3A is recruited by BIM-as independently of H3K9me3

BIM-as is a novel lncRNA that I extrapolated from the ab initio transcriptome generated by my collaborator, and was subsequently found to interact with DNMT3A. Interestingly, excising a large portion of the intron-overlapping segment of BIM-as while largely maintaining expression of the promoter-overlapping segment significantly decreased DNMT3A localization, indicating the necessity of the truncated region in recruiting DNMT3A. Meanwhile, truncation of the intron-overlapping region simultaneously increased H3K9me3 – a repressive histone mark said to recruit DNMT3A – positioning BIM-as in a network involved in regulating H3K9me3 as well. Both PDCD4 and BIM experienced a loss of DNMT3A localization upon truncation of the region of the lncRNA overlapping their cognate gene, suggesting that this overlap is important for DNMT3A recruitment. This excision also drove a relatively minor concordant loss of expression in the remainder of BIM-as and BIM mRNA (~20%) – which may be a consequence of the deletion event, as our RNA-seq data demonstrates that BIM is upregulated upon depletion of DNMT3A. Nevertheless, the loss of DNMT3A in spite of the accumulation of a recruiting mark makes this finding of some interest. One molecular mechanism by which DNMT3A and H3K9me3 co-localize has been described, though in a site-dependent manner128. Chang et al. describe how the protein MPP8 binds to a dimethylated form of DNMT3A, DNMT3AK44me2, and forms a complex between itself, DNMT3AK44me2, and the H3K9 methyltransferases G9a and GLP – previously described to be responsible for H3K9me2136. This complex then simultaneously induces H3K9me2 and DNA methylation. Another study demonstrates that G9a and GLP form a complex with other H3K9 methyltransferases, SETDB1 and Suv39h1, to induce both H3K9me2 and H3K9me3137. SETDB1 has been previously described to interact with DNMT3A as well, and SETDB1 and DNMT3A were reported to be co-dependent on one another for silencing of a reporter gene138.
However, a more recent study reports that DNMTs and SETDB1 regulate largely independent sets of genes – suppression of DNMTs and of SETDB1 reveals that only about 8% of differentially regulated genes are shared between the two, and the authors suggest that SETDB1 deposits H3K9me3 independently of DNA methylation139. However, deletion of Suv39h1 has been shown to result in a loss of both H3K9me3 and DNA methylation at major satellite repeats, while deletion of DNMTs has no effect on H3K9me3 in pericentromeric heterochromatin79. These seemingly contradictory findings, occasionally paradoxical even within the same report, illustrate a highly complex and context-dependent network orchestrated by different H3K9 methyltransferases and DNMTs in repressing transcription. My results demonstrate that H3K9me3 alone is insufficient for maintaining endogenous levels of DNMT3A recruitment. The disproportionate increase in H3K9me3 following BIM-as knockout, regardless of whether the critical element was the non-coding RNA or another feature of the region, may have served as a compensatory mechanism to attempt to maintain regional levels of DNMT3A – although still failing to do so. Alternatively, H3K9me3 may be compensating for the loss of DNMT3A through recruitment of DNMT3B instead – DNMT3B has also been demonstrated to interact with HP1α along with DNMT3A79. Bisulfite PCR revealed that the BIM locus is heavily methylated in HEK293 cells, indicating heavy suppression – an unsurprising finding, given the pro-apoptotic function of BIM and the cancer-like phenotype of HEK293. However, loss of DNMT3A localization does not correspond to a loss of BIM promoter methylation. This indicates that methylation of the BIM promoter is either maintained by DNMT3B and/or DNMT1, or one or more of the DNMTs are being recruited to compensate for the loss of DNMT3A. Naturally, this is consistent with my earlier hypothesis accounting for the increase in H3K9me3.
Alternatively, DNMT3A may not be responsible for promoter methylation – instead, it may be involved in gene body methylation. Gene body methylation is a poorly characterized epigenetic feature; however, it is generally described as a feature of actively transcribed genes140-142. Unlike intragenic DNA methylation, however, intragenic H3K9me3 is generally associated with transcriptional repression141. If intragenic methylation is indeed decreased as a consequence of the loss of DNMT3A, an increase in intragenic H3K9me3 may follow. This hypothesis is not supported by our RNA-seq data, which indicates that DNMT3A is a negative regulator of BIM. However, given the poor understanding of intragenic methylation, the possibility may still be worth investigating.

Future Directions and Conclusions

DNMT3A may be modulated in a number of possible fashions

The results presented in this report provide considerable evidence for lncRNA-driven mediation of DNMT3A. Several avenues for further investigation remain available to us. First, I will illustrate the possibilities that stand before us, and the strengths and weaknesses of each. Second, I will suggest experiments and investigations that could demystify the molecular network before us. The first possibility is our hypothesis: DNMT3A is indeed interacting with our lncRNA to modulate its recruitment to the lncRNA’s cognate locus, as discussed earlier. However, transcription of the antisense lncRNA is never ablated entirely by our deletion events, while recruitment of DNMT3A is disproportionately affected – if our hypothesis were true, it would suggest that DNMT3A recruitment is at least partially dependent on some character of the deleted region, such as secondary structure or sequence, rather than the entire transcript. The evidence available to us right now encourages this possibility. In selecting the objects of our investigation from RIP-seq data, we are choosing transcripts that interact directly or in complex with DNMT3A. The second possibility is that, rather than our lncRNA, DNMT3A is interacting with the mRNA transcript. It is not outside the realm of possibility that mRNA, in addition to coding for a protein, may serve some other function through its sequence chemistry or secondary structure – in other words, the mRNA may be a protein-coding “lncRNA” of sorts. The same principle from the lncRNA hypothesis applies to the functional mRNA hypothesis as well: there exists a region, specifically situated around the first intron or 5’ end of the transcript, that is responsible for recruitment of DNMT3A. As far as I can tell, an interaction of this nature currently lacks precedent, but it’s one that may be worth considering nonetheless.
Before discussing the third and fourth hypotheses, we must briefly discuss an issue that may have confounded our RIP-seq: our formaldehyde cross-linking. Although it improves interaction stability and overall sensitivity, covalent cross-linking combined with the general preservation of the entire bound transcript means that there is a strong possibility that indirect interactions are captured as well – meaning that DNMT3A may not be the protein directly modulated by the RNA species, but rather, merely a member of the greater complex. The third possibility is that, rather than interacting with the RNA, DNMT3A instead interacts with the DNA, and the RNA transcript was merely captured as part of the greater cross-linked complex. This hypothesis assumes that some part of the genomic region may contain some sort of binding motif recognized by DNMT3A. When we deleted the portion of the genome we believed contained promoter elements for our antisense transcript, we might instead have targeted a region containing repressive elements directed towards expression of the sense transcript. Given that regulatory elements are known to occur in the first intron, this is certainly not outside the realm of possibility143. However, this seems unlikely, as no DNA-binding motif for DNMT3A has yet been identified, nor could one be identified from our own ChIP-seq experiment. The fourth possibility is that, rather than interacting with the RNA or DNA, DNMT3A interacts with a protein intermediary that is the target of RNA or DNA modulation instead of DNMT3A. This is a possibility that will likely be raised by objectors to our hypothesis. The current champion for lncRNA-mediated epigenetic modification is the Polycomb Repressive Complex 2 (PRC2) – although which of its member components is most involved in lncRNA interaction is hotly debated144,145.
PRC2 is thought to confer loci specificity not through direct interactions with a lncRNA, but rather, through another sequence-specific protein that is also bound to the same lncRNA molecule (see Table )22. It is certainly possible that DNMT3A may be operating in a similar fashion. My results offer support both for and against this possibility. That truncation of the lncRNA diminishes DNMT3A localization in a region so proximal to its own locus suggests that there’s likely no need for a protein intermediary to confer such specificity – it encourages the possibility of a cis-acting interaction. On the other hand, that suppression of PDCD4-as1 using shRNA increases DNMT3A localization suggests the possibility of an independent means of localization – though as stated earlier, this is but one possibility. These four possibilities are not mutually exclusive; in fact, it’s likely naïve to believe that any one mechanism is responsible for mediating recruitment of DNMT3A – given the breadth of its activity that I have discussed thus far. Any combination of these four potential regulatory actors may form a complex network of interactions responsible for repression or recruitment of DNMT3A or DNMT3A-interacting factors. The dependence of DNMT3A recruitment on the region of the asRNA overlapping the mRNA suggests that an interaction between the asRNA and the mRNA is important – an RNA:RNA duplex, perhaps. Alternatively, the asRNA may depend on an interaction with the open, actively transcribed chromatin instead – an RNA:DNA duplex. These modes, along with several others, were discussed in the previous section – and represent but a subset of the possible mechanisms that remain.

Investigation into DNMT3A interaction is necessary

The next stage of experiments should aim to resolve these different hypotheses and establish a clear mechanism by which DNMT3A recruitment is mediated. First, we should investigate the validity of any interaction between DNMT3A and our transcript of interest while also considering the possibility of additional interactions, or lack thereof, with other nucleic acid species. Second, we should investigate any other possible protein interactors, through investigations into the RNA, the DNA, and DNMT3A itself. There are several possible options for determining the validity of a DNMT3A-nucleic acid interaction. We could essentially repeat our experiments with greater efforts made towards stringency; for example, we could simply perform an RNA immunoprecipitation without cross-linking, thereby reducing the possibility of any non-specific or distal interactions being immunoprecipitated. This approach perhaps offers the least promise of new data, however, and adopting an experiment such as iCLIP – discussed earlier – would be enormously preferable instead. In vitro assays to validate direct DNMT3A interaction with the nucleic acids or proteins in question may also be desirable to bolster any in vivo work conducted, and could potentially detect nuanced differences between DNMT3A interactions with different molecules. Such in vitro assays include electrophoretic mobility shift assays, isothermal titration calorimetry, and microscale thermophoresis – all intended to detect the physical change generated by the interaction of two molecules. Several of the possible models can also be explored simultaneously. Interrogating the interactome of lncRNAs of interest, though a blind experiment, is likely to reveal some finding of interest that would, at the very least, serve to refine potential hypotheses worth exploring. An early approach to this end involved the in vitro transcription and biotinylation of the transcript of interest, followed by transfection of the biotinylated transcript or incubation with cellular lysate146. The biotinylated transcript, along with any proteins bound to it, is then precipitated using streptavidin, which forms a powerful non-covalent bond with biotin. Mass spectrometry can then be used to identify proteins bound to the synthetic transcript.
This technique has a few caveats: overexpression and lysate incubation both represent non-endogenous environments that may skew the RNA molecule’s interaction profile.

Three similar techniques that address this problem have recently been developed: identification of direct RNA interacting proteins (iDRiP), RNA antisense purification (RAP-MS), and Chromatin Isolation by RNA Purification (ChIRP-MS)112,113,147. Essentially, they all involve the use of several biotinylated oligonucleotides tiled along and complementary to the transcript of interest. These oligonucleotides are incubated with cell lysate, where they are left to bind to the transcript of interest, before being precipitated with streptavidin. iDRiP, RAP, and ChIRP differ slightly in their protocols: RAP uses 90-nucleotide oligonucleotides, iDRiP 25, and ChIRP 20. iDRiP and RAP feature an ultraviolet cross-linking step, while ChIRP features cross-linking with 3% formaldehyde. iDRiP also features a DNase I treatment step to reduce off-target signal, while RAP allows for quantitative comparison by growing its cells in SILAC media. Each technique was used to explore the XIST interactome, and the sets of proteins they revealed largely overlapped, with some exceptions. These techniques can also be used to identify chromatin or RNA interacting partners by purifying nucleic acids instead of proteins and sequencing the bound molecules, the purpose for which they were originally designed148,149. This approach stands to be a powerful method for deciphering the web of possibilities before us. Discovering the protein partners of DNMT3A-interacting lncRNAs across different functional contexts will help elucidate the conditions necessary for DNMT3A to exert its differential regulatory actions. Furthermore, identifying chromatin or RNA species interacting with the lncRNA may also reveal the nature of DNMT3A’s localization: whether it depends on an RNA:RNA interaction or an RNA:DNA interaction, for example. The exact nuances of such interactions, however, will require other approaches to investigate.
A few different approaches have been used to differentiate RNA:DNA triplexes, RNA:DNA duplexes, and RNA:RNA duplexes. The first involves RNase H, which degrades RNA:DNA duplexes, but not RNA:RNA duplexes or RNA:DNA triplexes134,150. An antibody for RNA:DNA duplexes (Kerafast, S9.6) could also be used to determine the presence of said duplexes, although differentiating between lncRNA:DNA interactions and active transcription or R-loops may pose other challenges. Alternatively, detecting physical changes in vitro has also been used to detect the formation of certain structures: circular dichroism spectroscopy and surface plasmon resonance have both been used to determine triplex formation by non-coding RNAs132,133.

iCLIP will reveal DNMT3A binding domains on RNA

iCLIP is often considered a more attractive alternative to RIP-Seq, given its greater stringency and its ability to reveal the protein binding site on RNA at a high resolution103. iCLIP excludes more non-specific signal through a partial RNA digestion and a gel extraction step. Because the protein-RNA interaction is cross-linked with UV light, proteinase digestion leaves residual peptide at the site of interaction that blocks reverse transcription, truncating the cDNA. This truncation allows the site of interaction to be identified during downstream analysis. iCLIP is considered less sensitive than RIP-Seq, however, due to the partial RNase digestion step and the general inefficiency of UV cross-linking151. An alternative, PAR-CLIP, involves feeding cells photoreactive ribonucleoside analogues to improve cross-linking efficiency; however, these ribonucleosides often exhibit undesirable toxicity, so the technique is not always viable. The loss of sensitivity may itself be a desirable feature: non-specific or infrequently binding RNA species may be less likely to bear physiological relevance than frequently binding species, which are also less likely to be excluded.
Therefore, positive hits from iCLIP are likely to be better justified for experimental validation than RIP-Seq hits. Such confidence is placed in CLIP-Seq’s low background that it is often run without controls. The only thing preventing us from conducting our own CLIP-Seq experiment thus far is the absence of radiation licensing to use 32P-labelled ATP, which is essential to CLIP-Seq experiments (verbal communication).
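
The truncation logic described above can be illustrated with a minimal sketch. It assumes that aligned iCLIP reads are available as (chromosome, start, end, strand) tuples in 0-based, half-open coordinates, and it adopts the common convention that the cross-linked nucleotide lies immediately upstream of the cDNA start. The function name crosslink_sites and the input format are hypothetical and are not part of any pipeline used in this thesis.

```python
from collections import Counter


def crosslink_sites(reads):
    """Count inferred cross-link positions from aligned iCLIP reads.

    Assumption (hypothetical input format): each read is a tuple
    (chrom, start, end, strand) in 0-based, half-open coordinates.
    The cross-link site is taken to be the nucleotide immediately
    upstream of the cDNA 5' end, where reverse transcription stalled.
    """
    counts = Counter()
    for chrom, start, end, strand in reads:
        if strand == "+":
            site = start - 1  # one base before the first aligned base
        elif strand == "-":
            site = end        # upstream on the minus strand is the next higher coordinate
        else:
            raise ValueError(f"unknown strand: {strand!r}")
        if site < 0:
            continue          # read begins at the chromosome edge; no upstream base
        counts[(chrom, site, strand)] += 1
    return counts


if __name__ == "__main__":
    example_reads = [
        ("chr10", 100, 140, "+"),
        ("chr10", 100, 135, "+"),
        ("chr10", 60, 100, "-"),
    ]
    for site, n in sorted(crosslink_sites(example_reads).items()):
        print(site, n)
```

Positions supported by many independent truncation events would be the candidates carried forward for validation; in practice, PCR duplicates would need to be collapsed before counting.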

Many mysteries about DNMT3A will have to be answered

Peppered throughout the discussion and results are the various findings, both my own and others’, that paint a seemingly contradictory portrait of DNMT3A and DNA methylation. DNMT3A has variously been described as a methyltransferase, a demethylase, a transcriptional repressor, a transcriptional activator, as associated with histone marks, as independent of histone marks, and more. My own results demonstrate a multivalent function of DNMT3A: acting to both suppress and activate genes, associating with both methylated and unmethylated regions, and operating independently of H3K9me3. The cumulative conclusion that can be drawn from research conducted thus far is that DNMT3A is a multi-faceted protein involved in manifold functions and modes of regulation, dependent on cellular and regional context. Perhaps one failure of research conducted thus far is the tendency to conflate DNMT3A and DNMT3B, and occasionally DNMT1 as well, particularly when investigating their interactions with and histone methyltransferases. Research has demonstrated that DNMT3A and DNMT3B are able to regulate independent arrays of genes, and that DNMT3A is responsible for epigenetic architecture independently of DNMT3B152. For example, DNMT3A was found to be responsible for maintaining wide “canyons” of low methylation by marking their borders with 5hmC, and for methylation of intergenic regions to maintain active expression of neurogenic genes by antagonizing PRC2 localization114,153. It is clear that each DNMT should be investigated independently, as they bear distinct vignettes of molecular activity. Likewise, the function of each DNMT should be investigated in a locus-specific manner as well. My results demonstrate at least two distinct functional profiles for DNMT3A localization, both congruent with and distinct from previous reports.
The assorted and disparate profiles of protein interaction and activity demonstrated thus far are a testament to the need for contextual molecular mechanisms over big data. Without an understanding of when DNMT3A functions in the way it does and why, interpretations of genome-wide data become hampered and inconclusive. I have demonstrated that DNMT3A interacts with antisense lncRNAs, and that these lncRNAs mediate recruitment of DNMT3A in a fashion dependent on the sequences that overlap the mRNA of their sense gene. I have also demonstrated that DNMT3A localization can occur independently of 5mC deposition, and that DNMT3A can be involved in transcriptional activation. Finally, I have shown that DNMT3A recruitment can be independent of H3K9 trimethylation. My results are novel contributions to the understanding of DNMT3A activity, and help to unravel the full breadth of its function by implicating lncRNAs as significant players in its recruitment.

References

1. Yu, A. D., Wang, Z. & Morris, K. V. Long noncoding RNAs: a potent source of regulation in immunity and disease. Immunol. Cell Biol. 93, 277–283 (2015).
2. Morris, K. V. & Mattick, J. S. The rise of regulatory RNA. Nature Reviews Genetics 1–15 (2014). doi:10.1038/nrg3722
3. Consortium, T. E. P. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 488, 57–74 (2013).
4. Graur, D. et al. On the immortality of television sets: ‘function’ in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol 5, 578–590 (2013).
5. Ma, L., Bajic, V. B. & Zhang, Z. On the classification of long non-coding RNAs. RNA Biology 10, 925–933 (2013).
6. Yan, B.-X. & Ma, J.-X. Promoter-associated RNAs and promoter-targeted RNAs. Cell. Mol. Life Sci. 69, 2833–2842 (2012).
7. Wen, Y.-Z., Zheng, L.-L., Qu, L.-H., Ayala, F. J. & Lun, Z.-R. Pseudogenes are not pseudo any more. RNA Biology 9, 27–32 (2012).
8. Hsieh, C. L. et al. Enhancer RNAs participate in androgen receptor-driven looping that selectively enhances gene activation. Proc. Natl. Acad. Sci. U.S.A. 111, 7319–7324 (2014).
9. Ilott, N. E. et al. Long non-coding RNAs and enhancer RNAs regulate the lipopolysaccharide-induced inflammatory response in human monocytes. Nat Commun 5, 3979 (2014).
10. Rinn, J. L. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007).
11. Chow, J. C., Yen, Z., Ziesche, S. M. & Brown, C. J. Silencing of the mammalian X chromosome. Annu Rev Genomics Hum Genet 6, 69–92 (2005).
12. Johnsson, P. et al. A pseudogene long-noncoding-RNA network regulates PTEN transcription and translation in human cells. Nat Struct Mol Biol 20, 440–446 (2013).
13. Tay, Y. et al. Coding-Independent Regulation of the Tumor Suppressor PTEN by Competing Endogenous mRNAs. Cell 147, 344–357 (2011).
14. Han, Y. J., Ma, S. F., Yourek, G., Park, Y. D. & Garcia, J. G. N. A transcribed pseudogene of MYLK promotes cell proliferation. The FASEB Journal 25, 2305–2312 (2011).
15. Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 457, 223–227 (2009).
16. Gutschner, T. & Diederichs, S. The hallmarks of cancer: A long non-coding RNA point of view. RNA Biology 9, 703–719 (2012).
17. Cabianca, D. S. et al. A Long ncRNA Links Copy Number Variation to a Polycomb/Trithorax Epigenetic Switch in FSHD Muscular Dystrophy. Cell 149, 819–831 (2012).
18. van Dijk, M. et al. HELLP babies link a novel lincRNA to the trophoblast cell cycle. J. Clin. Invest. 122, 4003–4011 (2012).
19. Gomez, J. A. et al. The NeST Long ncRNA Controls Microbial Susceptibility and Epigenetic Activation of the Interferon-γ Locus. Cell 152, 743–754 (2013).
20. Saayman, S. et al. An HIV-encoded antisense long non-coding RNA epigenetically regulates viral transcription. Mol Ther (2014). doi:10.1038/mt.2014.29
21. Mercer, T. R. & Mattick, J. S. Structure and function of long noncoding RNAs in epigenetic regulation. Nat Struct Mol Biol 20, 300–307 (2013).
22. Tsai, M.-C. et al. Long noncoding RNA as modular scaffold of histone modification complexes. Science 329, 689–693 (2010).
23. Shi, L. et al. Noncoding RNAs and LRRFIP1 regulate TNF expression. J. Immunol. 192, 3057–3067 (2014).
24. Cui, H. et al. The human long noncoding RNA lnc-IL7R regulates the inflammatory response. Eur. J. Immunol. 44, 2085–2095 (2014).
25. Zhao, J. et al. Genome-wide Identification of Polycomb-Associated RNAs by RIP-seq. Molecular Cell 40, 939–953 (2010).
26. Khalil, A. M. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. U.S.A. 106, 11667–11672 (2009).
27. Kambara, H. et al. Negative regulation of the interferon response by an interferon-induced long non-coding RNA. Nucl. Acids Res. gku713 (2014). doi:10.1093/nar/gku713
28. Wang, K. C. et al. A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature 472, 120–124 (2011).
29. Yang, Y. W. et al. Essential role of lncRNA binding for WDR5 maintenance of active chromatin and embryonic stem cell pluripotency. eLife 3, e02046–e02046 (2014).
30. Sharma, S. et al. Dephosphorylation of the nuclear factor of activated T cells (NFAT) transcription factor is regulated by an RNA-protein scaffold complex. Proc. Natl. Acad. Sci. U.S.A. 108, 11381–11386 (2011).
31. Carpenter, S. et al. A Long Noncoding RNA Mediates Both Activation and Repression of Immune Response Genes. Science 341, 789–792 (2013).
32. Li, Z. et al. The long noncoding RNA THRIL regulates TNFα expression through its interaction with hnRNPL. Proc. Natl. Acad. Sci. U.S.A. 111, 1002–1007 (2014).
33. Campillos, M. et al. Specific interaction of heterogeneous nuclear ribonucleoprotein A1 with the -219T allelic form modulates APOE promoter activity. Nucl. Acids Res. 31, 3063–3070 (2003).
34. Kuninger, D. T., Izumi, T., Papaconstantinou, J. & Mitra, S. Human AP-endonuclease 1 and hnRNP-L interact with a nCaRE-like repressor element in the AP-endonuclease 1 promoter. Nucl. Acids Res. 30, 823–829 (2002).
35. Hasegawa, Y. et al. The matrix protein hnRNP U is required for chromosomal localization of Xist RNA. Developmental Cell 19, 469–476 (2010).
36. Huarte, M. et al. A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell 142, 409–419 (2010).
37. Vacha, S. J., Bennett, G. D., Mackler, S. A., Koebbe, M. J. & Finnell, R. H. Identification of a growth arrest specific (gas 5) gene by differential display as a candidate gene for determining susceptibility to hyperthermia-induced exencephaly in mice. Dev. Genet. 21, 212–222 (1997).
38. Mourtada-Maarabouni, M., Hedge, V. L., Kirkham, L., Farzaneh, F. & Williams, G. T. Growth arrest in human T-cells is controlled by the non-coding RNA growth-arrest-specific transcript 5 (GAS5). Journal of Cell Science 123, 1181–1181 (2010).
39. Kino, T., Hurt, D. E., Ichijo, T., Nader, N. & Chrousos, G. P. Noncoding RNA Gas5 Is a Growth Arrest- and Starvation-Associated Repressor of the Glucocorticoid Receptor. Science Signaling 3, ra8–ra8 (2010).
40. Rapicavoli, N. A. et al. A mammalian pseudogene lncRNA at the interface of inflammation and anti-inflammatory therapeutics. eLife 2, e00762–e00762 (2013).
41. Imamura, K. et al. Long Noncoding RNA NEAT1-Dependent SFPQ Relocation from Promoter Region to Paraspeckle Mediates IL8 Expression upon Immune Stimuli. Molecular Cell 53, 393–406 (2014).
42. Saha, S., Murthy, S. & Rangarajan, P. N. Identification and characterization of a virus-inducible non-coding RNA in mouse brain. J. Gen. Virol. 87, 1991–1995 (2006).
43. Clemson, C. M. et al. An Architectural Role for a Nuclear Noncoding RNA: NEAT1 RNA Is Essential for the Structure of Paraspeckles. Molecular Cell 33, 717–726 (2009).
44. Bolland, D. J. et al. Antisense intergenic transcription in V(D)J recombination. Nat Immunol 5, 630–637 (2004).
45. Verma-Gaur, J. & Torkamani, A. Noncoding transcription within the Igh distal VH region at PAIR elements affects the 3D structure of the Igh locus in pro-B cells. Proc. Natl. Acad. Sci. U.S.A., 17004–17009 (2012). doi:10.1073/pnas.1208398109
46. Sutherland, H. & Bickmore, W. A. Transcription factories: gene expression in unions? Nature Reviews Genetics 10, 457–466 (2009).
47. Pefanis, E. et al. Noncoding RNA transcription targets AID to divergently transcribed loci in B cells. Nature 1–17 (2014). doi:10.1038/nature13580
48. Weinberg, M. S. et al. The antisense strand of small interfering RNAs directs histone methylation and transcriptional gene silencing in human cells. RNA 12, 256–262 (2006).
49. Rossetto, C. C. & Pari, G. S. Kaposi's sarcoma-associated herpesvirus noncoding polyadenylated nuclear RNA interacts with virus- and host cell-encoded proteins and suppresses expression of genes involved in immune modulation. J. Virol. 85, 13290–13297 (2011).
50. Rossetto, C. C., Tarrant-Elorza, M., Verma, S., Purushothaman, P. & Pari, G. S. Regulation of viral and cellular gene expression by Kaposi's sarcoma-associated herpesvirus polyadenylated nuclear RNA. J. Virol. 87, 5540–5553 (2013).
51. Reeves, M. B., Davies, A. A., McSharry, B. P., Wilkinson, G. W. & Sinclair, J. H. Complex I binding by a virally encoded RNA regulates mitochondria-induced cell death. Science 316, 1345–1348 (2007).
52. Urosevic, N., van Maanen, M., Mansfield, J. P., Mackenzie, J. S. & Shellam, G. R. Molecular characterization of virus-specific RNA produced in the brains of flavivirus-susceptible and -resistant mice after challenge with Murray Valley encephalitis virus. J. Gen. Virol. 78 (Pt 1), 23–29 (1997).
53. Pijlman, G. P. et al. A Highly Structured, Nuclease-Resistant, Noncoding RNA Produced by Flaviviruses Is Required for Pathogenicity. Cell Host & Microbe 4, 579–591 (2008).
54. Sheth, U. & Parker, R. Decapping and decay of messenger RNA occur in cytoplasmic processing bodies. Science 300, 805–808 (2003).
55. Chapman, E. G. et al. The structural basis of pathogenic subgenomic flavivirus RNA (sfRNA) production. Science 344, 307–310 (2014).
56. Moon, S. L. et al. A noncoding RNA produced by arthropod-borne flaviviruses inhibits the cellular exoribonuclease XRN1 and alters host mRNA stability. RNA 18, 2029–2040 (2012).
57. Schuessler, A. et al. West Nile virus noncoding subgenomic RNA contributes to viral evasion of the type I interferon-mediated antiviral response. J. Virol. 86, 5708–5718 (2012).
58. Lee, N., Moss, W. N., Yario, T. A. & Steitz, J. A. EBV Noncoding RNA Binds Nascent RNA to Drive Host PAX5 to Viral DNA. Cell 160, 607–618 (2015).
59. Iyer, M. K. et al. The landscape of long noncoding RNAs in the human transcriptome. Nature Genetics 1–13 (2015). doi:10.1038/ng.3192
60. Gupta, R. A. et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071–1076 (2010).
61. Yuan, J.-H. et al. A Long Noncoding RNA Activated by TGF-β Promotes the Invasion-Metastasis Cascade in Hepatocellular Carcinoma. Cancer Cell 25, 666–681 (2014).
62. Salmena, L., Poliseno, L., Tay, Y., Kats, L. & Pandolfi, P. P. A ceRNA Hypothesis: The Rosetta Stone of a Hidden RNA Language? Cell 146, 353–358 (2011).
63. Garmire, L. X. et al. A Global Clustering Algorithm to Identify Long Intergenic Non-Coding RNA - with Applications in Mouse Macrophages. PLoS ONE 6, e24051 (2011).
64. Peng, X. et al. Unique signatures of long noncoding RNA expression in response to virus infection and altered innate immune signaling. mBio 1, e00206–10–e00206–18 (2010).
65. Pelechano, V. & Steinmetz, L. M. Gene regulation by antisense transcription. Nature Reviews Genetics 14, 880–893 (2013).
66. Illingworth, R. S. et al. Orphan CpG Islands Identify Numerous Conserved Promoters in the Mammalian Genome. PLoS Genet 6, e1001134–15 (2010).
67. Vardimon, L., Kressmann, A., Cedar, H., Maechler, M. & Doerfler, W. Expression of a cloned adenovirus gene is inhibited by in vitro methylation. Proc. Natl. Acad. Sci. U.S.A. 79, 1073–1077 (1982).
68. Stein, R., Razin, A. & Cedar, H. In vitro methylation of the hamster adenine phosphoribosyltransferase gene inhibits its expression in mouse L cells. Proc. Natl. Acad. Sci. U.S.A. 79, 3418–3422 (1982).
69. Métivier, R. et al. Cyclical DNA methylation of a transcriptionally active promoter. Nature 452, 45–50 (2008).
70. Kangaspeska, S. et al. Transient cyclical methylation of promoter DNA. Nature 452, 112–115 (2008).
71. Ito, S. et al. Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Science (2011). doi:10.1126/science.1203690
72. Zhang, L. et al. Thymine DNA glycosylase specifically recognizes 5-carboxylcytosine-modified DNA. Nature Chemical Biology 8, 328–330 (2012).
73. Booth, M. J. et al. Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution. Science 336, 934–937 (2012).
74. Song, C.-X. et al. Genome-wide Profiling of 5-Formylcytosine Reveals Its Roles in Epigenetic Priming. Cell 153, 678–691 (2013).
75. Okano, M., Bell, D. W., Haber, D. A. & Li, E. DNA Methyltransferases Dnmt3a and Dnmt3b Are Essential for De Novo Methylation and Mammalian Development. Cell 99, 247–257 (1999).
76. Suetake, I., Shinozaki, F., Miyagawa, J., Takeshima, H. & Tajima, S. DNMT3L stimulates the DNA methylation activity of Dnmt3a and Dnmt3b through a direct interaction. Journal of Biological Chemistry 279, 27816–27823 (2004).
77. Chen, C.-C., Wang, K.-Y. & Shen, C.-K. J. The Mammalian de Novo DNA Methyltransferases DNMT3A and DNMT3B Are Also DNA 5-Hydroxymethylcytosine Dehydroxymethylases. Journal of Biological Chemistry 287, 33116–33121 (2012).
78. Suzuki, M. M. & Bird, A. DNA methylation landscapes: provocative insights from epigenomics. Nat Rev Genet 9, 465–476 (2008).
79. Lehnertz, B., Ueda, Y., Derijck, A. & Braunschweig, U. Suv39h-mediated histone H3 lysine 9 methylation directs DNA methylation to major satellite repeats at pericentric heterochromatin. Current Biology (2003). doi:10.1016/S0960-9822(03)00432-9
80. Otani, J. et al. Structural basis for recognition of H3K4 methylation status by the DNA methyltransferase 3A ATRX-DNMT3-DNMT3L domain. EMBO Rep 10, 1235–1241 (2009).
81. Zhang, Y. et al. Chromatin methylation activity of Dnmt3a and Dnmt3a/3L is guided by interaction of the ADD domain with the histone H3 tail. Nucl. Acids Res. 38, 4246–4253 (2010).
82. Hervouet, E., Vallette, F. M. & Cartron, P.-F. Dnmt3/transcription factor interactions as crucial players in targeted DNA methylation. Epigenetics 4, 487–499 (2014).
83. Fuks, F., Burgers, W. A., Godin, N., Kasai, M. & Kouzarides, T. Dnmt3a binds deacetylases and is recruited by a sequence-specific repressor to silence transcription. EMBO J. 20, 2536–2544 (2001).
84. Di Ruscio, A. et al. DNMT1-interacting RNAs block gene-specific DNA methylation. Nature 1–10 (2014). doi:10.1038/nature12598
85. Holz-Schietinger, C. & Reich, N. O. RNA modulation of the human DNA methyltransferase 3A. Nucl. Acids Res. 40, 8550–8557 (2012).
86. Morris, K. V., Chan, S. W.-L., Jacobsen, S. E. & Looney, D. J. Small interfering RNA-induced transcriptional gene silencing in human cells. Science 305, 1289–1292 (2004).
87. Sun, B. K., Deaton, A. M. & Lee, J. T. A Transient Heterochromatic State in Xist Preempts X Inactivation Choice without RNA Stabilization. Molecular Cell 21, 617–628 (2006).
88. Shlyueva, D., Stampfel, G. & Stark, A. Transcriptional enhancers: from properties to genome-wide predictions. Nature Reviews Genetics 15, 272–286 (2014).
89. Cifuentes-Rojas, C., Hernandez, A. J., Sarma, K. & Lee, J. T. Regulatory Interactions between RNA and Polycomb Repressive Complex 2. Molecular Cell 55, 171–185 (2014).
90. Zamore, P. D., Tuschl, T., Sharp, P. A. & Bartel, D. P. RNAi: double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals. Cell 101, 25–33 (2000).
91. Gagnon, K. T., Li, L., Chu, Y., Janowski, B. A. & Corey, D. R. RNAi factors are present and active in human cell nuclei. Cell Reports 6, 211–221 (2014).
92. Castel, S. E. & Martienssen, R. A. RNA interference in the nucleus: roles for small RNAs in transcription, epigenetics and beyond. Nature Reviews Genetics 14, 100–112 (2013).
93. Li, L.-C. et al. Small dsRNAs induce transcriptional activation in human cells. Proc. Natl. Acad. Sci. U.S.A. 103, 17337–17342 (2006).
94. Huang, G.-W., Liao, L.-D., Li, E.-M. & Xu, L.-Y. siRNA induces gelsolin gene transcription activation in human esophageal cancer cell. Sci Rep 5, 7901 (2015).
95. Ho, T.-T. et al. Targeting non-coding RNAs with the CRISPR/Cas9 system in human cell lines. Nucl. Acids Res. gku1198 (2014). doi:10.1093/nar/gku1198
96. Kraft, K. et al. Deletions, Inversions, Duplications: Engineering of Structural Variants using CRISPR/Cas in Mice. Cell Reports 10, 833–839 (2015).
97. Kuhnert, F. et al. Attribution of vascular phenotypes of the murine Egfl7 locus to the microRNA miR-126. Development 135, 3989–3993 (2008).
98. Majewski, J. & Ott, J. Distribution and characterization of regulatory elements in the human genome. Genome Research 12, 1827–1836 (2002).
99. Yin, Y. et al. Opposing Roles for the lncRNA Haunt and Its Genomic Locus in Regulating HOXA Gene Activation during Embryonic Stem Cell Differentiation. Cell Stem Cell 16, 1–14 (2015).
100. Kent, W. J. et al. The human genome browser at UCSC. Genome Research 12, 996–1006 (2002).
101. Lizio, M. et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biology 16, 22 (2015).
102. Zhang, J. et al. International Cancer Genome Consortium Data Portal--a one-stop shop for cancer genomics data. Database (Oxford) 2011, bar026 (2011).
103. Uren, P. J. et al. Site identification in high-throughput RNA-protein interaction data. Bioinformatics 28, 3013–3020 (2012).
104. Bahrami-Samani, E., Penalva, L. O. F., Smith, A. D. & Uren, P. J. Leveraging cross-link modification events in CLIP-seq for motif discovery. Nucl. Acids Res. 43, 95–103 (2015).
105. Feng, J., Liu, T., Qin, B., Zhang, Y. & Liu, X. S. Identifying ChIP-seq enrichment using MACS. Nature Protocols 7, 1728–1740 (2012).
106. Thermofisher. BLOCK-iT™ RNAi Designer. rnaidesigner.thermofisher.com at
107. Yuan, B., Latek, R., Hossbach, M., Tuschl, T. & Lewitter, F. siRNA Selection Server: an automated siRNA oligonucleotide prediction server. Nucl. Acids Res. 32, W130–4 (2004).
108. Zhang, F. CRISPR Design. crispr.mit.edu at
109. Doench, J. G. et al. Rational design of highly active sgRNAs for CRISPR-Cas9–mediated gene inactivation. Nature Biotechnology 1–8 (2014). doi:10.1038/nbt.3026
110. Li, L.-C. & Dahiya, R. MethPrimer: designing primers for methylation PCRs. Bioinformatics 18, 1427–1431 (2002).
111. Challen, G. A. et al. Dnmt3a is essential for hematopoietic stem cell differentiation. Nat Genet 44, 23–31 (2011).
112. McHugh, C. A. et al. The Xist lncRNA interacts directly with SHARP to silence transcription through HDAC3. Nature 1–23 (2015). doi:10.1038/nature14443
113. Chu, C. et al. Systematic Discovery of Xist RNA Binding Proteins. Cell 161, 404–416 (2015).
114. Wu, H. et al. Dnmt3a-dependent nonpromoter DNA methylation facilitates transcription of neurogenic genes. Science 329, 444–448 (2010).
115. Shaw, G., Morse, S., Ararat, M. & Graham, F. L. Preferential transformation of human neuronal cells by human adenoviruses and the origin of HEK 293 cells. The FASEB Journal 16, 869–871 (2002).
116. Frankish, A. et al. Comparison of GENCODE and RefSeq gene annotation and the impact of reference geneset on variant effect prediction. BMC Genomics 16 Suppl 8, S2 (2015).
117. Yang, H.-S. et al. The transformation suppressor Pdcd4 is a novel eukaryotic translation initiation factor 4A binding protein that inhibits translation. Mol. Cell. Biol. 23, 26–37 (2003).
118. Dorrello, N. V. et al. S6K1- and betaTRCP-mediated degradation of PDCD4 promotes protein translation and cell growth. Science 314, 467–471 (2006).
119. Zhang, H. et al. Involvement of programmed cell death 4 in transforming growth factor-beta1-induced apoptosis in human hepatocellular carcinoma. Oncogene 25, 6101–6112 (2006).
120. Goke, R., Barth, P., Schmidt, A., Samans, B. & Lankat-Buttgereit, B. Programmed cell death protein 4 suppresses CDK1/cdc2 via induction of p21(Waf1/Cip1). Am. J. Physiol., Cell Physiol. 287, C1541–6 (2004).
121. Afonja, O., Juste, D., Das, S., Matsuhashi, S. & Samuels, H. H. Induction of PDCD4 tumor suppressor gene expression by RAR agonists, antiestrogen and HER-2/neu antagonist in breast cancer cells. Evidence for a role in apoptosis. Oncogene 23, 8135–8145 (2004).
122. O'Connor, L. et al. Bim: a novel member of the Bcl-2 family that promotes apoptosis. EMBO J. 17, 384–395 (1998).
123. Faber, A. C., Ebi, H., Costa, C. & Engelman, J. A. Apoptosis in targeted therapy responses: the role of BIM. Adv. Pharmacol. 65, 519–542 (2012).
124. Gogada, R. et al. Bim, a proapoptotic protein, up-regulated via transcription factor E2F1-dependent mechanism, functions as a prosurvival molecule in cancer. J. Biol. Chem. 288, 368–381 (2013).
125. Hanahan, D. & Weinberg, R. A. Hallmarks of Cancer: The Next Generation. Cell 144, 646–674 (2011).
126. Han, J., Kim, D. & Morris, K. V. Promoter-associated RNA is required for RNA-directed transcriptional gene silencing in human cells. Proc. Natl. Acad. Sci. U.S.A. 104, 12422–12427 (2007).
127. Schübeler, D. Function and information content of DNA methylation. Nature 517, 321–326 (2015).
128. Chang, Y. et al. MPP8 mediates the interactions between DNA methyltransferase Dnmt3a and H3K9 methyltransferase GLP/G9a. Nat Commun 2, 533–10 (2011).
129. Hacisuleyman, E. et al. Topological organization of multichromosomal regions by the long intergenic noncoding RNA Firre. Nat Struct Mol Biol 21, 198–206 (2014).
130. Aravin, A. A. et al. A piRNA pathway primed by individual transposons is linked to de novo DNA methylation in mice. Molecular Cell 31, 785–799 (2008).
131. Yap, K. L. et al. Molecular Interplay of the Noncoding RNA ANRIL and Methylated Histone H3 Lysine 27 by Polycomb CBX7 in Transcriptional Silencing of INK4a. Molecular Cell 38, 662–674 (2010).
132. O’Leary, V. B. et al. PARTICLE, a Triplex-Forming Long ncRNA, Regulates Locus-Specific Methylation in Response to Low-Dose Irradiation. Cell Reports 11, 474–485 (2015).
133. Mondal, T. et al. MEG3 long noncoding RNA regulates the TGF-β pathway genes through formation of RNA–DNA triplex structures. Nat Commun 6, 1–17 (2015).
134. Schmitz, K.-M., Mayer, C., Postepska, A. & Grummt, I. Interaction of noncoding RNA with the rDNA promoter mediates recruitment of DNMT3b and silencing of rRNA genes. Genes & Development 24, 2264–2269 (2010).
135. Klug, M. et al. Active DNA demethylation in human postmitotic cells correlates with activating histone modifications, but not transcription levels. Genome Biology 11, R63–11 (2010).
136. Chen, X. et al. G9a/GLP-dependent histone H3K9me2 patterning during human hematopoietic stem cell lineage commitment. Genes & Development 26, 2499–2511 (2012).
137. Fritsch, L. et al. A Subset of the Histone H3 Lysine 9 Methyltransferases Suv39h1, G9a, GLP, and SETDB1 Participate in a Multimeric Complex. Molecular Cell 37, 46–56 (2010).
138. Li, H. et al. The histone methyltransferase SETDB1 and the DNA methyltransferase DNMT3A interact directly and localize to promoters silenced in cancer cells. Journal of Biological Chemistry 281, 19489–19500 (2006).
139. Karimi, M. M. et al. DNA Methylation and SETDB1/H3K9me3 Regulate Predominantly Distinct Sets of Genes, Retroelements, and Chimeric Transcripts in mESCs. Cell Stem Cell 8, 676–687 (2011).
140. Jjingo, D., Conley, A. B., Yi, S. V., Lunyak, V. V. & Jordan, I. K. On the presence and role of human gene-body DNA methylation. Oncotarget 3, 462–474 (2012).
141. Hahn, M. A., Wu, X., Li, A. X., Hahn, T. & Pfeifer, G. P. Relationship between gene body DNA methylation and intragenic H3K9me3 and H3K36me3 chromatin marks. PLoS ONE 6, e18844 (2011).
142. Hellman, A. & Chess, A. Gene body-specific methylation on the active X chromosome. Science 315, 1141–1143 (2007).
143. Bornstein, P., McKay, J., Morishima, J. K., Devarayalu, S. & Gelinas, R. E. Regulatory elements in the first intron contribute to transcriptional control of the human alpha 1(I) collagen gene. Proc. Natl. Acad. Sci. U.S.A. 84, 8869–8873 (1987).
144. Kaneko, S. et al. Interactions between JARID2 and Noncoding RNAs Regulate PRC2 Recruitment to Chromatin. Molecular Cell 53, 290–300 (2014).
145. Kretz, M. & Meister, G. RNA Binding of PRC2: Promiscuous or Well Ordered? Molecular Cell 55, 157–158 (2014).
146. Feng, Y. et al. in Cancer Cell Signaling (ed. Robles-Flores, M.) 1165, 115–143 (Springer New York, 2014).
147. Minajigi, A. et al. A comprehensive Xist interactome reveals cohesin repulsion and an RNA-directed chromosome conformation. Science 349, aab2276–aab2276 (2015).
148. Engreitz, J. M. et al. RNA-RNA Interactions Enable Specific Targeting of Noncoding RNAs to Nascent Pre-mRNAs and Chromatin Sites. Cell 159, 188–199 (2014).
149. Chu, C., Quinn, J. & Chang, H. Y. Chromatin Isolation by RNA Purification (ChIRP). JoVE (2012). doi:10.3791/3912
150. Fedoroff, O. Y., Salazar, M. & Reid, B. R. Structure of a DNA:RNA hybrid duplex: why RNase H does not cleave pure RNA. Journal of Molecular Biology 233, 509–523 (1993).
151. Wang, T. et al. Design and bioinformatics analysis of genome-wide CLIP experiments. Nucl. Acids Res. 1–12 (2015). doi:10.1093/nar/gkv439
152. Challen, G. A. et al. Dnmt3a and Dnmt3b Have Overlapping and Distinct Functions in Hematopoietic Stem Cells. Cell Stem Cell 1–15 (2014). doi:10.1016/j.stem.2014.06.018
153. Jeong, M. et al. Large conserved domains of low DNA methylation maintained by Dnmt3a. Nat Genet 46, 17–23 (2014).
