Please do not remove this page

The Human Integrator Complex Facilitates Transcriptional Elongation by Endonucleolytic Cleavage of Nascent Transcripts

Blumenthal, Ezra https://scholarship.miami.edu/discovery/delivery/01UOML_INST:ResearchRepository/12386098540002976?l#13386098530002976

Blumenthal, E. (2021). The Human Integrator Complex Facilitates Transcriptional Elongation by Endonucleolytic Cleavage of Nascent Transcripts [University of Miami]. https://scholarship.miami.edu/discovery/fulldisplay/alma991031605861802976/01UOML_INST:ResearchR epository

Open Downloaded On 2021/09/25 13:59:55 -0400

Please do not remove this page

UNIVERSITY OF MIAMI

THE HUMAN INTEGRATOR COMPLEX FACILITATES TRANSCRIPTIONAL ELONGATION BY ENDONUCLEOLYTIC CLEAVAGE OF NASCENT TRANSCRIPTS

By

Ezra Blumenthal

A DISSERTATION

Submitted to the Faculty of the University of Miami in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Coral Gables, Florida

August 2021

©2021 Ezra Blumenthal All Rights Reserved

UNIVERSITY OF MIAMI

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

THE HUMAN INTEGRATOR COMPLEX FACILITATES TRANSCRIPTIONAL ELONGATION BY ENDONUCLEOLYTIC CLEAVAGE OF NASCENT TRANSCRIPTS

Ezra Blumenthal

Approved:

______Ramin Shiekhattar, Ph.D. Anthony J. Capobianco, Ph.D. Professor of Human Genetics Professor of Surgery

______David J. Robbins, Ph.D. Stephen D. Nimer, M.D. Professor of Surgery Professor of Medicine

______Lluis Morey, Ph.D. Guillermo Prado, Ph.D. Assistant Professor of Human Genetics Dean of the Graduate School

______Sandra K. Lemmon, Ph.D. Professor of Molecular and Cellular Pharmacology

Blumenthal, Ezra (Ph.D., Cancer Biology) The Human Integrator Complex (August 2021) Facilitates Transcriptional Elongation by Endonucleolytic Cleavage of Nascent Transcripts.

Abstract of a dissertation at the University of Miami.

Dissertation supervised by Ramin Shiekhattar, Ph.D. No. of pages in text. (109)

Transcription by RNA polymerase II (RNAPII) is pervasive in the .

However, the mechanisms controlling transcription at promoters and enhancers remain enigmatic. Here we demonstrate that INTS11, the catalytic subunit of the

Integrator complex, regulates transcription at these loci through its endonuclease activity. Promoters of require INTS11 to cleave nascent transcripts associated with paused RNAPII and induce their premature termination in the proximity of the +1 nucleosome. The turnover of RNAPII permits the subsequent recruitment of an elongation-competent RNAPII complex leading to productive elongation. In contrast, enhancers do not require INTS11 catalysis to evict paused RNAPII but rather to terminate enhancer RNA transcription beyond the

+1 nucleosome. These findings are supported by the differential occupancy of

NELF, SPT5, and tyrosine-1-phosphorylated RNAPII. This study elucidates the role of Integrator in mediating transcriptional elongation at human promoters through the endonucleolytic cleavage of nascent transcripts and the dynamic turnover of RNAPII.

DEDICATION to Kay; the love of my life, best friend, and quarantine companion

iii

ACKNOWLEDGEMENTS

I must begin by first thanking my mentor, Ramin Shiekhattar, for accepting me into his lab and taking an interest in my scientific and personal development. It is an incredible privilege to have been trained in a lab with a rich history of numerous landmark discoveries including the initial purifications of the

Microprocessor, CoREST, and Integrator complexes. Ramin’s work ethic, creativity and approach to answering scientific questions exemplifies the qualities needed to succeed as a premier scientist and I am grateful to have had this exposure during my graduate studies.

My deepest gratitude goes to the members of my thesis committee, Tony

Capobianco, David Robbins, Lluis Morey, Sandra Lemmon, and Stephen Nimer for their unwavering support throughout my PhD training. Their suggestions and feedback during committee meetings were critical to the progress of my work and growth. There were some personally difficult moments over the years and on many occasions Drs. Morey and Lemmon offered encouraging words and helped me see the big picture. I am also profoundly thankful to Dr. Nimer for the time I spent working in his lab and his continued mentorship after my departure. My interest in hematology was sparked through this experience, which laid the foundation for my future clinical practice.

My research career began serendipitously at the age of 16 when my neighbor, Neal Rosen, took me into his lab at Memorial Sloan-Kettering Cancer

Center and provided the critical resources for my early development. Over the six summers I worked in his lab, I received mentoring from outstanding physician

iv

scientists including Christine Pratilas, David Solit, and Sarat Chandarlapaty.

Though unbeknownst at the time, my access to these individuals in a rigorous scientific environment was paramount to my early career development and crucial for securing a post-baccalaureate O.R.I.S.E (Oak Ridge Institute for Science

Education) fellowship at the F.D.A. and a M.D./Ph.D. fellowship at the University of Miami.

Throughout all stages of my career, I received countless hours of technical assistance from dozens senior scientists, post-docs, and technicians for no other reason than pure kindness. I want to express my gratitude to all of these individuals for their selfless contribution to my success. In particular, I want to thank my co- authors Felipe Beckedorff and Lucas da Silva for their tireless efforts analyzing and digesting my genome-wide data. Without their determination, the content presented in this document would simply not exist.

Finally, I conclude by thanking my wife Kalynn Cook, my parents George and Barbara Blumenthal, and my brother Avi for being the most supportive family

I could ever ask for. Thanks for joining me on this incredibly gratifying journey through the echelons of science and medicine. Cheers.

v

TABLE OF CONTENTS

LIST OF ABBREVIATIONS ix

LIST OF FIGURES xi

LIST OF TABLES xiii

1. INTRODUCTION 1 1.1. EUKARYOTIC TRANSCRIPTION BY RNAPII AND REGULATION OF EXPRESSION BY ENHANCERS 2 1.1.1. Stages of transcription and RNAPII pause-release 2 1.1.2. Enhancer elements and genomic architecture 7 1.2. COMPOSITION AND FUNCTIONS OF THE INTEGRATOR COMPLEX 13 1.2.1. Core subunits and RNAPII association 13 1.2.2. Endonuclease-dependent functions of Integrator 17 1.2.3. Integrator’s contribution to RNAPII pause-release and the MAPK gene expression program 20 1.2.4. Physiological importance 21

2. KEY RESOURCES AND METHODS 24 2.1. KEY RESOURCES 25 2.2. METHODS 28

3. THE INTEGRATOR CATALYTIC MODULE REGULATES STEADY-STATE GENE EXPRESSION 39 3.1. APPROACH 40 3.2. RESULTS 40 3.2.1. Validation of INTS11, INTS9 and INTS4 depletion for RNA-seq 40 3.2.2. INTS11, INTS9 and INTS4 regulate expression of a core gene set 41 3.3. SUMMARY 46

4. INTEGRATOR MEDIATES TRANSCRIPTIONAL ELONGATION 47 4.1. APPROACH 48 4.2. RESULTS 49 4.2.1. Bioinformatic parameters for gene selection and INTS11 genomic occupancy 49 4.2.2. Integrator facilitates transcriptional elongation 50 4.2.3. The traveling matrix reveals Integrator-dependent transcriptional behavior 51 4.2.4. Class IV genes are enriched in Integrator and RNAPII 53 4.3. SUMMARY 55

vi

5. INTEGRATOR RELIEVES RNAPII PAUSING ADJACENT TO THE +1 NUCLEOSOME 56 5.1. APPROACH 57 5.2. RESULTS 57 5.2.1. Integrator is required for RNAPII recruitment and pause-release 57 5.2.2. The steady-state occupancy of RNAPII recapitulates alterations in nascent transcription at the four classes of genes 59 5.2.3. Integrator functions at promoters with a positioned +1 nucleosome decorated in H3K4me3 60 5.3. SUMMARY 62

6. THE CATALYTIC ACTIVITY OF INTEGRATOR PROMOTES RNAPII PAUSE-RELEASE AND CONTRIBUTES TO TRANSCRIPTIONAL ELONGATION 63 6.1. APPROACH 64 6.2. RESULTS 64 6.2.1. Validation of endogenous INTS11 depletion and shRNA-resistant expression of WT-INTS11 and E203Q-INTS11 64 6.2.2. INTS11 catalysis facilitates transcriptional elongation 65 6.2.3. INTS11 catalysis is required for RNAPII pause-release but not RNAPII recruitment 66 6.3. SUMMARY 68

7. INTS11 CATALYZES THE PREMATURE TERMINATION OF SMALL RNA ASSOCIATED WITH PAUSED RNAPII 69 7.1. APPROACH 70 7.2. RESULTS 70 7.2.1. INTS11 catalysis has a limited role in the stability of RNAPII-associated smRNA 70 7.2.2. INTS11 terminates smRNA associated with paused RNAPII 72 7.2.3. INTS11 generates a 21-nucleotide smRNA product of catalysis 73 7.3. SUMMARY 75

8. INTEGRATOR REGULATES ENHANCER RNA TRANSCRIPTIONAL TERMINATION 76 8.1. APPROACH 77 8.2. RESULTS 77 8.2.1. Integrator recruits RNAPII to enhancers and terminates eRNA transcription but is dispensable for RNAPII pause-release 77 8.2.2. The enhancer traveling matrix conveys the role of Integrator in transcriptional termination beyond the +1 nucleosome 79

vii

8.3. SUMMARY 80

9. PAUSING FACTORS AND RNAPII PHOSPHORLAYTION CONVEY INTEGRATOR FUNCTIONS AT GENES AND ENHANCERS 81 9.1. APPROACH 82 9.2. RESULTS 82 9.2.1. NELF, SPT5 and tyrosine-1 phosphorylated RNAPII affirm Integrator activity at genes and enhancers 82 9.2.2. Integrator localizes in clusters with tyrosine-1 phosphorylated RNAPII 85 9.3. SUMMARY 86

10. DISCUSSION 87 10.1. THE CONTRASTING ROLES OF HUMAN AND DROSOPHILA INTEGRATOR IN TRANSCRIPTIONAL REGULATION 88 10.2. INTEGRATOR MEDIATES THE DYNAMIC TURNOVER OF RNAPII PROXIMAL TO THE +1 NUCLEOSOME AT GENES AND DISTAL TO THE +1 NUCLEOSOME AT ENHANCERS 91 10.3. FUTURE PERSPECTIVES 96

REFERENCES 99

viii

LIST OF ABBREVIATIONS

ATAC Assay for transposable chromatin

CAGE Cap analysis of gene expression

ChIP Chromatin immunoprecipitation

CPSF Cleavage and polyadenylation specificity factor

CTD C-terminal domain

EGF Epidermal growth factor eRNA Enhancer RNA

GFP Green fluorescent

H3K4me1 Monomethylated lysine 4 on histone H3

H3K4me3 Trimethylated lysine 4 on histone H3

H3K27ac Acetylated lysine 27 on histone H3

H3K36me3 Trimethylated lysine 36 on histone H3

IEG Immediate early gene

INTS Integrator subunit

NELF Negative elongation factor

PRO Precision run-on

RNAPII RNA polymerase II

RNAPII-S2 Serine-2 phosphorylated RNAPII

RNAPII-S5 Serine-5 phosphorylated RNAPII

RNAPII-Y1 Tyrosine-1 phosphorylated RNAPII smRNA Small RNA

SPT5 Suppressor of ty 5 homolog

ix

STORM Stochastic optical reconstruction microscopy

TES Transcription end site

TM Traveling matrix

TR Traveling ratio

TSS Transcription start site

UsnRNA Uridylate-rich small nuclear RNAs

-seq High throughput sequencing

x

LIST OF FIGURES

Figure 1.1.1. Early stages of transcription and RNAPII pause-release. 4

Figure 1.1.2. Transcriptional activity and genomic architecture of promoters and enhancers. 10

Figure 1.2.1. Subunit composition of the human Integrator complex. 15

Figure 1.2.2. The role of Integrator in UsnRNA 3’-end processing. 18

Figure 1.2.3. Integrator mediates RNAPII pause-release. 21

Figure 3.2.1. Validation of INTS11, INTS9 and INTS4 depletion for RNA-seq. 41

Figure 3.2.2. INTS11, INTS9 and INTS4 regulate expression of a core gene set. 42

Figure 4.2.1. Bioinformatic parameters for gene selection and INTS11 genomic occupancy. 50

Figure 4.2.2. Integrator facilitates transcriptional elongation. 51

Figure 4.2.3. The traveling matrix reveals Integrator-dependent transcriptional behavior. 53

Figure 4.2.4. Class IV genes are occupied by Integrator; PRO-seq examples of class IV genes. 54

Figure 5.2.1. Integrator is required for RNAPII recruitment and pause-release. 58

Figure 5.2.2. The steady-state occupancy of RNAPII recapitulates alterations in nascent transcription. 60

Figure 5.2.3. Integrator functions at promoters with a positioned +1 nucleosome decorated in H3K4me3. 61

Figure 6.2.1. Validation of endogenous INTS11 depletion and shRNA-resistant expression of WT-INTS11 and E203Q-INTS11. 65

Figure 6.2.2. INTS11 catalysis facilitates transcriptional elongation. 66

Figure 6.2.3. INTS11 catalysis is required for RNAPII pause-release but not RNAPII recruitment. 67

xi

Figure 7.2.1. INTS11 catalysis has a limited role in the stability of RNAPII-associated smRNA. 71

Figure 7.2.2. INTS11 terminates smRNA associated with paused RNAPII. 73

Figure 7.2.3. INTS11 generates a 21-nucleotide smRNA product of catalysis. 74

Figure 8.2.1. Integrator recruits RNAPII to enhancers and terminates eRNA transcription dispensable for RNAPII pause-release. 78

Figure 8.2.2. The enhancer traveling matrix conveys the role of Integrator in transcriptional termination beyond the +1 nucleosome. 80

Figure 9.2.1. NELF, SPT5 and tyrosine-1 phosphorylated RNAPII affirm Integrator activity at genes and enhancers. 84

Figure 9.2.2. Integrator localized in clusters with tyrosine-1 phosphorylated RNAPII. 85

Figure 10.2.1 Model of Integrator mediating RNAPII pause-release and elongation at human genes. 94

xii

LIST OF TABLES

Table 1. Significantly downregulated pathways by RNA-seq. 43

Table 2. Significantly upregulated pathways by RNA-seq. 45

xiii

CHAPTER 1: INTRODUCTION

1 2

1.1. EUKARYOTIC TRANSCRIPTION BY RNAPII AND REGULATION OF

GENE EXPRESSION BY ENHANCERS

1.1.1. Stages of transcription and RNAPII pause-release

Transcription by RNA polymerase II (RNAPII) in eukaryotes is governed at multiple stages, allowing for the precise execution of gene expression programs critical to cellular development and homeostasis. We begin this section with a brief overview of the steps of transcription, namely initiation, elongation and termination.

We particularly focus on the factors and events that orchestrate RNAPII pause- release, which is a tightly regulated rate-limiting event early in elongation seen in higher eukaryotes and notably absent in yeast (Adelman and Lis, 2012).

The packaging of eukaryotic DNA into chromatin represents a fundamental difference between eukaryotes and prokaryotes in their genomic organization

(Struhl, 1999). Since the eukaryotic transcriptional machinery is blocked by nucleosomes, the structural unit of chromatin, transcriptional initiation requires cooperation among chromatin remodelers, transcription factors and transcriptional cofactors to alter the local environment and support entry of general transcription factors (GTFs) that recruit RNAPII to target genes (Core and Adelman, 2019).

During formation of the preinitiation complex, GTFs coalesce around the core promoter and orient RNAPII at transcription start sites (TSS) (Figure 1.1.1. A). This event follows an orderly stepwise cascade as follows: 1) TFIID containing TBP

(TATA binding protein) binds TATA or TATA-like DNA sequences and bends the double helix 90 degrees. TATA-associated factors (TAFs) interact with adjacent sequences and impart additional specificity. 2) TFIIA and TFIIB stabilize TBP-DNA 3 engagement through interactions on opposite ends of the TBP crescent structure.

Notably, TFIIB directly associates with RNAPII thereby physically linking DNA- bound TBP to RNAPII. 3) TFIIF, which was initially considered a subunit of RNAPII itself, directly recruits the RPB2 subunit of RNAPII and makes contacts with TFIIA to stabilize DNA interactions up-and-downstream from the TSS. TFIIF also associates with several phosphatases that dephosphorylate the C-terminal domain of RPB1 (CTD), which contains 52 repeats of the YSPTSPS heptad in human

RPB1 and 26 repeats in S. cerevisiae RPB1. An unphosphorylated CTD is critical for the association of RNAPII with the ~20-30 subunit Mediator complex, which transduces regulatory information through the binding of sequence-specific transcription factors and subsequent bridging of enhancer regions to promoters of nearly all RNAPII-transcribed genes. 4) TFIIE and TFIIH associate and functionally cooperate in promoter melting and transcriptional initiation. The ATP-dependent

XPB helicase, a component of TFIIH, is stimulated by TFIIE and unwinds the local

DNA to open the transcription bubble, thus enabling RNAPII to initiate transcription from the single-stranded template DNA (Kornberg, 2007; Roeder, 2019; Saunders et al., 2006; Schier and Taatjes, 2020).

As transcriptional initiation proceeds, Mediator stimulates the activity of the

TFIIH-associated kinase CDK7 to phosphorylate RNAPII on serine-5 and serine-7 of the CTD (Boeing et al., 2010; Meyer et al., 2010). While the role serine-7 phosphorylation is not completely understood, it may aid in the recruitment of the

Integrator complex to UsnRNA genes and prime the CTD for ensuing phosphorylation by P-TEFb (Czudnochowski et al., 2012; Egloff et al., 2007) (see 4 below). On the other hand, the molecular functions of serine-5 phosphorylation are well established in early transcription including the disruption of association between RNAPII and Mediator, and the recruitment of enzymes that cap nascent transcripts in their 5’-end with N7-methyl-guanosine (Figure 1.1.1. B, Right) (Core and Adelman, 2019)

Figure 1.1.1. Early stages of transcription and RNAPII pause-release. (A) Formation of the preinitiation complex (PIC) and RNAPII recruitment requires the orderly assembly of (left) general transcription factors, which is facilitated by (right) enhancer-bound transcription factors (TF) and transcriptional cofactors (COF). (B) (Left) TFIIH melts promoter DNA around the transcription start site to allow transcriptional initiation by RNAPII and promoter escape. (Right) Mediator (MED) stimulates TFIIH to phosphorylate RNAPII on serine-5 and serine-7 of the C-terminal domain. (C) (Left) After transcribing 20-120 nucleotides, RNAPII stalls in a process known as promoter-proximal pausing. (Right) The release of RNAPII into productive elongation relies on the kinase activity of P-TEFb, which phosphorylates NELF, DSIF and RNAPII on serine-2 of the C-terminal domain. (Adapted from Haberle, V. et al 2018)

Coincident with these phosphorylation events, RNAPII undergoes promoter escape where it breaks contacts with promoter elements and preinitiation 5 machinery to begin transcriptional initiation (Figure 1.1.1. B, Left). A critical regulatory step occurs after RNAPII transcribes 20-120 nucleotides from the TSS where it transiently stalls in the vicinity of the +1 nucleosome (Figure 1.1.1. C, Left)

(Chen et al., 2018). This phenomenon, known as RNAPII promoter-proximal pausing, was initially described at heat shock inducible genes in drosophila and has since been the subject of intense investigation in the genome-wide era

(Rougvie and Lis, 1988). Surprisingly, these studies revealed that most RNAPII- transcribed genes display the highest RNAPII density in promoter regions, establishing that RNAPII proximal pausing is ubiquitous in the genome (Muse et al., 2007; Rahl et al., 2010; Zeitlinger et al., 2007). While the molecular role of

RNAPII promoter-proximal pausing is under extensive debate, it is accepted that

RNAPII pausing 1) ensures the fidelity of 5’-capping on nascent RNA, 2) prevents reinitiation of another RNAPII complex, and 3) sustains the nucleosome-free state encompassing promoter regions (Gilchrist et al., 2010; Shao and Zeitlinger, 2017;

Tome et al., 2018). In early fly embryos, paused RNAPII complexes are spatially and temporally activated to allow for coordinated tissue morphogenesis (Boettiger and Levine, 2009; Lagha et al., 2013). Indeed, the involvement of paused genes in development, stimulus responsiveness and cellular reproduction suggests that multicellular organisms evolved RNAPII proximal pausing as an additional regulatory layer in the implementation of critical transcriptional programs

(Zeitlinger et al., 2007).

Numerous factors contribute to the establishment and maintenance of paused RNAPII including sequence-specific and general transcription factors, 6 negative elongation factor (NELF), DRB-sensitivity inducible factor (DSIF), polymerase associated factor 1 (PAF1), and Gdown1 (Chen et al., 2018).

Recently, NELF was structurally shown to promote RNAPII pausing through the inhibition of nucleotide entry into the active site of RPB1 and impede RNAPII translocation across the DNA (Vos et al., 2018). Intrinsic factors such as DNA/RNA hybrids and the +1 nucleosome have also been shown to promote RNAPII pausing by imposing an energetic barrier that obstructs RNAPII entry into gene bodies

(Bintu et al., 2012; Eddy et al., 2011; Teves et al., 2014; Weber et al., 2014).

The release of RNAPII and its nascent RNA into productive elongation relies on the kinase activity of CDK9, a component of P-TEFb, which phosphorylates

SPT5 (a subunit of DSIF), serine-2 on the RNAPII CTD, and NELF, which leads to its eviction (Figure 1.1.1. C, Right) (Peterlin and Price, 2006). P-TEFb is found in several distinct macromolecular complexes including those with 7SK snRNP,

BRD4, and the super elongation complex (SEC), which contains the majority of active P-TEFb (Chen et al., 2018; Core and Adelman, 2019). As elongation proceeds, phosphorylation of serine-2 on the RPB1 CTD promotes the association of RNAPII with RNA processing factors involved in splicing, cleavage and polyadenylation, in addition to complexes required for chromatin remodeling and histone exchange (Schier and Taatjes, 2020; Venkatesh and Workman, 2015).

At the 3’-end of genes, elongating RNAPII slows and is evicted from the chromatin, resulting in transcriptional termination. As RNAPII encounters the polyadenylation signal AAUAAA, the cleavage of its associated transcript occurs

18-30 nucleotides downstream by CPSF73, the RNA endonuclease subunit of the 7 cleavage and polyadenylation specificity factor (CPSF) complex. Following cleavage, RNAPII continues transcribing approximately 150 nucleotides past the polyadenylation site and is released from the template, though the mechanism is not clear (Creamer et al., 2011). Two models for RNAPII evection been proposed:

The allosteric model hypothesizes that the binding of the termination complex displaces elongation factors, which leads to decreased transcriptional fidelity and ultimately termination (Richard and Manley, 2009). While the torpedo model stipulates that the newly exposed 5’-end of the remaining transcript is continuously degraded by the XRN2 exonuclease until it ‘chases down’ elongating RNAPII to induce a conformational change in the active site leading to dissociation from the template DNA (Porrua and Libri, 2015; Proudfoot, 2016).

1.1.2. Enhancer elements and genomic architecture

In the following section, we discuss how the activity of enhancer elements contributes to transcriptional regulation at eukaryotic genes. Next, we will describe transcriptional characteristics and genomic architecture at this these regulatory sites and highlight their commonalities with promoters of protein-coding genes.

Finally, we will briefly examine how recently described non-coding transcripts derived from enhancers, known as enhancer RNA (eRNA), modulate transcriptional events at promoters.

Enhancers are DNA sequences that bind sequence-specific transcription factors and transcriptional cofactors to increase the rate of transcription at proximal and distal target genes, as the restrictive nature of chromatin only permits core 8 promoters to drive transcription at low basal rates (Haberle and Stark, 2018).The functional hallmarks of enhancers include: 1) augmentation of the transcriptional rate at target genes; 2) orientation-independent activation of linked genes; 3) positional functionality irrespective of distance from gene targets; 4) operation with heterologous promoters and 5) DNase I hypersensitivity, reflecting its open- chromatin state (Kim and Shiekhattar, 2015). In humans, enhancer elements far outnumber protein-coding genes and are estimated in the hundreds of thousands

(Calo and Wysocka, 2013). Enhancers are critical for cell-fate decisions and cellular development by controlling spatiotemporal patterns of gene activation. For example, the expression of Shh gene, which encodes a signaling morphogen crucial to brain and limb development, is governed by a series of long-range enhancers known as the zone of polarizing activity spanning 900 kb in length composed of ETS1/GABPa and ETV4/ETV5 binding sites (Lettice et al., 2012;

Schoenfelder and Fraser, 2019). Interestingly, studies of the mammalian immunoglobulin enhancer also demonstrated that enhancer activity could be restricted to particular cell types or tissues, as only cell lines of the lymphoid lineage exhibited activation of this enhancer (Banerji et al., 1983; Gillies et al.,

1983). Collectively, these examples and numerous other studies affirm that eukaryotic gene expression programs are carefully dictated through vast networks of enhancers bound by transcriptional regulators that prime cells for differentiation and lineage specificity.

Genome-wide studies have shown that promoters and enhancers share remarkably conserved architectural and functional features including DNase I 9 hypersensitivity, nucleosome positioning, transcription factor occupancy, histone methylation and acetylation, and bi-directional transcription (Kim and Shiekhattar,

2015). The nucleosome-free landscape encompassing enhancers and promoters allows for the binding of transcription factors to core promoter elements (Core et al., 2014). At both sites, transcriptional initiation occurs within a relatively narrow

100-200 bp window flanked by well-positioned +1 and -1 nucleosomes (Core et al., 2014). As with promoters, GTFs congregate at enhancers to recruit RNAPII, where it undergoes the established transcriptional cycle of initiation, elongation and termination (Figure 1.1.2.). While the unphosphorylated and serine-5 phosphorylated forms of the RNAPII CTD are detectable at enhancers, serine-2 phosphorylation, the elongation-specific form of RNAPII, is not readily apparent and likely reflects their lower rate of transcription (Koch et al., 2011). RNAPII pausing has also been described at enhancers, although recent work from the

Adelman group has shown that paused RNAPII complexes are considerably less stable compared to those at promoters. In their study, 50% of enhancers displayed paused RNAPII half-lives of less than 2.5 minutes, while more than 55% of protein- coding genes exhibited paused RNAPII half-lives greater than 10 minutes and 26% were greater than 20 minutes (Henriques et al., 2018). Unlike protein-coding genes, which utilize the cleavage and polyadenylation complex to terminate transcription, enhancer transcriptional termination is mediated by the Integrator complex, which is discussed later in this section (Lai et al., 2015). 10

Figure 1.1.2. Transcriptional activity and genomic architecture of promoters and enhancers. Both regulatory elements undergo bi-directional transcription following the assembly of common transcriptional machinery onto core promoters in nucleosome-free regions. Transcription towards the sense mRNA is favored at promoters due to the protection of transcripts by the U1 splicing complex. Enhancer RNA transcripts are short lived and degraded by the exosome. Promoters and enhancers are both enriched in post-translationally modified nucleosomes. Promoters feature nucleosomes decorated with H3K4me3 whereas enhancers harbor nucleosomes containing H3K4me1. (Adapted from Kim, T. et al 2015)

Covalent modifications on histones can alter the organization and function of chromatin by affecting histone-DNA and histone-histone interactions, thus carrying an additional layer of regulatory information beyond the DNA sequence

(Tessarz and Kouzarides, 2014). These modifications also serve as a platform by which effector modules and associated complexes can engage the chromatin and participate in chromatin-templated processes like transcription, DNA repair, and replication (Taverna et al., 2007; Tessarz and Kouzarides, 2014). Antibody and mass spectrometric-based methods have identified at least eight different types of 11 post-translational histone modifications on about 50 different sites, many of which are still not well understood (Taverna et al., 2007). Post-translationally modified histones are often sequestered in distinct genomic loci and the local ratio of trimethylated lysine 4 on histone H3 (H3K4me3) to the monomotheylated form

(H3K4me1) has been generally used to distinguish promoters from enhancers, with promoters featuring highest levels of H3K4me3 (Heintzman et al., 2007).

However, recent work argues that lysine four methylation on histone H3 is a continuum that merely reflects transcriptional activity, and the low ratio of

H3K4me3/H3K4me1 at enhancers confers their lower rate of transcription (Figure

1.1.2.) (Core et al., 2014). Histone acetylation is also found at active promoters and enhancers, in particular, acetylation of histone H3 on lysines 9, 18 and 27

(Calo and Wysocka, 2013). Notably, acetylation of histone H3 on lysine 27 delineates active from poised or inactive enhancers and is more strongly correlated with cell-type specificity than the occupancy of the p300/CBP acetyltransferases that catalyze its deposition (Creyghton et al., 2010; Rada-Iglesias et al., 2011).

Variant histones like the replication-independent H3.3 and H2Az are also found at enhancers and promoters, as they are typically enriched at nucleosome-free regions (Jin et al., 2009). While the characterization and localization of histone modifications have aided in distinguishing enhancers from promoters, many architectural features are shared among them, and the presence of enhancer RNA may be a better readout for identifying active enhancers.

Enhancers undergo prominent bi-directional transcription generating short

5’-capped, non-polyadenylated, non-spliced enhancer eRNA that are subject to 12 degradation by the exosome (Sartorelli and Lauberth, 2020). Similarly, promoters also undergo divergent transcription to produce short anti-sense non-coding RNA, though transcription is clearly favored toward the sense mRNA (Kim and

Shiekhattar, 2015). The determinants of directionality at promoters are still not well understood but a study from Phillip Sharp’s group noted an abundance of polyadenylation sites and a lack of U1 snRNP sites in antisense regions of promoters, which acts to potentially limit pervasive transcription (Figure 1.1.2.)

(Almada et al., 2013).

Described nearly a decade ago, enhancer RNA was shown to correlate with the level of expression at target genes, suggesting that eRNA synthesis occurs specifically at enhancers engaged in regulating actively transcribed genes (Kim et al., 2010). The functional significance of these transcripts is an active area of investigation, but numerous studies have demonstrated their cell-type and signal- dependent activity. Two notable examples of eRNA functionality include those from the MyoD and the KLK3 (also known as prostate-specific antigen) enhancers. The eRNA expressed from the MyoD enhancer binds the cohesion complex and mediates its loading to drive transcription of myogenenin and ultimately myogenic differentiation (Tsai et al., 2018) Likewise, the KLK3 eRNA was shown to selectively regulate expression of androgen-responsive genes through bridging of the KLK3 enhancer to the KLK2 promoter in prostate cell lines. Importantly, siRNA- mediated knockdown of the KLK3 eRNA reduced proliferation of prostate cancer cells thereby linking enhancer RNA transcription to androgen-dependent malignant cell growth (Hsieh et al., 2014). As genome-wide technologies continue 13 to improve, more work is needed to better understand the contribution of eRNA to downstream transcription and other biological events. Nevertheless, the discovery and elucidation of these transcripts underscores the incredible complexity underlying transcriptional regulation in the eukaryotic genome.

1.2. COMPOSITION AND FUNCTIONS OF THE INTEGRATOR COMPLEX

1.2.1. Core subunits and RNAPII association

In the previous section, we described how the eukaryotic genome requires the concerted activity of transcription factor-bound enhancers and enhancer RNA to direct gene expression. Much of this regulatory information is transmitted through transcriptional coactivators found in multiprotein complexes that do not bind DNA directly but instead dock on transcription factors to drive distinct transcriptional programs (Spiegelman and Heinrich, 2004). In the following section, we discuss the composition of the RNAPII-associated Integrator complex, which is a transcriptional coactivator with RNA endonuclease activity whose functionality is well characterized in the processing of non-coding RNA, particularly UsnRNA. We will catalog Integrator’s known RNA targets and later examine the non-catalytic role of Integrator in RNAPII pause-release and its contribution to biological processes.

Integrator is a metazoan-restricted multi-subunit protein complex that was unexpectedly discovered during the purification of DSS1, a small 70 amino acid protein required for the stability of the BRCA2 complex (Baillat et al., 2005; Li et al., 2006; Marston et al., 1999). The initial purification yielded 12 subunits that were 14 designated INTS1-INTS12 based on their size, with INTS1 representing the largest subunit having a mass of approximately 250 kDa. In the time since its discovery, two additional subunits have been described, INTS13 and INTS14, that were previously known as Asunder and VWA9, respectively (Chen et al., 2012).

Strikingly, many Integrator subunits lack evolutionarily identifiable homologs and are completely devoid of annotated protein domains. For example, INTS2, INTS5, and INTS10 are the sole members of their own conserved protein families whose only predicted functional module spans the entire length of the subunit. INTS11 and INTS9 are the noteworthy exceptions and are paralogs of the pre-mRNA processing factors CPSF73 and CPSF100, respectively (Figure 1.2.1.) (Baillat et al., 2005; Dominski et al., 2005). Like CPSF73, INTS11 contains a metalo-β- lacatamase domain that is utilized for the zinc-dependent endonucleolytic hydrolysis of phosphodiester bonds on RNA substrates. INTS9 and CPSF100 also contain a metalo-β-lacatamase domain, however, they are rendered catalytically inactive due to the lack of conserved residues required for zinc coordination.

CPSF73, CPSF100, INTS11, INTS9, Artemis, and Snm1/Pso2 are structurally distinct from other members of the metalo-β-lacatamase family and feature a β-

CASP domain, though the functional significance of this domain is not well understood (Dominski, 2007). 15

Figure 1.2.1. Subunit composition of the human Integrator complex. Predicted domains of the 14 human Integrator subunits include armadillo-like repeats (ARM), coiled-coil domains (COIL), domain of unknown function (DUF), INTS6/SAGE1/DDX26B/CT45 C-terminus (ISDCC), plant homeodomain finger (PHD), tetratricopeptide repeats (TPR), β-lactamase/β-CASP (* indicates inactivity), and von Willebrand type A like domain (VWA). (Adapted from Baillat, D. et al 2015)

16

The Shiekhattar lab showed that Integrator associates with RNAPII by mass spectrometry and the 8WG16 antibody against the unphosphorylated RNAPII CTD disrupted the Integrator-RNAPII interaction. Moreover, GST-immobilized CTD was sufficient to enrich Integrator from HeLa nuclear extracts (Baillat et al., 2005). Later studies confirmed the affinity of Integrator to the CTD though it is unclear if

Integrator has a preference for a particular phosphorylation status. Three conflicting results emerged. Dylan Taatjes’s group performed in-vitro phosphorylation on the CTD by TFIIH or P-TEFb and compared interacting partners by mass spectrometry after incubation with HeLa nuclear extracts.

Interestingly, they found that Integrator interacted equally well with the unphosphorlyated CTD, TFIIH phosphorylated CTD and P-TEFb phosphorylated

CTD, suggesting that Integrator association with the CTD is agnostic to phosphorylation (Ebmeier et al., 2017). In 2007, Shona Murphy’s group showed that serine-7 phosphorylation is required for UsnRNA processing and Integrator recruitment to UsnRNA genes. In addition, mutation of serine-7 to alanine on a

GST-CTD backbone abolished Integrator interaction (Egloff et al., 2007). Later, they showed through a similar methodology that mutation of serine-2 to alanine was also sufficient to block Integrator binding, indicating that Integrator recognizes a serine-2/serine-7 phosphorylated dyad (Egloff et al., 2010). Finally, using reconstituted tyrosine-1 mutant CTD-expressing cells, Jean-Cristophe Andrau’s group showed not only that tyrosine-1 is required for transcriptional termination genome-wide but that tyrosine-1 is needed for the interaction with Integrator by mass spectrometry (Shah et al., 2018). It is noteworthy that at the time of this 17 writing, no study has demonstrated the binding of drosophila Integrator to RNAPII.

In addition, it is still unknown which Integrator subunit or subunits directly bind the

CTD. Nevertheless, despite their disparities, these studies and ours collectively demonstrate that human Integrator stably associates with the CTD of RNAPII.

1.2.2. Endonuclease-dependent functions of Integrator

In this section we discuss the endonucleolytic activity of Integrator and examine a number of RNA substrates subject to its catalytic activity including

UsnRNA, viral microRNA, enhancer RNA, and the long non-coding RNA NEAT1.

Integrator harbors a RNA endonuclease subunit, INTS11, and our initial characterization of the complex revealed its role in the 3’-end processing of small nuclear RNAs (UsnRNA) involved in spliceosome formation (Baillat et al., 2005).

Prior to the discovery of Integrator, it was understood that the generation of

UsnRNA had three requirements: 1) a distal sequence element occupied by Oct1 and Sp1 transcription factors and a proximal sequence element bound by SNAPc

(snRNA activating protein complex), 2) the CTD of RNAPII, and 3) a sequence of

GTTTN0-3AAARNNAGA known as the 3’-box, which is located 9–19 bp downstream of the 3’-end of the UsnRNA termination site (Baillat and Wagner,

2015). During its initial characterization and subsequent investigations, Integrator was shown to stably associate with the RNAPII CTD and occupy the promoters, gene bodies and 3’-ends of UsnRNA genes (Baillat et al., 2005; Egloff et al., 2012).

Importantly, expression of E203Q-INTS11, the catalytic-null mutant of INTS11, or the siRNA-mediated knockdown of INTS11 and other subunits, impair UsnRNA 18 processing resulting in the transcriptional read-through of RNAPII (Baillat et al.,

2005; Ezzeddine et al., 2011; O'Reilly et al., 2014). Since these discoveries, it is now established that the enzymatic activity of Integrator requires functional association of the INTS11/INTS9 heterodimer with INTS4 to collectively form the catalytically active cleavage module (Figure 1.2.2.) (Albrecht et al., 2018; Wu et al., 2017).

Figure 1.2.2. The role of Integrator in UsnRNA 3’-end processing. Integrator associates with the RNAPII C-terminal domain at UsnRNA genes through recognition of the serine-2/serine-7 phosphorylated dyad. Upon encountering the 3’-box sequence element, INTS11 cleaves the UsnRNA transcript in a catalytic module composed of INTS11, INTS9, and INTS4. (Adapted from Baillat, D. et al 2015)

Numerous efforts have sought to identify other RNA substrates of

Integrator. Joan Steitz’s group found that Integrator is needed for the 5’ and 3’ processing of viral pre-microRNA from the oncogenic γ-herpesvirus known to produce T-cell leukemia in primates, Herpesvirus saimiri. In these studies, they showed these viral microRNA are transcribed downstream of endogenous

UsnRNA and possess a viral microRNA 3’-box, which results in their processing by Integrator (Cazalla et al., 2011; Xie et al., 2015). Interestingly, this mechanism 19 also exemplifies microprocessor-independent microRNA processing that has also been described in mirtrons (Flynt et al., 2010).

Our lab recently found that Integrator is required for RNAPII pause-release at immediate early genes (IEGs), which is discussed in more detail below. As these genes are highly responsive to epidermal growth factor, we asked whether

Integrator plays a role in the functionality of enhancers that mediate IEG induction.

Using chromatin-confirmation capture and chromatin-enriched RNA sequencing, we found that Integrator is needed not only for enhancer-promoter looping but also for the 3’-end processing of enhancer RNA. Upon depletion of INTS11, enhancer

RNA transcripts extend beyond their canonical 3’-end. Importantly, overexpression of E203Q-INTS11 recapitulated this phenotype, demonstrating that the catalytic activity of INTS11 is critical for the processing of these non-polyadenylated eRNA, thus expanding the scope of Integrator function and enhancer regulation (Lai et al., 2015).

Finally, in collaboration with Jean-Cristophe Marine’s group, we showed that Integrator mediates the isotype switching of the long non-coding RNA NEAT1 from its long isoform to its short isoform. The long isoform of NEAT1 is an essential architectural component of paraspeckles, which are nuclear phase-separated bodies involved in the stress-response. Importantly, paraspeckles are found in many epithelial cancers and protect from oncogene-induced replication stress and impart chemoresistance. In this study, we demonstrated that Integrator restrains expression of the long NEAT1 isoform and paraspeckle assembly. Since lower levels of Integrator correlate with poor survival and diminished response to 20 chemotherapy, these results establish a link between Integrator activity, cancer biology and chemosensitivity (Barra et al., 2020).

1.2.3. Integrator’s contribution to RNAPII pause-release and the MAPK gene expression program

Transcriptional coactivators mediate the functional integration of regulatory information and augment expression of distinct biological gene expression programs (Spiegelman and Heinrich, 2004). We sought to understand the regulation of IEGs, which are rapidly induced upon stimulation by epidermal growth factor (EGF), much like heat-shock responsive genes in response to temperature elevation. These genes are distinctly controlled through extensive pre-loading of paused RNAPII complexes, which are released into gene bodies upon activation.

We found that following EGF induction, Integrator is rapidly recruited to IEGs where it mediates RNAPII pause-release through its recruitment of P-TEFb (CDK9 specifically) and AFF4, two components of the super elongation complex (SEC)

(Gardini et al., 2014). More recently, we showed that INTS11 is a direct phosphorylation target of the MAPK pathway, thereby linking EGF signaling to

Integrator genomic occupancy (Yue et al., 2017). The engagement of Integrator and the SEC is critical for activating the mitogen-activated protein kinases (MAPK) gene expression program, though less is known about the mechanisms by which

Integrator regulates transcription and RNAPII pause-release genome-wide.

Interestingly, the findings we observed at EGF-responsive genes were recapitulated at the HSP90 locus, suggesting that Integrator regulates RNAPII 21 pause-release at other stimulus-responsive genes (Gardini et al., 2014). Notably, other groups have demonstrated that Integrator physically associates with NELF and DSIF, indicating that Integrator may also functionally coordinate with factors that promote pausing (Stadelmayer et al., 2014; Yamamoto et al., 2014).

Figure 1.2.3. Integrator mediates RNAPII pause-release. Following activation by epidermal growth factor, Integrator engages the C-terminal domain of RNAPII and recruits the Super Elongation Complex to mediate RNAPII pause-release and productive elongation. (Adapted from Gardini, A. et al 2014).

1.2.4 Physiological importance

While most of our understanding of Integrator is limited to its role in RNA processing and RNAPII pause-release, a few studies have noted its contribution to physiological processes. In one of the earliest biological studies of Integrator,

INTS11 was shown to be important for cell cycle progression as its depletion in

HeLa cells leads to cell-cycle arrest in the G1 phase (Dominski et al., 2005). Shortly thereafter, it was demonstrated that mouse embryos with homozygous deletion of 22

INTS1 arrest growth at the early blastocyst stage due to UsnRNA misprocessing, coincident with high levels of caspase 3 and 7 (Hata and Nakayama, 2007).

A number of studies further examined the role of Integrator in development.

One showed that Integrator is key for maintaining the splicing of SMAD transcripts in zebrafish and its abolishment reduced expression of other genes important for hematopoiesis (Tao et al., 2009). Notably, INTS13 was recently shown to dissociate from Integrator and form a module with EGR1/2 and NAB2 at enhancers to direct myeloid differentiation (Barbieri et al., 2018). Another found that loss of

INTS6 or INTS11 impaired adipocyte differentiation through the downregulation of adipocyte-specific gene targets (Otani et al., 2013). INTS6 was also shown to restrict dorsal organizing genes in zebrafish and its loss resulted in delayed cell movement during gastrulation and expanded dorsalization of tissues (Kapp et al.,

2013). Integrator has also been described in cortical neuron migration through its interactions with Nipbl, a subunit of the cohesion complex, and the transcription factor Zfp609. These findings were particularly noteworthy given the neurodevelopmental defects seen in humans with Integrator mutations (Krall et al.,

2018; Oegema et al., 2017; van den Berg et al., 2017; Zhang et al., 2019).

In the enclosed study, we sought to understand how Integrator regulates transcription at promoters and enhancers with a specific interest in the function its catalytic activity. We find that Integrator regulates RNAPII recruitment genome- wide, independent of its enzymatic activity. At numerous human genes, INTS11 mediates RNAPII pause-release through endonucleolytic cleavage of nascent transcripts allowing for the re-initiation of an elongation-competent RNAPII 23 complex. At enhancers, INTS11 catalysis is needed strictly for the termination of enhancer RNA transcription beyond the +1 nucleosome and plays little role in pause-release. Such mechanistic distinctions are reflected in the increased occupancy of pause factors, NELF and SPT5, following Integrator depletion at promoters. Our results highlight fundamental differences in the regulation of transcription at human promoters and enhancers. This work highlights Integrator as a critical determinant in the recruitment and dynamic turnover of RNAPII at human promoters, where it initiates and terminates numerous rounds of transcription to facilitate productive elongation through the +1 nucleosome.

CHAPTER 2: KEY RESOURCES AND METHODS

24 25

2.1. KEY RESOURCES

REAGENT or RESOURCE SOURCE IDENTIFIER Antibodies Rabbit polyclonal Anti-INTS11 Atlas Antibodies Cat #HPA029025, RRID:AB_10600 425 Rabbit polyclonal Anti-INTS11 Abcam Cat#75276, RRID:AB_12809 58 Rabbit polyclonal Anti-INTS9 Atlas Antibodies Cat# HPA051615, RRID:AB_26815 52 Rabbit polyclonal Anti-INTS4 Bethyl Cat# A301-296A, Laboratories RRID:AB_93790 9 Mouse monoclonal Anti-INTS1 Millipore Cat# MABS1984, RRID:AB_28482 04 Mouse monoclonal Anti-Lamin A Active Motif Cat# 39962, RRID:AB_26150 28 Rabbit monoclonal Anti-GAPDH Cell Signaling Cat# 5174, Technology RRID:AB_10622 025 Rabbit monoclonal Anti-RPB1 NTD Cell Signaling Cat# 14958, Technology RRID:AB_26878 76 Rabbit polyclonal Anti-RNA pol II CTD Abcam Cat# ab5095, phospho S2 - RPB1 Ser-2 RRID:AB_30474 9 Rabbit polyclonal Anti-RNA pol II CTD Abcam Cat# ab5131, phospho S5 - RPB1 Ser-5 RRID:AB_44936 9 Rat monoclonal Anti-RNA Pol II CTD Active Motif Cat# 61383, phospho Tyr1 - RPB1 Tyr-1 RRID:AB_27936 13 Rabbit polyclonal Anti-NELF-E Santa Cruz Cat# sc-377052, Biotechnology NA Rat monoclonal Anti-SPT5 Millipore Cat# MABE1803, RRID:AB_28482 05 26

Rabbit polyclonal Anti-Histone Abcam Cat# ab8895, H3K4me1 RRID:AB_30684 7 Rabbit polyclonal Anti-Histone Diagenode Cat# pAb-003- H3K4me3 050, RRID:AB_26160 52 Rabbit polyclonal Anti-Histone Abcam Cat# ab9050, H3K36me3 RRID:AB_30696 6 Rabbit polyclonal Anti-Histone Diagenode Cat# H3K27ac C15410196, RRID:AB_26370 79 Chemicals, Peptides, and Recombinant Turbo DNAse Invitrogen cat# AM1907 Biotin-11-NTPs Perkin Elmer Cat# NEL54(2/3/4/5)0 01 Dynabeads M-280 streptavidin Invitrogen Cat# 11205

Critical Commercial Assays NEBNext Ultra II DNA library prep kit NEB Cat# E7645S Truseq Stranded Total RNA library Illumina Cat# 20020596 prep kit SMARTer smRNA-seq kit Takara Cat# 635030 Nextera DNA Library Prep Kit Illumina Cat# FC-121- 1030 Deposited Data Raw and analyzed data This paper GEO: GSE125534 & GSE125535

Experimental Models: Cell Lines Hela shGFP Gardini et al. 2014 Hela shINTS11 Gardini et al. 2014 Hela shINTS11 - FLAG-INTS11E203Q This paper Hela shINTS11 - FLAG-INTS11WT This paper

Recombinant DNA Tet-pLKO-puro vector addgene Cat# 21915 Tet-pLKO-neo vector addgene Cat# 21916 27

Cumate-pLenti-Cloning-2A-GFP vector ABM inc. Cat# iCu003 Software and Algorithms Cutadapt v1.14 Martin 2011 Trimmomatic v0.32 Bolger et al. 2014 Bowtie v1.1.2 Langmead et al. 2009 BEDTools v2.28 Quinlan and Hall, 2010 STAR v2.5.3a Dobin et al. 2013 deepTools2 Ramirez et al. 2016 MACS2 v2.1.2 Zhang et al. 2008 Homer annotatePeaks v4.9.1-5 Heinz et al. 2010 ATAC-seq pipeline https://github.co m/kundajelab/ata c_dnase_pipelin es NucleoATAC Schep et al. 2015 RSEM v1.2.31 Li and Dewey, 2011 DESeq2 Love at el. 2014 RECLU clustering Ohmiya et al. 2014 Prism v7.0c GraphPad Clus-Doc Matlab R2018b Pageion et al. 2016 Matlab R2018b https://www.math works.com/produ cts/new_product s/release2018b.h tml

Nikon Elements Analysis 5.02.01 Nikon Other GraphPad Prism

28

2.2. METHODS

Cell lines and cell culture

Cell lines were maintained in mycoplasma-free conditions and were routinely tested for infection. HeLa INTS11 and GFP inducible knockdown clones were established as previously described (Lai et al., 2015). HeLa rescue cells were established by cloning the same shINTS11 sequence into Tet-pLKO-neo vector

(Addgene) and single clones were selected with G418 (500μg/ml). shRNA- resistant N-terminal Flag-tagged WT or E203Q mutant INTS11 cDNA (Baillat et al., 2005) were cloned into Cumate-pLenti-Cloning-2A-GFP vector (ABM Inc.), and transfected into a shINTS11-Tet-pLKO-neo single clone. Stable cell lines were maintained in puromycin (2μg/ml) and G418 (200μg/ml) containing DMEM medium.

Antibodies

The antibodies utilized in this study are as follows: for ChIP-Seq INTS11 [Atlas

Antibodies, HPA029025, lot #A107128 (and western blot) and Abcam, 75276, lot

#1], RPB1 NTD [Cell Signaling Technology, 14958 lot #1], SPT5 [Millipore,

MABE1803, lot #Q2830940], NELF-E [Santa Cruz, 377052, lot #C1319], RPB1

Tyr-1 [Active Motif, 61383 Clone 3D12, lot #31516002 and for STORM imaging],

RPB1 Ser-5 [Abcam, 5131, lot #GR3178102-1], RPB1 Ser-2 [Abcam, 5095, lot

#GR3225147-1], H3K4me1 [Abcam, 8895, lot #GR3235544-1], H3K4me3

[Diagenode, 15410003-50, lot #A1052D], H3K36me3 [Abcam, 9050 lot#

GR3257952-1], H3K27ac [Diagenode, C15410196, lot #A1723-0041D]. For 29 western blot INTS4 [Bethyl Laboratories, A301-296A, lot #1], INTS9 [Atlast

Antibodies, HPA051615, lot # R66985], Lamin A [Active Motif, 39961, lot

#33310001], GAPDH [Cell Signaling, 5174, lot #6] and for STORM INTS1

[Millipore, MABS1984, lot #Q2790436]. For STORM imaging Alexa Fluor 568

[Invitrogen A11077 Lot #1917936] was used as a secondary staining for RPB1

Tyr-1 and Janelia Fluor 646 [Novus Bio NBP1-7-2739JF646 Lot #36481-051615] was used as a secondary staining for INTS1.

RNA isolation

Total RNA was extracted using Trizol reagent (Thermo Fisher Scientific,

#15596026) according to the manufacturer’s instructions. Genomic DNA was removed by Turbo DNAse treatment (Invitrogen, #AM1907).

Library preparation

ChIP-seq libraries were generated using the NEBNext Ultra II DNA library prep kit

(New England Biolabs, #E7645S) with at least 10 ng of input DNA. Total RNA-seq libraries were produced using Truseq Stranded Total RNA library prep kit (Illumina,

#20020596) with 500 ng of DNAse-treated Input RNA. Small RNA libraries were prepared using the SMARTer smRNA-seq kit (Takara, #635030) with 750 ng of

Nuclear-enriched total RNA. Genome-wide experiments were performed as two independent biological replicates. To avoid a batch effect in library preparation and sequencing flow cell, these replicates were processed together.

30

PRO-seq and data analysis

PRO-seq experiments were performed as described previously (Mahat et al.,

2016). Nuclei were isolated by Dounce homogenizer with loose pestle. 1x10^7 nuclei were subjected to nuclear run-on (30°C, 3 min) in the presence of 25 μM

Biotin-11-ATP/GTP/CTP/UTP (PerkinElmer). Total RNA was extracted and hydrolyzed in 0.2 M NaOH (on ice, 10 min). Biotinylated nascent RNAs were purified by Dynabeads M-280 streptavidin (Invitrogen). Following adaptor ligation, cDNA synthesis, and PCR amplification 140–350 bp libraries were size-selected by Pippin HT with 2% gel cassette 20B (Sage Science) then sequenced by

NextSeq 500 system (Illumina) with single-read runs. Raw fastq data were trimmed by Cutadapt 1.14 (Martin, 2011) and Trimmomatic v0.32 (Bolger et al., 2014), and then aligned on hg19 or dm3 genome by bowtie 1.1.2 (Langmead et al., 2009).

Strand-specific single nucleotide ends of aligned reads were generated by

BEDTools v2.28 with genomecov as bedgraph format (Quinlan and Hall, 2010).

Bedgraph data were normalized by the number of reads mapped to spike-in dm3 genome, and then converted to bigwig data, which were used for downstream analyses.

Cap analysis of gene expression (CAGE) library preparation, sequencing and analysis

CAGE library preparation, sequencing and mapping were performed by

DNAFORM (Kanagawa, Japan), the CAGE library prep is based on nAnTiCAGE protocol (Murata et al., 2003). Briefly, first strand cDNAs were transcribed to the 5’ 31 end of capped RNAs and were attached to CAGE “bar code” tags. Multiplex deep sequencing of two cDNA libraries (shGFP without dox) was performed on an

Illumina HiSeq2500 sequencer. The sequenced CAGE tags were mapped to the human genome hg19 using BWA and used the a filter of MAPQ >= 20 (Li and

Durbin, 2009). Next, we removed the rRNA from the bam files. The reads with

MAPQ < 20 in BWA were mapped again by HISAT2 (Kim et al., 2015). We kept only the unique mapped reads (with the tag NH:i:1) and combined with the BWA mapped reads that were utilized in subsequent analyses. For tag clustering, the

CAGE-tag were input for RECLU clustering, with a maximum irreproducible discovery rate (IDR) of 0.1 (Ohmiya et al., 2014). We used BedTools command closest to annotated the CAGE peaks against ENSENBL (v87). Peaks with distance >500nt from an annotated TSS region were discarded to avoid multiple

CAGE signals assigned to the same promoter. When multiple peaks were detected in the same promoter, we considered the CAGE peak with lower IDR. The gene isoforms related with the annotated TSS by CAGE was considered for further analysis if we detect a FPKM > 0 in total RNA-seq of INTS11 shRNA with or without dox. We used deepTool2 to generate normalized bigwig (Ramírez et al., 2016).

ChIP-Seq and data analysis

ChIP-seq was performed as previously described (Gardini et al., 2014; Lai et al.,

2015; Yue et al., 2017) with slight modifications. 2 x 107 cells were crosslinked in

1% formaldehyde for 10 minutes at room temperature and quenched with 125 mM glycine. For INTS11 ChIP, samples were crosslinked with Cross-link Gold 32

(Diagenode C01019027) according to the manufacturer’s instructions prior to formaldehyde fixation. Samples were resuspended in ChIP lysis buffer (20 mM

Tris-HCl, 50 mM NaCl, 0.1% SDS, 0.5% Triton-X-100, 1 mM EDTA) and sonicated in a Covaris M220 focused sonicator until chromatin fragments of 150-300 bp were produced. Protein A/G magnetic beads were bound with antibody and incubated with 1 mg of sonicated chromatin overnight. The following day, beads were washed twice with Mixed Micelle Buffer (150 mM NaCl, 1% Triton X-100, 0.2% SDS,

20 mM Tris-HCl, 5 mM EDTA, 65% sucrose), Buffer 500 (500 mM NaCl, 1% Triton

X-100, 0.1% Na deoxycholate, 25 mM HEPES, 10 mM Tris-HCl, 1 mM EDTA),

LiCl/detergent wash (250 mM LiCl, 0.5% Na deoxycholate, 0.5% NP-40, 10 mM

Tris-HCl, 1 mM EDTA), and eluted in Decrosslinking Buffer (20 mM Tris-HCl, 300 mM NaCl, 0.5% SDS, 1 mM EDTA) at 65 °C. Eluates were treated with RNAse A for 1 hour at 37 °C and then Proteinase K for 3 hours at 56 °C. The DNA was finally purified by phenol/chloroform and eluted in 1x TE. Raw FASTQ data were processed with Trimmomatic v0.32 to remove low-quality reads and then aligned to the human genome hg19 using STAR aligner v2.5.3a (Dobin et al., 2013). We used deepTool2 to generate normalized bigwig and heat maps (Ramírez et al.,

2016). Peaks were called using MACS2.1.2 with default parameters –shiftsize 75

–nomodel --extsize 200 –p 0.01 for all data. Whole-cell extract input from the corresponding cell lines were used as controls. Peaks with fold change > 3 and a q-value < 0.01 were used for downstream analysis. Homer annotatePeaks v4.9.1-

5 was used for peak annotation and intergenic peaks were selected for eRNA identification. 33

RNA-seq data analysis

Raw fastq RNA-seq data were processed with Trimmomatic v0.32 and aligned to human genome (hg19 version) using STAR aligner v2.5.3a with default parameters and RSEM v1.2.31 to obtain expected gene counts against the human Ensembl (release 87)(Bolger et al., 2014; Dobin et al., 2013; Li and Durbin,

2009). To select genes for downstream analysis, we extended the TSS positions

(defined by the CAGE peaks) by 500 toward to the 3’end. We overlapped the nucleosome position defined by NucleoATAC and used the closet nucleosome to the CAGE peak to capture the +1 nucleosome dyad center position (Schep et al., 2015). The genes with no nucleosome in the 500nt window were excluded from the subsequent analysis. We merged the 2 PRO-Seq biological replicates for each condition, shINTS11 with and without dox, and excluded all TSS region with a CPM

< 0.5 in both conditions. The remaining TSSs were defined as all genes (n=8000).

Differential expression was determined between INTS11 shRNA +Dox or INTS9 shRNA +Dox or INTS4 shRNA +Dox against GFP shRNA +Dox or INTS11 shRNA (+Dox vs -Dox) and GFP shRNA (+Dox vs -Dox) using DESeq2 and R v3.2.3 with q-value < 0.05. For visualization on the UCSC Genome Browser, all tracks were CPM (count per million) normalized against the total number of usable reads in that data set using deepTools2 (Love et al., 2014; Ramírez et al., 2016).

Gene Ontology analysis was performed using DAVID (v6.8) online tool with standard parameters (https://david.ncifcrf.gov/home.jsp).

34

Small RNA analysis

Reads were then adapter trimmed (AAAAAAA) as recommend by SMARTer smRNA-seq kit (Takara, #635030) protocol using Cutadapt (v1.14) and reads less than 17 bp were discarded. First, we aligned the reads against human elements in RepBase (v23.08) with STAR (v2.5.3a), repeat-mapping reads were removed all others were then mapped against the full human genome (hg19 version) keeping all uniquely aligned reads. For visualization on the UCSC Genome

Browser, all tracks were CPM (count per million) normalized against the total number of usable reads in that data set using deepTools2 (Ramirez et al., 2016).

The average smRNA length profiles were calculated with the first quartile of all gene classes and we used a window from CAGE peak to +150 nt for the calculation.

Differentially paused genes modeling

PRO-seq was used to calculate the traveling ratio (TR) as previously described in

(Rahl et al., 2010). Briefly, we calculated the ratio between the RPKM values in promoter region (-50 nt from CAGE to + 300 nt) over the region corresponding to the gene body (+301 nt from CAGE to TES) for all transcripts used in the analysis

(n = 8000). The traveling matrix (TM) can be visualized as a 2D distribution by using the log2 ratio (RPKM –Dox/ RPKM +Dox ) of promoter region (X axis) and gene body (Y axis). To access the genes with differential pause after INTS11 depletion, we developed a pause relative metric using a 2D Gaussian distribution model.

First, we calculated a null model dispersion based in the Control shRNA (-Dox) 35 and (+Dox) observed changes. The two-dimensional Control shRNA relative distribution was estimated using a 2D gaussian probabilistic model, with the number of mixture k = 1 using the numerical Expectation Maximization (EM) algorithm with a convergence threshold < 1e-3. To select the genes from INTS11 shRNA condition presenting a significant relative pause value (FDR < 0.01), we exclude those genes that show a high probability to belong to the null Control shRNA. The relative difference generated by this 2D representation can be categorized in four different groups:

Class 1: log2 promoter > 0 and log2 gene body < 0

Class 2: log2 promoter < 0 and log2 gene body < 0

Class 3: log2 promoter > 0 and log2 gene body > 0

Class 4: log2 promoter < 0 and log2 gene body > 0

The model can be summarized as: p = Log2(-Dox/+Dox) RPKM at the -50 nt CAGE peak up to + 300 nt window region. c = Log2(-Dox/+Dox) RPKM from the CAGE peak +301 nt up to gene TES.

P = {p1, p2 … pn }, where, P is populated with the p shControl values.

C = {c1, c2 … cn }, where, C is populated with the c shControl values

μ, Σ = EM (P, C), where the parameters from the multivariate Gaussian distribution

μ (mean vector) and Σ (2x2 covariance matrix) are estimated using the Expectation

Maximization algorithm. 36

� = [ �!"#$%&'', �!"#$%&''] , is a vector containing the values p and c for a given gene in the shINTS11 condition.

We defined the probability of a gene x be differentially paused by using the following 2D multivariate normal joint probability distribution equation:

1 1 �(�| �, Σ ) = 1 − − (� − �)%Σ)'(� − �) (2�)( |Σ| 2

Eq 1.

After obtaining the differential pause probability for the genes in the INTS11 shRNA condition, we selected as significant those genes that presented an adjusted probability FDR < 0.01. The 2D Gaussian model and prediction was estimated using the python sklearn mixture package version 0.21.1 (Pedregosa et al., 2011).

Genome-wide identification of eRNA loci

For eRNA identification, we first defined a set of intergenic enhancers by using the

BEDTools Window (v2.28.0) function to overlap H3K27ac and H3K4me1 intergenic peaks (defined in ChIP-Seq and data analysis section) in a window of

±200 bp. After identifying intergenic regions containing H3K27ac/H3K4me1 peaks, the BEDtools Intersect function was used with the option –v to discard any peaks that overlap with any gene from the hg19 reference genome (ENSENBL v87), including an additional 2 kb upstream from every TSS. In order to determine which intergenic enhancers presented transcriptional activity, we separated the mapped read from PRO-seq by strand and individually call peaks using MACS2.1.2 with 37 the following parameters –nomodel --extsize 200 --broad-cutoff 0.1 and –broad to identify broad peaks. We merge the common peaks from each samples (Control shRNA and INTS11 shRNA) for each stranded peaks and later all the peaks from common plus strand were merge with common minus strand peaks. BEDtools intersect between the intergenic enhancer peaks and the common peaks from

PRO-Seq was use to define the enhancer that contain eRNAs. We excluded all transcripts with less than 800nt, in order to minimize any bias in the TR analysis, since these eRNAs are short to precisely separate promoter and gene body regions. Lastly, we define the TSS coordinate for each eRNA based on the position of the highest signal occurrence in the PRO-Seq peak window and discarded those with CPM < 0.5. For eRNA TR calculation we use as promoter region (-100 nt from PRO-seq summit to + 150 nt) and gene body (+151 nt from

PRO-seq summit to TES plus 1kb) all transcripts used in the analysis (n = 2293).

The output from the TR calculation was used as input for the traveling matrix pipeline described above.

Microscopy imaging

Cells (4x103) were plated onto micro slide 4 well plates (Ibidi), fixed in 4% formaldehyde, and immunostained with appropriate primary and secondary antibodies. Imaging experiments were carried out with a Nikon eclipse Ti2 microscope equipped with Nikon Instruments (NSTORM). For two color imaging dSTORM imaging, Janelia 646, and Alexa 568 secondary antibodies were used with MEA STORM imaging buffer and were imaged continuously with 5000 frames 38 collected per filter range at a frequency of 30 ms. The images were acquired using a 100x, 1.49 NA objective, and imaged onto a Hamamatsu C11440 ORCA-flash

4.0. Storm analysis was carried out with Nikon Elements Analysis 5.02.01 for identification of molecules. Molecule list files were then exported from Nikon elements to be further analyzed using software Clus-Doc software in Matlab

R2018b (Pageon et al., 2016). Cluster density analysis, specifically DBSCAN function, was carried out after manually selecting region of interest corresponding to the nucleus. Statistical significance and graphing were performed using Prism software.

CHAPTER 3: THE INTEGRATOR CATALYTIC MODULE REGULATES STEADY-STATE GENE EXPRESSION

39 40

3.1. APPROACH

Previous studies by our group and others focused primarily on the role of

INTS11 in regulating gene expression as its structural composition and enzymatic activity is best characterized among Integrator subunits (Baillat and Wagner, 2015;

Baillat et al., 2005; Dominski, 2007; Wu et al., 2017). However, it is now understood that the endonuclease activity of Integrator requires the association of the

INTS11/INTS9 heterodimer with INTS4 to form the catalytically active cleavage module (Albrecht et al., 2018). We asked whether a core set of genes is regulated by these three subunits in their steady-state and wondered to what extent these genes can be categorized based on the functions of their cognate proteins. To address these questions, we first generated stable HeLa cells containing doxycycline-inducible hairpin sequences targeting green fluorescent protein

(Control shRNA), INTS11 (INTS11 shRNA), INTS9 (INTS9 shRNA) and INTS4

(INTS4 shRNA). We next assessed the depletion of these subunits by western blot after three days of doxycycline administration at 1000 ng/mL. Finally, we isolated total RNA from the corresponding samples, performed RNA-seq and conducted differential expression analysis.

3.2. RESULTS

3.2.1. Validation of INTS11, INTS9 and INTS4 depletion for RNA-seq

To gain insight into the transcriptional targets regulated by INTS11, INTS9 and INTS4, the core subunits of the Integrator catalytic module, we generated

HeLa cells with stably inducible short hairpin RNA to downregulate expression of 41 each subunit. In addition, we created a control shRNA against green fluorescent protein. Interestingly, while depletion of INTS4 did not affect the stability of INTS11 or INTS9, reduction of either INTS11 or INTS9 lead to the loss of both proteins, reflecting their association as obligate partners (Figure 3.2.1.) (Wu et al., 2017).

Control INTS4 INTS9 INTS11 shRNA shRNA shRNA shRNA Dox - + - + - + - + INTS4 Figure 3.2.1. Validation of INTS11, INTS9 and INTS9 INTS4 depletion for RNA-seq. Representative western blot depicting INTS11 stable HeLa cells without (-Dox) and with (+Dox) GAPDH shRNA induction.

3.2.2. INTS11, INTS9 and INTS4 regulate expression of a core gene set

We next performed total RNA-sequencing (RNA-seq) to assess their genome-wide effect on 24,832 expressed genes. Knockdown of INTS11, INTS9 or

INTS4 resulted in the differential expression of 5347, 7423, and 5692 genes respectively, and nearly half in each condition were down- or upregulated (Figure

3.2.2. A-C). Notably, we found 1898 core genes whose expression was collectively regulated by the three subunits where approximately 950 genes were commonly down- or upregulated (Figure 3.2.2. D-E). Despite their strong association, INTS11 and INTS9 respectively regulate expression of over 2000 genes independent of the other subunits. INTS9 and INTS4 showed more overlapping genes compared to that of INTS11 and INTS4, suggesting that INTS11 may operate outside of the catalytic module. Indeed, while the stability of INTS11 and INTS9 are co- dependent, the initial biochemical analysis of the complex revealed other INTS11- 42 containing complexes that could underlie differences in gene expression upon their respective depletion (Baillat et al., 2005). Finally, we performed

(GO) analysis to better understand the biological functions of proteins encoded by these genes. Numerous GO classifications were assessed including Cellular

Component, Molecular Function, KEGG pathways, and Interpro Domains.

Products of downregulated genes function in chromatin dynamics, hormone signaling and transcriptional regulation while upregulated genes generate proteins involved in organelle structure, metabolism, and growth factor signaling (Tables 1-

2).

43

Table 1. Significantly downregulated pathways by RNA-seq

GO Cellular Component Term P-value nucleoplasm part (GO:0044451) 6.59E-05 nuclear body (GO:0016604) 1.99E-04 histone methyltransferase complex (GO:0035097) 0.002072806 lysosomal lumen (GO:0043202) 0.002567476 heterochromatin (GO:0000792) 0.012559201 nuclear speck (GO:0016607) 0.015288465 euchromatin (GO:0000791) 0.016139199 lysosome (GO:0005764) 0.019815063 Set1C/COMPASS complex (GO:0048188) 0.021670853 vacuolar lumen (GO:0005775) 0.022281718 chromatin (GO:0000785) 0.02718224 AP-3 adaptor complex (GO:0030123) 0.030051428 nuclear chromatin (GO:0000790) 0.034801827 condensed (GO:0000793) 0.04055024 H3 histone acetyltransferase complex (GO:0070775) 0.040757356 messenger ribonucleoprotein complex (GO:1990124) 0.040757356 MOZ/MORF histone acetyltransferase complex (GO:0070776) 0.040757356 micro-ribonucleoprotein complex (GO:0035068) 0.040757356 Golgi cisterna membrane (GO:0032580) 0.042492669 RNA polymerase II transcription factor complex (GO:0090575) 0.049176184 nucleolus (GO:0005730) 0.049416565

KEGG Pathways Term P-value Lysosome(hsa04142) 0.000702277 Thyroid hormone signaling pathway(hsa04919) 0.004470618 Estrogen signaling pathway(hsa04915) 0.004486988 Insulin signaling pathway(hsa04910) 0.00655587 Aldosterone synthesis and secretion(hsa04925) 0.012074255 Protein processing in endoplasmic reticulum(hsa04141) 0.028856452 Circadian rhythm(hsa04710) 0.029536255 Hippo signaling pathway(hsa04390) 0.030243947 Phosphatidylinositol signaling system(hsa04070) 0.033993092 cAMP signaling pathway(hsa04024) 0.040742207 TGF-beta signaling pathway(hsa04350) 0.041706813 Endocytosis(hsa04144) 0.043745601 Pathways in cancer(hsa05200) 0.047084217

INTERPRO domains Term P-value Zinc finger, PHD-type(IPR001965) 5.28999E-06 Zinc finger, PHD-finger(IPR019787) 5.91974E-06 Zinc finger, PHD-type, conserved site(IPR019786) 0.000120447 Rho GTPase-activating protein domain(IPR000198) 0.000538215 Protein kinase-like domain(IPR011009) 0.000787335 Zinc finger, FYVE/PHD-type(IPR011011) 0.001079241 Rho GTPase activation protein(IPR008936) 0.0017493 Serine/threonine-protein kinase, active site(IPR008271) 0.002103345 Dbl homology (DH) domain(IPR000219) 0.00323101 Autism susceptibility gene 2 protein(IPR023246) 0.005354852 Regulator of G protein signalling superfamily(IPR016137) 0.005942767 Protein kinase, catalytic domain(IPR000719) 0.006619181 Zinc finger, C2H2-like(IPR015880) 0.007856504 Rap GTPase activating proteins domain(IPR000331) 0.01001598 Helicase/SANT-associated, DNA binding(IPR014012) 0.010405321 Zinc finger, C2H2(IPR007087) 0.011658275 Zinc finger, RING/FYVE/PHD-type(IPR013083) 0.012557815 Zinc finger, CXXC-type(IPR002857) 0.012934212 Regulator of G-protein signaling, domain 1(IPR024066) 0.013255842 Branched-chain alpha-ketoacid dehydrogenase kinase/Pyruvate dehydrogenase kinase, N-terminal(IPR018955) 0.016851123 Signal transduction histidine kinase, core(IPR005467) 0.016851123 Heat shock protein 70, conserved site(IPR018181) 0.024313697 Heat shock protein 70 family(IPR013126) 0.024313697 Src homology-3 domain(IPR001452) 0.025972939 Basic-leucine zipper domain(IPR004827) 0.027467448 GoLoco motif(IPR003109) 0.033422436 PDZ domain(IPR001478) 0.034960015 Bromodomain(IPR001487) 0.035930994 Ubiquitin-associated/translation elongation factor EF1B, N-terminal, eukaryote(IPR015940) 0.039161041 Protein kinase, ATP binding site(IPR017441) 0.039501712 Protein of unknown function DUF1669(IPR012461) 0.043315438 Zinc finger C2H2-type/integrase DNA-binding domain(IPR013087) 0.049439863

44

Table 1. Significantly downregulated pathways by RNA-seq (continued)

GO Molecular Function Term P-value transcription regulatory region DNA binding(GO:0044212) 7.36927E-07 regulatory region DNA binding(GO:0000975) 8.35061E-07 regulatory region nucleic acid binding(GO:0001067) 8.61599E-07 transcription factor activity, transcription factor binding(GO:0000989) 2.82132E-06 protein binding(GO:0005515) 2.88039E-06 transcription factor activity, protein binding(GO:0000988) 3.76603E-06 nucleic acid binding transcription factor activity(GO:0001071) 5.05503E-06 transcription factor activity, sequence-specific DNA binding(GO:0003700) 5.08025E-06 transcription regulatory region sequence-specific DNA binding(GO:0000976) 7.44383E-06 transcription cofactor activity(GO:0003712) 9.73936E-06 molecular function regulator(GO:0098772) 1.64742E-05 sequence-specific DNA binding(GO:0043565) 2.07473E-05 RNA polymerase II regulatory region DNA binding(GO:0001012) 2.39519E-05 GTPase activator activity(GO:0005096) 2.46296E-05 sequence-specific double-stranded DNA binding(GO:1990837) 2.87113E-05 nucleoside-triphosphatase regulator activity(GO:0060589) 3.35368E-05 DNA binding(GO:0003677) 4.59772E-05 macromolecular complex binding(GO:0044877) 4.64061E-05 GTPase regulator activity(GO:0030695) 6.84319E-05 RNA polymerase II regulatory region sequence-specific DNA binding(GO:0000977) 8.49324E-05 chromatin binding(GO:0003682) 9.2015E-05 binding(GO:0005488) 0.00010922 enzyme activator activity(GO:0008047) 0.000121438 RNA polymerase II transcription factor activity, sequence-specific DNA binding(GO:0000981) 0.000127325 double-stranded DNA binding(GO:0003690) 0.000148197 enzyme binding(GO:0019899) 0.000261748 beta-catenin binding(GO:0008013) 0.000286977 transcription coactivator activity(GO:0003713) 0.000579839 enzyme regulator activity(GO:0030234) 0.000608004 transcriptional activator activity, RNA polymerase II transcription regulatory region sequence-specific binding(GO:0001228) 0.000825566 transcription factor activity, RNA polymerase II core promoter proximal region sequence-specific binding(GO:0000982) 0.000851101 heterocyclic compound binding(GO:1901363) 0.001296291 transcription corepressor activity(GO:0003714) 0.001351433 zinc ion binding(GO:0008270) 0.001697507 transcription factor binding(GO:0008134) 0.001759196 nuclear hormone receptor binding(GO:0035257) 0.001992628 organic cyclic compound binding(GO:0097159) 0.002002928 kinase activity(GO:0016301) 0.002631185 kinase binding(GO:0019900) 0.002644022 phosphotransferase activity, alcohol group as acceptor(GO:0016773) 0.003247471 tubulin binding(GO:0015631) 0.003641922 nucleic acid binding(GO:0003676) 0.003686082 protein phosphatase binding(GO:0019903) 0.004187777 cation binding(GO:0043169) 0.004223332 transcriptional activator activity, RNA polymerase II core promoter proximal region sequence-specific binding(GO:0001077) 0.00502536 metal ion binding(GO:0046872) 0.005085414 RNA polymerase II transcription coactivator activity(GO:0001105) 0.006532877 ion binding(GO:0043167) 0.007207109 hormone receptor binding(GO:0051427) 0.007743207 protein C-terminus binding(GO:0008022) 0.008106386 Rho guanyl-nucleotide exchange factor activity(GO:0005089) 0.008161447 protein serine/threonine/tyrosine kinase activity(GO:0004712) 0.008420807 cytoskeletal protein binding(GO:0008092) 0.008932149 phosphatase binding(GO:0019902) 0.00966914 p53 binding(GO:0002039) 0.01008882 transcriptional activator activity, RNA polymerase II transcription factor binding(GO:0001190) 0.010172409 transcriptional repressor activity, RNA polymerase II activating transcription factor binding(GO:0098811) 0.010172409 protein serine/threonine kinase activity(GO:0004674) 0.01367437 protein kinase binding(GO:0019901) 0.015537484 adenyl nucleotide binding(GO:0030554) 0.015690876 transition metal ion binding(GO:0046914) 0.015727933 adenyl ribonucleotide binding(GO:0032559) 0.017895949 kinesin binding(GO:0019894) 0.019235369 ATP binding(GO:0005524) 0.019318433 transferase activity, transferring phosphorus-containing groups(GO:0016772) 0.019432178 protein kinase activity(GO:0004672) 0.019436877 core promoter proximal region sequence-specific DNA binding(GO:0000987) 0.01990672 core promoter proximal region DNA binding(GO:0001159) 0.02123346 steroid hormone receptor binding(GO:0035258) 0.025334109 transcriptional repressor activity, RNA polymerase II transcription regulatory region sequence-specific binding(GO:0001227) 0.027773426 protein complex binding(GO:0032403) 0.027981646 protein complex scaffold(GO:0032947) 0.030396531 RNA polymerase II core promoter proximal region sequence-specific DNA binding(GO:0000978) 0.03181039 mitogen-activated protein kinase kinase kinase binding(GO:0031435) 0.038445943 guanyl-nucleotide exchange factor activity(GO:0005085) 0.038586959 microtubule binding(GO:0008017) 0.038845317 phosphatase regulator activity(GO:0019208) 0.044154085 Notch binding(GO:0005112) 0.044637084 45

Table 2. Significantly upregulated pathways by RNA-seq

GO Cellular Component Term P-value Golgi subcompartment (GO:0098791) 6.49E-04 Golgi membrane (GO:0000139) 0.001022685 trans-Golgi network (GO:0005802) 0.002867013 platelet alpha granule (GO:0031091) 0.003286784 late endosome (GO:0005770) 0.006306134 late endosome membrane (GO:0031902) 0.006811137 phagocytic vesicle (GO:0045335) 0.015883633 mitochondrial matrix (GO:0005759) 0.019940254 endoplasmic reticulum-Golgi intermediate compartment membrane (GO:0033116) 0.022127256 integral component of peroxisomal membrane (GO:0005779) 0.025615194 specific granule membrane (GO:0035579) 0.027272451 endoplasmic reticulum lumen (GO:0005788) 0.030710997 endocytic vesicle (GO:0030139) 0.030907258 BLOC-1 complex (GO:0031083) 0.036760845 platelet alpha granule lumen (GO:0031093) 0.037561248 specific granule (GO:0042581) 0.039115887 manchette (GO:0002177) 0.039664363 ER to Golgi transport vesicle membrane (GO:0012507) 0.040481714 platelet alpha granule membrane (GO:0031092) 0.043119044

KEGG Pathways Term P-value Metabolic pathways(hsa01100) 9.01938E-06 Valine, leucine and isoleucine degradation(hsa00280) 0.000326454 Biosynthesis of antibiotics(hsa01130) 0.001049305 Focal adhesion(hsa04510) 0.004808281 Synthesis and degradation of ketone bodies(hsa00072) 0.005256923 ECM-receptor interaction(hsa04512) 0.005343144 Fatty acid metabolism(hsa01212) 0.008906908 Biosynthesis of amino acids(hsa01230) 0.018463873 Regulation of actin cytoskeleton(hsa04810) 0.027619345 Malaria(hsa05144) 0.036550415 Peroxisome(hsa04146) 0.036799501 PI3K-Akt signaling pathway(hsa04151) 0.036836997 Renal cell carcinoma(hsa05211) 0.037700446 Proteoglycans in cancer(hsa05205) 0.039388238 Amoebiasis(hsa05146) 0.046737777

INTERPRO domains Term P-value Epidermal growth factor-like domain(IPR000742) 0.000215226 Galactose-binding domain-like(IPR008979) 0.00393877 TMP21-related(IPR015720) 0.007222681 EGF, extracellular(IPR013111) 0.007449681 NIPSNAP(IPR012577) 0.008261204 EGF-like, conserved site(IPR013032) 0.009380103 Small GTP-binding protein domain(IPR005225) 0.01166011 Annexin(IPR001464) 0.014634539 Annexin repeat(IPR018502) 0.014634539 Annexin repeat, conserved site(IPR018252) 0.014634539 P-loop containing nucleoside triphosphate hydrolase(IPR027417) 0.018410058 EGF-like calcium-binding(IPR001881) 0.023390736 Plexin-like fold(IPR016201) 0.0252178 Dimeric alpha-beta barrel(IPR011008) 0.026792636 GOLD(IPR009038) 0.033850151 Semaphorin(IPR027231) 0.038726929 Insulin-like growth factor binding protein, N-terminal(IPR009030) 0.040011082 Integrin beta subunit(IPR015812) 0.043673611 IQ motif, EF-hand binding site(IPR000048) 0.048028366

46

Table 2. Significantly upregulated pathways by RNA-seq (continued)

GO Molecular Function Term P-value protein binding(GO:0005515) 0.000118563 oxidoreductase activity(GO:0016491) 0.001039386 GTPase activity(GO:0003924) 0.003480809 catalytic activity(GO:0003824) 0.004695276 cell adhesion molecule binding(GO:0050839) 0.005266296 ligase activity, forming carbon-nitrogen bonds(GO:0016879) 0.006019279 sulfur compound binding(GO:1901681) 0.007509735 MutSalpha complex binding(GO:0032407) 0.013217516 Ras GTPase binding(GO:0017016) 0.014066497 electron carrier activity(GO:0009055) 0.014749602 GTPase binding(GO:0051020) 0.016989039 neuropilin binding(GO:0038191) 0.017407998 chemorepellent activity(GO:0045499) 0.01776958 MutLalpha complex binding(GO:0032405) 0.019333011 syntaxin binding(GO:0019905) 0.021704074 oxidoreductase activity, acting on the CH-CH group of donors(GO:0016627) 0.025344388 pyrimidine nucleotide-sugar transmembrane transporter activity(GO:0015165) 0.026395021 small GTPase binding(GO:0031267) 0.028197596 oxidoreductase activity, acting on the aldehyde or oxo group of donors(GO:0016903) 0.029041744 binding(GO:0005488) 0.032335406 vitamin binding(GO:0019842) 0.033334814 mismatch repair complex binding(GO:0032404) 0.034323495 GTP binding(GO:0005525) 0.035026364 nucleotide-sugar transmembrane transporter activity(GO:0005338) 0.043043064 oxidoreductase activity, acting on the aldehyde or oxo group of donors, disulfide as acceptor(GO:0016624) 0.043043064 muscle alpha-actinin binding(GO:0051371) 0.043043064 integrin binding(GO:0005178) 0.047524819 intramolecular oxidoreductase activity, transposing S-S bonds(GO:0016864) 0.048513515 protein disulfide isomerase activity(GO:0003756) 0.048513515 modified amino acid binding(GO:0072341) 0.048628011 extracellular matrix binding(GO:0050840) 0.049173055

3.3. SUMMARY

To understand the contribution of the Integrator catalytic module on steady- state gene expression, we developed stable HeLa cells with shRNA against

INTS11, INTS9 and INTS4 and performed RNA-seq. Differential expression analysis showed that each of these subunits regulate levels of more than 5000 genes respectively. The subunits comprising the catalytic core collectively regulate expression of 1898 genes, where 956 were downregulated and 942 were upregulated. Downregulated genes encode proteins involved in chromatin dynamics and transcriptional regulation while upregulated genes produce proteins that function in metabolism and growth factor signaling.

CHAPTER 4: INTEGRATOR MEDIATES TRANSCRIPTIONAL ELONGATION

47 48

4.1. APPROACH

Our previous results indicate that the Integrator subunits composing the catalytic module each regulate the steady-state mRNA levels of over 5000 genes.

Nevertheless, these alterations in gene expression might be due to indirect effects on mRNA stability and not the specific loss of Integrator function. Numerous co- transcriptional and post-transcriptional processes alter transcript stability including

5’-capping, N6-methyladenosine modification, pre-mRNA splicing, codon usage, and alternative polyadenylation (Chen and Shyu, 2017). If any of these processes were disrupted by the loss of Integrator, the steady state RNA levels could reflect a milieu of primary, secondary and tertiary effects.

As we sought to understand the direct contribution of Integrator to transcription, we depleted INTS11 and performed precision run-on and sequencing (PRO-seq), which provides a genome-wide distribution of transcriptionally engaged RNAPII at single nucleotide resolution (Kwak et al.,

2013). An advantage of PRO-seq compared to other methods of nascent transcript sequencing is its utilization of biotin-labeled ribonucleotide triphosphate analogs for the nuclear run-on reaction. Since these bases are biotinylated on the 3’- hydroxyl, their incorporation abolishes further elongation and ensures base-pair resolution. The nascent RNA can then be affinity purified through a streptavidin pull down and sequenced from the 3’-end. Cap analysis of gene expression

(CAGE-seq) experiments were also conducted, which involves the purification of

RNA through its 5’-cap and allows for the precise mapping of transcription start sites (TSSs) as transcripts are sequenced from the 5’-end (Shiraki et al., 2003). It 49 is known that the +1 nucleosome imposes an energetic barrier to transcriptional elongation and closely maps to sites of RNAPII pausing (Weber et al., 2014). We hypothesized that Integrator facilitates RNAPII pause release adjacent to the +1 nucleosome, so we performed assay for transposase-accessible chromatin using sequencing (ATAC-seq) to precisely map nucleosome deposition genome-wide.

Finally, we conducted chromatin immunoprecipitation and sequencing (ChIP-seq) of INTS11 and RNAPII to appreciate how alterations in nascent transcription manifest in their steady-state occupancy. High quality antibodies are critical for

ChIP-seq experiments and in our study, we used the extensively cited ENCODE- validated RNAPII antibody from Cell Signaling Technologies. As INTS11 antibodies lack ENCODE validation, we performed ChIP-seq with two immunohistochemistry-grade antibodies, one from Sigma and the other from

Abcam, and only analyzed the common peaks between the two antibodies.

4.2. RESULTS

4.2.1. Bioinformatic parameters for gene selection and INTS11 genomic occupancy

Multiple bioinformatic filters were used to assess transcriptionally engaged

RNAPII including detectable signal from RNA-seq, PRO-seq and CAGE-seq, and a positioned +1 nucleosome within 500 nucleotides from the CAGE signal as determined by ATAC-seq. These cutoffs yielded 8000 genes for further analysis

(Figure 4.2.1. A). We evaluated the specificity of the INTS11 antibodies by confirming the loss of ChIP-seq signal after induction of the shRNA. An average 50 profile of INTS11 at these 8000 genes revealed more than 60% depletion on the chromatin (Figure 4.2.1. B). Moreover, a strong correlation (r = 0.79) of INTS11 and RNAPII occupancy was found in promoter regions, in agreement with biochemical evidence (Figure 4.2.1. C).

A B C

Figure 4.2.1. Bioinformatic parameters for gene selection and INTS11 genomic occupancy. (A) Schematic of the criteria us ed to curate genes, see method. (B) ChIP-seq occupancy of INTS11 centered on the TSS (CAGE peak +/- 200 bp) of 8000 genes without (Blue, -Dox) and with (Red, +Dox) induction of INTS11 shRNA (C) Spearman correlation of RNAPII and INTS11 ChIP-Seq oc c upanc y centered on the TSS (CAGE peak +/- 200 bp) of 8000 genes.

4.2.2. Integrator facilitates transcriptional elongation

We used PRO-seq to calculate the RNAPII traveling ratio (TR) as the density of RNAPII in the promoter region (defined here as -50 nt from CAGE to +

300 nt) over the region corresponding to the gene body (+301nt from CAGE to

TES), which allows for genome-wide assessment of RNAPII behavior (Rahl et al.,

2010). We detected an increase in the TR after depletion of INTS11, suggesting that Integrator is required for RNAPII pause-release (Figure 4.2.2. A-B). However, an increased TR does not provide sufficient mechanistic evidence that Integrator is required to license transcriptional elongation. To more thoroughly examine transcriptional dynamics, we separated the PRO-seq signal into promoter and gene body regions (defined above) as these loci correspond to transcriptional initiation and elongation, respectively. We also centered the PRO-seq promoter 51 profiles on the +1 nucleosome dyad to assess the response of RNAPII as it encounters the +1 nucleosome barrier (Figure 4.2.2. C-D). Depletion of INTS11 led to an increase and shift of RNAPII toward 3’-end of the promoter region (Figure

4.2.2. D). Moreover, we observed reduced read density through the gene body, reflecting impaired elongation (Figure 4.2.2. D).

A B

C D

Figure 4.2.2. Integrator facilitates transcriptional elongation. (A and B) PRO-seq traveling ratio (TR) at expressed genes (n = 8000) in (A) Control shRNA cells and (B) INTS11 shRNA cells without (Blue, - Dox) and with (Red, +Dox) shRNA induction. KS test ** = p-value < 0.01 and *** = p-value < 0.001. (C and D) PRO-seq promoter proximal read density and gene body profile at all expressed genes (n = 8000) in (C) Control shRNA cells and (D) INTS11 shRNA cells without (Blue, -Dox) and with (Red, +Dox) shRNA induction. PRO-seq promoter profiles are centered on the +1 nucleosome dyad. The gene body length is normalized using the region corresponding to 25% of the total annotated gene length up to the transcription end site (TES) plus an additional 1kb.

4.2.3. The traveling matrix reveals Integrator-dependent transcriptional behavior

Measurements of the TR pose a major limitation in that the same value can result from a reduction of signal in gene body or an increase in the promoter, obscuring transcriptional dynamics (Figure 4.2.3. A, compare conditions B and C).

Assessment of the TR and RNAPII average profiles provide a representative insight into global gene expression changes, though we strived to understand the 52 response of individual genes to INTS11 depletion. Therefore, we devised a two- dimensional Gaussian distribution of RNAPII, which we termed the traveling matrix

(TM), to simultaneously dissect the alterations at promoters and gene bodies of

INTS11-responsive genes (Figure 4.2.3. A). The TM resolves changes in the promoter and gene body regions of individual genes, which are then represented as a two-dimensional topographic map. The color intensity in the TM displays the concentration of genes at a particular position. Induction of Control shRNA minimally changed its TM as most genes were centered close to the origin of the two dimensional graph (Figure 4.2.3. B). In contrast, depletion of INTS11 induced an extensive transcriptional effect as the TM coordinates of numerous genes were altered (Figure 4.2.3. C). Consistent with the average profiles, the genes in the

INTS11 TM showed reduced gene body read density (Figure 4.2.3 C, note the downward shift of the matrix).

We set the Control TM as the baseline expected null dispersion and excluded these genes from the INTS11 TM to identify genes that significantly (false discover rate (FDR) < 0.05) differ from the expected control dispersion. We named this matrix the differential TM (Figure 4.2.3. D-E, see Methods). We then divided the differential TM into four gene class quadrants (class I to IV), representing differentially expressed genes with coincident changes in promoter and gene body signal. While INTS11 depletion altered many gene coordinates in the differential

TM (compare Figure 4.2.3. B and C), the majority of INTS11-responsive genes accumulated in class III and class IV quadrants, signifying a loss of transcriptional elongation (Figure 4.2.3. D-E). Moreover, the pronounced enrichment of genes in 53 class IV, where augmented promoter signal was coupled with reduced gene body reads, implicates a requirement for INTS11 in mediating RNAPII pause-release and productive elongation at these loci.

A B

C

D E

Figure 4.2.3. The Traveling Matrix reveals Integrator-dependent transcriptional behavior. (A) Hypothetical schematic of control condition “A” whereby treatments producing conditions “B” and “C” generate identical traveling ratio (TR) values by distinct transcriptional mechanisms. Use of the Traveling Matrix (TM) illuminates transcriptional alterations obscured by identical TRs. (B and C) TM of (B) Control shRNA cells and (C) INTS11 shRNA cells. (D) Differential TM of INTS11 shRNA cells. (E) Dotted differential TM of INTS11 shRNA cells where each dot represents one gene.

4.2.4. Class IV genes are enriched in Integrator and RNAPII

The differential TM illuminated defects in transcription elongation as more than two-thirds of transcripts were reduced in gene body signal after the depletion 54 of INTS11. We wondered whether the enrichment of Integrator and RNAPII was uniform among the classes. We next analyzed the steady-state occupancy of

INTS11 and RNAPII to assess their distribution in the promoter regions of the four classes of genes defined above. The ChIP-seq profiles of INTS11 and RNAPII closely track each other, reflective of their constitutive association (Figure 4.2.4.

A-B). The maximum signal of both targets was found adjacent to the +1 nucleosome, indicating that all classes of genes undergo tightly regulated RNAPII pausing within confined genomic loci.

A C

B

Figure 4.2.4. Class IV genes are occupied by Integrator; PRO-seq examples of class IV genes. (A and B) ChIP-seq promoter profiles of (A) INTS11 and (B) RNAPII centered on the +1 nucleosome dyad at gene classes defined by the differential TM. Kolmogorov–Smirnov (KS) test p-value < 0.001 for classes paired against class IV. (C) PRO-seq ex amples of class IV genes SEL1L and MED4 in Control shRNA cells and INTS11 shRNA cells without (-Dox) and with (+Dox) shRNA induction.

While class I, II and III genes displayed an equal distribution of Integrator and

RNAPII in their steady-state, class IV genes showed the highest levels of Integrator and RNAPII occupancy at their promoters, consistent with their responsiveness to

INTS11 depletion (Figure 4.2.4. A-B). The TM highlighted class IV genes in their 55 increased PRO-seq promoter signal and decreased gene body reads, as exemplified by SEL1L and MED4 (Figure 4.2.4. C).

4.3. SUMMARY

To study the direct function of Integrator in transcriptional regulation, we curated all expressed genes using stringent bioinformatic criteria and arrived at

8000 genes for downstream analysis. We then validated the specificity of INTS11

ChIP-seq antibodies by demonstrating the loss of genomic signal after the induction of INTS11 shRNA. Moreover, INTS11 and RNAPII ChIP-seq peaks were highly correlated at promoters, in agreement with their constitutive association.

The PRO-seq TR revealed defects in transcriptional elongation following the loss of INTS11, while average profiles showed an accumulation of signal at promoters and a concomitant decrease in gene bodies. Notably, the increase in PRO-seq signal at promoters was shifted over the border of the +1 nucleosome toward the

3’-end. Due to the limitations of the TR, we utilized a differential TM and found more than two-thirds of genes were reduced in gene-body RNAPII signal, indicating that Integrator predominantly promotes transcriptional elongation.

Interestingly, INTS11 and RNAPII were most enriched at class IV genes, which were characterized by increased PRO-seq reads at promoters and decreased reads in gene bodies.

CHAPTER 5: INTEGRATOR RELIEVES RNAPII PAUSING ADJACENT TO THE +1 NUCLEOSOME

56 57

5.1. APPROACH

The differential TM highlighted multiple transcriptional phenotypes that we designated into four classes based on the differential PRO-seq signal at promoters and gene bodies following the knockdown of INTS11. We sought to understand the transcriptional behavior of the four classes of genes with a particular interest in the response of RNAPII as it approaches the +1 nucleosome. Therefore, we analyzed gene classes by their PRO-seq average profiles, TR, steady-state occupancy of RNAPII, histone modifications, and nucleosome deposition.

Collectively, this examination allows for a comprehensive understanding of the transcriptional and epigenetic landscape at genes after the loss of Integrator.

5.2. RESULTS

5.2.1. Integrator is required for RNAPII recruitment and pause-release

To understand how INTS11 regulates transcription of the four gene classes determined by the differential TM, we displayed the promoter profile of engaged

RNAPII centered on the +1 nucleosome and depicted the gene body profile from its first 25% to the TES (Figure 5.2.1. A-D). Class III and IV genes represent the majority of INTS11-responsive genes (n = 2104 or 67%), which showed decreased read density in their gene bodies upon depletion of INTS11 (Figure 5.2.1. D). As seen in the differential TM, class IV genes not only exhibited a substantial increase in paused RNAPII in proximity of +1 nucleosome, the peak of RNAPII was also shifted toward the center of the dyad (Figure 5.2.1. D). This behavior was also noted in class II genes (Figure 5.2.1. B). In contrast, class III and I genes showed 58 decreased read density close to +1 nucleosome without a change in the position of RNAPII, suggesting that Integrator mediates the recruitment of RNAPII to these loci. (Figure 5.2.1. B and D).

A B

C D

E F

G H

Figure 5.2.1. Integrator is required for RNAPII recruitment and pause-release. (A and B) PRO-seq promoter proximal read density and gene body profile at Class I and Class II genes in (A) Control shRNA cells and (B) INTS11 shRNA cells without (Class I: Orange, Class II: Blue, -Dox) and with (Class I: Grey, Class II: Red, +Dox) shRNA induction. KS test p-value = 0.01 of Class I promoter and gene body profiles (–Dox vs. +Dox). KS test p-value < 0.001 of Class II promoter and gene body profiles (–Dox vs +Dox) (C and D) PRO-seq promoter proximal read density and gene body profile at Class III and Class IV genes in (C) Control shRNA cells and (D) INTS11 shRNA cells without (Class III: Orange, Class IV: Blue, -Dox) and with (Class III: Grey, Class IV: Red, +Dox) shRNA induction. KS test p-value < 0.001 for both Class III and IV (–Dox vs. +Dox) promoter and gene body profiles. PRO-seq promoter profiles are centered on the +1 nucleosome dyad. (E-H) PRO-Seq TR of (E) Class I genes, (F) Class II genes, (G) Class III genes, and (H) Class IV genes in Control shRNA cells and INTS11 shRNA cells without (Blue, -Dox) and with (Red, +Dox) shRNA induction.

Analysis of the TR illuminated elongation defects in class II, III and IV genes, while this impairment was most pronounced at class IV genes (Figure 5.2.1. E-H). 59

Moreover, the decreased TR at class I genes substantiates their increased elongation. As the upregulation of class I genes occurred without de-novo RNAPII recruitment, these findings suggest that Integrator represses transcription at these loci.

5.2.2. The steady-state occupancy of RNAPII recapitulates alterations in nascent transcription at the four classes of genes

We next asked whether the changes in nascent transcription manifest in the steady-state occupancy of RNAPII. The PRO-seq results were then validated using ChIP-seq of total RNAPII, which showed comparable profiles of RNAPII at promoters and gene bodies of the four classes of genes (Figure 5.2.2. A and B,

Figure 5.2.1. B and D). Importantly, the alterations in nascent transcription were largely recapitulated in steady-state gene expression. Indeed, the coordinates of up- and downregulated genes were found above and below the horizontal axis of the differential TM, respectively (Figure 5.2.2. C).

60

A B

C

Figure 5.2.2. The steady-state occupancy of RNAPII recapitulates alterations in nascent transcription. (A) RNAPII ChIP-seq promoter and gene body profiles at Class I and Class II genes without (Class I: Orange, Class II: Blue, -Dox) and with (Class I: Grey, Class II: Red, +Dox) INTS11 shRNA induction. KS test p-value < 0.05 and 0.01 for Class I and II (–Dox vs +Dox) promoter profiles, respectively and KS test p-value < 0.05 for gene body profiles. (B) RNAPII ChIP-seq promoter and gene body profiles at Class III and Class IV genes without (Class III: Orange, Class IV: Blue, -Dox) and with (Class III: Grey, Class IV: Red, +Dox) INTS11 shRNA induction. KS test p-value < 0.0001 and 0.05 for Class III and IV (–Dox vs +Dox) promoter profiles, respectively and KS test p-value < 0.001 for gene body profiles. RNAPII Chip-seq promoter profiles are centered on the +1 nucleosome dyad. (C) Projection of differentially expressed genes calculated by RNA-seq on the differential TM of INTS11 shRNA cells. Blue dots are downregulated genes and red dots are upregulated genes defined by RNA- seq.

5.2.3. Integrator functions at promoters with a positioned +1 nucleosome decorated in H3K4me3

We wondered how the epigenetic landscape was composed at the four classes of genes defined by the differential TM. Specifically, we asked whether depletion of INTS11 affects the localization of the +1 nucleosome and the levels of its associated histone modifications. Interestingly, while depletion of INTS11 did not impact the position of the +1 nucleosome in the four classes of genes, the two gene classes (II and IV) that featured increased RNAPII pausing showed highest

ATAC-seq read density of the +1 nucleosome (Figure 5.2.3. A-D). This indicates 61 that the position of the +1 nucleosome was more fixed at these genes and reflects a tighter confinement of nucleosome deposition. As such, the height of the mononucleosome profile measures the probability of nucleosome distribution within a region and not an amount or quantity of nucleosome.

A B C D

E F G H

Figure 5.2.3. Integrator functions at promoters with a positioned +1 nucleosome decorated in H3K4me3. (A and B) Mononucleosome profile at Class I and Class II genes in (A) Control shRNA cells and (B) INTS11 shRNA cells without (Class I: Orange, Class II: Blue, -Dox) and with (Class I: Grey, Class II: Red, +Dox) shRNA induction. (C and D) Mononucleosome profile at Class III and Class IV genes in (I) Control shRNA cells and (J) INTS11 shRNA cells without (Class III: Orange, Class IV: Blue, -Dox) and with (Class III: Grey, Class IV: Red, +Dox) shRNA induction. (E) ChIP-seq promoter profile centered on the +1 nucleosome dyad at classes defined by the TM of H3K4me3 without (Solid line, -Dox) and with (Dashed line, +Dox) shRNA induction in shINTS11 cells. KS test p-value < 0.001 for classes paired against Class IV. (F) ChIP-seq gene body profile of H3K36me3 without (Solid line, -Dox) and with (Dashed line, +Dox) shRNA induction in shINTS11 cells. KS test p-value < 0.001 for classes paired against Class IV. (G) ChIP-seq promoter profile centered on the +1 nucleosome dyad at classes defined by the TM of H3K4me1 in shINTS11 cells. (H) Gene body PRO-seq profile of INTS11 shRNA cells at the classes defined by the TM, KS test p-value < 0.0001 for classes paired against Class IV.

The enhanced positioning of the +1 nucleosome at class IV genes was coupled with high occupancy histone H3 lysine 4 trimethylation (H3K4me3) and histone H3 lysine 36 trimethylation (H3K36me3), and low enrichment of histone H3 lysine 4 monomethylation (H3K4me1) (Figure 5.2.3. E-G). This is in agreement with high transcriptional activity and elevated levels of INTS11 and RNAPII at promoters of class IV genes (Figure 4.2.4. A-B). In contrast, class I genes displayed the lowest 62 levels of H3K4me3, H3K36me3 and RNAPII in their promoters, consistent with their comparably low expression (Figure 5.2.3. E-F and H, Figure 4.2.4. B).

Interestingly, despite the transcriptional alterations incurred by the loss of INTS11, the levels of histone modifications were largely unaffected at all classes of genes.

5.3. SUMMARY

Taken together, we find that at genes with a highly positioned +1 nucleosome decorated in H3K4me3, INTS11 depletion elicits a substantial increase in RNAPII pausing in conjunction with reduced elongation, as exemplified by class IV genes in the differential TM. We also identified defects in the recruitment of RNAPII, suggesting that Integrator is critical for both transcriptional initiation and pause-release. Importantly, the transcriptional changes invoked by the depletion of INTS11 were largely recapitulated in steady-state RNAPII occupancy and transcript levels, while nucleosome deposition and histone modifications were mostly unchanged.

CHAPTER 6: THE CATALYTIC ACTIVITY OF INTEGRATOR PROMOTES RNAPII PAUSE-RELEASE AND CONTRIBUTES TO TRANSCRIPTIONAL ELONGATION

63 64

6.1. APPROACH

The previous results indicate that Integrator plays a role in the recruitment of RNAPII and pause-release at genes with a positioned +1 nucleosome enriched in H3K4me3. However, it is unclear whether these observations were due to the specific loss of INTS11 or the destabilization of the entire catalytic module. We wondered whether a catalytic-mutant INTS11 would generate a transcriptional response similar to the loss of INTS11. Therefore, we generated stable HeLa cells that express wild-type INTS11 (WT INTS11 shRNA) or the catalytic mutant E203Q-

INTS11 (E203Q INTS11 shRNA) in the shINTS11 background from a sequence refractory to the shRNA. We carefully selected clones whose exogenous expression was close to the dosage of endogenous INTS11 and performed PRO- seq.

6.2. RESULTS

6.2.1. Validation of endogenous INTS11 depletion and shRNA-resistant expression of WT-INTS11 and E203Q-INTS11

Few studies have examined the direct contribution of INTS11 catalytic activity to gene expression and Integrator function. Accordingly, we depleted

INTS11 and performed rescue experiments by ectopically expressing wild type

INTS11 or its catalytic mutant E203Q from a sequence resistant to the shRNA. In addition, these exogenously expressed proteins contain a Flag tag (DYKDDDDK) in their N-terminus (Figure 6.2.1. A). Since a Flag tag adds negative charge and approximately 3 kDa of mass to a protein, this results in a slower migration rate 65 during polyacrylamide gel electrophoresis. The resulting western blot will then display two bands, a higher one representing the exogenously expressed protein and a lower one consisting of the endogenous protein.

Figure 6.2.1. Validation of endogenous INTS11 depletion and shRNA-resistant expression of WT- INTS11 and E203Q-INTS11. (A) Immunoblot of INTS11 and Lamin A without (-Dox) and with (+Dox) shRNA induction in cells expressing WT-INTS11 and E203Q-INTS11. The upper band denotes stable expression of Flag-tagged WT-INTS11 or E203Q-INTS11 from a sequence refractory to INTS11 shRNA.

6.2.2. INTS11 catalysis facilitates transcriptional elongation

To address the role of the INTS11 endonuclease activity in transcriptional regulation, we next performed PRO-seq after rescuing INTS11-depleted HeLa cells with WT-INTS11 or E203Q-INTS11. Exogenous expression of WT-INTS11 rescued the changes incurred by INTS11 depletion at all 8000 genes as seen by the average profiles and the TR (Figure 6.2.2. A and B). In contrast, rescue with

INTS11-E203Q produced a similar TR and average profile to that of INTS11 depletion, indicating that defective RNA processing augments RNAPII pausing and impairs transcriptional elongation (Figure 6.2.2. C and D, Figure 4.2.2. B and D).

Notably, the summit of the PRO-seq signal was also shifted towards the 3’-end of the average profile, much like the average profile of INTS11-depleted cells.

66

A B

C D

Figure 6.2.2. INTS11 catalysis facilitates transcriptional elongation (A) PRO-seq TR at all expressed genes (n = 8000) in INTS11 shRNA cells expressing WT-INTS11 without (Blue, -Dox) and with (Red, +Dox) shRNA induction. (B) PRO-seq promoter proximal read density and gene body profile at all expressed genes (n = 8000) in INTS11 shRNA cells expressing WT-INTS11 without (Blue, -Dox) and with (Red, +Dox) shRNA induction. PRO-seq promoter profiles are centered on the +1 nucleosome dyad. (C) PRO-seq TR at all expressed genes (n = 8000) in INTS11 shRNA cells expressing E203Q-INTS11 without (Blue, -Dox) and with (Red, +Dox) shRNA induction. (D) PRO-seq promoter proximal read density and gene body profile at all expressed genes (n = 8000) in INTS11 shRNA cells expressing E203Q-INTS11 without (Blue, -Dox) and with (Red, +Dox) shRNA induction. PRO-seq promoter profiles are centered on the +1 nucleosome dyad.

6.2.3. INTS11 catalysis is required for RNAPII pause-release but not RNAPII recruitment

Given the similarities of the average profiles and TR between INTS11 depleted cells and E203Q-expressing cells, we next asked whether transcriptional behavior was conserved at the four classes of genes. Examination of the average profiles at the four gene classes revealed that at class II and IV genes, INTS11 catalysis is required for the release of paused-RNAPII into productive elongation

(Figure 6.2.3. B and D). Class III and IV genes highlighted the requirement of

INTS11 catalysis for transcriptional elongation as the reduction in gene body read density was not restored by E203Q-INTS11 expression (Figure 6.2.3. D). 67

Strikingly, the impaired recruitment of RNAPII to class I and III genes was rescued by E203Q-INTS11 expression, thus delineating a catalytic-independent role of

Integrator in the recruitment of RNAPII at these genes (Figure 6.2.3 B and D).

A B

C D

E F

G H

Figure 6.2.3. INTS11 catalysis is required for RNAPII pause-release but not RNAPII recruitment (A and B) PRO-seq promoter proximal read density and gene body profile at Class I and Class II genes in (A) INTS11 shRNA cells expressing WT-INTS11 and (B) INTS11 shRNA cells expressing E203Q-INTS11 without (Class I: Orange, Class II: Blue, -Dox) and with (Class I: Grey, Class II: Red, +Dox) shRNA induction. KS test p-value < 0.0001 for Class II (–Dox vs +Dox) promoter profile and KS test p-value < 0.05 and 0.01 for Class I and Class II (–Dox vs +Dox) gene body profiles, respectively. (C and D) PRO- seq promoter proximal read density and gene body profile at Class III and Class IV genes in (C) INTS11 shRNA cells expressing WT-INTS11 and (D) INTS11 shRNA cells expressing E203Q-INTS11 without (Class I: Orange, Class II: Blue, -Dox) and with (Class I: Grey, Class II: Red, +Dox) shRNA induction. KS test p-value < 0.0001 for Class IV (–Dox vs +Dox) promoter profile and KS test p-value < 0.001 for Class III and Class IV (–Dox vs +Dox) gene body profiles. PRO-seq promoter profiles are centered on the +1 nucleosome dyad. (E-H) PRO-Seq TR of (E) Class I genes, (F) Class II genes, (G) Class III genes, and (H) Class IV genes in INTS11 shRNA cells expressing WT-INTS11 and INTS11 shRNA cells expressing E203Q-INTS11 without (Blue, -Dox) and with (Red, +Dox) shRNA induction. KS test; * = p-value < 0.05, **= p-value < 0.01 and *** = p-value < 0.001

68

At class III genes, while E203Q-INTS11 expression corrected RNAPII recruitment defects, the reduction in gene body read density persisted, affirming the essentiality of INTS11 catalysis in the transition of RNAPII to productive elongation

(Figure 6.2.3. D). Notably, the read density in the bodies of the class I and II genes increased to a lesser extent compared to that of INTS11 depletion (Figure 5.2.1. B vs. Figure 6.2.3. B).

6.3. SUMMARY

We demonstrate here that depletion of INTS11 and reexpression of E203Q-

INTS11 augmented RNAPII pausing and impaired elongation at 8000 genes, thus phenocopying the knockdown of endogenous INTS11. Importantly, rescue with

WT-INTS11 restored all transcriptional defects incurred by the loss of INTS11. At the four classes of genes, the catalytic activity of INTS11 was required for RNAPII pause-release and promoting the transition to productive elongation but not

RNAPII recruitment. These results elucidate catalytic-dependent and independent functions of Integrator in the regulation of transcription at genes with a well- positioned +1 nucleosome.

CHAPTER 7: INTS11 CATALYZES THE PREMATURE TERMINATION OF SMALL RNA ASSOCIATED WITH PAUSED RNAPII

69 70

7.1. APPROACH

Our previous experiments demonstrate that in the proximity of the +1 nucleosome, the catalytic activity of INTS11 is required for the release of paused

RNAPII into productive elongation. We surmised that nascent transcripts protruding from the active site of RNAPII undergo endonucleolytic cleavage by

INTS11 and prematurely terminate, allowing for new rounds of initiation (Erickson et al., 2018; Shao and Zeitlinger, 2017; Steurer et al., 2018). As RNAPII transcribes

~60-120 nt before pausing, we analyzed these nascent mRNA by performing small

RNA (smRNA) sequencing from samples enriched in nuclear RNA. This purification is critical, as whole-cell smRNA does not provide enough sequencing coverage for analysis. This is likely due to the high levels of cytoplasmic microRNA and tRNA consuming sequencing reads (data not shown).

7.2. RESULTS

7.2.1. INTS11 catalysis has a limited role in the stability of RNAPII- associated smRNA

We first assessed where smRNA map along genes. We detected an enrichment of smRNA in the 5’-end of all genes and subsequently analyzed the reads that mapped to the first 150 nt from the TSS (Figure 7.2.1. A). The presence of smRNA was observed at all classes of genes, indicating that the generation of

RNAPII-associated smRNA is pervasive throughout the genome (Figure 7.2.1. B and C, E and F). This is in line with previous studies that showed the production of smRNA was not exclusive to genes with paused-RNAPII, despite their enrichment 71 at paused-sites (Nachaev, Taft). We found that the amount of smRNA at class II and IV genes was unchanged in all conditions; while at class I and III genes, the level of smRNA in INTS11-depleted cells was reduced due to impaired RNAPII recruitment (Figure 7.2.1. B and C, E and F).

We previously established that depletion of INTS11 or overexpression of

E203Q-INTS11 compromises UsnRNA termination, resulting in RNAPII read- through and transcript lengthening (Baillat et al., 2005). Interestingly, analysis of smRNA read length illuminated the extension of transcripts in E203Q-INTS11 cells at baseline, suggesting that E203Q-INTS11 exerted a dominant-negative affect on

RNA processing prior to the depletion of endogenous INTS11 (Figure 7.2.1. D).

A B C D

E F

Figure 7.2.1. INTS11 catalysis has a limited role in the stability of RNAPII-associated smRNA. (A) smRNA whole gene profile of all expressed genes (n = 8000) in INTS11 shRNA cells. (B and C) Box plot

of log2(+Dox/-Dox) smRNA CPM at (B) Class IV and (C) Class III genes. t-test; *** = p-value < 0.001 (D) Box plot of smRNA read length at all genes in INTS11 shRNA cells expressing WT-INTS11 (Orange) and INTS11 shRNA cells expressing E203Q-INTS11 (Gray). t-test; *** = p-value < 0.001. (E and F) Box plot of log (+Dox/-Dox) smRNA CPM at (A) Class II and (B) Class I genes. 2

72

7.2.2. INTS11 terminates smRNA associated with paused RNAPII

Evidence from the aforementioned studies implicate that defects in premature termination would be accentuated at class II and IV genes, as they display pronounced stalling of RNAPII in the absence of INTS11 or its catalytic activity (Nechaev et al., 2010; Taft et al., 2009). Indeed, the mean read length of smRNA at these classes increased following depletion of INTS11, consistent with the augmented pausing of RNAPII and its 3’-shift towards the center of the +1 nucleosome dyad (Figure 7.2.2. A, Figure 7.2.2. E, Figure 5.2.1. B and D). At class

III genes, knockdown of INTS11 generated shorter smRNA extensions compared to that of class II and IV genes, owing to the reduction in RNAPII recruitment

(Figure 7.2.2. B, see Figure 5.2.1. B and D). Importantly, ectopic expression of

WT-INTS11 rescued the lengthening of smRNA transcripts at all classes, while expression of E203Q-INTS11 failed to do so (Figure 7.2.2 C and D, G and H). We noted that the magnitude of smRNA lengthening was subdued by the dominant- negative activity of E203Q-INTS11, however, the importance of INTS11 catalytic activity in smRNA processing was underscored at all gene classes (7.2.2. A-H).

73

A B

C D

E F

G H

Figure 7.2.2. INTS11 terminates smRNA associated with paused RNAPII. (A and B) smRNA mean size distribution centered on the TSSs of (A) Class IV and (B) Class III genes in Control shRNA cells and INTS11 shRNA cells without (Blue, -Dox) and with (Red, +Dox) shRNA induction. (C and D) smRNA mean size distribution centered on the TSSs of (C) Class IV and (D) Class III genes in INTS11 shRNA cells expressing WT-INTS11 and INTS11 shRNA cells expressing E203Q-INTS11 without (Blue, -Dox) and with (Red, +Dox) shRNA induction. (E and F) smRNA mean size distribution centered on the TSSs of (E) Class II and (F) Class I genes in Control shRNA cells and INTS11 shRNA cells without (Blue, -Dox) and with (Red, +Dox) shRNA induction. (G and H) smRNA mean size distribution centered on the TSSs of (G) Class II and (H) Class I genes in INTS11 shRNA cells expressing WT-INTS11 and INTS11 shRNA cells expressing E203Q-INTS11 without (Blue, -Dox) and with (Red, +Dox) shRNA induction.

7.2.3. INTS11 generates a 21-nucleotide smRNA product of catalysis

Further dissection of smRNA by the read length distribution revealed two populations of transcripts that originate from TSSs with average sizes of 21 and

41 nt, respectively (Figure 7.2.3. A-H). 74

A B

C D

E F I J

G H

Figure 7.2.3. INTS11 generates a 21-nucleotide smRNA product of catalysis. (A-D) Two-dimensional depiction of smRNA distance from TSS vs. read length in (A) Control shRNA cells, (B) INTS11 shRNA cells, (C) INTS11 shRNA cells expressing WT-INTS11 and (D) INTS11 shRNA cells expressing E203Q- INTS11 without (-Dox) and with (+Dox) shRNA induction. The magenta dotted line partitions smRNAs into two populations of shorter and longer than 25 nucleotides. The yellow dotted lines serve as visual placeholders. (E-H) smRNA read length distribution at all genes in (E) Control shRNA cells, (F) INTS11 shRNA cells, (G) INTS11 shRNA cells expressing WT-INTS11 and (H) INTS11 shRNA cells expressing E203Q-INTS11 without (Blue, -Dox) and with (Red, +Dox) shRNA induction. smRNA examples of Class IV genes (I) SEL1L and (J) MED4 in Control shRNA cells, INTS11 shRNA cells, INTS11 shRNA cells expressing WT-INTS11 and INTS11 shRNA cells expressing E203Q-INTS11 without (-Dox) and with (+Dox) shRNA induction.

After INTS11 knockdown, we observed an accumulation and extension of the 41 nt smRNA population concomitant with a depletion of the 21 nt smRNA population (Figure 7.2.3. B and F). This processing defect was rescued by WT-

INTS11 yet persisted in cells expressing E203Q-INTS11, indicating that the 75 population of 21 nt transcripts correspond to the cleavage product of INTS11 catalysis (Figure 7.2.3. C and D, G and H). As these populations map to TSSs, visualizing these separate populations on the genome browser poses difficulties.

Nonetheless, the increased average length of smRNA was well exemplified by the class IV genes SEL1L and MED4 (Figure 7.2.3. M and N).

7.3. SUMMARY

We show here that our methodology of smRNA purification and sequencing enriched nascent transcripts associated with paused RNAPII at the 5’-end of genes. In addition, we demonstrate that INTS11 catalysis does not destabilize smRNA, as the levels were unchanged at all gene classes in cells expressing

E203Q-INTS11. Despite exerting dominant-negative activity on smRNA processing, catalytic-mutant INTS11 impaired smRNA processing at all classes of genes, while class IV genes were most affected. Taken together, these data demonstrate a role for Integrator and the catalytic activity of INTS11 in the termination of RNAPII-associated nascent transcripts at genes with high levels of paused RNAPII.

CHAPTER 8: INTEGRATOR REGULATES ENHANCER RNA TRANSCRIPTIONAL TERMINATION

76 77

8.1. APPROACH

We asked whether our findings at promoters were generalizable to enhancer regions genome-wide given the known requirement for Integrator in the biogenesis and 3’-end processing of EGF-responsive eRNA. In 2012, the

ENCODE consortium identified around 400,000 putative enhancers in human cell lines based on their chromatin state (Calo and Wysocka, 2013). The presence of intergenic H3K4me1 or its ratio to H3K4me3 is often used for defining enhancer regions, however, the identification of active enhancers requires a transcriptional readout. Therefore, to understand the activity of Integrator at enhancers, we first established the bioinformatic criteria for identifying transcriptionally active enhancers. Next, we assessed RNAPII occupancy and dynamics by ChIP-seq and

PRO-seq average profiles, TR, and enhancer TM.

8.2. RESULTS

8.2.1. Integrator recruits RNAPII to enhancers and terminates eRNA transcription but is dispensable for RNAPII pause-release

We evaluated RNAPII profiles at 2293 enhancers by finding common intergenic H3K27ac and H3K4me1 peaks and excluded regions that overlap within

2kb of an annotated TSS. We then centered the reads of the bidirectional eRNA transcripts on the maximum PRO-seq signal (Figure 8.2.1. A, see Methods).

Consistent with previous observations at epidermal growth factor-responsive enhancers, depletion of INTS11 produced a global defect in eRNA termination, as demonstrated by the extension of RNAPII reads and accumulation of transcripts 78 beyond their canonical 3’-end (Figure 8.2.1. B-F). This was particularly evident in cells expressing E203Q-INTS11 following the knockdown of INTS11 (compare

Figure 8.2.1. D and F).

A B

C D E F

G H I J

Figure 8.2.1. Integrator recruits RNAPII to enhancers and terminates eRNA transcription but is dispensable for RNAPII pause release. (A) Enhancer analysis schematic. (B) RNAPII ChIP-seq profile at enhancers (n = 2293) without (Blue, -Dox) and with (Red, +Dox) INTS11 shRNA induction. (C-F) PRO- seq enhancer profile (n = 2293) in (C) Control shRNA cells (D) INTS11 shRNA cells (E) INTS11 shRNA cells expressing WT-INTS11 and (F) INTS11 shRNA cells expressing E203Q-INTS11 without (Blue, - Dox) and with (Red, +Dox) shRNA induction. (G-J) PRO-seq enhancer TSS profile in (G) Control shRNA cells (H) INTS11 shRNA cells (I) INTS11 shRNA cells expressing WT-INTS11 and (J) INTS11 shRNA cells expressing E203Q-INTS11 without (Blue, -Dox) and with (Red, +Dox) shRNA induction. KS test p- value < 0.001 for all conditions (–Dox vs +Dox).

We then centered PRO-seq reads on the +1 nucleosome as promoters and enhancers share remarkably conserved transcriptional architecture, including well- positioned +1 and -1 nucleosomes (Core et al., 2014). Knockdown of INTS11 also decreased recruitment of RNAPII to enhancers, similar to that observed at class I and III promoters (Figure 8.2.1. G and H, Figure 5.2.1. B and D). Strikingly, 79 expression of both WT-INTS11 and E203Q-INTS11 restored defects in RNAPII occupancy incurred by the loss of INTS11, further attesting to the catalytic- independent role of Integrator in the recruitment of RNAPII (Figure 8.2.1. I and J).

8.2.2. The enhancer traveling matrix conveys the role of Integrator in transcriptional termination beyond the +1 nucleosome

We next determined the TR and TM at enhancers (see method). In agreement with the PRO-seq average profiles, the TR at enhancers displayed augmented transcriptional elongation resulting from decreased RNAPII density at the TSS and an accumulation of reads within enhancer bodies (Figure 8.2.2. A-D).

Strikingly, the TM at enhancers was entirely different to that of promoters (Figure

8.2.2. E and F, Figure 4.2.3. B and C). While INTS11 depletion led to reduced reads in the bodies of most genes as shown in Figure 4.2.3 C, enhancers displayed an augmentation of reads in their bodies well beyond the +1 nucleosome, reflecting impaired termination (Figure 8.2.1. D and Figure 8.2.2. F). The differential TM at enhancers also depicted a pattern considerably different from the differential TM at promoters (compare Figure 4.2.3. D and E vs. Figure 8.2.2. G and H). Notably, only six enhancers showed increased RNAPII-stalling after INTS11 knockdown, suggesting that Integrator plays little role in RNAPII pause-release at enhancers. 80

A B C D

E G H

F

Figure 8.2.2. The enhancer traveling matrix conveys the role of Integrator in transcriptional termination beyond the +1 nucleosome. (A-D) Enhancer TR in (A) Control shRNA cells (B) INTS11 shRNA cells (C) INTS11 shRNA cells expressing WT-INTS11 and (D) INTS11 shRNA cells expressing E203Q-INTS11 without (Blue, -Dox) and with (Red, +Dox) shRNA induction. KS test; *** = p-value < 0.001. (E and F) Enhancer TM of (E) Control shRNA cells and (F) INTS11 shRNA cells. (G) Differential enhancer TM of INTS11 shRNA cells. (H) Dotted enhancer differential TM of INTS11 shRNA cells.

8.3. SUMMARY

Taken together, these data indicate that Integrator is required for the global recruitment of RNAPII to enhancers and promoters independent of its catalytic activity. At promoters, INTS11 catalysis facilitates transcriptional elongation through the premature termination of nascent mRNA adjacent to the +1 nucleosome. While at enhancers, the endonuclease activity of INTS11 is needed strictly for the termination of eRNA transcripts beyond the +1 nucleosome.

CHAPTER 9: PAUSING FACTORS AND RNAPII PHOSPHORYLATION CONVEY INTEGRATOR FUNCTIONS AT GENES AND ENHANCERS

81 82

9.1. APPROACH

Our previous results stipulate that the Integrator complex is required for the genome-wide recruitment of RNAPII, independent of its catalytic activity. At promoters, the endonuclease activity of INTS11 is needed for facilitating transcriptional elongation through the cleavage of nascent transcripts associated with paused RNAPII. We wondered whether these functions of Integrator are conveyed in the occupancy of factors known to be involved in RNAPII pause- release, such as NELF and SPT5. Given the association of Integrator with the

RNAPII CTD, we also sought to understand how Integrator activity is transmitted through the differential phosphorylation of serine-5 (RNAPII-S5), serine-2

(RNAPII-S2), and tyrosine-1 (RNAPII-Y1) residues. To address these questions, we performed ChIP-seq using antibodies against NELF-E, SPT5, RNAPII-S5,

RNAPII-S2, RNAPII-Y1 and analyzed their occupancy at the four classes of genes and enhancers regions. In addition, we also performed stochastic optical reconstruction microscopy (STORM) to visually appreciate the association of

Integrator and RNAPII-Y1, which was previously identified by mass spectroscopy.

9.2. RESULTS

9.2.1. NELF, SPT5 and tyrosine-1 phosphorylated RNAPII affirm Integrator activity at genes and enhancers

We analyzed factors associated with RNAPII pause-release at promoters and enhancers following depletion of INTS11. In agreement with the increased

RNAPII pausing observed at promoters of class II and IV genes, knockdown of 83

INTS11 increased occupancy of the NELF subunit, NELF-E, and the DSIF subunit,

SPT5 (Figure 9.2.1. A and B). At enhancers, NELF-E levels were reduced, in line with decreased RNAPII recruitment (Figure 9.2.1. D). Moreover, while SPT5 levels were unchanged around eRNA start sites, there was increased occupancy in enhancer bodies, reflecting its role as an elongation factor upon phosphorylation by P-TEFb (Figure 9.2.1 E) (Yamada et al., 2006).We evaluated levels of tyrosine-

1 phosphorylated-RNAPII (RNAPII-Y1), which was shown to be required for transcriptional termination and functionally associated with Integrator by mass spectroscopy (Mayer et al., 2012; Shah et al., 2018). After INTS11 depletion, elevated levels of RNAPII-Y1 were found at enhancers and at promoters of class

II and IV genes, consistent with the pronounced defects in RNA processing at these sites (Figure 9.2.1. C and F, Figures 7.2.2., 7.2.3. and 8.2.1.). In contrast, promoters of class I and III genes were reduced in RNAPII-Y1 occupancy due to impaired recruitment of RNAPII (Figure 9.2.1. C). This was further confirmed through analysis of serine-5 phosphorylated RNAPII (RNAPII-S5), which was similarly diminished at these classes (Figure 9.2.1. G). Moreover, occupancy of serine-2 phosphorylated RNAPII (RNAPII-S2) closely mirrored the PRO-seq gene body profiles at the four classes of genes (Figure 9.2.1. H). 84

A B C

D E F

G H

Figure 9.2.1. NELF, SPT5 and tyrosine-1 phosphorylated RNAPII affirm Integrator activity at genes and enhancers. (A) NELF-E ChIP-seq promoter profile at all gene classes without (Class I & III: Orange, Class II & IV: Blue, -Dox) and with (Class I & III: Grey, Class II & IV: Red, +Dox) shRNA induction in INTS11 shRNA cells. KS test p-value < 0.0001 for all gene classes (–Dox vs +Dox). (B) SPT5 ChIP-seq promoter profile at all gene classes without (Class I & III: Orange, Class II & IV: Blue, -Dox) and with (Class I & III: Grey, Class II & IV: Red, +Dox) shRNA induction in INTS11 shRNA cells. KS test p-value < 0.0001 for all gene classes (–Dox vs +Dox). (C) RNAPII-Y1 ChIP-seq promoter profile at all gene classes without (Class I & III: Orange, Class II & IV: Blue, -Dox) and with (Class I & III: Grey, Class II & IV: Red, +Dox) shRNA induction in INTS11 shRNA cells. KS test p-value < 0.0001 for all gene classes (–Dox vs +Dox). (D) NELF-E ChIP-seq enhancer profile without (Blue, -Dox) and with (Red, +Dox) shRNA induction in INTS11 shRNA cells. (E) SPT5 ChIP-seq enhancer profile without (Blue, -Dox) and with (Red, +Dox) shRNA induction in INTS11 shRNA cells. (F) RNAPII-Y1 ChIP-seq enhancer profile without (Blue, -Dox) and with (Red, +Dox) shRNA induction in INTS11 shRNA cells. (G) RNAPII-S5 ChIP-seq promoter profile at all gene classes without (Class I & III: Orange, Class II & IV: Blue, -Dox) and with (Class I & III: Grey, Class II & IV: Red, +Dox) shRNA induction in INTS11 shRNA cells. KS test p-value < 0.0001 for Class III (–Dox vs +Dox). (H) RNAPII-S2 ChIP-seq promoter profile at all gene classes without (Class I & III: Orange, Class II & IV: Blue, -Dox) and with (Class I & III: Grey, Class II & IV: Red, +Dox) shRNA induction in INTS11 shRNA cells. KS test p-value < 0.0001 for all gene classes (–Dox vs +Dox).

85

9.2.2. Integrator localizes in clusters with tyrosine-1 phosphorylated RNAPII

To ascertain whether these results could be visualized in cells, we performed STORM (stochastic optical reconstruction microscopy) imaging of

INTS1 and RNAPII-Y1, which revealed highly colocalized signals in large clusters throughout the nucleus (Figure 9.2.2 A and C). Strikingly, depletion of INTS11 significantly increased the density of these clusters (Figure 9.2.2 B and C), in agreement with the elevated ChIP-seq profiles of RNAPII-Y1 at paused genes and enhancers.

A B

C D

Figure 9.2.2. Integrator localizes in clusters with tyrosine-1 phosphorylated RNAPII. (A) STORM images of INTS1, RNAPII-Y1 and a merge without (-Dox) and with (+Dox) shRNA induction in INTS11 shRNA cells. (B) Quantification of RNAPII-Y1 cluster density without (-Dox) and with (+Dox) shRNA induction in INTS11 shRNA cells (n = 8, ** = p<0.01). (C) STORM images of INTS1, RNAPII-Y1 and a merge without (-Dox) and with (+Dox) shRNA induction in Control shRNA cells. (D) Quantification of RNAPII-Y1 cluster density without (-Dox) and with (+Dox) shRNA induction in Control shRNA cells (n = 8).

86

9.3. SUMMARY

Here, we demonstrate that the functions of Integrator in transcriptional regulation at genes and enhancers are transmitted through factors that cooperate in these processes, namely RNAPII pause-release and transcriptional termination.

At promoters of genes, NELF, SPT5 and RNAPII-Y1 occupancy closely track the behavior of RNAPII at the four gene classes after depletion of INTS11. At enhancers, where Integrator is strictly required for eRNA termination, NELF and

SPT5 occupancy deviate from one another, while SPT5 follows RNAPII-Y1 beyond the +1 nucleosome. RNAPII-Y1 and INTS1 colocalize in large clusters throughout the nucleus, reflecting the elevated levels of RNAPII-Y1 at Class IV genes and enhancers. In summary, these factors corroborate the role of Integrator catalytic activity in premature termination and RNAPII pause-release at promoters and transcriptional termination at enhancers.

CHAPTER 10: DISCUSSION

87 88

10.1. THE CONTRASTING ROLES OF HUMAN AND DROSOPHILA

INTEGRATOR IN TRANSCRIPTIONAL REGULATION

Recent studies conducted by the Wilusz and Adelman labs posit that the drosophila Integrator complex attenuates gene expression at protein-coding genes and enhancers by catalyzing premature termination of transcripts leading to their degradation by the exosome (Elrod et al., 2019; Tatomer et al., 2019). Despite similarities in the proposed mechanism of action, our work in human cells opposes this contention. In the following section, we contextualize our data in light of these findings and contrast the differences in function and RNAPII association between human and drosophila Integrator. In addition, we highlight the dissimilarity in epigenetic landscape of Integrator-occupied regulatory sites between these two species.

We show here that depletion of Integrator subunits that compose the catalytic module (INTS11, INTS9 and INTS4) lead to alterations in steady-state gene expression, consistent with numerous reports (Gardini et al., 2014; Lai et al.,

2015; Stadelmayer et al., 2014; Yamamoto et al., 2014; Yue et al., 2017) . While recent studies indicate that drosophila Integrator primarily attenuates gene expression, our data demonstrate that Integrator plays a role in the repression and activation of human genes in their steady-state levels (Elrod et al., 2019; Tatomer et al., 2019). For example, we found more than half of 7,425 differentially expressed genes were downregulated after knockdown of human INTS9, while depletion of drosophila INTS9 reduced expression of less than 18% of genes

(Elrod et al., 2019; Tatomer et al., 2019). Interestingly, upregulated gene ontology 89 pathways were the most conserved between these species, suggesting that throughout evolution the scope of Integrator expanded from a set of repressed target genes to include those that require its function for activation.

We previously showed that Integrator is stably associated with the C- terminal domain of RNAPII and ChIP-seq analysis of INTS11 and RNAPII showed highly correlated genomic occupancy at 8000 genes (r = .79). In drosophila, only a 39% correlation was found between INTS1 and PRO-seq signals at 9,499 genes, suggesting differences in Integrator occupancy, RNAPII association, and regulatory function in lower metazoans (Elrod et al., 2019). This difference in

RNAPII association is likely due to the lack of conservation in accessory subunits outside the catalytic module, which is an ongoing subject of investigation in our lab.

Recent genetic analyses of individuals with severe neurodevelopmental abnormalities identified mutations in the human Integrator subunits, namely INTS1 and INTS8 (Krall et al., 2018; Oegema et al., 2017; Zhang et al., 2019). Notably, fibroblasts derived from these patients exhibited defects in gene expression and splicing, including reduced mRNA levels of SOX4, NPTX1, and NTN1. Moreover, introduction of the described INTS8-EVL mutation into mouse P19 cells impaired expression of Hoxa1 and the neuronal markers Tubb3 and Reelin during an in- vitro neuronal differentiation assay. Ultimately these mutations result in profound intellectual disability, epilepsy, spasticity, facial and limb dysmorphism and subtle structural brain abnormalities (Oegema et al., 2017). The gene expression alterations found in cells from these individuals are consistent with our PRO-seq 90 findings in HeLa cells and argue for a role of human Integrator in transcriptional activation.

Class IV genes determined by the TM featured Integrator-dependent

RNAPII pausing, a strongly positioned +1 nucleosome decorated in H3K4me3, gene bodies enriched in H3K36me3, and were most occupied by INTS11 and

RNAPII compared to the other classes. These results contrast the report in drosophila that shows a repressive function of Integrator at lowly expressed genes and its anticorrelation with H3K4me3 and H3K36me3 (Elrod et al., 2019).

Furthermore, we found an inverse correlation of Integrator occupancy and

H3K4me1, also in opposition to this work. Interestingly, high levels of H3K4me1 and low levels of H3K4me3 and H3K36me3 were characteristic of class I genes, which featured an epigenetic landscape most closely attributed to enhancer elements. Notably, this class comprised the fewest Integrator-responsive genes in our study (327 of 3125 genes), which is consistent with a small set of lowly expressed genes in drosophila that display de-repression upon Integrator depletion (Elrod et al., 2019). These data suggest that throughout evolution

Integrator acquired a predominant function in the transcriptional activation of genes that require INTS11 catalysis for pause-release and elongation.

91

10.2. INTEGRATOR MEDIATES THE DYNAMIC TURNOVER OF RNAPII

PROXIMAL TO THE +1 NUCLEOSOME AT GENES AND DISTAL TO THE +1

NUCLEOSOME AT ENHANCERS

We previously showed that Integrator is critical for RNAPII recruitment and pause-release at immediate early genes through its recruitment of the super elongation complex subunits CDK9, AFF4, and ELL (Gardini et al., 2014). More recently, we demonstrated that INTS11 is a phosphorylation target of active MAP kinase signaling, thus linking the transcriptional function of Integrator with upstream growth factor signaling (Yue et al., 2017). We also found that EGF- responsive enhancers require Integrator for generating and terminating eRNA, indicating that Integrator coordinates the transcriptional dynamics at the key regulatory sites for immediate early genes (Lai et al., 2015). These studies utilized shRNA depletion of INTS11, but here, we sought to understand the genome-wide role of the Integrator catalysis in RNAPII recruitment and pause-release at promoters, and eRNA termination. Moreover, we aimed to utilize improved methodologies like PRO-seq as our previous studies lacked the sequencing resolution to fully appreciate the activity of Integrator at these regulatory regions.

To address the direct contribution of Integrator to transcription at promoters and enhancers in human cells, we depleted INTS11 and performed PRO-seq. We utilized a two-dimensional traveling matrix (TM) to project the positional changes in RNAPII at promoters and gene bodies incurred by the loss of INTS11. Strikingly, more than two-thirds of differentially regulated genes exhibited a loss of RNAPII signal in gene bodies, indicating that Integrator promotes transcriptional elongation 92 in human cells. Half of these genes displayed deficient RNAPII recruitment whereas the other half showed increased RNAPII pausing. This reduction in elongating RNAPII was evident in the steady-state levels of RNA, which were largely diminished in these classes. Critically, ectopic expression of E203Q-

INTS11 restored recruitment of RNAPII, while defects in pause-release persisted.

We note that the studies performed by the Adelman and Wilusz labs did not assess

RNAPII pausing by traveling ratio or the like. Interestingly, the former publication showed that genes unchanged in expression following INTS9 knockdown (as defined by their gene body PRO-seq signal) were elevated in PRO-seq signal at promoters. Since this increase in promoter signal was not accompanied by a proportional increase in gene body signal, these genes actually display impaired

RNAPII pause-release, and a traveling ratio would drastically alter the interpretation of the data (Elrod et al., 2019). This emphasizes the importance in performing a unified analysis of promoters and gene bodies, which is accomplished by the traveling matrix in our study. Through the utilization of catalytic-dead INTS11 in conjunction with the TM we illuminated the mechanistically distinct roles of Integrator in the recruitment of RNAPII and its catalytic activity in the transition of RNAPII to productive elongation.

We found that loss of INTS11 or ectopic expression of E203Q-INTS11 resulted not only in an accumulation of paused RNAPII but also a 3’-shift of RNAPII peak toward the center of the +1 nucleosome dyad. Remarkably, smRNA transcripts concomitantly lengthened over this region, demonstrating the requirement of INTS11 catalysis for their termination. Importantly, the levels of 93 smRNA were unchanged at all gene classes in cells expressing E203Q-INTS11, which argues against a role for Integrator in regulating smRNA stability. Closer examination of smRNA revealed two populations that emanate from TSSs: a 21 nt cleavage product that reduced upon loss of INTS11 or its catalytic activity and another 41 nt population that extended as a consequence of defective processing.

While the report in drosophila from the Wilusz group attributes numerous promoter- associated transcripts greater than 80 nt as Integrator-dependent, the smRNA in the present study is consistent with the length reported in previous literature and demonstrated conditional size changes that mechanistically reflect defective endonucleolytic RNA processing (Kapranov et al., 2007; Nechaev et al., 2010; Taft et al., 2009; Tatomer et al., 2019). These results indicate that INTS11 terminates nascent transcripts to rapidly evict paused RNAPII from the chromatin. This allows for the subsequent recruitment of an elongation-competent RNAPII complex to traverse the +1 nucleosome and elongate into the body of genes (Figure 10.2.1)

(Erickson et al., 2018; Krebs et al., 2017; Shao and Zeitlinger, 2017; Steurer et al.,

2018). 94

The prevailing model of RNAPII pausing stipulates that RNAPII residence time can range from minutes to hours, depending on the gene and method of analysis (Core and Adelman, 2019). However, recent work has challenged this model. Jurgen Marteijn’s group performed fluorescence recovery after photobleaching (FRAP) in cells expressing GFP-labeled RNAPII and found that on average, the RNAPII pausing time was approximately 42 seconds, and only 10% of those polymerases entered into productive elongation (Steurer et al., 2018).

Indeed, studies from the Bentley and Schubeler labs have also confirmed that

RNAPII undergoes dynamic cycles of initiation, pausing, premature termination and eviction prior to the establishment of a RNAPII complex that transitions to productive elongation (Erickson et al., 2018; Krebs et al., 2017). In agreement, recent work from the Zeitlinger group showed that paused-RNAPII complexes 95 must be evicted in order to initiate subsequent rounds of transcription (Shao and

Zeitlinger, 2017). Our data indicate that Integrator function is central to this contemporary model of RNAPII pausing, as we demonstrate its catalytic- independent recruitment of RNAPII and catalytic-dependent premature termination of nascent transcripts associated with paused RNAPII. We propose that at human genes, Integrator is key for mediating dynamic turnover of RNAPII through these dual functions, thus balancing the cycles of initiation and early termination events to promote productive elongation of RNAPII.

Genome-wide studies have found numerous commonalities between two classes of transcriptional regulatory elements, promoters and enhancers (Kim and

Shiekhattar, 2015). Examination of these distinct loci found architectural commonalities including DNA sequence, transcription factor occupancy, and nucleosome deposition. We wondered whether the functions of Integrator are also conserved among these regions. Consistent with our observations at promoters, enhancers also demonstrate a catalytic-independent requirement of Integrator in

RNAPII recruitment. However, it was proposed that RNAPII pausing at enhancers is more transient compared to promoters as evidenced by a lower RNAPII traveling ratio and increased susceptibility to P-TEFb inhibition (Henriques et al., 2018). In support, the enhancer TM revealed that Integrator has a limited contribution to

RNAPII pause-release at these regulatory elements. At promoters of human genes, the enzymatic activity of INTS11 is needed to promote elongation through the dynamic turnover of paused RNAPII close to the +1 nucleosome. While at enhancers, the catalytic activity of INTS11 is strictly utilized for eRNA termination 96 beyond the +1 nucleosome and plays little role in promoting pause-release and elongation. Collectively, these data indicate that Integrator catalyzes the termination of transcripts during distinct stages of transcription at human enhancers and promoters.

ChIP-seq studies examining occupancy of the pause factors NELF-E and

SPT5 corroborate the role of Integrator in mediating recruitment of RNAPII and facilitating pause-release at promoters. Moreover, the increased recruitment of pause factors was unique to promoters and was not seen at enhancers. The reduced occupancy of RNAPII-S5 at class III genes substantiates the role of

Integrator in RNAPII recruitment, while its unchanged levels at class IV genes signifies that the accumulation of PRO-seq signal at promoters was due to impaired pause-release and not increased initiation. Importantly, the elevation in

RNAPII-Y1 occupancy at paused genes and enhancers highlight the requirement for Integrator in the termination of transcripts at these sites. At promoter-proximal sites, Integrator induces the premature termination of nascent transcripts, while at enhancers Integrator processes the 3’-end of enhancer RNAs.

10.3 FUTURE PERSPECTIVES

Our work demonstrates that the Integrator complex is critical for the recruitment of RNAPII to human promoters and enhancers. The catalytic activity of Integrator ultimately facilitates transcriptional elongation at promoters while it terminates productive elongation at enhancers. Our efforts have led to a more detailed understanding of Integrator function, but more work is needed to address 97 a number of outstanding questions. First, what are the determinants for recruitment and stabilization of an elongation-competent RNAPII complex? In other words, what are the factors or conditions necessary for a polymerase complex to escape premature termination and proceed to productive elongation? Since only ~1% of

5’-bound RNAPII complexes transpire to productive elongation, uncovering the mechanism(s) underlying this extremely rare event are paramount for understanding this ubiquitous and highly regulated process (Darzacq et al., 2007;

Steurer et al., 2018).

Another interesting question concerns how exactly RNAPII is evicted from the chromatin following cleavage of its associated nascent transcript. One possibility is that after Integrator cleaves RNA through its endonuclease activity, the 5’ exonuclease XRN2 degrades the transcript from its 5’-end. This would suggest a mechanism similar to the torpedo model of termination at the 3’-end of protein-coding genes, where XRN2 continuously degrades the downstream transcript until it catches up with RNAPII and triggers its release by altering the confirmation of its active site (Proudfoot, 2016). As CPSF73 has also been shown to possess exonuclease activity, another scenario could be that Integrator itself initially cleaves transcripts using its endonuclease activity and subsequently degrades nascent mRNA through exonuclease activity (Yang et al., 2009). Despite its homology to CPSF73, INT11 exonuclease activity has not been demonstrated and this possibility is only speculative. We must also address which Integrator subunit or subunits interact directly with the RNAPII CTD. This understanding will 98 undoubtedly lead to a better appreciation of the spatial dynamics of the complex as it participates in transcriptional functions.

Finally, and most critically, what is the contribution of Integrator subunits outside of the catalytic module to the functionality of Integrator and how does the molecular activity of these subunits manifest biologically? For example, the Abdel-

Wahab lab recently showed that mutations in IDH2 and SRSF2 cooperate in leukemogenesis through the missplicing of INTS3, leading to its downregulation and perturbations in gene expression. Strikingly, ectopic expression of INTS3 in xenografts of IDH2 and SRSF2 double mutant HL-60 cells induced myeloid differentiation and slowed leukemia progression in-vivo. The study also found that

INTS3 missplicing is also found in other molecular subtypes of acute myeloid leukemia, suggesting that aberrant splicing of INTS3 directly contributes to leukemogenesis (Yoshimi et al., 2019). As INTS3 and INTS6 associate with hSSB1 and C9orf80 to form the SOSS (sensor of ssDNA) complex, these data link the activity of Integrator subunits to the DNA damage response and hematopoiesis

(Huang et al., 2009; Skaar et al., 2009; Vidhyasagar et al., 2018; Zhang et al.,

2013). This one example emphasizes the importance in characterizing the molecular functionality of Integrator subunits to better understand their biological output and relevance to pathophysiological conditions.

REFERENCES

Adelman, K., and Lis, J.T. (2012). Promoter-proximal pausing of RNA polymerase II: emerging roles in metazoans. Nature Reviews Genetics 13, 720–731.

Albrecht, T.R., Shevtsov, S.P., Wu, Y., Mascibroda, L.G., Peart, N.J., Huang, K.- L., Sawyer, I.A., Tong, L., Dundr, M., and Wagner, E.J. (2018). Integrator subunit 4 is a “symplekin-like” scaffold that associates with INTS9/11 to form the Integrator cleavage module. Nucleic Acids Research 46, 4241–4255.

Almada, A.E., Wu, X., Kriz, A.J., Burge, C.B., and Sharp, P.A. (2013). Promoter directionality is controlled by U1 snRNP and polyadenylation signals. Nature 499, 360–363.

Baillat, D., and Wagner, E.J. (2015). Integrator: Surprisingly diverse functions in gene expression. Trends in Biochemical Sciences 40, 257–264.

Baillat, D., Hakimi, M.-A., Näär, A.M., Shilatifard, A., Cooch, N., and Shiekhattar, R. (2005). Integrator, a multiprotein mediator of small nuclear RNA processing, associates with the c-terminal repeat of RNA polymerase II. Cell 123, 265–276.

Banerji, J., Olson, L., and Schaffner, W. (1983). A lymphocyte-specific cellular enhancer is located downstream of the joining region in immunoglobulin heavy chain genes. Cell 33, 729–740.

Barbieri, E., Trizzino, M., Welsh, S.A., Owens, T.A., Calabretta, B., Carroll, M., Sarma, K., and Gardini, A. (2018). Targeted enhancer activation by a subunit of the integrator complex. Molecular Cell 71, 103–116.e107.

Barra, J., Gaidosh, G.S., Blumenthal, E., Beckedorff, F., Tayari, M.M., Kirstein, N., Karakach, T.K., Jensen, T.H., Impens, F., Gevaert, K., et al. (2020). Integrator restrains paraspeckles assembly by promoting isoform switching of the lncRNA NEAT1. Science Advances 6, 1–17.

Bintu, L., Ishibashi, T., Dangkulwanich, M., Wu, Y.-Y., Lubkowska, L., Kashlev, M., and Bustamante, C. (2012). Nucleosomal elements that control the topography of the barrier to transcription. Cell 151, 738–749.

Boeing, S., Rigault, C., Heidemann, M., Eick, D., and Meisterernst, M. (2010). RNA polymerase II c-terminal heptarepeat domain ser-7 phosphorylation is established in a mediator-dependent fashion. Journal of Biological Chemistry 285, 188–196.

Boettiger, A.N., and Levine, M. (2009). Synchronous and stochastic patterns of gene activation in the drosophila embryo. Science 325, 471–473.

Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: A flexible trimmer for illumina sequence data. Bioinformatics 30, 2114–2120.

99 100

Calo, E., and Wysocka, J. (2013). Modification of enhancer chromatin: What, how, and why? Molecular Cell 49, 825–837.

Cazalla, D., Xie, M., and Steitz, J.A. (2011). A primate herpesvirus uses the Integrator complex to generate viral microRNAs. Molecular Cell 43, 982–992.

Chen, C.-Y.A., and Shyu, A.-B. (2017). Emerging themes in regulation of global mRNA turnover in cis. Trends in Biochemical Sciences 42, 16–27.

Chen, F.X., Smith, E.R., and Shilatifard, A. (2018). Born to run: Control of transcription elongation by RNA polymerase II. Nature Reviews Molecular Cell Biology 19, 464–478.

Chen, J., Ezzeddine, N., Waltenspiel, B., Albrecht, T.R., Warren, W.D., Marzluff, W.F., and Wagner, E.J. (2012). An RNAi screen identifies additional members of the drosophila Integrator complex and a requirement for cyclin C/Cdk8 in snRNA 3'-end formation. RNA 18, 2148–2156.

Core, L., and Adelman, K. (2019). Promoter-proximal pausing of RNA polymerase II: A nexus of gene regulation. Genes and Development 33, 960–982.

Core, L.J., Martins, A.L., Danko, C.G., Waters, C.T., Siepel, A., and Lis, J.T. (2014). Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nature Genetics 46, 1311–1320.

Creamer, T.J., Darby, M.M., Jamonnak, N., Schaughency, P., Hao, H., Wheelan, S.J., and Corden, J.L. (2011). Transcriptome-wide binding sites for components of the Saccharomyces cerevisiae non-poly(A) termination pathway: Nrd1, Nab3, and Sen1. PLoS Genetics 7, e1002329.

Creyghton, M.P., Cheng, A.W., Welstead, G.G., Kooistra, T., Carey, B.W., Steine, E.J., Hanna, J., Lodato, M.A., Frampton, G.M., Sharp, P.A., et al. (2010). Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proceedings of the National Academy of Sciences of the United States of America 107, 21931–21936.

Czudnochowski, N., Bösken, C.A., and Geyer, M. (2012). Serine-7 but not serine- 5 phosphorylation primes RNA polymerase II CTD for P-TEFb recognition. Nature Communications 3, 842.

Darzacq, X., Shav-Tal, Y., de Turris, V., Brody, Y., Shenoy, S.M., Phair, R.D., and Singer, R.H. (2007). In vivo dynamics of RNA polymerase II transcription. Nature Structural and Molecular Biology 14, 796–806.

Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T.R. (2013). STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21.

101

Dominski, Z. (2007). Nucleases of the metallo-β-lactamase family and their role in DNA and RNA metabolism. Critical Reviews in Biochemistry and Molecular Biology 42, 67–93.

Dominski, Z., Yang, X.-C., Purdy, M., Wagner, E.J., and Marzluff, W.F. (2005). A CPSF-73 homologue is required for cell cycle progression but not cell growth and interacts with a protein having features of CPSF-100. Molecular Cell Biology 25, 1489–1500.

Ebmeier, C.C., Erickson, B., Allen, B.L., Allen, M.A., Kim, H., Fong, N., Jacobsen, J.R., Liang, K., Shilatifard, A., Dowell, R.D., et al. (2017). Human TFIIH kinase CDK7 regulates transcription-associated chromatin modifications. Cell Reports 20, 1173–1186.

Eddy, J., Vallur, A.C., Varma, S., Liu, H., Reinhold, W.C., Pommier, Y., and Maizels, N. (2011). G4 motifs correlate with promoter-proximal transcriptional pausing in human genes. Nucleic Acids Research 39, 4975–4983.

Egloff, S., O'Reilly, D., Chapman, R.D., Taylor, A., Tanzhaus, K., Pitts, L., Eick, D., and Murphy, S. (2007). Serine-7 of the RNA polymerase II CTD is specifically required for snRNA gene expression. Science 318, 1777–1779.

Egloff, S., Szczepaniak, S.A., Dienstbier, M., Taylor, A., Knight, S., and Murphy, S. (2010). The Integrator complex recognizes a new double mark on the RNA polymerase II carboxyl-terminal domain. Journal of Biological Chemistry 285, 20564–20569.

Egloff, S., Zaborowska, J., Laitem, C., Kiss, T., and Murphy, S. (2012). Ser7 phosphorylation of the CTD recruits the RPAP2 ser5 phosphatase to snRNA genes. Molecular Cell 45, 111–122.

Elrod, N.D., Henriques, T., Huang, K.-L., Tatomer, D.C., Wilusz, J.E., Wagner, E.J., and Adelman, K. (2019). The Integrator complex attenuates promoter- proximal transcription at protein-coding genes. Molecular Cell 76, 738–752.

Erickson, B., Sheridan, R.M., Cortazar, M., and Bentley, D.L. (2018). Dynamic turnover of paused Pol II complexes at human promoters. Genes and Development 32, 1215–1225.

Ezzeddine, N., Chen, J., Waltenspiel, B., Burch, B., Albrecht, T., Zhuo, M., Warren, W.D., Marzluff, W.F., and Wagner, E.J. (2011). A subset of drosophila Integrator proteins is essential for efficient U7 snRNA and spliceosomal snRNA 3'-end formation. Molecular Cell Biology 31, 328–341.

Flynt, A.S., Greimann, J.C., Chung, W.-J., Lima, C.D., and Lai, E.C. (2010). MicroRNA biogenesis via splicing and exosome-mediated trimming in drosophila. Molecular Cell 38, 900–907.

102

Gardini, A., Baillat, D., Cesaroni, M., Hu, D., Marinis, J.M., Wagner, E.J., Lazar, M.A., Shilatifard, A., and Shiekhattar, R. (2014). Integrator regulates transcriptional initiation and pause release following activation. Molecular Cell 56, 128–139.

Gilchrist, D.A., Santos, dos, G., Fargo, D.C., Xie, B., Gao, Y., Li, L., and Adelman, K. (2010). Pausing of RNA polymerase II disrupts DNA-specified nucleosome organization to enable precise gene regulation. Cell 143, 540–551.

Gillies, S.D., Morrison, S.L., Oi, V.T., and Tonegawa, S. (1983). A tissue-specific transcription enhancer element is located in the major intron of a rearranged immunoglobulin heavy chain gene. Cell 33, 717–728.

Haberle, V., and Stark, A. (2018). Eukaryotic core promoters and the functional basis of transcription initiation. Nature Reviews Molecular Cell Biology 19, 621– 637.

Hata, T., and Nakayama, M. (2007). Targeted disruption of the murine large nuclear KIAA1440/Ints1 protein causes growth arrest in early blastocyst stage embryos and eventual apoptotic cell death. Biochimica et Biophysica Acta 1773, 1039–1051.

Heintzman, N.D., Stuart, R.K., Hon, G., Fu, Y., Ching, C.W., Hawkins, R.D., Barrera, L.O., Van Calcar, S., Qu, C., Ching, K.A., et al. (2007). Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nature Genetics 39, 311–318.

Henriques, T., Scruggs, B.S., Inouye, M.O., Muse, G.W., Williams, L.H., Burkholder, A.B., Lavender, C.A., Fargo, D.C., and Adelman, K. (2018). Widespread transcriptional pausing and elongation control at enhancers. Genes and Development 32, 26–41.

Hsieh, C.-L., Fei, T., Chen, Y., Li, T., Gao, Y., Wang, X., Sun, T., Sweeney, C.J., Lee, G.-S.M., Chen, S., et al. (2014). Enhancer RNAs participate in androgen receptor-driven looping that selectively enhances gene activation. Proceedings of the National Academy of Sciences of the United States of America 111, 7319– 7324.

Huang, J., Gong, Z., Ghosal, G., and Chen, J. (2009). SOSS complexes participate in the maintenance of genomic stability. Molecular Cell 35, 384–393.

Jin, C., Zang, C., Wei, G., Cui, K., Peng, W., Zhao, K., and Felsenfeld, G. (2009). H3.3/H2A.Z double variant–containing nucleosomes mark ‘nucleosome-free regions’ of active promoters and other regulatory regions. Nature Genetics 41, 941–945.

103

Kapp, L.D., Abrams, E.W., Marlow, F.L., and Mullins, M.C. (2013). The Integrator complex subunit 6 (Ints6) confines the dorsal organizer in vertebrate embryogenesis. PLoS Genetics 9, e1003822.

Kapranov, P., Cheng, J., Dike, S., Nix, D.A., Duttagupta, R., Willingham, A.T., Stadler, P.F., Hertel, J., Hackermüller, J., Hofacker, I.L., et al. (2007). RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488.

Kim, D., Langmead, B., and Salzberg, S.L. (2015). HISAT: A fast spliced aligner with low memory requirements. Nature Methods 12, 357–360.

Kim, T.-K., and Shiekhattar, R. (2015). Architectural and functional commonalities between enhancers and promoters. Cell 162, 948–959.

Kim, T.-K., Hemberg, M., Gray, J.M., Costa, A.M., Bear, D.M., Wu, J., Harmin, D.A., Laptewicz, M., Barbara-Haley, K., Kuersten, S., et al. (2010). Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182–187.

Koch, F., Fenouil, R., Gut, M., Cauchy, P., Albert, T.K., Zacarias-Cabeza, J., Spicuglia, S., la Chapelle, de, A.L., Heidemann, M., Hintermair, C., et al. (2011). Transcription initiation platforms and GTF recruitment at tissue-specific enhancers and promoters. Nature Structural and Molecular Biology 18, 956-963.

Kornberg, R.D. (2007). The molecular basis of eukaryotic transcription. Proceedings of the National Academy of Sciences of the United States of America 104, 12955–12961.

Krall, M., Htun, S., Schnur, R.E., Brooks, A.S., Baker, L., Campomanes, A.A., Lamont, R.E., Gripp, K.W., Schneidman-Duhovny, D., Innes, A.M., et al. (2019). Biallelic sequence variants in INTS1 in patients with developmental delays, cataracts, and craniofacial anomalies. European Journal of Human Genetics 27, 582-593.

Krebs, A.R., Imanci, D., Hoerner, L., Gaidatzis, D., Burger, L., and Schübeler, D. (2017). Genome-wide single-molecule footprinting reveals high RNA polymerase II turnover at paused promoters. Molecular Cell 67, 411–422.

Kwak, H., Fuda, N.J., Core, L.J., and Lis, J.T. (2013). Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science 339, 950– 953.

Lagha, M., Bothma, J.P., Esposito, E., Ng, S., Stefanik, L., Tsui, C., Johnston, J., Chen, K., Gilmour, D.S., Zeitlinger, J., et al. (2013). Paused pol II coordinates tissue morphogenesis in the drosophila embryo. Cell 153, 976–987.

Lai, F., Gardini, A., Zhang, A., and Shiekhattar, R. (2015). Integrator mediates the biogenesis of enhancer RNAs. Nature 525, 399–403.

104

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 10, R25.

Lettice, L.A., Williamson, I., Wiltshire, J.H., Peluso, S., Devenney, P.S., Hill, A.E., Essafi, A., Hagman, J., Mort, R., Grimes, G., et al. (2012). Opposing functions of the ETS factor family define Shh spatial expression in limb buds and underlie polydactyly. Developmental Cell 22, 459–467.

Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows- Wheeler transform. Bioinformatics 25, 1754–1760.

Li, J., Zou, C., Bai, Y., Wazer, D.E., Band, V., and Gao, Q. (2006). DSS1 is required for the stability of BRCA2. Oncogene 25, 1186–1194.

Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15, 550.

Mahat, D.B., Kwak, H., Booth, G.T., Jonkers, I.H., Danko, C.G., Patel, R.K., Waters, C.T., Munson, K., Core, L.J., and Lis, J.T. (2016). Base-pair resolution genome-wide mapping of active RNA polymerases using precision nuclear run-on (PRO-seq). Nature Protocols 11, 1455–1476.

Marston, N.J., Richards, W.J., Hughes, D., Bertwistle, D., Marshall, C.J., and Ashworth, A. (1999). Interaction between the product of the breast cancer susceptibility gene BRCA2 and DSS1, a protein functionally conserved from yeast to mammals. Molecular Cell Biology 19, 4633–4642.

Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet Journal 17, 10–12.

Mayer, A., Heidemann, M., Lidschreiber, M., Schreieck, A., Sun, M., Hintermair, C., Kremmer, E., Eick, D., and Cramer, P. (2012). CTD tyrosine phosphorylation impairs termination factor recruitment to RNA polymerase II. Science 336, 1723– 1725.

Meyer, K.D., Lin, S.-C., Bernecky, C., Gao, Y., and Taatjes, D.J. (2010). P53 activates transcription by directing structural shifts in Mediator. Nature Structural and Molecular Biology 17, 753–760.

Murata, S., Chiba, T., and Tanaka, K. (2003). CHIP: A quality-control E3 ligase collaborating with molecular chaperones. The International Journal of Biochemistry and Cell Biology 35, 572–578.

Muse, G.W., Gilchrist, D.A., Nechaev, S., Shah, R., Parker, J.S., Grissom, S.F., Zeitlinger, J., and Adelman, K. (2007). RNA polymerase is poised for activation across the genome. Nature Genetics 39, 1507–1511.

105

Nechaev, S., Fargo, D.C., Santos, dos, G., Liu, L., Gao, Y., and Adelman, K. (2010). Global analysis of short RNAs reveals widespread promoter-proximal stalling and arrest of pol II in drosophila. Science 327, 335–338.

O'Reilly, D., Kuznetsova, O.V., Laitem, C., Zaborowska, J., Dienstbier, M., and Murphy, S. (2014). Human snRNA genes use polyadenylation factors to promote efficient transcription termination. Nucleic Acids Research 42, 264–275.

Oegema, R., Baillat, D., Schot, R., van Unen, L.M., Brooks, A., Kia, S.K., Hoogeboom, A.J.M., Xia, Z., Li, W., Cesaroni, M., et al. (2017). Human mutations in Integrator complex subunits link transcriptome integrity to brain development. PLoS Genetics 13, e1006809.

Ohmiya, H., Vitezic, M., Frith, M.C., Itoh, M., Carninci, P., Forrest, A.R.R., Hayashizaki, Y., Lassmann, T., FANTOM Consortium (2014). RECLU: A pipeline to discover reproducible transcriptional start sites and their alternative regulation using capped analysis of gene expression (CAGE). BMC Genomics 15, 269–15.

Otani, Y., Nakatsu, Y., Sakoda, H., Fukushima, T., Fujishiro, M., Kushiyama, A., Okubo, H., Tsuchiya, Y., Ohno, H., Takahashi, S.-I., et al. (2013). Integrator complex plays an essential role in adipose differentiation. Biochemical and Biophysical Research Communications 434, 197–202.

Pageon, S.V., Nicovich, P.R., Mollazade, M., Tabarin, T., and Gaus, K. (2016). Clus-DoC: A combined cluster detection and colocalization analysis for single- molecule localization microscopy data. Molecular Biology of the Cell 27, 3627– 3636.

Peterlin, B.M., and Price, D.H. (2006). Controlling the elongation phase of transcription with P-TEFb. Molecular Cell 23, 297–305.

Porrua, O., and Libri, D. (2015). Transcription termination and the control of the transcriptome: Why, where and how to stop. Nature Reviews Molecular Cell Biology 16, 190–202.

Proudfoot, N.J. (2016). Transcriptional termination in mammals: Stopping the RNA polymerase II juggernaut. Science 352, aad9926.

Quinlan, A.R., and Hall, I.M. (2010). BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842.

Rada-Iglesias, A., Bajpai, R., Swigut, T., Brugmann, S.A., Flynn, R.A., and Wysocka, J. (2011). A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279–283.

Rahl, P.B., Lin, C.Y., Seila, A.C., Flynn, R.A., McCuine, S., Burge, C.B., Sharp, P.A., and Young, R.A. (2010). C-myc regulates transcriptional pause release. Cell 141, 432–445.

106

Ramírez, F., Ryan, D.P., Grüning, B., Bhardwaj, V., Kilpert, F., Richter, A.S., Heyne, S., Dündar, F., and Manke, T. (2016). deepTools2: A next generation web server for deep-sequencing data analysis. Nucleic Acids Research 44, W160– W165.

Richard, P., and Manley, J.L. (2009). Transcription termination by nuclear RNA polymerases. Genes and Development 23, 1247–1269.

Roeder, R.G. (2019). 50+ years of eukaryotic transcription: An expanding universe of factors and mechanisms. Nature Structural and Molecular Biology 26, 783–791.

Rougvie, A.E., and Lis, J.T. (1988). The RNA polymerase II molecule at the 5' end of the uninduced hsp70 gene of D. melanogaster is transcriptionally engaged. Cell 54, 795–804.

Sartorelli, V., and Lauberth, S.M. (2020). Enhancer RNAs are an important regulatory layer of the epigenome. Nature Structural and Molecular Biology 27, 521–528.

Saunders, A., Core, L.J., and Lis, J.T. (2006). Breaking barriers to transcription elongation. Nature Reviews Molecular Cell Biology 7, 557–567.

Schep, A.N., Buenrostro, J.D., Denny, S.K., Schwartz, K., Sherlock, G., and Greenleaf, W.J. (2015). Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions. Genome Research 25, 1757–1770.

Schier, A.C., and Taatjes, D.J. (2020). Structure and mechanism of the RNA polymerase II transcription machinery. Genes and Development 34, 465–488.

Schoenfelder, S., and Fraser, P. (2019). Long-range enhancer-promoter contacts in gene expression control. Nature Reviews Genetics 20, 437–455.

Shah, N., Maqbool, M.A., Yahia, Y., Aabidine, El, A.Z., Esnault, C., Forné, I., Decker, T.-M., Martin, D., Schüller, R., Krebs, S., et al. (2018). Tyrosine-1 of RNA polymerase II CTD controls global termination of gene transcription in mammals. Molecular Cell 69, 48–61.

Shao, W., and Zeitlinger, J. (2017). Paused RNA polymerase II inhibits new transcriptional initiation. Nature Genetics 49, 1045–1051.

Shiraki, T., Kondo, S., Katayama, S., Waki, K., Kasukawa, T., Kawaji, H., Kodzius, R., Watahiki, A., Nakamura, M., Arakawa, T., et al. (2003). Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proceedings of the National Academy of Sciences of the United States of America 100, 15776–15781.

107

Skaar, J.R., Richard, D.J., Saraf, A., Toschi, A., Bolderson, E., Florens, L., Washburn, M.P., Khanna, K.K., and Pagano, M. (2009). INTS3 controls the hSSB1-mediated DNA damage response. The Journal of Cell Biology 187, 25–32.

Spiegelman, B.M., and Heinrich, R. (2004). Biological control through regulated transcriptional coactivators. Cell 119, 157–167.

Stadelmayer, B., Micas, G., Gamot, A., Martin, P., Malirat, N., Koval, S., Raffel, R., Sobhian, B., Severac, D., Rialle, S., et al. (2014). Integrator complex regulates NELF-mediated RNA polymerase II pause/release and processivity at coding genes. Nature Communications 5, 720.

Steurer, B., Janssens, R.C., Geverts, B., Geijer, M.E., Wienholz, F., Theil, A.F., Chang, J., Dealy, S., Pothof, J., van Cappellen, W.A., et al. (2018). Live-cell analysis of endogenous GFP-RPB1 uncovers rapid turnover of initiating and promoter-paused RNA polymerase II. Proceedings of the National Academy of Sciences of the United States of America 115, E4368–E4376.

Struhl, K. (1999). Fundamentally different logic of gene regulation in eukaryotes and prokaryotes. Cell 98, 1–4.

Taft, R.J., Glazov, E.A., Cloonan, N., Simons, C., Stephen, S., Faulkner, G.J., Lassmann, T., Forrest, A.R.R., Grimmond, S.M., Schroder, K., et al. (2009). Tiny RNAs associated with transcription start sites in animals. Nature Genetics 41, 572– 578.

Tao, S., Cai, Y., and Sampath, K. (2009). The Integrator subunits function in hematopoiesis by modulating Smad/BMP signaling. Development 136, 2757– 2765.

Tatomer, D.C., Elrod, N.D., Liang, D., Xiao, M.-S., Jiang, J.Z., Jonathan, M., Huang, K.-L., Wagner, E.J., Cherry, S., and Wilusz, J.E. (2019). The Integrator complex cleaves nascent mRNAs to attenuate transcription. Genes and Development 33, 1525–1538.

Taverna, S.D., Li, H., Ruthenburg, A.J., Allis, C.D., and Patel, D.J. (2007). How chromatin-binding modules interpret histone modifications: Lessons from professional pocket pickers. Nature Structural and Molecular Biology 14, 1025– 1040.

Tessarz, P., and Kouzarides, T. (2014). Histone core modifications regulating nucleosome structure and dynamics. Nature Reviews Molecular Cell Biology 15, 703–708.

Teves, S.S., Weber, C.M., and Henikoff, S. (2014). Transcribing through the nucleosome. Trends in Biochemical Sciences 39, 577–586.

108

Tome, J.M., Tippens, N.D., and Lis, J.T. (2018). Single-molecule nascent RNA sequencing identifies regulatory domain architecture at promoters and enhancers. Nature Genetics 50, 1533–1541.

Tsai, P.-F., Dell’Orso, S., Rodriguez, J., Vivanco, K.O., Ko, K.-D., Jiang, K., Juan, A.H., Sarshad, A.A., Vian, L., Tran, M., et al. (2018). A muscle-specific enhancer RNA mediates Cohesin recruitment and regulates transcription in trans. Molecular Cell 71, 129–141. van den Berg, D.L.C., Azzarelli, R., Oishi, K., Ben Martynoga, Urbán, N., Dekkers, D.H.W., Demmers, J.A., and Guillemot, F. (2017). Nipbl interacts with Zfp609 and the Integrator complex to regulate cortical neuron migration. Neuron 93, 348–361.

Venkatesh, S., and Workman, J.L. (2015). Histone exchange, chromatin structure and the regulation of transcription. Nature Reviews Molecular Cell Biology 16, 178–189.

Vidhyasagar, V., He, Y., Guo, M., Talwar, T., Singh, R.S., Yadav, M., Katselis, G., Vizeacoumar, F.J., Lukong, K.E., and Wu, Y. (2018). Biochemical characterization of INTS3 and C9ORF80, two subunits of hNABP1/2 heterotrimeric complex in nucleic acid binding. The Biochemical Journal 475, 45–60.

Vos, S.M., Farnung, L., Urlaub, H., and Cramer, P. (2018). Structure of paused transcription complex Pol II-DSIF-NELF. Nature 560, 601–606.

Weber, C.M., Ramachandran, S., and Henikoff, S. (2014). Nucleosomes are context-specific, H2A.Z-modulated barriers to RNA polymerase. Molecular Cell 53, 819–830.

Wu, Y., Albrecht, T.R., Baillat, D., Wagner, E.J., and Tong, L. (2017). Molecular basis for the interaction between Integrator subunits IntS9 and IntS11 and its functional importance. Proceedings of the National Academy of Sciences of the United States of America 114, 4394–4399.

Xie, M., Zhang, W., Shu, M.-D., Xu, A., Lenis, D.A., DiMaio, D., and Steitz, J.A. (2015). The host Integrator complex acts in transcription-independent maturation of herpesvirus microRNA 3′ ends. Genes and Development 29, 1552–1564.

Yamada, T., Yamaguchi, Y., Inukai, N., Okamoto, S., Mura, T., and Handa, H. (2006). P-TEFb-mediated phosphorylation of hSpt5 C-terminal repeats is critical for processive transcription elongation. Molecular Cell 21, 227–237.

Yamamoto, J., Hagiwara, Y., Chiba, K., Isobe, T., Narita, T., Handa, H., and Yamaguchi, Y. (2014). DSIF and NELF interact with Integrator to specify the correct post-transcriptional fate of snRNA genes. Nature Communications 5, 4263.

109

Yang, X.-C., Sullivan, K.D., Marzluff, W.F., and Dominski, Z. (2009). Studies of the 5' exonuclease and endonuclease activities of CPSF-73 in histone pre-mRNA processing. Molecular and Cellular Biology 29, 31–42.

Yoshimi, A., Lin, K.-T., Wiseman, D.H., Rahman, M.A., Pastore, A., Wang, B., Lee, S.C.-W., Micol, J.-B., Zhang, X.J., de Botton, S., et al. (2019). Coordinated alterations in RNA splicing and epigenetic regulation drive leukaemogenesis. Nature 574, 273–277.

Yue, J., Lai, F., Beckedorff, F., Zhang, A., Pastori, C., and Shiekhattar, R. (2017). Integrator orchestrates RAS/ERK1/2 signaling transcriptional programs. Genes and Development 31, 1809–1820.

Zeitlinger, J., Stark, A., Kellis, M., Hong, J.-W., Nechaev, S., Adelman, K., Levine, M., and Young, R.A. (2007). RNA polymerase stalling at developmental control genes in the drosophila melanogaster embryo. Nature Genetics 39, 1512–1516.

Zhang, F., Ma, T., and Yu, X. (2013). A core hSSB1-INTS complex participates in the DNA damage response. Journal of Cell Science 126, 4850–4855.

Zhang, X., Wang, Y., Yang, F., Tang, J., Xu, X., Yang, L., Yang, X.-A., and Wu, D. (2019). Biallelic INTS1 mutations cause a rare neurodevelopmental disorder in two Chinese siblings. Journal of Molecular Neuroscience. 40, 257–258.