MOLECULAR ANALYSIS OF RNA POLYMERASE II ELONGATION AND

TERMINATION ON MAMMALIAN GENES

by

KRISTOPHER WADE BRANNAN

B.A., University of Colorado Boulder, 2007

A thesis submitted to the

Faculty of the Graduate School of the

University of Colorado in partial fulfillment

of the requirements for the degree of

Doctor of Philosophy

Molecular Biology Program

2014

This thesis for the Doctor of Philosophy degree by

Kristopher Wade Brannan

has been approved for the

Molecular Biology Program

by

Arthur Gutierrez-Hartmann, Chair

David Bentley, Advisor

Sandy Martin

James DeGregori

Thomas Evans

Tom Blumenthal

Date ___3/26/14______

Brannan, Kristopher Wade (Ph.D., Molecular Biology)

Molecular Analysis of RNA Polymerase II Elongation and Termination on Mammalian

Genes

Thesis directed by Professor David Bentley

ABSTRACT

RNA polymerase II (pol II) pausing, elongation and termination are important mechanisms for controlling pol II distribution and transcriptional output during the transcription cycle on coding genes. This thesis focuses on mechanisms that influence pol II elongation through protein coding genes, and the state of elongating pol II during promoter proximal accumulation, elongation and termination. I address the following specific questions: 1) What factors are important for pol II termination? 2) What is the role of premature termination in limiting pol II elongation? 3) What is the role of CTD phosphorylation in pol II elongation and mRNA cleavage/polyadenylation?

I report that decapping factors and TTF2 interact with Xrn2 and that these factors localize by ChIP at 5’ ends of genes. Knockdown of decapping and termination factors by shRNA caused a widespread re-positioning of pol II at 5’ ends of genes away from start sites and toward distal positions both downstream and upstream. These results suggest that co-transcriptional decapping and premature termination by a torpedo mechanism are broadly employed to limit transcription of human genes. I also report that Fcp1 localizes at the 5’ end of human genes and limits CTD phosphorylation at both the Ser2 and Ser5 positions. Fcp1 knockdown caused a widespread redistribution of pol II at 5’ ends away from transcription start sites (TSS) toward downstream positions, and localized increases in pol II Ser2P and Ser5P on highly expressed genes. Fcp1 knockdown also shifted pA-site choice on ~1000 genes, primarily toward proximal positions. These results suggest that cotranscriptional dephosphorylation by Fcp1 is important for limiting both pol II elongation and usage of proximal alternative polyadenylation signals. Together these results have implications for new mechanisms regulating transcriptional control both at the elongation checkpoint and at the level of 3’ end formation.

The form and content of this abstract are approved. I recommend its publication.

Approved: David Bentley

ACKNOWLEDGMENTS

Thank you to the Bentley lab, to the Molecular Biology Program, to my family, and to Dr. Plank.

TABLE OF CONTENTS

CHAPTER

I. INTRODUCTION ...... 1

1.1 Mechanisms of RNA pol II transcriptional pausing, elongation, and termination ...... 1

1.1A RNA pol II transcription cycle phases ...... 1

1.1B Termination models ...... 16

1.1C Xrn2 and associated proteins ...... 22

1.1D Decapping ...... 24

1.2 Regulation by co-transcriptional phosphatase activity ...... 27

1.2A Dynamic phosphorylation of RNA pol II CTD Ser2 ...... 28

1.2C Co-transcriptional phosphatase activity ...... 30

1.2D Alternative polyadenylation ...... 31

1.3 Specific questions ...... 33

II. MATERIALS AND METHODS ...... 34

2.1 Cell lines and growth conditions ...... 34

2.2 ...... 34

2.3 Immunopurification and mass spectrometry ...... 34

2.4 RNAi-mediated knock-downs ...... 35

2.5 RT-PCR ...... 35

2.6 ChIP, ChIP-Sequencing, mapping and analysis ...... 36

2.7 PA-Seq and analysis ...... 37

2.8 Decapping reactions ...... 38

III. mRNA DECAPPING FACTORS AND THE EXONUCLEASE XRN2 FUNCTION IN WIDESPREAD PREMATURE TERMINATION OF RNA POLYMERASE II TRANSCRIPTION ...... 39

3.1 Summary ...... 39

3.2 Introduction ...... 40

3.3 Results ...... 42

3.3A Xrn2 associates with TTF2 and mRNA decapping factors ...... 42

3.3B Termination and decapping factors localize at 5’ ends near paused pol II ...... 44

3.3C Depletion of Xrn2 and TTF2 re-distributes pol II away from the TSS toward promoter-distal positions ...... 47

3.3D Decapping factor knockdown redistributes pol II away from promoter-proximal positions ...... 59

3.3E Comparing the roles of termination, decapping, and pausing factors in pol II localization ...... 63

3.3F Knockdown of termination and decapping factors redistributes pol II within “pausing-regulated” genes ...... 65

3.4 Discussion ...... 68

3.4A A nuclear function for decapping factors in promoting premature termination of transcription ...... 68

3.4B Premature termination and the control of eukaryotic ...... 70

IV. COTRANSCRIPTIONAL DEPHOSPHORYLATION OF RNA POL II INFLUENCES PROMOTER PROXIMAL PAUSING AND PA-SITE CHOICE ...... 74

4.1 Abstract ...... 74

4.2 Introduction ...... 75

4.3 Results ...... 78

4.3A Fcp1 localizes at 5’ ends near paused pol II ...... 78

4.3B Depletion of Fcp1 reduces promoter proximal pol II pausing ...... 78

4.3C Fcp1 knockdown increases relative CTD Ser2P and Ser5P within gene bodies ...... 82

4.3D Fcp1 knockdown results in an upstream shift in pA-site choice ...... 88

4.3E Fcp1 knockdown does not alter recruitment of Cstf77 ...... 93

4.4 Discussion ...... 97

V. CONCLUSIONS/DISCUSSION ...... 101

REFERENCES ...... 108

APPENDIX

A. SUPPLEMENTAL TABLES ...... 129

LIST OF TABLES

TABLE

1. Human pLKO.1 lentiviral shRNA ...... 129

2. Proteins identified by mass spectrometry ...... 130

3. PSY analysis of pA-site shifts ...... 132

LIST OF FIGURES

FIGURE

1-1 The pol II transcription cycle proceeding through three phases...... 2

1-2 Stepwise assembly of the pre-initiation complex (PIC)...... 4

1-3 RNA pol II density profile across a typical metazoan protein-coding gene...... 7

1-4 Promoter proximal pausing and transition into elongation...... 12

1-5 The core cleavage and polyadenylation machinery...... 15

1-6 pol II termination models...... 17

1-7 mRNA ...... 25

3-1 Xrn2 associates with termination and mRNA decapping factors...... 43

3-2 Decapping activity in nuclear extract...... 45

3-3 ChIP-seq of pol II, Xrn2, TTF2 and Dcp1a in HeLa cells...... 46

3-4 DRB induces similar changes in pol II and Xrn2 distribution on GAPDH...... 48

3-5 RNAi-mediated knockdown of termination and decapping factors...... 49

3-6 Knockdown of decapping and termination factors does not affect total pol II recruitment...... 51

3-7 Pol II ChIP-Seq normalized to total read counts on indicated genes in HEK293 cells stably expressing shScramble (scr), and shXrn2+shTTF2 shRNAs...... 52

3-8 Effect of Xrn2+TTF2 and Dcp2 knockdown on pol II distribution at genes lacking 5’ pol II accumulation and intronless genes...... 53

3-9 Little effect of Xrn2+TTF2 or Dcp2 knockdown on termination at 3’ ends...... 55

3-10 Termination factor knockdown redistributes pol II away from promoter proximal regions...... 56

3-11 Termination factor knockdown increases pol II escape index...... 58

3-12 Knockdown of decapping factors increases relative pol II occupancy upstream and downstream of start sites...... 60

3-13 Knockdown of decapping factors increases relative pol II occupancy upstream and downstream of start sites...... 61

3-14 Escape index in knockdown lines and uninfected parent compared to scr control...... 62

3-15 Knockdown of decapping and termination factors has similar effect on pol II density as does knockdown of negative elongation factors...... 64

3-16 Escape index plots in knockdown lines and uninfected parent vs. scr control...... 66

3-17 Effects of termination factor knockdown on steady-state mRNA abundance...... 67

3-18 The promoter-proximal “torpedo” model for premature termination of pol II transcription...... 72

4-1 ChIP-Seq of Fcp1 in HeLa cells shows enrichment around the TSS...... 79

4-2 ChIP-Seq of Fcp1 in HeLa cells shows enrichment around the TSS...... 80

4-3 Western validation of Fcp1 knockdown...... 81

4-4 Fcp1 knockdown has some effect on pol II recruitment...... 83

4-5 Fcp1 knockdown redistributes CTD ChIP-seq signal away from gene 5’ regions...... 84

4-6 Knockdown of Fcp1 reduces relative pol II occupancy near transcription start sites...... 85

4-7 Nelf distribution is not influenced by Fcp1 depletion...... 86

4-8 Fcp1 knockdown does not alter relative distribution of NELF genome-wide...... 87

4-9 Fcp1 knockdown shifts Ser2P distribution...... 89

4-10 Relative ChIP-Seq frequency of CTD Ser2P...... 90

4-11 Fcp1 knockdown increases and shifts Ser5P distribution...... 91

4-12 Relative ChIP-Seq frequency and mean signal of CTD Ser5P in shFcp1 and shScramble lines...... 92

4-13 Pol II ChIP-Seq signal relative to poly(A) sites...... 94

4-14 Fcp1 knockdown shifts pA-site choice upstream on ZNF146...... 95

4-15 Relative ChIP-seq signal of Cstf77 in Fcp1 knockdown...... 96

CHAPTER I

INTRODUCTION

1.1 Mechanisms of RNA pol II transcriptional pausing, elongation, and termination

Expression of eukaryotic genes begins with synthesis of mRNA from protein coding genes by the 12-subunit DNA dependent RNA polymerase II (pol II) during what is known as the pol II transcription cycle. This cycle is a highly coordinated process involving the co-transcriptional recruitment of proteins involved in pol II transit as well as mRNA capping, splicing, 3’ end formation and transcriptional termination. Each step in this cycle is a target for the regulation of gene expression, and many human diseases are correlated with disruption of one or more of these steps. Pol II pausing, elongation and termination are now recognized as important mechanisms for controlling pol II transit and transcriptional output during this cycle. This thesis will focus on mechanisms influencing pol II elongation and termination, and will describe a decapping and exonucleolytic pathway involved in pol II premature termination.

1.1A RNA pol II transcription cycle phases

Transcription of eukaryotic protein coding genes proceeds in three basic phases (Figure 1-1). The first phase involves formation of the pre-initiation complex (PIC) where pol II is recruited to the DNA template, associates with general transcription factors (GTFs) and initiates RNA synthesis. The second phase is productive elongation of the transcript, which occurs as pol II escapes the promoter proximal pause and travels through the gene body. The final phase of the cycle is mRNA 3’ end formation and termination, when both pol II and the nascent transcript dissociate from the DNA template [2].

Figure 1-1: The pol II transcription cycle proceeding through three phases. The initiation phase coincides with phosphorylation of the pol II C-terminal domain (CTD) at the serine 5 position (Ser5P) by the TFIIH subunit CDK7. Following promoter proximal pausing, the CDK9 subunit of PTEFb phosphorylates the CTD at the serine 2 position (Ser2P), which promotes transition into the elongation phase. Dephosphorylation of the CTD is carried out by the phosphatases Ssu72, thought to primarily dephosphorylate CTD Ser5P, and Fcp1, thought to primarily dephosphorylate Ser2P. The final phase of the cycle, termination, occurs when both pol II and the transcript dissociate from the DNA template [1].

Each of these phases is highly regulated and mechanistically coupled to co-transcriptional mRNA processing events (reviewed in [3]). The phase-specific co-transcriptional recruitment of mRNA processing factors is facilitated by the large disordered C-terminal domain (CTD) of the Rpb1 subunit of pol II. This CTD consists of tandem repeats (26 in yeast and 52 in mammals) of the consensus heptad YSPTSPS, which are dynamically modified throughout the transcription cycle (Figure 1-1) [4]. In this section I will describe important steps along each phase of the transcription cycle, and I will discuss dynamic CTD phosphorylation in detail in section 1.2.
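The repeat structure of the CTD described above can be illustrated with a minimal Python sketch. It assumes an idealized CTD in which every repeat matches the consensus heptad exactly, which is a simplification, since natural CTDs contain non-consensus repeats. The function and variable names are hypothetical and serve only to show where the phosphorylatable serines (positions 2, 5 and 7 of each heptad) fall.

```python
# Idealized model of the pol II CTD: tandem copies of the consensus heptad.
# Assumption: every repeat is a perfect YSPTSPS consensus; real CTDs deviate from this.
CONSENSUS_HEPTAD = "YSPTSPS"
REPEAT_COUNTS = {"yeast": 26, "mammal": 52}  # repeat numbers quoted in the text


def build_ctd(organism):
    """Return an idealized CTD sequence for the given organism."""
    return CONSENSUS_HEPTAD * REPEAT_COUNTS[organism]


def serine_sites(ctd, position):
    """Return 0-based indices of the serine at heptad position 2, 5 or 7."""
    if position not in (2, 5, 7):
        raise ValueError("heptad serines sit at positions 2, 5 and 7")
    sites = list(range(position - 1, len(ctd), len(CONSENSUS_HEPTAD)))
    # Sanity check: every reported index must actually be a serine.
    if any(ctd[i] != "S" for i in sites):
        raise ValueError("sequence is not a pure consensus repeat")
    return sites


mammal_ctd = build_ctd("mammal")
ser2_sites = serine_sites(mammal_ctd, 2)
ser5_sites = serine_sites(mammal_ctd, 5)
print(len(mammal_ctd), len(ser2_sites), len(ser5_sites))
```

In this idealized picture each repeat contributes one Ser2 and one Ser5 site, so the number of potential phosphorylation sites scales directly with repeat number.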

PIC formation

The initial phase of the transcription cycle is pol II recruitment and PIC formation at gene promoters. This complicated process involves the stepwise assembly of the large multi-subunit complexes, termed general transcription factors, which include most often TFIID, TFIIA, TFIIB, TFIIF, TFIIE, and TFIIH, on promoter elements within template DNA (reviewed in [5]) (Figure 1-2). Promoter elements differ in their composition between genes, and the PIC composition also varies between promoters in a dynamic and context specific way [6]. The earliest step in PIC formation for promoters containing the common TATA element is binding by the TATA-binding protein (TBP) subunit of TFIID. Many promoters do not contain a TATA-element, and other subunits of TFIID (TBP-associated factors or TAFs) interact with other types of promoter elements including the initiator element (INR) and downstream promoter element (DPE) [7]. Regardless of specific promoter elements, TFIID components are necessary for PIC nucleation on most protein coding genes.

Figure 1-2: Stepwise assembly of the pre-initiation complex (PIC). TFIID first binds the promoter, followed by recruitment of TFIIA and TFIIB, which stabilize promoter-bound TFIID. Next follows recruitment of pol II/TFIIF and formation of a stable TFIID-TFIIA-TFIIB-pol II/TFIIF-promoter complex. TFIIE is then recruited, with the subsequent entry of TFIIH, which phosphorylates the pol II CTD at the serine 5 position (Ser5P) and promotes promoter clearance [5].

TFIIB stabilizes TFIID-promoter binding and recruits TFIIF along with hypophosphorylated pol II to the promoter [8]. TFIIF directly binds pol II and enhances affinity of pol II for GTFs at promoter complexes [9]. TFIIF is also responsible for recruitment of TFIIE/TFIIH to the PIC, and is involved in promoter escape and elongation efficiency [9, 10]. It was recently shown that the transcription factor Gdown1 is able to block TFIIF elongation stimulation [11]. The pol II CTD phosphatase Fcp1 binds the RAP74 subunit of TFIIF, and this binding activates Fcp1 phosphatase activity [12]. I will discuss the consequences of co-transcriptional Fcp1 activity on CTD phosphorylation and pol II elongation in Chapter IV of this thesis. A multi-subunit complex termed mediator binds hypophosphorylated pol II during PIC formation, and stimulates the kinase activity of TFIIH. This phosphorylation disrupts mediator binding and promotes its release from the CTD [13]. After pol II proceeds to elongation, mediator largely remains at the promoter as part of a scaffold [14], where it likely facilitates recruitment and reinitiation of free pol II dephosphorylated by the action of Fcp1 or other CTD specific phosphatases.

Initiation and early elongation

The steps following PIC formation are important for mRNA 5’ end capping and establishment of a stable elongation complex. TFIIH ATPase/helicase activity is required for promoter opening and promoter clearance [15]. At this stage transcription initiation is frequently abortive and pol II slips and backtracks within the region immediately following initiation [16]. This slippage is relieved by the basal transcription and elongation factor TFIIS, which stimulates intrinsic transcript cleavage activity of pol II allowing backtracked pol II to resume RNA synthesis following arrest [17].

Immediately following initiation, the pol II CTD is phosphorylated at the Ser5 position by the kinase activity of the Cdk7 subunit of TFIIH (Figure 1-1), or the Cdk8 subunit of mediator, and this phosphorylation facilitates recruitment of the mRNA capping enzyme (CE) [18]. Ser5P CTD localizes the mammalian CE and also allosterically regulates it [19]. In yeast Fcp1 activity releases CE from pol II elongation complexes [20, 21], and in vitro removal of the Ser5P phosphate by human Fcp1 results in inefficient recruitment of the capping apparatus to the ternary complex and loss of the stimulatory effect of Ser5P on CE [22]. In Chapter IV I will report the effect of Fcp1 depletion on pol II Ser5P and elongation on human genes. Human CE stimulates promoter escape by countering the negative elongation factor NELF, and CE recruitment is enhanced by direct binding to the elongation factor Spt5 [22]. Promoter proximal pausing of pol II mediated by these negative elongation factors may provide a temporal window to ensure that capping has occurred before productive elongation proceeds. The cap is bound by cap binding proteins and the pre-mRNA is protected from 5’-3’ degradation by the nuclear exonuclease Xrn2 [23]. The removal of the 5’ cap is primarily carried out in the cytoplasm by the action of the decapping enzyme Dcp2 in concert with a complex of associated cofactors, and I will discuss the decapping mechanism in detail in section 1.1D.

Promoter proximal pausing and escape into elongation

Following initiation pol II encounters a rate-limiting barrier that lies between early elongation and productive elongation (Figure 1-3). The transition between these two phases of the transcription cycle has now been characterized as a powerful regulatory switch used to increase or decrease gene expression in a signal-responsive fashion.

Figure 1-3: RNA pol II density profile across a typical metazoan protein-coding gene. Elevated density around the transcription start site (TSS) results from promoter-proximal pausing and premature termination of transcription. Blue and green arrows denote divergent transcription from the TSS. A second peak of pol II accumulation downstream of the poly(A) site precedes termination coupled to cleavage and polyadenylation. Black arrows denote termination of transcription with eviction of pol II (yellow circles) from the DNA template downstream of the poly(A) site (red arrow) and also in the promoter-proximal region. The mRNA cap structure is denoted by a white circle.

Promoter proximal accumulation of pol II was first demonstrated in viral systems, but has lately been shown to be a general feature of cellular genes. Run-on transcripts made in nuclei from SV40 infected cells are strongly biased toward the 5’ end of the late transcription unit, and labeled RNA extended on viral transcription complexes produces discrete 93–95 base RNA that is prematurely terminated near a potential hairpin loop structure [24, 25]. Similar examples of promoter-proximal stalling and/or premature termination occur on early and late promoters of polyoma virus [26]. SV40 late transcription is regulated by a mechanism [24] that controls a decision between premature termination and productive elongation, analogous to attenuation on bacterial operons [27]. Transcription complexes assembled in HeLa nuclear extract on the adenovirus 2 major late promoter under NTP limiting conditions give rise to 20 nucleotide uncapped transcripts that are elongated into capped transcripts upon NTP addition [28]. This phenomenon of pol II pausing at relatively discrete positions near the transcription start site and remaining competent to resume elongation was termed “promoter-proximal pausing.”

This pattern of pol II accumulation near start sites was shown to be common to a number of cellular genes. High levels of pol II accumulate at the 5’ ends of the Drosophila heat shock gene hsp70 [29, 30], and human c-myc genes in the absence of active expression [31, 32]. Pol II localized by run-on accumulates at the promoter-proximal region on Hsp26 and GAPDH in Drosophila [33] and adenosine deaminase, c-fos, DHFR and transthyretin genes in mammals [34-37]. These early nuclear run-on studies detected transcription proceeding in both directions from the start site, but the significance of this divergent transcription remained obscure [31, 38].

Promoter proximal pausing is proposed to be a “general rate-limiting step” in the pol II transcription cycle [39]. Recent ChIP-seq and Gro-Seq studies localize pol II genome-wide and reveal high levels of pol II accumulation at the start sites of thousands of genes in Drosophila and human cells [40-42]. ChIP-seq data presented in this thesis demonstrate that promoter proximal accumulation is a general feature of nearly all protein coding genes where pol II is localized. Promoter-proximal pol II accumulation likely involves sequence elements upstream and downstream of the start site as well as chromatin structure [43-45]. While the details of what makes pol II pile up near start sites remain somewhat obscure, this is clearly a characteristic shared by numerous promoters (Figure 1-3).

There has been speculation as to the purpose of promoter proximally accumulated pol II preceding gene activation. Nuclear run-on transcription of the cellular Hsp70 and c-myc genes [29, 31, 32] and a transfected reporter driven by the HIV-1 LTR [46] shows that upon activation, the ratio of pol II within the gene body relative to the 5’ end increases, implying that regulation of gene expression can be exerted at the level of transcriptional elongation. At Hsp70, the amount of paused pol II prior to heat shock correlates with the amount of mRNA made after heat shock [45]. Promoter proximal pol II accumulation therefore provides a pool of engaged polymerases poised for rapid transcription in response to a gene activation stimulus. Promoter proximal pol II could also be important for excluding nucleosomes near the TSS, thereby providing a bookmark in the chromatin that can be easily accessed by the transcriptional machinery [44]. An extended pol II dwell time within the promoter proximal region may provide a temporal window for cotranscriptional capping of the nascent mRNA [47, 48], and could help to “license” productive elongation complexes by allowing time for recruitment of processing and elongation factors.
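The comparison of pol II signal in the gene body with signal at the 5’ end can be made concrete with a small Python sketch. This is only an illustration under stated assumptions: the per-base coverage values are invented toy data, the 300-base promoter window is an arbitrary choice, and the function name is hypothetical. It is not the escape index calculation used later in this thesis.

```python
def body_to_promoter_ratio(coverage, tss_index, promoter_window=300):
    """Mean gene-body signal divided by mean promoter-proximal signal.

    coverage        -- list of per-base pol II signal along one gene (toy data here)
    tss_index       -- 0-based, non-negative position of the transcription start site
    promoter_window -- bases downstream of the TSS treated as promoter-proximal
                       (an arbitrary illustrative choice)
    """
    promoter = coverage[tss_index:tss_index + promoter_window]
    body = coverage[tss_index + promoter_window:]
    if not promoter or not body:
        raise ValueError("gene is too short for the chosen promoter window")
    promoter_mean = sum(promoter) / len(promoter)
    body_mean = sum(body) / len(body)
    if promoter_mean == 0:
        raise ValueError("no promoter-proximal signal to normalize against")
    return body_mean / promoter_mean


# Toy profiles: same 5' signal, different gene-body signal.
paused_profile = [50.0] * 300 + [5.0] * 700
activated_profile = [50.0] * 300 + [25.0] * 700

print(body_to_promoter_ratio(paused_profile, 0))
print(body_to_promoter_ratio(activated_profile, 0))
```

A higher ratio for the second toy profile corresponds to the situation described above, in which activation increases pol II in the gene body relative to the 5’ end.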

The flux of pol II from the promoter-proximal region into the body of a gene is controlled by various mechanisms. The HIV-1 transactivator protein Tat stimulates elongation by pol II [46]. Without Tat, most polymerases that initiate from the HIV-1 LTR terminate prematurely shortly downstream of the TAR hairpin loop sequence in a manner resembling the SV40 late transcription unit. In the presence of Tat, pol II acquires the ability to extend transcripts all the way to the end of the provirus. These results suggested that Tat regulates transcription by an antitermination mechanism similar to that exerted by the bacteriophage lambda N protein [49], and in fact such a mechanism, dependent on Xrn2 activity, was recently demonstrated [50].

Tat can activate transcription when tethered to DNA in the promoter [51], and enhancers and promoter-bound chimeric transcription factors comprising activation domains fused to a DNA-binding domain can stimulate elongation [52]. A number of natural cellular activators stimulate elongation including heat-shock factor, NFkB, and c-myc [43, 53, 54]. Activation domains that enhance elongation and initiation, respectively, can synergize with one another and the most potent activation domains, such as Herpes virus VP16, can stimulate both initiation and elongation [55, 56].

Activators and cellular transcription factors stimulate pol II transit away from the promoter-proximal region through interaction with negative elongation factors. The ATP analogue 5,6-dichloro-1-β-D-ribofuranosylbenzimidazole (DRB) inhibits pol II chain elongation but not initiation [57]. The DRB-sensitivity-inducing factor (DSIF) subunits Spt4/5 were identified as a conserved pol II binding complex required for inhibition of elongation near 5’ ends [58]. A second negative-elongation factor, NELF, cooperates with DSIF to induce pol II pausing [59]. The counterpart to these negative factors is the positive transcription elongation factor b (PTEFb) [60], the cyclin-dependent kinase complex Cdk9-CyclinT1 that is specifically inhibited by DRB [61, 62]. DSIF, NELF and PTEFb are all components of the same control system, and a major function of PTEFb is to alleviate the negative effects of DSIF and NELF [63] by phosphorylating them both as well as the pol II C-terminal domain [64, 65] (Figure 1-4). This same mechanism, in which PTEFb phosphorylates DSIF and NELF, regulates elongation at many cellular genes including c-fos and NFkB targets [65, 66]. PTEFb occurs in multiple complexes with different protein and RNA subunits [67, 68] and there are likely to be multiple ways that it can be recruited to genes. These include binding directly to transcription factors [53] and chromatin components [69]. Gdown1 is a transcription factor that influences elongation by limiting termination by the factor TTF2 [11]. In Chapter III, I will discuss a role for TTF2 and associated termination factors in limiting productive elongation in mammalian cells.

Control of gene expression at the level of transcriptional elongation is recognized to be at least as important as control of the initiation step in pol II transcription. Important questions remain unresolved about the nature of promoter-proximally accumulated pol II. It is still not clear how many of these paused polymerases have backtracked and are destined ultimately to resume elongation and how many are destined for premature termination. In section 1.1 I discuss premature termination in detail, and in Chapters III and IV I will discuss the possibility of premature termination and CTD phosphorylation acting as distinct targets for regulation by controlled polymerase release into the body of the gene.

Figure 1-4: Promoter proximal pausing and transition into elongation. The negative elongation factors DSIF and NELF act to pause pol II following initiation. The transition into productive elongation is triggered by phosphorylation of the Spt5 subunit of DSIF, NELF and the serine 2 position of the pol II CTD (Ser2P). These phosphorylations are carried out by the CDK9 subunit of the positive elongation factor PTEFb. C-myc is one of a number of transcriptional activators known to recruit PTEFb to upregulate transcription at the elongation phase [70].

3’ end formation and termination

The final phase of the transcription cycle occurs after pol II has traveled the length of the gene body and transcription is terminated downstream of polyadenylation signals. The mechanisms of pol II termination vary depending on the type of transcript, and the CTD phosphorylation state is thought to play a role in determining which termination mechanism is used. In yeast, small nuclear RNAs (snRNAs), small nucleolar RNAs (snoRNAs), and cryptic unstable transcripts (CUTs) together constitute a class of short transcripts that terminate within 300-600 bp of the promoter by recruitment of the RNA binding proteins Nrd1 and Nab3, and the RNA helicase Sen1 [71-74]. This mechanism of termination differs from that utilized for protein coding genes, which involves recruitment of cleavage and polyadenylation machinery. One theory explaining how the elongation complex (EC) dictates the mechanism of termination proposes that distance from the promoter can be sensed by the CTD phosphorylation state, with Ser5 phosphorylation signaling for Nrd1-mediated termination and Ser2 phosphorylation signaling for cleavage/polyadenylation-coupled termination [75]. Mammalian systems terminate snRNA transcription by a different mechanism [76], and generate snoRNAs from introns [77], but there may be an as yet uncharacterized parallel to the Nrd1/Nab3/Sen1 termination system used at mammalian short noncoding or cryptic transcripts [78]. A genome-wide screen recently identified Nrd1 as a termination factor involved in cleavage/polyadenylation-coupled termination in C. elegans [79].

3’ end processing of protein coding genes coincides with termination, and requires both cis acting elements and trans acting factors that are conserved from yeast to humans [80] (Figure 1-5). The sequence element that directs 3’ processing for polyadenylated transcripts is the poly(A) signal [81], which in mammals consists of the hexanucleotide AAUAAA located ~10-30 nt upstream of the CA cleavage site and ~40-60 nt upstream of a U/GU rich region [82]. This poly(A) signal is recognized by the cleavage and polyadenylation specificity factor complex comprised of the subunits CPSF-30, CPSF-73, CPSF-100, CPSF-160, and Fip1 [83]. CPSF-160 directly recognizes the hexanucleotide AAUAAA, while CPSF-73 carries out the endonucleolytic cleavage at the polyadenylation site [84, 85]. The downstream cleavage product possesses an uncapped 5’ phosphate and is rapidly degraded [86]. The cleavage stimulation factor (CstF) subunits CstF-50, CstF-64, and CstF-77 are recruited to the cleavage complex by interactions with CPSF and the U/GU-rich element. Other components of the mammalian cleavage/polyadenylation apparatus include cleavage factor I (CFI), cleavage factor II (CFII containing Clp1 and Pcf11), poly(A) polymerase (PAP), and symplekin (Figure 1-5). A majority of these components have homologues in the yeast cleavage/polyadenylation machinery, which consists of the cleavage factors CFIA (containing Clp1p and Pcf11p), CFIB, and the cleavage-polyadenylation factor CPF. All of these factors are attractive termination factor candidates, since they localize to the region of pol II termination, and some are shown to be directly involved in termination in yeast, as discussed further in the next section.
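The spatial relationship between the AAUAAA hexamer and the cleavage site described above can be sketched in Python. The sketch is illustrative only: the toy sequence is invented, the function names are hypothetical, and the spacing is measured from the 3’ end of the hexamer to the cleavage position, which is an assumption about how the ~10-30 nt distance is counted. Only the canonical hexamer is considered, not variant signals.

```python
PAS_HEXAMER = "AAUAAA"  # canonical mammalian poly(A) signal


def find_pas_sites(rna):
    """Return 0-based start indices of every AAUAAA hexamer (DNA input is converted)."""
    rna = rna.upper().replace("T", "U")
    width = len(PAS_HEXAMER)
    return [i for i in range(len(rna) - width + 1) if rna[i:i + width] == PAS_HEXAMER]


def pas_for_cleavage_site(rna, cleavage_index, min_gap=10, max_gap=30):
    """Return hexamer starts whose 3' end lies min_gap..max_gap nt upstream of the site.

    Assumption: the gap is counted from the base after the hexamer up to, but not
    including, the base at cleavage_index.
    """
    hits = []
    for start in find_pas_sites(rna):
        gap = cleavage_index - (start + len(PAS_HEXAMER))
        if min_gap <= gap <= max_gap:
            hits.append(start)
    return hits


# Invented toy 3' region: hexamer, 15 nt spacer, CA cleavage site, then a GU-rich stretch.
upstream = "GGC"
spacer = "GUC" * 5
toy_rna = upstream + PAS_HEXAMER + spacer + "CA" + "G" * 40 + "UGUGUUUGUG"
cleavage_index = len(upstream) + len(PAS_HEXAMER) + len(spacer)

print(pas_for_cleavage_site(toy_rna, cleavage_index))
```

Tightening `min_gap` or `max_gap` shows how a hexamer that is too close to or too far from a candidate cleavage site would be excluded under this simple spacing rule.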

Figure 1-5: The core cleavage and polyadenylation machinery. Cleavage and polyadenylation involves both cis-acting elements and a large group of core trans-acting proteins. In this diagram, cis-elements are colored to match the trans-acting factors that bind them. A six nucleotide motif termed the poly(A) signal (PAS), canonically AAUAAA, is generally located 15-30 nt upstream of the cleavage site. Cleavage is enhanced by U- and GU-rich downstream sequence elements (DSEs) and upstream sequence elements (USEs). These sequence elements and the PAS are recognized by cleavage and polyadenylation specificity factor (CPSF) and cleavage stimulating factor (CSTF) complexes. The CPSF complex consists of CPSF1, which recognizes the PAS, as well as CPSF4, CPSF2, CPSF3, FIP1L1 and WDR33. The CSTF complex consists of CSTF1, CSTF2 and CSTF3 that are important for PAS selection. Poly(A) polymerase (PAP) catalyzes addition of the poly(A) tail. Other important factors include the scaffold protein symplekin, the cleavage factors Im (CFIm) and CFIIm, and the poly(A)-binding protein (PAB) [87].

The only known protein-coding transcripts that do not undergo polyadenylation are metazoan histone mRNAs. 3’ processing of histone mRNA is directed by conserved histone stem-loop and purine-rich histone downstream sequence elements [88]. Like polyadenylated mRNA, histone mRNA 3’ processing involves recruitment of CstF and CPSF subunits, as well as endonucleolytic cleavage by CPSF-73 [89, 90]. Unlike polyadenylated mRNA, mature histone mRNA 3’ ends contain a stem loop structure (HSL) bound by the stem-loop-binding protein (SLBP). The HSL/SLBP interaction acts in place of polyadenylation to promote stability and export of the message [91].

Mammalian mRNA 3’ end processing and pol II termination both require the poly(A) signal [92-94]. The poly(A) signal directs recruitment of a large complex of pre-mRNA processing factors that mediate coupled cleavage of the nascent transcript and addition of a poly(A) tail, both of which are required for export, stability and translation of the message [95, 96]. After transcribing the poly(A) signal, pol II pauses and terminates stochastically, within a window of roughly 2000 bases [97, 98]. Mutation of the poly(A) signal leads to defects in both 3’ end processing and termination, suggesting that these processes are tightly linked [92-94]. The competing models that attempt to explain this link are the “torpedo” and the “allosteric” termination models (Figure 1-6). A “hybrid” model has also been proposed that integrates aspects of both previous models.

1.1B Termination models

3’ end formation linked

The predominating model for pol II termination is the “torpedo” model, which proposes that an RNA cleavage event provides an entry site for an RNA 5’-3’ exonuclease, Rat1 in yeast and Xrn2 in mammals (Figure 1-6B). This exonuclease is then thought to degrade the nascent transcript until it contacts elongating pol II and physically removes it from the template [94]. Experiments supporting this model show that inhibition of the yeast 5’ to 3’ exonuclease Rat1 and its human homologue Xrn2 can disrupt termination. RNAi knockdown of Xrn2 causes transcriptional read-through of a β-globin-containing plasmid in nuclear run-on assays [100]. It has not been shown directly that Xrn2 is required for termination of stably integrated constructs or chromosomal genes. Several studies that contradict the torpedo model, and thereby lend support to allosteric or hybrid models, suggest that pre-mRNA 3’ end cleavage is not required for pol II termination [101, 102], and that 5’ to 3’ degradation of cleaved transcripts is not sufficient for pol II termination [103-105].

Figure 1-6: pol II termination models. A) The allosteric model. Conformational changes in the elongation complex occur after transcription of the PAS, leading to destabilization either by eviction of anti-termination factors or recruitment of termination factors. B) The torpedo model. Cleavage downstream of the PAS allows entry of the 5’ to 3’ exonuclease (Xrn2 in mammals and Rat1 in yeast) which degrades the nascent transcript and is thought to evict pol II from the DNA template [99].

The allosteric or “anti-terminator” model proposes that transcription of the poly(A) signal induces a conformational change in the pol II elongation complex, rendering it prone to terminate [93] (Figure 1-6A). This change may result from recruitment of cleavage/polyadenylation factors leading to subsequent eviction of anti-termination factors or recruitment of dedicated termination factors. There are several studies that support this model. The yeast 3’ end processing factor Pcf11 has been shown to promote disruption of the elongation complex in vitro, by contacting both pol II CTD and the nascent RNA [106]. Other studies in yeast implicate the transcriptional co-activator Sub1 [107] and the SR-like protein Npl3 [108] as anti-termination factors that are overcome by the activity of Rna15 upon transcription through the poly(A) site. A recent study showed CTD phosphorylation at Ser7P limited termination factor recruitment in gene bodies [109]. 3’ end displacement of elongation factors, such as Spt5, has also been implicated in termination [110]. The discrepancies between evidence for the allosteric and torpedo termination models imply that there is likely a network of 3’ end processing factors acting in concert with 5’ to 3’ exonucleases to induce termination, as predicted by the hybrid model.

The hybrid model of termination stems from two observations. 1) While 5’ to 3’ degradation of cleaved nascent RNA promotes termination, it is not sufficient and requires the recruitment of 3’ end processing factors such as Pcf11. 2) Recruitment of Pcf11 requires the presence of the 5’ to 3’ exonuclease Rat1. This reciprocal requirement of factor recruitment implies that neither torpedo action nor allosteric changes in the EC alone are sufficient for termination [103].

The above models occasionally attempt to explain the connection between 3’ end processing and termination by singling out 3’ processing factors as termination factors. Well-characterized termination factors associated with 3’ end processing have been identified in various eukaryotic systems.

A majority of the factors shown to be directly involved in poly(A) site pol II termination have been identified in S. cerevisiae. Disrupting the activity of Rna14, Rna15, and Pcf11, factors involved in transcript cleavage at the poly(A) site, was shown to cause termination defects on CYC1 [110, 111]. Termination defects were also observed in cells lacking the chromatin remodeler Chd1 [112]. CPF subunits Yhh1p (CPSF-160 in humans) and Ssu72 were shown to be involved in termination by transcriptional run-on analysis [113]. As previously mentioned, depletion of the yeast 5’-3’ exonuclease Rat1 inhibits but does not completely prevent termination [114].

A genome-wide RNAi screen for termination factors in C. elegans revealed a number of 3’ end processing factors including homologues of the cleavage and polyadenylation factors Cstf1, CPSF1 and CPSF4. This screen did not implicate 5’ to 3’ exonucleases as predicted by the torpedo model, although it did identify CIDS-1, a homolog of the yeast 3’ processing factor Rtt103 shown to copurify with Rat1 [114]. Recruitment of the identified 3’ processing factors to the EC may render pol II competent for termination, lending support to the allosteric model. This screen also revealed novel factors not previously shown to be involved in either poly(A) dependent 3’ end processing or termination, including the known snRNA/snoRNA termination factor Nrd1 and the putative splicing factor SRp20 [79].

Many of the termination factors identified in yeast and C. elegans have homologues in mammalian systems, but for most of these, a role in mammalian termination has not been established. Factors for which a role in mammalian termination has been directly demonstrated in plasmid-based systems include the Rat1 homologue Xrn2 [100], the splicing factor p54nrb [115], the cleavage/polyadenylation factor Pcf11 [116], the Sen1 helicase homologue senataxin (SETX) [117], and the mitotic termination factor TTF2 [118]. A recent study showed that RNA/DNA hybrid structures (R-loops) are especially prevalent over G-rich pause sites positioned downstream of poly(A) signals, and that SETX in association with Xrn2 resolves these R-loop structures to allow access for 5’-3’ degradation downstream of poly(A) cleavage sites [119]. The complexities of how the many 3’ end processing factors and termination factors act in combination to terminate pol II in a gene specific manner are still the subject of investigation.


Premature termination

The state of promoter proximally accumulated pol II is the subject of debate. In vitro pulse chase experiments showed that pol II can pause close to the start site and then resume elongation [28]. Since then, the most popular interpretation of in vivo polymerase mapping studies has been that they result from a similar “promoter-proximal pausing” phenomenon. That at least some promoter-proximal polymerase can resume elongation is demonstrated by nuclear run-on experiments, but the possibility that some fraction of the promoter-proximal polymerases terminate prematurely and never enter the productive elongation phase cannot be eliminated (Figure 1-3). The evidence for premature termination is quite clear for the SV40 late and HIV viral genes [46, 120], but it is less compelling for cellular genes. Prematurely terminated RNAs are a major product of c-myc transcription in microinjected Xenopus oocytes, but the physiological relevance of this phenomenon remains unproven [120]. Short (20–90 bases) transcription start site-associated (TSSa) sense and antisense transcripts present at very low levels in the nucleus were detected by high-throughput RNA sequencing [121], and these transcripts are possible products of termination within this region. Premature termination was reported to occur very infrequently near promoters in Drosophila S2 cells giving rise to TSSa RNAs quickly degraded by the 3’ – 5’ nuclear exosome, but a termination mechanism was not proposed [122]. If premature termination by the torpedo-model is common for cellular promoter-proximally paused pol II, it would require a mechanism such as Drosha cleavage or decapping to provide the 5’ monophosphate substrate for Xrn2 enzymatic activity.


Several recent studies have provided evidence for torpedo mediated premature termination of promoter proximal pol II. Wagschal et al. showed that XRN2 co-localizes with RRP6 (a 3’ to 5’ exonuclease) and SETX at the HIV-1 promoter-proximal region and initiates premature pol II pausing and termination following Drosha mediated cleavage of a stem-loop structure in the nascent transcript [50]. Davidson et al. showed that Xrn2 co-transcriptionally degrades nascent transcripts when splicing and 3’ end processing are disrupted. These transcripts were stabilized by Dcp2 inhibition, and both Xrn2 and Dcp2 co-localize to these loci by ChIP, suggesting that the Xrn2 substrate is provided co-transcriptionally by nuclear decapping [123]. In yeast, unmethylated or uncapped mRNAs are processed by Rai1, the cofactor of the Xrn2 ortholog Rat1, and their transcription is prematurely terminated by Rat1 [124]. I will discuss mRNA decapping further in section 1.1D, and in Chapter III I describe a mechanism whereby Xrn2 cooperates with nuclear decapping factors to elicit widespread premature termination on human genes.

1.1C Xrn2 and associated proteins

Xrn2 (Rat1 in yeast) is part of a conserved family of 5’-3’ exonucleases (e.g. Arabidopsis XRN2/3 and Trypanosoma brucei XRND) that are localized in the nucleus and nucleolus, where they are involved not only in transcription termination by pol II (as discussed in 1.1B, Figure 1-6B) but also in termination by pol I [125]. Nuclear XRNs are also involved in various biological functions in addition to transcription termination, and they are essential for viability in S. cerevisiae, S. pombe, D. melanogaster, C. elegans, and A. thaliana [126-128]. In Arabidopsis, the nuclear XRNs independently suppress post-transcriptional gene silencing (PTGS) [125, 129], and an xrn2/xrn3 double mutation in Arabidopsis confers drought tolerance by overexpression of drought-responsive genes [130, 131]. In eukaryotes RNA pol I transcribes 18S, 5.8S and 25S rRNAs as single long precursor molecules, and Xrn2 is responsible for trimming the 5’ ends of these precursors [132]. Xrn2 also facilitates telomere elongation through degradation of Telomeric Repeat-containing RNA (TERRA) [133, 134].

Xrn2 often occurs in a complex with different interacting factors involved in various processes. In yeast, the Xrn2 ortholog Rat1 forms a stable complex with the co-factor Rai1 and this interaction stimulates Rat1 activity. Rai1 is involved in surveillance of mRNA capping and has pyrophosphohydrolase activity that removes the 5’ end of aberrantly capped mRNA [124]. Rtt103 is a yeast protein that binds pol II Ser2P CTD and interacts with Rat1-Rai1 to recruit this complex to pre-mRNA 3’ ends and promote pol II termination [104, 114, 135]. In human cells, Xrn2 interacts with P54nrb/PSF, a complex involved in pre-mRNA splicing and 3’ end formation [99, 115]. In this thesis I present a list of novel Xrn2 interacting proteins, including termination and decapping factors, revealed by proteomic analysis of Xrn2 immunoprecipitates from HeLa nuclear extracts.

Xrn2 has a potential role in degrading transcripts produced near promoters. In mammalian cells, a species of small, uncapped RNAs between 18 and 24 nt in length was mapped around the TSS, corresponding to the region of promoter proximally accumulated pol II [121, 136]. These RNAs accumulate in both the sense and antisense orientations within ~40 bp of the TSS [137]. Double knockdown of Xrn1 and Xrn2 in HeLa cells by siRNA both lengthened and stabilized TSSa RNA fragments [138], implying that Xrn2 is responsible for degrading these transcripts. This study did not propose a mechanism whereby a substrate for Xrn2 is provided on TSSa RNA. In this thesis I present evidence that Xrn2 is involved in termination of transcription precisely within the sense and antisense regions where TSSa RNAs are detected, and that this termination is decapping factor mediated.

1.1D Decapping

Removal of the 5’ methylated cap of mRNA, referred to as “decapping”, is a key step in the mRNA decay pathway that occurs frequently in the cytoplasm to regulate mRNA translational output and turnover. This reaction produces an m7GDP and 5’ monophosphate mRNA which provides a substrate for 5’ to 3’ degradation by Xrn exonucleases. Decapping is carried out by a complex of proteins (Figure 1-7), some of which can localize both to the cytoplasm and the nucleus. While decapping is thought to be a largely cytoplasmic event, recent studies, including work presented in this thesis, have described roles for nuclear decapping. In this section I will describe the decapping complex and what is currently known about cellular localization and biological functions of several components.

Dcp2 is the most widely conserved eukaryotic decapping enzyme. Originally identified as a decapping cofactor in Saccharomyces cerevisiae [139], it was later shown to have intrinsic decapping activity not only in budding yeast, but also in Caenorhabditis elegans and humans [140-143]. Dcp2 belongs to a family of enzymes that hydrolyze nucleoside diphosphates linked to other moieties via a conserved nucleotide diphosphate X (NUDIX) motif [139, 144]. Dcp2 also contains an N-terminal box A element, which promotes binding to the activator Dcp1, as well as a C-terminal box B element within the NUDIX motif, which is important for RNA binding and decapping activity [145, 146]. Substrates for Dcp2 activity include m7G-capped RNA, m2,2,7G-capped RNA and unmethylated capped RNA, but activity on unmethylated capped RNA is low [148]. Dcp2 recognizes mRNA substrates by simultaneously interacting with the cap and at least 12 nucleotides of the RNA body [149]. Dcp2 is thought to localize mainly in the cytoplasm in visible foci termed P-bodies [150]. However, Dcp2 was detected in the nucleus [151, 152] and was localized directly on genes by ChIP-qPCR [123].

Figure 1-7: mRNA decapping complex. Dcp2 is the enzyme responsible for removal of the 5’ cap of mRNA which normally occurs in the cytoplasm and allows entry of the cytoplasmic 5’ to 3’ exonuclease Xrn1. Dcp2 frequently occurs in a complex with the cofactor Dcp1, and a number of proteins associate with the Dcp1–Dcp2 complex, including Rck, Pat1, the Lsm1-7 complex, and the enhancers of decapping Hedls and Edc3 [147].

Dcp1 is an essential cofactor of Dcp2, and was originally described as the catalytic decapping protein. S. cerevisiae Dcp1 mutation causes a severe decapping deficiency [148], and in vitro activity of recombinant Dcp2 is greatly stimulated by Dcp1 [141]. Dcp1 directly interacts with Dcp2 and this complex is conserved throughout eukaryotes [149, 153]. Dcp1 has also been shown by two-hybrid analysis to interact with Xrn1, as well as the nuclear termination factor TTF2 [154]. In this thesis I show that Dcp1 co-localizes with Xrn2, TTF2 and accumulated pol II at gene promoters, and also that Dcp1 activity can be immunoprecipitated from HeLa nuclear extracts.

Other proteins associate with the Dcp1–Dcp2 decapping complex, including Pat1, Rck/p54, the Lsm1-7 complex, and the well characterized enhancers of decapping Hedls, Edc1, Edc2, Edc3, and Edc4 (reviewed in [147]) (Figure 1-7). The first enhancers of decapping characterized, Edc1 and Edc2, were discovered in budding yeast and were shown to bind directly to RNA to stimulate decapping in vitro [155]. While no homologs of Edc1 and Edc2 have been identified in other eukaryotes, the enhancer Edc3, which interacts directly with Dcp1-Dcp2, is highly conserved [156] (Figure 1-7). Deletion of Edc3 in yeast reduces the rate of decapping [157], and yeast Edc3 stimulates the decapping reaction in vitro [158, 159]. Edc3 has not been previously detected in the nucleus. Work in this thesis shows by immunoprecipitation followed by mass spectrometry or western blot analysis that Edc3 associates with the nuclear exonuclease Xrn2.

1.2 Regulation by co-transcriptional phosphatase activity

As previously discussed, the CTD of the Rpb1 subunit of pol II contains tandem repeats of the consensus sequence YSPTSPS, and repeat number varies in a species specific manner (26 in yeast and 52 in humans) [160]. The CTD is essential for viability but not for pol II catalytic activity, suggesting it serves a role outside of RNA synthesis [161]. The dynamic structural flexibility of the CTD makes it highly versatile for presenting interfaces that recruit many different binding partners. Adding complexity to the possible timing and specificity of these binding interactions is the fact that the consensus heptad can be post-translationally modified at each of its seven residues [162-164]. The varying phosphorylations, glycosylations and isomerizations of these residues constitute a “CTD code” that can signal stage specific interactions between transcribing pol II and transcriptional activators or repressors, pre-mRNA processing factors, chromatin modifiers and remodelers, and elongation and termination factors [165]. Studies utilizing monoclonal antibodies specific to the different CTD heptad modifications have elucidated how the CTD phosphorylation state changes during the transcription cycle. The first of these antibodies developed were against CTD Ser5P and Ser2P, and they helped describe the characteristic transition between high Ser5P and high Ser2P that proceeds during the transcription cycle and facilitates important co-transcriptional events (Figure 1-1). Other CTD phosphorylations described include Ser7P, Y1P and T4P [166-168]. In section 1.1 I described the Ser5P modification that occurs during PIC formation and which is important for capping and early elongation. In this section, I will review dynamic phosphorylation at the CTD Ser2 position, and its functional significance in elongation and 3’ end formation.

1.2A Dynamic phosphorylation of RNA pol II CTD Ser2

The typical profile of CTD phosphorylation on eukaryotic protein coding genes follows an inverted Ser5P to Ser2P accumulation profile, where Ser5P is high near the TSS then decreases towards gene 3’ ends, and Ser2P is low near the TSS and increases as pol II travels through the gene body (Figure 1-1). Several kinases are capable of phosphorylating CTD Ser2. In S. cerevisiae, Bur1 and Ctk1 are two enzymes that share this capability [169-171]. The S. pombe orthologs of Bur1 and Ctk1 are Cdk9 and Lsk1, respectively [171-173]. In addition to CTD Ser2, Bur1 and Cdk9 also phosphorylate the Spt5 subunit of DSIF at a C-terminal repeat domain (CTR) similar to the pol II CTD, and this phosphorylation converts Spt5 from a negative to a positive elongation factor [169, 174]. As discussed in section 1.1, metazoan DSIF recruits the negative elongation factor NELF, which is also phosphorylated by the Cdk9 subunit of PTEFb to allow pol II to elongate away from the promoter proximal pause region [67, 175, 176] (Figure 1-4). Other recently described CTD Ser2 kinases include Cdk12, Cdk13 and Brd4 [177-179]. While the role for Cdk13 remains unclear, Cdk12 is thought to be important for the DNA damage response through specific gene activation [180, 181]. Brd4 recruits PTEFb, but has been shown to phosphorylate Ser2 in the absence of PTEFb on certain genes or in cell lines where PTEFb expression is low [179, 182]. Ser2P accumulates downstream of the TSS within gene bodies (Figure 1-1), and is correlated with the recruitment of splicing and elongation factors. Spt6 is one such elongation factor that contains a tandem SH2 domain shown to directly bind Ser2P. Spt6 is involved in a number of co-transcriptional events including recruitment of export factors and chromatin remodelers [183]. Set2 is a histone methyltransferase that is recruited and stabilized by CTD Ser2P [184-186]. Set2 confers H3K36 trimethylation that is characteristic of productively elongating pol II on active genes [187]. Ser2P is also important for H2B monoubiquitination (H2Bub1) and 3’ end formation of histone mRNAs [188]. Ser2P peaks downstream of the PAS on most genes and is therefore localized to regions of 3’ end processing factor recruitment. The termination factors Rtt103 and Pcf11 have been shown to directly bind Ser2P [189, 190]. It has been speculated that Ser2P may be involved in the recruitment of other 3’ end processing machinery such as cleavage and polyadenylation factors. ChIP-seq experiments supporting this idea show that Ser2P peaks correspond to 3’ end processing factor peaks near gene poly(A) sites [109, 191, 192]. In Chapter IV I will address the question of whether changes in Ser2P profiles result in changes in recruitment of the cleavage stimulation factor Cstf77. The removal of Ser2P is thought to be primarily carried out by the CTD phosphatase Fcp1 (Figure 1-1). The predominating model is that Fcp1 acts post-transcriptionally to remove Ser2P and restore the pool of hypophosphorylated pol II for initiation of further rounds of transcription [193]. Studies have shown that Fcp1 acts co-transcriptionally in yeast to limit the amount of Ser2P within gene bodies [171]. This thesis will present evidence that Fcp1 plays a similar role in human cells, and that limiting Ser2P co-transcriptionally influences pol II elongation as well as mRNA 3’ end formation.


1.2C Co-transcriptional phosphatase activity

A number of enzymes are involved in the removal of CTD phosphorylation marks (Figure 1-1). Specific removal of Ser5P can be accomplished in vivo by Ssu72, and mammalian small CTD phosphatases such as Scp1 [193, 194]. In yeast Ser5P erasure is carried out primarily by Ssu72 and this activity depends on CTD proline isomerization [195, 196]. Rtr1 is another Ser5P specific phosphatase identified in S. cerevisiae, and it is thought to be redundant with Ssu72 [197]. The human homologue of Rtr1, RNAPII-associated protein 2 (RPAP2) [198], also dephosphorylates Ser5P [199]. RPAP2 is recruited to snRNA transcripts by CTD Ser7P and is important for correct snRNA 3’ end processing [199]. Fcp1 has been shown to have phosphatase activity against CTD at Ser5P, but is believed to preferentially act on Ser2P.

Fcp1 is considered the primary CTD phosphatase conserved throughout eukaryotes [12]. Fcp1 binds pol II and the RAP74 subunit of TFIIF both through its central and C-terminal domains, which stimulates its activity [200]. Fcp1 can dephosphorylate CTD at Ser5P and Ser2P, but has varying affinities/specificities for these marks in different systems. Fcp1 purified from S. cerevisiae dephosphorylates Ser5P in vitro and not Ser2P [201]. Contradictory evidence shows Fcp1 from the fission yeast S. pombe is 10-fold more active on Ser2P than Ser5P [202], and Fcp1 mutation in budding yeast increases Ser2P but only modestly increases Ser5P [171]. Fcp1 from mammalian cells appears to have roughly equal activity on Ser2P and Ser5P in vitro [203]. In this thesis I will provide evidence that Fcp1 dephosphorylates both Ser2P and Ser5P on human genes in a cellular context.


Fcp1 is believed to act largely post-transcriptionally, and the main function of its phosphatase activity is in replenishing hypophosphorylated pol II that is competent for PIC formation. There is evidence, however, that co-transcriptional Fcp1 activity may be involved in regulating pol II elongation. Fcp1 mutation in S. cerevisiae increases both pol II elongation and Ser2P on genes, and Fcp1 inhibits Tat-mediated activation of HIV-1 transcription, which is regulated at the level of elongation [171, 204]. If Fcp1 activity limits Ser2P on genes, it could be considered as an elongation antagonist, since it erases the phosphorylation mark added to the CTD by the positive elongation factor PTEFb. Elevated levels of Ser2P CTD in the absence of Fcp1 activity could also give rise to early recruitment of 3’ end processing factors and affect poly(A) site choice. In this thesis I will present evidence that Fcp1 localizes to the 5’ ends of human genes where it acts to limit both Ser5P and Ser2P on gene bodies. This co-transcriptional Fcp1 activity influences pol II elongation, and depletion of Fcp1 results in global shifts in poly(A) site choice.

1.2D Alternative polyadenylation

As discussed in section 1.1, 3’ end formation of most protein coding mRNAs involves the transcription of the PAS, recruitment of cleavage and polyadenylation machinery, endonucleolytic cleavage of the message at the cleavage site, and addition of an extended poly(A) tail. Recently it has been demonstrated that the majority of human transcripts give rise to multiple detectable sites of 3’ end formation [205]. This alternative polyadenylation (APA) allows the production of multiple isoforms that vary in their stability, localization, translational output and even function. APA can arise from events such as intronic APA, internal APA, and alternative terminal exon APA [87]. The most common form of APA, known as tandem 3’ UTR APA, is splicing independent and involves the choice of alternative cleavage sites within a continuous 3’ UTR. Tandem 3’ UTRs can lead to loss of cis regulatory elements such as AU-rich elements (ARE) or microRNA binding sites contained in 3’ UTRs, and play a role in gene regulation by altering message stability or localization [206]. Message stabilization by shortening of 3’ UTRs has been demonstrated in activated T lymphocytes, neurons, embryonic and tumor cells, and it has been suggested that shorter 3’ UTRs may be a hallmark of unchecked proliferation [206-209]. An open question in the field is how the transcription machinery chooses a given PAS in response to environmental or developmental signals, and what co-transcriptional events influence this choice.

One way APA is thought to be regulated is through varying 3’ end processing machinery expression levels. 3’ end processing machinery was shown to be downregulated during mouse embryonic development and this led to progressive 3’ UTR lengthening [208]. 3’ end processing factors were upregulated, leading to widespread 3’ UTR shortening, during reprogramming into stem cells [210] or the transition from growth arrest to proliferation [211]. Another example of this type of APA regulation is at the immunoglobulin M (IgM) gene. Low concentrations of the cleavage stimulation factor 2 (CSTF2) in early stages of B cell development favor cleavage at the canonical and stronger IgM poly(A) site, whereas high CSTF2 concentrations in activated B cells induce usage of the weaker proximal site [212, 213]. High cleavage factor levels do not necessarily correlate with usage of proximal poly(A) sites. The cleavage factors CFIm25 and CFIm68 were shown to play a role in regulating cleavage at proximal sites [87]. Given the enrichment of CTD Ser2P peaks on human genes within gene 3’ regions corresponding to 3’ end processing machinery recruitment, it is reasonable to speculate that Ser2P mediated recruitment of these factors may be a mechanism regulating APA.

1.3 Specific questions

Understanding the various steps of the transcription cycle in detail is important for understanding the regulation of gene expression at the most fundamental level, and provides insights into the molecular mechanisms underlying human disease. This thesis focuses on mechanisms that influence pol II transit across protein coding genes, and the state of elongating pol II during promoter proximal accumulation, elongation and termination. I will address the following specific questions: 1) What are the various factors that are important for pol II termination? 2) What is the role of premature termination in limiting the flux of pol II from the promoter proximal region into gene bodies? 3) What is the role of CTD phosphorylation in elongation and mRNA 3’ end formation?


CHAPTER II

MATERIALS AND METHODS

2.1 Cell lines and growth conditions

HeLa and HEK293-Flp-in T-REX-glob cells were grown in DMEM medium supplemented with 10% fetal bovine serum and 1% pen/strep at 37°C and 5% CO2. HEK293-Flp-in T-REX-glob cells contain a stably integrated hygromycin-resistance gene and CMV β-globin reporter that is not relevant to these studies.

2.2 Antibodies

Rabbit anti-pan pol II CTD was described previously [21]. Anti-Xrn2 was raised in rabbits against GST-Xrn2 (a.a. 402-537) and affinity purified. Affinity purified rabbit anti-GFP [214], rabbit anti-Dcp1a, and anti-Edc3 have been described [215, 216]. Anti-Dcp2 and anti-Aly (11G5, THOC4) were gifts of M. Kiledjian and G. Dreyfuss. Rabbit anti-TTF2 was from Proteintech and affinity purified sheep anti-TTF2 was a gift of D. Price. Anti-Fcp1 and anti-Cstf77 were raised in rabbits and anti-Cstf77 was affinity purified in the lab. Affinity purified anti-Ser2P rat monoclonal (3E10) was purchased from Chromotek. Affinity purified anti-NELF-E (H-140) is a rabbit polyclonal IgG from Santa Cruz Biotechnology.

2.3 Immunopurification and mass spectrometry

Ben Erickson conducted Xrn2 immunopurification as follows: Rabbit anti-Xrn2 and anti-GFP control antibodies were affinity purified and coupled to Amino-Link resin (Pierce) at 1 mg/ml and equal amounts were used for precipitation from HeLa nuclear extract in the presence of RNase A (20 mg/ml) in buffer D. IP’ed material was fractionated by 4-15% SDS-PAGE and each lane was cut into 14 bands. Proteins were reduced, alkylated and trypsin digested in the gel, extracted and analyzed by LC-MS/MS on two analytical platforms: ThermoFisher LTQ XL and LTQ-FT Ultra. Results for the proteins most enriched in the Xrn2 IP relative to the GFP control by number of assigned spectra are listed in Table A1-2. Total assigned spectra were 22346 and 29540 for the anti-GFP, and 16776 and 23442 for the anti-Xrn2 LTQ and FT analyses, respectively.
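Because the total number of assigned spectra differs between the anti-Xrn2 and anti-GFP samples, one way to compare them is to scale each protein's spectral count by the sample total before taking a ratio. The sketch below illustrates that idea only. The normalization, the pseudocount, the function name and the per-protein counts are assumptions for illustration (the protein names and counts are invented placeholders, not data from Table A1-2); only the two LTQ totals are taken from the text above.

```python
def spectral_enrichment(ip_counts, ctrl_counts, ip_total, ctrl_total, pseudocount=1.0):
    """Rank proteins by the ratio of normalized spectral counts (IP / control).

    A pseudocount avoids division by zero for proteins absent from the control.
    """
    enrichment = {}
    for protein, ip in ip_counts.items():
        ctrl = ctrl_counts.get(protein, 0)
        ip_frac = (ip + pseudocount) / ip_total
        ctrl_frac = (ctrl + pseudocount) / ctrl_total
        enrichment[protein] = ip_frac / ctrl_frac
    return sorted(enrichment.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical placeholder counts; totals are the LTQ values quoted in the text.
xrn2_ip = {"protein_A": 40, "protein_B": 5}
gfp_ip = {"protein_A": 1, "protein_B": 5}
ranked = spectral_enrichment(xrn2_ip, gfp_ip, ip_total=16776, ctrl_total=22346)
for name, ratio in ranked:
    print(name, round(ratio, 2))
```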

2.4 RNAi-mediated knock-downs

pLKO.1-puro shRNA lentiviruses (Open Biosystems) were used to infect HEK293-Flp-in T-REX-glob cells. For Xrn2:TTF2 double knockdown, a neomycin resistance marker was cloned in place of the puromycin resistance marker in pLKO.1-puro TRCN0000049900 targeting Xrn2 and knockdowns were verified by western blot. shRNA lentiviruses are described in detail in Table A1-1.

2.5 RT-PCR

RNA was extracted from knockdown or control lines using Trizol (Invitrogen). RNA was polyadenylated and reverse transcribed with a mixture of oligo-dT and random primers using the miSCRIPT kit (Qiagen). Real-time PCR quantification on Roche LC-480 was carried out using SYBR green reagent (Invitrogen) and primers specific to HSP90AA1 and UBA52 (HSP90AA1 F: 5’-TGGTTTCTCTCAAGGACTACT-3’, HSP90AA1 R: 5’-GGGCTCAATCATATAGATCACT-3’, UBA52 F: 5’-CAGCTTGCCCAGAAATACA-3’, UBA52 R: 5’-CTTCAAGGAAAGAACCACCTTA-3’). Results were normalized to the mitochondrial COX1 mRNA control (F: 5’-ACTAACAGACCGCAACCTCAA-3’, R: 5’-AGGATAAGAATATAAACTTCAGGGTGAC-3’).
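Normalization of real-time PCR results to a reference transcript such as COX1 is commonly expressed with the comparative Ct calculation. The sketch below shows that general calculation under the assumption of equal, near-perfect amplification efficiency for both primer pairs; the function name and Ct values are illustrative, and the text above does not state that this exact formula was used.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl, efficiency=2.0):
    """Comparative Ct estimate of target abundance in a sample relative to a control.

    ct_target / ct_ref: Ct of the target and reference (e.g. COX1) in the sample.
    ct_target_ctrl / ct_ref_ctrl: the same measurements in the control line.
    efficiency: amplification factor per cycle (2.0 assumes perfect doubling).
    """
    delta_sample = ct_target - ct_ref
    delta_control = ct_target_ctrl - ct_ref_ctrl
    return efficiency ** -(delta_sample - delta_control)


# Illustrative Ct values: the target appears two cycles later in the knockdown
# after reference normalization, i.e. roughly a four-fold reduction.
print(relative_expression(ct_target=24.0, ct_ref=18.0, ct_target_ctrl=22.0, ct_ref_ctrl=18.0))
```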


2.6 ChIP, ChIP-Sequencing, mapping and analysis

Cells were crosslinked in monolayer with 1% formaldehyde and harvested in complete RIPA buffer. Chromatin shearing was done at 4°C in the Bio-Ruptor sonicator for 3 x 15 minute cycles. Immunoprecipitations from 2 mg of cross-linked extract, using 2-5 µg of appropriate antibody with 30 µl preblocked protein-A Sepharose, were processed for Illumina library construction. Real-time PCR quantification on Roche LC-480 was carried out using SYBR green reagent (Invitrogen) and primers specific to the GAPDH gene. Illumina adaptors were ligated and libraries were amplified using TruSeq indexed primers. Fragments were size selected by agarose gel or AMPure bead purification (Beckman Coulter). Libraries were sequenced on the Illumina Genome Analyzer IIx and HiSeq platforms. Single-end 34 base reads (after removing barcodes) or 50 base reads (TruSeq index demultiplexed) were mapped to the UCSC hg18 (Mar. 2006) or hg19 (2009) genome assemblies with Bowtie version 0.12.5 [217]. We generated bed and wig profiles using 50 bp bins and 200 bp windows, assuming a 180 bp fragment size shifting effect. Results were viewed with the UCSC genome browser, Integrated Genome Browser (IGB), or the R statistical software package.

We determined the central position of each ChIPed DNA fragment and calculated its position relative to the TSS. Relative frequencies are defined as read counts per 50 bp bin divided by the total number of aligned reads in all bins. The y-axis represents the proportion of counts contained in each bin. Log2 values of the shScramble control relative frequencies were subtracted from log2 values of the relative frequencies in each bin of the experimental samples. Peaks of enrichment of Xrn2, Dcp1a and TTF2 in HeLa cells were mapped relative to an input DNA background using the HOMER peakfinder (v3.2) [218] with default options except that fragment length was set to 200bp (-fragLength 200 -inputFragLength 200).
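The relative frequency and log2 comparison described above can be written out as a short sketch. The function names and the small pseudocount (added only to avoid taking the log of zero in empty bins) are assumptions and are not taken from the original analysis.

```python
import math


def relative_frequencies(bin_counts):
    """Proportion of all aligned reads that falls in each 50bp bin."""
    total = sum(bin_counts)
    return [count / total for count in bin_counts]


def log2_ratio(experiment_counts, control_counts, pseudocount=1e-9):
    """Per-bin log2(experiment) minus log2(control) of relative frequencies."""
    experiment = relative_frequencies(experiment_counts)
    control = relative_frequencies(control_counts)
    return [
        math.log2(e + pseudocount) - math.log2(c + pseudocount)
        for e, c in zip(experiment, control)
    ]


# Hypothetical counts for three bins: TSS-proximal, intermediate, distal.
knockdown_bins = [1, 1, 2]
scramble_bins = [2, 1, 1]
ratios = log2_ratio(knockdown_bins, scramble_bins)
```

With these made-up counts the proximal bin has a negative log2 ratio and the distal bin a positive one, which is the pattern expected when polymerase shifts away from the start site.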

Heat map plots of pol II, Dcp1a, Xrn2 and TTF2 were determined by Dr. Hyunmin Kim in the target range 1kb upstream to 2kb downstream of the TSS of RefSeq genes. Reads per bin per million bases (RPBM, bin size = 50bp) were determined by normalizing each sample to total aligned reads (in millions). Genes were sorted by pol II density. We defined the Escape Index (EI) to measure the flux of pol II from the promoter region into the body of a gene. EI corresponds to

$$\mathrm{EI} = \log_2\left(\frac{\text{body\_density}}{\text{promoter\_density}}\right)$$

where body_density = # tags per base in the range between +301 bases from the TSS and the poly(A) site, and promoter_density = # tags per base in the range between -30 and +300 bases from the TSS. EI changes (ΔEI) in knockdown cells relative to the shScramble control were compared by one-sample t-test for each gene, assuming as a null hypothesis a normal distribution fitted to the ΔEI of the untreated parent relative to the shScramble control. We obtained a cut-off threshold for the false discovery rate (FDR) by correcting for multiple p values [219]. ChIP-seq data sets are deposited at GEO accession GSE36185.
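As a hedged illustration of the Escape Index, the sketch below computes EI for one gene from tag positions given relative to the TSS, using the -30 to +300 and +301 to poly(A) ranges stated above. It also includes a Benjamini-Hochberg adjustment as one common way to control the FDR; the method behind [219] is not named in this section, so that choice is an assumption, as are all function names and the example values. The per-gene one-sample t-test is not reproduced here.

```python
import math


def tags_per_base(tag_positions, start, end):
    """Tag density in an inclusive range of TSS-relative coordinates."""
    n_tags = sum(1 for pos in tag_positions if start <= pos <= end)
    return n_tags / (end - start + 1)


def escape_index(tag_positions, polya_pos):
    """log2(body density / promoter density) for one gene."""
    promoter = tags_per_base(tag_positions, -30, 300)
    body = tags_per_base(tag_positions, 301, polya_pos)
    if promoter == 0 or body == 0:
        raise ValueError("EI is undefined when either density is zero")
    return math.log2(body / promoter)


def benjamini_hochberg(pvalues):
    """Adjusted p values (assumed correction method, see lead-in)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):           # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical gene: 4 tags near the promoter, 2 tags in a body twice as long.
tags = [0, 10, 50, 300, 301, 900]
ei = escape_index(tags, polya_pos=962)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A strongly negative EI, as in this made-up gene, corresponds to heavy promoter-proximal accumulation; an increase in EI after knockdown indicates more polymerase in the gene body relative to the promoter.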

2.7 PA-Seq and analysis

Total RNA was harvested from cells by the Trizol™ protocol and chloroform/ethanol extracted. pA+ RNA was selected from 100 µg of total RNA using BioMag Oligo-(dT) beads by the manufacturer’s protocol. 0.1-1 µg of pA+ RNA was fragmented with Ambion fragmentation reagent per the manufacturer’s protocol. cDNA synthesis from fragmented pA+ RNA was performed using Protoscript II and a circular oligo dT primer at 42 °C for 45 minutes:

/5Phos/GAT CGG AAG AGC ACA CGT CT/ideoxyU/ /ideoxyU/AC ACT CTT TCC CTA CAC GAC GCT CTT CCG ATC TTT TTT TTT TTT VN

This cDNA was circularized with CircLigase by the manufacturer’s protocol, and libraries were prepared using TruSeq indexed primers and sequenced on the Illumina HiSeq platform.

Sequenced libraries were mapped to hg19, and peaks were filtered to include only known or reported pA-sites according to the RefSeq and Merk databases. PSY analysis was done by Dr. Hyunmin Kim to assign significance, directionality and magnitude of pA-site shifts between pA-seq experiments (Table A1-3).

2.8 Decapping reactions

Cap-labeled substrate RNA was prepared using 32P-GTP and recombinant vaccinia RNA guanylyltransferase and (guanine-N7)-methyltransferase. Decapping reactions were carried out in decapping buffer (50 mM Tris, pH 7.9, 30 mM (NH4)2SO4, 1 mM MgCl2, 1 mM DTT) for 30 min at 30 °C followed by 30 min at 37 °C using 1-5 ng of 32P cap-labeled RNA (0.03 to 0.15 pmol) and total HeLa nuclear extract (57 µg) or immunoprecipitated protein on protein A beads. As a positive control we used 1 µg of C. elegans recombinant Dcp2. Reactions were stopped by addition of EDTA and freezing. Reaction products were resolved by thin-layer chromatography (TLC) as described.
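As a consistency check on the substrate amounts quoted above, the mass and molar ranges together imply an approximate molar mass for the cap-labeled RNA. The substrate length is not stated in this section, so the figure of roughly 330 g/mol per nucleotide used below is an assumed average for RNA and the resulting length is only an estimate.

$$M \approx \frac{1\,\text{ng}}{0.03\,\text{pmol}} = \frac{1\times10^{-9}\,\text{g}}{3\times10^{-14}\,\text{mol}} \approx 3.3\times10^{4}\,\text{g/mol}$$

$$\frac{3.3\times10^{4}\,\text{g/mol}}{330\,\text{g/mol per nt}} \approx 100\,\text{nt}$$

The upper end of the range gives the same ratio (5 ng / 0.15 pmol), so the two ranges are mutually consistent with a substrate on the order of 100 nucleotides.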

CHAPTER III

mRNA DECAPPING FACTORS AND THE EXONUCLEASE XRN2 FUNCTION

IN WIDESPREAD PREMATURE TERMINATION OF RNA POLYMERASE II

TRANSCRIPTION1

3.1 Summary

I investigated the role of human mRNA decapping factors in control of transcription by RNA polymerase II (pol II). Decapping proteins Edc3, Dcp1a, and Dcp2 and the termination factor TTF2 coimmunoprecipitate with Xrn2, the nuclear 5'-3' exonuclease "torpedo" that facilitates transcription termination at the 3' ends of genes.

Dcp1a, Xrn2, and TTF2 localize near transcription start sites (TSSs) by ChIP-seq. At genes with 5' peaks of paused pol II, knockdown of decapping factors or of the termination factors Xrn2 and TTF2 shifted polymerase away from the TSS toward upstream and downstream distal positions. This redistribution of pol II is similar in magnitude to that caused by depletion of the elongation factor Spt5. I propose that coupled decapping of nascent transcripts and premature termination by the "torpedo" mechanism is widespread and acts to limit bidirectional pol II elongation. Regulated co-transcriptional decapping near promoter-proximal pause sites followed by premature termination could control productive pol II elongation.

1 This chapter is taken in whole or in part from Brannan, K. et al. (2012) “mRNA decapping factors and the exonuclease Xrn2 function in widespread premature termination of RNA polymerase II transcription.” Molecular Cell 46(3):311-24, by permission of Cell Press.

3.2 Introduction

Decapping is a major control step in cytoplasmic mRNA degradation [220, 221]. In mammals this step is catalyzed by two unrelated pyrophosphatases. Dcp2 occurs in a complex with the cofactors Dcp1a/b, Edc3, Hedls, and Rck/p54 [140, 153] (Figure 1-7) and is responsible for decapping a subset of mRNAs. The single subunit enzyme Nudt16 was recently found to be responsible for the majority of cytoplasmic decapping in some cells [151]. While Dcp2 is mainly cytoplasmic, it has been detected in the nucleus [151, 152] and has been implicated in turnover of misprocessed nuclear mRNA precursors in yeast [222]. Decapping initiates 5’–3’ mRNA degradation by exposing a 5’ phosphate substrate for the major cytoplasmic 5’–3’ RNA exonuclease Xrn1 [223].

The conserved nuclear homolog of Xrn1 is the exonuclease Xrn2 (Rat1 in yeast), which acts as a ‘‘torpedo’’ [224] that facilitates poly(A) site-dependent termination by RNA polymerase II (pol II) downstream of genes [100, 114] (Figure 1-6). In yeast, Rat1 is also required for optimal termination of rDNA transcription by pol I [225, 226]. Just as decapping provides an entry point for Xrn1 on mature mRNAs in the cytoplasm, cotranscriptional poly(A) site cleavage provides a 5’ phosphate entry site for Xrn2/Rat1 that then degrades nascent pol II transcripts in the nucleus. The Xrn2/Rat1 torpedo is necessary but not sufficient to evict pol II from the template [103, 104]. Additional unknown factors probably cooperate with Xrn2 to elicit termination. One candidate is the DNA-dependent ATPase TTF2/lodestar, which can release pol II and RNA transcripts from a DNA template in vitro [227] and helps eliminate pol II from condensed mitotic chromosomes [118]. Interestingly, TTF2 physically interacts with the decapping factor Dcp1a [154].

Control of the flux of elongating pol II through promoter-proximal pause sites is important for regulation of a large fraction of genes in multicellular organisms [41, 42, 54, 175, 228]. As a result of promoter-proximal pausing, high densities of pol II accumulate at the TSSs of most human genes with much lower densities downstream within genes. Promoter-proximal pausing is facilitated by the negative elongation factors NELF and DSIF (Spt4/5) and antagonized by the positive elongation factor P-TEFb, TFIIS, and transcriptional activators [67, 229, 230]. Control of the transition from pausing to productive elongation is widely thought to be a mechanism for rapidly up-regulating transcription in response to developmental and environmental cues [29, 31, 54, 228, 231-233]. However, the fate of paused polymerases in vivo is a long-standing unresolved question. In principle these polymerases could resume productive elongation in response to an appropriate signal or they could terminate prematurely in a manner analogous to prokaryotic transcriptional regulatory mechanisms [234].

Premature termination near 5’ ends has been detected within yeast genes where it is promoted by Nrd1, Nab3, and Sen1 [71, 75, 78, 235]. In addition, premature termination involving the Rat1 exonuclease has been detected in yeast mutants that are defective for mRNA capping [124, 236]. In these mutants premature termination by a torpedo mechanism is thought to serve a quality control function, preventing production of full-length transcripts with defective cap structures. In mammalian cells premature termination of transcription has been demonstrated for SV40 and HIV viral genes, but its mechanism is unknown [24, 46, 237].

While there is only sparse evidence for promoter-proximal termination in metazoans, the recent finding that transcription frequently initiates in both directions at human promoters [42, 121, 238] raises the question of whether antisense transcription is limited by a novel termination mechanism. In this chapter I report that decapping proteins and TTF2 interact with Xrn2 and that these factors localize by ChIP at 5’ ends of genes. Knockdown of decapping and termination factors by shRNA caused a widespread re-positioning of pol II at 5’ ends of genes away from start sites and toward distal positions both downstream and upstream. These results suggest that co-transcriptional decapping and premature termination by a torpedo mechanism are broadly employed to limit transcription of human genes.

3.3 Results

3.3A Xrn2 associates with TTF2 and mRNA decapping factors

To identify candidate termination factors that associate with Xrn2, Ben Erickson performed mass spectrometry of Xrn2 immunoprecipitates (IPs) from RNase-treated HeLa nuclear extract. TTF2 was among the most strongly enriched proteins in the Xrn2 IP relative to the control anti-GFP IP and was confirmed by western blotting (Figures 3-1A and 3-1B). TTF2 interacts with Dcp1a in the yeast two-hybrid assay [154] and, consistent with this association, we found that Dcp1a and two other decapping factors, Edc3 and Dcp2, co-IP with Xrn2 (Figure 3-1A,B). The pol I termination and chromatin re-modeling factors TTF-I and Rsf1 [239] were also strongly enriched in the Xrn2 IP (Figure 3-1A), suggesting a role for Xrn2 in termination of human rRNA transcription as previously demonstrated in yeast [225, 226]. Additional Xrn2-associated proteins

Figure 3-1: Xrn2 associates with termination and mRNA decapping factors. (A) Selected proteins enriched in the Xrn2 IP from RNase A-treated HeLa nuclear extract analyzed by MS on LTQ and FT platforms. Anti-GFP is a negative control. Total assigned spectra were 22346 and 29540 for the anti-GFP and 16776 and 23442 for the anti-Xrn2 LTQ and FT analyses, respectively. (B) Western blots of Xrn2 and GFP IPs probed with anti-TTF2, -Edc3, -Dcp1a, -Dcp2, and Aly as a negative control. Inputs are 7.5% of the IP.

identified by MS include rRNA maturation factors, splicing factors and cleavage-polyadenylation factors (Figure 3-1A, Table A1-2).

3.3B Termination and decapping factors localize at 5’ ends near paused pol II

I investigated the functional significance of the copurification of decapping factors and TTF2 with Xrn2. The association of decapping factors with a nuclear enzyme, Xrn2, was surprising as decapping is thought to be predominantly cytoplasmic [151, 221]. However, I found that decapping activity is readily detectable in HeLa nuclear extract and can be specifically immunoprecipitated by anti-Dcp1a antibody (Figure 3-2). I investigated whether decapping factors might act in the nucleus at sites of transcription by ChIP-seq analysis of Dcp1a and pol II in HeLa cells. Because Dcp1a interacts with TTF2 [154] and Xrn2 (Figure 3-1), I further tested whether these three factors colocalize on genes by ChIP-seq with antibodies whose specificity was confirmed by shRNA knockdown of the respective proteins (Figure 3-3A; [118]). The Dcp1a ChIP-seq signals were sufficient to identify 745 genes with peaks of enrichment within 500 bases of a TSS (FDR < 0.05) detected by the HOMER peak finder [218]. Note that while the peak finding analysis of our ChIP-seq results reveals significant enrichment of Dcp1a, Xrn2, and TTF2 near TSSs, it does not provide a complete list of genes where these factors are found at the 5’ end. Furthermore, Xrn2 and TTF2 colocalized with Dcp1a near these start sites to a significant extent (Figure 3-3). Interestingly, peaks of Xrn2 and TTF2 accumulation (FDR < 0.05) were more frequent within 500 bases of a TSS than they were in the region 0–3 kb downstream of a poly(A) site. Importantly, the ChIP signals for Dcp1a, TTF2, and Xrn2 overlap extensively with one another and with pol II

[Figure 3-2 image labels: lanes 1-5 (Buffer, GFP IP, Dcp2, Dcp1a IP, HeLa NE); TLC species: m7GDP, capped RNA]

Figure 3-2: Decapping activity in nuclear extract. Decapping reactions were performed with anti-Dcp1a immunoprecipitate or anti-GFP control immunoprecipitate from HeLa nuclear extract (NE, 1.9 mg), recombinant Dcp2 (1 µg), buffer control, or total HeLa NE, and products were separated by TLC.

Figure 3-3: ChIP-seq of pol II, Xrn2, TTF2 and Dcp1a in HeLa cells. A) ChIP-Seq profiles of pol II with Xrn2, TTF2 and Dcp1a on the RBM39 and MAT2A genes. B) ChIP-Seq heatmap profiles for pol II, Xrn2, TTF2 and Dcp1a in HeLa cells for 10,034 genes >2kb long in the region -1kb to +2kb relative to the TSS ranked by pol II signal.

at several thousand genes in the region between -1 kb and +2 kb from the TSS (Figure 3-3B). In addition, when transcription elongation was inhibited with DRB, the distribution of Xrn2 on the GAPDH gene shifted toward the 5’ end of the gene in parallel with pol II (Figures 3-4A and 3-4B). The localization of TTF2 at 5’ ends of genes was further confirmed by ChIP-seq using an independent commercial polyclonal antibody (Nova Fong, data not shown). These results show that these termination and decapping factors commonly localize near the 5’ ends of genes.

3.3C Depletion of Xrn2 and TTF2 re-distributes pol II away from the TSS toward promoter-distal positions

Colocalization of a decapping factor with Xrn2 at 5’ ends suggests a possible parallel with cytoplasmic mRNA degradation by decapping and 5’–3’ exonucleolytic degradation by Xrn1 [220]. According to this model, decapping of nascent transcripts coupled to Xrn2-mediated degradation might cause premature termination of transcription by the torpedo mechanism. If decapping factors, Xrn2, and TTF2 terminate a population of promoter-proximal polymerases, then their depletion would enhance readthrough transcription downstream and/or upstream of promoters. In this event, pol II density in promoter-distal regions is expected to increase relative to that in promoter-proximal regions. To test this prediction, I first knocked down Xrn2 and TTF2 individually and together in stable HEK293 cell lines by infection with lentiviral shRNA expression vectors (Figure 3-5A and Table A1-1). I then measured pol II occupancy along genes by ChIP-seq in the knockdown lines relative to two controls, the uninfected parent and a line infected with a scrambled shRNA lentivirus. Knockdown of Xrn2 and/or TTF2 had only small effects on the overall enrichment of pol II within genes

Figure 3-4: DRB induces similar changes in pol II and Xrn2 distribution on GAPDH. Q-PCR of anti-pol II A) or anti-Xrn2 B) ChIP from the shScramble control HEK293 cells treated with vehicle or DRB (50 µM, 8 hr). Signals are normalized to the values at amplicon +55. Means (n>3) and standard errors of the mean (SEM) are shown. Note loss of both pol II and Xrn2 from the 3’ end when elongation is inhibited by DRB.

Figure 3-5: RNAi-mediated knockdown of termination and decapping factors. Western blots of equal amounts of total protein from uninfected controls and cells infected with lentiviruses expressing shRNAs directed against Xrn2, TTF2, Xrn2+TTF2, Edc3, Dcp1a, and Dcp2. Cstf77 or Symplekin are loading controls. All blots from uninfected controls and cognate shRNA expressing cells are from the same gel. The anti-Xrn2, -TTF2 (PTG labs), and -Dcp1a antibodies used for Western blotting are the same as those used for ChIP-seq.

compared to parent or scrambled shRNA controls, as determined by anti-pol II ChIP-seq reads per kb per million in the region from -30 to the poly(A) site (Figure 3-6). The depletion of Xrn2 and TTF2 achieved in our cell lines therefore does not cause widespread inhibition of transcription initiation. Pol II density profiles normalized to total read-counts (Figure 3-7) revealed that, as predicted by the model, on genes where pol II normally accumulates in the promoter-proximal region, knockdown of Xrn2+TTF2 markedly increased pol II density at promoter-distal positions and decreased it in promoter-proximal positions. Knockdown of Xrn2 or TTF2 individually caused a similar redistribution of pol II from promoter-proximal to promoter-distal positions, but the effects were smaller than in the double knockdown (Figure 3-10B). The complementary changes in pol II density at distal and proximal sites are consistent with a redistribution of polymerase away from the start site and into the body of the gene. In contrast, only modest effects on pol II 5’–3’ distribution were evident on genes with lower levels of promoter-proximal pol II accumulation, including HIST1H4C, SFN, and ARHGDIA and the noncoding RNA gene HOTAIR (Figure 3-8).

On the RBM39 gene where divergent antisense transcription occurs [42, 238], depletion of Xrn2+TTF2 modestly increases pol II density upstream of the TSS as well as downstream (Figure 3-7). This observation suggests that Xrn2 and TTF2 may normally limit the amount of divergent antisense transcription. Similar enhancement of antisense transcription in the Xrn2+TTF2 knockdown cells occurred at the ARHGDIA gene (Figure 3-8). These observations on individual genes were confirmed in an average of many genes (see Figures 3-10A and 3-10B).

Figure 3-6: Knockdown of decapping and termination factors does not affect total pol II recruitment. Pol II densities expressed as log2 RPKM (reads per kilobase per million) in the region from -30 to the poly(A) site (see diagram) in the shScrambled (scr) control relative to uninfected parent and double knockdown lines (see also Figure S2C). Each plot represents a set of 5507 genes >2kb long and >2kb from neighbors enriched for pol II (FDR<.05) within 500 bases of a TSS that are shared in common between the uninfected parent, scrambled control and knockdown datasets. We noted some decrease in RPKM in knockdown lines among more highly expressed genes but the number of genes that significantly diverge from the scrambled control (p<.01, two-tailed T test) is a small fraction of the total.

Figure 3-7: Pol II ChIP-Seq normalized to total read counts on indicated genes in HEK293 cells stably expressing shScramble (scr), and shXrn2+shTTF2 shRNAs. Pol II density is reduced near the TSS and increased within the gene and 3’ flank when Xrn2 and TTF2 are knocked down in combination. Images were made using the IGB6.4 Browser [240].

Figure 3-8: Effect of Xrn2+TTF2 and Dcp2 Knockdown on Pol II Distribution at Genes Lacking 5’ Pol II Accumulation and Intronless Genes. Pol II ChIP-Seq density profiles in shScramble (scr), shXrn2:TTF2, and shDcp2 cells on genes with relatively low levels of “paused” pol II at their 5’ ends. Note there is little effect of knockdowns on pol II distribution around the start sites of these genes.

Xrn2 and TTF2 shRNAs did not cause a major inhibition of transcription termination downstream of poly(A) sites (Figure 3-9) at the level of knockdown achieved in our cells, consistent with a previous report [105]. At some genes like ACTB, however, termination was delayed (Figure 3-7). In summary, the results in Figure 3-7 show that depletion of two termination factors that localize at 5’ ends of genes caused a shift in the distribution of pol II from promoter-proximal to promoter-distal positions both upstream and downstream of the TSS.

To determine how generally Xrn2/TTF2 depletion affects transcription within human genes, I assessed pol II distribution on a cohort of 5507 genes in each knockdown cell line. The genes analyzed were selected on the basis that they are >2 kb long and >2 kb from neighboring genes and are significantly enriched (FDR < 0.05) for pol II in the region within 500 bases of a TSS. First, I plotted the relative frequency of pol II ChIP-seq reads on these genes in the region from -1 kb to +2 kb from the TSS for the Xrn2+TTF2 double knockdown and scrambled shRNA control lines (Figure 3-10A). This analysis shows that the relative increase in pol II read frequency at positions both upstream and downstream of the TSS in the double knockdown line is detectable even in a cohort of more than 5000 genes. To quantify the effects of Xrn2 and TTF2 depletion on a broad scale, I plotted the ratios of the relative frequencies of pol II ChIP-seq reads (see Experimental Procedures) in each knockdown cell line compared to the scrambled shRNA control (Figure 3-10B). As a control, I compared the uninfected parent line to the scrambled shRNA (gray line, Figure 3-10B), and as expected there was little difference. In contrast, each of the knockdown lines demonstrated a specific increase in the relative

Figure 3-9: Little effect of Xrn2+TTF2 or Dcp2 knockdown on termination at 3’ ends. Log2 ratio of pol II ChIP-Seq read frequencies in knockdown cell lines relative to the scrambled shRNA control (scr) on >6000 genes that are >2kb long and >5kb from neighboring genes. Signal is plotted relative to the RefSeq poly(A) site (at 0), from -1kb to +5kb.

Figure 3-10: Termination factor knockdown redistributes pol II away from promoter proximal regions. A) ChIP-Seq relative frequency profiles (reads per 50bp bin divided by total reads in all bins, see Experimental Procedures) in shScramble (scr) control and shXrn2+shTTF2 expressing HEK293 cells across 5507 genes. B) Log2 ratio of pol II ChIP-Seq read frequencies in uninfected parent and knockdown cell lines relative to the shScramble (scr) control. Note the additive effect with Xrn2+TTF2 knockdown.

frequency of pol II at promoter-distal positions compared to the scrambled control, with the greatest effect in the Xrn2+TTF2 double knockdown (Figure 3-10B). The apparent additive effect of depleting TTF2 and Xrn2 is significant because the only known function in common between these proteins is in transcription termination. Notably, depletion of termination factors increased relative pol II frequency at positions both downstream and upstream of the TSS, and these effects are widespread, as evidenced by the fact that they are detectable in plots encompassing the group of 5507 genes.

To investigate further how termination factors influence pol II distribution across genes, I calculated the escape index (EI), defined as the log2 ratio of promoter-distal (+301 to the poly(A) site) pol II density to promoter-proximal (-100 to +300) pol II density (Figure 3-11C), in control and knockdown HEK293 cell lines for 5507 genes. This analysis revealed significant (FDR < 0.01) increases in EI for many genes in the Xrn2+TTF2 double knockdown line, but not the uninfected parent cells, relative to the scrambled shRNA control (Figure 3-11A). Importantly, the greatest effects of Xrn2/TTF2 depletion on EI were evident for those genes with low EI values that have the greatest accumulation of promoter-proximal pol II density. In contrast, genes in the minority class of human genes with high EI values and low levels of promoter-proximal pol II accumulation were less affected by depletion of these termination factors. This point is particularly clear for the Xrn2+TTF2 double knockdown (Figure 3-11, red) and is further supported by inspection of individual genes with low levels of promoter-proximal pausing and high EI values (Figure 3-8). In summary, these results show that at many genes the termination factors Xrn2 and TTF2 limit pol II elongation away from the TSS in both directions, consistent with widespread premature termination in promoter-proximal regions.

Figure 3-11: Termination factor knockdown increases pol II escape index. Definition of Escape Index (EI). A) Escape index (EI) for 5507 genes. The numbers of genes that differ from the scr control (FDR < 0.01, one-sample T-test) are given. Note widespread elevation of EI in the Xrn2:TTF2 knockdown line. Best fit lines (orange) were generated by loess fitting (Local Polynomial Regression Fitting). Black dots correspond to genes with significant (FDR <.05) 5’ peaks of Dcp1a ChIP signal in HeLa.

3.3D Decapping factor knockdown redistributes pol II away from promoter-proximal positions

If Xrn2 and TTF2 cooperate to facilitate termination at promoter-proximal regions by a torpedo mechanism, then decapping or RNA cleavage would be required to provide a suitable 5’ phosphorylated substrate for the exonuclease. To test whether decapping factors affect how pol II is distributed between promoter-proximal and distal regions, I stably knocked down Edc3, Dcp1a, or Dcp2 in HEK293 cells (Figure 3-5B) and analyzed pol II distributions by ChIP-seq. Similar to the termination factors, depletion of Dcp2, Edc3, or Dcp1a did not cause major changes in overall levels of pol II enrichment within genes when compared to the scrambled shRNA control (Figure 3-6B). Depletion of all these decapping factors, especially Dcp2, specifically increased pol II densities at promoter-distal locations and decreased them at the TSS (Figure 3-12). As we observed for Xrn2/TTF2, depletion of decapping factors enhanced relative pol II accumulation at the 3’ pause that precedes termination at some genes like ACTB, probably because more polymerases reach the 3’ end (Figure 3-12). Knockdown of decapping factors increased pol II relative frequency at promoter-distal positions both upstream and downstream of the TSS (Figures 3-13A and 3-13B). In addition, plots of escape index for the group of 5507 genes revealed that the promoter-proximal to distal shift in pol II distribution caused by depleting decapping factors is widespread (Figure 3-14). In summary, these results reveal a surprising effect of depleting decapping factors on pol II transcription that strongly resembles the effect of depleting the termination factors Xrn2 and TTF2. I interpret these results to suggest that decapping, coupled to polymerase

Figure 3-12: Knockdown of decapping factors increases relative pol II occupancy upstream and downstream of start sites. Pol II ChIP-Seq normalized to total read counts in HEK293 cells stably expressing shScramble (scr) and shDcp2. Note reduced pol II density near the TSS and increased density within the gene and 3’ flank with knockdown of Dcp2.

Figure 3-13: Knockdown of decapping factors increases relative pol II occupancy upstream and downstream of start sites. A) ChIP-Seq relative frequency profiles (5507 genes) in shScramble (scr) and shDcp2 expressing HEK293 cells (as in Figure 3-10). B) Log2 ratio of pol II ChIP-Seq read frequencies in uninfected parent and decapping factor knockdown cell lines relative to the scr control (as in Figure 3-10).

Figure 3-14: Escape index in knockdown lines and uninfected parent compared to scr control (as in Figure 3-11). The numbers of genes that differ from the scr control (FDR < 0.01, one-sample T-test) are given. Note widespread elevation of EI particularly in the Dcp2 knockdown line. Black dots correspond to genes with significant 5’ peaks of Dcp1a ChIP signal (as in Figure 3-11).

displacement by the Xrn2 torpedo and the ATPase TTF2, facilitates premature termination of bidirectional transcription from human promoters.

3.3E Comparing the roles of termination, decapping, and pausing factors in pol II localization

The current model for control of pol II transcriptional elongation on a large fraction of metazoan genes is by regulated release from a promoter-proximal pause that is established by NELF and DSIF (Spt4/5) [67, 175, 229, 231, 233]. Previously, the effects of depleting NelfA and Spt5 on transcription were defined by pol II ChIP-seq in mouse embryonic stem (ES) cells [54], and I compared these results with the effects of depleting decapping and termination factors. In agreement with Rahl et al. (2010), pol II relative frequencies and EI on over 5000 mouse genes showed that knockdown of Spt5 increased pol II density at downstream positions relative to the TSS (Figures 3-15A, 3-15B, 3-15C), just as I saw for knockdown of termination and decapping factors. The magnitude of the increase in relative pol II density within genes caused by Spt5 knockdown was similar to that caused by Edc3, Xrn2, or TTF2 knockdown and less than that caused by Dcp1a, Dcp2, or Xrn2:TTF2 double knockdown. In contrast with the latter factors, Spt5 knockdown appeared to have little effect on divergent transcription, as judged by pol II densities upstream of the TSS, while the NelfA knockdown had a much more modest effect on average pol II distribution among these genes (Figures 3-15A, 3-15B), as previously reported [54]. Note that Spt5 knockdown resulted in a step-like increase in pol II density immediately downstream of start sites, whereas termination and decapping factor knockdown caused a gradual increase in relative pol II density over a 2 kb region

Figure 3-15: Knockdown of decapping and termination factors has similar effect on pol II density as does knockdown of negative elongation factors. A) ChIP-Seq relative frequency profiles (as in 3-10) in mouse ES cells expressing shScramble (scr) (6813 genes) shNelfA (6467 genes) and shSpt5 (5604 genes) shRNAs from [54]. B) Log2 ratios of pol II ChIP-Seq read frequencies in shNelfA and shSpt5 expressing ES cells relative to the scr control. C) Escape index (EI) in shNelfA and shSpt5 ES cells relative to the scr control. FDR was not calculated because only one control, the scrambled shRNA, was analyzed. Note elevation of EI and pol II relative frequency in the gene body in Spt5 knockdown cells is comparable to decapping factor knockdown cells (Figures 3-13 and 3-14).

downstream of the start site. This effect is consistent with premature termination by the torpedo occurring over hundreds of bases, similar to termination downstream of genes.

3.3F Knockdown of termination and decapping factors redistributes pol II within ‘‘pausing-regulated’’ genes

Regulated premature termination could function to control transcription by modulating the flux of polymerases through the body of a gene. I investigated this idea by examining genes that are regulated at the elongation level. At PIM1, ENO1, CCNB1, and UBA52, which are activated by Myc, and at HSP90AA1 and HSPA8, which are activated by HSF, depletion of Dcp2 or Xrn2+TTF2 increased pol II density within the gene body relative to the TSS (Figures 3-16A–3-16D). Consistent with the results in Figures 3-15A–3-15C, the effects of TTF2/Xrn2 and Dcp2 knockdown on pol II distribution at individual genes are at least as great as the effects of depleting NelfA or Spt5 (Figure 3-16). How the transcriptome is affected by global changes in pol II elongation caused by depletion of pausing regulators or decapping/termination factors remains to be determined, but it is likely to be influenced by homeostatic mechanisms that control mRNA stability. Knockdown of Xrn2+TTF2 increased the steady-state level of HSP90AA1 mRNA relative to a mitochondrial mRNA control by 1.6-fold, as determined by Q RT-PCR, but had no significant effect on abundance of the Myc target mRNA UBA52 (Figure 3-17). Dcp2 and Xrn2+TTF2 depletion by shRNA, but not a scrambled shRNA control, significantly elevated EI on most genes within a group of 173 Myc [241] and HSF targets thought to be activated by release of promoter-proximal pausing (Figure 3-16). I conclude that depletion of decapping and termination factors can mimic the effects of activators that stimulate transcriptional elongation.

Figure 3-16: Escape index (EI) plots in knockdown lines and uninfected parent vs. scr control. These plots are on 173 genes likely to be regulated at the level of elongation including targets of Myc [241] and HSF. Note widespread elevation of EI when termination or decapping factors are depleted.


Figure 3-17: Effects of termination factor knockdown on steady-state mRNA abundance. Q-PCR analysis of cDNA from parent or shXrn2:TTF2 lines using primers specific for the 3’ regions of HSP90AA1 or UBA52 genes normalized to the mitochondrial COX1 control. Standard deviations are shown. Termination factor knockdown results in increased total RNA levels for HSP90AA1 but not UBA52.
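As a rough illustration of how a fold change normalized to a reference transcript can be derived from Q-PCR data, the sketch below uses the common 2^-ddCt calculation. This is an assumption: the quantification method is not specified in this section, the calculation presumes near-100% primer efficiency, and the Ct values are hypothetical.

```python
def fold_change(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Relative abundance (knockdown / control) by the 2^-ddCt method."""
    d_kd = ct_target_kd - ct_ref_kd        # target vs. reference, knockdown
    d_ctrl = ct_target_ctrl - ct_ref_ctrl  # target vs. reference, control
    return 2.0 ** -(d_kd - d_ctrl)

# Hypothetical Ct values: target mRNA and a mitochondrial reference mRNA.
fc = fold_change(ct_target_kd=21.0, ct_ref_kd=18.0,
                 ct_target_ctrl=22.0, ct_ref_ctrl=18.0)
print(fc)
```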


3.4 Discussion

3.4A A nuclear function for decapping factors in promoting premature termination of transcription

I report a previously unidentified function for mRNA decapping enzymes in the nucleus that is distinct from their well-known role in cytoplasmic mRNA turnover. I propose that, in the nucleus, decapping of nascent transcripts near sites of promoter-proximal pol II pausing facilitates premature termination of transcription by providing an entry point for the 5’–3’ RNA exonuclease torpedo, Xrn2 [224], previously shown to function in termination by pol I and pol II at 3’ ends of genes [100, 114, 225, 226] (Figure 7). Consistent with a general role in termination, I found that Xrn2 also co-purified with the pol I termination factors TTF-I and Rsf1 [239] (Figure 1A). I further suggest that premature termination is aided by the ATPase TTF2, which interacts with the decapping factor Dcp1a [154] and displaces pol II from template DNA in vitro [227]. In support of this model, I have presented two lines of evidence: (i) decapping factors and TTF2 coimmunoprecipitate with Xrn2 and colocalize with Xrn2 at the 5’ ends of genes; (ii) shRNA-mediated depletion of Xrn2, TTF2, or the decapping factors Edc3, Dcp1a, and Dcp2 all caused widespread repositioning of pol II away from TSSs and toward distal locations upstream and downstream of the TSS, consistent with inhibition of premature termination.

While I cannot exclude the possibility that Edc3, Dcp1a, Dcp2, TTF2, and Xrn2 might also influence pol II pausing or elongation rate, the most straightforward interpretation of our results is that they affect pol II distribution through their well-established functions in decapping and transcription termination, because these two functions can account for all our observations. Hence, according to the two-step model in Figure 3-18, cotranscriptional decapping first acts to expose a 5’ monophosphate end on the nascent RNA, and this provides a substrate for Xrn2 to begin degrading the transcript.

Knockdown of Xrn2 or TTF2 caused relatively modest effects, but the double knockdown induced a larger pol II redistribution into gene bodies. The only known function in common between these two proteins is transcription termination. Therefore, the similarity and additive nature of these defects are consistent with both proteins participating in promoter-proximal termination. It remains possible that TTF2 or decapping factors could also influence the processivity of the Xrn2 exonuclease. A redistribution of pol II from the TSS into the gene body rather than a selective slow-down of elongation within the gene is consistent with the fact that elevated pol II density within gene bodies was largely offset by reduced density around start sites. As a result, overall pol II occupancy on genes was little affected by depletion of termination or decapping factors in our cell lines. In future it will be important to test the cotranscriptional decapping model by developing methods for determination of the cap status of nascent transcripts at genes with and without evidence of premature termination.

Frequent premature termination by a torpedo mechanism in promoter-proximal regions could affect the interpretation of nuclear run-on experiments because run-on transcripts made by polymerases that have engaged the Xrn2 nuclease may be degraded. This might explain why the excess of promoter-proximal pol II detected by ChIP is not always matched by an equivalent excess of run-on signal detected by GRO-seq [233].


3.4B Premature termination and the control of eukaryotic gene expression

Regulated premature termination is a well-characterized mechanism of controlling gene expression in prokaryotes [27, 234]. In eukaryotes, premature termination by pol II has been detected on viral transcription units [24, 46, 237] and at some yeast genes [71, 75, 78], but there is no previous evidence to suggest that it has widespread significance in metazoans. The results presented here suggest that in fact premature termination by a “torpedo” mechanism prevents productive elongation by a substantial fraction of the promoter-proximal pol II transcription complexes at thousands of human genes.

Premature termination by this mechanism occurs in yeast mutants defective in mRNA capping [124, 236], where it prevents production of full-length uncapped transcripts. The extent of the pol II redistribution caused by depletion of termination and decapping factors is not easily reconciled with a function limited to quality control through elimination of the small fraction of transcription complexes with incompletely capped nascent transcripts. Instead, these results imply that premature termination by decapping coupled with pol II displacement by Xrn2/TTF2 is a quite general mechanism for limiting productive transcriptional elongation. Depletion of termination and decapping factors enhanced relative pol II occupancy at positions upstream of many promoters while decreasing it near the start sites. This result suggests that co-transcriptional decapping and polymerase eviction facilitated by Xrn2/TTF2 accounts for termination of at least some of the divergent transcription that commonly occurs at human promoters [42, 121, 238]. In addition to its role as a termination factor, Xrn2 has been implicated in formation of the short sense and antisense transcription start site-associated (TSS-a) RNAs that lack cap structures [121, 138]. The mechanism of TSS-a RNA formation is not known, but it could involve the decapping and exonucleolytic degradation of nascent transcripts.

In summary, the results presented in Chapter III suggest that cotranscriptional decapping and termination aided by Xrn2/TTF2 is an important constraint on the flux of pol II through thousands of genes. Indeed, premature termination appears to exert an effect on transcription in vivo that is as general and as profound as DSIF- and NELF-mediated promoter-proximal pausing. In future it will be of interest to investigate how cotranscriptional decapping and subsequent termination are controlled and how they may be affected by activators and promoter-proximal pausing. I speculate that the promoter-proximal pause found at most human genes may serve as a decision point for regulated decapping (Figure 3-18). Paused elongation complexes that undergo decapping will be prematurely terminated, whereas complexes that escape decapping will retain the potential to resume productive elongation subject to regulation by TFIIS, DSIF, NELF, and PTEFb [67, 229, 230]. The possibility that decapping and coupled premature termination could be subject to regulation is consistent with the observation that the effects of depleting decapping and termination factors closely resemble the effects of activators like c-Myc and HSF that stimulate transcriptional elongation [43, 54]. Whether decapping and termination factors are recruited to 5’ ends of genes together in a complex or sequentially remains to be resolved; however, preliminary results indicate that Xrn2 is recruited independently of decapping factors, because knockdown of Dcp2 did not reduce Xrn2 localization at the 5’ end of GAPDH (data not shown). The presence of Dcp1a, TTF2 and Xrn2 at the 5’ end of a gene is not necessarily correlated with strong premature termination, perhaps because their activity is suppressed at some genes. For example,


Figure 3-18: The promoter-proximal “torpedo” model for premature termination of pol II transcription. Co-transcriptional decapping by Dcp2 at promoter-proximal pause sites is proposed to expose a 5’ PO4 that is attacked by the exonuclease Xrn2, leading to termination facilitated by TTF2 and possibly other factors. Polymerases that escape decapping are bound by cap binding complex (CBC) and may pause, resume productive elongation, or terminate by an alternative mechanism. We speculate that promoter-proximal pausing serves as a decision point for regulated decapping.

decapping and termination factors are found at the 5’ ends of histone genes (data not shown), where there is little evidence of premature termination and transcription is unaffected by their depletion. It will also be of interest to determine whether the recruitment or the enzymatic activity of Xrn2, TTF2, or Dcp2 is regulated by their interactions with one another and other proteins at various promoters. In addition to decapping factors, the deadenylase Ccr4 and the exosome have been implicated in aspects of transcription [242-244], suggesting that a broad connection may exist between transcription and RNA turnover.


CHAPTER IV

COTRANSCRIPTIONAL DEPHOSPHORYLATION OF RNA POL II

INFLUENCES PROMOTER PROXIMAL PAUSING AND PA-SITE CHOICE.

4.1 Summary

Hyperphosphorylation of the C-terminal domain (CTD) of RNA polymerase II (pol II) is important for controlling both pol II elongation and co-transcriptional recruitment of mRNA processing factors during the transcription cycle. CTD serine 5 phosphorylation (Ser5P) is important for mRNA capping and pol II promoter clearance, while serine 2 phosphorylation (Ser2P) is important for pol II escape from promoter proximal pausing and recruitment of 3’ end processing factors. Fcp1 is the major CTD phosphatase previously shown to be responsible for restoring pol II to the hypophosphorylated form [12]. Here we show that Fcp1 localizes to mammalian promoter proximal regions genome wide. Fcp1 depletion redistributed relative Ser5P and Ser2P, and resulted in reduced promoter proximal accumulation of pol II, as well as early pol II termination. pA-seq analysis revealed that Fcp1 knockdown shifted pA-site usage to promoter proximal sites for many genes. These results describe a new role for co-transcriptional action of Fcp1 in controlling both pol II transit and mRNA 3’ end formation.


4.2 Introduction

The largest subunit (Rpb1) of mammalian RNA polymerase II (pol II) has a unique C-terminal domain (CTD) composed of 52 tandem repeats of the heptapeptide consensus sequence YSPTSPS. Post-translational modifications of the CTD heptapeptides are important for regulating mRNA synthesis and processing [194]. Dynamic phosphorylation of the CTD at the serine 5 (Ser5P) and serine 2 (Ser2P) positions has well-characterized patterns across gene bodies and is crucial not only for recruiting RNA processing factors, but also for controlling pol II pausing, elongation and termination (reviewed in [245]). Pol II initiates transcription in its hypophosphorylated form, and is subsequently phosphorylated at the Ser5 position by the Kin28/Cdk7 subunit of the general transcription factor TFIIH, which is required for the recruitment of capping factors [21, 246]. Following initiation, pol II accumulates in the promoter proximal region of most mammalian genes and is restricted from elongation by the action of negative elongation and termination factors [233, 247]. The transition of paused pol II into productive elongation is considered an important regulatory target for synchronously modulating gene expression [248]. The proposed mechanism for pol II escape from the promoter proximal pause is through the action of the Cdk9 kinase subunit of the positive elongation factor complex PTEFb. This kinase phosphorylates the negative elongation factors NELF and DSIF, as well as the pol II CTD at the Ser2 position, which is believed to license paused pol II for productive elongation [59, 249]. Ser2P increases as the elongation complex proceeds through the length of the gene body [165].

Because Ser2P accumulates within gene bodies and is most abundant at gene 3’ ends preceding termination, it is believed to be important for recruitment of cleavage and polyadenylation (C/P) machinery to sites of transcription of conserved hexanucleotide polyadenylation signals (PAS) [250, 251]. Transcription of these signals leads to pre-mRNA binding by a large complex of factors involved in cleavage at pA-sites 20-50 bp downstream and subsequent addition of the poly(A) tail [252, 253]. One such factor in this complex, the cleavage stimulation factor Cstf77, accumulates within gene 3’ ends near pA-sites, as demonstrated by chromatin immunoprecipitation [254].

Many human genes contain multiple PAS within their 3’ untranslated regions (UTRs). Alternate use of these signals (alternative polyadenylation, APA) is a mechanism used to modulate gene expression in different cellular contexts (reviewed in [255]). Usage of pA-sites within gene proximal regions of 3’ UTRs can stabilize messages or increase their translation through elimination of regulatory targets such as AU-rich elements (AREs) and microRNA target sites, and shortened 3’ UTRs of proto-oncogenes are a hallmark of certain types of cancers [207]. It is not yet known whether recruitment of 3’ end processing factors by CTD Ser2 phosphorylation plays a role in pA-site choice on genes where APA occurs.

Fcp1 has been identified as the major CTD-specific phosphatase and is conserved from yeast to humans [256]. It is part of a family of phosphatases characterized by an N-terminal phosphatase domain with a DxDxT signature and a C-terminal Brca-1 C-terminal (BRCT) domain [202]. Fcp1 interacts with the pol II CTD and with the RAP74 subunit of the general transcription factor TFIIF, which activates its CTD-phosphatase activity [12]. Mutation of Fcp1 results in a severe developmental disorder in humans, and its expression is disrupted in a number of human cancers, but the molecular mechanisms underlying Fcp1-related pathogenesis remain unclear [257].


There is conflicting evidence as to the specificity of Fcp1 CTD-phosphatase activity. Mammalian Fcp1 has been shown to dephosphorylate both Ser2 and Ser5 with similar efficiency in vitro [203]. CTD Ser2P is increased in Fcp1 mutant Saccharomyces cerevisiae, while Ser5P is only mildly increased [171]. In contrast, Fcp1 purified from S. cerevisiae dephosphorylated Ser5, but not Ser2 [201]. In S. pombe, Fcp1 was found to be 10-fold more active in dephosphorylating Ser2 than Ser5 [202]. Mammalian Fcp1 specificity has not been systematically characterized in a cellular context.

Fcp1 activity is thought to be important for restoring the pool of hypophosphorylated pol II for the initiation phase of transcription, but it is not clear whether this activity is primarily post- or co-transcriptional. The association of Fcp1 with the general transcription factor TFIIF and with the pol II CTD suggests a possible activity on chromatin at sites of transcription. Fcp1 was shown to localize to promoter proximal regions in S. cerevisiae, and its mutation led to an increase in elongation-associated Ser2 phosphorylation [171]. There is evidence that Fcp1 may play a role in gene activation at the elongation phase of transcription of HIV-1, as it has been shown to inhibit Tat-mediated transactivation [258]. Whether Fcp1 activity regulates elongation on mammalian cellular genes is not known.

Here I report that Fcp1 localizes at the 5’ end of human genes and limits CTD phosphorylation at both the Ser2 and Ser5 positions. Fcp1 knockdown caused a widespread redistribution of pol II at gene 5’ ends away from transcription start sites (TSS) toward downstream positions, and localized increases in pol II Ser2P and Ser5P on highly expressed genes. Fcp1 knockdown also shifted pA-site choice on ~1000 genes, primarily toward proximal positions. These results suggest that cotranscriptional dephosphorylation by Fcp1 is important for limiting both pol II elongation and usage of proximal alternative polyadenylation signals.

4.3 Results

4.3A Fcp1 localizes at 5’ ends near paused pol II

Because Fcp1 has been shown to dephosphorylate both Ser5P, which is highest near transcription start sites (TSS) at the 5’ end of genes, and Ser2P, which is highest at gene 3’ ends, I began by asking where on human genes Fcp1 is most generally localized. To determine the genome-wide localization of Fcp1 in human cells, I performed ChIP-seq analysis from HeLa cell lysate with an antibody specific to Fcp1. The specificity of this antibody was further demonstrated by shRNA-mediated knockdown of Fcp1 in HEK293 cells (Figure 4-3). HOMER peak finding analysis of Fcp1 ChIP-seq signals identified 1900 genes with significant peaks of enrichment within 500 bases of a TSS (FDR < 0.05), as illustrated on the ZNF146 and PIN4 genes (Figures 4-1A and 4-1B) [218]. A meta-gene profile across ~15000 genes showed general enrichment of Fcp1 signal near the TSS when signal is averaged or when normalized to input (Figures 4-2A and 4-2B). These results demonstrate that Fcp1 frequently localizes at the 5’ end of genes near the region of promoter proximal pol II accumulation, and may act co-transcriptionally to limit phosphorylation of promoter proximally paused pol II.

4.3B Depletion of Fcp1 reduces promoter proximal pol II pausing

The localization of Fcp1 near gene TSSs could imply a role influencing elongation of promoter proximally paused pol II away from these regions. If CTD Ser2 phosphorylation is reversed by the phosphatase activity of Fcp1 at sites of promoter


[Figure 4-1 image: UCSC Genome Browser views (hg19). A) chr19:36,710,000-36,735,000, tracks Fcp1.pool and hg19_shScr_2_CTD over ZNF146 and ZNF565. B) chrX:71,405,000-71,415,000, the same tracks over PIN4.]

Figure 4-1: ChIP-Seq of Fcp1 in HeLa cells shows enrichment around the TSS. A) Fcp1 ChIP-Seq signal (top track) enriched near the promoter on the ZNF146 gene, compared to CTD ChIP-Seq (bottom track) from 293 cells. B) ChIP-seq signal as in A) across the PIN4 gene, comparing Fcp1 and CTD signals.


[Figure 4-2 image: metagene plots, N = 15178 genes in common. A) anti-Fcp1 ChIP (Fcp1_pool_hg19), mean ChIP density versus gene position (-500, TSS, +500, poly(A)). B) anti-Fcp1 ChIP/input (Fcp1_pool_hg19 and hg19_293_shScr_input), mean relative frequency versus gene position.]

Figure 4-2: ChIP-Seq of Fcp1 in HeLa cells shows enrichment around the TSS. A) Fcp1 ChIP-seq in HeLa cells, mean of bin counts genome wide for 15178 genes. Bins are -500 to +500 bp from the TSS, 10 bin variable gene body, and -1000 to +4000 from the poly(A) site. B) ChIP-Seq relative frequency profile of Fcp1, normalized to input (black line).
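The binning scheme described in the caption can be sketched as follows. This is a minimal illustration only: it assumes a plus-strand gene, 50 bp fixed-width flank bins (the bin size quoted in later captions), and that the ten variable-width bins span the interval between the two flanks. The function name and coordinates are hypothetical.

```python
def metagene_bin_edges(tss, polya, flank_bin=50, n_body_bins=10):
    """Return ascending bin edges for one plus-strand gene.

    Assumed layout: fixed-width bins from TSS-500 to TSS+500,
    n_body_bins equal-width bins across the remaining gene body,
    then fixed-width bins from poly(A)-1000 to poly(A)+4000.
    """
    body_start, body_end = tss + 500, polya - 1000
    if body_end <= body_start:
        raise ValueError("gene too short for this binning scheme")
    tss_edges = list(range(tss - 500, body_start, flank_bin))
    step = (body_end - body_start) / n_body_bins
    body_edges = [body_start + i * step for i in range(n_body_bins)]
    polya_edges = list(range(body_end, polya + 4000 + 1, flank_bin))
    return tss_edges + body_edges + polya_edges

# Hypothetical gene with TSS at 10,000 and poly(A) site at 30,000.
edges = metagene_bin_edges(tss=10_000, polya=30_000)
n_bins = len(edges) - 1
print(n_bins)
```

Because every gene is reduced to the same number of bins, profiles from genes of different lengths can be averaged position by position.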


Figure 4-3: Western validation of Fcp1 knockdown. Western blots of equal amounts of total protein from uninfected controls and cells infected with lentiviruses expressing shRNAs directed against Fcp1. Cstf77 is a loading control. All blots from uninfected controls and cognate shRNA expressing cells are from the same gel. The anti-Fcp1 antibody used for Western blotting is the same as used for ChIP-seq.

proximal pol II pausing, it is possible that this dephosphorylation could limit elongation from the pause by counteracting Cdk9 kinase activity. To test this possibility, I depleted Fcp1 in 293 cells using two independent lentiviral shRNA expression vectors (Figure 4-3, Table A1-1). Fcp1 knockdown was followed by pol II ChIP-seq analysis, and the results were compared to scrambled shRNA-treated or uninfected controls. Fcp1 activity has been reported to be important for initiation because it is necessary to restore the pool of hypophosphorylated pol II competent for the initiation phase of transcription. I found that Fcp1 depletion does cause at least some inhibition of pol II recruitment, as determined by anti-pol II ChIP-seq reads per kilobase per million in the region from the TSS to the poly(A) site (Figure 4-4).

Fcp1 knockdown redistributed CTD ChIP-seq signal on the heat shock gene HSP90AA1 and on the highly expressed ACTB gene (Figures 4-5A and 4-5B). Depletion of Fcp1 did result in a modest but widespread decrease in pol II accumulation near the TSS when measured on ~3000 highly expressed genes, but this did not correlate with a loss of pol II accumulation within gene bodies (Figure 4-6). Fcp1 knockdown did not result in lowering or redistribution of ChIP-seq signal for the negative elongation factor NELF, indicating that the influence of Fcp1 activity on promoter proximally paused pol II is independent of NELF localization (Figures 4-7 and 4-8). These results are consistent with a role for Fcp1 in limiting pol II elongation away from promoter proximal regions, in a manner independent of NELF recruitment.
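The reads per kilobase per million measure used above to compare pol II recruitment between libraries can be written out explicitly. The sketch below is illustrative only; the function name and the read counts are hypothetical.

```python
def rpkm(reads_in_region, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    kilobases = region_length_bp / 1e3
    millions = total_mapped_reads / 1e6
    return reads_in_region / kilobases / millions

# Hypothetical gene: 500 reads between the TSS and the poly(A) site (10 kb)
# in a library of 20 million mapped reads.
value = rpkm(reads_in_region=500, region_length_bp=10_000,
             total_mapped_reads=20_000_000)
print(value)
```

Normalizing by both region length and library size is what allows occupancy on genes of different lengths to be compared across knockdown and control libraries of different depth.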

4.3C Fcp1 knockdown increases relative CTD Ser2P and Ser5P within gene bodies

The distinctive pattern of CTD phosphorylation across mammalian genes is high Ser5P and low Ser2P at 5’ ends, with a transition to high Ser2P and low Ser5P at 3’ ends (Figure 1-1). If Fcp1 dephosphorylation of CTD at the Ser2 position occurs within gene


[Figure 4-4 image: anti-CTD ChIP metagene plot, mean ChIP density versus gene position (-500, TSS, +500, poly(A)) for hg19_shScr_CTD_1_remap, shScr_R2_CTD_hg19, hg19_shFcp1_996_CTD_R2, and shFcp1_999_CTD_hg19; N = 13007 genes in common.]

Figure 4-4: Fcp1 knockdown has some effect on pol II recruitment. Mean genome wide CTD ChIP-seq signal normalized to library size across the region between the TSS and pA-site. Color key: green = shScramble control 1, light blue = shScramble control 2, yellow = shFcp1_1 clone 996, dark blue = shFcp1_2 clone 999. Differences around the TSS indicate that Fcp1 depletion likely reduces pol II recruitment to some extent, while similar levels within the 10 bin variable gene body indicate that Fcp1 also likely inhibits elongation away from promoter regions.


[Figure 4-5 image: CTD ChIP-seq UCSC Genome Browser views (hg19). A) chr14:102,540,000-102,560,000 over HSP90AA1. B) chr7:5,565,000-5,575,000 over ACTB. Tracks in both panels, top to bottom: hg19_shScr_CTD_1_remap, hg19_shScr_2_CTD, hg19_shFcp1_996_CTD_remap, hg19_shFcp1_996_CTD_R2, hg19_shFcp1_999_CTD.]

Figure 4-5: Fcp1 knockdown redistributes CTD ChIP-seq signal away from gene 5’ regions. A) CTD ChIP-seq signal for two shScramble controls and three Fcp1 knockdown experiments (two independent shRNAs) on the HSP90AA1 gene, normalized to library size. The top track is shScramble control 1, the next track down is shScramble control 2, the third and fourth tracks down are the shFcp1_1 (clone 996) line, and the bottom track is the shFcp1_2 (clone 999) line. Note that decreased promoter proximal signal corresponds to increased signal (red arrows) within the gene body. B) CTD ChIP-seq signal as in (A) on the ACTB gene. shFcp1_1 shows slight early termination when compared to the shScramble and shFcp1_2 lines.


[Figure 4-6 image: anti-CTD ChIP-Seq metagene plot, mean relative frequency versus gene position (-500, TSS, +500, poly(A), +4000) for two shScr and three shFcp1 libraries; plot label N = 4840, 5k_Brannan_gene_list.]

Figure 4-6: Knockdown of Fcp1 reduces relative pol II occupancy near transcription start sites. ChIP-Seq relative frequency profiles across 2884 genes (reads per 50bp bin divided by total reads in all bins) from two independent shScramble (scr) controls (red and black lines), and two independent shFcp1 expressing HEK293 cell lines: shFcp1_1 (clone 996 technical replicates, green and orange lines), and shFcp1_2 (clone 999, blue line). Bins are -500 to +500 bp from the TSS, 10 bin variable gene body, and -1000 to +4000 from the poly(A) site. Note reduced pol II accumulation near TSS, and higher pol II relative frequency within 3’ regions when Fcp1 is depleted.
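The relative frequency normalization named in the caption (reads per bin divided by total reads in all bins) can be sketched in a few lines. The bin counts below are hypothetical and serve only to show that the normalization reports the shape of a profile independently of its overall signal level.

```python
def relative_frequency(bin_counts):
    """Divide each bin's read count by the total reads across all bins."""
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("profile contains no reads")
    return [count / total for count in bin_counts]

# Hypothetical four-bin profiles (TSS-proximal bin first).
control_profile = relative_frequency([40, 30, 20, 10])
knockdown_profile = relative_frequency([10, 15, 15, 10])

# The knockdown library has fewer reads overall, but the comparison is of
# shape: a smaller share of its signal sits in the TSS-proximal bin.
print(control_profile[0], knockdown_profile[0])
```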


[Figure 4-7 image: NELF ChIP-Seq UCSC Genome Browser views (hg19), tracks hg19_shScr_Nelf and hg19_shFcp1_996_Nelf. A) chr19:36,705,000-36,730,000 over ZNF146 and ZNF565. B) chr14:102,545,000-102,555,000 over HSP90AA1.]

Figure 4-7: Nelf distribution is not influenced by Fcp1 depletion. A) Nelf ChIP-seq signal on the ZNF146 gene in the shFcp1_1 line (clone 996, top track) compared to the shScramble (shScr) control line (bottom track). B) Nelf ChIP-seq signal as in A) on the HSP90AA1 gene.


[Figure 4-8 image: anti-Nelf ChIP metagene plot, mean relative frequency versus gene position (-500, TSS, +500, poly(A)) for shScr_Nelf_hg19 and shFcp1_996_Nelf_hg19; N = 15644 genes in common.]

Figure 4-8: Fcp1 knockdown does not alter relative distribution of NELF genome-wide. Nelf ChIP-Seq relative frequency profiles (reads per 50bp bin divided by total reads in all bins) across 15644 genes in shScramble control line (shScr, black line) compared to 293 line expressing shFcp1 (shFcp1_1 clone 996, red line).

bodies, Fcp1 knockdown might result in increased Ser2P signal on genes. To evaluate how Fcp1 influences the pol II phosphorylation state on genes, I performed ChIP-seq in Fcp1 knockdown and control lines using antibodies specific for the Ser2P and Ser5P CTD marks. Upon knockdown of Fcp1, Ser2P levels were detectably increased in gene 3’ regions both on individual genes such as PGAM1 and ZNF146 (Figures 4-9A and 4-9B) and on a subset of 3035 highly expressed genes (Figure 4-10). Notably, the increased relative Ser2P signal tends to localize within the region just upstream and downstream of the pA-site. Ser5P signal also shifted upon Fcp1 knockdown, with higher relative signal accumulation near the TSSs of the PGAM1 and XIST genes (Figure 4-11). Metagene analysis across a cohort of 4873 genes shows that Fcp1 depletion not only redistributes Ser5P ChIP-seq signal to 5’ regions, as shown by relative frequency plots (Figure 4-12A), but also increases the mean normalized ChIP-seq signal within this region (Figure 4-12B). Together these results show that Fcp1 acts to reduce both Ser5P and Ser2P CTD marks on genes, and that loss of this activity reduces pol II promoter proximal pausing.

4.3D Fcp1 knockdown results in an upstream shift in pA-site choice

Recruitment of C/P factors to sites of transcription occurs most often at gene 3’ ends near functional PASs. Ser2P ChIP-seq signals commonly peak downstream of the PAS, coinciding with the prominent 3’ pol II peak that often precedes termination. Because dynamically phosphorylated pol II CTD can facilitate the recruitment of mRNA processing factors, one possible mechanism for the specific recruitment of C/P factors to gene 3’ ends is through direct association of these factors with Ser2P CTD enriched within this region. The relative enrichment of Ser2P near pA-sites upon Fcp1 depletion


[Figure 4-9 image: Ser2P ChIP-Seq UCSC Genome Browser views (hg19), tracks hg19_shScr_Ser2_ratmc_1, hg19_shFcp1_996_Ser2_ratmc_1, and hg19_shFcp1_999_Ser2_ratmc_1. A) chr10:99,185,000-99,200,000 over PGAM1 and EXOSC1. B) chr19:36,700,000-36,740,000 over ZNF146 and ZNF565.]

Figure 4-9: Fcp1 knockdown shifts Ser2P distribution. A) Ser2P ChIP-seq on the PGAM1 gene. The top track is the shScramble control, the next track down is the shFcp1_1 (clone 996) line, and the bottom track is the shFcp1_2 (clone 999) line. Note increased signal (red arrows) near the pA-site in the two Fcp1 knockdown lines when compared to the shScramble control. B) Ser2P ChIP-Seq signal as in A) on the ZNF146 gene, showing overall increased signal upon Fcp1 depletion and slightly decreased signal (red arrows) in the region of the 3’ flank.


[Figure 4-10 image: anti-Ser2P ChIP-Seq metagene plots, mean relative frequency versus gene position for hg19_shScr_Ser2_ratmc_1, hg19_shFcp1_996_Ser2_ratmc_1, and hg19_shFcp1_999_Ser2_ratmc_1; N = 3035, 5k_Brannan_gene_list. A) Full profile (-500, TSS, +500, poly(A)). B) Region around the poly(A) site.]

Figure 4-10: Relative ChIP-Seq frequency of CTD Ser2P. A) ChIP-Seq relative frequency of Ser2P in Fcp1 knockdown lines across 3035 genes, compared to shScramble control. Black line is shScramble control, green line is shFcp1_1 clone 996 line, and blue line is shFcp1_2 clone 999 line. Note relative enrichment of Ser2P within gene 3’ regions in two shFcp1 lines when compared to shScramble controls. B) Window focusing on last 10 bins in (A) showing relative increase in Ser2P in Fcp1 knockdown within the region upstream and downstream of the pA-site preceding termination.


[Figure 4-11 image: Ser5P ChIP-seq UCSC Genome Browser views (hg19), tracks hg19_shScr_Ser5 and hg19_shFcp1_996_Ser5. A) chr10:99,185,000-99,200,000 over PGAM1 and EXOSC1. B) chrX:73,050,000-73,100,000 over XIST and TSIX.]

Figure 4-11: A) Ser5P ChIP-seq for Fcp1 knockdown (clone 996, bottom track) compared to Scramble control (top track) on the PGAM1 gene. Fcp1 depletion results in a relative increase in Ser5P near the promoter, and a decrease in Ser5P within gene bodies. B) Ser5P ChIP-seq profiles as in A) on the XIST gene.


Figure 4-12: Relative ChIP-Seq frequency and mean signal of CTD Ser5P in shFcp1 and shScramble lines. A) ChIP-Seq relative frequency of Ser5P divided by total CTD across 4873 genes, normalized to shScramble control (black line). Note relative enrichment of Ser5P within gene 5’ regions in the shFcp1 line (clone 996, green line) when compared to shScramble control. B) Mean ChIP-Seq signal across genes as in (A), normalized by library size.

might be predicted to facilitate early recruitment of C/P factors to more proximal sites.

Analysis of relative pol II ChIP-seq signal, when normalized to levels at annotated RefSeq pA-sites, showed that Fcp1 knockdown results in an earlier decline in pol II signal compared to control, implying earlier pol II termination, potentially due to earlier recruitment of C/P factors (Figure 4-13). To test whether Fcp1 depletion results in alternate pA-site usage, I performed RNA isolation and global mapping of gene 3’ ends (pA-seq) in knockdown and control lines. Comparison of pA-seq peaks on genes known to use proximal and distal alternative polyadenylation signals, such as PAG1 and ZNF146, showed preferred usage of proximal pA-sites when Fcp1 is knocked down (Figures 4-14A and 4-14B). Significant shifts (p < 0.01) in pA-site usage were detected on ~500 genes when Fcp1 was depleted with a single shRNA (shFcp1_1 (996)), and of these genes, ~76% preferentially utilized proximal pA-sites, in line with our prediction (Table A1-3). These proximal shifts were also seen for ~200 of these genes when Fcp1 was depleted with an alternate shRNA (shFcp1_2 (999)).
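The logic of calling a proximal or distal pA-site shift for a single gene can be illustrated with a minimal Python sketch. This is not the pipeline used for the analysis above: the statistical test behind the p < 0.01 cutoff is not specified in this section, so the use of Fisher's exact test here is an assumption, and the function names and read counts are hypothetical.

```python
from math import comb


def proximal_fraction(proximal_reads, distal_reads):
    """Fraction of a gene's pA-seq reads that map to the proximal pA-site."""
    total = proximal_reads + distal_reads
    if total == 0:
        raise ValueError("gene has no pA-seq reads")
    return proximal_reads / total


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def classify_shift(control, knockdown, alpha=0.01):
    """Classify a gene as 'proximal', 'distal' or 'none'.

    control and knockdown are (proximal_reads, distal_reads) tuples.
    The test choice (Fisher's exact) is an assumption made for this sketch.
    """
    p = fisher_exact_two_sided(control[0], control[1], knockdown[0], knockdown[1])
    if p >= alpha:
        return "none"
    delta = proximal_fraction(*knockdown) - proximal_fraction(*control)
    if delta > 0:
        return "proximal"
    if delta < 0:
        return "distal"
    return "none"


if __name__ == "__main__":
    # Hypothetical read counts for one gene: (proximal, distal).
    print(classify_shift(control=(30, 70), knockdown=(70, 30)))
```

Applying such a classifier to every gene with two or more annotated pA-sites, and then counting the genes in each class, would give the kind of summary reported above (number of genes with significant shifts and the fraction shifting proximally). A genome-wide analysis would also need a multiple-testing correction, which this sketch omits.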

4.3E Fcp1 knockdown does not alter recruitment of Cstf77

Use of proximal pA-sites upon Fcp1 depletion could be a result of altered cleavage factor recruitment to sites of transcription. I used ChIP-seq to investigate whether enrichment of Ser2P CTD within gene 5’ regions influences recruitment of the cleavage stimulation factor Cstf77. Normally, Cstf77 ChIP-seq signals accumulate at gene 3’ ends, similar to the profile seen for pol II Ser2P [254]. Fcp1 knockdown did not result in any significant redistribution of Cstf77 signal when relative frequency was plotted across thousands of genes (Figure 4-15A). Fcp1 depletion did result in an earlier decline in Cstf77 ChIP-seq signals downstream of pA-sites, in line with the termination effect


[Figure 4-13: anti-CTD ChIP-Seq metagene plot for two shScr samples, two shFcp1_996 samples and one shFcp1_999 sample. y-axis: mean CTD ChIP density; x-axis: gene position from PolyA to +4kb; N = 2884 genes from 5k_Brannan_gene_list.]

Figure 4-13: Pol II ChIP-Seq signal relative to poly(A) sites. CTD ChIP-Seq signal across 2884 genes focused on the region from the RefSeq pA-site to 4kb downstream, from two independent shScramble (scr) controls (brown and black lines), and two independent shFcp1 expressing HEK293 cell lines: shFcp1_1 (clone 996 technical replicates, green and yellow lines), and shFcp1_2 (clone 999, blue line). Note the earlier decline in signal in Fcp1 knockdown lines, similar to that seen in Figure 4-4 for shFcp1_1 (996).
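The normalization to the pA-site that underlies this kind of plot can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the code used to generate the figure: the bin layout (first bin at the pA-site), the function names, the threshold and the example values are all hypothetical.

```python
def normalize_to_pa_site(profile):
    """Scale one gene's binned downstream signal so the pA-site bin equals 1.

    Assumes profile[0] is the bin containing the annotated pA-site.
    """
    reference = profile[0]
    if reference <= 0:
        raise ValueError("pA-site bin must have positive signal")
    return [value / reference for value in profile]


def mean_profile(profiles):
    """Average equally binned per-gene profiles bin by bin."""
    n_bins = len(profiles[0])
    if any(len(p) != n_bins for p in profiles):
        raise ValueError("all profiles must have the same number of bins")
    return [sum(p[i] for p in profiles) / len(profiles) for i in range(n_bins)]


def first_bin_below(profile, threshold):
    """Index of the first bin whose signal falls below threshold, or None."""
    for index, value in enumerate(profile):
        if value < threshold:
            return index
    return None


if __name__ == "__main__":
    # Hypothetical binned signals for two genes, pA-site bin first.
    control = [[10, 10, 8, 6], [20, 18, 16, 10]]
    knockdown = [[10, 7, 5, 3], [20, 16, 8, 6]]

    control_mean = mean_profile([normalize_to_pa_site(p) for p in control])
    knockdown_mean = mean_profile([normalize_to_pa_site(p) for p in knockdown])

    # An earlier decline appears as a smaller bin index in the knockdown.
    print(first_bin_below(control_mean, 0.7), first_bin_below(knockdown_mean, 0.7))
```

Normalizing each gene to its own pA-site bin before averaging keeps highly expressed genes from dominating the mean, so that a shift in where the signal declines can be compared between knockdown and control lines.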


[Figure 4-14, panel A: UCSC Genome Browser view on hg19 chr8 near 81,880,000-81,890,000 (scale 5 kb) over the PAG1 gene. Panel B: hg19 chr19 near 36,727,000-36,730,000 (scale 2 kb) over the ZNF146 gene. Tracks in both panels, top to bottom: KB_shScrCircligase42deg_130829_w10_peak_known, KB_shFcp1996Circligase42deg_130829_peak_known_w10, KB_Fcp1shRNA999_130829_w10_peak_known, UCSC Genes, and PolyA_DB Poly(A) Sites (both reported and predicted).]

Figure 4-14: A) pA-seq signal on the PAG1 gene shows proximal shifting in two independent shFcp1 lines (middle track is shFcp1_1 clone 996, and bottom track is shFcp1_2 clone 999) when compared to shScramble (top track). RefSeq predicted and reported pA-site tracks are shown below sample tracks. B) Similar shift as in A) on the ZNF146 gene.


[Figure 4-15, panels A and B: anti-Cstf77 ChIP-Seq metagene plots for shScr, shFcp1_996 and shFcp1_999. y-axis: mean relative frequency of Cstf77 ChIP density. Panel A x-axis: gene position (-500, TSS, 500, PolyA). Panel B x-axis: gene position from PolyA to +4000. Both plot labels read N = 4412 : 5k_Brannan_gene_list.]

Figure 4-15: A) ChIP-seq relative frequency of Cstf77 across 2745 genes showing little redistribution in Fcp1 knockdown lines (shFcp1_1 996 is green line, shFcp1_2 999 is blue line) when compared to shScramble control (black line). B) Close up of bins 20-30 as in (A) showing that Cstf77 signal declines earlier in Fcp1 knockdown lines.

reported for CTD (Figures 4-13 and 4-15B). Cstf77 relative frequency was slightly increased in regions near the TSS and upstream of the pA-site when Fcp1 was depleted, but these results vary depending on the shRNA expressed (Figure 4-15). These results need to be repeated to determine if the mechanism by which Fcp1 increases usage of proximal pA-sites is independent of recruitment of this particular cleavage factor to pol II Ser2P CTD within gene bodies. It will be interesting to determine how Fcp1 influences recruitment of other components of the C/P machinery, and whether CTD dephosphorylation by Fcp1 is a mechanism used to regulate gene expression through altered 3’ UTR mediated changes in message stability, localization or translational output.

4.4 Discussion

The primary pol II CTD phosphatase Fcp1 has historically been defined as the factor responsible for establishing a pool of hypophosphorylated pol II needed for initiation by post-transcriptional removal of CTD Ser2P and Ser5P, but the possibility remains that Fcp1 performs this function co-transcriptionally [8, 12, 203, 259]. A role for co-transcriptional action of Fcp1 in influencing pol II elongation and limiting CTD Ser2P within gene bodies was described in yeast [171]. In this chapter I report similar results in mammalian cells, showing that Fcp1 depletion not only increases the relative level of CTD Ser2P within gene 5’ regions, but also increases the amount of pol II that escapes promoter proximal pausing and travels into gene bodies. It is conceivable that a competition exists between Fcp1 phosphatase activity and Cdk9 kinase activity that determines the extent of pol II elongation. Elongation might therefore be induced not only by PTEFb recruitment or activity, but also by inhibition of Fcp1 recruitment or activity.

CTD Ser5 phosphorylation by the Cdk7 subunit of TFIIH is important for recruitment of capping enzymes (CE) and early elongation [18]. CE counters the effect of the negative elongation factor NELF to promote pol II escape into elongation [22]. Fcp1 dephosphorylation of Ser5P has been shown to reduce recruitment and activity of CE in vitro, and in yeast Fcp1 activity results in CE release from the elongation complex [20-22]. If capping enzyme recruitment is necessary for promoter clearance, Fcp1 activity might increase pol II dwell time at promoters by competing with Ser5P mediated CE recruitment. In this chapter, I show that Fcp1 localizes to promoter regions by ChIP, and that depletion of Fcp1 increases relative Ser5P within these regions. Thus, there are two possible ways that Fcp1 might increase promoter proximal accumulation of pol II: 1) by opposing the phosphorylation of Ser2 by the Cdk9 subunit of PTEFb, thereby reducing the fraction of pol II that is “licensed” to escape pausing, and 2) by limiting Ser5P-dependent recruitment of CE, which promotes clearance [22]. It will be interesting to investigate how Fcp1 knockdown affects CE recruitment by ChIP. Fcp1 activity also likely has at least some effect on initiation by providing hypophosphorylated pol II, and depletion of Fcp1 in this study did show some effect on pol II recruitment (Figure 4-4).

Ser2 phosphorylation is known to be associated with recruitment of cleavage and polyadenylation machinery and proper mRNA 3’ end formation [250, 251]. In mammalian cells, Ser2P ChIP-seq signals peak within the region ~1-4kb downstream of PAS, where pol II often peaks before terminating. Several C/P factors also show the highest accumulation within this region, including the cleavage stimulation factor Cstf77 [254]. One interesting interpretation of these results is that pol II enriched for Ser2P is directly involved in recruitment of C/P factors to sites of PAS transcription. If this is the case, redistribution of Ser2P peaks on genes could result in recruitment of C/P factors to different regions. In metazoans, alternative PAS exist for the majority of genes. One conceivable mechanism for choosing between multiple PAS is by redistributing pol II Ser2P accumulation to favor recruitment of C/P factors to one PAS or the other. I report a significant shift in PAS usage for a large number of genes upon Fcp1 depletion. The majority of these shifts are toward gene proximal sites. In line with this result, Fcp1 depletion resulted in an earlier decline in total pol II ChIP signal within the region 4kb downstream of RefSeq PAS for these genes, implying termination near more proximal sites. These results are consistent with the prediction that Fcp1 activity is responsible for limiting usage of proximal PAS by lowering Ser2P within these regions. While Fcp1 depletion did not result in dramatic redistribution of the cleavage factor Cstf77, future studies will examine the effect of Fcp1 depletion on recruitment of other C/P factors.

Fcp1 has been shown to associate with the U1 snRNP [260], which is involved in masking cryptic PAS within gene 5’ regions. Since our current pA-seq filtering does not take into consideration usage of cryptic PAS within gene bodies, it will be interesting to expand our analysis to determine whether Fcp1 plays any additional role in limiting usage of these cryptic PAS.

Taken together, the results presented in this chapter point to multiple co-transcriptional roles for Fcp1 in influencing not only pol II pausing and escape, but also pol II termination and 3’ end formation. This places Fcp1 within the “yin and yang” of transcriptional elongation control that is emerging as a major layer of regulation for multiple steps along the transcription cycle, and also within the group of factors involved in 3’ end formation that is important for mRNA fate. The tug of war that exists between kinases such as Cdk7 and Cdk9 and phosphatases such as Fcp1 may prove to be an important mechanism for modulating transcriptional responses.


CHAPTER V

CONCLUSIONS/DISCUSSION

Transcriptional responses can be finely tuned by influencing when and where pol II pauses, elongates, and terminates. This thesis focuses on the relationship between pausing, elongation and termination at the sites of pol II accumulation both near transcription start sites and near sites of cleavage and polyadenylation. The state of pol II within these regions dictates both mRNA production rates and mRNA fate.

Regulation of pol II flux away from the promoter proximal region is considered as important for controlling transcriptional responses as initiation [231]. Since pol II promoter proximal accumulation is a general feature of metazoan protein coding genes, network specific upregulation through elongation likely requires that multiple layers of control are at play [175]. The interplay between DSIF, NELF, pol II CTD and PTEFb is well characterized as a target for regulating pol II promoter proximal escape, but how this interplay differs between genes and responses is not completely understood [67]. The transition between pausing and elongation is more complex than simple PTEFb recruitment, and other factors that influence this transition include chromatin modifiers/remodelers, antitermination factors, CTD phosphatases, and non-coding RNAs (reviewed in [261]). This thesis describes potential new mechanisms of elongation control in mammalian cells by 1) premature termination, a mechanism well characterized in bacterial and viral systems and 2) CTD dephosphorylation, a mechanism previously described in yeast.


The major evidence that I report for the premature termination phenomenon is ChIP-seq data revealing that inhibition of pol II termination factors results in widespread redistribution of pol II into gene bodies, similar to what is seen when transcription is activated by recruitment of PTEFb [54]. On the HSP90AA1 gene, which is known to be upregulated through enhanced elongation from promoter proximal pausing, this redistribution of pol II upon termination factor knockdown is correlated with increased total RNA levels (Figure 3.3F). It is conceivable that there are cellular mechanisms for switching pol II from a prematurely terminating or idling state to a processive state that escapes termination to enter productive elongation. This switch could be thrown to rapidly or synchronously upregulate mRNA levels in response to signals, as occurs when activators recruit positive elongation factors. Conversely, increased premature termination could be a silencing mechanism used to lower expression in a network specific fashion. HIV-1 potentially utilizes torpedo-mediated premature termination in this way, to maintain low levels of viral transcription until Tat recruitment facilitates pol II escape by protecting the pre-mRNA from cleavage by Drosha [50]. Protection of the cap by cap binding proteins could play a similar role in facilitating escape from premature termination by limiting access to the substrate by decapping factors. Whether recruitment of cap binding proteins changes in response to signals to upregulate gene expression is an open question. Tat also promotes elongation through recruitment of PTEFb to relieve DSIF/NELF mediated pausing [262]. Pausing induced by these negative elongation factors could be a prerequisite for premature termination by allowing a temporal window for decapping and pol II tracking by Xrn2, and so PTEFb recruitment possibly adds another layer of control that allows escape from premature termination [262]. There may be a controlled competition between the influence of factors that inhibit or enhance premature termination, and those that do the same for stable pausing.

An interesting avenue for future studies will be to carefully catalogue how the relationship between pausing and premature termination is gene specifically altered in response to signaling such as cytokine stimulation, or stresses such as heat-shock or hypoxia.

It has been argued that premature termination is quite rare on metazoan genes as evidenced by long pol II dwell times and the very low detectable levels of terminated RNAs mapping to promoter proximal regions [122]. It is important, however, to consider that stable pausing and premature termination of pol II are not mutually exclusive events. It is even possible that stable pausing of any single pol II might increase its probability of undergoing premature termination. Further, RNA products of premature termination by torpedo would be very difficult to detect due to rapid degradation by Xrn2. Indeed, Xrn2 depletion results in stabilization of an RNA species that maps to promoter proximal regions and is of a size that would be protected from degradation if contained within the pol II exit channel, precisely what would be expected of the degradation products of torpedo-mediated premature termination [138]. Increasingly sophisticated studies will be needed to quantify, in a gene and context specific manner, the level of promoter proximal pol II that undergoes premature termination, the level that proceeds to productive elongation, and how these levels vary in response to developmental or external cues.

Xrn2 is a termination factor shown to have roles not only at promoter proximal regions, but also at gene 3’ ends following cleavage and polyadenylation [50, 100, 263]. It is not clear why depletion of Xrn2 alone reduces promoter proximal accumulation of pol II, but does not result in dramatic termination defects in 3’ flanking regions of genes (Chapter III [104]). One explanation is that residual Xrn2 remaining in shRNA treated cells is directed primarily to sites of cleavage and polyadenylation. The most dramatic effect of Xrn2 depletion is seen when it is combined with depletion of the associating helicase TTF2, and TTF2 ChIP-seq signal reveals co-localization with Xrn2 near gene promoters. Another helicase, SETX, has been shown to cooperate in terminating pol II within gene 3’ flanking regions [119]. These results suggest a possible switching of Xrn2/helicase cooperation along the length of genes. Xrn2 in complex with TTF2 may preferentially associate with factors that localize to gene 5’ ends, such as capping/decapping factors or pol II enriched for Ser5P. Xrn2 that cooperates with SETX may preferentially associate with factors that localize to gene 3’ ends, such as cleavage-polyadenylation machinery or pol II Ser2P. SETX also participates in promoter proximal termination on HIV-1, and termination by cleavage and polyadenylation near gene promoters was shown to influence directionality of divergent transcription by an uncharacterized mechanism [50, 264]. What flavors of cleavage and polyadenylation machinery/Xrn2/helicase interactions might be at play at certain times at various genes, and which premature termination mechanisms predominate at different promoters, remain a matter of debate.

Xrn2 is likely involved in a variety of processes separate from pol II termination. I reported results of proteomic analysis conducted by Kirk Hansen on Xrn2 immunoprecipitates from RNase treated HeLa nuclear extracts purified by Ben Erickson in the lab, which showed that factors interacting with Xrn2 include rRNA maturation factors, splicing factors, cleavage-polyadenylation factors, histone variants, and the pol I termination factor TTF1. Follow-up studies will establish the role of these interactions in mammalian systems. It also remains to be determined whether the human ortholog of yeast Rai1, DOM3Z, has a conserved function in premature termination of defectively capped RNAs as does yeast Rai1 [124].

Another way cells may control the decision between premature termination and productive elongation is by antagonizing the action of TTF2. The transcription factor Gdown1 can inhibit the termination activity of TTF2 in vivo [11]. Gdown1 associated pol II was also shown to be inefficient at elongation mediated by TFIIF. It is interesting to note that Gdown1 both promotes pausing and inhibits termination. It remains to be determined whether similar factors specifically limit the activity of SETX in Xrn2-mediated termination at gene 3’ ends, and whether 3’ end pausing of pol II is mediated by the action of negative elongation factors.

Since it is well established that phosphorylation of the pol II CTD at Ser2 promotes escape from promoter proximal pausing, it seems possible that removal of this mark may competitively oppose this escape. The primary Ser2P phosphatase Fcp1 has been localized to gene 5’ regions in yeast [171], and in this thesis by ChIP-seq in mammalian cells (Chapter IV, Figure 1). In yeast, Fcp1 action opposed the kinase activity of CTD kinase Ctk1, as evidenced when Fcp1 mutation increased the amount of elongating pol II phosphorylated at Ser2 within gene bodies [171]. I report similar results, demonstrating that Fcp1 depletion by shRNA increases relative Ser2P within gene 5’ regions, and results in redistribution of pol II away from promoters. These results imply that a competition between phosphorylation and dephosphorylation during transcription may provide another layer of elongation control. Future studies will be important for determining how the balance between phosphatase and kinase activity is responsively shifted.

Once initiation of transcription has been established, it now appears that there are many ways to stimulate a transcriptional response at the transition to elongation. Potential cellular mechanisms suggested in this thesis that remain to be tested include inhibition of or protection from decapping near promoters, and recruitment of factors that antagonize premature termination and/or antagonize action of CTD phosphatases at the elongation checkpoint. Further, regulation of elongation by premature termination or CTD dephosphorylation needs to be analyzed in the context of signal mediated gene activation, cellular stresses, and tissue disease states.

Termination that occurs after productive elongation is coupled to 3’ end processing, and regulation at this step is important for establishing mRNA fate. Since Ser2P peaks downstream of poly(A) sites, and since it directly binds 3’ end processing factors, it is believed that this mark influences both poly(A) site choice and termination. In this thesis, I report that depletion of the phosphatase Fcp1 increases the relative amount of Ser2P within gene bodies, and results in both early termination and increased usage of proximal poly(A) sites. This result draws a connection between modulation of Ser2P and APA, and may implicate Fcp1 in diseases where APA is altered, such as cancer. It will be important to determine if the poly(A) site shifts observed in HEK293 cells correspond to shifts observed in cancers where Fcp1 is mutated or downregulated. I also report that Fcp1 depletion results in increased relative Ser5P near gene promoters. Since Fcp1 is thought to be necessary for reinitiation of transcription through erasure of both Ser5P and Ser2P, its influence on APA may also contribute to a feedback regulation. In this scenario, messages that are subject to reduced initiation, through reduced Fcp1 localization, have increased elongation and can be stabilized by usage of proximal poly(A) sites, allowing exclusion of destabilizing elements within the 3’ UTR. Our lab is in the process of determining the relative steady state levels of messages that have altered APA upon Fcp1 knockdown.

PTEFb phosphorylates pol II CTD as well as the CTR of the Spt5 subunit of DSIF, and both CTD Ser2P and phosphorylated Spt5 steadily increase downstream of the elongation checkpoint and into gene bodies. Unlike Ser2P, however, Spt5 phosphorylation does not peak downstream of poly(A) sites preceding 3’ end termination. A longer pol II dwell time within this region could facilitate both 3’ end formation and pol II termination. How Spt5 undergoes dephosphorylation remains to be determined since no Spt5 CTR phosphatase has been identified. Identification of such a phosphatase could allow direct testing of the role of Spt5 dephosphorylation in 3’ end pausing and termination of pol II as well as 3’ end processing of the transcript.

The major conclusions from results presented in this thesis are that 1) depletion of both termination and decapping factors has widespread effects on pol II distribution, likely by inhibition of premature termination, and 2) depletion of the CTD phosphatase Fcp1 causes a decrease in pol II promoter proximal accumulation and a general upstream shift in poly(A) site usage correlated with early termination. Together these results point to new mechanisms of transcriptional control both at the elongation checkpoint, and at the level of 3’ end formation.


REFERENCES

1. Hsin, J.P. and J.L. Manley, The RNA polymerase II CTD coordinates transcription and RNA processing. Genes Dev, 2012. 26(19): p. 2119-37.

2. Saunders, A., L.J. Core, and J.T. Lis, Breaking barriers to transcription elongation. Nat Rev Mol Cell Biol, 2006. 7(8): p. 557-67.

3. Orphanides, G. and D. Reinberg, A unified theory of gene expression. Cell, 2002. 108(4): p. 439-51.

4. Bentley, D.L., Rules of engagement: co-transcriptional recruitment of pre-mRNA processing factors. Curr Opin Cell Biol, 2005. 17(3): p. 251-6.

5. Thomas, M.C. and C.M. Chiang, The general transcription machinery and general cofactors. Crit Rev Biochem Mol Biol, 2006. 41(3): p. 105-78.

6. Sikorski, T.W. and S. Buratowski, The basal initiation machinery: beyond the general transcription factors. Curr Opin Cell Biol, 2009. 21(3): p. 344-51.

7. Juven-Gershon, T., et al., The RNA polymerase II core promoter - the gateway to transcription. Curr Opin Cell Biol, 2008. 20(3): p. 253-9.

8. Zawel, L., K.P. Kumar, and D. Reinberg, Recycling of the general transcription factors during RNA polymerase II transcription. Genes Dev, 1995. 9(12): p. 1479-90.

9. Maxon, M.E., J.A. Goodrich, and R. Tjian, Transcription factor IIE binds preferentially to RNA polymerase IIA and recruits TFIIH - a model for promoter clearance. Genes Dev, 1994. 8(5): p. 515-524.

10. Goodrich, J.A. and R. Tjian, Transcription factors IIE and IIH and ATP hydrolysis direct promoter clearance by RNA polymerase II. Cell, 1994. 77(1): p. 145-156.

11. Cheng, B., et al., Functional association of Gdown1 with RNA polymerase II poised on human genes. Mol Cell, 2012. 45(1): p. 38-50.

12. Archambault, J., et al., FCP1, the RAP74-interacting subunit of a human protein phosphatase that dephosphorylates the carboxyl-terminal domain of RNA polymerase IIO. J Biol Chem, 1998. 273(42): p. 27593-601.

13. Max, T., M. Sogaard, and J.Q. Svejstrup, Hyperphosphorylation of the C-terminal repeat domain of RNA polymerase II facilitates dissociation of its complex with mediator. J Biol Chem, 2007. 282(19): p. 14113-20.


14. Yudkovsky, N., J.A. Ranish, and S. Hahn, A transcription reinitiation intermediate that is stabilized by activator. Nature, 2000. 408(6809): p. 225-9.

15. Holstege, F.C., P.C. van der Vliet, and H.T. Timmers, Opening of an RNA polymerase II promoter occurs in two distinct steps and requires the basal transcription factors IIE and IIH. EMBO J, 1996. 15(7): p. 1666-77.

16. Keene, R.G. and D.S. Luse, Initially transcribed sequences strongly affect the extent of abortive initiation by RNA polymerase II. J Biol Chem, 1999. 274(17): p. 11526-34.

17. Reines, D., J.W. Conaway, and R.C. Conaway, The RNA polymerase II general elongation factors. Trends Biochem Sci, 1996. 21: p. 351-355.

18. Cho, E.J., et al., mRNA capping enzyme is recruited to the transcription complex by phosphorylation of the RNA polymerase II carboxy-terminal domain. Genes Dev, 1997. 11(24): p. 3319-26.

19. Ho, C. and S. Shuman, Distinct effector roles for Ser2 and Ser5 phosphorylation of the RNA polymerase II CTD in the recruitment and allosteric activation of mammalian capping enzyme. Mol Cell, 1999. 3: p. 405-411.

20. Komarnitsky, P., E.J. Cho, and S. Buratowski, Different phosphorylated forms of RNA polymerase II and associated mRNA processing factors during transcription. Genes Dev, 2000. 14(19): p. 2452-60.

21. Schroeder, S.C., et al., Dynamic association of capping enzymes with transcribing RNA polymerase II. Genes Dev, 2000. 14(19): p. 2435-40.

22. Mandal, S.S., et al., Functional interactions of RNA-capping enzyme with factors that positively and negatively regulate promoter escape by RNA polymerase II. Proc Natl Acad Sci U S A, 2004. 101(20): p. 7572-7.

23. Shatkin, A.J., mRNA cap binding proteins: essential factors for initiating translation. Cell, 1985. 40(2): p. 223-4.

24. Hay, N., H. Skolnik-David, and Y. Aloni, Attenuation in the control of SV40 gene expression. Cell, 1982. 29(1): p. 183-93.

25. Skolnik-David, H. and Y. Aloni, Pausing of RNA polymerase molecules during in vivo transcription of the SV40 leader region. EMBO J, 1983. 2(2): p. 179-84.

26. Skarnes, W.C., D.C. Tessier, and N.H. Acheson, RNA polymerases stall and/or prematurely terminate nearby both early and late promoters on polyomavirus DNA. J Mol Biol, 1988. 203(1): p. 153-71.


27. Yanofsky, C., Transcription attenuation: once viewed as a novel regulatory strategy. J Bacteriol, 2000. 182(1): p. 1-8.

28. Coppola, J.A., A.S. Field, and D.S. Luse, Promoter-proximal pausing by RNA polymerase II in vitro: transcripts shorter than 20 nucleotides are not capped. Proc Natl Acad Sci U S A, 1983. 80: p. 1251-5.

29. Gilmour, D.S. and J.T. Lis, RNA polymerase II interacts with the promoter region of the noninduced hsp70 gene in Drosophila melanogaster cells. Mol Cell Biol, 1986. 6(11): p. 3984-9.

30. Abrahante, J.E., E.A. Miller, and A.E. Rougvie, Identification of heterochronic mutants in Caenorhabditis elegans. Temporal misexpression of a collagen::green fluorescent protein fusion gene. Genetics, 1998. 149(3): p. 1335-51.

31. Bentley, D.L. and M. Groudine, A block to elongation is largely responsible for decreased transcription of c-myc in differentiated HL60 cells. Nature, 1986. 321(6071): p. 702-706.

32. Eick, D. and G.W. Bornkamm, Transcriptional arrest within the first exon is a fast control mechanism in c-myc gene expression. Nucleic Acids Res, 1986. 14(21): p. 8331-46.

33. Rougvie, A.E. and J.T. Lis, Postinitiation transcriptional control in Drosophila melanogaster. Mol Cell Biol, 1990. 10(11): p. 6041-6045.

34. Chen, Z., et al., Identification and characterization of transcriptional arrest sites in exon 1 of the human adenosine deaminase gene. Mol Cell Biol, 1990. 10(9): p. 4555-64.

35. Collart, M.A., et al., c-fos gene transcription in murine macrophages is modulated by a calcium-dependent block to elongation in intron 1. Mol Cell Biol, 1991. 11(5): p. 2826-31.

36. Mirkovitch, J. and J.E. Darnell, Mapping of RNA-polymerase on mammalian genes in cells and nuclei. Mol Biol Cell, 1992. 3(10): p. 1085-1094.

37. Schilling, L.J. and P.J. Farnham, Inappropriate transcription from the 5' end of the murine dihydrofolate-reductase gene masks transcriptional regulation. Nucleic Acids Res, 1994. 22(15): p. 3061-3068.

38. Crouse, G.F., et al., Analysis of the mouse dhfr promoter region: existence of a divergently transcribed gene. Mol Cell Biol, 1985. 5(8): p. 1847-58.


39. Krumm, A., L.B. Hickey, and M. Groudine, Promoter-proximal pausing of RNA polymerase II defines a general rate-limiting step after transcription initiation. Genes Dev, 1995. 9(5): p. 559-72.

40. Guenther, M.G., et al., A chromatin landmark and transcription initiation at most promoters in human cells. Cell, 2007. 130(1): p. 77-88.

41. Muse, G.W., et al., RNA polymerase is poised for activation across the genome. Nat Genet, 2007. 39(12): p. 1507-11.

42. Core, L.J., J.J. Waterfall, and J.T. Lis, Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science, 2008. 322(5909): p. 1845-8.

43. Brown, S.A., A.N. Imbalzano, and R.E. Kingston, Activator-dependent regulation of transcriptional pausing on nucleosomal templates. Genes Dev, 1996. 10(12): p. 1479-1490.

44. Gilchrist, D.A., et al., Pausing of RNA Polymerase II Disrupts DNA-Specified Nucleosome Organization to Enable Precise Gene Regulation. Cell, 2010. 143(4): p. 540-551.

45. Lee, H., et al., DNA sequence requirements for generating paused polymerase at the start of hsp70. Genes Dev, 1992. 6(2): p. 284-95.

46. Kao, S.Y., et al., Anti-termination of transcription within the long terminal repeat of HIV-1 by tat gene product. Nature, 1987. 330(6147): p. 489-493.

47. Pei, Y., B. Schwer, and S. Shuman, Interactions between fission yeast Cdk9, its cyclin partner Pch1, and mRNA capping enzyme Pct1 suggest an elongation checkpoint for mRNA quality control. J Biol Chem, 2003. 278(9): p. 7180-8.

48. Rasmussen, E.B. and J.T. Lis, In vivo transcriptional pausing and cap formation on three Drosophila heat shock genes. Proc Natl Acad Sci U S A, 1993. 90(17): p. 7923-7.

49. Greenblatt, J., J. Nodwell, and S. Mason, Transcriptional antitermination. Nature, 1993. 364: p. 401-406.

50. Wagschal, A., et al., Microprocessor, Setx, Xrn2, and Rrp6 Co-operate to Induce Premature Termination of Transcription by RNAPII. Cell, 2012. 150(6): p. 1147-1157.

51. Southgate, C.D. and M.R. Green, The HIV-1 Tat protein activates transcription from an upstream DNA-binding site: implications for Tat function. Genes Dev, 1991. 5(12B): p. 2496-507.

52. Yankulov, K., et al., Transcriptional Elongation by RNA Polymerase II is Stimulated by Transactivators. Cell, 1994. 77: p. 749-759.

53. Barboric, M., et al., NF-kappaB binds P-TEFb to stimulate transcriptional elongation by RNA polymerase II. Mol Cell, 2001. 8(2): p. 327-37.

54. Rahl, P.B., et al., c-Myc regulates transcriptional pause release. Cell, 2010. 141(3): p. 432-45.

55. Blair, W.S., R.A. Fridell, and B.R. Cullen, Synergistic enhancement of both initiation and elongation by acidic transcription activation domains. EMBO J, 1996. 15(7): p. 1658-1665.

56. Blau, J., et al., Three functional classes of transcriptional activation domain. Mol Cell Biol, 1996. 16: p. 2044-2055.

57. Fraser, N.W., P.B. Sehgal, and J.E. Darnell, DRB-induced premature termination of late adenovirus transcription. Nature, 1978. 272(5654): p. 590-3.

58. Wada, T., et al., DSIF, a novel transcription elongation factor that regulates RNA polymerase II processivity, is composed of human Spt4 and Spt5 homologs. Genes Dev, 1998. 12(3): p. 343-56.

59. Yamaguchi, Y., et al., NELF, a multisubunit complex containing RD, cooperates with DSIF to repress RNA polymerase II elongation. Cell, 1999. 97(1): p. 41-51.

60. Marshall, N.F. and D.H. Price, Control of formation of two distinct classes of RNA polymerase II elongation complexes. Mol Cell Biol, 1992. 12(5): p. 2078-90.

61. Wei, P., et al., A novel CDK9-associated C-type cyclin interacts directly with HIV-1 Tat and mediates its high-affinity, loop-specific binding to TAR RNA. Cell, 1998. 92(4): p. 451-462.

62. Zhu, Y.R., et al., Transcription elongation factor P-TEFb is required for HIV-1 Tat transactivation in vitro. Genes Dev, 1997. 11(20): p. 2622-2632.

63. Wada, T., et al., Evidence that P-TEFb alleviates the negative effect of DSIF on RNA polymerase II-dependent transcription in vitro. Embo J, 1998. 17(24): p. 7395-403.

64. Marshall, N.F., et al., Control of RNA polymerase II elongation potential by a novel carboxyl-terminal domain kinase. J Biol Chem, 1996. 271(43): p. 27176-83.

65. Yamada, T., et al., P-TEFb-mediated phosphorylation of hSpt5 C-terminal repeats is critical for processive transcription elongation. Mol Cell, 2006. 21(2): p. 227-37.

66. Amir-Zilberstein, L., et al., Differential regulation of NF-kappaB by elongation factors is determined by core promoter type. Mol Cell Biol, 2007. 27(14): p. 5246-59.

67. Peterlin, B.M. and D.H. Price, Controlling the elongation phase of transcription with P-TEFb. Mol Cell, 2006. 23(3): p. 297-305.

68. Smith, E., C. Lin, and A. Shilatifard, The super elongation complex (SEC) and MLL in development and disease. Genes Dev, 2011. 25(7): p. 661-72.

69. Yang, Z., et al., Recruitment of P-TEFb for stimulation of transcriptional elongation by the bromodomain protein Brd4. Mol Cell, 2005. 19(4): p. 535-45.

70. Price, D.H., Regulation of RNA polymerase II elongation by c-Myc. Cell, 2010. 141(3): p. 399-400.

71. Steinmetz, E.J., et al., RNA-binding protein Nrd1 directs poly(A)-independent 3'-end formation of RNA polymerase II transcripts. Nature, 2001. 413(6853): p. 327-31.

72. Kim, M., et al., Distinct pathways for snoRNA and mRNA termination. Mol Cell, 2006. 24(5): p. 723-34.

73. Arigo, J.T., et al., Termination of Cryptic Unstable Transcripts Is Directed by Yeast RNA-Binding Proteins Nrd1 and Nab3. Mol Cell, 2006. 23(6): p. 841-51.

74. Lenstra, T.L., et al., The Role of Ctk1 Kinase in Termination of Small Non-Coding RNAs. PLoS One, 2013. 8(12): p. e80495.

75. Gudipati, R.K., et al., Phosphorylation of the RNA polymerase II C-terminal domain dictates transcription termination choice. Nat Struct Mol Biol, 2008. 15(8): p. 786-94.

76. Cuello, P., et al., Transcription of the human U2 snRNA genes continues beyond the 3' box in vivo. Embo J, 1999. 18(10): p. 2867-77.

77. Kiss, T., Small nucleolar RNAs: an abundant group of noncoding RNAs with diverse cellular functions. Cell, 2002. 109(2): p. 145-8.

78. Vasiljeva, L., et al., The Nrd1-Nab3-Sen1 termination complex interacts with the Ser5-phosphorylated RNA polymerase II C-terminal domain. Nat Struct Mol Biol, 2008. 15(8): p. 795-804.

79. Cui, M., et al., Genes involved in pre-mRNA 3'-end formation and transcription termination revealed by a lin-15 operon Muv suppressor screen. Proc Natl Acad Sci U S A, 2008. 105(43): p. 16665-70.

80. Gilmartin, G.M., Eukaryotic mRNA 3' processing: a common means to different ends. Genes Dev, 2005. 19(21): p. 2517-21.

81. Proudfoot, N.J. and G.G. Brownlee, 3' non-coding region sequences in eukaryotic messenger RNA. Nature, 1976. 263(5574): p. 211-4.

82. Zhao, J., L. Hyman, and C. Moore, Formation of mRNA 3' ends in eukaryotes: mechanism, regulation, and interrelationships with other steps in mRNA synthesis. Microbiol Mol Biol Rev, 1999. 63(2): p. 405-45.

83. Mandel, C.R., Y. Bai, and L. Tong, Protein factors in pre-mRNA 3'-end processing. Cell Mol Life Sci, 2008. 65(7-8): p. 1099-122.

84. Kaufmann, I., et al., Human Fip1 is a subunit of CPSF that binds to U-rich RNA elements and stimulates poly(A) polymerase. Embo J, 2004. 23(3): p. 616-26.

85. Ryan, K., O. Calvo, and J.L. Manley, Evidence that polyadenylation factor CPSF-73 is the mRNA 3' processing endonuclease. RNA, 2004. 10(4): p. 565-73.

86. Manley, J.L., P.A. Sharp, and M.L. Gefter, RNA synthesis in isolated nuclei: processing of adenovirus serotype 2 late messenger RNA precursors. J Mol Biol, 1982. 159(4): p. 581-99.

87. Elkon, R., A.P. Ugalde, and R. Agami, Alternative cleavage and polyadenylation: extent, regulation and function. Nat Rev Genet, 2013. 14(7): p. 496-506.

88. Dominski, Z. and W.F. Marzluff, Formation of the 3' end of histone mRNA. Gene, 1999. 239(1): p. 1-14.

89. Dominski, Z., X.C. Yang, and W.F. Marzluff, The Polyadenylation Factor CPSF-73 Is Involved in Histone-Pre-mRNA Processing. Cell, 2005. 123(1): p. 37-48.

90. Kolev, N.G. and J.A. Steitz, Symplekin and multiple other polyadenylation factors participate in 3'-end maturation of histone mRNAs. Genes Dev, 2005. 19(21): p. 2583-92.

91. Sanchez, R. and W.F. Marzluff, The stem-loop binding protein is required for efficient translation of histone mRNA in vivo and in vitro. Mol Cell Biol, 2002. 22(20): p. 7093-104.

92. Whitelaw, E. and N. Proudfoot, Alpha-thalassaemia caused by a poly(A) site mutation reveals that transcriptional termination is linked to 3' end processing in the human alpha 2 globin gene. EMBO J, 1986. 5(11): p. 2915-22.

93. Logan, J., et al., A poly(A) addition site and a downstream termination region are required for efficient cessation of transcription by RNA polymerase II in the mouse beta maj-globin gene. Proc Natl Acad Sci U S A, 1987. 84(23): p. 8306-10.

94. Connelly, S. and J.L. Manley, A functional mRNA polyadenylation signal is required for transcription termination by RNA polymerase II. Genes Dev, 1988. 2(4): p. 440-52.

95. Kuhn, U. and E. Wahle, Structure and function of poly(A) binding proteins. Biochim Biophys Acta, 2004. 1678(2-3): p. 67-84.

96. Mangus, D.A., M.C. Evans, and A. Jacobson, Poly(A)-binding proteins: multifunctional scaffolds for the post-transcriptional control of gene expression. Genome Biol, 2003. 4(7): p. 223.

97. Orozco, I.J., S.J. Kim, and H.G. Martinson, The poly(A) signal, without the assistance of any downstream element, directs RNA polymerase II to pause in vivo and then to release stochastically from the template. J Biol Chem, 2002. 277(45): p. 42899-911.

98. Tran, D.P., et al., Mechanism of poly(A) signal transduction to RNA polymerase II in vitro. Mol Cell Biol, 2001. 21(21): p. 7495-508.

99. Rosonina, E., S. Kaneko, and J.L. Manley, Terminating the transcript: breaking up is hard to do. Genes Dev, 2006. 20(9): p. 1050-6.

100. West, S., N. Gromak, and N.J. Proudfoot, Human 5' --> 3' exonuclease Xrn2 promotes transcription termination at co-transcriptional cleavage sites. Nature, 2004. 432(7016): p. 522-5.

101. Osheim, Y.N., N.J. Proudfoot, and A.L. Beyer, EM visualization of transcription by RNA polymerase II: downstream termination requires a poly(A) signal but not transcript cleavage. Mol Cell, 1999. 3(3): p. 379-87.

102. Sadowski, M., et al., Independent functions of yeast Pcf11p in pre-mRNA 3' end processing and in transcription termination. Embo J, 2003. 22(9): p. 2167-77.

103. Luo, W., A.W. Johnson, and D.L. Bentley, The role of Rat1 in coupling mRNA 3'-end processing to transcription termination: implications for a unified allosteric-torpedo model. Genes Dev, 2006. 20(8): p. 954-65.

104. Dengl, S. and P. Cramer, Torpedo nuclease Rat1 is insufficient to terminate RNA polymerase II in vitro. J Biol Chem, 2009. 284(32): p. 21270-9.

105. Banerjee, A., et al., A novel tandem reporter quantifies RNA polymerase II termination in mammalian cells. PLoS ONE, 2009. 4(7): p. e6193.

106. Zhang, Z., J. Fu, and D.S. Gilmour, CTD-dependent dismantling of the RNA polymerase II elongation complex by the pre-mRNA 3'-end processing factor, Pcf11. Genes Dev, 2005. 19(13): p. 1572-80.

107. Calvo, O. and J.L. Manley, Evolutionarily conserved interaction between CstF-64 and PC4 links transcription, polyadenylation, and termination. Mol Cell, 2001. 7(5): p. 1013-23.

108. Dermody, J.L., et al., Unphosphorylated SR-like protein Npl3 stimulates RNA polymerase II elongation. PLoS ONE, 2008. 3(9): p. e3273.

109. Mayer, A., et al., CTD tyrosine phosphorylation impairs termination factor recruitment to RNA polymerase II. Science, 2012. 336(6089): p. 1723-5.

110. Kim, M., et al., Transitions in RNA polymerase II elongation complexes at the 3' ends of genes. Embo J, 2004. 23(2): p. 354-64.

111. Birse, C.E., et al., Coupling termination of transcription to messenger RNA maturation in yeast. Science, 1998. 280(5361): p. 298-301.

112. Alen, C., et al., A role for chromatin remodeling in transcriptional termination by RNA polymerase II. Mol Cell, 2002. 10(6): p. 1441-52.

113. Dichtl, B., et al., Yhh1p/Cft1p directly links poly(A) site recognition and RNA polymerase II transcription termination. Embo J, 2002. 21(15): p. 4125-35.

114. Kim, M., et al., The yeast Rat1 exonuclease promotes transcription termination by RNA polymerase II. Nature, 2004. 432(7016): p. 517-22.

115. Kaneko, S., et al., The multifunctional protein p54nrb/PSF recruits the exonuclease XRN2 to facilitate pre-mRNA 3' processing and transcription termination. Genes Dev, 2007. 21(14): p. 1779-89.

116. West, S. and N.J. Proudfoot, Human Pcf11 enhances degradation of RNA polymerase II-associated nascent RNA and transcriptional termination. Nucleic Acids Res, 2008. 36(3): p. 905-14.

117. Suraweera, A., et al., Functional role for senataxin, defective in ataxia oculomotor apraxia type 2, in transcriptional regulation. Hum Mol Genet, 2009. 18(18): p. 3384-96.

118. Jiang, Y., et al., Involvement of transcription termination factor 2 in mitotic repression of transcription elongation. Mol Cell, 2004. 14(3): p. 375-85.

119. Skourti-Stathaki, K., N.J. Proudfoot, and N. Gromak, Human senataxin resolves RNA/DNA hybrids formed at transcriptional pause sites to promote Xrn2-dependent termination. Mol Cell, 2011. 42(6): p. 794-805.

120. Skolnik-David, H., N. Hay, and Y. Aloni, Site of premature termination of late transcription of simian virus 40 DNA: enhancement by 5,6-dichloro-1-beta-D-ribofuranosylbenzimidazole. Proc Natl Acad Sci U S A, 1982. 79(9): p. 2743-7.

121. Seila, A.C., et al., Divergent transcription from active promoters. Science, 2008. 322(5909): p. 1849-51.

122. Henriques, T., et al., Stable Pausing by RNA Polymerase II Provides an Opportunity to Target and Integrate Regulatory Signals. Mol Cell, 2013. 52(4): p. 517-28.

123. Davidson, L., A. Kerr, and S. West, Co-transcriptional degradation of aberrant pre-mRNA by Xrn2. EMBO J, 2012. 31(11): p. 2566-78.

124. Jiao, X., et al., Identification of a quality-control mechanism for mRNA 5'-end capping. Nature, 2010. 467(7315): p. 608-11.

125. Kastenmayer, J.P. and P.J. Green, Novel features of the XRN-family in Arabidopsis: evidence that AtXRN4, one of several orthologs of nuclear Xrn2p/Rat1p, functions in the cytoplasm. Proc Natl Acad Sci U S A, 2000. 97(25): p. 13985-90.

126. Kenna, M., et al., An essential yeast gene with homology to the exonuclease-encoding XRN1/KEM1 gene also encodes a protein with exoribonuclease activity. Mol Cell Biol, 1993. 13(1): p. 341-50.

127. Zhang, M., et al., Cloning and mapping of the XRN2 gene to human 20p11.1-p11.2. Genomics, 1999. 59(2): p. 252-4.

128. Sugano, S., et al., Molecular analysis of the dhp1+ gene of Schizosaccharomyces pombe: an essential gene that has homology to the DST2 and RAT1 genes of Saccharomyces cerevisiae. Mol Gen Genet, 1994. 243(1): p. 1-8.

129. Gy, I., et al., Arabidopsis FIERY1, XRN2, and XRN3 are endogenous RNA silencing suppressors. Plant Cell, 2007. 19(11): p. 3451-61.

130. Estavillo, G.M., et al., Evidence for a SAL1-PAP chloroplast retrograde pathway that functions in drought and high light signaling in Arabidopsis. Plant Cell, 2011. 23(11): p. 3992-4012.

131. Hirsch, J., et al., A novel fry1 allele reveals the existence of a mutant phenotype unrelated to 5'->3' exoribonuclease (XRN) activities in Arabidopsis thaliana roots. PLoS One, 2011. 6(2): p. e16724.

132. Fatica, A. and D. Tollervey, Making ribosomes. Curr Opin Cell Biol, 2002. 14(3): p. 313-8.

133. Luke, B. and J. Lingner, TERRA: telomeric repeat-containing RNA. EMBO J, 2009. 28(17): p. 2503-10.

134. Luke, B., et al., The Rat1p 5' to 3' exonuclease degrades telomeric repeat-containing RNA and promotes telomere elongation in Saccharomyces cerevisiae. Mol Cell, 2008. 32(4): p. 465-77.

135. Tollervey, D., Molecular biology: termination by torpedo. Nature, 2004. 432(7016): p. 456-7.

136. Taft, R.J., et al., Tiny RNAs associated with transcription start sites in animals. Nat Genet, 2009. 41(5): p. 572-8.

137. Lenhard, B., A. Sandelin, and P. Carninci, Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nat Rev Genet, 2012. 13(4): p. 233-45.

138. Valen, E., et al., Biogenic mechanisms and utilization of small RNAs derived from human protein-coding genes. Nat Struct Mol Biol, 2011. 18: p. 1075-1082.

139. Dunckley, T. and R. Parker, The DCP2 protein is required for mRNA decapping in Saccharomyces cerevisiae and contains a functional MutT motif. Embo J, 1999. 18(19): p. 5411-22.

140. Wang, Z., et al., The hDcp2 protein is a mammalian mRNA decapping enzyme. Proc Natl Acad Sci U S A, 2002. 99(20): p. 12663-8.

141. Steiger, M., et al., Analysis of recombinant yeast decapping enzyme. RNA, 2003. 9(2): p. 231-8.

142. Iwasaki, S., et al., Characterization of Arabidopsis decapping proteins AtDCP1 and AtDCP2, which are essential for post-embryonic development. FEBS Lett, 2007. 581(13): p. 2455-9.

143. Cohen, L.S., et al., Dcp2 Decaps m2,2,7GpppN-capped RNAs, and its activity is sequence and context dependent. Mol Cell Biol, 2005. 25(20): p. 8779-91.

144. Bessman, M.J., D.N. Frick, and S.F. O'Handley, The MutT proteins or "Nudix" hydrolases, a family of versatile, widely distributed, "housecleaning" enzymes. J Biol Chem, 1996. 271(41): p. 25059-62.

145. Piccirillo, C., R. Khanna, and M. Kiledjian, Functional characterization of the mammalian mRNA decapping enzyme hDcp2. RNA, 2003. 9(9): p. 1138-47.

146. She, M., et al., Crystal structure and functional analysis of Dcp2p from Schizosaccharomyces pombe. Nat Struct Mol Biol, 2006. 13(1): p. 63-70.

147. Arribas-Layton, M., et al., Structural and functional control of the eukaryotic mRNA decapping machinery. Biochim Biophys Acta, 2013. 1829(6-7): p. 580-9.

148. LaGrandeur, T.E. and R. Parker, Isolation and characterization of Dcp1p, the yeast mRNA decapping enzyme. Embo J, 1998. 17(5): p. 1487-96.

149. She, M., et al., Structural basis of dcp2 recognition and activation by dcp1. Mol Cell, 2008. 29(3): p. 337-49.

150. Sheth, U. and R. Parker, Decapping and decay of messenger RNA occur in cytoplasmic processing bodies. Science, 2003. 300(5620): p. 805-8.

151. Song, M.G., Y. Li, and M. Kiledjian, Multiple mRNA decapping enzymes in mammalian cells. Mol Cell, 2010. 40(3): p. 423-32.

152. Parker, R. and H. Song, The enzymes and control of eukaryotic mRNA turnover. Nat Struct Mol Biol, 2004. 11(2): p. 121-7.

153. Lykke-Andersen, J., Identification of a human decapping complex associated with hUpf proteins in nonsense-mediated decay. Mol Cell Biol, 2002. 22(23): p. 8114-21.

154. Leonard, D., et al., hLodestar/HuF2 interacts with CDC5L and is involved in pre-mRNA splicing. Biochem Biophys Res Commun, 2003. 308(4): p. 793-801.

155. Schwartz, D., C.J. Decker, and R. Parker, The enhancer of decapping proteins, Edc1p and Edc2p, bind RNA and stimulate the activity of the decapping enzyme. RNA, 2003. 9(2): p. 239-51.

156. Fromm, S.A., et al., The structural basis of Edc3- and Scd6-mediated activation of the Dcp1:Dcp2 mRNA decapping complex. EMBO J, 2012. 31(2): p. 279-90.

157. Kshirsagar, M. and R. Parker, Identification of Edc3p as an enhancer of mRNA decapping in Saccharomyces cerevisiae. Genetics, 2004. 166(2): p. 729-39.

158. Harigaya, Y., et al., Identification and analysis of the interaction between Edc3 and Dcp2 in Saccharomyces cerevisiae. Mol Cell Biol, 2010. 30(6): p. 1446-56.

159. Nissan, T., et al., Decapping activators in Saccharomyces cerevisiae act by multiple mechanisms. Mol Cell, 2010. 39(5): p. 773-83.

160. Phatnani, H.P. and A.L. Greenleaf, Phosphorylation and functions of the RNA polymerase II CTD. Genes Dev, 2006. 20(21): p. 2922-2936.

161. Bartolomei, M.S., et al., Genetic analysis of the repetitive carboxyl-terminal domain of the largest subunit of mouse RNA polymerase II. Mol Cell Biol, 1988. 8(1): p. 330-339.

162. Corden, J.L., et al., A unique structure at the carboxyl terminus of the largest subunit of eukaryotic RNA polymerase II. Proc Natl Acad Sci U S A, 1985. 82(23): p. 7934-8.

163. Kelly, W.G., M.E. Dahmus, and G.W. Hart, RNA polymerase II is a glycoprotein. Modification of the COOH-terminal domain by O-GlcNAc. J Biol Chem, 1993. 268(14): p. 10416-24.

164. Xu, Y.X., et al., Pin1 modulates the structure and function of human RNA polymerase II. Genes Dev, 2003. 17(22): p. 2765-76.

165. Buratowski, S., The CTD code. Nat Struct Biol, 2003. 10(9): p. 679-80.

166. Chapman, R.D., et al., Transcribing RNA polymerase II is phosphorylated at CTD residue serine-7. Science, 2007. 318(5857): p. 1780-2.

167. Baskaran, R., M.E. Dahmus, and J.Y. Wang, Tyrosine phosphorylation of mammalian RNA polymerase II carboxyl-terminal domain. Proc Natl Acad Sci U S A, 1993. 90(23): p. 11167-71.

168. Hsin, J.-P., A. Sheth, and J.L. Manley, RNAP II CTD Phosphorylated on Threonine-4 Is Required for Histone mRNA 3' End Processing. Science, 2011. 334(6056): p. 683-686.

169. Wood, A. and A. Shilatifard, Bur1/Bur2 and the Ctk complex in yeast: the split personality of mammalian P-TEFb. Cell Cycle, 2006. 5(10): p. 1066-8.

170. Qiu, H., C. Hu, and A.G. Hinnebusch, Phosphorylation of the Pol II CTD by KIN28 enhances BUR1/BUR2 recruitment and Ser2 CTD phosphorylation near promoters. Mol Cell, 2009. 33(6): p. 752-62.

171. Cho, E.J., et al., Opposing effects of Ctk1 kinase and Fcp1 phosphatase at Ser 2 of the RNA polymerase II C-terminal domain. Genes Dev, 2001. 15(24): p. 3319-29.

172. Viladevall, L., et al., TFIIH and P-TEFb coordinate transcription with capping enzyme recruitment at specific genes in fission yeast. Mol Cell, 2009. 33(6): p. 738-51.

173. Coudreuse, D., et al., A gene-specific requirement of RNA polymerase II CTD phosphorylation for sexual differentiation in S. pombe. Curr Biol, 2010. 20(12): p. 1053-64.

174. Liu, Y., et al., Phosphorylation of the transcription elongation factor Spt5 by yeast Bur1 kinase stimulates recruitment of the PAF complex. Mol Cell Biol, 2009. 29(17): p. 4852-63.

175. Nechaev, S. and K. Adelman, Pol II waiting in the starting gates: Regulating the transition from transcription initiation into productive elongation. Biochim Biophys Acta, 2011. 1809(1): p. 34-45.

176. Sims, R.J., 3rd, R. Belotserkovskaya, and D. Reinberg, Elongation by RNA polymerase II: the short and long of it. Genes Dev, 2004. 18(20): p. 2437-68.

177. Bartkowiak, B., et al., CDK12 is a transcription elongation-associated CTD kinase, the metazoan ortholog of yeast Ctk1. Genes Dev, 2010. 24(20): p. 2303-16.

178. Bartkowiak, B. and A.L. Greenleaf, Phosphorylation of RNAPII: To P-TEFb or not to P-TEFb? Transcription, 2011. 2(3): p. 115-119.

179. Devaiah, B.N., et al., BRD4 is an atypical kinase that phosphorylates serine2 of the RNA polymerase II carboxy-terminal domain. Proc Natl Acad Sci U S A, 2012. 109(18): p. 6927-32.

180. Kohoutek, J. and D. Blazek, Cyclin K goes with Cdk12 and Cdk13. Cell Div, 2012. 7: p. 12.

181. Blazek, D., et al., The Cyclin K/Cdk12 complex maintains genomic stability via regulation of expression of DNA damage response genes. Genes Dev, 2011. 25(20): p. 2158-72.

182. Rahman, S., et al., The Brd4 extraterminal domain confers transcription activation independent of pTEFb by recruiting multiple proteins, including NSD3. Mol Cell Biol, 2011. 31(13): p. 2641-52.

183. Yoh, S.M., et al., The Spt6 SH2 domain binds Ser2-P RNAPII to direct Iws1-dependent mRNA splicing and export. Genes Dev, 2007. 21(2): p. 160-74.

184. Kizer, K.O., et al., A novel domain in Set2 mediates RNA polymerase II interaction and couples histone H3 K36 methylation with transcript elongation. Mol Cell Biol, 2005. 25(8): p. 3305-16.

185. Vojnic, E., et al., Structure and carboxyl-terminal domain (CTD) binding of the Set2 SRI domain that couples histone H3 Lys36 methylation to transcription. J Biol Chem, 2006. 281(1): p. 13-5.

186. Fuchs, S.M., et al., RNA Polymerase II Carboxyl-terminal Domain Phosphorylation Regulates Protein Stability of the Set2 Methyltransferase and Histone H3 Di- and Trimethylation at Lysine 36. J Biol Chem, 2012. 287(5): p. 3249-3256.

187. Keogh, M.C., et al., Cotranscriptional set2 methylation of histone H3 lysine 36 recruits a repressive Rpd3 complex. Cell, 2005. 123(4): p. 593-605.

188. Pirngruber, J., et al., CDK9 directs H2B monoubiquitination and controls replication-dependent histone mRNA 3'-end processing. EMBO Rep, 2009. 10(8): p. 894-900.

189. Meinhart, A. and P. Cramer, Recognition of RNA polymerase II carboxy-terminal domain by 3'-RNA-processing factors. Nature, 2004. 430(6996): p. 223-6.

190. Lunde, B.M., et al., Cooperative interaction of transcription termination factors with the RNA polymerase II C-terminal domain. Nat Struct Mol Biol, 2010. 17(10): p. 1195-201.

191. Kim, H., et al., Gene-specific RNA polymerase II phosphorylation and the CTD code. Nat Struct Mol Biol, 2010. 17(10): p. 1279-86.

192. Mayer, A., et al., Uniform transitions of the general RNA polymerase II transcription complex. Nat Struct Mol Biol, 2010. 17(10): p. 1272-8.

193. Meinhart, A., et al., A structural perspective of CTD function. Genes Dev, 2005. 19(12): p. 1401-15.

194. Egloff, S. and S. Murphy, Cracking the RNA polymerase II CTD code. Trends Genet, 2008. 24(6): p. 280-8.

195. Krishnamurthy, S., et al., Ssu72 Is an RNA polymerase II CTD phosphatase. Mol Cell, 2004. 14(3): p. 387-94.

196. Krishnamurthy, S., et al., Functional interaction of the Ess1 prolyl isomerase with components of the RNA polymerase II initiation and termination machineries. Mol Cell Biol, 2009. 29(11): p. 2925-34.

197. Mosley, A.L., et al., Rtr1 is a CTD phosphatase that regulates RNA polymerase II during the transition from serine 5 to serine 2 phosphorylation. Mol Cell, 2009. 34(2): p. 168-78.

198. Jeronimo, C., et al., Systematic analysis of the protein interaction network for the human transcription machinery reveals the identity of the 7SK capping enzyme. Mol Cell, 2007. 27(2): p. 262-74.

199. Egloff, S., et al., Ser7 phosphorylation of the CTD recruits the RPAP2 Ser5 phosphatase to snRNA genes. Mol Cell, 2012. 45(1): p. 111-22.

200. Wostenberg, C., et al., Atomistic simulations reveal structural disorder in the RAP74-FCP1 complex. J Phys Chem B, 2011. 115(46): p. 13731-9.

201. Kong, S.E., et al., Interaction of Fcp1 Phosphatase with Elongating RNA Polymerase II Holoenzyme, Enzymatic Mechanism of Action, and Genetic Interaction with Elongator. J Biol Chem, 2005. 280(6): p. 4299-306.

202. Hausmann, S. and S. Shuman, Characterization of the CTD phosphatase Fcp1 from fission yeast. Preferential dephosphorylation of serine 2 versus serine 5. J Biol Chem, 2002. 277(24): p. 21213-20.

203. Lin, P.S., M.F. Dubois, and M.E. Dahmus, TFIIF-associating carboxyl-terminal domain phosphatase dephosphorylates phosphoserines 2 and 5 of RNA polymerase II. J Biol Chem, 2002. 277(48): p. 45949-56.

204. Abbott, K.L., et al., Interactions of the HIV-1 Tat and RAP74 proteins with the RNA polymerase II CTD phosphatase FCP1. Biochemistry, 2005. 44(8): p. 2716-31.

205. Derti, A., et al., A quantitative atlas of polyadenylation in five mammals. Genome Res, 2012. 22(6): p. 1173-1183.

206. Sandberg, R., et al., Proliferating cells express mRNAs with shortened 3' untranslated regions and fewer microRNA target sites. Science, 2008. 320(5883): p. 1643-7.

207. Mayr, C. and D.P. Bartel, Widespread shortening of 3'UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell, 2009. 138(4): p. 673-84.

208. Ji, Z., et al., Progressive lengthening of 3' untranslated regions of mRNAs by alternative polyadenylation during mouse embryonic development. Proc Natl Acad Sci U S A, 2009. 106(17): p. 7028-33.

209. Mangone, M., et al., The Landscape of C. elegans 3'UTRs. Science, 2010. 329(5990): p. 432-435.

210. Ji, Z. and B. Tian, Reprogramming of 3' untranslated regions of mRNAs by alternative polyadenylation in generation of pluripotent stem cells from different cell types. PLoS One, 2009. 4(12): p. e8419.

211. Elkon, R., et al., E2F mediates enhanced alternative polyadenylation in proliferation. Genome Biol, 2012. 13(7): p. R59.

212. Takagaki, Y. and J.L. Manley, Levels of polyadenylation factor CstF-64 control IgM heavy chain mRNA accumulation and other events associated with B cell differentiation. Mol Cell, 1998. 2(6): p. 761-71.

213. Takagaki, Y., et al., The polyadenylation factor CstF-64 regulates alternative processing of IgM heavy chain pre-mRNA during B cell differentiation. Cell, 1996. 87(5): p. 941-52.

214. Fong, N., M. Ohman, and D.L. Bentley, Fast ribozyme cleavage releases transcripts from RNA polymerase II and aborts co-transcriptional pre-mRNA processing. Nat Struct Mol Biol, 2009. 16(9): p. 916-22.

215. Fenger-Gron, M., et al., Multiple processing body factors and the ARE binding protein TTP activate mRNA decapping. Mol Cell, 2005. 20(6): p. 905-15.

216. Lykke-Andersen, J. and E. Wagner, Recruitment and activation of mRNA decay enzymes by two ARE-mediated decay activation domains in the proteins TTP and BRF-1. Genes Dev, 2005. 19(3): p. 351-61.

217. Langmead, B., et al., Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol, 2009. 10(3): p. R25.

218. Heinz, S., et al., Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell, 2010. 38(4): p. 576-89.

219. Benjamini, Y. and Y. Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B, 1995. 57: p. 289-300.

220. Coller, J. and R. Parker, Eukaryotic mRNA decapping. Annu Rev Biochem, 2004. 73: p. 861-90.

221. Franks, T.M. and J. Lykke-Andersen, The control of mRNA decapping and P-body formation. Mol Cell, 2008. 32(5): p. 605-15.

222. Kufel, J., et al., Nuclear pre-mRNA decapping and 5' degradation in yeast require the Lsm2-8p complex. Mol Cell Biol, 2004. 24(21): p. 9646-57.

223. Muhlrad, D., C.J. Decker, and R. Parker, Deadenylation of the unstable mRNA encoded by the yeast MFA2 gene leads to decapping followed by 5'-->3' digestion of the transcript. Genes Dev, 1994. 8(7): p. 855-66.

224. Connelly, S. and J.L. Manley, A functional mRNA polyadenylation signal is required for transcription termination by RNA polymerase II. Genes Dev, 1988. 2: p. 440-52.

225. Kawauchi, J., et al., Budding yeast RNA polymerases I and II employ parallel mechanisms of transcriptional termination. Genes Dev, 2008. 22(8): p. 1082-92.

226. El Hage, A., et al., Efficient termination of transcription by RNA polymerase I requires the 5' exonuclease Rat1 in yeast. Genes Dev, 2008. 22(8): p. 1069-81.

227. Liu, M., Z. Xie, and D.H. Price, A human RNA polymerase II transcription termination factor is a SWI2/SNF2 family member. J Biol Chem, 1998. 273(40): p. 25541-4.

228. Zeitlinger, J., et al., RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo. Nat Genet, 2007. 39(12): p. 1512-6.

229. Chiba, K., et al., Promoter-proximal pausing and its release: molecular mechanisms and physiological functions. Exp Cell Res, 2010. 316(17): p. 2723-30.

230. Nechaev, S., et al., Global analysis of short RNAs reveals widespread promoter-proximal stalling and arrest of Pol II in Drosophila. Science, 2010. 327(5963): p. 335-8.

231. Levine, M., Paused RNA polymerase II as a developmental checkpoint. Cell, 2011. 145(4): p. 502-11.

232. Krumm, A., L. Hickey, and M. Groudine, Promoter-proximal pausing of RNA polymerase II defines a general rate-limiting step after transcription initiation. Genes Dev, 1995. 9: p. 559-572.

233. Core, L.J. and J.T. Lis, Transcription regulation through promoter-proximal pausing of RNA polymerase II. Science, 2008. 319(5871): p. 1791-2.

234. Nudler, E. and M.E. Gottesman, Transcription termination and anti-termination in E. coli. Genes Cells, 2002. 7(8): p. 755-68.

235. Lykke-Andersen, S. and T.H. Jensen, Overlapping pathways dictate termination of RNA polymerase II transcription. Biochimie, 2007. 89(10): p. 1177-82.

236. Jimeno-Gonzalez, S., et al., The yeast 5'-3' exonuclease Rat1p functions during transcription elongation by RNA polymerase II. Mol Cell, 2010. 37(4): p. 580-7.

237. Toohey, M.G. and K.A. Jones, In vitro formation of short RNA polymerase II transcripts that terminate within the HIV-1 and HIV-2 promoter-proximal downstream regions. Genes Dev, 1989. 3(3): p. 265-82.

238. Preker, P., et al., RNA exosome depletion reveals transcription upstream of active human promoters. Science, 2008. 322(5909): p. 1851-4.

239. Längst, G., et al., RNA polymerase I transcription on nucleosomal templates: the transcription termination factor TTF-I induces chromatin remodeling and relieves transcriptional repression. EMBO J, 1997. 16(4): p. 760-8.

240. Nicol, J.W., et al., The Integrated Genome Browser: free software for distribution and exploration of genome-scale datasets. Bioinformatics, 2009. 25(20): p. 2730-1.

241. Li, Z., et al., A global transcriptional regulatory role for c-Myc in Burkitt's lymphoma cells. Proc Natl Acad Sci USA, 2003. 100(14): p. 8164-9.

242. Kruk, J.A., et al., The multifunctional Ccr4-Not complex directly promotes transcription elongation. Genes Dev, 2011. 25(6): p. 581-93.

243. Qiu, H., et al., An array of coactivators is required for optimal recruitment of TATA binding protein and RNA polymerase II by promoter-bound Gcn4p. Mol Cell Biol, 2004. 24(10): p. 4104-17.

244. Andrulis, E.D., et al., The RNA processing exosome is linked to elongating RNA polymerase II in Drosophila. Nature, 2002. 420(6917): p. 837-41.

245. Heidemann, M., et al., Dynamic phosphorylation patterns of RNA polymerase II CTD during transcription. Biochim Biophys Acta, 2013. 1829(1): p. 55-62.

246. Rodriguez, C.R., et al., Kin28, the TFIIH-associated carboxy-terminal domain kinase, facilitates the recruitment of mRNA processing machinery to RNA polymerase II. Mol Cell Biol, 2000. 20(1): p. 104-12.

247. Brannan, K., et al., mRNA decapping factors and the exonuclease Xrn2 function in widespread premature termination of RNA polymerase II transcription. Mol Cell, 2012. 46(3): p. 311-24.

248. Adelman, K. and J.T. Lis, Promoter-proximal pausing of RNA polymerase II: emerging roles in metazoans. Nat Rev Genet, 2012. 13(10): p. 720-31.

249. Renner, D.B., et al., A highly purified RNA polymerase II elongation control system. J Biol Chem, 2001. 276(45): p. 42601-9.

250. Licatalosi, D.D., et al., Functional interaction of yeast pre-mRNA 3' end processing factors with RNA polymerase II. Mol Cell, 2002. 9(5): p. 1101-11.

251. Ahn, S.H., M. Kim, and S. Buratowski, Phosphorylation of serine 2 within the RNA polymerase II C-terminal domain couples transcription and 3' end processing. Mol Cell, 2004. 13(1): p. 67-76.

252. Proudfoot, N. and J. O'Sullivan, Polyadenylation: a tail of two complexes. Curr Biol, 2002. 12(24): p. R855-7.

253. Shi, Y., et al., Molecular architecture of the human pre-mRNA 3' processing complex. Mol Cell, 2009. 33(3): p. 365-76.

254. Glover-Cutter, K., et al., RNA polymerase II pauses and associates with pre-mRNA processing factors at both ends of genes. Nat Struct Mol Biol, 2008. 15(1): p. 71-8.

255. Shi, Y., Alternative polyadenylation: new insights from global analyses. RNA, 2012. 18(12): p. 2105-17.

256. Kobor, M.S., et al., An unusual eukaryotic protein phosphatase required for transcription by RNA polymerase II and CTD dephosphorylation in S. cerevisiae. Mol Cell, 1999. 4(1): p. 55-62.

257. Varon, R., et al., Partial deficiency of the C-terminal-domain phosphatase of RNA polymerase II is associated with congenital cataracts facial dysmorphism neuropathy syndrome. Nat Genet, 2003. 35(2): p. 185-9.

258. Licciardo, P., et al., Inhibition of Tat transactivation by the RNA polymerase II CTD-phosphatase FCP1. AIDS, 2001. 15(3): p. 301-7.

259. Fuda, N.J., et al., Fcp1 dephosphorylation of the RNA polymerase II C-terminal domain is required for efficient transcription of heat shock genes. Mol Cell Biol, 2012. 32(17): p. 3428-37.

260. Licciardo, P., et al., The FCP1 phosphatase interacts with RNA polymerase II and with MEP50 a component of the methylosome complex involved in the assembly of snRNP. Nucleic Acids Res, 2003. 31(3): p. 999-1005.

261. Smith, E. and A. Shilatifard, Transcriptional elongation checkpoint control in development and disease. Genes Dev, 2013. 27(10): p. 1079-88.

262. Garber, M.E. and K.A. Jones, HIV-1 Tat: coping with negative elongation factors. Curr Opin Immunol, 1999. 11(4): p. 460-5.

263. Brannan, K. and D. Bentley, Control of transcriptional elongation by RNA polymerase II: a retrospective. Genetics Res Intl, 2012: Article ID 170173.

264. Almada, A.E., et al., Promoter directionality is controlled by U1 snRNP and polyadenylation signals. Nature, 2013. 499(7458): p. 360-3.

APPENDIX A

SUPPLEMENTAL TABLES

Table 1. Human pLKO.1 lentiviral shRNA

Gene Name      Clone ID         Target Sequence
XRN2           TRCN0000049900   CGTGAGTATTTGGAAAGAGAA
TTF2           TRCN0000022232   CCTGACAATGATTGCGCTCAT
EDC3           TRCN0000159171   GCACAGGTAGTATAAGGTTAT
DCP1A          TRCN0000021012   CCATTTCCCTTTGAGCAGTTA
DCP2           TRCN0000050143   CCACGGAAACTTCAGGATAAT
CTDP1 (Fcp1)   TRCN0000002996   CGGGAAACCTTAGAAATCTCT
CTDP1 (Fcp1)   TRCN0000002999   CAGACGAGAAAGAAAGTAAAT
Scramble                        CCTAAGGTTAAGTCGCCCTCG

Table 2. Proteins Identified by Mass Spectrometry on LTQ and FT Platforms with Greater Than 10-Fold Enrichment in the Anti-Xrn2 IP Relative to Anti-GFP Control from RNase-Treated HeLa NE. Total assigned spectra were 22346 and 29540 for the anti-GFP and 16776 and 23442 for the anti-Xrn2 LTQ and FT analyses, respectively.

                  Number Assigned Spectra
Accession Number  LTQ Anti-GFP  FT Anti-GFP  LTQ Anti-Xrn2  FT Anti-Xrn2  Xrn2 / GFP Ratio
CCAR1_HUMAN       2             0            197            178           187.5
TTF1_HUMAN        0             1            65             63            128.0
RSF1_HUMAN        3             0            107            205           104.0
RRMJ3_HUMAN       1             3            165            205           92.5
CARF_HUMAN        0             3            96             168           88.0
XRN2_HUMAN        7             3            332            511           84.3
COASY_HUMAN       0             0            45             64            54.5
TTF2_HUMAN        0             0            54             54            54.0
CP088_HUMAN       1             1            40             61            50.5
PB1_HUMAN         1             0            20             30            50.0
ATD2B_HUMAN       0             0            38             58            48.0
H2AY_HUMAN        2             0            45             50            47.5
SURF6_HUMAN       1             0            18             28            46.0
EDC3_HUMAN        0             0            32             54            43.0
CF150_HUMAN       0             1            17             17            34.0
CENPB_HUMAN       0             0            33             34            33.5
NKTR_HUMAN        0             0            14             47            30.5
HRX_HUMAN         1             2            21             70            30.3
CHD1_HUMAN        4             2            62             110           28.7
NKRF_HUMAN        6             6            133            198           27.6
WRN_HUMAN         0             0            22             33            27.5
PAPD5_HUMAN       0             0            29             26            27.5
IKBL1_HUMAN       0             1            16             10            26.0
BRWD1_HUMAN       0             0            15             36            25.5
AATF_HUMAN        4             3            67             98            23.6
NOG1_HUMAN        2             0            22             24            23.0
INCE_HUMAN        0             0            24             22            23.0
DDX10_HUMAN       0             0            21             22            21.5
SFR18_HUMAN       0             0            18             24            21.0
NO66_HUMAN        0             0            21             21            21.0
ASSY_HUMAN        1             0            10             11            21.0
RO52_HUMAN        1             0            8              13            21.0
CCDC6_HUMAN       0             0            20             22            21.0

BOREA_HUMAN       1             0            7              13            20.0
NOM1_HUMAN        2             0            23             17            20.0
PHIP_HUMAN        1             2            21             36            19.0
NVL_HUMAN         0             0            16             22            19.0
SHCBP_HUMAN       0             1            7              11            18.0
RRP15_HUMAN       0             2            16             20            18.0
SCRIB_HUMAN       11            8            132            201           17.5
MD1L1_HUMAN       1             1            12             23            17.5
MFAP1_HUMAN       0             1            6              11            17.0
TAP26_HUMAN       1             0            6              11            17.0
TCF20_HUMAN       2             0            13             20            16.5
UCHL5_HUMAN       1             0            6              10            16.0
DDX27_HUMAN       0             0            16             16            16.0
SFR19_HUMAN       0             0            12             19            15.5
RBM14_HUMAN       1             0            7              8             15.0
BRPF1_HUMAN       0             0            11             19            15.0
BRE1A_HUMAN       1             1            16             14            15.0
NGDN_HUMAN        2             4            34             55            14.8
CPSF7_HUMAN       2             0            16             13            14.5
SRRM2_HUMAN       2             17           105            169           14.4
DDX54_HUMAN       4             2            36             49            14.2
NOG2_HUMAN        6             3            57             67            13.8
SMRC2_HUMAN       0             6            36             44            13.3
TAD3L_HUMAN       0             1            6              7             13.0
NOLC1_HUMAN       20            17           262            187           12.1
MEPCE_HUMAN       0             1            6              6             12.0
RL36_HUMAN        0             1            6              6             12.0
MK67I_HUMAN       0             1            5              7             12.0
BRE1B_HUMAN       1             3            20             28            12.0
ZC3HF_HUMAN       0             1            6              6             12.0
MRE11_HUMAN       0             0            11             13            12.0
RRP1B_HUMAN       5             4            46             61            11.9
GPTC4_HUMAN       5             10           64             113           11.8
MTA70_HUMAN       0             0            10             13            11.5
TAF9B_HUMAN       1             0            3              8             11.0
TDIF2_HUMAN       2             1            16             17            11.0
SMRD2_HUMAN       1             2            17             16            11.0
SSF1_HUMAN        1             0            3              8             11.0
RRP1_HUMAN        0             0            11             11            11.0
CHD2_HUMAN        0             0            8              13            10.5
HNRPG_HUMAN       0             0            11             10            10.5
SPF45_HUMAN       0             0            10             11            10.5
DDX56_HUMAN       2             4            31             31            10.3
SPA5L_HUMAN       0             2            11             9             10.0
EWS_HUMAN         1             0            5              5             10.0
TRAIP_HUMAN       0             0            10             10            10.0
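The caption of Table 2 does not define how the Xrn2 / GFP ratio column was calculated. The listed values are consistent with a simple rule that is inferred here from the rows and is not stated in the text: the anti-Xrn2 spectra from the LTQ and FT runs are summed and divided by the summed anti-GFP spectra, and when no anti-GFP spectra were assigned the divisor is 2 (that is, the ratio equals the mean of the two anti-Xrn2 counts). Raw counts appear to be used without normalization to the total assigned spectra. A minimal Python sketch under that assumption, with a hypothetical function name:

```python
def xrn2_gfp_ratio(ltq_gfp, ft_gfp, ltq_xrn2, ft_xrn2):
    """Enrichment ratio as inferred from the Table 2 rows (an assumption,
    not a rule stated in the thesis)."""
    gfp_total = ltq_gfp + ft_gfp
    xrn2_total = ltq_xrn2 + ft_xrn2
    # With zero anti-GFP spectra the listed ratios equal the mean of the
    # two anti-Xrn2 counts, i.e. a divisor of 2.
    divisor = gfp_total if gfp_total > 0 else 2
    return xrn2_total / divisor


# A few rows copied from Table 2: (LTQ GFP, FT GFP, LTQ Xrn2, FT Xrn2), listed ratio
rows = {
    "CCAR1_HUMAN": ((2, 0, 197, 178), 187.5),
    "XRN2_HUMAN": ((7, 3, 332, 511), 84.3),
    "TTF2_HUMAN": ((0, 0, 54, 54), 54.0),
    "EDC3_HUMAN": ((0, 0, 32, 54), 43.0),
}

for name, (counts, listed) in rows.items():
    computed = round(xrn2_gfp_ratio(*counts), 1)
    print(name, computed, listed)
```

Each computed value printed by the loop can be compared with the ratio listed in the table for the same accession.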

Table 3. PSY analysis of pA-site shifts for shFcp1_1_996 vs. shScramble control. R gives the direction and magnitude of the shift, with negative values indicating upstream shifts. The linear trend gives the p-value for the shift. This table contains the 500 genes with p-value < 0.01.

            R (direction of shift   linear_trend
Gene Name   and magnitude)          (p-value)
NBPF1       -1                      0
SKP1        -1                      0
ATPIF1      -0.673842122            0
ZNF146      -0.666044938            0
MARCKS      -0.63759366             0
VIM         -0.41160741             0
GNAS        -0.374072979            0
ATP5E       -0.329386529            0
RPL15       -0.301986273            0
MT-TF       -0.225672983            0
PMAIP1      0.134392624             0
XIST        0.224150506             0
NUCKS1      0.344390358             0
EEF2        0.405787328             0
ILF2        0.513442405             0
NPM1        0.527352949             0
PSMB1       0.86147911              0
PSMG2       -0.731176513            2.22E-16
SDHB        -0.593255504            3.33E-16
AP1S2       0.55528496              9.99E-16
RPS14       0.703721546             1.33E-15
FAM8A1      -0.729268154            1.78E-15
NID1        0.427517429             2.44E-15
YY1         -0.381737384            3.77E-15
SAT1        0.58468001              1.74E-14
ZNF598      0.492140384             5.14E-14
KLF6        -0.781503686            6.58E-14
HMGXB4      0.673450613             1.02E-13
PPP2R2A     -0.708146487            3.97E-12
CENPV       -1                      4.26E-12
ACTG1       -0.158478248            5.11E-12
SETD2       1                       1.18E-11
RPN2        -0.404448411            1.68E-11
ADAMTS1     -0.460850458            1.75E-11

CIRBP       -0.424894634            2.66E-11
CCAR1       0.679320544             1.86E-10
PRDX6       -0.186195643            3.07E-10
KIF5B       -1                      4.24E-10
REV1        1                       4.24E-10
MCL1        0.669426369             8.49E-10
PRDX3       -0.244565827            1.13E-09
PPP1CC      -0.163255833            1.54E-09
ALDH18A1    -0.525749883            4.78E-09
ZNF711      -0.166167604            4.95E-09
ENO1        -0.161068745            1.43E-08
HIBADH      -0.692531837            1.44E-08
ABR         -1                      1.54E-08
HNRNPR      -0.116911696            3.44E-08
HNRNPR      1                       4.32E-08
RCAN1       -0.865148868            4.46E-08
CHERP       -0.667643468            1.46E-07
OAZ1        -0.15225073             2.34E-07
PFN1        -0.211494054            2.54E-07
DNAJC1      -1                      3.41E-07
POFUT1      -0.467602948            3.78E-07
ATP5J       -0.267597329            3.83E-07
SNHG12      0.503466077             4.79E-07
RPL13       0.091386345             4.96E-07
MKNK2       -0.294608361            5.49E-07
DDX21       -0.2204866              5.63E-07
HNRNPA0     -0.345489986            5.89E-07
SUB1        -0.397139797            5.98E-07
ATRX        -1                      9.63E-07
CROCCP2     1                       9.63E-07
HNRNPK      -0.308655831            1.01E-06
MRFAP1      -0.174588701            1.03E-06
SAR1A       -0.35995784             1.28E-06
RPL8        -0.069875886            1.29E-06
BCLAF1      -0.351669453            1.33E-06
SCD         -0.329630387            1.34E-06
YME1L1      0.387675408             1.50E-06
MGEA5       -0.544009806            2.11E-06
PELP1       0.669284379             2.22E-06
PRDX1       -0.493139778            3.73E-06
BCLAF1      -0.302299291            4.77E-06
NME4        -0.137199436            5.41E-06

SERBP1      -0.086211115            5.83E-06
UCHL1       -0.241250462            6.78E-06
ALKBH2      1                       7.74E-06
MEX3C       -0.698212002            1.01E-05
MDM2        -0.280210639            1.02E-05
GOPC        -0.5872634              1.11E-05
TMBIM6      -0.122604195            1.45E-05
HS6ST2      -0.5204165              1.54E-05
NCL         0.158840077             1.90E-05
WWC1        -0.657365208            2.04E-05
ZC3H18      0.553766111             2.10E-05
SLC15A4     -0.459024766            2.32E-05
COX7A2      -0.182611974            2.96E-05
KIF5C       -0.646971072            3.43E-05
CDKN1B      -0.415620951            3.54E-05
UBE2B       1                       3.74E-05
DARS        -0.26629981             4.30E-05
C6orf120    -0.355974865            4.32E-05
PANK3       -0.478666467            4.87E-05
HSP90AA1    0.705011645             5.12E-05
ARF1        -0.247257913            5.17E-05
DLC1        -0.276431607            5.93E-05
ZNF521      -0.834275857            6.31E-05
UTP14A      -1                      6.33E-05
CTBP2       -0.565511264            7.54E-05
NARS        -0.127679679            7.82E-05
SRSF2       -0.182777587            8.39E-05
TPBG        -0.619269338            8.98E-05
CYR61       -0.79854941             9.15E-05
VEGFA       -0.312146493            9.67E-05
FMR1        -0.309829612            9.84E-05
G3BP2       -0.387005358            0.000100511
YBX3        -0.09433283             0.000105823
CENPL       -1                      0.000107511
KDELR1      -0.487732532            0.000108276
NUB1        0.566816861             0.000120878
ADD3        -0.1151071              0.000125572
CEBPG       -0.182393467            0.000128006
KRIT1       0.810405551             0.000144032
NUSAP1      -0.439599044            0.00015584
IDS         0.535136008             0.000179711
STARD7      -1                      0.000182811

TAB1        -1                      0.000182811
ITPA        1                       0.000182811
APPBP2      -0.348695648            0.000184501
SREK1       -0.348547745            0.000185668
COQ3        -0.367330874            0.000193004
RPS2        -0.20229331             0.000204322
GHITM       -0.17307902             0.000212182
DACH1       -0.600811957            0.000212516
SMG1        -0.297073106            0.000216851
EIF2S3      -0.235768862            0.000217399
HMGN5       0.502710047             0.000220624
RPL30       0.126567635             0.000235991
FDFT1       -0.319575606            0.000254479
C16orf80    -0.496880958            0.000260902
SAAL1       -0.654653671            0.000267436
POLR2C      -0.383820927            0.000271323
CCND1       -0.473154888            0.000278659
HNRNPAB     -0.123822516            0.000311233
RBM25       -0.426817846            0.000322613
NDUFC1      -0.393129547            0.00037094
PSMA4       0.069355013             0.00037738
DEK         0.206536521             0.000380302
C17orf51    -0.657035466            0.000402797
CADM1       -0.243704406            0.000413034
KIF5B       -0.326726896            0.000433248
NASP        0.161052474             0.000441828
SERINC1     -0.387722753            0.00044646
COPS2       -0.448357375            0.000462156
CDCA5       -0.685994341            0.00046894
POMP        -0.294453522            0.000471507
CNN3        -0.366091695            0.000478894
PDIA6       -0.239346598            0.000492249
NCL         -1                      0.000532006
SPIN4       -1                      0.000532006
NCL         -0.261499537            0.000541554
GTF2H3      0.297775578             0.000566845
ZBTB34      -0.734846923            0.000567399
SLC31A1     -0.663324958            0.000567399
RAN         -0.268804614            0.000576622
CCDC90B     -0.436301161            0.000591595
MT-TH       -0.159790014            0.000593525

VEZT        -0.178278751            0.000615653
PLAC1       -0.429781812            0.000646569
MOAP1       -0.531085005            0.000672373
MT-TT       0.337919445             0.000683663
MYH10       -0.573637065            0.000689596
PRKAR1A     -0.327801867            0.000696867
NINL        -0.795174872            0.00074181
NAA16       -0.54647043             0.000755318
TP53BP2     -0.368770369            0.00078039
ACSL1       -0.452605069            0.000789046
PRC1        0.22559512              0.000842312
SYBU        -0.786006485            0.000853782
CCNT2       -0.547337275            0.000870575
TXN         0.203181373             0.000880316
ITGB3BP     -0.304828401            0.000883281
ZNF503      -0.33205935             0.000898263
LPIN1       -0.691564075            0.000911119
RANBP1      -0.205949602            0.000939512
HSP90AA1    0.04586377              0.000973594
TRIP13      -0.174952603            0.00099579
CTBP1       -0.328911769            0.00100502
TMEM50A     0.259955975             0.001008259
ZWINT       -0.412595977            0.001057072
SLC30A9     0.730296743             0.001090835
MRPL21      0.185304858             0.00110384
ILF3        0.179090945             0.001120921
ATP5G3      -0.263056815            0.001138592
NAA50       -0.115618639            0.001189239
RAB5A       -0.194860566            0.001231867
ARRDC4      0.272973444             0.001238485
PARM1       -0.671161985            0.001287373
RRM1        -0.294195145            0.001330631
ARGLU1      0.19507142              0.001377096
WARS2       -0.525538273            0.001389976
CDC7        -0.371405955            0.001398527
AFG3L2      -0.539474851            0.001414988
CCNL2       -0.544652164            0.00149403
PTP4A2      -0.129042572            0.001503394
UBE2R2      -0.359285198            0.001508097
SLC6A15     -0.817282471            0.001549117
COPS7B      -0.527324076            0.001556465
ATP8B2      -1                      0.001565402

DPAGT1      -1                      0.001565402
SYNE2       -1                      0.001565402
CHD2        1                       0.001565402
MKLN1       1                       0.001565402
ERI1        -0.519472687            0.001578618
GPI         -0.235361528            0.001590157
IKZF4       -0.788810638            0.001603647
LDHB        -0.788810638            0.001603647
RIOK3       -0.788810638            0.001603647
IRF2BP2     -0.157716641            0.001744579
SNW1        -0.315323539            0.001799055
L3MBTL2     0.30460385              0.001800786
CNOT7       -0.201458936            0.001802451
PARP1       -0.285780467            0.001823939
LRRC58      -0.542667465            0.001824617
ZFP36       -0.777383633            0.001873824
PFDN4       0.229120215             0.001883977
ATP5J2      -0.205733396            0.001982436
CKAP2L      0.59484969              0.001995309
DFFA        -0.416545987            0.002007046
PTDSS2      -0.707106781            0.002054719
LDHB        0.641688948             0.002087924
DNAJC2      -0.560093426            0.002156712
GGPS1       -0.308000336            0.002179871
PRC1        -0.215447803            0.002197984
BMP7        -0.397905457            0.002240347
KANK2       -0.3906377              0.002280973
ZIC2        -0.219550811            0.002287672
BRD4        -0.402961452            0.002347838
PTX3        -0.697931499            0.002348457
PTPN11      -0.562527277            0.002451194
PRPSAP1     -0.420039341            0.002454079
ARHGAP35    -0.403753658            0.002515957
POLR3H      0.427022868             0.002531859
HERC4       -0.642818354            0.00256907
IPO11       -0.470768315            0.002574983
PHB2        -0.454242628            0.002585897
PSEN1       -0.548230713            0.002675229
MRPS26      -0.832688593            0.002679467
OPA3        -1                      0.002699796
TMEM198     -1                      0.002699796
BOC         -1                      0.002699796

TRIM5       1                       0.002699796
USP46       1                       0.002699796
GIGYF1      1                       0.002699796
NEFL        -0.248132085            0.002715871
RPSA        0.369035426             0.00271706
TUBB        -0.202479921            0.002857034
CCDC50      -0.544439629            0.002863545
LANCL2      -0.56346935             0.002867468
SRSF7       0.16981644              0.002879978
ERLEC1      -0.434057366            0.002922755
PCK2        -0.441945168            0.003030193
ANKRD17     0.233527337             0.003045282
AFF2        0.854656313             0.003070242
NUP107      -0.337234641            0.003084169
FBL         0.076234117             0.003101936
FZD8        0.440447369             0.003130637
MANF        -0.408590952            0.003215054
SSRP1       -0.274501154            0.003243174
TMEM209     -0.168933927            0.003327331
ERCC1       -0.784464541            0.003333354
NEIL3       -0.978317942            0.003335973
RAB10       -0.263730575            0.003445519
CAMSAP2     -0.707156598            0.003549128
VEGFB       0.433860916             0.003609347
TNPO1       -0.097220593            0.003669561
XRN2        -0.246337402            0.003681045
SPTSSA      -0.580844036            0.003681691
IRX5        -0.320575597            0.003696828
SUCLG2      -0.512877645            0.003716487
CPD         -0.336133341            0.00383373
MAD2L1      -0.298082027            0.003852265
NPTX1       -0.196521746            0.003873697
CSNK1G3     -0.832147107            0.003943552
THOC5       -0.535173777            0.00395161
CFL2        -0.385044491            0.003959014
ARPC2       -0.229417143            0.004045543
NVL         -0.406421667            0.004055205
BROX        -0.492353606            0.004093177
CENPP       -0.658076644            0.004124368
TRIM2       0.413820441             0.00414341
TCTN3       0.447667028             0.004150806
RPL14       -0.175590866            0.004185917

SLMO2       -0.232744603            0.004236229
SLC39A14    -0.378804727            0.004237573
ARID4B      -0.623817418            0.004253936
MRPL34      -0.211757892            0.004279756
PSIP1       0.249532203             0.004289792
TRIM33      -0.321256519            0.004298396
RPS19       0.169224074             0.004347042
CDK11A      -0.247226717            0.004356037
AAMP        -0.357437875            0.004552889
LTBP1       0.758068559             0.004562057
GLUD1       -0.187239845            0.004604808
TNFAIP8     -0.262982204            0.004619929
NCKAP1      -0.232679307            0.004645121
METAP1      -0.326718333            0.004662608
GSTO1       -0.199044671            0.004670007
TIMM9       -0.318261591            0.004672714
TCEB2       -0.169339297            0.004676321
VIM         -1                      0.004677735
CCDC34      -1                      0.004677735
FAM181B     -1                      0.004677735
EHD4        -1                      0.004677735
TMEM87B     -1                      0.004677735
HOXD-AS1    -1                      0.004677735
CCT8        -1                      0.004677735
FAM134B     -1                      0.004677735
CREB3L2     -0.426401433            0.004677735
STX8        1                       0.004677735
NKX2-4      1                       0.004677735
NAP1L1      -0.126826397            0.004734588
RPAP2       0.78173596              0.004823522
LDHB        0.140975638             0.004862776
PTBP1       0.14227755              0.005013705
MAP1B       0.163561146             0.005039634
LYRM5       0.321601405             0.005052677
MYO9A       0.845154255             0.005062032
ATP5H       0.206599896             0.005071492
RPL37       -0.197581935            0.005091179
JMJD1C      -0.381173815            0.005093757
KDM2A       -0.611232042            0.005094182
SPINT2      -0.241770085            0.00513102
TRA2B       -0.212610844            0.005166596
EXOSC8      0.191567558             0.00528277

CDS2        -0.557809574            0.005286325
GRB10       -0.445973038            0.00535111
STUB1       -0.163820429            0.005353643
ZADH2       -0.389836289            0.005369503
AFF1        -0.491303684            0.005448737
ZNF615      -0.483646453            0.005463786
CNIH        -0.261050747            0.005520043
ZFHX3       -0.381110209            0.005528252
UBE2A       -0.335440623            0.005672866
PINK1-AS    0.261148957             0.005714291
ABHD2       -0.738548946            0.005720312
TRMT112     -0.169935498            0.005760262
PDIA4       0.265563681             0.005783344
HAUS5       -0.36542787             0.005799274
RAB7A       -0.229041293            0.005815221
FAM204A     0.42538498              0.00583683
MLX         -0.613915545            0.006041623
C1orf27     -0.323560756            0.006041802
NDFIP1      -0.229385337            0.006087148
PRDM16      -0.826925041            0.006095487
FZD3        -0.47003814             0.006129544
CEP44       -0.684653197            0.006169899
CACNA2D1    -0.684653197            0.006169899
SNX15       -0.476731295            0.006169899
CARS2       0.246853306             0.006186379
GDAP1       -0.369084757            0.006196351
ELMSAN1     0.516752327             0.006249478
PUM1        -0.175374744            0.006260393
HIST1H2BN   0.455129495             0.006318522
POLR3B      -0.626473596            0.006319324
FGD5-AS1    -0.155076173            0.006325795
SERPINE2    -0.442609963            0.006363639
SECISBP2L   -0.862403845            0.006388197
RPS21       -0.052293338            0.006434454
PLEKHA1     -0.680036799            0.006525286
LEPROTL1    -0.34191619             0.006650113
UBE2Q1      0.404498041             0.006658554
ADIPOR1     -0.237830066            0.006694319
ASNS        -0.179973667            0.006696405
B4GALT6     -0.433148985            0.00683014
XBP1        -0.113899131            0.006831375
SMNDC1      -0.27384264             0.006995945

F2R         -0.448862694            0.007077486
GSTK1       -0.356576921            0.007100435
UGDH        0.601919051             0.007105395
FKBP1A      0.212605231             0.00716091
GOT1        -0.280332741            0.007169708
MYCBP       -0.50792913             0.007194421
SAMD11      0.175321883             0.007196031
PSMC3IP     -0.572263384            0.007271378
CKB         -0.631888377            0.007342987
OXR1        -0.119495703            0.007362014
AFF4        -0.305306938            0.007382971
SFPQ        -0.245453155            0.007415663
SAMHD1      -0.535078698            0.007464159
HSPA8       -0.597746674            0.007513046
PSMA1       -0.224512732            0.007677317
VMP1        -0.251745203            0.007716689
CHMP7       -0.486187856            0.007745649
LINC00657   -0.665703116            0.007749059
THUMPD3     -0.463381605            0.007769527
TMX3        -0.209435506            0.007873692
ST3GAL6     -0.840325572            0.007875932
ZNF644      -0.293284632            0.007911928
ARPC5L      -0.462289994            0.007915482
FBN2        -0.15463863             0.008121248
LAMTOR5     -0.193045993            0.008123018
NDUFB9      -0.196183966            0.00812895
PRC1        0.264594399             0.00814633
HOXC13      -1                      0.008150972
ESD         -1                      0.008150972
ZNF112      -1                      0.008150972
NUP35       -1                      0.008150972
MCIDAS      -1                      0.008150972
KLF4        -0.683130051            0.008150972
MED17       0.683130051             0.008150972
CHST12      1                       0.008150972
PTPRK       -0.440687865            0.008190187
GNB2L1      0.301212075             0.008214465
KIAA1191    -0.230803452            0.008249931
CSTF2       -0.622531756            0.008261816
TRIP6       -0.197276184            0.008305894
HMGA2       -0.376664878            0.008372815
MARK4       -0.432240776            0.008558148

MT-TR       0.177914489             0.008617274
DR1         -0.360730007            0.008635571
HIGD1A      -0.927461922            0.008709318
PCOLCE2     0.35015874              0.008784006
EPHB4       -0.726648698            0.008793772
MLYCD       0.828078671             0.008828761
C8orf59     0.828078671             0.008828761
SMIM10      0.206991596             0.008837954
COL4A1      -0.352555052            0.008932723
USP13       -0.54481957             0.008978732
TTC14       0.329178388             0.008981108
PDE4DIP     -0.544725816            0.008990551
UBB         -0.08663584             0.009000418
CALM2       -0.112053152            0.009025073
MRPL35      -0.428940754            0.009076884
UBXN2A      -0.521588665            0.009108803
C15orf61    -0.235079237            0.009129806
PMPCB       -0.241836585            0.009196554
ATP6V1A     0.823628811             0.009199688
TPD52L2     -0.460124306            0.009245074
SLC7A6OS    -0.5547432              0.009268827
NACA        -0.088767195            0.009359911
COA1        -0.266500895            0.009389708
NBEA        -0.580381               0.009444166
GSE1        -0.213348874            0.009445311
ANAPC11     -0.219002162            0.009562284
EIF1B       -0.270131017            0.009569591
SENP6       0.223362682             0.009720686
NGFRAP1     -0.14770064             0.009774508
DNAJA3      -0.241866806            0.009810703
POT1        -0.538411327            0.009819259
CFDP1       -0.487356373            0.009913053
ZNF268      -0.607240062            0.009986356
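The caption of Table 3 states that the sign of R gives the direction of the pA-site shift (negative values are upstream shifts) and that genes are listed when the linear-trend p-value is below 0.01. As a minimal illustration of how rows of this table can be read programmatically, the Python sketch below labels each row by shift direction and applies the same p-value cutoff. The function and variable names are hypothetical, only a handful of rows are copied from the table, and the code does not reproduce the PSY analysis itself.

```python
P_VALUE_CUTOFF = 0.01  # threshold stated in the Table 3 caption


def shift_direction(r):
    """Label a pA-site shift from the sign of R (negative = upstream)."""
    if r < 0:
        return "upstream"
    if r > 0:
        return "downstream"
    return "none"


# (gene, R, linear-trend p-value) for a few rows copied from Table 3
rows = [
    ("NBPF1", -1.0, 0.0),
    ("ATPIF1", -0.673842122, 0.0),
    ("PMAIP1", 0.134392624, 0.0),
    ("SETD2", 1.0, 1.18e-11),
    ("ZNF268", -0.607240062, 0.009986356),
]

# Keep rows below the cutoff and tally the direction of each shift
counts = {"upstream": 0, "downstream": 0, "none": 0}
for gene, r, p_value in rows:
    if p_value < P_VALUE_CUTOFF:
        counts[shift_direction(r)] += 1

print(counts)
```

Applied to the full table rather than this small sample, the same tally would give the number of upstream versus downstream shifts among the listed genes.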
