DISCLAIMER:

This document does not meet the current format guidelines of the Graduate School at The University of Texas at Austin. It has been published for informational use only.

Copyright by Joshua Edward Mayfield 2017

The Dissertation Committee for Joshua Edward Mayfield Certifies that this is the approved version of the following dissertation:

POST-TRANSLATIONAL MODIFICATION OF THE C-TERMINAL DOMAIN OF RNA POLYMERASE II: IDENTIFICATION AND CROSS TALK

Committee:

Yan Zhang, Supervisor

Jennifer S. Brodbelt

Marvin L. Hackert

Rick Russell

Arlen W. Johnson

POST-TRANSLATIONAL MODIFICATION OF THE C-TERMINAL DOMAIN OF RNA POLYMERASE II: IDENTIFICATION AND CROSS TALK

by

Joshua Edward Mayfield

Dissertation Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

The University of Texas at Austin

August 2017

Acknowledgements

Much of the work presented in this dissertation was the result of extensive collaboration with several wonderful groups. The data presented in Chapter 2 would not have been possible but for the aid of Shuang Fan and their advisor Felicia Etzkorn for their organic chemistry and peptide synthesis expertise, Shuo Wei from Kun Ping Lu’s lab for providing shRNA treated HeLa cell material, Bing Li for RNA polymerase II substrate and comments on the manuscript, and Andy Ellington for his extensive commentary and guidance in generating the manuscript. The data in Chapter 3 and subsequent investigations presented in Chapter 4 would not have been possible if not for the tireless effort of members of Jennifer Brodbelt’s group: Michelle Robinson, Victoria Cotham, Rachel Mehaffey, and Joe Cannon, who gathered and interpreted mass spectrometry data. I must extend a special acknowledgement to Jennifer Brodbelt herself for her continual support and chemical perspective throughout this collaboration and my graduate career. Finally, I must extend my greatest thanks to my mentor Yan “Jessie” Zhang, who has constantly supported me and positioned me alongside a wonderful team of scientists.

ABSTRACT: POST-TRANSLATIONAL MODIFICATION OF THE C-TERMINAL DOMAIN OF RNA POLYMERASE II: IDENTIFICATION AND CROSS TALK

Joshua Edward Mayfield, Ph.D. The University of Texas at Austin, 2017

Supervisor: Yan Zhang

RNA polymerase II is a highly regulated protein complex that transcribes all protein coding mRNA and many non-coding RNAs. A key mechanism that facilitates its activity is post-translational modification of the carboxyl-terminal domain of RNA polymerase II (CTD). This unstructured domain is conserved throughout eukaryotes and composed of repeats of the consensus amino acid heptad Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7. This domain acts as a platform for the recruitment of transcriptional regulators that specifically recognize post-translational modification states of the CTD. The majority of our understanding of CTD modification comes from the use of phospho-specific antibodies, which provide identity and abundance information but give only low-resolution information for how these marks co-exist and interact at the molecular level. During my graduate work I sought to utilize the tools of chemical biology to investigate CTD modification in high resolution. Using a combination of chemical tools, analytical chemistry, and molecular biology, I studied CTD modification at extremely high resolution. This work reveals the existence of interactions between CTD modifications and the influence of CTD sequence divergence on modification events, and presents initial data to support a role for previously encoded modifications in directing subsequent modification events.

Table of Contents

LIST OF TABLES

LIST OF FIGURES

CHAPTER 1: THE CTD CODE.
    Abstract
    1.1 Transcription in Prokaryotic and Eukaryotic Systems
    1.2 RNA Polymerase II: Transcription Cycle
        1.2.1 Initiation.
        1.2.2 Elongation.
        1.2.3 Termination.
        1.2.4 Transcription cycle and CTD.
    1.3 The carboxyl-terminal domain of RNA polymerase II & CTD Code
        1.3.1 Cycle of CTD phosphorylation.
        1.3.2 Contribution of different CTD phosphorylation marks.
        1.3.3 CTD kinases, writers of the code.
        1.3.4 CTD phosphatases: Erasers of the code.
        1.3.5 Prolyl isomerases: Modifiers of the code.
    1.4 Chemical Biology to Decipher the CTD Code
    1.5 References

CHAPTER 2: CHEMICAL TOOLS TO INVESTIGATE PROLINE ISOMERIZATION AND DEPHOSPHORYLATION IN THE CTD CODE.
    Abstract
    2.1 Introduction
    2.2 Results and Discussion
        2.2.1 Synthetic CTD peptidomimetic analogues incorporating cis and trans-locked isosteres.
        2.2.2 Ssu72 is a cis-specific CTD Ser5 phosphatase.
        2.2.3 Scps strongly favor trans-proline as substrate.
        2.2.4 Fcp1 is a trans-preferred phosphatase.
        2.2.5 Prolyl isomerase Pin1 does not alter the apparent phosphatase activity of Fcp1.
        2.2.6 In vitro reconstruction of Pin1 mediates Ssu72 enhancement in full length CTD.
        2.2.7 Prolyl isomerase activity regulates cis-specific CTD phosphatase in the cell.
    2.3 Conclusion and Perspective
    2.4 Materials and Methods
        2.4.1 Antibodies and reagents
        2.4.2 General synthesis and characterization of chemical tools.
        2.4.3 Protein expression and purification.
        2.4.4 Crystallization and crystal soaking with peptidomimetic compounds.
        2.4.5 Data collection and structure determination.
        2.4.6 Malachite green assay and analysis.
        2.4.7 Fcp1/Pin1 coupled assay and analysis.
        2.4.8 In vitro reconstruction of Pin1 mediated Ssu72 enhancement.
        2.4.9 Establishment of shPin1 stable cell lines.
        2.4.10 Immunoblotting
    2.5 References

CHAPTER 3: ULTRAVIOLET PHOTODISSOCIATION MASS SPECTROMETRY TO MAP PHOSPHORYLATION ALONG RNA POLYMERASE II CTD.
    Abstract
    3.1 Introduction
    3.2 Results and Discussion
        3.2.1 Analysis of Saccharomyces cerevisiae CTD.
        3.2.2 Analysis of Drosophila melanogaster CTD
        3.2.3 Tyrosine 1 is required for CTD phosphorylation by Erk2 and other CTD kinases.
        3.2.4 Tyrosine 1 limits the addition of phosphates to GST-CTD substrate.
    3.3 Conclusion and Perspective
    3.4 Materials and Methods
        3.4.1 Materials.
        3.4.2 Protein expression and purification.
        3.4.3 Kinase treatment of GST-CTD constructs.
        3.4.4 Sample preparation for mass spectrometry analysis.
        3.4.5 Mass spectrometry, liquid chromatography, and ultraviolet photodissociation.
        3.4.6 Data analysis.
        3.4.7 Gel shift analysis of CTD5.
        3.4.8 MALDI-MS analysis of GST-yCTD.
    3.5 References

CHAPTER 4: CROSS TALK OF PHOSPHORYLATION MARKS WITHIN THE CTD CODE.
    Abstract
    4.1 Introduction
    4.2 Results and Discussion
        4.2.1 Total amounts of CTD phosphorylation in full length CTD are dictated by heptad number.
        4.2.2 Tyrosine 1 phosphorylation of the CTD alters the specificity of P-TEFb but not TFIIH.
        4.2.3 Inhibition of tyrosine kinases reduces the level of Ser2 phosphorylation of the CTD in cells.
        4.2.4 Ser5/Ser2 specificity is governed by a potential Tyr1 binding pocket conserved across CTD kinases.
    4.3 Conclusion and Perspective
    4.4 Materials and Methods
        4.4.1 Antibodies and reagents.
        4.4.2 Protein expression and purification.
        4.4.3 Kinase treatment of GST-yCTD constructs.
        4.4.4 MALDI-MS analysis of GST-yCTD.
        4.4.5 Gel shift analysis of GST-yCTD samples.
        4.4.6 Cell culture and total protein preparation.
        4.4.7 Immunoblotting
        4.4.8 Sequence alignment.
    4.5 References

BIBLIOGRAPHY

List of Tables

Table 1-1. Phosphorylation marks of the CTD.

Table 2-1. Sequence of native and peptidomimetic compounds.

Table 2-2. X-ray crystallography data collection and refinement statistics.

Table 3-1. Drosophila melanogaster GST-CTD construct sequences.

Table 3-3. CTD peptides with ambiguous phosphosites from positive mode UVPD analysis.

Table 3-4. CTD peptides with localized phosphosites from negative mode UVPD analysis.

Table 3-5. CTD peptides with ambiguous phosphosites from negative mode UVPD analysis.

List of Figures

Figure 1-1. Central dogma of biology.

Figure 1-2. The CTD sits at the intersection of DNA, RNA, and protein codes.

Figure 1-3. Phosphorylation of CTD residues through transcription in S. cerevisiae.

Figure 1-4. Conformations of phospho-serine proline motifs.

Figure 2-1. Locked proline isosteres.

Figure 2-2. Drosophila melanogaster Ssu72+Symplekin analysis using locked proline peptides.

Figure 2-3. Human Scp1 analysis using locked proline peptides.

Figure 2-4. Proline isomer specificity of Fcp1.

Figure 2-5. In vitro reconstruction Pin1 mediated Ssu72 enhancement.

Figure 2-6. Impact of Pin1 knockdown on CTD phosphorylation states in HeLa cell lines.

Figure 2-7. Model of differentiated regulation mediated by proline isomerization of CTD in RNA polymerase II dephosphorylation.

Figure 3-1. LC-MS base peak MS1 chromatogram and MS/MS spectra of unmodified yeast CTD heptads following digestion with trypsin and proteinase K.

Figure 3-2. LC-UVPD-MS analysis of TFIIH and Erk2 treated yeast GST-CTD digested with trypsin and proteinase K.

Figure 3-3. Base peak MS1 chromatograms.

Figure 3-4. Phosphorylations identified in Drosophila melanogaster CTD following treatment with Erk2 using LC-UVPD-MS.

Figure 3-5. Gel shift assay and intact mass analysis results for CTD5 before and after treatment with Erk2.

Figure 3-6. Gel shift analysis of CTD5 constructs treated with TFIIH kinase.

Figure 3-7. MALDI-MS analysis of wild-type and YtoF sequence GST-yCTD treated under saturating Erk2 conditions.

Figure 4-1. Intact mass analysis of GST-yCTD substrate treated with multiple kinases.

Figure 4-2. Analysis of Ser5 and Ser2 phosphorylation of Abl, P-TEFb, and TFIIH treated samples.

Figure 4-3. Western blot for Ser2 and Ser5 phosphorylation in HEK293T cells treated with imatinib.

Figure 4-4. Docking and mutagenesis analysis of Erk2.

Figure 4-5. Sequence alignment of Ser5 CTD kinases.

Figure 4-6. Sequence alignment of Ser2 CTD kinases.

Figure 4-7. A putative tyrosine binding pocket is conserved amongst CTD kinases.

Chapter 1: The CTD Code.

ABSTRACT

RNA polymerase II (RNA Pol II) generates all protein coding mRNA, small nuclear RNA, small nucleolar RNA, and some micro RNAs (1-4). Like many important eukaryotic enzymes, the function of RNA Pol II is tightly regulated by post-translational modifications such as phosphorylation (4-13). A search of PhosphoSitePlus®, a high throughput proteomics database for post-translational modifications (14), reveals over 100 sites of phosphorylation throughout the subunits of human RNA Pol II identified by proteomic analysis or molecular biology approaches. However, the most clearly physiologically relevant phosphorylation sites are densely localized within one region of RNA Pol II: the C-terminal region of the largest subunit (RPB1), now universally referred to as the C-terminal domain (CTD) of RNA polymerase II (15, 16). The primary sequence of the CTD consists of multiple repeats of the consensus heptad sequence Tyr1Ser2Pro3Thr4Ser5Pro6Ser7, with the number of repeats ranging from 5 in Plasmodium yoelii (GenBank Accession: XM_726075) to 52 in humans (GenBank Accession: NM_000937) (17). This sequence is highly enriched in potential phosphorylation sites, with each of the five non-proline residues in the heptad repeat being recognized as phosphate acceptors in vivo, where they are dynamically phosphorylated throughout the transcription cycle and contribute to every stage of transcription (9, 12, 13, 18). Specific post-translational modification states of the CTD are essential because they recruit factors to direct 5’-capping (7, 8, 19) and 3’-polyadenylation (12, 20-23), co-transcriptionally splice nascent RNA (24, 25), and recycle the polymerase for subsequent promoter binding (26, 27). This collection of post-translational modifications and recruited protein factors constitutes the “CTD Code” for eukaryotic transcription (28). This chapter aims to briefly introduce eukaryotic transcription, explain our current state of knowledge of the CTD code with specific focus on phosphorylation, and provide rationale for the use of a chemical biology approach to study this dynamic signaling platform.
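To make the density of potential phosphorylation sites concrete, the short Python sketch below enumerates the phosphate-acceptor positions in the consensus heptad and counts them across a given number of repeats. The function names and the one-letter encoding of the heptad are illustrative choices, not part of the dissertation, and the count assumes every repeat matches the consensus, which real CTDs with divergent heptads do not.

```python
# Consensus CTD heptad from the text: Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7,
# written here in one-letter amino acid code (an illustrative encoding).
CONSENSUS_HEPTAD = "YSPTSPS"

# Residues with a side-chain hydroxyl that can accept a phosphate.
PHOSPHO_ACCEPTORS = {"Y", "S", "T"}

THREE_LETTER = {"Y": "Tyr", "S": "Ser", "T": "Thr", "P": "Pro"}


def acceptor_positions(heptad: str = CONSENSUS_HEPTAD) -> list[str]:
    """Return labels such as 'Tyr1' for each potential phosphate acceptor."""
    return [
        f"{THREE_LETTER[residue]}{position}"
        for position, residue in enumerate(heptad, start=1)
        if residue in PHOSPHO_ACCEPTORS
    ]


def idealized_site_count(n_repeats: int, heptad: str = CONSENSUS_HEPTAD) -> int:
    """Count acceptor sites assuming every repeat is a consensus heptad.

    This is a simplifying assumption: it ignores non-consensus repeats.
    """
    return n_repeats * len(acceptor_positions(heptad))


if __name__ == "__main__":
    print(acceptor_positions())
    # Repeat numbers quoted in the text: 5 (Plasmodium yoelii) and 52 (human).
    for organism, repeats in [("P. yoelii", 5), ("human", 52)]:
        print(organism, idealized_site_count(repeats))
```

Under this idealized assumption, five acceptor residues per heptad give 25 candidate sites for a 5-repeat CTD and 260 for a 52-repeat CTD; the true number depends on the actual repeat sequences.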

1.1 TRANSCRIPTION IN PROKARYOTIC AND EUKARYOTIC SYSTEMS

The central dogma of biology (Figure 1-1) describes, in broad terms, the flow of biological information within cells. Genomic DNA, at the base of this central dogma, encodes vast amounts of information including the templates for a diverse repertoire of biomolecules (i.e. RNA and proteins). The first step in interpreting and utilizing this DNA code is often the conversion of the DNA template to an RNA transcript. This process, referred to as transcription, generates a vast array of functionally distinct RNA molecules that can perform catalytic, messenger, and regulatory functions (29).

Transcription is achieved through the action of DNA-dependent RNA polymerases. This group of holoenzymes is essential from single celled prokaryotes to multicellular eukaryotes. Although the catalytic mechanism and much of the structural architecture of these polymerases is conserved, the degree to and mechanism by which they are regulated are clearly different across taxonomic domains (30, 31).


In prokaryotic systems, like Escherichia coli, there is a single RNA polymerase that generates all cellular RNAs including ribosomal RNA (rRNA), transfer RNA (tRNA), messenger RNA (mRNA), and various non-coding RNA (ncRNA) (30, 31). The regulation of this holoenzyme is mediated primarily through the recruitment of RNA polymerase to the promoter of genes¹. This is accomplished through the action of multiple σ subunits that specifically bind promoter regions and recruit the RNA polymerase holoenzyme to provide specificity and directionality to transcription. Additional DNA binding proteins further regulate transcription by modulating the affinity of the polymerase for DNA or blocking its ability to synthesize RNA. Key examples in Escherichia coli include cAMP receptor protein (CRP) and various repressors, like the well-characterized lac repressor. Additionally, transcription and translation of mRNA are coupled in prokaryotic systems and, therefore, these transcripts require minimal post-transcriptional processing (30, 31). The fairly straightforward nature of transcriptional regulation in prokaryotes is in stark contrast to what is observed in eukaryotes.

¹ A promoter is a region of the genome that directs transcription of adjacent DNA (30, 31).

The transcription process in eukaryotes necessitates a greater degree of regulation than is found in prokaryotes. This arises from the increased size and complexity of eukaryotic genomes, the uncoupled nature of transcription and translation, and the need, in multicellular organisms, to specifically time transcriptional events in cell differentiation and development. This is accomplished through multiple mechanisms, beginning with the division of labor between three primary polymerases: RNA polymerase I, II, and III (30, 31).

RNA polymerase I is composed of 14 subunits and is the simplest of these polymerases in terms of regulation. It transcribes the 28S, 18S, and 5.8S rRNA as an initial 45S pre-rRNA transcript. This transcription occurs in a special portion of the nucleus known as the nucleolus. Here, newly transcribed 45S pre-rRNA is processed and assembled with other ribosomal factors for subsequent export as a functional ribosome. Because of its small transcriptional repertoire (1 transcript) and its localization to the nucleolus, RNA polymerase I does not require intricate additional regulatory apparatus (30, 31).

RNA polymerase III is the largest of the eukaryotic RNA polymerases and contains 17 subunits. It is primarily responsible for the generation of transfer RNAs (tRNA) and 5S rRNA throughout the nucleus. RNA polymerase III is directed to these genes via three additional proteins/protein complexes: TFIIIA, TFIIIB, and TFIIIC. TFIIIA utilizes a zinc-finger domain to contact specific DNA elements of ~40 bp in length. This is followed by recruitment of TFIIIB and TFIIIC. Once TFIIIA, TFIIIB, and TFIIIC are bound to genes, RNA polymerase III can repeatedly transcribe these regions and produce abundant amounts of RNA. Additionally, RNA polymerase III transcription is subject to negative feedback regulation via TFIIIA’s inherent affinity for 5S rRNA. This affinity limits 5S rRNA production when it is in excess by sequestering TFIIIA from genomic DNA. Although RNA polymerase III regulation is slightly more complex than that of RNA polymerase I, both seem simplistic in comparison to the regulation of RNA polymerase II (30, 31).

RNA polymerase II is composed of 12 subunits and has the most diverse transcriptional repertoire of the eukaryotic polymerases (30, 31), accounting for more than 19,000 genes in humans (32). This myriad set of transcripts requires an equally immense set of regulatory mechanisms. The basics of the transcription process have been established for several decades (30, 31) and the many specific regulatory mechanisms have been reviewed extensively (33-39). To discuss any of these mechanisms, including the CTD, a basic understanding of the transcription cycle is required.

[Figure 1-1: flow diagram showing DNA → (Transcription) → RNA → (Translation) → Protein]

Figure 1-1. Central dogma of biology.

1.2 RNA POLYMERASE II: TRANSCRIPTION CYCLE

The RNA polymerase II (RNA Pol II) transcription cycle can be broadly divided into three main stages: initiation, elongation, and termination. These three stages rely on different protein factors and DNA elements for timely and targeted execution of transcription (29-31, 40-42). Each of these stages makes unique contributions to the final transcribed product, and a rudimentary understanding of their molecular underpinnings is essential to the discussion of the CTD code.

1.2.1 Initiation.

Initiation is the first and most widely studied stage of transcription (30, 31, 42). It relies on a complex network of cellular signals and protein transcription factors that tune the expression and abundance of transcripts to meet the needs of the organism (30, 31, 41). Upstream of the actual transcription process, multiple pathways induce or repress the initiation of transcription via signal transduction to transcription factors (41). However, regardless of upstream signals, once induced, the actual mechanism of transcription initiation is fairly consistent across genes (30, 31). RNA polymerase II is directed to the correct genomic loci largely by the action of promoter DNA elements and basal transcription factors² (30, 43). This combination of factors helps assemble the pre-initiation complex (PIC), which contains the basal transcription factors and a transcription competent polymerase (30). These promoter DNA elements include well-studied elements like TATA- and CAAT-boxes (30, 31) and many emerging and non-canonical promoter elements (43). These elements may be directly bound by the basal transcription factors, as seen with TATA-box elements, or indirectly bound through the transcription activator domains (TADs) of various transcription factors that bind DNA (30, 31). Ultimately, the basal transcription factor TFIID (along with TFIIA) binds these sites and nucleates the formation of the PIC. Subsequently, TFIIB is recruited and this TFIID/TFIIA/TFIIB complex engages TFIIF and RNA polymerase II. This large assembly of proteins subsequently recruits TFIIE and TFIIH and forms the closed conformation of the PIC (30). In transcription initiation, the PIC melts bound DNA and the template DNA strand is threaded into the active site of the polymerase via an ATP-dependent mechanism largely mediated by TFIIB and TFIIH (30). Once bound to the template DNA strand, the PIC assumes the “open” conformation. The PIC then scans the DNA for an initiator motif (Inr), where complementary nucleotides are recruited and the first phosphodiester bond is formed (30). The majority of these early polymerase reactions result in abortive transcripts, but eventually the polymerase will successfully transcribe more than seven nucleotides, which triggers the release of TFIIB (30). The successful formation of this short RNA polymer and release of TFIIB constitute promoter escape, a hallmark of committed transcription. This shift from the PIC to active transcription is accompanied by phosphorylation of the CTD (29, 30).

² Basal transcription factors are the 5 multi-protein complexes minimally required for RNA polymerase II activity in vivo. These include TFIID/TFIIA, TFIIB, TFIIF, TFIIE, and TFIIH (30, 31).

1.2.2 Elongation.

As the CTD becomes hyperphosphorylated and the polymerase progresses along DNA, the other components of the PIC release from the polymerase, save TFIIF (30). TFIIF remains associated with RNA Pol II throughout transcription and aids in the recruitment of important transcription regulators and CTD modifiers (e.g. Fcp1 (44)). TFIID/TFIIA remain at the PIC site to initiate subsequent rounds of transcription (30). Following promoter escape, early elongation is characterized by two major events: installation of the 5’-cap on nascent RNA (29, 40) and promoter-proximal pausing (40). The 5’-cap both stabilizes RNA and is essential for translation of mRNA (30, 31). Promoter-proximal pausing is a phenomenon observed primarily in multicellular organisms where the polymerase stalls 30-60 bp downstream from the transcription start site. This stall is thought to provide a window for additional regulation and quality control of the 5’-cap (40). Both 5’-capping and release from the promoter-proximal pause are mediated by specific CTD phosphorylations (29). Once released from the promoter-proximal pause, the polymerase officially enters productive elongation. The elongation rate of different genes varies, and this seems to have importance to co-transcriptional processing like splicing (40). The rate of elongation is influenced by multiple factors including DNA sequence, nucleosome occupancy, and the complement of elongation factors or repressors recruited to the polymerase (30, 40, 45). Co-transcriptional splicing also occurs during elongation and is observed from the promoter-proximal pause through the end of the elongation phase (29, 40). Eventually, the elongating polymerase reaches the end of the coding region of the transcript and must disengage. This process is called termination.

1.2.3 Termination.

Transcription termination is perhaps the most poorly understood aspect of eukaryotic transcription (42). In prokaryotes, transcription termination is straightforward and is directed either by GC-rich DNA elements that pause the polymerase and result in dissociation or through a factor-dependent mechanism involving the formation of an RNA hairpin and recruitment of the termination factor rho (30). The events leading to termination in eukaryotes are not nearly as well defined. In eukaryotes, the polymerase often remains engaged with the DNA and active well after the end of the coding region (30). This can be functionally relevant as it helps generate the 3’-UTR, but much of this additional transcription is not well understood (42). In the case of pre-mRNA, the polymerase generally passes one or more AATAAA DNA elements. This generates a cleavage site in the resultant transcript that is recognized by the cleavage and polyadenylation specificity factor (CPSF) in multicellular eukaryotes or the cleavage and polyadenylation factor (CPF) in single-celled eukaryotes (30, 31, 42, 46-48). These multi-protein complexes interact with the polymerase (via Ser2 phosphorylations of the CTD (29)) and cleave the majority of pre-mRNA transcripts downstream of AAUAAA sites. Polynucleotide adenylyltransferase then adds the poly(A) tail, which stabilizes the transcript (30, 31, 42, 46-48). Interestingly, not all mRNA receive poly(A) tails. Histone mRNA is an example of these tailless transcripts, and the absence of a poly(A) tail does appear to decrease their half-life (12, 42). Concurrent with or following cleavage and polyadenylation, a variety of factors can facilitate the dissociation of the polymerase from DNA. This can occur abruptly after cleavage sites or multiple kilobases downstream (42). Models to explain a physical mechanism for this dissociation are varied and include allosteric changes of the polymerase induced by CPF/CPSF binding, road blocks introduced by various chromatin bound proteins, and in some limited cases ubiquitination and proteasome degradation of the DNA bound polymerase (42). These mechanisms appear gene and context specific, and a unified model of RNA Pol II termination has not been satisfactorily developed. Ultimately, the majority of RNA Pol II/TFIIF complexes dissociate from DNA and are recycled to other PICs where they can initiate subsequent rounds of transcription (29-31, 42).

Despite our lack of understanding about transcription termination, its importance to cellular viability is clear. Proper termination ensures the generation of functional transcripts and prevents read-through into other gene loci. Read-through into other gene loci often results in transcriptional interference (48). This prevents the production of functional transcripts from the original loci and prevents efficient initiation and 5’-capping of downstream genes by physical collision of the read-through polymerase with downstream PICs (42, 48). For genes encoded in opposite directions along DNA (i.e. convergent genes), read-through can result in polymerase-polymerase collisions and similarly inhibit gene expression (42, 49). Finally, efficient termination prevents DNA damage and promotes genome stability by preventing RNA Pol II collisions with DNA replication machinery in dividing cells (42). This process of transcription termination is essential to the cell and will be an important field of research in coming years.


1.2.4 Transcription cycle and CTD.

Each stage of transcription is characterized by particular CTD modifications. These modifications are essential in recruiting various RNA processing factors, including the 5’-capping and 3’-polyadenylation machinery (29). Furthermore, this code is strikingly dynamic across individual cycles of transcription and can vary in a gene-specific manner. In-depth discussion of these coding dynamics and their contribution to transcription, specifically of mRNA, is included below.

1.3 THE CARBOXYL-TERMINAL DOMAIN OF RNA POLYMERASE II & CTD CODE

The CTD of RNA Pol II is conserved among eukaryotes, but not present in prokaryotic RNA polymerases or in other eukaryotic DNA-dependent RNA polymerases (RNA Pol I & III) (15, 50). The CTD acts as a nexus for the flow of information between DNA, RNA, and protein codes during transcription (Figure 1-2) and undergoes extensive post-translational modification, especially phosphorylation (4, 29). The role of CTD phosphorylation during transcription is to recruit proteins that facilitate mRNA production and processing such as initiation factors (26, 27), capping enzymes (7, 8, 19), splicing factors (24, 25), and poly-adenylation/termination complexes (12, 20-23). Furthermore, different phosphorylated forms of the CTD link transcription to other cellular events like histone modification (51), DNA repair (52), and cell cycle regulation (53-55).

[Figure 1-2: schematic with the labels Kinase, Phosphatase, CTD, Polymerase, RNA, and DNA]

Figure 1-2. The CTD sits at the intersection of DNA, RNA, and protein codes.

1.3.1 Cycle of CTD phosphorylation.

The phosphorylation status of the two best-understood CTD residues, Ser2 and Ser5, is tightly correlated with the progress of transcription (56, 57). Genome-wide ChIP-Seq analysis has allowed for mapping of Ser2, Ser5, and other phosphorylated residues of the heptad repeat in yeast (13, 56, 58-60) (Figure 1-3). The CTD must be hypophosphorylated for the polymerase to bind the promoter of transcribed genes (5, 61). During promoter clearance, the transcribing RNA Pol II leaves the pre-initiation complex and polymerizes the first few nucleotides. This step in transcription is associated with increasing levels of Ser5 phosphorylation that quickly plateau. Within the next 750 bp Ser5 is significantly dephosphorylated, although phosphorylated Ser5 seems to be present throughout transcript elongation (56). Phosphorylated Ser2 levels rise later in elongation and reach their peak when approximately 1 kb is transcribed in S. cerevisiae (56). Phosphorylated Ser2 remains present past the site of poly-adenylation and is removed at transcription termination (56). Less well-understood phosphorylation events at Tyr1, Thr4, and Ser7 have been identified and also accumulate in distinct patterns throughout the transcription cycle. Unlike Ser2 and Ser5, the biological implication of phosphorylation events at these sites might differ in different model organisms. In yeast, phosphorylated Tyr1 levels follow a similar trend to phosphorylated Ser2, being initially low at the promoter and increasing throughout transcript elongation (13). Notably, vertebrate Tyr1 phosphorylation occurs at the transcription start site, enhancer regions, and to a lesser extent in the coding region of genes, suggesting differential functionality in these organisms (62, 63). Unlike Ser2, Tyr1 phosphorylation does not persist after the poly-adenylation site (13, 62). Thr4 phosphorylation is evident throughout transcript elongation but is low at the poly-adenylation site in yeast (13). Ser7 phosphorylation rises alongside Ser5 phosphorylation, but remains at a more constant level throughout transcription and drops as the polymerase reaches the poly-adenylation site (56). All phosphorylation marks are removed before the initiation of another round of transcription (64) (Figure 1-3).


Figure 1-3. Phosphorylation of CTD residues through transcription in S. cerevisiae.

The y-axis indicates approximate ChIP-enrichments found in genomic tiling arrays in S. cerevisiae. The x-axis indicates approximate stage of transcription. (TSS = transcription start site. TTS = transcription termination site) (13, 56).

1.3.2 Contribution of different CTD phosphorylation marks.

The various phosphorylation marks of the CTD contribute in distinct ways to the transcription cycle and coincide with various transcriptional events (Table 1-1). The most essential marks of the CTD are phosphorylation of Ser5 and Ser2, which play a variety of roles. The most important functions of Ser5 phosphorylation are the recruitment of 5’-capping factors, enhancement of guanylyltransferase activity to form the 5’-end of nascent RNA, and influence on mediator interactions and promoter escape (8, 65, 66). This point was elegantly demonstrated in S. pombe, where bypassing the requirement for Ser5 phosphorylation was accomplished by covalently tethering 5’-capping machinery to the CTD (19). Ser2 phosphorylation’s core functions are to promote the transition from initiation to elongation (67), facilitate release from the promoter-proximal paused state (40), and facilitate the recruitment of 3’-end processing factors (23). Given these functions, Ser2 and Ser5 are likely important to the majority of RNA polymerase II transcripts. Tyr1 phosphorylation appears to play a role in general transcription by preventing the recruitment of termination factors and promoting elongation (13). Thr4 and Ser7, on the other hand, may be playing ‘gene-specific’ roles. For example, Ser7 phosphorylation is essential for snRNA gene expression (10), and Thr4 phosphorylation is linked specifically to histone mRNA 3’-processing in mammals (12) and is likely related to the elongation defects observed in Thr4 mutants (68). The marks in this CTD code occur dynamically during transcription and are installed and removed through the action of cascades of post-translational modification enzymes (28). The enzymes that install or remove modifications along the CTD are termed writers (e.g. kinases) and erasers (e.g. phosphatases). The codes installed by these enzymes are interpreted by readers, such as the 5’-capping machinery and elongation factors (4). Together, these factors ensure proper timing of transcriptional processes (4).


Phosphorylated Residue | Associated Processes | Reference
Tyr1 | Inhibition of termination, stabilization of RNA polymerase II in cytosol, antisense and enhancer transcription | (59, 69-71)
Ser2 | Transcription elongation, release from promoter proximal pause, transcription termination, cleavage and polyadenylation, chromatin modification, and DNA topology alterations | (19, 72-80)
Thr4 | Transcription elongation, transcription termination, splicing, histone mRNA processing, and chromatin modification | (81-85)
Ser5 | Transcription initiation, mRNA capping, interaction with mediator, splicing, chromatin modification, and non-coding RNA transcription termination | (19, 25, 57, 66, 82, 86-90)
Ser7 | Expression of snRNA, integrator interactions, and priming of P-TEFb activity (in vitro) | (9, 10, 91)

Table 1-1. Phosphorylation marks of the CTD.


1.3.3 CTD kinases, writers of the code.

Interest in writers of the CTD code, the CTD kinases, has generated a wealth of information to explain fundamental determinants of gene expression (27, 67, 92, 93).

Intriguingly, the CTD kinases responsible for Ser2 and Ser5 phosphorylation belong to the cyclin-dependent kinase (CDK) family and are similar to the other nuclear kinases involved in cell-cycle regulation (94). In the vast majority of transcription, Ser5 phosphorylation is installed by TFIIH, specifically its kinase subunit Cdk7 (29, 95, 96). In some developmental contexts Erk1/2 can also phosphorylate Ser5 of poised RNA polymerase II, but these are likely relatively rare occurrences (97). Ser2 phosphorylation depends on the action of P-TEFb in vivo, specifically its Cdk9 kinase subunit (29). Interestingly, P-TEFb acts as a Ser5 kinase in vitro (91). This discrepancy in specificity has yet to be resolved. Several other CDKs, such as CDK8 (98), CDK12 (99), and CDK13 (100), have been reported to complement the function of CDK7 and CDK9 by phosphorylating Ser5 and Ser2 of CTD heptad repeats, but the importance of this overlap in specificity is not fully understood.

Some of the kinases responsible for phosphorylation of Tyr1, Thr4, and Ser7 have also been identified. Tyr1 phosphorylation was identified more than two decades ago (101-104). Initial in vitro studies suggested that c-Abl could recognize and phosphorylate the CTD (102, 103), and subsequent in vivo analysis confirmed it as a bona fide CTD kinase in the context of the DNA damage response (105). However, a basal level of Tyr1 phosphorylation remained even upon perturbation of c-Abl, and convincing in vivo data have yet to establish a clear candidate for the Tyr1 kinase in the majority of RNA Pol II-mediated transcription (101, 105). CTD kinase activity seems to be a common feature of Abl-like protein kinases because the c-Abl family member Arg also phosphorylates the CTD in vitro, with efficiency comparable to that of c-Abl (101). Thr4 appears to be phosphorylated by the kinase PLK3, both in vivo and in vitro against purified RNA polymerase II holoenzyme (83). Ser7 appears to be phosphorylated by TFIIH, similar to Ser5. This is a surprising finding because TFIIH is generally localized to the promoters of genes, not to the coding regions of snRNA genes (95).

1.3.4 CTD phosphatases: Erasers of the code.

Compared to the intensive study of CTD kinases spanning over three decades, the CTD phosphatases have been overshadowed due to their perceived passive role of simply erasing CTD phosphorylation at the end of transcription. However, this view has been disputed by overwhelming evidence that phosphatases are dynamic and active participants in transcription regulation. Cells lacking certain CTD phosphatases experience termination defects (106-109), reduced transcription levels due to failure to turn over RNA Pol II (109-111), and even cell death (112, 113). These studies have revealed six CTD phosphatases: Fcp1, Ssu72, the Scps, Rtr1, Cdc14, and Glc7 (114). The phosphatases essential to CTD dephosphorylation in general transcription are Fcp1 and Ssu72, which dephosphorylate Ser2 and Ser5, respectively (114).

Fcp1 was initially characterized as a PP2C phosphatase that utilized the PPP/PPM mechanism, based on its requirement for magnesium and resistance to okadaic acid (115). However, the mechanism of Fcp1 more closely resembles that of the haloacid dehalogenase (HAD) protein family, a large protein family that includes enzymes mediating C-P or O-P bond cleavage (116). These findings positioned Fcp1 as the founding member of a new aspartate-based serine/threonine phosphatase family in eukaryotes, the FCP/SCP phosphatases (117, 118). There is substantial evidence that the phosphatase activity of Fcp1 is vital to general transcription mediated by RNA Pol II. Yeast strains lacking the FCP1 gene are not viable, presumably due to an inability to dephosphorylate and recycle RNA Pol II for subsequent rounds of transcription (111, 113). Yeast containing Fcp1 variants with attenuated phosphatase activity can survive but exhibit defects such as read-through transcription and globally reduced RNA levels (111). These findings strongly linked Fcp1 phosphatase activity to general transcription in vivo.

Fcp1 can dephosphorylate Ser2 and Ser5 residues in synthetic phosphoryl-CTD peptides in vitro (119). This is not surprising since the residues flanking these two serine residues are quite similar (Y1S2P3 vs. T4S5P6). However, yeast cells with inactive Fcp1 accumulate only phosphorylated Ser2, suggesting that Fcp1 controls the phosphorylation state of Ser2 rather than Ser5 in vivo (120). In in vitro kinetic assays, Fcp1 favored synthetic CTD peptides phosphorylated at Ser2 over those phosphorylated at Ser5 by 10-fold, with specific activities of 16 nmol vs. 1.6 nmol of inorganic phosphate released per μg of Fcp1 per hour against phosphorylated Ser2 and Ser5 peptides, respectively (119). Fcp1 has also been implicated as the phosphatase responsible for the newly identified Thr4 dephosphorylation in vivo (121, 122), and Fcp1 is active against phosphorylated Ser7 in vitro, but the physiological significance of this activity is unclear (56, 123). These results together suggest that Fcp1 fulfills the specific and vital role of Ser2 CTD dephosphorylation but maintains secondary activity against Thr4, Ser5, and Ser7 to ensure the complete dephosphorylation of the CTD at the end of transcription.
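The 10-fold preference of Fcp1 for phosphorylated Ser2 follows directly from the two specific activities reported in (119). As a minimal sketch of that arithmetic (the function name `fold_preference` is illustrative and not taken from the cited work):

```python
def fold_preference(preferred_activity: float, other_activity: float) -> float:
    """Return the ratio of two specific activities measured in the same units."""
    if other_activity <= 0:
        raise ValueError("specific activity must be positive")
    return preferred_activity / other_activity


# Specific activities quoted in the text
# (nmol inorganic phosphate released per ug of Fcp1 per hour).
SER2_ACTIVITY = 16.0
SER5_ACTIVITY = 1.6

ser2_over_ser5 = fold_preference(SER2_ACTIVITY, SER5_ACTIVITY)
print(f"Fcp1 prefers pSer2 over pSer5 by about {ser2_over_ser5:.0f}-fold")
```

The ratio is only meaningful because both activities were measured against comparable peptide substrates under the same assay conditions.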

In the search for a physiologically relevant Ser5 CTD phosphatase, researchers identified Ssu72 (gene SSU72, suppressor of sua7, gene 2) (124). SSU72 is an essential gene for yeast cell growth and is conserved throughout eukaryotes (112, 125). SSU72 was originally identified in a genetic screen for genes that suppress the phenotype of mutations in SUA7, the S. cerevisiae homologue of the gene encoding human TFIIB (112). Functional interaction between Ssu72, TFIIB, and Rpb2 suggested that this protein played a role in transcript initiation (126). Further investigation implicated Ssu72 as a key player in 3’-end processing and termination (106, 109). Ssu72 was identified as a constituent of the Cleavage and Polyadenylation Factor (CPF) complex in yeast through proteomic studies and confirmed using immunoprecipitation (109, 127). In vivo and in vitro transcription assays showed that Ssu72 mutant variants are defective in transcript elongation and termination (109, 128). Indeed, Ssu72 activity is essential for the accurate termination of RNA Pol II-mediated transcription, since Ssu72 variants with compromised phosphatase activity resulted in read-through transcription of almost half of snRNAs and snoRNAs as well as some mRNAs (106). This phenotype can be partially rescued by reducing the elongation speed through mutation or introduction of chemical agents (106, 109). It seems that Ssu72 balances transcript elongation and termination by decreasing the rate of elongation and allowing for appropriate binding of transcription factors and/or termination machinery (109). The function of Ssu72 in transcription termination is also conserved in higher eukaryotes, where Ssu72 is found to be a constituent of the human Cleavage and Polyadenylation Specificity Factor (CPSF) complex (the homologue of the yeast CPF complex in mammalian systems) (125).

Ssu72-mediated transcriptional regulation is reliant on its phosphatase activity. However, it was not immediately apparent that Ssu72 was indeed a phosphatase. The phosphatase activity of Ssu72 was supported by two bioinformatic lines of evidence: (1) Ssu72 orthologues contain a conserved CxxxxxRS sequence motif commonly found in protein tyrosine phosphatases (129) and (2) structure alignment revealed that Ssu72 has a fold comparable to that of the low molecular weight tyrosine phosphatase (LMW-PTP) despite sharing little sequence similarity (130). Purified Ssu72 was first shown to be active using p-nitrophenyl phosphate (pNPP) as a substrate (106), which is in line with its cysteine-based PTP mechanism considering the chemical similarity between pNPP and phosphorylated tyrosine. Interestingly, yeast mutants with decreased Ssu72 activity accumulated Ser5, but not Ser2, CTD phosphorylation (124). This result is consistent with in vitro phosphatase assays against phosphorylated CTD peptide substrates, showing that Ssu72 is specific for Ser5 and not Ser2 or Tyr1 (123, 131). More recently, evidence suggests that Ssu72 is also the physiological Ser7 CTD phosphatase (56, 108). In vitro, Ssu72 is 4000-fold less active against phosphorylated Ser7 than Ser5 peptides (123). However, this activity seems to be much greater in vivo as long as Ser5 is also phosphorylated, suggesting that Ssu72 first targets Ser5-phosphorylated heptads and then removes phosphates from Ser7 (108).

The characterization of Fcp1 as a Ser2 CTD phosphatase sparked interest in finding other related phosphatases acting on the CTD. A bioinformatic search for proteins containing regions similar to the Fcp1 homology domain revealed three closely related human proteins (132). The highly similar Small CTD Phosphatases 1-3 (Scp1-3 or CTDSP1/2/L) share the conserved HAD family mechanism but lack the BRCT domain and C-terminal TFIIF-interacting domain that are required for Fcp1 recognition of the CTD (132). These three isoforms perform identically in all in vitro biophysical and kinetic characterizations; and although they locate in different , the tissue-specific expression profiles of the three isoforms do not appear to differ (133). Purified Scp1 dephosphorylates RNA Pol II in vitro, and tethering it to the promoter of various plasmid-borne genes in yeast represses their transcription (132). Kinetic characterization of Scp1 revealed a strong preference for phosphorylated Ser5 of the CTD as opposed to Ser2 (132, 134). Interestingly, biological experiments suggest that Scps do not play a role in general transcription like Fcp1. Instead, they appear to be specific for repression of neuronal genes during development. Loss of functional Scp1 leads to de-repression of several neuronal genes as well as an induction of neuronal phenotypes in neural progenitor cells (133, 135). Based on these initial findings (132, 133), a plausible mechanism for Scps in neuronal silencing was that they are recruited to neuronal genes via their interaction with a transcription factor called REST (133). Once localized, they dephosphorylate any active RNA Pol II and silence gene expression. In this model, Scps dephosphorylate the CTD at Ser5 and other residues efficiently due to their high phosphatase activity. Scps prevent RNA Pol II from entering active transcription by maintaining RNA Pol II in the hypophosphorylated form, consistent with a role in gene silencing.

An additional mode of action has recently been proposed for the role of Scps in gene silencing. REST degradation and association with REST-complex cofactors are regulated by phosphorylation at S861 and S864 (136). Phosphorylation of these residues increases REST affinity for the E3 ubiquitin ligase SCFβ-TRCP and reduces its affinity for CoREST (136), a protein that binds the C-terminus of REST and helps recruit additional silencing factors (137). Scps directly dephosphorylate REST at S861/S864 both in vitro and in vivo and can abolish these negative effects on REST function (136). By preventing ubiquitin-mediated degradation of REST and promoting its assembly with CoREST, Scps stabilize the silencing complex and prevent neuronal gene expression (136). These two models are not mutually exclusive and may coexist. Therefore, Scp1 negatively regulates RE-1-linked neuronal genes through dephosphorylation of REST to promote its stability/silencing function and may interfere with phosphorylation of RNA polymerase II to prevent transcript generation.

None of the CTD phosphatases is more controversial than Rtr1. Since Fcp1 and Ssu72 are primarily recruited in the later stages of transcription (108), many have searched for phosphatases that modify the CTD phosphorylation status earlier in the transcription cycle (56). An early drop in Ser5 phosphorylation levels, detected using Ser5-specific antibodies, is coincident with the recruitment of a Pol II-associated protein, Rtr1 (107). Incubation of S. cerevisiae Rtr1 with in vitro phosphorylated GST-CTD and synthetic peptides resulted in a reduction of Ser5-phosphorylated species (107). Furthermore, yeast strains lacking Rtr1 display growth defects, accumulate phosphorylated Ser5, and experience problems in transcription (107, 138). RPAP2, the human homologue of Rtr1, also associates with RNA Pol II and dephosphorylates Ser5 in vitro and in vivo (139). This combination of in vitro and in vivo evidence supports the notion that Rtr1 and RPAP2 are Ser5 CTD phosphatases.

Cdc14 is an essential cell cycle phosphatase that regulates key events during late mitosis. Cdc14 reverses Cdk-dependent phosphorylation and promotes late mitotic exit (140). Cdc14 has been identified as a component of the silencing complex RENT. RENT is similar to the Scp-associated REST complex in that it helps to silence gene expression (141, 142). Cdc14 is thought to contribute to transcription silencing by dephosphorylating the CTD (54, 55). Recombinant Cdc14 can dephosphorylate purified, hyperphosphorylated CTD substrate in vitro. This dephosphorylation occurs at both Ser2 and Ser5, as determined by site-specific antibodies (54, 55). Phosphorylated Ser5 accumulates in Cdc14-null cell lines, and tagged Cdc14 co-immunoprecipitates with Rpb1 (54). Cdc14-null cells display a shift in cyclin transcript expression, suggesting that Cdc14 may control a subset of genes during the cell cycle (54), but it is unclear if this can be linked entirely to its CTD phosphatase activity. These experiments suggest an interesting role for Cdc14 dephosphorylation of the CTD in cell cycle regulation and expand the critical nature of this regulated process.

Tyr1 phosphorylation is most abundant at the end of the transcription cycle and is believed to play an important role in timing termination (13). Purified yeast cleavage and poly-adenylation factor (CPF) complex can dephosphorylate the CTD at Tyr1, Ser2, and Ser5 (131). Ssu72 and Glc7 are the only two known phosphatases in this complex (143). Ssu72 has no significant Tyr1 phosphatase activity in vitro or in vivo, even though it is structurally similar to established tyrosine phosphatases (123, 131). This leaves Glc7, a di-metal ion phosphatase that is essential for the termination of snoRNA transcription (144) and for mRNA export (145). Inactivation of Glc7 by the broad-spectrum di-metal phosphatase inhibitors EDTA and microcystin abolished CPF phosphatase activity against Tyr1 and Ser2 in vitro. Since EDTA and microcystin do not affect the activity of PTP phosphatases like Ssu72, it was reasoned that Glc7 is responsible for the Tyr1 and Ser2 phosphatase activity of CPF in vitro. Depletion of Glc7 in vivo led to dephosphorylation defects specific to Tyr1 and resulted in increased occupancy of RNA Pol II downstream of genes, suggesting defects in termination (131). Purified Glc7 isolated from CPF has not been tested in vitro for phosphatase activity against the CTD, owing in part to the low specificity expected of di-metal dependent phosphatases in the absence of auxiliary domains or binding partners (118, 131). Therefore, the identity and role of Glc7 binding partners in directing Glc7 activity against the CTD must be investigated to unambiguously characterize this phosphatase.

1.3.5 Prolyl isomerases: Modifiers of the code.

Beyond CTD kinases and phosphatases, other CTD modifiers can impact the phosphorylation state of the CTD. These include factors that alter the conformation of the CTD, which can either enhance or decrease the affinity of CTD binding partners. In turn, changes in affinity towards the CTD over the course of transcription can regulate the timing of transcriptional processes. Primary examples of such CTD modifiers are the prolyl isomerases Ess1 in yeast and Pin1 in humans.


Prolyl isomerization plays a major role in signal transduction (146). Proline is the only natural amino acid that stably assumes either a cis or a trans conformation about its prolyl peptide bond (Figure 1-4). The trans form is thermodynamically favored and naturally occurs ~70-90% of the time (147). This difference in conformation can alter the interaction between proteins if the protein-protein interface contains proline residues. Additionally, the two isomers interchange according to a thermal equilibrium, but this process is slow in the context of an entire protein and can be rate-limiting (148, 149). To overcome this slow isomerization, nature has evolved a class of enzymes called prolyl isomerases that increase the rate of isomer conversion (150, 151). It is important to note that prolyl isomerases do not alter the equilibrium isomer ratio; instead, they rapidly re-balance isomer pools when the equilibrium is perturbed and so maintain isomer ratios at a constant level. Ess1 and Pin1 are essential phospho-specific proline isomerases that act on a variety of substrates (146). In terms of the CTD, Ess1/Pin1 equilibrate Pro3 and Pro6 residues adjacent to phosphorylated Ser2 and Ser5 residues (151, 152). Depletion or mutation of Ess1/Pin1 leads to global transcription defects and accumulation of CTD Ser5 phosphorylation, suggesting that proline isomerization of the CTD plays a role in transcriptional signaling (153, 154). Ess1/Pin1 enhances CTD dephosphorylation and recognizes the same Ser-Pro binding motif as CTD phosphatases. One explanation for the role of Ess1/Pin1 in CTD dephosphorylation is that prolyl isomerases and phosphatases are functionally linked. Specifically, the proline isomer preference of CTD phosphatases may contribute to their apparent activity and be regulated by prolyl isomerases.
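The distinction between the equilibrium position and the rate of interconversion can be made concrete with a two-state model. The following Python sketch is illustrative only: the rate constants, the 1000-fold acceleration, and the temperature of 298 K are assumed values chosen for the example, not measurements from the cited studies. It treats cis/trans interconversion as a reversible first-order process and assumes an isomerase scales both rate constants by the same factor.

```python
import math

R_KJ_PER_MOL_K = 8.314e-3  # gas constant in kJ/(mol*K)


def equilibrium_constant(trans_fraction: float) -> float:
    """K = [trans]/[cis] for a two-state cis/trans equilibrium."""
    return trans_fraction / (1.0 - trans_fraction)


def delta_g_kj_per_mol(trans_fraction: float, temperature_k: float = 298.0) -> float:
    """Standard free energy of cis -> trans conversion, dG = -RT ln K."""
    return -R_KJ_PER_MOL_K * temperature_k * math.log(equilibrium_constant(trans_fraction))


def trans_fraction_at(t: float, f0: float, k_cis_to_trans: float, k_trans_to_cis: float) -> float:
    """Trans fraction at time t (s), starting from f0, for reversible first-order exchange."""
    k_obs = k_cis_to_trans + k_trans_to_cis      # observed relaxation rate
    f_eq = k_cis_to_trans / k_obs                # equilibrium trans fraction
    return f_eq + (f0 - f_eq) * math.exp(-k_obs * t)


# Free-energy difference implied by the 70-90% trans range quoted in the text.
for fraction in (0.7, 0.9):
    print(f"{fraction:.0%} trans: dG = {delta_g_kj_per_mol(fraction):.2f} kJ/mol")

# Assumed (hypothetical) rate constants giving 80% trans at equilibrium.
K_CT, K_TC = 0.008, 0.002   # per second
SPEEDUP = 1000.0            # assumed catalytic acceleration by an isomerase

uncatalyzed_10s = trans_fraction_at(10.0, 0.0, K_CT, K_TC)
catalyzed_10s = trans_fraction_at(10.0, 0.0, K_CT * SPEEDUP, K_TC * SPEEDUP)
uncatalyzed_eq = trans_fraction_at(1e6, 0.0, K_CT, K_TC)
catalyzed_eq = trans_fraction_at(1e6, 0.0, K_CT * SPEEDUP, K_TC * SPEEDUP)

print(f"trans after 10 s: uncatalyzed {uncatalyzed_10s:.3f}, catalyzed {catalyzed_10s:.3f}")
print(f"trans at equilibrium: uncatalyzed {uncatalyzed_eq:.3f}, catalyzed {catalyzed_eq:.3f}")
```

In this model the catalyzed and uncatalyzed pools end at the same trans fraction; only the time needed to get there differs, which is the sense in which an isomerase re-balances a perturbed pool without shifting the ratio.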


Unfortunately, little mechanistic data is available to explain the impact of proline isomerization on transcription at the molecular level. This is primarily due to the subtle conformational differences introduced by proline isomerization that cannot be detected using current technologies like sequencing or mass spectrometry.

Figure 1-4. Conformations of phospho-serine proline motifs.

1.4 CHEMICAL BIOLOGY TO DECIPHER THE CTD CODE

The last several decades have provided an immense amount of information about the transcription process, and specifically about CTD function and modification.

However, the greatest hurdle remaining in the CTD field today is the resolution at which we can observe the landscape of phosphorylation marks along the CTD (29, 155, 156). Owing to current technological limitations, how these marks are patterned in individual heptads and along the sprawling CTD remains fairly nebulous. Phosphorylation-specific CTD antibodies provided the leap necessary to identify the presence and abundance of specific phosphorylated residues in the CTD. Unfortunately, these antibodies do not provide the intricate molecular detail required to fully characterize the pattern of post-translational modification at specific residues in specific locations along the CTD. This prevents the investigation of higher-order combinations of post-translational modifications. These include combinations of proline isomerization and phosphorylation or divalent phosphorylation marks across a single heptad or multiple heptads. These combinations of marks in the code may make unique contributions to CTD function and factor recruitment that are currently beyond the resolution of available tools.
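The scale of this combinatorial problem can be illustrated by counting the states available to a single consensus heptad (Y1S2P3T4S5P6S7). The following Python sketch is an illustration under simplifying assumptions, not an analysis from this work: each of the five phosphorylatable residues is treated as either phosphorylated or not, each proline as either cis or trans, and every combination is counted regardless of whether it occurs in vivo.

```python
from itertools import product

# Residues of the consensus heptad discussed in the text.
PHOSPHO_SITES = ("Tyr1", "Ser2", "Thr4", "Ser5", "Ser7")
PROLINES = ("Pro3", "Pro6")


def heptad_states():
    """Yield every combination of phosphorylation and proline isomer states."""
    for phospho in product((False, True), repeat=len(PHOSPHO_SITES)):
        for isomers in product(("trans", "cis"), repeat=len(PROLINES)):
            yield dict(zip(PHOSPHO_SITES, phospho)), dict(zip(PROLINES, isomers))


states = list(heptad_states())

# States carrying a phosphorylated Ser5, with or without any other mark.
with_pser5 = [s for s in states if s[0]["Ser5"]]

# States carrying both Ser2 and Ser5 phosphorylation in the same heptad.
with_pser2_and_pser5 = [s for s in states if s[0]["Ser2"] and s[0]["Ser5"]]

print(f"total states per heptad: {len(states)}")
print(f"states with pSer5: {len(with_pser5)}")
print(f"states with pSer2 and pSer5: {len(with_pser2_and_pser5)}")
```

Under these assumptions a single heptad has 128 distinct states, 64 of which carry phosphorylated Ser5. A readout that reports only the abundance of that one mark pools all 64 together, and the count grows multiplicatively with each additional heptad considered.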

The ability of post-translational modifications to ‘cross talk’ within the CTD code has been proposed multiple times (29, 155, 157). This cross talk is characterized in two major ways: (1) as combinations of post-translational modifications that specifically recruit certain factors, for example Ser5 and Ser2 phosphorylation co-occurring in the same heptad to recruit a specific factor; or (2) as the ability of previously installed marks to potentiate the installation or removal of other CTD modifications, for example if Ser5 phosphorylation alters subsequent kinase preference for Ser2. Both forms of cross talk would be mediated by protein factors interpreting the CTD code that are sensitive to other marks or combinations of them. Because the majority of our knowledge comes from the use of CTD antibodies that recognize isolated CTD marks and only provide abundance information, very little data is available to support or refute such cross talk (29, 155, 157).

In my graduate work I sought to overcome these limitations and apply the tools of chemical biology to understand CTD modification in an unbiased and high-resolution fashion. By increasing the resolution at which we study the CTD, I hoped to investigate the existence of CTD modification cross talk and the existence of a combinatorial code. Additionally, I was interested in developing tools that allow for high-resolution determination of CTD modification in vitro and in vivo.

In Chapter 2, I utilize chemical tools to confirm that CTD modifications (proline isomerization and serine phosphorylation) indeed interact within the CTD code to generate functionally relevant combinations of marks. These studies suggest that CTD modifications are capable of cross talk, as specific combinations of post-translational modifications direct CTD phosphorylation status and influence transcription outcomes. In Chapter 3, tandem mass spectrometry and analytical chemistry were utilized to reveal the phosphorylation pattern of CTD kinases along full-length CTD sequences in high resolution. From these studies I derive a rulebook that defines how sequence variation from the consensus heptad influences kinase activity and reveal a novel role for highly conserved CTD residues. In Chapter 4, I build upon the previous two chapters and investigate the ability of phosphorylation marks to cross talk within the CTD code. Initial findings presented in this chapter suggest that previously encoded information indeed influences downstream modification and address some contradictions concerning P-TEFb.

1.5 REFERENCES

1. Steinmetz EJ, et al. (2006) Genome-wide distribution of yeast RNA polymerase II and its control by Sen1 helicase. Molecular cell 24(5):735-746.
2. Koch F, Jourquin F, Ferrier P, & Andrau JC (2008) Genome-wide RNA polymerase II: not genes only! Trends in biochemical sciences 33(6):265-273.


3. Koch F, et al. (2011) Transcription initiation platforms and GTF recruitment at tissue-specific enhancers and promoters. Nature structural & molecular biology 18(8):956-963.
4. Jeronimo C, Bataille AR, & Robert F (2013) The writers, readers, and functions of the RNA polymerase II C-terminal domain code. Chemical reviews 113(11):8491-8522.
5. Kang ME & Dahmus ME (1993) RNA polymerases IIA and IIO have distinct roles during transcription from the TATA-less murine dihydrofolate reductase promoter. The Journal of biological chemistry 268(33):25033-25040.
6. Kelly WG, Dahmus ME, & Hart GW (1993) RNA polymerase II is a glycoprotein. Modification of the COOH-terminal domain by O-GlcNAc. The Journal of biological chemistry 268(14):10416-10424.
7. Cho EJ, Takagi T, Moore CR, & Buratowski S (1997) mRNA capping enzyme is recruited to the transcription complex by phosphorylation of the RNA polymerase II carboxy-terminal domain. Genes & development 11(24):3319-3326.
8. Ho CK & Shuman S (1999) Distinct roles for CTD Ser-2 and Ser-5 phosphorylation in the recruitment and allosteric activation of mammalian mRNA capping enzyme. Molecular cell 3(3):405-411.
9. Chapman RD, et al. (2007) Transcribing RNA polymerase II is phosphorylated at CTD residue serine-7. Science 318(5857):1780-1782.
10. Egloff S, et al. (2007) Serine-7 of the RNA polymerase II CTD is specifically required for snRNA gene expression. Science 318(5857):1777-1779.
11. Glover-Cutter K, et al. (2009) TFIIH-associated Cdk7 kinase functions in phosphorylation of C-terminal domain Ser7 residues, promoter-proximal pausing, and termination by RNA polymerase II. Molecular and cellular biology 29(20):5455-5464.
12. Hsin JP, Sheth A, & Manley JL (2011) RNAP II CTD phosphorylated on threonine-4 is required for histone mRNA 3' end processing. Science 334(6056):683-686.
13. Mayer A, et al. (2012) CTD tyrosine phosphorylation impairs termination factor recruitment to RNA polymerase II. Science 336(6089):1723-1725.
14. Hornbeck PV, et al. (2015) PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic acids research 43(Database issue):D512-520.
15. Allison LA, Moyle M, Shales M, & Ingles CJ (1985) Extensive homology among the largest subunits of eukaryotic and prokaryotic RNA polymerases. Cell 42(2):599-610.
16. Corden JL, Cadena DL, Ahearn JM, Jr., & Dahmus ME (1985) A unique structure at the carboxyl terminus of the largest subunit of eukaryotic RNA polymerase II. Proceedings of the National Academy of Sciences of the United States of America 82(23):7934-7938.
17. Chapman RD, Heidemann M, Hintermair C, & Eick D (2008) Molecular evolution of the RNA polymerase II CTD. Trends in genetics : TIG 24(6):289-296.

18. Patturajan M, et al. (1998) Growth-related changes in phosphorylation of yeast RNA polymerase II. The Journal of biological chemistry 273(8):4689-4694.
19. Schwer B & Shuman S (2011) Deciphering the RNA polymerase II CTD code in fission yeast. Molecular cell 43(2):311-318.
20. Hirose Y & Manley JL (1998) RNA polymerase II is an essential mRNA polyadenylation factor. Nature 395(6697):93-96.
21. Richard P & Manley JL (2009) Transcription termination by nuclear RNA polymerases. Genes & development 23(11):1247-1269.
22. Kuehner JN, Pearson EL, & Moore C (2011) Unravelling the means to an end: RNA polymerase II transcription termination. Nature reviews. Molecular cell biology 12(5):283-294.
23. Ahn SH, Kim M, & Buratowski S (2004) Phosphorylation of serine 2 within the RNA polymerase II C-terminal domain couples transcription and 3' end processing. Molecular cell 13(1):67-76.
24. McCracken S, et al. (1997) The C-terminal domain of RNA polymerase II couples mRNA processing to transcription. Nature 385(6614):357-361.
25. Morris DP & Greenleaf AL (2000) The splicing factor, Prp40, binds the phosphorylated carboxyl-terminal domain of RNA polymerase II. The Journal of biological chemistry 275(51):39935-39943.
26. Lu H, Flores O, Weinmann R, & Reinberg D (1991) The nonphosphorylated form of RNA polymerase II preferentially associates with the preinitiation complex. Proceedings of the National Academy of Sciences of the United States of America 88(22):10004-10008.
27. Conaway RC, Bradsher JN, & Conaway JW (1992) Mechanism of assembly of the RNA polymerase II preinitiation complex. Evidence for a functional interaction between the carboxyl-terminal domain of the largest subunit of RNA polymerase II and a high molecular mass form of the TATA factor. The Journal of biological chemistry 267(12):8464-8467.
28. Buratowski S (2003) The CTD code. Nature structural biology 10(9):679-680.
29. Eick D & Geyer M (2013) The RNA polymerase II carboxy-terminal domain (CTD) code. Chemical reviews 113(11):8456-8490.
30. Mathews CK (2013) Biochemistry (Pearson, Toronto) 4th Ed pp xxvi, 1342 p.
31. Nelson DL, Lehninger AL, & Cox MM (2013) Lehninger principles of biochemistry (W.H. Freeman, New York) 6th Ed p 1 vol. (various pagings).
32. Ezkurdia I, et al. (2014) Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Human molecular genetics 23(22):5866-5878.
33. Carrera I & Treisman JE (2008) Message in a nucleus: signaling to the transcriptional machinery. Current opinion in genetics & development 18(5):397-403.
34. Weake VM & Workman JL (2010) Inducible gene expression: diverse regulatory mechanisms. Nature reviews. Genetics 11(6):426-437.


35. Liu X, Kraus WL, & Bai X (2015) Ready, pause, go: regulation of RNA polymerase II pausing and release by cellular signaling pathways. Trends in biochemical sciences 40(9):516-525.
36. Loya TJ & Reines D (2016) Recent advances in understanding transcription termination by RNA polymerase II. F1000Research 5.
37. Guiro J & Murphy S (2017) Regulation of expression of human RNA polymerase II-transcribed snRNA genes. Open biology 7(6).
38. Mayer A, Landry HM, & Churchman LS (2017) Pause & go: from the discovery of RNA polymerase pausing to its functional implications. Current opinion in cell biology 46:72-80.
39. Skalska L, Beltran-Nebot M, Ule J, & Jenner RG (2017) Regulatory feedback from nascent RNA to chromatin and transcription. Nature reviews. Molecular cell biology 18(5):331-337.
40. Jonkers I & Lis JT (2015) Getting up to speed with transcription elongation by RNA polymerase II. Nature reviews. Molecular cell biology 16(3):167-177.
41. Krauss G (2003) Biochemistry of signal transduction and regulation (Wiley-VCH, Weinheim Great Britain) 3rd Ed pp xvi, 541 p.
42. Proudfoot NJ (2016) Transcriptional termination in mammals: Stopping the RNA polymerase II juggernaut. Science 352(6291):aad9926.
43. Roy AL & Singer DS (2015) Core promoters in transcription: old problem, new insights. Trends in biochemical sciences 40(3):165-171.
44. Archambault J, et al. (1998) FCP1, the RAP74-interacting subunit of a human protein phosphatase that dephosphorylates the carboxyl-terminal domain of RNA polymerase IIO. The Journal of biological chemistry 273(42):27593-27601.
45. Orphanides G, LeRoy G, Chang CH, Luse DS, & Reinberg D (1998) FACT, a factor that facilitates transcript elongation through nucleosomes. Cell 92(1):105-116.
46. Mandel CR, et al. (2006) Polyadenylation factor CPSF-73 is the pre-mRNA 3'-end-processing endonuclease. Nature 444(7121):953-956.
47. Murthy KG & Manley JL (1995) The 160-kD subunit of human cleavage-polyadenylation specificity factor coordinates pre-mRNA 3'-end formation. Genes & development 9(21):2672-2683.
48. Shearwin KE, Callen BP, & Egan JB (2005) Transcriptional interference--a crash course. Trends in genetics : TIG 21(6):339-345.
49. Gaillard H, Garcia-Muse T, & Aguilera A (2015) Replication stress and cancer. Nature reviews. Cancer 15(5):276-289.
50. Engel C, Sainsbury S, Cheung AC, Kostrewa D, & Cramer P (2013) RNA polymerase I structure and transcription regulation. Nature 502(7473):650-655.
51. Ng HH, Robert F, Young RA, & Struhl K (2003) Targeted recruitment of Set1 histone methylase by elongating Pol II provides a localized mark and memory of recent transcriptional activity. Molecular cell 11(3):709-719.

31

52. Winsor TS, Bartkowiak B, Bennett CB, & Greenleaf AL (2013) A DNA damage response system associated with the phosphoCTD of elongating RNA polymerase II. PloS one 8(4):e60909. 53. Di Vona C, et al. (2015) Chromatin-wide profiling of DYRK1A reveals a role as a gene-specific RNA polymerase II CTD kinase. Molecular cell 57(3):506-520. 54. Guillamot M, et al. (2011) Cdc14b regulates mammalian RNA polymerase II and represses cell cycle transcription. Scientific reports 1:189. 55. Clemente-Blanco A, et al. (2011) Cdc14 phosphatase promotes segregation of telomeres through repression of RNA polymerase II transcription. Nature cell biology 13(12):1450-1456. 56. Bataille AR, et al. (2012) A universal RNA polymerase II CTD cycle is orchestrated by complex interplays between kinase, phosphatase, and isomerase enzymes along genes. Molecular cell 45(2):158-170. 57. Komarnitsky P, Cho EJ, & Buratowski S (2000) Different phosphorylated forms of RNA polymerase II and associated mRNA processing factors during transcription. Genes & development 14(19):2452-2460. 58. Kim H, et al. (2010) Gene-specific RNA polymerase II phosphorylation and the CTD code. Nature structural & molecular biology 17(10):1279-1286. 59. Mayer A, et al. (2010) Uniform transitions of the general RNA polymerase II transcription complex. Nature structural & molecular biology 17(10):1272-1278. 60. Tietjen JR, et al. (2010) Chemical-genomic dissection of the CTD code. Nature structural & molecular biology 17(9):1154-1161. 61. Chesnut JD, Stephens JH, & Dahmus ME (1992) The interaction of RNA polymerase II with the adenovirus-2 major late promoter is precluded by phosphorylation of the C-terminal domain of subunit IIa. The Journal of biological chemistry 267(15):10500-10506. 62. Descostes N, et al. (2014) Tyrosine phosphorylation of RNA polymerase II CTD is associated with antisense promoter transcription and active enhancers in mammalian cells. Elife 3:e02105. 63. 
Hsin JP, Li W, Hoque M, Tian B, & Manley JL (2014) RNAP II CTD tyrosine 1 performs diverse functions in vertebrate cells. Elife 3:e02112. 64. Cho H, et al. (1999) A protein phosphatase functions to recycle RNA polymerase II. Genes & development 13(12):1540-1552. 65. Ghosh A, Shuman S, & Lima CD (2011) Structural insights to how mammalian capping enzyme reads the CTD code. Molecular cell 43(2):299-310. 66. Wong KH, Jin Y, & Struhl K (2014) TFIIH Phosphorylation of the Pol II CTD Stimulates Mediator Dissociation from the Preinitiation Complex and Promoter Escape. Molecular Cell 54(4):601-612. 67. Ni Z, et al. (2008) P-TEFb is critical for the maturation of RNA polymerase II into productive elongation in vivo. Molecular and cellular biology 28(3):1161- 1170.

32

68. Hintermair C, et al. (2012) Threonine-4 of mammalian RNA polymerase II CTD is targeted by Polo-like kinase 3 and required for transcriptional elongation. The EMBO journal 31(12):2784-2797. 69. Cadena DL & Dahmus ME (1987) Messenger RNA synthesis in mammalian cells is catalyzed by the phosphorylated form of RNA polymerase II. The Journal of biological chemistry 262(26):12468-12474. 70. Descostes N, et al. (2014) Tyrosine phosphorylation of RNA Polymerase II CTD is associated with antisense promoter transcription and active enhancers in mammalian cells. Elife 3. 71. Hsin J-P, Li W, Hoque M, Tian B, & Manley JL (2014) RNAP II CTD tyrosine 1 performs diverse functions in vertebrate cells. Elife 3. 72. Adelman K & Lis JT (2012) Promoter-proximal pausing of RNA polymerase II: emerging roles in metazoans. Nature Reviews Genetics 13(10):720-731. 73. Baranello L, et al. (2016) RNA Polymerase II Regulates Topoisomerase 1 Activity to Favor Efficient Transcription. Cell 165(2):357-371. 74. Gu B, Eick D, & Bensaude O (2013) CTD serine-2 plays a critical role in splicing and termination factor recruitment to RNA polymerase II in vivo. Nucleic Acids Research 41(3):1591-1603. 75. Krogan NJ, et al. (2003) Methylation of histone H3 by Set2 in Saccharomyces cerevisiae is linked to transcriptional elongation by RNA polymerase II. Molecular and Cellular Biology 23(12):4207-4218. 76. Licatalosi DD, et al. (2002) Functional interaction of yeast pre-mRNA 3 ' end processing factors with RNA polymerase II. Molecular Cell 9(5):1101-1111. 77. Lunde BM, et al. (2010) Cooperative interaction of transcription termination factors with the RNA polymerase II C-terminal domain. Nature Structural & Molecular Biology 17(10):1195-+. 78. Meinhart A & Cramer P (2004) Recognition of RNA polymerase II carboxy- terminal domain by 3 '-RNA-processing factors. Nature 430(6996):223-226. 79. 
Sun M, Lariviere L, Dengl S, Mayer A, & Cramer P (2010) A Tandem SH2 Domain in Transcription Elongation Factor Spt6 Binds the Phosphorylated RNA Polymerase II C-terminal Repeat Domain (CTD). Journal of Biological Chemistry 285(53):41597-41603. 80. Yu M, et al. (2015) RNA polymerase II-associated factor 1 regulates the release and phosphorylation of paused RNA polymerase II. Science 350(6266):1383- 1386. 81. Coletta A, et al. (2010) Low-complexity regions within protein sequences have position-dependent roles. Bmc Systems Biology 4. 82. Harlen KM, et al. (2016) Comprehensive RNA Polymerase II Interactomes Reveal Distinct and Varied Roles for Each Phospho-CTD Residue. Cell Reports 15(10):2147-2158. 83. Hintermair C, et al. (2012) Threonine-4 of mammalian RNA polymerase II CTD is targeted by Polo-like kinase 3 and required for transcriptional elongation. Embo Journal 31(12):2784-2797. 33

84. Hsin J-P, Sheth A, & Manley JL (2011) RNAP II CTD Phosphorylated on Threonine-4 Is Required for Histone mRNA 3 ' End Processing. Science 334(6056):683-686. 85. Rosonina E, et al. (2014) Threonine-4 of the budding yeast RNAP II CTD couples transcription with Htz1-mediated chromatin remodeling. Proceedings of the National Academy of Sciences of the United States of America 111(33):11924- 11931. 86. Arigo JT, Eyler DE, Carroll KL, & Corden JL (2006) Termination of cryptic unstable transcripts is directed by yeast RNA-Binding proteins Nrd1 and Nab3. Molecular Cell 23(6):841-851. 87. Krogan NJ, et al. (2003) The Paf1 complex is required for histone h3 methylation by COMPASS and Dot1p: Linking transcriptional elongation to histone methylation. Molecular Cell 11(3):721-729. 88. Nojima T, et al. (2015) Mammalian NET-Seq Reveals Genome-wide Nascent Transcription Coupled to RNA Processing. Cell 161(3):526-540. 89. Schroeder SC, Schwer B, Shuman S, & Bentley D (2000) Dynamic association of capping enzymes with transcribing RNA polymerase II. Genes & Development 14(19):2435-2440. 90. Terzi N, Churchman LS, Vasiljeva L, Weissman J, & Buratowski S (2011) H3K4 Trimethylation by Set1 Promotes Efficient Termination by the Nrd1-Nab3-Sen1 Pathway. Molecular and Cellular Biology 31(17):3569-3583. 91. Czudnochowski N, Boesken CA, & Geyer M (2012) Serine-7 but not serine-5 phosphorylation primes RNA polymerase II CTD for P-TEFb recognition. Nature Communications 3. 92. Lu H, Zawel L, Fisher L, Egly JM, & Reinberg D (1992) Human general transcription factor IIH phosphorylates the C-terminal domain of RNA polymerase II. Nature 358(6388):641-645. 93. Trigon S, et al. (1998) Characterization of the residues phosphorylated in vitro by different C-terminal domain kinases. The Journal of biological chemistry 273(12):6769-6775. 94. Morgan DO (1997) Cyclin-dependent kinases: engines, clocks, and microprocessors. Annual review of cell and developmental biology 13:261-291. 95. 
Akhtar MS, et al. (2009) TFIIH kinase places bivalent marks on the carboxy- terminal domain of RNA polymerase II. Molecular cell 34(3):387-393. 96. Liu Y, et al. (2004) Two cyclin-dependent kinases promote RNA polymerase II transcription and formation of the scaffold complex. Molecular and cellular biology 24(4):1721-1735. 97. Tee WW, Shen SS, Oksuz O, Narendra V, & Reinberg D (2014) Erk1/2 activity promotes chromatin features and RNAPII phosphorylation at developmental promoters in mouse ESCs. Cell 156(4):678-690. 98. Donner AJ, Ebmeier CC, Taatjes DJ, & Espinosa JM (2010) CDK8 is a positive regulator of transcriptional elongation within the serum response network. Nature structural & molecular biology 17(2):194-201. 34

99. Bartkowiak B, et al. (2010) CDK12 is a transcription elongation-associated CTD kinase, the metazoan ortholog of yeast Ctk1. Genes & development 24(20):2303- 2316. 100. Liang K, et al. (2015) Characterization of human cyclin-dependent kinase 12 (CDK12) and CDK13 complexes in C-terminal domain phosphorylation, gene transcription, and RNA processing. Molecular and cellular biology 35(6):928- 938. 101. Baskaran R, Chiang GG, Mysliwiec T, Kruh GD, & Wang JY (1997) Tyrosine phosphorylation of RNA polymerase II carboxyl-terminal domain by the Abl- related gene product. The Journal of biological chemistry 272(30):18905-18909. 102. Baskaran R, Chiang GG, & Wang JY (1996) Identification of a binding site in c- Ab1 tyrosine kinase for the C-terminal repeated domain of RNA polymerase II. Molecular and cellular biology 16(7):3361-3369. 103. Baskaran R, Dahmus ME, & Wang JY (1993) Tyrosine phosphorylation of mammalian RNA polymerase II carboxyl-terminal domain. Proceedings of the National Academy of Sciences of the United States of America 90(23):11167- 11171. 104. Duyster J, Baskaran R, & Wang JY (1995) Src homology 2 domain as a specificity determinant in the c-Abl-mediated tyrosine phosphorylation of the RNA polymerase II carboxyl-terminal repeated domain. Proceedings of the National Academy of Sciences of the United States of America 92(5):1555-1559. 105. Liu ZG, et al. (1996) Three distinct signalling responses by murine fibroblasts to genotoxic stress. Nature 384(6606):273-276. 106. Ganem C, et al. (2003) Ssu72 is a phosphatase essential for transcription termination of snoRNAs and specific mRNAs in yeast. The EMBO journal 22(7):1588-1598. 107. Mosley AL, et al. (2009) Rtr1 is a CTD phosphatase that regulates RNA polymerase II during the transition from serine 5 to serine 2 phosphorylation. Molecular cell 34(2):168-178. 108. Zhang DW, et al. 
(2012) Ssu72 phosphatase-dependent erasure of phospho-Ser7 marks on the RNA polymerase II C-terminal domain is essential for viability and transcription termination. The Journal of biological chemistry 287(11):8541- 8551. 109. Dichtl B, et al. (2002) A role for SSU72 in balancing RNA polymerase II transcription elongation and termination. Molecular cell 10(5):1139-1150. 110. Fuda NJ, et al. (2012) Fcp1 dephosphorylation of the RNA polymerase II C- terminal domain is required for efficient transcription of heat shock genes. Molecular and cellular biology 32(17):3428-3437. 111. Kobor MS, et al. (1999) An unusual eukaryotic protein phosphatase required for transcription by RNA polymerase II and CTD dephosphorylation in S. cerevisiae. Molecular cell 4(1):55-62. 112. Sun ZW & Hampsey M (1996) Synthetic enhancement of a TFIIB defect by a mutation in SSU72, an essential yeast gene encoding a novel protein that affects 35

transcription start site selection in vivo. Molecular and cellular biology 16(4):1557-1566. 113. Archambault J, et al. (1997) An essential component of a C-terminal domain phosphatase that interacts with transcription factor IIF in Saccharomyces cerevisiae. Proceedings of the National Academy of Sciences of the United States of America 94(26):14300-14305. 114. Mayfield JE, Burkholder NT, & Zhang YJ (2016) Dephosphorylating eukaryotic RNA polymerase II. Biochimica et biophysica acta 1864(4):372-387. 115. Chambers RS & Dahmus ME (1994) Purification and characterization of a phosphatase from HeLa cells which dephosphorylates the C-terminal domain of RNA polymerase II. The Journal of biological chemistry 269(42):26243-26248. 116. Allen KN & Dunaway-Mariano D (2004) Phosphoryl group transfer: evolution of a catalytic scaffold. Trends in biochemical sciences 29(9):495-503. 117. Kobor MS, et al. (2000) A motif shared by TFIIF and TFIIB mediates their interaction with the RNA polymerase II carboxy-terminal domain phosphatase Fcp1p in Saccharomyces cerevisiae. Molecular and cellular biology 20(20):7438- 7449. 118. Shi Y (2009) Serine/threonine phosphatases: mechanism through structure. Cell 139(3):468-484. 119. Hausmann S & Shuman S (2002) Characterization of the CTD phosphatase Fcp1 from fission yeast. Preferential dephosphorylation of serine 2 versus serine 5. The Journal of biological chemistry 277(24):21213-21220. 120. Cho EJ, Kobor MS, Kim M, Greenblatt J, & Buratowski S (2001) Opposing effects of Ctk1 kinase and Fcp1 phosphatase at Ser 2 of the RNA polymerase II C-terminal domain. Genes & development 15(24):3319-3329. 121. Allepuz-Fuster P, et al. (2014) Rpb4/7 facilitates RNA polymerase II CTD dephosphorylation. Nucleic acids research 42(22):13674-13688. 122. Hsin JP, Xiang K, & Manley JL (2014) Function and control of RNA polymerase II C-terminal domain phosphorylation in vertebrate transcription and RNA processing. Molecular and cellular biology 34(13):2488-2498. 
123. Xiang K, Manley JL, & Tong L (2012) An unexpected binding mode for a Pol II CTD peptide phosphorylated at Ser7 in the active site of the CTD phosphatase Ssu72. Genes & development 26(20):2265-2270. 124. Krishnamurthy S, He X, Reyes-Reyes M, Moore C, & Hampsey M (2004) Ssu72 Is an RNA polymerase II CTD phosphatase. Molecular cell 14(3):387-394. 125. Xiang K, et al. (2010) Crystal structure of the human symplekin-Ssu72-CTD phosphopeptide complex. Nature 467(7316):729-733. 126. Pappas DL, Jr. & Hampsey M (2000) Functional interaction between Ssu72 and the Rpb2 subunit of RNA polymerase II in Saccharomyces cerevisiae. Molecular and cellular biology 20(22):8343-8351. 127. Gavin AC, et al. (2002) Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415(6868):141-147.

36

128. Steinmetz EJ & Brow DA (2003) Ssu72 protein mediates both poly(A)-coupled and poly(A)-independent termination of RNA polymerase II transcription. Molecular and cellular biology 23(18):6339-6349. 129. Meinhart A, Silberzahn T, & Cramer P (2003) The mRNA transcription/processing factor Ssu72 is a potential tyrosine phosphatase. The Journal of biological chemistry 278(18):15917-15921. 130. Zhang Y, Zhang M, & Zhang Y (2011) Crystal structure of Ssu72, an essential eukaryotic phosphatase specific for the C-terminal domain of RNA polymerase II, in complex with a transition state analogue. The Biochemical journal 434(3):435- 444. 131. Schreieck A, et al. (2014) RNA polymerase II termination involves C-terminal- domain tyrosine dephosphorylation by CPF subunit Glc7. Nature structural & molecular biology 21(2):175-179. 132. Yeo M, Lin PS, Dahmus ME, & Gill GN (2003) A novel RNA polymerase II C- terminal domain phosphatase that preferentially dephosphorylates serine 5. The Journal of biological chemistry 278(28):26078-26085. 133. Yeo M, et al. (2005) Small CTD phosphatases function in silencing neuronal gene expression. Science 307(5709):596-600. 134. Zhang Y, et al. (2006) Determinants for dephosphorylation of the RNA polymerase II C-terminal domain by Scp1. Molecular cell 24(5):759-770. 135. Xue Y, et al. (2013) Direct conversion of fibroblasts to neurons by reprogramming PTB-regulated microRNA circuits. Cell 152(1-2):82-96. 136. Nesti E, Corson GM, McCleskey M, Oyer JA, & Mandel G (2014) C-terminal domain small phosphatase 1 and MAP kinase reciprocally control REST stability and neuronal differentiation. Proceedings of the National Academy of Sciences of the United States of America 111(37):E3929-3936. 137. Andres ME, et al. (1999) CoREST: a functional corepressor required for regulation of neural-specific gene expression. Proceedings of the National Academy of Sciences of the United States of America 96(17):9873-9878. 138. Hsu PL, et al. 
(2014) Rtr1 is a dual specificity phosphatase that dephosphorylates Tyr1 and Ser5 on the RNA polymerase II CTD. Journal of molecular biology 426(16):2970-2981. 139. Egloff S, Zaborowska J, Laitem C, Kiss T, & Murphy S (2012) Ser7 phosphorylation of the CTD recruits the RPAP2 Ser5 phosphatase to snRNA genes. Molecular cell 45(1):111-122. 140. Visintin R, et al. (1998) The phosphatase Cdc14 triggers mitotic exit by reversal of Cdk-dependent phosphorylation. Molecular cell 2(6):709-718. 141. Shou W, et al. (1999) Exit from mitosis is triggered by Tem1-dependent release of the protein phosphatase Cdc14 from nucleolar RENT complex. Cell 97(2):233- 244. 142. Visintin R, Hwang ES, & Amon A (1999) Cfi1 prevents premature exit from mitosis by anchoring Cdc14 phosphatase in the nucleolus. Nature 398(6730):818- 823. 37

143. Nedea E, et al. (2003) Organization and function of APT, a subcomplex of the yeast cleavage and polyadenylation factor involved in the formation of mRNA and small nucleolar RNA 3'-ends. The Journal of biological chemistry 278(35):33000-33010. 144. Nedea E, et al. (2008) The Glc7 phosphatase subunit of the cleavage and polyadenylation factor is essential for transcription termination on snoRNA genes. Molecular cell 29(5):577-587. 145. Gilbert W & Guthrie C (2004) The Glc7p nuclear phosphatase promotes mRNA export by facilitating association of Mex67p with mRNA. Molecular cell 13(2):201-212. 146. Liou YC, Zhou XZ, & Lu KP (2011) Prolyl isomerase Pin1 as a molecular switch to determine the fate of phosphoproteins. Trends in biochemical sciences 36(10):501-514. 147. Brandts JF, Halvorson HR, & Brennan M (1975) Consideration of the Possibility that the slow step in protein denaturation reactions is due to cis-trans isomerism of proline residues. Biochemistry 14(22):4953-4963. 148. Brandl CJ & Deber CM (1986) Hypothesis about the function of membrane- buried proline residues in transport proteins. Proceedings of the National Academy of Sciences of the United States of America 83(4):917-921. 149. Wedemeyer WJ, Welker E, & Scheraga HA (2002) Proline cis-trans isomerization and protein folding. Biochemistry 41(50):14637-14644. 150. Fischer G, Bang H, & Mech C (1984) [Determination of enzymatic catalysis for the cis-trans-isomerization of peptide binding in proline-containing peptides]. Biomedica biochimica acta 43(10):1101-1111. 151. Hanes SD (2014) The Ess1 prolyl isomerase: traffic cop of the RNA polymerase II transcription cycle. Biochimica et biophysica acta 1839(4):316-333. 152. Lu KP, Finn G, Lee TH, & Nicholson LK (2007) Prolyl cis-trans isomerization as a molecular timer. Nature chemical biology 3(10):619-629. 153. Singh N, et al. (2009) The Ess1 prolyl isomerase is required for transcription termination of small noncoding RNAs via the Nrd1 pathway. 
Molecular cell 36(2):255-266. 154. Mayfield JE, et al. (2015) Chemical Tools To Decipher Regulation of Phosphatases by Proline Isomerization on Eukaryotic RNA Polymerase II. ACS chemical biology. 155. Jeronimo C, Bataille AR, & Robert F (2013) The Writers, Readers, and Functions of the RNA Polymerase II C-Terminal Domain Code. Chemical reviews 113(11):8491-8522. 156. Jeronimo C, Collin P, & Robert F (2016) The RNA Polymerase II CTD: The Increasing Complexity of a Low-Complexity Protein Domain. Journal of Molecular Biology 428(12):2607-2622. 157. Yogesha SD, Mayfield JE, & Zhang Y (2014) Cross-talk of phosphorylation and prolyl isomerization of the C-terminal domain of RNA Polymerase II. Molecules 19(2):1481-1511. 38

39

Chapter 2: Chemical tools to investigate proline isomerization and dephosphorylation in the CTD code.

ABSTRACT

Proline isomerization impacts biological signaling, but is subtle and difficult to detect in proteins. We characterize this poorly understood regulatory mechanism for RNA polymerase II carboxyl terminal domain (CTD) phosphorylation state using novel, direct, and quantitative chemical tools. We determine the proline isomeric preference of three CTD phosphatases: Ssu72 as cis-proline specific, Scp1 and Fcp1 as strongly trans-preferred. Due to this inherent characteristic, these phosphatases respond differently to enzymes that catalyze the isomerization of proline, like Ess1/Pin1. We demonstrate this selective regulation of RNA polymerase II phosphorylation state exists within human cells, consistent with in vitro assays. These results support a model in which, instead of a global enhancement of downstream enzymatic activities, proline isomerases selectively boost the activity of a subset of CTD regulatory factors specific for cis-proline. This leads to diversified phosphorylation states of CTD in vitro and in cells. We provide the chemical tools to investigate proline isomerization and its ability to selectively enhance signaling in transcription. Furthermore, we provide direct evidence for cross talk between post-translational modifications within the CTD code.

2.1 INTRODUCTION

Proline residues, Pro3 and Pro6, flank the major phosphorylation sites of Ser2 and Ser5. When Ser2 and Ser5 are phosphorylated, the isomerization states of Pro3 and Pro6 are equilibrated by the phospho-specific prolyl isomerases Ess1, in yeast, and Pin1, in humans (1, 2). Ess1/Pin1 plays an important role in transcription regulation (3-6). Ess1 mutations are synthetic lethal with truncated CTD alleles, linking it to CTD-mediated transcription (5). In human cells, Pin1 isomerase activity can impact RNA polymerase II phosphorylation state and alter RNA polymerase II localization (6). Furthermore, defective transcription termination phenotypes are associated with compromised prolyl isomerase activity in yeast (3).

While these data suggest that Ess1/Pin1 impacts transcription by altering the proline isomerization state of CTD and in turn its phosphorylation, this is difficult to prove mechanistically. One hypothesis is that proline isomerization state impacts the relative activities of modification enzymes. CTD phosphatases are promising downstream effectors of proline isomerization state since they recognize the same phosphorylated serine-proline motif as Ess1/Pin1. Three CTD phosphatases have been well characterized: Ssu72, Scp1, and Fcp1 (7-10). Both Ssu72 and Scp1 dephosphorylate Ser5 of CTD, but lead to different transcriptional outcomes. Ssu72 plays a pivotal role in general transcription elongation, 3’-end processing, and termination (8, 11, 12). Scp1 is a component of the RE1-silencing transcription factor (REST) complex and is found only in higher eukaryotes. The REST complex prevents the transcription of a subset of neuronal genes (9). Fcp1, the only Ser2 phosphatase characterized to date, is essential for the recycling of RNA polymerase II (13, 14).

Little is known about the use of proline isomerization as a regulatory mechanism during transcription (2). Unique amongst the natural amino acids, the peptide bond of proline can stably assume a cis or trans isomer conformation and isomer interconversion occurs naturally. The trans form is preferred and occurs 70–90% of the time (15). Prolines located in fully folded proteins can assume the cis or trans form exclusively (16, 17), but context-specific equilibration and the activity of prolyl isomerases, like Ess1/Pin1, can greatly increase the conversion rate (1). Proline isomerization is not evident in sequence or molecular weight analysis and conversion between the two isomers is challenging to monitor in cells.
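To give a rough sense of what a 70–90% trans population means energetically, the sketch below converts a trans fraction into an equilibrium constant and a standard free-energy difference using ΔG° = −RT ln K. The two-state treatment, the temperature of 298.15 K, and the function name are illustrative assumptions and are not taken from the measurements in this chapter.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def trans_cis_equilibrium(trans_fraction, temperature_k=298.15):
    """Two-state cis <-> trans estimate (an assumption, not a measured model).

    Returns (K, dG) where K = [trans]/[cis] and dG = -RT ln K in kcal/mol.
    A negative dG means the trans isomer is favored.
    """
    if not 0.0 < trans_fraction < 1.0:
        raise ValueError("trans_fraction must lie strictly between 0 and 1")
    k_eq = trans_fraction / (1.0 - trans_fraction)
    delta_g = -R_KCAL * temperature_k * math.log(k_eq)
    return k_eq, delta_g


if __name__ == "__main__":
    # Bounds of the 70-90% trans range quoted in the text.
    for fraction in (0.70, 0.90):
        k_eq, delta_g = trans_cis_equilibrium(fraction)
        print(f"trans {fraction:.0%}: K = {k_eq:.2f}, dG = {delta_g:.2f} kcal/mol")
```

Under these assumptions the free-energy gap between the two isomers is on the order of 1 kcal/mol or less, which is consistent with both isomers being populated in an unstructured peptide.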

To surmount these limitations we developed “locked-proline” analogues that mimic proline, but cannot undergo isomer conversion (18-22). By incorporating locked-proline analogues in place of proline residues we can differentiate their subtle regulatory effect on protein modification (18). In this study, we designed peptidomimetic compounds to characterize the prolyl isomeric requirement for substrates of CTD phosphatases and demonstrate that Pin1 up-regulates only cis-specific phosphatases. Using yeast GST-CTD as substrate, we show that Pin1 isomerase activity promotes dephosphorylation by Ssu72 in the context of full-length CTD. We translate our in vitro observations to a cellular system by investigating the accumulation of CTD phosphorylation marks with and without Pin1 activity in HeLa cells. Based on these results we propose a model of divergent Pin1 regulation on CTD phosphatases and identify Pin1 as a kinetic switch that helps determine effective and accurate transcription.

2.2 RESULTS AND DISCUSSION

2.2.1 Synthetic CTD peptidomimetic analogues incorporating cis and trans-locked isosteres.

In vitro proteomic analysis revealed that more than 100 yeast proteins are found associated directly or indirectly with CTD (23), and the number is suspected to be higher in mammals. Although it is suggested that many of these proteins are recruited to CTD by binding phospho-Ser-Pro motifs, the configuration of proline is established for very few (22, 24-27). Since proline isomerization results in subtle changes in conformation, not in sequence or molecular weight, detection of isomer-specific binding is difficult.

We use a chemical biology approach to determine if a protein is selective towards a given proline isomeric state in CTD peptides. To prevent cis/trans auto-conversion, we synthesized peptides incorporating locked isostere analogues to mimic proline residues in only the cis or trans configuration (Figure 2-1) (18, 19, 21, 22). In these peptides, the amide-nitrogen atom of the proline is substituted with a carbon atom and the prolyl peptide bond is replaced with a carbon-carbon double bond to prevent thermal isomerization. These locked isosteres are good mimics for proline residues in different isomeric states and Pin1 recognizes both forms (22). Furthermore, recent studies using peptides with alkene isosteres in place of SP motifs show that the trans conformation is phosphorylated by Cdk1-cyclin B kinase (18).

[Figure 2-1 image: chemical structures. Panel A: trans and trans-locked. Panel B: cis and cis-locked.]

Figure 2-1. Locked proline isosteres.

The crystal structures of Scp1 and Ssu72 in complex with phosphorylated-Ser5 CTD peptides have been solved (10, 25, 26, 28). Although these phosphatases both recognize phosphorylated Ser5 in CTD, the Pro6 isomerization states captured at their active sites differ. Scp1 binds to CTD peptides when Pro6 is in the trans conformation (10), whereas Pro6 is in the cis conformation when bound to Ssu72 (25, 26, 28). Since X-ray crystallography only captures the species that is favorable for crystallization, we want to establish selectivity for proline isomers by CTD phosphatases in solution using peptides incorporating cis- or trans-locked proline analogues. Two native peptides and four different CTD peptidomimetic compounds were synthesized as 11-mer or 12-mer repeats with Pro6 or Pro3 following phosphorylated Ser5 or Ser2 replaced by cis- or trans-locked proline analogues (Table 2-1).

Peptide                       Sequence
native pSer5 (11-mer)         Ac-S-P-Y-S-P-T-S(PO3H2)-P-S-Y-S-NH2
trans-locked pSer5 (11-mer)   Ac-S-P-Y-S-P-T-S(PO3H2)Ψ[(E)C=CH)]-P-S-Y-S-NH2
cis-locked pSer5 (11-mer)     Ac-S-P-Y-S-P-T-S(PO3H2)Ψ[(Z)C=CH)]-P-S-Y-S-NH2
native pSer2 (12-mer)         Ac-S-P-S-Y-S(PO3H2)-P-T-S-P-S-Y-S-NH2
trans-locked pSer2 (12-mer)   Ac-S-P-S-Y-S(PO3H2)Ψ[(E)C=CH)]-P-T-S-P-S-Y-S-NH2
cis-locked pSer2 (12-mer)     Ac-S-P-S-Y-S(PO3H2)Ψ[(Z)C=CH)]-P-T-S-P-S-Y-S-NH2

Table 2-1. Sequence of native and peptidomimetic compounds.

2.2.2 Ssu72 is a cis-specific CTD Ser5 phosphatase.

Ssu72 is a conserved eukaryotic Ser5 phosphatase that is important to transcription termination and mRNA co-processing (8, 11, 12, 29, 30). The phosphatase activity of Drosophila Ssu72-Symplekin was tested against synthetic CTD peptides with Pro6 replaced by a cis- or trans-locked moiety. Ssu72-Symplekin shows robust phosphatase activity against the cis-locked peptide (kcat/Km of 5.24 ± 0.08 mM⁻¹ s⁻¹) (Figure 2-2A), but no phosphatase activity was detected against the trans-locked peptide (Figure 2-2A). As a control experiment, we used a synthetic peptide with the same sequence incorporating native pSer-Pro. The activity of Ssu72 against the natural peptide is substantially lower than against the cis-locked peptidomimetic (Figure 2-2A). This is because the effective cis-substrate for Ssu72 constitutes only a small portion of the natural peptide pool, estimated at 10–30% (15). These quantitative measurements establish Ssu72 as a cis-specific phosphatase for Ser5-Pro6 motifs of CTD.
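The argument that the native peptide is a poorer substrate because only its cis fraction is productive can be put in rough numbers. The sketch below assumes, purely for illustration, that the trans isomer is neither a substrate nor an inhibitor, that cis/trans interconversion is negligible during the initial-rate measurement, and that the cis-locked isostere is turned over as efficiently as native cis-proline. Under those assumptions the apparent kcat/Km for the native peptide scales with the cis fraction. The function name is hypothetical, and the measured native value is not quoted in the text, so no comparison with experiment is made here.

```python
CIS_LOCKED_KCAT_KM = 5.24  # mM^-1 s^-1, value reported for the cis-locked peptide


def apparent_efficiency(cis_locked_kcat_km, cis_fraction):
    """Apparent kcat/Km for a mixed cis/trans pool when only cis is a substrate.

    Illustrative model only: apparent = (kcat/Km of pure cis) * (cis fraction).
    """
    if not 0.0 <= cis_fraction <= 1.0:
        raise ValueError("cis_fraction must lie between 0 and 1")
    return cis_locked_kcat_km * cis_fraction


if __name__ == "__main__":
    # Bounds of the 10-30% cis population quoted in the text.
    for fraction in (0.10, 0.30):
        value = apparent_efficiency(CIS_LOCKED_KCAT_KM, fraction)
        print(f"cis fraction {fraction:.0%}: apparent kcat/Km ~ {value:.2f} mM^-1 s^-1")
```

The point of the sketch is only the direction and approximate size of the effect: a several-fold drop relative to the cis-locked peptide is what a cis-specific enzyme would be expected to show against an equilibrium mixture.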

The crystal structure of Ssu72-Symplekin bound to the cis-locked peptide provides a structural explanation for proline isomer specificity. The complex structure was solved to 2.95 Å using a catalytically inactive variant (C13D/D144N) to capture the mode of isostere peptide binding (Table 2-2). Strong electron density was observed at the active site of Ssu72 with 6 of 11 substrate residues observed (Figure 2-2B). Importantly, the cis-locked proline moiety is bound in a small hydrophobic pocket formed by Met 85, Leu 45, Pro 46, and the side chain of Met 17. This pocket is approximately 5 Å deep, allowing it to snugly accommodate the proline side chain (Figure 2-2C). The cis proline isostere binds tightly to this deep and narrow pocket, whereas the alternative trans proline configuration clashes sterically. Alignment with the native peptide complex structure (28) reveals conserved geometry and nearly identical substrate positioning (Figure 2-2D), despite the cis-locked peptide’s carbon-carbon double bond being shorter than a peptide bond (1.20 Å versus 1.33 Å). Since the peptide bond between Ser5 and Pro6 does not form hydrogen bonds with Ssu72 active site residues, replacement of the amide bond by an alkene minimally disturbs hydrogen bonding and allows the cis-locked peptide to act as an optimal Ssu72 substrate. These observations, coupled with our kinetic analysis, suggest that Ssu72 has a strict selectivity for cis-Pro6 of CTD.

Data Collection and Refinement Statistics   Ssu72 + Cis-Locked Peptide        Scp1 + Trans-Locked Peptide   Scp1 + Cis-Locked Peptide
PDB Code                                    4YGX                              4YGY                          4YH1
Data Collection
  Wavelength (Å)                            1.0332                            1.03334                       0.97648
  Space Group                               P4                                C2                            C2
  Cell Dimensions:
    a, b, c (Å)                             127.9, 127.9, 105.9               125.3, 78.3, 63.0             125.1, 78.8, 62.9
    α, β, γ (deg.)                          90.0, 90.0, 90.0                  90.0, 112.6, 90.0             90.0, 112.54, 90.0
  Resolution (Å)                            127.88 – 2.95 (3.00 – 2.95)ᵃ      64.86 – 2.36 (2.40 – 2.36)    65.11 – 2.20 (2.24 – 2.20)
  # of Unique Reflections                   35810                             22534                         27838
  I/σ(I)                                    20.5 (1.4)                        29.3 (3.3)                    11.3 (1.4)
  Completeness (%)                          99.2 (96.2)                       96.9 (85.2)                   97.0 (83.0)
  Redundancy                                6.9 (4.5)                         3.7 (3.1)                     3.7 (2.8)
  Rsym (%)                                  10.4 (89.9)                       5.0 (31.0)                    11 (49.5)
Refinement
  Resolution (Å)                            50.00 – 2.95                      50.00 – 2.36                  50.00 – 2.20
  # of Reflections (Test Set)               33965 (1844)                      21427 (1105)                  26369 (1469)
  Rwork/Rfree (%)ᵇ                          20.6/25.4                         18.3/24.6                     19.3/24.6
  # of Atoms
    Protein                                 8113                              2908                          2908
    Mg²⁺                                    NA                                2                             2
    Ligand                                  42                                72                            72
    Water                                   10                                62                            141
  B-factors (Å²)
    Protein                                 88.7                              46.8                          32.7
    Mg²⁺                                    NA                                53.5                          53.6
    Ligand                                  109.6                             69.3                          67.8
    Water                                   62.6                              50.3                          38.0
  RMS Deviations
    Bond Lengths (Å)                        0.013                             0.016                         0.017
    Bond Angles (deg.)                      1.74                              1.86                          2
  Ramachandran Plot (%)ᶜ
    Favored                                 96.9                              94.7                          95.5
    Allowed                                 3.1                               5.3                           4.5
    Outlier                                 0                                 0                             0

ᵃ Highest resolution shell is shown in parentheses.
ᵇ Rfree is calculated with 5% of the data randomly omitted from refinement.
ᶜ Ramachandran statistics generated in MolProbity.

Table 2-2. X-ray crystallography data collection and refinement statistics.

Figure 2-2. Drosophila melanogaster Ssu72+Symplekin analysis using locked proline peptides.

(A) Kinetic analysis of Ssu72+Symplekin against native, cis-locked, and trans-locked Pro6-containing phospho-Ser5 peptides. Ssu72+Symplekin shows considerable activity against the cis-locked (goldenrod) compound with significantly lower activity against the native peptide (tomato). Activity against the trans-locked peptide was not detected above background (blue). Data from three experimental replicates, error bars indicate standard deviation (n=3). (B) 2Fo-Fc map about cis-locked peptide (goldenrod) contoured to 1σ. Density accounts for residues analogous to Ser2 through Ser7 of a consensus CTD heptad repeat. (C) Surface depiction of Ssu72+Symplekin. The image has been rotated with respect to panels B and D about a vertical axis along the Ser5 position by ~90° counterclockwise and tilted towards the viewer by an additional ~90°. The locked-proline isostere fits into a restrictive hydrophobic pocket. (D) Alignment of complex crystal structures of Ssu72+Symplekin containing cis-locked (goldenrod, PDB ID: 4ygx) and native (tomato, PDB ID: 4imj) peptides. Residues numbered to indicate position in consensus CTD heptad repeat.

2.2.3 Scps strongly favor trans-proline as substrate.

As a component of the master silencing complex REST, Scp phosphatases (Scp1–3) are CTD Ser5 phosphatases whose activity is implicated in repression of neuronal genes (9). Since the three Scp isoforms have identical catalytic activity and all catalytic residues are conserved, we used the best-characterized member, Scp1, to study their prolyl isomeric selectivity (10). Both cis- and trans-locked CTD peptidomimetics were used as substrates in a phosphatase assay for human Scp1. Unlike Ssu72, Scp1 shows robust phosphatase activity against both locked peptides (Figure 2-3A). However, Scp1 shows a significant preference for the trans-locked and native CTD peptides (kcat/Km = 323 ± 22 mM-1s-1 and 386 ± 21 mM-1s-1, respectively). Scp1 can also recognize the cis-locked peptide as substrate, although with an 8.5-fold reduction in activity relative to the trans-locked peptide (kcat/Km = 38.0 ± 1.3 mM-1s-1) (Figure 2-3A). This explains why only the trans form of proline is found in X-ray crystal structures of Scp1 bound to native CTD peptide. Because X-ray crystallography reports a structure averaged over all molecules in the crystal, the cis-proline signal would contribute very little to the final electron density due to its low population and weaker affinity. For the first time, our chemical tools allow discrete and quantitative partitioning of proline isomer substrate preference.
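The fold differences quoted for Scp1 follow directly from the reported specificity constants. The short check below is a minimal sketch: the variable and function names are illustrative, and the uncertainty propagation assumes the reported errors are independent standard deviations, which the text does not state.

```python
import math

# kcat/Km values (mM^-1 s^-1) and their reported uncertainties, taken from the text.
specificity = {
    "native": (386.0, 21.0),
    "trans_locked": (323.0, 22.0),
    "cis_locked": (38.0, 1.3),
}


def fold_difference(numerator, denominator):
    """Return (ratio, uncertainty) for two (value, error) pairs.

    The uncertainty uses first-order propagation for a quotient and assumes
    the two errors are independent (an assumption, not stated in the source).
    """
    (a, da), (b, db) = numerator, denominator
    ratio = a / b
    rel_err = math.sqrt((da / a) ** 2 + (db / b) ** 2)
    return ratio, ratio * rel_err


trans_vs_cis = fold_difference(specificity["trans_locked"], specificity["cis_locked"])
native_vs_cis = fold_difference(specificity["native"], specificity["cis_locked"])

print(f"trans-locked / cis-locked: {trans_vs_cis[0]:.1f} +/- {trans_vs_cis[1]:.1f}")
print(f"native / cis-locked:       {native_vs_cis[0]:.1f} +/- {native_vs_cis[1]:.1f}")
```

The trans-locked to cis-locked ratio gives the 8.5-fold figure, and the native to cis-locked ratio is close to 10-fold, in line with the "nearly 10-fold" wording of the Figure 2-3 legend.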

We obtained crystal structures of Scp1 bound to each locked isostere compound using an inactive Scp1 variant (D96N) in which the nucleophilic Asp was mutated to Asn to prevent product turnover. The Scp1+trans-locked peptide structure was solved to 2.36 Å and the Scp1+cis-locked peptide structure was solved to 2.20 Å (Table 2-2). In both structures, five of the eleven synthetic peptide residues can be visualized at the active site (Figure 2-3B). The two complexes adopt nearly identical conformations for most residues; the only significant difference is the configuration of the proline analogues (Figure 2-3C). The locked isosteres, which are flipped 180° in the two structures, sit at the edge of the active site binding pocket and provide a structural explanation for Scp1’s less stringent proline isomer requirement. Unlike Ssu72, in which Pro6 extends into a deep and narrow pocket, Scp1 binds a solvent-exposed Pro6 at the rim of the active site pocket (Figure 2-3D & 2-3E). The openness of the Scp1 active site for Pro6 binding explains the more promiscuous nature of Scp1 prolyl selectivity, since both isomers can be accommodated. These complex structures are consistent with our kinetic results showing that both the cis and trans configurations at Pro6 serve as substrates for Scp1. Our chemical tools provide insight into the kinetic impact of proline isomerization state and also help explain these functional data in terms of protein structure. By visualizing proteins of interest bound to each proline isoform, we can better understand the substrate promiscuity inherent to some protein active sites and develop a physical explanation for this variability.

Figure 2-3. Human Scp1 analysis using locked proline peptides.

(A) Kinetic analysis of Scp1 against native, cis-locked, and trans-locked Pro6 containing phospho-Ser5 peptides. Scp1 shows comparable and high activity against native (tomato) and trans-locked (blue) peptides. Activity against cis-locked peptide (goldenrod) is observed but is nearly 10-fold lower than that observed for native and trans-locked substrates. Data are from three experimental replicates; error bars indicate standard deviation (n=3). (B) 2Fo-Fc map about cis-locked peptide (goldenrod, left) and trans-locked peptide (blue, right) contoured to 1σ. Density accounts for residues analogous to Ser2 through Pro6 of a consensus CTD heptad repeat. (C) Alignment of complex crystal structures of Scp1 containing cis-locked (goldenrod, PDB ID: 4yh1) and trans-locked (blue, PDB ID: 4ygy) peptides. The structures align well except at the Pro6 location, where they are flipped 180° relative to one another. Residues numbered to indicate position in consensus CTD heptad repeat. (D) Surface depiction of Scp1 and cis-locked peptide (goldenrod). (E) Surface depiction of Scp1 and trans-locked peptide (blue).

2.2.4 Fcp1 is a trans-preferred phosphatase.

Fcp1 is the only Ser2 phosphatase reported and its activity is essential for the recycling of RNA polymerase II (14, 31). The effect of CTD proline isomerization on Fcp1 has been debated since Fcp1 phosphatase activity was first characterized (6, 32). Unfortunately, even with high concentrations of CTD peptides included in crystallization conditions, no peptide was resolved in the active site of Fcp1 (7). To identify the prolyl selectivity of Fcp1, we designed Fcp1 substrates with our locked-proline isosteres at the Ser2-Pro3 position. We synthesized 12-mer CTD peptides with Pro3 replaced by cis- or trans-locked isosteres. These two peptidomimetic compounds were used as substrates for Fcp1 in phosphatase assays. Fcp1 shows activity against the trans-locked isostere compound at levels comparable to the activity observed against the native peptide (Figure 2-4A). Lower activity was observed against the cis-locked peptide (Figure 2-4A). Therefore, our results indicate that Fcp1 prefers trans-proline next to the serine subject to dephosphorylation, but can also accept cis-proline-containing substrate. This proline isomeric preference matches that of Scp1, though their modes of CTD binding are expected to differ (7).


Figure 2-4. Proline isomer specificity of Fcp1.

(A) Kinetic analysis of Fcp1 against native, cis-locked, and trans-locked Pro3 containing phospho-Ser2 peptides. Fcp1 shows comparable and higher activity against native (tomato) and trans-locked (blue) peptides. Activity against cis-locked peptide (goldenrod) is observed but lower than that observed for native and trans-locked substrates. Native and trans-locked data are from three experimental replicates; error bars indicate standard deviation (n=3). Cis-locked data are from one experimental replicate (n=1). (B) Fcp1/Pin1 coupled assay. All trials show comparable activity with or without Pin1 supplementation. Error bars indicate standard deviation (n=3).

2.2.5 Prolyl isomerase Pin1 does not alter the apparent phosphatase activity of Fcp1.

Since CTD phosphatases have different preferences for proline configuration within their phospho-Ser-Pro recognition motifs, enzymes that catalyze proline isomerization could affect downstream phosphatase function. Indeed, it has been shown that Ess1/Pin1 isomerase activity can alter the transcription profile in yeast (3, 4, 33) and humans (6), but downstream effectors for such regulation of transcription are not well established (1). Pin1 shows strong affinity for the two SP islands in the CTD consensus sequence, with a Kd of 30 µM for phospho-Ser5-Pro6 and a Kd of 61 µM for phospho-Ser2-Pro3 in peptides of about a single repeat in length (34). Since Pin1 recognizes the same motif as CTD phosphatases, we hypothesize that the enzymatic activity of Pin1 can alter downstream phosphatase activity, leading to changes in transcription pattern. This hypothesis is consistent with our previous observation that Ssu72 and Scp1 respond very differently when Pin1 is present: Ssu72 showed a 3- to 4-fold increase in phosphatase activity upon Pin1 supplementation, while Scp1 activity was unaffected (22). This enhancement depends on both Pin1 isomerase activity and CTD binding, since mutation to prevent recognition of CTD by Pin1 eliminates the effect (22).

To see how Pin1 activity affects dephosphorylation of Ser2 in CTD by Fcp1, we measured the phosphatase activity of Fcp1 in the presence and absence of Pin1 (Figure 2-4B). Previous data suggest contradictory roles of Pin1 in Fcp1 phosphatase activity, showing both stimulatory and inhibitory effects (6, 32, 35). We believe these contradictory reports stem from several factors: (1) the identity and concentration of the de facto substrate were not determined, (2) non-physiological kinases were utilized to generate CTD substrate as a mixture of phosphorylated species, (3) Pin1 concentrations utilized in the in vitro assays were sometimes quite high and may have competed with Fcp1 for binding CTD substrate, and finally (4) overexpression of Pin1 in vivo can affect many other human Pin1 substrates. To overcome these limitations, we monitored the phosphatase activity of Fcp1 with or without Pin1 against saturating amounts of synthetic CTD peptide phosphorylated at Ser2. Additionally, by performing a control reaction with the truncated and mutated PPIase K77/82Q domain of Pin1, which is incapable of binding and isomerizing the CTD, we can determine whether the observed effects on phosphatase activity are the result of Pin1’s prolyl isomerase activity against CTD. Our quantitative assay reveals that Fcp1 phosphatase activity against the phosphorylated 12-mer CTD peptide is unaffected by Pin1 isomerase activity (Figure 2-4B). This observation is consistent with our determination that Fcp1 is a trans-preferred Ser2 phosphatase, reminiscent of the Scp1/Pin1 profile in which Pin1 isomerase activity does not affect phosphate release by Scp1 (22).

The observation that phosphatases like Fcp1 and Scp1, which show a strong preference for trans-proline, are not affected by Pin1 activity is surprising. It has been generally assumed that since Pin1 can interconvert the cis and trans isomers whenever either population is below equilibrium, Pin1-mediated proline isomerization could promote the activity of any downstream protein by replenishing substrate pools. However, the rate of thermal cis/trans conversion is highly dependent on the local protein configuration and can range from minutes to hundreds of hours (16, 17). The isomerase activity of Pin1/Ess1 would only boost downstream enzyme activity when this thermal conversion is rate limiting and too slow to sustain the supply of substrate. Since trans-proline is the major natural species, we hypothesize that Ess1/Pin1 isomerase activity will not show any obvious effect on trans-preferring enzymes until most of the substrate is depleted. Therefore, instead of globally up-regulating any downstream phosphatase that recognizes SP motifs, Ess1/Pin1 only has a significant regulatory effect on proteins with a strong preference for the minor cis-proline species. CTD phosphatases with specificity or preference for trans-proline can bypass Pin1 regulation, with no alteration of the signaling pathway, even though Pin1 performs the isomerization reaction.
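The substrate-pool argument above can be illustrated with a toy kinetic model. The sketch below is only an illustration under stated assumptions: every rate constant, the 10% starting cis fraction, the 100-fold isomerase acceleration, and the first-order treatment of the phosphatase are hypothetical choices made for the example, not values measured or fitted in this work.

```python
def simulate(k_tc, k_ct, k_deph, target, t_end=300.0, dt=0.01):
    """Integrate trans <-> cis interconversion plus dephosphorylation of one isomer.

    k_tc, k_ct: trans->cis and cis->trans rate constants (1/s).
    k_deph: first-order dephosphorylation rate constant (1/s) acting only on
            the isomer named by `target` ("cis" or "trans").
    Returns the fraction of substrate dephosphorylated at t_end (explicit Euler).
    """
    trans, cis, product = 0.9, 0.1, 0.0  # assumed starting pool: 10% cis
    for _ in range(int(round(t_end / dt))):
        to_cis = k_tc * trans * dt
        to_trans = k_ct * cis * dt
        removed = k_deph * (cis if target == "cis" else trans) * dt
        trans += to_trans - to_cis
        cis += to_cis - to_trans
        if target == "cis":
            cis -= removed
        else:
            trans -= removed
        product += removed
    return product


# Hypothetical constants: slow thermal exchange with a 10% cis equilibrium fraction.
K_TC, K_CT = 0.001, 0.009
K_DEPH = 0.5
BOOST = 100.0  # hypothetical acceleration of both exchange rates by an isomerase

results = {}
for target in ("cis", "trans"):
    without_iso = simulate(K_TC, K_CT, K_DEPH, target)
    with_iso = simulate(K_TC * BOOST, K_CT * BOOST, K_DEPH, target)
    results[target] = (without_iso, with_iso)
    print(f"{target}-specific: {without_iso:.3f} without, {with_iso:.3f} with isomerase")
```

With these made-up constants, the cis-specific enzyme exhausts its small pool quickly and then waits on slow thermal exchange, so accelerating exchange gives a clear gain, whereas the trans-specific enzyme has already consumed most of its substrate before exchange matters. The numbers are not fit to any data in this chapter.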

2.2.6 In vitro reconstruction of Pin1-mediated Ssu72 enhancement in full-length CTD.

Quantitative measurement of Ssu72 activity against phosphorylated CTD peptides established that Ssu72 phosphatase activity is enhanced 3- to 4-fold upon Pin1 supplementation (22). Since the cis/trans proline conversion rate is highly dependent on context, we investigated whether proline isomerization can be rate limiting in the context of full-length CTD and whether the enhancement of Ssu72 by Pin1 is still evident. To test this, we reconstructed a minimalist system in vitro using GST-CTD phosphorylated by the physiologically relevant kinase TFIIH to enrich the substrate for phospho-Ser5 marks. Ssu72 then dephosphorylated this substrate with or without Pin1. The level of Ser5 phosphorylation was determined by western blot using phospho-Ser5-specific antibodies (Figure 2-5A). Reactions containing Pin1 displayed a higher degree of dephosphorylation relative to the zero time point than the reactions containing Ssu72 alone at all time points (Figure 2-5B). These results are consistent with our kinetic results using short CTD peptides (22). These data imply that Pin1 increases apparent Ssu72 activity against both short peptides and full-length CTD.

Figure 2-5. In vitro reconstruction of Pin1-mediated Ssu72 enhancement.

Panel A lanes: 0, 5, 10, 15, 20, and 25 min for Ssu72 alone and for Ssu72+Pin1 (rows: pSer5, Ctrl).

(A) Western blot against TFIIH phosphorylated GST-CTD dephosphorylated by Ssu72 with or without Pin1. Phospho-Ser5 (pSer5) was monitored for reactions containing Ssu72 alone (top) or Ssu72+Pin1 (bottom) over the indicated time course. Mouse IgG heavy chain (Ctrl) was introduced during reaction quenching to provide a loading control. (B) Quantification of western blot. Phospho-Ser5 bands were first normalized to the loading control and then expressed relative to the respective zero time point for each condition. Blot and quantification represent one of three independent experimental replicates, all displaying increased dephosphorylation upon Pin1 supplementation.

2.2.7 Prolyl isomerase activity regulates cis-specific CTD phosphatase in the cell.

Our in vitro observations suggest that CTD phosphorylation state is differentially regulated by phosphatases based on their proline isomeric selectivity. To determine whether this extends to cells, we monitored the effect of proline isomerization on the phosphorylation states of RNA polymerase II in HeLa cells. In this experiment, Pin1 expression was knocked down by more than 90% (36), and the phosphorylation levels of Ser2 and Ser5 were monitored relative to vector control cells. During general transcription, Fcp1 is the main phosphatase for Ser2 dephosphorylation (13), whereas Ssu72 is the workhorse for Ser5 dephosphorylation (27). Western blots for RNA polymerase II using a phospho-Ser5-specific antibody consistently showed a 30–60% accumulation of phospho-Ser5 CTD in the Pin1 knockdown cells compared to an empty vector control (Figure 2-6A & 2-6B). In the absence of Pin1 activity, CTD repeats containing phospho-Ser5-cis-Pro6 become depleted, and Ssu72 must wait for the slow thermal trans-to-cis conversion to dephosphorylate the remaining substrate. This apparent reduction of Ssu72 activity due to the loss of Pin1 causes an accumulation of phospho-Ser5 CTD in the cell. Importantly, the phosphorylation level of Ser2 is not impacted by Pin1 knockdown (Figure 2-6A & 2-6B). This is consistent with our in vitro data showing that Ssu72 activity is increased by Pin1 (22), whereas Fcp1 is unaffected.

Figure 2-6. Impact of Pin1 knockdown on CTD phosphorylation states in HeLa cell lines.

(A) Western blot analysis of phosphorylated Ser2 (top) and phosphorylated Ser5 (bottom) in HeLa cell lines transformed with either empty vector (left) or shPin1-containing vector (right). Blots were performed using three biological replicates of both the empty vector control and shPin1-containing vector cells. Control and knockdown protein sample pairs were prepared in parallel. The three paired sample sets were analyzed on three separate blots. (B) Quantification of western blot; phosphorylated Ser5 levels increase 30–60% upon Pin1 knockdown. Quantification was performed by first normalizing control and shPin1 samples to the endogenous loading control (β-actin). The shPin1 samples were then normalized to the paired vector control samples. Significance was assessed using Welch’s t-test (n=3).
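For readers unfamiliar with Welch’s t-test, which the Figure 2-6 legend names as the significance test, the sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. The sample values are hypothetical placeholders chosen for illustration, not the measured band intensities, and the function name is an assumption of this example.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors, using sample variances (n - 1 denominator).
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical normalized pSer5 signals (n=3 each), for illustration only.
knockdown = [1.30, 1.45, 1.60]
control = [1.00, 1.05, 0.95]

t_stat, dof = welch_t(knockdown, control)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

Converting the statistic to a p-value requires the Student t distribution, which the standard library does not provide; `scipy.stats.ttest_ind` with `equal_var=False` performs the full test.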

2.3 CONCLUSION AND PERSPECTIVE

The isomerase activity of Ess1/Pin1 has been found to affect the outcome of eukaryotic transcription, yet its biological mechanism is not understood (1). In this study, we utilized locked-proline analogues to address how subtle conformational variation in proline isomerization alters signal transduction and results in differentiated regulation. Specifically, we determined the prolyl isomeric preference of three CTD phosphatases: Ssu72, Scp1, and Fcp1. Based on the different abundances of cis- and trans-proline, we reason that the impact of prolyl isomerase activity on CTD phosphatases varies. For CTD in the absence of prolyl isomerases, cis-proline-containing motifs become depleted. This reduced availability of substrate acts as a “kinetic trap” that hinders the dephosphorylation of RNA polymerase II by cis-specific CTD phosphatases. The resultant accumulation of phospho-Ser5 would lead to global transcription termination defects such as read-through (12). However, Pin1-mediated cis-trans conversion overcomes this kinetic trap and provides sufficient cis-proline for Ssu72 consumption. Using GST-CTD, we show that the enhancement of Ssu72 activity by Pin1 extends to full-length substrate. This is consistent with the change in CTD phosphorylation states observed in cells upon Pin1 knockdown. Our kinetic, structural, and cellular results support a model in which Ess1/Pin1 alters the conformation of CTD and provides a kinetic switch that leads to differentiated phosphorylation states and results in different transcriptional outcomes (Figure 2-7).

Figure 2-7. Model of differentiated regulation mediated by proline isomerization of CTD in RNA polymerase II dephosphorylation.

Trans-preferred or specific phosphatases, Scp1 and Fcp1, have substrate consistently available due to the thermodynamic preference for trans-proline in the CTD. Therefore, these enzymes bypass regulation by prolyl isomerases like Pin1. However, cis-specific phosphatases like Ssu72 rely on a minor substrate pool containing the cis-proline isomer and quickly deplete their available substrate. Prolyl isomerases, like Pin1, can restore the equilibrium between cis and trans isomers and replenish substrate pools. This regulatory switch provides for proper RNA polymerase II CTD phosphorylation levels and normal transcription termination. Upon Pin1 disruption or knockdown global transcription defects, like read-through, may occur.

Conversely, the CTD phosphatases Scp1 and Fcp1 strongly favor the trans conformation of the proline residue next to the phospho-serine. Our results challenge a model in which Ess1/Pin1 globally enhances the activity of proteins with isomeric preferences. Instead, we show that Pin1 has no effect on Scp1 and Fcp1 activity in vitro. We reason that these enzymes utilize the major trans-proline species and therefore have abundant substrate pools available; cis-to-trans conversion is unlikely to become rate limiting until almost all substrate is depleted, which is unlikely in vivo. Therefore, Pin1 isomerase activity has little effect on trans-proline-specific or trans-preferring phosphatases. This mechanism is confirmed by western blots of cells lacking Pin1, in which Ser2 phosphorylation, regulated by Fcp1, is not affected but Ser5 phosphorylation, regulated by Ssu72, accumulates. Therefore, even though Ess1/Pin1 has been reported to have widespread effects on transcription, the direct regulation is restricted to proteins with cis-specific proline selectivity (Figure 2-7).

Proteomic studies have reported that at least 100 proteins associate with CTD in vitro, mostly when it is phosphorylated (23). The proline residues situated next to the phosphorylated serine are also likely recognized by CTD binding proteins. With the configuration difference introduced by the isomeric states of proline, the interacting domains of CTD binding proteins could accommodate one isomer better due to steric restrictions. Proline isomerization state can alter the suitability of CTD as substrate for binding partners or downstream CTD modification enzymes. As shown here, such selectivity plays a pivotal role in regulatory pathways controlled by Pin1 and leads to differentiated outcomes. Furthermore, the isomeric states of proline could play a crucial role in the recruitment of protein factors and the assembly of transcription complexes. Nrd1, for example, is a protein factor for small non-coding RNA termination that forms the NNS complex and is the only other known cis-specific CTD binding protein (24). Therefore, prolyl isomeric selectivity might direct the assembly of different complexes to process nascent RNA polymerase II products. Since prolyl isomeric selectivity is hard to detect, the locked isostere compounds described here are useful as chemical tools to directly establish the preference of proteins towards proline isomers. As shown here for CTD biology, locked isosteres can be used to elucidate regulatory pathways involving proline-containing motifs such as MAP kinase signaling, cyclin-dependent kinase signaling, and the GSK3β pathway (18, 37).

2.4 MATERIALS AND METHODS

2.4.1 Antibodies and reagents

Primary antibodies were obtained from the companies indicated: Anti-RNA polymerase II B1 (phospho-CTD Ser-2), clone 3E10 (Millipore, Cat#: 04-1571, Lot#: NG1857513, Dilution: 1/1,000 – 1/5,000); Anti-RNA polymerase II B1 (phospho-CTD Ser-5), clone 3E8 (Millipore, Cat#: 04-1572, Lot#: NG1881282, Dilution: 1/5,000 – 1/10,000); Anti-beta Actin [mAbcam 8224] antibody – Loading Control (Abcam, Cat#: ab8224, Lot#: GR151143-1, Dilution: 1/5,000); Anti-Rpb3, clone 1Y26 (BioLegend, Cat#: 665003, Lot#: 2013H01-002, Dilution: 1/3,000).

Secondary antibodies were obtained from the companies indicated: Goat Anti-Rat IgG Antibody, HRP conjugate (Millipore, Cat#: AP136P, Dilution: 1/50,000); Goat Anti-Mouse IgG H&L (HRP) preadsorbed (Abcam, Cat#: ab97040, Lot#: GR129315-4, Dilution: 1/30,000 – 1/50,000).

2.4.2 General synthesis and characterization of chemical tools.

The cis and trans-locked isostere moieties (Fmoc–Ser(PO(OBn)-(OCH2CH2CN))–Ψ[(E)CH=C]–Pro–OH and Fmoc–Ser(PO(OBn)-(OCH2CH2CN))–Ψ[(Z)CH=C]–Pro–OH, respectively) (Figure 2-1) were synthesized as previously described (20, 21). Peptides and peptidomimetic compounds incorporating the locked proline moieties were synthesized using solid phase peptide synthesis and Fmoc-protected amino acids and isosteres similar to peptides previously described (21).

2.4.3 Protein expression and purification.

Human full-length Pin1, PPIase, and Scp1 were subcloned into a pET28a (Novagen) derivative vector, pHIS8, which has an N-terminal 8×His tag followed by a thrombin cleavage site, and purified as reported previously (22). Drosophila melanogaster Ssu72+Symplekin were purified as previously described (28). Briefly, these proteins were overexpressed in E. coli BL21 (DE3) cells by growing at 37 °C in Luria-Bertani media supplemented with 50 μg/mL kanamycin to an OD600 of 0.4-0.6. Expression was induced by adding isopropyl-β-D-thiogalactopyranoside (IPTG) to a final concentration of 0.5 mM and the cultures were grown at 16 °C for an additional 16 hours. The cells were pelleted, lysed via sonication, and centrifuged to separate the aqueous fraction from cellular debris. The aqueous fraction was initially purified using a Ni-NTA column (Qiagen) and eluted with imidazole. The N-terminal 8×His tag was removed with thrombin protease during dialysis at 4 °C and the proteins were further purified using ion exchange and size-exclusion chromatography columns. Homogeneity was confirmed with SDS-PAGE.

Scp1 D96N variant and Ssu72 C13D/D144N+Symplekin used in crystallization studies and PPIase K77/82Q used in kinetic analysis were generated using the QuikChange Site-Directed Mutagenesis Kit (Stratagene) and purified in a similar manner as described above. In order to obtain the highest quality protein for crystallization, the protein was further purified by size exclusion chromatography using Superdex 200 (GE Healthcare).

Residues 148-641 of S. pombe Fcp1 were cloned into a pHIS8 vector. The protein was purified with a protocol similar to that described above. Finally, the protein was dialyzed into a buffer containing 50 mM Tris-HCl pH 7.5, 100 mM NaCl, 10% glycerol, 0.01% Triton X-100, and 5 mM BME, and concentrated for use in kinetic analysis. All Fcp1 was used within 8 hours of preparation to maintain the highest in vitro activity.

The fusion protein of GST and yeast CTD (GST-CTD) was cloned into a pET28a (Novagen) derivative vector, which has an N-terminal His-tag. This protein was overexpressed in E. coli BL21 (DE3) cells by growing at 37 °C in Luria-Bertani media supplemented with 50 μg/mL kanamycin to an OD600 of 0.4-0.6. Expression was induced by adding isopropyl-β-D-thiogalactopyranoside (IPTG) to a final concentration of 0.5 mM and the cultures were grown at 16 °C for an additional 16 hours. The cells were pelleted, lysed via sonication in lysis buffer (50 mM Tris-HCl pH 8.0, 500 mM NaCl, 15 mM imidazole, 10% glycerol, 0.1% Triton X-100, 10 mM BME), and centrifuged to separate the aqueous fraction from cellular debris. The aqueous fraction was initially purified using a Ni-NTA column (Qiagen) and eluted with elution buffer (50 mM Tris-HCl pH 8.0, 500 mM NaCl, 200 mM imidazole, 10 mM BME). The protein was dialyzed at 4 °C against gel filtration buffer (20 mM Tris-HCl pH 8.0, 50 mM NaCl, 10 mM BME) overnight. The sample was concentrated, loaded, and run on a Superdex 200 gel filtration column (GE). Homogeneity was confirmed with SDS-PAGE.

GST-CTD was phosphorylated in vitro using Cdk7/cyclinH/MAT1 (Millipore, Cat#: 14-476M, Lot#: 2153276-B) (TFIIH). Final reaction conditions contained 8 mM MOPS/NaOH pH 7.0, 0.125 mM EDTA, 1 mM ATP, 10 mM magnesium acetate, 1 μg/μL GST-CTD, and ~0.04 μg/μL TFIIH in a 60 μL reaction volume. Reactions were incubated at 30 °C for 16 hours, divided into aliquots, and flash frozen.

2.4.4 Crystallization and crystal soaking with peptidomimetic compounds.

Scp1 and Ssu72-Symplekin crystals were grown using sitting-drop vapor diffusion at room temperature by mixing 1 μL of ~10 mg/mL protein solution with 1 μL of reservoir solution. Optimized Scp1 D96N crystals were obtained in a condition consisting of 30% PEG 3350 (w/v) and 0.2 M magnesium acetate. Optimized Ssu72 C13D/D144N+Symplekin crystals were grown with a reservoir solution of 12% PEG 3350 (w/v) and 100 mM HEPES pH 8.6.

To obtain phosphatase+CTD mimetic peptide complex structures, the crystals of Scp1 D96N variant were soaked in reservoir solution containing 1.1 mM trans- or cis-locked peptide for 2 hours at room temperature. Ssu72 C13D/D144N+Symplekin crystals were soaked in a reservoir solution containing 2 mM cis-locked peptide overnight at room temperature. The crystals were briefly equilibrated in cryo-protectants containing 20-25% glycerol (v/v), vitrified in nylon loops, and then stored in liquid nitrogen until data collection.

2.4.5 Data collection and structure determination.

Crystallographic data for Ssu72+Symplekin with cis-locked peptide and Scp1 with trans-locked peptide were collected on beamline 23, sector B of the Advanced Photon Source (APS). Scp1 with cis-locked peptide data were collected on beamline BL 5.0.3 of the Advanced Light Source (ALS). All diffraction data were processed with HKL2000 (38). Scp1 D96N complex structures were determined by molecular replacement (MR) using an Scp1 D96N structure (PDB ID: 2ght) as the search model in Phaser-MR in the CCP4 suite (39). The MR solution was built iteratively with manual building in COOT (40) and computational refinements performed using Refmac5 in the CCP4 suite (39, 41). Ssu72 C13D/D144N+Symplekin structures were determined by molecular replacement (MR) using the structure of human Ssu72-symplekin (PDB ID: 3o2s) as the search model in the program Phaser-MR available in the CCP4 software package (39). The MR solution was initially refined in Refmac5 (41) and iteratively built in COOT (40). Restraint files for the cis- and trans-locked isosteres as well as linkages were generated using the JLigand program in the CCP4 suite (39). The quality of the final refined structures was evaluated by MolProbity (42).

2.4.6 Malachite green assay and analysis.

Malachite green assays for phosphatase activity were carried out in 200 μL PCR tubes in a Mastercycler PCR machine (Eppendorf North America). Enzyme and peptidomimetic compounds or peptides were mixed in a final reaction volume of 20 μL (conditions and amounts in Supplementary Table 2). Reactions were quenched with 40 μL BIOMOL® Green Reagent (Enzo® Life Sciences) and the color was allowed to develop for 30 minutes at room temperature. The OD620 was determined using an Infinite® M200 plate reader (Tecan) and the concentration of phosphate released was determined using a standard curve prepared previously for each reaction buffer.
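Converting OD620 readings to phosphate with a standard curve can be sketched as below. This is a minimal illustration that assumes the standard curve is linear over the working range, which the text does not specify; the standard amounts and absorbance values are invented for the example, and the function names are hypothetical.

```python
def fit_line(x_vals, y_vals):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x_vals)
    mean_x = sum(x_vals) / n
    mean_y = sum(y_vals) / n
    sxx = sum((x - mean_x) ** 2 for x in x_vals)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_vals, y_vals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def phosphate_from_od(od620, slope, intercept):
    """Invert the standard curve to estimate phosphate from an absorbance reading."""
    return (od620 - intercept) / slope


# Hypothetical phosphate standards (nmol) and their OD620 readings.
standards_nmol = [0.0, 0.5, 1.0, 2.0, 4.0]
standards_od = [0.05, 0.15, 0.25, 0.45, 0.85]

slope, intercept = fit_line(standards_nmol, standards_od)
sample_nmol = phosphate_from_od(0.45, slope, intercept)
print(f"slope = {slope:.3f} OD/nmol, intercept = {intercept:.3f}, sample = {sample_nmol:.2f} nmol")
```

Because the text states that a separate curve was prepared for each reaction buffer, the fit would be repeated per buffer and each sample converted with the matching curve.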

Specificity constants were determined by plotting rate against concentration and fitting data in R-Studio (43) using the nls() command and the following equation (1), where kon = kcat/Km, as shown previously (28).

$$ v = \frac{k_{on}[S]}{1 + \dfrac{k_{on}[S]}{k_{cat}}} \qquad (1) $$
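Equation (1) is the Michaelis-Menten rate law rewritten so that the specificity constant kon = kcat/Km is fitted directly: dividing the numerator and denominator of kcat[S]/(Km + [S]) by Km gives kon[S]/(1 + kon[S]/kcat). The fits in this work were done with nls() in R; the sketch below is not that code but a small standard-library Python analogue using a Gauss-Newton least-squares iteration (the default algorithm of nls()) with step halving added for robustness. It treats v as the rate per unit enzyme, which is an assumption of this example, and the data are synthetic values generated from made-up parameters.

```python
def rate(s, kon, kcat):
    """Equation (1): v = kon*[S] / (1 + kon*[S]/kcat), with kon = kcat/Km."""
    return kon * s / (1.0 + kon * s / kcat)


def sum_sq(s_vals, v_vals, kon, kcat):
    """Sum of squared residuals between observed and modeled rates."""
    return sum((v - rate(s, kon, kcat)) ** 2 for s, v in zip(s_vals, v_vals))


def fit_kon_kcat(s_vals, v_vals, kon, kcat, max_iter=100, tol=1e-12):
    """Gauss-Newton least-squares fit of kon and kcat with step halving."""
    for _ in range(max_iter):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for s, v in zip(s_vals, v_vals):
            denom = 1.0 + kon * s / kcat
            resid = v - kon * s / denom
            j_kon = s / denom ** 2                       # dv/dkon
            j_kcat = (kon * s / kcat) ** 2 / denom ** 2  # dv/dkcat
            a11 += j_kon * j_kon
            a12 += j_kon * j_kcat
            a22 += j_kcat * j_kcat
            b1 += j_kon * resid
            b2 += j_kcat * resid
        # Solve the 2x2 normal equations (J^T J) d = J^T r by Cramer's rule.
        det = a11 * a22 - a12 * a12
        d_kon = (b1 * a22 - b2 * a12) / det
        d_kcat = (a11 * b2 - a12 * b1) / det
        current = sum_sq(s_vals, v_vals, kon, kcat)
        scale = 1.0
        while scale > 1e-8:
            new_kon, new_kcat = kon + scale * d_kon, kcat + scale * d_kcat
            if (new_kon > 0 and new_kcat > 0
                    and sum_sq(s_vals, v_vals, new_kon, new_kcat) <= current):
                break
            scale /= 2.0
        else:
            return kon, kcat  # no improving step found; stop here
        converged = (abs(new_kon - kon) <= tol * kon
                     and abs(new_kcat - kcat) <= tol * kcat)
        kon, kcat = new_kon, new_kcat
        if converged:
            break
    return kon, kcat


# Synthetic, noise-free data generated from made-up parameters (not thesis data).
TRUE_KON, TRUE_KCAT = 0.3, 20.0
substrate = [5.0, 10.0, 25.0, 50.0, 100.0, 250.0, 500.0]
rates = [rate(s, TRUE_KON, TRUE_KCAT) for s in substrate]

# Starting guesses: initial slope for kon, largest observed rate for kcat.
kon_fit, kcat_fit = fit_kon_kcat(substrate, rates, rates[0] / substrate[0], max(rates))
km_fit = kcat_fit / kon_fit
print(f"kon = {kon_fit:.4f}, kcat = {kcat_fit:.3f}, Km = {km_fit:.2f}")
```

Fitting kon directly, rather than deriving it from separately fitted kcat and Km, means the reported uncertainty applies to the specificity constant itself.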

2.4.7 Fcp1/Pin1 coupled assay and analysis.

For the Pin1 inclusion assay, 200 ng of Fcp1 were incubated with 250 μM of native phospho-Ser2 peptide. Pin1 and the isomerase domain PPIase reactions were supplemented with 10 ng of Pin1 or 10 ng of PPIase protein, respectively. The reactions were carried out in 50 mM Tris-acetate pH 5.5, 10 mM MgCl2 in a total volume of 20 μL at 37 °C for the time indicated. The reactions were quenched with malachite green and color was allowed to develop for 30 minutes. OD620 measurements were determined using an Infinite® M200 plate reader (Tecan).

2.4.8 In vitro reconstruction of Pin1-mediated Ssu72 enhancement.

In vitro dephosphorylation of GST-CTD was performed in a reaction containing 263 ng of Ssu72 or 263 ng Ssu72 plus 158 ng Pin1, 735 ng phosphorylated GST-CTD, and 100 mM MES pH 6.5 in a reaction volume of 10.5 μL. The reactions were carried out at 30 °C for the time indicated (0, 5, 10, 15, 20, or 25 minutes) and quenched with 2X Laemmli buffer containing denatured mouse IgG (Millipore, Cat#: 05-782, Lot#: 2510311) at a dilution of 1/250 for use as a loading control. A portion of each reaction corresponding to 35 ng of GST-CTD (1 μL) was loaded onto a 15% SDS-PAGE gel and run at 180 V for 1 hour. The gel was transferred to PVDF membrane at 100 V for 1 hour at 4 °C. The membrane was blocked in 1X TBST + 5% BSA for 1 hour at 4 °C with shaking. The blot was then incubated with Goat Anti-Mouse IgG H&L (HRP) preadsorbed antibody (Abcam, Cat#: ab97040, Lot#: GR129315-4, Dilution: 1/30,000) for two hours at 4 °C with shaking. The blot was washed 5 times for 5 minutes in 1X TBST at room temperature and incubated with SuperSignal West Pico Chemiluminescent Substrate (Pierce, Cat#: 34079) according to the manufacturer's directions. Blots were imaged using a ChemiDoc™ MP System (Bio-Rad). The blot was incubated twice in stripping buffer (0.2 M glycine, 0.1% SDS (w/v), 1% Tween-20 (v/v), pH 2.2) for 10 minutes each at room temperature to remove the Goat Anti-Mouse secondary antibody. The blot was washed twice for 10 minutes in 1X PBS, then in 1X TBST for 5 minutes, and finally blocked in 1X TBST + 5% BSA for 1 hour at 4 °C. The blot was then probed for phosphorylated Ser5 using Anti-RNA polymerase II B1 (phospho-CTD Ser-5), clone 3E8 (Millipore, Cat#: 04-1572, Lot#: NG1881282, Dilution: 1/10,000) for 1 hour at 4 °C with shaking. The blot was washed 5 times for 5 minutes in 1X TBST and incubated with Goat Anti-Rat IgG Antibody, HRP conjugate (Millipore, Cat#: AP136P, Dilution: 1/50,000) for 1 hour at room temperature with shaking. The blot was washed 5 times for 5 minutes in 1X TBST and then incubated with chemiluminescent substrate as before. Blots were imaged using a ChemiDoc™ MP System (Bio-Rad) and quantified using Image Lab™ Software (Bio-Rad) image analysis tools. Phosphorylated Ser5 bands were normalized to bands for the heavy chain of mouse IgG.
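The loading and normalization arithmetic in this protocol can be sketched in a few lines of Python. This is only an illustration: the function names and the band volumes are hypothetical, and the sketch assumes that each 10.5 μL reaction was quenched with an equal volume of 2X Laemmli buffer, which is consistent with the stated 35 ng of GST-CTD per 1 μL loaded.

```python
def substrate_ng_per_ul(substrate_ng, reaction_ul, quench_ul):
    """Substrate concentration (ng/uL) once the quench buffer has been added."""
    return substrate_ng / (reaction_ul + quench_ul)


def normalize_pser5(pser5_signal, igg_signal):
    """Divide each pSer5 band by its IgG loading control, then scale to t = 0."""
    ratios = [p / c for p, c in zip(pser5_signal, igg_signal)]
    return [r / ratios[0] for r in ratios]


# Assumption: 10.5 uL reaction + 10.5 uL of 2X Laemmli buffer (equal volumes).
print(substrate_ng_per_ul(735, 10.5, 10.5), "ng GST-CTD per uL loaded")

# Hypothetical band volumes for the six time points (not measured data).
time_min = [0, 5, 10, 15, 20, 25]
pser5 = [1000.0, 800.0, 640.0, 500.0, 400.0, 300.0]
igg = [500.0, 500.0, 480.0, 500.0, 500.0, 500.0]

for t, value in zip(time_min, normalize_pser5(pser5, igg)):
    print(f"{t:>2} min: relative pSer5 = {value:.2f}")
```

Dividing by the IgG heavy chain band first corrects for lane-to-lane loading differences, so the remaining change over time reflects dephosphorylation rather than pipetting variation.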

2.4.9 Establishment of shPin1 stable cell lines.

The cervical cancer cell line HeLa was infected with Pin1 shRNA or control shRNA lentiviruses, as previously described (36). In brief, 293T packaging cells were transfected with pLKO.1-based control or shPin1 plasmid along with pCMV-Delta 8.9, containing the gag, pol, and rev genes, and a VSV-G-expressing envelope plasmid using the transfection reagent Lipofectamine 2000. Cell medium containing viral particles was harvested 60 hours post-transfection, and HeLa cells were infected with the medium in the presence of 4 μg/mL hexadimethrine bromide (polybrene) for 48 hours. Stable cell clones were selected using 2 μg/mL of puromycin and checked for Pin1 protein expression by immunoblotting with anti-Pin1 antibody to confirm Pin1 knockdown efficiency. Stable cells were maintained continuously in culture, splitting every fourth day and seeding at a density of 6 × 10^5 cells per 10 cm culture dish.

2.4.10 Immunoblotting

Vector control and shPin1 HeLa cells were lysed in RIPA buffer (150 mM NaCl, 1% Triton X-100, 0.5% sodium deoxycholate, 0.1% sodium dodecyl sulfate, 50 mM Tris-HCl pH 8.0) with Halt™ Protease and Phosphatase Inhibitor Cocktail (Bio-Rad, Cat#: 78440, Lot#: PB196591), and the aqueous fraction was collected by centrifugation at 13,000 rpm for 10 minutes at 6 °C. The protein concentration of the aqueous fraction was determined by BCA assay. This aqueous fraction was then combined with 2X Laemmli buffer and boiled at 95 °C for 5 minutes. Total protein (20-40 μg) was loaded onto a 4-20% gradient SDS-PAGE gel (Bio-Rad, Cat#: 456-1096) and run at 150 V for 50 minutes at room temperature in a Mini-PROTEAN Tetra Cell (Bio-Rad). The proteins were transferred to PVDF membrane at 100 V for 1 hour at 4 °C in a Mini-PROTEAN Tetra Cell (Bio-Rad). Membranes were blocked in 1X TBST (20 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.1% Tween-20) + 5% Bovine Serum Albumin (BSA) for 1 hour at 4 °C with shaking. Blocked membranes were incubated in primary antibody, at dilutions indicated above, in either 1X TBST or 1X TBST + 5% BSA at 4 °C overnight. The membranes were then washed six times with 1X TBST for 5 minutes each at room temperature and incubated with secondary antibody in 1X TBST for 1 hour at 4 °C. The membranes were washed once again and incubated with SuperSignal West Pico Chemiluminescent Substrate (Pierce, Cat#: 34079) according to the manufacturer's directions. Blots were imaged using a ChemiDoc™ MP System (Bio-Rad). Blots were quantified using Image Lab™ Software (Bio-Rad) image analysis tools.

2.5 REFERENCES

1. Hanes SD (2014) The Ess1 prolyl isomerase: traffic cop of the RNA polymerase II transcription cycle. Biochimica et biophysica acta 1839(4):316-333.
2. Lu KP, Finn G, Lee TH, & Nicholson LK (2007) Prolyl cis-trans isomerization as a molecular timer. Nature chemical biology 3(10):619-629.
3. Singh N, et al. (2009) The Ess1 prolyl isomerase is required for transcription termination of small noncoding RNAs via the Nrd1 pathway. Molecular cell 36(2):255-266.
4. Wu X, Rossettini A, & Hanes SD (2003) The ESS1 prolyl isomerase and its suppressor BYE1 interact with RNA pol II to inhibit transcription elongation in Saccharomyces cerevisiae. Genetics 165(4):1687-1702.
5. Wu X, et al. (2000) The Ess1 prolyl isomerase is linked to chromatin remodeling complexes and the general transcription machinery. The EMBO journal 19(14):3727-3738.
6. Xu YX, Hirose Y, Zhou XZ, Lu KP, & Manley JL (2003) Pin1 modulates the structure and function of human RNA polymerase II. Genes & development 17(22):2765-2776.
7. Ghosh A, Shuman S, & Lima CD (2008) The structure of Fcp1, an essential RNA polymerase II CTD phosphatase. Molecular cell 32(4):478-490.
8. Krishnamurthy S, He X, Reyes-Reyes M, Moore C, & Hampsey M (2004) Ssu72 Is an RNA polymerase II CTD phosphatase. Molecular cell 14(3):387-394.
9. Yeo M, et al. (2005) Small CTD phosphatases function in silencing neuronal gene expression. Science 307(5709):596-600.
10. Zhang Y, et al. (2006) Determinants for dephosphorylation of the RNA polymerase II C-terminal domain by Scp1. Molecular cell 24(5):759-770.
11. Dichtl B, et al. (2002) A role for SSU72 in balancing RNA polymerase II transcription elongation and termination. Molecular cell 10(5):1139-1150.
12. Ganem C, et al. (2003) Ssu72 is a phosphatase essential for transcription termination of snoRNAs and specific mRNAs in yeast. The EMBO journal 22(7):1588-1598.

13. Cho EJ, Kobor MS, Kim M, Greenblatt J, & Buratowski S (2001) Opposing effects of Ctk1 kinase and Fcp1 phosphatase at Ser 2 of the RNA polymerase II C-terminal domain. Genes & development 15(24):3319-3329.
14. Cho H, et al. (1999) A protein phosphatase functions to recycle RNA polymerase II. Genes & development 13(12):1540-1552.
15. Brandts JF, Halvorson HR, & Brennan M (1975) Consideration of the Possibility that the slow step in protein denaturation reactions is due to cis-trans isomerism of proline residues. Biochemistry 14(22):4953-4963.
16. Brandl CJ & Deber CM (1986) Hypothesis about the function of membrane-buried proline residues in transport proteins. Proceedings of the National Academy of Sciences of the United States of America 83(4):917-921.
17. Wedemeyer WJ, Welker E, & Scheraga HA (2002) Proline cis-trans isomerization and protein folding. Biochemistry 41(50):14637-14644.
18. Etzkorn FA & Zhao S (2015) Stereospecific Phosphorylation by the Central Mitotic Kinase Cdk1-Cyclin B. ACS chemical biology.
19. Namanja AT, et al. (2010) Toward flexibility-activity relationships by NMR spectroscopy: dynamics of Pin1 ligands. Journal of the American Chemical Society 132(16):5607-5609.
20. Wang XJ, et al. (2003) Serine-cis-proline and serine-trans-proline isosteres: stereoselective synthesis of (Z)- and (E)-alkene mimics by Still-Wittig and Ireland-Claisen rearrangements. The Journal of organic chemistry 68(6):2343-2349.
21. Wang XJ, Xu B, Mullins AB, Neiler FK, & Etzkorn FA (2004) Conformationally locked isostere of phosphoSer-cis-Pro inhibits Pin1 23-fold better than phosphoSer-trans-Pro isostere. Journal of the American Chemical Society 126(47):15533-15542.
22. Zhang M, et al. (2012) Structural and kinetic analysis of prolyl-isomerization/phosphorylation cross-talk in the CTD code. ACS chemical biology 7(8):1462-1470.
23. Phatnani HP & Greenleaf AL (2006) Phosphorylation and functions of the RNA polymerase II CTD. Genes & development 20(21):2922-2936.
24. Kubicek K, et al. (2012) Serine phosphorylation and proline isomerization in RNAP II CTD control recruitment of Nrd1. Genes & development 26(17):1891-1896.
25. Werner-Allen JW, et al. (2011) cis-Proline-mediated Ser(P)5 dephosphorylation by the RNA polymerase II C-terminal domain phosphatase Ssu72. The Journal of biological chemistry 286(7):5717-5726.
26. Xiang K, et al. (2010) Crystal structure of the human symplekin-Ssu72-CTD phosphopeptide complex. Nature 467(7316):729-733.
27. Zhang Y, Zhang M, & Zhang Y (2011) Crystal structure of Ssu72, an essential eukaryotic phosphatase specific for the C-terminal domain of RNA polymerase II, in complex with a transition state analogue. The Biochemical journal 434(3):435-444.

28. Luo Y, et al. (2013) Novel modifications on C-terminal domain of RNA polymerase II can fine-tune the phosphatase activity of Ssu72. ACS chemical biology 8(9):2042-2052.
29. He X, et al. (2003) Functional interactions between the transcription and mRNA 3' end processing machineries mediated by Ssu72 and Sub1. Genes & development 17(8):1030-1042.
30. Reyes-Reyes M & Hampsey M (2007) Role for the Ssu72 C-terminal domain phosphatase in RNA polymerase II transcription elongation. Molecular and cellular biology 27(3):926-936.
31. Kobor MS, et al. (1999) An unusual eukaryotic protein phosphatase required for transcription by RNA polymerase II and CTD dephosphorylation in S. cerevisiae. Molecular cell 4(1):55-62.
32. Kops O, Zhou XZ, & Lu KP (2002) Pin1 modulates the dephosphorylation of the RNA polymerase II C-terminal domain by yeast Fcp1. FEBS letters 513(2-3):305-311.
33. Hani J, et al. (1999) Mutations in a peptidylprolyl-cis/trans-isomerase gene lead to a defect in 3'-end formation of a pre-mRNA in Saccharomyces cerevisiae. The Journal of biological chemistry 274(1):108-116.
34. Verdecia MA, Bowman ME, Lu KP, Hunter T, & Noel JP (2000) Structural basis for phosphoserine-proline recognition by group IV WW domains. Nature structural biology 7(8):639-643.
35. Palancade B, et al. (2004) Dephosphorylation of RNA polymerase II by CTD-phosphatase FCP1 is inhibited by phospho-CTD associating proteins. Journal of molecular biology 335(2):415-424.
36. Min SH, et al. (2012) Negative regulation of the stability and tumor suppressor function of Fbw7 by the Pin1 prolyl isomerase. Molecular cell 46(6):771-783.
37. Lu KP, Liou YC, & Zhou XZ (2002) Pinning down proline-directed phosphorylation signaling. Trends in cell biology 12(4):164-172.
38. Otwinowski Z & Minor W (1997) Processing of X-ray diffraction data collected in oscillation mode. Methods in Enzymology 276:307-326.
39. Winn MD, et al. (2011) Overview of the CCP4 suite and current developments. Acta crystallographica. Section D, Biological crystallography 67(Pt 4):235-242.
40. Emsley P & Cowtan K (2004) Coot: model-building tools for molecular graphics. Acta crystallographica. Section D, Biological crystallography 60(Pt 12 Pt 1):2126-2132.
41. Vagin AA, et al. (2004) REFMAC5 dictionary: organization of prior chemical knowledge and guidelines for its use. Acta crystallographica. Section D, Biological crystallography 60(Pt 12 Pt 1):2184-2195.
42. Chen VB, et al. (2010) MolProbity: all-atom structure validation for macromolecular crystallography. Acta crystallographica. Section D, Biological crystallography 66(Pt 1):12-21.
43. R Core Team (2013) R: A language and environment for statistical computing. (Vienna, Austria).


Chapter 3: Ultraviolet photodissociation mass spectrometry to map phosphorylation along RNA polymerase II CTD.

ABSTRACT

Phosphorylation of the C-terminal domain of RNA polymerase II (CTD) plays an essential role in eukaryotic transcription by recruiting transcriptional regulatory factors to the active polymerase. However, the scarcity of basic residues and the repetitive nature of the CTD sequence pose a major challenge for site-specific characterization of phosphorylation, hindering our understanding of this crucial biological process. In this chapter, I discuss the use of innovative LC-UVPD-MS methods to analyze post-translational modification along native sequence CTDs. Application of our method to the Drosophila melanogaster CTD reveals the phosphorylation pattern of this CTD for the first time. The divergent nature of the fly CTD allows us to derive rules defining how flanking residues affect phosphorylation choice by CTD kinases. Our data support the use of LC-UVPD-MS to decipher the CTD code and determine the rules that program its function.


3.1 INTRODUCTION

Specific monoclonal antibodies against modifications of the consensus heptad have broadened our understanding of CTD phosphorylation and helped characterize the temporal association of CTD modifying proteins to different modifications during the transcription cycle (1). However, antibodies suffer from several inherent limitations. First, the available antibodies have been raised against consensus CTD sequences that are not present in all species. Important model systems like Drosophila melanogaster and Homo sapiens contain a number of heptads that diverge from the consensus sequence and cannot be confidently characterized by these tools. Second, the ability of other marks within CTD heptads to interfere with antibody/epitope recognition is not completely known (2). This results in an inability to reliably quantify the total phosphorylation of the CTD. Third, antibodies cannot localize phosphorylation marks along the sprawling CTD. Finally, due to the similarity of phosphate-accepting motifs (i.e., YS2P vs. TS5P vs. PS7Y), the potential for cross-reactivity of these antibodies cannot be ignored.

Tandem mass spectrometry methods can map the location of post-translational modifications in high resolution and, therefore, are considered the gold standard for such analysis (3). However, the highly repetitive heptapeptide sequence of RNAP II CTD is especially challenging for LC-MS/MS analysis. Recently, important strides have been made in the study of the CTD with the development of mass spectrometry approaches to analyze the CTD in yeast (4) and human cells (5) by introducing mutations to the CTD sequence to facilitate proteolytic digestion and modification site localization. By expressing these mutant constructs in cells and analyzing resultant modifications with collision-induced dissociation (CID), several conclusions were made. First, CTD phosphorylation is evenly dispersed throughout the length of the CTD and is not localized proximal or distal to the catalytic core of RNAP II. Second, within individual heptad repeats, Ser2 and Ser5 phosphorylations are significantly more abundant compared to other modification sites (i.e., phosphorylation of Tyr1, Thr4, and Ser7). Finally, the phosphorylation along the CTD is significantly less dense than once thought, with the majority of heptads only accepting one phosphate. These findings establish mass spectrometry as an indispensable tool for studying CTD biology and lay the groundwork for further technological advancement.

Developing methodologies to analyze the CTD without the necessity to change the original sequence would be an important step towards understanding the function of the CTD. However, the scarcity of basic sites prevents tryptic digestion and also suppresses effective protonation, which is critical for analysis using conventional positive mode MS/MS techniques including CID and electron activation methods. Phosphorylation increases the acidity of the consensus sequence and further decreases the efficiency of protonation and the success of subsequent positive mode analysis. Negative polarity peptide analysis generally results in uninformative CID mass spectra comprised mostly of neutral phosphate losses, precluding both identification and characterization of peptides (6, 7). Ultraviolet photodissociation (UVPD) using 193 nm photons is an alternative to existing collision- and electron-based activation methods that offers several advantages for phosphorylation site mapping in the CTD. Charge state bias, which is a limitation for traditional methods, is largely overcome using UVPD, and high sequence coverage has been demonstrated even for singly charged precursor ions (8-11). The high energy deposition (6.4 eV) that is achieved upon absorption of a 193 nm photon permits access to fragmentation pathways that are not available using traditional methods, ultimately leading to the formation of a, b, c, x, y, and z ions which account for cleavage of each bond in the peptide backbone (12). The greater number of product ions obtained using UVPD increases the confidence of peptide sequencing results while also improving the ability to pinpoint sites of modification. Another merit of UVPD is the ability to generate diagnostic product ions from peptide anions (13-15). Although peptide analysis is generally undertaken in the positive ion mode based on greater sensitivity and number of applicable MS/MS techniques, the negative mode offers unique benefits for certain types of peptides. This is especially true for characterization of labile PTMs including phosphorylation, sulfation, and O-glycosylation, all of which exhibit superior retention using negative mode UVPD (14, 16-19). Furthermore, alternating between positive and negative electrospray ionization modes is easily done, even within a single LC-MS run, further increasing the versatility of UVPD-MS.
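The 6.4 eV figure quoted for a 193 nm photon follows directly from E = hc/λ. The short Python check below is a worked illustration only; the function name is ours, and the constants are the exact SI-defined values.

```python
PLANCK_J_S = 6.62607015e-34        # Planck constant, J*s
LIGHT_M_PER_S = 299_792_458.0      # speed of light, m/s
JOULE_PER_EV = 1.602176634e-19     # joules per electronvolt


def photon_energy_ev(wavelength_nm):
    """Photon energy in eV from E = h*c / wavelength."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_J_S * LIGHT_M_PER_S / wavelength_m / JOULE_PER_EV


print(f"193 nm photon: {photon_energy_ev(193.0):.2f} eV")
```

A single absorbed photon therefore deposits several electronvolts in one step, which is the basis for the additional fragmentation pathways described above.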

In the present study, UVPD-MS is used to its full advantage in both positive and negative modes to analyze CTD modifications in unprecedented detail. We investigate two highly dissimilar eukaryotic CTDs with no introduced mutations: that of yeast (Saccharomyces cerevisiae), which is composed almost entirely of consensus heptads, and that of fruit fly (Drosophila melanogaster), which diverges greatly from the consensus sequence. In the yeast CTD (yCTD), we phosphorylate GST-yCTD constructs with two physiologically relevant CTD kinases, TFIIH and Erk2, and pinpoint the resultant phosphorylation sites. To further test the power of UVPD in detecting CTD modification, we use the Drosophila melanogaster CTD (DmCTD), which has a highly divergent sequence with only 2 of its approximately 45 heptad repeats matching the consensus sequence. By localizing the phosphorylation sites introduced by Erk2, our results reveal how the flanking residues affect the phosphorylation choice by this kinase. This novel strategy establishes UVPD-MS in combination with alternative proteases as a powerful way to investigate native CTD modifications across species.

3.2 RESULTS AND DISCUSSION

3.2.1 Analysis of Saccharomyces cerevisiae CTD.

The identification of CTD modifications in the context of the wild-type amino acid sequence is ideal for drawing physiologically relevant conclusions. However, the lack of basic residues in the consensus CTD sequence makes it incompatible with established trypsin digestion methods for mass spectrometric analysis, which previously necessitated the introduction of lysine or arginine residues at the Ser7 position for yeast CTD analysis (4, 5). To investigate native yeast CTD modification we turned to alternative protease digestion strategies. Although yCTD lacks basic residues, it is highly enriched in tyrosine, which is found in the consensus sequence. The proteases chymotrypsin and proteinase K typically cleave peptide bonds flanked by aromatic and, to a lesser extent, aliphatic residues, making the CTD an ideal substrate for these enzymes.


To investigate the efficacy of these proteases in LC-UVPD-MS analysis of yCTD we generated a GST-yCTD construct and purified recombinant protein to homogeneity. The recombinant protein was initially treated with trypsin overnight to digest and remove the GST tag. The intact yCTD portion was purified from GST peptides by passing the digest through a 10 kDa molecular weight cut off (MWCO) filter, which retained yCTD. The yCTD sample was then treated with either chymotrypsin or proteinase K. The final digests were then analyzed using LC-UVPD-MS. Varying numbers of missed cleavages by chymotrypsin resulted in a more complex peptide mixture compared to proteinase K, which provided more efficient digestion and generated just two dominant heptad peptides with sequences YSPTSPS and SPSYSPT (Figure 3-1A). While the observed proteinase K cleavages that occurred C-terminal to Thr4 and Ser7 deviated from the expected specificity C-terminal to tyrosine, the cleavage pattern was reproducible and thus proteinase K was considered the better protease for yCTD digestion.

Ultraviolet photodissociation at 193 nm and CID were evaluated for sequencing the heptads generated by proteinase K digestion, and both identified two peptides with sequences YSPTSPS and SPSYSPT (Figure 3-1A). Using UVPD, more extensive fragmentation including the production of a- and x-type sequence ions was achieved (Figure 3-1C and 3-1E), while the abundance of less informative water loss ions was decreased relative to the analogous CID spectra (Figure 3-1B and 3-1D). Additionally, improved phosphate retention on product ions has been demonstrated for UVPD (20), making it ultimately better suited for CTD phosphorylation analysis following kinase treatment. Both peptides contain a heptad repeat but the location of bond cleavage varies. For clarity, we will analyze and discuss the residues from this and subsequent analyses according to their position in the established CTD naming and numbering convention (i.e., Tyr1, Ser2, Pro3, Thr4, Ser5, Pro6, Ser7) though their positions in generated peptides may vary.


Figure 3-1. LC-MS base peak MS1 chromatogram and MS/MS spectra of unmodified yeast CTD heptads following digestion with trypsin and proteinase K.

(A) LC-MS base peak MS1 chromatogram with peaks corresponding to the elution of unmodified heptad peptides (m/z 738) detected at 13-14 min. (B & C) MS/MS (CID and UVPD) mass spectra acquired for protonated YSPTSPS. (D & E) MS/MS (CID and UVPD) mass spectra acquired for protonated SPSYSPT. Tyr side chain losses generated by UVPD are denoted m1 and m4 for YSPTSPS and SPSYSPT, respectively.
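The precursor m/z values discussed in this section can be reproduced from standard monoisotopic residue masses. The Python sketch below is illustrative only (the function and constant names are ours, not part of the acquisition or analysis software used here); it shows why YSPTSPS and SPSYSPT are isobaric at m/z 738 and why a single phosphorylation shifts the singly protonated heptad to m/z 818.

```python
# Standard monoisotopic residue masses (Da) for the residues of the consensus heptad.
RESIDUE_MASS = {"S": 87.03203, "P": 97.05276, "T": 101.04768, "Y": 163.06333}
WATER = 18.01056      # H2O added to the summed residues of an intact peptide
PROTON = 1.00728      # mass of the charge-carrying proton
PHOSPHO = 79.96633    # HPO3 added per phosphorylation


def precursor_mz(peptide, n_phospho=0, charge=1):
    """m/z of a protonated peptide carrying n_phospho phosphate groups."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + n_phospho * PHOSPHO
    return (neutral + charge * PROTON) / charge


for heptad in ("YSPTSPS", "SPSYSPT"):
    print(heptad,
          f"unmodified: {precursor_mz(heptad):.2f}",
          f"mono-phosphorylated: {precursor_mz(heptad, n_phospho=1):.2f}")
```

Because the two heptad peptides share the same composition, MS1 mass alone cannot distinguish them or localize a phosphate; that information has to come from the MS/MS fragment ions.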

We next sought to apply our LC-UVPD-MS workflow to phosphorylated yCTD. To generate phosphorylated yCTD we treated GST-yCTD with TFIIH, a multi-protein complex that is required for the phosphorylation of the CTD at Ser5 during the initiation of RNAP II transcription (21). This reaction is catalyzed by the kinase subunit Kin28, in yeast, or Cdk7, in multi-cellular organisms (22, 23). Given that nearly half of the heptads in the mammalian CTD match the consensus found in the yeast CTD (1), we utilized commercially available human TFIIH complex to investigate its patterning along the yCTD. Upon UVPD-MS analysis of TFIIH treated GST-yCTD, a peptide (ion of m/z 818, 1+) corresponding to the mass of the consensus heptad plus one phosphorylation was observed in the LC-MS chromatogram (Figure 3-2A), in addition to the two previously detected unmodified heptad peptides of m/z 738 corresponding to YSPTSPS and SPSYSPT. The MS/MS spectra acquired during the elution of the phospho-heptad showed distinctive variations at different elution time-points, thus revealing the presence of two isomers (Figure 3-2B). Targeted LC-UVPD-MS runs allowed better characterization of the two isomers. Two abundant UVPD product ions which were unique to the early (m/z 407, a4 from SPSYpSPT) and late (m/z 594, x5 from YSPTpSPS) portions of the elution profile were identified. Extracted ion chromatograms (Figure 3-2B) revealed that the two ions matched to different heptad peptides with phosphorylation at different positions as defined by the consensus sequence (YSPTSPS). The UVPD mass spectra confirmed the sequences as SPSYpSPT with phosphorylation on Ser2 (Figure 3-2C) and YSPTpSPS with phosphorylation on Ser5 (Figure 3-2D). In addition to the unique a4 (SPSYpSPT) and x5 (YSPTpSPS) ions, other diagnostic ions that confidently differentiated each of the two phosphopeptides were y2 and a2 and the presence or absence of y6. These findings demonstrate the ability of TFIIH to phosphorylate both Ser2 and Ser5 of the yCTD. Based on the integrated chromatographic peak areas attributed to the SPSYpSPT peptide versus the YSPTpSPS peptide in Figure 3-2B (and considering the similar ionization efficiencies and fragmentation efficiencies of these two heptads), it appears that phosphorylation of Ser5 may be more prominent than phosphorylation of Ser2. More accurate quantitation is difficult given that the extracted LC traces are based on single unique fragment ions for each peptide, but it is reasonable to estimate that phosphorylation of Ser5 is favored over phosphorylation of Ser2. No phosphorylations of Ser7, Thr4, or Tyr1 were detected.
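The two diagnostic product ions can be checked with standard fragment-mass arithmetic. In the Python sketch below, a lowercase "s" is our own shorthand for phosphoserine, and the function names are illustrative; the formulas used are the conventional ones for singly charged a ions (N-terminal residues minus CO plus a proton) and x ions (C-terminal residues plus H2O plus CO minus two hydrogens plus a proton).

```python
# Monoisotopic residue masses (Da); "s" is shorthand here for phosphoserine.
RESIDUE_MASS = {
    "S": 87.03203, "P": 97.05276, "T": 101.04768, "Y": 163.06333,
    "s": 87.03203 + 79.96633,
}
PROTON, WATER, CO, H2 = 1.00728, 18.01056, 27.99491, 2.01565


def a_ion(peptide, n):
    """Singly charged a_n ion: first n residues, minus CO, plus a proton."""
    return sum(RESIDUE_MASS[aa] for aa in peptide[:n]) - CO + PROTON


def x_ion(peptide, n):
    """Singly charged x_n ion: last n residues + H2O + CO - 2 H, plus a proton."""
    return sum(RESIDUE_MASS[aa] for aa in peptide[-n:]) + WATER + CO - H2 + PROTON


ser2_isomer = "SPSYsPT"   # phosphate on Ser2 of the consensus numbering
ser5_isomer = "YSPTsPS"   # phosphate on Ser5 of the consensus numbering

print(f"a4 of SPSYpSPT: {a_ion(ser2_isomer, 4):.2f}")
print(f"x5 of YSPTpSPS: {x_ion(ser5_isomer, 5):.2f}")
# The same ion types from the other isomer fall at different m/z values,
# which is what makes the pair usable for extracted ion chromatograms.
print(f"a4 of YSPTpSPS: {a_ion(ser5_isomer, 4):.2f}")
print(f"x5 of SPSYpSPT: {x_ion(ser2_isomer, 5):.2f}")
```

The a4 ion of SPSYpSPT contains no phosphate because the modified serine is the fifth residue of that peptide, whereas the x5 ion of YSPTpSPS retains it.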


Figure 3-2. LC-UVPD-MS analysis of TFIIH and Erk2 treated yeast GST-CTD digested with trypsin and proteinase K.

The base peak MS1 chromatograms (0-45 minutes) are provided in Figure 3-3. Two singly phosphorylated heptads, m/z 818.3, are partially resolved in the base peak MS1 chromatogram (A for TFIIH and E for Erk2). In subsequent LC-MS analysis, m/z 818.3 was targeted for activation during the course of elution and extracted ion chromatograms (XICs) for distinguishing product ions, a4 from protonated SPSYpSPT and x5 from protonated YSPTpSPS, were generated to track the isomeric peptides (B for TFIIH and F for Erk2). UVPD using two 2 mJ pulses was used to sequence the heptad peptides and localize the sites of phosphorylation (C & D for TFIIH; G & H for Erk2). Ions that have undergone phosphate neutral loss are denoted with “-P”. Tyr side chain losses generated by UVPD are denoted m1 and m4 for YSPTSPS and SPSYSPT, respectively.


Figure 3-3. Base peak MS1 chromatograms.

Base peak MS1 chromatograms for proteinase K-digested yeast CTD following phosphorylation with (A) TFIIH or (B) Erk2.

Recently, the physiological role of mitogen-activated protein kinase 1 (MAPK1), or Erk2, in phosphorylating the CTD of poised RNA polymerase II at developmentally important genes has been reported (24). In mouse embryonic stem cells, Erk2 associates with a transcriptionally poised RNAP II and phosphorylates it at Ser5. Upon specific developmental signals, occupancy by Erk2 and associated proteins gradually diminishes and these regions of the chromatin become transcriptionally active to direct cells down specific developmental paths. To identify phosphorylation targets of Erk2 within the consensus CTD, we took advantage of our LC-UVPD-MS approach to obtain high-resolution information about Erk2 patterning along the yCTD.

GST-yCTD was phosphorylated by Erk2 kinase and analyzed by the protocol used for TFIIH analysis. Intriguingly, the LC-UVPD-MS results obtained for GST-yCTD treated with Erk2 kinase mirrored those observed with TFIIH in terms of the detection and differentiation of the same two phosphorylated heptads (Figure 3-2E and 3-2F). Again, two mono-phosphorylated species containing the consensus sequence were observed with phosphorylation confirmed at Ser2 or Ser5 upon Erk2 treatment of the CTD based on the UVPD mass spectra (Figure 3-2G and 3-2H). The YSPTpSPS phosphopeptide appears to be more prominent compared to SPSYpSPT for the Erk2 treatment based on comparison of the abundance of the x5 and a4 ions for each peptide, respectively. Thus, preferential phosphorylation of Ser5 over Ser2 mirrors the trend observed for the TFIIH-treated yCTD. Phosphorylation was not detected at Ser7, Thr4 or Tyr1 in our spectra.
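The comparison of extracted ion chromatograms amounts to integrating each fragment-ion trace and taking a ratio. The Python sketch below shows one simple way to do this with the trapezoidal rule; the retention times and intensities are hypothetical placeholders, not data from this study, and, as noted above, a ratio built on one fragment ion per peptide is only a rough estimate.

```python
def trapezoid_area(times, intensities):
    """Integrate an XIC trace with the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(times)):
        width = times[k] - times[k - 1]
        area += width * (intensities[k] + intensities[k - 1]) / 2.0
    return area


# Hypothetical XIC traces (retention time in minutes, arbitrary intensity units).
rt_min = [12.0, 12.2, 12.4, 12.6, 12.8, 13.0, 13.2]
xic_a4 = [0.0, 2.0e4, 5.0e4, 2.0e4, 5.0e3, 0.0, 0.0]   # a4, SPSYpSPT (pSer2)
xic_x5 = [0.0, 0.0, 1.0e4, 6.0e4, 1.2e5, 5.0e4, 0.0]   # x5, YSPTpSPS (pSer5)

area_ser2 = trapezoid_area(rt_min, xic_a4)
area_ser5 = trapezoid_area(rt_min, xic_x5)
print(f"pSer5 / pSer2 area ratio (illustrative): {area_ser5 / area_ser2:.1f}")
```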

Despite the reactions occurring under conditions of excess ATP to GST-yCTD substrate, in the presence of large amounts of kinase, and during long incubation times, we did not observe simultaneous phosphorylation of Ser2 and Ser5 within a single heptad span. This is consistent with recent analysis of yCTD (4) in that phosphorylation marks are not placed as densely as previously thought.


Our analysis provides the first evidence to support the similarity in CTD patterning between the two kinases TFIIH and Erk2. Both kinases can phosphorylate Ser2 and Ser5, with a clear preference for Ser5. The preferential phosphorylation of Ser5 over Ser2 is more pronounced for TFIIH than for Erk2. Furthermore, these findings fall in line with existing CTD MS data suggesting that the CTD code is simplistic and trends towards single phosphorylation of a given heptad span (4, 5).

3.2.2 Analysis of Drosophila melanogaster CTD

As described above, our innovative LC-UVPD-MS approach proved applicable to the study of modifications within the heptads of native yCTD. An exciting benefit of this technology is that it should be widely applicable to other organisms regardless of divergence from the consensus sequence. A prime example of such a CTD is that of Drosophila melanogaster (fruit fly), in which only two of its forty-five heptad repeats faithfully recreate the consensus heptad, thus precluding analysis by antibodies generated against consensus heptads. Due to the heterogeneity of the DmCTD sequence, these analyses have the additional benefit of permitting us to map phosphorylations along the full-length DmCTD. This inherent variation from the consensus sequence allows us to interrogate the role of flanking residues within the same heptad on the preferential site of phosphorylation.
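The divergence of the DmCTD from the consensus can be illustrated by splitting a stretch of its sequence into tyrosine-initiated repeats and comparing each to YSPTSPS. The Python sketch below uses a 42-residue fragment taken from the full-length sequence listed in Table 3-1; the function names are ours, and splitting at every tyrosine is a simplifying assumption that would not hold for the DmCTD repeats that lack Tyr1 or are not seven residues long.

```python
import re

CONSENSUS = "YSPTSPS"

# Fragment of the DmCTD from the full-length sequence in Table 3-1 (six repeats).
dm_fragment = "YSPNSPSYSPTSPSYLPSSPSYSPTSPCYSPTSPSYSPTSPN"


def split_repeats(sequence):
    """Split a CTD sequence into repeats that each start at a tyrosine."""
    return re.findall(r"Y[^Y]*", sequence)


def divergent_positions(repeat):
    """1-based positions (Tyr1..Ser7) where a 7-residue repeat differs from consensus."""
    return [i + 1 for i, (a, b) in enumerate(zip(repeat, CONSENSUS)) if a != b]


repeats = split_repeats(dm_fragment)
for repeat in repeats:
    diffs = divergent_positions(repeat)
    label = "consensus" if not diffs else f"differs at position(s) {diffs}"
    print(repeat, label)

n_consensus = sum(repeat == CONSENSUS for repeat in repeats)
print(f"{n_consensus} of {len(repeats)} repeats in this fragment match the consensus")
```

Even in this short stretch, substitutions occur at positions 2, 4, and 7, which is the kind of local variation that allows flanking-residue effects on kinase site choice to be examined.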

Drosophila melanogaster is an important model system in the study of transcription, with many of the conclusions drawn from this model organism extending across eukaryotes (25-29). Most core developmental mechanisms are highly conserved (27), including the JAK-STAT (25), Wnt (26), and Hedgehog (30) signaling pathways and the recently characterized Jarid2-Polycomb Repressive Complex 2 (PRC2) pathway (31). Jarid2 is the founding member of the JmjC domain family of proteins and has been characterized as a transcriptional co-repressor (32, 33). PRC2 is a multi-protein complex that catalyzes H3K27 trimethylation (H3K27me3), a histone mark associated with gene repression (33). In flies and mouse embryonic stem cells (ESC), Jarid2 and PRC2 associate in vivo along chromatin at a subset of developmentally important genes (31, 33) and contribute to gene silencing by propagating H3K27me3 marks and maintaining RNAP II in the transcriptionally poised state (33). These poised RNAP II complexes are activated upon specific signals to induce rapid transcription and direct cells down specific developmental paths. Poised RNAP II stands as a conserved developmental mechanism across metazoans (34).

Recently, work in mouse ESC has demonstrated a physiological role for Erk2 as a bona fide CTD kinase for transcriptionally poised RNAP II at Jarid2-PRC2 targeted genes (24). This phosphorylation contributes to the poised RNAP II state and informs the subsequent transition to an active gene state characterized by replacement of poised RNAP II factors with canonical transcription machineries. The developmental role and mechanism of Jarid2-PRC2 is conserved between flies and mammals (31, 33), which supports a yet-to-be-explored role of Erk2 in fly development. To better understand the phosphorylation of Drosophila melanogaster CTD by Erk2, we endeavored to analyze the Erk2 phosphorylation pattern using our novel LC-UVPD-MS approach.


We generated GST fusion constructs of full-length Drosophila melanogaster CTD (GST-DmCTD). Multiple GST-DmCTD truncation constructs were also produced to reduce the overall complexity in each LC-MS run and facilitate complete sequence coverage. The sequences of the various DmCTD constructs are summarized in Table 3-1, with heptads numbered as described in Figure 3-4. The CTD1 construct contains full-length DmCTD. CTD2 and CTD3 included the N-terminal region of the protein from heptads 1-16 and 1-25, respectively, whereas CTD4 included the C-terminal region from heptads 26-45 (35). CTD5 covered an interior region of the protein spanning heptads 16-24. To digest GST-CTD1, trypsin, proteinase K, and chymotrypsin were considered. The low frequency of trypsin cleavage sites throughout the DmCTD prevents tryptic digestion into appropriately sized peptides for bottom-up LC-MS analysis. Proteinase K was rejected for its poorly defined cleavage specificity (36), a factor that would confound prediction of products within the more variable DmCTD sequence and increase the complexity of an already complex digest. Therefore, chymotrypsin was chosen as it more consistently cleaves C-terminal to aromatic residues with additional lower activity cleavage C-terminal to methionine, leucine, and histidine residues. LC-UVPD-MS analysis of unmodified GST-DmCTD chymotrypsin digests revealed peptides amenable to LC-MS and high coverage of the full-length as well as truncated DmCTD constructs.
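The cleavage rule described for chymotrypsin can be turned into a simple in silico digest to predict candidate peptides. The Python sketch below is a minimal illustration under stated assumptions: it cleaves after every Phe, Tyr, or Trp (and optionally after Leu, Met, or His), assumes complete digestion with no missed cleavages, and ignores any influence of the following residue. The function name is ours, and the example sequence is the first 37 residues of the DmCTD as listed in Table 3-1.

```python
HIGH_SPECIFICITY = set("FYW")   # aromatic residues
LOW_SPECIFICITY = set("LMH")    # lower-activity sites named in the text


def chymotrypsin_digest(sequence, include_low_specificity=False):
    """Cleave C-terminal to each site residue, assuming complete digestion."""
    sites = HIGH_SPECIFICITY | (LOW_SPECIFICITY if include_low_specificity else set())
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        # Cut after a site residue unless it is the last residue of the sequence.
        if residue in sites and i + 1 < len(sequence):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


dm_n_terminus = "SPSYSPTSPNYTASSPGGASPNYSPSSPNYSPTSPLY"

print(chymotrypsin_digest(dm_n_terminus))
print(chymotrypsin_digest(dm_n_terminus, include_low_specificity=True))
```

Because the cleavage sites are fixed by the sequence, the expected peptide list for each truncation construct can be generated in advance and compared against the peptides identified in the LC-UVPD-MS runs.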


Full length Drosophila GST-CTD
MHHHHHHSSMSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELGLEFPNLP
YYIDGDVKLTQSMAIIRYIADKHNMLGGCPKERAEISMLEGAVLDIRYGVSRIAYSKDFETLKVDF
LSKLPEMLKMFEDRLCHKTYLNGDHVTHPDFMLYDALDVVLYMDPMCLDAFPKLVCFKKRIEAI
PQIDKYLKSSKYIAWPLQGWQATFGGGDHPPKSSSLEVLFQGPGSGMSPSYSPTSPNYTASSPGGA
SPNYSPSSPNYSPTSPLYASPRYASTTPNFNPQSTGYSPSSSGYSPTSPVYSPTVQFQSSPSFAGSGSNI
YSPGNAYSPSSSNYSPNSPSYSPTSPSYLPSSPSYSPTSPCYSPTSPSYSPTSPNYTPVTPSYSPTSPNYS
ASPQYSPASPAYSQTGVKYSPTSPTYSPPSPSYDGSPGSPQYTPGSPQYSPASPKYSPTSPLYSPSSPQ
HSPSNQYSPTGSTYSATSPRYSPNMSIYSPSSTKYSPTSPTYTPTARNYSPTSPMYSPTAPSHYSPTSP
AYSPSSPTFEESED

CTD2: Heptads 1-16
MHHHHHHSSMSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELGLEFPNLP
YYIDGDVKLTQSMAIIRYIADKHNMLGGCPKERAEISMLEGAVLDIRYGVSRIAYSKDFETLKVDF
LSKLPEMLKMFEDRLCHKTYLNGDHVTHPDFMLYDALDVVLYMDPMCLDAFPKLVCFKKRIEAI
PQIDKYLKSSKYIAWPLQGWQATFGGGDHPPKSSSLEVLFQGPGSGMSPSYSPTSPNYTASSPGGA
SPNYSPSSPNYSPTSPLYASPRYASTTPNFNPQSTGYSPSSSGYSPTSPVYSPTVQFQSSPSFAGSGSNI
YSPGNAYSPSSSNY

CTD3: Heptads 1-25
MHHHHHHSSMSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELGLEFPNLP
YYIDGDVKLTQSMAIIRYIADKHNMLGGCPKERAEISMLEGAVLDIRYGVSRIAYSKDFETLKVDF
LSKLPEMLKMFEDRLCHKTYLNGDHVTHPDFMLYDALDVVLYMDPMCLDAFPKLVCFKKRIEAI
PQIDKYLKSSKYIAWPLQGWQATFGGGDHPPKSSSLEVLFQGPGSGMSPSYSPTSPNYTASSPGGA
SPNYSPSSPNYSPTSPLYASPRYASTTPNFNPQSTGYSPSSSGYSPTSPVYSPTVQFQSSPSFAGSGSNI
YSPGNAYSPSSSNYSPNSPSYSPTSPSYSPSSPSYSPTSPCYSPTSPSYSPTSPNYTPVTPSYSPTSPNYS
ASP

CTD4: Heptads 26-45
MHHHHHHSSMSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELGLEFPNLP
YYIDGDVKLTQSMAIIRYIADKHNMLGGCPKERAEISMLEGAVLDIRYGVSRIAYSKDFETLKVDF
LSKLPEMLKMFEDRLCHKTYLNGDHVTHPDFMLYDALDVVLYMDPMCLDAFPKLVCFKKRIEAI
PQIDKYLKSSKYIAWPLQGWQATFGGGDHPPKSSSLEVLFQGPGSGMQYSPASPAYSQTGVKYSP
TSPTYSPPSPSYDGSPGSPQYTPGSPQYSPASPKYSPTSPLYSPSSPQHSPSNQYSPTGSTYSATSPRY
SPNMSIYSPSSTKYSPTSPTYTPTARNYSPTSPMYSPTAPSHYSPTSPAYSPSSPTFEESED

CTD5: Heptads 16-24
MHHHHHHSSMSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELGLEFPNLP
YYIDGDVKLTQSMAIIRYIADKHNMLGGCPKERAEISMLEGAVLDIRYGVSRIAYSKDFETLKVDF
LSKLPEMLKMFEDRLCHKTYLNGDHVTHPDFMLYDALDVVLYMDPMCLDAFPKLVCFKKRIEAI
PQIDKYLKSSKYIAWPLQGWQATFGGGDHPPKSSSLEVLFQGPGSGMYSPSSSNYSPNSPSYSPTSP
SYSPSSPSYSPTSPCYSPTSPSYSPTSPNYTPVTPSYSPTSPN

Table 3-1. Drosophila melanogaster GST-CTD construct sequences.

Figure 3-4. Phosphorylations identified in Drosophila melanogaster CTD following treatment with Erk2 using LC-UVPD-MS.

A) Sites of phosphorylation in Drosophila melanogaster CTD; confirmed sites are highlighted in green. One peptide (repeat 28) shows a single phosphorylation, but the position of the phosphate could not be distinguished among three sites (shown in gold). Regions of the protein in grey were not detected in the Erk2-treated or control CTD samples. The phosphorylation map is the composite of sites identified using positive mode and negative mode LC-UVPD-MS. Representative UVPD mass spectra from positive mode (B) and negative mode (C) analysis are shown for the chymotryptic peptide SPTpSPVYSPTVQF, which covers heptads 11 and 12. In both polarities, the doubly charged ions of m/z 745.3 (positive mode) and m/z 743.3 (negative mode) were activated using 2 pulses at 2 mJ. Ions that are detected following phosphate neutral loss are denoted by “-P”. D) The “rule book” for CTD phosphorylation by Erk2. SP motifs are recognized with the strict requirement for proline (blue) following serine/threonine (orange and red). Ser5 (S, red) is favored over Ser2 (S, orange) during phosphorylation by Erk2. Thr4 (T) and Ser7 (S), shown in dashed font, had little impact on the phosphorylation outcome. An aromatic residue such as tyrosine (Y) or phenylalanine (F) is required (colored green) for phosphorylation.

To analyze phosphorylation patterning along GST-DmCTD, constructs were treated with Erk2 kinase and subjected to LC-UVPD-MS analysis (Figure 3-4A). Both positive and negative ionization modes were used to complement one another. Positive mode ionization usually affords high coverage with ample signal-to-noise ratio for standard peptides containing amino acids with basic side chains (i.e., Lys, Arg). Analyzing peptides in negative mode using UVPD helps to ensure the detection of multiply phosphorylated peptides, which typically ionize more readily as anions than as protonated cations. In this way, negative mode UVPD analysis accounts for peptides in higher phosphorylation states, which may arise from phosphorylation of neighboring heptads in multi-heptad peptides formed by missed chymotrypsin cleavages. Representative spectra for UVPD analysis are shown in Figure 3-4B (positive mode) and 3-4C (negative mode), each displaying extensive sequence coverage and confident phosphate localization. By analyzing peptides in both positive and negative ion modes, we maximized the number of phosphorylation sites identified for DmCTD.

Positive mode UVPD analysis provided the best overall sequence coverage of full-length DmCTD. Following treatment with Erk2, 22 phosphopeptides were identified, accounting for 20 unique phosphosites from 20 individual heptads (Table 3-2). Phosphorylation of two additional heptads was also detected, but could not be confidently localized to a specific serine, threonine, or tyrosine (Table 3-3). Fewer overall phosphopeptides were identified using negative mode UVPD (Tables 3-4 and 3-5); however, one phosphosite from heptad 5 that could not be pinpointed by positive mode UVPD was confidently localized upon negative mode UVPD analysis. All other site assignments agreed based on the UVPD spectra acquired for the protonated and deprotonated phosphopeptides.
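As a consistency check on the precursor values quoted in the Figure 3-4 caption, the m/z of the doubly charged, singly phosphorylated peptide SPTpSPVYSPTVQF can be recomputed from standard monoisotopic masses. The sketch below is illustrative only: the function name is hypothetical, the mass table covers only the residues in this peptide, and phosphorylation is modeled as the addition of HPO3.

```python
# Monoisotopic residue masses (Da) for the residues present in this peptide.
RESIDUE_MASS = {
    "S": 87.03203, "P": 97.05276, "T": 101.04768, "V": 99.06841,
    "Y": 163.06333, "Q": 128.05858, "F": 147.06841,
}
WATER = 18.01056       # added once per peptide for the free termini
PHOSPHATE = 79.96633   # HPO3 added per phosphorylation
PROTON = 1.00728


def peptide_mz(sequence, charge, n_phospho=0):
    """m/z of a peptide ion; positive charge = protonated, negative = deprotonated."""
    neutral = sum(RESIDUE_MASS[r] for r in sequence) + WATER + n_phospho * PHOSPHATE
    return (neutral + charge * PROTON) / abs(charge)


seq = "SPTSPVYSPTVQF"
print(round(peptide_mz(seq, +2, n_phospho=1), 1))   # positive mode, [M+2H]2+
print(round(peptide_mz(seq, -2, n_phospho=1), 1))   # negative mode, [M-2H]2-
```

Under these assumptions the two values come out at 745.3 and 743.3, matching the precursor ions selected for UVPD in positive and negative mode.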

Our use of truncated DmCTD constructs (CTD2-4) proved beneficial both in confirming the phosphosites identified and in revealing phosphosites not observed in full-length DmCTD (CTD1). Identical phosphorylation sites were detected in CTD2 and CTD3 versus full-length CTD1 (Table 3-4). CTD4, which covers the distal region of the DmCTD, shows several additional sites in negative mode analysis (Table 3-4) that were then confirmed in positive mode analysis (Table 3-2). A single unique phosphopeptide, SPApSPKYSPTSPL, was identified in both positive and negative modes only in CTD4 constructs, supporting the benefit of decreasing LC-MS complexity to identify all possible phosphorylation sites.

Sequence                        Heptad # (Phosphosite in Heptad)  Construct
TASSPGGASPNYSPSSPNYSPTpSPLY     6 (S5)                            Full
YApSPRYASTTPNFNPQSTGY           7 (S5)                            Full
ASPRYASTpTPNFNPQSTGY            8 (T5)                            Full
SPTpSPVYpSPTVQF                 11 (S5), 12 (S2)                  Full
QSpSPSFAGSGSNIY                 13 (S5)                           Full
QSSPSFAGSGSNIYpSPGNAY           15 (S2)                           Full
SPGNAYpSPSSSNY                  16 (S2)                           Full
SASPQYSPApSPAYSQTGVKY           26 (S5)                           Full
SPApSPAYSQTGVKY                 26 (S5)                           Full
SPApSPKYSPTSPL                  32 (S5)                           CTD4
YSPSSPQHSPSNQYpSPTGSTY          36 (S2)                           Full
SATpSPRYSPNMSIYSPSSTKY          37 (S5)                           Full
SATSPRYpSPNMSIY                 38 (S2)                           Full, CTD4
pSPSSTKY                        39 (S2)                           Full
SPTpSPTYpTPTARNY                40 (S5), 41 (T2)                  Full
SPTSPTYpTPTARNY                 41 (T2)                           Full, CTD4
SPTpSPMYSPTAPSHY                42 (S5)                           Full, CTD4
SPTpSPMYpSPTAPSHY               42 (S5), 43 (S2)                  Full
SPTpSPAYSPSpSPTFEESED           44 (S5), 45 (S5)                  Full

Table 3-2. CTD peptides with localized phosphosites from positive mode UVPD analysis.

Sequence                        Heptad # (Phosphosite in Heptad)  Construct
TASSPGGASPNYp(SPSS)PNYSPTpSPLY  5 (S2/S4/S5), 6 (S5)              Full
p(SPTS)PTYSPPSPSY               28 (S2/T4/S5)                     CTD4
p(SPTS)PAYSPSSPTFEESED          44 (S2/T4/S5)                     Full, CTD4

Table 3-3. CTD peptides with ambiguous phosphosites from positive mode UVPD analysis.

Sequence                        Heptad # (Phosphosite in Heptad)  Construct
SPSpSPNYSPTpSPLY                5 (S5), 6 (S5)                    Full, CTD2, CTD3
SPSSPNYSPTpSPLY                 6 (S5)                            Full, CTD2, CTD3
SPTpSPVYpSPTVQF                 11 (S5), 12 (S2)                  Full, CTD2, CTD3
SPTSPVYpSPTVQF                  12 (S2)                           Full, CTD2, CTD3
QSpSPSFAGSGSNIY                 13 (S5)                           Full, CTD2, CTD3
QSSPSFAGSGSNIYpSPGNAY           15 (S2)                           Full
SPApSPAYSQTGVKY                 26 (S5)                           Full, CTD4
SPApSPKYSPTSPL                  32 (S5)                           CTD4
SIYpSPSSTKY                     39 (S2)                           CTD4
SPTSPMYpSPTAPSHY                43 (S2)                           Full, CTD4
SPTpSPAYSPSpSPTFEESED           44 (S5), 45 (S5)                  CTD4
SPTSPAYSPSpSPTFEESED            45 (S5)                           Full, CTD4

Table 3-4. CTD peptides with localized phosphosites from negative mode UVPD analysis.

Sequence                        Heptad # (Phosphosite in Heptad)  Construct
ASp(TT)PNFNPQSTGY               8 (T4/T5)                         Full, CTD2, CTD3
SPp(TS)PVYSPTVQF                11 (T4/S5)                        Full, CTD2, CTD3
SAp(TS)PRYpSPNMSIY              37 (T4/S5), 38 (S2)               CTD4
SPp(TS)PAYSPSSPTFEESED          44 (T4/S5)                        Full, CTD4

Table 3-5. CTD peptides with ambiguous phosphosites from negative mode UVPD analysis.

Despite the additional measures taken to facilitate full characterization of DmCTD, two regions lacked sequence coverage. These two regions, heptads 17-24 and heptads 30-31 (Figure 3-4A), were not detected in positive or negative mode or in any of the truncation constructs that contained them. These regions were also not detected in untreated GST-DmCTD constructs. These findings suggest that the lack of coverage is not a result of phosphorylation of these heptads but rather is a characteristic inherent to these regions. A smaller construct, CTD5, which includes repeats 16-24, was made to allow focused study of this region, but it proved consistently resistant to digestion using chymotrypsin and resulted in sub-par MS/MS analysis. A possible culprit is the cysteine located in the Ser7 position of heptad 20, which may impede ionization through inter-protein disulfide linkages, although reduction and alkylation of the CTD5 construct did not appear to improve the sequence coverage. Furthermore, CTD5 appears to have low solubility once cleaved from the GST tag, so little of it is retained in solution after tag cleavage. Instead of employing the bottom-up approach, an alternative intact protein strategy was pursued to characterize this region, as described later in this section. Despite these uncovered regions of the DmCTD sequence, the majority of DmCTD exhibits high coverage upon LC-UVPD-MS analysis, and our findings can be summarized to derive conclusions about the distribution of phosphorylation along DmCTD and the influence of neighboring residues within heptads on Erk2 phosphorylation preference (Figure 3-4A).

Upon examination of DmCTD modifications, several characteristics of Erk2 phosphorylation become apparent (Figure 3-4D). As expected for serine/threonine kinases such as Erk2, phosphorylation occurs exclusively on serine and threonine residues. Interestingly, other residues within the heptad seem to govern phosphorylation choice. First, all observed phosphorylations occur on serine/threonine residues flanked in the +1 (succeeding) position by proline. In non-consensus heptads where proline does not follow serine, the serine is not subject to phosphorylation, as observed for the Ser5 position of repeats 9 and 10. When there is no serine/threonine at the fifth position of the consensus sequence, or if the serine/threonine at the fifth position is not flanked by proline at the +1 position, as found in heptads 16, 36, 38, 39, 41, and 43, phosphorylation at the Ser2 position can occur, provided a suitable S/T-P motif is present (Figure 3-4A). This is in line with the role of Erk2 as a mitogen-activated protein kinase that phosphorylates serine/threonine residues flanked on their C-terminal side by proline. We observe phosphorylation only at the Ser2 and Ser5 positions, with a preference for Ser5, and no phosphorylation at the Thr4 or Ser7 position.

Second, in contrast to the strict requirement of C-terminal proline after serine/threonine, residues at the 4 and 7 positions of the heptad appear inconsequential for phosphorylation choice. The fourth position in the phosphorylated heptads can be occupied by serine (S), threonine (T), glycine (G), alanine (A), or asparagine (N). No correlation between the identity of the fourth position residue and the site of phosphorylation by Erk2 was identified. The seventh position in the heptad is even more divergent, with asparagine (N), leucine (L), arginine (R), valine (V), glutamine (Q), serine (S), threonine (T), lysine (K), isoleucine (I), and methionine (M) all found in phosphorylated heptads. Furthermore, the absence of these amino acids at the 4 and 7 positions does not abolish phosphorylation within heptads (i.e., heptads 7 and 15). Taken together, these results suggest the requirements for residues at positions 4 and 7 are not stringent and have little impact on phosphorylation choice.

Third, an aromatic residue like tyrosine (Y) or phenylalanine (F) is always present in the first position in phosphorylated heptads. This position is the best conserved within the consensus sequence. Substitution by alanine (A) (heptad 4) or histidine (H) (heptad 35) does not result in phosphorylation despite the presence of suitable SP motifs.
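The three observations above can be written down as a small decision rule. The sketch below is a toy encoding of the stated rule book for a single seven-residue heptad, not a validated predictor: the function name is hypothetical, it assumes exactly seven residues numbered from the aromatic position, and it reports only the site the rules would permit, which need not be observed experimentally.

```python
def predict_erk2_site(heptad):
    """Return the favored site ('S5', 'T5', 'S2', 'T2') or None for a 7-residue heptad."""
    if len(heptad) != 7:
        raise ValueError("expected a 7-residue heptad")
    if heptad[0] not in "YF":              # aromatic residue required at position 1
        return None
    for pos in (5, 2):                     # position 5 is preferred over position 2
        residue, following = heptad[pos - 1], heptad[pos]
        if residue in "ST" and following == "P":   # S/T must be followed by proline
            return f"{residue}{pos}"
    return None                            # positions 4 and 7 are never consulted


print(predict_erk2_site("YSPTSPV"))   # heptad 11 in Table 3-2
print(predict_erk2_site("YSPTVQF"))   # heptad 12: no S/T-P motif at position 5
print(predict_erk2_site("ASPTSPS"))   # hypothetical heptad with alanine at position 1
```

For heptads 11 and 12 the rule returns S5 and S2, in agreement with the sites listed for the peptide SPTpSPVYpSPTVQF, and the hypothetical alanine-first heptad returns no site.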

As observed in yCTD, phosphorylation of DmCTD by a single kinase results in a single phosphorylation within a heptad of conventional numbering with tyrosine (Y) as the first residue. Double phosphorylation (e.g., both Ser2/Ser5) within the same consensus heptad, with Ser5 phosphorylation immediately C-terminal to Ser2 phosphorylation (i.e., YpSPTpSPS), was not observed, despite the clear ability of Erk2 to phosphorylate both sites in yCTD. Interestingly, we do observe Ser2 phosphorylation immediately following Ser5 phosphorylation (i.e., TpSPSYpSP), resulting in double phosphorylation within a seven-residue frame. In all these scenarios (repeats 11/12, 40/41, and 42/43) (Table 3-2), the second repeat does not exhibit phosphorylation at position 5, due either to the absence of serine or threonine at the fifth position or to the absence of proline at the sixth position, resulting in phosphorylation of Ser2 instead. Therefore, in the phosphorylation patterns of DmCTD, doubly phosphorylated Ser5/Ser2 is possible in addition to singly phosphorylated Ser2 or Ser5 within a seven amino acid span.

One region of DmCTD (repeats 17-24) was consistently not covered in our bottom-up analysis of full-length (CTD1) and minimal-length (CTD5, including repeats 16-24) DmCTD recombinant protein (Figure 3-4A). However, its sequence does not deviate greatly from the consensus sequence, making it a potentially good substrate for Erk2 based on our observed rulebook. Therefore, we used alternative MS approaches to detect phosphorylation. To overcome the suspected low solubility of CTD5 in the absence of the recombinant GST-tag, we used an intact protein MS strategy and measured the overall mass of the GST-CTD5 construct before and after kinase treatment. The mass spectrum of the intact wild-type GST-CTD5 exhibits two additional products after one hour of kinase treatment, consistent with the addition of one and two phosphates, respectively (Figure 3-5A). Therefore, this region of DmCTD is subject to phosphorylation.
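The intact-mass readout relies on each phosphate adding roughly 80 Da to the deconvolved mass. A minimal sketch of that conversion is shown below; the function name, the 1 Da tolerance, and the example masses are hypothetical and are not values measured in this work.

```python
PHOSPHATE_SHIFT = 79.966  # monoisotopic mass added per phosphate (HPO3), in Da


def phosphates_added(mass_untreated, mass_treated, tolerance=1.0):
    """Estimate the number of phosphates from a deconvolved intact-mass shift.

    Returns None when the shift is not within `tolerance` Da of a whole
    number of phosphate additions.
    """
    shift = mass_treated - mass_untreated
    count = round(shift / PHOSPHATE_SHIFT)
    if count < 0 or abs(shift - count * PHOSPHATE_SHIFT) > tolerance:
        return None
    return count


# Hypothetical deconvolved masses for an untreated and a kinase-treated construct.
print(phosphates_added(35000.0, 35159.9))
```

A shift of about 160 Da, as in this made-up example, would be read as two phosphates, while a shift that falls between multiples of 80 Da is flagged rather than rounded.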


Figure 3-5. Gel shift assay and intact mass analysis results for CTD5 before and after treatment with Erk2.

A) Wild-type or native sequence, B) all tyrosine residues (Tyr) were mutated to alanine (Ala), C) all tyrosine residues were mutated to phenylalanine. A portion of the mass spectra of the intact proteins is shown to illustrate the addition of phosphate groups upon treatment with Erk2.

3.2.3 Tyrosine 1 is required for CTD phosphorylation by Erk2 and other CTD kinases.

Our observed rulebook suggests a previously unexplored role of tyrosine 1 in directing CTD phosphorylation. Erk2 phosphorylation at Ser5 and Ser2 appears to require a tyrosine or phenylalanine in the first position of the heptad. To better understand the role of tyrosine in CTD phosphorylation, we replaced tyrosine in each repeat of the CTD5 construct with alanine (YtoA) or phenylalanine (YtoF). The native sequence GST-CTD5 and these two mutant variants were treated with Erk2 kinase and analyzed by SDS-PAGE gel shift analysis and intact mass spectrometry (Figure 3-5A to 3-5C). Consistent with our analysis for full-length DmCTD, the YtoA mutation abolishes phosphorylation in both gel shift and MS assays (Figure 3-5B). In fact, the untreated and Erk2-treated YtoA constructs give nearly identical mass spectra. This result suggests there is an absolute requirement for an aromatic residue at the first position of the heptad for at least some CTD kinases.

Unexpectedly, the phosphorylation of native sequence CTD5 and YtoF CTD5 is not equivalent based on the results of gel-shift analysis or MS. Native sequence CTD5 generates relatively few phosphorylated products based on the mass spectrum of the intact protein and a nearly homogeneous product band in gel-shift analysis. The one-hour reaction with Erk2 leads to the addition of only one or two phosphates on native CTD5 (Figure 3-5A). In contrast, the YtoF variant gives multiple phosphorylated species upon both MS and gel shift analysis (Figure 3-5C). These data suggest that aromatic character alone does not fully explain tyrosine’s contribution to CTD patterning and that loss of the tyrosine hydroxyl group interferes with native CTD behavior. This distinctive phosphorylation pattern is conserved across CTD kinases, as identical constructs treated with TFIIH display similar behavior (Figure 3-6). In this way, Tyr1 makes a unique contribution to the CTD code and may help punctuate the CTD code by regulating the placement of phosphorylation marks for multiple CTD kinases.

[Figure 3-6 gel image. Lane labels: GST-CTD5 substrate: WT, YtoF, WT, YtoA; TFIIH: -, +, +, +]

Figure 3-6. Gel shift analysis of CTD5 constructs treated with TFIIH kinase.

3.2.4 Tyrosine 1 limits the addition of phosphates to GST-CTD substrate.

Tyrosine 1 appears to make important contributions to CTD phosphorylation in a manner not entirely dependent on its aromatic character. To better understand the contribution of tyrosine in the first position of the heptad, we compared full-length GST-yCTD constructs of either wild-type sequence or with all tyrosines substituted with phenylalanine. These constructs were treated with Erk2 kinase under saturating amounts of ATP for a long period of time (16 hours). This ensured complete phosphorylation of the substrate. These samples were then analyzed using MALDI mass spectrometry (MALDI-MS) to determine the intact mass (Figure 3-7). These analyses reveal a difference in the total number of phosphates added to the substrate. The wild-type construct accepts approximately 38 phosphates, which is consistent with 1.5 phosphates per heptad along the 26-heptad construct. The YtoF mutant, on the other hand, accepts 52 phosphates, which corresponds to phosphorylation of every Ser2 and Ser5 position. These data support a role for tyrosine 1 in modulating kinase specificity and final phosphorylation outcomes. These observations, in combination with the data presented above, warrant further investigation and may explain the high conservation of tyrosine residues in even the most divergent CTD sequences (1).
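The stoichiometry quoted above follows from simple arithmetic on the 26-heptad yeast construct, sketched here for clarity. The helper name is hypothetical, and the calculation assumes that Ser2 and Ser5 are the only candidate sites, giving two sites per heptad.

```python
HEPTADS = 26            # repeats in the yeast CTD construct, per the text
SITES_PER_HEPTAD = 2    # Ser2 and Ser5


def summarize(phosphates):
    """Return (phosphates per heptad, fraction of Ser2/Ser5 sites occupied)."""
    per_heptad = phosphates / HEPTADS
    occupancy = phosphates / (HEPTADS * SITES_PER_HEPTAD)
    return round(per_heptad, 2), round(occupancy, 2)


for label, n in (("wild-type", 38), ("YtoF", 52)):
    print(label, summarize(n))
```

On this basis the wild-type construct carries about 1.46 phosphates per heptad (roughly 73% of the Ser2/Ser5 sites), whereas the YtoF variant reaches two per heptad, that is, full occupancy.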

[Figure 3-7 plot. Axes: relative intensity (%) versus molecular weight (48000-54000). Traces: Erk2-treated CTD wt (+38 phosphates) and CTD YtoF (+52 phosphates)]

Figure 3-7. MALDI-MS analysis of wild-type and YtoF sequence GST-yCTD treated under saturating Erk2 conditions.

3.3 CONCLUSION AND PERSPECTIVE

Our novel LC-UVPD-MS strategy provides high-resolution PTM identification and localization along native CTD sequences and across species. In addition to attaining residue-level modification information, we have overcome previous limitations to MS/MS analysis of the CTD by utilizing alternative proteases and peptide fragmentation via UVPD. These innovations allow us to analyze native CTD sequence without the need to introduce protease sites. Additionally, UVPD is applicable to both positive and negative mode analysis, facilitates the detection of PTMs easily lost in other fragmentation methods, and generates a greater number of diagnostic fragment ions. Because of these factors, LC-UVPD-MS localizes phosphorylation marks with greater sensitivity and confidence than other fragmentation methods currently reported for CTD MS. Impressively, we were able to identify 22 phosphorylation sites in the DmCTD.

A major determinant for the phosphorylation state of the CTD appears to be the identity and positioning of flanking residues. Using the highly divergent DmCTD, we were able to interrogate how flanking residues affect Ser2 and Ser5 phosphorylation by Erk2 (Figure 3-4). The strictest rule demands that a proline residue follow the serine/threonine subject to phosphorylation. Interestingly, previous work from us and other labs has shown that this proline is also essential for dephosphorylation, through both its identity and isomerization state (37-39). Proline is unique among the natural amino acids because it can stably assume the cis-isomer conformation about its peptide bond, which is required for recognition by the essential CTD phosphatase Ssu72 (37). Therefore, the presence of proline directs both the phosphorylation and dephosphorylation of CTD, and the isomerization state informs the half-life of such marks in the context of eukaryotic transcription. In contrast, residues at the 4 and 7 positions of the consensus heptad have little bearing on phosphorylation choice. Positions 4 and 7 can be occupied by a variety of amino acids or be absent entirely with no obvious impact on phosphorylation. In light of this, the recent MS analyses of yeast (4) and human (5) CTD, which incorporate Ser7 position mutations, are likely physiologically relevant for at least some CTD kinase phosphorylation patterns. For potential phosphorylation sites, phosphorylation at the 5 position is favored by Erk2 unless a suitable S/T-P motif is absent. In such situations, phosphorylation in the 2 position can occur when a suitable S/T-P motif is present. In addition to the strict requirement of an S/T-P motif, our results highlight the requirement of an aromatic residue at the first position of the heptad repeat for the phosphorylation of residues at the Ser2 or Ser5 position. This might explain the observation that, of the seven residues of the consensus heptad, tyrosine at the 1 position appears to be the best conserved even in highly divergent sequences like DmCTD (1). Additionally, substitution of phenylalanine does not fully compensate for tyrosine at the first position of the heptad and points to an interesting mechanism in which tyrosine 1 influences CTD coding events.

The rules established for CTD kinase patterning in our analysis open interesting avenues for future research. The importance of some of the residues within the consensus CTD sequence has been known for some time (40, 41). However, the influence that alternative flanking residues can have on CTD phosphorylation choice has not been previously elucidated in such high resolution or in the context of the full-length native sequence CTD. Importantly, our approach requires no mutation of the native CTD sequence for analysis. Because of this, RNAP II from cell lines exhibiting transcription defects, for instance due to the knock-down of certain transcription factors, can be purified directly and subjected to MS analysis to determine the CTD phosphorylation pattern. Implementation of such methods allows for the direct correlation of the PTM state of the CTD to transcriptional status and represents a critical step towards a full understanding of the CTD code.

3.4 MATERIALS AND METHODS

3.4.1 Materials.

TFIIH (Cdk7/cyclinH/MAT1) was obtained from Millipore (Temecula, CA). Erk2 (p42 MAP Kinase) was obtained from New England Biolabs (Ipswich, MA). Sequencing grade chymotrypsin was obtained from Promega (Madison, WI) and MS grade Pierce trypsin was obtained from Thermo Fisher Scientific (Grand Island, NY). LC-MS grade solvents were obtained from EMD Millipore (Temecula, CA). Integrafrit columns (360 µm O.D. x 100 µm I.D.) and picofrit columns (360 µm O.D. x 75 µm I.D. x 30 µm emitter tip I.D.) were purchased from New Objective (Woburn, MA). Other reagents were obtained from Sigma (St. Louis, MO).

3.4.2 Protein expression and purification.

All GST-CTD coding sequences were subcloned into pET28a (Novagen) derivative vectors encoding an N-terminal His-tag followed by a GST-tag and a 3C-protease site. The yeast CTD (yCTD) coding sequence was amplified from Saccharomyces cerevisiae genomic DNA. Drosophila melanogaster CTD constructs (CTD1-4) were amplified from previously established plasmids (35). CTD1 acquired a point mutation during sub-cloning (corresponding to the Drosophila melanogaster RNA polymerase II (Sequence ID: gb|AAA28868.1|) S1695L substitution). This mutation was not present in other constructs and this region was not covered in any MS analysis (Figure 3-4A, heptad 19). Therefore, this point mutation is unlikely to affect our results. CTD5 coding sequences, including WT and YtoA variants, were synthesized by IDT as linear DNA and subcloned into the same vector.

Proteins were overexpressed in E. coli BL21 (DE3) cells by growing at 37°C in Luria-Bertani media containing 50 μg/mL kanamycin to an OD600 of 0.4-0.6. Expression was induced by addition of isopropyl-β-D-thiogalactopyranoside (IPTG) to a final concentration of 0.5 mM. After induction, the cultures were grown at 16°C for an additional 16 hours. The cells were pelleted and lysed via sonication in lysis buffer (50 mM Tris-HCl pH 8.0, 500 mM NaCl, 15 mM imidazole, 10% glycerol, 0.1% Triton X-100, 10 mM β-mercaptoethanol (BME)). Lysate was cleared by centrifugation at 15,000 rpm for 45 minutes at 4°C. The supernatant was initially purified using Ni-NTA (Qiagen) beads and eluted with elution buffer (50 mM Tris-HCl pH 8.0, 500 mM NaCl, 200 mM imidazole, and 10 mM BME). The protein was dialyzed against gel filtration buffer (20 mM Tris-HCl pH 8.0, 50 mM NaCl, 10 mM BME) at 4°C overnight. Finally, this sample was concentrated and run on a Superdex 200 gel filtration column (GE). Homogeneity of the eluted fractions was determined via Coomassie Brilliant Blue stained SDS-PAGE.

3.4.3 Kinase treatment of GST-CTD constructs.

TFIIH reactions were performed using 1.4 μg Cdk7/cyclinH/MAT1 complex against 20 μg of GST-CTD substrate in a total reaction volume of 20 μL in a buffer containing 8 mM MOPS/NaOH pH 7.0, 0.2 M EDTA, 1 mM ATP, and 10 mM magnesium acetate. The reaction was performed at 30°C for 16 hours in a Mastercycler PCR machine (Eppendorf North America). Erk2 reactions were performed using 50 U of Erk2 and 10-20 μg of GST-CTD substrate in a total reaction volume of 40 μL in a buffer containing 50 mM Tris-HCl pH 7.5, 0.5 mM ATP, 10 mM magnesium chloride, 0.1 mM EDTA, 2 mM DTT, and 0.01% Brij 35. The reaction was performed at 30°C for 1 hour (for GST-yCTD and CTD5 constructs) or 2 hours (for GST-CTD1-4) in a Mastercycler PCR machine (Eppendorf North America). All kinase controls were performed in an identical manner to the reactions described above, but an equal volume of sterile deionized water was used in place of kinase.

3.4.4 Sample preparation for mass spectrometry analysis.

GST-yCTD samples were prepared for bottom-up analysis using a two-step proteolysis method. First, overnight digestion at 37 °C with trypsin was carried out using a 1:50 enzyme to substrate ratio to cleave within the GST portion of the protein while leaving the Lys-free/Arg-free 26mer yCTD intact. The resulting digest was passed through a 10 kDa molecular weight cutoff (MWCO) filter to both remove tryptic GST peptides and buffer exchange the retained 26mer into 50 mM Tris-HCl containing 10 mM CaCl2 (pH 8) in preparation for subsequent proteinase K digestion. Proteinase K was added in a 1:100 ratio and digestion proceeded overnight at 37 °C. Samples were diluted to 1 μM in 0.2% formic acid for LC-MS analysis.

GST-CTD1-5 samples were reduced for 30 minutes at 55 °C using 5 mM dithiothreitol followed by alkylation of reduced cysteines for 30 minutes at room temperature in the dark using 15 mM iodoacetamide. Samples were then diluted into 100 mM Tris-HCl containing 10 mM CaCl2 (pH 8) and digested overnight at room temperature with chymotrypsin using a 1:50 enzyme to substrate ratio. Digests were quenched by the addition of 0.5% trifluoroacetic acid and desalted on C18 spin columns. Samples were resuspended to 1 µM in 0.1% formic acid for bottom-up LC-MS analysis.

For intact protein analysis, CTD5 constructs were buffer exchanged into 0.1% formic acid using a 7 kDa MWCO Zeba size exclusion spin column (Thermo Fisher Scientific). Samples were concentrated to 1 mg/mL prior to analysis.

3.4.5 Mass spectrometry, liquid chromatography, and ultraviolet photodissociation.

Bottom-up analysis of the yCTD was performed on a Velos Pro dual linear ion trap mass spectrometer (Thermo Fisher Scientific, San Jose, CA) equipped with a Coherent ExciStar XS excimer laser (Santa Clara, CA) operated at 193 nm and 500 Hz as previously described for UVPD (9, 42). Two pulses at 2 mJ were used for photodissociation. Separations were carried out on a Dionex Ultimate 3000 nano liquid chromatograph configured for preconcentration. Integrafrit trap columns were packed to 3.5 cm using 5 µm Michrom Magic C18, while picofrit analytical columns were packed to 20 cm using 3.5 µm Waters Xbridge BEH C18 (Milford, MA). Mobile phase A was water and B was acetonitrile, each containing 0.1% formic acid. Peptides were loaded onto the trap column for 5 minutes in aqueous solvent containing 2% acetonitrile and 0.1% formic acid at a flow rate of 5 µL/min. Separations occurred over a 20 minute linear gradient in which the percent B was increased from 2-15% during the first 15 minutes and further increased to 35% during the last 5 minutes. The flow rate was maintained at 0.3 µL/min during the separation. A top-seven data-dependent acquisition method was first used to identify the main phosphorylated species. A targeted analysis followed in which m/z 818, corresponding to the singly phosphorylated heptad peptide, was continually selected for UVPD activation (between MS1 acquisitions that occurred after every five MS/MS events) in order to better resolve partially co-eluting phospho-isomers. The resulting UVPD spectra were manually interpreted.
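For readers reproducing the yCTD separation, the two-segment gradient described above can be expressed as a piecewise-linear function of time. This is a sketch assuming strictly linear ramps within each segment and ignoring instrument delay volume; the function name is hypothetical.

```python
def percent_b(t_min):
    """Percent mobile phase B at time t (minutes) for the 20-minute yCTD gradient."""
    if not 0 <= t_min <= 20:
        raise ValueError("time outside the 0-20 minute gradient")
    if t_min <= 15:
        return 2 + (15 - 2) * t_min / 15        # 2% -> 15% B over 0-15 min
    return 15 + (35 - 15) * (t_min - 15) / 5    # 15% -> 35% B over 15-20 min


for t in (0, 15, 17.5, 20):
    print(t, percent_b(t))
```

The steeper final segment (4% B per minute versus under 1% B per minute) reflects the wording of the method, in which most of the separation time is spent at low organic content.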

DmCTD peptides were analyzed by LC-MS in both positive and negative modes. Negative mode analysis was performed on the Velos Pro mass spectrometer and Dionex nano LC equipped with C18 columns as described above. To facilitate the formation of peptide anions, methanol was used in place of acetonitrile in mobile phase B, and 0.1% trifluoroethanol (TFE) was added to all mobile phases in place of formic acid. The loading solvent consisted of 98% water, 2% methanol, and 0.1% TFE. Following sample loading at 5 µL/min for 3 minutes, a 50 minute linear gradient from 2-90% B at a flow rate of 0.25 µL/min was used for separations. MS1 spectra were acquired from m/z 400-2000 and the top eight most abundant ions were selected for UVPD using a single 2 mJ pulse. Dynamic exclusion was enabled with an exclusion duration of 8.00 seconds. The MassMatrix database search engine was used to interpret the negative mode UVPD spectra.

An Orbitrap Fusion Tribrid mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) equipped with a Coherent ExciStar XS excimer laser operated at 193 nm was used for positive mode LC-MS analysis of the DmCTD. The Fusion mass spectrometer was modified for UVPD as described earlier (43). Nano LC conditions were analogous to those described for separations of the yeast CTD, except that peptides were loaded directly onto the C18 analytical column and separated over 60 minutes using a gradient from 2-40% B. Photoactivation was achieved using 2 pulses at 2 mJ in a 3 ms top speed data-dependent method. All data were acquired in the Orbitrap analyzer, where MS1 and MS2 spectra were collected at resolving powers of 60K and 15K (at m/z 200), respectively. Data analysis was performed using Proteome Discoverer 2.0.

LC-MS analysis of intact CTD5 constructs was carried out on the Orbitrap Fusion Tribrid mass spectrometer. The nano LC system was set up with a 3 cm preconcentration column and a 25 cm analytical column containing PLRP-S resin (5 µm, 1000 Å) and operated under acidic conditions as described for positive mode bottom-up analysis. Each construct was preconcentrated online, followed by separation using a fast ramp from 2-23% B over 5 minutes and then a shallower gradient from 23-50% B over 25 minutes. All MS1 data was collected at a resolving power of 240K at m/z 200. To improve spectral signal-to-noise prior to deconvolution, the maximum number of informative spectra were averaged together and subsequently Xtracted at a S/N threshold of 3 to obtain the deconvolved mass of each construct.
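
The idea behind deconvolving an intact-protein charge-state envelope can be illustrated with a short sketch. This is not the Xtract algorithm used here; it only shows the textbook relationship between two adjacent charge-state peaks and the neutral mass, assuming protonated cations. The function names and the example peak positions are hypothetical.

```python
PROTON = 1.007276  # Da


def charge_from_adjacent(mz_low_charge, mz_high_charge):
    """Charge z of the peak at mz_low_charge, given the neighbouring peak
    carrying z + 1 protons (which appears at the lower m/z value)."""
    return round((mz_high_charge - PROTON) / (mz_low_charge - mz_high_charge))


def neutral_mass(mz, z):
    """Neutral mass of a protonated ion observed at m/z with charge z."""
    return z * (mz - PROTON)


# Hypothetical adjacent peaks from one charge-state envelope.
z = charge_from_adjacent(1201.007276, 1171.738983)
mass = neutral_mass(1201.007276, z)
```

Averaging the neutral mass obtained from every peak in the envelope gives a more robust estimate than any single pair.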

3.4.6 Data analysis.

Database search was used to interpret the results from both negative and positive mode UVPD analysis of the DmCTD. Regardless of the program used, all data was searched against a forward and reverse FASTA database containing only Drosophila melanogaster and yeast GST-CTD sequences. Phosphorylation of serine, threonine, and tyrosine was set as a variable modification in all searches, and carbamidomethyl was a fixed modification of cysteines in only the positive mode searches (reduction and alkylation was not carried out prior to negative mode analysis). MassMatrix Xtreme 3.0.10.16, which is programmed to search for a, x, c, z, and y type product ions, was used to interpret the negative UVPD results. Peptide mass and fragment mass tolerances were ±1.00 Da and ±0.80 Da, respectively, and the minimum pp score was 6.0 while the minimum pptag score was 3.0. All sites of phosphorylation reported by MassMatrix were manually verified due to the lack of companion PTM localization software for negative mode fragmentation results.

Positive UVPD data was analyzed in Proteome Discoverer 2.0 using Sequest HT database search and ptmRS site localization software. Prior to database search, a non-fragment filter was applied to remove precursor peaks from MS/MS spectra within a 1 Da window offset. The precursor mass tolerance was 10 ppm, and the fragment mass tolerance was 0.02 Da. All possible product ions including a, b, c, x, y, and z ions were considered for spectrum matching. PSMs were validated using a fixed value PSM validator which filters matches based on a maximum Delta Cn of 0.05. Strict and relaxed target FDR settings were 0.01 and 0.05, respectively, for both PSMs and peptides. Phosphorylation site localization was achieved using ptmRS operating in PhosphoRS mode. Only sites with greater than 99% isoform confidence probability were considered localized without further manual inspection.
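
Searching a combined forward and reverse database allows the false discovery rate (FDR) to be estimated from the number of reverse (decoy) hits. The sketch below illustrates that general target-decoy idea only; it is not the validation procedure implemented in Proteome Discoverer or MassMatrix, and the function name, scores, and decoy labels are hypothetical.

```python
def score_cutoff_for_fdr(psms, max_fdr):
    """Return the lowest score cutoff whose estimated FDR stays <= max_fdr.

    psms: list of (score, is_decoy) pairs, where a higher score is a better match.
    The FDR at a cutoff is estimated as (#decoys) / (#targets) among all
    matches scoring at or above that cutoff. Returns None if no cutoff works.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical peptide-spectrum matches: (score, matched a reversed sequence?)
psms = [(9.1, False), (8.7, False), (8.2, False), (7.9, True),
        (7.5, False), (6.8, False), (6.1, True), (5.9, False)]
strict = score_cutoff_for_fdr(psms, 0.01)
relaxed = score_cutoff_for_fdr(psms, 0.25)
```

A stricter FDR setting yields a higher score cutoff and therefore fewer, more reliable identifications.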

3.4.7 Gel shift analysis of CTD5.

1 μg of GST-CTD5 substrate from the reactions described above was combined with 2X Laemmli buffer to a final volume of 10 μL and boiled at 95 °C for 5 minutes. The entire sample was loaded onto a 15% SDS-PAGE gel and separated at 200 V for 50 minutes at room temperature in a Mini-PROTEAN Tetra Cell (Bio-Rad). The gel was stained with Coomassie Brilliant Blue and destained until bands were clearly visible. Gels were imaged in a ChemiDoc™ MP System (Bio-Rad).

3.4.8 MALDI-MS analysis of GST-yCTD.

5 μg of GST-yCTD protein from the kinase reaction described above was equilibrated with dilute trifluoroacetic acid (TFA) to a final concentration of 0.01% TFA and a pH of < 4. These samples were desalted using ZipTip (Millipore) tips according to manufacturer instructions. These samples were mixed 1:1 with a 2,5-dihydroxybenzoic acid (DHB) matrix solution and spotted on a stainless steel sample plate. The spots were allowed to crystallize at ambient temperature and pressure. MALDI-MS spectra were obtained on an AB Voyager-DE PRO MALDI-TOF instrument with manual adjustment of instrument parameters to ensure the greatest signal to noise. Sample masses were determined by a single point calibration against the untreated GST-yCTD wild-type construct mass (~48 kDa). Data analysis and noise reduction were performed in DataExplorer (AB) software. The final data was visualized and interpreted in RStudio using the ggplot2 package.

3.5 REFERENCES

1. Eick D & Geyer M (2013) The RNA polymerase II carboxy-terminal domain (CTD) code. Chemical reviews 113(11):8456-8490.
2. Heidemann M, Hintermair C, Voss K, & Eick D (2013) Dynamic phosphorylation patterns of RNA polymerase II CTD during transcription. Biochimica et biophysica acta 1829(1):55-62.
3. Riley NM & Coon JJ (2016) Phosphoproteomics in the Age of Rapid and Deep Proteome Profiling. Analytical chemistry 88(1):74-94.
4. Suh H, et al. (2016) Direct Analysis of Phosphorylation Sites on the Rpb1 C-Terminal Domain of RNA Polymerase II. Molecular cell 61(2):297-304.
5. Schuller R, et al. (2016) Heptad-Specific Phosphorylation of RNA Polymerase II CTD. Molecular cell 61(2):305-314.
6. Palumbo AM, et al. (2011) Tandem mass spectrometry strategies for phosphoproteome analysis. Mass spectrometry reviews 30(4):600-625.
7. Brown R, Stuart SS, Houel S, Ahn NG, & Old WM (2015) Large-Scale Examination of Factors Influencing Phosphopeptide Neutral Loss during Collision Induced Dissociation. Journal of the American Society for Mass Spectrometry 26(7):1128-1142.
8. Thompson MS, Cui W, & Reilly JP (2007) Factors that impact the vacuum ultraviolet photofragmentation of peptide ions. Journal of the American Society for Mass Spectrometry 18(8):1439-1452.
9. Madsen JA, Boutz DR, & Brodbelt JS (2010) Ultrafast ultraviolet photodissociation at 193 nm and its applicability to proteomic workflows. Journal of proteome research 9(8):4205-4214.
10. Robinson MR, Madsen JA, & Brodbelt JS (2012) 193 nm ultraviolet photodissociation of imidazolinylated Lys-N peptides for de novo sequencing. Analytical chemistry 84(5):2433-2439.

11. Greer SM, Parker WR, & Brodbelt JS (2015) Impact of Protease on Ultraviolet Photodissociation Mass Spectrometry for Bottom-up Proteomics. Journal of proteome research 14(6):2626-2632.
12. Brodbelt JS (2014) Photodissociation mass spectrometry: new tools for characterization of biological molecules. Chemical Society reviews 43(8):2757-2783.
13. Greer SM, Cannon JR, & Brodbelt JS (2014) Improvement of shotgun proteomics in the negative mode by carbamylation of peptides and ultraviolet photodissociation mass spectrometry. Analytical chemistry 86(24):12285-12290.
14. Madsen JA, Kaoud TS, Dalby KN, & Brodbelt JS (2011) 193-nm photodissociation of singly and multiply charged peptide anions for acidic proteome characterization. Proteomics 11(7):1329-1334.
15. Shaw JB, Madsen JA, Xu H, & Brodbelt JS (2012) Systematic comparison of ultraviolet photodissociation and electron transfer dissociation for peptide anion characterization. Journal of the American Society for Mass Spectrometry 23(10):1707-1715.
16. Han SW, et al. (2012) Tyrosine sulfation in a Gram-negative bacterium. Nat Commun 3:1153.
17. Luo Y, et al. (2013) Novel modifications on C-terminal domain of RNA polymerase II can fine-tune the phosphatase activity of Ssu72. ACS chemical biology 8(9):2042-2052.
18. Madsen JA, et al. (2013) Concurrent automated sequencing of the glycan and peptide portions of O-linked glycopeptide anions by ultraviolet photodissociation mass spectrometry. Analytical chemistry 85(19):9253-9261.
19. Robinson MR, Moore KL, & Brodbelt JS (2014) Direct identification of tyrosine sulfation by using ultraviolet photodissociation mass spectrometry. Journal of the American Society for Mass Spectrometry 25(8):1461-1471.
20. Fort KL, et al. (2016) Implementation of Ultraviolet Photodissociation on a Benchtop Q Exactive Mass Spectrometer and Its Application to Phosphoproteomics. Analytical chemistry 88(4):2303-2310.
21. Thomas MC & Chiang CM (2006) The general transcription machinery and general cofactors. Critical reviews in biochemistry and molecular biology 41(3):105-178.
22. Jeronimo C & Robert F (2014) Kin28 regulates the transient association of Mediator with core promoters. Nature structural & molecular biology 21(5):449-455.
23. Mayfield JE, Burkholder NT, & Zhang YJ (2016) Dephosphorylating eukaryotic RNA polymerase II. Biochimica et biophysica acta 1864(4):372-387.
24. Tee WW, Shen SS, Oksuz O, Narendra V, & Reinberg D (2014) Erk1/2 activity promotes chromatin features and RNAPII phosphorylation at developmental promoters in mouse ESCs. Cell 156(4):678-690.

25. Arbouzova NI & Zeidler MP (2006) JAK/STAT signalling in Drosophila: insights into conserved regulatory and cellular functions. Development 133(14):2605-2616.
26. Nusse R (2005) Wnt signaling in disease and in development. Cell research 15(1):28-32.
27. Pires-daSilva A & Sommer RJ (2003) The evolution of signalling pathways in animal development. Nature reviews. Genetics 4(1):39-49.
28. Reiter LT, Potocki L, Chien S, Gribskov M, & Bier E (2001) A systematic analysis of human disease-associated gene sequences in Drosophila melanogaster. Genome research 11(6):1114-1125.
29. Rubin GM, et al. (2000) Comparative genomics of the eukaryotes. Science 287(5461):2204-2215.
30. Ingham PW, Nakano Y, & Seger C (2011) Mechanisms and functions of Hedgehog signalling across the metazoa. Nature reviews. Genetics 12(6):393-406.
31. Herz HM, et al. (2012) Polycomb repressive complex 2-dependent and -independent functions of Jarid2 in transcriptional regulation in Drosophila. Molecular and cellular biology 32(9):1683-1693.
32. Klose RJ, Kallin EM, & Zhang Y (2006) JmjC-domain-containing proteins and histone demethylation. Nature reviews. Genetics 7(9):715-727.
33. Margueron R & Reinberg D (2011) The Polycomb complex PRC2 and its mark in life. Nature 469(7330):343-349.
34. Price DH (2008) Poised polymerases: on your mark...get set...go! Molecular cell 30(1):7-10.
35. Zhang Z & Gilmour DS (2006) Pcf11 is a termination factor in Drosophila that dismantles the elongation complex by bridging the CTD of RNA polymerase II to the nascent transcript. Molecular cell 21(1):65-74.
36. Betzel C, et al. (1988) X-ray and model-building studies on the specificity of the active site of proteinase K. Proteins 4(3):157-164.
37. Mayfield JE, et al. (2015) Chemical tools to decipher regulation of phosphatases by proline isomerization on eukaryotic RNA polymerase II. ACS Chem. Biol. 10(10):2405-2414.
38. Werner-Allen JW, et al. (2011) cis-Proline-mediated Ser(P)5 dephosphorylation by the RNA polymerase II C-terminal domain phosphatase Ssu72. The Journal of biological chemistry 286(7):5717-5726.
39. Xu YX, Hirose Y, Zhou XZ, Lu KP, & Manley JL (2003) Pin1 modulates the structure and function of human RNA polymerase II. Genes & development 17(22):2765-2776.
40. West ML & Corden JL (1995) Construction and analysis of yeast RNA polymerase II CTD deletion and substitution mutations. Genetics 140(4):1223-1233.
41. Schwer B & Shuman S (2011) Deciphering the RNA polymerase II CTD code in fission yeast. Molecular cell 43(2):311-318.

42. Gardner MW, Vasicek LA, Shabbir S, Anslyn EV, & Brodbelt JS (2008) Chromogenic cross-linker for the characterization of protein structure by infrared multiphoton dissociation mass spectrometry. Analytical chemistry 80(13):4807-4819.
43. Klein DR, Holden DD, & Brodbelt JS (2016) Shotgun Analysis of Rough-Type Lipopolysaccharides Using Ultraviolet Photodissociation Mass Spectrometry. Analytical chemistry 88(1):1044-1051.

Chapter 4: Cross talk of phosphorylation marks within the CTD code.

ABSTRACT

In previous chapters, we obtained direct evidence for cross talk between post-translational modifications within the CTD code (Chapter 2) and developed mass spectrometry methods to study phosphorylation patterning at high resolution along the CTD (Chapter 3). These advances in both the understanding of the CTD code and the technology to study it provide a theoretical and experimental basis to investigate interactions between CTD phosphorylation marks and subsequent coding events. Here, we take the first steps to characterize CTD phosphorylation cross talk, or how phosphorylations coexist and influence one another within the CTD code. Utilizing intact mass analysis, the upper limit of phosphorylation marks added to a full-length CTD by single and tandem kinase treatments with TFIIH, P-TEFb, and Abl kinase is determined. Subsequently, we investigate the impact of tyrosine 1 phosphorylation on downstream CTD phosphorylation events by TFIIH and P-TEFb. It is revealed that previously installed tyrosine 1 phosphorylations influence the phosphorylation preference of P-TEFb but not TFIIH. Docking and mutagenesis analysis of Erk2 provides a potential structural explanation for this shift in phosphorylation preference. This addresses inconsistencies between P-TEFb’s in vitro and in vivo specificity. Furthermore, it reveals a new layer of CTD regulation in which previously installed marks direct the identity and abundance of subsequent coding events.

4.1 INTRODUCTION

Phosphorylation of the CTD is clearly important for transcription progression, and specific phosphorylation marks rise and fall consistently across genes (1-4). For instance, Ser5 phosphorylation is the hallmark of promoter escape (5), and Ser2 phosphorylation coincides with the transition to productive elongation and is high at transcription termination (1, 6, 7). The timing of these marks is obviously regulated, but exactly how this is accomplished in cells is not well understood. Localization of the kinase clearly plays a role. TFIIH, for example, is localized to the pre-initiation complex (PIC), and Ser5 phosphorylation rises at the very beginning of transcription (5). Similarly, Ser2 phosphorylation does not rise until the polymerase reaches P-TEFb at the promoter-paused region in mammals (6). A complementary mechanism in which previously installed phosphorylations direct downstream CTD modifiers, like kinases, for specific modification events may also exist. This could be accomplished through generating binding motifs for particular kinases or by altering the specificity of kinases already localized with the CTD.

Few investigations of how previous marks affect CTD kinase function have been performed (8). However, the impact of other phosphorylation marks on P-TEFb function was studied using synthetic CTD peptides and a combination of molecular biology and mass spectrometry (8). That study made three major findings. First, P-TEFb has Ser5 kinase activity in vitro. This is in direct opposition to its physiological role as a CTD Ser2 kinase (9). Second, P-TEFb is not capable of generating double phosphorylation (Ser2/Ser5) marks within a single heptad, and the total number of phosphates added to a synthetic peptide substrate (3 heptads) is equal to the number of heptads. Third, phosphorylation of a heptad at Ser2, Ser5, or Tyr1 prevents further phosphorylation within the same heptad, whereas phosphorylation at Ser7 does not. In addition, the activity of P-TEFb is stimulated by Ser7 pre-phosphorylation, which provides evidence that kinases are sensitive to previously installed phosphorylation marks. However, this previous phosphorylation did not alter P-TEFb’s Ser5 preference (8).

Interestingly, despite the in vitro co-occurrence of CTD phosphorylations within a single heptad (8), recent data suggest these are minor events in vivo (10, 11). In both yeast and human cells, double phosphorylations of heptads were observed using mass spectrometry, but they were a minor species (~30-fold less abundant) compared to singly phosphorylated heptads (10). However, the authors did observe trends for marks co-occurring in the same and neighboring heptads, suggesting there may be some mechanism by which nearby phosphorylation marks potentiate subsequent coding events.

Because of the sparse data on this topic, we sought to begin initial investigations of how phosphorylations of the CTD coexist and how previously installed modifications impact the action of subsequent kinases. We studied the phosphorylation of Saccharomyces cerevisiae CTD constructs (GST-yCTD) treated in vitro with Abl kinase, TFIIH, and P-TEFb. These constructs were treated with individual kinases or kinases in tandem. Their intact masses were determined and an upper limit for phosphorylation addition was determined. These constructs were further analyzed and revealed a unique impact of Tyr1 phosphorylation on P-TEFb patterning. This impact is further explored in human cells and at the level of protein structure.

4.2 RESULTS AND DISCUSSION

4.2.1 Total amounts of CTD phosphorylation in full length CTD are dictated by heptad number.

Data from our group (Chapter 3) and others (10, 11) suggest that CTD phosphorylation occurs primarily in a single-phosphate-per-heptad manner both in vitro and in vivo. Although a minor complement of doubly phosphorylated heptads was identified in both yeast and human cells, the majority of peptide species contain a single phosphate per heptad (~30-fold more) (10). In cells, this can arise either from the inherent substrate preference of CTD kinases or from the action of CTD phosphatases that deplete the steady-state level of doubly phosphorylated heptads. To more directly test these models, we sought to investigate the number of phosphorylations added to a GST fusion of the Saccharomyces cerevisiae CTD (GST-yCTD) by three CTD kinases: the kinase core subunits of TFIIH (referred to as TFIIH), P-TEFb (referred to as P-TEFb), and Abl kinase. To determine the total number of phosphates added to GST-yCTD substrate by the action of a single kinase, I performed intact mass analysis utilizing MALDI-TOF mass spectrometry (MALDI-MS) (Figure 4-1A). This analysis reveals that TFIIH and P-TEFb treatment alone adds ~25 and ~24 phosphates, respectively (Figure 4-1A). Treatment with Abl kinase adds ~13 phosphates (Figure 4-1A), which suggests phosphorylation of every other heptad. Given the resolution of MALDI at this mass range, the broadness of the spectral peaks, and the fact that GST-yCTD is composed of 26 highly consensus heptad repeats, it can be concluded that TFIIH and P-TEFb add approximately 1 phosphate per heptad while Abl kinase appears to add 0.5 phosphates per heptad. These data suggest that multiple phosphorylations of a heptad do not commonly arise from the action of a single kinase. However, the action of multiple kinases acting in tandem may result in multiple phosphorylations within a given heptad.

To test the impact of tandem kinase activity on the CTD, we treated GST-yCTD with combinations of TFIIH, P-TEFb, and Abl kinase. Previous studies have suggested that P-TEFb and TFIIH have nearly identical patterning along Drosophila melanogaster CTD (Chapter 3 and (12)). Data acquired in our lab via collaboration for GST-yCTD treated with TFIIH, P-TEFb, and a tandem treatment of the two provided highly similar phosphopeptides in LC-UVPD-MS analysis (Figure 3-2 and Michelle Robinson (Jennifer Brodbelt Lab, UT Austin), personal communication), which suggests that there is little interaction between the phosphorylation patterns of these two kinases. Therefore, we focused our analysis on how phosphorylation of tyrosine 1 by Abl kinase impacts or is impacted by subsequent or previous phosphorylation by TFIIH and P-TEFb using MALDI-MS (Figure 4-1B). This reveals a similar behavior to what was observed for the single kinase treatment. Pre-treatment of substrate with Abl followed by treatment with P-TEFb or TFIIH results in the addition of ~24 and ~27 phosphates, respectively (Figure 4-1B). Pre-treatment of substrate with TFIIH followed by Abl results in the addition of ~29 phosphates (Figure 4-1B). Once again, given the resolution of MALDI-MS at this mass range and the broadness of the peaks observed in our spectra, the very similar increases in mass for these treatments are likely representative of a single phosphorylation of each of the 26 heptads of the GST-yCTD substrate. The slightly higher phosphorylation (~29 phosphates) observed for the sample pre-treated with TFIIH and then phosphorylated with Abl may be explained by the presence of a consensus Abl phosphorylation motif (I/V/L/YXXP/F) (13) at Y173 of our GST tag. Additionally, native PAGE analysis of TFIIH/Abl treated samples echoes the observations made in MALDI-MS (Figure 4-1C). Specifically, TFIIH alone, TFIIH followed by Abl (TFIIH -> Abl), and Abl followed by TFIIH (Abl -> TFIIH) all result in an identical shift of the substrate (Figure 4-1C). Since the GST tag is consistent across these samples, phosphorylation of the CTD only modestly alters its shape and hydrodynamic radius (12), and these experiments were performed at low acrylamide concentrations (10%), it is likely that the majority of their mobility results from their charge state. Since the tandem treated samples shift no further than the singly treated samples in PAGE, they are likely similarly charged and, therefore, similarly phosphorylated. Taken together, these data support a model in which inherent kinase preferences enforce the observed single phosphorylation per heptad trend. Furthermore, if a combinatorial code exists and previous phosphorylations influence downstream coding, this cross talk likely spans multiple heptads, at least for TFIIH, P-TEFb, and Abl-like kinases.
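
The phosphate counts discussed above follow from dividing the observed increase in intact mass by the mass added per phosphate group. The sketch below illustrates that arithmetic, assuming an average mass shift of 79.98 Da per phosphorylation (HPO3) and the 26 heptads stated for GST-yCTD. The example masses are hypothetical values chosen for illustration, not measurements taken from Figure 4-1.

```python
PHOSPHATE_SHIFT_DA = 79.98  # average mass added per phosphorylation (HPO3)
N_HEPTADS = 26              # heptad repeats in GST-yCTD, as stated in the text


def estimate_phosphates(treated_mass_da, control_mass_da):
    """Estimate the number of phosphates added from the shift in intact mass."""
    return (treated_mass_da - control_mass_da) / PHOSPHATE_SHIFT_DA


def phosphates_per_heptad(n_phosphates, n_heptads=N_HEPTADS):
    """Average phosphate occupancy per heptad repeat."""
    return n_phosphates / n_heptads


# Hypothetical intact masses (Da), for illustration only.
control = 48000.0
treated = 50000.0
n = estimate_phosphates(treated, control)
print(round(n), round(phosphates_per_heptad(n), 2))  # 25 0.96
```

Applied to the counts reported in the text, 25/26 and 24/26 are close to one phosphate per heptad, and 13/26 is 0.5 phosphates per heptad.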

[Figure 4-1 image. Panels A and B: MALDI-MS spectra plotted as percent intensity (%) versus mass (Da, 48000-53000), with peaks annotated from ~13 P to ~29 P. Panel C: native PAGE gel with CTRL, TFIIH, TFIIH -> Abl, and Abl -> TFIIH lanes; CTDa and CTDO bands are marked.]

Figure 4-1. Intact mass analysis of GST-yCTD substrate treated with multiple kinases.

(A) MALDI-MS analysis of GST-yCTD treated with a single kinase (CTRL (grey), Abl (goldenrod), P-TEFb (tomato), TFIIH (steel blue)). (B) MALDI-MS analysis of GST-yCTD substrate treated with multiple kinases (CTRL (grey), Abl followed by P-TEFb (Abl -> P-TEFb, tomato), Abl followed by TFIIH (Abl -> TFIIH, steel blue), and TFIIH followed by Abl (TFIIH -> Abl, goldenrod)). (C) Native PAGE analysis of control and phosphorylated GST-yCTD samples treated with TFIIH, Abl kinase, and combinations thereof (CTDa=Hypophosphorylated GST-yCTD, CTDO=Hyperphosphorylated GST-yCTD).

4.2.2 Tyrosine 1 phosphorylation of the CTD alters the specificity of P-TEFb but not TFIIH.

Data presented above suggest that the amount of phosphorylation added to GST-yCTD substrate by CTD kinases remains fairly constant and is dictated by the total number of CTD heptads present. It has not been established whether the abundance of phosphorylation at specific residues of the heptad (Tyr1, Ser2, Ser5, etc.) remains constant or whether it is influenced by the presence of previously installed phosphorylation marks. Therefore, I sought to investigate how tyrosine 1 phosphorylation may influence the abundance of downstream CTD modifications, specifically phosphorylation of Ser2 and Ser5. Tyrosine 1 phosphorylation is a promising mechanism to direct subsequent CTD coding events as it rises early in transcription, either alongside (in human cells, for example) or immediately following (as seen in yeast cells) the rise and peak of Ser5 phosphorylation (7). Furthermore, ChIP-seq data suggest that Tyr1 phosphorylation remains intact until polyadenylation of the transcript (7, 14). This temporal coupling of Tyr1 phosphorylation, either prior to or concurrent with the majority of other CTD coding events, and the fact that Tyr1 appears to be essential for both cell viability (15) and in vitro kinase function (Chapter 3) suggest that if a combinatorial phosphorylation code does exist, Tyr1 likely plays a role. Furthermore, the previous section reveals that putative CTD tyrosine 1 kinases (c-Abl and Abl-like kinases) (16-19) phosphorylate the CTD rather conservatively, only installing phosphates at approximately half (13 out of 26) of the available heptads. This leaves a sufficient number of heptads available for subsequent CTD kinases, unlike P-TEFb and TFIIH, which occupy the majority of sites with Ser2/Ser5 phosphorylation if allowed. This additional coding space may enable cross talk between phosphorylation marks and direct subsequent phosphorylation events. These modifications likely occur in neighboring heptads, since double phosphorylation of heptads containing Tyr1 phosphorylation has not been observed in vitro (8).

To test the hypothesis that tyrosine 1 phosphorylation influences subsequent coding events, I treated GST-yCTD constructs in a similar manner to those in section 4.2.1. These samples were analyzed via western blot with phospho-specific CTD antibodies to determine if phosphorylation of Tyr1 with Abl kinase alters the abundance of CTD phosphorylation marks installed by P-TEFb and TFIIH, specifically Ser5 and Ser2 phosphorylation (Figure 4-2). Ser5 phosphorylation abundance does not appear to be significantly altered by Abl kinase pre-treatment. There is a trend towards decreasing Ser5 abundance for the Abl -> P-TEFb treated sample, but this does not reach the threshold for statistical significance (p-value < 0.05). Interestingly, there is a marked increase in Ser2 phosphorylation for the Abl -> P-TEFb treated sample (Figure 4-2A, bottom). Serine 2 abundance increases significantly (p-value < 0.01, Welch’s t-test) by ~2.5-fold when compared to P-TEFb treatment alone (Figure 4-2B). This dramatic increase does not occur for TFIIH treated samples, despite TFIIH’s clear ability to install Ser2 marks (Chapter 3). Also, samples treated with Abl kinase alone give no signal from the phospho-specific CTD antibodies (data not shown). These data suggest that tyrosine 1 phosphorylations, like those installed by Abl kinase, can influence the specificity of P-TEFb and induce a switch from a Ser5 specific CTD kinase to a Ser2 specific CTD kinase. Importantly, this may explain the inconsistencies between P-TEFb’s in vitro and in vivo specificities. Multiple labs have demonstrated that P-TEFb alone has an inherent preference for phosphorylating Ser5 of the CTD (8, 9). How Tyr1 phosphorylation may influence the function of P-TEFb has not been determined, except that it prevents further phosphorylation of a heptad by P-TEFb (8). Tyr1 phosphorylation may be the missing link that explains P-TEFb’s in vivo specificity for Ser2 and reveals a novel mechanism to time the occurrence and abundance of Ser2 marks.
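
The quantification described above (normalization to total protein, fold change, and Welch’s t-test) can be sketched in a few lines. The function names are illustrative and all numbers below are hypothetical placeholders, not the densitometry values behind Figure 4-2.

```python
from math import sqrt
from statistics import mean, variance


def normalize(band_intensities, total_protein):
    """Divide each phospho-band intensity by its lane's total-protein signal."""
    return [b / t for b, t in zip(band_intensities, total_protein)]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical normalized Ser2 signals, three replicates per condition.
ptefb_alone = [1.0, 1.2, 0.8]
abl_then_ptefb = [2.4, 2.6, 2.5]
fold_change = mean(abl_then_ptefb) / mean(ptefb_alone)
t, df = welch_t(abl_then_ptefb, ptefb_alone)
```

The p-value is then read from the t distribution with `df` degrees of freedom; a statistics library such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) performs the full test directly.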

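[Figure 4-2 image. Panel A: western blots probed for pSer5 (top) and pSer2 (bottom), each shown with total protein (Coomassie), for P-TEFb and TFIIH treatments with and without Abl pre-treatment. Panel B: quantification of blots.]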

Figure 4-2. Analysis of Ser5 and Ser2 phosphorylation of Abl, P-TEFb, and TFIIH treated samples.

(A) Western blot analysis of GST-yCTD treated with P-TEFb alone (P-TEFb), Abl followed by P-TEFb (Abl -> P-TEFb), TFIIH alone (TFIIH), or Abl followed by TFIIH (Abl -> TFIIH). Phosphorylated Ser5 (top) and Ser2 (bottom) were analyzed and normalized to total protein amount determined by Coomassie staining of the membrane. Blots shown are representative of three experimental replicates (n=3). (B) Quantification of blots. Densitometry was performed in ImageStudioLite (Licor). Significance was assessed with Welch’s t-test (n.s. = not significant, p-value > 0.05; */** = p-value < 0.05).

4.2.3 Inhibition of tyrosine kinases reduces the level of Ser2 phosphorylation of the CTD in cells.

The in vitro observations in section 4.2.2 suggest that Ser2 phosphorylation by P-TEFb is determined, at least in part, by Tyr1 phosphorylation. To test if this observation extends to complex cellular systems, we sought to perturb Tyr1 phosphorylation abundance in human embryonic kidney cells (HEK293T) and probe for alterations in other CTD phosphorylation marks via western blot. Unfortunately, a clear candidate for the Tyr1 kinase involved in general transcription has not been successfully identified. Consensus in the field suggests it is an Abl-like kinase, which has been suspected for over two decades (16-19), but direct evidence for a specific gene is lacking. Therefore, we turned to tyrosine kinase inhibitors which, like the majority of kinase inhibitors, have fairly promiscuous inhibitory repertoires due to the conserved nature of kinase active sites (20). We utilized the archetypical tyrosine kinase inhibitor imatinib (Gleevec), which is most widely known for its effectiveness in treating chronic myeloid leukemia (CML) by inhibiting the gene product of the BCR-ABL mutation common to this disease (20). Importantly, imatinib has been shown to have multiple off-target effects, including the inhibition of other Abl-like tyrosine kinases such as c-KIT and PDGF-R (21). We hypothesized that if Tyr1 phosphorylation of the CTD does direct Ser2 phosphorylation, then even broad inhibition of tyrosine kinase function should specifically decrease Ser2 abundance while not significantly altering the level of other marks.

To test this hypothesis, we treated HEK293T cells with varying amounts of imatinib (5-50 μM) and found that 30 μM was sufficient to alter CTD phosphorylation status while maintaining robust cell growth. The decrease of CTD phosphorylation observed was specific for Ser2, as western blot against Ser5 phosphorylation did not reveal a significant change (Figure 4-3). These data suggest that inhibition of Abl-like tyrosine kinases, which likely phosphorylate Tyr1 of the CTD (9, 16-19), results in a decrease in Ser2 phosphorylation. This supports our in vitro observation that Tyr1 phosphorylation influences the abundance of Ser2 marks by altering P-TEFb specificity.

[Figure 4-3 image. Panel A: western blots probed for pSer2 and pSer5, each shown with total protein (Coomassie). Panel B: bar chart of relative amount (%) for Ser2 and Ser5 by treatment (30 μM imatinib or CTRL), annotated with * and n.s.]

Figure 4-3. Western blot for Ser2 and Ser5 phosphorylation in HEK293T cells treated with imatinib.

(A) Western blot analysis of 20 μg total protein from HEK293T cells treated with 30 μM imatinib (right) or DMSO vehicle control (left). Ser2 phosphorylation (top) and Ser5 phosphorylation (bottom) were normalized to total protein stained with Coomassie blue. Blots represent results from six biological replicates (n=6). (B) Quantification of blots. Densitometry was performed in ImageStudioLite (Licor). Significance was assessed with Welch’s t-test (n.s. = not significant, p-value > 0.05; * = significant, p-value < 0.01).

4.2.4 Ser5/Ser2 specificity is governed by a potential Tyr1 binding pocket conserved across CTD kinases.

The above data suggest that P-TEFb is sensitive to the phosphorylation state of Tyr1 along GST-yCTD substrate. The fact that we observed this in purified protein systems suggests that this sensitivity to Tyr1 phosphorylation is encoded at the level of protein structure. To further investigate this, we sought to identify binding pockets within CTD kinases that may interact with tyrosine residues along the CTD. Erk2 is a well-characterized mammalian kinase that has recently been implicated in CTD phosphorylation (22) and has a high degree of sequence and structural similarity to Cdk7 and Cdk9 (Figures 4-5 and 4-6), the cyclin-dependent kinase subunits of TFIIH and P-TEFb. Therefore, we utilized Erk2 as a model CTD kinase for docking analysis. Iterative cycles of docking, crude molecular dynamics simulation, and manual inspection of protein structure revealed a conserved hydrophobic pocket proximal to the kinase active site that may facilitate tyrosine binding (Figures 4-4A, 4-5, and 4-6). Several hydrophobic residues form this pocket, but a solvent-exposed tryptophan residue (W190) contributes most significantly to its structure (Figure 4-4A). To verify that this pocket is indeed important to kinase function against the CTD, we generated W190A mutants of intrinsically active Erk2 (I84A) (23) and demonstrated that this mutation abolishes kinase activity against GST-yCTD substrate, as shown by SDS-PAGE gel shift (Figure 4-4B). Interestingly, less drastic mutagenesis of this pocket (W190Y) does not abolish kinase activity but instead shifts kinase preference, increasing the abundance of Ser2 phosphorylation installed along CTD substrate by 15-fold (Figure 4-4C). This dramatic increase supports a role for this pocket in binding the CTD and directing the Ser5/Ser2 specificity of CTD kinases.

[Figure 4-4 image not reproduced. Panel labels: (A) W190; (B) lanes Erk2 CTRL and Erk2 W190A.]

Figure 4-4. Docking and mutagenesis analysis of Erk2.

(A) Docking analysis of tyrosine into an available Erk2 structure (PDB ID: 4gt3) reveals a putative tyrosine binding pocket (red). This pocket is formed by multiple hydrophobic residues, with the major residue being tryptophan 190 (W190). (B) W190A mutation of the intrinsically active Erk2 mutant (I84A) prevents phosphorylation of GST-yCTD substrate in SDS-PAGE gel shift analysis (CTDA = hypophosphorylated GST-yCTD, CTDO = hyperphosphorylated GST-yCTD). (C) W190Y mutation of the intrinsically active Erk2 mutant (I84A) alters kinase preference and dramatically increases Ser2 abundance, as shown by western blot using Ser2 CTD phosphospecific antibodies (n=1). Blot quantified using ImageStudioLite (Licor).


CLUSTAL O(1.2.4) multiple sequence alignment

sp|P50613|CDK7_HUMAN    ------MALDVKSRAKRYEKLDFLGEGQFATVYKARDKNTNQIVAIKKIKLGHRS
tr|Q24216|Q24216_DROME  ------MLPNANDKTERYAKLSFLGEGQFATVYKARDTVTNQIVAVKKIKKGSRE
sp|P06242|KIN28_YEAST   ------MKVNMEYTKEKKVGEGTYAVVYLGCQHSTGRKIAIKEIKT---S
sp|P63086|MK01_RAT      MAAAAAAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNLNKVRVAIKKISPFEH-
                        . .* : . :*** :. * . : . :*:*:*.

sp|P50613|CDK7_HUMAN    EAKDGINRTALREIKLLQELSHPNIIGLLDAFG-----HKSNISLVFDFMETDLEVIIKD
tr|Q24216|Q24216_DROME  DARDGINRTALREIKILQELQHENIIGLVDVFG-----QLSNVSLVFDFMDTDLEVIIKD
sp|P06242|KIN28_YEAST   EFKDGLDMSAIREVKYLQEMQHPNVIELIDIFM-----AYDNLNLVLEFLPTDLEVVIKD
sp|P63086|MK01_RAT      ---QTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKT
                        : ::**:* * .: * *:* : * : .:: :* ::: *** ::*

sp|P50613|CDK7_HUMAN    NSLVLTPSHIKAYMLMTLQGLEYLHQHWILHRDLKPNNLLLDENGVLKLADFGLAKSFGS
tr|Q24216|Q24216_DROME  NKIILTQANIKAYAIMTLKGLEYLHLNWILHRDLKPNNLLVNSDGILKIGDFGLAKSFGS
sp|P06242|KIN28_YEAST   KSILFTPADIKAWMLMTLRGVYHCHRNFILHRDLKPNNLLFSPDGQIKVADFGLARAIPA
sp|P63086|MK01_RAT      QH--LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADP
                        : :: .* : *:*: : * :*******.***.. :*: *****:

sp|P50613|CDK7_HUMAN    PN---RAYTHQVVTRWYRAPELLFGARMYGVGVDMWAVGCILAELLLRVPFLPGDSDLDQ
tr|Q24216|Q24216_DROME  PN---RIYTHHVVTRWYRSPELLFGARQYGTGVDMWAVGCILAELMLRVPFMPGDSDLDQ
sp|P06242|KIN28_YEAST   PH---EILTSNVVTRWYRAPELLFGAKHYTSAIDIWSVGVIFAELMLRIPYLPGQNDVDQ
sp|P63086|MK01_RAT      DHDHTGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQ
                        : * *.*****:**:::.:: * .:*:*:** *:**:: . * :**. :**

sp|P50613|CDK7_HUMAN    LTRIFETLGTPTEEQWPDMCSLPDYVTFKSFPGIP---LHHIFSAAGDDLLDLIQGLFLF
tr|Q24216|Q24216_DROME  LTRIFSTLGTPTEAEWPHLSKLHDYLQFRNFPGTP---LDNIFTAAGNDLIHLMQRLFAM
sp|P06242|KIN28_YEAST   MEVTFRALGTPTDRDWPEVSSFMTYNKLQIYPPPSRDELRKRFIAASEYALDFMCGMLTM
sp|P63086|MK01_RAT      LNHILGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTF
                        : : **:*:: : : .: : * . * *.. :.:: :: :

sp|P50613|CDK7_HUMAN    NPCARITATQALKMKYFSNRPGPTPGCQLPRPNCPVETLKEQSNPAL-----AIKRKRTE
tr|Q24216|Q24216_DROME  NPLRRVSCREALSMPYFANKPAPTVGPKLPMPSAILAAK-EGANPQTGDTKPALKRKLVE
sp|P06242|KIN28_YEAST   NPQKRWTAVQCLESDYFKELPPPSDPSSIKIRN------
sp|P63086|MK01_RAT      NPHKRIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKE------KLKELIFE
                        ** * :.* *: : *:

sp|P50613|CDK7_HUMAN    A--LEQGGLPKK--LIF
tr|Q24216|Q24216_DROME  TT-VRGNGLAQKKRLQF
sp|P06242|KIN28_YEAST   ------
sp|P63086|MK01_RAT      ETARFQPGYRS------

Figure 4-5. Sequence alignment of Ser5 CTD kinases.


Amino acid sequences were obtained from UniProt (24). They represent Ser5 CTD kinases (9, 11) and are aligned in the following order (top to bottom): Homo sapiens (entry# P50613), Drosophila melanogaster (entry# Q24216), and Saccharomyces cerevisiae (Kin28) (entry# P06242). Rattus norvegicus Erk2 (entry# P63086) is also included for reference. The conserved tryptophan residue (equivalent to W190 of Rattus norvegicus Erk2) is highlighted in yellow, the arginine residue conserved amongst Ser5 CTD kinases is highlighted in red, and other residues conserved across all CTD kinases are highlighted in grey.


CLUSTAL O(1.2.4) multiple sequence alignment

sp|P50750|CDK9_HUMAN    ------
tr|O17432|O17432_DROME  ------
sp|Q03957|CTK1_YEAST    MSYNNGNTYSKSYSRNNKRPLFGKRSPNPQSLARPPPPKRIRTDSGYQSNMDNISSHRVN

sp|P50750|CDK9_HUMAN    ------
tr|O17432|O17432_DROME  ------MAHMS
sp|Q03957|CTK1_YEAST    SNDQPGHTKSRGNNNLSRYNDTSFQTSSRYQGSRYNNNNTSYENRPKSIKRDETKAEFLS

sp|P50750|CDK9_HUMAN    ------MAKQY------DSVECPFCDEV
tr|O17432|O17432_DROME  HMLQQPSGSTPSNVGSSSSRTM---SLMEKQKYI------EDYDFPYCDES
sp|Q03957|CTK1_YEAST    HLPKGPKSVEKSRYNNSSNTSNDIKNGYHASKYYNHKGQEGRSVIAKKVPVSVLTQQRST
                        * .

sp|P50750|CDK9_HUMAN    SKYEKLAKIGQGTFGEVFKARHRKTGQ-KVALKKVLMENEKEGFPITALREIKILQLLKH
tr|O17432|O17432_DROME  NKYEKVAKIGQGTFGEVFKAREKKGNKKFVAMKKVLMDNEKEGFPITALREIRILQLLKH
sp|Q03957|CTK1_YEAST    SVYLRIMQVGEGTYGKVYKAKNTNTE-KLVALKKLRLQGEREGFPITSIREIKLLQSFDH
                        . * :: ::*:**:*:*:**:. : **:**: ::.*:******::***::** :.*

sp|P50750|CDK9_HUMAN    ENVVNLIEICRTKASPYNRCKGSIYLVFDFCEHDLAGLLSNVLVKFTLSEIKRVMQMLLN
tr|O17432|O17432_DROME  ENVVNLIEICRTKATATNGYRSTFYLVFDFCEHDLAGLLSNMNVKFSLGEIKKVMQQLLN
sp|Q03957|CTK1_YEAST    PNVSTIKEIMVE------SQKTVYMIFEYADNDLSGLLLNKEVQISHSQCKHLFKQLLL
                        ** .: ** : :.*::*::.::**:*** * *::: .: *:::: **

sp|P50750|CDK9_HUMAN    GLYYIHRNKILHRDMKAANVLITRDGVLKLADFGLARAFSLAKNSQPNRYTNRVVTLWYR
tr|O17432|O17432_DROME  GLYYIHSNKILHRDMKAANVLITKHGILKLADFGLARAFSIPKNESKNRYTNRVVTLWYR
sp|Q03957|CTK1_YEAST    GMEYLHDNKILHRDVKGSNILIDNQGNLKITDFGLARKMN-----SRADYTNRVITLWYR
                        *: *:* *******:*.:*:** ..* **::****** :. . *****:*****

sp|P50750|CDK9_HUMAN    PPELLLGERDYGPPIDLWGAGCIMAEMWTRSPIMQGNTEQHQLALISQLCGSITPEVWPN
tr|O17432|O17432_DROME  PPELLLGDRNYGPPVDMWGAGCIMAEMWTRSPIMQGNTEQQQLTFISQLCGSFTPDVWPG
sp|Q03957|CTK1_YEAST    PPELLLGTTNYGTEVDMWGCGCLLVELFNKTAIFQGSNELEQIESIFKIMGTPTINSWPT
                        ******* :** :*:**.**::.*::.:: *:**..* .*: * :: *: * : **

sp|P50750|CDK9_HUMAN    VDNYELYEKLELVKGQ--KRKVKDRLKAYVRDPYALDLIDKLLVLDPAQRIDSDDALNHD
tr|O17432|O17432_DROME  VEELELYKSIELPKNQ--KRRVKERLRPYVKDQTGCDLLDKLLTLDPKKRIDADTALNHD
sp|Q03957|CTK1_YEAST    LYDMPWFFMIMPQQTTKYVNNFSEKFKSVLPSSKCLQLAINLLCYDQTKRFSATEALQSD
                        : : : : : ....:::: : . :* :** * :*:.: **: *

sp|P50750|CDK9_HUMAN    FFWSDPMPSDL--KGMLSTHLTSMFEYLAPPRRKGSQITQQSTNQ------SRNPATTN
tr|O17432|O17432_DROME  FFWTDPMPSDL--SKMLSQHLQSMFEYLAQPRRSNQMRNYHQQLT------TMNQKPQD
sp|Q03957|CTK1_YEAST    YFKEEPKPEPLVLDGLVSC-----HEYEVKLARKQKRPNILSTNTNNKGNGNSNNNNNNN
                        :* :* *. * . ::* .** . *. . . . : * :

sp|P50750|CDK9_HUMAN    QTEFERVF
tr|O17432|O17432_DROME  NSMIDRVW
sp|Q03957|CTK1_YEAST    NDDDDK--
                        : ::

Figure 4-6. Sequence alignment of Ser2 CTD kinases.


Amino acid sequences were obtained from UniProt (24). They represent Ser2 CTD kinases (9, 25) and are aligned in the following order (top to bottom): Homo sapiens (entry# P50750), Drosophila melanogaster (entry# O17432), and Saccharomyces cerevisiae (Ctk1) (entry# Q03957). The conserved tryptophan residue (equivalent to W190 of Rattus norvegicus Erk2) is highlighted in yellow, the leucine residue conserved amongst Ser2 CTD kinases is highlighted in blue, and other residues conserved across all CTD kinases are highlighted in grey.

Importantly, this hydrophobic pocket is evolutionarily conserved from yeast to mammals in the kinase subunits of TFIIH and P-TEFb and their homologues (Figures 4-5 & 4-6), suggesting it is similarly important to their ability to bind and phosphorylate the CTD. Interestingly, although these pockets are highly similar, analysis of their sequences (Figures 4-5 & 4-6) and structures (Figure 4-7) suggests they differ consistently in electrostatic character in a manner dependent on their physiological substrate (Ser5 or Ser2 of the CTD). In CTD Ser5 kinases, like Erk2 and TFIIH, the conserved tryptophan residue is flanked N-terminally by the basic residue arginine across species (Figure 4-5 & Figure 4-7A, top two panels). In the kinase subunit of P-TEFb and its homologues, on the other hand, the tryptophan residue is consistently flanked N-terminally by the aliphatic residue leucine across species (Figure 4-6 & Figure 4-7A, bottom panel). These conserved electrostatic characteristics may provide a mechanism through which the phosphorylation state of Tyr1 can be detected. Such subtle electrostatic interactions may contribute to the shift in P-TEFb specificity upon Tyr1 phosphorylation and warrant further investigation.
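The flanking-residue comparison described above can be illustrated with a short Python sketch. The fragments are copied from the alignments in Figures 4-5 and 4-6 (the region surrounding the conserved tryptophan). The `WYR` anchor used to locate that tryptophan, the residue classes, and the function names are assumptions made for illustration and were not part of the original analysis.

```python
# Fragments copied from the alignments in Figures 4-5 and 4-6.
FRAGMENTS = {
    "CDK7_HUMAN": "VVTRWYRAPELLFGA",
    "KIN28_YEAST": "VVTRWYRAPELLFGA",
    "MK01_RAT (Erk2)": "VATRWYRAPEIMLNS",
    "CDK9_HUMAN": "VVTLWYRPPELLLGE",
    "CTK1_YEAST": "VITLWYRPPELLLGT",
}

# Simplified residue classes (an assumption for this sketch).
BASIC = set("KRH")
ALIPHATIC = set("AVLI")


def flanking_residue(fragment, anchor="WYR"):
    """Return the residue immediately N-terminal to the anchored tryptophan."""
    index = fragment.find(anchor)
    if index <= 0:
        raise ValueError("anchor not found or has no N-terminal neighbour")
    return fragment[index - 1]


def classify(residue):
    """Assign a residue to one of the simplified classes."""
    if residue in BASIC:
        return "basic"
    if residue in ALIPHATIC:
        return "aliphatic"
    return "other"


for name, fragment in FRAGMENTS.items():
    residue = flanking_residue(fragment)
    print(f"{name}: {residue} ({classify(residue)})")
```

In these fragments the Ser5 kinases (Cdk7, Kin28, Erk2) carry arginine at the flanking position and the Ser2 kinases (Cdk9, Ctk1) carry leucine, matching the description in the text.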

[Figure 4-7 image not reproduced. Panel A labels: Erk2 (W190, R189); TFIIH (Cdk7) (W177, R176); P-TEFb (Cdk9) (W193, L192).]

Figure 4-7. A putative tyrosine binding pocket is conserved amongst CTD kinases.

(A) Analysis of Erk2, TFIIH (Cdk7), and P-TEFb (Cdk9) kinase structures (PDB IDs: 4gt3, 1ua2, and 4ogr, respectively). Conserved tryptophan residues are labeled in black, and the N-terminal flanking residues that influence the electrostatics of the pocket are shown in red or blue.

4.3 CONCLUSION AND PERSPECTIVE

The data presented in this chapter provide context to better understand how phosphorylations coexist and influence one another within the CTD code. Intact mass analysis utilizing MALDI-MS supports previous observations that the CTD trends towards a simple deposition of a single phosphate per heptad (10, 11). Furthermore, this phosphorylation limit appears to be at least partially enforced at the level of CTD kinases, since even prolonged incubation and tandem kinase treatment of substrate do not markedly alter the total number of phosphates added to GST-yCTD by TFIIH, P-TEFb, or Abl kinase. Interestingly, Ser2 and Ser5 kinases, like P-TEFb and TFIIH, appear to readily phosphorylate nearly all the heptads of the substrate. Abl kinase, on the other hand, phosphorylates only approximately half. Assuming the Abl kinase utilized in these studies is similar to the unidentified physiological Tyr1 kinase, this conservative phosphorylation of the CTD may ensure conserved space for subsequent coding events and ultimately allow for cross talk between CTD phosphorylation and subsequent coding events.

The ability of previously installed phosphorylations to impact subsequent phosphorylation of the CTD was also investigated in this chapter. Specifically, it was revealed that Tyr1 phosphorylation shifts the specificity of P-TEFb, but not TFIIH. P-TEFb shifts from a Ser5-specific kinase, as previously observed by multiple labs (8, 9), to a Ser2-preferring kinase in the context of previously installed Tyr1 phosphorylations. These marks are likely installed in neighboring heptads, as P-TEFb was previously shown to be incapable of phosphorylating heptads already phosphorylated at Tyr1 (8). Furthermore, Ser2 phosphorylation of the CTD appears dependent on the action of tyrosine kinases, as treatment of cells with the tyrosine kinase inhibitor imatinib significantly decreases Ser2 phosphorylation abundance but does not impact Ser5. These data suggest that CTD phosphorylation marks are not isolated events but actually influence the specificity of downstream kinases. P-TEFb’s ability to interpret and respond to previously installed phosphorylation marks highlights an emerging facet of CTD biology in which CTD marks cooperate across the transcription cycle to shift modification modes. This shift in phosphorylation site preference may act as a mechanism to time the recruitment of CTD binding factors specific for Ser2-phosphorylated heptads and ensure the efficient production of functional transcripts.

4.4 MATERIALS AND METHODS

4.4.1 Antibodies and reagents.

Primary antibodies were obtained from the companies indicated: Anti-RNA polymerase II B1 (phospho-CTD Ser-2), clone 3E10 (Millipore, Cat#: 04-1571, Lot#: NG1857513, Dilution: 1/1,000 – 1/5,000); Anti-RNA polymerase II B1 (phospho-CTD Ser-5), clone 3E8 (Millipore, Cat#: 04-1572, Lot#: NG1881282, Dilution: 1/5,000 – 1/10,000). Secondary antibodies were obtained from the companies indicated: Goat Anti-Rat IgG Antibody, HRP conjugate (Millipore, Cat#: AP136P, Dilution: 1/50,000).


Abl kinase was obtained from ProQinase (Prod# 0992-0000-1). TFIIH (Cdk7/Cyclin H/MAT1 (CAK complex) Protein, active, Cat#14-476) and P-TEFb (Cdk9/Cyclin T1 Protein, active, Cat#14-685) were obtained from Millipore. Imatinib was obtained from Selleck Chemicals.

4.4.2 Protein expression and purification.

CTD coding sequences were subcloned into pET28a (Novagen) derivative vectors encoding an N-terminal His-tag followed by a GST-tag and a 3C-protease site. The yeast CTD (yCTD) coding sequence was amplified from Saccharomyces cerevisiae genomic DNA. The expression vector (NpT7-5-Erk2, previously described (26)) encoding Rattus norvegicus Erk2 was a kind gift from Kevin Dalby. The intrinsically active Erk2 mutant (I84A) was generated using the QuikChange II Site-Directed Mutagenesis Kit (Agilent) according to the manufacturer's instructions.

Proteins were overexpressed in E. coli BL21 (DE3) cells by growing at 37°C in Luria-Bertani media containing 50 μg/mL kanamycin to an OD600 of 0.4-0.6. Expression was induced by addition of isopropyl-β-D-thiogalactopyranoside (IPTG) to a final concentration of 0.5 mM. After induction, the cultures were grown at 16°C for an additional 16 hours. The cells were pelleted and lysed via sonication in lysis buffer (50 mM Tris-HCl pH 8.0, 500 mM NaCl, 15 mM imidazole, 10% glycerol, 0.1% Triton X-100, 10 mM β-mercaptoethanol (BME)). Lysate was cleared by centrifugation at 15,000 rpm for 45 minutes at 4°C. The supernatant was initially purified using Ni-NTA (Qiagen) beads and eluted with elution buffer (50 mM Tris-HCl pH 8.0, 500 mM NaCl, 200 mM imidazole, and 10 mM BME). The protein was dialyzed against gel filtration buffer (20 mM Tris-HCl pH 8.0, 50 mM NaCl, 10 mM BME for GST-yCTD; 20 mM Tris-HCl pH 7.5, 200 mM NaCl, 10 mM BME for Erk2 mutants). Finally, the sample was concentrated and run on a Superdex 200 gel filtration column (GE). Homogeneity of the eluted fractions was determined via Coomassie Brilliant Blue stained SDS-PAGE. Samples were concentrated to ~10 mg/mL in Vivaspin columns (Sartorius).

4.4.3 Kinase treatment of GST-yCTD constructs.

Abl kinase treated samples and paired no kinase controls were prepared in buffer conditions containing 1 μg/μL GST-yCTD substrate, 0.0035 μg/μL Abl kinase, 50 mM Tris-HCl pH 7.5, 50 mM MgCl2, and 2 mM ATP. These were incubated at 30°C for 16 hours and held at 4°C until used or flash frozen in liquid nitrogen and stored at -80°C.

TFIIH treated samples and paired no kinase controls were prepared in buffer conditions containing 1 μg/μL GST-yCTD substrate, 0.025 μg/μL TFIIH, 50 mM Tris-HCl pH 7.5, 50 mM MgCl2, and 2 mM ATP. These were incubated at 30°C for 16 hours and held at 4°C until used or flash frozen in liquid nitrogen and stored at -80°C.

P-TEFb treated samples and paired no kinase controls were prepared in buffer conditions containing 1 μg/μL GST-yCTD substrate, 0.0075 μg/μL P-TEFb, 50 mM Tris-HCl pH 7.5, 50 mM MgCl2, and 2 mM ATP. These were incubated at 30°C for 16 hours and held at 4°C until used or flash frozen in liquid nitrogen and stored at -80°C.

Tandem kinase treatments were performed by taking 10 μL (equivalent to 10 μg GST-yCTD substrate) of single kinase treated or no kinase control samples and combining them with 10 μL of a buffer containing 50 mM Tris-HCl pH 7.5, 50 mM MgCl2, 4 mM ATP, and 0.05 μg/μL TFIIH, 0.015 μg/μL P-TEFb, or 0.007 μg/μL Abl. These were incubated at 30°C for 16 hours and held at 4°C until used or flash frozen in liquid nitrogen and stored at -80°C.
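The concentrations in the tandem reactions follow from the 1:1 mixing described above, and a brief Python sketch makes the arithmetic explicit. The concentrations are taken from the text; the function and variable names are illustrative, and ATP left over from the first reaction is ignored (an assumption, since ATP consumption was not reported).

```python
import math


def dilute(concentration, volume_added, total_volume):
    """Concentration of a component after it is diluted into total_volume."""
    return concentration * volume_added / total_volume


SAMPLE_UL = 10.0   # single kinase treated (or control) sample
BUFFER_UL = 10.0   # second-kinase buffer
TOTAL_UL = SAMPLE_UL + BUFFER_UL

# Kinase concentrations (ug/uL) in the second-kinase buffer and in the
# single kinase reactions, as stated in the text.
second_buffer = {"TFIIH": 0.05, "P-TEFb": 0.015, "Abl": 0.007}
single_reaction = {"TFIIH": 0.025, "P-TEFb": 0.0075, "Abl": 0.0035}

for kinase, stock in second_buffer.items():
    final = dilute(stock, BUFFER_UL, TOTAL_UL)
    matches = math.isclose(final, single_reaction[kinase])
    print(f"{kinase}: {final:.4f} ug/uL (matches single treatment: {matches})")

substrate = dilute(1.0, SAMPLE_UL, TOTAL_UL)   # GST-yCTD, ug/uL
fresh_atp = dilute(4.0, BUFFER_UL, TOTAL_UL)   # mM, freshly added ATP only
print(f"GST-yCTD: {substrate} ug/uL, fresh ATP: {fresh_atp} mM")
```

Under these assumptions the second kinase is present at the same concentration as in the corresponding single kinase reaction, while the substrate is diluted to 0.5 μg/μL.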

4.4.4 MALDI-MS analysis of GST-yCTD.

5 μg of GST-yCTD protein from the kinase reactions described above was equilibrated with dilute trifluoroacetic acid (TFA) to a final concentration of 0.01% TFA and a pH of < 4. These samples were desalted using ZipTip (Millipore) tips according to the manufacturer's instructions, mixed 1:1 with a 2,5-dihydroxybenzoic acid (DHB) matrix solution, and spotted on a stainless steel sample plate. The spots were allowed to crystallize at ambient temperature and pressure. MALDI-MS spectra were obtained on an AB Voyager-DE PRO MALDI-TOF instrument with manual adjustment of instrument parameters to ensure the greatest signal-to-noise ratio. Sample masses were determined by a single-point calibration against the untreated GST-yCTD wild-type construct mass (48775.10 Da). Data analysis, noise reduction, and Gaussian smoothing were performed in both DataExplorer (AB) and RStudio (smoother package) to provide interpretable data. Data were visualized in RStudio using ggplot2. The average mass was taken to be the highest intensity peak of the post-processed data.
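As a rough illustration of the post-processing described above, the following Python sketch smooths a spectrum with a Gaussian kernel, takes the highest intensity point as the average mass, and converts the shift from the untreated mass into an approximate phosphate count. It is a stand-in for, not a reproduction of, the DataExplorer and R workflow that was actually used. The spectrum is synthetic, the kernel width is arbitrary, and the per-phosphate increment of 79.98 Da (the average mass of HPO3) is supplied here as an assumption rather than taken from the text.

```python
import math

UNTREATED_MASS_DA = 48775.10  # untreated GST-yCTD mass quoted in the text
PHOSPHATE_DA = 79.98          # average mass added per phosphate (HPO3)


def gaussian_smooth(values, sigma_points=2.0):
    """Smooth a list with a truncated Gaussian kernel, renormalized at edges."""
    radius = int(3 * sigma_points)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma_points) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        total = norm = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(values):
                total += w * values[j]
                norm += w
        smoothed.append(total / norm)
    return smoothed


def peak_mass(masses, intensities):
    """Mass at the highest intensity point."""
    best = max(range(len(intensities)), key=intensities.__getitem__)
    return masses[best]


def phosphate_count(observed_mass):
    """Approximate number of phosphates implied by the mass shift."""
    return (observed_mass - UNTREATED_MASS_DA) / PHOSPHATE_DA


# Synthetic spectrum (hypothetical): a broad peak plus alternating noise.
masses = [50500.0 + 10.0 * i for i in range(101)]
raw = [math.exp(-0.5 * ((m - 50850.0) / 100.0) ** 2) + 0.05 * (-1) ** i
       for i, m in enumerate(masses)]

apex = peak_mass(masses, gaussian_smooth(raw))
print(f"apex: {apex:.1f} Da, approx. phosphates: {phosphate_count(apex):.1f}")
```

Because MALDI peaks for a protein of this size are broad, a count obtained this way is best read as an average over the population of phosphorylated species.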

4.4.5 Gel shift analysis of GST-yCTD samples.

Native PAGE gel shift analysis was performed using 10% acrylamide native PAGE gels. A volume containing 1 μg of phosphorylated GST-yCTD was loaded into each well and the gels were run at 100 V on ice until the dye front reached the bottom of the gel (~4 hours). SDS-PAGE analysis was performed in an identical manner, but the gels contained 1% SDS and were run at 150 V for 1 hour at room temperature. Gels were stained with Coomassie Brilliant Blue and visualized on a BioRad GelDoc system.

4.4.6 Cell culture and total protein preparation.

HEK293T cells were maintained in DMEM, with splitting every other day and seeding at 9.6 × 10⁴ cells per 10 cm culture dish, and incubated at 37°C and 5% CO2. Cells to be treated with 30 μM imatinib or vehicle control were plated at 5 × 10⁵ cells per well in 6-well tissue culture plates in fresh DMEM (ISC BioExpress, Cat#T-2989-6). Cells were incubated for 24 hours, and the media was replaced with fresh DMEM containing 30 μM imatinib or DMSO vehicle control. Cells were incubated for an additional 40 hours. Protein preparations were generated by direct in-well lysis. Media was removed and 200 μL RIPA buffer (150 mM NaCl, 10 mM Tris-HCl pH 7.5, 0.1% SDS, 1% Triton X-100, 1% deoxycholate, and 5 mM EDTA) supplemented to 1X with HALT protease and phosphatase inhibitor cocktail (Thermo Scientific) was added directly to cells. Plates were incubated on ice for 15 minutes with gentle shaking, and lysate was transferred to microcentrifuge tubes. Samples were briefly sonicated to reduce viscosity and spun at 13,000 rpm for 15 minutes to remove cell debris. Protein concentration was determined using the Pierce™ BCA Protein Assay Kit (Thermo Scientific) against a BSA standard curve. Samples were diluted to 2 μg/μL with SDS-PAGE loading buffer and boiled at 95°C for 5 minutes. Samples were aliquoted and frozen at -80°C.
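The quantification and dilution steps above can be sketched in Python as a linear fit to a BSA standard curve followed by a dilution calculation. All absorbance values and sample volumes below are hypothetical, the straight-line fit is a simplifying assumption (BCA curves are often fit with a curved model), and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(absorbance, slope, intercept):
    """Interpolate a protein concentration (ug/uL) from the standard curve."""
    return (absorbance - intercept) / slope


def buffer_to_add(stock_conc, stock_volume, target_conc=2.0):
    """Volume of loading buffer (uL) needed to reach target_conc (ug/uL)."""
    if stock_conc < target_conc:
        raise ValueError("sample is already below the target concentration")
    return stock_volume * (stock_conc / target_conc - 1.0)


# Hypothetical BSA standards (ug/uL) and their absorbance readings.
bsa_standards = [0.0, 0.25, 0.5, 1.0, 2.0]
absorbances = [0.10, 0.35, 0.60, 1.10, 2.10]

slope, intercept = fit_line(bsa_standards, absorbances)
lysate = concentration(0.90, slope, intercept)   # hypothetical reading
print(f"lysate: {lysate:.2f} ug/uL")
print(f"buffer for 20 uL at 5 ug/uL: {buffer_to_add(5.0, 20.0):.1f} uL")
```

Readings that fall outside the range of the standards would need to be re-measured at a different dilution rather than extrapolated.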

4.4.7 Immunoblotting

Total protein from cell lysate (20-40 μg) or GST-yCTD samples (500 ng) was loaded onto a 4-20% gradient SDS-PAGE gel (Biorad, Cat#: 456-1096) and run at 150 V for 50 minutes at room temperature in a Mini-PROTEAN Tetra Cell (Biorad). The proteins were transferred to PVDF membrane at 100 V for 1 hour at 4°C in a Mini-PROTEAN Tetra Cell (Biorad). Membranes were blocked in 1X TBST (20 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.1% Tween-20) + 5% bovine serum albumin (BSA) for 1 hour at 4°C with shaking. Blocked membranes were incubated in primary antibody, at the dilutions indicated in section 4.4.1, in either 1X TBST or 1X TBST + 5% BSA at 4°C overnight. The membranes were then washed six times with 1X TBST for 5 minutes each at room temperature and incubated with secondary antibody in 1X TBST for 1 hour at room temperature. The membranes were washed once again and incubated with SuperSignal West Pico Chemiluminescent Substrate (Pierce, Cat#: 34079) according to the manufacturer's instructions. Blots were imaged using a G:BOX gel doc system (Syngene) and quantified in ImageStudioLite (Licor). Total protein normalization was performed as described previously (27).
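Total protein normalization can be sketched in Python as dividing each band signal by the total protein signal of its lane and then expressing the treated lane relative to the control. The signal values and function names below are hypothetical, and the sketch shows the general idea only; the exact procedure followed is the one described in reference 27.

```python
def normalize(band_signal, total_protein_signal):
    """Band signal corrected for the total protein loaded in the same lane."""
    if total_protein_signal <= 0:
        raise ValueError("total protein signal must be positive")
    return band_signal / total_protein_signal


def fold_change(treated, control):
    """Ratio of normalized treated signal to normalized control signal.

    Each argument is a (band_signal, total_protein_signal) pair.
    """
    return normalize(*treated) / normalize(*control)


# Hypothetical lane measurements: (phospho-CTD band, total protein stain).
control_lane = (1200.0, 400.0)
treated_lane = (450.0, 300.0)

print(f"normalized control: {normalize(*control_lane)}")
print(f"normalized treated: {normalize(*treated_lane)}")
print(f"fold change: {fold_change(treated_lane, control_lane)}")
```

Normalizing within each lane before taking the ratio keeps differences in loading from being read as differences in phosphorylation.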

4.4.8 Sequence alignment.

Sequences to be aligned were obtained from UniProt (24). Sequence alignment was performed using the Clustal Omega algorithm (28).
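To show how conserved columns can be read from an alignment, the Python sketch below marks the positions at which every sequence carries the same residue. The fragments are copied from the Figure 4-5 alignment; the function name is illustrative. Only full identity is scored, so the ":" and "." similarity classes reported by Clustal Omega are not modeled.

```python
# Aligned fragments copied from Figure 4-5 (no gaps in this region).
ALIGNED = {
    "CDK7_HUMAN": "ILHRDLKPNNLLLD",
    "Q24216_DROME": "ILHRDLKPNNLLVN",
    "KIN28_YEAST": "ILHRDLKPNNLLFS",
    "MK01_RAT": "VLHRDLKPSNLLLN",
}


def identity_line(sequences):
    """Return '*' for columns where all sequences agree, ' ' otherwise."""
    lengths = {len(seq) for seq in sequences}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*sequences):
        identical = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if identical else " ")
    return "".join(marks)


line = identity_line(list(ALIGNED.values()))
for name, seq in ALIGNED.items():
    print(f"{name:<14}{seq}")
print(f"{'':<14}{line}")
print(f"identical columns: {line.count('*')} of {len(line)}")
```

Gap characters are never counted as conserved, so the same function can be applied to gapped regions of the alignment.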

4.5 REFERENCES

1. Bataille AR, et al. (2012) A universal RNA polymerase II CTD cycle is orchestrated by complex interplays between kinase, phosphatase, and isomerase enzymes along genes. Molecular cell 45(2):158-170.

2. Kim H, et al. (2010) Gene-specific RNA polymerase II phosphorylation and the CTD code. Nature structural & molecular biology 17(10):1279-1286.

3. Mayer A, et al. (2010) Uniform transitions of the general RNA polymerase II transcription complex. Nature structural & molecular biology 17(10):1272-1278.

4. Tietjen JR, et al. (2010) Chemical-genomic dissection of the CTD code. Nature structural & molecular biology 17(9):1154-1161.

5. Wong KH, Jin Y, & Struhl K (2014) TFIIH Phosphorylation of the Pol II CTD Stimulates Mediator Dissociation from the Preinitiation Complex and Promoter Escape. Molecular cell 54(4):601-612.

6. Jonkers I & Lis JT (2015) Getting up to speed with transcription elongation by RNA polymerase II. Nature reviews. Molecular cell biology 16(3):167-177.

7. Mayfield JE, Burkholder NT, & Zhang YJ (2016) Dephosphorylating eukaryotic RNA polymerase II. Biochimica et biophysica acta 1864(4):372-387.

8. Czudnochowski N, Bosken CA, & Geyer M (2012) Serine-7 but not serine-5 phosphorylation primes RNA polymerase II CTD for P-TEFb recognition. Nat Commun 3:842.

9. Eick D & Geyer M (2013) The RNA polymerase II carboxy-terminal domain (CTD) code. Chemical reviews 113(11):8456-8490.

10. Schuller R, et al. (2016) Heptad-Specific Phosphorylation of RNA Polymerase II CTD. Molecular cell 61(2):305-314.

11. Suh H, et al. (2016) Direct Analysis of Phosphorylation Sites on the Rpb1 C-Terminal Domain of RNA Polymerase II. Molecular cell 61(2):297-304.

12. Portz B, et al. (2017) Structural heterogeneity in the intrinsically disordered RNA polymerase II C-terminal domain. Nat Commun 8:15231.

13. Songyang Z, et al. (1995) Catalytic specificity of protein-tyrosine kinases is critical for selective signalling. Nature 373(6514):536-539.

14. Harlen KM & Churchman LS (2017) The code and beyond: transcription regulation by the RNA polymerase II carboxy-terminal domain. Nature reviews. Molecular cell biology 18(4):263-273.

15. Schwer B & Shuman S (2011) Deciphering the RNA polymerase II CTD code in fission yeast. Molecular cell 43(2):311-318.

16. Baskaran R, Chiang GG, Mysliwiec T, Kruh GD, & Wang JY (1997) Tyrosine phosphorylation of RNA polymerase II carboxyl-terminal domain by the Abl-related gene product. The Journal of biological chemistry 272(30):18905-18909.

17. Baskaran R, Chiang GG, & Wang JY (1996) Identification of a binding site in c-Abl tyrosine kinase for the C-terminal repeated domain of RNA polymerase II. Molecular and cellular biology 16(7):3361-3369.

18. Baskaran R, Dahmus ME, & Wang JY (1993) Tyrosine phosphorylation of mammalian RNA polymerase II carboxyl-terminal domain. Proceedings of the National Academy of Sciences of the United States of America 90(23):11167-11171.

19. Duyster J, Baskaran R, & Wang JY (1995) Src homology 2 domain as a specificity determinant in the c-Abl-mediated tyrosine phosphorylation of the RNA polymerase II carboxyl-terminal repeated domain. Proceedings of the National Academy of Sciences of the United States of America 92(5):1555-1559.

20. Winter GE, et al. (2012) Systems-pharmacology dissection of a drug synergy in imatinib-resistant CML. Nature chemical biology 8(11):905-912.

21. Nurmio M, et al. (2007) Inhibition of tyrosine kinases PDGFR and C-Kit by imatinib mesylate interferes with postnatal testicular development in the rat. International journal of andrology 30(4):366-376; discussion 376.

22. Tee WW, Shen SS, Oksuz O, Narendra V, & Reinberg D (2014) Erk1/2 activity promotes chromatin features and RNAPII phosphorylation at developmental promoters in mouse ESCs. Cell 156(4):678-690.

23. Smorodinsky-Atias K, et al. (2016) Intrinsically active variants of Erk oncogenically transform cells and disclose unexpected autophosphorylation capability that is independent of TEY phosphorylation. Molecular biology of the cell 27(6):1026-1039.

24. Apweiler R, et al. (2004) UniProt: the Universal Protein knowledgebase. Nucleic acids research 32(Database issue):D115-119.

25. Bres V, Yoh SM, & Jones KA (2008) The multi-tasking P-TEFb complex. Current opinion in cell biology 20(3):334-340.

26. Robbins DJ, et al. (1993) Regulation and properties of extracellular signal-regulated protein kinases 1 and 2 in vitro. The Journal of biological chemistry 268(7):5097-5106.

27. Welinder C & Ekblad L (2011) Coomassie staining as loading control in Western blot analysis. Journal of proteome research 10(3):1416-1419.

28. Sievers F, et al. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular systems biology 7:539.

153

Bibliography

1. Adelman K & Lis JT (2012) Promoter-proximal pausing of RNA polymerase II: emerging roles in metazoans. Nature Reviews Genetics 13(10):720-731. 2. Ahn SH, Kim M, & Buratowski S (2004) Phosphorylation of serine 2 within the RNA polymerase II C-terminal domain couples transcription and 3' end processing. Molecular cell 13(1):67-76. 3. Akhtar MS, et al. (2009) TFIIH kinase places bivalent marks on the carboxy- terminal domain of RNA polymerase II. Molecular cell 34(3):387-393. 4. Allen KN & Dunaway-Mariano D (2004) Phosphoryl group transfer: evolution of a catalytic scaffold. Trends in biochemical sciences 29(9):495-503. 5. Allepuz-Fuster P, et al. (2014) Rpb4/7 facilitates RNA polymerase II CTD dephosphorylation. Nucleic acids research 42(22):13674-13688. 6. Allison LA, Moyle M, Shales M, & Ingles CJ (1985) Extensive homology among the largest subunits of eukaryotic and prokaryotic RNA polymerases. Cell 42(2):599-610. 7. Andres ME, et al. (1999) CoREST: a functional corepressor required for regulation of neural-specific gene expression. Proceedings of the National Academy of Sciences of the United States of America 96(17):9873-9878. 8. Apweiler R, et al. (2004) UniProt: the Universal Protein knowledgebase. Nucleic acids research 32(Database issue):D115-119. 9. Arbouzova NI & Zeidler MP (2006) JAK/STAT signalling in Drosophila: insights into conserved regulatory and cellular functions. Development 133(14):2605- 2616. 10. Archambault J, et al. (1997) An essential component of a C-terminal domain phosphatase that interacts with transcription factor IIF in Saccharomyces cerevisiae. Proceedings of the National Academy of Sciences of the United States of America 94(26):14300-14305. 11. Archambault J, et al. (1998) FCP1, the RAP74-interacting subunit of a human protein phosphatase that dephosphorylates the carboxyl-terminal domain of RNA polymerase IIO. The Journal of biological chemistry 273(42):27593-27601. 12. 
Arigo JT, Eyler DE, Carroll KL, & Corden JL (2006) Termination of cryptic unstable transcripts is directed by yeast RNA-Binding proteins Nrd1 and Nab3. Molecular Cell 23(6):841-851. 13. Baranello L, et al. (2016) RNA Polymerase II Regulates Topoisomerase 1 Activity to Favor Efficient Transcription. Cell 165(2):357-371. 14. Bartkowiak B, et al. (2010) CDK12 is a transcription elongation-associated CTD kinase, the metazoan ortholog of yeast Ctk1. Genes & development 24(20):2303- 2316. 15. Baskaran R, Chiang GG, Mysliwiec T, Kruh GD, & Wang JY (1997) Tyrosine phosphorylation of RNA polymerase II carboxyl-terminal domain by the Abl- related gene product. The Journal of biological chemistry 272(30):18905-18909. 16. Baskaran R, Chiang GG, & Wang JY (1996) Identification of a binding site in c- 154

Ab1 tyrosine kinase for the C-terminal repeated domain of RNA polymerase II. Molecular and cellular biology 16(7):3361-3369. 17. Baskaran R, Dahmus ME, & Wang JY (1993) Tyrosine phosphorylation of mammalian RNA polymerase II carboxyl-terminal domain. Proceedings of the National Academy of Sciences of the United States of America 90(23):11167- 11171. 18. Bataille AR, et al. (2012) A universal RNA polymerase II CTD cycle is orchestrated by complex interplays between kinase, phosphatase, and isomerase enzymes along genes. Molecular cell 45(2):158-170. 19. Betzel C, et al. (1988) X-ray and model-building studies on the specificity of the active site of proteinase K. Proteins 4(3):157-164. 20. Brandl CJ & Deber CM (1986) Hypothesis about the function of membrane- buried proline residues in transport proteins. Proceedings of the National Academy of Sciences of the United States of America 83(4):917-921. 21. Brandts JF, Halvorson HR, & Brennan M (1975) Consideration of the Possibility that the slow step in protein denaturation reactions is due to cis-trans isomerism of proline residues. Biochemistry 14(22):4953-4963. 22. Bres V, Yoh SM, & Jones KA (2008) The multi-tasking P-TEFb complex. Current opinion in cell biology 20(3):334-340. 23. Brodbelt JS (2014) Photodissociation mass spectrometry: new tools for characterization of biological molecules. Chemical Society reviews 43(8):2757- 2783. 24. Brown R, Stuart SS, Houel S, Ahn NG, & Old WM (2015) Large-Scale Examination of Factors Influencing Phosphopeptide Neutral Loss during Collision Induced Dissociation. Journal of the American Society for Mass Spectrometry 26(7):1128-1142. 25. Buratowski S (2003) The CTD code. Nature structural biology 10(9):679-680. 26. Cadena DL & Dahmus ME (1987) Messenger RNA synthesis in mammalian cells is catalyzed by the phosphorylated form of RNA polymerase II. The Journal of biological chemistry 262(26):12468-12474. 27. 
Carrera I & Treisman JE (2008) Message in a nucleus: signaling to the transcriptional machinery. Current opinion in genetics & development 18(5):397- 403. 28. Chambers RS & Dahmus ME (1994) Purification and characterization of a phosphatase from HeLa cells which dephosphorylates the C-terminal domain of RNA polymerase II. The Journal of biological chemistry 269(42):26243-26248. 29. Chapman RD, et al. (2007) Transcribing RNA polymerase II is phosphorylated at CTD residue serine-7. Science 318(5857):1780-1782. 30. Chapman RD, Heidemann M, Hintermair C, & Eick D (2008) Molecular evolution of the RNA polymerase II CTD. Trends in genetics : TIG 24(6):289- 296. 31. Chen VB, et al. (2010) MolProbity: all-atom structure validation for macromolecular crystallography. Acta crystallographica. Section D, Biological 155

crystallography 66(Pt 1):12-21. 32. Chesnut JD, Stephens JH, & Dahmus ME (1992) The interaction of RNA polymerase II with the adenovirus-2 major late promoter is precluded by phosphorylation of the C-terminal domain of subunit IIa. The Journal of biological chemistry 267(15):10500-10506. 33. Cho EJ, Kobor MS, Kim M, Greenblatt J, & Buratowski S (2001) Opposing effects of Ctk1 kinase and Fcp1 phosphatase at Ser 2 of the RNA polymerase II C-terminal domain. Genes & development 15(24):3319-3329. 34. Cho EJ, Takagi T, Moore CR, & Buratowski S (1997) mRNA capping enzyme is recruited to the transcription complex by phosphorylation of the RNA polymerase II carboxy-terminal domain. Genes & development 11(24):3319-3326. 35. Cho H, et al. (1999) A protein phosphatase functions to recycle RNA polymerase II. Genes & development 13(12):1540-1552. 36. Clemente-Blanco A, et al. (2011) Cdc14 phosphatase promotes segregation of telomeres through repression of RNA polymerase II transcription. Nature cell biology 13(12):1450-1456. 37. Coletta A, et al. (2010) Low-complexity regions within protein sequences have position-dependent roles. Bmc Systems Biology 4. 38. Conaway RC, Bradsher JN, & Conaway JW (1992) Mechanism of assembly of the RNA polymerase II preinitiation complex. Evidence for a functional interaction between the carboxyl-terminal domain of the largest subunit of RNA polymerase II and a high molecular mass form of the TATA factor. The Journal of biological chemistry 267(12):8464-8467. 39. Corden JL, Cadena DL, Ahearn JM, Jr., & Dahmus ME (1985) A unique structure at the carboxyl terminus of the largest subunit of eukaryotic RNA polymerase II. Proceedings of the National Academy of Sciences of the United States of America 82(23):7934-7938. 40. Czudnochowski N, Boesken CA, & Geyer M (2012) Serine-7 but not serine-5 phosphorylation primes RNA polymerase II CTD for P-TEFb recognition. Nature Communications 3. 41. 
Czudnochowski N, Bosken CA, & Geyer M (2012) Serine-7 but not serine-5 phosphorylation primes RNA polymerase II CTD for P-TEFb recognition. Nat Commun 3:842. 42. Descostes N, et al. (2014) Tyrosine phosphorylation of RNA Polymerase II CTD is associated with antisense promoter transcription and active enhancers in mammalian cells. Elife 3. 43. Descostes N, et al. (2014) Tyrosine phosphorylation of RNA polymerase II CTD is associated with antisense promoter transcription and active enhancers in mammalian cells. Elife 3:e02105. 44. Di Vona C, et al. (2015) Chromatin-wide profiling of DYRK1A reveals a role as a gene-specific RNA polymerase II CTD kinase. Molecular cell 57(3):506-520. 45. Dichtl B, et al. (2002) A role for SSU72 in balancing RNA polymerase II transcription elongation and termination. Molecular cell 10(5):1139-1150. 156

46. Donner AJ, Ebmeier CC, Taatjes DJ, & Espinosa JM (2010) CDK8 is a positive regulator of transcriptional elongation within the serum response network. Nature structural & molecular biology 17(2):194-201. 47. Duyster J, Baskaran R, & Wang JY (1995) Src homology 2 domain as a specificity determinant in the c-Abl-mediated tyrosine phosphorylation of the RNA polymerase II carboxyl-terminal repeated domain. Proceedings of the National Academy of Sciences of the United States of America 92(5):1555-1559. 48. Egloff S, et al. (2007) Serine-7 of the RNA polymerase II CTD is specifically required for snRNA gene expression. Science 318(5857):1777-1779. 49. Egloff S, Zaborowska J, Laitem C, Kiss T, & Murphy S (2012) Ser7 phosphorylation of the CTD recruits the RPAP2 Ser5 phosphatase to snRNA genes. Molecular cell 45(1):111-122. 50. Eick D & Geyer M (2013) The RNA polymerase II carboxy-terminal domain (CTD) code. Chemical reviews 113(11):8456-8490. 51. Emsley P & Cowtan K (2004) Coot: model-building tools for molecular graphics. Acta crystallographica. Section D, Biological crystallography 60(Pt 12 Pt 1):2126-2132. 52. Engel C, Sainsbury S, Cheung AC, Kostrewa D, & Cramer P (2013) RNA polymerase I structure and transcription regulation. Nature 502(7473):650-655. 53. Etzkorn FA & Zhao S (2015) Stereospecific Phosphorylation by the Central Mitotic Kinase Cdk1-Cyclin B. ACS chemical biology. 54. Ezkurdia I, et al. (2014) Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Human molecular genetics 23(22):5866-5878. 55. Fischer G, Bang H, & Mech C (1984) [Determination of enzymatic catalysis for the cis-trans-isomerization of peptide binding in proline-containing peptides]. Biomedica biochimica acta 43(10):1101-1111. 56. Fort KL, et al. (2016) Implementation of Ultraviolet Photodissociation on a Benchtop Q Exactive Mass Spectrometer and Its Application to Phosphoproteomics. Analytical chemistry 88(4):2303-2310. 57. Fuda NJ, et al. 
(2012) Fcp1 dephosphorylation of the RNA polymerase II C-terminal domain is required for efficient transcription of heat shock genes. Molecular and cellular biology 32(17):3428-3437. 58. Gaillard H, Garcia-Muse T, & Aguilera A (2015) Replication stress and cancer. Nature reviews. Cancer 15(5):276-289. 59. Ganem C, et al. (2003) Ssu72 is a phosphatase essential for transcription termination of snoRNAs and specific mRNAs in yeast. The EMBO journal 22(7):1588-1598. 60. Gardner MW, Vasicek LA, Shabbir S, Anslyn EV, & Brodbelt JS (2008) Chromogenic cross-linker for the characterization of protein structure by infrared multiphoton dissociation mass spectrometry. Analytical chemistry 80(13):4807-4819. 61. Gavin AC, et al. (2002) Functional organization of the yeast proteome by

systematic analysis of protein complexes. Nature 415(6868):141-147. 62. Ghosh A, Shuman S, & Lima CD (2008) The structure of Fcp1, an essential RNA polymerase II CTD phosphatase. Molecular cell 32(4):478-490. 63. Ghosh A, Shuman S, & Lima CD (2011) Structural insights to how mammalian capping enzyme reads the CTD code. Molecular cell 43(2):299-310. 64. Gilbert W & Guthrie C (2004) The Glc7p nuclear phosphatase promotes mRNA export by facilitating association of Mex67p with mRNA. Molecular cell 13(2):201-212. 65. Glover-Cutter K, et al. (2009) TFIIH-associated Cdk7 kinase functions in phosphorylation of C-terminal domain Ser7 residues, promoter-proximal pausing, and termination by RNA polymerase II. Molecular and cellular biology 29(20):5455-5464. 66. Greer SM, Cannon JR, & Brodbelt JS (2014) Improvement of shotgun proteomics in the negative mode by carbamylation of peptides and ultraviolet photodissociation mass spectrometry. Analytical chemistry 86(24):12285-12290. 67. Greer SM, Parker WR, & Brodbelt JS (2015) Impact of Protease on Ultraviolet Photodissociation Mass Spectrometry for Bottom-up Proteomics. Journal of proteome research 14(6):2626-2632. 68. Gu B, Eick D, & Bensaude O (2013) CTD serine-2 plays a critical role in splicing and termination factor recruitment to RNA polymerase II in vivo. Nucleic Acids Research 41(3):1591-1603. 69. Guillamot M, et al. (2011) Cdc14b regulates mammalian RNA polymerase II and represses cell cycle transcription. Scientific reports 1:189. 70. Guiro J & Murphy S (2017) Regulation of expression of human RNA polymerase II-transcribed snRNA genes. Open biology 7(6). 71. Han SW, et al. (2012) Tyrosine sulfation in a Gram-negative bacterium. Nat Commun 3:1153. 72. Hanes SD (2014) The Ess1 prolyl isomerase: traffic cop of the RNA polymerase II transcription cycle. Biochimica et biophysica acta 1839(4):316-333. 73. Hani J, et al. 
(1999) Mutations in a peptidylprolyl-cis/trans-isomerase gene lead to a defect in 3'-end formation of a pre-mRNA in Saccharomyces cerevisiae. The Journal of biological chemistry 274(1):108-116. 74. Harlen KM & Churchman LS (2017) The code and beyond: transcription regulation by the RNA polymerase II carboxy-terminal domain. Nature reviews. Molecular cell biology 18(4):263-273. 75. Harlen KM, et al. (2016) Comprehensive RNA Polymerase II Interactomes Reveal Distinct and Varied Roles for Each Phospho-CTD Residue. Cell Reports 15(10):2147-2158. 76. Hausmann S & Shuman S (2002) Characterization of the CTD phosphatase Fcp1 from fission yeast. Preferential dephosphorylation of serine 2 versus serine 5. The Journal of biological chemistry 277(24):21213-21220. 77. He X, et al. (2003) Functional interactions between the transcription and mRNA 3' end processing machineries mediated by Ssu72 and Sub1. Genes &

development 17(8):1030-1042. 78. Heidemann M, Hintermair C, Voss K, & Eick D (2013) Dynamic phosphorylation patterns of RNA polymerase II CTD during transcription. Biochimica et biophysica acta 1829(1):55-62. 79. Herz HM, et al. (2012) Polycomb repressive complex 2-dependent and -independent functions of Jarid2 in transcriptional regulation in Drosophila. Molecular and cellular biology 32(9):1683-1693. 80. Hintermair C, et al. (2012) Threonine-4 of mammalian RNA polymerase II CTD is targeted by Polo-like kinase 3 and required for transcriptional elongation. Embo Journal 31(12):2784-2797. 81. Hintermair C, et al. (2012) Threonine-4 of mammalian RNA polymerase II CTD is targeted by Polo-like kinase 3 and required for transcriptional elongation. The EMBO journal 31(12):2784-2797. 82. Hirose Y & Manley JL (1998) RNA polymerase II is an essential mRNA polyadenylation factor. Nature 395(6697):93-96. 83. Ho CK & Shuman S (1999) Distinct roles for CTD Ser-2 and Ser-5 phosphorylation in the recruitment and allosteric activation of mammalian mRNA capping enzyme. Molecular cell 3(3):405-411. 84. Hornbeck PV, et al. (2015) PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic acids research 43(Database issue):D512-520. 85. Hsin J-P, Li W, Hoque M, Tian B, & Manley JL (2014) RNAP II CTD tyrosine 1 performs diverse functions in vertebrate cells. Elife 3. 86. Hsin J-P, Sheth A, & Manley JL (2011) RNAP II CTD Phosphorylated on Threonine-4 Is Required for Histone mRNA 3' End Processing. Science 334(6056):683-686. 87. Hsin JP, Sheth A, & Manley JL (2011) RNAP II CTD phosphorylated on threonine-4 is required for histone mRNA 3' end processing. Science 334(6056):683-686. 88. Hsin JP, Xiang K, & Manley JL (2014) Function and control of RNA polymerase II C-terminal domain phosphorylation in vertebrate transcription and RNA processing. Molecular and cellular biology 34(13):2488-2498. 89. Hsu PL, et al. 
(2014) Rtr1 is a dual specificity phosphatase that dephosphorylates Tyr1 and Ser5 on the RNA polymerase II CTD. Journal of molecular biology 426(16):2970-2981. 90. Ingham PW, Nakano Y, & Seger C (2011) Mechanisms and functions of Hedgehog signalling across the metazoa. Nature reviews. Genetics 12(6):393-406. 91. Jeronimo C, Bataille AR, & Robert F (2013) The writers, readers, and functions of the RNA polymerase II C-terminal domain code. Chemical reviews 113(11):8491-8522. 92. Jeronimo C, Collin P, & Robert F (2016) The RNA Polymerase II CTD: The Increasing Complexity of a Low-Complexity Protein Domain. Journal of Molecular Biology 428(12):2607-2622. 93. Jeronimo C & Robert F (2014) Kin28 regulates the transient association of

Mediator with core promoters. Nature structural & molecular biology 21(5):449-455. 94. Jonkers I & Lis JT (2015) Getting up to speed with transcription elongation by RNA polymerase II. Nature reviews. Molecular cell biology 16(3):167-177. 95. Kang ME & Dahmus ME (1993) RNA polymerases IIA and IIO have distinct roles during transcription from the TATA-less murine dihydrofolate reductase promoter. The Journal of biological chemistry 268(33):25033-25040. 96. Kelly WG, Dahmus ME, & Hart GW (1993) RNA polymerase II is a glycoprotein. Modification of the COOH-terminal domain by O-GlcNAc. The Journal of biological chemistry 268(14):10416-10424. 97. Kim H, et al. (2010) Gene-specific RNA polymerase II phosphorylation and the CTD code. Nature structural & molecular biology 17(10):1279-1286. 98. Klein DR, Holden DD, & Brodbelt JS (2016) Shotgun Analysis of Rough-Type Lipopolysaccharides Using Ultraviolet Photodissociation Mass Spectrometry. Analytical chemistry 88(1):1044-1051. 99. Klose RJ, Kallin EM, & Zhang Y (2006) JmjC-domain-containing proteins and histone demethylation. Nature reviews. Genetics 7(9):715-727. 100. Kobor MS, et al. (1999) An unusual eukaryotic protein phosphatase required for transcription by RNA polymerase II and CTD dephosphorylation in S. cerevisiae. Molecular cell 4(1):55-62. 101. Kobor MS, et al. (2000) A motif shared by TFIIF and TFIIB mediates their interaction with the RNA polymerase II carboxy-terminal domain phosphatase Fcp1p in Saccharomyces cerevisiae. Molecular and cellular biology 20(20):7438-7449. 102. Koch F, et al. (2011) Transcription initiation platforms and GTF recruitment at tissue-specific enhancers and promoters. Nature structural & molecular biology 18(8):956-963. 103. Koch F, Jourquin F, Ferrier P, & Andrau JC (2008) Genome-wide RNA polymerase II: not genes only! Trends in biochemical sciences 33(6):265-273. 104. 
Komarnitsky P, Cho EJ, & Buratowski S (2000) Different phosphorylated forms of RNA polymerase II and associated mRNA processing factors during transcription. Genes & development 14(19):2452-2460. 105. Kops O, Zhou XZ, & Lu KP (2002) Pin1 modulates the dephosphorylation of the RNA polymerase II C-terminal domain by yeast Fcp1. FEBS letters 513(2-3):305-311. 106. Krauss G (2003) Biochemistry of signal transduction and regulation (Wiley-VCH, Weinheim Great Britain) 3rd Ed pp xvi, 541 p. 107. Krishnamurthy S, He X, Reyes-Reyes M, Moore C, & Hampsey M (2004) Ssu72 Is an RNA polymerase II CTD phosphatase. Molecular cell 14(3):387-394. 108. Krogan NJ, et al. (2003) The Paf1 complex is required for histone h3 methylation by COMPASS and Dot1p: Linking transcriptional elongation to histone methylation. Molecular Cell 11(3):721-729. 109. Krogan NJ, et al. (2003) Methylation of histone H3 by Set2 in Saccharomyces

cerevisiae is linked to transcriptional elongation by RNA polymerase II. Molecular and Cellular Biology 23(12):4207-4218. 110. Kubicek K, et al. (2012) Serine phosphorylation and proline isomerization in RNAP II CTD control recruitment of Nrd1. Genes & development 26(17):1891-1896. 111. Kuehner JN, Pearson EL, & Moore C (2011) Unravelling the means to an end: RNA polymerase II transcription termination. Nature reviews. Molecular cell biology 12(5):283-294. 112. Liang K, et al. (2015) Characterization of human cyclin-dependent kinase 12 (CDK12) and CDK13 complexes in C-terminal domain phosphorylation, gene transcription, and RNA processing. Molecular and cellular biology 35(6):928-938. 113. Licatalosi DD, et al. (2002) Functional interaction of yeast pre-mRNA 3' end processing factors with RNA polymerase II. Molecular Cell 9(5):1101-1111. 114. Liou YC, Zhou XZ, & Lu KP (2011) Prolyl isomerase Pin1 as a molecular switch to determine the fate of phosphoproteins. Trends in biochemical sciences 36(10):501-514. 115. Liu X, Kraus WL, & Bai X (2015) Ready, pause, go: regulation of RNA polymerase II pausing and release by cellular signaling pathways. Trends in biochemical sciences 40(9):516-525. 116. Liu Y, et al. (2004) Two cyclin-dependent kinases promote RNA polymerase II transcription and formation of the scaffold complex. Molecular and cellular biology 24(4):1721-1735. 117. Liu ZG, et al. (1996) Three distinct signalling responses by murine fibroblasts to genotoxic stress. Nature 384(6606):273-276. 118. Loya TJ & Reines D (2016) Recent advances in understanding transcription termination by RNA polymerase II. F1000Research 5. 119. Lu H, Flores O, Weinmann R, & Reinberg D (1991) The nonphosphorylated form of RNA polymerase II preferentially associates with the preinitiation complex. Proceedings of the National Academy of Sciences of the United States of America 88(22):10004-10008. 120. 
Lu H, Zawel L, Fisher L, Egly JM, & Reinberg D (1992) Human general transcription factor IIH phosphorylates the C-terminal domain of RNA polymerase II. Nature 358(6388):641-645. 121. Lu KP, Finn G, Lee TH, & Nicholson LK (2007) Prolyl cis-trans isomerization as a molecular timer. Nature chemical biology 3(10):619-629. 122. Lu KP, Liou YC, & Zhou XZ (2002) Pinning down proline-directed phosphorylation signaling. Trends in cell biology 12(4):164-172. 123. Lunde BM, et al. (2010) Cooperative interaction of transcription termination factors with the RNA polymerase II C-terminal domain. Nature Structural & Molecular Biology 17(10):1195-+. 124. Luo Y, et al. (2013) Novel modifications on C-terminal domain of RNA polymerase II can fine-tune the phosphatase activity of Ssu72. ACS chemical

biology 8(9):2042-2052. 125. Madsen JA, Boutz DR, & Brodbelt JS (2010) Ultrafast ultraviolet photodissociation at 193 nm and its applicability to proteomic workflows. Journal of proteome research 9(8):4205-4214. 126. Madsen JA, Kaoud TS, Dalby KN, & Brodbelt JS (2011) 193-nm photodissociation of singly and multiply charged peptide anions for acidic proteome characterization. Proteomics 11(7):1329-1334. 127. Madsen JA, et al. (2013) Concurrent automated sequencing of the glycan and peptide portions of O-linked glycopeptide anions by ultraviolet photodissociation mass spectrometry. Analytical chemistry 85(19):9253-9261. 128. Mandel CR, et al. (2006) Polyadenylation factor CPSF-73 is the pre-mRNA 3'-end-processing endonuclease. Nature 444(7121):953-956. 129. Margueron R & Reinberg D (2011) The Polycomb complex PRC2 and its mark in life. Nature 469(7330):343-349. 130. Mathews CK (2013) Biochemistry (Pearson, Toronto) 4th Ed pp xxvi, 1342 p. 131. Mayer A, et al. (2012) CTD tyrosine phosphorylation impairs termination factor recruitment to RNA polymerase II. Science 336(6089):1723-1725. 132. Mayer A, Landry HM, & Churchman LS (2017) Pause & go: from the discovery of RNA polymerase pausing to its functional implications. Current opinion in cell biology 46:72-80. 133. Mayer A, et al. (2010) Uniform transitions of the general RNA polymerase II transcription complex. Nature structural & molecular biology 17(10):1272-1278. 134. Mayfield JE, Burkholder NT, & Zhang YJ (2016) Dephosphorylating eukaryotic RNA polymerase II. Biochimica et biophysica acta 1864(4):372-387. 135. Mayfield JE, et al. (2015) Chemical Tools To Decipher Regulation of Phosphatases by Proline Isomerization on Eukaryotic RNA Polymerase II. ACS chemical biology. 136. McCracken S, et al. (1997) The C-terminal domain of RNA polymerase II couples mRNA processing to transcription. Nature 385(6614):357-361. 137. 
Meinhart A & Cramer P (2004) Recognition of RNA polymerase II carboxy-terminal domain by 3'-RNA-processing factors. Nature 430(6996):223-226. 138. Meinhart A, Silberzahn T, & Cramer P (2003) The mRNA transcription/processing factor Ssu72 is a potential tyrosine phosphatase. The Journal of biological chemistry 278(18):15917-15921. 139. Min SH, et al. (2012) Negative regulation of the stability and tumor suppressor function of Fbw7 by the Pin1 prolyl isomerase. Molecular cell 46(6):771-783. 140. Morgan DO (1997) Cyclin-dependent kinases: engines, clocks, and microprocessors. Annual review of cell and developmental biology 13:261-291. 141. Morris DP & Greenleaf AL (2000) The splicing factor, Prp40, binds the phosphorylated carboxyl-terminal domain of RNA polymerase II. The Journal of biological chemistry 275(51):39935-39943. 142. Mosley AL, et al. (2009) Rtr1 is a CTD phosphatase that regulates RNA polymerase II during the transition from serine 5 to serine 2 phosphorylation.

Molecular cell 34(2):168-178. 143. Murthy KG & Manley JL (1995) The 160-kD subunit of human cleavage-polyadenylation specificity factor coordinates pre-mRNA 3'-end formation. Genes & development 9(21):2672-2683. 144. Namanja AT, et al. (2010) Toward flexibility-activity relationships by NMR spectroscopy: dynamics of Pin1 ligands. Journal of the American Chemical Society 132(16):5607-5609. 145. Nedea E, et al. (2003) Organization and function of APT, a subcomplex of the yeast cleavage and polyadenylation factor involved in the formation of mRNA and small nucleolar RNA 3'-ends. The Journal of biological chemistry 278(35):33000-33010. 146. Nedea E, et al. (2008) The Glc7 phosphatase subunit of the cleavage and polyadenylation factor is essential for transcription termination on snoRNA genes. Molecular cell 29(5):577-587. 147. Nelson DL, Lehninger AL, & Cox MM (2013) Lehninger principles of biochemistry (W.H. Freeman, New York) 6th Ed p 1 vol. (various pagings). 148. Nesti E, Corson GM, McCleskey M, Oyer JA, & Mandel G (2014) C-terminal domain small phosphatase 1 and MAP kinase reciprocally control REST stability and neuronal differentiation. Proceedings of the National Academy of Sciences of the United States of America 111(37):E3929-3936. 149. Ng HH, Robert F, Young RA, & Struhl K (2003) Targeted recruitment of Set1 histone methylase by elongating Pol II provides a localized mark and memory of recent transcriptional activity. Molecular cell 11(3):709-719. 150. Ni Z, et al. (2008) P-TEFb is critical for the maturation of RNA polymerase II into productive elongation in vivo. Molecular and cellular biology 28(3):1161-1170. 151. Nojima T, et al. (2015) Mammalian NET-Seq Reveals Genome-wide Nascent Transcription Coupled to RNA Processing. Cell 161(3):526-540. 152. Nurmio M, et al. (2007) Inhibition of tyrosine kinases PDGFR and C-Kit by imatinib mesylate interferes with postnatal testicular development in the rat. 
International journal of andrology 30(4):366-376; discussion 376. 153. Nusse R (2005) Wnt signaling in disease and in development. Cell research 15(1):28-32. 154. Orphanides G, LeRoy G, Chang CH, Luse DS, & Reinberg D (1998) FACT, a factor that facilitates transcript elongation through nucleosomes. Cell 92(1):105-116. 155. Otwinowski Z & Minor W (1997) Processing of X-ray diffraction data collected in oscillation mode. Method Enzymol 276:307-326. 156. Palancade B, et al. (2004) Dephosphorylation of RNA polymerase II by CTD-phosphatase FCP1 is inhibited by phospho-CTD associating proteins. Journal of molecular biology 335(2):415-424. 157. Palumbo AM, et al. (2011) Tandem mass spectrometry strategies for phosphoproteome analysis. Mass spectrometry reviews 30(4):600-625.

158. Pappas DL, Jr. & Hampsey M (2000) Functional interaction between Ssu72 and the Rpb2 subunit of RNA polymerase II in Saccharomyces cerevisiae. Molecular and cellular biology 20(22):8343-8351. 159. Patturajan M, et al. (1998) Growth-related changes in phosphorylation of yeast RNA polymerase II. The Journal of biological chemistry 273(8):4689-4694. 160. Phatnani HP & Greenleaf AL (2006) Phosphorylation and functions of the RNA polymerase II CTD. Genes & development 20(21):2922-2936. 161. Pires-daSilva A & Sommer RJ (2003) The evolution of signalling pathways in animal development. Nature reviews. Genetics 4(1):39-49. 162. Portz B, et al. (2017) Structural heterogeneity in the intrinsically disordered RNA polymerase II C-terminal domain. Nat Commun 8:15231. 163. Price DH (2008) Poised polymerases: on your mark...get set...go! Molecular cell 30(1):7-10. 164. Proudfoot NJ (2016) Transcriptional termination in mammals: Stopping the RNA polymerase II juggernaut. Science 352(6291):aad9926. 165. Reiter LT, Potocki L, Chien S, Gribskov M, & Bier E (2001) A systematic analysis of human disease-associated gene sequences in Drosophila melanogaster. Genome research 11(6):1114-1125. 166. Reyes-Reyes M & Hampsey M (2007) Role for the Ssu72 C-terminal domain phosphatase in RNA polymerase II transcription elongation. Molecular and cellular biology 27(3):926-936. 167. Richard P & Manley JL (2009) Transcription termination by nuclear RNA polymerases. Genes & development 23(11):1247-1269. 168. Riley NM & Coon JJ (2016) Phosphoproteomics in the Age of Rapid and Deep Proteome Profiling. Analytical chemistry 88(1):74-94. 169. Robbins DJ, et al. (1993) Regulation and properties of extracellular signal-regulated protein kinases 1 and 2 in vitro. The Journal of biological chemistry 268(7):5097-5106. 170. Robinson MR, Madsen JA, & Brodbelt JS (2012) 193 nm ultraviolet photodissociation of imidazolinylated Lys-N peptides for de novo sequencing. Analytical chemistry 84(5):2433-2439. 171. 
Robinson MR, Moore KL, & Brodbelt JS (2014) Direct identification of tyrosine sulfation by using ultraviolet photodissociation mass spectrometry. Journal of the American Society for Mass Spectrometry 25(8):1461-1471. 172. Rosonina E, et al. (2014) Threonine-4 of the budding yeast RNAP II CTD couples transcription with Htz1-mediated chromatin remodeling. Proceedings of the National Academy of Sciences of the United States of America 111(33):11924-11931. 173. Roy AL & Singer DS (2015) Core promoters in transcription: old problem, new insights. Trends in biochemical sciences 40(3):165-171. 174. Rubin GM, et al. (2000) Comparative genomics of the eukaryotes. Science 287(5461):2204-2215.

175. Schreieck A, et al. (2014) RNA polymerase II termination involves C-terminal-domain tyrosine dephosphorylation by CPF subunit Glc7. Nature structural & molecular biology 21(2):175-179. 176. Schroeder SC, Schwer B, Shuman S, & Bentley D (2000) Dynamic association of capping enzymes with transcribing RNA polymerase II. Genes & Development 14(19):2435-2440. 177. Schuller R, et al. (2016) Heptad-Specific Phosphorylation of RNA Polymerase II CTD. Molecular cell 61(2):305-314. 178. Schwer B & Shuman S (2011) Deciphering the RNA polymerase II CTD code in fission yeast. Molecular cell 43(2):311-318. 179. Shaw JB, Madsen JA, Xu H, & Brodbelt JS (2012) Systematic comparison of ultraviolet photodissociation and electron transfer dissociation for peptide anion characterization. Journal of the American Society for Mass Spectrometry 23(10):1707-1715. 180. Shearwin KE, Callen BP, & Egan JB (2005) Transcriptional interference--a crash course. Trends in genetics : TIG 21(6):339-345. 181. Shi Y (2009) Serine/threonine phosphatases: mechanism through structure. Cell 139(3):468-484. 182. Shou W, et al. (1999) Exit from mitosis is triggered by Tem1-dependent release of the protein phosphatase Cdc14 from nucleolar RENT complex. Cell 97(2):233-244. 183. Sievers F, et al. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular systems biology 7:539. 184. Singh N, et al. (2009) The Ess1 prolyl isomerase is required for transcription termination of small noncoding RNAs via the Nrd1 pathway. Molecular cell 36(2):255-266. 185. Skalska L, Beltran-Nebot M, Ule J, & Jenner RG (2017) Regulatory feedback from nascent RNA to chromatin and transcription. Nature reviews. Molecular cell biology 18(5):331-337. 186. Smorodinsky-Atias K, et al. (2016) Intrinsically active variants of Erk oncogenically transform cells and disclose unexpected autophosphorylation capability that is independent of TEY phosphorylation. Molecular biology of the cell 27(6):1026-1039. 187. Songyang Z, et al. 
(1995) Catalytic specificity of protein-tyrosine kinases is critical for selective signalling. Nature 373(6514):536-539. 188. Steinmetz EJ & Brow DA (2003) Ssu72 protein mediates both poly(A)-coupled and poly(A)-independent termination of RNA polymerase II transcription. Molecular and cellular biology 23(18):6339-6349. 189. Steinmetz EJ, et al. (2006) Genome-wide distribution of yeast RNA polymerase II and its control by Sen1 helicase. Molecular cell 24(5):735-746. 190. Suh H, et al. (2016) Direct Analysis of Phosphorylation Sites on the Rpb1 C-Terminal Domain of RNA Polymerase II. Molecular cell 61(2):297-304. 191. Sun M, Lariviere L, Dengl S, Mayer A, & Cramer P (2010) A Tandem SH2 Domain in Transcription Elongation Factor Spt6 Binds the Phosphorylated RNA

Polymerase II C-terminal Repeat Domain (CTD). Journal of Biological Chemistry 285(53):41597-41603. 192. Sun ZW & Hampsey M (1996) Synthetic enhancement of a TFIIB defect by a mutation in SSU72, an essential yeast gene encoding a novel protein that affects transcription start site selection in vivo. Molecular and cellular biology 16(4):1557-1566. 193. Team RC (2013) R: A language and environment for statistical computing. (Vienna, Austria). 194. Tee WW, Shen SS, Oksuz O, Narendra V, & Reinberg D (2014) Erk1/2 activity promotes chromatin features and RNAPII phosphorylation at developmental promoters in mouse ESCs. Cell 156(4):678-690. 195. Terzi N, Churchman LS, Vasiljeva L, Weissman J, & Buratowski S (2011) H3K4 Trimethylation by Set1 Promotes Efficient Termination by the Nrd1-Nab3-Sen1 Pathway. Molecular and Cellular Biology 31(17):3569-3583. 196. Thomas MC & Chiang CM (2006) The general transcription machinery and general cofactors. Critical reviews in biochemistry and molecular biology 41(3):105-178. 197. Thompson MS, Cui W, & Reilly JP (2007) Factors that impact the vacuum ultraviolet photofragmentation of peptide ions. Journal of the American Society for Mass Spectrometry 18(8):1439-1452. 198. Tietjen JR, et al. (2010) Chemical-genomic dissection of the CTD code. Nature structural & molecular biology 17(9):1154-1161. 199. Trigon S, et al. (1998) Characterization of the residues phosphorylated in vitro by different C-terminal domain kinases. The Journal of biological chemistry 273(12):6769-6775. 200. Vagin AA, et al. (2004) REFMAC5 dictionary: organization of prior chemical knowledge and guidelines for its use. Acta crystallographica. Section D, Biological crystallography 60(Pt 12 Pt 1):2184-2195. 201. Verdecia MA, Bowman ME, Lu KP, Hunter T, & Noel JP (2000) Structural basis for phosphoserine-proline recognition by group IV WW domains. Nature structural biology 7(8):639-643. 202. Visintin R, et al. 
(1998) The phosphatase Cdc14 triggers mitotic exit by reversal of Cdk-dependent phosphorylation. Molecular cell 2(6):709-718. 203. Visintin R, Hwang ES, & Amon A (1999) Cfi1 prevents premature exit from mitosis by anchoring Cdc14 phosphatase in the nucleolus. Nature 398(6730):818-823. 204. Wang XJ, et al. (2003) Serine-cis-proline and serine-trans-proline isosteres: stereoselective synthesis of (Z)- and (E)-alkene mimics by Still-Wittig and Ireland-Claisen rearrangements. The Journal of organic chemistry 68(6):2343-2349. 205. Wang XJ, Xu B, Mullins AB, Neiler FK, & Etzkorn FA (2004) Conformationally locked isostere of phosphoSer-cis-Pro inhibits Pin1 23-fold better than phosphoSer-trans-Pro isostere. Journal of the American Chemical Society

126(47):15533-15542. 206. Weake VM & Workman JL (2010) Inducible gene expression: diverse regulatory mechanisms. Nature reviews. Genetics 11(6):426-437. 207. Wedemeyer WJ, Welker E, & Scheraga HA (2002) Proline cis-trans isomerization and protein folding. Biochemistry 41(50):14637-14644. 208. Welinder C & Ekblad L (2011) Coomassie staining as loading control in Western blot analysis. Journal of proteome research 10(3):1416-1419. 209. Werner-Allen JW, et al. (2011) cis-Proline-mediated Ser(P)5 dephosphorylation by the RNA polymerase II C-terminal domain phosphatase Ssu72. The Journal of biological chemistry 286(7):5717-5726. 210. West ML & Corden JL (1995) Construction and analysis of yeast RNA polymerase II CTD deletion and substitution mutations. Genetics 140(4):1223-1233. 211. Winn MD, et al. (2011) Overview of the CCP4 suite and current developments. Acta crystallographica. Section D, Biological crystallography 67(Pt 4):235-242. 212. Winsor TS, Bartkowiak B, Bennett CB, & Greenleaf AL (2013) A DNA damage response system associated with the phosphoCTD of elongating RNA polymerase II. PloS one 8(4):e60909. 213. Winter GE, et al. (2012) Systems-pharmacology dissection of a drug synergy in imatinib-resistant CML. Nature chemical biology 8(11):905-912. 214. Wong KH, Jin Y, & Struhl K (2014) TFIIH Phosphorylation of the Pol II CTD Stimulates Mediator Dissociation from the Preinitiation Complex and Promoter Escape. Molecular Cell 54(4):601-612. 215. Wu X, Rossettini A, & Hanes SD (2003) The ESS1 prolyl isomerase and its suppressor BYE1 interact with RNA pol II to inhibit transcription elongation in Saccharomyces cerevisiae. Genetics 165(4):1687-1702. 216. Wu X, et al. (2000) The Ess1 prolyl isomerase is linked to chromatin remodeling complexes and the general transcription machinery. The EMBO journal 19(14):3727-3738. 217. 
Xiang K, Manley JL, & Tong L (2012) An unexpected binding mode for a Pol II CTD peptide phosphorylated at Ser7 in the active site of the CTD phosphatase Ssu72. Genes & development 26(20):2265-2270. 218. Xiang K, et al. (2010) Crystal structure of the human symplekin-Ssu72-CTD phosphopeptide complex. Nature 467(7316):729-733. 219. Xu YX, Hirose Y, Zhou XZ, Lu KP, & Manley JL (2003) Pin1 modulates the structure and function of human RNA polymerase II. Genes & development 17(22):2765-2776. 220. Xue Y, et al. (2013) Direct conversion of fibroblasts to neurons by reprogramming PTB-regulated microRNA circuits. Cell 152(1-2):82-96. 221. Yeo M, et al. (2005) Small CTD phosphatases function in silencing neuronal gene expression. Science 307(5709):596-600. 222. Yeo M, Lin PS, Dahmus ME, & Gill GN (2003) A novel RNA polymerase II C-terminal domain phosphatase that preferentially dephosphorylates serine 5. The

Journal of biological chemistry 278(28):26078-26085. 223. Yogesha SD, Mayfield JE, & Zhang Y (2014) Cross-talk of phosphorylation and prolyl isomerization of the C-terminal domain of RNA Polymerase II. Molecules 19(2):1481-1511. 224. Yu M, et al. (2015) RNA polymerase II-associated factor 1 regulates the release and phosphorylation of paused RNA polymerase II. Science 350(6266):1383-1386. 225. Zhang DW, et al. (2012) Ssu72 phosphatase-dependent erasure of phospho-Ser7 marks on the RNA polymerase II C-terminal domain is essential for viability and transcription termination. The Journal of biological chemistry 287(11):8541-8551. 226. Zhang M, et al. (2012) Structural and kinetic analysis of prolyl-isomerization/phosphorylation cross-talk in the CTD code. ACS chemical biology 7(8):1462-1470. 227. Zhang Y, et al. (2006) Determinants for dephosphorylation of the RNA polymerase II C-terminal domain by Scp1. Molecular cell 24(5):759-770. 228. Zhang Y, Zhang M, & Zhang Y (2011) Crystal structure of Ssu72, an essential eukaryotic phosphatase specific for the C-terminal domain of RNA polymerase II, in complex with a transition state analogue. The Biochemical journal 434(3):435-444. 229. Zhang Z & Gilmour DS (2006) Pcf11 is a termination factor in Drosophila that dismantles the elongation complex by bridging the CTD of RNA polymerase II to the nascent transcript. Molecular cell 21(1):65-74.
