The impact of cytoplasmic capping on transcriptome complexity

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the

Graduate School of The Ohio State University

By

Daniel E. del Valle-Morales, B.S.

Graduate Program in Molecular, Cellular, and Developmental Biology

The Ohio State University

2020

Dissertation Committee

Daniel R. Schoenberg, Advisor

Dawn S. Chandler

Ralf Bundschuh

Guramrit Singh

Copyrighted by

Daniel E. del Valle-Morales

2020

Abstract

The 5’ cap is an essential modification of mRNAs that is needed for the functionality and lifespan of an mRNA. The cap is added almost immediately after the first nucleotide is transcribed, coordinated by RNGTT and RNMT-RAM bound to the C-terminal tail of RNA Pol II. Capping was thought to occur exclusively in the nucleus, and loss of the cap was thought to be irreversible, leading to the rapid degradation of the mRNA. However, not all mRNAs share this fate. Over the last decade, the Schoenberg lab has characterized cytoplasmic capping, a process in which the cap is restored to previously decapped mRNAs. Cytoplasmic capping is catalyzed by a complex that consists of a cytoplasmic pool of both RNGTT and RNMT-RAM bound to the adapter NCK1, along with an unknown 5’ monophosphate kinase. mRNAs that undergo cytoplasmic capping can cycle between a decapped state and a recapped state as a way to fine-tune expression, a process called cap homeostasis.

The recapping targets were initially identified using a catalytically inactive and cytoplasmically restricted form of RNGTT termed K294A. Overexpression of K294A resulted in an accumulation of uncapped mRNAs in non-translating mRNPs. A major drawback of this approach was its reliance on detecting uncapped mRNAs, a potentially unstable population that can undergo partial degradation. An alternative tool to study recapping was developed using a catalytically inactive and cytoplasmically restricted form of RNMT termed ΔN-RNMT. ΔN-RNMT expression results in a decrease in the steady state levels of recapped mRNAs. A U2OS stable cell line expressing ΔN-RNMT was developed, and changes in the steady state levels of potential recapping targets were measured by RNA-Seq. 5’ terminal oligopyrimidine (TOP) mRNAs were identified as recapping targets, and I showed direct evidence of recapping occurring at both the canonical site and downstream in the 5’UTR of eIF3D and eIF3K. Expression of ΔN-RNMT also results in a shift in 3’UTR usage from proximal sites to distal sites.

Uncapped mRNAs that accumulated with inhibition of cytoplasmic capping map to downstream CAGE tags (Kiss et al., 2015). If recapping were to occur at these sites, N-terminally truncated proteins could be translated. In collaboration with the Wysocki lab, we examined the relationship between cytoplasmic capping and N-terminally truncated proteins and showed that half of the downstream N-termini identified decrease when cytoplasmic capping is inhibited. These downstream N-terminal peptides correspond to RNA binding proteins, and the mapped downstream N-termini have a start site in the vicinity of the peptide. These studies expand the scope of cytoplasmic capping by providing both direct evidence of recapping at canonical and downstream sites and evidence that cytoplasmic capping expands the proteome through synthesis of truncated proteins.


Dedication

I could write a thesis in itself about all the people I would like to dedicate this to, but I do want to dedicate this to my undergraduate mentor Carlos Ruiz.


Acknowledgements

I first would like to give thanks to my collaborators for their contributions to this work, particularly Bernice Agana for her fantastic work. Thank you to the Center for RNA Biology, the Department of Biological Chemistry and Pharmacology, and the OSU Graduate School for supporting my work with funding and awards and for providing excellent facilities in which to do research and meet new scientists. Thanks to the NIH for providing the supplemental grant that funded the majority of my research.

I would also like to acknowledge my committee members, Ralf, Guramrit, and Dawn, for their tough yet fair critiques of my work. I would like to thank all of the past lab members: Chandrama, my first lab mentor; Dan Kiss, for training me as an RNA biologist; Jackson, for his friendship; Shan-Qing; and the various undergraduates, Gabe, Andrew, and Mikaela. Thanks to Mike Kearse and his lab for that brief moment that we shared a lab bench. And to Wen Tan for his extensive lab discussions.

And thank you, Daniel Schoenberg, for taking me in as an undergraduate during my first research internship. That experience cemented my path towards my Ph.D. and it was a blast to work for you. Gracias por tu ayuda!


Vita

2011 …… First Bilingual Preparatory School, Aguadilla, Puerto Rico

2014 …… B.S. Natural Science, University of Puerto Rico, Aguadilla

2014 to present …… Graduate Research Associate, The Ohio State University

Publications

del Valle-Morales D., Trotman J., Bundschuh R., Schoenberg D.R. (2020) Inhibition of cytoplasmic cap methylation identifies 5’ TOP mRNAs as recapping targets and reveals recapping sites downstream of native 5’ends. Nucleic Acids Res., 48(7), 3806–3815

Fields of study

Major Field: Molecular, Cellular, and Developmental Biology


Table of contents

Abstract
Dedication
Acknowledgements
Vita
List of Tables
List of Figures
Chapter 1. Introduction: Recapping of Cytoplasmic mRNAs
  Abstract
  Introduction
  The 5’ cap and canonical capping
  Early Evidence of Cytoplasmic Capping
  The Cytoplasmic Capping Complex
  Characteristics of Recapping Targets
  Capping downstream of canonical capping sites
  Unanswered Questions
Chapter 2. Inhibition of cytoplasmic cap methylation identifies 5’ TOP mRNAs as recapping targets and reveals recapping sites downstream of native 5’ends
  Abstract
  Introduction
  Materials and Methods
  Results
  Discussion
  Acknowledgements
Chapter 3. mRNA recapping increases proteome complexity by enabling translation downstream of canonical 5’ends
  Abstract
  Introduction
  Materials and Methods
  Results
  Discussion
Chapter 4. Identification of cytoplasmic capping sites in mRNAs
  Abstract
  Introduction
  Cap analysis of gene expression
  Cap-SMART
  TeloPrime
  ReCappable Seq
Chapter 5. Future Work and concluding remarks
References
Appendix: List of primers


List of Tables

Table 1 PANTHER analysis on ΔN-RNMT downregulated genes
Table 2 PANTHER analysis on ΔN-RNMT upregulated genes
Table 3 List of primers used in chapter 2
Table 4 List of primers used in chapter 3


List of Figures

Figure 1 Structure of the 5’ cap and the enzymatic steps of nuclear capping
Figure 2 The cytoplasmic capping complex
Figure 3 Identification of recapping targets through the inhibition of cytoplasmic capping
Figure 4 Recapping targets accumulate in non-translating mRNPs when cytoplasmic capping is inhibited
Figure 5 5’end processing of mRNAs as a possible origin of downstream recapping targets
Figure 6 Validation of ΔN-RNMT cell line
Figure 7 Metagene analysis of QuantSeq data
Figure 8 Position of sequence tags and quantitative changes of RNAs after expression of ΔN-RNMT
Figure 9 Differential expression on TOP mRNAs with ΔN-RNMT induction
Figure 10 Western blot analysis of TOP genes
Figure 11 Cap-end analysis detects a decrease at the canonical and downstream capping sites of TOP mRNAs
Figure 12 Induction of ΔN-RNMT does not activate the cellular stress response
Figure 13 3’UTR usage analysis reveals a shift from proximal to distal 3’UTR cleavage sites
Figure 14 PABPN1 protein levels increase with ΔN-RNMT induction
Figure 15 Inhibition of cytoplasmic capping shows a minor impact in the proteome
Figure 16 Approach for the enrichment of N-termini generated through cytoplasmic capping
Figure 17 Downstream N-termini decreases when cytoplasmic capping is inhibited
Figure 18 Downstream N-termini that decrease with K294A are enriched in RNA binding proteins
Figure 19 Downstream peptides have possible start sites close to the mapped peptide
Figure 20 TeloPrime coupled with Illumina library preparation detects 5’ truncated mRNAs
Figure 21 Cap analysis of gene expression (CAGE) for the identification of TSS
Figure 22 Cap-SMART protocol for enrichment of 5’ capped mRNAs
Figure 23 Recovery of mRNAs after Cap-SMART
Figure 24 TeloPrime protocol for the selection of capped mRNAs
Figure 25 Modified TeloPrime protocol with random hexamer priming
Figure 26 ReCappable Seq protocol
Figure 27 Recovery of biotinylated capped mRNAs after performing ReCappable


Chapter 1. Introduction: Recapping of cytoplasmic mRNAs

Abstract

The N7-methylguanosine cap is a vital modification of mRNAs that is required for their translation. Loss of the cap was thought to be irreversible and to lead to the rapid decay of uncapped mRNAs. This notion was challenged when a cytoplasmic pool of capping enzyme was identified in a complex capable of restoring the cap onto uncapped mRNAs. These mRNAs can cycle between an uncapped state and a recapped state to fine-tune gene expression in a process called cap homeostasis. This chapter gives an overview of the early evidence of cytoplasmic capping and the major advances in our understanding of the process.

Introduction

Eukaryotic mRNAs are heavily processed (Hocine et al., 2010). One of the first processing events is the addition of the N7-methylguanosine cap. The cap serves as a vital modification for the functionality of mRNAs. Proteins that bind the cap coordinate splicing, polyadenylation, and nuclear export of the mRNA (Gonatopoulos-Pournatzis & Cowling, 2013). Once the mRNA is exported, the cap protects it from 5’-3’ degradation (Ramanathan et al., 2016) and coordinates cap-dependent translation initiation. At the end of an mRNA’s life cycle, the cap is removed and the mRNA is degraded.

The process of capping was once thought to be predominantly nuclear, and decapping was considered to be irreversible, leading to the rapid degradation of mRNAs. This notion was challenged when a pool of capping enzyme (RNGTT) was found in the cytoplasm (Otsuka et al., 2009). This led to the identification of the cytoplasmic capping complex, which is capable of restoring the cap to uncapped mRNAs (Mukherjee et al., 2014) in a cyclical fashion to fine-tune gene expression (Mukherjee et al., 2014). This chapter summarizes the history of cytoplasmic capping, the findings that led to our working model of the cytoplasmic capping complex, and the characteristics of recapping targets.

Figure 1. Structure of the 5’ cap and the enzymatic steps of nuclear capping. A. Structure of the 5’ cap with the critical N7-methylguanosine (green) for cap 0 and the sites of methylation on the first nucleotide (cap 1) and second nucleotide (cap 2) in red. B. Enzymatic process of nuclear capping (Schoenberg & Maquat, 2009).


The 5’ cap and canonical capping

The 5’ cap consists of an inverted guanosine methylated at the N7 position, which is linked to the first transcribed nucleotide of mRNAs through a 5’ to 5’ triphosphate bridge (Figure 1A) (Furuichi et al., 1975). All Pol II transcripts, such as pre-mRNAs, long non-coding RNAs (lncRNAs), miRNA precursors, small nucleolar RNAs (snoRNAs), and small nuclear RNAs (snRNAs), are capped immediately as the first nucleotide is being transcribed. Capping is catalyzed by capping enzyme (RNGTT) bound to the phosphorylated C-terminal domain of RNA Pol II (Figure 1B). Capping starts when lysine 294 of RNGTT’s guanylyltransferase domain attacks the alpha phosphate of GTP to form a covalent GMP adduct. The triphosphatase domain converts the 5’ triphosphate of the newly transcribed mRNA into a diphosphate. The guanylyltransferase domain then transfers the GMP to the 5’ diphosphate end of the first transcribed nucleotide, forming the G-capped mRNA. RNA guanine-7 methyltransferase (RNMT) completes the cap by methylating the N7 position of the guanosine, forming the basic Cap0. Cap0 is sufficient for recognition by cap binding proteins and for translation (Calero et al., 2002). Other methylation events can occur at the 2’-O position of the first transcribed nucleotide (Cap1) and the second nucleotide (Cap2), catalyzed by CMTR1 and CMTR2, respectively (Belanger et al., 2010). If the first transcribed nucleotide is an adenosine, additional methylation at the N6 position (m6A) can occur (Sun et al., 2019). In vivo, Cap1 is used to distinguish self RNA from viral RNAs (Leung & Amarasinghe, 2016), while Cap2 enhances translation (Werner et al., 2011).

Once the mRNA is capped, the nuclear cap binding proteins (CBP), CBP20/CBP80 in eukaryotes, bind to the cap as a quality control to ensure that only capped mRNAs are spliced, cleaved, and polyadenylated at the 3’end (Gonatopoulos-Pournatzis & Cowling, 2013). The fully matured mRNA is then exported to the cytoplasm and is protected from degradation (Gonatopoulos-Pournatzis & Cowling, 2013). CBP20/CBP80 then recruits the translation machinery, and the initial round of translation serves as a quality control for mRNA integrity. The nuclear cap binding complex is then replaced by eIF4E to initiate cap-dependent translation. Other cap binding proteins have been discovered recently, such as LARP1, which suppresses the translation of mRNAs containing the 5’ terminal oligopyrimidine (TOP) motif (Philippe et al., 2018), and eIF3D, which can promote translation of a subset of mRNAs independent of eIF4E (Lee et al., 2016). The interaction of these two cap binding proteins with cytoplasmic capping is discussed further in chapter 2.

Near the end of an mRNA’s lifespan, the cap is removed by the decapping enzymes DCP1/DCP2. DCP1/DCP2 removes the cap by cleaving between the alpha and beta phosphates of the triphosphate bridge, generating m7GDP and a 5’ monophosphate end. The mRNA harboring a 5’ monophosphate is then a substrate for 5’ to 3’ decay by the exonuclease XRN1. Other decapping enzymes have been characterized, such as DXO, which cleaves improperly methylated caps (Jiao et al., 2010), and members of the Nudix family (Grudzien-Nogalska & Kiledjian, 2017; Song et al., 2013). Unlike DCP1/DCP2, some Nudix decapping enzymes such as Nudt2 cleave caps between the gamma and beta phosphates of the triphosphate linkage, generating a 5’ diphosphate end that is resistant to XRN1 (Dilweg et al., 2019; Charley et al., 2018).


Early evidence of cytoplasmic capping

Capping was thought to occur exclusively in the nucleus, and decapping was thought to lead to the rapid degradation of mRNAs. This notion was challenged by early studies of a nonsense mutation in the genetic blood disorder β-thalassemia (Lim & Maquat, 1992). Erythroid cells from a mouse model of β-thalassemia generated decay intermediates of β-globin mRNA that were 5’ truncated yet surprisingly stable. These decay intermediates were polyadenylated and only present in the cytoplasm (Lim & Maquat, 1992; Stevens et al., 2002). Upon closer examination, these decay intermediates contained a cap-like structure: they bound to an anti-cap antibody, were eluted with m7GDP, and no longer bound to the anti-cap antibody after treatment with the phosphoric ester hydrolase tobacco acid pyrophosphatase. These findings were confirmed much later by the Schoenberg lab, which showed that these decay intermediates bind to recombinant eIF4E, are resistant to XRN1, and become susceptible to degradation only after DCP2 treatment (Otsuka et al., 2009). These decay intermediates were later shown to be generated by the NMD-associated endonuclease SMG6 (Mascarenhas et al., 2013).

At the time, capping was thought to be restricted to the nucleus. These findings raised the question of whether a mechanism exists that is capable of adding the cap to uncapped mRNAs in the cytoplasm.


Figure 2. The cytoplasmic capping complex. (Trotman & Schoenberg, 2019). Current model of the enzymatic steps of cytoplasmic capping. Through cap homeostasis, recapping mRNAs cycle from a translationally active capped state to a translationally inactive uncapped state.

The cytoplasmic capping complex

The first hints of a mechanism capable of generating a cytoplasmic cap came from the detection of a cytoplasmic pool of RNGTT (Otsuka et al., 2009). Although thought to be predominantly nuclear, RNGTT was found in the cytoplasm in erythroid, U2OS, MEL, HEK293, and COS-1 cells. When this population of cytoplasmic RNGTT was immunoprecipitated, it was capable of incorporating GTP onto the 5’end of a 5’ monophosphate RNA but not onto the end of a 5’ hydroxyl RNA (Otsuka et al., 2009). The population of cytoplasmic RNGTT that had enzymatic activity sedimented at ~140 kDa, suggesting that RNGTT is in a complex with other proteins (Otsuka et al., 2009).


In an effort to identify the portion of RNGTT that is involved in cytoplasmic capping, we noticed that any alteration of the C-terminal region of RNGTT impaired the capping ability of cytoplasmic RNGTT (Mukherjee et al., 2014). The C-terminal region of mammalian RNGTT contains a proline-rich region that is predicted to bind to SH3 domains; this region was found to bind to NCK1 (NCK adapter protein 1). NCK1 consists of three SH3 domains and one SH2 domain. Mutagenesis of each domain provided insights into the position of RNGTT binding to NCK1 and the enzymatic activity associated with each domain (Mukherjee et al., 2014). Mutagenesis of the second SH3 domain impaired the ability of the complex to convert a 5’ monophosphate RNA into a diphosphate. Mutagenesis of the third SH3 domain, which is the interacting site for RNGTT, impaired guanylylation. RNMT, the same enzyme that carries out nuclear cap methylation, was also shown to interact transiently with RNGTT in the cytoplasm, along with its co-activator RAM (Trotman et al., 2017).

These findings are summarized in Figure 2 as our current working model of the biochemical steps in cytoplasmic capping. The cytoplasmic capping complex functions as a metabolon, in which the enzymes involved in the process are held adjacent to one another to generate a cytoplasmic Cap0. When mRNAs are decapped in the cytoplasm by DCP1/DCP2, the resulting monophosphate end is phosphorylated by an unknown kinase. This step may be bypassed if the mRNA was decapped by other decapping enzymes that generate a diphosphate. The resulting diphosphate end can be guanylylated by RNGTT, forming the G-capped mRNA. RNMT then methylates the cap at the N7 position, forming the fully matured cap. The recapped mRNA can then reenter the translating pool of mRNAs.
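The steps of the working model above can be sketched as a small state machine over the chemical state of the mRNA 5’ end. This is only an illustration of the order of reactions described in the text, not a quantitative model; the names `FivePrimeEnd`, `TRANSITIONS`, `apply_enzyme`, and `apply_enzymes` are hypothetical, and the string "kinase" stands in for the unidentified 5’ monophosphate kinase.

```python
from enum import Enum


class FivePrimeEnd(Enum):
    """Chemical state of the mRNA 5' end (labels are illustrative)."""
    M7G_CAP = "m7G cap (Cap0)"
    MONOPHOSPHATE = "5' monophosphate"
    DIPHOSPHATE = "5' diphosphate"
    G_CAP = "unmethylated G cap"


# Transitions taken from the working model in the text.
# "kinase" is a placeholder for the unknown 5' monophosphate kinase.
TRANSITIONS = {
    (FivePrimeEnd.M7G_CAP, "DCP2"): FivePrimeEnd.MONOPHOSPHATE,
    (FivePrimeEnd.MONOPHOSPHATE, "kinase"): FivePrimeEnd.DIPHOSPHATE,
    (FivePrimeEnd.DIPHOSPHATE, "RNGTT"): FivePrimeEnd.G_CAP,
    (FivePrimeEnd.G_CAP, "RNMT"): FivePrimeEnd.M7G_CAP,
}


def apply_enzyme(end, enzyme):
    """Return the new 5' end state; an enzyme with no matching substrate leaves it unchanged."""
    return TRANSITIONS.get((end, enzyme), end)


def apply_enzymes(end, enzymes):
    """Apply a sequence of enzymes in order and return the final 5' end state."""
    for enzyme in enzymes:
        end = apply_enzyme(end, enzyme)
    return end


# One full round of cap homeostasis returns the mRNA to the capped state.
full_cycle = apply_enzymes(FivePrimeEnd.M7G_CAP, ["DCP2", "kinase", "RNGTT", "RNMT"])

# Without the kinase step, the 5' monophosphate end is never guanylylated.
stalled = apply_enzymes(FivePrimeEnd.M7G_CAP, ["DCP2", "RNGTT", "RNMT"])
```

In this sketch a decapping enzyme that leaves a diphosphate end would simply start the cycle at `FivePrimeEnd.DIPHOSPHATE`, which mirrors the bypass of the kinase step described above.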


The proline-rich region is only present in the RNGTT of higher metazoans. However, a pool of capping enzyme was found in the cytoplasm of Drosophila, where it regulates hedgehog signaling (Chen et al., 2017). Trypanosomes also have cytoplasmic capping activity. They have a predominantly cytoplasmic capping enzyme (TbCE1) that surprisingly contains its own 5’ monophosphate kinase domain; it is the only enzyme identified to date with this activity (Ignatochkina et al., 2015). TbCE1 effectively recaps the spliced leader sequence present on all trypanosome mRNAs. Similar to cytoplasmic capping in higher metazoans, the guanine-N7 methyltransferase TbCMT1 methylates recapped mRNAs in trypanosomes (Hall & Ho, 2006; Ignatochkina et al., 2015). TbCE1 and its related enzymes are limited to kinetoplastids, suggesting a different evolutionary origin for recapping compared to the higher eukaryotes.


Figure 3. Identification of recapping targets through the inhibition of cytoplasmic capping (adapted from Mukherjee et al., 2012). A. Cytoplasmic RNA from uninduced and K294A-induced cells was treated ± XRN1, and the degree of 5’end degradation was measured using an Affymetrix human array. B. Recapping targets were classified based on their susceptibility to XRN1 degradation. The native group was susceptible to XRN1 degradation in uninduced cells. The capping inhibited group was susceptible to XRN1 degradation only when cytoplasmic capping was inhibited. The common group showed susceptibility to XRN1 degradation in uninduced cells and increased susceptibility when cytoplasmic capping was inhibited.


Figure 4. Recapping targets accumulate in non-translating mRNPs when cytoplasmic capping is inhibited (Mukherjee et al., 2012). A. Polysome gradient of cytoplasmic lysates from control (blue) and DN-cCE (red). Fractions 1-9 represent non-translating mRNPs. Fractions 10 onward represent bound mRNAs. B. RT-qPCR of cytoplasmic capping targets from each fraction of the polysome gradient.

Characteristics of recapping targets

In order to identify cytoplasmic capping targets without disrupting nuclear capping, the Schoenberg lab created a tetracycline-inducible stable line of U2OS cells expressing a dominant negative mutant form of RNGTT that is catalytically inactive and restricted to the cytoplasm, termed K294A (Otsuka et al., 2009). K294A is a mutant form of capping enzyme in which the lysine at position 294 is changed to alanine. This prevents the covalent binding of GMP, inhibiting guanylylation activity. Overexpressed K294A outcompetes endogenous RNGTT for binding to NCK1, blocking cytoplasmic capping and causing cytoplasmic capping targets to remain uncapped. Following the assumption that a pool of uncapped mRNAs is maintained in a stable state, cytoplasmic RNA from control and K294A-expressing cells was treated with XRN1, and the change in RNA degradation was monitored using an Affymetrix human exon ST array (Figure 3A) (Mukherjee et al., 2012). A total of 4176 genes were identified and subdivided into three classes of cytoplasmic capping targets based on their sensitivity to XRN1 degradation (Figure 3B). The native group consists of genes that have a subpopulation that is sensitive to XRN1 in uninduced cells; the capping inhibited group consists of genes that only showed XRN1 sensitivity when cytoplasmic capping was inhibited; and the common group consists of genes that were sensitive to XRN1 in uninduced cells and whose sensitivity increased when cytoplasmic capping was inhibited. This list of genes served as our initial set of recapping genes.
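The three-way classification described above can be expressed as a simple decision rule. The following is a minimal sketch, not the analysis used in Mukherjee et al. (2012): it assumes each gene is summarized by the fraction of array signal lost after XRN1 treatment in uninduced and in K294A-induced cells, and the function name, the `threshold` cutoff, and the `min_increase` margin are all hypothetical values chosen for illustration.

```python
from typing import Optional


def classify_recapping_target(
    loss_uninduced: float,
    loss_induced: float,
    threshold: float = 0.2,     # assumed cutoff for calling a gene XRN1-sensitive
    min_increase: float = 0.1,  # assumed margin for calling sensitivity "increased"
) -> Optional[str]:
    """Assign a gene to the native, capping inhibited, or common class.

    loss_uninduced / loss_induced: fraction of signal lost after XRN1
    treatment in uninduced and K294A-induced cells (0.0 to 1.0).
    Returns None when the gene is not XRN1-sensitive in either condition.
    """
    sensitive_uninduced = loss_uninduced >= threshold
    sensitive_induced = loss_induced >= threshold

    if sensitive_uninduced:
        # Sensitive without induction; "common" if inhibition increases sensitivity.
        if loss_induced - loss_uninduced >= min_increase:
            return "common"
        return "native"
    if sensitive_induced:
        # Sensitive only when cytoplasmic capping is inhibited.
        return "capping inhibited"
    return None
```

For example, a gene losing half of its signal to XRN1 in both conditions would fall in the native class under these assumed cutoffs, while one that is insensitive without induction and loses half of its signal after K294A induction would fall in the capping inhibited class.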

The effect of blocking cytoplasmic capping on translation was examined with polysome gradients (Figure 4A) (Mukherjee et al., 2012). When K294A was expressed, a strong peak was detected in the earlier fractions corresponding to non-translating mRNPs. RT-qPCR of each fraction for select recapping genes revealed an enrichment of recapping genes in the non-translating mRNPs with K294A induction (Figure 4B). This accumulation led to the proposed mechanism of cap homeostasis, in which mRNAs cycle between an uncapped state and a recapped state to fine-tune gene expression (Figure 2). In order for uncapped mRNAs to reenter the translating pool, both the cap and the poly(A) tail must be present. When the poly(A) tail length of recapped mRNAs was examined, the capped forms on translating polysomes and the uncapped forms that accumulate in non-translating mRNP complexes had the same poly(A) tail length. We concluded that cytoplasmic capping is the only process needed to restore uncapped transcripts for translation (Kiss et al., 2016).

Cytoplasmic capping only affected a subset of mRNAs, suggesting some form of target selectivity for recapping. The interacting partners of cytoplasmic RNGTT were recovered both by immunoprecipitation and by proximity biotin ligation followed by streptavidin pulldown (Roux et al., 2019), and were analyzed by mass spectrometry. This identified 52 RNA binding proteins as interacting partners of cytoplasmic RNGTT, suggesting that RNA binding proteins are responsible for the target specificity of recapping (Trotman et al., 2018). Surprisingly, other members of the recapping complex were not detected in this analysis, suggesting that the cytoplasmic capping complex is transient and only assembles when needed.


Figure 5. 5’end processing of mRNAs as a possible origin of downstream recapping targets (Trotman & Schoenberg, 2019). A. Possible avenues for the generation of a 5’ truncated recapping target. B. The effects of downstream recapping based on its location within the mRNA.

Capping downstream of canonical capping sites

Cap homeostasis provides a mechanism by which uncapped mRNAs can reenter translation. Capping at the canonical 5’end is the most likely result of cytoplasmic capping, but cytoplasmic capping may also occur on 5’ truncated mRNAs. Initial evidence that 5’ truncated mRNAs exist came from a transcriptomic analysis of transcriptional start sites (TSS) with cap analysis of gene expression (CAGE) (Takahashi et al., 2012). Roughly 25% of the reads generated with CAGE mapped within transcripts, downstream of annotated start sites (Fejes-Toth et al., 2009). Alternative TSS (Reyes & Huber, 2018) may be a source of these downstream CAGE tags. However, when the 5’ends of uncapped mRNAs that accumulate with K294A expression were mapped with 5’ RACE, they aligned in the vicinity of these downstream CAGE tags (Kiss et al., 2015; Berger et al., 2019). This suggests that cytoplasmic capping may be the source of these 5’ truncated mRNAs.

Downstream recapping sites could be generated by endonuclease cleavage, by XRN1 pausing at structured sequences, or by RNA binding proteins within the mRNA that impede XRN1 degradation (Charley et al., 2018) (Figure 5A). All of these mechanisms can produce a downstream 5’end that can potentially be recapped. The location of the recapping site within the gene body can greatly affect the protein product after translation (Figure 5B). For example, recapping within the 5’UTR could exclude key regulatory elements that are found in the 5’UTR. An example of such an event is described in chapter 2, where a downstream recapping site truncates the TOP regulatory motif. This event would not change the protein sequence of the gene but could cause dysregulation of translation of the mRNA. Recapping downstream of the canonical AUG may cause a drastic change in the protein that is synthesized from the mRNA. Translation of this recapping target would generate a truncated protein if initiated in the same frame or encode a completely new protein if translated out of frame. The translation of truncated proteins is discussed further in chapter 3. Finally, mRNAs recapped within the 3’UTR (Sudmant et al., 2019) could act as sponges that bind translation factors as a means to regulate global translation, or that bind other regulatory elements such as miRNAs and RNA binding proteins. All such cases suggest that cytoplasmic capping has a drastic effect on the proteomic complexity of the cell.
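The location-dependent outcomes described above can be illustrated with a short sketch. It rests on a strong simplifying assumption that is not made in the text: that translation of a recapped mRNA initiates at the first AUG downstream of the new 5’end, ignoring start codon context, leaky scanning, and non-AUG initiation. The function name, the outcome labels, and the toy sequence are hypothetical.

```python
def predict_recapping_outcome(seq, cds_start, cds_end, recap_site):
    """Label the expected consequence of recapping at a given position.

    seq:        mRNA sequence written 5' to 3' (A, C, G, U)
    cds_start:  0-based index of the A of the canonical AUG
    cds_end:    0-based index one past the stop codon
    recap_site: 0-based index of the first nucleotide of the recapped mRNA
    Assumes initiation at the first AUG downstream of the recapping site.
    """
    if recap_site == 0:
        return "canonical 5' end: full-length mRNA and protein"
    if recap_site <= cds_start:
        return "5'UTR: protein unchanged, 5'UTR regulatory elements may be lost"
    if recap_site >= cds_end:
        return "3'UTR: coding region absent, possible regulatory sponge"

    # Recapping inside the coding region: look for the next AUG before the stop.
    next_aug = seq.find("AUG", recap_site, cds_end)
    if next_aug == -1:
        return "coding region: no downstream AUG found"
    if (next_aug - cds_start) % 3 == 0:
        return "coding region: N-terminally truncated protein (in frame)"
    return "coding region: new protein sequence (out of frame)"


# Toy mRNA: 8 nt 5'UTR, 21 nt coding region with an internal in-frame AUG
# and a later out-of-frame AUG, then a 6 nt 3'UTR.
toy_mrna = "CUUUCCGG" + "AUGGCCAUGAAACAUGGCUAA" + "GCAUUU"
toy_cds_start, toy_cds_end = 8, 29

in_frame = predict_recapping_outcome(toy_mrna, toy_cds_start, toy_cds_end, 10)
out_of_frame = predict_recapping_outcome(toy_mrna, toy_cds_start, toy_cds_end, 16)
```

With real transcripts the same frame check would be applied to annotated coding coordinates; the toy sequence here only exercises each branch.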


Unanswered questions

Since the discovery of cytoplasmic capping, extensive efforts have been made to characterize the biochemical reactions involved. Our understanding of this process is summarized in the current cytoplasmic capping model in Figure 2. There are still several unanswered questions about cytoplasmic capping. Little is known of the biological importance of cytoplasmic capping and its effects on the transcriptome. The location of recapping can also have a significant impact on the translated protein product.

We know that a kinase binds to the second SH3 domain of NCK1 because that domain exhibits kinase activity on 5’ monophosphate RNAs. However, the identity of this kinase remains elusive. The goal of the study that identified interacting partners of cytoplasmic RNGTT was to identify the 5’ monophosphate kinase (Trotman et al., 2018). With the exception of TbCE1 in trypanosomes, no other known protein exhibits 5’ monophosphate kinase activity. To date, we only know that the cytoplasmic capping complex exhibits this activity.

The location of cytoplasmic capping within the mRNA can have a significant impact on the protein that is synthesized. Cytoplasmic capping was shown indirectly to occur at canonical sites and in the vicinity of downstream CAGE tags by mapping the uncapped mRNAs that accumulate with inhibition of cytoplasmic capping (Kiss et al., 2015). A major caveat of this study is the reliance on detecting uncapped mRNAs, an unstable population of mRNAs that can undergo partial degradation. An alternative method for identifying recapping targets is described in chapter 2, where direct evidence of cytoplasmic capping at the canonical and downstream capping sites is provided. However, the transcriptome-wide location of cytoplasmic capping has yet to be determined. Efforts to answer this question are discussed in chapter 4, where various methods were developed for this purpose. More importantly, how prevalent is downstream recapping? Recapping downstream of canonical AUGs could have a significant effect on the proteome and can result in N-terminally truncated proteins if translated. The relationship between N-terminally truncated proteins and cytoplasmic capping is examined in chapter 3.

Finally, the biological effects of cytoplasmic capping have yet to be determined. To date, all of the studies done on cytoplasmic capping were conducted on stable cell lines. Inhibition of cytoplasmic capping with K294A resulted in the accumulation of uncapped mRNAs. Interestingly, this inhibition also reduced the recovery of cells after arsenite stress, suggesting that cytoplasmic capping may play a role in stress recovery (Otsuka et al., 2009). There are limitations to using K294A, particularly the reliance on uncapped mRNAs to determine recapping. To circumvent this issue, a stable cell line expressing a dominant negative inhibitor of cytoplasmic cap methylation (ΔN-RNMT) was developed (del Valle-Morales et al., 2020). Chapter 2 describes this system in more detail. The subsequent chapters address these questions and offer more insight into the biological effects of cytoplasmic capping.


Chapter 2. Inhibition of cytoplasmic cap methylation identifies 5’ TOP mRNAs as recapping targets and reveals recapping sites downstream of native 5’ends¹,²

Abstract

Cap homeostasis is a cyclical process of decapping and recapping to fine tune gene expression. mRNAs that undergo cytoplasmic recapping were previously identified by the accumulation of uncapped forms when cytoplasmic capping was blocked. This approach has several drawbacks as it is based on the detection of uncapped mRNAs. In this study, I developed a catalytically inactive form of RNMT that blocks cytoplasmic cap methylation in U2OS cells. This identified mRNAs with a 5’-terminal oligopyrimidine (TOP) motif as a class of recapping targets. Closer examination of the cap status of five TOP mRNAs showed a decrease in mRNA capped at the native 5’ end and identified downstream capping sites for eIF3D and eIF3K mRNAs, serving as the first direct evidence of downstream recapping. Previous work suggested a potential role of alternative polyadenylation (APA) in recapping target selection. An analysis of APA usage showed no correlation between cytoplasmic capping and APA. Instead, inhibition of cytoplasmic cap methylation resulted in a shift from proximal to distal 3’UTR usage. PABPN1, a known regulator of APA site selection, was also shown to increase when cytoplasmic cap methylation is inhibited. Together, these results show the biological impact of cytoplasmic capping and its impact on APA usage.


______
¹ This chapter has benefited from the writing and editing contributions of all four authors: Daniel del Valle-Morales (a,b,c), Jackson B. Trotman (a,c), Ralf Bundschuh (a,c,d,e), and Daniel R. Schoenberg (a,c). Affiliations: a, Center for RNA Biology; b, Molecular, Cellular, and Developmental Biology; c, Department of Biological Chemistry and Pharmacology; d, Department of Physics; e, Department of Chemistry and Biochemistry; The Ohio State University, Columbus, Ohio.
² This chapter is modified from an article originally published in Nucleic Acids Res. (del Valle-Morales et al., 2020; Oxford University Press; DOI 10.1093/nar/gkaa046) and is used here in accordance with a Creative Commons license agreement.

Introduction

The 5’ cap is a critical modification to mRNAs that is necessary for gene expression.

Loss of the cap was originally associated with the rapid decay of mRNAs, but this is not the case for all mRNAs. Previous work from our lab identified a protein complex in the cytoplasm that can restore the cap to uncapped mRNAs (Otsuka et al., 2009, Mukherjee et al., 2014). This complex consists of the adaptor protein Nck1, an unknown 5’ monophosphate kinase, capping enzyme, and cap methyltransferase (RNMT) together with its co-activator RAM.

The biochemical steps of cytoplasmic capping are well established (Trotman &

Schoenberg, 2019), but less is known about the characteristics of cytoplasmic capping targets.

Our initial studies on cytoplasmic capping targets were done through the inhibition of cytoplasmic guanylylation. This caused an accumulation of uncapped mRNAs in non-translating RNPs, and we identified recapping targets by their susceptibility to degradation by XRN1 (Mukherjee et al., 2012). Proteomic analysis of the binding partners of cytoplasmic capping enzyme identified 66 interacting partners, 52 of which were RNA binding proteins (Trotman et al., 2019). This suggests that target selectivity is dependent on these RNA binding proteins and their specific binding to a subset of mRNAs. The involvement of NCK1 in receptor tyrosine kinase signaling suggests that the scope of cytoplasmic capping targets may differ depending on cell type.

A major caveat of this approach is the reliance on detecting uncapped mRNAs, an inherently unstable population of mRNA. Downstream biochemical separation of uncapped mRNAs is required to identify recapped mRNAs, and mapping of the recapping site can only be inferred as partial degradation of the 5’end may occur on uncapped mRNAs. Thus, there was a need to develop a more direct tool for studying cytoplasmic capping that can be easily introduced to different cell types.

Trotman et al. (2017) identified RNMT as the enzyme responsible for cytoplasmic N7 cap methylation. In that study we developed ΔN-RNMT, the C-terminal portion of RNMT (121-476) containing a mutation in the S-adenosylmethionine binding pocket, as a dominant negative inhibitor of cytoplasmic cap methylation. We observed that overexpression of ΔN-RNMT leads to the accumulation of unmethylated cytoplasmic caps and to a decrease in the steady state levels of recapped mRNAs, while non-recapped mRNAs remained unchanged. This decrease is likely the result of degradation of transcripts with improperly methylated caps by cap quality control enzymes such as DXO (Jiao et al., 2010).

In this chapter, we developed a U2OS cell line expressing ΔN-RNMT under a tetracycline-inducible promoter. Using RNA-Seq, we identified novel recapping genes based on the decrease of their steady state levels. Of the recapping mRNAs identified, 5’ terminal oligopyrimidine (TOP) mRNAs were the largest single group of decreasing genes, and analysis of their 5’ ends provided the first direct evidence of cytoplasmic capping occurring at both the canonical cap site and downstream sites.


Materials and Methods

Cloning of pcDNA3/TO-ΔN-RNMT plasmid

With pcDNA3-FLAG-RNMT 121-476 D203A (“pcDNA3-ΔN-RNMT,” Trotman et al., 2017; Addgene plasmid #112708) as template, Phusion Site-Directed Mutagenesis (ThermoFisher Scientific F541) was used with forward primer JT130 (5′-CCTATCAGTGATAGAGATCTCCCTATCAGTGATAGAGATCTGGCTAACTAGAGAACCCAC-3′) and reverse primer JT127 (5′-GAGAGCTCTGCTTATATAGACCTCCCA-3′) to insert two copies of the tetracycline operator sequence (TetO2) between the CMV promoter TATA box and the transcription start site at the same location as in pcDNA4/TO (ThermoFisher Scientific V102020). The sequence of the resulting plasmid, pcDNA3/TO-ΔN-RNMT, was verified by Sanger sequencing.

Generation and culture of U2OS-TR/ΔN-RNMT stable cell line

Human U2OS osteosarcoma cells stably expressing the tetracycline repressor (U2OS-TR) were described previously (Otsuka et al., 2009). To generate cells with tetracycline-inducible ΔN-RNMT stably integrated, U2OS-TR cells were transfected with pcDNA3/TO-ΔN-RNMT using Fugene 6 following the manufacturer’s protocol. Cells were selected in medium containing 600 μg/mL G418 (ThermoFisher Scientific 10131035) and seeded at low density on new dishes, and individual colonies were isolated with cloning cylinders and expanded. Several clonal lines were tested by Western blotting for responsiveness to doxycycline induction of ΔN-RNMT, and the line with the greatest level of expression (#17) was chosen for this study. Cells were grown at 37°C and under 5% CO2 in McCoy’s 5A medium (ThermoFisher Scientific 116600) supplemented with tetracycline-free fetal bovine serum (Atlanta Biologicals S10350) to 10% (v/v).

Immunofluorescence

U2OS-TR/ΔN-RNMT cells were seeded on glass coverslips and cultured for 25 hr in medium with or without 1 μg/mL doxycycline before fixing with ice-cold methanol for 20 min. Coverslips were washed three times with PBS before blocking in IF Block Solution (PBS containing 1% (w/v) BSA and 0.05% (v/v) Triton X-100) at room temperature for 90 min. ΔN-RNMT was visualized by incubating at 4°C overnight with a 1:1000 dilution of mouse monoclonal anti-FLAG (Sigma F3165). Coverslips were washed three times for 5 min with IF Wash Buffer (PBS containing 0.5 mM MgCl2 and 0.05% (v/v) Triton X-100) and then incubated in the dark at room temperature for 60 min in IF Block Solution containing a 1:1000 dilution of anti-mouse Alexa Fluor 680 (ThermoFisher Scientific A21057) and 0.75 μg/mL DAPI. Coverslips were washed three times with IF Wash Buffer as before, mounted on glass microscope slides with ProLong Gold Antifade Mountant (ThermoFisher Scientific P36930), and incubated in the dark at room temperature overnight to allow the mountant to cure. Images were acquired at room temperature with a Nikon Eclipse Ti-U inverted microscope fitted with a CFI Plan Apo VC 60X oil immersion objective and a Nikon DS-Qi1 monochrome digital camera. Images were analyzed using Nikon NIS-Elements AR 3.10 software. Specificity of the secondary antibody for the primary antibody was confirmed by parallel preparation of control coverslips not treated with primary antibody.


Western blotting

Cytoplasmic extracts were diluted to 1x Laemmli sample buffer, heated at 95°C for 5 min, and electrophoresed on Bio-Rad Mini-PROTEAN TGX SDS-PAGE gels at 150 V in 1x Tris/glycine buffer containing 1% SDS (w/v). Proteins were then transferred to an Immobilon-FL PVDF membrane (Millipore Sigma IPFL00010) at 4°C and at 100 V for 60 min in 1x Tris/glycine buffer containing 20% methanol (v/v) and 0.1% SDS (w/v). Membranes were blocked at room temperature in 3% BSA (w/v) in TBS for at least 30 min. Primary antibody staining was performed with rabbit anti-RNMT antibody (Proteintech 13743-1-AP, 1:500 dilution) or rabbit anti-EEF2 (One World Lab; 1:500 dilution) in 3% BSA (w/v) in TBS. Following three 10-min washes with TBS-T, membranes were incubated in the dark for 30 min in 3% BSA (w/v) in TBS containing a 1:10,000 dilution of anti-rabbit Alexa Fluor 680 (ThermoFisher Scientific A21109). Membranes were washed with TBS-T as before, and Western blots were visualized on a Li-Cor Odyssey at 700 nm.

Preparation of cytoplasmic RNA

3 × 10^6 cells were split into a 10 cm dish and after 48 hr, 1 μg/ml of doxycycline was added. Twenty-four hours later cells were rinsed once with ice cold phosphate buffered saline (PBS) and suspended in 1 ml of PBS with a cell scraper. The recovered cells were centrifuged at 2,500 × g for 5 minutes at 4°C, washed once with 1 ml PBS and centrifuged again at 2,500 × g for 5 minutes at 4°C. The pelleted cells were resuspended in 5 volumes of lysis buffer (20 mM Tris-HCl, pH 7.5, 150 mM NaCl, 5 mM MgCl2, 1 mM DTT, 0.2% NP-40, 80 U/ml RNaseOUT (Invitrogen)) and incubated on ice for 10 minutes. Nuclei were removed by centrifugation at 12,000 × g for 10 minutes at 4°C. The supernatant fraction was used for Western blot analysis and for RNA isolation using the Direct-zol RNA MiniPrep Kit (Zymo Research R2053) including an in-column DNase I digestion. Purified RNA was eluted in RNase-free water.

Preparation and sequencing of QuantSeq REV libraries

Sequencing libraries were prepared from 2 μg of cytoplasmic RNA from uninduced or 24 hr doxycycline-treated (induced) cells carrying the ΔN-RNMT transgene (n=5 for each) or parental U2OS-TR cells (n=3 for each) using the QuantSeq 3′ RNA-Seq Library Prep Kit REV for Illumina (Lexogen) according to the manufacturer’s protocol. The final concentration of each library was determined using a Qubit 2.0 Fluorometer (Invitrogen). Paired end 75 sequencing of libraries from ΔN-RNMT expressing cells was performed by Lexogen at the Vienna Biocenter Core Facility on an Illumina NextSeq 500. Paired end 150 sequencing of libraries from U2OS-TR cells was performed in the Genome Services Laboratory at Nationwide Children’s Hospital, Columbus, OH, on an Illumina MiSeq.

Quantitative RT-PCR

0.5 μg of cytoplasmic RNA was spiked with 1 fmol of CleanCap mCherry mRNA (Trilink L7023) and 0.5 μl of oligo(dT)15 primer (500 μg/ml) in a total volume of 10 μl. The mixture was incubated at 65°C for 5 minutes and immediately placed on ice. The mixture was brought to 20 μl with 4 μl of 25 mM MgCl2, 1 μl of 10 mM dNTPs, 4 μl of GoScript® 5x Reaction Buffer, and 1 μl of GoScript® Reverse Transcriptase (Promega A2791). Reactions were placed in the thermocycler at 25°C for 5 minutes, 42°C for 1 hour and 70°C for 15 minutes. The resulting cDNA was quantified by real time PCR in technical triplicate reactions containing 0.5 μM reverse and forward primer (Table 3) and 1x SensiFAST SYBR No-ROX (Bioline, BIO-98005) with a Bio-Rad CFX Connect real-time PCR detection system. PCR was performed at 95°C for 3 min followed by 40 cycles of 95°C for 10 s and 55°C for 30 s.

Quantification and statistical analysis of RT-qPCR data

RT-qPCR data were analyzed using Bio-Rad CFX Maestro Software. Cq values were determined by regression mode. Fold change was determined by the ΔΔCq method corrected for the primer efficiency and normalized to the mCherry mRNA spike-in control and STRN4 as an endogenous non-target control. Values for uninduced (or 0 hr) samples were arbitrarily set to 1. Statistical analysis was performed with GraphPad Prism 6 and significance was determined by unpaired t-test, with results having p value <0.05 considered significant. For Figure 9B, GraphPad Prism 6 was used to plot the mean ± standard deviation of independent biological triplicates.
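The efficiency-corrected ΔΔCq calculation described above can be illustrated with a minimal Python sketch. It assumes that the two reference RNAs (the mCherry spike-in and STRN4) are combined as a geometric mean of their relative quantities, which is one common convention and may not match exactly what the CFX Maestro software does. All Cq values and efficiencies below are hypothetical placeholders, not measured data.

```python
from math import prod

def relative_quantity(efficiency, cq_control, cq_sample):
    """Amount of one amplicon in the sample relative to the control.

    efficiency is the per-cycle amplification factor (2.0 = perfect doubling).
    """
    return efficiency ** (cq_control - cq_sample)

def fold_change(target, references):
    """Efficiency-corrected fold change of a target normalized to references.

    target and each reference are tuples of (efficiency, cq_control, cq_sample).
    References are combined by geometric mean (an assumption of this sketch).
    """
    target_rq = relative_quantity(*target)
    ref_rqs = [relative_quantity(*ref) for ref in references]
    ref_geomean = prod(ref_rqs) ** (1.0 / len(ref_rqs))
    return target_rq / ref_geomean

# Hypothetical example: the target appears one cycle later after induction,
# while both references are unchanged.
target = (2.0, 20.0, 21.0)
references = [(2.0, 18.0, 18.0), (2.0, 25.0, 25.0)]
print(fold_change(target, references))  # 0.5
```

With the uninduced sample used as the control, its own fold change is 1 by construction, which corresponds to setting uninduced values to 1.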

5’end analysis

2 μg of cytoplasmic RNA spiked with 1 fmol of mCherry mRNA and 1 fmol uncapped Luciferase RNA (Promega) was used for gene specific 5’end mapping using the TeloPrime® Full-Length cDNA Amplification Kit V1 (Lexogen) according to the manufacturer's protocol. The resulting cDNA was PCR amplified with MyTaq 2x mix (Bioline IO-21105) using gene specific reverse primers and the TeloPrime forward adapter primer. PCR samples were ethanol precipitated, separated on a 6% native PAGE gel, and bands were visualized using SYBR® Gold Nucleic Acid Gel Stain (ThermoFisher Scientific S11494). Bands of interest were excised from the gel and centrifuged through a 0.6 ml microtube for 1 minute at 13,000 × g. The crushed gel slice was soaked in 3 volumes of nuclease-free water and incubated overnight at room temperature with slight agitation. Eluted DNA was ethanol precipitated and sequenced using gene specific reverse primers at the Genomics Shared Resource at The Ohio State University.

The doublet bands of EIF3K and EIF3D were extracted as described above. The recovered DNA was incubated with MyTaq 2x mix (Bioline) at 70°C for 20 minutes to add overhanging A residues and purified using DNA Clean and Concentrator-5 (Zymo). These were then ligated into the pGEM®-T Easy vector (Promega A1360) for 1 hr at room temperature using a 3:1 insert to vector ratio and T4 DNA ligase (Promega M1801) and transformed into Stellar Competent Cells (Clontech) following the manufacturer's protocol. Transformed cells were plated on LB/ampicillin plates and incubated overnight at 37°C. Individual colonies were grown in liquid medium (LB/ampicillin), and plasmid DNA was recovered using the NucleoSpin Plasmid kit (Clontech). The purified plasmids were sequenced at the Genomics Shared Resource at The Ohio State University using the T7 promoter forward primer, and capped ends were identified as the sequence immediately adjacent to the TeloPrime adaptor.

Bioinformatics

Data reduction was performed using the REV Human (GRCh38) Lexogen QuantSeq 2.2.3 pipeline from the Lexogen Blue Bee platform. Files were filtered for base mean read count >20 across all samples (12,134 genes), and differential gene expression profiling was performed using DESeq2 (Love et al., 2014) on Galaxy (Afgan et al., 2018). Gene groups were identified by their statistical overrepresentation using default settings in the PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System (Mi et al., 2017, Mi et al., 2019), version 14.0 released 2018-12-03. The Annotation Data Set was set to PANTHER Protein Class and analysis was performed using Fisher’s Exact Test with False Discovery Rate correction. To identify genes with changes in 3'UTR usage, reads in the terminal 50 nucleotides of each gene were counted separately from reads throughout the entire gene body for each GRCh38 RefSeq gene using featureCounts (Liao et al., 2014). Read counts in the terminal 50 nucleotides were provided as "Ribo-Seq" data to RiboDiff (Zhong et al., 2017) while read counts across the total transcript length were provided as "RNA-Seq" data, such that RiboDiff would identify genes with changes in the ratio between read counts at the annotated 3'UTR end and read counts across all possible 3'UTR ends as a function of ΔN-RNMT induction. Genes were considered significant if the multiple testing corrected p-value for a change in the ratio reported by RiboDiff was below 0.05.
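The quantity compared in the 3'UTR usage analysis can be sketched in Python as a simple log ratio. This is only a point estimate for illustration: RiboDiff itself fits a statistical model to replicate counts, which is not reproduced here. The function names, the pseudocount, and the read counts are assumptions of this sketch.

```python
from math import log2

def end_usage_ratio(terminal_reads, gene_body_reads, pseudocount=0.5):
    """Ratio of reads in the terminal 50 nt to reads over the whole gene body."""
    return (terminal_reads + pseudocount) / (gene_body_reads + pseudocount)

def log2_usage_shift(uninduced, induced, pseudocount=0.5):
    """log2 change in the end-usage ratio upon induction.

    uninduced and induced are tuples of (terminal_reads, gene_body_reads).
    A positive value means a larger share of reads at the annotated 3' end.
    """
    before = end_usage_ratio(*uninduced, pseudocount=pseudocount)
    after = end_usage_ratio(*induced, pseudocount=pseudocount)
    return log2(after / before)

# Hypothetical counts for one gene: the share of reads at the annotated
# end doubles after induction.
shift = log2_usage_shift(uninduced=(50, 200), induced=(100, 200), pseudocount=0.0)
print(shift)  # 1.0
```

The pseudocount is included only to keep the ratio defined for genes with zero counts; a model-based tool handles low counts more rigorously.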

Figure 6. Validation of ΔN-RNMT cell line. A line of tetracycline-inducible U2OS cells was transfected with a plasmid bearing a tetracycline-regulated transgene expressing ΔN-RNMT. Clonal lines were selected by growth in medium containing G418 and analyzed for potential leakiness and inducible expression. A. Cytoplasmic lysates from cells -/+ 24 hr induction with doxycycline were analyzed by Western blot probed with anti-RNMT (upper panel) and anti-EEF2 as a loading control (lower panel). B. ΔN-RNMT cells were grown on coverslips -/+ 24 hr induction with doxycycline, fixed, and stained with DAPI and anti-FLAG antibody to detect FLAG-tagged ΔN-RNMT.

Results

An inducible cell line to study the inhibition of cytoplasmic cap methylation

Tetracycline-inducible U2OS cells were stably transfected with a plasmid expressing ΔN-RNMT containing the HIV Rev NES sequence and under tetracycline operator control. Clonal cell lines were selected and validated for the overexpression of the ΔN-RNMT construct. Lysates of cell lines -/+ 24 hr induction with doxycycline were analyzed by Western blot. Figure 6A shows the Western blot of the line used in the rest of this study. ΔN-RNMT was only detected after doxycycline induction. Endogenous RNMT showed no change after doxycycline induction. To ensure that ΔN-RNMT is only expressed in the cytoplasm, immunofluorescence was performed on ΔN-RNMT cells (Figure 6B). ΔN-RNMT was primarily detected in the cytoplasm and only when induced with doxycycline. Thus, ΔN-RNMT should only affect cytoplasmic capping, acting as a dominant negative inhibitor of RNMT.


Figure 7. Metagene analysis of QuantSeq data. A. Correlation plot displaying Pearson coefficients for all ten libraries of uninduced (dox-) and ΔN-RNMT induced (dox+) cells. B. Metagene analysis of the distribution of the 75 bp reads across the genome after normalizing for gene length. C. RNA from triplicate cell cultures of U2OS-TR (Tet-repressor) cells treated -/+ doxycycline for 24 hr was analyzed by RNA-Seq to determine off target effects of doxycycline. Differential expression is shown as a volcano plot. Red points represent genes with p. adj. <0.05.


Figure 8. Position of sequence tags and quantitative changes of RNAs after expression of ΔN-RNMT. A. QuantSeq Rev sequence tag visualization of a known recapping gene (VDAC3; Mukherjee et al., 2012) and two control transcripts (STRN4 and ACTB). The sequence of the ACTB poly(A) signal motif AAUAAA is highlighted in red. Sequence is shown in the 3’-5’ direction. B. Differentially expressed mRNAs from QuantSeq Rev are displayed in a volcano plot. Significant mRNAs (p. adj. <0.05) are colored red for decreasing genes and blue for increasing genes.

Quantification of RNA levels after inhibition of cytoplasmic cap methylation

Cytoplasmic RNA extracted from uninduced ΔN-RNMT cells and ΔN-RNMT cells treated for 24 hr with doxycycline was analyzed using QuantSeq® 3’ mRNA RNA-Seq REV (Lexogen). This approach was selected for this study as it is both quantitative and generates reads at the polyadenylation cleavage site. Five libraries each for control and ΔN-RNMT induced cells were sequenced. The reproducibility of the libraries was assessed with a correlation plot, which showed a strong correlation between replicates (Figure 7A). A metagene analysis of the location of each read across the gene body confirms that the QuantSeq reads are located at the 3’UTR (Figure 7B). As an additional confirmation, the sequence reads for a known recapping gene (VDAC3) and two controls (STRN4 and ACTB) were visualized directly (Figure 8A). The mapped reads aligned to the 3’UTR of all genes selected. Zooming into the sequence of ACTB, the sequence of the aligned read contains the canonical polyadenylation signal motif, confirming that these sequence reads are specific to the polyadenylation site. As a control for off target effects caused by doxycycline (Ahler et al., 2013), QuantSeq was also performed on parental U2OS-TR cells in triplicate for -/+ 24 hr addition of doxycycline. Only a single non-coding RNA was detected as differentially expressed (Figure 7C), indicating that the changes observed are specific to ΔN-RNMT expression.


Table 1. PANTHER analysis on ΔN-RNMT downregulated genes. The downregulated genes were analyzed with PANTHER (Mi et al., 2017) using Fisher’s Exact Test with False Discovery Rate correction. The table is sorted by FDR.

PANTHER Protein Class      REFLIST (20996)   # down with ΔN-RNMT   Expected   Fold enrichment   Raw P-value    FDR
ribosomal protein          160               39                    15.93      2.45              4.45 × 10^-6   3.19 × 10^-4
membrane traffic protein   280               51                    27.89      1.83              2.13 × 10^-4   7.63 × 10^-3
transferase                866               129                   86.25      1.50              2.79 × 10^-5   1.50 × 10^-3
RNA binding protein        636               92                    63.34      1.45              1.01 × 10^-3   3.11 × 10^-2
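The relationship between the expected and fold enrichment columns of an overrepresentation table like this one can be sketched in a few lines of Python. The sketch assumes the usual definition, in which the expected count scales the class size in the reference list by the fraction of the reference list covered by the input gene list; the function names are illustrative, and the size of the mapped input list is not stated in the text, so it is left as a parameter.

```python
def expected_count(class_size, reflist_size, input_size):
    """Genes of a class expected in the input list if membership were random."""
    return class_size * input_size / reflist_size

def fold_enrichment(observed, expected):
    """Observed number of genes in the class divided by the expected number."""
    return observed / expected

# Observed and expected values taken from the first two rows of Table 1.
print(round(fold_enrichment(39, 15.93), 2))  # 2.45
print(round(fold_enrichment(51, 27.89), 2))  # 1.83
```

A fold enrichment above 1 indicates that the class is overrepresented among the downregulated genes relative to the reference list; whether that excess is significant is what the Fisher's exact test and FDR columns address.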

TOP mRNAs decrease with ΔN-RNMT expression

Differential expression analysis identified 5606 genes that undergo a statistically significant (p.adj. <0.05) fold change ≥1.5 (Figure 8B). Surprisingly, these genes were roughly equally distributed between increasing and decreasing. Since ΔN-RNMT is expected to cause a decrease in the steady state levels of recapping genes, we focused on the decreasing set of genes first. Protein classes overrepresented in the decreasing set of genes were identified using PANTHER (Protein ANalysis THrough Evolutionary Relationships) (Mi et al., 2017) (Table 1). Using a false discovery rate of <0.05, this identified membrane traffic proteins, RNA binding proteins, transferases, and ribosomal proteins, with ribosomal proteins being the most statistically significant group.


Figure 9. Differential expression of TOP mRNAs following ΔN-RNMT induction. A. Heat map of TOP mRNAs that overlap with sequencing data. Graph represents the read counts of the five replicate libraries with ribosomal proteins (RP) and non-ribosomal TOP mRNAs (TOP) classified on the right. The scale indicates the z-score for each gene. B. Cytoplasmic RNA was recovered from triplicate cultures of cells at the indicated time points of induction of ΔN-RNMT. Each sample was spiked with mCherry mRNA and analyzed by RT-qPCR for 18S rRNA, XRCC6, RPS4X, RPL8, RPS3, eIF3D, and eIF3K. The results were normalized to mCherry and STRN4, and the graph displays the mean normalized expression ± standard error of the mean (S.E.M.) (n=3). Asterisks (*) represent p value <0.05 by unpaired two-tailed t-test.


Figure 10. Western blot analysis of TOP genes. Cytoplasmic lysates of control (0 hr), 6 hr, and 24 hr induction of ΔN-RNMT were analyzed by Western blot with antibodies for Rps4X (Abclonal A6730), Rps3 (Abclonal A2533), Rpl8 (Abclonal A10042), eIF3K (Abclonal A9969) and eIF3D (Abclonal A5947). Relative intensities normalized to Ponceau S staining are shown ± S.E.M. (n=3).

Ribosomal protein mRNAs are part of a much larger class of mRNAs that contain a 5’ terminal oligopyrimidine (TOP) motif. TOP mRNAs encode ribosomal proteins and translation factors. The TOP motif is adjacent to the 5’ cap and consists of a cytosine followed by a string of 4-14 pyrimidines (Fonseca et al., 2018). TOP mRNAs are regulated by the RNA binding protein La-Related Protein 1 (LARP1), which recognizes both the TOP motif and the 5’ cap to suppress translation. Since we observed an enrichment of ribosomal proteins in our decreasing genes, we asked whether TOP mRNAs are also enriched in our decreasing gene set. Gentilella et al. (2017) classified 310 TOP mRNAs based on their recovery with LARP1. We used this list and checked for an overlap with our differentially expressed genes. We found 116 genes that overlapped with our decreasing set of genes (p = 2.4 × 10^-12 by two-tailed Fisher’s exact test). Only 31 TOP genes overlapped with the increasing set of genes; both are shown in Figure 9A.
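An overlap test of this kind can be sketched in Python with a two-tailed Fisher's exact test on a 2×2 table. The implementation below is a self-contained illustration using only the standard library, not the software actually used for the analysis. The TOP counts (116 decreased, 194 not decreased) assume that all 310 TOP genes were detected; the two background counts are hypothetical placeholders, because the exact table behind the reported p-value is not given in the text, so the output is not expected to reproduce it.

```python
from math import exp, lgamma

def _log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_exact_two_sided(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_p(x):
        return (_log_choose(row1, x) + _log_choose(row2, col1 - x)
                - _log_choose(total, col1))

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    log_observed = log_p(a)
    p_value = sum(exp(log_p(x)) for x in range(lowest, highest + 1)
                  if log_p(x) <= log_observed + 1e-7)
    return min(1.0, p_value)

# Rows: TOP genes vs. other genes; columns: decreased vs. not decreased.
top_decreased, top_other = 116, 194        # 194 assumes all 310 TOP genes were detected
rest_decreased, rest_other = 3000, 8800    # hypothetical background counts
p = fisher_exact_two_sided(top_decreased, top_other, rest_decreased, rest_other)
print(f"p = {p:.3g}")
```

Working in log space with `lgamma` avoids overflow when the margins are in the thousands, as they are for transcriptome-wide gene lists.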

Five TOP mRNAs, three encoding ribosomal proteins (RPS3, RPS4X, RPL8) and two encoding translation factors (eIF3K, eIF3D), were selected for RT-qPCR in a time course induction of ΔN-RNMT (Figure 9B). The data were normalized to spiked-in mCherry control mRNA and internal control STRN4. The control transcript 18S rRNA showed no change over the time course. A decrease was detected for four of the five TOP mRNAs over the time course, confirming our observation that TOP mRNAs decrease with ΔN-RNMT. XRCC6, a non-TOP mRNA detected in our differential expression analysis, also decreased over the time course, validating the QuantSeq approach. Ribosomal proteins are fairly stable and we did not expect to see a change in the steady state levels of these proteins. No change was observed in the protein levels of the TOP genes (Figure 10), although RPL8 did decrease slightly at 6 hr.


Figure 11. Cap-end analysis detects a decrease at the canonical and downstream capping sites of TOP mRNAs. A. Overview of the gene specific TeloPrime approach. mRNA is converted into a DNA/RNA hybrid by reverse transcription using oligo(dT) priming. A double stranded adapter with a C overhang is introduced to the reaction. The C overhang base pairs weakly with the cap, and the adapter is then ligated to the cDNA by a double stranded DNA ligase. PCR is then performed using a forward primer for the TeloPrime adapter and a gene specific reverse primer. B. 2 μg of cytoplasmic RNA from uninduced (-) and 24 hr induced (+) ΔN-RNMT cells was spiked with 1 fmol of capped mCherry and reverse transcribed, ligated, PCR amplified, and analyzed on a 6% native PAGE gel stained with SYBR Gold. cDNA used for each PCR reaction was normalized to mCherry. Cycles used for each gene were: RPS3, 15 cycles; RPS4X and RPL8, 18 cycles; EIF3K, 19 cycles; mCherry and EIF3D, 22 cycles. C. Gel bands detected in B were cut out and sequenced by Sanger sequencing. RPS4X and RPL8 were sequenced directly. eIF3K, eIF3D, and RPS3 were cloned by TA cloning and colonies were selected and sequenced by Sanger sequencing. Numbers in parentheses indicate the number of colonies identified for each 5’end. The sequencing result for each transcript was mapped to its corresponding RefSeq sequence; 50 bp of the RefSeq sequence is shown. The box highlights the TOP motif. The sequence in bold represents the 5’ most sequence when multiple 5’ends were detected for a given gene.

TOP mRNAs are recapped at canonical and downstream capping sites

An analysis of transcriptional start sites (TSSs) done with Cap Analysis of Gene Expression (CAGE) identified TSSs that map downstream of canonical capping sites (Djebali et al., 2012). Previous work from our lab has shown that uncapped forms of known recapping genes map in the vicinity of these downstream CAGE tags (Kiss et al., 2015, Berger et al., 2019). With TOP mRNA levels decreasing with induction of ΔN-RNMT, we asked whether the capping site of TOP mRNAs changes with ΔN-RNMT expression. Because LARP1 requires both the cap and the TOP motif for regulation, it is possible that cytoplasmic capping impacts this regulation, particularly if recapping occurs downstream of the TOP motif. To test this, we utilized the TeloPrime system (Lexogen) to assay the cap status of RPS3, RPS4X, RPL8, eIF3D, and eIF3K. TeloPrime is a cDNA enrichment method that is specific to capped mRNAs (Figure 11A). After reverse transcription, a double stranded adapter with a cytosine overhang is introduced to the reaction. The cytosine overhang of the adapter can base pair weakly with the 5’ guanosine cap of the DNA/RNA hybrid, and the adapter is then ligated with a double stranded ligase. PCR is then performed with a forward primer for the adapter and a gene specific reverse primer to visualize the 5’end. If cytoplasmic capping occurs at the canonical capping site, we would expect to see a single band that decreases with ΔN-RNMT induction. If recapping occurs downstream, then we would expect to see multiple smaller bands that disappear with ΔN-RNMT induction.

To avoid potential artifacts in PCR amplification, each gene was amplified for the minimum number of cycles needed to visualize a product. Capped mCherry was also spiked into the reaction to normalize for sample variation. The results of the TeloPrime reaction are visualized in Figure 11B. For the ribosomal proteins RPS3 and RPS4X we detected a single band that corresponds to the canonical capping site. This band decreases when ΔN-RNMT is induced, which is in agreement with the RT-qPCR data. RPL8 showed an additional upper band that was not analyzed in this study. Surprisingly, for the translation factors eIF3D and eIF3K, multiple faster migrating bands were detected, and these decreased with ΔN-RNMT induction. These bands correspond to the canonical capping site as well as downstream capping sites.

The bands for the uninduced (-) sample as well as the doublet detected in eIF3K and eIF3D were excised from the gel and purified for Sanger sequencing (Figure 11C). RPS4X, RPS3, and RPL8 were directly Sanger sequenced, revealing a single sequence that mapped to the canonical RefSeq capping site. As an additional control, the RPS3 PCR product was cloned by TA cloning and ten colonies were sequenced. All of these mapped to a single sequence at the RefSeq canonical capping site. Because of the difficulty of isolating the individual bands of each doublet, the doublets for eIF3D and eIF3K were each excised as a single piece and TA cloned. Twenty-four colonies of eIF3K and 27 colonies of eIF3D were sequenced and are shown in Figure 11C. As with RPS3, the majority of the colonies isolated (15 for eIF3K and 18 for eIF3D) mapped to the RefSeq canonical capping site. The remaining colonies mapped downstream of the canonical capping site, showing truncations in the 5’end. These truncations altered the length of the TOP motif and, particularly in eIF3D, the TOP motif was missing in some of the 5’ends identified. This experiment serves as the first direct proof of cytoplasmic capping downstream of canonical capping sites.


Table 2. PANTHER analysis on ΔN-RNMT upregulated genes. The upregulated genes were analyzed with PANTHER (Mi et al., 2017) using Fisher’s Exact Test with False Discovery Rate correction. The table is sorted by FDR.

PANTHER Protein Class                 REFLIST (20996)   # up with ΔN-RNMT   Expected   Fold enrichment   Raw P-value     FDR
ubiquitin-protein ligase              98                27                  10.88      2.48              1.19 × 10^-4    2.13 × 10^-3
ligase                                252               54                  27.98      1.93              3.19 × 10^-5    7.61 × 10^-4
chromatin/chromatin-binding protein   122               30                  13.54      2.21              3.16 × 10^-4    4.85 × 10^-3
DNA binding protein                   469               89                  52.07      1.71              8.40 × 10^-6    3.61 × 10^-4
nucleic acid binding                  1599              279                 177.52     1.57              4.04 × 10^-12   8.69 × 10^-10
mRNA splicing factor                  109               25                  12.10      2.07              2.08 × 10^-3    2.23 × 10^-2
mRNA processing factor                150               38                  16.65      2.28              2.33 × 10^-5    6.25 × 10^-4
RNA binding protein                   636               111                 70.61      1.57              2.26 × 10^-5    6.93 × 10^-4
kinase modulator                      137               31                  15.21      2.04              7.00 × 10^-4    1.00 × 10^-2
transcription cofactor                167               35                  18.54      1.89              1.29 × 10^-3    1.63 × 10^-2
transcription factor                  1156              191                 128.34     1.49              5.02 × 10^-7    5.39 × 10^-5
kinase                                368               65                  40.86      1.59              8.65 × 10^-4    1.16 × 10^-2
transferase                           866               144                 96.14      1.50              1.03 × 10^-5    3.69 × 10^-4


Figure 12. Induction of ΔN-RNMT does not activate the cellular stress response. A. Cytoplasmic lysates from uninduced (0 hr), 3 hr, 6 hr, 12 hr, and 24 hr induction of ΔN-RNMT were analyzed by Western blot probed for eIF2α (Proteintech, 1170-1-AP), phospho-eIF2α (Cell Signaling, D9G8), and RNMT (bottom panel). B. The global impact of ΔN-RNMT on translation was determined by pulse labeling with puromycin as described in Schmidt et al. (2009). Triplicate cultures were incubated for 24 hr without (lanes 1-6) or with (lanes 7-9) doxycycline. They were then treated for 10 min with DMSO (-puromycin) or 10 μg/ml puromycin, washed twice with pre-warmed medium and incubated for an additional 50 min prior to harvest. Puromycin incorporation was determined by Western blotting with anti-puromycin monoclonal antibody (12D10, Millipore Sigma MABE343).


Upregulated genes are enriched for DNA binding proteins, transcription factors and RNA processing proteins

To our surprise, 2426 transcripts increase with ΔN-RNMT induction. These increasing transcripts are most likely an indirect effect of inhibition of cytoplasmic cap methylation. Closer examination of these increasing transcripts with PANTHER (Table 2) revealed that the class of increasing transcripts are RNA binding proteins, transcription factors, kinases, and ligases. This suggests that a cellular response is activated to compensate for the loss in cytoplasmic capping.

We considered that ΔN-RNMT may be causing a stress response in the cell, but this was not the case. We saw no evidence for phosphorylation of the integrated stress response protein eIF2α and no change in global translation with ΔN-RNMT induction when nascent proteins were labeled with puromycin (Figure 12). The cells are responding to the loss of cytoplasmic capping, but not through the integrated stress response.

Figure 13. 3’UTR usage analysis reveals a shift from proximal to distal 3’UTR cleavage sites. A. Change in 3’ end usage was determined by RiboDiff comparison of full-length reads vs annotated 3’ reads. The change in 3’ end usage was plotted against the differential expression from QuantSeq. The colored points represent genes with a significant change in 3’ end usage (red) or differential expression (blue) at adjusted p < 0.05. The purple points represent transcripts that were significant in both comparisons. B. Position of sequence tags for transcripts representing significance in 3’ end usage only (DDX21) and in both comparisons (RDH11). The transcripts are shown in the 5’-3’ orientation.

ΔN-RNMT induction shifts 3’UTR cleavage site from proximal to distal sites

Alternative polyadenylation is a mechanism in which different cleavage sites within an mRNA are selected to vary the length of the 3’UTR. This may include or exclude key regulatory motifs from the 3’UTR. In a previous study, we saw a correlation of alternative polyadenylation with our original list of cytoplasmic capping targets (Kiss et al., 2015). This suggests a possible role for the 3’UTR in determining cytoplasmic capping targets. Because QuantSeq generates sequence tags at the 3’ end (Figure 8A), we examined the 3’ end usage of each gene. This usage was determined by repurposing the statistical package RiboDiff (Zhong et al., 2017). We reasoned that differences in 3’UTR usage can be expressed as a ratio, similar to how the translation efficiency of ribosome-protected fragments is calculated in ribosome profiling. The logarithm of this ratio, where a positive value corresponds to a preference for distal ends and a negative value corresponds to a preference for proximal ends, is plotted together with the differential expression from QuantSeq (Figure 13A). We did not see a significant overlap between 3’UTR usage and differential expression (purple points), indicating that 3’UTR usage is not a determinant for target selection of cytoplasmic capping.
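The ratio-of-ratios idea behind this analysis can be illustrated with a minimal Python sketch. This is not RiboDiff, which fits a statistical model to replicate counts and reports adjusted p-values; the function name, the distal/proximal count inputs, and the pseudocount are all hypothetical and serve only to show how the sign of the log ratio relates to distal versus proximal preference.

```python
import math


def usage_shift(distal_ctrl, proximal_ctrl, distal_ind, proximal_ind, pseudocount=1.0):
    """log2 change in the distal/proximal read ratio upon induction.

    Positive values indicate a shift toward distal 3' ends,
    negative values a shift toward proximal 3' ends.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    ratio_ctrl = (distal_ctrl + pseudocount) / (proximal_ctrl + pseudocount)
    ratio_ind = (distal_ind + pseudocount) / (proximal_ind + pseudocount)
    return math.log2(ratio_ind / ratio_ctrl)


# Hypothetical read counts for one gene: distal reads triple upon induction
# while proximal reads stay constant, so the shift is positive.
print(usage_shift(50, 50, 150, 50, pseudocount=0.0))
```

The analogy to translation efficiency is that one read class plays the role of ribosome-protected fragments and the other the role of total mRNA, with the induced and uninduced samples as the two conditions.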

Figure 14. PABPN1 protein levels increase with ΔN-RNMT induction. Cellular lysates from uninduced (-Dox) and 24 hr induced (+Dox) ΔN-RNMT cells were analyzed by Western blot probed for PABPN1 (Proteintech 66807-1-Ig) and Tubulin (Sigma T6199). Western blots were visualized using anti-mouse Alexa Fluor 594 (ThermoFisher Scientific A11005). Relative intensity normalized to Tubulin is shown ± S.E.M. (n=3).

Interestingly, there is a large class of transcripts that show a significant increase in usage of distal 3’UTR sites (red) but no change in expression level. These transcripts do not group into any significant class through PANTHER. Closer examination of these transcripts by mapping the reads to the RefSeq genome reveals an increase in distal end usage when ΔN-RNMT is induced. Figure 13B shows this phenomenon for DDX21 (3’ end significant) and RDH11 (both significant). Since these transcripts shift in their 3’ end usage but exhibit little expression change, this 3’ end shift is likely to be an indirect effect of the inhibition of cytoplasmic capping. PABPN1, a protein that regulates distal 3’ end selection (Jenal et al., 2012), is upregulated 2-fold at the mRNA level and 1.7-fold at the protein level (Figure 14), providing a likely explanation for the observed shift in 3’ end usage.

Discussion

We have previously identified recapping targets through the accumulation of uncapped transcripts when cytoplasmic capping is inhibited (Mukherjee et al., 2012). The major caveats were the need for additional biochemical separation of uncapped mRNAs and the reliance on detecting a potentially unstable pool of uncapped mRNAs. To overcome this, we developed a new tool for studying cytoplasmic capping based on the observation that induction of a dominant negative mutant form of RNMT (ΔN-RNMT) results in a decrease in the steady state levels of known recapping targets. ΔN-RNMT inhibits the cap methylation step in cytoplasmic capping, producing an unmethylated cap that would be degraded by cap surveillance mechanisms such as DXO (Jiao et al., 2010). We reasoned that we could use the decrease in steady state levels to directly assay the decrease of recapped mRNAs by RNA-Seq methods. We identified ribosomal proteins as one of the most significant classes of decreasing genes (Table 1).

Ribosomal proteins belong to a class of mRNAs with a TOP motif, so we examined the relationship of TOP mRNAs with cytoplasmic capping. We used the 301 transcripts identified by LARP1 binding (Gentilella et al., 2017) as our list of TOP mRNAs. We saw 116 TOP mRNAs in our decreasing set of genes (Figure 9A) and confirmed this decrease for five TOP mRNAs by RT-qPCR (Figure 9B). TOP mRNAs are localized into stress granules and P bodies early in the stress response (Wilbertz et al., 2019), and these findings may be relevant to an earlier observation that inhibition of cytoplasmic capping reduced the recovery of cells after stress (Otsuka et al., 2009). LARP1 tethers TOP mRNAs into stress granules, and since both the cap and the TOP sequence are required for LARP1 binding, cytoplasmic capping may play a role in maintaining TOP mRNAs once cellular stress is removed.

An unanswered question about cytoplasmic capping is the location of the restored cap in recapped mRNAs. Uncapped forms of mRNAs corresponding to the native site and downstream sites were identified when cytoplasmic capping was blocked with K294A, a dominant negative inhibitor of cytoplasmic capping (Mukherjee et al., 2012). 25% of CAGE tags are downstream within exons (Djebali et al., 2012), and we showed that uncapped mRNAs mapped to the vicinity of these downstream CAGE tags (Kiss et al., 2015; Berger et al., 2019). However, uncapped mRNAs are inherently unstable, so the exact location of recapping could only be inferred from them. We addressed this question using the TeloPrime approach, which tags capped mRNAs by ligation of a double-stranded adapter that contains a cytosine overhang, followed by gene-specific PCR for five TOP mRNAs (Figure 11A). Gene-specific PCR generated a single band for RPS4X, RPS3, and RPL8 that, when sequenced, mapped to the canonical capping site. This illustrates the process of cap homeostasis described in Mukherjee et al. (2012) occurring on ribosomal protein mRNAs. For eIF3D and eIF3K, multiple downstream bands were detected that were mapped by cloning and sequencing (Figure 11B). These downstream sites are truncated in their TOP motif and can potentially avoid LARP1 regulation without any change in the protein sequence. This has important implications for eIF3D, a cap-binding protein that acts independently of eIF4E and is required to initiate translation on a specific subset of mRNAs (Lee et al., 2016). It is still not known how these downstream ends are generated. One possibility is that the highly structured 5’UTR of TOP mRNAs (Mizrahi et al., 2018) protects uncapped TOP mRNAs and impedes complete degradation by XRN1 (Dilweg et al., 2019).

The number of mRNAs that increased when cytoplasmic cap methylation was inhibited was puzzling at first. We first examined whether this is caused by cellular stress, but we saw no evidence for activation of the integrated stress response or a decrease in global translation (Figure 12). A PANTHER analysis of these transcripts identified DNA binding proteins, transcription factors/co-activators, and proteins involved in pre-mRNA splicing as increasing, perhaps similar to what was shown previously (El-Brolosy et al., 2019), as a way to compensate for the loss of cytoplasmic capping.

Previous bioinformatic analysis of recapping targets suggested a possible link between cytoplasmic capping and 3’ end site selection (Kiss et al., 2016). We examined this relationship further by quantifying the 3’ end usage of our QuantSeq reads. There was no correlation between 3’ end usage and cytoplasmic capping; however, to our surprise, a large pool of transcripts showed a shift in their 3’ end usage from proximal to distal sites (Figure 13A). These transcripts do not group into any significant transcript sets, which suggests a global shift in 3’ end usage as an indirect effect of inhibiting cytoplasmic cap methylation. This notion was supported by the 2-fold mRNA increase and 1.7-fold protein increase of PABPN1 (Figure 14). Increased PABPN1 is linked to the preferential selection of distal polyadenylation sites (Jenal et al., 2012), providing a possible mechanism that explains the observed 3’ end shift. It is important to note that this study was done in U2OS cells, a bone cancer cell line. Cancer cells are known to predominantly use the proximal 3’ end (Wilbertz et al., 2019). The reversal of this phenotype raises the possibility of a role for cytoplasmic capping in cancer progression. Further studies are needed to confirm this observation.

QuantSeq is a powerful tool to quantify the transcriptome. However, it only provides information about the 3’ end. The assumption is that recapping targets are degraded when cytoplasmic cap methylation is inhibited. Any recapping target that is only partially degraded will still be detected as a read with QuantSeq. There is also the possibility that uncapped forms of recapped mRNAs accumulate over time with the inhibition of cytoplasmic cap methylation. QuantSeq cannot differentiate between uncapped and capped mRNAs, and this may cause possible recapping transcripts to remain undetected. When the time course of ΔN-RNMT induction was extended to later time points, the TOP mRNAs showed a partial recovery at 24 hr. This recovery could be due to some other mechanism that compensates for the loss of cytoplasmic capping or due to uncapped forms of the recapping transcripts accumulating in non-translating mRNPs. A transcriptomic 5’ end analysis is needed to identify the full range of recapping targets.

To conclude, ΔN-RNMT is a novel tool for the identification of cytoplasmic capping targets. By measuring the decrease in steady state levels of recapped mRNAs, we identified TOP mRNAs as recapping targets and showed direct evidence for recapping both at the canonical end and at downstream sites. ΔN-RNMT can be easily introduced into other biological systems, thus facilitating future studies on the biological importance of cytoplasmic capping.

Acknowledgments

We wish to thank Lexogen GmbH for their gift of a QuantSeq REV kit, and for library sequencing. We also wish to thank Gabriel Shye-White for his assistance with this study and Wen Tang for his helpful comments.

DVM and JBT designed and performed the experiments; DVM, RB and DRS performed bioinformatics analysis. DRS, RB, DVM and JBT wrote the manuscript, and DVM, RB and DRS prepared the figures. QuantSeq data are deposited in the Sequence Read Archive under BioProject ID PRJNA547607.

Chapter 3. mRNA recapping increases proteome complexity by enabling translation downstream of canonical 5’ ends¹,²

Abstract

Cytoplasmic capping is a process that affects gene regulation by the cyclical decapping and recapping of mRNAs. Cytoplasmic capping can occur at canonical capping sites, as shown directly in previous work for RPS3 and RPS4X, and downstream of the canonical cap site, within the 5’UTR, in the case of eIF3D and eIF3K (del Valle-Morales et al., 2020). This raises the question of whether recapping may occur downstream of the canonical AUG. In a comprehensive study to detect N-termini in the proteome (Yeom et al., 2017), a majority of the detected N-termini mapped downstream of the canonical initiation site. Cytoplasmic capping may expand the proteome by creating truncated open reading frames that begin downstream of the annotated AUG. In this study, the effect of cytoplasmic capping on the proteome was measured using mass spectrometry and positional proteomics (Yeom et al., 2017). Total proteomics showed changes in only a few proteins when cytoplasmic capping is inhibited. However, positional mapping of N-termini detected 932 N-terminally truncated peptides, 61% of which decrease when cytoplasmic capping is inhibited. PANTHER analysis of the downstream N-termini reveals an overrepresentation of nucleic acid binding proteins; 65% of the decreasing downstream peptides were identified as RNA binding proteins. The downstream peptides have a canonical or non-canonical translation start site upstream of the mapped peptide. Together, this study shows that N-terminally truncated proteins are produced by cytoplasmic capping, adding a layer of complexity to the proteome.

¹ This chapter is part of a collaboration with Bernice Agana and Vicki Wysocki and has benefited from the writing and editing contributions of all four authors: Bernice Agana (a,b,d), Daniel del Valle-Morales (a,c,e), Vicki Wysocki (b,d), and Daniel R. Schoenberg (a,e). Affiliations: (a) Center for RNA Biology, (b) Ohio State Biochemistry Program, (c) Molecular, Cellular and Developmental Biology Program, (d) Department of Chemistry and Biochemistry, and (e) Department of Biological Chemistry, The Ohio State University, Columbus, Ohio. Bernice Agana performed all proteomic analysis. Daniel del Valle-Morales performed the TeloPrime analysis with the proteomics data.

Introduction

The 5’ cap is added co-transcriptionally almost immediately after the first transcribed nucleotide. CAGE was one of the first efforts to identify the 5’ capping site of all mRNAs as part of the ENCODE project (ENCODE Project Consortium, 2004). Surprisingly, 25% of the CAGE tags mapped downstream of the canonical capping site within spliced exons (Fejes-Toth et al., 2009). These findings coincided with the identification of a pool of RNGTT in the cytoplasm by the Schoenberg lab (Otsuka et al., 2009). When cytoplasmic capping is blocked at the guanylylation step, uncapped mRNAs accumulate in non-translating mRNPs (Mukherjee et al., 2012). The 5’ ends of these uncapped mRNAs mapped to the vicinity of the downstream CAGE tags (Kiss et al., 2015; Berger et al., 2019), suggesting that cytoplasmic capping may be the source of the downstream CAGE tags. Direct proof of cytoplasmic capping occurring at both the canonical capping site and downstream for eIF3D and eIF3K was described in the previous chapter (del Valle-Morales et al., 2020). However, the downstream sites identified for eIF3D and eIF3K were located within the 5’UTR. These sites truncate the TOP motif, affecting gene regulation but not the protein sequence.

Downstream CAGE tags could have significant effects on the proteome, as 5’ truncated mRNAs could be translated into N-terminally truncated proteins. The N-terminally truncated proteins could potentially be catalytically inactive, misfolded, lack key regulatory domains, or have novel activity. Yet, 25% of the CAGE tags are downstream. Ribosome profiling has identified mRNAs with multiple initiation sites (Lee et al., 2012). Also, a study of N-termini on a proteomic scale using Nrich identified a large portion of unannotated N-termini that were downstream of the canonical AUG (Yeom et al., 2017). These studies suggest that translating 5’ truncated mRNAs is a possible mechanism exploited by the cell to add proteome complexity. The question remains whether cytoplasmic capping is the source of these N-terminally truncated proteins.

This study, done in collaboration with Bernice Agana and Vicki Wysocki, used a modified form of Nrich to identify downstream N-termini in U2OS cells expressing an inhibitory form of the cytoplasmic capping enzyme (hereafter referred to as K294A). Nrich is a protocol that enriches for the N-terminal peptides of proteins. Using positional proteomics on peptides recovered from Nrich, we identified 1405 N-terminal peptides, 932 of which are downstream of canonical N-termini. Half of the downstream N-termini decreased when cytoplasmic capping was inhibited, and these downstream N-termini have a possible translation start site in the vicinity of the mapped peptide.

Materials and Methods

Cell Culture and Protein Extraction

Tetracycline-inducible U2OS (U2OS-TR) cells and tetracycline-inducible U2OS cells stably transfected with pcDNA4/TO/myc-K294A-ΔNLS+NES-Flag (U2OS-K294A) were described previously (Otsuka et al., 2009). Cells were maintained in a humidified incubator at 37 °C under 5% CO2 and were discarded after no more than 10 passages. Cells were grown in McCoy’s 5A medium (ThermoFisher Scientific 116600) supplemented with 10% tetracycline-free fetal bovine serum (FBS, Atlanta Biologicals S10350). Triplicate cultures of parental U2OS-TR or K294A-expressing cells at 70-80% confluence were switched to medium without or with 1 µg/ml of doxycycline for 24 hr. Prior to harvest, cultures were washed 3 times with phosphate buffered saline (PBS) and lysed using ice-cold lysis buffer (0.1 M HEPES, pH 8.5, 6 M guanidine hydrochloride) supplemented with one tablet of protease inhibitor (cOmplete Mini EDTA-free cocktail, Roche Life Science) and one tablet of phosphatase inhibitor (PhosSTOP, Roche Life Science). The cell lysates were sonicated using a Sonic Dismembrator Model 100 (Fisher Scientific) for 30 cycles of 30 sec bursts alternating with 30 sec rest, and then centrifuged at 16,000 × g for 15 min at 4 °C. The protein concentration of the collected supernatant was determined with a bicinchoninic acid (BCA) protein assay kit (ThermoFisher Scientific).

Sample preparation for total proteome analysis

400 µg of lysate was reductively alkylated by first incubating for 1 hr at 37 °C with 10 mM dithiothreitol, followed by 30 min alkylation (in the dark) at 25 °C with 55 mM iodoacetamide (Sigma Aldrich). Samples were diluted 6-fold with 50 mM ammonium bicarbonate to reduce the concentration of guanidine hydrochloride to less than 1 M. Tryptic digestion was performed by adding 2 μL of 1 μg/μL trypsin (1:200 w/w) supplemented with 1 μL of 1% ProteaseMAX surfactant (Promega) and incubating at 37 °C for 3 hr. Trypsin was inactivated by addition of trifluoroacetic acid (TFA) to a final concentration of 0.5%. The digestion products were centrifuged at 16,000 × g for 10 min, and the supernatants were collected and evaporated to dryness.
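The quantities in this digestion step can be checked with a short Python sketch. The function names are illustrative only. A 6-fold dilution of exactly 6 M guanidine hydrochloride gives 1 M, so the "less than 1 M" in the text presumably reflects the volume already added with the reduction and alkylation reagents; that reading is an assumption of this sketch.

```python
def protein_to_enzyme_ratio(protein_ug, enzyme_volume_ul, enzyme_conc_ug_per_ul):
    """Mass of protein per unit mass of enzyme (200 means 1:200 enzyme:protein, w/w)."""
    enzyme_ug = enzyme_volume_ul * enzyme_conc_ug_per_ul
    return protein_ug / enzyme_ug


def diluted_concentration(initial_molar, fold_dilution):
    """Concentration after an n-fold dilution."""
    return initial_molar / fold_dilution


# 400 ug of lysate digested with 2 uL of 1 ug/uL trypsin.
print(protein_to_enzyme_ratio(400, 2, 1))   # protein mass per unit of trypsin mass

# Upper bound on guanidine hydrochloride after the 6-fold dilution,
# assuming the lysate was still at the full 6 M of the lysis buffer.
print(diluted_concentration(6.0, 6))
```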

Sample preparation for N-terminal proteomics

The primary amines of protein extracts were propionylated and N-terminal peptides were enriched as previously described (Yeom et al., 2017) with modifications. Lysates of uninduced and doxycycline-treated K294A cells (5 mg in lysis buffer) were reduced with DTT at a final concentration of 10 mM at 37 °C for 1 hr and alkylated with iodoacetamide (Sigma Aldrich) at a final concentration of 55 mM at room temperature for 30 min in the dark. The pH of the protein sample was adjusted to 12 using 6 M NaOH. Primary amines were labeled by treatment with 150 mM propionic anhydride (Sigma-Aldrich) for 1 hr at 25 °C. The labeling step was repeated to ensure complete primary amine labeling, and the reaction was quenched with 20 mM hydroxylamine for 30 min. Proteins were washed with 50 mM ammonium bicarbonate to remove excess reagents using S-Trap spin columns (ProtiFi). Tryptic digests of samples were prepared by the S-Trap protocol (Zougman et al., 2014) (1:50 enzyme:protein ratio). New primary amines generated by tryptic digestion were removed by incubation with NHS-activated agarose slurry for 2 hr at 25 °C.

Unbound peptides were separated using S-Trap spin columns and the flow-through fraction containing enriched N-terminal peptides was retained. These were loaded onto Waters Oasis HLB 1CC (WAT094225) and eluted into 16 fractions using increasing concentrations of acetonitrile in 10 mM ammonium formate, pH 10. The 16 fractions were concatenated into 6 fractions as follows: 10%, 22.5% and 35% acetonitrile eluates were mixed and denoted as fraction #1; 12.5%, 25% and 37.5% as fraction #2; 15%, 27.5% and 40% as fraction #3; 17.5%, 30% and 42.5% as fraction #4; 20%, 32.5% and 45% as fraction #5; and the 80% acetonitrile eluate as fraction #6. The peptides were evaporated to dryness and stored at −80 °C until use.
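The concatenation scheme pools every fifth eluate, which spreads early, middle, and late eluting peptides across each combined fraction. A minimal Python sketch of the pooling pattern, with an illustrative function name, is:

```python
def pooled_fractions():
    """Return the acetonitrile percentages combined into each of the 6 fractions."""
    # 15 step eluates from 10% to 45% acetonitrile in 2.5% increments.
    step_eluates = [10 + 2.5 * i for i in range(15)]
    # Fraction k combines eluates k, k+5 and k+10 (0-based indexing).
    pools = [[step_eluates[k], step_eluates[k + 5], step_eluates[k + 10]] for k in range(5)]
    # The final 80% acetonitrile wash is kept as its own fraction.
    pools.append([80.0])
    return pools


for number, pool in enumerate(pooled_fractions(), start=1):
    print(f"fraction #{number}: {pool}")
```

The 15 step eluates plus the 80% eluate account for the 16 fractions stated in the text.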

Total Protein Analysis

Analysis of the impact of doxycycline on the U2OS cell proteome (U2OS-TR, Figure 15B) was performed by LC-MS/MS on a Waters nanoACQUITY UHPLC system (Waters Corporation, Milford, MA) coupled to a Thermo LTQ-Orbitrap Elite hybrid mass spectrometer via an Easy Spray ion source (ThermoFisher Scientific, Bremen, Germany). All other samples were analyzed using a nanoElute coupled to a timsTOF Pro equipped with a CaptiveSpray source (Bruker, Germany). Peptides were separated on a 25 cm × 75 μm analytical column packed with 1.6 μm C18 beads (IonOpticks, Australia). The column temperature was maintained at 50 °C using an integrated column oven (Sonation GmbH, Germany). Separation was achieved using 0.1% formic acid (A1) and acetonitrile with 0.1% formic acid (B1) as mobile phases. The column was equilibrated with 4 column volumes of 100% solvent A1 before loading the sample at a maintained pressure of 800 bar. Peptide separation was achieved at 0.4 ml/min using a linear gradient from 2% to 25% solvent B1 over 90 min, 25% to 37% over 10 min, and 37% to 80% over 10 min, followed by a 10 min hold at 80%, for a total separation method time of 120 min. Data acquisition on the timsTOF Pro utilized the Parallel Accumulation Serial Fragmentation (PASEF) acquisition mode. Instrument settings included default imeX mode, mass range 100 to 1700 m/z, capillary voltage of 1.6 kV, dry gas 3 l/min and dry temperature of 180 °C. PASEF settings included 10 MS/MS scans at 1.18 sec total cycle time, scheduling target intensity of 20000, charge range 0-5, active exclusion release after 0.4 min, and CID collision energy 42 eV.
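The timsTOF Pro gradient described above is a piecewise-linear program, and its segments can be written down and interrogated with a small Python sketch. The breakpoint table is transcribed from the text; the function name and the interpolation helper are illustrative and are not part of any instrument software.

```python
# (time in min, % solvent B1) breakpoints transcribed from the method description:
# 2% to 25% over 90 min, 25% to 37% over 10 min, 37% to 80% over 10 min, hold 10 min.
GRADIENT = [(0, 2.0), (90, 25.0), (100, 37.0), (110, 80.0), (120, 80.0)]


def percent_b(t_min, gradient=GRADIENT):
    """Percentage of solvent B1 at a given time, by linear interpolation."""
    if not gradient[0][0] <= t_min <= gradient[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


print(GRADIENT[-1][0])   # total method time in minutes
print(percent_b(45))     # composition halfway through the first segment
```

Summing the segment durations (90 + 10 + 10 + 10 min) gives the stated total method time of 120 min.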

N-terminal peptide analysis

For Nrich analysis, a total of 0.5 μg of peptides reconstituted in 0.1% formic acid was injected and loaded onto a Waters Symmetry C18 trap column (100 Å, 5 μm particle diameter, 180 μm × 20 mm) for desalting at a flow rate of 20 μL/min. The analytical separation was achieved on a C18 reversed phase column (75 µm × 15 cm, PepMap C18, 3 µm, 100 Å) at pH 2.4, which was equilibrated to initial conditions of 98% (v/v) A1 and 2% (v/v) B1. The subsequent separation was achieved at 35 °C, where the % B1 was maintained at 2% for 5 min, then increased from 2% to 35% over 75 min, 35% to 45% over 10 min, and 45% to 85% over 10 min at a flow rate of 0.3 μL/min. The column was held at 85% (v/v) B1 for 5 min before returning to initial conditions over 10 min.

The heated capillary temperature and electrospray voltage on the Orbitrap Elite were 200 °C and 1.5 kV, respectively, using top 15 data dependent acquisition in positive ion mode. The MS scans were acquired at a resolution of 120 000 with an automatic gain control (AGC) target value of 1 × 10^6 for a scan range of 400–1600 m/z. Collision induced dissociation (CID) spectra were obtained in the ion trap with an AGC target of 1 × 10^4, maximum ion injection time (IT) of 50 ms, 1 m/z isolation width, normalized collisional energy (NCE) of 35 and ion activation time of 10 ms. The transfer tube S-lens RF was 49% and dynamic exclusion was set at 15 s with a repeat count of 1 for an exclusion list size of 500.

TeloPrime gene specific PCR and sequencing

Cytoplasmic RNA was isolated as described in del Valle-Morales et al. (2020) from five replicate cultures of uninduced and 24 hr doxycycline-treated cells. Prior to cDNA synthesis, 0.1 fmol of capped β-globin mRNA was added as an internal control to 2 μg of cytoplasmic RNA. Capped end-specific cDNA was generated using the TeloPrime® Full-Length cDNA Amplification Kit V1 (Lexogen) according to the manufacturer's protocol and eluted in a final volume of 20 μl. 1 μl of this was subjected to a single round of amplification with Phusion® Hot Start Flex 2X MasterMix (NEB) using gene-specific reverse primers containing the 5’ overhang of Illumina PE1 and a TeloPrime forward adapter primer containing the 3’ overhang of Illumina PE2 under the following conditions: 98 °C for 40 sec, 63 °C for 30 sec, 72 °C for 30 sec. The reaction was purified using DNA Clean & Concentrator-5 (Zymo) and eluted in a final volume of 15 μl. The purified reaction was amplified by index PCR for library construction using the full-length Illumina PE1 and barcoded PE2 primers with Phusion Hot Start Flex 2X MasterMix (NEB) and the following cycle conditions: 1. 98 °C for 30 s, 2. 98 °C for 10 s, 3. 72 °C for 1 min, 4. repeat steps 2-3 for 24 cycles. The indexed reactions were purified by ethanol precipitation, eluted in 5 μl of water, and pooled. The pooled libraries were separated on a 6% native PAGE gel and visualized with SYBR® Gold Nucleic Acid Gel Stain (ThermoFisher Scientific S11494). Bands in the size range of 100-200 bp were excised and centrifuged in a 0.6 ml microtube for 1 minute at 13,000 × g. The crushed gel slice was soaked in 3 volumes of nuclease-free water and incubated overnight at room temperature with slight agitation. Eluted DNA was ethanol precipitated and sequenced by paired-end 75 on an Illumina MiSeq v3 at the Genome Services Laboratory at Nationwide Children’s Hospital, Columbus, OH.

Data Processing and Analysis

Database search for global proteins

The data files from the timsTOF Pro were converted into mgf files using MSConvertGUI (ProteoWizard). Protein identifications were obtained via the Thermo Proteome Discoverer software (v 2.0) using the Sequest search algorithm and the UniProt Swiss-Prot canonical H. sapiens proteome (as of 09/04/2019). Cysteine carbamidomethylation was set as a fixed modification, while oxidation of methionines and deamidation of asparagines and glutamines were set as variable modifications. Percolator was used for estimation of the PSM (peptide spectrum match) level FDR of peptide identification. Proteins were identified at a 0.01 false discovery rate (FDR) for PSMs and 0.01 protein FDR. Protein groups were filtered to include only peptides with 99% confidence and a minimum of two peptides per protein group.

Database search for N-terminal peptides

All spectra were matched to peptide sequences in the UniProt Swiss-Prot canonical and isoform H. sapiens proteome (as of 09/04/2019). Peptide identifications were obtained via the Thermo Proteome Discoverer software (v 2.0) using the Sequest and MS Amanda search algorithms. Cysteine carbamidomethylation (+57.02 Da) and propionylation of lysine (+56.02 Da) were set as fixed modifications, while oxidation of methionines (+15.99 Da) and acetylation (+42.01 Da) or propionylation of N-termini were set as variable modifications. Percolator was used for estimation of the PSM (peptide spectrum match) level FDR of peptide identification at 0.01.
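The mass shifts quoted for these modifications follow from the elemental composition each one adds to a residue. The Python sketch below recomputes them from standard monoisotopic atomic masses; the dictionary keys and function name are illustrative, and the compositions are the commonly tabulated ones for these modifications rather than values taken from the search software settings.

```python
# Monoisotopic masses of the most abundant isotopes (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Net elemental composition added by each modification.
MODIFICATIONS = {
    "carbamidomethyl (Cys)": {"C": 2, "H": 3, "N": 1, "O": 1},
    "propionyl (Lys, N-term)": {"C": 3, "H": 4, "O": 1},
    "acetyl (N-term)": {"C": 2, "H": 2, "O": 1},
    "oxidation (Met)": {"O": 1},
}


def delta_mass(composition):
    """Monoisotopic mass added by a modification with the given composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


for name, composition in MODIFICATIONS.items():
    print(f"{name}: +{delta_mass(composition):.4f} Da")
```

The computed values agree with those quoted in the text to within 0.01 Da.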

The complete mass spectrometry dataset is available from the ProteomeXchange Consortium via the jPOST partner repository under the dataset identifier PXD016907.

Statistical Analysis

Label-free relative quantification was carried out using the limma package in R (Ritchie et al., 2015). Differential expression of individual peptides and proteins was determined using peptide spectral matches (PSMs). For total proteome analysis, data were filtered for PSMs observed in all replicates of at least one condition and normalized using quantile normalization. PSMs were log2 transformed and differential enrichment analysis was carried out using linear models combined with the empirical Bayes statistics function in limma. False discovery correction was applied using the Benjamini-Hochberg method and data were visualized using volcano plots.
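Two of the steps named above, quantile normalization and Benjamini-Hochberg correction, can be illustrated with a plain-Python sketch. This is not the limma implementation used in the study, which was run in R and also includes the linear model and empirical Bayes moderation; the function names are illustrative, and ties are handled naively here.

```python
def quantile_normalize(rows):
    """Quantile-normalize a features x samples table given as a list of rows."""
    n_rows, n_cols = len(rows), len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the i-th smallest value across samples defines the reference distribution.
    mean_quantiles = [sum(sc[i] for sc in sorted_cols) / n_cols for i in range(n_rows)]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = mean_quantiles[rank]
    return normalized


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical PSM counts (3 peptides x 2 samples) and hypothetical raw p-values.
print(quantile_normalize([[5, 4], [2, 1], [3, 6]]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In the workflow described in the text, the log2 transformation is applied after normalization and before the linear model is fitted.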

Figure 15. Inhibition of cytoplasmic capping has a minor impact on the proteome (Figure generated by Bernice Agana). A. Triplicate cultures of Tet-inducible U2OS cells stably transfected with K294A were incubated for 24 hr with or without 1 μg/ml doxycycline. Total cellular protein was extracted, digested with trypsin, and analyzed by LC-MS/MS for changes in protein expression. Proteins with statistically significant changes are represented in red. B. To control for doxycycline effects on protein levels, the experiment in A was repeated using parental Tet-inducible U2OS cells.

¹ This section is based on data generated by Bernice Agana.

Results

Total proteomic analysis shows little change from inhibition of cytoplasmic capping

Cytoplasmic capping targets represent a limited number of genes in the overall transcriptome (Mukherjee et al., 2012). Thus, we expected little change in the overall proteome when cytoplasmic capping is inhibited. We confirmed this by comparing the proteome profile of cells expressing K294A to uninduced cells (Figure 15A). As expected, RNGTT (K294A) showed the largest fold change, and very few proteins changed when cytoplasmic capping was inhibited. The proteins that did change did not group into any significant gene class. Interestingly, LARP1 decreased (Figure 15A), which ties in with a previous observation that TOP mRNAs are recapping targets (del Valle-Morales et al., 2020). As a control for off-target effects of doxycycline (Ahler et al., 2013), the same experiment was repeated with parental Tet-inducible U2OS cells after 24 hr of doxycycline treatment. There were no significant protein changes with doxycycline treatment (Figure 15B), indicating that the effects seen are attributable to inhibition of cytoplasmic capping rather than to doxycycline.

Figure 16. Nrich approach for the enrichment of N-termini generated through cytoplasmic capping. Triplicate cultures of tetracycline-inducible U2OS cells carrying the dominant negative K294A form of cytoplasmic capping enzyme were cultured for 24 hr in medium without (uninduced) or with 1 μg/ml doxycycline. Total cellular protein extracts were treated with propionic anhydride and then digested to completion with trypsin. Newly created N-terminal peptides were removed with N-hydroxysuccinimide (NHS) agarose, and peptides with blocked N-termini recovered with high pH were identified by LC-MS/MS.

Figure 17. Downstream N-termini decrease when cytoplasmic capping is inhibited (Figure generated by Bernice Agana). A. Scatter plot of all N-termini detected with positional proteomics. B. Scatter plot representing all N-termini that mapped downstream of the initiating AUG. C. Scatter plot from A with the N-termini classified as either canonical or downstream. Downstream N-termini are represented in green and canonical N-termini are represented in purple. D. Scatter plot of N-termini that decrease with K294A induction. E. Scatter plot of downstream N-termini (green) together with N-termini that decrease with K294A (red). F. Scatter plot of all N-termini (blue) together with N-termini that decrease with K294A (red).

¹ This section is based on data generated by Bernice Agana.

Positional proteomics identifies downstream N-termini that decrease with cytoplasmic cap inhibition

We have shown direct evidence of downstream recapping for eIF3D and eIF3K in which the TOP motif is truncated (del Valle-Morales et al., 2020). In this case, downstream recapping occurs within the 5’UTR and there is no change in the protein sequence of eIF3D and eIF3K. There is a possibility that recapping can occur downstream of the initiating AUG. We have indirect evidence that uncapped mRNAs that accumulated with K294A expression map to the vicinity of downstream CAGE tags that cluster within exons (Kiss et al., 2015; Berger et al., 2019). If recapping were to occur downstream of the initiating AUG, the result would be an N-terminally truncated version of the protein if translated within the same coding frame.

To test this possibility, we adapted the Nrich protocol (Yeom et al., 2017), but used a single acetylating agent and one endoprotease rather than the more comprehensive analysis of all N-termini in that study (Figure 16). Triplicate cultures of cells carrying K294A were cultured for 24 hr ± doxycycline and total protein lysates were harvested. The extracts were reductively alkylated, treated with propionic anhydride to block free amines, digested with trypsin, and filtered through N-hydroxysuccinimide agarose prior to LC-MS/MS.

Using positional proteomics, we identified 1405 distinct N-termini, which are shown graphically in Figure 17A. These peptides were classified based on the location of the N-terminus as canonical if the N-terminus maps to the initiating AUG, and downstream if the N-terminus maps downstream of the initiating AUG (Figure 17B). Both groups are overlaid in Figure 17C. The prevalence of downstream N-termini is consistent with Yeom et al. (2017), where a comprehensive look at N-termini identified a greater number of downstream N-termini (932) than canonical N-termini (473). Of the downstream N-termini identified in our study, 179 match downstream peptides in Yeom et al. (2017). Although our study used a different cell line and is less comprehensive, this match suggests that these downstream N-termini are the translation products of downstream cytoplasmic capping. We plotted the downstream N-termini that decrease with K294A in Figure 17D. The overall pattern observed for the peptides was one of decline when cytoplasmic capping is inhibited: 50% of the identified downstream N-termini declined (Figure 17E). None of the declining N-termini correspond to canonical N-termini (compare Figure 17F with 17C). Thus, cytoplasmic capping enables the expression of downstream N-termini.


Figure 18. Downstream N-termini that decrease with K294A are enriched in RNA binding proteins. A. Pie chart representing the proportion of downstream N-termini that decrease with K294A (purple 61%) vs those that are unchanged (yellow 39%). Total peptides n=611. B. PANTHER analysis of the downstream N-termini n=611. C. Pie chart representing the proportion of N-termini that were identified as ihRBP (Trendel et al., 2019) (blue 55%) vs those that were not (orange 45%). D. Pie chart representing the proportion of downstream N-termini that were identified as ihRBP (Trendel et al., 2019) (blue 65%) vs those that were not (purple 35%).


______

1This section is based on data generated by Bernice Agana.

Characteristics of downstream N-termini

Closer examination of the downstream N-termini showed that many of them derived from the same protein. To simplify our analysis, each peptide was reduced to a single UniProt entry, reducing the number of proteins to 611. Of these, 373 (61%) decreased with K294A expression (Figure 18A). We performed a PANTHER analysis (Mi et al., 2017) of these downstream peptides and identified nucleic acid binding as the most prominent class of genes in our dataset (Figure 18B). Recently, the list of RNA binding proteins was refined, and 1753 proteins were identified as constituents of the integrated human RNA-binding proteome (ihRBP) (Trendel et al., 2019). We tested whether ihRBPs are represented among our peptides, and 55% of the peptides matched the ihRBP (Figure 18C). Of the downstream peptides that decrease, 65% were identified as ihRBP (Figure 18D). In summary, these results indicate that RNA binding proteins contain a subset of downstream peptides that decrease when cytoplasmic capping is inhibited.
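The curation and overlap steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used to produce Figure 18: the record layout, the rule that a protein counts as decreasing if any of its downstream peptides decreases, and the placeholder accessions are all hypothetical.

```python
from collections import defaultdict


def collapse_to_proteins(peptides):
    """Reduce peptide-level calls to one entry per UniProt accession.

    `peptides` is an iterable of (accession, decreased) pairs. A protein is
    marked as decreasing if any of its peptides decreased (assumed rule).
    """
    proteins = defaultdict(bool)
    for accession, decreased in peptides:
        proteins[accession] = proteins[accession] or decreased
    return dict(proteins)


def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Hypothetical toy input: placeholder accessions, not data from this study.
toy_peptides = [
    ("P00001", True),
    ("P00001", False),
    ("P00002", False),
    ("P00003", True),
]
toy_proteins = collapse_to_proteins(toy_peptides)
decreasing = {acc for acc, dec in toy_proteins.items() if dec}

# Hypothetical RNA binding protein list standing in for the ihRBP set.
toy_ihrbp = {"P00001", "P00002"}
overlap = decreasing & toy_ihrbp

print(percent(len(decreasing), len(toy_proteins)))  # share of proteins that decrease
print(percent(len(overlap), len(decreasing)))       # share of decreasing proteins in the list
print(percent(373, 611))                            # counts reported in the text
```

Applied to the counts reported above, `percent(373, 611)` reproduces the 61% shown in Figure 18A.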


Figure 19. Downstream peptides have possible start sites close to the mapped peptide (Figure generated by Ralf Bundschuh). The mapped downstream peptide was scanned upstream to locate the nearest cognate (AUG) or non-cognate (CUG, GUG) start sites. Peptides that did not map to any known UniProt entry were labeled as background.


Figure 20. TeloPrime coupled with Illumina library preparation detects 5’ truncated mRNAs. A. Overview of the modified TeloPrime approach. After following the protocol of Figure 11A, cDNA is amplified with forward and gene specific primers containing half of the Illumina sequencing adapter. The resulting cDNA is amplified with the full-length Illumina adapter, separated on a 6% native PAGE gel, size selected, and purified for MiSeq PE75 sequencing. B. Modified TeloPrime was done with 2 μg cytoplasmic RNA from ± K294A cells spiked with 0.1 fmol of capped β-globin. Gene specific PCR was done for each replicate with β-globin reverse primer and the final cDNA library was separated on a 6% native PAGE gel stained with SYBR Gold. C. Replicate -4 was selected for gene specific PCR of nine decreasing downstream peptides. The final cDNA library was separated on a 6% native PAGE gel and stained with SYBR Gold.

______

1This section is based on data generated by Ralf Bundschuh (Figure 19) and Daniel del Valle-Morales (Figure 20).

Downstream N-termini have a corresponding 5’ truncated mRNA

The detection of these downstream peptides suggests that an mRNA is recapped downstream of the canonical cap site and translated. For these downstream peptides to be translated, there must be a start codon or near-cognate start codon in the vicinity of the peptide. When each peptide was searched for the nearest upstream start codon, the peptides that mapped to a UniProt protein contained a possible start site 1-10 amino acids upstream of the mapped peptide (Figure 19). The cap site for these peptides must therefore be located shortly before the possible start site. To detect downstream 5’ends, cytoplasmic RNA was purified from five replicate cultures of K294A cells ± 24 hr doxycycline induction and analyzed using the TeloPrime approach described in chapter 2 (del Valle-Morales et al., 2020) modified for deep sequencing (Figure 20A). The resulting cDNA is then amplified with the full-length Illumina sequencing adapter to construct the sequencing library.
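The upstream scan for candidate start sites can be illustrated with a short sketch. It is not the script used to generate Figure 19; the function name, the 50-codon search window, and the example sequence are assumptions made for illustration, and intervening in-frame stop codons are not checked.

```python
# Start codons considered, labeled as in the Figure 19 legend.
START_CODONS = {"AUG": "cognate", "CUG": "non-cognate", "GUG": "non-cognate"}


def nearest_upstream_start(mrna, peptide_start, max_codons=50):
    """Scan upstream, in frame, from the first codon of a mapped peptide.

    mrna          : transcript sequence (A/C/G/U; T is converted to U)
    peptide_start : 0-based index of the first nucleotide of the peptide's
                    first codon
    max_codons    : how far upstream to look (assumed cutoff)

    Returns (codon, kind, distance_in_codons) for the closest start codon,
    or None if no start codon is found within the window.
    """
    mrna = mrna.upper().replace("T", "U")
    for distance in range(1, max_codons + 1):
        pos = peptide_start - 3 * distance
        if pos < 0:
            break
        codon = mrna[pos:pos + 3]
        if codon in START_CODONS:
            return codon, START_CODONS[codon], distance
    return None


# Hypothetical sequence, written codon by codon: CCC AUG GCA CUG GAA UCC AAA
example = "CCCAUGGCACUGGAAUCCAAA"
print(nearest_upstream_start(example, 15))  # peptide begins at the UCC codon
print(nearest_upstream_start(example, 9))   # peptide begins at the CUG codon
```

The distance is reported in codons, which corresponds to the number of amino acids between the candidate start site and the first residue of the mapped peptide.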

Nine genes (TUBB4B, GSTP1, CCT5, PRDX1, ANXA2, HSPD1, ACTG1, ENO1, and VIM) were selected based on their fold change with K294A and their peptide count to increase the probability of detecting the rare downstream capped ends. The primers were designed to cover the region corresponding to the N-terminus of the downstream peptide. Equal numbers of cycles were done for each transcript, and amplification was only performed to the point where β-globin was detected (24 cycles). As expected, the β-globin gene specific PCR generated the predicted band for all replicates (Figure 20B). Replicate -4 was selected for visualization of the nine genes on a 6% native PAGE gel. We did not expect to be able to visualize these bands given the rarity of a downstream recapped mRNA. However, a strong band running at roughly 140 bp was detected for TUBB4B while a faint band was detected for the other genes (GSTP1, CCT5, PRDX1, ANXA2, HSPD1, ACTG1, ENO1, VIM) (Figure 20C). The TeloPrime libraries are currently being processed. For now, these findings suggest that the downstream peptides may have a corresponding downstream capped mRNA that contains a possible start site in the vicinity of the mapped downstream peptide.

Discussion

The 5’ cap site was established to occur at a static site defined by the location of the first transcribed nucleotide during RNA transcription. Evidence from CAGE has shown that mRNAs may have multiple cap sites downstream of the canonical translation start site (Fejes-Toth et al., 2009); that study coincided with the discovery of cytoplasmic capping. Another study that aimed to identify the N-termini of all proteins discovered that a large portion of the N-termini are unannotated and map downstream of the canonical start site (Yeom et al., 2017). With a possible mechanism that could generate a downstream capping site, we wanted to determine whether cytoplasmic capping is a possible source of downstream 5’ends and whether translation of recapped mRNAs produces N-terminally truncated proteins.

We first addressed whether inhibition of cytoplasmic capping has any effect on the proteome using a dominant negative form of cytoplasmic capping enzyme (K294A). We did not expect to see much change as only a limited number of genes were identified as recapping genes (Mukherjee et al., 2012, del Valle-Morales et al., 2020). As expected, only a limited number of genes showed any change with inhibition of cytoplasmic capping (Figure 15).

Downstream N-termini were identified by positional proteomics. This identified 1405 distinct N-termini, more than half of which were at downstream sites (Figures 17B-C and 18A). In Yeom et al. (2017), a more comprehensive study, a large majority of the N-termini mapped downstream of the canonical AUG. 179 of our downstream peptides matched those in that study, which provides validation of our approach. When cytoplasmic capping was inhibited, the downstream N-termini decreased (Figure 17C-E). This corresponds to 61% of the downstream N-termini once curated for multiple peptides from a given protein (Figure 18A). In other words, inhibition of cytoplasmic capping results in a decrease of downstream N-termini.

Nucleic acid binding proteins were identified as the major class of genes for the downstream N-termini. 388 of the detected N-termini belonged to recently characterized representatives of the human RNA binding proteome (ihRBPs) (Trendel et al., 2019). 65% (243) of the downstream N-termini that decrease were identified as ihRBP. Truncations of RNA binding proteins can have a significant impact on their functionality depending on the location of the recapping site, for example by excluding potential binding domains or regulatory domains. ihRBPs are thus identified both to have downstream N-termini and to decrease when cytoplasmic capping is inhibited. This could explain the large range of genes observed to change when cytoplasmic cap methylation is inhibited (Figure 8) and the 3’UTR shifts seen in Figure 13. Dysregulation of RNA binding proteins can have drastic effects on the transcriptome of cells. Future work is needed to determine the biological ramifications of cytoplasmic capping.

For the downstream N-termini to be translated, there must be an upstream start site in the vicinity of the peptide. This was indeed the case when the peptides were scanned to find the nearest start site. The mapped downstream peptides had a cognate or near cognate start site 6-10 amino acids upstream (Figure 19). The next question is whether the downstream peptide also has a corresponding 5’ truncated mRNA. To answer this, we used a modified TeloPrime protocol designed for Illumina sequencing (Figure 20A). Nine genes were selected for TeloPrime sequencing. The library for TUBB4B showed a strong band at 140 bp while faint bands were observed for GSTP1, CCT5, PRDX1, ANXA2, HSPD1, ACTG1, ENO1, and VIM. The sequencing results will determine if N-terminally truncated peptides are generated by recapping of 5’-truncated mRNAs.

Downstream recapping is most likely a rare event, which increases the difficulty of detecting downstream N-termini. Indeed, the PSMs of the downstream N-termini are at the lower range of detectability. This is evident in the scatter plots of the downstream N-termini in Figure 17. The low PSMs can affect the analysis in Figure 19, where the background signal is higher for some of the upstream translational start sites. TeloPrime sequencing of genes with a mapped downstream N-terminus aims to address this concern. If we can provide evidence of 5’ truncated mRNAs that match some of our downstream N-termini and decrease when cytoplasmic capping is inhibited, we can have more confidence in our detected downstream N-termini.

Several studies have detected translation events downstream of the canonical start site (Yeom et al., 2017, Lee et al., 2012). The production of N-terminally truncated proteins was initially associated with disease (Fortelny et al., 2015), yet if regulated, it can add an additional layer of proteome diversity. This study identified cytoplasmic capping as a source of downstream N-termini, highlighting the importance of cytoplasmic capping in genome complexity.


Chapter 4. Methods used for the identification of cytoplasmic capping sites

Abstract

Cytoplasmic capping is a cyclical process where mRNAs are capped and decapped to fine tune gene expression. Recapping can occur at the canonical capping site, and my work presented in earlier chapters provides evidence that cytoplasmic capping can occur downstream of the canonical capping site for select mRNAs (e.g., TOP mRNAs). However, a transcriptome-wide identification of recapping sites has not been provided. Here I will discuss various methods for 5’ cap enrichment, the caveats of such methods, and efforts to implement these methods to determine the location of cytoplasmic capping throughout the entire transcriptome.

Introduction

Cytoplasmic capping acts in a cyclical manner to return uncapped mRNAs to a translationally active state by addition of the 5’ cap. A subpopulation of the transcriptome is maintained in an uncapped state and can be processed by the cytoplasmic capping complex to add the 5’ cap back onto them. Whether these mRNAs underwent some form of processing or partial degradation is unknown. Recapping would predominantly occur at the canonical cap site; however, if uncapped mRNAs are partially degraded or generated by endonuclease cleavage, then an internal uncapped 5’end would be available to be recapped. These downstream capping sites could generate an N-terminally truncated protein if recapping were to occur downstream of the initiating AUG. Thus, there is a need to determine the exact site of recapping in mRNAs.


Previous studies assaying uncapped mRNAs that accumulate with inhibition of cytoplasmic capping revealed that the accumulated uncapped mRNAs map close to downstream CAGE tags (Kiss et al., 2015, Berger et al., 2019). This provided indirect evidence that recapping may occur downstream of the canonical capping site. Data in chapter 2 showed direct evidence of cytoplasmic capping at the canonical site for RPS4X and RPS3 mRNAs and at downstream sites within the 5’UTR for eIF3K and eIF3D mRNAs (del Valle-Morales et al., 2020). The N-terminally truncated peptides in chapter 3 that decrease when cytoplasmic capping is inhibited suggest that cytoplasmic capping can also occur downstream of the initiating AUG. We know that recapping can occur at the canonical site and at downstream sites. However, we do not know the location of these sites on a transcriptomic scale or the prevalence of downstream recapping sites. This chapter discusses the various tools used to assay 5’ cap ends, efforts made to implement such assays to identify recapping sites, and caveats of such methods.


Figure 21. Cap analysis of gene expression (CAGE) for the identification of TSS. A. Up to 5 μg of poly(A) selected RNA is used as the initial input. The mRNA is reverse transcribed with random hexamer priming. B. Biotin is chemically introduced to the 5’ cap. C. The reaction is treated with RNase I to remove any remaining RNA and biotinylated capped mRNAs are recovered with streptavidin beads. D. The DNA strand from the cDNA hybrid is released and a 5’ linker region containing an EcoP15I cleavage site is added by ssDNA ligation. E. Second strand synthesis fills out the DNA strand and generates a double stranded cDNA. The cDNA is treated with EcoP15I which cleaves the cDNA and generates a 27 nucleotide insert. This insert is sequenced, clustered, and mapped.

Cap Analysis of Gene Expression

CAGE is one of the more successful methods for isolating 5’ends (Kodzius et al., 2006, Takahashi et al., 2012) (Figure 21). It utilizes cap-trapper, a protocol where biotin is chemically introduced into the diol of the 5’ cap to capture and recover capped mRNAs (Carninci et al., 1996). The captured mRNA is ligated to a 5’ adapter that contains an EcoP15I cleavage site. Cleavage by EcoP15I generates the 27 nucleotide insert which is termed the CAGE tag. The CAGE tag is sequenced, clustered, and mapped to the genome. The resulting peaks represent the transcriptional start sites (TSS) across the genome.
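The clustering step can be illustrated with a simple distance-based grouping of tag 5’ end positions. This is a sketch only: published CAGE pipelines use more elaborate clustering, and the function name, the 20 nt gap threshold, and the example positions are assumptions made for illustration.

```python
from collections import Counter


def cluster_tags(positions, max_gap=20):
    """Group 5' end positions into clusters separated by more than max_gap nt.

    Returns a list of (start, end, tag_count, peak_position) tuples, where
    peak_position is the most frequent 5' end within the cluster.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # Start a new cluster when the gap to the previous tag is too large.
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    summary = []
    for tags in clusters:
        peak, _ = Counter(tags).most_common(1)[0]
        summary.append((tags[0], tags[-1], len(tags), peak))
    return summary


# Hypothetical tag positions along one transcript: a dominant cluster near
# the canonical cap site and a weaker cluster further downstream.
tags = [100, 101, 101, 101, 103, 450, 452, 452]
for cluster in cluster_tags(tags):
    print(cluster)
```

In this toy example the downstream cluster is supported by fewer tags than the canonical one, which mirrors the low read counts that make downstream CAGE peaks harder to validate.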

CAGE was used in the ENCODE project to annotate transcriptional start sites across the genome (ENCODE Project Consortium, 2004). These CAGE reads are available and can be displayed as a genome track in the UCSC Genome Browser. Surprisingly, 25% of CAGE tags map within spliced exons (Fejes-Toth et al., 2009). These downstream CAGE tags provided valuable information for predicting possible sites for cytoplasmic capping. We showed that uncapped mRNAs that accumulate with K294A map to the vicinity of these downstream CAGE tags (Kiss et al., 2015, Berger et al., 2019).

There are, however, some caveats to using CAGE to identify the location of cytoplasmic capping. The canonical sites determined by CAGE are fairly accurate and have been confirmed by other sequencing methods (Adiconis et al., 2018). The downstream sites, however, are more difficult to validate due to the low read count of such peaks. When the read distribution across the gene body was compared for ReCappable Seq and CAGE, many of the downstream CAGE tags were not reproducible with ReCappable Seq (Bo et al., 2019). Downstream recapping sites are potentially rare events that may be difficult to detect with CAGE. For these reasons, we aimed to use alternative methods to identify recapping sites.


Figure 22. Cap-SMART protocol for enrichment of 5’ capped mRNAs. A. 375 ng of poly(A) RNA is used as input for Cap-SMART. This contains a pool of capped and uncapped mRNAs. B. RNA is treated with shrimp alkaline phosphatase (SAP) to convert all uncapped 5’ends to a hydroxyl. C. RNA is treated with polynucleotide kinase (PNK) to convert all uncapped 5’ends to a monophosphate. D. The stop oligo consisting of isomeric nucleotides is ligated to uncapped mRNAs by T4 RNA ligase. E. First strand synthesis is performed by the SMART enzyme using an oligo(dT) primer with a 3’adapter, and a SMART oligo used for strand switching. F. Second strand synthesis is performed with a biotinylated SMART primer and the 3’ adapter. G. cDNA is fragmented and the biotinylated 5’end is recovered using streptavidin beads.

Cap-SMART

Cap-SMART was developed as an alternative to CAGE to determine transcriptional start sites (Machida & Lin, 2014). Aiming to address the major caveats of CAGE, it uses the strand switching properties of the SMART reverse transcriptase (Clontech) to add an adapter to capped mRNAs while uncapped mRNAs are blocked with a ligated primer containing isomeric nucleotides. The protocol requires less RNA as input (375 ng of poly(A) RNA) than CAGE, and uses oligo(dT) priming to only amplify poly(A) RNA.

375 ng of poly(A) RNA (Figure 22A) is treated with shrimp alkaline phosphatase (SAP) and then with polynucleotide kinase (PNK) to ensure that all uncapped mRNAs contain a monophosphate at the 5’end (Figure 22B-C). The resulting RNA is then incubated with T4 RNA ligase and the stop oligo (iCiGiC) (Figure 22D). Uncapped mRNAs have a free 5’ monophosphate that allows ligation of the stop oligo. The stop oligo prevents strand switching by the SMART enzyme, ensuring that strand switching only occurs on capped mRNAs (Figure 22E). The ligated RNA is reverse transcribed with SMART reverse transcriptase, primed with an oligo(dT) primer carrying a 3’adapter, in the presence of a 5’ adapter used for strand switching. The resulting cDNA is amplified by PCR with a biotinylated 5’ adapter and a reverse primer for the 3’ adapter (Figure 22F). The biotinylated cDNA is fragmented and the 5’ends are purified with streptavidin beads. The resulting 5’cDNA can be converted to a cDNA library by the Illumina DNA sequencing protocol (Figure 22G).

Figure 23. Recovery of mRNAs after Cap-SMART. 375 ng of poly(A) selected RNA spiked with 1 fmol of capped mCherry and uncapped luciferase was used as input for Cap-SMART and the protocol was performed up to the biotin selection step. 10% of the reaction was taken as input and the cDNA was purified using streptavidin beads. The unbound and wash fractions were kept and 10% of each fraction was used for RT-qPCR analysis spiked with eGFP mRNA (Trilink). All samples were normalized to eGFP and to input.


Complications arose when Cap-SMART was implemented for identifying cytoplasmic capping targets through cytoplasmic cap inhibition. A test run of Cap-SMART was performed using 375 ng of poly(A) selected RNA spiked with 1 fmol of capped mCherry and uncapped luciferase, and the recovery of the two spikes and a non-recapping control was assayed with RT-qPCR. 50% of the non-recapped control Rplp0 was recovered in the bound fraction after biotin selection. However, the spiked-in capped mCherry had a much lower recovery of 12%. The SMART enzyme is biased towards mRNAs with longer poly(A) tails (Vardi et al., 2017). This could explain the poor recovery of the mCherry spike compared with the endogenously capped Rplp0 mRNA (Figure 23), as the synthetic mCherry mRNA does not have a long poly(A) tail. This bias may exclude potential recapping targets from being amplified. The partial recovery of luciferase raises concern that incomplete ligation of the stop oligo may cause some uncapped mRNAs to be selected. This can produce false positive results where recapped mRNAs are not abundant. In addition, CAGE outperforms SMART enzyme sequencing in both precision and sensitivity of sequencing 5’ends (Adiconis et al., 2018). These studies, together with the low recovery of strand switched cDNAs, convinced us that Cap-SMART may not be the ideal method to identify downstream capped ends.
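The normalization to the eGFP spike and to input used for the recovery measurements can be written as a delta-delta Ct calculation. The sketch below is illustrative only: it assumes perfect doubling per PCR cycle unless another efficiency is supplied, the function name is hypothetical, and the Ct values are invented for the example rather than taken from Figure 23.

```python
def percent_recovery(ct_target_fraction, ct_spike_fraction,
                     ct_target_input, ct_spike_input, efficiency=2.0):
    """Percent of a transcript recovered in a fraction relative to input.

    Each sample is first normalized to the spike-in added before RT-qPCR
    (delta Ct), then the fraction is compared with input (delta-delta Ct).
    `efficiency` is the assumed fold amplification per cycle.
    """
    delta_fraction = ct_target_fraction - ct_spike_fraction
    delta_input = ct_target_input - ct_spike_input
    return 100.0 * efficiency ** -(delta_fraction - delta_input)


# Hypothetical Ct values. A transcript that appears one cycle later in the
# bound fraction than in input (after spike normalization) is 50% recovered;
# three cycles later corresponds to 12.5%.
print(percent_recovery(25.0, 20.0, 24.0, 20.0))
print(percent_recovery(27.0, 20.0, 24.0, 20.0))
```

Because each cycle of delay halves the estimated amount, a difference of about three cycles between two transcripts is enough to account for a gap such as that between a 50% and a 12% recovery.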


Figure 24. TeloPrime protocol for the selection of capped mRNAs. A. 2 μg of RNA is primed with oligo(dT) primer containing a 3’adapter and reverse transcribed. B. A double stranded adapter containing a cytosine overhang is added to the DNA/RNA hybrid. The cytosine will base pair weakly to the guanosine cap and is ligated by a double stranded specific ligase. C. Second strand synthesis is performed using the 5’ adapter primer to displace the RNA strand and generate a stable double stranded cDNA. D. The cDNA is PCR amplified with the 5’ adapter forward primer and 3’adapter reverse primer to generate cDNA for downstream applications.

TeloPrime

Shortly after troubleshooting Cap-SMART, TeloPrime became available on the market. TeloPrime is a cDNA production method developed by Lexogen that specifically selects for capped mRNAs. 1 ng - 2 μg of total mRNA is first reverse transcribed with an oligo(dT) primer containing a 3’adaptor (Figure 24A). The RNA/DNA hybrid is then mixed with a double stranded adaptor with a cytosine overhang (Figure 24B). This overhang base pairs weakly with the 5’ guanosine cap and, using a proprietary double stranded ligase, is ligated opposite the cap on the 3’end of the corresponding cDNA. Second strand synthesis is then performed using the 5’ adapter to generate a double stranded cDNA (Figure 24C). Full length PCR is then performed using primers for the 5’ adapter and the 3’adapter to generate enough material for downstream analysis (Figure 24D). The number of PCR cycles used is determined empirically for each sample by qPCR amplification of the cDNA.

The protocol generates enough cDNA for downstream applications such as 5’ RACE, qPCR, and PacBio sequencing. As illustrated in Chapter 2, we used TeloPrime to assay the 5’ cap status of TOP mRNAs when cytoplasmic cap methylation is inhibited. The protocol does introduce variability, primarily due to the ligation efficiency of the double stranded ligase. Thus, it is critical to introduce a capped mRNA in the system to normalize for this variability. In our case, 1 fmol of capped mCherry was added to the reaction for sample normalization (Figure 11B). For library preparation of TeloPrime, β-globin mRNA was used as an internal control as it can be directly aligned to HG38 for normalization.

This approach of performing gene specific PCR on the end product of TeloPrime is semi-quantitative. It is limited by the detectability of the gene with electrophoresis and by the number of PCR cycles performed to visualize the band. This was not an issue for Figure 11 as ribosomal proteins and translation factors are fairly abundant. A more quantitative approach to TeloPrime was used in Chapter 3 to detect 5’ends downstream of the initiating AUG codon, a much rarer event that might be difficult to detect with electrophoresis alone. In this approach, a portion of the Illumina sequencing adapter was attached to the forward TeloPrime adapter and the gene specific reverse primer. After a single round of gene specific PCR, the full length Illumina sequencing adapter is added through PCR to generate the cDNA library. The cDNA library is then separated on a native PAGE gel and size selected for sequencing (Figure 20A).


Figure 25. Modified TeloPrime protocol with random hexamer priming. A. Overview of the modified TeloPrime protocol with random hexamer priming. After cDNA amplification, the cDNA is hybridized with a random hexamer carrying the PE1 primer and the chain is extended using T7 DNA polymerase. A second round of PCR is used to add the PE1 and PE2 adapters for Illumina sequencing. B. A test reaction for the modified TeloPrime protocol with random hexamer primers. The cDNA was amplified for 15, 18, and 21 cycles during the index PCR. The resulting library was visualized on a 6% native PAGE gel stained with SYBR Gold.


TeloPrime works well on a gene-by-gene basis, but using this method for 5’end analysis at the transcriptome scale proved to be a challenge. Because the protocol is in the form of a kit, it is difficult to alter the reaction. An effort was made to adapt the protocol for transcriptomics (Figure 25A). Instead of priming with oligo(dT) at the reverse transcription step, random hexamers were added with the Illumina sequencing adapter adjacent to the hexamers. Adding the random hexamers at this step did not yield usable results because the oligo(dT) primer is included in the reverse transcriptase mix provided by the kit, causing the two primers to compete during reverse transcription. To avoid this issue, the TeloPrime protocol was followed to completion. The resulting cDNA was PCR amplified for a single cycle with a 5’ adapter primer containing part of the Illumina sequencing primer. The cDNA was hybridized to the random hexamer and the chain was extended by T7 DNA polymerase. A final round of PCR with the Illumina sequencing primer was performed to add the barcoded sequencing adapters.

The major difficulty of this modification was the length of the final cDNA library. The libraries did not reflect the expected distribution from the use of random hexamers, bringing concerns that the amplified products were artifacts (Figure 25B).

It is important to note that a newer version of TeloPrime became available after the experiments shown in this document. Version 2 contains the oligo(dT) primer separate from the reverse transcription reaction mix, making it possible to include random hexamer priming at earlier steps. Modifications were made to improve the efficiency of full-length PCR and to increase the yield of cDNA by the end of the protocol. TeloPrime is a relatively easy protocol to employ for assaying the capping status of specific genes. The newer version facilitates modifications to the protocol and may improve the application of TeloPrime for sequencing 5’ends in the near future.

Figure 26. ReCappable Seq protocol. A. 5 μg of RNA is incubated with shrimp alkaline phosphatase to convert all uncapped mRNAs to a 5’ hydroxyl. B. The SAP treated RNA is incubated with yDcpS to remove the cap on remaining capped mRNAs to produce a 5’ diphosphate. C. The yDcpS treated RNA is incubated with vaccinia capping enzyme (VCE) to recap the 5’ diphosphate RNAs with a biotinylated GTP. D. The recapped RNA is chemically fragmented with Mg2+ and treated with polynucleotide kinase (PNK) to convert the cleaved 5’ends to a monophosphate. E. The biotinylated 5’ends are recovered with two rounds of streptavidin bead purification. F. The recovered RNA is decapped with RppH to generate a 5’ monophosphate for small RNA library construction.

ReCappable Seq

ReCappable Seq was developed by New England Biolabs as a method to capture capped RNAs for RNA-Seq (Bo et al., 2019). It relies on the enzymatic activity of the yeast decapping enzyme yDcpS to cleave capped mRNAs. yDcpS cleaves between the gamma and beta phosphate of the triphosphate linkage of the cap, generating a 5’ diphosphate (Wulf et al., 2019). Vaccinia Capping Enzyme (VCE) is then used to replace this with a biotinylated GMP. The recapped biotinylated mRNAs are then recovered with streptavidin beads. This principle of cleaving capped mRNAs and recapping with a biotinylated cap led to the development of ReCappable Seq (Bo et al., 2019). ReCappable Seq was shown to specifically recover 5’ends and the sequenced libraries showed a sharper peak distribution than CAGE. It also recovered peaks downstream of the canonical cap site at a lower frequency than CAGE, suggesting that some of the downstream CAGE tags may be false positives. Downstream sites were still detected with ReCappable Seq, and it is possible that these sites are generated by cytoplasmic capping.

For the protocol, total RNA is treated with SAP to remove any potential diphosphate ends (Figure 26A). 5 μg of SAP treated RNA is then treated with yDcpS to remove the cap from capped mRNAs and generate a 5’ diphosphate (Figure 26B). The RNA is treated with Vaccinia Capping Enzyme (VCE) plus biotin-GTP to recap the mRNAs (Figure 26C). The RNA is then fragmented chemically with magnesium to generate an even distribution of fragments and treated with polynucleotide kinase (PNK) to add a 3’ phosphate which is needed for cDNA library preparation (Figure 26D). Biotinylated 5’ends are recovered with two rounds of selection with streptavidin beads (Figure 26E), which separates the capped 5’ends from uncapped mRNAs and internally cleaved RNAs. The recovered RNA is treated with RppH to remove the biotinylated cap and generate a 5’ phosphate that is needed for library construction (Figure 26F). The final product is a small RNA fragment that corresponds to the 5’end of capped mRNAs, which can be used as input for small RNA sequencing techniques.
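The selectivity of the first three enzymatic steps can be followed by tracking the chemical state of a 5’end through the protocol. The sketch below is a simplified model of the steps as described in this section, not of enzyme kinetics or of the published protocol: the state names and function names are hypothetical, every reaction is assumed to go to completion, and fragmentation, PNK, and RppH treatment are omitted.

```python
def sap(end):
    """Shrimp alkaline phosphatase: strip 5' phosphates; caps are protected."""
    phosphorylated = {"triphosphate", "diphosphate", "monophosphate"}
    return "hydroxyl" if end in phosphorylated else end


def ydcps(end):
    """yDcpS: remove the cap, leaving a 5' diphosphate."""
    return "diphosphate" if end == "cap" else end


def vce_biotin_gtp(end):
    """Vaccinia capping enzyme with biotin-GTP: cap 5' diphosphate ends.

    Other phosphorylated ends were already removed by SAP in this model.
    """
    return "biotin-cap" if end == "diphosphate" else end


def recappable_seq(end):
    """Run one 5' end through the three enzymatic steps, in order."""
    for step in (sap, ydcps, vce_biotin_gtp):
        end = step(end)
    return end


def bound_to_streptavidin(end):
    """Only biotinylated ends are retained on streptavidin beads."""
    return end == "biotin-cap"


for start in ("cap", "triphosphate", "diphosphate", "monophosphate", "hydroxyl"):
    final = recappable_seq(start)
    print(start, "->", final, "| recovered:", bound_to_streptavidin(final))
```

In this idealized model only ends that were capped at the start carry biotin at the end, which is why the order of SAP treatment before yDcpS matters: reversing the two steps would strip the diphosphate that VCE requires.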


Figure 27. Recovery of biotinylated capped mRNAs after performing ReCappable Seq. 5 μg of total cytoplasmic RNA in triplicate was used as input for ReCappable Seq. 10% input samples were taken after the RNA was fragmented (Figure 26D) and the RNA was purified with a single round of selection on streptavidin beads. 10% of the unbound and bound fractions were spiked with EGFP mRNA (Trilink) and quantified by RT-qPCR. The samples were normalized to EGFP and the input fraction (fragmented). Error bars represent standard error of the mean (S.E.M.), n=3.

There are certain caveats and issues with implementing ReCappable Seq to recover capped mRNAs after inhibition of cytoplasmic cap methylation. First, the protocol requires at least 5 μg of total RNA as input. This is because yDcpS is not very efficient at cleaving capped mRNAs. To overcome this, the protocol uses yDcpS in excess (200 units per reaction), which lowers the recovery of RNA after the reaction. In our hands, we recover 50% of the RNA after yDcpS treatment. This may be due to excess yDcpS still bound to RNA during purification. This degree of sample loss makes it difficult to perform ReCappable Seq on smaller input samples or with poly(A) RNA as input. As a result, we tested ReCappable Seq with 5 μg of total cytoplasmic RNA spiked with capped mCherry. We saw a complete recovery of mCherry after biotin selection and elution with formamide when compared with the fragmented RNA as input (Figure 27). However, when cDNA libraries were made and quality control was done with TA cloning, the cDNA library consisted of either adapters alone or inserts of 3-5 nucleotides. More troubleshooting is needed to pinpoint the exact issue.

Conclusion

The location of recapped 5’ends in the transcriptome can have significant effects on protein translation (e.g., by generating truncated mRNAs), depending on how far downstream these sites are. Several methods were used to try to identify the sites of cytoplasmic capping in mRNAs. Of the methods described, TeloPrime was the most successful, providing direct evidence of recapping at both canonical and downstream sites for specific mRNAs. However, efforts to apply TeloPrime at the transcriptome scale were unsuccessful. Cap-SMART has been shown to be biased toward mRNAs with a long poly(A) tail, which would skew any results. On paper, ReCappable Seq is as powerful a tool as CAGE. Unfortunately, the protocol is much more difficult to implement and currently does not yield enough of the desired insert for sequencing.

CAGE remains the most powerful tool for identifying transcription start sites (TSSs). Although its reliability in detecting downstream sites is a concern, CAGE can still be used to identify recapping sites by comparing the downstream reads obtained under normal conditions with those obtained when cytoplasmic capping is inhibited (e.g., by expression of ΔN-RNMT). Such an experiment can at least serve as a reference for later validating specific genes with TeloPrime. Thus, the identification of recapping sites across the transcriptome must await future investigation.

Chapter 5. Future Work and Concluding Remarks

The study of cytoplasmic capping has progressed dramatically over the last decade. Capping was once thought to occur only in the nucleus, and loss of the cap was thought to be irreversible. This notion was challenged by the identification of cytoplasmic capping, which has been shown to play a role in fine-tuning gene expression through cap homeostasis. My studies broaden the field by providing a new tool to study cytoplasmic capping, ΔN-RNMT. With it, I identified TOP mRNAs as cytoplasmic capping targets and observed a shift in 3’UTR usage from proximal to distal sites. A major unanswered question is the location of the recapping site. My studies provided direct evidence of recapping at both the canonical cap site and downstream within the 5’UTR. I also provided evidence that cytoplasmic capping may be a source of N-terminally truncated proteins.

The list of cytoplasmic capping targets was expanded by the experiments detailed in chapter 2. We have now identified TOP mRNAs as a major class of recapping targets. TOP mRNAs encode ribosomal proteins and translation factors, and their expression is regulated by LARP1 (Fonseca et al., 2018). Upstream of LARP1 is mTOR, the major regulatory pathway for cell metabolism (Laplante & Sabatini, 2009). The relationship between mTOR and cytoplasmic capping was beyond the scope of my studies. However, such a relationship could imply that cytoplasmic capping is regulated by the cell cycle. It is not known what regulates assembly of the cytoplasmic capping complex. NCK1 is a possible candidate because of its involvement in tyrosine kinase signaling.

We had indirect evidence suggesting that the 5’ ends of uncapped forms of recapping targets map close to downstream CAGE tags (Kiss et al., 2015). My study now provides direct evidence of recapping at both the canonical cap site and downstream within the mRNA. Downstream capping sites were observed for eIF3D and eIF3K. These sites truncate the TOP motif and could potentially allow the mRNA to avoid LARP1 regulation. The 5’UTRs of TOP mRNAs are highly structured (Mizrahi et al., 2018), and structured sequences can impede XRN1 degradation (Dilweg et al., 2019). This structure may therefore protect uncapped TOP mRNAs from XRN1, providing a possible mechanism that stabilizes uncapped mRNAs or influences where recapping can occur.

TOP mRNAs were recently shown to be resistant to influenza’s cap-snatching mechanism (Clohisey et al., 2020). This raises the question of what role cytoplasmic capping plays in the development of RNA viruses. Cytoplasmic capping has the potential to be hijacked by viruses as a means to cap their own mRNAs and quickly enter translation. Viruses commonly use cellular machinery for their own purposes, and cytoplasmic capping seems exploitable. It would be interesting to see whether inhibition of cytoplasmic capping also inhibits viral development. This could also open the possibility of targeting cytoplasmic capping as an antiviral treatment.

An open question was whether cytoplasmic capping at downstream sites could lead to the expression of N-terminally truncated proteins. N-terminally truncated proteins have been detected in HEK293 cells (Yeom et al., 2017). Could the cell use cytoplasmic capping to generate N-terminally truncated proteins as a way to expand the proteome? My study detailed in chapter 3 provided evidence that cytoplasmic capping is a source of N-terminally truncated proteins. The downstream N-termini that decrease when cytoplasmic capping is blocked were identified as belonging to RNA binding proteins. The truncated proteins contained a cognate or near-cognate start codon upstream of the mapped peptide. It would be interesting to determine whether the truncated RNA binding proteins retain their functionality. Depending on which domain is truncated, the RNA binding protein could lose either its binding function or its regulatory domain(s). Domain mapping of the truncated RNA binding proteins identified could help answer this question. Of note, RNA binding proteins also appeared among the genes that decrease when ΔN-RNMT is expressed, supporting the proteomic observation.

The observed shift in 3’UTR usage when cytoplasmic cap methylation is inhibited was an unexpected finding. Prior bioinformatics evidence suggested a possible link between target selection and 3’UTR usage (Kiss et al., 2016), but this was not the case. Instead, a global shift from proximal to distal sites was observed. The increase in PABPN1 provides a possible mechanism by which the 3’UTR shift is an indirect effect of inhibiting cytoplasmic capping. This study used U2OS, a bone cancer cell line. Cancer cells are known to exhibit a shift in 3’UTR usage from distal to proximal sites (Wilbertz et al., 2019), the opposite of what is observed when cytoplasmic capping is inhibited. The reversal of this phenotype suggests that cytoplasmic capping may play a role in the dysregulation observed in cancer and possibly in cancer formation. The possible relationship between cancer and cytoplasmic capping would be an interesting avenue to pursue.

All of the experiments on cytoplasmic capping to date have been confined to cell lines. K294A blocks the guanylylation step, whereas ΔN-RNMT blocks cap methylation. Because of the difficulty of efficiently separating capped from uncapped mRNAs, K294A is not the ideal construct for experiments in complex organisms. In contrast, RNA-seq is all that is needed to quantify the change in recapped mRNAs when using ΔN-RNMT, which allows cytoplasmic capping to be studied in any organism that can be genetically altered and could help answer many outstanding questions. For example, does cytoplasmic capping have tissue specificity? How does the loss of cytoplasmic capping affect metazoan development? Can cytoplasmic capping be used as a novel target for cancer therapeutics? ΔN-RNMT provides the means to explore these questions.

With this study, I have laid the groundwork for future work on cytoplasmic capping. Much is left to discover in this field, but it is clear that cytoplasmic capping plays a vital role in maintaining the complexity of the transcriptome.

References

Adiconis, X., Haber, A. L., Simmons, S. K., Levy Moonshine, A., Ji, Z., Busby, M. A., Shi, X., Jacques, J., Lancaster, M. A., Pan, J. Q., Regev, A., & Levin, J. Z. (2018). Comprehensive comparative analysis of 5'-end RNA-Sequencing methods. Nat. Methods, 15(7), 505–511.

Affymetrix ENCODE Transcriptome Project, & Cold Spring Harbor Laboratory ENCODE Transcriptome Project (2009). Post-transcriptional processing generates a diversity of 5'-modified long and short RNAs. Nature, 457(7232), 1028–1032.

Afgan, E., Baker, D., Batut, B., van den Beek, M., Bouvier, D., Cech, M., Chilton, J., Clements, D., Coraor, N., Grüning, B. A., Guerler, A., Hillman-Jackson, J., Hiltemann, S., Jalili, V., Rasche, H., Soranzo, N., Goecks, J., Taylor, J., Nekrutenko, A., & Blankenberg, D. (2018). The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res., 46(W1), W537–W544.

Ahler, E., Sullivan, W. J., Cass, A., Braas, D., York, A. G., Bensinger, S. J., Graeber, T. G., & Christofk, H. R. (2013). Doxycycline alters metabolism and proliferation of human cell lines. PloS One, 8(5), e64561.

Bélanger, F., Stepinski, J., Darzynkiewicz, E., & Pelletier, J. (2010). Characterization of hMTr1, a human Cap1 2'-O-ribose methyltransferase. J. Biol. Chem., 285(43), 33037–33044.

Berger, M. R., Alvarado, R., & Kiss, D. L. (2019). mRNA 5'ends targeted by cytoplasmic recapping cluster at CAGE tags and select transcripts are alternatively spliced. FEBS Lett., 593(7), 670–679.

Yan, B., Tzertzinis, G., Schildkraut, I., & Ettwiller, L. (2019). ReCappable Seq: Comprehensive Determination of Transcription Start Sites derived from all RNA polymerases. bioRxiv, 696559.

Calero, G., Wilson, K. F., Ly, T., Rios-Steiner, J. L., Clardy, J. C., & Cerione, R. A. (2002). Structural basis of m7GpppG binding to the nuclear cap-binding protein complex. Nat. Struct. Mol. Biol., 9(12), 912-917.

Carninci, P., Kvam, C., Kitamura, A., Ohsumi, T., Okazaki, Y., Itoh, M., Kamiya, M., Shibata, K., Sasaki, N., Izawa, M., Muramatsu, M., Hayashizaki, Y., & Schneider, C. (1996). High-efficiency full-length cDNA cloning by biotinylated CAP trapper. Genomics, 37(3), 327–336.

Charley, P. A., Wilusz, C. J., & Wilusz, J. (2018). Identification of phlebovirus and arenavirus RNA sequences that stall and repress the exoribonuclease XRN1. J. Biol. Chem., 293(1), 285–295.

Chen, P., Zhou, Z., Yao, X., Pang, S., Liu, M., Jiang, W., Jiang, J., & Zhang, Q. (2017). Capping Enzyme mRNA-cap/RNGTT Regulates Hedgehog Pathway Activity by Antagonizing Protein Kinase A. Sci. Rep., 7(1), 2891.

Clohisey, S., Parkinson, N., Wang, B., Bertin, N., Wise, H., Tomoiu, A., FANTOM5 Consortium, Summer, K. M., Hendry, R. W., Carninci, P., Forrest, A. R. R., Hayashizaki, Y., Digard, P., Hume, D. A., & Baillie, K. J. (2020). Comprehensive characterization of transcriptional activity during influenza A virus infection reveals biases in cap-snatching of host RNA sequences. J. Virol., DOI: 10.1128/JVI.01720-19.

del Valle Morales, D., Trotman, J., Bundschuh, R., & Schoenberg, D. R. (2020). Inhibition of cytoplasmic cap methylation identifies 5′ TOP mRNAs as recapping targets and reveals recapping sites downstream of native 5′ ends. Nucleic Acids Res., 48(7), 3806–3815.

Dilweg, I. W., Gultyaev, A. P., & Olsthoorn, R. C. (2019). Structural features of an Xrn1-resistant plant virus RNA. RNA Biol., 16(6), 838–845.

Djebali, S., Davis, C. A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F., Xue, C., Marinov, G. K., Khatun, J., Williams, B. A., Zaleski, C., Rozowsky, J., Röder, M., Kokocinski, F., Abdelhamid, R. F., Alioto, T., … Gingeras, T. R. (2012). Landscape of transcription in human cells. Nature, 489(7414), 101–108

El-Brolosy, M. A., Kontarakis, Z., Rossi, A., Kuenne, C., Günther, S., Fukuda, N., Kikhi, K., Boezio, G., Takacs, C. M., Lai, S. L., Fukuda, R., Gerri, C., Giraldez, A. J., & Stainier, D. (2019). Genetic compensation triggered by mutant mRNA degradation. Nature, 568(7751), 193–197.

ENCODE Project Consortium. (2004). The ENCODE (ENCyclopedia Of DNA Elements) Project. Science, 306(5696), 636–640.

Fonseca, B. D., Lahr, R. M., Damgaard, C. K., Alain, T., & Berman, A. J. (2018). LARP1 on TOP of ribosome production. Wiley Interdiscip. Rev. RNA, DOI: 10.1002/wrna.1480.

Fortelny, N., Pavlidis, P., & Overall, C. M. (2015). The path of no return--Truncated protein N-termini and current ignorance of their genesis. Proteomics, 15(14), 2547–2552.

Furuichi, Y., Morgan, M., Muthukrishnan, S., & Shatkin, A. J. (1975). Reovirus messenger RNA contains a methylated, blocked 5′-terminal structure: m7G(5′)ppp(5′)GmpCp-. Proc. Natl. Acad. Sci. U.S.A., 72(1), 362–366.

Gentilella, A., Morón-Duran, F. D., Fuentes, P., Zweig-Rocha, G., Riaño-Canalias, F., Pelletier, J., Ruiz, M., Turón, G., Castaño, J., Tauler, A., Bueno, C., Menéndez, P., Kozma, S. C., & Thomas, G. (2017). Autogenous Control of 5′TOP mRNA Stability by 40S Ribosomes. Mol. Cell, 67(1), 55–70.

Gonatopoulos-Pournatzis, T., & Cowling, V. H. (2014). Cap-binding complex (CBC). Biochem. J., 457(2), 231–242.

Grudzien-Nogalska, E., & Kiledjian, M. (2017). New insights into decapping enzymes and selective mRNA decay. Wiley Interdiscip. Rev. RNA, 8(1), DOI:10.1002/wrna.1379.

Hall, M. P., & Ho, C. K. (2006). Characterization of a Trypanosoma brucei RNA cap (guanine N-7) methyltransferase. RNA, 12(3), 488–497.

Ignatochkina, A. V., Takagi, Y., Liu, Y., Nagata, K., & Ho, C. K. (2015). The messenger RNA decapping and recapping pathway in Trypanosoma. Proc. Natl. Acad. Sci. U.S.A., 112(22), 6967–6972.

Ingolia, N. T., Lareau, L. F., & Weissman, J. S. (2011). Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell, 147(4), 789–802.

Jenal, M., Elkon, R., Loayza-Puch, F., van Haaften, G., Kühn, U., Menzies, F. M., Oude Vrielink, J. A., Bos, A. J., Drost, J., Rooijers, K., Rubinsztein, D. C., & Agami, R. (2012). The poly(A)-binding protein nuclear 1 suppresses alternative cleavage and polyadenylation sites. Cell, 149(3), 538–55

Jiao, X., Xiang, S., Oh, C., Martin, C. E., Tong, L., & Kiledjian, M. (2010). Identification of a quality-control mechanism for mRNA 5'-end capping. Nature, 467(7315), 608–611.

Kiss, D. L., Oman, K., Bundschuh, R., & Schoenberg, D. R. (2015). Uncapped 5’ends of mRNAs targeted by cytoplasmic capping map to the vicinity of downstream CAGE tags. FEBS Lett., 589(3), 279–284.

Kiss, D. L., Oman, K. M., Dougherty, J. A., Mukherjee, C., Bundschuh, R., & Schoenberg, D. R. (2016). Cap homeostasis is independent of poly(A) tail length. Nucleic Acids Res., 44(1), 304–314.

Kodzius, R., Kojima, M., Nishiyori, H., Nakamura, M., Fukuda, S., Tagami, M., Sasaki, D., Imamura, K., Kai, C., Harbers, M., Hayashizaki, Y., & Carninci, P. (2006). CAGE: cap analysis of gene expression. Nat. Methods, 3(3), 211–222.

Laplante, M., & Sabatini, D. M. (2009). mTOR signaling at a glance. J. Cell Sci., 122(20), 3589–3594.

Lee, A. S., Kranzusch, P. J., Doudna, J. A., & Cate, J. H. (2016). eIF3D is an mRNA cap-binding protein that is required for specialized translation initiation. Nature, 536(7614), 96–99.

Lee, S., Liu, B., Lee, S., Huang, S. X., Shen, B., & Qian, S. B. (2012). Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc. Natl. Acad. Sci. U.S.A., 109(37), E2424–E2432.

Leung, D. W., & Amarasinghe, G. K. (2016). When your cap matters: structural insights into self vs non-self recognition of 5' RNA by immunomodulatory host proteins. Curr. Opin. Struc. Biol., 36, 133–141.

Liao, Y., Smyth, G. K., & Shi, W. (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7), 923–930.

Lim, S. K., & Maquat, L. E. (1992). Human beta-globin mRNAs that harbor a nonsense codon are degraded in murine erythroid tissues to intermediates lacking regions of exon I or exons I and II that have a cap-like structure at the 5' termini. EMBO J., 11(9), 3271–3278.

Hocine, S., Singer, R. H., & Grünwald, D. (2010). RNA processing and export. CSH Perspect. Biol., 2(12), a000752.

Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. Genome Biol., 15(12), 550.

Machida, R. J., & Lin, Y. Y. (2014). Four methods of preparing mRNA 5’end libraries using the Illumina sequencing platform. PloS One, 9(7), e101812.

Mascarenhas, R., Dougherty, J. A., & Schoenberg, D. R. (2013). SMG6 cleavage generates metastable decay intermediates from nonsense-containing β-globin mRNA. PloS One, 8(9), e74791.

Mi, H., Huang, X., Muruganujan, A., Tang, H., Mills, C., Kang, D., & Thomas, P. D. (2017). PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Nucleic Acids Res., 45(D1), D183–D189.

Mi, H., Muruganujan, A., Casagrande, J. T., & Thomas, P. D. (2013). Large-scale gene function analysis with the PANTHER classification system. Nat. Protoc., 8(8), 1551–1566.

Mizrahi, O., Nachshon, A., Shitrit, A., Gelbart, I. A., Dobesova, M., Brenner, S., Kahana, C., & Stern-Ginossar, N. (2018). Virus-induced changes in mRNA secondary structure uncover cis-regulatory elements that directly control gene expression. Mol. Cell, 72(5), 862–874.

Mukherjee, C., Bakthavachalu, B., & Schoenberg, D. R. (2014). The cytoplasmic capping complex assembles on adapter protein nck1 bound to the proline-rich C-terminus of Mammalian capping enzyme. PLoS Biology, 12(8), e1001933.

Mukherjee, C., Patil, D. P., Kennedy, B. A., Bakthavachalu, B., Bundschuh, R., & Schoenberg, D. R. (2012). Identification of cytoplasmic capping targets reveals a role for cap homeostasis in translation and mRNA stability. Cell Rep., 2(3), 674–684.

Otsuka, Y., Kedersha, N. L., & Schoenberg, D. R. (2009). Identification of a cytoplasmic complex that adds a cap onto 5'-monophosphate RNA. Mol. Cell Biol., 29(8), 2155–2167.

Philippe, L., Vasseur, J. J., Debart, F., & Thoreen, C. C. (2018). La-related protein 1 (LARP1) repression of TOP mRNA translation is mediated through its cap-binding domain and controlled by an adjacent regulatory region. Nucleic Acids Res., 46(3), 1457–1469.

Ramanathan, A., Robb, G. B., & Chan, S. H. (2016). mRNA capping: biological functions and applications. Nucleic Acids Res., 44(16), 7511–7526.

Reyes, A., & Huber, W. (2018). Alternative start and termination sites of transcription drive most transcript isoform differences across human tissues. Nucleic Acids Res., 46(2), 582–592.

Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., & Smyth, G. K. (2015). limma powers differential expression analyses for RNA sequencing and microarray studies. Nucleic Acids Res., 43(7), e47.

Roux, K. J., Kim, D. I., Burke, B., & May, D. G. (2018). BioID: A Screen for Protein-Protein Interactions. Curr. Protoc. Protein Sci, 91, 19.23.1–19.23.15.

Schmidt, E., Clavarino, G., Ceppi, M., & Pierre, P. (2006) SUnSET, a nonradioactive method to monitor protein synthesis. Nat. Methods, 6(4), 275–277.

Schoenberg, D. R., & Maquat, L. E. (2009). Re-capping the message. Trends Biochem. Sci., 34(9), 435–442.

Song, M. G., Bail, S., & Kiledjian, M. (2013). Multiple Nudix family proteins possess mRNA decapping activity. RNA, 19(3), 390–399.

Stevens, A., Wang, Y., Bremer, K., Zhang, J., Hoepfner, R., Antoniou, M., Schoenberg, D. R., & Maquat, L. E. (2002). Beta-globin mRNA decay in erythroid cells: UG site-preferred endonucleolytic cleavage that is augmented by a premature termination codon. Proc. Natl. Acad. Sci. U.S.A., 99(20), 12741–12746.

Sun, H., Zhang, M., Li, K., Bai, D., & Yi, C. (2019). Cap-specific, terminal N6-methylation by a mammalian m6Am methyltransferase. Cell Res., 29(1), 80–82.

Takahashi, H., Kato, S., Murata, M., & Carninci, P. (2012). CAGE (cap analysis of gene expression): a protocol for the detection of promoter and transcriptional networks. Methods Mol. Biol., 786, 181–200.

Trendel, J., Schwarzl, T., Horos, R., Prakash, A., Bateman, A., Hentze, M. W., & Krijgsveld, J. (2019). The human RNA-binding proteome and its dynamics during translational arrest. Cell, 176(1–2), 391–403.e19.

Trotman, J. B., & Schoenberg, D. R. (2019). A recap of RNA recapping. Wiley Interdiscip. Rev. RNA, 10(1), e1504.

Trotman, J. B., Agana, B. A., Giltmier, A. J., Wysocki, V. H., & Schoenberg, D. R. (2018). RNA-binding proteins and heat-shock protein 90 are constituents of the cytoplasmic capping enzyme interactome. J. Biol. Chem, 293(43), 16596–16607.

Trotman, J. B., Giltmier, A. J., Mukherjee, C., & Schoenberg, D. R. (2017). RNA guanine-7 methyltransferase catalyzes the methylation of cytoplasmically recapped RNAs. Nucleic Acids Res., 45(18), 10726–10739.

Vardi, O., Shamir, I., Javasky, E., Goren, A., & Simon, I. (2017). Biases in the SMART-DNA library preparation method associated with genomic poly dA/dT sequences. PloS One, 12(2), e0172769.

Werner, M., Purta, E., Kaminska, K. H., Cymerman, I. A., Campbell, D. A., Mittra, B., Zamudio, J. R., Sturm, N. R., Jaworski, J., & Bujnicki, J. M. (2011). 2'-O-ribose methylation of cap2 in human: function and evolution in a horizontally mobile family. Nucleic Acids Res., 39(11), 4756–4768.

Wilbertz, J. H., Voigt, F., Horvathova, I., Roth, G., Zhan, Y., & Chao, J. A. (2019). Single-molecule imaging of mRNA localization and regulation during the integrated stress response. Mol. Cell, 73(5), 946–958.

Wulf, M. G., Buswell, J., Chan, S. H., Dai, N., Marks, K., Martin, E. R., Tzertzinis, G., Whipple, J. M., Corrêa, I. R., Jr, & Schildkraut, I. (2019). The yeast scavenger decapping enzyme DcpS and its application for in vitro RNA recapping. Sci. Rep, 9(1), 8594.

Yeom, J., Ju, S., Choi, Y., Paek, E., & Lee, C. (2017). Comprehensive analysis of human protein N-termini enables assessment of various protein forms. Sci. Rep, 7(1), 6599.

Zhong, Y., Karaletsos, T., Drewe, P., Sreedharan, V. T., Kuo, D., Singh, K., Wendel, H. G., & Rätsch, G. (2017). RiboDiff: detecting changes of mRNA translation efficiency from ribosome footprints. Bioinformatics, 33(1), 139–141.

Zougman, A., Selby, P. J., & Banks, R. E. (2014). Suspension trapping (STrap) sample preparation method for bottom-up proteomics analysis. Proteomics, 14(9), 1006-1000.

Appendix: list of primers

Table 3. List of primers used in chapter 2 for RT-qPCR and 5’end mapping.

Name                   Sequence (5’ – 3’)

Primers used for RT-qPCR
mCherry FWD            ATGGTGAGCAAGGGCGAGGAG
mCherry REV            GCCACCCTTGGTCACCTTCAGC
STRN4 FWD              GGAAAGGGCAGGAGAATCTAAA
STRN4 REV              TCTGACACATCTGCTTTCTTCTC
18S rRNA FWD           CTGAGAAACGGCTACCACATG
18S rRNA REV           GGAAAGGGCAGGAGAATCTAA
XRCC6 FWD              CGTGGATTGTCGTCTTCTGT
XRCC6 REV              TCTTGTTCTTCCTCTGCTTCTT
RPS4X FWD              GCTCCTCGTCCATCCAC
RPS4X REV              TCTCCTGTCAGGGCATACT
RPS3 FWD               GTCTCCTTGGCAGCTGTATT
RPS3 REV               GAACAGTCTAGTCAGACCTTGTG
RPL8 FWD               ATTGGCAATGTGCTCCCT
RPL8 REV               GGAGATAACGGTGGCATAGTTC
eIF3K FWD              ACGTCGTTTCCGTTTCCA
eIF3K REV              CTTCTGTCGCCTTCCACAA
eIF3D FWD              AGCAGGTCATCCGTGTCTA
eIF3D REV              TAATGAGTTTGCCAGCCAGATC

Primers used for TeloPrime
TeloPrime FWD          TGGATTGATATGTAATACGACTCACTATAG
mCherry FWD            ATGGTGAGCAAGGGCGAGGAG
mCherry REV            GCCACCCTTGGTCACCTTCAGC
RPS4X REV TeloPrime    CTAACGCAGCCATGGCTC
RPS3 REV TeloPrime     GTTCTTGCCAACCGCCAT
RPL8 REV TeloPrime     GGATCACACGGCCCAT
EIF3K REV TeloPrime    GTCGATACCCTTGAGCAACTT
EIF3D REV TeloPrime    CTTTCCTAGCCGATCTCCTTT

Primers used for TA cloning
T7 forward promoter    TCAGTAATACGACTCACTATAG

Table 4. List of primers used in chapter 3.

Name                Sequence (5’ – 3’)

Primers for TeloPrime
TeloPrime PE2 3'    GTGACTGGAGTTCAGACGTGTGCTCTTCCGTGGATTGATATGTAATACGACTCACTATAG
TUBB4B              CACTCTTTCCCTACACGACGCTCTTCCGATCTGGCCTCATTGTAGTACACGTTG
CCT5                CACTCTTTCCCTACACGACGCTCTTCCGATCTCACAAAGCATCGTGAAGGGATCG
VIM                 CACTCTTTCCCTACACGACGCTCTTCCGATCTCGGCAAAGTTCTCTTCCATTTC
ENO1                CACTCTTTCCCTACACGACGCTCTTCCGATCTTCTTCAGTCTCCCCCGAACG
HSPD1               CACTCTTTCCCTACACGACGCTCTTCCGATCTCTGCTCAATGATTTCTTGAATACGT
ANXA2               CACTCTTTCCCTACACGACGCTCTTCCGATCTCAGAGCCATCCTCTGCTCTTCTA
PRDX1               CACTCTTTCCCTACACGACGCTCTTCCGATCTCCTGAGCAATGGTGCGCTTC
GSTP1               CACTCTTTCCCTACACGACGCTCTTCCGATCTATAGAGCCCAAGGGTGCG
ACTG1               CACTCTTTCCCTACACGACGCTCTTCCGATCTCCATGACGCCCTGGTGTCT
Beta-Globin         CACTCTTTCCCTACACGACGCTCTTCCGATCTCCTCAGGAGTCAGATGCACCAT

Barcoded primers for index PCR
Index 168           CAAGCAGAAGACGGCATACGAGATATCACGCAGTGACTGGAGTTCAGACGTGT
Index 169           CAAGCAGAAGACGGCATACGAGATCGATGTCTGTGACTGGAGTTCAGACGTGT
Index 170           CAAGCAGAAGACGGCATACGAGATTTAGGCGTGTGACTGGAGTTCAGACGTGT
Index 171           CAAGCAGAAGACGGCATACGAGATTGACCACCGTGACTGGAGTTCAGACGTGT
Index 172           CAAGCAGAAGACGGCATACGAGATACAGTGCGGTGACTGGAGTTCAGACGTGT
Index 173           CAAGCAGAAGACGGCATACGAGATGCCAATTTGTGACTGGAGTTCAGACGT
Index 174           CAAGCAGAAGACGGCATACGAGATCAGATCGGGTGACTGGAGTTCAGACGTGT
Index 175           CAAGCAGAAGACGGCATACGAGATACTTGATGGTGACTGGAGTTCAGACGTGT
Index 176           CAAGCAGAAGACGGCATACGAGATGATCAGTTGTGACTGGAGTTCAGACGTGT
Index 177           CAAGCAGAAGACGGCATACGAGATTAGCTTTTGTGACTGGAGTTCAGACGTGT