Reference Maps for Comparative Analysis of RNA by LC-MS and RNA Sequencing

A dissertation submitted to the

Graduate School

of the University of Cincinnati

in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy (PhD)

in the Department of Chemistry

of the McMicken College of Arts and Sciences

2018

by

Mellie June Santander Paulines

Master of Chemistry, University of the Philippines, 2013

B.S. Chemistry, University of the Philippines, 2007

Committee Chair: Patrick A. Limbach, PhD

Abstract

This dissertation focuses on developing an approach for high-throughput analysis of RNA from biological systems. Covalent modification of the canonical nucleosides (A, U, C, G) is one of the essential features of cellular RNA maturation. Chemical modifications range from simple isomerization or methylation to more complex chemical additions orchestrated by multiple enzymes. More than 150 chemical modifications have been identified, and transfer RNAs (tRNAs) contain the highest density of these modifications. The similarities among individual tRNA isoacceptors, extensive sample preparation and the low throughput of analysis have slowed the characterization of total tRNA pools from organisms beyond the bacterial kingdom. Here, I introduce the reference concept using a set of known standards, which allows one to rapidly characterize tRNA samples both qualitatively and quantitatively. The methods I developed provide a more powerful comparative analysis approach using stable isotope labelled in vitro transcripts, increase the speed of data analysis through mass spectral matching, and improve the understanding of the dynamics of RNA expression in the pathogenic, radioresistant fungus Cryptococcus neoformans when exposed to environmental stimuli or stresses such as ionizing radiation. The advances made here can be applied to a holistic approach for studying tRNA modifications, expression levels and other cellular processing in a biological system.

Acknowledgements

Ang in inga obra akon gina didicar sa akon pamilya. Salamat sa pag palangga kag pag suporta sa ini nga pang lakaton. Sa akon maestró, Dr. Patrick Limbach, salamat sa pag kupkop sa imo luyo. Halin sa akon kabubut-on, madamo gd nga salamat.

This body of work is dedicated to my family. Thank you for the love and support as I walk this journey. To my adviser, Dr. Patrick Limbach, thank you for taking me under your guidance. My deepest gratitude to you.

Table of Contents

Abstract ...... ii

Acknowledgements ...... iv

List of Figures ...... 11

List of Tables ...... 15

List of Appendices ...... 17

Chapter 1 Introduction ………………………………………………………………………..18

1.1 Research Goal ………………………………………………………………………………..18

1.2 Introduction………………………………………………………………………………….19

1.2.1 Chemical Modifications to RNA…………………………………………………….20

1.2.2 Characterizations of RNA Modifications – an Overview…………………………….24

1.2.3 Global Profiling of Modifications – Common Themes……………………………….26

1.2.4 Modification Profiling by RNA-Seq Approaches……………………………………..28

1.2.5 Bioinformatics data from RNA Seq only……………………………………………28

1.2.6 Specific Modified Nucleosides……………………………………………………….29

1.2.6.1 5-methylcytidine [m5C] and 5-hydroxymethylcytidine [hm5C]………………30

1.2.6.2 N6-methyladenosine [m6A] and N6,2’-O-dimethyladenosine [m6Am]………32

1.2.6.3 Pseudouridine [ψ]………………………………………………………….34

1.2.6.4 Inosine [I]…………………………………………………………………35

1.2.6.5 1-methyladenosine [m1A]………………………………………………….37

1.2.7 Profiling by Class of Modifications………………………………………………….38

1.2.7.1 Methyl modifications………………………………………………………38

1.2.7.2 2’-O-methyl modifications……………………………………………..…40

1.2.7.3 Nicotinamide adenine dinucleotide (NAD) caps …………………………40

1.2.8 RNA-Seq Alternatives for Modification Profiling and Measurements……...….41

1.2.8.1 Mass Spectrometry………………………………………………....…….41

1.2.8.2 SCARLET (m6A)………………………………………………………………..43

1.2.8.3 Reverse Transcription at Low Deoxyribonucleoside Triphosphate (RTL-P)..44

1.2.9 Conclusion………………………………………………………...……………….44

Chapter 2 Using Mass Spectral Matching to Interpret LC-MS/MS Data During RNA Modification Mapping ………………………………………………………………………….47

2.1 Introduction…………………………………………………………………………...47

2.2 Experimental…………………………………………………………………………..48

2.3 Results and Discussion ………………………………………………………………...51

2.4 Conclusion …………………………………………………………………………....65

Chapter 3 Stable Isotope Labeling for Improved Comparative Analysis of RNA Digests by Mass Spectrometry …………………………………………………………………………..66

3.1 Introduction…………………………………………………………………………...66

3.2 Experimental……………………………………………………………………….…67

3.3 Results and Discussion …………………………………………………………....…..70

3.4 Conclusion …………………………………………………………………....………83

Chapter 4 RNA modification of the fungi Cryptococcus neoformans by mass spectrometry and its dynamics under ionizing radiation

4.1 Introduction……………………………………………………………………..……85

4.2 Experimental…………………………………………………………………………86

4.3 Results and Discussion …………………………………………..…………………...89

4.4 Conclusion ………………………………………………………………………….103

Chapter 5 Radiation induced changes in transcriptome and tRNA levels of Cryptococcus neoformans by RNA-sequencing

5.1 Introduction…………………………………………………………………………104

5.2 Experimental………………………………………………………………………...105

5.3 Results and Discussion ………………………………………………...……………107

5.4 Conclusion …………………………………………………………………...……..125

Chapter 6 Conclusion and Future Directions

6.1 Summary and Conclusion ………………………………………………………...…126

6.2 Future Directions ………………………………………………………….………128

Bibliography ...... 139

Appendices ...... 148

List of Figures

Figure 1.1 DNA and RNA reversible post-transcriptional modifications that regulate gene expression. DNA methylation alters how genes are expressed without changing the DNA sequence. RNA methylation has recently been demonstrated to regulate how a transcript can be processed and affect gene expression……………………………………………………………………………20

Figure 1.2 Representative RNA chemical modifications. [m5C] 5-methylcytidine; [hm5C] 5-hydroxymethylcytidine; [m6A] N6-methyladenosine; [m6Am] N6,2’-O-dimethyladenosine; [ψ] pseudouridine; [I] inosine. ………………………………………………………………………22

Figure 1.3 Common themes in global profiling by RNA-Seq approaches. (A) Chemical modification derivatization strategies are used to enhance reverse transcriptase (RT) stops. (B) Differential analysis takes advantage of varying sensitivity of RNA-Seq to the presence of chemical modifications. (C) Affinity purification—usually through antibodies targeting specific chemically modified nucleosides—enriches the RNA pool prior to next-generation sequencing……………………….27

Figure 1.4 Schematic representation of MeRIP-Seq. The fragmented m6A-containing mRNAs is incubated with an m6A-antibody. The bound mRNAs are immunoprecipitated and the enriched pool is subjected to next generation sequencing………………………………………………………33

Figure 1.5 (A) Chemical reaction of inosine with acrylonitrile to form N1-cyanoethylinosine (ce1I). (B) Schematic of the ICE-seq protocol. RNA treated with acrylonitrile (CE+, left) and the untreated control (CE-, right) are shown. In the first strand of cDNA synthesis, the CE+ condition yields truncated reads whereas in the CE- condition, inosine is transcribed as guanosine. After the second cDNA strand synthesis, a size selection step of 400-500 bp cDNA is performed. Sequencing of cDNA reads from both ends can reveal the A-I editing site by detecting G-erased reads in CE+ conditions. …………………36

Figure 1.6 Schematic representation of demethylase-thermostable group II intron RT tRNA sequencing (DM-tRNA-seq). Red shaded circles are Alkb-sensitive methylated nucleosides (m1A, m1G, m3C)………………………………………………………………………………………39

Figure 1.7 Schematic representation of DNA-based exclusion list for enhanced detection of modified RNAs by LC-MS/MS……………………………………………………………………………42

Figure 2.1 Experimental scheme for the bottom up, RNA mass mapping approach. This is a combination of predicting the location of modification ( data) to the genomic sequences, in silico digestions and manual de novo interpretation of the MS/MS data. ……………...……….52

Figure 2.2 Workflow for qualitative analysis of CID of oligonucleotide by spectral matching……54

Figure 2.3 Screenshot of the NIST MS search v2 software. a) Reference (search) b) histogram of the 100 hits c) list of the hits with the score (match, reverse match) d) reference spectrum and m/z of fragment ions e) head to tail plot of the reference and the hit in the library f) spectra of the hit and m/z of fragment ions………………………………………………………………………………56

Figure 2.4 Head to tail plot of the hit 1 for the oligo AU[t6A]AGp. The top spectrum is the reference and the bottom is the scan 3014 from the library…………………………………………………58

Figure 2.5 Extracted ion chromatogram of the wild type and mutants for the oligo AU[t6A]AGp (m/z 899.97). ΔtsaB and ΔtsaC are completely devoid of the t6A modification in the anticodon loop of tRNA Ile_GAU whereas ΔtsaE still has trace amounts of the modification………………………….63

Figure 2.6 Head to tail plot of the hit 1 for the oligo AU[t6A]AGp. The top spectrum is the reference and the bottom is hit 1, which corresponds to the unmodified version of the reference……………64

Figure 3.1 (A) Schematic outline of stable isotope labeling-comparative analysis of RNA digest (SIL-CARD). An in vitro transcribed RNA, synthesized with 13C and 15N enriched guanosine triphosphate, is the reference for the RNA sample. When RNase T1 digestion products of the sample are unmodified, the mass spectrum will reveal a doublet, separated by 15 Da, from the sample and reference. Singlets, which contain post-transcriptional modifications, can be characterized by MS/MS. (B) Sequences for the E. coli tRNA Tyr II transcript (reference) and E. coli tRNA Tyr II RNA (sample). The sequences can be digested theoretically (i.e., with RNase T1) to generate a predicted list of singlets and doublets………………………………………………………………………………………..…70

Figure 3.2 Mass spectral data for the E. coli tRNA Tyr II RNase T1 digestion product UUCGp. (A) The doubly charged digestion products from both sample and reference are detected with the expected m/z difference of 7.5. MS/MS of the (B) sample and (C) reference digestion products. The 15 Da increase in the y-type fragment ions of the reference is due to the 13C and 15N labeled guanosine. * denotes isotopically labeled oligonucleotide……………………………………….73

Figure 3.3 Mass spectral data for the E. coli tRNA Tyr II RNase T1 digestion products (A) CCAAAGp, (B) UCAUCGp and (C) ACUUCGp. As the digestion products from reference and sample co-elute, the reference can be used for quantitative analysis of tRNA abundance in the mass spectrum…………………………………………………………………………………………..76

Figure 3.4 Extracted ion chromatograms (XICs) and mass spectra of the three expected singlets from SIL-CARD analysis of E. coli tRNA Tyr II, where the digestion products from the post-transcriptionally modified sample all elute after the tRNA transcript. (A) [m5U]ΨCGp (m/z 646.08, -2) and UUCG*p (m/z 646.58, -2); (B) [s4U][s4U]CCCGp (m/z 960.04, -2) and UUCCCG*p (m/z 951.59, -2); (C) ACU[Q]UA[ms2i6A]A[Ψ]CUGp (m/z 1365.56, -4), ACUG*p (m/z 658.08) and UAAAUCUG*p (m/z 1293.18). The unmodified anticodon sequence (ACUGUAAAUCUGp) was not detected in the sample, hence no doublets for ACUGp and UAAAUCUGp are detected. Modified nucleosides: s4U: 4-thiouridine; Q: queuosine; ms2i6A: 2-methylthio-N6-isopentenyladenosine; Ψ: pseudouridine; m5U: 5-methyluridine……………………………………………………………..77

Figure 3.5 (A) Representative mass spectra of various heavy:light ratios for the RNase T1 digestion product UUCCCG. (B) Calibration curve generated for the ten different heavy:light ratios of UUCCCG listed in Table 2……………………………………………………………………….79

Figure 3.6 Mass spectra of the unmodified oligonucleotide AAUCCUUCCCCCACCACCAAAGp (m/z 1734.78, -4 charge state). (A) When this oligonucleotide is labeled by the 18O approach, the predicted doublet (m/z 1734.78 and 1735.06) is not readily identified at this high charge state. (B) When the same oligonucleotide is labeled by using enriched GTP, the expected doublet can easily be identified with a low resolution mass analyzer……………………………………………………..83

Figure 4.1 Scheme used to investigate the effects of ionizing radiation in the fungus C. neoformans. Chapter 4 will discuss results obtained in viable count assay and LC-MS/MS. Chapter 5 explores the results of RNA sequencing……………………………………………………………………...…90

Figure 4.2 Consensus modification of all tRNA modifications found in sequence specific manner in the clover leaf structure of tRNA from C. neoformans. ……………………………………………..93

Figure 4.3 Biosynthetic pathway for the synthesis of ms2io6A (2-methylthio-N6-(cis-hydroxyisopentenyl)adenosine). The associated enzymes are labelled with numbers: (1) Mod5 (S. cerevisiae), (2) MiaE (S. typhimurium), (3) MiaB (E. coli). ……………………………………………………….96

Figure 4.4 Survival curve of C. neoformans exposed to varying levels of ionizing radiation at a constant dose of 132 Gy/hour……………………………………………………………………………...97

Figure 4.5 Dose response of the three IR induced oxidation of (A) guanosine and (B) m5C and (C) m6A in C. neoformans. A. Fold change response of the 2-electron oxidation of base, 8-oxo-G B. Fold change response of m5C and its oxidation product, hm5C and (C) oxidation product of m6A to hm6A………………………………………………………………………………………..…99

Figure 4.6 Hierarchal clustering analysis of ionizing radiation induced changes in the level of tRNA modifications in C. neoformans……………………………………………………………………101

Figure 5.1 A. Volcano plot of the differentially expressed genes in 100 and 300 Gy relative to control (FC>2 and p<0.05). B. distribution of the significantly up and down regulated genes relative to control…………………………………………………………………………………………...109

Figure 5.2 Fold change response of RNA modifying transcripts of C. neoformans in IR. Ψ – Pus 1, Pus 4; D – Dus 1, Dus 2, Dus 3; Um – Trm 14; m5U – Trm2; mcm5U – Trm 9, Trm 112, Uba 4; mcm5s2U – Ncs 2, Ncs 6; I – Tad 1; m1A – Trm 61; m6A – Ime 4; m1I – Tad 1, Trm 5; i6A, io6A – Mod 5; t6A – Sua 5; m3C – Trm140; m5C – Trm 5; m2G – Trm 3; m1G – Trm 10; m7G – Trm 8; m22G – Trm1. No known enzymes for the following nucleosides: nmc5U and Am. Known enzymes in S. cerevisiae or E. coli were aligned (BLAST) to the C. neoformans genome for homologues. Asterisks indicate a statistically significant difference in expression level of genes at p<0.05…………………………111

Figure 5.3 Enzymatic pathway for the synthesis of I and mcm5s2U from the canonical bases A and U. mcm5s2U synthesis begins with the addition of a carboxymethyl group at position 5 of U to form cm5U. This step is catalyzed by the elongator complex (elp 1-6) in yeast. A methyl group is added by a methyltransferase, Trm9/112, with S-adenosyl methionine as the methyl donor [1] to form mcm5U. Ubiquitin related modifier 1 (Urm1), along with other proteins, adds the thiol group at position 2 [2] to form mcm5s2U. The conversion of A to inosine is an example of RNA editing, which is the hydrolytic deamination of adenosine to form inosine. The reaction is catalyzed by tRNA specific adenosine deaminases (ADAT), which in yeast is the Tad1 enzyme………………………………112

Figure 5.4 A. Qualitative analysis of the tRNA genes in C. neoformans after exposure to IR for A) pre tRNA and B) mature tRNA. Green indicates that it is expressed and red means no new pre or mature tRNA was detected………………………………………………………………………114

Figure 5.5 Venn diagram of the 132 pre-tRNA transcripts detected in C. neoformans in IR. A total of twenty tRNAs are synthesized after cells are exposed to IR (IR induced) and five are expressed only in the control samples (IR repressed). The 101 tRNA transcripts expressed consistently are labelled as “housekeeping tRNAs”………………………………………………………………117

Figure 5.7 Heat map of the 70 isotype mature tRNAs fold change in IR………………………..119

Figure 5.8 A. DNA repair enzymes induced in IR. Rad51, Rad 57 – recombinase; Rdh54, Rad54 – DNA dependent ATPase; Enzymes in double stranded break repair – Rad57, RFA2/RPA32, Dnl4, MRE11; B. Homologous recombination enzymes – Mus81, Sae2, Rad57, Rad54, Rad52. C. Post replication repair – Rad5, HPR5, Rad6; Nucleotide excision repair – Rad1; DNA damage checkpoint – Rad9, Rad17. Asterisks indicate a statistically significant difference in expression level of genes at p<0.05……………………………………………………………………………….…122

Figure 6.1 A. Single reaction monitoring of Dus1 nucleoside by LC-MS/MS. No significant change over the exposure tested. B. Western Blot analysis of the Dus1 protein…………………………129

Figure 6.2 A. Comparative analysis of the tRNA genes and genes with introns in C. neoformans and other model fungi. Botrytis cinerea, Aspergillus fumigatus and Penicillium chrysogenum were found in Chernobyl reactor 4. C. neoformans possesses the highest number of intron-containing tRNA genes sequenced to date. B. The canonical intron (blue circles) in tRNA is found in the anticodon (black circles) loop. After splicing, the mature tRNA can participate in translation. Splice sites are indicated by the arrow…131

Figure 6.3 A. Number of tRNA genes and intron length in C. neoformans. Underlined is the range of length of regulatory RNA. B. Nucleotide conservation of the first eight sequence of the introns in the fungi……..………………………………………………………………………………………133

Figure 6.4 Proposed model for the differential expression of pre-tRNA genes transcribed in response to different conditions. The protein Maf1 is dephosphorylated and binds to Pol III to initiate tRNA transcription, with the transcription factor IIIC recognizing the tDNA. With different forms of stress, different sets of pre-tRNAs are transcribed (colored blocks) and a subset (housekeeping set) is expressed at all times. ………………………………………………………………………...138

List of Tables

Table 2.1. Results obtained for the reference spectrum oligo AU[t6A]AGp_899.97 using Identity (Quick, Normal, MS/MS) and Similarity (Simple, Hybrid, Neutral Loss, and MS/MS in EI)………57

Table 2.2. Result of searches of modified reference spectrum in identity (quick) and similarity (simple)………………………………………………………………………………...…………58

Table 2.3. Result of searches of unmodified reference spectrum in identity (quick) and similarity (simple)………………………………………………………………………………………...…60

Table 2.4 Result of searches of decoy spectra from C. neoformans T1 digest using similarity search…61

Table 2.5. Similarity search for AU[t6A]AGp_899.97 to S. mutans libraries (wild type, ΔC, ΔB and ΔE)……………………………………………………………………...…62

Table 2.6. Similarity search for E.coli_AUAAGp_827.41 to S. mutans libraries (wild type, ΔC, ΔB and ΔE)………………………………………………………………………………………….……62

Table 2.7. List of m/z values for c- and y- type fragment ions for the anticodon loop of tRNA Ile_GAU both modified and unmodified…………………………………………………………63

Table 3.1 The gene sequences of E. coli tRNA Tyr II forward primer (with the T7 RNA polymerase promoter region) and reverse primer……………………………………………………………...68

Table 3.2 The light and heavy oligonucleotide of the RNase T1 digest of the tRNA Tyr II reference. A mass shift of 15 Da is observed per charge state. The light and heavy oligo for the terminal oligo can’t be distinguished since it does not have a guanosine nucleotide………………………………74

Table 3.3 Three point calibration curve for the oligo UUCCCGp (light and heavy) at different ratios with the phosphate group at the 3’ end……………………………………………………………80

Table 3.4 Average ion abundance ratios of the monoisotopic peaks of heavy: light ratios for the oligo UUCCCG. The corresponding calibration curve is shown in Figure 3.5b……………………81

Table 4.1 Modified nucleoside from C. neoformans total tRNAs by LC-MS/MS, including the accurate mass and retention time………………………………………………………………….91

Table 4.2 Digestion product obtained from the anticodon loop of the tRNAs where two different version of A37 were detected……………………………………………………………………..95

Table 5.1 List of IR induced, and IR repressed pre-tRNAs……………………………………...117

Table 5.2 ROS activated antioxidant transcripts induced in IR: superoxide dismutase (SOD1,2); catalase (CAT1-4); peroxiredoxin (TSA); thioredoxin (TRX1,2 GRX3); thioredoxin reductase (TRR1); cytochrome c-peroxidase (CCP1); glutathione peroxidase (GPX1,2,5) and sulfiredoxin (SRX1)…………………………………………………………………………………………123

Table 2. Capsule related transcripts. Description of these enzyme can be found at the list of abbreviations……………………………………………………………………………………124

Table 6.1 Alignment of tRNA introns to known miRNA. (www.mirbase.org). HK – housekeeping, IR-I- radiation induced, IR-R – radiation repressed. Y - yes; N – No……………………..……134

Table 6.2 Differential expression of multiple copies of tRNA Val AAC in IR…………………136

List of Appendices

Appendix A1 Summary of mapped tRNA sequences for C. neoformans…………………………148

Appendix A2 Gene Ontology analysis of C. neoformans up and down regulated genes in IR. ..….159

Appendix A3 Pre-tRNAs that were not detected in C. neoformans in any samples analyzed…….163

Appendix A4 Housekeeping tRNAs …………………………………………………………..165

Chapter 1 Introduction

1.1 Research Goal

The goal of this dissertation is to advance the field of liquid chromatography-mass spectrometry (LC-MS) based characterization of post-transcriptional modifications in ribonucleic acid (RNA) using reference maps and a differential analysis strategy. The reference, used in three different ways in this thesis, has helped reduce sample complexity and simplify data analysis, allowing me to rapidly characterize samples and to gain an understanding of how the translational machinery, specifically transfer RNA (tRNA), responds when cells are exposed to stress. The reference concept that I introduce will help in the holistic understanding of RNA modifications in biological systems.

These advancements were achieved by developing a spectral matching method that helps in mapping modifications (Chapter 2), developing a method to quantify levels of oligonucleotides (Chapter 3), and applying mass spectrometry and RNA sequencing (Chapters 4 and 5) to understand the cellular processes in Cryptococcus neoformans when exposed to ionizing radiation. Next-generation sequencing techniques such as RNA sequencing are orthogonal yet complementary to mass spectrometry because they are high-throughput and have low sample requirements and low detection limits. Combining next-generation sequencing with the established MS-based pipeline in the lab can improve modification mapping quality and analysis by revealing which (t)RNAs are expressed in the sample and at what levels.

Furthermore, this body of work has laid the foundation for a new avenue of research with C. neoformans tRNAs.

1.2 Introduction

The remainder of this chapter is developed from my published review article on RNA modification mapping [3].

While chemical modifications to RNA have been known since the 1950s [4], it is only over the past several years that RNA modifications have become more “mainstream” and a prime focus of biological research [5]. This interest (some would say renewed interest) was generated, in large part, by the discovery that particular RNA modifications are reversible and that dedicated proteins generate (‘writers’), recognize (‘readers’) and remove (‘erasers’) the modification (Figure 1.1).

The concept of reversible modifications suggested the possibility that some chemical changes to RNA structure would play a key role in gene expression in a manner reminiscent of DNA epigenetic changes. Not surprisingly, these dynamic changes in RNA modification status became referred to as the “epitranscriptome” [6].

Determining the biological function and significance of these modifications requires tools and technologies that can provide insights into the types and distributions of RNA modifications across the transcriptome [7]. Leveraging advances made in DNA sequencing technologies, the use of high-throughput RNA-Seq [8] has enabled the study of some RNA modifications at the organism level. Yet despite the excitement generated by these RNA-Seq approaches, this platform has limitations, and for the field to grow, new and improved techniques will be required.

This chapter will examine the variety of technologies and approaches that are available for examining RNA modifications at a global level. After a brief overview of chemical modifications to RNA (themselves a subset of RNA post-transcriptional modifications), the chapter will examine the variety of strategies adapted for RNA-Seq that can generate high-throughput data on chemical modifications. Next, I discuss alternatives to RNA-Seq, with a focus on mass spectrometry (MS), which remains the only technology capable of directly identifying nearly all possible chemical modifications in RNA. I conclude by examining new and improved technologies and strategies on the horizon that may become important contributors to the field of RNA modification.

Figure 1.1 DNA and RNA reversible post-transcriptional modifications that regulate gene expression. DNA methylation alters how genes are expressed without changing the DNA sequence. RNA methylation has recently been demonstrated to regulate how a transcript can be processed and affect gene expression.

1.2.1 Chemical Modifications to RNA

After transcription, the cell is able to modify RNA in a number of ways. Post-transcriptional modifications that impact the sequence or stability of RNA include capping, splicing, trimming, G-cap, CCA tail addition (for tRNAs), polyadenylation, and polyuridylation [9]. While these post-transcriptional modifications are biologically important, this chapter will focus on modifications that result in a change in the chemical structure of the RNA transcript, most commonly by the addition of small chemical groups (e.g., methyl) or the replacement of one chemical group for another (e.g., sulfur replacement of oxygen) on one or more individual nucleosides present within the already transcribed RNA. These chemical modifications are genome encoded processes, as one or more enzymes are utilized to effect such structural changes. Another class of chemical modifications to the transcribed RNA, those caused by oxidative damage [10-13], will not be discussed further although many of the techniques described below could be adapted to analyze RNA lesions.

The field of RNA chemical modifications is the subject of numerous reviews, to which the interested reader is directed [5, 7, 14-18]. As these modifications have an impact on the techniques described below, a brief summary of the types and distribution is warranted. At present, ca. 150 enzymatically generated RNA chemical modifications have been identified [19]. While the pace of “new” nucleoside discovery has slowed, there is no evidence to suggest the current list is exhaustive as new organisms are studied and the technology for modification detection improves. Modifications of interest in this chapter are shown in Figure 1.2.

The single most abundant modified nucleoside in RNA is pseudouridine, originally considered the “fifth” nucleoside due to its relatively high (~5%) cellular abundance [20, 21]. Pseudouridine is an isomer of uridine, generated enzymatically by base removal while on the RNA strand, rotation, and reattachment. This isomerization reaction leads to an additional hydrogen bond as compared to uridine as well as a carbon-carbon glycosidic bond (unlike the carbon-nitrogen glycosidic bonds in all other nucleosides), features that have been exploited in developing technologies for pseudouridine detection. Pseudouridine is found across all types of RNA, being particularly abundant in eukaryotic rRNA.

Figure 1.2 Representative RNA chemical modifications. [m5C] 5-methylcytidine; [hm5C] 5-hydroxymethylcytidine; [m6A] N6-methyladenosine; [m6Am] N6,2’-O-dimethyladenosine; [ψ] pseudouridine; [I] inosine.

The single most abundant class of modifications is methylation. Methylation of RNA takes many forms: the nucleobase can be methylated (sometimes multiply), the ribose sugar can be methylated at the 2’-hydroxyl, and both base and sugar methylations can be found on the same nucleoside. As with pseudouridine, methylations are found in all classes of RNA.

Inosine, a chemical modification that results from the enzymatic deamination of adenosine, belongs to a special category of RNA chemical modifications known as editing. Historically, the term editing was coined to denote the insertion of uridines in mRNAs of two trypanosomatid protozoans. It was later broadened to refer to any change in RNA sequence or base structure that alters the coding property relative to the gene sequence, which includes uridine insertions and deletions as well as cytidine to uridine and adenosine to inosine deamination [22]. Given recent discoveries of other transcriptome modifications that change the coding property of a gene (vide supra), the utility of considering editing as a specialized class of RNA modifications may soon be a thing of the past. Inosine is the only editing event that results in a noncanonical nucleoside and is a prevalent RNA chemical modification, being concentrated in dsRNA in higher organisms. The translational machinery reads inosine as guanosine, thus base pairing it with cytidine. Such recoding events can lead to different protein sequences with profound impacts on protein function.
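The recoding effect can be illustrated with a minimal sketch. The function names and the two example codons are chosen here purely for illustration and are not taken from this chapter. The only assumptions are the standard genetic code and the rule stated above that inosine is read as guanosine.

```python
# Subset of the standard genetic code: only the codons used in this illustration.
CODON_TABLE = {"CAG": "Gln", "CGG": "Arg", "AUA": "Ile", "GUA": "Val"}


def decode_as_read(codon):
    """Return the codon as the translational machinery reads it (I is read as G)."""
    return codon.replace("I", "G")


def recoding(genomic_codon, edited_position):
    """Edit the A at a 0-based codon position to I; return (original, recoded) amino acids."""
    if genomic_codon[edited_position] != "A":
        raise ValueError("A-to-I editing requires an adenosine at the edited position")
    edited = (
        genomic_codon[:edited_position] + "I" + genomic_codon[edited_position + 1:]
    )
    return CODON_TABLE[genomic_codon], CODON_TABLE[decode_as_read(edited)]


if __name__ == "__main__":
    for codon, position in [("CAG", 1), ("AUA", 0)]:
        before, after = recoding(codon, position)
        print(f"{codon}: {before} -> {after}")
```

In both illustrative cases a single deamination changes the encoded amino acid, which is the sense in which editing alters the coding property of a transcript.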

Beyond pseudouridine, methylations and inosine, the majority of other chemical modifications that have been identified arise from the multi-enzymatic processing of RNAs. These chemical changes can be simple (S for O in cytidine and uridine) or more complex, such as the addition of amino acids (e.g., N6-threonylcarbamoyladenosine [t6A]), sugars (e.g., galactosyl queuosine [galQ]) or fatty acids (e.g., 2-geranylthiouridine [ges2U]), or the generation of multi-ring structures such as the wybutosine family of modifications. At present, the vast majority of these more complex modification structures have only been found in tRNAs [19], although this limitation to that RNA class may be influenced by the relatively high amounts of tRNA per cell, which simplifies detection and identification of low abundance, complex chemical modifications.

1.2.2 Characterizations of RNA Modifications – an Overview

Techniques for identifying and characterizing chemical modifications to RNA have existed ever since the canonical nucleosides were discovered. Historically, modifications were first characterized by paper chromatography [21]. In the late 1960s, newly developed MS approaches for structural characterization were rapidly applied to unknown nucleoside structures, such that nearly all of the known RNA chemical modifications were determined by this approach [4]. Once a structure has been determined, the generation of suitable standards, either synthetically or biosynthetically, enables the use of other strategies for identification of modified nucleosides. Most common in labs is the use of thin layer chromatography (TLC) [23] or high performance liquid chromatography (HPLC) [24]. Both approaches are based on the measurement of some relative (rather than absolute) characteristic of the modified nucleoside, and identification is made by comparison against an authentic standard or a compiled library of relative migration distances (TLC) or retention times (HPLC) [25]. Additional specificity in nucleoside detection was enabled by coupling HPLC with MS, where the combination of HPLC retention time and molecular mass (plus structural data) from MS is now the most powerful platform for the identification and characterization of chemically modified nucleosides from an RNA sample [26, 27].
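A minimal sketch of such a library comparison is given below. The masses are neutral monoisotopic values calculated from the elemental formulas. The retention times, the tolerances and the function name are hypothetical placeholders chosen for illustration, since real values depend on the column and gradient used.

```python
# name -> (neutral monoisotopic mass in Da, retention time in min)
# Masses follow from the elemental formulas; retention times are HYPOTHETICAL.
LIBRARY = {
    "pseudouridine": (244.0695, 4.1),
    "uridine": (244.0695, 6.3),
    "cytidine": (243.0855, 5.0),
    "m5C": (257.1012, 8.7),
    "adenosine": (267.0968, 14.2),
    "m6A": (281.1124, 21.5),
}


def identify(mass, rt=None, mass_tol=0.005, rt_tol=0.5):
    """Return library entries matching the measured mass and, if given, retention time."""
    hits = []
    for name, (lib_mass, lib_rt) in LIBRARY.items():
        if abs(mass - lib_mass) > mass_tol:
            continue  # mass outside tolerance
        if rt is not None and abs(rt - lib_rt) > rt_tol:
            continue  # retention time outside tolerance
        hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    print(identify(244.0697))          # mass alone cannot separate the two isomers
    print(identify(244.0697, rt=4.0))  # adding retention time resolves them
```

Because pseudouridine and uridine are isomers with identical mass, the mass-only query returns both entries, whereas adding the retention time narrows the result to one. This is the added specificity that the combination of HPLC and MS provides.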

While nucleoside analysis by techniques such as liquid chromatography-mass spectrometry (LC-MS) is invaluable in determining what chemically modified nucleosides are present within an RNA sample, nucleoside analysis alone cannot reveal the sequence location within an RNA that is chemically modified. For that, approaches are required that retain sufficient sequence context to place the modification onto a particular nucleotide or limit the region that would contain the modification.

The earliest approach for placing modifications within an RNA sequence involved a combination of ribonuclease (RNase) digestion and electrophoretic separation to fingerprint or map modifications on specific RNAs [28]. This approach took advantage of the anomalous migration of RNase digestion products containing a modification, and often required subsequent sequencing of RNase digestion products to specifically locate the sequence position of the modification.

These cumbersome RNase fingerprinting approaches were replaced with improved techniques starting in the mid 1980’s. Reverse transcriptase (RT), when combined with the polymerase chain reaction (PCR), became an effective tool for locating the positions of certain chemical modifications, namely those that inhibit the RT reaction (so-called “RT stops”). While an indirect method of detection, RT-PCR enabled the mapping of chemical modifications onto diverse RNAs with better sensitivity and specificity than historical RNase fingerprinting approaches [29]. In addition, the RNase fingerprinting approach was adapted for use with LC-MS to eliminate the need for radioactive labeling and improve the speed and selectivity of analysis [30]. Unlike an RT-PCR approach, RNA modification mapping by mass spectrometry is a direct detection method, as the chemical modification is revealed by the anomalous mass of the modified nucleoside in the RNA sequence.

Without question, the methods and techniques described above – along with others not mentioned – have been key in improving our understanding of RNA modifications. However, nearly all of these techniques are predicated on the analysis of one RNA sequence at a time. Thus, one could not envision determining the global RNA modification profile for a cell, even when limiting the analysis to the modification profile for a particular class of RNA. As will be discussed below, a number of new technical advances have changed the paradigm from one RNA sequence at a time into a true ‘omics’ approach to RNA modification profiling. While these techniques are not without their own limitations, these modern approaches enable a greater picture of RNA modification profiles to be determined from less sample and far more quickly than previously possible.

1.2.3 Global Profiling of Modifications – Common Themes

As the interest in understanding global profiles of chemical modifications to RNA has grown, new methods and technologies have been developed or adapted to simplify interpretation of the potentially large datasets and to improve the accuracy of profiling results. As will become evident below, where these particular methods are discussed in more depth, a few common themes underlie method development (Figure 1.3). The vast majority of profiling approaches are built around the use of RT to create the cDNA for subsequent sequence analysis (e.g., RNA-Seq approaches). Many chemical modifications lead to RT stops or misincorporation errors. Thus, one theme that continues to emerge is to treat the sample, preferably in some selective fashion, to enhance RT stops. By selectively enhancing RT effects, the sites/locations of modified nucleosides are more precisely defined with an added benefit of higher modification detection sensitivity. Often used concurrently with sample treatment, another theme is the advantage that arises from differential sample analysis. By comparing untreated and treated samples, the researcher is looking for a difference in the datasets to reveal modification profiles. Such differential analyses also improve the precision and sensitivity of the experimental approach as demonstrated in this thesis. Lastly, techniques that can incorporate an enrichment step before analysis benefit by removing unmodified RNAs from the background. Not only does enrichment allow the method to focus on the modified RNAs of interest, it can also improve the sensitivity and dynamic range of the measurement. Methods that incorporate one or more of these common themes have high utility and provide a template for future developments in the field.

Figure 1.3 Common themes in global profiling by RNA-Seq approaches. (a) Chemical modification derivatization strategies are used to enhance reverse transcriptase (RT) stops. (b) Differential analysis takes advantage of varying sensitivity of RNA-Seq to the presence of chemical modifications. (c) Affinity purification, usually through antibodies targeting specific chemically modified nucleosides, enriches the RNA pool prior to next-generation sequencing.

1.2.4 Modification Profiling by RNA-Seq Approaches

It is hard to overstate the impact of modern genomic sequencing technologies on science. Beyond their impacts in reducing the time and cost for whole genome sequencing analysis, the benefits of these technologies spilled over into numerous related areas. One such area, whole transcriptome sequencing (also referred to as RNA-Seq), served as the catalyst for transforming RNA modification mapping from a slow, cumbersome, one modification at one site approach into a field where the entire transcriptome – both coding and non-coding RNAs – can be examined for potential chemical modifications.

1.2.5 Bioinformatics data from RNA-Seq only

A common theme of the RNA-Seq approaches – either specific for one modification or more general for a modification class – is the manipulation of the RNA sample into a form more conducive for modification analysis. These manipulations, whether chemical derivatization/treatment or selective enrichment via purification, improve the accuracy and specificity of analysis. However, there are well-established reports that the direct analysis of RNA-Seq data enables a computational analysis of possible RNA chemical modification sites. One such approach is RNA and DNA Differences (RDD). Inosine is the representative example here. Inosine arises from the deamination of adenosine and is typically recognized as guanosine during RT-PCR. In RDD, RNA-Seq data is aligned against the reference genome (DNA) to identify differences arising from the A-to-G conversions seen with inosine [31]. This RDD approach has improved greatly since it was first introduced, through advances in computational tools that improve the RDD identifications that can be attributed specifically to inosine [32, 33].
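The RDD comparison can be illustrated with a minimal sketch. It assumes the reads have already been aligned and summarized as per-position base tallies; the function name `find_a_to_g_candidates`, the depth and fraction thresholds, and the toy data are hypothetical and are not taken from the tools cited above.

```python
from collections import Counter

def find_a_to_g_candidates(reference, pileup, min_depth=10, min_fraction=0.1):
    """Return (position, G fraction) for genomic A sites that are read as G.

    reference: genomic (DNA) sequence as a string.
    pileup: dict mapping 0-based position -> Counter of bases seen in RNA-Seq reads.
    The thresholds are illustrative assumptions, not published defaults.
    """
    candidates = []
    for pos, counts in sorted(pileup.items()):
        depth = sum(counts.values())
        if reference[pos] != "A" or depth < min_depth:
            continue  # only well-covered genomic adenosines are considered
        g_fraction = counts["G"] / depth
        if g_fraction >= min_fraction:
            candidates.append((pos, g_fraction))
    return candidates

reference = "GATTACA"
pileup = {
    1: Counter({"A": 12, "G": 8}),  # mixed A/G reads: candidate inosine site
    4: Counter({"A": 20}),          # adenosine read as A throughout
    6: Counter({"G": 3}),           # too few reads to make a call
}
print(find_a_to_g_candidates(reference, pileup))
```

A tally like this cannot by itself separate inosine from polymorphisms or sequencing error, which is why the computational refinements cited above matter.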

In addition, over a half dozen years ago, it was noted that “errors” found in RNA-Seq datasets may not arise due to the technology/methodology, but rather are manifestations of chemically modified nucleosides in the RNA [34-36]. Not surprisingly, these strictly computational analyses of RNA-Seq datasets focused on small RNAs, including microRNAs and tRNAs, as they are easier to characterize computationally and both RNA classes are characterized by rather robust chemical modifications (as well as 5’/3’-editing in the case of miRNAs). In these computational approaches, the key is to develop software with sufficient discriminatory power to identify possible sequencing “errors” arising from the modification/polymerase interaction as opposed to standard experimental error or polymorphisms. While these computational approaches have been shown to reveal putative modification locations and therefore may have a role in allowing researchers to limit the scope of investigations, they do require validation by the other approaches discussed in this chapter.

1.2.6 Specific Modified Nucleosides

At present, the most successful RNA-Seq based approaches for profiling RNA modifications either utilize a specific derivatization reaction that changes the targeted modified nucleoside into a moiety that is amenable to detection during the RT step or take advantage of antibody recognition of the modified nucleoside for enrichment and mapping. Due to the limited number of specific derivatization chemistries or antibodies, these types of targeted approaches are only available for a small subset of modifications that include 5-methylcytidine (m5C), 5-hydroxymethylcytidine (hm5C), N6-methyladenosine (m6A), N6,2’-O-dimethyladenosine (m6Am), pseudouridine (ψ), inosine (I) and 1-methyladenosine (m1A).

1.2.6.1 5-methylcytidine [m5C] and 5-hydroxymethylcytidine [hm5C]

One of the first approaches adapted to RNA-Seq technology was the use of bisulfite modification of RNA to identify methylation sites. Bisulfite sequencing has a rich history in both DNA and RNA methylcytidine analysis, and the strengths and weaknesses of this approach are now well appreciated [37]. The essence of the chemistry is that bisulfite treatment of RNA leads to deamination of C, generating a U in its place. However, 5-methylcytidine undergoes a slower deamination relative to cytidine; thus it is read as a C during reverse transcription. By comparing control (untreated) samples against bisulfite-treated samples, one can – in principle – identify sites of methylation. As adapted with deep sequencing technologies, the sequencing depth can be used to identify RNA sites that are hypomodified (i.e., a non-stoichiometric amount of modified nucleoside is present within the sample). RNA bisulfite sequencing offers the advantage of directly detecting m5C during reverse transcription, but the harsh reaction conditions (high pH) may lead to RNA degradation. A number of publications reporting on 5-methylcytidine in RNA are available that use this approach, and optimized protocols exist [38].
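The read-out of a bisulfite experiment can be sketched as a per-site non-conversion fraction. The example below is only an illustration under stated assumptions: the function name `call_m5c_candidates`, the coverage and fraction thresholds, and the read counts are hypothetical, and the published protocols apply additional filters.

```python
def call_m5c_candidates(site_counts, min_depth=20, min_fraction=0.2):
    """Return {position: fraction of reads still reading C} for candidate m5C sites.

    site_counts maps a reference cytidine position to (c_reads, t_reads)
    tallied from bisulfite-treated RNA. Thresholds are illustrative only.
    """
    candidates = {}
    for pos, (c_reads, t_reads) in site_counts.items():
        depth = c_reads + t_reads
        if depth < min_depth:
            continue  # too little coverage to estimate a fraction
        fraction = c_reads / depth
        if fraction >= min_fraction:
            candidates[pos] = fraction
    return candidates

# Hypothetical read counts at three reference cytidines.
site_counts = {
    34: (45, 5),  # mostly unconverted: candidate m5C
    48: (1, 99),  # almost fully converted: unmodified C
    72: (3, 2),   # low coverage: skipped
}
print(call_m5c_candidates(site_counts))
```

A fraction well below 1 at a called site corresponds to the hypomodified case described above. Incomplete conversion of unmodified cytidines would inflate these fractions, so a conversion control would normally accompany such a calculation.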

An alternative, antibody-based strategy has been introduced more recently [39]. In this strategy, termed Aza-IP, the RNA cytosine methyltransferase (RCMT) inhibitor 5-azacytidine is used to generate a covalent adduct of the RCMT and RNA at the methylation site. Immunoprecipitation (IP) of this adduct using an antibody against the particular RCMT under study is used to enrich the RNA sample prior to analysis. After treatment to break the enzyme-substrate adduct, the RNA can be sequenced as before. Khoddami and Cairns found that sequencing of these products also leads to a characteristic C>G transversion that could be used to pinpoint 5-methylcytidine locations within the RNA [38]. Unlike bisulfite treatment, this approach does not require additional sequencing of an untreated control, and the IP enrichment step improves sequencing depth. Another similar but chemically distinct immunoprecipitation approach, miCLIP (methylation individual-nucleotide resolution crosslinking immunoprecipitation), was developed by Hussain and co-workers [40]. Similar to Aza-IP, miCLIP forms an irreversible covalent crosslink between the m5C site and the methyltransferase followed by immunoprecipitation, but differs in the way the bond is formed. In miCLIP, a highly conserved cysteine residue in the catalytic core of Nsun2 is mutated to alanine (C271A), forming a stable RNA-protein complex that can be detected by Western blots. Upon immunoprecipitation, the m5C-enriched RNA pool is subjected to standard NGS library preparation and sequencing. This is in contrast to Aza-IP, where 5-azacytidine is intentionally incorporated into the nascent RNA by polymerases during transcription. Antibody-based technologies are highly specific to the target, and the subsequent immunoprecipitation step facilitates detection of low abundance methylated RNAs. Other putative m5C methyltransferases in mammals (Nsun1-Nsun7) can also be explored to enhance detection, as most of these proteins are known to bind to mRNAs as well [41, 42].

An antibody approach has also been developed for hm5C [43, 44] and recently used to examine

RNA hydroxy-methylation patterns in Drosophila [45]. With this antibody-based strategy, only regions

of RNA transcripts enriched in hm5C can be determined with the majority of hm5C peaks being

detected in coding regions exhibiting UC-rich sequences containing a UCCUC repeating motif.

Additional technique developments will be required to begin identifying the specific residues

containing hm5C along with quantitative information on this modification.


1.2.6.2 N6-methyladenosine [m6A] and N6,2’-O-dimethyladenosine [m6Am]

Although m6A was known to be present in RNA transcripts over 40 years ago, the

development of m6A-seq and related approaches has enabled organism-level studies on the

abundance, distribution and biological function of this modification. The m6A-Seq (or MeRIP-Seq)

approach is based on the use of highly specific antibodies to m6A, which are used to purify RNA

fragments that contain this modification (Figure 1.4) [46]. The sample generated for sequencing is

enriched in this specific modification, and identification of m6A consensus sites depends on sufficient enrichment. Site-specific identification is performed by comparing IP-enriched sample reads against

non-enriched samples. Thus, specific detection of m6A is indirect and limited to methylation

stoichiometries sufficient to meet data analysis parameters for identification. This approach revealed

that m6A is particularly enriched in the 3’ UTR region of the transcript, near the stop codon and within

long exons but cannot identify specific m6A residues [47]. The sites of methylation can also be predicted computationally by searching for the R-A*-C sequence motif (where R is G/A and A* is the potential methylation site) and a more recently discovered consensus motif, DRACH (where D is A/G/U and H is A/C/U). Searching for such motifs in areas near the highest m6A signal can be used to narrow down the methylation site [48].
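A motif search of this kind is easy to sketch with a regular expression. The example below is a hedged illustration: it expands DRACH using the standard IUPAC codes (D = A/G/U, R = A/G, H = A/C/U), and the function name `find_drach_sites` and the peak sequence are hypothetical.

```python
import re

# DRACH expanded with IUPAC codes; the lookahead lets overlapping motifs be found.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")

def find_drach_sites(rna):
    """Return 0-based positions of the candidate methylated A in each DRACH match."""
    # The methylated adenosine is the third residue of the five-residue motif.
    return [match.start() + 2 for match in DRACH.finditer(rna)]

peak_sequence = "CCGGACUUUGAACAGG"  # hypothetical sequence under an m6A peak
print(find_drach_sites(peak_sequence))
```

In practice the search would be restricted to the window around the peak summit, and a motif match alone would remain a prediction rather than proof of methylation.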

UV-CLIP (cross-linking immunoprecipitation) was adapted for use with the m6A-seq

approach to enable direct identification of m6A sites within RNA fragments [49]. The essence of this

approach is that by direct UV-crosslinking of the antibody to m6A, subsequent reverse transcriptase

treatment enables the precise location of the methylation to be identified. In addition, another related

modification, N6-2’-O-dimethyl adenosine (m6Am) can also be mapped at the 5’ end of mRNAs.

Unlike m6A, which can cluster at different sites in the transcript, m6Am seems to be exclusively localized at the transcription start site (TSS) [49]. While this protocol does not address difficulties related to modification stoichiometry, it has significantly simplified locating m6A motifs to specific sites, essentially increasing the resolution (around 200 nt) for detecting m6A at the transcriptome level [50]. Chen and coworkers developed another photocrosslinking-based assay, photo-crosslinking-assisted m6A-sequencing (PA-m6A-seq), which significantly improved resolution to ~20 nt around the m6A residue in the transcript [51].

Figure 1.4 Schematic representation of MeRIP-Seq. The fragmented m6A-containing mRNAs are incubated with an m6A antibody. The bound mRNAs are immunoprecipitated and the enriched pool is subjected to next generation sequencing.


One can also couple these high throughput sequencing techniques with the writers, readers and erasers to validate m6A localization. As demonstrated by the work of Wang and co-workers, the transcripts pulled down with the two m6A reader proteins, YTHDF1 and YTHDF2, in PAR-CLIP experiments overlap significantly with those previously identified by m6A-seq [52, 53]. Interestingly, the two readers have opposing roles in the fate of a transcript. The former promotes translation by interacting with the protein machinery and the latter facilitates transcript decay. Other YTH domain family proteins in humans (YTHDF3 and YTHDC1) also have RNA binding properties, which can be explored in pull down experiments, as can the m6A erasers FTO and ALKBH5 for differential analysis.

1.2.6.3 Pseudouridine [ψ]

Transcriptome-wide detection of ψ by RNA-Seq was introduced nearly simultaneously by

multiple groups [54-56]. This approach is built off the long-standing selective chemical derivatization

of ψ by 1-cyclohexyl-3-(2-(4-morpholinyl)ethyl) carbodiimide tosylate (CMCT). As originally

developed by Ofengand and co-workers, CMC can be selectively retained on ψ generating an RT stop

during RT-PCR [57]. To adapt this to RNA-Seq, sequencing reads of small RNA fragments (typically generated during RT stops) are aligned against the reference genome to identify sites that are likely derivatized. Schwartz and co-workers used spiked-in standards (in vitro transcribed RNAs) to quantify ψ stoichiometries [56]. ψ-seq revealed a complex landscape of pseudouridylation in mRNA and ncRNA. Pseudouridine mapping reveals no preferred localization, with ψ being detected in coding regions of the transcript as well as in 5’ and 3’ UTRs. However, it is noted that pseudouridylation is highly inducible by both RNA-dependent (requiring an H/ACA ribonucleoprotein) and RNA-independent (Pus enzyme family) mechanisms [58].
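The differential logic behind CMC-based ψ detection can be sketched as a comparison of RT stop ratios between treated and untreated libraries. This is a simplified illustration, not the scoring used in the cited studies: the function names, thresholds and counts are hypothetical, and positions are assumed to be already corrected for any offset between the stop and the modified residue.

```python
def stop_ratio(stops, coverage):
    """Fraction of reads covering a position whose cDNA terminates there."""
    return stops / coverage if coverage else 0.0

def candidate_psi_sites(treated, untreated, min_coverage=50, min_delta=0.1):
    """Return (position, increase in stop ratio) for CMC-dependent RT stops.

    treated / untreated: dict mapping position -> (stops, coverage).
    Thresholds are illustrative assumptions.
    """
    hits = []
    for pos in sorted(treated.keys() & untreated.keys()):
        t_stops, t_cov = treated[pos]
        u_stops, u_cov = untreated[pos]
        if min(t_cov, u_cov) < min_coverage:
            continue  # poorly covered in at least one library
        delta = stop_ratio(t_stops, t_cov) - stop_ratio(u_stops, u_cov)
        if delta >= min_delta:
            hits.append((pos, round(delta, 3)))
    return hits

treated = {13: (30, 40), 55: (40, 100), 60: (5, 100)}
untreated = {13: (2, 40), 55: (4, 100), 60: (4, 100)}
print(candidate_psi_sites(treated, untreated))
```

Subtracting the untreated stop ratio removes stops caused by RNA structure or by other modifications that block RT regardless of CMC treatment.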


1.2.6.4 Inosine [I]

While inosine can be identified by RDD, as mentioned above, a number of technical concerns limit the utility of RDD at the transcriptome level. Cattenoz and co-workers developed an inosine-specific sequencing protocol, iSeq, that selectively enriches the samples for inosine prior to high throughput sequencing [59]. This approach relies on the specificity of RNase T1, which is known to cleave at unmodified guanosine and inosine residues. In the presence of borate, guanosine forms a stable adduct with glyoxal. Glyoxalated guanosines are resistant to RNase T1 cleavage, so digestion generates products that carry inosine at the 3’ end. One of the limitations of this method is the challenge in mapping small RNase T1 digestion products.

Suzuki and co-workers developed a differential approach called ICE-Seq (inosine chemical erasing) [60]. Inosine is selectively derivatized by cyanoethylation, which introduces an RT stop that leads to truncated RT-PCR products that are subsequently eliminated during sample processing (Figure 1.5). By comparing treated RNA fragments against untreated fragments, the combination of “erased” reads with A-to-G conversions in the untreated sample enables more accurate determination of inosine at the transcriptome level. A low frequency of editing sites may give unreliable results as the ICE score is based on the ratio of G base reads between treated and untreated samples.


Figure 1.5 (A) Chemical reaction of inosine with acrylonitrile to form N1-cyanoethylinosine (ce1I). (B) Schematic of the ICE-seq protocol. RNA treated with acrylonitrile (CE+, left) and the control (CE-, right) are shown. In first-strand cDNA synthesis, the CE+ condition yields truncated products, whereas in the CE- condition inosine is copied as cytidine. After second-strand cDNA synthesis, a size selection step for 400-500 bp cDNA is performed. Sequencing of cDNA reads from both ends can reveal the A-to-I editing site by detecting G-erased reads in the CE+ condition.

Furthermore, differentiating an authentic editing site from systematic artifacts like SNPs, sequencing errors and short sequencing reads is still a challenge [58].
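The idea of comparing G reads between cyanoethylated and control samples can be sketched as follows. This is a simplified stand-in and not the published ICE score: the functions `g_fraction` and `erasure_ratio` and all read counts are hypothetical.

```python
def g_fraction(g_reads, a_reads):
    """Fraction of reads calling G at a genomic A position."""
    total = g_reads + a_reads
    return g_reads / total if total else 0.0

def erasure_ratio(ce_minus, ce_plus):
    """G fraction after cyanoethylation (CE+) relative to the control (CE-).

    Each argument is (g_reads, a_reads). Returns None when the control has no
    G reads, since the ratio is then undefined.
    """
    control = g_fraction(*ce_minus)
    if control == 0:
        return None
    return g_fraction(*ce_plus) / control

# Hypothetical counts: G reads are erased at an editing site but not at a SNP-like site.
editing_site = erasure_ratio(ce_minus=(30, 70), ce_plus=(3, 70))
snp_like_site = erasure_ratio(ce_minus=(50, 50), ce_plus=(48, 52))
print(round(editing_site, 2), round(snp_like_site, 2))
```

A ratio well below 1 indicates that the G signal depends on an unprotected inosine, whereas a ratio near 1 is what a genomic variant or a systematic error would give. With only a handful of G reads in the control, a single read shifts the ratio considerably, which is consistent with the caution above about low editing frequencies.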


1.2.6.5 1-methyladenosine [m1A]

The chemically modified nucleoside 1-methyladenosine at position 58 of tRNAs is found in all three domains of life. It is also present in rRNA at a specific, highly conserved residue. While the role of m1A in ncRNAs is well established (i.e., structural stabilization) [61], its presence and role in mRNA are still largely unexplored. Two independent research groups simultaneously published the global profiling of m1A in eukaryotic mRNA [62, 63]. Dominissini et al. developed m1A-seq, an antibody-based approach to immunoprecipitate fragments containing the modified nucleoside. They also exploited the known m1A to m6A conversion (Dimroth rearrangement) as an orthogonal method to verify and validate the m1A sites identified by immunoprecipitation. Li et al. used a differential approach, erasing the methyl group (using AlkB from E. coli) in m1A-enriched immunoprecipitated fragments. By comparing the treated and untreated samples, the m1A-containing region is identified. Both groups had similar findings that m1A is enriched in 5’-UTR and coding sequence (CDS) regions.

The behavior of reverse transcriptase also influences the reads from RNA-seq experiments. Even without any chemical treatment of a sample, the enzyme can either misincorporate a nucleotide or terminate prematurely (an abortive RT stop) at a modified site. Hauenschild et al. demonstrated that m1A residues have a distinct signature in reverse transcription that depends on the identity of the nucleotide 3’ to the modification (sequence dependence) and on structure. These signatures were fed to a machine learning algorithm to predict the location of m1A in trypanosomal tRNAs. To date, this approach has been focused on tRNAs and rRNAs, as those RNA classes are known to contain m1A. However, the described protocol is compatible with any RNA class.


1.2.7 Profiling by Class of Modifications

Related to the modification-specific RNA-Seq approaches described above, there are several

other recently developed high-throughput approaches that are more appropriately described as

‘modification class’ techniques. These approaches are typically directed against the specific chemical

group(s) that are altered prior to reverse transcription. Thus, the nucleoside identity can vary in these

cases providing a greater coverage of modification profiles present in the transcriptome.

1.2.7.1 Methyl modifications

ARM-Seq (AlkB-facilitated RNA methylation sequencing) [64] and demethylase-thermostable group II intron RT tRNA sequencing (DM-tRNA-seq) [65] are both predicated on the removal of base methylations that are known to typically cause RT stops. ARM-Seq uses wild type E. coli AlkB demethylase, while DM-tRNA-seq uses a genetically engineered AlkB (D151S), which has higher activity toward m1G than the wild type. ARM-Seq ligates adapters at both ends prior to reverse transcription. DM-tRNA-seq uses a thermostable group II intron reverse transcriptase, which does not require ligation; it switches template from an adapter to the 3’ end of the tRNA and synthesizes the cDNA (Figure 1.6) [65]. RNA fragments that are untreated will tend to create truncated products due to the interfering methylation. By comparing sequencing reads of untreated with treated samples, the locations of AlkB-sensitive base methylations (e.g., m1A, m3C and m1G) can be determined.


Figure 1.6 Schematic representation of demethylase-thermostable group II intron RT tRNA sequencing (DM-tRNA-seq). Red shaded circles are AlkB-sensitive methylated nucleosides (m1A, m1G, m3C).

To date, these methods have been used to characterize methylations in tRNA and rRNA classes. Importantly, this concept demonstrates that differential enzymatic treatment using “erasers” can be a general approach to large-scale examination of RNA chemical modifications. The utility of such a concept will depend, in part, on the further discovery of additional enzymes that are found to remove particular chemical modifications from RNA substrates.


1.2.7.2 2’-O-methyl modifications

The RiboMeth-Seq approach has been developed for global detection of 2’-O-ribose methylations [66]. This approach takes advantage of the lability of RNA bearing free 2’-hydroxyls to alkaline hydrolysis. Thus, unmodified RNAs treated under limited alkaline conditions should generate fragments that fully represent the entire RNA sequence. In contrast, a 2’-O-methyl group hinders alkaline hydrolysis, leading to RNA fragments whose hydrolysis pattern is offset at the methylated position. By aligning RNA-Seq data to a reference genome, the read ends for 2’-O-methyl modified nucleosides are revealed as underrepresented in the data set, enabling site-specific detection of this class of methylations.
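The detection principle, a local deficit of read ends, can be sketched with a simple neighbor-based score. This is an illustrative assumption rather than the scoring scheme of the published method: the function `end_depletion_scores`, the flank size and the counts are hypothetical, and positions are assumed to be already registered to the modified nucleotide.

```python
def end_depletion_scores(end_counts, flank=2):
    """Score each position by how depleted its read-end count is versus its neighbors.

    end_counts: list of read-end counts per position along one RNA.
    Returns {position: score}, where 0 means no depletion and values near 1
    mean that almost no fragments end at that position.
    """
    scores = {}
    for i in range(flank, len(end_counts) - flank):
        neighbors = end_counts[i - flank:i] + end_counts[i + 1:i + flank + 1]
        mean_neighbors = sum(neighbors) / len(neighbors)
        if mean_neighbors == 0:
            continue  # no local signal to compare against
        scores[i] = max(0.0, 1 - end_counts[i] / mean_neighbors)
    return scores

# Hypothetical read-end counts; position 3 is protected from hydrolysis.
end_counts = [98, 102, 100, 5, 101, 99, 100]
scores = end_depletion_scores(end_counts)
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))
```

Because the score is relative to neighboring positions, it is only meaningful where the surrounding sequence is evenly fragmented and well covered.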

1.2.7.3 Nicotinamide adenine dinucleotide (NAD) caps

While technologies and approaches for characterizing 5’-capping modifications are not the

focus of this chapter, the recent demonstration of NAD capture-Seq for NAD end-modified bacterial

RNAs is worthy of discussion [67]. This approach is the first example of chemical derivatization prior

to library generation and deep sequencing that uses a chemical biology approach to modification

detection. Cahova et al. used chemical derivatization to purify only RNA fragments that contain the desired modification class (here NAD 5’-capped RNAs). Moreover, derivatization was an enzymatically based modification to NAD that allowed subsequent biotinylation and purification.


1.2.8 RNA-Seq Alternatives for Modification Profiling and Measurements

1.2.8.1 Mass Spectrometry

Clearly the last five years have seen the emergence of targeted approaches for the global

profiling of RNA chemical modifications. Concurrent with those developments, there have also been

advances in untargeted approaches that move these technologies beyond the one RNA at a time

paradigm as well. The most significant advances have occurred in LC-MS based techniques. Mass

spectrometry has always played an important role in the characterization of chemical modifications

[68]. Beyond nucleoside-level characterization, MS has proven a useful technique for characterizing

modified nucleosides within a given RNA sequence. The general strategy for mapping modifications – RNA modification mapping – involves purification of a single RNA, digestion with an appropriate ribonuclease (e.g., RNase T1, which cleaves specifically at unmodified guanosines), and separation and analysis using LC-MS. As all known chemical modifications except pseudouridine result in a mass increase to the canonical nucleoside, modifications are mapped to specific sites by monitoring the mass increase determined during collision-induced dissociation tandem mass spectrometry (CID MS/MS) [69-71]. The advantages of LC-MS/MS strategies for RNA modification mapping include

broad applicability (i.e., either by RNA class or modification type) and high accuracy (as mass can be

measured with high precision by MS). The primary disadvantages include the general limitation to

serial analysis of one RNA at a time, the relatively poorer sensitivity for RNA detection, and the

technical skill required for LC-MS/MS instrumentation.
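The mapping strategy can be sketched in two steps: an in silico RNase T1 digest, followed by a check of whether an observed mass exceeds the unmodified mass by a whole number of methyl groups. This is a minimal illustration under stated assumptions: every G is treated as cleavable (modified guanosines are ignored), only methylation is considered, and the function names, tolerance and example masses are hypothetical placeholders rather than measured values.

```python
METHYL_SHIFT = 14.01565  # monoisotopic mass added per methylation (CH2), in Da

def rnase_t1_digest(rna):
    """Cleave 3' of every G and return the digestion products in 5'-to-3' order."""
    products, current = [], ""
    for base in rna:
        current += base
        if base == "G":
            products.append(current)
            current = ""
    if current:
        products.append(current)  # 3'-terminal product without a G
    return products

def methyl_count(observed_mass, unmodified_mass, tolerance=0.01):
    """Number of methyl groups explaining the mass increase, or None if none fits."""
    shift = observed_mass - unmodified_mass
    n = round(shift / METHYL_SHIFT)
    if n >= 0 and abs(shift - n * METHYL_SHIFT) <= tolerance:
        return n
    return None

print(rnase_t1_digest("GGAUCGACCA"))
# Placeholder masses chosen only to exercise the function.
print(methyl_count(1014.01565, 1000.0), methyl_count(1007.0, 1000.0))
```

A mass-silent modification such as pseudouridine returns a count of zero in this scheme, which is the exception noted above, and localizing the shift to one residue still requires the MS/MS fragment ions.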

Recently, new developments in the field of RNA modification mapping and LC-MS/MS have addressed some of the historic limitations of MS in RNA modification profiling [72-76]. Several approaches have built upon successes in the area of proteomics to move away from rigorous single RNA purification into the direct RNA modification mapping of multiple RNAs (e.g., the entire set of bacterial tRNAs from an organism, Figure 1.7) [77]. Here, LC-MS/MS technologies can provide robust quantitative mapping of RNA chemical modifications through better chromatography conditions and reproducible and defined fragmentation of endonuclease digestion products [78]. As new computational methods are developed, data interpretation becomes more automated and amenable to increased throughputs in sample complexity and volume [79-81].

Figure 1.7 Schematic representation of DNA-based exclusion list for enhanced detection of modified RNAs by LC-MS/MS.


Despite these advances, LC-MS/MS approaches are still being applied to either more highly

modified sRNAs, including tRNA, or rRNAs. The application of LC-MS/MS to mRNAs or the entire

transcriptome will require further improvements in the dynamic range of this technology. An

unexplored but likely important role for MS in future global modification profiling studies will be as

an independent and orthogonal technique to validate multiple modifications found in RNA-Seq approaches described earlier. When combined with relative or absolute quantification methods, this platform should provide a middle ground between RNA-Seq techniques and the targeted approaches described below that are still based on RT-PCR, TLC or other conventional biochemical approaches.

1.2.8.2 SCARLET (m6A)

Liu and coworkers developed SCARLET – Site-specific Cleavage And Radioactive labeling

followed by Ligation-assisted Extraction and Thin-layer chromatography – to enable any particular

m6A modification site to be examined quantitatively [82]. The approach is built on several

conventional historic biochemical approaches used for modified nucleoside

identification/quantification including 32P post-labeling and TLC separation and detection of the

targeted chemical modification [23, 83]. Site-specificity is enabled by RNase H-directed cleavage of

the target RNA using specifically constructed chimeric oligonucleotides [84]. A key point to emphasize

is that SCARLET is based on the direct measurement of m6A in the RNA – no derivatization,

enrichment or amplification of the RNA is involved. However, the challenge is that a direct detection

strategy does require highly sensitive methods (e.g., radioactivity) to enable determination of low

abundance m6A sites. While SCARLET is a low-throughput approach, it is adaptable to other chemical

modifications as the RNase H-step can be tailored to any RNA location.


1.2.8.3 Reverse Transcription at Low deoxyribonucleoside triphosphate (RTL-P)

Another example of quantitative measurement of site-specific modifications is RTL-P

(Reverse Transcription at Low deoxyribonucleoside triphosphate (dNTP) concentrations followed by polymerase chain reaction (PCR)) for 2’-O-methyl modifications [85]. Here, the use of primers upstream and downstream of a 2’-O-methylation site in RNA allows for a comparative analysis of

PCR products to measure modification levels. This approach is low throughput and involves the indirect determination of modification levels. However, as with any method based on differential analyses, the sensitivity of this approach is higher than methods using only a single input for analysis.

1.2.9 Conclusion

High throughput RNA-seq methods have enabled researchers to have an unprecedented view of the transcriptome. While second-generation DNA sequencing technologies have proven invaluable in global profiling of specific RNA chemical modifications, these technologies still suffer from the inability to directly detect RNA or RNA modifications, and methods outside of LC-MS/MS are limited to only a handful of the more prevalent modifications (primarily methylations) in RNA. The inherent systematic error in the analysis can lead to false positive or negative results, which impact the accuracy of assigning modification sites, especially those of low abundance. Short reads are prone to mapping/alignment errors due to the inherent repeats/duplication in the genome. Additional ambiguity in data interpretation can be caused by splice junctions, polymorphism between individuals and highly similar paralogous genes. Although there is no panacea for all of these limitations, increasing the read length, sequence depth and direct detection of modifications (bypassing the RNA to DNA conversion) would be a significant improvement. In addition, the large data sets generated from NGS warrant a user-friendly informatics pipeline that includes steps to lower false positives and to increase sensitivity and selectivity.

One of the risks involved in immunoprecipitation is the non-specificity of the antibody to the target. Ideally the antibody should capture only its cognate target, but several experiments report that specificities can vary from lot to lot or that some antibodies simply do not perform as expected. Prior to use, antibodies should be subjected to several assays to validate their specificity with the necessary controls. Co-precipitation of fragments with modifications other than the target may lead to false positives. Rigorous testing should be an integral quality control step in a lab that performs immunoprecipitation prior to next generation sequencing.

Newer technologies are on the horizon that offer the potential for direct RNA sequencing with the possibility that such direct analysis can also differentiate canonical nucleosides from chemically modified nucleosides. Nanopore-based sequencing has the potential to globally characterize RNA chemical modifications directly, obviating the need to make cDNA constructs. The nanopore sequencing platform allows detection of bases by the disruption in the current that flows through the protein channel [86, 87]. It would not be surprising to discover that chemical modifications would each have a unique current signature that would allow direct detection and sequencing. Ayub and co-workers have recently demonstrated RNA nanopore exosequencing that can discriminate the four canonical bases as well as inosine, m6A and m5C [86]. Though the method has not yet been applied to the global profiling of transcripts, one can already envision an array of chip-based nanopores with engineered proteins optimized for specific modifications allowing parallel sequencing and data collection of RNA modification profiles directly and with high selectivity.

Another possibility is to use native error-prone or genetically optimized reverse transcriptases that can recognize RNA modifications and insert an appropriate nucleotide in the DNA strand that would serve as a direct modification marker during DNA sequencing. Alternatively, rate-based measurements, which have already been examined [88], could be improved by using these next-generation reverse transcriptases, providing greater selectivity and differentiation when measuring incorporation across a modified nucleoside. One can also exploit the differential chemical reactivity of modified nucleosides with a variety of reagents, coupled with high throughput sequencing. A wealth of information is available in the literature on the orthogonal reactivity of modified bases if one is willing to revisit the organic chemistry of nucleosides. In this fashion, one can foresee a combination of direct detection of RNA chemical modifications that is validated by site- or modification-specific derivatization analyzed using older RT-cDNA based NGS.

Finally, additional possibilities exist with alternative platforms such as mass spectrometry. While LC-MS/MS has become a mature, powerful technology, the methods available, limits of detection and ability to handle complex mixtures of modified RNAs are still in their infancy. If one uses the growth of LC-MS/MS within the field of proteomics as a reasonable guide for the possibilities of this technology, one readily notes that applications moving beyond a single RNA class into whole transcriptome studies are feasible, if impractical at present. Despite these limitations, mass spectrometry combined with RNA-seq is still a powerful platform for analysis of small non-coding RNAs like tRNA or miRNA. By leveraging the power of differential analysis with a set of known standards, one can rapidly characterize tRNA samples both qualitatively and quantitatively. The methods established here provide a more powerful comparative analysis for understanding the dynamics of RNA modifications and expression in the pathogenic, radioresistant fungus Cryptococcus neoformans when exposed to different environmental stimuli such as ionizing radiation.


Chapter 2 Using Mass Spectral Matching to Interpret LC-MS/MS Data During RNA Modification Mapping

2.1 Introduction

Collision-induced dissociation (CID) of an oligonucleotide generates c-, y-, w- and a-B product ions, (M-H-B)- ions, neutral losses, secondary backbone dissociation products and other fragmentation channels [89, 90]. In RNA, c- and y-type ions, along with 5' base and backbone cleavage, are the most favorable; the mass difference between the observed c- and y-type ions and those predicted from the genomic sequence reveals the identity of a modification and its sequence location [91]. The assignment of MS/MS peaks for identification is often the bottleneck of any mass mapping analysis. Mass spectrometry-based characterization of transfer RNA has been around for more than 25 years, yet software equipped to handle the complexity of the MS/MS data, and to interpret and annotate it, is still in its infancy relative to the proteomics field [92]. Several tools have been developed to aid in the analysis; Ariadne, RoboOligo and RMM are just a few of the platforms that simplify MS/MS interpretation [93-95]. Recently, our lab introduced RNAModMapper (RAMM) to interpret CID data and map modifications in RNA [91]. This software has been used to map methylations in the 16S and 23S rRNA of Streptomyces griseus.

Over the years, I have amassed a collection of well-characterized oligonucleotide spectra from RNase digests of total tRNAs from different model organisms. I used these as reference spectra for spectral matching to reduce the amount of manual de novo sequencing analysis. The premise behind spectral matching is simple: under fixed conditions, a molecule fragments in a reproducible way, so its fragmentation pattern serves as a unique fingerprint [96]. Since all spectral features are included in the comparison, spectral matching offers a more global approach to oligonucleotide identification.


I re-purposed a publicly available software package, NIST MS search v2, which allows users to perform spectral matching and to build their own libraries. In this chapter I demonstrate the first use of a reference, defined here as an interpreted oligonucleotide mass spectrum, for automation of LC-MS/MS modification mapping data analysis. This strategy was implemented in the model organism E. coli and for a set of wild type and mutant strains of Streptococcus mutans to determine which proteins are essential for the anticodon modification N6-threonylcarbamoyladenosine (t6A). The work with S. mutans has been published [97], while the rest is being prepared for submission.

2.2 Experimental Section

2.2.1 Transfer RNA samples

E. coli strain MG1655 was cultured in Luria-Bertani broth. Cryptococcus neoformans (Sanfelice) Vuillemin (ATCC® 32045™) was grown in potato dextrose (PD) medium (pH 5.5) at 30 °C. Cells were grown until the OD reached 1.5, harvested and washed twice with 1X PBS to remove remaining media. The cells were resuspended in lysis buffer, liquid nitrogen was added, and the cells were ground with a mortar and pestle while frozen. This cycle of freezing, partial thawing and grinding was repeated five times. Tri-Reagent was added, and the cycle was again repeated five times to ensure complete cell lysis. Chloroform was added to facilitate phase separation. The aqueous phase was removed after centrifugation and RNA was precipitated using an equal volume of isopropanol. The RNA pellet was dissolved in sterile water and checked for integrity by running an aliquot on an agarose gel. Wild type and mutant strains of S. mutans were cultured as described [97].


To isolate the tRNA fraction, total RNA was resuspended in 100 mM Tris-acetate (pH 6.3) and 15% ethanol and loaded onto an anion exchange column (Nucleobond, Macherey-Nagel). Small RNAs were eluted in 400 mM NH4Cl, 100 mM Tris-acetate (pH 6.3) and 15% ethanol. The tRNA fraction was eluted in 750 mM NH4Cl, 100 mM Tris-acetate (pH 6.3) and 15% ethanol. The tRNA was precipitated by adding an equal volume of isopropyl alcohol, and its integrity was checked on a denaturing PAGE gel.

2.2.2 Liquid Chromatography and Tandem Mass Spectrometry

The tRNA was digested with RNase T1 (50 U T1/μg tRNA) in 220 mM ammonium acetate buffer (pH 7) for 2 h at 37 °C. The digests were separated using a Poroshell™ C18 column (50 mm × 1 mm; 2.7 μm) from Agilent (Santa Clara, CA) on a Hitachi LaChrom Ultra UPLC containing two L-2160U pumps, an L-2455U diode array detector and an L-2300 column oven (35 °C). The flow rate was set at 150 μL/min with a 10 min equilibration in 95% Buffer A (400 mM HFIP, 16.3 mM TEA, pH 7) and 5% Buffer B (400 mM HFIP, 16.3 mM TEA:methanol, 50:50 v:v, pH 7). The gradient began at 10% B and increased at 8% B/min for 10 min. The mobile phase was then increased to 95% B for 5 min before re-equilibrating prior to the next analysis. The column eluent was directed to a Thermo Scientific (Waltham, MA) LTQ™ linear ion trap mass spectrometer. The mass spectrometer operating parameters included a capillary temperature of 275 °C, spray voltage of 4 kV, source current of 100 μA, and sheath, auxiliary and sweep gases set to 40, 10 and 10 arbitrary units, respectively. Each instrumental segment consisted of a full scan restricted to m/z 600 to 2000 (scan 1), collected in negative polarity, followed by three product ion scans (scans 2-4) using collision-induced dissociation (CID) at a normalized collision energy of 50% with an activation time of 5 ms. In data-dependent mode, scans 2-4 were triggered by the three most abundant ions from scan 1, each isolated with a mass width of 2 Da (±1 Da). Each ion selected for CID was analyzed for up to 10 scans before it was added to a dynamic exclusion list for 5 s (typical chromatographic fwhm). All MS/MS data were collected in profile mode.
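The data-dependent acquisition settings described above (three most abundant ions per full scan, up to 10 CID scans per ion, then 5 s of dynamic exclusion) can be illustrated with a small simulation. This is only a sketch of the stated settings, not the instrument control software: the class name `DataDependentSelector`, the use of nominal (rounded) m/z as the ion identity, and the example peak list in the usage note are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


class DataDependentSelector:
    """Hypothetical helper that mimics top-N selection with dynamic exclusion."""

    def __init__(self, top_n: int = 3, max_scans: int = 10,
                 exclusion_s: float = 5.0) -> None:
        self.top_n = top_n              # product ion scans per full scan
        self.max_scans = max_scans      # CID scans allowed before exclusion
        self.exclusion_s = exclusion_s  # exclusion duration in seconds
        self.scan_counts: Dict[int, int] = {}
        self.excluded_until: Dict[int, float] = {}

    def select(self, full_scan: List[Peak], time_s: float) -> List[float]:
        """Return the precursor m/z values chosen for CID from one full scan."""
        # Drop exclusions that have expired by this time point.
        self.excluded_until = {
            key: end for key, end in self.excluded_until.items() if end > time_s
        }
        # Assumption: ions are identified by their nominal (rounded) m/z.
        eligible = [p for p in full_scan if round(p[0]) not in self.excluded_until]
        chosen = sorted(eligible, key=lambda p: p[1], reverse=True)[: self.top_n]
        for mz, _ in chosen:
            key = round(mz)
            self.scan_counts[key] = self.scan_counts.get(key, 0) + 1
            if self.scan_counts[key] >= self.max_scans:
                # Ion has used its allotted scans: exclude it and reset its count.
                self.excluded_until[key] = time_s + self.exclusion_s
                self.scan_counts[key] = 0
        return [mz for mz, _ in chosen]
```

With a made-up full scan containing four ions, the three most intense are selected for the first ten cycles; the fourth is only picked once the others are on the exclusion list, which is how lower-abundance digestion products get sampled.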

2.2.3 Reference spectra and library generation

The reference as used in this chapter is defined as the interpreted CID fragmentation of a signature digestion product (SDP) generated from various experiments and from different organisms.

The spectral matching strategy was first evaluated using a set of well-characterized and interpreted spectra of SDPs from E. coli. The modification status of E. coli tRNAs has been validated independently by different groups using techniques other than mass spectrometry. The sequences can be obtained from the Modomics RNA database (http://modomics.genesilico.pl/) [98]. For an E. coli RNase T1 digest, a total of 73 unique digestion products contain at least one post-transcriptional modification, not counting pseudouridine (Δ m/z = 0). All the reference spectra used in this chapter were generated from various experiments in the lab. The library consists of the entire RNase digest LC-MS/MS run, and no further processing was performed.

2.2.4 Spectra pre-processing and spectral matching

All reference and library spectra were first converted into Mascot generic format (MGF) using MassMatrix conversion software. A library was created by importing the MGF file of the RNase digest LC-MS/MS run. The software assigns an ID to every MS/MS event in the imported file and records its scan number, retention time (seconds) and precursor mass. To perform spectral matching, the reference spectrum is imported and searched against a specific library. I tested the different spectrum search types available in NIST MS search v2.0: identity search (quick, normal, MS/MS) and similarity search (simple, hybrid, neutral loss, MS/MS in EI). For the spectra collected on the LTQ, the precursor and product ion m/z tolerances were set at ±0.8 Da. The search results were evaluated based on the following criteria: match, reverse match and probability score (reported only by the identity search and the similarity MS/MS in EI option).
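To make the pre-processing step concrete, the sketch below reads spectra from MGF text and keeps those whose precursor m/z lies within the ±0.8 tolerance used for the LTQ data. It is a minimal illustration, not the MassMatrix or NIST implementation: the function names, the `EXAMPLE_MGF` text and its peak intensities are invented for the example, and only the basic `BEGIN IONS`/`END IONS`, `KEY=value` and two-column peak lines of the format are handled.

```python
from typing import Dict, List

# Made-up two-spectrum MGF text used only to exercise the parser.
EXAMPLE_MGF = """
BEGIN IONS
TITLE=scan 3014
PEPMASS=900.32
RTINSECONDS=812.4
362.0 100.0
691.0 55.0
END IONS
BEGIN IONS
TITLE=scan 2950
PEPMASS=827.15
RTINSECONDS=790.1
362.0 80.0
634.0 40.0
END IONS
"""


def parse_mgf(text: str) -> List[Dict]:
    """Parse MGF text into a list of {'params': {...}, 'peaks': [(mz, intensity)]}."""
    spectra: List[Dict] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


def precursor_mz(spectrum: Dict) -> float:
    """PEPMASS may carry an intensity after the m/z; keep the first field."""
    return float(spectrum["params"]["PEPMASS"].split()[0])


def within_tolerance(spectra: List[Dict], target_mz: float,
                     tol: float = 0.8) -> List[Dict]:
    """Keep spectra whose precursor m/z is within +/- tol of target_mz."""
    return [s for s in spectra if abs(precursor_mz(s) - target_mz) <= tol]
```

For example, `within_tolerance(parse_mgf(EXAMPLE_MGF), 899.97)` keeps only the spectrum titled `scan 3014`, whose precursor (900.32) lies within 0.8 of the reference value.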

2.3 Results and Discussion

A typical workflow for LC-MS/MS RNA modification mapping involves a bottom-up approach employing base-specific ribonucleases (RNases) to generate smaller oligonucleotides amenable to LC-MS/MS (Figure 2.1). The genomic tDNA sequences are digested in silico and, using the pool of modified nucleosides, modifications are placed in their sequence context. Predicted m/z values and fragment ions (c, y, base and phosphate losses) are calculated. This calculated list of RNase digestion products and their predicted product ions serves as a local database against which the experimental data can be compared. Comparisons can be conducted manually or in (semi-)automated fashion by use of software programs [91, 93, 94].
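The in silico digestion step can be sketched in a few lines. RNase T1 cleaves on the 3' side of guanosine, so each product except the 3'-terminal fragment ends in Gp, as in the oligonucleotide names used in this chapter. The function names and the example sequence below are illustrative assumptions; the sketch works on the unmodified genomic sequence only and does not model missed cleavages or modifications that alter cleavage.

```python
from typing import List


def rnase_t1_digest(sequence: str) -> List[str]:
    """Cleave 3' of every G; products ending in G carry a 3'-phosphate ('p')."""
    products: List[str] = []
    current = ""
    # Accept tDNA input by converting T to U.
    for base in sequence.upper().replace("T", "U"):
        current += base
        if base == "G":
            products.append(current + "p")
            current = ""
    if current:
        products.append(current)  # 3'-terminal fragment, no 3'-phosphate
    return products


def long_enough(products: List[str], min_length: int = 5) -> List[str]:
    """Drop short products (hypothetical filter; default keeps 5-mers and longer)."""
    return [p for p in products if len(p.rstrip("p")) >= min_length]
```

Applied to the made-up sequence `AUAAGCCUGA`, the digest gives `AUAAGp`, `CCUGp` and `A`, and the length filter retains only `AUAAGp`.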

Whether experimental data are annotated manually or using software, data analysis can still be labor intensive. For example, the in silico digestion of Cryptococcus neoformans tRNAs yields ~300 predicted RNase T1 digestion products, which in a typical LC-MS/MS run will generate at least 3000 MS/MS spectra. This format requires investigating each MS/MS spectrum to ensure that at least 80% of the predicted c- and y-type product ions are present. Although several strategies have been employed to reduce the workload (e.g., using public databases to learn the consensus modification at certain locations, or ignoring RNase digestion products that are four or fewer nucleotides long), this approach still explores the entire potential MS/MS space in a step-by-step “peak hopping” fashion [94].

The alternative approach investigated here was to adapt existing spectral matching software to an RNA modification mapping strategy. By generating a library of MS/MS spectra, the entire MS/MS spectrum pattern is evaluated rather than peak hopping to locate only c- or y-type ions. The existence of publicly available software from NIST (MS search v2.0) would allow this approach to be adapted to any dissociation method or vendor platform. The key challenges to investigate were (1) MS/MS spectral library features and (2) spectral matching strategy.

Figure 2.1 Experimental scheme for the bottom-up RNA mass mapping approach. It combines placing modifications (from nucleoside data) onto the genomic sequences, in silico digestion and manual de novo interpretation of the MS/MS data.


2.3.1 Reference and library generation

The typical approach in spectral matching is to establish a library of reference spectra against which the experimental data can be searched one by one. Traditional library construction includes a rigorous collection of spectra from different instrument platforms, conditions and sample types. For modification mapping alone, this is quite a tedious task. In L. lactis, for example, RNase T1 generates a total of 952 digestion products, of which only 9% (88/952) contain at least one modification and 17% (167/952) are unique but unmodified oligonucleotides. An L. lactis RNase T1 digest run generates an average of 6700 MS/MS scan events in one experiment and, because the software used has no batch search, each would have to be searched individually. Performing the search one by one is daunting, akin to “finding a needle (modified oligos) in a haystack (total digests)”.

The alternative explored here was to use previously acquired and well-characterized oligonucleotide mass spectra that can be searched against the experimental data set (Figure 2.2). These previous mass spectra include both RNase digestion products without post-transcriptional modifications and digestion products with one or more modifications. They were accumulated over time from various organisms (e.g., E. coli, S. cerevisiae, L. lactis, B. subtilis and C. neoformans), in vitro transcripts and synthetic oligonucleotides. Each MS/MS spectrum is identified with a unique ID that describes the oligonucleotide origin, sequence, precursor mass (m/z) and charge state. For oligonucleotides that exist in more than one charge state, the preferred reference is the one in which all the c- and y-type ions are present, which is usually the lowest charge state. For the development of spectral matching performed here, my initial reference list contained 61 uniquely identified spectra (both modified and unmodified). In principle, the reference can be any oligonucleotide, as each will fragment in a unique way. However, when spectral matching is used as a tool to interpret MS/MS data in RNA modification mapping, as it is here, the best set of references will be modified oligonucleotides.


Figure 2.2 Workflow for qualitative analysis of CID of oligonucleotide by spectral matching.

Post-transcriptional modifications in oligonucleotides create a characteristic mass shift and a characteristic fragmentation (e.g., neutral losses of modifications), so that few overlaps arise when aligning similar spectra in the library. The type of data acquired in this set of experiments, from a low-resolution instrument (i.e., LTQ-XL), has a large impact on how the output can be interpreted. The A+1 isotope of cytidine has the same nominal mass as uridine, making it difficult to discriminate sequence isomers and C to U substitutions. In situations like this, the end user must manually evaluate the mass spectrum and compare the precursor ion masses for mass error.

Each LC-MS/MS analysis of the complete set of RNase digestion products from the sample of interest is considered the spectral matching library. Effectively, each reference spectrum (discussed above) is searched against the contents of the library to identify the top candidates in the experimental data. With 61 reference spectra for E. coli and a typical library of 3,000 MS/MS spectra, theoretically only 2% of all the scan events have a positive identification. However, the output from every search is the same: a list of 100 spectra that the end user can manually check using the score system.

To determine whether the spectral matching strategy is useful for LC-MS/MS data interpretation in modification mapping experiments, the set of references and the library from E. coli RNase T1 digests were used. The modification status of E. coli tRNAs is well characterized and validated through independent analysis [77, 88, 98]. Each reference oligonucleotide has a unique ID that combines origin, sequence, precursor mass (m/z) and charge state. The entire LC-MS/MS run from an RNase digest is the library. Both the reference and the library were converted to MGF format and passed through the spectral matching pipeline (Figure 2.2). The NIST MS search v2 software was then used to search the library for the best match to the reference.
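Because every reference carries its origin, sequence, precursor m/z and (optionally) charge state in its ID, the ID can be split back into those fields programmatically. The sketch below assumes the underscore-separated pattern seen in IDs such as `EC_AU[t6A]AGp_899.9_2CS` and `Phe_A[s4U]AGp_671.25`; the names `ReferenceID`, `parse_reference_id` and `residues` are hypothetical and not part of any software used in this work.

```python
import re
from typing import List, NamedTuple, Optional


class ReferenceID(NamedTuple):
    origin: str
    sequence: str
    precursor_mz: float
    charge: Optional[int]


def parse_reference_id(ref_id: str) -> ReferenceID:
    """Split 'origin_sequence_mz[_NCS]' into its fields."""
    parts = ref_id.split("_")
    charge = None
    # An optional trailing token such as '2CS' encodes the charge state.
    if re.fullmatch(r"\d+CS", parts[-1]):
        charge = int(parts[-1][:-2])
        parts = parts[:-1]
    if len(parts) != 3:
        raise ValueError(f"unexpected reference ID layout: {ref_id!r}")
    origin, sequence, mz = parts
    return ReferenceID(origin, sequence, float(mz), charge)


def residues(sequence: str) -> List[str]:
    """Tokenize a sequence; bracketed modifications stay together, 'p' is dropped."""
    return re.findall(r"\[[^\]]+\]|[ACGU]", sequence)
```

For instance, `parse_reference_id("EC_AU[t6A]AGp_899.9_2CS")` yields origin `EC`, sequence `AU[t6A]AGp`, precursor m/z 899.9 and charge 2, and `residues("AU[t6A]AGp")` returns the five residues with `[t6A]` kept as a single token.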

2.3.2 Spectral matching by NIST Mass Spectral Search Program (v2.0)

The NIST Mass Spectral Search Program is shown below (Figure 2.3). NIST uses a dot product algorithm, which scores the cosine of the angle between the two mass spectra. If the two mass spectra are perfectly aligned (in m/z values and ion intensities), the angle between them is zero (θ = 0°) and cos θ = 1. The opposite case is a pair of orthogonal spectra (θ = 90°), where cos θ = 0 (no peaks are matched). The maximum score in the program is 1000. The two library search options for retrieving matching spectra are (a) identity search (quick, normal, MS/MS) and (b) similarity search (simple, hybrid, neutral loss, MS/MS in EI). It is important that the user has an end goal in mind before using this tool to facilitate modification mapping data analysis. The creators of the software recommend which search option to use depending on the question asked by the end user. If the question is “show everything in my experimental data (library) that looks remotely similar to this oligonucleotide (reference)”, the similarity search is the best option, as it is optimized to find spectra with similar features in the library. The identity search operates under the assumption that the molecule of interest is in the library. If the question is “is this spectrum (reference) in the library?”, then the identity search is recommended. I first tested whether the two search types give different results using reference RNase T1 digests of E. coli, and evaluated the score outputs, precursor mass and scan number.
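The cosine (dot product) score described above can be illustrated with a simplified calculation scaled to a maximum of 1000. This is a sketch of the general idea only: the NIST program's own peak weighting and scaling are not reproduced, the unit-width binning and the function names are assumptions, and the optional `ignore_unmatched_query` flag mimics a reverse-style score by dropping query peaks that have no counterpart in the library spectrum.

```python
import math
from typing import Dict, Iterable, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_peaks(peaks: Iterable[Peak], width: float = 1.0) -> Dict[int, float]:
    """Sum intensities into m/z bins of the given width (unit bins by default)."""
    binned: Dict[int, float] = {}
    for mz, intensity in peaks:
        key = round(mz / width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(query: Iterable[Peak], library: Iterable[Peak],
                 ignore_unmatched_query: bool = False) -> int:
    """Return 1000 * cos(theta) between two binned spectra."""
    q = bin_peaks(query)
    lib = bin_peaks(library)
    if ignore_unmatched_query:
        # Reverse-style scoring: query peaks absent from the library are ignored.
        q = {k: v for k, v in q.items() if k in lib}
    dot = sum(v * lib.get(k, 0.0) for k, v in q.items())
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in lib.values())))
    return round(1000 * dot / norm) if norm else 0
```

Two identical spectra score 1000 and two spectra with no peaks in common score 0. When the query contains a peak that the library spectrum lacks, the reverse-style score is at least as high as the plain score, which mirrors the general pattern of reverse match exceeding match.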


Figure 2.3 Screenshot of the NIST MS search v2 software. a) Reference (search) b) histogram of the 100 hits c) list of the hits with the score (match, reverse match) d) reference spectrum and m/z of fragment ions e) head to tail plot of the reference and the hit in the library f) spectra of the hit and m/z of fragment ions.

The software returns 100 spectra that are potentially similar or identical to the reference oligonucleotide for the end user to evaluate. For the oligonucleotide EC_AU[t6A]AGp_899.9_2CS, the identity and similarity searches gave a consistent result: the top two results are the correct scan numbers for the reference oligonucleotide in the experimental data. This was verified by checking the scan numbers in the raw file. Both search types give a Match (M) and Reverse Match (RM) score, but only the identity search and the similarity MS/MS in EI option give a probability score (Table 2.1 and Figure 2.4). There are no differences in the results among the sub-options of each search type, except for Identity (MS/MS) and Similarity (Neutral loss), which gave no output. Several other oligonucleotides were tested to establish the best cut off scores for the identity search (Table 2.2).

Table 2.1. Results obtained for the reference spectrum oligo AU[t6A]AGp_899.97 using Identity (Quick, Normal, MS/MS) and Similarity (Simple, Hybrid, Neutral Loss, and MS/MS in EI).

Type                       Match   Reverse Match   Probability   Scan Number   Precursor mass   Hit number
Identity, Quick            743     801             97.4          3014          900.32           1
Identity, Normal           743     801             97.4          3014          900.32           1
Identity, MS/MS            -       -               -             -             -                -
Similarity, Simple         778     863             -             3014          900.32           1
Similarity, Hybrid         778     863             -             3014          900.32           1
Similarity, Neutral Loss   -       -               -             -             -                -
Similarity, MS/MS in EI    328     672             38.0          3014          900.32           1


Figure 2.4 Head to tail plot of the hit 1 for the oligo AU[t6A]AGp. The top spectrum is the reference and the bottom is the scan 3014 from the library.

Table 2.2. Results of searches of modified reference spectra in identity (quick) and similarity (simple).

                                              Identity                                Similarity
Reference oligonucleotide                     m/z       Match   R. Match   Prob       m/z       Match   R. Match
Phe_A[s4U]AGp_671.25                          671.05    673     740        76.2       671.05    689     790
Pro_AU[s4U]Gp_659.24                          659.62    647     760        79.5       659.62    635     810
Arg_A[D]AGp_663.74                            663.81    729     805        77.0       663.81    719     844
Ala_CUU[cmo5U]Gp_829.45                       829.48    649     710        59.6       829.48    673     763
Gln_U[Um]U[cmnm5s2U]UGp_1004.51               1004.12   776     800        74.9       1004.12   888     927
Leu_AA[D][D][Gm]Gp_997.62                     997.30    745     783        65.3       997.30    810     867
Thr_[m7G]UCACCAGp_1299.82                     1299.81   667     765        91.9       1299.81   699     839
Val_CACCUCCCU[cmo5U]AC[m6A]AGp_1606.68        1606.83   751     794        57.8       1606.83   797     873
Ser_A[ms2i6A]AACCGp_1201.73                   1201.63   719     781        38.2       1201.63   761     862
Ser_CUCCC[s2C]UGp_1257.34                     1258.66   666     704        77.8       1258.66   674     735
Lys_ACU[mnm5s2U]UU[t6A][Ψ]CAAUUGp_1654.26     1654.78   681     735        48.7       1654.78   704     796
Leu_A[Ψ]UAA[ms2i6A]A[Ψ]CCCUCGp_1659.55        1659.82   741     784        60.7       1659.82   785     853
Leu_A[Ψ]U[Cm]AA[ms2i6A]A[Ψ]CAACCGp_1644.16    1644.35   619     677        75.7       1644.35   607     693
Cys_U[s4U]AACAAAGp_1470.36                    1471.18   655     717        64.4       1471.18   664     759
Tyr_ACU[oQ]UA[ms2i6A]A[Ψ]CUGp_1371.35         1371.54   725     756        28.5       1371.54   798     853
Ile_ACU[k2C]AU[t6A]A[Ψ]CGp_1261.7             1262.17   757     806        51.3       1262.17   791     875


I then investigated why the MS/MS (Identity) and neutral loss (Similarity) options did not give any output. The MS/MS option extracts scans from the library according to a priority list, with the highest priority given to scans within a strict precursor mass range of ±0.01 ppm. All the spectra were collected on an ion trap instrument, which has low mass accuracy, and this explains why there is no output from the identity MS/MS option. For the neutral loss (similarity) option, a maximum of five neutral losses within 64 m/z of the precursor mass are used for the spectral matching. In addition, the neutral losses in the reference have to be within a factor of four of their abundances in the library. The highest peak in the spectrum is the loss of 145 Da, which corresponds to the neutral loss of the threonylcarbamoyl moiety. This is outside the preset 64 m/z range and, since the similarity search has no peak ranking or scaling, the algorithm is more sensitive to (a) whether neutral loss ions are present and (b) whether their intensities match those in the library. These results are consistent across all oligonucleotides used in this program.
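The window argument above is simple arithmetic, sketched here as a check. The 64 m/z window and the 145 Da loss are the values quoted in the text; the function names are hypothetical, and the sketch assumes that the product ion keeps the precursor charge, so a neutral loss of mass M moves the peak by M/z on the m/z axis.

```python
def neutral_loss_offset_mz(loss_da: float, charge: int) -> float:
    """m/z distance between the precursor and its neutral-loss product ion."""
    return loss_da / abs(charge)


def within_neutral_loss_window(loss_da: float, charge: int,
                               window_mz: float = 64.0) -> bool:
    """True if the neutral-loss peak falls inside the window below the precursor."""
    return neutral_loss_offset_mz(loss_da, charge) <= window_mz
```

For the doubly charged AU[t6A]AGp precursor, a 145 Da loss sits 72.5 m/z below the precursor, so `within_neutral_loss_window(145.0, 2)` is `False`, whereas a small loss such as 18 Da would fall inside the window.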

2.3.3 Score outputs for modified and unmodified reference spectra

To identify the best cut off scores, the match, reverse match and probability were compared. The reverse match consistently gives a higher score than the match score. This is due to the nature of the algorithm: in this search type, the peaks in the reference that are not in the library are ignored [99]. As a result, fewer peaks need to be aligned, hence the higher score (Table 2.2). In general, a combination of M (>600) and RM (>700) along with the precursor m/z value gives a confident identification. Scores that fall between (600

Another feature in spectral matching is an assigned probability score. However, I did not find this to be a useful criterion for assessing the accuracy of the identification. An example is the aforementioned RNase T1 digestion product AU[t6A]AGp. Two matches were generated, and both were manually verified to be correct. The program assigned hit 1 a probability of 97.4 and hit 2 a probability of 2.45 because it distributes the total probability of 1.0 (100%) across the two possibilities, taking other criteria into account. Another example was noted with the RNase T1 digestion product Tyr_ACU[oQ]UA[ms2i6A]A[Ψ]CUGp. A total of six CID MS/MS scans of this product were present in the LC-MS/MS data (the library), and each of these scans, while matching the reference spectrum, was ranked by probability based on the initial spectral matching score criteria.

Table 2.3. Results of searches of unmodified reference spectra in identity (quick) and similarity (simple).

                                Identity                                Similarity
Reference oligonucleotide       m/z       Match   R. Match   Prob       m/z       Match   R. Match
Tyr_CCAAAGp_979.14              980.16    540     615        68.4       980.16    505     614
Ser_AAAGp_674.37                675.27    703     751        82.4       675.27    681     767
Ser_UCAAAAGp_1144.51            1144.36   557     635        69.9       1144.36   558     677
Asp_AAUACCUGp_1285.85           1287.24   687     749        45.1       1287.24   713     810
Trp_CCCCUGp_944.17              945.51    691     771        52.4       945.51    713     853
Leu_CCUCCCGp_1096.91            1096.60   540     632        79.0       1096.60   565     708
Val_UCAUCACCCACCA_1333.62       1333.60   756     805        45.7       1333.60   798     884
Val_UCCACUCGp_1261.56           1262.17   752     806        48.9       1262.17   779     874
Thr_UAAUCAGp_1133.9             1134.71   695     744        72.5       1134.71   724     803

Because lower resolution and mass accuracy ion trap data were used in testing and developing this spectral matching approach, certain sequences can give conflicting results that should be evaluated before an assignment is made (Table 2.3). For example, the reference spectrum Tyr_CCAAAGp has a precursor mass of 979.14. Searching the LC-MS/MS data set using this reference yielded an identification whose precursor mass was off by 1 Da, consistent with a C-U difference between the reference and library. Upon closer inspection, the library spectrum corresponds to the oligonucleotide CUAAAGp, as the y-type ion series was the same but the c-type ions differed by 1 Da from c2 to c5.

To test the robustness of the software, decoy spectra from signature digestion products of the fungus C. neoformans were searched against the E. coli library. The two organisms are phylogenetically distinct, one being a eukaryote and the other a bacterium, hence we expect some modifications to be distinct to one kingdom. A CLUSTAL alignment of the tRNAs from C. neoformans and E. coli shows very low homology except for the highly conserved T-loop sequence (UUCG). In addition, to qualify as a decoy spectrum, each genomic sequence of the digestion product tested from C. neoformans was checked against E. coli tRNA sequences to ensure no overlap exists. Consistently, all the top results (Table 2.4) have low scores, below the set thresholds for a correct match.

Table 2.4. Results of searches of decoy spectra from the C. neoformans T1 digest using similarity search.

Reference oligonucleotide                      m/z       Match   R. Match   In the library?
Cneo_CUU[mcm5s2U]UCACG_1459.51                 1448.48   208     357        No
Cneo_A[io6A]AUCGp_1022.12                      1023.10   208     397        No
Cneo_UCAUU[I]p_949.18                          958.14    207     282        No
Cneo_CA[i6A]ACCGp_1166.14                      1110.50   232     400        No
Cneo_[Um]U[mcm5s2U]UGp_                        843.84    177     248        No
Cneo_[m1A]AUCCCACUUCCCACUCCA_1467.74           1464.07   396     520        No
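The acceptance criteria used in this chapter (match above 600, reverse match above 700, together with agreement of the precursor m/z) can be written as a single check. The function name and the precursor tolerance are assumptions for illustration: the text does not state a numeric precursor criterion for acceptance, so the ±0.8 search tolerance from the experimental section is borrowed here as a default and may be too strict for some entries.

```python
def is_confident_match(match: int, reverse_match: int,
                       reference_mz: float, hit_mz: float,
                       min_match: int = 600, min_reverse: int = 700,
                       mz_tol: float = 0.8) -> bool:
    """Apply the M > 600 and RM > 700 cut-offs plus a precursor m/z check."""
    scores_ok = match > min_match and reverse_match > min_reverse
    precursor_ok = abs(hit_mz - reference_mz) <= mz_tol  # assumed tolerance
    return scores_ok and precursor_ok
```

Using values reported in the tables, the modified reference Phe_A[s4U]AGp (671.25; hit at 671.05 with 673/740) passes, the decoy Cneo_A[io6A]AUCGp (1022.12; hit at 1023.10 with 208/397) fails, and the unmodified Tyr_CCAAAGp (979.14; hit at 980.16 with 540/615) also fails, in line with its reassignment to CUAAAGp.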

2.3.4 N6-threonylcarbamoyladenosine (t6A) in tRNA Ile_GAU of Streptococcus mutans

The method of spectral comparison was applied to detect the t6A modification in the anticodon loop of tRNA Ile_GAU of the bacterium S. mutans. The synthesis of t6A involves the production of L-threonylcarbamoyladenylate (TC-AMP) by the enzyme TsaC or TsaC2. The TC-AMP is added to adenosine 37 by the threonylcarbamoyltransferase complex, which is composed of the three subunits TsaD, TsaB and TsaE. To elucidate which subunit is essential for the t6A37 modification, gene deletions of TsaC, TsaB and TsaE were prepared and total tRNA was analyzed for the oligonucleotide AU[t6A]AGp (m/z 899.97). This target oligonucleotide is the same as that in tRNA Ile_GAU from E. coli. The reference spectrum from E. coli was used and passed through the software protocol as described previously. The results are presented in Table 2.5.

Based on the similarity match and reverse match scores, the oligonucleotide is detected in the wild type and the TsaE deletion mutant but not in the TsaC and TsaB deletion mutants. These results were verified by manually searching for the precursor mass and manually annotating the mass spectrum (Figure 2.6). Interestingly, the top match for the TsaC and TsaB deletion mutant samples is the unmodified equivalent of the reference spectrum (i.e., AUAAGp). The two oligonucleotides have the same c-type and y-type product ions up to the location of t6A, after which a mass shift of 145 Da is found for all other product ions (Table 2.7). If the unmodified oligonucleotide is used as the reference, the results consistently show that AUAAGp is present in all the mutant samples (Table 2.6).

Table 2.5. Similarity search for AU[t6A]AGp_899.97 against S. mutans libraries (wild type, ΔC, ΔB and ΔE).

Library     m/z      Match   R. Match   In the library?
Wild type   900.64   818     901        yes
tsaC        827.15   351     795        no
tsaB        827.13   344     800        no
tsaE        899.60   794     881        yes

Table 2.6. Similarity search for E.coli_AUAAGp_827.41 against S. mutans libraries (wild type, ΔC, ΔB and ΔE).

Library     m/z      Match   R. Match   In the library?
Wild type   827.11   662     710        yes
tsaC        827.14   806     867        yes
tsaB        827.11   826     874        yes
tsaE        827.12   623     751        yes


Figure 2.5 Extracted ion chromatogram of the wild type and mutants for the oligo AU[t6A]AGp (m/z 899.97). ΔtsaB and ΔtsaC are completely devoid of the t6A modification in the anticodon loop of tRNA Ile_GAU, whereas ΔtsaE still has trace amounts of the modification.

Table 2.7. List of m/z values for c- and y-type fragment ions for the anticodon loop oligonucleotide of tRNA Ile_GAU, both unmodified (AUAAGp) and modified (AU[t6A]AGp).

Ion series   Charge   AUAAGp c   AUAAGp y   AU[t6A]AGp c   AU[t6A]AGp y
1            -1       362        328        362            328
2            -1       691        634        691            634
3            -1       1020       963        1165           1108
4            -2       662        1292       735            1437
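The relationship between the two ion series in Table 2.7 can be reproduced with a short calculation: every fragment that contains the modified residue is shifted by the modification mass divided by the fragment charge. The sketch below uses the nominal 145 Da shift quoted for t6A and the singly charged c1-c3 and y1-y3 values from the table; the function names are hypothetical, and the series helper assumes singly charged fragments.

```python
from typing import List, Sequence, Tuple

T6A_SHIFT_DA = 145.0  # nominal mass shift for t6A quoted in the text


def shift_mz(mz: float, mass_shift_da: float, charge: int) -> float:
    """Shift one fragment m/z by a modification mass at the given charge."""
    return mz + mass_shift_da / abs(charge)


def shifted_c_and_y(c_mz: Sequence[float], y_mz: Sequence[float],
                    mod_position: int, length: int,
                    mass_shift_da: float = T6A_SHIFT_DA
                    ) -> Tuple[List[float], List[float]]:
    """Shift singly charged c_n / y_n ions that contain the modified residue.

    c_n covers residues 1..n from the 5' end; y_n covers the last n residues.
    """
    first_y = length - mod_position + 1  # smallest y ion containing the residue
    c = [shift_mz(mz, mass_shift_da, 1) if n >= mod_position else mz
         for n, mz in enumerate(c_mz, start=1)]
    y = [shift_mz(mz, mass_shift_da, 1) if n >= first_y else mz
         for n, mz in enumerate(y_mz, start=1)]
    return c, y
```

With t6A at position 3 of the 5-mer, the unmodified c series (362, 691, 1020) becomes (362, 691, 1165) and the y series (328, 634, 963) becomes (328, 634, 1108), matching the tabulated values. For a doubly charged fragment the same modification moves the peak by only 72.5 m/z.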


Figure 2.6 Head to tail plot of the hit 1 for the oligo AU[t6A]AGp. The top spectrum is the reference and the bottom is hit 1, which corresponds to the unmodified version of the reference.

2.3.5 Method Limitations

The re-purposing of the NIST MS search v2 software has been demonstrated to work well for comparative analysis of CID spectra. As demonstrated, the best reference (search spectrum) is an oligonucleotide with a post-transcriptional modification. Unlike proteins, where there are 21 possible amino acids for every position (not including modifications), RNA has only four canonical bases. As a consequence, high sequence similarity exists even among tRNAs from different species. In the present work, a linear ion trap mass analyzer was used, hence oligonucleotides that differ only in C and U residues can be very challenging to discriminate by spectral matching alone, because the nominal mass of U coincides with the A+1 isotope of C. However, if a modification (of more than 150 possibilities) is present, the mass shift is distinctive enough that the spectrum becomes its own unique fingerprint.

Ideally, comparative analysis of tandem mass spectra should be independent of the type of mass spectrometer (trap vs beam) or instrument conditions. However, data collected in the lab clearly showed that a different fragmentation pattern is produced in a beam-type instrument (G2 Synapt) than in a trapping-type instrument (LTQ-XL). In the G2, the major ions are a-B and w-type ions, whereas in the LTQ-XL, c- and y-type ions predominate. This can be challenging if the references and the library are generated on two different instruments. For routine analysis, one may need to be consistent with the instrument used. High accuracy mass spectrometers offer the advantage of resolving C vs U differences and sequence isomers, which can improve spectral matching by narrowing the window for precursor selection. All in all, consistency should be observed when analysis is done by spectral matching.

2.4 Conclusion

This chapter demonstrates the use of a well-characterized oligonucleotide as a reference spectrum for CID mass spectral matching. The practical application of this method is the routine analysis of oligonucleotides for positive identification, for example during chromatography optimization or in gene deletion and knockout studies. The advantages of this method are targeted detection, high specificity and the elimination of rigorous library building. Although in this chapter the LC-MS/MS experimental run served as the library, one can reverse the approach by using the collection of well-characterized spectra as the library and the unknowns as the search spectra. The output will then show oligonucleotides with sequence/modification information and precursor m/z values, in contrast to the way it was done in this chapter, where the only information available is the precursor mass and retention time. Over time, spectra are collected and added to the pool of references, which will eventually become part of a routine lab analysis.

As demonstrated in this chapter, a reference spectrum can be used to automate LC-MS/MS data analysis for modification mapping and to obtain qualitative information more quickly. In the next chapter, I will present a second way in which a reference can be used, this time to obtain more precise quantitative information.

Chapter 3 Stable Isotope Labeling for Improved Comparative Analysis of RNA Digests by Mass Spectrometry

3.1 Introduction

This chapter demonstrates a second approach for using a reference as an internal standard for improved comparative analysis and quantification. The reference is an in-vitro transcribed RNA, which is the same sequence as the sample except that it is not modified. The method described in this chapter is an expansion of the original comparative analysis approach (CARD) with mass spectrometry analysis. The Stable Isotope Labeling for Comparative Analysis of RNA Digests (SIL-CARD) approach is more widely applicable for tRNA analysis as one only needs to know the genomic information for the tRNA of interest to generate the isotopically labeled transcript(s) as a reference.

Doublets are easily characterized by the mass shift due to the 13C and 15N enrichment in the isotopically labeled precursor, which can be tailored to the RNA of interest. If nothing co-elutes with the reference, then the sample oligonucleotide has a modification. Further analysis of the oligonucleotide through its CID fragmentation will determine the modification and its location in the sequence. With SIL-CARD, the advantages of previous quantitative approaches to RNA modification mapping can now be brought to bear on another class of important modified RNAs.

This work has been published in the Journal of the American Society for Mass Spectrometry (2017), 28, pp. 551-561.

3.2 Experimental

3.2.1 Materials

Escherichia coli tRNA Tyr II was purchased from Sigma Aldrich (St. Louis, MO, USA).

Deoxynucleotide triphosphate mix, PfuI DNA polymerase and 10X PCR buffer were used as received from Promega (Madison, WI). Triethylamine (TEA), 1,1,1,3,3,3-hexafluoroisopropanol (HFIP), ammonium acetate, ethanol and the MEGAclear™ Transcription kit were obtained from Thermo Fisher Scientific (Waltham, MA). Qiagen QIAquick® (Hilden, Germany) and Epicentre AmpliScribe™ (Madison, WI) kits were used. HPLC-grade methanol and acetonitrile were obtained from Honeywell Burdick & Jackson, Inc. (Muskegon, MI). UltraPure agarose was purchased from Invitrogen Corporation (Carlsbad, CA). RNase T1 and bacterial alkaline phosphatase were obtained from Worthington (Lakewood, NJ). Sep-Pak C18 cartridges were obtained from Waters (Milford, MA). Nanopure water (18 MOhms) from a Barnstead (Dubuque, IA) nanopure system was used throughout.

3.2.2 Amplification of E. coli tDNA Tyr II by Polymerase Chain Reaction (PCR).

The gBlocks® gene fragment for Escherichia coli tDNA Tyrosine II, the forward primer (which contains the T7 RNA polymerase promoter sequence) and the reverse primer (Table 3.1) were obtained from Integrated DNA Technologies, Inc. (Coralville, IA). The PCR master mix was prepared following the vendor protocol. For the negative control, the same master mix was prepared without the gene fragment. Amplification was as follows: initial denaturation for 2 min at 94 °C, then denaturation, annealing and extension at 92 °C, 55 °C and 72 °C, respectively, for 30 s each, with the cycle repeated 28 times. Final extension was held for 5 min at 72 °C. The PCR products were purified with the QIAquick® clean-up kit and analyzed on a 1.5% agarose gel with a standard DNA marker. The amplified tDNA was used as the template for subsequent T7 RNA in vitro transcription.

Table 3.1 The gene sequences of the E. coli tRNA Tyr II tDNA, forward primer (with the T7 RNA polymerase promoter region) and reverse primer.

tDNA            GGTGGGGTTCCCGAGCGGCCAAAGGGAGCAGACTGTAAATCTGCCGTCATCGACTTCGAAGGTTCGAATCCTTCCCCCACCACCATTAAGCTTTCCGGAT
Forward primer  GGATCCTAATACGACTCACTATAGGGGTGGGGTTCCCGAGCGGCCAAA
Reverse primer  TTTTGAGCTTTGGTGGTGGGGGAA

3.2.3 In vitro transcription of light (regular) and heavy (isotopically enriched) E. coli tRNA Tyr II.

The transcription reactions were assembled from the AmpliScribe™ High Yield Transcription Kit: tDNA template (PCR product), 10X reaction buffer, dithiothreitol (DTT), RNase inhibitor, inorganic phosphatase, T7 RNA polymerase, ATP, CTP, UTP and GTP, brought to a total volume of 50 µL with sterile water. For isotopically enriched transcripts, GTP was replaced by 13C,15N-labeled GTP (98 atom %, Sigma Aldrich). After 1 h of incubation at 37 °C, DNase I and EDTA were added to quench the reaction. The transcripts were purified (MEGAclear™ Transcription kit) following vendor instructions. The RNA was quantified using an Implen Pearl NanoPhotometer (Implen GmbH, Munich, Germany).

3.2.4 RNase T1 digestion

All RNA samples were digested using RNase T1, which was purified as previously described [100]. CARD analyses involved combining 1 μg of isotopically labeled transcript with 1 μg of E. coli tRNA Tyr II standard prior to enzymatic digestion. The samples were digested with 50 U RNase T1 per μg RNA for 2 h at 37 °C, vacuum dried and stored at -20 °C until further analysis. Samples were resuspended in 10 μL of mobile phase A prior to LC-MS/MS analysis.

3.2.5 Liquid Chromatography and tandem mass spectrometry.

The RNase digestion products were separated on an XBridge-MS C18 column (1.0 x 150 mm, 3.5 μm particle size, 150 Å pore size; Waters, Milford, MA) at a flow rate of 30 μL min⁻¹. Mobile phase A (MPA) consisted of 8 mM TEA/200 mM HFIP, pH 7, and mobile phase B was 50% MPA and methanol. The LC gradient started at 10% B, increased linearly to 60% B over 32 min, and was followed by 95% B for 5 min before a minimum 20 min re-equilibration period at 10% B.

All LC-MS/MS analyses were performed using a MicroAS autosampler, Surveyor MS pump HPLC system, and Thermo LTQ-XL (Thermo Scientific, Waltham, MA) linear ion trap mass spectrometer with an ESI source. The capillary temperature was set at 275 °C, with a spray voltage of 4 kV and 35, 14 and 10 arbitrary flow units of sheath, auxiliary and sweep gas, respectively. The mass spectra were recorded in negative polarity and profile data type. Scan event 1 was set as a zoom scan with a mass range of m/z 500-2000. Sequence information for the four most abundant digestion products was obtained in scan events 2-5 by data-dependent CID. Data acquisition was through Thermo Xcalibur software.

3.2.6 Calibration curve and data analysis.

A calibration curve was generated by analyzing mixtures of heavy (isotopically enriched) and light (regular) transcripts. The heavy and light transcripts were combined in ratios of 10:1 to 1:10 ([heavy]/[light]). Each mixture was prepared and analyzed in triplicate. The ion abundance of an RNase T1 digestion product was measured using extracted ion chromatograms (XICs) of the major isotope. The ion abundance ratio, averaged across the extracted ion chromatogram, was calculated using equation 3.1,

$$\frac{I_{heavy}}{I_{light}} = \frac{I_{A+15}}{I_{A}} \qquad \text{(eq 3.1)}$$

where I_A represents the monoisotopic peak abundance of the light digestion product, and I_{A+15} represents the monoisotopic peak abundance of the heavy digestion product. The characteristic mass shift of 15 Da is due to the ¹³C₁₀ and ¹⁵N₅ in the GTP precursor. All ratios were corrected for the 98% isotopic purity of GTP. The arithmetic mean and standard deviation were calculated for each ratio. A calibration curve relating the mean ion abundance ratios to the prepared ratios was constructed by a linear least squares fit.
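The ratio calculation and purity correction described above can be sketched in a few lines of Python. The text does not state how the 98% purity correction was applied, so the scheme below is only an assumption: it treats the enrichment as independent for each of the 15 labeled atoms, so that only a fraction 0.98^15 of the heavy GTP carries the full +15 Da label. The function names and the example peak abundances are hypothetical.

```python
def fully_labeled_fraction(atom_purity=0.98, labeled_atoms=15):
    """Fraction of heavy GTP carrying all 15 heavy atoms.

    Assumption (not from the source): each of the 10 carbon and 5 nitrogen
    positions is enriched independently at the stated atom purity.
    """
    return atom_purity ** labeled_atoms


def abundance_ratio(i_heavy, i_light, atom_purity=0.98, labeled_atoms=15):
    """Heavy/light ratio (eq 3.1) with an assumed purity correction.

    i_heavy: monoisotopic peak abundance of the heavy product (I_A+15)
    i_light: monoisotopic peak abundance of the light product (I_A)
    """
    corrected_heavy = i_heavy / fully_labeled_fraction(atom_purity, labeled_atoms)
    return corrected_heavy / i_light


# Hypothetical peak abundances, for illustration only.
example_ratio = abundance_ratio(i_heavy=7.4e5, i_light=1.0e6)
print(f"fully labeled fraction: {fully_labeled_fraction():.4f}")
print(f"corrected heavy/light ratio: {example_ratio:.3f}")
```

If the correction was instead applied in another way (for example from a measured isotope distribution of the labeled GTP), only `fully_labeled_fraction` would need to change.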

The E. coli tRNA Tyr II gene sequence was obtained from the Genomic tRNA Database

(http://gtrnadb.ucsc.edu/) [101]. The E. coli Tyr II sequence annotated with posttranscriptional

modifications was obtained from Modomics (http://modomics.genesilico.pl/) [98]. The Mongo

Oligo Calculator (http://mods.rna.albany.edu/masspec/Mongo-Oligo) was used to calculate the

theoretical molecular mass of the RNase T1 digestion products from all RNA sequences, and these

masses were used to identify the appropriate XIC channels to monitor during quantitative analyses.

3.3 Results and Discussion

The advantages of a transcript-based in vitro synthesized internal standard or reference for the

CARD approach were recognized within the initial report by Taoka et al. [102]. Our initial approach,

described here, was to examine the utility of such transcripts for the quantitative characterization of

individual tRNAs. However, the method itself should be readily extended to more complex mixtures

of tRNAs although that has not yet been explored. A schematic illustration of our SIL-CARD

workflow is shown in Figure 3.1. Isotopically labeled guanosine triphosphate precursor is also used

here, as RNase T1 digestion will ensure that each heavy digestion product (except the 3’-terminus of

the tRNA) will have the characteristic 15 Da mass shift.

(a)

(b) In vitro transcript: 5’- pppGGUGGGGUUCCCGAGCGGCCAAAGGGAGCAGACUGUAAAUCUGCCGUCACAGACUUCGAAGGUUCGAAUCCUUCCCCCACCACCA -3’

E. coli Tyr II: 5’- pGGUGGGG[s4U][s4U]CCCGAGCGGCCAAAGGGAGCAGACU[Q]UA[ms2i6A]AUCUGCCGUCACAGACUUCGAAGG[m5U][Ψ]CGAAUCCUUCCCCCACCACCA -3’

Figure 3.1 (a) Schematic outline of stable isotope labeling- comparative analysis of RNA digest (SIL- CARD). An in vitro transcribed RNA, synthesized with 13C and 15N enriched guanosine triphosphate, is the reference for the RNA sample. When RNase T1 digestion products of the sample are unmodified, the mass spectrum will reveal a doublet, separated by 15 Da, from the sample and reference. Singlets, which contain post-transcriptional modifications, can be characterized by MS/MS. (b) Sequences for the E. coli tRNA Tyr II transcript (reference) and E. coli tRNA Tyr II RNA (sample). The sequences can be digested theoretically (i.e. RNase T1) to generate a predicted list of singlets and doublets.

71

3.3.1 Identification and characterization of in vitro transcribed E. coli tRNA Tyr II

To verify that isotopically labeled transcripts would co-elute and provide sufficient m/z differences to be used in the CARD method, light (regular) and heavy (isotopically enriched) transcripts of tRNA tyrosine II (GUA) patterned from E. coli were synthesized. Equal amounts of the light and heavy transcripts were combined and digested with RNase T1 to generate a series of oligonucleotides, which were then separated by ion pairing liquid chromatography and detected by

ESI-MS/MS. A representative example of the data obtained is shown in Figure 3.2. Here, the expected digestion product UUCGp is detected from both the light (m/z 639.1, -2 charge state) and heavy (m/z 646.6, -2 charge state) transcripts. The mass shift of 7.5 Da is as expected, taking into account a -2 charge state. These digestion products co-elute, showing no retention time effects due to the labeled guanosine as expected based on previous reports (Fig. 3.2a) [102].
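The m/z values quoted for UUCGp can be checked with a short calculation. The sketch below assumes a linear oligonucleotide with a 5'-hydroxyl and a 3'-phosphate, and uses standard monoisotopic nucleotide residue masses and isotope mass differences; these constants and the function name `mz` are supplied here for illustration and are not taken from the text.

```python
# Monoisotopic masses (Da) of nucleotide residues (nucleoside monophosphate minus water).
RESIDUE = {"A": 329.05252, "C": 305.04129, "G": 345.04744, "U": 306.02530}
WATER = 18.01056
PROTON = 1.00728
# Mass added by one 13C10,15N5-labeled guanosine: 10 x (13C - 12C) + 5 x (15N - 14N).
LABEL_SHIFT = 10 * 1.003355 + 5 * 0.997035


def mz(sequence, charge, labeled_g=0):
    """m/z of the [M - zH]z- ion of a 5'-OH, 3'-phosphate oligonucleotide.

    charge is the number of negative charges; labeled_g is the number of
    isotopically labeled guanosines (one per RNase T1 product in SIL-CARD).
    """
    neutral = sum(RESIDUE[nt] for nt in sequence) + WATER + labeled_g * LABEL_SHIFT
    return (neutral - charge * PROTON) / charge


light = mz("UUCG", charge=2)
heavy = mz("UUCG", charge=2, labeled_g=1)
print(f"light {light:.2f}, heavy {heavy:.2f}, difference {heavy - light:.2f}")
```

With these constants the light and heavy ions fall near m/z 639.07 and 646.58, which agrees with the observed values to within the mass accuracy expected of a linear ion trap.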

Due to the reproducible nature of CID of RNase T1 digestion products, the y-type fragment ion series should contain the isotopic label for the heavy transcript. In other words, typical c-type fragment ion series for the two samples should be identical, but the y-type fragments will differ by the

+15 Da mass shift due to the labeled guanosine on the 3’-terminus of each RNase T1 digestion product. As seen in Figures 3.2b and 3.2c, indeed c-type fragment ions are identical for these two digestion products, but the y-type fragment ions contain the expected mass shift. These data confirm the localization of the isotopic label to only the guanosine residue. A similar analysis was performed on all other expected RNase T1 digestion products from both the light and heavy transcripts to confirm their utility in the SIL-CARD approach (Table 3.2).

Figure 3.2 Mass spectral data for the E. coli tRNA Tyr II RNase T1 digestion product UUCGp. (A) The doubly charged digestion products from both sample and reference are detected with the expected m/z difference of 7.5. MS/MS of the (B) sample and (C) reference digestion products. The 15 Da increase in the y-type fragment ions of the reference is due to the 13C and 15N labeled guanosine triphosphate. *denotes isotopically labeled oligonucleotide.

Table 3.2 The light and heavy oligonucleotide of the RNase T1 digest of the tRNA Tyr II reference. A mass shift of 15 Da is observed per charge state. The light and heavy oligo for the terminal oligo can’t be distinguished since it does not have a guanosine nucleotide.

Digestion product         Charge state   m/z (light)   m/z (heavy)
UUCCCGp                   -2             944.14        951.66
ACUGp                     -2             650.58        658.08
UAAAUCUGp                 -2             1285.68       1293.20
UUCGp                     -2             639.08        646.60
CCAAAGp                   -2             979.14        986.66
UCAUCGp                   -2             956.12        963.58
ACUUCGp                   -2             956.14        963.6
AAUCCUUCCCCCACCACCAAAG    -3             1734.66       1738.4
CUCAAAA-OH                -2             1084.66       1084.66

Use of heavy transcript as CARD reference

As an initial demonstration that in vitro transcripts can be used for CARD, the isotopically

labeled transcript of E. coli tRNA Tyr II and an E. coli tRNA Tyr II standard were combined in equal

amounts and analyzed by LC-MS/MS. Based on the known modification profile of E. coli tRNA Tyr

II, three CARD doublets should be found in the mass spectral data: CCAAAGp, UCACAGp and

ACUUCGp (RNase T1 digestion products less than 4-mers were not considered). Because the

transcript is unmodified, and the standard used here is of known modification profile, there are three

singlets from E. coli tRNA Tyr II anticipated in the experimental data: [m5U][Ψ]CGp,

[s4U][s4U]CCCGp and ACU[Q]UA[ms2i6A]A[Ψ]CUGp. Two of these RNase digestion products

should be detected as unmodified equivalent singlets from the in vitro reference: UUCG*p and

UUCCCG*p. The transcript equivalent of ACU[Q]UA[ms2i6A]A[Ψ]CUGp should not be detected, as

the modified nucleoside queuosine is post-transcriptionally added to G34 of E. coli tRNA Tyr II. Thus,

the transcript would be detected as two unmodified singlets: ACUG*p and UAAAUCUG*p.
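The prediction of doublets and singlets described above can be reproduced with a small in silico digestion. The following is a minimal sketch, assuming that RNase T1 cleaves only after unmodified G, that products shorter than four nucleotides are ignored, and that the 3'-terminal fragment is excluded because it carries no labeled guanosine. The sequences are those printed in Figure 3.1b; the function names are hypothetical.

```python
import re

# Sequences as printed in Figure 3.1b (reference transcript and modified sample).
REFERENCE = (
    "GGUGGGGUUCCCGAGCGGCCAAAGGGAGCAGACUGUAAAUC"
    "UGCCGUCACAGACUUCGAAGGUUCGAAUCCUUCCCCCACCACCA"
)
SAMPLE = (
    "GGUGGGG[s4U][s4U]CCCGAGCGGCCAAAGGGAGCAGACU[Q]UA"
    "[ms2i6A]AUCUGCCGUCACAGACUUCGAAGG[m5U][Ψ]CGAAUCCUUCCCCCACCACCA"
)


def tokenize(sequence):
    """Split into nucleosides; a bracketed modification such as [s4U] is one token."""
    return re.findall(r"\[[^\]]+\]|[ACGU]", sequence)


def rnase_t1_products(sequence, min_length=4):
    """Return the set of G-terminated digestion products of at least min_length."""
    products, current = set(), []
    for token in tokenize(sequence):
        current.append(token)
        if token == "G":  # cleavage on the 3' side of unmodified guanosine only
            if len(current) >= min_length:
                products.add("".join(current))
            current = []
    return products  # any trailing fragment without a 3' G is dropped


reference = rnase_t1_products(REFERENCE)
sample = rnase_t1_products(SAMPLE)

doublets = reference & sample            # same sequence in sample and reference
sample_singlets = sample - reference     # modified products, sample only
reference_singlets = reference - sample  # unmodified equivalents, reference only

print("doublets:", sorted(doublets))
print("sample singlets:", sorted(sample_singlets))
print("reference singlets:", sorted(reference_singlets))
```

Because queuosine is treated as a non-cleavable token, the anticodon region of the sample stays as one long product while the reference splits into ACUG and UAAAUCUG, matching the reasoning in the text.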

The first step in standard CARD analysis is to examine the data for doublets. Figure 3.3

contains the predicted doublets, which differ in mass only at the terminal guanosine residue. Figure

3.4 shows the detection of the singlets predicted from a CARD analysis performed in this manner.

Figure 3.4a reveals the expected m/z values for UUCG*p and its modified equivalent. Similarly,

Figure 3.4b reveals the expected m/z value for UUCCCG*p and its modified equivalent. In both cases, the posttranscriptional modifications lead to elution time differences between the reference and sample digestion products. Figure 3.4c confirms that the unmodified transcript yields two digestion products, which are the sequence equivalent of the modified product ACU[Q]UA[ms2i6A]A[Ψ]CUGp.

Figure 3.3 Mass spectral data for the E. coli tRNA Tyr II RNase T1 digestion products (A) CCAAAGp, (B) UCAUCGp and (C) ACUUCGp. As the digestion products from reference and sample co-elute, the reference can be used for quantitative analysis of tRNA abundance in the mass spectrum.

Figure 3.4 Extracted ion chromatograms (XICs) and mass spectra of the three expected singlets from SIL-CARD analysis of E. coli tRNA Tyr II, where the digestion products from the post- transcriptionally modified sample all elute after the tRNA transcript. (A) [m5U]ΨCGp (m/z 646.08, - 2) and UUCG*p (m/z 646.58, -2); (B) [s4U][s4U]CCCGp (m/z 960.04, -2) and UUCCCG*p (m/z 951.59, -2); (C) ACU[Q]UA[ms2i6A]A[Ψ]CUGp (m/z 1365.56, -4), ACUG*p (m/z 658.08) and UAAAUCUG*p (m/z 1293.18). The unmodified anticodon sequence (ACUGUAAAUCUGp) was not detected in the sample, hence no doublets for ACUGp and UAAAUCUGp are detected. Modified nucleosides: s4U: 4-thiouridine; Q: Queosine; ms2i6A: 2-methylthio-N6-isopentenyladenosine; Ψ: pseudouridine; m5U: 5-methyluridine.

These data demonstrate that in vitro transcripts can be used as the reference for CARD

characterization of tRNAs. Although prior versions of the CARD approach used cellular tRNAs

containing known modification profiles as the reference, the initial data analysis is similar when using

transcripts. In those earlier iterations of the method, the data is first examined to identify all doublets,

which are used to confirm sequence and modification equivalency between the reference and the

sample. After that step, the data is then examined for singlets. While prior versions of CARD require

a reverse labeling step to differentiate singlet source (either reference or sample), the use of an

unmodified transcript eliminates the need for any reverse labeling as the m/z values for all transcript digestion products will be known before analysis and only these digestion products would contain the characteristic isotopically labeled guanosine at the 3’-terminus of the digestion product.

3.3.2 Quantitative utility of CARD references

After characterizing the advantages of isotopically labeled transcripts for use as the reference in a CARD approach, we next sought to confirm that these transcripts, after enzymatic digestion and LC-MS, could also serve as internal standards to obtain quantitative information from a CARD experiment. To establish quantification of RNase T1 digestion products, a linear correlation between sample:reference ratios and the detected ion abundance ratios should exist. The unlabeled and labeled E. coli tRNA Tyr II transcripts were combined in concentration ratios ranging from 10:1 to 1:10 [labeled/unlabeled], digested with RNase T1 and analyzed by LC-MS. Surprisingly, a non-linear response was noted that was independent of the LC and ESI-MS conditions used during analysis (Table 3.3). However, when the 3’ phosphate group of each RNase T1 digestion product was removed by bacterial alkaline phosphatase prior to LC-MS analysis, a more well-behaved linear relationship between sample concentration and ion abundance was found (Figure 3.5).

Figure 3.5 (a) Representative mass spectra of various heavy:light ratios for the RNase T1 digestion product UUCCCG. (b) Calibration curve generated for the ten different heavy:light ratios of UUCCCG listed in Table 3.4.

Representative mass spectra of different mixtures of the doubly charged digestion product UUCCCG are shown in Figure 3.5a. As seen in these examples, the 15 Da mass difference enables quantification more easily than in prior CARD approaches. Ion abundance ratios were calculated using Equation 3.1 and are listed in Table 3.4. The RSD for these ratios was within 10% throughout. From these data, a representative calibration curve was generated, as shown in Figure 3.5b. These data are well-behaved when analyzed by linear least squares, yielding a slope close to 1.0 with only a slight positive bias towards the isotopically labeled digestion products.

Table 3.3 Three point calibration curve for the oligo UUCCCGp (light and heavy) at different ratios with the phosphate group at the 3’ end.

Prepared [heavy]/[light] Measured [heavy]/[light]

0.5 3.9±0.31

1.0 1.90±0.05

2.0 0.97±0.08

Similar curves were constructed for the eight other RNase T1 digestion products, all of which were linear with slopes around 1.0. Thus, a linear relationship between monoisotopic ion abundance and concentration is established within one order of magnitude change in molar concentration, which is similar to the dynamic range obtained with the previously reported 18O/16O labeling. Given this linear response, it was informative to then review the earlier data obtained from the E. coli tRNA Tyr II standard and the isotopically labeled transcript. The measured ion abundances for the three doublets correspond to molar concentration ratios of the unmodified digestion products in the sample tested. Thus, SIL-CARD is a straightforward and rapid means to quantify RNA digests.

Table 3.4 Average ion abundance ratios of the monoisotopic peaks of heavy: light ratios for the oligo UUCCCG. The corresponding calibration curve is shown in Figure 3.5b.

[heavy]/[light]   Calculated average monoisotopic ion abundance (a)   % CV
10                10.9 ± 1.1                                          8.30
8                 8.83 ± 0.45                                         9.43
6                 6.56 ± 0.21                                         8.60
4                 4.44 ± 0.23                                         9.82
2                 2.25 ± 0.062                                        10.9
1                 1.18 ± 0.019                                        15.1
0.5               0.58 ± 0.039                                        13.6
0.25              0.27 ± 0.007                                        11.9
0.167             0.20 ± 0.011                                        16.1
0.125             0.14 ± 0.002                                        13.1

(a) n=3, ±SD
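As an illustration of how the calibration line can be obtained, the sketch below fits the mean values of Table 3.4 by ordinary (unweighted) least squares in plain Python. Whether the original fit was weighted, or forced through the origin, is not stated, so the unweighted fit with a free intercept is an assumption; the function name is hypothetical.

```python
# Prepared [heavy]/[light] ratios and mean measured ratios from Table 3.4.
prepared = [10, 8, 6, 4, 2, 1, 0.5, 0.25, 0.167, 0.125]
measured = [10.9, 8.83, 6.56, 4.44, 2.25, 1.18, 0.58, 0.27, 0.20, 0.14]


def linear_least_squares(x, y):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


slope, intercept = linear_least_squares(prepared, measured)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A slope slightly above 1 from this fit is consistent with the small positive bias towards the labeled digestion product noted in the text.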

For accurate quantitative measurement, a calibration curve of the unknown sample with a known amount of internal standard is required, provided the response factor of the instrument does not deviate significantly from unity. Given the approach relies on comparing ratios of ion abundances, as long as the amount of internal standard is known with confidence, absolute quantification is possible. Because the detection and quantification is performed simultaneously, heavy transcripts could serve as an internal standard to follow changes in levels of particular RNase T1 digestion products.

3.3.3 Benefits of SIL-CARD for tRNA characterization

An overall evaluation of the data obtained using isotopically labeled in vitro tRNA transcripts as the reference for CARD finds several important advantages as compared to the prior approach based on the enzymatic incorporation of 18O/16O during RNase digestion.

Spectral congestion is reduced. Overlapping isotope peaks, due to the natural 13C and 15N present in the RNAs, make data analysis quite challenging when searching for singlets and doublets. SIL-CARD reduces spectral congestion by increasing the mass shift from +2 Da to +15 Da (for guanosine). Moreover, even long RNase digestion products, which may be detected at relatively high charge states, can still be differentiated by SIL-CARD (Figure 3.6).

The labeling step is separated from enzymatic treatment. The prior iteration required isotopic labeling by the complete hydrolysis of RNA in heavy water. However, not all RNases uniformly generate 3’-phosphates, as some RNases (e.g., U2, A, Cusativin) halt at cyclic phosphate intermediates. As these intermediates cannot incorporate the 18O/16O label, a second round of enzymatic treatment using lambda phosphatase (LPP) to transform cyclic phosphates to linear products is required for labeling [103]. By separating the labeling step from any enzymatic digestion steps, the label will always be present in SIL-CARD (Figure 3.2).

Labeled transcripts enable broader applicability of the method. Unlike earlier versions of CARD, which are limited to using cellular RNAs whose modification status is fully characterized as references, the use of in vitro transcripts as references means that this method can be applied to any tRNA or tRNA pool so long as the tDNA sequences are known. The much greater availability of genomic sequence information over RNA sequence information with all modifications annotated should enable this approach to be used in a wide variety of assays, including those that would monitor tRNA modification profiles in response to different external stimuli [104].

Figure 3.6 Mass spectra of the unmodified oligonucleotide AAUCCUUCCCCCACCACCAAAGp (m/z 1734.78, -4 charge state). (a) When this oligonucleotide is labeled by the 18O approach, the predicted doublet (m/z 1734.78 and 1735.06) is not readily identified at this high charge state. (b) When the same oligonucleotide is labeled by using enriched GTP, the expected doublet can easily be identified with a low resolution mass analyzer.
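The effect of charge state on doublet spacing follows from the relation Δ(m/z) = Δm/|z|. The sketch below compares the spacing expected for a single 18O exchange with that for one 13C10,15N5-labeled guanosine; the isotope mass differences are standard values supplied here, and the function names are hypothetical.

```python
O18_SHIFT = 2.00425   # Da, 18O minus 16O for one exchanged oxygen
GTP_SHIFT = 15.01873  # Da, 10 x (13C - 12C) + 5 x (15N - 14N)
ISOTOPE_STEP = 1.00335  # Da, approximate spacing of natural isotope peaks (13C - 12C)


def doublet_spacing(mass_shift_da, charge):
    """m/z separation between light and heavy partners at a given charge state."""
    return mass_shift_da / abs(charge)


for charge in (-2, -3, -4):
    print(
        f"z = {charge}: 18O doublet {doublet_spacing(O18_SHIFT, charge):.2f}, "
        f"labeled-G doublet {doublet_spacing(GTP_SHIFT, charge):.2f}, "
        f"isotope peak spacing {doublet_spacing(ISOTOPE_STEP, charge):.2f} (m/z units)"
    )
```

At a -4 charge state the 18O partner lies only about 0.5 m/z units from the light ion, that is, two isotope-peak spacings away and inside the natural isotope envelope, whereas the labeled-guanosine partner is shifted by about 3.75 m/z units and remains distinguishable on a low-resolution analyzer.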

3.4 Conclusion

This chapter demonstrated the second approach for using a reference as an internal standard for an improved comparative and quantitative analysis. The reference is an in vitro transcribed RNA, with a stable isotope label on the guanosine residue. Unlike prior iterations of CARD, SIL-CARD simplifies the identification of doublets and inherently incorporates an isotopically labeled standard suitable for quantitative analysis of tRNA abundance. While the proof-of-concept work presented here is limited to a single tRNA, the method is scalable to tRNA mixtures in the same manner as traditional CARD.

This improved method should prove particularly useful for studies focused on dynamic changes in tRNA modification levels in addition to more conventional characterization of tRNA modification profiles.

The reference concept can be further expanded to non-MS techniques, such as next-generation sequencing, as a complementary method for understanding RNA modifications in biological systems. Chapter 4 and Chapter 5 describe an in-depth study of the human pathogenic fungus Cryptococcus neoformans, leveraging both mass spectrometry and RNA sequencing to understand the translational apparatus when the fungus is exposed to ionizing radiation.

Chapter 4 RNA modification of the fungi Cryptococcus neoformans by mass spectrometry and its dynamics under ionizing radiation

4.1 Introduction

Radioresistant fungi are a unique biological system to study: highly pigmented fungi were discovered colonizing the walls of the Chernobyl nuclear reactors twelve years after the explosion. The environment was highly ionizing; an estimated 105 times the natural background level was measured for the annual exposure of these fungi [105]. Until this research, nothing about tRNA expression, modification or function was known for these fungi. Moreover, this biological system provides a good model to expand comparative methods for tRNA analysis beyond only MS data.

The concept of a reference can be further expanded to non-MS techniques, such as using next-generation sequencing as a complementary method for understanding RNA modifications and cellular processes in biological systems. While other groups have focused on quantifying RNA modifications and how codon usage bias changes in response to stress [38] [104, 106], I complement our mass spectrometry data with RNA sequencing, with attention specifically on RNA modifying enzymes and tRNA expression levels (pre-tRNA and mature). Changes in the level of modified nucleosides can be hypothesized to arise from an increased synthesis of tRNA transcripts (hence more copies are being modified) or from an increased turnover of pre-tRNA transcripts, thereby enriching the pool with modified tRNAs.

In this chapter, the modification status of Cryptococcus neoformans total tRNA is explored and mapped back to the genomic sequence. Then, the fungus was exposed to ionizing radiation and the levels were compared with the control. These data and insights can then be compared with the

transcriptome data to be presented in Chapter 5 to understand possible mechanisms C. neoformans

uses to adapt to ionizing radiation.

4.2 Experimental

4.2.1 Culturing and tRNA purification

C. neoformans (Sanfelice) Vuillemin (ATCC® 32045™) was grown in potato dextrose (PD) medium (pH 5.5) at 30 °C. The cells were harvested when the optical density at 600 nm reached 1.5 (about 18 h). A cell count was performed; at this OD, the count was 1.5 × 10⁸ cells/mL. Total RNA and tRNA isolation were described previously in Chapter 2.

4.2.2 Ionizing radiation exposure and cell viability test

Exposure to ionizing radiation was done at the University of Cincinnati, in the Department of Mechanical and Materials Engineering. The C. neoformans cells were grown until an OD of 1.5 was reached and aliquoted into glass vials for ionizing radiation exposure with cobalt-60 as the source. Cell count was performed using a hemocytometer. The dose rate was fixed at 132 Gy/hour, and the total exposure time was adjusted to achieve a maximum exposure of 300 Gy (in 50 Gy increments). After exposure, the cells were washed with 1X PBS and processed for total RNA and tRNA extraction.
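The exposure time needed for each dose follows directly from the fixed dose rate (time = dose / dose rate). A minimal sketch, assuming the series runs from 50 Gy to 300 Gy in 50 Gy steps as stated; the function and variable names are hypothetical.

```python
DOSE_RATE_GY_PER_H = 132.0  # fixed dose rate of the cobalt-60 source


def exposure_minutes(dose_gy, dose_rate_gy_per_h=DOSE_RATE_GY_PER_H):
    """Exposure time in minutes needed to deliver dose_gy at the given dose rate."""
    return dose_gy / dose_rate_gy_per_h * 60.0


# Doses from 50 Gy to 300 Gy in 50 Gy increments.
schedule = {dose: exposure_minutes(dose) for dose in range(50, 301, 50)}

for dose, minutes in schedule.items():
    print(f"{dose:3d} Gy -> {minutes:6.1f} min")
```

Under these assumptions the shortest exposure is a little under 23 min and the 300 Gy exposure is a little over 2 h 16 min.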

An aliquot of the exposed cells was diluted to yield between 30 and 300 colonies on PD plates. After two days, the cream-colored colonies were visible and were manually counted to determine percent cell survival.

4.2.3 Analysis of tRNA nucleoside compositions by LC-MS/MS

Total RNA (1 μg) and tRNA (3 μg) were denatured by heating to 95 °C for 3 min and quickly chilled in an ice bath for 10 min. To the solution were added 1/10 volume of 0.1 M ammonium acetate and nuclease P1 (2 U/μg RNA). The mixture was incubated for 2 h at 45 °C. Next, 1/10 volume of 1 M ammonium bicarbonate was added, and terminal phosphates were removed by digestion with bacterial alkaline phosphatase (0.1 U/μg) and snake venom phosphodiesterase (1.2 × 10⁻⁴ U/μg RNA) at 37 °C for 2 h.

After digestion, the samples were vacuum dried and resuspended in mobile phase A for nucleoside analysis. The nucleosides were separated using a Vanquish LC system with an HSS T3 column (100 Å, 1.8 μm, 2.1 x 50 mm) with 5 mM ammonium acetate, pH 5.3, and 40% acetonitrile as mobile phases A and B, respectively. A linear gradient was used to separate the nucleosides (1-2% in 9.2 min, 3% in 16 min, 5% in 21.4 min, 25% in 24.6 min, 50% in 26.9 min, 75% in 30.2 min (hold of 0.3 min), 99% in 35 min). The flow rate was set at 250 μL/min and the column oven at 30 °C. The eluent was coupled to an Orbitrap Fusion Lumos™ (Thermo Scientific) mass spectrometer. The source was set at a capillary temperature of 329 °C, a spray voltage of 3.5 kV, and 38, 11 and 1 arbitrary units of sheath, auxiliary and sweep gas, respectively. The full-scan mass spectra (m/z range: 105-900) were acquired in the Orbitrap at a mass resolution of 120,000 and a mass accuracy of <4 ppm.

Fragmentation of the nucleoside by collision-induced dissociation (CID, 42% collision energy) was used in data-dependent mode to switch automatically between MS (scan 1), and four CID scans.

4.2.5 tRNA modification mapping by LC-MS/MS

Total tRNA was digested with two different enzymes, RNase T1 and Cusativin. The protocol for RNase T1 was described previously in Chapter 2. Cusativin (overexpressed and purified in-house) was added in a 1:1 μg ratio with the tRNA (buffered with 120 mM ammonium acetate) and incubated for 1 h at 37 °C. After digestion, the samples were vacuum dried prior to analysis. An internal standard, 3-bromodeoxy uridine, in mobile phase A was added at a fixed concentration to all samples prior to injection. LC-MS/MS details are described in Chapter 2.

4.2.6 Data analysis

A comprehensive list of all known modified nucleosides, including their molecular masses, was obtained from the Modomics database (http://modomics.genesilico.pl/). All known RNA modifications were searched in the raw data by comparing the m/z value (within a 5 ppm mass range) and fragmentation (sugar neutral loss). The peak area of each modified nucleoside was normalized to that of the internal standard for both the control and IR-exposed samples. The fold change was calculated as the ratio of the normalized IR-treated samples over control.
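The mass matching and fold-change calculation described above can be sketched as follows. This is a minimal illustration, not the actual processing pipeline: the function names, the observed m/z and the peak areas are hypothetical, and the example assumes a protonated nucleoside (positive polarity), which the text does not specify. The theoretical value used is the monoisotopic [M+H]+ of pseudouridine (C9H12N2O6).

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tolerance_ppm=5.0):
    """True if the observed m/z matches the theoretical value within the window."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tolerance_ppm


def fold_change(area_ir, istd_area_ir, area_control, istd_area_control):
    """Ratio of internal-standard-normalized peak areas, IR-treated over control."""
    return (area_ir / istd_area_ir) / (area_control / istd_area_control)


# Hypothetical observation checked against [M+H]+ of pseudouridine.
THEORETICAL_MZ = 245.0768
observed_mz = 245.0775
print(f"mass error: {ppm_error(observed_mz, THEORETICAL_MZ):.1f} ppm")
print("match:", within_tolerance(observed_mz, THEORETICAL_MZ))

# Hypothetical peak areas for one nucleoside and the internal standard.
example_fold_change = fold_change(2.4e6, 1.2e6, 1.0e6, 1.0e6)
print(f"fold change (IR/control): {example_fold_change:.2f}")
```

A candidate that passes the mass filter would still need the sugar neutral loss in its CID spectrum before being assigned, as stated in the text.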

Transfer RNA genomic sequences were obtained from the Genomic tRNA Database (http://lowelab.ucsc.edu/GtRNAdb/). Several strategies were employed to aid in the modification mapping of the expressed tRNA. First, a catalogue of all the predicted RNase T1 and Cusativin products was created by in silico digestion. Second, using the pool of nucleosides detected, the modification location was predicted. Third, an automated analysis using the RAMM program [91] was performed. Fourth, manual annotation of the MS/MS spectra was performed, with the criterion that at least 80% of the expected c and y ion fragments (http://mods.rna.albany.edu/masspec/Mongo-Oligo) were detected. A special case was observed for 7-methylguanosine, which fragments uniquely in CID.

4.3 Results and Discussion

C. neoformans is a basidiomycete fungal pathogen that affects immunocompromised patients such as those diagnosed with HIV/AIDS. It is estimated that it kills more than 180,000 people per year [107]. According to the World Health Organization, cryptococcosis ranks (in absolute numbers of deaths) as the 5th most lethal infectious disease, behind AIDS, tuberculosis, malaria and diarrhea (http://www.who.int/mediacentre/factsheets). Its virulence is associated with two unique characteristics of the fungus: a carbohydrate capsule and melanin synthesis [48, 108, 109]. The capsule is composed primarily of two polysaccharides, glucuronoxylomannan (GXM) and galactoxylomannan (GalXM), in addition to a smaller proportion of mannoproteins (MP). The capsule is dynamic and responds to changes in the environment. When the medium is supplemented with molecules like L-3,4-dihydroxyphenylalanine (DOPA) or 1,8-dihydroxynaphthalene (DHN), the fungus can synthesize the brown pigment melanin. Several studies have shown that this pigment is a potential radioprotective mechanism as well as a protection against fluctuating temperatures, harmful UV radiation and the fungicidal effects of silver ions, and an aid in the evasion of phagocytes [99, 108, 110-113]. This radioresistant property is not exclusive to melanized C. neoformans, but is shared by a host of other fungi and bacteria found colonizing the walls of the Chernobyl Power Plant and water systems in Chernobyl, Ukraine [105].

Despite this wealth of information on the virulence factors of C. neoformans, particularly their association with the fungus's radioresistance [114-116], there is no information regarding its RNA modification status. To understand how ionizing radiation impacts RNA modification, I first profiled all the modified nucleosides and mapped them back to the tRNA genes by LC-MS/MS. I then investigated the changes in the nucleoside levels of the post-transcriptional modifications in response to the induced stress. Results from this chapter are supplemented with the expression levels of tRNAs and RNA-modifying enzymes obtained by RNA sequencing (Figure 4.1), which are described in Chapter 5.


Figure 4.1 Scheme used to investigate the effects of ionizing radiation in the fungus C. neoformans. Chapter 4 will discuss results obtained in viable count assay and LC-MS/MS. Chapter 5 explores the results of RNA sequencing.

4.3.1 Identification of modified nucleosides in C. neoformans

Beyond the canonical nucleosides (A, U, G, C), twenty-five modifications were detected (Table 4.1). The modification profile is very similar to that of the model fungus S. cerevisiae [38], except for wybutosine (yW), which was not detected in C. neoformans, and io6A, N6-(cis-hydroxyisopentenyl)adenosine, which is present in C. neoformans tRNA. However, the precursor molecules (m1G for yW, i6A for io6A) were detected in the samples, indicating that the steps for further hypermodification at G37 and A37 are absent from the respective fungus. The modification io6A has been described as an exclusively bacterial modification [117], and the methylthiolated forms, ms2i6A and its hydroxylated counterpart ms2io6A, were not detected in either C. neoformans or S. cerevisiae.


Table 4.1 Modified nucleosides detected in C. neoformans total tRNA by LC-MS/MS, including the accurate mass (MH+) and retention time (RT).

RT (min)   MH+        Nucleoside
1.33       247.0926   D, dihydrouridine
1.4        245.077    ψ, pseudouridine
2.42       302.0983   ncm5U, 5-carbamoylmethyluridine
2.45       258.1087   m3C/m4C, 3/4-methylcytidine
2.95       274.1040   nm5U, 5-aminomethyluridine
3.43       282.1199   m1A, 1-methyladenosine
3.99       258.1086   m4C, 4-methylcytidine or m5C, 5-methylcytidine
5.44       258.1091   Cm, 2'-O-methylcytidine
6.04       298.1148   m7G, 7-methylguanosine
5.85       269.0883   I, inosine
6.17       259.0927   m5U, 5-methyluridine
10.5       259.0931   Um, 2'-O-methyluridine
13.79      283.1039   m1I, 1-methylinosine
14.12      317.0981   mcm5U, 5-methoxycarbonylmethyluridine
14.44      298.1148   m1G, 1-methylguanosine
17.57      286.1040   ac4C, N4-acetylcytidine
18.09      298.1148   m2G, 2-methylguanosine
25.7       312.1304   m22G, N2,N2-dimethylguanosine
25.81      282.1203   Am, 2'-O-methyladenosine
26.53      333.0751   mcm5s2U, 5-methoxycarbonylmethyl-2-thiouridine
27.4       413.1417   t6A, N6-threonylcarbamoyladenosine
27.5       282.1203   m6A, N6-methyladenosine
29.32      296.1359   m28A/m66A, dimethyladenosine
29.61      352.1617   io6A, N6-(cis-hydroxyisopentenyl)adenosine
29.5       336.1666   i6A, N6-isopentenyladenosine

4.3.2 tRNA modification mapping

The next step was to map the detected nucleosides onto the total tRNA genes. The Genomic tRNA Database predicts 144 tRNA genes, of which 139 decode the twenty standard amino acids and five are predicted to be pseudogenes. An interesting feature of C. neoformans tRNAs is that 92% (133/144) have canonical introns. This oddity will be explored further in Chapter 6 of this thesis.


There are 48 potentially unique tRNAs, as predicted by tRNAscan-SE, distributed across the fourteen chromosomes of the organism. With a combination of guanosine- and cytidine-specific cleaving enzymes, only three tRNAs had a sequence coverage of 75% or more; twenty-one had between 40 and 75%, and the remainder had less than 40%, either because their digestion products are shared with other tRNAs or because they were not detected owing to low abundance. A consensus map is shown in Figure 4.2. Furthermore, some tRNA isoacceptors in C. neoformans (Arg CCU, Asn GUU, Gln UUG, Glu CTC, His GTG, Leu AAG, Phe GAA, Ser GCT and Trp CCA) have two or more isodecoders, which makes the mapping even more challenging, especially when the isodecoders differ only by a cytidine versus a uridine. These challenges can be circumvented by using a more sensitive instrument or by probe-based purification of individual tRNA species. In addition, nonspecific RNases can be advantageous because they generate longer digestion products that can be uniquely mapped to one specific tRNA. However, since these RNases cleave randomly, the biggest hurdle is the data analysis: although some software is available, processing time can be limiting. The final modification map can be found in Appendix A1 (at the end of the dissertation).

Using the two RNases, we found that modified and unmodified versions of a given position can be present as a mixture, particularly in the anticodon stem-loop region. For example, both the modified and unmodified versions of A37 were detected in Ser GCA, as A[i6A]AUGp and AAAUGp.

Perhaps the most interesting discovery in mapping modifications is the presence of hyper-modified

A37 in the tRNA Ser AGA, UGA and Tyr GTA. The two versions of A37 (i6A37 and io6A37) are presented in Table 4.2. This position is important in modulating the codon-anticodon interaction, especially for the case of A-U base pairs relative to the G-C pair. The modification of A37 stabilizes


Figure 4.2 Consensus modification of all tRNA modifications found in sequence specific manner in the clover leaf structure of tRNA from C. neoformans.

the Watson-Crick pair by base stacking. Furthermore, this prevents +1 frameshifting events by favoring cognate tRNA selection. Studies of mutant tRNAs lacking the A37 modification (i.e. ms2i6A) showed up to a 9-fold increase in frameshifting compared to the wild type [118]. Björk's group has studied in detail the effect of ms2io6A versus io6A in multiple tRNA species on reading frame maintenance during translation. The absence of the methylthio group (ms2) strongly influences A- and P-site frameshifting, whereas the role of the hydroxyl group on the isopentenyl chain (io6A) is minor [20]. However, they only tested the A vs ms2i6A and i6A vs ms2i6A pairs. Although the difference is subtle (an -OH group), delineating its role (if it is significant) requires further investigation.

In yeast, the enzyme Mod5 is responsible for i6A synthesis, using dimethylallyl pyrophosphate (DMAPP) as the substrate [119]. However, the next step in the pathway (Figure 4.3), the allylic hydroxylation catalyzed by the enzyme MiaE, has only been described in eubacteria and not in eukaryotes. This enzyme belongs to a class of carboxylate-bridged nonheme di-iron enzymes that require O2 as a substrate for the addition of a single oxygen atom [120]. Björk's group has demonstrated that in Salmonella, the hydroxylated form of A37 (specifically ms2io6A) had no impact on translation relative to the methylthio (ms2i6A) form. They did note that the hydroxyl group can regulate the growth of the organism on citric acid cycle intermediates such as succinate, fumarate and malate: Salmonella "senses" the modification and will grow in the presence of these metabolites [121].

Table 4.2 Digestion products obtained from the anticodon loop of the tRNAs in which two different versions of A37 were detected.

tRNA isoacceptor   Sequence
Ser AGA            A[i6A]AUCGp
                   A[io6A]AUCGp
Ser UGA            A[i6A]AUCCCGp
                   A[io6A]AUCCCGp
Tyr GUA            UA[i6A]AUCUGp
                   UA[io6A]AUCUGp


Figure 4.3 Biosynthetic pathway for the synthesis of ms2io6A (2-methylthio-N6-(cis-hydroxyisopentenyl)adenosine). The associated enzymes are numbered: (1) Mod5 (S. cerevisiae), (2) MiaE (S. typhimurium), (3) MiaB (E. coli).

Most of these studies were performed in eubacteria and not in eukaryotes. In fact, apart from C. neoformans (this study), only two eukaryotes have been reported to bear the io6A modification, in their serine tRNAs: Nicotiana rustica (wild tobacco) and Lupinus luteus (yellow lupin). BLAST analysis of the MiaE enzyme from Salmonella typhimurium showed no homology to any protein in C. neoformans. In addition, the MiaB enzyme of E. coli has no homology to any hypothetical protein in the fungus. This result raises questions as to which enzyme converts i6A to io6A and why the methylthio forms are not required by the fungus.


4.3.3 Effects of ionizing radiation in the cell viability of C. neoformans

The cytotoxic dose-response curve for C. neoformans under ionizing radiation was established first. The source used in the experiment was cobalt-60, which undergoes beta decay to an excited state of Ni-60 (step 1); the nucleus then relaxes to the ground state (step 2) by emitting gamma (γ) rays of 1.17 and 1.33 MeV. The dose rate was constant at 132 Gy/hour (2.2 Gy/min), and after the maximum dose was reached (300 Gy), 30% of the cells survived relative to the control (Figure 4.4). This result is similar to that of Shuryak et al. [122], who observed 20% survival at 300 Gy. The key differences are the type of media used (minimal vs rich) and the dose rate (10.67 Gy/min vs 2.2 Gy/min).
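Because the dose rate is fixed, the exposure time scales linearly with the target dose. The short Python sketch below shows the arithmetic; the function names are hypothetical and only the 132 Gy/hour rate and the 100 and 300 Gy doses come from the text.

```python
def dose_rate_per_minute(dose_rate_gy_per_hour: float) -> float:
    """Convert a dose rate from Gy/hour to Gy/min."""
    return dose_rate_gy_per_hour / 60.0


def exposure_time_hours(target_dose_gy: float, dose_rate_gy_per_hour: float = 132.0) -> float:
    """Time needed to accumulate the target dose at a constant dose rate."""
    return target_dose_gy / dose_rate_gy_per_hour


rate_min = dose_rate_per_minute(132.0)   # Gy/min at the source used here
t_100 = exposure_time_hours(100.0)       # hours needed to reach 100 Gy
t_300 = exposure_time_hours(300.0)       # hours needed to reach 300 Gy
```

At this rate the 300 Gy exposure takes a little over two and a quarter hours, so the cells experience the stress for considerably longer than at the higher dose rate used by Shuryak et al.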

[Plot: percent survival (y-axis, 0 to 100) versus dose in Gy (x-axis, 0 to 300)]

Figure 4.4 Survival curve of C. neoformans exposed to varying levels of ionizing radiation at a constant dose rate of 132 Gy/hour.


4.3.4 Effects of ionizing radiation on tRNA modifications in C. neoformans

Modifications in the anticodon position can alter the decoding activity of tRNAs. Depending

on the type of chemical groups added, it can enhance or restrict base pairing with its cognate codon

[123, 124]. Recently, it has been demonstrated that tRNA modifications are part of a programmed

stress response in S. cerevisiae. In response to stress, the decoding efficiency of specific tRNAs is enhanced, which favors the translation of mRNA transcripts enriched in their cognate codons [38, 104, 106]. I wanted to test whether this model holds true for C. neoformans, specifically by looking for a modification that is induced only by ionizing radiation and whose level increases with increasing IR dose.

4.3.5 Oxidized nucleobases are increased in relative abundance

Cellular exposure to ionizing radiation leads to a cascade of events that affects chemical and biological processes in an organism. Radiation interacts with cellular components through direct mechanisms, which include DNA damage through strand breaks. As water is the main cellular component, radiolysis of water generates free radicals (reactive oxygen species) that in turn attack other cellular constituents (indirect effect). Ionizing radiation such as γ rays affects cells mainly through indirect effects. The maximum exposure used in this experiment generated a small, reproducible increase in the level of the guanine oxidation product 8-oxo-rG. This is not surprising, since guanine is the most easily oxidized of the canonical nucleobases [125]. However, neither the four-electron oxidation product, Gh, nor any oxidized adenine base was detected in the data. In addition, the hydroxylated products of m5C (5-hydroxymethylcytidine, hm5C) and m6A (N6-hydroxymethyladenosine, hm6A) were also detected (Figure 4.5). In the cell, oxidation of these methylated bases is a way to remove the methyl group. For example, m6A demethylation can be accomplished in two ways: FTO


Figure 4.5 Dose response of the three IR-induced oxidation products in C. neoformans. (A) Fold change of the two-electron oxidation product of guanine, 8-oxo-G. (B) Fold change of m5C and its oxidation product, hm5C. (C) Fold change of hm6A, the oxidation product of m6A.


pathway and ALKBH5 (reversible demethylation) [48, 126]. FTO oxidizes m6A to hm6A, which then releases formaldehyde (HCHO) to form adenosine. ALKBH5 directly removes the methyl group, and this reaction is reversible. For cytidine, the TET family of enzymes mediates the removal of the methyl group in a fashion similar to the demethylation of methyladenosine [127]. The level of m5C was constant at all exposure levels tested. In contrast, the Dedon group showed that this is one of the key nucleosides that increase upon H2O2 exposure. The differences observed here could be due to the different cellular mechanisms that a radioresistant fungus (C. neoformans) adopts compared with a radiosensitive organism (S. cerevisiae). Although the conclusions drawn here are limited because no codon usage bias analysis was performed, the result raises questions about the general applicability of the model put forth for how a cell responds to stress.

4.3.6 RNA modification signature in ionizing radiation

Apart from the oxidized bases, I was also interested in whether any nucleoside is detected exclusively upon exposure to ionizing radiation. Apart from the three oxidized bases mentioned above, no new modifications appeared in C. neoformans after radiation exposure. However, as shown in Figure 4.6, C. neoformans possesses a unique modification profile under IR. Variations in modification levels are grouped into three major clusters. Group 1 is composed of all the sugar methylations (Am, Gm, Cm, Um), m6A and m66A. Group 2 includes io6A, m1I, m3C, t6A, ac4C, i6A, D, m1G, m5U, ψ, m22G, m5C, m7G, m1A, m227G and a subgroup consisting of 8-oxo-G and nm5U. The third group consists of I, mcm5U, mcm5s2U, ncm5U, hm6A and hm5C. Group 1 nucleosides showed either a relatively constant amount versus control or a decrease with increasing IR dose. In most cases, such as Am, Um and Cm, these changes were not significant, but two modifications, m6A and Gm, decreased significantly with increasing exposure. The decrease in the m6A level is mirrored by the increase of its first oxidation product, hm6A, as shown in Figure 4.5. Interestingly, the level of hm6A remains constant throughout the exposure.

[Heat map with color scale marked 0.3, 1.0 and 2.4]

Figure 4.6 Hierarchical clustering analysis of ionizing radiation-induced changes in the level of tRNA modifications in C. neoformans.

The second group of nucleosides comprises those that showed changes ranging from none up to 1.5-fold relative to control. A majority of these modifications, such as D, ψ, m22G, m1A, m5U and m7G, are important for scaffolding and structural stability. This group also includes anticodon loop modifications, particularly at position 37, such as m1I, i6A, io6A, t6A and m1G. A subgroup consisting of nm5U and 8-oxo-G showed a dose-dependent increase.

There was no significant increase in the m5C modification, in contrast to what other groups have reported previously. One possible explanation is that the population of tRNAs bearing that modification is stable over the course of the exposure. Unfortunately, the anticodon loop of Leu CAA, which in yeast has m5C at C34, was not detected in the RNase T1 and cusativin digests. Apart from hm6A and hm5C, the last group consists exclusively of position 34 modifications: I, ncm5U, mcm5U and mcm5s2U. All have elevated levels upon IR exposure, and this location potentially impacts translation fidelity and efficiency. Inosine was mapped back to multiple tRNAs with A34 (Ala AGC, Ile AAT, Leu AAG, Pro AGG, Ser AGA, Val AAC). Inosine has an expanded decoding ability, being able to pair with U-, C- and A-ending codons. Uridine modifications have well-documented roles in both restricting and expanding decoding. In particular, xm5U modifications like ncm5U, mcm5U and mcm5s2U prefer binding to codons that end in A and G and have a reduced ability to bind C- and U-ending codons. These modifications (mcm5U and mcm5s2U) have been mapped back to tRNA Lys UUU, Arg UCU, Gln UUG, Glu UUC and Gly UCC. The enzyme Trm9 participates in the synthesis of mcm5U by adding a methyl group. Deng et al. demonstrated that in yeast, Trm9-catalyzed modifications, particularly in tRNA Arg UCU and Glu UUC, enhanced the selective translation of proteins enriched in their cognate codons, promoting survival when cells are exposed to stress [106]. As previously mentioned, no codon usage bias analysis was performed in this work, so it is not conclusive whether the same observation can be made for C. neoformans under IR.

These observed increases in modifications raise a few intriguing questions about the nature of the tRNA transcripts bearing these modifications. We can hypothesize the following explanations for the observed phenomenon. First, it may be due to increased synthesis of new tRNA molecules, in which case one can postulate that ionizing radiation activated a regulatory response demanding more fully modified tRNAs. Second, it may be due to an increase in the level (or activity) of RNA-modifying enzymes. Third, it may be accounted for by enhanced turnover of unmodified tRNAs, which would enrich the pool in fully modified tRNAs.


Conclusion

In this chapter I present baseline information about the modification status of the total tRNA pool of the pathogenic, radioresistant fungus C. neoformans. Twenty-five unique modified nucleosides were detected, a profile similar to that of the model fungus S. cerevisiae. With the combination of RNase T1 and cusativin, modification mapping revealed an interesting set of anticodon loop modifications, including two versions of modified A37 (i6A and io6A). When cells were exposed to ionizing radiation, the oxidation products of guanine (8-oxo-G), m5C (hm5C) and m6A (hm6A) were detected. Apart from these three, which are expected, no new modification appeared exclusively because of ionizing radiation.

There was an increased level of several nucleosides, specifically I, ncm5U, mcm5U and mcm5s2U. These modifications are at position 34 of the tRNA, a site that impacts the codon-anticodon interaction.

Since only changes in nucleoside levels were observed when the cells were exposed to ionizing radiation, it is necessary to expand the scope of the analysis to techniques other than mass spectrometry. In Chapter 5, an orthogonal but complementary technique, RNA sequencing, was used to explore the transcriptome of C. neoformans. These studies will help me better understand how the tRNAs of C. neoformans respond to IR.


Chapter 5 Radiation induced changes in transcriptome and tRNA levels of Cryptococcus neoformans by RNA sequencing

5.1 Introduction

In the previous chapter, tRNA modifications from the pathogenic fungus C. neoformans were profiled using LC-MS/MS. However, nucleoside information reveals only a portion of the cellular processes involved when a cell is in a state of disturbance. Techniques like RNA sequencing can complement mass spectrometry in elucidating the mechanisms involved in the radioresistance network of C. neoformans. This "going global" approach is a new era for modification mapping, which considers not just the product (nucleoside information and oligonucleotide maps) but also the varying levels of transcripts that are induced or repressed. One important question to ask is: does tRNA (pre- and mature) expression vary in response to a specific condition? Measuring tRNA expression levels by RNA sequencing offers the advantages of high-throughput analysis and low detection limits.

In this chapter, I present the third use of a reference, which is measuring the global expression levels of RNA in the transcriptome. This combined approach (MS and RNA sequencing) improves the breadth of the reference concept in understanding which cellular processes in C. neoformans confer radiation resistance.


5.2 Experimental

5.2.1 C. neoformans culturing and RNA, tRNA purification

C. neoformans culture protocol and isolation of RNA and total tRNA were performed as described in Chapter 2.

5.2.2 DNase I digestion

To each sample (RNA or total tRNA, 10 μg each) were added 1/10 volume of 10x DNase I buffer and 1 μL rDNase I (recombinant DNase I), and the mixture was incubated for 30 min at 37 °C. DNase I inactivation reagent (for removal of divalent cations) was added (1/10 volume) and the mixture was incubated at room temperature for 2 min. The tubes were centrifuged at 10,000 rpm for 20 min and the supernatant, containing the DNase-free RNA, was collected. All reagents used were from the Ambion Invitrogen DNA removal kit.

5.2.4 Ionizing radiation exposure

The procedure for ionizing radiation exposure is described in Chapter 4.

5.2.5 Library preparation for next generation sequencing

Duplicate biological samples were submitted to the DNA Sequencing and Genotyping Core at Cincinnati Children's Hospital Medical Center (CCHMC). The total RNA and tRNA samples (DNase I treated) were prepared at a final concentration of 1 μg/μL in water. RNA quality and quantity were assessed by running a sample on an Agilent Technologies 2100 Bioanalyzer. Only samples with an RQN (RNA quality number) > 8 were processed for library generation. The isolation of mRNA, cDNA synthesis, ligation and PCR amplification for library generation followed the Illumina TruSeq Stranded mRNA Sample Preparation guide. Briefly, polyadenylated RNA species were captured using oligo-dT-attached magnetic beads. The mRNA was fragmented and first-strand cDNA was synthesized using reverse transcriptase, followed by second-strand synthesis and several quality control steps, after which adapters were ligated. The cDNA was amplified by PCR to create the final cDNA library.

For the tRNA analysis, the RapidSeq™ High Yield Small RNA Prep Kit protocol was used with modifications. The denaturing step was performed at 90 °C for 2-3 min, and the sample was immediately placed on ice for 5 min. During purification of the cDNA construct, the library size selection was narrowed to between 125 nt and 310 nt. This range was selected to accommodate the sizes of pre-tRNAs and mature tRNAs plus adapter sequences.

The barcoded libraries (total RNA and tRNA) were sequenced on a HiSeq 2500 with 75 bp single-end reads. Approximately 30 million reads were obtained for mRNA and 3 million for tRNA.

5.2.6 Data analysis

All data analysis was done by the Bioinformatics Services of the CCHMC DNA Sequencing and Genotyping Core facility. The reference genome used was C. neoformans JEC21 from EnsemblFungi (http://fungi.ensembl.org) GCA 000091045.1.3. Two other genomes were tested, but the JEC21 strain was chosen because it gave an overall read mapping rate of > 90%. The FASTQ files were obtained from the core facility of CCHMC. Quality control steps were performed to determine the overall quality of the reads in the FASTQ files. Upon passing basic quality metrics, the reads were trimmed to remove adapters and low-quality reads using Trimmomatic. The trimmed reads were then mapped to the reference genome using TopHat (which uses Bowtie in the background); this step created an alignment summary and output (BAM) files for downstream processing. Transcript/gene quantification was then performed using Cufflinks, whose output is abundance in FPKM (fragments per kilobase of exon per million mapped reads). The quantified sample matrix was then used to determine differential gene expression between experimental groups (control and treated samples). An R package called EBSeq was used to perform the differential gene expression analysis between the groups of samples (biological replicates). Significantly differentially expressed genes were identified using a fold change cutoff of 2 and an adjusted p-value cutoff of ≤ 0.05. To create a comprehensive overview of the experiment, overall expression profiles were visualized using heatmap plots. These plots for a given set of genes were generated using the R package "Heatmap2". Downstream functional annotation of genes of interest and/or genes significantly dysregulated in the experiment was determined using gene ontology (cellular component, molecular function and biological process) and pathway analysis. A detailed functional annotation and pathway analysis was performed using the FungiFun2 tool.
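To make the quantification unit and the significance cutoffs concrete, the sketch below implements the textbook definition of FPKM and the fold-change/adjusted p-value filter in Python. It is an illustration under simplifying assumptions: Cufflinks and EBSeq estimate abundances and posterior probabilities with statistical models that are not reproduced here, and the function names are hypothetical.

```python
def fpkm(fragment_count: float, transcript_length_bp: float,
         total_mapped_fragments: float) -> float:
    """Fragments per kilobase of exon per million mapped fragments (naive definition)."""
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


def passes_cutoffs(fold_change: float, adjusted_p: float,
                   fold_cutoff: float = 2.0, p_cutoff: float = 0.05) -> bool:
    """True if a gene is up- or down-regulated by at least the fold cutoff
    and its adjusted p-value does not exceed the p cutoff."""
    changed = fold_change >= fold_cutoff or fold_change <= 1.0 / fold_cutoff
    return changed and adjusted_p <= p_cutoff


# Example: 500 fragments on a 1,000 bp transcript in a 30-million-fragment library
example_fpkm = fpkm(500, 1000, 30_000_000)

# Example: a gene induced 2.5-fold with an adjusted p-value of 0.01
example_call = passes_cutoffs(2.5, 0.01)
```

Treating the cutoff symmetrically (a fold change of at least 2 or at most 0.5) lets the same filter capture both up- and down-regulated genes.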

The same analysis pipeline was followed for the pre-tRNA expression level analysis, except that the quantification tool used was RSEM. For the mature tRNA/tRNA fragment analysis, the protocol by the Lowe lab was adapted [128]. The reads were mapped to mature tRNA sequences derived from tRNAscan-SE gene predictions for C. neoformans JEC21 [2]. The introns were removed and the CCA tail was added to the 3' end. An additional step was performed for all histidine tRNAs: mature histidine tRNAs have a guanosine nucleotide (position -1) at the 5' end, which is added post-transcriptionally [129].
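The construction of the mature tRNA reference sequences can be sketched as follows. This is a minimal Python illustration, not the Lowe lab pipeline: it assumes the input is a predicted tRNA gene sequence without leader or trailer, that intron coordinates are supplied as a 0-based, end-exclusive pair, and the function name and toy sequences are hypothetical.

```python
from typing import Optional, Tuple


def build_mature_trna(gene_sequence: str, isotype: str,
                      intron: Optional[Tuple[int, int]] = None) -> str:
    """Return a mature tRNA reference: intron removed, 5' G added for His, 3' CCA appended."""
    seq = gene_sequence.upper()
    if intron is not None:
        start, end = intron          # 0-based, end-exclusive coordinates
        seq = seq[:start] + seq[end:]
    if isotype == "His":
        seq = "G" + seq              # G(-1), added post-transcriptionally
    return seq + "CCA"               # 3' CCA tail


# Toy examples (hypothetical sequences, not real tRNAs)
ser_like = build_mature_trna("AAAUUUGGG", "Ser", intron=(3, 6))
his_like = build_mature_trna("ACGU", "His")
```

Reads from mature tRNAs span the exon-exon junction and the CCA tail, so they align to references built this way but not to the intron-containing genomic sequence.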

5.3 Results and Discussion

Twelve years after the explosion of Chernobyl nuclear reactor 4, fungal growth was discovered in that highly ionizing environment [105]. The diversity of melanized fungal and bacterial species thriving and growing in Chernobyl led to decades of research on radioresistance [110, 114, 116, 130]. The model fungus used in this study is C. neoformans, a well-studied pathogenic fungus that can cause fungal meningitis and cryptococcosis in immunocompromised patients [48]. There is also a wealth of information on the carbohydrate capsule, which along with melanin is one of the fungus's virulence factors [108, 131]. However, there is no information on tRNA modifications and tRNA expression levels for C. neoformans.

In Chapter 4 I showed that some modifications increased in the total tRNA pool of C. neoformans after radiation exposure. Using the power of RNA sequencing, we analyzed the transcriptome profile and tRNA expression levels of C. neoformans after exposure to IR. The expression levels of RNA in the cell can tell us which genes are mobilized in response to radiation.

5.3.1 Low dose radiation does not significantly remodel the transcriptome profile of C. neoformans

The impact of ionizing radiation on gene expression in the radioresistant, non-melanized form of C. neoformans was first examined by exposure to 100 Gy and 300 Gy at a fixed dose rate of 132 Gy/hour. As shown in Chapter 4 (Figure 4.4), the colony counting assay revealed no enhanced growth because of radiation, which is in agreement with the findings of Dadachova et al. [114]. The 100 Gy exposure resulted in 80% survival and the 300 Gy exposure in 30% survival of the fungus. To understand the molecular mechanisms mobilized when the fungus is exposed to IR, I examined the transcriptional patterns as the cells entered log phase. The transcriptome analysis revealed that out of the 6,449 genes expressed in the cell, only 363 genes (146 down-regulated and 217 up-regulated) displayed differential expression in response to radiation (p<0.05, fold change ≥ 2) (Figure 5.1A). This amounts to ~6% of the total annotated genes in the genome (Figure 5.1B). In contrast, Jung et al. found that a significant proportion of genes (37%) were regulated in response to radiation (2,587 of a total of 6,962 genes expressed).


Figure 5.1 A. Volcano plot of the differentially expressed genes at 100 and 300 Gy relative to control (FC>2 and p<0.05). B. Distribution of the significantly up- and down-regulated genes relative to control.


This discrepancy could be attributed to the strain used, the dose and the exposure time. Of the several pathogenic Cryptococcus species (C. neoformans var. grubii H99 strain, C. neoformans var. neoformans JEC21 strain, Cryptococcus gattii R265 strain, and C. gattii WM276 strain), the JEC21 strain (used in this study) is the least radioresistant, and the var. grubii strain (used by Jung et al.) is the most tolerant to radiation. In addition, the dose used by Jung et al. was 3000 Gy delivered in 1 hour (vs 300 Gy in approximately 2.3 hours here), and the cells in that study were placed in fresh media and allowed to recover for 30 to 120 minutes after exposure, versus immediate harvest in the current work. I chose to harvest the cells immediately after exposure to get a glimpse of the immediate response of the fungus. It is also important to note that the progeny of surviving cells, as well as the progeny of bystander cells, can experience radiation-induced stress changes similar to those of the parent cells that were directly exposed [132-134]. It is therefore worth investigating the long-term impact of radiation on the progeny of the fungus.

5.3.2 Variations in the transcript levels of RNA modifying enzymes under radiation

In Chapter 4 it was observed that the fungus's modification levels are sensitive to IR; in particular, position 34 modifications (I, ncm5U, mcm5U and mcm5s2U) increased over the course of exposure. Using a homology-based search against model organisms like S. cerevisiae and E. coli, the enzymes responsible for the modifications can be assigned in C. neoformans. The expression levels of the genes associated with these modifications can then be monitored in the RNA-seq data. As shown in Figure 5.2, upon exposure to IR there are no significant changes in the expression levels of RNA-modifying enzymes, except for transcripts involved in the enzymatic pathway (Figure 5.3) that generates mcm5s2U (the C. neoformans homologues of Trm9 and Uba4). This is in agreement with the nucleoside data in


Figure 5.2 Fold change response of RNA modifying transcripts of C. neoformans in IR. Ψ – Pus 1, Pus 4; D – Dus 1, Dus 2, Dus 3; Um – Trm 14; m5U – Trm2; mcm5U – Trm 9, Trm 112, Uba 4; mcm5s2U – Ncs 2, Ncs 6; I – Tad 1; m1A – Trm 61; m6A – Ime 4; m1I – Tad 1, Trm 5; i6A, io6A – Mod 5; t6A – Sua 5; m3C – Trm140; m5C – Trm 5; m2G – Trm 3; m1G – Trm 10; m7G – Trm 8; m22G – Trm1. No known enzymes for the following nucleosides: nmc5U and Am. Known enzymes in S. cerevisiae or E. coli were aligned (BLAST) to the C. neoformans genome to find homologues. Black bars are for control, blue bars for 100 Gy and orange bars for 300 Gy fold change induction. Asterisks indicate a statistically significant difference in gene expression level at p<0.05.

Chapter 4. However, what is very surprising is that the Dus1 transcript showed a two-fold induction (p<0.05) consistently in all the biological replicates tested. This was not expected, since the nucleoside data do not show an increased level of the modification dihydrouridine in the sample, as shown in Chapter 4, Figure 4.6. The results here show no broad increase in the transcript levels of RNA-modifying enzymes, so transcript levels alone cannot explain the variations in nucleoside levels in the fungus under IR. The next step is to examine the levels of tRNA transcripts in the fungus under IR.

Figure 5.3 Enzymatic pathway for the synthesis of I and mcm5s2U from the canonical bases A and U. mcm5s2U synthesis begins with the addition of a carboxymethyl group at position 5 of U to form cm5U. This step is catalyzed by the elongator complex (Elp1-6) in yeast. A methyl group is added by a methyltransferase, Trm9/112, with S-adenosyl methionine as the methyl donor [1], to form mcm5U. Ubiquitin related modifier 1 (Urm1), along with other proteins, adds the thiol group at position 2 [2] to form mcm5s2U. The conversion of A to inosine is an example of RNA editing, involving the hydrolytic deamination of adenosine to form inosine. The reaction is catalyzed by tRNA-specific adenosine deaminases (ADAT), which in yeast is the Tad1 enzyme.


5.3.3 pre-tRNA and mature tRNA expression after IR exposure

Transcription of a tRNA is catalyzed by RNA polymerase III, with transcription factor (TF) IIIC binding to the A and B boxes (D loop and T loop in the tRNA gene, respectively) and TF IIIB binding upstream of the tDNA, which recruits the polymerase [135]. Cytoplasmic tRNAs are transcribed as pre-tRNAs in the nucleus and undergo several processing steps to generate a mature tRNA. All pre-tRNAs contain a 5’ leader and a 3’ trailer sequence, and a subset also contains intervening sequences in the anticodon loop called introns. The 5’ and 3’ extra sequences are removed by RNase P and RNase Z respectively, then the 3’ CCA tail is added. The introns are spliced out, and numerous modifications can be added prior to removal of the introns and in different subcellular locations (cytoplasmic vs nuclear) [64, 128, 136]. A mature tRNA is defined as a fully processed molecule that can participate in translation.

In C. neoformans, the genomic tRNA database lists 144 tRNA genes in total, with 139 decoding the twenty amino acids and 5 predicted pseudogenes. Pseudogenes are genes that possess the qualities of a tRNA yet lack the ability to fold into a canonical 3D structure [137]. In the RNA seq data set, pre-tRNA reads were mapped directly to the reference genome, whereas mature tRNA reads were mapped to the tRNAscan-SE tRNA gene predictions for the fungus [101]. In the mature tRNA reference sequences used for mapping, the trailer, leader and intron sequences were removed, the CCA tail was added, and a G nucleotide was added to the 5’ end of histidine tRNAs. Multiple copies of a mature tRNA have identical sequences, so their reads were combined into a single tRNA isotype.
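The construction of the mature tRNA reference sequences described above can be sketched in Python. This is a minimal illustration and not the pipeline used in this work: the function names, the coordinate convention (0-based, half-open intron coordinates within the trimmed sequence) and the toy sequence are all assumptions made for the example.

```python
def build_mature_reference(pre_trna, leader_len, trailer_len,
                           intron=None, amino_acid=""):
    """Return a mature tRNA reference built from a pre-tRNA sequence.

    intron: optional (start, end) pair, 0-based and half-open, given
    relative to the sequence after leader/trailer removal (assumed
    convention for this sketch).
    """
    # Remove the 5' leader and the 3' trailer.
    end = len(pre_trna) - trailer_len
    body = pre_trna[leader_len:end]
    # Splice out the intron, if the gene has one.
    if intron is not None:
        start, stop = intron
        body = body[:start] + body[stop:]
    # Histidine tRNAs receive an extra G at the 5' end.
    if amino_acid == "His":
        body = "G" + body
    # Append the 3' CCA tail.
    return body + "CCA"


def collapse_identical(references):
    """Group gene names that share an identical mature sequence."""
    groups = {}
    for gene, seq in references.items():
        groups.setdefault(seq, []).append(gene)
    return groups


# Hypothetical toy pre-tRNA, far shorter than a real one.
leader, exon1, intron, exon2, trailer = "AAA", "GCGGA", "UUUU", "UCCGC", "UUU"
pre = leader + exon1 + intron + exon2 + trailer
mature = build_mature_reference(
    pre, len(leader), len(trailer),
    intron=(len(exon1), len(exon1) + len(intron)),
)
print(mature)  # GCGGAUCCGCCCA
```

Grouping identical sequences with `collapse_identical` mirrors the step in which reads from multiple gene copies are combined into a single isotype.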


Figure 5.4 Qualitative analysis of the tRNA genes in C. neoformans after exposure to IR for A) pre-tRNA and B) mature tRNA. Green indicates that the tRNA is expressed and red means no new pre-tRNA or mature tRNA was detected.


The first goal was to identify which tRNA genes are transcribed in C. neoformans after IR exposure. The question I asked is: at the isoacceptor level, does IR induce or repress specific tRNAs? Figure 5.4A is the plot of the pre-tRNA transcripts detected in control vs IR exposed samples. All the mature tRNAs are expressed at different levels in both control and treated samples (Figure 5.4B), but none is induced or repressed in IR at this level. Two pre-tRNAs are of interest: Ser UGA and Arg CCU. The differential expression of Ser UGA suggests that no new transcript was made when cells were exposed to IR. There seems to be a preference for which isodecoder of Arg CCU (1-1, 2-1) is expressed depending on the cell’s state. Isodecoders are defined as tRNAs that bear the same anticodon but differ in sequence elsewhere. Arg CCU-1 has two copies on chromosome 4 whereas CCU-2 is a single copy on chromosome 10. The two differ by a single nucleotide at position 17 (C vs U), and the intron is 25 vs 27 nucleotides long in CCU-1 and CCU-2 respectively. This position in the D-loop is exclusively modified by the Dus1 enzyme, which converts uridine to dihydrouridine. Interestingly, the D loop is one of the binding sites for the transcription factor (TF IIIC) during tRNA synthesis. This apparent switch in isodecoder preference in IR is fascinating and, to the best of my knowledge, this is the first time it has been observed as a consequence of stress to an organism. On the surface, it seems that there is no consequence to which isodecoder is synthesized, since both can decode the same codon. The isodecoder switch in IR can be probed in many ways, such as by testing for differences between the isodecoders in translation efficiency, aminoacylation level or stability.

5.3.4 Variations in the pre-tRNA and mature tRNA expression under IR

Beyond the isoacceptor level and the qualitative check for induced or repressed tRNAs in IR, the RNA seq data showed variations in the levels of pre-tRNA and mature tRNA genes in C. neoformans. The pre-tRNA expression in the IR exposed fungus reveals that 1) not all tRNAs encoded in the genome are transcribed, 2) a subset of pre-tRNAs is stable and likely insensitive to IR exposure, and 3) a cluster of pre-tRNAs is sensitive and responds to IR exposure. I classified the pre-tRNAs into “housekeeping” tRNAs, which are always expressed, IR induced tRNAs, and IR repressed tRNAs. The Venn diagram in Figure 5.5 shows 101 pre-tRNAs that are housekeeping, 20 that are transcribed upon IR exposure and five that are repressed upon IR exposure. The identities of these pre-tRNAs are listed in Table 5.1. A complete list of the pre-tRNAs that were not expressed in any condition is given in Appendix A3, and the different levels of pre-tRNA expression are listed in Appendix A4. In the pre-tRNA set (with the exception of tRNA Ser UGA 1-1), for multiple copy genes there is at least one pre-tRNA whose level is stable, meaning its expression changes little, while the other copies show varying levels of expression. In the housekeeping set, tRNA Ser UGA is very interesting, as no new pre-tRNA transcripts are produced in IR. This single copy tRNA resides on chromosome 2. Its mature counterpart has ncm5U in the wobble position and possesses the unique modified A37 containing both the i6A and io6A modifications, as discussed in Chapter 4. Although no new tRNAs are synthesized, the levels of mature tRNA Ser are about the same as in the control. This could mean that there is no additional demand for it because the mature levels are sufficient for translation.


Figure 5.5 Venn diagram of the 132 pre-tRNA transcripts detected in C. neoformans in IR. A total of twenty tRNAs are synthesized only after cells are exposed to IR (IR induced) and five are expressed only in the control samples (IR repressed). The 101 tRNA transcripts expressed consistently are labelled as “housekeeping tRNAs”.

Table 5.1 List of IR induced, and IR repressed pre-tRNAs.

Unique in control Unique in IR Ser-UGA-1-1 Phe-GAA-2-2 Asn-GUU-1-1 Val-AAC1-6 Val-CAC-1-1 Arg-CCU-1-2 Ala-AGC-1-1 Gly-GCC-1-2 Val-AAC-1-1 Arg-CCU-2-1 Glu-CUC-2-4 Gly-GCC-1-7 Ala-AGC-1-4 Asn-GUU-2-2 Ile-AAU-1-4 Thr-AGU-1-1 Gly-GCC-1-5

In Chapter 4, I showed that the C. neoformans total tRNA pool responds to IR exposure by changing the relative quantities of its modifications. These variations can be due to changes in the level (or activity) of the RNA modifying enzymes. However, as shown in section 5.3.2, only the Dus1 (for dihydrouridine) and Trm9 (for mcm5U) transcripts were significantly upregulated under IR exposure. The increase in the mcm5U nucleoside level is consistent with the increase in the Trm9 transcript.


The second hypothesis is that the changes in nucleoside level are due to the synthesis of new tRNA molecules. One can postulate that IR activates a regulatory response that either demands more fully modified tRNAs or subjects tRNAs to degradation (a turnover process).

Figure 5.6 Fold change response of significantly expressed mature tRNA in IR for 100 Gy and 300 Gy.

To test this hypothesis, I analyzed the levels of the 75 unique tRNA isotypes of C. neoformans under IR exposure. Only 13 sets of tRNAs were significantly up- or downregulated (Figure 5.6). The common set upregulated after IR exposure is tRNA iMet, Leu CAA, Thr UGU and Tyr GUA. The common set downregulated is Glu CUC, Leu AAG and Cys GCA. In Figure 5.7, the heat map of the mature tRNAs shows no striking variations of tRNA levels under IR exposure.

Figure 5.7 Heat map of the fold change in IR for the 70 mature tRNA isotypes.

None of the tRNAs bearing the increased modifications (listed in Figure 5.3) increased in abundance after IR exposure. Combining the results from Chapters 4 and 5, it is possible that the tRNA levels and tRNA modification abundance (nucleoside) in IR are not as tightly coordinated as I initially hypothesized. The IR induced changes in the nucleoside level may be independent of the tRNA abundance level in the cell. In S. cerevisiae, as demonstrated by the Dedon group, after exposure to H2O2 and MMS the increased levels of m5C and mcm5U were not necessarily reflected by an increase in tRNA copies of Leu CAA and Arg UCU respectively [104]. However, it is important to note the current limitations of RNA sequencing for high throughput analysis of mature tRNA. tRNAs have thermodynamically stable secondary and tertiary structures that may inhibit ligation of adapters for cDNA synthesis. In addition, post transcriptional modifications can cause strong stops during reverse transcription, generating truncated reads. Although other labs have published several methods to overcome these limitations [3], the one used in this thesis is the standard miRNA analysis protocol and is not specific for tRNAs. The cell also generates tRNA derived fragments (tdRs) by cleavage of tRNAs in the anticodon (tRNA halves) or in the D and T loops (5’ or 3’ fragments), which upon sequencing can be mapped back to the mature tRNA sequence or can escape detection due to the short reads [128, 138, 139]. In addition, tRNA turnover, either by the rapid tRNA decay (RTD) pathway or by TRAMP/Rrp6 mediated tRNA degradation, can also generate fragments as a way of tRNA quality control [140]. All in all, it is hard to tease out the tRNAs that were up- or downregulated due to the effects of IR alone and not due to a combination of tRNA fragments or turnover pathways activated in the sample. Regulation of gene expression in C. neoformans under IR goes beyond tRNA expression and modification, so it is important to look at other cellular processes that occur in the fungus under IR exposure.

5.3.5 DNA damage repair and antioxidant genes are upregulated in radiation

Despite the low number of transcripts that were differentially regulated during exposure, the fungus still must counteract the deleterious effect of reactive oxygen species (ROS) generated by radiation through DNA repair processes and ROS detoxification as part of its defense system. Transcripts involved in post replication repair (Rad5, HPR5, Rad6), homologous recombination (Mus81, SAE2, MMS4, Rad57, Rad54 and Rad52), nucleotide excision repair (Rad1) and the DNA damage checkpoint (Rad9, Rad17) were highly induced (Figure 5.8). This is not surprising, as protection of genome integrity is a crucial response to DNA damaging agents such as ROS generated by IR. The antioxidant systems were also activated, but not to the same extent as the DNA repair pathways. Most antioxidant related transcripts return to background level or lower at 300 Gy (Table 5.2). Notably, thiol peroxidase mutants of C. neoformans were shown to be sensitive to hydrogen peroxide. This enzyme is also critical for virulence [141]. Taken together, C. neoformans counteracts the direct cellular damage and avoids the indirect toxic effects of ROS by ramping up the DNA repair process and antioxidant system. The complete gene ontology analysis is in Appendix A2.

5.3.6 Capsule synthesis is not induced in IR

Capsule formation in C. neoformans is an important virulence factor, along with melanin production. The capsule is composed of two major polysaccharides: glucuronoxylomannan (GXM) and glucuronoxylomannogalactan. GXM consists of an α-(1,3)-mannan main chain with β-(1,2)-glucuronic acid residues attached to every third mannose. The capsule protects the fungus from desiccation and radiation and prevents phagocytic activity [108, 130, 131, 142]. In capsule inducing media, enlarged capsules confer resistance to reactive oxygen species. Furthermore, S. cerevisiae (acapsular) was protected from harmful ROS when polysaccharide released from C. neoformans was transferred into the media [108]. Zaragoza et al. observed capsule retraction when the fungus was exposed to IR, thereby making the capsule the first line of defense through a gradual release of the polysaccharide into the media [143].


Until now, research has focused on high resolution images of the capsule in IR and on changes in its sugar composition [131]. The advantage of transcriptome analysis is that most of the information about the fungus is already on hand; one just has to ask the right question.

Figure 5.8 A. DNA repair enzymes induced in IR. Rad51, Rad57 – recombinase; Rdh54, Rad54 – DNA dependent ATPase; enzymes in double stranded break repair – Rad57, RFA2/RPA32, Dnl4, MRE11. B. Homologous recombination enzymes – Mus81, Sae2, Rad57, Rad54, Rad52. C. Post replication repair – Rad5, HPR5, Rad6; nucleotide excision repair – Rad1; DNA damage checkpoint – Rad9, Rad17. Asterisks indicate a statistically significant difference in gene expression level at p<0.05.


Table 5.2 ROS activated antioxidant transcripts induced in IR: superoxide dismutase (SOD1,2); catalase (CAT1-4); peroxiredoxin (TSA); thioredoxin (TRX1,2, GRX3); thioredoxin reductase (TRR1); cytochrome c-peroxidase (CCP1); glutathione peroxidase (GPX1,2,5) and sulfiredoxin (SRX1).

Gene name        100 Gy                     300 Gy
                 Fold change   p value      Fold change   p value
SOD1             1.54          1.13E-02     1.15          5.83E-01
SOD2             0.81          2.25E-01     0.64          1.50E-01
CAT1             0.83          2.27E-01     0.71          1.81E-01
CAT3             0.84          2.19E-01     0.50          9.06E-03
CAT4             0.74          9.69E-02     1.40          2.11E-01
TSA1             1.03          8.25E-01     0.51          5.04E-02
TSA, putative    1.24          1.54E-01     0.94          8.03E-01
TRX1             1.57          8.18E-03     1.06          8.17E-01
TRX2             1.43          4.48E-02     1.25          3.81E-01
GRX3             0.98          8.88E-01     0.99          9.62E-01
TRR1             0.86          3.67E-01     0.59          5.45E-02
TRR putative     1.24          1.54E-01     0.94          8.03E-01
CCP1             0.74          1.04E-01     1.07          7.29E-01
CCP putative     1.07          7.29E-01     1.90          2.43E-02
GPX1             1.47          4.68E-02     1.22          4.39E-01
GPX2             1.49          3.50E-02     1.03          9.14E-01
GRX5             1.59          8.14E-03     1.66          5.40E-02
SRX1             1.39          5.87E-02     0.82          4.26E-01

With this, I asked whether capsule formation is induced during IR exposure. The enzymes associated with capsule synthesis and their transcript levels in IR are presented in Table 5.3. With the exception of two, all the capsule transcripts were stable over the exposure time. The SEM images collected showed a slight retraction of the capsule, which reinforces what was previously known: the capsule takes a direct hit from radiation. Zaragoza et al. demonstrated that the capsule acts as an antioxidant by scavenging ROS and that this is not associated with catalase activity in the cell [108]. My results provide evidence that capsule formation is not induced by IR exposure.


Table 5.3 Capsule related transcripts. Descriptions of these enzymes can be found in the list of abbreviations.

Gene name        100 Gy                     300 Gy
                 Fold change   p value      Fold change   p value
CAP59            1.14          4.06E-01     1.61          7.26E-02
CAP64            1.11          4.71E-01     1.09          7.26E-01
CAP60            0.92          5.60E-01     1.10          7.21E-01
CAP10            0.93          6.15E-01     1.04          8.73E-01
CIR1             0.98          9.05E-01     1.00          9.91E-01
GPA1             0.97          8.57E-01     1.09          7.25E-01
GPR4             0.79          1.42E-01     0.67          1.69E-01
PKA1             0.88          4.05E-01     0.94          8.02E-01
PKR1             0.97          8.28E-01     1.08          7.58E-01
PDE1             1.02          8.95E-01     1.10          7.11E-01
CAC1             0.72          4.40E-02     0.84          4.95E-01
ACA1             1.00          9.94E-01     0.92          7.55E-01
CAN2             0.84          2.56E-01     0.64          8.51E-02
NRG1             0.93          6.59E-01     0.79          3.74E-01
MAN1             0.84          2.36E-01     0.72          2.03E-01
GMT1             0.87          3.86E-01     1.02          9.53E-01
UGD1             0.88          4.06E-01     0.89          6.45E-01
UXS1             1.05          7.53E-01     1.35          2.87E-01
CMT1             0.77          8.84E-02     0.86          5.47E-01
CAS1             1.02          8.71E-01     1.18          5.25E-01
CAS3             0.86          3.19E-01     1.04          8.80E-01
CAS31            0.77          8.18E-02     0.79          3.61E-01
CXT1             0.91          5.45E-01     1.05          8.47E-01
CMT1, putative   1.51          8.57E-03     2.32          3.26E-02
CAP6             0.96          7.80E-01     1.17          5.38E-01
CAP4             0.86          4.25E-01     2.23          8.73E-03
CAP2             0.84          2.48E-01     1.20          4.94E-01
CAP5             0.82          1.82E-01     1.03          9.09E-01
CAP1alpha        0.82          1.82E-01     1.03          9.09E-01
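The statement that all but two capsule transcripts were stable can be checked directly against the values in Table 5.3. The Python sketch below is one possible way to do this. The values are copied from the table, but the function name and the 1.5-fold cutoff are assumptions introduced for illustration; the text itself only states a significance level of p<0.05.

```python
import math

# gene: (fold change 100 Gy, p value 100 Gy, fold change 300 Gy, p value 300 Gy)
# Values copied from Table 5.3.
CAPSULE = {
    "CAP59": (1.14, 4.06e-01, 1.61, 7.26e-02),
    "CAP64": (1.11, 4.71e-01, 1.09, 7.26e-01),
    "CAP60": (0.92, 5.60e-01, 1.10, 7.21e-01),
    "CAP10": (0.93, 6.15e-01, 1.04, 8.73e-01),
    "CIR1": (0.98, 9.05e-01, 1.00, 9.91e-01),
    "GPA1": (0.97, 8.57e-01, 1.09, 7.25e-01),
    "GPR4": (0.79, 1.42e-01, 0.67, 1.69e-01),
    "PKA1": (0.88, 4.05e-01, 0.94, 8.02e-01),
    "PKR1": (0.97, 8.28e-01, 1.08, 7.58e-01),
    "PDE1": (1.02, 8.95e-01, 1.10, 7.11e-01),
    "CAC1": (0.72, 4.40e-02, 0.84, 4.95e-01),
    "ACA1": (1.00, 9.94e-01, 0.92, 7.55e-01),
    "CAN2": (0.84, 2.56e-01, 0.64, 8.51e-02),
    "NRG1": (0.93, 6.59e-01, 0.79, 3.74e-01),
    "MAN1": (0.84, 2.36e-01, 0.72, 2.03e-01),
    "GMT1": (0.87, 3.86e-01, 1.02, 9.53e-01),
    "UGD1": (0.88, 4.06e-01, 0.89, 6.45e-01),
    "UXS1": (1.05, 7.53e-01, 1.35, 2.87e-01),
    "CMT1": (0.77, 8.84e-02, 0.86, 5.47e-01),
    "CAS1": (1.02, 8.71e-01, 1.18, 5.25e-01),
    "CAS3": (0.86, 3.19e-01, 1.04, 8.80e-01),
    "CAS31": (0.77, 8.18e-02, 0.79, 3.61e-01),
    "CXT1": (0.91, 5.45e-01, 1.05, 8.47e-01),
    "CMT1, putative": (1.51, 8.57e-03, 2.32, 3.26e-02),
    "CAP6": (0.96, 7.80e-01, 1.17, 5.38e-01),
    "CAP4": (0.86, 4.25e-01, 2.23, 8.73e-03),
    "CAP2": (0.84, 2.48e-01, 1.20, 4.94e-01),
    "CAP5": (0.82, 1.82e-01, 1.03, 9.09e-01),
    "CAP1alpha": (0.82, 1.82e-01, 1.03, 9.09e-01),
}


def responsive(table, p_cut=0.05, fc_cut=1.5):
    """List (gene, dose, fold change, p value) entries passing both cutoffs.

    fc_cut is applied symmetrically on the log2 scale, so 1.5 flags
    fold changes >= 1.5 or <= 1/1.5 (an assumed threshold).
    """
    hits = []
    for gene, (fc100, p100, fc300, p300) in table.items():
        for dose, fc, p in (("100 Gy", fc100, p100), ("300 Gy", fc300, p300)):
            if p < p_cut and abs(math.log2(fc)) >= math.log2(fc_cut):
                hits.append((gene, dose, fc, p))
    return hits


for hit in responsive(CAPSULE):
    print(hit)
```

Under these assumed cutoffs the flagged transcripts are CMT1, putative and CAP4. With the p value criterion alone (`fc_cut=1.0`), CAC1 at 100 Gy is flagged as well, so the count depends on whether a minimum fold change is required.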


5.4 Conclusion

A platform that combines two orthogonal techniques, mass spectrometry and RNA sequencing, has been used to understand RNA modification, expression level and function in the fungus C. neoformans. Within the exposure limits used in these experiments, there is no significant remodeling of the transcriptome in IR, as less than 10% of the genes were mobilized. Some of the responses were expected: enhanced transcription of DNA repair enzymes and antioxidant activity was observed, which protects genome integrity. The transcripts for RNA modifying enzymes were also stable except for Dus1, an enzyme that catalyzes the reduction of uridine. The nucleoside data for the tRNA enriched pool did not mirror this increased level of the Dus1 transcript, which led me to several hypotheses that will be discussed in Chapter 6.

To the best of my knowledge, these are the first results reported for the differential expression level of tRNA, both pre and mature, in C. neoformans and under radiation conditions. At the pre-tRNA level, it was discovered that multiple copies of the same tRNA are transcribed at different levels. These tRNAs are distributed in different chromosomal locations. Whether this observation bears any biological significance is beyond the scope of this dissertation but will be discussed in Chapter 6. In addition, a portion of pre-tRNAs (15/132 expressed) are only synthesized in IR and five were repressed. I only tested the fungus under IR exposure; it can be speculated that more than just RNA modifications and tRNA expression levels will be part of the regulatory response when the cell is in a state of disturbance. Whether the machineries for tRNA synthesis are sensitive to different conditions, through codon bias, or whether tRNAs are just “hitchhiking” in protein synthesis, is worth investigating, as it will show an orchestra of players working in tandem in response to stress to achieve a new state of equilibrium.


Chapter 6 Conclusion and Future Directions

6.1 Summary and Conclusions

Historically, tRNAs are challenging to work with – the similarities among individual tRNA isoacceptors, extensive sample preparation required and low throughput of analysis have impeded the characterization of total tRNA pools from organisms beyond the bacterial kingdom. Recent advances in LC-MS based approaches with multiple enzymatic digestions and targeted techniques have alleviated the need to individually purify tRNAs for mapping modifications in complex mixtures.

An important discovery from this thesis is the different layers of complexity that this non-coding RNA possesses. The modification profile of tRNAs can have different states that exist simultaneously, with both modified and unmodified sites, or with hyper-modifications present at different stages of the multi-step pathway. In this body of work, I combined two techniques, LC-MS and RNA sequencing, through the reference concept to gain a holistic understanding of tRNA modifications, expression levels and their dynamics in a biological system.

In Chapter 2, I developed a mass spectral matching method to interpret LC-MS/MS data

during RNA modification mapping. The reference used here is a collection of previously acquired,

well-characterized oligonucleotide mass spectra to be searched against experimental data. I took

advantage of the publicly available software from NIST, MS Search, and established a set of criteria

for positive identification of the reference in the library. This method has been applied to identify

essential enzymes needed for the t6A modification of S. mutans through monitoring of the anticodon

loop of tRNA Ile GAU. Library building and collecting reference spectra can be tailored to the specific

needs of the researcher/lab to increase the speed of data analysis.


Chapter 3 demonstrated the second use of a reference, as an internal standard for improved comparative analysis and quantification. The reference is an isotopically labelled in vitro transcribed RNA, synthesized to match the RNA of interest. A significant aspect of this approach is the wide applicability of the reference, which can be made easily if the genomic information is available for the sample of interest. Data analysis is easier compared to traditional 16O/18O labelling because the doublets are easily characterized due to the wide mass shifts in the mass spectrum. Singlets generated from the samples are unique and can be further characterized by CID fragmentation to reveal the location of post transcriptional modifications. The method developed can be used to focus on dynamic changes in modification levels in addition to more conventional characterization of RNA modification profiles.

The reference concept can be further expanded into non-LC-MS based techniques to understand a biological system under different (stress) conditions. The third use of the reference is measuring the global expression level of RNA in the transcriptome. The model organism I chose to work with is the pathogenic and radioresistant fungus C. neoformans. To the best of my knowledge, nothing was known about tRNA expression, modification or function for this eukaryote until now. I used a next generation sequencing technique called RNA seq to expand the comparative analysis approach and understand RNA expression and other cellular processes in the fungus in response to perturbations in the system. In Chapter 4, I first established baseline information on the modification status of its total tRNA pool, mapped it back to the genomic sequence and followed its changes in response to IR. Oxidation products of guanine (8-oxo-G), m5C (hm5C) and m6A (hm6A) were observed, along with an increase in position 34 modifications such as I, ncm5U, mcm5U and mcm5s2U.

In Chapter 5, RNA seq data revealed that the initial response of the fungus to IR is to upregulate the DNA repair and antioxidant systems to preserve genome integrity. I then investigated tRNA expression at both the pre and mature levels in response to IR. An important discovery made in this work is that the levels of mature tRNA are relatively stable throughout the course of IR exposure, yet pre-tRNA synthesis is dynamic and a subset of pre-tRNAs is sensitive to radiation. There are no significant changes in the expression level of RNA modifying enzymes except for transcripts involved in the enzymatic pathway to generate mcm5s2U (C. neoformans homologues of Trm9 and Uba4). The Dus1 transcript has a two-fold induction (p<0.05) consistently in all the biological replicates tested, but the nucleoside level is stable over the course of exposure. In addition, it was discovered that multiple copies of the same tRNA, which are distributed in different chromosomal locations, are transcribed at different levels.

6.2 Future Directions

6.2.1 Quantification of selected nucleosides by selected reaction monitoring (SRM)

In Chapter 5, the transcript level of Dus1 is doubled in IR relative to control. However, the dihydrouridine nucleoside level in Chapter 4 is stable over the course of exposure. This modification is found in the D-loop of a tRNA and is a common modification that enhances a tRNA’s flexibility, thus promoting tertiary interactions [144, 145]. Dihydrouridine is the least retained nucleoside on a reverse phase C-18 column, eluting before all other molecules in the sample. This hydrophilicity results in a low ionization efficiency during electrospray. To optimize detection while simultaneously performing quantification, a selected reaction monitoring (SRM) assay was developed on a triple quadrupole (QQQ) mass spectrometer. As shown in Figure 6.1A, the level of dihydrouridine is constant. I also performed a Western blot analysis to monitor the enzyme levels (Figure 6.1B). Consistently in all biological replicates, both the protein and nucleoside levels are stable, but the transcript increases. Why are more transcripts made when the protein level stays the same? It is also very interesting that, of the three putative Dus proteins in C. neoformans (identified through alignment with the Dusp enzymes in S. cerevisiae), only the Dus1 transcript is sensitive to radiation. The Dus proteins are site specific, with Dus1 acting only on U16/U17 in the D-loop [146].

Figure 6.1 A. Selected reaction monitoring of the dihydrouridine nucleoside (the product of Dus1) by LC-MS/MS. No significant change was observed over the exposures tested. B. Western blot analysis of the Dus1 protein.

Recently, this modification has been detected in the 23S rRNA of the bacterium Clostridium sporogenes along with its methylated form, methyl dihydrouridine [147]. It is possible that Dus1 also acts on other RNA species as substrates. Fractionation of the different RNA species of the fungus followed by SRM experiments can help deduce the RNA substrate of the enzyme.


6.2.2 C. neoformans as a system to study tRNA processing

While working with C. neoformans, I realized that this fungus is “odd” to say the least. To date, it has the most intron-rich genome ever sequenced [148-150]. About 99% of its genes contain introns, in a 19 Mb genome organized into 14 chromosomes. In contrast, only ~5% of S. cerevisiae genes contain introns [151]. The average length of an intronic sequence is about 65 nt, with each gene containing about 5 introns [152]. It is therefore not surprising that its tRNA genes are also full of introns. The genomic tRNA database predicts that 133 of the 139 tRNA genes contain introns. Figure 6.2A shows how C. neoformans tRNA genes compare with those of other model fungi and of species that were found in Chernobyl. For a tRNA to participate in translation, the trailer and leader sequences must be removed, the introns spliced out and the exons ligated, and a CCA tail attached to generate a mature tRNA (Figure 6.2B).

The saga of a tRNA from “birth” to “death” begins in the nucleolus, where transcription occurs. Once transcribed, the tRNA begins a journey of processing and modification, which includes trimming of the 3’ and 5’ ends, CCA addition and post transcriptional modifications [136]. In yeast, these processes can occur in different locations in the cell, either in the nucleus or in the cytoplasm. The splicing of introns by the SEN (splicing endonuclease) complex occurs exclusively in the cytoplasm (on the surface of the mitochondria), whereas in vertebrates it occurs in the nucleoplasm [64, 128, 153]. In the fungus studied here, almost all the pre-tRNA genes detected have introns (108/110), and all were processed into mature tRNAs based on the results from Chapter 5. The big questions that must be answered are: Where are the splicing enzymes localized in the cell? Are the introns even essential to the cell’s viability? Which modifications are added to the tRNA before versus after intron removal? What is the evolutionary advantage of introns to the fungus, given that their mere presence adds an extra step prior to export to the cytoplasm? One possibility is that an intron may function as a guide RNA, like the Trp CCA intron of the archaeon H. volcanii, which helps methylate its anticodon loop [154].

Figure 6.2 A. Comparative analysis of the tRNA genes and tRNA genes with introns in C. neoformans and other model fungi. Botrytis cinerea, Aspergillus fumigatus and Penicillium chrysogenum were found in Chernobyl reactor 4. C. neoformans possesses the highest number of intron-containing tRNA genes sequenced to date. B. The canonical intron (blue circles) in a tRNA is found in the anticodon loop (anticodon in black circles). After splicing, the mature tRNA can participate in translation. Splice sites are indicated by the arrows.


Although the questions posed here will probably take years to answer, C. neoformans tRNA biology deserves special attention since the fungus is a human pathogen. The ligation of the 3’ and 5’ spliced tRNA fragments proceeds in opposite ways in fungi and humans. In human cells, the two ends are joined by the ligase HSPC117 in a 3’ phosphate ligation pathway (along with 2H CPDase, hClp1 and Tpt1) [64]. In fungi and plants, ligation follows a 5’ phosphate pathway with the Trl1, Rlg1 and Tpt1 proteins, which has unique steps compared to the mammalian counterpart [155]. The complete mechanism will not be discussed here, but what is important to note is that mammals possess both sets of enzymes needed for either the 3’ or the 5’ ligation pathway but inherently follow the 3’ pathway. In a mouse model, when the 5’ pathway enzymes (Clp1 and Tpt1) were deleted, they were found to be non-essential [156]. This key difference could be a new drug target in the fungus that does not compromise the host.

6.2.3 tRNA introns as a potential source of regulatory RNA?

Next, I surveyed the length of the introns in C. neoformans and the results are shown in Figure 6.3A. Although variable, some intron lengths fall within the range of regulatory small non-coding RNAs such as miRNA, siRNA and piRNA [70]. A common feature of miRNAs (though not true for all) is a bias for a uridine nucleotide at the 5’ end, and the majority of these introns do have a U at the beginning of their sequences (Figure 6.3B).
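The two features surveyed here, intron length and the identity of the 5’ nucleotide, lend themselves to a simple computational screen. The Python sketch below illustrates the idea only. The sequences are invented placeholders rather than C. neoformans introns, the function names are illustrative, and the 20 to 31 nt window is an assumed range for small regulatory RNAs, not the range underlined in Figure 6.3A.

```python
def normalize(seq):
    """Uppercase a sequence and write it in the RNA alphabet."""
    return seq.upper().replace("T", "U")


def screen_introns(introns, min_len=20, max_len=31):
    """Return names of introns in the length window that start with U.

    min_len and max_len are assumed bounds for small regulatory RNAs.
    """
    hits = []
    for name, seq in introns.items():
        seq = normalize(seq)
        if min_len <= len(seq) <= max_len and seq.startswith("U"):
            hits.append(name)
    return hits


def position_counts(introns, n=8):
    """Count A, C, G and U at each of the first n intron positions."""
    counts = [dict.fromkeys("ACGU", 0) for _ in range(n)]
    for seq in introns.values():
        for i, base in enumerate(normalize(seq)[:n]):
            if base in counts[i]:
                counts[i][base] += 1
    return counts


# Hypothetical placeholder sequences, for illustration only.
introns = {
    "intron_a": "UGCA" * 5 + "UG",    # 22 nt, starts with U
    "intron_b": "AGCA" + "UGCA" * 5,  # 24 nt, starts with A
    "intron_c": "UGCAUGCA",           # 8 nt, too short
}

print(screen_introns(introns))
print(position_counts(introns)[0])
```

The per-position counts correspond to the kind of summary plotted as nucleotide conservation, and could be converted to frequencies by dividing by the number of introns.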

I then searched for precedent that a regulatory RNA can come from introns and/or tRNAs, and some literature is available [103, 157-159]. Intronic miRNAs do exist in nature (as opposed to intergenic miRNAs); they share the promoter region of their host gene, so their expression is dependent on the host. I also searched for homology between the introns and known miRNAs from different organisms, and the results are shown in Table 6.1. Although the introns are similar to known miRNAs, one still has to demonstrate in vivo that they perform a regulatory function.

Figure 6.3 A. Number of tRNA genes and intron length in C. neoformans. The length range of regulatory RNAs is underlined. B. Nucleotide conservation of the first eight nucleotides of the introns in the fungus.

Perhaps the closest available information on tRNA introns as regulatory RNA comes from the recent discovery of circular RNA (circRNA) from excised introns [103]. With these potentially rich areas of research to follow, it is important to show whether these C. neoformans introns act in a regulatory fashion through repression of transcripts. In addition, Maute et al. have demonstrated that a 3’ fragment (22 nt) from human tRNA Gly GCC binds to Argonaute proteins [159]. This demonstrates that segments of a tRNA can act as substrates for proteins associated with miRNA processing.


A more intriguing thought is that introns could be acting as a “guide RNA” for proteins (for further processing, flagging for degradation, etc.), as a self-regulator (self-repression after synthesis) or as protection against viral invasion by preventing the virus from using tRNAs as primers. To answer these questions, a large scale bioinformatics survey of potential targets and binding sites can be performed.

Table 6.1 Alignment of tRNA introns to known miRNA (www.mirbase.org). HK – housekeeping; IR-I – radiation induced; IR-R – radiation repressed; Y – yes; N – no.


6.2.4 Regulation of tRNA synthesis: why are multiple copies of the same tRNA expressed at different levels?

Another curious result from the RNA seq data in Chapter 5 is the different expression levels of tRNA isoacceptors that have identical sequences but are in different locations on the chromosomes [160]. Tracking pre-tRNA differential expression is possible only for tRNAs that possess a unique intron sequence. Among those that contain unique intron sequences, it is observed that identical tRNAs (differing only in genomic location) are expressed at varying levels. For example, tRNA Val AAC has seven copies, five of which are in close proximity to each other on chromosome 12. In Table 6.2, copies 2, 3 and 7 are not expressed in any of the conditions tested. Copy 5 is expressed abundantly relative to other genes; however, the expression of copies 1, 4 and 6 is sensitive to IR. Copies 1 and 4 are expressed only in IR whereas copy 6 is repressed. It seems that at any given time there is at least one tRNA isoacceptor copy being expressed (housekeeping), and the other redundant copies are expressed under particular conditions. This observation is also true for the Ala AGC and Thr TGA tRNAs.
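The grouping of gene copies into housekeeping, IR induced, IR repressed and never expressed amounts to set operations on the copies detected in each condition. The Python sketch below applies this to the Val AAC example. The detection sets are transcribed from the description in the text, with copy 5 assumed to be detected in both conditions, and the gene labels follow the style of Table 5.1; the function name is illustrative.

```python
def classify(control, treated):
    """Classify gene copies by presence in control and IR samples."""
    control, treated = set(control), set(treated)
    return {
        "housekeeping": control & treated,  # detected in both conditions
        "IR induced": treated - control,    # detected only after IR
        "IR repressed": control - treated,  # detected only in control
    }


# Seven copies of tRNA Val AAC, labelled in the style of Table 5.1.
all_copies = {f"Val-AAC-1-{i}" for i in range(1, 8)}

# Detection sets taken from the text (copy 5 assumed present in both).
control = {"Val-AAC-1-5", "Val-AAC-1-6"}
ir = {"Val-AAC-1-1", "Val-AAC-1-4", "Val-AAC-1-5"}

groups = classify(control, ir)
groups["not detected"] = all_copies - control - ir

for label, copies in groups.items():
    print(label, sorted(copies))
```

The same function applied to the full sets of detected pre-tRNAs would give the three regions of a two-set Venn diagram such as Figure 5.5.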

This differential expression is unexpected given that the promoter regions for tRNA genes are embedded within the gene sequence itself, corresponding to the D and T loops (A and B boxes, respectively). If the genes are multiple copies of the same tRNA, the D and T loop sequences should be identical. The variability could therefore lie outside the tRNA gene itself. I propose two hypotheses that can be tested. First, the difference could lie in the 5′ upstream region of the tDNA. When RNA polymerase III is recruited by transcription factors, it binds to the 5′ upstream region of the tDNA, which is variable even among copies of the same tRNA. Aligning the upstream sequences and comparing them with the sequence preference of Pol III may reveal subtle differences between identical tRNAs. It is also possible that histone modifications at the location of tRNA genes explain why some tRNAs are not expressed (inactive) and some are expressed all the time.
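The upstream-sequence comparison proposed under the first hypothesis could begin with something as simple as the sketch below. It assumes that equal-length windows upstream of each gene copy have already been extracted from the genome, and it uses ungapped position-by-position identity rather than a true alignment. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def pairwise_upstream_identity(
        upstream: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Compare the upstream windows of all gene copies, pair by pair.

    upstream maps a copy name (e.g. "copy1") to its 5' upstream window.
    """
    return {
        (m, n): percent_identity(upstream[m], upstream[n])
        for m, n in combinations(sorted(upstream), 2)
    }
```

Copies whose upstream windows cluster together by identity could then be checked against their expression class to see whether the silent, housekeeping and IR-sensitive copies differ systematically. A gapped alignment tool would be the natural next step if the windows differ in length.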

Table 6.2 Differential expression of multiple copies of tRNA Val AAC in IR.

The second hypothesis concerns the spatial arrangement of the tDNA in the genome. It is possible that the induction or repression of a tRNA is driven by the protein-coding genes flanking the tDNA. This “hitchhiking” mechanism may explain differences in tRNA expression levels that are concomitant with the up- and down-regulation of proteins during stress. Recently, Sagi et al. showed that in C. elegans, tRNA genes can be found in the introns of protein-coding genes, where tRNA transcription is regulated by the promoter of the host gene [160]. One must navigate the genome of the fungus, look for common motifs and trends that explain these differences, and combine them with the RNA seq and proteome information. If a tRNA gene is simply “tagging along” with a protein-coding gene, an increase or decrease in expression of that gene should be reflected in the tRNA abundance.
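One way to test the “hitchhiking” idea is to ask whether the abundance of a pre-tRNA copy tracks that of its flanking protein-coding gene across conditions. The sketch below computes a Pearson correlation coefficient for two such expression profiles. It is a minimal illustration that assumes matched measurements for the same set of conditions, and it is not an analysis that was performed in this work.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles.

    x: abundance of a tRNA gene copy across conditions.
    y: abundance of the flanking protein-coding gene in the same conditions.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)
```

A coefficient near +1 for a tRNA copy and its neighbor, observed across several stress conditions, would be consistent with co-regulation, although it would not by itself establish a shared promoter.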


6.2.5 A proposed model for dynamic tRNA expression in response to stress

With everything we now know about C. neoformans, I propose a new model (Figure 6.4) in which the pool of tRNAs (pre and mature) is distinct for different types of stress. The repression and induction of pre-tRNAs during exposure to IR reveal the dynamic role of tRNAs, but this model needs to be tested with different types of stress. Null mutations or gene deletions of the redundant tRNAs can be made to show the impact on cell viability upon exposure to IR. Furthermore, a qRT-PCR experiment should be performed to validate the RNA seq data collected here.

Figure 6.4 Proposed model for the differential expression of pre-tRNA genes transcribed in response to different conditions. The protein Maf1 is dephosphorylated and binds to Pol III to initiate tRNA transcription, with transcription factor IIIC recognizing the tDNA. Under different forms of stress, different sets of pre-tRNAs are transcribed (colored blocks), and a subset (the housekeeping set) is expressed at all times.

The unique features of C. neoformans (its status as a fungal pathogen, its carbohydrate capsule and melanin synthesis, its radioresistant character and its intron-rich genome) make it a good system for studying interdisciplinary areas such as tRNA biology, gene regulation, pathogenicity and the design of materials for protection against IR.

Our understanding of tRNA and its role in the cell has greatly expanded in the past decade and will continue to do so with the discovery of tRNA-derived fragments, circRNA from tRNA introns and potentially miRNA-like sequences derived from tRNAs, all of which may be implicated in the global cellular stress response.


Bibliography

1. Letoquart, J., van Tran, N., Caroline, V., Aleksandrov, A., Lazar, N., van Tilbeurgh, H., Liger, D., Graille, M.: Insights into molecular plasticity in protein complexes from Trm9-Trm112 tRNA modifying enzyme crystal structure. Nucleic acids research. 43, 10989-11002 (2015)
2. Termathe, M., Leidel, S.A.: The Uba4 domain interplay is mediated via a thioester that is critical for tRNA thiolation through Urm1 thiocarboxylation. Nucleic acids research. 46, 5171-5181 (2018)
3. Limbach, P.A., Paulines, M.J.: Going global: the new era of mapping modifications in RNA. Wiley interdisciplinary reviews. RNA. 8, (2017)
4. Limbach, P., Crain, P., McCloskey, J.: Summary: the modified nucleosides of RNA. Nucleic Acids Res. 22, 2183-2196 (1994)
5. Li, S., Mason, C.E.: The Pivotal Regulatory Landscape of RNA Modifications. Annual review of genomics and human genetics. 15, 127-150 (2014)
6. Saletore, Y., Meyer, K., Korlach, J., Vilfan, I.D., Jaffrey, S., Mason, C.E.: The birth of the Epitranscriptome: deciphering the function of RNA modifications. Genome Biology. 13, 175 (2012)
7. Kellner, S., Burhenne, J., Helm, M.: Detection of RNA Modifications. RNA Biology. 7, 237-247 (2010)
8. Mortazavi, A., Williams, B., McCue, K., Schaeffer, L., Wold, B.: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 5, 621-628 (2008)
9. Licatalosi, D.D., Darnell, R.B.: RNA processing and its regulation: global insights into biological networks. Nature reviews. Genetics. 11, 75-87 (2010)
10. Bellacosa, A., Moss, E.G.: RNA repair: damage control. Current biology : CB. 13, R482-484 (2003)
11. Falnes, P.O., Klungland, A., Alseth, I.: Repair of methyl lesions in DNA and RNA by oxidative demethylation. Neuroscience. 145, 1222-1232 (2007)
12. Fimognari, C.: Role of Oxidative RNA Damage in Chronic-Degenerative Diseases. Oxidative medicine and cellular longevity. 2015, 358713 (2015)
13. Nunomura, A., Moreira, P.I., Castellani, R.J., Lee, H.G., Zhu, X., Smith, M.A., Perry, G.: Oxidative damage to RNA in aging and neurodegenerative disorders. Neurotoxicity research. 22, 231-248 (2012)
14. Helm, M., Alfonzo, J.D.: Posttranscriptional RNA Modifications: playing metabolic games in a cell's chemical Legoland. Chemistry & biology. 21, 174-185 (2014)
15. Jackman, J.E., Alfonzo, J.D.: Transfer RNA modifications: nature's combinatorial chemistry playground. Wiley interdisciplinary reviews. RNA. 4, 35-48 (2013)
16. Phizicky, E.M., Alfonzo, J.D.: Do all modifications benefit all tRNAs? FEBS letters. 584, 265-271 (2010)
17. Phizicky, E.M., Hopper, A.K.: tRNA biology charges to the front. Genes & development. 24, 1832-1860 (2010)
18. Phizicky, E.M., Hopper, A.K.: tRNA processing, modification, and subcellular dynamics: past, present, and future. RNA (New York, N.Y.). 21, 483-485 (2015)
19. Machnicka, M., Milanowska, K., Osman, O., Purta, E., Kurkowska, M., Olchowik, A., Januszewski, W., Kalinowski, S., Dunin-Horkawicz, S., Rother, K., Helm, M., Bujnicki, J., Grosjean, H.: MODOMICS: a database of RNA modification pathways--2013 update. Nucleic Acids Res., D262-267 (2012)
20. Yu, C.-T., Allen, F.W.: Studies on an isomer of uridine isolated from ribonucleic acids. Biochim Biophys Acta. 32, 393-405 (1959)
21. Cohn, W.E.: 5-Ribosyl uracil, a carbon-carbon ribofuranosyl nucleoside in ribonucleic acids. Biochim Biophys Acta. 32, 569-571 (1959)


22. Maydanovych, O., Beal, P.A.: Breaking the central dogma by RNA editing. Chemical reviews. 106, 3397-3411 (2006)
23. Grosjean, H., Keith, G., Droogmans, L.: Detection and quantification of modified nucleotides in RNA using thin-layer chromatography. Methods in molecular biology (Clifton, N.J.). 265, 357-391 (2004)
24. Pomerantz, S.C., McCloskey, J.A.: Analysis of RNA hydrolyzates by liquid chromatography-mass spectrometry. Methods Enzymol. 193, 796 (1990)
25.
26. Cai, W.M., Chionh, Y.H., Hia, F., Gu, C., Kellner, S., McBee, M.E., Ng, C.S., Pang, Y.L.J., Prestwich, E.G., Lim, K.S., Ramesh Babu, I., Begley, T.J., Dedon, P.C.: A Platform for Discovery and Quantification of Modified Ribonucleosides in RNA: Application to Stress-Induced Reprogramming of tRNA Modifications. Methods Enzymol. 560, 29-71 (2015)
27. Su, D., Chan, C.T.Y., Gu, C., Lim, K.S., Chionh, Y.H., McBee, M.E., Russell, B.S., Babu, I.R., Begley, T.J., Dedon, P.C.: Quantitative analysis of modifications in tRNA by HPLC-coupled mass spectrometry. Nat. Protocols. 9, 828-841 (2014)
28. Sanger, F., Brownlee, G.G., Barrell, B.G.: A Two-dimensional Fractionation Procedure for Radioactive Nucleotides. J Mol Biol. 13, 373-398 (1965)
29. Motorin, Y., Muller, S., Behm-Ansmant, I., Branlant, C.: Identification of Modified Residues in RNA by Reverse Transcription-based Methods. Methods Enzymol. 425, 21-53 (2007)
30. Kowalak, J.A., Pomerantz, S.C., Crain, P.F., McCloskey, J.A.: A novel method for the determination of post-transcriptional modification in RNA by mass spectrometry. Nucleic Acids Res. 21, 4577-4584 (1993)
31. Picardi, E., Gallo, A., Galeano, F., Tomaselli, S., Pesole, G.: A Novel Computational Strategy to Identify A-to-I RNA Editing Sites by RNA-Seq Data: De Novo Detection in Human Spinal Cord Tissue. PLoS ONE. 7, e44184 (2012)
32. Picardi, E., D'Erchia, A.M., Montalvo, A., Pesole, G.: Using REDItools to Detect RNA Editing Events in NGS Datasets. Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis ... [et al.]. 49, 12.12.11-12.12.15 (2015)
33. Torres, A.G., Pineyro, D., Rodriguez-Escriba, M., Camacho, N., Reina, O., Saint-Leger, A., Filonava, L., Batlle, E., Ribas de Pouplana, L.: Inosine modifications in human tRNAs are incorporated at the precursor tRNA level. Nucleic Acids Research. 43, 5145-5157 (2015)
34. Ebhardt, H.A., Tsang, H.H., Dai, D.C., Liu, Y., Bostan, B., Fahlman, R.P.: Meta-analysis of small RNA-sequencing errors reveals ubiquitous post-transcriptional RNA modifications. Nucleic Acids Research. 37, 2461-2470 (2009)
35. Findeiss, S., Langenberger, D., Stadler, P.F., Hoffmann, S.: Traces of post-transcriptional RNA modifications in deep sequencing data. Biological chemistry. 392, 305-313 (2011)
36. Iida, K., Jin, H., Zhu, J.-K.: Bioinformatics analysis suggests base modifications of tRNAs and miRNAs in Arabidopsis thaliana. BMC genomics. 10, 155 (2009)
37. Motorin, Y., Lyko, F., Helm, M.: 5-methylcytosine in RNA: Detection, enzymatic formation and biological functions. Nucleic Acids Res. 38, 1415-1430 (2010)
38. Khoddami, V., Yerra, A., Cairns, B.R. (1 ed.). Elsevier Inc., (2015)
39. Khoddami, V., Cairns, B.R.: Identification of direct targets and modified bases of RNA cytosine methyltransferases. Nature Biotechnol. 31, 458-464 (2013)
40. Hussain, S., Aleksic, J., Blanco, S., Dietmann, S., Frye, M.: Characterizing 5-methylcytosine in the mammalian epitranscriptome. Genome Biology. 14, (2013)
41. Hussain, S., Sajini, A., Blanco, S., Dietmann, S., Lombard, P., Sugimoto, Y., Paramor, M., Gleeson, J., Odom, D., Ule, J., Frye, M.: NSun2-Mediated Cytosine-5 Methylation of Vault Noncoding RNA Determines Its Processing into Regulatory Small RNAs. Cell Reports. 4, 255-261 (2013)


42. Hussain, S., Bashir, Z.: The epitranscriptome in modulating spatiotemporal RNA translation in neuronal post-synaptic function. Frontiers in Cellular Neuroscience. 9, 420 (2015)
43. Wu, H., D'Alessio, A.C., Ito, S., Wang, Z., Cui, K., Zhao, K., Sun, Y.E., Zhang, Y.: Genome-wide analysis of 5-hydroxymethylcytosine distribution reveals its dual function in transcriptional regulation in mouse embryonic stem cells. Genes & development. 25, 679-684 (2011)
44. Kallin, E.M., Rodriguez-Ubreva, J., Christensen, J., Cimmino, L., Aifantis, I., Helin, K., Ballestar, E., Graf, T.: Tet2 facilitates the derepression of myeloid target genes during CEBPalpha-induced transdifferentiation of pre-B cells. Molecular cell. 48, 266-276 (2012)
45. Delatte, B., Wang, F., Ngoc, L.V., Collignon, E., Bonvin, E., Deplus, R., Calonne, E., Hassabi, B., Putmans, P., Awe, S., Wetzel, C., Kreher, J., Soin, R., Creppe, C., Limbach, P.A., Gueydan, C., Kruys, V., Brehm, A., Minakhina, S., Defrance, M., Steward, R., Fuks, F.: RNA biochemistry. Transcriptome-wide distribution and function of RNA hydroxymethylcytosine. Science. 351, 282-285 (2016)
46. Dominissini, D., Moshitch-Moshkovitz, S., Amariglio, N., Rechavi, G. (1 ed.). Elsevier Inc., (2015)
47. Meyer, K., Saletore, Y., Zumbo, P., Elemento, O., Mason, C., Jaffrey, S.: Comprehensive Analysis of mRNA Methylation Reveals Enrichment in 3′ UTRs and near Stop Codons. Cell. 149, 1635-1646 (2012)
48. Srikanta, D., Santiago-Tirado, F.H., Doering, T.L.: Cryptococcus neoformans: historical curiosity to modern pathogen. Yeast (Chichester, England). 31, 47-60 (2014)
49. Linder, B., Grozhik, A.V., Olarerin-George, A.O., Meydan, C., Mason, C.E., Jaffrey, S.R.: Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome. Nature Methods. 12, 767-772 (2015)
50. Ke, S., Alemu, E.A., Mertens, C., Gantman, E.C., Fak, J.J., Mele, A., Haripal, B., Zucker-Scharff, I., Moore, M.J., Park, C.Y., Vågbø, C.B., Kuśnierczyk, A., Klungland, A., Darnell, J.E., Jr., Darnell, R.B.: A majority of m6A residues are in the last exons, allowing the potential for 3′ UTR regulation. Genes & Development. 29, 2037-2053 (2015)
51. Chen, K., Lu, Z., Wang, X., Fu, Y., Luo, G.Z., Liu, N., Han, D., Dominissini, D., Dai, Q., Pan, T., He, C.: High-resolution N(6)-methyladenosine (m(6)A) map using photo-crosslinking-assisted m(6)A sequencing. Angewandte Chemie (International ed. in English). 54, 1587-1590 (2015)
52. Wang, X., Lu, Z., Gomez, A., Hon, G.C., Yue, Y., Han, D., Fu, Y., Parisien, M., Dai, Q., Jia, G., Ren, B., Pan, T., He, C.: N6-methyladenosine-dependent regulation of messenger RNA stability. Nature. 505, 117-120 (2014)
53. Wang, X., Zhao, B.S., Roundtree, I.A., Lu, Z., Han, D., Ma, H., Weng, X., Chen, K., Shi, H., He, C.: N(6)-methyladenosine Modulates Messenger RNA Translation Efficiency. Cell. 161, 1388-1399 (2015)
54. Carlile, T.M., Rojas-Duran, M.F., Zinshteyn, B., Shin, H., Bartoli, K.M., Gilbert, W.V.: Pseudouridine profiling reveals regulated mRNA pseudouridylation in yeast and human cells. Nature. 515, 143-146 (2014)
55. Lovejoy, A.F., Riordan, D.P., Brown, P.O.: Transcriptome-Wide Mapping of : Pseudouridine Synthases Modify Specific mRNAs in S. cerevisiae. PLoS ONE. 9, e110799 (2014)
56. Schwartz, S., Bernstein, D.A., Mumbach, M.R., Jovanovic, M., Herbst, R.H., León-Ricardo, B.X., Engreitz, J.M., Guttman, M., Satija, R., Lander, E.S., Fink, G., Regev, A.: Transcriptome-wide Mapping Reveals Widespread Dynamic-Regulated Pseudouridylation of ncRNA and mRNA. Cell. 159, 148-162 (2014)
57. Bakin, A., Kowalak, J.A., McCloskey, J.A., Ofengand, J.: The single pseudouridine residue in Escherichia coli 16S RNA is located at position 516. Nucleic Acids Research. 22, 3681-3684 (1994)
58. Wu, G., Huang, C., Yu, Y.-T. (1 ed.). Elsevier Inc., (2015)


59. Cattenoz, P., Taft, R., Westhof, E., Mattick, J.: Transcriptome-wide identification of A > I RNA editing sites by inosine specific cleavage. RNA (New York, N.Y.). 19, 257-270 (2013)
60. Suzuki, T., Ueda, H., Okada, S., Sakurai, M.: Transcriptome-wide identification of adenosine-to-inosine editing using the ICE-seq method. Nature Protocols. 10, 715-732 (2015)
61. Roovers, M., Wouters, J., Bujnicki, J.M., Tricot, C., Stalon, V., Grosjean, H., Droogmans, L.: A primordial RNA modification enzyme: the case of tRNA (m1A) methyltransferase. Nucleic Acids Res. 32, 465-476 (2004)
62. Dominissini, D., Nachtergaele, S., Moshitch-Moshkovitz, S., Peer, E., Kol, N., Ben-Haim, M.S., Dai, Q., Di Segni, A., Salmon-Divon, M., Clark, W.C., Zheng, G., Pan, T., Solomon, O., Eyal, E., Hershkovitz, V., Han, D., Dore, L.C., Amariglio, N., Rechavi, G., He, C.: The dynamic N(1)-methyladenosine methylome in eukaryotic messenger RNA. Nature. 530, 441-446 (2016)
63. Li, X., Xiong, X., Wang, K., Wang, L., Shu, X., Ma, S., Yi, C.: Transcriptome-wide mapping reveals reversible and dynamic N(1)-methyladenosine methylome. Nature chemical biology. 12, 311-316 (2016)
64. Cozen, A.E., Quartley, E., Holmes, A.D., Hrabeta-Robinson, E., Phizicky, E.M., Lowe, T.M.: ARM-seq: AlkB-facilitated RNA methylation sequencing reveals a complex landscape of modified tRNA fragments. Nature Methods. 12, 879-884 (2015)
65. Zheng, G., Qin, Y., Clark, W., Dai, Q., Yi, C., He, C., Lambowitz, A., Pan, T.: Efficient and quantitative high-throughput tRNA sequencing. Nature Methods. 12, 835-839 (2015)
66. Birkedal, U., Christensen-Dalsgaard, M., Krogh, N., Sabarinathan, R., Gorodkin, J., Nielsen, H.: Profiling of Ribose Methylations in RNA by High-Throughput Sequencing. Angewandte Chemie International Edition. n/a-n/a (2014)
67. Cahová, H., Winz, M.-L., Höfer, K., Nübel, G., Jäschke, A.: NAD captureSeq indicates NAD as a bacterial cap for a subset of regulatory RNAs. Nature. 519, 374-377 (2014)
68. McCloskey, J.A., Nishimura, S.: Modified nucleosides in transfer RNA. Accounts of chemical research. (1977)
69. Gaston, K.W., Limbach, P.A.: The identification and characterization of non-coding and coding RNAs and their modified nucleosides by mass spectrometry. RNA Biol. 11, 1568-1585 (2014)
70. Björkbom, A., Lelyveld, V.S., Zhang, S., Zhang, W., Tam, C.P., Blain, J.C., Szostak, J.W.: Bidirectional Direct Sequencing of Noncanonical RNA by Two-Dimensional Analysis of Mass Chromatograms. Journal of the American Chemical Society. 137, 14430-14438 (2015)
71. Wetzel, C., Limbach, P.A.: Mass spectrometry of modified RNAs: recent developments. The Analyst. 141, 16-23 (2016)
72. Li, S., Limbach, P.: Method for Comparative Analysis of Ribonucleic Acids Using Isotope Labeling and Mass Spectrometry. Anal Chem. 84, 8607-8613 (2012)
73. Li, S., Limbach, P.: Mass spectrometry sequencing of transfer ribonucleic acids by the comparative analysis of RNA digests (CARD) approach. The Analyst. 138, 1386-1394 (2013)
74. Puri, P., Wetzel, C., Saffert, P., Gaston, K.W., Russell, S.P., Cordero Varela, J.A., van der Vlies, P., Zhang, G., Limbach, P.A., Ignatova, Z., Poolman, B.: Systematic identification of tRNAome and its dynamics in Lactococcus lactis. Mol Microbiol. 93, 944-956 (2014)
75. Wetzel, C., Limbach, P.: The global identification of tRNA isoacceptors by targeted tandem mass spectrometry. The Analyst. 138, 6063-6072 (2013)
76. Taoka, M., Nobe, Y., Hori, M., Takeuchi, A., Masaki, S., Yamauchi, Y., Nakayama, H., Takahashi, N., Isobe, T.: A mass spectrometry-based method for comprehensive quantitative determination of post-transcriptional RNA modifications: the complete chemical structure of Schizosaccharomyces pombe ribosomal RNAs. Nucleic Acids Research. 43, e115 (2015)


77. Cao, X., Limbach, P.A.: Enhanced Detection of Post-Transcriptional Modifications Using a Mass-Exclusion List Strategy for RNA Modification Mapping by LC-MS/MS. Anal Chem. 87, 8433-8440 (2015)
78. Popova, A.M., Williamson, J.R.: Quantitative analysis of rRNA modifications using stable isotope labeling and mass spectrometry. JACS. 136, 2058-2069 (2014)
79. Sample, P.J., Gaston, K.W., Alfonzo, J.D., Limbach, P.A.: RoboOligo: software for mass spectrometry data to support manual and de novo sequencing of post-transcriptionally modified ribonucleic acids. Nucleic Acids Res. 43, e64-e64 (2015)
80. Nakayama, H., Takahashi, N., Isobe, T.: Informatics for mass spectrometry-based RNA analysis. Mass Spectrom Rev. 30, 1000-1012 (2011)
81. Nakayama, H., Akiyama, M., Taoka, M., Yamauchi, Y., Nobe, Y., Ishikawa, H., Takahashi, N., Isobe, T.: Ariadne: a database search engine for identification and chemical analysis of RNA using tandem mass spectrometry data. Nucleic Acids Res. 37, e47 (2009)
82. Liu, N., Pan, T.: Probing RNA Modification Status at Single-Nucleotide Resolution in Total RNA. Methods in enzymology. 560, 149-159 (2015)
83. Zhao, X., Yu, Y.T.: Detection and quantitation of RNA base modifications. RNA (New York, N.Y.). 10, 996-1002 (2004)
84. Yu, Y.T., Shu, M.D., Steitz, J.A.: A new method for detecting sites of 2'-O-methylation in RNA molecules. RNA (New York, N.Y.). 3, 324-331 (1997)
85. Dong, Z.-W., Shao, P., Diao, L.-T., Zhou, H., Yu, C.-H., Qu, L.-H.: RTL-P: a sensitive approach for detecting sites of 2'-O-methylation in RNA molecules. Nucleic Acids Research. 40, e157-e157 (2012)
86. Ayub, M., Hardwick, S.W., Luisi, B.F., Bayley, H.: Nanopore-Based Identification of Individual Nucleotides for Direct RNA Sequencing. Nano Letters. 13, 6144-6150 (2013)
87. Smith, A.M., Abu-Shumays, R., Akeson, M., Bernick, D.L.: Capture, Unfolding, and Detection of Individual tRNA Molecules Using a Nanopore Device. Frontiers in Bioengineering and Biotechnology. 3, (2015)
88. Vilfan, I.D., Tsai, Y.-C., Clark, T.A., Wegener, J., Dai, Q., Yi, C., Pan, T., Turner, S.W., Korlach, J.: Analysis of RNA base modification and structural rearrangement by single-molecule real-time detection of reverse transcription. Journal of Nanobiotechnology. 11, 1-1 (2013)
89. McLuckey, S.A., Van Berkel, G.J., Glish, G.L.: Tandem mass spectrometry of small, multiply charged oligonucleotides. Journal of the American Society for Mass Spectrometry. 3, 60-70 (1992)
90. Taucher, M., Rieder, U., Breuker, K.: Minimizing base loss and internal fragmentation in collisionally activated dissociation of multiply deprotonated RNA. Journal of the American Society for Mass Spectrometry. 21, 278-285 (2010)
91. Yu, N., Lobue, P.A., Cao, X., Limbach, P.A.: RNAModMapper: RNA Modification Mapping Software for Analysis of Liquid Chromatography Tandem Mass Spectrometry Data. Analytical chemistry. 89, 10744-10752 (2017)
92. Kowalak, J.A., Pomerantz, S.C., Crain, P.F., McCloskey, J.A.: A novel method for the determination of post-transcriptional modification in RNA by mass spectrometry. Nucleic acids research. 21, 4577-4585 (1993)
93. Nakayama, H., Akiyama, M., Taoka, M., Yamauchi, Y., Nobe, Y., Ishikawa, H., Takahashi, N., Isobe, T.: Ariadne: a database search engine for identification and chemical analysis of RNA using tandem mass spectrometry data. Nucleic acids research. 37, e47 (2009)
94. Sample, P.J., Gaston, K.W., Alfonzo, J.D., Limbach, P.A.: RoboOligo: software for mass spectrometry data to support manual and de novo sequencing of post-transcriptionally modified ribonucleic acids. Nucleic acids research. 43, e64 (2015)


95. Matthiesen, R., Kirpekar, F.: Identification of RNA molecules by specific enzyme digestion and mass spectrometry: software for and implementation of RNA mass mapping. Nucleic acids research. 37, e48 (2009)
96. Lam, H.: Building and searching tandem mass spectral libraries for peptide identification. Molecular & cellular proteomics : MCP. 10, R111.008565 (2011)
97. Bacusmo, J.M., Orsini, S.S., Hu, J., DeMott, M., Thiaville, P.C., Elfarash, A., Paulines, M.J., Rojas-Benítez, D., Meineke, B., Deutsch, C., Iwata-Reuyl, D., Limbach, P.A., Dedon, P.C., Rice, K.C., Shuman, S., Crécy-Lagard, V.d.: The t6A modification acts as a positive determinant for the anticodon nuclease PrrC, and is distinctively nonessential in Streptococcus mutans. RNA Biology. 1-10 (2017)
98. Machnicka, M.A., Milanowska, K., Osman Oglou, O., Purta, E., Kurkowska, M., Olchowik, A., Januszewski, W., Kalinowski, S., Dunin-Horkawicz, S., Rother, K.M., Helm, M., Bujnicki, J.M., Grosjean, H.: MODOMICS: a database of RNA modification pathways--2013 update. Nucleic acids research. 41, D262-267 (2013)
99. Wang, Y., Casadevall, A.: Decreased susceptibility of melanized Cryptococcus neoformans to UV light. Applied and environmental microbiology. 60, 3864-3866 (1994)
100. Castleberry, C.M., Limbach, P.A.: Relative quantitation of transfer RNAs using liquid chromatography mass spectrometry and signature digestion products. Nucleic acids research. 38, e162 (2010)
101. Chan, P.P., Lowe, T.M.: GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes. Nucleic acids research. 44, D184-189 (2016)
102. Taoka, M., Nobe, Y., Hori, M., Takeuchi, A., Masaki, S., Yamauchi, Y., Nakayama, H., Takahashi, N., Isobe, T.: A mass spectrometry-based method for comprehensive quantitative determination of post-transcriptional RNA modifications: the complete chemical structure of Schizosaccharomyces pombe ribosomal RNAs. Nucleic acids research. 43, e115 (2015)
103. Houser, W.M., Butterer, A., Addepalli, B., Limbach, P.A.: Combining recombinant ribonuclease U2 and for RNA modification mapping by liquid chromatography-mass spectrometry. Analytical biochemistry. 478, 52-58 (2015)
104. Pang, Y.L., Abo, R., Levine, S.S., Dedon, P.C.: Diverse cell stresses induce unique patterns of tRNA up- and down-regulation: tRNA-seq for quantifying changes in tRNA copy number. Nucleic acids research. 42, e170 (2014)
105. Zhdanova, N.N., Zakharchenko, V.A., Vember, V.V., Nakonechnaya, L.T.: Fungi from Chernobyl: mycobiota of the inner regions of the containment structures of the damaged nuclear reactor. Mycological Research. 104, 1421-1426 (2000)
106. Chan, C.T., Deng, W., Li, F., DeMott, M.S., Babu, I.R., Begley, T.J., Dedon, P.C.: Highly Predictive Reprogramming of tRNA Modifications Is Linked to Selective Expression of Codon-Biased Genes. Chemical research in toxicology. 28, 978-988 (2015)
107. Rajasingham, R., Smith, R.M., Park, B.J., Jarvis, J.N., Govender, N.P., Chiller, T.M., Denning, D.W., Loyse, A., Boulware, D.R.: Global burden of disease of HIV-associated cryptococcal meningitis: an updated analysis. The Lancet. Infectious diseases. 17, 873-881 (2017)
108. Zaragoza, O., Rodrigues, M.L., De Jesus, M., Frases, S., Dadachova, E., Casadevall, A.: The capsule of the fungal pathogen Cryptococcus neoformans. Advances in applied microbiology. 68, 133-216 (2009)
109. Idnurm, A., Bahn, Y.S., Nielsen, K., Lin, X., Fraser, J.A., Heitman, J.: Deciphering the model pathogenic fungus Cryptococcus neoformans. Nature reviews. Microbiology. 3, 753-764 (2005)
110. Dadachova, E., Casadevall, A.: Ionizing radiation: how fungi cope, adapt, and exploit with the help of melanin. Current opinion in microbiology. 11, 525-531 (2008)


111. Rosas, A.L., Casadevall, A.: Melanization affects susceptibility of Cryptococcus neoformans to heat and cold. FEMS microbiology letters. 153, 265-272 (1997)
112. Garcia-Rivera, J., Casadevall, A.: Melanization of Cryptococcus neoformans reduces its susceptibility to the antimicrobial effects of silver nitrate. Medical mycology. 39, 353-357 (2001)
113. Rosas, A.L., Casadevall, A.: Melanization decreases the susceptibility of Cryptococcus neoformans to enzymatic degradation. Mycopathologia. 151, 53-56 (2001)
114. Dadachova, E., Bryan, R.A., Huang, X., Moadel, T., Schweitzer, A.D., Aisen, P., Nosanchuk, J.D., Casadevall, A.: Ionizing radiation changes the electronic properties of melanin and enhances the growth of melanized fungi. PloS one. 2, e457 (2007)
115. Malo, M.E., Bryan, R.A., Shuryak, I., Dadachova, E.: Morphological changes in melanized and non-melanized Cryptococcus neoformans cells post exposure to sparsely and densely ionizing radiation demonstrate protective effect of melanin. Fungal biology. 122, 449-456 (2018)
116. Pacelli, C., Bryan, R.A., Onofri, S., Selbmann, L., Shuryak, I., Dadachova, E.: Melanin is effective in protecting fast and slow growing fungi from various types of ionizing radiation. Environmental microbiology. 19, 1612-1624 (2017)
117. Helm, M., Alfonzo, Juan D.: Posttranscriptional RNA Modifications: Playing Metabolic Games in a Cell’s Chemical Legoland. Chemistry & Biology. 21, 174-185 (2014)
118. Schweizer, U., Bohleber, S., Fradejas-Villar, N.: The modified base isopentenyladenosine and its derivatives in tRNA. RNA Biology. 14, 1197-1208 (2017)
119. Benko, A.L., Vaduva, G., Martin, N.C., Hopper, A.K.: Competition between a sterol biosynthetic enzyme and tRNA modification in addition to changes in the protein synthesis machinery causes altered nonsense suppression. Proceedings of the National Academy of Sciences of the United States of America. 97, 61-66 (2000)
120. Mathevon, C., Pierrel, F., Oddou, J.-L., Garcia-Serres, R., Blondin, G., Latour, J.-M., Ménage, S., Gambarelli, S., Fontecave, M., Atta, M.: tRNA-modifying MiaE protein from Salmonella typhimurium is a nonheme diiron monooxygenase. Proceedings of the National Academy of Sciences. 104, 13295-13300 (2007)
121. Persson, B.C., Olafsson, O., Lundgren, H.K., Hederstedt, L., Bjork, G.R.: The ms2io6A37 modification of tRNA in Salmonella typhimurium regulates growth on citric acid cycle intermediates. Journal of bacteriology. 180, 3144-3151 (1998)
122. Shuryak, I., Bryan, R.A., Broitman, J., Marino, S.A., Morgenstern, A., Apostolidis, C., Dadachova, E.: Effects of radiation type and delivery mode on a radioresistant eukaryote Cryptococcus neoformans. Nuclear medicine and biology. 42, 515-523 (2015)
123. Iben, J.R., Maraia, R.J.: tRNAomics: tRNA gene copy number variation and codon use provide bioinformatic evidence of a new anticodon:codon wobble pair in a eukaryote. RNA (New York, N.Y.). 18, 1358-1372 (2012)
124. Higgs, P.G., Ran, W.: Coevolution of Codon Usage and tRNA Genes Leads to Alternative Stable States of Biased Codon Usage. Molecular Biology and Evolution. 25, 2279-2291 (2008)
125. Munk, B.H., Burrows, C.J., Schlegel, H.B.: Exploration of mechanisms for the transformation of 8-hydroxy guanine radical to FAPyG by density functional theory. Chemical research in toxicology. 20, 432-444 (2007)
126. Zou, S., Toh, J.D., Wong, K.H., Gao, Y.G., Hong, W., Woon, E.C.: N(6)-Methyladenosine: a conformational marker that regulates the substrate specificity of human demethylases FTO and ALKBH5. Scientific reports. 6, 25677 (2016)
127. Roundtree, I.A., Evans, M.E., Pan, T., He, C.: Dynamic RNA Modifications in Gene Expression Regulation. Cell. 169, 1187-1200 (2017)


128. Cozen, A.E., Quartley, E., Holmes, A.D., Hrabeta-Robinson, E., Phizicky, E.M., Lowe, T.M.: ARM-seq: AlkB-facilitated RNA methylation sequencing reveals a complex landscape of modified tRNA fragments. Nature methods. 12, 879-884 (2015)
129. Gu, W., Jackman, J.E., Lohan, A.J., Gray, M.W., Phizicky, E.M.: tRNAHis maturation: an essential yeast protein catalyzes addition of a guanine nucleotide to the 5' end of tRNAHis. Genes & development. 17, 2889-2901 (2003)
130. McFadden, D.C., Casadevall, A.: Capsule and melanin synthesis in Cryptococcus neoformans. Medical Mycology. 39, 19-30 (2001)
131. Araujo, G.R., Fontes, G.N., Leao, D., Rocha, G.M., Pontes, B., Sant'Anna, C., de Souza, W., Frases, S.: Cryptococcus neoformans capsular polysaccharides form branched and complex filamentous networks viewed by high-resolution microscopy. Journal of structural biology. 193, 75-82 (2016)
132. Azzam, E.I., Jay-Gerin, J.P., Pain, D.: Ionizing radiation-induced metabolic oxidative stress and prolonged cell injury. Cancer letters. 327, 48-60 (2012)
133. Azzam, E.I., de Toledo, S.M., Little, J.B.: Oxidative metabolism, gap junctions and the ionizing radiation-induced bystander effect. Oncogene. 22, 7050-7057 (2003)
134. Mothersill, C., Seymour, C.B.: Radiation-induced bystander effects--implications for cancer. Nature reviews. Cancer. 4, 158-164 (2004)
135. Lesniewska, E., Boguta, M.: Novel layers of RNA polymerase III control affecting tRNA gene transcription in eukaryotes. Open biology. 7, (2017)
136. Hopper, A.K.: Transfer RNA post-transcriptional processing, turnover, and subcellular dynamics in the yeast Saccharomyces cerevisiae. Genetics. 194, 43-67 (2013)
137. Bermudez-Santana, C., Attolini, C.S.-O., Kirsten, T., Engelhardt, J., Prohaska, S.J., Steigele, S., Stadler, P.F.: Genomic organization of eukaryotic tRNAs. BMC Genomics. 11, 270 (2010)
138. Keam, S.P., Hutvagner, G.: tRNA-Derived Fragments (tRFs): Emerging New Roles for an Ancient RNA in the Regulation of Gene Expression. Life (Basel, Switzerland). 5, 1638-1651 (2015)
139. Venkatesh, T., Suresh, P.S., Tsutsumi, R.: tRFs: miRNAs in disguise. Gene. 579, 133-138 (2016)
140. Dewe, J.M., Whipple, J.M., Chernyakov, I., Jaramillo, L.N., Phizicky, E.M.: The yeast rapid tRNA decay pathway competes with elongation factor 1A for substrate tRNAs and acts on tRNAs lacking one or more of several modifications. RNA (New York, N.Y.). 18, 1886-1896 (2012)
141. Missall, T.A., Pusateri, M.E., Lodge, J.K.: Thiol peroxidase is critical for virulence and resistance to nitric oxide and peroxide in the fungal pathogen, Cryptococcus neoformans. Molecular microbiology. 51, 1447-1458 (2004)
142. Ding, H., Mayer, F.L., Sanchez-Leon, E., de, S.A.G.R., Frases, S., Kronstad, J.W.: Networks of fibers and factors: regulation of capsule formation in Cryptococcus neoformans. F1000Research. 5, (2016)
143. Maxson, M.E., Dadachova, E., Casadevall, A., Zaragoza, O.: Radial mass density, charge, and epitope distribution in the Cryptococcus neoformans capsule. Eukaryotic cell. 6, 95-109 (2007)
144. Dalluge, J.J., Hashizume, T., Sopchik, A.E., McCloskey, J.A., Davis, D.R.: Conformational flexibility in RNA: the role of dihydrouridine. Nucleic acids research. 24, 1073-1079 (1996)
145. Dalluge, J.J., Hamamoto, T., Horikoshi, K., Morita, R.Y., Stetter, K.O., McCloskey, J.A.: Posttranscriptional modification of tRNA in psychrophilic bacteria. Journal of bacteriology. 179, 1918-1923 (1997)
146. Xing, F., Hiley, S.L., Hughes, T.R., Phizicky, E.M.: The specificities of four yeast dihydrouridine synthases for cytoplasmic tRNAs. The Journal of biological chemistry. 279, 17850-17860 (2004)
147. Kirpekar, F., Hansen, L.H., Mundus, J., Tryggedsson, S., Teixeira Dos Santos, P., Ntokou, E., Vester, B.: Mapping of ribosomal 23S ribosomal RNA modifications in Clostridium sporogenes. RNA biology. (2018)

146

148. Goebels, C., Thonn, A., Gonzalez-Hilarion, S., Rolland, O., Moyrand, F., Beilharz, T.H., Janbon, G.: Introns regulate gene expression in Cryptococcus neoformans in a Pab2p dependent pathway. PLoS genetics. 9, e1003686 (2013) 149. Gonzalez-Hilarion, S., Paulet, D., Lee, K.T., Hon, C.C., Lechat, P., Mogensen, E., Moyrand, F., Proux, C., Barboux, R., Bussotti, G., Hwang, J., Coppee, J.Y., Bahn, Y.S., Janbon, G.: Intron retention- dependent gene regulation in Cryptococcus neoformans. Scientific reports. 6, 32252 (2016) 150. Kupfer, D.M., Drabenstot, S.D., Buchanan, K.L., Lai, H., Zhu, H., Dyer, D.W., Roe, B.A., Murphy, J.W.: Introns and splicing elements of five diverse fungi. Eukaryotic cell. 3, 1088-1100 (2004) 151. Loftus, B.J., Fung, E., Roncaglia, P., Rowley, D., Amedeo, P., Bruno, D., Vamathevan, J., Miranda, M., Anderson, I.J., Fraser, J.A., Allen, J.E., Bosdet, I.E., Brent, M.R., Chiu, R., Doering, T.L., Donlin, M.J., D'Souza, C.A., Fox, D.S., Grinberg, V., Fu, J., et al.: The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. Science (New York, N.Y.). 307, 1321-1324 (2005) 152. Janbon, G.: Introns in Cryptococcus. Memorias do Instituto Oswaldo Cruz. 113, e170519 (2018) 153. Yoshihisa, T., Yunoki-Esaki, K., Ohshima, C., Tanaka, N., Endo, T.: Possibility of cytoplasmic pre- tRNA splicing: the yeast tRNA splicing endonuclease mainly localizes on the mitochondria. Molecular biology of the cell. 14, 3266-3279 (2003) 154. Joardar, A., Gurha, P., Skariah, G., Gupta, R.: Box C/D RNA-guided 2'-O methylations and the intron of tRNATrp are not essential for the viability of Haloferax volcanii. Journal of bacteriology. 190, 7308-7313 (2008) 155. Yoshihisa, T.: Handling tRNA introns, archaeal way and eukaryotic way. Frontiers in genetics. 5, 213 (2014) 156. 
Harding, H.P., Lackey, J.G., Hsu, H.C., Zhang, Y., Deng, J., Xu, R.M., Damha, M.J., Ron, D.: An intact unfolded protein response in Trpt1 knockout mice reveals phylogenic divergence in pathways for RNA ligation. RNA (New York, N.Y.). 14, 225-232 (2008) 157. Lee, Y.S., Shibata, Y., Malhotra, A., Dutta, A.: A novel class of small RNAs: tRNA-derived RNA fragments (tRFs). Genes & development. 23, 2639-2649 (2009) 158. Lin, S.L., Miller, J.D., Ying, S.Y.: Intronic microRNA (miRNA). Journal of biomedicine & biotechnology. 2006, 26818 (2006) 159. Maute, R.L., Schneider, C., Sumazin, P., Holmes, A., Califano, A., Basso, K., Dalla-Favera, R.: tRNA- derived microRNA modulates proliferation and the DNA damage response and is down-regulated in B cell lymphoma. Proc Natl Acad Sci U S A. 110, 1404-1409 (2013) 160. Sagi, D., Rak, R., Gingold, H., Adir, I., Maayan, G., Dahan, O., Broday, L., Pilpel, Y., Rechavi, O.: Tissue- and Time-Specific Expression of Otherwise Identical tRNA Genes. PLoS genetics. 12, e1006264 (2016)


Appendices

Appendix A1 Summary of mapped tRNA sequences for C. neoformans. Modified nucleosides are highlighted in red; the symbols used on each page are defined, with the complete modification name, at the bottom of that page.

O – 1-methylinosine (m1I); " – N1-methyladenosine (m1A); I – inosine (I); & – 5-carbamoylmethyluridine (ncm5U); T – 5-methyluridine (m5U); D – dihydrouridine (D)


O – 1-methylinosine (m1I); 6 – N6-threonylcarbamoyladenosine (t6A); K – 1-methylguanosine (m1G); & – 5-carbamoylmethyluridine (ncm5U); 1 – 5-methoxycarbonylmethyluridine (mcm5U); L – N2-methylguanosine (m2G)


+ – N6-isopentenyladenosine (i6A); " – N1-methyladenosine (m1A); J – 2′-O-methyluridine (Um); ? – 5-methylcytidine (m5C); T – 5-methyluridine (m5U)


3 – 5-methoxycarbonylmethyl-2-thiouridine (mcm5s2U); " – N1-methyladenosine (m1A); J – 2′-O-methyluridine (Um); ? – 5-methylcytidine (m5C); T – 5-methyluridine (m5U)


3 – 5-methoxycarbonylmethyl-2-thiouridine (mcm5s2U); " – N1-methyladenosine (m1A); T – 5-methyluridine (m5U); K – 1-methylguanosine (m1G); B – 2′-O-methylcytidine (Cm); D – dihydrouridine (D)


ε – 1,2′-O-dimethylguanosine (m1Gm); 6 – N6-threonylcarbamoyladenosine (t6A); I – inosine (I); T – 5-methyluridine (m5U); " – N1-methyladenosine (m1A)


" – N1-methyladenosine (m1A); 6 – N6-threonylcarbamoyladenosine (t6A); K – 1-methylguanosine (m1G); L – N2-methylguanosine (m2G); D – dihydrouridine (D); ? – 5-methylcytidine (m5C); 3 – 5-methoxycarbonylmethyl-2-thiouridine (mcm5s2U)


I – inosine (I); " – N1-methyladenosine (m1A); K – 1-methylguanosine (m1G); L – N2-methylguanosine (m2G); T – 5-methyluridine (m5U); B – 2′-O-methylcytidine (Cm); & – 5-carbamoylmethyluridine (ncm5U)


+ – N6-isopentenyladenosine (i6A); ` – N6-(cis-hydroxyisopentenyl)adenosine (io6A); T – 5-methyluridine (m5U); B – 2′-O-methylcytidine (Cm); & – 5-carbamoylmethyluridine (ncm5U); " – N1-methyladenosine (m1A)


6 – N6-threonylcarbamoyladenosine (t6A); " – N1-methyladenosine (m1A); T – 5-methyluridine (m5U); + – N6-isopentenyladenosine (i6A); ` – N6-(cis-hydroxyisopentenyl)adenosine (io6A); 7 – 7-methylguanosine (m7G)


I – inosine (I); T – 5-methyluridine (m5U); K – 1-methylguanosine (m1G); 7 – 7-methylguanosine (m7G); D – dihydrouridine (D); ? – 5-methylcytidine (m5C)


Appendix A2 Gene Ontology analysis of C. neoformans genes up- and downregulated after exposure to ionizing radiation (IR).

Control vs 100 Gy Upregulated genes


Control vs 100 Gy Downregulated genes


Control vs 300 Gy Upregulated genes


Control vs 300 Gy Downregulated genes


Appendix A3 Pre-tRNAs that were not detected in the C. neoformans control or IR-exposed samples, from the results of Chapter 5. Each unique tRNA transcript is named by its isotype and anticodon (e.g., Trp-CCA) and numbered in ascending order from the most canonical to the least (e.g., Trp-CCA-1, Trp-CCA-2), as determined by the program tRNAscan-SE. Multiple copies of the same tRNA at different loci are given a second number (e.g., Trp-CCA-1-1, Trp-CCA-1-2). The chromosomal location and legacy name of each tRNA transcript can be found in the Genomic tRNA Database (http://gtrnadb.ucsc.edu).

Not detected in control:
Val-CAC-1-1 Cys-GCA-1-1 Gly-GCC-1-2 Ile-AAT-1-6 Tyr-GTA-1-2 Arg-CCT-1-1 Phe-GAA-2-3 Asn-GTT-2-2 Thr-AGT-1-1 Tyr-GTA-1-1 Gly-GCC-1-6 Ala-AGC-1-2 Val-AAC-1-2 Ala-AGC-1-3 Thr-AGT-1-4 Val-AAC-1-1 Thr-AGT-1-5 Ile-AAT-1-4 Met-CAT-2-2 Gly-GCC-1-7 Glu-CTC-2-4 Ala-AGC-1-6 Val-AAC-1-3 Ala-AGC-1-5 Gly-GCC-1-5 Val-AAC-1-4 Arg-CCT-1-2 Phe-GAA-2-4 Phe-GAA-1-1 Val-AAC-1-7 Asn-GTT-1-1

Not detected in 100 Gy:
Met-CAT-1-3 Leu-CAA-1-3 Glu-TTC-1-3 Arg-CCT-1-2 Asn-GTT-1-1 Val-AAC-1-1 Tyr-GTA-1-3 Gly-GCC-1-7 Ile-AAT-1-3 Val-AAC-1-4


Not detected in 300 Gy:
Val-CAC-1-1 Phe-GAA-1-1 Gly-GCC-1-2 Val-AAC-1-7 Phe-GAA-2-3 Ile-AAT-1-6 Thr-AGT-1-1 Asn-GTT-2-2 Ala-AGC-1-4 Ala-AGC-1-2 Gly-GCC-1-6 Ala-AGC-1-3 Val-AAC-1-2 Arg-CCT-2-1 Thr-AGT-1-5 Ile-AAT-1-4 Met-CAT-2-2 Val-AAC-1-6 Glu-CTC-2-4 Ala-AGC-1-1 Val-AAC-1-3 Ser-TGA-1-1 Gly-GCC-1-5
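
The naming scheme described in the caption of Appendix A3 (isotype, three-letter anticodon triplet, transcript number, locus copy number) is regular enough to be parsed automatically. The following is a minimal Python sketch, not the procedure used in this work; the names `TrnaName`, `parse_trna_name` and `copy_number` are hypothetical and were introduced only for illustration. It assumes that names follow the pattern shown in the appendices, with an optional "tRNA-" prefix.

```python
import re
from collections import Counter
from typing import NamedTuple


class TrnaName(NamedTuple):
    """Parts of a name such as Trp-CCA-1-2 (hypothetical helper type)."""
    isotype: str       # amino acid isotype, e.g. "Trp"
    anticodon: str     # three-letter triplet as written in the name, e.g. "CCA"
    transcript: int    # rank of the unique transcript (1 = most canonical)
    copy_number: int   # index of the gene copy (locus) for that transcript


# Optional "tRNA-" prefix, then isotype, triplet, and two integers.
_NAME_PATTERN = re.compile(r"^(?:tRNA-)?([A-Za-z]+)-([ACGT]{3})-(\d+)-(\d+)$")


def parse_trna_name(name: str) -> TrnaName:
    """Split a tRNA transcript name into its four components."""
    match = _NAME_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized tRNA name: {name!r}")
    isotype, anticodon, transcript, copy_number = match.groups()
    return TrnaName(isotype, anticodon, int(transcript), int(copy_number))


# Usage: tally a few names taken from the lists above by isotype.
example_names = ["Val-AAC-1-1", "Val-AAC-1-4", "Gly-GCC-1-7", "Arg-CCT-1-2"]
isotype_counts = Counter(parse_trna_name(n).isotype for n in example_names)
print(isotype_counts)
```

Parsing the names into fields, rather than comparing raw strings, makes it straightforward to group undetected transcripts by isotype or anticodon across the three conditions.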


Appendix A4 Housekeeping pre-tRNAs, defined as C. neoformans tRNA transcripts expressed in both control and IR-treated samples from Chapter 5.

tRNA-Gly-TCC-1-1 tRNA-Gly-TCC-1-2 tRNA-His-GTG-2-1 tRNA-Glu-TTC-1-1 tRNA-Glu-CTC-1-3 tRNA-Glu-CTC-2-1 tRNA-Gln-CTG-1-3 tRNA-Gln-CTG-1-1 tRNA-Gln-CTG-1-2 tRNA-His-GTG-3-1 tRNA-His-GTG-1-1 tRNA-Leu-AAG-1-3 tRNA-Leu-AAG-1-4 tRNA-Leu-AAG-1-2 tRNA-Leu-AAG-1-1 tRNA-Leu-AAG-2-1 tRNA-Trp-CCA-2-2 tRNA-Pro-CGG-1-1 tRNA-Gly-GCC-1-4 tRNA-Gly-GCC-1-8 tRNA-Gly-GCC-1-9 tRNA-Gly-GCC-1-1 tRNA-Arg-TCT-1-1 tRNA-Pro-TGG-1-1 tRNA-Ser-GCT-2-1 tRNA-Glu-TTC-1-2


tRNA-Met-CAT-1-1 tRNA-Gln-TTG-2-1 tRNA-Gln-TTG-1-1 tRNA-Thr-CGT-1-1 tRNA-Glu-CTC-1-1 tRNA-Val-AAC-1-5 tRNA-Thr-AGT-1-3 tRNA-Gly-CCC-1-1 tRNA-Ile-TAT-1-1 tRNA-Thr-TGT-1-1 tRNA-Leu-AAG-1-5 tRNA-Ile-AAT-1-2 tRNA-Ile-AAT-1-1 tRNA-Ile-AAT-1-5 tRNA-Trp-CCA-2-1 tRNA-Lys-CTT-1-2 tRNA-Ser-GCT-1-1 tRNA-Glu-CTC-2-6 tRNA-Lys-CTT-1-1 tRNA-Lys-CTT-1-3 tRNA-Lys-CTT-1-4 tRNA-Lys-CTT-1-5 tRNA-Lys-CTT-1-7 tRNA-Leu-CAG-1-1 tRNA-Glu-CTC-1-2 tRNA-Trp-CCA-1-1 tRNA-Pro-AGG-1-4 tRNA-Pro-AGG-1-2 tRNA-Ser-CGA-1-1


tRNA-His-GTG-2-2 tRNA-Pro-AGG-1-1 tRNA-Arg-TCG-1-1 tRNA-Arg-TCG-1-5 tRNA-Arg-TCG-1-2 tRNA-Arg-TCG-1-3 tRNA-Arg-TCG-1-4 tRNA-Glu-CTC-2-2 tRNA-Pro-AGG-1-3 tRNA-Pro-AGG-1-5 tRNA-Tyr-GTA-1-4 tRNA-Cys-GCA-1-3 tRNA-Thr-AGT-1-2 tRNA-Asn-GTT-1-2 tRNA-Arg-ACG-1-1 tRNA-Asn-GTT-2-1 tRNA-Ser-AGA-1-1 tRNA-Ser-AGA-1-3 tRNA-Ser-AGA-1-6 tRNA-Ser-AGA-1-4 tRNA-Ser-AGA-1-2 tRNA-Ser-AGA-1-5 tRNA-Lys-TTT-1-1 tRNA-Met-CAT-1-2 tRNA-Val-CAC-1-2 tRNA-Glu-CTC-2-5 tRNA-Lys-CTT-1-6 tRNA-Ala-TGC-1-1 tRNA-Phe-GAA-2-2


tRNA-Phe-GAA-2-1 tRNA-Leu-TAA-1-1 tRNA-Leu-CAA-1-1 tRNA-Leu-CAA-1-2 tRNA-Ala-CGC-1-1 tRNA-Ala-CGC-1-2 tRNA-Ala-AGC-1-7 tRNA-Gly-GCC-1-3 tRNA-Glu-CTC-2-3 tRNA-Gly-TCC-1-3 tRNA-Gly-CCC-1-2 tRNA-Cys-GCA-1-2 tRNA-Arg-ACG-1-3 tRNA-Arg-CCG-1-1 tRNA-Met-CAT-2-1 tRNA-Met-CAT-2-3 tRNA-Arg-ACG-1-2
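
The housekeeping definition above can be read as a set operation: a transcript qualifies if it is absent from every "not detected" list in Appendix A3. The sketch below illustrates that reading in Python under stated assumptions. The function name `housekeeping_transcripts` is hypothetical, the input is a small illustrative subset of names from Appendices A3 and A4 rather than the full data set, and the actual selection in Chapter 5 may have used additional criteria such as read-count thresholds.

```python
from typing import Dict, Iterable, Set


def housekeeping_transcripts(
    all_transcripts: Iterable[str],
    not_detected: Dict[str, Set[str]],
) -> Set[str]:
    """Return transcripts that are missing from no condition.

    `not_detected` maps a condition label to the set of transcripts
    that were not detected in that condition.
    """
    # Union of everything that is missing in at least one condition.
    missing_somewhere: Set[str] = set().union(*not_detected.values())
    return set(all_transcripts) - missing_somewhere


# Illustrative subset only (assumption: not the complete transcript list).
subset = {"Val-AAC-1-1", "Val-AAC-1-5", "Gly-GCC-1-1", "Gly-GCC-1-7", "Ala-AGC-1-4"}
missing = {
    "control": {"Val-AAC-1-1", "Gly-GCC-1-7"},
    "100 Gy": {"Val-AAC-1-1", "Gly-GCC-1-7"},
    "300 Gy": {"Ala-AGC-1-4"},
}
print(sorted(housekeeping_transcripts(subset, missing)))
```

For this subset, the two transcripts that remain are Gly-GCC-1-1 and Val-AAC-1-5, both of which appear in the housekeeping list of Appendix A4.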
