
Post-transcriptional Modification Characterizing and Mapping of tRNAs Using Liquid Chromatography with Tandem Mass Spectrometry

A dissertation submitted to the

Graduate School

of the University of Cincinnati

in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy (PhD)

in the Department of Chemistry

of the McMicken College of Arts and Sciences

by

Ningxi Yu

M. Sc. Chemistry, Central China Normal University, 2012

B. Eng. Wuhan Institute of Technology, 2009

October 2018

Committee Chair: Patrick A. Limbach, Ph.D


Abstract

This dissertation is focused on exploring the transfer RNA modification profiles in archaea. Transfer RNA (tRNA) plays a key role in decoding the genetic information on messenger RNA (mRNA), and the post-transcriptional modifications within tRNAs shape the decoding strategies in different organisms. Bacteria have been extensively studied in terms of the types and positions of tRNA modifications, and a few eukaryotic organisms have also been investigated. However, our knowledge of tRNA modifications in archaea is still limited. While the modifications in multiple archaeal organisms have been identified, the only set of tRNA sequences whose modifications have been localized to particular tRNAs is from Haloferax volcanii. To improve our understanding of archaeal tRNA modification profiles and decoding strategies, I have used liquid chromatography and tandem mass spectrometry to localize post-transcriptional modifications of selected archaeal organisms. A computational tool has been developed for MS/MS data interpretation and RNA sequence annotation. By using this tool, the modifications were localized to tRNA sequences from five archaeal organisms. Among the five selected organisms, the modifications in the anticodon of Methanocaldococcus jannaschii tRNAs have been fully identified, and the first compilation of modified tRNA sequences of this archaeon has been generated. The types of modification in four other archaeal organisms, Methanopyrus kandleri, Methanothermobacter marburgensis, Sulfolobus acidocaldarius, and Thermoproteus tenax, were investigated, and some of these modifications were localized on the tRNA sequences. This study expands our understanding of modification patterns and decoding strategies of archaeal tRNAs, and it also allows a more comprehensive comparison of the tRNA modification landscape among the three domains of life.


Acknowledgment

This dissertation is dedicated to my family. Your love and support always keep me moving forward.

I would like to express my sincere gratitude to my advisor, Dr. Patrick A. Limbach, for his patience and guidance; his encouragement and help have led me to overcome the challenges and finish my Ph.D. Besides my advisor, I would like to thank the rest of my thesis committee, Dr. Neil Ayres and Dr. Peng Zhang, for their insightful comments and suggestions for my research. I would also like to thank all the members of the Limbach group for helping me and bringing me joy; I have been so lucky to work with you.


Table of Contents

Abstract

Acknowledgment

Table of Contents

Chapter 1 Introduction
    1.1 Research Goal
    1.2 Post-transcriptional Modifications of tRNA
    1.3 tRNA Modifications and Decoding
    1.4 Analytical Methods of tRNA Modification Mapping
    1.5 tRNA Modifications in Archaea

Chapter 2 RNAModMapper: RNA Modification Mapping Software for Analysis of Liquid Chromatography Tandem Mass Spectrometry Data
    2.1 Introduction
    2.2 Experimental
    2.3 Results and Discussion
    2.4 Conclusions

Chapter 3 Transfer RNA Modification Profiles and Codon Decoding Strategies in Methanocaldococcus jannaschii
    3.1 Introduction
    3.2 Experimental
    3.3 Results and Discussion
    3.4 Conclusions

Chapter 4 The Post-transcriptional Modifications in tRNA from Selected Archaeal Organisms
    4.1 Introduction
    4.2 Experimental
    4.3 Results and Discussion
    4.4 Conclusions

Chapter 5 Conclusions and Future Work
    5.1 Conclusions
    5.2 Future Work

Bibliography

Appendices


Chapter 1

Introduction

1.1 Research Goal

The goal of this dissertation is to expand our understanding of transfer RNA (tRNA) modification profiles in archaea by using liquid chromatography and tandem mass spectrometry (LC-MS/MS).

A new software tool, RNAModMapper, was developed to enable MS/MS spectral data interpretation and modification mapping for RNA (Chapter 2). This tool was then used to map the modifications on tRNAs from selected archaeal organisms. I started by analyzing the modification pattern of Methanocaldococcus jannaschii in terms of the types and locations of tRNA modifications; all the modifications in the anticodon loop were identified, which revealed the decoding strategies of this organism. To further explore the tRNA modification profiles in archaea, four other archaea, Methanopyrus kandleri, Methanothermobacter marburgensis, Sulfolobus acidocaldarius, and Thermoproteus tenax, were studied. This work provides a starting point for understanding tRNA modifications in archaea.

Portions of this chapter have previously been published in Analytical Chemistry, 2017, 89 (20), pp 10744-10752.


1.2 Post-transcriptional Modifications of tRNA

Ribonucleic acid (RNA) is a polymeric biomolecule that plays important roles in many biological processes. RNA is composed of four canonical nucleosides, adenosine (A), guanosine (G), cytidine (C), and uridine (U), as shown in Figure 1.1. After transcription, RNA can be post-transcriptionally modified: a variety of chemical modifications are enzymatically introduced to the canonical nucleosides within the RNA sequence. More than 100 types of naturally occurring chemical modifications have been found in RNA, and they vary from simple methylations to complex modifications with multiple functional groups1-3. The formation of modifications is catalyzed by enzymes. Simple modifications usually need few enzymes, while complex modifications require multiple enzymes acting in many steps.

Figure 1.1 The basic structure of RNA and some examples of modifications.

Among all types of RNAs, transfer RNA (tRNA) has been found to have the greatest density of modifications. To date, more than 100 chemically unique modified nucleosides have been identified in tRNAs from all three domains of life, as shown in Figure 1.2; the names of modifications are adapted from the MODOMICS database3. A comparison of tRNA modifications between the three domains of life shows that each domain has unique modifications, and that modifications also vary between different organisms within the same domain. It also reveals a set of universally conserved modifications that occur in tRNAs in all three domains. These universal modifications have relatively simple chemical structures, such as a methyl group added to different positions of the nucleobase or ribose, and their conservation indicates that they might be essential for all three domains of life.

Figure 1.2 The tRNA modifications identified from three domains of life to date.

The length of a tRNA molecule is usually 76-90 nucleotides, and modifications are distributed at different positions. The same modification can occur at different positions in different tRNAs of different organisms. For example, N4-acetylcytidine (ac4C) was found at position 34 of Escherichia coli tRNAMet, and it was also found at position 12 of Saccharomyces cerevisiae tRNASer and tRNALeu. Moreover, the same modification can also occur at different positions of different tRNA molecules within the same organism. For example, N2-methylguanosine (m2G) was found at positions 10 and 26 in many S. cerevisiae tRNAs.

Modifications do not randomly occur in tRNA molecules, and they usually serve certain functions at specific positions.

1.3 tRNA Modifications and Decoding

The primary function of tRNA is to carry the appropriate amino acid to the ribosome in a specific order by pairing to the codons on messenger RNAs (mRNAs). A growing number of experimental results and crystal structures show that the anticodon loop of tRNA needs to be modified to accurately and efficiently decode mRNA.

Most organisms use the standard genetic code to encode amino acids. The 64 codons are divided into 23 amino acid codon boxes, which contain the 61 sense codons, and 2 stop codon boxes. The 23 amino acid codon boxes can be grouped into 1-, 2-, 3-, and 4-codon boxes, which have one, two, three, and four codons, respectively, to encode each amino acid, as shown in Figure 1.3. To decode the codon triplets, the three nucleotides in the anticodon of a tRNA pair with the codon, and the modifications at position 34 (also called the wobble position) and position 37 are the most important and diverse4.
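The box counts quoted above can be checked with a short script. The following is a minimal sketch, assuming that a "codon box" means the set of codons that share their first two bases and their meaning; the variable and function names are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons ordered UUU, UUC, UUA, UUG, UCU, ...
# One-letter amino acid codes; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_boxes(code):
    """Group codons that share their first two bases and their meaning."""
    boxes = defaultdict(list)
    for codon, aa in code.items():
        boxes[(codon[:2], aa)].append(codon)
    return dict(boxes)


boxes = codon_boxes(GENETIC_CODE)
sense_boxes = {key: codons for key, codons in boxes.items() if key[1] != "*"}
stop_boxes = {key: codons for key, codons in boxes.items() if key[1] == "*"}
box_sizes = Counter(len(codons) for codons in sense_boxes.values())

print(len(sense_boxes), len(stop_boxes))  # amino acid boxes, stop boxes
print(sorted(box_sizes.items()))          # (codons per box, number of boxes)
```

Under this grouping, the 61 sense codons fall into two 1-codon, twelve 2-codon, one 3-codon, and eight 4-codon boxes, which matches the breakdown used in Section 1.5.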


Figure 1.3 The standard genetic code of each amino acid (left). Codon-anticodon pairing (right).

According to Watson-Crick base pairing (A·U, C·G), 61 sense codons would require 61 tRNAs for decoding. However, most organisms have fewer than 61 tRNAs, which indicates that some tRNAs must decode more than one codon5. For example, valine belongs to a 4-codon box and is encoded by the codons GUU/GUC/GUA/GUG, but E. coli only has tRNAVal(GAC) and tRNAVal(UAC) to decode them. Uridine 5-oxyacetic acid (cmo5U) at position 34 of E. coli tRNAVal(UAC) is able to decode all four valine codons (GUU, GUC, GUA, GUG), but it binds relatively more strongly to the codons GUA, GUG, and GUU. Crystal structures of cmo5U bound to A/G/U/C were reported by Weixlbaumer and co-workers6, as shown in Figure 1.4. A and G pair with cmo5U in a normal Watson-Crick geometry. When cmo5U pairs with U or C, only one hydrogen bond is formed between the bases, but the carboxyl group of cmo5U provides additional hydrogen bonding with O4 of the codon uridine. This work provides a structural explanation for why the binding affinities of tRNAVal(UAC) with cmo5U for the A-, G-, and U-ending codons are similar and higher than that for the C-ending codon.
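The valine example can be written as a small lookup. This is an illustrative sketch only: the entries for unmodified residues follow Crick's classical wobble rules, the cmo5U entry simply encodes the behaviour described above for E. coli tRNAVal(UAC), and the function name and data layout are hypothetical.

```python
# Watson-Crick partners used for anticodon positions 35 and 36.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon third-position bases read by the residue at tRNA position 34.
# Unmodified entries: Crick's wobble rules. "cmo5U": taken from the text above.
WOBBLE = {
    "C": "G",
    "A": "U",
    "G": "CU",
    "U": "AG",
    "I": "UCA",
    "cmo5U": "AGUC",
}


def codons_read(anticodon):
    """Return the codons read by an anticodon given 5'->3' as (N34, N35, N36)."""
    n34, n35, n36 = anticodon
    # The codon is read antiparallel: N36 pairs with codon base 1, N35 with base 2.
    first, second = WATSON_CRICK[n36], WATSON_CRICK[n35]
    return sorted(first + second + third for third in WOBBLE[n34])


print(codons_read(("G", "A", "C")))      # tRNAVal(GAC)
print(codons_read(("cmo5U", "A", "C")))  # tRNAVal(UAC) carrying cmo5U34
```

With these rules, the GAC anticodon covers the two pyrimidine-ending valine codons and the cmo5U-containing UAC anticodon covers all four, so two tRNAs are enough for the whole box. The lookup ignores the differences in binding strength discussed above.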


Figure 1.4 The crystal structures of cmo5U bound to A/G/U/C. Hydrogen bonds are shown as dotted lines. (a) cmo5U·A. (b) cmo5U·G. (c) cmo5U·U. (d) cmo5U·C. This figure and other molecular graphics figures were made with UCSF Chimera7.

Modifications at position 37 are also involved in enhancing decoding accuracy. When position 37 is occupied by a purine base, it is usually modified to stabilize codon recognition by stacking with the first codon-anticodon base pair8-9. For example, lysine belongs to a 2-codon box and is encoded by the codons AAA and AAG. 2-thiouridine (s2U) and 5-methylaminomethyluridine (mnm5U) at position 34, along with N6-threonylcarbamoyladenosine (t6A) at position 37, in E. coli tRNALys(UUU) were found to be essential for recognizing the two lysine codons AAA and AAG and discriminating against the asparagine codons AAU and AAC4. A later study reported the X-ray structure of E. coli tRNALys(mnm5s2UUU) bound to cognate or near-cognate codons. It shows that t6A at position 37 forms cross-strand stacking with the first nucleotide of the mRNA codon, as shown in Figure 1.5, which further demonstrates the tRNA discrimination mechanism10.

Figure 1.5 Cross-strand stacking of t6A37 with the first nucleotide of the A-site bound codon (position +4).

The decoding strategies for the same codon(s) are not uniform in different organisms, and decoding the isoleucine codon AUA in the three domains of life is a notable example. Isoleucine is encoded by three codons AUU/AUC/AUA. The challenge is discriminating the isoleucine codon AUA from the methionine codon AUG. The organisms from three domains have their own strategies as shown in Figure 1.6. Both Haloarcula marismortui and E. coli use the anticodon CAU to decode AUA, but C34 is modified to agmatidine11 (agm2C or C+) and lysidine12 (k2C) respectively. The chemical structures of C+ and k2C are similar. S. cerevisiae uses anticodon UAU with pseudouridine (Ψ) at positions 34 and 36 to decode AUA13.


Figure 1.6 The decoding strategies used by H. marismortui (Archaea), E. coli (Bacteria), and S. cerevisiae (Eukarya) to decode the isoleucine codon AUA, and the chemical structures of agmatidine (agm2C or C+), lysidine (k2C), and pseudouridine (Ψ).

The crystal structure of modified archaeal tRNAIle(CAU) bound to its cognate AUA codon on the ribosome was obtained by Voorhees and co-workers. It provides a structural basis for decoding the isoleucine codon AUA with agmatidine14. U36 and A35 in the tRNA interact with A1 and U2 in the mRNA via canonical Watson-Crick base pairing, and agm2C34 binds to A3 through one hydrogen bond. Moreover, the terminal amine of agmatidine forms another hydrogen bond to O4′ of the ribose in A5, which enhances the binding interactions (Figure 1.7). The presence of agmatidine at the wobble position of tRNAIle(CAU) allows it to discriminate against the methionine codon AUG. Modeling of a G at the third position of the mRNA codon indicates that the exocyclic amine (N2) of the G residue would clash with agmatidine, and this steric clash would prevent a canonical Watson-Crick geometry between C34 and G3. Because lysidine is chemically similar to agmatidine, the authors also suggested that the mechanism by which lysidine discriminates the AUA codon from the AUG codon in bacteria might be similar.


Figure 1.7 Interactions between modified tRNAIle(CAU) and the AUA codon. There is one hydrogen bond between N4 of agm2C34 and N1 of A3, and the terminal amine of agmatidine forms another hydrogen bond to O4′ of the ribose in A5.

Many differences exist in tRNA modifications and decoding strategies between different organisms and domains. To better understand how different organisms use tRNA modifications to facilitate codon recognition and ensure decoding fidelity, one needs to characterize and localize the modifications on tRNA sequences.

1.4 Analytical Methods of tRNA Modification Mapping

Modern approaches that seek to identify specific tRNA sequence locations that are modified are based on either RNA-seq technology or mass spectrometry15. RNA-seq methods have an advantage in throughput16-17; however, special sample treatment is typically required to recognize sites of chemical modification18-19. Mass spectrometry enjoys the advantage of being sensitive to every chemical modification within tRNA, although analysis is typically time-consuming and requires more sample than RNA-seq.


Even with the advent of high-throughput genomic sequencing technologies, there remains an interest in mass spectrometry as a platform for tRNA modification mapping, with a particular focus on developing new tools and methods that reduce sample consumption and analysis time20-21. The most common platform for tRNA modification mapping by mass spectrometry is LC-MS/MS22. RNA modification mapping by mass spectrometry was adapted from prior biochemical approaches to RNase mapping of RNA23. A tRNA nucleoside profile is obtained, and these modifications are placed onto the correct tRNA sequence context by digesting an individual tRNA or total tRNA pools with a base-specific ribonuclease that generates oligoribonucleotide digestion products amenable to LC-MS/MS.
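As an illustration of base-specific digestion, the sketch below predicts the products of a complete RNase T1 digest, which cleaves on the 3′ side of guanosine. It assumes an unmodified sequence written with the letters A, C, G, and U, ignores missed cleavages and modified guanosines, and uses a made-up example sequence; the function name is hypothetical.

```python
import re


def rnase_t1_digest(sequence):
    """Split an unmodified RNA sequence after every G (complete digestion)."""
    # Each product is a run of non-G residues ending in G; the 3'-terminal
    # fragment of the RNA may lack a G.
    return re.findall(r"[^G]*G|[^G]+$", sequence.upper())


print(rnase_t1_digest("AUGCCGUAGCUAA"))
```

Each predicted product can then be compared with the oligonucleotides observed by LC-MS/MS, which is the basis of the mapping approach described in this section.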

To obtain the tRNA nucleoside profile, a tRNA sample is digested to nucleosides by using multiple enzymes, and the nucleosides are separated by liquid chromatography followed by tandem mass spectrometry analysis. The modified nucleosides are identified by the mass-to-charge ratio of the intact molecule (MS1), and detailed structural information can be obtained by collision-induced dissociation (CID) mass spectrometry (MS2)22. During CID fragmentation, the intact nucleoside usually fragments between the base and ribose, which generates the base ion peak. Nucleoside isomers, such as 1-methyladenosine (m1A) and N6-methyladenosine (m6A), or 1-methylguanosine (m1G) and N2-methylguanosine (m2G), can be differentiated by retention time. An alternative fragmentation method, higher-energy collisional dissociation (HCD), yields more informative product ions, which can be used to differentiate the isomeric nucleosides24.
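The base ion expected from cleavage between the base and the sugar can be estimated from elemental compositions. The sketch below is a worked example under stated assumptions: positive ion mode with a protonated precursor (the polarity is not specified in the text), monoisotopic masses, and a neutral loss of either ribose or 2′-O-methylribose. The function and constant names are illustrative.

```python
# Monoisotopic atomic masses (Da) and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646


def mass(formula):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# All methyladenosine isomers (m1A, m6A, Am) share the composition C11H15N5O4.
METHYLADENOSINE = {"C": 11, "H": 15, "N": 5, "O": 4}
RIBOSE_LOSS = {"C": 5, "H": 8, "O": 4}         # sugar lost from m1A or m6A
METHYLRIBOSE_LOSS = {"C": 6, "H": 10, "O": 4}  # sugar lost from Am


def base_ion_mz(nucleoside, sugar_loss):
    """m/z of the protonated base after neutral loss of the sugar."""
    return mass(nucleoside) + PROTON - mass(sugar_loss)


precursor = mass(METHYLADENOSINE) + PROTON
print(f"[M+H]+            {precursor:.4f}")
print(f"base ion, m1A/m6A {base_ion_mz(METHYLADENOSINE, RIBOSE_LOSS):.4f}")
print(f"base ion, Am      {base_ion_mz(METHYLADENOSINE, METHYLRIBOSE_LOSS):.4f}")
```

The ribose-methylated isomer gives a different base ion from the two base-methylated isomers, whereas m1A and m6A give the same base ion and therefore have to be distinguished by retention time or by HCD product ions.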


Figure 1.8 Example of isomeric nucleosides differentiation. Methyladenosine isomers, m1A, Am, and m6A are separated by liquid chromatography. The CID-MS/MS spectra of m1A and m6A are the same, but the HCD-MS/MS spectra are different.

Once the tRNA nucleoside profile is identified, the tRNA sample is digested to oligonucleotides by ribonucleases. LC-CID-MS/MS analysis is used to localize the positions of modified nucleosides. The most commonly used ribonucleases for modification mapping are RNase T1 and RNase A, and mapping protocols for other ribonucleases, such as RNase U2, RNase MC1, and cusativin, have been reported25-27. Collision-induced dissociation of oligonucleotides generates product ions that can be assigned using the McLuckey nomenclature28 as c-, y-, w-, and a-B-type product ions, as shown in Figure 1.9. Mass differences between sequential c- and y-type fragment ions reveal the identity of canonical and modified nucleosides as well as their location within the oligonucleotide sequence.
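The idea of reading a sequence from mass differences can be sketched as follows. This is a simplified illustration, not the procedure used by any particular program: it assumes charge-deconvoluted (neutral) masses from a single ion series, monoisotopic residue masses, and a hypothetical tolerance of 0.01 Da; the ladder in the example is synthetic, with an arbitrary starting mass.

```python
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "P": 30.9737620}


def mass(formula):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# Nucleotide residues (nucleoside monophosphate minus water).
RESIDUES = {
    "A": mass({"C": 10, "H": 12, "N": 5, "O": 6, "P": 1}),
    "C": mass({"C": 9, "H": 12, "N": 3, "O": 7, "P": 1}),
    "G": mass({"C": 10, "H": 12, "N": 5, "O": 7, "P": 1}),
    "U": mass({"C": 9, "H": 11, "N": 2, "O": 8, "P": 1}),
}
# A singly methylated guanosine (m1G, m2G, Gm, ...) adds CH2 to the G residue.
RESIDUES["[mG]"] = RESIDUES["G"] + mass({"C": 1, "H": 2})


def read_ladder(fragment_masses, tolerance=0.01):
    """Assign a residue to each gap between consecutive fragment masses."""
    calls = []
    for lighter, heavier in zip(fragment_masses, fragment_masses[1:]):
        delta = heavier - lighter
        matches = [name for name, residue in RESIDUES.items()
                   if abs(residue - delta) <= tolerance]
        calls.append(matches[0] if len(matches) == 1 else "?")
    return calls


# Synthetic ladder: arbitrary first fragment, then A, methyl-G, and U added.
ladder = [500.0]
for residue in ("A", "[mG]", "U"):
    ladder.append(ladder[-1] + RESIDUES[residue])

print(read_ladder(ladder))
```

Note that the mass difference identifies a methylated guanosine but not which isomer it is, which is why the nucleoside profile is acquired first.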


Figure 1.9 (A) The nomenclature of oligonucleotide fragmentations. (B) An example of MS/MS spectral data.

While this RNA modification mapping by mass spectrometry approach is quite effective, overall experimental throughput is most often limited by the data interpretation (assigning fragment ions to each MS/MS spectrum) and sequence annotation (mapping interpreted MS/MS data back onto the original RNA sequence) steps. In Chapter 2, I created a computational tool to improve the throughput for RNA modification mapping by LC-MS/MS.


1.5 tRNA Modifications in Archaea

In the past few decades, the tRNA modifications in bacteria and eukarya have been extensively studied, and the modification profiles of some organisms from these two domains are fully characterized, such as Escherichia coli from bacteria and Saccharomyces cerevisiae from eukarya.

The consensus modification sequences of E. coli and S. cerevisiae tRNAs are shown in Figure 1.10.

Figure 1.10 The consensus modification sequences of E. coli and S. cerevisiae tRNAs. The modifications and positions were obtained from the MODOMICS database3.

Archaea is a domain of single-celled prokaryotic organisms. They were initially classified as bacteria; however, analyses of ribosomal RNA sequences have separated them from bacteria29-30. The size and shape of archaea and bacteria are similar, but archaea possess genes and some metabolic pathways that are more closely related to those of eukaryotes. They were first discovered in extreme environments, such as high-temperature hot springs and extremely saline waters. They were later found in a broad range of habitats and in the human microbiome. Many commercially appealing enzymes have been found in archaea because of their characteristics. For example, thermostable DNA polymerases from archaea have been extensively applied in PCR due to their high fidelity and thermostability31.

The extreme growth conditions make archaeal organisms difficult to culture, and our understanding of tRNA modification profiles in archaea still lags behind what we know about bacteria and eukaryotes. The primary studies of tRNA modifications in archaea were done by McCloskey and co-workers using LC-MS/MS32-33. These studies showed that archaea have some modifications in common with bacteria and eukaryotes, such as dihydrouridine (D), 1-methylguanosine (m1G), 2′-O-methylcytidine (Cm), and N6-threonylcarbamoyladenosine (t6A). However, some modifications have only been identified within archaea so far, such as agmatidine (C+), archaeosine (G+), and N2,N2,2′-O-trimethylguanosine (m2,2Gm).

5-cyanomethyluridine (cnm5U) is a unique modified uridine that is only found at the anticodon wobble position in archaea. It was initially characterized from a mutant of Haloarcula marismortui tRNAs by Mandal and co-workers34. The C34 in the anticodon of tRNAIle(CAU) was mutated to U, and the mutant tRNA was expressed in Haloferax volcanii. The mutant tRNA binds to codons AUA and AUU, while the wild-type tRNA only binds to codon AUA. Biochemical and mass spectrometric analysis of the mutant tRNAs led to the discovery of this new modified nucleoside. It was also identified in naturally occurring tRNAs from H. volcanii and M. maripaludis, but not from S. solfataricus, S. cerevisiae, or E. coli.


Figure 1.11 The chemical structure of 5-cyanomethyluridine (cnm5U).

To date, the only set of archaeal tRNA sequences whose modification pattern is almost completely characterized is from H. volcanii. The tRNA modification pattern was first studied by Gupta35-36, and some modifications were later predicted37-38, as shown in Figure 1.12. The modifications at positions 34 and 37 of H. volcanii tRNAs are not as diversified as those of E. coli and S. cerevisiae tRNAs. The reason could be either that some of the modified nucleosides were not identified at that time, or that this organism has fewer modifications in its tRNAs.

Figure 1.12 The tRNA modification profile of H. volcanii.


The decoding strategies of H. volcanii, E. coli, and S. cerevisiae tRNAs are summarized in Table 1.1, where xU in H. volcanii is believed to be mcm5s2U or a similar type of uridine derivative. There are two 1-codon boxes, Met-AUG and Trp-UGG. Both H. volcanii and S. cerevisiae use modified C34 to decode the methionine codon AUG. To decode the tryptophan codon UGG, all three organisms use Cm34 to stabilize codon-anticodon interactions.

The 2-codon boxes can be divided into two groups. The first group has pyrimidine-ending codons: Phe-UUU/UUC, Tyr-UAU/UAC, His-CAU/CAC, Asn-AAU/AAC, Asp-GAU/GAC, Cys-UGU/UGC, and Ser-AGU/AGC. G34-containing tRNAs are used to decode pyrimidine-ending codons in all three organisms, and only in E. coli can G34 be modified to queuosine (Q) and glutamyl-queuosine (gluQ). Position 37 is usually modified in this group to enhance anticodon-codon interactions (modifications at position 37 are not listed in the table). The second group of amino acids has purine-ending codons: Leu-UUA/UUG, Gln-CAA/CAG, Lys-AAA/AAG, Glu-GAA/GAG, and Arg-AGA/AGG. tRNAs with modified U34 are used to decode the codons in this group. All U34 positions in E. coli and S. cerevisiae tRNAs are modified, while the U34 modification was not fully characterized in H. volcanii.

There is only one 3-codon box, which is Ile-AUU/AUC/AUA. H. volcanii and E. coli use tRNAIle(CAU) with agmatidine and lysidine, respectively, to differentiate the Ile AUA codon from the Met AUG codon, but S. cerevisiae uses tRNAIle(UAU) with Ψ34. Both H. volcanii and E. coli use tRNAIle(GAU) to decode the other two Ile codons (AUU and AUC), while A34 is modified to inosine (I) in the S. cerevisiae tRNA that decodes these two codons. The decoding strategies of H. volcanii and E. coli are similar in this codon box.

There are eight 4-codon boxes. All U34 positions are modified to decode A3 and G3 codons in E. coli and S. cerevisiae. These two organisms use inosine, which is converted from A34, to decode the arginine CGU/CGC/CGA/CGG codons, while only S. cerevisiae uses I34 to decode serine, proline, threonine, and alanine codons. The modification of U34 that H. volcanii tRNAs use to decode 4-codon boxes is still unknown.

Table 1.1 Comparison of decoding strategies between H. volcanii (orange), E. coli (red), and S. cerevisiae (blue). xU indicates an experimentally unidentified uridine derivative, which is believed to be mcm5s2U or a similar type of U derivative.


Whether H. volcanii is a good representative of how archaea have adapted their tRNA-based decoding machinery can only be determined by compiling information from additional archaeal tRNAs. In Chapter 3, I have extensively studied the tRNA modification profiles of Methanocaldococcus jannaschii. To further understand modification patterns in other archaea, four archaeal organisms (Methanopyrus kandleri, Methanothermobacter marburgensis, Sulfolobus acidocaldarius, and Thermoproteus tenax) were studied in Chapter 4.


Chapter 2

RNAModMapper: RNA Modification Mapping Software for Analysis of Liquid Chromatography Tandem Mass Spectrometry Data

2.1 Introduction

The importance of post-transcriptional modification in tRNAs and the analytical methods for modification mapping were described in Chapter 1. Given the low throughput of manual MS/MS data interpretation for oligonucleotides, a number of computational tools have been created to improve the automation of this step. The first tool for analyzing oligonucleotide MS/MS data was Simple Oligonucleotide Sequencer (SOS)39, an interactive program developed by Rozenski and McCloskey for ab initio oligonucleotide sequencing by mass spectrometry. Nyakas et al.40 developed the OMA and OPA software toolbox, which can analyze precursor and product ion spectra of oligonucleotides, oligonucleotide derivatives, and oligonucleotide adducts with metal ions or drugs. While each of these tools has utility for simplifying MS/MS data interpretation, neither is applicable to large-scale tRNA modification mapping because neither can batch process large LC-MS/MS files.

The first computational platform applicable to RNA modification mapping at scale was Ariadne, developed by Nakayama et al.41 Ariadne is a web-based database search engine that uses MS/MS data of RNA digestion products to search against a sequence database to identify particular RNAs in biological samples. RMM is another database search program, which can search whole prokaryotic genomes or RNA FASTA sequence databases42 to identify the specific RNA within a sample. Unlike RMM, Ariadne does offer functionalities that provide the user control over the types of chemical modifications that may be present within the MS/MS datasets, although the database of organisms present in Ariadne is limited. More recently, the standalone program RoboOligo was developed to allow both automated de novo and manual analysis of modified oligonucleotide MS/MS spectra43. RoboOligo was explicitly created to handle tRNAs and the large diversity of modifications present in those samples. However, RoboOligo only allows annotation of a single tRNA sequence at a time.

While these newer tools are a major improvement over manual interpretation of MS/MS data, our efforts at improving RNA modification mapping throughput remain hindered by the lack of software that can automate both the MS/MS data interpretation and tRNA sequence annotation steps. Thus, here I introduce a new tool for oligonucleotide MS/MS data interpretation built specifically for RNA modification mapping. RNAModMapper (RAMM) is able to interpret CID data from oligonucleotides, and then map interpreted MS/MS sequences onto RNA sequences.

The capabilities of RAMM were evaluated using multiple tRNA samples. Further, to test its capabilities with other RNA types, bacterial 16S and 23S rRNAs were also analyzed and processed.

As a stand-alone program built in an open source code environment, RAMM enables higher throughput RNA modification mapping by LC-MS/MS.

This work has been published in Analytical Chemistry, 2017, 89 (20), pp 10744-10752.

2.2 Experimental

2.2.1 Materials

Escherichia coli MRE 600 total transfer ribonucleic acid, RNase T1, TRI-Reagent, calcium chloride, 1,1,1,3,3,3-hexafluoro-2-propanol (HFIP), and triethylamine (TEA) were purchased from Sigma-Aldrich (St. Louis, MO). Ammonium acetate was purchased from Fisher Scientific (Fair Lawn, NJ). A Nucleobond AX 500 column was purchased from Macherey-Nagel (Düren, Germany). The RNAID™ kit with SPIN™ was purchased from MP Biomedicals (Solon, OH). LC-MS grade water and methanol were purchased from Honeywell B&J (Morristown, NJ).

Total RNA was obtained from Streptomyces griseus cultured in house as described previously44. The S. griseus total RNA sample (88 µg) was run on a 1% low melting point agarose gel. The 16S and 23S rRNA bands were excised and individually treated with 600 µL of RNA binding salt solution, which was heated to 50 °C for 10 min. To each sample was added 2.2 µL of 10% acetic acid and 10 µL of RNAMATRIX™. The solutions were suspended for 10 min and centrifuged for 1 min to pellet the RNA/RNA matrix complex. The pellets were washed twice with RNA washing solution and suspended in sterile water. The RNA was eluted from each sample by incubating at 50 °C for 5 min; the solution was then centrifuged for 2 min, and the liquid containing the RNA was transferred to a spin filter before final centrifugation and storage. The 16S and 23S rRNA samples from S. griseus were prepared by Dr. Xiaoyu Cao.

2.2.2 Ribonuclease Digestion

For RNase T1 digestion, 10 µg of RNA sample and 500 U of RNase T1 were combined in 220 mM ammonium acetate and incubated for 2 h at 37 °C. Samples were lyophilized and then rehydrated in mobile phase A (MPA: 200 mM HFIP, 8.15 mM TEA, pH = 7.0).

2.2.3 LC-MS/MS Analysis

For low-resolution LC-MS/MS, RNase T1 digestion products were separated on a Waters XBridge C18 column (3.5 µm, 1 mm × 150 mm) with a gradient of 5% B to 20% B in 5 min; 20% B to 30% B to 95% B in 43 min; hold at 95% B for 5 min, followed by re-equilibration for another 15 min at 5% B, where mobile phase B is composed of 50:50 v:v MPA:methanol, pH = 7.0.

Low-resolution data were acquired on a Thermo LTQ-XL mass spectrometer with a capillary temperature of 275 °C, spray voltage of 4 kV, sheath gas, auxiliary gas, and sweep gas at 40, 10, and 10 arbitrary units, respectively, and a capillary voltage of -100 V. Samples were analyzed in negative polarity over an m/z range of 500 to 2000. Data dependent acquisition was used to collect MS/MS data at a normalized collision energy of 42% with an activation time of 30 ms. A full range mass scan was followed by four scans of the most abundant precursors from the full scan, with m/z values selected for CID analyzed for up to 10 scans before they were placed on a dynamic exclusion list for 30 s. Precursor ion selection was performed with an isolation width of 2.

For high-resolution LC-MS/MS, RNase T1 digestion products were separated on an Agilent Poroshell 120 EC-C18 column (2.7 µm, 1 mm × 75 mm) thermostatted at 50 °C at a flow rate of 80 µL/min with a gradient of 5% B for 5 min; 5% to 73% B in 60 min; 100% B for 5 min, followed by re-equilibration at 5% B for 20 min. The same mobile phase A and B compositions used in the low-resolution LC-MS/MS experiments were used for all high-resolution LC-MS/MS experiments.

High-resolution data were acquired on a Waters Synapt G2-S HDMS mass spectrometer with a source temperature of 120 °C, desolvation temperature of 400 °C, capillary voltage of 2.5 kV, sampling cone of 55 V, source offset of 80 V, cone gas of 50 L/h, and desolvation gas of 700 L/h. For all measurements, the mass spectrometer was operated in V-mode (sensitivity) with a typical resolving power of 15,000 FWHM (full width at half maximum). Samples were analyzed in negative-mode ESI over an m/z range of 500 to 2000 for MS and 300 to 2000 for MS/MS. Data dependent acquisition was used to collect MS/MS data (1 s scan) for a maximum of 3 ions per MS scan (0.2 s scan) using a collision energy ramp from 18 V to 38 V before being added to a dynamic exclusion list for 15 s. Precursor ion selection was performed with an isolation width of 2. Lockspray calibration was performed using a solution of leucine enkephalin (200 pg/µL) infused at 5 µL/min. Lockspray scans were collected for 1 s every 30 s with setmass at m/z 554.2615. The high-resolution MS/MS data on the Waters Synapt were acquired by Peter A. Lobue.

2.2.4 Data Analysis

RNAModMapper (RAMM) was used for automated MS/MS data analysis and sequence annotation. The program and user manual are available as a free download from http://bearcatms.uc.edu/. RAMM was developed in Java and tested on Windows 10, Windows 8, and Windows 7. Data analysis was performed on a Dell Precision Tower 7910 running Windows 10 with an Intel Xeon CPU E5-2630 processor at 2.4 GHz with 128 GB RAM. E. coli tRNA sequences with modifications were obtained from the MODOMICS database45. The 16S and 23S rRNA sequences of S. griseus were obtained from NCBI (http://www.ncbi.nlm.nih.gov/).

2.3 Results and Discussion

RAMM was developed as a tool to enable local processing of MS/MS spectra obtained from oligoribonucleotides with an emphasis on handling chemically modified RNase digestion products.

Unlike other RNA mass spectrometry software, RAMM uses user-generated sequence input files as the basis for mapping the interpreted MS/MS data with full flexibility to handle more than 100 post-transcriptionally modified nucleosides. A schematic workflow of the process used for data analysis and annotation is shown in Figure 2.1. A typical RNA sequence mapping experiment involves RNA sample preparation and enzymatic digestion followed by LC-MS/MS analysis of the RNase digestion products. This experimental workflow generates the data file that is used by RAMM to interpret MS/MS data of interest, from which modified nucleosides can then be mapped onto RNA sequences.

Figure 2.1 The workflow of RNA modification mapping by LC-MS/MS.

The RAMM home screen is shown in Figure 2.2. Two menus with options are available: Actions and Functions. The Actions menu allows the user to select between fixed and variable sequence position modification mapping and to identify the input files, modifications and tolerance parameters. The Functions menu allows the user to load an output file and export any output files as a .CSV formatted file.

Figure 2.2 View of the main panel for the RNAModMapper program. The Actions menu allows the user to select between fixed and variable sequence position modification mapping and to identify the input files, modifications and tolerance parameters. The Functions menu allows the user to load an output file and export an output file as a .CSV formatted file.

The type of mapping experiment (fixed or variable) that is desired is identified in the main user interface by selecting the Actions menu. The MS/MS data file (MGF) and the FASTA file containing the modified or unmodified sequences to be mapped are then chosen in the mapping window (Figure 2.3). The user also defines the ribonuclease used during sample preparation, precursor and product ion mass tolerances, mass type (average or monoisotopic), and other experimental processing parameters. Once the processing parameters have been chosen, the RNA sequences contained in the FASTA file are processed in silico to generate a local database of RNase digestion products against which the experimental MS/MS data are compared and scored. These interpreted MS/MS sequences can then be mapped back onto the full-length input sequence(s) to annotate the modifications in a location-specific manner. The key design and development features of RAMM will be described first, followed by selected demonstrations of the applicability of the software.

Figure 2.3 Screenshot of fixed sequence position modification mapping window. As denoted in the text, the user has full control over the input files and data processing parameters.

Input Files

Input files are required in FASTA format. RNA modified nucleosides can be annotated in the input file by using the MODOMICS nomenclature45. RNA gene (i.e., DNA) sequences can also be used, as RAMM will convert them directly to the predicted RNA sequence. All MS/MS raw data must be converted to the MGF (Mascot generic format) data format for input into RAMM. For Thermo LTQ-XL data, both the Mass Matrix File Conversion Tool46 and MSConvert47 were capable of converting the original RAW data file format to an MGF format that could be processed with RAMM. For Waters Synapt G2-S data, the RAW data file was first pre-processed in PLGS (ProteinLynx Global Server, Waters Corp.) for MS/MS spectra noise reduction and then exported in mzML format. MSConvert was then used to convert the mzML to an MGF format that could be processed with RAMM.

In Silico Digestion

RAMM supports five different ribonucleases for in silico digestion: RNase T1, RNase U2, RNase A, RNase MC1 and Cusativin. The programmed selectivity of each enzyme is as follows: RNase T1 cleaves at the 3′-end of guanosine and N2-methylguanosine (m2G)48. RNase U2 cleaves at the 3′-end of guanosine and adenosine49-50. RNase A cleaves at the 3′-end of unmodified pyrimidines. Cusativin cleaves at the 3′-end of cytidine51 and RNase MC1 cleaves at the 5′-end of uridine52. The user can select among different 3′-termini products: linear phosphate, cyclic phosphate, and hydroxyl. The user also has the option to select up to 5 missed cleavages for any enzyme.
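As an illustration only (this is not the RAMM source code, which was written in Java), the cleavage rules described above can be sketched in Python. The names `CUT_AFTER`, `CUT_BEFORE`, `tokenize` and `digest` are hypothetical, modified residues are assumed to be written in square brackets, and the choice of 3′-terminus is not modeled.

```python
import re

# Residues after which (3' side) each enzyme is assumed to cut, per the text.
CUT_AFTER = {
    "T1": {"G", "[m2G]"},
    "U2": {"G", "A"},
    "A": {"C", "U"},        # unmodified pyrimidines only
    "cusativin": {"C"},
}
# RNase MC1 is described as cutting on the 5' side of uridine.
CUT_BEFORE = {"MC1": {"U"}}

# A token is either a bracketed modification such as [s4U] or a single letter.
TOKEN = re.compile(r"\[[^\]]+\]|[A-Za-z]")


def tokenize(seq):
    """Split a sequence string into residue tokens."""
    return TOKEN.findall(seq)


def digest(seq, enzyme, missed=0):
    """Return digestion products, allowing up to `missed` missed cleavages."""
    after = CUT_AFTER.get(enzyme, set())
    before = CUT_BEFORE.get(enzyme, set())
    pieces, current = [], []
    for tok in tokenize(seq):
        if tok in before and current:   # cut on the 5' side of this residue
            pieces.append(current)
            current = []
        current.append(tok)
        if tok in after:                # cut on the 3' side of this residue
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    products = []
    for i in range(len(pieces)):
        last = min(i + missed, len(pieces) - 1)
        for j in range(i, last + 1):    # join adjacent pieces for missed cleavages
            products.append("".join(t for p in pieces[i:j + 1] for t in p))
    return products


print(digest("ACGUAGC", "T1"))
print(digest("ACGUAGC", "T1", missed=1))
```

Treating each bracketed modification as a single token keeps a modified residue such as [s4U] from being mistaken for an unmodified cleavage site.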

MS/MS Spectra Interpretation

RAMM is designed such that the set of user FASTA files serves as a locally generated database of potential enzymatic digestion products against which the MS/MS data (in the MGF file) are searched and ranked. For each in silico generated digestion product, a molecular weight is calculated and serves as the initial comparison with the parent mass of the MS/MS query spectrum.

If the mass difference is within the precursor mass tolerance, the predicted product ions of the in silico digestion product are calculated and compared against the peaks in the query spectrum with any matches determined by the product ion mass tolerance. The MS/MS data is then scored and ranked as described below.
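The precursor filtering step can be illustrated with a minimal sketch. It assumes unmodified digestion products carrying a 5′-hydroxyl and a linear 3′-phosphate, uses standard monoisotopic nucleotide residue masses, and applies the tolerance to the neutral mass in Da. The function names are hypothetical and are not taken from RAMM.

```python
# Monoisotopic masses (Da) of nucleotide residues (nucleoside monophosphate minus H2O).
RESIDUE = {"A": 329.0525, "C": 305.0413, "G": 345.0474, "U": 306.0253}
WATER = 18.0106
PROTON = 1.00728


def neutral_mass(seq):
    """Neutral mass of a linear oligonucleotide with 5'-OH and 3'-phosphate."""
    return sum(RESIDUE[base] for base in seq) + WATER


def observed_mass(precursor_mz, charge):
    """Neutral mass from a negative-mode precursor m/z with `charge` protons removed."""
    return precursor_mz * charge + charge * PROTON


def candidates(precursor_mz, charge, database, tol):
    """Digestion products whose mass lies within `tol` Da of the observed precursor."""
    target = observed_mass(precursor_mz, charge)
    return [s for s in database if abs(neutral_mass(s) - target) <= tol]


db = ["UCCCG", "CUCCG", "UUCCG", "ACCCG"]
print(candidates(791.096, 2, db, tol=1.0))   # unit-resolution style tolerance
print(candidates(791.096, 2, db, tol=0.02))  # high-resolution style tolerance
```

With the wider tolerance, the C-to-U substitution product (about 0.98 Da heavier) also survives the filter. Sequence isomers such as CUCCG have identical masses and can only be distinguished at the product ion stage.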

RAMM supports 120 chemical modifications/motifs and allows the user to designate single methylation and thiolation motifs (Table 2.1). Because typical oligonucleotide MS/MS data cannot differentiate which nucleobase ring position is modified (e.g., m1A versus m6A), methylations or thiolations of canonical nucleobases that cannot be differentiated by MS/MS data can be represented by a single modification motif (e.g., mA represents any methylated adenosine) in RAMM if the user desires. Pseudouridine is an isomer of uridine, thus it cannot be identified by RAMM in the MS/MS data. However, RAMM allows for up to five user-defined modifications to be added to the modification database, thus derivatization of pseudouridine53-54 or other modifications can be accounted for depending on experimental needs. RAMM was developed to account for two common situations in RNA modifications: fixed and variable sequence position modifications. For example, for tRNAs a significant number of modifications are known to only occur at the wobble base (position 34) of the tRNA sequence. In contrast, other modifications can occur at a variety of sequence locations in any individual tRNA. Whether fixed or variable sequence position modifications are identified is a user-selected feature of the program, although only one option is available at a time.

Table 2.1 Modifications supported by RAMM.

Modifications of Adenosine
Arp Am mA m1A m2A m6A m8A m1Am ms2t6A t6A m6t6A m6,6A m2,8A ms2m6A ac6A ct6A f6A hm6A hn6A I Im m1I m1Im i6A ms2i6A ioA ms2io6A

Modifications of Cytidine
mC Cm m3C m4C m5C m5Cm ac4Cm k2C C+ hm5C ho5C f5C s2C ac4C

Modifications of Guanosine
Gm mG m1G m2G m7G m2,7G m2,7Gm m2,2,7G Q oQ manQ galQ gluQ oyW yW G+ preQ0 preQ1 imG imG-14 mimG OHyWy OHyW OHyWx yW-86 yW-72 yW-58 m2,2G m2,2Gm mGm

Modifications of Uridine
D m5D mU Um m3U m5U mUm mcm5U mcm5Um cmnm5U cmnm5s2U cmnm5se2U cmnm5ges2U ncm5U ncm5Um ncm5s2U cmnm5Um chm5U nchm5U mchm5U mchm5Um cnm5U mnm5U mnm5se2U mcm5s2U nm5U nm5s2U nm5se2U nm5ges2U ges2U tm5U tm5s2U s2U s2Um s4U sU ho5U m5s2U mnm5s2U inm5U inm5Um inm5s2U mo5U cmo5U mcmo5U cm5U cm5s2U acp3U acp3D

Scoring Function

The scoring function of RAMM is built upon both a normalized binomial distribution probability score55 and a dot product score56, which have been used previously in the interpretation of peptide MS/MS data in proteomics. RAMM incorporates both approaches, adapted for oligonucleotide fragmentation schemes, with the final reported score being the product of the two scores.

The cumulative binomial distribution probability55 (Equation 2.1) is generated from the c-, y-, w-, and a-B-type ions that arise from oligonucleotide fragmentation in CID.

$$P(n, p, N) = \sum_{k=n}^{N} \binom{N}{k} p^{k} (1 - p)^{N-k} \qquad \text{(Eq. 2.1)}$$

where N is the total number of theoretical c-, y-, w-, and a-B-type ions, n is the total number of matched theoretical ions, and p is the probability of matching one theoretical product ion. P(n,p,N) is a calculated P-value of matching at least n out of N theoretical ions by chance. The P-value will be very small for a true positive.

As the P-value depends on the length of the oligonucleotide and the number of matched ions, it is more useful to convert this to a P-score, which is normalized to the oligonucleotide length as in Equation 2.2. If all the theoretical ions are found in the MS/MS spectrum, the value S(P) will be 100, regardless of the length of the oligonucleotide.

$$S(P) = \frac{100 \cdot \log_{10}(P)}{N \cdot \log_{10}(p)} \qquad \text{(Eq. 2.2)}$$

RAMM allows the P-score to be weighted by the relative abundance differences between c/y-type ions and a-B/w-type ions for oligoribonucleotides. McLuckey and co-workers found that c/y-type ions have higher relative abundance as compared to a-B/w-type ions for samples containing a 2′-hydroxyl57. To account for these fragmentation channel differences, the program default P-score calculation is as given in Equation 2.3.

$$S(P_T) = \left[0.7 \cdot S(P_{c/y})\right] + \left[0.3 \cdot S(P_{a\text{-}B/w})\right] \qquad \text{(Eq. 2.3)}$$

The two weighting factors were estimated from the relative contribution of c/y (70%) and a-B/w (30%) dissociation channels at low and high excitation amplitudes from typical MS/MS data obtained during these studies. These default values can be adjusted by the user to match the experimental data obtained from the user’s own mass spectrometer.
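Equations 2.1 to 2.3 can be checked numerically with a short sketch. The function names are hypothetical, and p is taken as a given input because the text does not specify how it is computed.

```python
from math import comb, log10


def p_value(n, p, N):
    """Eq. 2.1: probability of matching at least n of N theoretical ions by chance."""
    return sum(comb(N, k) * p**k * (1 - p)**(N - k) for k in range(n, N + 1))


def p_score(n, p, N):
    """Eq. 2.2: P-value normalized so that matching all N ions gives 100."""
    return 100 * log10(p_value(n, p, N)) / (N * log10(p))


def weighted_p_score(s_cy, s_abw, w_cy=0.7, w_abw=0.3):
    """Eq. 2.3: weighted sum of the c/y and a-B/w P-scores (default weights from the text)."""
    return w_cy * s_cy + w_abw * s_abw


# Matching every theoretical ion gives 100 for any oligonucleotide length.
print(round(p_score(8, 0.05, 8), 6), round(p_score(20, 0.05, 20), 6))
```

When n = N the sum collapses to p^N, so the numerator becomes 100·N·log10(p) and the score is exactly 100. This is the length independence stated for Equation 2.2.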

A limitation of scoring MS/MS data using only the P-score is that multiple sequences could match the data resulting in similar or identical P-score values with no easy means of differentiating the true sequence from false positives. To improve the accuracy of MS/MS data assignments, RAMM also incorporates a dot product approach to calculate the similarity between observed and reconstructed spectra (Equation 2.4)56.

$$DP = \frac{\sum I_{observed} \times I_{reconstructed}}{\sqrt{\sum I_{observed}^{2} \times \sum I_{reconstructed}^{2}}} \qquad \text{(Eq. 2.4)}$$

$I_{observed}$ and $I_{reconstructed}$ are the ion abundances of the observed and reconstructed spectra, respectively. Figure 2.4 provides an example of spectrum reconstruction in RAMM. In this example, the representative experimental data is separated into those m/z values that match, within the user-specified product ion tolerance, the predicted product ion m/z values for the particular sequence (generated by the in silico database spectrum) and those experimental m/z values that do not match. The ion abundances for the matching m/z values are retained at their original values while ion abundances for all non-matching m/z values are averaged. A dot product of 1 would signify that all of the experimentally most abundant ions arise from only those m/z values generated by the database spectrum. Low dot product scores arise when the most abundant ions in the experimental data set do not match the predicted m/z values of the database spectrum.

Figure 2.4 Spectrum reconstruction. The observed MS/MS spectrum is compared against m/z values generated by in silico CID of two oligonucleotide sequences (A and B). In this example, A and B have the same number of matched product ions, although at different overall ion abundance. To reconstruct spectral data for calculating the dot product, the matched ions (red) are kept, and the intensity of unmatched ions (grey) is the average intensity of all the unmatched ions in the observed spectrum.
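The reconstruction and dot product steps can be sketched as follows. This is a minimal interpretation of the description above, with hypothetical function names and a peak list represented as (m/z, intensity) pairs. It is not the RAMM implementation.

```python
from math import sqrt


def reconstruct(peaks, theoretical_mz, tol):
    """Keep matched intensities; replace each unmatched intensity by the unmatched mean."""
    matched = [any(abs(mz - t) <= tol for t in theoretical_mz) for mz, _ in peaks]
    unmatched = [inten for (_, inten), m in zip(peaks, matched) if not m]
    mean_unmatched = sum(unmatched) / len(unmatched) if unmatched else 0.0
    return [inten if m else mean_unmatched for (_, inten), m in zip(peaks, matched)]


def dot_product(observed, reconstructed):
    """Eq. 2.4: normalized dot product of observed and reconstructed intensities."""
    num = sum(o * r for o, r in zip(observed, reconstructed))
    den = sqrt(sum(o * o for o in observed) * sum(r * r for r in reconstructed))
    return num / den if den else 0.0


peaks = [(100.0, 1.0), (200.0, 100.0), (300.0, 2.0)]
observed = [inten for _, inten in peaks]

# Candidate A explains the dominant peak; candidate B explains only a minor one.
dp_a = dot_product(observed, reconstruct(peaks, [200.0], tol=0.5))
dp_b = dot_product(observed, reconstruct(peaks, [100.0], tol=0.5))
print(dp_a > dp_b)
```

Both candidates match exactly one peak, so a count-based score would not separate them. The dot product favors the candidate that accounts for the most abundant ion.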

Although scoring is built primarily around conventional c- and y-type fragment ions, one unique feature of some modified nucleosides is their propensity to fragment through direct nucleobase loss. These labile nucleobases include 7-methylguanosine (m7G), lysidine (k2C), queuosine (Q), epoxyqueuosine (oQ), and N6-threonylcarbamoyladenosine (t6A). When these bases are present in the sample, nucleobase loss is the primary fragmentation channel during MS/MS, resulting in CID data with few or very low abundance c- and y-type ions. Similarly, RNase digestion products ending in a 3′-phosphate often dissociate during CID by loss of phosphoric acid, which again can lead to reduced c- and y-type ions in the MS/MS spectrum58. To account for this behavior, the RAMM scoring function supports neutral loss events (Table 2.2) during the dot product calculation step.

Table 2.2 The neutral losses of m7G, k2C, Queuosine, epoxyqueuosine, t6A and phosphoric acid.

Name     Monoisotopic mass (Da)   Average mass (Da)
m7G      164.065                  165.158
k2C      144.099                  144.172
Q        115.063                  115.131
oQ       131.058                  131.130
t6A      147.053                  147.129
H3PO4    97.977                   97.995

The type of mass spectrometer (e.g. low-resolution LTQ-XL versus high-resolution Synapt G2-S) used during the LC-MS/MS experiment can have an impact on the MS/MS data interpretation output. Because the unit resolution of the LTQ-XL cannot sufficiently resolve the mass difference between U and C nucleotides (mass difference = 1 Da), there is the potential for RAMM to interpret the spectrum associated with a theoretical digestion product, for example UCCCGp, with both sequence isomers (e.g. CUCCGp, CCUCGp, etc.) and C-to-U substitutions (e.g. UUCCGp, UCUCGp, etc.). In these instances, the analyst must choose the correctly interpreted spectrum through examination of the precursor and product ion mass errors, P-score, and dot product for each of the possible interpretations generated by RAMM. Generally, the interpretation that provides the lowest mass errors, highest P-score, and highest dot product represents the correct assignment. LC-MS/MS mapping experiments performed on the Synapt provided sufficient resolution for differentiation of oligonucleotides containing consecutive C and U nucleotides. Therefore, by setting narrower precursor and product ion mass tolerances (0.02 Da and 0.1 Da, respectively) in RAMM, which are consistent with the resolution and mass accuracy of the Synapt, significantly fewer potential interpretations will be produced for a given digestion product.

The overall quality and characteristics of the MS/MS spectrum can also impact RAMM output. During the course of running mapping experiments on both the LTQ-XL and Synapt, it was observed that the MS/MS spectra generated on the Synapt generally result in lower dot product scores than those generated on the LTQ-XL for the same digestion product. It is believed that this is due to the inherently different nature of how CID spectra for oligonucleotides are produced on an ion-beam type (Synapt) and trapping type (LTQ-XL) mass spectrometer. The instrument parameters that provide the highest relative abundance of sequence-specific product ions on the Synapt also tend to yield a higher relative abundance of precursor ion in the MS/MS spectrum than that observed on the LTQ-XL (Figure 2.5). As the RAMM scoring function does not currently take the precursor ion into account, this leads to a bias in the dot product score for MS/MS spectra containing a high abundance of precursor ion.

Figure 2.5 Interpreted MS/MS spectra for digestion product A[ms2i6A]AACCGp from E. coli tRNASer(UGA) obtained from LTQ-XL (top) and Synapt G2-S (bottom). Collision-induced dissociation conditions lead to a higher abundance of precursor ion on the Synapt G2-S compared to the LTQ-XL.

Interpreted and scored MS/MS results are shown on four main panels (Figure 2.6). Panel A will list interpreted MS/MS spectra in retention time order. Each interpreted spectrum can be selected by the user. Once selected, detailed information will be shown within Panels B, C and D. Panel B shows the mass error between the observed precursor ion mass and theoretical precursor ion mass, the P-score, dot product and how many ions are matched. Panel C presents the interpreted MS/MS spectrum; matched product ions are labeled and highlighted. Panel D gives the value and mass error for each matched ion from 5′-end to 3′-end.

Figure 2.6 Screenshot of user interface. A) Annotated MS/MS spectrum data listed by retention time. B) MS/MS data assignment features and statistics including precursor mass error, outputs of scoring functions, and number of detected fragment ions. C) For the selected data from A, the annotated MS/MS spectrum is shown with assigned fragment ions highlighted. D) Tabular output of the data in C including mass error for each fragment ion assignment.

Scoring Function Performance

As described above, the scoring function has two components: P-score and dot product. RAMM uses the product of those two components to rank order MS/MS assignments. To characterize the ability of RAMM to correctly identify the true MS/MS sequence assignment, a FASTA file containing only E. coli tRNA gene sequences (Table 2.3) and the 27 known post-transcriptional modifications in E. coli tRNAs (no pseudouridine, Table 2.4) were used as inputs for spectral interpretation. For these evaluations, both the variable sequence and fixed position functions were used. For low-resolution Thermo LTQ-XL experiments, an average mass calculation was used. MS/MS data interpretation precursor and product ion mass tolerances were set to 1.0 Da. For high-resolution Waters Synapt G2-S experiments, a monoisotopic mass calculation was used. The precursor mass tolerance was set to 0.02 Da and the product ion mass tolerance was set to 0.1 Da.

Table 2.3 E. coli total tRNA gene sequences.

>Escherichia_coli Ala (GGC) 76 bp Sc: 86.51
GGGGCTATAGCTCAGCTGGGAGAGCGCTTGCATGGCATGCAAGAGGTCAGCGGTTCGATCCCGCTTAGCTCCACCA
>Escherichia_coli Ala (TGC) 76 bp Sc: 88.77
GGGGCTATAGCTCAGCTGGGAGAGCGCCTGCTTTGCACGCAGGAGGTCTGCGGTTCGATCCCGCATAGCTCCACCA
>Escherichia_coli Arg (ACG) 77 bp Sc: 87.22
GCATCCGTAGCTCAGCTGGATAGAGTACTCGGCTACGAACCGAGCGGTCGGAGGTTCGAATCCTCCCGGATGCACCA
>Escherichia_coli Arg (CCG) 77 bp Sc: 89.49
GCGCCCGTAGCTCAGCTGGATAGAGCGCTGCCCTCCGGAGGCAGAGGTCTCAGGTTCGAATCCTGTCGGGCGCGCCA
>Escherichia_coli Arg (CCT) 75 bp Sc: 66.39
GTCCTCTTAGTTAAATGGATATAACGAGCCCCTCCTAAGGGCTAATTGCAGGTTCGATTCCTGCAGGGGACACCA
>Escherichia_coli Arg (TCT) 77 bp Sc: 89.39
GCGCCCTTAGCTCAGTTGGATAGAGCAACGACCTTCTAAGTCGTGGGCCGCAGGTTCGAATCCTGCAGGGCGCGCCA
>Escherichia_coli Asn (GTT) 76 bp Sc: 87.13
TCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACTGGTTCGAGTCCAGTCAGAGGAGCCA
>Escherichia_coli Asp (GTC) 77 bp Sc: 90.74
GGAGCGGTAGTTCAGTCGGTTAGAATACCTGCCTGTCACGCAGGGGGTCGCGGGTTCGAGTCCCGTCCGTTCCGCCA
>Escherichia_coli Cys (GCA) 74 bp Sc: 51.48
GGCGCGTTAACAAAGCGGTTATGTAGCGGATTGCAAATCCGTCTAGTCCGGTTCGACTCCGGAACGCGCCTCCA
>Escherichia_coli Gln (CTG) 75 bp Sc: 75.83
TGGGGTATCGCCAAGCGGTAAGGCACCGGATTCTGATTCCGGCATTCCGAGGTTCGAATCCTCGTACCCCAGCCA
>Escherichia_coli Gln (TTG) 75 bp Sc: 74.33
TGGGGTATCGCCAAGCGGTAAGGCACCGGTTTTTGATACCGGCATTCCCTGGTTCGAATCCAGGTACCCCAGCCA
>Escherichia_coli Glu (TTC) 76 bp Sc: 59.80
GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAGGGGTTCGAATCCCCTAGGGGACGCCA
>Escherichia_coli Gly (CCC) 74 bp Sc: 78.75
GCGGGCGTAGTTCAATGGTAGAACGAGAGCTTCCCAAGCTCTATACGAGGGTTCGATTCCCTTCGCCCGCTCCA
>Escherichia_coli Gly (GCC) 76 bp Sc: 93.74
GCGGGAATAGCTCAGTTGGTAGAGCACGACCTTGCCAAGGTCGGGGTCGCGAGTTCGAGTCTCGTTTCCCGCTCCA
>Escherichia_coli Gly (TCC) 75 bp Sc: 64.85
GCGGGCATCGTATAATGGCTATTACCTCAGCCTTCCAAGCTGATGATGCGGGTTCGATTCCCGCTGCCCGCTCCA
>Escherichia_coli His (GTG) 76 bp Sc: 84.86
GTGGCTATAGCTCAGTTGGTAGAGCCCTGGATTGTGATTCCAGTTGTCGTGGGTTCGAATCCCATTAGCCACCCCA
>Escherichia_coli Ile (GAT) 77 bp Sc: 88.37
AGGCTTGTAGCTCAGGTGGTTAGAGCGCACCCCTGATAAGGGTGAGGTCGGTGGTTCAAGTCCACTCAGGCCTACCA
>Escherichia_coli Leu (CAA) 85 bp Sc: 79.12
GCCGAAGTGGCGAAATCGGTAGACGCAGTTGATTCAAAATCAACCGTAGAAATACGTGCCGGTTCGAGTCCGGCCTTCGGCACCA
>Escherichia_coli Leu (CAG) 87 bp Sc: 66.59
GCGAAGGTGGCGGAATTGGTAGACGCGCTAGCTTCAGGTGTTAGTGTTCTTACGGACGTGGGGGTTCAAGTCCCCCCCCTCGCACCA
>Escherichia_coli Leu (CAG) 87 bp Sc: 69.07
GCGAAGGTGGCGGAATTGGTAGACGCGCTAGCTTCAGGTGTTAGTGTCCTTACGGACGTGGGGGTTCAAGTCCCCCCCCTCGCACCA
>Escherichia_coli Leu (GAG) 87 bp Sc: 70.30
GCCGAGGTGGTGGAATTGGTAGACACGCTACCTTGAGGTGGTAGTGCCCAATAGGGCTTACGGGTTCAAGTCCCGTCCTCGGTACCA
>Escherichia_coli Leu (TAA) 87 bp Sc: 71.59
GCCCGGATGGTGGAATCGGTAGACACAAGGGATTTAAAATCCCTCGGCGTTCGCGCTGTGCGGGTTCAAGTCCCGCTCCGGGTACCA
>Escherichia_coli Leu (TAG) 85 bp Sc: 78.30
GCGGGAGTGGCGAAATTGGTAGACGCACCAGATTTAGGTTCTGGCGCCGCAAGGTGTGCGAGTTCAAGTCTCGCCTCCCGCACCA
>Escherichia_coli Lys (TTT) 76 bp Sc: 99.54
GGGTCGTTAGCTCAGTTGGTAGAGCAGTTGACTTTTAATCAATTGGTCGCAGGTTCGAATCCTGCACGACCCACCA
>Escherichia_coli Met (CAT) 77 bp Sc: 82.92
CGCGGGGTGGAGCAGCCTGGTAGCTCGTCGGGCTCATAACCCGAAGATCGTCGGTTCAAATCCGGCCCCCGCAACCA
>Escherichia_coli Met (CAT) 77 bp Sc: 86.07
CGCGGGGTGGAGCAGCCTGGTAGCTCGTCGGGCTCATAACCCGAAGGTCGTCGGTTCAAATCCGGCCCCCGCAACCA
>Escherichia_coli Met (CAT) 76 bp Sc: 93.86
GGCCCCTTAGCTCAGTGGTTAGAGCAGGCGACTCATAATCGCTTGGTCGCTGGTTCAAGTCCAGCAGGGGCCACCA
>Escherichia_coli Met (CAT) 76 bp Sc: 93.98
GGCCCTTTAGCTCAGTGGTTAGAGCAGGCGACTCATAATCGCTTGGTCGCTGGTTCAAGTCCAGCAAGGGCCACCA
>Escherichia_coli Met (CAT) 77 bp Sc: 96.18
GGCTACGTAGCTCAGTTGGTTAGAGCACATCACTCATAATGATGGGGTCACAGGTTCGAATCCCGTCGTAGCCACCA
>Escherichia_coli Phe (GAA) 76 bp Sc: 84.11
GCCCGGATAGCTCAGTCGGTAGAGCAGGGGATTGAAAATCCCCGTGTCCTTGGTTCGATTCCGAGTCCGGGCACCA
>Escherichia_coli Pro (CGG) 77 bp Sc: 85.12
CGGTGATTGGCGCAGCCTGGTAGCGCACTTCGTTCGGGACGAAGGGGTCGGAGGTTCGAATCCTCTATCACCGACCA
>Escherichia_coli Pro (GGG) 77 bp Sc: 77.97
CGGCACGTAGCGCAGCCTGGTAGCGCACCGTCATGGGGTGTCGGGGGTCGGAGGTTCAAATCCTCTCGTGCCGACCA
>Escherichia_coli Pro (TGG) 77 bp Sc: 92.73
CGGCGAGTAGCGCAGCTTGGTAGCGCAACTGGTTTGGGACCAGTGGGTCGGAGGTTCGAATCCTCTCTCGCCGACCA
>Escherichia_coli SeC(p) (TCA) 91 bp Sc: 75.19
GGAAGATCGTCGTCTCCGGTGAGGCGGCTGGACTTCAAATCCAGTTGGGGCCGCCAGCGGTCCCGGGCAGGTTCGACTCCTGTGATCTTCC
>Escherichia_coli Ser (CGA) 90 bp Sc: 75.49
GGAGAGATGCCGGAGCGGCTGAACGGACCGGTCTCGAAAACCGGAGTAGGGGCAACTCTACCGGGGGTTCAAATCCCCCTCTCTCCGCCA
>Escherichia_coli Ser (GCT) 93 bp Sc: 72.46
GGTGAGGTGGCCGAGAGGCTGAAGGCGCTCCCCTGCTAAGGGAGTATGCGGTCAAAAGCTGCATCCGGGGTTCGAATCCCCGCCTCACCGCCA
>Escherichia_coli Ser (GGA) 88 bp Sc: 68.60
GGTGAGGTGTCCGAGTGGCTGAAGGAGCACGCCTGGAAAGTGTGTATACGGCAACGTATCGGGGGTTCGAATCCCCCCCTCACCGCCA
>Escherichia_coli Ser (TGA) 88 bp Sc: 76.89
GGAAGTGTGGCCGAGCGGTTGAAGGCACCGGTCTTGAAAACCGGCGACCCGAAAGGGTTCCAGAGTTCGAATCTCTGCGCTTCCGCCA
>Escherichia_coli Thr (CGT) 77 bp Sc: 42.36
GCTCAAGTAGTTAAAAATGCATTAACATCGCATTCGTAATGCGAAGGTCGTAGGTTCGACTCCTATTATCGGCACCA
>Escherichia_coli Thr (CGT) 76 bp Sc: 91.49
GCCGATATAGCTCAGTTGGTAGAGCAGCGCATTCGTAATGCGAAGGTCGTAGGTTCGACTCCTATTATCGGCACCA
>Escherichia_coli Thr (GGT) 76 bp Sc: 88.70
GCTGATATGGCTCAGTTGGTAGAGCGCACCCTTGGTAAGGGTGAGGTCCCCAGTTCGACTCTGGGTATCAGCACCA
>Escherichia_coli Thr (GGT) 76 bp Sc: 94.75
GCTGATATAGCTCAGTTGGTAGAGCGCACCCTTGGTAAGGGTGAGGTCGGCAGTTCGAATCTGCCTATCAGCACCA
>Escherichia_coli Thr (TGT) 76 bp Sc: 91.83
GCCGACTTAGCTCAGTAGGTAGAGCAACTGACTTGTAATCAGTAGGTCACCAGTTCGATTCCGGTAGTCGGCACCA
>Escherichia_coli Trp (CCA) 76 bp Sc: 82.06
AGGGGCGTAGTTCAATTGGTAGAGCACCGGTCTCCAAAACCGGGTGTTGGGAGTTCGAGTCTCTCCGCCCCTGCCA
>Escherichia_coli Tyr (GTA) 85 bp Sc: 66.93
GGTGGGGTTCCCGAGCGGCCAAAGGGAGCAGACTGTAAATCTGCCGTCATCGACTTCGAAGGTTCGAATCCTTCCCCCACCACCA
>Escherichia_coli Tyr (GTA) 85 bp Sc: 67.63
GGTGGGGTTCCCGAGCGGCCAAAGGGAGCAGACTGTAAATCTGCCGTCACAGACTTCGAAGGTTCGAATCCTTCCCCCACCACCA
>Escherichia_coli Val (GAC) 77 bp Sc: 92.34
GCGTTCATAGCTCAGTTGGTTAGAGCACCACCTTGACATGGTGGGGGTCGTTGGTTCGAGTCCAATTGAACGCACCA
>Escherichia_coli Val (GAC) 77 bp Sc: 96.76
GCGTCCGTAGCTCAGTTGGTTAGAGCACCACCTTGACATGGTGGGGGTCGGTGGTTCGAGTCCACTCGGACGCACCA
>Escherichia_coli Val (TAC) 76 bp Sc: 94.34
GGGTGATTAGCTCAGCTGGGAGAGCACCTCCCTTACAAGGAGGGGGTCGGCGGTTCGATCCCGTCATCACCCACCA

Table 2.4 Selected modifications for variable sequence position modification mapping of E. coli total tRNA gene sequences.

Selected Adenosine Modifications
I m2A m6A ms2i6A i6A m6t6A t6A

Selected Cytidine Modifications
s2C k2C Cm ac4C

Selected Guanosine Modifications
m1G m7G Gm Q oQ galQ

Selected Uridine Modifications
D m5U s4U acp3U mnm5U Um cmnm5s2U mnm5s2U cmnm5Um cmo5U

RAMM spectral interpretation results were verified manually and then classified into two groups: MS/MS spectra known to be present in the sample that were correctly interpreted by RAMM, and MS/MS spectra known to be present that were incorrectly interpreted by RAMM. Based on the results, receiver operating characteristic (ROC) curves59 were created by plotting the sensitivity and specificity as shown in Figure 2.7. Sensitivity is defined as true positive/(true positive + false negative) and specificity is defined as true negative/(true negative + false positive). Here, a true positive is the number of correct MS/MS interpretations with scores above the user-defined scoring threshold; a false negative is the number of correct MS/MS interpretations whose scores were found to be below the scoring threshold; a true negative is the number of incorrect MS/MS interpretations with scores below the scoring threshold; and a false positive is the number of incorrect MS/MS interpretations with scores above the scoring threshold.
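The sensitivity and specificity definitions above translate directly into a threshold sweep. The sketch below is an illustration with hypothetical names and invented example scores, not the procedure used to generate Figure 2.7. It assumes that a score equal to the threshold counts as above it, that the list contains at least one correct and one incorrect interpretation, and that the area is integrated with the trapezoidal rule.

```python
def roc_points(scored):
    """scored: list of (score, is_correct). Returns (1 - specificity, sensitivity) points."""
    pos = sum(1 for _, correct in scored if correct)
    neg = len(scored) - pos
    points = [(0.0, 0.0)]
    for thr in sorted({s for s, _ in scored}, reverse=True):
        tp = sum(1 for s, c in scored if c and s >= thr)       # correct, above threshold
        fp = sum(1 for s, c in scored if not c and s >= thr)   # incorrect, above threshold
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Invented example: three correct and two incorrect interpretations.
example = [(90, True), (80, True), (70, False), (60, True), (50, False)]
print(auc(roc_points(example)))
```

An AUC of 1.0 corresponds to a score that ranks every correct interpretation above every incorrect one, while 0.5 corresponds to a score with no discriminating power.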

Figure 2.7 The receiver operating characteristic (ROC) curve for sequence mapping results of fixed (red) and variable (blue) sequence position modifications.

The areas under the ROC curve (AUC) were 0.94 and 0.89 for spectral interpretation of fixed sequence position modifications and variable sequence position modifications, respectively, using low-resolution data from the LTQ-XL with 486 MS/MS spectra and 177 digestion product sequences (there are 109 unique digestion products including 58 unique digestion products containing at least one post-transcriptionally modified nucleoside). Overall, RAMM was able to correctly characterize MS/MS spectra containing modified nucleosides from the data set tested for either fixed or variable modification options. The relatively lower AUC for variable sequence position modifications is mainly caused by false positives, as a larger number of incorrect modified MS/MS spectra were scored. These results reinforce that, while RAMM can be a powerful and effective tool for the automated interpretation of MS/MS data and subsequent RNA sequence mapping, the results are not completely without error and user interaction with the interpreted data remains warranted.

The program was tested for its ability to correctly interpret LC-MS/MS data from a typical untargeted analysis by using an RNase T1 digestion of total tRNA from E. coli. This sample has been extensively characterized44, and the modification status of each E. coli tRNA is known45. There are a total of 73 unique theoretical RNase T1 digestion products containing at least one post-transcriptionally modified nucleoside (excluding pseudouridine) as listed in Table 2.5. RAMM was able to accurately interpret the MS/MS spectra for 58 of these digestion products using both low-resolution (LTQ-XL) and high-resolution (Synapt G2-S) instruments, consistent with past manual interpretation outcomes44. A P-score of 70 can be considered significant for LC-MS/MS data generated on both a low-resolution (LTQ-XL) and high-resolution (Synapt) mass spectrometer. However, due to the inherently different character of the MS/MS spectra for oligonucleotides generated on these two different mass spectrometers, a dot product score of 0.85 was considered significant on the LTQ-XL and a dot product score of 0.65 was considered significant on the Synapt.

Table 2.5 RNase T1 digestion products of E. coli total tRNAs containing at least one post-transcriptionally modified nucleoside. N/D denotes that this digestion product was present in the original LC-MS/MS data but was not processed by RAMM. N/A denotes that this digestion product was not present in the original LC-MS/MS data.

Digestion product                 Fixed modification    Variable modification
                                  rank   score          rank   score
[m7G][acp3U]CGp                   1      77.281         1      77.281
[m7G]UCGp                         1      64.867         1      64.867
[m5U]YCGp                         1      64.634         2      64.634
DDAGp                             1      97.1           1      97.1
DD[Gm]Gp                          1      75.27          1      75.27
U[m7G]UUGp                        1      81.82          1      81.82
ADAGp                             1      96.3           1      96.3
[m7G]UCUGp                        2      50.757         4      50.757
[m7G]UCAGp                        1      77.287         1      77.287
UU[m7G]UCGp                       1      72.724         1      72.724
A[s4U]AGp                         1      80.591         1      80.591
AUUQUGp                           1      84.257         1      84.257
UA[s4U]CGp                        1      97.9           1      97.9
[m1G]ACGp                         1      88.587         1      88.587
AU[s4U]AGp                        1      89.273         1      89.273
U[Um]U[cmnm5s2U]UGp               1      61.349         1      61.349
[m1G]YUCUGp                       1      51.678         1      51.678
ACU[k2C]AU[t6A]AYCGp              1      65.73          1      65.73
CUU[cmo5U]Gp                      1      81.09          1      81.09
AADC[Gm]Gp                        1      78.622         1      78.622
[m2A]ACCGp                        1      90.668         1      90.668
[m5U]YCAAGp                       1      77.459         1      77.459
[m2A]YYCCGp                       1      85.41          1      85.41
CU[t6A]AGp                        1      63.327         1      63.327
[s4U][s4U]CCCGp                   1      85.882         1      85.882
[m7G][acp3U]CACAGp                1      69.974         1      69.974
[m2A]YACCGp                       2      72.157         2      72.157
U[Cm]U[cmo5U]Gp                   1      65.823         2      65.823
[m1G]ACCAGp                       1      73.472         1      73.472
A[Um]UCUGp                        1      83.336         2      83.336
AADD[Gm]Gp                        1      77.907         1      77.907
AU[t6A]AGp                        1      76.121         1      76.121
[m7G]UCCCCAGp                     1      53.468         1      53.468
CUCCC[s2C]UGp                     1      73.637         1      73.637
[m2A]YYCCAGp                      1      73.24          1      73.24
CCU[oQ]UC[m2A]CGp                 1      58.348         1      58.348
UUCAADDGp                         1      68.849         1      68.849
CCU[mnm5U]CCAAGp                  1      68.536         1      68.536
CCCU[mnm5s2U]UC[m2A]CGp           1      66.863         1      66.863
UAU[m7G]UCACUGp                   1      58.168         1      58.168
U[s4U]AACAAAGp                    1      81.932         1      81.932
[m5U]YCAAAUCCGp                   1      82.035         1      82.035
[Cm]UCAUAACCCGp                   1      66.6           2      66.6
ACU[oQ]UU[t6A]AYCCGp              1      51.205         1      51.205
ACU[mnm5s2U]UU[t6A]AYCAAUUGp      1      59.431         1      59.431
[m5U]YCAAAUCCUCUCGp               1      64.876         1      64.876
CACCUCCCU[cmo5U]AC[m6A]AGp        1      68.265         1      68.265
ACUUCA[i6A]AUCCAGp                1      62.426         1      62.426
ACU[oQ]UA[ms2i6A]AYCUGp           1      64.054         1      64.054
CA[ms2i6A]AYCCGp                  1      71.639         1      71.639
AUU[Cm]AA[ms2i6A]AUCAACCGp        1      63.3           1      63.3
U[Cm]UCCA[ms2i6A]AACCGp           1      62.288         1      62.288
AA[ms2i6A]AYCCCCGp                1      76.117         1      76.117
A[ms2i6A]AACCGp                   1      78.152         1      78.152
U[m7G][acp3U]CCUUGp               1      59.954         1      59.954
DDAUGp                            1      50.884         1      50.884
CUA[s4U]AGp                       1      71.586         1      71.586
AAADC[Gm]Gp                       1      63.183         1      63.183
UU[cmo5U]Gp                       N/D                   N/D
[m7G]UCACCAGp                     N/D                   N/D
CACAUCACU[ac4C]AU[t6A]AYGp        N/A                   N/A
AYU[cmnm5Um]AA[ms2i6A]AYCCCUCGp   N/A                   N/A
[m5U]YCAAAUCCCCCUCUCUCCGp         N/A                   N/A
ACU[cmo5U]Gp                      N/A                   N/A
AC[s2C]U[mnm5U]CU[t6A]AGp         N/A                   N/A
CCCCUCCU[t6A]AGp                  N/A                   N/A
[m7G]UCUCAGp                      N/A                   N/A
CCCUU[s4U]AGp                     N/A                   N/A
AAADD[Gm]Gp                       N/A                   N/A
CCCCU[s4U]AGp                     N/A                   N/A
U[m6t6A]AGp                       N/A                   N/A
CC[s2C]UCCGp                      N/A                   N/A
UUCA[s4U]AGp                      N/A                   N/A

RNA Modification Mapping

To examine the capabilities of the program to map MS/MS data onto RNA sequences, a FASTA file containing 40 unique tRNA sequences with 27 known post-transcriptional modifications (Table 2.6) was used. MS/MS interpretation was performed using only fixed sequence position modifications. The length of matched oligonucleotides varied from dimers to a 16-mer, and all matched digestion products are aligned under the input sequences. An example of the mapping result is shown in Figure 2.8, where matched digestion products are aligned under the tRNASer(UGA) sequence.
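The alignment of interpreted digestion products under an input sequence can be illustrated with a small sketch. It assumes bracketed MODOMICS-style tokens and a trailing "p" denoting the 3′-phosphate. The function names are hypothetical, and positions are simply 1-based residue indices within the input string, not conventional tRNA numbering.

```python
import re

# A token is either a bracketed modification such as [ms2i6A] or a single letter.
TOKEN = re.compile(r"\[[^\]]+\]|[A-Za-z]")


def tokens(seq):
    """Split a sequence string into residue tokens."""
    return TOKEN.findall(seq)


def map_products(sequence, products):
    """Return {product: [1-based start positions]} for each digestion product."""
    seq = tokens(sequence)
    hits = {}
    for prod in products:
        body = prod[:-1] if prod.endswith("p") else prod  # drop the 3'-phosphate marker
        p = tokens(body)
        hits[prod] = [i + 1 for i in range(len(seq) - len(p) + 1)
                      if seq[i:i + len(p)] == p]
    return hits


# First part of the tRNA Ser (UGA) entry from Table 2.6.
ser_uga = "GGAAGUG[s4U]GGCCGAGC[Gm]GDDGAAGGCACCGGU[Cm]U[cmo5U]GA[ms2i6A]AACCGGCG"
print(map_products(ser_uga, ["U[Cm]U[cmo5U]Gp", "A[ms2i6A]AACCGp"]))
```

A product that occurs in more than one input sequence, or more than once within a sequence, returns several positions. Such cases cannot be localized from the MS/MS data alone.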

Table 2.6 E. coli tRNA sequences with modifications. > tRNA | Met | CAU | Escherichia coli | prokaryotic cytosol GGCUACG[s4U]AGCUCAGDD[Gm]GDDAGAGCACAUCACU[ac4C]AU[t6A]AYGAUGGG[m7G] [acp3U]CACAGG[m5U]YCGAAUCCCGUCGUAGCCACCA > tRNA | Phe | GAA | Escherichia coli | prokaryotic cytosol GCCCGGA[s4U]AGCUCAGDCGGDAGAGCAGGGGAYUGAA[ms2i6A]AYCCCCGU[m7G][acp3U]C CUUGG[m5U]YCGAUUCCGAGUCCGGGCACCA > tRNA | Pro | CGG | Escherichia coli | prokaryotic cytosol CGGUGAUUGGCGCAGCCUGGDAGCGCACUUCGUUCGG[m1G]ACGAAGGG[m7G]UCGGAGG [m5U]YCGAAUCCUCUAUCACCGACCA > tRNA | Sec | UCA | Escherichia coli | prokaryotic cytosol AAGAUCG[s4U]CGUCUCCGGDGAGGCGGCUGGACUUCA[i6A]AUCCAGUUGGGGCCGCGCGG UCCCGGGCAGG[m5U]YCGACUCCUGUGAUCUUGCCA > tRNA | Ser | UGA | Escherichia coli | prokaryotic cytosol GGAAGUG[s4U]GGCCGAGC[Gm]GDDGAAGGCACCGGU[Cm]U[cmo5U]GA[ms2i6A]AACCGGCG ACCCGAAAGGGUUCCAGAG[m5U]YCGAAUCUCUGCGCUUCCGCCA > tRNA | Ser | CGA | Escherichia coli | prokaryotic cytosol GGAGAGAUGCCGGAGC[Gm]GCDGAACGGACCGGUCUCGA[ms2i6A]AACCGGAGUAGGGGCA ACUCUACCGGGGG[m5U]YCAAAUCCCCCUCUCUCCGCCA > tRNA | Ser | GCU | Escherichia coli | prokaryotic cytosol GGUGAGG[s4U]GGCCGAGAGGCDGAAGGCGCUCCC[s2C]UGCU[t6A]AGGGAGUAUGCGGUCA AAAGCUGCAUCCGGGG[m5U]YCGAAUCCCCGCCUCACCGCCA > tRNA | Ser | GGA | Escherichia coli | prokaryotic cytosol GGUGAGG[s4U]GUCCGAGU[Gm]GDDGAAGGAGCACGCCUGGAAAGYGUGUAUACGGCAAC GUAUCGGGGG[m5U]YCGAAUCCCCCCCUCACCGCCA > tRNA | Thr | GGU | Escherichia coli | prokaryotic cytosol GCUGAUAUGGCUCAGDDGGDAGAGCGCACCCUUGGU[m6t6A]AGGGUGAG[m7G]UCCCCAG [m5U]YCGACUCUGGGUAUCAGCACCA > tRNA | Thr | GGU | Escherichia coli | prokaryotic cytosol GCUGAUAUAGCUCAGDDGGDAGAGCGCACCCUUGGU[m6t6A]AGGGUGAG[m7G]UCGGCAG [m5U]YCGAAUCUGCCUAUCAGCACCA > tRNA | Trp | CCA | Escherichia coli | prokaryotic cytosol AGGGGCG[s4U]AGUUCAADDGGDAGAGCACCGGU[Cm]UCCA[ms2i6A]AACCGGGU[m7G]UUG GGAG[m5U]YCGAGUCUCUCCGCCCCUGCCA > tRNA | Tyr | GUA | Escherichia coli | prokaryotic cytosol GGUGGGG[s4U][s4U]CCCGAGC[Gm]GCCAAAGGGAGCAGACUQUA[ms2i6A]AYCUGCCGUCAU 
CGACUUCGAAGG[m5U]YCGAAUCCUUCCCCCACCACCA > tRNA | Tyr | GUA | Escherichia coli | prokaryotic cytosol GGUGGGG[s4U][s4U]CCCGAGC[Gm]GCCAAAGGGAGCAGACUQUA[ms2i6A]AYCUGCCGUCAC AGACUUCGAAGG[m5U]YCGAAUCCUUCCCCCACCACCA > tRNA | Val | UAC | Escherichia coli | prokaryotic cytosol GGGUGAU[s4U]AGCUCAGCDGGGAGAGCACCUCCCU[cmo5U]AC[m6A]AGGAGGGG[m7G]UCG GCGG[m5U]YCGAUCCCGUCAUCACCCACCA > tRNA | Val | GAC | Escherichia coli | prokaryotic cytosol GCGUCCG[s4U]AGCUCAGDDGGDDAGAGCACCACCUUGACAUGGUGGGG[m7G][acp3U]CGGU GG[m5U]YCGAGUCCACUCGGACGCACCA > tRNA | Val | GAC | Escherichia coli | prokaryotic cytosol GCGUUCA[s4U]AGCUCAGDDGGDDAGAGCACCACCUUGACAUGGUGGGG[m7G][acp3U]CGUU GG[m5U]YCGAGUCCAAUUGAACGCACCA > tRNA | Ini | CAU | Escherichia coli | prokaryotic cytosol


CGCGGGG[s4U]GGAGCAGCCUGGDAGCUCGUCGGG[Cm]UCAUAACCCGAAG[m7G]UCGUCG G[m5U]YCAAAUCCGGCCCCCGCAACCA > tRNA | Ini | CAU | Escherichia coli | prokaryotic cytosol CGCGGGG[s4U]GGAGCAGCCUGGDAGCUCGUCGGG[Cm]UCAUAACCCGAAGAUCGUCGG [m5U]YCAAAUCCGGCCCCCGCAACCA > tRNA | Ala | UGC | Escherichia coli | prokaryotic cytosol GGGGCUAUAGCUCAGCDGGGAGAGCGCCUGCUU[cmo5U]GCACGCAGGAG[m7G]UCUGCGG [m5U]YCGAUCCCGCAUAGCUCCACCA > tRNA | Ala | GGC | Escherichia coli | prokaryotic cytosol GGGGCUAUAGCUCAGCDGGGAGAGCGCUUGCAUGGCAUGCAAGAG[m7G]UCAGCGG[m5U] YCGAUCCCGCUUAGCUCCACCA > tRNA | Arg | ACG | Escherichia coli | prokaryotic cytosol GCAUCCG[s4U]AGCUCAGCDGGADAGAGUACUCGG[s2C]UICG[m2A]ACCGAGCG[m7G][acp3U] CGGAGG[m5U]YCGAAUCCUCCCGGAUGCACCA > tRNA | Arg | CCG | Escherichia coli | prokaryotic cytosol GCGCCCGUAGCUCAGCDGGADAGAGCGCUGCC[s2C]UCCG[m1G]AGGCAGAG[m7G]UCUCAG G[m5U]YCGAAUCCUGUCGGGCGCGCCA > tRNA | Arg | UCU | Escherichia coli | prokaryotic cytosol GCGCCCUUAGCUCAGUUGGAUAGAGCAACGAC[s2C]U[mnm5U]CU[t6A]AGYCGUGGGCCGCA GG[m5U]YCGAAUCCUGCAGGGCGCGCCA > tRNA | Asn | GUU | Escherichia coli | prokaryotic cytosol UCCUCUG[s4U]AGUUCAGDCGGDAGAACGGCGGACUQUU[t6A]AYCCGUAU[m7G]UCACUGG [m5U]YCGAGUCCAGUCAGAGGAGCCA > tRNA | Asp | GUC | Escherichia coli | prokaryotic cytosol GGAGCGG[s4U]AGUUCAGDCGGDDAGAAUACCUGCCU[gluQ]UC[m2A]CGCAGGGG[m7G]UCG CGGG[m5U]YCGAGUCCCGYCCGUUCCGCCA > tRNA | Cys | GCA | Escherichia coli | prokaryotic cytosol GGCGCGU[s4U]AACAAAGCGGDDAUGUAGCGGAYUGCA[ms2i6A]AYCCGUCUAGUCCGG [m5U]YCGACUCCGGAACGCGCCUCCA > tRNA | Gln | UUG | Escherichia coli | prokaryotic cytosol UGGGGUA[s4U]CGCCAAGC[Gm]GDAAGGCACCGGU[Um]U[cmnm5s2U]UG[m2A]YACCGGCAU UCCCUGG[m5U]YCGAAUCCAGGUACCCCAGCCA > tRNA | Gln | CUG | Escherichia coli | prokaryotic cytosol UGGGGUA[s4U]CGCCAAGC[Gm]GDAAGGCACCGGA[Um]UCUG[m2A]YYCCGGCAUUCCGAG G[m5U]YCGAAUCCUCGUACCCCAGCCA > tRNA | Glu | UUC | Escherichia coli | prokaryotic cytosol GUCCCCUUCGUCYAGAGGCCCAGGACACCGCCCU[mnm5s2U]UC[m2A]CGGCGGUAACAGGG G[m5U]YCGAAUCCCCUAGGGGACGCCA > tRNA | Gly | CCC | Escherichia coli | 
prokaryotic cytosol GCGGGCG[s4U]AGUUCAAUGGDAGAACGAGAGCUUCCCAAGCUCUAUACGAGGG[m5U]YCG AUUCCCUUCGCCCGCUCCA > tRNA | Gly | UCC | Escherichia coli | prokaryotic cytosol GCGGGCAUCGUAUAAUGGCUAUUACCUCAGCCU[mnm5U]CCAAGCUGAUGAUGCGGG[m5U] YCGAUUCCCGCUGCCCGCUCCA > tRNA | Gly | GCC | Escherichia coli | prokaryotic cytosol GCGGGAAUAGCUCAGDDGGDAGAGCACGACCUUGCCAAGGUCGGG[m7G]UCGCGAG[m5U] YCGAGUCUCGUUUCCCGCUCCA > tRNA | His | GUG | Escherichia coli | prokaryotic cytosol GGUGGCUA[s4U]AGCUCAGDDGGDAGAGCCCUGGAUUQUG[m2A]YYCCAGUU[m7G]UCGUG GG[m5U]YCGAAUCCCAUUAGCCACCCCA > tRNA | Ile | GAU | Escherichia coli | prokaryotic cytosol


AGGCUUGUAGCUCAGGDGGDDAGAGCGCACCCCUGAU[t6A]AGGGUGAG[m7G][acp3U]CGGU GG[m5U]YCAAGUCCACYCAGGCCUACCA > tRNA | Ile | CAU | Escherichia coli | prokaryotic cytosol GGCCCCU[s4U]AGCUCAGU[Gm]GDDAGAGCAGGCGACU[k2C]AU[t6A]AYCGCUUG[m7G] [acp3U]CGCUGG[m5U]YCAAGUCCAGCAGGGGCCACCA > tRNA | Leu | CAG | Escherichia coli | prokaryotic cytosol GCGAAGGUGGCGGAADD[Gm]GDAGACGCGCUAGCUUCAG[m1G]YGYUAGUGUCCUUACGG ACGUGGGGG[m5U]YCAAGUCCCCCCCCUCGCACCA > tRNA | Leu | GAG | Escherichia coli | prokaryotic cytosol GCCGAGGUGGUGGAADD[Gm]GDAGACACGCUACCUUGAG[m1G]YGGUAGUGCCCAAUAGG GCUUACGGG[m5U]YCAAGUCCCGUCCUCGGUACCA > tRNA | Leu | UAA | Escherichia coli | prokaryotic cytosol GCCCGGA[s4U]GGUGGAADC[Gm]GDAGACACAAGGGAYU[cmnm5Um]AA[ms2i6A]AYCCCUCG GCGUUCGCGCUGUGCGGG[m5U]YCAAGUCCCGCUCCGGGUACCA > tRNA | Leu | CAA | Escherichia coli | prokaryotic cytosol GCCGAAG[s4U]GGCGAAADC[Gm]GDAGACGCAGUUGAYU[Cm]AA[ms2i6A]AYCAACCGUAG AAAUACGUGCCGG[m5U]YCGAGUCCGGCCUUCGGCACCA > tRNA | Lys | UUU | Escherichia coli | prokaryotic cytosol GGGUCGUUAGCUCAGDDGGDAGAGCAGUUGACU[mnm5s2U]UU[t6A]AYCAAUUG[m7G] [acp3U]CGCAGG[m5U]YCGAAUCCUGCACGACCCACCA

Figure 2.8 Representative example of sequence mapping with fixed modifications. The modifications are listed above the sequence, and matched digestion products, which arise from interpreted MS/MS data, are aligned under the sequence. The depth of each digestion product indicates how many times an MS/MS spectrum was found and interpreted in the data file.

When unmodified RNA sequence files are input, RAMM will align both unmodified and modified digestion products with differences highlighted. It should be noted that interpreted MS/MS spectra can be aligned to more than one sequence. For example, the MS/MS spectrum interpreted as CACCGp can align with four of the 40 tRNA sequences used here (tRNA-Gln(UUG), tRNA-Gln(CUG), tRNA-Ser(UGA) or tRNA-Trp(CCA)). The mapping function does not discriminate among these candidate sequences. Another feature implemented is that the number of MS/MS spectra that are interpreted and then mapped onto a particular sequence region is denoted to provide additional information on the MS/MS data quality. Based on the E. coli total tRNA sequence mapping result, the calculated sequence coverage is 63%. While only RNase T1 digested tRNAs were analyzed here, a number of experimental options exist to increase sequence coverage 22.
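The multiple-alignment behavior and the coverage calculation described above can be illustrated with a minimal Python sketch. It is not the RAMM implementation: the function names and the toy sequences are hypothetical, modifications are ignored, and RNase T1 is assumed to cleave after every G with no missed cleavages.

```python
def rnase_t1_digest(sequence):
    """Cleave 3' of every G (RNase T1 specificity, no missed cleavages).

    Returns a list of (start_index, product) tuples with 0-based starts.
    """
    products, start = [], 0
    for i, base in enumerate(sequence):
        if base == "G":
            products.append((start, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):  # 3'-terminal product that does not end in G
        products.append((start, sequence[start:]))
    return products


def map_product(oligo, sequences):
    """Return {sequence name: [start positions]} for every sequence whose
    in-silico T1 digest contains the interpreted oligonucleotide."""
    hits = {}
    for name, seq in sequences.items():
        starts = [s for s, p in rnase_t1_digest(seq) if p == oligo]
        if starts:
            hits[name] = starts
    return hits


def sequence_coverage(sequence, matched_oligos):
    """Fraction of nucleotides that fall inside a matched digestion product."""
    covered = [False] * len(sequence)
    for start, product in rnase_t1_digest(sequence):
        if product in matched_oligos:
            covered[start:start + len(product)] = [True] * len(product)
    return sum(covered) / len(sequence)


# Hypothetical toy sequences (not taken from Table 2.6).
toy_sequences = {"toy_1": "AGCACCGGU", "toy_2": "UUGCACCGA", "toy_3": "AGCUCAG"}
print(map_product("CACCG", toy_sequences))  # {'toy_1': [2], 'toy_2': [3]}
print(round(sequence_coverage("AGCACCGGU", {"CACCG"}), 3))  # 0.556
```

In this sketch one interpreted product maps to two of the three toy sequences, which mirrors the ambiguity noted for CACCGp; resolving it requires additional evidence, such as products that are unique to one sequence.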

S. griseus 16S rRNA and 23S rRNA Sequence Mapping

RAMM was next used to annotate modifications from rRNA digests. Unlike tRNAs, rRNAs have fewer types of modifications with pseudouridine and methylations being the most common. To test the ability of RAMM to identify methylations, S. griseus 16S (1570 nt) and 23S (3208 nt) rRNAs (Table 2.7) were digested with RNase T1 and analyzed by LC-MS/MS. For data interpretation, the following modifications (or motifs) were selected: mA, Am, mCm, Cm, mC, mG, Gm, mU, Um and mUm.

Table 2.7 Gene sequences of S. griseus 16S rRNA and 23S rRNA. >gb|M76388.1|STMDRNA:1079-2606 S.griseus 16S rRNA gene sequence CATTCACGGAGAGTTTGATCCTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAA GTCGAACGATGAAGCCTTTCGGGGTGGATTAGTGGCGAACGGGTGAGTAACACGTGGGCAA TCTGCCCTTCACTCTGGGACAAGCCCTGGAAACGGGGTCTAATACCGGATAACACTCTGTCC CGCATGGGACGGGGTTAAAAGCTCCGGCGGTGAAGGATGAGCCCGCGGCCTATCAGCTTGT TGGTGGGGTAATGGCCTACCAAGGCGACGACGGGTAGCCGGCCTGAGAGGGCGACCGGCCA CACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAA TGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGGGATGACGGCCTTCGGGTTGTAAACCT CTTTCAGCAGGGAAGAAGCGAGAGTGACGGTACCTGCAGAAGAAGCGCCGGCTAACTACGT GCCAGCAGCCGCGGTAATACGTAGGGCGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGAG CTCGTAGGCGGCTTGTCACGTCGGATGTGAAAGCCCGGGGCTTAACCCCGGGTCTGCATTCG ATACGGGCTAGCTAGAGTGTGGTAGGGGAGATCGGAATTCCTGGTGTAGCGGTGAAATGCG CAGATATCAGGAGGAACACCGGTGGCGAAGGCGGATCTCTGGGCCATTACTGACGCTGAGG AGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGTTGG GAACTAGGTGTTGGCGACATTCCACGTCGTCGGTGCCGCAGCTAACGCATTAAGTTCCCCGC CTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCAGC GGAGCATGTGGCTTAATTCGACGCAACGCGAAGAACCTTACCAAGGCTTGACATATACCGG AAAGCATCAGAGATGGTGCCCCCCTTGTGGTCGGTATACAGGTGGTGCATGGCTGTCGTCAG CTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTTCTGTGTTGCCA


GCATGCCTTCGGGGTGATGGGGACTCACAGGAGACTGCCGGGGTCAACTCGGAGGAAGGTG GGGACGACGTCAAGTCATCATGCCCCTTATGTCTTGGGCTGCACACGTGCTACAATGGCCGG TACAATGAGCTGCGATGCGCGAGGCGGAGCGAATCTCAAAAAGCCGGTCTCAGTTCGGATT GGGGTCTGCAACTCGACCCCATGAAGTCGGAGTTGCTAGTAATCGCAGATCAGCATTGCTGC GGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACGTCACGAAAGTCGGTAACACCC GAAGCCGGTGGCCCAACCCCTTGTGGGAGGGAGCTGTCGAAGGTGGGACTGGCGATTGGGA CGAAGTCGTAACAAGGTAGCCGTACCGGAAGGTGCGGCTGGATCACCTCCTTTCT >gb|M76388.1|STMDRNA:2911-6030 S.griseus 23S rRNA gene sequence GGCCAAGTTTTTAAGGGCGCACGGTGGATGCCTTGGCACCAGGAACCGATGAAGGACGTGG GAGGCCACGATAGTCCCCGGGGAGCTGTCAACCAAGCTTTGATCCGGGGGTTTCCGAATGG GGAAACCCGGCAGTCGTCATGGGCTGTCACCCGCTGCTGAACACATAGGCAGTGTGGAGGG AACGAGGGGAAGTGAAACATCTCAGTACCCTCAGGAAGAGAAAACAACCGTGATTCCGGGA GTAGTGGCGAGCGAAACTGGATGAGGCCAAACCGTA TGCGTGTGATACCCGGCAGTTGCGCATGCGGGTTGTGGGATCTCTCTTTCACAGTCTGCCGG CTGTGAGGCGAGTCAGAAACCGTTGATGTAGGCGAAGGACATGCGAAAGGTCCGGCGTAGA GGGTAAGACCCCCGTAGCTGAAACATTGACGGCTCGTTTGAGAGACACCCAAGTAGCACGG GGCCCGAGAAATCCCGTGTGAATCTGGCGGGACCACCCGCTAAGCCTAAATATTCCCTGGTG ACCGATAGCGGATAGTACCGTGAGGGAATGGTGAAAAGTACCGCGGGAGCGGAGTGAAAT AGTACCTGAAACCGTGTGCCTACAAGCCGTGGGAGCGTCGCTGGCAGCACTTGTGCTGTCAG TCGTGACTGCGTGCCTTTTGAAGAATGAGCCTGCGAGTTAGCGGTGTGTAGCGAGGTTAACC CGTGTGGGGAAGCCGTAGCGAAAGCGAGTCCGAACAGGGCGGTTGAGTTGCACGCTCTAGA CCCGAAGCGGAGTGATCTAGCCATGGGCAGGTTGAAGCGGAGGTAAGACTTCGTGGAGGAC CGAACCACCAGGGTTGAAAACCTGGGGGATGACCTGTGGTTAGGGGTGAAAGGCCAATCAA ACTCCGTGATAGCTGGTTCTCCCCGAAATGCATTTAGGTGCAGCGTCGTGTGTTTCTTGCCGG AGGTAGAGCACTGGATAGGCGATGGGCCCTACCGGGTTACTGACCTTAGCCAAACTCCGAA TGCCGGTAAGTGAGAGCACGGCAGTGAGACTGTGGGGGATAAGCTCCATGGTCGAGAGGGA AACAGCCCAGAGCATCGACTAAGGCCCCTAAGCGTACGCTAAGTGGGAAAGGATGTGGAGT CGCAGAGACAACCAGGAGGTTGGCTTAGAAGCAGCCACCCTTGAAAGAGTGCGTAATAGCT CACTGGTCAAGTGATTCCGCGCCGACAATGTAGCGGGGCTCAAGCGTACCGCCGAAGTCGT GTCATTCGTACATGTATCCCCAACGGGAGTACGGATGGGTAGGGGAGCGTCGTGTGCCGGGT GAAGCAGCCGCGGAAGCGAGTTGTGGACGGTTCACGAGTGAGAATGCAGGCATGAGTAGCG ATACACACGTGAGAAACGTGTGCGCCGATTGACTAAGGGTTCCTGGGTCAAGCTGATCTGCC 
CAGGGTAAGTCGGGACCTAAGGCGAGGCCGACAGGCGTAGTCGATGGACAACCGGTTGATA TTCGGTACCCGCTTTGAAACGCCCAGTACTGAATCAGGCGATGCTAAGTCCGTGAAGCCGGC CCGATCTCTTCGGAGTTGAGGGTAGTGGTGGAGCCGATGAACCAGACTTGTAGTAGGTAAGC GATGGGGTGACGCAGGAAGGTAGTCCAGCCCGGGCGGTGGTTGTCCCGGGGTAAGGGTGTA GGACGCACGGTAGGCAAATCCGTCGTGCATATAAGTCTGAGACCTGATGCCGAGCCGATTGT GGTGAAGTGGATGATCCTATGCTGTCGAGAAAAGCCTCTAGCGAGTTTCATGGCGGCCCGTA CCCTAAACCGACTCAGGTGGTCAGGTAGAGAATACCGAGGCGTTCGGGTGAACTATGGTTA AGGAACTCGGCAAAATGCCCCCGTAACTTCGGGAGAAGGGGGGCCATCACTGGTGATCCGA TTTACTCGGTGAGCTGGGGGTGGCCGCAGAGACCAGCGAGAAGCGACTGTTTACTAAAAAC ACAGGTCCGTGCGAAGCCGTAAGGCGATGTATACGAACTGACGCCTGCCCGGTGCTGGAAC GTTAAGGGGACCGGTTAGCTGACTTTCGGGTCGGCGAAGCTGAGAACTTAAGCGCCAGTAA ACGGCGGTGGTAACTATAACCATCCTAAGGTAGCGAAATTCCTTGTCGGGTAAGTTCCGACC TGCACGAATGGCGTAACGACTTCTCGACTGTCTCAACCATAGGCCCGGTGAAATTGCACTAC GAGTAAAGATGCTCGTTTCGCGCAGCAGGACGGAAAGACCCCGGGACCTTTACTATAGTTTG ATATTGGTGTTCGGTTCGGCTTGTGTAGGATAGGTGGGAGACTTTGAAGCGGCCACGCCAGT GGTTGTGGAGTCGTCGTTGAAATACCACTCTGGTCGTGCTGGATGTCTAACCTGGGTCCGTG ATCCGGATCAGGGACAGTGTCTGATGGGTAGTTTAACTGGGGCGGTTGCCTCCTAAAGAGTA ACGGAGGCGCCCAAAGGTTCCCTCAGCCTGGTTGGCAATCAGGTGTTGAGTGTAAGTGCACA AGGGAGCTTGACTGTGAGACCGACGGGTCGAGCAGGGACGAAAGTCGGGACTAGTGATCCG


GCAGTGGCTTGTGGAAGCGCTGTCGCTCAACGGATAAAAGGTACCCCGGGGATAACAGGCT GATCTTCCCCAAGAGTCCATATCGACGGGATGGTTTGGCACCTCGATGTCGGCTCGTCGCAT CCTGGGGCTGGAGTCGGTCCCAAGGGTTGGGCTGTTCGCCCATTAAAGCGGTACGCGAGCTG GGTTTAGAACGTCGTGAGACAGTTCGGTCCCTATCCGCTGTGCGCGTAGGAATATTGAGAAG GGCTGTCCCTAGTACGAGAGGACCGGGACGGACGAACCTCTGGTGTGCCAGTTGTCCTGCCA AGGGCATGGCTGGTTGGCTACGTTCGGAAAGGATAACCGCTGAAAGCATCTAAGCGGGAAG CCTGCTTCGAGATGAGTATTCCCACCCTCTTGAAGGGTTAAGGCTCCCAGTAGACGACTGGG TTGATAGGCCAGATGTGGAAGCCCGGTAACGGGTGGAGCTGACTGGTACTAATAGGCCGAG GGCTTGTCCTC

Table 2.8 lists the modified oligonucleotides that were identified in the S. griseus 16S and 23S rRNA samples. A methyl group on the base or sugar increases the nucleoside mass by 14 Da, and RAMM is able to distinguish whether the methyl group is on the base or the sugar by using a-B ions and base loss ions. As an example of data interpretation quality, the RNase T1 digestion product UC[mA]CGp from 16S rRNA was identified with a P-score of 100.0, which indicates that all the a-B/w-type and c/y-type ions were found in the spectral data. Another example is the digestion product UC[Am]CGp, which was interpreted with a P-score of 92.99, reflecting the missing a3-B ion and the missing Am base loss from the molecular ion. Figure 2.9 provides detailed information surrounding these interpretations.

Table 2.8 The oligonucleotides with modifications found by RAMM in S. griseus 16S and 23S rRNA samples.

S. griseus 16S rRNA
Oligonucleotide            P score    Dot product
UC[mA]Gp                   90.858     0.762
AC[mCm]Gp                  90.858     0.914
CC[mG]CGp                  74.456     0.982
[mC]A[mA]CGp               92.993     0.899
U[mA]CCGp                  92.993     0.954
CCC[Am]Gp                  87.195     0.946
UC[mA]CGp                  100.0      0.92
AU[mA]CGp                  92.993     0.809
AC[mA]CGp                  100.000    0.766
[mCm]CCGp                  90.858     0.659
AA[Am]CGp                  100.000    0.829
CAC[Am]CGp                 94.272     0.939
C[mA]UUCGp                 94.272     0.884
UCA[mCm]Gp                 92.993     0.836
CUA[mA]CGp                 89.445     0.899
UA[Am]UCGp                 85.216     0.839
UACA[Am]UGp                80.400     0.827
AUA[Um]CAGp                84.551     0.842
CCAC[Am]CUGp               78.382     0.856
CUAAC[Um]ACGp              69.207     0.893
CAACCCUU[mGm]UUCUGp        68.542     0.909

S. griseus 23S rRNA
Oligonucleotide            P score    Dot product
UU[mA]Gp                   90.858     0.750
U[mA]UGp                   100.000    0.810
UC[Am]Gp                   100.000    0.601
AC[mCm]Gp                  81.521     0.955
U[mA]CGp                   100.000    0.629
U[mA]CUGp                  92.993     0.788
CCA[mCm]Gp                 82.231     0.847
[Um]A[mA]CGp               92.993     0.812
CU[mA]CGp                  92.993     0.957
U[mA]CCGp                  100.000    0.968
CUU[mA]Gp                  82.231     0.778
CCC[Am]Gp                  87.195     0.974
CA[Gm]UGp                  92.993     0.902
UCC[Am]Cp                  92.993     0.910
AC[mC]CGp                  92.993     0.941
CAC[Am]AGp                 89.445     0.815
ACU[mC]AGp                 85.216     0.780
ACUC[mA]Gp                 89.445     0.855
ACA[Am]UGp                 100.000    0.836
CCAA[Am]CCGp               86.428     0.894
AAA[Um]UCCUUGp             84.026     0.832
UAACUAUAA[mC]CAUCCUAAGp    70.269     0.861

Figure 2.9 Oligonucleotide UC[mA]CGp (a) has a relatively higher score than UC[Am]CGp (b). For UC[mA]CGp all the product ions were found, whereas for UC[Am]CGp the a3-B ion and the M-BH3 ion were missing, which indicates that the modification on adenine is more likely to be m1A.


2.4 Conclusions

In this chapter, I describe a new program, RAMM, for the stand-alone analysis of LC-MS/MS data of oligonucleotides. RAMM was developed with the particular goals of allowing modified nucleosides to be annotated within MS/MS data and of enabling RNA modification mapping. For the former, RAMM uses a two-stage scoring function to improve the accuracy of MS/MS sequence annotation. Besides c-, y-, w-, and a-B-type ions, RAMM can also account for neutral and base loss ions that are often encountered in oligoribonucleotide MS/MS data. Although the program is pre-configured for the known enzyme-induced RNA modification motifs, a user-controlled feature allows custom modifications (e.g., synthetic modified RNA) to be entered to handle other modifications of interest.


Chapter 3

Transfer RNA Modification Profiles and Codon Decoding Strategies in Methanocaldococcus jannaschii

This chapter has been submitted as a manuscript with co-authors Manasses Jora, Carlos G. Acevedo-Rocha, Lennart Randau, Valérie de Crécy-Lagard, and Patrick A. Limbach.

3.1 Introduction

Our understanding of tRNA modification profiles (i.e., the specific modifications and locations in each tRNA) in archaea is limited. To date, the only archaeal tRNA sequences whose modification profiles are almost completely characterized are from Haloferax volcanii 35-36. Whether this organism is a good representative of how archaea have adapted their tRNA-based decoding machinery can only be determined by compiling information from additional archaeal tRNAs. In this chapter, I have used LC-MS/MS and modification mapping to conduct an in-depth examination of the post-transcriptionally modified nucleosides in the anticodon loop of tRNAs from Methanocaldococcus jannaschii (formerly Methanococcus jannaschii). These anticodon modification profiles are compared against those found in H. volcanii, and insights into archaeal decoding strategies are identified in comparison to those used by bacteria and single-celled eukarya.

3.2 Experimental

3.2.1 Materials

M. jannaschii (DSM 2661) cells were a kind gift of Karl O. Stetter and Michael Thomm. The cells were grown anaerobically in a 300 L bioreactor on H2 and CO2 (80:20) at 85 °C under a pressurized headspace of 200 kPa at the Archaeenzentrum of the Universität Regensburg, Germany. The cells were harvested at the midpoint of their exponential growth phase and subsequently stored at −80 °C. Total RNA was isolated from 3.2 g of cells with TRIzol reagent (Invitrogen) according to the manufacturer’s instructions. A Qiagen-Tip 2500 was used to purify tRNAs (approximately 4 mg) from total RNA 60. This part of the work was done by Dr. Carlos G. Acevedo-Rocha from Dr. Lennart Randau’s lab at the Max Planck Institute for Terrestrial Microbiology.

3.2.2 Total Nucleosides Preparation and LC-MS/MS Analysis

Total tRNA of M. jannaschii was denatured by heating at 100 °C for 3 min and then rapidly placed in an ice water slush. The denatured total tRNA was added to 1/10 volume of 0.1 M ammonium acetate and mixed. Next, 0.1 unit (U) of nuclease P1 was added for every 1 µg of total tRNA, and the mixture was incubated at 45 °C for 120 min. Then 0.0001 U of snake venom phosphodiesterase and 0.003 U of Antarctic phosphatase were added per µg of tRNA, and this mixture was incubated at 37 °C for 120 min. The resulting sample of hydrolyzed nucleosides was dried in a vacuum concentrator. The nucleoside digest was reconstituted in mobile phase A (MPA, 5 mM ammonium acetate, pH = 5.3) and separated on a Waters HSS T3 column (100 Å, 1.8 µm, 2.1 × 50 mm). Mass spectral data were obtained on a Thermo Scientific Orbitrap Fusion Lumos Tribrid mass spectrometer, following the chromatographic and mass spectrometry conditions previously reported 61. The nucleoside data were acquired by Manasses Jora.

3.2.3 Ribonuclease Digestion and LC-MS/MS Analysis

Total tRNA of M. jannaschii was denatured as described above. RNase T1, RNase A, RNase U2, RNase MC1 and Cusativin were used to digest the total tRNA mixture into smaller oligonucleotides amenable to LC-MS/MS RNA modification mapping. For each RNase digest, 4 µg of total tRNA was prepared in 1/3 volume of 200 mM ammonium acetate. For RNase T1, 50 U of enzyme was added for every 1 µg of total tRNA, and the mixture was incubated at 37 °C for 2 h. For RNase A, 0.001 U of enzyme was added for every 1 µg of total tRNA, and the mixture was incubated at 37 °C for 30 min. Two micrograms of RNase U2 were added for every 1 µg of total tRNA, and the mixture was incubated at 65 °C for 30 min 27. One and a half micrograms of RNase MC1 were added for every 1 µg of total tRNA, and the mixture was incubated at 37 °C for 120 min 26. One microgram of Cusativin was added for every 1 µg of total tRNA, and the mixture was incubated at 37 °C for 60 min 25.

The RNase digestion products were lyophilized and rehydrated in MPA (200 mM HFIP, 8.15 mM TEA, pH = 7.0), and then separated on a Waters XBridge C18 column (3.5 µm, 1 × 150 mm) at a flow rate of 60 µL/min and a temperature of 50 °C. The gradient ran from 5% MPB (50% MPA, 50% methanol, v:v, pH = 7.0) to 20% MPB in 5 min and from 20% MPB to 95% MPB in 43 min, followed by a hold at 95% MPB for 5 min and re-equilibration of the column for 15 min at 5% MPB. Mass spectrometry analysis was conducted using a Thermo LTQ-XL mass spectrometer in negative polarity with a capillary temperature of 275 °C, spray voltage of 4 kV, capillary voltage of -100 V, and sheath gas, auxiliary gas and sweep gas at 40, 10, and 10 arbitrary units, respectively. The sample was analyzed over an m/z range from 500 to 2000 for the full scan, followed by four data-dependent acquisition scans.
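The gradient program for the oligonucleotide separation can be written as a piecewise-linear function of time, which is convenient when relating a retention time to the mobile phase composition. The following Python sketch is illustrative only: the function name is hypothetical, and the return from 95% to 5% MPB at the end of the hold is assumed to be an instantaneous step.

```python
# (time in min, % MPB) breakpoints taken from the method description:
# 5% -> 20% in 5 min, 20% -> 95% in 43 min, hold at 95% for 5 min.
GRADIENT = [(0.0, 5.0), (5.0, 20.0), (48.0, 95.0), (53.0, 95.0)]
REEQUILIBRATION_PERCENT = 5.0
RUN_END_MIN = 68.0  # 53 min plus 15 min re-equilibration


def percent_mpb(t_min):
    """Return the programmed % MPB at time t_min (minutes after injection)."""
    if not 0.0 <= t_min <= RUN_END_MIN:
        raise ValueError("time outside the programmed run")
    if t_min > GRADIENT[-1][0]:
        # Assumption: the composition steps back to 5% MPB immediately.
        return REEQUILIBRATION_PERCENT
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_methanol(t_min):
    """MPB is 50% methanol (v:v), so methanol content is half of % MPB."""
    return 0.5 * percent_mpb(t_min)


for t in (0, 2.5, 26.5, 50, 60):
    print(t, percent_mpb(t), percent_methanol(t))
```

Under these assumptions the total programmed run time is 68 min, and the methanol content never exceeds 47.5% because MPB itself contains only 50% methanol.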

3.2.4 Data Analysis

The tRNA sequences of M. jannaschii were obtained from the GtRNAdb (http://gtrnadb.ucsc.edu/) 62. The only isoleucine tRNA listed in the GtRNAdb was tRNA-Ile(GAU). As for other archaea 11, the tRNA-Met(CAU) sequence in the GtRNAdb is most likely the tRNA-Ile(CAU) sequence and was used for all data interpretation and annotation steps. The complete list of genomic tRNA sequences used for modification mapping is given in Table 3.1.

Table 3.1 The tRNA sequences of M. Jannaschii. >Methanocaldococcus Jannaschii chr.trna34-AlaGGC GGGCUGGUAGCUCAGACUGGGAGAGCGCCGCAUUGGCUGUGCGGAGGCCGCGGG UUCAAAUCCCGCCCAGUCCACCA >Methanocaldococcus Jannaschii chr.trna15-AlaUGC GGGCCCGUAGCUCAGCUGGGAGAGCGCCGGCCUUGCAAGCCGGAGGCCGUGGGUU CAAAUCCCACCGGGUCCA >Methanocaldococcus Jannaschii chr.trna28-ArgGCG GCCCGGGUCGCCUAGCCAGGAUAGGGCGCUGGCCUGCGGAGCCAGUUUUUUCAGG GGUUCAAAUCCCCUCCCGGGCG >Methanocaldococcus Jannaschii chr.trna17-ArgUCG GCCCUGGUGGUGUAGAGGAUAUCACGGGGGACUUCGGAUCCCCAAACCCGGGUUC AAAUCCCGGCCAGGGCCUCA >Methanocaldococcus Jannaschii chr.trna7-ArgUCU GGACCCGUAGCCUAGCCUGGAUAGGGCACCGGCCUUCUAAGCCGGGGGUCGGGGG UUCAAAUCCCCUCGGGUCCGCCA >Methanocaldococcus Jannaschii chr.trna10-AsnGUU GCCCCCAUAGCUCAGAUGGUAGAGCGACGGACUGUUAAUCCGUAGGUCGCAGGUU CGAGUCCUGCUGGGGGCGCCA >Methanocaldococcus Jannaschii chr.trna22-AspGUC GCCCUGGUGGUGUAGCCCGGCCUAUCAUACGGGACUGUCACUCCCGUGACUCGGG UUCAAAUCCCGGCCAGGGCGCCA >Methanocaldococcus Jannaschii chr.trna31-CysGCA GCCGGGGUAGUCUAGGGGCUAGGCAGCGGACUGCAGAUCCGCCUUACGUGGGUUC AAAUCCCACCCCCGGCUCCA >Methanocaldococcus Jannaschii chr.trna16-GlnUUG AGCCCGGUGGUGUAGUGGCCUAUCAUCCGGGGCUUUGGACCCCGGGACCGCGGUU CGAAUCCGCGCCGGGCUACCA >Methanocaldococcus Jannaschii chr.trna8-GluUUC GCUCCGGUGGUGUAGUCCGGCCAAUCAUGCGGGCCUUUCGAGCCCGCGACCCGGG UUCAAAUCCCGGCCGGAGCAUCA >Methanocaldococcus Jannaschii chr.trna24-GlyGCC GCGGCCUUGGUGUAGCCUGGUAACACACGGGCCUGCCACGCCCGGACCCCGGGUU CAAAUCCCGGAGGCCGCACCA >Methanocaldococcus Jannaschii chr.trna25-GlyUCC GCGCCGGUGGUGUAGCCUGGUAUCACUCUGGCCUUCCAAGCCAGCGACCCGGGUU CAAAUCCCGGCCGGCGCACCA


>Methanocaldococcus Jannaschii chr.trna14-HisGUG GCCGGGGUGGGGUAGUGGCCAUCCUGGGGGACUGUGGAUCCCCUGACCCGGGUUC AAUUCCCGGUCCCGGCC >Methanocaldococcus Jannaschii chr.trna29-IleGAU AGGGCGGUGGCUCAGCCUGGUUAGAGUGCUCGGCUGAUAACCGAGUGGUCCGGG GUUCGAAUCCCCGCCGCCCUACCA >Methanocaldococcus Jannaschii chr.trna11-IleCAU GGGCCCGUAGCUCAGCCUGGUCAGAGCGCUCGGCUCAUAACCGAGUGGUCAAGGG UUCAAAUCCCUUCGGGCCCACCA >Methanocaldococcus Jannaschii chr.trna1-LeuGAG GCGGGGGUCGCCAAGCCAGGUCAAAGGCGCCAGAUUGAGGGUCUGGUCCCGUAGG GGUUCGCGGGUUCAAAUCCCGUCCCCCGCACCA >Methanocaldococcus Jannaschii chr.trna5-LeuUAA GCAGGGGUCGCCAAGCCUGGCCAAAGGCGCCGGACUUAAGAUCCGGUCCCGUAGG GGUUCGGGGGUUCAAAUCCCCUCCCCUGCACCA >Methanocaldococcus Jannaschii chr.trna13-LeuUAG GCAGGGGUCGCCAAGCCUGGCCAAAGGCGCUGGGCCUAGGACCCAGUCCCGUAGG GGUUCCAGGGUUCAAAUCCCUGCCCCUGCACCA >Methanocaldococcus Jannaschii chr.trna21-LysUUU GGGCCCGUAGCUCAGUCUGGCAGAGCGCCUGGCUUUUAACCAGGUGGUCGAGGGU UCAAAUCCCUUCGGGCCCG >Methanocaldococcus Jannaschii chr.trna37-MetCAU GCCGAGGUGGCUUAGCUGGUUAUAGCGCCCGGCUCAUAACCGGGAGGUCGAGGG UUCGAAUCCCUCCCUCGGCACCA >Methanocaldococcus Jannaschii chr.trna33-MetCAU AGCGGGGUAGGGUAGCCAGGUCCAUCCCGCCGGGCUCAUAACCCGGAGAUCGGAG GUUCAAAUCCUCCCCCCGCUA >Methanocaldococcus Jannaschii chr.trna9-PheGAA GCCGCGGUAGUUCAGCCUGGGAGAACGCUGGACUGAAGAUCCAGUUGUCGGGUG UUCAAAUCACCCCCGCGGCACCA >Methanocaldococcus Jannaschii chr.trna4-ProGGG GGGGCCGUGGGGUAGCCUGGAUAUCCUGUGCGCUUGGGGGGCGUGCGACCCGGG UUCAAGUCCCGGCGGCCCCACCA >Methanocaldococcus Jannaschii chr.trna19-ProUGG GGGCCUGUGGGGUAGCCUGGUCUAUCCUUUGGGAUUUGGGAUCCUGAGACCCCA GUUCAAAUCUGGGCAGGCCCACCA >Methanocaldococcus Jannaschii chr.trna27-SerGCU GCGGGGGUGGCCCAGCCUGGUACGGCGUGGGACUGCUAAUCCCAUUGGGCAACGC CCAGCCCGGGUUCAAGUCCCGGCCCCCGCGUCA >Methanocaldococcus Jannaschii chr.trna3-SerGGA GCGGAGGUAGCCUAGCCCGGCCAAGGCGUGGGACUGGAGAUCCCAUGGGGCUUUG CCCCGCGGGGGUUCAAAUCCCCCCCUCCGCGCCA >Methanocaldococcus Jannaschii chr.trna6-SerUGA GCAGGGGUAGCCAAGCCAGGACUACGGCGCUGGACUUGAGAUCCAGUGGGGCUU UGCCCCGCCUGGGUUCAAAUCCCAGCCCCUGCGCCA >Methanocaldococcus 
Jannaschii chr.trna30-ThrGGU


GCCUCGGUAGCUCAGCCUGGCGGAGCGCCUGCUUGGUAAGCAGGAGGUCGCGGGU UCAAACCCCGCCCGAGGCUCCA

>Methanocaldococcus Jannaschii chr.trna18-ThrUGU GCCUCGGUGGCUCAGCCUGGUAGAGCGCCUGACUUGUAAUCAGGUGGUCGGGGG UUCAAAUCCCCCCCGGGGCUCCA >Methanocaldococcus Jannaschii chr.trna26-TrpCCA GGGGGUGUGGUGUAGCCAGGUCUAUCAUCGGGGACUCCAGAUCCCUGGACCUGG GUUCAAAUCCCAGCACCCCCACCA >Methanocaldococcus Jannaschii chr.trna20-TyrGUA CCGGCGGUAGUUCAGCCUGGUAGAACGGCGGACUGUAGAUCCGCAUGUCGCUGGU UCAAAUCCGGCCCGCCGGACCA >Methanocaldococcus Jannaschii chr.trna36-ValCAC GGGCUCGUGGUCUAGUGGCUAUGACGCCGCCCUCACAAGGCGGUGGUCGCGGGUU CGAAUCCCGCCGAGCCCACCA >Methanocaldococcus Jannaschii chr.trna32-ValGAC GGGCUCGUGGUCUAGAUGGCUAUGAUGCCGCCCUGACACGGCGGUGGUCGGGAG UUCGAAUCUCCCCGAGCCCACCA >Methanocaldococcus Jannaschii chr.trna23-ValUAC GGGCCCAUGGUCUAGCUGGCUAUGACGUCGCCCUUACAAGGCGAAGGUCGCCGGU UCGAAUCCGGCUGGGCCCA

LC-MS/MS data from RNA modification mapping experiments were analyzed using the variable sequence position modification option of RNAModMapper 63. All the .RAW data files were converted to .MGF files by MSConvert from ProteoWizard, and the mass tolerances of both precursor and fragment ions were set to 1 Da. For RNase T1, the 3′ end of digestion products was set to phosphate, and the number of missed cleavages was set to 0. For RNase A, RNase U2, RNase MC1 and Cusativin, the 3′ end of digestion products was set to cyclic phosphate, and the number of missed cleavages was set to up to 4.

The P-score and dot product score thresholds were set to 70 and 0.8, respectively. Oligonucleotides having at least 80% of c and y ions identified were then used for mapping to the tRNA gene sequences. If a modification was present, the data were mapped onto the tRNA gene sequence only when the c or y ions necessary to localize the position of this modification in the digestion product were identified. When possible, signature digestion products (SDPs) 64-67 were used to establish the final localization of modifications within the anticodon stem loop of tRNAs.
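The acceptance criteria described above can be summarized as a simple filter. The Python sketch below is not part of RNAModMapper; the record fields, the function name and the example values are hypothetical, and the thresholds are assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass
class Identification:
    """One interpreted MS/MS spectrum (hypothetical record layout)."""
    oligo: str
    p_score: float
    dot_product: float
    cy_ions_found: int        # number of c and y ions assigned
    cy_ions_expected: int     # number of c and y ions predicted
    # True when the oligonucleotide is unmodified, or when the c/y ions
    # needed to localize its modification were observed.
    modification_localized: bool = True


def passes_filters(hit, min_p=70.0, min_dot=0.8, min_cy_fraction=0.8):
    """Apply the P-score, dot product, c/y ion and localization criteria."""
    if hit.cy_ions_expected == 0:
        return False
    cy_fraction = hit.cy_ions_found / hit.cy_ions_expected
    return (hit.p_score >= min_p
            and hit.dot_product >= min_dot
            and cy_fraction >= min_cy_fraction
            and hit.modification_localized)


# Hypothetical example values, for illustration only.
hits = [
    Identification("CU[X]AGp", 92.9, 0.91, 7, 8),
    Identification("AU[X]CGp", 85.0, 0.75, 8, 8),   # dot product too low
    Identification("UC[X]UGp", 90.0, 0.88, 5, 8),   # too few c/y ions
    Identification("CC[X]AGp", 95.0, 0.93, 8, 8, modification_localized=False),
]
accepted = [h.oligo for h in hits if passes_filters(h)]
print(accepted)  # ['CU[X]AGp']
```

Only identifications that pass every criterion would then be carried forward to mapping against the tRNA gene sequences.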

3.3 Results and Discussion

3.3.1 Modified Nucleosides in M. jannaschii Total tRNAs

A total of 30 post-transcriptionally modified nucleosides in M. jannaschii tRNAs were identified by LC-MS/MS analysis of total tRNA nucleoside digests (Table 3.2). The extracted ion chromatograms, MS and MS/MS spectral data of all 30 modified nucleosides are provided in Figures 3.1-3.30. Of these, 16 would be expected to be localized in the anticodon stem-loop. Five modified nucleosides (mnm5U, mnm5s2U, cnm5U, cnm5s2U, and agmatidine (C+)) are typically found at the wobble position (position 34). Eight modified nucleosides (m1G, imG-14, imG, mimG, t6A, ms2t6A, hn6A, and ms2hn6A) are known to be localized to position 37 of the anticodon loop. Three modified nucleosides, Um, s2C and Cm, are often found at position 32 of the anticodon loop.


Table 3.2. The detected post-transcriptional modifications in M. jannaschii. The abbreviations of modifications are from the MODOMICS database 3.

Nucleoside   RT (min)   M+H        CID fragments
m5C          4.76       258.1082   126.0661
Cm           6.62       258.1082   112.0506
s2C          3.66       260.0698   128.0278
C+           28.73      356.2037   224.1617
Ψ            1.51       245.0766   209.0556/179.0450/155.0450
m1Ψ          3.50       259.0923   223.0712/193.0607/169.0608
s4U          10.07      261.0537   129.0117
Um           12.34      259.0923   113.0346
cnm5U        6.81       284.0876   152.0454
cnm5s2U      17.97      300.0646   168.0226
mnm5U        1.90       288.1188   156.0767
mnm5s2U      5.07       304.0959   172.0538
Am           29.38      282.1194   136.0617
m1A          4.05       282.1194   150.0773
m6A          31.17      282.1194   150.0773
m1I          17.73      283.1034   151.0613
t6A          30.94      413.1413   281.0990
hn6A         32.44      427.1568   295.1147
ms2t6A       32.69      459.1291   327.0865
ms2hn6A      33.71      473.1445   341.1023
G+           29.76      325.1251   193.0831
Gm           17.34      298.1143   152.0566
m2G          20.26      298.1144   166.0724
m2,2G        29.20      312.1299   180.0878
m2,2Gm       32.57      326.1456   180.0878
m2Gm         31.27      312.1298   166.0722
m1G          18.30      298.1143   166.0723
imG-14       31.75      322.1143   190.0722
imG          32.39      336.1299   204.0878
mimG         34.35      350.1456   218.1035
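Several entries in Table 3.2 are isomers that share the same M+H value (e.g., m5C and Cm, or Am, m1A and m6A) and are distinguished in part by their fragment ions. As a minimal illustration, the Python sketch below takes the difference between M+H and the listed CID fragment for a few table entries and compares it with the neutral loss expected for ribose (C5H8O4) or 2′-O-methylribose (C6H10O4). The function name and the 0.005 Da tolerance are assumptions made for this example.

```python
# Monoisotopic masses of the elements involved (Da).
C, H, O = 12.0, 1.00782503, 15.99491462

RIBOSE_LOSS = 5 * C + 8 * H + 4 * O            # C5H8O4, about 132.0423
METHYLRIBOSE_LOSS = RIBOSE_LOSS + C + 2 * H    # C6H10O4, about 146.0579

# (M+H, CID fragment) pairs copied from Table 3.2.
TABLE_3_2 = {
    "m5C": (258.1082, 126.0661),
    "Cm": (258.1082, 112.0506),
    "Um": (259.0923, 113.0346),
    "Am": (282.1194, 136.0617),
    "m1A": (282.1194, 150.0773),
    "Gm": (298.1143, 152.0566),
    "m1G": (298.1143, 166.0723),
}


def sugar_from_neutral_loss(mh, fragment, tol=0.005):
    """Classify the sugar from the neutral loss between M+H and the base ion."""
    loss = mh - fragment
    if abs(loss - RIBOSE_LOSS) <= tol:
        return "ribose"
    if abs(loss - METHYLRIBOSE_LOSS) <= tol:
        return "2'-O-methylribose"
    return "other"


for name, (mh, frag) in TABLE_3_2.items():
    print(f"{name:4s} loss = {mh - frag:.4f}  ->  {sugar_from_neutral_loss(mh, frag)}")
```

With these values, the base-methylated nucleosides (m5C, m1A, m1G) show the ribose loss, whereas the 2′-O-methylated nucleosides (Cm, Um, Am, Gm) show the methylribose loss. Isomers that differ only in the position of a base methyl group, such as m1A and m6A, are not separated by this check and rely on retention time.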


Figure 3.1 XIC, MS and MS/MS data of m5C.
Figure 3.2 XIC, MS and MS/MS data of Cm.
Figure 3.3 XIC, MS and MS/MS data of s2C.
Figure 3.4 XIC, MS and MS/MS data of C+.
Figure 3.5 XIC, MS and MS/MS data of Ψ.
Figure 3.6 XIC, MS and MS/MS data of m1Ψ.
Figure 3.7 XIC, MS and MS/MS data of s4U.
Figure 3.8 XIC, MS and MS/MS data of Um.
Figure 3.9 XIC, MS and MS/MS data of cnm5U.
Figure 3.10 XIC, MS and MS/MS data of cnm5s2U.
Figure 3.11 XIC, MS and MS/MS data of mnm5U.
Figure 3.12 XIC, MS and MS/MS data of mnm5s2U.
Figure 3.13 XIC, MS and MS/MS data of Am.
Figure 3.14 XIC, MS and MS/MS data of m1A.
Figure 3.15 XIC, MS and MS/MS data of m6A.
Figure 3.16 XIC, MS and MS/MS data of m1I.
Figure 3.17 XIC, MS and MS/MS data of t6A.
Figure 3.18 XIC, MS and MS/MS data of ms2t6A.
Figure 3.19 XIC, MS and MS/MS data of hn6A.
Figure 3.20 XIC, MS and MS/MS data of ms2hn6A.
Figure 3.21 XIC, MS and MS/MS data of G+.
Figure 3.22 XIC, MS and MS/MS data of Gm.
Figure 3.23 XIC, MS and MS/MS data of m2G.
Figure 3.24 XIC, MS and MS/MS data of m2,2G.
Figure 3.25 XIC, MS and MS/MS data of m2,2Gm.
Figure 3.26 XIC, MS and MS/MS data of m2Gm.
Figure 3.27 XIC, MS and MS/MS data of m1G.
Figure 3.28 XIC, MS and MS/MS data of imG-14.
Figure 3.29 XIC, MS and MS/MS data of imG.
Figure 3.30 XIC, MS and MS/MS data of mimG.

The majority of these modified nucleosides have been characterized before, either in archaea or other organisms 3. However, several modified nucleosides were either unique or required additional confirmation. For example, cnm5U has recently been reported in archaea 34, and I report here both the presence of this position 34 modification and its 2-thiolated analog, cnm5s2U, which was not found in Haloarcula marismortui and has not been previously characterized. The identification of cnm5s2U was enabled by higher-energy collisional dissociation (HCD) LC-MS/MS of nucleosides 61. Figure 3.31 A is the HCD MS/MS spectrum obtained when m/z 284, corresponding to cnm5U, was dissociated. The peak at m/z 152.0455 corresponds to the modified nucleobase ion of cnm5U. The peak at m/z 125.0346 is a diagnostic base fragment ion that was observed previously 34. A similar HCD fragmentation pattern was seen when m/z 300, corresponding to cnm5s2U, was dissociated (Figure 3.31 B). Peaks at m/z 168.0232 and m/z 141.0123 were detected, which match the addition of sulfur in place of oxygen in the elemental formula, as compared against m/z 152.0455 and m/z 125.0346 for cnm5U.

Figure 3.31 HCD spectral data of (A) cnm5U and (B) cnm5s2U. A1 and B1 are the base ions, and A2 and B2 are the fragment ions of A1 and B1, respectively.
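The sulfur-for-oxygen relationship between the cnm5U and cnm5s2U fragment ions can be checked with simple arithmetic: replacing one oxygen with one sulfur should shift each ion by the monoisotopic mass difference between 32S and 16O (about 15.9772 Da). The Python sketch below applies this check to the m/z values quoted in the text; the function name and the 0.002 Da tolerance are assumptions made for this example.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MASS_32S = 31.97207
MASS_16O = 15.99491
S_MINUS_O = MASS_32S - MASS_16O  # about 15.97716


def is_thio_shift(mz_oxo, mz_thio, tol=0.002):
    """True if mz_thio - mz_oxo matches one O -> S substitution within tol."""
    return abs((mz_thio - mz_oxo) - S_MINUS_O) <= tol


# (2-oxo ion, 2-thio ion) pairs quoted in the text for cnm5U and cnm5s2U.
ion_pairs = {
    "base ion": (152.0455, 168.0232),
    "base fragment ion": (125.0346, 141.0123),
}

for label, (oxo, thio) in ion_pairs.items():
    print(f"{label}: shift = {thio - oxo:.4f} Da, O->S match = {is_thio_shift(oxo, thio)}")
```

Both ion pairs differ by about 15.9777 Da, within 1 mDa of the expected shift, which is consistent with the assignment of a 2-thio group in cnm5s2U.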


The HPLC elution profile for cnm5U was similar to that previously reported 34, and as is common for other 2-thiolated nucleosides (e.g., mnm5U and mnm5s2U, Figures 3.11 and 3.12), the 2-thio analog elutes later than the 2-oxo version. The HCD-MS/MS spectral data of mnm5U and mnm5s2U were also obtained (Figure 3.32) to confirm the fragment ion relationships between 2-thio and 2-oxo modified uridines.

Figure 3.32. HCD-MS/MS spectral data of mnm5U (A) and mnm5s2U (B). A1 and B1 are the base ions, and A2 and B2 are the fragment ions of A1 and B1, respectively.

N6-hydroxynorvalylcarbamoyladenosine, hn6A, and 2-methylthio-N6-hydroxynorvalylcarbamoyladenosine, ms2hn6A, were identified in M. jannaschii. A particular challenge in identifying hn6A is that the modified adenosine m6t6A has the same mass as hn6A. Here, hn6A was confirmed by a combination of HPLC elution time and HCD fragmentation pattern (Figure 3.33) 61. The HCD spectra of hn6A and m6t6A differ significantly. Adenosine, t6A and hn6A share similar ions at m/z 136.0619 and m/z 119.0354, which arise from the adenine nucleobase. In contrast, the HCD spectrum for m6t6A is more similar to that of m6A, as both contain ions at m/z 150.0777 and m/z 123.0667, arising from the N6-methyladenine nucleobase and its fragment ion. The fragmentation of adenine and N6,N6-dimethyladenine has been studied before 68-69, and these studies indicated that the methyl group at position 6 affected the dissociation of the adenine nucleobase. In our study, the same fragmentation effects were observed: the labile groups (the N6-hydroxynorvalylcarbamoyl group from hn6A and the N6-threonylcarbamoyl group from t6A) at position 6 of the adenine nucleobase are dissociated to produce protonated adenine.

Figure 3.33 HCD-MS/MS spectral data of hn6A, m6t6A, A, m6A, and t6A.


Wyosine (imG), which is a modified nucleoside found in archaea and eukaryotes, was identified in M. jannaschii. It has the same mass as its isomer isowyosine (imG2); however, these isomers can be differentiated through nucleobase fragmentation patterns. Nucleobase fragmentation of imG shows the loss of CH4, CO, HCN, and HCN, while only CO and HCN fragments are derived from the nucleobase of imG2 70. HCD-MS/MS identified the loss of CH3, CO, HCN, and HCN from the wyosine nucleobase (Figure 3.34), which indicates that imG was present in M. jannaschii.

Figure 3.34 CID-MS/MS (A) and HCD-MS/MS (B) of imG at m/z 336.1299.

In several instances, modified nucleosides arising from a single modification pathway were detected, and the relative levels of these modifications can provide some insight into tRNA (hypo)modification status. For example, t6A and ms2t6A were both detected. Comparing the peak areas for these two nucleosides as a means to estimate their relative abundances revealed that the 2-methylthio modification was rare (t6A:ms2t6A ratio of 21:1). In contrast, hn6A and ms2hn6A were found to be nearly equivalent (hn6A:ms2hn6A ratio of 1.5:1), suggesting M. jannaschii tRNAs utilize the MtaB homolog 71 in a tRNA-dependent manner. Another position 37 modification pathway, that of methylwyosine, yielded three components (excluding m1G, which is the first step in the pathway 70): imG-14, imG and mimG. The peak areas of imG-14, imG, and mimG were calculated from the nucleoside analysis data to estimate their relative abundances. This semi-quantitative analysis yielded peak area ratios of 20:23:1 for imG-14:imG:mimG, suggesting imG-14 can accumulate and mimG is a rare modified nucleoside in this sample.
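As a minimal sketch of how such peak area ratios can be expressed, the Python snippet below scales a set of integrated peak areas so that the least abundant nucleoside equals 1. The numerical peak areas are hypothetical values chosen only to reproduce the 20:23:1 ratio reported above; they are not the measured areas, and the comparison assumes comparable detector response for the nucleosides being compared.

```python
def relative_ratios(peak_areas):
    """Scale peak areas so that the least abundant nucleoside equals 1."""
    smallest = min(peak_areas.values())
    if smallest <= 0:
        raise ValueError("peak areas must be positive")
    return {name: area / smallest for name, area in peak_areas.items()}


# Hypothetical integrated peak areas (arbitrary units), for illustration only.
areas = {"imG-14": 2.0e6, "imG": 2.3e6, "mimG": 1.0e5}
ratios = relative_ratios(areas)
print(":".join(f"{ratios[name]:.0f}" for name in ("imG-14", "imG", "mimG")))
```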

3.3.2 Modification Mapping of M. jannaschii tRNA

Once the census of modified nucleosides was obtained and the identities of all modifications confirmed, the next step was to conduct RNA modification mapping by LC-MS/MS. The protocol for such analyses is well established, even when analyzing the pool of total tRNAs 22, 72. However, in this study my particular interest was in obtaining high-quality mapping data for the anticodon stem-loop, to enable an enhanced understanding of the decoding strategy for this archaeon.

M. jannaschii contains 35 unique tRNA sequences. Sufficient LC-MS/MS data were obtained to characterize the anticodon regions of each unique tRNA sequence except for selenocysteine (tRNA-Sec) 73. The various RNase digestion products that generated oligonucleotides containing positions 34 and 37 for these 34 tRNA sequences are listed in Table 3.3. Detailed MS/MS spectral data and peak annotations are listed in Figures 3.35-3.92. One feature that simplified assigning and annotating the LC-MS/MS data was that the majority of these digestion products are signature digestion products. In such cases, once the MS/MS data from the RNase digestion product is interpreted and any modified nucleoside is localized within the digestion product, that information can be annotated to only a single tRNA sequence in M. jannaschii.
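The idea of a signature digestion product can be sketched in Python as follows. This is a simplified illustration rather than the software used in this work: it assumes RNase T1 cleaves 3' of every G, ignores modified guanosines that block cleavage, and uses two short, hypothetical sequences (`tRNA-X`, `tRNA-Y`) instead of real tRNA genes. The function names and the minimum-length filter are likewise assumptions.

```python
from collections import defaultdict


def rnase_t1_digest(seq):
    """Cleave 3' of every G; each such product carries a 3'-phosphate ('p')."""
    products, start = [], 0
    for i, base in enumerate(seq):
        if base == "G":
            products.append(seq[start:i + 1] + "p")
            start = i + 1
    if start < len(seq):
        products.append(seq[start:])  # 3'-terminal fragment, no phosphate
    return products


def signature_products(trnas, min_len=3):
    """Map each digestion product found in exactly one tRNA to that tRNA."""
    owners = defaultdict(set)
    for name, seq in trnas.items():
        for product in rnase_t1_digest(seq):
            if len(product.rstrip("p")) >= min_len:
                owners[product].add(name)
    return {p: next(iter(names)) for p, names in owners.items() if len(names) == 1}


# Hypothetical toy sequences; CCUAGp is shared, so it is not a signature product.
toy = {"tRNA-X": "CCUAGACUG", "tRNA-Y": "CCUAGUUCG"}
print(signature_products(toy))
```

A product shared by several sequences, such as CCUAGp in the toy input, is excluded and must be resolved by other means.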

Table 3.3 The detected oligonucleotides from anticodon stem-loops. The enzyme used to generate the digestion product is labeled. T1: RNase T1, A: RNase A, MC1: RNase MC1, Cus: Cusativin, U2: RNase U2. Anticodon positions 34, 35, 36, and 37 identified by bold text.

tRNA-anticodon Detected oligonucleotide

Ala-GGC CA[Um]UGp (T1), UGGCUGp (MC1)

Ala-UGC CU[cnm5U]GCp (Cus), CAAGp (T1)

Arg-GCG GGCCUGCp (A), GCGGAGCCp (A)

Arg-UCG [cnm5U]CG[imG-14]AUp (A)

Arg-UCU CCU[cnm5U]CU[t6A]AGp (T1), CCU[cnm5s2U]CU[t6A]AGp (T1)

Asn-GUU GUU[t6A]AUp (A)

Asp-GUC GGGACUGUp (A), UCACUCCCGp (T1)

Cys-GCA UGCA[m1G]AUCCGCCp (MC1), UGCA[imG-14]AUCCGCCp (MC1), CA[imG-14]AUCCGp (T1)

Gln-UUG U[mnm5U]UG[m1G]ACCCCp (Cus), U[mnm5s2U]UG[m1G]ACCCCp (Cus)

Glu-UUC CCU[mnm5s2U]UC[m1G]AGp (T1)

Gly-GCC GGGC[Cm]UGCp (A), CCACGp (T1)

Gly-UCC CCU[mnm5U]CCAAGp (T1)

His-GUG GUG[m1G]AUCp (A)

Ile-GAU GAU[t6A]ACp (A), GAU[hn6A]ACp (A)

Ile-CAU CU[C+]AU[hn6A]ACCGp (T1)

Leu-GAG GAGGGUCUp (A)

Leu-UAA [cnm5U]AA[m1G]AUCp (A), [cnm5U]AA[imG]AUCp (A), [cnm5s2U]AA[m1G]AUCp (A)

Leu-UAG CC[cnm5s2U]AGp (T1), [m1G]ACCCAGp (T1)

Lys-UUU [Cm]U[cnm5s2U]UU[t6A]ACCAGp (T1), [Cm]U[cnm5s2U]UU[hn6A]ACCAGp (T1)

Met-CAU1 CUCAUAACCGp (T1), CU[Cm]AU[hn6A]ACCGp (T1), CU[Cm]AU[ms2hn6A]ACCGp (T1)

Met-CAU2 CUCAUAACCCGp (T1)

Phe-GAA AA[m1G]AUCCAGp (T1), GAA[m1G]AUp (A), GAA[imG]AUp (A)

Pro-GGG GGG[m1G]GGCp (A)

Pro-UGG AUU[cnm5U]Gp (T1), [m1G]AUCCUGp (T1)

Ser-GCU GGGACUGCp (A), CU[hn6A]AUCCCAUUGp (T1)

Ser-GGA GGA[m1G]AUCp (A), GGA[imG]AUCp (A)

Ser-UGA U[cnm5s2U]GA[m1G]AUCp (A)

Thr-GGU GGU[hn6A]AGCp (A)

Thr-UGU GACU[cnm5U]GUp (A), U[hn6A]AUCAGp (T1)

Trp-CCA A[s2C]UCCA[m1G]AUCCCUGp (T1)

Tyr-GUA GUA[imG-14]AUp (A)

Val-CAC CCCUCACAAGp (T1)

Val-GAC CC[Cm]UGp (T1), ACACGp (T1)

Val-UAC CCU[cnm5U]ACp (A), CCCU[cnm5U]ACAAGp (T1)

Figure 3.35 MS/MS spectral data and peak annotation of UGGCUGp.

Figure 3.36 MS/MS spectral data and peak annotation of CA[Um]UGp.

Figure 3.37 MS/MS spectral data and peak annotation of CU[cnm5U]GCp.

Figure 3.38 MS/MS spectral data and peak annotation of CAAGp.

Figure 3.39 MS/MS spectral data and peak annotation of GGCCUGCp.

Figure 3.40 MS/MS spectral data and peak annotation of GCGGAGCCp.

Figure 3.41 MS/MS spectral data and peak annotation of [cnm5U]CG[imG-14]AUp.

Figure 3.42 MS/MS spectral data and peak annotation of CCU[cnm5U]CU[t6A]AGp.

Figure 3.43 MS/MS spectral data and peak annotation of CCU[cnm5s2U]CU[t6A]AGp.

Figure 3.44 MS/MS spectral data and peak annotation of GGGACUGUp.

Figure 3.45 MS/MS spectral data and peak annotation of GUU[t6A]AUp.

Figure 3.46 MS/MS spectral data and peak annotation of UCACUCCCGp.

Figure 3.47 MS/MS spectral data and peak annotation of UGCA[imG-14]AUCCGp.

Figure 3.48 MS/MS spectral data and peak annotation of CA[imG-14]AUCCGp.

Figure 3.49 MS/MS spectral data and peak annotation of UGCA[m1G]AUCCGCCp.

Figure 3.50 MS/MS spectral data and peak annotation of U[mnm5U]UG[m1G]ACCCCp.

Figure 3.51 MS/MS spectral data and peak annotation of U[mnm5s2U]UG[m1G]ACCCCp.

Figure 3.52 MS/MS spectral data and peak annotation of CCU[mnm5s2U]UC[m1G]AGp.

Figure 3.53 MS/MS spectral data and peak annotation of GGGC[Cm]UGCp.

Figure 3.54 MS/MS spectral data and peak annotation of CCACGp.

Figure 3.55 MS/MS spectral data and peak annotation of CCU[mnm5U]CCAAGp.

Figure 3.56 MS/MS spectral data and peak annotation of GUG[m1G]AUCp.

Figure 3.57 MS/MS spectral data and peak annotation of GAU[t6A]ACp.

Figure 3.58 MS/MS spectral data and peak annotation of GAU[hn6A]ACp.

Figure 3.59 MS/MS spectral data and peak annotation of GAGGGUCUp.

Figure 3.60 MS/MS spectral data and peak annotation of CU[C+]AU[hn6A]ACCGp.

Figure 3.61 MS/MS spectral data and peak annotation of [cnm5U]AA[m1G]AUCp.

Figure 3.62 MS/MS spectral data and peak annotation of [cnm5s2U]AA[m1G]AUCp.

Figure 3.63 MS/MS spectral data and peak annotation of [cnm5U]AA[imG]AUCp.

Figure 3.64 MS/MS spectral data and peak annotation of CC[cnm5s2U]AGp.

Figure 3.65 MS/MS spectral data and peak annotation of [m1G]ACCCAGp.

Figure 3.66 MS/MS spectral data and peak annotation of [Cm]U[cnm5s2U]UU[t6A]ACCAGp.

Figure 3.67 MS/MS spectral data and peak annotation of [Cm]U[cnm5s2U]UU[hn6A]ACCAGp.

Figure 3.68 MS/MS spectral data and peak annotation of CUCAUAACCGp.

Figure 3.69 MS/MS spectral data and peak annotation of CUCAUAACCCGp.

Figure 3.70 MS/MS spectral data and peak annotation of CU[Cm]AU[hn6A]ACCGp.

Figure 3.71 MS/MS spectral data and peak annotation of GAA[m1G]AUp.

Figure 3.72 MS/MS spectral data and peak annotation of CU[Cm]AU[ms2hn6A]ACCGp.

Figure 3.73 MS/MS spectral data and peak annotation of AA[m1G]AUCCAGp.

Figure 3.74 MS/MS spectral data and peak annotation of GAA[imG]AUp.

Figure 3.75 MS/MS spectral data and peak annotation of GGG[m1G]GGCp.

Figure 3.76 MS/MS spectral data and peak annotation of AUU[cnm5U]Gp.

Figure 3.77 MS/MS spectral data and peak annotation of [m1G]AUCCUGp.

Figure 3.78 MS/MS spectral data and peak annotation of GGGACUGCp.

Figure 3.79 MS/MS spectral data and peak annotation of CU[hn6A]AUCCCAUUGp.

Figure 3.80 MS/MS spectral data and peak annotation of GGA[m1G]AUCp.

Figure 3.81 MS/MS spectral data and peak annotation of U[cnm5s2U]GA[m1G]AUCp.

Figure 3.82 MS/MS spectral data and peak annotation of GGA[imG]AUCp.

Figure 3.83 MS/MS spectral data and peak annotation of GGU[hn6A]AGCp.

Figure 3.84 MS/MS spectral data and peak annotation of GACU[cnm5U]GUp.

Figure 3.85 MS/MS spectral data and peak annotation of U[hn6A]AUCAGp.

Figure 3.86 MS/MS spectral data and peak annotation of A[s2C]UCCA[m1G]AUCCCUGp.

Figure 3.87 MS/MS spectral data and peak annotation of GUA[imG-14]AUp.

Figure 3.88 MS/MS spectral data and peak annotation of CCCUCACAAGp.

Figure 3.89 MS/MS spectral data and peak annotation of CC[Cm]UGp.

Figure 3.90 MS/MS spectral data and peak annotation of ACACGp.

Figure 3.91 MS/MS spectral data and peak annotation of CCU[cnm5U]ACp.

Figure 3.92 MS/MS spectral data and peak annotation of CCCU[cnm5U]ACAAGp.

Several digestion products were detected that do not fit the SDP criteria, but rather could map onto at least two unique tRNA sequences. However, by understanding and applying common rules for localizing modified nucleosides within a tRNA sequence, only a single tRNA sequence was the logical solution for mapping. For example, CCUAGp is an in silico predicted RNase T1 digestion product that could arise from four different tRNAs: tRNA-Arg-GCG, tRNA-Arg-UCU, tRNA-Leu-UAG, and tRNA-Ser-GGA. When the LC-MS/MS data were acquired, analyzed and interpreted, this digestion product was found to correspond to CC[cnm5s2U]AGp. The 2-thio-5-cyanomethyluridine modification should occur at position 34, as noted above. The only tRNA among the four where CCUAGp arises from position 32 through position 36 is tRNA-Leu-UAG, thus specifically mapping this potentially ambiguous RNase T1 digestion product back onto a single tRNA sequence.
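The positional reasoning used for CCUAGp can be sketched in Python. The example below is an illustration under simplifying assumptions: each tRNA is represented by a short fragment together with the tRNA position of its first base, numbering is treated as strictly linear (real tRNA numbering contains gaps and variable-loop insertions), and the names `tRNA-A` and `tRNA-B` and their sequences are hypothetical.

```python
def candidates_by_position(trnas, backbone, mod_offset, expected_pos=34):
    """Return tRNAs in which the modified residue of `backbone` falls on expected_pos.

    trnas maps name -> (sequence, tRNA position of the first base).
    mod_offset is the 0-based index of the modified residue within backbone.
    """
    hits = []
    for name, (seq, first_pos) in trnas.items():
        start = seq.find(backbone)
        while start != -1:
            if first_pos + start + mod_offset == expected_pos:
                hits.append(name)
                break
            start = seq.find(backbone, start + 1)
    return hits


# Hypothetical fragments: only tRNA-A places the third residue of CCUAG at position 34.
toy = {"tRNA-A": ("GCCUAGAA", 31), "tRNA-B": ("CCUAGGCA", 40)}
print(candidates_by_position(toy, "CCUAG", mod_offset=2))
```

When exactly one candidate remains, the digestion product can be assigned to that tRNA even though its unmodified sequence is not unique.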

In some cases a single RNase digestion product will not cover the entire anticodon region. In those cases, multiple digestion products were required to ensure both position 34 and position 37 were accurately mapped. For example, GACU[cnm5U]GUp and U[hn6A]AUCAGp were identified to cover the anticodon loop in tRNA-Thr-UGU. The use of multiple RNases allowed all anticodon regions to be mapped with SDPs or with digestion products that cannot logically be mapped elsewhere in the sequence.
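A minimal Python sketch of this coverage check is given below. The position spans are not taken from the data directly; they are inferred here by assuming cnm5U sits at position 34 of the first product and hn6A at position 37 of the second, with linear numbering. The function name is hypothetical.

```python
def covers_positions(spans, required=(34, 37)):
    """Return True if every required position lies inside at least one span.

    spans is a list of (start, end) tRNA positions, inclusive.
    """
    return all(any(start <= pos <= end for start, end in spans) for pos in required)


# GACU[cnm5U]GUp with cnm5U at 34 spans 30-36; U[hn6A]AUCAGp with hn6A at 37 spans 36-42.
rnase_a_product = (30, 36)
rnase_t1_product = (36, 42)
print(covers_positions([rnase_a_product]))                    # position 37 is missing
print(covers_positions([rnase_a_product, rnase_t1_product]))  # both positions covered
```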

Where possible from the data, position 32 within each tRNA sequence was mapped. Position 32 modifications were found to be rare in M. jannaschii tRNAs. Of the three putative modified nucleosides for this position identified above, Cm was identified at position 32 of tRNA-Lys-UUU, tRNA-Met-CAU, tRNA-Gly-GCC, and tRNA-Val-GAC; Um was identified at position 32 of tRNA-Ala-GGC; and s2C was identified at position 32 of tRNA-Trp-CCA. Other sites of modification that could be unambiguously identified in the data were also assigned, and a consensus modification sequence is shown in Figure 3.93.

Many modifications outside of the anticodon could also be localized in this work. One challenge with mapping regions outside of the anticodon is that tRNA sequence similarities reduce the utility of the SDP approach. For example, the T-stem loop region is highly conserved in many M. jannaschii tRNAs, leading to RNase digestion products that could arise from more than one tRNA.

Figure 3.93 The consensus modification sequence for M. jannaschii tRNAs. The positions of modifications placed on the tRNA secondary structure were experimentally determined in this study. The positions of modifications predicted or reported in other studies are listed at the bottom of the figure. The modifying genes were predicted by Dr. Valérie de Crécy-Lagard and are indicated in red.

In those cases, modifications were localized to specific regions, although it remains to be determined whether every possible tRNA possesses that modification. The selected MS/MS spectral data and peak annotations for assignments outside of the anticodon are provided in Figures 3.94-3.131. Combining the high-quality anticodon mapping data with these other localization data results in the first compilation of modified tRNA sequences from M. jannaschii (Figure 3.132).

Figure 3.94 MS/MS spectral data and peak annotation of CUCA[G+]ACUGp.

Figure 3.95 MS/MS spectral data and peak annotation of [Cm][m1I][m1A]AUCCp.

Figure 3.96 MS/MS spectral data and peak annotation of CGUA[m2G]CUp.

Figure 3.97 MS/MS spectral data and peak annotation of [m2G]GGUCCp.

Figure 3.98 MS/MS spectral data and peak annotation of G[s4U]A[m2G]CCp.

Figure 3.99 MS/MS spectral data and peak annotation of [m2G]GGUCCGCp.

Figure 3.100 MS/MS spectral data and peak annotation of CCCCCA[s4U]AGp.

Figure 3.101 MS/MS spectral data and peak annotation of C[m2,2Gm]ACGp.

Figure 3.102 MS/MS spectral data and peak annotation of CG[m1A]GUp.

Figure 3.103 MS/MS spectral data and peak annotation of [m2G]GGGGCp.

Figure 3.104 MS/MS spectral data and peak annotation of [m2G]GGGGCp.

Figure 3.105 MS/MS spectral data and peak annotation of CCUUA[m5C]Gp.

Figure 3.106 MS/MS spectral data and peak annotation of UCUA[G+]Gp.

Figure 3.107 MS/MS spectral data and peak annotation of [m1I]AAUCCp.

Figure 3.108 MS/MS spectral data and peak annotation of GGGAC[m5C]p.

Figure 3.109 MS/MS spectral data and peak annotation of [Cm]G[m1A]AUCCGCp.

Figure 3.110 MS/MS spectral data and peak annotation of CCU[s4U]Gp.

Figure 3.111 MS/MS spectral data and peak annotation of ACC[m5C]CGp.

Figure 3.112 MS/MS spectral data and peak annotation of GGA[m2G]GCp.

Figure 3.113 MS/MS spectral data and peak annotation of [m1I]AUUCp.

Figure 3.114 MS/MS spectral data and peak annotation of [m1A]AUCCCCGp.

Figure 3.115 MS/MS spectral data and peak annotation of UCCAUCCC[m2,2G]CCGp.

Figure 3.116 MS/MS spectral data and peak annotation of UCCAUCCC[m2,2Gm]CCGp.

Figure 3.117 MS/MS spectral data and peak annotation of A[m2G]CUCA[G+]Up.

Figure 3.118 MS/MS spectral data and peak annotation of [m1A]AUCCCUCCCUCGp.

Figure 3.119 MS/MS spectral data and peak annotation of G[m2,2G]GGUp.

Figure 3.120 MS/MS spectral data and peak annotation of AC[m5C]CCAGp.

Figure 3.121 MS/MS spectral data and peak annotation of [s4U]AGCCAp.

Figure 3.122 MS/MS spectral data and peak annotation of C[m1I]AACCCp.

Figure 3.123 MS/MS spectral data and peak annotation of CUGGAC[m5C]Up.

Figure 3.124 MS/MS spectral data and peak annotation of [m1A]AUCCCGp.

Figure 3.125 MS/MS spectral data and peak annotation of G[m2G]UCUA[G+]AUp.

Figure 3.126 MS/MS spectral data and peak annotation of GGGAG[m1Ψ]Up.

Figure 3.127 MS/MS spectral data and peak annotation of [m1A]AUCUCCCC[m2G]p.

Figure 3.128 MS/MS spectral data and peak annotation of [Cm]G[m1A]AUCUCp.

Figure 3.129 MS/MS spectral data and peak annotation of CCCA[s4U]Gp.

Figure 3.130 MS/MS spectral data and peak annotation of AC[m2,2G]UCGp.

Figure 3.131 MS/MS spectral data and peak annotation of U[m2G]GGCCCp.

Figure 3.132 Compilation of modified tRNA sequences from M. jannaschii. The numbering of sequences is adopted from tRNAdb 74. Modifications identified by signature digestion products are represented in bold black. Modifications in grey are listed based on mass spectral data obtained from digestion products that cannot be uniquely assigned to a single tRNA. For digestion products containing more than one modified nucleoside at a single location, all such modifications are listed.

3.3.3 Discussion

Modified Nucleosides in M. jannaschii

McCloskey and colleagues previously characterized post-transcriptionally modified nucleosides in M. jannaschii tRNAs 32. They characterized 25 modified nucleosides in that work. All of those were also found here except for inosine and an unknown nucleoside of molecular mass 422 Da, which is likely yW-86 70. McCloskey and colleagues noted that the presence of inosine might result from artefactual deamination of adenosine. I identified seven modified nucleosides that were not found in the work by McCloskey and co-workers: s2C, agmatidine, cnm5U, cnm5s2U, imG-14, mimG, and ms2hn6A.

The census of modified nucleosides in M. jannaschii tRNAs reveals a common suite of structural and functional archaeal modifications. With the exception of cnm5s2U, all modified nucleosides have been identified before 3. Several modifications (t6A, m1G and Cm) are found throughout all kingdoms. Others, such as mnm5U, mnm5s2U, hn6A and s2C, are also found in bacterial tRNAs.

Four modifications, cnm5U, cnm5s2U, agmatidine and mimG, have only been discovered in archaea to date. Our analyses revealed the presence of imG and imG-14, which are believed to be within the wybutosine pathway and have previously been found in eukaryotes 70, 75.

Table 3.4 tRNA modifications previously reported in different archaea.

Methanococcus Methanococcus Methanococcus Methanococcus Methanococcus Methanoccoides Stetteria maripaludis vannielii thermolithotrophicus igneus jannaschii burtonii hydrogenophila Euryarchaeota Euryarchaeota Euryarchaeota Euryarchaeota Euryarchaeota

Ψ + + + + + + + D + m1A + + + + + + + m2A + s2C + + + + m1Ψ + + + tr tr +

Ψm

mnm5s2U + + + + + m5C + + + + + + m5Cm +

Cm + + + + + + + mcm5s2U

I + + + + + +

Um tr tr + + + s2U s2Um s4U + + + + + + m1I + + + + + + m1Im m1G + + + + + + + Gm + tr + + + + + m2G + + + + + + + m7G + +

G+ + + + + + +

2 m 2G + + + + + + + t6A + + + + + + + Am + + + m6A + + + tr + + + ac4C + ac4Cm + m2Gm + + + mimG + hn6A + + + + + + + imG + + + + + + imG2 + ms2t6A + + + tr + +

2 m 2Gm + + + ms2hn6A + +

Thermoplasma Methanobacterium Methanothermus Sulfolobus Thermoproteus acidophilum thermoautotrophicum fulgidus fervidus solfataricus neutrophilus Euryarchaeota Euryarchaeota Euryarchaeota Euryarchaeota Crenarchaeota Crenarchaeota

Ψ +

D + + m1A + m2A s2C m1Ψ + + +

Ψm + mnm5s2U

m5C + m5Cm + +

Cm + mcm5s2U

I +

Um + s2U + + s2Um + + s4U + m1I + m1Im m1G +

Gm + m2G + m7G + +

G+ + + + + + + 2 m 2G + t6A +

Am + + m6A + ac4C + ac4Cm + + m2Gm + + + mimG + + hn6A imG imG2 ms2t6A +

2 m 2Gm + + + + ms2hn6A

Acidianus Thermodiscus Thermococcus Pyrobaculum Pyrodictium Haloferax infernus maritimus sp. islandicum occultum Volcani Crenarchaeota Crenarchaeota Euryarchaeota Crenarchaeota Crenarchaeota Euryarchaeota

Ψ +

D + m1A m2A s2C + m1Ψ + +

Ψm + +

mnm5s2U m5C + m5Cm + + + + +

Cm + mcm5s2U +

I

Um + s2U + s2Um + + + + s4U m1I + m1Im + + m1G +

Gm m2G + m7G

G+ + + + +

2 m 2G + t6A +

Am + + + + + m6A ac4C + ac4Cm + + + + + m2Gm + + + + mimG + + + + hn6A imG imG2 ms2t6A

2 m 2Gm + + + + ms2hn6A

There are 19 archaea (including M. jannaschii) whose tRNA modifications have been studied before 32-33, 76. Table 3.4 summarizes the distribution of 38 different post-transcriptionally modified nucleosides among these 19 species. The modified nucleosides found in Euryarchaeota and Crenarchaeota are compared in Figure 3.133. When focusing on the anticodon stem-loop, there are five position 34 modified uridines (mnm5U, mnm5s2U, mcm5s2U, cnm5U, cnm5s2U) that are found in Euryarchaeota but not in Crenarchaeota. All except mcm5s2U were found here. Four position 37 modifications, t6A, ms2t6A, hn6A and ms2hn6A, have been found in both Euryarchaeota, including M. jannaschii, and Crenarchaeota. As discussed further below, the wyosine pathway varies between Euryarchaeota and Crenarchaeota 70.

Figure 3.133 The modified nucleosides in Euryarchaeota and Crenarchaeota from previous studies and this study.

RNA Modification Mapping of Archaeal Total tRNA Pools

Despite the availability of new technologies and analytical methods, very little data is available on the tRNA modification profiles for archaeal organisms. A review of the Modomics database finds only 61 archaeal tRNA sequences whose modification profiles are known or partially known. As noted before, the vast majority (41 of 61) arise from Haloferax volcanii. The remainder are distributed among six other archaea: salinarum, 12; Thermoplasma acidophilum, 3; Methanobacterium thermaggregans, 2; and one each from Halococcus morrhuae, Methanosarcina barkeri, and Sulfolobus acidocaldarius.

Although multiple techniques exist to localize modifications onto RNA sequences, mass spectrometry-based approaches have been the most successful for analyzing tRNAs, which can be challenging due to both the structural diversity of modified nucleosides and the relatively high density of modifications per tRNA sequence 3. Mass spectrometry has the advantage of unbiased detection: every modification that results in a mass change to the canonical nucleoside can be detected directly. Moreover, the mass change alone is often sufficient to identify the modification. When these characteristics are combined with the standard tandem mass spectrometry approach for sequencing oligonucleotides 21-22, 77, the specific sequence location and identity of modified nucleosides can be localized onto tRNA sequences.
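The mass-change principle can be illustrated with a short Python sketch that matches an observed mass shift against a small lookup table. The shifts listed are standard monoisotopic values (addition of CH2 for methylation, replacement of O by S for thiolation); the table is deliberately incomplete, and the function name and the 0.005 Da tolerance are assumptions made for illustration. Isomeric modifications such as pseudouridine produce no mass shift and therefore cannot be found this way.

```python
# Monoisotopic mass shifts relative to the canonical nucleoside (Da).
MASS_SHIFTS = {
    "methylation (+CH2)": 14.01565,
    "thiolation (O -> S)": 15.97716,
    "dimethylation (+2 CH2)": 28.03130,
}


def match_shift(observed_da, tol_da=0.005):
    """Return the names of all listed shifts within tol_da of the observed shift."""
    return sorted(
        name for name, shift in MASS_SHIFTS.items()
        if abs(observed_da - shift) <= tol_da
    )


print(match_shift(14.0157))
print(match_shift(15.9770))
```

A shift that matches more than one entry, or none, would need MS/MS fragmentation or retention time data to be resolved.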

The strategy used here was adapted from the lab’s previous approach to total tRNA modification mapping 22, 72, 77. The key to obtaining high-quality data covering the anticodon regions of M. jannaschii tRNAs was the use of multiple RNases. In this work, RNase T1 was used to establish the baseline mapping coverage, as this enzyme reproducibly cleaves tRNAs at unmodified guanosines and m2G. The RNase T1 digests were complemented with four other ribonucleases: RNase A, RNase U2, MC1 and Cusativin. A total of 10 experimental LC-MS/MS analyses of ribonuclease digests were performed. For each analysis, the resulting MS/MS data were first interpreted using RNAModMapper as described in the Materials and Methods. High-quality interpretations that yielded SDPs were annotated directly onto the appropriate tRNA sequence. In those cases, other enzyme digests were used to validate the SDP annotation. In some instances, different enzymes were required to generate SDPs that could cover the entire anticodon loop.

M. jannaschii tRNA modifications outside of anticodon

m2G(6): Trm14, previously identified in M. jannaschii (MJ0438), is believed to be responsible for m2G at position 6 78. My mapping analyses detected the digestion product GG[m2G]G[s4U]p. This digestion product could arise from tRNA-Cys-GCA, tRNA-His-GUG, or tRNA-Met-CAU1. Previously m2G was localized to position 6 of tRNA-Cys-GCA in M. jannaschii 78, which suggests this tRNA is more likely than the others to contain this modification.

s4U(8): The sulfurtransferase ThiI (MJ0931) is most likely responsible for s4U at position 8 79. This modification could be localized at this position in many possible tRNAs including tRNA-Arg-UCU, tRNA-Asn-GUU, tRNA-Gly-GCC, tRNA-Ser-UGA, and tRNA-Val-UAC.

m2G(10)/m2,2G(10): The methylated guanosine modification N2-methylguanosine (m2G) was localized to position 10 in tRNA-Arg-UCU, tRNA-Lys-UUU, and tRNA-Val-GAU. The dimethylated guanosine modification (m2,2G) could be localized to position 10 of tRNA-His-GUG. These modifications are proposed to arise from PAB1283 in P. abyssi 80. While the gene for PAB1283 is conserved in Archaea, its homolog (MJ0710) has not been experimentally verified in M. jannaschii.

G+(15): Archaeosine, G+, is widely found at position 15 of archaeal tRNAs. The pathway for this modification is complex with multiple steps 81-82, and the whole pathway can be identified in M. jannaschii: the four enzymes for synthesis of the preQ0 precursor; TGT (MJ0436), which inserts preQ0 in tRNA 83; and ArcS (MJ1022), which converts preQ0-tRNA to G+ 84. Archaeosine could be identified at this position in a large number of M. jannaschii tRNAs.

m2,2G(26)/m2,2Gm(26): The position 26 dimethylguanosine nucleobase modification was reported to arise from Trm1 in horikoshii 85, and the Trm1 family (MJ0946) is present in M. jannaschii. Both m2,2G and m2,2Gm were found at position 26 of nearly all of the tRNA sequences possessing G26. In most cases, both the dimethyl- and trimethyl-guanosines were found in the same digestion products, suggesting 2’-OH methylation occurs after nucleobase methylation.

m5C(48/49): The Trm4 methyltransferase (MJ0026) is responsible for m5C formation in M. jannaschii 86, which was localized here to position 48 in tRNA-Cys-GCA and position 49 of multiple tRNAs.

Ψ(54)/m1Ψ(54)/Ψ(55): Pseudouridine was detected in the nucleoside analysis, and it likely arises from the archaeal Pus10 protein (MJ0041) that generates Ψ54 and Ψ55 in archaeal tRNAs 87. As no pseudouridine-specific mapping was performed 60, 88, the specific tRNAs that contain Ψ54 and Ψ55 could not be identified here. The DUF358 SPOUT-methyltransferase MJ1640 (renamed to TrmY) acts in concert with Pus10 to catalyze the biosynthesis of m1Ψ at position 54 89-90. Data consistent with m1Ψ were obtained from tRNA-Val-GAC.

Cm(56): The Trm56 2'-O-methyltransferase MJ1385 91 is responsible for Cm formation at position 56 in M. jannaschii, and Cm was localized to position 56 in tRNA-Gln-UUG and tRNA-Val-GAC in this study.

m1I(57): 1-methylinosine at position 57 is predicted to be introduced by HVO_2747 in H. volcanii 37. Mapping localized m1I to this position in M. jannaschii tRNA-His-GUG and tRNA-Thr-GGU.

m1A(58): TrmI, responsible for m1A at position 58, has been validated in Pyrococcus abyssi 92. The TrmI ortholog is present in M. jannaschii (MJ0134). m1A was localized to tRNA-Asn-GUU, tRNA-Gln-UUG, tRNA-Ile-GAU, tRNA-Met-CAU, tRNA-Val-CAC, and tRNA-Val-GAC.

An abundant RNase digestion product, [Cm][m1I][m1A]AUCCp (Figure 3.95), was found during analysis. Due to the strong sequence similarity in the T-stem loop, this digestion product could map onto multiple M. jannaschii tRNAs localizing Cm56, m1I57 and m1A58.

m2G(67): My data also localized m2G to position 67 in tRNA-Arg-UCU, tRNA-Asn-GUU, tRNA-Gly-GCC, tRNA-Ile-CAU, and tRNA-Val-UAC. This position assignment is supported by the detection of an RNase T1 digestion product consistent with [m1A]AUCUCCCC[m2G]p (Figure 3.127). This modification is typically found at the opposing location, G6, in the acceptor stem, but prior studies on tRNA-Lys-UUU from Loligo bleekeri 93 have identified this modification at G67. It is possible that M. jannaschii tRNAs utilize m2G6 or m2G67 for folding, stability or charging. It is not yet known whether Trm14 (MJ0438) would also be responsible for modifying G67 or if a different methyltransferase is required.

M. jannaschii tRNA Anticodon Modifications

Cm/Um(32): The sugar methylated residue Cm was mapped to position 32 in tRNA-Gly-GCC, tRNA-Lys-UUU, tRNA-Met-CAU1 and tRNA-Val-GAC. Similarly, the sugar methylated residue Um was mapped to position 32 in tRNA-Ala-GGC. These methylations most likely arise from TrmJ 94, which was found in Sulfolobus acidocaldarius. However, the candidate gene in M. jannaschii is still unidentified.

s2C(32): 2-thiocytidine was localized to position 32 in tRNA-Trp-CCA. A tRNA 2-thiolation protein candidate, MJ1157, is likely responsible for s2C formation.

Agmatidine(34): Agmatidine, introduced by TiaS as characterized in Archaeoglobus fulgidus 95, was found at position 34 in tRNA-Ile-CAU, as expected. The TiaS ortholog is present in M. jannaschii (MJ1095).

xm5(s2)U(34): U34 of tRNA-Arg-UCU, tRNA-Leu-UAA, and tRNA-Arg-UCU was modified either as cnm5U or as the 2-thio analog, cnm5s2U. Similarly, I found U34 of tRNA-Gln-UUG was modified either to mnm5U or its 2-thio analog, mnm5s2U. While the data obtained only allow an estimation of the relative levels of xm5U vs xm5s2U modifications, these hypomodification states suggest that the xm5U (cnm5U and mnm5U) modification pathways might be sufficient alone to guarantee accurate decoding, independent of the s2U modification pathway. Archaeal Elp3 can catalyze the tRNA modification at C5 of U34 96, and the Elp3 homolog, MJ1136, is present in M. jannaschii. However, the role of Elp3 and other enzymes in the formation of mnm5(s2)U and/or cnm5(s2)U in M. jannaschii remains to be determined.

(ms2)x6A(37): t6A is found throughout all domains, and its biosynthesis pathway in archaea has been described 97. t6A could be mapped to position 37 in tRNA-Arg-UCU, tRNA-Asn-GUU, tRNA-Ile-GAU and tRNA-Lys-UUU. The enzymes responsible for t6A biosynthesis have been identified in Archaea 98. The first step is catalyzed by threonylcarbamoyl-AMP synthase (TsaC, MJ0062), which forms the carboxythreonyladenylate that is transferred to target tRNAs by the KEOPS/t6A synthase complex components (MJ1130, MJ0594a and MJ0187) 99. The modified nucleoside t6A can be further modified to ms2t6A by MtaB in bacterial cells. The MtaB homolog (MJ0867) is found in M. jannaschii 71 and, while ms2t6A was detected during nucleoside analysis, my modification mapping did not identify the tRNA(s) possessing this modification, most likely due to its low relative abundance in this sample.

hn6A was initially reported in various thermophilic bacteria and archaea 32, and was mapped here to position 37 of tRNA-Ile-GAU, tRNA-Ile-CAU, tRNA-Lys-UUU, tRNA-Met-CAU1, tRNA-Ser-GCU, tRNA-Thr-GGU and tRNA-Thr-UGU. ms2hn6A was also localized to position 37 of tRNA-Met-CAU1. Previously, Perrochia et al. suggested that the archaeal Sua5 and KEOPS could be responsible for the synthesis of hn6A 97; however, the detailed biosynthesis pathway of hn6A is still unknown. Interestingly, A37 of tRNA-Ile-GAU and tRNA-Lys-UUU was detected in two different modification states: t6A and hn6A. Notably, the data suggest that t6A and hn6A share either a common pathway or a common tRNA substrate target.

Wyosine(37) and derivatives: Archaeal wyosine (imG) and methylwyosine (mimG) formation follows the pathway illustrated in Figure 3.134. As noted above, nucleoside analysis identified all four components of this pathway (m1G, imG-14, imG and mimG). The methyltransferase, Trm5 (MJ0883), was reported to yield m1G at position 37 100. Taw1 (MJ0257), which further modifies m1G to imG-14, and Taw3 (MJ1510), responsible for imG and mimG, have been identified in M. jannaschii 70, 101.

Figure 3.134 The modification pathway and predicted/identified modifying enzymes for wyosine derivatives.

M. jannaschii tRNA-Phe-GAA was found to contain both m1G and imG at position 37 (Figure 3.74). Previous bioinformatics analysis suggests that this organism should be capable of generating 7-aminocarboxypropylwyosine (yW-72) at G37 70, and the immediate precursor to this modified nucleoside (yW-86) was likely present in the prior nucleoside analysis of M. jannaschii tRNAs conducted by McCloskey et al. 32. The absence of these wyosine derivatives in the tRNAs analyzed here could be related to culturing conditions or due to limited Taw2 (MJ1557) activity. Additional experiments are warranted to understand wyosine modification states for M. jannaschii and other archaeal tRNAs.

Surprisingly, my mapping results revealed five additional tRNAs that might contain wyosine-related modifications at G37: tRNA-Cys-GCA (imG-14, Figures 3.47 and 3.48), tRNA-Leu-UAA (imG, Figure 3.63), tRNA-Ser-GGA (imG, Figure 3.82), tRNA-Arg-UCG (imG-14, Figure 3.41), and tRNA-Tyr-GUA (imG-14, Figure 3.87). tRNA-Cys-GCA was found to contain solely imG-14 at position 37. In contrast, both m1G and imG, but no imG-14, were found at position 37 of tRNA-Leu-UAA and tRNA-Ser-GGA. If independently validated, these results suggest that wyosine modification to G37 is more prevalent in archaea than previously thought.

Methylwyosine is an archaeal-specific guanosine 37 derivative, but it was not localized to any tRNA of M. jannaschii, including its expected substrate, tRNA-Phe-GAA. The most likely rationale for not mapping mimG is that its abundance is below the limits of detection for the RNA modification mapping method used here.

Archaeal Decoding Strategies

RNA modification mapping of the total tRNA pool from M. jannaschii allowed me to generate anticodon modification profiles for 34 tRNAs (Table 3.5). Because H. volcanii is the only other archaeal organism whose total tRNA pool has been extensively characterized, I first compare the anticodon modification profiles of these two archaea. I then describe general characteristics of decoding strategies across the three domains using E. coli as the comparative organism for bacteria, S. cerevisiae as the comparative organism for single-cell eukaryotes, and my data for M. jannaschii.

There are a total of 46 identified tRNAs in H. volcanii 37. H. volcanii and M. jannaschii have several decoding strategies in common. Both organisms use an A-34 sparing strategy to decode all pyrimidine-ending codons. Both M. jannaschii and H. volcanii use tRNA-Ile(CAU) with modified C34 to decode the isoleucine AUA codon but not the methionine AUG codon. In my study, I have identified agmatidine at position 34 of tRNA-Ile(CAU); while agmatidine has not been experimentally verified in H. volcanii, the genes for this modification are present 102.

Table 3.5 Codon:anticodon decoding for M. jannaschii tRNAs. Position 34 of anticodon identified with bold text; position 37 of anticodon identified in grey text.

H. volcanii always uses three isoacceptor tRNAs to decode the 4-codon boxes, but M. jannaschii only uses two isoacceptor tRNAs to decode 4-codon boxes except valine codons. In addition, U34 is modified in M. jannaschii tRNAs, which likely accounts for the more efficient decoding of the 4-codon boxes. Both archaea have modifications at position 32. H. volcanii only has Cm at position 32 of tRNA-Lys(UUU), tRNA-Lys(CUU), tRNA-Trp(CCA), and tRNA-Tyr(GUA). The modifications at position 32 in M. jannaschii tRNAs are more diversified. I identified Cm in tRNA-Lys(UUU) and tRNA-Val(GAC), Um in tRNA-Ala(GCC), and s2C in tRNA-Trp(CCA). I also found that, except for tRNA-Ile(CAU), no other modified C34 was identified in M. jannaschii. In contrast, H. volcanii uses ac4C34 for decoding serine, proline, glutamine, and lysine codons.

Comparison of Decoding Strategies

To improve our understanding of the decoding strategies used by archaea, a comparative analysis of position 34 and 37 modifications in the tRNAs for E. coli, S. cerevisiae, and M. jannaschii is shown in Table 3.6.

1-codon box. There are two 1-codon boxes, Met-AUG and Trp-UGG. M. jannaschii is unique as C34 in tRNA-Trp is unmodified. In contrast, H. volcanii, E. coli and S. cerevisiae use Cm34 to decode G3 of the tryptophan codon UGG. Cm34 is believed to stabilize codon-anticodon interactions. As C34 is unmodified in M. jannaschii, additional stabilization might occur via the s2C32 modification. In contrast, Cm34 was detected in M. jannaschii tRNA-Met, likely to differentiate between the methionine codon AUG and the isoleucine codon AUC, similar to the strategy used by E. coli, which contains ac4C34.

2-codon box. There are 12 2-codon boxes, which can be divided into two groups. The first group has pyrimidine-ending codons: Phe-UUU/UUC, Tyr-UAU/UAC, His-CAU/CAC, Asn-AAU/AAC, Asp-GAU/GAC, Cys-UGU/UGC, and Ser-AGU/AGC. The second group of amino acids has purine-ending codons: Leu-UUA/UUG, Gln-CAA/CAG, Lys-AAA/AAG, Glu-GAA/GAG and Arg-AGA/AGG. A G34-containing tRNA is used to decode pyrimidine-ending codons across all domains, with the most common feature being that position 37 is modified in various manners to enhance anticodon-codon interactions. For purine-ending codons, tRNA requirements appear to be determined by U34 modification status. Similar to the general trends seen in E. coli and S. cerevisiae, M. jannaschii only uses a single U34-containing tRNA, which is then post-transcriptionally modified to more efficiently decode each codon box in this group.

3-codon box. M. jannaschii follows the proposed rule for archaeal decoding of the AUA codon, which is based on the bacterial model of modifying C34 to differentiate from methionine decoding.

4-codon box. There are eight 4-codon boxes. M. jannaschii only uses two tRNAs (UNN and GNN) to decode all 4-codon boxes except valine. The G34-containing tRNA is able to decode C3 and U3 codons, while U34 is modified to cnm5(s2)U or mnm5U for decoding A3 and G3 codons. Besides U34 modifications, E. coli and S. cerevisiae have other strategies to decode particular 4-codon boxes. In S. cerevisiae, serine, proline, threonine and alanine codons can be decoded by modifying A34 to inosine. To decode the arginine CGU/CGC/CGA/CGG codons, both E. coli and S. cerevisiae use I34. Because there is no A34 anticodon in M. jannaschii and H. volcanii, the inosine-based decoding strategy is not required.
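The codon-box counts quoted in this comparison (two 1-codon boxes, twelve 2-codon boxes, one 3-codon box, and eight 4-codon boxes) can be reproduced from the standard genetic code. The following is a minimal Python sketch for illustration only; it is not part of the analysis performed in this work, and the names CODON_TABLE, codon_boxes, box_sizes and two_codon_split are hypothetical. A "box" is taken here to be the set of sense codons that share the first two nucleotides and encode the same amino acid.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; '*' marks stop codons.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_boxes():
    """Group sense codons by (first two bases, amino acid) -> third bases."""
    boxes = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are not decoded by tRNAs
            boxes[(codon[:2], aa)].append(codon[2])
    return dict(boxes)


def box_sizes():
    """Count how many boxes contain 1, 2, 3 or 4 codons."""
    counts = defaultdict(int)
    for thirds in codon_boxes().values():
        counts[len(thirds)] += 1
    return dict(counts)


def two_codon_split():
    """Split the 2-codon boxes into pyrimidine-ending and purine-ending groups."""
    pyrimidine, purine = [], []
    for (prefix, aa), thirds in codon_boxes().items():
        if len(thirds) == 2:
            group = pyrimidine if set(thirds).issubset("UC") else purine
            group.append(aa + "-" + prefix)
    return pyrimidine, purine
```

Under these assumptions, box_sizes() gives 2, 12, 1 and 8 boxes of sizes 1 to 4, and two_codon_split() returns seven pyrimidine-ending and five purine-ending 2-codon boxes, matching the grouping described above.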

Table 3.6 The comparison of decoding strategy between M. jannaschii (black), H. volcanii (orange, xU indicates the experimental unidentified uridine derivative), E. coli (red), and S. cerevisiae (blue). A. A. A. A. A. A. A. A. Anticodon Anticodon Anticodon Anticodon Codon Codon Codon Codon GAA GGA GUA GCA UUU 5 2 UAU GAA cnm s UGA GUA UGU GCA Phe UCU Tyr Cys GAA GGA (UGA) QUA GCA UUC UAC UGC GmAA ac4CGA GΨA GCA UCC 5 2 Ser cnm (s )UAA GGA CGA UGA Stop UCA UUA UAA CAA cmo5UGA UAA cmnm5UmAA CCA Leu Stop CmCA CmAA UCG IGA CGA Trp UUG UAG UGG ncm5UmAA ncm5UGA CmCA m5CAA CmCA GGG GUG GAG GCG 5 CAU GUG CUU cnm5s2UAG CCU cnm UGG CGU 5 His QUG cnm UCG GGG UGG CAC CUC GAG CAG CCC 4 GUG CGC GCG CCG Leu ac CGG mo5UAG Pro mnm5(s2)UUG Arg xUCG CUA CCA GGG CGG ac4CUG (UUG) CGA GAG CAG 5 CAA cmo UGG 5 2 ICG CCG 5 mnm s UUG CUG cmo UAG CCG Gln CGG IGG CUG CAG 5 2 ICG CCG GAG UAG ncm5UGG mcm s UUG CUG AUU GAU C+AU GUU GCU GGU AAU AGU 5 GUU GCU GAU C*AU ACU cnm UGU Asn Ser Ile AUC QUU GCU 2 AAC AGC GAU k CAU GGU (UGU) CGU GCU ACC GUU AUA IAU ΨAΨ Thr 5 2 GGU CGU cnm s UUU 5 2 ACA cnm (s )UCU CmAU cmo5UGU AAA ac4CUU xUUU AGA (UCU) (CCU) CmAU Lys 5 2 Arg MetAUG ACG IGU CGU mnm s UUU mnm5UCU CCU ac4CAU AAG 5 2 ncm5UGU mcm s UUU AGG mcm5UCU CCU CAU CUU

GUC GCC GAC CAC GGC GAU 5 GUC 5 GUU cnm UAC GCU cnm5UGC Asp GGU mnm UCC gluQUC GCC CCC GAC CAC (UAC) GGC UGC CGC GAC GUC GCC GUC GGC xUCC Val GAC Ala Gly GGC mnm5s2UUC GCC CCC GUA cmo5UAC GCA GGA cmo5UGC GAA xUUC ac4CUC mnm5UCC GAC CAC Glu mnm5s2UUC GUG GCG IGC GGG GCC CCC 5 5 2 ncm UAC ncm5UGC GAG mcm s UUC mcm5UCC CUC

3.4 Conclusion

In this study I have mapped the post-transcriptional modifications present in M. jannaschii tRNAs. For the anticodon loop, I find that U34 in M. jannaschii tRNAs is modified to cnm5U, cnm5s2U, mnm5U, and mnm5s2U to efficiently decode synonymous codons by wobble base pairing. The modifications t6A, hn6A, m1G, imG-14 and imG at position 37 likely ensure correct decoding of 2-fold degenerate synonymous codons.

Chapter 4

The Post-transcriptional Modifications in tRNA from Selected Archaeal Organisms

4.1 Introduction

The post-transcriptional modifications to tRNAs in bacteria and eukaryotes have been intensively studied in terms of the types, positions, functions, and modification pathways, as noted in Chapter 1. However, very few archaeal organisms have been explored32-33, 35-37, 76. Compared with Bacteria and Eukarya, our understanding of tRNA modifications in Archaea is still limited. Table 4.1 lists the modified nucleosides that were identified by Dr. Susan Russell (unpublished). In Chapter 3, I characterized the modification profile of Methanocaldococcus jannaschii tRNAs with an emphasis on the anticodon loop. To further understand the modifications in other Archaea, two Euryarchaeota, Methanopyrus kandleri and Methanothermobacter marburgensis, and two Crenarchaeota, Sulfolobus acidocaldarius and Thermoproteus tenax, were investigated. Modified nucleosides were analyzed by LC-MS/MS. Homologs of modifying enzymes and possible guide RNAs were also identified. All the tRNA samples of archaeal organisms were from Dr. Lennart Randau’s lab (Max Planck Institute for Terrestrial Microbiology).

Table 4.1 The modified nucleosides identified in tRNA from selected archaeal organisms. The data is adapted from dissertation of Dr. Susan Russell, dissertation title: Characterizing Modified Nucleosides in RNA by LC/UV/MS.

H. marismortui   M. maripaludis   M. maripaludis   H. walsbyi
Am
m1A   m1A   m1A   m1A
m2A
m6A   m6A   m6A   m6A
m1I   m1I   m1I   m1I
t6A   t6A   t6A   t6A
ms2t6A   ms2t6A   ms2t6A   ms2t6A
hn6A
Gm
m1G   m1G   m1G   m1G
m2G   m2G   m2G   m2G
m7G   m7G   m7G   m7G
m2,2G   m2,2G   m2,2G   m2,2G
m2,2Gm
G+   G+   G+   G+
imG-14   imG-14
imG   imG
mimG
Cm   Cm   Cm   Cm
m5C   m5C   m5C   m5C
m5Cm
s2C
s4U
ac4C   ac4C   ac4C
ac4Cm
C+   C+   C+   C+
Ψ   Ψ   Ψ   Ψ
m1Ψ   m1Ψ   m1Ψ   m1Ψ
Um
mcm5s2U   mcm5s2U
mcm5U

4.2 Experimental

4.2.1 The Total Nucleosides and Oligonucleotides Preparation

The total tRNA samples were denatured at 100 °C for 3 min and then chilled in an ice water bath; 1/10 volume of 0.1 M ammonium acetate was added to the tRNA after denaturation. For each 20 µg of tRNA, 2 U (units) of nuclease P1 were added and the samples were incubated at 45 °C for 2 h. After nuclease P1 digestion, 1/10 volume of 1.0 M ammonium bicarbonate was added to the sample, then 0.0001 U of snake venom phosphodiesterase and 0.003 U of Antarctic Phosphatase were added per microgram of tRNA, and the mixture was incubated at 37 °C for 2 h. Finally, the hydrolyzed nucleoside sample was dried in a vacuum concentrator.

RNase T1 was used to digest the total tRNAs to oligonucleotides. The total tRNA samples were denatured at 100 ºC for 3 min then chilled in an ice water bath. Four micrograms of total tRNA of each sample were prepared in 1/3 volume of 200 mM ammonium acetate, and 50 U of RNase T1 was added for every 1 µg of total tRNA. The mixture was incubated at 37°C for 2 h, and dried in a vacuum concentrator.
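The enzyme amounts in the two digestion protocols scale linearly with the mass of tRNA. As a convenience, the arithmetic can be sketched in Python as follows. This is only an illustration of the ratios stated above (2 U nuclease P1 per 20 µg, 0.0001 U phosphodiesterase and 0.003 U phosphatase per µg, 50 U RNase T1 per µg); the function names are hypothetical and were not used in this work.

```python
def nucleoside_digest_units(trna_ug):
    """Enzyme units for total hydrolysis to nucleosides, given tRNA mass in µg."""
    return {
        # 2 U of nuclease P1 for each 20 µg of tRNA
        "nuclease_P1_U": 2.0 * trna_ug / 20.0,
        # 0.0001 U of snake venom phosphodiesterase per µg of tRNA
        "snake_venom_phosphodiesterase_U": 0.0001 * trna_ug,
        # 0.003 U of Antarctic Phosphatase per µg of tRNA
        "antarctic_phosphatase_U": 0.003 * trna_ug,
    }


def rnase_t1_units(trna_ug):
    """RNase T1 units for oligonucleotide digestion: 50 U per µg of tRNA."""
    return 50.0 * trna_ug
```

For the 4 µg of total tRNA used per RNase T1 digest, rnase_t1_units(4) corresponds to 200 U of RNase T1.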

4.2.2 LC-MS/MS Analysis

The hydrolyzed nucleosides were reconstituted in mobile phase A (MPA, 5 mM ammonium acetate, pH = 5.3) and analyzed on a Vanquish™ Flex Quaternary UHPLC system with a Waters HSS T3 column (100 Å, 1.8 µm, 2.1 × 50 mm) at a flow rate of 250 µL/min. The column temperature was held at 30 °C. A gradient consisting of 1% mobile phase B (MPB, 60% H2O, 40% acetonitrile, v:v) for 6.3 min; 2% in 9.2 min; 3% in 16 min; 5% in 21.4 min; 25% in 24.6 min; 50% in 26.9 min; 75% in 30.2 min and hold for 0.3 min; 99% in 35 min and hold for 0.8 min, and reverting MPB to 1% was used for separations. Mass spectral data were obtained on a Thermo Orbitrap Fusion Lumos mass spectrometer in positive polarity with a capillary temperature of 329 °C, a spray voltage of 3.5 kV, and sheath gas, auxiliary gas and sweep gas at 38, 11, and 1 arbitrary units, respectively. The sample was analyzed in the Orbitrap mode with a mass range from m/z 105 to 900 at 120,000 mass resolution, and data-dependent acquisition was used to acquire MS/MS data.

The nucleoside data were acquired by Manasses Jora.
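The nucleoside gradient described above can be written as a table of breakpoints, which makes it easy to look up the mobile phase composition at a given retention time. The sketch below is an illustration under two stated assumptions: each "X% in T min" entry is read as the absolute time at which X% MPB is reached, and the composition changes linearly between breakpoints. The final return to 1% MPB is not modeled because its timing is not given. GRADIENT and percent_b are hypothetical names.

```python
# (time in min, % MPB) breakpoints; holds are encoded as repeated percentages.
GRADIENT = [
    (0.0, 1.0), (6.3, 1.0), (9.2, 2.0), (16.0, 3.0), (21.4, 5.0),
    (24.6, 25.0), (26.9, 50.0), (30.2, 75.0), (30.5, 75.0),
    (35.0, 99.0), (35.8, 99.0),
]


def percent_b(t_min):
    """Linearly interpolate % MPB at time t_min (assumed gradient reading)."""
    if 0.0 > t_min or t_min > GRADIENT[-1][0]:
        raise ValueError("time outside the tabulated gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t1 >= t_min:
            # First segment whose end is at or after t_min contains t_min.
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
```

Under this reading, a nucleoside eluting at 23.0 min would elute at about 15% MPB, midway through the 5% to 25% ramp.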

The RNase T1 digestion products were rehydrated in mobile phase A (MPA, 200 mM HFIP, 8.15 mM TEA, pH = 7.0) and separated on a Waters XBridge C18 column (3.5 µm, 1 × 150 mm) with a gradient of 1% mobile phase B (MPB, 50% MPA, 50% methanol, v:v) to 20% in 5 min; 20% to 95% B in 43 min; hold 95% for 5 min, followed by re-equilibration for another 15 min at 5%. Mass spectral data were acquired on a Thermo LTQ-XL mass spectrometer in negative polarity with a capillary temperature of 275 °C, a spray voltage of 4 kV, and sheath gas, auxiliary gas and sweep gas at 40, 10, and 10 arbitrary units, respectively. Samples were analyzed by data-dependent acquisition over a mass range of m/z 500 to 2000.

4.2.3 Homologous Proteins Search for Modifying Enzyme

The homologous protein search was performed using the PSI-BLAST online search tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi)103. The query sequence was that of an enzyme known to catalyze a specific modification. If the modifying enzyme had been found in Archaea, the archaeal sequence was used. If the modifying enzyme had not been found in Archaea, the enzyme from one of the other two domains (Bacteria and Eukarya) was used. Protein candidates were selected based on the E value; a lower E value indicates a closer homolog, which is more likely to serve a function similar to that of the query protein.

4.3 Results and Discussion

4.3.1 Identification of Modified Nucleosides in Selected Archaeal Organisms

The intact mass of each modified nucleoside and the CID fragments of its precursor ion were used for identification of modified nucleosides. HCD MS2 was also used to differentiate positional isomers of nucleoside modifications24. Tables 4.2-4.6 list the modified nucleosides characterized in M. kandleri, M. jannaschii, M. marburgensis, S. acidocaldarius and T. tenax, respectively. All identified nucleosides in tRNA from these five archaeal organisms are summarized in Table 4.7.
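To illustrate how intact mass and CID fragments together support an assignment, the following Python sketch computes the protonated monoisotopic mass of a nucleoside and the base ion left after neutral loss of the sugar. It is a simplified illustration, not the data processing used in this work. The elemental formulas and monoisotopic atomic masses are standard values supplied here rather than taken from the text, and all names are hypothetical. The example shows why the isomers Cm and m5C share one precursor mass but give different CID fragments: Cm loses a methylated ribose, whereas m5C loses an unmodified ribose.

```python
# Monoisotopic atomic masses (u) and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
        "O": 15.9949146, "S": 31.9720707}
PROTON = 1.00727646

RIBOSE_LOSS = {"C": 5, "H": 8, "O": 4}         # neutral loss of ribose
METHYLRIBOSE_LOSS = {"C": 6, "H": 10, "O": 4}  # neutral loss of 2'-O-methylribose

CYTIDINE = {"C": 9, "H": 13, "N": 3, "O": 5}
CM = {"C": 10, "H": 15, "N": 3, "O": 5}   # 2'-O-methylcytidine
M5C = {"C": 10, "H": 15, "N": 3, "O": 5}  # 5-methylcytidine (isomer of Cm)


def mono_mass(formula):
    """Monoisotopic mass of an elemental formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def protonated_mz(formula):
    """m/z of the singly protonated molecule."""
    return mono_mass(formula) + PROTON


def base_ion_mz(formula, sugar_loss=RIBOSE_LOSS):
    """m/z of the protonated nucleobase after neutral loss of the sugar."""
    return protonated_mz(formula) - mono_mass(sugar_loss)


def ppm_error(observed, calculated):
    """Mass error in parts per million."""
    return (observed - calculated) / calculated * 1e6
```

With these values, cytidine is calculated at m/z 244.0928, within 1 ppm of the 244.0927 listed in Table 4.2. The base ions of Cm and m5C round to m/z 112 and 126, in agreement with the CID fragments tabulated for these two isomers.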

Table 4.2 Identified nucleosides in M. kandleri.

Nucleoside | RT(min) | M+H | CID Fragments | HCD Fragments
C | 1.97 | 244.0927 | 112 |
U | 2.63 | 245.0767 | 113 |
Ψ | 1.39 | 245.0768 | 209/179/155 |
m5C | 4.14 | 258.1084 | 126 | 126/109/83/81/56
Cm | 5.22 | 258.1084 | 112 |
Um | 9.97 | 259.0924 | 113 |
m1Ψ | 3.96 | 259.0924 | 229/200/179/169 |
s2C | 3.09 | 260.0698 | 128 |
A | 18.49 | 268.1037 | 136 |
I | 8.88 | 269.0878 | 137 |
Am | 25.69 | 282.1196 | 136 |
m1A | 5.97 | 282.1197 | 150 | 150/133/109
m6A | 27.49 | 282.1196 | 150 | 150/123/94
m1I | 15.71 | 283.1036 | 151 |
G | 7.27 | 284.0989 | 152 |
ac4C | 17.28 | 286.1033 | 154/112 |
Gm | 15.22 | 298.1144 | 152 |
m1G | 16.05 | 298.1144 | 166 | 167/166/153/149/128/109/67
m2G | 17.92 | 298.1144 | 166 | 167/153/149/128/110/57
ac4Cm | 26.89 | 300.1189 | 154/112 |
m2,2G | 24.8 | 312.13 | 180 |
m2Gm | 27.57 | 312.13 | 166 | 167/153/149/128/110/57
G+ | 27.2 | 325.1251 | 193 |
m2,2Gm | 27.86 | 326.1457 | 180 |
C+ | 30.04 | 356.2043 | 224 |
t6A | 27.24 | 413.1412 | 281 |
hn6A | 28.53 | 427.157 | 295 | 162/136/119
ms2t6A | 28.72 | 459.129 | 327 |
OHyWy | 27.32 | 467.1884 | 335 |
ms2hn6A | 29.58 | 473.1448 | 341 |

Table 4.3 Identified nucleosides in M. jannaschii.

Nucleoside | RT(min) | M+H | CID Fragments | HCD Fragments
C | 1.93 | 244.0927 | 112 |
U | 2.54 | 245.0766 | 113 |
Ψ | 1.29 | 245.0766 | 209/179/155 |
m5C | 4.12 | 258.1082 | 126 | 126/109/83/81/56
Cm | 5.19 | 258.1082 | 112 |
Um | 10.06 | 259.0923 | 113 |
m1Ψ | 3.50 | 259.0923 | 223/193/169 |
s2C | 3.1 | 260.0697 | 128 |
s4U | 5.47 | 261.0538 | 129 | 129/112/86/69/59
A | 18.33 | 268.1039 | 136 |
Am | 25.68 | 282.1195 | 136 |
m1A | 6.51 | 282.1195 | 150 | 150/133/109
m6A | 27.46 | 282.1192 | 150 | 150/123/94
m1I | 15.64 | 283.1034 | 151 |
cnm5U | 5.09 | 284.0874 | 152 |
G | 7.14 | 284.0987 | 152 |
mnm5U | 2.62 | 288.1187 | 257/239/156/125 |
Gm | 15.18 | 298.1143 | 152 |
m1G | 16.12 | 298.1143 | 166 | 167/166/153/149/128/109/67
m2G | 17.83 | 298.1144 | 166 | 167/153/149/128/110/57
cnm5s2U | 9.81 | 300.0647 | 168 |
mnm5s2U | 3.38 | 304.0958 | 273/255/172/141 |
m2,2G | 25.55 | 312.13 | 180 |
m2Gm | 27.55 | 312.1298 | 166 | 167/153/149/128/110/57
imG-14 | 27.98 | 322.1143 | 190 |
G+ | 27.2 | 325.1252 | 193 |
m2,2Gm | 28.64 | 326.1455 | 180 |
imG | 28.47 | 336.1299 | 204 |
mimG | 30.11 | 350.1455 | 218 |
C+ | 29.71 | 356.2035 | 224 |
t6A | 27.17 | 413.1413 | 281 |
hn6A | 28.44 | 427.1569 | 295 |
ms2t6A | 28.64 | 459.1291 | 327 |
ms2hn6A | 29.52 | 473.1445 | 341 |

Table 4.4 Identified nucleosides in M. marburgensis.

Nucleoside | RT(min) | M+H | CID Fragments | HCD Fragments
C | 1.93 | 244.0925 | 112 |
U | 2.73 | 245.0766 | 113 |
Ψ | 1.23 | 245.0765 | 209/179/155 |
m5C | 4.53 | 258.108 | 126 | 126/109/83/81/56
Cm | 6.04 | 258.1082 | 112 |
m1Ψ | 1.47 | 259.0923 | 223/193/169 |
s4U | 8.4 | 261.0537 | 129 | 129/112/86/69/59
A | 18.46 | 268.1038 | 136 |
m1A | 4.65 | 282.1195 | 150 | 150/133/109
m6A | 27.49 | 282.1195 | 150 | 150/123/94
m1I | 15.73 | 283.1034 | 151 |
cnm5U | 5.78 | 284.0875 | 152 |
G | 7.39 | 284.0988 | 152 |
mnm5U | 2.32 | 288.1188 | 257/239/156/125 |
m1G | 16.09 | 298.1143 | 166 | 167/166/153/149/128/109/67
m2G | 17.98 | 298.1144 | 166 | 167/153/149/128/110/57
mnm5s2U | 5.69 | 304.0957 | 273/255/172/141 |
m2,2G | 25.57 | 312.1299 | 180 |
imG-14 | 27.98 | 322.1142 | 190 |
G+ | 27.20 | 325.1250 | 193 |
imG | 29.24 | 336.1297 | 204 |
C+ | 29.68 | 356.2036 | 224 |
t6A | 27.23 | 413.1411 | 281 |
yW-86 | 26.58 | 423.1619 | 291 |
ms2t6A | 28.69 | 459.1288 | 327 |

Table 4.5 Identified nucleosides in S. acidocaldarius.

Nucleoside | RT(min) | M+H | CID Fragments | HCD Fragments
C | 1.91 | 244.0927 | 112 |
U | 2.7 | 245.0766 | 113 |
Ψ | 1.27 | 245.0767 | 209/179/155 |
m5C | 4.28 | 258.1084 | 126 | 126/109/83/81/56
Cm | 5.93 | 258.1083 | 112 |
Um | 10.3 | 259.0923 | 113 |
m1Ψ | 2.97 | 259.0923 | 223/193/169 |
s2U | 7.23 | 261.0537 | 129 | 112/84/70/60
A | 18.4 | 268.1039 | 136 |
Am | 25.71 | 282.1194 | 136 |
m1A | 4.61 | 282.1195 | 150 | 150/133/109
m6A | 27.47 | 282.1195 | 150 | 150/123/94
m1I | 15.59 | 283.1035 | 151 |
G | 7.36 | 284.0988 | 152 |
ac4C | 17.24 | 286.1031 | 154/112 |
Gm | 15.22 | 298.1144 | 152 |
m1G | 16.1 | 298.1145 | 166 | 167/166/153/149/128/109/67
m2G | 17.95 | 298.1143 | 166 | 167/153/149/128/110/57
m2,2G | 25.53 | 312.1299 | 180 |
G+ | 27.19 | 325.1251 | 193 |
m2,2Gm | 28.65 | 326.1455 | 180 |
mcm5Um | 26.91 | 331.1134 | 185 |
imG | 29.23 | 336.1298 | 204 |
mimG | 29.67 | 350.1457 | 218 |
C+ | 29.78 | 356.2038 | 224 |
t6A | 27.24 | 413.1411 | 281 |

Table 4.6 Identified nucleosides in T. tenax.

Nucleoside | RT(min) | M+H | CID Fragments | HCD Fragments
C | 1.9 | 244.0927 | 112 |
U | 2.59 | 245.0767 | 113 |
Ψ | 1.3 | 245.0767 | 209/179/155 |
m5C | 4.14 | 258.1083 | 126 | 126/109/83/81/56
Cm | 5.22 | 258.1083 | 112 |
Um | 10.15 | 259.0923 | 113 |
m1Ψ | 2.96 | 259.0923 | 223/193/169 |
s4U | 5.69 | 261.0537 | 129 | 129/112/86/69/59
A | 18.48 | 268.1038 | 136 |
s2Um | 26.48 | 275.0695 | 129 |
m1A | 6.66 | 282.1194 | 150 | 150/133/109
m6A | 27.5 | 282.1196 | 150 | 150/123/94
m1I | 15.77 | 283.1035 | 151 |
G | 7.22 | 284.0987 | 152 |
ac4C | 17.34 | 286.1031 | 154/112 |
Gm | 15.24 | 298.1143 | 152 |
m1G | 16.17 | 298.1145 | 166 | 167/166/153/149/128/109/67
m2G | 17.98 | 298.1143 | 166 | 167/153/149/128/110/57
preQ0tRNA | 24.42 | 308.0988 | 176 |
m2,2G | 25.56 | 312.13 | 180 |
m2Gm | 27.55 | 312.1301 | 166 | 167/153/149/128/110/57
G+ | 27.2 | 325.1252 | 193 |
m2,2Gm | 28.65 | 326.1456 | 180 |
mcm5Um | 26.91 | 331.1133 | 185 |
mcm5s2U | 26.25 | 333.0749 | 201/169 |
imG | 29.22 | 336.1299 | 204 |
mimG | 29.66 | 350.1458 | 218 |
C+ | 29.56 | 356.2039 | 224 |
t6A | 27.16 | 413.1413 | 281 |
ms2t6A | 28.65 | 459.1289 | 327 |

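Table 4.7 The comparison of identified nucleosides in tRNAs from all five archaeal organisms.

Euryarchaeota: M. kandleri, M. jannaschii, M. marburgensis. Crenarchaeota: S. acidocaldarius, T. tenax.

M. kandleri | M. jannaschii | M. marburgensis | S. acidocaldarius | T. tenax
A | A | A | A | A
Am | Am | - | Am | -
m1A | m1A | m1A | m1A | m1A
m6A | m6A | m6A | m6A | m6A
I | - | - | - | -
m1I | m1I | m1I | m1I | m1I
t6A | t6A | t6A | t6A | t6A
ms2t6A | ms2t6A | ms2t6A | - | ms2t6A
hn6A | hn6A | - | - | -
ms2hn6A | ms2hn6A | - | - | -
G | G | G | G | G
Gm | Gm | - | Gm | Gm
m1G | m1G | m1G | m1G | m1G
m2G | m2G | m2G | m2G | m2G
m2,2G | m2,2G | m2,2G | m2,2G | m2,2G
m2,2Gm | m2,2Gm | - | m2,2Gm | m2,2Gm
m2Gm | m2Gm | - | - | m2Gm
- | - | - | - | preQ0tRNA
G+ | G+ | G+ | G+ | G+
- | imG-14 | imG-14 | - | -
- | imG | imG | imG | imG
- | mimG | - | mimG | mimG
OHyWy | - | - | - | -
- | - | yW-86 | - | -
C | C | C | C | C
Cm | Cm | Cm | Cm | Cm
m5C | m5C | m5C | m5C | m5C
s2C | s2C | - | - | -
ac4C | - | - | ac4C | ac4C
ac4Cm | - | - | - | -
C+ | C+ | C+ | C+ | C+
U | U | U | U | U
Ψ | Ψ | Ψ | Ψ | Ψ
m1Ψ | m1Ψ | m1Ψ | m1Ψ | m1Ψ
Um | Um | - | Um | Um
- | - | - | s2U | -
- | - | - | - | s2Um
- | s4U | s4U | - | s4U
- | cnm5U | cnm5U | - | -
- | cnm5s2U | - | - | -
- | mnm5U | mnm5U | - | -
- | mnm5s2U | mnm5s2U | - | -
- | - | - | mcm5Um | mcm5Um
- | - | - | - | mcm5s2U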

M. jannaschii has been studied before by McCloskey and co-workers32, and in my study I have identified modified nucleosides that are in common with that study. However, I also identified two modified nucleosides, 5-cyanomethyluridine (cnm5U) and 5-cyanomethyl-2-thiouridine (cnm5s2U), which had not been found in M. jannaschii before, as noted in Chapter 3.

4.3.2 Modification Mapping of Selected Archaeal Organisms

The modification mapping results of selected archaeal organisms are shown in Figures 4.10 - 4.14, and Appendix Table A1 lists the digestion products with modifications found in selected archaeal organisms. Modified nucleosides whose positions were localized are colored black in the consensus modification sequences. Some digestion products are signature digestion products (SDPs)64-66; in this case the modified nucleoside can be localized within the digestion product and mapped back to only a single tRNA sequence. Because of the high similarity of some tRNA sequences, some digestion products are not SDPs, and they can be placed onto multiple sequences. However, if the modification in the digestion product maps to the same position on multiple tRNA sequences, I can still identify the position of this modification. For example, A[G+]CCAGp was found in T. tenax tRNAs, and it can be mapped back to position 15 of tRNA-Leu(CAA), tRNA-Leu(CAG), tRNA-Leu(GAG), tRNA-Leu(UAA), and tRNA-Leu(UAG). In this case, G+ cannot be mapped back to a single tRNA, but I can still conclude that G+ is present at position 15 of T. tenax tRNAs.
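The distinction between signature and non-signature digestion products can be illustrated with a short in silico digest. The Python sketch below is a deliberately simplified illustration and not the annotation tool developed in this work: it assumes RNase T1 cleaves after every unmodified G, ignores modified residues and the 3′-phosphate, and uses invented toy sequences. The function names are hypothetical.

```python
from collections import defaultdict


def rnase_t1_digest(seq):
    """Cleave an RNA sequence on the 3' side of every G (simplified RNase T1)."""
    products, start = [], 0
    for i, nucleotide in enumerate(seq):
        if nucleotide == "G":
            products.append(seq[start:i + 1])
            start = i + 1
    if len(seq) > start:  # trailing fragment that does not end in G
        products.append(seq[start:])
    return products


def signature_products(trnas):
    """Return {digestion product: tRNA name} for products unique to one tRNA."""
    owners = defaultdict(set)
    for name, seq in trnas.items():
        for product in rnase_t1_digest(seq):
            owners[product].add(name)
    return {product: next(iter(names))
            for product, names in owners.items() if len(names) == 1}


# Toy sequences (not real tRNAs) that differ at a single nucleotide.
toy_trnas = {"tRNA_a": "ACGUUAGCCG", "tRNA_b": "ACGUCAGCCG"}
toy_signatures = signature_products(toy_trnas)
```

In this toy case the products ACG and CCG occur in both sequences and are therefore not SDPs, while UUAG and UCAG each map to a single sequence.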

4.3.3 Homologous Proteins Search Result

Table 4.8 lists the homologs of tRNA modifying enzymes for the modifications detected in the five archaeal organisms.

4.3.4 Guide RNAs Directed tRNA 2′-O-methyl Modification

In Archaea, guide RNAs can work with enzymes to catalyze the biosynthesis of modifications at specific positions on tRNAs. The 2′-O-methylation of tRNA and rRNA can be directed by the C/D box sRNA guide mechanism104-106. The C/D box sRNAs have four conserved sequence elements: box C and C′ (RUGAUGA) and box D and D′ (CUGA). The C and D elements are located at the 5′ and 3′ ends, and the D′ and C′ elements are located in the center of the molecule, as shown in Figure 4.1. The guide region is between box C and D′ or between box C′ and D. The complementarity between the guide region and the target tRNA sequence is about 10 base pairs in length107.
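The box motifs and the guide-target complementarity described above lend themselves to a simple sequence search. The following Python sketch is an illustration only; the alignments in this work were done manually. It assumes strict Watson-Crick pairing (no G-U wobble pairs), treats R as A or G, and uses a 10 nt window as the minimum complementary stretch. All function names and the test sequences are hypothetical.

```python
import re

# Lookahead patterns so that overlapping motif occurrences are all reported.
BOX_C = re.compile(r"(?=[AG]UGAUGA)")  # box C / C' consensus RUGAUGA
BOX_D = re.compile(r"(?=CUGA)")        # box D / D' consensus CUGA

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def find_boxes(srna):
    """Return 0-based start positions of box C-like and box D-like motifs."""
    return {
        "C": [m.start() for m in BOX_C.finditer(srna)],
        "D": [m.start() for m in BOX_D.finditer(srna)],
    }


def reverse_complement(rna):
    """Watson-Crick reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[n] for n in reversed(rna))


def guide_pairs_target(guide, target, min_len=10):
    """True if at least min_len contiguous guide nucleotides can pair with target."""
    rc = reverse_complement(guide)
    return any(rc[i:i + min_len] in target
               for i in range(len(rc) - min_len + 1))


# Invented sRNA: box C, a 12 nt guide region, then box D.
toy_guide = "CCGUACGUAGCU"
toy_srna = "G" + "AUGAUGA" + toy_guide + "CUGA" + "AA"
toy_target = "GG" + reverse_complement(toy_guide) + "UU"
```

A search of this kind only proposes candidates; a hit indicates that pairing is possible and does not by itself show that the sRNA directs methylation at the paired position.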

Figure 4.1 The structure of C/D box

Table 4.8 The homologs of tRNA modifying enzyme found in each organism. The Ref organism column lists the reference organism used for protein homolog search. The Accession ID column lists the accession number of each protein from reference organism in NCBI protein database. The accession ID in NCBI protein database is used for the homolog protein search result.

Euryarchaeota: M. kandleri, M. jannaschii, M. marburgensis. Crenarchaeota: S. acidocaldarius, T. tenax.

Modification | Position | Protein | Ref organism | Accession ID | M. kandleri | M. jannaschii | M. marburgensis | S. acidocaldarius | T. tenax
Am | 4 | Trm13 | S. cerevisiae | NP_014516 | - | - | - | - | -
m1A | 9 | Trm10p | S. acidocaldarius | WP_011278489 | - | - | - | WP_011278489 | -
 | 57/58 | TrmI | P. abyssi | WP_010867553 | AAM02628 | WP_010869627 | WP_013294844 | WP_011277713 | WP_014126096
m6A | 37 | TrmM | E. coli | P31825 | - | - | - | - | -
I | 34 | TadA | E. coli | 1Z3A_B | - | - | - | - | -
m1I | 57 | HVO_2747 | H. volcanii | ADE02888 | AAM02871 | WP_010869529 | WP_013296490 | WP_011277518 | WP_014127944
t6A | 37 | Sua5 | P. abyssi | Q9UYB2 | AAM01850 | WP_010869554 | WP_013295105 | WP_011278455 | WP_014126903
 | | KEOPS | P. abyssi | Q9UXT7 | Q8TVD4 | WP_010870641 | WP_013294854 | WP_011277720 | WP_014126820
ms2t6A | 37 | MtaB | B. subtilis | P54462 | AAM02294 | Q58277 | WP_013296034 | WP_011277826 | WP_014126571
hn6A | 37 | - | - | - | - | - | - | - | -
ms2hn6A | 37 | - | - | - | - | - | - | - | -
Gm | 18 | Trm3 | S. cerevisiae | Q07527 | - | - | - | - | -
 | 18 | TrmH | E. coli | P0AGJ2 | - | - | - | - | -
 | 34 | Trm7 | S. cerevisiae | P38238 | - | - | - | - | -
m1G | 37 | aTrm5 | M. jannaschii | WP_010870397 | AAM01814 | WP_010870397 | WP_013295271 | WP_011277456 | WP_014127564
m2G | 6 | TrmN | M. jannaschii | WP_010869937 | - | WP_010869937 | - | - | -
 | 10 | Trm-G10 | P. abyssi | Q9UY84 | AAM02107 | WP_010870216 | WP_013295936 | WP_011278127 | -
 | 26 | Trm1 | P. horikoshii | WP_010885888 | Q8TYY7 | WP_010870460 | WP_013296356 | WP_011278180 | WP_014127357
m2,2G | 10 | Trm-G10 | P. abyssi | Q9UY84 | AAM02107 | WP_010870216 | WP_013295936 | WP_011278127 | -
 | 26 | Trm1 | P. horikoshii | WP_010885888 | Q8TYY7 | WP_010870460 | WP_013296356 | WP_011278180 | WP_014127357
m2,2Gm | 26 | - | - | - | - | - | - | - | -
m2Gm | - | - | - | - | - | - | - | - | -
preQ0tRNA | 15 | TgtA | H. volcanii | BAB40327 | Q8TYV3 | Q57878 | WP_013295448 | WP_015385469 | WP_014127801
G+ | 17 | arcTgt | T. gammatolerans | ACS33643 | Q8TYV3 | Q57878 | WP_013295448 | WP_015385469 | WP_014127801
 | 15 | ArcS | M. jannaschii | Q58428 | AAM02297 | Q58428 | WP_013295079 | - | -
 | 15 | QueF-L | P. calidifontis | WP_011848915 | - | - | - | - | WP_014127843
imG-14 | 37 | Taw1 | M. jannaschii | WP_010869754 | AAM01272 | WP_010869754 | WP_013296225 | WP_011278168 | WP_014127892
imG | 37 | Taw3 | M. jannaschii | Q58905 | AAM01265 | WP_010871033 | - | - | -
 | 37 | Taw3 | S. solfataricus | Q9UX16 | - | - | - | WP_011278418 | CCC81358
mimG | 37 | Taw3 | M. jannaschii | Q58905 | AAM01265 | WP_010871033 | - | - | -
 | 37 | Taw3 | S. solfataricus | Q9UX16 | - | - | - | WP_011278418 | CCC81358
OHyWy | 37 | - | - | - | - | - | - | - | -
yW-86 | 37 | Taw2 | M. jannaschii | Q58952 | AAM02153 | Q58952 | WP_013296367 | WP_011277456 | WP_014127564
Cm | 32 | TrmJ | S. acidocaldarius | Q4JB16 | AAM01734 | Q58871 | WP_013295247 | Q4JB16 | -
 | 34 | Fibrillarin | S. solfataricus | WP_009992372 | Q8TXU9 | WP_010870202 | WP_013296389 | WP_011278183 | WP_014127490
 | 34 | L7Ae | S. solfataricus | WP_010922882 | Q8TV03 | WP_010870715 | WP_013295524 | ALU30318 | WP_014127866
 | 34 | Nop5p | S. solfataricus | WP_010923226 | AAM01774 | WP_010870199 | WP_013296388 | WP_011278184 | WP_052883189
 | 34 | TrmL | E. coli | NP_418063 | - | - | - | - | -
 | 56 | aTrm56 | P. abyssi | WP_010868694 | Q8TX72 | WP_010870902 | WP_048901304 | WP_011277539 | -
m5C | 40/48/49 | Trm4 | M. jannaschii | Q60343 | AAM01585 | Q60343 | - | WP_015385477 | CCC81393
 | 49 | PAB1947 | P. abyssi | WP_010867748 | Q8TYR4 | WP_010869516 | WP_013295021 | WP_011277459 | WP_014127403
s2C | 32 | TtcA | E. coli | P76055 | - | - | - | - | -
ac4C | 34 | TmcA | H. volcanii | ADE03866 | Q8TYZ5 | - | - | WP_015385426 | WP_014127336
ac4Cm | 34 | - | - | - | - | - | - | - | -
C+ | 34 | TiaS | A. fulgidus | O28025 | Q8TWM9 | Q58495 | WP_013296151 | AHC51104 | CCC81213
Ψ | 55 | Cbf5 | H. volcanii | ADE03180 | Q8TZ08 | WP_010869643 | WP_013295341 | WP_080504008 | WP_014127730
 | 38/39/40 | TruA | H. volcanii | WP_013035360 | Q8TWZ3 | WP_010871199 | WP_013296047 | - | WP_014126012
 | 54/55 | Pus10 | M. jannaschii | WP_064496374 | Q8TUV7 | WP_064496374 | WP_013296487 | AHC51579 | WP_014126534
m1Ψ | 54 | TrmY | M. jannaschii | WP_010871164 | Q8TWL5 | WP_010871164 | - | - | -
Um | 44 | Trm44 | S. cerevisiae | NP_015295 | - | - | - | - | -
s2U | 34 | Ncs6 | M. maripaludis | CAF30912 | AAM02316 | WP_010870671 | WP_013295151 | WP_011278385 | WP_014126840
s2Um | - | - | - | - | - | - | - | - | -
s4U | 8 | Thil | M. maripaludis | CAF30910 | AAM02121 | WP_064496669 | WP_013295099 | WP_011279120 | WP_052883094
cnm5U | 34 | - | - | - | - | - | - | - | -
cnm5s2U | 34 | - | - | - | - | - | - | - | -
mnm5U | 34 | MnmE | E. coli | P25522 | - | - | - | - | -
 | 34 | MnmG | E. coli | P0A6U3 | - | - | - | - | -
 | 34 | MnmC | E. coli | P77182 | - | - | - | - | -
 | 34 | DUF752 | A. aeolicus | 3VYW_D | - | Q58084 | WP_013295447 | - | -
mnm5s2U | 34 | MnmA | E. coli | P25745 | - | - | - | - | -
 | 34 | IscS | E. coli | P0A6B7 | - | - | WP_013296553 | - | -
mcm5Um | 34 | ELP3 | S. cerevisiae | Q02908 | AAM02265 | WP_064496733 | ADL58804 | WP_015385536 | WP_052883303
 | 34 | Trm9 | S. cerevisiae | P49957 | - | - | - | - | -
 | 34 | Alkbh8 | Homo sapiens | NP_001287939 | - | - | - | - | -
mcm5s2U | 34 | Ctu1 | S. pombe | NP_596064 | AAM02316 | WP_010870671 | WP_013295151 | WP_011278385 | WP_014126840

The 2′-O-methylation of nucleotides C34 and U39 in H. volcanii pre-tRNA-Trp requires the C/D box ribonucleoprotein (RNP), which consists of C/D box sRNAs and core proteins104. To complete the 2′-O-methylation, the C/D box guide RNAs need to work with three proteins: Fibrillarin, L7Ae, and Nop5p108-109, and these three proteins are found in all five archaeal species as shown in Table 4.8. Many C/D box sRNAs have been identified in different archaeal species110-114, including M. kandleri, M. jannaschii, S. acidocaldarius, and T. tenax. The guide regions of the detected C/D box sRNAs from these species were manually aligned to their tRNA sequences. Figures 4.2 to 4.6 show possible guide RNA candidates for the 2′-O-methylcytidine at position 34 of target tRNAs.

M. kandleri

A total of 126 C/D box sRNA genes were found in M. kandleri by Su and co-workers113, and the targets of these C/D box sRNAs include not only tRNAs but also rRNAs and other non-coding RNAs. I manually checked the C/D box sRNAs against the target tRNA sequences and found that the anticodon loop of tRNA-iMet(CAU) is able to pair with the guide region of one of the C/D box sRNAs as shown in Figure 4.2, which might produce the 2′-O-methylcytidine at position 34.

Figure 4.2 The predicted C/D box sRNA pairing with M. kandleri tRNA-iMet(CAU); position 34 of the tRNA is colored red.

M. jannaschii

An M. jannaschii C/D box sRNA was reported by Suryadi and co-workers112; it binds to the L7Ae protein and guides 2′-O-methylation. I found that the guide region of this C/D box sRNA can pair with the anticodon region of tRNA-Met(CAU) as shown in Figure 4.3. This suggests that the 2′-O-methylation at position 34 of M. jannaschii tRNA-Met(CAU) might be guided by this C/D box sRNA.

Figure 4.3 The predicted C/D box sRNA pairing with M. jannaschii tRNA-Met(CAU); position 34 of the tRNA is colored red.

S. acidocaldarius

The C/D box sRNA genes of S. acidocaldarius were reported by Tripp and co-workers114. One C/D box sRNA can pair with the anticodon region of S. acidocaldarius tRNA-Trp(CCA), and it might produce the 2′-O-methylcytidine at position 34 as shown in Figure 4.4. I also identified another C/D box sRNA, which can pair with the anticodon region of S. acidocaldarius tRNA-Met(CAU) and yield the 2′-O-methylcytidine at position 34 as shown in Figure 4.5. CU[Cm]AUAACCCUGp was found during mapping with RNase T1, suggesting that Cm34 in S. acidocaldarius tRNA-Met(CAU) is likely due to a guide RNA mechanism.

Figure 4.4 The predicted C/D box sRNA pairing with S. acidocaldarius tRNA-Trp(CCA); position 34 of the tRNA is colored red.

Figure 4.5 The predicted C/D box sRNA pairing with S. acidocaldarius tRNA-Met(CAU); position 34 of the tRNA is colored red.

T. tenax

A total of 52 C/D box sRNA genes of T. tenax were identified by Tripp and co-workers114. One C/D box sRNA is able to pair with the anticodon region of T. tenax tRNA-Trp(CCA) as shown in Figure 4.6. This result might indicate that position 34 of T. tenax tRNA-Trp(CCA) could be modified to 2′-O-methylcytidine.

Figure 4.6 The predicted C/D box sRNA pairing with T. tenax tRNA-Met(CAU); position 34 of the tRNA is colored red.

4.3.5 Discussion

To better understand archaeal tRNA modifications, the modified nucleosides identified here are discussed in terms of positions, modifying enzymes, and guide RNAs (if applicable). Based on the homolog protein search results for modifying enzymes found in different species, the positions of some unlocalized modifications can be predicted. The modifying enzymes for hn6A and ms2hn6A have not been reported, so these two modifications are not discussed.

4.3.5.1 Modified Adenosine

2′-O-methyladenosine (Am)

2′-O-methyladenosine was identified in M. kandleri, M. jannaschii, and S. acidocaldarius, but the position was not localized. In S. cerevisiae, Am formation is catalyzed by Trm13 at position 4, which is in the middle of the acceptor stem115. This modification is widely conserved in Eukaryotes; however, no homolog of Trm13 was found in the five Archaea. Am and the modifying enzyme have not yet been identified in any bacterial tRNA.

1-methyladenosine (m1A)

1-methyladenosine is one of the most conserved modifications in tRNA, and it was found in all five Archaea. In P. abyssi, m1A is introduced by TrmI at positions 57 and 58 92, 116. Homologs of TrmI were found in all five organisms. Modification mapping results show that m1A is found at position 58 in M. jannaschii, S. acidocaldarius and T. tenax tRNAs. m1A can also be introduced by Trm10p at position 9 in S. acidocaldarius117, but no m1A at position 9 was identified in this study.

N6-methyladenosine (m6A)

N6-methyladenosine was identified in tRNA of three domains of life118. m6A was detected in all five Archaea in this study, and it was mapped at position 37 in M. marburgensis tRNAs. The modifying enzyme, TrmM119, which catalyzes the biosynthesis of m6A at position 37 was found in E. coli. However, the homolog of TrmM was not found in any of these five Archaea.

Inosine (I)

Inosine and its 1-methyl derivative m1I were reported to be found at position 34 in Bacteria and Eukarya. TadA catalyzes A to I conversion at position 34 120 in these domains. However, no TadA homolog was found using PSI-BLAST. Inosine was only found in M. kandleri, but its position was not localized by modification mapping. An adenosine deaminase, HVO_2747, was predicted in H. volcanii37; however, it catalyzes the deamination of m1A to produce m1I at position 57.

1-methylinosine (m1I)

1-methylinosine was found in all five Archaea species, and it was identified at position 57 in M. jannaschii. An adenosine deaminase HVO_2747 was predicted in H. volcanii37. It is able to catalyze the deamination of m1A to produce m1I at position 57. Homologs of this predicted deaminase were found in all five organisms, which indicates that m1I might be present at position 57 in all these Archaea tRNAs.

N6-threonylcarbamoyladenosine (t6A) and 2-methylthio-N6-threonylcarbamoyladenosine (ms2t6A)

N6-threonylcarbamoyladenosine is a universal tRNA nucleoside in all three domains, and it is found at position 37 of most tRNAs with A37. It was detected in all five samples by nucleoside analysis. The KEOPS complex proteins in Archaea play an important role in t6A biosynthesis99, and homologs of KEOPS are found in all five archaea. t6A was mapped at position 37 of M. jannaschii, M. marburgensis, S. acidocaldarius and T. tenax tRNAs. t6A is also likely to be present at position 37 of M. kandleri tRNAs, but it was not localized, possibly due to low abundance. t6A can be further modified to ms2t6A by MtaB (B. subtilis71), and ms2t6A was detected in M. kandleri, M. jannaschii, M. marburgensis, and T. tenax. It was mapped at position 37 of M. marburgensis tRNA. Homologs of MtaB were found in all five Archaea, suggesting that ms2t6A might be present at position 37 of M. kandleri, M. jannaschii, and T. tenax tRNAs.

4.3.5.2 Modified Guanosine

2′-O-methylguanosine (Gm)

2′-O-methylguanosine was found in all five Archaea. Trm3 from S. cerevisiae121 and TrmH from Escherichia coli122 are able to catalyze the formation of Gm at position 18. These two enzymes were used for homologous protein search; however, no homologs were identified. Modification mapping shows that Gm was found at position 18 of S. acidocaldarius tRNAs, which indicates that an uncharacterized enzyme might be responsible for Gm18 in Archaea. Trm7-catalyzed Gm was also reported to be found at position 34 of the anticodon loop in S. cerevisiae123; however, the homolog of Trm7 was not found in these archaea. Gm was not mapped at position 34 in these samples.

1-methylguanosine (m1G)

1-methylguanosine was found in all five Archaea, and modification mapping shows that m1G is present at position 37 of M. jannaschii and T. tenax tRNAs. The aTrm5 found in M. jannaschii catalyzes the formation of m1G [124], and PSI-BLAST analysis identified aTrm5 homologs in the other four Archaea. I would expect that m1G is also present at position 37 in the other three samples.

N2-methylguanosine (m2G), N2,N2-dimethylguanosine (m2,2G), N2,N2,2′-O-trimethylguanosine (m2,2Gm), and N2,2′-O-dimethylguanosine (m2Gm)

N2-methylguanosine and N2,N2-dimethylguanosine were found in all five Archaea. Trm14 was found in M. jannaschii to catalyze the m2G modification at position 6 [78]. Trm14 homologs were not found in the other organisms, and modification mapping localized m2G to position 6 in M. jannaschii. m2G and m2,2G at position 10 are catalyzed by Trm-G10 in P. abyssi [125]. Homologs of PAB1283 were found in M. kandleri, M. jannaschii, M. marburgensis, and S. acidocaldarius. m2G was mapped to position 10 of M. jannaschii and M. marburgensis. m2G and m2,2G at position 26 are introduced by Trm1 in P. horikoshii [85]. Trm1 homologs were found in all five archaea, and m2,2G and m2,2Gm were mapped at position 26 of all five Archaea. These results indicate that m2,2G at position 26 is highly conserved in Archaea. m2Gm was detected in M. kandleri, M. jannaschii, and T. tenax tRNAs. There are two possible biosynthesis pathways, either from m2G to m2Gm or from Gm to m2Gm. However, m2Gm was not localized, so its position and biosynthesis remain unknown.

Archaeosine (G+)

Archaeosine is highly conserved in archaeal tRNAs, and it was detected at position 15 of all five Archaea. The biosynthesis pathway of G+ involves multiple enzymes, and it is diversified in different phyla of archaea. In Euryarchaeota, 7-cyano-7-deazaguanine (preQ0) is inserted into tRNA by arcTgt to form a preQ0-tRNA, and preQ0-tRNA is converted to G+ by ArcS [84]. Homologs of arcTgt and ArcS are found in all three Euryarchaeota.

In Crenarchaeota, it is still unknown whether the conversion of the nitrile group to the formamidino group on preQ0 occurs before or after preQ0 is inserted into tRNA [126]. There might be two possible pathways for G+ synthesis in Crenarchaeota, as shown in Figure 4.7. Crenarchaeota lack ArcS homologs, but QueF-L found in P. calidifontis is able to convert the nitrile group on preQ0 [82]. A QueF-L homolog was only found in T. tenax. I also detected preQ0-tRNA in T. tenax nucleosides, which might indicate that the conversion of the nitrile group to the formamidino group on preQ0 occurred after preQ0 was inserted into the tRNA, in a manner consistent with the pathway in Euryarchaeota.

Figure 4.7 The two possible pathways of G+ biosynthesis in Crenarchaeota

Wyosine Derivatives

The wyosine derivatives are diversified in Archaea, and the biosynthesis pathways were predicted in Euryarchaeota and Crenarchaeota as shown in Figure 4.9 [70]. In my study, 4-demethylwyosine (imG-14), wyosine (imG), methylwyosine (mimG), 7-aminocarboxypropyl-demethylwyosine (yW-86), and methylated undermodified hydroxywybutosine (OHyWy) were detected. imG-14 was found at position 37 of M. jannaschii and M. marburgensis. Homologs of Taw1, which converts m1G to imG-14, were found in all five archaea. The conversion of imG to mimG is catalyzed by Taw3, and Taw3 homologs were found. mimG was also mapped at position 37 of T. tenax tRNA, which suggests that this modification is the final product for G37 in T. tenax. yW-86 was found in M. marburgensis, and homologs of Taw2, which is responsible for yW-86 biosynthesis, were found in all five archaea. OHyWy was only detected in M. kandleri. The enzyme for OHyWy biosynthesis is still unknown.

4.3.5.3 Modified Cytidine

2′-O-methylcytidine (Cm)

2′-O-methylcytidine is found at various positions in tRNAs. TrmJ, found in S. acidocaldarius, is able to catalyze 2′-O-methylcytidine formation at position 32 [94]. A homolog of TrmJ was found in M. kandleri, M. jannaschii, and M. marburgensis, but not in T. tenax. Modification mapping data show that Cm32 was found in M. kandleri, M. jannaschii, M. marburgensis, and S. acidocaldarius but not in T. tenax. Thus it is likely that the TrmJ homologs are responsible for Cm32 in these organisms.

Cm34 in archaeal tRNAs can be introduced by C/D box sRNAs associated with three proteins, fibrillarin, Nop5, and L7Ae [109, 127], and homologs of these proteins were highly conserved in all five Archaea. The C/D box sRNAs were found in M. kandleri, M. jannaschii, S. acidocaldarius, and T. tenax. Modification mapping identified CU[Cm]AUAACCCUGp and CU[Cm]CAGp in S. acidocaldarius tRNA-Met(CAU) and tRNA-Trp(CCA), respectively. The guide regions of the detected C/D box sRNAs are only able to pair with the anticodon loops of these two tRNA sequences, as shown in Figures 4.4 and 4.5, which indicates that Cm at position 34 of S. acidocaldarius tRNA-Met(CAU) and tRNA-Trp(CCA) might be due to the guide RNA mechanism. TrmL in E. coli can catalyze 2′-O-methylation at tRNA position 34 [128]. TrmL homologs were not found in any of these five Archaea; therefore, the enzyme for Cm34 modification in M. kandleri, M. jannaschii, and T. tenax remains unknown.

Cm was also found at position 56 of P. abyssi tRNAs. aTrm56 is responsible for the Cm56 modification, whereas in crenarchaeons the C56 2′-O-methylase activity is reported to be provided by a C/D guide sRNP [105]. The aTrm56 homolog was identified in M. kandleri, M. jannaschii, M. marburgensis, and S. acidocaldarius, but modification mapping did not localize Cm to position 56.

5-methylcytidine (m5C)

5-methylcytidine is a common modified nucleoside in archaeal tRNAs, and it is usually found at positions 40, 48, and 49. A Trm4 homolog from M. jannaschii, MJ0026, was reported to introduce m5C at positions 40/48/49 [86], and PAB1947 reported in P. abyssi catalyzes m5C formation at several positions within tRNAs but prefers cytidine at position 49 [129]. Nucleoside analysis data show that all five samples have m5C. The modification mapping data indicate that m5C is found at positions 40/48/49 of M. kandleri, 48/49 of M. jannaschii, 48 of M. marburgensis, 48/49 of S. acidocaldarius, and 48 of T. tenax. PSI-BLAST shows that only M. marburgensis lacks a Trm4 homolog, but it does contain a homolog of PAB1947. The other four Archaea have both the Trm4 homolog and the homolog of PAB1947 for m5C modification. I would expect that m5C at position 48 of M. marburgensis tRNAs is introduced only by the homolog of PAB1947. For the other four archaea, the biosynthesis of m5C might be catalyzed by Trm4, by the homolog of PAB1947, or by both. This suggests that there is more than one way to catalyze the biosynthesis of m5C in these four Archaea, and m5C might be important for Archaea. More experiments (for example, gene knockout) are needed to confirm the modifying enzyme of m5C.

2-thiocytidine (s2C)

2-thiocytidine is found at position 32 of M. kandleri and M. jannaschii tRNAs. The s2C modification enzyme, TtcA, was reported in E. coli [130]; however, PSI-BLAST analysis finds no homolog of TtcA in any of these archaea. At this time, the enzyme responsible for s2C in these two archaeal organisms remains unidentified.

N4-acetylcytidine (ac4C)

N4-acetylcytidine (ac4C) was found at position 34 in H. volcanii tRNAs, and it was predicted to be catalyzed by HVO_2736 [37]. ac4C was detected in M. kandleri, S. acidocaldarius, and T. tenax. Homologs of HVO_2736 were also found in these three archaea. Modification mapping only detected ac4C at position 34 of T. tenax tRNA-Ala(CGC) and/or tRNA-Thr(CGU), but I would expect that ac4C is also present at position 34 of M. kandleri and S. acidocaldarius tRNAs. The modification mapping did not localize ac4C to position 34 in M. kandleri or S. acidocaldarius, which might be due to low abundance.

Agmatidine (C+)

Agmatidine (C+) is a modified cytidine found at position 34 of archaeal tRNA-Ile2(CAU) [11, 14]. Nucleoside analysis shows that it was detected in all five samples. It was reported that TiaS catalyzes agmatidine formation [131], and homologs of TiaS were found in all five archaea. The modification mapping results localized C+ to position 34 of M. marburgensis and M. jannaschii. The position of C+ was not localized in the other three Archaea, which might be due to low abundance, because only tRNA-Ile(CAU) has this modification.

4.3.5.4 Modified Uridine

Pseudouridine (Ψ)

Pseudouridine is the isomer of uridine, as shown in Figure 4.8. In Archaea, pseudouridine is found at many positions (38/39/40 [132-133], 54/55 [87], and 55 [133]). The nucleoside analysis shows that pseudouridine was found in all the archaeal samples; however, the positions were not identified here, although they could be in future studies using appropriate techniques [134-136]. There are two stand-alone enzymes, TruA in H. volcanii [133] and Pus10 in M. jannaschii [87], which catalyze pseudouridylation at positions 38/39/40 and 54/55, respectively. PSI-BLAST results show that TruA homologs are found in M. kandleri, M. jannaschii, M. marburgensis, and T. tenax but not in S. acidocaldarius. The Pus10 homologs are found in all five organisms.

Figure 4.8 Uridine and Pseudouridine

Pseudouridine at position 55 was previously reported to be introduced in tRNAs by Cbf5 [133], which might be dependent on the H/ACA box guide RNA machinery. PSI-BLAST results indicate that Cbf5 homologs are found in all five organisms. It remains to be seen whether the H/ACA guide RNAs for pseudouridine are present. The results indicate that pseudouridine might be present at position 54/55 in all five organisms and at positions 38-40 of M. kandleri, M. jannaschii, M. marburgensis, and T. tenax.

1-methylpseudouridine (m1Ψ)

1-methylpseudouridine at position 54 of tRNA is formed from pseudouridine by TrmY, which was found in M. jannaschii [90]. In my study, m1Ψ was only found in M. kandleri nucleosides, but its position was not identified by modification mapping. I would predict that m1Ψ is present at position 54 in M. kandleri tRNAs.

2′-O-methyluridine (Um)

2′-O-methyluridine was found in M. kandleri, M. jannaschii, S. acidocaldarius, and T. tenax. Modification mapping data show that Um is present at position 32 of M. kandleri and M. jannaschii tRNAs. TrmJ found in E. coli catalyzes 2′-O-methylation of cytidine and uridine at position 32 [137]; however, the TrmJ found in S. acidocaldarius only catalyzes 2′-O-methylation of cytidine at position 32 [94]. The modifying enzyme for Um32 is still unknown in M. kandleri and M. jannaschii. Modification mapping also shows that Um is present at position 44 of S. acidocaldarius and T. tenax tRNAs. Trm44 found in S. cerevisiae is able to catalyze 2′-O-methylation of uridine at position 44, but no homologs of Trm44 were found in these two Archaea. The enzyme responsible for Um44 remains unidentified in S. acidocaldarius and T. tenax.

4-thiouridine (s4U)

4-thiouridine was detected in M. jannaschii, M. marburgensis, and T. tenax. Modification mapping was able to localize s4U at position 8 of M. jannaschii. The biosynthesis of s4U in bacteria requires IscS and ThiI. IscS is a cysteine desulfurase that mobilizes the sulfur from L-cysteine to form an IscS persulfide. Sulfur is then transferred to ThiI and finally transferred to U8 of tRNA [138-139]. However, a recent study shows that ThiI in the archaeon Methanococcus maripaludis can catalyze s4U formation alone [79]. In my study, protein homolog analysis shows that homologs of ThiI were found in all five archaea. Although not yet confirmed, it is likely that s4U is present at position 8 in M. marburgensis and T. tenax tRNAs.

5-cyanomethyluridine (cnm5U) and 5-cyanomethyl-2-thio-uridine (cnm5s2U)

5-cyanomethyluridine was first identified by Mandal and co-workers in H. marismortui tRNAs. They also found that cnm5U was present in total tRNAs from Euryarchaea but not in Crenarchaea, Bacteria, or Eukaryotes [140]. In my study, cnm5U was identified at position 34 of M. marburgensis and M. jannaschii tRNAs. The 2-thiolated analog, cnm5s2U, was also identified at position 34 of M. jannaschii tRNAs. However, the modifying enzymes remain unidentified.

5-methylaminomethyluridine (mnm5U) and 5-methylaminomethyl-2-thiouridine (mnm5s2U)

Modification at position 34 of tRNAs is critical for decoding, and U34 is the most diversely modified location. Nucleoside analysis showed that mnm5U and mnm5s2U were found in M. jannaschii and M. marburgensis, and they were mapped to position 34 in the tRNAs from each organism. The biosynthesis pathway of mnm5U involves multiple enzymes in E. coli: the GTPase MnmE, the folate-dependent MnmG, and the bifunctional oxidase/methyltransferase MnmC [141]. However, PSI-BLAST analysis did not report homologs for these enzymes. An alternative biosynthesis pathway was reported by Kitamura and co-workers, as shown in Figure 4.9. The intermediate 5-aminomethyluridine (nm5U) can be methylated to mnm5U by Aquifex aeolicus DUF752 [142], which is homologous to the methyltransferase MnmC2 domain of E. coli MnmC. The PSI-BLAST analysis shows that homologs of DUF752 are found in M. jannaschii [141] and M. marburgensis. No homologs were found in the other three archaea. The enzyme that converts uridine to nm5U is still unknown in M. jannaschii and M. marburgensis.

Thiolation of U34 can happen independently of mnm5U formation, and it is reported that IscS and MnmA are required for 2-thiouridine biosynthesis in E. coli [143]. However, the only homolog of IscS found was in M. marburgensis. The enzymes for 2-thiouridine biosynthesis in M. jannaschii remain unknown.

Figure 4.9 The biosynthesis pathways of mnm5U. The upper pathway is in Aquifex aeolicus, and the lower pathway is in E. coli.

5-methoxycarbonylmethyl-2′-O-methyluridine (mcm5Um) and 5-methoxycarbonylmethyl-2-thiouridine (mcm5s2U)

Nucleoside analysis shows that 5-methoxycarbonylmethyl-2′-O-methyluridine (mcm5Um) and 5-methoxycarbonylmethyl-2-thiouridine (mcm5s2U) were only found in S. acidocaldarius and T. tenax. Modification mapping identified mcm5Um at position 34 of T. tenax tRNAs. It was reported that the biosynthesis of mcm5s2U requires multiple enzymes. Elp3 [144] and a methyltransferase [145-147] (Trm9 or Alkbh8) are responsible for mcm5U side chain biosynthesis, and Ctu1 is responsible for s2U biosynthesis [148]. PSI-BLAST analysis indicates that homologs of Elp3 are found in all five archaea but that no homolog of Trm9 or Alkbh8 is found in any of them, which indicates that an alternative methyltransferase is needed to complete the biosynthesis of mcm5U. PSI-BLAST analysis also shows that Ctu1 homologs were found in all five Archaea.

4.4 Conclusion

In this study, I have analyzed the modified nucleosides in five archaeal organisms by LC-MS/MS. I used RNase T1 to digest the total tRNAs into oligonucleotides, and by comparing the theoretical digestion products of the tRNA gene sequences with the modified nucleosides found, some modifications were localized onto tRNA sequences.
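The theoretical digestion step can be illustrated with a short sketch. This is not the software used in this work: the function name `rnase_t1_digest` and the input fragment are hypothetical, and the sketch assumes that RNase T1 cleaves on the 3′ side of every unmodified G and leaves a 3′-phosphate (written `p`, as in the oligonucleotides listed in this chapter). Modified guanosines that resist cleavage are deliberately not modelled.

```python
from typing import List


def rnase_t1_digest(gene_sequence: str) -> List[str]:
    """In-silico RNase T1 digest of an unmodified tRNA gene sequence.

    Assumption: cleavage after every G, leaving a 3'-phosphate ("p").
    Modified G residues that resist cleavage are NOT modelled here.
    """
    rna = gene_sequence.upper().replace("T", "U")  # gene (DNA) -> RNA alphabet
    products: List[str] = []
    current = ""
    for nucleotide in rna:
        current += nucleotide
        if nucleotide == "G":
            products.append(current + "p")
            current = ""
    if current:
        # 3'-terminal fragment that does not end in G: no phosphate added
        products.append(current)
    return products


# Hypothetical gene fragment, for illustration only
print(rnase_t1_digest("GCTCCAGA"))
```

Each theoretical product can then be compared, by mass and by MS/MS fragmentation, with the observed digestion products, allowing for the mass shifts of the modified nucleosides detected in the same sample.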

Homologs of the modifying enzymes were identified using PSI-BLAST. For some modifications, the enzymes have been reported in Archaea, and these enzymes were used as reference proteins. For modifications whose modifying enzymes have not been found in Archaea, the enzymes found in Bacteria and/or Eukaryotes were used as reference proteins. If a modification was found in the sample and a homolog of the modifying enzyme for the specific position(s) was identified, I predicted that this modification might be present at the specific position(s) on tRNAs. Figures 4.10-4.14 summarize the modification profiles of each of the Archaea studied. Modifications whose positions were experimentally verified are shown in black, and modifications whose positions are predicted are shown in blue. Figure 4.15 shows the modification profile of H. volcanii tRNAs, which includes experimentally validated positions (black) and predicted positions (blue) [35-38].
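The prediction rule described above can be written out explicitly. The sketch below is only an illustration of that rule under stated assumptions: `REFERENCE_ENZYME` and `classify_sites` are hypothetical names, the table holds a small subset of the reference enzymes named in this chapter, and the example input restates the M. marburgensis findings reported above rather than adding new data.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]  # (modification, tRNA position)

# Illustrative subset of the reference enzymes named in this chapter
REFERENCE_ENZYME: Dict[Site, str] = {
    ("t6A", 37): "KEOPS",
    ("m1G", 37): "aTrm5",
    ("s4U", 8): "ThiI",
}


def classify_sites(
    detected: Set[str], homologs: Set[str], mapped: Set[Site]
) -> Dict[Site, str]:
    """Label each site 'verified' (mapped by LC-MS/MS) or 'predicted'."""
    calls: Dict[Site, str] = {}
    for site, enzyme in REFERENCE_ENZYME.items():
        modification, _position = site
        if site in mapped:
            calls[site] = "verified"   # shown in black in the summary figures
        elif modification in detected and enzyme in homologs:
            calls[site] = "predicted"  # shown in blue in the summary figures
    return calls


# M. marburgensis, restating findings reported in this chapter
calls = classify_sites(
    detected={"t6A", "m1G", "s4U"},
    homologs={"KEOPS", "aTrm5", "ThiI"},
    mapped={("t6A", 37)},
)
print(calls)
```

A site is left out entirely when the modification was not detected or no homolog was found, which mirrors the conservative wording ("might be present") used for the predicted positions.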

The five Archaea and H. volcanii tRNAs have some common modifications, such as m1A, m1I, t6A, m1G, m2G, m2,2G, Cm, m5C, G+, C+, Ψ, and m1Ψ. Modification mapping results show that some of these common modifications are present at the same location. For example, G+ was found at position 15, m2,2G was found at position 26, and m5C was found at position 48/49. The conserved modifications might indicate that both the chemical structure and the position are important for these Archaea.

Position 34 of tRNAs is usually modified for efficient codon recognition. The study of H. volcanii tRNAs found Cm and ac4C at position 34 and predicted mcm5s2U, mo5U, and C+. My study found more modifications present at position 34 in archaeal tRNAs: mnm5U, mnm5s2U, cnm5U, cnm5s2U, and mcm5s2U. The various U34 modifications indicate that different Archaea use different modifications at position 34 of tRNAs for efficient decoding. The modification mapping only localized C+ at position 34 of M. jannaschii and M. marburgensis, but the homolog of the C+ modifying enzyme was found in all five Archaea. Moreover, because of the special decoding ability of C+, which was discussed in Chapter 3, it is likely that C+ is conserved at position 34 of archaeal tRNA-Ile(CAU) to decode the isoleucine codon AUA. k2C was found at position 34 of several bacterial tRNA-Ile(CAU) to decode the isoleucine codon AUA, which suggests that Archaea and Bacteria use a similar strategy to decode the isoleucine codon AUA.

The modified purine at position 37 of tRNAs is important for translational accuracy. t6A and m1G were found at position 37 of H. volcanii tRNAs. In this study, not only t6A and m1G but also hn6A, ms2t6A, m6A, imG-14, and mimG were mapped at position 37. This result shows that modifications of A37 and G37 in archaeal tRNAs are more diversified than was previously found in H. volcanii. Interestingly, imG-14 and mimG were found at position 37 of M. jannaschii and T. tenax tRNAs, respectively. These two modifications are wyosine derivatives whose precursor is m1G. This suggests that Archaea use wyosine derivatives to maintain the reading frame during translation, which is similar to Eukaryotes [149].

Many tRNA modifying enzymes have been characterized in Archaea, and some homologs were found in these five Archaea. However, the tRNA modifying enzymes and biosynthesis pathways of some modifications are still unknown. For example, cnm5U was previously found only in tRNAs from euryarchaea and not in crenarchaea, eubacteria, or eukaryotes, and my study supports these results. The modification pathways and enzymes remain unknown. I would expect that the cnm5U modifying enzymes might be unique to euryarchaea. Like Eukaryotes, Archaea also use guide RNAs with corresponding enzymes to introduce modifications at specific positions. The modification mapping result and guide RNA search result suggest that Cm34 in S. acidocaldarius tRNA-Met(CAU) is likely due to a guide RNA mechanism.

This study has explored tRNA modification profiles and predicted modifying enzymes and guide RNAs in selected archaeal organisms. It expands our understanding of tRNA modifications in Archaea, and it is a starting point for further study of Archaea in terms of tRNA modifying enzymes and modification pathways.

Figure 4.10 The consensus modification sequence of M. kandleri tRNAs.

Figure 4.11 The consensus modification sequence of M. jannaschii tRNAs.

Figure 4.12 The consensus modification sequence of M. marburgensis tRNAs.

Figure 4.13 The consensus modification sequence of S. acidocaldarius tRNAs.

Figure 4.14 The consensus modification sequence of T. tenax tRNAs.

Figure 4.15 The consensus modification sequence of H. volcanii tRNAs.

Chapter 5

Conclusions and Future Work

5.1 Conclusions

In this dissertation, I have studied the tRNA modification profiles of selected archaeal organisms by using LC-MS/MS. One of the most important contributions of this work is the first compilation of modified tRNA sequences of Methanocaldococcus jannaschii, generated to understand the decoding strategy in this organism. Previously, our knowledge of archaeal tRNA modification profiles was limited to one well-studied archaeon, Haloferax volcanii. tRNA modifications have been identified in many other archaeal organisms, but the locations of these modifications were not examined. This work provides detailed information about the locations of modifications in M. jannaschii tRNAs. All the modifications in the anticodon region were localized, which allowed me to compare the decoding strategies of M. jannaschii and H. volcanii, as well as those of eukaryal and bacterial organisms. The comparison of decoding strategies used by different organisms provides more comprehensive information about the universal and unique rules of decoding. Because tRNA modifications are critical for decoding, especially the modifications at positions 34 and 37, the comparison also reveals whether different organisms use the same or different modifications for decoding, and it provides the basis for further study of the tRNA modifying enzymes and biosynthesis pathways in different organisms.

In Chapter 2, I have developed a new stand-alone computational tool, RNAModMapper (RAMM), for oligonucleotide MS/MS data interpretation and RNA sequence annotation, which enables higher throughput RNA modification mapping by LC-MS/MS. The two-stage scoring function of RAMM improves the accuracy of MS/MS sequence annotation. Besides c-, y-, w-, and a-B-type ions, RAMM uniquely accounts for the labile neutral- and base-loss ions that are often encountered in oligonucleotide CID-MS/MS data. Support for user-defined modifications allows one to handle other modifications of interest. RAMM is able to perform MS/MS spectral data analysis and sequence annotation not only for tRNA but also for other types of RNA and synthetic oligonucleotides.

In Chapter 3, I have extensively studied the modification profile of M. jannaschii tRNAs, and a compilation of modified tRNA sequences was generated. Many modifications in M. jannaschii had previously been found in bacteria and eukaryotes, while several modifications that are unique to archaea were also identified. A new modification, cnm5s2U, has been characterized and localized in this chapter. The modification mapping results for the anticodon loop showed that wyosine-related modifications (imG and imG-14) were present at position 37 of multiple tRNAs, which suggests that wyosine and its derivatives at position 37 might be more prevalent in archaea than previously thought. In this study, I found that the uridines at position 34 in M. jannaschii tRNAs were modified to cnm5U, cnm5s2U, mnm5U, and mnm5s2U to efficiently decode synonymous codons by wobble base pairing. The modifications t6A, hn6A, m1G, imG-14, and imG at position 37 are likely to ensure correct decoding of 2-fold degenerate synonymous codons.

In Chapter 4, the post-transcriptional modifications in four archaeal organisms, Methanopyrus kandleri, Methanothermobacter marburgensis, Sulfolobus acidocaldarius, and Thermoproteus tenax, were investigated. They have some modifications in common, and each also has unique ones. Some modifications were localized to specific positions on tRNA sequences. The tRNA modifying enzymes and guide RNAs were predicted by using bioinformatics, and based on the homologs of tRNA modifying enzymes, the positions of some modifications were also predicted. The mapping results for position 34 in the anticodon loop show that archaea and bacteria share some common U34 modifications and also use similar decoding strategies. The mapping results for position 37 in the anticodon loop indicate that these archaea use wyosine and its derivatives to maintain decoding accuracy during translation, which is similar to what was found in eukaryotes.

5.2 Future Work

5.2.1 The Shortcoming and Future Work of RNAModMapper

As described in Chapter 2, RNAModMapper has some limitations when working with high-resolution MS and MS/MS data. The 13C isotope peak of high molecular weight RNase digestion products (e.g., > 2700 Da) can be the most abundant ion in the MS spectrum. Because MS/MS isolation was centered on this isotope, the precursor ion mass is higher than the mass of the monoisotopic peak. In this case, the precursor mass tolerance needs to be increased in RAMM for interpretation of high molecular weight oligonucleotides. Some observed false positives in high-resolution MS/MS spectra are due to the assignment of lower abundance product ions together with the incorrect assignment of high abundance product ions. This occurs when the precursor mass falls within the defined tolerance but is assigned the incorrect charge state. To eliminate the need for an increased precursor ion tolerance and to reduce the number of incorrect interpretations, future versions of RAMM will calculate the isotopic state of the precursor ion and determine its charge state by analysis of isotopic spacing.
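The planned charge-state determination can be sketched as follows. This is a minimal illustration, not RAMM code: the function names and the example isotope cluster are hypothetical, and the only physical input is the mass difference between 13C and 12C (about 1.00335 Da), so that adjacent isotope peaks of an ion with charge z are spaced by roughly 1.00335/z on the m/z axis.

```python
from typing import Sequence

C13_C12_MASS_DIFF = 1.0033548  # Da, mass difference between 13C and 12C


def charge_from_isotope_spacing(isotope_mz: Sequence[float]) -> int:
    """Estimate |z| from the mean m/z spacing of adjacent isotope peaks."""
    if len(isotope_mz) < 2:
        raise ValueError("need at least two isotope peaks")
    ordered = sorted(isotope_mz)
    spacings = [b - a for a, b in zip(ordered, ordered[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(C13_C12_MASS_DIFF / mean_spacing)


def monoisotopic_mz(selected_mz: float, charge: int, n_13c: int) -> float:
    """Shift an isolated isotope peak back to the monoisotopic m/z."""
    return selected_mz - n_13c * C13_C12_MASS_DIFF / charge


peaks = [900.0000, 900.5017, 901.0034]  # hypothetical isotope cluster
z = charge_from_isotope_spacing(peaks)
print(z, monoisotopic_mz(peaks[1], z, n_13c=1))
```

With the charge state and isotopic state known, the precursor mass can be corrected before matching, so the tolerance does not have to be widened to absorb a one-13C offset.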

When variable position mode in RAMM was used to analyze RNase T1 digestion products of Saccharomyces cerevisiae tRNA-Phe(GAA), it was found that the program was not capable of interpreting digestion products containing more than one modified G. For example, RNase T1 digestion of S. cerevisiae tRNA-Phe(GAA) yields the oligonucleotide A[Cm]U[Gm]AA[yW]AΨ[m5C]UGp, whose canonical sequence is ACUGAAGAUCUGp. There are two modified G residues (Gm and yW) in one digestion product, which is rare. Since the current version of RAMM only considers one modified G per oligonucleotide, it fails to identify this digestion product. This issue will be solved in a future version.
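One way to picture the search space involved is to enumerate which G positions of a canonical digestion product could carry a modification. The sketch below is an assumption-laden illustration and does not describe how RAMM is implemented: `modified_g_site_sets` is a hypothetical helper that simply lists every combination of up to `max_modified_g` G positions.

```python
from itertools import combinations
from typing import Iterator, Tuple


def modified_g_site_sets(
    canonical: str, max_modified_g: int
) -> Iterator[Tuple[int, ...]]:
    """Yield every set of 0-based G positions holding 1..max_modified_g modified G."""
    g_sites = [i for i, nt in enumerate(canonical) if nt == "G"]
    for count in range(1, max_modified_g + 1):
        yield from combinations(g_sites, count)


# Canonical form of the tRNA-Phe product above, 3'-phosphate omitted
canonical = "ACUGAAGAUCUG"
one_g = list(modified_g_site_sets(canonical, 1))
two_g = list(modified_g_site_sets(canonical, 2))
print(len(one_g), len(two_g))
```

With a limit of one modified G, the placement that corresponds to Gm and yW in the observed product (positions 3 and 6 in 0-based numbering) is never generated; raising the limit to two includes it, at the cost of more candidates to score.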

The current version of RAMM only matches the interpreted MS/MS data from a single RNase digestion to RNA sequences. However, multiple RNases are usually used to improve the sequence coverage and mapping confidence through overlapping digestion products. A future version of RAMM will compile mapping results from experiments in which digests from more than one RNase are used.

5.2.2 tRNA Modifications in M. jannaschii

In Chapter 3, the modification profile of M. jannaschii tRNAs was extensively studied; however, there are still many open questions. Methylwyosine (mimG) was not localized in any specific M. jannaschii tRNA due to its low abundance. One project would be the localization of this low-abundance modification. Once the position of mimG is localized, it will provide a full picture of how M. jannaschii uses wyosine and its derivatives (imG-14, mimG) for correct decoding, and it will also allow us to confirm the modifying genes predicted in this study for wyosine and its derivatives in this organism.

Another wyosine derivative, 7-aminocarboxypropyl-demethylwyosine (yW-86), was detected in M. jannaschii by McCloskey and co-workers [32], but this modification was not found in my study. Bioinformatics analysis also suggested that yW-86 can be generated by this organism [70]. The absence of yW-86 might be related to culturing conditions or due to limited Taw2 activity in M. jannaschii. Further experiments (e.g., using different growth conditions) are needed to understand the states of wyosine-related modifications in this and other archaeal tRNAs.

Four hypomodifications of U34 (cnm5U, cnm5s2U, mnm5U, and mnm5s2U) were identified; however, the modifying enzyme(s) for these U34 modifications remain unknown. One possible enzyme that is involved in these modifications is archaeal Elp3, which catalyzes tRNA modification at C5 of U34 [96]. The Elp3 homolog was found in M. jannaschii, and further assays (e.g., gene knockout) are needed to confirm the function of this enzyme for the U34 modification in this organism.

The modified adenosines (t6A, hn6A, ms2t6A, and ms2hn6A) were localized at position 37 of multiple tRNAs in M. jannaschii. Interestingly, both t6A and hn6A were localized at the same position in the same tRNA, and they have a high structural similarity (Figure 5.1). This unique modification state suggests that t6A and hn6A share either a common pathway or a common tRNA substrate target. The biosynthesis pathway and modifying enzymes of t6A have been identified in archaea; however, the detailed biosynthesis pathway of hn6A is still unknown. A previous study suggested that one of the t6A modifying enzymes (KEOPS) could be responsible for the synthesis of hn6A [97]. Future work exploring the biosynthesis pathway of hn6A by using the gene knockout method and changing the substrate target is warranted.

Figure 5.1 The chemical structure of t6A and hn6A.

Another curious finding was that m2G was localized to position 67, which is in the acceptor stem. This is the first time that m2G has been localized to position 67 in archaea. m2G67 was previously found only in tRNA-Lys(UUU) from Loligo bleekeri [93]; however, the functions and modifying enzyme of m2G67 are still unknown. m2G was also found at position 6, which is likewise in the acceptor stem. It is possible that M. jannaschii tRNAs use either m2G6 or m2G67 for folding, stability, or charging. A previous study found that Trm14 catalyzes the biosynthesis of m2G6 [78]. It will be interesting to know whether Trm14 is also responsible for modifying G67 or whether a different methyltransferase is required.

In this study, the position of pseudouridine was not localized. Future work to localize pseudouridine is the first step toward understanding its functions and modifying enzymes. Because pseudouridine has the same mass as uridine, direct CID-MS/MS of a pseudouridine-containing oligonucleotide gives the same product ions as the isomeric uridine-containing oligonucleotide. The tRNA sample can be treated with chemicals that only react with pseudouridine, such as CMCT and acrylonitrile [136], and RNase digestion products with derivatized pseudouridine generate unique fragment ions, which enables one to identify the position of pseudouridine.

5.2.3 tRNA Modifications in Other Archaea

In Chapter 4, the tRNA modifications in four archaea were studied. This work only focused on the qualitative analysis of modifications in selected archaeal organisms. Future work would be to quantify the common modifications. Many common modifications were found among the tRNAs of these five archaea, and an interesting question is whether these common modifications are present at the same level. One way to measure the relative abundance of a common modification in different archaeal organisms is to spike an internal standard into each sample. The peak area of the common modification can be normalized to the peak area of the internal standard, and the ratio of the normalized peak areas from different samples is then calculated to obtain the relative abundance of the modification in each sample.
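The normalization described above amounts to two divisions per sample, as the following sketch shows. The function name, sample labels, and peak areas are hypothetical and serve only to illustrate the arithmetic; they are not measured data.

```python
from typing import Dict


def relative_abundance(
    modification_area: Dict[str, float],
    internal_standard_area: Dict[str, float],
    reference_sample: str,
) -> Dict[str, float]:
    """Normalize each peak area to the internal standard, then to one sample."""
    normalized = {
        sample: modification_area[sample] / internal_standard_area[sample]
        for sample in modification_area
    }
    reference = normalized[reference_sample]
    return {sample: value / reference for sample, value in normalized.items()}


# Hypothetical peak areas (arbitrary units), not measured data
ratios = relative_abundance(
    modification_area={"sample_A": 200.0, "sample_B": 300.0},
    internal_standard_area={"sample_A": 100.0, "sample_B": 100.0},
    reference_sample="sample_A",
)
print(ratios)
```

Expressing every sample relative to one reference sample keeps the comparison independent of the absolute detector response, provided the same amount of internal standard is added to each sample.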

A comparison of the modified nucleosides of crenarchaea and euryarchaea shows some modifications that are unique to each phylum. For example, cnm5U, which was localized to position 34 in two euryarchaea (M. jannaschii and M. marburgensis), was not found in any crenarchaea, which is consistent with a previous study [34]. It is likely that the biosynthesis pathways and modifying enzymes are unique to euryarchaea. Since I have localized cnm5U to specific positions in M. jannaschii and M. marburgensis tRNAs, future work would be further study of this modification in these two organisms.

The modifications were partially localized in these four archaeal organisms by using RNase T1. Future work would use multiple ribonucleases to localize modifications to specific tRNAs, especially in the anticodon loop. The nucleoside data show that the modifications expected at positions 34 and 37 differ among the five archaea, and the three euryarchaea (M. jannaschii, M. kandleri, M. marburgensis) have more diversified U34 modifications than the two crenarchaea (S. acidocaldarius, T. tenax). In the three euryarchaea, cnm5(s2)U and mnm5(s2)U are localized to position 34 of some tRNA sequences, while in the two crenarchaea, only mcm5(s2)U(m) was found at position 34. By using multiple ribonucleases, one might localize these modifications to specific tRNAs, which would allow us to understand why crenarchaea have fewer U34 modifications for decoding and whether they use fewer modifications to efficiently decode more tRNAs.

Based on the modification mapping that I have done for the two crenarchaea, T. tenax is the next organism that one could study. Four modifications have been localized to position 34 so far, and mimG was localized to position 37 in T. tenax, which indicates that it has relatively high abundance. Interestingly, m2G was also localized to position 67 in this organism. It is likely that m2G67 is common in archaeal tRNAs. Once the modification profile of T. tenax tRNA is fully identified, it will allow a more comprehensive comparison between crenarchaea and euryarchaea in terms of decoding strategies and modification patterns.


Bibliography

1. Limbach, P. A.; Crain, P. F.; McCloskey, J. A., Summary: the modified nucleosides of RNA. Nucleic Acids Res. 1994, 22 (12), 2183-2196. 2. Cantara, W. A.; Crain, P. F.; Rozenski, J.; McCloskey, J. A.; Harris, K. A.; Zhang, X.; Vendeix, F. A.; Fabris, D.; Agris, P. F., The RNA modification database, RNAMDB: 2011 update. Nucleic Acids Res. 2010, 39 (suppl_1), D195-D201. 3. Boccaletto, P.; Machnicka, M. A.; Purta, E.; Piątkowski, P.; Bagiński, B.; Wirecki, T. K.; de Crécy-Lagard, V.; Ross, R.; Limbach, P. A.; Kotter, A., MODOMICS: a database of RNA modification pathways. 2017 update. Nucleic Acids Res. 2017, 46 (D1), D303-D307. 4. Yarian, C.; Townsend, H.; Czestkowski, W.; Sochacka, E.; Malkiewicz, A. J.; Guenther, R.; Miskiewicz, A.; Agris, P. F., Accurate translation of the genetic code depends on tRNA modified nucleosides. J. Biol. Chem. 2002, 277 (19), 16391-16395. 5. Crick, F., Codon-anticodon pairing: the wobble hypothesis. 1966. 6. Weixlbaumer, A.; Murphy IV, F. V.; Dziergowska, A.; Malkiewicz, A.; Vendeix, F. A.; Agris, P. F.; Ramakrishnan, V., Mechanism for expanding the decoding capacity of transfer RNAs by modification of uridines. Nature structural and molecular biology 2007, 14 (6), 498. 7. Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Couch, G. S.; Greenblatt, D. M.; Meng, E. C.; Ferrin, T. E., UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 2004, 25 (13), 1605-1612. 8. Agris, P. F.; Eruysal, E. R.; Narendran, A.; Väre, V. Y.; Vangaveti, S.; Ranganathan, S. V., Celebrating wobble decoding: half a century and still much is new. RNA Biol. 2018, 15 (4-5), 537- 553. 9. Schweizer, U.; Bohleber, S.; Fradejas-Villar, N., The modified base isopentenyladenosine and its derivatives in tRNA. RNA Biol. 2017, 14 (9), 1197-1208. 10. Rozov, A.; Demeshkina, N.; Khusainov, I.; Westhof, E.; Yusupov, M.; Yusupova, G., Novel base-pairing interactions at the tRNA wobble position crucial for accurate reading of the genetic code. 
Nature communications 2016, 7, 10457. 11. Mandal, D.; Köhrer, C.; Su, D.; Russell, S. P.; Krivos, K.; Castleberry, C. M.; Blum, P.; Limbach, P. A.; Söll, D.; RajBhandary, U. L., Agmatidine, a modified cytidine in the anticodon of archaeal tRNAIle, base pairs with adenosine but not with guanosine. Proc. Natl. Acad. Sci. U. S. A. 2010, 107 (7), 2872-2877. 12. Muramatsu, T.; Yokoyama, S.; Horie, N.; Matsuda, A.; Ueda, T.; Yamaizumi, Z.; Kuchino, Y.; Nishimura, S.; Miyazawa, T., A novel lysine-substituted nucleoside in the first position of the anticodon of minor isoleucine tRNA from Escherichia coli. J. Biol. Chem. 1988, 263 (19), 9261-9267. 13. Szweykowska‐Kulinska, Z.; Senger, B.; Keith, G.; Fasiolo, F.; Grosjean, H., Intron‐dependent formation of pseudouridines in the anticodon of Saccharomyces cerevisiae minor tRNA (Ile). The EMBO journal 1994, 13 (19), 4636-4644. 14. Voorhees, R. M.; Mandal, D.; Neubauer, C.; Köhrer, C.; RajBhandary, U. L.; Ramakrishnan, V., The structural basis for specific decoding of AUA by isoleucine tRNA on the ribosome. Nat. Struct. Mol. Biol. 2013, 20 (5), 641. 15. Limbach, P. A.; Paulines, M. J., Going global: the new era of mapping modifications in RNA. Wiley Interdisciplinary Reviews: RNA 2017, 8 (1), e1367. 16. Wang, Z.; Gerstein, M.; Snyder, M., RNA-Seq: a revolutionary tool for transcriptomics. Nature reviews genetics 2009, 10 (1), 57.

17. Tserovski, L.; Marchand, V.; Hauenschild, R.; Blanloeil-Oillo, F.; Helm, M.; Motorin, Y., High-throughput sequencing for 1-methyladenosine (m1A) mapping in RNA. Methods 2016, 107, 110-121. 18. Cozen, A. E.; Quartley, E.; Holmes, A. D.; Hrabeta-Robinson, E.; Phizicky, E. M.; Lowe, T. M., ARM-seq: AlkB-facilitated RNA methylation sequencing reveals a complex landscape of modified tRNA fragments. Nat. Methods 2015, 12 (9), 879. 19. Zheng, G.; Qin, Y.; Clark, W. C.; Dai, Q.; Yi, C.; He, C.; Lambowitz, A. M.; Pan, T., Efficient and quantitative high-throughput tRNA sequencing. Nat. Methods 2015, 12 (9), 835. 20. Suzuki, T.; Ikeuchi, Y.; Noma, A.; Suzuki, T.; Sakaguchi, Y., Mass spectrometric identification and characterization of RNA-modifying enzymes. Methods Enzymol. 2007, 425, 211. 21. Wetzel, C.; Limbach, P. A., Mass spectrometry of modified RNAs: recent developments. Analyst 2016, 141 (1), 16-23. 22. Ross, R.; Cao, X.; Yu, N.; Limbach, P. A., Sequence mapping of transfer RNA chemical modifications by liquid chromatography tandem mass spectrometry. Methods 2016, 107, 73-78. 23. Kowalak, J. A.; Pomerantz, S. C.; Crain, P. F.; McCloskey, J. A., A novel method for the determination of posttranscriptional modification in RNA by mass spectrometry. Nucleic Acids Res. 1993, 21 (19), 4577-4585. 24. Jora, M.; Burns, A. P.; Ross, R. L.; Lobue, P. A.; Zhao, R.; Palumbo, C. M.; Beal, P. A.; Addepalli, B.; Limbach, P. A., Differentiating Positional Isomers of Nucleoside Modifications by Higher-Energy Collisional Dissociation Mass Spectrometry (HCD MS). J. Am. Soc. Mass Spectrom. 2018, 29 (8), 1745-1756. 25. Addepalli, B.; Venus, S.; Thakur, P.; Limbach, P. A., Novel ribonuclease activity of cusativin from Cucumis sativus for mapping nucleoside modifications in RNA. Anal. Bioanal. Chem. 2017, 409 (24), 5645-5654. 26. Addepalli, B.; Lesner, N. P.; Limbach, P. A., Detection of RNA nucleoside modifications with the uridine-specific ribonuclease MC1 from Momordica charantia. 
RNA 2015. 27. Houser, W. M.; Butterer, A.; Addepalli, B.; Limbach, P. A., Combining recombinant ribonuclease U2 and for RNA modification mapping by liquid chromatography–mass spectrometry. Anal. Biochem. 2015, 478, 52-58. 28. McLuckey, S. A.; Van Berker, G. J.; Glish, G. L., Tandem mass spectrometry of small, multiply charged oligonucleotides. J. Am. Soc. Mass Spectrom. 1992, 3 (1), 60-70. 29. Woese, C. R.; Fox, G. E., Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc. Natl. Acad. Sci. U. S. A. 1977, 74 (11), 5088-5090. 30. Woese, C. R.; Kandler, O.; Wheelis, M. L., Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. U. S. A. 1990, 87 (12), 4576-4579. 31. Jarrell, K. F.; Walters, A. D.; Bochiwal, C.; Borgia, J. M.; Dickinson, T.; Chong, J. P., Major players on the microbial stage: why archaea are important. Microbiology 2011, 157 (4), 919-936. 32. McCloskey, J. A.; Graham, D. E.; Zhou, S.; Crain, P. F.; Ibba, M.; Konisky, J.; Söll, D.; Olsen, G. J., Post-transcriptional modification in archaeal tRNAs: identities and phylogenetic relations of nucleotides from mesophilic and hyperthermophilic . Nucleic Acids Res. 2001, 29 (22), 4699-4706. 33. Noon, K. R.; Guymon, R.; Crain, P. F.; McCloskey, J. A.; Thomm, M.; Lim, J.; Cavicchioli, R., Influence of temperature on tRNA modification in archaea: burtonii


(optimum growth temperature [Topt], 23 C) and Stetteria hydrogenophila (Topt, 95 C). J. Bacteriol. 2003, 185 (18), 5483-5490. 34. Mandal, D.; Köhrer, C.; Su, D.; Babu, I. R.; Chan, C. T.; Liu, Y.; Söll, D.; Blum, P.; Kuwahara, M.; Dedon, P. C., Identification and codon reading properties of 5-cyanomethyl uridine, a new modified nucleoside found in the anticodon wobble position of mutant haloarchaeal isoleucine tRNAs. RNA 2014, 20 (2), 177-188. 35. Gupta, R., Halobacterium volcanii tRNAs. Identification of 41 tRNAs covering all amino acids, and the sequences of 33 class I tRNAs. J. Biol. Chem. 1984, 259 (15), 9461-9471. 36. Gupta, R., Transfer RNAs of Halobacterium volcanii: sequences of five leucine and three serine tRNAs. Syst. Appl. Microbiol. 1986, 7 (1), 102-105. 37. Grosjean, H.; Gaspin, C.; Marck, C.; Decatur, W. A.; de Crécy-Lagard, V., RNomics and Modomics in the halophilic archaea Haloferax volcanii: identification of RNA modification genes. BMC Genomics 2008, 9 (1), 470. 38. Phillips, G.; de Crécy-Lagard, V., Biosynthesis and function of tRNA modifications in Archaea. Curr. Opin. Microbiol. 2011, 14 (3), 335-341. 39. Rozenski, J.; McCloskey, J. A., SOS: a simple interactive program for ab initio oligonucleotide sequencing by mass spectrometry. J Am Soc Mass Spectrom 2002, 13 (3), 200- 203. 40. Nyakas, A.; Blum, L. C.; Stucki, S. R.; Reymond, J.-L.; Schürch, S., OMA and OPA— software-supported mass spectra analysis of native and modified nucleic acids. J Am Soc Mass Spectrom 2013, 24 (2), 249-256. 41. Nakayama, H.; Akiyama, M.; Taoka, M.; Yamauchi, Y.; Nobe, Y.; Ishikawa, H.; Takahashi, N.; Isobe, T., Ariadne: a database search engine for identification and chemical analysis of RNA using tandem mass spectrometry data. Nucleic Acids Res 2009, 37 (6), e47. 42. Matthiesen, R.; Kirpekar, F., Identification of RNA molecules by specific enzyme digestion and mass spectrometry: software for and implementation of RNA mass mapping. 
Nucleic Acids Res 2009, 37 (6), e48. 43. Sample, P. J.; Gaston, K. W.; Alfonzo, J. D.; Limbach, P. A., RoboOligo: software for mass spectrometry data to support manual and de novo sequencing of post-transcriptionally modified ribonucleic acids. Nucleic Acids Res 2015, 43 (10), e64. 44. Cao, X.; Limbach, P. A., Enhanced detection of post-transcriptional modifications using a mass-exclusion list strategy for RNA modification mapping by LC-MS/MS. Anal Chem 2015, 87 (16), 8433-8440. 45. Machnicka, M. A.; Milanowska, K.; Oglou, O. O.; Purta, E.; Kurkowska, M.; Olchowik, A.; Januszewski, W.; Kalinowski, S.; Dunin-Horkawicz, S.; Rother, K. M.; Helm, M.; Bujnicki, J.; Grosjean, H., MODOMICS: a database of RNA modification pathways—2013 update. Nucleic Acids Res 2013, 41, D262-D267. 46. Xu, H.; Freitas, M. A., MassMatrix: a database search program for rapid characterization of proteins and peptides from tandem mass spectrometry data. Proteomics 2009, 9 (6), 1548-1555. 47. Kessner, D.; Chambers, M.; Burke, R.; Agus, D.; Mallick, P., ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 2008, 24 (21), 2534-2536. 48. Steyaert, J., A Decade of Protein Engineering on —Atomic Dissection of the Enzyme‐Substrate Interactions. Eur J Biochem 1997, 247 (1), 1-11. 49. Uchida, T.; ARIMA, T.; EGAMI, F., Specificity of RNase U2. J Biochem 1970, 67 (1), 91-102.


50. Houser, W. M.; Butterer, A.; Addepalli, B.; Limbach, P. A., Combining recombinant ribonuclease U2 and protein phosphatase for RNA modification mapping by liquid chromatography–mass spectrometry. Anal Biochem 2015, 478, 52-58. 51. Rojo, M. A.; Arias, F. J.; Iglesias, R.; Ferreras, J. M.; Muñoz, R.; Escarmís, C.; Soriano, F.; López-Fando, J.; Méndez, E.; Girbés, T., Cusativin, a new cytidine-specific ribonuclease accumulated in seeds of Cucumis sativus L. Planta 1994, 194 (3), 328-338. 52. Addepalli, B.; Lesner, N. P.; Limbach, P. A., Detection of RNA nucleoside modifications with the uridine-specific ribonuclease MC1 from Momordica charantia. RNA 2015, 21 (10), 1746- 1756. 53. Mengel‐Jørgensen, J.; Kirpekar, F., Detection of pseudouridine and other modifications in tRNA by cyanoethylation and MALDI mass spectrometry. Nucleic Acids Res 2002, 30 (23), e135. 54. Addepalli, B.; Limbach, P. A., Pseudouridine in the Anticodon of Escherichia coli tRNATyr(QΨA) Is Catalyzed by the Dual Specificity Enzyme RluF. J Biol Chem 2016, 291 (42), 22327-22337. 55. Beausoleil, S. A.; Villén, J.; Gerber, S. A.; Rush, J.; Gygi, S. P., A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nature Biotech 2006, 24 (10), 1285-1292. 56. Yen, C.-Y.; Houel, S.; Ahn, N. G.; Old, W. M., Spectrum-to-spectrum searching using a proteome-wide spectral library. Mol Cell Proteomics 2011, 10 (7), M111. 007666. 57. Huang, T.-y.; Kharlamova, A.; Liu, J.; McLuckey, S. A., Ion trap collision-induced dissociation of multiply deprotonated RNA: c/y-ions versus (aB)/w-ions. J Am Soc Mass Spectrom 2008, 19 (12), 1832-1840. 58. Krivos, K. L.; Addepalli, B.; Limbach, P. A., Removal of 3'‐phosphate group by bacterial improves oligonucleotide sequence coverage of RNase digestion products analyzed by collision‐induced dissociation mass spectrometry. Rapid Commun Mass Spectrom 2011, 25 (23), 3609-3616. 59. Kapp, E. A.; Schütz, F.; Connolly, L. M.; Chakel, J. 
A.; Meza, J. E.; Miller, C. A.; Fenyo, D.; Eng, J. K.; Adkins, J. N.; Omenn, G. S., An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: sensitivity and specificity analysis. Proteomics 2005, 5 (13), 3475-3490. 60. Addepalli, B.; Limbach, P. A., Pseudouridine in the anticodon of Escherichia coli tRNATyr (QΨA) is catalyzed by the dual specificity enzyme RluF. Journal of Biological Chemistry 2016, jbc. M116. 747865. 61. Jora, M.; Burns, A. P.; Ross, R. L.; Lobue, P. A.; Zhao, R.; Palumbo, C. M.; Beal, P. A.; Addepalli, B.; Limbach, P. A., Differentiating Positional Isomers of Nucleoside Modifications by Higher-Energy Collisional Dissociation Mass Spectrometry (HCD MS). J. Am. Soc. Mass Spectrom. 2018, 29, 1-12. 62. Chan, P. P.; Lowe, T. M., GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes. Nucleic Acids Res. 2015, 44 (D1), D184-D189. 63. Yu, N.; Lobue, P. A.; Cao, X.; Limbach, P. A., RNAModMapper: RNA modification mapping software for analysis of liquid chromatography tandem mass spectrometry data. Anal. Chem. 2017, 89 (20), 10744-10752. 64. Hossain, M.; Limbach, P. A., Mass spectrometry-based detection of transfer RNAs by their signature digestion products. RNA 2006. 65. Hossain, M.; Limbach, P. A., Multiple improve MALDI-MS signature digestion product detection of bacterial transfer RNAs. Anal. Bioanal. Chem. 2009, 394 (4), 1125.


66. Castleberry, C. M.; Limbach, P. A., Relative quantitation of transfer RNAs using liquid chromatography mass spectrometry and signature digestion products. Nucleic Acids Res. 2010, 38 (16), e162-e162. 67. Wetzel, C.; Limbach, P. A., Global identification of transfer RNAs by liquid chromatography–mass spectrometry (LC–MS). J. Proteomics 2012, 75 (12), 3450-3464. 68. Nelson, C. C.; McCloskey, J. A., Collision-induced dissociation of adenine. J. Am. Chem. Soc. 1992, 114 (10), 3661-3668. 69. Chan, C. T.; Chionh, Y. H.; Ho, C.-H.; Lim, K. S.; Babu, I. R.; Ang, E.; Wenwei, L.; Alonso, S.; Dedon, P. C., Identification of N6, N6-dimethyladenosine in transfer RNA from Mycobacterium bovis Bacille Calmette-Guerin. Molecules 2011, 16 (6), 5168-5181. 70. de Crécy-Lagard, V.; Brochier-Armanet, C.; Urbonavičius, J.; Fernandez, B.; Phillips, G.; Lyons, B.; Noma, A.; Alvarez, S.; Droogmans, L.; Armengaud, J., Biosynthesis of wyosine derivatives in tRNA: an ancient and highly diverse pathway in Archaea. Molecular biology and evolution 2010, 27 (9), 2062-2077. 71. Arragain, S.; Handelman, S. K.; Forouhar, F.; Wei, F.-Y.; Tomizawa, K.; Hunt, J. F.; Douki, T.; Fontecave, M.; Mulliez, E.; Atta, M., Identification of eukaryotic and prokaryotic methythiotransferases for biosynthesis of 2-methylthio-N6-threonylcarbamoyladenosine in tRNA. J. Biol. Chem. 2010, jbc. M110. 106831. 72. Puri, P.; Wetzel, C.; Saffert, P.; Gaston, K. W.; Russell, S. P.; Cordero Varela, J. A.; van der Vlies, P.; Zhang, G.; Limbach, P. A.; Ignatova, Z., Systematic identification of tRNAome and its dynamics in L actococcus lactis. Mol. Microbiol. 2014, 93 (5), 944-956. 73. Wagner, T. M.; Nair, V.; Guymon, R.; Pomerantz, S. C.; Crain, P. F.; Davis, D. R.; McCloskey, J. A. In A novel method for sequence placement of modified nucleotides in mixtures of transfer RNA, Nucleic Acids Symp. Ser., Oxford University Press: 2004; pp 263-264. 74. Jühling, F.; Mörl, M.; Hartmann, R. K.; Sprinzl, M.; Stadler, P. 
F.; Pütz, J., tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucleic acids research 2008, 37 (suppl_1), D159-D162. 75. Perche-Letuvée, P.; Molle, T.; Forouhar, F.; Mulliez, E.; Atta, M., Wybutosine biosynthesis: structural and mechanistic overview. RNA Biol. 2014, 11 (12), 1508-1518. 76. Edmonds, C.; Crain, P.; Gupta, R.; Hashizume, T.; Hocart, C.; Kowalak, J.; Pomerantz, S.; Stetter, K.; McCloskey, J., Posttranscriptional modification of tRNA in thermophilic archaea (Archaebacteria). J. Bacteriol. 1991, 173 (10), 3138-3148. 77. Ross, R. L.; Cao, X.; Limbach, P. A., Mapping Post‐Transcriptional Modifications onto Transfer Ribonucleic Acid Sequences by Liquid Chromatography Tandem Mass Spectrometry. Biomolecules 2017, 7 (1), 21. 78. Menezes, S.; Gaston, K. W.; Krivos, K. L.; Apolinario, E. E.; Reich, N. O.; Sowers, K. R.; Limbach, P. A.; Perona, J. J., Formation of m 2 G6 in Methanocaldococcus jannaschii tRNA catalyzed by the novel methyltransferase Trm14. Nucleic Acids Res. 2011, 39 (17), 7641-7655. 79. Liu, Y.; Zhu, X.; Nakamura, A.; Orlando, R.; Söll, D.; Whitman, W. B., Biosynthesis of 4- thiouridine in tRNA in the methanogenic archaeon Methanococcus maripaludis. J. Biol. Chem. 2012, jbc. M112. 405688. 80. Armengaud, J.; Urbonavičius, J.; Fernandez, B.; Chaussinand, G.; Bujnicki, J. M.; Grosjean, H., N2-methylation of guanosine at position 10 in tRNA is catalyzed by a THUMP domain-containing, S-adenosylmethionine-dependent methyltransferase, conserved in Archaea and Eukaryota. J. Biol. Chem. 2004, 279 (35), 37142-37152.


81. Watanabe, M.; Matsuo, M.; Tanaka, S.; Akimoto, H.; Asahi, S.; Nishimura, S.; Katze, J. R.; Hashizume, T.; Crain, P. F.; McCloskey, J. A., Biosynthesis of archaeosine, a novel derivative of 7-deazaguanosine specific to archaeal tRNA, proceeds via a pathway involving base replacement on the tRNA polynucleotide chain. J. Biol. Chem. 1997, 272 (32), 20146-20151. 82. Phillips, G.; Swairjo, M. A.; Gaston, K. W.; Bailly, M.; Limbach, P. A.; Iwata-Reuyl, D.; de Crécy-Lagard, V., Diversity of archaeosine synthesis in crenarchaeota. ACS Chem. Biol. 2011, 7 (2), 300-305. 83. Bai, Y.; Fox, D. T.; Lacy, J. A.; Van Lanen, S. G.; Iwata-Reuyl, D., Hypermodification of tRNA in Thermophilic Archaea CLONING, OVEREXPRESSION, AND CHARACTERIZATION OF tRNA-GUANINE TRANSGLYCOSYLASE FROM METHANOCOCCUS JANNASCHII. Journal of Biological Chemistry 2000, 275 (37), 28731- 28738. 84. Phillips, G.; Chikwana, V. M.; Maxwell, A.; El-Yacoubi, B.; Swairjo, M. A.; Iwata-Reuyl, D.; de Crécy-Lagard, V., Discovery and characterization of an amidinotransferase involved in the modification of archaeal tRNA. Journal of Biological Chemistry 2010, 285 (17), 12706-12713. 85. Nishimoto, M.; Higashijima, K.; Shirouzu, M.; Grosjean, H.; Bessho, Y.; Yokoyama, S., Crystal structure of tRNA N2, N2-guanosine dimethyltransferase Trm1 from Pyrococcus horikoshii. J. Mol. Biol. 2008, 383 (4), 871-884. 86. Kuratani, M.; Hirano, M.; Goto-Ito, S.; Itoh, Y.; Hikida, Y.; Nishimoto, M.; Sekine, S.-i.; Bessho, Y.; Ito, T.; Grosjean, H., Crystal structure of Methanocaldococcus jannaschii Trm4 complexed with sinefungin. J. Mol. Biol. 2010, 401 (3), 323-333. 87. Gurha, P.; Gupta, R., Archaeal Pus10 proteins can produce both pseudouridine 54 and 55 in tRNA. RNA 2008. 88. Mengel‐Jørgensen, J.; Kirpekar, F., Detection of pseudouridine and other modifications in tRNA by cyanoethylation and MALDI mass spectrometry. Nucleic acids research 2002, 30 (23), e135-e135. 89. Chatterjee, K.; Blaby, I. K.; Thiaville, P. 
C.; Majumder, M.; Grosjean, H.; Yuan, Y. A.; Gupta, R.; de Crécy-Lagard, V., The archaeal COG1901/DUF358 SPOUT-methyltransferase members, together with pseudouridine synthase Pus10, catalyze the formation of 1- methylpseudouridine at position 54 of tRNA. RNA 2012. 90. Chen, H.-Y.; Yuan, Y. A., Crystal structure of Mj1640/DUF358 protein reveals a putative SPOUT-class RNA methyltransferase. J. Mol. Cell. Biol. 2010, 2 (6), 366-374. 91. Kawamura, T.; Anraku, R.; Hasegawa, T.; Tomikawa, C.; Hori, H., Transfer RNA methyltransferases from Thermoplasma acidophilum, a thermoacidophilic archaeon. International journal of molecular sciences 2014, 16 (1), 91-113. 92. Roovers, M.; Wouters, J.; Bujnicki, J. M.; Tricot, C.; Stalon, V.; Grosjean, H.; Droogmans, L., A primordial RNA modification enzyme: the case of tRNA (m1A) methyltransferase. Nucleic Acids Res. 2004, 32 (2), 465-476. 93. Matsuo, M.; Abe, Y.; Saruta, Y.; Okada, N., Mollusk genes encoding lysine tRNA (UUU) contain introns. Gene 1995, 165 (2), 249-253. 94. Somme, J.; Van Laer, B.; Roovers, M.; Steyaert, J.; Versées, W.; Droogmans, L., Characterization of two homologous 2′-O-methyltransferases showing different specificities for their tRNA substrates. RNA 2014. 95. Terasaka, N.; Kimura, S.; Osawa, T.; Numata, T.; Suzuki, T., Biogenesis of 2- agmatinylcytidine catalyzed by the dual protein and RNA kinase TiaS. Nat. Struct. Mol. Biol. 2011, 18 (11), 1268.


96. Selvadurai, K.; Wang, P.; Seimetz, J.; Huang, R. H., Archaeal Elp3 catalyzes tRNA wobble uridine modification at C5 via a radical mechanism. Nat. Chem. Biol. 2014, 10 (10), 810. 97. Perrochia, L.; Crozat, E.; Hecker, A.; Zhang, W.; Bareille, J.; Collinet, B.; Van Tilbeurgh, H.; Forterre, P.; Basta, T., In vitro biosynthesis of a universal t6A tRNA modification in Archaea and Eukarya. Nucleic Acids Res. 2012, 41 (3), 1953-1964. 98. Thiaville, P. C.; Iwata-Reuyl, D.; de Crécy-Lagard, V., Diversity of the biosynthesis pathway for threonylcarbamoyladenosine (t6A), a universal modification of tRNA. RNA biology 2014, 11 (12), 1529-1539. 99. Deutsch, C.; El Yacoubi, B.; de Crécy-Lagard, V.; Iwata-Reuyl, D., Biosynthesis of threonylcarbamoyl adenosine (t6A), a universal tRNA nucleoside. J. Biol. Chem. 2012, 287 (17), 13666-13673. 100. Goto‐Ito, S.; Ito, T.; Ishii, R.; Muto, Y.; Bessho, Y.; Yokoyama, S., Crystal structure of archaeal tRNA (m1G37) methyltransferase aTrm5. Proteins: Structure, Function, and Bioinformatics 2008, 72 (4), 1274-1289. 101. Suzuki, Y.; Noma, A.; Suzuki, T.; Senda, M.; Senda, T.; Ishitani, R.; Nureki, O., Crystal structure of the radical SAM enzyme catalyzing tricyclic modified base formation in tRNA. J. Mol. Biol. 2007, 372 (5), 1204-1214. 102. Blaby, I. K.; Phillips, G.; Blaby-Haas, C. E.; Gulig, K. S.; El Yacoubi, B.; de Crécy-Lagard, V., Towards a systems approach in the genetic analysis of archaea: accelerating mutant construction and phenotypic analysis in Haloferax volcanii. Archaea 2010, 2010. 103. Altschul, S. F.; Madden, T. L.; Schäffer, A. A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17), 3389-3402. 104. Singh, S. K.; Gurha, P.; Tran, E. J.; Maxwell, E. S.; Gupta, R., Sequential 2′-O-methylation of archaeal pre-tRNATrp nucleotides is guided by the intron-encoded but trans-acting box C/D ribonucleoprotein of pre-tRNA. 
J. Biol. Chem. 2004, 279 (46), 47661-47671. 105. RENALIER, M.-H.; JOSEPH, N.; Gaspin, C.; Thebault, P.; MOUGIN, A., The Cm56 tRNA modification in archaea is catalyzed either by a specific 2′-O-methylase, or a C/D sRNP. RNA 2005, 11 (7), 1051-1063. 106. Dennis, P. P.; Tripp, V.; Lui, L.; Lowe, T.; Randau, L., C/D box sRNA-guided 2′-O- methylation patterns of archaeal rRNA molecules. BMC Genomics 2015, 16 (1), 632. 107. Watkins, N. J.; Ségault, V.; Charpentier, B.; Nottrott, S.; Fabrizio, P.; Bachi, A.; Wilm, M.; Rosbash, M.; Branlant, C.; Lührmann, R., A common core RNP structure shared between the small nucleoar box C/D RNPs and the spliceosomal U4 snRNP. Cell 2000, 103 (3), 457-466. 108. Hardin, J. W.; Reyes, F. E.; Batey, R. T., Analysis of a critical interaction within the archaeal box C/D small ribonucleoprotein complex. J. Biol. Chem. 2009. 109. Bortolin, M. L.; Bachellerie, J. P.; Clouet‐d’Orval, B., In vitro RNP assembly and methylation guide activity of an unusual box C/D RNA, cis‐acting archaeal pre‐tRNA Trp. Nucleic Acids Res. 2003, 31 (22), 6524-6535. 110. Lui, L. M.; Uzilov, A. V.; Bernick, D. L.; Corredor, A.; Lowe, T. M.; Dennis, P. P., Methylation guide RNA evolution in archaea: structure, function and genomic organization of 110 C/D box sRNA families across six Pyrobaculum species. Nucleic Acids Res. 2018, 46 (11), 5678- 5691. 111. Dennis, P. P.; Omer, A., Small non-coding RNAs in Archaea. Curr. Opin. Microbiol. 2005, 8 (6), 685-694.


112. Suryadi, J.; Tran, E. J.; Maxwell, E. S.; Brown, B. A., The crystal structure of the Methanocaldococcus jannaschii multifunctional L7Ae RNA-binding protein reveals an induced- fit interaction with the box C/D RNAs. Biochemistry 2005, 44 (28), 9657-9672. 113. Su, A. A.; Tripp, V.; Randau, L., RNA-Seq analyses reveal the order of tRNA processing events and the maturation of C/D box and CRISPR RNAs in the Methanopyrus kandleri. Nucleic Acids Res. 2013, 41 (12), 6250-6258. 114. Tripp, V.; Martin, R.; Orell, A.; Alkhnbashi, O. S.; Backofen, R.; Randau, L., Plasticity of archaeal C/D box sRNA biogenesis. Mol. Microbiol. 2017, 103 (1), 151-164. 115. Wilkinson, M. L.; Crary, S. M.; Jackman, J. E.; Grayhack, E. J.; Phizicky, E. M., The 2′- O-methyltransferase responsible for modification of yeast tRNA at position 4. RNA 2007. 116. Guelorget, A.; Roovers, M.; Guerineau, V.; Barbey, C.; Li, X.; Golinelli-Pimpaneau, B., Insights into the hyperthermostability and unusual region-specificity of archaeal Pyrococcus abyssi tRNA m1A57/58 methyltransferase. Nucleic Acids Res. 2010, 38 (18), 6206-6218. 117. Kempenaers, M.; Roovers, M.; Oudjama, Y.; Tkaczuk, K. L.; Bujnicki, J. M.; Droogmans, L., New archaeal methyltransferases forming 1-methyladenosine or 1-methyladenosine and 1- methylguanosine at position 9 of tRNA. Nucleic Acids Res. 2010, 38 (19), 6533-6543. 118. Jackman, J. E.; Alfonzo, J. D., Transfer RNA modifications: nature's combinatorial chemistry playground. Wiley Interdisciplinary Reviews: RNA 2013, 4 (1), 35-48. 119. Golovina, A. Y.; Sergiev, P. V.; Golovin, A. V.; Serebryakova, M. V.; Demina, I.; Govorun, V. M.; Dontsova, O. A., The yfiC gene of E. coli encodes an adenine-N6 methyltransferase that specifically modifies A37 of tRNA1Val (cmo5UAC). RNA 2009, 15 (6), 1134-1141. 120. Wolf, J.; Gerber, A. P.; Keller, W., tadA, an essential tRNA‐specific adenosine deaminase from Escherichia coli. The EMBO journal 2002, 21 (14), 3841-3851. 121. 
Cavaillé, J.; Chetouani, F.; BACHELLERIE, J.-P., The yeast Saccharomyces cerevisiae YDL112w ORF encodes the putative 2′-O-ribose methyltransferase catalyzing the formation of Gm18 in tRNAs. RNA 1999, 5 (1), 66-81. 122. Persson, B. C.; Jäger, G.; Gustafsson, C., The spoU gene of Escherichia coli, the fourth gene of the spoT operon, is essential for tRNA (Gm18) 2′-O-methyltransferase activity. Nucleic Acids Res. 1997, 25 (20), 4093-4097. 123. Pintard, L.; Lecointe, F.; Bujnicki, J. M.; Bonnerot, C.; Grosjean, H.; Lapeyre, B., Trm7p catalyses the formation of two 2′‐O‐methylriboses in yeast tRNA anticodon loop. The EMBO journal 2002, 21 (7), 1811-1820. 124. Christian, T.; Evilia, C.; Williams, S.; Hou, Y.-M., Distinct origins of tRNA (m1G37) methyltransferase. J. Mol. Biol. 2004, 339 (4), 707-719. 125. Gabant, G.; Auxilien, S.; Tuszynska, I.; Locard, M.; Gajda, M. J.; Chaussinand, G.; Fernandez, B.; Dedieu, A.; Grosjean, H.; Golinelli-Pimpaneau, B., THUMP from archaeal tRNA: m 22 G10 methyltransferase, a genuine autonomously folding domain. Nucleic Acids Res. 2006, 34 (9), 2483-2494. 126. Bon Ramos, A.; Bao, L.; Turner, B.; de Crécy-Lagard, V.; Iwata-Reuyl, D., QueF-Like, a Non-Homologous Archaeosine Synthase from the Crenarchaeota. Biomolecules 2017, 7 (2), 36. 127. Ye, K.; Jia, R.; Lin, J.; Ju, M.; Peng, J.; Xu, A.; Zhang, L., Structural organization of box C/D RNA-guided RNA methyltransferase. Proc. Natl. Acad. Sci. U. S. A. 2009, 106 (33), 13808- 13813. 128. Zhou, M.; Long, T.; Fang, Z.-P.; Zhou, X.-L.; Liu, R.-J.; Wang, E.-D., Identification of determinants for tRNA substrate recognition by Escherichia coli C/U34 2′-O-methyltransferase. RNA Biol. 2015, 12 (8), 900-911.


129. Auxilien, S.; El Khadali, F.; Rasmussen, A.; Douthwaite, S.; Grosjean, H., Archease from Pyrococcus abyssi improves substrate specificity and solubility of a tRNA m5C-methyltransferase. J. Biol. Chem. 2007. 130. Bouvier, D.; Labessan, N.; Clémancey, M.; Latour, J.-M.; Ravanat, J.-L.; Fontecave, M.; Atta, M., TtcA a new tRNA-thioltransferase with an Fe-S cluster. Nucleic Acids Res. 2014, 42 (12), 7960-7970. 131. Ikeuchi, Y.; Kimura, S.; Numata, T.; Nakamura, D.; Yokogawa, T.; Ogata, T.; Wada, T.; Suzuki, T.; Suzuki, T., Agmatine-conjugated cytidine in a tRNA anticodon is essential for AUA decoding in archaea. Nat. Chem. Biol. 2010, 6 (4), 277. 132. Koonin, E. V., Pseudouridine synthases: four families of enzymes containing a putative uridine-binding motif also conserved in dUTPases and dCTP deaminases. Nucleic Acids Res. 1996, 24 (12), 2411-2415. 133. Blaby, I. K.; Majumder, M.; Chatterjee, K.; Jana, S.; Grosjean, H.; de Crécy-Lagard, V.; Gupta, R., Pseudouridine formation in archaeal RNAs: The case of Haloferax volcanii. RNA 2011. 134. Durairaj, A.; Limbach, P. A., Mass spectrometry of the fifth nucleoside: a review of the identification of pseudouridine in nucleic acids. Anal. Chim. Acta 2008, 623 (2), 117-125. 135. Patteson, K.; Rodicio, L. P.; Limbach, P. A., Identification of the mass-silent post- transcriptionally modified nucleoside pseudouridine in RNA by matrix-assisted laser desorption/ionization mass spectrometry. Nucleic Acids Res. 2001, 29 (10), e49-e49. 136. Addepalli, B.; Limbach, P. A., Mass spectrometry-based quantification of pseudouridine in RNA. J. Am. Soc. Mass Spectrom. 2011, 22 (8), 1363-1372. 137. Purta, E.; Van Vliet, F.; Tkaczuk, K. L.; Dunin-Horkawicz, S.; Mori, H.; Droogmans, L.; Bujnicki, J. M., The yfhQ gene of Escherichia coli encodes a tRNA: Cm32/Um32 methyltransferase. BMC Mol. Biol. 2006, 7 (1), 23. 138. Kambampati, R.; Lauhon, C. 
T., IscS is a sulfurtransferase for the in vitro biosynthesis of 4-thiouridine in Escherichia coli tRNA. Biochemistry 1999, 38 (50), 16561-16568. 139. Kambampati, R.; Lauhon, C. T., Evidence for the Transfer of Sulfane Sulfur from IscS to ThiI during the in Vitro Biosynthesis of 4-Thiouridine inEscherichia coli tRNA. J. Biol. Chem. 2000, 275 (15), 10727-10730. 140. Mandal, D.; Köhrer, C.; Su, D.; Babu, I. R.; Chan, C. T.; Liu, Y.; Söll, D.; Blum, P.; Kuwahara, M.; Dedon, P. C., Identification and codon reading properties of 5-cyanomethyl uridine, a new modified nucleoside found in the anticodon wobble position of mutant haloarchaeal isoleucine tRNAs. RNA 2013. 141. Armengod, M.-E.; Moukadiri, I.; Prado, S.; Ruiz-Partida, R.; Benítez-Páez, A.; Villarroya, M.; Lomas, R.; Garzón, M. J.; Martínez-Zamora, A.; Meseguer, S., Enzymology of tRNA modification in the bacterial MnmEG pathway. Biochimie 2012, 94 (7), 1510-1520. 142. DOMAIN, U. F. O., Characterization and Structure of the Aquifex aeolicus Protein DUF752. J. Biol. Chem 2012, 2012 (287), 43950-43960. 143. Kambampati, R.; Lauhon, C. T., MnmA and IscS are required for in vitro 2-thiouridine biosynthesis in Escherichia coli. Biochemistry 2003, 42 (4), 1109-1117. 144. Huang, B.; Johansson, M. J.; Byström, A. S., An early step in wobble uridine tRNA modification requires the Elongator complex. RNA 2005, 11 (4), 424-436. 145. Kalhor, H. R.; Clarke, S., Novel methyltransferase for modified uridine residues at the wobble position of tRNA. Mol. Cell. Biol. 2003, 23 (24), 9283-9292.


146. Deng, W.; Babu, I. R.; Su, D.; Yin, S.; Begley, T. J.; Dedon, P. C., Trm9-catalyzed tRNA modifications regulate global protein expression by codon-biased translation. PLoS Genet. 2015, 11 (12), e1005706. 147. Songe-Møller, L.; van den Born, E.; Leihne, V.; Vågbø, C. B.; Kristoffersen, T.; Krokan, H. E.; Kirpekar, F.; Falnes, P. Ø.; Klungland, A., Mammalian ALKBH8 possesses tRNA methyltransferase activity required for the biogenesis of multiple wobble uridine modifications implicated in translational decoding. Mol. Cell. Biol. 2010, 30 (7), 1814-1827. 148. Dewez, M.; Bauer, F.; Dieu, M.; Raes, M.; Vandenhaute, J.; Hermand, D., The conserved wobble uridine tRNA thiolase Ctu1–Ctu2 is required to maintain genome integrity. Proc. Natl. Acad. Sci. U. S. A. 2008, 105 (14), 5459-5464. 149. Waas, W. F.; Druzina, Z.; Hanan, M.; Schimmel, P., Role of a tRNA base modification and its precursors in frameshifting in eukaryotes. J. Biol. Chem. 2007, 282 (36), 26026-26034.


Appendices

Appendix Table A1 Modification mapping results from RNase T1 digestion of selected archaeal organisms

Euryarchaeota Crenarchaeota Pos M. kandleri M. marburgensis S. acidocaldarius T. tenax 10 CCUUA[m2G]p CCCAUA[m2G]p UA[G+]CGp A[G+]CUGp A[G+]CAGp A[G+]CUGp A[G+]CCUGp CUCA[G+]Gp UCUA[G+]CGp A[G+]CCAGp 15 UA[G+]CCUGp UUCA[G+]UUGp CUCA[G+]UUGp UUUA[G+]CGp UCUA[G+]UCUGp CUCA[G+]UCUGp CUUA[G+]CCAGp CCA[G+]CCAGp UA[G+]UCCGp CUAA[G+]CCCGp 18 CAA[Gm]Gp 2 2 2 2 C[m 2Gm]CCGp C[m 2G]UCGp C[m 2Gm]UCCGp AU[ m 2Gm ]Gp 2 2 2 26 C[m 2Gm]CGp C[m 2Gm]CCCGp UAU[ m 2Gm ]CGp 2 2 C[m 2Gm]CCGp C[ m 2G ]CCUCACUGp 2 AC[ m 2Gm ]CCCGp C[s2C]UGp AU[Cm]ACGp C[Cm]UUCCAAGp 32 CC[Um]UGp A[Cm]U[cnm5U]Gp [Cm]UCCAGp CU[C+]AU[ms2t6A]ACCGp CU[Cm]AUAACCCUGp CCU[ac4C]Gp 34 CU[mnm5s2U]UGp CCU[mcm5Um]CC[t6A]AGp CCU[Cm]Gp CU[mnm5U]AGp CU[mcm5Um]UUAACCCGp ACU[Cm]Gp ACU[mnm5s2U]CGp CU[Cm]UGp CCU[mcm5s2U]CGp 5 5 A[Cm]U[cnm U]Gp CCU[mcm Um]Gp [imG-14]ACCUGp CCU[mcm5Um]CC[t6A]AGp UC[m1G]AGp AA[imG-14]AUCUGp AU[t6A]ACCGp AC[m1G]CGp A[imG-14]AU[m5C]CCGp U[t6A]AGp U[t6A]AUGp CC[m6A]CGp A[mimG]AGp AC[m6A]CGp AU[t6A]ACCGp 37 AU[t6A]ACCGp U[t6A]AUCCCGp U[t6A]AU[m5C]UCGp CU[C+]AU[ms2t6A]ACCGp CU[Cm]AU[t6A]ACCGp ACUCCU[t6A]AUCUGp 40 AU[m5C]CCGp A[imG-14]AU[m5C]CCGp U[m5C]CCUGp U[t6A]AU[m5C]UCGp 44 CCUC[Um]Gp [Um]CCCCGp 48 U[m5C]CCCGp AU[m5C]ACGp AC[m5C]AGp AU[m5C]CCGp 49 ACC[m5C]CGp UC[m5C]CGp [m1A]AUCCCGp [m1A]AUCCCGp 58 [m1A]AUCCCACCCCCCGp [m1A]AUCCCACC[m2G]p 67 [m1A]AUCCCACC[m2G]p
