<<

EFFECTS OF LOCAL RNA SEQUENCE AND STRUCTURAL

CONTEXTS ON P PROCESSING SPECIFICITY

By

JING ZHAO

Submitted in partial fulfillment of the requirements for

the degree of Doctor of Philosophy

Dissertation Advisor: Dr. Michael E Harris

Department of Biochemistry

CASE WESTERN RESERVE UNIVERSITY

May, 2019

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of

Jing Zhao

candidate for the degree of Doctor of Philosophy *.

Committee Chair

Eckhard Jankowsky, Ph.D.

Committee Member

Michael E Harris, Ph.D.

Committee Member

Jonatha Gott, Ph.D.

Committee Member

Derek Taylor, Ph.D.

Committee Member

Date of Defense

March 5th, 2019

*We also certify that written approval has been obtained for any proprietary material contained therein

2 Table of Contents

Acknowledgement ...... 8

Abstract ...... 10

Chapter 1 Background and Significance...... 13

Background ...... 13

Significance ...... 24

Chapter 2 Distributive binding controlled by local RNA context results in 3′ to

5′ directional processing of dicistronic tRNA precursors by ribonuclease

P ...... 27

Abstract ...... 28

Introduction ...... 29

Results ...... 36

Evidence for 3′ to 5′ ordered processing of dicistronic ptRNAval VW by RNase P in

vitro ...... 36

Kinetic evidence for a distributive processing mechanism ...... 46

A stable stem loop in the tRNAvalW 5′ leader inhibits RNase P cleavage and further

enforces directional processing...... 61

Discussion ...... 65

Materials and Methods ...... 73

Chapter 3 ...... 78

3 Global analysis of sequence specificity landscape for C5 using monocistronic ptRNA demonstrates that presence of secondary structure in the leader sequence influences the RNase P processing specificity ...... 78

Abstract ...... 79

Introduction ...... 81

Results and Discussion...... 85

Engineering a stable stem-loop proximal to the P protein reveals the

sequence preferences of P protein more clearly ...... 85

The stable stemloop in the 21C distal leader sequence minimizes the effects of

unfavorable secondary structure on P protein binding specificity ...... 89

Quantitative modeling indicates that even in the absense of unfavorable structures

within leader sequences, surrounding sequence context is still an important factor for

sequence specificity ...... 92

The contribution to P protein binding by N(-6) to N(-3) depends on the

identity of nucleotides N(-2) and N(-1) ...... 98

G in the 5′ leader sequence is an anti-determinant, but only in the presence of a

second pairing interaction with the 3′ CCA ...... 103

Sequence landscape for RNase P processing specificity ...... 109

Materials and Methods ...... 112

Chapter 4: Summary and Further Directions ...... 116

Summary ...... 116

Future directions ...... 118

4 Appendix ...... 121

...... 148

Reference ...... 157

5 List of Figures

Figure 1. 1 Recognition model of ptRNA by RNase P holoenzyme ...... 15

Figure 1. 2 Global precursor tRNA distribution in E.coli K12 MG1655...... 18

Figure 1. 3 Global secondary structure of precursor tRNA of E.coli K12 MG1655 ...... 20

Figure 1. 4 Proposed free energy landscape of E. coli RNase P ...... 23

Figure 2. 1 3′ to 5′ directional RNase P cleavage is the initial step in processing of three dicistronic tRNA precursors 31

Figure 2. 2 In-line structure probing of 5′ 32P end-labeled dicistronic valVW precursor

RNA corresponds to predicted secondary structure ...... 39

Figure 2. 3 In vitro processing of of 5′ 32P end-labeled ...... 44

Figure 2. 4 Site specificity of RNase P processing of the dicistronic valVW precursor in vitro ...... 45

Figure 2. 5 Dependence of the observed rate constants and accumulation of reaction intermediate as a function of enzyme concentration ...... 49

Figure 2. 6 Determination of the reversibility of binding at the 5′ tRNA valV processing site ...... 51

Figure 2. 7 Kinetics analysis RNase P processing of 5′ 32P-labeled dicistronic valWV (A and B), valVV (C and D), and valWW (E and F) substrate ...... 56

Figure 2. 8 In-line structure probing of 5′ end labeled dicistronic valWV (A, B), valVV

(C, D) and valWW (E, F) substrate RNAs ...... 60

Figure 2. 9 Single turnover reaction analysis of precursor tRNA rate constants ...... 64

Figure 3. 1 HiTS-KIN substrate pool design and kinetic reaction ...... 84

Figure 3. 2 Determination of the effects on RNase P processing of 5′ leader sequence variation on the relative kcat/Km for all 4096 substrate variants ...... 88

Figure 3. 3 unfavorable secondary structures interfere with C5 protein binding specificity

...... 91

Figure 3. 4 PWM modeling of C5 protein binding specificity...... 95

Figure 3. 5 Heatmap modeling of C5 protein binding specificity including neighborhood interaction ...... 97

Figure 3. 6 Sequence variations from N(-6) to N(-3) are interdependent with identity of

N(-2) and N(-1)...... 101

Figure 3. 7 Combination of C(-4)G(-3) is a positive determinant when “A/U” appeared in

N(-2)N(-1) positions...... 102

Figure 3. 8 Comparison of the krel value of all 16 combinations at N(-4) and

N(-2) to the reference sequence ...... 105

Figure 3. 9 variations at the N(-1) position with a “G” at N(-2) indicates that a single base pairing interaction between leader sequence and the 3′ CCA may contribute to krel...... 107

Figure 3. 10 Propopsed Free energy landscape of RNase P processing of substrate pools through multiple-turnover reaction ...... 111

Figure 3. 11 HiTS-KIN processing diagram ...... 114

7 Acknowledgement

Here I am, at the point of graduation. When I look back, I don’t think I could have gone through the entire journey without the help from so many people: parents, advisor, friends, colleagues and so on. I am very thankful to my undergrad university, before I came to US to pursue my master and Ph. D degree in 2012, for giving me knowledge and courage. I was born in a small but happy family. As the only child, I am really thankful for all the love, care and full support that my parents gave me in the past, present and future.

Regarding this thesis dissertation, first of all, I want to express my great appreciation for my advisor, Dr. Michael Harris who played an essential role in my research career. I came to Dr. Harris lab in 2013 summer as a master student, he took me in and trained me. I had some lab experience during my undergraduate, but everything I learned in Harris lab was all new to me. After one semester, he recruited me as a graduate student. During the next five years as a graduate student in Harris lab, I had my own independent project and also some cooperative projects with my colleagues. We communicated a lot and he inspired my mind with insightful discussions on a weekly basis. He taught me how to conduct a project and help me go through the troubleshooting. He always led me in patient and gave me a lot of help on both my presentation and writing. And more importantly, he helped me build my faith in science.

Also, I would like to express my thanks to all lab members in Harris lab. Thanks to Dr.

Courtney N. Niland who trained me when I just came to this lab and lead me and helped me to learn the essential knowledge of HiTS-KIN technique. I need to especially point out

8 that the contribution of Dr. Courtney Niland in chapter 3, all the data from 21A population is from her. Thanks to Dr. Hsuan-chun Lin who helped me to analyze the sequencing data and expand my research and learning by teaching me the bio-statistics processing. Thanks to Shahbaz Gardezi and Jiaqiang Zhu for proofreading my document. I also want to take a moment to express my thanks to the rest of the colleagues, Dr. Andrew Knappenberger,

Edward Ollie, Kandice Simmons and all the new lab members in University of Florida,

Shahbaz Gardezi, Loc Huynh, Han Hu, Jiaqiang Zhu, Suhyun Yoon, Michael Hanna who build and maintain a harmonic atmosphere for science.

I would like to take a moment here to sincerely acknowledge all my committee members,

Dr. Eckhard Jankowsky, Dr. Jonatha Gott, Dr. Derek Taylor. They have been providing me some very useful advice for my research. I really appreciate them for helping me on the qualifying exam, attending my annual department seminars and finally my graduation defense.

Last but not least, I need to give myself some credit for working hard, maintaining a positive altitude on research and spirit of never giving up. I am really grateful for everything that I learned so far, and I want to keep learning for the rest of my life. I am ready to move on in my life with grateful.

9 Effects of Local RNA Contexts on Processing Specificity

Abstract

by

JING ZHAO

The affinity and kinetics of RNA and protein association depends on the stability of the

RNA-protein complex as well as the structure and stability of the free RNA and protein.

While much is known about how and bind their specific sites in RNA, less is understood about the effects of RNA sequence and structure contexts surrounding these specific binding sites on binding. Ribonuclease P (RNase P) is a ribonucleoprotein enzyme involved in tRNA processing that relies on the protein subunit that forms essential contacts to a linear sequence of 3-8 nucleotides in the 5′ leader of precursor tRNA (ptRNA).

Therefore, binding is likely to be influenced by the presence of competing secondary structure in the free RNA leader sequence. To address this issue, we pursued two primary lines of investigation. First, we compared the kinetics of RNA cleavage in monocistronic and polycistronic ptRNAs, which offers the opportunity to investigate the role of local

RNA contexts on recognition. Kinetic analysis by using single turnover kinetics in vitro, cleavage site mapping and uniform substrate labelling on polycistronic ptRNAval VW reproduces the 3′ to 5′ ordered processing observed in vivo suggesting this is the most predominant pathway by RNase P. In-line probing further reveals the effects of both ptRNAs order and leader structure on directional processing. Kinetic analysis on valV and valW individually suggests that valV is an intrinsically slow and valW is an intrinsically fast substrate due to the leader structure on valV. Therefore, ordered processing can arise

10 from the ability of genomically encoded sequences to form inhibitory secondary structure proximal to the RNase P cleavage site that inhibits processing.

In a parallel experiment, we designed three different ptRNA substrate pools with randomized leader sequences proximal to the RNase P cleavage site, but with different

RNA sequences appended upstream. Using a high throughput approach the rate constants for all the individual RNA substrates in the population were determined, and the resulting data was analyzed to test the effect of variation in both sequence and secondary structure on processing specificity globally. The effects of two upstream extensions (21A and 21B) containing transversion mutations were compared. Also, a third population (21C) was designed to form a stable stem loop that should insulate the randomized protein binding site from the effects of inhibitory interactions with additional leader sequences. Using a high-throughput sequencing kinetics (HiTS-KIN) method, we determined the rate constants for all sequence variants at N (-1) to N(-6) in the 21A, B and C backgrounds.

Comparison of the three HiTS-KIN data clearly demonstrates that formation of inhibitory secondary structures influences the association kinetics of a subset of substrates. Together, the comparative HiTS-KIN analyses in vitro and mechanistic basis of ordered genomic encoded polycistronic ptRNAs processing demonstrate the contribution of structure and sequence variations to RNase P processing specificity. Surprisingly, the results from both studies reveal several substrates that are processed rapidly despite the presence of structure in the 5′ leader sequence. Altogether, these results provide a better understanding of intrinsic molecular recognition properties of similar multiple substrate RNA processing

11 enzymes, reveal in detail the effects of sequence and structure contexts on RNA molecular recognition, and suggest new questions for future study of this important area of research.

12 Chapter 1 Background and Significance

Background

Coding and non-coding RNAs undergo cleavage and splicing reactions that remove leader, trailer, and/or intron sequences before they assume their biological functions. The recognition sites for RNA processing enzymes are usually highly conserved. However, these consensus recognition sites occur within a surrounding context of RNA sequences and structures that can vary significantly between different precursor transcripts. Indeed, the flanking RNAs that surround RNA processing sites have been shown to alter the reaction rates of pre-miRNA processing,(1-3) pre-mRNA splicing(4) and in numerous other

RNA processing pathways(5-7). The maturation of tRNAs is a prime example of how the recognition sites for RNA processing can occur within the context of many different flanking sequences. However, little is known about the influence of the flanking sequence contexts on precursor tRNA (ptRNAs) processing rates, or whether such effects have any significance for regulation of tRNA abundance in vivo.

Ribonuclease P (RNase P) is found in all three domains of life, , Eukarya and

Archaea, and is conserved through evolution(8). In the cell, RNase P is essential. It removes the extra 5′ leader sequences of all 87 different ptRNAs by hydrolysis of a phosphodiester bond between the tRNA and the 5′ leader. The leader sequences of different ptRNAs vary greatly, including sequence contexts, length and structures. The bacterial tRNA processing enzyme, RNase P, is a representative of a larger class of RNA processing enzymes that recognize multiple substrates. Thus, it can serve as a useful model system to quantitatively dissect the influence of the local RNA sequences and structures on RNA binding specificity.

13 The bacterial RNase P has a simple composition: one RNA component called P RNA

(about 350-400 nucleotides), serving as the catalyst during the reaction, is independently functional, which means it can recognize and process ptRNAs alone with slow catalysis or less specificity(9); and a protein subunit (P protein), the essential protein subunit of

Escherichia coli (E. coli) RNase P called C5, about 100 amino acids long, which usually binds to single-stranded leader sequence ( “-3” to “-8” position) of ptRNAs (Figure 1.1)(10) and contributes to molecular recognition of various of ptRNA substrates by enhancing the catalysis and specificity of P RNA(11,12). Since RNase P is essential to the cell, both the P

RNA component and the protein subunit are necessary for cell viability. Despite the lack of the conservation of leader sequence from all ptRNAs, some experimental approaches such as mutagenesis(12-16), crosslinking(17,18) and X-ray of Thermotoga maritima RNase P-substrate complex(10) indicated that P RNA makes direct contacts with the tRNA substrate as well as the protein subunit and binds to leader sequence from “-1” to “-8”. The kinetics of genomically encoded E. coli ptRNAs with RNase P revealed that they occur in vitro with essentially the same kcat/Km values which is consistent with the structural model(13,19). By using this experimental system, we ask the following questions:

How variable are the sequences surrounding RNase P processing sites? To what extent does variation in the local sequence context affect RNase P processing? And, do these effects contribute to tRNA abundance or regulation?

14 A B

Figure 1. 1 Recognition model of ptRNA by RNase P holoenzyme

(A) General secondary structure of ptRNAs. Leader sequences of ptRNAs show variation in both sequences, length and structural contexts. Also see Appendix 1.

(B) The 3.8 Å resolution X-ray crystal structure of RNase P-tRNAphe complex from

Thermotoga maritima(10). The leader sequence of tRNA (in red) seems fully caught in P protein and a part of tRNAphe body (in orange) makes directly contact with P RNA (in green).

15 According to the tRNA database, there are over 80 tRNA encoded in the E. coil genome (Figure 1.2)(20,21). 28 of them are transcribed as monocistronic ptRNAs in which each of these tRNA genes has its own promoter and they are transcribed into individual precursor tRNAs. 51 of them are transcribed as 14 polycistronic ptRNAs. each of these polycistronic ptRNAs share one promoter. Polycistronic ptRNAs contain at least 2 and up to 8 tRNAs in one transcriptional unit. The rest of the 7 precursor tRNAs are encoded within the ribosomal RNA precursor transcripts. Our analysis of leader sequences of these tRNA genes shows that the 5′ leader sequence of both monocistronic and polycistronic ptRNAs can vary from highly structured to simple homopolymeric single strand sequences.

In order to function, all of the ptRNAs require subsequent processing at both the 5′ and 3′ ends in order to generate the mature tRNAs for protein synthesis. RNase E is a major contributor to the initial processing of most of the polycistronic ptRNA transcripts and generates monocistronic ptRNAs products that undergo further 5′ and 3′ terminus maturation. RNase E is well-known for cleaving single-stranded RNA sequences guided by an adjacent structure or stem loop(22-24). It has also been suggested that the difference in cleavage efficiency of RNase E is influenced by canonical nucleotide A/U-rich RNase E cleavage site(25). Therefore, for the initiation step, ptRNA processing by RNase E can occur with considerable differences in efficiency of processing for different polycistronic ptRNA contexts. While maturation of 5′ leader sequence of monocistronic ptRNAs by RNase P occurs endonucleolytically. 3′ end maturation is usually accomplished by a combination of enzymes including RNase T, RNase PH, RNase D(26). However, there are three polycistronic ptRNAs from the valine and lysine family for which the initial processing step is entirely dependent on RNase P. The processing of this unexpected and remarkable

16 class of ptRNAs is investigated in detail and discussed in chapter 2. All ptRNAs within these polycistronic transcripts have encoded a 3′ “ACCA” sequence next to acceptor stem loop that can also be a part of the 5′ leader sequence of the downstream tRNA.

17

Monocistronic ptRNAs (28) Polycistronic ptRNAs (51)

tRNA 5’ 3’ tRNA tRNA tRNA 5’ //…// 3’ serW serU serT serX leuX leuW leuZ glyU metT-leuW-glnU-glnW-metU-glnV-glnX argU argW proL proK lysT-valT-lysW-valZ-lysY-lysZ-lysQ-rho f metY metT thrV thrW serV-argV-argY-argZ-argQ asnW pheU asnU asnT valU-valX-valY-lysV-rho aspV aspU asnV gltU argX-hisR-leuT-proM

ileX ileY gltV gltT metZ-metW-metV-rho thrU-tyrU-glyT-thrT Ribosomal ptRNAs (7) glyV-glyX-glyY leuQ-leuP-leuV RIBOSOMAL alaV RNA glyW-cysT-leuZ secG-leuU-rho gltW RIBOSOMAL ileU RNA valV-valW * alaW-alaX RIBOSOMAL aspT RNA alaT tyrT-tyrV

ileV RIBOSOMAL ileT E.coli K12 MG1655 RNA

Figure 1. 2 Global precursor tRNA distribution in E.coli K12 MG1655

All precursor tRNAs are categorized into three groups, each tRNA family is marked in different colors. Monocistronic ptRNAs are transcribed individually and have their own promoter. Polycistronic ptRNAs contain at least two ptRNAs that are transcribed together and share a single promoter. Only seven ptRNAs are encoded within . Dicistronic

VW with asterisk is further studied in the chapter 2.

18 One distinctive feature of bacterial RNase P is that it has the ability to correctly process multiple alternative substrates as all the different precursor tRNAs are the primary substrates for this enzyme. Seemingly, the bacterial RNase P substrate recognition occurs with few primary sequence determinants unlike restriction enzymes and other sequence- specific . Rather, recognition seems to be achieved by contributions of both sequence and structure-specific interactions that act cooperatively to accurately identify the correct cleavage site for hydrolysis.

The global organization of tRNA structure is highly conserved: an acceptor stem with CCA at the 3′ end, a D stem loop, a T stem loop and an anticodon stem loop (Figure 1.3, panel

A)(16). As outlined above, tRNA precursors must undergo a series of processing reactions to form functional tRNAs, these reactions include removal of extra sequences at both 5′ and 3′ ends and modification of specific nucleobase residues. It is known that RNase P

RNA binds precursor tRNA through the acceptor stem and coaxially stacked T-stem loop(27). The anticodon stem loop and D-stem loop do not interact with RNase P RNA directly, since their deletion has only a limited influence on RNase P activity. Additionally, the alignment study of all ptRNAs from E. coli K12 further shows that there is no sequence conservation on acceptor stem (region II and III) except the 3′-CCA end. About two thirds of the ptRNAs have (+1)G-C(+72) base pair that closes the acceptor stem and a similar percentage of the ptRNAs have “U” at the N(-1) position, while the rest of the leader sequence (region I) shows very limited sequence conservation(20). The percentage of the ptRNAs that contain all three P RNA recognition elements is about half, we refer to these as canonical ptRNAs.

19

A

B

Figure 1. 3 Global secondary structure of precursor tRNA of E.coli K12 MG1655

(A) The general and conserved secondary structure of ptRNAs. Nucleotides shown in the structure are usually conserved and recognized by P RNA. C5 protein binds to leader region

I (gray circles). The acceptor stem and T stem loop of tRNA interact with P RNA ( black circles).

(B) Sequence logos for three regions (I, II, III). Leader sequence (region I) has no sequence conservation. 5′ side of acceptor stem shows “G” at N(+1), and 3′ end sequences are conserved with “CCA”.

20 Current X-ray crystal structure of bacterial RNase P-tRNA complex are consistent with biochemical data showing that recognition determinants are the acceptor stem loop and few nucleotides which are located near the cleavage site. First are the N (-1) and N (-2) positions that are directly contacted by P RNA subunit. The N (-1) nucleobase has a strong interaction with a specific adenosine in the universally conserved J5/15 region of P RNA

(14,15). Mutations at the N (-1) position were observed to change the processing rate and site-specificity of RNase P and are partly rescued by compensatory mutations on P

RNA(14,28). The N (-2) nucleobase is likely interacting with J18/2 region of P RNA(15,18).

The nucleobases from N(-3) to N(-8) in the leader sequences are P protein binding sites.

Some biochemical studies showed that sequence preferences at N(-4) position are observed for both E. coli and B. subtilis indicating that the RNase P specificity at N(-4) position is a common substrate recognition site in the cell. The difference in protein-RNA interaction between E. coli and Bacillus subtilis (B. subtilis) also suggests that the molecular recognition of P protein and the leader sequences of ptRNAs are species-specific(12).

Additional biochemical and structural data also shows that the 5′ leader sequence of substrate binds to the cleft surface of C5 protein(29). This is consistent with current resolution of X-ray crystal structure of Thermotoga maritima RNase P-substrate complex, where the 5′ leader sequence nucleotides seems to be interacting with the protein subunit

(Figure 1.1, panel B)(10). However, the mechanistic basis of sequence recognition remains unknown. Many in vitro studies have shown that the variation of 5′ leader sequence involved in tRNA-P RNA and tRNA-protein interactions could have significant functional effects on RNase P processing specificity(11,30-33). The analysis of 5′ leader single point nucleobase mutations and amino acid substitutions in protein subunit of Bacillus subtilis

21 RNase P indicate the specific interaction of adenosine at N(-4). However, the sequence preferences of E.coli RNase P are still unknown.

Besides the crystal structure of the RNase P-tRNA complex, another way to understand

RNase P specificity is to understand the kinetic mechanism of the enzyme. The catalytic mechanism of bacterial RNase P has been established and is shown to proceed via four steps: “Association”, “Conformational change”, “Cleavage” and “Product Release”. The disassociation of the leader sequence of RNase P is rapid. However, stronger binding of protein subunit and leader sequence leads to a slower product release step. Since we performed the reaction under multiple turnover condition with excess substrate reacted with limited enzyme. Therefore, the rate limiting step of the reaction is the substrate and enzyme association and/or the conformational change step(34). The value of second order rate constant (kcat/Km) reflects the kon. As we discussed above, the direct contacts between

P RNA and the substrate tRNA are usually conserved. Substrate variation arises from the variation in sequence and structure context of 5 ′ leader sequence. This difference influences the speed of association and conformational change and could ultimately lead to the relative enrichment of different tRNA species in the cell.

22

Figure 1. 4 Proposed free energy landscape of E. coli RNase P

23 Significance

The recent years have witnessed an explosion of the high-resolution structures of functional

RNAs, such as the HDV , ribosome and self-splicing introns. In parallel, there are increasing numbers of the researchers that focus on studying or discovering the mechanism of non-coding RNAs involved in regulation. However, only limited research has been done that shows the far-reaching connections between RNA structure and function as well as how chemical detail of RNA leads to its biological function. Most of the limitations are due to the lack of efficient experimental technique. Here, we use

RNase P as the model to study multiple substrate recognition by a single enzyme. Bacterial

RNase P has a simple composition and is an essential enzyme. Furthermore, it is the only

RNA processing enzyme responsible for all tRNA 5′ end maturation. Understanding how

RNA sequence contexts and secondary structures influence RNase P substrate recognition is important for uncovering the complete biological function of RNase P. The same principles are likely to apply to other much more complex RNA processing enzymes such as the , spliceosome, TRAMP complex and complexes involved in miRNA and lncRNA function.

In chapter 2, we use three distinctive polycistronic ptRNAs that were recently discovered also considered as primary transcripts for RNase P in vivo. While RNase P usually processes monocistronic ptRNAs, it can also process three polycistronic ptRNAs from the valine and lysine family without initial cleavage by RNase E. By using in vitro kinetics to study this process we identified how the polycistronic context in conjunction with the leader structure affects processing specificity. We also identified that the mechanism is

24 non-processive. It is well established that RNase P is the only enzyme involved in 5′ leader sequence maturation for ptRNAs(25,33). Recent studies clearly show that the level of tRNAs in E.coli affect translation efficiency and cell function. These levels are most likely regulated based on the demand for protein synthesis(35,36). Our results, obtained from studying the processing of polycistronic ptRNAs, further help to understand this multiple substrate processing enzyme and its potential role in regulation of tRNA abundance.

In chapter 3, we analyzed the effects of variation in both the sequence and secondary structure in the leader sequence proximal to cleavage sites on global processing specificity.

We created three randomized substrate pools with three different 21 nucleotide long extra sequence upstream of the randomized region, N(-1) to N(-6) which is proximal to the cleavage site to avoid ligation bias. To determine these rate constants, we used the HiTS-

KIN method developed by our lab. HiTS-KIN combines next-generation sequencing and internal competition to measure the relative rate constants for a large number of RNA substrates simultaneously(31). These data were analyzed using a quantitative model for specificity and correlation analyses with calculated folding free energies to identify effects due to local RNA folding. By comparing the results between three different substrate pools, for the first time, we can systematically examine and understand the effects from both the sequence and structure context and how they regulate or control RNase P specificity.

Such an in-depth study of RNase P not only helps to better understand biological catalysis and RNA function but also to identify facets that influence specificity in vivo. Additionally,

25 such as RNase P RNA are currently being investigated as potential therapeutics and hopefully this fundamental study will inspire future drug design.

26

Chapter 2 Distributive enzyme binding controlled by local RNA context results in

3′ to 5′ directional processing of dicistronic tRNA precursors by Escherichia coli ribonuclease P

The study described in this chapter was published in

Zhao, J. and M. E. Harris. "Distributive enzyme binding controlled by local RNA context results in 3' to 5' directional processing of dicistronic tRNA precursors by Escherichia coli ribonuclease P." Nucleic Acids Res. (2018)

27 Abstract

RNA processing by and RNA modifying enzymes often involves sequential reactions of the same enzyme on a single precursor transcript. In Escherichia coli, processing of polycistronic tRNA precursors involves separation into individual pre- tRNAs by one of several ribonucleases followed by 5′ end maturation by ribonuclease P.

A notable exception is valine and lysine tRNAs encoded by three polycistronic precursors that follow a recently discovered pathway involving initial 3′ to 5′ directional processing by RNase P. Here, we show that the dicistronic precursor containing tRNAvalV and tRNAvalW undergoes accurate and efficient 3′ to 5′ directional processing by RNase P in vitro. Kinetic analyses reveal a distributive mechanism involving dissociation of the enzyme between the two cleavage steps. Directional processing is maintained despite swapping or duplicating the two tRNAs consistent with inhibition of processing by 3′ trailer sequences. Structure-function studies identify a stem-loop in the 5′ leader of tRNAvalV that inhibits RNase P cleavage and further enforces directional processing. The results demonstrate that directional processing is an intrinsic property of RNase P and show how

RNA sequence and structure context can modulate reaction rates in order to direct precursors along specific pathways.

28 Introduction

There are over 80 different tRNA genes encoded in the Escherichia coli genome.

These individual genes are transcribed in the form of 28 monocistronic precursor tRNAs

(ptRNAs), 14 polycistronic precursors containing up to seven individual tRNAs, and seven tRNAs encoded within ribosomal RNA precursors(21) (Appendix 1). The pathways by which these tRNA precursors are processed into mature tRNAs were defined primarily by molecular experiments that characterized the potential intermediates that accumulate in cells depleted of one or more specific tRNA processing enzymes by conditional mutations(25,26,33,37,38). The sequence of these partially processed polycistronic precursors support a general biosynthetic pathway in which processing is initiated by

RNase E cleavage within a few nucleotides downstream of each tRNA 3′ terminal CCA sequence. Subsequent endonucleolytic cleavage by the ribonucleoprotein enzyme ribonuclease P (RNase P) is responsible for removal of 5′ leader sequence to generate the mature tRNA 5′ end(39,40). The process of final 3′ terminal maturation as well as rho terminator removal is performed by a combination of 3′ to 5′ (38,41,42) (Figure

2.1 A).

29

A B

metT-leuW-glnU-glnW-metU-glnV-glnX serV-argV-argY-argZ-argQ valV-valW argX-hisR-leuT-proM valU-valX-valY-lysV-rho thrU-tyrU-glyT-thrT lysT-valT-lysW-valZ-lysY-lysZ-lysQ-rho metZ-metW-metV-rho glyV-glyX-glyY leuQ-leuP-leuV glyW-cysT-leuZ secG-leuU-rho alaW-alaX tyrT-tyrV

30 Figure 2. 1, 3′ to 5′ directional RNase P cleavage is the initial step in processing of three ploycistronic tRNA precursors

A), The eleven ploycistronic tRNA precursors are listed that are processed by the major processing pathway in which the initial step is separation into individual monocistronic units by RNase E (indicated by an arrow), 5′ end maturation by RNase P (shown as a lightning bolt), and subsequent 3′ end trimming (not shown). B), The three dicistronic tRNA precursors encoding the entire complement of valine and lysine tRNAs undergo initial 3′ to 5′ directional processing by RNase P, followed by 3′ end trimming.

31 In contrast to this canonical pathway for polycistronic tRNA precursor processing, recent molecular genetic studies by Kushner and colleagues demonstrated an alternative processing pathway for three polycistronic precursors that encode the entire complement of valine and lysine tRNAs (valV/valW, valU/valX/valY/lysY and lysT/valT/lysW/valZ/lysY/lysZ/lysQ). This alternative pathway is of particular interest because it involves previously unidentified functions for RNase P, which normally acts after separation of individual tRNAs from polycistronic substrates(33) (Figure 2.1 B).

Northern blot and primer extension analysis of transcripts that accumulate when RNase P is absent demonstrate that all three polycistronic transcripts undergo initial processing such that separation into individual tRNAs is concomitant with 5 ′ end maturation(25,33).

Remarkably, RNase P separates the individual tRNAs by first removing the Rho- independent transcription terminators from the primary valU and lysT transcripts (for clarity the two longest polycistronic substrates are referred to by the tRNA located at their

5′ end), and then subsequently catalyzes cleavage proceeding in the 3′ to 5′ direction generating one pre-tRNA at a time. This unexpected function of RNase P in vivo presents several challenges to our current understanding of the substrate specificity of this widespread and essential enzyme. For such a directional processing pathway to occur the rate of processing of the internal RNase P cleavage sites must be supressed in the primary precursor transcript. These same sites must then become activated for RNase P cleavage subsequent to the processing of the tRNA located immediately downstream. As discussed below, there are several potential mechanisms for such ordered processing. However, the basis for the observed differences in relative rates of phosphodiester bond hydrolysis at different cognate processing sites by RNase P that result in directionality is unknown.

32 This gap in our knowledge exists despite the fact that the structure of bacterial

RNase P and its molecular recognition of several model pre-tRNAs have been well studied(43-45). In most organisms, RNase P occurs as a ribonucleoprotein containing a highly conserved RNA subunit (P RNA) and a single protein or a collection of proteins(44,46).

The composition of bacterial RNase P is simplest and is composed of a ca. 400nt P RNA subunit and a single ca.100 amind acid(47) protein subunit that is conserved in all three kingdoms(48). The folded structure is compact and forms a pocket for contacting the tRNA substrate(10,17,47,49), which includes the that positions metal ions for catalysis. P

RNA is bound by a single smaller protein subunit of approximately 100 amino acids that contributes to molecular recognition by contacting pre-tRNA 5′ leader sequences(50), as well as facilitating P RNA metal ion binding(19,51), and stabilizing structure and conformational changes(52-54). A subset of Eukarya alternatively has one of two classes of homologous protein enzymes as their nuclear or organellar RNase P(44,55). Detailed structure-function studies show that P RNA recognizes the 11-nucleotide helix formed by the acceptor stem and T-stem and contacts functional groups on the G(1)-C(72) base pair at the tRNA 5′ end, the R(73)C(74)C(75) sequence at the 3′ end of the tRNA, and N(-1) and N(-2) in the 5′ leader sequence. The substrate binding of P RNA also contacts

2′-OH groups in the T stem-loop (27,56). The spacing of these contacts in the T stem-loop in relation to the cleavage site results in an overall shape recognition of the substrate (10,57-59).

The P protein subunit interactions with the pre-tRNA 5′ leader have distinct sequence preferences at positions N(-3) to N(-6), primarily for the nucleobase at N(-4), although this interaction appears to be species-specific(12,30). Despite this detailed understanding of

RNase P molecular recognition, none of these features explains the differences in

33 processing rate constants at different cognate binding sites that results in 3′ to 5′ directional processing observed in vivo (25,33).

A common feature of RNA processing enzymes including RNase P is the ability to recognize cognate processing sites within different RNA sequence and structure contexts(26,37,42,60-64). There is growing appreciation that the conserved sequences and structures directly contacted by RNA processing enzymes are embedded within a surrounding RNA context that can indirectly influence the relative rates of processing and affinities of RNA binding proteins(65-71). In vitro and in vivo studies in an increasing number of experimental systems shows that context-dependent effects on RNA recognition can arise due to multiple factors including the presence of competing binding sites(72-77) and the ability to form alternative secondary structures(78-81). Thus, it is important to understand the basis for differences in the relative rates of processing at individual cognate sites in RNA precursors, and how they may contribute to . With respect to tRNA processing, differences in processing rates can affect steady state tRNA levels in biologically meaningful ways(36,82-85). The processing of polycistronic RNA precursors by

RNase P thus provides a valuable experimental system for investigating molecular recognition of multiple alternative processing sites embedded within a single RNA substrate.

In order to better understand the basis for different rates of processing for different

RNase P cleavage sites we investigated the in vitro processing of the simplest RNase P- dependent dicistronic precursor substrate (referred to throughout as valVW), which encodes only two substrates, tRNAvalV and tRNAvalW. The 3′ to 5′ pattern of processing observed in vivo is also observed in vitro with synthetic precursor valVW and purified

34 RNase P, demonstrating that no additional cellular factors contribute significantly to processing order. Kinetic analyses reveal a distributive mechanism in which directional processing occurs by two independent binding and cleavage steps. Structure-function studies show that inhibition of RNase P cleavage by the presence of long 3′ trailer sequences as well as by stable secondary structure in the 5′ leader sequence of the upstream tRNAvalV, both likely contribute to the observed directional processing mechanism. These data provide insight into a new facet of molecular recognition by an essential tRNA processing enzyme necessary for understanding its in vivo function. Importantly, the data show how processing rates at alternative RNA processing sites may be adjusted by local sequence and structure context.

35 Results

Evidence for 3′ to 5′ ordered processing of dicistronic ptRNAval VW by RNase P in vitro

Experiments by Kushner and colleagues showed that E. coli RNase P processes the dicistronic valVW that contains only two valine ptRNAs with 3′ to 5′ directionality(33).

The valVW substrate provides a simple model system to determine the basic mechanistic features that underlie directional processing. This dicistronic precursor is relatively small

(178 nucleotides) and contains two valine tRNAs (tRNAvalV and tRNAvalW) substrates from isotype 1 that differ only by a few nucleotide positions in their acceptor stems, as shown in

Figure 2.2 A. There are only 4 nucleotides in the intergenic region between the two tRNAs that lacks a classic A/U rich RNase E cleavage site. The tRNAvalV at the 5′ end has a 20- nucleotide long leader sequence. Inspection of the tRNAvalV 5′ leader sequence shows that it most likely forms a stable 6 base paired nucleotide stem with a 5-nucleotide loop located two nucleotides away from RNase P cleavage site. Recent comprehensive analysis of the sequence specificity of C5 protein shows that RNA structure as well as sequence preferences in the 5′ leader affects the kcat/Km for RNase P cleavage, and therefore could potentially control the competition between alternative processing sites(30). Moreover,

RNase P recognizes the folded structure of tRNA, and given the highly self-complementary sequence of the dicistronic valVW folding into alternative conformations could affect the reaction kinetics observed in vitro. Therefore, it is critical to first test for formation of the predicted 5′ stem loop and establish the predominant folded conformation of the in vitro transcribed dicistronic valVW substrate.

36 In-line probing involves analysis of the pattern of spontaneous cleavage of the phosphodiester backbone of 5′ end labeled RNA in which enhanced reactivity reflects backbone structure with correct geometry for in-line attack(86). Regions where RNA is single-stranded allow in-line conformation undergo detectable levels of 2′-O- transphosphorylation upon incubation under reaction conditions. Regions where RNA nucleotides involved in structure preclude from in-line attack. Figure 2.2 shows that the valVW substrate produces an in-line probing pattern that is consistent with the predicted secondary structure. Cleavage at tRNA loops and junction regions are clearly observed

(Figure 2.2 A) which precisely corresponds to loops and bulges in the precursor (Figure

2.2 B and C). Predicted stems appear to be well formed and protected from in-line attack.

G39 in the D-loop is clearly protected from in-line attack, consistent with long-range tertiary base-pairing with C77 in the T-loop. Although it is difficult to map the 10-15 nucleotides near the extreme 3′ end of tRNAvalW (lower-case in Figure 2.2), the data clearly shows that the acceptor stem and D-stem of tRNAvalW are correctly formed.

37

A

B C

38 Figure 2. 2 In-line structure probing of 5′ 32P end-labeled dicistronic valVW precursor RNA corresponds to predicted secondary structure

(A) Predicted secondary structures of the valVW substrate. Levels of spontaneous RNA cleavage at backbone are circled and shaded according to the intensity of the cleavage product. Analysis of valVW cleavage products on 12%(B) and 8%(C) polyacrylamide gels.

Lanes are marked as follows. UR: unreacted RNA; T1: partial RNase T1 digestion; OH: partial alkaline hydrolysis; VW-ILP: refolded ptRNAsval VW subjected to in-line probing condition (50 mM Tris–HCl (pH 8.3 at 20°C), 20 mM MgCl2, and 100 mM KCl) incubated at 21℃ for 40 hrs. Bands corresponding to sites of T1 digestion are numbered as indicated in panel A.

39 Analysis of the processing kinetics of the valVW dicistronic substrate was performed under single turnover reaction conditions as described in Material and Methods. These conditions are chosen to simplify the interpretation of the kinetics of the formation and decay of intermediates, and to allow the reaction rate constants to be quantified without complication due to product inhibition. For an optimal ptRNAmet substrate the rate constant for substrate dissociation is slow and the observed rate constant at sub-saturating enzyme concentrations reflects substrate association(16). However, this mechanism may not hold for substrates like valVW with more complex structures and potential inhibitory secondary structure. Figure 2.3 shows the products formed by in vitro processing of 1 nM 5′ 32P- labelled dicistronic valVW substrate by 30 nM RNase P. Since the radiolabel is uniquely located on the RNA 5′ end only the original substrate (species a in Figure 2.3), and reaction intermediates and products that retain the original substrate 5′ phosphate can be detected.

The apparent kinetics of formation of reaction products visualized by phosphor imager analysis appears to reproduce the 3′ to 5′ directional processing of valVW observed in vivo

(Figure 2.3 A and B). The intermediate b contains the original 5′ end 32P label and it is formed at early time points. The mobility of this product is consistent with cleavage in the intergenic region at the correct valW processing site. Intermediate b disappeared with similar apparent kinetics as the accumulation of product c that has mobility in the gel consistent with correct cleavage at the 5′ tRNAvalV processing site. These results are consistent with a simple sequential mechanism (Scheme 1) in which the substrate a is converted to intermediate b that is processed to yield 5′ product c.

A sequential mechanism is further supported by fitting the data to Equations 1, 2 and 3

(87) derived for Scheme 1 and quantification of the observed rate constants k1 and k2. As

40 described in Materials and Methods the magnitude of k1 obtained by fitting the decrease in substrate a is consistent with the kinetics of intermediate b formation and the lag in formation of product c. Similarly, the magnitude of k2 that is obtained by fitting the kinetics of the decay of intermediate b is consistent with the formation of product c. In sum, the fitting results show that valVW is processed by RNase P by initial fast cleavage at the 3′

-1 most valV tRNA site (kobs(valV): 0.013 ± 0.002 s ) followed by processing at the 5′ most

-1 valW tRNA site with an equivalent apparent rate constant (kobs(valW): 0.029 ± 0.002 s ).

To characterize the entire complement of reaction intermediates and products we repeated the kinetic experiments using an RNA substrate population synthesized in vitro using [α-

32P] CTP so that the resultant RNAs are uniformly labelled (Figure 2.3 C). The full length valVW (a) was again processed to generate intermediate b that retains the 5′ end as well as

3′ products c2 and c3 that form with similar apparent kinetics (also see scheme in panel

D) . The c2 and c3 products are ca. 77 nucleotides in length consistent with the mature tRNAvalW. Based on 5′ end mapping results (see below) the different products are due to

3′ end heterogeneity generated during in vitro transcription by T7 RNA (88,89).

The intermediate b decreases as expected at later time points and is further cleaved to generate product c1, which is ca. 81 nucleotides in length and represents the mature tRNAvalV with the additional UCCU intergenic sequence at the 3′ end (scheme in panel D, bottom line). The corresponding 5′ leader fragment generated by the second cleavage reaction migrates lower in the gel and is not shown in the figure. A very minor high molecular weight product is also observed that is consistent with a minor (<1%) amount of initial cleavage at the 5′ tRNAvalV site. Therefore, the data from uniformly labelled RNA

41 verifies the identity of predicted cleavage products from dicistronic valVW processing by

RNase P and is fully consistent with a 3′ to 5′ directional processing mechanism.

Next, we confirmed the site-specificity of RNase P processing at both cleavage sites in valVW. As shown in Figure 2.4, five time points collected during the reaction were loaded next to RNase T1 cleavage standards that map G residues, and an alkaline hydrolysis digestion ladder. The nucleotides around each cleavage site are mapped as marked on the secondary structure of valVW (Figure 2.2 A) and marked next to the gel in Figure 2.4.

The cleavage events resulting in the major products (Figure 2.3 A, products b and c) were verified as resulting from correct processing by RNase P at the authentic 5′ ends of tRNAvalW and tRNAvalV tRNA, respectively.

42

A B a b c

C D c2 +c3 a b

43 Figure 2. 3 In vitro processing of of 5′ 32P end-labeled(A) and uniformly labeled with

α-32P-CTP (C).

Reactions either containing or lacking RNase P are marked by plus (+) or minus (-) symbol, respectively. Structures of the substrate and predicted structures of products are shown next to each band based on gel mobility. The tRNAvalV is shown as a black line and the tRNAvalW is shown as a dashed line. Total incubation time for substrate is shown as an oblique triangle (30 minutes). The reaction for panel (A) resolved in the 15% UERA gel and the reaction for panel (C) resolved in the 6% UERA gel. (B). Fitting of the change in fraction of the substrate a, intermediate b, and product c to equations for two sequential first order reactions. As described in Materials and Methods, the data for a are fit to equation 1 is shown as a dashed line, the data for b and c are fit to Equations 2 and 3, respectively.

44 A B

Figure 2. 4 Site specificity of RNase P processing of the dicistronic valVW precursor in vitro

Phosphor imager analysis of cleavage products resolved on 8%(A) and 12% gel (B) polyacrylamide gels with 5′ 32P end-labeled valVW RNA. The lanes are marked as follows.

C:control reaction with no enzyme, OH: alkaline hydrolysis of dicistronic valVW, T1:

RNase T1 ladder. The positions of the guanosines and surrounding sequence are indicated next to the gel.

45

Kinetic evidence for a distributive processing mechanism

The directional processing pathway observed for valVW may occur due to several potential mechanisms. In a simple distributive mechanism binding occurs independently to the two sites, and the presence of additional 3′ sequences or some other feature is inhibitory, thereby blocking cleavage at the tRNAvalV site until cleavage at the downstream tRNAvalW site has occurred. A second model that is consistent with directional processing is a processive mechanism in which RNase P binds initially at the 3’ end and scans the substrate in a 3′ to 5′ manner without dissociation to identify potential cleavage sites.

Simple processive and distributive models for valVW directional processing by RNase P are illustrated in Figure 2.5 A. In the processive mechanism RNase P (E) binds the valVW

valW substrate (S) at the downstream tRNA cleavage site with equilibrium constant KA and

valV cleavage occurs at k1. Processing at the tRNA site occurs subsequently with rate constant k2 without dissociation and equilibration of RNase P with the free enzyme population. In the distributive mechanism binding occurs to the tRNAvalW cleavage site as in the processive mechanism, however, for processing at the tRNAvalV site, enzyme dissociation and rebinding with equilibrium constant KB occurs, followed by processing with rate constant k2. Importantly, the products predicted for initial cleavage at the tRNAvalV site are not observed. Thus, the relative second order rate constant for cleavage at this site must be significantly slower (>10-fold) in the valVW substrate compared to the rate of cleavage at the tRNAvalW site. An important distinction between the two mechanisms is clearly the presence of an independent binding step in the distributive mechanism. Thus, it follows that a fundamental difference in the observed reaction kinetics

46 will be the enzyme concentration dependence of the observed rate constant for the second cleavage step. Accordingly, we performed a series of reactions with a range of enzyme concentrations (10-600 nM) and quantified the observed rate constants for processing at

valV valW the tRNA (k1) and tRNA (k2) sites. A plot of the observed k1 and k2 versus RNase

P concentration in Figure 2.5 B shows that cleavage at both sites is concentration dependent and shows saturation behavior at enzyme concentrations greater than 60 nM and an apparent equilibrium constant Kobs of ca. 10 nM.

A second consequence of the difference in the processive versus distributive mechanisms is the level of accumulation of intermediate product b, which results from initial cleavage at the downstream tRNAvalW cleavage site. In the processive mechanism b accumulates to a lesser degree at low enzyme concentrations compared to saturating enzyme concentrations due to the change in relative magnitudes of the rate constants for the first and second cleavage steps. In contrast, in the distributive mechanism, b accumulates to the same extent since the rate constants for both steps increase with increasing enzyme concentration, but their relative magnitudes will remain the same. To illustrate this principle, kinetic simulations of the processive and non-processive mechanisms were performed using the estimates of rate and equilibrium constants observed for the native substrate as described in Supplementary Information. As shown in Figure 2.5 C for the in vitro processing of valVW, the intermediate b accumulates to approximately the same extent at both limiting and saturating enzyme concentration. This result, together with the dependence of k1 and k2 on enzyme concentration are clearly most consistent with the non- processive mechanism involving independent binding and cleavage at both sites.

47

A

B

C

3 nM 60 nM

48

Figure 2. 5 Dependence of the observed rate constants and accumulation of reaction intermediate as a function of enzyme concentration

A) Two possible mechanisms for RNase P processing of dicistronic ptRNAval VW. In the first model the processing of the two site is processive (Scheme 1) without dissociation of the enzyme between the two cleavage events, and the second is a distributive model

(Scheme 2) in which processing at the two sites occurs independently. B) Dependence of k1 and k2 for 3′ to 5′ directional processing of valVW on RNase P concentration under single turnover reaction conditions (Materials and Methods). C) Comparison of the accumulation intermediate b in reactions containing high (60nM) and low (3nM) enzyme concentration showing accumulation to the same extent independent of enzyme concentration.

49

Next, we tested whether RNase P equilibrates with the free enzyme population between processing at the tRNAvalW and tRNAvalV sites as predicted by the distributive mechanism using an ‘isotope trapping’ experiment(90). The RNase P-valVW complex was formed by mixing limiting substrate (about 0.1nM- 1nM) with a saturating concentration of enzyme

(60 nM). At an intermediate time, an excess of non-radiolabeled valVW substrate was added. If RNase P does not dissociate between cleavage steps then there will be no corresponding effect on reaction kinetics in the presence or absence of the non-radiolabeled valVW “chase”. Alternatively, if RNase P dissociates from the substrate after the first cleavage step, then the addition of non-radiolabeled substrate will slow the conversion of intermediate b to product c. As shown in Figure 2.6, the presence of competitor valVW substrate alters the reaction kinetics and dramatically slows the second cleavage step, which further demonstrates a dissociative mechanism. In sum, the concentration dependence of the observed rate constants for processing at the tRNAvalV and tRNAvalW sites, the enzyme concentration dependence of intermediate b accumulation, as well as the response to the presence of excess substrate in pulse-chase experiments support a distributive mechanism.

50 A 0.1nM-1nM VW

B C

Figure 2. 6 Determination of the reversibility of binding at the 5′ tRNA valV processing site

Time courses of processing of 5′ 32P-labeled dicistronic valVW precursor by RNase P

(30nM) under single turnover conditions without (left) or with (right) addition of excess unlabeled valVW substrate after ~ 30% of radiolabeled substrate had been cleaved at the

3'-proximal site. (B and C) The kinetics of substrate a, intermediate b and 5′ leader fragment c is fit to equations for two sequential first order reactions for both data sets to illustrate the change in reaction kinetics post-chase.

51 Kinetics of RNase P processing of swapped and duplicated tRNA are consistent with inhibition by 3′ trailer sequences

In order for directional processing to occur by a distributive reaction scheme, the rate of cleavage at the upstream tRNAvalV site must be supressed in the dicistronic valVW substrate. A simple mechanism for the observed directional processing is inhibition of cleavage at the upstream tRNAvalV site by the presence of a specific sequence or structure in tRNAvalW as part of the 3′ trailer sequence. To test this prediction, we generated three novel dicistronic ptRNAs constructs in which the order of the tRNAs and associated 5′ leaders are swapped (valWV) or in which they are duplicated (valVV and valWW). Single turnover reactions were performed and the observed cleavage sites were mapped for each substrate. The kinetics of dicistronic valWV, valVV, and valWW processing by the RNase

P holoenzyme were quantitatively analysed and compared with genomic encoded dicistronic ptRNAval VW.

Dicistronic ptRNAval WV in which the order of tRNAs are swapped (Figure 2.7 A) shows comparable kinetics with a pattern of intermediates and products that are similar compared to the native dicistronic ptRNAval VW substrate (Figure 2.3 A). Intermediate b is the only species that appears in the gel at very early time points and it decays at later time points with concomitant formation of product c. These data indicate RNase P processing at the 3′ end valV cleavage site, followed by cleavage at the upstream valW site. The two cleavage sites were verified as occurring at the correct tRNA 5′ end by T1 mapping. The pattern of products and reaction kinetics are clearly consistent with directional processing in which the 3′ most processing site on both dicistronic ptRNAval VW and WV substrates is cleaved first in a manner that is independent of the tRNA identity or the associated 5′ leader

52 sequences. Fitting the data for the swapped valWV substrate to equations for a sequential first order mechanism shows that this general mechanism is followed (Figure 2.7 A).

The reaction products and kinetics were similarly analysed for the dicistronic valVV and valWW substrates in which the individual ptRNA valV and valW with their associated 5′ leader sequences are duplicated. For the valVV substrate, approximately 90% of precursor is reacted. However, products b and c accumulated at the same time and with similar kinetics since both of them fit to a single exponential curve (Figure 2.7 B). The size of both product b and c are consistent with cleavage at the correct site for the 3′ and 5′ end substrate respectively. The observation that both species b and c accumulated with similar rate constants is inconsistent with the 3′ to 5′ directional processing. Rather, the valVV substrate appears to be processed at both sites randomly with similar rate constants.

Importantly, the species b resulting from cleavage at 3′-proximal cleavage site remains stable and does not undergo further processing to form product c. These data reveal two important insights. First, the presence additional of 3′ trailer sequences is not uniformly an inhibitor of RNase P processing, since rapid processing at the upstream cleavage can occur in the presence of a 3′ trailer sequence containing a tRNA. Second, lack of subsequent cleavage of the product b suggests that the upstream processing site is likely to be inhibited by misfolding. The products observed and the kinetics of their accumulation show that the valVV substrate appears to be processed by RNase P at both sites in a mutually exclusive manner and not from the 3′ to 5′ direction.

The reaction of 5′ 32P end-labelled valWW substrate yields only the single, small product c representing the 5′ leader resulting from cleavage at the upstream tRNAvalW. However, over 60% of the initial substrate remained unreacted even at long incubation times (Figure

53 2.7 C). This result indicates that a large fraction of the valWW RNA folds into a conformation that is not recognized as a substrate by RNase P. The remaining fraction of the valWW substrate population is reactive and undergoes faster cleavage at the RNase P processing site located near the 5′ end rather than the 3′ end. Again, this result demonstrates that the presence of 3′ trailer sequences is not uniformly inhibitory, and together with the results described above for the valVV substrate, suggests that duplicated tRNA substrates are prone to misfolding resulting in multiple RNA conformations with different reactivities.

54

A C E

A C E

B D F

c

B D F

55 Figure 2. 7 Kinetics analysis RNase P processing of 5′ 32P-labeled dicistronic valWV (A and B), valVV (C and D), and valWW (E and F) substrate RNAs

Single turnover reactions were performed as described for the native valVW substrate. The position of cleavage resulting from processing at the correct sites is supported by T1 mapping. The structure of substrates and products are shown next to each band. The tRNAvalV is shown as a black line and the ptRNAvalW is shown as a dashed line. Incubation time for each substrate is shown as an oblique triangle (30 minutes). Panels B, D, and F show quantitative analysis by fitting the data for each reaction to equations for sequential first order reactions as described in Material and Methods.

56 Thus, the kinetics results suggested that the dicistronic ptRNAval VV and WW may form mis-folded RNA conformations. Therefore, we subjected each dicistronic ptRNAs (valWV, valVV, and valWW) to in-line probing to test the extent to which they fold in vitro as expected. Indeed, as shown in Figure 2.8 the in-line probing pattern of valWV is not entirely consistent with the predicted secondary structure. Specifically, there are deviations from the predicted structure in the stem loop between the two tRNAs that forms the leader structure of the 3′ tRNAvalV cleavage site. With respect to the 5′ tRNAvalW the pattern of in- line cleavage products is as expected based on the secondary structure: The D-loop, helix junctions, anticodon, T-loop show cleavage and predicted stems are protected (Figure 2.8

A and B). However, pronounced cleavage is observed in the predicted intergenic stem loop in the leader structure of the 3′ tRNAvalV. The in-line cleavage observed in the predicted intergenic stem loop suggests local mis-folding, or heterogeneity in the pairing pattern within this stem. The acceptor stem of tRNAvalV, and the intergenic stem are both GC rich and multiple alternative pairings are possible in the intergenic region. These results can be compared with the kinetic data obtained using dicistronic valWV in which accurate and efficient 3′ to 5′ directional processing is nonetheless observed. Although the intergenic region between the two substrates appears to be mis-folded, or to assume a population of structures, they are compatible with fast cleavage by RNase P. Independent of structure in the intergenic region in, the RNase P enzyme still processes the dicistronic valWV RNA in a 3′ to 5′ directional manner similar to the native valVW substrate.

The in-line probing pattern of the duplicated valVV substrate is consistent with predicted secondary structures of the encoded tRNAs (Figure 2.8 C and D). Cleavage at the 5′ leader loop (only a part of the loop shown in the gel) and at the adjacent sequences “UU” from

57 G19-G22 confirmed the formation of the stem loop in the 5′ leader of the upstream tRNAvalV. The single strand leader sequence of 3′ end valV “CCAC” and U117-U118 shows strong spontaneous cleavage as well as the 3′ leader loop from C106-U110 while no cleavage from region C100 to G116 is observed. Together, these confirmed the formation of the stem loop in the 3′ leader structure of the downstream tRNA. Cleavage at positions

U29-A30, A35-A43, U54-U60, G66-C70, U126-A127, A132-A140, these regions correspond to single strand nucleotides or bulges. Taken these together, dicistronic valVV folds correctly in general as predicted.

The in-line probing pattern of valWW, however, suggests that duplication of two identical tRNAs within one dicistronic transcript can result in significant mis-folding. As shown in

Figure 2.8, the 5′ end tRNAvalW appears to fold correctly. Cleavage is only observed in

U16-A17, D loop and AC loop. The T loop in the 5′ valW, however, is well protected in the structure. We also observe limited T1 digestion at positions G76-G96 suggesting formation of highly stable secondary structure corresponding to the two identical acceptor stems. We hypothesize that the acceptor stems of the two tRNAvalW sequences may undergo alternative folding in which the 5′ end tRNAvalW acceptor stem base pairs with the complementary 3′ end of the second downstream tRNAvalW acceptor. This possible folding pattern is consistence the formation of a single product with an accurate RNase P cleavage at the 5′ end tRNAvalW site.

58

A B

C D

59

E F

Figure 2. 8 In-line structure probing of 5′ end labeled dicistronic valWV (A, B), valVV (C, D) and valWW (E, F) substrate RNAs

Panels A, C, and E) show the levels of spontaneous RNA cleavage at patterns observed with 5′ 32P end-labeled RNAs resolved on 12% and 8% polyacrylamide gels. Lanes are indicated above the gel as follows. NR: unreacted RNA; T1: partial RNase T1 digestion;

OH: partial alkaline hydrolysis; WV/VV/WW-ILP: in-line probing reaction. Bands corresponding to certain T1 digestion and alkaline hydrolysis are circled on the predicted secondary structures of valWV, valVV, and valWW in panels B, C and D, respectively.

60 A stable stem loop in the tRNAvalW 5′ leader inhibits RNase P cleavage and further enforces directional processing

To better understand the origin of the large differences in rate constants for processing at the valV and valW cleavage sites in different dicistronic contexts, we measured the reaction kinetics of monocistronic tRNAvalV and tRNAvalW precursors. In order to quantify differences in RNase P specificity, the values of the observed rate constants were measured under the same single turnover conditions used in kinetic analyses of the native dicistronic substrate. The rate constants summarized in Figure 2.9 were measured under limiting enzyme concentrations such that the kobs values for RNase P cleavage reflect the activation energy for reaction from an equivalent ground state, i.e. the free substrates in solution.

Both the valV and valW ptRNA substrates are processed with essentially identical rate constants in the context of both monocistronic and dicistronic context. We also observed little effect of the presence of the additional four nucleotides at the 3′ end of valV that are present in the intermediate b resulting from cleavage at the downstream valW cleavage site in the native valVW substrate.

valV Importantly, the kobs value for RNase P cleavage of the ptRNA substrate is consistently

2-4-fold slower that cleavage of ptRNAvalW regardless of whether the substrate is processed in the context of a monocistronic or dicistronic substrate. As established by in-line probing experiments above, a stable stem loop structure clearly forms in the leader of the ptRNAvalV substrate. The stem loop includes nucleotides N(-3) to N(-8), which encompass the binding site for the RNase P protein subunit(13,50). As established previously, for a non-initiator ptRNAmet substrate the presence of stable secondary structure in the 5′ leader sequence can inhibit RNase P processing(30). To test whether the presence of the stem loop inhibits the

61 rate of the monocistronic ptRNAvalV the 5′ leader sequence was switched with the valW leader “ACCAUCCU”. Switching the leader sequences with that of ptRNAvalW rescues the slower ptRNAvalV processing rate, showing that the native leader sequence and associated stem loop structure is likely to inhibit native valV processing. To test this hypothesis, the ability of the ptRNAvalV 5′ leader to form secondary structure was altered by mutation and the effects on processing rate were measured and compared. The entire stem loop in the genomic encode valV ptRNA was disrupted to produce a structure-free leader mutation by changing upstream sequence contexts (ptRNAval mut1, mut2). About 4-fold acceleration in

valV kobs is measured in mut1 when compared to the genomic encoded ptRNA 5′ leader.

However, the kobs measured in the mut2 only has about 1-fold increase in the rate constant, most likely have no significant increase. when the entire stem loop was disrupted and

valV further shortened in length to produce ptRNA mut3, about a 4-fold acceleration in kobs was observed. Since the former three mutations contain the same leader sequence contexts in “-1” to “-8” position, we considered whether sequence specificity in the leader region is also the element responsible for the observed catalysis defect. The ptRNAvalV mut4 substrate was generated by disrupting the stem loop by changing the sequences proximal to the cleavage site (“-1”to “-8”) to the leader sequence in the ptRNAvalW substrate

(ACCAUCCU). This substitution results in a similar increase in kobs. Stabilizing the leader structure (ptRNAvalV mut5, mut6) by changing sequence contexts of stem or including a stable “GAAA” tetra loop have no significant effect on the observed cleavage rate.

62

A

ACCAUCCU_valV

ValV_UCCU

valW

valV

di_valW

di_valV

Rate Constant(s-1) B

mut1

mut2

mut3

mut4

mut5

mut6

valV

Rate Constant(s-1)

63 Figure 2. 9 Single turnover reaction analysis of precursor tRNA rate constants

A. Comparison of rate constants for ptRNAvalV and ptRNAvalW in monocistronic versus dicistronic contexts. The substrate structures are shown on the left. The ptRNAvalV substrate is shown as a solid line and the ptRNAvalW is shown as a dashed line. B. Analysis of the effects of leader sequence structure on RNase P processing. The predicted secondary structure of the leader sequences of the individual ptRNAvalV mutants are shown on the left.

64 Discussion

Endoribonucleases that are involved in the initial steps of tRNA and rRNA biosynthesis

(for example RNase E, RNase P, and RNase III) cleave the RNA phosphodiester backbone of their substrates at a specific site, but nonetheless have broad substrate specificity. These

RNases are archetypical alternative substrate enzymes because they must recognize many different cognate cleavage sites embedded in different sequence and structure contexts(26,37,42,61-64). In this regard they reflect a general feature of many RNA binding proteins in that their specificity is broad and physiologically relevant RNA targets may have a range of affinities or may not always be optimized for highest affinity(70,91). The specificity of an enzyme for multiple alternative substrates has two important consequences for understanding the biological function of RNA processing endonucleases. First, their kinetic mechanisms cannot be understood entirely in terms of simple one enzyme-one substrate reaction scheme. The second is that their specificity cannot be completely defined by approaches that use consensus motif analysis, or that are based entirely on structure- function experiments using only one or a few optimal substrates(70).

Although RNase P is a ribonucleoprotein, there are also other endonucleases involved in stable RNA processing that are protein enzymes. For example, RNase E plays an essential role in mRNA turnover, as well as contributing to rRNA and tRNA maturation by interacting with numerous RNA target sites(63). Structure-function studies in vitro and recent transcriptome-wide identification of RNase E processing sites revealed a sequence signature of a uridine located two nucleotides downstream in a single-stranded segment.

However, this motif by itself is insufficient to fully describe RNase E specificity, which may be modulated by other proteins binding to its substrates that sequester binding sites

65 like the site-specific RNA binding protein RsmZ(92), or that help direct cleave at specific sites such as the ubiquitous Hqf(93), or potentially by alteration of RNA location in the cell(63). Like RNase P, the RNase III has a primary role in cleavage of primary RNA precursors catalyzing the first step in rRNA processing, and also plays a global role as a dsRNA specific RNA processing enzyme involved in bacterial gene expression and regulation(62,94). Analogous to molecular recognition by RNase P, RNase

III does not have a strongly conserved cognate recognition sequence, but instead primarily recognizes elements of dsRNA structure(62,94). Formation of dsRNA containing RNase III target sites is key to regulation of multiple mRNAs encoding proteins involved in environmental stress-related processes(94,95). Competition between alterative binding sites in rRNA and its own gene in the rncO operon is inherent to self-regulation of its expression(96). Variation in regular helical structure among different substrates determines whether one or both strands of a target site will be cleaved(62,97,98). Although the structure and catalytic function of RNase P are well studied, comparatively less is known about its target sites in the transcriptome and potential roles in gene regulation. Genetic depletion studies using microarray analyses to monitor changes in transcript levels suggests control over a range of mRNA processing events, and in vitro studies confirm cleavage of specific mRNAs and other small RNAs(99-102). However, a complete accounting of RNase P substrates is not yet unavailable.

The B. subtilis P RNA encoded by the rnpB gene is able to replace and complement a deletion of the homologous E. coli RNA subunit despite significant differences in their sequence and secondary structure(103,104). However, growth defects suggest that the function of the hybrid RNase P lacks some key functions of the native E. coli enzyme. An

66 important recent advance has been the discovery that RNase P can exist as a ribonucleoprotein or as protein-only enzyme. The single protein form of the enzyme is termed PRORP (PRotein-Only RNase P) and is widespread in the nuclei and organelles of . Indeed, both organellar PRORP and two nuclear PRORP enzymes from

Arabidopsis thaliana can also confer viability to E. coli in which endogenous ribonucleoprotein RNase P has been inactivated(105). However, defects in 4.5S RNA were observed as well as inaccurate processing of ptRNAs with extended acceptor sequences including tRNAHis and tRNASec, the latter of which was degraded rather than processed in the absence of native ribonucleoprotein RNase P. Despite the potential for compensatory mutations elsewhere in the genome that help confer viability, these enzyme substitution studies led to the conclusion that PRORP enzymes are capable of executing the basic functions of the ribonucleoprotein RNase P enzyme in Bacteria and Eukarya(105-107).

Presumably, these functions include the processing of the dicistronic tRNAVal and tRNALys precursor RNAs. It is not known whether the ability of substituting enzymes extends to the property of directional processing, however, it seems unlikely that directional processing is essential. More likely the directional processing is a consequence of the intrinsic substrate specificity of the bacterial enzyme, and any influence on regulation of tRNA levels is likely to be non-essential. Nonetheless, the observation of a range of phenotypes and lower growth rates in the absence of the native E. coli enzyme supports the conclusion that, like RNase III and RNase E, the RNase P ribonucleoprotein plays an important, but not essential role in a range of different RNA regulation pathways.

In addition to shedding light in its biological role, understanding the RNA structures and sequences that direct RNase P cleavage and the mechanistic basis for competition between

67 alternative processing sites for RNase P can provide insight into general principles of RNA processing enzyme specificity. Models of the specificity of RNase P that are useful for understanding its in vivo function must include the competition between alternative cognate processing sites. For competition between alternative individual substrates containing single sites, several approaches for quantifying relative kcat/Km values for multiple alternative substrate in vitro have been developed(108-111). The recognition of multiple alternative processing sites in the same RNA transcript will necessarily be subject to the same basic structure-function relationships as single substrates with a few key additional considerations. First, the potential for processivity or facilitated diffusion in which processing of the second site on a transcript is accelerated by limited exchange of the bound enzyme with the free enzyme population. Concepts of facilitated diffusion are well explored for DNA binding proteins and endonucleases(112-114), however, the flexible and complex structure of RNA makes it unlikely that analogous mechanisms apply for site- specific involved in stable RNA processing. Nonetheless, the electrostatic field generated by large RNAs such as the ribosome can engender unexpectedly fast macromolecular association rates for ribotoxins(115,116). Therefore, it is possible that similar electrostatic association might limit diffusion and accelerate processing at multiple sites on the same RNA precursor. However, the data provided here for the dicistronic valVW demonstrates that RNase P cleavage is non-processive and that directionality processing arises due to the removal of 3′ trailer sequences which inhibit processing of the 5′ proximal cleavage site.

A key feature of RNase P specificity that is underscored by the results presented here is that the length and structure of the 5′ leader and 3′ trailer sequences can be important

68 modulators of RNase P specificity. It is now well established that the proximal 5′ leader sequence interacts with the essential RNase P protein subunit. These interactions are strongest with single-stranded leaders of more than five nucleotides and crosslinking, x- ray crystallography, modelling and in vitro structure-function studies support leader sequence binding in a cleft in the P protein(10,12,43,50,117). Nucleobase specific interactions were identified at N(-4), and high-throughput kinetic analyses demonstrate significant, although broad sequence specificity(12,30). The presence of a 5′ leader is not essential for tRNA biosynthesis or function in vivo, although the length of the leader sequence of suppressor tRNAs can affect growth rate(118). The effects of leader sequence on RNase P recognition are complicated by the potential to form base pairs with the 3′ CCA sequence which dramatically slows catalysis and can result in mis-cleavage(119,120). Additionally, the presence of stable secondary structure in the 5′ leader proximal to the RNase P cleavage site is inhibitory, although it clearly does not block cleavage, as evidenced by comprehensive analysis of 5′ leader specificity of an elongator tRNAmetT precursor(30). This effect is clearly supported by the structure-function studies presented here in which a stable stem loop located two nucleotides 5′ to the cleavage site results in a ca.10-fold reduction in the observed rate constant for RNase P processing. Thus, the presence or absence of stable structure in the 5′ leader sequence within the P protein binding site, as well as the specific sequence of the P protein binding site, both contribute to recognition of a particular cleavage site. Together these contributions can be significant, resulting in orders of magnitude differences in relative rate constants for RNase P cleavage.

The 3′ CCA sequence is encoded in E. coli tRNAs genes and therefore included in precursor transcripts(42). Experimental evidence from both in vivo and in vitro studies

69 demonstrate an important pairing interaction between the RCC sequence located at the tRNA 3′ end and the P15 internal bulge in P RNA(121-123). Although it was suggested previously that bacterial RNase P may be inhibited by the presence of long 3′ trailer sequences(124), we lack a comprehensive understanding of the contribution of 3′ trailer sequences to RNase P specificity. The prevailing view has been that 3′ end formation precedes RNase P processing, and until now there has been little motivation to explore such effects for the E. coli enzyme. Recently, the presence of a Rho-independent transcription terminator was shown to significantly inhibit the ability of RNase P to process tRNA precursors in vivo(41). However, some 3′ extensions are benign as demonstrated by the data presented here, in which processing of the valV intermediate

(Figure 2.3, panel A, band B) with an additional four nucleotide AUUA sequence has no effect on relative rate of processing. Additionally, the duplicated dicistronic valVV and valWW substrates are processed efficiently at the 5′ proximal RNase P cleavage site despite the presence of a long 3′ trailer sequence. However, the fact that these substrates do not fold into uniform structures makes it difficult to understand the basis for these effects.

Nonetheless, the data presented here support the conclusion that the presence of an extended and highly structured 3′ trailer sequence reduces the relative rate constant for

RNase P processing.

The basic outline of the process of tRNA biosynthesis is understood, and the overall regulation of translation by the stringent response is known. However, the mechanisms by which cells establish the appropriate levels of tRNA by balancing expression and turnover are less well understood. Recent studies make clear that tRNA levels are likely to be dynamically regulated in bacteria including E. coli to match the demands of protein

70 synthesis, and that mis-regulation can affect translation efficiency and cell function(35,36,82,83). The tRNA processing pathways in E. coli appear to be stochastic at some level, but different tRNA precursors follow different processing pathways and the consequences of these differences for tRNA gene expression are not known. The ordered processing of dicistronic tRNAs by RNase P offers the potential for coordinated control by changes in RNase P activity. The Pan lab used microarray analysis to show that mature tRNA levels in B. subtilis are established by rates of synthesis and processing, as well as turnover of precursor and mature tRNAs(125). Recently it was shown that tRNA precursors originating from the internal spacer regions of the rrn operons, in particular rrnB, are poly(A) polymerase targets(126). initiates RNA turnover, so the fact that tRNA precursors are substrates for polyadenylation supports the idea that tRNA levels are regulated at some level by precursor turnover. The directional processing by RNase P provides the potential for coordinated regulation of the dicistronic tRNA operons for valine and lysine tRNAs. Reduced rates of RNase P processing could result in greater turnover of the precursor and therefore lower tRNA abundance. Indeed, proteins involved in stable

RNA biosynthesis and degradation are colocalized (although it is not clear if RNase P is involved) and such compartmentalization provides a way to functionally co-ordinate these processes(127).

In the present study the mechanism for directional processing of dicistronic precursor tRNAs by E. coli RNase P was investigated and established to follow a non-processive mechanism. Inhibition of processing by the presence of highly structured 3′ trailer sequences appears to be the primary mechanism by which directional processing is established for the dicistronic ptRNAval VW. However, this cannot be the entire story since

71 this mechanism does not explain the directional processing of precursors like valU and lysT that contain four and seven tRNAs, respectively. In these cases, there may be local sequence variations that help to maintain directional processing due to variation or structure in the 5′ leader or formation of local inhibitory structure near the cleavage site.

72 Materials and Methods

Preparation of substrate RNAs and RNase P

The E. coli C5 protein was expressed and affinity purified using the NEB IMPACT system as described previously(128). P RNA and precursor tRNAs were synthesized by in vitro transcription from PCR or cloned DNA templates using T7 RNA polymerase, reaction products were purified by denaturing polyacrylamide gel electrophoresis using standard protocols as described(15,18). Pre-tRNA substrates were either 5′ end labelled with 32P using

γ-32P-ATP and polynucleotide , or were uniformly labelled during in vitro synthesis by including a-32P-CTP(129) in the reaction followed by gel purification and recovery by ethanol precipitation.

In vitro RNase P processing reactions

Single turnover reactions were performed using a range of enzyme concentrations as described in the text in a reaction buffer containing 50 mM 4-morpholineethanesulfonic acid (MES) pH 6, 100 mM NaCl, 0.005% Triton X-100, and 17.5 mM MgCl2. RNase P holoenzyme was assembled in reaction buffer without magnesium by heat denaturation at

95°C for 3 minutes followed by incubation at 37°C for 10 minutes. MgCl2 was then added to the appropriate concentration (final concentration of 17.5 mM) and the incubation continued for another 10 minutes. A 1.5x equivalent concentration of C5 protein relative to the P RNA concentration was added to form the RNase P holoenzyme. Substrate solutions were prepared separately using 32P-labelled pre-tRNA by the same protocol.

Equal volumes (ca. 40 µl) of RNase P holoenzyme and ptRNA substrate were mixed to begin the reaction. The final concentrations of RNase P were 3 nM- 60 nM and ptRNA

73 substrate concentration, 0.1nM - 1 nM. Aliquots (~5 µl) were taken during the time courses and quenched with equal volume of formamide loading dye containing 100 mM EDTA.

Substrate and products were separated by electrophoresis on 15% denaturing polyacrylamide gels and each band was quantified by phosphor imager. The resulting data were used to calculate conversion of substrate to product which was plotted versus time and fit to the appropriate rate equations described below.

Analysis of reaction kinetics

The decrease in substrate concentration for both monocistronic and dicistronic substrates followed simple pseudo first order kinetics and rate constants were determined by fitting the decrease in substrate concentration to equation 1.

�(�������� ��������) = � − �� (Equation 1)

Where F is the fraction of total substrate remaining (F = [S]/([P]+[S])) at each time point.

The term A is the amplitude of the reaction and k is the observed rate constant and t is time.

Rate constants were obtained from at least three independent repeats with error bars shown in the figures representing one standard deviation.

In order to quantify the observed rate constants for cleavage of the native dicistronic valVW substrate that contains two cleavage sites, the reaction was fit to a mechanism involving two sequential first order reactions(87), as illustrated in Scheme 1.

74 Scheme 1: consecutive first order reaction

In Scheme 1 the species a, b and c are identified in Figure 2.3 and correspond to the 5′ end labeled valVW substrate, the 5′ valW cleavage product, and the 5′ leader of valV, respectively. The data for the decrease in 5′ end labeled valVW substrate (a) was fit to a single exponential function (Equation 1) to determine the observed rate constant for processing at the valW site (k1). To independently determine k1 and confirm its magnitude the data for the accumulation and decay of the intermediate species b was fit to Equation

2.

(Equation 2)

In this equation F(b) is fraction of the total substrate in the intermediate b form (F(b) = b/(a+b+c)) and k1 and k2 are defined in Scheme 1. The magnitude of k1 was determined in two ways. First, by fitting the depletion of substrate (a) to Equation 1 and by using

Equation 2 and fitting the signal obtained from quantifying the intermediate product b. The magnitudes of k1 obtained from these two analyses was typically within 20%. Similarly, the magnitude of k1 was determined by fitting the intensity of the signal for intermediate b to Equation 2 as well as by fitting the accumulation of product c to Equation 3.

�(�) = 1 − ∗ ∗ (Equation 3)

In this equation F(c) is fraction of the total substrate in the product c form (F(c) = c/(a+b+c)) and k1 and k2 are defined in Scheme 1. For evaluation of k2 using this method the

75 magnitude of k1 was fixed to the value obtained from fitting the data from the same reaction to Equation 1 and 2.

In-line probing

In-line probing assays were used to assess the formation of native secondary structure for the valVW dicistronic substrate and synthetic substrates with swapped or duplicated tRNAs(86). Briefly, 5′ 32P-labeled substrate RNAs synthesized using the protocol described above, were incubated in in-line reaction buffer [50 mM Tris–HCl, pH 8.3, 20 mM MgCl2, and 100 mM KCl] at room temperature (20°C) for 40 hours, 3 µl of reaction was loaded in both 8% and 12% denaturing gel running at 40 W for 3.5-4 hours. Gels were dried and exposed to phosphorimager screens. RNase T1 reaction was incubated in sodium citrate buffer [0.025 M sodium citrate, pH 5.0 at 23°C] for 5 minutes, 3 µl of reaction was loaded as a marker to allow the nucleotide position corresponding to gel individual bands to be identified. Alkaline digestion reaction was incubated in Na2CO3 buffer [0.05 M Na2CO3, pH 9.0 at 23°C and 1 mM EDTA] at 90°C for 5 min. 3 µl of the alkaline hydrolysis reaction was loaded on the gel to serve as a marker.

Pulse-chase analysis of reaction processivity

For the pulse-chase experiments shown in Figure 2.6 an isotope dilution approach was used in which the reaction was initiated with a low concentration of labeled substrate and then a high concentration of unlabeled substrate was added at an intermediate time during the reaction(90). The kinetics before and after the non-radiolabeled substrate chase were compared to determine the degree to which the enzyme dissociates between processing

76 events on the same substrate. An individual single turnover reaction condition containing

30 nM RNase P and ca. 0.1 nM valVW dicistronic precursor substrate was set up as described, above. After mixing enzyme and substrate the reaction was allowed to proceed and then divided into two parallel reactions when approximately 30% of the initial substrate was consumed. For one of the parallel reactions the reaction was continued as usual and 5

µl aliquots were taken at each time point. For the second reaction, 1 µl of a high concentration of RNA competitor (the native dicistronic valVW substrate) to a concentration of 60 µM was added to quench the reaction. The products of both reactions were resolved on 15% denaturing polyacrylamide gels.

77

Chapter 3

Global analysis of sequence specificity landscape for C5 protein using monocistronic ptRNA demonstrates that presence of secondary structure in the leader sequence influences the RNase P processing specificity

To be published

78 Abstract

Competing RNA structure and local context of functional binding sites in RNA can influence RNA recognition by RNA binding proteins and RNA processing enzymes. Such effects can influence binding affinity and kinetics, but are difficult to study in a comprehensive manner. RNase P is an essential ribonucleoprotein enzyme that removes tRNA 5′ leader sequences and must recognize its processing sites in the context of multiple precursor tRNA substrates (ptRNA). RNase P function depends on P protein subunit binding 5-8 nucleotides proximal to the site of cleavage on ptRNA in an extended conformation. P protein binding site is sequence specific, but optimal determinants are often obscured by formation of pairing interactions with distal leader sequences or the conserved tRNA 3′ terminal CCA sequence (3′CCA). Here, we determined the affinity landscape for P protein binding in the context of three different distal 5′ leader sequences designed to distinguish effects of local and long range structure on RNase P specificity. As reported previously stable basepairing between distal 5′ leader sequences and the P protein binding site is inhibitory. Engineering a stable structure within the distal 5′ leader sequence prevents these long range effects and reveals A at N (-2) and C at N(-4) as a positive determinant for P protein specificity. The effects of mutations at these positions are coupled to the identity of neighboring nucleotides within the binding site. However, we show here that this interdependence is largely due to threshold inhibitory effects resulting from pairing with the 3′ CCA. Quantitative analysis of the effects on reaction rate constants suggests a model in which melting of pairs between the 5′ leader and 3′ CCA may be rate limiting for processing of some substrates in vivo. In addition to providing insight into the

79 mechanism of an essential enzyme, these data provide a mechanistic and structural basis for the dependence of nucleotide recognition on surrounding sequence context.

80 Introduction

RNA binding proteins (RBP) are responsible for regulating gene expression at the RNA level. In order to function in the cell, they must recognize their cognate binding sites among alternative RNA sequences and structural contexts. Multiple substrate recognition is an essential and inherent characteristic for some ribonucleoprotein complexes such as the ribosome and spliceosome. Therefore, understanding how they discriminate between alternative RNA substrates and recognize their cognate binding sites among thousands of non-cognate binding sites as well as characterizing the details of substrate sequences and structure contexts that drive the enzyme specificity is critical.

RNase P is an essential enzyme, that removes the extra 5′ leader sequences of all 87 different precursor tRNAs (ptRNAs) in an E. coli cell. RNase P is a useful model RNA processing enzyme in that it recognizes multiple substrates, and can be employed to quantitatively dissect the influence of local RNA sequences and structures on enzyme binding specificity. The bacterial RNase P has a simple composition: one RNA component

(350-400 nucleotides), serving as the catalyst; and the protein subunit (about 100 amino acids) that binds to single-strand leader sequence ( “-8” to “-3” position) of ptRNAs

(Figure 1.1)(10) and contributes to molecular recognition of ptRNA substrates by enhancing the catalysis and specificity of P RNA(11,12). The crystal structure of Thermotoga maritima RNase P-substrate complex(10) together with extensive biochemical and structure-probing studies indicates that P RNA makes direct contacts with the tRNA substrate, while the protein subunit binds to the leader sequence.

81 The contacts between P RNA and tRNA involve conserved sequences at the cleavage site and the helical structure of the tRNA acceptor stem. However, the leader sequence that the protein subunit must recognize, is highly variable between different ptRNAs, which exhibit great diversity in both sequence and structural contexts. Despite the lack of conservation in the 5′ leader from all ptRNAs, the protein subunit has to bind to the 5′ leader sequence and, together with P RNA, catalyze the removal of the 5′ leader sequence. This raises an important question: How is RNase P specificity influenced by the diversity in sequence and structure context in terms of the kinetic mechanism displayed by RNase P to process all the different ptRNA substrates.

In order to understand the mechanistic basis of RNase P specificity, our lab previously developed HiTS-KIN, which allows us to measure thousands of relative rate constants simultaneously. The resulting rate constant distributions can be analyzed to develop quantitative models of sequence specificity, as well as provide evidence for establishing the rules between RNase P protein and 5′ leader sequence binding specificity or structural effects(31). We developed a substrate pool by using the ptRNAmet82 body as the basis with the randomized nucleotides in N(-6) to N(-1) positions, which contains both the P RNA contact site and a portion of the protein binding site. An extra 21 nucleotide sequence was attached upstream of the randomized region to avoid ligation bias during library preparation(130). X-ray crystallography of the Thermotoga maritima RNase P-substrate complex as well as other biochemical studies demonstrated that the protein subunit binds to a single-stranded conformation of the 5′ leader(10,131,132). Consistent with these findings, these HiTS-KIN studies demonstrated that the additional 21 nucleotide sequence can result

82 in unfavorable secondary structures that form with the randomized positions N(-6) to N(-

1) and further influence the RNase P specificity.

In the present study the results for three different distal leader sequences are compared, each of the three containing a different 21nt sequence. 21A sequence, the initial 21nt sequence for developing the HiTS-KIN method(31), 21B in which each nucleotide is changed to the complement of 21A, and 21C which is designed to form a stable hairpin to prevent interactions with the P protein binding site and thus “insulate” the resulting rate constant distribution from the effects of long-range secondary structure formation (Figure

3.1). The resulting affinity distributions for the 21A, 21B and 21C populations were analyzed using a quantitative model for specificity, correlation analyses, and calculated folding free energies to identify effects due to local RNA folding. Comparative analysis of the three data sets allows effects from long range RNA structure, local RNA structure and intrinsic sequence specificity to be deconvoluted. The resulting insights are consistent with a multi-step binding model for RNase P processing that depends on local RNA structure and sequence that is likely to be relevant for its in vivo function. Similar models for the contribution of competing structures to specificity are likely to be applicable to other multiple substrate enzymes involved in stable RNA maturation.

83 A

B C

Figure 3. 1 HiTS-KIN substrate pool design and kinetic reaction

A) The three randomized substrate ptRNAmet82 pools are shown. The randomized region

N(-6) to N(-1) is marked in yellow box, N(-8) to N(-3) is the C5 protein binding site, and

“AUG” is the linker nucleotides to connect upstream 21nt tag and the randomized region, the black arrow indicates the RNase P cleavage site. The three different 21 extra sequences attached upstream of the randomized region, namely 21A, 21B and 21C, are also shown.

B) A typical denaturing gel result of multiple turnover reaction with substrate (ptRNAmet82) and product (Leader) indicated next to the band.

C) Multiple turnover reaction kinetics of genomic encoded ptRNAmet82 WT (red) and randomized substrate (black) is shown. Data was fitted to an exponential equation.

84 Results and Discussion

Engineering a stable stem-loop proximal to the P protein binding site reveals the sequence preferences of P protein more clearly

The ptRNAmet82 is a canonical RNase P substrate, it has “U” at N(-1) position, a closing

“G(+1)-C (+72)” base pair at acceptor stem, and “CCA” at 3′ end. In order to analyze the specificity of the protein subunit of RNase P, we randomized the leader sequence of ptRNAmet82 from N(-6) to N(-1). Positions N(-2) and N(-1) are contacted by the P RNA, which could affect the substrate cleavage, while positions N(-6) to N(-3) fall within the protein binding site. The randomized region from N(-6) from N(-1) provided 4096 substrate variations. The relative rate constants (kcat/Km) were obtained for all 4096 substrate variants using HiTS-KIN as described previously. The change in the abundance of alternative RNA substrates was measured using next-gen sequencing over the course of the reaction, and individual rate constants were calculated using internal competition

(16) kinetics . Each relative second order rate constant (krel) was calculated and normalized to that of the genomically encoded leader sequence (“AAAGAU”).

The natural log distributions of all 4096 relative rate constants for processing ptRNAmet82

21A/B/C “-6” to “-1” are shown in Figure 3.2. The HiTS-KIN method revealed a range of relative rate constants spanning several orders of magnitude. The genomically encoded 5′

met82 leader sequence for substrate ptRNA (lnkrel value equals to 0) was not among the fastest sequences in all three distributions. There was a significant number of sequences in all three populations that react faster than the genomically encoded sequence. Additionally, the shape of the affinity distribution varies for the different 21nt distal leader sequences.

85 More overlap was observed between 21B/C than 21A/B. The difference, both in distribution and in sequence logos, between 21A/B are well addressed and compared in previous papers(30,32). Notably, the 21C population is overall faster than 21A and 21B populations. This difference is consistent with minimization of unfavorable secondary structures formed by 21A or 21B sequence with the randomized sequence in those populations.

Next, we calculated a sequence probability logo(133) for the fastest population of 5′ leader sequences. The sequences with the top 1% of all krel values in each dataset were analyzed and the distribution were plotted as sequence logos (Figure 3.2, panel B). The sequence logos for 21A and 21B populations have several key differences. Previous studies have addressed the effects of the unfavorable secondary structures on randomized region by comparing 21A and 21B populations(30,32). The difference in the sequence logos between

21A as compared to 21B and 21C is likely due to unfavorable secondary structures between the randomized leader region and the 21A tag, which affects C5 protein binding affinity.

Without the long range secondary structures influencing on P protein binding specificity, a nucleotide “C” at N(-4) position is observed in both 21B and 21C populations implicating that as a positive determinant for P protein binding. Based on previous results believe This result is consistent with previous studies that showed that N(-4) in the leader sequence contributes to binding specificity(12).

Curiously, in all three populations, we noticed that there is little sequence conservation at the N(-1) position(14,28). Studies have shown that N(-1) position is usually conserved, while

86 it does not affect association, it contributes to catalysis by directly interacting with the

J5/15 region of the P RNA(14,16,28). Since we used multiple turnover reaction in the HiTS-

KIN and measured the second order rate constants kcat/Km which reflects association, and not catalysis. This could explain the lack of sequence conservation at the N(-1) position in all any of the populations. On the other hand, we observed nucleotide “A” at N(-2) position in all three populations, which some studies have shown to interact with the J18/2 of P

RNA, but the chemical details of this interaction are unknown(15,18,134). However, our sequence logo with “A” at N(-2) position indicates that this position might involve in protein binding specificity.

87

A B

C5 Binding Site “-3 to-8” P RNA

2 1 bites 21A A A 0 5’ ------3’ 6 5 4 3 2 1 2

1 bites 21B 0 5’ ------3’ 6 5 4 3 2 1 2

1 bites 21C 0 5’ ------3’ 6 5 4 3 2 1

Figure 3. 2 Determination of the effects on RNase P processing of 5′ leader sequence variation on the relative kcat/Km for all 4096 substrate variants

A) Histogram of the natural log distribution of krel values with the numbers of substrate variants with a particular krel value shown for all three populations. krel values in the figure are the average of three replicates.

B) Comparison of the “Sequence logo” from the top 1% substrates with the fastest rate constants from 21A/B/C populations. “C” at N(-4) position is identified as a positive determinant for protein binding and “A” at N(-2) as the preferred residue might involve in the protein binding. Black letters “AA” show the wildtype sequence. The protein binding and P RNA contact regions are also shown in the figure.

88 The stable stemloop in the 21C distal leader sequence minimizes the effects of unfavorable secondary structure on P protein binding specificity

To correlate the effect of secondary structure within the leader sequence on association binding and rate constants in the three different 21nt sequence contexts, we plotted the

met82 value of krel for individual sequence variants of ptRNA within each of those three different 21nt sequence backgrounds (Figure 3.3). By comparing the relative rate constant distributions of 21A and 21B populations, we observe that there is a linear trend is not strongly established (colored in light blue). This indicates that the potential inhibitory structures are being formed due to the different 21A/B leader sequence backgrounds. On the other hand, the comparison of krel values of individual substrate variants between 21B and 21C populations shows much less deviation of rate constants and a majority of the sequences follow a linear trend.

To further test whether this deviation of rate constants is caused by unfavorable secondary structure, we used M-Fold to calculate all the mean folding free energies (MFE) between the 21A/B/C sequences and each of the 4096 5′ leader sequences(133). Figure 3.3 panels C and D show an overlay of MFE of 21A and 21B populations respectively, on top of the rate constant distributions for all individual substrate variants in 21B and 21C populations. In general, the lower the MFE value the higher the likelihood to fold. Overall, the 21A population displays a higher tendency to fold than the 21B population. Meanwhile, 21C has a single MFE value since it is designed to form a stable stem loop that alternative structure are less likely to form. Panel C clearly shows that the 5′ leader sequences predicted to form the most stable structures in the presence of 21A context (in green) react fast in the context of 21B and 21C backgrounds. We also observe a linear trend between

89 21B and 21C for a large number of sequences in 21B with lower MFE values (hot pink).

These correlations indicate that unfavorable secondary structures are the primary contributor for the slower rate constants observed in the 21A population for identical leader sequences across all three populations. This observation makes 21C sequence the best system to understand the intrinsic sequence specificity of the C5 protein without any interfering effects arising due to the formation of inhibitory secondary structures.

Overall, the MFE values together with the affinity distributions and the nucleotide preferences depicted by the sequence logos indicate that there are two factors affecting C5 protein binding specificity. First, sequence preferences at C(-4) and A(-2). Second, an independent effect that arises due to secondary structure being formed between 21nt and randomized region.

90 A B

C D

2.5

0.0 ∆" MFE 21B data$MFEB

from 21C from −2.5

−5.0 −2.5 rel −7.5

−10.0 log(data$X21C..1.6)

−5.0 Average k

−5.0 −2.5 0.0 2.5 log(data$X21B.1.6) Average krel from 21B

Figure 3. 3 unfavorable secondary structures interfere with C5 protein binding specificity

A,B) Density plot distributions of krel values compared between 21A-21B and 21B-21C.

C,D) Dot plot of krel values from both 21B and 21C overlay with the mean folding free energy (MFE) for all 4096 sequence variants in 21A and 21B populations respectively.

Color hot pink is less likely to fold and color green is likely to form secondary structure.

91 Quantitative modeling indicates that even in the absense of unfavorable structures within leader sequences, surrounding sequence context is still an important factor for sequence specificity

To comprehensively and quantitatively understand the RNase P specificity for 5′ leader sequences containing randomized nucleotide residues from N(-6) to N(-1), we used position weight matrix (PWM) and heat map models. PWM is a common model used for describing DNA-protein interactions(135,136). In PWM models (Figure 3.4), each nucleotide in the leader sequence from N(-6) to N(-1) is considered as independent; no coupling or interaction between neighborhood nucleotides is assumed. After fitting the HiTS-KIN datasets to the PWM model, we compared the experimental krel with krel predicted by the

PWM model. The results indicate that the PWM model alone only explains 41%, 53%, 62% of 21A/B/C populations respectively (Figure 3.4). Next, we used pairwise interaction matrix (PIM) that incorporated the degree of interaction coupling into the PWM and generated the heat map for each population (Figure 3.5). Generally, the strongest couplings in each dataset were observed between nucleotides that were adjacent to each other. Further, we noticed that in the 21A population, nucleotide “C” in from positions “-5” to “-2” had strong interaction whenever its neighboring nucleotide was a “G”. In the 21B leader sequence, none of the strong interactions in 21A were present. This suggests that nucleotides “G” and “C” appearing as neighboring nucleotides in the C5 protein binding site are a negative sequence determinant for protein binding specificity. Together, the

PWM with interaction coupling explains 65% and 62% of measured krel values for the 21A and 21B populations, respectively. This is a significant improvement compared to the

92 percentages observed for the PWM alone, indicating that interaction coupling is an important parameter for C5 protein binding specificity (31).

Next, we moved on to the 21C sequence. 21C is an entirely different sequence compared to 21A and 21B, which are transversions of each other, as described above. In the absence of unfavorable secondary structures that appear to be present in 21A and 21B, the PWM model alone would theoretically be able to explain nearly almost all of the sequence specificity of the 21C dataset and no improvement would be observed upon incorporating interaction coupling into the PWM model. However, data fitting was significantly improved when interaction coupling was incorporate along with the PWM. 62% for PWM model alone and 74% when interaction coupling is taken into account. This result suggests that interaction coupling is an important parameter for C5 protein binding specificity.

Additionally, the heat map of the 21C population shows a clear relationship between nucleotide position and interaction coupling. We observed a strong interaction coupling between position N(-5)N(-4) and N(-2)N(-1) positions at the single nucleotide level. This is consistent with the sequence logo pattern shown earlier that nucleotide “A” at position

“-2” and nucleotide “C” at position “-4” showed up to be a positive determinant. However,

What is the role of the nucleotides at N(-2) position in the contribution to the specificity for C5 protein binding stays unknown. Together, our data suggests that interactions between neighboring bases have the largest effect on C5 protein sequence specificity.

Lastly, we assume the PWM with interaction coupling model is based on leader sequence interactions with C5 protein, not from conformational change.

93

A B

C D

E F

94 Figure 3. 4 PWM modeling of C5 protein binding specificity

A, C, E) Linear coefficients for each nucleotide at each position are shown. Each nucleotide is indicated by a specific color. Score of the other three nucleotides is calculated relative to the genomically encoded leader sequence, AAAGAU, N(-6) to N(-1). Positive and negative values indicate the preference relative to the genomically encoded sequence. The top nucleotide in each position is the one that is most preferred by the C5 protein.

B, D, F) Comparison of krel values predicted by linear regression fitting to PWM and experimental krel for each population are shown.

95 A B

C D

E F

96 Figure 3. 5 Heatmap modeling of C5 protein binding specificity including neighborhood interaction

A, C, E) Heatmap models considering both PWM and interaction coupling between neighborhood nucleotide residues is shown. Red means in that given position, the particular nucleotides have the binding promotion; blue means in that given position, the two nucleotides have the binding inhibition.

B, D, F) Comparison of krel values predicted by linear regression and interaction coupling fit to experimental krel for each population is shown.

97 The contribution to P protein binding by nucleotides N(-6) to N(-3) depends on the identity of nucleotides N(-2) and N(-1)

To further dissect the dependence of sequence variation on P RNA contact sites, N(-2) to

N(-1), on C5 protein binding site, N(-6) to N(-3), without the influence of unfavorable secondary structure, we examined the effects on all the combinations of nucleotides at N(-

2) and N(-1) from sequence variations.

We inspected all 256 sequence variations at N(-6) to N(-3) of the C5 protein binding site in the context of all 16 combinations at N(-2) and N(-1) position individually (Figure 3.6, panel A). Individual krel values of these sequence variants were compared to the genomically encoded sequence NNNNA(-2)U(-1) by dot plots (panel C). Overall, most of the data showed a single linear trend with a positive slope suggesting that the sequence variations from N(-6) to N(-3) are independent of the identity of the nucleotide residues at

N(-2) and N(-1). Otherwise, some deviation behavior would have been observed if there is any correlated relationship between the sequence variation in N(-6) to N(-3) and the identity of N(-2) and N(-1). Most of the points in each graph are smaller compare to that of “AU” indicating that A(-2)U(-1) is the best combination in the leader sequence at those positions. This is also consistent with the sequence logo pattern where nucleotide “A(-2)” appeared as a positive determinant for RNase P specificity, while nucleotide “U” is usually conserved in the N(-1) position has a strong interaction with a specific adenosine in the universally conserved J5/15 region of P RNA (14,15).

We noticed that the slopes approach zero for a few combinations of N(-2)N(-1). The most predominant effects are in the subset of substrates where the nucleotide “G” is present at the N(-2) or the N(-1) position or both (Figure 3.6, red box). These effects on krel value

98 maybe due to the potential base pairing interaction between the 5′ leader sequence and the

3′ RCCA sequence of the ptRNAs. For example, pairing between G(-2)G(-1)G(+1)and

C(+72)CCA-3′ sequence would extended the acceptor stem and potentially lead to slow cleavage or even mis-cleavage (Figure 3.6, panel B, red box)(16).

Another interesting subset of substrates include those containing “A/C” at N(-2) or N(-1) positions (Figure, 3.6, blue box). These four subsets (NNNNAC, NNNNAA, NNNNCA,

NNNNCC) contain a minor fraction that deviates from the main linear trend. Surprisingly, the sequences with C(-4)G(-3) which are indicating red dots in Figure 3.7, have relatively high krel values when compared to the reference sequence in all four subsets. This suggests that the combination of C(-4)G(-3) is a positive determinant when “A/U” appears in the

N(-2)N(-1) positions (Figure 3.6, panel B, blue box). This indicates that not all the pairing interactions between the nucleotides proximal to the cleavage site and the 3′ end “CCA” sequence are inhibitory as one base pairing interaction does not affect processing. This also suggests that there might be conformational change in the enzyme-substrate complex since one pairing might not be strong enough to pull the 5′ leader and 3′CCA together.

To sum, these data suggest an interdependent relationship between the identity of the 5′

N(-2)N(-1) and sequence variations in positions N(-6) to N(-3). Nucleotide “C” at N(-4) position appears to contribute in an independent manner. Additionally, there are a limited number of sequences that show pairing with 3′ CCA, suggesting that base pairing can be a potent anti-determinant.

99

A B

C

100 Figure 3. 6 Sequence variations from N(-6) to N(-3) are interdependent with identity of N(-2) and N(-1)

A, 16 subsets are prepared based on the identities of nucleotides in N(-2)N(-1) positions.

Each of these 16 datasets contains 256 sequence variants. “Reference” indicates the genomically-encoded sequence A(-2)U(-1).

B, Proposed pairing interaction between nucleotides in the leader sequence proximal to cleavage site and 3′ end “CCA” sequence of ptRNA as a potent anti-determinant is shown.

C, 16 subsets are compared individually with the reference sequence A(-2)U(-1). Across each row is the change in nucleotide N(-1) while each column follows a change in nucleotide N(-2). Each scatter plot depicts the relationship between krel ratios comparing substrates containing sequence variations from N(-6) to N(-3) in context of the catalysis of the nucleotides at N(-2)N(-1) (shown in the y-axis) relative to the reference sequence

(shown in the x-axis).

101

Figure 3. 7 Combination of C(-4)G(-3) is a positive determinant when “A/U” appeared in N(-2)N(-1) positions.

Four subsets (NNNNAC, NNNNAA, NNNNCA, NNNNCC) were color-coded according to the nucleotide present at the N(-3) position and plotted C(-4)N(-3) with AA, CA, AC,

CC at N(-2)N(-1) respectively

102 G in the 5′ leader sequence is an anti-determinant, but only in the presence of a second pairing interaction with the 3′ CCA

In order to further understand the pairing interaction between the nucleotides proximal to the cleavage site in the leader sequence and the 3′ “CCA” end, as well as their potential role in conformational change. A similar strategy to examine other potential patterns and got the same results. 16 subsets were prepared with all combinations at N(-4) and N(-2) and the 256 possible combinations at N(-6)N(-5)N(-3) and N(-1) positions (Figure 3.8, panel A). As shown in panel C, most of the datasets showed a single linear trend with a positive slope. This is consistent with the earlier result that showed that “C” the in N(-4) position appears to contribute in an independent manner and is unaffected by the identities of N(-2) and N(-1).

Across each row is the nucleotide at the N(-4) position; nucleotides at the N(-2) position are displayed in columns. The slope in each graph is shallower compared to C(-4) and A(-

2), respectively. This indicates that C(-4) and A(-2) are two positive determinants for

RNase P specificity.

Interestingly, we also observed that the slope approaches to zero in a subset of substrates with “G” at N(-2) position. Although similar to Figure 3.6, the krel values are quite different.

Using colors to distinguish between the nucleotides in the N(-1) position plotted with N(-

4)G(-2) it is apparent that all sequences with “G/U” at N(-1) have relatively slow krel values in all four subsets (Figure 3.8). Since there is a potential pairing interaction between G(-2) in the leader sequence and the 3′ CCA, we believe that any additional paring interaction at

N(-1) position would likely slow down the rate constants (Figure 3.8, panel B, blue box).

More interestingly, for the substrate variants with “G” at both N(-4) and N(-2) positions,

103 no matter which nucleotide appeared in N(-1) position, rate constants are slow likely because G(-2) and G(-4) in the leader region paired with the 3′ end CCA(Figure 3.8, panel

B, red box).

104 A B

C

105 Figure 3. 8 Comparison of the krel value of all 16 nucleotide combinations at N(-4) and N(-2) to the reference sequence

A) 16 datasets were prepared based on the identities of the nucleotides at the in N(-4)N(-

2) positions. Each of these 16 datasets contains 256 sequence variants. “Reference” indicates the genomically encoded sequence C(-4)A(-2).

B) Proposed pairing interaction between nucleotides in the leader sequence proximal to cleavage site and the 3′ end “CCA” sequence of ptRNA as a potential anti-determinant.

C) 16 subsets were compared individually with the reference sequence C(-4)A(-2). Across each row is the change in nucleotide N(-4) while each column follows a change in nucleotide N(-2). Each scatter plot compares the krel ratio of substrates containing sequence variations from N(-6)N(-5)N(-3)N(-1) in context of mutation at N(-4)N(-2) (shown in the y-axis) to the reference sequence (shown in the x-axis).

106

8 A(−1)

C(−1)

6 G(−1)

U(−1) 4 data1[, 14] NNANGN

NNCNGN

rel 2 rel k k 0

0 2 4 6 8 data1[, 10] k NNCNAN krel NNCNAN rel 8 8 A(−1) A(−1)

C(−1) C(−1)

6 G(−1) 6 G(−1)

U(−1) U(−1) 4 4 data1[, 30] data1[, 22] NNGNGN

NNUNGN

rel 2 2 rel k k 0 0

0 2 4 6 8 0 2 4 6 8 data1[, 10] data1[, 10] krel NNCNAN krel NNCNAN

107 Figure 3. 9 variations at the N(-1) position with a “G” at N(-2) indicates that a single base pairing interaction between leader sequence and the 3′ CCA may contribute to

krel.

Four subsets (NNANGN, NNCNGN, NNGNGN, NNUNGN) were individually represented by different colors for the different nucleotides at N(-1) position and compared to the reference sequence NNCNAN.

108

Sequence landscape for RNase P processing specificity

Taken together, the HiTS-KIN data from the 21C sequence suggests a free energy landscape shown in Figure 3.10. we used multiple turnover kinetics in the HiTS-KIN measure the second order rate constants, kcat/Km, this value mainly reflects the rate limiting step, which is the substrate association step (kon) that includes both the binding and conformational change steps. Since the catalysis and product release steps in the multiple turnover kinetics are so fast, they would not affect the observed rate constant in these experiments. Previous studies and the comparison between the 21A/B/C populations all suggest that structures in the leader sequence with the protein binding site, proximal to cleavage sites , negatively affect RNase P processing specificity by competing with the

RNA-protein interaction because the protein usually binds to single-stranded leader sequences from N(-8) to N(-3)(10,132). Substrates with unfavorable structures proximal to cleavage site usually have a lower ground state, therefore, more energy is needed to activate and unfold the structure, resulting in a slower observed rate constant. While the 21C sequence provides a substrate pool with a stem loop upstream of the protein binding site, keeping the protein binding site a linear sequence so that they should interact with the C5 protein resulting in a stable RNase P-substrate complex. This would eliminate the influence on the ground state from unfavorable secondary structure, lowering the free energy of the transition state and revealing a clear sequence specificity landscape for RNase P processing.

These experiments demonstrate that C(-4) and A(-2) are positive determinants for the C5 protein binding specificity, for example, single-stranded leader sequences with C(-4) or

A(-2) would result in a fast kcat/Km since a lower binding energy is required. Additionally, incorporating interaction coupling into PWM model reveals that the coupling between

109 neighborhood bases is also an important factor for 21C populations (even without the influence of structure in the leader region). In addition to the unfavorable secondary structure between 21nt sequence (21A or 21B) and N(-6) to N(-1), another potentials negative effect on the kcat/Km value is the base pairing interaction between nucleotides in the 5′ leader sequence proximal to the cleavage site and the 3′ end “CCA” sequence.

Extensive base pairing between the 5′ leader and the 3′ CCA would require more energy

to unfold the structure, slowing the kcat/Km. However, the data suggests that if only one base pair forms between the 5′ leader and the 3′ CCA, the kcat/Km is unaffected. For example, in the sequence NNCGCA(-1), although G(-3) likely pairs with the 3′ end CCA, no other base pairs are expected to form between the 3′end and the N(-2)/N(-1), so C(-

4)G(-3) becomes a positive determinant. Hence, for this particular sequence, as well as other A/C rich-N(-2) to N(-1)-containing sequences, the kcat/Km is unaffected (Figure 3.6,

Figure 3.7). More experiments are needed to answer the following questions: Does the positive determinant C(-4) rescue this potential base pair between the 5′ leader sequence and the 3′ CCA? Is there any direct evidence showing a conformational change after enzyme binds to substrate? Does base pairing influence the rate limiting step?

110

Figure 3. 10 Propopsed Free energy landscape of RNase P processing of substrate pools through multiple-turnover reaction

111 Materials and Methods

Multiple turnover reaction and substrate pool generation

Generation of random RNA populations: A ptRNAMet82 substrate RNA containing a region of randomized nucleotides is used as a model substrate. tRNAMet82 body contains the consensus sequence of E.coli RNase P (Figure 3.6)(20). A random population of cDNA is generated from a plasmid template containing the wild type genomic sequence and upstream T7 RNA polymerase binding site using oligonucleotide primers randomized in the desired region. This cDNA pool is then in vitro transcribed using T7 RNA polymerase to create a randomized ptRNA substrate pool. The genomic coding leader sequence of ptRNAMet82 is AAAAAGAU (position “-8” to “-1” relative to the RNase P cleavage site).

Our randomized substrates populations contain 6 randomized nucleotides as shown in

Figure 3.1. Three different extra 21 nucleotides leader sequences are attached upstream of the randomized sequences. ptRNAs substrates are dephosphorylated using calf intestinal and 5′end-labeled with γ32P-ATP using T4 polynucleotide kinase according to the manufacturer′s protocol.

HiTS-KIN processing diagram

HiTS-KIN reactions are performed under multiple turnover conditions in pH 8.0 solution containing 50 mM Tris-HCl, 17.5 mM MgCl2, 0.005% Triton X-100, 100 mM NaCl with

1 µM ptRNA and 5 nM E. coli RNase P holoenzyme (P RNA:C5 protein=1:1). Multiple turnover reactions using competitive alternative substrate kinetics for limited enzyme.

Aliquots from the reaction are taken at appropriate time points and run on a 15% polyacrylamide gel to separate substrate and product. Residual substrate precursor bands

112 are located by exposure to X-ray film, purified by phenol-chloroform extraction and precipitation. cDNA libraries for Illumina sequencing are prepared from unreacted ptRNA substrate populations recovered from the gel. The RNA is re-suspended and reverse transcribed into cDNA templates using Superscript III. Sequencing samples are prepared by PCR ensuring a linear range, and sent for Illumina sequencing on HiScan 2500 with single ends reads of 75 bp using an equimolar mixture of each sample in one flow cell.

Processing of Illumina sequencing data: All the sequence reads are aligned and sorted

(130) according to 21A/B/C sequences and barcodes . Final relative rate constants (krel) of individual sequence population are obtained by using the equation in Figure 3.11. by monitoring the changes of the substrate variants based on the independent time points.

113

b c a

e d

Figure 3. 11 HiTS-KIN processing diagram a) Create a randomized substrate population. b) React the randomized population with the enzyme. c) Separate substrate and product. d) Isolate the residual substrate population at time points along the reaction and send for Illumina sequencing. e) Monitor the change in each substrate sequence over time and calculate a relative rate constant for each individual sequence.

114 HiTS-KIN data analysis

To comprehensively analyze specificity using all of the data from the affinity distribution, we used PWM modeling. PWM is a linear regression model to examine how different nucleotides in the particular positions affect the reaction rate including the position of each base. For example, if we randomize “-3” to “-8” positions in the leader sequence, we get

{Ni,j, -8 ≤ i ≤ -3, j ϵ A,U,G,C}, i stands for the position in the leader and j means the nucleotide there. So, if we multiply 6 randomized positions with all 4 possible nucleotides, we get 24 variables. Following equation is used to calculate linear coefficients.

8 ln(kaAcCgGuUrel) ~ å i i+++ i i i i i i i=3

Functional couplings between bases which can be displaced in a “Heatmap” are also provided in the HiTS-KIN data analysis(137). The PWM described above assumes the contributions of nucleotide bases are independent in the randomized region. However, some studies have reported that interactions between the neighborhood bases can affect enzyme affinity and specificity(138). Indeed, HiTS-KIN data are poorly described by a strict

PWM model. Therefore, on the foundation of the PWM, our Heatmap adds interaction terms of the effect between the neighboring bases shown as following equation. Together with the PWM and sequence logo, this further helps us understand enzyme affinity and specificity precisely.

8 n ln(kaAcCgGuUIrel) ~ åå( i i+++ i i i i i i) +a n n ii==31

115 Chapter 4: Summary and Further Directions

Summary

The key features of RNase P specificity are emphasized by our results presented in chapter

2 and chapter 3. In chapter 2, we used genomically encoded polycistronic ptRNAs and demonstrateed that polycistronic context, as well as structures in both the 5′ leader and the

3′ trailer are important modulators of RNase P processing. RNase P processes polycistronic ptRNAs from 3′ to 5′ in a non-processive mechanism. However, the presence of a highly structured 3′ end inhibits the directional processing. Additionally, structure within the 5′ leader proximal to cleavage site is inhibitory but clearly does not block cleavage entirely.

It is well known that the 5′ leader sequence proximal to the cleavage site interacts with P protein and the preferred P protein-RNA interaction is with the single-stranded 5′ leader sequence(10). Therefore, local sequence variation must play an essential role in RNase P specificity as well. In chapter 3, we examined the effects of nucleotide variants within the leader sequence close to the cleavage site using HiTS-KIN. By minimizing influence of secondary structure, by using 21C stem loop sequence attached upstream of randomized region N(-6) to N(-1). We outlined the sequence specificity of the C5 protein binding site, as well as the P RNA contact site. Further, we identified “C” at N(-4) and “A” at N(-2) positions as two positive determinants for RNase P processing. Additionally, by comparing the HiTS-KIN datasets of the 21A/B/C populations, we identified the effect of secondary structure within the C5 protein binding region on kcat/Km. We also defined interaction coupling as an important factor that contributes to the C5 protein binding specificity. These results provide us with a fundamental basis of how leader sequence variation governs the

C5 protein binding specificity. However, more biochemical and structural details need to

116 be uncovered in order to understand the mechanistic basis of this essential, multi-substrate recognizing enzyme.

Although basic outline of the tRNA biosynthesis process is understood, little is known about how cells maintain appropriate tRNAs levels – whether it is by balancing their expression or by tuning the in turnover to match the demands of protein synthesis. The ordered processing of three polycistronic ptRNAs from the valine and lysine family, which is entirely dependent on RNase P, provides a potential coordinated control point for maintaining specific tRNA abundance levels.

Sequence-based and structural specificity for the C5 protein described in this thesis applies to genomically encoded ptRNAs processing in the cell. Besides the three polycistronic ptRNAs from valine and lysine family, the leader sequences of the ptRNAs encoded in the

E .coli geome from N(-6) to N(-1) have been listed as Appendix 1. We predicted the krel for the individual leader sequence of each of these endogenous ptRNAs using krel library from the 21C HiTS-KIN dataset. They showed a significant variation in both leader sequence and predicted processing rate constants. The predicted RNase P processing rate constants for the most of the tRNAs are above 1. Some tRNA families, such as proline, glycine and alanine, have either short leader sequences or a positive determinant for the C5 protein binding site without any unfavorable leader structure. They also have relatively high expression levels in the cell and this matches the krel values of the leader sequences, which are also predicted to be high based on our HiTS-KIN data.

117 Future directions

Previous biochemical studies show, albeit to a limited extent, how variation in the 5′ leader sequence affects the RNase P processing rate constant and specificity. The work presented here furthers our understanding of how the natural variation in 5′ leader sequences of endogenous ptRNAs may affect RNase P processing specificity. In order to expand the frontiers of biochemistry further, it is critical to develop an understanding of the mechanistic basis of energetic coupling and neighborhood interactions, as they will likely be observed in other multiple substrate RNA processing enzymes.

The 3.8 Å resolution crystal structure of Thermotoga maritima RNase P shows intimate contact between the single-stranded 5′ leader sequence and the P protein. Using this structure as the foundation, HiTS-KIN allowed us to explore the biochemical details of this contact by enabling us to measure rate constants for all ptRNA substrate variants simultaneously and to define globally the specificity landscape of E. coli RNase P. These experiments demonstrate the effects of sequence variations in this region of the 5′ leader in multiple structural on processing rate constants. We discovered that the P protein is sensitive to unfavorable secondary structures since, by minimizing competing structures, our 21C HiTS-KIN data revealed a preference for a “C” at N(-4) position, is a positive determinant for P protein binding specificity. Energetic couplings especially between the

N(-1) and N(-2) positions, suggest that a conformational change of the RNase P-ptRNA complex occurs in the transition state. This is consistent with previous studies that show that there is a conformational change when RNase P binds to its substrate to regulate the stability of the E-S complex. However, limited information is available about the

118 relationship between a potential conformational change of the enzyme-substrate complex and its sequence specificity. Therefore, it will be highly conducive to expanding our insight to make mutations on N(-4) and N(-2) positions and perform stopped-flow kinetics to define the role of N(-4) and N(-2) positions in the potential conformational change. We propose to adopt a mini tRNA-like helix with 12 base pairs as the role of the acceptor stem and 8 single-stranded nucleotides as the role of the leader sequence and label the substrate with Cy3 on its 3′ end and Cy5 on the 5′ end and to perform single turnover kinetics in the presence of Ca2+ ions to slow down the cleavage step. The cy3 signal will be quenched by the 5′ end cy5 when the two ends move towards each other due to Florescence Resonance

Energy Transfer (FRET). Based on the change in the cy3 emission, we can measure the change in the distance between the 3′ end and 5′ end over the reaction time. We may able to interpret a potential conformational change during the assemble of enzyme-substrate complex.

Additionally, the relationship between the sequence of the 5′ leader sequence and a potential conformational change within the RNase P-substrate complex can also further be investigated by adopting a structural approach. We are interested in the application of cryo-electron microscopy (cryo-EM) to provide us with direct visualization of conformational changes occurring in bacterial RNase P-substrate complex. Our HiTS-KIN results indicate that secondary structure proximal to the cleavage site generally slows down the processing rate constant. Since P protein usually binds to single-stranded RNA leader sequences, structures in the P protein binding region might influence protein binding specificity. However, we found a few stable secondary structures with high MFE values

119 that react quickly both in vitro and in vivo. Using cryo-EM to understand how these substrates with stable structures bind to C5 protein is of particular interest.

The crystal structure of Thermotoga maritima indicates that the P RNA makes a direct contact with the ptRNA. However, the recognition sites for this interaction are uncertain.

One way to identify the P RNA recognition sites on the ptRNA body is by randomizing the nucleotide residues in the TψC loop, 3′ end “ACCA” sequence or in a part of the acceptor stem in isolation or combination and then using HiTS-KIN to explore energetic coupling.

Additionally, there are two types of RNase P in bacteria. E. coli is a representative of A type, while, the B type model is derived from B. subtilis. Studies have shown that these two types have different 5′ leader sequence specificities, since sequence preferences for the leader sequences are different(12). However, the protein-RNA contact region is usually conserved. This suggests the P protein has species specific based on the different sequence preferences. Additionally, because RNase P shows a great detail of diversity even within the same type, so studying RNase P from other pathogenic bacterial species such as

Staphylococcus aureus would be of great pharmaceutical interest. This approach will significantly broaden our understanding of RNase P processing specificity for all types of the enzyme in pathogenic bacteria to explore its potential as an antibiotic target.

120 Appendix

Sequence information of all polycistronic transcription units

Ala (GGC) 76 bp GGGGCTATAGCTCAGCTGGGAGAGCGCTTGCATGGCATGCAAGAGGTCAGCG GTTCGATCCCGCTTAGCTCCACCA

Range 1: alaX 2518041 to 2518116 N39 AATTTTGCACCCAGCAAACTTGGTACGTAAACGCATCGT Range 2: alaW 2518156 to 2518231 N22 AACTCGACAAGCTTTACGTGAC AlaWX transcription unit

Ala (TGC) 76 bp GGGGCTATAGCTCAGCTGGGAGAGCGCCTGCTTTGCACGCAGGAGGTCTGCG GTTCGATCCCGCATAGCTCCACCA

Range 1: alaV 225500 to 225575 N42

Involved in ribosome RNA: rrsH-ileV-alaV-rrlH-rrfH

AATTTGCACGGCAAATTTGAAGAGGTTTTAACTACATGTTAT

Range 2: alaU 3418971 to 3435020

Involved in ribosome RNA: rrsD-ileU-alaU-rrlD-rrfD-thrV-rrfF

AGGTTTTAACTACATGTTAT

Cannot located promoter

Range 3: alaT 4037260 to 4037335 N42

Involved in ribosome RNA: rrsA-ileT-alaT-rrlA-rrfA

AATTTGCACGGCAAATTTGAAGAGGTTTTAACTACATGTTAT

121

Arg (ACG) 77 bp GCATCCGTAGCTCAGCTGGATAGAGTACTCGGCTACGAACCGAGCGGTCGGA GGTTCGAATCCTCCCGGATGCACCA

Range 1: argQ 2817784 to 2817860 N200

3′ATATTCTCCGAAATCTTCAGCAATGAAGGTATCGAAGAGGTAGCGGAATTA

ACCGCAGGCACTAGGGATGATAACGTTGCTTTAGCAACGGCCCGAAGGGCGA

GCCGCAAGGCGAGTAATCCTCCCGGATGCACCATCTCTTAATTGATATGGCTT

TAGTAGCGGTATCAATATCAGCAGTAGAATAAGTTTTCCCGAT-5′

Range 2: argZ 2818059 to 2818135 N62

TCTCTTACTTGATATGGCTTTAGTAGCGGTATCAATATCAGCAGTAAAATAAA

TTTCCCGAT

Range 3: argY 2818198 to 2818274 N200

3′ATATTCTACGTACTTTCAGCGATGAAGGTATGGAAGAGGTGGCGGTAATAA

CCGCAGGCACCAGGGAGGATAACGTTGCTTTAGCAACGGCCCGAAGGGCGA

GCCGCAAGGCGAGTAATCCTCCCGGATGCACCATCTCTTACTTGATACGGCTT

TAGTAGCGGTATCAAAAAATCTGCAGTAAAGTAAGTTTCCCGAT-5′

Range 4: argV with serV 2818473 to 2818549 N3

TTT serV-argVYZQ

Arg (CCG) 77 bp GCGCCCGTAGCTCAGCTGGATAGAGCGCTGCCCTCCGGAGGCAGAGGTCTCA GGTTCGAATCCTGTCGGGCGCGCCA

Range 1: argX 3982375 to 3982451 N11

122 AACGGCGCTAA argX-hisR-leuT-proM

Arg (CCT) 75 bp GTCCTCTTAGTTAAATGGATATAACGAGCCCCTCCTAAGGGCTAATTGCAGGT TCGATTCCTGCAGGGGACACCA Range 1: argW 2466309 to 2466383 N1

T

Arg (TCT) 77 bp GCGCCCTTAGCTCAGTTGGATAGAGCAACGACCTTCTAAGTCGTGGGCCGCA GGTTCGAATCCTGCAGGGCGCGCCA

Range 1: argU 564723 to 564799 N20

TGTGGCGTAACGCCATAGTT

Asn (GTT) 76 bp TCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACTGG TTCGAGTCCAGTCAGAGGAGCCA

Range 1: asnT 2044549 to 2044624 N8

CACACGAT

Range 2: asnW 2058027 to 2058102 N31

*GCATCGCTAATATTCGCCTCGTTCTCACGAT

*CAN NOT LOCATED PROMOTER

Range 3: asnU 2059851 to 2059926 N6

AACGAT

Range 4: asnV 2062260 to 2062335 N8

CACACGAT

123

Asp (GTC) 77 bp GGAGCGGTAGTTCAGTCGGTTAGAATACCTGCCTGTCACGCAGGGGGTCGCG GGTTCGAG TCCCGTCCGTTCCGCCA Range 1: aspU 228928 to 229004 N20

TGGTTGTAAAAGAATTCGGT

Range 2: aspV 236931 to 237007 N18

CCCAACGGAACACAACGC

Range 3: aspT with rRNA 5S 3946872 to 3946948

*CAAATTAAGCAGTAAGCCGGTCATAAAACCGGTGGTTGTAAAAGAATTCGG

T

*CAN NOT LOCATE PROMOTER

Cys (GCA) 74 bp GGCGCGTTAACAAAGCGGTTATGTAGCGGATTGCAAATCCGTCTAGTCCGGT TCGACTCCGGAACGCGCCTCCA Range 1: cysT with glyW 1991914 to 1991987 N53

TTTAAAAGACATCGGCGTCAAGCGGATGTCTGGCTGAAAGGCCTGAAGAATT

T

Unit: glyW-cysT-leuZ

Gln (CTG) 75 bp TGGGGTATCGCCAAGCGGTAAGGCACCGGATTCTGATTCCGGCATTCCGAGG TTCGAATCCTCGTACCCCAGCCA

Gln (TTG) 75 bp TGGGGTATCGCCAAGCGGTAAGGCACCGGTTTTTGATACCGGCATTCCCTGGT TCGAATCCAGGTACCCCAGCCA

metT-leuW-glnUW-metU-glnVX

124

Range 1: 696430 to 696504 glnX with glnV N36

TTTATTCAAGACGCTTACCTTGTAAGTGCACCCAGT

Range 2: glnV with metU 696542 to 696616 N46

ATTCTGAATGTATCGAATATGTTCGGCAAATTCAAAACCAATTTGT

Range 3: glnW with glnU 696756 to 696830 N33

CTTCTTCGAGTAAGCGGTTCACCGCCCGGTTAT

Range 4: glnU with leuW 696865 to 696939 N22

TCACCAGAAAGCGTTGTACCCA

Glu (TTC) 76 bp GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAGGG GTTCGAATCCCCTAGGGGACGCCA * encoded in ribosome RNA, cannot located promoter

Range 1: gltW 2729369 to 2729444

GAAAATATCACGCAACGCGTGATAAGCAATTTTCGT #

UNIT: rrsG-gltW-rrlG-rrfG

Range 2: gltU 3943435 to 3943510

TGAAAAGCAAGGCGTCTTGCGAAGCAGACTGATAC #

Range 3: gltT 4168372 to 4168447

ATATCACGCAACGCGTGATAAGCAATTTTCGT #

Range 4: gltV 4209774 to 4209849

GAAAAGCAAGGCGTCTTGCGAAGCAGACTGATAC #

125 Gly (CCC) 74 bp GCGGGCGTAGTTCAATGGTAGAACGAGAGCTTCCCAAGCTCTATACGAGGGT TCGATTCCCTTCGCCCGCTCCA

Range 1: glyU 2998984 to 2999057 N7

TCTCGAA

Gly (GCC) 76 bp GCGGGAATAGCTCAGTTGGTAGAGCACGACCTTGCCAAGGTCGGGGTCGCGA GTTCGAGTCTCGTTTCCCGCTCCA Range 1: glyW 1992042 to 1992117 N63

GAACGGCGGCACTGATTGCCAGACGATAATAAAATCAAGTGATTAACTGATT

GCTTGATGAAT unit: glyW-cysT-leuZ

Range 2: glyV 4392360 to 4392435 N47

TAACGACGCAGAAATGCGAAAATTACGAAAGCAAAATTAAGTAGTAC

Range 3: glyX 4392472 to 4392547 N36

AAATTTGAAAAGTGCTGCAAAGCACAGACCACCCAA

Range 4: glyY 4392583 to 4392658 N36

AAATTTGAAAGTGCTGTAAGGCACAGACCACCCAA unit:glyVXY

Gly (TCC) 75 bp GCGGGCATCGTATAATGGCTATTACCTCAGCCTTCCAAGCTGATGATGCGGGT TCGATTCCCGCTGCCCGCTCCA

Range 1: glyT 4175673 to 4175747 N116

126 ATTTCGGCCACGCGATGGCGTAGCCCGAGACGATAAGTTCGCTTACCGGCTC

GAATAAAGAGAGCTTCTCTCGATATTCAGTGCAGAATGAAAATCAGGTAGCC

GAGTTCCAGGAT

UNIT: thrU-tyrU-glyT-thrT

His (GTG) 76 bp GTGGCTATAGCTCAGTTGGTAGAGCCCTGGATTGTGATTCCAGTTGTCGTGGG TTCGAATCCCATTAGCCACCCCA

Range 1: hisR 3982510 to 3982585 N58

TTTAGTCCCGGCGCTTGAGCTGCGGTGGTAGTAATACCGCGTAACAAGATTTG

TAGTG

UNIT: argX-hisR-leuT-proM

Ile (GAT) 77 bp AGGCTTGTAGCTCAGGTGGTTAGAGCGCACCCCTGATAAGGGTGAGGTCGGT GGTTCAAGTCCACTCAGGCCTACCA

Range 1: ileV 225381 to 225457 N68

CCTTAAAGAAGCGTACTTTGCAGTGCTCACACAGATTGTCTGATGAAAATGA

GCAGTAAAACCTCTAC

UNIT: rrsH-ileV-alaV-rrlH-rrfH

Range 2: ileU 3427076 to 3427152 N68

CCTTAAAGAAGCGTACTTTGCAGTGCTCACACAGATTGTCTGATGAAAATGA

GCAGTAAAACCTCTAC

UNIT: rrsD-ileU-alaU-rrlD-rrfD-thrV-rrfF

127 Range 3: ileT 4037141 to 4037217 N68

CCTTAAAGAAGCGTTCTTTGAAGTGCTCACACAGATTGTCTGATGAAAATGA

GCAGTAAAACCTCTAC

UNIT: rrsA-ileT-alaT-rrlA-rrfA

Ile2 (CAT) 76 bp GGCCCCTTAGCTCAGTGGTTAGAGCAGGCGACTCATAATCGCTTGGTCGCTG GTTCAAGTCCAGCAGGGGCCACCA Range 1: ileX 3215598 to 3215673 N1

T

Range 2: ileY 2785762 to 2785837

*AAATATTCATAACAATAACGTGTGAGTT

TATTGTTATTGCACACTCAA

*can not locate promoter

Leu (CAA) 85 bp GCCGAAGTGGCGAAATCGGTAGACGCAGTTGATTCAAAATCAACCGTAGAAA TACGTGCCGGTTCGAGTCCGGCCTTCGGCACCA

Range 1: leuX 4496405 to 4496489 N19

TTCCGCATACCTCTTCAGT

Leu (CAG) 87 bp GCGAAGGTGGCGGAATTGGTAGACGCGCTAGCTTCAGGTGTTAGTGTTCTTA CGGACGTGGGGGTTCAAGTCCCCCCCCTCGCACCA

Range 1: leuT 3982606 to 3982692 N20

TTATTAGAAGTTGTGACAAT

128 UNIT: argX-hisR-leuT-proM

Range 2: leuV 4606079 to 4606165 N33

ACGAGGCGATATCAAAAAAAGTAAGATGACTGT

Range 3: leuQ 4606315 to 4606401 N31

GGTAGCAATTCTTTTTAAGAATTGATGGTAT

UNIT: leuQPV

Range 4: leuP 4606200 to 4606286 N27

AAACCACGTTGATATTGCTCGCACTGG

Leu (GAG) 87 bp GCCGAGGTGGTGGAATTGGTAGACACGCTACCTTGAGGTGGTAGTGCCCAAT AGGGCTTACGGGTTCAAGTCCCGTCCTCGGTACCA

Range 1: LeuU attach with anything eles 3322072 to 3322158 N13

AAGTAGTATCCGT

Leu (TAA) 87 bp

GCCCGGATGGTGGAATCGGTAGACACAAGGGATTTAAAATCCCTCGGCGTTC GCGCTGTGCGGGTTCAAGTCCCGCTCCGGGTACCA

Range 1: leuZ with cyst 1991815 to 1991901 N11

TTTCTTCCCGA

Leu (TAG) 85 bp GCGGGAGTGGCGAAATTGGTAGACGCACCAGATTTAGGTTCTGGCGCCGCAA GGTGTGCGAGTTCAAGTCTCGCCTCCCGCACCA

Range 1: leuW with Met 696963 to 697047 N8

CTTTTTTT

129 Met (CAT) 77 bp CGCGGGGTGGAGCAGCCTGGTAGCTCGTCGGGCTCATAACCCGAAGATCGTC GGTTCAAATCCGGCCCCCGCAACCA Range 1: metY 3318213 to 3318289 N86

AGATTTTACGTCCCGTCTCGGTACACCAAATCCCAGCAGTATTTGCATTTTTT

ACCCAAAACGAGTAGAATTTGCCACGTTTCAGG

Range 2: metZ 2947387 to 2947463 N3

GGA

Range 3: metW 2947497 to 2947573 N33

ATTAAAATTTGATGAAGTAAAGCAGTACGGTGA

Range 4: metV 2947607 to 2947683 N33

ATCAAATTTGATGAAGTAAAAGCAGTACGGTGA

UNIT: metZWV

Met (CAT) 77 bp GGCTACGTAGCTCAGTTGGTTAGAGCACATCACTCATAATGATGGGGTCACA GGTTCGAATCCCGTCGTAGCCACCA

Range 1: metU attch with glnW 696664 to 696740 N14

CGAAGAAACAATCT

Range 2: metT 697057 to 697133 N34

CGCAACGCCGATAAGGTATCGCGAAAAAAAAGAT

Phe (GAA) 76 bp GCCCGGATAGCTCAGTCGGTAGAGCAGGGGATTGAAAATCCCCGTGTCCTTG GTTCGATTCCGAGTCCGGGCACCA

Range 1: pheU 3110366 to 3110441 N1

T

130 From tRNA database Range 2: 4362551-4362626

ACGGTTTAATGCGCCCCGTT

Pro (CGG) 77 bp CGGTGATTGGCGCAGCCTGGTAGCGCACTTCGTTCGGGACGAAGGGGTCGGA GGTTCGAATCCTCTATCACCGACCA

Range 1: proK 3708616 to 3708692 N2

TT

Pro (GGG) 77 bp CGGCACGTAGCGCAGCCTGGTAGCGCACCGTCATGGGGTGTCGGGGGTCGGA GGTTCAAATCCTCTCGTGCCGACCA

Range 1: proL 2286211 to 2286287 N4

GAGT

Pro (TGG) 77 bp CGGCGAGTAGCGCAGCTTGGTAGCGCAACTGGTTTGGGACCAGTGGGTCGGA GGTTCGAATCCTCTCTCGCCGACCA

Range 1: proM attcha with leuT 3982735 to 3982811 N42

CGACTTTAAAGAATTGAACTAAAAATTCAAAAAGCAGTATTT

SeC(p) (TCA) 91 bp GGAAGATCGTCGTCTCCGGTGAGGCGGCTGGACTTCAAATCCAGTTGGGGCC GCCAGCGGTCCCGGGCAGGTTCGACTCCTGTGATCTTCC

Range 1: sel C 3836222 to 3836312

NO LEADER

Upstream: CCATGTAGCAATCGAGGCGC

131 Ser (CGA) 90 bp GGAGAGATGCCGGAGCGGCTGAACGGACCGGTCTCGAAAACCGGAGTAGGG GCAACTCTACCGGGGGTTCAAATCCCCCTCTCTCCGCCA

Range 1: serU 2043468 to 2043557 N5

TGAAC

Ser (GCT) 93 bp GGTGAGGTGGCCGAGAGGCTGAAGGCGCTCCCCTGCTAAGGGAGTATGCGGT CAAAAGCTGCATCCGGGGTTCGAATCCCCGCCTCACCGCCA

Range 1: serV 2818553 to 2818645 N52

CAGCGACGATGAGCGATAAACAAGTTCTTCGAAGCACTCGTAAGAGGCGTGT

UNIT: serV-argVYZQ

Ser (GGA) 88 bp GGTGAGGTGTCCGAGTGGCTGAAGGAGCACGCCTGGAAAGTGTGTATACGGC AACGTATCGGGGGTTCGAATCCCCCCCTCACCGCCA

Range 1: serW 925884 to 925971 N36

CGTTGCCACCCATGAGGTTTGGTAGCAAAACAATTT

Range 2: serX 1097565 to 1097652 N24

CCAGCAAGACAGGAACGACAATTT

Ser (TGA) 88 bp GGAAGTGTGGCCGAGCGGTTGAAGGCACCGGTCTTGAAAACCGGCGACCCG AAAGGGTTCCAGAGTTCGAATCTCTGCGCTTCCGCCA

Range 1: serT 1031625 to 1031712 N3

TCC

Thr (CGT) 76 bp GCCGATATAGCTCAGTTGGTAGAGCAGCGCATTCGTAATGCGAAGGTCGTAG GTTCGACTCCTATTATCGGCACCA

132

Range 1: thrW 262871 to 262946 N31

TGATTCACGTTCGTGAACATGTCCTTTCAGG

Thr (GGT) 76 bp GCTGATATGGCTCAGTTGGTAGAGCGCACCCTTGGTAAGGGTGAGGTCCCCA GTTCGACTCTGGGTATCAGCACCA

Range 2: thrT attach with glyT 4175754 to 4175829 N6

AGATGT

Range 1: thrV 3423580 to 3423655 N12

CAAATTTAGCGT

Thr (TGT) 76 bp GCCGACTTAGCTCAGTAGGTAGAGCAACTGACTTGTAATCAGTAGGTCACCA GTTCGATTCCGGTAGTCGGCACCA

Range 1: thrU 4175388 to 4175463 N7

ACTTGAT

UNIT: thrU-tyrU-glyT-thrT

Trp (CCA) 76 bp AGGGGCGTAGTTCAATTGGTAGAGCACCGGTCTCCAAAACCGGGTGTTGGGA GTTCGAGTCTCTCCGCCCCTGCCA

Range 1: trpT attach with aspT 3946957 to 3947032 N8

CCCTAATT

Tyr (GTA) 85 bp

133 GGTGGGGTTCCCGAGCGGCCAAAGGGAGCAGACTGTAAATCTGCCGTCATCG ACTTCGAAGGTTCGAATCCTTCCCCCACCACCA

Range 1: tyrV 1287244 to 1287328 N209

3′TAATTCACCACAGCGATGTGGTAACTTCGGTGAGATTGTGAAAGTTTGCACT

CATCGGGGAAGGGTGAGAACCTTCGACTAAGGTTCGATTCGAGCGAAAGCGA

GAGAACGTTGCCACAGGCAACGGCCCGAAGGGTGAGGAGCGAAGCGACGAA

TAATCCTTCCCCCACCACCATCTCAAGCATTACCTCAAAATTTGAATTACCCC

T-5′

Range 2: tyrT 1287538 to 1287622 N43

GCTTCCCGATAAGGGAGCAGGCCAGTAAAAAGCATTACCCCGT

Range 3: TyrU with thrU 4175472 to 4175556 N8

TCAAGTCC

UNIT: tyrTV-tpr

134 Table 1, Predicted secondary structures of the 14 polycistronic precursor tRNA leader sequences encoded in the E. coli genome

polycistr ΔG tRNA N# Leader Structure onic kcal/mol

39 -5.6

1.2

alaWX

22

1.9

135 valV

valV 20 -5.10

valV

valV

valW valVW

valW

4 UCCU valW

valW

136

43 -10.6

tyrTV

209 -69.4

137

-5.1

31

0 27

138

leu

QPV 33 -5.5

139

47 -8.6

glyVX

Y all only

-7.5 one 36 structur e

36 -6.9

140

glyW

63 -16.5 glyW

glyW

glyW cysT glyW- cysT cysT- 53 -20.7 leuZ cysT all only

one cysT

leuZ

leuZ 11 2.1

leuZ

leuZ

141

3 GGA

fmetZ- 33 -0.5 metW- metV- -0.5 rho

0.2 33

0.5

142

thrU

7 ACUUGAU thrU thrU-

tyrU- thrU glyT- thrT thrU

tyrU

tyrU 8 TCAAGTCC

tyrU

tyrU

143 glyT

glyT

glyT 116 -33

glyT

thrT

6 AGAUGU thrT

thrT

argX thrT

argX argX- AACGGCGCTAA

11 hisR- argX leuT- proM argX

144

hisR

hisR 58 - hisR 14.10

hisR

leuT

leuT 20 1.0 leuT

leuT

145

proM

42 -3.0 proM

proM

proM valUXY

valUXY

7

valUXY ACAGCGGA

valUXY

44

146

valU- valX- valY- lysV- rho

46

147 lysV

lysV

lysV 4 CUUC

lysV

serV

serV

serV 52 -14.9

serV

argVYZQ 3 UUU

argVYZQ

argVYZQ

argVYZQ

148

200 -84

serV- argV- argY- argZ- argQ

62 -11.2

149

200 -73.4

150

lysT

lysT

lysT -4.8 40

lysT

lysT- valT valT- lysW- valT valZ- 135 -41.7 lysY- valT

lysZ-

valT lysQ- lysW rho

lysW 2

CC lysW

lysW

151

valZ

valZ

149 -53.10 valZ

valZ lysY/Z/Q

3 CUC lysY/Z/Q

lysY/Z/Q

lysY/Z/Q 146 -42.9

152

132 -33.23

153

metT

metT

metT 34 6.6

metT

leuW

leuW

8 CTTTTTTT leuW

leuW

metT-

glnU/W leuW- glnU- glnU/W 22 glnW- 1.4 metU- glnU/W

glnU/W

154 glnV- glnX

33 -6.4

metU

metU 14 1.0

metU

metU

155 glnV/X

glnV/X

glnV/X

46 glnV/X -6.8

-5.6

36

156 Reference

1. Mattioli, C., Pianigiani, G. and Pagani, F. (2013) A competitive regulatory mechanism discriminates between juxtaposed splice sites and pri-miRNA structures. Nucleic Acids Res, 41, 8680-8691. 2. Kim, B., Jeong, K. and Kim, V.N. (2017) Genome-wide Mapping of DROSHA Cleavage Sites on Primary and Noncanonical Substrates. Mol Cell, 66, 258-269 e255. 3. Fernandez, N., Cordiner, R.A., Young, R.S., Hug, N., Macias, S. and Caceres, J.F. (2017) Genetic variation and RNA structure regulate microRNA biogenesis. Nat Commun, 8, 15114. 4. Buratti, E. and Baralle, F.E. (2004) Influence of RNA secondary structure on the pre-mRNA splicing process. Mol Cell Biol, 24, 10505-10514. 5. Fahlman, R.P., Dale, T. and Uhlenbeck, O.C. (2004) Uniform binding of aminoacylated transfer RNAs to the ribosomal A and P sites. Mol Cell, 16, 799- 805. 6. Welker, N.C., Maity, T.S., Ye, X., Aruscavage, P.J., Krauchuk, A.A., Liu, Q. and Bass, B.L. (2011) Dicerʹs domain discriminates dsRNA termini to promote an altered reaction mode. Mol Cell, 41, 589-599. 7. Bergeron, L., Jr., Perreault, J.P. and Abou Elela, S. (2010) Short RNA duplexes guide sequence-dependent cleavage by human Dicer. RNA, 16, 2464-2473. 8. Esakova, O. and Krasilnikov, A.S. (2010) Of proteins and RNA: the RNase P/MRP family. RNA, 16, 1725-1747. 9. Guerrier-Takada, C., Gardiner, K., Marsh, T., Pace, N. and Altman, S. (1983) The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme. Cell, 35, 849-857. 10. Reiter, N.J., Osterman, A., Torres-Larios, A., Swinger, K.K., Pan, T. and Mondragon, A. (2010) Structure of a bacterial ribonuclease P holoenzyme in complex with tRNA. Nature, 468, 784-789. 11. Sun, L., Campbell, F.E., Zahler, N.H. and Harris, M.E. (2006) Evidence that substrate-specific effects of C5 protein lead to uniformity in binding and catalysis by RNase P. EMBO J, 25, 3998-4007. 12. Koutmou, K.S., Zahler, N.H., Kurz, J.C., Campbell, F.E., Harris, M.E. and Fierke, C.A. (2010) Protein-precursor tRNA contact leads to sequence-specific recognition of 5ʹ leaders by bacterial ribonuclease P. J Mol Biol, 396, 195-208. 13. Sun, L., Campbell, F.E., Yandek, L.E. and Harris, M.E. (2010) Binding of C5 protein to P RNA enhances the rate constant for catalysis for P RNA processing of pre- tRNAs lacking a consensus (+ 1)/C(+ 72) pair. J Mol Biol, 395, 1019-1037. 14. Zahler, N.H., Christian, E.L. and Harris, M.E. (2003) Recognition of the 5ʹ leader of pre-tRNA substrates by the active site of ribonuclease P. RNA, 9, 734-745. 15. Christian, E.L., Zahler, N.H., Kaye, N.M. and Harris, M.E. (2002) Analysis of substrate recognition by the ribonucleoprotein endonuclease RNase P. Methods, 28, 307-322.

157 16. Yandek, L.E., Lin, H.C. and Harris, M.E. (2013) Alternative substrate kinetics of Escherichia coli ribonuclease P: determination of relative rate constants by internal competition. J Biol Chem, 288, 8342-8354. 17. Harris, M.E., Kazantsev, A.V., Chen, J.L. and Pace, N.R. (1997) Analysis of the tertiary structure of the ribonuclease P ribozyme-substrate complex by site- specific photoaffinity crosslinking. Rna, 3, 561-576. 18. Christian, E.L., McPheeters, D.S. and Harris, M.E. (1998) Identification of individual nucleotides in the bacterial ribonuclease P ribozyme adjacent to the pre-tRNA cleavage site by short-range photo-cross-linking. Biochemistry, 37, 17618-17628. 19. Sun, L. and Harris, M.E. (2007) Evidence that binding of C5 protein to P RNA enhances ribozyme catalysis by influencing active site metal ion affinity. RNA, 13, 1505-1515. 20. Chan, P.P. and Lowe, T.M. (2009) GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res, 37, D93-97. 21. Blattner, F.R., Plunkett, G., 3rd, Bloch, C.A., Perna, N.T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J.D., Rode, C.K., Mayhew, G.F. et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453-1462. 22. Mackie, G.A. and Genereaux, J.L. (1993) The role of RNA structure in determining RNase E-dependent cleavage sites in the mRNA for ribosomal protein S20 in vitro. J Mol Biol, 234, 998-1012. 23. Cormack, R.S. and Mackie, G.A. (1992) Structural requirements for the processing of Escherichia coli 5 S ribosomal RNA by RNase E in vitro. J Mol Biol, 228, 1078-1090. 24. Kaberdin, V.R. (2003) Probing the substrate specificity of Escherichia coli RNase E using a novel oligonucleotide-based assay. Nucleic Acids Res, 31, 4710-4716. 25. Mohanty, B.K. and Kushner, S.R. (2007) Ribonuclease P processes polycistronic tRNA transcripts in Escherichia coli independent of . Nucleic Acids Res, 35, 7614-7625. 26. Li, Z. and Deutscher, M.P. (1996) Maturation pathways for E. coli tRNA precursors: a random multienzyme process in vivo. Cell, 86, 503-512. 27. Gaur, R.K. and Krupp, G. (1993) Modification interference approach to detect ribose moieties important for the optimal activity of a ribozyme. Nucleic Acids Res, 21, 21-26. 28. Zahler, N.H., Sun, L., Christian, E.L. and Harris, M.E. (2005) The pre-tRNA nucleotide base and 2ʹ-hydroxyl at N(-1) contribute to fidelity in tRNA processing by RNase P. J Mol Biol, 345, 969-985. 29. Niranjanakumari, S., Stams, T., Crary, S.M., Christianson, D.W. and Fierke, C.A. (1998) Protein component of the ribozyme ribonuclease P alters substrate recognition by directly contacting precursor tRNA. Proc Natl Acad Sci U S A, 95, 15212-15217. 30. Lin, H.C., Zhao, J., Niland, C.N., Tran, B., Jankowsky, E. and Harris, M.E. (2016) Analysis of the RNA Binding Specificity Landscape of C5 Protein Reveals Structure

158 and Sequence Preferences that Direct RNase P Specificity. Cell chemical biology, 23, 1271-1281. 31. Guenther, U.P., Yandek, L.E., Niland, C.N., Campbell, F.E., Anderson, D., Anderson, V.E., Harris, M.E. and Jankowsky, E. (2013) Hidden specificity in an apparently nonspecific RNA-binding protein. Nature, 502, 385-388. 32. Niland, C.N., Zhao, J., Lin, H.C., Anderson, D.R., Jankowsky, E. and Harris, M.E. (2016) Determination of the Specificity Landscape for Ribonuclease P Processing of Precursor tRNA 5ʹ Leader Sequences. ACS Chem Biol, 11, 2285-2292. 33. Agrawal, A., Mohanty, B.K. and Kushner, S.R. (2014) Processing of the seven valine tRNAs in Escherichia coli involves novel features of RNase P. Nucleic Acids Res, 42, 11166-11179. 34. Hsieh, J. and Fierke, C.A. (2009) Conformational change in the Bacillus subtilis RNase P holoenzyme--pre-tRNA complex enhances substrate affinity and limits cleavage rate. RNA, 15, 1565-1577. 35. Kramer, E.B. and Farabaugh, P.J. (2007) The frequency of translational misreading errors in E. coli is largely determined by tRNA competition. RNA, 13, 87-96. 36. Svenningsen, S.L., Kongstad, M., Stenum, T.S., Munoz-Gomez, A.J. and Sorensen, M.A. (2017) Transfer RNA is highly unstable during early amino acid starvation in Escherichia coli. Nucleic Acids Res, 45, 793-804. 37. Li, Z. and Deutscher, M.P. (2002) RNase E plays an essential role in the maturation of Escherichia coli tRNA precursors. RNA, 8, 97-109. 38. Mohanty, B.K. and Kushner, S.R. (2010) Processing of the Escherichia coli leuX tRNA transcript, encoding tRNA(Leu5), requires either the 3ʹ-->5ʹ polynucleotide or RNase P to remove the Rho- independent transcription terminator. Nucleic Acids Res, 38, 597-607. 39. Ziehler, W.A., Day, J.J., Fierke, C.A. and Engelke, D.R. (2000) Effects of 5ʹ leader and 3ʹ trailer structures on pre-tRNA processing by nuclear RNase P. Biochemistry, 39, 9909-9916. 40. Gossringer, M., Helmecke, D. and Hartmann, R.K. (2012) Characterization of RNase P RNA activity. Methods Mol Biol, 848, 61-72. 41. Mohanty, B.K. and Kushner, S.R. (2008) Rho-independent transcription terminators inhibit RNase P processing of the secG leuU and metT tRNA polycistronic transcripts in Escherichia coli. Nucleic Acids Res, 36, 364-375. 42. Shepherd, J. and Ibba, M. (2015) Bacterial transfer RNAs. FEMS microbiology reviews, 39, 280-300. 43. Klemm, B.P., Wu, N., Chen, Y., Liu, X., Kaitany, K.J., Howard, M.J. and Fierke, C.A. (2016) The Diversity of Ribonuclease P: Protein and RNA Catalysts with Analogous Biological Functions. Biomolecules, 6. 44. Howard, M.J., Liu, X., Lim, W.H., Klemm, B.P., Fierke, C.A., Koutmos, M. and Engelke, D.R. (2013) RNase P enzymes: divergent scaffolds for a conserved biological reaction. RNA Biol, 10, 909-914. 45. Evans, D., Marquez, S.M. and Pace, N.R. (2006) RNase P: interface of the RNA and protein worlds. Trends Biochem Sci, 31, 333-341.

159 46. Jarrous, N. and Gopalan, V. (2010) Archaeal/eukaryal RNase P: subunits, functions and RNA diversification. Nucleic Acids Res, 38, 7885-7894. 47. Tsai, H.Y., Masquida, B., Biswas, R., Westhof, E. and Gopalan, V. (2003) Molecular modeling of the three-dimensional structure of the bacterial RNase P holoenzyme. J Mol Biol, 325, 661-675. 48. Frank, D.N. and Pace, N.R. (1998) Ribonuclease P: unity and diversity in a tRNA processing ribozyme. Annu Rev Biochem, 67, 153-180. 49. Harris, M.E., Nolan, J.M., Malhotra, A., Brown, J.W., Harvey, S.C. and Pace, N.R. (1994) Use of photoaffinity crosslinking and molecular modeling to analyze the global architecture of ribonuclease P RNA. EMBO Journal, 13, 3953-3963. 50. Kurz, J.C., Niranjanakumari, S. and Fierke, C.A. (1998) Protein component of Bacillus subtilis RNase P specifically enhances the affinity for precursor-tRNAAsp. Biochemistry, 37, 2393-2400. 51. Kurz, J.C. and Fierke, C.A. (2002) The affinity of magnesium binding sites in the Bacillus subtilis RNase P x pre-tRNA complex is enhanced by the protein subunit. Biochemistry, 41, 9545-9558. 52. Koutmou, K.S., Day-Storms, J.J. and Fierke, C.A. (2011) The RNR motif of B. subtilis RNase P protein interacts with both PRNA and pre-tRNA to stabilize an active conformer. RNA, 17, 1225-1235. 53. Buck, A.H., Kazantsev, A.V., Dalby, A.B. and Pace, N.R. (2005) Structural perspective on the activation of RNAse P RNA by protein. Nat Struct Mol Biol, 12, 958-964. 54. Buck, A.H., Dalby, A.B., Poole, A.W., Kazantsev, A.V. and Pace, N.R. (2005) Protein activation of a ribozyme: the role of bacterial RNase P protein. Embo J, 24, 3360-3368. 55. Lechner, M., Rossmanith, W., Hartmann, R.K., Tholken, C., Gutmann, B., Giege, P. and Gobert, A. (2015) Distribution of Ribonucleoprotein and Protein-Only RNase P in Eukarya. and evolution, 32, 3186-3193. 56. Pan, T., Loria, A. and Zhong, K. (1995) Probing of tertiary interactions in RNA: 2'- hydroxyl-base contacts between the RNase P RNA and pre-tRNA. Proc Natl Acad Sci U S A, 92, 12510-12514. 57. Suwa, S., Nagai, Y., Fujimoto, A., Kikuchi, Y. and Tanaka, T. (2009) Analysis on substrate specificity of Escherichia coli ribonuclease P using shape variants of pre-tRNA: proposal of subsites model for substrate shape recognition. J Biochem, 145, 151-160. 58. Loria, A. and Pan, T. (2000) The 3' substrate determinants for the catalytic efficiency of the Bacillus subtilis RNase P holoenzyme suggest autolytic processing of the RNase P RNA in vivo. Rna, 6, 1413-1422. 59. Wu, S., Chen, Y., Lindell, M., Mao, G. and Kirsebom, L.A. (2011) Functional coupling between a distal interaction and the cleavage site in bacterial RNase-P- RNA-mediated cleavage. J Mol Biol, 411, 384-396. 60. Altuvia, Y., Bar, A., Reiss, N., Karavani, E., Argaman, L. and Margalit, H. (2018) In vivo cleavage rules and target repertoire of RNase III in Escherichia coli. Nucleic Acids Research, gky684-gky684.

160 61. Deutscher, M.P. (2009) Maturation and degradation of ribosomal RNA in bacteria. Progress in molecular biology and translational science, 85, 369-391. 62. Nicholson, A.W. (2014) Ribonuclease III mechanisms of double-stranded RNA cleavage. Wiley Interdiscip Rev RNA, 5, 31-48. 63. Mackie, G.A. (2013) RNase E: at the interface of bacterial RNA processing and decay. Nature reviews. Microbiology, 11, 45-57. 64. Deutscher, M.P. (2009), Progress in molecular biology and translational science. Academic Press, Vol. 85, pp. 369-391. 65. Taliaferro, J.M., Lambert, N.J., Sudmant, P.H., Dominguez, D., Merkin, J.J., Alexis, M.S., Bazile, C. and Burge, C.B. (2016) RNA Sequence Context Effects Measured In Vitro Predict In Vivo Protein Binding and Regulation. Mol Cell, 64, 294-306. 66. Fu, X.D. and Ares, M., Jr. (2014) Context-dependent control of alternative splicing by RNA-binding proteins. Nat Rev Genet, 15, 689-701. 67. Pickering, B.M. and Willis, A.E. (2005) The implications of structured 5′ untranslated regions on translation and disease. Seminars in cell & developmental biology, 16, 39-47. 68. Lin, C.L., Taggart, A.J. and Fairbrother, W.G. (2016) RNA structure in splicing: An evolutionary perspective. RNA Biol, 13, 766-771. 69. Barash, Y., Calarco, J.A., Gao, W., Pan, Q., Wang, X., Shai, O., Blencowe, B.J. and Frey, B.J. (2010) Deciphering the splicing code. Nature, 465, 53-59. 70. Jankowsky, E. and Harris, M.E. (2015) Specificity and nonspecificity in RNA- protein interactions. Nat Rev Mol Cell Biol, 16, 533-544. 71. Manning, K.S. and Cooper, T.A. (2017) The roles of RNA processing in translating genotype to phenotype. Nature reviews. Molecular , 18, 102-114. 72. Zarnack, K., Konig, J., Tajnik, M., Martincorena, I., Eustermann, S., Stevant, I., Reyes, A., Anders, S., Luscombe, N.M. and Ule, J. (2013) Direct competition between hnRNP C and U2AF65 protects the transcriptome from the exonization of Alu elements. Cell, 152, 453-466. 73. Sellier, C., Cerro-Herreros, E., Blatter, M., Freyermuth, F., Gaucherot, A., Ruffenach, F., Sarkar, P., Puymirat, J., Udd, B., Day, J.W. et al. (2018) rbFOX1/MBNL1 competition for CCUG RNA repeats binding contributes to myotonic dystrophy type 1/type 2 differences. Nature communications, 9, 2009. 74. Nazim, M., Masuda, A., Rahman, M.A., Nasrin, F., Takeda, J.-i., Ohe, K., Ohkawara, B., Ito, M. and Ohno, K. (2017) Competitive regulation of alternative splicing and alternative polyadenylation by hnRNP H and CstF64 determines isoforms. Nucleic Acids Research, 45, 1455-1468. 75. Dassi, E. (2017) Handshakes and Fights: The Regulatory Interplay of RNA-Binding Proteins. Frontiers in Molecular Biosciences, 4, 67. 76. Yu, T.X., Rao, J.N., Zou, T., Liu, L., Xiao, L., Ouyang, M., Cao, S., Gorospe, M. and Wang, J.Y. (2013) Competitive binding of CUGBP1 and HuR to occludin mRNA controls its translation and modulates epithelial barrier function. Molecular biology of the cell, 24, 85-99. 77. Elkon, R., Ugalde, A.P. and Agami, R. (2013) Alternative cleavage and polyadenylation: extent, regulation and function. Nat Rev Genet, 14, 496-506.

161 78. Warf, M.B., Diegel, J.V., von Hippel, P.H. and Berglund, J.A. (2009) The protein factors MBNL1 and U2AF65 bind alternative RNA structures to regulate splicing. Proceedings of the National Academy of Sciences, 106, 9203-9208. 79. Foley, S.W., Kramer, M.C. and Gregory, B.D. (2017) RNA structure, binding, and coordination in Arabidopsis. Wiley Interdiscip Rev RNA, 8. 80. Liu, N., Zhou, K.I., Parisien, M., Dai, Q., Diatchenko, L. and Pan, T. (2017) N6- methyladenosine alters RNA structure to regulate binding of a low-complexity protein. Nucleic Acids Research, 45, 6051-6063. 81. Huang, H., Zhang, J., Harvey, S.E., Hu, X. and Cheng, C. (2017) RNA G-quadruplex secondary structure promotes alternative splicing via the RNA-binding protein hnRNPF. Genes & development, 31, 2296-2309. 82. Dong, H., Nilsson, L. and Kurland, C.G. (1996) Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates. J Mol Biol, 260, 649-663. 83. Sørensen, M.A., Fehler, A.O. and Lo Svenningsen, S. (2017) Transfer RNA instability as a stress response in Escherichia coli: Rapid dynamics of the tRNA pool as a function of demand. RNA Biology, 1-8. 84. Dittmar, K.A., Goodenbour, J.M. and Pan, T. (2006) Tissue-specific differences in human transfer RNA expression. PLoS genetics, 2, e221. 85. Dittmar, K.A., Mobley, E.M., Radek, A.J. and Pan, T. (2004) Exploring the regulation of tRNA distribution on the genomic scale. J Mol Biol, 337, 31-47. 86. Regulski, E.E. and Breaker, R.R. (2008) In-line probing analysis of riboswitches. Methods Mol Biol, 419, 53-67. 87. Fersht, A. (1998) Structure and Mechanism in Protein Science. W.H. Freemand & Co. 88. Milligan, J.F., Groebe, D.R., Witherell, G.W. and Uhlenbeck, O.C. (1987) Oligoribonucleotide synthesis using T7 RNA polymerase and synthetic DNA templates. Nucleic Acids Res, 15, 8783-8798. 89. Milligan, J.F. and Uhlenbeck, O.C. (1989) Synthesis of small RNAs using T7 RNA polymerase. Methods Enzymol, 180, 51-62. 90. Rose, I.A., O'Connell, E.L. and Litwin, S. (1974) Determination of the rate of -glucose dissociation by the isotope-trapping method. J Biol Chem, 249, 5163-5168. 91. Helder, S., Blythe, A.J., Bond, C.S. and Mackay, J.P. (2016) Determinants of affinity and specificity in RNA-binding proteins. Curr Opin Struct Biol, 38, 83-91. 92. Duss, O., Michel, E., Yulikov, M., Schubert, M., Jeschke, G. and Allain, F.H.T. (2014) Structural basis of the non-coding RNA RsmZ acting as a protein sponge. Nature, 509, 588. 93. Chao, Y., Li, L., Girodat, D., Forstner, K.U., Said, N., Corcoran, C., Smiga, M., Papenfort, K., Reinhardt, R., Wieden, H.J. et al. (2017) In Vivo Cleavage Map Illuminates the Central Role of RNase E in Coding and Non-coding RNA Pathways. Mol Cell, 65, 39-51.

162 94. Lim, B., Sim, M., Lee, H., Hyun, S., Lee, Y., Hahn, Y., Shin, E. and Lee, K. (2015) Regulation of Escherichia coli RNase III activity. Journal of microbiology (Seoul, Korea), 53, 487-494. 95. Lalaouna, D., Simoneau-Roy, M., Lafontaine, D. and Masse, E. (2013) Regulatory RNAs and target mRNA decay in prokaryotes. Biochim Biophys Acta, 1829, 742- 747. 96. Matsunaga, J., Simons, E.L. and Simons, R.W. (1996) RNase III autoregulation: structure and function of rncO, the posttranscriptional "operator". Rna, 2, 1228- 1240. 97. Pertzev, A.V. and Nicholson, A.W. (2006) Characterization of RNA sequence determinants and antideterminants of processing reactivity for a minimal substrate of Escherichia coli ribonuclease III. Nucleic Acids Res, 34, 3708-3721. 98. Chelladurai, B., Li, H., Zhang, K. and Nicholson, A.W. (1993) Mutational analysis of a ribonuclease III processing signal. Biochemistry, 32, 7549-7558. 99. Altman, S., Wesolowski, D., Guerrier-Takada, C. and Li, Y. (2005) RNase P cleaves transient structures in some riboswitches. Proceedings of the National Academy of Sciences of the United States of America, 102, 11284-11289. 100. Hartmann, R.K., Heinrich, J., Schlegl, J. and Schuster, H. (1995) Precursor of C4 antisense RNA of bacteriophages P1 and P7 is a substrate for RNase P of Escherichia coli. Proc Natl Acad Sci U S A, 92, 5822-5826. 101. Li, Y. and Altman, S. (1996) Cleavage by RNase P of gene N mRNA reduces bacteriophage lambda burst size. Nucleic Acids Res, 24, 835-842. 102. Li, Y. and Altman, S. (2003) A specific endoribonuclease, RNase P, affects gene expression of polycistronic operon mRNAs. Proceedings of the National Academy of Sciences, 100, 13213-13218. 103. Waugh, D.S. and Pace, N.R. (1990) Complementation of an RNase P RNA (rnpB) gene deletion in Escherichia coli by homologous genes from distantly related eubacteria. J Bacteriol, 172, 6316-6322. 104. Wegscheid, B., Condon, C. and Hartmann, R.K. (2006) Type A and B RNase P RNAs are interchangeable in vivo despite substantial biophysical differences. EMBO reports, 7, 411-417. 105. Gossringer, M., Lechner, M., Brillante, N., Weber, C., Rossmanith, W. and Hartmann, R.K. (2017) Protein-only RNase P function in Escherichia coli: viability, processing defects and differences between PRORP isoenzymes. Nucleic Acids Res, 45, 7441-7454. 106. Taschner, A., Weber, C., Buzet, A., Hartmann, R.K., Hartig, A. and Rossmanith, W. (2012) Nuclear RNase P of Trypanosoma brucei: a single protein in place of the multicomponent RNA-protein complex. Cell reports, 2, 19-25. 107. Weber, C., Hartig, A., Hartmann, R.K. and Rossmanith, W. (2014) Playing RNase P Evolution: Swapping the RNA Catalyst for a Protein Reveals Functional Uniformity of Highly Divergent Enzyme Forms. PLoS genetics, 10, e1004506. 108. Anderson, V.E. (2015) Multiple alternative substrate kinetics. Biochim Biophys Acta, 1854, 1729-1736.

163 109. Kuo, Y.M., Henry, R.A. and Andrews, A.J. (2016) Measuring specificity in multi- substrate/product systems as a tool to investigate selectivity in vivo. Biochim Biophys Acta, 1864, 70-76. 110. Schellenberger, V., Siegel, R.A. and Rutter, W.J. (1993) Analysis of enzyme specificity by multiple substrate kinetics. Biochemistry, 32, 4344-4348. 111. Deng, Z., Mao, J., Wang, Y., Zou, H. and Ye, M. (2017) Enzyme Kinetics for Complex System Enables Accurate Determination of Specificity Constants of Numerous Substrates in a Mixture by Proteomics Platform. Molecular & cellular proteomics : MCP, 16, 135-145. 112. Kolomeisky, A.B. (2011) Physics of protein-DNA interactions: mechanisms of facilitated target search. Phys Chem Chem Phys, 13, 2088-2095. 113. Siggers, T. and Gordan, R. (2014) Protein-DNA binding: complexities and multi- protein codes. Nucleic Acids Res, 42, 2099-2111. 114. von Hippel, P.H. and Berg, O.G. (1989) Facilitated target location in biological systems. J Biol Chem, 264, 675-678. 115. Korennykh, A.V., Piccirilli, J.A. and Correll, C.C. (2006) The electrostatic character of the ribosomal surface enables extraordinarily rapid target location by ribotoxins. Nat Struct Mol Biol, 13, 436-443. 116. Plantinga, M.J., Korennykh, A.V., Piccirilli, J.A. and Correll, C.C. (2008) Electrostatic interactions guide the active site face of a structure-specific ribonuclease to its RNA substrate. Biochemistry, 47, 8912-8918. 117. Niranjanakumari, S., Day-Storms, J.J., Ahmed, M., Hsieh, J., Zahler, N.H., Venters, R.A. and Fierke, C.A. (2007) Probing the architecture of the B. subtilis RNase P holoenzyme active site by cross-linking and affinity cleavage. Rna, 13, 521-535. 118. Fredrik Pettersson, B.M., Ardell, D.H. and Kirsebom, L.A. (2005) The length of the 5' leader of Escherichia coli tRNA precursors influences bacterial growth. J Mol Biol, 351, 9-15. 119. Brannvall, M., Kikovska, E. and Kirsebom, L.A. (2004) Cross talk between the +73/294 interaction and the cleavage site in RNase P RNA mediated cleavage. Nucleic Acids Res, 32, 5418-5429. 120. Kikovska, E., Brannvall, M. and Kirsebom, L.A. (2006) The exocyclic amine at the RNase P cleavage site contributes to substrate binding and catalysis. J Mol Biol, 359, 572-584. 121. Brannvall, M., Pettersson, B.M. and Kirsebom, L.A. (2003) Importance of the +73/294 interaction in Escherichia coli RNase P RNA substrate complexes for cleavage and metal ion coordination. J Mol Biol, 325, 697-709. 122. Busch, S., Kirsebom, L.A., Notbohm, H. and Hartmann, R.K. (2000) Differential role of the intermolecular base-pairs G292-C(75) and G293-C(74) in the reaction catalyzed by Escherichia coli RNase P RNA. J Mol Biol, 299, 941-951. 123. Kirsebom, L.A. and Svard, S.G. (1994) Base pairing between Escherichia coli RNase P RNA and its substrate. Embo j, 13, 4870-4876. 124. Altman, S., Baer, M., Gold, H., Guerrier-Takada, C., Kirsebom, L., Lawrence, N., Lumelsky, N. and Vioque, A. (1987) Cleavage of RNA by RNase P from Escherichia coli.

164 125. Schwartz, M.H., Wang, H., Pan, J.N., Clark, W.C., Cui, S., Eckwahl, M.J., Pan, D.W., Parisien, M., Owens, S.M., Cheng, B.L. et al. (2018) Microbiome characterization by high-throughput transfer RNA sequencing and modification analysis. Nat Commun, 9, 5353. 126. Alexandre, M., Céline, G., Eliane, H. and Philippe, R. (2012) Search for poly(A) polymerase targets in E. coli reveals its implication in surveillance of Glu tRNA processing and degradation of stable RNAs. Molecular microbiology, 83, 436-451. 127. Taghbalout, A., Yang, Q. and Arluison, V. (2014) The Escherichia coli RNA processing and degradation machinery is compartmentalized within an organized cellular network. Biochemical Journal, 458, 11-22. 128. Guo, X., Campbell, F.E., Sun, L., Christian, E.L., Anderson, V.E. and Harris, M.E. (2006) RNA-dependent folding and stabilization of C5 protein during assembly of the E. coli RNase P holoenzyme. J Mol Biol, 360, 190-203. 129. Huang, C. and Yu, Y.T. (2013) Synthesis and labeling of RNA in vitro. Curr Protoc Mol Biol, Chapter 4, Unit4 15. 130. Jankowsky, E. and Harris, M.E. (2017) Mapping specificity landscapes of RNA- protein interactions by high throughput sequencing. Methods, 118-119, 111-118. 131. Kurz, J.C. and Fierke, C.A. (2000) Ribonuclease P: a ribonucleoprotein enzyme. Curr Opin Chem Biol, 4, 553-558. 132. Rueda, D., Hsieh, J., Day-Storms, J.J., Fierke, C.A. and Walter, N.G. (2005) The 5' leader of precursor tRNAAsp bound to the Bacillus subtilis RNase P holoenzyme has an extended conformation. Biochemistry, 44, 16130-16139. 133. Zuker, M. (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res, 31, 3406-3415. 134. LaGrandeur, T.E., Huttenhofer, A., Noller, H.F. and Pace, N.R. (1994) Phylogenetic comparative chemical footprint analysis of the interaction between ribonuclease P RNA and tRNA. EMBO J, 13, 3945-3952. 135. Stormo, G.D. (2013) Modeling the specificity of protein-DNA interactions. Quant Biol, 1, 115-130. 136. Tan, C. and Takada, S. (2018) Dynamic and Structural Modeling of the Specificity in Protein-DNA Interactions Guided by Binding Assay and Structure Data. J Chem Theory Comput, 14, 3877-3889. 137. Forsdyke, D.R. (2007) Calculation of folding energies of single-stranded nucleic acid sequences: conceptual issues. J Theor Biol, 248, 745-753. 138. Singer, B. and Hang, B. (2000) Nucleic acid sequence and repair: role of adduct, neighbor bases and enzyme specificity. Carcinogenesis, 21, 1071-1078.

165