SPECIFICITY LANDSCAPE OF RIBONUCLEASE P PROCESSING OF PRE-tRNA SUBSTRATES BY HIGH-THROUGHPUT ENZYMOLOGY

by

COURTNEY NICOLE NILAND

Submitted in partial fulfillment of the requirements

For the degree of Doctor of Philosophy

Dissertation Advisor: Dr. Michael E. Harris

Department of Biochemistry

CASE WESTERN RESERVE UNIVERSITY

January 2017

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of

Courtney Nicole Niland

candidate for the degree of Doctor of Philosophy.

Committee Chair

Hung-Ying Kao, Ph.D.

Committee Member

Michael E. Harris, Ph.D.

Committee Member

Eckhard Jankowsky, Ph.D.

Committee Member

Blanton Tolbert, Ph.D.

Committee Member

Michael A. Weiss, Ph.D.

Date of Defense

October 25, 2016

*We also certify that written approval has been obtained for any proprietary material contained therein.

Dedication

Firstly, I dedicate this body of work to my wonderful parents, Timothy and Jeannie Niland, because without their love and support it would not exist. From an early age, they instilled in me a strong moral compass and a high value for education. I saw the value of hard work manifested in their own actions as I grew up, and it was through their encouragement that I went to college and, subsequently, graduate school. They have been my rock throughout this sometimes extremely challenging endeavor. They have shared in my triumphs and success and comforted me in times of worry and struggle. They have been my biggest cheerleaders, closest confidants, trusted counselors, and above all my best friends. The accomplishment of a PhD is as much theirs as mine.

Secondly, I dedicate this to my amazing friends and family who have showered me with their pride and enthusiasm throughout this time. Everyone from aunts, uncles, cousins, grandparents, family friends, and others to my friends both in Cleveland and abroad who have become a second family. To my lab family for fun outings, emotional support, and constant team enthusiasm. To my great mentor, Dr. Michael Harris for his wonderful mentorship and kind patience and understanding throughout my graduate schooling, even before I became a member of the lab. I could ask for no better lab experience in graduate school and will look back on my time here with fond memories as I move on to future careers.

Table of Contents

List of Figures ...... 3
Acknowledgements ...... 5
Specificity Landscape of Ribonuclease P Processing of Pre-tRNA Substrates by High-Throughput Enzymology ...... 6
Abstract ...... 6
Chapter I: Introduction ...... 8
Introduction ...... 9
Ribonuclease P ...... 13
RNase P:pre-tRNA Interactions ...... 16
RNase P Mechanism ...... 18
Measuring the True Specificity of an Enzyme ...... 23
High-Throughput Enzymology to Measure RNase P Specificity ...... 24
Discussion ...... 25
Chapter 2: Determination of the Specificity Landscape for Ribonuclease P Processing of Precursor tRNA 5' Leader Sequences ...... 27
Abstract ...... 28
Introduction ...... 29
Results and Discussion ...... 34
RNase P Processing of Pre-tRNA Substrate Pools ...... 34
Modeling of RNase P Substrate Specificity ...... 36
Specificity Landscape of Substrate Processing ...... 40
Methods ...... 48
Multiple-Turnover and Single-Turnover Reactions ...... 48
High-Throughput Sequencing Kinetics (HTS-Kin) ...... 49
Quantitative Modeling of Sequence Specificity ...... 50
Supplementary Methods and Data ...... 51
Preparation and Purification of and RNA ...... 51
Multiple turnover and single turnover reaction kinetics ...... 54
High-Throughput Sequencing Kinetics (HTS-Kin) ...... 56
Quantitative modeling of sequence specificity ...... 62
Chapter 3: Optimization of high-throughput sequencing kinetics for determining enzymatic rate constants of thousands of RNA substrates ...... 71
Abstract ...... 72
Introduction ...... 73
Results and Discussion ...... 77
Determination of relative rate constants for in vitro RNA processing reactions by HTS-Kin ...... 77
The magnitude of krel is independent of the distribution of substrate mole fractions in the initial precursor RNA population ...... 83
Optimization of reaction kinetics and choice of internal reference for calculation of krel ...... 86
Reliability of first-strand cDNA synthesis and quantitative PCR for Illumina library preparation ...... 90
Robustness of Illumina sequencing for reproducible determination of krel values ...... 95
Evaluation of experimental error ...... 96
Chapter 4: The contribution of the C5 protein subunit of Escherichia coli ribonuclease P to specificity for precursor tRNA is modulated by proximal 5’ leader sequences ...... 104
Abstract ...... 105
Introduction ...... 107
Results ...... 114
Discussion ...... 130
RNA and Protein Preparation ...... 134
RNase P Reactions ...... 134
High-Throughput Sequencing Kinetics ...... 135
RNA Sequence Specificity Modeling ...... 136
Supplementary Data ...... 137
Chapter 5: Summary and Future Directions ...... 141
Summary ...... 142
Identification of Rate Limiting Step in RNase P Processing of 5’ Leader Variants in Pre-tRNA by HTS-Kin ...... 143
Investigation of the Sources of Error in HTS-Kin and Optimal Conditions for Quantitative Analysis ...... 144
Observation of Energetic Coupling in RNase P Specificity for Pre-tRNA Substrates ...... 146
Discussion ...... 148
Future Directions ...... 149
Bibliography ...... 151

List of Figures

Figure 1-1: Complexity of RNA protein interactions in a strongly simplified model system…………………………………………………………………………………...12

Figure 1-2: Diagram of the three-dimensional structure of Thermotoga maritima P RNA bound to pre-tRNA………...... ……………………………………14

Figure 1-3: The X-ray crystal structure of RNase P-product complex from T. maritima at 4.2Å………………………………...……………………………………………19

Figure 2-1: Recognition of the pre-tRNA 5′ leader sequence by the RNA and protein subunits of ribonuclease P………………………………………………………..…..30

Figure 2-2: Determination of the effects of all possible pre-tRNA 5′ leader sequence variants on the relative kcat/Km for RNase P processing………………………....35

Figure 2-3: Quantitative modeling of 5′ leader sequence specificity of RNase P……..38

Figure 2-4: Effect of sequence variation in the 5′ leader is primarily on substrate affinity and not on catalysis…………………………………………………………………….42

Figure 2-5: Reaction free-energy profile for RNase P processing of pre-tRNAMet82 and effects of sequence variation defining the specificity landscape...... …44

Figure 2-S1: Analysis of raw HTS-Kin datasets obtained with the 21A or 21B terminal 5’ sequence………………………………….………………………………………..65

Figure 2-S2: Quantitative analysis of single turnover HTS-Kin datasets performed with the 21A terminal 5’ sequence…………………………………………………………….67

Figure 2-S3: Single substrate reactions using pre-tRNAMet with varying 5’ leader sequences……………………………………………………………………………….68

Figure 2-S4: The 5’ leader sequences from endogenous pre-tRNA from E. coli show variation in specificity………….………………………………………………………...70

Figure 3-1: High-throughput sequencing kinetics (HTS-Kin) measures processing rates of thousands of RNA substrates using internal competition kinetics…...78

Figure 3-2: Analysis of the dependence of the observed krel on the distribution of substrate mole fractions in the initial precursor RNA population………………...………84

Figure 3-3: Optimization of reaction kinetics and choice of internal reference for calculation of krel…….……………………..………………………………………………..88

Figure 3-4: Analysis of the reliability of first-strand cDNA synthesis and quantitative PCR for Illumina library preparation………………………………...…………..93

Figure 3-5: Illumina sequencing errors contribute to imprecision in measurement of low krel values……………………….………………….………………………………….....97

Figure 3-6: HTS-Kin replicates show reproducible determination of substrate variant krel except for slowest reacting substrates…………………………………………...99

Figure 4-1: Bacterial RNase P is an essential ribonucleoprotein RNA processing enzyme…………………………………………………………………………...109

Figure 4-2: Sequence determinants for holoenzyme and reactions are unique and reveal changes in specificity……………………………..…………………116

Figure 4-3: Plotting the coupling coefficients from the PWM+coupling coefficient model identifies key altered energetic in holoenzyme and ribozyme reactions…...... ….119

Figure 4-4: Variation of in the 5’ leader of pre-tRNA contacting the P RNA subunit of RNase P alter the energetic contribution of nucleotides contacting the C5 protein ………………………………………………………………...……………….123

Figure 4-5: Additional secondary structure in the acceptor stem of the pre-tRNAMet formed between 5’ leader nucleotides and the 3’ACCA slows processing rate constants…………………………………………………………………….125

Figure 4-6: Coupling between nucleotides in the RNA and protein binding sites in the 5’ leader of pre-tRNA is a key component of RNase P specificity...... 128

Figure 4-7: A mechanistic model for modulation of interdependence of proximal and distal leader sequence specificities …………….………………………………………132

Figure 4-S1: Schematic of the HTS-Kin procedure………………………………………137

Figure 4-S2: Dividing the holoenzyme HTS-Kin data into subsets reveals extensive coupling between the sequence identity in the RNA binding site on sequence specificity of the protein binding site in the 5’ leader of pre-tRNA……………..138

Figure 4-S3: Position weight matrix values of holoenzyme HTS-Kin data subsets reveals changes in sequence specificity of 5’ leader nucleotides in the protein binding site……………………………………………………………………………………..140


Acknowledgements

For helpful discussions on the manuscripts presented in this thesis, including critical reading and discussion of results, I thank members of the Harris lab including Dr. Michael Harris, Andrew Knappenberger, Jing Zhao, Hsuan-Chun Lin, Edward Ollie, and Dr. Daniel Kellerman. I would like to especially point out the contribution of Jing Zhao to the data in the second chapter herein in which she performed all studies using the transversed substrate population (21B) in Figure 2-3. In addition, the essential work in Figure 2-2 using the pre-tRNA population randomized at N(-8 to -3) was the result of experiments by Dr. Ulf-Peter Gunther and Dr. Frank Campbell. I would also like to thank the members of my thesis committee for their support and guidance over these years: Dr. Hung-Ying Kao, Dr. Michael Harris, Dr. Eckhard Jankowsky, Dr. Blanton Tolbert, and Dr. Michael Weiss. Finally, I would like to thank members of the Biochemistry Department at Case Western Reserve University for providing a collegial environment in which to perform this exciting work and for helpful feedback at annual seminars.

Specificity Landscape of Ribonuclease P Processing of Pre-tRNA Substrates by High-Throughput Enzymology

Abstract

by

COURTNEY NICOLE NILAND

To fully understand the roles of RNA processing enzymes in cellular processes and human health, it is essential to dissect their substrate specificity. Using Ribonuclease P (RNase P) as a model system, this body of work seeks to better understand multiple substrate recognition by RNA processing enzymes. The ubiquitous endonuclease RNase P removes the 5’ leader from all pre-tRNAs in the cell. To understand how variation in the 5’ leader is accommodated by the active site of RNase P, we comprehensively determined the processing rates of pre-tRNA substrates with all possible sequence combinations in the 5’ leader. This quantification involved Illumina® sequencing of the residual substrate population at different reaction times to monitor substrate depletion and calculate relative rate constants, a technique termed HTS-Kin. Additionally, the 5’ leader of pre-tRNA is recognized by both the catalytic RNA subunit (P RNA) and the smaller protein subunit of RNase P, C5. We therefore hypothesized that variation in substrate sequence contacting one enzyme subunit may alter the recognition or energetic contribution of contacts to the other enzyme subunit. Upon comprehensive determination of RNase P specificity for 5’ leader sequences, we have determined that this enzyme is tuned for specificity at association, as the catalytic rate constant is unaffected by substrate variation. We have also performed mechanistic analysis to identify the key sources of error in the HTS-Kin technique: experimental error and Illumina sequencing error. Finally, we ascertained that the sequence identity of 5’ leader nucleotides contacting P RNA does not alter C5 protein specificity but rather modulates its energetic contribution to the processing rate.


Chapter I: Introduction


Introduction

Recent decades have opened a wealth of knowledge and excitement in the field of RNA biology and biochemistry. From the discovery of catalytic RNA, the spliceosome, and RNA editing decades ago to the recent discovery of microRNAs, long non-coding RNAs, and CRISPR RNAs, the field of RNA biochemistry has flourished with discoveries. Not only did these discoveries overturn the dogma that RNA's sole purpose is to encode protein through translation, they also ushered in a profound recognition of the role of RNA in regulation. Through careful experimentation by brilliant scientists, these intriguing discoveries are now transforming our view of human health and disease. For example, defects in function of many of these RNAs are associated with various forms of cancer and neurological disorders (1,2). Additional evidence for the importance of these RNAs in biology is provided by studies that implicate defects in RNA-binding proteins in various diseases (3-6). The association of these RNAs with many disease states calls for a mechanistic explanation of their function, particularly with regard to substrate recognition and structure-function relationships.

The gateway to the RNA world was opened with the discovery that RNA could act as a catalyst for biological reactions in the cell, much like its protein counterparts. The first demonstration of the catalytic power of RNA for a substrate in trans was shown for the enzyme RNase P in the lab of Dr. Sidney Altman in 1983 (16,17). Along with the discovery of catalytic RNAs in the form of the hammerhead (7), hairpin (8,9), Varkud satellite (10), and hepatitis delta virus (11) ribozymes, larger and more complex ribozymes were discovered, including group I and II self-splicing introns (12-15). These large ribozymes contain distinct domains and utilize nucleophiles in the form of metal ions or water molecules to catalyze hydrolysis or phosphoryl transfer reactions (18).

The myriad functions performed by RNA in the cell necessitate its formation of complex three-dimensional structures to maintain its biological roles. The making or breaking of interactions can result in regulation of structures in the RNA molecule that direct its function; such is the case for riboswitches, RNA thermometers, and small RNAs that regulate bacterial gene expression (19-21). This complexity translates from these small RNAs to motifs of the large RNA complexes described above. A number of crystal structures have been determined for these large and small RNAs that shed light on the intricacy of their structure and its relation to substrate recognition and catalysis. For example, structures of the ribosome, group I and group II introns were determined some time ago along with more recent structures of the , HDV ribozyme, Varkud satellite ribozyme, and RNase P (22-28). In each of these complexes, an intricate hydrogen-bonding network is apparent along with the necessity of metal ions in the folded structure. A network of RNA secondary structures including loops and bulges, helices, pseudoknots, kinks, and turns are pieced together to obtain these complex structures.

In recent decades, the powerful combination of RNA and protein in recognition of various substrates has become well-established (Figure 1-1). Examples of biologically important ribonucleoprotein complexes (containing catalytic and non-catalytic RNA) include the ribosome, spliceosome, RNA interference, and others. The textbook example of the ribosome shows that whereas the RNA subunits together are catalytic, the various ribosomal proteins serve to stabilize the structure. The spliceosome of eukaryotes is composed of several small nuclear ribonucleoproteins, each made of small nuclear RNA and various proteins, that function to regulate gene expression by removing introns from pre-mRNA transcripts and performing alternative splicing to produce multiple from a single RNA transcript. Heterogeneous nuclear ribonucleoproteins (hnRNPs) are a diverse class of complexes that guide pre-mRNA folding, regulate splicing, and transport mRNA from the nucleus to the cytoplasm. Additional examples of RNA-binding proteins and RNA-protein complexes abound in biology.

Another member of this family is the bacterial RNase P, which functions as a complex of a large catalytic RNA and a smaller protein subunit, both important for substrate recognition. A common theme among these RNA-binding proteins and ribonucleoprotein complexes is that their biological function necessitates their recognition of multiple substrates, many of which vary in their sequence and/or structure. This fact calls for a reconciliation of the need for enzyme specificity with the need to accommodate cellular demands. In many cases a quantitative mechanistic determination of these enzymes' specificity is lacking. In the work presented herein, we employ RNase P as a model system to better understand multiple substrate recognition and the coordinated functions of enzyme subunits.

Figure 1-1. Complexity of RNA protein interactions in a strongly simplified model system. RNAs are represented as structures, proteins as colored circles. Black lines indicate RNA-protein interactions, dotted red lines RNA-RNA interactions, dotted blue lines protein-protein interactions. Created by Eckhard Jankowsky and Michael Harris.


Ribonuclease P

Ribonuclease P (RNase P) is an essential enzyme in all domains of life. In bacteria, it consists of a catalytic RNA subunit, P RNA (~400 nucleotides), and a small protein subunit, C5 protein (~100 amino acids) (29) (Figure 1-2). This complexity is increased in the archaeal and eukaryotic versions of RNase P, which include several protein subunits and a structurally similar single catalytic RNA subunit (30,31). In the mitochondria of humans and plants, a unique form of RNase P has been identified that contains no RNA subunit, called protein-only RNase P (PRORP), but maintains a catalytic mechanism similar to that of the bacterial form (32,33). In all forms of life, RNase P is responsible for the removal of the 5’ leader of all precursor tRNAs (pre-tRNAs) in the cell (34,35). The P RNA subunit of bacterial RNase P can perform endonucleolytic cleavage of the pre-tRNA in vitro without the C5 protein under conditions of high ionic strength (17,36). Whereas hydrolysis of the phosphodiester bond is performed by the P RNA subunit, the C5 protein increases substrate affinity and the processing rate constant as well as the association of catalytically important divalent cations (37,38).

A comparison of P RNA sequences from over 200 species of bacteria revealed high conservation of nucleotides in the catalytic domain and led to a consensus structural model of the enzyme consistent with biochemical evidence (39,40). The P RNA subunit of RNase P consists of several distinct domains with individual functions. Two distinct domains of P RNA are identifiable upon inspection of its secondary and tertiary structure: the catalytic domain and specificity domain. Region P7-P11 contains nucleotides in the specificity domain that contact the T-stem loop of the tRNA; the catalytic domain in regions P1-P5 is where the chemistry is performed and catalytic metal ions bind (41,42). The P1-P4 multihelix junction is highly conserved among species and contains residues involved in binding of magnesium cations through their non-bridging oxygens (43). Specifically, the presence of a bulged nucleotide in the P4 helix is correlated with binding of magnesium ions and substrate cleavage (44). These divalent metal ions potentiate local changes in structure that are necessary for tertiary folding of the P RNA as well as substrate binding and catalysis (44,45).

Figure 1-2. Diagram of the three-dimensional structure of Thermotoga maritima P RNA bound to pre-tRNA based on the crystal structure of Torres-Larios et al. (2005). Helices of P RNA are shown as individually colored cylinders. The tRNA body of the substrate is depicted by a black ribbon and the P protein subunit (C5 in E. coli) is shown as a gray sphere. The 5′-leader sequence of pre-tRNA is shown as a dashed line. (Reprinted with permission from the RNA Journal, citation: Sun L. and Harris ME. RNA (2007) 13(9): 1505-15. Copyright Cold Spring Harbor Laboratory Press).

As described above, the C5 protein subunit of RNase P is not involved in cleavage of the 5’ leader from pre-tRNA. However, it is responsible for recruitment of divalent metal ions and substrate recognition and therefore is necessary for in vivo function (46). The C5 protein of RNase P recognizes the pre-tRNA substrate only in the 5’ leader at nucleotides distal to the site of cleavage (47). A pronounced single-stranded RNA-binding cleft and significant conservation of amino-acid residues is evident from the X-ray crystal structure of the protein subunit of Thermotoga maritima RNase P and phylogenetic sequence conservation analysis (26). Due to the high degree of sequence variation in the 5’ leader of pre-tRNAs, it has been assumed until recently that the C5 protein was a non-specific RNA-binding protein.

RNase P is distinguished from the other aforementioned ribozymes in its ability to perform multiple rounds of catalysis, i.e., multiple turnover reactions, due to its recognition of its substrate in trans (48). This is contrary to the single turnover self-cleavage reactions common to most ribozymes that act in cis. Another key feature of RNase P as mentioned above is its biological necessity to recognize and process all pre-tRNAs in the cell. In the E. coli cell there are 87 pre-tRNAs which vary in sequence and structure; all must be processed by RNase P (49). Due to the significant variation in pre-tRNA substrates, the specificity of RNase P is poorly understood.

RNase P:pre-tRNA Interactions

Despite the lack of sequence conservation in pre-tRNA substrates, previous biochemical data have identified key determinants for RNase P recognition. The P RNA subunit is thought to base-pair with the T-stem loop of the tRNA body through its specificity domain (50,51). For those pre-tRNAs encoded with a 3’-CCA, the L15 region of the P RNA also makes direct base-pairs with these nucleotides (52-55). In addition, the G(+1)/C(+72) base-pair at the beginning of the acceptor stem of the tRNA is a positive determinant for substrate cleavage (37). Previous studies from our lab also identified binding interactions between the P RNA and proximal nucleotides in the 5’ leader of pre-tRNA using crosslinking and mutational analysis (56,57).

The C5 protein subunit of RNase P is also known to contact the 5’ leader of pre-tRNA at nucleotides distal to the cleavage site (47,58). In experiments measuring the equilibrium dissociation constant for pre-tRNA substrates with all identified determinants (cognate substrates) or lacking one or more (non-cognate substrates), substantial variation was observed in their affinity for P RNA and their observed processing rate constant (59). Inclusion of the C5 protein provided uniformity in substrate binding and catalysis by increasing the affinity of all pre-tRNA substrates tested to similar values (59).

Biochemical data from our lab and others have shown that despite their natural variation, nucleotides in the 5’ leader of pre-tRNA substrates are important for substrate specificity and recognition. For example, hydroxyl radical protection assays showed increased protection of nucleotides N(-6) to N(-3) in the 5’ leader in the presence of the C5 protein compared to the P RNA alone, indicating direct contacts between the distal region of the 5’ leader and the C5 protein. In another study from the Fierke lab, structural information was used to guide mutagenesis experiments, which revealed that an α-helix of the C5 protein beginning with arginine-asparagine-arginine contained specific hydrogen bonds to the N(-4) of the 5’ leader and that these interactions are necessary for binding and conformational change (60).

Our lab previously identified a potential direct base-pairing interaction between the N(-1) position in the 5’ leader of pre-tRNA and an adenosine residue, A248, in the J5/15 region of P RNA (56). That study investigated the effect of mutating this residue on catalysis by the ribozyme; the importance of this contact in the holoenzyme and its role in substrate recognition are unknown. Another set of studies also identified the N(-2) position of the 5’ leader as a possible interaction with the J18/2 region of P RNA; however, the exact residues involved were unidentified (61-63). The importance of contacts between the 5’ leader of pre-tRNAs and RNase P is underscored by studies that mutated the 5’ leader sequences of cognate pre-tRNA substrates to those found in non-cognate pre-tRNAs and vice versa (59). In general, the 5’ leaders from non-cognate pre-tRNAs increased substrate affinity while those from cognate substrates decreased affinity, possibly reflecting the necessity of contacts to the 5’ leader in the enzyme-substrate complex when other optimal contacts are unavailable.

In addition to these direct contacts, the X-ray crystal structure of T. maritima enzyme-product complex suggests intimate contacts between the P RNA and the proximal region of the 5’ leader and between the C5 protein and the distal nucleotides of the 5’ leader (Figure 1-3; (26)). Unfortunately, atomic resolution of these interactions is unavailable from the data as the resolution of structure at the nucleobases of the 5’ leader is low, and thus many of the molecular interactions remain elusive. The importance of these individual contacts in the RNase P reaction and how they might work together to aid in substrate recognition are unexplored.

RNase P Mechanism

Biochemical studies to date of the RNase P reaction mechanism have supported a two-step pre-tRNA substrate binding mechanism (Scheme 1-1). In this model the RNase P-pre-tRNA complex is formed upon an initial binding event followed by a conformational change that increases affinity of the complex as additional contacts are made (64). Upon formation of the high-affinity complex, irreversible cleavage of the substrate occurs, and the 5’ leader and tRNA products are released from the RNase P active site (37). For substrate recognition, this conformational change in the enzyme-substrate complex upon binding has been observed for the Bacillus subtilis RNase P using fluorescence anisotropy; the results support an induced fit model of substrate recognition (64). Thus, two parameters in this reaction become important for its characterization: (i) the first-order rate constant of substrate cleavage (kcat), measuring the reaction of the enzyme-substrate complex to free enzyme and products, and (ii) the second-order rate constant (kcat/Km), measuring the reaction of free enzyme and substrate through the reaction to free enzyme and products.

Figure 1-3. The X-ray crystal structure of RNase P-product complex from T. maritima at 4.2Å. The protein subunit of RNase P (pink) and P RNA subunit (blue) both make intimate contacts with the 5’ leader (green) of the pre-tRNA substrate. The tRNA body is colored orange. Individual nucleobases in the 5’ leader are not present because their precise position could not be determined with the obtained spectral data. Adapted from Reiter et al. Nature 2010 using PyMol (PDB: 3Q1R).

Scheme 1-1: In the holoenzyme, the pre-tRNA is bound with higher affinity than the ribozyme and the conformational change of the enzyme with substrate bound, k2, is promoted relative to ribozyme. Product release is much slower in the ribozyme than the holoenzyme and is possibly the rate limiting step in the ribozyme reaction.

A simple chemical mechanism has been proposed for endonucleolytic cleavage of pre-tRNA by RNase P (48). RNase P hydrolyzes the phosphodiester bond between the first nucleotide in the tRNA body N(1) and the beginning of the 5’ leader, N(-1). Upon RNase P recognition, a transition state is formed with Mg2+ ion coordination of bridging and non-bridging oxygens of the phosphate backbone. Binding of divalent cations promotes donation of a proton by the water nucleophile to the 2’ oxygen, which in turn facilitates protonation of the phosphate leaving group at the 5’ end of the tRNA body. Due to the slow rate constant observed in multiple-turnover reactions with RNase P, Fierke and colleagues have suggested that the enzyme is tuned for recognition of multiple substrates rather than rapid catalysis (65).

The high degree of sequence and structural variation in the pre-tRNA substrates, particularly in the 5’ leader, raises significant questions regarding the mechanism of RNase P cleavage. As stated above, early experiments using in vitro transcribed RNA established that the P RNA subunit of RNase P is responsible for catalysis. Steady-state and pre-steady-state kinetics performed on the ribozyme (P RNA) alone identified product release as the rate-limiting step of the reaction (65-67). In fact, the equilibrium dissociation constant of the tRNA product with P RNA was lower than that of the pre-tRNA substrate (59). Pre-steady-state reactions in the presence of the C5 protein subunit show an increase in affinity of RNase P for its pre-tRNA substrate compared to P RNA alone (49,65-67). In steady-state reactions, the presence of the C5 protein also increases the catalytic rate constant, likely through recruitment of magnesium ions and through the promotion of conformational change of P RNA (37,38,59). For the E. coli RNase P holoenzyme, it was recently shown that the rate-limiting step is substrate association (49).

Dissecting the molecular recognition of RNase P for its pre-tRNA substrates using combinations of kinetics, thermodynamics, and structure-function assays provides a system in which to globally understand specificity of ribonucleoproteins. By using a reasonably well-studied enzyme, we hope to elucidate the mechanism by which ribonucleoproteins can retain specificity for their cognate substrates among non-cognate binding sites in the cell. These studies not only have direct implications for understanding enzyme specificity, but also are likely to provide a more comprehensive view of how perturbations of this mechanism might promote disease states.

Measuring the True Specificity of an Enzyme

One of the key questions regarding enzymes that process multiple substrates is how they retain their specificity for various cognate substrates while avoiding recognition of other non-cognate molecules in the cell. Therefore, a true measure of enzyme specificity must include competition between potential substrates, as this is what occurs in vivo (68). As previously described by Fersht, competition of two substrates for the same enzyme is described by a ratio of the reaction rates, given by the ratio of the second-order rate constants (kcat/Km) of the two substrates multiplied by the ratio of their concentrations (Equation 1-1). The derivation of this equation has been published by Fersht and emphasizes the fact that substrates are in competition at the level of their “specificity constant” or ratio of kcat/Km (68,69).

\[
\frac{v_{obs2}}{v_{obs1}} = \frac{(k_{cat}/K_m)_2}{(k_{cat}/K_m)_1} \left( \frac{S_2}{S_1} \right) \qquad \text{Equation 1-1}
\]
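As a brief sketch of where Equation 1-1 comes from, the steps below follow the standard steady-state treatment of two substrates competing for one enzyme. The symbols [E] (free enzyme concentration) and f_i (fraction of substrate i remaining, S_i(t)/S_i(0)) are introduced here only for illustration and are not part of the original notation. Because both substrates draw on the same pool of free enzyme, [E] cancels in the ratio, and integrating the ratio relates the fractions of each substrate remaining at any time.

```latex
\begin{align}
v_{obs1} &= (k_{cat}/K_m)_1 \, [E] \, S_1, \qquad
v_{obs2} = (k_{cat}/K_m)_2 \, [E] \, S_2 \\
\frac{v_{obs2}}{v_{obs1}} &= \frac{dS_2}{dS_1}
  = \frac{(k_{cat}/K_m)_2}{(k_{cat}/K_m)_1} \left( \frac{S_2}{S_1} \right) \\
\ln f_2 &= \frac{(k_{cat}/K_m)_2}{(k_{cat}/K_m)_1} \, \ln f_1,
  \qquad f_i = \frac{S_i(t)}{S_i(0)}
\end{align}
```

The last line suggests why monitoring substrate depletion can report on a ratio of specificity constants without knowledge of the enzyme concentration.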

As discussed above, the second-order rate constant of a reaction includes all reaction steps from enzyme and substrate collision to regeneration of free enzyme upon release of products. Therefore, measuring the second-order rate constant of a reaction will quantify the rate-limiting step for that substrate up to and including the first irreversible step. By its very nature then, it is possible that two competing substrates in an enzymatic reaction could have different rate-limiting steps (i.e., substrate 1 may be limited by association and substrate 2 by catalysis). By placing substrates into competition with one another for the enzyme, each substrate acts as a competitive inhibitor to all others.


Previously, others have derived these equations in relation to kinetic isotope effect studies and using competitive alternative substrates (70-73). The results of this derivation have provided insight into the conditions under which these measurements can be made. These equations are valid for any concentration of substrate, regardless of its apparent Km, since substrates are in competition for association with the enzyme. While differences in substrate concentration affect the reaction rate, the relative specificity constant (krel) remains unchanged (49). In addition, this approach is independent of enzyme concentration because, although enzyme concentration appears in the rate equations, it cancels out in the final derivation. Thus, this equation is also applicable to single-turnover reactions in which enzyme is in large excess over substrate. Finally, this theory is valid for any number of substrates and can be extended well beyond a comparison of two. As shown previously by our lab, additional substrates in the enzyme reaction continue to act as competitive inhibitors for all other substrates in the reaction. This results in an equal decrease in the processing rate of all substrates, and thus the quantified krel is unaltered (49).
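The independence of krel from enzyme and substrate concentration can be illustrated numerically. The following is a minimal sketch, not the analysis used in this work: it assumes steady-state Michaelis-Menten kinetics in which every substrate competes for the same free enzyme, and the function names and all rate constants, Km values, and concentrations are hypothetical values chosen only for illustration.

```python
import math

def simulate_competition(k_eff, s0, e_total, km, t_end, dt=0.01):
    """Euler integration of several substrates competing for one enzyme.

    k_eff[i] is (kcat/Km) for substrate i and km[i] is its Km (assumed values).
    Free enzyme is E_total / (1 + sum([S_i]/Km_i)), so every substrate acts
    as a competitive inhibitor of all the others.
    """
    s = list(s0)
    for _ in range(int(t_end / dt)):
        e_free = e_total / (1.0 + sum(si / kmi for si, kmi in zip(s, km)))
        s = [si - ki * e_free * si * dt for si, ki in zip(s, k_eff)]
    return s

def krel_from_fractions(s_end, s0, ref=0):
    """krel_i = ln(S_i/S_i0) / ln(S_ref/S_ref0), relative to substrate `ref`."""
    ln_ref = math.log(s_end[ref] / s0[ref])
    return [math.log(se / si) / ln_ref for se, si in zip(s_end, s0)]

# Hypothetical parameters: (kcat/Km) in uM^-1 s^-1, Km and [S] in uM.
k_eff = [1.0, 3.0, 0.2]
km = [0.05, 0.1, 0.5]
s0 = [1.0, 0.5, 2.0]

# The same pool reacted with 5 nM and with 50 nM enzyme.
low_enzyme = krel_from_fractions(simulate_competition(k_eff, s0, 0.005, km, 200.0), s0)
high_enzyme = krel_from_fractions(simulate_competition(k_eff, s0, 0.05, km, 50.0), s0)
print(low_enzyme, high_enzyme)
```

Under these assumptions, both runs recover relative rate constants close to the ratios of the input kcat/Km values, even though the enzyme concentration and the extent of reaction differ between them.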

High-Throughput Enzymology to Measure RNase P Specificity

Obtaining a true measure of enzyme specificity has been a significant challenge in biochemistry. Previous attempts have focused either on the effect of single point mutations on enzyme processing or on qualitative approaches that select only the most favorable substrates. In recognition of the above knowledge of biological specificity and the advantage of a competitive approach to analyzing substrate recognition, we have developed a high-throughput enzymology approach to quantitatively define enzyme specificity for RNase P.

To accomplish this, we used a combination of enzymology and high-throughput sequencing to monitor the processing rates of thousands of pre-tRNA substrate variants in a single reaction. The details of this technique are described throughout the remaining chapters of this thesis and have been published (49,74-77). An initial study from our lab provided a proof of concept for the use of competitive alternative substrate reactions with RNase P. These studies showed that pre-tRNAMet and pre-tRNAFMet (canonical and non-canonical RNase P substrates respectively) each have a rate-limiting step at association (49). In addition, this study provided direct evidence that relative rate constants could be obtained by placing these substrates into competition with one another at a variety of concentrations to obtain a similar krel (49). As expected, addition of a third pre-tRNA substrate did not affect this parameter (49). These studies laid the foundation for scaling up this analysis to testing thousands of substrates as described herein, which we term High-Throughput Sequencing Kinetics (HTS-Kin).

Discussion

Here, we present the results of studies aimed at a comprehensive determination of specificity in RNase P. We set out to answer the following questions: 1) How is variation in the 5’ leader nucleotides of pre-tRNA accommodated by RNase P? 2) What is the effect of variation in reaction parameters and experimental set-up on HTS-Kin results? 3) Does the sequence identity of contacts between the 5’ leader and P RNA alter specificity of contacts between the C5 protein and 5’ leader? Each of these questions is addressed in its own chapter.


Chapter 2: Determination of the Specificity Landscape for Ribonuclease P Processing of Precursor tRNA 5' Leader Sequences

Reprinted with permission from Niland, C.N.; Zhao, J.; Lin, H-C.; Anderson, D.R.; Jankowsky, E.; Harris, M.E. “Determination of the Specificity Landscape for Ribonuclease P Processing of Precursor tRNA 5' Leader Sequences” ACS Chem Biol (2016) 11(8):2285-92 Copyright 2016 American Chemical Society


Abstract

Maturation of tRNA depends on a single endonuclease, ribonuclease P (RNase P), to remove highly variable 5′ leader sequences from precursor tRNA transcripts. Here, we use high-throughput enzymology to report multiple-turnover and single-turnover kinetics for Escherichia coli RNase P processing of all possible 5′ leader sequences, including nucleotides contacting both the RNA and protein subunits of RNase P. The results reveal that the identity of N(−2) and N(−3) relative to the cleavage site at N(1) primarily control alternative substrate selection and act at the level of association not the cleavage step. As a consequence, the specificity for N(−1), which contacts the active site and contributes to catalysis, is suppressed. This study demonstrates high-throughput RNA enzymology as a means to globally determine RNA specificity landscapes and reveals the mechanism of substrate discrimination by a widespread and essential RNA-processing enzyme.


Introduction

To function in the cell, most RNA-processing enzymes and ribonucleases must recognize many alternative RNA substrates. Multiple substrate recognition is an inherent and essential function characteristic of complex ribonucleoprotein machines such as the spliceosome, ribosome, and enzymes involved in processing tRNAs, snRNAs, and siRNAs. A major challenge they all face is to recognize their cognate substrates among the excess of non-cognate binding sites in the transcriptome. Therefore, a complete understanding of RNA metabolism and gene expression requires characterizing in detail the substrate RNA sequences and structures that drive enzyme specificity. Achieving a comprehensive understanding of the mechanistic basis for substrate discrimination by RNA-processing enzymes is challenging but important for realizing their potential as drug targets (78-80) and platforms for engineering synthetic biology (81-83).

Ribonuclease P is a ubiquitous and essential tRNA-processing endonuclease that generates the mature 5′ end of tRNA by removal of the 5′ leader sequence (84). In bacteria, RNase P is composed of a large catalytic RNA subunit (P RNA) and a smaller protein subunit, both of which are responsible for substrate recognition. In Escherichia coli, a single RNase P enzyme must process all 87 pre-tRNAs, which vary greatly in sequence and length of their 5′ leaders with some base conservation observed only at nucleotides proximal to the cleavage site (49,58) (Figure 2-1A). Despite this lack of 5′ leader sequence conservation, experimental approaches such as cross-linking, chemical protection, mutagenesis, as well as X-ray crystallography of the Thermotoga maritima RNase P-product complex, demonstrate that both the protein subunit and RNA subunit make direct contact with 5′ leader nucleotides N(−8) to N(−1) (16,34,35,37,56,59,85-90).

Figure 2-1. Recognition of the pre-tRNA 5′ leader sequence by the RNA and protein subunits of ribonuclease P. (a) A sequence alignment of all 5′ leaders in Escherichia coli pre-tRNAs reveals no identifiable conserved sequence motif. (b) X-ray crystal structure of the Thermotoga maritima RNase P-product complex from Reiter et al. P RNA subunit is shown in cyan, protein subunit in pink, tRNA body in orange, and 5′ leader in green.

Comparison of the kinetics of genomically encoded E. coli pre-tRNAs reveals that they react in vitro with essentially equivalent kcat/Km values (37,59). Similar multiple-turnover reaction kinetics are due, in part, to variation in the strength of 5′ leader sequence interactions that may compensate for weaker affinities of different tRNAs (49,59). Bioinformatic analyses suggest that sequence-specific contacts between the protein subunit and the 5′ leader sequences of pre-tRNAs may be common in bacterial RNase P and may lead to species-specific substrate recognition (58).

Current structure models of pre-tRNA bound to bacterial RNase P indicate that the two nucleotides, N(−2) and N(−1), immediately 5′ to the cleavage site interact directly with the catalytic P RNA subunit consistent with biochemical data. The N(−1) nucleobase interacts with a universally conserved adenosine in the J5/15 region of P RNA (56,86,91). N(−2) is proposed to interact with J18/2 although the chemical details of this interaction are not defined (86,89,92). More distal 5′ leader sequences N(−8) to N(−3) are contacted by the small, but essential protein subunit (termed C5 in E. coli) (47,66,93). Biochemical and biophysical data show that the 5′ leader binds to a cleft on the surface of P protein in an extended conformation (35,47,94). An X-ray crystal structure of T. maritima RNase P with a 5′ leader sequence oligonucleotide soaked into the crystal is consistent with these intimate contacts (26) (Figure 2-1B). In vitro structure–function studies demonstrate that sequence variation at 5′ leader nucleotides involved in both RNA–RNA and RNA–protein interactions can have significant functional effects on RNase P processing rates as well as cleavage site specificity (58,59,95). For example, analysis of 5′ leader point mutations and amino acid substitutions in P protein supports favorable hydrogen bonding interactions between an adenosine at N(−4) and the protein subunit of Bacillus subtilis RNase P; however, the sequence preference of E. coli RNase P at this position is diminished (58). Additionally, mutation of the 5′ leader at N(−1) altered processing rates, and these effects are partly rescued by compensatory mutations in J5/15 of P RNA (56,57). At present, the 5′ leader sequence specificity of RNase P enzymes and the manner in which sequence variation affects the reaction mechanism are not well-defined. As a consequence, we lack the information necessary to predict the effects of variation in the sequences of genomically encoded tRNA precursors in vivo and the understanding to identify the range of cognate substrates in the transcriptome.

As an initial step toward addressing this problem, we determined the sequence specificity of E. coli RNase P for 5′ leader sequence positions N(−8) to N(−3) in a non-initiator pre-tRNAMet82, which includes binding sites for both the RNA and protein subunits (75). Previously, we measured the relative kcat/Km values for RNase P processing of all possible substrate variations in the C5 binding site using high-throughput sequencing kinetics (HTS-Kin). HTS-Kin combines Illumina sequencing and internal competition kinetic analysis to measure the relative second-order rate constant (kcat/Km) for thousands of alternative RNA substrates simultaneously. Quantitative modeling of the resulting rate-constant distribution revealed the functional sequence specificity of C5 protein. However, the specificity for nucleotides proximal to the cleavage site that interact with P RNA, as well as the mechanism by which discrimination between alternative substrates is achieved, remain unknown.

Here, we determine the effects of all possible sequence variants at N(−6) to N(−1) in the 5′ leader that contact the P RNA and C5 protein subunits on both multiple-turnover and single-turnover reaction kinetics. The observed specificity of RNase P under multiple-turnover conditions reveals N(−2) and N(−3) as the key determinants of alternative substrate discrimination. Moreover, we find that competition for alternative substrates occurs at the level of substrate association, not catalysis, and that this conceals specificity for N(−1) at the cleavage step. Thus, a combined approach of high-throughput RNA enzymology and quantitative modeling of sequence specificity provides a general means to globally determine RNA specificity landscapes. Taken together, the results reveal the key specificity determinants for an essential RNA-processing enzyme and support a general specificity landscape and mechanism by which discrimination is achieved.


Results and Discussion

RNase P Processing of Pre-tRNA Substrate Pools

To characterize the specificity of E. coli RNase P for pre-tRNA 5′ leader sequences, we randomized nucleotides N(−6) to N(−1) that include positions interacting with the RNA subunit (N(−2) and N(−1)), as well as N(−6) to N(−3) that interact with the C5 protein subunit, providing 4096 substrate variants (Figure 2-2A) (56,59,62,87-90). Analysis of multiple-turnover kinetics of the pre-tRNAMet N(−6 to −1) randomized population demonstrated that this pool of substrates reacts with kinetics that are overall slower than the native pre-tRNAMet82 with its genomically encoded 5′ leader sequence (Figure 2-2B). Similarly slow reactivity was observed for the pre-tRNAMet N(−8 to −3) substrate population randomized at positions N(−8) to N(−3) as reported in Guenther et al (75). Thus, randomization of the 5′ leader from N(−6) to N(−1) or from N(−8) to N(−3) affects RNase P multiple-turnover processing kinetics similarly, likely due to their commonly randomized region (N(−6) to N(−3)).

Figure 2-2. Determination of the effects of all possible pre-tRNA 5′ leader sequence variants on the relative kcat/Km for RNase P processing. (a) The six nucleotides randomized in pre-tRNAMet85N(−6 to −1) and pre-tRNAMet85N(−8 to −3) substrate pools are indicated by a green box, RNase P cleavage site indicated with arrow. (b) Multiple-turnover kinetics of genomically encoded pre-tRNAMet(WT)21A substrate (black) compared to the randomized pre-tRNAMet N(−8 to −3)21A (red) and pre-tRNAMet N(−6 to −1)21A (blue) populations. The inset shows typical results from RNase P reactions run on a denaturing polyacrylamide gel, each lane shows a time point that separates substrate and product. (c) Histogram of the number of substrate variants with a particular krel value from HTS-Kin showing the average of three experiments with the pre-tRNAMet N(−6 to −1) and two with pre-tRNAMet N(−8 to −3). (d) Sequence probability logos identify sequence preference in the fastest 1% of substrates in the HTS-Kin reactions. Black letters show the wildtype sequence and indicate nonrandomized positions. (e) The location of the 256 sequences that are in common between the randomized pre-tRNAMet N(−8 to −3)21A and pre-tRNAMet N(−6 to −1)21A populations at positions N(−6) to N(−3) are indicated by a green box. (f) The observed krel values from the two separate independent experiments and distinct randomized pools are plotted and fit to a linear function, red line.

Next, the relative kcat/Km values for all 4096 sequences in the pre-tRNAMet N(−6 to −1) randomized population were measured using HTS-Kin. HTS-Kin measures changes in the relative abundance of alternative RNA substrates over reaction time using next-generation sequencing, and these data are used to calculate relative rate constants using internal competition kinetics (68,69,96,97). A relative second-order rate constant, krel, was calculated for each substrate and normalized to the rate constant for the genomically encoded sequence (i.e., krel = (kcat/Km (variant))/(kcat/Km (AAAGAU))). A histogram showing the distribution of average relative rate constants for all substrate variants from three replicate experiments is shown in Figure 2-2C. This histogram defines a rate-constant distribution that describes the entire range of effects of sequence variation on kcat/Km. The rate-constant distribution for the pre-tRNAMet N(−6 to −1) population is similar in shape to that previously observed for pre-tRNAMet N(−8 to −3) randomized population (75). A significant number of sequences (ca. 200) in the pre-tRNAMet N(−6 to −1) population reacted with krel greater than 2-fold faster than the reference, indicating that the genomically encoded 5′ leader sequence at N(−6) to N(−1) is not optimal for kcat/Km.
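The internal competition calculation can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the exact expression used for the HTS-Kin analysis: it assumes that sequencing reads are obtained from the unreacted substrate pool, that the overall fraction of substrate reacted (f) is measured independently (for example, by gel), and that the fraction of variant i remaining equals (1 − f) multiplied by the change in its mole fraction. The function name and the read counts in the usage example are hypothetical.

```python
import math

def hts_krel(counts_t0, counts_t, fraction_reacted, reference):
    """Relative rate constants from read counts of unreacted substrate.

    counts_t0 / counts_t: dict mapping sequence -> read count before the
    reaction and at the sampled time point (assumed residual-substrate reads).
    fraction_reacted: overall fraction f of the pool converted to product.
    reference: sequence whose krel is defined as 1 (e.g., "AAAGAU").
    """
    total0 = sum(counts_t0.values())
    total_t = sum(counts_t.values())
    remaining = 1.0 - fraction_reacted

    def ln_fraction_remaining(seq):
        x0 = counts_t0[seq] / total0   # mole fraction at time zero
        xt = counts_t[seq] / total_t   # mole fraction in residual substrate
        return math.log(remaining * xt / x0)

    ln_ref = ln_fraction_remaining(reference)
    return {seq: ln_fraction_remaining(seq) / ln_ref for seq in counts_t0}

# Hypothetical read counts for three variants at 20% overall reaction.
before = {"AAAGAU": 1000, "AAAUAU": 1000, "GGGCCG": 1000}
after = {"AAAGAU": 990, "AAAUAU": 820, "GGGCCG": 1190}
print(hts_krel(before, after, 0.20, "AAAGAU"))
```

Because each variant is normalized to the reference sequence measured in the same reaction, variation in sequencing depth between time points does not enter the calculation under these assumptions.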

Modeling of RNase P Substrate Specificity

To identify the sequence determinants optimal for kcat/Km, we calculated sequence probability logos from 5′ leader substrate variants with the top 1% of krel values (98) (Figure 2-2D). Surprisingly, the results indicated little sequence preference at N(−1) and N(−4) which were previously indicated to contribute to specificity (56-58). In contrast, N(−2) and N(−3) showed strong preferences for adenosine and uridine, respectively. Optimal sequence determinants in the protein binding site (−6 to −3) of the 5′ leader are similar to those previously observed for these positions in HTS-Kin results obtained using the pre-tRNAMet N(−8 to −3) population (75). A subset of 256 sequence variants is common to both the pre-tRNAMet N(−6 to −1) and pre-tRNAMet N(−8 to −3) randomized populations (Figure 2-2E). As an internal check on accuracy, we compared the krel values measured for these sequences in the two independent randomized populations (Figure 2-2F). The results show that the measured krel in both HTS-Kin experiments are highly correlative, demonstrating the reproducibility of the technique.
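The selection of the fastest-reacting variants can be illustrated with a simple position frequency table. This sketch is a simplification and is not the sequence probability logo calculation of reference 98, which evaluates statistical significance against a background: it only ranks variants by krel, keeps a chosen top fraction, and tabulates nucleotide frequencies at each position. The function name and the toy 3-nucleotide pool in the usage example are hypothetical.

```python
from collections import Counter
from itertools import product

def top_fraction_frequencies(krel, fraction=0.01, alphabet="ACGU"):
    """Per-position nucleotide frequencies among the fastest variants.

    krel: dict mapping sequence -> relative rate constant.
    fraction: share of the pool to keep (0.01 for the top 1%).
    Returns one {nucleotide: frequency} dict per sequence position.
    """
    ranked = sorted(krel, key=krel.get, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    top = ranked[:n_top]
    freqs = []
    for pos in range(len(top[0])):
        counts = Counter(seq[pos] for seq in top)
        freqs.append({base: counts[base] / n_top for base in alphabet})
    return freqs

# Toy pool of all 64 trinucleotides; variants ending in "AU" are made fast.
toy = {"".join(p): (10.0 if p[1:] == ("A", "U") else 1.0)
       for p in product("ACGU", repeat=3)}
for position, table in enumerate(top_fraction_frequencies(toy, fraction=4 / 64)):
    print(position, table)
```

In this toy pool the last two positions are fully determined among the selected variants while the first position remains uniform, which is the kind of pattern a logo makes visible.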

To amplify pre-tRNAs for high-throughput sequencing without ligation bias, an additional 21 nucleotides were added to the terminal 5′ end of pre-tRNAMet N(−6 to −1) substrates. Cross-linking, X-ray crystallography, and FRET studies demonstrate that the 5′ leader sequence is bound in an extended single-stranded conformation (35,47,94); therefore, these additional 21 nt (termed 21A) could result in formation of unfavorable secondary structure in the ground state that would complicate identification of intrinsic RNase P sequence specificity. We tested for these effects by comparing the krel values determined for a second pre-tRNAMet N(−6 to −1) population in which each nucleotide in the 21 nt upstream sequence was changed to its Watson–Crick complement (termed 21B). Differences in krel value for the same 5′ leader substrate variant in the two different contexts identify potential instances of inhibitory secondary structure. Comparison of the rate-constant distributions for the pre-tRNAMet N(−6 to −1)21A and pre-tRNAMet N(−6 to −1)21B populations (Figure 2-3A) shows that the majority of substrate processing rates correlate between the 21A and 21B substrate backgrounds. However, a subset of the population of N(−6) to N(−1) substrate variants reacts with greater than 2-fold slower kinetics in the 21A leader sequence context compared to its complement 21B.

Figure 2-3. Quantitative modeling of 5′ leader sequence specificity of RNase P. (a) Density plot of average krel for each substrate variant in three HTS-Kin reactions in the 21A context compared to two reactions with the 21B substrate context. The plot shows the number of substrates in each hexagonal region indicated by the scale at right (less than 1% of data omitted for clarity). (b) Plot of average krel for 5′ leader variants in the context of 21A or 21B sequence. Individual points are colored by their M-Fold predicted highest folding free energy between the 5′ leader sequence and 21A sequence (less than 1% of data omitted for clarity). (c) PWM model used to describe the HTS-Kin data evaluated by plotting the observed krel from HTS-Kin against the predicted krel from the model. (d) A PWM model with an included coupling term used to describe the data evaluated by plotting the observed krel from HTS-Kin against the predicted krel of the model. (e) Linear coefficients for each nucleotide, indicated by color, from the PWM portion of model 2 shown for each position in the 5′ leader indicated at the bottom. (f) Heatmap of coupling coefficients from the model. Position and nucleotide identity indicated on each axis and the coupling coefficient indicated at the vertex.

To test whether inhibitory secondary structure explains the deviation of rate constants of some substrates between the 21A and 21B populations, M-Fold was used to calculate the predicted folding free energies between the 21A sequence and 5′ leaders (99). An overlay of these predicted folding free energies on a plot comparing the rate-constant distributions for the pre-tRNAMet N(−6 to −1)21A and pre-tRNAMet N(−6 to −1)21B populations is shown in Figure 2-3B. The 5′ leader sequences predicted to form the most stable secondary structure in the presence of the 21A context react with slower kinetics and show the greatest degree of displacement from the rate constant measured in the 21B context. Thus, a portion of substrates are affected in the ground state in a manner independent of the intrinsic sequence specificity of RNase P for N(−6) to N(−1). To isolate effects of 5′ leader sequence variation on RNase P processing independent of upstream sequence context, we selected and averaged the data for those substrate variants with krel values that varied by less than 2-fold between the 21A and 21B contexts for further analysis of intrinsic sequence specificity.
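The filtering step described above can be sketched as follows. This is a minimal illustration, assuming that the two contexts are combined by an arithmetic mean and that the 2-fold criterion is applied symmetrically (a variant is excluded whether it is slower in 21A or in 21B); the function name and the krel values in the usage example are hypothetical.

```python
def context_independent_krel(krel_21a, krel_21b, max_fold=2.0):
    """Average krel for variants that agree between two upstream contexts.

    krel_21a / krel_21b: dict mapping 5' leader sequence -> krel measured in
    the 21A and 21B contexts. A variant is kept only if its two values differ
    by less than `max_fold`; kept values are averaged (assumed arithmetic mean).
    """
    kept = {}
    for seq in krel_21a.keys() & krel_21b.keys():
        a, b = krel_21a[seq], krel_21b[seq]
        if a <= 0 or b <= 0:
            continue  # a fold difference is undefined for non-positive values
        if max(a, b) / min(a, b) < max_fold:
            kept[seq] = (a + b) / 2.0
    return kept

# Hypothetical values: the second variant is 5-fold slower in the 21A context.
in_21a = {"AAAGAU": 1.0, "GGGCCC": 0.1, "UUUAUA": 2.0}
in_21b = {"AAAGAU": 1.2, "GGGCCC": 0.5, "UUUAUA": 1.5}
print(context_independent_krel(in_21a, in_21b))
```

Variants that fail the criterion are simply dropped from the returned dictionary, so downstream model fitting in this sketch would see only the context-independent subset.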

To comprehensively and quantitatively determine the specificity of RNase P for 5′ leader sequences N(−6) to N(−1), we first applied a simple position weight matrix (PWM) model. PWM models are commonly used to describe DNA-binding proteins (100) and consider each position in the 5′ leader as independent and therefore uncoupled with other positions in the binding site. Fitting the HTS-Kin data to a PWM model and comparing the observed krel to that predicted by the model indicates this model only explains about half of the correlation between observed rate constant and 5′ leader sequence (Figure 2-3C). Additionally, the data were fit to a PWM model including an additional term that quantifies coupling between positions in the binding site of the 5′ leader. Including quantitative coupling coefficients (β values) in the model, which allows the sequence identity at each position to modulate the effects of sequence variation at other positions in the binding site, provides a significantly better fit to the experimental data (Figure 2-3D). Importantly, similar results were obtained from fitting the complete pre-tRNAMet N(−6 to −1)21A or pre-tRNAMet N(−6 to −1)21B data sets (Figure 2-S1). This result indicates that the inaccuracies introduced by inhibitory secondary structure are relatively minor compared to the dominant contribution from intrinsic RNase P sequence specificity and reflects the overdetermination of these parameters by HTS-Kin. Contributions to specificity are greatest at positions N(−2) and N(−3), consistent with the high probabilities of adenosine and uridine at these positions in the sequence probability logo (Figure 2-3E). The β values expressing the degree of coupling between positions are shown in a heatmap in Figure 2-3F. The strongest coupling occurs between adjacent nucleotides. Interestingly, relatively strong coupling coefficients are observed between the RNA binding site and the proximal protein binding site, in particular between the N(−2) and N(−3) positions. This result indicates coupling at the single nucleotide level between the energetic contributions of nucleobases in the pre-tRNA 5′ leader that contact P RNA and those that contact C5 protein.
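The difference between an independent-position PWM and a model with pairwise coupling can be made concrete with a small example. The sketch below is not the fitting procedure used for the HTS-Kin data. It assumes that the model is additive in ln(krel) and that every sequence of the randomized pool is present, in which case the least-squares PWM weights reduce to differences of means and the coupling coefficients to the corresponding pairwise interaction terms. All names and the synthetic 3-nucleotide data set are hypothetical.

```python
import itertools

BASES = "ACGU"

def fit_pwm_with_coupling(ln_krel):
    """Fit position weights and pairwise coupling to a complete pool.

    ln_krel: dict mapping every sequence of a fully randomized pool to
    ln(krel). Returns (grand_mean, pwm, coupling) where pwm[(p, base)] is the
    independent contribution of `base` at position p and
    coupling[(p, a, q, b)] is the extra term when a at p pairs with b at q.
    """
    seqs = list(ln_krel)
    n = len(seqs[0])
    grand = sum(ln_krel.values()) / len(seqs)

    def mean_where(condition):
        values = [ln_krel[s] for s in seqs if condition(s)]
        return sum(values) / len(values)

    pwm = {(p, b): mean_where(lambda s, p=p, b=b: s[p] == b) - grand
           for p in range(n) for b in BASES}
    coupling = {}
    for p, q in itertools.combinations(range(n), 2):
        for a in BASES:
            for b in BASES:
                pair_mean = mean_where(lambda s: s[p] == a and s[q] == b)
                coupling[(p, a, q, b)] = pair_mean - grand - pwm[(p, a)] - pwm[(q, b)]
    return grand, pwm, coupling

def predict(seq, grand, pwm, coupling=None):
    """Predicted ln(krel); omit `coupling` for the PWM-only model."""
    total = grand + sum(pwm[(p, b)] for p, b in enumerate(seq))
    if coupling is not None:
        total += sum(coupling[(p, seq[p], q, seq[q])]
                     for p, q in itertools.combinations(range(len(seq)), 2))
    return total

# Synthetic pool: additive effects plus one coupled pair (A then U at the 3' end).
weights = {"A": 0.5, "C": -0.2, "G": 0.0, "U": 0.3}
pool = {}
for letters in itertools.product(BASES, repeat=3):
    s = "".join(letters)
    pool[s] = weights[s[0]] + 2 * weights[s[1]] - weights[s[2]] \
        + (1.0 if s[1:] == "AU" else 0.0)

grand, pwm, coupling = fit_pwm_with_coupling(pool)
print(predict("GAU", grand, pwm), predict("GAU", grand, pwm, coupling), pool["GAU"])
```

In this synthetic case the PWM-only prediction misses the coupled pair, whereas the model with coupling terms reproduces the input values. For an incomplete or filtered pool the closed-form means no longer apply and a regression would be needed instead.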

Specificity Landscape of Substrate Processing

The magnitude of kcat/Km reflects the first irreversible step, which is substrate association for the kinetic mechanism of pre-tRNAMet82. Processing of pre-tRNAMet with its genomically encoded 5′ leader sequence involves slow dissociation relative to the RNA cleavage step (49,59). However, the pre-tRNAMet N(−6 to −1) randomized substrate pool contains all sequence combinations at positions N(−2) and N(−1) that are sites of interaction with the catalytic P RNA subunit of RNase P and could affect the substrate cleavage step. Although the protein subunit of RNase P contributes little to catalysis for optimal pre-tRNA substrates, the extent to which RNA–protein interactions broadly contribute to catalysis is not known. Additionally, it is not known whether catalytically important contacts between the 5′ leader and the P RNA subunit are stabilized or destabilized by the more distal protein contacts.

To determine the extent to which randomization of N(−6) to N(−1) affects catalysis, we performed single-turnover reactions under saturating enzyme conditions in which the observed rate constant reflects the substrate cleavage step. Similar to the multiple-turnover kinetics, the pre-tRNAMet N(−6 to −1)21A randomized pool under single-turnover conditions reacted with slower overall kinetics compared to the genomically encoded reference substrate; however, the substrate population reacted to completion (Figure 2-4A). This result is consistent with smaller overall effects of sequence variation on catalysis, or a smaller number of substrates for which this parameter is affected.

Figure 2-4. Effect of sequence variation in the 5′ leader is primarily on substrate affinity and not on catalysis. (a) Single-turnover reactions performed at saturating enzyme conditions with pre-tRNAMet(WT)21A or pre-tRNAMet N(−6 to −1)21A from two and three experiments, respectively. (b) Histogram of the distribution of krel of substrates in a single-turnover HTS-Kin reaction compared to averaged multiple-turnover HTS-Kin reactions. (c) Comparison of krel for each substrate variant (individual points on the graph) in averaged multiple-turnover and a single-turnover HTS-Kin. (d) Single substrate reactions were used to determine the effect of 5′ leader sequence variation on the later steps in the reaction (i.e., E·S → E + P) or the second-order rate constant (i.e., E + S → E + P). The kcat (s⁻¹) from multiple-turnover reactions for each 5′ leader sequence listed on the x-axis from at least three experiments (red). The absolute kcat/Km (μM⁻¹ s⁻¹) measured for these sequences is shown from at least three experiments (black). Standard deviation is indicated by the error bars.

To distinguish these possibilities, we measured the relative single-turnover rate constants for all substrate variants in the pre-tRNAMet N(−6 to −1)21A population by performing HTS-Kin under saturating enzyme conditions. An overlay of histograms showing the observed rate-constant distributions from multiple-turnover versus single-turnover reactions with pre-tRNAMet N(−6 to −1)21A is shown in Figure 2-4B. A much narrower distribution of krel values is observed for single-turnover conditions, demonstrating that few substrate variants in the 5′ leader alter the cleavage step, while the same substrates result in a much greater range of observed kcat/Km values. For those substrates in which the single-turnover rate constant was significantly affected, there are two possible interpretations: the catalytic rate constant may be reduced due to variation in the 5′ leader, or these substrates may bind more weakly to RNase P and thus have a Km above the concentration used. A plot of observed krel values for substrates in multiple-turnover reactions versus single-turnover reactions reveals little correlation between the effects of 5′ leader variation and further illustrates the much smaller range of effects on the cleavage step (Figure 2-4C). The sequence specificity for cleavage under single-turnover conditions can also be modeled. Fitting the single-turnover HTS-Kin data to the PWM model with coupling coefficients described above identifies N(−1) as the primary contributor to specificity (Figure 2-S2). The magnitude of the calculated PWM scores shows that uridine is optimal at N(−1), consistent with previous studies of RNase P catalysis and the current model of the ES complex (56).

Thus, comparison of the multiple-turnover and single-turnover rate constants suggests a specificity landscape for 5′ leader sequence discrimination by E. coli RNase P (Figure 2-5) that globally describes the effects of sequence variation on the reaction mechanism. In this model, variation in proximal 5′ leader sequences has relatively small effects on the cleavage step and therefore is likely to also have small effects on kcat. In contrast, sequence variation at N(−2) and N(−3) alters kcat/Km, which controls the competition between alternative substrates. We tested the predicted effects on the kcat/Km and kcat values for a series of single pre-tRNA substrates selected to span the nearly 100-fold range in krel as measured by HTS-Kin. As expected, the observed kcat/Km values measured using individual substrate reactions showed significant variation based on 5′ leader sequence identity and correlate well with the krel from HTS-Kin (Adj. R² = 0.81) (Figure 2-4D and Figure 2-S3). The results further demonstrate HTS-Kin as an accurate and reliable method for rapid and comprehensive determination of relative rate constants. In contrast, the kcat values measured individually for the same substrates were very similar and varied less than 2-fold. An exception is the slowest reacting 5′ leader variant that is predicted to have inhibitory secondary structure between the 5′ leader and 21A and resulted in decreases in both kcat and kcat/Km. For the majority of substrates, the primary effect of 5′ leader sequence variation is on kcat/Km with minimal effects on the substrate cleavage step and consequently minimal effects on kcat (49,59).

Figure 2-5. Reaction free-energy profile for RNase P processing of pre-tRNAMet82 and effects of sequence variation defining the specificity landscape. In this mechanism, the pre-tRNA substrate combines with RNase P (P RNA is shown as a blue oval and P protein a smaller red sphere) to form the RNase P-pre-tRNA complex which undergoes turnover to form the tRNA and 5′ leader sequence products and free RNase P. For the genomically encoded pre-tRNAMet82 leader sequence, kcat is fast relative to dissociation, and the magnitude of kcat/Km reflects association. Differences in the magnitude of kcat/Km for alternative pre-tRNAs arise from possible effects of unfavorable structure in the ground state (ΔG structure), differences in interactions with the 5′ leader sequence that affect the transition state for association (ΔG RNA-protein), or large changes in the cleavage rate constant that also affect the magnitude of kcat (ΔG cleavage).

Previous mutagenesis and cross-linking experiments clearly show a U(−1) makes an optimal direct contact to a conserved adenosine residue in J5/15 of P RNA. However, the insensitivity of kcat/Km to the identity of the N(−1) nucleobase is apparent from the HTS-Kin data and is confirmed by single-substrate assays (55-57,91). Insensitivity to sequence variation at N(−1) observed in HTS-Kin indicates that formation of contacts with this position occurs at a step subsequent to the first irreversible step, which for the genomically encoded 5′ leader sequence is association. As a consequence, variation at N(−1), which contributes primarily to the cleavage step, does not significantly affect the observed kcat/Km. Conversely, N(−2) and N(−3) contribute significantly to RNase P specificity as demonstrated by optimal sequence probability logos and quantitative modeling of the HTS-Kin data.

The resulting sequence specificity model is likely to have important implications for substrate processing in the cell. The distribution of krel values predicted for the 87 different 5′ leader sequences encoded in the E. coli genome shows significant variation in magnitude occurring throughout the rate-constant distribution (Figure 2-S4). The fastest reacting 5′ leader sequences matched those in proline, glycine, and alanine pre-tRNAs, which have adenosine or uridine at N(−2) and uridine at N(−3) and thus are predicted to be optimized for kcat/Km based on the HTS-Kin results. The slowest reacting 5′ leaders predicted by these results are found in tyrosine, methionine, and arginine pre-tRNAs, which lack at least one of these specificity determinants. Importantly, there are contacts between the P RNA subunit of RNase P and the tRNA body that may also modulate the overall enzyme specificity for these substrates in vivo (92,101-103).

The general model for tRNA biosynthesis in E. coli suggests that the rate- limiting step occurs at the level of . For some substrates the presence of specificity determinants in the 5′ leader sequence may act to offset favorable or unfavorable interactions with the tRNA portion of the substrate to maintain uniform rates of 5′ end maturation. However, more recent molecular genetic analyses reveal that RNase P processes multiple polycistronic transcripts in which the order of processing occurs in a strict 3′ to 5′ direction (95). Thus, for some substrates, the presence or absence of specificity determinants may modulate cleavage relative to other competing cognate binding sites in the cell. The availability of a

comprehensive model for 5′ leader sequence specificity allows a more accurate evaluation of the potential differences in molecular recognition in vivo.

This study demonstrates the use of high-throughput RNA enzymology methods using common molecular biology techniques and instrumentation as a general means to globally determine the specificity landscape for molecular recognition by an essential RNA-processing enzyme. The application of HTS-Kin to measure both multiple-turnover and single-turnover kinetic parameters for thousands of RNA substrate variants reveals the full range of effects and the intrinsic sequence specificity for these two kinetic parameters. The ability to measure different kinetic parameters for the same sequence variants allows the construction of simple kinetic schemes for each substrate and insight into how variation alters the free-energy landscape of the reaction. Analysis of the high-throughput biochemical data using quantitative models of sequence specificity provides a way to identify key specificity determinants and quantitatively compare results obtained between experiments. Therefore, the general approach described here for characterizing the specificity of RNase P presents a rigorous and comprehensive way to investigate RNA specificity that is likely to be applicable to a range of experimental systems.


Methods

Multiple-Turnover and Single-Turnover Reactions

E. coli C5 protein was overexpressed and purified as described (104). P

RNA and pre-tRNAMet82 were synthesized by in vitro transcription using T7 RNA polymerase and PCR or cloned DNA templates. Pre-tRNA substrates were 5′ end labeled with γ-32P using polynucleotide kinase. Multiple-turnover reactions were performed in 50 mM Tris-HCl pH 8, 100 mM NaCl, 0.005% Triton X-100, and 17.5 mM MgCl2. RNase P was assembled by heating P RNA to 95°C for 3 min then

37°C for 10 min. MgCl2 was added and incubated at 37°C for 10 min before adding equivalent concentrations of C5 protein. Substrate was prepared separately by combining pre-tRNA with trace amounts of 32P labeled pre-tRNA, heating to 95°C for 3 min, incubating at 37°C for 10 min before adding MgCl2. Equal volumes of the enzyme and substrate were mixed to achieve final concentrations of 5 or 10 nM RNase P and 1 μM pre-tRNA for randomized pools. Aliquots were taken at selected time points and quenched with an equal volume of formamide loading dye containing 100 mM EDTA. Products were resolved on 15% denaturing polyacrylamide gels and quantified using phosphorimager analysis. Individual substrate assays used 3 μM pre-tRNAMet and 5 nM RNase P, and the kinetic data were fit to the integrated Michaelis–Menten equation. Single-turnover reactions were performed in 50 mM MES pH 6.0, 100 mM NaCl, 0.005% Triton X-100, and

17.5 mM MgCl2. Reactions were performed as described above except the final concentration of holoenzyme was 100 nM and substrate concentrations were <10 nM.


High-Throughput Sequencing Kinetics (HTS-Kin)

Multiple-turnover HTS-Kin measurements were made as described in

Guenther et al. Briefly, reactions were performed as described above, except scaled up 10-fold to provide sufficient material for subsequent analysis. Aliquots were taken during the time course, RNAs were resolved by polyacrylamide gel electrophoresis, and the residual substrate population was eluted and purified.

Equal amounts of RNA from individual time points were used as templates for first-strand cDNA synthesis. Products were diluted 1:300, and 1 μL of this dilution was used for PCR amplification followed by multiplexed Illumina sequencing in 75 bp single-end reads on a HiSeq 2500 instrument. Single-turnover HTS-Kin reactions were set up as described above for single-turnover reactions and were not scaled up. Primer sequences are included in Supporting Information.

Relative rate-constant (krel) values were calculated using

$$k_{rel} = \frac{\ln\left[\dfrac{(R_i/R_{i,0})\,(1-f)}{\sum_{j} X_j\,(R_j/R_{j,0})}\right]}{\ln\left[\dfrac{(1-f)}{\sum_{j} X_j\,(R_j/R_{j,0})}\right]}$$

where R is the ratio of each substrate variant to the reference, R0 is this same ratio at the start of the reaction, and X is the mole fraction for each substrate variant. The ratio of substrates is quantified by the number of raw sequence reads from high-throughput sequencing and the total fraction of reaction was determined by quantification of polyacrylamide gels using ImageQuant software as described above.


Quantitative Modeling of Sequence Specificity

The HTS-Kin data were fit to a simple position weight matrix model treating each nucleotide in the randomized substrate region as independent and non-interacting using

$$\ln(k_{rel}) = \sum_{i=1}^{6} \left(a_i A_i + c_i C_i + g_i G_i + u_i U_i\right)$$

where ai, ci, gi, and ui, are integer values (0 or 1) signifying nucleotide identity and Ai, Ci, Gi, and Ui represent the linear coefficients for that nucleotide at position i. The PWM+IC model considered not only nucleotide identity and position in the randomized region but also the position and identity of other nucleotides in the binding site using the following equation:

$$\ln(k_{rel}) = \sum_{i=1}^{6} \left(a_i A_i + c_i C_i + g_i G_i + u_i U_i\right) + \beta_j I_j$$

where the second summed terms are pairwise interaction terms. For each of these couplings, βj is the linear coefficient for interaction j, and Ij is an indicator variable. Ij is 1 for all substrates with that specific pair of nucleotides, and 0 otherwise. Each interaction term which had an absolute t-value greater than 3.5 (p < 0.005) was recorded as significant. A final model was built using stepwise regression, starting with all of the significant pairwise interactions identified in the first step.


Supplementary Methods and Data

Preparation and Purification of Protein and RNA

Expression and purification of E. coli C5 protein was done as previously described (104). The E. coli P RNA gene with an added T7 promoter sequence and Bbs1 restriction site at the 3’ end was cloned into pUC18 vector. This was then linearized with Bbs1 (NEB R0539L) using standard protocol. The linearized plasmid was then used for run-off in vitro transcription with T7 RNA polymerase (NEB M0251S) using 5-10 µg of template cDNA. The resulting RNA was purified by phenol-chloroform extraction, resolved by PAGE and gel purified by UV shadowing. Excised RNAs were eluted from the gel in 10 mM Tris-HCl pH

8, 1 mM EDTA pH 8, and 0.1% SDS and recovered by ethanol precipitation. The final purified RNA was resuspended in TE Buffer (10 mM Tris-HCl pH 8, 1 mM

EDTA pH 8) and quantified using the UV absorbance. The E. coli pre-tRNAMet82 gene was also cloned into the pUC18 vector with added T7 promoter sequence on the 5’ end and Bbs1 restriction site on the 3’ end. This plasmid was also linearized as above using Bbs1 digestion. The resulting DNA was used as a template for

PCR to create sequence variants in the 5’ leader and add the 21nt sequence. PCR conditions were as follows: 1 U Taq DNA polymerase (Roche 04638964001), 1x supplied PCR buffer, 0.2 mM dNTP mix, 0.5 µM forward and reverse primers, and

18 nM template DNA were combined and heated on a thermocycler to 95˚C for 2 min followed by 40 cycles of 95˚C for 30 sec, 55˚C for 45 sec, and 72˚C for 1 min, followed by a final incubation at 72˚C for 5 min. A portion of the PCR product was then run on a 1% agarose gel at 100 volts and visualized on a transilluminator to

check product size before purifying by phenol-chloroform extraction and ethanol precipitation and resuspension in TE Buffer as described above. From the resulting cDNA template, 20-25 µg was used to synthesize substrate RNA by in vitro transcription using T7 RNA polymerase as described above.

PCR Primers used in this work:

ptRNAMet(WT)21A Forward Primer
5’-TAATACGACTCACTATAGGGAGACCGGAATTCAGATTGATGAAAAAGATGGCTACGTAGCTCAGTTGG-3’

ptRNAMet(-6 to -1)21A Forward Primer
5’-TAATACGACTCACTATAGGGAGACCGGAATTCAGATTGATGAANNNNNNGGCTACGTAGCTCAGTTGG-3’

ptRNAMet(-8 to -3)21A Forward Primer
5’-TAATACGACTCACTATAGGGAGACCGGAATTCAGATTGATGNNNNNNAUGGCTACGTAGCTCAGTTGG-3’

ptRNAMet(-6 to -1)21B Forward Primer
5’-TAATACGACTCACTATACCCTCTGGCCTTAAGTCTAACATGAANNNNNNGGCTACGTAGCTCAGTTGG-3’

ptRNAMet(CGGGUA)21A Forward Primer
5’-TAATACGACTCACTATAGGGAGACCGGAATTCAGATTGATGAACGGGTAGGCTACGTAGCTCAGTTGG-3’

ptRNAMet(AAUGGC)21A Forward Primer
5’-TAATACGACTCACTATAGGGAGACCGGAATTCAGATTGATGAAAAUGGCGGCTACGTAGCTCAGTTGG-3’

ptRNAMet(AUGGUU)21A Forward Primer
5’-TAATACGACTCACTATAGGGAGACCGGAATTCAGATTGATGAAATGGTTGGCTACGTAGCTCAGTTGG-3’

ptRNAMet(AUAUCU)21A Forward Primer
5’-TAATACGACTCACTATAGGGAGACCGGAATTCAGATTGATGAAATATCTGGCTACGTAGCTCAGTTGG-3’

ptRNAMet(AAUUGC)21A Forward Primer
5’-TAATACGACTCACTATAGGGAGACCGGAATTCAGATTGATGAAAATTGCGGCTACGTAGCTCAGTTGG-3’

ptRNAMet(CACGAU)21A Forward Primer
5’-TAATACGACTCACTATAGGGAGACCGGAATTCAGATTGATGAACACGATGGCTACGTAGCTCAGTTGG-3’

ptRNAMet(AUAUUU)21A Forward Primer
5’-TAATACGACTCACTATAGGGAGACCGGAATTCAGATTGATGAAATATTTGGCTACGTAGCTCAGTTGG-3’

ptRNAMet(AUAUAU)21A Forward Primer
5’-TAATACGACTCACTATAGGGAGACCGGAATTCAGATTGATGAAATATATGGCTACGTAGCTCAGTTGG-3’

ptRNAMet(CACGCU)21A Forward Primer
5’-TAATACGACTCACTATAGGGAGACCGGAATTCAGATTGATGAACACGCUGGCTACGTAGCTCAGTTGG-3’

ptRNAMet Reverse Primer
5’-TGGTGGCTACGACGGGAT-3’

Multiple turnover and single turnover reaction kinetics

To visualize the substrate and products of the reactions the pre-tRNA substrates were 5’ end labeled with γ-32P. Multiple turnover reactions were performed in 50 mM Tris-HCl pH 8, 100 mM NaCl, 0.005% Triton X-100, and 17.5 mM MgCl2. To obtain the observed second order rate constants for the wild-type and randomized pre-tRNAMet populations, the enzyme complex was formed by adding 10 nM of P RNA, heating to 95˚C for 3 min, then incubating at 37˚C for 10 min before adding MgCl2 and another 10 min before adding 10 nM of C5 protein.


The substrate containing solution was formed separately by adding 2 µM of pre-tRNA spiked with a negligible amount of 32P labelled pre-tRNA, heating to 95˚C for 3 min, then incubating at 37˚C for 10 min before adding the MgCl2. Reactions were started by using equal volumes of the enzyme and substrate to achieve the final concentrations of 5 or 10 nM RNase P holoenzyme and 1 µM pre-tRNA substrate.

Aliquots of 5µL were taken at given timepoints and the reaction was stopped in equal volume of formamide loading dye + 100 mM EDTA on dry ice. Samples were run on a 15% denaturing polyacrylamide gel at 80 watts then dried and exposed to a phosphorimager screen. Radioactivity was quantified using

ImageQuant software and fraction of reaction calculated by taking the amount of product at each timepoint divided by the addition of substrate and product bands.

Controls in which enzyme was omitted from the reaction were also run with each experiment and showed no cleavage of substrate. Rate constants for the randomized populations were calculated by fitting the data to single exponential lines in Origin software.

For individual substrate assays to measure absolute kcat/Km, reactions were performed as above except the final concentration of substrate was 3µM and holoenzyme 5 nM. After quantification, these reactions were fit to the integrated

Michaelis-Menten equation to obtain the kcat/Km and datapoints from the end of the reaction were removed until the fit to the line reached an Adj. R2 of at least 0.98.

The single turnover rate constants from these data were taken from the initial rate

portion of these reactions where substrate was less than 15% reacted and were fit to a linear function.
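As a rough illustration of the integrated Michaelis-Menten fit described above, the sketch below recovers kcat/Km from a substrate-depletion time course. It is not the Origin procedure used in this work. It assumes the standard integrated form t = (1/Vmax)(S0 − S) + (Km/Vmax) ln(S0/S) and solves for the two parameters by linear least squares with time as the dependent variable. The function name `fit_integrated_mm` and all numerical values are hypothetical, and the trimming of late data points to reach an adjusted R2 of 0.98 is not implemented.

```python
import math

def fit_integrated_mm(times, substrate, s0, enzyme):
    """Estimate kcat, Km and kcat/Km from substrate remaining over time.

    Assumed model (integrated Michaelis-Menten equation):
        t = (1/Vmax) * (S0 - S) + (Km/Vmax) * ln(S0 / S)
    This is linear in a = 1/Vmax and b = Km/Vmax, so a and b are found
    from the 2x2 normal equations of a no-intercept least-squares fit.
    All concentrations must use one unit, and all times one unit.
    """
    x1 = [s0 - s for s in substrate]            # substrate consumed
    x2 = [math.log(s0 / s) for s in substrate]  # logarithmic term
    s11 = sum(p * p for p in x1)
    s22 = sum(q * q for q in x2)
    s12 = sum(p * q for p, q in zip(x1, x2))
    y1 = sum(p * t for p, t in zip(x1, times))
    y2 = sum(q * t for q, t in zip(x2, times))
    det = s11 * s22 - s12 * s12
    a = (y1 * s22 - y2 * s12) / det             # 1 / Vmax
    b = (y2 * s11 - y1 * s12) / det             # Km / Vmax
    return {
        "kcat": 1.0 / (a * enzyme),
        "Km": b / a,
        "kcat_over_Km": 1.0 / (b * enzyme),
    }

# Synthetic time course generated from made-up parameters (not data from this work).
true_kcat, true_km = 2.0, 0.5      # hypothetical values, in s^-1 and uM
enzyme_conc, s_start = 0.005, 3.0  # 5 nM enzyme and 3 uM substrate, in uM
s_values = [2.5, 2.0, 1.5, 1.0, 0.5]
t_values = [((s_start - s) + true_km * math.log(s_start / s)) / (true_kcat * enzyme_conc)
            for s in s_values]
fit = fit_integrated_mm(t_values, s_values, s_start, enzyme_conc)
```

Because kcat/Km depends only on the coefficient of the logarithmic term, it is usually better determined than kcat or Km individually when the substrate concentration is not far above Km.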

Single turnover reactions were performed in 50 mM MES pH 6.0, 100 mM

NaCl, 0.005% Triton X-100, and 17.5 mM MgCl2. To obtain the first order rate constants for wildtype and randomized pre-tRNAMet populations, reactions were set up exactly as described above for multiple turnover conditions, except that the final concentration of holoenzyme was 100 nM, which is expected to be saturating because it is above the reported Km, and the substrate concentration was below 10 nM and consisted only of 32P-labelled RNA. To confirm saturation, single turnover reactions were also performed with the holoenzyme concentration increased to 1 µM and less than 10 nM pre-tRNA; the rate constant was unchanged. Reactions were quantified as described above for multiple turnover conditions.

High-Throughput Sequencing Kinetics (HTS-Kin)

Multiple turnover HTS-Kin reactions were performed exactly as described above, except they were scaled up 10-fold in volume to allow for sufficient material for subsequent analysis by next generation sequencing. Aliquots from reaction timepoints were 160 µL and the reaction stopped by placing in 33 mM EDTA on dry ice. To concentrate samples, phenol-chloroform extraction and ethanol precipitation was performed as described. The purified RNA was then run on a

10% denaturing polyacrylamide gel, bands corresponding to the residual substrate population were identified by exposing the gel to x-ray film and were isolated (gels

were also exposed to phosphorimager screen for later quantification). The excised gel bands containing the residual substrate population were extracted from the gel and placed in 10 mM Tris-HCl pH 8, 1 mM EDTA, and 0.1% SDS at 4˚C overnight to elute the pre-tRNA. The substrate RNA was then equalized between timepoints based on the fraction reacted. First-strand synthesis was performed using 5 µL of the equalized RNA and 1µM reverse primer. Reactions were placed at 72˚C for

10 min and placed on ice for 1 min before adding 100U of SuperScript III reverse transcriptase, and 0.75 mM dNTP mix, and 2.5 mM DTT, and 1x supplied RT Buffer before beginning incubation at 42˚C for 10 min, 50˚C for 40 min, and 55˚C for 20 min before heat inactivating the enzyme at 95˚C for 5 min. Samples were then diluted 1:300 and 1µL of this dilution was used to amplify for high-throughput sequencing. Polymerase chain reaction was performed with 1.5 U Taq DNA polymerase, 0.5 µM forward primers that bound to the 21nt sequence at the 5’ end and contained a barcode for each timepoint and randomized dinucleotide sequence, 0.5 µM reverse primers complementary to the 3’ end, 1x supplied PCR buffer, and 0.2 mM dNTP mix. To reduce bias from amplification, the number of

PCR cycles was selected based on the minimal number needed to visualize the product on a 1% agarose gel stained with ethidium bromide. The PCR protocol consisted of incubation at 95˚C for 2 min followed by 14-18 cycles of 95˚C for 30 sec, 55˚C for 45 sec, and 72˚C for 1 min, followed by a final incubation at 72˚C for

5 min. The PCR products were then purified by centrifugation through Amicon

Ultra-0.5 Centrifugal Filter Devices. Samples were then sent for multiplexed

Illumina sequencing in 75 bp single-end reads on a HiSeq 2500 instrument using equimolar amounts of each timepoint. Single turnover HTS-Kin reactions were set up exactly as described above for single turnover reactions and were not scaled up.

Primer sequences are as follows:

Reverse Transcription Primer

5’-CAAGCAGAAGACGGCATACGATGGTGGCTACGACGGGAT-3’

Forward PCR Primers for 21A Substrates (the indexed barcode, underlined in the original, is the three nucleotides following NN)

5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNATCGGGAGACCGGAATTCAGATTG-3’
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNGATGGGAGACCGGAATTCAGATTG-3’
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNCGAGGGAGACCGGAATTCAGATTG-3’
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNTCCGGGAGACCGGAATTCAGATTG-3’
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNCACGGGAGACCGGAATTCAGATTG-3’
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNTGTGGGAGACCGGAATTCAGATTG-3’
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNACTGGGAGACCGGAATTCAGATTG-3’
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNGTAGGGAGACCGGAATTCAGATTG-3’

Forward Primers for 21B Substrates (the indexed barcode, underlined in the original, is the three nucleotides following NN)

5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNATCCCCTCTGGCCTTAAGTCTAAC-3’
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNGATCCCTCTGGCCTTAAGTCTAAC-3’
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNCGACCCTCTGGCCTTAAGTCTAAC-3’
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNTCCCCCTCTGGCCTTAAGTCTAAC-3’
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNCACCCCTCTGGCCTTAAGTCTAAC-3’
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNTGTCCCTCTGGCCTTAAGTCTAAC-3’
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNACTCCCTCTGGCCTTAAGTCTAAC-3’
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNGTACCCTCTGGCCTTAAGTCTAAC-3’

De-multiplexing of the high-throughput sequencing data was done using customized Perl scripts. Barcodes were first moved to the front of the read for separation by NovoBarcode software from NovoCraft. All reads were then aligned to the 21nt sequence at their 5’ end permitting one mismatch in this region, then a count from this region was done to extract the randomized hexamer sequence in the 5’ leader. A report script was used to count the number of reads for each randomized hexamer.
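The counting step described above can be sketched in Python as follows. This is not the customized Perl code or the NovoBarcode workflow used in this work. It assumes reads that have already been separated by barcode and trimmed so that they begin at the 21nt sequence, and it assumes that the randomized hexamer sits five nucleotides (the ATGAA of the forward primers listed earlier) downstream of that sequence. The names `count_hexamers`, `SPACER_LEN`, and `hamming` are hypothetical.

```python
from collections import Counter

ANCHOR_21A = "GGGAGACCGGAATTCAGATTG"  # 21nt 5' sequence of the 21A substrates
SPACER_LEN = 5    # assumption: 'ATGAA' separates the anchor from the hexamer
HEXAMER_LEN = 6   # randomized positions N(-6) to N(-1)

def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def count_hexamers(reads, anchor=ANCHOR_21A, max_mismatch=1):
    """Count reads per randomized hexamer.

    A read is kept only if its first len(anchor) bases match the anchor
    with at most max_mismatch mismatches and the extracted hexamer
    contains only A, C, G, or T.
    """
    start = len(anchor) + SPACER_LEN
    end = start + HEXAMER_LEN
    counts = Counter()
    for read in reads:
        if len(read) < end:
            continue  # too short to contain the hexamer
        if hamming(read[:len(anchor)], anchor) > max_mismatch:
            continue  # anchor region does not align
        hexamer = read[start:end]
        if set(hexamer) <= set("ACGT"):
            counts[hexamer] += 1
    return counts

# Toy reads: exact match, one anchor mismatch, two anchor mismatches, too short.
good = ANCHOR_21A + "ATGAA" + "AAAGAT" + "GGCTACG"
reads = [good, "A" + good[1:], "AA" + good[2:], "ACGT"]
counts = count_hexamers(reads)
```

Only the first two toy reads pass the one-mismatch filter, so both are assigned to the genomically encoded leader AAAGAT.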

The number of sequence reads and the information on reaction progress are analyzed using internal competition kinetics to compute a relative rate constant for each substrate calibrated to the genomically encoded substrate sequence.


Briefly, the ratio of the observed rates in a reaction containing multiple substrates is:

$$\frac{v_2}{v_1} = \frac{(V/K)_2}{(V/K)_1}\left(\frac{S_2}{S_1}\right)$$

where the ratio of the V/K values of substrate S2 to the reference substrate S1 is the defined krel for S2. Therefore:

$$\frac{(V/K)_2}{(V/K)_1} = k_{rel}$$

In this work, the reference substrate S1 contains the genomically encoded

5’ leader sequence (AAAGAU), meaning that any substrate with krel > 1 is processed faster than the wildtype sequence, while substrates with krel < 1 are slower than the wildtype. Integration of the equation above gives:

$$k_{rel} = \frac{\ln\left(S_2/S_{2,0}\right)}{\ln\left(S_1/S_{1,0}\right)}$$

where S1,0 and S2,0 are the initial concentrations of the reference substrate and a particular substrate variant, respectively, and S1 and S2 represent the concentrations of these substrates at a specific time in the reaction. Due to the nature of these experiments, we quantify the amount of each substrate as a ratio, R, for example at time zero:

$$R_{i,0} = \frac{S_{i,0}}{S_{1,0}}$$

or at a specific time in the reaction:

$$R_i = \frac{S_i}{S_1}$$

The initial mole fraction, Xi, then becomes:

$$X_i = \frac{S_i}{\sum_{i=1}^{n} S_{i,0}}$$

and the fraction of reaction of the entire substrate population is:

$$f = 1 - \sum_{i=1}^{n} X_i$$

Thus, rearranging and taking this information together provides the final equation.

$$k_{rel} = \frac{\ln\left[\dfrac{(R_i/R_{i,0})\,(1-f)}{\sum_{j} X_j\,(R_j/R_{j,0})}\right]}{\ln\left[\dfrac{(1-f)}{\sum_{j} X_j\,(R_j/R_{j,0})}\right]}$$

where R is shortened to encompass the ratio of each substrate variant to the reference; R0 symbolizes this same ratio before the start of the reaction. X here symbolizes the mole fraction for each substrate variant. The ratio of substrates is quantified by the number of raw sequence reads from high-throughput sequencing and the total fraction of reaction was determined by quantification of polyacrylamide gels by phosphorimager and ImageQuant software as described above.
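The internal competition calculation can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the analysis code used in this work: X is taken as the mole fraction of each variant in the starting population, estimated here from time-zero read counts; every variant is assumed to have nonzero reads at both time points; and the function name `krel_from_reads` and all numbers are hypothetical. The term (1 − f) divided by the sum of X·R/R0 equals the fraction of reference substrate remaining, S1/S1,0.

```python
import math

def krel_from_reads(reads_0, reads_t, reference, f):
    """Relative rate constants from read counts by internal competition.

    reads_0, reads_t: dict mapping variant -> read count at time zero and
                      at the sampled time point (residual substrate).
    reference:        key of the reference substrate (krel defined as 1).
    f:                fraction of the whole substrate pool that has
                      reacted, measured independently (here, by gel).
    """
    total_0 = sum(reads_0.values())
    ref_0, ref_t = reads_0[reference], reads_t[reference]
    # R / R0 for each variant: change in its ratio to the reference.
    change = {v: (reads_t[v] / ref_t) / (reads_0[v] / ref_0) for v in reads_0}
    # Sum of X * (R / R0), with X estimated from the time-zero reads.
    weighted = sum((reads_0[v] / total_0) * change[v] for v in reads_0)
    ref_remaining = (1.0 - f) / weighted   # equals S1 / S1,0
    denom = math.log(ref_remaining)
    return {v: math.log(change[v] * ref_remaining) / denom for v in reads_0}

# Synthetic three-substrate pool with made-up rate constants.
k_true = {"AAAGAU": 1.0, "AUAUUU": 3.0, "CACGCU": 0.2}
start = {"AAAGAU": 100.0, "AUAUUU": 200.0, "CACGCU": 50.0}
tau = 0.5  # arbitrary extent of reaction for the reference substrate
remaining = {v: start[v] * math.exp(-k_true[v] * tau) for v in start}
f = 1.0 - sum(remaining.values()) / sum(start.values())
reads_t = {v: 7.0 * remaining[v] for v in remaining}  # arbitrary read depth
estimated = krel_from_reads(start, reads_t, "AAAGAU", f)
```

The arbitrary sequencing-depth factor cancels because only ratios to the reference enter the calculation, which is why the independently measured fraction of reaction f is needed to fix the absolute extent of reaction.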

Quantitative modeling of sequence specificity

The position weight matrix model treating each nucleotide in the randomized substrate region as independent and non-interacting with its neighbors takes into account only the position and identity of the nucleotide at each position in a given substrate variant. This model follows the equation:

$$\ln(k_{rel}) = \sum_{i=1}^{6} \left(a_i A_i + c_i C_i + g_i G_i + u_i U_i\right)$$

where ai, ci, gi, and ui take the integer value 0 or 1 for each substrate (1 if the nucleotide at that position has the indicated identity and 0 if it does not) and Ai, Ci, Gi, and Ui represent the linear coefficients for that nucleotide and position. This is calculated for each position in the binding site and values are summed as shown. This model was run as an R script which fit the data by linear regression and was used to predict the krel for each substrate variant.
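To make the encoding concrete, the following Python sketch shows how a hexamer maps onto the 24 indicator variables and how ln(krel) is predicted once coefficients are available. It is only an illustration: the fits in this work were run as an R script, the coefficient values below are invented, and the names `one_hot` and `predict_ln_krel` are hypothetical. Hexamers are written 5′ to 3′, so index i = 1 (N(−1)) is taken to be the last character of the string.

```python
NUCS = "ACGU"

def one_hot(hexamer):
    """Return the 24 indicators (a_i, c_i, g_i, u_i) for i = 1..6.

    The hexamer is written 5' to 3' (N(-6)..N(-1)); it is reversed so that
    the first four values describe N(-1), the next four N(-2), and so on.
    """
    return [1 if nuc == n else 0 for nuc in reversed(hexamer) for n in NUCS]

def predict_ln_krel(hexamer, coeffs):
    """Sum the linear coefficients selected by the indicators.

    coeffs[i - 1] is a dict mapping nucleotide -> coefficient at N(-i).
    """
    return sum(coeffs[i][nuc] for i, nuc in enumerate(reversed(hexamer)))

# Invented coefficients: zero everywhere except A at N(-2) and U at N(-3).
coeffs = [{n: 0.0 for n in NUCS} for _ in range(6)]
coeffs[1]["A"] = 0.3   # hypothetical coefficient for A at N(-2)
coeffs[2]["U"] = 0.5   # hypothetical coefficient for U at N(-3)

encoded = one_hot("AAAUAU")
prediction = predict_ln_krel("AAAUAU", coeffs)
```

In an actual regression the four indicators at each position sum to one, so one nucleotide per position has to serve as the baseline (or an equivalent constraint applied) for the coefficients to be identifiable.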

The PWM+IC model considered not only nucleotide identity and position in the randomized region but also the position and identity of other nucleotides in the binding site. This model follows the equation:

$$\ln(k_{rel}) = \sum_{i=1}^{6} \left(a_i A_i + c_i C_i + g_i G_i + u_i U_i\right) + \beta_j I_j$$

where the second part of the equation has been added to include one pairwise interaction term of two nucleotide-position pairs. This model was estimated 240 times, once for each of the 240 possible interactions. For each of these couplings, βj is the linear coefficient for interaction j, and Ij is an indicator variable. Ij is 1 for all substrates with that specific pair of nucleotides, and 0 otherwise. (If the first interaction examined is A(-1)-A(-2), then Ij is 1 for all sequences that have a1 = 1 and a2 = 1, and 0 otherwise.) Each interaction term which had an absolute t-value greater than 3.5 (p < 0.005) was recorded as a significant interaction. A final model was built using stepwise regression, starting with all of the significant pairwise interactions identified in the first step, resulting in a final model with significant pairwise interaction terms in addition to the variables used in the base PWM model. This model was also run as an R script and fit to the data by linear regression, and the results were then used to predict the krel of all substrate variants.
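The count of 240 candidate interactions follows from 15 pairs of positions times 16 nucleotide pairs. A small Python sketch of the enumeration and of the indicator variable Ij is given below. It illustrates only the bookkeeping, not the regression or the stepwise model building, and the names `pairwise_interactions` and `indicator` are hypothetical. As in the example above, position i corresponds to N(−i) and hexamers are written 5′ to 3′.

```python
from itertools import combinations, product

NUCS = "ACGU"
POSITIONS = range(1, 7)  # i = 1..6 corresponds to N(-1)..N(-6)

def pairwise_interactions():
    """List every ((position, nucleotide), (position, nucleotide)) pair
    drawn from two different positions of the randomized hexamer."""
    return [((p1, n1), (p2, n2))
            for p1, p2 in combinations(POSITIONS, 2)
            for n1, n2 in product(NUCS, repeat=2)]

def indicator(hexamer, interaction):
    """I_j: 1 if the hexamer carries both nucleotides of the interaction.

    The hexamer is written 5' to 3', so N(-i) is hexamer[-i].
    """
    (p1, n1), (p2, n2) = interaction
    return int(hexamer[-p1] == n1 and hexamer[-p2] == n2)

interactions = pairwise_interactions()
a1_a2 = ((1, "A"), (2, "A"))  # the A(-1)-A(-2) coupling used as the example
```

For instance, `indicator("CCCCAA", a1_a2)` is 1, whereas the genomically encoded leader AAAGAU gives 0 because it has U at N(−1).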


Figure 2-S1 Analysis of raw HTS-Kin datasets obtained with the 21A or 21B terminal 5’ sequence. a) Comparison of the krel values observed for the pre-tRNAMet N(-6 to -1)21A population using HTS-Kin to values predicted by fitting the data to a PWM model of sequence specificity (model 1). b) Comparison of the observed krel values for the pre-tRNAMet N(-6 to -1)21A population to predicted values from fitting to a PWM model including coupling coefficients as described in the main text (model 2). c) Linear coefficients from the PWM portion of model 2 for each nucleotide (x-axis) at each position in the randomized 5’ leader (y-axis). d) Heatmap of coupling coefficients predicted by model 2. Position in the 5’ leader and nucleotide identity are indicated on each axis and the coupling coefficient indicated at the vertex where positive coefficients are red and negative are blue. The yellow box surrounds coupling coefficients between nucleotides contacting the RNA and protein subunits. The exact same analysis is shown for HTS-Kin reactions performed with pre-tRNAMet N(-6 to -1)21B in e-h.


Figure 2-S2 Quantitative analysis of single turnover HTS-Kin datasets performed with the 21A terminal 5’ sequence. a) Comparison of the observed krel values measured for the pre-tRNAMet N(-6 to -1)21A population using HTS-Kin under single turnover reaction conditions to the values predicted by fitting the data to the PWM model (model 1). b) Comparison of the observed krel values measured under single turnover conditions for the pre-tRNAMet N(-6 to -1)21A population to the values predicted by a PWM model including coupling coefficients (model 2). c) The linear coefficients from the PWM portion of model 2 are shown for each nucleotide (x- axis) at each position in the randomized 5’ leader (y-axis). d) Heatmap of the coupling coefficients predicted by model 2. Position and nucleotide identity are indicated on each axis and the coupling coefficient indicated at the vertex where positive coefficients are red and negative are blue. The yellow box surrounds coupling coefficients between nucleotides contacting the RNA and protein subunits (note the smaller range of the coefficients compared to those in multiple turnover HTS-Kin).


Figure 2-S3 Single substrate reactions using pre-tRNAMet with varying 5’ leader sequences. a) Representative single substrate multiple turnover reactions with 3 µM pre-tRNAMet substrate with mutant 5’ leader sequences containing the terminal 21A nucleotides and 5 nM RNase P. Depletion of the substrate (x-axis) is shown over reaction time (y-axis). The identities of the 5’ leader sequences at N(-6) to N(-1) are shown in the legend. b) Second order rate constants were calculated from at least three single substrate multiple turnover reactions as shown in panel A by fitting the graph of substrate depletion over time to the integrated Michaelis- Menten equation. The absolute kcat/Km is plotted compared to the average krel from three independent HTS-Kin experiments. Error bars are shown indicating standard deviation of both experiments. c) Initial rates of the same multiple turnover reactions were fit to a line to measure the observed first order rate constant. The kobs is compared with observed krel in HTS-Kin. Error bars are shown indicating standard deviation of both experiments.


Figure 2-S4 The 5’ leader sequences from endogenous pre-tRNA from E. coli show variation in specificity. A plot of the average krel determined from three multiple turnover HTS-Kin experiments with pre-tRNAMet N(-6 to -1)21A. The placement of 5’ leader sequences that correspond to those in the 87 endogenous pre-tRNA in E. coli are indicated by the black circles and shows a range of krel.


Chapter 3: Optimization of high-throughput sequencing kinetics for determining enzymatic rate constants of thousands of RNA substrates

Reprinted from Anal Biochem. Vol. 510. Niland, C.N.; Jankowsky, E.; Harris, M.E. “Optimization of high-throughput sequencing kinetics for determining enzymatic rate constants of thousands of RNA substrates.” Pg. 510:1-10. Copyright 2016 with permission from Elsevier


Abstract

Quantification of the specificity of RNA binding proteins and RNA processing enzymes is essential to understanding their fundamental roles in biological processes. High-throughput sequencing kinetics (HTS-Kin) uses high-throughput sequencing and internal competition kinetics to simultaneously monitor the processing rate constants of thousands of substrates by RNA processing enzymes. This technique has provided unprecedented insight into the substrate specificity of the tRNA processing endonuclease ribonuclease P. Here, we investigated the accuracy and robustness of measurements associated with each step of the HTS-Kin procedure. We examine the effect of substrate concentration on the observed rate constant, determine the optimal kinetic parameters, and provide guidelines for reducing error in amplification of the substrate population.

Importantly, we found that high-throughput sequencing and experimental reproducibility contribute to error, and these are the main sources of imprecision in the quantified results when otherwise optimized guidelines are followed.


Introduction

The ability of ribonucleases, ribonucleoproteins, and RNA processing enzymes to recognize multiple alternative substrates is essential to cellular gene expression. For example, the RNA substrates for key enzymes such as the ribosome, spliceosome, tRNA, and mRNA processing enzymes can vary greatly in sequence and/or structure (105-109). Given the broad range of alternative substrates that are recognized by these enzymes, their specificity cannot be entirely captured by sequence motif analysis, homology modeling, or similar approaches that consider only genomically encoded or optimal substrates (110).

Moreover, it is well established that a biologically relevant investigation of enzyme specificity involves understanding how substrates compete for association

(68,71,111). In vitro structure–function experiments comparing the kinetics of individual RNA substrate variants provide a powerful way to test potential specificity determinants. However, this approach has limited throughput and, therefore, is not practical for achieving a comprehensive description of specificity.

A more complete understanding of the specificity of RNA binding proteins and RNA processing enzymes can be gained by analysis of the processing rate constant or equilibrium binding constant for all possible substrate variants (110).

Such data provide a means for identifying sequence and structure determinants of specificity and comprehensively analyzing how sequence variation affects the reaction mechanism (112). This level of understanding is necessary for prediction of the distribution of enzyme binding sites in the transcriptome and designing RNAs

and RNA binding proteins with novel specificities (113-115). By analyzing the effect of all possible variations in substrate RNA sequence on rate constants or equilibrium constants, the effect of sequence variation at one position on the sequence preference elsewhere in the binding site is revealed (75). Such coupling between the energetic contributions of nucleotides in the RNA substrate is expected due in part to the complex structure and folding of RNA. Quantitative analysis of the interdependence between the contributions of individual nucleotides to recognition by RNA binding proteins and RNA processing enzymes has the potential to reveal important elements of substrate structure as well as their intrinsic sequence specificity.

Recently, powerful new approaches have been developed aimed at comprehensively analyzing RNA sequence specificity, including SELEX

(systematic evolution of ligands by exponential enrichment) (116), Bind-n-Seq

(117), and HiTS-RAP (high-throughput sequencing–RNA affinity profiling) (118).

However, these techniques monitor only equilibrium processes or provide information on optimal substrates only and, therefore, do not analyze the full complement of substrate variants or require specialized instrumentation. We developed a new technique termed high-throughput sequencing kinetics (HTS-Kin) that overcomes these limitations, allowing quantitative measurement of the second-order rate constants of thousands of substrate variants in a single reaction using standard molecular biology methods and standard Illumina sequencing protocols. Initial application of HTS-Kin was used to comprehensively analyze the specificity of C5, the protein subunit of the transfer RNA processing ribonucleoprotein enzyme RNase P from Escherichia coli, for its corresponding binding site in the 5′ leader of precursor tRNA (75). The affinity distribution of C5 was found to resemble those of highly specific binding proteins (75).

Unlike these specific proteins, however, C5 does not bind its physiological RNA targets with the highest affinity but rather binds them with affinities near the median of the distribution. Thus, the data not only delineated the rules governing substrate recognition by C5 but also revealed that apparently nonspecific and specific RNA-binding modes might not differ fundamentally but represent distinct parts of common affinity distributions.

HTS-Kin continues to provide important new insights into RNase P molecular recognition and is amenable to a broad range of applications. Therefore, it is necessary to consider sources of uncertainty, evaluate their contribution to error in determination of relative rate constants by this method, and propose strategies for minimizing or avoiding inaccuracies in interpretation of rate constants calculated from these data. In HTS-Kin, the relative rate constants for in vitro RNA processing reactions are determined by analyzing the change in the concentration of individual RNAs in the unreacted substrate population compared with a reference substrate using internal competition kinetics. The change in concentration of each substrate is calculated from the number of reads obtained by Illumina sequencing of the substrate population at select time points in the reaction relative to a reference substrate.


Thus, several factors in the HTS-Kin technique may limit accuracy and require optimization to minimize error. These factors include (i) accounting for the variation in initial substrate concentrations in randomized RNA populations, (ii) choosing the appropriate time scale for accurately capturing the range of rate constants in the population, (iii) selecting an appropriate reference substrate as an internal standard, (iv) preparing the cDNA library by reverse transcription and polymerase chain reaction (PCR) amplification, (v) the Illumina sequencing itself, and (vi) error due to experimental reproducibility.

Here, we examine each of these factors individually with respect to its contribution to the variation in the observed affinity distributions measured by HTS-Kin. In general, for optimal HTS-Kin experiments, early reaction times should be used to minimize rate constant compression. Although substrate amplification must be maintained in the linear range, the error due to small differences in cycle number is negligible. In addition, quantification of rate constants for slow reacting substrates is subject to error from Illumina sequencing, yet a high degree of experimental reproducibility is achieved for most substrate sequence variants.

Results and Discussion

Determination of relative rate constants for in vitro RNA processing reactions by HTS-Kin

Internal competition kinetics, which HTS-Kin uses to calculate relative rate constants from Illumina sequencing data, is based on the fact that variation in specificity is due to differences in the activation energies for kcat/Km of alternative substrates for the same enzyme (Figure 3-1A). There are several advantages and potential disadvantages in using internal competition kinetics; therefore, it is important to consider these factors in the context of their application in HTS-Kin.

The kinetics of such reactions containing multiple alternative substrates have been described previously (49,71,111), and the equations and derivations for internal competition were recently reviewed and developed for quantification of both precursor and product ratios by Anderson (119). Briefly, as illustrated in Scheme 1, a single population of enzyme (E) can combine with multiple substrates (S1, S2, S3, …, Si).

Figure 3-1. High-throughput sequencing kinetics (HTS-Kin) measures processing rates of thousands of RNA substrates using internal competition kinetics. (A) Reaction coordinate diagram depicting the processing of multiple pre-tRNA substrates by RNase P. As the reaction progresses, the activation energy for kcat/Km determines the relative rate of product formation; thus, favorable substrates (blue) are depleted more quickly, whereas unfavorable substrates (orange) are minimally processed and accumulate transiently relative to the wild-type substrate (black). (B) The substrate and product at different time points in the reaction are separated on a denaturing polyacrylamide gel (left), and the residual substrate population is isolated for high-throughput sequencing. Plotting the normalized reads for each substrate variant from Illumina sequencing shows that as the reaction progresses, substrates with fast krel values are depleted from the residual substrate population, whereas those with slow krel values accumulate (right). (C) An affinity distribution measured by HTS-Kin for a pre-tRNAMetN(−6 to −1) randomized population is shown as the number of substrate variants with a given krel value and depicts the entire range of effects of this variation on enzyme processing. By definition, the wild-type pre-tRNA has a krel of 1, and substrates are calibrated to this as either faster (krel > 1) or slower (krel < 1) than the reference.

Scheme 1: Association of RNase P (E) with multiple pre-tRNA substrates (S1, S2, S3, …, Si).

The rate of product formation of any individual substrate (vobs1) is proportional to the fraction of total enzyme in the ES1 form (70). Additional alternative substrates deplete ES1, and consequently the rate of formation of P1, by acting as competitive inhibitors. For alternative substrates, here the substrate variant S2 and wild-type reference S1, the multiple turnover rate equation is essentially that for competitive inhibition and the ratio of the two observed rates simplifies to (71,111,120,121).

Thus, the relative rate constant, or the ratio of the processing rate constants for the two competing substrates, is the ratio of their respective kcat/Km values multiplied by the ratio of their concentrations. Integration of the above general equation describes how the ratio of substrates will change over the time course for first-order and pseudo-first-order reactions (70,72):

In Eq. (2), the values of S1,0 and S2,0 are the initial concentrations of the two substrates, and S1 and S2 are their concentrations after a specific time interval.
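The relations described in words above can be written out explicitly. The following is a sketch reconstructed from the verbal description (the ratio of kcat/Km values multiplied by the ratio of concentrations, followed by integration), not a verbatim reproduction of Eqs. (1) and (2); the symbols v_{obs,i} and k_{rel} and the bracket notation are chosen here for illustration.

```latex
% Alternative-substrate form described for Eq. (1): ratio of the observed rates
\frac{v_{obs2}}{v_{obs1}}
  = \frac{(k_{cat}/K_m)_2\,[S_2]}{(k_{cat}/K_m)_1\,[S_1]}
  = k_{rel}\,\frac{[S_2]}{[S_1]}

% Because v_{obs,i} = -d[S_i]/dt, dividing the two rates eliminates time:
\frac{d[S_2]}{d[S_1]} = k_{rel}\,\frac{[S_2]}{[S_1]}
\quad\Longrightarrow\quad
\frac{d[S_2]}{[S_2]} = k_{rel}\,\frac{d[S_1]}{[S_1]}

% Integrating from the initial concentrations gives the form described for Eq. (2):
\ln\frac{[S_2]}{[S_{2,0}]} = k_{rel}\,\ln\frac{[S_1]}{[S_{1,0}]}
\quad\Longrightarrow\quad
k_{rel} = \frac{\ln\left([S_2]/[S_{2,0}]\right)}{\ln\left([S_1]/[S_{1,0}]\right)}
```

Only the fractions remaining, [S_2]/[S_{2,0}] and [S_1]/[S_{1,0}], enter the final expression, so any constant factor relating concentration to a measured signal cancels.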

This expression can be integrated and rearranged to give (75).

where Ri is the ratio S2/S1 determined at fraction of total substrate reacted f and Ri,0 is the ratio S2/S1 at the start of the reaction. This expression is valid for any analytical method to measure S2/S1. In the case of HTS-Kin, these ratios are calculated from the number of Illumina sequence reads obtained from libraries made from the substrate population at the start of the reaction and at specific fractions of total substrate reacted.
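As an illustration of how read counts and the fraction of reaction combine into a relative rate constant, the sketch below uses a mole-fraction form of the calculation. It assumes that read counts within each library are proportional to substrate abundance and that f is the fraction of total substrate reacted; under those assumptions the fraction of variant i remaining is X_i(1 − f)/X_i,0, where X is the mole fraction from reads. The function name `krel_from_reads`, the dictionary layout, and all numbers are hypothetical, and the expression is an algebraic equivalent chosen for clarity rather than a transcription of Eq. (3).

```python
import math


def krel_from_reads(reads_t0, reads_t, f, reference):
    """Estimate k_rel for every variant from read counts (hypothetical helper).

    reads_t0, reads_t: dicts mapping sequence -> reads in the unreacted
        substrate pool at the start and at the sampled time point.
    f: fraction of total substrate reacted at the sampled time point.
    reference: key of the reference substrate, whose k_rel is 1 by definition.
    Variants with zero reads are not handled in this sketch.
    """
    total_t0 = sum(reads_t0.values())
    total_t = sum(reads_t.values())

    def fraction_remaining(seq):
        # Mole fraction in the residual pool, scaled by the amount of pool
        # left (1 - f), divided by the starting mole fraction: S_i / S_i,0.
        x0 = reads_t0[seq] / total_t0
        xt = reads_t[seq] / total_t
        return xt * (1.0 - f) / x0

    ln_ref = math.log(fraction_remaining(reference))
    return {seq: math.log(fraction_remaining(seq)) / ln_ref for seq in reads_t0}


# Synthetic check with made-up numbers: three variants with known k_rel.
true_krel = {"AAAGAU": 1.0, "fast": 5.0, "slow": 0.2}
start = {"AAAGAU": 1000.0, "fast": 400.0, "slow": 2500.0}  # unequal on purpose
tau = 0.1  # arbitrary extent of reaction in units of the reference rate constant
end = {s: start[s] * math.exp(-true_krel[s] * tau) for s in start}
f = 1.0 - sum(end.values()) / sum(start.values())

estimates = krel_from_reads(start, end, f, reference="AAAGAU")
for seq, value in estimates.items():
    print(seq, round(value, 3))
```

In this synthetic case the starting abundances are deliberately unequal, and the sequencing depth of the later library can be rescaled without changing the result, because only mole fractions and f enter the calculation.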

HTS-Kin reactions involving RNase P require the following steps, all of which have specific features that can impact the reproducibility and contribute to the error in the calculation of relative rate constants. First, a population of pre-tRNA randomized in the 5′ leader at N(–6) to N(–1) that contacts both the RNA and protein subunits of RNase P is synthesized. Randomization is accomplished using the cloned wild-type pre-tRNAMet gene as a template for PCR amplification in which the forward primers encode the randomized positions. The randomized DNA pool is then used for in vitro transcription to generate the randomized pre-tRNA substrate pool. Although the initial synthetic DNA population is synthesized to result in an approximate equimolar distribution of nucleotides at each position, it is unlikely that this distribution is maintained throughout the PCR and workup of the substrate pool. However, the initial distribution of substrate variants is assayed directly by Illumina sequencing. Moreover, as described in more detail below, the use of internal competition kinetics minimizes the effects of systematic inaccuracies in measurements of substrate ratios and does not rely on an equimolar distribution in the initial precursor population.

The substrate pool is reacted with RNase P, and substrate and product from individual reaction time points are separated on a denaturing polyacrylamide gel. The reaction progress is quantified, and the substrate RNA populations are isolated at different time points and made into libraries for Illumina sequencing by reverse transcription and PCR amplification with a unique barcode for each time point (Figure 3-1B). Monitoring the number of Illumina reads of each sequence as a function of time shows that as the reaction progresses, favorable substrates are depleted from the residual substrate population while those with slow rate constants accumulate (Figure 3-1B). Using Eq. (3), this information is used to calculate krel values for all 4096 substrate variants. These data represent the entire range of effects of this 5′ leader variation on enzyme processing. This is best exemplified in an affinity distribution, shown in Figure 3-1C as a histogram of the number of substrate variants with a specific relative rate constant, krel.

The application of internal competition kinetics in this method offers several important advantages with respect to accuracy and precision of the resulting rate constant distribution. Due to the use of substrate ratios to calculate rate constants, systematic inaccuracies in the determination of these ratios, which may occur during several steps in the process, are canceled. In addition, experimental variation in krel calculation is minimized because all substrates react in the same reaction vessel and under identical reaction conditions. Nonetheless, disadvantages include the necessity to optimize several key reaction parameters and the potential for contributions from multiple sources of stochastic error that may propagate through the experiment. In the following sections, we consider the advantages and disadvantages at each step in the application of HTS-Kin with respect to reproducibility and minimization of error. First, we consider factors that may skew results or require optimization in the calculation of relative rate constants from substrate ratios. Then, we consider factors affecting the workup and measurement of the substrate ratios themselves by Illumina sequencing.

The magnitude of krel is independent of the distribution of substrate mole fractions in the initial precursor RNA population

One key factor apparent from inspection of Eq. (1) is that this expression is valid for any initial values of S1 and S2. Accordingly, the observed krel values measured by HTS-Kin should necessarily be independent of the concentration of each individual substrate in the randomized pre-tRNA population. To test this in the application of HTS-Kin, we calculated the apparent mole fraction for each substrate variant in the initial substrate pool by dividing its number of sequencing reads by the total number of reads for all substrate variants and then compared these values with the calculated krel for that substrate.

As shown in Figure 3-2A, a density plot of the observed krel plotted versus the mole fraction in the initial substrate population clearly shows that the two distributions are uncorrelated. In addition, a comparison of the ratio of high-throughput sequencing reads of mutant substrate to wild type in the starting material to the observed rate constant also reveals no correlation between these two parameters, as expected (Figure 3-2B). In contrast, the change in the ratio of Illumina sequence reads for each substrate variant at a specific fraction of reaction relative to the ratio in the initial substrate population necessarily defines the magnitude of the observed krel calculated by Eq. (3). In Figure 3-2C, the change in Illumina sequencing reads over the course of the reaction is plotted versus the magnitude of the calculated krel value to illustrate this fact. Thus, these results are consistent with principles of alternative substrate kinetics introduced above and described in more detail elsewhere (119).

Figure 3-2. Analysis of the dependence of the observed krel on the distribution of substrate mole fractions in the initial precursor RNA population. From a single HTS-Kin reaction (Experiment 1) with pre-tRNA substrate variants randomized in the 5′ leader at N(−6 to −1), various aspects from Illumina sequencing are compared with the calculated krel. (A) Density plot of the mole fraction of each substrate variant (calculated as the ratio of number of reads of that substrate to that for all substrates at T0) compared with its calculated krel, with the number of substrates in a particular area on the graph indicated by the shade of blue. (B) Density plot of the raw number of reads of each substrate variant in the starting material compared with the calculated krel of that substrate. (C) Ratio of raw reads of a substrate variant at a defined time in the reaction to that in the starting material (Illumina reads for Sn at 12% reacted/Illumina reads for Sn at 0% reacted) shows an exponential decrease with increasing krel.
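The exponential trend in Figure 3-2C, and the absence of any dependence on starting abundance, can be seen by rearranging the integrated alternative-substrate relation for a variant i against the reference substrate 1. This is a sketch in notation chosen here (k_{rel,i} and bracketed concentrations are illustrative symbols), assuming the integrated relation takes the logarithmic-ratio form described in the text.

```latex
% Fraction of variant i remaining, expressed through the reference:
\frac{[S_i]}{[S_{i,0}]}
  = \left(\frac{[S_1]}{[S_{1,0}]}\right)^{k_{rel,i}}
  = \exp\!\left(k_{rel,i}\,\ln\frac{[S_1]}{[S_{1,0}]}\right)

% The reference is consumed, so \ln([S_1]/[S_{1,0}]) < 0 and the surviving
% fraction of variant i decays exponentially as k_{rel,i} increases.
% Only the ratio [S_i]/[S_{i,0}] appears: the initial mole fraction of
% variant i does not enter the expression.
```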

Optimization of reaction kinetics and choice of internal reference for calculation of krel

Two additional aspects of the application of internal competition kinetics to calculate krel that are self-evident in Eq. (3) are the selections of the fraction of reaction (f) and the reference substrate (essentially S1 from Eq. (1)). For the application of HTS-Kin to RNase P specificity, the genomically encoded leader sequence for the pre-tRNAMet served as the reference substrate. For ease of interpretation, the use of a wild-type sequence as the reference has the obvious advantage that the absolute magnitude of krel reflects the fold difference in observed rate constant from a biologically relevant standard. Note, however, that the genomically encoded reference may or may not be the optimal substrate with respect to enzyme processing.

It follows that substrate variants with fast rate constants will exhibit a large change in substrate ratio relative to the reference (Ri) per unit time, whereas slower reacting species will result in only small changes in the observed ratios. A disadvantage is that precision of every krel measurement depends on the level of error in the measurements of the wild-type reference substrate (S1 and S1,0). The contribution of this error to the calculated krel value may limit precision of rate constant measurements that are significantly different from the reference. To address this potential limitation, 21 different reference substrate variants spanning a wide range from fast to slow processing by RNase P were each used to calculate the krel of all 4096 substrate variants. The krel determined for a single substrate variant using each of the 21 different reference substrates was averaged, and a standard deviation was calculated. References that produced a krel outside of the standard deviation for any substrate were eliminated. The remaining 15 reference substrates were used to calculate an average krel for each pre-tRNA, and a plot of this analysis is shown in Figure 3-3A. The results clearly show a high correlation with krel determined using the wild-type reference. Error bars on the plot of krel values determined using multiple references provide an estimate of maximum uncertainty from using a single reference.

As noted previously (74,122), a second factor that is integral in optimizing the range of effects that can be measured by HTS-Kin is the selection of appropriate time points for calculation of the expected range of krel values. This is apparent from Eq. (3), where the fraction of reaction is used to calculate each krel. The primary consequence of choosing inappropriate time points is illustrated in Figure 3-3B, where affinity distributions calculated from samples taken at different time points in the same reaction are compared. The affinity distribution determined from an early time point provides the greatest range in krel values, whereas distributions from later time points with higher levels of substrate conversion exhibit compression in the range of observed krel values, as discussed previously (74,119). The basis for this effect is illustrated in the inset of Figure 3-3B with simulated kinetics for RNAs with different rate constants. At very early points in the reaction, only the fastest substrates will be processed, making calculation of krel for the vast majority of substrates highly error prone because their concentrations have changed little over this short time. Conversely, calculating krel from late points in the reaction provides a poor measure of processing rates because the fastest substrates are nearly consumed to completion, making the measurement of their krel inaccurate. At these later times in the reaction, the substrate ratios approach values that reflect incomplete reactivity of the initial RNA population due to misfolding or other chemical differences. In addition, substrates with slower rate constants are afforded sufficient time to reach similar fractions of reaction to their faster counterparts. As a result, the observed krel values become artificially faster, as discussed previously (74,122).

Figure 3-3. Optimization of reaction kinetics and choice of internal reference for calculation of krel. (A) Calculation of krel from HTS-Kin data from Experiment 1 using the wild-type 5′ leader (AAAGAU) as a reference compared with using 15 5′ leader variants spanning a range of krel as references in combination. The results of a single HTS-Kin reaction of pre-tRNAMetN(−6 to −1) with RNase P are investigated for their processing rate at increasing fractions of total substrate reacted. Inset: Simulation of the reaction progress of substrate variants with a range of rate constants. The simulated rate constants are indicated in the legend, and an identified optimal time for isolation and calculation of krel by HTS-Kin is indicated by the dashed black line. (B) Affinity distributions of the number of substrates with an indicated krel at various times in the same HTS-Kin reaction indicated in the legend. (C,D) Comparison of the krel determined from the same HTS-Kin reaction at different time points is shown as a density plot. Compression of rate constants is observed strongly at late times in the reaction.

As shown in Figure 3-3C and D, we investigated the effect of varying the fraction of substrate reacted (f), from approximately 0.1 to 0.5, on the determined krel for each substrate variant. In this experiment, a single HTS-Kin reaction containing the same randomized pre-tRNA pool was sampled at several time points. The observed fraction of substrate reacted was determined for each time point, and affinity distributions were calculated. Figure 3-3C and D show the comparison of the krel distributions obtained for f = 0.12 versus the distributions obtained at f = 0.23 and 0.54, respectively. A clear difference is observed in the range of krel values calculated using the substrate populations from later time points compared with f = 0.12. The range of krel values decreases dramatically from 1000-fold at f = 0.12 to 100-fold at 0.23 and just over 10-fold at 0.54. This compression in the calculated krel values is clearly shown in an overlay of the histograms representing the individual affinity distributions for the three time points (Figure 3-3B). The data further demonstrate that sampling at early time points at low substrate conversion provides the greatest accuracy. Any gain in signal to noise for slower reacting substrates achieved by sampling at later time points is more than offset by a large increase in systematic error affecting the entire affinity distribution.
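The compression of krel values at high substrate conversion can be reproduced qualitatively with a toy model. The sketch below assumes that every substrate carries the same small unreactive fraction (standing in for the incomplete reactivity attributed to misfolding in the text), that the pool is equimolar, and that the true rate constants span 1000-fold; the 5% unreactive fraction, the function names, and the chosen reaction extents are all illustrative assumptions, not parameters taken from the experiments.

```python
import math


def surviving_fraction(k, tau, unreactive):
    """Fraction of a substrate left after reaction extent tau (hypothetical model)."""
    return (1.0 - unreactive) * math.exp(-k * tau) + unreactive


def apparent_krel(k_values, tau, unreactive=0.05, k_ref=1.0):
    """Apparent k_rel for each true rate constant, relative to a reference with k_ref."""
    ln_ref = math.log(surviving_fraction(k_ref, tau, unreactive))
    return [math.log(surviving_fraction(k, tau, unreactive)) / ln_ref
            for k in k_values]


def fraction_reacted(k_values, tau, unreactive=0.05):
    """Fraction of an equimolar pool consumed at reaction extent tau."""
    remaining = [surviving_fraction(k, tau, unreactive) for k in k_values]
    return 1.0 - sum(remaining) / len(remaining)


# True rate constants from 0.01 to 10 (1000-fold), reference at 1.
true_k = [10 ** (e / 4) for e in range(-8, 5)]

for tau in (0.05, 0.5, 2.0):
    est = apparent_krel(true_k, tau)
    f = fraction_reacted(true_k, tau)
    print(f"f = {f:.2f}  apparent range = {max(est) / min(est):.0f}-fold")
```

In this model the apparent range shrinks as the reaction proceeds because the fastest substrates approach their unreactive floor while the reference continues to react; with the unreactive fraction set to zero the apparent and true values coincide at every time point.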

Reliability of first-strand cDNA synthesis and quantitative PCR for Illumina library preparation

After the substrate RNA population is isolated from different times in the reaction, it must be converted to cDNA using reverse transcription followed by PCR to generate the library for Illumina sequencing. During PCR, Illumina adapters are added to the cDNA corresponding to each RNA substrate as well as unique barcodes in order to distinguish reaction time points to allow for multiplexing.

Previous analytical studies of RNA quantification using Illumina sequencing showed that the majority of error is the result of library preparation or poor choice of PCR primers (75,115). The accuracy of krel in turn relies on the accuracy of measuring changes in the abundance of substrate variants over time. Therefore, it is essential to amplify the library under conditions where these differences are accurately preserved; thus, later amplification steps of HTS-Kin must be carefully considered and performed.

Several studies have aimed at achieving a quantitative understanding of various artifacts introduced by PCR that are relevant to HTS-Kin. For instance, template concentration, bias against high GC templates, template switching, and polymerase errors may contribute to errors in downstream steps (123-125). These previous studies indicated that this bias and these errors can be minimized by using the minimum number of amplification cycles required to form products and defining the optimal template concentration in the PCR. Another consideration is the importance of testing for differential amplification of different barcoded primers because this can introduce amplification and subsequent sequencing bias for barcodes containing structure (126-128). In our own experience, inaccurate results in one instance during preliminary experiments were traced to this effect. This consideration is tested by validation of all barcoded primers used for amplification by RT-PCR (reverse transcription PCR) or qPCR (quantitative PCR).

To diminish to the greatest extent possible the types of error during PCR amplification listed above, we determined the minimum number of PCR cycles necessary to achieve an identifiable cDNA product. Differences in the amount of pre-tRNA substrate remaining at different time points in the reaction were accounted for by normalizing the amount of template RNA used in the reverse transcription reaction. We used semi-quantitative PCR to identify the linear range of amplification for each residual substrate population (Figure 3-4A). To combat variations in PCR that would diminish the variation in the substrate population, we selected 14 cycles as the first number of PCR cycles for which a definable cDNA product band was observed.

In addition, inaccuracy in the construction of the Illumina sequencing library may arise if the amount of PCR product is not proportional to the concentration of input cDNA. To minimize this possibility, we ensured that the amount of DNA produced at the chosen number of PCR cycles is dependent on the amount of the first-strand cDNA product used as template. To demonstrate this, we performed PCRs for 14 cycles for reactions containing a 2-fold difference in the amount of first-strand cDNA synthesis products used as template. As shown in Figure 3-4B, an approximately 2-fold increase in the amount of PCR product is detected by agarose gel electrophoresis in reactions containing a proportionally increased amount of cDNA template.

Nonetheless, it is possible that despite optimization there is nonlinear amplification of individual sequences even within the linear range for PCR amplification of the total population, which could be a potential source of error in the determination of krel values by HTS-Kin. To test this directly, we determined the observed krel from samples in which the same cDNA template was amplified for 14 versus 16 cycles of PCR, which are both in the apparent linear range of PCR amplification. In Figure 3-4C, the krel values measured for all substrate variants in these two samples are compared. The krel values are highly correlated, in particular for the fastest reacting substrate variants. Significantly greater differences are observed in the krel values for slower reacting species.

Figure 3-4. Analysis of the reliability of first-strand cDNA synthesis and quantitative PCR for Illumina library preparation. Substrate cDNA synthesis and amplification from RNase P HTS-Kin reactions from Experiment 2 with pre-tRNAMetN(−6 to −1) are shown. (A) A 1% agarose gel showing the results of semi-quantitative PCR performed on the first-strand synthesis template from HTS-Kin reactions at different times in the reaction. The reaction time and number of PCR cycles are indicated at the top of the gel. RT, reverse transcription. (B) A 1% agarose gel showing linearity in the first-strand cDNA synthesis by reverse transcriptase. PCRs containing 1 or 2 times the amount of first-strand cDNA template from substrate populations at 0 and 12.36% reacted in HTS-Kin were performed for 0 and 14 cycles, and a control reverse transcription reaction was included in which no substrate RNA was added. (C) Comparison of the krel determined for the same HTS-Kin reaction in which the same first-strand cDNA from the substrate population was amplified for 14 or 16 cycles depicted as a density plot.

Because the samples compared in Figure 3-4C are from the same reaction, the observed error for the slow reacting species could be due to errors in downstream Illumina sequencing steps. As discussed above, the slowest reacting species will undergo the smallest change in concentration over the reaction; therefore, these data will exhibit the greatest sensitivity to stochastic measurement errors in the determination of these values. The high degree of correspondence for the vast majority of the population demonstrates the robustness of the method so long as attention is paid to whether linearity is maintained with respect to template concentration and PCR amplification.

Robustness of Illumina sequencing for reproducible determination of krel values

An unknown level of error may come from the variability between Illumina sequencing runs due to variation in flow cell, sample handling, or the instrument itself. Error from these sources can be minimized by pooling samples from different HTS-Kin reactions and different time points in the same Illumina flow cell lane using unique barcodes and combining these with other users' samples or a control sample. The reported error rate for Illumina HiSeq 2000 is 0.26%, the lowest reported for major high-throughput sequencing platforms (129). Although we used data from the Illumina HiSeq 2500 in these studies, a similarly high level of fidelity is expected. Systematic miscalling of a particular nucleotide in the cDNA has been investigated and quantified, and there are various approaches to correcting these errors (130,131). However, because of the large number of sequence reads (500–1500) obtained for most substrate variants, it is not necessary to apply these corrections in HTS-Kin.

To estimate the error introduced in the Illumina sequencing step of the procedure, we compared the rate constants calculated from two sequencing runs on the same cDNA sample. Figure 3-5 shows a plot of the two krel data sets obtained from the two separate sequencing runs. Inspection of the data shows that the substrate variants with the slowest krel have the greatest difference between measurements. Because the samples were not prepared separately for each run, we attribute this error directly to the variability of the high-throughput sequencing.

Hence, Illumina sequencing appears to limit the ability to detect small changes in concentration of the slowest substrates over the short term in the RNase P reaction. Nonetheless, the data reveal highly robust reproducibility of the calculated krel values, demonstrating that for the majority of sequences the error introduced by Illumina sequencing is minimal.
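Why counting noise falls hardest on slow substrates can be illustrated with a small simulation. The sketch below is a deliberately simplified model: read-count noise is approximated as Gaussian with a standard deviation equal to the square root of the expected count (a stand-in for Poisson sampling), the reference substrate is treated as measured exactly with 88% remaining, and each variant starts with 1000 expected reads, a value inside the 500–1500 range quoted above. The function name and every parameter value are assumptions made for illustration.

```python
import math
import random
import statistics


def relative_error_of_krel(krel, ref_remaining=0.88, reads=1000,
                           trials=2000, seed=0):
    """Spread of simulated k_rel estimates divided by the true k_rel.

    Only counting noise on the variant's own reads is modelled; the reference
    term and all normalizations are treated as exact (an assumption).
    """
    rng = random.Random(seed)

    def noisy(expected):
        # Gaussian approximation to Poisson counting noise, floored at 1 read.
        return max(1.0, rng.gauss(expected, math.sqrt(expected)))

    ln_ref = math.log(ref_remaining)
    estimates = []
    for _ in range(trials):
        reads_start = noisy(reads)
        # Expected surviving fraction of the variant is ref_remaining ** krel.
        reads_later = noisy(reads * ref_remaining ** krel)
        estimates.append(math.log(reads_later / reads_start) / ln_ref)
    return statistics.stdev(estimates) / krel


for krel in (5.0, 1.0, 0.02):
    print(f"k_rel = {krel:<5} relative error ~ {relative_error_of_krel(krel):.2f}")
```

In this model the absolute scatter of the estimate is similar for all variants, so the relative error grows roughly in inverse proportion to krel: a substrate whose abundance barely changes is dominated by counting noise, whereas a fast substrate produces a change well above it.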

Evaluation of experimental error

Figure 3-5. Illumina sequencing errors contribute to imprecision in measurement of low krel values. Resequencing was performed on the cDNA created from the second experimental replicate of HTS-Kin performed on RNase P processing of Met pre-tRNA N(−6 to −1). The krel values determined from both Illumina sequencing runs of the same samples are plotted in a density plot where the number of points in a given area of the graph are indicated by the color and key at the right and show significant error in krel determination for slow reacting substrates.

Optimally, analytical methods should provide data with sufficient precision such that the principal source of error is due to differences between experimental trials. We quantified the magnitude of experimental error between replicate HTS-Kin experiments. RNase P reactions were performed with the same pre-tRNAMetN(–6 to −1) population in triplicate and time points taken to achieve similar fractions of reaction, and the krel values were determined for each substrate variant using Eq. (3). The variation among the three individual experiments is visualized by plotting the resulting affinity distributions. In Figure 3-6A, the affinity distributions for Experiments 1 and 2 are compared, and in Figure 3-6B the data for Experiments 2 and 3 are compared. Both plots demonstrate strong correlation among the three data sets given that the majority of substrate variants are processed with very similar observed krel values between replicate experiments. Deviation from this trend is observed for substrates with very slow krel values that show the least correlation between replicates. As described above, this is due in large part to the relatively small changes in these substrates' concentration over the short time of the reaction that are in turn limited by error in the quantification of RNA levels by Illumina sequence reads.

The average krel and standard deviation calculated for each substrate variant were used to calculate the coefficient of variation (CV = standard deviation/average). The CV for each substrate variant was then plotted versus the magnitude of its average krel value. As shown in Figure 3-6C, the substrate variants with the fastest krel values are measured with the greatest precision. As expected based on the plots shown in Figure 3-6A and B, the CV for each substrate variant increases as krel decreases. The error increases sharply only for substrate variants with krel values that are 50- to 100-fold slower than the reference. However, the majority (75%) are measured with CV < 1, and the fastest 50% of sequence variants are measured with higher precision (CV < 2).

Figure 3-6. HTS-Kin replicates show reproducible determination of substrate variant krel except for slowest reacting substrates. (A,B) Comparison of the krel determined from replicate HTS-Kin experiments shown as a density plot, with the color indicating the number of points in that portion of the graph as shown by the legend. There is good correlation between replicates except for substrates processed with very slow krel. (C) The standard deviation (SD) in krel from three replicate HTS-Kin reactions of RNase P with pre-tRNAMetN(−6 to −1) was calculated, and the ratio of the SD to the substrate krel was plotted. Substrates with high error are indicated by a ratio greater than 1. The coefficient of variation ratio (SD/krel) is compared with the observed processing rate, and substrate variants are aligned from fast to slow reacting.
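The coefficient of variation defined above (standard deviation divided by the average) is straightforward to compute per variant from replicate krel values. The sketch below uses made-up triplicate values for three hypothetical variants and assumes the sample standard deviation (n − 1 denominator); whether the sample or population form was used in the analysis is not stated, so that choice is an assumption.

```python
import statistics


def coefficient_of_variation(replicate_krels):
    """CV = sample standard deviation / mean of one variant's replicate k_rel values."""
    mean = statistics.mean(replicate_krels)
    return statistics.stdev(replicate_krels) / mean


# Made-up triplicate k_rel values for three hypothetical variants.
replicates = {
    "fast_variant": [4.8, 5.1, 5.3],
    "near_reference": [0.9, 1.0, 1.2],
    "slow_variant": [0.005, 0.02, 0.08],
}

cv = {name: coefficient_of_variation(vals) for name, vals in replicates.items()}

# Variants passing the CV < 1 threshold used in the text.
well_measured = [name for name, value in cv.items() if value < 1]

for name, value in cv.items():
    print(f"{name}: CV = {value:.2f}")
```

With these illustrative numbers the slow variant exceeds the CV < 1 threshold while the other two fall well below it, mirroring the qualitative trend described for Figure 3-6C.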

Conclusions

The analyses shown here provide strong support for the interpretation that the primary source of error for most krel values determined by HTS-Kin arises due to experiment-to-experiment variation. Importantly, the reproducibility between experiments for the majority of substrates shows a CV less than or equal to 1 for krel values spanning two orders of magnitude. For systems with a greater range of rate constants, the reproducibility is expected to be even better. However, for the slowest reacting substrate variants, an additional source of error becomes significant. The application of internal competition kinetics requires the measurement of the change in the ratio of the abundance of a particular RNA at the start of the reaction and at a specific time point. For slow reacting sequences, this change in RNA concentrations is small and falls below the range that can be reproducibly measured by Illumina sequencing. This effect is not significantly amplified by experimental error, but it limits accurate measurement at the lowest krel values. In sum, a carefully performed HTS-Kin experiment will include benchmarks using the procedures outlined here. Namely, the fraction of reaction should be carefully chosen to provide the greatest range in rate constants and an appropriate reference substrate identified that lies near the center of the rate constant distribution. Any analytical method used to quantify the change in substrate or product ratios can be applied; however, the error within these measurements necessarily impacts the measured krel; therefore, its precision must be investigated. In addition, the method of preparation of the RNA substrates for high-throughput sequencing, be it amplification to cDNA by PCR or ligation, introduces its own bias that can be handled to an extent by appropriate determination of the linear range of this amplification. The main error for substrates with slow krel comes from errors in precision of Illumina sequencing, and this should be investigated for other forms of quantification. HTS-Kin provides reproducible determinations of krel values for RNase P processing reactions, and these principles are likely to hold for many analogous in vitro RNA processing reactions.

Chapter 4: The contribution of the C5 protein subunit of Escherichia coli ribonuclease P to specificity for precursor tRNA is modulated by proximal 5’ leader sequences

Niland, C.N.; Anderson, D.R.; Jankowsky, E.; Harris, M.E. In Preparation


Abstract

RNA processing enzymes carry out important biological functions and typically consist of multiple subunits. In addition, these enzymes often must recognize multiple alternative substrates that vary in their functional binding, resulting in efficient discrimination. While the effect of changes in RNA substrate sequence and structure on enzyme processing is well accepted, the interdependence of enzyme subunits in molecular recognition of RNA substrates is relatively unexplored. To address this deficiency, we have used Ribonuclease P (RNase P) as a model system to better understand shared molecular recognition of RNA substrates. RNase P is an endonuclease that removes 5’ leaders from precursor tRNAs (pre-tRNAs), and functions in bacteria as a ribonucleoprotein with a catalytic RNA subunit (P RNA) and a protein subunit (C5 in E. coli). The P RNA subunit contacts the tRNA and proximal 5’ leader nucleotides, while the C5 protein contacts distal 5’ leader sequences. To investigate whether the enzyme subunits contribute independently to specificity, or exhibit cooperativity or anti-cooperativity in contacting the substrate, we compared the relative kcat/Km values for all possible sequence combinations of the six proximal 5’ leader nucleotides (n=4096) for processing by the E. coli P RNA subunit alone and by the RNase P holoenzyme. Analysis of the resulting rate constant distributions using unbiased and hypothesis-driven data mining approaches reveals predicted as well as surprising new features of RNase P specificity. Surprisingly, the results reveal that the contribution of C5 protein to RNase P processing rate constants is modulated by the identity of the nucleotide at N(-2). Using RNase P as a model system, these results provide a more comprehensive understanding of the ability of enzyme subunits to crosstalk and thus alter functional binding of multiple RNA substrates. In addition, these studies provide a fundamental mechanism for considering the molecular relationship of enzymes with multiple subunits.


Introduction

There is growing appreciation of the role of RNAs in various health and disease states, and efforts to target RNA for potential therapeutics are growing (1-3). Due to the varied and essential roles that RNA plays in gene expression, it is important to understand the specificity of key enzymes that bind and regulate RNA. Such information is important for the development of RNA-based antimicrobial therapeutics (3-5) and for bioengineering RNAs with novel functions (6-8).

Numerous studies have focused on identifying binding sites of RNA binding proteins (RBPs) and RNA processing enzymes within the transcriptomes of cells, and defining optimal sequence and structure motifs for association (9-13). Despite these important advances in identifying RBP binding sites, the quantitative effects of local sequence context and how variation from consensus motifs affects functional binding by multiple enzyme subunits are not well understood.

Differences in the sequence and structure of an RNA can affect the extent to which enzyme subunits recognize and associate with these alternative substrates. These contributions can arise due to perturbation of local RNA geometry, altering the chemical/electrostatic environment of nearby contacts, or by altering the rate limiting step in the reaction. A dissection of these contributions is possible through quantitative analysis of a complete sampling of all possible substrate sequence combinations encompassing the enzyme binding site. This information provides a complete description of substrate recognition by enzyme subunits as well as their potential cooperativity. Recently, new methods were developed to measure the affinities and reaction kinetics of thousands of RNAs simultaneously (9,14-17) that can potentially provide such information, but they have yet to see wide application.

Ribonuclease P (RNase P) is a multiple substrate RNA processing enzyme that provides a useful model system to investigate how variation in RNA sequence and structure affects shared molecular recognition (18,19). RNase P is a ubiquitous and essential tRNA processing endonuclease that removes the 5’ leader sequence from all precursor tRNAs (pre-tRNAs) in the cell (Figure 4-1A). In Bacteria, RNase P is composed of a large (ca. 400 nucleotide) catalytic RNA subunit (P RNA) and a smaller (ca. 90 amino acid) protein subunit (termed C5 protein in E. coli) (Figure 4-1B). While the P RNA subunit alone can process pre-tRNAs in vitro at high salt concentrations, the protein subunit is necessary for in vivo function and in vitro activity under physiological conditions (18-31). The P RNA subunit contacts the dsRNA helical structure formed by stacking of the T-stem and acceptor stem of tRNA. The P RNA active site recognizes nucleotides N(-2) and N(-1) in the 5’ leader relative to the cleavage site at N(1) (32-35). The C5 protein subunit binds to P RNA adjacent to the active site to form the RNase P holoenzyme, and contacts nucleotides N(-8 to -3) in the distal 5’ leader (24,27,31,36-39). An x-ray crystal structure of the Thermotoga maritima RNase P with tRNA and 5’ leader products bound provides a model that is consistent with the biochemical data. Thus, recognition of the ssRNA structure and sequence of the pre-tRNA 5’ leader is shared between the RNA and protein subunits of RNase P.

The nucleotide sequences at the cleavage site contacted by P RNA and C5 protein are only weakly conserved (40,41); however, a few key determinants for enzyme-substrate recognition have been identified (Figure 4-1C). The 3’RCCA of the tRNA body base-pairs to L15 in P RNA, a G(1)-C(72) base-pair at the top of the acceptor stem is preferred in the enzyme active site, a U(-1) contacts a conserved adenosine in J5/15 of the P RNA, and N(-2) is thought to contact nucleotides in J18/2 of P RNA (25). The 5’ leader interactions with C5 protein contribute to substrate affinity and indirectly stabilize the binding of metal ions important for catalysis (42). In spite of strong experimental support for these enzyme contacts in vitro, many genomically encoded pre-tRNA substrates lack one or more of these determinants. The necessity of multiple substrate recognition and a naturally variable substrate population suggest that the energetic contribution of RNase P enzyme subunits may vary between pre-tRNA substrates.

Figure 4-1: Bacterial RNase P is an essential ribonucleoprotein RNA processing enzyme. A) Schematic of the RNase P reaction with pre-tRNA with cleavage site indicated by the black arrow and region of 5’ leader contact to the enzyme indicated by the blue rectangle. B) Diagram of the interactions between RNase P and pre-tRNA substrates. The RNA subunit of RNase P (blue) contacts nucleotides proximal to the cleavage site at N(-2) and N(-1), while the C5 protein subunit (pink) contacts nucleotides distal to the site of cleavage at N(-6) to N(-3). C) Diagram of pre-tRNAMet in which each nucleotide is indicated by a circle and contacts to the C5 protein are shown in red while those to the P RNA are indicated in blue.

Recently, high-throughput enzymology methods were developed that allow the multiple-turnover and single-turnover kinetics of Escherichia coli RNase P processing to be measured for thousands of pre-tRNA substrates simultaneously (43,44). These methods were applied to determine the specificity landscape for pre-tRNAMet 5’ leader recognition. Quantitative analysis of the resulting rate constant distributions revealed that the identity of N(-2) and N(-3) (relative to the cleavage site at N(1)) primarily control alternative substrate selection and act at the level of association, not the cleavage step (44). Similar high-throughput analysis of pre-tRNAMet reaction kinetics and equilibrium binding affinity for all possible sequence combinations in the C5 protein binding site at N(-8 to -3) showed that its observed contribution to RNase P specificity includes contributions from both sequence specific protein-RNA interactions as well as 5’ leader secondary structure.

The ability to measure kinetics for all possible RNA substrate variants in an enzyme binding site reveals the effects of mutation at each randomized position in the background of all possible surrounding sequence combinations. These data provide opportunities to address questions regarding specificity that can be answered in no other way. Here, we investigate whether 5’ leader interactions with the P RNA and C5 protein subunit are interdependent (either cooperative or anti-cooperative) or contribute additively and independently to define the observed sequence specificity of RNase P. To answer this question, we comprehensively investigated the interdependence of the energetic contributions of 5’ leader nucleotides that contact P RNA (N(-2 to -1)) and those that contact C5 protein (N(-6 to -3)). We measured the relative kcat/Km for all possible nucleotide variations of the proximal 5’ leader sequence (N(-6 to -1)) (n = 4096) for processing by the P RNA alone, and compared these values with the values previously determined for the same substrate pool using the RNase P holoenzyme that contains both P RNA and C5 protein.

Hypothesis-driven data mining reveals both familiar and surprising new determinants of RNase P specificity that are confirmed with single substrate assays. As expected, the presence of the C5 protein results in greater dependence on the 5’ leader sequence distal to the cleavage site. The data also clearly show that pairing of the proximal 5’ leader to the tRNA 3’RCCA acts as a strong anti-determinant. As a consequence, this effect contributes significantly to RNase P 5’ leader sequence discrimination. Interestingly, we observe that the identity of the nucleotide at N(-2) controls the contribution of C5 protein interactions to kcat/Km, illustrating energetic coupling between enzyme subunits. Thus, large context dependent effects contribute to RNase P specificity, and variation in nucleotides that contact P RNA modulates the contribution of C5 protein to substrate discrimination.


Results

To comprehensively investigate the specificity of RNA processing enzymes and RNA binding proteins, we recently reported the development of High-Throughput Sequencing Kinetics (HTS-Kin), which allows the relative kcat/Km of thousands of RNA sequence variants to be measured simultaneously (43,45). By investigating all possible substrate variants in the region of interest, it is possible to comprehensively determine the context dependent effect of pre-tRNA sequence variation in the 5’ leader.

The HTS-Kin technique is outlined in Figure 4-S1 and involves the creation of a randomized pre-tRNA substrate pool that is processed by RNase P in vitro. Aliquots are isolated and quenched at various times over the course of the reaction, and substrate and product are separated on a denaturing polyacrylamide gel. The residual pre-tRNA population at different reaction times is isolated by gel purification and prepared for high-throughput Illumina sequencing by RT-PCR. By monitoring the change in number of sequencing reads for each substrate variant as a function of time, the relative kcat/Km for each species (i) calibrated to a reference sequence (krel = (kcat/Km)i/(kcat/Km)reference) can be calculated using internal competition kinetics. Analysis of all possible substrate variations in the region of interest provides an affinity distribution that describes the entire complement of effects of sequence variation in the randomized region.
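The internal competition calculation described above can be illustrated with a short Python sketch. It assumes that all variants compete for the same enzyme, so that each is consumed in proportion to its own kcat/Km, and that read counts are proportional to the abundance of each variant in the residual substrate pool. Under those assumptions the fraction of variant i remaining is (1 - f) multiplied by the ratio of its mole fractions at time t and at time zero, where f is the overall fraction of reaction, and krel is the ratio of the logarithms of the fractions remaining for variant i and for the reference. The function name `k_rel` and its arguments are hypothetical and are not taken from the analysis code used in this work.

```python
import math


def k_rel(reads_t0, reads_t, f, reference):
    """Relative kcat/Km for each variant from residual-substrate read counts.

    reads_t0  : dict, sequence -> read count in the unreacted pool
    reads_t   : dict, sequence -> read count in the residual substrate
    f         : overall fraction of substrate pool converted to product
    reference : sequence used as the reference (krel = 1 by definition)
    """
    total_t0 = sum(reads_t0.values())
    total_t = sum(reads_t.values())

    def log_fraction_remaining(seq):
        x0 = reads_t0[seq] / total_t0  # mole fraction at the start
        xt = reads_t[seq] / total_t    # mole fraction in residual substrate
        # Fraction of this variant still unreacted is (1 - f) * xt / x0.
        return math.log((1.0 - f) * xt / x0)

    ref = log_fraction_remaining(reference)
    return {seq: log_fraction_remaining(seq) / ref for seq in reads_t0}
```

Because the calculation divides by the logarithm of the fraction of reference remaining, a reference that has barely reacted gives a denominator near zero; this is the same sensitivity that limits precision for the slowest variants.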

Due to the shared molecular recognition of the 5’ leader of pre-tRNA by P RNA and C5 protein, we hypothesized that cooperative or anti-cooperative interdependence may exist between their contributions to RNase P specificity. To test this hypothesis, we used HTS-Kin to compare the affinity distributions obtained with the RNase P holoenzyme and with P RNA alone.

Comparison of the affinity distributions for P RNA and RNase P processing of pre-tRNAMetN(-6 to -1)- To interrogate the specificity of both subunits of RNase P for the 5’ leader of pre-tRNA, we created a pool of pre-tRNAMet substrate variants randomized in the 5’ leader at nucleotides N(-6 to -1), encompassing both protein and RNA contacts (Figure 4-2A). The pre-tRNAMetN(-6 to -1) substrate pool was reacted with the P RNA alone or with the RNase P holoenzyme, and the relative kcat/Km values for all 4,096 substrates are shown in Figure 4-2B. The results of the reaction with RNase P holoenzyme were reported previously (44) but were not thoroughly interrogated. A large range of relative rate constants spanning about 100-fold is observed in the HTS-Kin reactions of the pre-tRNA pool with the RNase P holoenzyme. In contrast, the reaction with P RNA alone showed a much narrower range of krel values, indicating an overall smaller effect of 5’ leader sequence variation on kcat/Km. The smaller variation in rate constants seen for the P RNA multiple turnover reaction reflects a smaller number of sequence determinants in the binding site. A dot plot comparison of the krel values for RNase P and for P RNA alone is shown in Figure 4-2C. Processing of the pre-tRNAMetN(-6 to -1) substrate pool by P RNA reveals that the majority of variants have krel values within a ~5-fold range. The greater range of values observed for processing by the RNase P holoenzyme illustrates a much greater effect of sequence variation on the observed rate constants. Nonetheless, the observed linear correlation between the two affinity distributions is consistent with at least a subset of shared sequence specificity determinants.

Figure 4-2: Sequence determinants for holoenzyme and ribozyme reactions are unique and reveal changes in specificity. A) Secondary structure of pre-tRNAMetN(-6 to -1) with regions of randomization analyzed in this study indicated by N’s. As shown, these regions encompass nucleotides of the 5’ leader that are involved in P RNA and C5 protein contact. B) The affinity distribution of relative rate constants from holoenzyme (black, published in (44)) and ribozyme (red) with pre-tRNAMetN(-6 to -1) show that while some substrates are processed with rate constants significantly above the genomically encoded reference in holoenzyme reactions, the same substrate population is significantly slower in the ribozyme reaction with no protein subunit. C) Comparison of the relative rate constants measured for each substrate variant in HTS-Kin reactions with RNase P holoenzyme (x-axis) and P RNA ribozyme (y-axis). D) Sequence preference in the 5’ leader of pre-tRNAMetN(-6 to -1) examined by creating sequence probability logos of the fastest reacting substrates in holoenzyme or ribozyme reactions. The position in the 5’ leader is indicated on the bottom of the logo, and nucleotide preference at each position is indicated by the identity and size of the letter of the nucleobase.

An initial perspective on sequence specificity was obtained by calculating optimal sequence logos for the fastest reacting 1% of substrate variants in each data set (46) (Figure 4-2D). This analysis identifies a strong preference for an adenosine at the N(-2) position and uridine at the N(-3) position in the 5’ leader of pre-tRNAMet in RNase P holoenzyme reactions, as reported previously (47).

However, in reactions catalyzed by P RNA alone, sequence preference for an adenosine at the N(-2) position in the 5’ leader is observed, but the preference for uridine at N(-3) is lost. The variation of distal 5’ leader nucleotides appears to have a minimal contribution in the optimal RNAs in both reactions. Thus, the C5 protein promotes additional specificity in contacts to the 5’ leader while suppressing specificity at the nucleotide at the cleavage site.
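The selection of the fastest reacting variants that underlies the logo analysis can be sketched in Python as follows. This is a simplified illustration that only tabulates per-position nucleotide frequencies among the top fraction of variants; it is not the probability logo calculation of reference (46), and the function name `top_fraction_frequencies` is hypothetical.

```python
from collections import Counter


def top_fraction_frequencies(krel, fraction=0.01):
    """Per-position nucleotide frequencies among the fastest variants.

    krel     : dict, sequence (5'->3', equal lengths) -> krel value
    fraction : fraction of the population to keep (0.01 = fastest 1%)
    Returns a list with one dict per position mapping A/C/G/U to frequency.
    """
    ranked = sorted(krel, key=krel.get, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    top = ranked[:n_top]
    frequencies = []
    for pos in range(len(top[0])):
        counts = Counter(seq[pos] for seq in top)
        frequencies.append({nt: counts.get(nt, 0) / n_top for nt in "ACGU"})
    return frequencies
```

For a 4,096-member pool, the default fraction keeps the 40 fastest sequences, so a frequency near 1 at a position indicates a strong preference among the optimal substrates, while frequencies near 0.25 indicate no preference.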

Quantitative RNA specificity modeling reveals coupling between adjacent proximal 5’ leader nucleotides- To fully analyze the quantitative datasets generated by HTS-Kin, we used unbiased modeling approaches to describe the data (43,44,48). First, the krel values for P RNA and RNase P processing were fit to a position weight matrix (PWM) model that defines the identity and position of each nucleotide in the 5’ leader and considers each position in isolation (Figure 4-S2A). For both data sets the simple PWM model does a relatively poor job of describing the affinity distributions. In both the P RNA alone and RNase P reactions this model is sufficient to explain < 50% of the effects of sequence variation on reaction rate (Figure 4-S2B-C).
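To make the additive PWM model concrete, the following Python sketch fits one weight per nucleotide per position and reports the fraction of variance explained. It is a minimal illustration under two assumptions that are not taken from the Experimental Procedures: the input is ln(krel), and the data set is complete (every sequence present exactly once). For such a balanced data set the least-squares weights are simply the marginal means minus the grand mean, so no regression library is needed. All function names are hypothetical.

```python
NUCLEOTIDES = "ACGU"


def fit_pwm(log_krel):
    """Additive (position-independent) model for a complete sequence set.

    log_krel : dict, sequence -> ln(krel); every sequence must be present.
    Returns (grand_mean, weights), where weights[pos][nt] is the deviation
    of the mean for sequences with nt at pos from the grand mean.
    """
    seqs = list(log_krel)
    grand = sum(log_krel.values()) / len(seqs)
    weights = []
    for pos in range(len(seqs[0])):
        column = {}
        for nt in NUCLEOTIDES:
            values = [log_krel[s] for s in seqs if s[pos] == nt]
            column[nt] = sum(values) / len(values) - grand
        weights.append(column)
    return grand, weights


def predict(seq, grand, weights):
    """Predicted ln(krel) as the sum of independent position weights."""
    return grand + sum(weights[pos][nt] for pos, nt in enumerate(seq))


def r_squared(log_krel, grand, weights):
    """Fraction of the variance in ln(krel) explained by the additive model."""
    ss_total = sum((y - grand) ** 2 for y in log_krel.values())
    ss_resid = sum((y - predict(s, grand, weights)) ** 2
                   for s, y in log_krel.items())
    return 1.0 - ss_resid / ss_total
```

An R-squared near 1 would indicate that each position contributes independently; values well below 1, as reported above, indicate that a substantial part of the sequence dependence comes from context effects that an additive model cannot capture.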


As described previously, a more complex model that includes the effect of any two positions on one another can more accurately define the relationship between the substrate 5’ leader sequence and the observed rate constant (Figure 4-S2D) (see Experimental Procedures). The coupling coefficients express the quantitative effects of surrounding sequence on the contribution of individual nucleotide positions. These data thus contain information on effects due to energetic coupling, local environment, and secondary structure. A comparison of the observed rate constants to those predicted by a linear fit of this model shows that it can explain over 60% of the effects of sequence variation for both P RNA and the RNase P holoenzyme reactions (Figure 4-S2E-F).
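The meaning of a pairwise coupling coefficient can be illustrated with the Python sketch below. It is not the fitting procedure described in the Experimental Procedures; it assumes a complete, balanced data set of ln(krel) values, for which the pairwise term for nucleotide a at position p and nucleotide b at position q equals the mean of all sequences carrying both, minus the grand mean and the two single-position effects. The function name `pairwise_couplings` is hypothetical.

```python
from itertools import combinations

NUCLEOTIDES = "ACGU"


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def pairwise_couplings(log_krel):
    """Pairwise interaction terms for a complete set of sequences.

    log_krel : dict, sequence -> ln(krel); every sequence must be present.
    Returns a dict keyed by (pos_p, nt_a, pos_q, nt_b) with pos_p < pos_q.
    A value of zero means the two positions contribute independently.
    """
    seqs = list(log_krel)
    length = len(seqs[0])
    grand = _mean(log_krel.values())
    # Single-position (PWM-like) effects relative to the grand mean.
    main = [
        {nt: _mean(log_krel[s] for s in seqs if s[p] == nt) - grand
         for nt in NUCLEOTIDES}
        for p in range(length)
    ]
    coupling = {}
    for p, q in combinations(range(length), 2):
        for a in NUCLEOTIDES:
            for b in NUCLEOTIDES:
                cell = _mean(log_krel[s] for s in seqs
                             if s[p] == a and s[q] == b)
                coupling[(p, a, q, b)] = cell - grand - main[p][a] - main[q][b]
    return coupling
```

If sequences are missing from the data set, the marginal means are no longer orthogonal and an explicit least-squares fit would be required instead.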

The coupling coefficients predicted by the model for the P RNA alone and RNase P holoenzyme reactions are presented in a heat map in Figure 4-3A. For the P RNA alone reaction the coupling coefficients observed at distal regions in the 5’ leader of pre-tRNAMet are essentially zero, consistent with a lack of contacts with the ribozyme. Conversely, there are strong coupling coefficients between neighboring nucleotides proximal to the cleavage site at N(-1), N(-2), N(-3) and N(-4). This result is somewhat surprising since P RNA is only known to interact with N(-1) and N(-2). The contribution of nucleotides distal to the cleavage site could reflect local changes in structure at N(-1) and N(-2) that alter the energetics of N(-3) and N(-4). Such effects could also result from structure formation of these proximal nucleotides (N(-4) to N(-1)) with the 3’ACCA of the tRNA body, as a pattern emerges showing a preference against neighboring guanosines (i.e. G(-3) and G(-4)).


Figure 4-3: Plotting the coupling coefficients from the PWM+coupling coefficient model identifies key altered energetics in holoenzyme and ribozyme reactions. A) Heatmap showing the coupling coefficients in the P RNA reaction predicted from the PWM+coupling coefficient model between nucleotides in the randomized region of the 5’ leader, N(-6) to N(-1). The identity and position of the nucleotide is indicated on each axis and the strength of the predicted coupling coefficient is indicated by the color at the vertex (red for a positive coefficient, blue for a negative coefficient). These results are compared to those for the RNase P reaction (44), with circles indicating the strength of these coefficients. B) Comparison of the coupling coefficients calculated from the model between RNase P and P RNA HTS-Kin reactions.


The calculated coupling coefficients from fitting the obtained krel from the RNase P reaction to this model are also shown in Figure 4-3A. The heat map clearly shows that significant values between proximal 5’ leader nucleotides are reduced in the holoenzyme reaction compared to the ribozyme reaction. Additionally, the largest coupling coefficients are observed for nucleotides in the distal region of the 5’ leader that contact the C5 protein, reflecting an increase in their energetic contribution to binding. These data are consistent with the interpretation that C5 protein sequence specificity is directed primarily at distal 5’ leader sequences N(-6 to -3). The decrease in magnitude of coupling coefficient values between these nucleotides and N(-2) to N(-1) is consistent with the suppression of N(-1) specificity revealed by the analyses of optimal sequence logos in Figure 4-2D.

The calculated coupling coefficients for HTS-Kin reactions with the RNase P ribozyme and holoenzyme are compared in Figure 4-3B. This comparison reveals that the strength of only a few coupling coefficients is greater in the ribozyme reaction compared to the holoenzyme reaction. The range of values for the coupling coefficients is much larger in the ribozyme reaction compared to that in the holoenzyme reaction. This supports the idea that the presence of the C5 protein subunit facilitates sharing of enzyme specificity between all positions in the 5’ leader.

The specificity for N(-6) to N(-3) depends on the identity of proximal 5’ leader nucleotides- To dissect the inter-dependence of 5’ leader nucleotide interactions with the P RNA and C5 protein subunits, we examined whether the identity of nucleotides at N(-2) and N(-1) influences the observed sequence specificity for positions N(-6) to N(-3).

Due to the comprehensive determination of rate constants by HTS-Kin, the dataset contains the krel for all 256 possible combinations of sequences at N(-6 to -3) in the C5 binding site of the 5’ leader in the context of all 16 combinations of dinucleotides at N(-2) and N(-1) in the P RNA binding site (see Figure 4-4A). To extract this information, we binned the HTS-Kin data from the holoenzyme reaction into subsets according to the identity of nucleotides at N(-2)N(-1). Next, we used individual dot plots to compare the krel value for each N(-6 to -3) variant in the context of different N(-2)N(-1) sequences. These data were compared to those obtained with the genomically encoded A(-2)U(-1) (Figure 4-4B). A linear relationship with a positive slope near unity indicates that the effect of mutation at N(-6 to -3) on C5 specificity is independent of the identity of N(-2)N(-1). If the identity of N(-2)N(-1) changes the observed energetic contribution of nucleotides contacting C5, then a deviation from this behavior will be observed. Upon inspection it is clear that the effect of variation at N(-6 to -3) is linearly correlated for most combinations of nucleotides at N(-2)N(-1).
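The binning and slope comparison described above can be sketched in Python as follows. The sketch assumes that each sequence is written 5' to 3' as N(-6) through N(-1), so that the first four characters are the C5 binding site and the last two are N(-2)N(-1), and it uses an ordinary least-squares slope. Whether the fits were performed on krel or on its logarithm is not specified here, so the function simply uses the values it is given; the name `slope_vs_reference` is hypothetical.

```python
def slope_vs_reference(krel, context, reference="AU", n_distal=4):
    """Slope of krel in one N(-2)N(-1) context against the reference context.

    krel      : dict, 6-nt leader sequence N(-6..-1) -> krel
    context   : dinucleotide at N(-2)N(-1) defining the subset of interest
    reference : dinucleotide of the reference subset (A(-2)U(-1) by default)
    A slope near 1 means N(-6 to -3) variation has the same effect in both
    contexts; a slope near 0 means the context masks that variation.
    """
    distal = sorted({seq[:n_distal] for seq in krel})
    xs = [krel[d + reference] for d in distal]  # reference subset
    ys = [krel[d + context] for d in distal]    # subset being compared
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

Calling the function once for each of the 16 dinucleotides yields one slope per N(-2)N(-1) subset, each based on the same 256 distal sequences.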

However, for several combinations of N(-2)N(-1), variation of nucleotides at N(-6 to -3) has little effect on the observed krel. This behavior is reflected in slopes that approach zero. For comparison, the slopes for each individual graph are shown in Figure 4-4C. The most pronounced effects are observed for substrates with a C(-2) and to a lesser extent U(-2). Additionally, pre-tRNAs containing G(-2)U(-1) or G(-2)G(-1) are relatively insensitive to variation at positions N(-6 to -3) compared to the reference substrate (A(-2)U(-1)).

Figure 4-4: Variation of nucleotides in the 5’ leader of pre-tRNA contacting the P RNA subunit of RNase P alters the energetic contribution of nucleotides contacting the C5 protein. A) The 16 data subsets of HTS-Kin separated by the identity of nucleotides at N(-2) and N(-1) in the RNA binding site. B) Each subset is compared to the genomically encoded A(-2)U(-1). Moving across a row corresponds to a change in nucleotide identity at the N(-1) position, while moving down a column corresponds to a change at N(-2). Each of the 16 boxes represents a scatter plot of the krel for a substrate with a given sequence at N(-6) to N(-3) in the background of the mutant or wildtype nucleotides at N(-2) and N(-1). C) Each scatter plot was fit to a line with the slope indicated in the graph.

Thus, the HTS-Kin data reveal inter-dependence between the identity of 5’ leader nucleotides involved in interactions with P RNA (N(-2) and N(-1)) and those contacted by the C5 protein subunit (N(-6 to -3)). Comparing the effect of all possible combinations of sequences at N(-2)N(-1) to the reference sequence A(-2)U(-1) reveals that the identity of proximal nucleotides does not influence the apparent C5 sequence specificity. Rather, the degree to which C5 specificity contributes to kcat/Km is controlled by the identity of the nucleotide at N(-2) and to a lesser extent N(-1). Using the comparative perspective shown in Figure 4-4, we identified the most prominent inter-dependence effects indicated by the HTS-Kin data, and tested them by analyzing selected pre-tRNAMet 5’ leader sequence variants using traditional single substrate kinetic assays.

Unfavorable effects on kcat/Km due to pairing between proximal 5’ leader nucleotides and the tRNA 3’ terminal ACCA- It is well documented that the 3’RCCA of pre-tRNA base pairs with nucleotides in the P15 region of the P RNA subunit. Extension of the acceptor stem by engineering pairing interactions with N(-2)N(-1) is unfavorable for RNase P catalysis and can result in mis-cleavage in some substrates (49,50). Consistent with this known feature of RNase P specificity, we observe that G(-2)G(-1) and G(-2)U(-1) sequence combinations have negative effects on specificity, consistent with pairing of these positions to the terminal 3’ACCA of pre-tRNAMet (Figure 4-5A). To test these predictions for pre-tRNAMet, we used single substrate assays in multiple turnover reactions to measure the rate of processing of pre-tRNAMet substrates containing increasing numbers of base-pairs with the terminal 3’ACCA. We compared these effects in the background of three different N(-6 to -3) sequences (AAUA, AAAU and AAUG). When each of these sequences is combined with A(-2)A(-1), the relative kcat/Km values are greater than that of the native reference sequence (Figure 4-5B). Nonetheless, their relative kcat/Km values are sensitive to N(-6 to -3) sequence variation, consistent with the HTS-Kin results.

These same pre-tRNAs engineered to contain a G(-2)U(-1) would form complementary base pairs with the A73 and C74 of the 3’ACCA. For all three distal 5’ leader sequences, significantly slower (5 to 10-fold) relative rate constants were observed (Figure 4-5B). Interestingly, the same relative rate constant was observed within error for all three pre-tRNAs, independent of the identity of N(-6 to -3). This observation contrasts with the sensitivity to N(-6 to -3) sequence variation in the context of 5’ leader sequences with A(-2)A(-1). Substrates designed with perfectly complementary base-pairs with the 3’ACCA of the tRNA body contained a 5’-UGGU-3’ at N(-4 to -1) and were inactive in multiple turnover reactions. However, the HTS-Kin data indicate that 5’ leader sequences containing 5’-YGGU-3’ that pair with the 3’ACCA have krel values close to the genomically encoded reference. These data points are observed as the outlier sequence variants that contain a G(-2)U(-1) that do not follow the global trend compared to the A(-2)U(-1) reference in Figure 4-5B. This inaccuracy relative to single substrate experiments is likely due to non-linear amplification of these variants during library preparation. For these outlier sequences, formation of a complete stem involving the 3’ACCA may compete for primer binding in RT-PCR steps required for Illumina sequencing. Nonetheless, HTS-Kin results are broadly predictive of results obtained using individual sequence variants.

Figure 4-5: Additional secondary structure in the acceptor stem of the pre-tRNAMet formed between 5’ leader nucleotides and the 3’ACCA slows processing rate constants. A) The proposed interactions between the N(-4) to N(-1) nucleotides of the 5’ leader and the 3’ACCA of the tRNA body. B) Results of individual substrate assays to test the effect of zero, two, or four base pairs between the 5’ leader and 3’ACCA on the krel. The 5’ leader of each substrate from N(-6) to N(-1) is indicated on the x-axis. Experiments were performed in triplicate and error bars indicate the standard deviation from all experiments. * indicates no cleavage was observed.
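The potential of a given 5’ leader to extend the acceptor stem by pairing with the 3’ACCA can be scored with a simple Python sketch. It assumes the register implied by the pairing of G(-2)U(-1) with C74 and A73, that is, N(-1) opposite position 73, N(-2) opposite 74, and so on, and it counts only contiguous Watson-Crick pairs starting at N(-1). Wobble pairs and alternative registers are ignored, and the function name `leader_acceptor_pairs` is hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def leader_acceptor_pairs(leader, three_prime_end="ACCA"):
    """Count contiguous Watson-Crick pairs that extend the acceptor stem.

    leader          : 5' leader written 5'->3', ending at N(-1)
    three_prime_end : tRNA nucleotides 73-76 written 5'->3'
    N(-1) is tested against position 73, N(-2) against 74, and so on;
    counting stops at the first position that cannot pair.
    """
    count = 0
    for leader_nt, tail_nt in zip(reversed(leader), three_prime_end):
        if (leader_nt, tail_nt) not in WATSON_CRICK:
            break
        count += 1
    return count
```

With this scoring, leaders ending in A(-2)A(-1), G(-2)U(-1), and 5’-UGGU-3’ give zero, two, and four potential base pairs, respectively, matching the three classes of substrates compared in the single substrate assays.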

N(-2) identity controls the contribution of distal leader sequences that contact C5 protein to specificity- The above analysis determined at multiple levels that the identity of nucleotides contacted by P RNA modulates the sensitivity of RNase P to sequence variation in the C5 protein binding site. This effect is exemplified by the substrate population with A(-2)U(-1) that shows a broad affinity distribution for substrates that vary only in the protein binding site at N(-6 to -3) (Figure 4-6A). This variation in affinity is indicative of 5’ leader nucleotides in the protein binding site having significant energetic contribution to kcat/Km. For substrates with U(-2)U(-1) the apparent sequence specificity in the C5 protein binding site at N(-6 to -3) is essentially unchanged (see Figure 4-S3). In contrast, substrates with C(-2)U(-1) have a narrow distribution of processing rate constants, resulting from sequence variation at positions N(-6 to -3) having smaller effects on kcat/Km (Figure 4-6B).

To determine the effect of different sequence variations at N(-6 to -3) in the context of either A(-2)U(-1) or C(-2)U(-1), we directly compared the rate constants from each data subset. Plotting the observed krel for each possible N(-6 to -3) sequence variant in the background of C(-2)U(-1) versus A(-2)U(-1) (Figure 4-6C) reveals a slope of 0.4, demonstrating that substrates with a C(-2) are considerably less sensitive to variation in the C5 protein binding site.

These predictions are validated by the results from single substrate assays of individual 5’ leader sequence variants. Mutation of N(-6 to -3) in the C5 protein binding site from AUAU to CACG reduces kcat/Km for RNase P processing by ca. 2-fold when N(-2) in the RNA binding site is an adenosine. However, there is no significant measurable difference in rate constant for the same change in the N(-6 to -3) sequence when N(-2) is cytosine (Figure 4-6D). Thus, the identity of proximal 5’ leader nucleotides that contact P RNA can influence the degree to which nucleotides distal from the cleavage site contribute to specificity.

Figure 4-6: Coupling between nucleotides in the RNA and protein binding sites in the 5’ leader of pre-tRNA is a key component of RNase P specificity. A) One population of substrates with U(-1) and A(-2) includes 256 sequences; the affinity distribution for these substrates is broad and shows that protein binding site variation alters RNase P processing rates. B) For those substrates with U(-1) and C(-2), the affinity distribution is much narrower, indicating little effect of protein binding site sequence variation on krel. C) Comparison of the measured krel from HTS-Kin for a particular protein binding site sequence (indicated by each dot) in the background of adenosine or cytosine at N(-2) in the RNA binding site illustrates that NNNNAU substrates are sensitive to protein binding site sequence variation, while NNNNCU are insensitive to this variation. Two substrate variants are indicated in blue and red that contain AUAU or CACG at N(-6) to N(-3), respectively. D) Single substrate multiple turnover assays measured the absolute kcat/Km for substrates with the indicated 5’ leaders on the x-axis. These reactions contained pre-tRNAMetWT as an internal reference. Experiments were performed in triplicate and error bars indicate the standard deviation from all experiments.


Discussion

In the E. coli cell, a single RNase P enzyme must process all 87 pre-tRNAs.

Previous biochemical and structural studies of RNase P substrate recognition provide a general model for specificity in which the C5 protein contacts 5’ leader nucleotides (24,26,27,36), while the P RNA subunit contacts both the tRNA body and 5’ leader nucleotides proximal to the cleavage site (27,28,34,49,51). Analysis of different endogenous pre-tRNAs reveals similar kcat/Km values despite significant variation in regions contacted by RNase P, including the 5’ leader.

However, swapping 5’ leader sequences between different pre-tRNAs has large effects on in vitro processing rate constants (31). Within the endogenous pre-tRNA population, 5’ leader length and potential to form secondary structure are widely variable. Thus, despite a general model of substrate specificity, the manner in which RNase P accommodates variation in these features in order to function has been relatively unexplored until recently.

Structure-function studies have revealed details of how optimal 5’ leader sequences are positioned in the RNase P active site. Studies from our lab and others identified a direct base-pairing interaction between the U(-1) of pre-tRNAAsp and the A248 residue in J5/15 of the P RNA subunit (25,35). Additionally, the nucleotide sequence in the 5’ leader of pre-tRNA was shown to be important for substrate binding by RNase P, particularly for the P RNA subunit (26,31). Fierke and colleagues identified a specific interaction between tyrosine at position 34 in the C5 protein of Bacillus subtilis and the N(-4) position in the 5’ leader of pre-tRNAAsp that regulates binding affinity (39). Recent HTS-Kin analysis of pre-tRNAMetN(-6 to -1) processed by the RNase P holoenzyme (44) showed that the identities of N(-2) and N(-3) primarily control alternative substrate selection at the level of association, not the cleavage step. As a consequence, the specificity for N(-1), which contacts the active site and contributes to catalysis, is suppressed.

The use of HTS-Kin provides a unique advance in determining the energetic contribution of each residue, as it quantifies the effect of mutation of a given nucleotide in the 5’ leader in the background of all possible surrounding sequences. By mining the resulting affinity distributions for P RNA and RNase P processing of the pre-tRNAMetN(-6 to -1) random pool, we provide evidence for both expected and unexpected aspects of RNase P specificity. Firstly, we observe no change in global specificity for nucleotides at N(-6 to -3) in protein-RNA contacts upon variation in sequence at RNA-RNA contacts. Instead, we observed attenuation effects in which, at one extreme, the identity of 5’ leader nucleotides contacting P RNA completely eliminates the energetic contribution of nucleotides in the C5 protein binding site. Additionally, the inhibitory effect of pairing between proximal 5’ leader nucleotides and the tRNA body is demonstrated. Interestingly, these effects also result in an attenuation of the contribution of distal 5’ leader sequences to pre-tRNA affinity. Coupling observed between the RNA and protein subunits of RNase P in substrate recognition could be a result of either altered substrate association or conformational change of the enzyme upon substrate binding (Figure 4-7). Indeed, the induced fit model of RNase P predicts this conformational change, and this step in the reaction has been shown to be rate limiting for the B. subtilis RNase P (52).


Figure 4-7: A mechanistic model for modulation of interdependence of proximal and distal leader sequence specificities. A possible interpretation of the results herein is that the rate limiting step differs between substrates with A(-2) and those with C(-2), possibly shifting to the conformational change step.


This may also be the case for the E. coli enzyme studied here, and further studies will be needed to dissect the mechanism of the energetic coupling identified in this work.

The idea of coupling between different regions of enzyme or substrate is not unprecedented. It was previously shown for alkaline phosphatase that single mutations of active site residues could not account for the combined rate defect observed when mutated in combination (53). Another example of coupling in RNA processing was shown in tRNA binding by the ribosome where mutation of the anti-codon stem in the tRNA body resulted in weakened binding of the tRNA to its cognate codon (54). While several other examples are found within the literature, the role of energetic coupling in specificity in the case of enzymes with multiple subunits is unexplored.

Overall, this study provides the first foray into a comprehensive and quantitative determination of energetic coupling between individual subunits of RNA processing enzymes. Using RNase P as a simple model for these studies will reveal general principles of this energetic coupling that can be applied to understanding more complex enzymes with multiple subunits such as the spliceosome, hnRNPs, and the ribosome. Given the well-established role of RNA binding proteins in human health, obtaining a complete picture of their substrate specificity is paramount to identifying their therapeutic potential. To fully understand how to target these enzymes using novel small molecules or drugs and to understand their mechanism of action, a quantitative understanding of their substrate recognition is essential and is only complete with a solid description of the energetics of these interactions.


Materials and Methods

RNA and Protein Preparation

C5 protein was expressed and purified as previously described (104). Both P RNA and pre-tRNAMet were prepared by in vitro transcription as described previously (49). Briefly, the genes for P RNA or pre-tRNAMet were cloned into pUC19 vector and linearized to use as a template for T7 RNA polymerase (New England Biolabs). To create the mutant pre-tRNAMetN(-6 to -1) substrate pool, DNA primers incorporating mutations at the desired positions were used for PCR amplification of the cloned DNA template and this PCR product was used for in vitro transcription as described (75,77). RNA was purified by polyacrylamide gel electrophoresis with UV shadowing followed by standard phenol-chloroform extraction and ethanol precipitation with a final resuspension in 10 mM Tris-HCl pH 8.0, 1 mM EDTA. A portion of the pre-tRNA population was 5’ end labelled with γ-32P using polynucleotide kinase and purified as described above.

RNase P Reactions

Multiple turnover reactions were performed in 50 mM Tris-HCl pH 8, 100 mM NaCl, 0.005% Triton X-100, and 17.5 mM MgCl2. For individual substrate reactions, the RNase P holoenzyme was assembled by heating 2 nM P RNA to 95˚C for 3 min, followed by incubation at 37˚C for 10 min, before addition of 17.5 mM MgCl2 and 2 nM C5 protein. Substrate pools were prepared separately using 60 nM unlabeled pre-tRNA spiked with a negligible amount of 32P-pre-tRNA with 17.5 mM MgCl2. The reaction was started by mixing equal volumes of enzyme and substrate to give 30 nM substrate and 1 nM enzyme. Aliquots were taken at desired timepoints and quenched in formamide loading dye with 100 mM EDTA. Polyacrylamide gel electrophoresis was used to separate substrate and product, and the labelled portion of the substrate population allowed for quantification by phosphorimager and ImageQuant software. For reactions demonstrating coupling, 30 nM of pre-tRNAMetWT with a shortened 5’ leader was included in each reaction as an internal reference for substrate processing and used to derive krel by dividing the processing rate of the mutant substrate by that of the wildtype reference.

High-Throughput Sequencing Kinetics

Reactions were performed exactly as described above except they were scaled up by 10-fold in volume to provide sufficient RNA for Illumina sequencing.

Holoenzyme reactions contained 1 µM pre-tRNAMetN(-6 to -1) and 5 nM RNase P holoenzyme. Ribozyme reactions contained 1 µM pre-tRNAMetN(-6 to -1) and 10 nM P RNA. Quantification of the relative processing rate constant was performed as previously described (75,76,149) using the final equation:

$$
k_{rel} = \frac{\ln\left[\dfrac{R_i}{R_{i,0}} \cdot \dfrac{1-f}{\sum_j X_j \dfrac{R_j}{R_{j,0}}}\right]}{\ln\left[\dfrac{1-f}{\sum_j X_j \dfrac{R_j}{R_{j,0}}}\right]}
$$

in which krel is the relative second order rate constant of substrate variant i, f is the overall fraction of substrate reacted determined by phosphorimager analysis, and X is the mole fraction obtained from Illumina reads. R indicates a ratio of reads between the mutant and wildtype substrate from Illumina sequencing, where the subscript i,0 denotes this ratio before the reaction begins and the subscript i denotes this ratio at the fraction of substrate reacted f. The summation index j runs over the substrate variants in the pool.
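The krel expression above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis pipeline used in this work: the function name `hts_kin_krel`, the dictionary inputs, and the use of raw read counts as stand-ins for relative abundance are all hypothetical choices made for the example.

```python
import math

def hts_kin_krel(reads_t0, reads_t, f, reference):
    """Sketch of the krel calculation for every variant in a pool.

    reads_t0, reads_t: dicts mapping a 5' leader sequence to its read count
        before the reaction and at fraction reacted f (hypothetical inputs).
    f: overall fraction of substrate reacted (0 < f < 1).
    reference: key of the wildtype reference substrate.
    """
    total0 = sum(reads_t0.values())
    # X: mole fraction of each variant in the starting pool
    x = {s: n / total0 for s, n in reads_t0.items()}

    # R_i / R_i,0: change in the variant-to-reference read ratio
    ratio_change = {}
    for s in reads_t0:
        r0 = reads_t0[s] / reads_t0[reference]
        r = reads_t[s] / reads_t[reference]
        ratio_change[s] = r / r0

    # (1 - f) / sum_j X_j R_j/R_j,0 is the fraction of reference remaining
    denom = sum(x[s] * ratio_change[s] for s in reads_t0)
    ref_remaining = (1.0 - f) / denom

    return {
        s: math.log(ratio_change[s] * ref_remaining) / math.log(ref_remaining)
        for s in reads_t0
    }
```

By construction the reference substrate returns a krel of 1, and the result does not depend on sequencing depth because only read ratios and mole fractions enter the calculation.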

RNA Sequence Specificity Modeling

The position weight matrix model with coupling coefficients included considered nucleotide identity and position in the randomized region as well as position and identity of other nucleotides in the binding site using the following equation:

$$
\ln(k_{rel}) = \sum_{i=1}^{6} \left(a_i A_i + c_i C_i + g_i G_i + u_i U_i\right) + \beta_j I_j
$$

where ai, ci, gi, and ui are integer values (0 or 1) signifying nucleotide identity and Ai, Ci, Gi, and Ui represent the linear coefficients for that nucleotide at position i. The term βj is the linear coefficient for interaction between two positions and nucleotide identities. Ij is 1 for all substrates with that specific pair of nucleotides, and 0 otherwise. Each interaction term which had an absolute t-value greater than 3.5 (p < 0.005) was used in a final model of stepwise regression to obtain predicted krel values.
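As an illustration of how such a model is evaluated once coefficients are in hand, the following minimal sketch sums the position-specific terms and any matching interaction terms for one 5’ leader. The function name, the data layout, and the coefficient values in the usage note are hypothetical; the regression that would produce real coefficients is not shown.

```python
NUCLEOTIDES = "ACGU"

def predict_ln_krel(leader, pwm, couplings):
    """Evaluate a position weight matrix plus coupling terms for one leader.

    leader: 6-nt string ordered N(-6) ... N(-1), so index 0 is N(-6).
    pwm: dict {(position_index, nucleotide): linear coefficient}.
    couplings: dict {((pos1, nt1), (pos2, nt2)): interaction coefficient}.
    All coefficient values supplied by the caller are assumptions.
    """
    if len(leader) != 6 or any(nt not in NUCLEOTIDES for nt in leader):
        raise ValueError("leader must be six nucleotides drawn from A, C, G, U")

    total = 0.0
    # Independent contribution of the nucleotide found at each position
    for i, nt in enumerate(leader):
        total += pwm.get((i, nt), 0.0)

    # Interaction terms: the indicator is 1 only when both nucleotides match
    for ((p1, n1), (p2, n2)), beta in couplings.items():
        if leader[p1] == n1 and leader[p2] == n2:
            total += beta
    return total
```

For example, with made-up coefficients `pwm = {(4, "A"): 0.5, (5, "U"): 0.2}` and `couplings = {((3, "U"), (4, "A")): -0.3}`, the leader `AUAUAU` picks up both independent terms and the interaction term, while `CACGCU` picks up only the U(-1) term.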


Supplementary Data

Figure 4-S1: Schematic of the HTS-Kin procedure. First, the RNA is randomized in sequence at the region of interest, here at the -1 to -6 positions in the 5’ leader yielding 4,096 pre-tRNA variants. Next, we react that substrate pool with RNase P, and thus high affinity substrates will be cleaved quickly, while the lower affinity pre-tRNAs will take longer to react. Then, we isolate the residual substrate population at different reaction times by gel purification and perform RT-PCR to analyze by Illumina sequencing. The number of high-throughput sequencing reads allows us to estimate the change in abundance of each pre-tRNA as a function of time. As shown, if we align sequences from fast to slow reacting, and compare the number of reads, we can see that as the reaction progresses, fast reacting sequences become depleted from the substrate population while slower substrates accumulate. We calculate for each substrate a relative rate constant, or krel value, (which is a ratio of its kcat/Km to that of the genomically encoded reference) using the total fraction of reaction, f, and the ratio of the number of reads of the sequence to the WT reference represented by R.


Figure 4-S2: Including coupling coefficients in a mathematical description of the data provides a better model for RNase P specificity. A) Conceptual diagram of the position weight matrix (PWM) model showing each position as independent of all others in the randomized region (from Jankowsky et al). B, C) Scatter plots of the relative rate constants of pre-tRNAMetN(-6 to -1) observed in the ribozyme (B) and holoenzyme (C) HTS-Kin reactions compared to those predicted by the position weight matrix model. D) Diagram depicting the PWM+coupling coefficient model in which any two nucleotides in the randomized region may attenuate or modulate one another. E, F) Comparison of the relative rate constants of pre-tRNAMetN(-6 to -1) observed in the ribozyme (E) and holoenzyme (F) HTS-Kin reactions with those predicted by the position weight matrix model including coupling coefficients (PWM+IC). Less than 10% of data omitted for clarity. RNase P modeling results published in Niland et al. ACS Chem Biol 2016.


Figure 4-S3: Dividing the holoenzyme HTS-Kin data into subsets reveals extensive coupling between sequence identity in the RNA binding site and sequence specificity of the protein binding site in the 5’ leader of pre-tRNA. The HTS-Kin data were divided into 16 subsets based on the identity of nucleotides at N(-2) and N(-1) in the RNA binding site. Moving across a row corresponds to a change in nucleotide identity at the N(-1) position, while moving down a column corresponds to a change at N(-2). Each of the 16 boxes represents the affinity distribution of substrate variants with a common set of nucleotides at N(-2) and N(-1) in the 5’ leader, in which only nucleotides contacting the C5 protein (N(-6) to N(-3)) are allowed to vary. The dashed red line indicates the average krel of this population and the dashed blue line shows the highest krel in the data subset.


Chapter 5: Summary and Future Directions


Summary

Ribonuclease P is an essential endonuclease responsible for cleavage of 5’ leaders from precursor tRNAs. RNase P is found in all domains of life in various forms and must process all pre-tRNAs despite variation in sequence and structure. The RNA subunit of bacterial RNase P, P RNA, is the catalytic portion of the enzyme while the protein subunit, C5 protein, aids in substrate recognition by binding the 5’ leader and divalent cations (16,38,59). Proximal nucleotides of the 5’ leader, N(-2) to N(-1), contact the P RNA subunit while distal nucleotides N(-6 to -3) are bound by the C5 protein. Despite biochemical evidence that mutation of nucleotides in the 5’ leader alters processing rates, it was thought until recently that this enzyme was non-specific due to the wide range of substrates it must process.

Based on previous biochemical and structural work, the effect of variation in the 5’ leader of pre-tRNA on the processing rate constants by RNase P has been documented to some extent; however, the molecular determinants of this specificity are still elusive. In the work presented herein, we aimed to understand how the natural variation in the 5’ leader sequence of endogenous pre-tRNAs is accommodated by the enzyme and also whether there is crosstalk between the two enzyme subunits to achieve this shared molecular recognition. To accomplish this, we randomized the region of the 5’ leader of pre-tRNA that contacts both the P RNA subunit and the C5 protein (nucleotides N(-6 to -1)) and measured the processing rates of all substrate variants using high-throughput sequencing.


Identification of Rate Limiting Step in RNase P Processing of 5’ Leader Variants in Pre-tRNA by HTS-Kin

The optimal experiment to gain additional insight into RNase P specificity would be measurement of the processing rate constants of substrates with all possible sequence combinations in the region of interest. This approach carries strong advantages over those using single point mutations or techniques such as SELEX that qualitatively analyze only the tightest binding substrates, because it provides not only quantitative analysis of all substrate variants but also a comprehensive determination of specificity, as it includes the effect of mutation at one position in the substrate in all possible backgrounds of the randomized region. Another significant strength of this technique is that measurements are made with substrates in competition with one another for processing by the enzyme, which is the true nature of substrates in the cell and thus provides a true measure of specificity.

As an initial means to gain insight into the specificity of RNase P processing of pre-tRNA substrates, we randomized regions of the 5’ leader of pre-tRNA contacting both the RNA and protein subunits, N(-6 to -1). This randomized substrate pool was then subjected to multiple turnover reactions with RNase P holoenzyme and the rate constants of each substrate were quantified using the HTS-Kin technique. As an initial validation of this new population, we compared the rate constants obtained for a subset of these substrate variants with those previously reported in randomization of the C5 protein binding site alone and showed that they were highly similar (75).


By comparing the processing rate constants of these substrates under multiple turnover and single turnover conditions, we are able to discern differences in the effect of these mutations on the second-order (kcat/Km) and first-order (kcat) rate constants, respectively. We observed that RNase P seems to be globally tuned for specificity at association (i.e. E+S → E•S), as there is significant variation in the krel observed for the randomized substrate population under kcat/Km conditions which is eliminated under kcat conditions. The multiple turnover reactions were best described by a mathematical model which takes into account not only nucleotide identity and position in the randomized region but also the neighboring environment, suggesting effects of energetic coupling on RNase P specificity. Additionally, we observe that, in general, substrates carrying mutations predicted to alter secondary structure have slower processing rate constants.

Together these experiments comprehensively define the effect of 5’ leader sequence variation on both kcat/Km and kcat using the HTS-Kin technique. We also comprehensively determined RNase P specificity for 5’ leaders of pre-tRNAs and identified the rate limiting step in the processing of these substrates.

Investigation of the Sources of Error in HTS-Kin and Optimal Conditions for Quantitative Analysis

In order to understand the potential sources of error in the High-Throughput Sequencing Kinetics experiments performed here, we undertook a rigorous analysis of each step in the procedure and its effect on the observed rate constants. This analysis was important in confirming that this high-throughput data set abided by the principles of competitive alternative substrate kinetics and in providing a roadmap for performing these experiments in future studies.

First, we found that neither the mole fraction of a substrate variant in the population nor its starting concentration relative to the reference correlated with the substrate processing rate constant, whereas the change in substrate concentration over time showed an exponential decrease with increasing krel. These reactions therefore appear to follow a typical competitive alternative substrate model, in which substrate concentration is irrelevant to the determined processing rate constant because each substrate acts as a competitive inhibitor of the others. Consequently, the only parameter that governs krel is the second order rate constant, which is reflected in the change in concentration of that substrate over time.
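The concentration independence described above follows from the standard treatment of internal competition. As a brief sketch, assume each substrate reacts with the same pool of free enzyme E with its own apparent second order rate constant kcat/Km; the symbols below are chosen for illustration and are not taken from the original text.

```latex
\frac{d[S_i]}{dt} = -\left(\frac{k_{cat}}{K_m}\right)_{i} [E][S_i],
\qquad
\frac{d[S_{ref}]}{dt} = -\left(\frac{k_{cat}}{K_m}\right)_{ref} [E][S_{ref}]

\frac{d\ln[S_i]}{d\ln[S_{ref}]}
  = \frac{(k_{cat}/K_m)_{i}}{(k_{cat}/K_m)_{ref}}
  = k_{rel}
\quad\Longrightarrow\quad
\ln\frac{[S_i]}{[S_i]_0} = k_{rel}\,\ln\frac{[S_{ref}]}{[S_{ref}]_0}
```

Dividing the two rate equations cancels the free enzyme concentration, so under these assumptions the fractional depletion of a variant relative to the reference depends only on krel and not on how much of either substrate was present at the start.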

We also pinpoint the portion of the reaction in which timepoints should be taken for HTS-Kin analysis. For example, we observe that sampling the reaction at later fractions of substrate reacted shows significant compression of the relative rate constants toward a krel of 1. These data suggest that using longer reaction times for HTS-Kin analysis allows substrates processed with slower relative rate constants to appear to catch up to their faster counterparts, and thus the variation in processing rate constants is reduced. We find that selecting timepoints near the early part of the reaction (10-20% reacted) provides the greatest precision.

However, the error from Illumina sequencing was a large source of the observed error in the HTS-Kin measurements. This was quantified by sequencing the same sample in duplicate without re-preparation. In addition, the slowest reacting substrates in the population consistently showed the largest amount of error in their calculated krel, and we attribute this to the small change in the concentration of these substrates at early timepoints. We conclude that experimental precision and Illumina sequencing are the largest sources of error in HTS-Kin reactions, and we suggest that the optimal way to handle such uncertainty is to perform these reactions as three biological replicates and use the averaged dataset.
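One simple way to combine replicate datasets is sketched below. It assumes each replicate is a dictionary of krel values keyed by 5’ leader sequence and reports the mean and sample standard deviation for variants measured in every replicate; the function name and the choice of summary statistics are assumptions made for illustration, not a description of the analysis performed here.

```python
from statistics import mean, stdev

def average_replicates(replicates):
    """Average krel across replicates for variants present in all of them.

    replicates: list of dicts {leader sequence: krel}, at least two entries.
    Returns {leader sequence: (mean krel, sample standard deviation)}.
    """
    if len(replicates) < 2:
        raise ValueError("at least two replicates are needed for a standard deviation")

    # Keep only variants that were quantified in every replicate
    shared = set(replicates[0]).intersection(*replicates[1:])

    summary = {}
    for leader in sorted(shared):
        values = [rep[leader] for rep in replicates]
        summary[leader] = (mean(values), stdev(values))
    return summary
```

Variants missing from any replicate are dropped rather than averaged over fewer measurements, which keeps the uncertainty estimates comparable across the dataset.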

Observation of Energetic Coupling in RNase P Specificity for Pre-tRNA Substrates

Next, we sought to understand the molecular determinants that dictate the shared molecular recognition of pre-tRNA substrates by both subunits of RNase P. Since randomizing N(-6 to -1) in the 5’ leader encompasses binding sites of both the RNA and protein subunits of RNase P, we were able to dissect the potential long range effects of energetic coupling in our HTS-Kin data. This included understanding whether the sequence identity at proximal RNA-RNA contacts alters specificity at distal protein-RNA contacts.

First, we analyzed the effect of 5’ leader sequence variation on the rate constant for processing by the ribozyme, P RNA alone, and compared it to the holoenzyme reaction, P RNA and C5 protein. This revealed that the ribozyme was insensitive to this variation, in contrast to the holoenzyme reaction, which showed a significant range of substrate processing rate constants. Mathematical models that took into account not only the position and identity of each nucleotide in the 5’ leader but also its neighboring environment provided the best fit to data from both reactions. We further showed that the strongest effects of energetic coupling were found between neighboring nucleotides. This indicated that energetic coupling is important for understanding RNase P specificity, as the nucleotides in the randomized region can modulate or attenuate the effect of variation at another position.

Hypothesis-driven data mining was used to directly identify energetic coupling in this high-throughput quantitative biochemical dataset. By creating easily interpretable data subsets, we were able to identify that there is no change in sequence specificity of the C5 protein for nucleotides N(-6 to -3) when contacts to the P RNA at N(-2) or N(-1) are mutated. However, there appears to be an on and off switch to the energetic contribution of nucleotides contacting the C5 protein that is located at the N(-2) position. By changing the identity of N(-2), we quantified the contribution of nucleotides at N(-6 to -3) to the processing rate constant. We showed that an adenosine at N(-2) allowed for tuning of RNase P processing by protein-RNA contacts, while a cytosine at N(-2) reduced this contribution significantly and led to uniformity in substrate processing. We directly confirmed these results using single substrate assays.
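The subsetting step described above can be illustrated with a short sketch that groups krel values by the nucleotides at N(-2) and N(-1) and reports how widely krel varies within each group. The function names and the krel values in the usage note are hypothetical and serve only to show the shape of the analysis.

```python
from collections import defaultdict

def split_by_rna_site(krel_by_leader):
    """Group krel values by the N(-2)N(-1) dinucleotide.

    krel_by_leader: dict {6-nt leader ordered N(-6)...N(-1): krel}.
    Returns {N(-2)N(-1) dinucleotide: {N(-6 to -3) sequence: krel}}.
    """
    subsets = defaultdict(dict)
    for leader, krel in krel_by_leader.items():
        protein_site, rna_site = leader[:4], leader[4:]
        subsets[rna_site][protein_site] = krel
    return dict(subsets)

def krel_range(subset):
    """Fold difference between the fastest and slowest variant in a subset."""
    values = list(subset.values())
    return max(values) / min(values)
```

With made-up input such as `{"AUAUAU": 1.0, "CACGAU": 0.5, "AUAUCU": 0.9, "CACGCU": 0.9}`, the `AU` subset spans a 2-fold range while the `CU` subset spans a 1-fold range, which is the kind of contrast between broad and narrow affinity distributions that the subsets are meant to expose.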

Our overall hypothesis is that substrates with an adenosine at N(-2) may have a rate limiting step at association, and thus only substrate variants that alter substrate binding will be detected. Conversely, those substrates with cytosine at N(-2) may have a different rate limiting step, such as conformational change of the enzyme-substrate complex, that masks the effect of this variation on the initial binding event. This change in rate limiting step is unlikely to be downstream of this part of the reaction for the following reasons: there was no effect of 5’ leader variation on the single turnover rate constant of the majority of substrates, and competitive substrate kinetics report on the rate limiting step of the reaction up to and including the first irreversible step. Other more complicated hypotheses are also possible, such as a local perturbation of structure or change in chemical environment; however, we favor the change in mechanism model, which can be directly tested in future experiments.

Discussion

Taken together, these studies show that RNase P possesses quantifiable specificity for 5’ leaders of pre-tRNA substrates. From this comprehensive analysis of the processing rate constants for all 5’ leader sequences, we identified energetic coupling between the RNA and protein subunits of RNase P in substrate recognition. These results provide an initial explanation of the ability of RNase P to process multiple substrates and also may indicate a means of regulation in substrate recognition. Currently, an understanding of how this change in energetic coupling manifests in substrate processing in vivo is lacking. For instance, it is possible that pre-tRNAs corresponding to genes of the rarer amino acids (i.e. proline and methionine) are processed more slowly as they are needed less frequently for tRNA charging and protein synthesis, while those for the more common amino acids are processed quickly as they are needed more frequently. Whether this hypothesis is true will be difficult to test, as there are many steps in tRNA processing which could be the rate limiting step in their formation, many pre-tRNAs are found in polycistronic transcripts in bacteria, which alters their processing, the abundance of pre-tRNAs or tRNA in the cell has not been well quantified, the effect of sequence in the tRNA body has not been accounted for here, and many pre-tRNAs are transcribed from multiple genes.

Regardless of the relation to in vivo processing of pre-tRNAs, these studies provide the first comprehensive description of substrate recognition by multiple enzyme subunits. This work follows studies from the Uhlenbeck, Batey, and Herschlag labs in positing the coordinated functions of residues in the substrate or enzyme and furthers the idea that a complete description of the function of an individual residue cannot be obtained by studying it in isolation using single point mutations. Rather, these positions must be viewed as acting together to achieve effects that are greater than the sum of their parts. I submit that the future of enzymology lies in determining the kinetics of reactions in vivo and determining the energetic coupling of enzyme-substrate interactions.

Future Directions

In the future, it would be highly intriguing to investigate the change in specificity of RNase P from different species. For instance, a comparison of the specificity of RNase P from a commensal like E. coli and pathogenic bacteria like Staphylococcus aureus may reveal differences that can be exploited in developing antimicrobial agents against pathogenic bacteria. There are two categories of RNase P enzymes from bacteria, an A-type which includes E. coli and a B-type derived from B. subtilis, and these would be predicted to have different specificities, particularly at protein-RNA contacts, as the protein is less conserved.

Additionally, the P RNA subunit of RNase P makes extensive contacts with the tRNA body of the substrate, but it is not clear whether there is crosstalk in substrate recognition at these sites. An understanding of the contribution of these interactions could be preliminarily gained by studying the effect of 5’ leader randomization in the background of different tRNA bodies. It would also be highly intriguing to randomize regions such as the TψC loop, anti-codon stem, 3’ACCA, or G-C base-pair at the top of the acceptor stem in isolation or in combination to interrogate another level of energetic coupling.

Finally, dissecting the mechanistic basis for this energetic coupling will be critical to understanding how and whether it might occur in other multisubstrate RNA processing enzymes. To approach this question, the microscopic rate constants for substrate association, conformational change, and equilibrium binding will need to be measured. This would take advantage of pulse-chase experiments, stopped-flow kinetics, and fluorescence polarization anisotropy, all of which have been previously applied to RNase P. In addition, with proper and careful modification, the HTS-Kin technique may be adapted to perform some of these measurements in a high-throughput manner rather than on a subset of model substrates.


Bibliography

1. Diederichs, S., Bartsch, L., Berkmann, J. C., Frose, K., Heitmann, J., Hoppe, C., Iggena, D., Jazmati, D., Karschnia, P., Linsenmeier, M., Maulhardt, T., Mohrmann, L., Morstein, J., Paffenholz, S. V., Ropenack, P., Ruckert, T., Sandig, L., Schell, M., Steinmann, A., Voss, G., Wasmuth, J., Weinberger, M. E., and Wullenkord, R. (2016) The dark matter of the cancer genome: aberrations in regulatory elements, untranslated regions, splice sites, non-coding RNA and synonymous mutations. EMBO Mol Med 8, 442-457
2. Johnson, R., Noble, W., Tartaglia, G. G., and Buckley, N. J. (2012) Neurodegeneration as an RNA disorder. Prog Neurobiol 99, 293-315
3. Kai, M. (2016) Roles of RNA binding proteins in DNA Damage Response. Int J Mol Sci 17, 310
4. Jiang, S., and Baltimore, D. (2016) RNA-binding protein Lin28 in cancer and immunity. Cancer Lett 375, 108-113
5. Darbelli, L., and Richard, S. (2016) Emerging functions of the Quaking RNA binding proteins and link to human diseases. Wiley Interdiscip Rev RNA 7, 399-412
6. Cookson, M. R. (2016) RNA binding proteins implicated in neurodegenerative diseases. Wiley Interdiscip Rev RNA
7. Forster, A. C., and Symons, R. H. (1987) Self-cleavage of plus and minus RNAs of a virusoid and a structural model for the active sites. Cell 49, 211-220
8. Buzayan, J. M., Hampel, A., and Bruening, G. (1986) Nucleotide sequence and newly formed phosphodiester bond of spontaneously ligated satellite tobacco ringspot virus RNA. Nucleic Acids Res 14, 9729-9743
9. Fedor, M. J. (2000) Structure and function of the hairpin ribozyme. J Mol Biol 297, 269-291
10. Lilley, D. M. (2004) The Varkud satellite ribozyme. RNA 10, 151-158
11. Been, M. D., and Wickham, G. S. (1997) Self-cleaving ribozymes of hepatitis delta virus RNA. European journal of biochemistry / FEBS 247, 741-753
12. Fedorova, O., Su, L. J., and Pyle, A. M. (2002) Group II introns: highly specific endonucleases with modular structures and diverse catalytic functions. Methods 28, 323-335
13. Kruger, K., Grabowski, P. J., Zaug, A. J., Sands, J., Gottschling, D. E., and Cech, T. R. (1982) Self-splicing RNA: autoexcision and autocyclization of the ribosomal RNA intervening sequence of Tetrahymena. Cell 31, 147-157
14. Qin, P. Z., and Pyle, A. M. (1998) The architectural organization and mechanistic function of group II intron structural elements. Curr Opin Struct Biol 8, 301-308
15. van der Veen, R., Arnberg, A. C., van der Horst, G., Bonen, L., Tabak, H. F., and Grivell, L. A. (1986) Excised group II introns in yeast mitochondria are lariats and can be formed by self-splicing in vitro. Cell 44, 225-234
16. Guerrier-Takada, C., Gardiner, K., Marsh, T., Pace, N., and Altman, S. (1983) The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme. Cell 35, 849-857
17. Guerrier-Takada, C., and Altman, S. (1984) Catalytic activity of an RNA molecule prepared by transcription in vitro. Science 223, 285-286
18. Strobel, S. A., and Cochrane, J. C. (2007) RNA catalysis: ribozymes, ribosomes, and riboswitches. Current opinion in chemical biology 11, 636-643
19. Garst, A. D., Edwards, A. L., and Batey, R. T. (2011) Riboswitches: structures and mechanisms. Cold Spring Harb Perspect Biol 3


20. Kortmann, J., and Narberhaus, F. (2012) Bacterial RNA thermometers: molecular zippers and switches. Nat Rev Microbiol 10, 255-265
21. Wagner, E. G., and Romby, P. (2015) Small RNAs in bacteria and archaea: who they are, what they do, and how they do it. Adv Genet 90, 133-208
22. Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden, B. L., Kundrot, C. E., Cech, T. R., and Doudna, J. A. (1996) Crystal structure of a group I ribozyme domain: principles of RNA packing. Science 273, 1678-1685
23. Zhang, L., and Doudna, J. A. (2002) Structural insights into group II intron catalysis and branch-site selection. Science 295, 2084-2088
24. Ferre-D'Amare, A. R., Zhou, K., and Doudna, J. A. (1998) Crystal structure of a hepatitis delta virus ribozyme. Nature 395, 567-574
25. Pley, H. W., Flaherty, K. M., and McKay, D. B. (1994) Three-dimensional structure of a hammerhead ribozyme. Nature 372, 68-74
26. Reiter, N. J., Osterman, A., Torres-Larios, A., Swinger, K. K., Pan, T., and Mondragon, A. (2010) Structure of a bacterial ribonuclease P holoenzyme in complex with tRNA. Nature 468, 784-789
27. Suslov, N. B., DasGupta, S., Huang, H., Fuller, J. R., Lilley, D. M., Rice, P. A., and Piccirilli, J. A. (2015) Crystal structure of the Varkud satellite ribozyme. Nat Chem Biol 11, 840-846
28. Yusupov, M. M., Yusupova, G. Z., Baucom, A., Lieberman, K., Earnest, T. N., Cate, J. H., and Noller, H. F. (2001) Crystal structure of the ribosome at 5.5 A resolution. Science 292, 883-896
29. Stark, B. C. (1995) Early studies of native ribonuclease P, including discovery of its essential RNA component. Mol Biol Rep 22, 95-97
30. Jarrous, N., and Gopalan, V. (2010) Archaeal/eukaryal RNase P: subunits, functions and RNA diversification. Nucleic Acids Res 38, 7885-7894
31. Xiao, S., Houser-Scott, F., and Engelke, D. R. (2001) Eukaryotic ribonuclease P: increased complexity to cope with the nuclear pre-tRNA pathway. J Cell Physiol 187, 11-20
32. Gobert, A., Gutmann, B., Taschner, A., Gossringer, M., Holzmann, J., Hartmann, R. K., Rossmanith, W., and Giege, P. (2010) A single Arabidopsis organellar protein has RNase P activity. Nat Struct Mol Biol 17, 740-744
33. Howard, M. J., Klemm, B. P., and Fierke, C. A. (2015) Mechanistic Studies Reveal Similar Catalytic Strategies for Phosphodiester Bond Hydrolysis by Protein-only and RNA-dependent Ribonuclease P. J Biol Chem 290, 13454-13464
34. Frank, D. N., and Pace, N. R. (1998) Ribonuclease P: unity and diversity in a tRNA processing ribozyme. Annu Rev Biochem 67, 153-180
35. Kurz, J. C., and Fierke, C. A. (2000) Ribonuclease P: a ribonucleoprotein enzyme. Current opinion in chemical biology 4, 553-558
36. Guerrier-Takada, C., and Altman, S. (1993) A physical assay for and kinetic analysis of the interactions between M1 RNA and tRNA precursor substrates. Biochemistry 32, 7152-7161
37. Sun, L., Campbell, F. E., Yandek, L. E., and Harris, M. E. (2010) Binding of C5 protein to P RNA enhances the rate constant for catalysis for P RNA processing of pre-tRNAs lacking a consensus (+ 1)/C(+ 72) pair. J Mol Biol 395, 1019-1037
38. Sun, L., and Harris, M. E. (2007) Evidence that binding of C5 protein to P RNA enhances ribozyme catalysis by influencing active site metal ion affinity. RNA 13, 1505-1515
39. Brown, J. W. (1999) The Ribonuclease P Database. Nucleic Acids Res 27, 314
40. Pace, N. R., and Brown, J. W. (1995) Evolutionary perspective on the structure and function of ribonuclease P, a ribozyme. J Bacteriol 177, 1919-1928

41. Kirsebom, L. A. (2007) RNase P RNA mediated cleavage: Substrate recognition and catalysis. Biochimie
42. Wu, S., Kikovska, E., Lindell, M., and Kirsebom, L. A. (2012) Cleavage mediated by the catalytic domain of bacterial RNase P RNA. J Mol Biol 422, 204-214
43. Christian, E. L., Kaye, N. M., and Harris, M. E. (2002) Evidence for a polynuclear metal ion binding site in the catalytic domain of ribonuclease P RNA. EMBO Journal 21, 2253-2262
44. Kaye, N. M., Zahler, N. H., Christian, E. L., and Harris, M. E. (2002) Conservation of helical structure contributes to functional metal ion interactions in the catalytic domain of ribonuclease P RNA. Journal of Molecular Biology 324, 429-442
45. Kaye, N. M., Christian, E. L., and Harris, M. E. (2002) NAIM and site-specific functional group modification analysis of RNase P RNA: magnesium dependent structure within the conserved P1-P4 multihelix junction contributes to catalysis. Biochemistry 41, 4533-4545
46. Kole, R., Baer, M. F., Stark, B. C., and Altman, S. (1980) E. coli RNAase P has a required RNA component. Cell 19, 881-887
47. Niranjanakumari, S., Stams, T., Crary, S. M., Christianson, D. W., and Fierke, C. A. (1998) Protein component of the ribozyme ribonuclease P alters substrate recognition by directly contacting precursor tRNA. Proceedings of the National Academy of Sciences of the United States of America 95, 15212-15217
48. Kazantsev, A. V., and Pace, N. R. (2006) Bacterial RNase P: a new view of an ancient enzyme. Nat Rev Microbiol 4, 729-740
49. Yandek, L. E., Lin, H. C., and Harris, M. E. (2013) Alternative substrate kinetics of Escherichia coli ribonuclease P: determination of relative rate constants by internal competition. J Biol Chem 288, 8342-8354
50. Pan, T., Loria, A., and Zhong, K. (1995) Probing of tertiary interactions in RNA: 2'-hydroxyl-base contacts between the RNase P RNA and pre-tRNA. Proc Natl Acad Sci U S A 92, 12510-12514.
51. Loria, A., and Pan, T. (1997) Recognition of the T stem-loop of a pre-tRNA substrate by the ribozyme from Bacillus subtilis ribonuclease P. Biochemistry 36, 6317-6325.
52. Svard, S. G., Kagardt, U., and Kirsebom, L. A. (1996) Phylogenetic comparative mutational analysis of the base pairing between RNase P RNA and its substrate. RNA 2, 463-472.
53. Busch, S., Kirsebom, L. A., Notbohm, H., and Hartmann, R. K. (2000) Differential role of the intermolecular base pairs G292-C(75) and G293-C(74) in the reaction catalyzed by Escherichia coli RNase P RNA. J Mol Biol 299, 941-951.
54. Brannvall, M., Pettersson, B. M., and Kirsebom, L. A. (2003) Importance of the +73/294 interaction in Escherichia coli RNase P RNA substrate complexes for cleavage and metal ion coordination. J Mol Biol 325, 697-709.
55. Brannvall, M., and Kirsebom, L. A. (2005) Complexity in orchestration of chemical groups near different cleavage sites in RNase P RNA mediated cleavage. J Mol Biol 351, 251-257
56. Zahler, N. H., Christian, E. L., and Harris, M. E. (2003) Recognition of the 5' leader of pre-tRNA substrates by the active site of ribonuclease P. RNA 9, 734-745
57. Zahler, N. H., Sun, L., Christian, E. L., and Harris, M. E. (2005) The pre-tRNA nucleotide base and 2'-hydroxyl at N(-1) contribute to fidelity in tRNA processing by RNase P. J Mol Biol 345, 969-985. Epub 2004 Dec 2008.
58. Koutmou, K. S., Zahler, N. H., Kurz, J. C., Campbell, F. E., Harris, M. E., and Fierke, C. A. (2010) Protein-precursor tRNA contact leads to sequence-specific recognition of 5' leaders by bacterial ribonuclease P. J Mol Biol 396, 195-208

59. Sun, L., Campbell, F. E., Zahler, N. H., and Harris, M. E. (2006) Evidence that substrate-specific effects of C5 protein lead to uniformity in binding and catalysis by RNase P. EMBO J 25, 3998-4007
60. Koutmou, K. S., Day-Storms, J. J., and Fierke, C. A. (2011) The RNR motif of B. subtilis RNase P protein interacts with both PRNA and pre-tRNA to stabilize an active conformer. RNA 17, 1225-1235
61. Brannvall, M., Mattsson, J. G., Svard, S. G., and Kirsebom, L. A. (1998) RNase P RNA structure and cleavage reflect the primary structure of tRNA genes. J Mol Biol 283, 771-783.
62. Loria, A., and Pan, T. (1998) Recognition of the 5' leader and the acceptor stem of a pre-tRNA substrate by the ribozyme from Bacillus subtilis RNase P. Biochemistry 37, 10126-10133.
63. Hansen, A., Pfeiffer, T., Zuleeg, T., Limmer, S., Ciesiolka, J., Feltens, R., and Hartmann, R. K. (2001) Exploring the minimal substrate requirements for trans-cleavage by RNase P holoenzymes from Escherichia coli and Bacillus subtilis. Molecular microbiology 41, 131-143.
64. Hsieh, J., and Fierke, C. A. (2009) Conformational change in the Bacillus subtilis RNase P holoenzyme--pre-tRNA complex enhances substrate affinity and limits cleavage rate. RNA 15, 1565-1577
65. Beebe, J. A., and Fierke, C. A. (1994) A kinetic mechanism for cleavage of precursor tRNA(Asp) catalyzed by the RNA component of Bacillus subtilis ribonuclease P. Biochemistry 33, 10294-10304
66. Kurz, J. C., Niranjanakumari, S., and Fierke, C. A. (1998) Protein component of Bacillus subtilis RNase P specifically enhances the affinity for precursor-tRNAAsp. Biochemistry 37, 2393-2400
67. Tallsjo, A., and Kirsebom, L. A. (1993) Product release is a rate-limiting step during cleavage by the catalytic RNA subunit of Escherichia coli RNase P. Nucleic Acids Res 21, 51-57.
68. Cornish-Bowden, A. (1984) Enzyme specificity: Its meaning in the general case. Journal of Theoretical Biology 108, 451-457
69. Fersht, A. (1985) Enzyme structure and mechanism, 2nd ed., Freeman and Co., New York
70. Cleland, W. W., and Cook, P. F. (2007) Enzyme Kinetics and Mechanism, Garland Publishers, London and New York
71. Cha, S. (1968) Kinetics of enzyme reactions with competing alternative substrates. Molecular pharmacology 4, 621-629
72. Kohen, A., and Limbach, H.-H. (2006) Isotope effects in chemistry and biology, Taylor & Francis, Boca Raton
73. Northrop, D. B. (1975) Steady-state analysis of kinetic isotope effects in enzymic reactions. Biochemistry 14, 2644-2651.
74. Lin, H.-C., Yandek, L. E., Gjermeni, I., and Harris, M. E. (2014) Determination of relative rate constants for in vitro RNA processing reactions by internal competition. Analytical Biochemistry 467, 54-61
75. Guenther, U. P., Yandek, L. E., Niland, C. N., Campbell, F. E., Anderson, D., Anderson, V. E., Harris, M. E., and Jankowsky, E. (2013) Hidden specificity in an apparently nonspecific RNA-binding protein. Nature 502, 385-388
76. Niland, C. N., Zhao, J., Lin, H. C., Anderson, D. R., Jankowsky, E., and Harris, M. E. (2016) Determination of the Specificity Landscape for Ribonuclease P Processing of Precursor tRNA 5' Leader Sequences. ACS Chem Biol

77. Niland, C. N., Jankowsky, E., and Harris, M. E. (2016) Optimization of high-throughput sequencing kinetics for determining enzymatic rate constants of thousands of RNA substrates. Anal Biochem 510, 1-10
78. Hsu, T. Y., Simon, L. M., Neill, N. J., Marcotte, R., Sayad, A., Bland, C. S., Echeverria, G. V., Sun, T., Kurley, S. J., Tyagi, S., Karlin, K. L., Dominguez-Vidana, R., Hartman, J. D., Renwick, A., Scorsone, K., Bernardi, R. J., Skinner, S. O., Jain, A., Orellana, M., Lagisetti, C., Golding, I., Jung, S. Y., Neilson, J. R., Zhang, X. H., Cooper, T. A., Webb, T. R., Neel, B. G., Shaw, C. A., and Westbrook, T. F. (2015) The spliceosome is a therapeutic vulnerability in MYC-driven cancer. Nature 525, 384-388
79. Shi, Y., Joyner, A. S., Shadrick, W., Palacios, G., Lagisetti, C., Potter, P. M., Sambucetti, L. C., Stamm, S., and Webb, T. R. (2015) Pharmacodynamic assays to facilitate preclinical and clinical development of pre-mRNA splicing modulatory drug candidates. Pharmacol Res Perspect 3, e00158
80. Guan, L., and Disney, M. D. (2012) Recent advances in developing small molecules targeting RNA. ACS Chem Biol 7, 73-86
81. Dominguez, A. A., Lim, W. A., and Qi, L. S. (2016) Beyond editing: repurposing CRISPR-Cas9 for precision genome regulation and interrogation. Nat Rev Mol Cell Biol 17, 5-15
82. Chappell, J., Watters, K. E., Takahashi, M. K., and Lucks, J. B. (2015) A renaissance in RNA synthetic biology: new mechanisms, applications and tools for the future. Current opinion in chemical biology 28, 47-56
83. Peters, G., Coussement, P., Maertens, J., Lammertyn, J., and De Mey, M. (2015) Putting RNA to work: Translating RNA fundamentals into biotechnological engineering practice. Biotechnol Adv 33, 1829-1844
84. Esakova, O., and Krasilnikov, A. S. (2010) Of proteins and RNA: the RNase P/MRP family. RNA 16, 1725-1747
85. Guerrier-Takada, C., McClain, W. H., and Altman, S. (1984) Cleavage of tRNA precursors by the RNA subunit of E. coli ribonuclease P (M1 RNA) is influenced by 3'-proximal CCA in the substrates. Cell 38, 219-224.
86. Christian, E. L., Zahler, N. H., Kaye, N. M., and Harris, M. E. (2002) Analysis of substrate recognition by the ribonucleoprotein endonuclease RNase P. Methods 28, 307-322
87. Harris, M. E., Nolan, J. M., Malhotra, A., Brown, J. W., Harvey, S. C., and Pace, N. R. (1994) Use of photoaffinity crosslinking and molecular modeling to analyze the global architecture of ribonuclease P RNA. EMBO Journal 13, 3953-3963
88. Harris, M. E., Kazantsev, A. V., Chen, J. L., and Pace, N. R. (1997) Analysis of the tertiary structure of the ribonuclease P ribozyme-substrate complex by site-specific photoaffinity crosslinking. RNA 3, 561-576
89. Christian, E. L., McPheeters, D. S., and Harris, M. E. (1998) Identification of individual nucleotides in the bacterial ribonuclease P ribozyme adjacent to the pre-tRNA cleavage site by short-range photo-cross-linking. Biochemistry 37, 17618-17628
90. Christian, E. L., and Harris, M. E. (1999) The track of the pre-tRNA 5' leader in the ribonuclease P ribozyme-substrate complex. Biochemistry 38, 12629-12638
91. Brannvall, M., Fredrik Pettersson, B. M., and Kirsebom, L. A. (2002) The residue immediately upstream of the RNase P cleavage site is a positive determinant. Biochimie 84, 693-703.
92. LaGrandeur, T. E., Huttenhofer, A., Noller, H. F., and Pace, N. R. (1994) Phylogenetic comparative chemical footprint analysis of the interaction between ribonuclease P RNA and tRNA. EMBO J 13, 3945-3952.

93. Crary, S. M., Niranjanakumari, S., and Fierke, C. A. (1998) The protein component of Bacillus subtilis ribonuclease P increases catalytic efficiency by enhancing interactions with the 5' leader sequence of pre-tRNAAsp. Biochemistry 37, 9409-9416
94. Rueda, D., Hsieh, J., Day-Storms, J. J., Fierke, C. A., and Walter, N. G. (2005) The 5' leader of precursor tRNAAsp bound to the Bacillus subtilis RNase P holoenzyme has an extended conformation. Biochemistry 44, 16130-16139
95. Agrawal, A., Mohanty, B. K., and Kushner, S. R. (2014) Processing of the seven valine tRNAs in Escherichia coli involves novel features of RNase P. Nucleic Acids Res 42, 11166-11179
96. Herschlag, D. (1988) The role of induced fit and conformational changes of enzymes in specificity and catalysis. Bioorganic chemistry 16, 62-96
97. Cleland, W. W., and Hengge, A. C. (2006) Enzymatic mechanisms of phosphate and sulfate transfer. Chemical reviews 106, 3252-3278
98. Crooks, G. E., Hon, G., Chandonia, J. M., and Brenner, S. E. (2004) WebLogo: a sequence logo generator. Genome research 14, 1188-1190
99. Zuker, M. (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31, 3406-3415
100. Stormo, G. D. (2013) Modeling the specificity of protein-DNA interactions. Quant Biol. 1, 115-130
101. Loria, A., Niranjanakumari, S., Fierke, C. A., and Pan, T. (1998) Recognition of a pre-tRNA substrate by the Bacillus subtilis RNase P holoenzyme. Biochemistry 37, 15466-15473
102. Svard, S. G., and Kirsebom, L. A. (1993) Determinants of Escherichia coli RNase P cleavage site selection: a detailed in vitro and in vivo analysis. Nucleic Acids Res 21, 427-434.
103. Oh, B. K., and Pace, N. R. (1994) Interaction of the 3'-end of tRNA with ribonuclease P RNA. Nucleic Acids Research 22, 4087-4094
104. Guo, X., Campbell, F. E., Sun, L., Christian, E. L., Anderson, V. E., and Harris, M. E. (2006) RNA-dependent folding and stabilization of C5 protein during assembly of the E. coli RNase P holoenzyme. J Mol Biol 360, 190-203
105. Parker, R., and Song, H. (2004) The enzymes and control of eukaryotic mRNA turnover. Nature Struct Mol Biol., 121-127
106. Wohlgemuth, I., Pohl, C., Mittelstaet, J., Konevega, A. L., and Rodnina, M. V. (2011) Evolutionary optimization of speed and accuracy of decoding on the ribosome. Philosophical transactions of the Royal Society of London. Series B, Biological sciences 366, 2979-2986
107. Zaher, H. S., Shaw, J. J., Strobel, S. A., and Green, R. (2011) The 2'-OH group of the peptidyl-tRNA stabilizes an active conformation of the ribosomal PTC. EMBO J 30, 2445-2453
108. Wichtowska, D., Turowski, T. W., and Boguta, M. (2013) An interplay between transcription, processing, and degradation determines tRNA levels in yeast. Wiley Interdiscip Rev RNA 4, 709-722
109. Lin, S., and Gregory, R. I. (2015) MicroRNA biogenesis pathways in cancer. Nat Rev Cancer 15, 321-333
110. Jankowsky, E., and Harris, M. E. (2015) Specificity and nonspecificity in RNA-protein interactions. Nat Rev Mol Cell Biol 16, 533-544
111. Fersht, A. (1998) Structure and Mechanism in Protein Science, W.H. Freeman & Co.
112. Buenrostro, J. D., Araya, C. L., Chircus, L. M., Layton, C. J., Chang, H. Y., Snyder, M. P., and Greenleaf, W. J. (2014) Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nature Biotechnol. 32, 562-568
113. McKeague, M., Wong, R. S., and Smolke, C. D. (2016) Opportunities in the design and application of RNA for gene expression control. Nucleic Acids Res 44, 2987-2999
114. Chen, Y., and Varani, G. (2013) Engineering RNA binding proteins for biology. FEBS J 280, 3734-3754
115. Choudhury, R., Tsai, Y. S., Dominguez, D., Wang, Y., and Wang, Z. (2012) Engineering RNA endonucleases with customized sequence specificities. Nature Commun. 3, 1147
116. Tuerk, C., and Gold, L. (1990) Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249, 505-510
117. Zykovich, A., Korf, I., and Segal, D. J. (2009) Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res 37, e151
118. Tome, J. M., Ozer, A., Pagano, J. M., Gheba, D., Schroth, G. P., and Lis, J. T. (2014) Comprehensive analysis of RNA-protein interactions by high-throughput sequencing-RNA affinity profiling. Nat Methods 11, 683-688
119. Anderson, V. E. (2015) Multiple alternative substrate kinetics. Biochim Biophys Acta 1854, 1729-1736
120. Cleland, W. W. (2005) The use of isotope effects to determine enzyme mechanisms. Arch Biochem Biophys 433, 2-12
121. Herschlag, D. (1988) The Role of Induced Fit and Conformational Changes of Enzymes in Specificity and Catalysis. Bioorganic chemistry 16, 62-96
122. Kellerman, D. L., Simmons, K. S., Pedraza, M., Piccirilli, J. A., York, D. M., and Harris, M. E. (2015) Determination of hepatitis delta virus ribozyme N(-1) nucleobase and functional group specificity using internal competition kinetics. Anal Biochem 483, 12-20
123. Kebschull, J. M., and Zador, A. M. (2015) Sources of PCR-induced distortions in high-throughput sequencing data sets. Nucleic Acids Res 43, e143
124. Kennedy, K., Hall, M. W., Lynch, M. D., Moreno-Hagelsieb, G., and Neufeld, J. D. (2014) Evaluating bias of illumina-based bacterial 16S rRNA gene profiles. Appl Environ Microbiol 80, 5717-5722
125. Brandariz-Fontes, C., Camacho-Sanchez, M., Vila, C., Vega-Pla, J. L., Rico, C., and Leonard, J. A. (2015) Effect of the enzyme and PCR conditions on the quality of high-throughput DNA sequencing results. Scientific reports 5, 8056
126. Pawluczyk, M., Weiss, J., Links, M. G., Egana Aranguren, M., Wilkinson, M. D., and Egea-Cortines, M. (2015) Quantitative evaluation of bias in PCR amplification and next-generation sequencing derived from metabarcoding samples. Anal Bioanal Chem 407, 1841-1848
127. Peng, Q., Vijaya Satya, R., Lewis, M., Randad, P., and Wang, Y. (2015) Reducing amplification artifacts in high multiplex amplicon sequencing by using molecular barcodes. BMC genomics 16, 589
128. D'Amore, R., Ijaz, U. Z., Schirmer, M., Kenny, J. G., Gregory, R., Darby, A. C., Shakya, M., Podar, M., Quince, C., and Hall, N. (2016) A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling. BMC genomics 17, 55
129. Quail, M. A., Smith, M., Coupland, P., Otto, T. D., Harris, S. R., Connor, T. R., Bertoni, A., Swerdlow, H. P., and Gu, Y. (2012) A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC genomics 13, 341

130. Yang, X., Aluru, S., and Dorman, K. S. (2011) Repeat-aware modeling and correction of short read errors. BMC Bioinformatics 12 Suppl 1, S52
131. Schirmer, M., D'Amore, R., Ijaz, U. Z., Hall, N., and Quince, C. (2016) Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data. BMC Bioinformatics 17, 125
132. Sullenger, B. A., and Nair, S. (2016) From the RNA world to the clinic. Science (New York, N.Y.) 352, 1417-1420
133. Maier, K. E., and Levy, M. (2016) From selection hits to clinical leads: progress in aptamer discovery. Molecular therapy. Methods & clinical development 5, 16014
134. Jankowsky, E., and Harris, M. E. (2015) Specificity and nonspecificity in RNA-protein interactions. Nat Rev Mol Cell Biol 16, 533-544
135. Ascano, M., Hafner, M., Cekan, P., Gerstberger, S., and Tuschl, T. (2012) Identification of RNA-protein interaction networks using PAR-CLIP. Wiley Interdiscip Rev RNA 3, 159-177
136. Zhao, J., Ohsumi, T. K., Kung, J. T., Ogawa, Y., Grau, D. J., Sarma, K., Song, J. J., Kingston, R. E., Borowsky, M., and Lee, J. T. (2010) Genome-wide identification of polycomb-associated RNAs by RIP-seq. Mol Cell 40, 939-953
137. Licatalosi, D. D., Mele, A., Fak, J. J., Ule, J., Kayikci, M., Chi, S. W., Clark, T. A., Schweitzer, A. C., Blume, J. E., Wang, X., Darnell, J. C., and Darnell, R. B. (2008) HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456, 464-469
138. Cook, K. B., Hughes, T. R., and Morris, Q. D. (2015) High-throughput characterization of protein-RNA interactions. Briefings in functional genomics 14, 74-89
139. Lambert, N., Robertson, A., Jangi, M., McGeary, S., Sharp, P. A., and Burge, C. B. (2014) RNA Bind-n-Seq: quantitative assessment of the sequence and structural binding specificity of RNA binding proteins. Mol Cell. 54, 887-900
140. Ozer, A., Tome, J. M., Friedman, R. C., Gheba, D., Schroth, G. P., and Lis, J. T. (2015) Quantitative assessment of RNA-protein interactions with high-throughput sequencing-RNA affinity profiling. Nature protocols 10, 1212-1233
141. Ascano, M., Gerstberger, S., and Tuschl, T. (2013) Multi-disciplinary methods to define RNA-protein interactions and regulatory networks. Current opinion in genetics & development 23, 20-28
142. Klemm, B. P., Wu, N., Chen, Y., Liu, X., Kaitany, K. J., Howard, M. J., and Fierke, C. A. (2016) The Diversity of Ribonuclease P: Protein and RNA Catalysts with Analogous Biological Functions. Biomolecules 6
143. Brannvall, M., Kikovska, E., and Kirsebom, L. A. (2004) Cross talk between the +73/294 interaction and the cleavage site in RNase P RNA mediated cleavage. Nucleic Acids Res 32, 5418-5429. Print 2004.
144. Hardt, W. D., Schlegl, J., Erdmann, V. A., and Hartmann, R. K. (1993) Role of the D arm and the anticodon arm in tRNA recognition by eubacterial and eukaryotic RNase P enzymes. Biochemistry 32, 13046-13053.
145. Pace, N. R., Reich, C., James, B. D., Olsen, G. J., Pace, B., and Waugh, D. S. (1987) Structure and catalytic function in ribonuclease P. Cold Spring Harb Symp Quant Biol 52, 239-248.
146. Altman, S., and Guerrier-Takada, C. (1986) M1 RNA, the RNA subunit of Escherichia coli ribonuclease P, can undergo a pH-sensitive conformational change. Biochemistry 25, 1205-1208.
147. Crooks, G. E., Hon, G., Chandonia, J. M., and Brenner, S. E. (2004) WebLogo: a sequence logo generator. Genome Res. 14, 1188-1190.

148. Niland, C. N., Zhao, J., Lin, H. C., Anderson, D. R., Jankowsky, E., and Harris, M. E. (2016) Determination of the Specificity Landscape for Ribonuclease P Processing of Precursor tRNA 5' Leader Sequences. ACS chemical biology 11, 2285-2292
149. Lin, H. C., Yandek, L. E., Gjermeni, I., and Harris, M. E. (2014) Determination of relative rate constants for in vitro RNA processing reactions by internal competition. Anal Biochem 467, 54-61
150. Wegscheid, B., and Hartmann, R. K. (2006) The precursor tRNA 3'-CCA interaction with Escherichia coli RNase P RNA is essential for catalysis by RNase P in vivo. RNA 12, 2135-2148
151. McConnell, T. S., Cech, T. R., and Herschlag, D. (1993) Guanosine binding to the Tetrahymena ribozyme: thermodynamic coupling with oligonucleotide binding. Proceedings of the National Academy of Sciences of the United States of America 90, 8362-8366
152. Olejniczak, M., Dale, T., Fahlman, R. P., and Uhlenbeck, O. C. (2005) Idiosyncratic tuning of tRNAs to achieve uniform ribosome binding. Nat Struct Mol Biol 12, 788-793
153. Sunden, F., Peck, A., Salzman, J., Ressl, S., and Herschlag, D. (2015) Extensive site-directed mutagenesis reveals interconnected functional units in the alkaline phosphatase active site. Elife 4
154. Gupta, A., and Gribskov, M. (2011) The role of RNA sequence and structure in RNA--protein interactions. J Mol Biol 409, 574-587
155. von Hippel, P. H., and Berg, O. G. (1986) On the specificity of DNA-protein interactions. Proc Natl Acad Sci U S A 83, 1608-1612.
156. Ray, D., Kazan, H., Chan, E. T., Peña-Castillo, L., Chaudhry, S., Talukder, S., Blencowe, B. J., Morris, Q., and Hughes, T. R. (2009) Rapid and systematic analysis of the RNA recognition specificities of RNA binding proteins. Nat. Biotechnol. 27, 667-670
157. Campbell, Z. T., Bhimsaria, D., Valley, C. T., Rodriguez-Martinez, J. A., Menichelli, E., Williamson, J. R., Ansari, A. Z., and Wickens, M. (2012) Cooperativity in RNA-Protein Interactions: Global Analysis of RNA Binding Specificity. Cell Rep. 1, 570-581
158. Singh, R., and Valcárcel, J. (2005) Building specificity with nonspecific RNA binding proteins. Nat. Struct. Mol. Biol. 12, 645-653
159. Zhuang, F., Fuchs, R. T., Sun, Z., Zheng, Y., and Robb, G. B. (2012) Structural bias in T4 RNA ligase-mediated 3'-adapter ligation. Nucleic Acids Res. 40, e54
160. Smith, J. K., Hsieh, J., and Fierke, C. A. (2007) Importance of RNA-protein interactions in bacterial ribonuclease P structure and catalysis. Biopolymers 87, 329-338
161. Schellenberger, V., Siegel, R. A., and Rutter, W. J. (1993) Analysis of enzyme specificity by multiple substrate kinetics. Biochemistry 32, 4344-4348
162. Lorenz, C., Gesell, T., Zimmermann, B., Schoeberl, U., Bilusic, I., Rajkowitsch, L., Waldsich, C., von Haeseler, A., and Schroeder, R. (2010) Genomic SELEX for Hfq-binding RNAs identifies genomic aptamers predominantly in antisense transcripts. Nucleic Acids Res. 38, 3794-3808
163. Pitt, J. N., and Ferre-D'Amare, A. R. (2010) Rapid construction of empirical RNA fitness landscapes. Science 330, 376-379
164. Badis, G., Berger, M. F., Philippakis, A. A., Talukder, S., Gehrke, A. R., Jaeger, S. A., Chan, E. T., Metzler, G., Vedenko, A., Chen, X., Kuznetsov, H., Wang, C. F., Coburn, D., Newburger, D. E., Morris, Q., Hughes, T. R., and Bulyk, M. L. (2009) Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720-1723
165. Stormo, G., and Zhao, Y. (2010) Determining the specificity of protein-DNA interactions. Nat. Rev. Genetics 11, 751-760

166. Rowe, W., Platt, M., Wedge, D. C., Day, P. J., Kell, D. B., and Knowles, J. (2009) Analysis of a complete DNA-protein affinity landscape. J. R. Soc. Interface 7, 397-408
167. Nutiu, R., Friedman, R. C., Luo, S., Khrebtukova, I., Silva, D., Li, R., Zhang, L., Schroth, G. P., and Burge, C. B. (2011) Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat. Biotechnol. 29, 659-664
168. SantaLucia, J. J., and Turner, D. H. (1997) Measuring the thermodynamics of RNA secondary structure formation. Biopolymers 44, 309-319
169. Forsdyke, D. R. (2007) Calculation of folding energies of single-stranded nucleic acid sequences: conceptual issues. J. Theor. Biol. 248, 745-753
170. Maerkl, S. J., and Quake, S. R. (2007) A systems approach to measuring the binding energy landscapes of transcription factors. Science 315, 233-237
171. Zhao, Y., and Stormo, G. D. (2011) Quantitative analysis demonstrates most transcription factors require only simple models of specificity. Nat Biotechnol 29, 480-483
172. Leontis, N. B., Lescoute, A., and Westhof, E. (2006) The building blocks and motifs of RNA architecture. Curr Opin Struct Biol 16, 279-287
173. Snoussi, K., and Leroy, J. L. (2001) Imino proton exchange and kinetics in RNA duplexes. Biochemistry 40, 8898-8904
174. LaRiviere, F. J., Wolfson, A. D., and Uhlenbeck, O. C. (2001) Uniform binding of aminoacyl-tRNAs to elongation factor Tu by thermodynamic compensation. Science 294, 165-168.
175. Guo, X., Campbell, F. E., Sun, L., Christian, E. L., Anderson, V. E., and Harris, M. E. (2006) RNA-dependent folding and stabilization of C5 protein during assembly of the E. coli RNase P holoenzyme. J. Mol. Biol. 360, 190-203
176. Christian, E. L., McPheeters, D. S., and Harris, M. E. (1998) Identification of individual nucleotides in the bacterial ribonuclease P ribozyme adjacent to the pre-tRNA cleavage site by short-range photo-cross-linking. Biochemistry 37, 17618-17628
177. Schellenberger, V., Siegel, R. A., and Rutter, W. J. (1993) Analysis of Enzyme Specificity by Multiple Substrate Kinetics. Biochemistry 32, 4344-4348
178. Cha, S. (1968) Kinetics of enzyme reactions with competing alternative substrates. Mol. Pharmacol. 4, 621-624
179. Northrop, D. B. (1983) Fitting enzyme-kinetic data to V/K. Anal. Biochem. 132, 457-461
180. Northrop, D. B. (1999) Rethinking fundamentals of enzyme action. Adv. Enzymol. Relat. Areas Mol. Biol. 73, 25-55
