1

EFFECTS OF THE RIBOSOMAL EXIT SITE ON TRANSLATIONAL ACCURACY

BY

MICHAEL WARD

A Dissertation Submitted to the Graduate Faculty of

WAKE FOREST UNIVERSITY GRADUATE SCHOOL OF ARTS AND SCIENCES

in Partial fulfillment of the Requirements

for a Degree of

DOCTOR OF PHILOSOPHY

Biology

December, 2020

Winston-Salem, North Carolina

Approved By:

James F. Curran, Ph.D., Advisor

Rebecca Alexander, Ph.D., Chair

David A. Ornelles, Ph.D.

Ke Reid, Ph.D.

Brian W. Tague, Ph.D. ii

TABLE OF CONTENTS

LIST OF FIGURES iii

LIST OF ABBREVIATIONS iv

ABSTRACT v

INTRODUCTION 1

Codon-Anticodon Interaction and Reading Frame Maintenance 4

The E-site Wobble Position Codon and Decoding Accuracy 15

Mutations that Increase Misreading and Reveal More Non-Randomness… 21

METHODS 33

Initiation Studies 39

Misreading Studies 42

RESULTS 52

Frameshifting 52

Misreading 58

Sequence Analysis 73

DISCUSSION 91

The E-site and Ribosomal Frameshifting 91

The E-site and Misreading 94

Conclusions 101

REFERENCES 104

SCHOLASTIC VITA 136 iii

LIST OF FIGURES AND TABLES

TABLES Page

1 Frameshifting frequencies 57

2 List of activities recorded after introducing random E-site codons 70

3 Specific Mutations 96

FIGURES

1 Map of the pJC1168 plasmid 36

2 Analysis of total in our frameshift samples 41

3 Outline of two test constructs used 44

4 The complex at the time of frameshift 53

5 Protein levels of E-site synonym misreading samples 59

6 ß-Galactosidase activities at codon 461 Glu 61

7 ß-Galactosidase activities at codon 503 Tyr 63

8 ß-Galactosidase activities at codon 460 His 65

9 Protein levels of E-site ATG misreading samples 67

10 ATG in different contexts 72

11 Amino acid frequencies 76

12 Codon comparison with second position codon 78

13 Serine codon usage 80

14 Leucine codon usage 82

15 Arginine codon usage 84

16 Second and third codon analysis 87

17 Frequencies at the first, second and third nucleotide positions 89 iv

LIST OF ABBREVIATIONS

General aa-tRNA Aminoacyl-tRNA DMSO Dimethyl sulfoxide EF-Tu Elongation factor thermo unstable GDPNP 5'-Guanylyl imidodiphosphate GTP Guanosine triphosphate KLD Kinase, Ligase and DpnI LB Luria-Bertani medium ORF Open reading frame PEG Polyethylene glycol RBS binding site RF1 Release factors 1 RF2 Release factors 2 rRNA Ribosomal ribonucleic acid SD Shine–Dalgarno SDS-PAGE sodium dodecyl sulphate–polyacrylamide gel electrophoresis SOC Super Optimal Broth tRNA Transfer ribonucleic acid VBsup Supplemented Vogel-bonner medium X-gal 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside

DNA/RNA Bases A Adenine B C or G or T/U C Cytosine D A or G or T/U G Guanine H A or C or T/U K G or T/U M A or C N Any base R A or G S G or C T Thymine U Uracil V A or C or G W A or T/U Y C or T/U v

ABSTRACT

Not all coding sequences are translated equally well, and this document describes studies of sequence features that affect . Prior works suggest that codon:anticodon interaction in the ribosomal E-site helps maintain the reading frame.

Here is tested the hypothesis that non-Watson-Crick codon:anticodon pairing at non-

AUG initiation codons causes frameshifting. The results show that non-Watson:Crick pairing at the first codon position is not associated with frameshifting. The significance of the tRNA in translation is as an adapter molecule in delivering specific amino acids to matching codons on the mRNA chain. It has been suggested that correct discrimination of the correct aminoacyl-tRNA among approximately 60 competitors is accomplished in part by an allosteric interaction between tRNAs in the E-site and the A-site. This model suggests that the nature of the E site tRNA can affect the accuracy of aminoacyl-tRNA selection in the A site. Here is examined the effects, in vivo, of altering the E-site codon and assaying the effects on A-site misreading. Codons for essential amino acids codons in

ß-Galactosidase were mutated such that enzyme activity requires misreading to insert the correct amino acid. Then, additional mutations at the E site position were made to determine whether the E site tRNA would affect misreading at the A site. It is shown that the E site codon can affect apparent misreading in some but not all tested sites.

Moreover, it is shown that an AUG initiation codon in the E-site may dramatically increase A site misreading, leading to a hypothesis that sites that resemble translational initiation sequences may allow for high-frequency misreading of at least one codon

(CAA misread as GAA). Finally, analyses of the entire E. coli genome sequence shows that codon and nucleotide biases differ for initiation and internal sequences, and strongly vi suggest that at least some of the initiation region bias has a translational cause. The described studies could provide valuable background for future work on sequence features that modulate genetic translation.

1

INTRODUCTION

Understanding the mechanism of translation is a fundamental goal of Molecular

Biology. Yet, after 50 years of study, the mechanism by which decode sequence information with high accuracy is still only partially understood. Consider that due to the degeneracy of the genetic code, with its average of three codons per amino acid, the theoretical number of possible 300-codon open reading frames (ORFs) is ~3300

(~10142), which is greater than the estimated number of subatomic particles in the universe. One might ask, could all of them be equally translatable? The available evidence answers with a resounding NO! The literature is replete with examples of

“recoding,” in which specific sites are translated in ways that differ from the rules predicted by the genetic code (Baronov 2002). At these various sites, nucleotide sequences cause the translational apparatus to skip or reread nucleotides, or to misread codons. Mutational characterizations of such sites have greatly increased our understanding of both the standard and these variant translational mechanisms. Viewed the other way around, most sites are accurately translated because they do not have features that cause recoding. It seems very likely to me that one should be able to apply that understanding to predict as yet undiscovered translational properties or phenomena.

Genes show extensive non-randomness in codon usage (Sharp et al. 2005, Brandis and Hughes 2016), and even in nucleotide usage based on codon position. The latter can readily be seen in E. coli genes as the “G-nonG-N” pattern in ORFs (Trifonov 1987), which is likely due to demand on amino acid usage in (Curran and Gross 1994) and on demands imposed by the replication machinery (“GC-Skew”; Awakara and

Tomita 2007). Codon bias may have many causes including matching codon usage to 2 tRNA abundance, which should promote the accuracy and efficiency of expression (Quax et al. 2015). But there are likely other factors that affect efficiency and accuracy, but except for avoidance of frameshift-prone sites (Schwartz and Curran 1997, Gurvich et al.

2005) those are largely unexplored. Intriguingly, codon bias near initiation sites differs substantially from that internal to ORFs (Looman et al. 1985, Tats et al. 2006, Stenström and Isaksson 2001, Bivona et al. 2010); and most notably, initiation sites contain sequences that are known to be frameshift prone (Fu and Parker 1994, Prere et al. 2011).

It seems likely to me that unknown features of initiation sites and/or complexes mitigate this frameshifting potential.

Ribosomes have three tRNA binding sites

Biological polypeptide synthesis occurs when a nucleic acid sequence carried by messenger RNA is translated into a polypeptide chain by a ribonucleoprotein known as a ribosome. Intermediaries in this process are the aminoacyl-tRNAs (aa-tRNAs), which bind to codons while carrying an appropriate amino acid for addition to the polypeptide.

In 1963, Watson predicted the ribosome would have two coding sites: an “A” site for incoming aa-tRNAs, and a “P” site for peptidyl-tRNA. Protein synthesis would then occur by the transfer of the nascent peptide from the peptidyl-tRNA to the amino acid on the A-site tRNA. These basic functions for these sites have been firmly established.

However, later evidence suggested the existence of a third site (the exit or E-site) where deacylated tRNA binds prior to exiting the ribosome, and that codon-anticodon interaction occurs in this site (Rheinberger and Nierhaus 1980, Rheinberger et al. 1981).

The ribosomal E-site was first discovered using an in vitro, poly-U programmed ribosomal system, by showing that the ribosome could bind three tRNAs simultaneously 3 in a codon-specific manner (Rheinberger and Nierhaus 1980, Rheinberger et al. 1981).

The E-site has since been found in the ribosomes of every domain of life (Wilson and

Nierhaus 2007, see also Sato et al. 2001). It was not immediately clear why ribosomes would have a third site; it is not essential for tRNA binding in either the P- or A-sites, and it is not required for peptidyl transfer. But the fact that ribosomes from all organisms have it suggests that it is selected for something. Otherwise, it is unlikely that it would be conserved over the innumerable organismal generations that separate the living domains.

Hypotheses on translational roles for the ribosomal E-site

Two translational functions have been suggested for the E-site. Evidence for them is discussed in detail below, but they are briefly introduced here to clarify my major hypotheses. First, codon:anticodon interaction in the E-site can block slippage of the P- site tRNA along the message and thus help maintain the reading frame in vitro (Blaha and Nierhaus 2001) and in vivo (Sanders and Curran, 2007). In other studies, the sequences near translational initiation sites have features that suggest that they may be frameshift-prone (Gurvich et al. 2003). Therefore, I hypothesized that the E-site tRNA can help preserve the reading frame on artificially constructed frameshift prone sequences at an initiation site.

The other proposed function of the E-site is to increase the accuracy of aa-tRNA selection at the A-site (Nierhaus and Rheinberger 1984). It was observed in an in vitro system that an E-site tRNA can inhibit binding by an aa-tRNA in the A-site. Those observations led to the prediction that cognate codon:anticodon complexes in the E-site would prevent weak, near-cognate binding to the A-site; this prediction also has support from observations made with an in vitro experimental system. Therefore, I hypothesized 4 that the frequency of misreading at the A-site could be altered by changing the codon corresponding to the E-site.

CODON-ANTICODON INTERACTION AND READING FRAME MAINTENANCE

The ribosomal E-site was first discovered using the poly U system, by showing that the ribosome could bind three tRNAs simultaneously (Rheinberger and Nierhaus

1980, Rheinberger et al. 1981). Further studies implicated the E-site in reading frame maintenance, which is essential for the production of functional proteins (Blaha and

Nierhaus 2001). In our lab, theoretical models have suggested that the role of the E-site in reading frame maintenance is crucial, but so far there is limited support for this model

(Lim and Curran 2001).

Frameshift

Although there are some exceptions found in viruses the genetic code is not overlapping. Each codon or triplet specifies a single amino acid. Since each codon contains three nucleotides this means that there are three possible reading frames for each specific sequence. The significance is that they each specify a completely different set of amino acids and therefore entirely different proteins. In fact, if the wrong reading frame is used it will produce a sequence of amino acids that codes for no functioning protein at all. To specify the correct sequence, the reading frame is established by the first codon of the mRNA that specifies an amino acid called the initiation codon. The most common initiation codon is AUG, and other similar codons being rarely used. In bacterial cells, this codon specifies an N-formylmethionine, a modified form of methionine containing a formyl group. The formyl group, if not the entire amino acid, is later removed. Any AUG 5 codons encountered before a stop codon are then encoded as a normal methionine amino acid. Just knowing that a frameshift can cause codons to be read in a way that they produce something with no function at all is enough to understand that maintaining the frame is important. However, frameshifts do occur at a very low level. The ribosome is an enzyme, it catalyzes a reaction that converts an mRNA sequence transcribed from

DNA into an amino acid sequence that can then fold into a complex protein. Unlike some much simpler enzyme driven chemical reactions the ribosome has a complex set of interactions that lead to the final product. Here I would like to examine in detail some of the aspects of the interacting molecules that specifically affect frameshifting.

Structures of tRNA

Though the nucleotide sequence varies among organisms, the structure of the tRNA molecule is highly preserved across all three domains of life. It consists of four stems and three loops with the acceptor stem terminating with a CCA sequence at the 3' end where the amino acid is attached. There are also certain nucleotides that are edited post-transcriptionally. The D-loop generally contains Dihydrouridine while the TψCG loop usually contains ribothymidine and pseudouridine. Most importantly for our purposes, the anticodon contains the triplet that will base pair with the corresponding codon in the mRNA. Anticodons often contain modified nucleotides that alter base pairing specificity; for example, the anticodon contains an Inosine, a deaminated product of adenine. The use of Inosine is important because it can form non-Watson-Crick base pairs with A, U and C nucleotides. This allows for non-standard base pairing if Watson-

Crick base pairing was a requirement then cells would need to produce 61 tRNAs.

However, tRNAs have the capacity to recognize more than one codon due to the Inosine 6 in the anticodon loop. The first and second bases form Watson-Crick base pairs while the third can form Crick 'wobble pairs' that can recognize either pyrimidines or purines at the

3' position. While wobble base pairing at the 3' position is relatively well understood, it is not the only type of wobble pairing that is seen. In , while the most commonly recognized start codon is AUG, GUG and UUG are also used even though they all are read by the same initiation tRNA with a 5’-CAU-3’ anticodon. Since the AUG codon is the only one using three Watson-Crick base pairs the others may have decreased usage possibly due to their instability with binding energies being half that of methionine in some cases.

Codon Context

Protein synthesis is one of the most energy consuming processes of the cell. It has been suggested that up to 95% of the cell's resources are utilized for protein production

(Ingraham et al. 1983). Because of this, it is expected that highly expressed genes are selected on the basis of rapid as well as accurate translation. One possible way to improve speed and accuracy is to use codons for which the corresponding tRNAs are readily available. This has possible benefits in that the codon can be paired with the cognate tRNA more rapidly, decreasing the probability of translational error. Moreover, this rapid pairing also decreases the likelihood of ribosomal stalling, which can lead to frameshift errors. In fact, it has been suggested that some of the most common codons correspond to tRNAs with the highest concentrations possibly as a result of coevolution

(Ikemura 1981, 1985; Ikemura and Ozeki 1983; Gouy and Gautier 1982; Bulmer 1987;

Osawa et al. 1992). Further, the effectiveness of expressing foreign genes can be reduced because of increased errors at rare codons (Spanjaard and Van Duin 1988, Bogosian et al. 7

1990, Scorer et al. 1991). Speed and accuracy, however, do not have to be incompatible.

This is especially true in the case of frameshift errors, in fact, a decrease in the rate of the movement of the ribosome on the mRNA has been linked with an increase in frameshifts

(Sipley and Goldman 1993). In the case of the initiation codon, however, all the different codons are being paired with the same tRNA anticodon. This leads us to believe that in this case, the message stability is a factor in determining the rate of frameshifting as opposed to the rate at which the codon is read.

Frameshift Prone Sites

I have so far discussed only the detrimental effects of frameshifting, creating proteins that can have no function or even may be harmful, at the very least being big energy wasters for the cell. There are sites where frameshifting is common and even necessary for the production of essential proteins. For example, the prfB gene of

Escherichia coli (E. coli) is a programmed that is required for the production of release factor 2 (RF2). RF2 is involved in the termination of translation by recognizing the UAA and UGA stop codons. The majority of frameshift occur in the -1 direction in prokaryotes and is thought to involve simultaneous slippage of the A- and P- site tRNAs (Jacks et al. 1988, Craigen and Casey, 1986). The +1 frameshift is less common, the prfB gene is the only known example of this type in prokaryotes. It occurs at the CUU UGA codon while the tRNALeu is in the P-site and UGA in the A-site. A frameshift is dependent on the concentration of RF2, with termination occurring at high concentrations. In low concentrations, tRNALeu slips by a single nucleotide in the 5' direction from CUU onto UUU. This is however not the only example of programmed frameshifting in prokaryotes, -1 frameshifting has been observed in DNA polymerase III 8

(Flower and McHenry 1990), cytidine deaminase (Mejlhede et al. 1999), α-1, 2- fucosyltransferase (Wang et al. 1999), and α-fucosidase (Cobucci-Ponzano et al. 2006).

The +1 frameshift has also been identified in with examples including the regulation of ornithine decarboxylase antizyme (Dinman 2006), ABP140 mRNA involved in actin reorganization (Kilchert and Spang 2011), immune response modulator

IL-10 (Saulquin et al. 2002), and the telomerase subunit (Taliaferro and Farabaugh,

2007).

Ribosomal Pausing

In order for the frameshift to occur, at high frequency, the ribosome must pause so that the tRNA can slip into the new frame. In the example of RF2, the pause is created by the stop codon, increasing the levels of RF2 increases termination and therefore decrease frameshifting (Craigen and Caskey, 1986). As noted above, increases in ribosomal accuracy have also been shown to cause frameshifting because of the slower rate of translation (Sipley and Goldman 1993). However, this does not necessarily mean that stop codons are required for frameshifting. If an UAG codon is placed 3' to the slippery site, then the introduction of suppressor tRNAs can reduce frameshifting significantly

(Curran and Yarus 1988). Furthermore, it has been shown that when the stop codon is replaced with certain sense codons they can give equally high frameshifting rates (Curran and Yarus 1988, 1989; Pedersen and Curran 1991).

Time the tRNA spends in the E-site

In order for the E-site to be involved in maintaining the reading frame, it is necessary that the tRNA spend a sufficient amount of time there. If it simply passes into 9 the E-site and is immediately released it seems unlikely that the E-site/tRNA interactions are significant or that they could be involved in increased stability. The E-site would need to remain occupied after translocation. RNA footprinting has already identified interactions between tRNA and the ribosomal E-site (Moazed and Noller 1989, Bocchetta et al. 2001). Dinos et al. (2005) asked two questions about the E-site: does the peptidyl- transferase reaction depend on an occupied E-site and at what stage was the tRNA released from the ribosome? To answer these questions they prepared three ribosomal complexes. One contained a peptidyl-tRNA analog in the P site, with the other sites empty. A second complex was prepared in which the peptidyl-tRNA had been translocated to the P-site and a new tRNA was bound in the A-site. The final complex contained a ternary complex frozen in the A-site by GDPNP, an analog of GTP that is non-hydrolyzable. A kinetic assay was performed to determine the rate of translocation and peptide bond formation. They determined that the E-site tRNA is released once the

A-site is occupied, but that the active state of the peptidyl-transferase center is unaffected. The peptide-bond is formed only when the E-site is empty, suggesting that the E-site tRNA is released between decoding in the A-site and GTP hydrolysis by EF-

Tu. We suggest that if the E-site is occupied until accommodation in the A-site the bonds of the tRNA in the E-site may be valuable in preventing codon slippage before accommodation of the correct tRNA.

Sequence Factors Affecting Initiation

In prokaryotes the ribosome binding site (RBS) on the mRNA contains specific sequences intended to enhance ribosome recruitment but also have effects on the translational efficiency by enhancing the ability of the ribosome to initiate translation. 10

Surrounding the initiation codon the RBS extends roughly from -20 nucleotides before the initiation site to +13 nucleotides beyond it (Scherer et al. 1980). Some important sequence features included in this area are the Shine-Dalgarno sequence, the initiation codon, the distance between the Shine-Dalgarno sequence and the initiation codon and the nucleotides 3' to the initiation codon.

Shine-Dalgarno Sequence

The Shine-Dalgarno (SD) sequence is found upstream of the start codon, this polypurine region (sequence AGGAGG) is believed to bind with the 3' end of the rRNA increasing recruitment of the ribosome and helping in the alignment with the initiation codon (Steitz and Jakes 1975). One important piece of evidence for this interaction was the mutagenesis of the anti-Shine-Dalgarno region of the 16s rRNA (Jacob et al. 1987). A single base mutation in the anti-Shine-Dalgarno region was sufficient to decrease the level of most cellular proteins. Usually the SD is positioned 5-8 nucleotides upstream of the start codon and generally only a 4 or 5 nucleotide SD is required to see efficient translation. Modifications increasing the size of the SD sequence have shown a combination of effects from increases in translation to decreases in translation or even no effect at all (Munson et al. 1984, de Smit and van Duin 1994, Komarova et al. 2002,

Chen et al. 1994). However, there is an increased effect of a stronger SD:rRNA interaction when the start codon is not AUG (Weyens et al. 1988). Furthermore, it has been shown that when the untranslated leader is removed that an AUG initiation codon, rather than another, like GUG, is essential for translation (Van Etten and Janssen 1998).

This suggests that for non-methionine start codons the SD sequence plays a larger role in initiation. 11

Distance between the Shine-Dalgarno sequence and the initiation codon

Another important finding about the SD sequence is that its distance between the

SD sequence and initiation codon is important. Since the SD sequence is involved in proper alignment with the start codon it is thought that the optimal sequence length would be the distance between the 3' end of the 16s rRNA and the P site where the first tRNA enters the ribosome (Chen et al. 1994). However, other factors affect this length such as secondary structures, complicating determination of the exact length of the intermittent sequence. Studies have suggested that there is a favorable range where the length optimizes translation levels and while some variation has only a small effect; too much variation in either direction greatly reduces translation (Ringquist et al. 1992).

Nucleotides 3' to the initiation codon

Studies of translational initiation sites have shown that nucleotides on both sides of the initiation codon are not randomly distributed (Stormo et al. 1982, Gren 1984). This suggests that codons following the initiation codon are not only part of the coding sequence but a regulatory region as well. In general, an AU-rich RBS is thought to increase translational efficiency by decreasing the probability of secondary structure that could potentially limit the ability of the ribosome to interact (Zhang and Deutscher 1992,

Zhelyabovskaya et al. 2004). This region of AU richness extends past the start codon into the two codons following it (Scherer et al. 1980, Shultzaberger et al. 2001). However, the most common codons in the second position are AAA and GCU, and expression data have shown that higher expression levels are obtained with AAA than GCU. It is entirely possible that second codon identity could be important in regulating expression levels

(Ringquist et al. 1992). Interestingly, none of the features of the RBS seem to be absolute 12 requirements as much as they are preferences that together contribute to the efficiency of initiation. I would argue that the RBS is a complex interaction of different features, where each is somewhat variable, and that the combined interaction is needed for optimum translation. However, this complex interaction is made even more complex by the fact that each of these may also act as a regulatory region that controls levels of expression.

The final molecular event necessary for polypeptide synthesis occurs when a nucleic acid sequence carried by messenger RNA is translated into a polypeptide chain by a ribonucleoprotein known as a ribosome. This process links the sequence of nucleic acids to that of amino acids by providing a platform where the codons of the mRNA can be matched to the anticodons of the tRNAs that deliver the encoded amino acids. The platform contains three sites for tRNA. Two of them, the A-site for incoming aminoacyl- tRNA and the P-site for peptidyl-tRNA, were predicted by Watson’s classical model

(Watson 1963). However, later evidence suggested the existence of a third site (the exit or E-site) where deacylated tRNA binds prior to exiting the ribosome and that codon- anticodon interaction is present even on this site (Rheinberger and Nierhaus 1980,

Rheinberger et al. 1981).

The first models of tRNA binding sites were based on the specific operations that needed to be performed for translation to occur. The P-site donated the peptidyl-tRNA substrate while the A site accepted the aminoacyl-tRNAs (Watson 1963, Lipmann 1963).

When a third tRNA binding site was discovered the model of translation increased in complexity (Rheinberger et al. 1981, Grajevskaja et al. 1982, Rheinberger and Nierhaus

1983). While there is variation in ribosomes from the three domains of life, they share many common features (Noller 1991, Wittmann-Liebold et al. 1990). In all three 13 domains, an E-site with the operational role of accepting the deacylated tRNA has been observed (Noller et al. 2002). More difficult is determining the operational role of the E- site in reading frame maintenance and aminoacyl-tRNA selection.

The amount of time the tRNA occupies the E-site and its interactions with the ribosome are important in determining if it has an operational role. An atomic model of tRNA interactions has been produced that confirms ribosomal interactions with the tRNAs. In this model the E, P and A-sites were filled with a non-cognate tRNA, deacylated initiator tRNAfMet, and aminoacyl tRNAPhe respectively, with the ribosome bound to a pre-translocation mRNA (Selmer et al. 2006). Other studies have confirmed the possibility of codon-anticodon interactions in the E-site (Jenner et al. 2010). A distortion was observed in the junction between the anticodon stem and D-loop of the deacylated P-site tRNA that may be relaxed by movement into the E-site. This movement is prevented from happening prematurely by interactions between the P-site tRNA and a region between the P and E-site on the ribosome (Giedroc and Cornish 2009). Moreover, increased levels of frameshifting have been associated with mutations in this same region

(Sergiev et al. 2005). Further studies have shown reduced gene expression from individual AGG or GGG codons (Stenström and Isaksson 2002, Zahn and Landy 1996).

Low gene expression may be attributed to peptidyl-tRNA drop-off, a premature release of peptidyl-tRNAs from ribosomes. When on a highly expressed message this can have the effect of reducing rare tRNAs cognate to the E-site, suggesting that they are held in the

E-site for a significant amount of time prior to the release of the slowly translated codon.

For the E-site to have an effect on frameshifting it would need to remain occupied post- translocation, this timing has been supported by a study that suggests it remains occupied 14

75% of the time (Remme et al. 1989). Previous work suggests that deacylated E-site tRNA is released between the binding of new aminoacyl-tRNA in the A-site and the hydrolyzation of GTP by EF-Tu (Dinos et al. 2005). Though another study suggested that it was readily dissociated (Semenkov et al. 1996), this study has been criticized due to failure to maintain buffer conditions with the original study where E-site tRNA is not released upon translocation in a polyamine-Mg2+ buffer (Nierhaus et al. 1997). This highlights one important drawback with all these studies: they were performed in vitro, where buffer conditions may have a significant effect on the results.

To resolve these problems we use in vivo studies. Several studies have provided evidence to suggest that codon-anticodon interaction in the ribosomal E-site helps maintain reading frame (Marquez et al. 2004, Trimble et al. 2004, Jenner et al. 2007,

Sanders and Curran 2007, Devaraj et al. 2009). However, there has so far been little in vivo evidence in support of E-site involvement.

Structural studies of the codon:anticodon duplex in the E-site have not yielded clear information about the structure of the E-site complex. Past structures determined for ribosomes containing E-site tRNAs were made with non-cognate codons to the tRNA E- sites and do not show codon:anticodon pairing (Yusopov et al. 2001, Selmer et al. 2006).

A more recent structure was determined with cognate tRNAs in the E-site (Jenner at al.

2007), however, in this ribosome initiation complex, the mRNA E codon is in a conformation that does not allow it to form base pairs with the tRNA anticodon (Yusupov et al. 2001). In this structure, the E codon and cognate tRNA were not well resolved and information about codon:anticodon interaction at the E-site was inconclusive. Later studies using time-resolved cryo-electron microscopy suggested that while the peptidyl- 15 tRNA is stably bound in the P-site, the deacylated tRNA takes various positions (Fischer et al. 2010) and that these may be related to the various positions previously observed in tRNA in the E-site (Robertson and Wintermeyer 1987). While these studies have provided valuable information about the functional structure of the ribosome, they have so far been unable to resolve functional information about the E-site and have led to some controversy over its codon:anticodon binding (Agrawal et al. 1996, Stark et al. 1997).

Past in vitro studies have also demonstrated limitations in explaining the functional role of the E-site.

In E. coli, more than one codon can be used for initiation of protein synthesis. The vast majority of genes use AUG is the initiator. However, other, similar codons are also use to varying extents (for example, GUG, UUG, AUU and CUG; Sato et al. 2001,

McCarthy and Brimacombe, 1994, Sato et al. 2001). Even so, all of these are read by the same initiation tRNA with a CAU anticodon. Therefore, this anticodon must form different duplexes with the various initiation codons. The AUG codon is the only one using three Watson-Crick base pairs, while the others have decreased usage, comparable to a decrease in their predicted stability. Furthermore, duplex stability has been correlated with initiation codon frequency suggesting there is a functional significance to these duplexes. Taken together, this suggests that though many codons can serve as initiators natural selection favors the most stable duplexes.

THE E-SITE WOBBLE POSITION CODON AND DECODING ACCURACY

Steps of aa-tRNA Selection at Initiation 16

Aminoacyl-tRNA selection is a multiple step process, and it seems likely that variation among the rates of these steps can affect selection and, therefore, accuracy. The initial step in selection is the binding of the ternary complex (EF-Tu-GTP-aa-tRNA) with the ribosome. Initial binding is with ribosomal proteins at the L7/12 stalk (Kothe et al.

2004, Diaconu et al. 2005). The stalk has three elements: a base, L10 helix α8-L7/12 N- terminal-domain complex, and L7/12 C-terminal domains. The mobile L7/12 C-terminal domain recruits translation factors to the ribosome (Diaconu et al. 2005). Studies using

NMR, X-ray scattering, and examining electrostatic properties of the stalk suggest that the L7/12 C-terminal domains bind ternary complexes and bring it to the factor-binding site of the 50S ribosomal subunit (Rodnina et al. 1996, Gromadski and Rodnina 2004,

Bernado et al. 2010). Recognition starts with tRNA entering the 30S subunits decoding site where codon-anticodon complexes form. The formation of the complex is followed by conformational changes in the 16S rRNA (Ogle et al. 2001). Next, A-minor interactions are formed. In A-minor motifs, adenine-rich edges on the minor groove, in this case, the bases, are inserted into the minor groove of the neighboring RNA that is rich in guanine or cytosine, here the first two base pairs of the complex (Konevega et al.

2004, Murphy et al. 2004). These motifs are able to form hydrogen bonds and van der

Waals interactions even in non-canonical pairings (Kleiger and Eisenberg 2002, Bella and Humphries 2005). Next, domain closure by the rotation of the 30S head and shoulder domains is thought to force the anticodon stem and loop of the aa-tRNA into an accommodated position (Ogle et al. 2001, Schmeing et al. 2009, Schuette et al. 2009). In

GTPase activation, the ribosome is rearranged to bring the necessary components together for the hydrolysis of GTP. Evidence suggests that the rate of this rearrangement 17 is increased by a cognate codon-anticodon duplex because correct bonding signals the

GTPase center of EF-Tu (Rodnina and Wintermeyer 2001). Furthermore, incorrect binding may lead to displacement of necessary reaction components from their reaction centers. Several interactions are important for the hydrolysis of GTP especially His84 which is thought to hydrogen bond the substrates (Daviter et al. 2003, Bos et al 2007).

Hydrolysis produces two reaction products GDP (guanosine diphosphate) and Pi

(inorganic phosphate) and it is likely that during this step aa-tRNA is released from EF-

Tu (Effraim et al. 2009, Burakovsky et al. 2010). Finally, the aa-tRNA is accommodated by movement into the peptidyl transferase center and the rate of peptide bond formation has been linked to the rate of accommodation (Pape et al. 1999, Wohlgemuth et al. 2010).

Molecular Mechanisms of Selection

Non-Watson-Crick base pairing of the first two codons of the codon-anticodon duplex impairs interactions with the decoding site. A mismatch at any position has been shown to increase the rate of dissociation of the tRNA only 1000-fold (Gromadski et al.

2006). Since the rate of translational accuracy can be greater that this factor, the thermodynamic stability of the codon-anticodon duplex alone is not sufficient to explain the accuracy of translation. Instead, the increased rate of the forward reaction for cognate codons is a result of conformational changes in the ribosome induced by interactions of the ribosome with the decoding center (Cochella and Green 2005, Pan et al. 2008). There are 54 proteins found in the E. coli ribosome of these there are six that show evidence of being involved in the proofreading stage including S13, S19, L16, L25, L27, and L31

(Jenner at al. 2010). Further evidence of the importance of conformational changes in the ribosome come from the study of specific antibiotics. Ogle et al. (2001) showed that 18 paromomycin causes changes to the decoding center that stabilize the near-cognate binding. Treatment with streptomycin has also been shown to cause an almost complete loss of selectivity by restricting movement of the 30S head to the 30S body decreasing the rate of reaction with the cognate tRNA (Carter et al. 2000, Ogle et al. 2003,

Gromadski and Rodnina 2004).

Fidelity of Amino Acid Incorporation

There are multiple selection stages in aa-tRNA selection, and incorrect aa-tRNAs are thought dissociate at different stages, depending on the qualities of their match to codons. aa-tRNAs are delivered to the ribosome in a complex with EF-Tu and GTP (EF-

Tu-GTP) called the ternary complex and movement of the ternary complex into the A- site requires the action of several intermediates (Rodnina and Wintermeyer 2001,

Marshall et al. 2008). Initial recruitment of the ternary complex requires L12 to interact with EF-Tu (Kothe et al. 2004). Next, the codon is scanned by the decoding site of the 30S subunit (Geggier et al. 2010). It is at this stage that their poor codon:anticodon fit causes most noncognate ternary complexes to be rejected. If not immediately rejected, then upon formation of the codon recognition complex, a set of interactions are formed between the ribosome and the codon-anticodon duplex leading to domain closure, GTP hydrolysis and GTPase activation of EF-Tu (Rodnina et al. 1995,

Pape et al. 1999, Ogle et al. 2002, Gromadski et al. 2006). Most near-cognate aa-tRNA dissociate after GTP hydrolysis (Thompson and Stone 1997, Ruusala et al. 1982, Rodnina and Wintermeyer 2001), and only correctly paired codon:anticodon complexes are likely to remain long enough for peptide bond formation. Interestingly, if incorrect tRNAs are 19 accepted, ribosomes can undergo “abortive termination”, ending synthesis of error- containing proteins (Zaher and Green 2009).

Error Frequency

The first attempts to measure error rates were made over 50 years ago and yet we still know surprisingly little about the frequency of error or the mechanism behind them

(Loftfield 1963). One reason for the wide range of results is that studies vary in the types of reporter systems used, the type and concentration of the misreading tRNAs, and the mRNA context making comparison difficult (Parker 1989, Kramer and Farabaugh 2016).

Estimates of missense error frequency range from 10-3 to 10-5 per codon depending on the method used. These methods include measurements of the incorporation of 35S-cysteine into flagellin, for which cysteine is not encoded (Edelmann and Gallant 1977, Bouadloun et al. 1983, Rice et al. 1984). Another set of studies measured effects on protein function thought to be due to misreading mutant codons for essential amino acids (Parker and Friesen 1980, Johnston et al. 1984, Parker and Holtz

1984), Rosenberger and Foskett 1981, Rosenberger and Hilton 1983). Furthermore, measuring at specific loci is often not representative of a whole genome error rate because of features like sequence motifs and base composition (Lee et al. 2012, Wielgoss et al. 2011, Zhou et al. 2013). However, site-specific mutagenesis methods allow one to change and study sequence features near specific misread loci.

The significance of the tRNA in the process of translation is as an adapter molecule binding a particular amino acid and delivering it to the matching codon on the mRNA chain. As there are 20 amino acids incorporated into proteins and 61 sense 20 codons, amino acids can be represented by more than one tRNA, and these are known as isoacceptors. Moreover, some tRNAs can read more than one codon. This latter observation led Francis Crick to propose that nonstandard or “wobble” pairings could occur (Crick 1966). For example, if the anticodon has a G at its 5’-end, it can pair to codons having either C or U at the ‘3-end. As an example of these translational properties, E. coli uses three tRNAs to read the four Valine codons (GUU, GUC, GUA or

GUG), and these tRNAs have overlapping codon specificities.

Each codon in an unoccupied A-site is specific for a single tRNA or (isoacceptor set); but its tRNA competitors, especially near cognates, can interfere with the selection of the correct anticodon. On average, a given cognate aa-tRNA has about 5-6 near- cognate and >50 non-cognate competitors. Moreover, the interaction surface of the codon:anticodon pair is small relative to that of the rest of the aa-tRNA-EF-Tu-GDP ternary complex with the ribosome (reviewed by Wilson and Nierhaus 2006). All of this suggests that the translational machinery operates to maximize discrimination based on the affinity differences between the correct and all other ternary complexes. One model has suggested that discrimination is accomplished by an allosteric interaction between tRNAs in the E-site, with those at the A-site (Nierhaus, 1990). Evidence has suggested that when tRNA is occupying the E-site, this induces a low-affinity A-site, reducing the binding of incorrect tRNAs at the A-site, that may be mediated by a conformational change in the rRNA (Rheinberger and Nierhaus 1980, Lill et al. 1989). A number of studies have quantitatively examined the structure of deacylated tRNA in the E-site.

However this alone has not clarified E-site significance in aa-tRNA selection. Our work 21 is unique in that we will examine the effects, in vivo, of altering the E-site codon and assaying the effects on A-site misreading and nonsense suppression.

The idea that the E-site tRNA was important in influencing cognate A-site tRNA selection was first proposed by Nierhaus (Nierhaus 1990). Their in vitro assays showed that cognate E-site tRNA decreased erroneous aa-tRNA incorporation (Nierhaus 1990).

They also showed that deacyl-tRNA remains bound to the E-site until cognate aa-tRNA is bound to the A-site, but is released prior to the hydrolysis of GTP (Dinos et al. 2005).

While this evidence is consistent with the negative allosteric model other experiments have produced conflicting results. Evidence from other in vitro studies have shown no tight tRNA binding at the E-site in standard buffers (Semenkov et al. 1996). And high levels of translational accuracy have also been obtained when no tRNA was in the E-site

(Pape et al. 1998, Wintermeyer et al. 2004, Rodnina et al. 2005). One way to resolve this issue is to compare accuracy when different codons occupy the E site. In vivo experiments would eliminate uncertainties inherent in in vitro conditions.

MUTATIONS THAT INCREASE MISREADING AND REVEAL MORE NON-

RANDOMNESS IN CODON USAGE

Due to the degeneracy of the genetic code, the number of possible open reading frames that could specify the same amino acid sequence is unimaginably large. With an average of three codons per amino acid, they typical 300 amino acid sequence could be specified by ~3300 ORFs, a number greater than the number of particles in the universe.

This leads me to the question of whether all of “synonymous” ORFs would be translated equally well. One approach to address this question is to examine genes for non- randomness. For example, if certain synonymous codons were used more frequently than 22 others, then one could conclude that codon bias contributes to one or more translational qualities such as accuracy or energy efficiency. Examining sequences to determine unique features that interact with the ribosome in translation is a daunting task. Some features that alter how a sequence is translated have already been discovered suggesting that translation is not always a matter of canonical decoding. An analogy to this might be formatting of files on a computer. A document that is formatted for one operating system may not be read by another because of the unique features of the format that are important for that system. Are there features of the sequence that format it for the ribosome and have meaning during translation other than to simply be decoded? Here I would like to review some of the known features that make translation not a matter of straight decoding but that suggest they can also signal the ribosome to react in a certain way.

Codon Context

There is evidence that the neighboring sequences can alter codon translation, and such effects are referred to as “context effects”. Context effects have been observed to cause many kinds phenomena such as ribosomal frameshifts, and missense and nonsense suppression. Such effects are usually detected in one of three ways. One method finds translational anomalies such as high-frequency frameshifting, and then altering the context to observe any effects on the anomaly. Another method used to study context effects is to use site-specific mutagenesis to create novel mRNA sites and determine whether the introduced sequence features affect translation. A third method is statistical analyses that identify sequence non-randomness. Nonrandomness may be maintained by 23 selection pressure for translational properties. Moreover, searches for non-randomness are often used to guide the other approaches and I will expand on it just below.

Statistical Evidence

By analyzing gene sequences patterns can be found that occur more often than would be predicted by simple random selection for codons. For example, it has been observed in E. coli that A-T base pairs prefer neighboring G-C base pairs and G-C prefer neighboring A-T bases (Blake and Earley 1986). In a study of protist sequences by

Grantham and collaborators it was found that there was a bias toward G or C-ending codons. In addition, for amino acids encoded by either four or six codons there was a clear preference for cytosine in the third position (Grantham et al. 1986). Moreover,

Gouy examined the base usage in the three codon positions to see if it correlated with 5’- and 3’-adjacent codons at each base position. Statistically significant codon bias was found in several contexts (Gouy 1987). A “3-1” doublet frequency is a preference of the third base of one codon to be followed by a specific first base in the following codon.

While early studies showed this type of preference extensively in higher vertebrates, it has since been observed in prokaryotes (Blake and Earley 1986, Shpaer 1986). Two different mechanisms have been suggested to explain the physical basis of this effect.

One possibility is that there is an interaction between peptidyl and aminoacyl tRNAs in the A and P sites which influences translation efficiency. A second explanation suggests that codon-anticodon recognition is affected by the adjacent 5’ or 3’ nucleotide (Yarus and Folley 1985, Shpaer 1986). However, experiments in nonsense suppression and studies of 3-1 doublet frequency favors the first hypothesis (Ayer and Yarus 1986). Thus, the selection may drive non-random context usage by more than one mechanism. 24

The above works suggest that non-randomness may be maintained by pressure to maintain translational efficiency. However, it is important to consider that there could also be selection acting on DNA or mRNA structure or stability. Supporting the conclusion that bias has a translational cause is that they are dependent on triplet phasing

(see, for example, the discussion by Folley and Yarus 1986). These workers also showed that context biases differ between highly expressed genes and those with lower expression, again suggesting a translational cause.

Experimental Evidence

The first experimental demonstration of codon context effects was an observation of suppression efficiency of nonsense mutants. It was shown, for example, that ochre suppressors, which can translate both UAA (ochre) and UAG (amber) premature termination codons, translate these codons with relative efficiencies that vary depending on the 3’ context (Feinstein and Altman 1977, 1978). This led to the discovery that when translating the UAG codon the efficiency of the amber suppressor supE can vary over one order of magnitude depending on the adjacent 3' codon (Bossi and Roth 1980). Two studies together were the first to suggest that sequences downstream of the termination codon may be involved in translation termination (Bossi 1983, Miller and Albertini

1983). Bossi's study in Salmonella typhimurium and the same sites in E. coli by Miller and Albertini showed bias in codon usage on the 3’ side of the termination codons. A statistical analysis of these codons supported the significance of this finding (Kohli &

Grosjean 1981). A later reanalysis of this data showed that suppression efficiency was modulated by the 2 nucleotides downstream (Stormo et al. 1986). These studies introduced the technique of comparing different sites and looking for context effects, 25 however, this does not dismiss the possibility that these effects may be specific to that site. The sequence downstream of UAA codons, that are dependent on release factor 1

(RFI) for termination, has also been shown to affect termination (Ganoza et al. 1984). A study of termination sequences in eukaryotes suggested that the preferred signals for termination take the form UAA(A/G) and UGA(A/G) (Brown et al. 1990). Furthermore, a strong preference has been observed for UAA over UGA and UAG in highly expressed genes (Kohli and Grosjean 1981). One possible explanation for this is that nucleotides involved in strong base stacking can restrict release, whereas loosely stacked bases favor release.

Termination is not the only function shown to be affected by codon context. Many factors may affect the rate of translation; one of the best accepted is the availability of the cognate tRNA in the pool of available tRNAs (Varenne et al. 1984). There are experimental approaches to determining the rate of elongation. One approach is to measure the entire elongation cycle. Pedersen used a pulse-chase method to compare the translation time of two genes rich in rare codons, lacI and bla, with genes containing few rare codons, fus, tsf, tuf and rpsA to show that poorly expressed genes are translated more slowly (Pedersen 1984). Curran and Yarus (1989) used a technique to determine how long it takes to fill the aminoacyl site by placing tRNA selection at a codon in competition with a frameshift with a constant rate and found a 25-fold range in selection times for different YNN codons. Using this technique they were able to show a correlation, in highly expressed genes, for codon preference rather than rate constant. It may be possible to combine this technique of determining rate of A site selection with 26 different adjacent codons to analyze the effects of specific contexts on rate of selection. I was unable to find any work examining this but it is worth exploring. tRNA Modifications

When we describe the genetic code as degenerate we are referring to the fact that most amino acids are encoded by more than one codon. Beyond the different anticodons for amino acids there are a number of modifications that alter the nucleotides of individual tRNAs. Like the side chains of amino acids these modifications contribute to the hydrophobic/hydrophilic properties of the RNA (Agris 2007). But do these modifications have an effect on how the message is translated? Nishimura's work, especially on modifications of purine-37, changed the concept of codon recognition by showing there was greater complexity than the standard three base pair recognition (Jukes

1973). Here I will discuss some of these modifications and how they may be involved in codon recognition.

Purine-37 modifications

As I mentioned above, codon bias, the tendency of a specific codon to be preferred over another, has been observed in many organisms. Not surprisingly codon bias can be linked to the abundance of the available tRNA isoacceptor. However, this is not always the case and selection for rare codons can have effects on translation. For example, the rare arginine codons AGA or AGG can cluster around initiation and termination sequences in E. coli (Gurvich et al. 2005). The low abundance of those tRNAs can lead to ribosome stalling and frameshifting which can ultimately affect expression levels of those proteins (Cruz-Vera et al. 2004). There are modification 27 differences between the tRNAs for AGA and AGG and those for the common CGU and

CGC codons. Where the tRNA for the common codons contain only the nucleoside inosine at the wobble position, the rare codons are more heavily modified. AGA and

AGG are read by tRNA Arg4, which has 2-thiocytidine, 5-methylamino-methyluridine and

N6-threonylcarbamoyladenosine modifications at positions 32, 34 and 37, respectively

(Sprinzl et al. 1998). There is evidence that the 2-thiocytidine modification affects the decoding of AGG (Jäger et al. 2004). The fact that tRNA modifications can be involved in recognition at sites that can affect expression levels suggests that these modifications can be important for regulation of expression.

This is not the only significant modification at purine-37. In an earlier section, I discussed frameshifting. There are genes that have programmed frameshifting regions that have very high rates of frameshifting (Farabaugh 1996). There are also places where frameshifting occurs with lower efficiency and the efficiencies of these frameshifts have been shown to be influenced by tRNA modifications (Farabaugh and Björk 1999, Agris et al. 2007). Modifications of purine-37 have been shown to affect translational frameshifting, in the absence of the 2-methylthio-N6-(cis-hydroxyisopentenyl) adenosine and N1-methylguanosine increase the rate of +1 frameshifts but have little effect on -1 frameshifting (Urbonavicius et al. 2003).

Wobble position-34 modifications

Many codons are recognized by more than one tRNA, and tRNA nucleoside modification can contribute to this form of degeneracy. One especially important modification is 5-oxyacetic acid uridine modification at position 34, which reads the third codon nucleotide. It is found on at least one tRNA in the eight “full” boxes of the genetic 28 code, where all for codons XXN specify the same amino acid. This modification allows the tRNA to read three of the four codons (XXU, XXA and AAG). In Salmonella enterica, it has been shown that deletion of a single enzyme in the biosynthesis of this modification reduces growth (Näsvall et al. 2004). In this way modification can again be regulatory in that there is not a simple relation between the abundance of a certain tRNA for each codon in every case. In some cases, a modification may allow different codons to be read by a single tRNA changing the significance of abundance.

Modifications at this position can also be important in frameshifting. The common modification to queuosine at position 34 was shown to suppress +1 frameshifting at aspartic acid, histidine, tyrosine and asparagine codons (Urbonavičius

2001). In E. coli +1 frameshifting was promoted by the lack of the 5- methylaminomethyl-2-thiouridine at position 34 in tRNAs for glutamic acid, glutamine, arginine, leucine and lysine (Urbonavicius et al. 2003). The induction or suppression changes the way in which the message is processed which also points to these modifications as regulatory in that they can change the way in which a specific message is read.

Recoding Systems

I finished the last section about processivity errors with a discussion of frameshifting. This is a type of error where the message is not processed by the "normal rules''. In this section I want to talk about other ways in which the rules are changed by recoding systems. One example I mentioned above was programmed frameshifting. Each nucleotide in a single codon specifies a single amino acid, since there are three nucleotides this means that there are three reading frames for each sequence. A frameshift 29 occurs when the reading frame dictated by the start codon is not followed, the ribosome shifts on the sequence and reads codons in the wrong frame. If a frameshift occurs for most genes this means that it will produce a sequence of amino acids that don't create a functional protein. However in some cases there are induced frameshifts. One of the best examples of a programmed frameshift is prfB. The gene codes peptide release factor 2, and the concentration of this release factor regulates the efficiency of a frameshift required to express this factor (Sanders and Curran 2007). This is a case where a sequence allows for the production of two proteins that may share an initial sequence but after the point of frameshifting a different C-terminus is coded. Another example of certain codes being processed by different rules is polyproline pauses. The ribosome has been observed to pause when it encounters short proline rich motifs possibly because proline is a secondary amine or because of its cyclic side chain (Doerfel et al. 2013).

Stalling at these sites is alleviated by the elongation factor EF-P. EF-P binds near the peptidyl-tRNA in the E-site a β-lysyl moiety reaches into the active site and restores catalysis (Ude et al. 2013). This is an interesting example in that it shows clearly that the ribosome cannot translate all sequences equally and that for this case an extra factor is required. Another interesting example is ribosomal skipping, in this example a viral peptide prevents covalent linking of the A and P site amino acids but permits the ribosome to continue translation, resulting in a cleavage of the polypeptide chain. The cleavage is caused by cis-acting hydrolase element or CHYSEL sequence observed only in +ssRNA and dsRNA viruses and occurs between a proline and a glycine (De Felipe

2004). The cleavage of a polypeptide in the cell allows for the production of two proteins from a single sequence. 30

The above review is not an exhaustive list of all the ways in which sequences can sometimes be translated by a different set of rules. Hopefully, it is a good reminder to the reader of the complexity that is actually involved in the process of translation. In the above examples we see how translation of certain codons can be affected by neighboring residues, the presence or lack of tRNA modifications at specific residues and sequences that the ribosome cannot handle in the normal way. Furthermore we see how factors outside the ribosome, like tRNA modifications can regulate translation and even how in some cases the ribosome at specific sequences is programmed to read it out of frame producing a new protein. I hope that this gives some idea of the complex interplay of factors that are involved in translation beyond a simple sequence.

The amino acid sequence of proteins is stored in the coding regions of genes.

Nonetheless, in the same coding regions genes have found ways to incorporate information regulating the expression level of those genes (Grantham et al. 1981, Gouy and Gautier 1982). Several studies in E. coli have revealed different patterns of codon bias when highly expressed genes are contrasted with weakly expressed ones (Ikemura

1981, Ikemura 1985, Grosjean and Fiers 1982, Gustafsson et al. 2004). The quality of the final protein can be affected by choosing a series of codons that are translated with a particular efficiency and regulation of gene expression is modulated by elongation rates

(Konigsberg and Godson 1983, Robinson et al. 1984). Still, this is not the entire story.

Various studies have also shown that modification of the tRNA that reads these codons can alter the way in which the codons are translated. An abundance of the cell's resources are allocated to tRNA modifications with more genetic information dedicated to modifications than to the tRNA genes (Bjork 1995, Gustilo et al. 2008). Modifications 31 have been shown to affect termination and reading frame maintenance in ways that affect expression levels (Gurvich et al. 2005, Cruz-Vera et al. 2004). Furthermore, sequences are not always translated by the same rules. Induced frameshifts, like peptide release factor 2 in E. coli, can directly modulate expression levels. It would appear that a number of systems have evolved to incorporate information that can regulate expression levels of genes along with the amino acid sequence and, undoubtedly, there are more that have yet to be discovered.

The mechanisms that allow neighboring nucleotides to influence translation of a particular codon are still obscure. In addition, it is not clear that only nucleotides that are directly neighboring a particular codon are the only ones active in aspects of translation like reading frame maintenance, tRNA selection and termination. In 1990, Nierhaus proposed that the A and E-sites were reciprocally coupled and while the E-site was occupied that only the cognate ternary complex binds with sufficient energy to discharge the deacylated tRNA from the E-site (Nierhaus 1990). Their in vitro assays showed that erroneous aa-tRNA incorporation could be inhibited by cognate E-site tRNA (Nierhaus

1990). Furthermore, past research in our lab has indicated that the E-site is important in reading frame maintenance. We employed the E. coli lacZ gene as a reporter to show that, in vivo, codon:anticodon base pairing controls frameshifting. A plasmid-borne lacZ reporter gene with an inserted E. coli prfB (RF2) programmed frameshift sequence was transformed into a strain of E. coli with a chromosomal lacZ deletion. Because of the deletion of lacZ the enzyme activity observed could be attributed to the inserted plasmid constructs. Therefore, the ribosomal E-site has been directly implicated in some of the same functions that are involved in regulation of expression levels. Two ways that have 32 been previously used to detect these translational anomalies include statistical analysis to detect non-random contexts around specific codons and experimentally testing translational anomalies, altering the surrounding codon context and observing the effects.

Much of the work that has been done in this area has focused on codons directly neighboring the translational anomaly, but our laboratory's major interest has been the effects of the E-site on factors like frameshifting and tRNA selection. We hypothesized we could detect an unknown translational anomaly by pairing our lacZ reporter system with mutations at specific codons that have been previously shown to minimize enzyme activity in lacZ. By first mutating the essential codon to another codon that knocks down activity and mutating the E-site codon to a series of random codons we are hoping to detect an anomaly where the E-site codon has an effect on selection in the A-site. We were able to detect an unusual effect involving the methionine codon in the E-site.

Further, analysis showed that this effect does not hold true for all A-site codons. It will take extensive analysis to fully understand this phenomenon but effects of methionine codons previously have been studied only at the initiation site. This result may show that methionine has other effects in the coding region of the gene. 33

METHODS Initiation Studies Microbial Maintenance MY600 E. coli were maintained on LB agar (Miller) plates (casein peptone 10 g/l, yeast extract 5g/L, NaCl 10 g/l and Agar 12 g/l) and allowed to grow overnight before storage at 4˚C. Plasmid-containing strains were maintained as above but with the addition of (68 g/l) to the media.

Reporter Construct Synthesis

The lacUV5 promoter containing plasmid pJC1168 was used as the backbone for the reporter plasmid. The tetranucleotide UUUY has been shown to spontaneously frameshift from the UUU triplet to the UUY triplet at frequencies of between 3% and

16% (Fu and Parker 1994, Schwartz and Curran 1997). We made use of this frameshift prone sequence (UUU cnn) as our slippery site. A series of frameshift mutants were produced by cloning double-stranded oligonucleotides between the HindIII (305) and

BamHI (348) sites of the polylinker of pJC1168. In all, 8 frameshift mutants were produced including four containing the possible start codon variants:

AUG (5'-AGCUUCUAUGUUUCGGAACGAUACUUG-3') GUG (5'-AGCUUCUGUGUUUCGGAACGAUACUUG-3') CUG (5'-AGCUUCUCUGUUUCGGAACGAUACUUG-3') UUG (5'-AGCUUCUUUGUUUCGGAACGAUACUUG-3') The start codons and slippery site are italicized and bolded respectively. Another four were produced with an additional nucleotide U between the phenylalanine and arginine codons of the UUU cnn slippery site for each of these start codons:

AUG (5'-AGCUUCUAUGUUUCGGUAACGAUACUUG-3') 34

GUG (5'-AGCUUCUGUGUUUCGGUAACGAUACUUG-3') CUG (5'-AGCUUCUCUGUUUCGGUAACGAUACUUG-3') UUG (5'-AGCUUCUUUGUUUCGGUAACGAUACUUG-3') The inserted nucleotide is underlined. The additional nucleotide is inserted such that it interrupts lacZ and requires a one base frameshift for normal expression. The oligonucleotides were produced by Integrated DNA Technologies (Coralville, Iowa) with standard desalting. The DNA cloning was achieved by digesting the backbone, plasmid pJC1168, with HindIII (NEB Catalog #R0104S) and BamHI (NEB Catalog #R0136S) restriction enzymes, and ligating the complementary restriction sites that were placed in the synthetic oligonucleotide sequence. The HindIII and BamHI dual restriction was performed in MED buffer (0.3 M NaCl, 0.3 M MgCl2, 0.3 M TRIS ((HOCH2)3CNH2) and BSA 10 µl (10mg/ml in TE (10 mM Tris-HCl 0.1 mM EDTA, pH 7.5)). The final buffer was prepared by diluting 12 µl of MED buffer in 78 µl of TE, pH 7.5. As control

20 µl of MED buffer was set aside and 15 ng pJC1168 plasmid DNA was added. The tubes containing plasmid DNA were divided into 5, 20 µl aliquots. A single enzyme restriction was performed with BamHI only and HIndIII only and three were double digested with both enzymes. The reactions were incubated for 3 hours before adding 5 µl of a combined stop solution and SDS loading dye Trizma-sodium-dodecyl-sulfate- bromo-phenol-blue (15 mL of TE pH 7.5, 6 ml of 99.5% Glycerol, 0.2 mL of 10%

Sodium dodecyl sulfate and 0.5 mL Bromophenol blue (30mg/mL).

Gel Electrophoresis and Band extraction

To visualize the results the plasmids were separated on a 1% agarose gel for 45 minutes at 170 volts in TAE buffer, pH 7.8. Each entire 25 µl final volume reaction was added to a single well. After 45 minutes the gel was stained by soaking it in the TAE 35 buffer with the addition of 10 µl of Ethidium bromide (25 mg/ml) while gently shaking.

The gel was then exposed to UV light and the three double restricted bands were extracted using the QIAquick PCR Purification Kit (Catalog #28106) following the

QIAquick Gel Extraction Kit Protocol for micro-centrifuges.

Insertion of Double-Stranded Oligonucleotides

The oligonucleotides were ligated into the cut plasmids in 50 µl reactions. Each reaction contained 2 µl T4 DNA Ligase Buffer (10X), 16 µl Vector DNA (7 kb), 1 µl

Insert DNA (27 or 28 bases), Nuclease-free water to 50 µl and 1 µl T4 DNA Ligase. The reactions were incubated overnight (16 to 18 hours) at 15˚C. The ligase was heat inactivated at 65˚C for 10 minutes. The plasmids were then transformed into the MY600

E. coli.

Transformation of Mutant Plasmids

The plasmid used for the construction of mutants was pJC1168 which encodes a pseudo-wild-type lacZ and confers chloramphenicol resistance as a selectable marker

(Figure 1). All mutants were made as described above. The mutant plasmids were then transformed into MY600 Δ(pro lac) ara, thi (Curran and Yarus 1986). 36

Figure 1. Map of the pJC1168 plasmid showing positions of the lacZ and camR genes and the BamHI and HindIII restriction sites.

37

Overnight cultures are prepared by inoculating 2 ml of Luria-Bertani (Miller) broth (LB) with a single culture selected from a LB agar plate. After overnight incubation

10 µl of culture was added to 2 ml of fresh LB. The new culture was allowed to grow for

2 to 2.5 hours at which point it was diluted with 2 ml of cold (4˚C) 2xTSS buffer. The

TSS buffer is prepared by combining 10g PEG (Polyethylene glycol), 2 g MgCl2*6H2O,

10 mL DMSO (Dimethyl sulfoxide) and adding LB to 60 ml. The completed mixture is then filter sterilized using a 0.22 µm filter. The diluted culture is mixed with 20 µl of the ligation mixture and stored on ice for 20 minutes. The ligation and competent cell mixture is then heat shocked in a 37˚C heat block for 7 minutes. Finally, 1 ml of fresh LB is added to each 200 ml culture and the cells are incubated at 37˚C for 3 hours. A 100 µl aliquot of the recovered cells was then spread on an LB agar antibiotic plate

(chloramphenicol 68 g/L) and incubated overnight at 37˚C. The plates were previously treated with 100 µl of X-gal (5-bromo-4-chloro-3-indolyl-fl-D-galactopyranoside, 13 mg/ml in Dimethylformamide) for blue-white screening.

Plasmid Extraction

Colonies were selected from each plate and used to inoculate 2 ml of LB broth.

The inoculated tubes were placed in a rotor and incubated at 37˚C overnight. After incubation, the 2 ml of inoculated media was used to perform plasmid extraction using the Zyppy® Plasmid Miniprep Kit (Catalog # D4036) according to manufacturer's protocol.

SalI Analysis of Suspected Mutants 38

The removal of the fragment from pJC1168 between HindIII 305 and BamHI 348 removes one of the two SalI enzyme restriction sites on the plasmid. For a mutant plasmid, SalI would serve to linearize the plasmid rather than cutting it into two fragments as it would with pJC1168. To help determine if our plasmids indeed contained our frameshift prone sequences each suspected mutant was digested with SalI and visualized on an agarose gel as described above. Plasmids samples that showed evidence of having only a single restriction in this analysis were selected to be re-transformed into the MY600 cell line. Colonies of these plasmids were then grown and plasmids were extracted as described above except that the final elution was done with 30 µl of nuclease free ddH2O.

Plasmid Sequencing

Plasmids were prepared as above and sent to the Nevada Genomics Center (Reno,

Nevada) for sequencing. The chromatograms were analyzed using FinchTV 1.4.0

(Geospiza 2009). Those containing the correct sequence were then used for enzymatic assays.

Enzymatic Assays

Fresh transformations were performed, as above, for every assay. Overnight cultures are grown in supplemented Vogel-Bonner salts (VBsup). A 50x solution of VB salts is prepared to contain (670 ml ddH2O, 10 g MgSO4*7 H2O, 100 g Citric acid monohydrate, 500 g K2HPO4, and 175 g NaHNH4PO4*4H2O) and diluted to a 1x solution before autoclaving. Finally, 100 ml of 1x solution is then supplemented with 2 ml 10%

Glucose, 1 ml 10% casamino acids and 1 ml of a solution containing proline, histidine 39 and tryptophan 10 mg/mL each and thiamine 1 mg/ml. Overnights are prepared by inoculating 2 ml of VBsup (containing the appropriate selective antibiotic.) with a single culture selected from a transformation plate. The culture is allowed to grow in a rotor for

14 to 16 hours and 20 µl of this culture is then used to inoculate 2 mL of fresh VBsup.

The day culture is allowed to grow for 2.5 to 3 hours until the A550 of the day culture is close to 200.

A reaction buffer (Z-buffer) is prepared first by making a 5x Z Salt solution. To 1 liter of ddH20 80.5g Na2HPO4*7H2O, 27.5g NaH2PO4*H2O, 3.75g of 1 M KCl and 1.23g of 1 M MgSO4 are added. The final buffer is prepared by adding 20 ml of the 5x Z Salt solution to 78 mL of ddH2O, 1m BSA (10mg/mL in TE), 0.078 mL β-mercaptoethanol and 15 µl of 10% sodium dodecyl sulfate. Tubes containing 1 ml of Z buffer were placed in a 28˚C heat block. To each tube, an amount of day culture is added. The amount varies depending on the suspected activity. For the experimental mutants that will have lower activity 200 µl of day culture is used, while for controls only 10 µl is necessary. To each tube, 2 drops of chloroform is added and the tube is vortexed for 10 seconds. Timing of the reaction begins with the addition of the ONPG (ortho-Nitrophenyl-β-galactoside) substrate (40 mg in 10 mL of Z buffer.) The reaction is allowed to proceed until the yellow color begins to be visible, at which point 500 µl of stop solution (1M sodium carbonate) is added. Once all the reactions have been stopped the A420 of each tube is recorded and the activity is calculated in Miller units.

Where:

A420 is the absorbance of the ONPG

A550 is the scatter from cell debris t is the reaction time in minutes 40

v is the volume of culture in milliliters

1 Miller Unit = A420 x 1000 / A550 x (v) x (t)

Protein levels

Bacterial cells pellets from 600 µl of an overnight culture were lysed in 50 ul hot (95°C)

SDS Protein Sample Buffer containing 20% SDS, 0.5 M Tris (pH 6.8), 9 M urea, 0.01% bromophenol blue and 2.5% β-mercaptoethanol. Approximately 10 ul of lysate was loaded per lane of a 10% polyacrylamide gel and resolved by SDS-polyacrylamide gel electrophoresis. PAGE gels were stained with Coommassie blue (50% methanol, 10% acetic acid, 0.25% R-250) for 2 hours and then destained with two changes in destaining solution containing 20% acetic acid and 7% methanol. After equilibration in distilled water, gels were imaged with the Bio-Rad ChemiDoc MP system fitted with a white light conversion screen. Far-red fluorescent light from the Coomassie stained proteins was captured as a 16-bit image and quantified with Image J in order to estimate the relative amount of protein present. Levels of protein present in the 120-kDa beta-galactosidase band was expressed relative to the bulk staining present in the selection from the lower portion of the gel (Figure 8). 41

Figure 2. Analysis of total protein in our frameshift samples on an SDS-PAGE gel. The

β-galactosidase band is visible in some samples at 110 kDa. 42

Misreading Studies Site Selection

For this work, I located sites in the lacZ reporter that have two criteria: a codon that specifies an amino acid essential for β-galactosidase function (referred to as the A- site codon), and a codon two positions upstream that has synonyms (the E-site codon).

Previous studies have identified amino acid substitutions in the active site of β-

Galactosidase that result in reduction of enzyme activity to less than 0.1% of the wild type enzyme. Three sites met these criteria: Glutamic acid at position 461, tyrosine at position 503, and asparagine at amino acid 460.

Selection of GAA (Glu) to CAA (Gln) at codon 461 as the principal site

When we were planning potential mutants to test the E-site effects we were especially interested in amino acids that would be in the active site of the enzyme as they were more likely to be directly involved in the reaction. Previous work identified all of the codons we chose as coding for active site amino acids (Edwards et al. 1990, Cupples and Miller 1988, Cupples et al. 1990, Lo et al. 2010, Roth et al. 2003, Huber et al. 1994).

However, this does not mean they all limit activity in the same way and therefore they may not recover activity in the same way. Studies of the asparagine at position 460 and the glutamic acid at 461 suggest that they are both involved in electrostatic stabilization of a transition state galactosyl cation (Wheatley et al. 2012, Huber et al. 1994). Failure to bind the transition state molecule results in very little enzyme activity. We also select amino acid replacements that have different properties than those of the wild-type. At

Glu-461 previous studies have shown that substituting Gly, Gln, His, and Lys greatly reduces binding of the transition state analog 2-aminogalactose (Cupples et al. 1990). 43

Similarly, changing Asn-460 to His-460 has been shown to reduce binding (Kappelhoff et al. 2003). Thus, both AAT 460 (Asn) and GAA 461 (Glu) appeared suitable for analysis.

Selection of Tyr-503 as a backup site

Tyr-503 is a highly conserved amino acid in the β-galactosidase enzyme (David et al. 1992, Pochs et al. 1992, Hancock et al. 1991, Burchhardt and Bahl 1991, Stokes et al.

1985). While mutations at Tyr-503 have been shown to reduce enzyme activity similarly to those at Glu-461 we found little difference in the effect of the E-site wobble position changes on recovery of activity. One way to test when two amino acids act in tandem is to mutate one of them and then both together. If they interact the double mutant’s inactivation should be no greater than the single mutant. Using this technique it was found that Tyr-503 works cooperatively with Glu-537 (Roth et al. 2003). Analysis of

537-Glu shows that during catalysis it forms a covalent bond with galactose and 503-Tyr acts as an acid catalyst to break this bond (Gebler et al. 1992, Penner et al. 1999). It is possible that at very low levels bond breakage occurs without the help of 503-Tyr and makes it more difficult to detect differences in low levels of activity in the absence of

503-Tyr (See examples in Figure 3). 44

Figure 3. Outline of two test constructs used. The 461 glutamic acid was mutated to a glutamine while the downstream codon in the E-site was changed to other codons specifying the same amino acid. The 503 Tyrosine codon was mutated to a histidine codon while the downstream E-site codon was changed to other codons specifying the same amino acid. 45

Site Directed Mutagenesis

Mutations were introduced into the pJC1168 plasmid using the Q5 site-directed mutagenesis kit (NEB Catalog #E0554S). The Primers were designed using the

NEBaseChanger TM (www.NEBaseChanger.neb.com). Initially two primers were designed for each site using the NEBaseChanger TM software; these primers only introduced a single mutation at the active site location. For the Glu-461 codon the primers, forward 5'-TCAATCAGGCCACGGCGCTAAT-3' and reverse 5'-

TTCCCCAGCGACCAGATGATCACAC-3' will convert Glu-461 to Gln-461.

459 461 GLY GLU 5'-tcaaTCAGGCCACGGCGCTAAT-3' | |||||||||||||||||||| gtgcagcgcgatcgtaatcacccgagtgtgatcatctggtcgctggggaatgaatcaggccacggcgctaatcacgacg ||||||||||||||||||||||||| 3'-CACACTAGTAGACCAGCGACccctt-5'

The reverse primer 5'-TTCCCCAGCGACCAGATGATCACAC-3' was then changed to introduce a silent mutation at the Gly-459 codon producing two more primers 5'-

TTGCCCAGCGACCAGATGATCACAC-3' and 5'-

TTTCCCAGCGACCAGATGATCACAC-3' that will maintain Gly-459 while replacing the third pairing in the codon the wobble position. The same process was performed with the second site.

501 503 PRO TYR 5'-GCACGCGCGCGTGGATGAAGAC-3' | |||||||||||||||||||| gcagtatgaaggcggcggagccgacaccacggccaccgatattatttgcccgatgtacgcgcgcgtggatgaagac ||||||||||||||||||||||||| 3'- GCCGGTGGCTATAATAAACGGGCTA-5'

46

Tyr-503’s TAC codon was changed to CAC producing His-503. The primers for this mutation were 5'- GCACGCGCGCGTGGATGAAGAC-3' and 5'-

ATCGGGCAAATAATATCGGTGGCCG-3'

The thermocycler reactions were set up according to the protocol given with the mutagenesis kit (12.5µl Q5 Hot Start High-Fidelity 2X Master Mix (supplied with the kit), 1.25 µl of 10µm forward primer, 1.25 µl of 10µm reverse primer, 1 µl of 1 ng/µl template DNA and 9.0 µl of nuclease-free water (Cat#: 20-138 Apex BioResearch

Products). The thermocycler conditions were (98˚C for 30 seconds, followed by 25 cycles of (98˚C for 10 seconds, 63˚C for 30 seconds and 72˚C for 3.5 minutes) and a final extension at 72˚C for 2 minutes). The same protocol was used for both sites except that the annealing temperature was raised to 66˚C for the Tyr-503 mutations to accommodate the different primers.

The combined Kinase, Ligase and DpnI (KLD) treatment was performed according to the manufacturer's protocol 1 µl of each PCR product was placed in a reaction with 5µl of 2x KLD reaction buffer, 1µL of KLD reaction enzyme and 3 µl of nuclease-free water. Buffers and enzymes were used as supplied with the mutagenesis kit.

The reactions were incubated at room temperature for five minutes before transformations were performed.

Plasmids were initially transformed into NEB® 5-alpha competent E. coli

(Catalog #C2987I). Tubes containing 50 µl of cells were stored at -80˚C and thawed on ice for 5 min. After incubation 5 µl of each Kinase, Ligase and DpnI reaction was added to 50 µl of 5-alpha competent cells. The cells were incubated on ice for 30 minutes and then the tubes were moved to a 42 ˚C water bath for 30 seconds. The tubes were then 47 incubated on ice for 5 more minutes before adding 950 µl of SOC Outgrowth Medium

(Catalog #B9020S), the tubes were then placed in a rotor inside a 37˚C gravity convection incubator for 1 hour. Growth media plates made with LB Agar (Miller) from

Fisher BioReagents® (Catalog #BP9724) and containing 2 µl/ml of chloramphenicol were spread with 100 µl of X-gal (13mg/ml in dimethylformamide) and allowed to dry. A 100

µl sample of each mutant plasmid transformation was then spread on an X-gal treated plate and stored in a gravity convection incubator at 37˚C overnight.

For each site mutants were also made where only the A-site codon or the E-site codon was changed. It was found that all 3 A-site codon changes were sufficient to decrease activity independent of any other changes. It was also found that at all 3 test sites changes to the E-site codon showed no change in activity level regardless of the codon used. Changing the wobble position of these codons has no effect on enzyme activity.

Transformation of Mutant Plasmids

The plasmid used for the construction of mutants was pJC1168 which encodes a pseudo-wild-type lacZ and confers chloramphenicol resistance as a selectable marker. All mutants were made as described above. The mutant plasmids were transformed into 8 strains of cells.

● 599 D (pro lac) ara, thi, rpsL ● 600 Δ (pro lac) ara, thi (Curran and Yarus 1986) ● B209 A B (pro lac) ara gyrA rpoB (Andersson et al. 1982) ● B210 A B (pro lac) ara gyrA rpoB rpsL141(Andersson et al. 1982) ● B211 A B (pro lac) ara gyrA rpoB rpsE1023 (Andersson et al. 1982) ● B212 A (pro lac) ara gyrA rpoB rpsD12 (Anderson et al. 1982) ● B213 A (pro lac) ara gyrA rpoB rpsD14 (Anderson et al. 1982) ● B214 F- A (pro lac) ara gyrA rpoB rpsD16 (Anderson et al. 1982) 48

Initial Site Directed Mutagenesis

The same forward primer was used in this study to change Glu-461 (GAA) codon to a Gln-461 (CAA). On the reverse primer random nucleotides will be introduced in the position of Gly-459 (GGG). For the Glu-461 codon the primers, forward 5'-

TCAATCAGGCCACGGCGCTAAT-3' and reverse 5'-

TTNNNCAGCGACCAGATGATCACAC-3' will convert Glu-461 to Gln-461 with unknown sequences downstream. The primers with the desired mutations are designed to anneal back to back on the plasmid.

459 461 GLY GLU 5'-tcaaTCAGGCCACGGCGCTAAT-3' | |||||||||||||||||||| gtgcagcgcgatcgtaatcacccgagtgtgatcatctggtcgctggggaatgaatcaggccacggcgctaatcacgacg |||||||||||||||||||| || 3'-CACACTAGTAGACCAGCGACnnntt-5'

Thermocycler reactions followed the protocol given with the mutagenesis kit (described above). The thermocycler conditions were (98˚C for 30 seconds, followed by 25 cycles of

(98˚C for 10 seconds, 65˚C for 30 seconds and 72˚C for 3.5 minutes) and a final extension at 72˚C for 2 minutes).

Second Site Directed Mutagenesis

After sequencing a mutant with high activity, even given the Glu-461 mutation, was selected for testing. This mutant contained the non-degenerate methionine codon in the Gly-459 position. To reduce the possibility of contamination from other mutants that may have been produced during the first thermocycler reaction a primer was designed specifically to introduce the ATG codon in the downstream position of the Glu-461 49 mutant. A new primer was designed with the sequence 5'-

TTCATCAGCGACCAGATGATCACAC-3' and a new mutant was produced as described above.

459 461 GLY GLU 5'-tcaaTCAGGCCACGGCGCTAAT-3' | |||||||||||||||||||| ggtgcagcgatcgtaatcacccgagtgtgatcatctggtcgctggggaatgaatcaggccacggcgctaatcacgacgc |||||||||||||||||||| ||| 3'-CACACTAGTAGACCAGCGACtactt-5'

The new mutant was assayed to determine if a similar level of increased activity could be detected.

Second Site Testing of Sequence

To test the effect in a site with a different downstream context a mutant was prepared where residual lacZ activity at Glu-415 was used as a reporter. For the Glu-415 codon the primers, forward 5'-TCAAACCCACGGCATGGTGCCA-3' and reverse 5'-

TTCATGGCTTCATCCACCACATACAGGC-3' will convert Glu-415 to Gln-415 with the methionine codon downstream. The primers with the desired mutations are designed to anneal back to back on the plasmid.

413 415 ASN GLU 5'tcaaACCCACGGCATGGTGCCA-3' | |||||||||||||||||||| accatccgctgttggtacacgctgtgcgaccgctacgcctgtatgtggtggatgaagccaatattgaaACCCACGGCATGGTGCC |||||||||||||||||||||||| | CGGACATACACCACCTACTTCGGTACTT

The new mutant at this new site was assayed to determine if a similar level of increased activity could be detected.

Efficiency of codon following the methionine 50

The efficiency of translation following the methionine was examined. Efficiencies of codon translation was previously studied by Looman, where efficiencies were defined as the amount of a β-galactosidase fusion protein in percentage of total protein where other prokaryotic genes were fused to the 1acZ gene (Looman et al. 1985). From these previously determined efficiencies we noticed that the second codon after our methionine

(Asn AAU) was the second most efficiently translated codon in E. coli (efficiency 1.2) after Lys AAA (efficiency 1.5). Here we wanted to change the Asn codon to Asn AAC, a codon for the same amino acid but with an efficiency of only 0.6 to see if this had any effect on the misreading at Glu-461. To do this a new primer 5’-

CCAATCAGGCCACGGCGCTAAT-3’ was paired with the primer for methionine at

Gly-459 and a mutagenesis reaction followed by transformation was performed. The resulting mutants were assayed as before.

Sequence analysis

We utilized a Python program to analyze the genome of E. coli (E. coli K-12 substrain MG1655). We examined all open reading frames with common initiation and termination codons. Our program allowed us to extract the sequences following any codon triplet we designated using the initiation codon to set the reading frame. Our first question: When the initiator methionine enters the E-site is there a bias for specific codons directly following the initiator? Starting after the initiation codon we examined the use of codons in the second position. This allowed us to examine initiation context codon usage across the entire genome. We then asked: How does the usage of certain codons following the start codon compare with their usage in the rest of the gene? Are codons used frequently in the second position avoided elsewhere in the sequence? Using 51 the same program we collected data on overall codon usage excluding initiation and termination codons which we expect to have their own biases based on previous studies.

We then compared this usage with what we observed after the start codon. Finally we returned to the initiation context and examined codon usage in the second and third positions after start. Here we wanted to know if when the initiation codon enters the E- site is there a bias toward specific codons in the A and P sites. 52

RESULTS

Frameshifting

Sequencing

We hypothesize that if duplex stability varies among initiation codons, and it is functionally significant in frame maintenance, then a decrease in initiation duplex stability will be associated with increased frame slippage of P-site tRNA. To test this hypothesis we placed a slippery site immediately downstream of the initiation codon of a plasmid-borne lacZ gene (Figure 4). 53

Figure 4. The complex at the time of shift. We varied the initiation codon in the E-site with the expectation that as the quality of the codon changes that the P-site tRNA will go from UUU to UUC. Only when this frameshift occurs will the lacZ reporter be produced. 54

We wish to keep the context identical with the different constructs making changes only to the initiation codon. As a control for initiation efficiency, constructs with altered initiation codons that do not require frameshifting for lacZ expression were also generated. The plasmid pJC1168 contains two HindIII sites HindIII 328 is downstream of the HindIII 305 site where we wish to insert our test construct. To test that the oligonucleotides were inserted correctly the plasmids were sequenced by the Nevada

Genomics Center (University of Nevada, Reno). The results were compared to those of the initial sequence to ensure that the new sequences, including the new start codon, were placed correctly and maintained the correct reading frame. The sequencing results confirmed that we had produced a set of plasmids placing the slippery site downstream of the initiation codon for each start codon as well as a set with altered initiation codons that do not require frameshifting for lacZ expression but contain the same sequence minus the additional U nucleotide downstream of the second codon.

Secondary Structures

Our initial oligonucleotide inserts were designed to avoid sequences that produce secondary structures shown to inhibit initiation (eg. De Smit and van Duin 1990, Buchler et al. 1992). However, we also analyzed this sequence for any newly created secondary structures arising from our test constructs. The insert oligos along with various length flanking regions were analyzed using Integrated DNA Technologies OligoAnalyzer 3.1

(http://www.idtdna.com/calc/analyzer) a freely available tool that can be used to identify possible hairpin structures that can form in DNA sequences.

The frameshift CUG sequence produced a unique set of hairpin structures that required binding of only two nucleotides for each loop. We hypothesized that these may 55 be problematic as they could form so easily. We also tested all of the constructs by cloning them into pJC1168. As expected the activities were lower in the test constructs but in the frameshift CUG mutants, the activity was lower than one Miller unit. It was determined that by changing an aspartic acid codon two codons after the frameshift site the probability of forming the deleterious hairpin structures was removed. Since this codon would only be translated after the frameshift and would not affect our frameshift site we changed it for all test constructs from the sequence NUG UUU CGG U AAC

GAU ACU to NUG UUU CGG U AAC GAC ACU for frameshift codons and the same minus the extra nucleotide for the non-frameshift constructs. We tested these and found that they all produced the same activity levels as before except the frameshift CTG that now produced average activity of >1 Miller unit.

Enzymatic Assays

Constructs that contain the additional uracil in the frameshift prone sequence

(UUU cnn) will require frameshifting to produce a functional β-galactosidase enzyme. To test the effect of frameshifting we performed beta-galactosidase assays of each of the frameshift and non-frameshift constructs. If the quality of the codon in the E-site affects frameshifting then we would expect to see that the difference between the frameshift and non-frameshift constructs would vary for each start codon. A control set was similarly assayed using the PJC1168 plasmid with no changes made to the start codon and without the slippery sequence. These produced an average beta-galactosidase activity of ~28,000

Miller units. The AUG construct with the slippery site inserted produced an average activity level of ~29,000 Miller Units similar to the original plasmid. We did see decreased activity occur with changes to the start codon with averages of 700, 1000 and 56

4500 units for codons GUG, CUG, and UUG respectively. As expected, we also saw a further decrease in constructs requiring frameshifting to produce a functional β- galactosidase enzyme (Table 1). 57

Table 1. Frameshifting frequencies of the frameshift prone (UUU cnn) sequence

58

However, for each possible start codon examined the frameshift construct reduced activity by -3 orders of magnitude compared to the corresponding control. This observation suggests that while frameshifting could be induced at the slippery site the quality of the start codon contained in the E-site did not have an effect on the rate of frameshifting. Total protein was also extracted and confirmed that the quantity of β- galactosidase was the same for each start codon frameshift and non-frameshift construct.

Misreading

Sequencing

To determine that our constructs had been created correctly the plasmids were sequenced by the Nevada Genomics Center (University of Nevada, Reno). The results were compared to those of the initial sequence to ensure that the new sequences, including both mutant sites, were introduced in the correct codons as well as the correct near-cognate codons introduced where appropriate.

Protein Analysis

An SDS PAGE analysis of protein was performed and the amount of ß-

Galactosidase in our different mutants was compared to check for expression level differences between samples that showed differences in misreading dependent on E-site codon. While protein levels remain similar for the 461 GAA to CAA mutation, mutations at 460 apparently reduced protein levels and not just activity (Figure 5). 59

Figure 5. Protein levels when the 461 GAA codon is changed to CAA and the 460 AAT codon is changed to GAT. The amount of protein when GAA is changed to CAA remains similar. However, when 460 AAT is changed to GAT the level of ß-Galactosidase protein is very low.

60

When examining the results of these experiments it is important to remember that we are looking for two things. Past experiments have examined the effect of changing codons on misreading. They wanted to determine if the rates of misreading were the same for all codons and at all locations. The idea was to change a specific codon to a near cognate and then measure activity. Here we have done the same thing we changed a specific codon and we are looking to see if the activity relative to wild type is increased indicating that misreading at that codon does occur. Our experiment, however, has an extra dimension that those do not. We also have changed the codon in the E-site to contain a third position wobble pairing. We want to see if changing the amino acid in the wobble position has an effect on the amount of misreading in the A-site. If it does then we would expect to see that the amount of misreading is different with the different codons and that this is consistent in the different strains being tested.

Misreading of a Glutamine Codon at Position 461 as Glutamic acid

For the first construct I changed the glutamic acid codon (GAA) at position 461 to a glutamine (CAA). I also changed upstream E-site glycine codon (GGG) at position 459 to two synonyms (GGC and GGA). These mutants have only about 10-4th the activity of the wild type construct. Importantly, the mutants have activities that vary by about five- or 10-fold depending on the E. coli host (for strains MY600 and BL209 in Figure 6).

These observations strongly suggest that the E-site codon can indeed influence misreading at the A-site. Interestingly, the WT glycine codon, GGG allows for the least misreading, which is consistent with the model that codon usage optimizes accuracy. 61

Figure 6. Graph showing estimates of changes in ß-Galactosidase activity in all mutant strains where codon 461 Glu was mutated to CAA Gln and codon 459 remained GGG

(blue) or was changed to GGA (red) or GGC (green). Strain MY600 was used as a control for MY599 since they contain the same genotype except for the addition of an rpsL mutation in MY599. B209 was the control for B210-B214 as it has the same genotype but lacks any ram mutation. The strain B210 contains the rpsL mutation, B211 the rpsE mutation and B212 -B214 all have rpsD mutation. Standard error bars are shown 62

I also assayed the enzyme activities from these constructs in the ram and rpsL ribosomal mutant backgrounds. To confirm that these results were related to misreading we test the same lacZ constructs in a number of E.coli strains that contain various ribosomal mutations known to affect misreading. The data (Figure 7) show that the ram mutations cause either no or modest increases in activity (compare the BL209 control to ram mutants BL211-214). And the rpsL mutations have no detectable effects on misreading (compare the MY600 control to MY599, and the BL209 control to BL210).

Notably, the order of differences among E-site alleles is preserved in all ribosomal backgrounds. Altogether, these results are not dramatic, they are consistent with the hypothesis that the E-site codon can affect misreading.

Misreading of a Histidine Codon at Position 503 for Tyrosine

Because any single site can give spurious results, I chose a second site, an essential tyrosine at codon 503 (Tyr UAC), which was changed to a CAC histidine codon. As expected, this mutation decreased enzyme activity to ~0.1 percent of wild type activity. I then made synonymous changes at the E-site position proline (Pro 501, CCG) to the two possible near cognates CCC and CCA. I also examined the effects in strains in

E. coli with ribosomal accuracy mutations. The data (Figure 7) show that the synonyms have similar activities that are not systematically increased by ram (compared control

BL209 to ram BL211-BL214) or decreased (compare MY600 control to rpsL MY599, and BL209 control to rpsL BL210). So at this site, these constructs do not provide clear evidence for E-site codon effects on misreading at this A-site codon. 63

Figure 7. Graph showing estimates of changes in ß-Galactosidase activity in all mutant strains where codon 503 Tyr was mutated to CAC His and codon 501 remained

CCG(blue) or was changed to CCA(red) or CCC(green). 64

Misreading of a Glutamic Acid Codon at Position 460 as Asparagine

Intriguingly, it was previously shown that the glutamic acid codons (GAU and

GAC) could be misread as asparagine codons (AAU and AAC). I followed that lead by mutating the essential asparagine codon at position 460 (AAU) to aspartic acid (GAU), and also varying the E-site leucine CUG codon to its CUA and CUC synonyms. As expected for constructs having a mutation in the essential asparagine codon, these mutants have activities that are less than 0.1% of wild type. However, the E-site mutants differ over very wide ranges (~6-fold in BL209 and >20-fold in MY600). The CUC E- site mutant has particularly high activities that are robust to ribosomal mutational background. As above, the ram and rpsL ribosomes have small effects that are not readily explicable by their expected effects on misreading (Figure 8).

The results in this section strongly suggest that the E-site codon can affect misreading in the A-site, but these effects vary among sites. Misreading of Glu for Gln at

CAA 461, and of Asn for Glu GAU460 show strong E-site effects that respond to some of the ram mutations. But they respond differently to ram mutations, and do not respond strongly to rpsL mutations. Also, protein level of ß-Galactosidase seems to be affected with mutations at this position. The evidence for E-site effects on Tyr for His misreading at position 503 is very weak. 65

Figure 8. Graph showing estimates of changes in ß-Galactosidase activity in all mutant strains where codon 460 His was changed to a GAT aspartic acid codon. Codon 458 Leu

CTG either remained the same (blue) or was mutated to one of the near cognates CTA

(red) or CTC (green). 66

Protein Analysis

Analysis of protein by SDS PAGE showed that at our original test site protein levels at the 110 kDa band were similar to those of the wild type plasmid. However, as previously observed, at the 461, test site, with the GAT mutation, protein levels were much lower. The same change from GAA to CAA also had no effect on protein levels at codon 415. In both of these additional sites protein levels were much higher when the

ATG-AAT-CAA was inserted even though it remains similar when this combination is in position 461 (Figure 9). 67

Figure 9. Protein levels when the 461 GAA codon is changed to CAA and the 415 GAA codon is changed to CAA. The amount of protein when GAA is changed to CAA remains similar in both of these mutants. 68

I conclude that this approach for studying E-site effects on misreading has provided tantalizing information, but its requirement for two sequence features -- an essential A-site codon and an E-site codon that has many synonyms -- limits its ability to examine many A and E-site combinations.

Misreading With Other E-site Codons

We previously identified a glutamine codon that is necessary for LacZ activity

(Tsai and Curran, 1998). In our previous study this codon was mutated to a near-cognate so that residual LacZ activity can be used as a reporter of misreading to test the effects of

E-site tRNA on A-site misreading. Here we will use this same codon to look for context effects by changing the second upstream codon (that will be in the E-site during misreading) to a series of random codons so that we can look for effects on enzyme activity. The Glu-461 (GAA) codon will be mutated to a Gln-461 (CAA) codon which has previously been shown to reduce activity to only a few Miller units.

Our initial experiment produced a number of mutants with different levels of activity. However, one was significantly higher than any of the others, AUG. Given that methionine is a significant codon because it is the start codon and its activity was so much higher we selected it for further study. Because of the possibility of contamination with other mutants from using a random primer we repeated the experiment this time designing the mutagenesis to specifically insert an AUG codon in the E-site. The activity level was similar to the activity initially observed with the AUG mutant.

To test this change in another site we employed the Glutamic Acid codon (GAA) at 415. First, we changed the 415-GAA to a Glu-CAA. The activity level of our mutant 69 decreased but not as significantly as what we observed with Glu-461, but the activity was reduced to 337.63 units. Introducing the ATG-AAT sequence directly before it increased the activity to 1121.13 units. This increase was similar to the increase seen at Glu-461

(Figure 10.)

70

Table 2. List of activities recorded after our initial experiment introducing random codons into the E-site with a Glu-461 mutated to Gln-461

71

Efficiency of codon following the methionine

Previous studies have suggested that transcription levels are affected by the efficiency of the codon following the start codon. Here, we wanted to see if the misreading we observed could be affected by changing the codon after the AUG. The wild type codon at position 460 is an asparagine AAU. This asparagine codon has an efficiency of 1.2 based on the measurements of Looman et al. (1987). They also determined that the asparagine codon AAC was translated with an efficiency of only 0.6.

When we introduced the AAC codon into the P-site with AUG in the E-site misreading was reduced to approximately half what was seen with AAU. Both AAC and AAU code for the same amino acid and are translated by the same tRNA. 72

Figure 10. A number of enzymatic assays were performed with the mutations at position

415 and 561. Here we show those results averaged together and expressed as a percent of wild type activity. 73

Sequence Analysis Nucleotide and Encoded Amino Acid Biases at Initiation Codons

E. coli genes have clear and well-characterized biases in codon usage. As discussed in the Introduction, usage is generally correlated with tRNA concentration, especially in highly expressed genes, suggesting that it promotes expression. However, codon usage bias at initiation sites is especially interesting because codon biases at the second and third codons differ from those of internal codons. There could be many causes of bias at initiation, some of which might be due to effects on translation including minimizing message structures that could block initiation. Moreover, the amino acids near the amino terminus are more likely to be involved with protein processing, while internal amino acids may be generally important for protein folding and function. Many pioneering works that addressed such issues were performed on specific subsets of E. coli that may not be fully representative of the genome. I therefore explored amino acid, codon and nucleotide bias involved with initiation for the entire E. coli genome.

Parsing the Escherichia coli K-12 genome

I used the E. coli K-12 substrain MG1655 complete genome sequence from

Genbank (accession U00096), which was last updated on September 24, 2018. This genome has served as the standard E. coli reference, and has been refined extensively over the past two decades (originally published by Blattner et al. 1997, further references are available in the Genbank sequence file). The number of annotated genes in the file is

4546. I selected only those that have open reading frames (ORFs) that start with known initiation codons (ATG, GTG, CTG, TTG, ATT, ATA, AAA) and standard termination codons (TAA, TAG, TGA). These criteria removed genes for functional RNAs such as 74 tRNAs, as well as genes with incompletely characterized or mis-annotated ORFs. It also eliminated at least one gene that contains a programmed frameshift (prfB -- the “RF2” gene). Its removal, together with inspection of a handful of randomly selected ORFs, confirm the efficacy of the selection methodology. The method selected 4175 ORFs, which were used for the analysis.

Amino acid bias at the amino-terminus of encoded polypeptide

Previous work on a subset of genes (Looman et al. 1987) or of a protein database

(Shemesh et al. 2010, who used the Swiss-Prot database) had identified amino acid biases at the N-terminus of E. coli proteins. I extended that work with an examination of the

ORFS from the entire E. coli genome. I compared the relative frequencies of all amino acids at the second and third positions to their frequencies internally (that is from the fourth codon through the second to last codon; the last sense codon was not used to avoid potential interference from known bias at termination signals).

Figure 3 shows that most of the hydrophobic and both acidic amino acids are underrepresented as the second amino acid, relative to their usage internally. And instead, serine, lysine, threonine and asparagine are overrepresented. It seems likely that these biases are at least partially due to effects on protein processing. E. coli has multiple mechanisms for protein secretion that utilize signal peptides; however, none of them are known to use the second amino acid. On the other hand, the amino terminal formyl- methionine (f-Met) is removed from the majority of proteins by the action of two enzymes, peptide deformylase and methionyl aminopeptidase (MAP) (Hirel et al. 1989).

It is not known why f-Met removal occurs, but removal is very sensitive to the second amino acid. But second amino acid frequency bias does not correlate in a simple way 75 with the known effects on f-Met removal (Looman et al. 1987). For example, removal is associated with small, hydrophobic and moderately polar amino acids, but some are overrepresented (Ser, Asn) and not others (e.g. Gly, Val). Proteins with second position bulky and basic amino acids are poor substrates for f-met removal, but among these amino acids only lysine is overrepresented in proteins (Figure 11). 76

Figure 11. Overrepresentation of serine, lysine, threonine and asparagine in the second position following the initiation codon. Also other underrepresented amino acids like glycine and valine are apparent at the second position. 77

In conclusion, my results cannot exclude roles for second amino acid bias in protein processing, but the rules would have to be complex. For example, it is possible that proteins that have their f-Met removed tend to have serine or threonine rather than glycine or valine as their second amino acids, even though all of them would serve as good substrates. In addition, one would also have to conclude that proteins that do not have f-Met removed use lysine or asparagine rather than arginine or leucine, even though all are very poor substrates. Importantly, however, the lack of a clear correlation to activity strongly suggests that bias is unlikely to be solely driven by effects on protein processing.

Amino acid bias of amino-terminus compared to internal codons

I then asked whether codon usage might also differ between the second and internal codons. As example, I will present data for the three amino acids that are each encoded by six codons. Of them, serine use is sharply increased as the second amino acid relative to its internal use; leucine shows the opposite order of preference; and arginine is used about equally as the second and as an internal amino acid. The codon sets specific for each of these amino acids are known to exhibit strong biases within ORFs. Here, I compare their biases at the second codon to those at internal positions (Figure 12). 78

Figure 12. Panel A. shows the poor correlation between usage of codons internally with use of codons in the second position following the initiation codon. Panel B shows the high use of the AAA codon in the second position. 79

In Figure 13 is plotted the relative frequencies of the six serine codons at the second and at internal positions. There are clear and highly significant differences in serine codon usage depending on location. These differences are very highly significant; the Chi square value is literally “off the chart” (932.69 far exceeds 37.7, the significance level [p] of 0.001 for 5 degrees of freedom). All serine codons are as common or far more so at the second codon than internally, as might be expected given that serine is more abundant as the second amino acid than internally. There are also strong biases among serine codons that differ among the datasets. The codon is heavily biased towards AGC and AGT, which are read by the same tRNA and among the other four codons, the ones that have two As and Ts are more common than those that have only one such nucleotide.

Apparently, the second codon is more sensitive to codon and/or tRNA identity than are internal codons. These differences suggest that at least some of the second codon bias has a translational cause rather than simply a bias for the amino acid serine. 80

Figure 13. Preference for specific serine codons following the initiation codon. AGT and

AGC read by an abundant tRNA, are most common. There is also a preference for TCT and TCA codons over TCC and TCG which is in line with the preference for A- and T- rich codons in the second position. 81

The preferred codons are AGT and AGC, which are read by the same abundant tRNA. After them, TCT and TCA are favored over TCC and TCG. The latter pattern is consistent with a preference for A- and T-rich codons at the second position.

Leucine is avoided as the second amino acid, relative to its use internally, with almost every codon being more common at internal positions. The greatest difference is observed for CTG, which is the most abundant codon in the E. coli genome. These differences between second and internal codon usage are highly significant (Chi square =

141.12) (Figure 14). 82

Figure 14. Leucine codons are much less common in the second position even though they are the most common codon used internally, where CTG is the most common. These differences between second and internal codon usage are highly significant (Chi square =

141.12) 83

Arginine is the third of three amino acids that is specified by six codons. The amino acid is used with about equal frequency as both the second and internal amino acid

(~6%, see Figure 10). These differences between second and internal codon usage are highly significant (Chi square = 165.05). But it, too, exhibits differential codon bias between the second and internal positions. As for serine and leucine, arginine specification shows a preference for A- and T-rich codons (Figures 13, 14, 15). 84

Figure 15. Arginine codon bias between second and internal positions like serine and leucine at the second position there is preference for A- and T-rich codons. Difference between second and internal significance (Chi square = 165.05). 85

In conclusion, these data show that codon bias differs between the second and internal codons. And in these cases, the second codon trends towards A- and T-richness.

The same biases are broadly observed for the synonyms of other amino acids (not shown). This pattern cannot be explained solely by selection for amino acid sequence, and is likely caused because it affects one or more translational phenomena.

Codon and nucleotide bias at the second and third codon positions

Above, it is clear that amino acid usage bias differs for the second and internal codons. Moreover, at least some of that bias likely has a translational cause. To determine whether codon usage bias at the second codon would broadly be different from internal codons, I plotted the numbers of all codons at the second position vs. their numbers internally. Figure 12 shows the internal and second codon usage correlate poorly.

Notably, AAA is more than twice as common as any other at the second codon (Panel B).

Below, I will show that the codon AAA is also predominant at the third codon.

E. coli ORFs most commonly initiate with AUG, which pairs with the initiator

Met-tRNA with three Watson-Crick base pairs. But ~10% begin with a similar codon using wobble pairing at the first or third codon position (e.g., codons including GUG and

AUU, where the bolded nucleotide forms a wobble pair to the initiator tRNA). One hypothesis that could explain bias near the initiation site is that it might somehow compensate for weaker pairing at non-AUG codons. To address this possibility, Figure 16 shows plots of codon usage frequency at AUG vs. that at other codons. Panel 16A compares second codon usage between these groups of genes. However, the high correlation coefficient (r = 0.947) shows that the biases are virtually the same for both classes of initiator. Panel 16B shows that bias at the third codon of genes is also highly 86 correlated between AUG- and non-AUG-starting genes (r = 0.904). These very strong correlations show that AUG- and non-AUG-starting codons show virtually the same biases. They also show that bias at these codon positions does not somehow compensate for some deficiency associated with non-AUG initiation codons. 87

Figure 16. Panel A. Second position codon usage frequency when the initiation codon is

AUG is virtually the same as when initiation starts with another codon (r = 0.947). Panel

B. Third position codons are also the same for AUG- and non-AUG-starting genes (r =

0.904). 88

Interestingly, AAA is the most common second and third codon of E.coli genes.

Moreover, close inspection shows that initiation sites are generally read with weaker base pairing such as A:U, U:A and wobble pairs (not shown). To explore that further, I also examined nucleotide frequencies at the first, second and third nucleotide positions of codons (Figure 17). The figure is a bit complicated, but from left-to-right are three groups of bars. The left-most group is for the first codon position, and in it are plotted the frequencies of each of the four nucleotides (1A, 1T, etc.) for internal (blue), second (red) and third (green) codons. The central and right groups are for the middle and third codon nucleotides, respectively. The internal nucleotide frequencies exhibit the widely recognized “G-nonG-N” pattern, which is related to amino acid usage in proteins

(Trifonov 1987, Curran and Gross 1994). The second and third codons have biases that differ from this familiar internal pattern in several ways: they are stronger, the second and third codon biases are similar to each other, and they have an “A-A-A/T” pattern. 89

Figure 17. Frequencies at the first, second and third nucleotide positions of codons. From left-to-right are three groups of bars. The left-most group is for the first codon position, and in it are plotted the frequencies of each of the four nucleotides (1A, 1T, etc.) for internal (blue), second (red) and third (green) codons. The central and right groups are for the middle and third codon nucleotides, respectively. The internal nucleotide frequencies exhibit the widely recognized “G-nonG-N” pattern, which is related to amino acid usage in proteins (Trifonov 1987, Curran and Gross 1994). The second and third codons have biases that differ from this familiar internal pattern in several ways: they are stronger, the second and third codon biases are similar to each other, and they have an “A-A-A/T” pattern. 90

The selective mechanism that drives this A-richness is not clear. It had previously been suggested, using much smaller datasets, that A-richness near initiation sites might minimize secondary structure, which could interfere with initiation, which is certainly possible. However, weak pairing could have costs because such pairing, especially in runs, is known to be frameshift-prone at internal sites, including some programmed frameshift sites. I thus hypothesized that initiation might generally be frameshift prone. In addition, I hypothesized that the primary use of the AUG for initiation may help compensate for any tendency to frameshift for two reasons. It is not a ‘slippery” run that could allow for ambiguous framing; and it allows for three Watson:Crick base pairs to the initiator f-Met-tRNA, which would be less likely to leave the E-site prior to selection of an in-frame aa-tRNA at the A-site. 91

DISCUSSION

The E-site and Ribosomal Frameshifting

Several studies have provided evidence that codon-anticodon interaction in the ribosomal E-site helps maintain reading frame (Marquez et al. 2004, Trimble et al. 2004,

Jenner et al. 2007, Sanders and Curran 2007, Devaraj et al. 2009). Previous work in our lab has provided some in vivo evidence but the observations were made in a programmed frameshift site. Here our goal is to test the sequence features that have already been observed to contribute to RF2 frameshifting and determine if they could be involved in frameshifting errors that occur during normal translation. We incorporated a previously identified 'shifty sequence' (UUU cnn) that promotes +1 frameshifting directly following the start codon (Fu and Parker 1994). Because the initiation codon allows wobble pairing of the first nucleotide this allowed us to test the effect of codon quality in the E-site on frameshifting at the initiation codon. Because all codons are read by the same tRNA we hypothesized that the non-Watson-Crick binding of codons other than ATG would destabilize the E-site codon and contribute to frameshifting.

The results of our experiment, however, showed that the rate of frameshifting did not change with the quality of binding at the initiation codon. The RF2 programmed frameshifting mechanism has four features that contribute to high-frequency frameshifting. First, the frameshifting occurs during the slow translation at a UGA codon

(Craigen and Caskey 1986, Adamski et al. 1993). The slow translation may give the ribosome time to shift to the new reading frame and an inverse relationship has been shown between frameshifting and the rate of tRNA selection at slowly translated codons

(Curran and Yarus 1989, Curran and Yarus 1988). Our slippery sequence contains the 92

CGG phenylalanine codon which is translated by a rare tRNA that may introduce the ribosomal pause necessary for frameshifting. Second, a run of purines upstream of the slippery site base pairs with the anti-Shine-Dalgarno sequence of the 16s rRNA and disruption of the binding base pairs with single nucleotide substitutions has been shown to decrease frameshifting (Weiss et al. 1988). Third, the codon-anticodon in the pre-shift site contains a G:U wobble base pair, the weak pairing at this site may contribute to realigning at the slippage site (Curran 1993). Finally, the realigned codon-anticodon complex should contain only stable base pairs (Curran 1993, Weiss et al. 1988). The

'shifty sequence' we used has been shown to cause a +1 rightward frameshift in a site near the 5' end of E. coli argI. This slippage site contains only the third and fourth feature that

I have described here (Fu and Parker 1994). We have previously observed that the frameshifting tendency of this slippery site is somewhat applicable to non-RF2 frameshift sites suggesting that the slippery sequence is capable of inducing frameshift in the absence of the other frameshift enhancing elements.

So are there differences at the initiation site that may have contributed to the lack of frameshifting? The slippery site that we have incorporated here contains an arginine codon (CGG) that is read by a rare tRNA. The pause prior to the aa-tRNA selection at this second codon may be involved in increased frameshifting. Our previous work has suggested that while this pause may be important in RF2 programmed frameshifting that it may not be as important in other contexts as the percent of frameshifting was unchanged by changing the second codon to CGU a non-rare codon (Schwartz and

Curran 1997). However, tRNA selection at the CGG codon is 17 times slower than at the

CGU codon suggesting that slow selection may play a role in frameshifting (Curran and 93

Yarus 1989). If this pause is important than there may be another issue at the initiation codon. The “ribosomal ramp” hypothesis suggests that the first 30 to 50 codons of a protein are ones that translate with relatively low efficiency and that translational efficiency increases among later codons. However, one outlier of the ramp hypothesis is that the codon following the methionine initiation codon is translated with much higher efficiency especially in E. coli (Tuller et al. 2010). In fact, it is translated with much higher efficiency than the neighboring codons including non-rare codons. This suggests that there may be other factors at the initiation codon that may reduce the effects of our shifty sequence. This increase in efficiency may be due to quick release and recycling of the initiation methionine and may be codon-independent. The final feature that was absent in this study was the interaction with the Shine-Dalgarno sequence. The slippery sequence was previously assayed with and without any Shine-Dalgarno-like element and it was found that in the lack of such an element frameshift frequencies were still ~2 orders of magnitude higher than normal frameshift frequency (Schwartz and Curran

1997). Therefore, while the lack of this feature reduces the rate of frameshifting in this context there is still a measurable amount of frameshifting that seems to be unrelated to the quality of the codon-anticodon interaction in the E-site. However, the features that affect translation initiation interact in a complex manner where none of the requirements are absolute so much as they are preferences that together contribute to the efficiency of initiation. It has been shown that when the untranslated leader is removed that an AUG initiation codon is essential for translation (Van Etten and Janssen 1998). This suggests that start codons where non-Watson-Crick base pairing is necessary may rely on other features for their stability. This study is unique in that it is the first attempt to analyze this 94 type of frameshifting in this context. It may be necessary to determine what other factors are involved during translation initiation that could make frameshifting at this site different than during normal elongation. Future studies may need to determine the interaction of initiation factors that could be involved in reading frame maintenance by pairing these constructs with other mutations that affect different aspects of initiation.

The E-site and Misreading

Misreading at Mutations in “Essential” Codons

We hypothesized that the quality of the codon in the E-site has an effect on the tRNA selection in the A-site during translation. To test this I studied three codons that had been identified as coding for amino acids that are essential for ß-Galactosidase activity. I changed these essential codons to near-cognates, which as expected reduced ß-

Galactosidase activities to very low levels. Residual activities are presumed to be due to misreading to restore the essential amino acids in ß-Galactosidase. Then to detect the effects of the E-site codon on these residual activities, I changed the codon two positions upstream from the essential codon to synonyms. Because these changes do not affect amino acid sequence at the E-site position, changes in the residual activities were predicted to be due to effects of the E-site codon (and/or its tRNA) on misreading. At two of the three sites activities did change when the downstream codon was changed to a synonym.

The Effects of Ribosomal Accuracy Mutations on Misreading

To further clarify the role of misreading in these variations in enzyme activity, I assayed these constructs in a series of strains that contain ribosomal mutations that have 95 been shown to either decrease or increase misreading. Certain mutations in rpsL have been shown to increase accuracy, and such mutations are called “restrictive” mutations because they appear to restrict misreading. Certain other mutations in rpsD and rpsE are named ram for “ribosomal ambiguity” because they appear to increase misreading.

Clearly interpretable results would have been that rpsL always decreased and ram always increased activities, which would indicate that activities were due to misreading.

Otherwise, those activities were never changed by these mutations, indicating that activities were not due to misreading. My results, however, were mixed, with some ribosomal mutations affecting activities for some but not all constructs.

It is important to understand how each of these mutations affects the ribosome, which is outlined in table 2. 96

Table 3. A list of the specific mutations that were carried by strains used in these experiments including where the specific strains have been observed to affect translation.

97

The structural basis of the mechanism by which the ribosome incorporates the correct substrate faster than the incorrect one may involve 30S domain closure. This is a global conformational change of the ribosomal 30S subunit when the formation of

Watson–Crick base pairs occur at the first two codons of the codon-anticodon complex.

Restrictive rpsL mutations occur in ribosomal protein S12, which interacts with helices of the 16s rRNA after 30S subunit closure (Ogle et al. 2002). It is possible that the closed form of the 30S subunit is weakened by these mutations leading to a hyper-accurate ribosome (Ogle et al. 2003). The rpsD and rpsE genes encode mutations ribosomal proteins S4 and S5, respectively. Interactions between these proteins are thought to be broken during domain closure (Carter et al. 2000). The interruptions to this interface may lower the number of bonds that must break allowing closure of the 30S subunit more likely in the event of near cognates (Ogle et al. 2002, Ogle et al. 2003). This may lead one to expect that all rpsL mutants will lead to decreases in the rate of misreading, while all rpsD and rpsE mutants may lead to increases in misreading. This however is not the case, previous studies have shown that the rate of misreading is only increased or decreased in the case of codons that are known to be especially error prone. And only rpsE mutations have been shown to consistently increase most near cognate misreading errors (Kramer and Farabaugh 2007).

Thus, variations in misreading are expected to produce mixed results with these ribosomal mutations; and the only signal that could suggest a lack of misreading would be if no ribosomal mutant had any effect at a given site. But every site responded to the ribosomal mutations, albeit idiosyncratically. The most reliable ram mutation, the rpsE 98 mutation (in strain B211), always increased the activity of the most error prone construct within each group. Therefore, the simplest interpretation of my results is that activities are in fact due to misreading that it is not always affected by specific ribosomal accuracy mutations.

Effects of the E-site codon on misreading at three codons

To detect the effects of the E-site codon on these misreading-dependent activities,

I changed the codon two positions upstream from the essential codon to synonyms.

Because these changes do not affect amino acid sequence at the E-site position, changes in the residual activities were predicted to be due to effects of the E-site codon (and/or its tRNA) on misreading. Two of the locations had synonym-dependent differences in misreading that varied over ranges of about 6- to 20-fold (Glu for Gln at codon CAA 461, and Asn for Glu at GAU 460). In contrast, the misreading of Tyr for His at CAC 503 did not significantly vary among synonyms. I conclude that the identity of the E-site can affect misreading by substantial factors, even when comparing synonyms in the E-site.

But, and not surprisingly perhaps, not every possible E and A site combination has a different level of misreading.

Unfortunately, the misreading events required to suppress the essential site mutations are all so different that it is not possible to discern rules for E-site effects on misreading. For example, the three mutant A-site codons require different base pairs for misreading to occur. Although all three sites require a first-position mispair, for site 461 it is C:C (codon:anticodon); for site 460 it is G:U for 460; and for site 503 it is C:A. The three sites also differ in the identities of their E-site synonyms. These variations among 99 sites is a direct consequence of my experimental requirement to maintain amino acid sequence at the E-site.

Further work with these limitations on the experimental site will require the use of a different and broader set of reporter genes. And it would be necessary to find sites for essential amino acids that offer a wide variety of A and E-site combinations. Such work is beyond the scope of the currently available experimental model systems.

Codon Bias Near Initiation Sites

Efficiency of translation is partially determined by the translation initiation region

(TIR), a region of the mRNA, between positions -20 and +15 that is recognized by the small ribosomal subunit (Stormo et al. 1982). Evidence suggests that sequence motifs in this region are directly involved in recruitment of ribosomes as well as a number of initiation factors. While this is not fully understood features have been identified that are important for this recruitment. We previously discussed such features in the Shine-

Dalgarno region and its spacing from the start codon. The identity of the start codon and the base preceding it are also important (Schneider et al. 1986, Esposito et al. 2003). It has also been observed that selection pressure maintains a bias for certain codons following the start codon and these codons have up to a 20-fold effect on gene expression

(Stenstrom et al. 2001, Ohno et al. 2001). The most common second codon is AAA and, in general, single-stranded regions of 16S rRNAs are high in A content. This led to the idea that these regions being low in secondary structure was favorable for initiation

(Stenstrom and Isaksson 2002). However, there is conflicting evidence as to second codon content and initiation efficiency. Codons beginning with G also are correlated with high expression levels in E. coli while NGG codons reduce gene expression (Gutierrez et 100 al. 1996, Gonzalez de Valdivia and Isaksson 2004). Here, I extend those studies by examining the entire E. coli K-12 (substrain MG1655) genome to compare codon usage immediately after initiation to that internally to coding sequences. These efforts address several incompletely understood phenomena associated with translational initiation.

Secondary structure minimization

Another interesting feature of initiation sites is the strong preference for A-rich codons at the second and third codon positions. Stormo and colleagues, and others, using smaller datasets have suggested that this feature may result from selection for relatively low probability of secondary structure near initiation sites which could interfere with initiation (Stromo 1986). My data are broadly consistent with this hypothesis; notably, A is the most common nucleotide at each of the codon positions for the second and third codons (i.e., at nucleotides 4-9). However, initiation sites contain bias that is not simply correlated to this propensity, for some of the bias near initiation sites cannot be explained solely by selection for A-richness. For example, the next-most abundant nucleotide in the middle position of the second codon position is C, and A and T are used about equally at the first position of the third codon. Although I am unable to explain these sequence features, they suggest the possibility of the second and third codon identity may be important for an as yet unknown phenomenon(a).

Amino Terminal Processing

Others have probed the importance of the second amino acid in processing of the amino terminus (Sherman et al. 1985, Ben-Bassat et al. 1987). The formal group of f-Met is generally removed; in many proteins the Met is also removed. Earlier sequence 101 analyses as well as in vitro assays show that Met removal takes place efficiently when the second amino acid is small, hydrophobic and moderately polar (Frottin et al. 2006). If amino terminal processing was the only important feature then you would expect the amino acids with them to be favored more or less equally. However, we found that certain amino acids like Ser and Asn are much more highly favored at the second and third position than they are internally. Moreover, amino acids with similar chemical features such as Gly and Val show the opposite bias. Additionally, I find that the individual codons for Ser show strong biases at the second position that differ from their usages internally, and that do not consistently increase A-richness. For example, both

AGT and AGC are the most abundant serine codons at the second position, and they are used about equally. This (and other) pattern cannot result simply to minimize the propensity to form secondary structures. Altogether, these observations strongly suggest that protein processing and secondary structure minimization cannot be the only explanations for second codon bias.

CONCLUSIONS

Here we investigated the two roles that have been suggested for the ribosomal E- site. First, we examined its role in frame maintenance. We tested to see if the wobble position of the initiation codon was important in preventing frameshift. Our results were that it alone is not sufficient to have an impact on frameshift rates. During initiation the ribosome is part of a complex set of interacting elements. It is possible that while the E- site does not play a role at this point it still may be important in frame maintenance during elongation. Future studies may be able to look at frame maintenance in a non- initiation context. 102

In our next study we detected the effects of the ribosomal E-site on misreading in the A-site. In that study we maintained amino acid sequence at the E-site. In this and previous studies we identified that mutations at codon 461 could be used to detect misreading. We also tried two other sites not previously tested as misreading reporters and found mixed results that suggesting that differences in E-site synonyms, codon contexts or functional role of a particular amino acid in the final protein may mean only certain changes in a protein can be used to detect misreading. One future direction for our studies may be to test a number of essential codons and try to determine what others could be used to confirm misreading results. Our method uses an in vivo approach to examining misreading effects. That means that our results are ultimately dependent on producing an entire enzyme and testing it, in vitro studies have also been used and may be helpful in examining some contexts that cannot be studied in vivo.

In our final study we used the codon which has functioned in the past to test for misreading, relaxing the requirement to maintain amino acid sequence at the E-site. Here we designed a way to test the effect of specific E-site codons for effects on misreading.

This was somewhat speculative in that we simply introduced random codons into the E- site of our reporter. However, we were successful in detecting misreading when AUG tRNA are in the E-site. We also found that in the P-site a silent mutation was sufficient to alter the amount of misreading at the A-site. Because previous studies have suggested that transcription levels of a gene are affected by the efficiency of translation of the codon following the start codon this led us to wonder if methionine codons had specific effects on the ribosome while in the E-site but not only at initiation. We examined codon usage across the E. coli genome and found a number of differences in codon usage 103 following the start codon vs their use internally. For example hydrophobic and acidic amino acids are underrepresented and there is a bias for A and T richness in codons following initiation. Taken together our results suggest that there is reason for future studies to examine the use of methionine codons internally as they may have effects on translation even after initiation. Furthermore understanding how an AUG in the E-site effects the ribosome may help us to unravel how it is such an important initiator.

Our overall goal is to collect evidence to support the two proposed roles of the E- site in the ribosome. While it turns out that frame maintenance at initiation may not be dependent on the wobble nucleotide of the start codon we found some interesting effects of the AUG codons in the ribosome. We found evidence that the stability of tRNA binding in the E-site does effect accuracy of decoding at the A-site. Further, we went on to find that there is at least one codon that causes greater misreading while it occupies the

E-site.

Our initial question focused on the E-site but is it possible that translation is not the same in all instances following a simple set of rules of how the code is to be translated. We decided that the available evidence would answer no. In our studies we found that context surrounding different codons can alter how individual codons are read.

While further research will be required to unravel how these interactions work we can conclude that our answer to the original question is still a resounding NO! 104

REFERENCES

Adamski, F.M., Donly, B.C. and Tate, W.P., 1993. Competition between frameshifting, termination and suppression at the frameshift site in the Escherichia coli release factor-

2 mRNA. Nucleic acids research, 21(22), pp.5074-5078

Agrawal, R.K., Penczek, P., Grassucci, R.A. and Li, Y., 1996. Direct visualization of

A-, P-, and E-site transfer RNAs in the Escherichia coli ribosome. Science, 271(5251), p.1000.

Agris, P.F., Vendeix, F.A. and Graham, W.D., 2007. tRNA’s wobble decoding of the genome: 40 years of modification. Journal of molecular biology, 366(1), pp.1-13.

Arakawa, K. and Tomita, M., 2007. The GC skew index: a measure of genomic compositional asymmetry and the degree of replicational selection. Evolutionary

Bioinformatics, 3, p.117693430700300006.

Ayer, D. and Yarus, M., 1986. The context effect does not require a fourth base pair.

Science, 231(4736), pp.393-395.

Baranov, P.V., Gesteland, R.F. and Atkins, J.F., 2002. Recoding: translational bifurcations in gene expression. Gene, 286(2), pp.187-201. 105

Bella, J. and Humphries, M.J., 2005. Cα-H··· O= C hydrogen bonds contribute to the specificity of RGD cell-adhesion interactions. BMC structural biology, 5(1), p.1.

Bivona, L., Zou, Z., Stutzman, N. and Sun, P.D., 2010. Influence of the second amino acid on recombinant protein expression. Protein expression and purification, 74(2), pp.248-256.

Bjork, G.R., 1995. Biosynthesis and function of modified nucleosides in. tRNA-

Structure, Biosynthesis, and Function, pp.165-205.

Blaha, G. and Nierhaus, K.H., 2001, January. Features and functions of the ribosomal E site. In Cold Spring Harbor symposia on quantitative biology (Vol. 66, pp. 135-146).

Cold Spring Harbor Laboratory Press.

Blake, R.D. and Earley, S., 1986. Distribution and evolution of sequence characteristics in the E. coli genome. Journal of Biomolecular Structure and Dynamics, 4(2), pp.291-

307.

Blank, A., Gallant, J.A., Burgess, R.R. and Loeb, L.A., 1986. An RNA polymerase mutant with reduced accuracy of chain elongation. Biochemistry, 25(20), pp.5920-

5928. 106

Bocchetta, M., Xiong, L., Shah, S. and Mankin, A.S., 2001. Interactions between 23S rRNA and tRNA in the ribosomal E site. Rna, 7(01), pp.54-63.

Bogosian, G., Violand, B.N., Jung, P.E. and Kane, J.F., 1990. Effect of protein overexpression on mistranslation in Escherichia coli. The ribosome: structure, function, and evolution. American Society for Microbiology, Washington, DC, pp.546-

558.

Bos, J.L., Rehmann, H. and Wittinghofer, A., 2007. GEFs and GAPs: critical elements in the control of small G proteins. Cell, 129(5), pp.865-877.

Bossi, L. and Roth, J.R., 1980. The influence of codon context on genetic code translation. Nature, 286(5769), p.123.

Bossi, L., 1983. Context effects: translation of UAG codon by suppressor tRNA is affected by the sequence following UAG in the message. Journal of molecular biology,

164(1), pp.73-87.

Bouadloun, F., Donner, D. and Kurland, C.G., 1983. Codon-specific missense errors in vivo. The EMBO journal, 2(8), p.1351.

Brandis, G. and Hughes, D., 2016. The selective advantage of synonymous codon usage bias in Salmonella. PLoS genetics, 12(3), p.e1005926. 107

Brown, C.M., Stockwell, P.A., Trotman, C.N.A. and Tate, W.P., 1990. Sequence analysis suggests that tetra-nucleotides signal the termination of protein synthesis in eukaryotes. Nucleic Acids Research, 18(21), pp.6339-6345.

Bücheler, U.S., Werner, D. and Schirmer, R.H., 1992. Generating compatible translation initiation regions for heterologous gene expression in Escherichia coli by exhaustive periShine-Dalgarno mutagenesis. Human glutathione reductase cDNA as a model. Nucleic acids research, 20(12), pp.3127-3133.

Bulmer, M., 1987. Coevolution of codon usage and transfer RNA abundance.Nature,

325(6106), pp.728-730.

Burakovsky, D.E., Sergiev, P.V., Steblyanko, M.A., Kubarenko, A.V., Konevega, A.L.,

Bogdanov, A.A., Rodnina, M.V. and Dontsova, O.A., 2010. Mutations at the accommodation gate of the ribosome impair RF2-dependent translation termination.

RNA, 16(9), pp.1848-1853.

Bylund, G.O., Persson, B.C., Lundberg, L.A. and Wikström, P.M., 1997. A novel ribosome-associated protein is important for efficient translation in Escherichia coli.

Journal of bacteriology, 179(14), pp.4567-4574. 108

Carter, A.P., Clemons, W.M., Brodersen, D.E., Morgan-Warren, R.J., Wimberly, B.T. and Ramakrishnan, V., 2000. Functional insights from the structure of the 30S ribosomal subunit and its interactions with antibiotics. Nature, 407(6802), pp.340-348.

Chen, H., Bjerknes, M., Kumar, R. and Jay, E., 1994. Determination of the optimal aligned spacing between the Shine–Dalgarno sequence and the translation initiation codon of Escherichia coli m RNAs. Nucleic acids research, 22(23), pp.4953-4957.

Cobucci-Ponzano, B., Conte, F., Benelli, D., Londei, P., Flagiello, A., Monti, M.,

Pucci, P., Rossi, M. and Moracci, M., 2006. The gene of an archaeal α-L-fucosidase is expressed by translational frameshifting. Nucleic acids research, 34(15), pp.4258-

4268.

Cochella, L. and Green, R., 2005. Fidelity in protein synthesis. Current biology, 15(14), pp.R536-R540.

Craigen, W.J. and Caskey, C.T., 1986. Expression of peptide chain release factor 2 requires high-efficiency frameshift. Nature, 322(6076), pp.273-275.

Crick, F.H.C., 1966. Codon-anticodon pairing: the wobble hypothesis. 109

Cruz-Vera, L.R., Magos-Castro, M.A., Zamora-Romo, E. and Guarneros, G., 2004.

Ribosome stalling and peptidyl-tRNA drop-off during translational delay at AGA codons. Nucleic acids research, 32(15), pp.4462-4468.

Curran, J.F. and Gross, B.L., 1994. Evidence that GHN phase bias does not constitute a framing code. Journal of molecular biology, 235(1), pp.389-395.

Curran, J.F. and Yarus, M., 1986. Base substitutions in the tRNA anticodon arm do not degrade the accuracy of reading frame maintenance.Proceedings of the National

Academy of Sciences, 83(17), pp.6538-6542.

Curran, J.F. and Yarus, M., 1988. Use of tRNA suppressors to probe regulation of

Escherichia coli release factor 2. Journal of molecular biology,203(1), pp.75-83.

Curran, J.F. and Yarus, M., 1989. Rates of aminoacyl-tRNA selection at 29 sense codons in vivo. Journal of molecular biology, 209(1), pp.65-77.

Curran, J.F., 1993. Analysis of effects of tRNA: message stability on frameshift frequency at the Escherichia coli RF2 programmed frameshift site. Nucleic Acids

Research, 21(8), pp.1837-1843. 110

Daviter, T., Wieden, H.J. and Rodnina, M.V., 2003. Essential role of histidine 84 in elongation factor Tu for the chemical step of GTP hydrolysis on the ribosome. Journal of molecular biology, 332(3), pp.689-699.

De Felipe, P., 2004. Skipping the co-expression problem: the new 2A" CHYSEL" technology. Genetic vaccines and therapy, 2(1), p.13.

de Smit, M.H. and Van Duin, J., 1990. Secondary structure of the ribosome binding site determines translational efficiency: a quantitative analysis.Proceedings of the National

Academy of Sciences, 87(19), pp.7668-7672.

de Smit, M.H. and van Duin, J., 1994. Translational initiation on structured messengers: another role for the Shine-Dalgarno interaction. Journal of molecular biology, 235(1), pp.173-184.

Devaraj, A., Shoji, S., Holbrook, E.D. and Fredrick, K., 2009. A role for the 30S subunit E site in maintenance of the translational reading frame. Rna,15(2), pp.255-

265.

Diaconu, M., Kothe, U., Schlünzen, F., Fischer, N., Harms, J.M., Tonevitsky, A.G.,

Stark, H., Rodnina, M.V. and Wahl, M.C., 2005. Structural basis for the function of the ribosomal L7/12 stalk in factor binding and GTPase activation. Cell, 121(7), pp.991-

1004. 111

Dinman, J.D., 2006. Programmed ribosomal frameshifting goes beyond viruses: organisms from all three kingdoms use frameshifting to regulate gene expression, perhaps signaling a paradigm shift. Microbe (Washington, DC), 1(11), p.521.

Dinos, G., Kalpaxis, D.L., Wilson, D.N. and Nierhaus, K.H., 2005. Deacylated tRNA is released from the E site upon A site occupation but before GTP is hydrolyzed by EF-

Tu. Nucleic acids research, 33(16), pp.5291-5296.

Doerfel, L.K., Wohlgemuth, I., Kothe, C., Peske, F., Urlaub, H. and Rodnina, M.V.,

2013. EF-P is essential for rapid synthesis of proteins containing consecutive proline residues. Science, 339(6115), pp.85-88.

Edelmann, P. and Gallant, J., 1977. Mistranslation in E. coli. Cell, 10(1), pp.131-137.

Effraim, P.R., Wang, J., Englander, M.T., Avins, J., Leyh, T.S., Gonzalez, R.L. and

Cornish, V.W., 2009. Natural amino acids do not require their native tRNAs for efficient selection by the ribosome. Nature chemical biology, 5(12), pp.947-953.

Farabaugh, P.J. and Björk, G.R., 1999. How translational accuracy influences reading frame maintenance. The EMBO Journal, 18(6), pp.1427-1434.

Farabaugh, P.J., 1996. Programmed translational frameshifting. Annual review of genetics, 30(1), pp.507-528. 112

Feinstein, S.I. and Altman, S., 1977. Coding properties of an ochre-suppressing derivative of Escherichia coli tRNAITyr. Journal of molecular biology, 112(3), pp.453-

470.

Feinstein, S.I. and Altman, S., 1978. Context effects on nonsense codon suppression in

Escherichia coli. Genetics, 88(2), pp.201-219.

Fischer, N., Konevega, A.L., Wintermeyer, W., Rodnina, M.V. and Stark, H., 2010.

Ribosome dynamics and tRNA movement by time-resolved electron cryomicroscopy.

Nature, 466(7304), pp.329-333.

Flower, A.M. and McHenry, C.S., 1990. The gamma subunit of DNA polymerase III holoenzyme of Escherichia coli is produced by ribosomal frameshifting. Proceedings of the National Academy of Sciences, 87(10), pp.3713-3717.

Fu, C. and Parker, J., 1994. A ribosomal frameshifting error during translation of the argl mRNA of Escherichia coli. Molecular and General Genetics MGG, 243(4), pp.434-441.

Ganoza, M.C., Buckingham, K., Hader, P. and Neilson, T., 1984. Effect of base sequence on in vitro protein-chain termination. Journal of Biological Chemistry,

259(22), pp.14101-14104. 113

Geggier, S. and Vologodskii, A., 2010. Sequence dependence of DNA bending rigidity.

Proceedings of the National Academy of Sciences, 107(35), pp.15421-15426.

Giedroc, D.P. and Cornish, P.V., 2009. Frameshifting RNA pseudoknots: structure and mechanism. Virus research, 139(2), pp.193-208.

Gouy, M. and Gautier, C., 1982. Codon usage in bacteria: correlation with gene expressivity. Nucleic acids research, 10(22), pp.7055-7074.

Gouy, M., 1987. Codon contexts in enterobacterial and coliphage genes. Molecular biology and evolution, 4(4), pp.426-444.

Grajevskaja, R.A., Ivanov, Y.V. and Saminsky, E.M., 1982. 70‐S Ribosomes of

Escherichia coli Have an Additional Site for Deacylated tRNA Binding. European

Journal of Biochemistry, 128(1), pp.47-52.

Grantham, R.I., Perrin, P.A. and Mouchiroud, D.O., 1986. Patterns in codon usage of different kinds of species. Oxford Surv Evol Biol, 3, pp.48-81.

Gren, E.J., 1984. Recognition of messenger RNA during translational initiation in

Escherichia coli. Biochimie, 66(1), pp.1-29.

Gromadski, K.B. and Rodnina, M.V., 2004. Kinetic determinants of high-fidelity tRNA discrimination on the ribosome. Molecular cell, 13(2), pp.191-200. 114

Gromadski, K.B., Daviter, T. and Rodnina, M.V., 2006. A uniform response to mismatches in codon-anticodon complexes ensures ribosomal fidelity. Molecular cell,

21(3), pp.369-377.

Grosjean, H. and Fiers, W., 1982. Preferential codon usage in prokaryotic genes: the optimal codon-anticodon interaction energy and the selective codon usage in efficiently expressed genes. Gene, 18(3), pp.199-209.

Gurvich, O.L., Baranov, P.V., Gesteland, R.F. and Atkins, J.F., 2005. Expression levels influence ribosomal frameshifting at the tandem rare arginine codons AGG_AGG and

AGA_AGA in Escherichia coli. Journal of bacteriology, 187(12), pp.4023-4032.

Gurvich, O.L., Baranov, P.V., Zhou, J., Hammer, A.W., Gesteland, R.F. and Atkins,

J.F., 2003. Sequences that direct significant levels of frameshifting are frequent in coding regions of Escherichia coli. The EMBO Journal, 22(21), pp.5941-5950.

Gustafsson, C., Govindarajan, S. and Minshull, J., 2004. Codon bias and heterologous protein expression. Trends in biotechnology, 22(7), pp.346-353.

Gustilo, E.M., Vendeix, F.A. and Agris, P.F., 2008. tRNA's modifications bring order to gene expression. Current opinion in microbiology, 11(2), pp.134-140. 115

Ikemura, T. and Ozeki, H., 1983, January. Codon usage and transfer RNA contents: organism-specific codon-choice patterns in reference to the isoacceptor contents. In

Cold Spring Harbor Symposia on Quantitative Biology (Vol. 47, pp. 1087-1097). Cold

Spring Harbor Laboratory Press.

Ikemura, T., 1981. Correlation between the abundance of Escherichia coli transfer

RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. Journal of molecular biology, 151(3), pp.389-409.

Ikemura, T., 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Molecular biology and evolution, 2(1), pp.13-34.

Ingraham, J.L., 0. Maaloe, and FC Neidhardt. 1983. Growth of the bacterial cell.

Sinauer Assoc., Inc., Sunderland, Mass, p.3.

Jacks, T., Madhani, H.D., Masiarz, F.R. and Varmus, H.E., 1988. Signals for ribosomal frameshifting in the Rous sarcoma virus gag-pol region. Cell,55(3), pp.447-458.

Jacob, W.F., Santer, M. and Dahlberg, A.E., 1987. A single base change in the Shine-

Dalgarno region of 16S rRNA of Escherichia coli affects translation of many proteins.

Proceedings of the National Academy of Sciences, 84(14), pp.4757-4761. 116

Jäger, G., Leipuviene, R., Pollard, M.G., Qian, Q. and Björk, G.R., 2004. The conserved Cys-X1-X2-Cys motif present in the TtcA protein is required for the thiolation of cytidine in position 32 of tRNA from Salmonella enterica serovar

Typhimurium. Journal of bacteriology, 186(3), pp.750-757.

Jenner, L., Rees, B., Yusupov, M. and Yusupova, G., 2007. Messenger RNA conformations in the ribosomal E site revealed by X‐ray crystallography.EMBO reports, 8(9), pp.846-850.

Jenner, L.B., Demeshkina, N., Yusupova, G. and Yusupov, M., 2010. Structural aspects of messenger RNA reading frame maintenance by the ribosome. Nature structural & molecular biology, 17(5), pp.555-560.

Johnston, M. and Davis, R.W., 1984. Sequences that regulate the divergent GAL1-

GAL10 promoter in Saccharomyces cerevisiae. Molecular and Cellular Biology, 4(8), pp.1440-1448.

Jukes, T.H., 1973. Possibilities for the evolution of the genetic code from a preceding form. Nature, 246(5427), p.22.

Kilchert, C. and Spang, A., 2011. Cotranslational transport of ABP140 mRNA to the distal pole of S. cerevisiae. The EMBO journal, 30(17), pp.3567-3580. 117

Kleiger, G. and Eisenberg, D., 2002. GXXXG and GXXXA Motifs Stabilize FAD and

NAD (P)-binding Rossmann Folds Through C α–H⋯ O Hydrogen Bonds and van der

Waals Interactions. Journal of molecular biology, 323(1), pp.69-76.

Kohli, J. and Grosjean, H., 1981. Usage of the three termination codons: compilation and analysis of the known eukaryotic and prokaryotic translation termination sequences. Molecular and General Genetics MGG, 182(3), pp.430-439.

Komarova, A.V., Tchufistova, L.S., Supina, E.V. and Boni, I.V., 2002. Protein S1 counteracts the inhibitory effect of the extended Shine-Dalgarno sequence on translation. Rna, 8(9), pp.1137-1147.

Konevega, A.L., Soboleva, N.G., Makhno, V.I., Semenkov, Y.P., Wintermerer, W.,

Rodnina, M.V. and Katunin, V.I., 2004. Purine bases at position 37 of tRNA stabilize codon–anticodon interaction in the ribosomal A site by stacking and Mg2+-dependent interactions. Rna, 10(1), pp.90-101.

Konigsberg, W. and Godson, G.N., 1983. Evidence for use of rare codons in the dnaG gene and other regulatory genes of Escherichia coli. Proceedings of the National

Academy of Sciences, 80(3), pp.687-691. 118

Kothe, U., Wieden, H.J., Mohr, D. and Rodnina, M.V., 2004. Interaction of helix D of elongation factor Tu with helices 4 and 5 of protein L7/12 on the ribosome. Journal of molecular biology, 336(5), pp.1011-1021.

Kramer, E.B. and Farabaugh, P.J., 2007. The frequency of translational misreading errors in E. coli is largely determined by tRNA competition. Rna, 13(1), pp.87-96.

Lee, S., Liu, B., Lee, S., Huang, S.X., Shen, B. and Qian, S.B., 2012. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution.

Proceedings of the National Academy of Sciences, 109(37), pp.E2424-E2432.

Lill, R., Cunningham, K., Brundage, L.A., Ito, K., Oliver, D. and Wickner, W., 1989.

SecA protein hydrolyzes ATP and is an essential component of the protein translocation ATPase of Escherichia coli. The EMBO Journal, 8(3), p.961.

Lim, V.I. and Curran, J.F., 2001. Analysis of codon: anticodon interactions within the ribosome provides new insights into codon reading and the genetic code structure. Rna,

7(07), pp.942-957.

Lipmann, F., 1963. Messenger ribonucleic acid. Progress in Nucleic Acid Research and

Molecular Biology, 1, pp.135-161. 119

Loftfield, R.B. and Vanderjagt, D.O.R.O.T.H.Y., 1972. The frequency of errors in protein biosynthesis. Biochemical Journal, 128(5), p.1353.

Looman, A.C., Bodlaender, J., Comstock, L.J., Eaton, D., Jhurani, P., De Boer, H.A. and Van Knippenberg, P.H., 1987. Influence of the codon following the AUG initiation codon on the expression of a modified lacZ gene in Escherichia coli. The EMBO journal, 6(8), pp.2489-2492.

Looman, A.C., De Gruyter, M., Vogelaar, A. and Van Knippenberg, P.H., 1985.

Effects of heterologous ribosomal binding sites on the transcription and translation of the lacZ gene of Escherichia coli. Gene, 37(1-3), pp.145-154.

Márquez, V., Wilson, D.N., Tate, W.P., Triana-Alonso, F. and Nierhaus, K.H., 2004.

Maintaining the ribosomal reading frame: the influence of the E site during translational regulation of release factor 2. Cell, 118(1), pp.45-55.

Marshall, R.A., Dorywalska, M. and Puglisi, J.D., 2008. Irreversible chemical steps control intersubunit dynamics during translation. Proceedings of the National Academy of Sciences, 105(40), pp.15364-15369.

McCarthy, J.E. and Brimacombie, R., 1994. Prokaryotic translation: the interactive pathway leading to initiation. Trends in Genetics, 10(11), pp.402-407. 120

Mejlhede, N., Atkins, J.F. and Neuhard, J., 1999. Ribosomal− 1 Frameshifting during

Decoding ofBacillus subtilis cdd Occurs at the Sequence CGA AAG. Journal of bacteriology, 181(9), pp.2930-2937.

Miller, J.H. and Albertini, A.M., 1983. Effects of surrounding sequence on the suppression of nonsense codons. Journal of molecular biology, 164(1), pp.59-71.

Moazed, D. and Noller, H.F., 1989. Interaction of tRNA with 23S rRNA in the ribosomal A, P, and E sites. Cell, 57(4), pp.585-597.

Munson, L.M., Stormo, G.D., Niece, R.L. and Reznikoff, W.S., 1984. lacZ translation initiation mutations. Journal of molecular biology, 177(4), pp.663-683.

Näsvall, S.J., Chen, P. and Björk, G.R., 2004. The modified wobble nucleoside uridine-

5-oxyacetic acid in tRNAProcmo5UGG promotes reading of all four proline codons in vivo. Rna, 10(10), pp.1662-1673.

Nierhaus, K.H. and Rheinberger, H.J., 1984. An alternative model for the elongation cycle of protein biosynthesis. Trends in Biochemical Sciences, 9(10), pp.428-432.

Nierhaus, K.H., 1990. The allosteric three-site model for the ribosomal elongation cycle: features and future. Biochemistry, 29(21), pp.4997-5008. 121

Nierhaus, K.H., Jünemann, R. and Spahn, C.M., 1997. Are the current three-site models valid descriptions of the ribosomal elongation cycle?.Proceedings of the

National Academy of Sciences, 94(20), pp.10499-10500.

Noller, H.F., 1991. Ribosomal RNA and translation. Annual review of biochemistry,

60(1), pp.191-227.

Noller, H.F., Yusupov, M.M., Yusupova, G.Z., Baucom, A. and Cate, J.H.D., 2002.

Translocation of tRNA during protein synthesis. FEBS letters, 514(1), pp.11-16.

O'Connor, M., Brunelli, C.A., Firpo, M.A., Gregory, S.T., Lieberman, K.R., Lodmell,

J.S., Moine, H., Ryk, D.I.V. and Dahlberg, A.E., 1995. Genetic probes of ribosomal

RNA function. Biochemistry and cell biology, 73(11-12), pp.859-868.

Ogle, J.M., Brodersen, D.E., Clemons, W.M., Tarry, M.J., Carter, A.P. and

Ramakrishnan, V., 2001. Recognition of cognate transfer RNA by the 30S ribosomal subunit. Science, 292(5518), pp.897-902.

Ogle, J.M., Carter, A.P. and Ramakrishnan, V., 2003. Insights into the decoding mechanism from recent ribosome structures. Trends in biochemical sciences, 28(5), pp.259-266. 122

Ogle, J.M., Murphy, F.V., Tarry, M.J. and Ramakrishnan, V., 2002. Selection of tRNA by the ribosome requires a transition from an open to a closed form. Cell, 111(5), pp.721-732.

Osawa, S., Jukes, T.H., Watanabe, K. and Muto, A., 1992. Recent evidence for evolution of the genetic code. Microbiological reviews, 56(1), pp.229-264.

Pan, C.C., Chung, M.Y., Ng, K.F., Liu, C.Y., Wang, J.S., Chai, C.Y., Huang, S.H.,

Chen, P.H. and Ho, D.M.T., 2008. Constant allelic alteration on chromosome 16p

(TSC2 gene) in perivascular epithelioid cell tumour (PEComa): genetic evidence for the relationship of PEComa with angiomyolipoma. The Journal of pathology, 214(3), pp.387-393.

Pape, T., Wintermeyer, W. and Rodnina, M., 1999. Induced fit in initial selection and proofreading of aminoacyl‐tRNA on the ribosome. The EMBO Journal, 18(13), pp.3800-3807.

Parker, J. and Friesen, J.D., 1980. “Two out of three” codon reading leading to mistranslation in vivo. Molecular and General Genetics MGG, 177(3), pp.439-445.

Parker, J. and Holtz, G., 1984. Control of basal-level codon misreading in Escherichia coli. Biochemical and biophysical research communications, 121(2), pp.487-492. 123

Parker, J.A.C.K., 1989. Errors and alternatives in reading the universal genetic code.

Microbiological reviews, 53(3), p.273.

Pedersen, S., 1984. Escherichia coli ribosomes translate in vivo with variable rate. The

EMBO journal, 3(12), pp.2895-2898.

Pedersen, W.T. and Curran, J.F., 1991. Effects of the nucleotide 3′ to an amber codon on ribosomal selection rates of suppressor tRNA and release factor-1. Journal of molecular biology, 219(2), pp.231-241.

Pino, P., Aeby, E., Foth, B.J., Sheiner, L., Soldati, T., Schneider, A. and Soldati‐Favre,

D., 2010. Mitochondrial translation in absence of local tRNA aminoacylation and methionyl tRNAMet formylation in Apicomplexa. Molecular microbiology, 76(3), pp.706-718.

Prère, M.F., Canal, I., Wills, N.M., Atkins, J.F. and Fayet, O., 2011. The interplay of mRNA stimulatory signals required for AUU-mediated initiation and programmed− 1 ribosomal frameshifting in decoding of transposable element IS911. Journal of bacteriology, 193(11), pp.2735-2744.

Quax, T.E., Claassens, N.J., Söll, D. and van der Oost, J., 2015. Codon bias as a means to fine-tune gene expression. Molecular cell, 59(2), pp.149-161. 124

Ramakrishnan, V., 2009. GTPase activation of elongation factor EF‐Tu by the ribosome during decoding. The EMBO journal, 28(6), pp.755-765.

Remme, J., Margus, T., Villems, R. and Nierhaus, K.H., 1989. The third ribosomal tRNA‐binding site, the E site, is occupied in native polysomes. European Journal of

Biochemistry, 183(2), pp.281-284.

Rheinberger, H.J. and Nierhaus, K.H., 1980. Simultaneous binding of three tRNA molecules by the ribosome of Escherichia coli. Biochemistry International, 1(4), pp.297-303.

Rheinberger, H.J. and Nierhaus, K.H., 1983. Testing an alternative model for the ribosomal peptide elongation cycle. Proceedings of the National Academy of Sciences,

80(14), pp.4213-4217.

Rheinberger, H.J., Sternbach, H. and Nierhaus, K.H., 1981. Three tRNA binding sites on Escherichia coli ribosomes. Proceedings of the National Academy of Sciences,

78(9), pp.5310-5314.

Rice, N.R., Stephens, R.M., Couez, D., Deschamps, J., Kettmann, R., Burny, A. and

Gilden, R.V., 1984. The nucleotide sequence of the env gene and post-env region of bovine leukemia virus. Virology, 138(1), pp.82-93. 125

Ring, M. and Huber, R.E., 1990. Multiple replacements establish the importance of tyrosine-503 in β-galactosidase (Escherichia coli). Archives of biochemistry and biophysics, 283(2), pp.342-350.

Ringquist, S., Shinedling, S., Barrick, D., Green, L., Binkley, J., Stormo, G.D. and

Gold, L., 1992. Translation initiation in Escherichia coli: sequences within the ribosome‐binding site. Molecular microbiology, 6(9), pp.1219-1229.

Robertson, J.M. and Wintermeyer, W., 1987. Mechanism of ribosomal translocation: tRNA binds transiently to an exit site before leaving the ribosome during translocation.

Journal of molecular biology, 196(3), pp.525-540.

Robertson, J.M., Paulsen, H. and Wintermeyer, W., 1986. Pre-steady-state kinetics of ribosomal translocation. Journal of molecular biology, 192(2), pp.351-360.

Robinson, M., Lilley, R., Little, S., Emtage, J.S., Yarranton, G., Stephens, P., Millican,

A., Eaton, M. and Humphreys, G., 1984. Codon usage can affect efficiency of translation of genes in Escherichia coli. Nucleic acids research, 12(17), pp.6663-6671.

Rodnina, M.V. and Wintermeyer, W., 2001. Fidelity of aminoacyl-tRNA selection on the ribosome: kinetic and structural mechanisms. Annual review of biochemistry,

70(1), pp.415-435. 126

Rodnina, M.V., Gromadski, K.B., Kothe, U. and Wieden, H.J., 2005. Recognition and selection of tRNA in translation. FEBS letters, 579(4), pp.938-942.

Rodnina, M.V., Pape, T., Fricke, R. and Wintermeyer, W., 1995. Elongation factor Tu, a GTPase triggered by codon recognition on the ribosome: mechanism and GTP consumption. Biochemistry and cell biology, 73(11-12), pp.1221-1227.

Rodnina, M.V., Pape, T., Fricke, R., Kuhn, L. and Wintermeyer, W., 1996. Initial binding of the elongation factor Tu· GTP· aminoacyl-tRNA complex preceding codon recognition on the ribosome. Journal of Biological Chemistry, 271(2), pp.646-652.

Rosenberger, R.F. and Foskett, G., 1981. An estimate of the frequency of in vivo transcriptional errors at a nonsense codon in Escherichia coli. Molecular and General

Genetics MGG, 183(3), pp.561-563.

Rosenberger, R.F. and Hilton, J., 1983. The frequency of transcriptional and translational errors at nonsense codons in the lacZ gene of Escherichia coli. Molecular and General Genetics MGG, 191(2), pp.207-212.

Ruusala, T., Ehrenberg, M. and Kurland, C.G., 1982. Catalytic effects of elongation factor Ts on polypeptide synthesis. The EMBO journal, 1(1), p.75. 127

Sanders, C.L. and Curran, J.F., 2007. Genetic analysis of the E site during RF2 programmed frameshifting. Rna, 13(9), pp.1483-1491.

Saulquin, X., Scotet, E., Trautmann, L., Peyrat, M.A., Halary, F., Bonneville, M. and

Houssaint, E., 2002. + 1 Frameshifting as a novel mechanism to generate a cryptic cytotoxic T lymphocyte epitope derived from human interleukin 10. The Journal of experimental medicine, 195(3), pp.353-358.

Scherer, G.F., Walkinshaw, M.D., Arnott, S. and Morré, D.J., 1980. The ribosome binding sites recognized by E. coli ribosomes have regions with signal character in both the leader and protein coding segments. Nucleic acids research, 8(17), pp.3895-3908.

Schmeing, T.M. and Ramakrishnan, V., 2009. What recent ribosome structures have revealed about the mechanism of translation. Nature, 461(7268), pp.1234-1242.

Schuette, J.C., Murphy IV, F.V., Kelley, A.C., Weir, J.R., Giesebrecht, J., Connell,

S.R., Loerke, J., Mielke, T., Zhang, W., Penczek, P.A. and Ramakrishnan, V., 2009.

GTPase activation of elongation factor EF‐Tu by the ribosome during decoding. The

EMBO journal, 28(6), pp.755-765.

Schwartz, R. and Curran, J.F., 1997. Analyses of frameshifting at UUU-pyrimidine sites. Nucleic acids research, 25(10), pp.2005-2011. 128

Scorer, C.A., Carrier, M.J. and Rosenberger, R.F., 1991. Amino acid misincorporation during high-level expression of mouse epidermal growth factor in Escherichia coli.

Nucleic acids research, 19(13), pp.3511-3516.

Selmer, M., Dunham, C.M., Murphy, F.V., Weixlbaumer, A., Petry, S., Kelley, A.C.,

Weir, J.R. and Ramakrishnan, V., 2006. Structure of the 70S ribosome complexed with mRNA and tRNA. Science, 313(5795), pp.1935-1942.

Semenkov, Y.P., Rodnina, M.V. and Wintermeyer, W., 1996. The" allosteric three-site model" of elongation cannot be confirmed in a well-defined ribosome system from

Escherichia coli. Proceedings of the National Academy of Sciences, 93(22), pp.12183-

12188.

Sergiev, P.V., Lesnyak, D.V., Kiparisov, S.V., Burakovsky, D.E., Leonov, A.A.,

Bogdanov, A.A., Brimacombe, R. and Dontsova, O.A., 2005. Function of the ribosomal E-site: a mutagenesis study. Nucleic acids research,33(18), pp.6048-6056.

Sharp, P.M. and Li, W.H., 1986. An evolutionary perspective on synonymous codon usage in unicellular organisms. Journal of molecular evolution, 24(1-2), pp.28-38.

Shpaer, E.G., 1986. Constraints on codon context in Escherichia coli genes their possible role in modulating the efficiency of translation. Journal of molecular biology,

188(4), pp.555-564. 129

Shultzaberger, R.K., Bucheimer, R.E., Rudd, K.E. and Schneider, T.D., 2001. Anatomy of Escherichia coli ribosome binding sites. Journal of molecular biology, 313(1), pp.215-228.

Sipley, J. and Goldman, E., 1993. Increased ribosomal accuracy increases a programmed translational frameshift in Escherichia coli. Proceedings of the National

Academy of Sciences, 90(6), pp.2315-2319.

Spanjaard, R.A. and Van Duin, J., 1988. Translation of the sequence AGG-AGG yields

50% ribosomal frameshift. Proceedings of the National Academy of Sciences, 85(21), pp.7967-7971.

Springgate, C.F. and Loeb, L.A., 1975. On the fidelity of transcription by Escherichia coli ribonucleic acid polymerase. Journal of molecular biology, 97(4), pp.577-591.

Sprinzl, M., Horn, C., Brown, M., Ioudovitch, A. and Steinberg, S., 1998. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic acids research, 26(1), pp.148-153.

Stark, H., Rodnina, M.V., Rinke-Appel, J., Brimacombe, R., Wintermeyer, W. and van

Heel, M., 1997. Visualization of elongation factor Tu on the Escherichia coli ribosome.

Nature, 389(6649), pp.403-406. 130

Steitz, J.A. and Jakes, K., 1975. How ribosomes select initiator regions in mRNA: base pair formation between the 3'terminus of 16S rRNA and the mRNA during initiation of protein synthesis in Escherichia coli. Proceedings of the National Academy of

Sciences, 72(12), pp.4734-4738.

Stenström, C.M. and Isaksson, L.A., 2002. Influences on translation initiation and early elongation by the messenger RNA region flanking the initiation codon at the 3′ side.

Gene, 288(1), pp.1-8.

Stenström, C.M., Jin, H., Major, L.L., Tate, W.P. and Isaksson, L.A., 2001. Codon bias at the 3′-side of the initiation codon is correlated with translation initiation efficiency in

Escherichia coli. Gene, 263(1-2), pp.273-284.

Stormo, G.D., Schneider, T.D. and Gold, L., 1986. Quantitative analysis of the relationship between nucleotide sequence and functional activity. Nucleic acids research, 14(16), pp.6661-6679.

Stormo, G.D., Schneider, T.D. and Gold, L.M., 1982. Characterization of translational initiation sites in E. coli. Nucleic acids research, 10(9), pp.2971-2996.

Taliaferro, D. and Farabaugh, P.J., 2007. An mRNA sequence derived from the yeast

EST3 gene stimulates programmed+ 1 translational frameshifting.RNA, 13(4), pp.606-

613. 131

Tats, A., Remm, M. and Tenson, T., 2006. Highly expressed proteins have an increased frequency of alanine in the second amino acid position. BMC genomics, 7(1), p.28.

Trifonov, E.N., 1987. Translation framing code and frame-monitoring mechanism as suggested by the analysis of mRNA and 16 S rRNA nucleotide sequences. Journal of molecular biology, 194(4), pp.643-652.

Trimble, M.J., Minnicus, A. and Williams, K.P., 2004. tRNA slippage at the tmRNA resume codon. Rna, 10(5), pp.805-812.

Tsai, F. and Curran, J.F., 1998. tRNA 2 Gln mutants that translate the CGA arginine codon as glutamine in Escherichia coli. RNA, 4(12), pp.1514-1522.

Tuller, T., Carmi, A., Vestsigian, K., Navon, S., Dorfan, Y., Zaborske, J., Pan, T.,

Dahan, O., Furman, I. and Pilpel, Y., 2010. An evolutionarily conserved mechanism for controlling the efficiency of protein translation.Cell, 141(2), pp.344-354.

Ude, S., Lassak, J., Starosta, A.L., Kraxenberger, T., Wilson, D.N. and Jung, K., 2013.

Translation elongation factor EF-P alleviates ribosome stalling at polyproline stretches.

Science, 339(6115), pp.82-85. 132

Urbonavičius, J., Qian, Q., Durand, J.M., Hagervall, T.G. and Björk, G.R., 2001.

Improvement of reading frame maintenance is a common function for several tRNA modifications. The EMBO journal, 20(17), pp.4863-4873.

Urbonavicius, J., Stahl, G., Durand, J.M., Salem, S.N.B., Qian, Q., Farabaugh, P.J. and

Björk, G.R., 2003. Transfer RNA modifications that alter+ 1 frameshifting in general fail to affect− 1 frameshifting. Rna, 9(6), pp.760-768.

Van Etten, W.J. and Janssen, G.R., 1998. An AUG initiation codon, not codon– anticodon complementarity, is required for the translation of unleadered mRNA in

Escherichia coli. Molecular microbiology, 27(5), pp.987-1001.

Varenne, S., Buc, J., Lloubes, R. and Lazdunski, C., 1984. Translation is a non-uniform process: effect of tRNA availability on the rate of elongation of nascent polypeptide chains. Journal of molecular biology, 180(3), pp.549-576.

Wang, G., Rasko, D.A., Sherburne, R. and Taylor, D.E., 1999. Molecular genetic basis for the variable expression of Lewis Y antigen in Helicobacter pylori: analysis of the α

(1, 2) fucosyltransferase gene. Molecular microbiology, 31(4), pp.1265-1274.

Watson, J.D., 1963. Involvement of RNA in the synthesis of proteins.Science,

140(3562), pp.17-26. 133

Weiss, R.B., Dunn, D.M., Dahlberg, A.E., Atkins, J.F. and Gesteland, R.F., 1988.

Reading frame switch caused by base-pair formation between the 3'end of 16S rRNA and the mRNA during elongation of protein synthesis in Escherichia coli. The EMBO journal, 7(5), p.1503.

Weyens, G., Charlier, D., Roovers, M., Piérard, A. and Glansdorff, N., 1988. On the role of the Shine-Dalgarno sequence in determining the efficiency of translation initiation at a weak start codon in the car operon of Escherichia coli K12. Journal of molecular biology, 204(4), pp.1045-1048.

Wielgoss, S., Barrick, J.E., Tenaillon, O., Cruveiller, S., Chane-Woon-Ming, B.,

Médigue, C., Lenski, R.E. and Schneider, D., 2011. Mutation rate inferred from pp.183-186.synonymous substitutions in a long-term evolution experiment with

Escherichia coli. G3: Genes, Genomes, Genetics, 1(3), pp.183-186.

Wilson, D.N. and Nierhaus, K.H., 2006. The E-site story: the importance of maintaining two tRNAs on the ribosome during protein synthesis. Cellular and

Molecular Life Sciences CMLS, 63(23), pp.2725-2737.

Wilson, D.N. and Nierhaus, K.H., 2007. The weird and wonderful world of bacterial ribosome regulation. Critical reviews in biochemistry and molecular biology, 42(3), pp.187-219. 134

Wintermeyer, W., Peske, F., Beringer, M., Gromadski, K.B., Savelsbergh, A. and

Rodnina, M.V., 2004. Mechanisms of elongation on the ribosome: dynamics of a macromolecular machine. Biochemical Society Transactions, 32(5), pp.733-737.

Wittmann-Liebold, B., Kopke, A.K.E., Arndt, E., Kromer, W., Hatakeyama, T. and

Wittmann, H.G., 1990. Sequence comparison and evolution of ribosomal proteins and their genes. The ribosome: structure, function and evolution. American Society for

Microbiology, Washington, DC, pp.598-616.

Wohlgemuth, I., Pohl, C. and Rodnina, M.V., 2010. Optimization of speed and accuracy of decoding in translation. The EMBO journal, 29(21), pp.3701-3709.

Yarus, M. and Folley, L.S., 1985. Sense codons are found in specific contexts. Journal of molecular biology, 182(4), pp.529-540.

Yusupov, M.M., Yusupova, G.Z., Baucom, A., Lieberman, K., Earnest, T.N., Cate,

J.H.D. and Noller, H.F., 2001. Crystal structure of the ribosome at 5.5 Å resolution. science, 292(5518), pp.883-896.

Zaher, H.S. and Green, R., 2009. Quality control by the ribosome following peptide bond formation. Nature, 457(7226), pp.161-166. 135

Zahn, K. and Landy, A., 1996. Modulation of lambda integrase synthesis by rare arginine tRNA. Molecular microbiology, 21(1), pp.69-76.

Zhang, J. and Deutscher, M.P., 1992. A uridine-rich sequence required for translation of prokaryotic mRNA. Proceedings of the National Academy of Sciences, 89(7), pp.2605-2609.

Zhelyabovskaya, O.B., Berlin, Y.A. and Birikh, K.R., 2004. Artificial genetic selection for an efficient translation initiation site for expression of human RACK1 gene in

Escherichia coli. Nucleic acids research, 32(5), pp.e52-e52.

136

SCHOLASTIC VITA

Michael Duane Ward

UNDERGRADUATE STUDY: Salisbury University Salisbury, Maryland B.A.,Philosophy and Psychology 2000

Salisbury University Salisbury, Maryland B.S.,Biology 2006

GRADUATE STUDY: North Carolina State University Raleigh, North Carolina M.S.,Entomology 2009

North Carolina State University Raleigh, North Carolina Graduate Certificate,Molecular Biotechnology 2009

Wake Forest University Winston-Salem, North Carolina Ph.D.,Biology 2020

SCHOLASTIC AND PROFESSIONAL EXPERIENCE:

Summer Intern, University of Maryland Center for Environmental Science, Horn Point Laboratory, 2000

Graduate Research Assistant, Wake Forest University, 2009-2020

Graduate Teaching Assistant, Wake Forest University, 2009-2015