EFFECTS OF CODON USAGE ON MRNA

TRANSLATION AND DECAY

by

VLADIMIR PRESNYAK

Submitted in partial fulfillment of the requirements for the degree

of Doctor of Philosophy

Dissertation Adviser: Dr. Jeffery Coller

Department of Biochemistry

CASE WESTERN RESERVE UNIVERSITY

May, 2015

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of

Vladimir Presnyak

Candidate for the Doctor of Philosophy degree*.

Committee chair

Timothy Nilsen

Dissertation advisor

Jeff Coller

Committee members

Maria Hatzoglou

Donny Licatalosi

Nathan Morris

Date of defense

March 18, 2015

* We also certify that written approval has been obtained for any proprietary material contained therein.


TABLE OF CONTENTS

INDEX OF FIGURES
INDEX OF TABLES
ACKNOWLEDGEMENTS
ABSTRACT
CHAPTER 1: BACKGROUND
    Cellular context of mRNA decay
    Overview of mRNA decay
    Pathways of mRNA decay
    Quality control pathways
    mRNA translation is intricately linked to decay
    Mechanism is insufficient to explain decay rate
    Codon bias and optimality
CHAPTER 2: RARE CODON ANALYSIS
    Position of rare codons changes effects on decay
    Destabilization by rare codons depends on translation
    Decay of reporters occurs through the major decay pathway
    Ribosome association of constructs remains unchanged
CHAPTER 3: GENOMIC ANALYSIS
    Determination of half-lives by RNA-seq
    The search for features correlating with decay
    Codon usage correlates with mRNA stability
    Codon usage in genome falls into distinct patterns
    Optimal codons increase average ribosome density
    Related appear correlated through codon usage
CHAPTER 4: EXPERIMENTAL VALIDATION
    Changes in codon content leads to changes in stability
    Regulation through codon usage dominates over UTR regulation
    Codon content affects the major decay pathway
    HIS3 reporter system allows for fine tuning of mRNA stability
    Codon content impacts translation beyond changes in mRNA
    Affected step of translation is elongation
    Changes in codon content can impact cellular fitness
CHAPTER 5: DISCUSSION
    Overview
    Considerations of rare codon experiments
    Considerations of RNA-seq study
    Considerations of codon content experiments
    Possible roles of DHH1
    Codon optimality in yeast and other organisms
    Ribosome as monitor of all mRNA fates
    Future directions
APPENDIX A: BIOINFORMATICS
    Half-life fitting
    CSC calculation
APPENDIX B: MATERIALS AND METHODS
    Yeast strains and growth
    Plasmids and strain construction
    Northern RNA analysis
    Polyribosome analysis
    Asymmetric PCR probes
    Plating assays
    RNA-seq
    Alignment and half-life calculation
    Statistical techniques
    Heat map generation
    Tables
BIBLIOGRAPHY


INDEX OF FIGURES

Figure 1: Overview of 5’-3’ decay
Figure 2: Structure and features of the rare codon constructs
Figure 3: mRNAs bearing rare codons display reduction in stability dependent on the position of the rare codons
Figure 4: mRNA translation is required for position dependence
Figure 5: Inclusion of rare codons accelerates both deadenylation and degradation
Figure 6: Decapping is required for position dependent mRNA destabilization by rare codons
Figure 7: mRNAs bearing rare codons do not change in ribosome distribution
Figure 8: Sequencing of poly(A)+ enriched and total mRNA sets produces different half-lives
Figure 9: Occurrence of codons correlates with differential effects on mRNA half-life
Figure 10: Influence on stability correlated with optimality for virtually all codons
Figure 11: Frequency of occurrence is not equivalent to optimality for codons
Figure 12: Data from an independent source show codon effects similar to ours
Figure 13: Codon effects are dependent on reading frame
Figure 14: Genes cluster into distinct patterns of codon usage
Figure 15: Individual clusters show effects of codon usage
Figure 16: Codons impact average ribosome density similarly to effects on mRNA decay
Figure 17: Previously observed similarity in half-life can be explained by similar codon usage
Figure 18: Changes in codon composition lead directly to changes in decay
Figure 19: Regulation of stability through the UTR appears weaker than codon effects
Figure 20: Regulation of decay through codon usage utilizes the major decay pathway
Figure 21: Changes in optimal codon content accelerates both deadenylation and decapping
Figure 22: Codon usage impacts translation beyond its effects on mRNA decay
Figure 23: Changes in codon usage do not produce changes in ribosomal association
Figure 24: Ribosomal translocation is directly affected by codon usage
Figure 25: Regulation through codon usage is potent enough to impact cellular fitness
Figure 26: Codon usage bias varies greatly between organisms
Figure 27: Codon usage within degenerate groups displays lower bias in higher eukaryotes


INDEX OF TABLES

Table 1: Yeast strains used in this study
Table 2: Plasmids used in this study
Table 3: Oligonucleotides used in this study

Note:

The data presented herein are reproduced with permission from: Presnyak V, Alhusaini N, Chen YH, Martin S, Morris N, Kline N, Olson S, Weinberg D, Baker KE, Graveley BR, Coller J. Codon optimality is a major determinant of mRNA stability. Cell. 2015 Mar 12;160(6):1111-24.


ACKNOWLEDGEMENTS

None of the work presented herein would have been possible without the help and support of a great number of people. First and most importantly, I would like to thank my wife, Jennifer. The unfailing love and support she has offered me, whether here by my side or 2000 miles away completing her schooling as a physician, has been a crucial source of motivation and stability through the ups and downs of graduate work. I would also like to thank my parents, Oleg and Nataliya, whose long journey from their homeland and many sacrifices have given me the chance to be here.

I would like to thank my adviser, Dr. Jeff Coller. His insights into many areas, from data interpretation to the values of collaboration and presentation skills, have been indispensable. I would like to acknowledge Dr. Kristian Baker as well, whose contributions have helped me advance my research with advice and direction over many years. I would also like to thank the members of my committee, Drs. Timothy Nilsen, Maria Hatzoglou, Donny Licatalosi, and Nathan Morris for their guidance.

Other researchers in the lab have all contributed to this work in many different ways. TJ Sweet was my mentor when I first started in the lab, showing me techniques and procedures as well as informing me about the background in the field. His knowledge and his approach to science were inspirational. Najwa Alhusaini’s support in the lab has been invaluable with her vast expertise in molecular biology techniques. Ying-Hsin Chen, Sophie Martin, and Nicholas Kline have all been kind enough to dedicate large amounts of their time in support of my work. The work of collaborators beyond the lab has been equally crucial, especially that of Dr. Brenton Graveley, whose expertise in the area of RNA sequencing was the very basis of this project.


Effects of Codon Usage on mRNA

Translation and Decay

Abstract

by

VLADIMIR PRESNYAK

Gene expression is a complex process regulated at many steps. One important step is the degradation of mRNA. The major pathways and enzymes in mRNA decay have been identified and described, but this has not yet led to a good understanding of the mechanisms for the observed differences in mRNA half-lives.

Previous research has elucidated several examples of regulation through 3’ UTR elements, but general mechanisms are not clear. The translation of mRNA is intricately linked to decay; thus, the two processes must be evaluated together to uncover regulation that frequently affects both.

We show that the choice of codons in the body of an mRNA can dictate the stability of the mRNA. Messages with a high percentage of optimal codons are relatively stable, whereas messages with a high percentage of non-optimal codons are relatively unstable. Reducing the optimal codon content of a stable message with a naturally high occurrence of optimal codons leads to a reduction in stability, and conversely, increasing the optimal codon content of a naturally unstable message leads to an increase in stability. Similarly, inclusion of rare codons (a subset of non-optimal codons) in a message leads to a reduction in stability. Importantly, these effects appear to be dominant over previously-described 3’ UTR regulatory elements.

Finally, we show that optimal codon content is shared among groups of genes encoding proteins of related function, which have previously been found to have similar half-lives. Taken together, this evidence indicates that codon content can be an intrinsic regulatory feature of mRNAs affecting stability.

Consistent with the role of codons in translation, we show that this method of regulation directly affects steps of translation. The reduction of mRNA stability caused by rare codon inclusion is dependent on translation. Concordantly, inclusion of non-optimal codons in reporters leads to a reduction in protein output that is greater than the decrease in mRNA levels. These findings strongly suggest that translation is the primary target of this method of regulation. To identify the precise step affected, we demonstrate that changes in optimal codon content have a direct effect on the rate of ribosome translocation on these messages. The coupling of translation to decay makes the changes in half-lives possible and further amplifies the effect of the regulation.

This work demonstrates the powerful mechanisms of regulation possible through codon usage, suggesting that this may be an evolutionary feature that allows cells to regulate expression of proteins without changes to the sequence or the need for external control sequences. Additionally, it implicates the ribosome as the key point of regulation, not only for translation, but for mRNA stability as well.


CHAPTER 1: BACKGROUND

Cellular context of mRNA decay

Gene expression and its regulation are central to the basic functioning of all cells. Virtually all processes in the cell depend on changes in gene expression; progression through the cell cycle, responses to environmental variations, and even apoptosis are all examples of this. The core processes of gene expression fit into the central dogma of molecular biology (Crick, 1970). The two cornerstone events are the transfer of genetic information from DNA to messenger RNA (mRNA) through transcription in the nucleus and the subsequent transfer of that information from mRNA into protein by translation. This, of course, is an oversimplification, as a great number of steps surround the processing of each of the three molecules involved. These steps can be divided into three corresponding areas: steps that affect DNA, such as activation and binding of transcription factors, and chromatin modifications; steps that affect RNA, such as splicing, polyadenylation, and export; and steps that affect protein, such as protein folding and modification. The two processes that transfer information from one molecule to another, transcription and translation, reside at the interfaces of these areas.

Rather than simple transfers of information, transcription and translation are incredibly complex events, with multiple layers of regulation (Dever and Green, 2012; Hinnebusch and Lorsch, 2012; Shandilya and Roberts, 2012). They are also points of signal amplification, as multiple transcription events can create multiple mRNAs from a single DNA, and multiple rounds of translation can create multiple proteins from a single mRNA molecule. Regulation of these steps yields a range of functional gene product levels that has been estimated to be around 6 orders of magnitude in yeast, with reported protein counts ranging from single-digit molecules per cell for the rarest to over a million for the most abundant (Ghaemmaghami et al., 2003). This regulation can occur from the earliest steps, like chromatin modification (Shilatifard, 2006), which happens before transcription even begins, to the latest steps, like the vast array of post-translational protein modifications (Lothrop et al., 2013), occurring after translation has completed.

mRNA holds a special position in gene expression, as it is the only molecule involved in both transcription and translation. Thus, regulation of mRNA metabolism plays a critical role in regulating levels of gene expression. A majority of regulatory events occur in the nucleus and those mRNAs that are exported to the cytoplasm are competent to attempt translation. Once in the cytoplasm, the key regulatory mechanism acting on mRNAs is mRNA decay. It represents the default fate of mRNAs, as any molecule that is not actively protected from it by another process will be destroyed. mRNA decay serves as a counterbalance to both transcription and translation by controlling levels of the intermediate molecule.

The rate of mRNA decay is highly variable and can be adjusted to meet the needs of the cell, allowing for great precision and flexibility when combined with regulation at the level of transcription and translation. While mRNA decay is the direct negative counterpart of transcription, destroying the molecules produced by that process, it is even more tightly intertwined with the process of translation (Huch and Nissan, 2014). Translation appears to be the primary process that protects mRNAs from decay. These processes compete for the same pool of mRNAs in the cytoplasm, ensuring the survival of the fittest among those mRNAs. Those that translate poorly, especially mRNAs with features that prevent normal translation (e.g. mRNAs harboring nonsense mutations or breaks), are rapidly eliminated (Shoemaker and Green, 2012). This competition amplifies the effects of regulation of translation: mRNAs that are poorly translated also tend to be highly susceptible to mRNA degradation. The competition manifests as a tight-knit relationship between the two processes that has been observed for some time, but the details of which remain elusive (Jacobson and Peltz, 1996; Roy and Jacobson, 2013).
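The counterbalance between transcription and decay described above can be made concrete with a deliberately simple model. The sketch below assumes constant synthesis at a rate k_s and first-order decay with rate constant k_d acting on an mRNA pool of size M; these symbols and the model itself are illustrative assumptions and are not quantities defined in this work.

```latex
\frac{dM}{dt} = k_s - k_d M,
\qquad
M_{ss} = \frac{k_s}{k_d},
\qquad
t_{1/2} = \frac{\ln 2}{k_d}
```

Under these assumptions, the steady-state level M_ss is proportional to the half-life, so doubling the half-life of a message at a fixed synthesis rate doubles its steady-state abundance.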

Overview of mRNA decay

From a molecular standpoint, a regulated process of mRNA decay is facilitated through the protection of mRNAs by two features added during their maturation in the nucleus: the cap that is added to the 5’ end of the message by the capping enzyme complex, known as the 5’ cap (Topisirovic et al., 2011), and the long stretch of adenosine residues added by the polyadenylation machinery, known as the poly(A) tail (Proudfoot, 2011). The addition of these features is coupled to transcription, assuring that the message is protected in the early steps of processing.

The cap is added to the mRNA immediately as it emerges from the RNA polymerase; the capping complex is associated directly with the RNA polymerase (Cho et al., 1997). The cap is a 7mGpppN structure, consisting of a guanine residue methylated at the 7-nitrogen, which is coupled to the first nucleotide of the message through an unusual 5’-5’ triphosphate linkage (Shatkin, 1976). This configuration confers resistance to enzymes that would normally degrade RNAs from the 5’ end (5’-3’ exonucleases), necessitating removal of the structure if the mRNA is to be degraded in this manner (Furuichi et al., 1977). The cap is also bound by a cap binding complex in the nucleus and will eventually be bound by eIF4E in the cytoplasm to promote translation. These proteins further stabilize the cap and protect the message (Topisirovic et al., 2011).

The poly(A) tail is a long stretch of adenosines, ranging from a fully-adenylated length of about 70 nucleotides in yeast to over 200 in mammals. This is added to the mRNA immediately after transcription, with factors involved in the process also interacting directly with the polymerase (Glover-Cutter et al., 2008). A set of cleavage/specificity factors cleaves the mRNA and stimulates addition of adenosines by polyadenylate polymerase (PAP) until the tail is sufficiently long and the interaction between the cleavage factors and the PAP is disrupted. The newly-formed poly(A) tail is then bound by a protective protein known as polyadenylate binding protein (PAB). This machinery also interacts with the RNA polymerase to promote termination, as well as the spliceosome to facilitate splicing of the nascent transcript (Colgan and Manley, 1997; Proudfoot, 2011).

Decay of mRNAs occurs through two pathways. The pathways are defined by the exonucleases that carry out the destruction of the mRNA. The major decay pathway focuses on recruiting a 5’ to 3’ exonuclease, and is thus defined as the 5’-3’ decay pathway. The minor decay pathway works in the opposite direction, being defined as the 3’-5’ decay pathway (Parker, 2012; Schoenberg and Maquat, 2012).

In addition to these basic decay pathways, there are several quality control pathways affecting specific mRNAs that cause aberrancies in translation. The processes of quality control mRNA decay are nonsense-mediated mRNA decay, which affects primarily mRNAs with nonsense (termination) codons encountered earlier than expected; non-stop mRNA decay, which affects mRNAs lacking a functional termination codon; and no-go mRNA decay, which specifically targets mRNAs with stalled ribosomes, a situation classically caused by strong secondary structure. These pathways use some of the same proteins as the default decay processes, but tend to recruit the decay machinery in ways that are specific to the aberrancy that triggers the pathway (Isken and Maquat, 2007; Shoemaker and Green, 2012).

Pathways of mRNA decay

The 5’-3’ decay pathway is the major pathway of mRNA degradation in yeast (Figure 1). This pathway is divided into three steps: deadenylation, the removal of the poly(A) tail; decapping, the removal of the 5’ cap; and exonucleolytic degradation, the destruction of the message body (Tucker and Parker, 2000).

Deadenylation generally occurs first, and is thought to be rate-limiting for most mRNAs (Franks and Lykke-Andersen, 2008). In this step, the poly(A) tail of the message is shortened to a length of about 10 nucleotides, which is thought to be sufficient to remove the PAB molecules that otherwise function as a protective signal (Tucker et al., 2002). This is carried out by a large complex of proteins known as the CCR4-NOT complex, consisting of the CCR4 poly(A)-specific exonuclease, the POP2 (Caf1 in higher eukaryotes) putative exonuclease, the NOT1 structural scaffold protein, and several proteins of unknown function, including other NOT and CAF proteins. In yeast, CCR4 provides all of the nucleolytic activity of the complex, but in other organisms Caf1 is known to be active or even dominant (Chen et al., 2002; Goldstrohm and Wickens, 2008). Specific triggers for deadenylation are not well described, but a number of proteins have been reported to interact with both the deadenylation complex and the translational machinery, suggesting a direct link between the processes (Gray et al., 2000; Hoshino, 2012; Wilusz et al., 2001). On some messages, deadenylation can be triggered by recruitment of the complex by factors associated with the messages through sequence-specific interactions, such as the PUF family of proteins or even miRNA-mediated complexes in higher eukaryotes (Eulalio et al., 2009; Goldstrohm et al., 2007; Nilsen, 2007).


Decapping of normal messages is a deadenylation-dependent event, thought to occur directly afterwards and to be coordinated by preferential binding of the decapping activators to deadenylated mRNAs (Chowdhury et al., 2007; Decker and Parker, 1993; Tharun and Parker, 2001). The basic function of decapping is to remove the protective 5’ cap structure, allowing access to the mRNA body by exonucleases. This means that the decapping factors compete for access to the cap with other proteins, most importantly eIF4E (Schwartz and Parker, 2000). Because eIF4E is a central factor in translation initiation (see below), decapping activators tend to function as translational repressors. Additionally, many of them interact with PAB1, a poly(A) binding protein known to stimulate translation (Vilela et al., 2000; Wyers et al., 2000). For some decapping cofactors, translational repression may actually be their primary function (Coller and Parker, 2005; Sweet et al., 2012). The core set of decapping activators includes DHH1, a DEAD-box helicase with multiple functions (Coller et al., 2001); PAT1, an mRNA-binding protein thought to coordinate the activity of the decapping complex (Nissan et al., 2010); and the LSM1-7 complex, a heptameric RNA-binding ring thought to associate with the mRNA late in the process and consolidate the decapping signals (Tharun and Parker, 2001). The act of removing the cap structure itself is carried out by the DCP1/DCP2 holoenzyme, with DCP2 contributing the enzymatic activity (Coller and Parker, 2004). Several other proteins contribute to decapping activity, but are not required, including the enhancers of decapping EDC1-3, proteins that have been described to interact with the decapping enzyme and promote decapping in vitro (Dunckley et al., 2001; Kshirsagar and Parker, 2004).


Exonucleolytic degradation is the last step in the pathway and appears to be the least regulated; it is carried out by the XRN1 exonuclease, which acts on mRNAs without cofactors. Its known interactions are with the decapping machinery, as well as some components of quality control mRNA decay pathways, which are thought to provide its recruitment mechanism (Nagarajan et al., 2013). This enzyme rapidly removes nucleotides from the 5’ end of any RNA with a 5’ monophosphate in a highly processive manner, which is attributable to an unwinding activity inherent to the active site (Jinek et al., 2011). This feature additionally allows the enzyme to degrade structured mRNAs without a helicase cofactor.

For most studied mRNAs in yeast, the 3’-5’ decay pathway appears to contribute little to overall half-life, with decay intermediates only observable when the 5’-3’ pathway is blocked (Anderson and Parker, 1998). However, the pathway is important in cases where the 5’-3’ decay pathway fails to function and in specific cases of regulation through the pathway (Lin et al., 2007; Orban and Izaurralde, 2005). The pathway begins with the same deadenylation step as the 5’-3’ pathway; however, this is followed by direct digestion of the message body from the 3’ end by the exosome rather than decapping and decay from the 5’ end. The exosome consists of several proteins that are homologous to nucleases, but it appears that only one subunit, RRP44, has nuclease activity in the cytoplasm. The other nuclease-like proteins, as well as the accessory factors in this complex, are thought to function in recruitment, binding, and channeling of the RNA. This complex is recruited by a group of proteins known as the SKI proteins, which interact with the exosome and serve to facilitate its degradation of mRNAs (Chlebowski et al., 2013). After the degradation of the message body, the cap of the message is left to be degraded by the scavenging decapping enzyme DCS1 (Wang and Kiledjian, 2001).

Quality control pathways

Nonsense-mediated decay (NMD) is a quality control decay pathway that degrades mRNAs which terminate translation prematurely, typically due to the presence of a nonsense codon early in the message, though other conditions exist under which this pathway may become active, such as the presence of multiple reading frames or abnormally long 3’ untranslated regions (UTRs) (Baker and Parker, 2004; Losson and Lacroute, 1979). It is proposed to stop the production of truncated proteins, which can be deleterious in several ways, including aggregation, gain of function, and dominant negative activity (Frischmeyer and Dietz, 1999; Pulak and Anderson, 1993). This process functions by recruiting a host of decay-related proteins to the message, including the deadenylation components, the decapping enzyme, and exosomal components, which accelerate degradation in multiple ways, e.g. by bypassing the deadenylation requirement to recruit the decapping machinery and the exosome. This leads to the removal of the cap from the message and its subsequent degradation by XRN1 as well as degradation by the exosome (Lejeune et al., 2003; Swisher and Parker, 2011). The main components of the pathway are the UPF proteins, specifically UPF1-3, with an accessory group of proteins known as the SMG proteins, which are required in some organisms. The UPF proteins carry out both recognition of NMD targets and recruitment of decay machinery (Chang et al., 2007). It should be noted that NMD can destabilize messages very strongly, with changes of 100-fold or more possible upon introduction of a premature nonsense codon (Baumann et al., 1985).


Non-stop decay (NSD) is also a quality control pathway, but its targets are those mRNAs lacking a stop codon. These occur in cases of aborted transcription, premature polyadenylation, or damage to the mRNA. This process is proposed to rescue ribosomes that have reached the end of an mRNA without encountering a stop codon and thus have not gone through termination (Frischmeyer et al., 2002; van Hoof et al., 2002). As termination facilitates release of the ribosome from the mRNA, these ribosomes require rescuing to be removed from the mRNA. Factors closely related to eRF3 (SKI7 in yeast or Hbs1 in mammals) are thought to recognize the ribosome in the same way as the traditional termination factors, though the mechanism of targeting to stalled ribosomes is unclear. Upon recognition, the ribosome can be released from the message through the action of Dom34 (Saito et al., 2013) and the message itself can be degraded through the action of the exosome, which is recruited through interactions with the SKI complex (Klauer and van Hoof, 2012).

No-go decay (NGD) is a quality control pathway responsible for degradation of mRNAs that cause ribosomes to stall during translation. This can occur when a ribosome encounters a very stable structure in the open reading frame (ORF) of the mRNA (Harigaya and Parker, 2010). Similarly to NSD, this pathway facilitates the release of ribosomes through the function of HBS1 and DOM34. This promotes endonucleolytic cleavage of the message by unidentified nucleases. It is unclear whether HBS1-DOM34 stimulates the nuclease activity in some way or whether the removal of the ribosome is sufficient to allow access to the mRNA (Doma and Parker, 2006; Shoemaker et al., 2010). Once the mRNA is cleaved, the fragments possess unprotected ends, which can be processed by XRN1 and the exosome.

mRNA translation is intricately linked to decay

While the quality control pathways are clearly dependent on translation for their function, the interplay between translation and decay of normal messages is much less clear. Translation makes use of many of the same features of the mRNA that are relevant to decay, such as the 5’ cap and 3’ poly(A) tail and their associated proteins (Hinnebusch and Lorsch, 2012). Translation can be divided into three separate processes: initiation, which is the recruitment of a ribosome to the mRNA; elongation, which is the progressive assembly of a protein molecule; and finally termination, which is the removal of the protein product and the ribosome from the mRNA. The process entails enormous complexity, with numerous regulatory and accessory factors at each step, in addition to the ribosome itself, which in eukaryotes contains 4 RNAs and over 70 proteins. Regulation of this process has traditionally been thought to occur primarily during initiation, though there is some emerging evidence that regulation may also occur during elongation and termination (Sonenberg and Hinnebusch, 2009).

In eukaryotes, the translation cycle begins with loading of the small 40S ribosomal subunit with an initiator Met tRNA by the action of eukaryotic initiation factors eIF1 and eIF2. The ribosomal subunit is then bound by eIF3 to make the 43S pre-initiation complex. eIF3 serves to prevent premature association of the small subunit with the large one and to facilitate its recruitment to mRNAs (Jackson et al., 2010). Meanwhile, the mRNA is bound by the eIF4 complex, made up of the eIF4E cap-binding protein, the eIF4A RNA helicase, and the eIF4G structural scaffold. Assembly of this complex on the mRNA allows for interactions with eIF3 and recruitment of the pre-initiation complex to the 5’ end of the message. At this point, the pre-initiation complex can scan along the 5’ end of the mRNA until it finds an appropriate AUG initiation codon. Once it is positioned at the start site, eIF5 acts to facilitate the recruitment of the large ribosomal subunit, which is bound by eIF6, a factor that serves to prevent premature ribosome association and aids in recruitment (Jackson et al., 2010). The subunits are joined and the initiation factors are released, allowing the ribosome to begin elongation and the initiation factors to repeat the process with a new ribosome.

Interactions of both eIF4E and eIF4G are critical to the interplay between translation and decay. eIF4E strongly binds and protects the 5’ cap, thus inhibiting 5’ decay. Its removal is thought to be regulated as a potential rate-limiting step for the pathway (von der Haar et al., 2004). eIF4G is a scaffold that interacts with many proteins involved in both processes, with documented connections to PAB1 at the 3’ end of the message, in addition to its role in coordinating the members of the eIF4 complex (Tarun and Sachs, 1996). It has also been found to interact with many different accessory factors involved in various facets of decay (Rajyaguru et al., 2012). This observation is part of the basis of the closed-loop model of translation, where the 3’ ends of messages are found close to the 5’ end, facilitating stimulation of translation by PAB1 and allowing efficient recycling of ribosomes while protecting the transcripts from decay (Jacobson, 1996). These claims are supported by evidence that inhibition of translation initiation leads to accelerated decay of messages.

Elongation is conceptually a simpler process, with eukaryotic translation elongation factor (eEF) 1 facilitating recruitment of tRNAs to the ribosome as it progresses, allowing for decoding and protein assembly. The other elongation factor, eEF2, allows for movement of the ribosome along the mRNA in a ratcheting fashion (Dever and Green, 2012). The complexity in this process comes from the difficulty in matching the correct tRNA to the codon currently in the amino-acyl tRNA site of the ribosome (A-site). Recruitment of tRNAs is a stochastic process, requiring exquisite sensitivity from the ribosome to rapidly incorporate correct matches and reject incorrect ones (Rodnina and Wintermeyer, 2001). The effects of perturbations at this step on decay are unclear: inhibiting elongation with cycloheximide has led to stabilization of mRNAs (Beelman and Parker, 1994), but inhibiting elongation by recruitment of translational repressors such as DHH1 has led to destabilization (Sweet et al., 2012). It should be noted that both experimental approaches have caveats, as the drug treatment is likely to have pleiotropic effects and the tethering of DHH1 may promote occurrence of abnormal interactions.

Translation termination occurs when a ribosome reaches the termination codon of an mRNA. In this case, no tRNA is available to match the stop codon, so the A-site must be filled by eukaryotic release factor (eRF) 1, which acts with its cofactor eRF3 to stimulate release of the protein product and disassembly of the ribosome through the ribosome recycling pathway (Dever and Green, 2012). The termination factors have been shown to interact with PAB1 and UPF1, and thus may play an important role in decay, especially in NMD (Ivanov et al., 2008). Perturbations at this step, such as mutations that change the conformation of the A-site or changes in the concentrations of the release factors, typically lead to read-through translation, where a non-cognate tRNA is used to decode the stop codon and then translate on past the expected site of termination (Bertram et al., 2001).

Mechanism is insufficient to explain decay rate

Despite the fact that normal mRNAs are degraded by a common decay pathway, turnover rates for individual yeast mRNAs differ dramatically, with half-lives ranging from <1 minute to 60 minutes or greater (Coller and Parker, 2004). Specific modes of regulation are not well known, and it is postulated that features of the mRNA itself or the composition of its associated proteins may lead to differences in entering the mRNA decay pathway. Currently, some sequence and/or structural elements located within 5’ and 3’ UTRs have been implicated in contributing to the decay of a subset of mRNAs (Geisberg et al., 2014; Lee and Lykke-Andersen, 2013; Muhlrad and Parker, 1992). In yeast, the primary example of this type of regulation is the effect of the PUF proteins, which promote deadenylation of messages containing specific sequences that these proteins recognize and bind. However, PUF proteins are thought to regulate only 10% of yeast genes, thus failing to account for the wide variety of half-lives in the transcriptome. In higher eukaryotes, miRNA-mediated complexes may function in the same way, potentially impacting a large fraction of the genome, but the premise holds that these features regulate mRNA stability predominantly in a transcript-specific manner (Geisberg et al., 2014). Therefore, it seems likely that additional and more general features which act to modulate transcript stability could exist within mRNAs.

Codon bias and optimality

One such feature of mRNAs has been suggested through experiments relating to codon usage (Hoekema et al., 1987). Hoekema et al. showed that synonymous substitution of minor codons (also called rare codons) into the highly expressed PGK1 gene would lead to a reduction in protein output. Part of the reduction came from reduced translational efficiency caused by the substitutions, but changes in mRNA steady-state levels contributed as well. Rare codons are defined as codons that occur infrequently in the genome and have been ascribed a variety of functions in many systems, though the relationship between rarity of codons and their effects is best described in bacterial systems (Plotkin and Kudla, 2011). The concept that codon usage could lead to effects on translation was first elucidated by research in codon usage bias, which showed that rather than a consistent pattern of usage across the genome, different codons were preferentially used in different genes (Grantham et al., 1981; Ikemura, 1985; Sharp and Li, 1987). Codon usage bias was quickly characterized to affect the speed and accuracy of translation, as well as levels of gene expression (Akashi, 1994; Sharp and Li, 1986). Over time, these attributes were ascribed to a broader concept, known as codon optimality.

Codon optimality is a concept related to codon usage bias, born of the understanding that the frequency of occurrence and effects on translation and other processes need not always be linked (Pechmann and Frydman, 2013; dos Reis et al., 2004). Conceptually, codon optimality is a scale that reflects the balance between the supply of charged tRNA molecules in the cytoplasmic pool and the demand of tRNA usage by translating ribosomes, representing a measure of translation efficiency. Specifically, optimal codons are postulated to be decoded faster and more accurately by the ribosome than non-optimal codons (Akashi, 1994; Drummond and Wilke, 2008), which are hypothesized to slow translation elongation (Novoa and Ribas de Pouplana, 2012; Tuller et al., 2010).

Codon optimality has been shown to play an important role in a wide variety of processes related to translation, modulating factors such as expression level, translation elongation rates, kinetics of protein folding, and translational accuracy (Akashi, 1994; Hudson et al., 2011; Kriško et al., 2014; Novoa and Ribas de Pouplana, 2012; Pechmann and Frydman, 2013; dos Reis et al., 2004; Zhou et al., 2009). Further, there has been some work showing that codon selection is under significant evolutionary pressure, with cells preferring to express tRNAs at levels tightly coordinated with relative demands of codon usage (Doherty and McInerney, 2013; Yona et al., 2013). The link between codon usage and translational effects has been well documented, but the effects on decay, a process intricately tied to translation, have not been explored in depth.

CHAPTER 2: RARE CODON ANALYSIS

Position of rare codons changes effects on decay

In our lab’s previous work, we showed that inclusion of a cluster of rare arginine codons within the open reading frame (ORF) of a reporter mRNA dramatically enhanced its turnover (Hu et al., 2009; Sweet et al., 2012). This destabilization affected both the deadenylation and decapping steps of the major decay pathway.

We additionally showed that the destabilization was not dependent on known components of several quality control pathways.

It was clear that the presence of rare codons leads to destabilization of the reporter mRNA used in those experiments, but the mechanism remained a mystery.

To get more insight into a possible mechanism, we sought to characterize the effect further. Guided by evidence that some mRNA decay pathways (specifically NMD) can exhibit strong position dependence, we created a series of constructs in which an identical rare codon stretch was inserted at several different locations in the mRNA: 5%, 25%, 50%, 63%, 77% (identical to the original construct), and 94% of the way into the reading frame. These constructs are referred to as RC 5, RC 25, RC 50, RC 63, RC 77, and RC 94, respectively, and a control construct without the rare codons is referred to as –RC (diagram of codon insertion presented in Figure 2). This allowed us to test a spectrum of reporters that varied only in the position of the rare codon stretch.

The half-lives of the reporters were tested by placing the reporter mRNAs into a plasmid under the control of the GAL UAS, a system that allows for half-life determination through transcriptional shut-off. The GAL-controlled mRNAs are highly expressed in the presence of galactose, but strongly and quickly repressed upon transferring the cells into a glucose-rich medium. We can then collect samples throughout a time course and evaluate the levels of mRNA remaining in the absence of transcription to calculate a decay rate and a half-life. Performing the experiment in wild-type cells with all of the reporters (Figure 3) demonstrated that the reporters do indeed exhibit a dependence on position, with stretches of rare codons inserted later in the message producing greater decreases in half-life.
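The half-life calculation from a shut-off time course can be sketched as follows. This is a minimal illustration assuming simple first-order decay, N(t) = N0 * exp(-k * t), fitted by least squares on log-transformed signal; the function name and the example values are hypothetical and are not taken from the experiments described here.

```python
import math


def half_life_from_time_course(times_min, mrna_levels):
    """Estimate the decay constant k and the half-life from a shut-off time course.

    Assumes first-order decay, so ln(level) is linear in time with slope -k.
    The slope is obtained by ordinary least squares.
    """
    if len(times_min) != len(mrna_levels) or len(times_min) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(level) for level in mrna_levels]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    k = -sxy / sxx  # decay constant, per minute
    if k <= 0:
        raise ValueError("signal does not decay over the time course")
    return k, math.log(2) / k


# Hypothetical normalized northern signal that halves every 5 minutes.
times = [0, 5, 10, 20, 40]
levels = [1.0, 0.5, 0.25, 0.0625, 0.00390625]
k, t_half = half_life_from_time_course(times, levels)
```

In practice the signal would first be normalized to a loading control, and late time points near background would be excluded before fitting.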

This was surprising for two main reasons. First, the other pathway that displays similar behavior, NMD, has the opposite polarity: stop codons earlier in the message produce greater decreases in half-life for that pathway. Second, it was entirely unclear why rare codons inserted later in the message would lead to greater destabilization. This called into question the previously established results (Hoekema et al., 1987), as their methodology involved a series of reporters with an increasingly long stretch of rare codons at the 5’ end of the message that produced increasing destabilization of the message. Their interpretation was that more rare codons added to a message led to more destabilization. However, if the position of the rare codons is important to the extent of destabilization, the destabilization reported in that work may be due to introduction of rare codons later in the message, rather than the increasing numbers of the rare codons.

Destabilization by rare codons depends on translation

To further characterize this behavior, we verified its dependence on translation. This allowed us to ascertain that the observed effect is due to the rare nature of the codons, as alternate explanations for destabilization could include changes in properties of the mRNA itself, such as structure or GC content. To do this, we inserted a strong stem-loop structure into the 5’ UTRs of several of the RC constructs. This strong structure has been shown to reduce translation of the message to below 1%, likely by blocking ribosomal scanning. With a steady-state analysis, we showed that whereas the rare codon constructs without the stem-loop demonstrated a reduction in accumulation (stemming from a reduction in half-life) for the construct with rare codons positioned late in the message, the rare codon constructs with the stem-loop showed little difference in accumulation between the constructs (Figure 4). As expected, the stem-loop-containing constructs are expressed at a lower level than their well-translated counterparts, due to the typical coupling between translation and stability.

Decay of reporters occurs through the major decay pathway

With the translational dependence of this phenomenon established, we tested these constructs on a high-resolution acrylamide northern gel to verify that deadenylation and decapping are both being impacted, as would be expected if the rare codons affected the major decay pathway rather than triggering a quality control pathway. Rapid decapping without deadenylation could be a tell-tale sign of a quality control pathway such as NMD, which can uncouple decapping from deadenylation. This experiment indicated that the rare codon-bearing messages do appear to undergo both deadenylation and decapping faster than the control (Figure 5), suggesting that this is likely due to regulation of the major decay pathway. To further exclude the possible activation of other quality control pathways, we tested the levels of these reporters in cells deleted for key proteins in those pathways: upf1∆ for NMD and dom34∆ for NGD. In both cases, the polarity remained intact (though somewhat diminished in the dom34∆ strain), solidifying the idea that this phenomenon was due to the action of the major decay pathway (data not shown).

Having determined that these differences are most likely due to the action of the major decay pathway, we tested the stability of the constructs in a series of deletion mutants to find the members of the pathway that are responsible. We found that deletion of the decapping enzyme DCP2 (Figure 6) leads to a situation where the mRNAs bearing rare codons are less stable than the control (though much more stable than in wild-type cells), but the position of the rare codons no longer dictates the degree of destabilization. This indicated to us that the polarity was due to the action of the 5’-3’ decay pathway, as in the absence of DCP2, the normally minor 3’-5’ decay pathway is thought to become the main method of decay. Further narrowing down the factors involved in the 5’-3’ decay pathway, we found the constructs behaved similarly in dhh1∆ cells (not shown, very similar to dcp2∆ below). By contrast, in cells deleted for other decay cofactors (specifically lsm1∆ and pat1∆), constructs were stabilized overall, but retained the position dependence (data not shown, similar to WT above). We concluded that DHH1 acts on DCP2 independently of the other known decapping cofactors to create the position dependence in these reporters.

Ribosome association of constructs remains unchanged

To explain the mechanism behind the increased destabilization of mRNAs with late rare codons, and presumably late disruptions in translation, we hypothesized that the clearance of these messages may be caused by accumulation of slowed ribosomes. We termed this the “traffic jam” model, where areas of slow translation late in the message would lead to accumulation of a larger number of ribosomes on the message than areas of slow translation early in the message, since they could just pile up onto the message before the slowdown. To test this theory, we analyzed the polysomal association of these mRNAs to ascertain whether the ones with the late rare codons would indeed harbor more ribosomes than the ones with early rare codons or without rare codons at all. This was done by sucrose gradient fractionation, which allows separation of cellular complexes by weight. This in turn allowed us to estimate the number of ribosomes that are associated with a message by looking for the message in fractions whose weight corresponds to clusters of a certain number of ribosomes. We found that the mRNAs all appear to settle in similar fractions, suggesting that the number of ribosomes associated with each is similar (Figure 7).

The caveat is that they all settle in heavy ribosomal fractions, where resolution becomes poor beyond about 6 ribosomes. It is possible that the numbers of ribosomes are different on the messages, but the differences are unlikely to be dramatic, as we do not see movement into extraordinarily deep fractions of the gradient, which can be seen in cases of heavy ribosome accumulation (Sweet et al., 2012).

These experiments with rare codons gave us very strong indications that there was an unexplored relationship between codon usage and mRNA half-life. Codon usage has been implicated in some translational regulation, allowing highly expressed genes to translate quickly and efficiently or allowing time for proteins to fold when necessary. A role in mRNA decay would introduce a new and wide-reaching role for codon usage: it may help establish and influence mRNA half-lives, which, taken together with its role in translation, would give it a dual role in the control of gene expression.

CHAPTER 3: GENOMIC ANALYSIS

Determination of half-lives by RNA-seq

Extending the analyses described above, we turned to a whole-transcriptome approach to analyzing the role of codon usage in mRNA decay regulation. We started by obtaining mRNA half-lives for as many genes in yeast as possible. We chose to do our own decay measurements rather than relying on previously published data sets due to concerns about the effects of deadenylation on the half-lives presented in many studies. Measuring global mRNA decay rates using methods that either enrich for poly(A)+ RNA from total RNA samples or synthesize complementary DNA (cDNA) using oligonucleotides annealed to the poly(A) tail may fail to capture important information for several reasons. Although it is firmly established that deadenylation is the rate-limiting step in mRNA turnover, we and others have observed that specific mRNAs persist in cells as relatively long-lived deadenylated species (Hu et al., 2009; Muhlrad et al., 1995). For such transcripts, decapping and subsequent decay are delayed and decapping becomes the rate-defining step for mRNA degradation. Moreover, some mRNAs may contain structures that impede poly(A) tail function (Geisberg et al., 2014). Lastly, the overall level of information gained may vary with the level of poly(A) enrichment achieved in the protocol used, creating further uncertainty. With this in mind, we sought to determine how prevalent these phenomena are on a transcriptome-wide level. For this purpose, we performed an experiment similar to the GAL transcriptional shut-off described above, except that the rapid repression of transcription was achieved across the whole genome by inactivation of RNA polymerase II (Nonet et al., 1987). A set of samples across a time course was collected, similar to standard shut-off experiments. At each time point, libraries were prepared from either oligo-dT selected mRNAs or rRNA-depleted whole cell RNA and subjected to Illumina sequencing (see experimental procedures).

This approach allowed us to compare the half-lives of species captured by oligo-dT selection (referred to as poly(A)+) with the half-lives of total mRNA, calculated from the rRNA-depleted samples (Figure 8A). Remarkably, the vast majority (92%) of transcripts for which we could confidently calculate half-lives (3969) had longer half-lives when the rRNA-depleted libraries were analyzed relative to the half-lives determined from poly(A)-selected libraries (Figure 8B and C). In fact, a majority of transcripts demonstrated half-lives that were more than twice as long when calculated from total mRNA than from poly(A)+ mRNA. It is important to note that not all of these transcripts need to exist as completely deadenylated RNAs. Oligo-dT selection typically uses resins bound to oligonucleotides of dT around 18 nt in length. mRNAs with poly(A) tails shorter than that length may not be captured efficiently, and so may be lost from analysis of the poly(A)+ pool without undergoing complete deadenylation. These data do indicate that mRNA half-lives determined from poly(A)+ data sets may give skewed values for many genes. This observation expands upon previous experiments that showed certain mRNAs, such as PGK1, could remain as a largely deadenylated species for extended periods, visible on high-resolution northern gels in shut-off experiments (Muhlrad et al., 1995). Our observations here show that this is not an isolated phenomenon, but actually a typical mode of regulation, where mRNAs may not undergo decapping immediately following deadenylation. Going forward, we analyzed mRNA half-lives calculated from the total mRNA data to avoid complications of interpreting differential effects of deadenylation.

The search for features correlating with decay

With this data in hand, we attempted to identify sequence motifs that might dictate stability or instability. We used the MEME suite (Bailey et al., 2009) to search through the 5’ UTRs, ORFs, and 3’ UTRs of the most stable 10% of all mRNAs in an attempt to find conserved motifs that may explain their stability. We repeated the analysis with the 10% least stable of all mRNAs to look for destabilizing elements in the same fashion. We concluded in both cases that there did not appear to be any conserved sequence motifs shared among the most or least stable groups of mRNAs.

Similarly, we analyzed features like length, abundance, and GC content of the most stable and least stable groups. Of these, the abundance was the only one that showed a notable difference between the groups, with stable mRNAs being more abundant than their unstable counterparts – median FPKM was 87 for 10% most stable and only 30 for the 10% least stable. This observation served as a control and confirmed the ab initio assumption that the rate of decay of mRNAs contributes strongly to establishing overall level of expression. This observation is not indicative of a novel relationship, as half-life and abundance are not independent variables. In sum, we were not able to find significant correlations between these general mRNA features and half-life.

Searching for other features, we returned to the observations we had previously made with our rare codon constructs. Thus, we inspected our transcriptome-wide mRNA half-life data to determine whether codon content within ORFs could affect mRNA stability. To do so, we began by analyzing the extent of correlation between occurrence of individual rare codons and changes in stability. We found that occurrence of the very rare codons individually did not seem associated with a broad reduction in stability. To address this problem, we analyzed the rare codons as a group. As a group, rare codon occurrence did correlate with reduction of mRNA stability, but exhibited inexplicable variability based on the codons included in the group. We hypothesized that if rare codons caused destabilization of messages, then as we included increasingly common codons in the test group, the significance of association with half-life reduction would decrease. Instead, the significance fluctuated as we included more codons, rising with the inclusion of some and falling with the inclusion of others. This contradicted our hypothesis that rarity of codons was predictive of mRNA stability.

To form a new hypothesis, we took an unbiased approach: we determined whether mRNAs enriched in any individual codon demonstrated greater or lesser stability. In general, we defined mRNAs as stable if they show a positive 2-fold difference from the median (~20 min), and unstable if they show a negative 2-fold difference from the median (~5 min). For each of the 61 translated codons, we calculated a frequency of occurrence in each mRNA, resulting in a list of 3969 frequencies, which we compared to our list of mRNA half-lives. A Pearson correlation calculation was used to generate an R-value, representing the level of correlation between the occurrences of that codon and the half-lives. We refer to this metric as the Codon occurrence to mRNA Stability Correlation coefficient (CSC). Repeating the calculation for all the codons, we could compare the CSC values for all codons to each other (Figure 9). It is clear from that comparison that some codons are associated with stabilization of mRNAs and others are associated with destabilization. This indicates that some codons occurred preferentially in stable mRNAs while others occurred preferentially in unstable mRNAs (overall p-value = 1.496e-14, permutation p-value < 10^-4). For example, the GCT alanine codon was highly enriched in stable transcripts as defined by our RNA-seq analysis, while its synonymous codons, GCG and GCA, were preferentially present in unstable transcripts (Figure 9). Approximately one-third of all codons were over-represented in stable mRNAs, while the remaining two-thirds appeared to predominate in unstable mRNAs. As a consequence of the large dataset and significance of the observed correlation, these data strongly suggest that codon usage influences mRNA degradation rates.
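The CSC calculation described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used for this work: the function names, the definition of codon frequency as a fraction of sense codons in the ORF, and the three toy ORFs with their half-lives are all assumptions made for the example.

```python
import math

BASES = "TCAG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
# The 61 sense (translated) codons of the standard genetic code.
SENSE_CODONS = [a + b + c for a in BASES for b in BASES for c in BASES
                if a + b + c not in STOP_CODONS]


def codon_frequencies(orf):
    """Fraction of each sense codon among all sense codons of one in-frame ORF."""
    counts = dict.fromkeys(SENSE_CODONS, 0)
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3].upper()
        if codon in counts:
            counts[codon] += 1
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in SENSE_CODONS}


def pearson_r(xs, ys):
    """Pearson correlation coefficient; NaN if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def csc_table(orfs, half_lives):
    """Map each sense codon to the correlation of its frequency with half-life."""
    genes = [g for g in orfs if g in half_lives]
    freqs = {g: codon_frequencies(orfs[g]) for g in genes}
    hl = [half_lives[g] for g in genes]
    return {c: pearson_r([freqs[g][c] for g in genes], hl) for c in SENSE_CODONS}


# Toy input (hypothetical sequences and half-lives in minutes).
toy_orfs = {"g1": "ATGGCTGCTGCTTAA", "g2": "ATGGCTGCGGCGTAA", "g3": "ATGGCGGCGGCGTAA"}
toy_half_lives = {"g1": 30.0, "g2": 12.0, "g3": 4.0}
csc = csc_table(toy_orfs, toy_half_lives)
```

With the toy input, GCT receives a positive value and GCG a negative one, mirroring the direction of the alanine example in the text; codons whose frequency does not vary across the toy genes yield NaN.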

Codon usage correlates with mRNA stability

To gain insight into this phenomenon, we analyzed the overall relationship between our CSC metric and existing methods for evaluating the optimality of codons. A previous publication (dos Reis et al., 2004) had established a metric for the translational efficiency of codons. This metric was termed the tRNA Adaptation Index (tAI). It attempts to describe the availability of decoding potential in the cell: it is based on the tRNA gene copy number, which was found to be a good proxy for tRNA concentration (Percudani et al., 1997), and a factor that accounts for the strength of interaction between the codon and anticodon. This metric is meant to reflect the efficiency of tRNA usage by the ribosome. To keep consistency with recent literature regarding optimality, we defined our optimal and non-optimal codons based on the definitions presented by Frydman and colleagues, which are based on a tAI cutoff around 0.5 combined with an accounting of the over- and under-representation of certain codons in the genome (Figure 10A) (Pechmann and Frydman, 2013; Zhou et al., 2009). Strikingly, we found that codons associated with stable or unstable mRNAs nearly perfectly mirrored their assignment as optimal or non-optimal, respectively (Figure 10B). Direct comparison between our CSC metric and the tAI metric revealed very good overall agreement between these values (Figure 13A; R = 0.753, p-value = 2.583e-12, permutation p-value < 10^-4), suggesting that there is a significant link between the translation efficiency of a codon and its ability to stabilize mRNA.
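A permutation p-value of the kind quoted for the CSC versus tAI comparison can be sketched as follows. The exact permutation scheme is not described in this section, so the sketch assumes the simplest version: one variable is shuffled repeatedly and the fraction of shuffles giving a correlation at least as extreme as the observed one is reported. The function names and toy values are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient; NaN if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of xs and ys.

    Returns the fraction of shuffles of ys whose |R| is at least the observed |R|.
    A result of 0 should be reported as p < 1 / n_permutations.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / n_permutations


# Toy per-codon values (hypothetical): a tAI-like score and a CSC-like score.
toy_tai = [i / 10 for i in range(12)]
toy_csc = [2 * x + 1 for x in toy_tai]
p_perm = permutation_p_value(toy_tai, toy_csc, n_permutations=2000, seed=1)
```

Under this scheme, a reported value below 10^-4 would correspond to no shuffle out of 10,000 matching the observed correlation.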

It is important to note that while codon optimality is somewhat associated with genomic codon usage (Figure 11), both commonly occurring and uncommonly occurring codons can be optimal or non-optimal. Overall, rare codons tend to be non-optimal, but important exceptions exist in the form of both optimal uncommon codons and non-optimal common ones.

Further, the relationship between optimal codon content and mRNA half-life is independent of the method used to determine half-life. Our method, while based on a classical and dependable method of achieving transcriptional repression, has some downsides that could in theory affect the half-life calculations. For example, the rpb1-1 mutation itself could affect mRNA decay rates, or shifting the cells to a non-permissive temperature may disturb the pathway in some way. To mitigate these risks, we repeated our analysis of codon usage vs. mRNA half-life using mRNA decay rates obtained by a different laboratory using a different method. In contrast to our own, these data were obtained with a steady-state approach using metabolic labelling that minimally perturbs the cell and is completely distinct from our method (Miller et al., 2011). Both datasets show very similar final numbers for the CSC metric, indicating that the method has not significantly skewed the calculations (Figure 12).

To determine if the codon optimality correlation was possibly masking other features that might actually be determining mRNA half-life (e.g. sequence content, GC percentage, or secondary structure), we reanalyzed our data after computationally introducing +1 and +2 frameshifts. In the analysis of these frameshifted ORFs, the positive correlation between codon content and stability disappears, arguing against these other variables as determinative (Figure 13B: R = -0.127, p-value = 0.3303, permutation p-value = 0.8847; and Figure 13C: R = -0.288, p-value = 0.0242, permutation p-value = 0.0012).
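The computational frameshift control can be illustrated with a short sketch. It assumes that a frameshift is introduced simply by dropping one or two leading nucleotides and re-reading the remaining sequence in triplets; how out-of-frame stop triplets and trailing nucleotides were handled is not stated here, so the choices below (stops kept, incomplete trailing triplet dropped) are assumptions, as are the function names.

```python
from collections import Counter


def frameshifted_codons(orf, shift):
    """Split an ORF into triplets after dropping `shift` leading nucleotides.

    shift=0 reads the annotated frame; shift=1 and shift=2 read the +1 and +2
    frames. Any incomplete trailing triplet is discarded.
    """
    if shift not in (0, 1, 2):
        raise ValueError("shift must be 0, 1 or 2")
    seq = orf.upper()[shift:]
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def triplet_frequencies(orf, shift):
    """Frequency of each triplet in the chosen reading frame of one ORF."""
    triplets = frameshifted_codons(orf, shift)
    counts = Counter(triplets)
    total = len(triplets)
    return {t: n / total for t, n in counts.items()} if total else {}


# Hypothetical short ORF read in all three frames.
toy_orf = "ATGGCTGCGTAA"
frame0 = frameshifted_codons(toy_orf, 0)
frame1 = frameshifted_codons(toy_orf, 1)
frame2 = frameshifted_codons(toy_orf, 2)
```

Because the shifted frames preserve nucleotide composition and local sequence context while scrambling codon identity, repeating the correlation with half-life on the shifted triplet frequencies separates codon-level effects from sequence-level ones.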

As shown above, computational analysis of our global mRNA stability data revealed a relationship between codon occurrence and mRNA half-life. Three possibilities emerge from this analysis. First, stabilizing or destabilizing effects may come from clusters of optimal or non-optimal codons, respectively, as the rare codon constructs would suggest. Second, there may be a small number of key codons which can stabilize or destabilize messages on their own; these would likely be the codons that fall at the extremes of the CSC metric. Third, simple codon content ratio may be at work: higher prevalence of optimal codons leads to stabilization and higher prevalence of non-optimal codons leads to destabilization. To evaluate the first possibility, we computationally analyzed mRNA sequences to identify clustering patterns of non-optimal codons, looking for effects similar to those seen with the rare codon cluster. However, we found that there is little tendency for non-optimal codons to cluster. Indeed, we found that non-optimal codons tended to be evenly distributed within each individual transcript. For those few mRNAs where non-optimal codon clusters could be found, those clusters did not appear to be predictive of particularly short half-lives. Thus, we ruled out codon clustering as a primary mechanism of regulation through optimality.

Codon usage in genome falls into distinct patterns

To distinguish between the other two possibilities, we analyzed the pattern of codon usage across the transcriptome to identify specific trends across genes. To do this, we evaluated the relative codon composition of each mRNA and then applied clustering analysis to identify similar patterns of usage. This analysis revealed that there are several different mRNA classes that differ strikingly in preferred codon usage within the transcriptome (Figure 14). A limited number of distinct patterns emerged, suggesting that these groups may represent cohorts of co-regulated genes that can respond to changes in tRNA availability. Possibilities include changes in the cell cycle, such as switching between states of proliferation and quiescence, which has recently been shown to produce a switch in tRNA expression patterns (Gingold et al., 2014); activation of stress response pathways, which have been shown to affect tRNA modification pathways (Chan et al., 2012); or changes in nutrient availability, which have been shown to change subcellular distribution of tRNAs (Whitney et al., 2007).

Within these classes, two stand out because they specifically prefer either optimal or non-optimal codons (Figure 15A & B). Further, this preferred usage correlates well with overall transcript stability (Figure 15C). The average half-life of the genes in these two groups differs by approximately 2-fold. We concluded that this was most likely due to the widespread preference towards inclusion of optimal codons in one group. This supported the hypothesis that inclusion of a large number of optimal codons, rather than any individual codon, led to a stabilizing effect in this group. Closer inspection of several stable mRNAs further supported that conclusion, as none of these was enriched in any particular codon, but an overwhelming proportion (>80%) of codons fell into the category of optimal (Figure 15D). By contrast, unstable mRNAs appeared to contain a mix of optimal and non-optimal codons. These were marked by a lack of enrichment for any identifiable group of codons at all (Figure 15E). These analyses demonstrated that in this set of mRNAs, the stable mRNAs are biased towards harboring predominantly optimal codons and the unstable mRNAs include non-optimal codons in greater number, though the specific codon identities vary between individual transcripts. Applying the conclusion that optimal codon content is the defining factor for stabilization, we divided mRNAs into groups based on their optimal codon content. mRNAs with less than 40% optimal codons were found to be typically unstable, with a median half-life close to 5 minutes. In contrast, mRNAs with 70% optimal codon content or greater were found to be much more stable on average, with a median half-life of 17.8 minutes (Figure 15F). It should be noted that each of those groups represents a relatively small fraction of the genome (about 6% and 10%, respectively), with the majority of mRNAs falling in between.
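The grouping by optimal codon content can be sketched as below. The 40% and 70% thresholds come from the text; everything else is an assumption for illustration, including the function names and, in particular, the placeholder set of "optimal" codons, which is not the actual classification of Pechmann and Frydman and would need to be replaced with it.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def optimal_codon_fraction(orf, optimal_codons):
    """Fraction of sense codons in an in-frame ORF that belong to `optimal_codons`."""
    codons = [orf[i:i + 3].upper() for i in range(0, len(orf) - 2, 3)]
    sense = [c for c in codons if c not in STOP_CODONS]
    if not sense:
        raise ValueError("ORF contains no sense codons")
    return sum(c in optimal_codons for c in sense) / len(sense)


def optimality_group(fraction):
    """Assign an mRNA to one of the optimal-content groups used in the text."""
    if fraction < 0.40:
        return "<40% optimal"
    if fraction >= 0.70:
        return ">=70% optimal"
    return "intermediate"


# Placeholder set for illustration only; NOT the published optimal codon list.
PLACEHOLDER_OPTIMAL = {"GCT", "GGT"}

# Hypothetical ORF: ATG GCT GCT GGT GCG TAA -> 3 of 5 sense codons in the set.
toy_fraction = optimal_codon_fraction("ATGGCTGCTGGTGCGTAA", PLACEHOLDER_OPTIMAL)
toy_group = optimality_group(toy_fraction)
```

Median half-lives per group can then be computed from the grouped transcripts, which is the comparison summarized for the two extreme groups above.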

Optimal codons increase average ribosome density

As the primary impact of codon effects would be expected to be at the level of translation, we reasoned that effects similar to those described for stability should be visible for measures of translation. Currently, there are no direct genome-wide translation rate measurements, but one widely-used proxy is ribosomal profiling. In this assay, mRNAs engaged by ribosomes are digested with nucleases and the remaining protected fragments are analyzed to extrapolate a ribosomal density across the mRNAs. While this gives no information about the speed of ribosomal transit, it allows an estimation of the average ribosomal density for each mRNA.

Previous studies have found that messages known to be highly translated tend to correlate with higher ribosome occupancies (Ingolia et al., 2009).

Specifically, we analyzed a previously published ribosome profiling data set (Ingolia et al., 2009) from wild-type yeast cells and compared this to mRNA codon usage. Translational efficiency of each mRNA was calculated by normalizing the number of ribosome-protected fragments (RPF) per mRNA from profiling data to the total number of reads for each transcript from whole-cell RNA-seq analysis performed in parallel, generating a translational efficiency index (TEI). TEI values were then analyzed for correlations with codon usage, using the same algorithm as the CSC calculation described above, to create the Codon occurrence to TEI Correlation coefficient (CTC). This revealed a strong relationship between TEI and codon usage (Figure 16A).
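The TEI and CTC calculations can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in this work: the correlation is assumed to be a Pearson coefficient between the per-transcript frequency of a codon and the untransformed TEI, and all function names and toy read counts are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


def tei(rpf_reads, rna_reads):
    """Translational efficiency index: ribosome-protected fragments per RNA-seq read."""
    return rpf_reads / rna_reads


def codon_frequency(orf, codon):
    """Fraction of in-frame codons in the ORF equal to the given codon."""
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    return codons.count(codon) / len(codons)


def ctc(transcripts, codon):
    """Correlate codon frequency with TEI across transcripts.

    transcripts: list of (orf, rpf_reads, rna_reads) tuples (hypothetical layout).
    ASSUMPTION: Pearson correlation on untransformed TEI values.
    """
    freqs = [codon_frequency(orf, codon) for orf, _, _ in transcripts]
    teis = [tei(rpf, rna) for _, rpf, rna in transcripts]
    return pearson(freqs, teis)


if __name__ == "__main__":
    # Toy data only: TEI rises with GCT content and falls with GCA content.
    toy = [
        ("GCTGCTGCTGCT", 400, 100),
        ("GCTGCTGCAGCA", 200, 100),
        ("GCAGCAGCAGCA", 0, 100),
    ]
    print(ctc(toy, "GCT"), ctc(toy, "GCA"))
```

Repeating `ctc` for each of the 61 sense codons would give one coefficient per codon, which is the form of the values plotted against CSC.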

While this relationship was not identical to the one previously observed for mRNA stability, the association of optimal codons with positive effects and non-optimal codons with negative effects held true. Comparing our calculated CSC values to the CTC values, we observed a very strong relationship, demonstrating that the two measures are closely linked (Figure 16B). Accordingly, transcripts with high TEI scores were enriched in optimal codons while transcripts with low TEI scores predominantly harbored non-optimal codon triplets. These data provide support for the premise that codon optimality dramatically influences translation efficiency (Tuller et al., 2010) and argue that codon optimality influences mRNA translation and stability nearly identically, implicating codon usage in the linkage between the two processes.

As with the mRNA stability analysis, we sought to verify that these observations were not unique to the data set we chose; thus, we extended our analysis to include several more datasets found in the literature (Artieri and Fraser, 2014; Brar et al., 2012; Gerashchenko et al., 2012; Guydosh and Green, 2014; Ingolia et al., 2009; McManus et al., 2014; Zinshteyn and Gilbert, 2013). Plotting the CTC values for all of these data sets together (Figure 16C) demonstrates that all of these data sets are consistent with the data set used for our analysis.


Related genes appear correlated through codon usage

Consistent with the idea of co-regulation through codon usage, a previous analysis of mRNA stability in yeast revealed that the decay rates of some mRNAs encoding proteins that function in the same pathway or are part of the same complex were similar. Turnover of individual mRNAs appeared to be based on the physiological function and cellular requirement of the proteins they encode (Wang et al., 2002). We hypothesized that codon composition may provide a mechanism for the cell to coordinate the metabolism of transcripts expressing proteins of common function, in line with the grouping of genes into large blocks with similar usage. We assessed codon usage for genes whose protein products function in common pathways and/or complexes. We observed that mRNAs encoding the enzymes involved in glycolysis (n=10) had a similar and extraordinarily high proportion of optimal codons (mean=86%; Figure 17A). These transcripts were determined to be stable in the previous study and were confirmed to have a long half-life in our genome-wide analysis (median half-life=43.4 min). By contrast, mRNAs encoding polypeptides involved in pheromone response in yeast cells (n = 14) were all unstable in both studies (median half-life=5.6 min in our data) and harbored an average of only 43% optimal codons (Figure 17A).

Our analysis revealed that other groups of transcripts behave similarly. The stable large and small cytosolic ribosomal subunit protein mRNAs (n=70 and 54, respectively; median half-life=18.9 min and 20.2 min, respectively) demonstrated an average optimal codon content of 89% and 88%, respectively. However, mRNAs that encode ribosomal proteins functioning in the mitochondria are unstable (n=42; median half-life=4.8 min), consistent with the observation that they have 45% optimal codon content (Figure 17A & B). Other families of genes that have similar decay rates include those whose protein products are involved in ribosomal processing, tRNA modification, the TCA cycle, RNA processing, and components of the translational machinery (Figure 17 and data not shown). These data provide evidence that transcripts expressing proteins of related function are coordinated at the level of optimal codon content as well as decay rate, suggesting that these genes may have evolved specific codon contents as a mechanism to facilitate precise synchronization of expression based on their function in the cell.


In this section, we have established that optimal codon content is a general property of mRNAs that correlates significantly with half-life. It must be noted that the correlation we observe between optimal codon content and mRNA half-life is modest: the trend is clear, but there are many outliers where the optimal codon content does not match the observed half-life. While codon optimality can clearly influence decay rates, other factors will also play major roles in regulation of mRNA decay. Players such as translation initiation rate, 5’ UTR and 3’ UTR sequence, and RNA binding proteins work together to achieve the vast continuum of observable mRNA half-lives.


CHAPTER 4: EXPERIMENTAL VALIDATION

Changes in codon content lead to changes in stability

To experimentally validate the relationship observed in the computational analysis, we tested the effects of altering optimal codon content within an mRNA. We designed two complementary sets of reporters. In the first case, we started with a natural mRNA that has relatively few optimal codons and modified its coding sequence to contain a much higher percentage of optimal codons without strongly affecting other variables like GC content and secondary structure. At the same time, we took a natural mRNA with a high percentage of optimal codons and modified it to harbor many more non-optimal codons in the same fashion. We then expressed the two reporters in their native context (using their natural flanking sequences and promoters).

Specifically, we modified the codon content of the unstable LSM8 mRNA (half-life = 4.65 min) by making synonymous optimal substitutions in 52 of its 60 non-optimal codons. Similarly, we replaced the majority of optimal codons (108 of 113) within the coding region of the stable RPS20 mRNA (half-life = 25.3 min) with synonymous, non-optimal codons. This methodology ensured that the polypeptides encoded by these sequences were unchanged from the native form. Northern blot analysis of transcriptional shut-off experiments using rpb1-1 revealed that alteration of the codons within these two transcripts resulted in dramatic changes in their stability. Specifically, the half-life of LSM8 mRNA was increased approximately 4-fold as a consequence of the conversion of non-optimal codons into synonymous optimal codons in its ORF (half-life = 18.7 min; Figure 18A). In contrast, substitution of non-optimal for optimal codons within the stable RPS20 mRNA resulted in a sharp (10-fold) reduction in its stability (half-life = 2.5 min; Figure 18B). These experiments demonstrate that codon usage is a critical determinant of mRNA stability. To put these findings into context, a 10-fold change in half-life is similar to the changes seen when introducing nonsense mutations into a message to trigger the powerful NMD pathway in yeast (Zhang et al., 1995) and is at the maximal end of the regulation seen by other factors such as the PUF proteins (Olivas and Parker, 2000) in yeast. Thus, optimal codon content within an mRNA can strongly influence stability in a way that can be manipulated within the native context.
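For readers unfamiliar with how a half-life is extracted from a transcriptional shut-off time course, the following Python sketch shows one common approach. It assumes simple first-order decay and a least-squares fit of log-transformed signal against time; the fitting procedure actually used for the northern blot quantitation is not specified in this section, and the function name and example values are hypothetical.

```python
from math import log


def half_life_from_time_course(times_min, abundances):
    """Estimate an mRNA half-life (minutes) from a shut-off time course.

    ASSUMPTION: first-order decay, A(t) = A0 * exp(-k * t), fitted by ordinary
    least squares on ln(abundance) versus time. Half-life = ln(2) / k.
    """
    if len(times_min) != len(abundances) or len(times_min) < 2:
        raise ValueError("need at least two matched time points")
    if any(a <= 0 for a in abundances):
        raise ValueError("abundances must be positive to take logarithms")
    ys = [log(a) for a in abundances]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, ys)) / sxx
    if slope >= 0:
        raise ValueError("signal does not decay over the time course")
    return log(2) / -slope


if __name__ == "__main__":
    # Hypothetical normalized northern signals that halve every 5 minutes.
    print(half_life_from_time_course([0, 5, 10, 15], [100, 50, 25, 12.5]))
```

In practice the signal at each time point would first be normalized to a stable loading control before fitting.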

Regulation through codon usage dominates over UTR regulation

To further understand the relationship between regulation of mRNA stability by codon usage and regulation by methods described previously, e.g. 3’ UTR elements, we evaluated the behavior of high and low optimal codon content reporters in well-studied genomic contexts. To do this, we generated two synthetic open reading frames which encode the same 59 amino acid polypeptide, but differ in the optimality at each codon (SYN reporters). This was done to avoid any possible intrinsic regulation that may occur in the ORF of genes. The synthetic sequences have no similarity to any yeast genes at either the RNA or the protein level. We introduced the synthetic ORFs into a reporter bearing the 5’ and 3’ UTRs of MFA2, a well-studied mRNA which is rapidly degraded in the cell (half-life = 3.0 min), a phenomenon shown to be mediated, in part, by elements encoded within its 3’ UTR (LaGrandeur and Parker, 1999; Muhlrad and Parker, 1992). We also introduced the synthetic ORFs into a reporter with the 5’ and 3’ UTRs of PGK1, a well-characterized and stable mRNA (half-life = 25 min) (LaGrandeur and Parker, 1999; Muhlrad et al., 1995). When the stability of the four reporter mRNAs was measured by transcriptional shut-off analysis using the GAL expression system, the SYN transcripts encoded with optimal codons were found to be significantly more stable (~4-fold) than their counterparts bearing the non-optimal codons (Figure 19). In this experiment, we can see the influence of the regulatory flanking sequences on the stability of these messages: both reporters had about 1.5-fold longer half-life in the stabilizing context of PGK1 than in the destabilizing context of MFA2. However, the magnitude of regulation by optimal codon content of the ORF was much greater than that of the flanking sequences. While this was an extreme case with polar opposite optimal codon contents, it clearly made the point that regulation at this level is potentially more potent than the previously described methods of regulation in the UTRs.


Codon content affects the major decay pathway

Importantly, degradation of both the optimally and non-optimally encoded SYN reporter mRNAs was determined to occur through the 5’-3’ deadenylation-dependent major mRNA decay pathway, rather than the aberrant decay pathways. To establish this, we tested the decay of these reporters in a series of deletion mutants (Figure 20). Only the reporters with PGK1 context were utilized for these assays, as they produce better signal on a northern blot and behave the same as reporters with the MFA2 context. The non-stop decay pathway was not tested here, as it has the best-defined targets of all of the aberrant mRNA decay pathways and absolutely requires a message lacking a stop codon. The dom34∆ strain is not competent for no-go decay, which could result if a large number of non-optimal codons produced a block in translation similar to the stalling at strong mRNA structures. This strain does not show significant stabilization of the reporters, implying that the no-go decay pathway is not involved in the decay of these messages. The upf1∆ strain is not competent for nonsense-mediated decay, and a lack of stabilization here demonstrates that this pathway is not involved in the decay of these messages either. As a positive control for the major decay pathway, both ccr4∆ and dcp2∆ strains were tested to ensure that disruptions of the major decay pathway would stabilize the reporters.


Further, high-resolution northern analysis of the decay of these mRNAs confirmed that the rates of both deadenylation and decapping, the regulated steps in the major decay pathway, were affected as a consequence of changes in codon composition within the reporter ORFs (Figure 21). These data demonstrate that optimal codon content is a critical determinant of mRNA stability, influencing the rates of both deadenylation and decapping during turnover of the mRNA independently of the 5’ and 3’ UTRs. Regulation at the level of the UTRs can act in parallel with regulation by the optimal codon content of the ORF.

HIS3 reporter system allows for fine tuning of mRNA stability

In conjunction with regulation at the level of mRNA decay, we expected that optimal codon content should have effects on translation, as proposed by multiple previous studies on codon optimality. To evaluate these effects in vivo, we established a new reporter system which produces a well-characterized protein product encoded by different proportions of optimal codons. Specifically, we engineered the ORF of the HIS3 gene to contain either all optimal (HIS3 opt) or all non-optimal codons (HIS3 non-opt), with the wild-type HIS3 gene providing an intermediate point at 43% optimal codons (Figure 22A). The HIS3 gene was chosen because it encodes a protein with a well-characterized function that could be used to screen for expression by growth and has a relatively long ORF (220 amino acids) compared to our other synonymous mutation constructs, allowing us to effectively monitor ribosome association by sucrose density gradients (see below).

We then determined the mRNA decay rate of the three HIS3 constructs by transcriptional shutoff analysis using an rpb1-1 strain. Consistent with our previous results, it was observed that changing optimal codon content produced a dramatic effect on mRNA half-life (Figure 22B). Notably, the effect on HIS3 mRNA decay was commensurate with change in optimal codon content. The native HIS3 mRNA is more stable than average, especially given its relatively low optimal codon content. It has a half-life of 9.5 min, above the median of 7.3 min for the entire genome. For this experiment, the native UTRs and transcriptional context of the gene were maintained to avoid disrupting any regulatory elements which may be responsible for its relative stability. Decreasing the optimal codon content to nearly 0 produced a marked decrease in the half-life of the mRNA to 2 minutes. Conversely, increasing the optimal content to near maximum produced a large increase in the stability of the message, with a half-life of greater than 60 minutes. Thus, within this single native genomic context, we can achieve a full range of mRNA half-lives, from the minimum observed to beyond the scope of our measurements, without introducing any regulatory changes outside of the open reading frame and without altering protein sequence.

Codon content impacts translation beyond changes in mRNA

With the range of regulation of mRNA half-life and accumulation within this reporter system established, we assayed the effects of this regulation on translation in the system. To ascertain that protein output of the reporters was reduced as well as the mRNA levels, we monitored the protein output from the HIS3 construct with a high optimal codon content by western blotting and compared it to the inverted reporter. To account for changes in mRNA levels previously described, we normalized the amount of protein detected to the levels of mRNA for each reporter, as determined by northern blot. We observed that the non-optimal construct had four-fold less protein output than the optimal construct (Figure 22C). This confirmed our expectation that translation should be impacted by changes in optimal codon content in addition to changes at the mRNA expression level.
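The normalization of protein signal to mRNA level can be written out as a short Python sketch. The function names and the signal values below are hypothetical and chosen only so that the arithmetic reproduces a four-fold difference; they are not measurements from Figure 22C.

```python
def output_per_mrna(protein_signal, mrna_signal):
    """Protein output per unit of mRNA (western signal divided by northern signal)."""
    if mrna_signal <= 0:
        raise ValueError("mRNA signal must be positive")
    return protein_signal / mrna_signal


def relative_output(sample, reference):
    """Per-mRNA output of a sample relative to a reference construct.

    Both arguments are (protein_signal, mrna_signal) pairs in arbitrary units.
    """
    return output_per_mrna(*sample) / output_per_mrna(*reference)


if __name__ == "__main__":
    # HYPOTHETICAL arbitrary-unit signals, for illustration only.
    optimal = (80.0, 10.0)      # protein, mRNA
    non_optimal = (4.0, 2.0)    # less mRNA AND less protein per mRNA
    print(relative_output(non_optimal, optimal))
```

Dividing by the mRNA signal first is what separates an effect on translation from the effect on transcript abundance: without it, the lower mRNA level of the non-optimal construct alone would reduce the protein signal.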

At the level of mechanism, there were several possibilities for the reduction in protein output per mRNA molecule in the case of low optimal codon content. In broad terms, each molecule can either engage fewer ribosomes, with the rate of ribosomal transit roughly equivalent for both constructs, or the ribosomes can progress through the non-optimal codons slower than the optimal codons, which may or may not be combined with a slowdown in initiation to balance the numbers of ribosomes on the message. There has been significant controversy in the field surrounding the question of whether translation rates vary on different codons, with different labs reporting different results (Charneski and Hurst, 2013; Gardin et al., 2014; Qian et al., 2012). Based on the experiments with rare codons, we hypothesized that the number of ribosomes on the constructs should not vary, with the reduction in protein output then having to come from slower rates of translation elongation on the constructs with low optimal codon content.


To measure this, we evaluated the ribosome occupancy of the HIS3 mRNA constructs. Ribosome occupancy was monitored using sucrose gradients, followed by fractionation and northern blotting of the isolated fractions (Hu et al., 2009). In a critical validation of our hypothesis, it was observed that the ribosome occupancy was nearly identical for all three HIS3 reporter mRNAs (Figure 23), suggesting that each construct engaged a very similar number of ribosomes. In this case, the HIS3 construct is of appropriate size to resolve ribosome counts, so the ambiguity of the rare codon experiment is avoided (see above). The quantitation of the northern blots presented below further confirms that there is no significant shift in ribosome occupancy between the messages. Thus, given the lack of change in ribosomal association, we propose that the observed four-fold decrease in protein output is likely due to lower protein output per ribosome, implying a decrease in ribosome translocation rate on the construct encoded with low optimal codon content.

Affected step of translation is elongation

To directly determine whether ribosome translocation rate is indeed the affected step, we monitored ribosomal run-off of these two reporters. Ribosomal run-off measurements are performed by blocking initiation and monitoring the ribosomal occupancy of messages as ribosomes progress through elongation and termination. In the case of ribosomal run-off experiments in yeast, glucose deprivation can be used to induce rapid inhibition of translational initiation (Coller and Parker, 2005). Under these conditions, ribosomes will complete their elongation and termination cycles and a large fraction of messages that were engaged in polyribosomal complexes then transition to lighter parts of the gradient as they can no longer re-enter translation (Figure 24A). We monitored this transition for both our well-characterized HIS3 reporter system and for the previously identified endogenous mRNAs, RPS20 and LSM8, which naturally have very high (92%) and very low (45%) optimal codon content, respectively. We extracted mRNA-ribosome complexes before and after glucose deprivation, separated the material with a sucrose gradient, collected fractions, and monitored the presence of the mRNAs in each fraction by northern analysis.

Importantly, under normal conditions the ribosome occupancy of both of the HIS3 constructs was determined to be similar (Figure 24B & C), with the short RPS20 and LSM8 constructs engaging fewer ribosomes than the longer HIS3 mRNAs. However, upon inhibition of translation initiation and induction of ribosome run-off, a large fraction of the high optimal codon construct mRNA relocated to the light fractions of the gradient in the ribosome-free area, whereas the low optimal codon content HIS3 mRNA remained virtually undisturbed in the polyribosomal fraction (Figure 24B & C). This situation is mirrored in the endogenous messages, with RPS20 mRNA shifting significantly in its distribution upon inhibition of initiation, but LSM8 mRNA remaining largely associated with polyribosomes. We thus conclude that the reduction in output per mRNA is most likely due to a change in rate of elongation of the associated ribosomes. This is demonstrated by the inability of the mRNAs to efficiently transition to the non-translating pool upon induction of ribosomal run-off.
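One simple way to summarize a run-off experiment of this kind is to compare the share of an mRNA's northern signal found in polyribosomal fractions before and after initiation is blocked. The Python sketch below illustrates that comparison; it is not the quantitation used for Figure 24, and the function names, fraction indices, and signal values are hypothetical.

```python
def polysome_fraction(fraction_signals, polysome_indices):
    """Share of total signal that sits in the fractions designated as polyribosomal."""
    total = sum(fraction_signals)
    if total <= 0:
        raise ValueError("total signal must be positive")
    return sum(fraction_signals[i] for i in polysome_indices) / total


def runoff_shift(before, after, polysome_indices):
    """Decrease in polysome-associated share after initiation is blocked.

    A value near 0 means the mRNA stayed on polyribosomes (little run-off);
    a large positive value means it moved to the lighter fractions.
    """
    return (polysome_fraction(before, polysome_indices)
            - polysome_fraction(after, polysome_indices))


if __name__ == "__main__":
    heavy = (3, 4)  # HYPOTHETICAL: last two gradient fractions are polyribosomal
    # HYPOTHETICAL per-fraction northern signals, light -> heavy.
    fast_before, fast_after = [5, 5, 10, 40, 40], [40, 30, 10, 10, 10]
    slow_before, slow_after = [5, 5, 10, 40, 40], [5, 5, 12, 39, 39]
    print(runoff_shift(fast_before, fast_after, heavy))
    print(runoff_shift(slow_before, slow_after, heavy))
```

Under this summary, a message whose ribosomes clear quickly shows a large shift, while a slowly elongated message shows a shift close to zero.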

This observation supports recent work that has found that codons decoded by more highly available tRNAs (i.e. ones that score highly on the tAI optimality scale) tend to be translated more rapidly (Gardin et al., 2014). Thus, we have shown that regulation by optimal codon content sits at the junction of mRNA expression and protein translation; it has effects on both, with evidence pointing to a direct involvement in translation. It has long been observed that perturbations in translation can lead to destabilization of messages (Coller and Parker, 2004), creating a landscape with quickly and efficiently translated stable messages at one extreme and poorly translated unstable messages at the other. The main variable in this scenario appears to be the optimal codon content of the messages, with contributions from regulatory factors that interact with UTR sequences and potentially other factors that influence the balance between translation and decay (Sweet et al., 2012).


Changes in codon content can impact cellular fitness

Finally, to evaluate the cellular impact of this range of regulation, we assayed the growth of our cell lines bearing the HIS3 constructs on 3-AT. As our cell background is deficient for histidine synthesis due to disruption of the imidazoleglycerol-phosphate dehydratase enzyme (IGPD) encoded by its native HIS3 gene, the copy of HIS3 encoded by the reporters is the sole source of histidine in the absence of supplementation. 3-aminotriazole (3-AT) is a competitive inhibitor of IGPD that can be precisely titrated to challenge the cells (Glaser and Houston, 1974). In this experiment, we plated cells carrying our reporters in a dcp2∆ background in 5-fold dilution series on plates containing no histidine and 40 mM 3-AT (Figure 25). The dcp2∆ background was used to keep the mRNAs at similar levels by blocking enhanced degradation of the low optimal codon content transcript. A parallel experiment using a wild-type background showed very similar results, but required higher concentrations of 3-AT and higher dilutions (data not shown). The results show that the reduction of protein output by the low optimal codon content message produces a dramatic decrease in the fitness of the cells under these challenging conditions. This directly demonstrates that the regulation of expression by codon usage can have dramatic consequences for the cell. Thus, this method of regulation carries potential for both a wide range of regulation, especially when the stability and translational phenotypes are considered together, and for dramatic consequences on fitness.


CHAPTER 5: DISCUSSION

Overview

In this work, we present six lines of evidence in support of the finding that codon usage is a general mechanism regulating both mRNA stability and translation to achieve powerful and precise regulation of gene expression. First, analysis of rare codon reporters indicates that rare codons can induce accelerated decay in yeast through the major decay pathway. Second, global analysis of RNA decay rates reveals that mRNA half-life correlates with optimal codon content. Many stable mRNAs demonstrate a strong preference towards the inclusion of optimal codons within their coding regions, while many unstable mRNAs harbor non-optimal codons. Third, we observe tightly coordinated optimal codon content in genes encoding proteins with common physiological function. We hypothesize that this finding explains the previously observed similarity in mRNA decay rates for these gene families. Fourth, we demonstrate that mRNA half-life can be predictably manipulated by changes in optimal codon content. This argues against the idea that optimal codon content may be masking other features responsible for changes in stability. Fifth, we directly demonstrate that changes in optimal codon content impact the rate of ribosome translocation on a transcript, indicating that the effect on mRNA decay occurs through modulation of mRNA translation elongation. Finally, we demonstrate that changes in optimal codon content under conditions where the gene product is limiting for growth lead to significant impacts on cellular fitness. Taken together, our data suggest that the optimal codon content of genes is a key feature of the open reading frames that determines both their capability to produce protein and their stability as transcripts. There is likely evolutionary pressure on protein coding regions to coordinate gene expression at the level of protein synthesis and mRNA decay.

Considerations of rare codon experiments

The work began with an exploratory project. Based on previous observations, made by our group and others, that rare codons negatively impact mRNA stability when present in mRNA coding regions (Caponigro et al., 1993; Hoekema et al., 1987; Hu et al., 2009; Sweet et al., 2012), we expanded our analysis to include questions of dependence on position as well as evaluation of factors involved in these phenomena. We found a striking position dependence associated with the degree of destabilization conferred by a stretch of rare codons. Rare codons early in the message provided less destabilization than rare codons later in the message in a largely linear fashion. There are some potential concerns regarding the interpretation of this experiment, since the exogenous codons were introduced into the mRNA without regard for disruption of local mRNA or protein structure. The rare codon-containing constructs all produce protein at a greatly reduced rate as compared to the parent mRNA (data not shown), but there does not appear to be a polarity associated with the protein production. One possible explanation is that the protein is misfolding due to the new codons disrupting local structure, which has been shown to lead to disruptions in translation and message decay (Hollien and Weissman, 2006; Kirstein-Miles et al., 2013). However, this is unlikely because this type of response should inhibit cell growth, which was not observed with these constructs.

The dependence of this phenomenon on DCP2 and DHH1 suggests that this pathway functions through the 5’-3’ major decay pathway rather than quality control mRNA decay pathways, but curiously, disruptions of other components of that pathway do not disrupt the position dependence. This suggests that perhaps DHH1 acts on DCP2 independently of the other decay factors. This model is supported by the observations that DHH1 appears to have a limited set of substrates, whereas other decapping activators such as LSM1 and PAT1 function broadly to affect virtually every mRNA (Coller and Parker, 2005; Sweet et al., 2012). Additionally, DHH1 is known to be a translational repressor independent of its role in decay, functioning to restrict ribosomal run-off (Sweet et al., 2012). The observations of unchanged ribosome association with the rare codon constructs in the presence of significant protein output reduction suggest that these constructs may exhibit slower translocation of ribosomes, which could be mediated by DHH1. Overall, the reasons behind the polarity displayed by these constructs (which was not recapitulated by whole-genome analysis, discussed below) remain a mystery.

Considerations of RNA-seq study

In the subsequent experiments, we combined a transcriptional inhibition time course with genome-wide RNA-Seq analysis of poly(A)+ versus total RNA (depleted of rRNA) and demonstrated that total mRNA half-lives are significantly longer than poly(A)+ half-lives. These data suggest either that methods used to enrich poly(A)+ mRNA are inefficient or that upon removal of the poly(A) tail, a large fraction of mRNAs are maintained at some level in the cell as deadenylated transcripts. The latter would be a striking observation, as deadenylation in yeast and other eukaryotes represents not only the initial event in mRNA degradation, but for many mRNAs has been suggested to be the rate-limiting step (Decker and Parker, 1993; Franks and Lykke-Andersen, 2008). From the perspective of a technical limitation, oligo-dT based purifications are biased towards mRNAs with longer tails, as short tails allow for fewer binding sites for the oligonucleotides; studies across decades have reported significant drops in signal once the length of the poly(A) tail approached 20-30 nucleotides (Blower et al., 2013; Cabada et al., 1977). From the perspective of poly(A)- mRNAs existing in the cell as a biological molecule, some studies have estimated that as many as 35% of mRNA species in the cytoplasm could be found in the poly(A)- state (Cheng et al., 2005), though other studies place that number lower (Yang et al., 2011). These could occur due to a lack of polyadenylation activity rather than deadenylation, but current understanding of mRNA transcription and processing dictates that the cleavage and polyadenylation steps are intricately woven into transcription termination and export (Proudfoot, 2011), making it unlikely that they do not occur for such a large group of mRNAs. Regardless of the reason, data from our study as well as previous ones (Wang et al., 2002) indicate that sole use of poly(A)+ methods to determine mRNA decay rates could lead to the underestimation of half-lives.

Our global mRNA decay analysis was performed by inhibiting mRNA transcription using a well-characterized temperature-sensitive allele of the gene for the large subunit of RNA polymerase II, rpb1-1 (Nonet et al., 1987). While this method has its drawbacks, it has been shown to produce mRNA half-life measurements comparable to those produced by other methods, including shut-off of regulated promoters such as the GAL promoter, approach to steady-state by metabolic labelling, and inhibition of transcription with drugs such as thiolutin (Herrick et al., 1990; Wang et al., 2002). Consistent with these assumptions, mRNA half-life values obtained in this study are in agreement with those of many mRNAs whose decay has been measured experimentally and published in the literature (Decker and Parker, 1993; Herrick et al., 1990; Miller et al., 2011) despite the fact that rpb1-1 shut-off does induce a stress that could alter mRNA decay rates. Importantly, our dataset also correlates moderately with half-lives generated by Miller et al., who used a steady state approach methodology with metabolic labelling by 4-thiouracil to determine mRNA half-lives in an effort to minimize perturbation to the cells. Their approach has its own limitations, including the need to express an exogenous nucleoside transporter for proper uptake, poorly-understood kinetics of uptake and incorporation of their label, and inherent limitations of microarray technology in half-life calculation. Nonetheless, their method is completely distinct from ours and provides a proper control data set. Using the Miller et al. data set, we observe a strong correlation between mRNA half-life and codon optimality (Figure 12). In fact, the correlation between half-life and codon optimality observed in the Miller et al. (2011) dataset was higher than in our own (overall correlation R=.533 vs R=.313, respectively).

It should be noted that the correlation of half-life values between our two data sets is limited (R=.448). Thus, we conclude that the influence of codon optimality on mRNA half-life is observable in data from independent sources, using independent methodologies, even in the absence of perfect agreement in the half-life values.

Based on previous observations with rare codons, we evaluated our genome-wide half-life data for any effect codon content might have on mRNA turnover. Our global analysis revealed correlations between the enrichment of individual codons and the stability (or instability) of mRNA in yeast (Figure 9). This analysis demonstrated that the pattern of codon usage bias among synonymous codons had specific repercussions for mRNA stability. We observed enrichment of optimal codons within the coding region of stable mRNAs, while non-optimal codons are found to predominate within unstable mRNAs. It is important to note that while codon content is clearly a major determinant of mRNA stability, it does not predict half-lives of all mRNAs. For example, mRNAs for several histone components, such as HHF2 and HHT1, contain 85% optimal codons, yet are very unstable with half-lives of 2.4 and 3.5 minutes, respectively. The half-lives of such mRNAs could be dictated by their ability to initiate translation efficiently (or inefficiently) and/or by elements in 5’ or 3’ UTRs. It is also possible that features within the ORFs might explain some of the outliers, such as the distribution and placement of optimal and non-optimal codons.

Curiously, we were unable to recapitulate the position-dependent polarity seen in our experiments with rare codons through a cluster search in our sequencing data; thus, a more nuanced exploration of this concept may be warranted.

Considerations of codon content experiments

As a translation efficiency scale, codon optimality reflects a balance between the supply of available charged tRNA and the demand of translating ribosomes. In an extension of this concept, Tuller et al. (2010) theorized that coadaptation between coding sequences and tRNA pools can influence translation speed and, as a consequence, ribosomal density. However, studies of ribosomal profiling data have found conflicting evidence of differences in ribosome density at individual codons, with some reporting no changes between codons at all and some finding meaningful differences (Charneski and Hurst, 2013; Gardin et al., 2014; Ingolia et al., 2009; Qian et al., 2012). Thus, the contribution of codon optimality to elongation rate is not clear. We propose that codon optimality can powerfully alter ribosome translocation rate. We have shown that protein output is greater from an mRNA containing optimal codons than from an analogous mRNA containing synonymous non-optimal codons. These mRNAs have similar ribosome association patterns in sucrose gradients, but produce different amounts of protein, suggesting that ribosome translocation rate is different between these mRNAs. Additionally, upon blockage of translational initiation, we observe that an mRNA with optimal codons clears from polyribosomes much more efficiently than the synonymous mRNA containing non-optimal codons.

Together, these data argue that codon identity can have a powerful influence on ribosome translocation rate. The failure to see this influence by some ribosome profiling experiments may indicate that the effect of each individual codon on elongation rate is minute and undetectable by analysis of individual fragments.

Indeed, while codon identity may have a small influence on ribosome decoding rate when measured individually, our data suggest that codon optimality is powerfully additive and can result in dramatic changes to mRNA metabolism. Though the precise mechanism connecting effects on translation elongation with effects on deadenylation and decapping remains unknown, we have previously posited that slowing of ribosomal transit leads to the association of decay factors that promote entry of the mRNA into decay (Sweet et al., 2012).
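The additive argument can be made concrete with simple arithmetic. The sketch below is purely illustrative: the dwell times (100 ms and 110 ms per codon), the 400-codon ORF length, and the one-codon "optimal" set are hypothetical values chosen for the example, not measurements from this work.

```python
def transit_time_ms(codons, optimal, t_opt_ms=100, t_nonopt_ms=110):
    """Sum assumed per-codon dwell times (in ms) over a list of codons."""
    return sum(t_opt_ms if codon in optimal else t_nonopt_ms for codon in codons)


# Hypothetical ORFs of equal length that differ only in synonymous codon choice.
ASSUMED_OPTIMAL = {"AGA"}
optimal_orf = ["AGA"] * 400
nonoptimal_orf = ["CGG"] * 400

per_codon_delay = 110 - 100
extra_ms = (transit_time_ms(nonoptimal_orf, ASSUMED_OPTIMAL)
            - transit_time_ms(optimal_orf, ASSUMED_OPTIMAL))
print(per_codon_delay, extra_ms)
```

Under these assumed values, a delay that would be hard to detect at any single codon accumulates in proportion to the number of non-optimal codons across the ORF.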

Possible roles of DHH1

The DEAD-box RNA helicase DHH1 is an intriguing candidate for the mechanism of monitoring translation elongation by the ribosome for multiple reasons. First, DHH1 is an integral component of the mRNA decay machinery (Presnyak and Coller, 2013) that has been shown to act as an activator of decapping through its role in promoting translational repression (Coller and Parker, 2005; Sweet et al., 2012). Second, DHH1 is an abundant protein (>50,000 per cell), far exceeding the levels of all other mRNA decay factors. Third, DHH1 orthologs in yeast and Xenopus have been shown to associate broadly across an mRNA transcript, including sites within the coding region (Minshall and Standart, 2004; Mitchell et al., 2013). Fourth, our recent findings have shown that DHH1 protein can modulate translation of mRNA at the level of translational elongation when directed to a transcript (Sweet et al., 2012). Lastly, our findings with the rare codon constructs implicate DHH1 in regulation of mRNA stability conferred by codon effects. Taken together, these findings spell out a model wherein DHH1 associates directly with transcripts through its intrinsic RNA-binding activity and influences ribosome transit in some way. The simplest model of this event can be based on residence time, where a failure to displace DHH1 by quickly translating ribosomes eventually leads to activation of further translational repression and/or recruitment of a decay complex.

Further studies are needed to determine how translation elongation rate influences mRNA decay, including the role of DHH1 in the process and whether this conserved protein represents the link between codon usage and the array of mRNA decay rates observed in yeast and other eukaryotic cells.

Codon optimality in yeast and other organisms

We show that codon optimality strongly influences mRNA stability regardless of the nature of the transcript’s untranslated regions (Figure 19). We suggest that codon usage is a general mechanism intrinsic to all mRNAs, which facilitates the fine-tuning of gene expression. It works in concert with message-specific mRNA regulators commonly found in UTRs to fulfill that function (Goldstrohm et al., 2007; Olivas and Parker, 2000; White et al., 2013). Codon usage may serve to establish the base decay rate and expression level for a given mRNA, while UTR-based regulation may play more specific roles for regulating individual mRNAs as needed for a given intra- and extracellular environment. Interestingly, these two pathways may function upon a single point of regulation: UTR regulators are described to be able to induce translational repression rather than directly recruit the decay machinery, which is consistent with the alterations of translation elongation effected by optimal and non-optimal codons.

Codon optimality as described in this work and previously is a relatively straightforward concept in the context of simple, single-celled organisms such as yeast and bacteria, which spend the largest portions of their existence in either an exponentially growing phase or in a quiescent phase while waiting for proper conditions for exponential growth. It’s clear that in these cells, it is most beneficial to tweak the production of proteins that are rate-limiting for growth to levels that are as high as possible. Thus, in these systems, optimality is well-defined and easily observable. Gene products like the ribosomal proteins and metabolic enzymes have immediately distinct codon usage signatures that are highly biased towards codons readily identifiable as optimal. The entire cell is a machine with a singular purpose – to grow as quickly as possible as soon as conditions allow.

In higher organisms, the concept of optimality becomes much murkier. Cells are no longer single-purpose growth machines, but are instead cogs in a much larger mechanism, which must be finely coordinated and tuned to perform multiple functions. Indeed, even the codon bias is reduced in these cases. For example, the ribosomal genes of yeast are among the most highly expressed and most biased genes in yeast, showing extreme preference for optimal codons. This preference is strongly reduced in higher organisms (Figure 26). In S. cerevisiae, there is a very small number of very frequently used codons, with a majority of codons showing relatively low usage. In S. pombe, a yeast with a much slower rate of growth, the bias is significantly less pronounced. As the chart moves to metazoans, including D. melanogaster and M. musculus, the bias continues to decrease, with the mouse ribosomal genes showing little bias compared to yeast genes.

[Figure: Codon bias of ribosomal genes by organism. Y-axis: usage per 1000 codons (0 to 70); x-axis: S. cer, S. pom, D. mel, M. mus.]
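The usage-per-1000-codons metric plotted for the ribosomal genes can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the script used to generate the figure: the function name and the toy sequences are hypothetical, and the input is assumed to be a list of in-frame coding sequences.

```python
from collections import Counter


def usage_per_thousand(orfs):
    """Pool codon counts over a set of ORFs and scale them to occurrences per 1000 codons."""
    counts = Counter()
    for orf in orfs:
        orf = orf.upper()
        usable = len(orf) - len(orf) % 3  # ignore any trailing partial codon
        counts.update(orf[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


# Hypothetical gene set for illustration only.
toy_gene_set = ["AGAAGATTG", "AGA"]
for codon, usage in sorted(usage_per_thousand(toy_gene_set).items()):
    print(codon, usage)
```

A strongly biased gene set yields a few codons with very high values and many with low values, whereas a weakly biased set yields a flatter distribution.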

By limiting the analysis to single amino acids, we can demonstrate that bias among degenerately encoded groups also decreases along with overall genomic bias (Figure 27). Presented here are the breakdowns of codon usage within the 6-fold degenerate amino acids in the ribosomal genes of several organisms. Ribosomal genes were chosen for this analysis both because they are well-conserved between organisms and because they need to be expressed at very high levels to achieve rapid growth. Within this group of organisms, S. cerevisiae follows a consistent pattern: one codon is always used for a majority of occurrences, with a second accounting for a majority of the remainder. These patterns extend to other groups of degenerate codons beyond those that are shown, though some of the groups with fewer members only have a single preferred codon. The usage patterns correspond tightly to tRNA expression levels, e.g. for Arg, the tRNA decoding the most common codon, AGA, is present at 11 copies (Iben and Maraia, 2012), the tRNA decoding the second most common codon, CGT, is present at 6 copies, and the other 4 codons only have 2 corresponding tRNA genes combined. There are some variations in the way that the decoding potential is provided, e.g. for serine, the second most common codon, TCC, lacks a direct decoding tRNA and is decoded by I-C wobble pairing with the tRNA for the most common codon, TCT (present at 11 copies). These preferences persist at a lower level in fission yeast and fruit flies, but are largely absent from the mouse ribosomal genes, particularly in the case of arginine. Thus, the definition of optimality in higher eukaryotes becomes a daunting task, as there is a lack of clear preference for specific codons. This may reflect the need for the more diversified roles that cells play within these organisms; different cell types in different tissues may express tRNAs at different levels, affecting sets of genes enriched in the cognate codons.

AA    Codon   S. cer   S. pom   D. mel   M. mus
Arg   AGA     80%      9%       4%       16%
Arg   AGG     1%       1%       8%       19%
Arg   CGA     0%       1%       3%       15%
Arg   CGC     0%       19%      50%      20%
Arg   CGG     0%       0%       3%       18%
Arg   CGT     18%      70%      33%      11%
Leu   CTA     6%       1%       3%       4%
Leu   CTC     0%       17%      14%      20%
Leu   CTG     0%       2%       58%      45%
Leu   CTT     2%       36%      6%       15%
Leu   TTA     16%      7%       1%       3%
Leu   TTG     75%      37%      17%      13%
Ser   AGC     2%       10%      17%      21%
Ser   AGT     2%       7%       4%       10%
Ser   TCA     4%       6%       3%       9%
Ser   TCC     39%      23%      43%      29%
Ser   TCG     0%       3%       23%      5%
Ser   TCT     53%      51%      11%      27%
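The within-family percentages in this kind of breakdown are each codon's share of all codons encoding the same amino acid. A minimal Python sketch of that normalization follows; the function name and the example counts are assumptions for illustration and are not data from this work.

```python
# The six arginine codons of the standard genetic code.
ARG_CODONS = ("AGA", "AGG", "CGA", "CGC", "CGG", "CGT")


def family_fractions(codon_counts, family):
    """Return each codon's share of the total count within one synonymous family."""
    total = sum(codon_counts.get(codon, 0) for codon in family)
    if total == 0:
        return {codon: 0.0 for codon in family}
    return {codon: codon_counts.get(codon, 0) / total for codon in family}


# Hypothetical codon counts pooled over a gene set.
toy_counts = {"AGA": 8, "CGT": 2, "TTG": 5}
for codon, share in family_fractions(toy_counts, ARG_CODONS).items():
    print(codon, f"{100 * share:.0f}%")
```

Counts for codons outside the family (TTG in the toy input) are ignored, so the shares within a family always sum to 100% when at least one family member is present.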

Additionally, in our experiments, we use exponentially growing yeast cells, thus providing a steady and unchanging environment for our assay. However, both long and short time scales provide important opportunities for the reassignment of codon optimality in the cell. In the short term, changes in cellular growth conditions and nutrient availability could significantly impact individual (or subsets of) charged tRNA levels. As a consequence of this reduction in supply, translation elongation rates of mRNAs enriched in the cognate codons of these tRNAs would be slowed and mRNA levels decreased due to enhanced turnover. In this way, codon optimality provides the cell not only with a general mechanism to hone mRNA levels, but also with a mechanism to sense environmental conditions and rapidly tailor global patterns of gene expression.


Long-term genetic changes that introduce synonymous mutations into protein-coding genes do not alter the amino acid sequence of the encoded polypeptide; however, such changes would impact mRNA and protein expression levels if the mutations significantly altered the proportion of optimal codons within the open reading frame of the mRNA. Synonymous gene mutation can thus be envisioned as a method to evolve mRNA stability rates that are advantageous to the cell. We find that mRNAs encoding proteins that act together in similar pathways or stoichiometric complexes, which have been previously observed to decay at similar rates (Wang et al., 2002), encode nearly identical proportions of optimal codons (Figure 17). We suggest that codon optimality has been finely tuned for these gene sets as an elegant mechanism to ensure coordinated post-transcriptional regulation and parsimonious expression of proteins at the precise levels required by the cell.

Interestingly, similar levels of codon optimality would ensure not only similarity of stability and translation rates for related mRNAs, but also coordination of response to changes in tRNA levels (e.g. nutrient availability, stress, cell type, etc.). Recent studies reveal that tRNA concentrations within the cell are not static but are constantly undergoing change, sometimes dramatically. For instance, large-scale RNA profiling experiments have demonstrated that tRNA concentrations vary widely between proliferating and differentiating cells (Gingold et al., 2014). Based on our analysis, we would argue that significant alterations in tRNA concentrations could alter the mRNA expression profile within a cell, even without any changes in transcription, by dynamically changing message stability.


Ribosome as monitor of all mRNA fates

As a final implication, our work suggests that co-translational mRNA surveillance by the ribosome is important not only to target aberrant mRNAs for rapid decay, but also to tune the degradation rates of normal mRNAs. In eukaryotes, aberrations in mRNAs lead to aberrant translation events such as premature termination, lack of translation termination, and ribosome stalling, which result in the accelerated turnover of the mRNA by the nonsense-mediated, non-stop, and no-go decay pathways, respectively (Shoemaker and Green, 2012). All of these pathways are critically dependent on the ribosome. Events surrounding aberrant translation anchor each pathway, making the ribosome the sole sensor able to trigger the entry of mRNA into these pathways. We find here that codon usage within normal mRNAs also influences translating ribosomes and can have profound effects on mRNA stability. Thus, the ribosome acts as the master sensor for all mRNA decay, determining the fate of all mRNA through modulation of its elongation and/or termination processes. The use of the ribosome as a sensor is ideal for protein-coding genes, whose primary function in the cell is to be translated. We suggest that a component of mRNA stability is built into all mRNAs as a function of codon composition. The elongation rate of translating ribosomes is communicated to the general decay machinery, which affects the rate of deadenylation and decapping.

Individually, the identity of any one codon within an mRNA would be predicted to have only a tiny influence on overall ribosomal decoding; however, within the framework of an entire mRNA, we show that codon optimality can have profound effects on translation elongation and mRNA turnover. We conclude that codon identity represents a general property of mRNAs and is a critical determinant of their stability.


Future directions

At this point, there are three direct lines of experimentation that can be pursued to understand the mechanism of regulation through codon usage and expand the application of these findings.

First, the relationship between tRNA expression and codon usage needs to be explored. It is clear that optimal codons are those that match up well with the available pool of charged tRNAs. This is advantageous to the cell, whose growth may be limited by production of proteins, particularly components of the ribosome and metabolic pathways. This also presents a potential mechanism of regulation, as altering the availability of tRNAs will lead to changes in effective optimality of the cognate codons, which can strongly alter both translation and stability of mRNAs. It has been shown that tRNA pools can change in conditions of stress or with changes in cellular programs (Chan et al., 2012; Gingold et al., 2014), making it likely that this mechanism is utilized by cells with some regularity. One way to discover this regulation is to monitor the tRNA pool of cells under a variety of conditions, while testing the stability of mRNAs in that scenario. The best test would be a direct measurement of translation rates for all the mRNAs in the cell to ascertain that the changes in tRNA availability actually affect translation as expected, but as such an assay is not available, mRNA stability is a good proxy for this measurement. Another way to test these effects would be to artificially alter the available tRNA concentration in the cell by inhibiting modification or changing nutrient conditions such that some amino acids become limiting. Measurement of mRNA stability under these conditions would similarly allow for detection of shifts in regulation.


Similarly extending the idea that optimality is a fluid concept, patterns of tRNA expression and mRNA stability in multicellular organisms are an important area to research. In a complex organism with a multitude of cellular programs ranging from developmental roles to maintenance of mature tissues, preferred sets of codons could vary widely, depending on tRNA expression patterns and cellular function at the time. One important project would be to understand those relationships – the differential tRNA expression patterns could play roles in regulating large rafts of genes in programs like development, helping explain developmental regulation such as some fetal splice isoforms. Further, changes in tRNA expression could be important to maintaining cellular identity, for example, reinforcing the differentiated behavior of mature cells and the proliferative behavior of stem cells. The first analysis of this kind has been published in recent studies (Gingold et al., 2014).

From a mechanistic point of view, it is not clear what regulates the entry of non-optimal messages into the decay pathway and the coupling to translational events. From some of our experiments, we suspect that the RNA helicase DHH1 can participate in this process, but further study would be necessary to identify the specifics of this interaction. We can look at the effects of changes in optimal codon content on the stability of messages in the absence of DHH1 in the cell. Similarly, we can evaluate the effects of DHH1 deletion on mRNA stability of normal messages for which we have previously established half-lives. Finally, we can look at direct association of DHH1 with messages harboring predominantly optimal or non-optimal codons, as we would expect the association to differ between messages that are targets of regulation and those that are not.


APPENDIX A: BIOINFORMATICS

This section highlights the code used in the project. It is not a complete listing of the code used; rather, it is meant to illustrate some of the techniques and coding structures utilized for the project. Additionally, it provides some convenient code sections that could be adapted for future scripts. Code is presented below with line-by-line comments in grey.

Half-life fitting

This code was written in R, using the RStudio environment for Windows. It takes an Excel file of gene names and quantitation across a time course. Our input for this script was a file containing our normalized FPKM reads. The data presented above use least absolute deviation fitting for half-life calculation, which was chosen because we had a relatively dense time course, with 10 time points. This situation required a robust method, as we wanted to minimize the impact of outliers. The particular example below uses ordinary least squares fitting, as that should be suitable for a wider range of applications.

# Half-life calculation by OLS method
# Requires xlsx, foreach, and doParallel packages
# Requires an installed and set up version of the
# Java Runtime (JRE) - this is required for xlsx only

######################################################
## IMPORTANT - ADJUST THESE PARAMETERS FOR YOUR RUN ##
######################################################

# input data file location (xlsx format)
# expected file arrangement is gene names in first column,
# FPKM at given time points in subsequent columns
# extra columns are okay, they will be ignored
# remember to use \\ for path separator
input_file <- c("C:\\User\\Experiment 1.xlsx")

# output file for calculated k values
# remember to use \\ for path separator
output_file <- c("C:\\User\\exp_1_results.txt")

# time points to be used for fitting in minutes
time_points <- c(0,2,4,6,8,10,15,20,30,40,60)

# number of threads to use for calculation
threads <- c(4)

###################################
## DATA PROCESSING AND FUNCTIONS ##
###################################

# import packages
library(xlsx)
library(foreach)
library(doParallel)

# read xlsx file with FPKM of timepoints into data frame
read_data <- read.xlsx(input_file, sheetIndex = 1)

# convert data frame to matrix without gene names
temp_matrix <- as.matrix(cbind(read_data[,-1]))

# discard columns beyond pre-defined time points
data_matrix <- temp_matrix[,1:length(time_points)]

# define main function - this takes a row of data from
# matrix above, normalizes it to the 0 time point,
# and fits a k value to it using the supplied distance
# function and OLS fitting method
main <- function (data_row){

  # normalize data by dividing through by time point 0
  norm_data <- data_row / data_row[1]

  # pass data to optimization function, which will minimize
  # the distance function (searching k between 0 and 1)
  final_k <- (optimize(function(prop_k) dist(norm_data, prop_k), c(0,1)))

  # return the optimised k value along with minimized error
  return(c(as.numeric(final_k[1]), as.numeric(final_k[2])))
}

# define distance function (exponential decay fit with OLS)
dist <- function(data, proposed_k) {

  # proposed values calculated from passed proposed k value
  # and pre-defined time points in the format of e^-kx
  prop_values <- exp(-proposed_k*time_points)

  # returns the sum of the squares of the distances between
  # the actual data and proposed values
  return(sum((data - prop_values)^2))
}

# set up parallelization back-end - sets previously defined
# number of threads and registers the parallel backend
cluster <- makeCluster(threads)
registerDoParallel(cluster)

# set up function loop - simply calls the main function on
# each row of data and captures output
loop_output <- foreach(i = 1:(nrow(data_matrix))) %dopar% {
  main(data_matrix[i,])
}

# returns to the sequential back-end and
# shuts down the worker processes
registerDoSEQ()
stopCluster(cluster)

# format output of function loop into matrix
# add row labels and headers
final_output <- do.call(rbind,loop_output)
rownames(final_output) <- read_data[,1]
colnames(final_output) <- c("k", "dist")

# write output to file (tsv format)
write.table(final_output, output_file, quote = FALSE, sep = "\t")

The output produced from this can then be easily imported into Excel and sorted or filtered as necessary. Excel was chosen for filtering as it allows faster visualization and adjustment than doing the same with a script.
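For comparison, the least absolute deviation (LAD) variant mentioned above differs from the OLS example only in its distance function. The following Python sketch illustrates this under stated assumptions and is not the code used for the data in this work: the function names are hypothetical, a plain golden-section search stands in for R's optimize(), and the input values are synthetic. The half-life is obtained from the fitted rate constant as ln(2)/k.

```python
import math


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # minimum lies in [a, d]; the old c becomes the new d
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # minimum lies in [c, b]; the old d becomes the new c
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def fit_decay(times, values, loss="lad"):
    """Fit y = exp(-k t) to data normalized to the first time point.

    Returns (k, half_life). loss is "lad" (sum of absolute residuals)
    or "ols" (sum of squared residuals); k is searched between 0 and 1.
    """
    norm = [v / values[0] for v in values]

    def cost(k):
        resid = [y - math.exp(-k * t) for y, t in zip(norm, times)]
        if loss == "ols":
            return sum(r * r for r in resid)
        return sum(abs(r) for r in resid)

    k = golden_section_min(cost, 0.0, 1.0)
    return k, math.log(2) / k


# Synthetic, noise-free example: abundance decaying with k = 0.1 per minute.
time_points = [0, 2, 4, 6, 8, 10, 15, 20, 30, 40, 60]
abundance = [100.0 * math.exp(-0.1 * t) for t in time_points]
print(fit_decay(time_points, abundance, loss="lad"))
print(fit_decay(time_points, abundance, loss="ols"))
```

Because absolute residuals grow linearly rather than quadratically, a single aberrant time point pulls the LAD fit less than it pulls the OLS fit, which is the motivation given above for using LAD on a dense time course.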

CSC calculation

The CSC calculation code was originally created in Perl 5 using the Ubuntu Linux command line, and later adapted in R. The code presented below is from the original Perl script. Much of this code has been adapted into other scripts, as it is a good set of techniques and snippets to handle this type of sequencing information.

The basic workflow of the script begins with two files as input. The first is a FASTA document with the sequences of all coding regions (CDS) of the genome. For yeast, this file can be obtained from the Saccharomyces Genome Database (http://www.yeastgenome.org) or through the UCSC genome table browser (http://genome.ucsc.edu/cgi-bin/hgTables). UCSC has many more organisms, and most species-specific databases will be able to provide this as well. The second is a listing of genes and their corresponding calculated half-lives, provided by the script described above. The FASTA file is read into memory, and the occurrence of each codon in that file is counted. The script then goes through each codon one by one and evaluates the correlation between the occurrence of that codon and the half-life provided by the second file. These correlations are printed out to the command line to be piped to a file.

#!/usr/bin/perl
# CSC calculation script

# load modules - requires basic stats and list utils
use strict;
use warnings;
use Statistics::Basic;
use List::Util;

###########################
## COMMAND LINE HANDLING ##
###########################

# Read file names from command line
# will throw an error if it cannot do it
my $CDS_file = shift or die;
my $T12_file = shift or die;

############################
## GENERAL FUNCTION CALLS ##
############################

# Establish variables used in body
# hash of strings - stores half-lives
my %halflives;
# hash of strings - stores raw sequences
my %sequences;
# hash of hashes - stores codon counts for each sequence
my %counted_genes;
# this is the hard-coded list of codons for the script
my @codon_list = split ', ',
    "TTT, CTT, ATT, GTT, TCT, CCT, ACT, GCT, TAT, CAT, AAT, GAT, TGT, CGT, AGT, GGT, "
  . "TTC, CTC, ATC, GTC, TCC, CCC, ACC, GCC, TAC, CAC, AAC, GAC, TGC, CGC, AGC, GGC, "
  . "TTA, CTA, ATA, GTA, TCA, CCA, ACA, GCA, CAA, AAA, GAG, TGG, CGA, AGA, GGA, "
  . "TTG, CTG, ATG, GTG, TCG, CCG, ACG, GCG, CAG, AAG, GAA, CGG, AGG, GGG";

# Call subroutine to parse FASTA file
%sequences = FASTA_parse( $CDS_file );

# Call subroutine to parse half-lives
%halflives = T12_PARSE( $T12_file );

# looks for mismatches between files, omitting any half-
# life that does not have a sequence associated with it
foreach ( keys %halflives ){
    delete $halflives{$_} if not defined $sequences{$_}
}

# Uses a loop to call the codon-counting subroutine
foreach my $key ( keys %sequences ) {
    # stores each set of counted codons into hash
    $counted_genes{$key} = { CODONS( $sequences{$key} ) }
}

# Print header
print "Codon\tCorrelation\n";

# loops through the array of codons defined above,
# creates two vectors (arrays) populated with codon count
# and halflife, then calculates a correlation
# note that codon count is normalized to length of sequence
for (my $i = 0; $i < scalar @codon_list; $i++) {
    # declare variables used in the loop
    my @codon_count;
    my @halflife;

    # loop generates the arrays
    # takes one gene from the half-life list at a time
    foreach my $key ( keys %halflives ) {
        # places half-life of current gene into array
        push @halflife, $halflives{$key};
        # takes the count of the current gene for the codon
        # defined by outer loop
        # defaults to 0 if value not present
        my $hits = $counted_genes{$key}{$codon_list[$i]} || 0;
        # normalizes codon count as fraction of total and
        # places normalized count into array
        push @codon_count,
            ($hits / (List::Util::sum values %{$counted_genes{$key}})) ;
    }

    # calculates correlation between the two ordered arrays
    # for the current codon
    my $correlation = Statistics::Basic::correlation(
        \@codon_count, \@halflife );

    # prints the correlation for the current codon
    print $codon_list[$i]."\t".$correlation."\n";
}

# Create a hash from a FASTA file
# lines starting with > become hash key
# following lines converted into hash value
sub FASTA_parse {
    # take passed argument as file name
    my $filename = shift;

    # define temporary variables used in the sub
    my %temp_hash;
    my $seq_name;

    # open file for reading
    open CDS_FILE, '<', $filename or die "Cannot open $filename: $!";
    # pull lines from file one by one until end of file
    while ( my $line = <CDS_FILE> ) {
        # check if line starts with >
        # and matches expected naming
        if ( $line =~ m/^>(.+?)\s/ ) {
            # if line does start with >, capture first "word"
            $seq_name = $1;
            # trim identifier from name to match HL file
            $seq_name =~ s/sacCer3_sgdGene_//;
        }
        # if line doesn't start with >, capture as sequence
        else {
            # take off return character
            chomp $line;
            # put sequence into hash under the name defined above
            $temp_hash{$seq_name} .= $line;
        }
    }
    # close file handle
    close CDS_FILE;

    # returns hash of name/sequence pairs
    return %temp_hash;
}

# Parses a simple half-life file - 2-column TSV input
sub T12_PARSE {
    # take passed argument as file name
    my $filename = shift;

    # define temporary variables used in the sub
    my %temp_hash;

    # open file for reading
    open T12_FILE, '<', $filename or die "Cannot open $filename: $!";
    # pull lines from file one by one until end of file
    while ( my $line = <T12_FILE> ) {
        # regex splits line, checks for proper formatting
        # of half-life and captures both sides
        if ( $line =~ m/(^\w+-*\w*)\t((-*\d+\.\d+|0))/ ) {
            # stores half-life in hash under gene name
            $temp_hash{$1} = $2
        }
    }
    # closes file handle
    close T12_FILE;

    # returns hash of gene/half-life pairs
    return %temp_hash;
}

# Function for splitting sequence into codons
# input is a sequence, output is a codon hash
sub CODONS {
    # take passed argument as sequence
    my $curr_seq = shift;

    # declare temporary variables used in sub
    my %codons;

    # regex tokenizes sequence into 3-character pieces
    # and creates a hash using the codon itself as the key
    # the values are incremented as new codons are found
    $codons{$_}++ foreach ($curr_seq =~ m/\w{3}/g);

    # returns hash of codon/count pairs
    return %codons;
}

This code has been adapted for numerous scripts during this project, highlighting the adaptability of the basic subroutines, such as FASTA handling and codon counting. Nevertheless, care is required when adapting this code. For example, the FASTA handler needs to be adjusted for the expected gene naming format, which requires some expertise in tuning the regular expression so that it captures the gene name properly.
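The core of the CSC calculation, correlating the normalized frequency of one codon with half-life across genes, can also be condensed into a few lines. The Python sketch below is an illustration under stated assumptions rather than a port of the script above: the function names are hypothetical, the inputs are assumed to be already parsed into dictionaries keyed by gene name, a Pearson correlation is assumed, and the three toy genes and half-lives are invented for the example.

```python
import math
from collections import Counter


def codon_fractions(seq):
    """Return each codon's count as a fraction of all codons in the sequence."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (NaN if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def csc(sequences, half_lives, codon):
    """Correlate the frequency of one codon with half-life over genes present in both inputs."""
    genes = [g for g in half_lives if g in sequences]
    freqs = [codon_fractions(sequences[g]).get(codon, 0.0) for g in genes]
    return pearson(freqs, [half_lives[g] for g in genes])


# Hypothetical toy data: three genes with invented half-lives in minutes.
toy_sequences = {"g1": "AGAAGAAGAAGA", "g2": "AGAAGACGGCGG", "g3": "CGGCGGCGGCGG"}
toy_half_lives = {"g1": 30.0, "g2": 20.0, "g3": 10.0}
for codon in ("AGA", "CGG"):
    print(codon, csc(toy_sequences, toy_half_lives, codon))
```

In the toy data the frequency of AGA rises with half-life and that of CGG falls, so the two codons receive correlations of opposite sign.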

APPENDIX B: MATERIALS AND METHODS

Yeast strains and growth

The genotypes of all yeast strains used in this study are listed below in Table 1. Unless otherwise indicated, all strains are based on BY4741. Cells were grown in standard synthetic medium (pH 6.5) supplemented with appropriate amino acids and either 2% glucose, 2% galactose/1% sucrose, 2% raffinose/1% sucrose or 2% sucrose as the carbon source. All cells were grown at 24 °C and collected at mid-log phase (3 x 10^7 cells ml^-1).


Plasmids and strain construction

The plasmids and oligonucleotides used in this study are listed in Table 2 and Table 3, respectively.

Rare codon reporters: To make the negative control for the rare codon experiments, cloning sites were introduced into the 3’ UTR of pJC296 (Hu et al., 2009) by site-directed mutagenesis using oligos oJC1011/oJC1012 and oJC1013/oJC1014. MS2 sites were introduced by annealing oligos oJC1015/oJC1016 and cloning them into the newly created SpeI/XhoI sites to create the base plasmid without RC (pJC408). Constructs used in the experiments were additionally tagged with HA at the 3’ terminus using oJC1196/oJC1197 to create the –RC construct (pJC441). Three other constructs were created in the same way from pJC408, amplifying the ORF in two pieces, with the rare codon stretch encoded on the primers at the junction of the two pieces. HA tags were also added on the far 3’ primer. Oligos used were: oJC1261/oJC1244/oJC1245/oJC559 for RC 25 (pJC469), oJC1261/oJC1246/oJC1247/oJC559 for RC 50 (pJC470), oJC1261/oJC1248/oJC1249/oJC559 for RC 63 (pJC471). The RC 77 (pJC443) plasmid was generated from the previously used pJC314 (Hu et al., 2009) in the same way that –RC was made from pJC296 above (SDM to add restriction sites, insertion of MS2 sites, then addition of HA tag). RC 94 (pJC489) was made by 2 rounds of PCR, inserting half of the codon stretch at a time into pJC441, using oligos oJC1318/oJC1319 and oJC1320/oJC1321. The last one, RC 5 (pJC468), was made by direct amplification from pJC296 with oJC558/oJC877 and oJC559/oJC876. These amplicons were combined by PCR reaction with oJC558/oJC559 and then cloned back into pJC296.


For the stem-loop (SL) constructs, the stem-loop construct without RC (pJC442) was made from pJC134 (Hu et al., 2009) as described above (SDM to add restriction sites, insertion of MS2 sites, then addition of HA tag). The SL constructs bearing the rare codons were made by inserting the RC stretch with 2 rounds of PCR as for pJC489 above. Oligos used were: oJC1370-3 for SL RC 5 (pJC497) and oJC1374-7 for SL RC 77 (pJC498).

LSM8 & RPS20 reporters: To construct the base reporter plasmids bearing LSM8 (pJC663) and RPS20 (pJC666), DNA was amplified from the LSM8 locus with oJC2357/oJC2358 and from the RPS20 locus with oJC2366/oJC2367. Restriction sites were inserted by site-directed mutagenesis to facilitate further cloning. XhoI sites were introduced directly upstream of the start codon in both using oJC2415/oJC2416 and oJC2417/oJC2418 respectively. SphI sites were introduced directly downstream of the stop codon using oJC2431/oJC2432 and oJC2433/oJC2434. Several point mutations were introduced into the 3’ UTRs to facilitate detection using oJC2435/oJC2436 and oJC2437/oJC2438 respectively. These were then cloned into pJC69 (Gietz and Sugino, 1988) to create pJC663 and pJC666. The optimality-inverted plasmids (pJC667 and pJC668 respectively) were constructed by synthesizing the ORF in two parts by annealing oJC2421/oJC2422 and amplifying with oJC2423/oJC2424 for LSM8 and annealing oJC2427/oJC2428 and amplifying with oJC2427/oJC2428 for RPS20. These inserts were cloned back into the XhoI/SphI sites of pJC663 and pJC666. These reporters were transformed into yJC244 to make yJC1888-91.

SYN reporters: To construct the plasmids bearing the synthetic reporters, restriction sites were introduced directly before the start codon and after the stop codon of a PGK1-bearing plasmid (pJC296) as well as an MFA2-bearing plasmid (pJC312). Both of these plasmids are under the control of a GAL1 UAS. SpeI and XhoI sites were inserted into pJC296, using oJC2377/oJC2378 and oJC2379/oJC2380 respectively. XbaI and XhoI sites were introduced into pJC312, using oJC2381/oJC2382 and oJC2383/oJC2384 respectively. The SYN-opt sequence was synthesized as two complementary oligonucleotides (oJC2385/oJC2409), then annealed and digested with SpeI/XhoI, then ligated into similarly digested plasmids prepared as above to make the SYN-opt reporters with PGK1 context (pJC672) and MFA2 context (pJC674). The SYN-nonopt oligonucleotides (oJC2386/oJC2410) were processed identically to generate the SYN-nonopt reporter with PGK1 context (pJC673) and MFA2 context (pJC675). These reporters were transformed into yJC151 to make yJC1892-95.

HIS3 reporters: For the HIS3 reporters, the endogenous reporter (pJC712) was made by amplifying the URA3 selectable marker from pJC390 with oJC2508/2509 and inserting it into the cloning site of pJC387, which already contained the HIS3 ORF under the control of its native promoter. This was transformed into yJC151 to make yJC2031 and into yJC1883 to make yJC2033. The non-optimal ORF was synthesized by annealing 4 oligonucleotides (oJC2500-3), then amplifying with oJC2518/oJC2519, and replacing the existing ORF of the pJC387 plasmid using PacI/AscI to make pJC710. Selectable marker URA3 was then added as described above to make pJC711. This was transformed into yJC151 to make yJC2030 and into yJC1883 to make yJC2032. The optimal ORF was constructed by annealing 4 oligonucleotides (oJC2605-8), amplifying with oJC2611/2612, and then replacing the ORF of pJC711 using PacI/AscI to make pJC716. This was transformed into yJC151 to make yJC2088 and into yJC244 to make yJC2090. FLAG-tagged versions were produced by introducing the FLAG tag via site-directed mutagenesis into pJC711 using oligonucleotides oJC2620/2621 to make pJC719 and into pJC716 using oligonucleotides oJC2622/2623 to make pJC720. These were transformed into yJC151 to make yJC2135 and yJC2137, respectively. All of the HIS3 constructs were designed to retain a short invariant region in the ORF (positions 337-359), which was used for detection by northern oligonucleotide probe oJC2564.

Northern RNA analysis

Northern RNA analysis of shutoffs was performed essentially as previously described (Hu et al., 2009). Briefly, for analysis of the RC and SYN reporters, cells carrying the reporters were grown in 2% galactose, 1% sucrose synthetic media and collected at mid-log phase. Transcriptional repression was achieved by resuspending collected cells in media containing 4% glucose. After transcriptional repression, cell aliquots were removed, total RNA was isolated, and 30 µg was analyzed by electrophoresis through a 1.4% formaldehyde agarose gel or a 6% denaturing polyacrylamide gel. For analysis of LSM8, RPS20, and HIS3 reporters, rpb1-1 shut-offs were performed as described below in the first paragraph of the RNA-seq section, and the RNA was then loaded onto 1.4% formaldehyde agarose gels instead of proceeding to library construction and the subsequent steps.

Northern analyses were performed using oligonucleotides radiolabelled with T4 PNK. Specifically, the LSM8 reporters were detected using oJC2450, RPS20 with oJC2451, HIS3 with oJC2564, and SYN RNAs with oJC168. Northern signal quantitation was performed using ImageQuant software.


Polyribosome analysis

Sucrose density gradients for polyribosome analysis were performed essentially as described previously (Hu et al., 2009). Specifically, cells were grown until mid-log phase (OD600 = 0.4-0.45) at 24°C in synthetic media with the appropriate amino acids and 2% glucose. For glucose deprivation experiments, cells were centrifuged and resuspended in media with or without glucose for 10 min before harvesting. All cells were treated with cycloheximide to a final concentration of 100 µg/mL and collected by centrifugation. Cell pellets were lysed in buffer (10 mM Tris, pH 7.4, 100 mM NaCl, 30 mM MgCl2, 1 mM DTT, 100 µg/mL cycloheximide) by vortexing with glass beads, and cleared using the hot needle puncture method followed by centrifugation at 2,000 rpm for 2 min at 4°C. After centrifugation of the supernatants at 29,000 rpm for 10 min with a TLA 120.2 rotor, Triton X-100 was added to a final concentration of 1%. Sucrose gradients were made on a Biocomp gradient maker and were 15–45% weight/weight sucrose in buffer (50 mM Tris-acetate pH 7.0, 50 mM NH4Cl, 12 mM MgCl2, 1 mM DTT). 10 units (OD260) of cell lysate were loaded onto each gradient. Gradients were centrifuged at 41,000 rpm for 2 h and 26 min at 4°C in a Beckman SW-41Ti rotor and fractionated using a Brandel Fractionation System and an Isco UA-6 ultraviolet detector. Fractions were precipitated overnight at −20°C using 2 volumes of 95% ethanol. RNA/protein was pelleted at 14,000 rpm for 30 min, then pellets were resuspended in 500 µL LET (25 mM Tris pH 8.0, 100 mM LiCl, 20 mM EDTA) with 1% SDS. Fractions were then extracted once with phenol/LET, once with phenol/chloroform/LET, and then precipitated with one-tenth volume of 7.5 M CH3COONH4 and 2 volumes of 95% ethanol. After centrifugation at 14,000 rpm for 20 min, pellets were washed once with 700 µL of 75% ethanol, air dried, and resuspended in 1× LET. Half of each sample was loaded on 1.4% agarose-formaldehyde gels and Northern analysis was carried out as above. Northern blots of RNA from cells without stress were probed with oligonucleotide oJC2564. Northern blots of RNA from cells with stress were probed with probes generated by radiolabeled asymmetric PCR for increased sensitivity.

Asymmetric PCR probes

Plasmids pJC711 and pJC716 were used as templates to amplify non-optimal and optimal HIS3 sequences, respectively, in a first PCR using oJC2540/oJC2541 and Phusion Taq polymerase (BioLabs). The PCR products were run on a 1% agarose gel and the single amplicons were extracted using a GenElute Gel extraction kit (Sigma) and resuspended in 30 µL of water. 4 µL were added to a final 50 µL PCR mix containing dATP, dGTP, dTTP (200 µM each), dCTP (3 µM), the reverse primer oJC2564 (HIS3 ORF, 1 µM), 50 µCi of [α-32P]dCTP (3000 Ci/mmol; 10 µCi/µL) and 5 units of Taq polymerase. After denaturation at 94°C for 5 min, asymmetric amplification was performed for 40 cycles (15 sec at 94°C, 30 sec at 58°C, 30 sec at 72°C) followed by 10 min at 72°C. The resulting radiolabelled probes were purified on Micro Bio-Spin 6 Chromatography Columns (BioRad) following the manufacturer's instructions. Blots were pre-hybridized for 1 h at 42°C in 50% formamide, 5× SSC, 1× Denhardt's, 0.5 mg/mL salmon sperm DNA, 10 mM EDTA and 0.2% SDS, and probed with the optimal or non-optimal single-stranded probes generated by asymmetric PCR overnight at 42°C in the same buffer. They were washed twice for 5 min at room temperature in 2× SSC, 0.1% SDS, and once for 45 min at 50°C in 0.1× SSC, 0.1% SDS, and then placed on phosphorimager screens for overnight exposure.


Plating assays

For assays of growth on 3-AT, the HIS3 constructs pJC710, pJC711, pJC716 were transformed into the dcp2∆ strain to make yJC2040/yJC2041/yJC2089. These were then grown at 24°C in complete synthetic medium overnight, collected, and resuspended in medium lacking histidine at a density of OD600 = 0.2. They were then serially diluted and 4 µL of each dilution was spotted onto plates lacking histidine and supplemented with 3-AT. These were then grown at 24°C and photographed.

RNA-seq

rpb1-1 mutant cells (Nonet et al., 1987) (yJC244) were grown to mid-log phase at 24°C as described above. To achieve transcriptional repression, cells were shifted to 37°C, then cell aliquots were removed and isolated total RNA was used for library construction. 10 time points were collected over 60 minutes, including an initial aliquot collected at time 0, before the temperature shift.

Total RNA libraries were then prepared using the Illumina TruSeq Stranded Total RNA library prep kit. The starting material consisted of 1 μg of total RNA and 1 ng of ERCC Phage NIST spike-ins.

Poly(A)+ RNA libraries were prepared using the Illumina TruSeq Stranded mRNA library prep kit. The starting material for these libraries consisted of 4 μg of RNA and 1 ng of ERCC Phage NIST spike-ins.

The libraries were quantitated using an Agilent Bioanalyzer and sequenced on an Illumina HiSeq2000 using paired-end 100 bp reads with an index read.

Sequencing data and the processed data for each gene are available at the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under accession number GSE57385.


Alignment and half-life calculation

Reads were aligned to the SacCer2 S. cerevisiae reference genome using Bowtie v0.12.7 (Langmead et al., 2009) with the parameters '-m 1 -v 2 -p 8'. The remaining unaligned reads were then aligned to a reference file containing the sequences of the spike-in controls using the same parameters. The aligned reads were then converted into bam format and indexed using samtools v0.1.18 (Li et al., 2009). Gene FPKM values were calculated with Cufflinks v1.3.0 (Trapnell et al., 2010) using default parameters and a gtf file of the SacCer2 SGD gene annotation downloaded from the UCSC browser. The raw FPKM numbers were then normalized to the number of reads aligning to the spike-ins to adjust for the amplification resulting from a smaller pool of mRNA at later time points.
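The spike-in normalization can be illustrated with a minimal Python sketch. It assumes that each time point's FPKM values are rescaled by the ratio of time-0 spike-in reads to that time point's spike-in reads; the exact scaling used in the original pipeline is not stated, and the function name, data layout, and numbers below are hypothetical.

```python
def normalize_to_spikeins(fpkm_by_time, spikein_reads):
    """Rescale each time point so its spike-in signal matches time 0.

    fpkm_by_time: list of {gene: FPKM} dicts, one per time point (assumed layout).
    spikein_reads: number of reads aligned to the spike-ins at each time point.
    """
    reference = spikein_reads[0]
    return [
        {gene: value * reference / reads for gene, value in fpkm.items()}
        for fpkm, reads in zip(fpkm_by_time, spikein_reads)
    ]

# Hypothetical example: spike-ins take up twice as many reads at the later
# time point, so the apparent FPKM of the remaining mRNA is halved.
raw = [{"geneA": 100.0}, {"geneA": 80.0}]
normalized = normalize_to_spikeins(raw, [1000, 2000])
```

Because the same mass of spike-in RNA is added to every library, a rising spike-in read count signals a shrinking mRNA pool, which is what the rescaling compensates for.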

To estimate the half-life for each gene, we normalized the expression levels for each gene in each time series to the initial expression level. We then fit an exponential decay curve to the data by minimizing the sum of the absolute residuals for each gene. We filtered the list to exclude dubious and unverified ORFs, genes for which the average absolute residual was greater than 0.14, and genes which had an estimated half-life longer than the measured time course. To get a rough idea of the variability in our estimates of the gene half-lives, we performed a bootstrap-type procedure. The un-normalized residuals from the original data were resampled for each gene and added to the un-normalized fitted curve values to repeatedly simulate new sample data sets. The 95% confidence intervals were based on the 2.5% and 97.5% quantiles of the half-life estimates calculated from the simulated data sets.
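The fitting and bootstrap steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the original analysis code (which was written in R): the least-absolute-residual fit is done by a simple scan over candidate half-lives rather than a numerical optimizer, the search range, number of bootstrap replicates, and example data are hypothetical, and the quantiles are taken as approximate empirical quantiles.

```python
import random

def decay_curve(times, half_life):
    """Fraction of the initial mRNA expected to remain at each time point."""
    return [0.5 ** (t / half_life) for t in times]

def fit_half_life(times, levels, lo=0.5, hi=120.0, steps=1000):
    """Scan log-spaced half-lives and keep the one with the smallest
    sum of absolute residuals (a stand-in for a numerical optimizer)."""
    best, best_cost = lo, float("inf")
    for i in range(steps + 1):
        half_life = lo * (hi / lo) ** (i / steps)
        fitted = decay_curve(times, half_life)
        cost = sum(abs(y - f) for y, f in zip(levels, fitted))
        if cost < best_cost:
            best, best_cost = half_life, cost
    return best

def bootstrap_interval(times, levels, n_boot=200, seed=0):
    """Resample residuals onto the fitted curve and refit each simulated series."""
    rng = random.Random(seed)
    fitted = decay_curve(times, fit_half_life(times, levels))
    residuals = [y - f for y, f in zip(levels, fitted)]
    estimates = sorted(
        fit_half_life(times, [f + rng.choice(residuals) for f in fitted])
        for _ in range(n_boot)
    )
    # Approximate 2.5% and 97.5% empirical quantiles.
    return estimates[int(0.025 * (n_boot - 1))], estimates[int(0.975 * (n_boot - 1))]

# Hypothetical time course (minutes) and expression values for one gene.
times = [0, 5, 10, 20, 40, 60]
raw = [200.0, 141.0, 100.0, 52.0, 12.0, 3.0]
levels = [value / raw[0] for value in raw]  # normalize to the initial level
half_life = fit_half_life(times, levels)
low, high = bootstrap_interval(times, levels)
```

Minimizing absolute rather than squared residuals makes the fit less sensitive to a single outlying time point, which is presumably why that criterion was chosen.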


Statistical techniques

The Codon occurrence to mRNA Stability Correlation coefficient (CSC) was determined by calculating a Pearson correlation coefficient between the frequency of occurrence of individual codons and the half-lives of the messages containing them.
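As an illustration of the CSC definition, the following Python sketch computes a Pearson correlation between the frequency of one codon in each ORF and the half-life of the corresponding mRNA. It assumes codon frequency means occurrences divided by the total number of codons in the ORF; the function names, sequences, and half-lives are hypothetical toy values.

```python
import math

def codon_frequencies(orf):
    """Frequency of each in-frame codon, as a fraction of all codons in the ORF."""
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    counts = {}
    for codon in codons:
        counts[codon] = counts.get(codon, 0) + 1
    return {codon: n / len(codons) for codon, n in counts.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def csc(orfs, half_lives, codon):
    """Correlation between one codon's frequency and mRNA half-life across genes."""
    freqs = [codon_frequencies(orf).get(codon, 0.0) for orf in orfs]
    return pearson(freqs, half_lives)

# Toy data: three short ORFs and hypothetical half-lives in minutes.
orfs = ["ATGGCTGCTGCT", "ATGGCTAAAAAA", "ATGAAAAAAAAA"]
half_lives = [30.0, 20.0, 10.0]
gct_csc = csc(orfs, half_lives, "GCT")  # positive: enriched in stable mRNAs
aaa_csc = csc(orfs, half_lives, "AAA")  # negative: enriched in unstable mRNAs
```

Repeating the calculation for each of the 61 sense codons gives one CSC value per codon.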

To determine the statistical significance of the association between codon optimality and the CSC, we first categorized the CSC as either positive or negative. We then used a chi-squared test of association. We also used linear regression as another measure. Similarly, to look at the association between the categories of optimal codon content and mRNA half-life, we used an ANOVA F-test with mRNA half-life on the log scale.
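A chi-squared test of association on a 2×2 table (optimal or non-optimal codon against positive or negative CSC) can be sketched as below. The counts are hypothetical and are not the values from this study. The sketch uses the plain Pearson statistic without a continuity correction, so it may differ slightly from the default output of R's chisq.test for 2×2 tables.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value (1 degree of freedom)
    for a 2x2 contingency table given as [[a, b], [c, d]]."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: rows = optimal / non-optimal codons,
# columns = positive / negative CSC.
statistic, p_value = chi_squared_2x2([[20, 5], [5, 20]])
```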

Any test of association between codon optimality and transcript stability may show artificial statistical significance due to confounding with the content of the genes. To help mitigate this possibility, for each test statistic, we randomly permuted the base pairs of the genes and recalculated the test statistic for each of 10,000 permutations. We calculated the base pair permutation p-value as the proportion of permuted data sets with a test of association stronger than the chi-squared test in the un-permuted data. Statistical calculations were done using the R environment. Percent optimal codon values were calculated by generating a list of optimal and non-optimal codons as previously described (Pechmann and Frydman, 2013).
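The base permutation procedure can be sketched in Python as follows. This is a simplified illustration: it assumes the bases are shuffled within each ORF (preserving each gene's base composition), uses a toy covariance-style statistic for a single codon in place of the chi-squared statistic, and reports the fraction of permutations at least as extreme as the observed value. All names and data are hypothetical.

```python
import random

def codon_fraction(orf, codon):
    """Fraction of in-frame codons in the ORF that match the given codon."""
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    return codons.count(codon) / len(codons)

def association(orfs, half_lives, codon="GCT"):
    """Toy statistic: covariance-style sum between codon fraction and half-life."""
    fractions = [codon_fraction(orf, codon) for orf in orfs]
    mean_f = sum(fractions) / len(fractions)
    mean_h = sum(half_lives) / len(half_lives)
    return sum((f - mean_f) * (h - mean_h) for f, h in zip(fractions, half_lives))

def shuffled(orf, rng):
    """Randomly permute the bases of one ORF, keeping its base composition."""
    bases = list(orf)
    rng.shuffle(bases)
    return "".join(bases)

def permutation_p_value(orfs, half_lives, n_perm=1000, seed=0):
    """Fraction of base-shuffled data sets whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = association(orfs, half_lives)
    hits = sum(
        association([shuffled(orf, rng) for orf in orfs], half_lives) >= observed
        for _ in range(n_perm)
    )
    return hits / n_perm

# Toy data: GCT-rich ORFs are given the longer hypothetical half-lives.
orfs = ["GCT" * 20, "GCT" * 10 + "AAA" * 10, "AAA" * 20]
half_lives = [30.0, 20.0, 10.0]
p_value = permutation_p_value(orfs, half_lives, n_perm=200)
```

Shuffling bases destroys codon identity while keeping nucleotide content fixed, so an association that survives this comparison is unlikely to be explained by base composition alone.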

Heat map generation

For all mRNAs with reliable half-lives, rates of usage of each of the 61 codons were calculated using an in-house Perl script. These values were then input into an Excel spreadsheet, assigned ranks using the RANK.AVG function, and then exported to a tsv file. These were then evaluated using a Spearman distance metric and clustered using k-means clustering in Cluster3 (de Hoon et al., 2004). The clustered output was visualized and color coded using the log-scale option of Java Treeview (Saldanha, 2004).
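The ranking and distance steps can be illustrated with a short Python sketch. It assumes ranks are assigned in descending order with ties sharing the mean rank, as Excel's RANK.AVG does by default, and that the Spearman distance is one minus the Spearman rank correlation. The usage values are hypothetical, and the sketch does not reproduce the k-means step performed in Cluster3.

```python
import math

def average_ranks(values):
    """Rank values in descending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_distance(xs, ys):
    """One minus the Spearman rank correlation of two usage profiles."""
    return 1.0 - pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical codon usage rates for two mRNAs over the same four codons.
usage_a = [0.30, 0.10, 0.30, 0.20]
usage_b = [0.25, 0.15, 0.35, 0.25]
ranks_a = average_ranks(usage_a)
distance = spearman_distance(usage_a, usage_b)
```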

Tables

Table 1 Yeast Strains

Name | Genotype | Source
yJC151 | MATa, ura3, leu2, his3, met15 | This study
yJC244 | MATa, ura3-52, his3-200, leu2-3.112, rpb1-1 | (Nonet et al., 1987)
yJC1093 | MATa, ura3, leu2, his3, met15, [PGK1-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1095 | MATa, ura3, leu2, his3, met15, [SL-PGK1-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1097 | MATa, ura3, leu2, his3, met15, [PGK1-RC77%-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1099 | MATa, ura3, leu2, his3, met15, dcp2::NEO, [PGK1-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1101 | MATa, ura3, leu2, his3, met15, dcp2::NEO, [SL-PGK1-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1103 | MATa, ura3, leu2, his3, met15, dcp2::NEO, [PGK1-RC77%-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1221 | MATa, ura3, leu2, his3, met15, [PGK1-RC2%-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1223 | MATa, ura3, leu2, his3, met15, [PGK1-RC25%-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1225 | MATa, ura3, leu2, his3, met15, [PGK1-RC50%-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1227 | MATa, ura3, leu2, his3, met15, [PGK1-RC63%-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1229 | MATa, ura3, leu2, his3, met15, [PGK1-RC94%-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1231 | MATa, ura3, leu2, his3, met15, dcp2::NEO, [PGK1-RC2%-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1233 | MATa, ura3, leu2, his3, met15, dcp2::NEO, [PGK1-RC25%-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1235 | MATa, ura3, leu2, his3, met15, dcp2::NEO, [PGK1-RC50%-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1237 | MATa, ura3, leu2, his3, met15, dcp2::NEO, [PGK1-RC63%-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1239 | MATa, ura3, leu2, his3, met15, dcp2::NEO, [PGK1-RC94%-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1304 | MATa, ura3, his3, leu2, met15, [SL-PGK1-RC5%-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1306 | MATa, ura3, his3, leu2, met15, [SL-PGK1-RC77%-HA-MS2, URA3], [MS2, LEU2] | This study
yJC1334 | MATa, ura3, leu2, his3, lys2, dcp2::NEO [MS2, LEU2] [SL-PGK1-RC5%, URA3] | This study
yJC1336 | MATa, ura3, leu2, his3, lys2, dcp2::NEO [MS2, LEU2] [SL-PGK1-RC77%, URA3] | This study
yJC1883 | MATa, ura3, leu2, his3, met15, rpb1-1 | This study
yJC1888 | MATa, ura3-52, his3-200, leu2-3.112, rpb1-1, [WT (nonoptimal) LSM8, URA3] | This study
yJC1889 | MATa, ura3-52, his3-200, leu2-3.112, rpb1-1, [mutant (optimal) LSM8, URA3] | This study
yJC1890 | MATa, ura3-52, his3-200, leu2-3.112, rpb1-1, [WT (optimal) RPS20, URA3] | This study
yJC1891 | MATa, ura3-52, his3-200, leu2-3.112, rpb1-1, [mutant (nonoptimal) RPS20, URA3] | This study
yJC1892 | MATa, ura3, leu2, his3, met15, [pGAL-PGK1pG/SYN opt, URA3] | This study
yJC1893 | MATa, ura3, leu2, his3, met15, [pGAL-PGK1pG/SYN non-opt, URA3] | This study
yJC1894 | MATa, ura3, leu2, his3, met15, [pGAL-MFA2pG/SYN opt, URA3] | This study
yJC1895 | MATa, ura3, leu2, his3, met15, [pGAL-MFA2pG/SYN non-opt, URA3] | This study
yJC1917 | MATa, ura3, leu2, his3, met15, dcp2::NEO, [pGAL-PGK1pG/SYN opt, URA3] | This study
yJC1918 | MATa, ura3, leu2, his3, met15, dcp2::NEO, [pGAL-PGK1pG/SYN non-opt, URA3] | This study
yJC1961 | MATa, ura3, leu2, his3, met15, ccr4::NEO, [pGAL-PGK1pG/SYN opt, URA3] | This study
yJC1962 | MATa, ura3, leu2, his3, met15, ccr4::NEO, [pGAL-PGK1pG/SYN non-opt, URA3] | This study
yJC1963 | MATa, ura3, leu2, his3, met15, dom34::NEO, [pGAL-PGK1pG/SYN opt, URA3] | This study
yJC1964 | MATa, ura3, leu2, his3, met15, dom34::NEO, [pGAL-PGK1pG/SYN non-opt, URA3] | This study
yJC1996 | MATa, ura3, leu2, his3, met15, upf1::NEO, [pGAL-PGK1pG/SYN opt, URA3] | This study
yJC1997 | MATa, ura3, leu2, his3, met15, upf1::NEO, [pGAL-PGK1pG/SYN non-opt, URA3] | This study
yJC2030 | MATa, ura3, leu2, his3, met15, [HIS3-non-optimal, URA3] | This study
yJC2031 | MATa, ura3, leu2, his3, met15, [HIS3-endogenous, URA3] | This study
yJC2032 | MATa, ura3, leu2, his3, met15, rpb1-1, [HIS3-non-optimal, URA3] | This study
yJC2033 | MATa, ura3, leu2, his3, met15, rpb1-1, [HIS3-endogenous, URA3] | This study
yJC2040 | MATa, ura3, leu2, his3, lys2, dcp2::NEO, [HIS-non-optimal, URA3] | This study
yJC2041 | MATa, ura3, leu2, his3, lys2, dcp2::NEO, [HIS-endogenous, URA3] | This study
yJC2088 | MATa, ura3, leu2, his3, met15, [HIS3-optimal, URA3] | This study
yJC2089 | MATa, ura3, leu2, his3, lys2, dcp2::NEO, [HIS3-optimal, URA3] | This study
yJC2090 | MATa, ura3-52, his3-200, leu2-3.112, rpb1-1, [HIS3-optimal, URA3] | This study
yJC2135 | MATa, ura3, leu2, his3, met15, [FLAG-HIS3-non-optimal, URA3] | This study
yJC2137 | MATa, ura3, leu2, his3, met15, [FLAG-HIS3-optimal, URA3] | This study

Table 2 Plasmids

Name | Description | Reference
pJC69 | YCpLac33 | (Gietz and Sugino, 1988)
pJC296 | PGK1pG reporter under control of GAL1 promoter | (Decker and Parker, 1993)
pJC312 | MFA2pG reporter under control of GAL1 promoter | (Decker and Parker, 1993)
pJC387 | pRS413 | (Brachmann et al., 1998)
pJC390 | pRS416 | (Brachmann et al., 1998)
pJC659 | WT LSM8 +/-500 bp in pJC69 | This study
pJC660 | WT RPS20 +/-500 bp in pJC69 | This study
pJC661 | pJC659 with an XhoI site upstream of the LSM8 start codon. | This study
pJC662 | pJC661 with an SphI site downstream of the LSM8 stop codon (and an additional single nucleotide mutation within the 3' UTR). | This study
pJC663 | pJC662 with additional LSM8 3' UTR mutations. | This study
pJC664 | pJC660 with an XhoI site upstream of the RPS20 start codon. | This study
pJC665 | pJC664 with an SphI site downstream of the RPS20 stop codon (and an additional single nucleotide mutation within the 3' UTR). | This study
pJC666 | pJC665 with additional RPS20 3' UTR mutations. | This study
pJC667 | pJC663 in which the mutant (optimal) LSM8 gene replaced the WT LSM8 gene. | This study
pJC668 | pJC666 in which the mutant (nonoptimal) RPS20 gene replaced the WT RPS20 gene. | This study
pJC672 | PGK1pG reporter with SYN opt ORF (under control of GAL1 promoter). | This study
pJC673 | PGK1pG reporter with SYN non-opt ORF (under control of GAL1 promoter). | This study
pJC674 | MFA2pG reporter with SYN opt ORF (under control of GAL1 promoter). | This study
pJC675 | MFA2pG reporter with SYN non-opt ORF (under control of GAL1 promoter). | This study
pJC710 | HIS3 non-optimal ORF (own promoter) without marker | This study
pJC711 | HIS3 non-optimal ORF (own promoter) | This study
pJC712 | HIS3 endogenous ORF (own promoter) | This study
pJC716 | HIS3 optimal ORF (own promoter) | This study
pJC719 | Flag-tagged HIS3 non-optimal (own promoter) | This study
pJC720 | Flag-tagged HIS3 optimal (own promoter) | This study


Table 3 Oligonucleotides

Name | Sequence | Reference
oJC168 (oRP121) | 5'-AATTCCCCCCCCCCCCCCCCCCA-3' | (Hu et al., 2009)
oJC558 | 5'-CCGGGGATCCGTACTGTTACTCTCTCTC-3' | This study
oJC559 | 5'-GTGCCAAGCTTTAACGAACGCAGAAT-3' | This study
oJC876 | 5'-TTAATAGCGCGGCGGCGGCGGCGGGCGACGGACGGTAAGAAGATCACTTCTAACCAAAGAATTGTTGCTGC-3' | This study
oJC877 | 5'-CGTCGCCCGCCGCCGCCGCCGCGCTATTAAGACACGCTTGTCCTTCAAGTCCAAATCTTG-3' | This study
oJC1011 | 5'-TTGAATTGAATTGAAACTAGTAATTTGGGGGGGGGG-3' | This study
oJC1012 | 5'-CCCCCCCCCCAAATTACTAGTTTCAATTCAATTCAA-3' | This study
oJC1013 | 5'-GGGGGGGGGGGGGGGCTCGAGTAGATCAATTTTTTTC-3' | This study
oJC1014 | 5'-GAAAAAAATTGATCTACTCGAGCCCCCCCCCCCCCCC-3' | This study
oJC1015 | 5'-CTAGACATGAGGATCACCCATGTCTGCAGGTCGACTCTAGAAAACATGAGGATCACCCATGT-3' | This study
oJC1016 | 5'-TCGAACATGGGTGATCCTCATGTTTTCTAGAGTCGACCTGCAGACATGGGTGATCCTCATGT-3' | This study
oJC1196 | 5'-GTGTTGCTTTCTTATCCGAATTAATTAACGGCGCGCCGAATTGAAACTAGACATG-3' | This study
oJC1197 | 5'-CATGTCTAGTTTCAATTCGGCGCGCCGTTAATTAATTCGGATAAGAAAGCAACAC-3' | This study
oJC1244 | 5'-CGTCGCCCGCCGCCGCCGCCGCGCTATTAATTCAACTTCTGGACCGACAC-3' | This study
oJC1245 | 5'-TTAATAGCGCGGCGGCGGCGGCGGGCGACGGTTATTTTGTTGGAAAACTTG-3' | This study
oJC1246 | 5'-CGTCGCCCGCCGCCGCCGCCGCGCTATTAAGGCCAAGAATGGTCTGGTTG-3' | This study
oJC1247 | 5'-TTAATAGCGCGGCGGCGGCGGCGGGCGACGATTCAATTGATTGACAACTTG-3' | This study
oJC1248 | 5'-CGTCGCCCGCCGCCGCCGCCGCGCTATTAAACCAGCCTTGTCGAAGATGG-3' | This study
oJC1249 | 5'-TTAATAGCGCGGCGGCGGCGGCGGGCGACGGCCAAGGCCAAGGGTGTCGAAG-3' | This study
oJC1261 | 5'-GAGCTCGGTACCCGGGGATCCGTAC-3' | This study
oJC1318 | 5'-CATGTCTCTACTGGTTTAATAGCGCGGCGGGAATTATTGGAAGG-3' | This study
oJC1319 | 5'-CCTTCCAATAATTCCCGCCGCGCTATTAAACCAGTAGAGACATG-3' | This study
oJC1320 | 5'-GTCTCTACTGGTTTAATAGCGCGGCGGCGGCGGCGGGCGACGAAGGAATTGCCAG-3' | This study
oJC1321 | 5'-CTGGCAATTCCTTCGTCGCCCGCCGCCGCCGCCGCGCTATTAAACCAGTAGAGAC-3' | This study
oJC1370 | 5'-GAAGGACAAGCGTGTCTTAATAGCGCGGCGGTTCAACGTCCCATTG-3' | This study
oJC1371 | 5'-CAATGGGACGTTGAACCGCCGCGCTATTAAGACACGCTTGTCCTTC-3' | This study
oJC1372 | 5'-GTGTCTTAATAGCGCGGCGGCGGCGGCGGGCGACGGACGGTAAGAAGATC-3' | This study
oJC1373 | 5'-GATCTTCTTACCGTCCGTCGCCCGCCGCCGCCGCCGCGCTATTAAGACAC-3' | This study
oJC1374 | 5'-CCAGAATCTAGAAAGTTAATAGCGCGGCGGGTTGCAAAGGCTAAG-3' | This study
oJC1375 | 5'-CTTAGCCTTTGCAACCCGCCGCGCTATTAACTTTCTAGATTCTGG-3' | This study
oJC1376 | 5'-CTAGAAAGTTAATAGCGCGGCGGCGGCGGCGGGCGACGACCATTGTCTGG-3' | This study
oJC1377 | 5'-CCAGACAATGGTCGTCGCCCGCCGCCGCCGCCGCGCTATTAACTTTCTAG-3' | This study
oJC2357 | 5'-CAGGTCAAGCTTTCCAGTAGCTGGTTAAACTTG-3' | This study
oJC2358 | 5'-CAGGTCGGATCCTTGCTATTTGGCGATGAGTTC-3' | This study
oJC2366 | 5'-CGGTCCAAGCTTATTGTGACTAGAATACTATTG-3' | This study
oJC2367 | 5'-CAGGTCGGATCCGCGTGAAACATTTATCAGC-3' | This study
oJC2377 | 5'-CTACTTTTTACAACAAATATACTAGTATGTCTTTATCTTCAAAGTT-3' | This study
oJC2378 | 5'-AACTTTGAAGATAAAGACATACTAGTATATTTGTTGTAAAAAGTAG-3' | This study
oJC2379 | 5'-CTTATCCGAAAAGAAATAACTCGAGTTGAATTGAACGAAGGAATTT-3' | This study
oJC2380 | 5'-AAATTCCTTCGTTCAATTCAACTCGAGTTATTTCTTTTCGGATAAG-3' | This study
oJC2381 | 5'-TCATACAACAATAACTACCATCTAGAATGCAACCGATCACCACTGC-3' | This study
oJC2382 | 5'-GCAGTGGTGATCGGTTGCATTCTAGATGGTAGTTATTGTTGTATGA-3' | This study
oJC2383 | 5'-CCGCCTGTGTTATCGCTTAACTCGAGACGACAACCAAGAGATCTAG-3' | This study
oJC2384 | 5'-CTAGATCTCTTGGTTGTCGTCTCGAGTTAAGCGATAACACAGGCGG-3' | This study
oJC2385 | 5'-GAATACTAGTATGCCACCAAAGGCTTCCCCAACCGGTGCTTCCTCCGTTTTGAAGGCTAAGGCTCCATCCATCCCAGCTAAGACCGTTGGTAAGACCTTGCCAAAGACCGTTATCACCAAGTTGTCCACCGTTATCACCTTGGGTGCTGCTGGTTTGATCGTTCCATTGTCCATCGGTATCGGTGTTTAACTCGAGCTAA-3' | This study
oJC2386 | 5'-GAATACTAGTATGCCGCCGAAAGCAAGTCCGACAGGAGCAAGTAGTGTACTGAAAGCAAAAGCACCGAGTATACCGGCAAAAACAGTAGGAAAAACACTGCCGAAAACAGTAATAACAAAACTGAGTACAGTAATAACACTGGGAGCAGCAGGACTGATAGTACCGCTGAGTATAGGAATAGGAGTATAACTCGAGCTAA-3' | This study
oJC2409 | 5'-TTAGCTCGAGTTAAACACCGATACCGATGGACAATGGAACGATCAAACCAGCAGCACCCAAGGTGATAACGGTGGACAACTTGGTGATAACGGTCTTTGGCAAGGTCTTACCAACGGTCTTAGCTGGGATGGATGGAGCCTTAGCCTTCAAAACGGAGGAAGCACCGGTTGGGGAAGCCTTTGGTGGCATACTAGTATTC-3' | This study
oJC2410 | 5'-TTAGCTCGAGTTATACTCCTATTCCTATACTCAGCGGTACTATCAGTCCTGCTGCTCCCAGTGTTATTACTGTACTCAGTTTTGTTATTACTGTTTTCGGCAGTGTTTTTCCTACTGTTTTTGCCGGTATACTCGGTGCTTTTGCTTTCAGTACACTACTTGCTCCTGTCGGACTTGCTTTCGGCGGCATACTAGTATTC-3' | This study
oJC2415 | 5'-CAAGAAGAACACACTATCGCTCGAGATGTCAGCCACCTTGAAAGAC-3' | This study
oJC2416 | 5'-GTCTTTCAAGGTGGCTGACATCTCGAGCGATAGTGTGTTCTTCTTG-3' | This study
oJC2417 | 5'-AATAAACAAAAAGGTATACTCGAGATGTCTGACTTTCAAAAGG-3' | This study
oJC2418 | 5'-CCTTTTGAAAGTCAGACATCTCGAGTATACCTTTTTGTTTATT-3' | This study
oJC2421 | 5'-CAGGATCTCGAGATGTCCGCCACCTTGAAGGACTACTTGAACAAGAGAGTTGTTATCATCAAGGTTGACGGCGAATGTTTGATCGCTTCTTTGAACGGCTTCGACAAGAACACTAACTTGTTCATCACCAACGTTTTCAACCGTATCTCTAAGGAATTCATCTGTAAGGCTCAATTGTTGCGTGGCTCCGAAATTGC-3' | This study
oJC2422 | 5'-CAGGATGCATGCTTACTTAGTCTTAGATTCGTAAACCTTTTCCCAGATAACGTGTTCGTTTTCGATCTTGTTCTTGGTGTCCTTCAACATTGGGACCTTCTTTTCGTCGATTGGAGCCAAAGAGTCGTCGTTTTCGGCGTCGATCAAGCCAACCAAAGCAATTTCGGAGCCACGCAACAATTGAGCCTTAC-3' | This study
oJC2423 | 5'-CAGGATCTCGAGATGTCCGCC-3' | This study
oJC2424 | 5'-CAGGATGCATGCTTACTTAGTCTTAGATTCG-3' | This study
oJC2425 | 5'-ATGAGTGATTTTCAGAAAGAGAAAGTAGAGGAGCAGGAGCAGCAGCAGCAGCAGATAATAAAAATTAGGATAACACTGACAAGCACAAAAGTAAAACAGCTGGAGAATGTAAGCTCAAATATTGTAAAAAATGCAGAGCAGCATAATCTGGTAAAAAAAGGACCGGTAAGGCTACCGACAAAAGTACTGAAAATAAGCAC-3' | This study
oJC2426 | 5'-TGCTTAATTTGATGCTACTACTACCTCTACATCCACTCCAGGCTCAATTGTTATCTGTGTTATCCTTTTTACTATCTGTACAGGTGCCTCAAGATCTATATACCTTTTATGTATCCTCATCTCATATGTCTCCCATGTTTTACTTCCCTCTCCATTCGGTGTTTTCCTTGTGCTTATTTTCAGTACTTTTGTCGGTAGCC-3' | This study
oJC2427 | 5'-CAGGATCTCGAGATGAGTGATTTTCAGAAAGAG-3' | This study
oJC2428 | 5'-CAGGATGCATGCTTAATTTGATGCTACTACTACC-3' | This study
oJC2431 | 5'-GTGTACGAATCAAAGACAAAATAAGCATGCAGCAATAATAGTAATAATAATA-3' | This study
oJC2432 | 5'-TATTATTATTACTATTATTGCTGCATGCTTATTTTGTCTTTGATTCGTACAC-3' | This study
oJC2433 | 5'-GTTGTTGTTGCTTCCAACTAAGCATGCTGTAACTGGAAATAATTTC-3' | This study
oJC2434 | 5'-GAAATTATTTCCAGTTACAGCATGCTTAGTTGGAAGCAACAACAAC-3' | This study
oJC2435 | 5'-CAAAGACAAAATAAGCATGCAGGTAACCAAGTAATAATAATAATAATAA-3' | This study
oJC2436 | 5'-TTATTATTATTATTATTACTTGGTTACCTGCATGCTTATTTTGTCTTTG-3' | This study
oJC2437 | 5'-CTTCCAACTAAGCATGCTGTTACGCGTAATAATTTCCATTAGATTCC-3' | This study
oJC2438 | 5'-GGAATCTAATGGAAATTATTACGCGTAACAGCATGCTTAGTTGGAAG-3' | This study
oJC2450 | 5'-TACTTGGTTACCTGCATGC-3' | This study
oJC2451 | 5'-ATTACGCGTAACAGCATGC-3' | This study
oJC2500 | 5'-CAAGTTAATTAAATGACAGAGCAGAAAGCGCTAGTAAAACGAATAACAAATGAGACGAAAATACAGATAGCGATATCATTAAAAGGAGGACCCCTAGCGATAGAGCATTCGATATTTCCGGAGAAAGAGGCAGAGGCAGTAGCAGAGCAGGCGACACAGTCGCAGGTGATAAATGTGCATACAGGAATAGGGTTTCTG-3' | This study
oJC2501 | 5'-CCTTTTTACTCCTCGCACCGCCCCTAGCGCCTCTTTAAATGCCTGTCCGAGTGCTATCCCGCAATCCTCTGTCGTATGATGATCATCTATATGTAAATCTCCTATGCACTCTACTATTAGCGACCAGCCCGAATGTTTCGCCAGTGCATGTATCATATGATCCAGAAACCCTATTCCTGTATGCACATTTATCACCTG-3' | This study
oJC2502 | 5'-GCATTTAAAGAGGCGCTAGGGGCGGTGCGAGGAGTAAAAAGGTTTGGATCAGGATTTGCGCCTCTGGATGAGGCACTTTCGAGGGCGGTGGTAGATCTTTCGAATAGGCCGTATGCAGTAGTGGAGCTTGGATTACAGAGGGAGAAAGTAGGAGATCTCTCATGCGAGATGATACCGCATTTTCTTGAGAGCTTTGCAGA-3' | This study
oJC2503 | 5'-GTTAGGCGCGCCCTACATAAGTACTCCTTTCGTCGAGGGTACATCATTCGTTCCATTGGGCGACGTCGCCTCCCTTATCGCTACCGCAAGTGCTTTAAACGCACTCTCACTTCGATGATGATCATTTTTGCCTCGCAGACAATCTACATGGAGCGTTATCCTGCTTGCCTCTGCAAAGCTCTCAAGAAAATGCGGTATCA-3' | This study
oJC2508 | 5'-GAGTGCACCATACCACAGCTCTCGAGTTCAATTCATCATTTTTTTT-3' | This study
oJC2509 | 5'-ATCTGTGCGGTATTTCACACGAATTCGGGTAATAACTGATATAATT-3' | This study
oJC2518 | 5'-CAAGTTAATTAAATGACAGAGCAG-3' | This study
oJC2519 | 5'-GTTAGGCGCGCCCTACATAAGTACT-3' | This study
oJC2540 | 5'-AGTAATGTGATTTCTTCGAA-3' | This study
oJC2541 | 5'-ATTCATAGGTATACATATAT-3' | This study
oJC2564 | 5'-CCTGATCCAAACCTTTTTACTCC-3' | This study
oJC2605 | 5'-CCTACGTTAATTAAATGACTGAACAAAAGGCCTTGGTTAAGCGTATTACTAACGAAACCAAGATTCAAATTGCCATCTCTTTGAAGGGTGGTCCATTGGCCATTGAACACTCCATCTTCCCAGAAAAGGAAGCTGAAGCTGTTGCTGAACAAGCCACTCAATCCCAAGTCATTAACGTCCACACTGGTATTGGTTTCTTG-3' | This study
oJC2606 | 5'-AACCTTTTTACTCCACGGACGGCACCCAAGGCTTCCTTGAAAGCTTGACCCAAAGCAATACCACAGTCTTCAGTGGTGTGGTGGTCGTCAATGTGCAAGTCACCAATACATTCAACGATCAAGGACCAACCGGAGTGCTTGGCCAAAGCGTGAATCATGTGGTCCAAGAAACCAATACCAGTGTGGACGTTAATGACTTG-3' | This study
oJC2607 | 5'-GGAAGCCTTGGGTGCCGTCCGTGGAGTAAAAAGGTTTGGATCAGGTTTCGCCCCATTGGACGAAGCTTTGTCCAGAGCCGTCGTTGACTTGTCCAACAGACCATACGCTGTTGTCGAATTGGGTTTGCAAAGAGAAAAGGTTGGTGACTTGTCTTGTGAAATGATCCCACACTTCTTGGAATCCTTCGCTGAAGCTTCCA-3' | This study
oJC2608 | 5'-GGTTCAGGCGCGCCCTACATCAAAACACCCTTGGTGGATGGAACGTCGTTGGTACCGTTTGGGGAGGTGGCTTCTCTAATGGCAACGGCCAAAGCCTTGAAGGCAGATTCAGAACGGTGGTGGTCGTTCTTACCACGCAAACAGTCAACGTGCAAGGTAATTCTGGAAGCTTCAGCGAAGGATTCCAAGAAGTGTGGGA-3' | This study
oJC2611 | 5'-CCTACGTTAATTAAATGACTGAACAAAAGGCCTTGG-3' | This study
oJC2612 | 5'-GGTTCAGGCGCGCCCTACATCAAAAC-3' | This study
oJC2620 | 5'-GAGCAGGCAAGATAAACGATTAATTAAATGGATTATAAAGATGATGATGATAAAACAGAGCAGAAAGCGCTAGTAAAACGAATA-3' | This study
oJC2621 | 5'-TATTCGTTTTACTAGCGCTTTCTGCTCTGTTTTATCATCATCATCTTTATAATCCATTTAATTAATCGTTTATCTTGCCTGCTC-3' | This study
oJC2622 | 5'-GAGCAGGCAAGATAAACGATTAATTAAATGGATTATAAAGATGATGATGATAAAACTGAACAAAAGGCCTTGGTTAAGCGTATT-3' | This study
oJC2623 | 5'-AATACGCTTAACCAAGGCCTTTTGTTCAGTTTTATCATCATCATCTTTATAATCCATTTAATTAATCGTTTATCTTGCCTGCTC-3' | This study
oJC2632 | 5'-TGGTTGGTAGTCTGACTGGACCCT-3' | This study
oJC2633 | 5'-GCTTGCTATGAGGCATTCGCCG-3' | This study


BIBLIOGRAPHY

Akashi, H. (1994). Synonymous codon usage in Drosophila melanogaster: Natural selection and translational accuracy. Genetics 136, 927–935.

Anderson, J.S., and Parker, R.P. (1998). The 3' to 5' degradation of yeast mRNAs is a general mechanism for mRNA turnover that requires the SKI2 DEVH box protein and 3' to 5' exonucleases of the exosome complex. EMBO J. 17, 1497–1506.

Artieri, C.G., and Fraser, H.B. (2014). Evolution at two levels of gene expression in yeast. Genome Res. 24, 411–421.

Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W., and Noble, W.S. (2009). MEME Suite: Tools for motif discovery and searching. Nucleic Acids Res. 37.

Baker, K.E., and Parker, R. (2004). Nonsense-mediated mRNA decay: terminating erroneous gene expression. Curr. Opin. Cell Biol. 16, 293–299.

Baumann, B., Potash, M.J., and Köhler, G. (1985). Consequences of frameshift mutations at the immunoglobulin heavy chain locus of the mouse. EMBO J. 4, 351–359.

Beelman, C.A., and Parker, R. (1994). Differential effects of translational inhibition in cis and in trans on the decay of the unstable yeast MFA2 mRNA. J. Biol. Chem. 269, 9687–9692.

Bertram, G., Innes, S., Minella, O., Richardson, J.P., and Stansfield, I. (2001). Endless possibilities: translation termination and stop codon recognition. Microbiology 147, 255–269.

Blower, M.D., Jambhekar, A., Schwarz, D.S., and Toombs, J.A. (2013). Combining different mRNA capture methods to analyze the transcriptome: analysis of the Xenopus laevis transcriptome. PLoS One 8, e77700.

Brachmann, C.B., Davies, A., Cost, G.J., Caputo, E., Li, J., Hieter, P., and Boeke, J.D. (1998). Designer deletion strains derived from Saccharomyces cerevisiae S288C: a useful set of strains and plasmids for PCR-mediated gene disruption and other applications. Yeast 14, 115–132.

Brar, G.A., Yassour, M., Friedman, N., Regev, A., Ingolia, N.T., and Weissman, J.S. (2012). High-resolution view of the yeast meiotic program revealed by ribosome profiling. Science 335, 552–557.

Cabada, M.O., Darnbrough, C., Ford, P.J., and Turner, P.C. (1977). Differential accumulation of two size classes of poly(A) associated with messenger RNA during oogenesis in Xenopus laevis. Dev. Biol. 57, 427–439.


Caponigro, G., Muhlrad, D., and Parker, R. (1993). A small segment of the MAT alpha 1 transcript promotes mRNA decay in Saccharomyces cerevisiae: a stimulatory role for rare codons. Mol. Cell. Biol. 13, 5141–5148.

Chan, C.T.Y., Pang, Y.L.J., Deng, W., Babu, I.R., Dyavaiah, M., Begley, T.J., and Dedon, P.C. (2012). Reprogramming of tRNA modifications controls the oxidative stress response by codon-biased translation of proteins. Nat. Commun. 3, 937.

Chang, Y.-F., Imam, J.S., and Wilkinson, M.F. (2007). The nonsense-mediated decay RNA surveillance pathway. Annu. Rev. Biochem. 76, 51–74.

Charneski, C.A., and Hurst, L.D. (2013). Positively charged residues are the major determinants of ribosomal velocity. PLoS Biol. 11, e1001508.

Chen, J., Chiang, Y.-C., and Denis, C.L. (2002). CCR4, a 3'-5' poly(A) RNA and ssDNA exonuclease, is the catalytic component of the cytoplasmic deadenylase. EMBO J. 21, 1414–1426.

Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H., Helt, G., et al. (2005). Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308, 1149–1154.

Chlebowski, A., Lubas, M., Jensen, T.H., and Dziembowski, A. (2013). RNA decay machines: the exosome. Biochim. Biophys. Acta 1829, 552–560.

Cho, E.J., Takagi, T., Moore, C.R., and Buratowski, S. (1997). mRNA capping enzyme is recruited to the transcription complex by phosphorylation of the RNA polymerase II carboxy-terminal domain. Genes Dev. 11, 3319–3326.

Chowdhury, A., Mukhopadhyay, J., and Tharun, S. (2007). The decapping activator Lsm1p-7p-Pat1p complex has the intrinsic ability to distinguish between oligoadenylated and polyadenylated RNAs. RNA 13, 998–1016.

Colgan, D.F., and Manley, J.L. (1997). Mechanism and regulation of mRNA polyadenylation. Genes Dev. 11, 2755–2766.

Coller, J., and Parker, R. (2004). Eukaryotic mRNA decapping. Annu. Rev. Biochem. 73, 861–890.

Coller, J., and Parker, R. (2005). General translational repression by activators of mRNA decapping. Cell 122, 875–886.

Coller, J.M., Tucker, M., Sheth, U., Valencia-Sanchez, M.A., and Parker, R. (2001). The DEAD box helicase, Dhh1p, functions in mRNA decapping and interacts with both the decapping and deadenylase complexes. RNA 7, 1717–1727.

Crick, F. (1970). Central dogma of molecular biology. Nature 227, 561–563.


Decker, C.J., and Parker, R. (1993). A turnover pathway for both stable and unstable mRNAs in yeast: Evidence for a requirement for deadenylation. Genes Dev. 7, 1632–1643.

Dever, T.E., and Green, R. (2012). The elongation, termination, and recycling phases of translation in eukaryotes. Cold Spring Harb. Perspect. Biol. 4, a013706.

Doherty, A., and McInerney, J.O. (2013). Translational selection frequently overcomes genetic drift in shaping synonymous codon usage patterns in vertebrates. Mol. Biol. Evol. 30, 2263–2267.

Doma, M.K., and Parker, R. (2006). Endonucleolytic cleavage of eukaryotic mRNAs with stalls in translation elongation. Nature 440, 561–564.

Drummond, D.A., and Wilke, C.O. (2008). Mistranslation-Induced Protein Misfolding as a Dominant Constraint on Coding-Sequence Evolution. Cell 134, 341–352.

Dunckley, T., Tucker, M., and Parker, R. (2001). Two related proteins, Edc1p and Edc2p, stimulate mRNA decapping in Saccharomyces cerevisiae. Genetics 157, 27–37.

Eulalio, A., Huntzinger, E., Nishihara, T., Rehwinkel, J., Fauser, M., and Izaurralde, E. (2009). Deadenylation is a widespread effect of miRNA regulation. RNA 15, 21–32.

Franks, T.M., and Lykke-Andersen, J. (2008). The control of mRNA decapping and P-body formation. Mol. Cell 32, 605–615.

Frischmeyer, P.A., and Dietz, H.C. (1999). Nonsense-mediated mRNA decay in health and disease. Hum. Mol. Genet. 8, 1893–1900.

Frischmeyer, P.A., van Hoof, A., O’Donnell, K., Guerrerio, A.L., Parker, R., and Dietz, H.C. (2002). An mRNA surveillance mechanism that eliminates transcripts lacking termination codons. Science 295, 2258–2261.

Furuichi, Y., LaFiandra, A., and Shatkin, A.J. (1977). 5’-Terminal structure and mRNA stability. Nature 266, 235–239.

Gardin, J., Yeasmin, R., Yurovsky, A., Cai, Y., Skiena, S., and Futcher, B. (2014). Measurement of average decoding rates of the 61 sense codons in vivo. Elife 3.

Geisberg, J. V., Moqtaderi, Z., Fan, X., Ozsolak, F., and Struhl, K. (2014). Global analysis of mRNA isoform half-lives reveals stabilizing and destabilizing elements in yeast. Cell 156, 812–824.

Gerashchenko, M. V, Lobanov, A. V, and Gladyshev, V.N. (2012). Genome-wide ribosome profiling reveals complex translational regulation in response to oxidative stress. Proc. Natl. Acad. Sci. U. S. A. 109, 17394–17399.

Ghaemmaghami, S., Huh, W.-K., Bower, K., Howson, R.W., Belle, A., Dephoure, N., O’Shea, E.K., and Weissman, J.S. (2003). Global analysis of protein expression in yeast. Nature 425, 737–741.

Gietz, R.D., and Sugino, A. (1988). New yeast-Escherichia coli shuttle vectors constructed with in vitro mutagenized yeast genes lacking six-base pair restriction sites. Gene 74, 527–534.

Gingold, H., Tehler, D., Christoffersen, N.R., Nielsen, M.M., Asmar, F., Kooistra, S.M., Christophersen, N.S., Christensen, L.L., Borre, M., Sørensen, K.D., et al. (2014). A Dual Program for Translation Regulation in Cellular Proliferation and Differentiation. Cell 158, 1281–1292.

Glaser, R.D., and Houston, L.L. (1974). Subunit structure and photooxidation of yeast imidazoleglycerolphosphate dehydratase. Biochemistry 13, 5145–5152.

Glover-Cutter, K., Kim, S., Espinosa, J., and Bentley, D.L. (2008). RNA polymerase II pauses and associates with pre-mRNA processing factors at both ends of genes. Nat. Struct. Mol. Biol. 15, 71–78.

Goldstrohm, A.C., and Wickens, M. (2008). Multifunctional deadenylase complexes diversify mRNA control. Nat. Rev. Mol. Cell Biol. 9, 337–344.

Goldstrohm, A.C., Seay, D.J., Hook, B.A., and Wickens, M. (2007). PUF protein-mediated deadenylation is catalyzed by Ccr4p. J. Biol. Chem. 282, 109–114.

Grantham, R., Gautier, C., Gouy, M., Jacobzone, M., and Mercier, R. (1981). Codon catalog usage is a genome strategy modulated for gene expressivity. Nucleic Acids Res. 9, 213.

Gray, N.K., Coller, J.M., Dickson, K.S., and Wickens, M. (2000). Multiple portions of poly(A)-binding protein stimulate translation in vivo. EMBO J. 19, 4723–4733.

Guydosh, N.R., and Green, R. (2014). Dom34 rescues ribosomes in 3’ untranslated regions. Cell 156, 950–962.

Von der Haar, T., Gross, J.D., Wagner, G., and McCarthy, J.E.G. (2004). The mRNA cap-binding protein eIF4E in post-transcriptional gene expression. Nat. Struct. Mol. Biol. 11, 503–511.

Harigaya, Y., and Parker, R. (2010). No-go decay: a quality control mechanism for RNA in translation. Wiley Interdiscip. Rev. RNA 1, 132–141.

Herrick, D., Parker, R., and Jacobson, A. (1990). Identification and comparison of stable and unstable mRNAs in Saccharomyces cerevisiae. Mol. Cell. Biol. 10, 2269–2284.

Hinnebusch, A.G., and Lorsch, J.R. (2012). The mechanism of eukaryotic translation initiation: new insights and challenges. Cold Spring Harb. Perspect. Biol. 4.

Hoekema, A., Kastelein, R.A., Vasser, M., and de Boer, H.A. (1987). Codon replacement in the PGK1 gene of Saccharomyces cerevisiae: experimental approach to study the role of biased codon usage in gene expression. Mol. Cell. Biol. 7, 2914–2924.

Hollien, J., and Weissman, J.S. (2006). Decay of endoplasmic reticulum-localized mRNAs during the unfolded protein response. Science 313, 104–107.

Van Hoof, A., Frischmeyer, P.A., Dietz, H.C., and Parker, R. (2002). Exosome-mediated recognition and degradation of mRNAs lacking a termination codon. Science 295, 2262–2264.

De Hoon, M.J.L., Imoto, S., Nolan, J., and Miyano, S. (2004). Open source clustering software. Bioinformatics 20, 1453–1454.

Hoshino, S. (2012). Mechanism of the initiation of mRNA decay: role of eRF3 family G proteins. Wiley Interdiscip. Rev. RNA 3, 743–757.

Hu, W., Sweet, T.J., Chamnongpol, S., Baker, K.E., and Coller, J. (2009). Co-translational mRNA decay in Saccharomyces cerevisiae. Nature 461, 225–229.

Huch, S., and Nissan, T. (2014). Interrelations between translation and general mRNA degradation in yeast. Wiley Interdiscip. Rev. RNA 5, 747–763.

Hudson, N.J., Gu, Q., Nagaraj, S.H., Ding, Y.S., Dalrymple, B.P., and Reverter, A. (2011). Eukaryotic evolutionary transitions are associated with extreme codon bias in functionally-related proteins. PLoS One 6.

Iben, J.R., and Maraia, R.J. (2012). tRNAomics: tRNA gene copy number variation and codon use provide bioinformatic evidence of a new anticodon:codon wobble pair in a eukaryote. RNA 18, 1358–1372.

Ikemura, T. (1985). Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2, 13–34.

Ingolia, N.T., Ghaemmaghami, S., Newman, J.R.S., and Weissman, J.S. (2009). Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223.

Isken, O., and Maquat, L.E. (2007). Quality control of eukaryotic mRNA: safeguarding cells from abnormal mRNA function. Genes Dev. 21, 1833–1856.

Ivanov, P. V, Gehring, N.H., Kunz, J.B., Hentze, M.W., and Kulozik, A.E. (2008). Interactions between UPF1, eRFs, PABP and the exon junction complex suggest an integrated model for mammalian NMD pathways. EMBO J. 27, 736–747.

Jackson, R.J., Hellen, C.U.T., and Pestova, T. V (2010). The mechanism of eukaryotic translation initiation and principles of its regulation. Nat. Rev. Mol. Cell Biol. 11, 113–127.

Jacobson, A. (1996). Poly(A) Metabolism and Translation: The Closed-loop Model. In Cold Spring Harbor Monograph Archive; Volume 30 (1996): Translational Control.

Jacobson, A., and Peltz, S.W. (1996). Interrelationships of the pathways of mRNA decay and translation in eukaryotic cells. Annu. Rev. Biochem. 65, 693–739.

Jinek, M., Coyle, S.M., and Doudna, J.A. (2011). Coupled 5’ nucleotide recognition and processivity in Xrn1-mediated mRNA decay. Mol. Cell 41, 600–608.

Kirstein-Miles, J., Scior, A., Deuerling, E., and Morimoto, R.I. (2013). The nascent polypeptide-associated complex is a key regulator of proteostasis. EMBO J. 32, 1451–1468.

Klauer, A.A., and van Hoof, A. (2012). Degradation of mRNAs that lack a stop codon: a decade of nonstop progress. Wiley Interdiscip. Rev. RNA 3, 649–660.

Kriško, A., Copić, T., Gabaldón, T., Lehner, B., and Supek, F. (2014). Inferring gene function from evolutionary change in signatures of translation efficiency. Genome Biol. 15, R44.

Kshirsagar, M., and Parker, R. (2004). Identification of Edc3p as an enhancer of mRNA decapping in Saccharomyces cerevisiae. Genetics 166, 729–739.

LaGrandeur, T., and Parker, R. (1999). The cis acting sequences responsible for the differential decay of the unstable MFA2 and stable PGK1 transcripts in yeast include the context of the translational start codon. RNA 5, 420–433.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25.

Lee, S.R., and Lykke-Andersen, J. (2013). Emerging roles for ribonucleoprotein modification and remodeling in controlling RNA fate. Trends Cell Biol. 23, 504–510.

Lejeune, F., Li, X., and Maquat, L.E. (2003). Nonsense-Mediated mRNA Decay in Mammalian Cells Involves Decapping, Deadenylating, and Exonucleolytic Activities. Mol. Cell 12, 675–687.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079.

Lin, W.-J., Duffy, A., and Chen, C.-Y. (2007). Localization of AU-rich element-containing mRNA in cytoplasmic granules containing exosome subunits. J. Biol. Chem. 282, 19958–19968.

Losson, R., and Lacroute, F. (1979). Interference of nonsense mutations with eukaryotic messenger RNA stability. Proc. Natl. Acad. Sci. U. S. A. 76, 5134–5137.

Lothrop, A.P., Torres, M.P., and Fuchs, S.M. (2013). Deciphering post-translational modification codes. FEBS Lett. 587, 1247–1257.

McManus, C.J., May, G.E., Spealman, P., and Shteyman, A. (2014). Ribosome profiling reveals post-transcriptional buffering of divergent gene expression in yeast. Genome Res. 24, 422–430.

Miller, C., Schwalb, B., Maier, K., Schulz, D., Dümcke, S., Zacher, B., Mayer, A., Sydow, J., Marcinowski, L., Dölken, L., et al. (2011). Dynamic transcriptome analysis measures rates of mRNA synthesis and decay in yeast. Mol. Syst. Biol. 7, 458.

Minshall, N., and Standart, N. (2004). The active form of Xp54 RNA helicase in translational repression is an RNA-mediated oligomer. Nucleic Acids Res. 32, 1325–1334.

Mitchell, S.F., Jain, S., She, M., and Parker, R. (2013). Global analysis of yeast mRNPs. Nat. Struct. Mol. Biol. 20, 127–133.

Muhlrad, D., and Parker, R. (1992). Mutations affecting stability and deadenylation of the yeast MFA2 transcript. Genes Dev. 6, 2100–2111.

Muhlrad, D., Decker, C.J., and Parker, R. (1995). Turnover mechanisms of the stable yeast PGK1 mRNA. Mol. Cell. Biol. 15, 2145–2156.

Nagarajan, V.K., Jones, C.I., Newbury, S.F., and Green, P.J. (2013). XRN 5’→3' exoribonucleases: structure, mechanisms and functions. Biochim. Biophys. Acta 1829, 590–603.

Nilsen, T.W. (2007). Mechanisms of microRNA-mediated gene regulation in animal cells. Trends Genet. 23, 243–249.

Nissan, T., Rajyaguru, P., She, M., Song, H., and Parker, R. (2010). Decapping activators in Saccharomyces cerevisiae act by multiple mechanisms. Mol. Cell 39, 773–783.

Nonet, M., Scafe, C., Sexton, J., and Young, R. (1987). Eucaryotic RNA polymerase conditional mutant that rapidly ceases mRNA synthesis. Mol. Cell. Biol. 7, 1602–1611.

Novoa, E.M., and Ribas de Pouplana, L. (2012). Speeding with control: Codon usage, tRNAs, and ribosomes. Trends Genet. 28, 574–581.

Olivas, W., and Parker, R. (2000). The Puf3 protein is a transcript-specific regulator of mRNA degradation in yeast. EMBO J. 19, 6602–6611.

Orban, T.I., and Izaurralde, E. (2005). Decay of mRNAs targeted by RISC requires XRN1, the Ski complex, and the exosome. RNA 11, 459–469.

Parker, R. (2012). RNA degradation in Saccharomyces cerevisae. Genetics 191, 671–702.

Pechmann, S., and Frydman, J. (2013). Evolutionary conservation of codon optimality reveals hidden signatures of cotranslational folding. Nat. Struct. Mol. Biol. 20, 237–243.

Percudani, R., Pavesi, A., and Ottonello, S. (1997). Transfer RNA gene redundancy and translational selection in Saccharomyces cerevisiae. J. Mol. Biol. 268, 322–330.

Plotkin, J.B., and Kudla, G. (2011). Synonymous but not the same: the causes and consequences of codon bias. Nat. Rev. Genet. 12, 32–42.

Presnyak, V., and Coller, J. (2013). The DHH1/RCKp54 family of helicases: An ancient family of proteins that promote translational silencing. Biochim. Biophys. Acta - Gene Regul. Mech. 1829, 817–823.

Proudfoot, N.J. (2011). Ending the message: Poly(A) signals then and now. Genes Dev. 25, 1770–1782.

Pulak, R., and Anderson, P. (1993). mRNA Surveillance by the Caenorhabditis elegans smg genes. Genes Dev. 7, 1885–1897.

Qian, W., Yang, J.R., Pearson, N.M., Maclean, C., and Zhang, J. (2012). Balanced codon usage optimizes eukaryotic translational efficiency. PLoS Genet. 8.

Rajyaguru, P., She, M., and Parker, R. (2012). Scd6 targets eIF4G to repress translation: RGG motif proteins as a class of eIF4G-binding proteins. Mol. Cell 45, 244–254.

Dos Reis, M., Savva, R., and Wernisch, L. (2004). Solving the riddle of codon usage preferences: A test for translational selection. Nucleic Acids Res. 32, 5036–5044.

Rodnina, M. V, and Wintermeyer, W. (2001). Fidelity of aminoacyl-tRNA selection on the ribosome: kinetic and structural mechanisms. Annu. Rev. Biochem. 70, 415–435.

Roy, B., and Jacobson, A. (2013). The intimate relationships of mRNA decay and translation. Trends Genet. 29, 691–699.

Saito, S., Hosoda, N., and Hoshino, S.I. (2013). The Hbs1-Dom34 protein complex functions in non-stop mRNA decay in mammalian cells. J. Biol. Chem. 288, 17832–17843.

Saldanha, A.J. (2004). Java Treeview--extensible visualization of microarray data. Bioinformatics 20, 3246–3248.

Schoenberg, D.R., and Maquat, L.E. (2012). Regulation of cytoplasmic mRNA decay. Nat. Rev. Genet. 13, 246–259.

Schwartz, D.C., and Parker, R. (2000). mRNA decapping in yeast requires dissociation of the cap binding protein, eukaryotic translation initiation factor 4E. Mol. Cell. Biol. 20, 7933–7942.

Shandilya, J., and Roberts, S.G.E. (2012). The transcription cycle in eukaryotes: from productive initiation to RNA polymerase II recycling. Biochim. Biophys. Acta 1819, 391–400.

Sharp, P.M., and Li, W.H. (1986). An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24, 28–38.

Sharp, P.M., and Li, W.H. (1987). The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295.

Shatkin, A. (1976). Capping of eucaryotic mRNAs. Cell 9, 645–653.

Shilatifard, A. (2006). Chromatin modifications by methylation and ubiquitination: implications in the regulation of gene expression. Annu. Rev. Biochem. 75, 243–269.

Shoemaker, C.J., and Green, R. (2012). Translation drives mRNA quality control. Nat. Struct. Mol. Biol. 19, 594–601.

Shoemaker, C.J., Eyler, D.E., and Green, R. (2010). Dom34:Hbs1 promotes subunit dissociation and peptidyl-tRNA drop-off to initiate no-go decay. Science 330, 369–372.

Sonenberg, N., and Hinnebusch, A.G. (2009). Regulation of translation initiation in eukaryotes: mechanisms and biological targets. Cell 136, 731–745.

Sweet, T., Kovalak, C., and Coller, J. (2012). The dead-box protein dhh1 promotes decapping by slowing ribosome movement. PLoS Biol. 10.

Swisher, K.D., and Parker, R. (2011). Interactions between Upf1 and the decapping factors Edc3 and Pat1 in Saccharomyces cerevisiae. PLoS One 6, e26547.

Tarun, S.Z., and Sachs, A.B. (1996). Association of the yeast poly(A) tail binding protein with translation initiation factor eIF-4G. EMBO J. 15, 7168–7177.

Tharun, S., and Parker, R. (2001). Targeting an mRNA for decapping: displacement of translation factors and association of the Lsm1p-7p complex on deadenylated yeast mRNAs. Mol. Cell 8, 1075–1083.

Topisirovic, I., Svitkin, Y. V, Sonenberg, N., and Shatkin, A.J. (2011). Cap and cap-binding proteins in the control of gene expression. Wiley Interdiscip. Rev. RNA 2, 277–298.

Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515.

Tucker, M., and Parker, R. (2000). Mechanisms and control of mRNA decapping in Saccharomyces cerevisiae. Annu. Rev. Biochem. 69, 571–595.

Tucker, M., Staples, R.R., Valencia-Sanchez, M.A., Muhlrad, D., and Parker, R. (2002). Ccr4p is the catalytic subunit of a Ccr4p/Pop2p/Notp mRNA deadenylase complex in Saccharomyces cerevisiae. EMBO J. 21, 1427–1436.

Tuller, T., Carmi, A., Vestsigian, K., Navon, S., Dorfan, Y., Zaborske, J., Pan, T., Dahan, O., Furman, I., and Pilpel, Y. (2010). An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell 141, 344–354.

Vilela, C., Velasco, C., Ptushkina, M., and McCarthy, J.E. (2000). The eukaryotic mRNA decapping protein Dcp1 interacts physically and functionally with the eIF4F translation initiation complex. EMBO J. 19, 4372–4382.

Wang, Z., and Kiledjian, M. (2001). Functional Link between the Mammalian Exosome and mRNA Decapping. Cell 107, 751–762.

Wang, Y., Liu, C.L., Storey, J.D., Tibshirani, R.J., Herschlag, D., and Brown, P.O. (2002). Precision and functional specificity in mRNA decay. Proc. Natl. Acad. Sci. U. S. A. 99, 5860–5865.

White, E.J.F., Brewer, G., and Wilson, G.M. (2013). Post-transcriptional control of gene expression by AUF1: Mechanisms, physiological targets, and regulation. Biochim. Biophys. Acta - Gene Regul. Mech. 1829, 680–688.

Whitney, M.L., Hurto, R.L., Shaheen, H.H., and Hopper, A.K. (2007). Rapid and reversible nuclear accumulation of cytoplasmic tRNA in response to nutrient availability. Mol. Biol. Cell 18, 2678–2686.

Wilusz, C.J., Gao, M., Jones, C.L., Wilusz, J., and Peltz, S.W. (2001). Poly(A)-binding proteins regulate both mRNA deadenylation and decapping in yeast cytoplasmic extracts. RNA 7, 1416–1424.

Wyers, F., Minet, M., Dufour, M.E., Vo, L.T.A., and Lacroute, F. (2000). Deletion of the PAT1 Gene Affects Translation Initiation and Suppresses a PAB1 Gene Deletion in Yeast. Mol. Cell. Biol. 20, 3538–3549.

Yang, L., Duff, M.O., Graveley, B.R., Carmichael, G.G., and Chen, L.-L. (2011). Genomewide characterization of non-polyadenylated RNAs. Genome Biol. 12, R16.

Yona, A.H., Bloom-Ackermann, Z., Frumkin, I., Hanson-Smith, V., Charpak-Amikam, Y., Feng, Q., Boeke, J.D., Dahan, O., and Pilpel, Y. (2013). tRNA genes rapidly change in evolution to meet novel translational demands. Elife 2013.

Zhang, S., Ruiz-Echevarria, M.J., Quan, Y., and Peltz, S.W. (1995). Identification and characterization of a sequence motif involved in nonsense-mediated mRNA decay. Mol. Cell. Biol. 15, 2231–2244.

Zhou, T., Weems, M., and Wilke, C.O. (2009). Translationally Optimal Codons Associate with Structurally Sensitive Sites in Proteins. Mol. Biol. Evol. 26, 1571–1580.

Zinshteyn, B., and Gilbert, W. V (2013). Loss of a conserved tRNA anticodon modification perturbs cellular signaling. PLoS Genet. 9, e1003675.
