INVESTIGATION OF THE MRNP AND TRANSCRIPTOME REGULATED BY

NONSENSE-MEDIATED RNA DECAY

by

JENNA E. SMITH

Submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Dissertation Advisor: Dr. Kristian Baker

Department of Biochemistry

CASE WESTERN RESERVE UNIVERSITY

MAY, 2015

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the dissertation of

Jenna E. Smith

candidate for the degree of Doctor of Philosophy*.

Committee Chair: Dr. Timothy Nilsen

Dissertation Advisor: Dr. Kristian E. Baker

Committee Member: Dr. Donny Licatalosi

Committee Member: Dr. Maria Hatzoglou

Committee Member: Dr. Derek Taylor

Date of Defense: December 19, 2014

*We also certify that written approval has been obtained for any proprietary material contained therein.

TABLE OF CONTENTS

LIST OF TABLES...... 5

LIST OF FIGURES...... 6

ACKNOWLEDGEMENTS ...... 8

ABSTRACT ...... 10

CHAPTER 1: AN OVERVIEW OF NONSENSE-MEDIATED RNA DECAY. . . . 12

1.1 RNA surveillance in gene expression...... 12

1.2 Biological importance of the NMD pathway...... 13

NMD in development ...... 15

NMD and human disease...... 17

1.3 NMD impacts the expression of many classes of endogenous

mRNAs ...... 19

1.4 The factors involved in NMD ...... 24

1.5 Models for the recognition of NMD substrates ...... 27

Pre-mRNA splicing and the exon junction complex...... 28

Other cis-acting elements ...... 30

Faux 3’ UTR...... 31

Towards a universal model...... 33

1.6 Downstream steps in NMD...... 35

Activation of UPF1 enzymatic activity ...... 35

Direct impact of NMD on protein output...... 36

Decay of NMD RNAs ...... 37

1.7 Harnessing NMD to investigate RNA biology...... 38

CHAPTER 2: PURIFICATION OF AN NMD-SENSITIVE mRNP

COMPLEX ...... 40

2.1 Introduction...... 40

Rationale - evidence for a distinct NMD mRNP ...... 40

Existing methods to interrogate RNA-protein interactions ...... 41

2.2 Results and conclusions ...... 44

Overview of the mRNP pulldown protocol ...... 44

Design and characterization of reporter mRNAs ...... 46

Optimization of mRNP pulldown efficiency...... 48

Co-purification of mRNA-binding proteins ...... 53

Preliminary identification of mRNP components by mass

spectrometry...... 55

2.3 Discussion...... 60

2.4 Future directions...... 62

CHAPTER 3: TRANSLATION OF UNANNOTATED RNAs IN YEAST...... 67

3.1 Introduction...... 67

The emerging field of long non-coding RNAs ...... 67

NMD substrate recognition requires translation...... 69

3.2 Results and conclusions ...... 70

Identification of unannotated RNAs in the S. cerevisiae

genome...... 70

uRNAs co-sediment with polyribosomes ...... 74

Ribosome profiling indicates uRNAs are bound by ribosomes . . . 77

Conservation and detection of sORFs expressed from uRNAs . . 81

NMD-sensitivity of uRNAs provides evidence for translation. . . . . 83

3.3 Discussion...... 86

3.4 Future directions...... 90

CHAPTER 4: IDENTIFICATION OF NMD-SENSITIVE mRNAs ...... 97

4.1 Introduction...... 97

Obstacles to the identification of NMD substrates...... 97

Improving approaches for transcriptome analyses...... 100

4.2 Results and conclusions ...... 101

Identification of direct and indirect targets of NMD by RNA-seq . 101

Identifying transcript features unique to NMD-sensitive RNA

isoforms...... 105

4.3 Discussion...... 107

4.4 Future directions...... 110

CHAPTER 5: CONCLUDING REMARKS ...... 113

APPENDIX A - GENERAL EXPERIMENTAL METHODS ...... 115

APPENDIX B - mRNP PULLDOWN PROTOCOL...... 118

APPENDIX C - EXPERIMENTAL METHODS RELATED TO

CHAPTER 2 ...... 124

APPENDIX D - EXPERIMENTAL METHODS RELATED TO

CHAPTER 3 ...... 129

APPENDIX E - EXPERIMENTAL METHODS RELATED TO

CHAPTER 4 ...... 148

BIBLIOGRAPHY ...... 153

LIST OF TABLES

Table 2-1. List of NMD-sensitive mRNP-associated proteins with candidate roles

in NMD ...... 59

Table 3-1. Summary of high-throughput sequencing reads ...... 93

Table 3-2. Predicted sORFs and conservation analyses ...... 94

Table C-1. List of oligonucleotides, plasmids, and yeast strains used in

Chapter 2 ...... 128

Table D-1. List of oligonucleotides, plasmids, and yeast strains used in

Chapter 3 ...... 147

Table E-1. List of oligonucleotides, plasmids, and yeast strains used in

Chapter 4 ...... 152

LIST OF FIGURES

Figure 1-1. The impact of a premature termination codon on protein production

and mRNA stability...... 14

Figure 1-2. A variety of RNAs are substrates for NMD...... 22

Figure 1-3. Models for NMD substrate recognition...... 30

Figure 2-1. Design and validation of GFP reporter mRNAs ...... 47

Figure 2-2. Selection of streptavidin-conjugated magnetic beads for mRNP

purification ...... 49

Figure 2-3. Optimization of mRNP pulldown conditions ...... 50

Figure 2-4. mRNP pulldown is highly specific for target RNA...... 52

Figure 2-5. Co-purification of proteins with mRNP pulldown protocol...... 54

Figure 2-6. Preliminary screening of candidate proteins identified by mRNP

pulldown...... 65

Figure 3-1. Depletion of ribosomal RNA in samples for RNA-seq ...... 71

Figure 3-2. Characterization of yeast unannotated RNAs ...... 73

Figure 3-3. Yeast uRNAs and lncRNAs co-sediment with polyribosomes . . . . . 76

Figure 3-4. Ribosome profiling identifies regions of ribosome association with

uRNAs ...... 78

Figure 3-5. 3-nucleotide phasing of ribosome footprints facilitated the

identification of translated sORFs...... 80

Figure 3-6. Evidence for sORF expression and conservation ...... 82

Figure 3-7. Many uRNAs are sensitive to the translation-dependent NMD

pathway ...... 85

Figure 3-8. Evidence for translation of uRNAs in yeast ...... 87

Figure 4-1. Characteristics of NMD-sensitive mRNAs identified by RNA-seq . 103

Figure 4-2. Validation of RNA-seq by northern analysis...... 104

Figure 4-3. A tool for identifying known and unique NMD-sensitive transcript

regions...... 107

Figure 4-4. Global transcriptional inhibition using a temperature-sensitive

mutation in RNA Polymerase II...... 111

ACKNOWLEDGEMENTS

All of the work that I put into my doctoral research could not have been completed without the assistance and support of others. First and foremost, my research advisor and mentor, Dr. Kristian Baker, has been incredibly supportive and encouraging throughout my time in her lab. When I joined the Baker lab, I would have been lost without her guidance and always-open door to help me find my footing as an independent researcher. Over the years, Kristian has helped me grow as a scientist, a critical thinker, and a professional.

Many other faculty at CWRU have contributed to my development as a scientist. Prominent among these is Dr. Jeff Coller, who has always provided valuable insight in group meetings and helped motivate my research to move into new and exciting directions. I would also like to acknowledge Dr. Jo Ann Wise for the outside viewpoint she contributed to lab meeting discussion. Finally, the members of my thesis committee - Drs. Timothy Nilsen, Maria Hatzoglou, Donny Licatalosi, and Derek Taylor - have provided suggestions and guidance throughout my graduate career and at this final stage, without which my research would not have been as successful.

Some of the work presented here is directly thanks to the contributions of other researchers. In particular, one project was built off of work by Sarah Geisler, a former member of the Coller lab, who also helped in the initial stages of my own research. Nathan Huynh has contributed a substantial amount of time as an undergraduate student and been a great help to me. Nicholas Kline provided invaluable bioinformatics support, without which I would have been lost. I have also relied on the expertise of the CWRU Genome and Transcriptome Sequencing Core, in particular Frank Campbell and Sheldon Bai. Our mass spectrometry analyses were greatly aided by the expertise and insight of Dr. Amber Mosley and members of her lab. Finally, collaborators at the Whitehead Institute - Juan Alvarez and Wenqian Hu - have also contributed great ideas and data to supplement the work documented here.

All of the past and present members of the Baker and Coller labs have been wonderful to work with. The distinction between these two labs was often blurred, which provided a great opportunity to work with twice as many knowledgeable colleagues. In particular, DaJuan Whiteside and Najwa Al-Husaini were tremendously helpful in everything from troubleshooting to helping with protocols to ordering reagents. I cannot imagine working with a better group of people.

Aside from these direct contributions, my graduate research would not have succeeded without the mental and emotional support of my family. My husband, Trevor Smith, and parents, Steve and Faith Sroka, were always encouraging and optimistic, even at the most difficult times. Although they may not have always known exactly the challenges I was facing, they were always there to encourage me every step of the way. Thank you all for your love and support.

Investigation of the mRNP and Transcriptome Regulated by

Nonsense-Mediated RNA Decay

Abstract

by

JENNA E. SMITH

Appropriate and accurate gene expression is critical for all organisms. One quality control pathway that exists to maintain the fidelity of gene expression is the nonsense-mediated RNA decay (NMD) pathway, responsible for recognizing and targeting for rapid degradation RNAs undergoing premature translation termination. Although this pathway is conserved throughout eukarya and essential in higher organisms, many mechanistic details underlying NMD are not understood.

Proteins uniquely bound to NMD-sensitive mRNAs are predicted to facilitate their recognition as aberrant by the NMD pathway. To investigate differences between NMD-sensitive and NMD-insensitive RNAs, an efficient and highly specific method to biochemically purify an individual mRNP from a whole-cell lysate was developed. This extensively optimized procedure will serve as a powerful tool to identify proteins preferentially associated with NMD-sensitive mRNAs to elucidate how NMD substrates are recognized. Furthermore, the mRNP pulldown protocol can be adapted to study other aspects of mRNA regulation.

Because NMD is a strictly translation-dependent process, sensitivity to NMD can provide evidence for the translation of RNAs. Long non-coding RNAs (lncRNAs), an emerging class of poorly characterized RNAs, are classified bioinformatically as lacking protein-coding capacity. Unannotated RNAs (uRNAs) in yeast were specifically investigated for their capacity to associate with translating ribosomes based on co-sedimentation with polyribosomes, ribosome profiling, detection of encoded peptides, and sensitivity to NMD. These data demonstrated that many transcripts considered to lack protein-coding potential are, in fact, actively translated, and implicate NMD in regulating the activity or expression of a subset of lncRNAs.

Finally, many endogenous mRNAs are sensitive to the NMD pathway, including specific mRNA isoforms. Genome-wide profiling of the yeast transcriptome by high-throughput sequencing globally identified mRNAs sensitive to NMD. These mRNAs were enriched for features common to NMD-sensitive RNAs, and also included indirect targets of NMD. A bioinformatic tool was developed to find NMD-sensitive regions of the yeast transcriptome independent of gene annotation or transcript structure, and identify NMD-sensitive RNA isoforms. This preliminary analysis of the NMD-sensitive transcriptome has revealed unappreciated complexity in RNA processing and expression in yeast.

CHAPTER 1: AN OVERVIEW OF NONSENSE-MEDIATED RNA DECAY

1.1 RNA surveillance in gene expression

One of the most essential tasks carried out by all organisms is gene expression, which must occur under the appropriate conditions and with minimal error. RNA is the effector molecule responsible for converting genetic information from DNA - in which it is stored - to function, be that the RNA itself, or translation of the RNA into a protein. Thus, gene expression is not only controlled by the regulation of transcription, but can also be regulated during all steps of RNA metabolism. Gene expression can, therefore, be impacted by processing, localization, binding of RNA binding proteins (RBPs), regulation of translation, and decay of the RNA. In particular, an often overlooked aspect of gene expression is the degradation of RNAs as a critical means to either maintain the correct steady-state level of an RNA population or turn off expression upon a change in environmental conditions. Furthermore, quality control pathways exist to target RNAs for degradation if they contain errors, helping ensure the fidelity of gene expression.

Because of the importance of accurate gene expression in all organisms, it should perhaps be unsurprising that there are many mechanisms in place to ensure that the expression of a gene occurs with minimal error or wasted resources (reviewed in Eberle and Visa, 2014). In particular, three major surveillance pathways detect aberrancies in RNA sequence or structure during translation and help clear the cell of these RNAs (Shoemaker and Green, 2012). The earliest identified RNA surveillance pathway, nonsense-mediated RNA decay (NMD), recognizes mRNAs that undergo premature translation termination and targets these RNAs for rapid degradation (Leeds et al., 1991; Losson and Lacroute, 1979). In contrast, non-stop decay recognizes mRNAs that lack a termination codon, and also induces their decay (Frischmeyer et al., 2002). Finally, no-go decay frees translating ribosomes from mRNAs where they have stalled during elongation due to a strong pause or structural block (Doma and Parker, 2006). Although these pathways represent a small piece of the complex network in place to ensure that gene expression occurs accurately and appropriately, they provide a critical means to eliminate aberrant RNAs from the cell. Here, the focus will be on the first identified and best characterized RNA surveillance pathway, NMD.

1.2 Biological importance of the NMD pathway

Over three decades ago, it was first observed in yeast that mRNAs containing premature termination codons (PTCs) due to amber nonsense mutations were less abundant than their counterparts that lacked a PTC (Losson and Lacroute, 1979). Subsequently, it was determined that this effect was due to the preferential decay of PTC-containing mRNAs (Leeds et al., 1991; Figures 1-1A and 1-1B), leading to the characterization of the translation-dependent RNA quality control pathway known as nonsense-mediated RNA decay. The NMD pathway is conserved throughout eukarya, and recognizes and targets for rapid degradation RNAs undergoing premature translation termination (reviewed in Baker and Parker, 2004). The translation of truncated open reading frames (ORFs) generated from premature translation termination (Figure 1-1B) could have deleterious effects on the cell by producing polypeptides truncated at their carboxy-terminus with dominant-negative functions (Pulak and Anderson, 1993), suggesting that a key role of NMD is to limit the production of these truncated proteins (reviewed in Bhuvanagiri et al., 2010; Holbrook et al., 2004). Furthermore, while mRNAs that terminate translation prematurely due to nonsense or frameshift mutations within a gene coding sequence are the most straightforward examples of NMD substrates, there are actually many classes of RNAs sensitive to NMD (discussed in Chapter 1.3), suggesting the role of NMD in shaping gene expression may be quite broad. Indeed, 3-10% of genes across eukarya show NMD-dependent changes in abundance (Rehwinkel et al., 2006), leading to the proposal that NMD may also be involved in regulating the expression of endogenous genes.

Figure 1-1. The impact of a premature termination codon on protein production and mRNA stability. A) Translation of a normal mRNA produces a full-length protein, and the RNA species is relatively stable. B) Translation of an mRNA containing a premature termination codon results in production of a truncated protein, and leads to the accelerated degradation of the mRNA. This process of nonsense-mediated RNA decay (NMD) requires the activity of three conserved NMD factors, UPF1, UPF2, and UPF3; the precise mechanism of these factors in facilitating NMD is not fully understood.

The fact that NMD is a conserved pathway in eukaryotes from yeast to humans hints at its importance. Elimination of the NMD pathway leads to lethality in many organisms, including the fruit fly Drosophila melanogaster (Metzstein and Krasnow, 2006), plants such as Arabidopsis thaliana (Yoine et al., 2006), and vertebrates including zebrafish (Wittkopp et al., 2009) and mammals (Medghalchi et al., 2001; Weischenfeldt et al., 2008). Even in lower organisms in which NMD is dispensable for viability, including budding yeast Saccharomyces cerevisiae, fission yeast Schizosaccharomyces pombe, and Caenorhabditis elegans, loss of NMD can still be detrimental for survival. NMD mutations in S. pombe result in increased sensitivity to oxidative stress (Matia-Gonzalez et al., 2013; Rodriguez-Gabriel et al., 2006), while in worms deletion of any of the factors involved in NMD leads to a severe morphogenetic defect for which the C. elegans NMD factors are named (SMG proteins; Suppressor with Morphogenetic effect on Genitalia; Hodgkin et al., 1989). Notably, it is not clear whether the importance of NMD is due solely to its role in RNA surveillance, or also to its effects on the expression of endogenous genes.

NMD in development

Several specific roles for the NMD pathway during development have been uncovered in higher eukaryotes. NMD may have an important role in the development of the immune system. In mammals, NMD is critical for hematopoietic maintenance and the development and activation of lymphocytes based on conditional knockout of the NMD pathway (Weischenfeldt et al., 2008). Similarly, transgenic mice ubiquitously over-expressing a dominant-negative form of one of the core NMD factors demonstrate abnormal thymus development and decreased thymocyte maturation (Frischmeyer-Guerrerio et al., 2011). The role of NMD in the immune system may be related to its function in eliminating non-productive gene rearrangements that occur in the T cell receptor and immunoglobulin genes during immune system development (Carter et al., 1995; Weischenfeldt et al., 2008).

The activity and regulation of NMD in the nervous system also appear to be critical in higher eukaryotes. The regulation of NMD activity during development is important for the differentiation and maturation of neural cells in human, mouse, and Xenopus, suggesting an essential role for NMD in appropriate development of the brain (Lou et al., 2014). In support of this, mutations in the human NMD factor UPF3B are associated with intellectual deficiencies manifested as syndromic and nonsyndromic X-linked mental retardation (Tarpey et al., 2007).

NMD may also be important for development in other tissues. Conditional knockout of NMD has demonstrated a role for NMD in liver function, particularly fetal liver development and liver regeneration (Thoren et al., 2010). In D. melanogaster, deletion of NMD factors is lethal by larval stage 2 (Chapin et al., 2014), and mutations in NMD factors result in abnormal development of genitalia in C. elegans (Hodgkin et al., 1989), providing evidence for roles of NMD in development throughout metazoa that have not yet been characterized.

NMD and human disease

Not only is NMD important for normal cellular function in many contexts, it has been implicated in the pathogenesis of a number of human diseases. Up to 1/3 of all genetic diseases are predicted to be caused by mutations that introduce a PTC into a gene coding sequence (Frischmeyer and Dietz, 1999). Among the most notable of these are 5% of mutations in the cystic fibrosis transmembrane conductance regulator CFTR which cause cystic fibrosis, mutations in the dystrophin gene which lead to muscular dystrophy, and mutations in the β-globin gene that result in β-thalassemia (reviewed in Bhuvanagiri et al., 2010). Interestingly, the pathogenesis of these diseases can be either positively or negatively impacted by NMD (Khajavi et al., 2006). In some diseases, including β-thalassemia, NMD limits the production of truncated proteins with dominant-negative function that result from translation of PTC-containing mRNAs, thereby preventing the more severe dominant-negative phenotype seen with nonsense mutations in the β-globin gene that fail to elicit NMD (reviewed in Bhuvanagiri et al., 2010). In other cases, such as Duchenne muscular dystrophy, NMD may suppress expression of a truncated but partially active protein by triggering degradation of the PTC-containing transcript, causing a more severe phenotype (reviewed in Bhuvanagiri et al., 2010).

Because of the extent of disease-causing mutations impacted by NMD, a promising route of treatment may be through modulation of the NMD pathway itself (Peltz et al., 2009). There are two distinct mechanisms through which NMD therapeutics may act: 1) nonsense suppression of the disease-causing mutation, which allows the expression of some full-length protein, or 2) inhibition of NMD activity to increase the levels of truncated polypeptides. For the former approach, aminoglycosides such as gentamicin, which decrease the fidelity of translational decoding and can suppress translation termination at nonsense codons, have been explored to treat diseases caused by nonsense mutations (reviewed in Peltz et al., 2013). Additionally, a small molecule inhibitor of the NMD pathway, referred to as PTC124 or Ataluren, has been shown to selectively promote readthrough of PTCs but not normal termination codons, allowing the production of full-length proteins from normally NMD-sensitive mRNAs (Peltz et al., 2013; Welch et al., 2007). This drug has had success alleviating disease phenotypes in mouse models or in vitro, including Duchenne muscular dystrophy (Welch et al., 2007) and cystic fibrosis (Du et al., 2008), and is now undergoing Phase III clinical trials (Peltz et al., 2013). In contrast, for the latter therapeutic approach, inhibition of NMD may allow the expression of truncated polypeptides that still possess partial functionality from a gene containing a nonsense mutation. Alternatively, inhibition of NMD in cancer has been shown to promote accumulation of new truncated protein isoforms that may arise from widespread genomic disorganization, which could be recognized by the native immune system (Pastor et al., 2010). To this effect, treatment by aptamer-conjugated siRNA knockdown of NMD components has had success in decreasing the severity of mouse tumors by inducing an immune response against the tumor (Pastor et al., 2010). The promise of modulating NMD as a disease therapy highlights the importance of understanding all that we can about the molecular details of the NMD pathway.

1.3 NMD impacts the expression of many classes of endogenous mRNAs

NMD is traditionally classified as a surveillance pathway, with classic targets arising from nonsense or frameshift mutations which create a stop codon within an open reading frame. However, genome-wide expression profiling has indicated that NMD also impacts the expression of 3-10% of endogenous genes (Rehwinkel et al., 2006). Although the significance of this regulation is still unclear, examination of mRNAs sensitive to NMD has revealed a number of features - in addition to nonsense or frameshift mutations - that can render an mRNA a substrate of NMD. A common feature of all known classes of NMD substrates is the presence of a translation termination event that either occurs or is perceived to occur prematurely. NMD-inducing features can be grouped into three main categories - premature termination within an annotated ORF, premature termination due to translation of alternate reading frames, or extension of untranslated RNA downstream of a normal termination codon (Figure 1-2).

PTCs within an ORF may arise in a number of ways (Figure 1-2A). They may be introduced by genomic nonsense or frameshift mutations, or by errors in transcription (Figure 1-2A, top). These generally represent truly aberrant mRNA species that are cleared by surveillance. In contrast, mRNAs that encode the non-standard amino acid selenocysteine purposefully encode a stop codon within the coding sequence. Selenocysteine is encoded by the UGA stop codon, and failure of the selenocysteine tRNA (sec-tRNA) to decode the UGA, particularly in conditions of selenium deficiency where sec-tRNA levels are limiting, will result in translation termination within the ORF (Moriarty et al., 1998; Seyedali and Berry, 2014).
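The way a nonsense or frameshift mutation places a stop codon inside an ORF can be illustrated with a minimal Python sketch. The sequence, the function names, and the mutation positions below are hypothetical and chosen only for illustration; they are not taken from any reporter or gene described in this work, and the sketch uses the DNA alphabet (T in place of U).

```python
# Illustrative sketch only: toy sequences, not real genes or reporters.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_in_frame_stop(cds, frame=0):
    """Return the 0-based index of the first stop codon read in `frame`, or None."""
    for i in range(frame, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None

def has_ptc(cds):
    """True if a stop codon is read before the last three nucleotides of the sequence."""
    stop = first_in_frame_stop(cds)
    return stop is not None and stop < len(cds) - 3

# Hypothetical ORF: ATG GCT GAT TTG ACC AAA TAA
wild_type = "ATGGCTGATTTGACCAAATAA"

# Nonsense mutation: TTG -> TAG at the fourth codon.
nonsense = wild_type[:10] + "A" + wild_type[11:]

# Frameshift: deleting one nucleotide from the second codon shifts the
# downstream reading frame, so that ...ATT TGA... is now read in frame.
frameshift = wild_type[:5] + wild_type[6:]

for name, seq in [("wild type", wild_type), ("nonsense", nonsense), ("frameshift", frameshift)]:
    print(name, first_in_frame_stop(seq), has_ptc(seq))
```

In this toy case both mutants terminate at nucleotide 9, well upstream of the original stop codon, whereas the unmodified sequence terminates only at its final codon.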

PTCs can also be introduced into an ORF following erroneous or alternative pre-mRNA splicing. First, the erroneous or regulated retention of introns (Figure 1-2A, middle), often due to weak consensus splice sites (Braunschweig et al., 2014; He et al., 1993; Sayani et al., 2008), is likely to introduce PTCs because introns are evolutionarily selected to either result in a frameshift if retained in the transcript or encode in-frame termination codons (Jaillon et al., 2008). Similarly, the use of cryptic splice sites can also lead to the inclusion of intronic sequences or produce frameshifts that lead to downstream nonsense codons (Kawashima et al., 2014).

Second, distinct from intron retention, alternative splicing also commonly produces NMD substrates (Figure 1-2A, bottom), with over 1/3 of alternative splicing events predicted to introduce PTCs (Lewis et al., 2003). Indeed, the coupling of alternative splicing and NMD (AS-NMD) is thought to represent one of the most prominent means of generating NMD substrates in higher eukaryotes. In some cases, PTCs may be introduced by incorporating “poison cassette exons” which harbor in-frame nonsense codons (Lareau et al., 2007). PTCs may also be introduced during splicing by the inclusion of a combination of exons that lead to frameshifts in the annotated coding sequence. Interestingly, the generation of AS-NMD splice products by poison cassette exons is used by the SR family of alternative splicing regulators (Lareau et al., 2007), indicating that this may serve as an important feedback mechanism to buffer the level of alternative splicing.

Translation of alternate ORFs upstream to or out-of-frame of annotated reading frames can also result in premature translation termination (Figure 1-2B). Selection of an alternate start codon that is out-of-frame of the annotated start codon often occurs when the annotated AUG is in a weak initiation context, a process termed leaky scanning (Figure 1-2B, top; Arribere and Gilbert, 2013; Welch and Jacobson, 1999). Alternatively, -1 programmed ribosomal frameshifting (-1 PRF) can occur when a translating ribosome encounters two cis-acting signals in the RNA sequence, causing the ribosome to shift translation into the -1 reading frame (Figure 1-2B, middle; Belew et al., 2011; Belew et al., 2014; Plant et al., 2004). Both of these events result in translation out-of-frame and likely lead to premature termination. Accordingly, it has been documented that the vast majority of -1 PRF signals in eukaryotes result in translation termination upstream of the annotated stop codon (Jacobs et al., 2007).

Additionally, many mRNAs contain ORFs upstream of the annotated ORF - within the 5’ UTR - termed upstream ORFs (uORFs; Figure 1-2B, bottom). Translation of a uORF leads to very early termination, upstream of the annotated ORF, often making the mRNA sensitive to NMD (Hurt et al., 2013; Johansson et al., 2007; Yepiskoposyan et al., 2011). It has recently become appreciated that many uORFs initiate translation at non-AUG start codons (Brar et al., 2012; Fritsch et al., 2012; Ingolia et al., 2011), suggesting that many uORFs may not yet be identified and, therefore, the contribution of uORF translation to NMD may be underestimated.

Figure 1-2. A variety of RNAs are substrates for NMD. A) PTCs may be introduced into an annotated ORF by nonsense or frameshift mutations or failure to decode UGA stop codon as selenocysteine (top; mutation or stop codon indicated in red); retention of a PTC-containing intron (middle), or alternative splicing that introduces a PTC (bottom). B) Translation of alternate reading frames (indicated in dark green) may result in premature translation termination. Alternative reading frame translation may occur upon translation initiation at an alternative AUG start codon (top), ribosomal frameshift (middle; frameshift signals indicated in red), or translation of a uORF (bottom). C) Extension of the 3’ UTR can cause a normal termination codon to elicit NMD.
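The effect of non-AUG initiation on uORF detection can be illustrated with a minimal Python sketch. The leader sequence and the function name are hypothetical, CTG is used only as an example of a near-cognate start codon, and a uORF is counted only if it both starts and terminates within the leader; this is an assumption made for illustration and not the annotation procedure used in the studies cited above.

```python
# Illustrative sketch only: a toy 5' leader, not a real yeast UTR.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_uorfs(leader, start_codons=("ATG",)):
    """List (start, end) coordinates of ORFs that begin and terminate within the leader."""
    uorfs = []
    for start in range(len(leader) - 2):
        if leader[start:start + 3] not in start_codons:
            continue
        # Walk codon by codon from the start codon until the first in-frame stop.
        for pos in range(start + 3, len(leader) - 2, 3):
            if leader[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs

leader = "AACTGGCATAAGGATGAAATGACC"  # hypothetical leader sequence

aug_only = find_uorfs(leader)
with_near_cognate = find_uorfs(leader, start_codons=("ATG", "CTG"))
print(aug_only)
print(with_near_cognate)
```

Restricting the search to AUG finds a single uORF in this toy leader, while allowing CTG reveals a second one, mirroring the point that AUG-only annotation may undercount uORFs.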

Finally, even normal termination codons can trigger NMD if the RNA contains a long 3’ UTR (Kebaara and Atkin, 2009; Muhlrad and Parker, 1999a). This is proposed to be because the large region of downstream untranslated RNA resulting from a long 3’ UTR can make a normal termination event appear premature (Figure 1-2C). This is supported by the phenomenon of polarity, where a more 5’ proximal PTC that generates a longer untranslated 3’ sequence is more efficient at eliciting NMD (Cao and Parker, 2003; Losson and Lacroute, 1979; Peltz et al., 1993). Long 3’ UTRs may be genomically encoded, or result from processing the nascent mRNA at alternative, distal cleavage and polyadenylation sites (Muhlrad and Parker, 1999a; Pulak and Anderson, 1993; Shi, 2012). Notably, however, the relationship between 3’ UTR length and NMD is not straightforward - there is no direct linear relationship between 3’ UTR length and NMD, and very long 3’ UTRs have been found to escape recognition by NMD (Kebaara and Atkin, 2009).
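The geometry behind polarity can be made concrete with a short Python sketch. The transcript length and stop codon positions are hypothetical round numbers, and the function computes only the length of untranslated sequence downstream of a stop codon; as noted above, this length is not a linear predictor of NMD efficiency, so the sketch should not be read as a model of decay.

```python
# Illustrative arithmetic only: hypothetical coordinates, not measured data.
def downstream_untranslated_length(transcript_length, stop_codon_start):
    """Nucleotides between the end of a stop codon and the 3' end of the
    transcript body (poly(A) tail excluded); coordinates are 0-based."""
    return transcript_length - (stop_codon_start + 3)

transcript_length = 1500   # hypothetical mRNA body length
normal_stop = 1297         # hypothetical annotated stop codon position
early_ptc = 397            # hypothetical 5' proximal PTC
late_ptc = 997             # hypothetical 3' proximal PTC

lengths = {
    "normal stop": downstream_untranslated_length(transcript_length, normal_stop),
    "early PTC": downstream_untranslated_length(transcript_length, early_ptc),
    "late PTC": downstream_untranslated_length(transcript_length, late_ptc),
}
print(lengths)
```

The more 5’ proximal the stop codon, the longer the stretch of untranslated RNA that follows it, which is the quantity the polarity observations relate to NMD efficiency.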

Global analyses of the NMD-sensitive transcriptome, originally by microarray and subsequently using high-throughput sequencing, have helped uncover many of these features common to NMD-sensitive RNAs. However, there is much that remains unclear regarding the endogenous, physiological targets of NMD. Although 3-10% of eukaryotic mRNAs are sensitive to NMD (Rehwinkel et al., 2006), many of these mRNAs do not contain apparent NMD-inducing features. Furthermore, there is no consensus of NMD targets, suggesting that the coordinated regulation of mRNAs in common cellular pathways is not evolutionarily conserved. Our understanding of NMD can be greatly increased by understanding the impact it has on both aberrant and endogenous RNAs in the cell. To investigate this in S. cerevisiae at nucleotide resolution, high-throughput sequencing analyses of RNAs impacted by loss of NMD activity are presented in Chapters 3 and 4. With continued analysis, it should be possible to identify features that lead to NMD-sensitivity for many RNAs, and expand the knowledge of how NMD impacts the global transcriptome in this organism.

1.4 The factors involved in NMD

NMD requires a conserved set of proteins in order to elicit the accelerated decay of target mRNAs. The core NMD machinery is composed of the UPF proteins, for “up-frameshift suppressor,” (Culbertson et al., 1980), and includes

UPF1, UPF2, and UPF3 (Figure 1-1B) which were first identified in S. cerevisiae.

UPF1 was originally characterized upon discovery that mutations in the UPF1 gene resulted in allosuppression of frameshift mutations in HIS4 (Leeds et al.,

1991). UPF1 is a superfamily I helicase (Altamura et al., 1992; de la Cruz et al.,

1999; Leeds et al., 1992), which exhibits 5’ to 3’ helicase and nucleic acid-dependent ATPase activity (Bhattacharya et al., 2000; Czaplinski et al., 1995), both of which are required for its activity in NMD (Weng et al., 1996a; Weng et al., 1996b). Also an RNA binding protein, UPF1 directly binds 9-11 nucleotides of

RNA via its helicase domain (Chakrabarti et al., 2011; Chamieh et al., 2008).

Finally, UPF1 is a phosphoprotein, with the cycle of UPF1 phosphorylation and dephosphorylation required for NMD in higher eukaryotes (Page et al., 1999).

UPF3 was identified in a genetic screen similar to that in which UPF1 was identified (Leeds et al., 1992), while UPF2 (also called NMD2) was discovered through interaction with UPF1 in a yeast two-hybrid screen (He and Jacobson,

1995). UPF2 is a mIF4G (Middle of eukaryotic translation Initiation Factor 4G) domain-containing protein which can bind RNA through its third mIF4G domain

(Kadlec et al., 2004). It is thought to act as a molecular bridge, binding the cysteine-histidine-rich (CH) domain of UPF1 (Chakrabarti et al., 2011; Chamieh et al., 2008; Kadlec et al., 2006) and the RNA recognition motif of UPF3 (Kadlec et al., 2004) to form a heterotrimeric complex. Neither UPF2 nor UPF3 has any demonstrated enzymatic function, although UPF2 can activate the helicase and

ATPase activities of UPF1, implicating this as a specific role for UPF2 in NMD

(Chakrabarti et al., 2011; Chamieh et al., 2008).

Subsequent screens in C. elegans for allele-specific suppressors identified homologs of the yeast NMD factors, as well as additional proteins required for

NMD in metazoa (Cali et al., 1999; Hodgkin et al., 1989). These were termed

SMG factors (Suppressor with Morphogenetic effect on Genitalia), named for the characteristic phenotype of worms carrying mutations in these genes. SMG2,

SMG3, and SMG4 correspond to yeast factors UPF1, UPF2, and UPF3, respectively, while SMG1, SMG5, SMG6, and SMG7 are metazoa-specific NMD factors. SMG1 is a phosphatidylinositol 3-kinase (PI3K)-related kinase that is responsible for the phosphorylation of UPF1 (Page et al., 1999; Pal et al., 2001). In turn, SMG5-7 bind phosphorylated UPF1 through their 14-3-3-like domains (Fukuhara et al., 2005) and recruit protein phosphatase 2A (PP2A) to induce UPF1 dephosphorylation (Chiu et al., 2003; Ohnishi et al., 2003). SMG6 is a PIN (PilT N-terminus) domain-containing endonuclease responsible for cleaving NMD targets to initiate their degradation in metazoa (Eberle et al., 2009;

Huntzinger et al., 2008). In budding yeast, the absence of clear homologs of these factors suggests that neither the phosphorylation of UPF1 nor endonucleolytic cleavage of NMD substrates is a conserved step in the NMD cycle, although it has been suggested that the yeast protein EBS1 may be a

SMG7 homolog (Luke et al., 2007).

Homologs of the core NMD factors and most auxiliary factors have been identified in all eukaryotic organisms investigated, consistent with the presence of a conserved NMD pathway (reviewed in Conti and Izaurralde, 2005). In general, metazoa including zebrafish (Wittkopp et al., 2009) and mammals (Applequist et al., 1997; Chiu et al., 2003; Gatfield et al., 2003; Lykke-Andersen et al., 2000;

Mendell et al., 2000; Ohnishi et al., 2003; Perlick et al., 1996; Serin et al., 2001;

Yamashita et al., 2001) express all seven NMD factors. A notable exception is D. melanogaster, which lacks a clear ortholog of SMG7 (Gatfield et al., 2003).

Although NMD has been studied for over two decades, novel factors with proposed roles in NMD continue to be identified. Recent screens in C. elegans have identified two factors essential for viability, termed SMGL-1 and SMGL-2

(SMG Lethal; also known as DHX34 and NBAS) that play an uncharacterized role in NMD in C. elegans, zebrafish, and humans (Longman et al., 2013;

Longman et al., 2007). Two additional proteins, SMG8 and SMG9, appear to play a role in regulating the kinase activity of SMG1 in humans and C. elegans

(Yamashita et al., 2009). Finally, human PNRC2 (Proline-rich Nuclear Receptor

Coregulatory protein 2) has recently been shown to function in concert with UPF1 in NMD and facilitates recruitment of the decay machinery to NMD targets (Cho

et al., 2009). Importantly, the fact that novel proteins continue to be implicated in the well-studied NMD pathway suggests there is still much we do not understand about the mechanisms of NMD substrate recognition and decay.

1.5 Models for the recognition of NMD substrates

The first step in NMD is the recognition of an RNA as an NMD substrate.

There is still uncertainty regarding how RNAs are recognized as targets for this decay pathway. Specifically, how is premature translation termination distinguished from normal translation termination? Several observations suggest that it is the spatial location of translation termination, specifically in relation to the downstream mRNA, that is important for PTC definition. For instance, if translation termination occurs closer to the 5’ end of an mRNA and further from the normal termination codon, the mRNA will be more efficiently targeted for rapid degradation through the NMD pathway, a phenomenon referred to as polarity

(Losson and Lacroute, 1979; Peltz et al., 1993). Additionally, an extended 3’ untranslated region (UTR) can cause a normal termination codon to be perceived as a PTC and trigger NMD (Kebaara and Atkin, 2009; Muhlrad and Parker,

1999a). Furthermore, an abundance of evidence suggests that differences in the spatial arrangement of translation termination and proteins associated with an mRNA help differentiate between normal and premature translation termination

(reviewed in Baker and Parker, 2004). As described below, the predominant models to explain NMD substrate recognition build upon this concept (Figure

1-3). Critically, however, no model represents a conserved and universal

mechanism for how NMD substrates are discriminated from normal RNAs, highlighting the ongoing need for investigation into this fundamental aspect of

NMD.

Pre-mRNA splicing and the exon junction complex

In higher eukaryotes, particularly vertebrates, pre-mRNA splicing can enhance NMD (Carter et al., 1996; Figure 1-3A). Specifically, an exon-exon junction located more than ~50-55 nucleotides downstream of a termination codon enhances the targeting of that mRNA by NMD (Zhang et al., 1998).

During pre-mRNA splicing, a suite of proteins known as the exon junction complex (EJC) is deposited onto the mRNA ~20-24 nucleotides upstream of the splice junction (Le Hir et al., 2000). Ribosomes displace EJCs from the coding region of mRNAs during translation elongation, usually resulting in an EJC-free mRNA. However, if translation termination occurs >55 nucleotides (nt) upstream of an exon-exon junction, translating ribosomes cannot physically displace the downstream EJC (known as the 50-55 nt rule; Nagy and Maquat, 1998). When translation termination occurs in the presence of an EJC, two of the core NMD factors, UPF2 and UPF3, which associate with the EJC through direct interaction with EJC component Y14 (Kashima et al., 2006; Le Hir et al., 2001), are believed to interact with UPF1 and activate NMD.
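The positional logic of the 50-55 nt rule described above can be illustrated with a minimal sketch. The function name, the coordinate convention (0-based positions on the spliced mRNA), and the default cutoff of 55 nt are assumptions made for illustration only; the rule itself is an empirical approximation, and, as discussed in this section, it does not hold for every substrate.

```python
def ejc_rule_predicts_nmd(stop_codon_end, junction_positions, min_distance=55):
    """Apply the 50-55 nt rule to one spliced mRNA (illustrative sketch).

    stop_codon_end     -- index of the last nucleotide of the termination codon
    junction_positions -- indices of the last nucleotide of each upstream exon,
                          i.e. the exon-exon junctions, in mRNA coordinates
    min_distance       -- assumed cutoff in nt (the text gives ~50-55)

    Returns True when at least one junction lies more than `min_distance`
    nucleotides downstream of the termination codon, so that an EJC would be
    predicted to remain on the mRNA when translation terminates.
    """
    return any(
        junction - stop_codon_end > min_distance
        for junction in junction_positions
    )


# Hypothetical transcripts: termination codon ending at position 300.
ptc_like = ejc_rule_predicts_nmd(300, [120, 250, 420])  # junction 120 nt downstream
normal_like = ejc_rule_predicts_nmd(300, [120, 250, 330])  # junction 30 nt downstream
```

Junctions upstream of the termination codon give a negative distance and are ignored, which mirrors the idea that translating ribosomes have already displaced those EJCs.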

There is substantial evidence that the EJC enhances NMD. First, some

PTCs are only competent to induce NMD when located upstream of an exon-exon junction (Carter et al., 1996). Second, some naturally intronless genes,

which lack EJCs, are immune to NMD unless an artificial intron is placed downstream of the PTC (Maquat and Li, 2001). Finally, artificially placing EJC components on an mRNA downstream of a termination codon, through tethering protein components of the EJC or inserting a downstream intron, can convert an mRNA to an NMD substrate (Carter et al., 1996; Ivanov et al., 2008; Lykke-Andersen et al., 2001; Thermann et al., 1998). Interestingly, the general absence of introns from 3’ UTRs suggests selective pressure exists to limit splicing downstream of a normal termination codon, which would cause otherwise normal RNAs to be recognized as NMD substrates (Nagy and Maquat, 1998).

Despite this evidence, the presence of an EJC - or splicing in general - is not absolutely required for NMD substrate recognition. In some cases, reporter mRNAs that lack introns (Singh et al., 2008) or which contain a PTC that does not meet the “50-55 nt rule” (Buhler et al., 2006; Carter et al., 1996) are still recognized as NMD substrates. Furthermore, lower eukaryotes including D. melanogaster (Gatfield et al., 2003), C. elegans (Longman et al., 2007), fission yeast S. pombe (Wen and Brogna, 2010), and budding yeast S. cerevisiae do not require pre-mRNA splicing for efficient NMD substrate recognition. Indeed, only

~5% of S. cerevisiae mRNAs undergo pre-mRNA splicing, while every mRNA from this organism that has been tested can be made competent for NMD.

Therefore, while an EJC can enhance NMD substrate recognition, likely by increasing recruitment of the NMD machinery to an mRNA, it cannot underlie substrate recognition for all mRNA substrates of this conserved pathway.

Other cis-acting elements

In budding yeast, which lack EJCs, a related model referred to as the downstream sequence element (DSE) model was previously proposed (Figure 1-3B). In this case, a degenerate consensus sequence of TGYYGATGYYYYY (the DSE) elicits NMD only when located downstream of a premature termination codon (Hagan et al., 1995; Zhang et al., 1995), in a manner similar to an EJC. HRP1, a heterogeneous nuclear ribonucleoprotein (hnRNP), has been proposed to bind to this consensus sequence and recruit the NMD machinery when translation termination occurs while it is associated with an mRNA (Gonzalez et al., 2000).

[Figure 1-3: schematic with panels A-D. Recoverable panel labels: "Pre-mRNA splicing", "EJC deposition", "Displacement of EJC by translating ribosomes", "Retention of EJC on mRNA during premature translation termination", "HRP1", "DSE", "NMD", and "Positive interactions with terminating ribosome and release factors".]

Figure 1-3. Models for NMD substrate recognition. A) The EJC model postulates that when translation termination occurs >50-55 nt upstream of an exon-exon junction, the EJC fails to be removed from the mRNA during translation, which enhances recruitment of the NMD machinery. B) In the DSE model, HRP1 associates with a degenerate consensus sequence; when it fails to be displaced by translating ribosomes due to premature translation termination, NMD is triggered. C) The faux 3’ UTR model suggests that a false or extended 3’ UTR increases the distance between translation termination and poly(A) binding protein, which ordinarily provides a positive signal that translation is normal. In the absence of this positive signal, the NMD machinery is recruited. D) Other unidentified components of the mRNP may provide spatial or contextual information about the location of translation termination, similar to the models above.

Notably, the DSE model is not generally accepted as a mechanism by which NMD substrates may be recognized. Most importantly, it fails to explain how NMD substrates lacking a DSE can still be effectively discriminated from normal mRNAs (Hagan et al., 1995; Zhang et al., 1995). Moreover, the requirement for an mRNA to contain a DSE within its coding sequence would place evolutionary constraints on open reading frames and polypeptide sequences. Finally, a DSE motif has not been identified in any organism except yeast. This has led to the conclusion that the DSE model cannot represent a universal or conserved mechanism by which NMD substrates are identified, and left open the question of how NMD substrate discrimination is carried out in the ancestral model eukaryote, S. cerevisiae.
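As an illustration of what the degenerate DSE consensus means in practice, the sketch below searches a sequence for TGYYGATGYYYYY downstream of a premature termination codon, reading Y as a pyrimidine (C or T) under the standard IUPAC nucleotide code. The function name, the 0-based coordinate convention, and the use of a strict regular-expression match are assumptions for illustration; the studies cited above did not necessarily define the element this stringently.

```python
import re

# Consensus as given in the text; Y is the IUPAC code for a pyrimidine (C or T).
DSE_CONSENSUS = "TGYYGATGYYYYY"
DSE_PATTERN = re.compile(DSE_CONSENSUS.replace("Y", "[CT]"))


def dse_positions_downstream(seq, ptc_index):
    """Return 0-based start positions of DSE matches downstream of a PTC.

    seq       -- mRNA or cDNA sequence (U is treated as T)
    ptc_index -- index of the first nucleotide of the premature stop codon

    Only matches that begin after the three nucleotides of the PTC are
    reported, reflecting the requirement that the element lie downstream
    of the site of termination. Overlapping matches are not enumerated.
    """
    normalized = seq.upper().replace("U", "T")
    search_start = ptc_index + 3
    return [m.start() for m in DSE_PATTERN.finditer(normalized, search_start)]


# Hypothetical sequence: ATG AAA TAG, a 2-nt spacer, then a consensus match.
example = "ATGAAATAG" + "CC" + "TGCTGATGTTCCT" + "AAA"
hits = dse_positions_downstream(example, ptc_index=6)
```

A sequence with no hit under this strict pattern is not evidence against NMD sensitivity, consistent with the point above that substrates lacking a DSE are still recognized.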

Faux 3’ UTR

A third model for NMD substrate recognition is the faux 3’ UTR model.

This model was originally described in yeast, but is now believed to apply more broadly throughout eukarya. The model is based on the fact that factors associated with a normal mRNA 3’ end are further from the location of premature

translation termination compared to normal termination (Figure 1-3C). Specifically, the increased distance between the site of translational termination and the 3’ end of an mRNA created by premature termination generates a “false” 3’ UTR (Amrani et al., 2006; Amrani et al., 2004). A typical mRNA ends with a 3’ polyadenosine tail, to which the poly(A)-binding protein PAB1 binds. It is believed that when translation termination occurs at the normal position, proximal to the 3’ end of the RNA and poly(A) tail, PAB1 provides a positive signal to terminating ribosomes that termination is occurring in a normal context. When termination occurs further upstream and more distal to the poly(A) tail, such as at a PTC, it occurs in the absence of proximal PAB1 and, therefore, lacks the positive signal that translation termination is normal. The mechanism for this distance-dependent relationship was initially proposed to be due to competition between PAB1 and UPF1 to bind eukaryotic release factor eRF3 (SUP35 in yeast). During normal translation termination, PAB1 would interact with eRF3 and block the association of UPF1 at the site of termination; in contrast, during premature termination, the absence of PAB1 would allow UPF1 to interact with eRF3, thus distinguishing premature translation termination from normal termination (Amrani et al., 2004; Ivanov et al., 2008). However, subsequent data have failed to provide evidence to support this competitive binding in vitro (Kervestin et al., 2012; Singh et al., 2008).

A number of pieces of evidence support a role for PAB1 in NMD. The faux

3’ UTR model is supported by the observation that increasing the distance between translation termination and the 3’ end of an mRNA, through an extended

3’ UTR or a more 5’-proximal PTC, increases targeting of that mRNA to NMD

(Buhler et al., 2006; Eberle et al., 2008; Singh et al., 2008). Additionally, in yeast,

D. melanogaster, and humans, artificially tethering PAB1 immediately downstream of a PTC converts an NMD-sensitive mRNA to one that is insensitive to NMD (Amrani et al., 2004; Behm-Ansmant et al., 2007; Eberle et al., 2008; Ivanov et al., 2008; Silva et al., 2008; Singh et al., 2008).

Once again, however, the faux 3’ UTR model is unable to fully explain

NMD substrate recognition. In both S. cerevisiae and S. pombe, cytoplasmic

PAB1 is not required for the accurate discrimination of normal and PTC-containing mRNAs by the NMD machinery (Meaux et al., 2008; Wen and Brogna,

2010). It has also been shown in S. cerevisiae that neither normal 3’ end formation to produce a normal 3’ UTR (Baker and Parker, 2006; Gonzalez et al.,

2000) nor the presence of a poly(A) tail on an mRNA is required for NMD substrate recognition (Meaux et al., 2008). Therefore, while the context of the 3’

UTR does appear to play an important role in NMD substrate recognition, PAB1 and the poly(A) tail are not sufficient to explain this effect.

Towards a universal model

In the absence of a universal model for NMD substrate recognition, alternative ways of explaining how this process occurs should be considered.

One approach has been to merge all existing models into a “unified NMD model” in which a faux 3’ UTR triggers NMD, and can be enhanced by factors such as an EJC or DSE (Stalder and Muhlemann, 2008). Alternatively, it has been

suggested that NMD substrate recognition may be more complex, resulting from a combination of signals that antagonize NMD (e.g. PAB1) and signals that favor NMD (e.g. EJCs) to define translation termination as normal or aberrant (Silva and Romao, 2009; Singh et al., 2008). Indeed, it is likely that not all signals that can either enhance or inhibit NMD have been identified. Perhaps additional components of the messenger ribonucleoprotein (mRNP) complex provide contextual information about translation termination, similar to what has been proposed for the EJC, PAB1, or HRP1 (Figure 1-3D).

Certainly, there is clear evidence that specific mRNP features can positively or negatively impact NMD substrate recognition, and it is widely accepted that a combination of cis- and trans-acting features of the mRNP complex define an NMD substrate. What is not yet agreed upon, however, is which factors are necessary and sufficient to elicit NMD. By expanding our knowledge of the trans-acting protein factors that associate with normal versus

NMD-sensitive mRNAs, it may be possible to identify additional features important for the recognition of NMD substrates in S. cerevisiae, which will increase our understanding of how NMD substrate recognition occurs. To this end, a protocol to purify an in vivo mRNP complex has been developed and is presented in Chapter 2. This procedure will facilitate the identification of additional proteins that differentially associate with NMD-sensitive mRNAs and which may be involved in discriminating these from normal mRNA species.

1.6 Downstream steps in NMD

Activation of UPF1 enzymatic activity

The molecular events subsequent to recognition of an NMD substrate are not understood in detail, despite much investigation. It is known that both the

ATPase and helicase activity of UPF1 are required for NMD (Weng et al., 1996a, b). In addition to these two enzymatic functions of UPF1, the phosphorylation cycle of UPF1 is also required for progression of NMD in higher eukaryotes

(Kashima et al., 2006; Yamashita et al., 2009). However, there is conflicting evidence regarding how and when these steps occur, and what their role in NMD may be (reviewed in Schweingruber et al., 2013). Most evidence suggests that in higher eukaryotes, phosphorylation of UPF1 by SMG1 occurs following association with and recognition of an NMD substrate, as part of a multi-protein complex (Kashima et al., 2006). This phosphorylation is important for the recruitment of SMG5-7 and progression of NMD (Ohnishi et al., 2003).

Meanwhile, the ATPase and helicase activities of UPF1 are believed to be activated by its association with UPF2 (Chakrabarti et al., 2011; Chamieh et al.,

2008), which is likely to occur concomitant with or immediately after NMD substrate recognition. The ATPase and helicase activities of UPF1 have been proposed to be involved in steps as diverse as remodeling the mRNP to enhance the decay of NMD substrate RNAs (Franks et al., 2010; Melero et al., 2012), scanning the 3’ UTR for EJCs or other NMD-inducing features (Shigeoka et al.,

2012), or recycling UPF1 (Kurosaki et al., 2014). Importantly, although many of these molecular details are still being worked out, what is quite clear is that the

progression of steps from recognition of an NMD substrate to degradation of that

RNA involves many mRNP remodeling events and interactions.

Direct impact of NMD on protein output

Premature translation termination is primarily deleterious to the cell due to the potential production of truncated polypeptides with dominant-negative function (reviewed in Bhuvanagiri et al., 2010; Holbrook et al., 2004). Therefore, an important aspect of NMD is to minimize the amount of protein produced from an ORF truncated by a nonsense codon. There is evidence that this may occur in two ways. First, translation of the mRNA may be repressed prior to the decay of NMD substrate mRNAs. In humans, UPF1 can directly inhibit translation in vitro through interaction with eukaryotic translation initiation factor eIF3 (Isken et al., 2008). This is corroborated by observations in yeast that more protein is produced per RNA molecule from NMD substrate mRNAs when NMD is inactivated (Muhlrad and Parker, 1999b). These data suggest that UPF1 is normally involved in limiting the translation of NMD substrate mRNAs independent of their accelerated degradation. Second, there is evidence that proteins generated from NMD substrate RNAs are targeted for degradation by the proteasome as another mechanism to minimize the expression of proteins from the aberrant mRNA (Kuroha et al., 2009), although mechanistically how

NMD might promote protein turnover is unclear. This suggests that NMD not only induces the decay of the RNA substrate (see below), but also of the aberrant

protein encoded by the RNA. These data indicate that the cell takes a number of precautions to minimize the production of truncated proteins.

Decay of NMD substrate RNAs

The ultimate fate of a transcript recognized as a substrate for NMD is accelerated decay of the RNA. Rapid decay ensures that steady-state levels of the aberrant RNAs are low, thereby limiting production of truncated polypeptides.

RNAs targeted to NMD primarily undergo decay initiated at the 5’ end of the transcript by decapping, carried out by a holoenzyme composed of DCP1 and catalytic subunit DCP2. This is followed by rapid 5’ to 3’ exonucleolytic digestion by cytoplasmic exoribonuclease XRN1 (Lejeune et al., 2003; Muhlrad and

Parker, 1994). In yeast, decapping of NMD substrates occurs independent of deadenylation, making it distinct from the decay of normal mRNAs which is initiated by removal of the 3’ poly(A) tail (Decker and Parker, 1993; Muhlrad et al., 1994). Notably, exonucleolytic decay from the 3’ end of the RNA, carried out by deadenylases and the cytoplasmic exosome, is also accelerated by NMD

(Cao and Parker, 2003; Lejeune et al., 2003; Mitchell and Tollervey, 2003;

Takahashi et al., 2003). A third mode of decay for NMD substrates has been described in higher eukaryotes. SMG6, a PIN-domain endonuclease essential for NMD, catalyzes the endonucleolytic cleavage of RNAs targeted to NMD

(Eberle et al., 2009; Huntzinger et al., 2008), either at the site of the PTC or

~20-40 nucleotides downstream (Boehm et al., 2014). This subsequently leads to exonucleolytic decay of the two resulting RNA fragments. In higher eukaryotes

in which both endonucleolytic and exonucleolytic decay occur, it is unclear which pathway dominates (Muhlemann and Lykke-Andersen, 2010).

Although degradation of NMD-substrate mRNAs has been well studied, it is not clear how this accelerated decay is induced. Some evidence suggests the decay machinery may be directly recruited to NMD substrates by NMD factors

(He and Jacobson, 1995; Loh et al., 2013; Lykke-Andersen, 2002; Takahashi et al., 2003). However, these interactions have never been shown to be absolutely required for the rapid decay of NMD substrate RNAs, and in some cases have even been shown to be dispensable for NMD (Swisher and Parker, 2011).

Therefore, a second possibility is that the NMD machinery induces an overall change in the organization of the mRNP or the mRNA translation state, which could indirectly make the mRNA accessible for decay (reviewed in Baker and

Parker, 2004).

1.7 Harnessing NMD to investigate RNA biology

The NMD pathway provides an invaluable system for learning about RNA biology more broadly. This pathway is sufficiently characterized such that it can be easily manipulated, while simultaneously providing a number of remaining questions that are of significant interest to unravel. Here, I have taken advantage of the current knowledge - and knowledge gaps - about the NMD pathway to study three novel experimental questions. First, NMD was used to develop a protocol to purify and compare in vivo-assembled mRNP complexes, by taking advantage of the effective discrimination between NMD-sensitive and -insensitive

RNAs by the NMD pathway. In this way, an experimental system could be designed to compare and contrast two very similar RNA species that were nevertheless subject to distinct mechanisms of decay. Furthermore, this approach simultaneously allows the identification of novel mRNP features unique to mRNAs that undergo NMD, which may clarify our incomplete understanding of

NMD substrate recognition. Second, the sensitivity of an RNA to NMD, a pathway that is dependent upon translation to recognize its substrates, can provide evidence that the RNA is translated. This feature was used as one of several independent pieces of evidence that many putatively non-coding RNAs undergo translation and thus may be misclassified. Finally, global gene expression analysis in the absence of NMD was used to identify NMD-sensitive mRNAs and RNA isoforms. It is predicted that, during this analysis, novel RNA isoforms can be identified that have previously been overlooked due to their instability and low abundance. Thus, through their identification it will also be possible to learn about the extent of alternative RNA processing in yeast, which is believed to be minimal compared to higher eukaryotes. At the same time, this will refine the list of endogenous substrates of NMD, and potentially demonstrate an expanded breadth of NMD activity important to help the cell tolerate erroneous RNA processing or regulate gene expression. Importantly, by harnessing NMD to uncover information about diverse aspects of RNA biology, we have also expanded our understanding of the mechanisms and impact of

NMD.

CHAPTER 2: PURIFICATION OF AN NMD-SENSITIVE mRNP COMPLEX

2.1 Introduction

Rationale - evidence for a distinct NMD mRNP

The repertoire of proteins associated with an mRNA (i.e., the mRNP) can serve as a critical signal to the cell as to whether that RNA is aberrant (Eberle and Visa, 2014). Indeed, all current models for NMD substrate recognition are built upon differences in the association of proteins with NMD-sensitive versus

NMD-insensitive mRNAs which define the context of translation termination - for instance, the presence and location of an EJC. However, as described above

(see Chapter 1.5), current knowledge regarding the differences between NMD-sensitive and NMD-insensitive mRNAs is insufficient to explain how the cell discriminates between these two classes. The full extent of differences in the mRNP complex between these two classes of mRNAs, or indeed the complete repertoire of proteins associated with any single mRNA, is unknown (reviewed in

Muller-McNicoll and Neugebauer, 2013). Because mRNA binding proteins are intimately involved in the life-cycle of an mRNA - many are deposited co-transcriptionally and impact events from pre-mRNA processing to degradation of the mRNA (Muller-McNicoll and Neugebauer, 2013) - a complete understanding of associated factors will improve our knowledge about many of these processes and, in particular, about NMD. To this end, we sought to develop a protocol to purify and characterize a single mRNP species as it exists in vivo. This approach

can be used to identify differences in proteins associated with an NMD-sensitive mRNA versus an NMD-insensitive counterpart. In addition to the information that can be learned about NMD substrate recognition, the NMD pathway provides an excellent system in which to develop this protocol, as proteins that associate with two similar mRNAs that nevertheless differ in their sensitivity to NMD can be directly compared. Furthermore, this technique is invaluable in its utility to interrogate the mRNP complex of any single mRNA species to gain understanding about various aspects of mRNA activity or metabolism.

Existing methods to interrogate RNA-protein interactions

Many techniques have been developed over the years to gain information about the interactions between proteins and RNAs. These techniques have provided fundamental information about the nature and makeup of RNP complexes, including small nuclear RNPs (snRNPs) (Bardwell and Wickens,

1990; Grabowski and Sharp, 1986; Niranjanakumari et al., 2002), the spliceosome (Zhou and Reed, 2003), 7SK non-coding RNA (Hogg and Collins,

2007), small bacterial non-coding RNAs (ncRNAs; Said et al., 2009) and individual mRNAs (Slobodin and Gerst, 2010). Each method has proven to be beneficial in specific contexts, but as described below none were ideally suited to address our experimental question.

One of the earliest methods developed to identify multiple proteins that bind to a single RNA is RNA affinity chromatography (Grabowski and Sharp,

1986; Sharma, 2008) in which a cell lysate is applied to an immobilized synthetic

RNA (often biotinylated RNA bound to a streptavidin resin) such that RNA binding proteins specific to the sequence or structure of the synthetic RNA bind and are retained on the resin (Grabowski and Sharp, 1986). This approach can identify a set of proteins that associate with a single RNA, but because the protein-RNA associations are formed in vitro a major caveat of this approach is that the interactions may not be a true representation of complexes that exist in vivo.

A number of methods taking the reverse approach - identifying RNAs bound by a single protein - have been developed, built around RNA immunoprecipitation (RIP; Gilbert et al., 2004; Niranjanakumari et al., 2002). In this method, a specific protein is immunoprecipitated, and RNAs that co-purify with the protein are analyzed. When RNA analysis is performed by microarray

(i.e. RIP-chip; Keene et al., 2006) or high-throughput RNA sequencing (i.e. RIP-Seq; Zhao et al., 2010), a global list of mRNAs with which the protein associates can be obtained. Immunoprecipitation may be done following covalent crosslinking of protein-RNA complexes, often with formaldehyde (Gilbert et al.,

2004; Niranjanakumari et al., 2002; Selth et al., 2009) or UV light (i.e. UV

CrossLinking and ImmunoPrecipitation, CLIP; Ule et al., 2005), to ensure the interactions that are detected were captured as they occur in vivo. These variations on RIP provide specific information about whether an RNA interacts with a given protein, but are unable to simultaneously provide information regarding all RBPs that associate with that RNA.

The approach most conceptually similar to the one developed here is the use of RNA aptamers and their cognate binding proteins or ligands to selectively

purify a single mRNA species. In this method, an RNA aptamer sequence is inserted into the RNA of interest, and it is purified from a whole cell lysate by binding to its protein or ligand. Many derivations of this method exist, including

RNA Affinity in Tandem (RAT) which utilizes aptamers for both bacteriophage

MS2 coat protein and tobramycin on a single RNA to perform a two-step purification (Hogg and Collins, 2007), or RNA-binding protein Purification and

IDentification (RaPID; Slobodin and Gerst, 2010) which captures in vivo interactions by formaldehyde crosslinking. The most useful aspect of these approaches is the unbiased identification of proteins interacting with a single

RNA, accomplished by targeting the RNA for purification rather than the protein.

Because the RNA aptamer-ligand system is ideal for the experimental question we hoped to ask, initial mRNP purification trials used the MS2 coat protein and RNA aptamer system. However, a number of technical challenges limited the ability to use this method for meaningful characterization of mRNP complexes. First, it is unclear what fraction of MS2 coat protein associates with the RNA aptamer in vivo, and, similarly, whether the exogenous MS2 coat protein associates nonspecifically with any yeast proteins or mRNAs in vivo. Both of these issues may have contributed to high background and low purification efficiency (data not shown). Additionally, the stringency of this purification approach is dictated by conditions that are compatible with protein purification, thus limiting the salt concentration and use of detergents. Finally, previous investigations performed with this approach largely purified the RNA of interest from a lysate by in vitro interaction with an immobilized ligand (Bardwell and Wickens, 1990; Hogg and Collins, 2007; Said et al., 2009; Zhou and Reed, 2003), thus making it difficult to prevent the formation of RNA-protein interactions in vitro during purification. Due to these drawbacks, we sought to develop an improved purification scheme based on purification of an mRNP by hybridization.

Purification by hybridization has several benefits, including resistance to denaturing conditions, and the ability to directly target an endogenous RNA of interest - without modification - for purification. Several related methodologies have taken advantage of these benefits to identify the repertoire of proteins that bind all mRNAs in yeast (Mitchell et al., 2013) and mammals (Baltz et al., 2012; Castello et al., 2012), by identifying RBPs that crosslink to polyadenylated RNAs subsequently purified by oligo-d(T) selection. This provided an excellent way to identify novel RBPs, and revealed unexpected classes of RNA-associated proteins such as metabolic proteins (Mitchell et al., 2013). These studies accomplished on a global scale what we sought to accomplish at single-RNA resolution. A related approach, RNA antisense purification (RAP), purifies an RNA by hybridization in order to identify its interactions with DNA (Engreitz et al., 2013) or RNA (Engreitz et al., 2014), either directly or through protein intermediates. The specificity and resistance to stringent purification conditions that the hybridization approach provides made it the ideal system around which to design our purification protocol.

2.2 Results and conclusions

Overview of the mRNP pulldown protocol

We sought to design a procedure to purify a single mRNA species and the proteins that are associated with that mRNA in vivo, so as to identify the protein composition of a single mRNP. The details of this protocol and optimization are described below (see also Appendix B). Briefly, the mRNP is purified from dcp2Δ yeast cells lacking the enzymatic activity to remove the protective 5’ 7-methylguanosine cap from mRNAs (Steiger et al., 2003), which stabilizes an NMD-sensitive reporter (Muhlrad and Parker, 1994) so as to increase its steady-state abundance to enable purification of a sufficient quantity of material. The yeast strain genetic background also includes deletion of three vacuolar proteases (pep4Δ, prc1Δ, prb1Δ), which minimizes protein degradation during incubation steps of the protocol (data not shown). Prior to lysis, cells are treated with formaldehyde to covalently crosslink protein-RNA and protein-protein interactions that occur in vivo (Niranjanakumari et al., 2002; Sutherland et al., 2008), and a crude whole-cell lysate is generated. From the lysate, a single mRNA species is purified based on sequence-specific hybridization to biotinylated DNA oligonucleotides immobilized onto streptavidin magnetic beads. Because this hybridization-based purification is resistant to salt and detergent, purification is carried out under denaturing conditions (0.5% sodium dodecyl sulfate) and high salt (500 mM lithium chloride), to minimize the purification of protein contaminants (bound nonspecifically to mRNA, DNA, or beads) or the in vitro formation of complexes (Mili and Steitz, 2004), while maintaining the covalent interactions captured by in vivo formaldehyde crosslinking. After purification, RNA can be extracted and analyzed by northern blotting or quantitative reverse transcriptase polymerase chain reaction (qRT-PCR), or proteins may be analyzed by western blotting or mass spectrometric analysis.

Design and characterization of reporter mRNAs

To identify proteins that differentially associate with NMD-sensitive versus NMD-insensitive mRNAs, reporter mRNAs to represent each of these classes were deliberately designed. These reporter mRNAs encode the heterologous GFP ORF (Figure 2-1A, reporter 1). Insertion of a PTC 25% of the way through the ORF (Figure 2-1A, reporter 2) rendered the mRNA a robust substrate for NMD, based on a >10-fold decrease in abundance that was rescued upon NMD inactivation by deletion of NMD factor UPF1 (Figure 2-1B, compare WT to upf1Δ). Deletion of sequences following the PTC that functionally act as a 3’ UTR upon premature translation termination (Figure 2-1A, reporters 3 - 5) decreased the sensitivity of the GFP reporter mRNA to NMD in a sequence-independent manner (Figure 2-1B). Indeed, the NMD sensitivity of the GFP reporter was essentially eliminated by removing all downstream untranslated regions of the ORF (Figure 2-1A, reporter 5; Figure 2-1B [ΔABC]). Thus, reporters 2 and 5 were selected to compare NMD-sensitive and NMD-insensitive mRNPs, as they are identical except for the deletion of untranslated sequences downstream of the PTC in reporter 5, contain translated ORFs of equivalent size and sequence, but display different sensitivities to NMD. It was particularly beneficial that the differential sensitivity of the reporters to NMD was achieved by altering only the length of the 3’ UTR while keeping the ORF length constant, in light of recent observations that ORF length can impact the sensitivity of an mRNA to NMD through an unresolved mechanism (Decourty et al., 2014).

[Figure 2-1 image not reproduced. Panel A: schematic of GFP reporters 1 - 5, with downstream regions A, B, and C. Panel B: blot of GFP and SCR1 in WT and upf1Δ cells (PTC at 25%), with a bar chart of relative mRNA abundance for the normal and PTC-containing reporters in WT and upf1Δ cells. Panel C: GFP blot for reporters with and without a PTC, grouped by promoter (GAL, TDH3), plasmid copy number (Cen, 2µ), and strain (WT, dcp2Δ); relative RNA levels as printed, in lane order: 100, 21.6, 102, 30.0, 9.44, 25.0, 37.4, 31.6, 35.3, 123, 178, 589.]

Figure 2-1. Design and validation of GFP reporter mRNAs. A) Schematic representation of GFP reporters. Green indicates ORF; light green indicates untranslated open reading frame due to insertion of a PTC 25% of the way through the ORF. B) Steady-state abundance of GFP reporters in wild-type and NMD-deficient upf1Δ cells. Quantification of each set of deletions, normalized to SCR1 levels, shown below. Error bars represent SEM. C) Steady-state abundance of GFP reporters expressed from inducible GAL promoter or constitutive TDH3 promoter, from plasmids bearing a centromeric (Cen; i.e. low-copy) or 2µ (i.e. high-copy) origin of replication. Cells expressing GAL-driven reporters were grown in media containing 2% galactose/1% sucrose instead of 2% glucose, to induce expression.

Importantly, the commonly used, inducible GAL promoter is inefficient at driving gene expression in the dcp2Δ genetic background required to stabilize the NMD-sensitive reporter RNA (Geisler et al., 2012). Therefore, to drive expression in this strain, the GFP reporters were placed under control of the constitutive TDH3 promoter (Nacken et al., 1996). Because this promoter is ~3-fold weaker than the GAL promoter (Figure 2-1C, left two panels), the GFP mRNAs were expressed from a high-copy plasmid (Figure 2-1C, right panel) to drive high expression and increase the yield of the purified mRNP.

Optimization of mRNP pulldown efficiency

The mRNP pulldown protocol is based on the highly specific base pairing between complementary DNA and RNA sequences. The overall purification scheme (Figure 2-2A) involves immobilizing complementary DNA oligonucleotides to magnetic beads followed by hybridization to the target RNA in a whole-cell lysate to specifically and cleanly purify a single RNA species. For immobilization, a 3’ biotin adduct was incorporated onto the DNA oligonucleotides, and the strong interaction between biotin and streptavidin was harnessed. The resistance of this interaction to high salt and ionic detergents allowed us to perform purification under stringent conditions without disrupting the association of complementary DNA oligonucleotides with the beads. A panel of streptavidin Dynabeads with different binding capacities and hydrophilic properties was tested (Figure 2-2B). The MyOne Streptavidin C1 Dynabeads (Invitrogen) provided >2.5-fold higher binding capacity than any other beads tested, based on the amount of GFP reporter purified from whole-cell RNA (Figure 2-2C).

[Figure 2-2 image not reproduced. Panel A: schematic of the mRNP pulldown. Panel B: table of Dynabead characteristics. Panel C: northern blot of input and eluate lanes for M-270, M-280, MyOne C1, and MyOne T1 beads; percent of input values as printed: 1.4, 1.7, 5.6, 2.2.]

Figure 2-2. Selection of streptavidin-conjugated magnetic beads for mRNP purification. A) Schematic of mRNP pulldown design. Biotinylated DNA oligonucleotides are immobilized onto streptavidin magnetic beads, which hybridize to the reporter mRNA and copurify any associated RNA binding proteins, including known RBPs such as ribosomal proteins or PAB1 (shown), and unknown factors. B) Table of characteristics of magnetic streptavidin Dynabeads tested for use in mRNP pulldown protocol. C) Northern blot after pulldown of GFP reporter mRNA from whole-cell RNA, to test Dynabeads for binding capacity.

Nearly all variables for the mRNP pulldown were exhaustively optimized to maximize recovery of the GFP reporter mRNAs, including concentration and length of formaldehyde crosslinking treatment, lysis method, hybridization conditions, incubation time and temperature, and elution method. Representative optimization experiments are presented in Figure 2-3. The highest GFP reporter mRNA recovery was achieved following an overnight incubation at room temperature, which resulted in >4-fold more mRNA purified than a one-hour incubation at room temperature or overnight incubation at 50 °C, and approximately 30% more than an overnight incubation at 37 °C (Figure 2-3A). The addition of the RNA denaturant formamide was found to increase recovery approximately two-fold, presumably by increasing accessibility of the reporter mRNA for hybridization (Figure 2-3B).

[Figure 2-3 image not reproduced. Panel A: input, supernatant, and eluate lanes for hybridization at RT for 1 hr, RT overnight (O/N), 37 °C O/N, and 50 °C O/N; percent of input values as printed: 1.3, 6.1, 4.8, 1.3. Panel B: 1/10 input and eluate lanes for the conditions labelled complementary DNA oligo, no DNA oligo, 2x bead volume, 2x incubation volume, decreased LiCl, and +14% formamide; percent of input values as printed: 15.3, 51.8, 17.0, 14.5, 30.5. Panel C: 1/10 input, 1/10 supernatant, and eluate lanes for oligos 1, 3, 4, 1+2+3, 1+2+4, and 1+2+3+4; percent of input values as printed: 100, 102, 72.5, 14.7, 24.0, 14.5, 14.5, 0.72, 0.38, 35.1, 19.4, 47.4, 36.8.]

Figure 2-3. Optimization of mRNP pulldown conditions. A) Incubation time and temperature for the hybridization step of the mRNP pulldown protocol was optimized for maximum mRNA recovery, based on percent of reporter mRNA purified under each condition. B) A panel of buffer and incubation conditions was tested for the hybridization step of the mRNP pulldown protocol. Conditions as labelled. C) A series of DNA oligonucleotides complementary to different regions of the common sequence of the GFP reporters (schematic of NMD-sensitive reporter shown below) were tested for the efficiency with which they purified the reporter mRNA. Star indicates combination selected for use in purification.
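The "Percent of Input" values reported beneath the blots can be illustrated with a short calculation. The sketch below assumes, as the "1/10 Input" lane labels suggest, that the input lane carries one-tenth of the material represented in the eluate lane; the function name and the band intensities are hypothetical and are not measurements from this work.

```python
def percent_of_input(eluate_signal, input_signal, input_fraction=0.1):
    """Return recovery as a percentage of the total input.

    `input_signal` is the band intensity of a lane loaded with only
    `input_fraction` of the input (assumed to be 1/10 here), so it is
    scaled up to the full input before the comparison is made.
    """
    if input_signal <= 0 or not 0 < input_fraction <= 1:
        raise ValueError("input_signal must be > 0 and input_fraction in (0, 1]")
    full_input = input_signal / input_fraction
    return 100.0 * eluate_signal / full_input


# Hypothetical band intensities (arbitrary units), not measured data.
recovery = percent_of_input(eluate_signal=2350.0, input_signal=500.0)
print(f"{recovery:.1f}% of input recovered")
```

Comparing two conditions then reduces to taking the ratio of their percent-of-input values.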

Complementary DNA oligonucleotides were extensively tested to identify those which afforded the greatest recovery of the reporter mRNA. These oligonucleotides were required to satisfy several criteria. First, they must be complementary to the GFP ORF sequence rather than flanking sequences, because the reporter utilizes 5’ and 3’ UTR sequences from endogenous yeast genes which would also be purified if hybridization were targeted to these regions (Figure 2-3C, bottom); in contrast, the GFP ORF bears no significant homology to any yeast sequences. Second, they must be complementary to only the region of the GFP ORF common to both NMD-sensitive and NMD-insensitive reporters, such that both are being purified in an identical fashion. Third, to prevent hybridization between the mRNA and DNA oligonucleotide from being physically occluded, we avoided the last ~18 nucleotides of the common GFP ORF sequence which would overlap the footprint of a terminating ribosome (Ingolia et al., 2009). Fourth, the oligonucleotides must not contain any significant intramolecular secondary structure which could block hybridization to the reporter mRNA. Based on these criteria, DNA oligonucleotides complementary to different regions of the common GFP ORF were designed and tested both in isolation and in combination. Individually, the DNA oligonucleotides showed a wide range of effectiveness, recovering from <1% to >35% of the GFP reporter (Figure 2-3C). We reasoned that simultaneously using a combination of DNA oligonucleotides might increase the capture of individual mRNA molecules obscured at one hybridization site but not another due to crosslinking of proteins. Indeed, a combination of three DNA oligonucleotides was determined to be the most effective combination for purifying the GFP reporter mRNA, increasing the percent of RNA recovered to nearly 50% (Figure 2-3C, starred lane).
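Three of the design criteria above (restriction to the common ORF region, avoidance of the last ~18 nucleotides, and absence of obvious intramolecular structure) can be expressed as a simple screening sketch. Everything here is an illustrative assumption rather than the procedure used in this work: the oligonucleotide length, the step size, the crude hairpin test, and the placeholder sequence are all hypothetical, and the check for homology to yeast sequences is not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return dna.translate(COMPLEMENT)[::-1]


def has_hairpin(oligo, stem=4, min_loop=3):
    """Crude structure check: True if any `stem`-nt stretch could pair with a
    downstream stretch of the same oligo, leaving at least `min_loop` nt between."""
    for i in range(len(oligo) - 2 * stem - min_loop + 1):
        partner = reverse_complement(oligo[i:i + stem])
        if partner in oligo[i + stem + min_loop:]:
            return True
    return False


def candidate_oligos(common_orf, length=20, ribosome_footprint=18, step=5):
    """Yield (start, oligo) pairs for antisense DNA oligos that target the
    common ORF region, skip its last `ribosome_footprint` nt, and pass the
    hairpin check. `common_orf` is the sense-strand sequence written as DNA."""
    usable_end = len(common_orf) - ribosome_footprint
    for start in range(0, usable_end - length + 1, step):
        oligo = reverse_complement(common_orf[start:start + length])
        if not has_hairpin(oligo):
            yield start, oligo


# Arbitrary placeholder sequence, NOT the reporter sequence used in this work.
placeholder_orf = (
    "ATGAGTCCGTTAGCAACGGTCATTGACCGATACGGTTCAAGCTTGGACTCAGTACCGTAAGCTTGACCATGG"
)
candidates = list(candidate_oligos(placeholder_orf))
for start, oligo in candidates:
    print(start, oligo)
```

A screen of this kind only narrows the candidates; as described above, the oligonucleotides were ultimately chosen by testing recovery experimentally, alone and in combination.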

After thorough optimization of the mRNP pulldown protocol, the specificity of purification was determined by two assays. First, northern blot analysis demonstrated efficient purification of the GFP reporter mRNA, with 1/3 to 1/2 of the RNA present in the lysate consistently recovered (Figure 2-4A, top panel). In contrast, a nonspecific but highly abundant cellular mRNA, PGK1, was undetectable in the purified sample (Figure 2-4A, middle panel). Additionally, there was no detectable background of the highly abundant 18S ribosomal RNA (rRNA), a common purification contaminant, in a control purification from cells lacking a GFP reporter (Figure 2-4A, bottom panel). In contrast, co-purification of rRNA did occur upon purification of either reporter mRNA (Figure 2-4A, bottom panel), which was anticipated due to the fact that the GFP reporters are translated (data not shown). Second, qRT-PCR analysis of the purified RNA demonstrated >5000-fold enrichment for the GFP reporter versus the non-specific PGK1 mRNA, providing quantitative evidence that this purification protocol is highly specific (Figure 2-4B). Furthermore, quantification of the absolute amount of GFP reporter RNA purified, based on comparison to a standard curve by qRT-PCR, indicated that approximately 10-50 pmol of GFP RNA could be purified from yeast grown in 200 mL of liquid culture (data not shown). Based on these data, the mRNP pulldown can specifically and efficiently purify an mRNA of interest with minimal undesired nonspecific background.

[Figure 2-4 image not reproduced. Panel A: northern blot of GFP, PGK1, and 18S rRNA in 1/10 input, 1/10 supernatant, and eluate lanes for no-reporter, NMD-sensitive, and NMD-insensitive samples; percent of input values as printed: 32, 16, 40, 31. Panel B: bar chart of fold enrichment (GFP / PGK1; logarithmic axis from 1 to 100000) for the NMD-sensitive and NMD-insensitive reporters.]

Figure 2-4. mRNP pulldown is highly specific for target RNA. A) Northern blot of GFP reporter RNA purified in mRNP pulldown (top), a negative control mRNA PGK1 (middle), and ribosomal RNA (bottom). Values indicate percent of GFP reporter RNA present in supernatant or purified eluate relative to respective input. B) Quantitation by qRT-PCR of GFP reporter and PGK1 negative control mRNAs purified by mRNP pulldown. Plotted is the relative enrichment of GFP reporter mRNA (GFP eluate/GFP input) compared to PGK1 (PGK1 eluate/PGK1 input). Data represent the average of 3-4 independent experiments +/- SEM.
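The enrichment plotted in Figure 2-4B is a ratio of recoveries: (GFP eluate/GFP input) divided by (PGK1 eluate/PGK1 input). A minimal sketch of that arithmetic follows. Converting Ct values to relative quantities as 2^(-Ct) assumes perfect doubling per cycle, which is an illustrative simplification and not necessarily the quantification used here; the function names and Ct values are hypothetical.

```python
def relative_quantity(ct, efficiency=2.0):
    """Convert a qRT-PCR Ct value to a relative quantity.

    Assumes `efficiency`-fold amplification per cycle (2.0 = perfect doubling).
    """
    return efficiency ** (-ct)


def fold_enrichment(target_eluate, target_input, control_eluate, control_input):
    """(target eluate / target input) divided by (control eluate / control input)."""
    target_recovery = target_eluate / target_input
    control_recovery = control_eluate / control_input
    return target_recovery / control_recovery


# Hypothetical Ct values, chosen only to illustrate the calculation.
gfp_input, gfp_eluate = relative_quantity(20), relative_quantity(21)
pgk1_input, pgk1_eluate = relative_quantity(18), relative_quantity(31)

enrichment = fold_enrichment(gfp_eluate, gfp_input, pgk1_eluate, pgk1_input)
print(f"GFP enrichment over PGK1: {enrichment:.0f}-fold")  # 4096-fold for these inputs
```

Because the same eluate and input samples are used for both transcripts, differences in the fraction of each sample assayed cancel out of the ratio.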

Co-purification of mRNA-binding proteins

Next, co-purification of protein components of the mRNP was analyzed. mRNP pulldown was performed for cells expressing either NMD-sensitive or NMD-insensitive reporter mRNAs, and control cells lacking a GFP reporter. Proteins co-purified with the reporter mRNAs were analyzed by western blotting (Figure 2-5). As expected, the mRNA-associated protein PAB1, which binds to mRNA 3’ poly(A) tails, was detected in the purified eluate for both reporter mRNAs, but was not detected in the purification from cells that did not express a reporter mRNA (Figure 2-5A, third panel). When normalized to the quantity of mRNA purified for this experiment (as determined by northern blotting), the level of co-purified PAB1 was similar between the two reporters (Figure 2-5B, white). This demonstrated that predicted mRNP proteins were specifically co-purified with the target mRNA via the mRNP pulldown protocol. As a control for nonspecific background, the metabolic enzyme 3-phosphoglycerate kinase (PGK1), which is not expected to associate with mRNA, was undetectable in all purified eluates (Figure 2-5A, bottom panel). This provided additional evidence that the purification procedure resulted in limited purification of nonspecific proteins, indicating that the contribution of in vitro formed mRNP interactions and the association of proteins with the magnetic beads or DNA oligonucleotides under these conditions was minimal.

Interestingly, NMD factor UPF1 showed a strong enrichment in the mRNP of the NMD-sensitive mRNA but demonstrated some level of association with both reporter mRNAs (Figure 2-5A, top two panels), consistent with previous observations that UPF1 binds normal mRNAs but shows a preferential association with NMD substrates (Hogg and Goff, 2010; Johns et al., 2007). When normalized to purified mRNA levels, a >100-fold enrichment of UPF1 with the NMD-sensitive mRNP was observed (Figure 2-5B, gray). Importantly, the enrichment of UPF1 with the NMD-sensitive mRNA provided evidence that proteins differentially associated with two mRNAs can be distinguished by this method.

[Figure 2-5 image not reproduced. Panel A: western blot of urea lysate, eluate, and input lanes for no-reporter, NMD-sensitive, and NMD-insensitive samples, probed for HA (UPF1-HA, ~113 kD; standard and long exposures), PAB1 (67 kD), and PGK1 (45 kD). Panel B: bar chart of relative level in eluate (normalized to RNA level; logarithmic axis from 0.1 to 100) for PAB1 and UPF1 with the NMD-sensitive and NMD-insensitive reporters.]

Figure 2-5. Co-purification of proteins with mRNP pulldown protocol. A) Western blot of proteins co-purified with reporter mRNA in mRNP pulldown. PAB1 and UPF1 are expected mRNP components, while PGK1 serves as a negative control. B) Quantitation of (A). Relative protein levels are normalized to amount of reporter mRNA purified, as determined by northern blotting (data not shown).

Preliminary identification of mRNP components by mass spectrometry

For the comprehensive identification of proteins associated with the NMD-sensitive GFP mRNA, proteins recovered by mRNP pulldown were analyzed by mass spectrometry (MS) by collaborators in the lab of Dr. Amber Mosley in the Department of Biochemistry and Molecular Biology at Indiana University. In order to obtain a sufficient quantity of material for analysis by MS, the scale of the purification was increased to 12 liters of liquid yeast culture. Notably, MS analysis was only carried out on one-fourth of the purified material, indicating that purification of an mRNP from ~3 liters of liquid yeast culture should generate adequate material for MS analysis. Briefly, after purification of the mRNP, digestion with trypsin was used to liberate proteins from the bead-associated mRNP complex and generate peptides for MS. Tryptic peptides were separated by multi-dimensional chromatography and analyzed on an LTQ Velos Pro Ion Trap MS, producing a list of >900 detected yeast proteins. Of these, 236 yeast proteins met the detection cutoffs of ≥2 unique peptides and ≥5 total peptides detected and were retained for analysis.
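The detection cutoffs stated above (≥2 unique peptides and ≥5 total peptides) amount to a simple filter over the list of detected proteins. The sketch below is illustrative only: the record type and function name are hypothetical, the first two entries use peptide counts listed in Table 2-1, and the last two entries are invented to show proteins that fail each cutoff.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    protein: str
    unique_peptides: int
    total_peptides: int


def passes_cutoffs(detection, min_unique=2, min_total=5):
    """True if the protein meets both detection cutoffs."""
    return (detection.unique_peptides >= min_unique
            and detection.total_peptides >= min_total)


detections = [
    Detection("MBF1", 5, 40),    # counts as listed in Table 2-1
    Detection("RBG1", 3, 5),     # counts as listed in Table 2-1
    Detection("HYPO_A", 1, 12),  # hypothetical: too few unique peptides
    Detection("HYPO_B", 3, 4),   # hypothetical: too few total peptides
]

retained = [d.protein for d in detections if passes_cutoffs(d)]
print(retained)  # ['MBF1', 'RBG1']
```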

Included in the list of identified proteins were highly abundant cellular factors, such as metabolic and structural proteins. Some of these may represent nonspecific background purified by nature of high cellular abundance or affinity for the magnetic beads or DNA oligonucleotides. This list of 236 proteins was compared against several databases of yeast proteins identified as common contaminants in MS analyses. Specifically, the Contaminant Repository for Affinity Purification (CRAPome; Mellacheruvu et al., 2013) identified yeast protein contaminants in several purification systems, while Dr. Amber Mosley provided several additional lists of yeast proteins commonly identified as contaminants in MS purifications. The number of datasets in which a protein appeared, with more hits indicating a higher likelihood of being a contaminant, was taken into account when identifying candidate proteins of interest. Importantly, these datasets of common contaminants were each generated using distinct purification protocols, and thus contaminants unique to the mRNP purification system cannot be unequivocally identified by comparison to these datasets. Rather, future MS analysis of a mock mRNP purification from cells that lack a GFP reporter mRNA will be used as an invaluable resource to identify precisely which proteins are contaminants of this purification system (see Chapter 2.4).
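The contaminant comparison described above, in which a protein is scored by the number of contaminant datasets that contain it, can be sketched as a simple tally. The list names, protein names, and memberships below are hypothetical placeholders, not contents of the CRAPome or of the additional lists used here.

```python
def contaminant_hits(candidates, contaminant_lists):
    """For each candidate protein, count how many contaminant lists contain it."""
    return {
        protein: sum(protein in members for members in contaminant_lists.values())
        for protein in candidates
    }


# Hypothetical list names and memberships; placeholders, not real annotations.
contaminant_lists = {
    "list_1": {"PROT_B", "PROT_C"},
    "list_2": {"PROT_C"},
    "list_3": {"PROT_B", "PROT_C", "PROT_D"},
}

hits = contaminant_hits(["PROT_A", "PROT_B", "PROT_C"], contaminant_lists)
for protein, count in sorted(hits.items(), key=lambda item: item[1]):
    print(protein, count)
```

A higher count marks a protein as a more likely contaminant; as noted above, such a tally is only suggestive because the reference datasets were generated with different purification protocols.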

Of the 236 identified proteins, 84 have annotated functions related to RNA. The enrichment of proteins with characterized, predicted, or anticipated functions related to RNA activity supports the specific purification of mRNP complexes from a yeast cell lysate. Proteins with high detection included NMD factor UPF1 and poly(A) tail-binding protein PAB1, both of which associated strongly with this mRNA based on detection by western blot (Figure 2-5A). This indicated that analysis of the mRNP by MS recapitulated the results obtained by the targeted detection of proteins.

Proteins identified by MS analysis were assessed for a potential role in NMD. Proteins involved in RNA quality control would be predicted to have RNA-related activity, while it would be more surprising for a metabolic enzyme, for example, to play a role in RNA surveillance. Therefore, the subset of 84 proteins with RNA-related functions was selected to be investigated initially. This subset includes proteins with well-defined roles in processes including translation (e.g. ribosomal proteins, translation factors), RNA processing (e.g. tRNA synthetases, splicing factors, RNA polymerases), and RNA decay (e.g. exoribonuclease XRN1, NMD factor UPF1). Because these proteins have been previously well studied in general, we reasoned that they would be less likely to play a novel role in NMD, although a secondary function in defining the context of translation termination cannot be excluded based on this alone. Rather, candidate proteins with more poorly defined roles in translation or mRNA metabolism (ribosome-associated proteins, RNA-binding proteins) or specific activity related to other aspects of NMD (nonsense suppression, protein phosphorylation, proteasome interactions), were considered intriguing candidates for an uncharacterized role in NMD. This assessment narrowed the list of candidates to be tested for a role in promoting NMD in the cell to 35 proteins (Table 2-1).

As one method to further hone this list of candidate proteins, the association of each protein with the mRNP relative to its overall cellular abundance was inspected. Importantly, proteins specifically associated with the purified mRNP would be predicted to show an enrichment following mRNP purification. To address this, an abundance/detection (A/D) ratio was calculated. This ratio measures the estimated cellular abundance of a protein, as determined by a global analysis of chromosomally TAP-tagged protein levels in log-phase yeast cells (Ghaemmaghami et al., 2003), relative to the number of peptides detected per protein after normalization for protein length (molecules per cell/# peptides per protein length). Thus, a smaller A/D ratio indicated an enrichment of the protein in the MS analysis relative to its cellular abundance. For UPF1, a protein specifically enriched on the NMD-sensitive mRNP, the A/D ratio was 2112, indicating that scores in this range or lower were suggestive of high enrichment. In contrast, the metabolic enzyme PGK1 had a much higher A/D ratio of 10450. Based on these examples, a ratio of ~5000 or less was considered to indicate specific enrichment of a protein with the mRNP. Interestingly, 15 of 25 candidate proteins with a calculated A/D ratio met this criterion, suggesting these proteins not only bound to but were specifically enriched in the purified mRNP, and making these candidates the highest priority for further investigation.
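The A/D ratio defined above (molecules per cell divided by peptides detected per unit of protein length) can be written out as a short sketch. The function names are hypothetical, the unit of protein length is an assumption (the text does not state it, and the scale of the ratio depends on it), and treating the approximate cutoff of ~5000 as an exact threshold is a simplification made only for illustration.

```python
def ad_ratio(molecules_per_cell, peptides_detected, protein_length):
    """Abundance/detection ratio: molecules per cell divided by
    (peptides detected per unit of protein length)."""
    if peptides_detected <= 0 or protein_length <= 0:
        raise ValueError("peptides_detected and protein_length must be positive")
    # abundance / (peptides / length), rearranged to a single division
    return molecules_per_cell * protein_length / peptides_detected


def is_enriched(ratio, cutoff=5000):
    """A smaller ratio means more detection relative to cellular abundance."""
    return ratio <= cutoff


# A/D ratios reported in the text for the two reference proteins.
reported = {"UPF1": 2112, "PGK1": 10450}
for name, ratio in reported.items():
    print(name, ratio, "enriched" if is_enriched(ratio) else "not enriched")

# Hypothetical protein, to show the formula itself (values are invented).
example = ad_ratio(molecules_per_cell=2000, peptides_detected=10, protein_length=500)
print(example)
```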

The list of candidate proteins can be further refined by future analyses of control mRNP complexes. Specifically, purification of an NMD-insensitive mRNP will enable identification of proteins that show a preferential association with either NMD-sensitive or NMD-insensitive mRNA. Additionally, a mock purification from cells lacking a GFP reporter will identify contaminant proteins, which can be excluded from future analysis.

Gene Name | Viability | NMD-Relevant Function | Priority | # Unique Peptides | # Total Peptides | Abundance/Detection Ratio
MBF1 | Viable | Suppressor of frameshift mutations | +++ | 5 | 40 | 1786
OLA1 | Viable | Increased readthrough of premature stop codons | +++ | 8 | 31 | -
DBP5 | Inviable | DEAD-box RNA helicase, mRNP export/remodeling | +++ | 2 | 13 | 5524
DBP2 | Viable | DEAD-box RNA helicase, uncharacterized role in NMD | +++ | 2 | 18 | 10040
HRP1 | Inviable | Binds NMD cis-acting DSE | +++ | 2 | 6 | -
ECM32 | Viable | Interacts with translation termination factors | +++ | 3 | 9 | 917
BFR1 | Viable | Associated with mRNP complexes and polyribosomes | ++ | 10 | 16 | 9811
RBG2 | Viable | GTPase with a role in translation | ++ | 3 | 11 | -
RBG1 | Viable | GTPase that associates with translating ribosomes | ++ | 3 | 5 | -
EAP1 | Viable | Translation inhibitor, decapping activator | ++ | 4 | 7 | 2925
NEW1 | Viable | Cosediments with polyribosomes | ++ | 5 | 9 | 49966
SUB2 | Inviable | DEAD-box RNA helicase, mRNA export | ++ | 2 | 17 | 13564
SRO9 | Viable | Associates with translating ribosomes | ++ | 11 | 32 | 1143
PBP1 | Viable | Interacts with PAB1 | ++ | 6 | 10 | 2072
SNF1 | Viable | Protein kinase | ++ | 3 | 5 | 746
RPG1 | Inviable | Translation initiation factor eIF3a | ++ | 12 | 30 | 16934
HYP2 | Viable | Translation eIF5A | ++ | 2 | 19 | -
ASC1 | Viable | Inhibits translation, component of 40S ribosomes | + | 4 | 18 | 59015
SCP160 | Viable | Interacts with translating mRNAs | + | 17 | 43 | -
ARC1 | Viable | Involved in tRNA delivery | + | 9 | 29 | 7481
NGR1 | Viable | RNA binding protein, regulates mRNA-specific decay | + | 2 | 8 | 1294
RLI1 | Inviable | Translation initiation, termination, ribosome recycling | + | 3 | 7 | 5455
TMA19 | Viable | Associates with ribosomes | + | 2 | 9 | -
STM1 | Viable | Required for optimal translation under stress | + | 10 | 29 | 4406
YLR419W | Viable | Putative helicase, uncharacterized | + | 4 | 15 | -
REI1 | Viable | Cytoplasmic pre-60S factor | + | 2 | 8 | -
PIN4 | Viable | Contains RNA-recognition motif | + | 3 | 6 | 5155
SPT5 | Inviable | Roles in RNA processing and quality control | + | 3 | 6 | 4146
NSR1 | Viable | rRNA processing and ribosome biogenesis | - | 11 | 128 | 2503
ARB1 | Inviable | 60S ribosome biogenesis and activity | - | 5 | 6 | -
PRP43 | Inviable | RNA helicase involved in Pol II transcript metabolism | - | 5 | 7 | 18518
STE20 | Viable | Protein kinase | - | 4 | 5 | 486
RPT1 | Inviable | Proteasome ATPase | - | 3 | 8 | 61
TSA1 | Viable | Associates with ribosomes | - | 2 | 24 | 30870
VMA2 | Viable | Vacuolar ATPase | - | 4 | 7 | 96753

Table 2-1. List of NMD-sensitive mRNP-associated proteins with candidate roles in NMD. Gene name, viability of strains lacking gene of interest, brief description of characterized function as relevant to a potential role in NMD, priority, MS peptide detection levels, and A/D ratios listed. Functions based on gene descriptions in the Saccharomyces Genome Database (yeastgenome.org). Priority rankings, based on predicted role and peptide detection by MS: (+++) high interest; (++) moderate interest; (+) low interest; (-) tangential interest.

2.3 Discussion

The mRNP pulldown protocol represents a novel method to specifically and efficiently purify a single mRNP species as it exists in vivo. The use of formaldehyde crosslinking captures physiological RNA-protein interactions, which, when coupled with hybridization, allows purification under stringent conditions including ionic detergents and high salt concentration. This procedure improves over previous methodologies designed to interrogate RNA-protein interactions in several ways. First, it neither relies upon nor is conducive to interactions formed in vitro, which may not be representative of mRNP complex assembly in vivo. Second, directly targeting the RNA by hybridization results in high purification efficiency, which may be more difficult to achieve when purification is mediated by RNA-protein interactions. Third, this represents an approach that is unbiased by prior knowledge of protein function, as components of the mRNP can be identified by mass spectrometry following purification. This protocol has been demonstrated to effectively purify an mRNP complex, and will be used to identify differences between two mRNPs that display different sensitivity to the NMD pathway. Preliminary MS analysis for an NMD-sensitive mRNP complex identified an enrichment of proteins with known or anticipated function in mRNA activity, indicating that MS analysis of the purified mRNP can be used to produce a valid and informative dataset. A number of candidate proteins with possible roles in NMD were identified in this manner, and further interpretation of the data will be facilitated by future analyses of an NMD-insensitive control mRNP and a mock sample generated from cells that do not express a GFP reporter mRNA.

This procedure can be easily adapted for the purification of other mRNP complexes to understand various aspects of RNA metabolism. With minimal optimization, a set of DNA oligonucleotides complementary to an RNA of interest can be tested and used in place of the oligonucleotides presented here. Alternatively, the small region of the GFP ORF to which complementary DNA oligonucleotides were targeted can be cloned into an RNA of interest, serving as a transferable targeting platform. The latter approach has been used to investigate the association of proteins with mRNAs displaying different sensitivities to decapping activator DHH1 (data not shown).

The mRNP pulldown protocol can interrogate the association of proteins with an mRNP complex in two distinct ways, depending on the nature of the experimental question and extent of prior knowledge regarding the role of a protein in the life cycle of an mRNA. First, this approach can be used to ask directed questions regarding the association of a specific RNA-binding protein with individual mRNAs using western analysis. This is convincingly demonstrated by the differential association of UPF1 with NMD-sensitive and NMD-insensitive mRNAs following mRNP purification, which recapitulates the documented preference of UPF1 for NMD substrate mRNAs (Johns et al., 2007). Furthermore, another interesting application of this protocol could be analysis of the association of known mRNP components with an mRNA in different gene deletion backgrounds. This could identify whether one component of an RNA regulatory pathway is required for other factors involved in the same pathway to associate with an mRNA. This type of analysis could provide information about the order of protein assembly in an mRNP complex, or the interdependency of protein association with a specific mRNA. Moving forward, purification of mRNP complexes yields sufficient quantity of material for protein detection by MS. This facilitates the detection of unknown proteins associated with any single mRNA population. The mRNP pulldown procedure therefore represents a powerful new tool to uncover many details of mRNP complex formation and regulation.

2.4 Future directions

Following the development of the mRNP pulldown protocol, two long-term goals remain: 1) to identify factors important for NMD substrate recognition based on differences between NMD-sensitive and NMD-insensitive mRNPs, and

2) to obtain a complete picture of proteins that associate with a typical mRNP.

Following substantial protocol optimization, it was demonstrated that the purified mRNP can be effectively analyzed by MS, and data regarding an NMD-sensitive mRNP are now available for further study. Future analysis of control datasets will be carried out in order to extract the maximum amount of useful information from this dataset. Specifically, analysis of the mRNP associated with the NMD-insensitive mRNA will identify proteins that consistently associate with mRNAs and those that preferentially associate with NMD-sensitive or -insensitive mRNA populations. Additionally, analysis of a mock mRNP preparation from cells lacking a GFP reporter mRNA will identify proteins purified non-specifically, which can be filtered from experimental datasets. The analysis of control datasets will allow both experimental goals to be addressed in more depth.
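The comparison of control datasets outlined above can be sketched as a set operation. The following is a minimal illustration only, assuming each MS dataset has been reduced to a set of protein identifiers; the function name `partition_proteins` and the placeholder protein names are hypothetical and do not correspond to results from this study.

```python
def partition_proteins(sensitive, insensitive, mock):
    """Partition MS identifications using the two control datasets.

    sensitive, insensitive: proteins detected with the NMD-sensitive and
    NMD-insensitive reporter mRNPs; mock: proteins detected in a
    preparation from cells lacking a GFP reporter mRNA.
    """
    sensitive, insensitive, mock = set(sensitive), set(insensitive), set(mock)
    # Anything recovered without a reporter mRNA is treated as non-specific.
    specific_sensitive = sensitive - mock
    specific_insensitive = insensitive - mock
    return {
        "background": (sensitive | insensitive) & mock,
        "shared": specific_sensitive & specific_insensitive,
        "sensitive_only": specific_sensitive - specific_insensitive,
        "insensitive_only": specific_insensitive - specific_sensitive,
    }


# Placeholder identifiers for illustration; not experimental data.
result = partition_proteins(
    sensitive={"protein_A", "protein_B", "protein_C", "protein_D"},
    insensitive={"protein_B", "protein_C", "protein_D", "protein_E"},
    mock={"protein_D"},
)
```

A presence/absence comparison like this one ignores abundance. A quantitative comparison, for example of spectral counts, would be needed to detect proteins that are enriched on one mRNP but present on both.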

Protein factors involved in the recognition of NMD substrates are anticipated to show differential association with NMD-sensitive and -insensitive mRNAs. NMD-sensitive mRNAs are predicted to preferentially associate with specific proteins that help recruit the NMD machinery or mark the NMD substrate, because the increased length of untranslated RNA that results from premature translation termination may serve as a binding platform for RBPs. It is also formally possible that NMD substrates instead lack some feature of normal RNAs. Comparison of proteins associated with the NMD-sensitive mRNP to those associated with the control NMD-insensitive mRNP will facilitate the identification of those that show a differential association between the two classes of mRNAs. This will provide stronger evidence for a potential role for candidate proteins in NMD substrate recognition.

An intriguing possibility is that the NMD factors themselves sense premature translation termination, in contrast to relying on other trans-acting factors to recruit the NMD machinery to NMD substrates. In support of this possibility, UPF1 associates preferentially with NMD-substrate mRNAs, or mRNAs with longer regions of untranslated RNA downstream of the site of translation termination, in a sequence-independent manner (Hogg and Goff, 2010; Hwang et al., 2010; Johns et al., 2007). In this way, UPF1 itself could act as a sensor of 3’ UTR length and directly recognize NMD substrates. This model is supported by the enriched association of UPF1 with the NMD-sensitive GFP reporter mRNA observed by western blotting, and by its strong detection in this mRNP by MS analysis. Further investigation is needed to determine if UPF1 alone is truly able to detect the length of an mRNA 3’ UTR independent of any other protein factors.

Concomitantly, other proteins identified by MS to associate with the NMD-sensitive mRNA will be investigated for a general role in NMD. Proteins encoded by nonessential genes will be screened by generating genomic deletions of each gene individually. The CYH2 pre-mRNA, an endogenous NMD substrate due to inefficient splicing leading to intron retention (He et al., 1993), will then be analyzed for an increase in abundance indicative of decreased NMD activity. A representative example of this screening procedure was demonstrated with the hnRNP K-like protein HEK2, which is implicated in nuclear mRNA maturation, export, and persistent association with cytoplasmic mRNPs (Denisenko and Bomsztyk, 2002). The PBP2 protein is a paralog of HEK2 that associates with PAB1 protein (Mangus et al., 1998), and was simultaneously tested to exclude a redundant role between the two proteins (Figure 2-6A). In this example, neither deletion, alone or in combination, affected the abundance of CYH2 pre-mRNA, arguing against a role in NMD substrate recognition. Screening in this manner is being carried out for candidate proteins from the presented mRNP pulldown data, selected based on high representation in the NMD-sensitive mRNP and putative functions in mRNA processing or translation (Table 2-1). To this point, no novel proteins with a general role in NMD have been identified. Because candidate proteins were identified based on association with GFP reporter mRNAs, the abundance of the GFP reporters can be monitored following deletion of each protein as a secondary screen, or as a means to identify an RNA-specific role in NMD targeting.

[Figure 2-6 image. Panel A: northern blot of CYH2 pre-mRNA and mRNA in WT, upf1Δ, hek2Δ, pbp2Δ, and hek2Δ pbp2Δ strains, with % pre-mRNA/mRNA quantification. Panel B: western blot (anti-HA, DBP5-HA ~59 kD; anti-PAB1, 64 kD) and northern blot (CYH2 pre-mRNA and mRNA; SCR1) of untagged, WT, upf1Δ, and GAL-HA:DBP5 cells at 0, 1, 2, 4, and 6 hours following shift from GAL to GLU media, with % DBP5-HA and % pre-mRNA/mRNA quantification.]

Figure 2-6. Preliminary screening of candidate proteins identified by mRNP pulldown. A) Northern analysis of CYH2 mRNA (bottom) and pre-mRNA (top) in hek2Δ, paralog pbp2Δ, or double deletion strains. B) Screening method for essential genes. Protein analysis by western blot (top) and RNA analysis by northern blot (bottom) for galactose-inducible DBP5, following a shift from media containing galactose to glucose. CYH2 pre-mRNA (northern, top band) was assayed as an endogenous NMD substrate for changes in NMD activity upon repression of DBP5-HA expression.

To screen proteins that are essential in a wild-type genetic background, an alternate approach has been developed. Endogenous essential genes can be engineered to be expressed under control of the galactose-inducible GAL promoter (Longtine et al., 1998), such that the activity of NMD can be monitored when expression of the protein of interest is conditionally repressed. One interesting essential candidate protein, DBP5 (see Table 2-1), has been tested in this manner (Figure 2-6B); DBP5 protein levels were reduced to <2% by 4 hours after shift to growth in glucose, which represses transcription from the GAL promoter. At this time-point, no difference in CYH2 pre-mRNA levels was apparent when compared to wild-type, suggesting DBP5 does not have a general role in NMD. Importantly, however, these data demonstrated the utility of testing other essential genes with this repressible gene expression system.

The mechanism of mRNA decay in yeast has been thoroughly characterized, but little is known about how trans-acting protein factors impact the translation or decay of specific mRNAs. Because the mRNP pulldown protocol reveals many proteins that associate with a single mRNA species in vivo, proteins involved in other aspects of mRNA activity and metabolism - not only NMD - will be identified. In other words, a single dataset has the potential to provide information about many aspects of mRNA metabolism. Proteins associated with the NMD-sensitive mRNP can be screened for roles in the specific metabolism of the GFP reporter mRNAs or general effects on mRNA. These proteins can be investigated for global or gene-specific effects on the translation and decay of mRNAs in future studies.

CHAPTER 3: TRANSLATION OF UNANNOTATED RNAs IN YEAST¹

3.1 Introduction

The emerging field of long non-coding RNAs

Recent transcriptome analyses have uncovered thousands of previously unidentified long non-coding RNAs (lncRNAs; >200 nucleotides), in large part thanks to advances in technology such as the development of high-throughput RNA sequencing (RNA-seq). These novel RNAs are bioinformatically predicted to lack protein-coding capacity (Derrien et al., 2012). Many lncRNAs are therefore thought to serve as functional RNA molecules, and a number have been ascribed non-coding functions in the cell (reviewed in Geisler and Coller, 2013). For example, some lncRNAs are believed to recruit chromatin remodeling complexes (e.g. Polycomb Repressive Complex 2; PRC2) to specific sites in the genome, providing targeting specificity and/or serving as scaffolds for multimeric complex assembly (Khalil et al., 2009; Zhao et al., 2010); this includes the notable examples of the Xist RNA responsible for X-chromosome inactivation in females (Zhao et al., 2008), and HOTAIR, which is involved in trans-mediated Hox gene regulation during mammalian development (Rinn et al., 2007). Many other lncRNAs, and particularly those found in budding yeast, have been proposed to affect the regulation of nearby genes in cis, commonly through transcriptional interference (Bumgarner et al., 2009; Martens et al., 2004) or transcription-induced chromatin remodeling (Camblong et al., 2007; Kim et al., 2012). lncRNAs thus represent an exciting and expanding field of study.

¹ Much of the data and description presented in this chapter is reproduced with permission from our recent publication: Smith, J.E., Alvarez-Dominguez, J.R., Kline, N., Huynh, N.J., Geisler, S., Hu, W., Coller, J., and Baker, K.E. (2014). Translation of small open reading frames within unannotated RNA transcripts in Saccharomyces cerevisiae. Cell Reports 7, 1858-1866. Publication is also available at http://www.sciencedirect.com/science/article/pii/S2211124714003982

In yeast, a number of classes of putative lncRNAs have been detected in global transcriptome analyses, and categorized primarily based on sensitivity to cellular RNA decay enzymes. These include Cryptic Unstable Transcripts (CUTs) sensitive to nuclear exosome component RRP6 (Xu et al., 2009), cytoplasmic exonuclease XRN1-sensitive Unstable Transcripts (XUTs; van Dijk et al., 2011), DCP2-sensitive lncRNAs (Geisler et al., 2012), and Stable Unannotated Transcripts (SUTs; Xu et al., 2009). In total, these classes include several thousand putatively non-coding RNAs. An important consideration is that these studies predominantly focused on identification of RNAs stabilized in the absence of specific cellular RNA decay machinery, and the RNAs expressed in wild-type yeast have not yet been comprehensively annotated. Therefore, there may be additional RNAs expressed from the yeast genome that have not yet been specifically described.

By definition, lncRNAs are predicted to lack protein-coding capacity. However, many lncRNAs are structurally similar to mRNAs, containing features that promote efficient translation such as a 5’ 7-methylguanosine cap and 3’ poly(A) tail (Guttman et al., 2009), which makes them theoretically competent to be translated. Recent observations based on ribosome profiling (Aspden et al., 2014; Brar et al., 2012; Chew et al., 2013; Ingolia et al., 2014; Ingolia et al., 2011; Juntawong et al., 2014) or co-fractionation with polyribosomal complexes (Aspden et al., 2014; van Heesch et al., 2014) suggest that many lncRNAs associate with ribosomes, which has complicated the assumption that these RNAs serve solely non-coding functions. Other studies have questioned whether this apparent ribosome binding is biologically relevant or artifactual (Guttman et al., 2013). However, small peptides with important biological roles in signaling and development have been identified in flies (Galindo et al., 2007; Magny et al., 2013) and zebrafish (Pauli et al., 2014) that are encoded by RNAs originally classified as lncRNAs, suggesting translation of putatively non-coding RNAs does occur and might represent a widespread phenomenon with important biological implications. Therefore, it would not be surprising to discover that many novel RNAs have been misclassified as non-coding. It will be important to unravel the nature of these putatively non-coding RNAs in order to truly elucidate their biological role in the cell.

NMD substrate recognition requires translation

NMD is a translation-dependent process, a characteristic supported by substantial experimental evidence. Early observations demonstrated that mRNAs containing frameshift or nonsense mutations can be stabilized by suppressor tRNAs (Belgrader et al., 1993; Culbertson et al., 1980; Gozalbo and Hohmann, 1990; Leeds et al., 1991), which impact translation elongation at the step of decoding. Additionally, inhibition of translation by antibiotics (Carter et al., 1995), secondary structure in the 5’ UTR (Belgrader et al., 1993), or regulation by the iron-responsive element (IRE; Thermann et al., 1998) causes normally NMD-sensitive mRNAs to escape regulation by NMD and be stabilized. Finally, translation by the ribosome is the only known mechanism to decode a reading frame and locate in-frame termination codons. These independent lines of evidence support the well-accepted aspect of NMD that an RNA cannot become sensitive to NMD unless it is actively translated.

The translation-dependent nature of NMD makes it useful for determining the translational status of an RNA. Essentially, sensitivity to NMD provides evidence that an RNA is translated, which can corroborate evidence for translation obtained by other means. Specifically, the sensitivity of putatively non-coding RNAs to NMD can provide evidence that their detected association with ribosomes, assayed by polyribosome analysis or ribosome profiling, is not an artifact of the methods used but instead is indicative of active translation in vivo.

3.2 Results and conclusions

Identification of unannotated RNAs in the S. cerevisiae genome

To identify putative non-coding RNAs in S. cerevisiae, we performed RNA-seq on steady-state whole-cell RNA in wild-type and NMD-deficient upf1Δ cells in duplicate. Prior to constructing RNA-seq libraries, whole-cell RNA was depleted of ribosomal RNA (rRNA) by subtractive hybridization with the Human/Mouse/Rat Ribo-Zero kit from Epicentre. Zymo RNA Clean and Concentrator-5 columns were used with a modified protocol (see Appendix D) optimized to remove small (less than ~200 nucleotide) RNAs. Together, these steps resulted in dramatic removal of rRNA and abundant small RNAs with good recovery of other RNA species (Figure 3-1). The removal of rRNA eliminated the need to use oligo-d(T) selection during cDNA library preparation. Thus, RNA was converted to cDNA by random priming, and cDNA libraries for sequencing were generated with a strand-specific protocol such that sequencing reads could be uniquely mapped to their strand of origin. Sequencing by Illumina HiSeq generated ~11-22 million uniquely mapped reads per sample (Table 3-1).

Figure 3-1. Depletion of ribosomal RNA in samples for RNA-seq. Whole cell RNA (left) can be specifically depleted of ribosomal RNA and abundant small RNAs such as tRNAs (right). Recovery of mRNAs, 7S ncRNA SCR1, and rRNA following depletion steps was monitored by ethidium bromide stain and northern blotting. rRNA depletion from a panel of yeast strains is shown.

To identify RNAs expressed from regions of the yeast genome that did not correspond to annotated genes, mapped RNA-seq data were analyzed by Reference Annotation-Based Transcript (RABT) assembly using the Cufflinks transcript assembly software package (Roberts et al., 2011). This approach identified 1146 unannotated RNAs (uRNAs) that are expressed from intergenic or antisense loci and are putatively non-coding (full list available in Smith et al., 2014). Many uRNAs are expressed from loci that overlap one or more recently-described classes of yeast lncRNAs (Figure 3-2A; Geisler et al., 2012; van Dijk et al., 2011; Xu et al., 2009), but many others have not been categorized.
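The overlap between uRNA loci and previously described lncRNA classes can be illustrated with a strand-aware interval comparison. This is a minimal sketch and not the procedure used in this study: it assumes closed, 1-based coordinates, assumes that overlap of at least one nucleotide on the same strand counts as a match, and uses hypothetical coordinates and helper names (`Interval`, `overlaps`, `classify`).

```python
from collections import namedtuple

Interval = namedtuple("Interval", "chrom start end strand")


def overlaps(a, b):
    """True if two intervals share at least one base on the same chromosome and strand."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def classify(urna, annotated):
    """Return the sorted names of all classes overlapping `urna`, or ['Unclassified']."""
    hits = sorted(name for name, intervals in annotated.items()
                  if any(overlaps(urna, iv) for iv in intervals))
    return hits or ["Unclassified"]


# Hypothetical coordinates, for illustration only.
annotated = {
    "CUT": [Interval("chrII", 100, 300, "+")],
    "XUT": [Interval("chrII", 250, 500, "+"), Interval("chrIV", 10, 90, "-")],
}
both = classify(Interval("chrII", 280, 400, "+"), annotated)
neither = classify(Interval("chrII", 280, 400, "-"), annotated)
```

Requiring the same strand matters here because the libraries are strand-specific and many uRNAs are antisense to annotated features.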

[Figure 3-2 image. Panel A: chart of uRNA overlap with CUTs, SUTs, XUTs, DCP2-sensitive lncRNAs, combinations of these classes, and unclassified uRNAs. Panel B: northern blots of rpb1-1 and rpb1-1/upf1Δ cells at 0 and 30 minutes at 37 °C, probed for CYH2 mRNA and pre-mRNA, SCR1 RNA, ICR1 lncRNA, intergenic uRNAs (ORC2-TRM7, FAA2-BIM1, YKU80-YMR107W), and antisense uRNAs (RPS15, RPL5, BMH1, YNL190W, YAR047C, FAS2-USV1). Panel C: percent remaining at T = 0 and T = 30 for each RNA. Panel D: northern blots of input, poly(A)+, and flow-through fractions. Panel E: fraction of each RNA recovered in the poly(A)+ and flow-through fractions.]

Figure 3-2. Characterization of yeast unannotated RNAs. A) Overlap between uRNAs identified in this study and four previously described categories of lncRNAs. B) Two-timepoint transcriptional shutoff with temperature-sensitive RNA Polymerase II subunit rpb1-1. RNAs sensitive to upf1Δ (described below) were analyzed in a rpb1-1/upf1Δ strain for enhanced detection. CYH2 mRNA and pre-mRNA serve as controls for Pol II transcripts, and SCR1 serves as a negative control RNA Polymerase III transcript. C) Quantification of (B). Top - rpb1-1/upf1Δ; bottom - rpb1-1. D) Oligo-d(T) was used to separate polyadenylated (poly(A)+) from non-polyadenylated (flowthrough) RNAs. mRNAs CYH2 and PGK1 serve as controls for poly(A)+ mRNAs, while ncRNA SCR1 serves as non-adenylated control. E) Quantification of (D). Quantification is the fraction of each RNA species present in either the poly(A)+ or flowthrough fractions relative to total recovered. Panel A is reproduced with permission from Smith et al., 2014.

uRNAs were further characterized to predict whether they might associate with ribosomes. Translation is enhanced by features on mRNAs including the 5’ 7-methylguanosine cap and 3’ poly(A) tail, which are co-transcriptionally added to most transcripts generated by RNA Polymerase II (Pol II), including mRNAs. Therefore, transcription by Pol II would suggest that uRNAs could be competent for translation. An abbreviated transcriptional shutoff was performed using the rpb1-1 temperature-sensitive allele of Pol II subunit RPB1, which causes transcription by Pol II to be completely halted at 37 °C (Nonet et al., 1987), and the abundance of RNA following Pol II transcriptional shut-off was monitored by northern blot analysis. Decreased RNA abundance after transcriptional shut-off would be indicative of RNA decay in the absence of new transcription, and suggest the RNA is transcribed by Pol II. As expected, RNA Polymerase III transcript SCR1 showed no change in abundance across the time-course (Figure 3-2B and 3-2C). In contrast, mRNAs and the tested subset of uRNAs showed substantial decreases in abundance upon Pol II inactivation (Figure 3-2B and 3-2C), suggesting that the tested uRNAs are transcribed by Pol II similar to mRNAs. Included in this and numerous downstream analyses is ICR1, a lncRNA that functions in regulating flocculation via transcriptional interference (Bumgarner et al., 2009). To assay for polyadenylation of uRNAs, RNAs were tested by hybridization to oligo-d(T) magnetic beads. Interestingly, some uRNAs (e.g. ORC2-TRM7 intergenic) were highly associated with oligo-d(T) beads, suggesting that these RNA species are predominantly polyadenylated. In contrast, other uRNAs (e.g. YNL190W antisense) showed weak association with oligo-d(T) beads, suggesting these may largely lack a poly(A) tail (Figure 3-2D and 3-2E). These data suggest that uRNAs represent a heterogeneous class of RNAs, with a subset containing features consistent with being competent for translation.
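The two quantifications used in this characterization (Figure 3-2C and 3-2E) reduce to simple ratios of northern blot band intensities. The sketch below is illustrative only: the function names are hypothetical, the input values are arbitrary, and it assumes that no additional normalization (for example, to a loading control) is applied, which may differ from the quantification actually performed.

```python
def percent_remaining(signal_t0, signal_t30):
    """Percent of an RNA remaining 30 minutes after Pol II shut-off, relative to T = 0."""
    return 100.0 * signal_t30 / signal_t0


def polya_distribution(polya_signal, flowthrough_signal):
    """Percent of recovered RNA in the poly(A)+ and flow-through fractions."""
    total = polya_signal + flowthrough_signal
    return 100.0 * polya_signal / total, 100.0 * flowthrough_signal / total


# Arbitrary band intensities, for illustration only.
remaining = percent_remaining(signal_t0=200.0, signal_t30=50.0)
bound, unbound = polya_distribution(polya_signal=80.0, flowthrough_signal=20.0)
```

Expressing the oligo-d(T) result relative to total recovered RNA, rather than to input, makes the two fractions sum to 100% for every RNA regardless of recovery efficiency.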

uRNAs co-sediment with polyribosomes

As a first measure of ribosome association, we used RNA-seq to identify RNAs that co-sediment with polyribosomal complexes during sucrose density centrifugation (i.e. Polysome-seq). Briefly, after separating a yeast whole-cell lysate across a sucrose gradient, fractions corresponding to polyribosomal complexes were pooled (Figure 3-3A) and used to generate strand-specific, rRNA-depleted cDNA libraries analyzed by RNA-seq using the same approach described above (Table 3-1). For each RNA, a Translatability Score was calculated, which represents the amount of RNA that co-sedimented with polyribosomes relative to total RNA abundance as determined by RNA-seq of whole-cell RNA (FPKM_Polysome-seq/FPKM_RNA-seq). As expected, mRNAs were well represented in the Polysome-seq libraries as demonstrated by high average Translatability Scores (mean 1.12 +/- 0.49 SD; Figure 3-3B and 3-3C, blue). In contrast, classical ncRNAs such as snRNAs were underrepresented in Polysome-seq, with much lower Translatability Scores (mean 0.24 +/- 0.19 SD; Figure 3-3B and 3-3C, gray). This indicated that polyribosome analysis effectively partitioned translated RNAs from true non-coding RNAs.
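The Translatability Score defined above is a per-transcript ratio of two FPKM values, summarized per RNA class as a mean and standard deviation. The following is a minimal sketch under stated assumptions: the expression floor `min_fpkm` is hypothetical (no such threshold is specified here), the sample standard deviation is assumed, and the FPKM values are invented for illustration.

```python
import statistics


def translatability_scores(polysome_fpkm, total_fpkm, min_fpkm=1.0):
    """Translatability Score = FPKM(Polysome-seq) / FPKM(RNA-seq) per transcript.

    Transcripts below `min_fpkm` in whole-cell RNA are skipped to avoid
    unstable ratios (assumed filter, not taken from the text).
    """
    scores = {}
    for rna, total in total_fpkm.items():
        if total < min_fpkm or rna not in polysome_fpkm:
            continue
        scores[rna] = polysome_fpkm[rna] / total
    return scores


def summarize(scores):
    """Mean and sample standard deviation of a collection of scores."""
    values = list(scores.values())
    return statistics.mean(values), statistics.stdev(values)


# Invented FPKM values, for illustration only.
polysome = {"mrna_x": 60.0, "ncrna_y": 2.5, "urna_z": 5.0, "low_w": 0.1}
total = {"mrna_x": 40.0, "ncrna_y": 10.0, "urna_z": 10.0, "low_w": 0.2}
scores = translatability_scores(polysome, total)
mean_score, sd_score = summarize(scores)
```

Because the score is a ratio of two independently normalized libraries, a value near 1 indicates representation on polyribosomes comparable to the transcript's overall abundance, not an absolute fraction of molecules bound.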

Interestingly, uRNAs showed a broad range of Translatability Scores (mean 0.98 +/- 0.79 SD; Figure 3-3B and 3-3C, red) that largely overlapped the distribution observed for mRNAs. This indicated that some uRNAs were poorly polyribosome-associated while others were polyribosome-associated to a similar extent as protein-coding mRNAs, again suggesting that uRNAs represent a heterogeneous class of transcripts. Previously described lncRNA categories were also analyzed to see if any particular class was more or less enriched on polyribosomes. For example, CUTs sensitive to the nuclear exosome are predicted to be underrepresented on cytoplasmic polyribosomes, while XUTs sensitive to the cytoplasmic exonuclease XRN1 might demonstrate higher polyribosome association due to their predicted cytoplasmic localization. Surprisingly, no category of lncRNA showed a trend towards higher or lower Translatability Scores (Figure 3-3D), with the exception of the novel uRNAs, which trended towards lower Translatability Scores (Figure 3-3D, light blue). This suggests that there are no underlying differences in the association of yeast lncRNAs with the translational machinery that can be explained simply by the method by which they were identified or the class to which they belong.

[Figure 3-3 image. Panel A: UV trace of a sucrose gradient (RNP, 40S, 60S, 80S monosomes, polyribosomes) above an ethidium bromide stain of RNA from each fraction (25S rRNA, 18S rRNA, tRNAs), with the fractions collected for RNA-seq marked. Panel B: number of RNAs versus Translatability Score (0 to 2) for ncRNA, mRNA, and uRNA, with separate axes for ncRNAs and for mRNAs and unannotated RNAs. Panel C: Translatability Score for mRNA (n=5024), classic ncRNA (n=42), and uRNA (n=1146). Panel D: cumulative number of RNAs versus Translatability Score (0 to 2) for CUTs, SUTs, XUTs, DUTs, uRNAs, and mixed.]

Figure 3-3. Yeast uRNAs and lncRNAs co-sediment with polyribosomes. A) Polyribosome analysis of yeast cell lysates. Top: UV trace after sedimentation through sucrose gradients. Bottom: ethidium bromide stain of RNA isolated from each gradient fraction. RNA for Polysome-seq pooled from fractions indicated. B) Translatability Score (FPKM_Polysome-seq/FPKM_RNA-seq) for characterized ncRNAs, mRNAs, and uRNAs. C) Distribution of Translatability Scores as in (B) for each class of RNA. D) Translatability Scores as in (B) for previously described categories of yeast lncRNAs, uRNAs, or RNAs falling into more than one category (mixed). Panels A, B, and C are reproduced with permission from Smith et al., 2014.

Ribosome profiling indicates uRNAs are bound by ribosomes

Polysome-seq provided preliminary evidence that many putatively non-coding RNAs in yeast associate with the translational machinery. As an independent method to corroborate Polysome-seq, we employed ribosome profiling (Ingolia et al., 2009) to globally assay ribosome association at nucleotide resolution (Figure 3-4A). Following limited RNase I digestion of a yeast whole-cell lysate, ribosome-protected fragments (RPFs) of RNA were purified from fractions of a sucrose density gradient corresponding to 80S ribosomes (Figure 3-4B), which should minimize the isolation of RNA fragments protected by other large complexes (Guttman et al., 2013). Following depletion of rRNA and construction of strand-specific cDNA libraries from RPFs, samples were analyzed by Illumina HiSeq high-throughput sequencing and compared to total RNA libraries generated in parallel by random RNA fragmentation. Approximately 3-6 million mapped non-rRNA reads were generated for each sample (Table 3-1). (An additional biological replicate from upf1Δ cells was performed at a later date, providing approximately 10-fold increased depth of coverage [Table 3-1] and recapitulating the data presented hereafter with higher confidence [data not shown].) Using these datasets, a Footprinting Score (FPKM_footprints/FPKM_fragments) was calculated, which provides a measure of ribosome association with a given RNA normalized to RNA expression. As expected, this measure of ribosome association was positively correlated with the Translatability Score calculated by Polysome-seq (Figure 3-4C), supporting the validity of both assays.
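The Footprinting Score and its comparison with the Translatability Score can be sketched as a ratio followed by a Spearman rank correlation (the statistic reported in Figure 3-4C). This is a minimal pure-Python illustration: the helper names are hypothetical, the input values are invented, and the correlation is implemented as the Pearson correlation of tie-averaged ranks, which is the standard definition of Spearman's coefficient.

```python
def footprinting_scores(footprint_fpkm, fragment_fpkm):
    """Footprinting Score = FPKM(footprints) / FPKM(fragments) for RNAs detected in both."""
    return {rna: footprint_fpkm[rna] / fragment_fpkm[rna]
            for rna in footprint_fpkm
            if fragment_fpkm.get(rna, 0) > 0}


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values, for illustration only.
footprints = {"rna_a": 12.0, "rna_b": 0.5, "rna_c": 30.0, "rna_d": 4.0}
fragments = {"rna_a": 10.0, "rna_b": 10.0, "rna_c": 15.0, "rna_d": 8.0}
translatability = {"rna_a": 1.1, "rna_b": 0.2, "rna_c": 1.6, "rna_d": 0.7}

fp = footprinting_scores(footprints, fragments)
shared = sorted(set(fp) & set(translatability))
rho = spearman([translatability[r] for r in shared], [fp[r] for r in shared])
bound = sorted(r for r in fp if fp[r] > 0.1)  # threshold used in the text
```

A rank-based coefficient is a reasonable choice for this comparison because both scores are ratios spanning orders of magnitude, so only a monotonic relationship between them is expected.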

[Figure 3-4 image. Panel A: flowchart of the protocol (yeast cell culture; add cycloheximide; lyse cells; digest extract with RNase I; enrich ribosome-bound monosomes and purify ~28 nt RNA fragments; cDNA synthesis and high-throughput sequencing of ribosome-protected fragments). Panel B: UV traces across sucrose gradient fractions 1-15 without (-RNase I) and with (+RNase I) digestion, with RNP, 80S, and polyribosome peaks marked. Panel C: scatter plot of Footprinting Score versus Translatability Score for all RNAs (ρ=0.73) and uRNAs (ρ=0.46). Panel D: total RNA and ribosome footprint read coverage at loci on chrII, chrXV, and chrXIII, with sORF-1, sORF-4, sORF-15, and sORF-32 marked. Panel E: boxplot of length (nt) for annotated ORFs (n=6607) and uRNA ribosome footprint coverage (n=185).]

Figure 3-4. Ribosome profiling identifies regions of ribosome association with uRNAs. A) Schematic of ribosome profiling protocol. B) Representative UV trace of polyribosome gradients from cell lysates without (-) or with (+) RNase I treatment. Fractions encompassing the collapsed 80S peak following RNase I treatment collected for analysis are indicated. C) Comparison of Polysome-seq and ribosome profiling. The Translatability Score calculated from Polysome-seq and Footprinting Score calculated by ribosome profiling were compared for all RNAs for which a score could be calculated for both assays. All RNAs (black) and uRNAs (red) are shown. Spearman rank correlation coefficients (ρ) indicated. D) RNA-seq and ribosome footprinting sequence coverage for sample uRNAs. Watson strand (navy); Crick strand (teal). Annotated genes (navy or teal bars) and putative sORFs delineated by ribosome footprints (green bars) are indicated. E) Boxplot of the average length of yeast annotated ORFs (including verified, uncharacterized, and dubious ORFs) compared to the average size of the region covered by ribosome footprints for uRNAs. All panels reproduced with permission from Smith et al., 2014.

Of the 1146 uRNAs, 331 were reproducibly detected in the ribosome profiling datasets. Of these, >50% showed significant ribosome binding (Footprinting Score >0.1), corroborating the conclusion drawn from Polysome-seq that many uRNAs associate with polyribosomal complexes. Representative examples of uRNAs protected by ribosome footprints are shown in Figure 3-4D. Ribosome footprints tended to concentrate near the 5’ end of uRNAs, consistent with translation initiation at the first available AUG codon (Kozak, 1989). In general, ribosomes protected a smaller region of uRNAs than most annotated ORFs (Figure 3-4E), providing an explanation for why any ORFs encoded within uRNAs may have previously been overlooked.

RPFs generated by ribosome profiling that correspond precisely to the size of a ribosome footprint (28 nt) can be used to predict translated reading frames, based on the 3-nucleotide periodicity of footprints generated by ribosomes as they translocate along an RNA (Ingolia et al., 2009). This is demonstrated by the strong bias towards ribosome footprints aligning in-frame with the annotated ORFs on mRNAs (Figure 3-5A). Using this feature, uRNAs that exhibited a bias towards ribosome footprints in a single reading frame, suggestive of active translation at that region, were identified. Of 80 uRNAs with sufficient coverage to perform this analysis, 61 contained ribosome footprints aligning predominantly to a single frame. Manual inspection of these uRNAs identified 43 on which ribosome footprints demarcated at least one reading frame (canonical AUG initiation codon and in-frame downstream stop codon) encoding ≥10 amino acids, which we identified as short ORFs (sORFs; Figure 3-5B; Table 3-2). The more recent ribosome profiling dataset with ~10-fold deeper coverage expanded this list to include 51 additional sORFs. The manually annotated boundaries of sORFs were corroborated by a metagene analysis of ribosome footprint alignment across all sORFs, which showed a sharp drop in association upstream and downstream of the predicted reading frame (Figure 3-5C). These analyses provided evidence for active translation on many uRNAs, and identified a number of novel loci capable of expressing proteins in the yeast genome.
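The two criteria described above, a reading-frame bias among 28-nucleotide footprints and an AUG-initiated ORF of at least 10 codons, can be sketched computationally. This is an illustration rather than the manual annotation procedure used in this study. It assumes footprint positions are 5’ ends given in transcript coordinates (real analyses typically apply an offset to the ribosomal P-site), and the `min_fraction` cutoff, function names, and example sequence are all hypothetical.

```python
from collections import Counter

STOP_CODONS = {"UAA", "UAG", "UGA"}


def frame_fractions(footprint_starts, reference=0):
    """Fraction of footprint 5' ends in each reading frame (0, 1, 2) relative to `reference`."""
    counts = Counter((pos - reference) % 3 for pos in footprint_starts)
    total = sum(counts.values())
    return [counts[frame] / total for frame in range(3)]


def dominant_frame(fractions, min_fraction=0.5):
    """Frame holding at least `min_fraction` of footprints, or None (assumed cutoff)."""
    best = max(range(3), key=lambda frame: fractions[frame])
    return best if fractions[best] >= min_fraction else None


def find_sorfs(rna, min_codons=10):
    """(start, end) of each AUG-initiated ORF with an in-frame stop, 0-based, end-exclusive.

    Only ORFs encoding at least `min_codons` amino acids are kept.
    """
    orfs = []
    for start in range(len(rna) - 2):
        if rna[start:start + 3] != "AUG":
            continue
        for pos in range(start + 3, len(rna) - 2, 3):
            if rna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break
    return orfs


# Invented sequence and footprint positions, for illustration only.
rna = "GG" + "AUG" + "GCU" * 10 + "UAA" + "CC"
orfs = find_sorfs(rna)
fractions = frame_fractions([2, 5, 8, 11, 14, 15, 20, 23], reference=2)
frame = dominant_frame(fractions)
```

Combining both tests is what distinguishes likely translation from mere ribosome occupancy: a frame bias without a matching AUG-to-stop ORF, or an ORF without phased footprints, would not satisfy the criteria described in the text.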

[Figure 3-5 image. Panel A: bar chart of percent reads versus reading frame (1, 2, 3) for ribosomal footprints and fragmented RNA. Panel B: footprints mapped along the YKU80-YMR107W intergenic RNA (sORF-4), with the putative ORF between AUG and UAA; scale bar, 100 nt. Panel C: reads per million (RPM) versus normalized chromosomal position for ribosome footprints (WT) and total fragmented RNA (WT); average length = 104 nt.]

Figure 3-5. 3-nucleotide phasing of ribosome footprints facilitated the identification of translated sORFs. A) Fraction of 28-nucleotide ribosome footprints (colored) or randomly fragmented RNA (white) mapping to each of three frames for annotated mRNAs. Data are mean +/- SEM. B) 28 nucleotide ribosome footprints mapping to YKU80-YMR107W intergenic uRNA demonstrate phasing and delineate an ORF within AUG start and UAA stop codons. Ribosome footprints colored based on frame to which they map as in (A). C) Metagene plot of average sequencing coverage of ribosome footprints (red) or total fragmented RNA (gray) mapped along all predicted sORF coding regions (n=47), plus 100 nucleotides flanking either end. 5’End indicates predicted start codon position; 3’End indicates predicted stop codon position. Red bar demarcates putative sORF metagene. Data are mean reads per million. All panels are reproduced with permission from Smith et al., 2014.

Conservation and detection of sORFs expressed from uRNAs

To provide further evidence that sORFs encoded within uRNAs are actively translated, production of the encoded polypeptide was directly assayed by western blotting. Expression of sORF-encoded polypeptides would provide strong evidence for the active translation of a subset of uRNAs. To facilitate detection by western blotting, predicted sORFs were epitope tagged using two approaches. First, a C-terminal 3xHA tag was inserted at the chromosomal locus of five sORFs using standard homologous recombination techniques (Figure 3-6A). This ensured endogenous expression of the sORF, but replaced endogenous sequences at the 3’ end of the uRNA with sequences that promote 3’ end formation shortly downstream of the protein tag. To avoid this caveat, a second approach involved cloning the genomic DNA for five uRNAs (+/-500 bp) into a high-copy expression vector and precisely inserting a C-terminal FLAG tag by site-directed mutagenesis (Figure 3-6C).

In both cases, cells were treated with the proteasome inhibitor MG-132 to stabilize the small sORF peptides in the event that they were unstable. Expression of the polypeptide encoded by sORF-4 tagged at the chromosomal locus was detectable, and importantly detection was dependent upon insertion of the 3xHA tag in-frame with the predicted sORF (Figure 3-6B). Furthermore, polypeptides encoded by both sORF-1 and sORF-4 were detected using the plasmid-based expression system (Figure 3-6D). Importantly, the exogenously-expressed peptide encoded by sORF-4 was detectable even without the addition of MG-132 (data not shown). These data provided direct evidence that a subset of uRNAs are actively translated to produce protein, despite their putative annotation as non-coding.

[Figure 3-6 image omitted. Recoverable panel labels: B) HA blot with sORF-4 at 14.2 kDa and PGK1 at 44.7 kDa; D) FLAG blot with sORF-1 at 10.1 kDa and sORF-4 at 10.6 kDa, and PGK1 at 44.7 kDa; E) % identity scale (0 to 100) for sORF-1 through sORF-20 across S. cerevisiae, S. pastorianus, S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus, N. castellii, C. glabrata, K. lactis, and A. gossypii.]

Figure 3-6. Evidence for sORF expression and conservation. A) Epitope tagging of putative sORFs at their endogenous chromosomal locus by homologous recombination. Solid black line represents uRNA defined by RNA-seq. B) Western blot analysis detects the translation of chromosomally tagged sORF-4. Signal is specific to an in-frame tag and corresponds to molecular weight for the chimeric peptide. Asterisk indicates a nonspecific signal. PGK1 serves as a loading control. C) Genomic DNA flanking uRNAs was cloned and the putative sORF epitope-tagged at its C-terminus. Solid black line represents uRNA defined by RNA-seq. D) Western blot detects translation of yeast sORF-1 and sORF-4. Signal corresponds to expected molecular weight for each chimeric peptide. Asterisk indicates a nonspecific signal. PGK1 serves as loading control. E) Conservation of sORFs among divergent yeast species. Putative peptides encoded by sORFs were identified in other yeast species based on six-frame translation using TBLASTN. Percent identical residues relative to full-length putative peptide indicated. Top 20 most conserved candidates shown. Figure is reproduced with permission from Smith et al., 2014.

Evolutionary conservation among related species can be used as a potential indication of the biological importance of a protein. Three measures were employed to examine the conservation of peptides encoded by sORFs across ten yeast species (Kurtzman and Robnett, 2003). TBLASTN, an alignment program which compares a peptide query sequence to the six-frame translation products of target genomes, was first used to identify the potential for similar proteins to be encoded within other yeast genomes. This analysis identified 39 of 47 sORFs that demonstrated conservation within at least one related yeast species, and 20 of these were conserved in at least two species (Figure 3-6E, Table 3-2). As a second measure of conservation, the ratio of nonsynonymous to synonymous substitutions (Ka/Ks; Zhang et al., 2006) was applied to indicate whether the coding potential of a specific region has been preferentially conserved. Twelve sORFs demonstrated a bias towards synonymous substitutions, providing evidence of peptide conservation rather than simply conservation of the nucleotide sequence (Table 3-2). Finally, 14 uRNAs are expressed from regions of the S. cerevisiae genome displaying high conservation in other yeast species by phastCons analysis (Table 3-2; Siepel et al., 2005). Cumulatively, these data provide preliminary evidence for conservation of a number of the newly identified sORFs, suggesting that translation of uRNAs may occur because they encode proteins with important biological function.
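The two sequence-based measures above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the function names are hypothetical, the residue counts in the usage note are made up for illustration, and the 0.05 cutoff simply mirrors the significance threshold noted for Table 3-2. The first function expresses identity relative to the full-length putative peptide rather than to the aligned region alone; the second applies the usual reading that a significant Ka/Ks below 1 is consistent with purifying selection.

```python
def percent_identity_full_length(identical_residues: int, peptide_length: int) -> float:
    """Identical residues as a percentage of the full-length query peptide.

    Dividing by the full peptide length (not the alignment length) penalizes
    hits that cover only part of the putative sORF product.
    """
    if peptide_length <= 0:
        raise ValueError("peptide_length must be positive")
    if not 0 <= identical_residues <= peptide_length:
        raise ValueError("identical_residues must lie between 0 and peptide_length")
    return 100.0 * identical_residues / peptide_length


def interpret_ka_ks(omega: float, p_value: float, alpha: float = 0.05) -> str:
    """Qualitative reading of a Ka/Ks ratio (omega).

    alpha is an assumed significance cutoff; comparisons above it are
    reported as not significant regardless of omega.
    """
    if p_value > alpha:
        return "not significant"
    if omega < 1.0:
        return "purifying selection"
    if omega > 1.0:
        return "positive selection"
    return "neutral"
```

For example, a hypothetical hit with 23 identical residues against a 32-residue peptide gives `percent_identity_full_length(23, 32)`, roughly 71.9%, and `interpret_ka_ks(0.2, 0.01)` returns "purifying selection".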

NMD-sensitivity of uRNAs provides evidence for translation

Our data from Polysome-seq and ribosome profiling indicated that many uRNAs are likely associated with translating ribosomes. To investigate whether any of these uRNAs are therefore subject to translation surveillance, sensitivity of uRNAs to NMD was investigated. Inspection of RNA-seq data in wild-type versus NMD-deficient (upf1Δ) cells revealed that the abundance of 192 of 1146 uRNAs (16.8%) increased ≥2-fold in the absence of NMD (Figure 3-7A, orange), a subset of which were validated by northern blotting (Figure 3-7B, orange). Notably, an increase in steady-state RNA abundance does not distinguish whether these uRNAs are direct or indirect targets of the NMD pathway. Importantly, however, the subset of NMD-sensitive uRNAs displayed higher Translatability Scores than uRNAs insensitive to NMD (Figure 3-7C), indicating a higher association with ribosomes that is consistent with many of these uRNAs being direct targets of NMD, because NMD is a translation-dependent process (see Chapter 3.1).
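The classification criterion described above (a ≥2-fold increase in upf1Δ relative to wild-type, with the FDR<0.05 cutoff given in the Figure 3-7 legend) can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the handling of zero wild-type expression is an assumption, and in the actual analysis the significance call came from Cuffdiff rather than from code like this.

```python
def fold_change(fpkm_wt: float, fpkm_upf1: float) -> float:
    """Ratio of upf1-deletion to wild-type steady-state expression (FPKM).

    Assumption: a transcript undetected in wild-type but present in the
    mutant is treated as an infinite fold increase.
    """
    if fpkm_wt <= 0:
        return float("inf") if fpkm_upf1 > 0 else 1.0
    return fpkm_upf1 / fpkm_wt


def is_nmd_sensitive(fpkm_wt: float, fpkm_upf1: float, fdr: float,
                     min_fold: float = 2.0, max_fdr: float = 0.05) -> bool:
    """True when the increase is both significant and at least min_fold."""
    return fdr < max_fdr and fold_change(fpkm_wt, fpkm_upf1) >= min_fold


def percent_sensitive(n_sensitive: int, n_total: int) -> float:
    """Percentage of transcripts called NMD-sensitive."""
    return 100.0 * n_sensitive / n_total
```

With the counts reported in the text, `percent_sensitive(192, 1146)` evaluates to about 16.8.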

[Figure 3-7 image omitted. Recoverable labels: RNAs probed or plotted include ICR1, ORC2-TRM7, FAA2-BIM1, YKU80-YMR107W, RPS15, RPL5, BMH1, YNL190W, YAR047C, FAS2-USV1, and SCR1; coverage tracks are shown for ORC2-TRM7 (chrII, with sORF-21) and ICR1 (chrIX); axes include WT and upf1Δ steady-state expression (FPKM), Translatability Score, and 3’ ribosome-free length (nt); groups are NMD-sensitive (n=119) and NMD-insensitive (n=66).]

Figure 3-7. Many uRNAs are sensitive to the translation-dependent NMD pathway. A) Steady-state expression levels (FPKM) in wild-type (WT) versus upf1Δ cells measured with RNA-seq reveal uRNA sensitivity to NMD. NMD-sensitive uRNAs exhibiting a statistically significant (based on Cuffdiff analysis, FDR<0.05) ≥2-fold increase in steady-state levels in upf1Δ in orange. B) Northern blot analysis of steady-state RNA from WT and upf1Δ cells shows uRNAs and lncRNA predicted by RNA-seq to be sensitive to NMD. Representative SCR1 loading control is shown. C) Translatability Score distribution for uRNAs sensitive to NMD (orange) vs. insensitive to NMD (gray). D) Sequence coverage for NMD-sensitive uRNA or lncRNA in WT (top) or upf1Δ (bottom). Data are presented as in Figure 3-4D. E) Length distribution of downstream ribosome-free regions for NMD-sensitive and -insensitive uRNAs. All panels reproduced with permission from Smith et al., 2014.

Many features may contribute to whether an RNA is sensitive to NMD (see Chapter 1.3). To identify differences between NMD-sensitive and NMD-insensitive uRNAs, patterns of ribosome binding determined by ribosome profiling were investigated (Figure 3-7D). Intriguingly, NMD-sensitive uRNAs demonstrated much longer downstream ribosome-free regions than those insensitive to NMD (891 nt +/- 64 SEM versus 287 nt +/- 50 SEM; Figure 3-7E). Given that a long 3’ UTR is a known NMD-targeting feature (Kebaara and Atkin, 2009), this observation provided a possible mechanistic explanation for why some uRNAs are sensitive to NMD. Additionally, this is supported by predictions that lncRNAs in other species would be sensitive to NMD if translated due to possessing short, 5’ proximal reading frames that would result in early translation termination and long 3’ UTRs (Niazi and Valadkhan, 2012). Moreover, the enrichment of a known feature of direct NMD targets among NMD-sensitive uRNAs further supports the prediction that many NMD-sensitive uRNAs are directly targeted by the NMD pathway. Therefore, sensitivity to NMD provided further support that ribosome association detected by other assays was indicative of active translation in vivo for up to one-fifth of uRNAs.
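As a rough illustration of how a downstream ribosome-free length and its mean +/- SEM might be computed, consider the Python sketch below. It rests on assumptions that are not taken from the text: the ribosome-free region is defined here as the distance from the 3’-most footprint to the transcript 3’ end, coordinates are transcript-relative, and the function names are hypothetical.

```python
import math
import statistics


def downstream_ribosome_free_length(transcript_length: int, last_footprint_end: int) -> int:
    """Nucleotides between the 3'-most ribosome footprint and the transcript 3' end.

    Both arguments are in transcript coordinates (assumed convention).
    """
    if not 0 <= last_footprint_end <= transcript_length:
        raise ValueError("last_footprint_end must lie within the transcript")
    return transcript_length - last_footprint_end


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    if len(values) < 2:
        raise ValueError("at least two values are needed to estimate the SEM")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem
```

Applied separately to the NMD-sensitive and NMD-insensitive groups, `mean_and_sem` would yield summary values of the kind quoted above (for example, 891 nt +/- 64 SEM).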

3.3 Discussion

lncRNAs have recently emerged as a novel, poorly understood class of transcripts. Although an increasing number of lncRNAs have well-defined non-coding functions, it is unclear whether the classification of the entire population as non-coding is accurate. Several independent lines of evidence indicate that a subset of putatively non-coding uRNAs in yeast are both associated with the translation machinery and engaged in active translation (Figure 3-8). First, many uRNAs co-sedimented with polyribosomes to an extent similar to mRNAs and distinct from classical ncRNAs (Figure 3-8, top). Second, many uRNAs were protected from nuclease digestion by ribosomes, and some displayed 3-nucleotide ribosome phasing indicative of active translation (Figure 3-8, middle). Third, we directly detected expressed peptides encoded by predicted sORFs within two uRNAs (Figure 3-8, bottom left). Finally, a substantial fraction (~17%) of uRNAs were determined to be sensitive to the translation-dependent NMD pathway (Figure 3-8, bottom right). These observations suggest that the de facto classification of novel RNAs with limited protein-coding capacity as non-coding is simplistic, and is likely to cause the impact that translation may have on the activity of these RNAs to be overlooked. Importantly, subsequent to publication of these data (Smith et al., 2014), an increasing number of studies have continued to find evidence that putatively non-coding RNA species are associated with ribosomes (Aspden et al., 2014; Ingolia et al., 2014; Ruiz-Orera et al., 2014), consistent with the conclusions of our analysis.

Figure 3-8. Evidence for translation of uRNAs in yeast. Assays including co-sedimentation with polyribosomes (top), ribosome profiling (middle), detection of expressed peptides (bottom left), and sensitivity to translation-dependent NMD (bottom right) were used to characterize the translational status of uRNAs. Figure reproduced with permission from Smith et al., 2014.

While the data presented here indicate that ~17% of uRNAs in yeast are sensitive to NMD, the increase in steady-state abundance does not definitively prove that NMD-sensitive uRNAs are directly targeted by the NMD pathway. However, observations that these uRNAs have a higher association with ribosomes than uRNAs insensitive to NMD (Figure 3-7C) and are enriched for a known NMD-targeting feature (Figure 3-7E) provide support that many of these are direct targets for translation surveillance. Furthermore, an analysis of lncRNAs in mammalian cells carried out by collaborators at the Whitehead Institute corroborated our findings in yeast and provided additional support for these lncRNAs being direct targets of NMD (Smith et al., 2014). Critically, a similar percentage (~17%) of lncRNAs expressed in mouse embryonic stem cells increased in abundance ≥1.5-fold in cells where NMD was inhibited by knockdown of UPF1 or inhibition of translation by cycloheximide (Hurt et al., 2013; Smith et al., 2014). Moreover, lncRNAs that showed an increase in expression upon inhibition of NMD were 9.6-fold enriched for direct binding to UPF1 (Hurt et al., 2013; Smith et al., 2014), consistent with these lncRNAs being direct targets of NMD. These analyses demonstrated that the sensitivity of a large subset of non-coding RNAs to NMD is not confined to yeast, and that this phenomenon is likely conserved throughout eukarya. While our data reveal a role for NMD in the metabolism of a large subset of predicted non-coding RNAs in yeast and mammals, it is not unheard of for RNAs putatively annotated as non-coding to be targets of the NMD pathway. Indeed, NMD has been previously shown to target specific RNAs annotated as lncRNAs in yeast (Thompson and Parker, 2007; Toesca et al., 2011), plants (Kurihara et al., 2009), and humans (Tani et al., 2013). In order to more clearly distinguish those uRNAs that are direct targets of the NMD pathway from indirect targets, future analysis of global RNA decay rates in NMD-deficient strains is planned (see Chapter 4.4).

The putative sensitivity of a fraction of uRNAs to NMD has interesting implications for the role of NMD in lncRNA regulation. Because NMD causes the rapid degradation of translated lncRNAs, an intriguing possibility is that NMD helps ensure that the steady-state lncRNA pool is ribosome-free and therefore truly non-coding. In this situation, a lncRNA with true non-coding function that erroneously engages translating ribosomes would be rapidly degraded. Furthermore, the selective degradation of cytoplasmic lncRNAs that have become engaged in translation may help ensure that the population of that lncRNA remains enriched in the nucleus, where many lncRNAs are predicted to function (Derrien et al., 2012).

The identification and translation of novel sORFs suggest a potentially expanded coding capacity within the yeast genome. Although sORFs encode peptides between 10 and 100 amino acids and are, therefore, smaller than most proteins, the yeast genome includes several hundred annotated ORFs of <100 amino acids, some of which have essential functions (Kastenmayer et al., 2006). Furthermore, small peptides encoded within RNAs that were originally described as non-coding in flies (Galindo et al., 2007; Magny et al., 2013) and zebrafish (Pauli et al., 2014) have been shown to play important roles in signaling and development, arguing against the assumption that small proteins cannot carry out important biological functions. Indeed, it has recently become appreciated that sORFs encoding functional peptides are more prevalent than originally thought (reviewed in Andrews and Rothnagel, 2014). Although a specific function has not been determined for any novel sORFs identified here, some of these peptides are anticipated to play important biological roles in the cell.

It is also formally possible that the translation of uRNAs is stochastic and serves no specific function. The instability of the detected sORF-encoded peptides based on their enhanced detection upon treatment with proteasome inhibitor MG-132 (Figure 3-6B, 3-6D), and the low expression of polypeptides from lncRNAs in humans (Banfai et al., 2012) could argue that peptides resulting from stochastic translation of non-coding RNAs are nonfunctional and rapidly degraded. However, even if encoded peptides do not currently serve a particular function, the translation of these sequences could serve as a reservoir for de novo gene evolution (Carvunis et al., 2012; Wilson and Masel, 2011) via the production and sampling of novel peptides. While nonfunctional peptides would be rapidly eliminated, ones with novel functions may give rise to new genes. In this case, the stochastic translation of otherwise non-coding RNAs would still play a critical biological role.

3.4 Future directions

It will be of great interest to determine whether any novel sORFs or uRNAs have critical functions in the cell. The genetic tractability of S. cerevisiae makes deletion of genomic loci straightforward using standard yeast genetic techniques. A preliminary screen for alterations in growth rate on a wide panel of nutrient and chemical stresses can identify sORFs or uRNAs whose deletion results in a notable growth phenotype compared to wild-type cells. A similar approach was previously used to perform phenotypic analyses for 247 annotated sORFs (annotated genes encoding proteins of <100 amino acids) in yeast (Kastenmayer et al., 2006). Roles for 22 sORFs in various growth conditions were identified in this manner (Kastenmayer et al., 2006), highlighting both the utility of the approach and the fact that sORFs can nevertheless encode proteins with important cellular functions. This approach can be used to screen many loci, although it is relatively low-throughput and identifying the right conditions to test to observe a phenotype may be challenging.

Preliminary screening of four novel sORFs (sORF-4, sORF-7, sORF-13, and sORF-14) is currently underway using strains containing precise deletions of each sORF by gene replacement with an auxotrophic marker. Deletion strains have been screened on a wide range of conditions and compounds that result in osmotic stress, DNA damage, oxidative stress, and transcription or translation inhibition, among others. Growth is initially monitored by serial dilution spot assays on solid media, followed by halo assays and growth curves to confirm phenotypes and determine precise differences in growth. To ensure growth effects are specific to deletion at the locus of interest, and determine whether the effects are due to deletion of the sORF or disruption of the encoding uRNA, deletions will be rescued by complementation with a plasmid-encoded sORF or uRNA gene. Once a phenotype has been thoroughly established, the precise role of the encoded peptide or uRNA can be specifically investigated.

uRNAs which do not display significant association with ribosomes may have non-coding functions, perhaps through regulation of neighboring genes in cis. Other well-studied lncRNAs in yeast regulate expression of proximal genes through transcriptional interference (Bumgarner et al., 2009; Martens et al., 2004) or chromatin remodeling (Camblong et al., 2007; Kim et al., 2012). Cis regulation of gene expression predicts concomitant changes in the expression of proximal uRNAs and mRNAs under different conditions. uRNAs whose expression is significantly co-regulated with proximal mRNAs can be identified by monitoring changes in RNA expression across many growth conditions. A recent study of the yeast transcriptome under 18 growth conditions may serve as a useful reservoir of data for preliminary analysis (Waern and Snyder, 2013). This may facilitate the identification of additional functional lncRNAs in yeast, which as a class are still largely uncharacterized.
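One simple way to look for the co-regulation proposed above is to correlate the expression of each uRNA with that of its neighboring mRNA across growth conditions. The Python sketch below is only a starting point under stated assumptions: the function names, the input layout (one expression value per condition for each transcript), and the |r| ≥ 0.8 cutoff are all hypothetical choices, not part of the planned analysis. Strong negative correlations are retained because mechanisms such as transcriptional interference would predict anti-correlated expression.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def co_regulated_pairs(expression, neighbor_pairs, min_abs_r=0.8):
    """Return (uRNA, mRNA, r) for neighboring pairs with |r| >= min_abs_r.

    expression maps a transcript name to its per-condition expression values;
    neighbor_pairs lists (uRNA, proximal mRNA) name pairs to test.
    """
    hits = []
    for urna, mrna in neighbor_pairs:
        r = pearson_r(expression[urna], expression[mrna])
        if abs(r) >= min_abs_r:
            hits.append((urna, mrna, r))
    return hits
```

In practice the profiles would hold one value per growth condition, and any candidate pairs would still need a correction for multiple testing and experimental follow-up.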

Sample                              Replicate  # of Reads    Mapped reads         rRNA reads           non-rRNA mapped reads
                                                             #             %      #             %      #             %
WT steady state                     1          13,803,030    11,096,422    80.4   -             -      -             -
WT steady state                     2          31,213,437    22,617,117    72.5   -             -      -             -
upf1Δ steady state                  1          15,855,617    12,523,774    79.0   -             -      -             -
upf1Δ steady state                  2          31,154,574    21,890,286    70.3   -             -      -             -
WT polysomes                        1          31,342,788    22,141,454    70.6   -             -      -             -
WT polysomes                        2          33,619,004    23,239,241    69.1   -             -      -             -
upf1Δ polysomes                     1          25,964,168    17,821,788    68.6   -             -      -             -
upf1Δ polysomes                     2          25,784,511    17,782,505    69.0   -             -      -             -
WT steady state fragmented RNA      -          7,341,479     5,507,344     75.0   526,798       9.6    4,980,546     90.4
upf1Δ steady state fragmented RNA   -          7,912,713     5,759,010     72.8   219,344       2.9    5,539,666     96.2
WT ribosome profiling               1          28,404,464    26,963,688    94.9   20,725,760    75.1   6,237,928     23.1
WT ribosome profiling               2          22,793,695    21,682,577    95.1   17,916,898    80.8   3,765,679     17.4
upf1Δ ribosome profiling            1          21,727,393    20,603,069    94.8   16,440,899    77.9   4,162,170     20.2
upf1Δ ribosome profiling            2          23,351,093    21,834,783    93.5   16,885,425    75.6   4,949,358     22.7
upf1Δ steady state fragmented RNA   *          104,227,632   66,629,672    79.7   1,496,568     2.3    65,133,104    97.8
upf1Δ ribosome profiling            *          79.375,073    52,043,597    93.8   17,467,335    33.6   34,576,262    66.4

Table 3-1. Summary of high-throughput sequencing reads. For all RNA-seq libraries, the number of total reads and number and percentage of mapped reads are listed. For ribosome profiling RNA-seq libraries, the number and percentage of rRNA or non-rRNA reads are also provided. *Indicates libraries that were generated in a later experiment for increased depth of coverage. Table modified with permission from Smith et al., 2014.

Column key: Score = phastCons conserved elements log-odds score [a,b,c]; %ID = BLAST percent identity [d]; E-value = BLAST E-value [e]; Bit = BLAST bit score [e]; Ka/Ks = Ka/Ks (ω) [e,f,g,h].

                                                    S. pastorianus                S. paradoxus [i]
sORF     uRNA          Coordinates            Score %ID    E-value   Bit   Ka/Ks  %ID    E-value   Bit   Ka/Ks
sORF-1   XLOC_000132-  chrII:169873-169637    17    98.73  3.00E-48  169   N.D.   83.5   3.00E-39  144   0.317
sORF-2   XLOC_000464-  chrIII:309884-309747   308   41.3   0.11      33.1  0.205  78.26  2.00E-14  70.9  N.S.
sORF-3   XLOC_002595+  chrXII:675730-675861   117   100    2.00E-20  88.2  N.D.   97.73  9.00E-20  86.3  0.032
sORF-4   XLOC_002893+  chrXIII:480923-481186  N/A   48.86  9.00E-38  88.6  N.S.   0      -         -     -
sORF-5   XLOC_000429+  chrIII:242629-242685   52    100    7.00E-04  38.5  N.D.   78.95  0.69      29.6  N.S.
sORF-6   XLOC_000768-  chrIV:916135-916022    814   0      -         -     -      73.68  2.00E-10  58.9  N.S.
sORF-7   XLOC_002334-  chrXI:513546-513430    110   100    5.00E-18  80.5  N.D.   35.9   0.18      32.3  N.S.
sORF-8   XLOC_002919-  chrXIII:619370-619098  54    97.8   1.00E-35  134   N.S.   68.13  4.00E-26  106   0.376
sORF-9   XLOC_003873+  chrXVI:777577-777669   N/A   45.16  2.3       28.5  N.S.   45.16  4.1       27.7  N.S.
sORF-10  XLOC_001697+  chrVII:902532-902762   24    100    2.00E-44  158   N.D.   0      -         -     -
sORF-11  XLOC_002196+  chrXI:66560-66604      43    100    0.3       30.4  N.D.   0      -         -     -
sORF-12  XLOC_002899+  chrXIII:502246-502341  91    100    4.00E-14  69.3  N.D.   0      -         -     -
sORF-13  XLOC_000636+  chrIV:525039-525113    N/A   100    8.00E-08  49.3  N.D.   100    2.00E-07  49.3  N.D.
sORF-14  XLOC_000636+  chrIV:525082-525150    N/A   100    1.00E-06  45.8  N.D.   100    3.00E-06  45.8  N.D.
sORF-15  XLOC_003583-  chrXV:969893-969765    N/A   0      -         -     -      60.47  6.00E-11  60.1  N.S.
sORF-16  XLOC_003728+  chrXVI:286664-286765   N/A   100    2.00E-13  67    N.S.   41.18  3.1       28.5  N.S.
sORF-17  XLOC_000364+  chrIII:39662-39805     N/A   100    6.00E-23  95.5  N.D.   37.5   3.1       28.9  0.325
sORF-18  XLOC_000631+  chrIV:513023-513172    33    100    3.00E-13  67.4  N.D.   34     3.00E-04  41.2  0.198
sORF-19  XLOC_003307+  chrXV:38888-39046      N/A   100    3.00E-26  105   N.D.   33.96  0.005     37    N.S.
sORF-20  XLOC_003329+  chrXV:96656-96823      N/A   100    6.00E-30  116   N.D.   26.79  1.4       28.5  0.273
sORF-21  XLOC_000204+  chrII:362898-362939    353   100    5.6       26.6  N.D.   0      -         -     -
sORF-22  XLOC_000631+  chrIV:512693-512740    N/A   100    0.042     33.1  N.D.   0      -         -     -
sORF-23  XLOC_001229+  chrV:285208-285264     644   100    3.00E-04  39.7  N.D.   0      -         -     -
sORF-24  XLOC_001689+  chrVII:875763-875801   N/A   100    3.3       26.9  N.D.   0      -         -     -
sORF-25  XLOC_003019+  chrXIII:873056-873103  N/A   100    0.1       32    N.D.   0      -         -     -
sORF-26  XLOC_003128+  chrXIV:270232-270270   N/A   100    5.5       26.6  N.D.   0      -         -     -
sORF-27  XLOC_001405+  chrVII:18942-19028     N/A   100    6.00E-11  59.7  N.D.   0      -         -     -
sORF-28  XLOC_000405+  chrIII:169068-169193   N/A   100    3.00E-20  87.4  N.D.   0      -         -     -
sORF-29  XLOC_000841+  chrIV:1206674-1206796  139   80.49  2.00E-12  62.8  N.S.   0      -         -     -
sORF-30  XLOC_001222+  chrV:268931-268963     N/A   0      -         -     -      0      -         -     -
sORF-31  XLOC_003621+  chrXV:1003992-1004036  N/A   100    0.87      29.3  N.D.   0      -         -     -
sORF-32  XLOC_000144-  chrII:220889-220800    N/A   73.33  2.00E-07  49.7  N.S.   0      -         -     -
sORF-33  XLOC_000337-  chrII:792024-791974    N/A   100    0.024     33.9  N.D.   0      -         -     -
sORF-34  XLOC_000617-  chrIV:471496-471440    N/A   100    4.00E-04  39.3  N.D.   0      -         -     -
sORF-35  XLOC_000803-  chrIV:1019114-1019028  N/A   100    1.00E-09  55.8  N.D.   0      -         -     -
sORF-36  XLOC_001190-  chrV:177560-177402     N/A   100    4.00E-28  110   N.D.   0      -         -     -
sORF-37  XLOC_001277-  chrV:432186-432139     N/A   100    0.1       32    N.D.   0      -         -     -
sORF-38  XLOC_001365-  chrVI:159009-158926    N/A   60.71  0.004     37    N.D.   0      -         -     -
sORF-39  XLOC_001409-  chrVII:17702-17652     N/A   100    0.035     33.5  N.D.   0      -         -     -
sORF-40  XLOC_001678-  chrVII:811263-811213   N/A   100    0.005     35.8  N.D.   0      -         -     -
sORF-41  XLOC_001678-  chrVII:811200-811162   N/A   0      -         -     -      0      -         -     -
sORF-42  XLOC_002811-  chrXIII:311954-311892  N/A   95.24  8.00E-05  41.6  N.S.   0      -         -     -
sORF-43  XLOC_003248-  chrXV:5661-5560        N/A   0      -         -     -      0      -         -     -
sORF-44  XLOC_003248-  chrXV:6801-6511        N/A   0      -         -     -      0      -         -     -
sORF-45  XLOC_003866-  chrXVI:781588-781517   N/A   0      -         -     -      0      -         -     -
sORF-46  XLOC_003609-  chrXV:1064363-1064313  N/A   0      -         -     -      0      -         -     -
sORF-47  XLOC_003356-  chrXV:326433-326329    N/A   91.43  2.00E-12  64.3  N.S.   0      -         -     -

[a] Bold indicates phastCons conserved element completely overlaps sORF
[b] Italics indicates that conserved element may be influenced by gene antisense to sORF uRNA
[c] N/A indicates no conserved element corresponds to sORF locus
[d] Percent identity score of 0 indicates no alignment found at E<10
[e] For all comparisons where BLAST produced no match, "-" is recorded
[f] N.D. = not determined; nucleotide sequences show 100% alignment
[g] N.S. = not significant; value not reported due to Fisher's p-value >0.05
[h] Bold indicates Ka/Ks ratio supports purifying selection
[i] All data reported as for S. pastorianus

Table 3-2. Predicted sORFs and conservation analyses. For each sORF, the encoding uRNA, sacCer2 chromosomal coordinates, phastCons log-odds score for previously identified conserved elements (Siepel et al., 2005), TBLASTN results (percent identical residues relative to full-length putative peptide, E-values, and bit scores), and Ka/Ks ratios (Zhang et al., 2006) are presented. Table reproduced with permission from Smith et al., 2014. Page 1 of 3.

Rows follow the sORF order of page 1 of this table. %ID = BLAST percent identity; E-value = BLAST E-value; Bit = BLAST bit score; Ka/Ks = Ka/Ks (ω).

         S. mikatae [i]                S. kudriavzevii [i]           S. bayanus [i]                N. castellii [i]
sORF     %ID    E-value   Bit   Ka/Ks  %ID    E-value   Bit   Ka/Ks  %ID    E-value   Bit   Ka/Ks  %ID    E-value   Bit   Ka/Ks
sORF-1   20.25  2.00E-12  34.3  N.S.   62     1.00E-22  96.3  0.301  56.96  6.00E-23  97.1  0.212  15.19  6         28.5  N.S.
sORF-2   65.22  6.00E-10  57.8  0.299  71.74  7.00E-13  66.2  0.265  45.65  0.065     33.9  0.455  39.13  1.9       29.3  0.415
sORF-3   0      -         -     -      79.55  2.00E-13  67.8  0.226  75     3.00E-11  61.2  0.193  0      -         -     -
sORF-4   0      -         -     -      12.5   8.00E-04  28.5  0.161  17.05  7.7       28.5  N.S.   0      -         -     -
sORF-5   73.68  0.94      29.3  N.S.   0      -         -     -      57.89  5.9       26.9  6.55   0      -         -     -
sORF-6   55.26  8.00E-06  45.1  N.S.   55.26  2.00E-06  47.4  N.S.   44.74  9.00E-04  39.3  N.S.   0      -         -     -
sORF-7   28.21  7.9       27.3  N.S.   0      -         -     -      0      -         -     -      0      -         -     -
sORF-8   67.03  1.00E-24  102   0.267  0      -         -     -      54.95  3.00E-14  72.4  0.203  0      -         -     -
sORF-9   45.16  0.82      30    5.77   0      -         -     -      0      -         -     -      0      -         -     -
sORF-10  0      -         -     -      79.22  7.00E-33  122   0.18   0      -         -     -      0      -         -     -
sORF-11  0      -         -     -      93.33  0.84      29.3  0.001  86.67  4.6       26.9  0.043  0      -         -     -
sORF-12  0      -         -     -      71.88  5.00E-10  54.7  0.146  71.88  4.00E-09  54.7  0.146  0      -         -     -
sORF-13  0      -         -     -      0      -         -     -      0      -         -     -      0      -         -     -
sORF-14  0      -         -     -      0      -         -     -      0      -         -     -      0      -         -     -
sORF-15  0      -         -     -      32.56  5         27.7  0.317  0      -         -     -      0      -         -     -
sORF-16 through sORF-47: 0 - - - in all four species.

Table 3-2. Predicted sORFs and conservation analyses. For each sORF, the encoding uRNA, sacCer2 chromosomal coordinates, phastCons log-odds score for previously identified conserved elements (Siepel et al., 2005), TBLASTN results (percent identical residues relative to full-length putative peptide, E-values, and bit scores), and Ka/Ks ratios (Zhang et al., 2006) are presented. Table reproduced with permission from Smith et al., 2014. Page 2 of 3.

Rows follow the sORF order of page 1 of this table. %ID = BLAST percent identity; E-value = BLAST E-value; Bit = BLAST bit score; Ka/Ks = Ka/Ks (ω).

         C. glabrata [i]               K. lactis [i]                 A. gossypii [i]
sORF     %ID    E-value   Bit   Ka/Ks  %ID    E-value   Bit   Ka/Ks  %ID    E-value   Bit   Ka/Ks
sORF-1   24.05  2.00E-04  42.4  N.S.   17.72  0.037     35.4  N.S.   24.05  0.025     36.2  0.338
sORF-2   45.65  0.028     35    N.S.   0      -         -     -      0      -         -     -
sORF-3   27.27  8         27.3  0.506  0      -         -     -      27.27  2.7       28.9  N.S.
sORF-4   26.14  4.3       29.3  N.S.   0      -         -     -      0      -         -     -
sORF-5   0      -         -     -      0      -         -     -      0      -         -     -
sORF-6   0      -         -     -      0      -         -     -      0      -         -     -
sORF-7   0      -         -     -      0      -         -     -      0      -         -     -
sORF-8   39.56  6.00E-11  62.8  0.24   0      -         -     -      0      -         -     -
sORF-9   0      -         -     -      0      -         -     -      0      -         -     -
sORF-10  0      -         -     -      0      -         -     -      0      -         -     -
sORF-11  0      -         -     -      0      -         -     -      0      -         -     -
sORF-12  59.38  8.00E-05  42    0.257  46.88  0.034     34.3  N.S.   0      -         -     -
sORF-13 through sORF-47: 0 - - - in all three species.

Table 3-2. Predicted sORFs and conservation analyses. For each sORF, the encoding uRNA, sacCer2 chromosomal coordinates, phastCons log-odds score for previously identified conserved elements (Siepel et al., 2005), TBLASTN results (percent identical residues relative to full-length putative peptide, E-values, and bit scores), and Ka/Ks ratios (Zhang et al., 2006) are presented. Table reproduced with permission from Smith et al., 2014. Page 3 of 3.

CHAPTER 4: IDENTIFICATION OF NMD-SENSITIVE mRNAs

4.1 Introduction

Obstacles to the identification of NMD substrates

Genome-wide analyses of gene expression have indicated that NMD impacts the expression of ~3-10% of endogenous genes across eukarya (Rehwinkel et al., 2006). Although this supports the conclusion that NMD plays a role in the regulation of endogenous gene expression in addition to its function in RNA surveillance, predicted targets show little conservation across species (Rehwinkel et al., 2006), bringing into question whether NMD has a conserved role in the regulation of specific cellular processes. Furthermore, many NMD-sensitive mRNAs have no identified feature that would explain their targeting to NMD, suggesting we do not yet have a clear picture of why NMD targets endogenous mRNAs.

Several studies have globally identified NMD targets in yeast using a number of methods: 1) increased steady-state abundance in the absence of NMD (He et al., 2003; Lelivelt and Culbertson, 1999), 2) association with NMD factor UPF1 (Johansson et al., 2007), 3) decreased rate of decay in the absence of NMD after transcriptional inhibition (Guan et al., 2006), and 4) rapid destabilization of mRNAs upon induction of NMD (Johansson et al., 2007). Notably, these studies demonstrate weak overlap of predicted yeast NMD targets. However, these discrepancies may be explained in part by caveats of the experimental approaches taken.

Several of these studies identified NMD-sensitive RNAs based on a change in steady-state RNA abundance when NMD activity is eliminated (He et al., 2003; Lelivelt and Culbertson, 1999). Importantly, a change in steady-state RNA abundance cannot distinguish direct NMD substrates from mRNAs whose expression changes secondarily, which may lead to the misidentification of NMD targets. Furthermore, this approach also cannot identify direct NMD substrates whose abundance does not change significantly at steady-state in the absence of NMD despite a decrease in decay rate (as documented in Chapin et al., 2014; Tani et al., 2012), which may cause true NMD targets to be missed. To address these issues, alternate approaches specifically designed to identify direct NMD substrates were used, including monitoring the rate of RNA decay when NMD is inhibited or reactivated (Guan et al., 2006; Johansson et al., 2007) or identifying mRNAs which directly bind to NMD factors (Johansson et al., 2007). However, observations that UPF1 can bind both NMD-sensitive and NMD-insensitive RNAs (Hwang et al., 2010; Johns et al., 2007; Zund et al., 2013), and that only phosphorylated UPF1 may selectively mark NMD substrates in human cells (Kurosaki et al., 2014), suggest that even the latter approach may lead to inaccuracies in identifying targets of NMD. The most accurate, but technically challenging, way of identifying mRNAs directly targeted by NMD is to identify those whose decay rate is slowed when NMD is eliminated.
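The limits of steady-state measurements described above can be made concrete with a small Python sketch. It assumes a textbook model that is not stated in the text: constant synthesis at rate k_syn and first-order decay at rate k_deg, so that steady-state abundance is k_syn / k_deg and half-life is ln(2) / k_deg. The function names and rate values are hypothetical and chosen only for illustration.

```python
import math


def steady_state_abundance(k_syn: float, k_deg: float) -> float:
    """Steady-state RNA level for constant synthesis and first-order decay."""
    if k_deg <= 0:
        raise ValueError("k_deg must be positive")
    return k_syn / k_deg


def half_life(k_deg: float) -> float:
    """RNA half-life implied by a first-order decay rate constant."""
    if k_deg <= 0:
        raise ValueError("k_deg must be positive")
    return math.log(2) / k_deg


# Hypothetical wild-type rates (arbitrary units).
wt = {"k_syn": 10.0, "k_deg": 0.2}

# Direct target: decay slows two-fold when NMD is eliminated.
direct = {"k_syn": 10.0, "k_deg": 0.1}

# Indirect target: synthesis doubles as a secondary effect; decay is unchanged.
indirect = {"k_syn": 20.0, "k_deg": 0.2}

# Masked direct target: decay slows two-fold but synthesis also falls two-fold.
masked = {"k_syn": 5.0, "k_deg": 0.1}
```

Under these assumed rates, the direct and indirect targets both show a two-fold rise in steady-state abundance, yet only the direct target has a longer half-life, while the masked direct target shows no abundance change at all despite its slower decay. This is why decay-rate measurements are the more reliable criterion.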

Additionally, previous studies to identify yeast NMD targets all assessed mRNA levels using microarray analysis, which has several limitations. Microarrays can suffer from inconsistent or inadequate detection of low-abundance NMD-sensitive RNAs (Wang et al., 2009). Furthermore, detection of mRNA expression by cDNA microarray is limited to known mRNA isoforms which represent a substantial fraction of the RNA produced from a gene locus. For example, a classic endogenous NMD substrate in yeast is the CYH2 pre-mRNA, which is inefficiently spliced due to weak consensus splice sites and escapes as an unspliced transcript into the cytoplasm (He et al., 1993). The pre-mRNA is robustly targeted by NMD due to an early in-frame stop codon within the retained intron. Importantly, previous studies that have attempted to identify NMD targets globally have failed to identify the CYH2 pre-mRNA in their datasets (Guan et al., 2006; He et al., 2003; Johansson et al., 2007; Lelivelt and Culbertson, 1999). One explanation is that, because the pre-mRNA is present at steady-state at much lower levels than the mature mRNA, its change in expression in the absence of NMD is masked by expression of the mRNA. Furthermore, the pre-mRNA itself cannot be uniquely detected because the intronic sequence is often not included on cDNA microarrays. Indeed, this is an issue for any gene that expresses more than one mRNA isoform, particularly given that an NMD-sensitive isoform will likely be low in abundance, and may not have been previously characterized.

Finally, the use of poly(A) selection during cDNA library construction may also impact the identification of NMD substrates in previous studies. While poly(A) selection represents an important step to enrich mRNAs over abundant non-coding RNAs such as ribosomal RNA, it may impact the population of RNA used for downstream analysis. For example, NMD substrates in wild-type cells have long poly(A) tails since they are degraded independent of deadenylation. In contrast, in NMD-deficient cells, NMD targets are degraded through the normal mRNA decay pathway initiated by deadenylation, and a large fraction of the stabilized pool of NMD substrates now have short poly(A) tails at steady-state (data not shown). Therefore, the true fold-change in mRNA abundance may be underestimated by using poly(A) selection and inadvertently excluding deadenylated mRNAs from analysis.

Improving approaches for transcriptome analyses

As noted above, prior studies that have globally identified endogenous NMD substrates in yeast have relied upon array-based technology. While microarrays are effective at detecting large fold-changes in mRNA levels, they display poor sensitivity for detecting low-abundance transcripts (a category into which NMD substrates are predicted to fall) and are unable to identify, and subsequently distinguish between, novel transcript isoforms. In contrast, high-throughput sequencing (i.e., RNA-seq) provides increased sensitivity and a greater dynamic range of detection (Wang et al., 2009). Therefore, the identification of mRNAs sensitive to NMD can be more accurately performed using RNA-seq, which also enables the detection of novel NMD-sensitive mRNA isoforms. Furthermore, the nucleotide-level resolution provided by RNA-seq can be used to identify features that would explain the sensitivity of specific mRNA isoforms to NMD, such as extended or truncated 5’ UTRs, retained introns, or extended 3’ UTRs (see Chapter 1.3). Finally, performing gene expression analysis without poly(A) selection will avoid biasing the RNA population used for analysis and allow detection of the most robust changes in gene expression possible. Interestingly, analysis of the yeast transcriptome in the absence of NMD is expected to lead to the identification of transcript isoforms that have not been annotated due to their low abundance in wild-type cells, ultimately revealing novel insights into alternative transcript processing in S. cerevisiae.

4.2 Results and conclusions

Identification of direct and indirect targets of NMD by RNA-seq

To globally identify RNAs that are sensitive to loss of the NMD pathway in yeast, the transcriptomes of wild-type and NMD-deficient (upf1Δ) yeast cells were analyzed by RNA-seq in duplicate. As described in Chapter 3.2, whole-cell RNA was depleted of rRNA by subtractive hybridization with the Epicenter Human/Mouse/Rat Ribo-Zero kit, and abundant small RNAs were depleted with the Zymo RNA Clean and Concentrator-5 columns using a modified protocol optimized to remove RNAs of less than ~200 nucleotides. The removal of rRNA by subtractive hybridization eliminated the need to generate mRNA-specific libraries with poly(A) selection by oligo-d(T) hybridization or priming. Rather, RNA was converted to cDNA by random priming, and cDNA libraries were generated with a strand-specific protocol such that sequencing reads could be uniquely mapped to their strand of origin. These libraries were analyzed by Illumina HiSeq, producing ~11-22 million uniquely mapped reads per sample (see Table 3-1). mRNAs that displayed a ≥2-fold increase in expression in the absence of NMD were identified using the differential expression program Cuffdiff. In total, 493 mRNAs sensitive to the loss of NMD were identified by this analysis. These mRNAs include both direct substrates of the NMD pathway and indirect targets that increase in abundance due to secondary effects of NMD inactivity.
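The ≥2-fold selection criterion can be illustrated with a minimal sketch. This is not Cuffdiff and does not reproduce its statistical testing; it only shows the thresholding step applied to a table of abundance estimates. The gene names, the abundance values, the function name `upregulated`, and the `floor` parameter are all hypothetical and introduced here for illustration.

```python
# Hypothetical expression table: gene -> (wild-type, upf1-delta) abundance,
# e.g. in FPKM. All values are invented for illustration only.
EXAMPLE = {
    "geneA": (10.0, 45.0),
    "geneB": (120.0, 130.0),
    "geneC": (2.0, 4.0),
    "geneD": (0.0, 3.0),  # undetectable in wild-type
}


def upregulated(table, min_fold=2.0, floor=1.0):
    """Return {gene: fold_change} for genes with upf1/WT >= min_fold.

    `floor` is an assumed lower bound applied to both values so that a
    transcript undetectable in wild-type still yields a finite ratio.
    """
    hits = {}
    for gene, (wt, mutant) in table.items():
        fold = max(mutant, floor) / max(wt, floor)
        if fold >= min_fold:
            hits[gene] = fold
    return hits


if __name__ == "__main__":
    for gene, fold in sorted(upregulated(EXAMPLE).items()):
        print(f"{gene}\t{fold:.2f}")
```

A ratio-only filter of this kind cannot separate direct NMD substrates from indirect targets, which is the limitation noted in the text.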

The list of NMD-sensitive mRNAs identified by RNA-seq was compared to a published list of mRNAs that demonstrated a ≥2-fold increase in expression in upf1Δ yeast, as monitored by microarray (He et al., 2003). The majority of mRNAs identified as NMD substrates by RNA-seq (339/493, 68.8%; Figure 4-1A) were also identified in this previous analysis. This strong but imperfect overlap supports the validity of the RNA-seq dataset while also highlighting expected differences between the two technologies. Notably, 154 mRNAs were not previously identified as being sensitive to NMD, which may be explained in part by the absence of poly(A) selection in the preparation of cDNA libraries and increased sensitivity of RNA-seq versus microarrays.
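The overlap percentage quoted above follows directly from the counts shown in Figure 4-1A. The short sketch below recomputes it; the function name `overlap_summary` is hypothetical, and reading the third count (372) as mRNAs found only by microarray is an interpretation of the Venn diagram rather than a number stated in the text.

```python
def overlap_summary(rnaseq_only, shared, microarray_only):
    """Summarize a two-set overlap from the three Venn diagram counts."""
    rnaseq_total = rnaseq_only + shared
    microarray_total = shared + microarray_only
    return {
        "rnaseq_total": rnaseq_total,
        "microarray_total": microarray_total,
        "pct_rnaseq_shared": 100.0 * shared / rnaseq_total,
        "pct_microarray_shared": 100.0 * shared / microarray_total,
    }


if __name__ == "__main__":
    # Counts as read from Figure 4-1A.
    summary = overlap_summary(rnaseq_only=154, shared=339, microarray_only=372)
    for key, value in summary.items():
        print(f"{key}: {value:.1f}")
```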

Many NMD-sensitive mRNAs contained features known to induce NMD, although whether a specific feature directly contributed to the NMD-sensitivity of an mRNA could not be concluded without further investigation. Of the 493 NMD-sensitive mRNAs, 19 (3.9%) have been previously identified to harbor long 3’ UTRs (Kebaara and Atkin, 2009), while an additional 34 contained uORFs within the 5’ UTR that showed evidence of translation by ribosome profiling (Figure 4-1B; Ingolia et al., 2009). Qualitative analysis based upon visual inspection of mapped RNA-seq read coverage at all 493 NMD-sensitive loci also identified other features that are often enriched among NMD-sensitive mRNAs, such as atypically long 5’ or 3’ UTRs (indicated as “upstream transcription” or “downstream transcription”, respectively), or contiguous transcription across two adjacent gene loci (indicated as “multi-gene transcription”; Figure 4-1B). Many of the RNAs with no apparent NMD-inducing features may represent indirect targets that are upregulated secondarily in the absence of the NMD pathway.

[Figure 4-1. Panel A: Venn diagram; RNA-seq only, 154; both, 339; microarray only (He et al., 2003), 372. Panel B: pie chart; downstream transcription (8.9%), long 3’ UTR (3.9%), upstream transcription (6.5%), translated uORF (6.9%), multi-gene transcription (8.3%), undetermined (65.5%).]

Figure 4-1. Characteristics of NMD-sensitive mRNAs identified by RNA-seq. A) Overlap between NMD targets identified by RNA-seq (green) and a published list of targets obtained by microarray (red). B) Possible NMD-inducing characteristics were identified for NMD-sensitive mRNAs. Features known to induce NMD include extension of the 3’ UTR (green) identified either by visual inspection (dark) or comparison to published data (light; Kebaara and Atkin, 2009), uORFs within the 5’ UTR (red) predicted based on longer 5’ UTR by visual inspection (dark) or comparison to ribosome profiling data (light; Ingolia et al., 2009), and transcription across multiple adjacent gene loci identified by visual inspection (blue).

To validate the RNA-seq transcriptome analysis results, a number of mRNAs predicted to be NMD-sensitive were analyzed by northern analysis (Figures 4-2A and 4-2C). Of 8 mRNAs analyzed, all showed an increase in abundance in cells lacking NMD (upf1Δ), and in many cases the fold-change measured by the two assays was comparable (Figure 4-2C). Interestingly, northern analysis provided preliminary evidence for the presence of more than one mRNA isoform for the ISC10 gene locus. Based on the qualitative assessment of RNA-seq data (Figure 4-1B), this locus was predicted to produce at least one mRNA isoform which spans part of both the ISC10 and SLO1 genes (Figure 4-2B). Only the large isoform of ISC10, predicted to be this readthrough transcript, showed sensitivity to NMD (Figure 4-2A, bottom). These data provided a list of NMD-sensitive mRNAs in budding yeast identified using the high-sensitivity method of RNA-seq, and included evidence of novel isoform-specific sensitivity of mRNAs to NMD in yeast.

[Figure 4-2. Panel A: northern blots for AIF1, SLO1, and ISC10 (transcripts marked * and **) with SCR1, in WT and upf1Δ. Panel B: RNA-seq read coverage (scale 0-800) in WT and upf1Δ across chrV 550,000-551,000, spanning ISC10 and SLO1. Panel C: table of fold upregulation, reproduced below.]

Fold upregulated
Gene     Northern   RNA-seq
AIF1     3.73       15.24
ISC10    3.24       6.97
SLO1     7.75       6.97
YPT35    2.74       2.69
BSC5     1.86       5.25
DAL7     1.72       9.65
ULI1     7.83       9.56
ESF1     5.10       2.81

Figure 4-2. Validation of RNA-seq by northern analysis. A) Representative northern blots for 3 mRNAs identified as sensitive to NMD by RNA-seq. ISC10 (bottom panel) expresses two transcripts, a smaller transcript (*) and a larger one likely to be a transcriptional readthrough product (**). SCR1 serves as a loading control. B) RNA-seq coverage for the ISC10 locus supports the presence of a long RNA isoform that spans two adjacent gene loci. Y axis indicates read coverage. C) Quantification of change in RNA abundance, by both northern analysis and RNA-seq, for all mRNAs validated by northern analysis.

Identifying transcript features unique to NMD-sensitive RNA isoforms

To identify RNA features that are associated with NMD sensitivity in a high-throughput and unbiased manner, a bioinformatic approach was developed. The “Find_NMD_Features” tool calculates RNA-seq read coverage at nucleotide resolution across the yeast genome, and identifies regions where the read coverage is increased in upf1Δ relative to wild-type cells. This can identify NMD-sensitive regions independent of gene annotations or assembled transcripts. This tool enables the identification of regions that may be unique to an NMD-sensitive isoform, such as a long 3’ UTR (Figure 4-3A) or retained intron (Figure 4-3B), which would otherwise be missed by quantifying the average change in expression across an entire gene. It can also identify both annotated and unannotated full-length RNAs sensitive to NMD.

As proof-of-principle, the Find_NMD_Features tool was used to search for known NMD-sensitive mRNA isoforms. Briefly, the average fold-change in expression between wild-type and upf1Δ was determined within a sliding window 10 nucleotides long, and the window was expanded to cover a contiguous section of the genome demonstrating a similar fold-change in expression. If the region was at least 50 nucleotides long, had an average coverage of at least 10 reads in upf1Δ, and displayed an average fold-change ≥2 across the region, it was identified as an NMD “feature.” The CYH2 pre-mRNA is a robust NMD substrate due to an in-frame termination codon contained within its inefficiently spliced intron (He et al., 1993). Although previous global transcriptome studies have failed to identify the CYH2 pre-mRNA (see Chapter 4.1), an NMD feature was identified that corresponded precisely to the CYH2 intron (Figure 4-3C), the region unique to the pre-mRNA and where the increase in expression can be detected because it is not masked by signal from the mRNA. This indicated that the CYH2 pre-mRNA specifically is sensitive to NMD.
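The windowing procedure described above can be sketched in a few lines. This is an illustrative reconstruction, not the Find_NMD_Features implementation: the function name `find_features`, the pseudocount, and the rule used to extend a region (each successive 10-nucleotide window must itself pass the fold-change threshold) are assumptions, since the text specifies only the window size and the three acceptance criteria. Input is assumed to be per-nucleotide coverage for one strand of one chromosome.

```python
def find_features(wt, mut, window=10, min_len=50, min_cov=10.0,
                  min_fold=2.0, pseudo=1.0):
    """Return half-open (start, end) intervals of candidate NMD features.

    wt, mut: per-nucleotide read coverage in wild-type and upf1-delta.
    pseudo:  ASSUMED pseudocount that keeps ratios finite at zero coverage.
    """
    if len(wt) != len(mut):
        raise ValueError("coverage tracks must have equal length")

    def fold(start, end):
        # Ratio of mean coverage (mutant / wild-type) over [start, end).
        n = end - start
        return (sum(mut[start:end]) / n + pseudo) / (sum(wt[start:end]) / n + pseudo)

    features = []
    pos, n = 0, len(wt)
    while pos + window <= n:
        if fold(pos, pos + window) < min_fold:
            pos += 1  # slide the seed window by one nucleotide
            continue
        # Seed found: extend window by window while the threshold holds.
        end = pos + window
        while end + window <= n and fold(end, end + window) >= min_fold:
            end += window
        length = end - pos
        mean_mut = sum(mut[pos:end]) / length
        # Apply the three acceptance criteria given in the text.
        if length >= min_len and mean_mut >= min_cov and fold(pos, end) >= min_fold:
            features.append((pos, end))
        pos = end
    return features


if __name__ == "__main__":
    # Synthetic locus: an 80-nt region that is low in WT and high in upf1-delta.
    wt = [20] * 100 + [5] * 80 + [20] * 100
    mut = [20] * 100 + [40] * 80 + [20] * 100
    print(find_features(wt, mut))
```

Because the seed and extension steps operate on whole windows, the boundaries reported by this sketch can be off by up to half a window on either side; a production tool would likely refine them at single-nucleotide resolution.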

A second previously described NMD-sensitive transcript isoform is an RNA which spans the MAK31-PET18 locus (Johansson et al., 2007). Due to transcription initiation upstream of the canonical PET18 transcription start site, part of the MAK31 ORF is included in the 5’ UTR and functions as a uORF. Find_NMD_Features identified the region spanning MAK31 and PET18 as an NMD-sensitive feature (Figure 4-3D). Finally, the novel readthrough transcript detected by northern analysis at the ISC10 locus (Figure 4-2A, bottom) was also identified (Figure 4-3E), demonstrating the ability of this tool to identify previously uncharacterized NMD-sensitive transcript isoforms. These three examples indicate that the Find_NMD_Features tool can be used for further interrogations of the NMD-sensitive transcriptome, and should facilitate the identification of specific unknown RNA isoforms sensitive to NMD.

4.3 Discussion

A long-standing question in the field of NMD research is how, why, and which endogenous mRNAs are targeted for decay by this quality control pathway. Investigation of the NMD-sensitive transcriptome in yeast by RNA-seq aims to address the latter two parts of this question. The current analysis of the NMD-sensitive transcriptome at steady-state includes both direct and indirect targets of NMD. Importantly, any changes in gene expression upon the loss of NMD can impact cellular activity, whether these result from stabilization of RNAs that are directly targeted by NMD or as a secondary effect. Therefore, knowing the global impact of NMD is important for understanding its biological relevance. Nonetheless, distinguishing direct versus indirect targets of NMD will be essential to identify features that directly cause endogenous mRNAs to become NMD substrates. More sophisticated methods, described in Chapter 4.4, will be required to parse out direct targets of the NMD pathway from this dataset.

[Figure 4-3. Panels A and B: schematics showing a gene ORF, its NMD-insensitive and NMD-sensitive isoforms, read coverage in WT and upf1Δ, fold change along the locus, and the resulting feature (long 3’ UTR in A; retained intron in B). Panels C-E: RNA-seq read coverage in WT and upf1Δ, with identified features, at CYH2 (chrVII 311,000-312,000; scale 0-40,000), PET18/MAK31 (chrIII 154,000-155,000; scale 0-40,000), and ISC10/SLO1 (chrV 550,000-551,000; scale 0-800).]

Figure 4-3. A tool for identifying known and unique NMD-sensitive transcript regions. A) A theoretical example where an RNA isoform containing a long 3’ UTR (pink) is sensitive to NMD and exhibits low expression in wild-type cells relative to a more stable isoform lacking a long 3’ UTR. The long 3’ UTR can be identified as a feature sensitive to NMD (red) based on an increase in expression in this region that is not observed when expression across the entire gene locus is quantified. B) As in (A), a theoretical example where the NMD-sensitive RNA isoform contains a retained intron (pink). C) Known NMD substrate CYH2 pre-mRNA. The CYH2 locus is not identified as an NMD substrate based on quantification across the entire gene, but Find_NMD_Features identifies the retained intron in the pre-mRNA as expected. D) Transcriptional readthrough at the PET18-MAK31 locus creates an NMD-sensitive transcript spanning both loci, which is identified. E) Transcriptional readthrough at the ISC10 locus creates an NMD-sensitive transcript spanning two adjacent gene loci, which is identified.

A number of individual examples of a gene expressing multiple mRNA isoforms that are differentially regulated by NMD have been documented, most commonly generated by alternative pre-mRNA splicing. Because regulated alternative splicing is generally considered to be negligible in yeast, it is unclear what the contribution of isoform-specific regulation by NMD may be in this model eukaryote. However, recent studies have identified numerous examples of yeast genes producing more than one RNA isoform through selection of cryptic splice sites, alternative transcription start sites, or alternative sites of 3’ end processing (Arribere and Gilbert, 2013; Brar et al., 2012; Kawashima et al., 2014; Law et al., 2005; Nagalakshmi et al., 2008). Therefore, although S. cerevisiae is considered to express a relatively simple transcriptome in comparison to higher eukaryotes, it is still a reservoir for the production of many alternative transcript isoforms.

Furthermore, these processes are known to produce RNA isoforms with differential sensitivity to NMD. Cryptic splicing can generate NMD-sensitive transcripts by introducing intronic PTCs or translational frameshifts (Kawashima et al., 2014). Changes in 5’ UTR length can alter selection of the translation initiation codon, leading to out-of-frame translation (Arribere and Gilbert, 2013). Finally, alternative 5’ UTRs may include or exclude uORFs (Brar et al., 2012), a feature capable of inducing NMD when translated.

There is also precedent for the identification of novel transcript isoforms upon their stabilization in the absence of NMD. A recent study of gapped RNA-seq read alignments in upf1Δ yeast discovered that widespread alternative splicing generated NMD-sensitive transcripts not previously identified (Kawashima et al., 2014). Investigation of the transcriptome in the absence of NMD not only provides information about endogenous NMD substrates, but can also reveal general information about RNA processing in yeast. The Find_NMD_Features bioinformatic tool is specifically designed to identify NMD-sensitive RNA isoforms with no prior knowledge of the isoform structure required.

Advances in the technologies available to globally investigate gene expression will continue to improve the accuracy of identifying NMD substrates, through both increased sensitivity and novel methods of analysis. The accurate identification of RNAs sensitive to NMD is essential for understanding the global impact of NMD on gene expression and gaining insight into what proportion of NMD activity extends beyond surveillance alone. Additionally, identification of direct NMD substrates may inform our understanding of how NMD substrates are recognized, another compelling area of research.

4.4 Future directions

Previous investigations of NMD-sensitive mRNAs in yeast, including the data presented here, have provided support for isoform-specific mRNA sensitivity to NMD. To extend this analysis to the entire NMD-sensitive transcriptome, a tool to identify features that are uniquely present in NMD-sensitive isoforms has been developed. In future analyses, specific examples of NMD-sensitive features identified by Find_NMD_Features can be validated by northern analysis and/or 5’ or 3’ end mapping. From these more detailed investigations, global trends of features common to NMD-sensitive mRNAs may emerge. These data will serve as a valuable resource for continued research into the identification and characterization of endogenous NMD substrates.

Genes regulated both directly and indirectly by NMD are included in the present analyses. However, progress has begun towards a second approach that can be used to identify direct substrates of the NMD pathway, based on mRNA stabilization in the absence of NMD, by globally monitoring mRNA half-lives. Using the temperature-sensitive rpb1-1 mutation in a subunit of RNA Polymerase II (Pol II), transcription can be globally inhibited upon a shift in temperature from 24 °C to 37 °C (Nonet et al., 1987), allowing the decay of the existing pool of mRNA to be monitored over time. Preliminary data comparing rpb1-1 to rpb1-1/upf1Δ yeast indicated that transcription by Pol II was efficiently inhibited upon shift to 37 °C, and the NMD-sensitive CYH2 pre-mRNA was both higher in abundance (Figure 4-4A) and stabilized (Figure 4-4B) in the absence of NMD (rpb1-1/upf1Δ). The identification of direct targets of the NMD pathway based on a change in decay rate following transcriptional arrest has been previously performed by microarray analysis (Guan et al., 2006). This study led to the observation that most NMD substrates undergo biphasic decay, with the majority of the mRNA degraded very quickly following transcriptional arrest. However, in this study, an insufficient number of timepoints collected early after transcriptional arrest made it difficult to adequately monitor the fast initial rate of decay of NMD substrates. Therefore, we will use a short time-course biased towards very early timepoints, with six timepoints collected within 15 minutes of transcriptional arrest, to more accurately calculate the half-lives of unstable NMD substrates. Additionally, performing this analysis by RNA-seq should provide increased sensitivity and thus generate a more accurate and complete list of NMD substrates.

[Figure 4-4. Panel A: northern blots of CYH2 pre-mRNA, CYH2 mRNA, and SCR1 (loading control) in rpb1-1 and rpb1-1/upf1Δ at 0, 1.5, 3, 4.5, 7.5, and 15 minutes after transcriptional shutoff. Panel B: CYH2 pre-mRNA percent remaining (log scale, 10-100) versus time (0-15 min) for rpb1-1 and rpb1-1/upf1Δ.]

Figure 4-4. Global transcriptional inhibition using a temperature-sensitive mutation in RNA Polymerase II. A) Northern analysis following shift to non-permissive temperature to globally inhibit transcription by RNA Polymerase II. CYH2 pre-mRNA is an endogenous NMD substrate; SCR1 serves as a loading control. B) Quantification of CYH2 pre-mRNA in (A). Half-life in rpb1-1 is ~3 minutes; half-life in rpb1-1/upf1Δ is ~15 minutes.
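Half-lives from a transcriptional shutoff time-course are commonly estimated by fitting first-order decay to the fraction of RNA remaining. The sketch below shows one way to do this with a log-linear least-squares fit; it assumes single-exponential decay, which would not capture the biphasic behavior reported by Guan et al. (2006). The function names are hypothetical, the timepoints mirror those in Figure 4-4A, and the "percent remaining" values are synthetic, generated from an assumed half-life rather than taken from the blots.

```python
import math


def half_life(times, percent_remaining):
    """Fit ln(percent) = a - k*t by least squares and return ln(2)/k."""
    ys = [math.log(p) for p in percent_remaining]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("no decay detected")
    return math.log(2) / -slope


def synthetic_decay(times, true_half_life):
    """Generate ideal percent-remaining values for a given half-life."""
    return [100.0 * 0.5 ** (t / true_half_life) for t in times]


if __name__ == "__main__":
    # Minutes after transcriptional shutoff, as in Figure 4-4A.
    times = [0, 1.5, 3, 4.5, 7.5, 15]
    for label, hl in (("short-lived", 3.0), ("stabilized", 15.0)):
        estimate = half_life(times, synthetic_decay(times, hl))
        print(f"{label}: {estimate:.2f} min")
```

With real data, fitting only the early timepoints separately from the late ones would be one way to look for a fast initial decay phase.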

CHAPTER 5: CONCLUDING REMARKS

Investigation of the nonsense-mediated RNA decay pathway will continue to be of great interest, both for understanding basic biological mechanisms and for eventual applications in treating human disease. Major questions in understanding the fundamentals of NMD remain. How are PTC-containing RNAs distinguished from normal RNAs? What is the precise sequence of steps that occurs downstream of recognition of an RNA by the NMD pathway? Does the major role of NMD reside in its surveillance function, or in its impact on gene expression through fine-tuning levels of endogenous NMD-sensitive RNAs? The work presented here represents an important advance toward understanding the complex NMD pathway.

The development of a protocol to characterize mRNP complexes from a single mRNA species as they exist in vivo was a challenging task due to the diversity of RNAs expressed in a cell and the scale required to purify sufficient material for downstream analyses. This technique will now be invaluable for future studies into mRNA regulation and metabolism. The mRNP pulldown protocol has facilitated the identification of proteins that are highly associated with an NMD-sensitive mRNA, including known NMD factor UPF1. This dataset can now serve as a rich resource for identifying candidates that may help define the context of translation termination and lead to NMD-substrate recognition.

NMD is known to affect the expression of a number of endogenous mRNAs. However, RNAs originally annotated as lncRNAs represent a novel RNA class displaying sensitivity to NMD, previously overlooked due to their classification as non-coding. The discovery that RNAs within this class are recognized by the NMD pathway has expanded the scope of the role of NMD in the cell. Although the importance of this regulation is not yet clear, an interesting implication is that NMD serves a critical role in the regulation of lncRNA activity. This novel role of NMD deserves ongoing investigation in the future.

Finally, it has recently become appreciated that NMD is responsible for the down-regulation of many mRNA isoforms. Some of these isoforms may be produced by low fidelity RNA processing, while others may feed into regulatory networks potentially fine-tuned by NMD. It is likely that the down-regulation of many RNA isoforms by NMD has hindered their discovery, suggesting that investigation of the transcriptome in the absence of NMD may greatly expand our understanding of alternative RNA processing events. Furthermore, only through examining endogenous NMD-sensitive RNAs can the contribution of NMD in quality control versus gene expression regulation be uncovered.

Existing knowledge of the NMD pathway provided a strong foundation upon which the investigations presented here were built. However, many details of NMD remain enigmatic. Future investigations into this specialized RNA decay pathway, particularly with improved experimental approaches and methods of analysis, should yield important insights into NMD.

APPENDIX A - GENERAL EXPERIMENTAL METHODS

Details regarding all yeast strains, oligos, and plasmids are deposited in databases in the Baker Lab or Coller Lab. Unless otherwise noted, all yeast strains are isogenic to the Resgen strain background (BY4741). Detailed protocols for many procedures can be found online in the Coller Lab Protocol Book (http://www.case.edu/med/coller/Coller%20Protocol%20Book.pdf). Procedures from Chapter 3 are also available as supplemental information in Smith et al., 2014.

Strain construction - Yeast strains, including gene deletions and endogenous protein tagging, were made using standard homologous recombination methods (Longtine et al., 1998), unless otherwise noted. Plasmids were constructed using standard cloning techniques. Plasmids were transformed into yeast strains using standard lithium acetate transformation.

Growth conditions - Yeast were grown in synthetic media supplemented with 2% glucose and appropriate amino acids, under standard laboratory conditions (30 °C at 250 RPM) to mid-log phase (optical density [OD] at 600 nm wavelength = 0.4), unless otherwise noted. To harvest yeast cells, cultures were centrifuged at 4000 RPM for 5 minutes, the cell pellet was transferred to a 1.5 or 2 mL Eppendorf tube, and cells were collected at 13,200 RPM for 2 minutes. Cells were flash frozen on dry ice, and stored indefinitely at -80 °C.

RNA analysis by northern blotting - 50 mL of yeast culture was grown in synthetic medium to mid-log phase and harvested. Whole-cell RNA was isolated using standard glass-bead lysis and phenol extraction (Geisler et al., 2012). 25-40 µg of RNA was separated by agarose gel electrophoresis on a 1.4% agarose gel with 5.92% formaldehyde. RNA was transferred to a Hybond-N nylon membrane and immobilized with UV crosslinking. Membranes were washed in 0.1X SSC/0.1% SDS for 1 hour at 65 °C, incubated for 1 hour in hybridization buffer (10X Denhardt’s solution, 6X SSC, 0.1% SDS), and probed overnight in hybridization buffer with either 5’ 32P end-labelled DNA oligonucleotides or α-32P CTP probes generated by asymmetric PCR (Rio et al., 2011) at individually optimized temperatures, to detect the RNA of interest. Excess probe was washed from the membrane three times for 15 minutes with 6X SSC/0.1% SDS at individually optimized temperatures. The membrane was exposed to a storage phosphor screen and developed using a GE Storm 820 or GE Typhoon 9400 Variable Mode Imager. Signal intensity was quantified using ImageQuant software.

Protein analysis by western blotting - Cell pellets from cultures grown to mid-log phase were heated in 5 M urea at 95 °C for 2 minutes, then lysed by mechanical disruption with glass beads by vortexing for 5 minutes. Solution A (125 mM Tris-HCl, pH 6.8, 2% SDS) was added to lysates, followed by vortexing for 1 minute and heating to 95 °C for 2 minutes. Glass beads and cellular debris were cleared from lysates by centrifugation at 13,200 RPM for 4 minutes. Equivalent OD units (A260) of lysate in 1X SDS sample buffer (125 mM Tris-HCl, pH 6.8, 2% SDS, 100 mM DTT, 10% glycerol, 0.05% bromphenol blue) were separated on SDS-PAGE gels by electrophoresis in 1X Laemmli buffer (25 mM Tris base, 192 mM glycine, 0.1% SDS) or on NuPAGE Novex 4-12% Bis-Tris gels by electrophoresis in 1X MOPS SDS running buffer (50 mM MOPS, 50 mM Tris base, 0.1% SDS, 1 mM EDTA, pH 7.7). Proteins were transferred to an Immobilon-P PVDF transfer membrane in 1X western transfer buffer (25 mM Tris base, 192 mM glycine, 20% methanol) at 4 °C by electroblotting at 250 mA for 2 hours. The membrane was blocked in blocking buffer (5% milk powder in 1X TBS/T [1X Tris-buffered saline, 0.1% Tween-20]) overnight at 4 °C. The membrane was incubated with primary antibodies and secondary antibodies in blocking buffer, each for 1 hour. Between incubations, the membrane was washed with 1X TBS/0.1% Tween-20 three times for 15 minutes. Signal was detected by chemiluminescence using Blue Ultra Autorad film. Quantification was performed with ImageJ software.

APPENDIX B - mRNP PULLDOWN PROTOCOL

1. Grow 200 mL culture of cells to OD600 = 0.4. Crosslink with formaldehyde to 0.25% for 15 minutes at 30 °C, followed by quenching with glycine to 0.125 M for 5 minutes at 30 °C. Harvest cells at 4000 RPM for 1 minute, and decant media. Resuspend pellet in residual media and transfer to 2 mL Eppendorf tube. Pellet cells at 13,200 RPM and flash freeze on dry ice.

2. Perform polysome lysis procedure as described:
   a. Supplement 1X lysis buffer (10 mM Tris pH 7.4, 100 mM NaCl, 30 mM MgCl2, 1 Roche complete EDTA-free protease inhibitor cocktail tablet) with DTT to 1 mM.
   b. Resuspend each cell pellet in 400 µL 1X lysis buffer with brief vortex.
   c. Add ½ volume of sterile glass beads.
   d. Vortex at 4 °C for 3 minutes.
   e. Place on ice for 2 minutes.
   f. Vortex at 4 °C for 3 minutes.
   g. Place on ice for 2 minutes.
   h. Vortex at 4 °C for 3 minutes.
   i. Puncture the bottom of the 2 mL tube with a red hot 18 gauge needle and place in 15 mL conical vial. Spin at 2000 RPM for 2 minutes at 4 °C.
   j. Transfer supernatant to a cold 1.5 mL Eppendorf tube, and adjust volume for any lost buffer/sample by bringing up to 400 µL with additional lysis buffer.
   k. Resuspend remaining cell debris in lysis buffer and transfer to a 2 mL tube. Pellet at 13,200 RPM. Use pellet to repeat lysis (steps b-j) and combine with first lysate.
   l. Quantify lysate OD260 and OD280 by diluting 5 µL in 995 µL dH2O, using lysis buffer in dH2O as a blank.
   m. Bring the volume of 800 µL lysate (2 lyses of a single 200 mL pellet) up to 3 mL with 2.2 mL TE buffer (10 mM Tris, 1 mM EDTA, pH 8.0). Save a small amount of input, ideally 1/10 for RNA or 1/20 for protein.
   n. Prepare 5 mL hybridization reactions for each sample in a 15 mL conical vial. (This can be scaled up to 10 mL in a 15 mL conical vial with no decrease in efficiency.)
      3 mL lysate in TE buffer
      500 mM LiCl (500 µL 5 M stock)
      0.5% SDS (250 µL 10% stock)
      50 mM EDTA (500 µL 0.5 M stock)
      10 mM Tris, pH 7.5 (50 µL 1 M stock)
      14% formamide (700 µL)
      Fungal protease inhibitors (5 µL)
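The final concentrations listed for the hybridization reaction in step 2n follow from the stock concentrations and volumes by simple dilution. The sketch below checks that arithmetic against a nominal 5 mL total. It treats formamide as an undiluted (100% v/v) stock, which is an assumption, and it ignores the Tris and EDTA already contributed by the TE and lysis buffers. All variable and function names are hypothetical.

```python
# Nominal final volume of one hybridization reaction (step 2n), in µL.
TOTAL_UL = 5000.0

# (component, stock concentration, volume added in µL, concentration unit)
# Stock values are taken from step 2n, except formamide, which is ASSUMED
# to be added undiluted (100% v/v).
COMPONENTS = [
    ("LiCl", 5000.0, 500.0, "mM"),
    ("SDS", 10.0, 250.0, "%"),
    ("EDTA", 500.0, 500.0, "mM"),
    ("Tris, pH 7.5", 1000.0, 50.0, "mM"),
    ("formamide", 100.0, 700.0, "%"),
]

LYSATE_UL = 3000.0      # lysate in TE buffer
INHIBITOR_UL = 5.0      # fungal protease inhibitors


def final_concentration(stock, volume_ul, total_ul=TOTAL_UL):
    """Dilution rule: C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_ul


def summed_volume_ul():
    """Sum of every volume listed in step 2n, in µL."""
    return LYSATE_UL + INHIBITOR_UL + sum(v for _, _, v, _ in COMPONENTS)


if __name__ == "__main__":
    for name, stock, vol, unit in COMPONENTS:
        print(f"{name}: {final_concentration(stock, vol):g} {unit}")
    print(f"summed volume: {summed_volume_ul():g} µL")
```

The listed volumes sum to slightly more than the nominal 5 mL because of the 5 µL of protease inhibitors, a difference small enough not to change the stated concentrations meaningfully.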

3. Immobilize biotinylated DNA oligonucleotides to streptavidin Dynabeads
   a. Resuspend MyOne C1 Streptavidin Dynabeads completely by pipetting, and transfer enough beads to 1.5 mL Eppendorf tube for all samples. Use 250 µL beads per 200 mL cell pellet.
   b. Place beads on magnet and remove supernatant.
   c. Wash beads with an equal volume of 1X B&W buffer (5 mM Tris-HCl pH 7.5, 0.5 mM EDTA, 1 M NaCl).
   d. Remove wash, and repeat twice.
   e. Incubate beads with biotinylated oligonucleotide for 15 minutes at room temperature, in a total volume 4X the original bead volume:
      i. 2X volume of 2X B&W buffer
      ii. 4 nmol biotinylated oligo per 100 µL beads (10 µL of 400 pmol/µL stock per 100 µL beads)
      iii. DEPC dH2O to 4X volume
   f. Wash beads 2X in 1X B&W buffer, using ½ the incubation volume. At each step, place beads on magnet and remove and discard supernatant.
   g. Resuspend beads in 1X B&W buffer to original volume, and aliquot volume of beads needed per sample into individual tubes.

4. Anneal RNA to beads

a. Use an aliquot of each sample in hybridization buffer to resuspend

Dynabeads, and transfer to 15 mL vial. Parafilm lid.

b. Incubate overnight at room temperature, rotating.

5. Briefly spin conical vials to collect liquid.

6. Remove supernatant from beads. Do this in several steps; keep adding

hybridization solution to same 1.5 mL Eppendorf to collect all beads in one

tube. Save the supernatant if it is to be analyzed.

7. Wash beads:

a. Wash beads 2x in Wash Buffer 1 (10 mM Tris pH 7.5, 1 mM EDTA, 250

mM LiCl, 0.1% SDS), using ~twice the original volume of beads.

b. Wash beads 3x in Wash Buffer 2 (10 mM Tris pH 7.5, 1 mM EDTA, 100

mM LiCl), using ~twice the original volume of beads. After each wash,

collect supernatant, then briefly spin down beads to collect any residual

wash buffer to help ensure complete removal of SDS.

8. For RNA analysis or SDS-PAGE analysis of protein, elute (optional):

a. Resuspend beads in 75 µL dH2O per 100 µL beads (93.75 µL for 125 µL

beads).

b. Heat beads at 70 °C for 2 minutes to break annealing interaction.

c. Place beads immediately on magnet and remove eluate. If any beads are

carried over into eluate, place new tube on magnet and transfer eluate

one additional time to remove beads.

(If this elution is not performed, resuspend beads in Wash Buffer 2 and store

at -80 °C. Continue to step 11.)

9. RNA Analysis:

a. Precipitate samples with 0.2 M NaCl and 1.5 µL GlycoBlue, overnight at

-20 °C with 2.5X volumes of 95% ethanol (EtOH).

b. Pellet RNA at 13,200 RPM for 10 minutes. Wash in 500 µL 75% EtOH,

vortex for 10 seconds, centrifuge at 13,200 RPM for 10 minutes, and

remove EtOH.

c. Resuspend pellet in 450 µL LET buffer (25 mM Tris pH 8.0, 100 mM LiCl,

20 mM EDTA) + 50 µL 10% SDS by vortexing for 5 minutes.

d. Reverse crosslinking by heating at 65 °C for 1 hour.

e. Extract RNA from proteins with an equal volume of phenol/chloroform/LET (50%

v/v phenol, 50% v/v chloroform, equilibrated with LET buffer) followed by

an equal volume of chloroform.

f. Precipitate RNA overnight with 0.2 M NaCl, 1.5 µL GlycoBlue, and 2.5X

volumes of 95% EtOH.

g. Pellet, wash, and air dry RNA. Resuspend in LET buffer for northern

analysis, or DEPC-treated dH2O for qRT-PCR analysis.

10. Protein analysis by SDS-PAGE:

a. Concentrate proteins in SpeedVac on high heat until volume is ~20 µL.

b. Precipitate proteins by adding 4x volumes of cold acetone. Incubate for 1

hour at -20 °C.

c. Spin at 14,000 RPM at 4 °C for 10 minutes to pellet proteins. Wash with

80% acetone.

d. Remove wash and air dry pellet.

e. Resuspend in 1x SDS sample buffer (125 mM Tris-HCl pH 6.8, 2% SDS,

100 mM DTT, 10% glycerol, 0.05% bromphenol blue) by vortexing 5 minutes or

until pellets are completely resuspended. Also run buffer along sides of

tubes extensively.

f. Heat proteins at 70 °C for 1 hour followed by 95 °C for 5 minutes to

reverse crosslinks. Sample can now be analyzed on SDS-PAGE gel.

11. Analyze proteins by mass spectrometry (footnote 2)

Footnote 2: These steps were performed by collaborators in the lab of Amber Mosley at Indiana University; see Appendix C for details.
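
The stock volumes listed for the 5 mL hybridization reaction in step 2n follow from C1·V1 = C2·V2. The Python sketch below reproduces that arithmetic as a check when scaling the reaction. The function names and the dictionary layout are illustrative assumptions, not part of the protocol, and formamide is assumed to be added neat (100%).

```python
def stock_volume_ul(final_conc, stock_conc, total_volume_ul):
    """Volume of stock (µL) giving final_conc in total_volume_ul (C1*V1 = C2*V2).

    final_conc and stock_conc must share units (both mM, or both %).
    """
    return final_conc * total_volume_ul / stock_conc


# Components from step 2n: name -> (final concentration, stock concentration)
HYBRIDIZATION_MIX = {
    "LiCl (mM)": (500, 5000),         # 5 M stock
    "SDS (%)": (0.5, 10),             # 10% stock
    "EDTA (mM)": (50, 500),           # 0.5 M stock
    "Tris pH 7.5 (mM)": (10, 1000),   # 1 M stock
    "formamide (%)": (14, 100),       # assumption: neat formamide
}


def hybridization_volumes(total_volume_ul=5000):
    """Return the stock volume (µL) of each component for one reaction."""
    return {
        name: stock_volume_ul(final, stock, total_volume_ul)
        for name, (final, stock) in HYBRIDIZATION_MIX.items()
    }


if __name__ == "__main__":
    for name, volume in hybridization_volumes().items():
        print(f"{name}: {volume:g} µL")
```

Passing total_volume_ul=10000 doubles each stock volume, consistent with the note that the reaction can be scaled to 10 mL. The lysate in TE buffer and the protease inhibitors are added separately and are not computed here.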

APPENDIX C - EXPERIMENTAL METHODS

RELATED TO CHAPTER 2

Generation of yKB551 - For mRNP pulldown, a strain containing genomic

deletion of DCP2 (dcp2Δ) and three vacuolar proteases (pep4Δ, prb1Δ,

prc1Δ) was constructed by mating and sporulation. yJC529 (MATa, triple

protease deletion, UPF1-HA; not isogenic to BY4741) was crossed with

yKB352 (MATα, dcp2::KanMX6). Diploids were selected on synthetic media

without lysine plus G418 and sporulated at room temperature in sporulation

media (1% potassium acetate, 0.1% yeast extract, 0.05% glucose) until >50%

tetrads were observed by light microscopy. Spores were gently separated with

zymolyase (incubated rotating at 30 °C for 1 hour, glass beads added,

incubated rotating at 30 °C for an additional hour, and vortexed for 2 minutes

to disrupt the ascus). Single spores were plated onto synthetic media without

histidine plus G418 to select for the deletion of dcp2 with the kanamycin

resistance cassette and at least one of the vacuolar protease deletions.

Spores were screened by PCR for deletion of prb1, prc1, and pep4 and

genomic HA tagging of UPF1 to generate yKB551 (not isogenic to BY4741).

qRT-PCR analysis

DNase treatment - Purified RNA was treated with DNase I (incubated at 37 °C for

20 minutes; reaction stopped with 2 µL 0.2 M EDTA), brought to a volume of 400

µL in 2 M NH4OAc, and extracted with an equal volume of phenol/chloroform/

TE (50% v/v phenol, 50% v/v chloroform, equilibrated with TE buffer). RNA

was precipitated with 1.5 µL GlycoBlue and 2.5X volumes of 95% EtOH

overnight at -20 °C. RNA was pelleted at 13,200 RPM for 10 minutes, washed

in 500 µL 75% EtOH, pelleted at 13,200 RPM for 10 minutes, air dried, and

resuspended in DEPC-treated dH2O.

Reverse Transcription - RNA was reverse transcribed into cDNA using Invitrogen

SuperScript III and gene-specific reverse transcription primers. RNA, primers,

and dNTPs were incubated at 70 °C for 5 minutes to denature, then cooled on

ice for 5 minutes. 5X First-Strand buffer, DTT, and SuperScript III enzyme

were added and incubated at 55 °C for 1 hour followed by 70 °C for 15

minutes, to generate cDNA. Serial dilutions were initially performed to

determine the linear range of detection for each target and sample type.

Reactions without SuperScript III enzyme were included to verify the absence of

contaminating DNA in the RNA preparation.

qPCR - Quantitative PCR was performed with Affymetrix SybrGreen Master Mix,

or ClonTech SYBR Advantage qPCR Premix and LSR ROX reference dye.

qPCR was performed on an ABI Step-One instrument, with gene-specific

forward and reverse PCR primers. The qPCR program included denaturation at 95 °C

for 30 seconds, and 40 cycles of 95 °C for 5 seconds and 60 °C for 30

seconds. Melt curves were inspected to verify a single uniform amplicon;

the cycle threshold was manually set at 0.5 for each experiment. qPCR efficiency was

determined for each reaction using 10-fold serial dilutions to generate a

standard curve, and this efficiency (1 = 100% efficient) was used to determine

fold-changes from ΔCt and ΔΔCt values with the equation

fold-change = (1 + PCR_efficiency)^ΔCt. For mRNP pulldown efficiency calculations, ΔCt

indicated depletion or enrichment of RNA species when comparing

supernatant to input or eluate to input; ΔΔCt indicated relative enrichment of

GFP reporter to PGK1 mRNA in eluate.
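
The efficiency-corrected fold-change above can be sketched in Python as follows. The conversion from standard-curve slope to efficiency, E = 10^(-1/slope) - 1, is the conventional relationship for a plot of Ct against log10(template amount) and is an assumption here, since the text does not state how the efficiency was derived from the curve. The function names are illustrative.

```python
def efficiency_from_slope(slope):
    """Amplification efficiency from the slope of Ct vs. log10(template amount).

    Assumption: conventional standard-curve relationship. A slope of about
    -3.32 corresponds to an efficiency of 1 (100%, perfect doubling).
    """
    return 10 ** (-1.0 / slope) - 1.0


def fold_change(delta_ct, efficiency=1.0):
    """Fold-change = (1 + efficiency)^delta_ct, as in the equation above.

    delta_ct may be a ΔCt (e.g. input Ct minus eluate Ct) or a ΔΔCt.
    """
    return (1.0 + efficiency) ** delta_ct


if __name__ == "__main__":
    eff = efficiency_from_slope(-3.4)   # hypothetical standard-curve slope
    print(f"efficiency = {eff:.3f}")
    print(f"fold-change for ΔCt of 3 = {fold_change(3, eff):.2f}")
```

With an efficiency of 1 the expression reduces to the familiar 2^ΔCt; lower efficiencies give smaller fold-changes for the same ΔCt.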

Mass spectrometry analysis (footnote 3) - Purified mRNP (on-bead) was treated with

Benzonase to digest excess nucleic acids. Beads were resuspended in

trypsin digestion buffer (50 mM ammonium bicarbonate, pH 8.5) with 1 µg

Trypsin Gold (Promega), and incubated in a fresh microcentrifuge tube

overnight at 37 °C with shaking for trypsin digestion. Reaction was quenched

with 0.1% formic acid. Residual SDS was removed from sample using HiPPR

Detergent Removal Spin Columns (Pierce). Tryptic peptides were analyzed

by LC-MS using a Proxeon Easy nLC in-line with an LTQ Velos Pro Ion-Trap

MS. Liquid chromatography was performed with a strong cation exchange

resin in tandem with a reverse phase resin, using a 10-step Multidimensional

Protein Identification Technology (MudPIT) protocol. Following MS, RAW

data files were searched against the 08-10-2013 Saccharomyces cerevisiae

database of 5910 non-redundant annotated yeast protein sequences obtained

from NCBI using SEQUEST. For complete details of MS run parameters and

SEQUEST database search parameters, see Smith-Kinnaman et al., 2014.

Footnote 3: These steps were performed by collaborators in the lab of Amber Mosley at Indiana University.

Inhibition of DBP5 expression and assay for NMD - Cells expressing HA-tagged DBP5 under control of the GAL promoter (yKB666) were grown in synthetic media containing glucose to mid-log phase. Cells were collected by centrifugation at 4000 RPM for 2 minutes, and resuspended in 10 mL synthetic media lacking sugar. 1 mL aliquots were removed for RNA or protein analysis, pelleted, media aspirated, and flash frozen on dry ice. To the remaining 8 mL, 1 mL of synthetic media lacking sugar and 1 mL 40% glucose were added, to a final glucose concentration of 4%. Cells were incubated at 30 °C, and 1 mL aliquots were removed at 1, 2, 4, and 6 hr timepoints for both RNA and protein, and harvested as above. RNA was isolated and analyzed by northern blotting, and protein was isolated and analyzed by western blotting, using standard protocols.

Name | Description | Notes | Source
oJC1240 | GGCTAGCAAAGGAGAAGAACTC | Forward qPCR primer for GFP | Sweet et al., 2012
oJC1241 | CCGTATGTTGCATCACCTTC | Reverse transcription primer and reverse qPCR primer for GFP | Sweet et al., 2012
oJC306 | GTCTAGCCGCGAGGAAGG | Antisense probe to SCR1 for northern blotting | This study
oJC357 | TTAGCGTAAAGGATGGGG | Antisense probe to PGK1 for northern blotting | Decker and Parker, 1993
oJC591 | CGGATAAGAAAGCAACACCTGGC | Reverse transcription primer for PGK1 | This study
oKB103 | AAGTAGACTAGTGTAAACATCAGCCAAAGAGCTCAATTCGTGTCTGAAC | Forward qPCR primer for PGK1 | This study
oKB126 | GCTGTCAAGGCTTCTGCCCCAGG | Reverse qPCR primer for PGK1 | This study
oKB172 | TTCATTAGAAGTACAATGGTAGCCC | Reverse PCR primer for UPF1-HA locus screening to generate yKB551 | This study
oKB175 | GCTTACCAGAAACTTACGATGCTATTGTGAAGGAG | Forward PCR primer for UPF1-HA locus screening to generate yKB551 | This study
oKB285 | GGGTAAGTTTTCCGTATGTTGCATCACCTTCACCCTCTCCACTGACAG | Antisense probe to GFP for northern blotting | This study
oKB599 | CCAGTAGTGCAAATAAATTTAAGGGTAAGtcaacatatagattc/3Bio/ | mRNP pulldown oligo; tested but not incorporated into protocol | This study
oKB600 | CAGAAAATTTGTGCCCATTAACATCAtcaacatatagattc/3Bio/ | mRNP pulldown oligo; tested and incorporated into protocol | This study
oKB613 | CCGTATGTTGCATCACCTTCAtctacttgtcgattc/3Bio/ | mRNP pulldown oligo; tested and incorporated into protocol | This study
oKB616 | GAATTGGGACAACTCCAGTGAAGtcaacatatagattc/3Bio/ | mRNP pulldown oligo; tested but not incorporated into protocol | This study
oKB617 | GCCATGGAACAGGTAGTTTTCCtcaacatatagattc/3Bio/ | mRNP pulldown oligo; tested and incorporated into protocol | This study
oKB619 | CTACCATCGAAAGTTGATAGGGCAG | Antisense probe to 18S rRNA for northern blotting | This study
oKB629 | CCACTAACTACGAGGTCACGG | Forward PCR primer for PRB1 locus screening to generate yKB551 | This study
oKB630 | GGTTTTCCGCTGGCATCAAAC | Forward PCR primer for PRB1 locus screening to generate yKB551 | This study
oKB631 | GCCATATTGACAGAGCAGTATGTGAG | Reverse PCR primer for PRC1 locus screening to generate yKB551 | This study
oKB632 | CGCCTCTTTTGTCACAACTTGG | Reverse PCR primer for PRC1 locus screening to generate yKB551 | This study
oKB633 | GCTTGATGTGGTACAACAAGCTG | Forward PCR primer for PEP4 locus screening to generate yKB551 | This study
oKB634 | AAGTTGATCGTACAGAGGGCG | Reverse PCR primer for PEP4 locus screening to generate yKB551 | This study
pKB290 | Cen, GAL:GFP | GFP | This study
pKB303 | Cen, GAL:GFP(PTC@codon 69) | GFP +PTC | This study
pKB323 | Cen, GAL:GFP(-A, PTC@codon 69) | GFP +PTC ΔA | This study
pKB324 | Cen, GAL:GFP(-B, PTC@codon 69) | GFP +PTC ΔB | This study
pKB325 | Cen, GAL:GFP(-C, PTC@codon 69) | GFP +PTC ΔC | This study
pKB326 | Cen, GAL:GFP(-AB, PTC@codon 69) | GFP +PTC ΔAB | This study
pKB327 | Cen, GAL:GFP(-BC, PTC@codon 69) | GFP +PTC ΔBC | This study
pKB328 | Cen, GAL:GFP(-ABC, PTC@codon 69) | GFP +PTC ΔABC | This study
pKB531 | Cen, TDH3:GFP | GFP | This study
pKB532 | Cen, TDH3:GFP(PTC@codon 69) | GFP +PTC | This study
pKB534 | Cen, TDH3:GFP(-ABC, PTC@codon 69) | GFP +PTC ΔABC | This study
pKB535 | 2µ, TDH3:GFP | GFP | This study
pKB536 | 2µ, TDH3:GFP(-ABC) | GFP ΔABC | This study
pKB537 | 2µ, TDH3:GFP(PTC@codon 69) | GFP +PTC | This study
pKB538 | 2µ, TDH3:GFP(-ABC, PTC@codon 69) | GFP +PTC ΔABC | This study
yJC529 | MATa, ade2-1, ura3, leu2, his3, trp1, pep4Δ::HIS3, prb1Δ::HIS3, prc1Δ::HISG, UPF1-HA::TRP1 | Triple protease mutant, UPF1-HA | Wenqian Hu
yKB146 | MATa, ura3, leu2, his3, met15, upf1Δ::KanMX6 | upf1Δ | EUROSCARF
yKB154 | MATa, ura3, leu2, his3, met15 | Wild-type | EUROSCARF
yKB352 | MATα, ura3, leu2, his3, lys2, dcp2Δ::KanMX6 | dcp2Δ | Ambro van Hoof
yKB551 | MATα, ade2-1, ura3, leu2, his3, pep4Δ::HIS3, prb1Δ::HIS3, prc1Δ::HISG, UPF1-HA::TRP1, dcp2Δ::KanMX6 | Triple protease mutant, dcp2Δ, UPF1-HA | This study
yKB563 | MATa, ura3, leu2, his3, met15, pbp2Δ::HIS3 | pbp2Δ | This study
yKB589 | MATa, ura3, leu2, his3, met15, hek2Δ::KanMX6 | hek2Δ | This study
yKB590 | MATa, ura3, leu2, his3, met15, pbp2Δ::HIS3, hek2Δ::KanMX6 | pbp2Δ, hek2Δ | This study
yKB666 | MATa, ura3, leu2, his3, met15, GAL:HA-DBP5::KanMX6 | Conditional expression of HA-DBP5 in galactose | This study

Table C-1. List of oligonucleotides, plasmids, and yeast strains used in Chapter 2. Details of reagents used for presented data are listed.

APPENDIX D - EXPERIMENTAL METHODS

RELATED TO CHAPTER 3 (footnote 4)

Total RNA library preparation - Whole-cell RNA was isolated using glass-bead

cell lysis and phenol extraction as for northern analysis. 5 µg of DNase I-

treated whole-cell RNA was depleted of rRNA using the Epicentre Human/

Mouse/Rat Ribo-Zero rRNA removal kit. Small RNAs were excluded using

Zymo RNA Clean and Concentrator-5 spin columns, substituting 26.6% EtOH

final concentration at steps 1-2 of the manufacturer’s recommended protocol

to enhance removal of RNAs <200 nt (data not shown). Strand-specific,

random-primed cDNA libraries were generated by the CWRU Genome and

Transcriptome Sequencing Core, using Epicentre ScriptSeq v2 RNA-Seq

Library Preparation Kit and Epicentre ScriptSeq Index PCR Primers. Libraries

were prepared for biological replicates of WT and upf1Δ strains.

Polyribosome analysis - Yeast cultures were grown to mid-log phase, treated with

100 µg/mL cycloheximide (CHX), harvested immediately by centrifugation,

and cell pellets flash frozen on dry ice. Lysis was carried out at 4 °C. Cell

pellets were lysed in polysome lysis buffer (10 mM Tris, pH 7.4, 100 mM

NaCl, 30 mM MgCl2, 100 µg/mL CHX, 1 mM DTT) by mechanical disruption

using glass beads. Cell debris was removed by centrifugation through an 18

Ga puncture hole for 2 minutes at 2000 RPM, and the resulting lysate was

pre-cleared at 29,000 RPM for 10 minutes in a Beckman TLA-120.2 rotor.

Footnote 4: Experimental methods from Chapter 3 are also available in Smith et al., 2014.

Lysate was treated on ice with 1% Triton X-100 for 5 minutes. 10 units (OD260)

of lysate were added to a 15-45% (w/w) sucrose gradient (buffer 50 mM Tris

acetate, pH 7.0, 50 mM NH4Cl, 12 mM MgCl2, 1 mM DTT) prepared using a

Biocomp gradient maker. Gradients were centrifuged for 2 hr 26 min at 41,000

RPM in a Beckman SW 41 Ti rotor. Gradients were fractionated, and RNA was

precipitated and extracted as described previously (Sweet et al., 2012). 5 µg

of RNA was used to prepare Polysome-seq libraries as described above for

total RNA libraries.

RNA-seq and Polysome-seq sequencing and analysis

Sequencing and mapping: cDNA libraries prepared from total and polysome-

associated RNA were sequenced on the Illumina HiSeq2000 platform at the

Institute for Integrative Genome Biology High-Throughput Sequencing Core at

the University of California, Riverside on a single-end, 100 cycle flow cell. On

the Galaxy platform (usegalaxy.org; Goecks et al., 2010), the sequencing

FASTQ files were run through the “NGS: QC and manipulation/FASTQ

Groomer” tool. “NGS: QC and manipulation/Compute quality statistics” was

used to compute quality scores and 1 low-quality nucleotide was trimmed

from the right end of all reads during mapping. Reads were mapped to the

sacCer2 genome on Galaxy with “NGS: Mapping/Map with Bowtie for

Illumina” (Langmead et al., 2009) using a SOAP-like alignment policy to allow

2 mismatches over the entire length of the read (-v 2), and excluding any read

that did not map uniquely to the genome (-m 1).

Identification of unannotated RNAs: Reads were assembled into transcripts using

Cufflinks v2.1.1 (Trapnell et al., 2010) with bias correction and multi-read

correction, using reference annotation-based transcript assembly (RABT;

Roberts et al., 2011) to identify unannotated transcripts (--GTF-guide -b -u

--library-type ff-firststrand; all other parameters default). The sacCer2

Ensembl Genes annotation downloaded from the UCSC genome table

browser was used as a guide during transcript assembly (genome.ucsc.edu/

cgi-bin/hgTables?command=start). RABT assembles reads into transcripts,

and then compares assembled transcripts to the reference genome

annotation to identify transcripts significantly different from transcripts

predicted by the annotation, allowing the identification of novel transcripts that

map to regions of the sacCer2 genome lacking annotated features. Using the

default --overlap-radius option, unique transcripts must be separated by at

least 50 basepairs from annotated transcripts to prevent merging either at the

Cufflinks step or subsequent Cuffmerge step. Transcripts <200 nucleotides or

with a coverage <1 read per million were filtered from the dataset. A master

annotation compiling RNAs detected in all datasets was generated with

Cuffmerge (Cufflinks v2.1.1). Notably, Cuffmerge includes a step that filters

transcripts likely to be artifacts including possible polymerase run-on

fragments, or transcripts within 2 kilobases downstream of a reference

transcript.

For classification of RNAs: “mRNAs” include any gene annotated with

a YXXNNNX systematic name; “known ncRNAs” include C/D box snoRNAs

(snR18, snR65, snR4, snR71, snR76, snR45, snR63, snR128, snR190,

snR70), H/ACA box snoRNAs (snR46, snR30, snR44, snR34, snR11, snR49,

snR81, snR8, snR5, snR161, snR43, snR189, snR84, snR80, snR37, snR42,

snR85, snR86, snR191, snR9, snR36, snR35, snR31), spliceosomal RNAs

(snR14, snR7-L, snR19, LSR1), U3 snoRNA snR17b, telomerase RNA TLC1,

signal recognition particle 7S RNA SCR1, RNase MRP NME1, and RNase P

component RPR1; uRNAs include all assembled transcripts that were not

assigned and did not align to a reference annotation, excluding those

mapping to mitochondrial DNA. Any assembled transcript spanning more than

one annotated chromosomal feature was excluded from all downstream

analysis.

Quantification of expression: Expression (FPKM) was calculated using Cuffdiff

(Cufflinks v2.1.1; Trapnell et al., 2013) with bias correction and multi-read

correction, providing biological replicates for analysis, with the master

annotation generated by Cuffmerge above as a reference (-b -u --library-type

ff-firststrand). RNAs were excluded from further analysis unless their average

expression in RNA-seq datasets was FPKM ≥10 in upf1Δ and, for ncRNAs,

FPKM >10 in wild-type.

Comparison to previous lncRNA transcripts: Comparison of uRNAs to previous

transcript annotations (DCP2-sensitive [Geisler et al., 2012]; SUTs, CUTs [Xu

et al., 2009]; or XUTs [van Dijk et al., 2011]) was performed by manually

comparing the published chromosomal coordinates from each of these 4

classes of transcripts to the coordinates of uRNAs defined by Cufflinks

analysis. If a uRNA overlapped a previously classified lncRNA by >50%, or vice

versa, the uRNA was categorized as overlapping a member of that class. In

many cases, lncRNAs have previously been classified in more than

one category. For example, when XUTs were described, 543 were identified

as also being SUTs and 183 were identified as also being CUTs (van Dijk et

al., 2011); this ambiguity between classes is reflected in the fact that many

uRNAs overlap lncRNAs in more than one class. Additionally, in some cases

uRNA annotations spanned adjacent but non-overlapping lncRNAs, also

resulting in grouping of the uRNA into more than one class; however, in these

cases the uRNA transcript isoform described here is likely to have distinct

stability characteristics from overlapping lncRNAs.
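
Although the comparison described above was performed manually, the >50% reciprocal overlap criterion can be written out as a short Python sketch. The tuple layout (chromosome, strand, start, end), the function names, and the requirement that both transcripts lie on the same strand are assumptions made for illustration; the text does not specify how strand was handled.

```python
def overlap_fraction(a_start, a_end, b_start, b_end):
    """Fraction of interval A (1-based, inclusive) that is covered by interval B."""
    shared = min(a_end, b_end) - max(a_start, b_start) + 1
    return max(shared, 0) / (a_end - a_start + 1)


def overlaps_class_member(urna, lncrna, threshold=0.5):
    """True if the uRNA is >50% covered by the lncRNA, or vice versa.

    Each transcript is a (chromosome, strand, start, end) tuple.
    Assumption: transcripts on different strands are never counted as overlapping.
    """
    if urna[:2] != lncrna[:2]:
        return False
    urna_covered = overlap_fraction(urna[2], urna[3], lncrna[2], lncrna[3])
    lncrna_covered = overlap_fraction(lncrna[2], lncrna[3], urna[2], urna[3])
    return urna_covered > threshold or lncrna_covered > threshold


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only
    urna = ("chrI", "+", 100, 1000)
    sut = ("chrI", "+", 200, 250)
    print(overlaps_class_member(urna, sut))  # short lncRNA lies entirely within the uRNA
```

Testing the overlap in both directions means a short lncRNA contained within a long uRNA is still scored as overlapping, which is one way a single uRNA can fall into more than one class.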

Translatability Score calculation: For each detected RNA, we calculated the ratio

of RNA-Seq reads associated with polysomes (Polysome-seq data) relative to

reads from total RNA at steady-state (RNA-seq data), to calculate the

Translatability Score (FPKM_polysomes/FPKM_steady-state). Sequencing datasets

were normalized for PGK1 and RPL41A mRNAs to have a Translatability

Score of 1. All graphical representations of Translatability Score data are

presented as histograms of the number of RNAs per bin, generated with 40

bins from scores 0-5.
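
A minimal Python sketch of the Translatability Score calculation and binning follows. The text does not state how the two reference mRNAs were combined for normalization; the sketch assumes scaling by the mean of the PGK1 and RPL41A ratios, so that the two references average a score of 1. Function names and the dictionary inputs are illustrative.

```python
def translatability_scores(polysome_fpkm, total_fpkm, references=("PGK1", "RPL41A")):
    """Polysome/total FPKM ratio for each RNA, scaled to the reference mRNAs.

    Assumption: the scaling factor is the mean raw ratio of the references.
    RNAs with zero total FPKM, or absent from either dataset, are skipped.
    """
    raw = {
        name: polysome_fpkm[name] / total_fpkm[name]
        for name in total_fpkm
        if name in polysome_fpkm and total_fpkm[name] > 0
    }
    scale = sum(raw[ref] for ref in references) / len(references)
    return {name: ratio / scale for name, ratio in raw.items()}


def score_histogram(scores, n_bins=40, low=0.0, high=5.0):
    """Number of RNAs per bin for n_bins equal-width bins covering [low, high)."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for score in scores:
        if low <= score < high:
            index = min(int((score - low) / width), n_bins - 1)
            counts[index] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical FPKM values, for illustration only
    polysome = {"PGK1": 200.0, "RPL41A": 100.0, "uRNA_1": 10.0}
    total = {"PGK1": 100.0, "RPL41A": 100.0, "uRNA_1": 20.0}
    scores = translatability_scores(polysome, total)
    print(scores)
    print(score_histogram(scores.values()))
```

With 40 bins spanning 0 to 5, each bin is 0.125 score units wide; scores of 5 or greater fall outside the plotted range in this sketch.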

Analysis of previous lncRNA transcripts: uRNAs and previous lncRNA categories

in yeast (DCP2-sensitive [Geisler et al., 2012]; SUTs, CUTs [Xu et al., 2009];

or XUTs [van Dijk et al., 2011]) were concatenated and sorted to generate a

master .gtf file containing all transcripts. Cuffcompare (Cufflinks v2.1.1) was

performed to identify and list overlapping transcripts as a single entry, creating

a non-redundant list of transcripts. Expression of this non-redundant list was

determined with Cufflinks (v2.1.1) with bias correction and multi-read

correction (-G -b -u --library-type ff-firststrand; all other parameters default).

RNAs were filtered for FPKM ≥10 in upf1Δ steady-state, FPKM ≥1 in WT

steady-state, and FPKM >0 in Polysome-seq for both biological replicates in each

dataset. Translatability Scores were calculated and plotted as described.

‘Mixed’ category includes transcripts expressed from a locus overlapping at

least one other transcript, as determined by Cuffcompare.

Identification of NMD-sensitive RNAs: RNAs were identified as upregulated in

upf1Δ by comparing WT and upf1Δ total RNA samples using Cuffdiff with

parameters as described above. Upregulated transcripts were required to be

statistically significant at an FDR of <0.05 and show a ≥2-fold average

increase in expression.
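
The two criteria for calling an RNA upregulated in upf1Δ can be expressed as a small Python predicate. This is a sketch only: the argument names are illustrative, the FDR-adjusted value is assumed to be supplied from the Cuffdiff output, and the handling of RNAs with zero expression in WT is an assumption not addressed in the text.

```python
def is_upregulated_in_upf1(fpkm_wt, fpkm_upf1, q_value, min_fold=2.0, max_fdr=0.05):
    """True if the RNA passes both cutoffs: FDR < max_fdr and fold increase >= min_fold."""
    if q_value >= max_fdr:
        return False
    if fpkm_wt == 0:
        # Fold-change is undefined; assumption: count any expression in upf1Δ as an increase.
        return fpkm_upf1 > 0
    return fpkm_upf1 / fpkm_wt >= min_fold


if __name__ == "__main__":
    # Hypothetical values, for illustration only
    print(is_upregulated_in_upf1(fpkm_wt=12.0, fpkm_upf1=40.0, q_value=0.01))
```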

Pol II transcriptional shutoff - Yeast strains yJC244 (rpb1-1; not isogenic to

BY4741) and yKB526 (rpb1-1/upf1Δ; not isogenic to BY4741) were grown at

24 °C to mid-log phase, concentrated, and shifted to 37 °C by addition of an

equal volume of media at 56 °C, to inactivate the temperature-sensitive allele

of RPB1 and globally shut off transcription by RNA polymerase II (Nonet et

al., 1987). Cells were harvested prior to temperature shift and 30 minutes

after temperature shift, and whole-cell RNA isolated and analyzed by northern

blotting.

Poly(A) selection - Polyadenylated RNA was purified from whole-cell RNA

isolated from WT or upf1Δ Resgen strains using the Dynabeads mRNA

purification kit, as per kit instructions. 30 µg of input, unbound, or poly(A)+

RNA was analyzed by northern blotting.

Ribosome profiling library preparation - Isolation and sequencing of ribosome-

protected RNA fragments was performed based on the described protocol

(Ingolia et al., 2012), with the following modifications. Yeast cultures were

grown in synthetic dextrose medium plus amino acids to mid-log phase,

treated with 100 µg/mL CHX, harvested immediately by centrifugation, and

cell pellets flash frozen on dry ice. Lysis was carried out at 4 °C. Cell pellets

were lysed in polysome lysis buffer (10 mM Tris, pH 7.4, 100 mM NaCl, 30

mM MgCl2, 100 µg/mL CHX, 1 mM DTT) by mechanical disruption using glass

beads. Cell debris was removed by centrifugation through an 18 Ga puncture

hole for 2 minutes at 2000 RPM, and the resulting lysate was pre-cleared at

14,000 RPM for 10 minutes. Lysates were treated with 1% Triton X-100 for 5

minutes. 12.5 units (OD260) of lysate were treated with 188 U RNase I in 250

µL at 24 °C for 1 hr. Lysates were loaded onto a 15-45% (w/w) sucrose

gradient, centrifuged, and fractionated as described for polysome analysis

above.

RNA was precipitated from fractions containing the 80S monosome

peak with 2 volumes of 95% EtOH at -80 °C overnight, and centrifuged for 30

minutes at 13,200 RPM to collect RNA. RNA was resuspended in LET buffer

plus 1% SDS, and extracted once each with an equal volume of phenol/LET,

phenol/chloroform/LET, and chloroform. RNA was precipitated with 300 mM

NaCl, 1.5 µL GlycoBlue, and >1 volume isopropanol for 30 minutes on dry ice.

RNA was collected by centrifugation at 13,200 RPM for 30 minutes at 4 °C, air dried, and resuspended in 10 mM Tris, pH 8.0. RNA from all monosome fractions for each sample was pooled, and 5 µg aliquots depleted of ribosomal RNA using the Epicentre Human/Mouse/Rat Ribo-Zero rRNA removal kit. Each rRNA-depleted sample was purified through Zymo RNA

Clean and Concentrator-5 spin columns, substituting 60% EtOH at steps 1-2 of the manufacturer’s recommended protocol to facilitate purification of small

RNAs.

Size-selection of 26-34 nt fragments of RNA was carried out by electrophoresis on a 15% denaturing polyacrylamide gel, excision, and gel purification as described (Ingolia et al., 2012). 2 aliquots per sample were pooled, and a second ribosomal RNA depletion was performed using the

Epicentre Human/Mouse/Rat Ribo-Zero kit (eliminating the 50 °C incubation step) and Zymo RNA Clean and Concentrator-5 spin columns to purify RNA as above. RNA was dephosphorylated, a 3’ linker ligated, first-strand cDNA synthesized, and cDNA circularized as in the described protocol (Ingolia et al.,

2012). cDNA libraries were amplified with 12-14 cycles of PCR with indexed primers.

To generate fragmented RNA control libraries, whole-cell RNA was purified, DNase-treated, and ribosomal RNA removed as described for the

RNA-seq library preparation. RNA was fragmented with base as described

(Ingolia, 2010) and fragments of 26-34 nt were gel purified and used for

library preparation as described above for ribosome footprinting libraries.

Libraries were prepared for biological replicates of WT and upf1Δ strains for

ribosome footprinting, or a single replicate of each strain for the fragmented

RNA control.

Ribosome profiling/fragmented RNA sequencing and analysis

Sequencing and mapping: cDNA libraries prepared for total fragmented RNA or

ribosome footprints were sequenced on the Illumina HiSeq2500 platform at

the Institute for Integrative Genome Biology High-Throughput Sequencing

Core at the University of California, Riverside on a single-end, 50 cycle flow

cell. Using the Galaxy platform (usegalaxy.org), the sequencing FASTQ files

were run through the “NGS: QC and manipulation/FASTQ Groomer” tool.

“NGS: QC and manipulation/Compute quality statistics” was used to compute

quality scores which indicated high-quality sequencing across the length of

the reads. Data processing was carried out in Galaxy as described (Ingolia et

al., 2012). Briefly, the sequencing adaptor was clipped from the 3’ end of each

read with “NGS: QC and manipulation/Clip,” and any reads without a clipped

adaptor or that were <25 nt in length after clipping were discarded. The

clipped read was trimmed to nucleotides 2-50 with “NGS: QC and

manipulation/Trim sequences.” Reads were mapped to the sacCer2 yeast

genome on Galaxy with “NGS: Mapping/Map with Bowtie for Illumina” using a

SOAP-like alignment policy allowing 1 mismatch (-v 1), reporting 1 alignment

per read (-k 1), and discarding any reads aligning to more than 16 locations in

the genome (-m 16). rRNA reads (any reads mapping to chrXII:

451,000-468,999) were identified and removed using the “Filter and Sort/

Select” tool.

Modifying uRNA coordinates: The 5’ and 3’ termini of all uRNAs detectable by total

RNA fragmentation were manually demarcated. The most inclusive 5’ and 3’

terminus among the uRNA boundaries annotated by Cufflinks and the manual

annotation of the total fragmented RNA was identified. These updated uRNA

transcript boundaries were converted into GTF format, and combined with the

sacCer2 Ensembl Genes annotation for use as the reference annotation for

quantification of ribosome profiling sequencing data (see below). This

adjustment ensured that quantification of ribosome footprinting and total

fragmented RNA sequencing was inclusive of the largest isoform of each

transcript identified between this sequencing dataset and the RNA-seq

sequencing dataset which initially defined uRNA coordinates.
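
Taking the most inclusive boundaries amounts to keeping the smaller start and the larger end coordinate of the two annotations. The Python sketch below shows that step and the formatting of a nine-column GTF record. The function names, the source label, and the single-exon representation of each uRNA are assumptions for illustration, not a description of the files actually used.

```python
def most_inclusive_bounds(cufflinks_bounds, manual_bounds):
    """Widest (start, end) spanning both annotations of one uRNA.

    Each argument is a (start, end) tuple in genomic coordinates with start <= end.
    """
    start = min(cufflinks_bounds[0], manual_bounds[0])
    end = max(cufflinks_bounds[1], manual_bounds[1])
    return start, end


def gtf_exon_line(chrom, start, end, strand, transcript_id, source="uRNA_updated"):
    """Format one tab-delimited GTF exon record (assumes a single-exon transcript)."""
    attributes = f'gene_id "{transcript_id}"; transcript_id "{transcript_id}";'
    fields = [chrom, source, "exon", str(start), str(end), ".", strand, ".", attributes]
    return "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only
    start, end = most_inclusive_bounds((1200, 1850), (1150, 1800))
    print(gtf_exon_line("chrIV", start, end, "+", "uRNA_0001"))
```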

Quantification of ribosome footprint coverage: FPKMs were obtained using

Cuffdiff (Cufflinks v2.1.1) with bias correction and multi-read correction,

providing biological replicates for analysis where possible, using the reference

GTF file described above in “Modifying uRNA coordinates” (-b -u --library-type

ff-firststrand). RNAs with poor coverage in the total fragmented RNA datasets

(FPKM = 0 in WT or FPKM <10 in upf1Δ) were excluded from Footprinting

Score analysis. 331 uRNAs met this filtering cutoff.

Calculation of Footprinting Score: To calculate the Footprinting Score, for each

RNA we determined the ratio of ribosome footprinting reads relative to reads

from total fragmented RNA (FPKM_footprints/FPKM_fragments). Sequencing datasets

were normalized for PGK1 and RPL41A mRNAs to have a Footprinting Score

of 1. To compare the translation of RNAs as measured by both the

Translatability Score and Footprinting Score, for all RNAs with a score >0, a

Spearman rank correlation coefficient was calculated.

Because the absence of ribosome footprints could be either due to a

true failure to associate with the translation machinery, or insufficient depth of

our ribosome profiling, we classify uRNAs showing sufficient evidence of

ribosome association (footprinting score in WT > 0 and footprinting score in

upf1Δ >0.1) as showing evidence of being ribosome-bound in this assay, and

make no conclusions about the absence of ribosome footprinting data. 185

uRNAs (of 331 analyzed) demonstrated ribosome association by these

cutoffs.
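
The ribosome-association cutoffs and the rank correlation described above can be sketched in plain Python. The function names are illustrative, and the Spearman implementation (Pearson correlation of average ranks, with tied values sharing a mean rank) is a generic textbook version rather than the specific software used for the analysis.

```python
def shows_ribosome_association(score_wt, score_upf1, wt_cutoff=0.0, upf1_cutoff=0.1):
    """True if Footprinting Scores exceed both cutoffs (WT > 0 and upf1Δ > 0.1)."""
    return score_wt > wt_cutoff and score_upf1 > upf1_cutoff


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation coefficient of two equal-length sequences."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined when one variable is constant
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only
    translatability = [0.2, 0.9, 1.4, 0.5]
    footprinting = [0.1, 1.1, 0.8, 0.6]
    print(spearman(translatability, footprinting))
```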

Demarcation of ribosome-free regions: For uRNAs, the region covered by

ribosome footprints was manually demarcated based on visualization of

ribosome footprinting sequencing reads in the IGV genome browser

(www.broadinstitute.org/igv; Robinson et al., 2011). Footprint occupancy

regions were annotated to be representative of the ribosome footprint profile

and include >75% of ribosome footprint sequencing reads. In cases where

ribosome footprints fell marginally outside the uRNA boundaries, the 5’ or 3’

ribosome-free size was set as “0”. Only those uRNAs meeting the expression

cutoffs defined above in “Quantification of ribosome footprint coverage” and

with sufficient evidence of ribosome association as described in “Calculation

of footprinting score” were included in this analysis (n=185).

Assignment of phasing frames for mRNAs: To establish phasing of ribosome

footprints along annotated mRNAs, individual sequencing datasets (ribosome

profiling or total fragmented RNA) were filtered to include only reads of 27

nucleotides; these reads represent reads that were 28 nucleotides prior to

trimming 1 nucleotide from the 5’ end during mapping, and represent reads

which predictably demonstrate ribosome occupancy (Ingolia et al., 2009).

Using a custom script, each nucleotide position within all annotated CDS

(based on the Ensembl sacCer2 Genes annotation downloaded from the

UCSC genome table browser; genome.ucsc.edu/cgi-bin/hgTables?

command=start) was assigned a frame as follows: a position -11 of the first

nucleotide of the AUG start codon was assigned an in-frame “+1”, as well as

every third nucleotide thereafter through -16 of the last position of the CDS; a

position -10 of the first nucleotide of the AUG start codon was assigned “+2”,

as well as every third nucleotide thereafter through -15 of the last position of

the CDS; a position -9 of the first nucleotide of the AUG start codon was

assigned “+3”, as well as every third nucleotide thereafter through -14 of the

last position of the CDS. The sequencing datasets were cross-referenced to

this nucleotide frame definition such that the frame number corresponding to

the start position of the sequencing read indicated the frame to which the read

aligned; reads not aligning to a CDS were assigned a frame of “0” and not

further analyzed. For each dataset, the percentage of reads assigned to each

frame was calculated. Graphed data represent the average percentage of

reads aligned to each frame for 4 replicates of ribosome footprinting data, and

2 replicates of fragmented RNA data, +/- SEM. Scripts were written in Python

by N. Kline.
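The frame assignment described above can be sketched as follows. This is a minimal illustration, not the original script: the function names are hypothetical, coordinates are assumed to be 0-based positions along a single plus-strand transcript, and percentages are assumed to be computed over CDS-assigned reads only (frame "0" reads are dropped).

```python
from collections import Counter


def build_frame_map(cds_start, cds_end):
    """Assign frames 1, 2, 3 to positions around a CDS.

    cds_start: index of the first nucleotide of the AUG.
    cds_end:   index of the last nucleotide of the CDS (inclusive).
    Frame +1 starts at -11 of the AUG; the last framed position is
    -14 of the CDS end (which receives frame +3), as in the text.
    """
    first = cds_start - 11
    last = cds_end - 14
    return {pos: (pos - first) % 3 + 1
            for pos in range(max(first, 0), last + 1)}


def frame_percentages(read_starts, frame_map):
    """Percentage of CDS-assigned reads whose 5' start falls in each frame."""
    counts = Counter(frame_map.get(start, 0) for start in read_starts)
    in_cds = sum(counts[f] for f in (1, 2, 3))
    if in_cds == 0:
        return {1: 0.0, 2: 0.0, 3: 0.0}
    return {f: 100.0 * counts[f] / in_cds for f in (1, 2, 3)}
```

For example, a 30-nucleotide CDS starting at index 20 yields framed positions 9 through 35, and a read starting at index 100 would be assigned frame 0 and ignored.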

Assignment of phasing frames for uRNAs: Using a custom script, each

nucleotide position within all uRNAs defined in this study was assigned a

frame as described above, with the exception that reference points were the

transcript start and end position, rather than a CDS start and stop position.

Sequencing datasets containing only 27-nucleotide reads (described above)

were compared to the uRNA nucleotide frame definition as above, to assign

each read to a corresponding frame. The total number of 27-mer reads

aligning to each uRNA was determined, combining all 4 ribosome footprinting

datasets (two WT biological replicates and two upf1Δ biological replicates) or

both fragmented RNA datasets (one WT biological replicate and one upf1Δ

biological replicate); any uRNA with fewer than 10 combined 27-mer ribosome

footprinting reads was discarded from this analysis. For all uRNAs with at

least 10 combined ribosomal footprinting reads (n=80), the percentage of

reads aligning to each frame was determined. Any uRNA demonstrating at

least 50% of reads aligning to a single frame was considered to show

evidence of translation-dependent phasing, and this frame was arbitrarily set

to frame +1. A total of 61 uRNAs demonstrated phasing. Scripts were written

in Python by N. Kline.
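The phasing call for a single uRNA can be sketched as below. This is an illustrative reconstruction under stated assumptions: the function name and return shape are hypothetical, and the input is assumed to be the combined 27-mer footprint counts per frame for one uRNA.

```python
def call_phasing(frame_counts, min_reads=10, min_fraction=0.5):
    """Decide whether a uRNA shows translation-dependent phasing.

    frame_counts: three read counts, one per frame.
    Returns None if the uRNA has fewer than min_reads reads (excluded
    from the analysis); otherwise (is_phased, index_of_dominant_frame).
    """
    total = sum(frame_counts)
    if total < min_reads:
        return None
    best = max(range(3), key=lambda i: frame_counts[i])
    return frame_counts[best] / total >= min_fraction, best
```

The dominant frame returned here corresponds to the frame that the text relabels as "+1" for phased uRNAs.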

Identification of sORFs: uRNAs demonstrating phasing of ribosome footprints

were individually examined to determine if a putative translated ORF could be

identified based on the frame to which the ribosome footprinting sequencing

reads aligned. This identification required an in-frame canonical AUG start

codon near the 5’ end of ribosome footprints (often centered within the P site

of the most 5’ footprinting read), and the putative ORF was extended through

the first in-frame stop codon following this AUG. All such putative ORFs

encoding peptides of at least 10 amino acids constitute our class of sORFs. In

some cases more than one utilized sORF was identified per uRNA. In one

case (sORF-28), no canonical start codon could be identified despite strong

evidence for phased ribosome footprints throughout the region; in this case,

the codon within the P site of the most 5’ ribosome footprint was considered

the first codon for this sORF.
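The ORF-calling rule can be illustrated with a short sketch. The source describes manual, per-transcript inspection, so this automated version is only an approximation under assumptions: the function name is hypothetical, the sequence is given as DNA (ATG for AUG), only the first in-frame ATG at or after a chosen search position is considered, and the 10-amino-acid minimum excludes the stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorf(seq, frame, search_from=0, min_aa=10):
    """Return (start, end) of a putative sORF, end exclusive, or None.

    frame: 0, 1 or 2, the reading frame (position modulo 3) supported
           by the phased footprints.
    """
    seq = seq.upper()
    # First codon boundary in the requested frame at or after search_from.
    start = search_from + (frame - search_from) % 3
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] != "ATG":
            continue
        # Extend through the first in-frame stop codon.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                n_aa = (j - i) // 3  # codons before the stop
                return (i, j + 3) if n_aa >= min_aa else None
        return None  # no in-frame stop codon found
    return None
```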

Data visualization: Snapshots of ribosome profiling read coverage were obtained

using the IGV genome browser (www.broadinstitute.org/igv). Metagene plots

of ribosome footprint coverage were generated using ngsplot

(https://code.google.com/p/ngsplot/), providing 6-column BED files of sORF or

mRNA CDS regions (default parameters and -R bed -FL 30 -SE 0 -L 100).

Generation of HA-tagged sORF strains - Chromosomal tagging of sORFs at their

endogenous loci was performed using standard homologous recombination

methods (Longtine et al., 1998). This approach resulted in incorporation of the

3xHA tag at the C-terminus of the sORF immediately followed by ADH1

terminator sequences, and incorporation of a downstream selectable marker

to facilitate screening of clones. sORFs were selected based on high

expression of the uRNA, strong evidence of ribosome footprint phasing, and

intergenic genomic location. Yeast genome sequences retrieved from the

Saccharomyces Genome Database (www.yeastgenome.org) were used to

determine gene-specific sequences to target knock-in of the 3xHA tag and

selectable marker to the correct locus. These sequences were designed to

insert the 3xHA tag immediately upstream of the predicted stop codon. The

3xHA tag was inserted either in-frame with the putative sORF, or out-of-frame

as a control to demonstrate frame-dependent expression of the 3xHA tag.

Incorporation of the 3xHA tag was confirmed by Sanger sequencing for each

locus.

Generation of FLAG-tagged sORF plasmids - Based on the yeast genome

sequences retrieved from the Saccharomyces Genome Database

(www.yeastgenome.org), the genomic region encompassing several uRNAs

containing putative sORFs (sORFs were selected based on high expression

of the uRNA, strong evidence of ribosome footprint phasing, and intergenic

genomic location) plus and minus ~500 bp was amplified by PCR with

Phusion High-Fidelity DNA Polymerase, which produces a blunt-end PCR

product. PCR products were ligated at 16 °C overnight into yEpLac181 (Gietz

and Sugino, 1988) previously digested with SmaI blunt-end restriction

enzyme using T4 DNA ligase. Ligated plasmids were transformed into

calcium chloride competent XL1-Blue Escherichia coli, plated on 2% agar

Luria broth plates plus 100 µg/mL ampicillin, and individual clones screened

by restriction digest and sequencing to confirm ligation of the appropriate

insert. A 1X FLAG tag (DYKDDDDK) was added in-frame to the C-terminus of

each putative sORF, immediately upstream of the putative stop codon, using

a single round of site-directed mutagenesis PCR with Phusion High-Fidelity

DNA Polymerase. PCR product was treated with DpnI restriction enzyme to

digest any methylated template. Plasmid was transformed into XL1-Blue E.

coli as above, and clones screened by sequencing to confirm the in-frame

insertion of the FLAG sequence.

Protein isolation and western blot analysis - WT yeast cultures containing either

1) chromosomal 3xHA-tagged sORFs, or 2) plasmids containing uRNAs

encoding a putative sORF with or without a C-terminal FLAG-tag, were grown

and treated with proteasome inhibitor MG-132 as described (Liu et al., 2007),

and flash frozen on dry ice. Protein lysis and western blotting were performed using

standard methods.

Conservation of sORF peptides in other fungi

BLAST analysis: A custom database to be used for BLAST search was

generated with NCBI BLAST tool formatdb V2.2.29+, with the following

genomes: from the Saccharomyces Genome Database (yeastgenome.org):

Saccharomyces bayanus strain S23-6C, Saccharomyces kudriavzevii strain

IFO1802, Saccharomyces mikatae strain IFO1815, Saccharomyces

paradoxus strain NRRL Y-17217, Saccharomyces pastorianus strain

Weihenstephan 34/70, and 33 Saccharomyces cerevisiae strains (standard

laboratory strain S228C, AWRI1631, AWRI796, BY4742, BY4741, CBS7960,

CEN.PK, CLIB215, CLIB324, CLIB382, EC1118, EC9-8, FL100, FostersB,

FostersO, JAY291, Kyokai7, LalvinQA23, M22, PW5, RM11-1a, Sigma1278b,

T7, T73, UC5, VIN13, VL3, W303, Y10, YJM269, YJM789, YPS163, ZTW1);

from the NCBI Genome database (www.ncbi.nlm.nih.gov/genome):

Naumovozyma castellii strain CBS 4309 (assembly ASM23723v1), Candida

glabrata strain CBS138 (assembly ASM253v2), Kluyveromyces lactis strain

NRRL Y-1140 (assembly ASM251v1), and Ashbya gossypii strain ATCC

10895 (assembly ASM9102v4).

Putative sORF peptides were provided as query for TBLASTN against

our custom curated database. TBLASTN was run using BLAST v2.2.29+, with

the E-value threshold set to 10 and all other parameters default. Results in

which the subject sequence contained a termination codon that interrupted

the peptide were filtered from the dataset. The number of identical residues

relative to the length of the query was used to calculate percent identity. In

many cases, local regions of high-identity alignment were reported that did

not extend across the entire query length; for these the number of identical

residues relative to the full length of the query was used to calculate percent

identity. Only the hit with the highest percentage of identical residues relative

to the full length of the query for each species is reported. Data is only

reported for non-S. cerevisiae alignments.
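The percent-identity convention used here (identical residues divided by the full query length, not by the alignment length) and the per-species selection can be sketched as follows. The function names and the input shape, a list of (species, identical-residue count) pairs, are assumptions for illustration and do not reflect a BLAST output format.

```python
def percent_identity_full_query(n_identical, query_length):
    """Identity relative to the full query, so partial local alignments
    are penalized for the unaligned portion of the peptide."""
    return 100.0 * n_identical / query_length


def best_hit_per_species(hits, query_length):
    """Keep, for each species, the highest full-query percent identity.

    hits: iterable of (species, n_identical) tuples for one query peptide.
    """
    best = {}
    for species, n_identical in hits:
        pid = percent_identity_full_query(n_identical, query_length)
        if species not in best or pid > best[species]:
            best[species] = pid
    return best
```

Under this convention, a 20-residue perfect local match against a 50-residue query scores 40%, not 100%.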

PhastCons conserved elements: Conserved elements across 7 yeast species (S.

cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus, S. castellii,

and S. kluyveri) have previously been identified using phastCons and

reported (Siepel et al., 2005), and are accessible in the UCSC genome

browser (http://genome.ucsc.edu/) using the “Most Conserved” track. Using

the sacCer2 S. cerevisiae genome assembly, we report the log-odds score of

conserved elements that partially or completely overlap each putative sORF.

If more than one conserved element overlapped an sORF, the log-odds score

for the element displaying the highest degree of overlap is reported.
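The rule for choosing which conserved element to report can be sketched as below. This is a hypothetical helper, assuming half-open (BED-style) coordinates on one chromosome and elements supplied as (start, end, log-odds score) tuples.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def reported_lod(sorf, elements):
    """Log-odds score of the conserved element overlapping the sORF most.

    sorf: (start, end); elements: iterable of (start, end, lod).
    Returns None when no element overlaps the sORF.
    """
    s, e = sorf
    scored = [(overlap(s, e, es, ee), lod) for es, ee, lod in elements]
    scored = [item for item in scored if item[0] > 0]
    if not scored:
        return None
    return max(scored, key=lambda item: item[0])[1]
```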

Calculation of the Ka/Ks ratio: The Ka/Ks ratio (ω; the relative rate of

nonsynonymous to synonymous mutations along a conserved sequence),

was calculated for putative sORFs using the Ka/Ks_Calculator (Zhang et al.,

2006), with the method of model averaging. For each ratio, the sacCer2

reference genome sequence was compared to the nucleotide sequence

corresponding to the highest-identity TBLASTN result for each species. Ka/Ks

ratios were only calculated if 1) a TBLASTN alignment was reported, and 2)

the aligned nucleotide sequence corresponding to the TBLASTN peptide hit

did not align 100% with the reference genome nucleotide sequence (marked

as “N.D.”). Only Ka/Ks ratios with a Fisher’s p-value <0.05 are reported.

Name | Description | Notes | Source
yKB154 | MATa, ura3, leu2, his3, met15 | Wild-type | EUROSCARF
yKB146 | MATa, ura3, leu2, his3, met15, upf1Δ::KAN | upf1Δ | EUROSCARF
yKB596 | MATa, ura3, leu2, his3, met15, sORF-4-HA::HIS3 | sORF-4 plus 3xHA C-terminal tag | This study
yKB597 | MATa, ura3, leu2, his3, met15, sORF-4-HA::HIS3 | sORF-4 plus out-of-frame 3xHA C-terminal tag | This study
yKB526 | MATa, ura3-52, leu2-3.112, his3-200, rpb1-1, upf1Δ::HIS3 | Temperature-sensitive RNA polymerase II mutant, upf1Δ | This study
yJC244 | MATa, ura3-52, leu2-3.112, his3-200, rpb1-1 | Temperature-sensitive RNA polymerase II mutant | Richard Young
oJC1348 | GTCATGCTCCTTTTTATGGGTTCTCGTCGTAATAATCCTG | ORC2-TRM7 intergenic uRNA oligo probe | This study
oJC1352 | ACCTGAAAGAGACGCCTTGTATCTTCTATAGGTCAACTAG | FAA2-BIM1 intergenic uRNA oligo probe | This study
oJC1917 | GTTATTCTATTCTTGAGCAGGCACTTTTAGGGTTGGGCAA | ICR1 ncRNA oligo probe | This study
oJC1981 | GTATGGTTCCATACTAAACTACCATCTTCTTTATTGCCGC | YKU80-YMR107W intergenic uRNA oligo probe | This study
oJC306 | GTCTAGCCGCGAGGAAGG | SCR1 ncRNA oligo probe | This study
oJC1984 | GAAATGTCCACTGAAGATTTCGTCAAGTTGGCCCC | RPS15 antisense uRNA reverse PCR primer to make template for asymmetric PCR | This study
oKB707 | CTGGAACAATGATCATGTTTCTCATGTGGGTTCTGACTGGAGC | RPS15 antisense uRNA forward PCR primer to make template for asymmetric PCR | This study
oKB708 | AGCTAGAGTTAGAAGAAGATTTGCCCGTG | RPS15 antisense uRNA asymmetric PCR primer for Northern probe | This study
oJC1989 | CAACACAAGGCCAAGTACAACACTCCAAAGTACAGATTGG | RPL5 antisense uRNA reverse PCR primer to make template for asymmetric PCR | This study
oKB713 | CAACTTCTTCAACACCCTTGTAAGTTTCGTCCAAACC | RPL5 antisense uRNA forward PCR primer to make template for asymmetric PCR | This study
oKB714 | CTGTCAAATCATCTCTTCTACCATCACTGGTG | RPL5 antisense uRNA asymmetric PCR primer for Northern probe | This study
oJC1991 | GTCGAGTTGATTTGTTCGTACCGTTCGAAGATTGAGACCG | BMH1 antisense uRNA reverse PCR primer to make template for asymmetric PCR | This study
oKB711 | AGAGAAGTTAAGAGCCAAACCTAGACGGATTGGGTGAG | BMH1 antisense uRNA forward PCR primer to make template for asymmetric PCR | This study
oKB712 | AACTAACTAAGATCTCCGACGATATTTTGTCCG | BMH1 antisense uRNA asymmetric PCR primer for Northern probe | This study
oKB702 | CGTAAGAACAATGCCGCCCCTGGTCCATCTAATTTCAACT | YNL190W antisense uRNA reverse PCR primer to make template for asymmetric PCR | This study
oKB717 | CACTTTTGCACAAGCACACGTAAACACATAGTAGTCGAAATAG | YNL190W antisense uRNA forward PCR primer to make template for asymmetric PCR | This study
oKB718 | CCATAAAATTGTTTGGTGTTACCGCTGGTAG | YNL190W antisense uRNA asymmetric PCR primer for Northern probe | This study
oKB700 | GTTCCTCGATCGACTAGTGCCATTCAATGAGATAAGGAGT | YAR047C antisense uRNA reverse PCR primer to make template for asymmetric PCR | This study
oKB720 | GAGCAGAGGTTAGCTCCGTCTCAACCAATTTTGTAC | YAR047C antisense uRNA forward PCR primer to make template for asymmetric PCR | This study
oKB721 | AGTATAGTAAGATATAATCCCACTAACGATTAGCGAGTG | YAR047C antisense uRNA asymmetric PCR primer for Northern probe | This study
oKB748 | CTTCCAGAGCGCCAGCATCGATCATAGCTG | FAS2-USV1 antisense uRNA forward PCR primer to make template for asymmetric PCR | This study
oKB750 | CTGGTGGGTTTACTATTACTGTCGCTAGAAAATACTTACAAACTCGCTG | FAS2-USV1 antisense uRNA reverse PCR primer to make template for asymmetric PCR | This study
oKB749 | GGACTACCATCTGGTAGACAAGATGGTG | FAS2-USV1 antisense uRNA asymmetric PCR primer for Northern probe | This study
oKB688 | /5Phos/AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGC/iSp18/CACTCA/iSp18/TTCAGACGTGTGCTCTTCCGATCTATTGATGGTGCCTACAG | RT primer for ribosome profiling | Ingolia et al., 2012
oKB689 | AATGATACGGCGACCACCGAGATCTACAC | PCR amplification of ribosome profiling cDNA libraries, forward primer | Ingolia et al., 2012
oKB690 | CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTTCAGACGTGTGCTCTTCCG | PCR amplification of ribosome profiling cDNA libraries, reverse primer, Index #1 | Ingolia et al., 2012
oKB691 | CAAGCAGAAGACGGCATACGAGATCACTGTGTGACTGGAGTTCAGACGTGTGCTCTTCCG | PCR amplification of ribosome profiling cDNA libraries, reverse primer, Index #2 | Ingolia et al., 2012
oKB692 | CAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTTCAGACGTGTGCTCTTCCG | PCR amplification of ribosome profiling cDNA libraries, reverse primer, Index #3 | Ingolia et al., 2012
oKB693 | CAAGCAGAAGACGGCATACGAGATTCAAGTGTGACTGGAGTTCAGACGTGTGCTCTTCGG | PCR amplification of ribosome profiling cDNA libraries, reverse primer, Index #4 | Ingolia et al., 2012
oKB694 | CAAGCAGAAGACGGCATACGAGATCTGATCGTGACTGGAGTTCAGACGTGTGCTCTTCCG | PCR amplification of ribosome profiling cDNA libraries, reverse primer, Index #5 | Ingolia et al., 2012
oKB695 | CAAGCAGAAGACGGCATACGAGATTACAAGGTGACTGGAGTTCAGACGTGTGCTCTTCCG | PCR amplification of ribosome profiling cDNA libraries, reverse primer, Index #6 | Ingolia et al., 2012
oKB769 | CTCATCCTCATCCACAGGCAATGAAGAACACGCTAACGTTCGGATCCCCGGGTTAATTAA | Forward PCR primer to amplify 3xHA-His3 product from pFA6a-3HA-His3MX6, with gene-specific sequences for chromosomal tagging of sORF-4 | This study
oKB770 | CTTATTTCTCACATCATTATGAAGTGACTCCCCTCGGTTAGAATTCGAGCTCGTTTAAAC | Reverse PCR primer to amplify 3xHA-His3 product from pFA6a-3HA-His3MX6, with gene-specific sequences for chromosomal tagging of sORF-4 | This study
oKB784 | CTCATCCTCATCCACAGGCAATGAAGAACACGCTAACGTCGGATCCCCGGGTTAATTAA | Forward PCR primer to amplify 3xHA-His3 product from pFA6a-3HA-His3MX6, with gene-specific sequences for chromosomal tagging of sORF-4 out-of-frame | This study
oKB789 | TTCCTTACGGAACCCAAGTGTG | Forward PCR primer to amplify YBL027W-YBL026W intergenic uRNA +/- 500 bp, for generation of pKB561 | This study
oKB790 | TTACTGTATCTACATCGGGATACTAATAGTAC | Reverse PCR primer to amplify YBL027W-YBL026W intergenic uRNA +/- 500 bp, for generation of pKB561 | This study
oKB791 | ATACCAATTTTACACGATGCCATAAAGGACGATTATAAAGATGATGATGATAAATAGACAAGCTACGTTGAAACAAGAACCCGC | Forward PCR primer to insert 1XFLAG tag at C-terminus of sORF-1 in YBL027W-YBL026W intergenic uRNA in pKB561, for generation of pKB565 | This study
oKB792 | GCGGGTTCTTGTTTCAACGTAGCTTGTCTATTTATCATCATCATCTTTATAATCGTCCTTTATGGCATCGTGTAAAATTGGTAT | Reverse PCR primer to insert 1XFLAG tag at C-terminus of sORF-1 in YBL027W-YBL026W intergenic uRNA in pKB561, for generation of pKB565 | This study
oKB797 | GGTACTTCCGCTAATAGACTACAAAC | Forward PCR primer to amplify YKU80-YMR107W intergenic uRNA +/- 500 bp, for generation of pKB562 | This study
oKB798 | GTTCGTACTTCCTTCTGAGCAG | Reverse PCR primer to amplify YKU80-YMR107W intergenic uRNA +/- 500 bp, for generation of pKB562 | This study
oKB799 | TCCACAGGCAATGAAGAACACGCTAACGTTGATTATAAAGATGATGATGATAAATAACCGAGGGGAGTCACTTCATAATGATGT | Forward PCR primer to insert 1XFLAG tag at C-terminus of sORF-4 in YKU80-YMR107W intergenic uRNA in pKB562, for generation of pKB566 | This study
oKB800 | ACATCATTATGAAGTGACTCCCCTCGGTTATTTATCATCATCATCTTTATAATCAACGTTAGCGTGTTCTTCATTGCCTGTGGA | Reverse PCR primer to insert 1XFLAG tag at C-terminus of sORF-4 in YKU80-YMR107W intergenic uRNA in pKB562, for generation of pKB566 | This study
yEpLac181 | 2µ, LEU2 | Parental vector used to construct pKB561, pKB562, pKB565, pKB566 | Gietz and Sugino, 1988
pKB561 | YBL027W-YBL026W intergenic uRNA +/- 500 bp | | This study
pKB562 | YKU80-YMR107W intergenic uRNA +/- 500 bp | | This study
pKB565 | YBL027W-YBL026W intergenic uRNA +/- 500 bp + C-terminal FLAG | | This study
pKB566 | YKU80-YMR107W intergenic uRNA +/- 500 bp + C-terminal FLAG | | This study
pFA6a-3HA-His3MX6 | Vector used to construct chromosomal 3xHA-tagged sORF loci | | Longtine et al., 1998

Table D-1. List of oligonucleotides, plasmids, and yeast strains used in Chapter 3. Details of reagents used for presented data are listed.

APPENDIX E - EXPERIMENTAL METHODS

RELATED TO CHAPTER 4

Total RNA library preparation and analysis - Libraries were prepared from whole-

cell RNA for biological replicates of WT and upf1Δ strains as described in

Appendix D. Sequencing data was mapped to the sacCer2 genome as

described in Appendix D. Expression was quantified using Cuffdiff (Cufflinks

v2.1.1) against the sacCer2 Ensembl Genes annotation downloaded from the

UCSC genome table browser as the reference annotation, with bias

correction and multi-read correction, providing biological replicates for

analysis (-b -u --library-type ff-firststrand). Any RNAs which did not have an

average expression in RNA-seq datasets of FPKM ≥10 in upf1Δ, and FPKM

>0 in wild-type were excluded from further analysis. Transcripts upregulated

in upf1Δ were identified by Cuffdiff as displaying a statistically significant ≥2-

fold average increase in expression at an FDR of <0.05.
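The expression and significance cutoffs above can be summarized in a small predicate. This is a sketch only: the function and argument names are hypothetical, and the values are assumed to have already been extracted from the Cuffdiff output (average FPKM per condition and the multiple-testing-corrected significance value).

```python
def is_nmd_upregulated(fpkm_wt, fpkm_upf1, fdr,
                       min_upf1_fpkm=10.0, min_fold=2.0, max_fdr=0.05):
    """Apply the cutoffs described in the text to one transcript.

    Transcripts must have FPKM >= 10 in upf1-delta and FPKM > 0 in
    wild-type, then show a >= 2-fold increase at FDR < 0.05.
    """
    if fpkm_upf1 < min_upf1_fpkm or fpkm_wt <= 0:
        return False  # excluded from further analysis
    return fpkm_upf1 / fpkm_wt >= min_fold and fdr < max_fdr
```

Requiring FPKM > 0 in wild-type also keeps the fold-change ratio defined.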

Comparison of NMD-sensitive mRNAs to previous lists of NMD substrates in

yeast - mRNAs identified here as sensitive to NMD were compared to a

previous list of NMD sensitive mRNAs identified based on increased

expression at steady-state (He et al., 2003) to identify mRNAs common to

both datasets. RNAs that show a ≥2-fold increase in expression in upf1Δ

were mined from the He et al. dataset generated using a YGS98 microarray.

Only annotated gene loci were included for comparison, based on annotation

as a “YXXNNNX” systematic name and no inclusion of “TAU,” “DELTA,” or

“SIGMA” which identify transposable elements, to make a parallel comparison

to RNA-seq data.
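The name-based filter can be expressed as a regular expression. The pattern below is an assumption about what “YXXNNNX” means in practice (Y, chromosome letter A to P, arm L or R, three digits, strand W or C, with an optional suffix such as -A); the exact identifiers on the microarray may differ, and the function name is hypothetical.

```python
import re

# Assumed form of a yeast systematic ORF name, e.g. YMR107W or YAL001C-A.
SYSTEMATIC_NAME = re.compile(r"^Y[A-P][LR]\d{3}[WC](-[A-Z])?$")
TRANSPOSON_TAGS = ("TAU", "DELTA", "SIGMA")


def is_annotated_gene(name):
    """True for systematic gene names that are not transposable elements."""
    name = name.upper()
    if any(tag in name for tag in TRANSPOSON_TAGS):
        return False
    return SYSTEMATIC_NAME.match(name) is not None
```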

NMD-sensitive RNAs were compared to those identified as containing

long 3’ UTRs (Kebaara and Atkin, 2009); 19 were common to both datasets.

Any mRNAs identified with translated uORFs in the 5’ UTR by ribosome

profiling (Ingolia et al., 2009) were also compared to NMD-sensitive mRNAs;

35 mRNAs were common to both datasets, with 1 also containing a long 3’

UTR which was included in only the long 3’ UTR group for data presentation.

Finally, the remaining NMD-sensitive mRNA loci were visually inspected for

patterns of sequencing read coverage in the IGV genome browser

(www.broadinstitute.org/igv). By qualitative analysis only, those which

demonstrated a 5’ transcript boundary further upstream than a typical mRNA

were labelled as having “upstream transcription;” those which demonstrated a

3’ transcript boundary further downstream than a typical mRNA were labelled

as having “downstream transcription;” and those for which reads spanning

two adjacent genes seemed to correspond to a single transcript were labelled

as “multi-gene transcription.”

Identification of NMD-sensitive transcript regions - RNA-seq SAM files were

analyzed to calculate the number of reads at each individual nucleotide

across the sacCer2 genome for each dataset. Read counts were normalized

for sequencing coverage (total library reads). Read counts at each nucleotide

were compared between corresponding WT and upf1Δ datasets to calculate a

fold change in upf1Δ; nucleotides with no read coverage in either dataset

were set as a fold-change of ‘0’. A sliding window with a step size ‘s’ of 10 was used to calculate the average coverage in upf1Δ and average fold change in upf1Δ. NMD-sensitive features were located, where a feature is a contiguous set of sliding windows in which 1) the expression (col_3_min) and fold-change (col_4_min) of each sliding window meet a set of cutoffs that indicates an increase in expression in upf1Δ, 2) the average expression

(col_3_avg_min) and fold-change (col_4_avg_min) across the entire feature meet a more stringent set of cutoffs, and 3) there is no sharp difference in the fold-change across the feature (fold_change_max) which could suggest two adjacent unique features. The following set of variables and default settings are incorporated; default settings were used for the presented data unless otherwise noted:

col_3_min (minimum expression in upf1Δ); default 5

col_4_min (minimum fold-change in upf1Δ); default 1.5

fold_change_max (maximum fold-change between two windows

before splitting the feature); default 3

feature_length (minimum length of feature); default 25 - set at 50

col_3_avg_min (minimum expression in upf1Δ averaged across the

feature); default 10

col_4_avg_min (minimum fold-change in upf1Δ averaged across the

feature); default 2

All custom scripts were written in Python by N. Kline.
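The feature-calling logic can be sketched as follows. This is not the original script. It assumes non-overlapping windows whose width equals the step size, takes precomputed (expression, fold-change) pairs per window as input, and interprets fold_change_max as the ratio between the fold-changes of adjacent windows; each of these is an assumption, as is the function name. The parameter names and defaults follow the list above.

```python
def find_features(windows, step=10, col_3_min=5, col_4_min=1.5,
                  fold_change_max=3, feature_length=25,
                  col_3_avg_min=10, col_4_avg_min=2):
    """Return (start, end) nucleotide spans, end exclusive, of features.

    windows: list of (expression, fold_change) for consecutive windows.
    """
    features, current = [], []

    def flush():
        # Close the open run of windows and keep it if it passes the
        # length and feature-wide average cutoffs.
        if current:
            n = len(current)
            avg_expr = sum(windows[i][0] for i in current) / n
            avg_fold = sum(windows[i][1] for i in current) / n
            if (n * step >= feature_length and avg_expr >= col_3_avg_min
                    and avg_fold >= col_4_avg_min):
                features.append((current[0] * step, (current[-1] + 1) * step))
        current.clear()

    for i, (expr, fold) in enumerate(windows):
        if expr < col_3_min or fold < col_4_min:
            flush()  # window fails per-window cutoffs; end any open run
            continue
        if current:
            prev_fold = windows[current[-1]][1]
            # Both values are >= col_4_min > 0, so the ratio is defined.
            if max(fold, prev_fold) / min(fold, prev_fold) > fold_change_max:
                flush()  # sharp change suggests two adjacent features
        current.append(i)
    flush()
    return features
```

With the defaults, a run of three passing windows (30 nucleotides) is reported, while a single passing window isolated by a sharp fold-change jump is dropped for being shorter than feature_length.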

Generation of yKB650 - For global transcriptional shutoff, a strain containing

temperature-sensitive Pol II allele rpb1-1 and upf1Δ in the BY4741 strain

background was constructed by mating and sporulation. yKB146 (MATa,

upf1Δ) was crossed with yJC1884 (MATα, rpb1-1). Diploids were selected on

rich media and sporulated at room temperature in sporulation media (1%

potassium acetate, 0.1% yeast extract, 0.05% glucose) until >50% tetrads

were observed by light microscopy. Spores were gently separated with

Zymolyase (incubated rotating at 30 °C for 1 hr, glass beads added, incubated

rotating at 30 °C for an additional hour, and vortexed for 2 minutes to disrupt

the ascus). Single spores were plated onto rich media plus G418 to select for

the deletion of upf1 with the kanamycin resistance cassette. Spores

demonstrating slow growth representative of rpb1-1 were screened for

temperature-sensitive growth at 37 °C and for deletion of UPF1 by PCR, and the

RPB1 locus was sequenced, to generate yKB650.

rpb1-1 transcriptional shutoff - Transcriptional shutoff was performed with

yJC1883 (rpb1-1, Resgen) and yKB650 (rpb1-1, upf1Δ, Resgen) essentially

as described in Appendix D. Cells were concentrated into 11 mL room

temperature synthetic media, a 1 mL aliquot was removed for the zero minute

timepoint, and an equal volume of synthetic media at 56 °C was added to shift

the culture to 37 °C. Cells were incubated at 37 °C, 2 mL was removed at

each timepoint, cells pelleted, and flash frozen on dry ice. Whole cell RNA

was isolated by phenol extraction and analyzed by northern analysis.

Name | Description | Notes | Source
oKB660 | GTATGTGATCTGCATCTGAGATTCGAGA | Northern probe for AIF1 | This study
oKB662 | CCTCATAAAAAGGTGGATTCTCCGG | Northern probe for BSC5 | This study
oKB663 | CTATGGTGCAATCGCTGACATGAG | Northern probe for YPT35 | This study
oKB667 | CCTCGTAGTCATCGGGCTCAATTC | Northern probe for ISC10 | This study
oKB668 | GATTAACTTCAGAGGATGGCTGCTTGC | Northern probe for SLO1 | This study
oKB681 | GCAAATAGAGTGCAACTTGCGGTTC | Northern probe for ULI1 | This study
oKB682 | GGAGATGAAGAGTCCTCGAAATCAGTC | Northern probe for DAL7 | This study
oKB683 | GTAAAGGCCCTCTTAGCAACTTCGAC | Northern probe for ESF1 | This study
yJC1883 | MATa, ura3, leu2, his3, met15, rpb1-1::no marker | rpb1-1 a | Najwa Al-Husaini
yJC1884 | MATα, ura3, leu2, his3, met15, rpb1-1::no marker | rpb1-1 α | Najwa Al-Husaini
yKB146 | MATa, ura3, leu2, his3, met15, upf1Δ::KanMX6 | upf1Δ | EUROSCARF
yKB154 | MATa, ura3, leu2, his3, met15 | Wild-type | EUROSCARF
yKB650 | MATa, ura3, leu2, his3, met15, upf1Δ::KanMX6, rpb1-1::no marker | rpb1-1, upf1Δ | This study

Table E-1. List of oligonucleotides, plasmids, and yeast strains used in Chapter 4. Details of reagents used for presented data are listed.

BIBLIOGRAPHY

Altamura, N., Groudinsky, O., Dujardin, G., and Slonimski, P.P. (1992). NAM7 nuclear gene encodes a novel member of a family of helicases with a Zn-ligand motif and is involved in mitochondrial functions in Saccharomyces cerevisiae. J. Mol. Biol. 224, 575-587.
Amrani, N., Dong, S., He, F., Ganesan, R., Ghosh, S., Kervestin, S., Li, C., Mangus, D.A., Spatrick, P., and Jacobson, A. (2006). Aberrant termination triggers nonsense-mediated mRNA decay. Biochem. Soc. Trans. 34, 39-42.
Amrani, N., Ganesan, R., Kervestin, S., Mangus, D.A., Ghosh, S., and Jacobson, A. (2004). A faux 3'-UTR promotes aberrant termination and triggers nonsense-mediated mRNA decay. Nature 432, 112-118.
Andrews, S.J., and Rothnagel, J.A. (2014). Emerging evidence for functional peptides encoded by short open reading frames. Nat. Rev. Gen. 15, 193-204.
Applequist, S.E., Selg, M., Raman, C., and Jack, H.M. (1997). Cloning and characterization of HUPF1, a human homolog of the Saccharomyces cerevisiae nonsense mRNA-reducing UPF1 protein. Nucleic Acids Res. 25, 814-821.
Arribere, J.A., and Gilbert, W.V. (2013). Roles for transcript leaders in translation and mRNA decay revealed by transcript leader sequencing. Genome Res. 23, 977-987.
Aspden, J.L., Eyre-Walker, Y.C., Phillips, R.J., Amin, U., Mumtaz, M.A., Brocard, M., and Couso, J.P. (2014). Extensive translation of small ORFs revealed by Poly-Ribo-Seq. eLife 3, e03528.
Baker, K.E., and Parker, R. (2004). Nonsense-mediated mRNA decay: terminating erroneous gene expression. Curr. Opin. Cell Biol. 16, 293-299.
Baker, K.E., and Parker, R. (2006). Conventional 3' end formation is not required for NMD substrate recognition in Saccharomyces cerevisiae. RNA 12, 1441-1445.
Baltz, A.G., Munschauer, M., Schwanhausser, B., Vasile, A., Murakawa, Y., Schueler, M., Youngs, N., Penfold-Brown, D., Drew, K., Milek, M., et al. (2012). The mRNA-bound proteome and its global occupancy profile on protein-coding transcripts. Mol. Cell 46, 674-690.
Banfai, B., Jia, H., Khatun, J., Wood, E., Risk, B., Gundling, W.E., Jr., Kundaje, A., Gunawardena, H.P., Yu, Y., Xie, L., et al. (2012). Long noncoding RNAs are rarely translated in two human cell lines. Genome Res. 22, 1646-1657.
Bardwell, V.J., and Wickens, M. (1990). Purification of RNA and RNA-protein complexes by an R17 coat protein affinity method. Nucleic Acids Res. 18, 6587-6594.

Behm-Ansmant, I., Gatfield, D., Rehwinkel, J., Hilgers, V., and Izaurralde, E. (2007). A conserved role for cytoplasmic poly(A)-binding protein 1 (PABPC1) in nonsense-mediated mRNA decay. EMBO J. 26, 1591-1601.
Belew, A.T., Advani, V.M., and Dinman, J.D. (2011). Endogenous ribosomal frameshift signals operate as mRNA destabilizing elements through at least two molecular pathways in yeast. Nucleic Acids Res. 39, 2799-2808.
Belew, A.T., Meskauskas, A., Musalgaonkar, S., Advani, V.M., Sulima, S.O., Kasprzak, W.K., Shapiro, B.A., and Dinman, J.D. (2014). Ribosomal frameshifting in the CCR5 mRNA is regulated by miRNAs and the NMD pathway. Nature 512, 265-269.
Belgrader, P., Cheng, J., and Maquat, L.E. (1993). Evidence to implicate translation by ribosomes in the mechanism by which nonsense codons reduce the nuclear level of human triosephosphate mRNA. Proc. Natl. Acad. Sci. USA 90, 482-486.
Bhattacharya, A., Czaplinski, K., Trifillis, P., He, F., Jacobson, A., and Peltz, S.W. (2000). Characterization of the biochemical properties of the human Upf1 gene product that is involved in nonsense-mediated mRNA decay. RNA 6, 1226-1235.
Bhuvanagiri, M., Schlitter, A.M., Hentze, M.W., and Kulozik, A.E. (2010). NMD: RNA biology meets human genetic medicine. Biochem. J. 430, 365-377.
Boehm, V., Haberman, N., Ottens, F., Ule, J., and Gehring, N.H. (2014). 3' UTR length and messenger ribonucleoprotein composition determine endocleavage efficiencies at termination codons. Cell Rep. 9, 555-568.
Brar, G.A., Yassour, M., Friedman, N., Regev, A., Ingolia, N.T., and Weissman, J.S. (2012). High-resolution view of the yeast meiotic program revealed by ribosome profiling. Science 335, 552-557.
Braunschweig, U., Barbosa-Morais, N.L., Pan, Q., Nachman, E.N., Alipanahi, B., Gonatopoulos-Pournatzis, T., Frey, B., Irimia, M., and Blencowe, B.J. (2014). Widespread intron retention in mammals functionally tunes transcriptomes. Genome Res. 24, 1774-1786.
Buhler, M., Steiner, S., Mohn, F., Paillusson, A., and Muhlemann, O. (2006). EJC-independent degradation of nonsense immunoglobulin-mu mRNA depends on 3' UTR length. Nat. Struct. Mol. Biol. 13, 462-464.
Bumgarner, S.L., Dowell, R.D., Grisafi, P., Gifford, D.K., and Fink, G.R. (2009). Toggle involving cis-interfering noncoding RNAs controls variegated gene expression in yeast. Proc. Natl. Acad. Sci. USA 106, 18321-18326.
Cali, B.M., Kuchma, S.L., Latham, J., and Anderson, P. (1999). smg-7 is required for mRNA surveillance in Caenorhabditis elegans. Genetics 151, 605-616.
Camblong, J., Iglesias, N., Fickentscher, C., Dieppois, G., and Stutz, F. (2007). Antisense RNA stabilization induces transcriptional gene silencing via histone deacetylation in S. cerevisiae. Cell 131, 706-717.

Cao, D., and Parker, R. (2003). Computational modeling and experimental analysis of nonsense-mediated decay in yeast. Cell 113, 533-545.
Carter, M.S., Doskow, J., Morris, P., Li, S., Nhim, R.P., Sandstedt, S., and Wilkinson, M.F. (1995). A regulatory mechanism that detects premature nonsense codons in T-cell receptor transcripts in vivo is reversed by protein synthesis inhibitors in vitro. J. Biol. Chem. 270, 28995-29003.
Carter, M.S., Li, S., and Wilkinson, M.F. (1996). A splicing-dependent regulatory mechanism that detects translation signals. EMBO J. 15, 5965-5975.
Carvunis, A.R., Rolland, T., Wapinski, I., Calderwood, M.A., Yildirim, M.A., Simonis, N., Charloteaux, B., Hidalgo, C.A., Barbette, J., Santhanam, B., et al. (2012). Proto-genes and de novo gene birth. Nature 487, 370-374.
Castello, A., Fischer, B., Eichelbaum, K., Horos, R., Beckmann, B.M., Strein, C., Davey, N.E., Humphreys, D.T., Preiss, T., Steinmetz, L.M., et al. (2012). Insights into RNA biology from an atlas of mammalian mRNA-binding proteins. Cell 149, 1393-1406.
Chakrabarti, S., Jayachandran, U., Bonneau, F., Fiorini, F., Basquin, C., Domcke, S., Le Hir, H., and Conti, E. (2011). Molecular mechanisms for the RNA-dependent ATPase activity of Upf1 and its regulation by Upf2. Mol. Cell 41, 693-703.
Chamieh, H., Ballut, L., Bonneau, F., and Le Hir, H. (2008). NMD factors UPF2 and UPF3 bridge UPF1 to the exon junction complex and stimulate its RNA helicase activity. Nat. Struct. Mol. Biol. 15, 85-93.
Chapin, A., Hu, H., Rynearson, S.G., Hollien, J., Yandell, M., and Metzstein, M.M. (2014). In vivo determination of direct targets of the nonsense-mediated decay pathway in Drosophila. G3 4, 485-496.
Chew, G.L., Pauli, A., Rinn, J.L., Regev, A., Schier, A.F., and Valen, E. (2013). Ribosome profiling reveals resemblance between long non-coding RNAs and 5' leaders of coding RNAs. Development 140, 2828-2834.
Chiu, S.Y., Serin, G., Ohara, O., and Maquat, L.E. (2003). Characterization of human Smg5/7a: a protein with similarities to Caenorhabditis elegans SMG5 and SMG7 that functions in the dephosphorylation of Upf1. RNA 9, 77-87.
Cho, H., Kim, K.M., and Kim, Y.K. (2009). Human proline-rich nuclear receptor coregulatory protein 2 mediates an interaction between mRNA surveillance machinery and decapping complex. Mol. Cell 33, 75-86.
Conti, E., and Izaurralde, E. (2005). Nonsense-mediated mRNA decay: molecular insights and mechanistic variations across species. Curr. Opin. Cell Biol. 17, 316-325.
Culbertson, M.R., Underbrink, K.M., and Fink, G.R. (1980). Frameshift suppression in Saccharomyces cerevisiae. II. Genetic properties of group II suppressors. Genetics 95, 833-853.

155 Czaplinski, K., Weng, Y., Hagan, K.W., and Peltz, S.W. (1995). Purification and characterization of the Upf1 protein: a factor involved in translation and mRNA degradation. RNA 1, 610-623. de la Cruz, J., Kressler, D., and Linder, P. (1999). Unwinding RNA in Saccharomyces cerevisiae: DEAD-box proteins and related families. Trends Biochem. Sci. 24, 192-198. Decker, C.J., and Parker, R. (1993). A turnover pathway for both stable and unstable mRNAs in yeast: evidence for a requirement for deadenylation. Genes Dev. 7, 1632-1643. Decourty, L., Doyen, A., Malabat, C., Frachon, E., Rispal, D., Seraphin, B., Feuerbach, F., Jacquier, A., and Saveanu, C. (2014). Long open reading frame transcripts escape nonsense-mediated mRNA decay in yeast. Cell Rep. 6, 593-598. Denisenko, O., and Bomsztyk, K. (2002). Yeast hnRNP K-like genes are involved in regulation of the telomeric position effect and telomere length. Mol. Cell. Biol. 22, 286-297. Derrien, T., Johnson, R., Bussotti, G., Tanzer, A., Djebali, S., Tilgner, H., Guernec, G., Martin, D., Merkel, A., Knowles, D.G., et al. (2012). The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775-1789. Doma, M.K., and Parker, R. (2006). Endonucleolytic cleavage of eukaryotic mRNAs with stalls in translation elongation. Nature 440, 561-564. Du, M., Liu, X., Welch, E.M., Hirawat, S., Peltz, S.W., and Bedwell, D.M. (2008). PTC124 is an orally bioavailable compound that promotes suppression of the human CFTR-G542X nonsense allele in a CF mouse model. Proc. Natl. Acad. Sci. USA 105, 2064-2069. Eberle, A.B., Lykke-Andersen, S., Muhlemann, O., and Jensen, T.H. (2009). SMG6 promotes endonucleolytic cleavage of nonsense mRNA in human cells. Nat. Struct. Mol. Biol. 16, 49-55. Eberle, A.B., Stalder, L., Mathys, H., Orozco, R.Z., and Muhlemann, O. (2008). Posttranscriptional gene regulation by spatial rearrangement of the 3' untranslated region. PLoS Biol. 6, e92. 
Eberle, A.B., and Visa, N. (2014). Quality control of mRNP biogenesis: networking at the transcription site. Semin. Cell Dev. Biol. 32, 37-46.

Engreitz, J.M., Pandya-Jones, A., McDonel, P., Shishkin, A., Sirokman, K., Surka, C., Kadri, S., Xing, J., Goren, A., Lander, E.S., et al. (2013). The Xist lncRNA exploits three-dimensional genome architecture to spread across the X chromosome. Science 341, 1237973.

Engreitz, J.M., Sirokman, K., McDonel, P., Shishkin, A.A., Surka, C., Russell, P., Grossman, S.R., Chow, A.Y., Guttman, M., and Lander, E.S. (2014). RNA-RNA interactions enable specific targeting of noncoding RNAs to nascent pre-mRNAs and chromatin sites. Cell 159, 188-199.

Franks, T.M., Singh, G., and Lykke-Andersen, J. (2010). Upf1 ATPase-dependent mRNP disassembly is required for completion of nonsense-mediated mRNA decay. Cell 143, 938-950.

Frischmeyer, P.A., and Dietz, H.C. (1999). Nonsense-mediated mRNA decay in health and disease. Hum. Mol. Genet. 8, 1893-1900.

Frischmeyer, P.A., van Hoof, A., O'Donnell, K., Guerrerio, A.L., Parker, R., and Dietz, H.C. (2002). An mRNA surveillance mechanism that eliminates transcripts lacking termination codons. Science 295, 2258-2261.

Frischmeyer-Guerrerio, P.A., Montgomery, R.A., Warren, D.S., Cooke, S.K., Lutz, J., Sonnenday, C.J., Guerrerio, A.L., and Dietz, H.C. (2011). Perturbation of thymocyte development in nonsense-mediated decay (NMD)-deficient mice. Proc. Natl. Acad. Sci. USA 108, 10638-10643.

Fritsch, C., Herrmann, A., Nothnagel, M., Szafranski, K., Huse, K., Schumann, F., Schreiber, S., Platzer, M., Krawczak, M., Hampe, J., et al. (2012). Genome-wide search for novel human uORFs and N-terminal protein extensions using ribosomal footprinting. Genome Res. 22, 2208-2218.

Fukuhara, N., Ebert, J., Unterholzner, L., Lindner, D., Izaurralde, E., and Conti, E. (2005). SMG7 is a 14-3-3-like adaptor in the nonsense-mediated mRNA decay pathway. Mol. Cell 17, 537-547.

Galindo, M.I., Pueyo, J.I., Fouix, S., Bishop, S.A., and Couso, J.P. (2007). Peptides encoded by short ORFs control development and define a new eukaryotic gene family. PLoS Biol. 5, e106.

Gatfield, D., Unterholzner, L., Ciccarelli, F.D., Bork, P., and Izaurralde, E. (2003). Nonsense-mediated mRNA decay in Drosophila: at the intersection of the yeast and mammalian pathways. EMBO J. 22, 3960-3970.

Geisler, S., and Coller, J. (2013). RNA in unexpected places: long non-coding RNA functions in diverse cellular contexts. Nat. Rev. Mol. Cell Biol. 14, 699-712.
Geisler, S., Lojek, L., Khalil, A.M., Baker, K.E., and Coller, J. (2012). Decapping of long noncoding RNAs regulates inducible genes. Mol. Cell 45, 279-291.

Ghaemmaghami, S., Huh, W.K., Bower, K., Howson, R.W., Belle, A., Dephoure, N., O'Shea, E.K., and Weissman, J.S. (2003). Global analysis of protein expression in yeast. Nature 425, 737-741.

Gietz, R.D., and Sugino, A. (1988). New yeast-Escherichia coli shuttle vectors constructed with in vitro mutagenized yeast genes lacking six-base pair restriction sites. Gene 74, 527-534.

Gilbert, C., Kristjuhan, A., Winkler, G.S., and Svejstrup, J.Q. (2004). Elongator interactions with nascent mRNA revealed by RNA immunoprecipitation. Mol. Cell 14, 457-464.

Goecks, J., Nekrutenko, A., and Taylor, J. (2010). Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, R86.

Gonzalez, C.I., Ruiz-Echevarria, M.J., Vasudevan, S., Henry, M.F., and Peltz, S.W. (2000). The yeast hnRNP-like protein Hrp1/Nab4 marks a transcript for nonsense-mediated mRNA decay. Mol. Cell 5, 489-499.

Gozalbo, D., and Hohmann, S. (1990). Nonsense suppressors partially revert the decrease of the mRNA level of a nonsense mutant allele in yeast. Curr. Genet. 17, 77-79.

Grabowski, P.J., and Sharp, P.A. (1986). Affinity chromatography of splicing complexes: U2, U5, and U4 + U6 small nuclear ribonucleoprotein particles in the spliceosome. Science 233, 1294-1299.

Guan, Q., Zheng, W., Tang, S., Liu, X., Zinkel, R.A., Tsui, K.W., Yandell, B.S., and Culbertson, M.R. (2006). Impact of nonsense-mediated mRNA decay on the global expression profile of budding yeast. PLoS Genet. 2, e203.

Guttman, M., Amit, I., Garber, M., French, C., Lin, M.F., Feldser, D., Huarte, M., Zuk, O., Carey, B.W., Cassady, J.P., et al. (2009). Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223-227.

Guttman, M., Russell, P., Ingolia, N.T., Weissman, J.S., and Lander, E.S. (2013). Ribosome profiling provides evidence that large noncoding RNAs do not encode proteins. Cell 154, 240-251.

Hagan, K.W., Ruiz-Echevarria, M.J., Quan, Y., and Peltz, S.W. (1995). Characterization of cis-acting sequences and decay intermediates involved in nonsense-mediated mRNA turnover. Mol. Cell. Biol. 15, 809-823.

He, F., and Jacobson, A. (1995). Identification of a novel component of the nonsense-mediated mRNA decay pathway by use of an interacting protein screen. Genes Dev. 9, 437-454.

He, F., Li, X., Spatrick, P., Casillo, R., Dong, S., and Jacobson, A. (2003). Genome-wide analysis of mRNAs regulated by the nonsense-mediated and 5' to 3' mRNA decay pathways in yeast. Mol. Cell 12, 1439-1452.

He, F., Peltz, S.W., Donahue, J.L., Rosbash, M., and Jacobson, A. (1993). Stabilization and ribosome association of unspliced pre-mRNAs in a yeast upf1- mutant. Proc. Natl. Acad. Sci. USA 90, 7034-7038.

Hodgkin, J., Papp, A., Pulak, R., Ambros, V., and Anderson, P. (1989). A new kind of informational suppression in the nematode Caenorhabditis elegans. Genetics 123, 301-313.

Hogg, J.R., and Collins, K. (2007). RNA-based affinity purification reveals 7SK RNPs with distinct composition and regulation. RNA 13, 868-880.

Hogg, J.R., and Goff, S.P. (2010). Upf1 senses 3'UTR length to potentiate mRNA decay. Cell 143, 379-389.

Holbrook, J.A., Neu-Yilik, G., Hentze, M.W., and Kulozik, A.E. (2004). Nonsense-mediated decay approaches the clinic. Nat. Genet. 36, 801-808.

Huntzinger, E., Kashima, I., Fauser, M., Sauliere, J., and Izaurralde, E. (2008). SMG6 is the catalytic endonuclease that cleaves mRNAs containing nonsense codons in metazoan. RNA 14, 2609-2617.

Hurt, J.A., Robertson, A.D., and Burge, C.B. (2013). Global analyses of UPF1 binding and function reveal expanded scope of nonsense-mediated mRNA decay. Genome Res. 23, 1636-1650.

Hwang, J., Sato, H., Tang, Y., Matsuda, D., and Maquat, L.E. (2010). UPF1 association with the cap-binding protein, CBP80, promotes nonsense-mediated mRNA decay at two distinct steps. Mol. Cell 39, 396-409.

Ingolia, N.T. (2010). Genome-wide translational profiling by ribosome footprinting. Methods Enzymol. 470, 119-142.

Ingolia, N.T., Brar, G.A., Rouskin, S., McGeachy, A.M., and Weissman, J.S. (2012). The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments. Nat. Protoc. 7, 1534-1550.

Ingolia, N.T., Brar, G.A., Stern-Ginossar, N., Harris, M.S., Talhouarne, G.J., Jackson, S.E., Wills, M.R., and Weissman, J.S. (2014). Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep. 8, 1365-1379.

Ingolia, N.T., Ghaemmaghami, S., Newman, J.R., and Weissman, J.S. (2009). Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218-223.

Ingolia, N.T., Lareau, L.F., and Weissman, J.S. (2011). Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789-802.

Isken, O., Kim, Y.K., Hosoda, N., Mayeur, G.L., Hershey, J.W., and Maquat, L.E. (2008). Upf1 phosphorylation triggers translational repression during nonsense-mediated mRNA decay. Cell 133, 314-327.

Ivanov, P.V., Gehring, N.H., Kunz, J.B., Hentze, M.W., and Kulozik, A.E. (2008). Interactions between UPF1, eRFs, PABP and the exon junction complex suggest an integrated model for mammalian NMD pathways. EMBO J. 27, 736-747.

Jacobs, J.L., Belew, A.T., Rakauskaite, R., and Dinman, J.D. (2007). Identification of functional, endogenous programmed -1 ribosomal frameshift signals in the genome of Saccharomyces cerevisiae. Nucleic Acids Res. 35, 165-174.

Jaillon, O., Bouhouche, K., Gout, J.F., Aury, J.M., Noel, B., Saudemont, B., Nowacki, M., Serrano, V., Porcel, B.M., Segurens, B., et al. (2008). Translational control of intron splicing in eukaryotes. Nature 451, 359-362.

Johansson, M.J., He, F., Spatrick, P., Li, C., and Jacobson, A. (2007). Association of yeast Upf1p with direct substrates of the NMD pathway. Proc. Natl. Acad. Sci. USA 104, 20872-20877.

Johns, L., Grimson, A., Kuchma, S.L., Newman, C.L., and Anderson, P. (2007). Caenorhabditis elegans SMG-2 selectively marks mRNAs containing premature translation termination codons. Mol. Cell. Biol. 27, 5630-5638.

Juntawong, P., Girke, T., Bazin, J., and Bailey-Serres, J. (2014). Translational dynamics revealed by genome-wide profiling of ribosome footprints in Arabidopsis. Proc. Natl. Acad. Sci. USA 111, E203-212.

Kadlec, J., Guilligay, D., Ravelli, R.B., and Cusack, S. (2006). Crystal structure of the UPF2-interacting domain of nonsense-mediated mRNA decay factor UPF1. RNA 12, 1817-1824.

Kadlec, J., Izaurralde, E., and Cusack, S. (2004). The structural basis for the interaction between nonsense-mediated mRNA decay factors UPF2 and UPF3. Nat. Struct. Mol. Biol. 11, 330-337.

Kashima, I., Yamashita, A., Izumi, N., Kataoka, N., Morishita, R., Hoshino, S., Ohno, M., Dreyfuss, G., and Ohno, S. (2006). Binding of a novel SMG-1-Upf1-eRF1-eRF3 complex (SURF) to the exon junction complex triggers Upf1 phosphorylation and nonsense-mediated mRNA decay. Genes Dev. 20, 355-367.

Kastenmayer, J.P., Ni, L., Chu, A., Kitchen, L.E., Au, W.C., Yang, H., Carter, C.D., Wheeler, D., Davis, R.W., Boeke, J.D., et al. (2006). Functional genomics of genes with small open reading frames (sORFs) in S. cerevisiae. Genome Res. 16, 365-373.

Kawashima, T., Douglass, S., Gabunilas, J., Pellegrini, M., and Chanfreau, G.F. (2014). Widespread use of non-productive alternative splice sites in Saccharomyces cerevisiae. PLoS Genet. 10, e1004249.

Kebaara, B.W., and Atkin, A.L. (2009). Long 3'-UTRs target wild-type mRNAs for nonsense-mediated mRNA decay in Saccharomyces cerevisiae. Nucleic Acids Res. 37, 2771-2778.

Keene, J.D., Komisarow, J.M., and Friedersdorf, M.B. (2006). RIP-Chip: the isolation and identification of mRNAs, microRNAs and protein components of ribonucleoprotein complexes from cell extracts. Nat. Protoc. 1, 302-307.

Kervestin, S., Li, C., Buckingham, R., and Jacobson, A. (2012). Testing the faux-UTR model for NMD: analysis of Upf1p and Pab1p competition for binding to eRF3/Sup35p. Biochimie 94, 1560-1571.

Khajavi, M., Inoue, K., and Lupski, J.R. (2006). Nonsense-mediated mRNA decay modulates clinical outcome of genetic disease. Eur. J. Hum. Genet. 14, 1074-1081.

Khalil, A.M., Guttman, M., Huarte, M., Garber, M., Raj, A., Rivea Morales, D., Thomas, K., Presser, A., Bernstein, B.E., van Oudenaarden, A., et al. (2009). Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. USA 106, 11667-11672.

Kim, T., Xu, Z., Clauder-Munster, S., Steinmetz, L.M., and Buratowski, S. (2012). Set3 HDAC mediates effects of overlapping noncoding transcription on gene induction kinetics. Cell 150, 1158-1169.

Kozak, M. (1989). The scanning model for translation: an update. J. Cell Biol. 108, 229-241.

Kurihara, Y., Matsui, A., Hanada, K., Kawashima, M., Ishida, J., Morosawa, T., Tanaka, M., Kaminuma, E., Mochizuki, Y., Matsushima, A., et al. (2009). Genome-wide suppression of aberrant mRNA-like noncoding RNAs by NMD in Arabidopsis. Proc. Natl. Acad. Sci. USA 106, 2453-2458.

Kuroha, K., Tatematsu, T., and Inada, T. (2009). Upf1 stimulates degradation of the product derived from aberrant messenger RNA containing a specific nonsense mutation by the proteasome. EMBO Rep. 10, 1265-1271.

Kurosaki, T., Li, W., Hoque, M., Popp, M.W., Ermolenko, D.N., Tian, B., and Maquat, L.E. (2014). A post-translational regulatory switch on UPF1 controls targeted mRNA degradation. Genes Dev. 28, 1900-1916.

Kurtzman, C.P., and Robnett, C.J. (2003). Phylogenetic relationships among yeasts of the 'Saccharomyces complex' determined from multigene sequence analyses. FEMS Yeast Res. 3, 417-432.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25.

Lareau, L.F., Inada, M., Green, R.E., Wengrod, J.C., and Brenner, S.E. (2007). Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements. Nature 446, 926-929.

Law, G.L., Bickel, K.S., MacKay, V.L., and Morris, D.R. (2005). The undertranslated transcriptome reveals widespread translational silencing by alternative 5' transcript leaders. Genome Biol. 6, R111.

Le Hir, H., Gatfield, D., Izaurralde, E., and Moore, M.J. (2001). The exon-exon junction complex provides a binding platform for factors involved in mRNA export and nonsense-mediated mRNA decay. EMBO J. 20, 4987-4997.

Le Hir, H., Izaurralde, E., Maquat, L.E., and Moore, M.J. (2000). The spliceosome deposits multiple proteins 20-24 nucleotides upstream of mRNA exon-exon junctions. EMBO J. 19, 6860-6869.

Leeds, P., Peltz, S.W., Jacobson, A., and Culbertson, M.R. (1991). The product of the yeast UPF1 gene is required for rapid turnover of mRNAs containing a premature translational termination codon. Genes Dev. 5, 2303-2314.

Leeds, P., Wood, J.M., Lee, B.S., and Culbertson, M.R. (1992). Gene products that promote mRNA turnover in Saccharomyces cerevisiae. Mol. Cell. Biol. 12, 2165-2177.

Lejeune, F., Li, X., and Maquat, L.E. (2003). Nonsense-mediated mRNA decay in mammalian cells involves decapping, deadenylating, and exonucleolytic activities. Mol. Cell 12, 675-687.

Lelivelt, M.J., and Culbertson, M.R. (1999). Yeast Upf proteins required for RNA surveillance affect global expression of the yeast transcriptome. Mol. Cell. Biol. 19, 6710-6719.

Lewis, B.P., Green, R.E., and Brenner, S.E. (2003). Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans. Proc. Natl. Acad. Sci. USA 100, 189-192.

Liu, C., Apodaca, J., Davis, L.E., and Rao, H. (2007). Proteasome inhibition in wild-type yeast Saccharomyces cerevisiae cells. Biotechniques 42, 158, 160, 162.

Loh, B., Jonas, S., and Izaurralde, E. (2013). The SMG5-SMG7 heterodimer directly recruits the CCR4-NOT deadenylase complex to mRNAs containing nonsense codons via interaction with POP2. Genes Dev. 27, 2125-2138.

Longman, D., Hug, N., Keith, M., Anastasaki, C., Patton, E.E., Grimes, G., and Caceres, J.F. (2013). DHX34 and NBAS form part of an autoregulatory NMD circuit that regulates endogenous RNA targets in human cells, zebrafish and Caenorhabditis elegans. Nucleic Acids Res. 41, 8319-8331.

Longman, D., Plasterk, R.H., Johnstone, I.L., and Caceres, J.F. (2007). Mechanistic insights and identification of two novel factors in the C. elegans NMD pathway. Genes Dev. 21, 1075-1085.

Longtine, M.S., McKenzie, A., 3rd, Demarini, D.J., Shah, N.G., Wach, A., Brachat, A., Philippsen, P., and Pringle, J.R. (1998). Additional modules for versatile and economical PCR-based gene deletion and modification in Saccharomyces cerevisiae. Yeast 14, 953-961.

Losson, R., and Lacroute, F. (1979). Interference of nonsense mutations with eukaryotic messenger RNA stability. Proc. Natl. Acad. Sci. USA 76, 5134-5137.
Lou, C.H., Shao, A., Shum, E.Y., Espinoza, J.L., Huang, L., Karam, R., and Wilkinson, M.F. (2014). Posttranscriptional control of the stem cell and neurogenic programs by the nonsense-mediated RNA decay pathway. Cell Rep. 6, 748-764.

Luke, B., Azzalin, C.M., Hug, N., Deplazes, A., Peter, M., and Lingner, J. (2007). Saccharomyces cerevisiae Ebs1p is a putative ortholog of human Smg7 and promotes nonsense-mediated mRNA decay. Nucleic Acids Res. 35, 7688-7697.

Lykke-Andersen, J. (2002). Identification of a human decapping complex associated with hUpf proteins in nonsense-mediated decay. Mol. Cell. Biol. 22, 8114-8121.

Lykke-Andersen, J., Shu, M.D., and Steitz, J.A. (2000). Human Upf proteins target an mRNA for nonsense-mediated decay when bound downstream of a termination codon. Cell 103, 1121-1131.

Lykke-Andersen, J., Shu, M.D., and Steitz, J.A. (2001). Communication of the position of exon-exon junctions to the mRNA surveillance machinery by the protein RNPS1. Science 293, 1836-1839.

Magny, E.G., Pueyo, J.I., Pearl, F.M., Cespedes, M.A., Niven, J.E., Bishop, S.A., and Couso, J.P. (2013). Conserved regulation of cardiac calcium uptake by peptides encoded in small open reading frames. Science 341, 1116-1120.

Mangus, D.A., Amrani, N., and Jacobson, A. (1998). Pbp1p, a factor interacting with Saccharomyces cerevisiae poly(A)-binding protein, regulates polyadenylation. Mol. Cell. Biol. 18, 7383-7396.

Maquat, L.E., and Li, X. (2001). Mammalian heat shock p70 and histone H4 transcripts, which derive from naturally intronless genes, are immune to nonsense-mediated decay. RNA 7, 445-456.

Martens, J.A., Laprade, L., and Winston, F. (2004). Intergenic transcription is required to repress the Saccharomyces cerevisiae SER3 gene. Nature 429, 571-574.

Matia-Gonzalez, A.M., Hasan, A., Moe, G.H., Mata, J., and Rodriguez-Gabriel, M.A. (2013). Functional characterization of Upf1 targets in Schizosaccharomyces pombe. RNA Biol. 10, 1057-1065.

Meaux, S., van Hoof, A., and Baker, K.E. (2008). Nonsense-mediated mRNA decay in yeast does not require PAB1 or a poly(A) tail. Mol. Cell 29, 134-140.

Medghalchi, S.M., Frischmeyer, P.A., Mendell, J.T., Kelly, A.G., Lawler, A.M., and Dietz, H.C. (2001). Rent1, a trans-effector of nonsense-mediated mRNA decay, is essential for mammalian embryonic viability. Hum. Mol. Genet. 10, 99-105.

Melero, R., Buchwald, G., Castano, R., Raabe, M., Gil, D., Lazaro, M., Urlaub, H., Conti, E., and Llorca, O. (2012). The cryo-EM structure of the UPF-EJC complex shows UPF1 poised toward the RNA 3' end. Nat. Struct. Mol. Biol. 19, 498-505, S491-492.
Mellacheruvu, D., Wright, Z., Couzens, A.L., Lambert, J.P., St-Denis, N.A., Li, T., Miteva, Y.V., Hauri, S., Sardiu, M.E., Low, T.Y., et al. (2013). The CRAPome: a contaminant repository for affinity purification-mass spectrometry data. Nat. Methods 10, 730-736.

Mendell, J.T., Medghalchi, S.M., Lake, R.G., Noensie, E.N., and Dietz, H.C. (2000). Novel Upf2p orthologues suggest a functional link between translation initiation and nonsense surveillance complexes. Mol. Cell. Biol. 20, 8944-8957.

Metzstein, M.M., and Krasnow, M.A. (2006). Functions of the nonsense-mediated mRNA decay pathway in Drosophila development. PLoS Genet. 2, e180.

Mili, S., and Steitz, J.A. (2004). Evidence for reassociation of RNA-binding proteins after cell lysis: implications for the interpretation of immunoprecipitation analyses. RNA 10, 1692-1694.

Mitchell, P., and Tollervey, D. (2003). An NMD pathway in yeast involving accelerated deadenylation and exosome-mediated 3'-->5' degradation. Mol. Cell 11, 1405-1413.

Mitchell, S.F., Jain, S., She, M., and Parker, R. (2013). Global analysis of yeast mRNPs. Nat. Struct. Mol. Biol. 20, 127-133.

Moriarty, P.M., Reddy, C.C., and Maquat, L.E. (1998). Selenium deficiency reduces the abundance of mRNA for Se-dependent glutathione peroxidase 1 by a UGA-dependent mechanism likely to be nonsense codon-mediated decay of cytoplasmic mRNA. Mol. Cell. Biol. 18, 2932-2939.

Muhlemann, O., and Lykke-Andersen, J. (2010). How and where are nonsense mRNAs degraded in mammalian cells? RNA Biol. 7, 28-32.

Muhlrad, D., Decker, C.J., and Parker, R. (1994). Deadenylation of the unstable mRNA encoded by the yeast MFA2 gene leads to decapping followed by 5'-->3' digestion of the transcript. Genes Dev. 8, 855-866.

Muhlrad, D., and Parker, R. (1994). Premature translational termination triggers mRNA decapping. Nature 370, 578-581.

Muhlrad, D., and Parker, R. (1999a). Aberrant mRNAs with extended 3' UTRs are substrates for rapid degradation by mRNA surveillance. RNA 5, 1299-1307.

Muhlrad, D., and Parker, R. (1999b). Recognition of yeast mRNAs as "nonsense containing" leads to both inhibition of mRNA translation and mRNA degradation: implications for the control of mRNA decapping. Mol. Biol. Cell 10, 3971-3978.

Muller-McNicoll, M., and Neugebauer, K.M. (2013). How cells get the message: dynamic assembly and function of mRNA-protein complexes. Nat. Rev. Genet. 14, 275-287.

Nacken, V., Achstetter, T., and Degryse, E. (1996). Probing the limits of expression levels by varying promoter strength and plasmid copy number in Saccharomyces cerevisiae. Gene 175, 253-260.
Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D., Gerstein, M., and Snyder, M. (2008). The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344-1349.

Nagy, E., and Maquat, L.E. (1998). A rule for termination-codon position within intron-containing genes: when nonsense affects RNA abundance. Trends Biochem. Sci. 23, 198-199.

Niazi, F., and Valadkhan, S. (2012). Computational analysis of functional long noncoding RNAs reveals lack of peptide-coding capacity and parallels with 3' UTRs. RNA 18, 825-843.

Niranjanakumari, S., Lasda, E., Brazas, R., and Garcia-Blanco, M.A. (2002). Reversible cross-linking combined with immunoprecipitation to study RNA-protein interactions in vivo. Methods 26, 182-190.

Nonet, M., Scafe, C., Sexton, J., and Young, R. (1987). Eucaryotic RNA polymerase conditional mutant that rapidly ceases mRNA synthesis. Mol. Cell. Biol. 7, 1602-1611.

Ohnishi, T., Yamashita, A., Kashima, I., Schell, T., Anders, K.R., Grimson, A., Hachiya, T., Hentze, M.W., Anderson, P., and Ohno, S. (2003). Phosphorylation of hUPF1 induces formation of mRNA surveillance complexes containing hSMG-5 and hSMG-7. Mol. Cell 12, 1187-1200.

Page, M.F., Carr, B., Anders, K.R., Grimson, A., and Anderson, P. (1999). SMG-2 is a phosphorylated protein required for mRNA surveillance in Caenorhabditis elegans and related to Upf1p of yeast. Mol. Cell. Biol. 19, 5943-5951.

Pal, M., Ishigaki, Y., Nagy, E., and Maquat, L.E. (2001). Evidence that phosphorylation of human Upf1 protein varies with intracellular location and is mediated by a wortmannin-sensitive and rapamycin-sensitive PI 3-kinase-related kinase signaling pathway. RNA 7, 5-15.

Pastor, F., Kolonias, D., Giangrande, P.H., and Gilboa, E. (2010). Induction of tumour immunity by targeted inhibition of nonsense-mediated mRNA decay. Nature 465, 227-230.

Pauli, A., Norris, M.L., Valen, E., Chew, G.L., Gagnon, J.A., Zimmerman, S., Mitchell, A., Ma, J., Dubrulle, J., Reyon, D., et al. (2014). Toddler: an embryonic signal that promotes cell movement via Apelin receptors. Science 343, 1248636.

Peltz, S.W., Brown, A.H., and Jacobson, A. (1993). mRNA destabilization triggered by premature translational termination depends on at least three cis-acting sequence elements and one trans-acting factor. Genes Dev. 7, 1737-1754.

Peltz, S.W., Morsy, M., Welch, E.M., and Jacobson, A. (2013). Ataluren as an agent for therapeutic nonsense suppression. Annu. Rev. Med. 64, 407-425.
Peltz, S.W., Welch, E.M., Trotta, C.R., Davis, T., and Jacobson, A. (2009). Targeting post-transcriptional control for drug discovery. RNA Biol. 6, 329-334.

Perlick, H.A., Medghalchi, S.M., Spencer, F.A., Kendzior, R.J., Jr., and Dietz, H.C. (1996). Mammalian orthologues of a yeast regulator of nonsense transcript stability. Proc. Natl. Acad. Sci. USA 93, 10928-10932.

Plant, E.P., Wang, P., Jacobs, J.L., and Dinman, J.D. (2004). A programmed -1 ribosomal frameshift signal can function as a cis-acting mRNA destabilizing element. Nucleic Acids Res. 32, 784-790.

Pulak, R., and Anderson, P. (1993). mRNA surveillance by the Caenorhabditis elegans smg genes. Genes Dev. 7, 1885-1897.

Rehwinkel, J., Raes, J., and Izaurralde, E. (2006). Nonsense-mediated mRNA decay: Target genes and functional diversification of effectors. Trends Biochem. Sci. 31, 639-646.

Rinn, J.L., Kertesz, M., Wang, J.K., Squazzo, S.L., Xu, X., Brugmann, S.A., Goodnough, L.H., Helms, J.A., Farnham, P.J., Segal, E., et al. (2007). Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311-1323.

Rio, D.C., Ares, M., Hannon, G.J., and Nilsen, T.W. (2011). RNA: a laboratory manual (Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press).

Roberts, A., Pimentel, H., Trapnell, C., and Pachter, L. (2011). Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics 27, 2325-2329.

Robinson, J.T., Thorvaldsdottir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and Mesirov, J.P. (2011). Integrative genomics viewer. Nat. Biotechnol. 29, 24-26.

Rodriguez-Gabriel, M.A., Watt, S., Bahler, J., and Russell, P. (2006). Upf1, an RNA helicase required for nonsense-mediated mRNA decay, modulates the transcriptional response to oxidative stress in fission yeast. Mol. Cell. Biol. 26, 6347-6356.

Ruiz-Orera, J., Messeguer, X., Subirana, J.A., and Alba, M.M. (2014). Long non-coding RNAs as a source of new peptides. eLife 3, e03523.

Said, N., Rieder, R., Hurwitz, R., Deckert, J., Urlaub, H., and Vogel, J. (2009). In vivo expression and purification of aptamer-tagged small RNA regulators. Nucleic Acids Res. 37, e133.

Sayani, S., Janis, M., Lee, C.Y., Toesca, I., and Chanfreau, G.F. (2008). Widespread impact of nonsense-mediated mRNA decay on the yeast intronome. Mol. Cell 31, 360-370.

Schweingruber, C., Rufener, S.C., Zund, D., Yamashita, A., and Muhlemann, O. (2013). Nonsense-mediated mRNA decay - mechanisms of substrate mRNA recognition and degradation in mammalian cells. Biochim. Biophys. Acta 1829, 612-623.

Selth, L.A., Gilbert, C., and Svejstrup, J.Q. (2009). RNA immunoprecipitation to determine RNA-protein associations in vivo. Cold Spring Harb. Protoc. 6, pdb.prot5234.

Serin, G., Gersappe, A., Black, J.D., Aronoff, R., and Maquat, L.E. (2001). Identification and characterization of human orthologues to Saccharomyces cerevisiae Upf2 protein and Upf3 protein (Caenorhabditis elegans SMG-4). Mol. Cell. Biol. 21, 209-223.

Seyedali, A., and Berry, M.J. (2014). Nonsense-mediated decay factors are involved in the regulation of selenoprotein mRNA levels during selenium deficiency. RNA 20, 1248-1256.

Sharma, S. (2008). Isolation of a sequence-specific RNA binding protein, polypyrimidine tract binding protein, using RNA affinity chromatography. Methods Mol. Biol. 488, 1-8.

Shi, Y. (2012). Alternative polyadenylation: new insights from global analyses. RNA 18, 2105-2117.

Shigeoka, T., Kato, S., Kawaichi, M., and Ishida, Y. (2012). Evidence that the Upf1-related molecular motor scans the 3'-UTR to ensure mRNA integrity. Nucleic Acids Res. 40, 6887-6897.

Shoemaker, C.J., and Green, R. (2012). Translation drives mRNA quality control. Nat. Struct. Mol. Biol. 19, 594-601.

Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., et al. (2005). Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034-1050.

Silva, A.L., Ribeiro, P., Inacio, A., Liebhaber, S.A., and Romao, L. (2008). Proximity of the poly(A)-binding protein to a premature termination codon inhibits mammalian nonsense-mediated mRNA decay. RNA 14, 563-576.

Silva, A.L., and Romao, L. (2009). The mammalian nonsense-mediated mRNA decay pathway: to decay or not to decay! Which players make the decision? FEBS Lett. 583, 499-505.

Singh, G., Rebbapragada, I., and Lykke-Andersen, J. (2008). A competition between stimulators and antagonists of Upf complex recruitment governs human nonsense-mediated mRNA decay. PLoS Biol. 6, e111.

Slobodin, B., and Gerst, J.E. (2010). A novel mRNA affinity purification technique for the identification of interacting proteins and transcripts in ribonucleoprotein complexes. RNA 16, 2277-2290.

Smith, J.E., Alvarez-Dominguez, J.R., Kline, N., Huynh, N.J., Geisler, S., Hu, W., Coller, J., and Baker, K.E. (2014). Translation of small open reading frames within unannotated RNA transcripts in Saccharomyces cerevisiae. Cell Rep. 7, 1858-1866.

Smith-Kinnaman, W.R., Berna, M.J., Hunter, G.O., True, J.D., Hsu, P., Cabello, G.I., Fox, M.J., Varani, G., and Mosley, A.L. (2014). The interactome of the atypical phosphatase Rtr1 in Saccharomyces cerevisiae. Mol. Biosyst. 10, 1730-1741.

Stalder, L., and Muhlemann, O. (2008). The meaning of nonsense. Trends Cell Biol. 18, 315-321.

Steiger, M., Carr-Schmid, A., Schwartz, D.C., Kiledjian, M., and Parker, R. (2003). Analysis of recombinant yeast decapping enzyme. RNA 9, 231-238.

Sutherland, B.W., Toews, J., and Kast, J. (2008). Utility of formaldehyde cross-linking and mass spectrometry in the study of protein-protein interactions. J. Mass Spectrom. 43, 699-715.

Sweet, T., Kovalak, C., and Coller, J. (2012). The DEAD-box protein Dhh1 promotes decapping by slowing ribosome movement. PLoS Biol. 10, e1001342.

Swisher, K.D., and Parker, R. (2011). Interactions between Upf1 and the decapping factors Edc3 and Pat1 in Saccharomyces cerevisiae. PLoS One 6, e26547.

Takahashi, S., Araki, Y., Sakuno, T., and Katada, T. (2003). Interaction between Ski7p and Upf1p is required for nonsense-mediated 3'-to-5' mRNA decay in yeast. EMBO J. 22, 3951-3959.

Tani, H., Imamachi, N., Salam, K.A., Mizutani, R., Ijiri, K., Irie, T., Yada, T., Suzuki, Y., and Akimitsu, N. (2012). Identification of hundreds of novel UPF1 target transcripts by direct determination of whole transcriptome stability. RNA Biol. 9, 1370-1379.

Tani, H., Torimura, M., and Akimitsu, N. (2013). The RNA degradation pathway regulates the function of GAS5 a non-coding RNA in mammalian cells. PLoS One 8, e55684.

Tarpey, P.S., Raymond, F.L., Nguyen, L.S., Rodriguez, J., Hackett, A., Vandeleur, L., Smith, R., Shoubridge, C., Edkins, S., Stevens, C., et al. (2007). Mutations in UPF3B, a member of the nonsense-mediated mRNA decay complex, cause syndromic and nonsyndromic mental retardation. Nat. Genet. 39, 1127-1133.

Thermann, R., Neu-Yilik, G., Deters, A., Frede, U., Wehr, K., Hagemeier, C., Hentze, M.W., and Kulozik, A.E. (1998). Binary specification of nonsense codons by splicing and cytoplasmic translation. EMBO J. 17, 3484-3494.

Thompson, D.M., and Parker, R. (2007). Cytoplasmic decay of intergenic transcripts in Saccharomyces cerevisiae. Mol. Cell. Biol. 27, 92-101.

Thoren, L.A., Norgaard, G.A., Weischenfeldt, J., Waage, J., Jakobsen, J.S., Damgaard, I., Bergstrom, F.C., Blom, A.M., Borup, R., Bisgaard, H.C., et al. (2010). UPF2 is a critical regulator of liver development, function and regeneration. PLoS One 5, e11650.

Toesca, I., Nery, C.R., Fernandez, C.F., Sayani, S., and Chanfreau, G.F. (2011). Cryptic transcription mediates repression of subtelomeric metal homeostasis genes. PLoS Genet. 7, e1002163.

Trapnell, C., Hendrickson, D.G., Sauvageau, M., Goff, L., Rinn, J.L., and Pachter, L. (2013). Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 31, 46-53.

Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511-515.

Ule, J., Jensen, K., Mele, A., and Darnell, R.B. (2005). CLIP: a method for identifying protein-RNA interaction sites in living cells. Methods 37, 376-386.

van Dijk, E.L., Chen, C.L., d'Aubenton-Carafa, Y., Gourvennec, S., Kwapisz, M., Roche, V., Bertrand, C., Silvain, M., Legoix-Ne, P., Loeillet, S., et al. (2011). XUTs are a class of Xrn1-sensitive antisense regulatory non-coding RNA in yeast. Nature 475, 114-117.

van Heesch, S., van Iterson, M., Jacobi, J., Boymans, S., Essers, P.B., de Bruijn, E., Hao, W., MacInnes, A.W., Cuppen, E., and Simonis, M. (2014). Extensive localization of long noncoding RNAs to the cytosol and mono- and polyribosomal complexes. Genome Biol. 15, R6.

Waern, K., and Snyder, M. (2013). Extensive transcript diversity and novel upstream open reading frame regulation in yeast. G3 3, 343-352.

Wang, Z., Gerstein, M., and Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57-63.

Weischenfeldt, J., Damgaard, I., Bryder, D., Theilgaard-Monch, K., Thoren, L.A., Nielsen, F.C., Jacobsen, S.E., Nerlov, C., and Porse, B.T. (2008). NMD is essential for hematopoietic stem and progenitor cells and for eliminating by-products of programmed DNA rearrangements. Genes Dev. 22, 1381-1396.

Welch, E.M., Barton, E.R., Zhuo, J., Tomizawa, Y., Friesen, W.J., Trifillis, P., Paushkin, S., Patel, M., Trotta, C.R., Hwang, S., et al. (2007). PTC124 targets genetic disorders caused by nonsense mutations. Nature 447, 87-91.

Welch, E.M., and Jacobson, A. (1999). An internal open reading frame triggers nonsense-mediated decay of the yeast SPT10 mRNA. EMBO J. 18, 6134-6145.

Wen, J., and Brogna, S. (2010). Splicing-dependent NMD does not require the EJC in Schizosaccharomyces pombe. EMBO J. 29, 1537-1551.

Weng, Y., Czaplinski, K., and Peltz, S.W. (1996a). Genetic and biochemical characterization of mutations in the ATPase and helicase regions of the Upf1 protein. Mol. Cell. Biol. 16, 5477-5490.
Weng, Y., Czaplinski, K., and Peltz, S.W. (1996b). Identification and characterization of mutations in the UPF1 gene that affect nonsense suppression and the formation of the Upf protein complex but not mRNA turnover. Mol. Cell. Biol. 16, 5491-5506. Wilson, B.A., and Masel, J. (2011). Putatively noncoding transcripts show extensive association with ribosomes. Genome Biol. Evol. 3, 1245-1252. Wittkopp, N., Huntzinger, E., Weiler, C., Sauliere, J., Schmidt, S., Sonawane, M., and Izaurralde, E. (2009). Nonsense-mediated mRNA decay effectors are essential for zebrafish embryonic development and survival. Mol. Cell. Biol. 29, 3517-3528.

Xu, Z., Wei, W., Gagneur, J., Perocchi, F., Clauder-Munster, S., Camblong, J., Guffanti, E., Stutz, F., Huber, W., and Steinmetz, L.M. (2009). Bidirectional promoters generate pervasive transcription in yeast. Nature 457, 1033-1037. Yamashita, A., Izumi, N., Kashima, I., Ohnishi, T., Saari, B., Katsuhata, Y., Muramatsu, R., Morita, T., Iwamatsu, A., Hachiya, T., et al. (2009). SMG-8 and SMG-9, two novel subunits of the SMG-1 complex, regulate remodeling of the mRNA surveillance complex during nonsense-mediated mRNA decay. Genes Dev. 23, 1091-1105. Yamashita, A., Ohnishi, T., Kashima, I., Taya, Y., and Ohno, S. (2001). Human SMG-1, a novel phosphatidylinositol 3-kinase-related protein kinase, associates with components of the mRNA surveillance complex and is involved in the regulation of nonsense-mediated mRNA decay. Genes Dev. 15, 2215-2228. Yepiskoposyan, H., Aeschimann, F., Nilsson, D., Okoniewski, M., and Muhlemann, O. (2011). Autoregulation of the nonsense-mediated mRNA decay pathway in human cells. RNA 17, 2108-2118. Yoine, M., Nishii, T., and Nakamura, K. (2006). Arabidopsis UPF1 RNA helicase for nonsense-mediated mRNA decay is involved in seed size control and is essential for growth. Plant Cell Physiol. 47, 572-580. Zhang, J., Sun, X., Qian, Y., LaDuca, J.P., and Maquat, L.E. (1998). At least one intron is required for the nonsense-mediated decay of triosephosphate isomerase mRNA: a possible link between nuclear splicing and cytoplasmic translation. Mol. Cell. Biol. 18, 5272-5283. Zhang, S., Ruiz-Echevarria, M.J., Quan, Y., and Peltz, S.W. (1995). Identification and characterization of a sequence motif involved in nonsense-mediated mRNA decay. Mol. Cell. Biol. 15, 2231-2244. Zhang, Z., Li, J., Zhao, X.Q., Wang, J., Wong, G.K., and Yu, J. (2006). KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics 4, 259-263.
Zhao, J., Ohsumi, T.K., Kung, J.T., Ogawa, Y., Grau, D.J., Sarma, K., Song, J.J., Kingston, R.E., Borowsky, M., and Lee, J.T. (2010). Genome-wide identification of polycomb-associated RNAs by RIP-seq. Mol. Cell 40, 939-953. Zhao, J., Sun, B.K., Erwin, J.A., Song, J.J., and Lee, J.T. (2008). Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750-756. Zhou, Z., and Reed, R. (2003). Purification of functional RNA-protein complexes using MS2-MBP. Curr. Protoc. Mol. Biol. 63, 27.3.1-27.3.7. Zund, D., Gruber, A.R., Zavolan, M., and Muhlemann, O. (2013). Translation-dependent displacement of UPF1 from coding sequences causes its enrichment in 3' UTRs. Nat. Struct. Mol. Biol. 20, 936-943.
