INVESTIGATION OF THE MRNP AND TRANSCRIPTOME REGULATED BY
NONSENSE-MEDIATED RNA DECAY
by
JENNA E. SMITH
Submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Dissertation Advisor: Dr. Kristian Baker
Department of Biochemistry
CASE WESTERN RESERVE UNIVERSITY
MAY, 2015
CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES
We hereby approve the dissertation of
Jenna E. Smith, candidate for the degree of Doctor of Philosophy*.
Committee Chair
Dr. Timothy Nilsen
Dissertation Advisor
Dr. Kristian E. Baker
Committee Member
Dr. Donny Licatalosi
Committee Member
Dr. Maria Hatzoglou
Committee Member
Dr. Derek Taylor
Date of Defense
December 19, 2014
*We also certify that written approval has been obtained
for any proprietary material contained therein.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
ABSTRACT
CHAPTER 1: AN OVERVIEW OF NONSENSE-MEDIATED RNA DECAY
1.1 RNA surveillance in gene expression
1.2 Biological importance of the NMD pathway
  NMD in development
  NMD and human disease
1.3 NMD impacts the expression of many classes of endogenous mRNAs
1.4 The factors involved in NMD
1.5 Models for the recognition of NMD substrates
  Pre-mRNA splicing and the exon junction complex
  Other cis-acting elements
  Faux 3’ UTR
  Towards a universal model
1.6 Downstream steps in NMD
  Activation of UPF1 enzymatic activity
  Direct impact of NMD on protein output
  Decay of NMD substrate RNAs
1.7 Harnessing NMD to investigate RNA biology
CHAPTER 2: PURIFICATION OF AN NMD-SENSITIVE mRNP COMPLEX
2.1 Introduction
  Rationale - evidence for a distinct NMD mRNP
  Existing methods to interrogate RNA-protein interactions
2.2 Results and conclusions
  Overview of the mRNP pulldown protocol
  Design and characterization of reporter mRNAs
  Optimization of mRNP pulldown efficiency
  Co-purification of mRNA-binding proteins
  Preliminary identification of mRNP components by mass spectrometry
2.3 Discussion
2.4 Future directions
CHAPTER 3: TRANSLATION OF UNANNOTATED RNAs IN YEAST
3.1 Introduction
  The emerging field of long non-coding RNAs
  NMD substrate recognition requires translation
3.2 Results and conclusions
  Identification of unannotated RNAs in the S. cerevisiae genome
  uRNAs co-sediment with polyribosomes
  Ribosome profiling indicates uRNAs are bound by ribosomes
  Conservation and detection of sORFs expressed from uRNAs
  NMD-sensitivity of uRNAs provides evidence for translation
3.3 Discussion
3.4 Future directions
CHAPTER 4: IDENTIFICATION OF NMD-SENSITIVE mRNAs
4.1 Introduction
  Obstacles to the identification of NMD substrates
  Improving approaches for transcriptome analyses
4.2 Results and conclusions
  Identification of direct and indirect targets of NMD by RNA-seq
  Identifying transcript features unique to NMD-sensitive RNA isoforms
4.3 Discussion
4.4 Future directions
CHAPTER 5: CONCLUDING REMARKS
APPENDIX A - GENERAL EXPERIMENTAL METHODS
APPENDIX B - mRNP PULLDOWN PROTOCOL
APPENDIX C - EXPERIMENTAL METHODS RELATED TO CHAPTER 2
APPENDIX D - EXPERIMENTAL METHODS RELATED TO CHAPTER 3
APPENDIX E - EXPERIMENTAL METHODS RELATED TO CHAPTER 4
BIBLIOGRAPHY
LIST OF TABLES
Table 2-1. List of NMD-sensitive mRNP-associated proteins with candidate roles in NMD
Table 3-1. Summary of high-throughput sequencing reads
Table 3-2. Predicted sORFs and conservation analyses
Table C-1. List of oligonucleotides, plasmids, and yeast strains used in Chapter 2
Table D-1. List of oligonucleotides, plasmids, and yeast strains used in Chapter 3
Table E-1. List of oligonucleotides, plasmids, and yeast strains used in Chapter 4
LIST OF FIGURES
Figure 1-1. The impact of a premature termination codon on protein production and mRNA stability
Figure 1-2. A variety of RNAs are substrates for NMD
Figure 1-3. Models for NMD substrate recognition
Figure 2-1. Design and validation of GFP reporter mRNAs
Figure 2-2. Selection of streptavidin-conjugated magnetic beads for mRNP purification
Figure 2-3. Optimization of mRNP pulldown conditions
Figure 2-4. mRNP pulldown is highly specific for target RNA
Figure 2-5. Co-purification of proteins with mRNP pulldown protocol
Figure 2-6. Preliminary screening of candidate proteins identified by mRNP pulldown
Figure 3-1. Depletion of ribosomal RNA in samples for RNA-seq
Figure 3-2. Characterization of yeast unannotated RNAs
Figure 3-3. Yeast uRNAs and lncRNAs co-sediment with polyribosomes
Figure 3-4. Ribosome profiling identifies regions of ribosome association with uRNAs
Figure 3-5. 3-nucleotide phasing of ribosome footprints facilitated the identification of translated sORFs
Figure 3-6. Evidence for sORF expression and conservation
Figure 3-7. Many uRNAs are sensitive to the translation-dependent NMD pathway
Figure 3-8. Evidence for translation of uRNAs in yeast
Figure 4-1. Characteristics of NMD-sensitive mRNAs identified by RNA-seq
Figure 4-2. Validation of RNA-seq by northern analysis
Figure 4-3. A tool for identifying known and unique NMD-sensitive transcript regions
Figure 4-4. Global transcriptional inhibition using a temperature-sensitive mutation in RNA Polymerase II
ACKNOWLEDGEMENTS
All of the work that I put into my doctoral research could not have been completed without the assistance and support of others. First and foremost, my research advisor and mentor, Dr. Kristian Baker, has been incredibly supportive and encouraging throughout my time in her lab. When I joined the Baker lab, I would have been lost without her guidance and always-open door to help me find my footing as an independent researcher. Over the years, Kristian has helped me grow as a scientist, a critical thinker, and a professional.
Many other faculty at CWRU have contributed to my development as a scientist. Prominent among these is Dr. Jeff Coller, who has always provided valuable insight in group meetings and helped motivate my research to move into new and exciting directions. I would also like to acknowledge Dr. Jo Ann Wise for the outside viewpoint she contributed to lab meeting discussion. Finally, the members of my thesis committee - Drs. Timothy Nilsen, Maria Hatzoglou, Donny
Licatalosi, and Derek Taylor - have provided suggestions and guidance throughout and at this final stage of my graduate career, without which my research would not have been as successful.
Some of the work presented here is directly thanks to the contributions of other researchers. In particular, one project was built off of work by Sarah
Geisler, a former member of the Coller lab, who also helped in the initial stages of my own research. Nathan Huynh has contributed a substantial amount of time as an undergraduate student and been a great help to me. Nicholas Kline provided invaluable bioinformatics support, without which I would have been lost.
I have also relied on the expertise of the CWRU Genome and Transcriptome
Sequencing Core, in particular Frank Campbell and Sheldon Bai. Our mass spectrometry analyses were greatly aided by the expertise and insight of Dr.
Amber Mosley and members of her lab. Finally, collaborators at the Whitehead
Institute - Juan Alvarez and Wenqian Hu - have also contributed great ideas and data to supplement the work documented here.
All of the past and present members of the Baker and Coller labs have been wonderful to work with. The distinction between these two labs was often blurred, which provided a great opportunity to work with twice as many knowledgeable colleagues. In particular, DaJuan Whiteside and Najwa Al-Husaini were tremendously helpful in everything from troubleshooting to helping with protocols to ordering reagents. I cannot imagine working with a better group of people.
Aside from these direct contributions, my graduate research would not have succeeded without the mental and emotional support of my family. My husband, Trevor Smith, and parents, Steve and Faith Sroka, were always encouraging and optimistic, even at the most difficult times. Although they may not have always known exactly the challenges I was facing, they were always there to encourage me every step of the way. Thank you all for your love and support.
Investigation of the mRNP and Transcriptome Regulated by
Nonsense-Mediated RNA Decay
Abstract
by
JENNA E. SMITH
Appropriate and accurate gene expression is critical for all organisms.
One quality control pathway which exists to maintain the fidelity of gene expression is the nonsense-mediated RNA decay (NMD) pathway, responsible for recognizing and targeting for rapid degradation RNAs undergoing premature translation termination. Although this pathway is conserved throughout eukarya and essential in higher organisms, many mechanistic details underlying NMD are not understood.
Proteins uniquely bound to NMD-sensitive mRNAs are predicted to facilitate their recognition as aberrant by the NMD pathway. To investigate differences between NMD-sensitive and NMD-insensitive RNAs, an efficient and highly specific method to biochemically purify an individual mRNP from a whole-cell lysate was developed. This extensively optimized procedure will serve as a powerful tool to identify proteins preferentially associated with NMD-sensitive mRNAs to elucidate how NMD substrates are recognized. Furthermore, the mRNP pulldown protocol can be adapted to study other aspects of mRNA regulation.
Because NMD is a strictly translation-dependent process, sensitivity to
NMD can provide evidence for the translation of RNAs. Long non-coding RNAs (lncRNAs), an emerging class of poorly characterized RNAs, are bioinformatically classified as lacking protein-coding capacity. Unannotated RNAs
(uRNAs) in yeast were specifically investigated for their capacity to associate with translating ribosomes based on co-sedimentation with polyribosomes, ribosome profiling, detection of encoded peptides, and sensitivity to NMD. These data demonstrated that many transcripts considered to lack protein-coding potential are, in fact, actively translated, and implicate NMD in regulating the activity or expression of a subset of lncRNAs.
Finally, many endogenous mRNAs are sensitive to the NMD pathway, including specific mRNA isoforms. Genome-wide profiling of the yeast transcriptome by high-throughput sequencing globally identified mRNAs sensitive to NMD. These mRNAs were enriched for features common to NMD-sensitive
RNAs, and also included indirect targets of NMD. A bioinformatic tool was developed to find NMD-sensitive regions of the yeast transcriptome independent of gene annotation or transcript structure, and identify NMD-sensitive RNA isoforms. This preliminary analysis of the NMD-sensitive transcriptome has revealed unappreciated complexity in RNA processing and expression in yeast.
CHAPTER 1: AN OVERVIEW OF NONSENSE-MEDIATED RNA DECAY
1.1 RNA surveillance in gene expression
One of the most essential tasks carried out by all organisms is gene expression, which must occur under the appropriate conditions and with minimal error. RNA is the effector molecule responsible for converting genetic information from DNA - in which it is stored - to function, be that the RNA itself, or translation of the RNA into a protein. Thus, gene expression is not only controlled by the regulation of transcription, but can also be regulated during all steps of RNA metabolism. Gene expression can, therefore, be impacted by processing, localization, binding of RNA binding proteins (RBPs), regulation of translation, and decay of the RNA. In particular, an often overlooked aspect of gene expression is the degradation of RNAs as a critical means to either maintain the correct steady-state level of an RNA population or turn off expression upon a change in environmental conditions. Furthermore, quality control pathways exist to target RNAs for degradation if they contain errors, helping ensure the fidelity of gene expression.
Because of the importance of accurate gene expression in all organisms, it should perhaps be unsurprising that there are many mechanisms in place to ensure that the expression of a gene occurs with minimal error or wasted resources (reviewed in Eberle and Visa, 2014). In particular, three major surveillance pathways detect aberrancies in RNA sequence or structure during
translation and help clear the cell of these RNAs (Shoemaker and Green, 2012).
The earliest identified RNA surveillance pathway, nonsense-mediated RNA decay
(NMD), recognizes mRNAs that undergo premature translation termination and targets these RNAs for rapid degradation (Leeds et al., 1991; Losson and
Lacroute, 1979). In contrast, non-stop decay recognizes mRNAs which lack a termination codon, and also induces their decay (Frischmeyer et al., 2002).
Finally, no-go decay frees translating ribosomes from mRNAs where they have stalled during elongation due to a strong pause or structural block (Doma and
Parker, 2006). Although these pathways represent a small piece of the complex network in place to ensure that gene expression occurs accurately and appropriately, they provide a critical means to eliminate aberrant RNAs from the cell. Here, the focus will be on the first identified and best characterized RNA surveillance pathway, NMD.
1.2 Biological importance of the NMD pathway
Over three decades ago, it was first observed in yeast that mRNAs containing premature termination codons (PTCs) due to amber nonsense mutations were less abundant than their counterparts that lacked a PTC (Losson and Lacroute, 1979). Subsequently, it was determined that this effect was due to the preferential decay of PTC-containing mRNAs (Leeds et al., 1991; Figures
1-1A and 1-1B), leading to the characterization of the translation-dependent RNA quality control pathway known as nonsense-mediated RNA decay. The NMD pathway is conserved throughout eukarya, and recognizes and targets for rapid
degradation RNAs undergoing premature translation termination (reviewed in
Baker and Parker, 2004). The translation of truncated open reading frames
(ORFs) generated from premature translation termination (Figure 1-1B) could have deleterious effects on the cell by producing polypeptides truncated at their carboxy-terminus with dominant-negative functions (Pulak and Anderson, 1993), suggesting that a key role of NMD is to limit the production of these truncated proteins (reviewed in Bhuvanagiri et al., 2010; Holbrook et al., 2004).
Furthermore, while mRNAs that terminate translation prematurely because of nonsense or frameshift mutations within a gene coding sequence are the most straightforward examples of NMD substrates, many other classes of RNAs are also sensitive to NMD
(discussed in Section 1.3), suggesting that the role of NMD in shaping gene expression may be quite broad. Indeed, 3-10% of genes across eukarya show
NMD-dependent changes in abundance (Rehwinkel et al., 2006), leading to the proposal that NMD may also be involved in regulating the expression of endogenous genes.
[Figure 1-1 image. Panel A: production of full-length protein; stable RNA species. Panel B: premature termination codon; production of truncated protein; RNA rapidly degraded; UPF1, UPF2, UPF3.]
Figure 1-1. The impact of a premature termination codon on protein production and mRNA stability. A) Translation of a normal mRNA produces a full-length protein, and the RNA species is relatively stable. B) Translation of an mRNA containing a premature termination codon results in production of a truncated protein, and leads to the accelerated degradation of the mRNA. This process of nonsense-mediated RNA decay (NMD) requires the activity of three conserved NMD factors, UPF1, UPF2, and UPF3; the precise mechanism of these factors in facilitating NMD is not fully understood.
The fact that NMD is a conserved pathway in eukaryotes from yeast to humans hints at its importance. Elimination of the NMD pathway leads to lethality in many organisms, including the fruit fly Drosophila melanogaster
(Metzstein and Krasnow, 2006), plants such as Arabidopsis thaliana (Yoine et al.,
2006), and vertebrates including zebrafish (Wittkopp et al., 2009) and mammals
(Medghalchi et al., 2001; Weischenfeldt et al., 2008). Even in lower organisms in which NMD is dispensable for viability, including budding yeast Saccharomyces cerevisiae, fission yeast Schizosaccharomyces pombe, and Caenorhabditis elegans, loss of NMD can still be detrimental for survival. NMD mutations in S. pombe result in increased sensitivity to oxidative stress (Matia-Gonzalez et al.,
2013; Rodriguez-Gabriel et al., 2006), while in worms deletion of any of the factors involved in NMD leads to a severe morphogenetic defect for which the
C. elegans NMD factors are named (SMG proteins; Suppressor with
Morphogenetic effect on Genitalia; Hodgkin et al., 1989). Notably, it is not clear whether the importance of NMD is due solely to its role in RNA surveillance, or through its effects on the expression of endogenous genes.
NMD in development
Several specific roles for the NMD pathway during development have been uncovered in higher eukaryotes. NMD may have an important role in the development of the immune system. In mammals, NMD is critical for hematopoietic maintenance and the development and activation of lymphocytes based on conditional knockout of the NMD pathway (Weischenfeldt et al., 2008).
Similarly, transgenic mice ubiquitously over-expressing a dominant-negative form of one of the core NMD factors demonstrate abnormal thymus development and decreased thymocyte maturation (Frischmeyer-Guerrerio et al., 2011). The role of NMD in the immune system may be related to its function in eliminating non-productive gene rearrangements that occur in the T cell receptor and immunoglobulin genes during immune system development (Carter et al., 1995;
Weischenfeldt et al., 2008).
The activity and regulation of NMD in the nervous system also appear to be critical in higher eukaryotes. The regulation of NMD activity during development is important for the differentiation and maturation of neural cells in human, mouse, and Xenopus, suggesting an essential role for NMD in appropriate development of the brain (Lou et al., 2014). In support of this, mutations in the human NMD factor UPF3B are associated with intellectual deficiencies manifested as syndromic and nonsyndromic X-linked mental retardation (Tarpey et al., 2007).
NMD may also be important for development in other tissues. Conditional knockout of NMD has demonstrated a role for NMD in liver function, particularly fetal liver development and liver regeneration (Thoren et al., 2010). In D. melanogaster, deletion of NMD factors is lethal by larval stage 2 (Chapin et al.,
2014), and mutations in NMD factors result in abnormal development of genitalia in C. elegans (Hodgkin et al., 1989), providing evidence for roles of NMD in development throughout metazoa that have not yet been characterized.
NMD and human disease
Not only is NMD important for normal cellular function in many contexts, it has been implicated in the pathogenesis of a number of human diseases. Up to
1/3 of all genetic diseases are predicted to be caused by mutations that introduce a PTC into a gene coding sequence (Frischmeyer and Dietz, 1999). Among the most notable of these are 5% of the mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene that cause cystic fibrosis, mutations in the dystrophin gene that lead to muscular dystrophy, and mutations in the β-globin gene that result in β-thalassemia (reviewed in Bhuvanagiri et al., 2010). Interestingly, the pathogenesis of these diseases can be either positively or negatively impacted by NMD (Khajavi et al., 2006). In some diseases, including β-thalassemia, NMD limits the production of truncated proteins with dominant-negative function that result from translation of PTC-containing mRNAs, thereby preventing the more severe dominant-negative phenotype seen with nonsense mutations in the β-globin gene that fail to elicit NMD (reviewed in Bhuvanagiri et al., 2010). In other cases, such as Duchenne muscular dystrophy, NMD may suppress expression of a truncated but partially active protein by triggering degradation of the PTC-containing transcript, causing a more severe phenotype (reviewed in Bhuvanagiri et al., 2010).
Because of the extent of disease-causing mutations impacted by NMD, a promising route of treatment may be through modulation of the NMD pathway itself (Peltz et al., 2009). There are two distinct mechanisms through which NMD therapeutics may act: 1) nonsense suppression of the disease-causing mutation,
which allows the expression of some full-length protein, or 2) inhibition of NMD activity to increase the levels of truncated polypeptides. For the former approach, aminoglycosides such as gentamicin, which decrease the fidelity of translational decoding and can suppress translation termination at nonsense codons, have been explored to treat diseases caused by nonsense mutations
(reviewed in Peltz et al., 2013). Additionally, a small molecule inhibitor of the
NMD pathway, referred to as PTC124 or Ataluren, has been shown to selectively promote readthrough of PTCs but not normal termination codons, allowing the production of full-length proteins from normally NMD-sensitive mRNAs (Peltz et al., 2013; Welch et al., 2007). This drug has had success alleviating disease phenotypes in mouse models or in vitro, including Duchenne muscular dystrophy
(Welch et al., 2007) and cystic fibrosis (Du et al., 2008), and is now undergoing
Phase III clinical trials (Peltz et al., 2013). In contrast, for the latter therapeutic approach, inhibition of NMD may allow the expression of truncated polypeptides that still possess partial functionality from a gene containing a nonsense mutation. Alternatively, inhibition of NMD in cancer has been shown to promote accumulation of new truncated protein isoforms that may arise from widespread genomic disorganization, which could be recognized by the native immune system (Pastor et al., 2010). To this effect, treatment by aptamer-conjugated siRNA knockdown of NMD components has had success in decreasing the severity of mouse tumors by inducing an immune response against the tumor
(Pastor et al., 2010). The promise of modulating NMD as a disease therapy
highlights the importance of understanding all that we can about the molecular details of the NMD pathway.
1.3 NMD impacts the expression of many classes of endogenous mRNAs
NMD is traditionally classified as a surveillance pathway, with classic targets arising from nonsense or frameshift mutations which create a stop codon within an open reading frame. However, genome-wide expression profiling has indicated that NMD also impacts the expression of 3-10% of endogenous genes
(Rehwinkel et al., 2006). Although the significance of this regulation is still unclear, examination of mRNAs sensitive to NMD has revealed a number of features - in addition to nonsense or frameshift mutations - that can render an mRNA a substrate of NMD. A common feature of all known classes of NMD substrates is the presence of a translation termination event that either occurs or is perceived to occur prematurely. NMD-inducing features can be grouped into three main categories - premature termination within an annotated ORF, premature termination due to translation of alternate reading frames, or extension of untranslated RNA downstream of a normal termination codon (Figure 1-2).
PTCs within an ORF may arise in a number of ways (Figure 1-2A). They may be introduced by genomic nonsense or frameshift mutations, or by errors in transcription (Figure 1-2A, top). These generally represent truly aberrant mRNA species that are cleared by surveillance. In contrast, mRNAs that encode the non-standard amino acid selenocysteine purposefully encode a stop codon within the coding sequence. Selenocysteine is encoded by the UGA stop codon, and
failure of the selenocysteine tRNA (sec-tRNA) to decode the UGA, particularly in conditions of selenium deficiency where sec-tRNA levels are limiting, will result in translation termination within the ORF (Moriarty et al., 1998; Seyedali and Berry,
2014).
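The definition of a PTC within an annotated ORF can be made concrete with a short Python sketch. It is offered only as an illustration and is not a method used in this work; the two RNA sequences and the function names are invented for the example, and the check simply asks whether the first in-frame stop codon falls before the last codon of the ORF.

```python
# Illustrative sketch only: toy sequences and hypothetical function names.
STOP_CODONS = {"UAA", "UAG", "UGA"}

def first_in_frame_stop(orf):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i // 3
    return None

def has_ptc(orf):
    """True if translation would terminate before the final codon of the ORF."""
    stop = first_in_frame_stop(orf)
    last_codon = len(orf) // 3 - 1
    return stop is not None and stop < last_codon

normal = "AUGGCUCAGGAAUUGUAA"    # AUG GCU CAG GAA UUG UAA
nonsense = "AUGGCUUAGGAAUUGUAA"  # CAG -> UAG (amber) at codon index 2

print(has_ptc(normal), has_ptc(nonsense))
```

A UGA codon intended to encode selenocysteine would be flagged by the same check, which mirrors the situation described above in which a failure to decode UGA results in termination within the ORF.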
PTCs can also be introduced into an ORF following erroneous or alternative pre-mRNA splicing. First, the erroneous or regulated retention of introns (Figure 1-2A, middle), often due to weak consensus splice sites
(Braunschweig et al., 2014; He et al., 1993; Sayani et al., 2008), is likely to introduce PTCs because introns are evolutionarily selected to either result in a frameshift if retained in the transcript or encode in-frame termination codons
(Jaillon et al., 2008). Similarly, the use of cryptic splice sites can also lead to the inclusion of intronic sequences or produce frameshifts that lead to downstream nonsense codons (Kawashima et al., 2014).
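The two ways in which a retained intron can introduce a PTC, by shifting the reading frame or by carrying an in-frame termination codon, can be sketched in Python as follows. This is a toy illustration under stated assumptions, not an analysis performed in this work: the exon and intron sequences are invented, the first exon is assumed to begin at the start codon, and the function names are hypothetical.

```python
# Illustrative sketch only: toy sequences and hypothetical function names.
STOP_CODONS = {"UAA", "UAG", "UGA"}

def first_stop(seq):
    """Nucleotide index of the first stop codon in the frame set by position 0, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None

def intron_retention_outcome(exon1, intron, exon2):
    """Compare the spliced and intron-retained forms of a two-exon transcript."""
    return {
        # An intron whose length is not a multiple of 3 shifts the downstream frame.
        "frameshift": len(intron) % 3 != 0,
        "spliced_stop": first_stop(exon1 + exon2),
        "retained_stop": first_stop(exon1 + intron + exon2),
    }

# The 12-nt toy intron keeps the frame but supplies an in-frame UAA.
outcome = intron_retention_outcome("AUGGCUCA", "GUAAGUUUUCAG", "GGAAUUGUAA")
print(outcome)
```

In this example the retained transcript terminates at nucleotide 9, within the intron, rather than at nucleotide 15 of the spliced transcript.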
Second, distinct from intron retention, alternative splicing also commonly produces NMD substrates (Figure 1-2A, bottom), with over 1/3 of alternative splicing events predicted to introduce PTCs (Lewis et al., 2003). Indeed, the coupling of alternative splicing and NMD (AS-NMD) is thought to represent one of the most prominent means of generating NMD substrates in higher eukaryotes. In some cases, PTCs may be introduced by incorporating “poison cassette exons” which harbor in-frame nonsense codons (Lareau et al., 2007).
PTCs may also be introduced during splicing by the inclusion of a combination of exons that leads to a frameshift in the annotated coding sequence. Interestingly, the generation of AS-NMD splice products through poison cassette exon inclusion is used by
the SR family of alternative splicing regulators (Lareau et al., 2007), indicating that this may serve as an important feedback mechanism to buffer the level of alternative splicing.
Translation of alternate ORFs upstream of, or out of frame with, annotated reading frames can also result in premature translation termination (Figure 1-2B).
Selection of an alternate start codon that is out of frame with the annotated start codon often occurs when the annotated AUG is in a weak initiation context, a process termed leaky scanning (Figure 1-2B, top; Arribere and Gilbert, 2013;
Welch and Jacobson, 1999). Alternatively, -1 programmed ribosomal frameshifting (-1 PRF) can occur when a translating ribosome encounters two cis-acting signals in the RNA sequence, causing the ribosome to shift translation into the -1 reading frame (Figure 1-2B, middle; Belew et al., 2011; Belew et al.,
2014; Plant et al., 2004). Both of these events result in translation out-of-frame and likely lead to premature termination. Accordingly, it has been documented that the vast majority of -1 PRF signals in eukaryotes result in translation termination upstream of the annotated stop codon (Jacobs et al., 2007).
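Why a shift into the -1 reading frame tends to cause early termination can be illustrated with a minimal Python sketch. It is a toy model only: the mRNA sequence and the slip position are invented, no real frameshift signal is modeled, and the function name is hypothetical. The ribosome is assumed to read in frame 0 up to the slip position and then to continue one nucleotide back.

```python
# Illustrative sketch only: toy sequence, invented slip position, hypothetical names.
STOP_CODONS = {"UAA", "UAG", "UGA"}

def first_stop_from(seq, start):
    """Nucleotide index of the first stop codon read in steps of 3 from `start`, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None

mrna = "AUGGCUUUUUUAGGACUUGACCGUUAA"  # frame 0: AUG GCU UUU UUA GGA CUU GAC CGU UAA
slip_at = 12                          # next frame-0 codon would start here

annotated_stop = first_stop_from(mrna, 0)             # normal termination in frame 0
shifted_stop = first_stop_from(mrna, slip_at - 1)     # termination after a -1 shift

print(annotated_stop, shifted_stop)
```

Here the shifted ribosome encounters a UGA at nucleotide 17, upstream of the annotated stop codon at nucleotide 24, which is the outcome described above for most -1 PRF signals.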
Additionally, many mRNAs contain ORFs upstream of the annotated ORF
- within the 5’ UTR - termed upstream ORFs (uORFs; Figure 1-2B, bottom).
Translation of a uORF leads to very early termination, upstream of the annotated
ORF, often making the mRNA sensitive to NMD (Hurt et al., 2013; Johansson et al., 2007; Yepiskoposyan et al., 2011). It has recently become appreciated that many uORFs initiate translation at non-AUG start codons (Brar et al., 2012;
Fritsch et al., 2012; Ingolia et al., 2011), suggesting that many uORFs may not yet be identified and, therefore, the contribution of uORF translation to NMD may be underestimated.
[Figure 1-2 image. Panel A: nonsense or frameshift mutation, or failure to decode selenocysteine; intron retention (normal and NMD substrate); alternative splicing (normal and NMD substrate). Panel B: alternative AUG selection; ribosomal frameshift; translation of uORF. Panel C: extended 3’ UTR.]
Figure 1-2. A variety of RNAs are substrates for NMD. A) PTCs may be introduced into an annotated ORF by nonsense or frameshift mutations or failure to decode UGA stop codon as selenocysteine (top; mutation or stop codon indicated in red); retention of a PTC-containing intron (middle), or alternative splicing that introduces a PTC (bottom). B) Translation of alternate reading frames (indicated in dark green) may result in premature translation termination. Alternative reading frame translation may occur upon translation initiation at an alternative AUG start codon (top), ribosomal frameshift (middle; frameshift signals indicated in red), or translation of a uORF (bottom). C) Extension of the 3’ UTR can cause a normal termination codon to elicit NMD.
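As an illustration of how AUG-initiated uORFs might be located computationally, the following Python sketch scans a 5’ UTR for AUG codons followed by an in-frame stop codon. It is a minimal example under stated assumptions and not a tool used in this work: the 5’ UTR sequence and the function name are invented, and only uORFs that both start and stop within the 5’ UTR are reported.

```python
# Illustrative sketch only: toy 5' UTR and hypothetical function name.
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_uorfs(five_utr):
    """Return (start, end) nucleotide coordinates of AUG-initiated ORFs
    that terminate within the 5' UTR; `end` is exclusive."""
    uorfs = []
    for start in range(len(five_utr) - 2):
        if five_utr[start:start + 3] != "AUG":
            continue
        # Walk codon by codon from the AUG until the first in-frame stop.
        for i in range(start + 3, len(five_utr) - 2, 3):
            if five_utr[i:i + 3] in STOP_CODONS:
                uorfs.append((start, i + 3))
                break
    return uorfs

utr = "GGCAUGCCUUAAGCACC"  # contains AUG CCU UAA beginning at nucleotide 3
print(find_uorfs(utr))
```

Because this scan considers only AUG, uORFs initiating at non-AUG codons would be missed entirely, which is the kind of under-counting noted in the text.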
Finally, even normal termination codons can trigger NMD if the RNA contains a long 3’ UTR (Kebaara and Atkin, 2009; Muhlrad and Parker, 1999a).
This is proposed to be because the large region of downstream untranslated
RNA resulting from a long 3’ UTR can make a normal termination event appear premature (Figure 1-2C). This is supported by the phenomenon of polarity, where a more 5’ proximal PTC that generates a longer untranslated 3’ sequence is more efficient at eliciting NMD (Cao and Parker, 2003; Losson and Lacroute,
1979; Peltz et al., 1993). Long 3’ UTRs may be genomically encoded, or result from processing the nascent mRNA at alternative, distal cleavage and polyadenylation sites (Muhlrad and Parker, 1999a; Pulak and Anderson, 1993;
Shi, 2012). Notably, however, the relationship between 3’ UTR length and NMD is not straightforward - there is no direct linear relationship between 3’ UTR length and NMD, and very long 3’ UTRs have been found to escape recognition by NMD (Kebaara and Atkin, 2009).
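The arithmetic behind polarity can be shown with a short Python sketch that computes how much untranslated RNA lies downstream of a termination codon. The transcript dimensions below are hypothetical round numbers chosen for the example, not measurements from this work, and the function name is invented.

```python
# Illustrative sketch only: hypothetical transcript dimensions and function name.
def downstream_untranslated_nt(transcript_len, cds_start, stop_codon_index):
    """Nucleotides between the end of a stop codon and the transcript 3' end
    (poly(A) tail excluded). `stop_codon_index` is the 0-based codon position
    of the stop codon within the ORF that begins at `cds_start`."""
    stop_end = cds_start + 3 * (stop_codon_index + 1)
    if stop_end > transcript_len:
        raise ValueError("stop codon lies beyond the end of the transcript")
    return transcript_len - stop_end

# Hypothetical mRNA: 1400 nt, 50-nt 5' UTR, 400-codon ORF (stop codon at index 399).
normal = downstream_untranslated_nt(1400, 50, 399)
late_ptc = downstream_untranslated_nt(1400, 50, 300)
early_ptc = downstream_untranslated_nt(1400, 50, 100)

print(normal, late_ptc, early_ptc)
```

The more 5’ proximal PTC leaves the longest stretch of downstream untranslated RNA. As noted above, however, this length alone does not predict NMD sensitivity, so the calculation should be read as a description of transcript geometry rather than as a classifier.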
Global analyses of the NMD-sensitive transcriptome, originally by microarray and subsequently using high-throughput sequencing, have helped uncover many of these features common to NMD-sensitive RNAs. However, there is much that remains unclear regarding the endogenous, physiological targets of NMD. Although 3-10% of eukaryotic mRNAs are sensitive to NMD
(Rehwinkel et al., 2006), many of these mRNAs do not contain apparent NMD-inducing features. Furthermore, there is no consensus set of NMD targets across species, suggesting that the coordinated regulation of mRNAs in common cellular pathways is not evolutionarily conserved. Our understanding of NMD can be greatly increased by understanding the impact it has on both aberrant and endogenous RNAs in the cell. To investigate this in S. cerevisiae at nucleotide resolution, high-throughput sequencing analyses of RNAs impacted by loss of
NMD activity are presented in Chapters 3 and 4. With continued analysis, it should be possible to identify features that lead to NMD-sensitivity for many
RNAs, and expand the knowledge of how NMD impacts the global transcriptome in this organism.
1.4 The factors involved in NMD
NMD requires a conserved set of proteins in order to elicit the accelerated decay of target mRNAs. The core NMD machinery is composed of the UPF proteins, named for “up-frameshift suppressor” (Culbertson et al., 1980), and includes
UPF1, UPF2, and UPF3 (Figure 1-1B) which were first identified in S. cerevisiae.
UPF1 was originally characterized upon discovery that mutations in the UPF1 gene resulted in allosuppression of frameshift mutations in HIS4 (Leeds et al.,
1991). UPF1 is a superfamily I helicase (Altamura et al., 1992; de la Cruz et al.,
1999; Leeds et al., 1992), which exhibits 5’ to 3’ helicase and nucleic acid-dependent ATPase activity (Bhattacharya et al., 2000; Czaplinski et al., 1995), both of which are required for its activity in NMD (Weng et al., 1996a; Weng et al., 1996b). Also an RNA binding protein, UPF1 directly binds 9-11 nucleotides of
RNA via its helicase domain (Chakrabarti et al., 2011; Chamieh et al., 2008).
Finally, UPF1 is a phosphoprotein, with the cycle of UPF1 phosphorylation and dephosphorylation required for NMD in higher eukaryotes (Page et al., 1999).
UPF3 was identified in a genetic screen similar to that in which UPF1 was identified (Leeds et al., 1992), while UPF2 (also called NMD2) was discovered through interaction with UPF1 in a yeast two-hybrid screen (He and Jacobson,
1995). UPF2 is a mIF4G (Middle of eukaryotic translation Initiation Factor 4G) domain-containing protein which can bind RNA through its third mIF4G domain
(Kadlec et al., 2004). It is thought to act as a molecular bridge, binding the cysteine-histidine-rich (CH) domain of UPF1 (Chakrabarti et al., 2011; Chamieh et al., 2008; Kadlec et al., 2006) and the RNA recognition motif of UPF3 (Kadlec et al., 2004) to form a heterotrimeric complex. Neither UPF2 nor UPF3 has any demonstrated enzymatic function, although UPF2 can activate the helicase and
ATPase activities of UPF1, implicating this as a specific role for UPF2 in NMD
(Chakrabarti et al., 2011; Chamieh et al., 2008).
Subsequent screens in C. elegans for allele-specific suppressors identified homologs of the yeast NMD factors, as well as additional proteins required for
NMD in metazoa (Cali et al., 1999; Hodgkin et al., 1989). These were termed
SMG factors (Suppressor with Morphogenetic effect on Genitalia), named for the characteristic phenotype of worms carrying mutations in these genes. SMG2,
SMG3, and SMG4 correspond to yeast factors UPF1, UPF2, and UPF3, respectively, while SMG1, SMG5, SMG6, and SMG7 are metazoa-specific NMD factors. SMG1 is a phosphatidylinositol 3-kinase-related kinase (PIKK) that is responsible for the phosphorylation of UPF1 (Page et al., 1999; Pal et al., 2001).
In contrast, SMG5-7 bind phosphorylated UPF1 through their 14-3-3-like domains (Fukuhara et al., 2005) and recruit protein phosphatase 2A (PP2A) to induce UPF1 dephosphorylation (Chiu et al., 2003; Ohnishi et al., 2003). SMG6 is a PIN (PilT N-terminus) domain-containing endonuclease responsible for cleaving NMD targets to initiate their degradation in metazoa (Eberle et al., 2009;
Huntzinger et al., 2008). In budding yeast, the absence of clear homologs of these factors suggests that neither the phosphorylation of UPF1 nor endonucleolytic cleavage of NMD substrates is a conserved step in the NMD cycle, although it has been suggested that the yeast protein EBS1 may be a
SMG7 homolog (Luke et al., 2007).
Homologs of the core NMD factors and most auxiliary factors have been identified in all eukaryotic organisms investigated, consistent with the presence of a conserved NMD pathway (reviewed in Conti and Izaurralde, 2005). In general, metazoa including zebrafish (Wittkopp et al., 2009) and mammals (Applequist et al., 1997; Chiu et al., 2003; Gatfield et al., 2003; Lykke-Andersen et al., 2000;
Mendell et al., 2000; Ohnishi et al., 2003; Perlick et al., 1996; Serin et al., 2001;
Yamashita et al., 2001) express all seven NMD factors. A notable exception is D. melanogaster, which lacks a clear ortholog of SMG7 (Gatfield et al., 2003).
Although NMD has been studied for over two decades, novel factors with proposed roles in NMD continue to be identified. Recent screens in C. elegans have identified two factors essential for viability, termed SMGL-1 and SMGL-2
(SMG Lethal; also known as DHX34 and NBAS) that play an uncharacterized role in NMD in C. elegans, zebrafish, and humans (Longman et al., 2013;
Longman et al., 2007). Two additional proteins, SMG8 and SMG9, appear to play a role in regulating the kinase activity of SMG1 in humans and C. elegans
(Yamashita et al., 2009). Finally, human PNRC2 (Proline-rich Nuclear Receptor
Coregulatory protein 2) has recently been shown to function in concert with UPF1 in NMD and facilitates recruitment of the decay machinery to NMD targets (Cho
et al., 2009). Importantly, the fact that novel proteins continue to be implicated in the well-studied NMD pathway suggests there is still much we do not understand about the mechanisms of NMD substrate recognition and decay.
1.5 Models for the recognition of NMD substrates
The first step in NMD is the recognition of an RNA as an NMD substrate.
There is still uncertainty regarding how RNAs are recognized as targets for this decay pathway. Specifically, how is premature translation termination distinguished from normal translation termination? Several observations suggest that it is the spatial location of translation termination, specifically in relation to the downstream mRNA, that is important for PTC definition. For instance, if translation termination occurs closer to the 5’ end of an mRNA and further from the normal termination codon, the mRNA will be more efficiently targeted for rapid degradation through the NMD pathway, a phenomenon referred to as polarity
(Losson and Lacroute, 1979; Peltz et al., 1993). Additionally, an extended 3’ untranslated region (UTR) can cause a normal termination codon to be perceived as a PTC and trigger NMD (Kebaara and Atkin, 2009; Muhlrad and Parker,
1999a). Furthermore, an abundance of evidence suggests that differences in the spatial arrangement of translation termination and proteins associated with an mRNA help differentiate between normal and premature translation termination
(reviewed in Baker and Parker, 2004). As described below, the predominant models to explain NMD substrate recognition build upon this concept (Figure
1-3). Critically, however, no model represents a conserved and universal
mechanism for how NMD substrates are discriminated from normal RNAs, highlighting the ongoing need for investigation into this fundamental aspect of
NMD.
Pre-mRNA splicing and the exon junction complex
In higher eukaryotes, particularly vertebrates, pre-mRNA splicing can enhance NMD (Carter et al., 1996; Figure 1-3A). Specifically, an exon-exon junction located more than ~50-55 nucleotides downstream of a termination codon enhances the targeting of that mRNA by NMD (Zhang et al., 1998).
During pre-mRNA splicing, a suite of proteins known as the exon junction complex (EJC) is deposited onto the mRNA ~20-24 nucleotides upstream of the splice junction (Le Hir et al., 2000). Ribosomes displace EJCs from the coding region of mRNAs during translation elongation, usually resulting in an EJC-free mRNA. However, if translation termination occurs >55 nucleotides (nt) upstream of an exon-exon junction, translating ribosomes cannot physically displace the downstream EJC (known as the 50-55 nt rule; Nagy and Maquat, 1998). When translation termination occurs in the presence of an EJC, two of the core NMD factors, UPF2 and UPF3, which associate with the EJC through direct interaction with EJC component Y14 (Kashima et al., 2006; Le Hir et al., 2001), are believed to interact with UPF1 and activate NMD.
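As a rough illustration of the 50-55 nt rule described above, the following Python sketch flags exon-exon junctions that lie more than a chosen distance downstream of a termination codon. The function names, the transcript coordinate convention, and the default cutoff of 55 nt are assumptions made here for illustration only. The rule is an empirical guideline, and the sketch should not be read as a validated predictor of NMD sensitivity.

```python
def junctions_beyond_cutoff(stop_end, junctions, cutoff=55):
    """Return junctions lying more than `cutoff` nt downstream of the stop codon.

    stop_end  -- transcript coordinate of the last nucleotide of the stop codon
    junctions -- transcript coordinates of exon-exon junctions
    cutoff    -- distance threshold in nt (assumed default; the text cites ~50-55)
    """
    return [j for j in junctions if j - stop_end > cutoff]


def ejc_model_predicts_nmd(stop_end, junctions, cutoff=55):
    """True if at least one junction is far enough downstream to retain an EJC."""
    return bool(junctions_beyond_cutoff(stop_end, junctions, cutoff))


if __name__ == "__main__":
    junctions = [120, 340, 400]  # hypothetical junction coordinates
    print(ejc_model_predicts_nmd(300, junctions))  # True: junction at 400 is 100 nt downstream
    print(ejc_model_predicts_nmd(380, junctions))  # False: junction at 400 is only 20 nt downstream
```

Exposing the cutoff as a parameter reflects the fact that the threshold is reported as a range (~50-55 nt) rather than a single fixed value.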
There is substantial evidence that the EJC enhances NMD. First, some
PTCs are only competent to induce NMD when located upstream of an exon-exon junction (Carter et al., 1996). Second, some naturally intronless genes,
which lack EJCs, are immune to NMD unless an artificial intron is placed downstream of the PTC (Maquat and Li, 2001). Finally, artificially placing EJC components on an mRNA downstream of a termination codon, through tethering protein components of the EJC or inserting a downstream intron, can convert an mRNA to an NMD substrate (Carter et al., 1996; Ivanov et al., 2008; Lykke-Andersen et al., 2001; Thermann et al., 1998). Interestingly, the general absence of introns from 3’ UTRs suggests selective pressure exists to limit splicing downstream of a normal termination codon, which would cause otherwise normal
RNAs to be recognized as NMD substrates (Nagy and Maquat, 1998).
Despite this evidence, the presence of an EJC - or splicing in general - is not absolutely required for NMD substrate recognition. In some cases, reporter mRNAs that lack introns (Singh et al., 2008) or which contain a PTC that does not meet the “50-55 nt rule” (Buhler et al., 2006; Carter et al., 1996) are still recognized as NMD substrates. Furthermore, lower eukaryotes including D. melanogaster (Gatfield et al., 2003), C. elegans (Longman et al., 2007), fission yeast S. pombe (Wen and Brogna, 2010), and budding yeast S. cerevisiae do not require pre-mRNA splicing for efficient NMD substrate recognition. Indeed, only
~5% of S. cerevisiae mRNAs undergo pre-mRNA splicing, while every mRNA from this organism that has been tested can be made competent for NMD.
Therefore, while an EJC can enhance NMD substrate recognition, likely by increasing recruitment of the NMD machinery to an mRNA, it cannot underlie substrate recognition for all mRNA substrates of this conserved pathway.
[Figure 1-3 image omitted. Recoverable panel labels: pre-mRNA splicing and EJC deposition; displacement of EJC by translating ribosomes; retention of EJC on mRNA during premature translation termination; HRP1 bound to a DSE; positive interactions with terminating ribosome and release factors; NMD.]
Figure 1-3. Models for NMD substrate recognition. A) The EJC model postulates that when translation termination occurs >50-55 nt upstream of an exon-exon junction, the EJC fails to be removed from the mRNA during translation, which enhances recruitment of the NMD machinery. B) In the DSE model, HRP1 associates with a degenerate consensus sequence; when it fails to be displaced by translating ribosomes due to premature translation termination, NMD is triggered. C) The faux 3’ UTR model suggests that a false or extended 3’ UTR increases the distance between translation termination and poly(A) binding protein, which ordinarily provides a positive signal that translation is normal. In the absence of this positive signal, the NMD machinery is recruited. D) Other unidentified components of the mRNP may provide spatial or contextual information about the location of translation termination, similar to the models above.
Other cis-acting elements
In budding yeast, which lack EJCs, a related model referred to as the downstream sequence element (DSE) model was previously proposed (Figure 1-3B). In this case, a degenerate consensus sequence of TGYYGATGYYYYY (the DSE) elicits NMD only when located downstream of a premature termination codon (Hagan et al., 1995; Zhang et al., 1995), in a manner similar to an EJC. HRP1, a heterogeneous nuclear ribonucleoprotein (hnRNP), has been proposed to bind to this consensus sequence and recruit the NMD machinery when translation termination occurs while it is associated with an mRNA (Gonzalez et al., 2000).
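To make the degenerate DSE consensus concrete, the following Python sketch scans a sequence for exact matches to TGYYGATGYYYYY, reading Y as a pyrimidine (C or T) under the standard IUPAC nucleotide code. The function name, the `downstream_of` parameter, and the example sequence are hypothetical and introduced only for illustration. Exact matching to the consensus is also an assumption, since functional elements may tolerate mismatches.

```python
import re

# TGYYGATGYYYYY with Y (pyrimidine) expanded to [CT]. The lookahead makes
# the match zero-width, so overlapping occurrences are all reported.
DSE_PATTERN = re.compile(r"(?=TG[CT]{2}GATG[CT]{5})")


def find_dse_sites(sequence, downstream_of=0):
    """Return 0-based start positions of DSE consensus matches.

    sequence      -- DNA or RNA sequence (U is treated as T; case-insensitive)
    downstream_of -- only report matches starting at or after this index,
                     e.g. the position just past a termination codon
    """
    seq = sequence.upper().replace("U", "T")
    return [m.start() for m in DSE_PATTERN.finditer(seq, downstream_of)]


if __name__ == "__main__":
    # Hypothetical reporter: short ORF ending in TAA, a spacer, then a DSE match.
    reporter = "ATGAAATAA" + "GG" + "TGCTGATGCTTCC" + "AAA"
    print(find_dse_sites(reporter, downstream_of=9))  # [11]
```

Restricting the search with `downstream_of` mirrors the observation that the element elicits NMD only when located downstream of the termination codon.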
However, the DSE model is not generally accepted as a mechanism by which NMD substrates may be recognized. Most importantly, it fails to explain how NMD substrates lacking a DSE can still be effectively discriminated from normal mRNAs (Hagan et al., 1995; Zhang et al., 1995). Moreover, the requirement for an mRNA to contain a DSE within its coding sequence would place evolutionary constraints on open reading frames and polypeptide sequences. Finally, a DSE motif has not been identified in any organism except yeast. This has led to the conclusion that the DSE model cannot represent a universal or conserved mechanism by which NMD substrates are identified, and left open the question of how NMD substrate discrimination is carried out in the ancestral model eukaryote, S. cerevisiae.
Faux 3’ UTR
A third model for NMD substrate recognition is the faux 3’ UTR model.
This model was originally described in yeast, but is now believed to apply more broadly throughout eukarya. The model is based on the fact that factors associated with a normal mRNA 3’ end are further from the location of premature
translation termination compared to normal termination (Figure 1-3C).
Specifically, the increased distance between the site of translational termination and the 3’ end of an mRNA created by premature termination generates a “false”
3’ UTR (Amrani et al., 2006; Amrani et al., 2004). A typical mRNA ends with a 3’ polyadenosine tail, to which the poly(A)-binding protein PAB1 binds. It is believed that when translation termination occurs at the normal position, proximal to the 3’ end of the RNA and poly(A) tail, PAB1 provides a positive signal to terminating ribosomes that termination is occurring in a normal context. When termination occurs further upstream and more distal to the poly(A) tail, such as at a PTC, it occurs in the absence of proximal PAB1 and, therefore, lacks the positive signal that translation termination is normal. The mechanism for this distance-dependent relationship was initially proposed to be due to competition between PAB1 and UPF1 to bind eukaryotic release factor eRF3 (SUP35 in yeast). During normal translation termination, PAB1 would interact with eRF3 and block the association of UPF1 at the site of termination; in contrast, during premature termination, the absence of PAB1 would allow UPF1 to interact with eRF3, thus distinguishing premature translation termination from normal termination (Amrani et al., 2004; Ivanov et al., 2008). However, subsequent studies have failed to find evidence to support this competitive binding in vitro (Kervestin et al., 2012; Singh et al., 2008).
A number of pieces of evidence support a role for PAB1 in NMD. The faux
3’ UTR model is supported by the observation that increasing the distance between translation termination and the 3’ end of an mRNA, through an extended
3’ UTR or a more 5’-proximal PTC, increases targeting of that mRNA to NMD
(Buhler et al., 2006; Eberle et al., 2008; Singh et al., 2008). Additionally, in yeast,
D. melanogaster, and humans, artificially tethering PAB1 immediately downstream of a PTC converts an NMD-sensitive mRNA to one that is insensitive to NMD (Amrani et al., 2004; Behm-Ansmant et al., 2007; Eberle et al., 2008; Ivanov et al., 2008; Silva et al., 2008; Singh et al., 2008).
Once again, however, the faux 3’ UTR model is unable to fully explain
NMD substrate recognition. In both S. cerevisiae and S. pombe, cytoplasmic
PAB1 is not required for the accurate discrimination of normal and PTC- containing mRNAs by the NMD machinery (Meaux et al., 2008; Wen and Brogna,
2010). It has also been shown in S. cerevisiae that neither normal 3’ end formation to produce a normal 3’ UTR (Baker and Parker, 2006; Gonzalez et al.,
2000) nor the presence of a poly(A) tail on an mRNA is required for NMD substrate recognition (Meaux et al., 2008). Therefore, while the context of the 3’
UTR does appear to play an important role in NMD substrate recognition, PAB1 and the poly(A) tail are not sufficient to explain this effect.
Towards a universal model
In the absence of a universal model for NMD substrate recognition, alternative ways of explaining how this process occurs should be considered.
One approach has been to merge all existing models into a “unified NMD model” in which a faux 3’ UTR triggers NMD, and can be enhanced by factors such as an EJC or DSE (Stalder and Muhlemann, 2008). Alternatively, it has been
suggested that NMD substrate recognition may be more complex, resulting from a combination of signals that antagonize NMD (e.g., PAB1) and signals that favor
NMD (e.g., EJCs) to define translation termination as normal or aberrant (Silva and Romao, 2009; Singh et al., 2008). Indeed, it is likely that not all signals that can either enhance or inhibit NMD have been identified. Perhaps additional components of the messenger ribonucleoprotein (mRNP) complex provide contextual information about translation termination, similar to what has been proposed for the EJC, PAB1, or HRP1 (Figure 1-3D).
Certainly, there is clear evidence that specific mRNP features can positively or negatively impact NMD substrate recognition, and it is widely accepted that a combination of cis- and trans-acting features of the mRNP complex define an NMD substrate. What is not yet agreed upon, however, is which factors are necessary and sufficient to elicit NMD. By expanding our knowledge of the trans-acting protein factors that associate with normal versus
NMD-sensitive mRNAs, it may be possible to identify additional features important for the recognition of NMD substrates in S. cerevisiae, which will increase our understanding of how NMD substrate recognition occurs. To this end, a protocol to purify an in vivo mRNP complex has been developed and is presented in Chapter 2. This procedure will facilitate the identification of additional proteins that differentially associate with NMD-sensitive mRNAs and which may be involved in discriminating these from normal mRNA species.
1.6 Downstream steps in NMD
Activation of UPF1 enzymatic activity
The molecular events subsequent to recognition of an NMD substrate are not understood in detail, despite much investigation. It is known that both the
ATPase and helicase activity of UPF1 are required for NMD (Weng et al., 1996a, b). In addition to these two enzymatic functions of UPF1, the phosphorylation cycle of UPF1 is also required for progression of NMD in higher eukaryotes
(Kashima et al., 2006; Yamashita et al., 2009). However, there is conflicting evidence regarding how and when these steps occur, and what their role in NMD may be (reviewed in Schweingruber et al., 2013). Most evidence suggests that in higher eukaryotes, phosphorylation of UPF1 by SMG1 occurs following association with and recognition of an NMD substrate, as part of a multi-protein complex (Kashima et al., 2006). This phosphorylation is important for the recruitment of SMG5-7 and progression of NMD (Ohnishi et al., 2003).
Meanwhile, the ATPase and helicase activities of UPF1 are believed to be activated by its association with UPF2 (Chakrabarti et al., 2011; Chamieh et al.,
2008), which is likely to occur concomitant with or immediately after NMD substrate recognition. The ATPase and helicase activities of UPF1 have been proposed to be involved in steps as diverse as remodeling the mRNP to enhance the decay of NMD substrate RNAs (Franks et al., 2010; Melero et al., 2012), scanning the 3’ UTR for EJCs or other NMD-inducing features (Shigeoka et al.,
2012), or recycling UPF1 (Kurosaki et al., 2014). Importantly, although many of these molecular details are still being worked out, what is quite clear is that the
progression of steps from recognition of an NMD substrate to degradation of that
RNA involves many mRNP remodeling events and interactions.
Direct impact of NMD on protein output
Premature translation termination is primarily deleterious to the cell due to the potential production of truncated polypeptides with dominant-negative function (reviewed in Bhuvanagiri et al., 2010; Holbrook et al., 2004). Therefore, an important aspect of NMD is to minimize the amount of protein produced from an ORF truncated by a nonsense codon. There is evidence that this may occur in two ways. First, translation of the mRNA may be repressed prior to the decay of NMD substrate mRNAs. In humans, UPF1 can directly inhibit translation in vitro through interaction with eukaryotic translation initiation factor eIF3 (Isken et al., 2008). This is corroborated by observations in yeast that more protein is produced per RNA molecule from NMD substrate mRNAs when NMD is inactivated (Muhlrad and Parker, 1999b). These data suggest that UPF1 is normally involved in limiting the translation of NMD substrate mRNAs independent of their accelerated degradation. Second, there is evidence that proteins generated from NMD substrate RNAs are targeted for degradation by the proteasome as another mechanism to minimize the expression of proteins from the aberrant mRNA (Kuroha et al., 2009), although mechanistically how
NMD might promote protein turnover is unclear. This suggests that NMD not only induces the decay of the RNA substrate (see below), but also of the aberrant
protein encoded by the RNA. These data indicate that the cell takes a number of precautions to minimize the production of truncated proteins.
Decay of NMD substrate RNAs
The ultimate fate of a transcript recognized as a substrate for NMD is accelerated decay of the RNA. Rapid decay ensures that steady-state levels of the aberrant RNAs are low, thereby limiting production of truncated polypeptides.
RNAs targeted to NMD primarily undergo decay initiated at the 5’ end of the transcript by decapping, carried out by a holoenzyme composed of DCP1 and catalytic subunit DCP2. This is followed by rapid 5’ to 3’ exonucleolytic digestion by cytoplasmic exoribonuclease XRN1 (Lejeune et al., 2003; Muhlrad and
Parker, 1994). In yeast, decapping of NMD substrates occurs independent of deadenylation, making it distinct from the decay of normal mRNAs, which is initiated by removal of the 3’ poly(A) tail (Decker and Parker, 1993; Muhlrad et al., 1994). Notably, exonucleolytic decay from the 3’ end of the RNA, carried out by deadenylases and the cytoplasmic exosome, is also accelerated by NMD
(Cao and Parker, 2003; Lejeune et al., 2003; Mitchell and Tollervey, 2003;
Takahashi et al., 2003). A third mode of decay for NMD substrates has been described in higher eukaryotes. SMG6, a PIN-domain endonuclease essential for NMD, catalyzes the endonucleolytic cleavage of RNAs targeted to NMD
(Eberle et al., 2009; Huntzinger et al., 2008), either at the site of the PTC or
~20-40 nucleotides downstream (Boehm et al., 2014). This subsequently leads to exonucleolytic decay of the two resulting RNA fragments. In higher eukaryotes
in which both endonucleolytic and exonucleolytic decay occur, it is unclear which pathway dominates (Muhlemann and Lykke-Andersen, 2010).
Although degradation of NMD-substrate mRNAs has been well studied, it is not clear how this accelerated decay is induced. Some evidence suggests the decay machinery may be directly recruited to NMD substrates by NMD factors
(He and Jacobson, 1995; Loh et al., 2013; Lykke-Andersen, 2002; Takahashi et al., 2003). However, these interactions have never been shown to be absolutely required for the rapid decay of NMD substrate RNAs, and in some cases have even been shown to be dispensable for NMD (Swisher and Parker, 2011).
Therefore, a second possibility is that the NMD machinery induces an overall change in the organization of the mRNP or the mRNA translation state, which could indirectly make the mRNA accessible for decay (reviewed in Baker and
Parker, 2004).
1.7 Harnessing NMD to investigate RNA biology
The NMD pathway provides an invaluable system for learning about RNA biology more broadly. This pathway is sufficiently characterized such that it can be easily manipulated, while simultaneously providing a number of remaining questions that are of significant interest to unravel. Here, I have taken advantage of the current knowledge - and knowledge gaps - about the NMD pathway to study three novel experimental questions. First, NMD was used to develop a protocol to purify and compare in vivo-assembled mRNP complexes, by taking advantage of the effective discrimination between NMD-sensitive and -insensitive
RNAs by the NMD pathway. In this way, an experimental system could be designed to compare and contrast two very similar RNA species that were nevertheless subject to distinct mechanisms of decay. Furthermore, this approach simultaneously allows the identification of novel mRNP features unique to mRNAs that undergo NMD, which may clarify our incomplete understanding of
NMD substrate recognition. Second, the sensitivity of an RNA to NMD, a pathway that is dependent upon translation to recognize its substrates, can provide evidence that the RNA is translated. This feature was used as one of several independent pieces of evidence that many putatively non-coding RNAs undergo translation and thus may be misclassified. Finally, global gene expression analysis in the absence of NMD was used to identify NMD-sensitive mRNAs and RNA isoforms. It is predicted that, during this analysis, novel RNA isoforms can be identified that have previously been overlooked due to their instability and low abundance. Thus, through their identification it will also be possible to learn about the extent of alternative RNA processing in yeast, which is believed to be minimal compared to higher eukaryotes. At the same time, this will refine the list of endogenous substrates of NMD, and potentially demonstrate an expanded breadth of NMD activity important to help the cell tolerate erroneous RNA processing or regulate gene expression. Importantly, by harnessing NMD to uncover information about diverse aspects of RNA biology, we have also expanded our understanding of the mechanisms and impact of
NMD.
CHAPTER 2: PURIFICATION OF AN NMD-SENSITIVE mRNP COMPLEX
2.1 Introduction
Rationale - evidence for a distinct NMD mRNP
The repertoire of proteins associated with an mRNA (i.e., the mRNP) can serve as a critical signal to the cell as to whether that RNA is aberrant (Eberle and Visa, 2014). Indeed, all current models for NMD substrate recognition are built upon differences in the association of proteins with NMD-sensitive versus
NMD-insensitive mRNAs which define the context of translation termination - for instance, the presence and location of an EJC. However, as described above
(see Chapter 1.5), current knowledge regarding the differences between NMD-sensitive and NMD-insensitive mRNAs is insufficient to explain how the cell discriminates between these two classes. The full extent of differences in the mRNP complex between these two classes of mRNAs, or indeed the complete repertoire of proteins associated with any single mRNA, is unknown (reviewed in
Muller-McNicoll and Neugebauer, 2013). Because mRNA binding proteins are intimately involved in the life-cycle of an mRNA - many are deposited co-transcriptionally and impact events from pre-mRNA processing to degradation of the mRNA (Muller-McNicoll and Neugebauer, 2013) - a complete understanding of associated factors will improve our knowledge about many of these processes and, in particular, about NMD. To this end, we sought to develop a protocol to purify and characterize a single mRNP species as it exists in vivo. This approach
can be used to identify differences in proteins associated with an NMD-sensitive mRNA versus an NMD-insensitive counterpart. In addition to the information that can be learned about NMD substrate recognition, the NMD pathway provides an excellent system in which to develop this protocol, as proteins that associate with two similar mRNAs that nevertheless differ in their sensitivity to NMD can be directly compared. Furthermore, this technique is invaluable in its utility to interrogate the mRNP complex of any single mRNA species to gain understanding about various aspects of mRNA activity or metabolism.
Existing methods to interrogate RNA-protein interactions
Many techniques have been developed over the years to gain information about the interactions between proteins and RNAs. These techniques have provided fundamental information about the nature and makeup of RNP complexes, including small nuclear RNPs (snRNPs) (Bardwell and Wickens,
1990; Grabowski and Sharp, 1986; Niranjanakumari et al., 2002), the spliceosome (Zhou and Reed, 2003), 7SK non-coding RNA (Hogg and Collins,
2007), small bacterial non-coding RNAs (ncRNAs; Said et al., 2009) and individual mRNAs (Slobodin and Gerst, 2010). Each method has proven to be beneficial in specific contexts, but as described below none were ideally suited to address our experimental question.
One of the earliest methods developed to identify multiple proteins that bind to a single RNA is RNA affinity chromatography (Grabowski and Sharp,
1986; Sharma, 2008) in which a cell lysate is applied to an immobilized synthetic
RNA (often biotinylated RNA bound to a streptavidin resin) such that RNA binding proteins specific to the sequence or structure of the synthetic RNA bind and are retained on the resin (Grabowski and Sharp, 1986). This approach can identify a set of proteins that associate with a single RNA, but because the protein-RNA associations are formed in vitro a major caveat of this approach is that the interactions may not be a true representation of complexes that exist in vivo.
A number of methods taking the reverse approach - identifying RNAs bound by a single protein - have been developed, built around RNA immunoprecipitation (RIP; Gilbert et al., 2004; Niranjanakumari et al., 2002). In this method, a specific protein is immunoprecipitated, and RNAs that co-purify with the protein are analyzed. When RNA analysis is performed by microarray
(i.e. RIP-chip; Keene et al., 2006) or high-throughput RNA sequencing (i.e. RIP-
Seq; Zhao et al., 2010), a global list of mRNAs with which the protein associates can be obtained. Immunoprecipitation may be done following covalent crosslinking of protein-RNA complexes, often with formaldehyde (Gilbert et al.,
2004; Niranjanakumari et al., 2002; Selth et al., 2009) or UV light (i.e. UV
CrossLinking and ImmunoPrecipitation, CLIP; Ule et al., 2005), to ensure the interactions that are detected were captured as they occur in vivo. These variations on RIP provide specific information about whether an RNA interacts with a given protein, but are unable to simultaneously provide information regarding all RBPs that associate with that RNA.
The approach most conceptually similar to the one developed here is the use of RNA aptamers and their cognate binding proteins or ligands to selectively
purify a single mRNA species. In this method, an RNA aptamer sequence is inserted into the RNA of interest, and it is purified from a whole cell lysate by binding to its protein or ligand. Many derivations of this method exist, including
RNA Affinity in Tandem (RAT) which utilizes aptamers for both bacteriophage
MS2 coat protein and tobramycin on a single RNA to perform a two-step purification (Hogg and Collins, 2007), or RNA-binding protein Purification and
IDentification (RaPID; Slobodin and Gerst, 2010) which captures in vivo interactions by formaldehyde crosslinking. The most useful aspect of these approaches is the unbiased identification of proteins interacting with a single
RNA, accomplished by targeting the RNA for purification rather than the protein.
Because the RNA aptamer-ligand system is ideal for the experimental question we hoped to ask, initial mRNP purification trials used the MS2 coat protein and RNA aptamer system. However, a number of technical challenges limited the ability to use this method for meaningful characterization of mRNP complexes. First, it is unclear what fraction of MS2 coat protein associates with the RNA aptamer in vivo, and, similarly, whether the exogenous MS2 coat protein associates nonspecifically with any yeast proteins or mRNAs in vivo. Both of these issues may have contributed to high background and low purification efficiency (data not shown). Additionally, the stringency of this purification approach is dictated by conditions that are compatible with protein purification, thus limiting the salt concentration and use of detergents. Finally, previous investigations performed with this approach largely purified the RNA of interest from a lysate by in vitro interaction with an immobilized ligand (Bardwell and
Wickens, 1990; Hogg and Collins, 2007; Said et al., 2009; Zhou and Reed,
2003), thus making it difficult to prevent the formation of RNA-protein interactions in vitro during purification. Due to these drawbacks, we sought to develop an improved purification scheme based on purification of an mRNP by hybridization.
Purification by hybridization has several benefits, including resistance to denaturing conditions, and the ability to directly target an endogenous RNA of interest - without modification - for purification. Several related methodologies have taken advantage of these benefits to identify the repertoire of proteins that bind all mRNAs in yeast (Mitchell et al., 2013) and mammals (Baltz et al., 2012;
Castello et al., 2012), by identifying RBPs that crosslink to polyadenylated RNAs subsequently purified by oligo-d(T) selection. This provided an excellent way to identify novel RBPs, and revealed unexpected classes of RNA-associated proteins such as metabolic proteins (Mitchell et al., 2013). These studies accomplished on a global scale what we sought to accomplish at single-RNA resolution. A related approach, RNA antisense purification (RAP), purifies an RNA by hybridization in order to identify its interactions with DNA (Engreitz et al., 2013) or RNA (Engreitz et al., 2014), either directly or through protein intermediates. The specificity and resistance to stringent purification conditions that the hybridization approach provides made it the ideal system around which to design our purification protocol.
2.2 Results and conclusions
Overview of the mRNP pulldown protocol
We sought to design a procedure to purify a single mRNA species and the proteins that are associated with that mRNA in vivo, so as to identify the protein composition of a single mRNP. The details of this protocol and optimization are described below (see also Appendix B). Briefly, the mRNP is purified from dcp2Δ yeast cells lacking the enzymatic activity to remove the protective 5’ 7-methylguanosine cap from mRNAs (Steiger et al., 2003), which stabilizes an NMD-sensitive reporter (Muhlrad and Parker, 1994) so as to increase its steady-state abundance to enable purification of a sufficient quantity of material. The yeast strain genetic background also includes deletion of three vacuolar proteases (pep4Δ, prc1Δ, prb1Δ) which minimizes protein degradation during incubation steps of the protocol (data not shown). Prior to lysis, cells are treated with formaldehyde to covalently crosslink protein-RNA and protein-protein interactions that occur in vivo (Niranjanakumari et al., 2002; Sutherland et al., 2008), and a crude whole-cell lysate is generated. From the lysate, a single mRNA species is purified based on sequence-specific hybridization to biotinylated DNA oligonucleotides immobilized onto streptavidin magnetic beads. Because this hybridization-based purification is resistant to salt and detergent, purification is carried out under denaturing conditions (0.5% sodium dodecyl sulfate) and high salt (500 mM lithium chloride), to minimize the purification of protein contaminants (bound nonspecifically to mRNA, DNA, or beads) or the in vitro formation of complexes (Mili and Steitz, 2004), while maintaining the covalent interactions captured by in vivo formaldehyde crosslinking. After purification, RNA can be extracted and analyzed by northern blotting or quantitative reverse transcriptase polymerase chain reaction (qRT-PCR), or proteins may be analyzed by western blotting or mass spectrometric analysis.
Design and characterization of reporter mRNAs
To identify proteins that differentially associate with NMD-sensitive versus NMD-insensitive mRNAs, reporter mRNAs representing each of these classes were designed. These reporter mRNAs encode the heterologous GFP ORF (Figure 2-1A, reporter 1). Insertion of a PTC 25% of the way through the ORF (Figure 2-1A, reporter 2) rendered the mRNA a robust substrate for NMD, based on a >10-fold decrease in abundance that was rescued upon NMD inactivation by deletion of NMD factor UPF1 (Figure 2-1B, compare WT to upf1Δ). Deletion of sequences following the PTC that functionally act as a 3’ UTR upon premature translation termination (Figure 2-1A, reporters 3 - 5) decreased the sensitivity of the GFP reporter mRNA to NMD in a sequence-independent manner (Figure 2-1B). Indeed, the NMD sensitivity of the GFP reporter was essentially eliminated by removing all downstream untranslated regions of the ORF (Figure 2-1A, reporter 5; Figure 2-1B [ΔABC]). Thus, reporters 2 and 5 were selected to compare NMD-sensitive and NMD-insensitive mRNPs: they are identical except for the deletion of untranslated sequences downstream of the PTC in reporter 5 and contain translated ORFs of equivalent size and sequence, but display different sensitivities to NMD. It was particularly beneficial that the differential sensitivity of the reporters to NMD was achieved by altering only the length of the 3’ UTR while keeping the ORF length constant, in light of recent observations that ORF length can impact the sensitivity of an mRNA to NMD through an unresolved mechanism (Decourty et al., 2014).
[Figure 2-1, image not reproduced. Panel A: schematics of reporters 1 - 5, with the regions downstream of the PTC labelled A, B, and C. Panel B: northern blots of GFP and SCR1 in WT and upf1Δ cells, with a bar graph of relative mRNA abundance. Panel C: relative RNA levels as printed, in lane order: 100, 21.6, 102, 30.0, 9.44, 25.0, 37.4, 31.6, 35.3, 123, 178, 589; lanes are grouped by promoter (GAL, TDH3), plasmid copy number (Cen, 2µ), and strain (WT, dcp2Δ).]
Figure 2-1. Design and validation of GFP reporter mRNAs. A) Schematic representation of GFP reporters. Green indicates ORF; light green indicates untranslated open reading frame due to insertion of a PTC 25% of the way through the ORF. B) Steady-state abundance of GFP reporters in wild-type and NMD-deficient upf1Δ cells. Quantification of each set of deletions, normalized to SCR1 levels, shown below. Error bars represent SEM. C) Steady-state abundance of GFP reporters expressed from inducible GAL promoter or constitutive TDH3 promoter, from plasmids bearing a centromeric (Cen; i.e. low-copy) or 2µ (i.e. high-copy) origin of replication. Cells expressing GAL-driven reporters were grown in media containing 2% galactose/1% sucrose instead of 2% glucose, to induce expression.
Importantly, the commonly used, inducible GAL promoter is inefficient at driving gene expression in the dcp2Δ genetic background required to stabilize the NMD-sensitive reporter RNA (Geisler et al., 2012). Therefore, to drive expression in this strain, the GFP reporters were placed under control of the constitutive TDH3 promoter (Nacken et al., 1996). Because this promoter is ~3-fold weaker than the GAL promoter (Figure 2-1C, left two panels), the GFP mRNAs were expressed from a high-copy plasmid (Figure 2-1C, right panel) to drive high expression and increase the yield of the purified mRNP.
Optimization of mRNP pulldown efficiency
The mRNP pulldown protocol is based on the highly specific base-pairing between complementary DNA and RNA sequences. The overall purification scheme (Figure 2-2A) involves immobilizing complementary DNA oligonucleotides to magnetic beads followed by hybridization to the target RNA in a whole-cell lysate to specifically and cleanly purify a single RNA species. For immobilization, a 3’ biotin adduct was incorporated onto the DNA oligonucleotides, and the strong interaction between biotin and streptavidin was harnessed. The resistance of this interaction to high salt and ionic detergents allowed us to perform purification under stringent conditions without disrupting the association of complementary DNA oligonucleotides with the beads. A panel of streptavidin Dynabeads with different binding capacities and hydrophilic properties was tested (Figure 2-2B). The MyOne Streptavidin C1 Dynabeads (Invitrogen) provided >2.5-fold higher binding capacity than any other beads tested, based on the amount of GFP reporter purified from whole-cell RNA (Figure 2-2C).
Nearly all variables for the mRNP pulldown were exhaustively optimized to maximize recovery of the GFP reporter mRNAs, including concentration and length of formaldehyde crosslinking treatment, lysis method, hybridization conditions, incubation time and temperature, and elution method. Representative optimization experiments are presented in Figure 2-3. The highest GFP reporter mRNA recovery was achieved following an overnight incubation at room temperature, which resulted in >4-fold more mRNA purified than a one-hour incubation at room temperature or overnight incubation at 50 °C, and approximately 30% more than an overnight incubation at 37 °C (Figure 2-3A). The addition of the RNA denaturant formamide was found to increase recovery approximately two-fold, presumably by increasing accessibility of the reporter mRNA for hybridization (Figure 2-3B).
[Figure 2-2, image not reproduced. Panel C lanes, input and eluate: M-270, M-280, MyOne - C1, MyOne - T1; percent of input as printed: 1.4, 1.7, 5.6, 2.2.]
Figure 2-2. Selection of streptavidin-conjugated magnetic beads for mRNP purification. A) Schematic of mRNP pulldown design. Biotinylated DNA oligonucleotides are immobilized onto streptavidin magnetic beads, which hybridize to the reporter mRNA and copurify any associated RNA binding proteins, including known RBPs such as ribosomal proteins or PAB1 (shown), and unknown factors. B) Table of characteristics of magnetic streptavidin Dynabeads tested for use in mRNP pulldown protocol. C) Northern blot after pulldown of GFP reporter mRNA from whole-cell RNA, to test Dynabeads for binding capacity.
[Figure 2-3, image not reproduced. Panel A lanes (input, supernatant, eluate): RT 1 hr, RT O/N, 37 °C O/N, 50 °C O/N; percent of input as printed: 1.3, 6.1, 4.8, 1.3. Panel B lanes (eluate, 1/10 input): complementary oligo, no DNA oligo, 2x bead volume, 2x incubation volume, decreased LiCl, +14% formamide; percent of input as printed: 15.3, 51.8, 17.0, 14.5, 30.5. Panel C lanes (1/10 supernatant, eluate, 1/10 input): oligos 1, 3, 4, 1+2+3, 1+2+4, 1+2+3+4; percent of input as printed: 100, 102, 72.5, 14.7, 24.0, 14.5, 14.5, 0.72, 0.38, 35.1, 19.4, 47.4, 36.8.]
Figure 2-3. Optimization of mRNP pulldown conditions. A) Incubation time and temperature for the hybridization step of the mRNP pulldown protocol was optimized for maximum mRNA recovery, based on percent of reporter mRNA purified under each condition. B) A panel of buffer and incubation conditions was tested for the hybridization step of the mRNP pulldown protocol. Conditions as labelled. C) A series of DNA oligonucleotides complementary to different regions of the common sequence of the GFP reporters (schematic of NMD-sensitive reporter shown below) were tested for the efficiency with which they purified the reporter mRNA. Star indicates combination selected for use in purification.
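The recovery comparisons above rest on a "percent of input" value for each lane. A minimal sketch of that arithmetic follows, assuming that band intensities are compared against an input lane loaded at 1/10 of the starting material (as the lane labels suggest) and that the four values printed under Figure 2-3A correspond, in order, to the four hybridization conditions. The function names are illustrative and are not taken from the protocol.

```python
def percent_of_input(eluate_signal, input_signal, input_fraction_loaded=0.1):
    """Percent of the target RNA recovered in the eluate lane.

    input_signal is the band intensity of the input lane, assumed to hold
    only `input_fraction_loaded` of the starting material (1/10 here).
    """
    if input_signal <= 0:
        raise ValueError("input_signal must be positive")
    if not 0 < input_fraction_loaded <= 1:
        raise ValueError("input_fraction_loaded must be in (0, 1]")
    # Scale the input lane up to the full amount of starting material.
    total_input_signal = input_signal / input_fraction_loaded
    return 100.0 * eluate_signal / total_input_signal


def fold_over(reference, other):
    """How many fold higher the `reference` recovery is than `other`."""
    return reference / other


# Percent-of-input values printed under Figure 2-3A; the lane order is assumed.
recovery = {"RT 1 hr": 1.3, "RT O/N": 6.1, "37 C O/N": 4.8, "50 C O/N": 1.3}
best = max(recovery, key=recovery.get)

if __name__ == "__main__":
    for condition, value in recovery.items():
        print(f"{best} vs {condition}: {fold_over(recovery[best], value):.2f}-fold")
```

Under these assumptions, the overnight room-temperature condition comes out roughly 4.7-fold above the one-hour and 50 °C incubations and roughly 27% above the 37 °C incubation, in line with the values quoted in the text.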
Complementary DNA oligonucleotides were extensively tested to identify those which afforded the greatest recovery of the reporter mRNA. These oligonucleotides were required to satisfy several criteria. First, they must be complementary to the GFP ORF sequence rather than flanking sequences, because the reporter utilizes 5’ and 3’ UTR sequences from endogenous yeast genes, which would also be purified if hybridization were targeted to these regions (Figure 2-3C, bottom); in contrast, the GFP ORF bears no significant homology to any yeast sequences. Second, they must be complementary to only the region of the GFP ORF common to both NMD-sensitive and NMD-insensitive reporters, such that both are being purified in an identical fashion. Third, to prevent hybridization between the mRNA and DNA oligonucleotide from being physically occluded, we avoided the last ~18 nucleotides of the common GFP ORF sequence, which would overlap the footprint of a terminating ribosome (Ingolia et al., 2009). Fourth, the oligonucleotides must not contain any significant intramolecular secondary structure which could block hybridization to the reporter mRNA. Based on these criteria, DNA oligonucleotides complementary to different regions of the common GFP ORF were designed and tested both in isolation and in combination. Individually, the DNA oligonucleotides showed a wide range of effectiveness, recovering from <1% to >35% of the GFP reporter (Figure 2-3C). We reasoned that simultaneously using a combination of DNA oligonucleotides might increase the capture of individual mRNA molecules obscured at one hybridization site but not another due to crosslinking of proteins. Indeed, a combination of three DNA oligonucleotides was determined to be the most effective combination for purifying the GFP reporter mRNA, increasing the percent of RNA recovered to nearly 50% (Figure 2-3C, starred lane).
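Two of the design criteria above (restriction to the common ORF region and exclusion of the last ~18 nucleotides) lend themselves to a short sketch. The example below tiles candidate antisense DNA oligonucleotides across a target region; it is an illustration only. The oligonucleotide length, the tiling step, and the input sequence are hypothetical (the text does not state them), and the check for intramolecular secondary structure is not modelled.

```python
# Map each RNA base to its complementary DNA base.
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def antisense_dna_oligo(rna_site):
    """DNA oligo (5'->3') complementary to an RNA target site (5'->3')."""
    return rna_site.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def candidate_oligos(common_orf_rna, oligo_length=20, step=10, excluded_tail=18):
    """Tile antisense oligos across the common ORF region.

    The last `excluded_tail` nucleotides are skipped because they would
    overlap the footprint of a terminating ribosome. `oligo_length` and
    `step` are hypothetical defaults, not values from the protocol.
    """
    usable_length = len(common_orf_rna) - excluded_tail
    oligos = []
    for start in range(0, usable_length - oligo_length + 1, step):
        site = common_orf_rna[start:start + oligo_length]
        oligos.append((start, antisense_dna_oligo(site)))
    return oligos


if __name__ == "__main__":
    # Placeholder 60-nt sequence, not the GFP ORF.
    placeholder_rna = "AUGC" * 15
    for start, oligo in candidate_oligos(placeholder_rna):
        print(start, oligo)
```

Candidates produced this way would still need to be screened for secondary structure and tested empirically, individually and in combination, as described above.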
After thorough optimization of the mRNP pulldown protocol, the specificity of purification was determined by two assays. First, northern blot analysis demonstrated efficient purification of the GFP reporter mRNA, with 1/3 to 1/2 of the RNA present in the lysate consistently recovered (Figure 2-4A, top panel). In contrast, a nonspecific but highly abundant cellular mRNA, PGK1, was undetectable in the purified sample (Figure 2-4A, middle panel). Additionally, there was no detectable background of the highly abundant 18S ribosomal RNA (rRNA), a common purification contaminant, in a control purification from cells lacking a GFP reporter (Figure 2-4A, bottom panel). In contrast, co-purification of rRNA did occur upon purification of either reporter mRNA (Figure 2-4A, bottom panel), which was anticipated because the GFP reporters are translated (data not shown). Second, qRT-PCR analysis of the purified RNA demonstrated >5000-fold enrichment for the GFP reporter versus the nonspecific PGK1 mRNA, providing quantitative evidence that this purification protocol is highly specific (Figure 2-4B). Furthermore, quantification of the absolute amount of GFP reporter RNA purified, based on comparison to a standard curve by qRT-PCR, indicated that approximately 10-50 pmol of GFP RNA could be purified from yeast grown in 200 mL of liquid culture (data not shown). Based on these data, the mRNP pulldown can specifically and efficiently purify an mRNA of interest with minimal undesired nonspecific background.
[Figure 2-4, image not reproduced. Panel A: northern blots probed for GFP, PGK1, and 18S rRNA; lanes are no reporter, NMD-sensitive, and NMD-insensitive for each of 1/10 input, 1/10 supernatant, and eluate; percent of input as printed: 32, 16, 40, 31. Panel B: fold enrichment (GFP/PGK1) on a log scale from 1 to 100000 for the NMD-sensitive and NMD-insensitive reporters.]
Figure 2-4. mRNP pulldown is highly specific for target RNA. A) Northern blot of GFP reporter RNA purified in mRNP pulldown (top), a negative control mRNA PGK1 (middle), and ribosomal RNA. Values indicate percent of GFP reporter RNA present in supernatant or purified eluate relative to respective input. B) Quantitation by qRT-PCR of GFP reporter and PGK1 negative control mRNAs purified by mRNP pulldown. Plotted is the relative enrichment of GFP reporter mRNA (GFP eluate/GFP input) compared to PGK1 (PGK1 eluate/PGK1 input). Data represent the average of 3-4 independent experiments +/- SEM.
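The fold enrichment reported by qRT-PCR is a ratio of ratios: (GFP eluate/GFP input) divided by (PGK1 eluate/PGK1 input). The sketch below shows one way such a value could be computed from threshold cycle (Ct) values. It assumes perfect doubling per cycle and equivalent dilution of input and eluate samples; both are simplifying assumptions made here, not details from the text, and the Ct values in the example are hypothetical.

```python
def relative_recovery(ct_input, ct_eluate, efficiency=2.0):
    """Eluate/input ratio for one transcript from its Ct values.

    Assumes `efficiency`-fold amplification per cycle (2.0 = ideal doubling).
    A higher Ct in the eluate means less template was recovered.
    """
    return efficiency ** (ct_input - ct_eluate)


def fold_enrichment(target_ct_input, target_ct_eluate,
                    control_ct_input, control_ct_eluate, efficiency=2.0):
    """(target eluate/input) divided by (control eluate/input)."""
    target = relative_recovery(target_ct_input, target_ct_eluate, efficiency)
    control = relative_recovery(control_ct_input, control_ct_eluate, efficiency)
    return target / control


if __name__ == "__main__":
    # Hypothetical Ct values for a target (e.g. GFP) and a control (e.g. PGK1).
    print(fold_enrichment(20.0, 21.0, 18.0, 31.0))
```

With these hypothetical inputs the target is recovered at 2^-1 of input and the control at 2^-13, giving a 4096-fold enrichment; real data would additionally require correction for measured primer efficiencies.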
Co-purification of mRNA-binding proteins
Next, co-purification of protein components of the mRNP was analyzed. mRNP pulldown was performed for cells expressing either NMD-sensitive or NMD-insensitive reporter mRNAs, and control cells lacking a GFP reporter. Proteins co-purified with the reporter mRNAs were analyzed by western blotting (Figure 2-5). As expected, the mRNA-associated protein PAB1, which binds to mRNA 3’ poly(A) tails, was detected in the purified eluate for both reporter mRNAs, but was not detected in the purification from cells that did not express a reporter mRNA (Figure 2-5A, third panel). When normalized to the quantity of mRNA purified for this experiment (as determined by northern blotting), the level of co-purified PAB1 was similar between the two reporters (Figure 2-5B, white). This demonstrated that predicted mRNP proteins were specifically co-purified with the target mRNA via the mRNP pulldown protocol. As a control for nonspecific background, the metabolic enzyme 3-phosphoglycerate kinase (PGK1), which is not expected to associate with mRNA, was undetectable in all purified eluates (Figure 2-5A, bottom panel). This provided additional evidence that the purification procedure resulted in limited purification of nonspecific proteins, indicating that the contribution of in vitro formed mRNP interactions and the association of proteins with the magnetic beads or DNA oligonucleotides under these conditions was minimal.
Interestingly, NMD factor UPF1 showed a strong enrichment in the mRNP of the NMD-sensitive mRNA but demonstrated some level of association with both reporter mRNAs (Figure 2-5A, top two panels), consistent with previous observations that UPF1 binds normal mRNAs but shows a preferential association with NMD substrates (Hogg and Goff, 2010; Johns et al., 2007). When normalized to purified mRNA levels, a >100-fold enrichment of UPF1 with the NMD-sensitive mRNP was observed (Figure 2-5B, gray). Importantly, the enrichment of UPF1 with the NMD-sensitive mRNA provided evidence that proteins differentially associated with two mRNAs can be distinguished by this method.
[Figure 2-5, image not reproduced. Panel A: western blots of urea lysate, eluates, and inputs from cells with no reporter, the NMD-sensitive reporter, or the NMD-insensitive reporter, probed with α-HA (UPF1-HA, ~113 kD; short and long exposures), α-PAB1 (67 kD), and α-PGK1 (45 kD). Panel B: relative level in eluate (normalized to RNA level) on a log scale from 0.1 to 100 for PAB1 and UPF1 with the NMD-sensitive and NMD-insensitive reporters.]
Figure 2-5. Co-purification of proteins with mRNP pulldown protocol. A) Western blot of proteins co-purified with reporter mRNA in mRNP pulldown. PAB1 and UPF1 are expected mRNP components, while PGK1 serves as a negative control. B) Quantitation of (A). Relative protein levels are normalized to amount of reporter mRNA purified, as determined by northern blotting (data not shown).
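The normalization behind the quantitation in Figure 2-5B can be expressed compactly: each western blot signal is divided by the amount of reporter mRNA recovered in the same pulldown, and the normalized values for the two reporters are then compared. The sketch below illustrates this with hypothetical band intensities; the function names and numbers are not taken from the experiment.

```python
def protein_per_mrna(western_signal, northern_signal):
    """Co-purified protein signal normalized to purified reporter mRNA."""
    if northern_signal <= 0:
        raise ValueError("northern_signal must be positive")
    return western_signal / northern_signal


def differential_association(sensitive, insensitive):
    """Ratio of normalized protein levels between the two reporters.

    Each argument is a (western_signal, northern_signal) pair.
    """
    return protein_per_mrna(*sensitive) / protein_per_mrna(*insensitive)


if __name__ == "__main__":
    # Hypothetical band intensities in arbitrary units.
    ratio = differential_association(sensitive=(200.0, 40.0),
                                     insensitive=(1.0, 20.0))
    print(f"{ratio:.0f}-fold enrichment on the NMD-sensitive reporter")
```

Normalizing to recovered mRNA matters because the two reporters are not purified in identical amounts, so raw western signals alone would confound protein occupancy with pulldown yield.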
Preliminary identification of mRNP components by mass spectrometry
For the comprehensive identification of proteins associated with the NMD-sensitive GFP mRNA, proteins co-purified by mRNP pulldown were analyzed by mass spectrometry (MS) by collaborators in the lab of Dr. Amber Mosley in the Department of Biochemistry and Molecular Biology at Indiana University. In order to obtain a sufficient quantity of material for analysis by MS, the scale of the purification was increased to 12 liters of liquid yeast culture. Notably, MS analysis was carried out on only one-fourth of the purified material, indicating that purification of an mRNP from ~3 liters of liquid yeast culture should generate adequate material for MS analysis. Briefly, after purification of the mRNP, digestion with trypsin was used to liberate proteins from the bead-associated mRNP complex and generate peptides for MS. Tryptic peptides were separated by multi-dimensional chromatography and analyzed on an LTQ Velos Pro Ion Trap MS, producing a list of >900 detected yeast proteins. Of these, 236 yeast proteins met the detection cutoffs of ≥2 unique peptides and ≥5 total peptides and were retained for analysis.
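The detection cutoffs described above amount to a simple filter on the peptide counts for each protein. A minimal sketch follows, assuming the MS output has already been reduced to one record per protein; the record type and the two failing example entries are hypothetical, while the peptide counts for MBF1, HRP1, and STE20 are those listed in Table 2-1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinHit:
    name: str
    unique_peptides: int
    total_peptides: int


def passes_cutoffs(hit, min_unique=2, min_total=5):
    """True if a protein meets both detection cutoffs."""
    return hit.unique_peptides >= min_unique and hit.total_peptides >= min_total


def filter_hits(hits, min_unique=2, min_total=5):
    """Keep only proteins that meet both detection cutoffs."""
    return [hit for hit in hits if passes_cutoffs(hit, min_unique, min_total)]


hits = [
    ProteinHit("MBF1", 5, 40),
    ProteinHit("HRP1", 2, 6),
    ProteinHit("STE20", 4, 5),
    ProteinHit("EXAMPLE_A", 1, 12),  # hypothetical: too few unique peptides
    ProteinHit("EXAMPLE_B", 3, 4),   # hypothetical: too few total peptides
]

if __name__ == "__main__":
    for hit in filter_hits(hits):
        print(hit.name, hit.unique_peptides, hit.total_peptides)
```

Requiring both cutoffs guards against proteins identified from a single repeatedly sampled peptide as well as proteins supported by very few spectra overall.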
Included in the list of identified proteins were highly abundant cellular factors, such as metabolic enzymes and structural proteins. Some of these may represent nonspecific background purified by nature of high cellular abundance or affinity for the magnetic beads or DNA oligonucleotides. This list of 236 proteins was compared against several databases of yeast proteins identified as common contaminants in MS analyses. Specifically, the Contaminant Repository for Affinity Purification (CRAPome; Mellacheruvu et al., 2013) identified yeast protein contaminants in several purification systems, while Dr. Amber Mosley provided several additional lists of yeast proteins commonly identified as contaminants in MS purifications. The number of datasets in which a protein appeared, with more hits indicating a higher likelihood of being a contaminant, was taken into account when identifying candidate proteins of interest. Importantly, these datasets of common contaminants were each generated using distinct purification protocols, and thus contaminants unique to the mRNP purification system cannot be unequivocally identified by comparison to these datasets. Rather, future MS analysis of a mock mRNP purification from cells that lack a GFP reporter mRNA will be used as an invaluable resource to identify precisely which proteins are contaminants of this purification system (see Chapter 2.4).
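The contaminant comparison described above can be thought of as counting, for each candidate protein, the number of contaminant datasets in which it appears. The sketch below illustrates this bookkeeping only; the dataset names and protein names are placeholders and do not reflect the contents of the CRAPome or of any other list used here.

```python
from collections import Counter


def contaminant_counts(candidates, contaminant_datasets):
    """Number of contaminant datasets in which each candidate appears.

    `contaminant_datasets` maps a dataset name to an iterable of proteins.
    """
    counts = Counter()
    candidate_set = set(candidates)
    for proteins in contaminant_datasets.values():
        # Count each dataset at most once per protein.
        counts.update(candidate_set & set(proteins))
    return {name: counts[name] for name in candidates}


# Placeholder names only; not real dataset contents.
datasets = {
    "dataset_1": ["PROT_A", "PROT_B"],
    "dataset_2": ["PROT_A"],
    "dataset_3": ["PROT_A", "PROT_C"],
}
hit_counts = contaminant_counts(["PROT_A", "PROT_B", "PROT_D"], datasets)

if __name__ == "__main__":
    for name, count in hit_counts.items():
        print(f"{name}: listed in {count} of {len(datasets)} datasets")
```

A higher count would flag a protein as a more likely contaminant, but, as noted above, a count of zero does not exclude contaminants specific to the mRNP purification system.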
Of the 236 identified proteins, 84 have annotated functions related to RNA. The enrichment of proteins with characterized, predicted, or anticipated functions related to RNA activity supports the specific purification of mRNP complexes from a yeast cell lysate. Proteins with high detection included NMD factor UPF1 and poly(A) tail-binding protein PAB1, both of which associated strongly with this mRNA based on detection by western blot (Figure 2-5A). This indicated that analysis of the mRNP by MS recapitulated the results obtained by the targeted detection of proteins.
Proteins identified by MS analysis were assessed for a potential role in NMD. Proteins involved in RNA quality control would be predicted to have RNA-related activity, while it would be more surprising for a metabolic enzyme, for example, to play a role in RNA surveillance. Therefore, the subset of 84 proteins with RNA-related functions was selected to be investigated initially. This subset includes proteins with well-defined roles in processes including translation (e.g. ribosomal proteins, translation factors), RNA processing (e.g. tRNA synthetases, splicing factors, RNA polymerases), and RNA decay (e.g. exoribonuclease XRN1, NMD factor UPF1). Because these proteins have been previously well studied in general, we reasoned that they would be less likely to play a novel role in NMD, although a secondary function in defining the context of translation termination cannot be excluded based on this alone. Rather, candidate proteins with more poorly defined roles in translation or mRNA metabolism (ribosome-associated proteins, RNA-binding proteins) or specific activity related to other aspects of NMD (nonsense suppression, protein phosphorylation, proteasome interactions), were considered intriguing candidates for an uncharacterized role in NMD. This assessment narrowed the list of candidates to be tested for a role in promoting NMD in the cell to 35 proteins (Table 2-1).
As one method to further hone this list of candidate proteins, the association of each protein with the mRNP relative to its overall cellular abundance was inspected. Proteins specifically associated with the purified mRNP would be predicted to show an enrichment following mRNP purification. To address this, an abundance/detection (A/D) ratio was calculated. This ratio measures the estimated cellular abundance of a protein, as determined by a global analysis of chromosomally TAP-tagged protein levels in log-phase yeast cells (Ghaemmaghami et al., 2003), relative to the number of peptides detected per protein after normalization for protein length (molecules per cell / [number of peptides / protein length]). Thus, a smaller A/D ratio indicated an enrichment of the protein in the MS analysis relative to its cellular abundance. For UPF1, a protein specifically enriched on the NMD-sensitive mRNP, the A/D ratio was 2112, indicating that scores in this range or lower were suggestive of high enrichment. In contrast, the metabolic enzyme PGK1 had a much higher A/D ratio of 10450. Based on these examples, a ratio of ~5000 or less was considered to indicate specific enrichment of a protein with the mRNP. Interestingly, 15 of 25 candidate proteins with a calculated A/D ratio met this criterion, suggesting these proteins not only bound to but were specifically enriched in the purified mRNP, and making these candidates the highest priority for further investigation.
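The A/D ratio described above can be written as a one-line calculation. The sketch below is illustrative: the units of protein length are not specified in the text, so the length argument is treated as an opaque number, the input values in the usage example are hypothetical, and the cutoff is applied as a strict threshold even though the text describes it only as approximately 5000.

```python
def ad_ratio(molecules_per_cell, peptides_detected, protein_length):
    """Abundance/detection ratio.

    Equivalent to molecules_per_cell / (peptides_detected / protein_length),
    rearranged to a single division.
    """
    if peptides_detected <= 0:
        raise ValueError("peptides_detected must be positive")
    return molecules_per_cell * protein_length / peptides_detected


def is_enriched(ratio, cutoff=5000):
    """True if the A/D ratio falls at or below the (approximate) cutoff."""
    return ratio <= cutoff


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(ad_ratio(molecules_per_cell=4000, peptides_detected=40,
                   protein_length=400))
    # A/D ratios reported in the text for UPF1 and PGK1.
    for name, ratio in (("UPF1", 2112), ("PGK1", 10450)):
        print(name, ratio, "enriched" if is_enriched(ratio) else "not enriched")
```

Because the cutoff is approximate, proteins with ratios slightly above 5000 may reasonably be judged case by case rather than excluded by a hard threshold.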
The list of candidate proteins can be further refined by future analyses of control mRNP complexes. Specifically, purification of an NMD-insensitive mRNP will enable identification of proteins that show a preferential association with either NMD-sensitive or NMD-insensitive mRNA. Additionally, a mock purification from cells lacking a GFP reporter will identify contaminant proteins, which can be excluded from future analysis.
Gene Name Viability NMD-Relevant Function                                     Priority  # Unique Peptides  # Total Peptides  Abundance/Detection Ratio
MBF1      Viable    Suppressor of frameshift mutations                        +++       5                  40                1786
OLA1      Viable    Increased readthrough of premature stop codons            +++       8                  31                -
DBP5      Inviable  DEAD-box RNA helicase, mRNP export/remodeling             +++       2                  13                5524
DBP2      Viable    DEAD-box RNA helicase, uncharacterized role in NMD        +++       2                  18                10040
HRP1      Inviable  Binds NMD cis-acting DSE                                  +++       2                  6                 -
ECM32     Viable    Interacts with translation termination factors            +++       3                  9                 917
BFR1      Viable    Associated with mRNP complexes and polyribosomes          ++        10                 16                9811
RBG2      Viable    GTPase with a role in translation                         ++        3                  11                -
RBG1      Viable    GTPase that associates with translating ribosomes         ++        3                  5                 -
EAP1      Viable    Translation inhibitor, decapping activator                ++        4                  7                 2925
NEW1      Viable    Cosediments with polyribosomes                            ++        5                  9                 49966
SUB2      Inviable  DEAD-box RNA helicase, mRNA export                        ++        2                  17                13564
SRO9      Viable    Associates with translating ribosomes                     ++        11                 32                1143
PBP1      Viable    Interacts with PAB1                                       ++        6                  10                2072
SNF1      Viable    Protein kinase                                            ++        3                  5                 746
RPG1      Inviable  Translation initiation factor eIF3a                       ++        12                 30                16934
HYP2      Viable    Translation elongation factor eIF5A                       ++        2                  19                -
ASC1      Viable    Inhibits translation, component of 40S ribosomes          +         4                  18                59015
SCP160    Viable    Interacts with translating mRNAs                          +         17                 43                -
ARC1      Viable    Involved in tRNA delivery                                 +         9                  29                7481
NGR1      Viable    RNA binding protein, regulates mRNA-specific decay        +         2                  8                 1294
RLI1      Inviable  Translation initiation, termination, ribosome recycling   +         3                  7                 5455
TMA19     Viable    Associates with ribosomes                                 +         2                  9                 -
STM1      Viable    Required for optimal translation under stress             +         10                 29                4406
YLR419W   Viable    Putative helicase, uncharacterized                        +         4                  15                -
REI1      Viable    Cytoplasmic pre-60S factor                                +         2                  8                 -
PIN4      Viable    Contains RNA-recognition motif                            +         3                  6                 5155
SPT5      Inviable  Roles in RNA processing and quality control               +         3                  6                 4146
NSR1      Viable    rRNA processing and ribosome biogenesis                   -         11                 128               2503
ARB1      Inviable  60S ribosome biogenesis and activity                      -         5                  6                 -
PRP43     Inviable  RNA helicase involved in Pol II transcript metabolism     -         5                  7                 18518
STE20     Viable    Protein kinase                                            -         4                  5                 486
RPT1      Inviable  Proteasome ATPase                                         -         3                  8                 61
TSA1      Viable    Associates with ribosomes                                 -         2                  24                30870
VMA2      Viable    Vacuolar ATPase                                           -         4                  7                 96753
Table 2-1. List of NMD-sensitive mRNP-associated proteins with candidate roles in NMD. Gene name, viability of strains lacking gene of interest, brief description of characterized function as relevant to a potential role in NMD, priority, MS peptide detection levels, and A/D ratios listed. Functions based on gene descriptions in the Saccharomyces Genome Database (yeastgenome.org). Priority rankings, based on predicted role and peptide detection by MS: (+++) high interest; (++) moderate interest; (+) low interest; (-) tangential interest.
2.3 Discussion
The mRNP pulldown protocol represents a novel method to specifically and efficiently purify a single mRNP species as it exists in vivo. The use of formaldehyde crosslinking captures physiological RNA-protein interactions, which, when coupled with hybridization, allows purification under stringent conditions including ionic detergents and high salt concentration. This procedure improves over previous methodologies designed to interrogate RNA-protein interactions in several ways. First, it neither relies upon nor is conducive to interactions formed in vitro, which may not be representative of mRNP complex assembly in vivo. Second, directly targeting the RNA by hybridization results in high purification efficiency, which may be more difficult to achieve when purification is mediated by RNA-protein interactions. Third, this represents an approach that is unbiased by prior knowledge of protein function, as components of the mRNP can be identified by mass spectrometry following purification. This protocol has been demonstrated to effectively purify an mRNP complex, and will be used to identify differences between two mRNPs that display different sensitivity to the NMD pathway. Preliminary MS analysis for an NMD-sensitive mRNP complex identified an enrichment of proteins with known or anticipated function in mRNA activity, indicating that MS analysis of the purified mRNP can be used to produce a valid and informative dataset. A number of candidate proteins with possible roles in NMD were identified in this manner, and further interpretation of the data will be facilitated by future analyses of an NMD-insensitive control mRNP and a mock sample generated from cells that do not express a GFP reporter mRNA.
This procedure can be easily adapted for the purification of other mRNP complexes to understand various aspects of RNA metabolism. With minimal optimization, a set of DNA oligonucleotides complementary to an RNA of interest can be tested and used in place of the oligonucleotides presented here.
Alternatively, the small region of the GFP ORF to which complementary DNA oligonucleotides were targeted can be cloned into an RNA of interest, serving as a transferable targeting platform. The latter approach has been used to investigate the association of proteins with mRNAs displaying different sensitivities to decapping activator DHH1 (data not shown).
The mRNP pulldown protocol can interrogate the association of proteins with an mRNP complex in two distinct ways, depending on the nature of the experimental question and extent of prior knowledge regarding the role of a protein in the life cycle of an mRNA. First, this approach can be used to ask directed questions regarding the association of a specific RNA-binding protein with individual mRNAs using western analysis. This is demonstrated by the differential association of UPF1 with NMD-sensitive and NMD-insensitive mRNAs following mRNP purification, which recapitulates the documented preference of UPF1 for NMD substrate mRNAs (Johns et al., 2007). Furthermore, another interesting application of this protocol could be analysis of the association of known mRNP components with an mRNA in different gene deletion backgrounds. This could identify whether one component of an RNA regulatory pathway is required for other factors involved in the same pathway to associate with an mRNA. This type of analysis could provide information about the order of protein assembly in an mRNP complex, or the interdependency of protein association with a specific mRNA. Second, purification of mRNP complexes yields a sufficient quantity of material for protein detection by MS. This facilitates the detection of unknown proteins associated with any single mRNA population. The mRNP pulldown procedure therefore represents a powerful new tool to uncover many details of mRNP complex formation and regulation.
2.4 Future directions
Following the development of the mRNP pulldown protocol, two long-term goals remain: 1) to identify factors important for NMD substrate recognition based on differences between NMD-sensitive and NMD-insensitive mRNPs, and 2) to obtain a complete picture of proteins that associate with a typical mRNP. Following substantial protocol optimization, it was demonstrated that the purified mRNP can be effectively analyzed by MS, and data regarding an NMD-sensitive mRNP are now available for further study. Future analysis of control datasets will be carried out in order to extract the maximum amount of useful information from this dataset. Specifically, analysis of the mRNP associated with the NMD-insensitive mRNA will identify proteins that consistently associate with mRNAs and those that preferentially associate with NMD-sensitive or -insensitive mRNA populations. Additionally, analysis of a mock mRNP preparation from cells lacking a GFP reporter mRNA will identify proteins purified non-specifically, which can be filtered from experimental datasets. The analysis of control datasets will allow both experimental goals to be addressed in more depth.
Protein factors involved in the recognition of NMD substrates are anticipated to show differential association with NMD-sensitive and -insensitive mRNAs. Premature translation termination increases the length of untranslated RNA, which may serve as a binding platform for RBPs. NMD-sensitive mRNAs are therefore predicted to preferentially associate with specific proteins that help recruit the NMD machinery or mark the NMD substrate, although it is also formally possible that NMD substrates instead lack some feature of normal RNAs. Comparison of proteins associated with the NMD-sensitive mRNP to those associated with the control NMD-insensitive mRNP will facilitate the identification of proteins that associate differentially with the two classes of mRNAs. This will provide stronger evidence for a potential role of candidate proteins in NMD substrate recognition.
An intriguing possibility is that the NMD factors themselves sense premature translation termination, rather than relying on other trans-acting factors to recruit the NMD machinery to NMD substrates. In support of this possibility, UPF1 associates preferentially with NMD-substrate mRNAs, or mRNAs with longer regions of untranslated RNA downstream of the site of translation termination, in a sequence-independent manner (Hogg and Goff, 2010; Hwang et al., 2010; Johns et al., 2007). In this way, UPF1 itself could act as a sensor of 3’ UTR length and directly recognize NMD substrates. This model is supported by the enriched association of UPF1 with the NMD-sensitive GFP reporter mRNA observed by western blotting, and by its strong detection with this mRNP by MS analysis. Further investigation is needed to determine whether UPF1 alone is truly able to detect the length of an mRNA 3’ UTR independently of any other protein factors.
Concomitantly, other proteins identified by MS to associate with the NMD-sensitive mRNA will be investigated for a general role in NMD. Proteins encoded by nonessential genes will be screened by generating genomic deletions of each gene individually. The CYH2 pre-mRNA, an endogenous NMD substrate due to inefficient splicing leading to intron retention (He et al., 1993), will then be analyzed for an increase in abundance indicative of decreased NMD activity. This screening procedure is illustrated with the hnRNP K-like protein HEK2, which is implicated in nuclear mRNA maturation, export, and persistent association with cytoplasmic mRNPs (Denisenko and Bomsztyk, 2002). The PBP2 protein is a paralog of HEK2 that associates with PAB1 protein (Mangus et al., 1998), and was tested simultaneously to exclude redundancy between the two proteins (Figure 2-6A). In this example, loss of either protein, alone or in combination, did not affect the abundance of CYH2 pre-mRNA, arguing against a role in NMD substrate recognition. Screening in this manner is being carried out for candidate proteins from the presented mRNP pulldown data, selected based on high representation in the NMD-sensitive mRNP and putative functions in mRNA processing or translation (Table 2-1). To date, no novel proteins with a general role in NMD have been identified. Because candidate proteins were identified based on association with GFP reporter mRNAs, the abundance of the GFP reporters can be monitored following deletion of each
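[Figure panel B: western blot of GAL-HA:DBP5 cells at 0, 1, 2, 4, and 6 hours following shift from GAL to GLU media, alongside an untagged control; detection with α-HA (DBP5-HA ~59 kD); size markers at 75, 50, and 37 kD.]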