
Understanding the Role of Exon Junction Complex-dependent Nonsense Mediated

mRNA Decay in Embryonic Development

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

in the Graduate School of The Ohio State University

By

Pooja Gangras, B.Tech.

Graduate Program in Molecular Genetics

The Ohio State University

2019

Dissertation Committee

Dr. Sharon L. Amacher, Advisor

Dr. Guramrit Singh, Co-Advisor

Dr. Robin P. Wharton

Dr. Christine Beattie (deceased)

Dr. Anita K. Hopper

© Copyright by

Pooja Gangras

2019

Abstract

Post-transcriptional control of expression is essential for proper development and is achieved largely by RNA-binding proteins. My graduate research has focused on understanding the role of one such RNA-binding complex, the Exon Junction Complex (EJC), in development. The EJC is deposited about 24 nts upstream of exon-exon junctions during splicing. The EJC influences many aspects of post-transcriptional regulation such as mRNA splicing, export, localization and nonsense mediated mRNA decay (NMD). EJC-dependent NMD is recognized as one mode of rapidly regulating expression of normal mRNAs that contain NMD-inducing features such as a 3′UTR intron or an upstream ORF. How regulation of gene expression via EJC-dependent NMD influences development of specific tissues is unknown. My work utilizes a strong developmental and genetic model system, zebrafish, to understand how EJC-dependent NMD shapes development.

To address my scientific questions, I utilized the RNA-Seq technique. Our lab has a custom RNA-Seq library preparation protocol, which I wanted to improve in order to increase the efficiency of isolating sample cDNA after reverse transcription of RNA fragments (Chapter 2). To improve the protocol, I incorporated biotinylated dNTPs in the RT reaction so that cDNA generated from RNA fragments could be extracted using streptavidin beads. This amendment to the library preparation protocol efficiently selects for extended RT product and avoids ligation of the unextended adapter and generation of insert-lacking cDNAs in the library.

To study EJC function during development (Chapter 3), I generated zebrafish mutants in the EJC core protein genes rbm8a and magoh. Homozygous rbm8a and magoh mutants (EJC mutants) are paralyzed and have muscle and neural defects. As expected, RNA profiling revealed that annotated aberrant and normal NMD targets are significantly upregulated in EJC mutants. An mRNA is targeted for NMD by the key NMD-regulator Upf1 when an exon-exon junction, marked by the EJC, lies ≥ 50 nts downstream of the terminated ribosome. Surprisingly, I discovered that some upregulated normal transcripts contain a conserved proximal 3′UTR intron (3′UI) < 50 nts downstream of the stop codon. These ‘proximal 3′UI-containing NMD targets’ are similarly upregulated in Upf1-deficient and NMD inhibitor-treated embryos, suggesting that this subset of rbm8a- and magoh-regulated transcripts is regulated via NMD. The same trend is observed in Upf1-deficient mammalian cells. One proximal 3′UI-containing NMD target that is upregulated in EJC mutants and morphants encodes Foxo3b. I find that heterozygous and homozygous knockout of foxo3b partially rescues motor axon outgrowth defects in EJC mutants. My work establishes zebrafish as a system to study NMD, characterizes zebrafish embryonic muscle and neural defects associated with loss of the EJC, and identifies proximal 3′UTR intron-containing genes as a new class of NMD targets.

Dedication

This work is dedicated to Maithili,

I hope you aspire to be the best version of yourself that you can be…

Acknowledgments

I have been able to reach this point in my career due to the help, guidance and support of several people. Firstly, I would like to thank my advisors Dr. Sharon Amacher and Dr. Guramrit Singh for being amazing mentors and truly warm people. Sharon and Amrit have been instrumental in teaching me several techniques and the scientific method of thinking. I have been very fortunate to have mentors who have always supported my ideas. I am grateful to the MolGen Graduate Student Advisor and my committee member, Dr. Robin Wharton, who really helped me find my way to this collaborative project in my first year when I was completely clueless. I would like to thank all my committee members, Dr. Robin Wharton, Dr. Christine Beattie (deceased) and Dr. Anita Hopper, for their support throughout graduate school. I would also like to thank my undergraduate mentors Dr. Akihiro Ikeda (UW-Madison) and Dr. Barbara Kloeckener-Gruissem (University of Zurich), who had faith in my ability to do science. I would also like to specially thank Dr. Shubha Tole (TIFR, Mumbai) for all her valuable advice.

I would like to thank Dr. Jared Talbot, who taught me everything I know about working with zebrafish. I came in with no experience and he was very patient in training me. I would also like to thank my lab mates Lauren, Justin, Zhongxia and Kiel. My lab mates created a wonderful atmosphere of laughter and friendship which made me want to come to lab even through the hard times in graduate school. I am grateful to Lauren for being my companion through every phase of graduate school and for being my sounding board through all the hard times. I am thankful to Zhongxia for keeping my love for science alive in my last year of graduate school with his excitement for all the newly published cool papers. I have made some of the best conference memories with Justin and Kiel. They both helped me settle into the Singh and Amacher labs, respectively, and I am grateful for that. I am grateful to Robert Patton and Michael Parthun for all their help with bioinformatics.

I would like to thank the Center for RNA Biology for supporting my project with a seed grant and with the graduate student fellowship. I would also like to thank the Amacher lab managers Zachary Morrow and Danielle Pvirre, who worked with their army of undergraduates to always keep the fish facility functioning, which allowed me to pursue my research. I would also like to thank all the undergraduates who helped out with the lab chores in the Singh lab.

Lastly, I would like to thank my mother Devyani, sister Maithili and husband Abhijoy for all the emotional support they have provided over the last 5 years. Talking to my mom and sister every weekend was the best therapy I could ask for. They both always made me laugh and encouraged me to push through everything. I am so thankful to have met Abhijoy in my first week in Columbus. Abhijoy has been my biggest cheerleader throughout graduate school and I can’t even imagine working through graduate school without him. Meeting Abhijoy every day after work and walking home together as we discuss the joys and sorrows of the day will continue to be one of my most cherished memories of the last 5 years.

Vita

Education

Bachelor of Technology (B.Tech.) | 2010-2014 | SRM University, Chennai, India

· Major: Genetic Engineering; Bachelor thesis project, University of Zurich, Switzerland

Publications

1. Gangras, P.*, Dayeh, D.M., Mabin, J.W., Nakanishi, K., Singh, G., 2018. Cloning and Identification of Recombinant Argonaute-Bound Small RNAs Using Next-Generation Sequencing. Methods Mol. Biol. Methods article

2. Woodward, L.A., Mabin, J., Gangras, P.*, Singh, G., 2016. The Exon Junction Complex: A Lifelong Guardian of mRNA Fate. WIREs RNA. Advanced review article

3. Woodward, L.A., Gangras, P.*, Singh, G., 2019. Identification of footprints of RNA:protein complexes via RNA Immunoprecipitation in Tandem followed by sequencing (RIPiT-Seq). J. Vis. Exp.

Fields of Study

Major Field: Molecular Genetics


Table of Contents

Abstract
Dedication
Acknowledgments
Vita
Publications
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 RNA-binding proteins in human disease
1.2 Role of the Exon Junction Complex in RNA metabolism
1.3 EJC-dependent Nonsense Mediated mRNA Decay
1.4 Role of core EJC proteins, eIF4AIII, RBM8A and MAGOH in development and disease
1.5 Defects associated with Nonsense mediated mRNA Decay
1.6 Brief overview of zebrafish muscle development
1.7 Overview of zebrafish axial motor neuron development
1.8 Role of FOXO signaling pathway in human and zebrafish
Chapter 2 Optimization of RNA-Seq library preparation method
2.1 Summary
2.1. Introduction
2.2. Materials and Methods
2.2.1 RNA end curing and quantification
2.2.2 RNA end curing
2.2.3 RNA quantification
2.2.4 Ligation
2.2.5 Reverse Transcription (RT)
2.3.6 Gel electrophoresis of RT product
2.3.7 Gel elution and streptavidin pulldown
2.3.8 Circularization
2.3.9 PCR Amplification
2.3.10 DNA quantification and sample prep for next generation sequencing
2.3.11 TOPO Cloning to validate NGS library
2.4 Results
2.5 Conclusion
2.6 Figures
Chapter 3 Stop codon-proximal 3′UTR introns elicit EJC-dependent nonsense-mediated mRNA decay and regulate vertebrate development
3.1 Abstract
3.2 Introduction
3.3 Materials and methods
3.3.1 stocks, lines, and husbandry
3.3.2 CRISPR/Cas9 mutagenesis
3.3.3 EJC mutant and foxo3bihb404 mutant embryo and adult genotyping strategy
3.3.4 Acridine orange staining and immunohistochemistry
3.3.5 Microscopy and Imaging
3.3.6 Zebrafish NMDI14 inhibitor treatment
3.3.7 Immunoblot analysis
3.3.8 Quantification of paralysis and motor axon length
3.3.9 RNA-Immunoprecipitation-seq and RNA-Seq sample collection
3.3.10 RIP-seq and RNA-Seq library preparation
3.3.11 Zebrafish EJC mutant embryo RIP-seq and RNA-Seq data analysis
3.3.12 enrichment analysis
3.3.13 Overlap analysis
3.3.14 STRING network analysis
3.3.15 Identification of uORF genes in zebrafish
3.3.16 Identification of 3′UTR intron containing genes in zebrafish, mouse, and human
3.3.16 Mammalian knockdown experiments
3.3.17 Zebrafish upf1 knockdown experiment and RNA-Seq
3.3.18 Quantitative RT-PCR
3.3.19 Quantification and statistical analysis
3.4 Results
3.4.1 EJC composition and deposition is conserved in zebrafish
3.4.2 rbm8a and magoh mutant embryos show defects in motility, muscle organization, and motor axon outgrowth
3.4.3 Analysis of gene expression in rbm8a and magoh mutant embryos
3.4.4 rbm8a and magoh mutant embryos have defects in NMD
3.4.5 Transcripts with stop codon-proximal 3′UTR introns are upregulated upon loss of EJC and Upf1 function
3.4.6 Loss of function of foxo3b, a proximal 3′UI-containing gene upregulated in EJC mutant embryos, partially rescues motor axon outgrowth
3.4.7 Proximal 3′UI-containing genes are regulated by NMD in human and mouse cells
3.5 Discussion
3.5.1 Loss of EJC causes tissue-specific defects, embryonic lethality and changes in gene expression in zebrafish
3.5.2 The EJC is a critical component of NMD in zebrafish
3.5.3 Proximal 3′UTR introns are a novel NMD-inducing feature
3.5.4 How can splicing of a proximal 3′UI lead to NMD?
3.5.5 EJC-dependent NMD of foxo3b is critical for zebrafish motor axon outgrowth
3.6 Figures
Chapter 4 Concluding Remarks and Future Directions
4.1 Summary of Findings and Significance
4.1.1 Significance
4.2 Future Directions
4.2.1 Determine molecular cause of the muscle defects in zebrafish EJC mutants
4.2.2 Identifying the mechanism of proximal 3′UI transcript decay
4.2.3 Determining the molecular mechanism of foxo3b-mediated rescue of EJC mutant motor neuron outgrowth defects
Bibliography
Appendix A List of primers used in chapter 2
Appendix B RNA-Seq reads
Appendix C List of primers used in chapter 3


List of Tables

Table 1.1 Summary of known EJC-associated defects

List of Figures

Figure 1.1 RNA binding proteins are associated with tissue-specific human diseases.
Figure 1.2 Lifecycle of the Exon Junction Complex
Figure 1.3 Schematic illustrating the events occurring in a normal translation termination and aberrant translation termination event that triggers NMD.
Figure 1.4 Types of NMD targets
Figure 1.5 Illustration of 25 hpf zebrafish embryo.
Figure 1.6 Organization of zebrafish muscle.
Figure 1.7 Primary motor in zebrafish at 18 and 24 hpf.
Figure 2.1 Schematic of RNA-seq library preparation procedure.
Figure 2.2 RT-PCR using cDNA with biotinylated dNTPs prevents amplification of the excess adapter.
Figure 2.3 Product of the RT reaction using biotinylated dNTPs is comparable to reaction with normal dNTPs.
Figure 2.4 0.01 pmol of RNA as input is sufficient to generate complex libraries.
Figure 2.5 PCR amplification and size estimation of NGS libraries made from RNA bound to KpAgo.
Figure 3.1 The zebrafish EJC is detected ~24 nt upstream of exon-exon junctions.
Figure 3.2 EJC composition and deposition is highly conserved between zebrafish and humans.
Figure 3.3 Zebrafish rbm8a and magoh mutant embryos show gradual loss of maternally contributed Rbm8a and Magoh proteins during early development.
Figure 3.4 Cell death in EJC mutants begins at 19 hpf and progressively worsens over time.
Figure 3.5 EJC mutant embryos are paralyzed, have disorganized muscles and stunted motor axons.
Figure 3.6 Gene expression changes in rbm8a and magoh mutant embryos.
Figure 3.7 Genes upregulated in EJC mutant embryos are also regulated by Upf1 and contain NMD-inducing features.
Figure 3.8 Genes upregulated in rbm8a mutants at 27 hpf show the highest overlap with a previously published 24 hpf upf1 morphant dataset.
Figure 3.9 Transcripts encoded by genes with a proximal 3′UTR intron are upregulated in EJC mutant and in NMDI14-treated embryos.
Figure 3.10 Proximal and distal 3′UI genes encode proteins with roles in RNA metabolism
Figure 3.11 Partial or complete loss of foxo3b in magoh mutant embryos rescues motor neuron outgrowth defects.
Figure 3.12 Partial or complete loss of foxo3b in rbm8a mutant embryos rescues motor neuron outgrowth defects.
Figure 3.13 Proximal position of 3′UTR introns is conserved in many vertebrate genes and such introns can induce NMD in human cells.
Figure 3.14 Proximal 3′UI genes are upregulated in UPF1 knockdown ESCs and encode proteins that function in RNA metabolism.
Figure 3.15 Models for EJC-dependent NMD of proximal 3′UI-containing transcripts and of NMD-based regulation of foxo3b-dependent motor neuron outgrowth
Figure 4.1 Characterization of zebrafish rbm8a mutant muscle

Chapter 1 Introduction

This chapter contains published work. The publication reference is: L. Woodward, J. Mabin, P. Gangras, G. Singh (2016); The exon junction complex: A lifelong guardian of mRNA fate. WIREs RNA review, vol. 8, no. 3. https://doi.org/10.1002/wrna.1411

1.1 RNA-binding proteins in human disease

Differences in gene expression make every cell in an organism unique despite containing the same DNA. As genes are transcribed into messenger RNA (mRNA), RNA-binding proteins (RBPs) interact with the nascent pre-mRNA to process it into mature mRNA. The mature protein-coding mRNA is then transported to the cytoplasm, where it is translated by a ribosome. The translation of mature mRNA can be regulated spatially by localization to specific regions of the cytoplasm or temporally by storage of the mRNA in cytoplasmic bodies. Finally, the mRNA is degraded either at the end of its function or as a result of decay mechanisms that regulate protein production. The post-transcriptional regulation of mRNA fate is essential for proper cellular function because it influences crucial cell fate decisions and cellular differentiation pathways. RBPs that mediate post-transcriptional processes are therefore important regulators of development and are linked to several human diseases (1–3). Currently, mutations in some RBPs that function in pre-mRNA splicing, 3′ end cleavage/polyadenylation, mRNA export, assembly/localization, translation and 5′ cap turnover/mRNA decay are associated with human diseases (Fig 1.1). Surprisingly, many of the human diseases associated with RBPs manifest as tissue-specific defects even though many of these RBPs are expressed ubiquitously (1–4). Why dysregulation in the function of ubiquitously expressed RBPs leads to human diseases in specific cell types remains unknown. Based on human expression data, most RBPs display a broad expression pattern (5). Only 6% of RBPs are expressed in specific tissues such as testis, muscle/heart, liver/pancreas, lymphocytes/bone marrow and brain (5). These observations motivate in-depth investigation of RBP tissue-specific functions and of cell type sensitivity to loss of specific RBPs. I have been interested in understanding the tissue-specific roles of one such ubiquitously expressed RBP complex, the Exon Junction Complex (EJC).


Figure 1.1 RNA binding proteins are associated with tissue-specific human diseases. Post-transcriptional processes (black), described in the middle panel and drawn in the left panel, are associated with diseases (red), described in the middle panel, which manifest as tissue-specific defects illustrated in the right panel. MDS: Myelodysplastic syndromes; OPMD: Oculopharyngeal muscular dystrophy; LCCS1: Lethal congenital contracture syndrome 1; SMA: Spinal muscular atrophy; FXS: fragile X-syndrome; ID: intellectual disability; PCH: Pontocerebellar hypoplasia. Figure from Corbett 2018 (This figure has been approved for use by Elsevier)

1.2 Role of the Exon Junction Complex in RNA metabolism

The EJC is composed of three core proteins, eIF4AIII, RBM8A (also known as Y14) and MAGOH (6). The EJC subunits are recruited co-transcriptionally to an activated spliceosome, which deposits the EJC in a sequence-independent manner about 24 nucleotides (nt) upstream of exon-exon junctions during pre-mRNA splicing (6–8). Once deposited, the EJC remains stably bound on mRNA until it is removed by a ribosome-associated protein, PYM, during the first round of translation (6). After disassembly, the RBM8A:MAGOH heterodimer is reimported into the nucleus by Imp13; the mechanisms for nuclear import of eIF4AIII and MLN51 are unknown. Recycled EJC subunits are thought to re-enter the EJC lifecycle (6).
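The positional rule described above can be illustrated with a minimal sketch. It assumes a fixed 24-nt offset (the text only says "about 24 nt"), takes exon lengths of a spliced mRNA as input, and uses a hypothetical function name and made-up exon lengths; it is not the analysis pipeline used in this work.

```python
from itertools import accumulate

# Assumption: a fixed offset, whereas the text describes deposition
# "about 24 nt" upstream of each exon-exon junction.
EJC_OFFSET_NT = 24


def expected_ejc_sites(exon_lengths, offset=EJC_OFFSET_NT):
    """Return expected EJC positions (nt from the mRNA 5' end).

    exon_lengths: exon lengths in nt, in 5'->3' order, for a spliced mRNA.
    A junction lies at the boundary after every exon except the last one.
    Junctions closer than `offset` nt to the 5' end are skipped.
    """
    junctions = list(accumulate(exon_lengths))[:-1]
    return [j - offset for j in junctions if j - offset >= 0]


# Hypothetical three-exon mRNA: junctions at 100 and 150 nt.
print(expected_ejc_sites([100, 50, 200]))
```

A single-exon transcript has no junctions, so the function returns an empty list for it, consistent with the EJC being deposited only during splicing.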

In mammalian cells, the two core EJC proteins RBM8A and MAGOH are present in nearly equal amounts, whereas eIF4AIII is present at a sub-stoichiometric ratio relative to them (9). Apart from the functions of the core proteins as part of the EJC, eIF4AIII is thought to function in ribosome biogenesis in yeast and human cells (10,11). Recent studies also provide some evidence for as yet unknown but EJC-independent functions of RBM8A in Y. lipolytica (12). As a complex, the EJC interacts with a number of peripheral factors to regulate pre-mRNA splicing (ASAP complex, MLN51, SRm160) and mRNA export to the cytoplasm (TREX complex), as well as mRNA localization (MLN51), translation (SKAR, MLN51) and nonsense-mediated decay (UPF3B, UPF2, MLN51) in the cytoplasm (6) (Fig 1.2). Hence, the EJC influences almost every aspect of the mRNA’s life, even though the EJC was originally discovered for its key role in triggering nonsense-mediated mRNA decay (NMD) (13–18). Several studies found that transcripts containing premature termination codons (PTC) were targeted for decay due to the presence of a downstream exon-exon junction (13–18). The EJC was discovered to be the marker of exon junctions that aids in triggering the decay of PTC-containing transcripts by communicating with the terminated ribosome (13–18).

Figure 1.2 Lifecycle of the Exon Junction Complex

EJC consisting of eIF4AIII, RBM8A (Y14) and MAGOH is deposited during splicing in the nucleus (i) after which it associates with several proteins such as Upf3b, TREX, SKAR and ASAP complexes (ii) of which some aid in mRNA export to the cytoplasm (iii). In the cytoplasm more proteins associate with the EJC such as Upf2 and MLN51 (iv). In the cytoplasm as the mRNA is translated by the ribosome, PYM interacts with EJC and disassembles the complex (v). Magoh and Y14 (Rbm8a) are then re-imported into the nucleus by Imp13 (vi). Figure from Woodward et al 2016 (Figure has been approved for use by Wiley and Sons.)

1.3 EJC-dependent Nonsense Mediated mRNA Decay

One key function of the EJC in vertebrates is its ability to trigger NMD when present downstream of a terminating ribosome. Initial models suggested that when a ribosome terminates translation ≥ 50 nts upstream of an exon-exon junction, the EJC that remains bound in the 3′-untranslated region (UTR), through its interactions with peripheral proteins UPF3 and UPF2, activates the central NMD factor UPF1 to induce NMD (Fig 1.3) (19,20).
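The distance criterion of the initial model can be written as a short sketch. The function names, the coordinate convention (positions in nt along the spliced mRNA, with the stop codon given by its last nucleotide) and the example values are assumptions made for illustration only. For simplicity the sketch measures distance to the junction itself and ignores the roughly 24-nt upstream offset of the EJC.

```python
# Distance cutoff stated in the initial EJC-dependent NMD model (>= 50 nt).
NMD_THRESHOLD_NT = 50


def classify_junctions(stop_end, junctions, threshold=NMD_THRESHOLD_NT):
    """Label each exon-exon junction relative to the stop codon.

    stop_end:  mRNA position of the last nucleotide of the stop codon.
    junctions: mRNA positions of exon-exon junctions.
    Returns a dict mapping junction position to one of
    'upstream', 'proximal' (< threshold nt downstream) or
    'distal' (>= threshold nt downstream).
    """
    labels = {}
    for pos in junctions:
        distance = pos - stop_end
        if distance < 0:
            labels[pos] = "upstream"
        elif distance < threshold:
            labels[pos] = "proximal"
        else:
            labels[pos] = "distal"
    return labels


def predicted_nmd_target(stop_end, junctions, threshold=NMD_THRESHOLD_NT):
    """True if the initial model predicts NMD (any distal junction)."""
    labels = classify_junctions(stop_end, junctions, threshold)
    return "distal" in labels.values()


# Hypothetical transcript: stop codon ends at nt 300.
print(classify_junctions(300, [100, 320, 400]))
```

Under this rule a transcript whose only downstream junction is proximal is not predicted to be an NMD target, which is why upregulation of proximal 3′UI-containing transcripts upon loss of the EJC or Upf1 was unexpected.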

When the ribosome terminates, eukaryotic polypeptide release factors (eRF3 and eRF1), UPF1 and SMG1 are recruited to the site of termination on the mRNA, where they form the SURF complex (20) (Fig 1.3B). SMG1, which is activated by SMG8 and SMG9, in turn activates UPF1 by phosphorylation (Fig 1.3B). Phosphorylated UPF1 recruits SMG5, SMG6 and SMG7 via phospho-specific interactions (20) (Fig 1.3B). SMG5 and SMG7 expose the ends of mRNAs to exonucleases by recruiting DCP1a and POP2 (20) (Fig 1.3B). Meanwhile, SMG6 endonucleolytically cleaves the mRNA into two fragments, which can then be decayed by the exonucleases (20) (Fig 1.3B). The classical model of NMD, which states that NMD occurs only during the first round of translation, was based on the observation that mRNAs bound to the nuclear cap-binding complex component CBP80 are sensitive to UPF1-mediated decay (21). However, it was later shown that eIF4E-bound mRNAs, which are undergoing more than one round of translation, are also targeted by NMD (22,23). Thus, the current unified model of NMD states that NMD can be triggered during any round of translation by an aberrant translation termination (20,24). The activation of NMD at a specific termination codon likely depends on the availability of termination-stimulating factors in the microenvironment; a lack of termination factors increases the probability of triggering NMD. Despite the wealth of biochemical data on the structure, functions and interactions of NMD factors, the mechanism and criteria for selection of an mRNA for decay are not well understood (20,24).

NMD suppresses expression of aberrant transcripts bearing premature termination codons, and also regulates expression of normal transcripts that contain NMD-inducing features such as upstream open reading frames (uORF) and 3′UTR introns (3′UIs) (Fig 1.4) (25,26). Based on the current unified model, in the context of normal NMD targets, one would expect that most ribosomes experience a termination-friendly environment and terminate properly before NMD is triggered. Transcripts with NMD-inducing features are thought to be regulated by NMD to alter protein expression at specific developmental stages or within localized areas of the cytoplasm. An example of the latter is ARC mRNA, which is encoded by a 3′UI-containing gene and is localized to neuronal synapses (27). A full repertoire of transcripts with 3′UTR introns has also been shown to be expressed at different levels in human cell types (28). Despite some research on NMD targets, several physiologically relevant normal NMD targets and their cell-type specific functions are yet to be discovered. The discovery of normal NMD targets that influence tissue development would be critical for understanding the pathology of the human diseases associated with EJC proteins and NMD factors, which often manifest as neonatal developmental defects (Section 1.4 and 1.5).


Figure 1.3 Schematic illustrating the events occurring in a normal translation termination and aberrant translation termination event that triggers NMD. A. When the A site of a ribosome (red and peach) encounters a stop codon, it signals translation termination, which is recognized by eRF1 (blue) and eRF3 (green). A conformational change in eRF1 leads to the release of the tRNA and nascent peptide, while ABCE1 (purple) facilitates the release of the ribosomal subunits. B. When translation termination occurs prematurely, the ribosome interacts with the downstream EJC via UPF1, UPF2 and UPF3 (all three in sky blue). UPF1 is activated by SMG1 (yellow), which is itself activated by SMG8 and SMG9 (both in gray). The activated UPF1 recruits SMG5 (pink), SMG6 (orange) and SMG7 (pink) as well as other decay machinery to the mRNA. In the lower part of the panel, SMG1, UPF2 and UPF1 have been depicted in gray. Figure from Karousis et al 2019. (This figure has been approved for use by CSH press.)


Figure 1.4 Types of NMD targets. Schematic showing the types of NMD targets, where mRNA (gray) is bound by an exon junction complex (green) and ribosome (brown). NMD targets are mainly divided into two types: aberrant and natural or normal. Aberrant NMD targets are created as a result of splicing or genetic mutation, which results in the presence of a premature termination codon. Natural/normal NMD targets encode the full-length protein; however, the transcript contains NMD-inducing features such as a 3′UTR intron or uORF, which are capable of triggering decay under certain translation termination environments.

1.4 Role of core EJC proteins, eIF4AIII, RBM8A and MAGOH in development and disease

Since the multi-functional EJC marks all spliced mRNAs, it is not surprising that EJC components are essential for normal organismal development, function and survival. Identification of EJC-associated human genetic disorders along with work from several model organisms has begun to illuminate the importance of EJC proteins during animal and plant development (Table 1.1). Long before RBM8A and MAGOH were described as EJC components, their Drosophila homologs, Tsunagi and Mago nashi, respectively, were identified for their role in oocyte development (29–31). Drosophila is, so far, the only example where developmental defects caused by loss of EJC proteins have been mechanistically linked to their molecular roles. EJC-mediated localization of oskar mRNA to the posterior pole in Drosophila oocytes is a critical event for setting up the anterior–posterior axis of the oocyte and the embryo. At the posterior pole, oskar mRNA translation leads to assembly of polar granules that contain determinants for abdominal patterning and germ cell specification (32). Consistently, loss of tsunagi or mago nashi leads to loss of germ cell specification causing dorsoventral patterning defects, which results in double abdomen embryos (29–31,33). Now multiple human genetic disorders linked with mutations in genes encoding EJC core and associated proteins have been identified. Remarkably, in all these disorders, changes in EJC protein levels manifest as neurodevelopmental defects and reduced cognitive functions. A syndromic disorder [Richieri-Costa–Pereira (RCP) syndrome] resulting from either a 5′UTR repeat expansion or a missense mutation in the gene encoding EIF4A3 leads to learning and language disabilities in more than 50% of patients (34–36). Microdeletions of the chromosome 1 region q21.1 harboring the RBM8A gene are associated with brain size abnormalities and autism spectrum disorders (37–39). Such neurological defects are also seen upon copy number variations (both increase and decrease) in genes encoding RBM8A, EIF4A3, and RNPS1 (40). Intellectual disabilities are also observed, albeit at a lower frequency, in patients with thrombocytopenia with absent radii (TAR) syndrome caused by co-inheritance of a microdeletion and a noncoding SNP in RBM8A (41,42). Clearly, EJC function appears to be especially critical for neurodevelopment, learning, and memory. Upon loss of EJC core proteins, characteristic defects in the musculoskeletal system are also seen in TAR syndrome patients (characterized by limb defects, absence of the radius bone, and reduced counts of megakaryocytes, the platelet precursor cells (42)) and in RCP syndrome patients (characterized by severe limb and craniofacial abnormalities (36)).

Recent work from mouse models with reduced EJC protein levels has recapitulated some of the human defects. Conditional Rbm8a haploinsufficient mice show impaired neurodevelopment due to increased cell cycle exit of neural precursor cells (radial glial cells), their premature differentiation into neurons, and p53-dependent apoptosis of excess neurons. These defects result in a thinner cortex and eventually smaller brains (43,44). Consistent with the interdependent functions of Rbm8a and Magoh, mice heterozygous for a Magoh null allele phenocopy conditional Rbm8a mutant mice (45). The mouse Magoh mutant also exhibits hypopigmentation due to defects in melanoblasts, precursors of pigment cells (45,46). Xenopus embryos depleted of eIF4AIII show a similar pigmentation defect along with a striking loss of muscle contraction (47,48), again highlighting the importance of the EJC for the musculature system. A theme that emerges from the human and mouse studies is that progenitor cells such as megakaryocytes, radial glial cells, neurons and melanoblasts are more susceptible to the loss of EJC core proteins (42,43,45,46). Studies on mouse Rbm8a and Magoh mutants show that the cell cycle of neural progenitors is compromised (45,46,49), which may result from the susceptibility of centrosomes and microtubules to EJC core protein depletion. Consistently, cultured mammalian cells depleted of RBM8A, MAGOH, or EIF4A3 show spindle pole number, microtubule organization, and spindle plane orientation defects (45,46). Microtubule organization defects are also notable in rbm8a and magoh mutant Drosophila oocytes (29–31,33,50).

Despite these advances, how the EJC regulates neural cell cycle and function remains unknown. While studies thus far highlight the critical role of the EJC during development, much remains to be learned about the EJC’s role in developmental gene expression programs. How each of the many functions of this multi-functional protein complex contributes to gene regulation during development also remains unknown. It is interesting to note that human and mouse neurological defects associated with loss of EJC function overlap with human defects observed upon loss of NMD factor function, because this suggests that dysregulation of EJC-dependent NMD may be the mechanism underlying the previously described human neurological disorders (Section 1.5).

Table 1.1 Summary of known EJC-associated defects

Gene/Protein | Organism | Mutation | Phenotype; defects
RBM8A/RBM8A (Y14) | Human | Null plus noncoding SNP | Thrombocytopenia-Absent-Radii (TAR) syndrome (low platelet count, heart defects, radius bone absent) (42)
RBM8A/RBM8A (Y14) | Human | 1q21.1 microdeletion; Copy Number Variation | Intellectual disability, schizophrenia, autism spectrum disorders (40)
RBM8A/RBM8A (Y14) | Mouse | Conditional knockout of Rbm8a in neural progenitors | Microcephaly (small brain size), neural progenitor proliferation and apoptosis of neurons (43,51)
RBM8A/RBM8A (Y14) | Mammalian cell culture | Variable levels of knockdown | Mitotic spindle formation, G2/M transition and mitotic division plane orientation (46)
RBM8A/RBM8A (Y14) | Drosophila melanogaster | Hypomorphic alleles, homozygous null (Tsunagi) | Microtubule organization, oocyte nucleus migration, oskar mRNA localization, dorsal-ventral patterning, oocyte fate restriction to a single cell (early oogenesis) (33,50)
RBM8A/RBM8A (Y14) | Planaria (S. mediterranea) | siRNA knockdown | Head regression, ventral curling and lysis, lack of blastema formation, lack of stem and progenitor cell maintenance (52)
EIF4A3/eIF4AIII | Human | Repeat expansion in the 5′UTR | Richieri-Costa-Pereira (RCP) syndrome (craniofacial anomalies, severe limb defects, intellectual disability) (36)
EIF4A3/eIF4AIII | Xenopus laevis | Knockdown | Melanophore development, cardiac looping, embryonic touch response, muscle fiber contraction (47,48)
EIF4A3/eIF4AIII | Arabidopsis thaliana | T-DNA insertion (AteIF4AIII) | Abiotic stress adaptation (53)
EIF4A3/eIF4AIII | Planaria (S. mediterranea) | siRNA knockdown | Head regression, ventral curling, lack of blastema formation, lack of stem and progenitor cell maintenance (52)
MAGOH/MAGOH | Mouse | Whole animal heterozygous for null allele | Microcephaly (small brain size), hypopigmentation, neural progenitor proliferation, prolonged mitosis of neural progenitor, apoptosis of neurons (45,46,49)
MAGOH/MAGOH | Mammalian cell culture | Variable levels of knockdown | Mitotic spindle formation, G2/M transition and division plane orientation (45)
MAGOH/MAGOH | Drosophila melanogaster | Hypomorphic alleles (reduced expression of protein) and homozygous null (Mago nashi) | Microtubule organization, oskar mRNA localization, dorsal-ventral patterning, germ cell specification (29–31,33,50)
MAGOH/MAGOH | Arabidopsis thaliana | Deletion leads to viable truncated transcript; Depletion (AtMago) | Premature pollen death and a reduced seed production; meristem organization, pollen formation, and seed development (54,55)
MAGOH/MAGOH | Planaria (S. mediterranea) | siRNA knockdown | Head regression, ventral curling, lack of blastema formation, lack of stem and progenitor cell maintenance (52)
EJC associated NMD factors: UPF3B, UPF2 | Humans | Copy Number Variation | Intellectual disability, schizophrenia, autism spectrum disorders (40)

1.5 Defects associated with Nonsense mediated mRNA Decay

Components of the NMD pathway have been linked to human disease and have also been shown to be essential for development. Mutations in UPF3B, UPF2 and SMG6 that cause reduced mRNA expression of these NMD factors have been linked to intellectual disability in humans (40,56–60). In mice, knockout of Smg1, Smg6, Upf1 or Upf2 has been shown to cause embryonic lethality, while Upf3b knockout leads to behavioral defects (61–65). Several studies have also shown that NMD is essential for proper neural development in mammals, as NMD components (UPF1, UPF2 and SMG1) localize to axonal growth cones and synapses to regulate local expression of key NMD targets such as ARC (27,66–68). In addition to neural development, NMD is critical for the functioning of the hematopoietic system. NMD clears erroneous PTC-containing isoforms of immunoglobulin and T-cell receptor transcripts, which are created by programmed genomic rearrangements during lymphocyte development (69–71). Conditional knockout of Upf2 in T-cells leads to an increase in PTC-containing transcripts and, in turn, to apoptosis (69).

To understand the function of NMD factors in development, studies have been conducted in human embryonic stem cell lines. These studies have shown that NMD factors are essential for proliferation and differentiation of stem cells. RNA-Seq experiments have shown that human embryonic stem cells downregulate NMD during differentiation (72,73). NMD regulates two key growth factors, TGFβ and BMP, and thereby influences differentiation of human embryonic stem cell lines (72,73). In mice, knockout of Smg6 in embryonic stem cells prevents cellular differentiation (65). Conditional knockout of Upf2 in mouse long-term hematopoietic stem cells of the bone marrow leads to extinction of hematopoietic stem and progenitor cell populations (69).

Despite several advances in uncovering NMD function in development and disease, much remains to be discovered about NMD target selection and the NMD targets that influence development of tissues. Understanding how NMD regulatory networks function in developmental and tissue-specific contexts also remains a major area of discovery. To specifically understand the roles of EJC-dependent NMD in development, I have used the zebrafish model system (Chapter 3). To support the results presented in Chapter 3, the subsequent sections of Chapter 1 provide a brief overview of zebrafish skeletal muscle (Section 1.6), of zebrafish motor neuron development (Section 1.7) and of known functions of the FOXO signaling pathway in development (Section 1.8).

1.6 Brief overview of zebrafish muscle development

Zebrafish embryonic development is divided into the following stages based on the time post fertilization: zygote, cleavage, blastula, gastrula, segmentation, pharyngula, hatching and larval stages (74). During segmentation, paired blocks of mesoderm, referred to as somites, are formed along the anterior-posterior axis of the developing embryo. These blocks of mesoderm give rise to several tissue types, including muscles (75). By 18 hours post fertilization (hpf), 18 somites have formed sequentially and grown in size in an anterior to posterior direction (74). Over the next six hours, somites continue to form at a rate of approximately two per hour to produce a total of 30 pairs (74). Among other tissues, somites give rise to axial skeletal muscle, which comprises slow-twitch muscle cells, muscle pioneer cells, fast-twitch muscle cells and medial fast fiber cells (75). By 24 hpf, chevron-shaped blocks of muscle called myotomes are formed (Fig 1.5). Most of the muscle fibers in a somite are multinucleated fast twitch fibers, but each block of muscle is surrounded by a layer of mononucleated slow twitch fibers (75) (Fig 1.6). The slow twitch fibers undergo spontaneous contractions starting at about 17 hpf and mediate the coiling movements of the tail in response to touch by about 21 hpf (76–80). The fast twitch fibers become functional a few hours later, when they aid the embryos in hatching out of the chorions and help the larvae in their characteristic darting movements (81). Thus, zebrafish muscle is formed from somites and is present in spatially discrete populations of slow and fast twitch fibers. As the skeletal muscle is formed, it influences the development of skeletal motor neurons (Section 1.7).


Figure 1.5 Illustration of 25 hpf zebrafish embryo.

Arrows point towards features such as the hatching gland (bottom arrow) which are used to stage the embryo. Figure adapted from Kimmel et al. 1995. (This figure has been approved for use by Elsevier.)

Figure 1.6 Organization of zebrafish muscle. Schematic illustrating a zebrafish tail cross section with fast muscle cells in yellow and slow muscle cells in green. Figure adapted from Jackson et al. 2013. (This figure has been approved for use by Elsevier.)

1.7 Overview of zebrafish axial motor neuron development

Motor neurons in zebrafish are of two main types, primary and secondary, which are formed in two distinct waves. The first wave occurs during gastrulation (beginning at 9-10 hpf) when primary motor neurons are born. The second wave occurs around 14-15 hpf when secondary motor neurons are born. There are three primary motor neurons per hemisegment of each myotome, which differ in the position of their soma in the hemisegment (Fig 1.7). Based on position, the three motor neurons are called RoP (Rostral primary), CaP (Caudal primary) and MiP (Middle primary) (Fig 1.7) (82–84). The primary motor neurons innervate the axial muscles, and CaP reaches the end of the ventral myotome by 24-26 hpf (82). The axon of CaP is the first to grow out and is followed by the axons of RoP and MiP (82). All three axons follow the same path until the horizontal myoseptum. At the choice point on the horizontal myoseptum, the CaP axon extends into the ventral somite, RoP retracts to the dorsal somite and MiP grows out laterally (85). Axonal outgrowth of the primary motor neurons is controlled by gene expression in the axonal growth cone and by cues from the myotomes (85). Despite advances in understanding axonal pathfinding, several questions remain unanswered about the changes in localized gene expression within the growth cone and the muscle that promote axonal outgrowth. In Chapter 3, I have identified a molecular pathway, consisting of EJC core components (Rbm8a and Magoh) and Foxo3b, that is critical for zebrafish motor neuron outgrowth. Foxo3b is part of the highly studied Foxo signaling pathway, which is known to shape cell fate in many ways (Section 1.8).


Figure 1.7 Primary motor neurons in zebrafish at 18 and 24 hpf. Figure from Feldner et al. 2007 (86). (This figure has been approved for use by Journal of Neuroscience; Copyright 2007 Society for Neuroscience.)

1.8 Role of FOXO signaling pathway in human and zebrafish

Developmental signaling pathways are essential to intra-cellular interactions and for organismal response to stimuli. Every signaling pathway ends in effector proteins that modulate gene expression in response to the signal. An important class of effectors are transcription factors such as the FOXO (Forkhead/winged helix box gene, group O) family of proteins. The FOXO family is a highly conserved group of transcription factors that integrate a multitude of extracellular cues to maintain cellular and tissue homeostasis. In mammals, there are four FOXO proteins (FOXO1, FOXO3, FOXO4 and FOXO6), which are important effectors of the insulin/PI3K/Akt signaling pathway (87). Evidence from zebrafish shows that expression of zebrafish orthologs FOXO4, FOXO6a and FOXO6b is also dependent on the PI3K signaling pathway (88). Most of the FOXO proteins are ubiquitously expressed, even though cell-type specific spatial and temporal differences are observed in humans (89). In zebrafish, foxo3b mRNA expression is seen as early as the 2-cell stage, but foxo4, foxo6a and foxo6b mRNA expression is not seen until 10-14 hpf. Expression of all the foxo transcripts described above starts out as ubiquitous, becomes primarily localized to the brain by 36 hpf and remains neural even at day 2 (88,90).

FOXOs have been shown to be important for the development of skeletal muscle cells, neural cells, hematopoietic cells and bone cells (91–93). Knockout of FOXO genes in model systems produces a range of phenotypes, suggesting that FOXOs function in a wide range of processes during development. Knockout of Foxo1 in mice leads to embryonic lethality due to vascular defects, while knockout of Foxo3a only causes age-dependent infertility in female mice and knockout of Foxo4 has no obvious phenotype (94). In zebrafish, knockout of foxo3b has been shown to increase the sensitivity of larvae to hypoxia and to increase expression of key antiviral response genes (95,96). Knockdown of zebrafish foxo3b and human FOXO3 in erythroid progenitors impaired their capacity to differentiate (97). Overexpression of foxo3a in zebrafish cardiac muscle led to disarray in myofibril organization (98).

In my work described in Chapter 3, I have focused on the post-transcriptional regulation of foxo3b. Zebrafish foxo3b is orthologous to human FOXO3, which has been shown to play roles in diverse processes such as cell cycle, apoptosis, autophagy and redox balance (92,99). Zebrafish foxo3b is expressed ubiquitously early in development (90). By 24 hpf, its expression is restricted to the developing eye, the hindbrain and the posterior mesoderm (90). After 24 hpf, foxo3b expression is primarily neural (90). Studies in zebrafish have shown that foxo3b binds β-catenin to repress the canonical Wnt signaling pathway and aids in hypoxia signaling by activating vhl, which encodes a member of an E3 ubiquitin ligase complex (90,100).

Studies in model systems such as mice, zebrafish and human cells have brought to light several target genes of the FOXO signaling pathway; however, the impact of FOXO-mediated gene regulation on tissue development is not fully understood. Studies in mammalian cell culture have illustrated some examples of transcriptional and post-transcriptional regulation of FOXO expression. In human cell culture, transcription of FOXO1 and FOXO4 genes is promoted by the presence of FOXO3, and possibly by other FOXO factors, in a positive feedback loop (101). Additionally, expression of FOXO1, FOXO3 and FOXO4 is repressed upon exposure to growth factors such as PDGF, FGF and IGF-1 (101). In vitro studies and luciferase assays in human cell culture have shown that expression of FOXO3 mRNA is regulated by several miRNAs, by presence of FOXO3 mRNA and by presence of FOXO3 circular RNA (102). Despite some advances, the mechanisms that regulate FOXO expression in cells in response to stimuli are not completely understood.

Chapter 2 Optimization of RNA-Seq library preparation method

This chapter contains published work. The publication reference is: P. Gangras, D. Dayeh, J. Mabin, K. Nakanishi and G. Singh (2018); Cloning and Identification of Recombinant Argonaute-Bound Small RNAs Using Next-Generation Sequencing, Argonaute Proteins, pp. 1-28. https://doi.org/10.1007/978-1-4939-7339-2_1

2.1 Summary

Biological samples often yield very low amounts of RNA, which is why identifying the sequence of RNA obtained from scarce samples remains a challenge. Next-Generation Sequencing (NGS) methods are a solution to quantifying and identifying RNAs obtained from low amounts of biological sample. However, even NGS approaches remain limited in quantitative detection when RNA amounts are very low. Further, most commercial and published methods are compatible with either small RNAs or long RNAs, but are not equally applicable to both. Therefore, a single method that yields quantitative, bias-free NGS libraries to identify small and long RNAs from low levels of input will be of wide interest.

Here, I introduce such a procedure, which is based on several modifications of two published protocols and allows robust, sensitive and reproducible cloning and sequencing of small amounts of RNAs of variable lengths. Following ligation of a DNA adapter to the RNA 3'-end, the key feature of this method is the use of the adapter for priming reverse transcription (RT), during which biotinylated deoxyribonucleotides are specifically incorporated into the extended complementary DNA (Fig 2.1). Such RT products were enriched on streptavidin beads, circularized while immobilized on beads and directly used for PCR amplification (Fig 2.1). I have tested the advantage of incorporating biotinylated dNTPs in the RT reaction to avoid amplification of insert-lacking or unextended adapters in the PCR (Fig 2.2). I have then shown that my improved method can be used to generate complex libraries from input samples as low as 0.01 pmol of RNA (Fig 2.3). I also collaborated with the Nakanishi lab and members of the Singh lab to implement our protocol for identifying the sequences of RNA bound to crystallized yeast Argonaute (AGO) protein (Fig 2.4).

2.2 Introduction

In the last decade, next-generation sequencing (NGS) has illuminated our understanding of gene regulation and cellular function. However, obtaining meaningful data from scarce biological input sample for an affordable price can be challenging. Further, kit-based RNA-Seq library preparation methods are amenable to sequencing of either short or long RNAs. I designed a custom RNA-Seq library preparation method from existing workflows to make conducting RNA-Seq efficient and cost-effective. I have combined key features of two existing workflows to further increase the specificity and sensitivity of RNA-Seq library preparation from as low as 0.01 pmol of RNA. The described workflow is primarily based on a previously published method that extensively optimized each individual enzymatic step in RNA-Seq library preparation (103). The specificity and sensitivity of the method was further improved by incorporating a feature from another kit-free RNA-Seq library preparation approach, wherein complementary DNAs (cDNAs) from adapter-ligated small RNAs are specifically enriched using biotinylated deoxyribonucleotides (dNTPs) (104). To test whether my modifications to the workflow achieved the desired results, I evaluated the improved workflow for RNA-Seq library preparation using a synthetic 28-nt RNA. After establishing the protocol, I collaborated with members of the Nakanishi lab and Singh lab to test the workflow for small RNA-Seq of RNAs bound to yeast Argonaute protein. This procedure can also be used on long RNAs if they are fragmented before NGS library preparation. For long RNAs, chemical methods of RNA fragmentation are preferable to enzymatic methods (9,105).

A schematic of the whole procedure is depicted in Figure 2.1. Isolated RNAs were first enzymatically treated to generate a 3'-hydroxyl group to allow ligation to a pre-adenylated DNA adapter. The adapter-ligated RNAs were used to perform reverse transcription (RT) using biotinylated dNTPs and one of the twelve RT primers to produce biotinylated cDNAs containing a specific 5- barcode (Appendix A). Another feature of the RT primers is the inclusion of five random nucleotides at positions that will be the first five nucleotides sequenced (5X ‘N’ in the primer sequences in Appendix A). This random sequence ensures sequence complexity during the first few sequencing cycles, and also allows for bioinformatic removal of sequencing reads that are PCR artefacts caused by over-duplication of sequences favored in the PCR reaction. Following RT, the reaction was run on a gel, and the desired product was extracted and enriched via streptavidin pulldown. The purified RT product was then circularized by ligating its 3'-end to its 5'-end with CircLigase, and the circular molecule served as a template for PCRs. I have also described the utility of small-scale PCRs to determine the appropriate number of cycles for production of the highest concentration of libraries without accumulation of nonspecific PCR products. After determining the appropriate number of cycles, large-scale PCR was conducted under the determined conditions to specifically amplify the cDNA libraries for NGS. The purified PCR product then underwent a series of quantification and quality control steps before submission for NGS. After testing serially diluted synthetic RNAs as input, I have determined that my protocol can successfully generate complex cDNA libraries with as low as 0.01 pmoles of RNA.
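The use of the five random nucleotides to recognize PCR duplicates can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions and not the analysis pipeline used in this work: it assumes that each read begins with the five random nucleotides, and that reads sharing both the random 5-mer and the remaining sequence arose from the same cDNA molecule. The function name and the example reads are hypothetical.

```python
from collections import defaultdict


def count_molecules(reads, random_len=5):
    """Count distinct random 5-mers observed for each remaining read sequence.

    Reads with the same remaining sequence AND the same random 5-mer are
    treated as PCR duplicates and counted once (assumption of this sketch).
    """
    random_mers_per_sequence = defaultdict(set)
    for read in reads:
        random_mer, rest = read[:random_len], read[random_len:]
        random_mers_per_sequence[rest].add(random_mer)
    return {seq: len(mers) for seq, mers in random_mers_per_sequence.items()}


# Hypothetical reads: the first two are identical (a PCR duplicate pair),
# the third carries the same sequence but a different random 5-mer.
example_reads = [
    "ACGTATTGCAGGCT",
    "ACGTATTGCAGGCT",
    "GGCATTTGCAGGCT",
]
molecule_counts = count_molecules(example_reads)
print(molecule_counts)
```

In this example the three reads collapse to two independent molecules of the same sequence, because only two distinct random 5-mers were observed.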

2.3 Materials and Methods

2.3.1 RNA end curing and quantification

There are two important considerations for the RNAs that will be input into the method described here. First, this method requires a hydroxyl group at the RNA 3'-ends. Most small RNAs have a 3'-hydroxyl group and hence can be directly input into the ligation reaction described below. RNAs that I used for the NGS library preparation were synthetic 28-nt RNAs or were co-purified with a fragment of Kluyveromyces polysporus Argonaute1 (KpAGO). The recombinant protein was expressed and purified from E. coli cells by Daniel Dayeh as reported previously (106). Consistent with previous crystal structures of eukaryotic AGOs, these RNAs possess a 5'-monophosphate (106–110). However, the exact nature of these RNA 3'-ends (3'-hydroxyl or 3'-phosphate) is not known. Therefore, I first conducted a calf intestinal phosphatase (CIP)-mediated dephosphorylation step to ensure that all RNAs end in a 3'-hydroxyl. Second, to quantify input RNA, I used a radioactive method that compares the number of 5'-ends in the RNA sample to a small synthetic RNA oligo of known size and concentration. This is a very sensitive approach that is particularly beneficial when input RNA amounts are low (nanomolar range) and cannot be reliably quantified using spectroscopic methods. As small RNAs often have a 5'-monophosphate, the CIP treatment described here also converts the 5'-phosphate to a 5'-hydroxyl group, which can be subsequently labeled with a radioactive phosphate using T4 polynucleotide kinase (PNK).

2.3.2 RNA end curing

The following were added to an Eppendorf tube: 2 μl 10X CutSmart® buffer (NEB), x μl RNA, 1 μl CIP enzyme (NEB) and RNase-free water to 20 μl. The reaction was incubated at 37°C for 30 min. The RNA was then extracted by adding PCIA (phenol:chloroform:isoamyl alcohol, pH 4.5, ThermoFisher Scientific) and precipitated with ethanol.

2.3.3 RNA quantification

To quantify the RNA, I end labelled it with γ32P-ATP by adding 1 μl 10X PNK buffer (NEB), 1/10th volume of the total precipitated RNA, 0.5 μl 1 mM ATP, 20 μCi γ32P-ATP (Perkin Elmer) and 0.5 μl T4 PNK (NEB) to a tube; the reaction was made up to 10 μl with RNase-free water. Parallel T4 PNK reactions were set up to label 0.1 pmole of a synthetic RNA oligo and 1 μl of low molecular weight ssDNA ladder. All the reactions were incubated at 37°C for 30 min. Then 10 μl of 2X Denaturing Load Buffer was added to each reaction, and the reactions were heated at 65°C for 2-5 min and run on a 20% Urea-PAGE gel. The gel was lifted onto 3 M Whatman filter paper, covered with saran wrap and dried for 1 hour at 80°C in a gel dryer. The gel was exposed to a phosphorimager screen overnight to visualize 32P-labeled RNA. The signal volume of the input RNA was quantified using the RNA oligo as the standard. If needed, the concentration of the remaining input RNA was adjusted such that 3'-ends were between 0.015-0.5 pmoles/μl (μM). This yielded ~0.05-2 pmol of 3'-ends in 3.8 μl, which was optimal for an efficient ligation reaction in the next step and eventually yielded a complex library.
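The arithmetic behind this quantification can be written out as a short Python sketch. It is only an illustration under stated assumptions: it assumes that the phosphorimager signal scales linearly with the number of labeled 5'-ends and that the labeled aliquot is exactly one tenth of the precipitated RNA. The function names and the example signal values are hypothetical and are not measurements from this work.

```python
def total_pmol_from_signal(sample_signal, standard_signal,
                           standard_pmol=0.1, fraction_labeled=0.1):
    """Estimate total pmol of RNA ends in the precipitated sample.

    Assumes signal is proportional to labeled 5'-ends and that
    `fraction_labeled` of the sample (1/10th here) went into the PNK reaction.
    """
    labeled_pmol = standard_pmol * sample_signal / standard_signal
    return labeled_pmol / fraction_labeled


def pmol_in_ligation(conc_pmol_per_ul, rna_volume_ul=3.8):
    """pmol of 3'-ends carried into the ligation in the 3.8 ul RNA volume."""
    return conc_pmol_per_ul * rna_volume_ul


# Concentration window stated in the text: 0.015 to 0.5 pmol/ul.
low_pmol = pmol_in_ligation(0.015)
high_pmol = pmol_in_ligation(0.5)
print(round(low_pmol, 3), round(high_pmol, 3))

# Hypothetical example: sample band 2.5-fold brighter than the 0.1 pmol standard.
print(round(total_pmol_from_signal(2.5e6, 1.0e6), 3))
```

The two window endpoints evaluate to roughly 0.057 and 1.9 pmol, consistent with the ~0.05-2 pmol range given above.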

2.3.4 Ligation

For ligation of the adapters to input RNA, 1.0 μl of 7 μM pre-adenylated adapter (mirCAT-33® Conversion Kit from IDT) was added to 3.8 μl of RNA and the reaction was made up to 4.8 μl with RNase-free water. The reaction was incubated in a thermocycler at 65°C for 10 minutes, rapidly cooled to 16°C and then held at 16°C for 5 minutes. Then the other components of the ligation reaction were added to the tube: 1.5 μl 10X Ligation Buffer (10X T4RNL2 Tr. K227Q buffer; NEB), 7.5 μl 50% PEG8000 (NEB), 0.75 μl 20 mM DTT and 0.45 μl T4RNL2 Tr. K227Q (NEB). The ligation reaction was incubated in a thermocycler at 30°C for 6 hours followed by heat inactivation of the enzyme at 65°C for 20 minutes.
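As a bookkeeping check, the ligation volumes listed above can be summed and converted to final concentrations with a few lines of Python. This is a minimal sketch that uses only the volumes and stock concentrations stated in the text; the dictionary keys and the helper function name are illustrative.

```python
# Volumes (in microliters) of each ligation component as listed in the text.
ligation_volumes_ul = {
    "pre-adenylated adapter (7 uM)": 1.0,
    "RNA": 3.8,
    "10X ligation buffer": 1.5,
    "50% PEG8000": 7.5,
    "20 mM DTT": 0.75,
    "T4RNL2 Tr. K227Q": 0.45,
}

total_ul = sum(ligation_volumes_ul.values())


def final_concentration(stock, volume_ul, total_volume_ul):
    """Dilution formula: final = stock * (volume added / total volume)."""
    return stock * volume_ul / total_volume_ul


buffer_x = final_concentration(10, 1.5, total_ul)      # in X
peg_percent = final_concentration(50, 7.5, total_ul)   # in percent
dtt_mM = final_concentration(20, 0.75, total_ul)       # in mM
adapter_pmol = 7 * 1.0                                 # uM x ul = pmol

print(round(total_ul, 2), round(buffer_x, 2), round(peg_percent, 1),
      round(dtt_mM, 2), adapter_pmol)
```

The components add up to the 15 μl ligation mix referred to in the reverse transcription step, with the buffer at 1X, PEG8000 at 25% and DTT at 1 mM, and with 7 pmol of adapter per reaction.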

2.3.5 Reverse Transcription (RT)

To the 15 μl ligation mix, 1.0 μl 10 μM TruSeq RT primer, 11.25 μl 4X dNTP mix and 6.8 μl of RNase-free water were added.

4X dNTP mix: dGTP – 0.25 mM; dTTP – 0.25 mM; dATP – 0.25 mM; dCTP – 0.1625 mM; biotin-dATP – 0.075 mM (Metkin Biotin-11-dATP); biotin-dCTP – 0.0875 mM (Trilink Biotin-16-AA-2’ dCTP)

The RT reaction was incubated at 65°C for 5 minutes and then held at 4°C for at least a minute. Then the rest of the RT reaction components were added: 9.0 μl 5X FS w/o MgCl2 Buffer, 2.25 μl 100 mM DTT and 1.2 μl SSIII (200U/μl) (Invitrogen). The RT reaction was incubated at 55°C for 30-45 minutes followed by heat inactivation at 70°C for 15 minutes.

2.3.6 Gel electrophoresis of RT product

45 μl of 2X denaturing urea load buffer was added to the RT reaction, which was then resolved on a 10% denaturing Urea-PAGE gel (10% acrylamide:bisacrylamide (stock: 40% (w/v)), 6 M urea, 0.5X TBE) with eight 1.7 cm-wide wells at 35 Watts. The gel was stained with SYBR® gold for 5 min using SYBR® gold staining solution and imaged on a Typhoon scanner using SYBR® gold compatible excitation (520 nm) and emission (580 nm) filters. Gel fragments of the desired size (described in figures) were excised by placing the gel on a blue light transilluminator.

2.3.7 Gel elution and streptavidin pulldown

The gel fragments were cut into small pieces and added to 800 μl of DNA elution buffer and 10 μl of washed hydrophilic streptavidin beads (NEB) for elution overnight at room temperature.

[The beads were washed 3X with 200 μl streptavidin bead wash buffer (0.5 M NaCl, 20 mM Tris-HCl pH 7.5, 1 mM EDTA) and then resuspended in 10 μl of streptavidin bead resuspension buffer (10 mM Tris-HCl pH 7.5, 0.1 mM EDTA, 0.3 M NaCl)].

After elution, streptavidin beads were isolated from the elution buffer. The gel slurry was transferred to a Spin-X column placed in a collection tube and spun at 10,000 x g for 3 min at room temperature. The first and second eluates were incubated with the streptavidin magnetic beads for 2-3 hours at room temperature. Then the magnetic beads were separated from the eluate and resuspended in 10 μl of RNase-free water.

2.3.8 Circularization

The bead slurry was transferred to a PCR tube and components of the circularization reaction were added: 10.0 μl RT Product (bead slurry), 2.0 μl CircLigase Reaction Buffer (Epicentre Biotechnologies), 1.0 μl 1 mM ATP, 1.0 μl 50 mM MnCl2, 4.0 μl 5 M Betaine (Sigma-Aldrich), 1.0 μl CircLigase and RNase-free water to 20 μl. The reaction was incubated at 60°C for 4 hours followed by heat inactivation at 80°C for 10 min in a thermocycler.

2.3.9 PCR Amplification

The number of PCR cycles required to generate enough cDNA for deep sequencing depends on the amount of the RT product and, in turn, on the amount of input RNA. Thus, the exact number of cycles needed for each sample was determined empirically. I performed small-scale test PCRs at three or four different cycle numbers to identify the appropriate number of PCR cycles for each sample. The number of cycles for the large-scale PCR was chosen as the minimum number of cycles required for library amplification at which the free (unused) primer pool is minimally depleted.
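The selection rule described above can be expressed as a small Python sketch. In this work the choice was made by inspecting the small-scale PCR gel, so the numeric thresholds, the band-intensity values and the function name below are all hypothetical assumptions introduced only to make the logic explicit: pick the lowest cycle number at which library product is detectable while the free primer band is still largely intact.

```python
def choose_cycle_number(band_signals, min_product=1.0, max_primer_depletion=0.2):
    """Return the lowest cycle number meeting both criteria, or None.

    band_signals maps cycle number -> (product_signal, free_primer_signal).
    Primer depletion is measured relative to the lowest cycle number tested.
    Both thresholds are illustrative assumptions, not values from the protocol.
    """
    cycle_numbers = sorted(band_signals)
    baseline_primer = band_signals[cycle_numbers[0]][1]
    for cycles in cycle_numbers:
        product, primer = band_signals[cycles]
        depletion = 1 - primer / baseline_primer
        if product >= min_product and depletion <= max_primer_depletion:
            return cycles
    return None


# Hypothetical densitometry values for a 5, 8 and 11 cycle test PCR.
test_gel = {5: (0.2, 100.0), 8: (1.5, 92.0), 11: (6.0, 60.0)}
chosen = choose_cycle_number(test_gel)
print(chosen)
```

With these made-up values the sketch selects 8 cycles: product is not yet detectable at 5 cycles, and by 11 cycles the free primer pool is substantially depleted.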

Small-scale test PCRs

For a small-scale PCR, 4 to 6 μl of the circularization reaction was added to 2.25 μl of 10 μM phosphorothioate PE1.0 and PE2.0 primers, 9.0 μl 5X Q5 polymerase buffer (NEB), 0.9 μl dNTPs and 0.45 μl Q5 polymerase (NEB), and the reaction was made up to 45 μl with RNase-free water. Then the reaction was split into 3 equal parts, and each part was subjected to a different number of PCR cycles (e.g. 5, 8 and 11 cycles) using the following amplification conditions:

98°C – 30 sec

98°C – 5 sec

65°C – 10 sec

72°C – 15 sec

72°C – 2 min

12°C – hold

(repeat the 98°C – 5 sec, 65°C – 10 sec and 72°C – 15 sec steps for the desired number of cycles)

The PCR reactions were mixed with 6X DNA loading dye and resolved on an 8% nondenaturing PAGE gel at 35 Watts. The gel was stained with SYBR® gold for 5 min in SYBR® gold staining solution and imaged on a Typhoon scanner, and the optimal number of cycles was determined (details in figures).

Large-scale PCRs

The PCR was repeated as two 45 μl reactions for the fixed number of PCR cycles determined from the small-scale gel. The large-scale PCR products were resolved on an 8% nondenaturing PAGE gel at constant voltage (150 V max). The gel was stained with SYBR® gold and imaged on a Typhoon scanner, and gel fragments containing PCR products of the expected size were excised. The gel fragments were crushed and incubated in DNA elution buffer overnight. The DNA was precipitated out of the elution buffer using ethanol and resuspended in 20 μl water.

2.3.10 DNA quantification and sample prep for next generation sequencing

To prepare the DNA sample for next generation sequencing, both the size and the amount of the PCR product had to be carefully quantified. NGS libraries usually result in low, sub-nanogram DNA yields and therefore require highly sensitive quantification methods. DNA size in a library is best determined via automated chip-based electrophoresis systems such as the Bioanalyzer (high sensitivity DNA chip) or TapeStation (DNA and RNA ScreenTape). I used the Bioanalyzer to quantify the library prepared from the KpAGO input sample; the trace and statistics are shown in Figure 2.5C. The DNA concentration of a library was accurately quantified using a Qubit fluorometer.

2.3.11 TOPO Cloning to validate NGS library

A small amount of PCR product was cloned into the T-tailed TOPO TA-cloning vector (TOPO® TA Cloning® kit for subcloning; Invitrogen) and transformed into E. coli. Using universal primers, inserts from a handful of bacterial colonies were PCR amplified and sequenced via Sanger sequencing with the M13 reverse primer to check the library for the presence of random E. coli sequences before deep sequencing (data not shown).

2.4 Results

My custom RNA-Seq library preparation protocol involves resolving the RT product of input RNA on a Urea-PAGE gel and then excising the extended RT product for elution in DNA elution buffer. The extended RT product is observed as a smear above the band representing the RT product obtained from excess adapter (red box in Fig 2.1). Despite clean excision of cDNA representing the input RNA sample, the eluate was always contaminated with insert-lacking or unextended adapter cDNA, which was visible after PCR amplification. This caveat in my protocol was particularly pronounced when making RNA-Seq libraries from shorter RNAs, where adapter cDNA accounted for some of the sequencing reads. I hypothesized that incorporating biotinylated dNTPs in the RT reaction could improve the yield of sample cDNA from the RT reaction. To test this hypothesis, 4 pmoles of a synthetic 28-nt RNA was used as input in the library preparation protocol. After ligation of the 3′ end adapter to the input RNA sample, the sample was split into four equal parts. Each part was used in an RT reaction that differed in the type of dNTPs: (A and B) normal dNTPs (10 mM each), (C) suboptimal dNTPs (0.25 mM each) and (D) biotinylated dNTPs (0.25 mM each with biotin-dATP and biotin-dCTP). The RT product was resolved on a 10% Urea-PAGE gel and all four RT reactions produced comparable yields (lighter band in lanes A-D in Fig 2.2). Products from three RT reactions (B, C and D) were excised from the Urea-PAGE gel, and cDNA was extracted by incubating the gel pieces in DNA elution buffer overnight with magnetic streptavidin beads. For sample D, the gel pieces were separated from the elution buffer and magnetic beads. Then the gel pieces of sample D were used to re-extract cDNA over 8 hours with fresh DNA elution buffer. All four eluates (B, C, D and D2) were ethanol precipitated to obtain cDNA. The bead-bound cDNA (Dbeads) sample and the precipitated cDNA samples (Bbeads+elution, Cbeads+elution, Delution1, Delution2) were then circularized. Equal parts of the circularized sample were used to test amplification using an increasing number of PCR cycles (for example, 5, 8 and 11 cycles). The small-scale PCR product was resolved on an 8% PAGE gel. The PAGE gel shows that 8 cycles of PCR are sufficient to generate a product of the expected size in sample A without the synthesis of high amounts of spurious DNA species (lanes 2, 5, 8, 11 and 14 in Fig 2.3). As expected, all samples that did not incorporate biotinylated dNTPs in the RT reaction generated a PCR product from the insert-lacking or unextended adapter (arrow in Fig 2.3). As hypothesized, sample Dbeads, which selects for only the insert-containing cDNAs, did not generate a PCR product from the unextended adapters (lanes 7-9, Fig 2.3). This result showed that the revised protocol is highly efficient at generating complex libraries that do not contain insert-lacking cDNAs.

After optimizing our protocol to increase efficiency, I proceeded to determine the lowest amount of input RNA that would be sufficient to make complex RNA-Seq libraries. For this, I performed the protocol described above with four different amounts of input synthetic 28-nt RNA: 0.01, 0.05, 0.25 and 1.25 pmol. Ligation, reverse transcription and circularization were performed in the same way and at the same time for these different RNA amounts. The small-scale PCR gel shows that the 0.01 pmol sample at 16 cycles (Fig 2.4, lane 3) is comparable to the 1.25 pmol sample at 12 cycles (Fig 2.4, lane 15). This observation showed that 0.01 pmol of short input RNA was sufficient for generating a complex library with enough cDNA for deep sequencing. Thus, our custom RNA-Seq library preparation method can be used for identification of RNAs via NGS from scarce RNA samples.

To test our method on a scarce biological sample, we utilized it for identifying small RNAs that co-purify with crystallized yeast KpAGO protein (Kluyveromyces polysporus Argonaute1). AGOs are loaded with small RNAs as guides to recognize target mRNAs. Daniel Dayeh isolated RNA from KpAGO crystals and then Justin Mabin used it as input for our RNA-Seq library preparation procedure. Justin was able to successfully generate cDNA from the KpAGO-bound RNAs and amplify it to make libraries, as shown in the small-scale (Fig 2.5A) and large-scale (Fig 2.5B) PCR gels. After large-scale PCR, Justin extracted the library cDNA from the gel and quantified it using a DNA Bioanalyzer (Fig 2.5C). We then sequenced the KpAGO samples. I analyzed the data and determined the nucleotide frequency at the first 10 positions of all reads using the Weblogo (111,112) software. I found that, as expected from previous AGO studies, a high percentage of the guide RNAs loaded onto KpAGO have a sequence that begins with U or A (Fig 2.5D). This result showed that my custom method was successful at amplifying miRNA sequences isolated from low amounts of starting materials and at obtaining meaningful sequencing data from the library.
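The per-position nucleotide frequencies that underlie a sequence logo can be illustrated with a minimal Python sketch. This is not the Weblogo software used for the analysis; the function name and the toy reads are hypothetical, and the only details taken from the text are that the first 10 positions were examined and that reads shorter than 10 nt were excluded (Fig 2.5D legend).

```python
from collections import Counter

def position_frequencies(reads, n_positions=10):
    """Fraction of A/C/G/U at each of the first n_positions of the reads.

    Reads shorter than n_positions are skipped. T is converted to U so that
    DNA-encoded sequencing reads are reported as RNA.
    """
    counts = [Counter() for _ in range(n_positions)]
    kept = 0
    for read in reads:
        read = read.upper().replace("T", "U")
        if len(read) < n_positions:
            continue
        kept += 1
        for i in range(n_positions):
            counts[i][read[i]] += 1
    if kept == 0:
        return []
    return [{base: counts[i][base] / kept for base in "ACGU"}
            for i in range(n_positions)]

# Toy input (hypothetical reads, not data from this study).
example_reads = ["UAGCUAGCUAGG", "AAGCUAGCUAGG", "UCCC", "TGGGGGGGGGGG"]
freqs = position_frequencies(example_reads)
```

With real data, a strong preference for U or A would appear as a high value for those bases in `freqs[0]`.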

2.5 Conclusion

In recent years RNA-Seq has emerged as an essential technique for studying gene expression and identifying novel RNA sequences in the context of specific tissues or RNA-binding proteins. Unlike most RNA-Seq library preparation kits, my custom method for generating RNA-Seq libraries is compatible with both short and long RNA fragments. The incorporation of biotinylated nucleotides in our protocol ensures the absence of insert-lacking adapter cDNAs in our library, thereby ensuring that RNA-Seq reads only represent sequences present in the biological sample. Further, my method can generate complex libraries from as little as 0.01 pmol of input RNA. I have been able to successfully generate libraries from RNAs bound to crystallized KpAGO and identify their sequences.

2.6 Figures

Figure 2.1 Schematic of RNA-seq library preparation procedure. 1: Dotted wavy line depicts input RNA with a 3' hydroxyl. 2: Pre-adenylated 3' adapter is shown as a thick black line ligated to the RNA 3'-end. 3: The 3' adapter complementary sequence in the RT primer is shown as a thick black line. In the RT primer, the sequence corresponding to the forward sequencing primer is represented by a grey tapered arrow, and the reverse primer is shown as a black tapered arrow. The hexaethylene glycol spacer SP18 that connects the forward and reverse primers is shown as a solid wavy line. At the 5' end of the forward primer are a random 5-mer, the 5-nt TruSeq barcode sequence (BC) and two Gs. The 5' phosphate on the terminal G is also shown. Following RT, the incorporated biotinylated dNTPs are represented by stars. 4: Dotted box marks the area of the gel that is excised for elution of the RT product. 5: The biotinylated RT product is shown captured on the streptavidin-conjugated magnetic beads (grey sphere) whereas unextended RT primer is shown to be selected away. 6: The circularized RT product is shown, with the G at the 5' terminus of the forward primer ligated to the 3' end of the extended cDNA. 7: Illumina primers PE 1.0 and PE 2.0 are shown as grey and black arrows, respectively, bound to their complementary sequences in the circularized RT product. Note that biotinylated nucleotides in the cDNA do not interfere with PCR amplification. 8, 9 and 10 describe the main steps after cDNA library preparation.


Figure 2.2 RT-PCR using cDNA with biotinylated dNTPs prevents amplification of the excess adapter. Urea-PAGE gel showing four different RT reactions, which differed in the concentration and type of dNTPs used.

A and B: normal dNTPs (10 mM each)

C: suboptimal dNTPs (0.25 mM each)

D: Biotinylated dNTPs (0.25 mM each with biotin-dATP and biotin-dCTP).


Figure 2.3 Product of the RT reaction using biotinylated dNTPs is comparable to reaction with normal dNTPs. PAGE gel showing small-scale PCR products from 5 different types of the RT reaction products. 1) Sample obtained from normal RT (10 mM normal dNTPs) with beads added to the PCR reaction as a control (lanes 1-3); 2) sample obtained from suboptimal RT (0.25 mM dNTPs) with beads added to the PCR reaction as a control (lanes 4-6); 3) sample obtained from the biotinylated RT (0.25 mM dNTPs with biotin-dATP and biotin-dCTP) that was bound to beads (lanes 7-9), precipitated from day 1 elution buffer (lanes 10-13) and precipitated from day 2 elution buffer (lanes 14-17). L: 25 bp ladder.


Figure 2.4 0.01 pmol of RNA as input is sufficient to generate complex libraries. PAGE gel showing small-scale PCR products generated using biotinylated cDNA obtained from 4 different amounts of input RNA: 0.01 pmol (lanes 1-4), 0.05 pmol (lanes 5-8), 0.25 pmol (lanes 9-12) and 1.25 pmol (lanes 13-16). L: 25 bp ladder.


Figure 2.5 PCR amplification and size estimation of NGS libraries made from RNA bound to KpAgo. A. SYBR® gold stained 8% nondenaturing PAGE-gel showing specific PCR products, over-amplification products, and unused primers (lanes 2-4; each species is indicated on the right) in small scale PCRs performed for increasing number of cycles indicated on top of each lane. The 25-bp DNA ladder is in lane 1.

B. SYBR® gold stained 8% nondenaturing PAGE showing the product band and unused primers (labeled on the right) from large scale PCR (lane 2). The 25-bp DNA ladder is in lane 1. The gel fragment excised to purify DNA for NGS is indicated by a red-dotted rectangle.

C. The Bioanalyzer report of the purified PCR product from B. On the left is a virtual gel image of the DNA ladder (lane 1) and cDNA library of KpAGO-bound RNAs (lane 2). On the right is the histogram showing the size of the cDNA library along with size markers (top) and a table with summary of size and amount quantification of the library (bottom). Note that while Bioanalyzer analysis also provides sample concentration (see table), fluorescent DNA binding dye based quantification provides more accurate quantification of DNA amount.

D. A web-logo showing the probability distribution (y-axis) of each of the four nucleotide bases at positions 1-10 (x-axis) of the KpAGO-bound RNAs. ~1.29 million reads ≥10 nt were used for this analysis.

*The extraction of RNA from KpAGO was conducted by Daniel Dayeh and the library preparation was done by Justin Mabin. The RNA-Seq data was analyzed by Pooja Gangras.

Chapter 3 Stop codon-proximal 3′UTR introns elicit EJC-dependent nonsense-mediated mRNA decay and regulate vertebrate development

This chapter contains work from a manuscript that is currently in revision. The manuscript reference is: P. Gangras, T. Gallagher, M. Parthun, R. Patton, K. Tietz, Z. Yi, N. Deans, R. Bundschuh, S. Amacher, G. Singh (2019); Stop codon proximal 3′UTR introns in vertebrates can elicit EJC-dependent nonsense-mediated mRNA decay. (Under review; Preprint on BioRxiv doi: https://doi.org/10.1101/677666) Individual contributions of the authors other than me have been specified in the figure legends.

3.1 Abstract

Many post-transcriptional mechanisms operate via mRNA 3′UTRs to regulate gene expression, and such controls are crucial for development. Here we show that the exon junction complex (EJC) is critical during zebrafish embryonic development to regulate mRNAs subjected to nonsense-mediated mRNA decay (NMD) due to translation termination ≥ 50 nts upstream of the last exon-exon junction. Surprisingly, we find that EJC-dependent NMD also regulates a new class of transcripts that contain 3′UTR introns (3′UI) < 50 nts downstream of a stop codon. Such proximal 3′UI-containing transcripts are also NMD-sensitive in cultured human cells and mouse embryonic stem cells. We identify 167 genes that contain a conserved proximal 3′UI in zebrafish, mouse and humans, and these genes are enriched in nervous system development and RNA binding functions. foxo3b is a proximal 3′UI-containing gene that is upregulated in zebrafish EJC mutant embryos, at both mRNA and protein levels, and loss of foxo3b function in EJC mutant embryos significantly rescues motor axon growth defects. These data are consistent with EJC-dependent NMD regulating foxo3b mRNA to control protein expression during zebrafish development. Our work identifies new rules by which 3′UIs induce NMD to control protein production in vertebrates.

3.2 Introduction

Post-transcriptional control of messenger RNA (mRNA) expression is critical to regulate location, amount, and duration of protein expression. To achieve optimal protein expression in , many regulatory signals reside in mRNA 3′-untranslated regions (3′UTRs) (113). Recognition of 3′UTR-embedded signals by RNA binding proteins and miRNAs alters the 3′UTR ribonucleoprotein (RNP) composition and regulates mRNA localization, translation, and stability (113–115). Nuclear RNA processing steps such as alternative polyadenylation can further impact 3′UTR RNP composition by altering 3′UTR length, and hence the repertoire of 3′UTR regulatory signals (116,117). Mechanisms dictating 3′UTR RNP composition are thus important for cellular function and organismal development (1,2,113).

Pre-mRNA splicing also greatly impacts RNP composition by imprinting several proteins including the exon junction complex (EJC) on spliced (16,17,118). The EJC is comprised of three core proteins, eIF4AIII, RBM8A (Y14), and MAGOH, which assemble ~24 nt upstream of exon-exon junctions and regulate many post-transcriptional steps including pre-mRNA splicing, mRNA export, localization, translation and nonsense-mediated mRNA decay (NMD) (6–8). As introns mainly occur in open reading frames and rarely in 3′UTRs, the EJC mainly decorates the translated portion of mRNAs (119), from where they are removed by the first translating ribosome (120,121).

However, if a ribosome terminates translation ≥ 50 nucleotides (nts) upstream of an exon-exon junction, one or more EJCs that remain on the mRNA are now located within the 3′UTR. Such EJCs that occur downstream of a terminated ribosome can engage components of the NMD pathway leading to activation of the central NMD factor UPF1 and rapid mRNA turnover (19,20). In this way, the EJC can induce destruction of aberrant transcripts bearing premature termination codons to suppress expression of truncated polypeptides. When combined with regulated alternative splicing, such EJC-induced NMD can also suppress expression of particular protein isoforms to regulate cellular homeostasis and developmental decisions (67,122,123). Normal mRNAs that contain features such as 3′UTR introns (3′UIs) can also acquire EJCs located in the 3′UTRs (and hence within the 3′UTR RNP) (20,25–28). In the case of such normal transcripts, the ribosome terminates at a normal stop codon after production of at least one full length polypeptide, but due to the presence of a downstream EJC, the transcript is targeted for decay. Thus, EJC-dependent NMD also acts as a mechanism to fine-tune protein expression as has been shown for ARC mRNA at neuronal synapses (27,28).

Interestingly, 3′UI-bearing transcripts are enriched for neuronal and hematopoietic functions (27), and are expressed in tissue-specific patterns (28), suggesting that 3′UIs may play an important role in regulating tissue-specific developmental programs via EJC-dependent NMD. However, the extent of gene regulation via 3′UI-dependent NMD remains largely unexplored, particularly during development.

The EJC core components RBM8A and MAGOH were first discovered in Drosophila for their role in germ cell specification and embryo patterning (29,31). More recent work showing that mutations in human EJC core protein-encoding genes cause defects in neural, musculoskeletal and hematopoietic development underscores the importance of the EJC during development (36,42). Developmental defects in neural cell types are also observed in Xenopus embryos and mouse models with reduced EJC core protein levels, suggesting conserved and essential EJC neural functions (43,44,47,48). Recent work in mouse models has illuminated an important role for EJC core components in neural precursor cell proliferation during brain development (45,51,124).

In mice that are conditionally haploinsufficient for any one of the three EJC core components, neural precursor cells exit the cell cycle early and prematurely differentiate, leading to excessive production of neurons, which then undergo p53-dependent apoptosis (43,44,49,124). These defects lead to impaired cortical development and microcephaly, a phenotype also associated with RBM8A and EIF4A3 mutations in humans (36,42). While these advances highlight the critical role of the EJC during neural development, much remains to be learned about EJC-regulated developmental gene expression programs and how each of the EJC's many functions contributes to developmental gene regulation.

In this work, we studied EJC developmental functions in zebrafish, a vertebrate model where embryonic tissue formation and morphogenesis are readily observable. We find that zebrafish Rbm8a and Magoh proteins are deposited on mRNAs as in other vertebrate models and have crucial functions in muscle and neural lineages. In rbm8a and magoh mutant embryos, EJC-dependent NMD is disrupted.

Strikingly, we uncover a class of genes that contain a stop codon-proximal 3′UI (intron within 50 nts of the stop codon) and are upregulated in zebrafish EJC mutant embryos and upf1 morphants. We show that loss of foxo3b, a proximal 3′UI-containing gene whose transcript and protein levels are elevated in EJC mutants, significantly rescues the EJC motor neuron outgrowth defect. These data are consistent with the idea that an EJC located in the foxo3b 3′UTR triggers NMD to regulate its protein output. Proximal 3′UI-containing genes are also widespread in human and mouse genomes, and are similarly regulated by human and mouse NMD pathways. I identify 167 genes that contain a 3′UI at a stop codon-proximal position in zebrafish, mouse and humans. These genes are enriched for genes encoding RNA-binding proteins and proteins involved in nervous system development. Overall, our work uncovers new rules of 3′UI-induced NMD, and highlights the critical role of this process in regulating developmental gene expression.

3.3 Materials and methods

3.3.1 Animal stocks, lines, and husbandry

Adult zebrafish (Danio rerio) were housed at 28.5°C on a 14 hour light/10 hour dark cycle and embryos were obtained by normal spawning or in vitro fertilization. Embryos were raised at both 25°C and 28.5°C and were staged according to Kimmel et al. 1995 (74). rbm8aoz36 and magohoz37 lines were generated using CRISPR/Cas9 mutagenesis (described below) in the AB strain. The foxo3bihb404 line (96) was obtained from the Xiao lab at the Chinese Academy of Sciences, Wuhan, China. Animal experiments were performed in accordance with institutional and national guidelines and regulations and were approved by the Ohio State University Animal Care and Use Committees.

3.3.2 CRISPR/Cas9 mutagenesis

An optimal CRISPR target site in the coding sequence of rbm8a and magoh was identified using the ZiFit Targeter software package (125,126). gRNAs were designed and synthesized as described (127). rbm8a- or magoh-targeting gRNA was co-injected with Cas9 mRNA (128) into 1-cell stage embryos (60 pg gRNA and 160 pg Cas9 mRNA).

rbm8a gRNA target site: 5′-GGGAGGCGAAGACTTTCCTA-3′

magoh gRNA target site: 5′-GGTACTATGTGGGGCATAA-3′

Injected embryos were raised to 24 hpf at which time embryos were individually screened by high-resolution melting analysis (HRMA) to assess target site mutation efficiency in somatic cells. Remaining embryos were raised and crossed to AB wild-type adults; F1 adults were screened for germline transmission of CRISPR-induced mutations using HRMA. HRMA revealed unique rbm8a and magoh mutant alleles transmitted by multiple F0 founders. We recovered the rbm8aoz35 and magohoz36 alleles and outcrossed the heterozygotes to the AB wild-type strain for two generations before intercrossing for phenotypic analyses.

3.3.3 EJC mutant and foxo3bihb404 mutant embryo and adult genotyping strategy

Individual embryos and adult fin tissue were lysed in 50 µl 1 M NaOH for 15 mins at 95°C followed by incubation on ice for 5 minutes at 4°C, and then neutralized with 5 µl of 1 M Tris-HCl pH 8. For genotyping fixed embryos, heads were removed into ThermoPol buffer (20 µl) and treated with 2 mg/ml ProK at 55°C for 3 hours to extract DNA. 1 µl of DNA extract was used as a template in a 20 µl PCR with Taq polymerase according to the manufacturer's protocol (NEB). For genotyping rbm8aoz35 and foxo3bihb404 mutant embryos, PCR products were digested with 20 units of XmnI and XcmI respectively (NEB) to distinguish cleavable mutant from un-cleavable wild-type amplicons. Digested products were analyzed on a 1% agarose gel stained with Gel Red (Biotium). For genotyping magohoz36 mutant embryos, PCR products were analyzed by separation of mutant and wild-type alleles on a 2% agarose gel stained with Gel Red (Biotium). Primers are listed in Appendix C.

3.3.4 Acridine orange staining and immunohistochemistry

Embryos were incubated in 1:5000 acridine orange solution for 1 hr at 28.5°C (stock: 6 mg/ml, Sigma-Aldrich) followed by 2X washes in fish system water. For immunohistochemistry, embryos were processed following standard protocols using 4% PFA fixation, permeabilization using acetone, and incubation in blocking solution for 1 hour. EJC mutant embryos and wild-type siblings at 24 hpf and 26 hpf were incubated in 2% BSA/2% goat serum/1% DMSO/0.1% Tween-20/PBS blocking solution with 1:100 anti-SV2 (DSHB) and 1:1000 anti-A4.1025 (DSHB) primary antibodies and Alexa Fluor (Molecular Probes) secondary antibodies. After primary and secondary staining, embryos were stained with Alexa Fluor 488-conjugated α-Bungarotoxin (Thermo Fisher) by incubation at a 1:200 dilution in blocking solution. All images were centered on the region above the end of the yolk tube, which included somites 12-16 at 24 hpf and somites 16-20 at 26 hpf.

3.3.5 Microscopy and Imaging

Immuno-stained embryos were dissected and mounted in Fluoromount-G (SouthernBiotech) and imaged at 40x magnification using MetaMorph software (Molecular Devices) on an Andor™ SpinningDisc Confocal Microscope (Oxford Instruments) with Nikon Neo camera. Live images of EJC mutant and wild-type sibling embryos were taken by mounting embryos in 3% methylcellulose and imaging with an AxioCam camera on a Zeiss upright AxioPlan2 microscope.

3.3.6 Zebrafish NMDI14 inhibitor treatment

NMDI14 (Sigma) stock solution was made in DMSO as per manufacturer's instructions. AB wild-type zebrafish embryos were dechorionated on agarose-coated 10 cm plates. At 3 hpf, NMDI14 was added to a final concentration of 4.8 μM. At 24 hpf, embryos (20/treatment) were rinsed with fresh fish water and added to 500 μl of Trizol (Thermo Fisher Scientific) for RNA preparation.

3.3.7 Immunoblot analysis

SDS-PAGE gels and western blots were performed using the standard mini-PROTEAN tetra system (Bio-Rad). All western blots were stained using infrared fluorophore-conjugated secondary antibodies and were scanned on a LI-COR Odyssey CLx imager. Protein quantification was performed using Image Studio software (v5.2.5).

3.3.8 Quantification of paralysis and motor axon length

At 24 hpf, EJC mutant and wild-type sibling embryo movements were scored under the dissecting microscope by counting the number of tail contractions per minute. For motor neuron axon quantification, 26 hpf EJC mutant and wild-type sibling embryos were stained with anti-SV2 and imaged as described above. Images were imported into Fiji (ImageJ v2) and motor axon length was quantified using the Simple Neurite Tracer plugin (129).

3.3.9 RNA-Immunoprecipitation-seq and RNA-Seq sample collection

At 24 hpf, zebrafish embryos (n = 800 embryos/IP) were triturated using a 200 µl pipette and washed to remove yolks as previously described (130), followed by flash freezing the tissue in liquid nitrogen. Whole embryo tissue was lysed in 800 μl of hypotonic lysis buffer (HLB) [20 mM Tris-HCl pH 7.5, 15 mM NaCl, 10 mM EDTA, 0.5 % NP-40, 0.1 % Triton X-100, 1 mM Aprotinin, 1 mM Leupeptin, 1 mM Pepstatin, 1 mM PMSF]. Lysates were sonicated using a microtip for 7 seconds, NaCl was increased to 150 mM, and RNase I was added to 100 μg/ml. Following a 5-minute incubation on ice, cell lysates were cleared by centrifugation at 15,000 × g. The sample was split into 2 tubes (400 µl each) and the volume was increased to 2 ml by addition of isotonic lysis buffer. Complexes were captured on Protein G Dynabeads (Thermo Fisher) conjugated to IgG or α-Rbm8a for 2 hours at 4 °C. Complexes were washed in isotonic wash buffer (IsoWB) [20 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.1 % NP-40] and eluted in 20 μl of clear sample buffer [100 mM Tris-HCl pH 6.8, 4 % SDS, 10 mM EDTA, 100 mM DTT]. 10 μl of the eluate was used to separate the proteins via SDS-PAGE for analysis by western blotting. The remaining 10 μl was used for RNA extraction by phenol-chloroform-isoamyl alcohol extraction and precipitation. RNA was resuspended in 10 μl of RNase-free water. 1 μl of the RNA was end-labeled with [γ-32P]ATP and then run on a denaturing 4% urea PAGE gel to assess quality while the remainder was used for RNA-Seq library preparation.

For RNA-Seq sample collection, EJC mutant and wild-type sibling embryos (n = 25) were harvested at 21 and/or 27 hpf and lysed in 500 µl Trizol (Thermo Fisher Scientific). RNA was extracted following manufacturer standard procedures.

3.3.10 RIP-seq and RNA-Seq library preparation

For RIP-Seq, RNA extracted from ~90 % of RIP eluate was used to generate strand-specific libraries. For RNA-Seq libraries, 5 µg of total cellular RNA was depleted of ribosomal RNA (RiboZero kit, Illumina), and subjected to base hydrolysis. RNA fragments were then used to generate strand-specific libraries using a custom library preparation method (131). Briefly, a pre-adenylated miR-Cat33 DNA adapter was ligated to RNA 3'-ends and used as a primer binding site for reverse-transcription (RT) using a special RT primer. This RT primer contains two sequences linked via a flexible PEG spacer. The DNA with a free 3'-end contains sequence complementary to a DNA adapter as well as Illumina PE 2.0 primers. The DNA with a free 5'-end contains Illumina PE 1.0 primer sequences followed by a random pentamer, a 5 nt barcode sequence, and ends in GG at the 5'-end. Following RT, the extended RT primer was gel purified, circularized using CircLigase (Illumina), and used for PCR amplification using Illumina PE 1.0 and PE 2.0 primers. All DNA libraries were quantified using an Agilent Bioanalyzer instrument to determine DNA length and a Qubit Fluorometer to quantify DNA amount. Libraries were sequenced on an Illumina HiSeq 2500 platform in the single-end format (50 nt read lengths).

3.3.11 Zebrafish EJC mutant embryo RIP-seq and RNA-Seq data analysis

Adapter trimming and PCR duplicate removal: After demultiplexing, fastq files containing unmapped reads were first trimmed using Cutadapt (v2.3). A 12 nt sequence at the read 5'-end consisting of a 5 nt random barcode sequence, a 5 nt identifying barcode, and a CC was removed. The random barcode sequence associated with each read was saved for identifying PCR duplicates at a later step. Next, as much of the 3'-adapter (miR-Cat33) sequence TGGAATTCTCGGGTGCCAAGG was removed from the 3'-end as possible. Any reads less than 20 nts in length after trimming were discarded.
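The trimming logic described above can be sketched in plain Python. This is an illustration only, not the Cutadapt command used for the analysis. The function name, the assumed order of the 12 nt prefix (random barcode, sample barcode, then two fixed bases) and the 3 nt minimum overlap for partial adapter matches are assumptions; the adapter sequence and the 20 nt length cutoff are taken from the text.

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # 3'-adapter sequence given in the text

def trim_read(seq, adapter=ADAPTER, min_len=20, min_overlap=3):
    """Return (random_barcode, sample_barcode, insert), or None if discarded.

    Assumed read layout (5' to 3'): 5 nt random barcode, 5 nt sample
    barcode, 2 fixed bases, insert, then full or partial 3'-adapter.
    """
    if len(seq) < 12:
        return None
    random_bc, sample_bc, insert = seq[:5], seq[5:10], seq[12:]

    # Remove a full adapter occurrence if present.
    cut = insert.find(adapter)
    if cut == -1:
        # Otherwise look for the longest adapter prefix at the read 3'-end.
        # min_overlap (an assumption) avoids trimming chance 1-2 nt matches.
        for k in range(min(len(adapter) - 1, len(insert)), min_overlap - 1, -1):
            if insert.endswith(adapter[:k]):
                cut = len(insert) - k
                break
    if cut != -1:
        insert = insert[:cut]

    # Reads shorter than 20 nt after trimming are discarded.
    if len(insert) < min_len:
        return None
    return random_bc, sample_bc, insert
```

The random barcode is returned alongside the insert so that it can be carried forward (for example, in the read name) for later PCR duplicate detection.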

Alignment and removal of multi-mapping reads:

RIP-Seq: Following trimming, reads were aligned with HISAT2 v2.1.0 (Kim et al., 2015) using 24 threads to zebrafish GRCz10. After alignment, reads with a HISAT2 mapping score less than 60 were removed, i.e. all multi-mapped reads were discarded. Finally, all reads mapping to identical regions were compared for their random barcode sequence; if the random sequences matched, such reads were inferred as PCR duplicates and only one such read was kept.
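The two filtering steps above (mapping-score filtering, then duplicate removal by random barcode) can be illustrated with a short Python sketch. The record layout (a dictionary with `chrom`, `start`, `end`, `strand`, `mapq` and `umi` keys) and the function name are hypothetical; only the score cutoff of 60 and the rule that reads at identical positions with identical random barcodes are collapsed to one come from the text.

```python
def filter_and_dedup(alignments, min_mapq=60):
    """Keep uniquely mapped reads and collapse inferred PCR duplicates.

    alignments: iterable of dicts with keys chrom, start, end, strand,
    mapq and umi (the random barcode saved during trimming).
    """
    seen = set()
    kept = []
    for aln in alignments:
        # Reads with a mapping score below the cutoff are removed.
        if aln["mapq"] < min_mapq:
            continue
        # Same position and same random barcode: inferred PCR duplicate.
        key = (aln["chrom"], aln["start"], aln["end"], aln["strand"], aln["umi"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(aln)
    return kept

# Toy records (hypothetical, not data from this study).
toy = [
    {"chrom": "1", "start": 100, "end": 140, "strand": "+", "mapq": 60, "umi": "ACGTA"},
    {"chrom": "1", "start": 100, "end": 140, "strand": "+", "mapq": 60, "umi": "ACGTA"},
    {"chrom": "1", "start": 100, "end": 140, "strand": "+", "mapq": 60, "umi": "TTTTT"},
    {"chrom": "2", "start": 500, "end": 540, "strand": "-", "mapq": 1, "umi": "GGGGG"},
]
unique_reads = filter_and_dedup(toy)
```

In the toy input, the second record is collapsed as a duplicate of the first, the third is kept because its random barcode differs, and the fourth is removed for its low mapping score.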

RNA-Seq: Trimmed libraries were aligned to the zebrafish genome (GRCz10 assembly) using TopHat2 (132) (v2.0.14 with default options: --read-mismatches 2, --read-gap-length 2, --read-edit-dist 2, --min-anchor-length 8, --splice-mismatches 0, --num-threads 2 (not default), --max-multihits 20). Read counting followed by differential expression analysis was conducted as described in the Love et al. 2018 RNA-Seq workflow.

RIP-Seq data downstream analyses: First, by comparing aligned reads to a GRCz10 exon annotation obtained through Ensembl BioMart we determined the 5' and 3' end distribution of RIP-Seq reads and the meta-exon distributions of RIP-Seq reads. The primary reference transcriptome was obtained from Ensembl BioMart. Transcripts with an APPRIS P1 annotation were selected and only one major transcript per gene was used for all analyses concerning the specificity of the RIP-Seq replicates. These analyses include calculation of RPKMs for the major APPRIS P1 isoform and comparison of intronic RPKMs to exonic RPKMs as well as intron-less transcript RPKMs to multi-exon transcript RPKMs.
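For reference, RPKM (reads per kilobase per million mapped reads) follows the standard definition shown in the sketch below. The function name and the example numbers are hypothetical, and the sketch does not reproduce the scripts used for the analysis.

```python
def rpkm(read_count, feature_length_nt, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads.

    RPKM = read_count * 1e9 / (feature_length_nt * total_mapped_reads)
    """
    if feature_length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_nt * total_mapped_reads)

# Hypothetical example: exonic versus intronic signal for one transcript
# in a library with one million mapped reads.
exonic = rpkm(read_count=100, feature_length_nt=2000, total_mapped_reads=1_000_000)
intronic = rpkm(read_count=10, feature_length_nt=8000, total_mapped_reads=1_000_000)
```

Because RPKM normalizes for feature length, the exonic and intronic values of the same transcript can be compared directly even though introns are usually much longer than exons.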

RNA-Seq differential expression analysis: Differential expression analysis using EJC mutant embryos and wild-type RNA-Seq data was conducted based on the RNA-Seq workflow published by Love et al. 2018 (133). For each RNA-Seq experiment consisting of an EJC mutant and its WT sibling at a given time-point, three biological replicates were sequenced per genotype.

First, to create count-matrices for each RNA-Seq experiment, the GenomicAlignments and SummarizedExperiment software (134,135) were used to count reads mapping per gene for each RNA-Seq bio-replicate. The count matrix was filtered to remove all genes with zero counts in all samples before differential expression (DE) analysis using DESeq2 (136). At least one, if not all of the biological replicates for each RNA-Seq experiment were sequenced during a separate deep-sequencing run. These differences in sequencing runs introduced some variability among the replicates. To account for the variability among our RNA-Seq bio-replicates during differential expression analysis we used the RUV-seq R package (137). Usual methods of normalization only account for sequencing depth but RUV-seq methods can be used to normalize libraries for library preparation and other technical effects. We used the RUVs method of the RUV-seq package which utilizes the centered counts (the counts of genes unaffected by our covariates of interest such as the sample genotype) to determine a normalization factor for each library. The count matrix was imported into DESeq2, and the RUVs normalization factors and genotype were used in the design formula to construct the DESeq dataset for gene-level differential expression analysis. We used the LRT test with all default DESeq2 settings to identify genes differentially expressed between mutant and wild-type samples. To correct for multiple testing in the DE analysis we used Benjamini-Hochberg (BH) adjustment with independentFiltering set to false. We decided to set independentFiltering to false because we are interested in studying NMD targets which are most likely to have low read counts in wild-type embryos. In the case of the rbm8a-/- 21 hpf and 27 hpf datasets the histogram of all p-values showed a hill-shaped distribution. In order to account for this distribution, as per the suggestion made in the RNA-Seq workflow (133), we used fdrtool (138) for multiple testing and determined adjusted p-values using default fdrtool settings.

3.3.12 Gene Ontology enrichment analysis

The PANTHER14.0 (139) tool was used to identify significantly enriched biological process GO terms in genes that are found to be significantly differentially expressed in EJC mutant embryos by DESeq2. The PANTHER tool was also used to identify significantly enriched biological process GO terms in proximal 3′UI genes in zebrafish and humans. For all analyses, PANTHER's Benjamini-Hochberg correction was used to calculate adjusted p-values.
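The Benjamini-Hochberg adjustment itself is a short calculation, sketched below in Python for readers unfamiliar with it. This is a generic implementation of the standard step-up procedure, not the code run inside PANTHER or DESeq2; the function name and example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Each p-value is multiplied by m / rank (rank 1 = smallest p-value),
    and monotonicity is enforced by taking a running minimum from the
    largest p-value downward.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four tested GO terms.
adj = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

Terms whose adjusted p-value falls below the chosen false discovery rate threshold would be reported as significantly enriched.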

3.3.13 Overlap analysis

The universal and test sets were chosen to be the set of Ensembl gene IDs that did not have NA in the adjusted p-value and gene symbol columns after DESeq2 analysis. After determining a universal dataset for each RNA-Seq dataset, the smallest universal set for the comparison in question was chosen. The significance of overlap was calculated using a hypergeometric test in the R statistical software.
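The overlap test can be illustrated with a minimal Python sketch of the upper-tail hypergeometric probability. The analysis itself was run in R; this version is only an illustration, and the function name, argument names and toy numbers are hypothetical.

```python
from math import comb

def hypergeom_overlap_pvalue(universe, set_a, set_b, overlap):
    """P(X >= overlap) when set_b genes are drawn from a universe that
    contains set_a 'marked' genes (upper tail of the hypergeometric
    distribution). All arguments are gene counts.
    """
    if not (0 <= set_a <= universe and 0 <= set_b <= universe):
        raise ValueError("set sizes must lie between 0 and the universe size")
    total = comb(universe, set_b)
    upper = min(set_a, set_b)
    tail = sum(
        comb(set_a, i) * comb(universe - set_a, set_b - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

# Toy example: universe of 10 genes, gene lists of size 4 and 3, 2 shared.
p = hypergeom_overlap_pvalue(universe=10, set_a=4, set_b=3, overlap=2)
```

A small p-value indicates that the two gene lists share more genes than expected if the lists were drawn independently from the same universal set.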

3.3.14 STRING network analysis

The STRING database (140) was used to identify connections between proteins encoded by proximal 3′UI genes with default high confidence settings (minimum required interaction score = 0.7). The clusters shown were created using the Markov Cluster Algorithm (MCL) with the inflation parameter set to 3.

Human and mouse RNA-Seq data analysis

SRA files were downloaded from sources specified in (65,72). Fastq files generated from the SRA files were mapped using TopHat2 (version 2.1.1) using the same settings described above for zebrafish alignment. Count matrices were generated using the GenomicAlignments and SummarizedExperiment packages. The count matrices were imported into DESeq2 for differential expression analysis using the LRT test, BH adjustment and with independentFiltering set to false.

3.3.15 Identification of uORF genes in zebrafish

We selected for uORFs which were categorized as “functional uORFs” in Johnstone et al. 2016 (141) based on RNA-Seq and ribosome profiling. The following filters were applied to select for uORFs that show evidence of translation at 24 hpf: 5’UTR RPF RPKM ≥ 5, RNA-Seq RPKM ≥ 5, RPF RPKM ≥ 5, ORF translation efficiency at 24 hpf > 1.
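The four thresholds listed above amount to a simple row filter, sketched here in Python. The dictionary keys are hypothetical column names chosen for illustration and do not correspond to the column headers of the Johnstone et al. dataset; only the threshold values come from the text.

```python
def passes_uorf_filter(row, min_rpkm=5.0, min_te=1.0):
    """True if a uORF record shows evidence of translation at 24 hpf.

    row keys (hypothetical names):
      utr5_rpf_rpkm  ribosome footprint RPKM in the 5'UTR
      rna_rpkm       RNA-Seq RPKM
      rpf_rpkm       ribosome footprint RPKM
      orf_te_24hpf   ORF translation efficiency at 24 hpf
    """
    return (
        row["utr5_rpf_rpkm"] >= min_rpkm
        and row["rna_rpkm"] >= min_rpkm
        and row["rpf_rpkm"] >= min_rpkm
        and row["orf_te_24hpf"] > min_te
    )

# Hypothetical records.
records = [
    {"gene": "geneA", "utr5_rpf_rpkm": 7.2, "rna_rpkm": 12.0, "rpf_rpkm": 9.5, "orf_te_24hpf": 1.4},
    {"gene": "geneB", "utr5_rpf_rpkm": 7.2, "rna_rpkm": 3.0, "rpf_rpkm": 9.5, "orf_te_24hpf": 1.4},
    {"gene": "geneC", "utr5_rpf_rpkm": 5.0, "rna_rpkm": 5.0, "rpf_rpkm": 5.0, "orf_te_24hpf": 1.0},
]
selected = [r["gene"] for r in records if passes_uorf_filter(r)]
```

Note that the RPKM cutoffs are inclusive (≥ 5) whereas the translation efficiency cutoff is strict (> 1), so the third record is excluded.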

3.3.16 Identification of 3′UTR intron containing genes in zebrafish, mouse, and human

A table describing exon starts, exon ends, CDS start, CDS end, strand and APPRIS annotation was downloaded from the Ensembl database for all transcripts in zebrafish (GRCz10), human (GRCh38) and mouse (GRCm38). All transcripts with any level of APPRIS annotation (142) were included. We then identified transcripts that contain introns in the 3′UTRs by subtracting exon start coordinates from the CDS end coordinates in a strand specific manner. We then determined the distance of the nearest 3′UTR intron to the stop codon; based on the distance (< or ≥ 50 nts) as well as the number of 3′UTR introns the transcripts were classified into proximal and distal categories. Proximal transcripts were defined by the presence of only one 3′UTR intron which is within 50 nts of the normal stop codon. Distal transcripts were defined by the presence of one 3′UTR intron which is more than 50 nts away from the stop codon or by the presence of more than one 3′UTR intron irrespective of the distance of the nearest 3′UTR intron to the stop. For all fold-change analyses, we defined distal/proximal 3′UI-containing genes as those that encode one or more distal/proximal 3′UI+ transcripts. The distal 3′UI-containing genes that did not have an APPRIS annotation but were annotated with an ‘NMD biotype’ in the Ensembl database were included as a separate group in the analyses included in Fig 3.7 and the group was named as NMD biotype.
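The classification rule described above can be sketched in Python as follows. This is a simplified illustration rather than the script used for the analysis. The function name, the 1-based inclusive exon coordinates, and the definition of `stop_pos` as the genomic position of the last base of the stop codon are assumptions made for the example; the 50 nt cutoff and the proximal/distal definitions follow the text.

```python
def classify_3ui(exons, stop_pos, strand, cutoff=50):
    """Classify a transcript as 'proximal', 'distal' or 'none'.

    exons:    list of (start, end) genomic coordinates, 1-based inclusive
    stop_pos: genomic position of the last base of the stop codon
              (assumed definition)
    strand:   '+' or '-'
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")

    # Order exons 5' to 3' along the transcript.
    ordered = sorted(exons, reverse=(strand == "-"))
    for idx, (start, end) in enumerate(ordered):
        if start <= stop_pos <= end:
            break
    else:
        raise ValueError("stop_pos does not fall inside any exon")

    # Every exon downstream of the stop-containing exon implies one 3'UTR intron.
    n_introns = len(ordered) - idx - 1
    if n_introns == 0:
        return "none"

    # Distance from the stop codon to the nearest downstream intron,
    # measured in a strand-specific manner.
    distance = (end - stop_pos) if strand == "+" else (stop_pos - start)

    # Proximal: exactly one 3'UTR intron, less than 50 nt from the stop.
    if n_introns == 1 and distance < cutoff:
        return "proximal"
    # Distal: one intron at 50 nt or more, or more than one 3'UTR intron.
    return "distal"
```

For example, a plus-strand transcript with exons (100, 200), (300, 400) and (500, 600) and a stop codon ending at position 380 has a single 3′UTR intron 20 nt downstream of the stop and would be classified as proximal.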

3.3.16 Mammalian cell culture knockdown experiments

HCT116 cells were seeded into 12 well plates (10^5 cells/well) in McCoy’s 5A media and 15 pmol siRNA was reverse transfected using 1.6 µl lipofectamine RNAiMAX reagent (Thermo Fisher Scientific) per well. Knockdown was carried out for 48 hours with a media change after 24 hours. Cells were harvested in hypotonic lysis buffer (described above); 30% of the cell lysate was saved to check the efficiency of knockdown while the rest was added to TRI Reagent (Sigma) for RNA extraction. The siRNAs used in this study are listed below:

Hs_RBM8A_5 FlexiTube siRNA, no modification, 20 nmole (SI03046533,

Qiagen)

All Stars Negative Control siRNA, no modification, 20 nmole (SI03650318,

Qiagen)

UPF1_1879: AAG AUG CAG UUC CGC UCC AUU

EIF4A3_187: CGA GCA AUC AAG CAG AUC AUU

3.3.17 Zebrafish upf1 knockdown experiment and RNA-Seq

For conducting upf1 knockdown in zebrafish, 2 ng of a splice-blocking morpholino (MO) diluted in 0.2 M KCl with 0.1% phenol red was injected into 1-cell stage embryos. The

upf1 MO used was previously published and named upf1 MO2 (143) with a sequence of

5’-TTTTGGGAGTTTATACCTGGTTGTC-3’. Morpholino was synthesized by Gene

Tools, LLC. Uninjected wild-type control embryos (n=30) and injected morphants

(n=30) were raised for 12 hours at 28.5°C and lysed in 500 µl Trizol following the manufacturer’s procedures (Thermo Fisher Scientific). After RNA extraction, double-stranded cDNA was synthesized following Illumina's TruSeq protocol per the manufacturer’s instructions. Briefly, mRNA was purified from 1 µg total RNA using

Dynabeads oligo(dT)25 magnetic beads (Thermo Fisher Scientific) followed by clean-up with AMPure XP SPRI beads (Beckman Coulter). mRNA was then fragmented for 5 minutes at 70°C using Ambion's RNA Fragmentation Reagent (AM8740) followed by an additional clean-up step using AMPure XP SPRI beads. Fragmented RNA was reverse transcribed using random primers and SuperScript III (Thermo Fisher Scientific). After second strand synthesis, end repair, and 3' end adenylation per the TruSeq protocol

(Illumina, Inc.), libraries were constructed using an Apollo 324 automated library system. After Illumina adapter ligation and amplification, all DNA libraries were quantified using an Agilent Bioanalyzer instrument to determine DNA length and a

Qubit Fluorometer to quantify DNA amount. Ten cycles of amplification were performed prior to sequencing on an Illumina HiSeq 2000 system in the paired-end format (100 nt read lengths). All experiments were performed in biological duplicate.

RNA-Seq data analysis: Trimmed reads were obtained from the sequencing core and then mapped to the GRCz10 genome assembly using TopHat2 as described above.

Differential gene expression analysis was also performed as described above using

DESeq2. RUV-Seq and fdrtool corrections were not required.

3.3.18 Quantitative RT-PCR

Zebrafish embryos and mammalian cells were harvested in Trizol. RNA was isolated using standard Trizol procedures, followed by DNase treatment, purification with

Phenol:Chloroform:Isoamyl alcohol (25:24:1, pH 4.5) and resuspension in RNase-free water. 1.5 μg of RNA was reverse transcribed using oligo-dT and SuperScript III

(Invitrogen). After reverse transcription of RNA, the samples were treated with RNase H

(Promega) for 30 min at 37°C. For each qPCR 30 ng of cDNA was mixed with 5 μl of

2X SYBR Green Master Mix (ABS), 0.2 μl of a 10 mM forward and reverse primer each

(defrosted once) in a 10 μl reaction. The qPCRs were performed in triplicate (technical).

Reference genes for relative quantification were mob4 in zebrafish (144) and TATA-binding protein (TBP) in human cells. Fold-change calculations were performed by the

ΔΔCt method. Fold-changes from three biological replicates were used to determine the standard error of the mean. The p-values were calculated using Welch's t-test in the R statistical computing software. Primers are listed in Appendix C.
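As an illustration of the ΔΔCt calculation, the following sketch averages technical replicate Ct values, normalizes the target to the reference gene, and converts the difference between conditions into a fold-change. It assumes the standard 2^(-ΔΔCt) form, which presumes roughly 100% amplification efficiency for both primer pairs. The function names and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def ddct_fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold-change of a target transcript (test vs control) by 2^(-ddCt).

    Each argument is a list of technical replicate Ct values for one
    biological sample.
    """
    d_ct_test = mean(ct_target_test) - mean(ct_ref_test)  # dCt in test sample
    d_ct_ctrl = mean(ct_target_ctrl) - mean(ct_ref_ctrl)  # dCt in control sample
    dd_ct = d_ct_test - d_ct_ctrl
    return 2 ** (-dd_ct)


def standard_error(values):
    """Standard error of the mean across biological replicates."""
    return stdev(values) / sqrt(len(values))


# Hypothetical Ct values: the target is detected 2 cycles earlier in the test sample.
fc = ddct_fold_change([24, 24, 24], [20, 20, 20], [26, 26, 26], [20, 20, 20])
```

Fold-changes computed this way for each of the three biological replicates can then be passed to `standard_error`.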

3.3.19 Quantification and statistical analysis

All western blots were performed using infrared fluorophore conjugated secondary antibodies and were scanned on a LI-COR Odyssey CLx imager. Protein quantification was performed using Image Studio software (v5.2.5). Northern blot autoradiograms were scanned using Fuji FLA imager and quantified using ImageQuant TL software (v7.0).

The mean and standard error of the mean of the observed signal were determined for data from at least three biological replicates.

Data availability

All short-read sequencing data are available at NCBI GEO database, accession number:

GSE135019.

3.4 Results

3.4.1 EJC composition and deposition is conserved in zebrafish

The three proteins, Eif4a3, Rbm8a, and Magoh, that form the EJC core are highly conserved among multicellular organisms including zebrafish and humans (Fig 3.2A). To test if the zebrafish EJC core proteins assemble into a complex similar to that observed in human (145–147) and Drosophila (148) cells, we immunoprecipitated Rbm8a from

RNase-treated zebrafish embryo extracts. I find that both Eif4a3 and Magoh, but not a negative control RNA-binding protein HuC, specifically co-immunoprecipitate with

Rbm8a (Fig 3.1A). We and others have previously shown that the EJC primarily binds

24 nts upstream of exon-exon junctions in cultured human cells and adult Drosophila

(9,149–152). To test if the EJC binds at a similar position on zebrafish spliced RNAs, we first optimized RNA-immunoprecipitation (RIP) from RNase-treated zebrafish embryo lysates using an Rbm8a antibody (Figs 3.2B-C). Using optimized conditions, we obtained three well-correlated Rbm8a RIP-Seq biological replicates of Rbm8a-associated

RNA fragments from zebrafish embryo lysates (Fig 3.2D, Appendix B). Rbm8a footprint read densities are significantly higher in exonic regions as compared to intronic regions

(Fig 3.1B), and on exons from multi-exon genes as compared to those from intron-less genes (Fig 3.1C). A meta-exon analysis shows that, like the human EJC, zebrafish

Rbm8a footprints cluster around the canonical EJC position 24 nts upstream of exon 3′ ends (Fig 3.1D). As expected, 5′ and 3′ ends of RIP-Seq reads accumulate upstream and downstream of the -24 nts position, respectively (Fig 3.1E). A dramatic reduction in 5′ and 3′ end read counts is seen in a ~10 nt region around the -24 position, revealing the

RNA segment that is protected from RNase digestion by Rbm8a-containing EJCs (Fig

3.1E). The predominant Rbm8a-occupancy position close to exonic 3′ ends is also evident from the RIP-Seq read distribution on individual exons (Fig 3.1F and Fig 3.2E).

Qualitatively, many canonical EJC sites from highly expressed multi-exon genes show variable Rbm8a binding (Fig 3.1F and Fig 3.2E). These profiles also show that zebrafish

Rbm8a associates with non-canonical positions away from the -24 position, as observed previously in human cells (Figs 3.1C, 3.1F and 3.2E). Thus, as in human cells, the EJC in zebrafish embryos is also detected at non-canonical positions. Taken together, I conclude that pre-mRNA splicing shapes zebrafish mRNP composition through EJC deposition at exon-exon junctions and beyond.
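The read-end analysis relative to exon 3′ ends described above can be sketched as follows. This is a simplified illustration, not the pipeline used here: it assumes plus-strand exons and reads given as 1-based inclusive (start, end) pairs, counts only reads that lie entirely within an exon, and uses hypothetical names throughout.

```python
from collections import Counter
from typing import List, Tuple


def read_end_profile(
    exons: List[Tuple[int, int]],
    reads: List[Tuple[int, int]],
    window: int = 50,
):
    """Tally read 5' and 3' end positions relative to exon 3' ends.

    Position 0 is the last exonic nucleotide; negative values are upstream.
    Returns two Counters: (five_prime_ends, three_prime_ends).
    """
    five_prime = Counter()
    three_prime = Counter()
    for exon_start, exon_end in exons:
        for read_start, read_end in reads:
            # Keep only reads fully contained in this exon (simplification).
            if exon_start <= read_start and read_end <= exon_end:
                rel5 = read_start - exon_end
                rel3 = read_end - exon_end
                if -window <= rel5 <= 0:
                    five_prime[rel5] += 1
                if -window <= rel3 <= 0:
                    three_prime[rel3] += 1
    return five_prime, three_prime


# Hypothetical footprint spanning the -24 position of a 100-nt exon.
five, three = read_end_profile([(1, 100)], [(66, 86)])
```

In such a profile, a footprint protecting the -24 position contributes a 5′ end upstream of -24 and a 3′ end downstream of it, which is the pattern summarized in Fig 3.1E.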

3.4.2 rbm8a and magoh mutant embryos show defects in motility, muscle organization, and motor axon outgrowth

To identify the molecular functions of the EJC during embryonic development, we generated zebrafish rbm8a and magoh mutant embryos. Using a CRISPR/Cas9-based approach (127), we created frame-shifting deletions early in the protein coding sequence to generate null alleles (Fig 3.3A). Fish heterozygous for rbm8aoz36 or magohoz37 alleles displayed no obvious phenotypes and were fully viable. Homozygous mutant rbm8a or magoh embryos (hereafter collectively referred to as EJC mutant embryos), obtained by intercrossing rbm8a or magoh heterozygotes, initially appear morphologically normal except for head necrosis and tail curvature prior to 24 hpf (hours post fertilization) (Fig

3.3B). A closer examination revealed that head necrosis is readily detected by acridine orange staining at 19 hpf (Fig 3.4A) and morphologically by 21 hpf (Fig 3.4B). After 24

hpf, EJC mutant embryos decline rapidly, with the decline in magoh mutant embryos appearing more advanced at each developmental time point examined (Figs 3.4B-C).

Both EJC mutant embryos have reduced head size, pericardial edema, and widespread necrosis by 32 hpf (Fig 3.4D) and die by 48 hpf.

We hypothesized that EJC mutant embryos are initially sustained by maternally-deposited rbm8a and magoh transcripts (153) and Rbm8a and Magoh protein, and that developmental defects appearing at 19-21 hpf coincide with maternal depletion.

Consistent with maternal deposition of EJC transcript and/or protein, we detect Rbm8a and Magoh protein in 2-4 cell stage embryos (0.75 hpf) (Figs 3.3C-D). Over time, both

Rbm8a and Magoh levels in EJC mutant embryos decrease to ~25% of their respective levels in wild-type siblings, and levels continue to drop over the next six hours (Figs

3.3C-D). As previously observed in mammalian cells (154), reduction of either protein of the Rbm8a:Magoh heterodimer leads to a concomitant depletion of the other protein

(Figs 3.3C-D).

Although EJC mutant embryos are morphologically indistinguishable from wild-type siblings at 18 hpf, we find that they are paralyzed (Fig 3.5A). It is unlikely that the lack of spontaneous contractions is due to developmental delay as EJC mutant embryos never become motile (data not shown). To further characterize the paralysis phenotype, we assessed muscle and motor neuron morphology, as these cell types are required for motility. Myosin heavy chain immunostaining reveals that EJC mutant embryos have disorganized myofibers and have U-shaped instead of chevron-shaped myotomes (Figs

3.5B-D), with muscle defects in magoh mutant embryos consistently more severe than in rbm8a mutant embryos. Co-labeling of motor axons (using anti-SV2) and neuromuscular junctions (using Alexa Fluor conjugated α-Bungarotoxin) shows that motor axon length and neuromuscular junction number are reduced in EJC mutant embryos (Figs 3.5E-H).

Thus, as expected of genes that encode proteins that function as a complex, homozygous rbm8a and magoh mutant embryos show phenotypically similar muscle organization and motor axon outgrowth defects.

3.4.3 Analysis of gene expression in rbm8a and magoh mutant embryos

To identify EJC-regulated genes during zebrafish embryonic development, we performed

RNA-Seq of EJC mutant embryos and their wild-type siblings at two developmental time points, 21 hpf and 27 hpf. We chose the 21 hpf time point because this is when EJC mutant embryos begin to show visible phenotypes and reduced Rbm8a and Magoh protein levels (Figs 3.3C and 3.3D) but do not yet display extensive cell death. We chose the 27 hpf time point because this is when EJC mutant embryos have reliably low Rbm8a and Magoh protein levels as well as motor axon defects. However, because magoh mutant embryos display extensive necrosis at 27 hpf (Fig 3.4B), we only focused on

RNA-Seq from the less necrotic rbm8a mutant embryos at this later time point. We generated three biological replicates of total RNA-Seq from each mutant (rbm8a mutant at 21 hpf and 27 hpf, and magoh mutant at 21 hpf) and their wild-type siblings (a mixture of wild-type and heterozygous embryos) (Appendix B). A differential gene expression analysis using DESeq2 identified gene-level expression changes in the two mutant embryos compared to their wild-type siblings. As expected, rbm8a and magoh transcripts are downregulated in the respective mutant embryos (Fig 3.6A-C). We compared genes that are significantly altered (fold-change > 1.5, false discovery rate (FDR) < 0.05)

among the different mutant embryos and time points. In 21 hpf rbm8a and magoh mutant embryos, a significant number of genes are commonly up- or down-regulated (Fig 3.6D,

103 upregulated and 29 downregulated). Similarly, a significant number of genes are commonly upregulated between rbm8a mutant embryos at 21 and 27 hpf (Fig 3.6E).
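The text does not state which test was used to assess the significance of these overlaps. One common choice is the upper tail of a hypergeometric distribution, sketched below as an assumption rather than as the method applied here. The function name and the toy gene sets are hypothetical, and the size of the gene universe (all genes detectable in both datasets) must be supplied by the analyst.

```python
from math import comb


def overlap_pvalue(n_universe: int, set_a: set, set_b: set) -> float:
    """P(overlap >= observed) under random sampling without replacement.

    n_universe : number of genes detectable in both datasets (assumed known)
    set_a, set_b : sets of significantly changed gene identifiers
    """
    n_a, n_b = len(set_a), len(set_b)
    observed = len(set_a & set_b)
    total = comb(n_universe, n_b)
    # Sum hypergeometric probabilities for every overlap size >= observed.
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(observed, min(n_a, n_b) + 1)
    )
    return tail / total


# Toy example: two 5-gene sets drawn from a 10-gene universe that overlap completely.
genes = {"g1", "g2", "g3", "g4", "g5"}
p_full = overlap_pvalue(10, genes, genes)
```

A small p-value indicates that the two gene sets share more members than expected by chance given their sizes and the size of the universe.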

We next determined if genes with shared functions are enriched among differentially-expressed genes in EJC mutant embryos. Except for a handful of cell death regulators and effectors, none of the upregulated genes in either EJC mutant identify a functionally-related class of genes, suggesting that the proteins they encode perform a variety of functions. The upregulated cell death genes include tp53, tp53-inp1, and casp8

(the latter upregulated only in rbm8a mutant embryos), which is consistent with the cell death observed in mutant embryos (Figs 3.3B and 3.4). In contrast to upregulated genes, downregulated genes in each EJC mutant are enriched in specific GO terms. In rbm8a mutant embryos, downregulated genes at both 21 and 27 hpf are significantly enriched for genes encoding proteins with G-protein coupled receptor (GPCR) or nucleic acid binding activities (Fig 3.6F). In magoh mutant embryos at 21 hpf, downregulated genes are significantly enriched for the retinoid binding GO term, which includes several

GPCRs. Another functionally related group of genes downregulated in magoh mutant embryos at 21 hpf are genes encoding structural constituents of the ribosome (Fig 3.6F).

This latter class is also downregulated in mouse magoh heterozygotes (44), highlighting the importance of magoh in ribosomal gene expression during development.

3.4.4 rbm8a and magoh mutant embryos have defects in NMD

Because translation termination upstream of exon-exon junctions was previously shown to trigger NMD in zebrafish embryos (143), one expected group of upregulated transcripts in EJC mutant embryos are mRNAs containing premature termination codons

(PTC) or “normal” NMD targets containing a 3′UTR intron (3′UI) or an upstream open reading frame (uORF). To identify whether NMD targets are enriched among upregulated genes in EJC mutant embryos, we first compared genes upregulated in EJC mutant embryos to those upregulated in zebrafish upf1 morphants at 24 hpf (155). A significant number of genes are shared between 24 hpf upf1 morphants and magoh mutant embryos at 21 hpf (39 out of 707), and rbm8a mutant embryos at 21 hpf (45 out of 1103, data not shown) and at 27 hpf (44 out of 499) (Fig 3.7A). Importantly, NMD targets previously validated in zebrafish (e.g. isg15, atxn1b, bbc3) (155) and mRNAs predicted to undergo NMD (e.g. upb1, contains a 3′UI) are among these shared genes.

We also generated an independent dataset of Upf1-regulated transcripts from zebrafish morphants at an earlier timepoint (12 hpf) using RNA-Seq (in duplicate, Fig 3.7B) to avoid secondary targets upregulated due to significant cell death in upf1 morphants

(143). We find that upregulated genes in 12 hpf upf1 morphants (Fig 3.7C) show a modest but significant overlap with upregulated genes in the previously published 24 hpf upf1 morphant dataset (Fig 3.8C); the overlap also includes three of the five previously validated NMD targets (isg15, atxn1b and bbc3) (155). Significant overlap is also observed among upregulated genes in 12 hpf upf1 morphants and 21 hpf magoh mutant embryos (32 out of 707, Fig 3.7A), 21 hpf rbm8a mutant embryos (33 out of 1103, data not shown), and 27 hpf rbm8a mutant embryos (65 out of 499, Fig 3.7A). Because the observed changes in rbm8a mutant embryos at 21 hpf were similar to but more modest than in magoh and rbm8a mutant embryos at 21 hpf and 27 hpf, respectively, we focused all subsequent analyses on the latter two EJC mutant datasets. At least one-third of all genes upregulated >1.5 fold upon upf1 knockdown (FDR < 0.05) also show a >1.5-fold increase (FDR < 0.05) in either 21 hpf magoh or 27 hpf rbm8a mutant embryos with 14 genes being significantly upregulated in all three datasets (Fig 3.7A). Globally, genes upregulated >1.5 fold in EJC mutant embryos (FDR < 0.05), as compared to unchanged genes, also show a positive fold-change in upf1 morphants at both 12 hpf and 24 hpf (Fig

3.7B-C; Fig 3.8D-G). This suggests that a much larger shared set of genes show an increase in abundance upon depletion of the EJC or Upf1 even though only a small set is significantly affected.

To further confirm that predicted EJC-dependent NMD targets are indeed affected in EJC mutant embryos, we quantified relative levels of select transcripts that are orthologous to well-known NMD targets (srsf3a, srsf7a, srsf10b, and gadd45aa), or are upregulated in upf1 morphants (e.g. gtpbp1l, atxn1b). All of these transcripts are robustly upregulated in at least one of the EJC mutant backgrounds compared to wild-type siblings, and some (eif4a2, pik3r3a, gadd45aa, gtpbp1l, atxn1b) are upregulated in both EJC mutant backgrounds (Fig 3.7D). These data further confirm that the EJC is required for Upf1-mediated downregulation of NMD targets in zebrafish embryos.

We next tested if different known classes of NMD targets (PTC-, 3′UI- or uORF-containing mRNAs) are upregulated in EJC mutant embryos. We identified 589 genes that encode transcripts annotated as ‘NMD biotype’ in Ensembl due to the presence of

PTCs. When compared to intron-less protein-coding genes, 566 genes encoding NMD biotype transcripts show an increase in abundance in EJC mutant embryos (Fig 3.7E-F).

We also identified 582 genes that encode transcript isoforms with 3′UIs > 50 nts from stop codons and are reliably detectable according to the APPRIS database (142). Similar to the NMD biotype group, the genes encoding APPRIS-supported 3′UI-containing transcripts are also increased in abundance in EJC mutant embryos (Fig 3.7E-F). Another feature known to induce EJC-dependent NMD is uORFs. Based on a published ribosome footprinting dataset (141,156) we identified 1525 genes encoding transcripts with ribosome-occupied uORFs. When compared with a control set of intron-less protein-coding genes, genes containing functional uORFs show a significant positive fold change in EJC mutant embryos (Fig 3.7E-F). Overall, we conclude that all major modes to trigger EJC-dependent NMD (i.e. PTCs, 3′UIs and uORFs) are active during zebrafish development.

3.4.5 Transcripts with stop codon-proximal 3′UTR introns are upregulated upon loss of EJC and Upf1 function

Current models of vertebrate NMD state that translation termination must occur at least

50 nts upstream of the last exon-exon junction for an mRNA to undergo EJC-dependent

NMD. To date, a T-cell receptor β (TCR-β) derived transcript is the only known exception to this “50-nt rule” (157). Surprisingly, we noticed that among the 14 transcripts that are commonly upregulated in rbm8a mutant embryos, magoh mutant embryos, and upf1 morphants (Fig 3.7A), three (foxo3b, phlda3 and nupr1a) are encoded by genes that contain a 3′UI where the distance between the stop codon and the intron is less than 50 nts. For foxo3b and nupr1a, the human orthologs also contain a 3′UI < 50 nts

downstream of the stop codon. This observation raises an intriguing possibility that additional mRNAs exist that defy the 50-nt rule, and that a proximal 3′UI (< 50 nts distance between intron and upstream stop codon; Fig 3.9A) may represent a previously unrecognized NMD-inducing feature. Using Ensembl GRCz10 transcript annotations, we identified 861 zebrafish genes that encode transcripts with proximal 3′UIs and 582 genes that encode transcripts with distal 3′UIs (3′UIs ≥ 50 nts downstream of the stop codon;

Fig 3.9A) (Fig 3.10A). Interestingly, proximal 3′UI-containing genes encode proteins which are enriched for mRNA binding and mRNA splicing factor functions (Fig 3.10B), two functional groups that are well-recognized to be regulated by EJC-dependent NMD

(122,158). We find that 3.5-8% (70/854 in rbm8a 27 hpf, 60/854 in magoh 21 hpf and

21/597 in upf1 morphants 12 hpf) of all proximal 3′UI-containing genes, detectable in our datasets, are ≥ 1.5 fold upregulated in all three EJC mutant embryos (Fig 3.9B, 3.9C and 3.10C) and in upf1 morphants (Fig 3.10D). To further confirm that proximal 3′UI-containing genes are indeed targets of NMD, we focused on a subset (foxo3b, cdkn1ba, and phlda2) that are upregulated in upf1 morphants and at least one EJC mutant, and where the existence of a proximal 3′UI is conserved in several other vertebrates including humans. After treating embryos with the NMD inhibitor NMDI14 (159), we found that foxo3b, cdkn1ba, and phlda2 transcripts are 2-to-8 fold upregulated, just like eif4a2, a distal 3′UI-containing transcript, and atxn1b, a previously validated (155) NMD target (Fig 3.9D). Thus, a subset of genes with a 3′UI in a stop codon-proximal position appear to defy the 50-nt rule as their encoded mRNAs are targeted for EJC-dependent

NMD. Interestingly, we did not observe any correlation between the distance of the 3′UI

from the stop codon and the degree of fold change observed in EJC mutant or upf1 morphants (Fig 3.9B-C, 3.10C-D).

3.4.6 Loss of function of foxo3b, a proximal 3′UI-containing gene upregulated in

EJC mutant embryos, partially rescues motor axon outgrowth

The zebrafish foxo3b gene, whose vertebrate orthologs also contain a proximally-located

3′UI (Fig 3.11A), encodes a transcript that is significantly upregulated in both EJC mutant and upf1 morphants (Fig 3.9B-C, 3.10C-D, and 3.12A). Conservation of proximal intron position suggests that the stop codon-proximal 3′UI may be important for regulation of foxo3b expression. Like foxo3b transcript, we find that Foxo3b protein is upregulated in 21 hpf magoh mutant embryos (2.8-fold; Fig 3.11B) and in 27 hpf rbm8a mutant embryos (1.3-fold; Fig 3.7B) compared to wild-type siblings. Furthermore, five known Foxo3b transcriptional target genes (91) are also upregulated in magoh and rbm8a mutant embryos (Fig 3.7C). To test if Foxo3b upregulation contributes to EJC mutant phenotypes, we obtained a previously described foxo3b null allele, foxo3bihb404

(95,96), generated magoh; foxo3b and rbm8a; foxo3b doubly heterozygous adults, and examined muscle and motor neuron development in single, double, and compound mutant embryos. As expected, motor axon outgrowth in foxo3b mutant embryos is indistinguishable from wild-type siblings (data not shown); in contrast, as noted above, motor axons barely extend beyond the horizontal myoseptum in EJC mutant embryos

(Figs 3.6E-G, 3.11G-H, and 3.12H-I). Strikingly, we find that heterozygous and homozygous loss of foxo3b in EJC mutant embryos leads to significantly longer motor axons that extend well beyond the horizontal myoseptum (Figs 3.11I-K and 3.12J-L), but

not as far as in wild-type embryos. Despite significant rescue of motor axon outgrowth, neuromuscular junction formation (Figs 3.11G-J and 3.12H-K) and myofiber organization (Figs 3.11C-F and 3.12D-G) are not restored in magoh; foxo3b or rbm8a; foxo3b double mutant embryos. Thus, we predict that Foxo3b repression via EJC-dependent NMD is important for motor axon outgrowth, but that regulation of other targets is required for proper muscle development.

3.4.7 Proximal 3′UI-containing genes are regulated by NMD in human and mouse cells

We surveyed human and mouse genomes for prevalence and conservation of proximal

3′UIs. Like in zebrafish, APPRIS-annotated proximal 3′UI-containing genes outnumber distal 3′UI-containing genes in human (Fig 3.14A, 1239 proximal 3′UI-containing genes,

489 distal 3′UI-containing genes) and mouse (Fig 3.14B, 921 proximal 3′UI-containing genes, 649 distal 3′UI-containing genes). Except for over-representation immediately downstream of the stop codon, 3′UI position appears randomly distributed within the stop-codon proximal window, and 3′UI occurrence precipitously drops after the 50-nt position (Fig 3.12A, 3.14A and 3.14B). This trend is observed even among all

ENSEMBL transcripts, even though the total number of genes encoding distal 3′UI-containing transcripts is higher than the number encoding proximal 3′UI-containing transcripts in humans (Fig 3.14C-D, 2297 proximal 3′UI-containing genes, 13725 distal

3′UI-containing genes in humans). In order to focus on NMD-inducing features that would directly impact gene expression and protein function, we used the APPRIS-annotated database to conduct all our downstream analyses. As in zebrafish, human and

mouse proximal 3′UI-containing genes are enriched in mRNA binding function (Figs

3.14E, F). A cross-comparison of zebrafish, mouse, and human proximal 3′UI-containing genes identified 167 genes where the proximal position of 3′UI is conserved in all three organisms suggesting that proximal 3′UIs could be important regulatory elements. These genes show a significant interaction network amongst themselves (Fig 3.13A, p-value =

0.02), and are enriched for genes encoding proteins with RNA recognition motifs and with roles in neural development and disease (Fig 3.13B).

To test if human and mouse proximal 3′UI-containing genes are also regulated by

NMD, we analyzed publicly available RNA-seq datasets of human and mouse cell lines depleted of key NMD factors. We find that a subset of transcripts encoded by proximal and distal 3′UI-containing genes are similarly upregulated in UPF1-depleted HEK293 cells (Fig 3.13C, 3.14G) (160) and human ESCs (Fig 3.14H) (72), and in Smg6-/- knockout mouse ESCs (Fig 3.14I) (65). Furthermore, transcript stability of proximal

3′UI-containing genes grouped based on increasing distance from the stop codon (20-50 nts, 30-50 nts, and 36-50 nts) progressively increases upon UPF1 knockdown in HEK293 cells (160) (Fig 3.13D). Notably, the proximal 3′UI-containing genes where the intron is

≥ 36 nts from the stop codon are the most significantly stabilized. To further confirm

UPF1 and EJC dependence of proximal 3′UI-containing NMD targets in human cells, we knocked down UPF1 or EIF4A3 in a human colorectal carcinoma cell line (HCT116), and assessed levels of a subset of 3′UI-containing transcripts. This subset consists of human orthologs of all three proximal 3′UI-containing genes validated in zebrafish

(FOXO3, CDKN1B and PHLDA2). We also picked three proximal 3′UI-containing genes that show the highest change in stability upon UPF1 knockdown in Fig 3.13D (STX3, ULBP1 and RBM3). Finally, we included the transcript encoding the RNA-binding protein HNRNPD, whose principal APPRIS isoform contains a proximal 3′UI. In all these transcripts, the proximal position of the 3′UI is conserved (with the exception of STX3;

Fig 3.13A). Importantly, primer pairs used for detection of these transcripts unambiguously amplified only the proximal 3’UI-containing isoforms. We find that

CDKN1B and ULBP1 are upregulated upon EIF4A3 and UPF1 knockdown like transcripts encoded by distal 3′UI-containing genes (ARC and SRSF4) (Fig 3.13E). STX3 and FOXO3 are upregulated either upon UPF1 or EIF4A3 knockdown but not under both conditions. HNRNPD, on the other hand, remains unchanged in HCT116 cells upon

UPF1 or EIF4A3 knockdown (Fig 3.13E). PHLDA2 and RBM3 were below detection limits in HCT116 cells and therefore could not be tested. Thus, we conclude that proximal 3′UI-containing transcripts are regulated by EJCs and UPF1 but exhibit differential sensitivities to loss of EJC-dependent NMD components possibly due to transcript and/or cell-type specific differences in NMD.

3.5 Discussion

Our work takes advantage of unique features of zebrafish as a model of vertebrate development to reveal tissue-specific requirement of EJC function during early development. Analysis of gene expression in zebrafish EJC mutant embryos uncovers proximal 3′UIs as key modulators of 3′UTR RNP composition to control mRNA stability and protein output. We also show that this new mode of NMD induction via proximal

3′UIs plays a critical role in zebrafish motor axon outgrowth by regulating foxo3b expression.

3.5.1 Loss of EJC causes tissue-specific defects, embryonic lethality and changes in gene expression in zebrafish

In zebrafish rbm8a and magoh single mutant embryos, both Rbm8a and Magoh proteins are co-depleted (Fig 3.3C and 3.3D). This simultaneous deficiency of rbm8a and magoh function impairs EJC function leading to rapid emergence of developmental defects, which progressively worsen and lead to embryonic death by 2 dpf. Remarkably, the initial defects in EJC mutant embryos arise in specific tissues. magoh mutants show head necrosis starting at 19 hpf, whereas rbm8a mutants show it at 21 hpf, likely due to tissue-specific differences in the rate of depletion of maternal stores. The head necrosis phenotype in EJC mutant embryos is accompanied by neural cell death (observed by acridine orange staining) in the brain and spinal cord at 19 hpf (Fig 3.4A); by 21 hpf, brain necrosis is also morphologically apparent (Fig 3.4B). The neural cell death phenotype is similar to that seen in mouse heterozygous EJC mutant embryos (43,44), and is consistent with the microcephaly phenotype in human patients heterozygous for hypomorphic RBM8A and EIF4A3 mutations (36,42). Thus, across vertebrates, certain tissues appear more sensitive to loss of EJC function. The emergence of defects in EJC mutant embryos in discrete lineages such as neural and muscle cells may result from tissue-specific differences in EJC protein functions, activity of EJC regulators, or decay rates of maternally-provided EJC transcript/protein. Future investigation into these possibilities in zebrafish embryos may explain why loss of a ubiquitously-expressed entity like the EJC leads to tissue-specific phenotypes, as are also observed in human

EJC-linked syndromes (36,42). Notably, unlike haploinsufficiency of EJC core components in mouse and human (36,42–44), heterozygous loss of rbm8a or magoh in zebrafish does not have any apparent phenotypic consequences, indicating that the threshold dose of EJC may differ between zebrafish and mammals.

Despite the similarity in the phenotypic defects between rbm8a and magoh mutant embryos, the gene expression changes in the two mutants show low overlap (Fig

4). There are multiple factors that are likely to contribute to these low overlaps.

Foremost, the temporal difference in appearance of defects in the two mutants could contribute to the low overlap upon comparing magoh and rbm8a mutants at 21 hpf (Fig

3.6D). When comparing rbm8a mutants at 21 hpf and 27 hpf (Fig 3.6E), the low overlaps are more likely driven by the six-hour difference in developmental timing. Finally, comparing rbm8a 27hpf to magoh 21hpf gene expression signatures (Fig 3.7A) likely further amplifies the differences described above possibly resulting in only small overlaps. Additionally, the EJC-independent function of Eif4a3 and Rbm8a might lead to differences in gene expression in the two mutants. In fact, our observations of low overlaps in gene expression are similar to those observed in conditional cell type-specific

mouse mutants of all three core proteins (44) and Y. lipolytica mutants of rbm8a and magoh (12).

3.5.2 The EJC is a critical component of NMD in zebrafish

Our finding that a significant fraction of genes upregulated in EJC mutant embryos are also upregulated in upf1 knockdown embryos (Fig 3.7) shows that EJC-dependent NMD is compromised in both rbm8a and magoh mutant embryos. The overlap between upregulated genes in EJC mutant and upf1 morphants is small (Fig 3.7A and S3A), likely due to differences in developmental timing or due to NMD-independent functions of the proteins. However, the genes within these overlaps include previously validated zebrafish NMD targets and orthologs of known NMD targets in mammals (Fig 3.7D).

Furthermore, several known classes of NMD targets such as PTC-, uORF-, and 3′UI- containing transcripts are significantly upregulated in rbm8a and magoh mutant embryos

(Fig 3.7E-F). Therefore, the EJC is critical for the quality control function (i.e. suppression of aberrant PTC-containing transcripts) and the gene regulatory activity of the zebrafish NMD pathway, which further underscores the importance of EJC-dependent NMD for developmental and tissue-specific gene regulation (27,28,67,123).

An important future goal will be to expand on how EJC-dependent NMD regulates specific genes in particular cell types and tissues to control development.

Another function of the EJC that is well documented in Drosophila, mouse and human cell lines is its role in regulating splicing. While the function of the EJC in splicing is well studied and is likely conserved, we observed only a few changes in splicing patterns using DEXSeq, with no overlapping changes between rbm8a and magoh mutants (data not shown). This observation most likely reflects the unsuitability of our RNA-Seq data for analysis of splicing changes owing to short read lengths (median length of ~35 nt).

3.5.3 Proximal 3′UTR introns are a novel NMD-inducing feature

Our surprising discovery that proximal 3′UIs represent a bona fide NMD-inducing signal is supported by multiple lines of evidence. Nearly 10% of all detectable zebrafish proximal 3′UI-containing genes, like distal 3′UI-containing genes, are ≥1.5 fold upregulated in EJC mutant and upf1 morphant datasets (Fig 3.9B, 3.9C, 3.10D), and a subset of these are upregulated in zebrafish embryos treated with the NMD inhibitor NMDI14 (Fig 3.9D). Further, a subset of proximal 3′UI-containing genes is also upregulated in mouse and human NMD- and EJC-compromised cells (Fig 3.13 and 3.14). The majority of proximal 3′UI-containing genes conserved in human, mouse and zebrafish encode RNA-binding proteins (e.g. HNRNPD, MBNL), proteins with neuronal functions (e.g. KIF5A, STXBP1), or both (e.g. CELF1, MSI2); genes in these classes are well-known for their regulation via 3′UIs and NMD (27,122,161). Thus, our findings suggest that among 3′UI-containing genes numerous exceptions exist to the prevailing 50-nt NMD rule.

Apparently, proximal 3′UI-containing genes are variably susceptible to reduced EJC/NMD function. For example, zebrafish foxo3b is significantly upregulated in EJC mutants and upf1 morphants whereas human FOXO3 is only mildly sensitive to reduced UPF1 levels in cell lines (Fig 3.9, 3.10, 3.13 and 3.14) but shows robust upregulation upon EIF4A3 knockdown in HCT116 cells (Fig 3.13E). Furthermore, many vertebrate genes with proximal 3′UIs are not upregulated, or are even downregulated, upon diminished EJC/UPF1 function (Fig 3.9 and 3.13). Notably, distal 3′UI-containing genes show a similar variable susceptibility to EJC/NMD-deficiency. The variability and/or non-responsiveness of some 3′UI-containing genes to EJC/NMD manipulations could be due to detection issues (e.g. low expression at developmental stage/cell type investigated) or due to their variable sensitivity to EJC/NMD protein levels.

Downregulation of proximal 3′UI-containing genes could result from indirect effects of compromised EJC/NMD function. It is also possible that some 3′UI-containing genes actively evade NMD. Such mechanisms could operate via 3′UTR-bound proteins, as observed for mRNAs with unusually long 3′UTRs (162,163). This idea is consistent with the recent report that HNRNPL binding to 3′UTRs near stop codons can protect mRNAs with long 3′UTRs or downstream EJCs from NMD (164). Thus, regulation of mRNA stability by 3′UIs is likely to be the net outcome of combinatorial control of NMD by multiple determinants of 3′UTR RNP composition.

3.5.4 How can splicing of a proximal 3′UI lead to NMD?

The prevalent 50-nt rule is presumed to account for the minimum distance required to accommodate a terminated ribosome at the stop codon so that it does not interfere with the downstream EJC. Based on estimates that ribosome footprints at stop codons extend about 9 nts into the 3′UTR (156) and that the EJC 5′ boundary lies about 27 nts upstream of the exon-exon junction (Fig 3.1C), a 3′UI located at least 36 nts downstream of a stop codon could induce EJC-dependent NMD via the commonly accepted mechanism (Fig 3.14). Indeed, a recent single-molecule analysis of NMD in human cells shows that PTCs as close as 40 nt to the last exon junction can induce NMD, suggesting that EJCs very close to stop codons may not always be displaced by translating ribosomes (165). How introns within the first 35 nts of the 3′UTR induce NMD is more perplexing. One possibility is that introns within 35 nts of the stop may trigger NMD via non-canonical EJCs present downstream in the 3′UTR (9,149) (Fig 3.15A). Additionally, EJC-interacting factors such as SR proteins deposited on 3′UTR sequences after 3′UI splicing may also recruit NMD-activating factors (9,149,166,167) (Fig 3.15A). Curiously, both proximal and distal 3′UI-containing transcripts show comparable upregulation upon NMD disruption (Figs. 3.9 and 3.15A). Thus, we speculate that the presence of a 3′UI rather than its 3′UTR position is a stronger determinant of an mRNA’s 3′UTR RNP composition, and hence its NMD susceptibility.
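The distance arithmetic in this section can be made explicit with a short sketch. The constants are the estimates quoted above (a ribosome footprint extending ~9 nts past the stop codon and an EJC 5′ boundary ~27 nts upstream of the exon-exon junction); the function name and category labels are hypothetical and serve only as an illustration, not as the analysis code used in this work.

```python
# Illustrative sketch only: names and labels are hypothetical.
RIBOSOME_FOOTPRINT_3P_NT = 9   # footprint extension into the 3'UTR (estimate quoted in the text)
EJC_5P_BOUNDARY_NT = 27        # EJC 5' boundary upstream of the exon-exon junction
FIFTY_NT_RULE = 50             # conventional proximal/distal cutoff

# Smallest stop-codon-to-junction distance that fits both the ribosome and the EJC.
MIN_COMPATIBLE_NT = RIBOSOME_FOOTPRINT_3P_NT + EJC_5P_BOUNDARY_NT


def classify_3ui(distance_nt: int) -> str:
    """Classify a 3'UTR intron by its distance (in nts) from the stop codon."""
    if distance_nt < 0:
        raise ValueError("distance must be non-negative")
    if distance_nt >= FIFTY_NT_RULE:
        return "distal"
    if distance_nt >= MIN_COMPATIBLE_NT:
        return "proximal, canonical EJC can be accommodated"
    return "proximal, canonical EJC likely occluded"
```

Under these assumptions, a 3′UI 40 nts from the stop codon falls in the class where a canonical EJC could still coexist with the terminating ribosome, whereas one at 20 nts would require an alternative trigger such as a non-canonical EJC.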

In contrast to 3′UI-mediated NMD, PTC-triggered NMD appears to more strongly adhere to the 50-nt rule (168). Conceivably, some mechanistic differences may exist in induction of NMD at PTCs versus at normal stop codons upstream of a 3′UI.

PTC-containing aberrant mRNAs are targeted for decay to suppress expression of truncated polypeptides whereas 3′UI-containing transcripts are targeted for decay to fine-tune transcript and thus protein levels. We support a hypothesis that translation termination at PTCs is slow and results in strong NMD induction whereas termination at normal stop codons is rapid and results in weak NMD induction.

3.5.5 EJC-dependent NMD of foxo3b is critical for zebrafish motor axon outgrowth

Certain genes maintain proximal 3′UIs across vertebrate evolution (Fig 3.11 and 3.13) despite much faster rates of intron loss from 3′UTRs compared to coding regions (169,170), suggesting that proximal 3′UIs play an important role in gene regulation and cellular function. foxo3b emerges as one such example that is regulated via EJC-dependent NMD in zebrafish embryos (Fig 3.9) and cultured human cells (Fig 3.13).

This gene encodes a forkhead box transcription factor that acts as a hub for integration of several stress stimuli, and functions in processes such as cell cycle, apoptosis, and autophagy (92,99). In zebrafish, Foxo3b contributes to survival under hypoxic stress (95), and negatively regulates antiviral responses (96) as well as the canonical Wnt signaling pathway (90). FOXO3, the mammalian ortholog of Foxo3b, physically interacts with p53, and both act synergistically to induce apoptosis in response to stress (171). In addition, several pro-apoptotic genes (e.g. bim, bbc3, gadd45a) are direct FOXO3 transcriptional targets (Fig 3.7C) (172,173). Thus, the regulation of foxo3b by EJC-dependent NMD (Fig 3.9 and 3.13) can directly impact cell survival. The elevated levels of Foxo3b in EJC mutant embryos may cause motor axon growth defects due to dampened Wnt signaling (90,174), increased neural cell death, and/or other cell-autonomous or non-cell-autonomous reasons. Loss of foxo3b function in EJC mutant embryos partially restores motor axon length (Fig 3.11). One caveat to this rescue experiment is the possibility that the rescue results from genetic compensation (175), wherein the lack of foxo3b leads to the transcriptional upregulation of a paralog that functions similarly. Even so, our findings parallel the rescue of neural apoptosis in mouse EJC mutant embryos upon brain-specific p53 ablation (43,44), and the reversal of cell death in NMD-defective flies and human cell lines upon reduced activity of GADD45A (176,177). Thus, EJC-mediated regulation of foxo3b identifies a new gene in the EJC-NMD-cell survival network (Fig 3.15B) and highlights the importance of proximal 3′UIs as key modulators of 3′UTR RNP composition to regulate mRNA stability and protein production during development.

3.6 Figures

Figure 3.1 The zebrafish EJC is detected ~24 nucleotides upstream of exon-exon junctions. A. Western blot indicating Rbm8a, Eif4a3 and Magoh proteins detected in RNase I-treated zebrafish embryo total extract (TE, lane 1), depleted extract (DE, lanes 2 and 4) and immunoprecipitated protein complexes (IP, lanes 3 and 5). Antigens detected in the blot are listed on the left and antibodies used to immunoprecipitate complexes are listed on top. The signal corresponding to the antibody light chain and heavy chain in the IP lanes is indicated by IgGL and IgGH respectively.

B. Boxplots showing the Rbm8a RIP-Seq normalized read densities (reads per kilobase per million, RPKM) in intronic versus exonic genomic regions. Asterisk at the top indicates Wilcoxon test p-values, which are < 10^-6.

C. Boxplots as in B showing the Rbm8a RIP-Seq normalized read densities (RPKM) in the indicated genomic regions (bottom). Exons with downstream introns include all but last exons. Asterisk at the top indicates Wilcoxon test p-values, which are < 10^-6.

D. Meta-exon plots showing Rbm8a RIP-Seq and RNA-Seq (indicated on the left) normalized read depths in a 75 nt region starting from the exon 5′ (left of dashed black line) or 3′ ends (right of dashed black line). Vertical black line: expected canonical EJC binding site (-24 nt) based on human studies. A composite exon with the relative position of exon-exon junctions (EEJ) is diagrammed at the bottom.

E. A meta-exon plot of start and end of Rbm8a RIP-Seq footprint reads (5′ ends, solid lines; 3′ ends, dotted lines). Vertical black line: canonical EJC site (-24 nt). Gray vertical dashed lines represent boundaries of the minimal EJC occupied site.

F. Top: UCSC genome browser screenshots showing read coverage along the atp2a1 gene in the Rbm8a RIP-Seq or RNA-Seq replicates as labeled on the right. Bottom: A zoomed in view of the region between the two dotted lines on the top panel. The y-axis on the left of each track shows maximal read coverage in the shown interval.

*Analysis in B-E has been done by Robert Patton (supervised by Ralf Bundschuh).


Figure 3.2 EJC composition and deposition is highly conserved between zebrafish and humans. A. Multiple sequence alignments of Eif4a3, Rbm8a and Magoh protein sequences from organisms on the left. The consensus is at the bottom with upper case letters indicating identity and lower case letters indicating similarity. Green indicates complete identity across all species, yellow and blue indicate the identical and unique amino acids in the regions with similarity. Identity between human and zebrafish EJC proteins: Eif4a3 (97%), Rbm8a (93%) and Magoh (100%).

B. Western blot detecting proteins listed on the left in RNase I-treated zebrafish embryo total extract (TE, lane 1), depleted extract (DE, lanes 2, 4 and 6) and immunoprecipitates (IP, lanes 3, 5 and 7) with the Rbm8a antibody. Detergents supplemented to increase IP stringency are indicated on top of each lane. Optimized IP condition used in Fig 3.2C is indicated by the dashed red box.

C. Autoradiogram of 32P 5′-end labeled RNAs from anti-Rbm8a RIP elution (lane 4) as well as indicated size-markers which include the low-molecular weight single-stranded DNA ladder (lane 1), 0.1 pmol 28 nt synthetic RNA (lane 2) and 100 bp DNA ladder (lane 3).

D. Scatter plots comparing read counts for each gene in a pair of RIP-Seq replicates. The replicates (Rep1, Rep2, and Rep3) are indicated on the x- and y-axes. A pseudocount of 0.0001 was added to all genic read counts before log2 transformation. Pearson correlation coefficient and p-value for the correlation test for each comparison are on the top left of each plot.

E. Genome browser screenshots showing read coverage of Rbm8a RIP-Seq (only Rep 3, the deepest replicate is shown) in green and RNA-Seq in gray of select highly-expressed genes, krt4, eef2b, eif4g1a, and hist1h4l (intron-less gene).
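The replicate comparison described in panel D can be sketched as follows. This is a minimal standard-library illustration: the pseudocount is the value stated in the legend, while the function names are hypothetical and the choice to compute the Pearson coefficient on the log2-transformed values is an assumption, since the legend does not specify it.

```python
from math import log2, sqrt

PSEUDOCOUNT = 0.0001  # value stated in the legend for panel D


def log2_counts(counts):
    """Add the pseudocount to each genic read count, then log2-transform."""
    return [log2(c + PSEUDOCOUNT) for c in counts]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)
```

The pseudocount keeps genes with zero reads in one replicate on the plot instead of producing an undefined logarithm.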


Figure 3.3 Zebrafish rbm8a and magoh mutant embryos show gradual loss of maternally contributed Rbm8a and Magoh proteins during early development. A. Schematic illustrating the rbm8aoz36 and magohoz37 alleles and the predicted proteins they encode. Full length Rbm8a and Magoh proteins are also shown. RRM: RNA Recognition Motif.

B. Whole mount images of live wild-type sibling, rbm8a mutant, and magoh mutant embryos at 24 hpf. Increased grayness in the head region of homozygous rbm8a and magoh mutant embryos indicates cell death.

C. Top: Western blots showing EJC protein expression in wild type (WT) sibling and rbm8a-/- mutant embryos. Antigens detected are listed on the right and embryo genotype is listed above the blot. Developmental time points (hpf) are indicated above each lane. Protein from five (0.75 hpf) or ten embryos (all other time points) was loaded in each lane. A longer exposure (L.E.) of the 0.75 hpf lane is on the left. Bottom: Line graphs showing the amount of protein (per embryo) in the mutant embryos compared to wild-type sibling as a percent of protein present at 0.75 hpf. Error bars represent standard error of means.

D. Top: Western blots showing EJC protein expression in wild type (WT) sibling and magoh-/- mutant embryos. Antigens detected are listed on the right and embryo genotype is listed above the blot. Developmental time points (hpf) are indicated above each lane. Protein from five (0.75 hpf) or ten embryos (all other time points) was loaded in each lane. Eif4a3 runs slightly slower at 0.75 hpf, likely due to the presence of large amounts of yolk proteins of around the same size in the gel; samples of all other time points were prepared after deyolking the embryos, but deyolking is not possible with embryos at 0.75 hpf. A longer exposure (L.E.) of the 0.75 hpf lane is on the left. Bottom: Line graphs showing the amount of protein (per embryo) in the mutant embryos compared to wild-type sibling as a percent of protein present at 0.75 hpf. Error bars represent standard error of means.

*Tom Gallagher and Kiel Tietz designed, generated and injected the CRISPRs targeted to rbm8a and magoh. Natalie Deans and I contributed to the effort of identifying, isolating and genotyping adult zebrafish that carry lesions in rbm8a and magoh.


Figure 3.4 Cell death in EJC mutants begins at 19 hpf and progressively worsens over time. A. Whole mount images of live 19 hpf EJC mutant embryos and WT sibling embryos stained with acridine orange.

B. Whole mount images of live EJC mutant embryos and WT siblings at 21 hpf.

C. Whole mount images of live EJC mutant embryos and WT siblings at 27 hpf.

D. Whole mount images of live EJC mutant embryos and WT siblings at 32 hpf.


Figure 3.5 EJC mutant embryos are paralyzed, have disorganized muscles and stunted motor axons. A. Boxplots showing the number of spontaneous contractions per minute measured for the EJC mutant embryos and WT siblings at 24 hpf as indicated on the x-axis. Welch t-test p-values are indicated at the top.

B-D. Immunofluorescence images showing Myh1 expression in somites 10-14 of WT sibling (B), rbm8a mutant (C) and magoh mutant (D) embryos. Antibody used was anti-A4.1025 (see methods) (N = 10 embryos/genotype).

E-G. Merged confocal images of somites 12-16 in WT sibling (E), rbm8a mutant (F) and magoh mutant (G) embryos showing immunofluorescence detection of motor neurons (anti-SV2; red) and acetylcholine receptors (α-Bungarotoxin; green). Neuromuscular junctions in the merged image appear yellow. White arrowheads point to the end of the motor neuron. Scale bar in G is 100 nm.

H. Boxplots showing the quantification of motor axon length in somites 12-15 of wild-type sibling, rbm8a mutant, and magoh mutant embryos (N = 4 embryos/genotype and 4 neurons/embryo). Welch t-test p-values are at the top.


Figure 3.6 Gene expression changes in rbm8a and magoh mutant embryos. A-C. MA plots (M: log ratio; A: mean average) showing genes that are upregulated (more than 1.5-fold increase and FDR < 0.05) (red), downregulated (more than 1.5-fold decrease and FDR < 0.05) (blue), or unchanged (gray) in rbm8a mutant embryos compared to WT siblings at 21 hpf (A), magoh mutant embryos compared to WT siblings at 21 hpf (B), and rbm8a mutant embryos compared to WT siblings at 27 hpf (C). rbm8a, magoh, and eif4a3 are labeled in each plot with label colors signifying no change (gray) or downregulation (blue).

D. Venn diagrams showing the overlap between genes that are upregulated (top) and downregulated (bottom) in rbm8a mutant embryos at 21 hpf (left) and magoh mutant embryos at 21 hpf (right). Hypergeometric test p-values are below each comparison.

E. Venn diagrams as in (D) comparing upregulated and downregulated genes in rbm8a mutant embryos at 21 hpf (left) and 27 hpf (right).

F. PANTHER14.0 (139) gene ontology (GO) term overrepresentation analysis of genes downregulated in rbm8a and magoh mutant embryos at indicated times. All significant terms (Benjamini-Hochberg corrected p-value < 0.05) are shown for each set. The number of genes in each term is indicated at the right of each bar.
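The hypergeometric test reported for the Venn diagram overlaps in panels D and E can be sketched as below. This is a minimal illustration using only the Python standard library; the function and argument names are hypothetical, the choice of background population is an assumption that is not specified in the legend, and the numbers in the usage note are toy values rather than data from this work.

```python
from math import comb


def hypergeom_overlap_pvalue(population: int, set_a: int, set_b: int, overlap: int) -> float:
    """Upper-tail probability P(X >= overlap) for the overlap of two gene sets.

    `population` is the number of genes considered (the background),
    `set_a` and `set_b` are the sizes of the two gene sets, and
    `overlap` is the number of genes they share.
    """
    if not (0 <= overlap <= min(set_a, set_b) <= max(set_a, set_b) <= population):
        raise ValueError("inconsistent set sizes")
    total = comb(population, set_b)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(set_a, k) * comb(population - set_a, set_b - k)
        for k in range(overlap, min(set_a, set_b) + 1)
    )
    return tail / total
```

As a toy check, two sets of 2 genes drawn from a background of 4 genes overlap completely with probability 1/6, which is what `hypergeom_overlap_pvalue(4, 2, 2, 2)` evaluates to.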


Figure 3.7 Genes upregulated in EJC mutant embryos are also regulated by Upf1 and contain NMD-inducing features. A. Venn diagram showing the overlap of significantly upregulated genes in EJC mutant embryos and upf1 morphants. Each overlap and its corresponding hypergeometric test-based p-value are color-coded.

B. Cumulative distribution frequency (CDF) plot showing the fold changes in upf1 morphants (12 hpf) of genes upregulated in magoh mutant embryos at 21 hpf (blue) compared to unchanged genes (black). Kolmogorov-Smirnov (KS) test p-value for differences in fold changes between the two groups is indicated on the bottom right.

C. CDF plot as in B for genes upregulated in rbm8a mutant embryos at 27 hpf (red) compared to unchanged genes (black).

D. Quantitative RT-PCR (qRT-PCR) analysis showing fold change of select NMD target transcripts (x-axis) compared to control (mob4) transcript in magoh mutant embryos at 21 hpf compared to wild-type siblings (dark gray bars) and in rbm8a mutant embryos at 27 hpf compared to wild-type siblings (light gray bars). The selected genes contain a 3′UTR intron, have orthologs that are known NMD targets, and/or were previously shown to be zebrafish Upf1 targets (155). Red dots: the value of each individual replicate. Error bars: standard error of means. Horizontal black dashed line: fold change = 1. Welch t-test p-values are indicated by asterisks (** p-value < 0.05; * p-value < 0.1).

E. CDF plot showing the fold changes in 21 hpf magoh mutant embryos for genes that contain 3′UTR introns (APPRIS 3′UTR intron, mauve), contain a uORF (orange), or are defined in Ensembl as NMD-biotype (green) compared to intronless genes (black). KS test p-value for differences in distribution of fold changes between intronless genes and each of the particular groups is indicated on the bottom right.

F. CDF plot as in E showing the fold changes in rbm8a mutant embryos at 27 hpf.

*Tom Gallagher has generated the upf1 morphant RNA-Seq libraries.


Figure 3.8 Genes upregulated in rbm8a mutants at 27 hpf show the highest overlap with a previously published 24 hpf upf1 morphant dataset. A. Venn diagram showing the overlap of genes that are significantly upregulated in EJC mutant embryos and upf1 morphant embryos at 24 hpf (155). Hypergeometric test p-values for each comparison are also shown.

B. MA plot showing the genes that are altered in expression (fold change > 1.5 and FDR < 0.05; red) or unchanged (gray) in upf1 morphant embryos compared to control embryos at 12 hpf. The number of significantly upregulated genes is at the top right and the number of downregulated genes is at the bottom right.

C. Venn diagram showing the overlap of significantly upregulated genes in upf1 morphant embryos at 24 hpf (155) and upf1 morphant embryos at 12 hpf. Hypergeometric test p-value for the comparison is indicated.

D. Cumulative distribution frequency (CDF) plot showing the fold changes in upf1 morphant embryos (24 hpf) (155) of genes upregulated in 21 hpf magoh mutant embryos (blue) compared to unchanged genes (black). Kolmogorov-Smirnov (KS) test p-value for differences in the two distributions is indicated at the bottom of the class descriptions.

E. CDF plot as in D for genes upregulated in rbm8a mutant embryos at 21 hpf (red) compared to the unchanged genes (black).

F. CDF plot as in D for genes upregulated in rbm8a mutant embryos at 27 hpf (red) compared to the unchanged genes (black).

G. Cumulative distribution frequency (CDF) plot showing the fold changes in upf1 morphants (12 hpf) of genes upregulated in rbm8a mutant embryos at 21 hpf (red) compared to unchanged genes (black). Kolmogorov-Smirnov (KS) test p-value for differences in fold changes between the two groups is indicated on the bottom right.


Figure 3.9 Transcripts encoded by genes with a proximal 3′UTR intron are upregulated in EJC mutant and in NMDI14-treated embryos.

A. Top: Schematic illustrating genes with 3′UTR introns (3′UI) where the distance between the stop codon and the 3′UI is equal to or greater than 50 nts. Such 3′UI are classified as distal. Bottom: Schematic illustrating genes with 3′UI where the distance between the stop codon and 3′UI is less than 50 nts. Such 3′UI are classified as proximal. The ribosome, stop codon and EJC are labeled in the top panel.

B. A scatter plot showing fold change (FC) for genes with proximal 3′UI (dark blue: FC > 1.5 and light blue: FC < 1.5) and distal 3′UI (black: FC > 1.5 and gray: FC < 1.5) in magoh mutant embryos at 21 hpf compared to wild-type siblings. Genes encircled in red also contain a proximal 3′UI in mouse and human, and were independently validated in (D). Dots circled in orange represent genes that also contain a uORF.

C. A scatter plot as in B showing fold changes for rbm8a mutant embryos at 27 hpf compared to wild-type siblings.

D. qRT-PCR analysis showing fold changes for proximal 3′UI-containing genes (blue bars), a distal 3′UI-containing gene (light gray bar), and a validated zebrafish Upf1-regulated gene (dark gray bar) compared to the control gene (black bar) in zebrafish embryos treated with NMDI14 from 3-24 hpf. Red dots: the value of each individual replicate. Error bars: standard error of means. Horizontal black dotted line: fold change = 1. Welch t-test p-values (** p-value < 0.05; * p-value < 0.1).

*Michael Parthun has contributed to the bioinformatic efforts for identifying zebrafish, mouse and human 3′UI-containing genes.


Figure 3.10 Proximal and distal 3′UI genes encode proteins with roles in RNA metabolism.

A. Histogram depicting the frequency of all zebrafish 3′UI transcripts in Ensembl GRCz10 as a function of the distance of the 3′UI from the stop codon. Data are shown in 5-nt bins and bins beyond 500 nts are not shown. Bins of proximal 3′UI genes are in blue and distal 3′UI bins are in gray. Inset: Histogram of all zebrafish proximal 3′UI transcripts binned by 1 nt.

B. PANTHER14.0 (139) gene ontology (GO) term enrichment analysis of proximal 3′UI-containing genes (top, shades of blue) and all 3′UI-containing genes (bottom, shades of gray). All significant terms (Benjamini-Hochberg corrected p-value < 0.05) are shown for each set.

C. A scatter plot showing fold change (FC) for genes with proximal 3′UI (dark blue: FC > 1.5 and light blue: FC < 1.5) and distal 3′UI (black: FC > 1.5 and gray: FC < 1.5) in rbm8a mutant embryos at 21 hpf compared to wild-type siblings. Dots circled in orange represent genes that also contain a uORF.

D. A scatter plot as in C showing fold changes of 3′UI-containing genes for 12 hpf upf1 morphants compared to wild-type control embryos.


Figure 3.11 Partial or complete loss of foxo3b in magoh mutant embryos rescues motor neuron outgrowth defects. A. Illustration showing foxo3b in multiple vertebrates. The distance between the stop codon and the proximal 3′UTR intron is on the right. Open rectangles: UTRs, filled rectangles: coding region, gray lines: introns (hash marks denote shortened intron sequences).

B. Left: Western blot showing protein levels in wild-type sibling (left) and magoh mutant (right) embryos at 21 hpf. Right: a dot plot showing Foxo3b levels normalized to tubulin levels in magoh mutant embryos and WT siblings at 21 hpf in three biological replicates (N = 5 embryos per genotype per replicate). Error bars: standard error of means.

C-F. Confocal images showing Myh1 immunofluorescence using anti-A4.1025 in somites 12-16 of WT sibling (C), magoh-/- mutant (D), magoh-/-; foxo3b+/- mutant (E), and magoh-/-; foxo3b-/- mutant (F) embryos (N = 13 embryos/genotype). Scalebar in J (for panels C-J) is 100 nm.

G-J. Merged confocal images showing motor neurons (red; detected by anti-SV2 staining) and acetylcholine receptors (green; detected by alpha-bungarotoxin staining) in somites 12-16 of WT sibling (G), magoh-/- mutant (H), magoh-/-; foxo3b+/- mutant (I), and magoh-/-; foxo3b-/- mutant (J) embryos. Neuromuscular junctions in the merged image are yellow. White arrowheads point to the distal end of the motor neuron. (N = 13 embryos per genotype). Scalebar in J (for panels C-J) is 100 nm.

K. Boxplots showing quantification of motor axon length in embryos of genotypes indicated along the x-axis (4 motor neurons/embryo and 13 embryos/genotype).


Figure 3.12 Partial or complete loss of foxo3b in rbm8a mutant embryos rescues motor neuron outgrowth defects. A. Semi-quantitative RT-PCR shows transcript levels of foxo3b, eif4a2 and rpl13 (loading control) in rbm8a mutant and wild-type sibling embryos at 21 and 27 hpf.

B. Western blots (on the left) show levels of Foxo3b, Rbm8a, Magoh, and Tubulin in rbm8a mutant embryos compared to WT siblings at 27 hpf (N = 20 embryos per genotype). Right: dot plot showing Foxo3b levels normalized to tubulin levels in rbm8a mutant embryos and WT siblings at 21 hpf in three biological replicates.

C. Bar graph showing log2 fold changes of known Foxo3b transcriptional targets that show a significant upregulation (FDR < 0.05) in EJC mutant RNA-Seq datasets. Foxo3b targets are from Morris et al. 2015 (91). A pound symbol indicates log2 fold change with FDR > 0.05. Horizontal dotted red line indicates fold change of 1.5.

D-G. Confocal images showing Myh1 immunofluorescence using anti-A4.1025 in somites 12-16 of WT sibling (D), rbm8a-/- mutant (E), rbm8a-/-; foxo3b+/- mutant (F), and rbm8a-/-; foxo3b-/- mutant (G) embryos. (N = 5 embryos/genotype). Scalebar in K (for panels D-K) is 100 nm.

H-K. Merged confocal images showing motor neurons (red; detected by anti-SV2 staining) and acetylcholine receptors (green; detected by alpha-bungarotoxin staining) in somites 12-16 of WT sibling (H), rbm8a-/- mutant (I), rbm8a-/-; foxo3b+/- mutant (J), and rbm8a-/-; foxo3b-/- mutant (K) embryos. Neuromuscular junctions in the merged image are yellow. White arrowheads point to the distal end of the motor neuron. (N = 5 embryos/genotype). Scalebar in K (for panels D-K) is 100 nm.

L. Boxplots showing quantification of motor axon length in embryos of genotypes indicated along the x-axis (4 motor neurons/embryo and 5 embryos/genotype).


Figure 3.13 Proximal position of 3′UTR introns is conserved in many vertebrate genes and such introns can induce NMD in human cells. A. A major interaction cluster predicted by STRING network analysis of genes with a shared proximal 3′UI in zebrafish, mouse and human. Nodes are colored by gene/protein function: nervous system (red), presence of RNA recognition motif (green), diseases of signal transduction (blue), FoxO signaling pathway (yellow). (167 nodes and 127 edges in total, PPI enrichment p-value = 0.02).

B. Gene ontology enrichment analysis of all 167 genes with conserved 3′UI proximal positioning. The most significant GO term within each of the following functional categories is shown: Interpro domains, Biological process and Reactome pathways.

C. A scatter plot showing fold changes for all proximal 3′UI transcripts (dark blue: FC > 1.5 and light blue: FC < 1.5) and all distal 3′UI genes (black: FC > 1.5 and gray: FC < 1.5) in UPF1 knockdown HEK293 cells compared to control cells using previously published data (160). Dots encircled in orange represent transcripts that also contain a uORF.

D. CDF plot showing change in mRNA stability for different classes of NMD targets and intron-less genes upon UPF1 knockdown in HEK293 cells (data from (160)). The gene classes are as follows: proximal 3′UI-containing genes where distance is 20-50 nts (sky blue), 30-50 nts (olive green) and 36-50 nts (dark blue), Ensembl-annotated NMD-biotype genes (red) and intron-less genes (black). KS test p-value for comparison of NMD targets to intron-less genes is indicated in the same color.

E. qRT-PCR analysis showing fold changes for proximal 3′UI-containing genes (CDKN1B, FOXO3, ULBP1, STX3 and HNRNPD) and distal 3′UI-containing genes (ARC and SRSF4) upon UPF1 (top) and EIF4A3 (bottom) knockdown in HCT116 cells. The distance between stop codon and 3′UI for every 3′UI-containing gene is indicated below each bar. TBP is the normalizing gene used for qPCR analysis. Welch t-test p-values are indicated using asterisks (** p-value < 0.05 and * p-value < 0.1).

*Michael Parthun has contributed to the bioinformatic efforts for identifying zebrafish, mouse and human 3′UI-containing genes.


Figure 3.14 Proximal 3′UI genes are upregulated in UPF1 knockdown ESCs and encode proteins that function in RNA metabolism. A. Histogram showing the frequency of all APPRIS 3′UI-containing transcripts in human GRCh38 as a function of the distance of the 3′UI from the stop codon. Data are grouped in 5-nt bins from 1-500 nts. Proximal 3′UI-containing gene bins are indicated in blue; distal 3′UI-containing gene bins are indicated in gray. Red dotted line indicates distance from stop codon to closest 3′UI = 50 nts.

B. Histogram as in A of mouse proximal and distal 3′UI-containing transcripts in mouse GRCm38.

C. Histogram as in A of human ENSEMBL proximal and distal 3′UI-containing transcripts in human GRCh38.

D. Histogram as in A of mouse ENSEMBL proximal and distal 3′UI-containing transcripts in mouse GRCm38.

E. PANTHER14.0 (139) gene ontology (GO) term enrichment analysis of APPRIS proximal 3′UI-containing genes (shades of blue) and all 3′UI-containing genes (shades of gray). All significant terms (Benjamini-Hochberg corrected p-value < 0.05) are shown for each set.

F. GO term enrichment analysis as in E of mouse APPRIS 3′UI-containing genes.

G. A scatter plot showing gene-level fold changes for all ENSEMBL-annotated transcripts with proximal 3′UI (dark blue: FC > 1.5 and light blue: FC < 1.5) and distal 3′UI (black: FC > 1.5 and gray: FC < 1.5) in UPF1 knockdown human embryonic kidney cells (HEK) compared to control cells using previously published data (160). Genes encircled in orange also contain a uORF as determined from a previously published dataset (see Methods).


Figure 3.15 Models for EJC-dependent NMD of proximal 3′UI-containing transcripts and of NMD-based regulation of foxo3b-dependent motor neuron outgrowth. A. Current (top) and proposed models (center and bottom) for EJC-dependent NMD. The current model (top) of EJC-dependent NMD states that decay is triggered when the exon-exon junction is at least 50 nts downstream of a terminated ribosome. Based on our work, we propose two models for the degradation of 3′UI-containing transcripts. The first model (center) is proposed for transcripts where the distance between the terminated ribosome and downstream exon-exon junctions is at least 36 nts, which can accommodate both the EJC and the ribosome for triggering decay. The second model (bottom) is proposed for transcripts where the distance between the terminated ribosome and downstream exon-exon junctions is less than 36 nts; in this case other downstream factors such as a non-canonical EJC (ncEJC) or EJC-interacting RBPs (e.g. SR proteins) may trigger decay. The ribosome (brown), direction of translation (black arrow), stop codon (‘UAA’ in white), EJC (green), EEJ (exon-exon junction), coding region of mRNA (black) and 3′UI of mRNA (gray) are labeled in the top panel.

B. A schematic depicting the genetic pathway identified in this work, in which EJC- and NMD-dependent regulation of foxo3b is critical for its function during development.

Chapter 4 Concluding Remarks and Future Directions

4.1 Summary of Findings and Significance

RNA-binding proteins (RBPs) play a crucial role in defining the cellular transcriptome, which in turn affects the cell proteome and cell fate. One such RBP complex, the Exon Junction Complex (EJC), influences many aspects of post-transcriptional regulation, including Nonsense Mediated mRNA Decay (NMD). During graduate school, I have focused on understanding how EJC-dependent NMD shapes zebrafish embryonic development. To investigate my scientific questions, I utilized the Singh lab’s custom RNA-Seq library preparation method, which I optimized to increase the efficiency of insert-containing cDNA extraction after adapter ligation and reverse transcription of input RNA (Chapter 2).

To characterize the zebrafish EJC, I optimized a RIP-seq (RNA-immunoprecipitation) protocol for purification of the EJC and sequencing of EJC footprints (Chapter 3, Figs 3.1A and 3.2B-C). I showed that the zebrafish EJC, like the human EJC, is composed of the three core proteins Eif4a3, Rbm8a and Magoh and is deposited ~24 nucleotides upstream of exon-exon junctions (Chapter 3, Fig 3.1). I then characterized zebrafish mutants for the genes encoding two EJC core proteins, rbm8a and magoh. I found that homozygous rbm8a and magoh mutants (EJC mutants) show defects in movement, muscle organization and motor neuron outgrowth (Chapter 3, Figs 3.3, 3.4 and 3.5). To identify the molecular defects of EJC mutants, I generated and analyzed whole-embryo RNA-Seq libraries from homozygous mutants and wild-type siblings (Chapter 3, Figs 3.6 and 3.7). The RNA-Seq data show that annotated NMD targets and transcripts upregulated in upf1 morphants are upregulated in EJC mutants, providing evidence for the dysregulation of NMD in the mutants (Chapter 3, Fig 3.7). I found that a new class of NMD targets, transcripts containing a proximal 3′UTR intron (3′UI), are upregulated in EJC mutants as well as upf1 morphants (Chapter 3, Figs 3.9 and 3.10). I then validated the NMD sensitivity of proximal 3′UI genes in embryos treated with a pharmacological inhibitor of NMD (NMDI14) (Chapter 3, Fig 3.9D). Analysis of NMD-deficient human and mouse RNA-Seq datasets also shows upregulation of proximal 3′UI transcripts upon dysregulation of NMD (Chapter 3, Figs 3.13 and 3.14). Over 150 genes consistently contain a proximal 3′UI in zebrafish, mouse and human, suggesting a conservation of function across evolution (Chapter 3, Fig 3.13A). The proteins encoded by these conserved proximal 3′UI genes are enriched for RNA-binding and nervous system functions (Chapter 3, Fig 3.13B). One conserved proximal 3′UI transcript, foxo3b, is upregulated in EJC mutants at the transcript and protein level (Chapter 3, Figs 3.9, 3.11B and 3.12B). I found that homozygous loss, and to a lesser extent heterozygous loss, of foxo3b in homozygous EJC mutants partially rescues the mutant motor neuron outgrowth defects (Chapter 3, Figs 3.11C-K and 3.12D-L). Thus, this work presents several lines of evidence for the discovery of proximal 3′UI genes as novel NMD targets and also provides evidence for the physiological relevance of one such NMD target, foxo3b.

4.1.1 Significance

The EJC is an mRNA-binding protein complex that is expressed ubiquitously and occupies about 75% of all exon-exon junctions in humans. Despite its ubiquitous expression pattern, mutations in genes encoding EJC core proteins cause tissue-specific developmental defects in humans. I have shown that, unlike in mice, the zebrafish system can be used to study rbm8a and magoh mutants, which eventually show loss of function of these genes as the maternal stores are depleted (Chapter 3, Figs 3.3, 3.4 and 3.5). Since the EJC is involved in several molecular processes and is essential for cell survival, studying the molecular causes underlying EJC-associated defects and diseases has been challenging, and these causes are thus not well understood. This work has established the zebrafish system for studying the function of the EJC and EJC-dependent NMD in the context of embryonic development. I have discovered a new class of NMD targets, transcripts with proximal 3′UIs, which are regulated by components of the EJC and NMD in zebrafish, mouse and human cells (Chapter 3, Figs 3.9, 3.10, 3.13 and 3.14). My work has led to the identification of over 150 genes that contain a proximal 3′UI in human, mouse and zebrafish and that encode proteins enriched for RNA binding and nervous system functions (Chapter 3, Figs 3.13A-B). I have illustrated the significance of EJC-dependent regulation of one proximal 3′UI gene, foxo3b, by providing evidence for its role in zebrafish motor neuron outgrowth in EJC mutants (Chapter 3, Figs 3.11 and 3.12).

4.2 Future Directions

4.2.1 Determine molecular cause of the muscle defects in zebrafish EJC mutants

I have shown that zebrafish rbm8a and magoh mutants show defects in myofibril organization (Chapter 3, Fig 3.5). In my work, I have not been able to identify the molecular causes underlying EJC muscle defects although my analyses so far present several directions that can be pursued.

Defects in muscle organization and contraction have also been observed in Xenopus eif4a3 morphants (47). Haremaki et al. showed that mis-splicing and reduced levels of the ryanodine receptor (ryr1) cause muscle paralysis in Xenopus eif4a3 morphants (48). RNA-Seq data from zebrafish rbm8a and magoh mutants do not show evidence for downregulation or mis-splicing of ryanodine receptor genes (ryr1a and ryr1b) (data not shown). One caveat of my current RNA-Seq data is that they consist of short reads (50 bp) and are thus not ideal for identifying global changes in splicing. To thoroughly test whether ryanodine receptor splicing is affected in the muscle cells of zebrafish EJC mutants, it will be critical to design RT-PCR and in situ hybridization experiments to identify the isoforms expressed in muscle cells.

To add to what is already known from studies conducted in Xenopus embryos, I proceeded to characterize the muscle defects in zebrafish EJC mutants. To determine whether zebrafish EJC mutants had fewer muscle cells than WT siblings, I stained rbm8a mutant and WT sibling embryos with antibodies that specifically detect muscle pioneer cells and slow muscle nuclei; I found no obvious differences between the two genotypes (Fig 4.1A-D). To determine whether EJC mutants had defects in both fast and slow muscle fibers, I stained rbm8a mutants and WT siblings with anti-EB165 (fast muscle fiber specific) and anti-BA-D5 (slow muscle fiber specific) and found that both fiber types are similarly affected (Fig 4.1E-M). RNA-Seq data from muscle cells of EJC mutant embryos and WT siblings (Chapter 3, Fig 3.6) may reveal changes in expression of muscle-specific genes and guide future experiments. So far, based on my whole-embryo RNA-Seq data, I have found that some genes encoding myosin light chains, such as myl10 and mylpfb, are downregulated in EJC mutants, suggesting that reduced amounts of myosin light chain RNA and protein may contribute to myofiber disorganization in EJC mutants. Reduction in mylpfb could also make muscles sensitive to severe muscle defects (unpublished, Amacher lab). Also, genes encoding muscle cell fate regulators such as cdkn1ba (Chapter 3, Fig 3.9) and myog (Fig 4.1N) are upregulated in the EJC mutants. cdkn1ba, a proximal 3′UI-containing gene, encodes a protein that upon overexpression has been shown to cause a decrease in muscle fiber diameter in mice (178). myog encodes a homeodomain transcription factor that has been implicated in muscle cell development and in balancing fiber number and size in zebrafish (75,179). Zebrafish myog can encode an aberrant transcript containing a distal 3′UI, an ‘NMD-inducing feature’. This aberrant isoform of myog is expressed at low levels compared to normal myog in WT siblings (top band in myog panel in Fig 4.1N) and is predicted to be translated into a protein without a homeodomain, unlike normal Myog (data not shown). The aberrant myog is upregulated in rbm8a mutants compared to WT siblings, which is expected since NMD is dysregulated in EJC mutants (Chapter 3, Fig 3.7). To assess the impact of aberrant myog or cdkn1ba upregulation in rbm8a mutants, western blots can be performed to determine protein expression. Also, myog or cdkn1ba or both can be knocked out conditionally in the muscle cells of EJC mutants to test for rescue of the muscle defects. Further, aberrant myog and cdkn1ba can also be overexpressed in wild-type zebrafish to test whether they cause muscle defects similar to those observed in EJC mutants.

Figure 4.1 Characterization of zebrafish rbm8a mutant muscle.

A-B. Merged confocal images showing motor myofibers (red; detected by anti-A4.1025 staining) and muscle pioneer cells (green; detected by anti-4D9 staining) in WT siblings (A) and rbm8a mutants (B).

C-D. Merged confocal images showing myofibers (red; detected by anti-A4.1025 staining) and slow muscle nuclei (green; detected by anti-prox1 staining) in WT siblings (C) and rbm8a mutants (D).

E-G. Confocal images showing fast muscle fiber immunofluorescence using anti-EB165 (white) in WT siblings (E) and rbm8a mutants (F-G). Fast muscle fiber staining shows a moderate (F) and severe (G) phenotype in rbm8a mutants.

H-J. Confocal images showing slow muscle fiber immunofluorescence using anti-BA-D5 (white) in WT siblings (H) and rbm8a mutants (I-J). Slow muscle fiber staining shows a moderate (I) and severe (J) phenotype in rbm8a mutants.

K-M. Merged confocal images showing fast muscle myofibers (green; detected by anti-EB165 staining) and slow muscle myofibers (green; detected by anti-BA-D5 staining) in WT siblings (K) and rbm8a mutants (L-M).

N. RT-PCR gel showing transcripts of foxo3b (proximal 3′UI), myog (aberrant distal 3′UI-containing transcript (top) and normal myog transcript with no known NMD-inducing feature (bottom)), eif4a2 (aberrant distal 3′UI-containing) and rpl13 (control) in rbm8a mutants and WT siblings at 21 and 27 hpf.

4.2.2 Identifying the mechanism of proximal 3′UI transcript decay

We observe that several proximal and distal 3′UI-containing transcripts are unchanged or even downregulated, likely due to secondary effects, upon loss of EJC and NMD function (Chapter 3, Figs 3.9, 3.10, 3.13 and 3.14). Thus, it is critical to identify additional features in proximal and distal 3′UI-containing transcripts that target them to EJC-dependent NMD. My current models for the EJC-dependent decay of proximal 3′UI-containing transcripts differ based on the distance of the 3′UI from the stop codon (Chapter 3, Fig 3.15). Based on existing ribosome footprinting data, I hypothesize that if the distance of the 3′UI from the stop codon is ≥ 36 nts, the ribosome would terminate at the stop codon and come in direct contact with the EJC, which can trigger NMD by the established pathway. However, if the distance is < 36 nts, the terminated ribosome would displace or dissociate the canonical EJC, and in this scenario non-canonical EJCs or other EJC-interacting proteins (e.g. SR proteins) in the 3′UTR could trigger NMD. To test whether non-canonical EJCs contribute to the instability of proximal 3′UI-containing transcripts, a bioinformatic analysis can be conducted to determine whether highly unstable proximal 3′UI-containing transcripts contain a higher number of exons in the coding region and harbor more EJC footprints in the 3′UTR.
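The distance-based logic of these models can be written down as a small classifier. The following is a minimal sketch only: the 50-nt and 36-nt cutoffs come from the models described above (Chapter 3, Fig 3.15), while the function name, the label strings, and the choice to measure distance as nucleotides from the stop codon to the downstream exon-exon junction are assumptions made for illustration.

```python
# Cutoffs from the models in Fig 3.15; all names below are hypothetical.
CANONICAL_MIN_NT = 50  # current model: junction at least 50 nts downstream
PROPOSED_MIN_NT = 36   # proposed: room for both the terminated ribosome and the EJC


def classify_3ui(stop_to_junction_nt: int) -> str:
    """Return the decay model suggested by the stop-to-junction distance (in nts)."""
    if stop_to_junction_nt < 0:
        raise ValueError("distance must be non-negative")
    if stop_to_junction_nt >= CANONICAL_MIN_NT:
        return "current model"
    if stop_to_junction_nt >= PROPOSED_MIN_NT:
        return "proposed model 1: EJC and ribosome both accommodated"
    return "proposed model 2: ncEJC or EJC-interacting RBPs"


# Example distances, invented for illustration.
for distance in (60, 40, 20):
    print(distance, classify_3ui(distance))
```

Applied to an annotation table of stop codon and junction coordinates, such a function would split proximal 3′UI-containing transcripts into the two groups whose exon counts and 3′UTR EJC footprints are to be compared.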

Another possibility is that specific sequences within the 3′UTRs of proximal 3′UI-containing transcripts lead to binding of EJC-interacting proteins (e.g. SR proteins) that confer instability to these transcripts. Sequences over-represented in the 3′UTRs of proximal 3′UI-containing transcripts can be identified by a systematic bioinformatic analysis using the MEME suite. To identify 3′UTR sequences that confer instability, it will be critical to compare MEME results obtained from 3′UTRs of proximal 3′UI-containing transcripts that are upregulated upon dysregulation of NMD to those that remain unchanged. After identifying sequences using MEME analysis, the Tomtom program within the MEME suite can be utilized to predict proteins that bind to the sequences over-represented in EJC- and NMD-responsive proximal 3′UI-containing transcripts. The 3′UTR sequences identified by the bioinformatic analysis can then be tested by cloning 3′UI-containing 3′UTRs that contain these sequences into a β-globin reporter. Such a reporter and its counterpart that does not contain the 3′UTR sequences can be expressed in mammalian cells under conditions of UPF1 knockdown to test whether the sequences confer instability to the reporters. In addition to the in silico approach described above, one can conduct a screen using the β-globin reporter system to test a wide range of reporters containing 3′UTRs of sensitive and insensitive proximal 3′UI-containing genes. An experiment comparing the sensitive to the insensitive proximal 3′UI-containing reporters can lead to the identification of sequences or features in the 3′UTR that confer instability.
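The comparison between NMD-sensitive and NMD-insensitive 3′UTRs can be illustrated with a much simpler stand-in for motif discovery. The sketch below does not run MEME or Tomtom; it only counts how many sequences in each group contain a given k-mer and ranks k-mers by the ratio of those frequencies. The function names, the k-mer length, the pseudocount and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def kmer_presence(seqs, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def enriched_kmers(sensitive, insensitive, k=6, pseudocount=1.0):
    """Rank k-mers by presence frequency in sensitive vs insensitive 3'UTRs."""
    sens = kmer_presence(sensitive, k)
    insens = kmer_presence(insensitive, k)
    scores = {}
    for kmer in set(sens) | set(insens):
        f_sens = (sens[kmer] + pseudocount) / (len(sensitive) + pseudocount)
        f_insens = (insens[kmer] + pseudocount) / (len(insensitive) + pseudocount)
        scores[kmer] = f_sens / f_insens
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy 3'UTR fragments, invented for illustration only.
sensitive_utrs = ["AAGGAGGACC", "TTGGAGGATT", "CGGAGGACGT"]
insensitive_utrs = ["AACCTTAACC", "TTTTCCCCAA", "ACGTACGTAC"]
top_kmer, top_score = enriched_kmers(sensitive_utrs, insensitive_utrs, k=6)[0]
print(top_kmer, top_score)
```

A ranking of this kind could serve as a quick sanity check before a full MEME run, but it ignores degenerate positions and background nucleotide composition, which is why a dedicated motif discovery tool remains the proposed approach.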

4.2.3 Determining the molecular mechanism of foxo3b-mediated rescue of EJC mutant motor neuron outgrowth defects

I have shown that zebrafish EJC mutants have stunted motor neurons and that knockout of foxo3b in EJC mutants partially rescues the motor neuron outgrowth defects (Chapter 3, Figs 3.11G-K and 3.12H-L). However, the knockout of foxo3b in EJC mutants does not rescue the muscle defects (Chapter 3, Figs 3.11C-F and 3.12D-G), even though upregulation of Foxo3b may contribute to the EJC mutant muscle defects: atrophy of muscles is seen upon constitutive expression of FOXO3a (ortholog of zebrafish foxo3b) in mouse adult tibialis anterior muscles (180).

Whole-embryo RNA-Seq data show that foxo3b is upregulated in EJC mutants (Chapter 3, Fig 3.9 and Fig 3.12A). Whole-embryo western blots show that Foxo3b is upregulated in EJC mutants (Chapter 3, Fig 3.11B and 3.12B), but preliminary whole-mount immunofluorescence experiments do not show accumulation of the protein in any specific tissue (data not shown). Further evidence for an increase in Foxo3b activity is that RNA-Seq shows upregulation of genes previously identified as transcriptional targets of FOXO proteins in humans. These genes encode proteins involved in cell cycle arrest (rbl2, cdkn1ba), mTORC suppression (sesn3), ROS detoxification (prdx3), and apoptosis (bbc3) (Chapter 3, Fig 3.12C). How accumulation of Foxo3b in zebrafish embryos inhibits motor neuron outgrowth at a molecular level is unknown. The mechanism underlying foxo3b-mediated inhibition of motor neuron outgrowth can be investigated by testing whether mRNA and protein of the foxo3b targets described above are localized to motor neurons or neuromuscular junctions in EJC mutants and whether expression of these targets is increased in motor neurons or neuromuscular junctions of EJC mutants.

Motor neuron outgrowth is dependent not only on specific gene expression within the motor neuron growth cones but also on specific signaling between the growing axon and the skeletal muscle (85). Since foxo3b/EJC double mutants show only a partial rescue of axonal growth and no rescue of muscle, one hypothesis is that knockout of foxo3b in EJC mutants rescues gene expression in the motor neurons, leading to longer axons. The EJC mutant RNA-Seq data can be used as a starting point to systematically determine whether genes encoding proteins that localize to growth cones are differentially expressed. Additionally, RNA-Seq experiments can be conducted on neural cells of EJC mutants, WT siblings and foxo3b/EJC double mutants to identify the molecular rescue in double mutants, if any, in an unbiased manner.
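One way to define "molecular rescue" in such a three-genotype comparison is sketched below. It is a minimal illustration, not an analysis performed in this work: it assumes normalized expression values per gene for each genotype, reuses the 1.5-fold cutoff applied in the Chapter 3 scatter plots, and considers only genes that are upregulated in EJC mutants. The function name, gene names and expression values are hypothetical.

```python
FC_CUTOFF = 1.5  # fold-change cutoff borrowed from the Chapter 3 scatter plots


def rescued_genes(wt, mutant, double_mutant, cutoff=FC_CUTOFF):
    """Return genes up in the EJC mutant but within the cutoff in the double mutant.

    Each argument maps gene name -> normalized expression (hypothetical units).
    """
    rescued = []
    for gene, wt_level in wt.items():
        if gene not in mutant or gene not in double_mutant or wt_level <= 0:
            continue  # skip genes that cannot be compared across all genotypes
        fc_mutant = mutant[gene] / wt_level
        fc_double = double_mutant[gene] / wt_level
        if fc_mutant > cutoff and fc_double <= cutoff:
            rescued.append(gene)
    return sorted(rescued)


# Toy values, invented for illustration only.
wt_expr = {"geneA": 10.0, "geneB": 10.0, "geneC": 10.0}
ejc_expr = {"geneA": 30.0, "geneB": 30.0, "geneC": 11.0}
double_expr = {"geneA": 12.0, "geneB": 28.0, "geneC": 10.0}
print(rescued_genes(wt_expr, ejc_expr, double_expr))
```

In a real dataset the fold changes and their significance would come from a differential expression tool with replicate-aware statistics; the simple ratio here only conveys the logic of the comparison.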

4.3 Conclusions

This work investigates how regulation of gene expression via decay shapes embryonic development. Specifically, this work focuses on understanding how mRNA decay mediated by the Exon Junction Complex (EJC) affects zebrafish embryonic development.

This work contributes to our understanding of the sensitivity of tissues to the loss of ubiquitously expressed RNA-binding proteins such as the EJC; however, this understanding remains incomplete. Identifying the tissue-specific targets of ubiquitously expressed RBPs is critical for unraveling the mechanisms underlying the diseases associated with these proteins.

Bibliography

1. Brinegar AE, Cooper TA. Roles for RNA-binding proteins in development and disease. Brain Res. 2016 15;1647:1–8.

2. Lennox AL, Mao H, Silver DL. RNA on the brain: emerging layers of post-transcriptional regulation in cerebral cortex development. Wiley Interdiscip Rev Dev Biol. 2018;7(1).

3. Corbett AH. Post-transcriptional regulation of gene expression and human disease. Curr Opin Cell Biol. 2018 Jun;52:96–104.

4. Lukong KE, Chang K, Khandjian EW, Richard S. RNA-binding proteins in human genetic disease. Trends Genet. 2008 Aug 1;24(8):416–25.

5. Gerstberger S, Hafner M, Ascano M, Tuschl T. Evolutionary Conservation and Expression of Human RNA-Binding Proteins and Their Role in Human Genetic Disease. Adv Exp Med Biol. 2014;825:1–55.

6. Woodward LA, Mabin JW, Gangras P, Singh G. The exon junction complex: a lifelong guardian of mRNA fate. Wiley Interdiscip Rev RNA. 2016 Dec 23;

7. Le Hir H, Saulière J, Wang Z. The exon junction complex as a node of post-transcriptional networks. Nat Rev Mol Cell Biol. 2016 Jan;17(1):41–54.

8. Boehm V, Gehring NH. Exon Junction Complexes: Supervising the Gene Expression Assembly Line. Trends Genet TIG. 2016;32(11):724–35.

9. Singh G, Kucukural A, Cenik C, Leszyk JD, Shaffer SA, Weng Z, et al. The cellular EJC interactome reveals higher-order mRNP structure and an EJC-SR protein nexus. Cell. 2012 Nov 9;151(4):750–64.

10. Martin R, Straub AU, Doebele C, Bohnsack MT. DExD/H-box RNA helicases in ribosome biogenesis. RNA Biol. 2013 Jan 1;10(1):4–18.

11. Alexandrov A, Colognori D, Steitz JA. Human eIF4AIII interacts with an eIF4G-like partner, NOM1, revealing an evolutionarily conserved function outside the exon junction complex. Genes Dev. 2011 May 15;25(10):1078–90.

12. Boisramé A, Devillers H, Onésime D, Brunel F, Pouch J, Piot M, et al. Exon junction complex components Y14 and Mago still play a role in budding yeast. Sci Rep. 2019 Jan 29;9(1):1–18.

13. Carter MS, Li S, Wilkinson MF. A splicing-dependent regulatory mechanism that detects translation signals. EMBO J. 1996 Nov 1;15(21):5965–75.

14. Zhang J, Sun X, Qian Y, LaDuca JP, Maquat LE. At least one intron is required for the nonsense-mediated decay of triosephosphate isomerase mRNA: a possible link between nuclear splicing and cytoplasmic translation. Mol Cell Biol. 1998 Sep;18(9):5272–83.

15. Thermann R, Neu-Yilik G, Deters A, Frede U, Wehr K, Hagemeier C, et al. Binary specification of nonsense codons by splicing and cytoplasmic translation. EMBO J. 1998 Jun 15;17(12):3484–94.

16. Le Hir H, Moore MJ, Maquat LE. Pre-mRNA splicing alters mRNP composition: evidence for stable association of proteins at exon-exon junctions. Genes Dev. 2000 May 1;14(9):1098–108.

17. Le Hir H, Izaurralde E, Maquat LE, Moore MJ. The spliceosome deposits multiple proteins 20-24 nucleotides upstream of mRNA exon-exon junctions. EMBO J. 2000 Dec 15;19(24):6860–9.

18. Kataoka N, Yong J, Kim VN, Velazquez F, Perkinson RA, Wang F, et al. Pre-mRNA splicing imprints mRNA in the nucleus with a novel RNA-binding protein that persists in the cytoplasm. Mol Cell. 2000 Sep;6(3):673–82.

19. Maquat LE. Nonsense-mediated mRNA decay: splicing, translation and mRNP dynamics. Nat Rev Mol Cell Biol. 2004 Feb;5(2):89–99.

20. Karousis ED, Nasif S, Mühlemann O. Nonsense-mediated mRNA decay: novel mechanistic insights and biological impact. Wiley Interdiscip Rev RNA. 2016;7(5):661–82.

21. Hwang J, Sato H, Tang Y, Matsuda D, Maquat LE. UPF1 association with the cap-binding protein, CBP80, promotes nonsense-mediated mRNA decay at two distinct steps. Mol Cell. 2010 Aug 13;39(3):396–409.

22. Durand S, Lykke-Andersen J. Nonsense-mediated mRNA decay occurs during eIF4F-dependent translation in human cells. Nat Struct Mol Biol. 2013 Jun;20(6):702–9.

23. Rufener SC, Mühlemann O. eIF4E-bound mRNPs are substrates for nonsense-mediated mRNA decay in mammalian cells. Nat Struct Mol Biol. 2013 Jun;20(6):710–7.

24. He F, Jacobson A. Nonsense-Mediated mRNA Decay: Degradation of Defective Transcripts Is Only Part of the Story. Annu Rev Genet. 2015;49:339–66.

25. Schweingruber C, Rufener SC, Zünd D, Yamashita A, Mühlemann O. Nonsense-mediated mRNA decay - mechanisms of substrate mRNA recognition and degradation in mammalian cells. Biochim Biophys Acta. 2013 Jul;1829(6–7):612–23.

26. Lykke-Andersen S, Jensen TH. Nonsense-mediated mRNA decay: an intricate machinery that shapes transcriptomes. Nat Rev Mol Cell Biol. 2015 Nov;16(11):665–77.

27. Giorgi C, Yeo GW, Stone ME, Katz DB, Burge C, Turrigiano G, et al. The EJC factor eIF4AIII modulates synaptic strength and neuronal protein expression. Cell. 2007 Jul 13;130(1):179–91.

28. Bicknell AA, Cenik C, Chua HN, Roth FP, Moore MJ. Introns in UTRs: why we should stop ignoring them. BioEssays News Rev Mol Cell Dev Biol. 2012 Dec;34(12):1025–34.

29. Newmark PA, Boswell RE. The mago nashi locus encodes an essential product required for germ plasm assembly in Drosophila. Dev Camb Engl. 1994 May;120(5):1303–13.

30. Newmark PA, Mohr SE, Gong L, Boswell RE. mago nashi mediates the posterior follicle cell-to-oocyte signal to organize axis formation in Drosophila. Dev Camb Engl. 1997 Aug;124(16):3197–207.

31. Micklem DR, Dasgupta R, Elliott H, Gergely F, Davidson C, Brand A, et al. The mago nashi gene is required for the polarisation of the oocyte and the formation of perpendicular axes in Drosophila. Curr Biol CB. 1997 Jul 1;7(7):468–78.

32. Kugler J-M, Lasko P. Localization, anchoring and translational control of oskar, gurken, bicoid and nanos mRNA during Drosophila oogenesis. Fly (Austin). 2009 Mar;3(1):15–28.

33. Parma DH, Bennett PE, Boswell RE. Mago Nashi and Tsunagi/Y14, respectively, regulate Drosophila germline stem cell differentiation and oocyte specification. Dev Biol. 2007 Aug 15;308(2):507–19.

34. Castori M, Cascone P, Brinelli M, Iannetti G, Grammatico P. The nosology of Richieri-Costa/Guion-Almeida syndrome(s). Am J Med Genet A. 2011 Feb;155A(2):398–402.

35. Favaro FP, Zechi-Ceide RM, Alvarez CW, Maximino LP, Antunes LFBB, Richieri-Costa A, et al. Richieri-Costa-Pereira syndrome: a unique acrofacial dysostosis type. An overview of the Brazilian cases. Am J Med Genet A. 2011 Feb;155A(2):322–31.

36. Favaro FP, Alvizi L, Zechi-Ceide RM, Bertola D, Felix TM, de Souza J, et al. A noncoding expansion in EIF4A3 causes Richieri-Costa-Pereira syndrome, a craniofacial disorder associated with limb defects. Am J Hum Genet. 2014 Jan 2;94(1):120–8.

37. Brunetti-Pierri N, Berg JS, Scaglia F, Belmont J, Bacino CA, Sahoo T, et al. Recurrent reciprocal 1q21.1 deletions and duplications associated with microcephaly or macrocephaly and developmental and behavioral abnormalities. Nat Genet. 2008 Dec;40(12):1466–71.

38. Mefford HC, Sharp AJ, Baker C, Itsara A, Jiang Z, Buysse K, et al. Recurrent rearrangements of chromosome 1q21.1 and variable pediatric phenotypes. N Engl J Med. 2008 Oct 16;359(16):1685–99.

39. International Schizophrenia Consortium. Rare chromosomal deletions and duplications increase risk of schizophrenia. Nature. 2008 Sep 11;455(7210):237–41.

40. Nguyen LS, Kim H-G, Rosenfeld JA, Shen Y, Gusella JF, Lacassie Y, et al. Contribution of copy number variants involving nonsense-mediated mRNA decay pathway genes to neuro-developmental disorders. Hum Mol Genet. 2013 May 1;22(9):1816–25.

41. Greenhalgh KL, Howell RT, Bottani A, Ancliff PJ, Brunner HG, Verschuuren-Bemelmans CC, et al. Thrombocytopenia-absent radius syndrome: a clinical genetic study. J Med Genet. 2002 Dec;39(12):876–81.

42. Albers CA, Paul DS, Schulze H, Freson K, Stephens JC, Smethurst PA, et al. Compound inheritance of a low-frequency regulatory SNP and a rare null mutation in exon-junction complex subunit RBM8A causes TAR syndrome. Nat Genet. 2012 Apr;44(4):435–9, S1-2.

43. Mao H, Pilaz L-J, McMahon JJ, Golzio C, Wu D, Shi L, et al. Rbm8a haploinsufficiency disrupts embryonic cortical development resulting in microcephaly. J Neurosci Off J Soc Neurosci. 2015 May 6;35(18):7003–18.

44. Mao H, McMahon JJ, Tsai Y-H, Wang Z, Silver DL. Haploinsufficiency for Core Exon Junction Complex Components Disrupts Embryonic Neurogenesis and Causes p53-Mediated Microcephaly. PLoS Genet [Internet]. 2016 Sep 12;12(9). Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5019403/

45. Silver DL, Leeds KE, Hwang H-W, Miller EE, Pavan WJ. The EJC component Magoh regulates proliferation and expansion of neural crest-derived melanocytes. Dev Biol. 2013 Mar 15;375(2):172–81.

46. Silver DL, Watkins-Chow DE, Schreck KC, Pierfelice TJ, Larson DM, Burnetti AJ, et al. The exon junction complex component Magoh controls brain size by regulating neural stem cell division. Nat Neurosci. 2010 May;13(5):551–8.

47. Haremaki T, Sridharan J, Dvora S, Weinstein DC. Regulation of vertebrate embryogenesis by the exon junction complex core component Eif4a3. Dev Dyn Off Publ Am Assoc Anat. 2010 Jul;239(7):1977–87.

48. Haremaki T, Weinstein DC. Eif4a3 is required for accurate splicing of the Xenopus laevis ryanodine receptor pre-mRNA. Dev Biol. 2012 Dec 1;372(1):103–10.

49. Pilaz L-J, McMahon JJ, Miller EE, Lennox AL, Suzuki A, Salmon E, et al. Prolonged Mitosis of Neural Progenitors Alters Cell Fate in the Developing Brain. Neuron. 2016 Jan 6;89(1):83–99.

50. Mohr SE, Dillon ST, Boswell RE. The RNA-binding protein Tsunagi interacts with Mago Nashi to establish polarity and localize oskar mRNA during Drosophila oogenesis. Genes Dev. 2001 Nov 1;15(21):2886–99.

51. Zou D, McSweeney C, Sebastian A, Reynolds DJ, Dong F, Zhou Y, et al. A critical role of RBM8a in proliferation and differentiation of embryonic neural progenitors. Neural Develop. 2015;10:18.

52. Kimball C, Powers K, Dustin J, Poirier V, Pellettieri J. The exon junction complex is required for stem and progenitor cell maintenance in planarians. Dev Biol [Internet]. 2019 Sep 23; Available from: http://www.sciencedirect.com/science/article/pii/S0012160619300296

53. Pascuan C, Frare R, Alleva K, Ayub ND, Soto G. mRNA biogenesis-related eIF4AIII from is an important factor for abiotic stress adaptation. Plant Cell Rep. 2016 May;35(5):1205–8.

54. Cilano K, Mazanek Z, Khan M, Metcalfe S, Zhang X-N. A New Mutation, hap1-2, Reveals a C Terminal Domain Function in AtMago Protein and Its Biological Effects in Male Gametophyte Development in Arabidopsis thaliana. PloS One. 2016;11(2):e0148200.

55. Park N-I, Yeung EC, Muench DG. Mago Nashi is involved in meristem organization, pollen formation, and seed development in Arabidopsis. Plant Sci Int J Exp Plant Biol. 2009 Apr;176(4):461–9.

56. Tarpey PS, Raymond FL, Nguyen LS, Rodriguez J, Hackett A, Vandeleur L, et al. Mutations in UPF3B, a member of the nonsense-mediated mRNA decay complex, cause syndromic and nonsyndromic mental retardation. Nat Genet. 2007 Sep;39(9):1127–33.

57. Addington AM, Gauthier J, Piton A, Hamdan FF, Raymond A, Gogtay N, et al. A novel frameshift mutation in UPF3B identified in brothers affected with childhood onset schizophrenia and autism spectrum disorders. Mol Psychiatry. 2011 Mar;16(3):238–9.

58. Laumonnier F, Shoubridge C, Antar C, Nguyen LS, Van Esch H, Kleefstra T, et al. Mutations of the UPF3B gene, which encodes a protein widely expressed in neurons, are associated with nonspecific mental retardation with or without autism. Mol Psychiatry. 2010 Jul;15(7):767–76.

59. Jolly LA, Homan CC, Jacob R, Barry S, Gecz J. The UPF3B gene, implicated in intellectual disability, autism, ADHD and childhood onset schizophrenia regulates neural progenitor cell behaviour and neuronal outgrowth. Hum Mol Genet. 2013 Dec 1;22(23):4673–87.

60. Xu X, Zhang L, Tong P, Xun G, Su W, Xiong Z, et al. Exome sequencing identifies UPF3B as the causative gene for a Chinese non-syndrome mental retardation pedigree. Clin Genet. 2013;83(6):560–4.

61. Medghalchi SM, Frischmeyer PA, Mendell JT, Kelly AG, Lawler AM, Dietz HC. Rent1, a trans-effector of nonsense-mediated mRNA decay, is essential for mammalian embryonic viability. Hum Mol Genet. 2001 Jan 15;10(2):99–105.

62. McIlwain DR, Pan Q, Reilly PT, Elia AJ, McCracken S, Wakeham AC, et al. Smg1 is required for embryogenesis and regulates diverse genes via alternative splicing coupled to nonsense-mediated mRNA decay. Proc Natl Acad Sci. 2010 Jul 6;107(27):12186–91.

63. Roberts TL, Ho U, Luff J, Lee CS, Apte SH, MacDonald KPA, et al. Smg1 haploinsufficiency predisposes to tumor formation and inflammation. Proc Natl Acad Sci. 2013 Jan 22;110(4):1151–2.

64. Thoren LA, Nørgaard GA, Weischenfeldt J, Waage J, Jakobsen JS, Damgaard I, et al. UPF2 Is a Critical Regulator of Liver Development, Function and Regeneration. PLOS ONE. 2010 Jul 19;5(7):e11650.

65. Li T, Shi Y, Wang P, Guachalla LM, Sun B, Joerss T, et al. Smg6/Est1 licenses embryonic stem cell differentiation via nonsense-mediated mRNA decay. EMBO J. 2015 Jun 12;34(12):1630–47.

66. Bruno IG, Karam R, Huang L, Bhardwaj A, Lou CH, Shum EY, et al. Identification of a MicroRNA that Activates Gene Expression by Repressing Nonsense-Mediated RNA Decay. Mol Cell. 2011 May 20;42(4):500–10.

67. Colak D, Ji S-J, Porse BT, Jaffrey SR. Regulation of axon guidance by compartmentalized nonsense-mediated mRNA decay. Cell. 2013 Jun 6;153(6):1252–65.

68. Alrahbeni T, Sartor F, Anderson J, Miedzybrodzka Z, McCaig C, Müller B. Full UPF3B function is critical for neuronal differentiation of neural stem cells. Mol Brain [Internet]. 2015 May 27 [cited 2016 Mar 7];8. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4445987/

69. Weischenfeldt J, Damgaard I, Bryder D, Theilgaard-Mönch K, Thoren LA, Nielsen FC, et al. NMD is essential for hematopoietic stem and progenitor cells and for eliminating by-products of programmed DNA rearrangements. Genes Dev. 2008 May 15;22(10):1381–96.

70. Gudikote JP, Wilkinson MF. T-cell receptor sequences that elicit strong down-regulation of premature termination codon-bearing transcripts. EMBO J. 2002 Jan 15;21(1–2):125–34.

71. Carter MS, Doskow J, Morris P, Li S, Nhim RP, Sandstedt S, et al. A Regulatory Mechanism That Detects Premature Nonsense Codons in T-cell Receptor Transcripts in Vivo Is Reversed by Protein Synthesis Inhibitors in Vitro. J Biol Chem. 1995 Dec 1;270(48):28995–9003.

72. Lou CH, Shao A, Shum EY, Espinoza JL, Huang L, Karam R, et al. Posttranscriptional Control of the Stem Cell and Neurogenic Programs by the Nonsense-Mediated RNA Decay Pathway. Cell Rep. 2014 Feb 27;6(4):748–64.

73. Lou C-H, Dumdie J, Goetz A, Shum EY, Brafman D, Liao X, et al. Nonsense-Mediated RNA Decay Influences Human Embryonic Stem Cell Fate. Stem Cell Rep. 2016 Jun 14;6(6):844–57.

74. Kimmel CB, Ballard WW, Kimmel SR, Ullmann B, Schilling TF. Stages of embryonic development of the zebrafish. Dev Dyn Off Publ Am Assoc Anat. 1995 Jul;203(3):253–310.

75. Jackson HE, Ingham PW. Control of muscle fibre-type diversity during embryonic development: the zebrafish paradigm. Mech Dev. 2013 Oct;130(9–10):447–57.

76. Hirata H, Watanabe T, Hatakeyama J, Sprague SM, Saint-Amant L, Nagashima A, et al. Zebrafish relatively relaxed mutants have a ryanodine receptor defect, show slow swimming and provide a model of multi-minicore disease. Development. 2007 Aug 1;134(15):2771–81.

77. Pietri T, Manalo E, Ryan J, Saint-Amant L, Washbourne P. Glutamate drives the touch response through a rostral loop in the spinal cord of zebrafish embryos. Dev Neurobiol. 2009 Oct;69(12):780–95.

78. Saint-Amant L, Drapeau P. Time course of the development of motor behaviors in the zebrafish embryo. J Neurobiol. 1998 Dec;37(4):622–32.

79. Saint-Amant L, Drapeau P. Motoneuron activity patterns related to the earliest behavior of the zebrafish embryo. J Neurosci Off J Soc Neurosci. 2000 Jun 1;20(11):3964–72.

80. Saint-Amant L, Drapeau P. Synchronization of an Embryonic Network of Identified Spinal Interneurons Solely by Electrical Coupling. Neuron. 2001 Sep 27;31(6):1035–46.

81. Naganawa Y, Hirata H. Developmental transition of touch response from slow muscle-mediated coilings to fast muscle-mediated burst swimming in zebrafish. Dev Biol. 2011 Jul 15;355(2):194–204.

82. Myers PZ, Eisen JS, Westerfield M. Development and axonal outgrowth of identified motoneurons in the zebrafish. J Neurosci Off J Soc Neurosci. 1986 Aug;6(8):2278–89.

83. Westerfield M, McMurray JV, Eisen JS. Identified motoneurons and their innervation of axial muscles in the zebrafish. J Neurosci. 1986 Aug 1;6(8):2267–77.

84. Eisen JS. Developmental neurobiology of the zebrafish. J Neurosci. 1991 Feb 1;11(2):311–7.

85. Beattie CE. Control of motor axon guidance in the zebrafish embryo. Brain Res Bull. 2000 Nov;53(5):489–500.

86. Feldner J, Reimer MM, Schweitzer J, Wendik B, Meyer D, Becker T, et al. PlexinA3 Restricts Spinal Exit Points and Branching of Trunk Motor Nerves in Embryonic Zebrafish. J Neurosci. 2007 May 2;27(18):4978–83.

87. Zhang X, Tang N, Hadden TJ, Rishi AK. Akt, FoxO and regulation of apoptosis. Biochim Biophys Acta BBA - Mol Cell Res. 2011 Nov 1;1813(11):1978–86.

88. Lin S-J, Chiang M-C, Shih H-Y, Chiang K-C, Cheng Y-C. Spatiotemporal expression of foxo4, foxo6a, and foxo6b in the developing brain and retina are transcriptionally regulated by PI3K signaling in zebrafish. Dev Genes Evol. 2017;227(3):219–30.

89. Paik J-H, Kollipara R, Chu G, Ji H, Xiao Y, Ding Z, et al. FoxOs are lineage-restricted redundant tumor suppressors and regulate endothelial cell homeostasis. Cell. 2007 Jan 26;128(2):309–23.

90. Xie X, Liu J-X, Hu B, Xiao W. Zebrafish foxo3b negatively regulates canonical Wnt signaling to affect early embryogenesis. PloS One. 2011;6(9):e24469.

91. Morris BJ, Willcox DC, Donlon TA, Willcox BJ. FOXO3: A Major Gene for Human Longevity--A Mini-Review. Gerontology. 2015;61(6):515–25.

92. Webb AE, Kundaje A, Brunet A. Characterization of the direct targets of FOXO transcription factors throughout evolution. Aging Cell. 2016;15(4):673–85.

93. Gómez-Puerto MC, Verhagen LP, Braat AK, Lam EW-F, Coffer PJ, Lorenowicz MJ. Activation of autophagy by FOXO3 regulates redox homeostasis during osteogenic differentiation. Autophagy. 2016 02;12(10):1804–16.

94. Hosaka T, Biggs WH, Tieu D, Boyer AD, Varki NM, Cavenee WK, et al. Disruption of forkhead transcription factor (FOXO) family members in mice reveals their functional diversification. Proc Natl Acad Sci. 2004 Mar 2;101(9):2975–80.

95. Liu X, Cai X, Hu B, Mei Z, Zhang D, Ouyang G, et al. Forkhead Transcription Factor 3a (FOXO3a) Modulates Hypoxia Signaling via Up-regulation of the von Hippel-Lindau Gene (VHL). J Biol Chem. 2016 Dec 2;291(49):25692–705.

96. Liu X, Cai X, Zhang D, Xu C, Xiao W. Zebrafish foxo3b Negatively Regulates Antiviral Response through Suppressing the Transactivity of irf3 and irf7. J Immunol Baltim Md 1950. 2016 15;197(12):4736–49.

97. Wang H, Li Y, Wang S, Zhang Q, Zheng J, Yang Y, et al. Knockdown of transcription factor forkhead box O3 (FOXO3) suppresses erythroid differentiation in human cells and zebrafish. Biochem Biophys Res Commun. 2015 May 15;460(4):923–30.

98. Shimizu H, Langenbacher AD, Huang J, Wang K, Otto G, Geisler R, et al. The Calcineurin-FoxO-MuRF1 signaling pathway regulates myofibril integrity in cardiomyocytes. Yelon D, editor. eLife. 2017 Aug 19;6:e27955.

99. Webb AE, Brunet A. FOXO transcription factors: key regulators of cellular quality control. Trends Biochem Sci. 2014 Apr;39(4):159–69.

100. Liu H, Yin J, Wang H, Jiang G, Deng M, Zhang G, et al. FOXO3a modulates WNT/β-catenin signaling and suppresses epithelial-to-mesenchymal transition in prostate cancer cells. Cell Signal. 2015 Mar;27(3):510–8.

101. Essaghir A, Dif N, Marbehant CY, Coffer PJ, Demoulin J-B. The transcription of FOXO genes is stimulated by FOXO3 and repressed by growth factors. J Biol Chem. 2009 Apr 17;284(16):10334–42.

102. Stefanetti RJ, Voisin S, Russell A, Lamon S. Recent advances in understanding the role of FOXO3. F1000Research [Internet]. 2018 Aug 31;7. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6124385/

103. Heyer EE, Ozadam H, Ricci EP, Cenik C, Moore MJ. An optimized kit-free method for making strand-specific deep sequencing libraries from RNA fragments. Nucleic Acids Res. 2015 Jan 9;43(1):e2–e2.

104. Sterling CH, Veksler-Lublinsky I, Ambros V. An efficient and sensitive method for preparing cDNA libraries from scarce biological samples. Nucleic Acids Res. 2015 Jan 9;43(1):e1–e1.

105. Wery M, Descrimes M, Thermes C, Gautheret D, Morillon A. Zinc-mediated RNA fragmentation allows robust transcript reassembly upon whole transcriptome RNA-Seq. Methods San Diego Calif. 2013 Sep 1;63(1):25–31.

106. Nakanishi K, Weinberg DE, Bartel DP, Patel DJ. Structure of yeast Argonaute with guide RNA. Nature. 2012 Jun 20;486(7403):368–74.

107. Nakanishi K, Ascano M, Gogakos T, Ishibe-Murakami S, Serganov AA, Briskin D, et al. Eukaryote-Specific Insertion Elements Control Human ARGONAUTE Slicer Activity. Cell Rep. 2013 Jun 27;3(6):1893–900.

108. Schirle NT, MacRae IJ. The crystal structure of human Argonaute2. Science. 2012 May 25;336(6084):1037–40.

109. Elkayam E, Kuhn C-D, Tocilj A, Haase AD, Greene EM, Hannon GJ, et al. The structure of human argonaute-2 in complex with miR-20a. Cell. 2012 Jul 6;150(1):100–10.

110. Faehnle CR, Elkayam E, Haase AD, Hannon GJ, Joshua-Tor L. The making of a slicer: activation of human Argonaute-1. Cell Rep. 2013 Jun 27;3(6):1901–9.

111. Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990 Oct 25;18(20):6097–100.

112. Crooks GE, Hon G, Chandonia J-M, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004 Jun;14(6):1188–90.

113. Mayr C. Regulation by 3′-Untranslated Regions. Annu Rev Genet. 2017;51(1):171–94.

114. Bartel DP. MicroRNAs: Target Recognition and Regulatory Functions. Cell. 2009 Jan 23;136(2):215–33.

115. Matoulkova E, Michalova E, Vojtesek B, Hrstka R. The role of the 3’ untranslated region in post-transcriptional regulation of protein expression in mammalian cells. RNA Biol. 2012 May;9(5):563–76.

116. Mayr C. Evolution and Biological Roles of Alternative 3’UTRs. Trends Cell Biol. 2016 Mar;26(3):227–37.

117. Tian B, Manley JL. Alternative polyadenylation of mRNA precursors. Nat Rev Mol Cell Biol. 2017;18(1):18–30.

118. Merz C, Urlaub H, Will CL, Lührmann R. Protein composition of human mRNPs spliced in vitro and differential requirements for mRNP protein recruitment. RNA N Y N. 2007 Jan;13(1):116–28.

119. Singh G, Pratt G, Yeo GW, Moore MJ. The Clothes Make the mRNA: Past and Present Trends in mRNP Fashion. Annu Rev Biochem. 2015;84:325–54.

120. Gehring NH, Lamprinaki S, Kulozik AE, Hentze MW. Disassembly of exon junction complexes by PYM. Cell. 2009 May 1;137(3):536–48.

121. Dostie J, Dreyfuss G. Translation is required to remove Y14 from mRNAs in the cytoplasm. Curr Biol CB. 2002 Jul 9;12(13):1060–7.

122. McGlincy NJ, Smith CWJ. Alternative splicing resulting in nonsense-mediated mRNA decay: what is the meaning of nonsense? Trends Biochem Sci. 2008 Aug;33(8):385–93.

123. Zheng S, Gray EE, Chawla G, Porse BT, O’Dell TJ, Black DL. PSD-95 is post-transcriptionally repressed during early neural development by PTBP1 and PTBP2. Nat Neurosci. 2012 Jan 15;15(3):381–8, S1.

124. McMahon JJ, Miller EE, Silver DL. The exon junction complex in neural development and neurodevelopmental disease. Int J Dev Neurosci Off J Int Soc Dev Neurosci. 2016 Dec;55:117–23.

125. Sander JD, Zaback P, Joung JK, Voytas DF, Dobbs D. Zinc Finger Targeter (ZiFiT): an engineered zinc finger/target site design tool. Nucleic Acids Res. 2007 Jul;35(Web Server issue):W599-605.

126. Sander JD, Maeder ML, Reyon D, Voytas DF, Joung JK, Dobbs D. ZiFiT (Zinc Finger Targeter): an updated zinc finger engineering tool. Nucleic Acids Res. 2010 Jul;38(Web Server issue):W462-468.

127. Talbot JC, Amacher SL. A Streamlined CRISPR Pipeline to Reliably Generate Zebrafish Frameshifting Alleles. Zebrafish. 2014 Dec 1;11(6):583–5.

128. Jao L-E, Wente SR, Chen W. Efficient multiplex biallelic zebrafish genome editing using a CRISPR nuclease system. Proc Natl Acad Sci U S A. 2013 Aug 20;110(34):13904–9.

129. Longair MH, Baker DA, Armstrong JD. Simple Neurite Tracer: open source software for reconstruction, visualization and analysis of neuronal processes. Bioinformatics. 2011 Sep 1;27(17):2453–4.

130. Gallagher TL, Arribere JA, Geurts PA, Exner CRT, McDonald KL, Dill KK, et al. Rbfox-regulated alternative splicing is critical for zebrafish cardiac and skeletal muscle functions. Dev Biol. 2011 Nov 15;359(2):251–61.

131. Gangras P, Dayeh DM, Mabin JW, Nakanishi K, Singh G. Cloning and Identification of Recombinant Argonaute-Bound Small RNAs Using Next-Generation Sequencing. Methods Mol Biol Clifton NJ. 2018;1680:1–28.

132. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013 Apr 25;14(4):R36.

133. Love MI, Anders S, Kim V, Huber W. RNA-Seq workflow: gene-level exploratory analysis and differential expression. F1000Research. 2015;4:1070.

134. Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, et al. Software for Computing and Annotating Genomic Ranges. PLOS Comput Biol. 2013 Aug 8;9(8):e1003118.

135. Morgan M, Obenchain V, Hester J, Pagès H. SummarizedExperiment: SummarizedExperiment container. 2019.

136. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014 Dec 5;15(12):550.

137. Risso D, Ngai J, Speed TP, Dudoit S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat Biotechnol. 2014 Sep;32(9):896–902.

138. Strimmer K. fdrtool: a versatile R package for estimating local and tail area-based false discovery rates. Bioinforma Oxf Engl. 2008 Jun 15;24(12):1461–2.

139. Mi H, Muruganujan A, Ebert D, Huang X, Thomas PD. PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res. 2019 Jan 8;47(Database issue):D419–26.

140. Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 2019 Jan 8;47(D1):D607–13.

141. Johnstone TG, Bazzini AA, Giraldez AJ. Upstream ORFs are prevalent translational repressors in vertebrates. EMBO J. 2016 Apr 1;35(7):706–23.

142. Rodriguez JM, Maietta P, Ezkurdia I, Pietrelli A, Wesselink J-J, Lopez G, et al. APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res. 2013 Jan;41(Database issue):D110-117.

143. Wittkopp N, Huntzinger E, Weiler C, Saulière J, Schmidt S, Sonawane M, et al. Nonsense-mediated mRNA decay effectors are essential for zebrafish embryonic development and survival. Mol Cell Biol. 2009 Jul;29(13):3517–28.

144. Hu Y, Xie S, Yao J. Identification of Novel Reference Genes Suitable for qRT-PCR Normalization with Respect to the Zebrafish Developmental Stage. PloS One. 2016;11(2):e0149277.

145. Shibuya T, Tange TØ, Sonenberg N, Moore MJ. eIF4AIII binds spliced mRNA in the exon junction complex and is essential for nonsense-mediated decay. Nat Struct Mol Biol. 2004 Apr;11(4):346–51.

146. Palacios IM, Gatfield D, St Johnston D, Izaurralde E. An eIF4AIII-containing complex required for mRNA localization and nonsense-mediated mRNA decay. Nature. 2004 Feb 19;427(6976):753–7.

147. Ballut L, Marchadier B, Baguet A, Tomasetto C, Séraphin B, Le Hir H. The exon junction core complex is locked onto RNA by inhibition of eIF4AIII ATPase activity. Nat Struct Mol Biol. 2005 Oct;12(10):861–9.

148. Ghosh S, Marchand V, Gáspár I, Ephrussi A. Control of RNP motility and localization by a splicing-dependent structure in oskar mRNA. Nat Struct Mol Biol. 2012 Mar 18;19(4):441–9.

149. Saulière J, Murigneux V, Wang Z, Marquenet E, Barbosa I, Le Tonquèze O, et al. CLIP-seq of eIF4AIII reveals transcriptome-wide mapping of the human exon junction complex. Nat Struct Mol Biol. 2012 Nov;19(11):1124–31.

150. Hauer C, Sieber J, Schwarzl T, Hollerer I, Curk T, Alleaume A-M, et al. Exon Junction Complexes Show a Distributional Bias toward Alternatively Spliced mRNAs and against mRNAs Coding for Ribosomal Proteins. Cell Rep. 2016 09;16(6):1588–603.

151. Obrdlik A, Lin G, Haberman N, Ule J, Ephrussi A. The transcriptome-wide landscape and modalities of EJC binding in adult Drosophila. bioRxiv. 2018 Jan 1;459354.

152. Mabin JW, Woodward LA, Patton RD, Yi Z, Jia M, Wysocki VH, et al. The Exon Junction Complex Undergoes a Compositional Switch that Alters mRNP Structure and Nonsense-Mediated mRNA Decay Activity. Cell Rep. 2018 Nov 27;25(9):2431–2446.e7.

153. White RJ, Collins JE, Sealy IM, Wali N, Dooley CM, Digby Z, et al. A high-resolution mRNA expression time course of embryonic development in zebrafish. eLife. 2017 16;6.

154. Ma Q, Tatsuno T, Nakamura Y, Ishigaki Y. The stability of Magoh and Y14 depends on their heterodimer formation and nuclear localization. Biochem Biophys Res Commun. 2019 Apr 9;511(3):631–6.

155. Longman D, Hug N, Keith M, Anastasaki C, Patton EE, Grimes G, et al. DHX34 and NBAS form part of an autoregulatory NMD circuit that regulates endogenous RNA targets in human cells, zebrafish and Caenorhabditis elegans. Nucleic Acids Res. 2013 Sep;41(17):8319–31.

156. Bazzini AA, Johnstone TG, Christiano R, Mackowiak SD, Obermayer B, Fleming ES, et al. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. EMBO J. 2014 May 2;33(9):981–93.

157. Wang J, Gudikote JP, Olivas OR, Wilkinson MF. Boundary-independent polar nonsense-mediated decay. EMBO Rep. 2002 Mar;3(3):274–9.

158. Lareau LF, Brenner SE. Regulation of Splicing Factors by Alternative Splicing and NMD Is Conserved between Kingdoms Yet Evolutionarily Flexible. Mol Biol Evol. 2015 Apr;32(4):1072–9.

159. Martin L, Grigoryan A, Wang D, Wang J, Breda L, Rivella S, et al. Identification and characterization of small molecules that inhibit nonsense-mediated RNA decay and suppress nonsense p53 mutations. Cancer Res. 2014 Jun 1;74(11):3104–13.

160. Baird TD, Cheng KC-C, Chen Y-C, Buehler E, Martin SE, Inglese J, et al. ICE1 promotes the link between splicing and nonsense-mediated mRNA decay. eLife. 2018 12;7.

161. Jaffrey SR, Wilkinson MF. Nonsense-mediated RNA decay in the brain: emerging modulator of neural development and disease. Nat Rev Neurosci. 2018 Dec;19(12):715–28.

162. Toma KG, Rebbapragada I, Durand S, Lykke-Andersen J. Identification of elements in human long 3’ UTRs that inhibit nonsense-mediated decay. RNA N Y N. 2015 May;21(5):887–97.

163. Ge Z, Quek BL, Beemon KL, Hogg JR. Polypyrimidine tract binding protein 1 protects mRNAs from recognition by the nonsense-mediated mRNA decay pathway. eLife. 2016 Jan 8;5.

164. Kishor A, Ge Z, Hogg JR. hnRNP L-dependent protection of normal mRNAs from NMD subverts quality control in B cell lymphoma. EMBO J. 2019 Feb 1;38(3).

165. Hoek TA, Khuperkar D, Lindeboom RGH, Sonneveld S, Verhagen BMP, Boersma S, et al. Single-Molecule Imaging Uncovers Rules Governing Nonsense-Mediated mRNA Decay. Mol Cell [Internet]. 2019 May 30; Available from: http://www.sciencedirect.com/science/article/pii/S1097276519303612

166. Zhang Z, Krainer AR. Involvement of SR proteins in mRNA surveillance. Mol Cell. 2004 Nov 19;16(4):597–607.

167. Aznarez I, Nomakuchi TT, Tetenbaum-Novatt J, Rahman MA, Fregoso O, Rees H, et al. Mechanism of Nonsense-Mediated mRNA Decay Stimulation by Splicing Factor SRSF1. Cell Rep. 2018 May 15;23(7):2186–98.

168. Lindeboom RGH, Supek F, Lehner B. The rules and impact of nonsense-mediated mRNA decay in human cancers. Nat Genet. 2016;48(10):1112–8.

169. Scofield DG, Lynch M. Evolutionary diversification of the Sm family of RNA-associated proteins. Mol Biol Evol. 2008 Nov;25(11):2255–67.

170. Hong X, Scofield DG, Lynch M. Intron size, abundance, and distribution within untranslated regions of genes. Mol Biol Evol. 2006 Dec;23(12):2392–404.

171. Chung YM, Park S-H, Tsai W-B, Wang S-Y, Ikeda M-A, Berek JS, et al. FOXO3 signalling links ATM to the p53 apoptotic pathway following DNA damage. Nat Commun. 2012;3:1000.

172. You H, Yamamoto K, Mak TW. Regulation of transactivation-independent proapoptotic activity of p53 by FOXO3a. Proc Natl Acad Sci U S A. 2006 Jun 13;103(24):9051–6.

173. Gilley J, Coffer PJ, Ham J. FOXO transcription factors directly activate bim gene expression and promote apoptosis in sympathetic neurons. J Cell Biol. 2003 Aug 18;162(4):613–22.

174. He C-W, Liao C-P, Pan C-L. Wnt signalling in the development of axon, dendrites and synapses. Open Biol [Internet]. 2018 Oct 3;8(10). Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6223216/

175. El-Brolosy MA, Kontarakis Z, Rossi A, Kuenne C, Günther S, Fukuda N, et al. Genetic compensation triggered by mutant mRNA degradation. Nature. 2019 May 7;568(7751):193–7.

176. Tran H, Brunet A, Grenier JM, Datta SR, Fornace AJ, DiStefano PS, et al. DNA repair pathway stimulated by the forkhead transcription factor FOXO3a through the Gadd45 protein. Science. 2002 Apr 19;296(5567):530–4.

177. Nelson JO, Moore KA, Chapin A, Hollien J, Metzstein MM. Degradation of Gadd45 mRNA by nonsense-mediated decay is essential for viability. eLife. 2016 08;5.

178. Pruitt SC, Freeland A, Rusiniak ME, Kunnev D, Cady GK. Cdkn1b overexpression in adult mice alters the balance between genome and tissue ageing. Nat Commun. 2013 Oct 23;4(1):1–12.

179. Ganassi M, Badodi S, Quiroga HPO, Zammit PS, Hinits Y, Hughes SM. Myogenin promotes myocyte fusion to balance fibre number and size. Nat Commun. 2018 Oct 12;9(1):1–17.

180. Sandri M, Sandri C, Gilbert A, Skurk C, Calabria E, Picard A, et al. Foxo transcription factors induce the atrophy-related ubiquitin ligase atrogin-1 and cause skeletal muscle atrophy. Cell. 2004 Apr 30;117(3):399–412.

Appendix A. List of primers used in Chapter 2

RT primer sequences

Name Sequence

TruSeq_ 5' - SE1 pGGCACTANNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGA GTGT-SPACER 18- CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCCTTGGCACCC GAGAATTCCA – 3' TruSeq_ 5' - SE2 pGGGTAGCNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGA GTGT-SPACER 18- CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCCTTGGCACCC GAGAATTCCA – 3' TruSeq_ 5' - SE3 pGGTCGATNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAG TGT-SPACER 18- CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCCTTGGCACCC GAGAATTCCA – 3' TruSeq_ 5' - SE4 pGGCCTCGNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAG TGT-SPACER 18- CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCCTTGGCACCC GAGAATTCCA – 3' TruSeq_ 5' - SE5 pGGTGACANNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGA GTGT-SPACER 18- CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCCTTGGCACCC GAGAATTCCA – 3' TruSeq_ 5' - SE6 pGGTAGACNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGA GTGT-SPACER 18- CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCCTTGGCACCC GAGAATTCCA – 3' TruSeq_ 5' - SE7 pGGGCCCTNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAG TGT-SPACER 18- CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCCTTGGCACCC GAGAATTCCA – 3' TruSeq_ 5' - SE8 pGGATCGGNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGA GTGT-SPACER 18-

147 CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCCTTGGCACCC GAGAATTCCA – 3' TruSeq_ 5' - SE9 pGGACTGANNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGA GTGT-SPACER 18- CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCCTTGGCACCC GAGAATTCCA – 3' TruSeq_ 5' - SE10 pGGTGTTCNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAG TGT-SPACER 18- CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCCTTGGCACCC GAGAATTCCA – 3' TruSeq_ 5' - SE11 pGGTAAGTNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGA GTGT-SPACER 18- CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCCTTGGCACCC GAGAATTCCA – 3' TruSeq_ 5' - SE12 pGGAGATGNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGA GTGT-SPACER 18- CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCCTTGGCACCC GAGAATTCCA – 3'

PCR Primers (Illumina PE primers; * indicates location of phosphorothioate bond)

Name     Sequence

PE1.0    5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T-3'
PE2.0    5'-CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATC*T-3'

Appendix B. RNA-Seq reads

Number of mapped reads

RIP-seq uniquely mapped reads (millions)

experiment       timepoint (hpf)   replicate 1   replicate 2   replicate 3
Rbm8a RIP-seq    24                2.06          0.14          8.06

RNA-Seq uniquely mapped reads (millions)

genotype/sample     timepoint (hpf)   replicate 1   replicate 2   replicate 3
rbm8a WT sibling    21                35.9          9             40.2
rbm8a WT sibling    27                9             10.6          47.8
rbm8a mutant        21                15.7          16.3          27.6
rbm8a mutant        27                72.4          39            17.4
magoh WT sibling    21                14.9          7.6           5.3
magoh mutant        21                23.2          4.4           7.2

                                            uniquely mapped left reads (millions)   uniquely mapped right reads (millions)
sample                    timepoint (hpf)   replicate 1   replicate 2               replicate 1   replicate 2
upf1 knockdown embryos    12                50.2          50.2                      49.7          48.7
control WT embryos        12                39.6          43.9                      39.1          43.3

Appendix C. List of primers used in Chapter 3

zebrafish primers

genotyping primers
rbm8aoz35       GCTCACTGCCGCTCACTC        AGGCCGTCAGAAGAAAACAC
magohoz36       AAGCGATTCATTGAAGGAACA     GCGTCACTTACCGTCTGGTC
foxo3bihb404    AGATCATGCACTTGGCCTCT      GGAGAGCCTGTGTTCTCCTG      Liu et al. 2016 (40,41)

HRMA primers
rbm8aoz35       TCACAAGCAAGCATCATGGC      AACTGTACCATCGCCATCCT
magohoz36       GGTTTCTTTAATACCGCGCAC     GGCGCGTCACTTACCGTCTG

sequencing primers
rbm8aoz35       CACCTCTACACTGCCAAACC      AGCTGAATCACTACAACGCG
magohoz36       CACTGGTGTGCGACTAGGAA      CTCAAACGCAAGTTCGCATA
foxo3bihb404    same as geno primers                                Liu et al. 2016 (40,41)

zebrafish qPCR primers
srsf7a          CCACTTCAGGGAGCTCAGAC      AATGACCTGTCTCCCCACAC
srsf3a          CGGCCACGAGACGATTATAG      TTGGTGTACGCTGGTCACAT
srsf10b         TGAAGATGTCCGAGATGCAG      CGGAGATGAGCGTTCCTTAT
gadd45aa        f3 TCTCATCCAGGCTTTCTGCT   r3 GCAGAAGCGGTTCACTTTTC
gtpbp1l         ACGACCAGAACGGGGAAG        CATGCCGATCACATAGATGG
pik3r3a         GTGAATGTAACGCAGCAAGG      GCAGTTTTGGATTTCTCTGGA
foxo3b          CAGTGCACCAAGTGACTGGA      GTCACATTCGCATTCCATGA
phlda2          CAGGGCTTTAAAACGCAAAA      CCTTTCAGACTTGCCGACAT
cdkn1ba         CGGACCCCAAAACGTAAAG       TCCGAGGCTCGATTTGTTATT
atxn1b          TGAAGATTGACTCCAGCACTGT    TGACCAACCTTGACCAAACA
eif4a2          CGTGGCTTCAAGGATCAAAT      CTCCTTTTTCACCAGGATGC
mob4_F          CTTTCACCATCGCCAGATTT
mob4_R          TGAGGCTTCAAATGCACAAG

human qPCR primers
CDKN1B_F            AAGAAGCCTGGCCTCAGAA
CDKN1B_R            CAGGATGTCCATTCCATGAAG
PHLDA2_F            GCTCATCGATTTCCAGAACC
PHLDA2_R            CCTAGCCTCGGTCCGACT
FOXO3_F1            CTGTCCCAGATCTACGAGTGG
FOXO3a_R1           TCTTGCCAGTTCCCTCATTC
SRSF4d (3′UI+)_F    AGTTTTAGGACAAAGATATGGTTTT
SRSF4d (3′UI+)_R    GCCGCGGGCATGCTCAACAATTACT
TBP_F: control      TGTTTCTTGGCGTGTGAAGATAACC
TBP_R               AGAAACCCTTGCGCTGGAACTCGTC
HNRNPD_F            GCAGAGTGGTTATGGGAAGG
HNRNPD_R            GCTATTAGCAGGTGGCAGGA
ARC_F               CTGAGATGCTGGAGCACGTA
ARC_R               GCCTTGATGGACTTCTTCCA
SRSF2_F             GTGTCCAAGAGGGAATCCAA
SRSF2_R             TAGCCAGTTGCTTGTTCCAA
STX3_F              CTTTCCGTTGGGCTGAATTA
STX3_R              GGTTGCAAGGAAACAAAGGA

b-globin NMD reporters

Primer                Sequence                                    Purpose
ARC_NotF              AGAAGCGGCCGCccggagcccccagcctgccc            forward primer to amplify ARC 3′UTR
ARC_XbR               GGAGAAtctagagagtctgtctctggggtaaggtgcac      reverse primer to amplify ARC 3′UTR
FOXO3_3utrNotF        GGAGAAGCGGCCGCTGAGGAAGGGGAAGTGGGCAAAGC      forward primer to amplify FOXO3 3′UTR
FOXO3_3R              ttgaaaagcccaaaggTGTTCTCTTCACGAGCAG          reverse primer that joins the first and last 250 nt of FOXO3 3′UTR intron
FOXO3_4F              cgtgaagagaacaCCTTTGGGCTTTTCAACTTCAGAG       forward primer that joins the first and last 250 nt of FOXO3 3′UTR intron
FOXO3_short3utrXbR    ggccgctctagaCAGAATGGCCGACGGGGGCTCACG        reverse primer to amplify short FOXO3 3′UTR
HNRNPD_NotF           AGAAGCGGCCGCatttgcaacttatccccaacagGTATG     forward primer to amplify HNRNPD 3′UTR
HNRNPD_XbR            GGAGAAtctagaaacacttgaattcatctatgaagtccac    reverse primer to amplify HNRNPD 3′UTR
