Transcriptome-wide Organization of Subcellular Microenvironments Revealed by ATLAS-Seq

by Danielle Adekunle

Submitted to the Department of Biology on April 28, 2020 in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Biology

Abstract

Subcellular localization of RNAs is a ubiquitous and evolutionarily conserved process that provides an additional layer of transcriptome organization, promoting coordinated control of expression in both space and time. It has been shown to contribute to processes ranging from cell fate determination and embryonic patterning to local translation and directed cell movement. Elegant efforts focused on a small handful of RNAs have established that RNA localization plays key roles in cell function, yet recent studies suggest that specific localization patterns are the rule, not the exception, across the transcriptome. We still lack global maps and organizing principles for how RNAs are localized in cells and tissues. This dissertation details the findings of a new approach to investigating RNA localization on a transcriptome-wide scale: ATLAS-Seq, a detergent-free method that generates transcriptomes and proteomes from tissue lysates fractionated across a continuous sucrose gradient by density ultracentrifugation. We conducted proteomic analyses of fractions to determine separation of subcellular compartments. Transcriptomic analyses revealed that RNAs sedimenting similarly across gradients encode proteins in similar complexes, in similar cellular compartments, or with similar biological functions, suggesting that RNAs that are functionally related are cosegregated to be coregulated. Overall, most RNAs sedimented differently than their encoded protein counterparts, signifying that most RNA compartmentalization is not directed at restricting RNA localization to the final destination of the protein product. To identify regulatory RNA binding proteins potentially driving these patterns, we correlated their sedimentation profiles to all RNAs, confirming known protein-RNA interactions and predicting new associations. Interestingly, hundreds of alternative RNA isoforms exhibited distinct sedimentation patterns across the gradient, despite sharing most of their coding sequence. These results provide new insights into establishment and maintenance of subcellular organization of the transcriptome.

Thesis Advisors: Eric T. Wang (Assistant Professor), Phillip A. Sharp (Professor)


Acknowledgements

I would like to thank my thesis advisors, Eric Wang and Phil Sharp for their support and mentorship. I am very grateful to have chosen a thesis advisor who has given me so much trust, and always encouraged me to be engaged in the international scientific community. These opportunities have fundamentally shaped me as a scientist and have undoubtedly impacted my trajectory in more ways than I can fully appreciate yet. Eric has always readily given me the trust and opportunity to expand my skillset and learn new techniques even when they were outside of any previous area of expertise for me or the lab. He has also instilled in me an appreciation for new ideas and clean code. I remember my first meeting with Phil many years ago when I was an undergraduate embarking on the journey to graduate school. He gave me a piece of advice that will always stay with me, “In a world full of ‘Nos’ you have to be the one to say ‘Yes’”. I do not think any other quote could have better summed up my PhD journey. There were certainly a lot of ‘Nos’ along the way and many failures but I am so grateful that you instilled me with the wisdom to persevere through the ups and downs that are inherent to research and the graduate school experience.

I want to thank my thesis committee, David Bartel and Christopher Burge for their insightful advice, their challenging questions, and unwavering support throughout my graduate school career. I would also like to thank Gary Bassell for kindly agreeing to read my manuscript and thesis, and for joining my committee as an outside examiner. Thank you Boris Zinshteyn and the Gilbert lab for all of your Gradient Station help. I am also very grateful to Matt Taliaferro for tolerating my countless ‘pop-in’ visits and providing scientific advice, discussion, and feedback throughout this project. Thank you to StackOverflow for saving me an incalculable amount of time and frustration. You are the real MVPs.

I thank all former and current members of the Wang and Sharp labs whose assistance and advice have been vital. I would like to give a special shoutout to the ‘Comp Room’. Our afternoon ‘pops’ and daily conversations made lab life fun. I am lucky to call you friends. Lance, thanks for the scientific discussions, the comic relief, and ‘surf video’ distractions. Hailey, you have become one of my very best friends. Thank you for answering all of my bazillion questions when I was first learning to code, your endless patience, and your dry sense of humor. You are the Angela to my Oscar/Stanley. I wouldn’t want to have experienced my Florida adventures with anyone other than you.

Of course, I have to thank all of my Biograd 2013 classmates. Maria, thank you for always being a ray of sunshine in the evenings and maintaining a safe, clean lab environment for everyone. I have so much gratitude for my grad school sisterhood Jordan, Faye, Helen, Marissa, and Emma. I could never thank you enough for all of your support in every aspect of the grad school journey and I am so grateful to have you as lifelong friends, same goes to Spencer, Paritosh, Yuelin, and Yunpeng. I would also like to thank my non-MIT friends for making the treks to visit me and always keeping me grounded. Thank you to my jewelry-making/pottery friends for giving me sanity and a positive, happy space for expression. To my family, thank you for your unwavering love and support and being my biggest cheerleaders in all of my endeavors. Arnaud, thank you for being my best friend and partner in this journey. Your belief in me has given me so much confidence as a scientist and the strength to face even the most seemingly insurmountable adversities. Thank you to the Hubstenbergers for sharing so many special milestones, being the best fellow gourmands a girl could ask for, all the tisane talks, and welcoming me with open arms into your family.

Table of Contents

Title Page
Abstract
Acknowledgements
Table of Contents

Chapter 1: Introduction
Overview
RNA Localization Summary
Functions of RNA localization
Cis-elements
Trans-factors and RNA localization
RNP granules
Mechanisms of RNA Localization
Splicing and RNA localization
Translation and RNA targeting
High-throughput studies of RNA localization
Concluding Remarks
References

Chapter 2: Transcriptome-wide Organization of Subcellular Microenvironments Revealed by ATLAS-Seq
Author Contributions
Abstract
Introduction
Materials and Methods
Results
Discussion
Acknowledgements
References
Supplementary Information

Chapter 3: Future Directions
References

Chapter I

Introduction

Danielle Adekunle

Part 1. RNA Localization overview

Overview

The molecules that exist within cells are nearly 1 million times smaller than the cell itself, but through cellular organization they can work together to create dynamic structures and microenvironments that orchestrate cellular processes with immense implications for the survival and function of the cell. In eukaryotic cells, these structures and their subcompartments help the cell carry out specialized functions with greater efficiency than would otherwise be possible given the size and scale of their genome. Prokaryotes are not typically thought of as compartmentalized, but compartmentalization exists even in these relatively small cells, for example the magnetosomes of magnetotactic bacteria and the carboxysomes of cyanobacteria (Komeili, 2012; Cornejo, Abreu & Komeili, 2014). Additionally, a transcriptome-wide analysis of RNA distribution in E. coli characterized the complexity of bacterial transcriptome organization (Kannaiah, Livny & Amster-Choder, 2019), further confirming that compartmentalization strategies can be found across the tree of life.

Through seminal research efforts, it has become clear that cellular organization in the form of spatial regulation provides necessary precision and efficiency. Spatial regulation of protein production has widely been accepted as the mechanism by which proteins reach their target destination, but we are only beginning to uncover the importance of RNA localization, which is the spatial regulation of gene expression at the transcript level. Much work since the earliest described studies of RNA localization almost four decades ago has been dedicated to uncovering the mechanisms, significance, and key players involved in this mode of transcriptome regulation. The use of genomic approaches to study RNA segregation has revealed that the large majority of transcripts are localized (Lécuyer et al., 2007). What are these transcripts, how are they regulated, and by what mechanisms? This dissertation details investigations into how the transcriptome is spatially organized and what factors and interactions provide for this organization in mouse liver.

The idea of spatial regulation of transcripts, combined with the desire to determine the mechanisms by which cytoplasmic proteins were localized to their final destination, was a driving motivation that inspired many of the early, consequential studies into transcript organization. RNA localization is known to be important in a wide range of cell types, including specialized cells such as neurons, which face the challenge of localizing proteins across vast distances, such as from the soma to the axon, and developing embryos, which require highly coordinated patterning and maintenance of asymmetry. RNA localization has also been found to be prevalent in cells that do not face these challenges, such as budding yeast cells and fibroblasts. RNA localization is a ubiquitous mode of transcriptome organization that is highly conserved.


Figure 1.1. Examples of RNA localization in different cell types and contexts. (A) ASH1 mRNA localizing to the bud tip in budding yeast Saccharomyces cerevisiae. (B) Bicoid mRNA localized to the anterior pole of the Drosophila embryo. Oskar and nanos mRNAs are localized to the posterior pole of the developing oocyte. (C) Localized maternal mRNA Vg1 concentrated at the vegetal pole of mature Xenopus laevis oocytes. (D) β-actin mRNA is localized to the lamellipodium at the leading edge of the cell in chicken embryo fibroblast cells. (E) In immature mouse neurons, β-actin mRNA localizes to growth cones. In mature mouse neurons, CamKIIa mRNA is localized to dendrites. (F) MBP mRNA localizes to myelin lamellae surrounding oligodendrocyte axons. Image adapted from (Martin & Ephrussi, 2009).


RNA localization is not a process restricted to a handful of RNAs but rather an integral layer of gene expression control. How RNA localization is orchestrated, what RNAs and components are involved in transcript localization, and where these events occur are all the focus of ongoing research efforts and motivations for pursuing the studies described within this dissertation. This introductory chapter will provide insights and concepts important for placing the scientific advances of this dissertation in context. It will focus on RNA localization in the cytoplasm. It covers cellular functions and settings in which RNA localization has been implicated, mechanisms of RNA localization that have been uncovered, the factors involved in mediating RNA localization, and new technological advances that allow the study of RNA localization at a global scale.

RNA localization summary

Eukaryotic mRNA life begins in the nucleus with transcription. The newly transcribed pre-mRNA contains cis-elements, also referred to as localization elements, in the form of primary sequence or secondary structure elements that can be recognized and bound by RNA binding proteins (RBPs) (Bullock & Ish-Horowicz, 2001; Martin & Ephrussi, 2009). These trans-factors play a central role in all downstream processes and thereby serve as RNA fate determinants (Dominguez et al., 2018; Gerstberger, Hafner & Tuschl, 2014). The nuclear RBP-RNA interactions mark RNAs for localization and, in some instances, ensure that the RNA will remain translationally repressed en route to its final destination (Martin & Ephrussi, 2009). Additional factors assemble on the transcript during splicing, and if the mature mRNA is marked for nuclear export, it is transported to the cytoplasm, where additional factors, like motor proteins or stabilizing proteins, can associate (Martin & Ephrussi, 2009). The mRNA together with its associated protein factors (the mRNP) is the basic functional unit of RNA localization, and the cell employs a variety of mechanisms to transport mRNPs to their final subcellular destinations.


Figure 1.2. Schematic of mRNA processing from pre-mRNA to RNA export to the cytoplasm. (A) A pre-mRNA molecule is depicted after it has been transcribed from DNA (blue). RBPs (multi-colored circles) bind to the transcript co-transcriptionally (Dreyfuss, Kim & Kataoka, 2002; Viphakone et al., 2019). (B) A newly spliced mature mRNA molecule with its associated factors. Additional factors can assemble on the transcript during the process of splicing (Palacios, 2002). (C) Through the process of nuclear export, the mRNA and its associated trans-factors form a ribonucleoprotein particle (RNP) that is transported from the nucleus into the cytoplasm. (D) In the cytoplasm, RNPs can oligomerize with other RNPs. (E) RNPs can also coassemble into granules with other RNPs, regulatory proteins, scaffolding RNAs, and various other factors.

Part I. Functions of RNA localization

Segregating mRNAs rather than localizing proteins is advantageous to the cell for a variety of reasons. There are a number of ways that subcellular distribution of RNAs helps to improve cellular function and reduce the energy consumption of the cell.

Increased Cellular Efficiency

Energy-starved cells must self-destruct (Tzatsos & Tsichlis, 2007; Yong-Jun Fan, 2013). For the cell to avoid this fate, efficiency is vital. Protein production poses a massive energy burden, with transcription and protein synthesis being the principal sources of cellular energy consumption (Kafri et al., 2016). This leaves limited resources to be dedicated to all other cellular tasks. In order to carry out specialized functions, to regulate the various physiological processes core to its vitality, and to respond effectively to external cues, the cell must maximize its efficiency. Because many copies of a protein can be generated from a single mRNA through repeated rounds of translation, mRNA localization is more energy efficient than protein localization. Rather than dedicating energy to localizing each individual protein, the cell can localize a limited number of mRNA copies and generate large quantities of protein at functional sites. Additionally, mRNAs encoding protein subunits of the same protein complex can be concentrated and co-translated at the same subcellular site. Glycolytic mRNAs, for example, are colocalized to the same granules and co-translated in translation factories in yeast (Lui et al., 2014). By localizing RNAs rather than proteins, the cell can save energy and more efficiently assemble protein complexes and proteins that will be co-localized and co-regulated.
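The energy argument above can be made concrete with a toy count of delivery events. The sketch below is illustrative only: the function name `transport_events` and the example numbers (1,000 proteins needed, 100 proteins translated per mRNA) are assumptions chosen for the arithmetic, not values taken from the studies cited, and the model deliberately ignores the cost of translation itself and of mRNA turnover.

```python
import math


def transport_events(proteins_needed: int, proteins_per_mrna: int) -> dict:
    """Count cargo-delivery events under two hypothetical strategies.

    proteins_needed   -- copies of a protein required at the functional site
    proteins_per_mrna -- proteins made from one localized mRNA through
                         repeated rounds of translation (assumed value)
    """
    if proteins_needed < 0 or proteins_per_mrna <= 0:
        raise ValueError("need a non-negative count and a positive yield")
    # Strategy 1: translate elsewhere, then deliver every protein individually.
    protein_strategy = proteins_needed
    # Strategy 2: deliver just enough mRNAs, then translate them on site.
    mrna_strategy = math.ceil(proteins_needed / proteins_per_mrna)
    return {"protein_transport": protein_strategy, "mrna_transport": mrna_strategy}


# Hypothetical numbers, for illustration only.
print(transport_events(proteins_needed=1000, proteins_per_mrna=100))
```

With these assumed inputs, the count is 1,000 protein deliveries versus 10 mRNA deliveries. The ratio between the two strategies is simply the number of proteins produced per localized transcript, which is the quantity the efficiency argument rests on.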

Rapid Response to Stimuli: temporal control

Cells must detect and respond to external cues and physical signals in their environment to properly develop, maintain homeostasis, and protect themselves from pathogens and other threats. Ideally, a response to stimuli should occur with high temporal resolution. Localization of mRNAs allows for a rapid response to a stimulus. A stimulus need only be delivered to the site of response to swiftly trigger translation of localized RNAs at the desired subcellular location. The alternative scenario would require a transcriptional signal to be sent to the nucleus, where the RNA would have to be processed and then exported to the cytoplasm, followed by translation and subsequent localization of the protein to its functional domain, before an effective response to a stimulus could be produced (Holt, 2019). This lengthy signal transduction process can be avoided when transcripts are concentrated locally.

The increased efficiency in stimulus response is particularly advantageous in highly polarized cells such as neurons. A large fraction of the proteome of these cells must be localized to synapses (Gagnon & Mowry, 2011). The synapse must provide an effective response to stimuli more quickly than the necessary proteins can be transported from the cell body to the site of stimulation (Crino & Eberwine, 1996; Steward et al., 1998). Axonal growth cones, as an example, must respond rapidly to a wide array of axonal guidance cues and quickly reorganize the actin cytoskeleton for proper extension and motility (Leung et al., 2018). Cofilin proteins are among the key regulators of actin dynamics at the growth cone. In a study of the cofilin protein production response at the growth cone, treatment with the chemorepellent Slit2 led to cofilin protein accumulation within 5 minutes (Piper et al., 2006). In another example, β-actin mRNA granules were shown to be concentrated in neuronal processes of cortical neurons within minutes of dibutyryl cAMP treatment, and localized to growth cones 1 hour after treatment, indicating that a dramatic RNA localization response can be rapidly triggered in the presence of a cellular signal (Bassell et al., 1998).

Transcriptome-wide analysis of RNAs localized to sensory axons provided evidence that many mRNAs are localized to axons and that these RNAs change dynamically with developmental cues (Gumy et al., 2011). Embryonic sensory axons are enriched for mRNAs encoding cytoskeletal proteins that allow the axon to grow outwardly (Willis et al., 2005; Gumy et al., 2011). Adult sensory axons are enriched for RNAs that encode proteins responsible for signaling to the nervous system the presence of dangerous and damaging signals, such as temperature extremes or injury (Willis et al., 2005; Gumy et al., 2011). From these studies and others, it is clear that localizing RNA at subcellular destinations allows cells to rapidly and dynamically respond to environmental cues and signals at the site of stimulation.

Prevents deleterious effects of ectopic protein localization: spatial control

Some proteins can be toxic and harmful to the cell if located outside of certain cellular compartments. By localizing translationally repressed RNAs and translating them locally, proteins whose presence could be deleterious outside of their functional domains are restricted to their target destination, preventing any detrimental effects to the cell. In the case of embryonic development, aberrant expression of maternal determinants can disrupt embryonic patterning, thereby perturbing embryogenesis, altering cell fate, and jeopardizing embryo viability (Fan et al., 2015; Zhang, Talbot & Schier, 1998; Houston & King, 2000).

In addition to preventing toxic proteins from acting outside of their intended cellular environment, mRNA localization helps reduce inappropriate protein-protein interactions. Both of these functions can be observed with Myelin Basic Protein (MBP). MBP is a component of the myelin sheath (Boggs, 2006), the fatty sleeve that oligodendrocytes wrap around nerve axons, which speeds up electrical impulses by acting as an electrical insulator and also protects the axons. The MBP protein acts to compact myelin membrane by promoting myelin stacking, among other functions (Boggs, 2006). Due to its intrinsically disordered region, the protein sticks to virtually any membrane or lipoprotein with which it comes into contact, leading to membrane compaction at the site of interaction (Baron & Hoekstra, 2010). Compaction of the plasma membrane and other cellular membranes that should not be compacted, as well as aberrant protein-protein interactions, have harmful effects on the cell and must be avoided (Yin et al., 1997). To circumvent these effects, the cell ensures that MBP protein is concentrated through the localization of MBP mRNA. When the MBP mRNA is localized and translationally repressed, unintended compaction is avoided and the protein is expressed exclusively at its functional domains.

Localizing RNA presents many benefits to the cell. Namely, it saves the cell energy, allows the cell to rapidly respond to environmental cues, and serves to protect the cell from the deleterious effects of proteins being expressed outside of their functional domains.


Figure 1.3. Schematic illustrating the various functions of RNA localization. (A) RNA localization allows for increased cellular efficiency by localizing RNAs and generating many proteins from a single localized transcript. (B) RNA localization can allow the cell to rapidly respond to a stimulus. On the left, in a neuronal cell (teal) a stimulus at the synapse (yellow) sends a signal that a protein is needed. If the cell relied on protein transport, a protein molecule (orange) would have to be shuttled from the site of synthesis at the soma down the axon all the way to the synapse. Alternatively, if the cell relied on RNA transport, upon stimulation a localized RNA molecule (purple) can rapidly be translated into protein allowing for a rapid cellular response within minutes. (C) By localizing translationally repressed RNAs, RNA localization can restrict a protein that has toxic effects at subcellular sites outside of its functional domain to its functional domain.

Part II. Cis-elements

Elements in the sequence of a transcript, termed cis-elements, are key components known to regulate RNA stability, polyadenylation, and translation. Cis-elements also dictate the targeting of transcripts to specific subcellular domains. These localization elements (LEs) mediate the localization of a transcript and contribute to the determination of its subcellular fate. The interplay between cis-elements and trans-factors, which will be described in greater detail below, is the driving force behind the targeting of transcripts to their final destinations. The use of LEs to traffic RNAs appears to be conserved (Hamilton & Davis, 2011); LEs have been identified in a diverse assortment of organisms ranging from ascidians to humans and in a wide array of cellular contexts, from developing embryos to the growth cones of neurons (Bashirullah, Cooperstock & Lipshitz, 1998; Sasakura & Makabe, 2002). The evidence that cis-elements are both necessary and sufficient to target RNAs to specific subcellular sites largely came from microinjection studies in oocytes (Macdonald & Struhl, 1988; Allen, Kloc & Etkin, 2003; Snee et al., 2005). In these studies, oocytes were microinjected with synthetic transcripts that contained exogenous reporter sequences fused to all or part of specific localized RNAs (Macdonald & Struhl, 1988; Allen, Kloc & Etkin, 2003; Snee et al., 2005). If the reporter maintained its ability to localize to the same subcellular site as the endogenous transcript, one could conclude that the region of the localized RNA that was fused contained cis-elements sufficient to localize the transcript (Macdonald & Struhl, 1988; Allen, Kloc & Etkin, 2003; Snee et al., 2005). A cis-element is deemed necessary when the RNA cannot be localized without it (Macdonald & Struhl, 1988; Allen, Kloc & Etkin, 2003; Snee et al., 2005). Many of the early LEs were discovered in this way. From a compendium of these and similar reporter studies in other systems such as chicken fibroblasts, Drosophila, ascidians, and zebrafish, we now know that localization elements are predominantly found in 3’ UTRs, though there are examples of LEs in 5’ UTRs (Thio et al., 2000) and in coding sequences (CDS) (Long et al., 1997; Shepard et al., 2003; Takizawa & Vale, 2000). LEs can localize RNAs in the absence of the neighboring endogenous sequence, they can be repetitive and redundant (Gautreau, Cote & Mowry, 1997; Ferrandon et al., 1994), and in some cases they are composed of tandem repeats (Kloc, Spohr & Etkin, 1993). They can have secondary structure (Ferrandon et al., 1994), typically in the form of stem-loops; in fact, there are instances where it is the secondary structure, not the primary sequence, that is sufficient for mediating a transcript’s localization. Such is the case with bicoid RNA in Drosophila and ASH1 mRNAs in budding yeast (Macdonald & Kerr, 1998; Chartrand et al., 1999; Gonzalez et al., 1999). Quaternary structures have also been demonstrated to be utilized in transcript localization (Ferrandon et al., 1994; Ferrandon et al., 1997). One final point is that cis-elements can vary greatly in size, ranging from a few nucleotides to longer than 1 kilobase.
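The necessary/sufficient logic of the reporter assays described above can be written out explicitly. The following is a minimal sketch of that reasoning only; the class `ReporterResult`, its field names, and the function `classify_region` are hypothetical names introduced for illustration and do not correspond to any analysis code from the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReporterResult:
    """Outcome of a hypothetical pair of localization assays for one candidate region."""

    # Exogenous reporter fused to the candidate region: does it localize
    # to the same site as the endogenous transcript?
    reporter_with_region_localizes: bool
    # Localized RNA with the candidate region removed: does it still localize?
    transcript_without_region_localizes: bool


def classify_region(result: ReporterResult) -> dict:
    """Apply the two criteria described in the text to one candidate region."""
    # Sufficient: the region alone can confer localization on a reporter.
    sufficient = result.reporter_with_region_localizes
    # Necessary: the RNA cannot be localized without the region.
    necessary = not result.transcript_without_region_localizes
    return {"sufficient": sufficient, "necessary": necessary}


# Illustrative case: the region localizes a reporter, but the RNA still
# localizes without it, as expected when redundant elements exist.
print(classify_region(ReporterResult(True, True)))
```

Keeping the two criteria as independent booleans reflects the point made in the text that elements can be redundant: a region may be sufficient to localize a reporter without being necessary for localization of the full transcript.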

It is generally thought that if the localization process is very simple, as is the case with the targeting of RNAs to the leading edge of the cell, there is just one very basic element bound by a single trans-factor that targets the transcript (Ross et al., 1997). When the localization process is multi-step and many factors are required to target the RNA to its ultimate destination, an assortment of cis-elements is theorized to work in a combinatorial fashion to localize the RNA (Jambhekar & DeRisi, 2007). Oskar mRNA is among the RNAs shown to localize through a combination of elements (Kim-Ha et al., 1993; Jambor et al., 2014; Kim et al., 2014; Ryu et al., 2017). Other RNAs, like ASH1, can possess multiple cis-elements that are each sufficient to localize the RNA independently but work together to increase the efficiency and accuracy of the RNA’s localization (Chartrand et al., 2002).

Part III. Trans-factors and RNA localization

Cis-elements act in concert with trans-acting factors, RNA binding proteins (RBPs) that bind RNAs and are necessary or sufficient for localizing them to their final destinations. They recognize RNAs and bind them via cis-elements to form RNPs. One of the first trans-factors shown to localize an RNA was zipcode-binding protein (ZBP1) in chicken fibroblasts (Ross et al., 1997). This RBP binds to a 54-nucleotide cis-element in the 3’ UTR of β-actin mRNA and is responsible for transporting β-actin mRNA to the cell periphery (Kislauskis, Zhu & Singer, 1994; Oleynikov & Singer, 2003). Knockdown of ZBP1 results in mislocalization of β-actin mRNA, and ZBP1 protein was also shown to be sufficient to rescue mislocalization of β-actin RNAs in cells lacking ZBP1 protein expression (Oleynikov & Singer, 2003). Subsequent studies revealed that ZBP1 is conserved (Yisraeli, 2005), through the identification of orthologous proteins that function in organisms ranging from Xenopus (Havin et al., 1998) to Homo sapiens (Deshler et al., 1998; Leung et al., 2006; Nielsen et al., 1999; Vikesaa et al., 2006). These studies illustrated the cellular importance of localization by ZBP1.

Since the early studies of β-actin, a host of other RBPs have been identified as playing essential roles in RNA distribution. These trans-factors were largely identified through affinity purification, biochemical studies, or computational approaches (Kindler et al., 2005). The hypothesis that multiple cis-elements can be enriched in RNAs localizing to the same compartment motivated the idea that collections of trans-acting factors might act on a single RNA during the localization process, rather than single RBPs driving the localization of individual RNAs. These factors likely act concomitantly, with each factor being responsible for mediating a different aspect of the localization process. The localization of ASH1 mRNA, described in greater detail below, highlights the key role trans-factors play in marking RNAs for localization and trafficking them to their subcellular destinations. The complex of proteins demonstrated to drive the trafficking of ASH1 mRNA has been found to localize at least 23 other mRNAs to the yeast bud tip in the same manner (Takizawa et al., 2000; Shepard et al., 2003), providing evidence for the theory that networks of RNAs are localized in cells by a shared combination of trans-factors working together to transport their targets. Trans-factors are at the heart of transcriptome organization.

Part IV. RNP granules

RNAs are not only localized as single functional RNPs; they can also be stored or transported in macromolecular complexes consisting of many RNAs and their regulatory RBPs. The supramolecular RNA-protein structures that are large enough to be visualized by light microscopy are termed RNP granules or condensates (De Graeve & Besse, 2018; Formicola, Vijayakumar & Besse, 2019). RNP granules are membraneless, highly variable in size (ranging from 100 nm to 10,000 nm), and can differ greatly in their function and composition (Banani et al., 2017). Some RNP granules are postulated to behave like liquids and can form through liquid-liquid phase separation, much like an oil-and-vinegar emulsion, whereas others condense into gels with solid-like properties (Brangwynne & Hyman, 2009). A combination of RNA-RNA, RNA-protein, and protein-protein interactions drives their assembly and influences their composition (Mittag & Parker, 2018; Van Treeck & Parker, 2018).

RNP granules tend to be heterotypic, meaning they contain a heterogeneous mixture of RNPs that consist of different types of RNA species and their associated factors, which can include translation regulation factors, RBPs, motor proteins and, in some cases such as stress granules, the small ribosomal subunits (Banani et al., 2017). Asymmetric distribution of RNP granules within cells can be driven in the absence of active transport, when condensates diffuse but are entrapped at one pole through condensation and released from the other pole by dissolution (Brangwynne & Hyman, 2009). In other cases, RNP granules are the vehicles in which RNAs are transported to their final destination. For example, MBP mRNA is transported in RNP granules down the oligodendrocyte processes along microtubules to the cell periphery (Ainger et al., 1993). To conclude, RNP granules are large supramolecular complexes that serve many cellular functions, including RNA distribution. These granules can be acted upon by one or a combination of the mechanisms described below to target transcripts to their final subcellular destination.

Part V. Mechanisms of RNA Localization

There are various ways RNAs have been demonstrated to be non-uniformly distributed in eukaryotic cells. All of the mechanisms that non-uniformly distribute RNAs

in the cell act on either single RNPs or on RNP granules to concentrate RNAs to

specific subcellular sites.

Directed transport on cytoskeleton

Directed transport of RNAs to their final destination is considered the most common method of RNA transport (Martin & Ephrussi, 2009). It involves the use of molecular motors to drive directional movement of RNPs along cytoskeletal tracks to their target destination (Gagnon & Mowry, 2011). Molecular motors are small protein machines that undergo conformational changes to convert chemical energy, generated from ATP hydrolysis, into motion, allowing them to walk along cytoskeletal tracks. There are three families of molecular motors: myosin, dynein, and kinesin. All three families have been implicated in mediating RNA targeting, but the directionality of movement and the cytoskeletal structures along which it occurs are determined by the molecular motor driving transport. There is evidence that RNPs can move bidirectionally through the coordination of different molecular motors, but this bidirectional movement ultimately ends with RNPs assuming a final, polarized fate (Gagnon & Mowry, 2011).

Though the factors and processes by which molecular motors become associated with RNPs remain poorly understood, a general model has been derived from a few well-characterized examples. Decades of research into ASH1 mRNA localization have been perhaps the biggest contributor to the current mechanistic understanding of how motor

proteins can traffic RNAs in a polarized manner to specific subcellular sites along the

cytoskeleton. ASH1 mRNA encodes a regulator of mating type switching in budding

yeast (Strathern & Herskowitz, 1979). In the nucleus, the RBP She2p recognizes and binds localization elements in the coding region and 3’UTR of ASH1 mRNA (Long et al., 2001; Böhl et al., 2000a). The ASH1-She2p RNP is exported to the cytoplasm. Once in the cytoplasm, the adapter protein She3p binds to the RNP and links it to the myosin V motor protein Myo4p (Bertrand et al., 1998). Myo4p moves the transport complex, consisting of ASH1 mRNA, She2p, She3p, Myo4p, and other auxiliary proteins, along the actin cytoskeleton to the bud tip (Bertrand et al., 1998; Münchow, Sauter & Jansen,

1999; Takizawa & Vale, 2000; Böhl et al., 2000b; Long et al., 2000).

Figure 1.4. Schematic depicting how the ‘locasome’, the complex of proteins that link ASH1 mRNA to the actin cytoskeleton, transports ASH1 mRNA to the bud tip in budding

yeast. ASH1 mRNA is made in the nucleus, where it is bound by She2p and subsequently exported to the cytoplasm. In the cytoplasm, the ASH1-She2p RNP is bound by She3p and Myo4p; the latter is a motor protein that links the RNP complex to actin filaments. The complex is transported along actin to the bud tip, where ASH1 mRNA is anchored. Image adapted from (Singer & Long, 2003).

The trafficking of other RNAs like Vg1 in developing Xenopus oocytes to the vegetal pole by kinesin motors to the plus ends of microtubules (Birsoy et al., 2006;

Messitt et al., 2008; Betley et al., 2004), and pair-rule RNAs in Drosophila embryos by dynein motors to microtubule minus ends to establish embryonic patterning (Wilkie &

Davis, 2001), largely corroborates the general themes observed in ASH1 mRNA transport. Namely, RBPs recognize and bind cis-elements within transcripts in the nucleus, forming RNPs and marking the RNAs for nuclear export. In the cytoplasm, adapter and scaffolding proteins associate with the RNP and in turn recruit molecular motors to the complex. These motors actively transport the RNP complex along cytoskeletal structures in a polarized fashion to the RNA’s final destination.


Figure 1.5. Schematic depicting RNA localization through active transport. (A) RNA is exported from the nucleus (dark gray). (B) In the cytoplasm, myosin motor proteins can associate with RNPs and move them along actin filaments to the RNA’s final subcellular destination. (C) Alternatively, kinesin or dynein motor proteins can associate with RNPs and move the RNA in a directed manner along microtubules to its final destination. Kinesin motor proteins typically transport their cargo to the microtubule plus-end, and dynein motor proteins transport their cargo to the microtubule minus-end.

Selective stabilization

Another tactic used by the cell to restrict RNA distribution is the generation of a high local concentration of RNAs by locally protecting transcripts from degradation. In this mechanism of RNA localization, an RNA that is distributed throughout the cytoplasm is actively stabilized at a specific subcellular region. Hsp83 RNA localization is one of the most well-characterized examples of RNA selective stabilization. Hsp83 is a target of the trans-factor Smaug in Drosophila embryos (Semotok et al., 2005; 2008). Its degradation is triggered by Smaug binding, which recruits the CCR4-Not complex; this complex removes the poly(A) tail of the RNA, resulting in transcript decay (Semotok et al., 2005; 2008).

Through mechanisms that remain poorly understood, hsp83 transcripts are protected from degradation at the posterior pole plasm resulting in a relatively high local concentration of hsp83 at the posterior (Semotok et al., 2005; 2008).


asymmetry in the biflagellated alga, Chlamydomonas, led to the discovery that when

nuclear pores were concentrated to the posterior end of the nucleus, β2-tubulin mRNAs

were localized to polysome-rich areas of the cytoplasm near the posterior end of the

nucleus (Colón-Ramos et al., 2003). This study remains among the best demonstrations

of this type of asymmetrical RNA distribution. However, polarization of nuclear pore

complexes around nuclei has also been observed in Drosophila blastoderm cells in a cytoskeleton-independent manner that implicates cis-elements in the 3’UTR of pair-rule RNAs (Davis & Ish-Horowicz, 1991), suggesting that it may be a universal strategy employed by cells to distribute specific RNAs asymmetrically.

Phase Separation

More recently, a new mechanism of RNA localization has emerged which implicates RNA’s capacity to co-assemble with proteins to form liquid droplets or solid aggregates. This was first characterized in C. elegans embryos when Brangwynne et al. noted that RNAs and proteins condensed into liquid droplets at the posterior pole while they dissolved at the anterior pole (Brangwynne & Hyman, 2009). The granule accumulation at the posterior pole traps maternal mRNAs just before asymmetric mitosis, producing a high concentration of the maternal mRNAs in the cell that will give rise to germ cells (Brangwynne & Hyman, 2009). This mode of localizing RNAs requires neither active transport along cytoskeletal structures nor cytoplasmic flow; it is

mediated, instead, through a reaction-diffusion mechanism (Kondo & Miura, 2010).

Since this seminal discovery, phase separation has become accepted as a critical

mechanism to compartmentalize RNAs in cells, from bacteria all the way to mammals

(Banani et al., 2017).

Figure 1.8. Asymmetric localization of germ granules in C. elegans embryos through phase separation post-fertilization and before asymmetric mitosis. Following fertilization, germ droplets condense homogeneously throughout the cytoplasm (top). These droplets begin dissolving in the anterior pole of embryos through the action of an anteriorly localized protein that is absent in the posterior pole. The condensates in the posterior pole grow through RNP coassembly, causing an asymmetric distribution (middle). At the end of this process, maternal RNPs are depleted from the anterior pole and concentrated posteriorly (bottom). The red line delineates the frontier of condensation versus dissolution. The condensation gradient is depicted in teal, the dissolution gradient is depicted in maroon, and the dissolving protein’s localization gradient is depicted in green.

To conclude, there are multiple strategies employed by the cell to asymmetrically organize the transcriptome. These mechanisms are not mutually exclusive; indeed, for some RNAs, a combination of these modes of localization produces the overall RNA distribution pattern. As research on RNA localization continues to progress, more mechanisms and a clearer understanding of known mechanisms will undoubtedly emerge.

Part VI. Splicing and RNA localization

Alternative splicing enables different RNAs to be generated from the same gene by the selective inclusion or exclusion of exons. This greatly increases the diversity of mRNA species in the cell. Studies of alternatively spliced isoforms have largely centered on understanding their implications for protein synthesis, but in recent years the influence of alternative splicing on RNA distribution has been increasingly investigated.

Alternative isoforms inherently have sequence differences; RNA isoforms generated from the same gene can therefore have different cis-elements, whose significance in trafficking transcripts was discussed earlier. This presents the possibility for different RNA isoforms of the same gene to differ in their subcellular localization patterns. Cyclin B exemplifies this type of program. There are two isoforms generated from Cyclin B that differ by 393 nucleotides. In Drosophila oocytes the shorter isoform is present ubiquitously in the pro-oocyte, whereas the longer isoform is localized to the posterior pole of the developing oocyte and exhibits a perinuclear pattern in syncytial embryos (Dalby & Glover, 1992).

Splicing also presents the opportunity for regulatory trans-factors to bind to the RNA. Some splicing factors can double as localization factors by remaining associated with the transcript after splicing and interacting with protein complexes directing nuclear export and RNA trafficking once in the cytoplasm (Hachet & Ephrussi, 2004). From studies of oskar mRNA, it was established that splicing is a necessary step in the targeting of the transcript to the posterior pole in Drosophila oocytes (Hachet & Ephrussi,

2001). Upon splicing, components of the exon junction complex, which have been

demonstrated to be essential for oskar mRNA localization (Hachet & Ephrussi, 2001),

become associated with the mRNA in a sequence-independent and splicing-dependent

manner. This illustrated the mechanistic coupling between RNA transport and splicing

(Mohr, Dillon & Boswell, 2001; Hachet & Ephrussi, 2001; Hachet & Ephrussi, 2004; Le

Hir et al., 2001).

Because cis-regulatory elements that drive localization are typically found in 3’

UTRs, it makes logical sense that alternative 3’ UTRs would be key splicing events in influencing a transcript’s spatial fate. In a high-throughput study comparing the transcriptomes of mouse neural projections and soma, it was concluded that distal last exons of alternative 3’ UTRs exhibited the highest differential localization when considering alternative 3’ UTRs, alternative 5’ UTRs, skipped exons, consecutive polyadenylation sites, and tandem 3’ UTRs (Taliaferro et al., 2016). This was a

validation of the theory that splicing programs driving RNA localization are generally

those that differentially splice 3’ UTRs. Even when alternative 3’ UTR isoforms themselves are not differentially localized, the difference in the 3’ UTRs of alternative isoforms can play an important role in the localization pattern of their protein product. The ability of CD47 alternative 3’ UTR isoforms to differentially regulate the final

localization of CD47 protein is evidence of this. The longer 3’ UTR isoform of CD47

drives protein localization to the cell surface while CD47 protein encoded by the shorter

3’ UTR isoform is enriched at the endoplasmic reticulum (Berkovits & Mayr, 2015).

Taken together, splicing plays an integral role in mediating RNA localization: it is

responsible for cis-element differences between isoforms that dictate differential isoform localization patterns, with UTRs emerging as prominent sites of this type of localization regulation. Splicing also defines the trans-factors that will associate with

transcripts and drive RNA localization, and in some cases protein localization patterns.

Part VII. Translation and RNA targeting

The question of how translation regulates RNA localization has dominated localization studies for decades. We now know that RNA localization and translation are tightly coupled. RNAs that are localized tend to be translationally repressed until they arrive at their target destination. This translational repression serves to restrict gene expression to specific subcellular sites and appears to be highly conserved. In the

Drosophila oocyte, oskar mRNA is localized to the posterior pole and is translationally repressed until it reaches the pole plasm. Specific cis-elements in oskar’s 3’UTR are

sites where a translationally repressive trans-factor, Bruno, binds and prevents the RNA

from being translated ectopically (Kim-Ha, Kerr & Macdonald, 1995; Chekulaeva,

Hentze & Ephrussi, 2006). Bruno inhibits translation in two ways: (1) through RNA-independent interactions with Cup, a protein that binds the cap-binding translation initiation factor eIF4E and competes with the ribosome loading factor eIF4G (Nelson, Leidal & Smibert, 2004; Nakamura, Sato & Hanyu-Nakamura, 2004), and (2) by promoting the oligomerization of oskar mRNA into large particles that block the translational machinery from accessing the mRNA (Chekulaeva, Hentze & Ephrussi, 2006). Bruno’s translational

repression of oskar mRNA is essential for proper development, and failure to effectively

prevent translation of oskar mRNA outside of the posterior pole results in developmental defects in the oocyte (Smith, Wilson & Macdonald, 1992; Kim-Ha, Kerr & Macdonald,

1995). In budding yeast, ASH1 mRNA, described above, is also translationally

repressed while being trafficked to its final destination through the binding of the RBP

Khd1p (Paquin et al., 2007). Khd1p prevents the translation initiation factor eIF4G1 from binding and sequestering proteins that promote ASH1 translation (Paquin et al., 2007). The RBP remains associated with ASH1 mRNA until it reaches the bud tip, where phosphorylation of Khd1p by a kinase causes it to dissociate from ASH1 mRNA, allowing for

translational activation of ASH1 at the bud tip (Paquin et al., 2007). This same pattern

can also be seen with β-actin mRNA. ZBP1, the RBP required for β-actin localization to

protrusions in primary fibroblasts and neurons (Ross et al., 1997; Farina, 2002),

prevents translation of β-actin mRNA while it is being shuttled to the cell periphery. Once β-actin mRNA reaches its destination, ZBP1 is phosphorylated by Src kinase, causing ZBP1 to dissociate from β-actin mRNA and leading to the translational activation of β-actin at the

cell periphery (Huttelmaier et al., 2005). From these examples and many others, a

pattern has emerged. RNAs are translationally repressed en route to their final

destination; in many cases the RBPs that are necessary and sufficient for their

localization also double as factors that keep the RNA in a translationally repressed

state. There are specific cis-elements that can be discretely responsible for translational

control and others that act as both translational control elements and elements required

for the RNA’s localization. The key takeaway is that RNA localization is a mode of

translational control and thus serves an essential role in spatiotemporal regulation of

gene expression.

Part VIII. High-throughput studies of RNA localization

Many methods have been developed to study RNA localization but few

technologies have been generated to study RNA localization on a transcriptome-wide

scale. One of the most foundational high-throughput imaging studies that helped to

establish that the majority of transcripts have distinct localization patterns was a high-

throughput in situ hybridization experiment in Drosophila that characterized the

subcellular distribution patterns of thousands of RNAs (Lécuyer et al., 2007). What

emerged from this study was the finding that the majority of transcripts show distribution patterns that are non-uniform. This and other subsequent high-throughput studies led to the development of the concept that there must be specific mechanisms and strategies that cells employ to localize RNAs, such as the ones described above, and highlighted the importance of focusing research efforts on high-throughput approaches. While a modest number of localized transcripts have been shown to exert cellular functions through their localization, many localized transcripts await identification because the predominant approaches for studying RNA localization are low-throughput. As a result, we lack general principles regarding regulation of RNA localization. In this section, high-throughput technologies to investigate RNA localization and their limitations will be discussed.


High-throughput single-molecule FISH

The principle that RNAs can be localized and are asymmetrically distributed in cells was founded on early experiments that relied on visualizing the spatial patterns of

RNAs using in situ hybridization and microscopy (Jeffery, Tomlinson & Brodeur, 1983;

Bertrand et al., 1998). RNA fluorescence in situ hybridization (RNA FISH) and single-molecule FISH are now also commonly used methods for observing localization of

RNAs in addition to other imaging-based methods such as the MS2 hairpin system

(Bertrand et al., 1998). While these methods paint a fairly accurate picture of the subcellular location of RNA species, they are limited in throughput and allow only a few transcripts to be studied at a time. Advances have been made in these image-based approaches, and techniques are emerging to image many RNAs at the same time, providing the ability to determine how ubiquitous RNA localization is in the cell. FISSEQ and MERFISH are two recent advances that allow for a sizeable increase in the number of transcripts that can be assessed in a given experiment and have the potential to increase gene sampling while still employing microscopy for location accuracy.

Fluorescence in situ sequencing (FISSEQ) allows for fixed cells or tissues to be converted into cDNA libraries in such a way that sequencing reads can be obtained by confocal microscopy while keeping transcripts in their in vivo spatial contexts (Lee et al., 2015). While this approach seems promising in its potential to greatly expand our knowledge of transcript organization in the cytoplasm on a global scale, in practice it is too technologically challenging and costly to be implemented by non-specialist labs. The inability to deplete rRNAs creates yet another challenge. Finally, and perhaps most importantly, RNAs must be readily accessible to be evaluated by FISSEQ, meaning that RNAs in proteinaceous RNPs or supramolecular complexes such as RNP granules may not be easily assessed. This creates significant bias and greatly limits the transcripts whose spatial distribution can be investigated (Lee et al., 2015).

MERFISH, multiplexed error-robust FISH, is another recently developed high-throughput imaging-based approach to study global RNA patterns. It uses combinatorial labeling, sequential hybridizations, imaging analysis and an error-robust encoding system to detect hundreds to thousands of target RNAs at once (Chen et al., 2015).

While MERFISH is more accessible than FISSEQ and can visualize thousands of RNAs at once, it is still technologically challenging and has other major drawbacks, such as insufficient resolution for mapping transcripts to organelles or very specific cellular compartments.
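The principle behind an error-robust encoding can be illustrated with a toy sketch. It assumes, purely for illustration, that each RNA species is assigned a binary barcode whose bits record whether a spot fluoresces in each hybridization round, and that barcodes are chosen so that any two differ in at least three positions. The three 6-bit barcodes, the gene names, and the function names below are hypothetical and are not the codebook or software of Chen et al. (2015).

```python
from itertools import combinations

# Hypothetical toy codebook (not the published MERFISH codebook).
# A "1" at position i means the RNA is expected to fluoresce in round i.
CODEBOOK = {
    "geneA": "111000",
    "geneB": "000111",
    "geneC": "101101",
}


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(codebook: dict) -> int:
    """Smallest Hamming distance between any two barcodes in the codebook."""
    return min(hamming(a, b) for a, b in combinations(codebook.values(), 2))


def decode(observed: str, codebook: dict = CODEBOOK, max_errors: int = 1):
    """Assign an observed on/off pattern to a gene, tolerating max_errors bit flips.

    Returns the gene name if exactly one barcode lies within max_errors of the
    observed word, otherwise None (the spot is discarded as undecodable).
    """
    matches = [gene for gene, word in codebook.items()
               if hamming(observed, word) <= max_errors]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    print(min_pairwise_distance(CODEBOOK))  # minimum distance of the toy codebook
    print(decode("110000"))                 # one missed round relative to geneA
    print(decode("110100"))                 # two errors: no barcode within one bit
```

Because every pair of toy barcodes differs in at least three positions, a measured word with a single erroneous bit is still closer to its true barcode than to any other, so it can be corrected; words with more errors are rejected rather than misassigned.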

Proximity labeling

Proximity-based labeling is a recently emerged method to label and purify RNAs and proteins that are in close contact. It requires the use of a modified enzyme, APEX, fused to a protein localized to the subcellular region of interest. The localized APEX-fusion protein can biotinylate RNAs or proteins that come in proximity to the enzyme with a biotin derivative (Hung et al., 2014). Utilizing the capacity of APEX-fusion constructs to label RNAs in targeted locations, Fazal et al. were able to analyze RNAs localized to nine different cellular compartments with an estimated 94% accuracy (Fazal et al., 2019).

This APEX-Seq approach was a major achievement for targeted high-throughput explorations, but the approach has a number of substantial limitations. The most

important limitation is that RNAs in macromolecular complexes or supramolecular complexes such as RNP granules are sterically inaccessible and thus cannot be biotinylated by the APEX-Seq method. Secondly, the APEX fusion construct is an exogenous construct that must be recombinantly expressed in cells, and as a consequence it cannot be readily used to analyze RNA localization in many tissues, such as those of human or mouse (Fazal et al., 2019).

Figure 1.9. Schematic of APEX-Seq protocol for labeling RNAs localized to the outer mitochondrial membrane (OMM). APEX enzyme is genetically targeted to the outer mitochondrial membrane. In the presence of biotin-phenol and hydrogen peroxide, the enzyme can generate biotin-phenoxyl radicals that label RNAs in the targeted region with biotin. Streptavidin beads can be used to purify the RNAs and these isolated RNAs are then subjected to RNA-Seq, allowing for the transcriptome of the cellular region of interest, in this case the OMM, to be investigated. (Adapted from Fazal et al., 2019)

Targeted fractionation-based approaches

Subcellular fractionation is a method to separate organelles and macromolecular

structures from one another. It typically involves a cell homogenization step, followed by layering and centrifuging the homogenate in a gradient, and finally isolation of gradient fractions. Genomics technologies have been used in combination with fractionation techniques to enrich specific organelles and to study their transcriptomes. This approach was used to analyze the human mitochondrial transcriptome using osteosarcoma cells as a model to discover and characterize mitochondrial-localized

RNAs (Mercer et al., 2011). The ER-associated transcriptome was characterized by examining ribosome-associated RNAs localized to the ER using fractionation-seq, which combined purification of the ER by fractionation with ribosome profiling (Reid & Nicchitta,

2012).

Targeted fractionation does have limitations. In particular, isolated organelles and complexes are often contaminated with components of other compartments, and what can be analyzed and assessed is limited to what cellular compartments can be successfully purified.

Concluding Remarks

The initial motivating questions for this dissertation were: what RNAs are localized, where are these RNAs targeted to, and how is their localization mediated?

Chapter II will describe an attempt to create a subcellular map of localized RNAs. What we found through these efforts was that many RNAs that encode proteins with similar functions cosegregate, suggesting that RNA coassembly is important for cellular compartment biogenesis and for maintaining asymmetries in the cell. What mechanisms

drive coassembly of RNAs encoding proteins with similar functions and localizations, and how does this shape the cellular architecture? In Chapter III we will explore these questions and future directions for this dissertation. This work highlights the need to explore new models of subcellular organization. The organization of the transcriptome is more multiscale than what is currently appreciated.

References

Ainger, K., Avossa, D., Morgan, F., Hill, S.J., Barry, C., Barbarese, E. & Carson, J.H. (1993) Transport and localization of exogenous myelin basic protein mRNA microinjected into oligodendrocytes. The Journal of Cell Biology. 123 (2), pp. 431–441.

Allen, L., Kloc, M. & Etkin, L.D. (2003) Identification and characterization of the Xlsirt cis-acting RNA localization element. Differentiation. 71 (6), pp. 311–321.

and, P.C., Singer, R.H. & Long, R.M. (2003) RNP Localization and Transport in Yeast. dx.doi.org. 17 (1), pp. 297–310.

Banani, S.F., Lee, H.O., Hyman, A.A. & Rosen, M.K. (2017) Biomolecular condensates: organizers of cellular biochemistry. Nature Reviews Molecular Cell Biology. 18 (5), pp. 285–298.

Baron, W. & Hoekstra, D. (2010) On the biogenesis of myelin membranes: sorting, trafficking and cell polarity. FEBS letters. 584 (9), pp. 1760–1770.

Bashirullah, A., Cooperstock, R.L. & Lipshitz, H.D. (1998) RNA localization in development. Annual review of biochemistry. 67 (1), pp. 335–394.

Bassell, G.J., Zhang, H., Byrd, A.L., Femino, A.M., Singer, R.H., Taneja, K.L., Lifshitz, L.M., Herman, I.M. & Kosik, K.S. (1998) Sorting of β-Actin mRNA and Protein to Neurites and Growth Cones in Culture. Journal of Neuroscience. 18 (1), pp. 251–265.

Berkovits, B.D. & Mayr, C. (2015) Alternative 3′ UTRs act as scaffolds to regulate membrane protein localization. Nature. 522 (7556), pp. 363–367.

Bertrand, E., Chartrand, P., Schaefer, M., Shenoy, S.M., Singer, R.H. & Long, R.M. (1998) Localization of ASH1 mRNA Particles in Living Yeast. Molecular Cell. 2 (4), pp. 437–445.

Betley, J.N., Heinrich, B., Vernos, I., Sardet, C., Prodon, F. & Deshler, J.O. (2004) Kinesin II mediates Vg1 mRNA transport in Xenopus oocytes. Current Biology. 14 (3), pp. 219–224.

Birsoy, B., Kofron, M., Schaible, K., Wylie, C. & Heasman, J. (2006) Vg 1 is an essential signaling molecule in Xenopus development. Development. 133 (1), pp. 15–20.

Boggs, J.M. (2006) Myelin basic protein: a multifunctional protein. Cellular and Molecular Life Sciences CMLS. 63 (17), pp. 1945–1961.

Böhl, F., Kruse, C., Frank, A., Ferring, D. & Jansen, R.-P. (2000a) She2p, a novel RNA-binding protein tethers ASH1 mRNA to the Myo4p myosin motor via She3p. The EMBO Journal. 19 (20), pp. 5514–5524.

Böhl, F., Kruse, C., Frank, A., Ferring, D. & Jansen, R.P. (2000b) She2p, a novel RNA-binding protein tethers ASH1 mRNA to the Myo4p myosin motor via She3p. The EMBO Journal. 19 (20), pp. 5514–5524.

Brangwynne, C.P. & Hyman, A.A. (2009) Germline P Granules Are Liquid Droplets That Localize by Controlled Dissolution/Condensation. Science. 324 (5935), pp. 1726–1729.

Bullock, S.L. & Ish-Horowicz, D. (2001) Conserved signals and machinery for RNA transport in Drosophila oogenesis and embryogenesis. Nature. 414 (6864), pp. 611–616.

Chang, P., Torres, J., Lewis, R.A., Mowry, K.L., Houliston, E. & Lou King, M. (2004) Localization of RNAs to the Mitochondrial Cloud in Xenopus Oocytes through Entrapment and Association with Endoplasmic Reticulum. Molecular Biology of the Cell. 15 (10), pp. 4669–4681.

Chartrand, P., Meng, X.H., Huttelmaier, S., Donato, D. & Singer, R.H. (2002) Asymmetric sorting of ash1p in yeast results from inhibition of translation by localization elements in the mRNA. Molecular Cell. 10 (6), pp. 1319–1330.

Chartrand, P., Meng, X.H., Singer, R.H. & Long, R.M. (1999) Structural elements required for the localization of ASH1 mRNA and of a green fluorescent protein reporter particle in vivo. Current Biology. 9 (6), pp. 333–336.

Chekulaeva, M., Hentze, M.W. & Ephrussi, A. (2006) Bruno acts as a dual repressor of oskar translation, promoting mRNA oligomerization and formation of silencing particles. Cell. 124 (3), pp. 521–533.

Chen, K.H., Boettiger, A.N., Moffitt, J.R., Wang, S. & Zhuang, X. (2015) RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science. 348 (6233), pp. aaa6090–aaa6090.

Colón-Ramos, D.A., Salisbury, J.L., Sanders, M.A., Shenoy, S.M., Singer, R.H. & García-Blanco, M.A. (2003) Asymmetric distribution of nuclear pore complexes and the cytoplasmic localization of beta2-tubulin mRNA in Chlamydomonas reinhardtii. Developmental Cell. 4 (6), pp. 941–952.

Cornejo, E., Abreu, N. & Komeili, A. (2014) Compartmentalization and organelle formation in bacteria. Current Opinion in Cell Biology. 26, pp. 132–138.

Crino, P.B. & Eberwine, J. (1996) Molecular Characterization of the Dendritic Growth Cone: Regulated mRNA Transport and Local Protein Synthesis. Neuron. 17 (6), pp. 1173–1187.

Dalby, B. & Glover, D.M. (1992) 3' non-translated sequences in Drosophila cyclin B transcripts direct posterior pole accumulation late in oogenesis and peri-nuclear association in syncytial embryos. Development. 115 (4), pp. 989–997.

Davis, I. & Ish-Horowicz, D. (1991) Apical localization of pair-rule transcripts requires 3′ sequences and limits protein diffusion in the Drosophila blastoderm embryo. Cell. 67 (5), pp. 927–940.

De Graeve, F. & Besse, F. (2018) Neuronal RNP granules: from physiological to pathological assemblies. Biological chemistry. 399 (7), pp. 623–635.

Deshler, J.O., Highett, M.I. & Schnapp, B.J. (1997) Localization of Xenopus Vg1 mRNA by Vera Protein and the Endoplasmic Reticulum. Science. 276 (5315), pp. 1128–1131.

Deshler, J.O., Highett, M.I., Abramson, T. & Schnapp, B.J. (1998) A highly conserved RNA-binding protein for cytoplasmic mRNA localization in vertebrates. Current Biology. 8 (9), pp. 489–496.

Dominguez, D., Freese, P., Alexis, M.S., Su, A., Hochman, M., Palden, T., Bazile, C., Lambert, N.J., Van Nostrand, E.L., Pratt, G.A., Yeo, G.W., Graveley, B.R. & Burge, C.B. (2018) Sequence, Structure, and Context Preferences of Human RNA Binding Proteins. Molecular Cell. 70 (5), pp. 854–867.e859.

Dreyfuss, G., Kim, V.N. & Kataoka, N. (2002) Messenger-RNA-binding proteins and the messages they carry. Nature Reviews Molecular Cell Biology. 3 (3), pp. 195–205.

Fan, Y., Zhao, H.-C., Liu, J., Tan, T., Ding, T., Li, R., Zhao, Y., Yan, J., Sun, X., Yu, Y. & Qiao, J. (2015) Aberrant expression of maternal Plk1 and Dctn3 results in the developmental failure of human in-vivo- and in-vitro-matured oocytes. Scientific reports. 5 (1), pp. 8192–10.

Farina, K.L. (2002) Two ZBP1 KH domains facilitate beta-actin mRNA localization, granule formation, and cytoskeletal attachment. The Journal of Cell Biology. 160 (1), pp. 77–87.

Fazal, F.M., Han, S., Parker, K.R., Kaewsapsak, P., Xu, J., Boettiger, A.N., Chang, H.Y. & Ting, A.Y. (2019) Atlas of Subcellular RNA Localization Revealed by APEX-Seq. Cell. 178 (2), pp. 473–490.e26.

Ferrandon, D., Elphick, L., Nüsslein-Volhard, C. & St Johnston, D. (1994) Staufen protein associates with the 3′UTR of bicoid mRNA to form particles that move in a microtubule-dependent manner. Cell. 79 (7), pp. 1221–1232.

Ferrandon, D., Koch, I., Westhof, E. & Nüsslein-Volhard, C. (1997) RNA-RNA interaction is required for the formation of specific bicoid mRNA 3' UTR-STAUFEN ribonucleoprotein particles. The EMBO Journal. 16 (7), pp. 1751–1758.

Formicola, N., Vijayakumar, J. & Besse, F. (2019) Neuronal ribonucleoprotein granules: Dynamic sensors of localized signals. Traffic. 20 (9), pp. 639–649.

Forrest, K.M. & Gavis, E.R. (2003) Live Imaging of Endogenous RNA Reveals a Diffusion and Entrapment Mechanism for nanos mRNA Localization in Drosophila. Current Biology. 13 (14), pp. 1159–1168.

Gagnon, J.A. & Mowry, K.L. (2011) Molecular motors: directing traffic during RNA localization. Critical Reviews in Biochemistry and Molecular Biology. 46 (3), pp. 229–239.

Gautreau, D., Cote, C.A. & Mowry, K.L. (1997) Two copies of a subelement from the Vg1 RNA localization sequence are sufficient to direct vegetal localization in Xenopus oocytes. Development. 124 (24), pp. 5013–5020.

Genetics, R.J.T.I.1996 (n.d.) Mother-cell-specific HO expression in budding yeast depends on the unconventional myosin Myo4p and other cytoplasmic proteins- Asymmetric ….

Gerstberger, S., Hafner, M. & Tuschl, T. (2014) A census of human RNA-binding proteins. Nature Reviews Genetics. 15 (12), pp. 829–845.

Gonzalez, I., Buonomo, S.B., Nasmyth, K. & Ahsen, von, U. (1999) ASH1 mRNA localization in yeast involves multiple secondary structural elements and Ash1 protein translation. Current Biology. 9 (6), pp. 337–340.

Gumy, L.F., Yeo, G.S.H., Tung, Y.C.L., Zivraj, K.H., Willis, D., Coppola, G., Lam, B.Y.H., Twiss, J.L., Holt, C.E. & Fawcett, J.W. (2011) Transcriptome analysis of embryonic and adult sensory axons reveals changes in mRNA repertoire localization. RNA. 17 (1), pp. 85–98.

Hachet, O. & Ephrussi, A. (2001) Drosophila Y14 shuttles to the posterior of the oocyte and is required for oskar mRNA transport. Current Biology. 11 (21), pp. 1666–1674.

Hachet, O. & Ephrussi, A. (2004) Splicing of oskar RNA in the nucleus is coupled to its cytoplasmic localization. Nature. 428 (6986), pp. 959–963.

Hamilton, R.S. & Davis, I. (2011) Identifying and Searching for Conserved RNA Localisation Signals. In: RNA Detection and Visualization. Methods in Molecular Biology. Totowa, NJ: Humana Press. pp. 447–466. doi:10.1007/978-1-61779-005-8_27.

Havin, L., Git, A., Elisha, Z., Oberman, F., Yaniv, K., Schwartz, S.P., Standart, N. & Yisraeli, J.K. (1998) RNA-binding protein conserved in both microtubule- and microfilament-based RNA localization. Genes & Development. 12 (11), pp. 1593–1598.

Holt, C. (2019) Molecular control of local translation in axon development and maintenance.

Houston, D.W. & King, M.L. (2000) A critical role for Xdazl, a germ plasm-localized RNA, in the differentiation of primordial germ cells in Xenopus. Development. 127 (3), pp. 447–456.

Hung, V., Zou, P., Rhee, H.-W., Udeshi, N.D., Cracan, V., Svinkina, T., Carr, S.A., Mootha, V.K. & Ting, A.Y. (2014) Proteomic mapping of the human mitochondrial intermembrane space in live cells via ratiometric APEX tagging. Molecular Cell. 55 (2), pp. 332–341.

Huttelmaier, S., Zenklusen, D., Lederer, M., Dictenberg, J., Lorenz, M., Meng, X., Bassell, G.J., Condeelis, J. & Singer, R.H. (2005) Spatial regulation of beta-actin translation by Src-dependent phosphorylation of ZBP1. Nature. 438 (7067), pp. 512–515.

Jambhekar, A. & DeRisi, J.L. (2007) Cis-acting determinants of asymmetric, cytoplasmic RNA transport. RNA. 13 (5), pp. 625–642.

Jambor, H., Mueller, S., Bullock, S.L. & Ephrussi, A. (2014) A stem-loop structure directs oskar mRNA to microtubule minus ends. RNA. 20 (4), pp. 429–439.

Jeffery, W.R., Tomlinson, C.R. & Brodeur, R.D. (1983) Localization of actin messenger RNA during early ascidian development. Developmental Biology. 99 (2), pp. 408–417.

Kafri, M., Metzl-Raz, E., Jona, G. & Barkai, N. (2016) The Cost of Protein Production. CellReports. 14 (1), pp. 22–31.

Kannaiah, S., Livny, J. & Amster-Choder, O. (2019) Spatiotemporal Organization of the E. coli Transcriptome: Translation Independence and Engagement in Regulation. Molecular Cell. 76 (4), pp. 574–589.e577.

Kim, J., Lee, J., Lee, S., Lee, B. & Kim-Ha, J. (2014) Phylogenetic comparison of oskar mRNA localization signals. Biochemical and biophysical research communications. 444 (1), pp. 98–103.

Kim-Ha, J., Kerr, K. & Macdonald, P.M. (1995) Translational regulation of oskar mRNA by Bruno, an ovarian RNA-binding protein, is essential. Cell. 81 (3), pp. 403–412.

Kim-Ha, J., Webster, P.J., Smith, J.L. & Macdonald, P.M. (1993) Multiple RNA regulatory elements mediate distinct steps in localization of oskar mRNA. Development. 119 (1), pp. 169–178.

Kindler, S., Wang, H., Richter, D. & Tiedge, H. (2005) RNA transport and local control of translation. Annual Review of Cell and Developmental Biology. 21 (1), pp. 223–245.

Kislauskis, E.H., Zhu, X. & Singer, R.H. (1994) Sequences responsible for intracellular localization of beta-actin messenger RNA also affect cell phenotype. The Journal of Cell Biology. 127 (2), pp. 441–451.

Kloc, M., Spohr, G. & Etkin, L.D. (1993) Translocation of repetitive RNA sequences with the germ plasm in Xenopus oocytes. Science. 262 (5140), pp. 1712–1714.

Komeili, A. (2012) Molecular mechanisms of compartmentalization and biomineralization in magnetotactic bacteria. FEMS microbiology reviews. 36 (1), pp. 232–255.

Kondo, S. & Miura, T. (2010) Reaction-diffusion model as a framework for understanding biological pattern formation. Science. 329 (5999), pp. 1616–1620.

Le Hir, H., Gatfield, D., Braun, I.C., Forler, D. & Izaurralde, E. (2001) The protein Mago provides a link between splicing and mRNA localization. EMBO reports. 2 (12), pp. 1119–1124.

Lee, J.H., Daugharthy, E.R., Scheiman, J., Kalhor, R., Ferrante, T.C., Terry, R., Turczyk, B.M., Yang, J.L., Lee, H.S., Aach, J., Zhang, K. & Church, G.M. (2015) Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues. Nature protocols. 10 (3), pp. 442–458.

Leung, K.-M., Lu, B., Wong, H.H.-W., Lin, J.Q., Turner-Bridger, B. & Holt, C.E. (2018) Cue-Polarized Transport of β-actin mRNA Depends on 3'UTR and Microtubules in Live Growth Cones. Frontiers in cellular neuroscience. 12, pp. 300.

Leung, K.-M., van Horck, F.P., Lin, A.C., Allison, R., Standart, N. & Holt, C.E. (2006) Asymmetrical β-actin mRNA translation in growth cones mediates attractive turning to netrin-1. Nature Neuroscience. 9 (10), pp. 1247–1256.

Lécuyer, E., Yoshida, H., Parthasarathy, N., Alm, C., Babak, T., Cerovina, T., Hughes, T.R., Tomancak, P. & Krause, H.M. (2007) Global Analysis of mRNA Localization Reveals a Prominent Role in Organizing Cellular Architecture and Function. Cell. 131 (1), pp. 174–187.

Long, R.M., Gu, W., Lorimer, E., Singer, R.H. & Chartrand, P. (2000) She2p is a novel RNA-binding protein that recruits the Myo4p-She3p complex to ASH1 mRNA. The EMBO Journal. 19 (23), pp. 6592–6601.

Long, R.M., Gu, W., Meng, X., Gonsalvez, G., Singer, R.H. & Chartrand, P. (2001) An exclusively nuclear RNA-binding protein affects asymmetric localization of ASH1 mRNA and Ash1p in yeast. The Journal of Cell Biology. 153 (2), pp. 307–318.

Long, R.M., Singer, R.H., Meng, X., Gonzalez, I., Nasmyth, K. & Jansen, R.-P. (1997) Mating Type Switching in Yeast Controlled by Asymmetric Localization of ASH1 mRNA. Science. 277 (5324), pp. 383–387.

Lui, J., Castelli, L.M., Pizzinga, M., Simpson, C.E., Hoyle, N.P., Bailey, K.L., Campbell, S.G. & Ashe, M.P. (2014) Granules Harboring Translationally Active mRNAs Provide a Platform for P-Body Formation following Stress. CellReports. 9 (3), pp. 944–954.

Macdonald, P.M. & Kerr, K. (1998) Mutational Analysis of an RNA Recognition Element That Mediates Localization of bicoid mRNA. Molecular and Cellular Biology. 18 (7), pp. 3788–3795.

Macdonald, P.M. & Struhl, G. (1988) cis-acting sequences responsible for anterior localization of bicoid mRNA in Drosophila embryos. Nature. 336 (6199), pp. 595–598.

Martin, K.C. & Ephrussi, A. (2009) mRNA Localization: Gene Expression in the Spatial Dimension. Cell. 136 (4), pp. 719–730.

Mercer, T.R., Neph, S., Dinger, M.E., Crawford, J., Smith, M.A., Shearwood, A.-M.J., Haugen, E., Bracken, C.P., Rackham, O., Stamatoyannopoulos, J.A., Filipovska, A. & Mattick, J.S. (2011) The Human Mitochondrial Transcriptome. Cell. 146 (4), pp. 645–658.

Messitt, T.J., Gagnon, J.A., Kreiling, J.A., Pratt, C.A., Yoon, Y.J. & Mowry, K.L. (2008) Multiple Kinesin Motors Coordinate Cytoplasmic RNA Transport on a Subpopulation of Microtubules in Xenopus Oocytes. Developmental Cell. 15 (3), pp. 426–436.

Mittag, T. & Parker, R. (2018) Multiple Modes of Protein-Protein Interactions Promote RNP Granule Assembly. Journal of Molecular Biology. 430 (23), pp. 4636–4649.

Mohr, S.E., Dillon, S.T. & Boswell, R.E. (2001) The RNA-binding protein Tsunagi interacts with Mago Nashi to establish polarity and localize oskar mRNA during Drosophila oogenesis. Genes & Development. 15 (21), pp. 2886–2899.

Münchow, S., Sauter, C. & Jansen, R.P. (1999) Association of the class V myosin Myo4p with a localised messenger RNA in budding yeast depends on She proteins. Journal of cell science. 112 (Pt 10), pp. 1511–1518.

Nakamura, A., Sato, K. & Hanyu-Nakamura, K. (2004) Drosophila Cup Is an eIF4E Binding Protein that Associates with Bruno and Regulates oskar mRNA Translation in Oogenesis. Developmental Cell. 6 (1), pp. 69–78.

Nelson, M.R., Leidal, A.M. & Smibert, C.A. (2004) Drosophila Cup is an eIF4E-binding protein that functions in Smaug-mediated translational repression. The EMBO Journal. 23 (1), pp. 150–159.

Nielsen, J., Christiansen, J., Lykke-Andersen, J., Johnsen, A.H., Wewer, U.M. & Nielsen, F.C. (1999) A family of insulin-like growth factor II mRNA-binding proteins represses translation in late development. Molecular and Cellular Biology. 19 (2), pp. 1262–1270.

Oleynikov, Y. & Singer, R.H. (2003) Real-time visualization of ZBP1 association with beta-actin mRNA during transcription and localization. Current Biology. 13 (3), pp. 199–207.

Palacios, I.M. (2002) RNA Processing: Splicing and the Cytoplasmic Localisation of mRNA. Current Biology. 12 (2), pp. R50–R52.

Paquin, N., Ménade, M., Poirier, G., Donato, D., Drouet, E. & Chartrand, P. (2007) Local Activation of Yeast ASH1 mRNA Translation through Phosphorylation of Khd1p by the Casein Kinase Yck1p. Molecular Cell. 26 (6), pp. 795–809.

Piper, M., Anderson, R., Dwivedy, A., Weinl, C., van Horck, F., Leung, K.-M., Cogill, E. & Holt, C. (2006) Signaling mechanisms underlying Slit2-induced collapse of Xenopus retinal growth cones. Neuron. 49 (2), pp. 215–228.

Reid, D.W. & Nicchitta, C.V. (2012) Primary role for endoplasmic reticulum-bound ribosomes in cellular translation identified by ribosome profiling. The Journal of biological chemistry. 287 (8), pp. 5518–5527.

Ross, A.F., Oleynikov, Y., Kislauskis, E.H., Taneja, K.L. & Singer, R.H. (1997) Characterization of a beta-actin mRNA zipcode-binding protein. Molecular and Cellular Biology. 17 (4), pp. 2158–2165.

Ryu, Y.H., Kenny, A., Gim, Y., Snee, M. & Macdonald, P.M. (2017) Multiple cis-acting signals, some weak by necessity, collectively direct robust transport of oskar mRNA to the oocyte. Journal of cell science. 130 (18), pp. 3060–3071.

Sasakura, Y. & Makabe, K.W. (2002) Identification of cis Elements Which Direct the Localization of Maternal mRNAs to the Posterior Pole of Ascidian Embryos. Developmental Biology. 250 (1), pp. 128–144.

Semotok, J.L., Cooperstock, R.L., Pinder, B.D., Vari, H.K., Lipshitz, H.D. & Smibert, C.A. (2005) Smaug Recruits the CCR4/POP2/NOT Deadenylase Complex to Trigger Maternal Transcript Localization in the Early Drosophila Embryo. Current Biology. 15 (4), pp. 284–294.

Semotok, J.L., Luo, H., Cooperstock, R.L., Karaiskakis, A., Vari, H.K., Smibert, C.A. & Lipshitz, H.D. (2008) Drosophila Maternal Hsp83 mRNA Destabilization Is Directed by Multiple SMAUG Recognition Elements in the Open Reading Frame. Molecular and Cellular Biology. 28 (22), pp. 6757–6772.

Shepard, K.A., Gerber, A.P., Jambhekar, A., Takizawa, P.A., Brown, P.O., Herschlag, D., DeRisi, J.L. & Vale, R.D. (2003) Widespread cytoplasmic mRNA transport in yeast: identification of 22 bud-localized transcripts using DNA microarray analysis. Proceedings of the National Academy of Sciences. 100 (20), pp. 11429–11434.

Smith, J.L., Wilson, J.E. & Macdonald, P.M. (1992) Overexpression of oskar directs ectopic activation of nanos and presumptive pole cell formation in Drosophila embryos. Cell. 70 (5), pp. 849–859.

Snee, M.J., Arn, E.A., Bullock, S.L. & Macdonald, P.M. (2005) Recognition of the bcd mRNA localization signal in Drosophila embryos and ovaries. Molecular and Cellular Biology. 25 (4), pp. 1501–1510.

Steward, O., Wallace, C.S., Lyford, G.L. & Worley, P.F. (1998) Synaptic Activation Causes the mRNA for the IEG Arc to Localize Selectively near Activated Postsynaptic Sites on Dendrites. Neuron. 21 (4), pp. 741–751.

Strathern, J.N. & Herskowitz, I. (1979) Asymmetry and directionality in production of new cell types during clonal growth: the switching pattern of homothallic yeast. Cell. 17 (2), pp. 371–381.

Takizawa, P.A. & Vale, R.D. (2000) The myosin motor, Myo4p, binds Ash1 mRNA via the adapter protein, She3p. Proceedings of the National Academy of Sciences. 97 (10), pp. 5273–5278.

Takizawa, P.A., DeRisi, J.L., Wilhelm, J.E. & Vale, R.D. (2000) Plasma membrane compartmentalization in yeast by messenger RNA transport and a septin diffusion barrier. Science. 290 (5490), pp. 341–344.

Taliaferro, J.M., Vidaki, M., Oliveira, R., Olson, S., Zhan, L., Saxena, T., Wang, E.T., Graveley, B.R., Gertler, F.B., Swanson, M.S. & Burge, C.B. (2016) Distal Alternative Last Exons Localize mRNAs to Neural Projections. Molecular Cell. 61 (6), pp. 821–833.

Thio, G.L., Ray, R.P., Barcelo, G. & Schüpbach, T. (2000) Localization of gurken RNA in Drosophila oogenesis requires elements in the 5′ and 3′ regions of the transcript. Developmental Biology. 221 (2), pp. 435–446.

Tzatsos, A. & Tsichlis, P.N. (2007) Energy depletion inhibits phosphatidylinositol 3-kinase/Akt signaling and induces apoptosis via AMP-activated protein kinase-dependent phosphorylation of IRS-1 at Ser-794. Journal of Biological Chemistry. 282 (25), pp. 18069–18082.

Van Treeck, B. & Parker, R. (2018) Emerging Roles for Intermolecular RNA-RNA Interactions in RNP Assemblies. Cell. 174 (4), pp. 791–802.

Vikesaa, J., Hansen, T.V.O., Jønson, L., Borup, R., Wewer, U.M., Christiansen, J. & Nielsen, F.C. (2006) RNA-binding IMPs promote cell adhesion and invadopodia formation. The EMBO Journal. 25 (7), pp. 1456–1468.

Viphakone, N., Sudbery, I., Griffith, L., Heath, C.G., Sims, D. & Wilson, S.A. (2019) Co-transcriptional Loading of RNA Export Factors Shapes the Human Transcriptome. Molecular Cell. 75 (2), pp. 310–323.e318.

Wilkie, G.S. & Davis, I. (2001) Drosophila wingless and Pair-Rule Transcripts Localize Apically by Dynein-Mediated Transport of RNA Particles. Cell. 105 (2), pp. 209–219.

Willis, D., Li, K.W., Zheng, J.-Q., Chang, J.H., Smit, A., Kelly, T., Merianda, T.T., Sylvester, J., van Minnen, J. & Twiss, J.L. (2005) Differential Transport and Local Translation of Cytoskeletal, Injury-Response, and Neurodegeneration Protein mRNAs in Axons. Journal of Neuroscience. 25 (4), pp. 778–791.

Yin, X., Peterson, J., Gravel, M., Braun, P.E. & Trapp, B.D. (1997) CNP overexpression induces aberrant oligodendrocyte membranes and inhibits MBP accumulation and myelin compaction. Journal of Neuroscience Research. 50 (2), pp. 238–247.

Yisraeli, J.K. (2005) VICKZ proteins: a multi-talented family of regulatory RNA-binding proteins. Biology of the Cell. 97 (1), pp. 87–96.

Yong-Jun Fan, W.-X.Z. (2013) The cellular decision between apoptosis and autophagy. Chinese Journal of Cancer. 32 (3), pp. 121–.

Zhang, J., Talbot, W.S. & Schier, A.F. (1998) Positional Cloning Identifies Zebrafish one-eyed pinhead as a Permissive EGF-Related Ligand Required during Gastrulation. Cell. 92 (2), pp. 241–251.


Chapter II

Transcriptome-wide Organization of Subcellular Microenvironments Revealed by ATLAS-Seq

Danielle Adekunle and Eric T. Wang

Author Contributions

D.A. designed and performed all experiments and conducted all computational analysis under the guidance of E.T.W. D.A. drafted the manuscript, and D.A. and E.T.W. revised the manuscript.

This chapter is adapted from an article accepted at Nucleic Acids Research, 2020.


Abstract

Subcellular organization of RNAs and proteins is critical for cell function, but we still lack global maps and conceptual frameworks for how these molecules are localized in cells and tissues. Here we introduce ATLAS-Seq, which generates transcriptomes and proteomes from detergent-free tissue lysates fractionated across a sucrose gradient. Proteomic analysis of fractions confirmed separation of subcellular compartments. Unexpectedly, RNAs tended to co-sediment with other RNAs in similar protein complexes, cellular compartments, or with similar biological functions. With the exception of those encoding secreted proteins, most RNAs sedimented differently than their encoded protein counterparts. To identify RNA binding proteins potentially driving these patterns, we correlated their sedimentation profiles to all RNAs, confirming known interactions and predicting new associations. Hundreds of alternative RNA isoforms exhibited distinct sedimentation patterns across the gradient, despite sharing most of their coding sequence. These observations suggest that transcriptomes can be organized into networks of co-segregating mRNAs encoding functionally related proteins and provide insights into the establishment and maintenance of subcellular organization.

Introduction

Subcellular organization is critical for compartmentalization of intracellular processes and spatiotemporal control of RNA metabolism and protein translation. RNAs distribute to distinct microenvironments such as the ER (1, 2), the leading edge of the cell (3), axons (4), and dendrites (5). These patterns facilitate cellular functions (6), including cell fate determination (7), directed movement (8), embryonic patterning (9), and synaptic plasticity (10). RNAs can be localized by RNA binding proteins (RBPs) via formation of ribonucleoprotein (RNP) particles or RNA transport granules that may travel along the cytoskeleton (11). For example, zipcode-binding protein localizes β-actin mRNA to the leading edge of fibroblasts (12), and the She2/She3/Myo4 complex localizes ASH1 mRNA to budding yeast tips (13). Cis-elements unique to each mRNA, even at the isoform level, control the repertoire of RBPs that they recruit. For example, constitutive or alternative 3’ UTRs of mRNAs can recruit specific RBPs that influence both RNA and protein fate (14-16). Indeed, different RNPs can influence formation of granules or compartments with differing physical properties (17-19), and these properties could play a role in dictating their final destinations. One potential reason to co-distribute RNAs is to facilitate efficient co-translation or co-assembly of the proteins they encode.

While extensive efforts have been focused on mapping interactions between cis-elements and trans-factors (20), a major challenge remains to characterize how RNAs are distributed across different types of RNPs and whether they may be localized to distinct subcellular microenvironments.

Many techniques have been developed to study subcellular localization of RNA. In situ hybridization (21) offers high accuracy and resolution, especially with single molecule approaches, but is generally low throughput. To address this limitation, techniques such as MERFISH (22) and FISSEQ (23) have been developed to simultaneously visualize thousands of RNAs. In spite of these advances, in situ approaches do not easily reveal physical or biochemical properties of the subcellular compartments to which these RNAs localize. Without super-resolution or expansion microscopy, it can be challenging to determine whether RNAs are associated with structures such as membranes, vesicles, or the cytoskeleton. Proximity labeling techniques using BirA or APEX (24), coupled to deep sequencing, have provided alternative routes towards identifying these associations. However, it is challenging to apply these techniques to tissues in vivo, and they require exogenous introduction of fusion proteins to biotinylate specific organelles.

Traditional biochemical fractionation is therefore an attractive alternative to separate RNPs with distinct biophysical properties (25). Sedimentation across density gradients has been used to stratify protein complexes across cellular compartments (26), and analyses of sedimentation profiles reveal differences that are typically hidden from both image-based and enrichment-based methods. Fractionation combined with sequencing has been used to analyze the transcriptome of specific cellular compartments that are purifiable (27, 28), but this approach has not been used to analyze transcriptomes of many cellular compartments simultaneously with high resolution.

Here, we describe “Assigning Transcript Locations Across Sucrose-Sequencing” (ATLAS-Seq), a detergent-free method that fractionates tissue homogenate across a continuous sucrose gradient by density ultracentrifugation, followed by RNA sequencing and mass spectrometry. We have used this approach to develop a map of the subcellular organization of the transcriptome in mouse liver and find that transcripts encoding proteins involved in similar biological processes display similar sedimentation profiles. These profiles reflect a wide array of cellular compartments and correlate with RBP sedimentation patterns, making predictions about regulatory associations. Global characterization of these profiles is a first step towards the elucidation of how RNA-protein interactions generate and maintain these subcellular compartments.

Materials and Methods

Subcellular Fractionation

Wild-type FVB female mouse livers were dissected and washed in ice-cold PBS. Tissue was placed in a tube containing buffered sucrose solution (0.25 M sucrose, 20 mM Tris, water supplemented with protease inhibitor cocktail and 10 mM ribonucleoside-vanadyl complex (VRC) as a ribonuclease inhibitor) with 2.8 mm ceramic beads and homogenized in a bead homogenizer. Homogenized tissue was centrifuged at 5,000 x g for 10 minutes to remove nuclei. A BioComp Gradient Master™ was used to generate an 11 mL 10-50% sucrose gradient (with 10 mM VRC). Homogenate was layered onto the gradient, and components were resolved by ultracentrifugation in an SW41 rotor for 3 hours at 30,000 rpm (4 °C). Twenty-four 0.5 mL fractions were collected from the gradient using the BioComp Piston Gradient Fractionator™. Fractions were split for RNA and protein extraction. RNA was extracted from each fraction using the Direct-zol RNA miniprep kit. To remove the ribonucleoside-vanadyl complex, 10 equivalents of EDTA (relative to the VRC concentration) were added to each sample in Trizol reagent before ethanol was added. Protein concentrations were measured with the Pierce BCA protein assay kit.

RNA-Seq

The Kapa stranded RNA-Seq with RiboErase kit was used to prepare libraries according to the manufacturer’s instructions. An equal mass (500 ng) of RNA was used as input to each individual library. Library quality was assessed using a BioAnalyzer (Agilent, Santa Clara, CA), and libraries were quantified using a Qubit (Life Technologies) prior to pooling for sequencing. Pooled libraries were 75-bp paired-end sequenced on an Illumina Next-Seq 550 v2.

Mass Spectrometry

Proteins were reduced with 10 mM dithiothreitol for 1 h at 56 °C and then alkylated with 55 mM iodoacetamide for 1 h at 25 °C in the dark. Proteins were digested with modified trypsin at an enzyme/substrate ratio of 1:50 in 100 mM ammonium bicarbonate, pH 8.9 at 25 °C overnight. Trypsin activity was halted by addition of acetic acid (99.9%) to a final concentration of 5%. Peptides were desalted using C18 SpinTips (Protea, Morgantown, WV) and then vacuum centrifuged. Peptide labeling with TMT 10plex was performed per manufacturer’s instructions. Lyophilized samples were dissolved in 70 μL ethanol and 30 μL of 500 mM triethylammonium bicarbonate, pH 8.5, and the TMT reagent was dissolved in 30 μL of anhydrous acetonitrile. The solution containing peptides and TMT reagent was vortexed and incubated at room temperature for 1 h. Samples labeled with the ten different isotopic TMT reagents were combined and concentrated to completion in a vacuum centrifuge.

Peptides were separated by reverse phase HPLC (Thermo Easy nLC1000) using a precolumn (made in house, 6 cm of 10 µm C18) and a self-packed 5 µm tip analytical column (12 cm of 5 µm C18, New Objective) over a 140 minute gradient before nanoelectrospray using a QExactive mass spectrometer (Thermo). Solvent A was 0.1% formic acid and solvent B was 80% MeCN/0.1% formic acid. The gradient conditions were 0-10% B (0-5 min), 10-30% B (5-105 min), 30-40% B (105-119 min), 40-60% B (119-124 min), 60-100% B (124-126 min), 100% B (126-136 min), 100-0% B (136-138 min), 0% B (138-140 min), and the mass spectrometer was operated in a data-dependent mode. The parameters for the full scan MS were: resolution of 70,000 across 350-2000 m/z, AGC 3e6, and maximum IT 50 ms. The full MS scan was followed by MS/MS for the top 10 precursor ions in each cycle with an NCE of 32 and dynamic exclusion of 30 s. Raw mass spectral data files (.raw) were searched using Proteome Discoverer (Thermo) and Mascot version 2.4.1 (Matrix Science). Mascot search parameters were: 15 ppm mass tolerance for precursor ions; 15 mmu for fragment ion mass tolerance; 2 missed cleavages of trypsin; fixed modifications were carbamidomethylation of cysteine and TMT 10plex modification of lysines and peptide N-termini; variable modifications were methionine oxidation.

Read Mapping, Expression Analysis, and Isoform Quantitation

Reads were aligned using the Spliced Transcripts Alignment to a Reference (STAR) algorithm (29). RNA-Seq reads were pseudo-aligned to an mm10 RefSeq index and quantified as transcripts per million (TPM) using the Kallisto quantification program (30). For mitochondrial RNAs, reads were pseudo-aligned to an Ensembl mm10 index, and the TPM counts for annotated mitochondrial-encoded RNAs from the resulting Kallisto TPM table were used to plot the distribution of mitochondrial-encoded RNAs across the ATLAS-Seq gradient (Fig. 3B). RefSeq and Ensembl TPM tables can be found in Table S2. The Mixture of Isoforms (MISO) (31) program was used to quantitate alternative isoforms. Only isoforms with a confidence interval of <0.2 across all fractions were analyzed.
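The isoform filter described above can be illustrated with a minimal sketch. It assumes that "confidence interval" refers to the width of the interval on the isoform inclusion estimate (upper bound minus lower bound) and that the bounds for each fraction are already available as arrays; the function name and the example values are hypothetical and are not taken from the MISO output used in this study.

```python
import numpy as np

def passes_ci_filter(ci_low, ci_high, max_width=0.2):
    """Return True if the confidence interval is narrower than max_width in every fraction.

    ci_low, ci_high: per-fraction lower and upper bounds (assumed inputs).
    """
    low = np.asarray(ci_low, dtype=float)
    high = np.asarray(ci_high, dtype=float)
    return bool(np.all((high - low) < max_width))

# Hypothetical isoform with narrow intervals in all three fractions: kept
isoform_ok = passes_ci_filter([0.40, 0.55, 0.30], [0.50, 0.70, 0.45])
# Hypothetical isoform with one wide interval (0.35): discarded
isoform_bad = passes_ci_filter([0.40, 0.10, 0.30], [0.50, 0.45, 0.45])
```

Requiring the threshold in every fraction, rather than on average, means a single poorly covered fraction is enough to exclude an isoform.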

GO analysis

Data release from AmiGO 2 version 2.5.12 was used to determine GO enrichments (Ashburner et al., 2000). Panther GO enrichment analysis (Mi et al., 2019) was used to determine GO enrichments for all analyses in the paper, with one exception. P-values were determined by Fisher’s exact test with Bonferroni correction for multiple testing. The exception is Figure 4C, where GOrilla (32) was used to determine cellular component GO enrichment categories for single lists. The lists for Figure 4C were ranked from highest to lowest Pearson correlation for positive association and from lowest to highest for negative correlation. GOrilla computed an uncorrected p-value according to the HG model and the FDR q-value was corrected using the Benjamini-Hochberg method.
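As an illustration of the Fisher's exact test with Bonferroni correction named above, the following sketch tests a single GO term for over-representation in one gene list. It is not the Panther implementation: the one-sided alternative, the assumption that the gene list is a subset of the background, the function name, and all counts are hypothetical choices made for this example.

```python
from scipy.stats import fisher_exact

def go_enrichment_p(list_in_term, list_size, background_in_term,
                    background_size, n_tests):
    """Bonferroni-corrected p-value for over-representation of one GO term.

    Assumes the gene list is a subset of the background gene set.
    """
    a = list_in_term                        # in list, annotated with term
    b = list_size - list_in_term            # in list, not annotated
    c = background_in_term - list_in_term   # outside list, annotated
    d = background_size - list_size - c     # outside list, not annotated
    _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
    # Bonferroni: multiply by the number of terms tested, cap at 1
    return min(1.0, p * n_tests)

# Hypothetical counts: 15 of 20 listed genes carry a term found on 100 of 9,000 genes
p_enriched = go_enrichment_p(15, 20, 100, 9000, n_tests=1000)
# Hypothetical counts: none of the 20 listed genes carry the term
p_absent = go_enrichment_p(0, 20, 100, 9000, n_tests=1000)
```

Bonferroni correction is conservative when many GO terms are tested, which is one reason ranked-list tools report FDR q-values instead.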

Comparing ATLAS-Seq to ribosome profiling

Ribosome profiling was performed in mouse liver cells (33). Fastq files for ribosome profiling and RNA-Seq in mouse liver were downloaded from NCBI (GEO Accession GSE67305) and processed by Kallisto (30). Fastq files for ribosome profiling performed in HEK293T cells were downloaded from NCBI (GEO Accession GSE65778) (34). Fastq files for TRIP-Seq polysome profiling performed in HEK293T cells were downloaded from NCBI (GEO Accession GSE69352). For Figure 2, weighted counts from polysome sequencing or ATLAS-Seq were calculated similarly to Floor et al. 2016 (35):

$$\text{ATLAS-Seq or polysome sequencing weighted counts} = \sum_{n=x}^{y} n \, t_n$$

For ATLAS-Seq, x = 3 and y = 24. For polysome sequencing, x = 1 and y = 7. Essentially, TPMs were weighted by the fraction number: t_n is the TPM count in the nth fraction, where n is the fraction number.
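The weighting described above can be sketched in a few lines of Python. The function name and the toy TPM values are hypothetical; the only elements taken from the text are the weighting of each TPM by its fraction number and the fraction ranges (3 to 24 for ATLAS-Seq, 1 to 7 for polysome sequencing).

```python
import numpy as np

def weighted_counts(tpm_by_fraction, first_fraction):
    """Sum of TPMs, each multiplied by its fraction number.

    tpm_by_fraction: TPMs for consecutive fractions, starting at first_fraction.
    """
    tpm = np.asarray(tpm_by_fraction, dtype=float)
    fraction_numbers = np.arange(first_fraction, first_fraction + tpm.size)
    return float(np.sum(fraction_numbers * tpm))

# Hypothetical transcript with 1 TPM in each ATLAS-Seq fraction (3 to 24)
atlas_score = weighted_counts(np.ones(22), first_fraction=3)
# Hypothetical transcript with 1 TPM in each polysome fraction (1 to 7)
polysome_score = weighted_counts(np.ones(7), first_fraction=1)
```

Because denser fractions carry larger weights, a transcript that sediments deeper into the gradient receives a higher score than one with the same total TPM concentrated in light fractions.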

smiFISH and Probes

smiFISH was performed according to (36). 3D Z-stacks were captured by epifluorescence using a Zeiss LSM880 with a 63x 1.4 NA objective and an Axiocam MRm camera. Cy3- or Cy5-conjugated Y flaps were used as secondary probe detectors for all primary probes. All probes and flaps were produced by and purchased from Integrated DNA Technologies (IDT) following protocols as listed in (36). All primary probe sequences are provided in Table S4. NIH 3T3 cells were grown on chamber slides (Lab-Tek) in DMEM with 10% FBS. For smiFISH in liver, wild-type FVB mouse livers were cryosectioned into 7 µm sections and then subjected to the smiFISH protocol. DAPI staining was used to identify nuclei and all coverslips were mounted with Vectashield.

RBP Analysis

RBPs were defined using publicly available datasets of previously characterized RBPs (Cook et al., 2011; 37). Overlap between these datasets and the list of peptides obtained from our mass spectrometry identified 148 RBPs in our mass spectrometry dataset. The peptide profile of each RBP was correlated with the mean profile of RNAs in each ATLAS-Seq RNA cluster.
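A minimal sketch of this comparison follows, assuming that the RBP profile and the RNA profiles have already been restricted to the same set of fractions and normalized in the same way. The function name, the dictionary layout, and the toy profiles are hypothetical and only illustrate the correlation of one protein profile against the mean profile of each cluster.

```python
import numpy as np

def correlate_rbp_with_clusters(rbp_profile, cluster_profiles):
    """Pearson r between one RBP profile and the mean RNA profile of each cluster.

    cluster_profiles: dict mapping cluster ID to a genes x fractions array.
    """
    rbp = np.asarray(rbp_profile, dtype=float)
    correlations = {}
    for cluster_id, profiles in cluster_profiles.items():
        # Average over genes to get one profile per cluster
        mean_profile = np.asarray(profiles, dtype=float).mean(axis=0)
        correlations[cluster_id] = float(np.corrcoef(rbp, mean_profile)[0, 1])
    return correlations

# Toy data: one RBP and two clusters measured across four shared fractions
toy_rbp = [1.0, 2.0, 3.0, 4.0]
toy_clusters = {
    "rising": [[1, 2, 3, 4], [2, 4, 6, 8]],
    "falling": [[4, 3, 2, 1]],
}
toy_r = correlate_rbp_with_clusters(toy_rbp, toy_clusters)
```

Ranking clusters by these coefficients gives, for each RBP, the clusters whose RNAs most closely follow or oppose its sedimentation pattern.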

Quantification and Statistical Analysis

Graphs were generated using Matplotlib version 2.2.2. Statistical analyses were performed using Python with the SciPy 1.1.0 and NumPy 1.14.3 libraries. Statistical parameters, statistical tests, and statistical significance (p value) are reported in the figures and their legends. Two independent, biological replicate gradients were generated from mouse liver. Each replicate was analyzed independently, with “gradient 2” being the replicate used for all main figures. For hierarchical clustering analysis, the scipy.cluster.hierarchy module was used. All correlations were calculated using the NumPy corrcoef function, which returns Pearson correlation coefficients. Wilcoxon rank-sum tests were used to compute statistical significance.
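The three library calls named above can be combined in a short sketch on toy data. The linkage method ("average"), the distance metric (correlation distance, 1 minus Pearson r), the mean normalization, and the cut threshold of 0.5 are all assumptions made for illustration; the text does not state which settings were used.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import ranksums

# Toy TPM matrix: rows are genes, columns are successively denser fractions
tpm = np.array([
    [1, 2, 3, 4, 5, 6],
    [1, 2, 4, 5, 7, 9],
    [2, 3, 3, 5, 6, 8],
    [9, 7, 6, 4, 3, 1],
    [8, 8, 6, 5, 3, 2],
    [6, 5, 4, 4, 2, 1],
], dtype=float)

# Normalize each gene to its mean across the gradient (assumed normalization)
relative = tpm / tpm.mean(axis=1, keepdims=True)

# Hierarchical clustering on correlation distance (assumed method and metric)
tree = linkage(relative, method="average", metric="correlation")
labels = fcluster(tree, t=0.5, criterion="distance")  # assumed threshold

# Pearson correlation between the first two gene profiles
r_first_two = np.corrcoef(relative[0], relative[1])[0, 1]

# Wilcoxon rank-sum test on two hypothetical groups of values
statistic, p_value = ranksums([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```

In this toy example the three rising profiles and the three falling profiles fall into two separate clusters, since profiles within a group are highly correlated and profiles across groups are anti-correlated.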

Data Availability

Raw sequencing reads for all samples are available through the NCBI via GEO Accession GSE140630. An interactive browser that allows users to explore profiles of transcripts, proteins, and ATLAS-Seq clusters can be found at http://ericwanglab.com/atlas.php.

Results

Detergent-free sucrose fractionation of liver lysate separates RNA signatures by their cellular microenvironments.

In this study, we applied ATLAS-Seq to mouse liver. Approximately 80% of mouse liver by weight is composed of hepatocytes (38), minimizing contributions from other cell types that could confound interpretation of fractionation profiles. In addition, previous studies of liver have performed velocity sedimentation, followed by fractionation and mass spectrometry, to generate a “fingerprint” of co-fractionating proteins and protein complexes across the gradient (26). We performed similar velocity sedimentation of a detergent-free, post-nuclear liver lysate across a 10-50% sucrose gradient (Fig. 1A). Notably, although velocity sedimentation only yields modest enrichment of particular organelles at specific densities relative to density equilibrium approaches, it can be advantageous for generating unique fingerprints for a variety of RNP, RNA, and membrane-associated complexes across the full spectrum of the gradient. We collected 24 fractions from homogenized supernatant and subjected the 17 with sufficient protein content (fractions 3-19) to mass spectrometry (Table S1). The normalized abundances of known organelle markers including calnexin (endoplasmic reticulum, ER), clathrin (clathrin-coated vesicles), Gapdh (cytosol), Psma1 (proteasome), and catalase (peroxisome) were plotted across the gradient (Fig. 1B) and showed patterns similar to previous studies (26). Importantly, we observed that these well-established organellar markers do not always peak at a specific density, but rather peak at different gradient fractions and exhibit distinct profiles, potentially reflecting the microenvironmental preferences of each protein in the cell.

64 Given our ability to separate proteins according to published expectations, we

subsequently performed RNA-Seq on 22 out of 24 of the fractions with sufficient RNA

content (fractions 3-24). Overall, gene expression profiles of fractions with similar

densities were more highly correlated than fractions with greater density differences (Fig.

1C). Two independent biological gradients were generated and analyzed, and similar

profiles were observed across the transcriptome of the biological replicate gradient (Fig.

S1, Supp. Table S2). Given the high degree of concordance between transcriptome

replicate gradients, we focused subsequent analyses on the gradient with a larger number

of fractions, from which matched proteomic data were also generated. Unsupervised

hierarchical clustering identified groups of RNAs among whose normalized expression

profiles across the gradient were highly correlated (Fig. 1D). 9,269 genes were assigned to 635 distinct clusters; among these, 76 clusters contained at least 20 genes (Supp.

Table S3).
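
As a rough illustration of the fraction-to-fraction comparison summarized in Fig. 1C, the sketch below computes Pearson correlation coefficients between every pair of fractions from a gene-by-fraction expression table. The toy values, gene names, and function names are hypothetical and are not taken from the ATLAS-Seq data or analysis code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def fraction_correlations(expr):
    """expr maps gene -> list of expression values, one per fraction.

    Returns a square matrix (list of lists) of Pearson R between fractions.
    """
    columns = list(zip(*expr.values()))  # one tuple of gene values per fraction
    return [[pearson(a, b) for b in columns] for a in columns]


# Hypothetical toy table: 4 genes measured across 4 successively denser fractions.
toy_expr = {
    "geneA": [8.0, 6.0, 2.0, 1.0],
    "geneB": [1.0, 2.0, 6.0, 8.0],
    "geneC": [5.0, 5.0, 4.0, 3.0],
    "geneD": [2.0, 3.0, 5.0, 5.0],
}
corr = fraction_correlations(toy_expr)
```

In this toy table, neighboring fractions are more highly correlated than fractions at opposite ends of the gradient, which is the qualitative pattern described above.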

These clusters were subjected to Gene Ontology (GO) analysis. Of the 76 clusters, 53 showed enrichment for particular cellular compartments (cluster identities and GO results are in Supp. Table S3). Similar to protein organelle markers, RNA profiles with strong

GO enrichment did not always show a strong peak at any particular sucrose concentration; rather, profiles commonly showed modest enrichments of up to 4-fold at their greatest point (Fig. 1D). For example, cluster 280 showed modest depletion in the center of the gradient and was highly enriched for categories including “chylomicron and plasma lipoprotein particle”. Clusters 522, 30, and 114 showed ~2 to 3-fold enrichment at successively denser locations across the gradient, and revealed slightly different GO categories related to Golgi, aminoacyl-tRNA synthetase multienzyme complex, and

mitochondrial respiration, respectively. Cluster 57, ~2-fold enriched towards the denser part of the gradient, was enriched for proteasomal and mitochondrial categories (Fig.

1E,1F). Localization patterns of RNAs encoding proteasomal components have not been previously studied as a class and these results suggest that this subset of RNAs may exhibit a shared localization signature. Interestingly, the profiles of these RNAs are distinct from the peptide profile of a proteasomal marker, Psma1, as assessed by mass spectrometry. Overall, these observations show that RNAs with similar sedimentation properties often encode proteins known to co-associate or co-assemble in the cell.
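
To make the notion of GO enrichment within a cluster concrete, the sketch below computes a fold enrichment (observed over expected) and a one-sided Fisher's exact (hypergeometric) p-value for a single cluster and a single GO category. The one-sided test, the choice of background, and all numbers shown are assumptions made for illustration; they are not the settings used for Supp. Table S3.

```python
from math import comb


def go_enrichment(k, n, K, N):
    """Enrichment of one GO category within one cluster.

    k: genes in the cluster annotated with the category
    n: genes in the cluster
    K: genes in the background annotated with the category
    N: genes in the background
    Returns (fold_enrichment, one_sided_p_value).
    """
    expected = n * K / N
    fold = k / expected
    # P(X >= k) for X ~ Hypergeometric(N, K, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return fold, tail / comb(N, n)


# Hypothetical example: 3 of 4 cluster genes carry an annotation
# that applies to 5 of 20 background genes.
fold, p_value = go_enrichment(k=3, n=4, K=5, N=20)
```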

Figure 1. ATLAS-Seq generates transcriptome- and proteome-wide profiles across a density centrifugation gradient. A) Schematic of ATLAS-Seq procedure from mouse liver homogenate depleted of nuclei. B) Relative protein abundance across a single ATLAS-Seq gradient for specific protein organelle markers as assessed by mass spectrometry. C) Heatmap showing Pearson correlation coefficients of gene expression between all pairs of sucrose fractions from a single ATLAS-Seq gradient. D) Heatmap of relative gene expression across a single ATLAS-Seq gradient, organized by hierarchical clusters, where rows are genes and columns are successively denser sucrose fractions. E) Selected clusters from (D) enlarged. F) Mean relative expression profiles across a single ATLAS-Seq gradient for clusters highlighted in (D), with corresponding Gene Ontology (GO) categories enriched within each cluster (right panel).

0.823), and importantly, a single cloud of points centered along a diagonal (Fig. 2B, top panel) indicating that polysome profiling is mainly a measure of ribosome density rather than RNAs bound to larger cellular compartments or complexes as previously reported

(35).

We then correlated ribosome footprint counts to ATLAS-Seq counts, also by weighted sums across each of the 22 fractions, mirroring calculations of polysome profiling counts above. A high correlation would imply that ATLAS-Seq mirrors a polysome gradient, and a low correlation would imply that ribosome occupancy cannot fully explain ATLAS-Seq profiles. We observed a weaker correlation (Pearson’s R = 0.738) and also the presence of subsets of RNAs lying off the main diagonal (Fig. 2B, bottom panel). Upon further inspection, we noticed that clusters we previously identified by hierarchical clustering

(Fig. 1D) separated away from the diagonal and were associated with specific GO categories (Fig. 1E). For example, RNAs in cluster 589, a cluster predicted to be membrane-associated due to enrichment of RNAs encoding secreted proteins and ER components, appear less dense according to ATLAS-Seq relative to ribosome footprint profiling (Fig. 2C). Overall, these observations suggest that the mild detergent-free homogenization conditions of ATLAS-Seq allow additional cellular components besides ribosomes to influence sedimentation, providing information about the cellular compartments with which RNAs are associated.
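
The weighted-sum summary used for these comparisons is defined in the Methods, which are not reproduced here. As a minimal sketch under an assumed weighting, the function below collapses a per-fraction profile into a single value by weighting each fraction by its 1-based position in the gradient, so that RNAs concentrated in denser fractions receive larger values. The weights, names, and toy profiles are hypothetical.

```python
def weighted_sum(counts, weights=None, normalize=False):
    """Collapse a per-fraction profile into one number.

    Assumption: if no weights are given, fraction i (1-based) gets weight i.
    With normalize=True the result is divided by the total counts, giving the
    count-weighted mean fraction position.
    """
    if weights is None:
        weights = range(1, len(counts) + 1)
    weights = list(weights)
    if len(weights) != len(counts):
        raise ValueError("one weight per fraction is required")
    total = sum(w * c for w, c in zip(weights, counts))
    if normalize:
        n = sum(counts)
        if n == 0:
            raise ValueError("profile has no counts")
        return total / n
    return total


# Hypothetical profiles across 4 fractions (light to dense).
light_rna = [10, 5, 1, 0]
dense_rna = [0, 1, 5, 10]
light_score = weighted_sum(light_rna)
dense_score = weighted_sum(dense_rna)
```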

Figure 2. ATLAS-Seq profiles reflect a combination of subcellular microenvironment and ribosome occupancy. A) Schematic illustrating sample preparation differences between polysome profiling, ribosome footprint profiling, and ATLAS-Seq. B) Scatter plot of weighted sums of polysome profiling counts (see Methods) versus ribosome footprint profiling transcripts per million (TPM) counts in HEK293T cells (top panel). Scatter plot of weighted sums of ATLAS-Seq counts (see Methods) and ribosome footprint profiling TPM counts in mouse liver (bottom panel). All genes are shown in gray, and genes in three clusters identified by hierarchical clustering of one ATLAS-Seq gradient (Fig. 1D) are shown in red, purple, and blue. C) Normalized ATLAS-Seq profiles for each of the three clusters highlighted in (B), along with GO categories for which they are enriched.

ATLAS-Seq RNA localization patterns are consistent with those identified by orthogonal

methods

Although our analyses thus far suggested that ATLAS-Seq can reveal information about

the subcellular location of RNAs, we sought to compare these predictions to observations

made using orthogonal methods. Crosslinking of RNAs to proteins labeled by APEX has

been used to capture RNAs localized to specific subcellular locations, for example the

outer surface of the ER or the mitochondrial matrix (39). First, we analyzed RNAs

published to be associated with the ER according to APEX-RIP. For each ATLAS-Seq

cluster, we computed the fraction of RNAs determined to be ER-associated by APEX-RIP

as well as the fraction of RNAs predicted to have a signal sequence according to SignalP

(40). We plotted a scatter of these metrics for each cluster (Fig. 3A) and highlighted each cluster in red if it was significantly enriched for any ER-related GO categories (Table S3).

Clusters identified to be ER-associated by ATLAS-Seq showed enrichment for RNAs

identified by ER APEX-RIP and RNAs with high SignalP scores. Although one ATLAS-

Seq cluster did not show ER-related GO enrichment, it did show enrichment for “plasma

membrane”, and proteins in the plasma membrane are typically derived from proteins

synthesized, processed, and trafficked via the endomembrane system (41, 42).

Interestingly, although all of these clusters were enriched for ER-related GO terms, some

exhibited distinct profiles that could be further stratified by specific GO subcategories

(Fig. S2). For example, cluster 598 was enriched for the ER chaperone complex, whereas

cluster 280 was enriched for RNAs encoding proteins found in lipoprotein microparticles.

Therefore, our approach may provide finer resolution to identify subclusters corresponding to distinct ER microenvironments.
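
The per-cluster metrics plotted in Fig. 3A can be sketched as two set overlaps: the fraction of genes in a cluster that appear in an ER APEX-RIP gene list, and the fraction predicted to carry a signal sequence. The gene identifiers, cluster labels, and function name below are hypothetical placeholders rather than entries from Table S3.

```python
def cluster_fractions(clusters, er_apex_genes, signal_seq_genes):
    """For each cluster, return (fraction ER APEX-RIP, fraction with signal sequence).

    clusters: dict mapping cluster id -> iterable of gene names
    er_apex_genes, signal_seq_genes: sets of gene names
    Empty clusters are skipped.
    """
    fractions = {}
    for cluster_id, genes in clusters.items():
        genes = set(genes)
        if not genes:
            continue
        fractions[cluster_id] = (
            len(genes & er_apex_genes) / len(genes),
            len(genes & signal_seq_genes) / len(genes),
        )
    return fractions


# Hypothetical inputs.
toy_clusters = {
    "er_like": ["g1", "g2", "g3", "g4"],
    "cytosolic": ["g5", "g6", "g7", "g8"],
}
toy_er_apex = {"g1", "g2", "g3", "g5"}
toy_signal_seq = {"g1", "g2", "g4"}
toy_fractions = cluster_fractions(toy_clusters, toy_er_apex, toy_signal_seq)
```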

To further assess whether our gradient could reveal co-localized RNAs, we analyzed the

13 protein-coding mRNAs of the mitochondrial genome, which are known to reside in the mitochondria. Profiles of these RNAs highly correlated with each other and also with the mass spectrometry profile of a mitochondrial resident protein, fumarate hydratase (Fig.

3B). The high concordance of these profiles suggests that our approach preserves the association between RNAs inside the mitochondria and proteins associated with the organelle. Taken together, these analyses confirm that ATLAS-Seq yields information related to the subcellular localization of RNA species, and that profiles of RNAs with unknown localization patterns may be used to predict their local microenvironment.

We then sought to explore subcellular distributions of RNAs for which little is known.

Eleven out of 19 RNAs encoding the proteasome core complex were found in ATLAS-

Seq clusters 53 and 57. The localization of the proteasome itself is well studied and has been observed to play a key role in mitochondrial biogenesis (43). Interestingly, both proteasomal clusters, 53 and 57, also contained a number of nuclear-encoded mitochondrial RNAs (Table S3).

To assess whether results from our ATLAS-Seq analysis were consistent with an imaging-based approach, we performed single-molecule inexpensive FISH (smiFISH) on

4 proteasomal core complex RNAs (Psma1, Psmb1, Psmc5, and Adrm1), a nuclear-encoded mitochondrial RNA (Atp5b), and a signal sequence-containing RNA (Fn1). All proteasome-encoding RNAs and Atp5b exhibited similar ATLAS-Seq profiles, whereas

Fn1 exhibited a highly distinct profile that was representative of RNAs encoding secreted proteins (Fig. 3C). SmiFISH for Fn1 RNA in both liver (Fig. S3) and adherent NIH 3T3 cells (Fig. 3D) revealed a perinuclear pattern, consistent with the presence of a signal sequence and localization to the ER. Psma1, Psmb1, Psmc5, Adrm1, and Atp5b were found throughout the cytoplasm in a pattern distinct from that of Fn1. Interestingly, in spite of highly overlapping proteasomal and mitochondrial ATLAS-Seq profiles, smiFISH did not reveal strong spatial co-localization. This indicates that while ATLAS-Seq cannot provide information about the precise spatial location of an RNA, it may instead provide information about local microenvironments within a particular subcellular region, a property that is often difficult to discern by image-based methods.

Figure 3. ATLAS-Seq reveals subcellular localization of RNAs in a manner consistent with other established techniques. A) Gene clusters identified by ATLAS-Seq, plotted as a function of the proportion of genes within each cluster identified by ER APEX-RIP (x-axis) and the proportion of genes within each cluster predicted to be secreted by SignalP (y-axis). Clusters significantly enriched in ER-related GO categories, as determined by Fisher’s exact test, are shown in red, clusters with significant non-ER GO enrichment are shown in blue, and clusters with no significant GO enrichment are shown in gray. B) Distribution of normalized TPMs across the ATLAS-Seq gradient for 13 genes identified to be mitochondrially-associated by APEX-RIP. Normalized mass spectrometry peptide counts for fumarate hydratase across the ATLAS-Seq gradient are shown in black. Pearson correlation coefficients between TPMs for each RNA and fumarate hydratase peptide counts are shown. C) Normalized TPM profiles across one ATLAS-Seq gradient for RNAs encoding fibronectin 1 (Fn1), proteasomal subunit A1 (Psma1), proteasomal subunit B1 (Psmb1), proteasomal ubiquitin receptor (Adrm1), 26S proteasome regulatory subunit 8 (Psmc5), and ATP synthase subunit beta (Atp5b) in green or red as labeled. Pearson correlation coefficients between each pair of RNAs are also listed. D) smiFISH for RNAs encoding Fn1, Psma1, Psmb1, Adrm1, Psmc5, and Atp5b in NIH 3T3 cells. Fn1 exhibits a perinuclear pattern, whereas Psma1 and Psmb1 are distributed throughout the cytoplasm. Nuclei were stained by DAPI (blue), and the same scale bar applies to all images (10 µm).

[Figure S3 image panels, labeled by row: Fn1, Psma1, Merge; Adrm1, Psma1, Merge; Psmb1, Psma1, Merge]

Figure S3. Related to Figure 3. ER clusters possess distinct differences in subgroups of ER organelles they encode. A) Heatmap of GO enrichment categories for different ER clusters highlighted in Fig. 3A. B) Normalized TPM profiles across the gradient for ER clusters identified in Figure 3A.

Comparing sedimentation patterns of RNAs and the proteins they encode

A long-standing question is the extent to which RNAs co-localize with the proteins they encode. It is well established in neurons that many synaptically localized RNAs encode locally translated proteins, and therefore show co-localization (10). In contrast, in the

mouse intestinal epithelium, localization of many mRNAs is distinct from their encoded

proteins (44). Although ATLAS-Seq cannot truly assess co-localization of RNAs and

proteins in space, it can assess the extent to which they co-sediment. We compared

normalized protein profiles to normalized RNA profiles across the sucrose gradient,

limiting these analyses to genes for which we had both reasonable RNA-Seq read

coverage and mass spectrometry peptide counts (404 genes in total, Table S5). As

examples, we show that RNA and protein profiles for Alb (albumin) were highly concordant (Pearson’s R=0.93), whereas the RNA and protein for Psmd13, a 26S proteasome subunit protein, were anti-correlated (Pearson’s R=-0.88) (Fig. 4A). We plotted a histogram of these correlations across all genes for which we could obtain reproducible RNA and protein data and observed that most genes exhibited a negative correlation; that is, RNA and protein exhibited anti-correlated sedimentation profiles (Fig.

4B). This suggests that in liver, the majority of RNAs are not localized to the same subcellular region as the steady state destination of their protein counterparts or are in a microenvironment distinct from the protein they encode. GO analysis revealed that genes with a high correlation between their RNA and protein counterparts were enriched for secretion and/or endomembrane-trafficking, whereas highly anti-correlated genes were enriched for cytosolic genes (Fig. 4C, Table S5). Indeed, the most positively correlated genes (top 20th percentile) contained signal sequences ~38% of the time, consistent with their translation at the ER membrane, whereas the most negatively correlated genes

(bottom 20th percentile) contained signal sequences only ~14% of the time (Fig. S4C). Thus, although proteins of the ER colocalize with their RNA, most RNAs and their encoded proteins did not co-sediment in this context.
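
The per-gene comparison described above can be sketched as a Pearson correlation between each gene's RNA profile and its protein profile across the same fractions, followed by a tally of how many genes are anti-correlated. The gene names and profile values below are hypothetical, and any filtering for read coverage or peptide counts is assumed to have been done beforehand.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def rna_protein_correlations(rna_profiles, protein_profiles):
    """Pearson R between RNA and protein profiles for genes present in both tables."""
    shared = sorted(rna_profiles.keys() & protein_profiles.keys())
    return {g: pearson(rna_profiles[g], protein_profiles[g]) for g in shared}


def fraction_negative(correlations):
    """Fraction of genes whose RNA and protein profiles are anti-correlated."""
    values = list(correlations.values())
    return sum(1 for r in values if r < 0) / len(values)


# Hypothetical normalized profiles across 4 fractions.
toy_rna = {"geneA": [1, 2, 3, 4], "geneB": [1, 2, 3, 4], "geneC": [4, 1, 1, 1]}
toy_protein = {"geneA": [2, 4, 6, 8], "geneB": [4, 3, 2, 1]}
toy_r = rna_protein_correlations(toy_rna, toy_protein)
```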

Although most RNAs do not co-sediment with the proteins they encode, our previous

Gene Ontology analysis of RNA clusters (Fig. 1) suggested that proteins encoded by co-sedimenting RNAs act in similar biological pathways or cellular compartments. We therefore grouped proteins by their RNA cluster assignments and analyzed their sedimentation patterns. For example, RNAs in Cluster 53, enriched for the proteasome complex, showed enrichment towards the bottom of the gradient and were highly

correlated with one another, showing a median correlation among all pairwise

comparisons of 0.97 (Fig. 4D). Interestingly, the proteins encoded by these RNAs also

tended to correlate with one another, showing a median pairwise correlation of 0.65. To

assess this globally, we analyzed every cluster for which there were at least 2 proteins

assessed by mass spectrometry and obtained the median pairwise correlation among all

proteins in each cluster. These median correlation values were enriched for positive

values (Fig. 4E, pink bars), and were much greater than when computed using shuffled

RNA-protein assignments (Fig. 4E, gray bars). This analysis provides further evidence

that the sedimentation patterns of RNAs contain information about the subcellular

localization of the proteins they encode.
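
A minimal sketch of this comparison is given below: for each RNA-defined cluster with at least 2 measured proteins, it takes the median of all pairwise Pearson correlations among the protein profiles, and it builds a shuffled control by permuting which genes belong to which cluster while preserving cluster sizes. The shuffling scheme, the seed, and all names and values are assumptions for illustration and may differ from the procedure used for Fig. 4E.

```python
import itertools
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def cluster_medians(clusters, protein_profiles):
    """Median pairwise Pearson R among protein profiles, per cluster.

    Clusters with fewer than 2 measured proteins are skipped.
    """
    medians = {}
    for cluster_id, genes in clusters.items():
        profiles = [protein_profiles[g] for g in genes if g in protein_profiles]
        if len(profiles) < 2:
            continue
        pairs = itertools.combinations(profiles, 2)
        medians[cluster_id] = statistics.median(pearson(a, b) for a, b in pairs)
    return medians


def shuffle_assignments(clusters, seed=0):
    """Permute gene-to-cluster assignments while keeping every cluster's size."""
    rng = random.Random(seed)
    genes = [g for members in clusters.values() for g in members]
    rng.shuffle(genes)
    shuffled, start = {}, 0
    for cluster_id, members in clusters.items():
        shuffled[cluster_id] = genes[start:start + len(members)]
        start += len(members)
    return shuffled


# Hypothetical clusters and protein profiles across 3 fractions.
toy_clusters = {"c1": ["g1", "g2", "g3"], "c2": ["g4", "g5"], "c3": ["g6"]}
toy_proteins = {
    "g1": [1, 2, 3], "g2": [2, 4, 6], "g3": [1, 3, 5],
    "g4": [3, 2, 1], "g5": [6, 4, 2], "g6": [1, 5, 2],
}
observed = cluster_medians(toy_clusters, toy_proteins)
control = cluster_medians(shuffle_assignments(toy_clusters), toy_proteins)
```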

Figure 4. Most RNAs are anti-correlated with the proteins they encode in the ATLAS-Seq gradient. A) Normalized TPM (blue line) and peptide counts (red line) across the ATLAS-Seq gradient for Albumin (Alb, top panel), and 26S proteasome non-ATPase regulatory subunit 2 (Psmd2, bottom panel). Pearson correlation coefficients between RNA and protein are shown. B) Distribution of Pearson correlation coefficients between RNAs and the proteins they encode across the ATLAS-Seq gradient for 404 genes. C) Cellular compartment GO categories enriched in genes whose RNAs are strongly correlated with the proteins they encode. The size of each dot is determined by the number of genes (also listed next to point) found in that GO category. Fold enrichment was calculated by the observed number of genes in a GO category divided by the expected number of genes in that category (see Methods) (top panel). Cellular compartment GO categories enriched in genes whose RNAs strongly anti-correlate with the proteins they encode (bottom panel). D) Normalized TPM (blue lines) and peptide counts (red lines) across the ATLAS-Seq gradient for genes with both ATLAS-Seq and mass spectrometry data in Cluster 53, which is enriched for proteasome genes. The median pairwise correlation among all RNAs and among all proteins in the cluster are listed. E) Histogram of median pairwise correlations of protein profiles (red) for all clusters containing at least 2 proteins. Median pairwise correlations were also computed using shuffled RNA-protein assignments and plotted (gray). For reference, the median of all median pairwise RNA correlations across all RNA clusters is indicated by a blue dashed line.

Alternative isoforms are differentially localized across the ATLAS-Seq gradient

We next investigated whether alternative isoforms from the same gene loci exhibit differential sedimentation patterns across the ATLAS-Seq gradient. Because untranslated regions have known roles in regulating RNA localization, we focused on alternative UTR isoforms for these studies. We considered both alternative first exons

(AFE, generated by alternative promoter usage and splicing to a constitutive exon) and alternative last exons (ALE, generated by alternative splicing and/or polyadenylation). We quantitated the proportion of each isoform present in each fraction of the gradient, labeled percent spliced in, or PSI (Ψ) (Table S6). After limiting analyses to isoforms for which Ψ could be confidently estimated (see Methods), we found 152 AFEs and 332 ALEs for which the maximum difference in Ψ (ΔΨ) across the gradient for any pair of isoforms was

> 0.5. For example, one AFE isoform for Chtop showed a ΔΨ of 0.96 towards the densest part of the gradient (Fig. 5A). Similarly, one ALE isoform of Caspase-9 (Casp9) showed a ΔΨ of 0.73 (Fig. 5B) towards the densest part of the gradient. These observations confirm distinct subcellular distributions of alternative isoforms, as revealed by sucrose density fractionation.
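
To illustrate the quantities involved, the sketch below computes Ψ for each isoform of a gene as that isoform's share of the gene's isoform counts in each fraction, and then takes ΔΨ for an isoform as the difference between its largest and smallest Ψ across the gradient. This reading of ΔΨ, the handling of fractions without counts, and the toy counts are assumptions; the confidence filtering described in the Methods is not reproduced.

```python
def psi_profiles(isoform_counts):
    """Per-fraction PSI for each isoform of one gene.

    isoform_counts: dict mapping isoform name -> list of counts per fraction.
    Fractions with zero total counts yield None for every isoform.
    """
    n_fractions = len(next(iter(isoform_counts.values())))
    totals = [sum(c[i] for c in isoform_counts.values()) for i in range(n_fractions)]
    return {
        iso: [c[i] / totals[i] if totals[i] > 0 else None for i in range(n_fractions)]
        for iso, c in isoform_counts.items()
    }


def delta_psi(psi):
    """Maximum difference in PSI across fractions, ignoring undefined fractions."""
    defined = [v for v in psi if v is not None]
    return max(defined) - min(defined)


# Hypothetical read counts for two alternative last exons across 3 fractions.
toy_counts = {"ale_proximal": [90, 50, 2], "ale_distal": [10, 50, 98]}
toy_psi = psi_profiles(toy_counts)
toy_delta = delta_psi(toy_psi["ale_proximal"])
```

With these toy counts the proximal isoform drops from Ψ = 0.9 in the lightest fraction to Ψ = 0.02 in the densest, so it would pass a ΔΨ > 0.5 threshold.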

If the relative abundance of these alternative isoforms is important for cell function, they may contain sequences subject to positive selection through evolution. To determine whether isoforms with differential sedimentation patterns are more phylogenetically conserved, we measured their conservation using PhyloP scores. AFE isoforms showing strong (ΔΨ > 0.5) and moderate (0.25 < ΔΨ < 0.5) differential sedimentation showed

similar conservation scores but were more highly conserved than isoforms lacking differential localization (ΔΨ < 0.25) (Fig. 5C). In contrast, the extent of differential localization of ALE isoforms correlated with conservation across all three groups (Fig. 5D); that is, more strongly localized ALE isoforms were more highly conserved, suggesting their enrichment for functional features.

Figure 5. Alternative first and last exons exhibit differential profiles across the ATLAS-Seq gradient. A) Ψ values across one ATLAS-Seq gradient for AFE isoforms of Chromatin target of PRMT1 protein, Chtop. B) Ψ values across one ATLAS-Seq gradient for ALE isoforms of Caspase-9, Casp9. C) Cumulative distribution of PhyloP conservation scores for AFE isoforms, separated by strongly regulated (ΔΨ > 0.5), moderately regulated (0.25 < ΔΨ < 0.5), and non-regulated (ΔΨ < 0.25) isoforms. P values were determined by Wilcoxon rank-sum test, comparing each regulated group to the non-regulated group. D) Cumulative distribution of PhyloP conservation scores for ALE isoforms, similar to (C).

ATLAS-Seq profiles of RBPs correlate with target mRNAs

RBPs can control the RNA localization of their targets, but few RBP-RNA pairs have been

functionally validated in this context. Because each RNA interacts with many RBPs, the

impact of each RBP-RNA interaction may only subtly influence final destination of that

RNP. Therefore, analysis of many RNAs showing similar sedimentation patterns may be

required to provide the power necessary to identify potentially weak, yet true, significant

signals.

To test the hypothesis that RBPs and their RNA targets might co-sediment through the

gradient, we first analyzed a known example of an RBP-RNA pair. The RNA binding

protein APOBEC1 complementation factor, A1cf, is known to bind and edit the RNA

encoding apolipoprotein B (Apob) (45). The relative abundances of A1cf peptides and

Apob RNA were strongly correlated across the gradient (Pearson’s R = 0.92, Fig. 6A).

Given this correlation, we hypothesized that additional RNAs whose profiles strongly

correlated with A1cf might also be binding partners of A1cf. We identified 894 RNAs

whose profiles correlated strongly with A1cf (Pearson’s R>0.85); these RNAs encoded

proteins enriched for GO Cellular Compartment categories such as ER, Golgi, endosome,

and vesicles (Fig. 6A). Enriched GO Biological Processes included lipid

localization/transport and the Endoplasmic Reticulum-associated protein degradation

(ERAD) pathway – functions known or proposed to be associated with A1cf (46). Similar

results were observed in a separate replicate gradient (Fig. S5). Notably, binding motifs for A1cf identified in vitro by BindNSeq were enriched in the 3' UTRs of these 894 RNAs relative to all other RNAs in the gradient; these hexamers were also more highly conserved than other hexamers in all 3' UTRs of mouse mRNAs (Fig. 6B).
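
The selection of co-sedimenting RNAs can be sketched as a simple filter: correlate the peptide profile of an RBP with every RNA profile across matched fractions and keep the RNAs whose Pearson's R exceeds a threshold (0.85 in the text). The profiles and names below are hypothetical, and the sketch assumes that RNA and protein profiles have already been restricted to the same fractions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def cosedimenting_rnas(rbp_profile, rna_profiles, threshold=0.85):
    """Return {rna_name: R} for RNAs whose profile correlates with the RBP above threshold."""
    hits = {}
    for name, profile in rna_profiles.items():
        r = pearson(rbp_profile, profile)
        if r > threshold:
            hits[name] = r
    return hits


# Hypothetical peptide and RNA profiles across 5 matched fractions.
toy_rbp = [1, 3, 9, 4, 1]
toy_rnas = {
    "rnaX": [2, 5, 20, 9, 2],  # peaks in the same fraction as the RBP
    "rnaY": [9, 4, 1, 1, 1],   # enriched in the lightest fraction instead
}
toy_hits = cosedimenting_rnas(toy_rbp, toy_rnas)
```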

To further assess whether the abundance of specific RBPs across the gradient might be

associated with the localization of their target RNAs, we identified RBPs in our mass

spectrometry dataset for which functional binding data was also publicly available. We

focused on hnRNP F, for which there is publicly available CLIP-Seq data from HEK293T

cells (47). We correlated all RNAs in our gradient to the peptide profile for hnRNP F and

separated them by Pearson’s correlation coefficient. The most strongly correlating RNAs

(Pearson’s R > 0.85) were enriched for specific GO categories, including ER membrane

(Fig. 6C, Fig. S5). We analyzed mouse orthologs of human RNAs bound by hnRNP F

according to CLIP and found that more highly correlated RNAs showed a greater density

of CLIP binding in 3' UTRs relative to less correlated or anti-correlated RNAs, as

measured by number of binding sites per unit of gene expression (Fig. 6D).

To uncover additional RBP-RNA relationships that may drive co-sedimentation patterns, we identified 134 RBPs (see Methods) supported by mass spectrometry peptides across our ATLAS-Seq gradient. We correlated their profiles to all ATLAS-Seq RNA clusters

(Fig. S6) and found 71 RBPs whose peptide counts correlated to our previously defined

RNA cluster profiles (Pearson’s R>0.85). While functional connections between most of these RBP-RNA cluster pairs are unknown, some relationships observed are consistent with known functions of the RBPs. For example, heterogeneous nuclear ribonucleoprotein

Q (hnRNP Q/SYNCRIP) correlated most strongly with cluster 177 (Pearson’s R=0.86), which contains RNAs encoding proteins in the pICln-Sm protein complex, the U12-type spliceosomal complex, and U2 snRNPs (Fig. 6E, Supp. Table S7). Consistent with this pairing, hnRNP Q interacts with the Survival of Motor Neuron (SMN) complex (48), is a component of the spliceosome, and has been proposed to link the SMN complex to

splicing functions (49). As another example, we observed that Myosin-9 (Myh9) correlates

best with cluster 430 (R = 0.90), which contains RNAs encoding cilium components (Fig.

6F, Supp. Table S7). Myh9 is an RBP that also contains a motor domain and has been

shown to compete with Myh10 to inhibit cilium biogenesis (50). Taken together, these

results support the ability of ATLAS-Seq to predict RBP-RNA associations and their regulatory connections.

Figure 6. ATLAS-Seq reveals associations between RNA binding proteins and their RNA targets. A) Distribution across the ATLAS-Seq gradient of relative peptide counts of APOBEC1 Complementation Factor (A1cf, red dashes), and normalized TPMs for apolipoprotein B (Apob, blue line). Pearson’s R between A1cf and Apob = 0.92. Also shown are normalized TPMs for 894 RNAs correlating with a Pearson’s R > 0.85 (gray). Shown below are GO Cellular Compartment terms associated with these RNAs. B) Hexamers plotted by log2(foreground / background counts) on the x-axis and conservation rate on the y-axis, where foreground counts were obtained from 3' UTRs in the set of 894 RNAs correlating > 0.85 from (A) and background counts were obtained from all other RNAs in the gradient. Conservation rate was computed across all mouse RefSeq 3' UTRs as the fraction of instances showing full conservation across mouse, human, rat, and dog multi-alignments. The top 10 BindNSeq A1cf hexamers are highlighted in red. C) Relative peptide counts for heterogeneous nuclear ribonucleoprotein F (hnRNP F, red dashes) and normalized TPMs for RNAs correlating with Pearson’s R > 0.85 (gray). D) Cumulative distribution of the number of hnRNP F CLIP binding sites per unit TPM for groups of RNAs separated by Pearson’s correlation to relative abundance of hnRNP F peptides. *p < 0.05, **p < 0.001 as assessed by Wilcoxon rank-sum test. E) Relative peptide counts for heterogeneous nuclear ribonucleoprotein Q (hnRNP Q/Syncrip, red dashes) and mean TPM profile for the RNA cluster best correlating with hnRNP Q peptide counts (blue line). Shown below are GO Cellular Compartment terms enriched in that cluster. F) Relative peptide counts for Myosin-9 (Myh9, red dashes) and mean TPM profile for the RNA cluster best correlating with Myh9 peptide counts (blue). Shown below are GO Cellular Compartment terms enriched in that cluster.

Figure S5. Related to Figure 6. Selected RBPs and co-sedimenting RNAs in ATLAS-Seq gradient 1. A) Normalized mass spectrometry peptide counts for heterogeneous nuclear ribonucleoprotein F (dashed red, hnRNP F) and mean normalized TPMs for RNAs that correlate with a Pearson correlation coefficient > 0.85 (gray). Top GO cellular compartment enrichment categories are listed below the figure. B) Normalized mass spectrometry peptide counts for APOBEC1 Complementation Factor (dashed red, A1CF) and mean normalized TPMs for RNAs that correlate with a Pearson’s correlation coefficient greater than 0.85 (gray). Shown below each panel are cellular compartment GO categories for the best correlating RNAs.

Discussion

We have used ATLAS-Seq to uncover unexpected relationships between sucrose gradient sedimentation profiles of RNAs encoding proteins involved in similar biological functions. Deep sequencing of RNA transcriptome-wide and mass spectrometry of peptides with high resolution across the gradient facilitated the discovery of these relationships and characterized the presence of cellular microenvironments to which

RNAs are sorted. Surprisingly, subtle differences in profile shape can resolve differences in the composition of cellular compartments. These profiles likely reflect not only engagement with large macromolecules such as the ribosome, but also membranes and other structures with distinct physicochemical properties. We observed that these interactions were reflected in the divergence of ribosome footprint profiles from ATLAS-Seq.

Future studies directly comparing polysome profiles to ATLAS-Seq or other

gradients prepared by diverse detergents, cytoskeletal disruptors, or other agents might

further elucidate how various interactions drive sedimentation profiles.

Distinct microenvironments in the cell arising from these interactions – the sum of weak

attractive and repulsive forces between biomolecules – may create the appropriate

settings for translation, sorting, decay, and other cellular processes. Although these

specialized environments are sometimes membrane-bound organelles, our observations

suggest they may also reflect membrane-less organelles in the cytoplasm such as the

proteasome, sites of spliceosome component assembly, or even RNP granules. These

RNP granules could contain single mRNAs bound to multiple RBPs, or perhaps supra-

molecular assemblies in which multiple RNPs are linked via protein-protein, protein-RNA,

or even RNA-RNA interactions. Thus, there exists spatial organization among thousands

of RNAs revealed by physical separation across a density gradient. The observations here provide a blueprint for how RNAs might map to specific subcellular microenvironments in liver cells and provide insights into higher scale organization of the transcriptome.

Interestingly, correlations of RNA to their encoded proteins revealed that most RNA-protein counterparts are not co-localized, but that some are, most notably those encoding membrane and secreted proteins. In these cases, the co-localization may reflect co-translational insertion into specific lumenal compartments. However, both RNAs with and without signal peptide sequences often co-sedimented, suggesting that there may be additional signals within RNA that influence their localization. Notably, proteins encoded by co-sedimenting RNAs also tend to co-sediment, suggesting regulatory mechanisms that bridge the subcellular localization of each molecule. This has been previously observed for specific mRNAs at the isoform level; for example, localization of some proteins has been shown to be directly influenced by 3' UTRs of their mRNAs via association with membraneless organelles such as TIGER domains (51). This is consistent with our findings that isoforms with distinct last exons and 3' UTRs showed distinct sedimentation patterns associated with increased phylogenetic conservation.

Indeed, alternative 3' UTRs have been shown to localize mRNAs to neurites versus soma

(15). Whether RNAs co-localize with their encoded proteins may also depend on cell type and/or cell state and remains to be further characterized.

A key goal in studies of RNA sorting and localization is to identify RNA elements and

RBPs that might define subcellular localization of RNAs and locally translated proteins.

Only a small fraction of putative RBPs have been functionally characterized, but

co-sedimentation of RBPs and RNA targets may reveal functional interactions. In summary,

high resolution subcellular fractionation on a transcriptome-wide scale can provide

important insights into the regulation of higher order, subcellular compartmentalization of

mRNAs by revealing groups of RNAs that co-segregate within the cell and implicating

post-transcriptional processes and trans-factors associated with these microenvironments.

Funding

This research was supported by NIH grant DP5 OD017865.

Acknowledgments

The authors wish to thank members of the Wang lab, Phil Sharp, and the Sharp lab for insightful discussions, with special thanks to Hailey Olafson for programming discussions,

Amanda Del Rosario and Richard Schiavoni of the MIT Koch Institute Biopolymers and

Proteomics core for technical assistance, David Bartel and Christopher Burge for fruitful discussions and advice, Jared Richardson of the Berglund lab of the University of Florida

Center for Neurogenetics (UF CNG) for technical support with sequencing, the Ranum lab for assistance with mouse work, Matthew Taliaferro and Gary Bassell for helpful feedback, and David Coombs for insightful discussions and technical assistance with

BioComp’s Hybrid Piston Gradient Fractionator™ and Gradient Master™ operation.

References

1. Diehn,M., Bhattacharya,R., Botstein,D. and Brown,P.O. (2006) Genome-Scale Identification of Membrane-Associated Human mRNAs. PLoS Genet., 2, e11.

2. Pyhtila,B., Zheng,T., Lager,P.J., Keene,J.D., Reedy,M.C. and Nicchitta,C.V. (2008) Signal sequence- and translation-independent mRNA localization to the endoplasmic reticulum. RNA, 14, 445–453.

3. Mili,S., Moissoglu,K. and Macara,I.G. (2008) Genome-Wide Screen Identifies Localized RNAs Anchored At Cell Protrusions Through Microtubules And APC. Nature, 453, 115–119.

4. Hengst,U. and Jaffrey,S.R. (2007) Function and translational regulation of mRNA in developing axons. Seminars in Cell & Developmental Biology, 18, 209–215.

5. Steward,O. and Schuman,E.M. (2001) Protein Synthesis at Synaptic Sites on Dendrites. Annu. Rev. Neurosci., 24, 299–325.

6. Holt,C.E. and Bullock,S.L. (2009) Subcellular mRNA Localization in Animal Cells and Why It Matters. Science, 326, 1212–1216.

7. Bashirullah,A., Cooperstock,R.L. and Lipshitz,H.D. (1998) RNA localization in development. Annu. Rev. Biochem., 67, 335–394.

8. Macdonald,P.M. and Struhl,G. (1988) Cis-acting sequences responsible for anterior localization of bicoid mRNA in Drosophila embryos. Nature, 336, 595–598.

9. Ephrussi,A., Dickinson,L.K. and Lehmann,R. (1991) Oskar organizes the germ plasm and directs localization of the posterior determinant nanos. Cell, 66, 37–50.

10. Schuman,E.M. (1999) Neurotrophin regulation of synaptic transmission. Current Opinion in Neurobiology, 9, 105–109.

11. Tolino,M., Köhrmann,M. and Kiebler,M.A. (2012) RNA-binding proteins involved in RNA localization and their implications in neuronal diseases. European Journal of Neuroscience, 35, 1818–1836.

12. Ross,A.F., Oleynikov,Y., Kislauskis,E.H., Taneja,K.L. and Singer,R.H. (1997) Characterization of a beta-actin mRNA zipcode-binding protein. Molecular and Cellular Biology, 17, 2158–2165.

13. Bertrand,E., Chartrand,P., Schaefer,M., Shenoy,S.M., Singer,R.H. and Long,R.M. (1998) Localization of ASH1 mRNA Particles in Living Yeast. Molecular Cell, 2, 437–445.

14. Mattioli,C.C., Rom,A., Franke,V., et al. (2019) Alternative 3′ UTRs direct localization of functionally diverse protein isoforms in neuronal compartments. Nucleic Acids Research.

15. Taliaferro,J.M., Vidaki,M., Oliveira,R., Olson,S., Zhan,L., Saxena,T., Wang,E.T., Graveley,B.R., Gertler,F.B., Swanson,M.S., et al. (2016) Distal Alternative Last Exons Localize mRNAs to Neural Projections. Molecular Cell, 61, 821–833.

16. Berkovits,B.D. and Mayr,C. (2015) Alternative 3′ UTRs act as scaffolds to regulate membrane protein localization. Nature, 522, 363–367.

17. Langdon,E.M., Qiu,Y., Niaki,A.G., McLaughlin,G.A., Weidmann,C.A., Gerbich,T.M., Smith,J.A., Crutchley,J.M., Termini,C.M., Weeks,K.M., et al. (2018) mRNA structure determines specificity of a polyQ-driven phase separation. Science, 360, 922–927.

18. Trcek,T., Grosch,M., York,A., Shroff,H., Lionnet,T. and Lehmann,R. (2015) Drosophila germ granules are structured and contain homotypic mRNA clusters. Nature Communications, 6, 7962.

19. Hubstenberger,A., Courel,M., Bénard,M., Souquere,S., Ernoult-Lange,M., Chouaib,R., Yi,Z., Morlot,J.-B., Munier,A., Fradet,M., et al. (2017) P-Body Purification Reveals the Condensation of Repressed mRNA Regulons. Molecular Cell, 68, 144–157.e5.

20. Ray,D., Kazan,H., Chan,E.T., Castillo,L.P., Chaudhry,S., Talukder,S., Blencowe,B.J., Morris,Q. and Hughes,T.R. (2009) Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins. Nature Biotechnology, 27, 667–670.

21. Lawrence,J.B. and Singer,R.H. (1986) Intracellular localization of messenger RNAs for cytoskeletal proteins. Cell, 45, 407–415.

22. Chen,K.H., Boettiger,A.N., Moffitt,J.R., Wang,S. and Zhuang,X. (2015) RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science, 348, aaa6090.

23. Lee,J.H., Daugharthy,E.R., Scheiman,J., Kalhor,R., Ferrante,T.C., Terry,R., Turczyk,B.M., Yang,J.L., Lee,H.S., Aach,J., et al. (2015) Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues. Nat Protoc, 10, 442–458.

24. Fazal,F.M., Han,S., Kaewsapsak,P., Parker,K.R., Xu,J., Boettiger,A.N., Chang,H.Y. and Ting,A.Y. (2018) Atlas of Subcellular RNA Localization Revealed by APEX-seq. bioRxiv, 10.1101/454470.

25. Jagannathan,S., Nwosu,C. and Nicchitta,C.V. (2011) Analyzing Subcellular mRNA Localization via Cell Fractionation. Methods in molecular biology (Clifton, N.J.), 714, 301–321.

26. Foster,L.J., de Hoog,C.L., Zhang,Y., Zhang,Y., Xie,X., Mootha,V.K. and Mann,M. (2006) A Mammalian Organelle Map by Protein Correlation Profiling. Cell, 125, 187–199.

27. Mercer,T.R., Neph,S., Dinger,M.E., Crawford,J., Smith,M.A., Shearwood,A.-M.J., Haugen,E., Bracken,C.P., Rackham,O., Stamatoyannopoulos,J.A., et al. (2011) The Human Mitochondrial Transcriptome. Cell, 146, 645–658.

28. Reid,D.W. and Nicchitta,C.V. (2012) Primary role for endoplasmic reticulum-bound ribosomes in cellular translation identified by ribosome profiling. J. Biol. Chem., 287, 5518–5527.

29. Dobin,A., Davis,C.A., Schlesinger,F., Drenkow,J., Zaleski,C., Jha,S., Batut,P., Chaisson,M. and Gingeras,T.R. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29, 15–21.

30. Bray,N.L., Pimentel,H., Melsted,P. and Pachter,L. (2016) Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology, 34, 525–527.

31. Katz,Y., Wang,E.T., Airoldi,E.M. and Burge,C.B. (2010) Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Meth, 7, 1009–1015.

32. Eden,E., Navon,R., Steinfeld,I., Lipson,D. and Yakhini,Z. (2009) GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics, 10, 48.

33. Janich,P., Arpat,A.B., Castelo-Szekely,V., Lopes,M. and Gatfield,D. (2015) Ribosome profiling reveals the rhythmic liver translatome and circadian clock regulation by upstream open reading frames. Genome Research, 25, 1848–1859.

34. Sidrauski,C., McGeachy,A.M., Ingolia,N.T. and Walter,P. (2015) The small molecule ISRIB reverses the effects of eIF2α phosphorylation on translation and stress granule assembly. eLife, 4, R106.

35. Floor,S.N. and Doudna,J.A. (2016) Tunable protein synthesis by transcript isoforms in human cells. eLife, 5, 1276.

36. Tsanov,N., Samacoits,A., Chouaib,R., Traboulsi,A.-M., Gostan,T., Weber,C., Zimmer,C., Zibara,K., Walter,T., Peter,M., et al. (2016) smiFISH and FISH-quant - a flexible single RNA detection approach with super-resolution capability. Nucleic Acids Research, 44, e165.

37. Preiss,T., Castello,A., Schwarzl,T. and Hentze,M.W. (2018) A brave new world of RNA-binding proteins. Nature Reviews Molecular Cell Biology, 19, 327–341.

38. Godoy,P., Hewitt,N.J., Albrecht,U., Andersen,M.E., Ansari,N., Bhattacharya,S., Bode,J.G., Bolleyn,J., Borner,C., Böttger,J., et al. (2013) Recent advances in 2D and 3D in vitro systems using primary hepatocytes, alternative hepatocyte sources and non-parenchymal liver cells and their use in investigating mechanisms of hepatotoxicity, cell signaling and ADME. Arch. Toxicol., 87, 1315–1530.

39. Kaewsapsak,P., Shechner,D.M., Mallard,W., Rinn,J.L. and Ting,A.Y. (2017) Live-cell mapping of organelle-associated RNAs via proximity biotinylation combined with protein-RNA crosslinking. eLife, 6, 623.

40. Bendtsen,J.D., Nielsen,H., von Heijne,G. and Brunak,S. (2004) Improved Prediction of Signal Peptides: SignalP 3.0. Journal of Molecular Biology, 340, 783–795.

41. Novick,P., Field,C. and Schekman,R. (1980) Identification of 23 complementation groups required for post-translational events in the yeast secretory pathway. Cell, 21, 205–215.

42. Novick,P., Ferro,S. and Schekman,R. (1981) Order of events in the yeast secretory pathway. Cell, 25, 461–469.

43. Bragoszewski,P., Turek,M. and Chacinska,A. (2017) Control of mitochondrial biogenesis and function by the ubiquitin–proteasome system. Open Biology, 7, 170007.

44. Moor,A.E., Golan,M., Massasa,E.E., Lemze,D., Weizman,T., Shenhav,R., Baydatch,S., Mizrahi,O., Winkler,R., Golani,O., et al. (2017) Global mRNA polarization regulates translation efficiency in the intestinal epithelium. Science, 357, 1299–1303.

45. Mehta,A., Kinter,M.T., Sherman,N.E. and Driscoll,D.M. (2000) Molecular Cloning of Apobec-1 Complementation Factor, a Novel RNA-Binding Protein Involved in the Editing of Apolipoprotein B mRNA. Molecular and Cellular Biology, 20, 1846–1854.

46. Lin,J., Conlon,D.M., Wang,X., Van Nostrand,E., Rabano,I., Park,Y., Strong,A., Radmanesh,B., Barash,Y., Rader,D.J., et al. (2018) RNA-binding protein A1CF modulates plasma triglyceride levels through posttranscriptional regulation of stress-induced VLDL secretion. bioRxiv, 10.1101/397554.

47. Yang,Y.-C.T., Di,C., Hu,B., Zhou,M., Liu,Y., Song,N., Li,Y., Umetsu,J. and Lu,Z.J. (2015) CLIPdb: a CLIP-seq database for protein-RNA interactions. BMC Genomics, 16, 51.

48. Rossoll,W., Kröning,A.-K., Ohndorf,U.-M., Steegborn,C., Jablonka,S. and Sendtner,M. (2002) Specific interaction of Smn, the spinal muscular atrophy determining gene product, with hnRNP-R and gry-rbp/hnRNP-Q: a role for Smn in RNA processing in motor axons? Human Molecular Genetics, 11, 93–105.

49. Mourelatos,Z., Abel,L., Yong,J., Kataoka,N. and Dreyfuss,G. (2001) SMN interacts with a novel family of hnRNP and spliceosomal proteins. The EMBO Journal, 20, 5443–5452.

50. Rao,Y., Hao,R., Wang,B. and Yao,T.-P. (2014) A Mec17-Myosin II Effector Axis Coordinates Microtubule Acetylation and Actin Dynamics to Control Primary Cilium Biogenesis. PLoS ONE, 9, e114087.

51. Ma,W. and Mayr,C. (2018) A Membraneless Organelle Associated with the Endoplasmic Reticulum Enables 3'UTR-Mediated Protein-Protein Interactions. Cell, 175, 1492–1506.e19.

Supplementary Tables

Table S1. Related to Figure 1.

Mass spectrometry counts for all proteins identified in this study. Gene names are listed, followed by Uniprot protein accession IDs, a description of the protein, and relative peptide abundances in each fraction for the protein. Mass spectrometry was conducted on gradient 2, with 1 technical replicate. The data reflect the aggregate of both technical replicates.

Table can be found here: https://bio.rc.ufl.edu/pub/ericwang/Danielle atlas/supp tables/

Table S2. Related to Figure 1. TPM values for all Refseq Genes for which at least 1 fraction has TPM > 1. Data from both gradients are shown here, and the gradient number is indicated in the worksheet name. Data from gradient 2 is highlighted in all main figures.

Table can be found here: https://bio.rc.ufl.edu/pub/ericwang/Danielle atlas/supp tables/

Table S3. Related to Figure 1. Worksheet 1: GO enrichments for each RNA cluster. Cluster number is in the left-most column, followed by GO enrichment. Subsequent columns show the observed number of genes in the GO category, the expected number of genes in the GO category, the fold enrichment, and p-value. The final column marks whether a cluster is designated as an ER cluster as defined by GO analysis. Worksheet 2: Hierarchical clustering assignments from clustering of gradient 2. Worksheet 3: Clusters analyzed in Figure 3A (those that contain more than 20 genes).

Table can be found here: https://bio.rc.ufl.edu/pub/ericwang/Danielle atlas/supp tables/

Table S4. Related to Figure 3. Probe sequences for smiFISH analyses. Gene names and probe sequences are provided for each smiFISH probe set. For the smiFISH, Y-flap was used as the secondary probe in all cases with either Cy3 or Cy5 used as the fluorophore indicated.

Name Sequence
Psma1 1 GTGGTCTTTCTTCAAGACCATCCAGGAATGTTACACTCGGACCTCGTCGACATGCATT
Psma1 2 CTGGCATCAGCAGTTAGACCCGCAATTTTACACTCGGACCTCGTCGACATGCATT
Psma1 3 GCATATTCAATTTGATGAATCCTGCCCTGTTACACTCGGACCTCGTCGACATGCATT
Psma1 4 TTTCGAAACATAGCTCCGGCGCAGCGATTACACTCGGACCTCGTCGACATGCATT
Psma1 5 ATGTCCTTGTAACCTTTATCACTTAATGTTCCTTACACTCGGACCTCGTCGACATGCATT
Psma1 6 TGGTTCATCGGCTTTTTCTGCAGGTTCCTTACACTCGGACCTCGTCGACATGCATT
Psma1 7 AGCAGCCTGTGAAGGCTGTGCTTTTCTTTACACTCGGACCTCGTCGACATGCATT
Psma1 8 CATTCTTTGTGGTCAGGTCCTGCTCTGTTACACTCGGACCTCGTCGACATGCATT
Psma1 9 GGGAGTGTTTCTCTTAAGGCACGCAGACCTTACACTCGGACCTCGTCGACATGCATT
Psma1 10 TGTCAACATGGAGAATTTTCTTCTGGTGAGCGTTACACTCGGACCTCGTCGACATGCATT
Psma1 11 AGCTCTGACTGTGCTCTCTTCAGTGCTTACACTCGGACCTCGTCGACATGCATT
Psma1 12 CAGCACTGCGTGCGTTTTTGATTTTAGACCAATTACACTCGGACCTCGTCGACATGCATT
Psma1 13 GTTGCTGAACCTTGCTTAACAGCTTCCATTACACTCGGACCTCGTCGACATGCATT
Psma1 14 AAATACAGATTATTGTCGCCAGTGTATGTCCCTTACACTCGGACCTCGTCGACATGCATT
Psma1 15 AATTCCAAGTCTTTACCAACGATTCCAATGGATTACACTCGGACCTCGTCGACATGCATT
Psma1 16 GTTTAACCAGTTCATCCAAATTGCACTCCATTTACACTCGGACCTCGTCGACATGCATT
Psma1 17 ATTCAGACATATGTCTCTCCAGGTAAGTACGATTACACTCGGACCTCGTCGACATGCATT
Psma1 18 TGATTGAGAACGGGCTCCAATAGACATAGCTTACACTCGGACCTCGTCGACATGCATT
Psma1 19 ATATCATCATAACCAGCAATGAGCAGCCTTACACTCGGACCTCGTCGACATGCATT
Psma1 20 CTCCGGCCATATCGCTGTGTTGGGATTTACACTCGGACCTCGTCGACATGCATT
Psma1 21 TTGCTTCCAATTAGAGACACAAGACGAGACACTTACACTCGGACCTCGTCGACATGCATT
Psma1 22 GAAGTGGTCTGTCAAACACAAATCTGGAATCCTTACACTCGGACCTCGTCGACATGCATT
Psma1 23 GGCTCCAAACAGTGACATCATTGTCATACTTTACACTCGGACCTCGTCGACATGCATT
Psma1 24 ACTTACATCTGGACTGATTTCTAAAACATACCTTACACTCGGACCTCGTCGACATGCATT
Psma1 25 AGCAGATGGACAGGTTTGGAAAATATGAGGGCTTACACTCGGACCTCGTCGACATGCATT
Psma1 26 ACACTCCTGGCGCATAAAGTTGCATAACATTACACTCGGACCTCGTCGACATGCATT
Psma1 27 CTGACCCAGCAGCGGACACAGGAACATTACACTCGGACCTCGTCGACATGCATT
Psma1 28 GCCCCAACTACGGTTCGGTGATCGTCTTACACTCGGACCTCGTCGACATGCATT
Psma1 29 TTATGTATACCCTTTCAAAGAAAATGCACCCCTTACACTCGGACCTCGTCGACATGCATT
psmb1 1 TGGAACTGGAGTTACAGGTGGGTGCTAGGAATTACACTCGGACCTCGTCGACATGCATT
psmb1 2 GTCACGATGCAGATCCTGAGAGCATCTTTACACTCGGACCTCGTCGACATGCATT
psmb1 3 AGTATACACATCCCTCTCGGCTGCAGTTACACTCGGACCTCGTCGACATGCATT
psmb1 4 CTCTCTGGTAAGAGCCCACTGGGTCAATTACACTCGGACCTCGTCGACATGCATT
psmb1 5 AATCCTTCACTCAATCGAGTGTCTGAAGCGTTACACTCGGACCTCGTCGACATGCATT
psmb1 6 ACATACATTTGTATGTGGGTATTTGCATGTGCTTACACTCGGACCTCGTCGACATGCATT
psmb1 7 AACAGTTTCCTCCCTGATGCCCTCTTTTACACTCGGACCTCGTCGACATGCATT
psmb1 8 GTCCAGCGTCAGGGGGACGTGCTCTATTACACTCGGACCTCGTCGACATGCATT
psmb1 9 TTCTGCATATTTTTGAAGCCAACCTGGTTGTTTACACTCGGACCTCGTCGACATGCATT
psmb1 10 CTTGCTGAGCCTCCCGCCTTGAAAGATTACACTCGGACCTCGTCGACATGCATT
psmb1 11 CTGTACACAGCTCCCTTTCCTTCTTCTTACACTCGGACCTCGTCGACATGCATT
psmb1 12 CGTTGTCATGGCCTTGTTATTGGAATGCTTGTTACACTCGGACCTCGTCGACATGCATT
psmb1 13 GCAGCCAATTACTGTCTTGTCTGTCAGTTTTTACACTCGGACCTCGTCGACATGCATT
psmb1 14 AGCATTTGGGGCTATCTCGGGTATGAATTGTTACACTCGGACCTCGTCGACATGCATT
psmb1 15 CCAGCTCTCGTTCCACGTCTCGGTAATTACACTCGGACCTCGTCGACATGCATT
psmb1 16 GTCGAGGCCAGAAGAGGACACTGTAGCCTTACACTCGGACCTCGTCGACATGCATT
psmb1 17 AAACTTAGGTCCTCTGCAAGAGCCATACTTTACACTCGGACCTCGTCGACATGCATT
psmb1 18 TTAAACTGAGAGTTGTCTCTCCGACCCCTTTACACTCGGACCTCGTCGACATGCATT
psmb1 19 GGAAGAAATTCTTCCCAATTTGCCATGCTTAGTTACACTCGGACCTCGTCGACATGCATT
psmb1 20 GACCCGCATGTCAGTCTTTCCGCAGGTTACACTCGGACCTCGTCGACATGCATT
psmb1 21 ATGAAGACATCTTTCACCAGCCTCATGGTTACACTCGGACCTCGTCGACATGCATT
psmb1 22 AACATAGTAAGGGAAGAAGCGCCGTGAGTACATTACACTCGGACCTCGTCGACATGCATT
psmb1 23 ATGGTAGACAGCATTGCAGCAATCGCCTTACACTCGGACCTCGTCGACATGCATT
psmb1 24 TTGTCAAAGTGAGACAATCTCCATGGAAACCATTACACTCGGACCTCGTCGACATGCATT
psmb1 25 AATAAAACAAACTTGTGCCACAGCTCCAATATTTACACTCGGACCTCGTCGACATGCATT
psmb1 26 GATGGAAAAATCTTCTCCAGCAATTGCCAATATTACACTCGGACCTCGTCGACATGCATT
psmb1 27 TACCTCCGTTGAAGGCATAAGGCGAAAACCTTACACTCGGACCTCGTCGACATGCATT
atpb5 1 AGAAGGCTTGTTCTGGGAGATGGTCATATTCATTACACTCGGACCTCGTCGACATGCATT
atpb5 2 TTATCTTCCTCAGAAAGTTCATCCATACCCTTACACTCGGACCTCGTCGACATGCATT
atpb5 3 GATGGCAATGATGTCCTGGAGAGATTTGTAGTTTACACTCGGACCTCGTCGACATGCATT
atpb5 4 TCTGTCCATATACCAACGCTACCTTGGTTACACTCGGACCTCGTCGACATGCATT
atpb5 5 CAGAGTAACCACCATGGGCTTTGGCGACTTACACTCGGACCTCGTCGACATGCATT
atpb5 6 TTTATCCCAGTCACCAGAATCTCCTGCTTACACTCGGACCTCGTCGACATGCATT
atpb5 7 CCAAGCCTTCAGTGCCATCCATAGCATTACACTCGGACCTCGTCGACATGCATT
atpb5 8 ATGGGTGGTAATCCCTCATCGAACTGGATTACACTCGGACCTCGTCGACATGCATT
atpb5 9 CTCTTCTGCCAGCTTATCAGCCTTTGTTACACTCGGACCTCGTCGACATGCATT
atpb5 10 ACGGCTTCTTCAATGGGTCCCACCATTTACACTCGGACCTCGTCGACATGCATT
atpb5 11 TGTATCTTTCTTGCCCGGGACACAGTCATTACACTCGGACCTCGTCGACATGCATT
atpb5 12 CAGGATTTTCTGCACTCCTCGGGCGATTACACTCGGACCTCGTCGACATGCATT
atpb5 13 CCGAGGTGATCGATCCCTTCTTGGTGGTTACACTCGGACCTCGTCGACATGCATT
atpb5 14 GTGATCCTTTCCTGCATTGTGCCCATTTACACTCGGACCTCGTCGACATGCATT
atpb5 15 TCCTGGTCTCTGAAGTATTCAGCAACGGTTTACACTCGGACCTCGTCGACATGCATT
atpb5 16 GCAAATTGTTTGGTTTTGATAGGACCTCTCTCTTACACTCGGACCTCGTCGACATGCATT
atpb5 17 CCCTGAATCCAGTACTTTCTGGCCTCTTACACTCGGACCTCGTCGACATGCATT
atpb5 18 CGCCGCATAGTCTCTGGCAGGATGAATTACACTCGGACCTCGTCGACATGCATT
atpb5 19 CCTGCCTTGCACTTCCAGGGCATTTATTACACTCGGACCTCGTCGACATGCATT
atpb5 20 CTCTGAGCCAGCCTGGGTAAAGCGGATTACACTCGGACCTCGTCGACATGCATT
atpb5 21 GGCGTATGGGGCCAGCAGATCCACAATTACACTCGGACCTCGTCGACATGCATT
atpb5 22 CCGGCGGGAGCAGCTCGCAGTAGAAGTTACACTCGGACCTCGTCGACATGCATT
atpb5 23 TGCTAAAATCTGCTGGAATCCTTTAATGGTCTTTACACTCGGACCTCGTCGACATGCATT
atpb5 24 AAAGGTGGTTGCAGGGGCAGGGTCAGTCTTACACTCGGACCTCGTCGACATGCATT
atpb5 25 GGTGGCTAGGGTGGGCTGGTAGCCTATTACACTCGGACCTCGTCGACATGCATT
atpb5 26 CGGGCTCGAGCGCCAGGTGGTTCGTTTTACACTCGGACCTCGTCGACATGCATT
atpb5 27 TGTTGATTAGCTCCATGATCAGTACTGTCTTTTACACTCGGACCTCGTCGACATGCATT
Fbn1 1 GTCAAAGCATGAGTCATCTGTAGGCTGGTTCTTACACTCGGACCTCGTCGACATGCATT
Fbn1 2 ATAATGTAGCCAGTAATCCTGGCACGGGGTTACACTCGGACCTCGTCGACATGCATT
Fbn1 3 TCTTGGAGGGCTAACATTCTCCAGAGTAGTGTTACACTCGGACCTCGTCGACATGCATT
Fbn1 4 AGAGGGATGCTCTCATGTTGTTCGTAGACACTTACACTCGGACCTCGTCGACATGCATT
Fbn1 5 GGCATTGTCTGAAGGTGAAATGGATAGCTCTGTTACACTCGGACCTCGTCGACATGCATT
Fbn1 6 ATCAGAGATAGGGGCACTTTCCTTGTCATCTTACACTCGGACCTCGTCGACATGCATT
Fbn1 7 GTGTAAACACTGACGTTGTACTCCAGGCCATTACACTCGGACCTCGTCGACATGCATT
Fbn1 8 GGTGCTGTAGTCTGTGAGGTAGACAGGATCAATTACACTCGGACCTCGTCGACATGCATT
Fbn1 9 CTCTAACATGTAGCCACCAGTCTCATGTGTTACACTCGGACCTCGTCGACATGCATT
Fbn1 10 GGCAGCGATTTGCAATGGTACAGCTGATCTTACACTCGGACCTCGTCGACATGCATT
Fbn1 11 TCAAAGCAAGTCTCTTCAGGCTCAGGCTTTTACACTCGGACCTCGTCGACATGCATT
Fbn1 12 TCTCTGTGTATACTGGTTGTAGGTGTGGCCTTACACTCGGACCTCGTCGACATGCATT
Fbn1 13 TAGCACGTTGCTTCATGGGGATCACACTTTTACACTCGGACCTCGTCGACATGCATT
Fbn1 14 CTCCGATCTTGTAGTTGACACCGTTGTCATGTTACACTCGGACCTCGTCGACATGCATT
Fbn1 15 CACTCCTCTCCAATGGCGTAATGGGAAACTTACACTCGGACCTCGTCGACATGCATT
Fbn1 16 GAACTTGGAACTGTAAGGGCTCTTCGTCGGTTTACACTCGGACCTCGTCGACATGCATT
Fbn1 17 GAGAGAGAGCTTCCTGTCCTGTCTTCTTCTTACACTCGGACCTCGTCGACATGCATT
Fbn1 18 TCTTGTAGTCAGTGCCTGGCTGTAAACCTTTACACTCGGACCTCGTCGACATGCATT
Fbn1 19 TAGCTTCTAACATCCGGGCTGATGCTCTTACACTCGGACCTCGTCGACATGCATT
Fbn1 20 TTGATGGTCCTAGGCCATTTTTGGGAGTGGTTTACACTCGGACCTCGTCGACATGCATT
Fbn1 21 TTGTTGATGGTGGCTGTGGACTTGCTTCCTTACACTCGGACCTCGTCGACATGCATT
Fbn1 22 AGTATTCTGTCCCAGGCAGGAGATTTGTTAGGTTACACTCGGACCTCGTCGACATGCATT
Fbn1 23 TCCTCTTCATTCTTCACGGGTGAGTAGCGTTACACTCGGACCTCGTCGACATGCATT
Fbn1 24 CTACTCTGTTGACAATCGGTGCATCTCTCTTACACTCGGACCTCGTCGACATGCATT
Fbn1 25 CATCTCGCAGGACTTGGATGGTGTAAGTGTATTTACACTCGGACCTCGTCGACATGCATT
Fbn1 26 GTCCAGGTGATCACAATTGTGGTCTCTGTCTTACACTCGGACCTCGTCGACATGCATT
Fbn1 27 CATTGACAATGTACTTTCTGCCCGGGAGCTTACACTCGGACCTCGTCGACATGCATT
Fbn1 28 GTGGTGAAGTCGAAGCGTGTCACTTCTCTGTTTACACTCGGACCTCGTCGACATGCATT
Fbn1 29 CCCGTAGAGGTTTTAGGTCTCCATCTGAGAATTTACACTCGGACCTCGTCGACATGCATT
Fbn1 30 AATTACTTGGACAGGTCCAGTTGTGCCTGGTTACACTCGGACCTCGTCGACATGCATT
Fbn1 31 AACTTCTCCCAGGAGTCACCAATCTGGTTTACACTCGGACCTCGTCGACATGCATT
Fbn1 32 CTCTGAATCTTGGCACTGGTCAATGGGGTTTACACTCGGACCTCGTCGACATGCATT
Fbn1 33 AAACTTCTGATCGGCATCGTAGTTCTGGGTGTTACACTCGGACCTCGTCGACATGCATT
Fbn1 34 GTGCACAGCATTTGCTTGTTTCCTTGCGATTACACTCGGACCTCGTCGACATGCATT
Fbn1 35 TCCGGCTGAAGCACTTTGTAGAGCATGTCTTACACTCGGACCTCGTCGACATGCATT
Fbn1 36 CCACTTGTCGCCAATCTTGTAGGACTGACTTACACTCGGACCTCGTCGACATGCATT
Fbn1 37 TTTAGGGCGCTCATAAGTGTCACCCACTTTTACACTCGGACCTCGTCGACATGCATT
Fbn1 38 ATTCTCCCTTTCCATTCCCGAGGCATTTACACTCGGACCTCGTCGACATGCATT
adrm1 1 CCA GTG CTG AAA GCT CAT GGC CAC TAT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 2 CTG CAT GGC TTT GGC AAA TGC TTC CAC TTA CAC TCG GAC CTC GTC GAC ATG CAT T
adrm1 3 TTT GTT GGC GGC CTC AAC AGC CTC TGT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 4 GTG TTC TGG ATC TCA TCT GCA GTC TGG TTA CAC TCG GAC CTC GTC GAC ATG CAT T
adrm1 5 AGG ATG GGA GCC ATG ATC TCT GGG GTT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 6 GCT GCT GTG CTG GTT CCA TTA CCT GAT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 7 AGA CAC TCG TTG ACT TTC CGG CAG TGC TTA CAC TCG GAC CTC GTC GAC ATG CAT T
adrm1 8 CTC ATC TTG GTC AGT CTT GGG CTC CTT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 9 GGT CCC AGA GGT CCT GTC TTT CCA ACA TTA CAC TCG GAC CTC GTC GAC ATG CAT T
adrm1 10 AGT GAA TAA GGG AGT CGT CCG TCT GCT GGT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 11 GTA CAC GAG ACC TTT CCG TTT ATC TGG GGT TTA CAC TCG GAC CTC GTC GAC ATG CAT T
adrm1 12 CCG TAG TTC CTT TTA ATG ACA TTT TTC CTG CCT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 13 TGA AGT CGT CAT CCT GGC GGT GCT CAT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 14 CCC TCC TTT GGG TCC GAT TTG GCA TTG TTA CAC TCG GAC CTC GTC GAC ATG CAT T
adrm1 15 CTG GGG TGG CGC GAG CGG AAG AGG TGT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 16 TCC GGG AGC TGG ATG AAG AGC TGC TGT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 17 CCG GCT GGT CCG ATG AGC TGC ATA AGT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 18 TAC GGA ATC ACA CAG CTA CCA ATT CCA ACT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 19 GGA AGG CCG AAC TGG CAC ATG AGA GGG TTA CAC TCG GAC CTC GTC GAC ATG CAT T
adrm1 20 CTG AGG CCA AGG CCG CAC TGA ACA TAT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 21 CAG AGA CTC CCC AGA GGG CAG GTA GGT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 22 GGC TGT GAC TCA TGT TCC CCA ACA GGT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 23 CTT AAA CTT GAG CAC GTA GAC CCT CCC ATT ACA CTC GGA CCT CGT CGA CAT GCA TT
adrm1 24 AGT CAT CAG GAA AGA TAA TCA AGT CAT CCT CCT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 25 CTC TCG CCG CGG ATT GAG TTC CGT CAT TAC ACT CGG ACC TCG TCG ACA TGC ATT
adrm1 26 GAA CTC CAC CAA ATA TTT GGT AGA AGA CCC CTT ACA CTC GGA CCT CGT CGA CAT GCA TT

Table S5. Related to Figure 4. Worksheet 1: RefSeq gene symbol followed by the Pearson correlation coefficient between the RNA and protein profiles of each gene, sorted from most to least correlated. Worksheet 2: GO enrichment results (by the GOrilla GO analysis package) sorted from most to least correlated. Highest ranked values are for genes whose RNA and protein counterparts are the most positively correlated. Worksheet 3: GO enrichment results for genes whose RNA and protein counterparts are the most negatively correlated.

Table can be found here: https://bio.rc.ufl.edu/pub/ericwang/Danielle atlas/supp tables/

Table S6. Related to Figure 5. Worksheet 1. RefSeq gene symbol followed by AFE psi values for each fraction of gradient 2. Worksheet 2. RefSeq gene symbol followed by AFE psi values for each fraction of gradient 1. Worksheet 3. RefSeq gene symbol followed by ALE psi values for each fraction of gradient 2. Worksheet 4. RefSeq gene symbol followed by ALE psi values for each fraction of gradient 1.

Table can be found here: https://bio.rc.ufl.edu/pub/ericwang/Danielle atlas/supp tables/

Table S7. Related to Figure 6. Worksheet 1. RefSeq gene symbol for each RBP followed by the most correlated cluster and the Pearson correlation coefficient between the peptide profile of the RBP and the mean profile of the RNA cluster. Worksheet 2. RefSeq gene symbol for each RBP along with the top 5 most correlated clusters. Pearson’s R values are provided.

Table can be found here: https://bio.rc.ufl.edu/pub/ericwang/Danielle atlas/supp tables/


Chapter III

Future Directions

Spatiotemporal organization of the transcriptome is at the center of cellular function and structure: RNAs must be expressed at the right time and in the right place.

Work presented in this thesis focused on interrogating the spatial organization of the transcriptome in mammalian cells by developing a map of RNA distribution on a transcriptome-wide scale. Through this work, we made progress towards this aim and found, strikingly, that RNAs encoding proteins with similar biological functions or cellular compartment localization tend to cosegregate in the cell on a broad scale. The work presented in this thesis suggests that RNA regulons and RNA-protein interaction networks coordinate post-transcriptional and translational events and shape the cellular architecture. The driving forces of this RNA cosegregation and the supramolecular organization of the transcriptome remain elusive. Placing these findings in the context of emerging studies, I propose follow-up investigations that would help to explain our findings mechanistically.

strongly enriched to encode proteins of the proteasome complex. To our knowledge, mRNAs encoding the various subunits of the proteasome complex have not previously been known to co-segregate, and the spatial distribution of RNAs encoding the proteasome has not yet been systematically assessed. Many of the RNAs with restricted subcellular localization in our study have not been previously characterized as being localized or compartmentalized. Further analysis of ATLAS-Seq RNA clusters is necessary to characterize the sub-compartments or supramolecular assemblies in which RNAs enriched in the same cluster cosegregate or largely colocalize in intact cells.

Affinity purification of RNAs expected to cosegregate in vivo as predicted by ATLAS-Seq is one way we could address this question. As I discussed in Chapter I, RNAs are localized in RNP complexes comprising at least one RNA species and regulatory RBPs. RaPID is an affinity purification technique that can be used to identify RNAs and proteins interacting with an RNA of interest (Slobodin and Gerst, 2010). RaPID involves MS2 tagging: a fusion protein containing Streptavidin binding protein (SBP) and the bacteriophage MS2 coat protein (MS2-CP) is expressed, and MS2-CP binds RNAs that carry its stem-loop binding sequence in the transcript’s 3’UTR, forming an RNA-protein complex (Johansson et al., 1997; Bertrand et al., 1998; Slobodin and Gerst, 2010). The RNA of interest is either edited by CRISPR to contain MS2 stem loops (Spille et al., 2019) or expressed with MS2 stem loops from a plasmid (Bertrand et al., 1998; Slobodin and Gerst, 2010). The MS2-CP-SBP fusion protein is expressed along with the RNA of interest containing MS2 binding stem loops. Formaldehyde crosslinking followed by streptavidin bead pulldown allows for the isolation of RNAs and proteins associated with the RNA of interest (Bertrand et al., 1998; Slobodin and Gerst, 2010). With an assay like this, we could assess whether RNAs predicted to cosegregate with the RNA of interest are physically associated and identify the proteins that may be involved in regulating these predicted networks of RNAs. One major caveat of this approach is its requirement for exogenous components and the difficulty of conducting it in mammalian tissue. Nevertheless, this study would allow us not only to confirm our findings in intact cells, but also to mechanistically dissect the factors driving RNA segregation and the structure of these potential RNA segregation networks.

Affinity purification could similarly be used to address another key finding from our study. Using our mass spectrometry and RNA-Seq analyses, we were able to profile 134 RBPs across our gradient and predict the RNAs with which they may coassociate by determining whether the RBP’s profile across the gradient is highly correlated with the RNA’s profile across the gradient. I hypothesize that a few RBPs can work together to traffic many RNAs, as has been described for ASH1, where ASH1 and 23 other mRNAs were shown to be localized to the bud tip in budding yeast by the same set of RBPs (Shepard et al., 2003). In our analysis of these RBPs and their colocalized RNAs, we identified two promising RBP candidates: HNRNPQ and Myh9. As mentioned in Chapter II, HNRNPQ is a protein component of the spliceosome that interacts with the Survival of Motor Neuron (SMN) complex and is postulated to link the SMN complex with splicing (Mourelatos et al., 2001; Rossoll et al., 2002). HNRNPQ was found to be most strongly correlated with a group of RNAs encoding proteins in the pICln-Sm protein complex, the U12-type spliceosomal complex, and U2 snRNPs. Myh9 is a myosin motor protein known to regulate cilium biogenesis (Rao et al., 2014) and was found, in our study, to be most highly correlated with a cluster of RNAs encoding cilium components. Using cross-linking immunoprecipitation (CLIP) (Rossbach et al., 2014; Van Nostrand et al., 2016) or RNA immunoprecipitation (RIP) (Niranjanakumari et al., 2002; Hurt et al., 2004), we could identify the RNAs that associate with these RBPs in vivo and conduct RT-PCR or RNA-Seq to determine whether the RNAs identified in our gradient analysis are associated with these RBPs as predicted. Myh9 or HNRNPQ depletion could permit us to further test whether these proteins are required to localize their associated RNAs.
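The profile correlation underlying these predictions can be illustrated with a minimal sketch. It assumes that each RBP and each RNA cluster is summarized as a vector of abundances across the same ordered gradient fractions; the function names and toy values are hypothetical and are not taken from the analysis code used in Chapter II.

```python
import numpy as np

def pearson_to_rows(query, rows):
    """Pearson r between one profile and every row of a profile matrix.

    query: abundances across fractions, shape (n_fractions,)
    rows:  one profile per row, shape (n_profiles, n_fractions)
    Flat (zero-variance) profiles have no defined correlation and yield NaN.
    """
    query = np.asarray(query, dtype=float)
    rows = np.asarray(rows, dtype=float)
    q = query - query.mean()
    p = rows - rows.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(q) * np.linalg.norm(p, axis=1)
    r = np.full(rows.shape[0], np.nan)
    ok = denom > 0
    r[ok] = (p[ok] @ q) / denom[ok]
    return r

def top_clusters(rbp_profile, cluster_means, k=5):
    """Return (cluster index, r) for the k clusters best correlated with an RBP."""
    r = pearson_to_rows(rbp_profile, cluster_means)
    # argsort on -r places the highest r first and NaN entries last
    order = [i for i in np.argsort(-r) if not np.isnan(r[i])]
    return [(int(i), float(r[i])) for i in order[:k]]

# Toy data (hypothetical): one RBP and three cluster mean profiles over 5 fractions
rbp = [0.0, 1.0, 3.0, 1.0, 0.0]
clusters = [[0, 2, 6, 2, 0], [3, 1, 0, 1, 3], [1, 1, 2, 1, 1]]
print(top_clusters(rbp, clusters, k=2))
```

Because Pearson correlation is insensitive to scale, a protein and an RNA cluster with very different absolute abundances can still score highly if they peak in the same fractions, which is why a high correlation is a prediction to be tested by CLIP or RIP rather than evidence of physical association.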

Two alternative mechanisms could drive RNA cosegregation

Two models have been proposed to describe RNA cosegregation mechanistically. In model 1, RBPs are responsible for the recruitment of diverse RNAs to the same super-assembly. In this model, the RNA cosegregation we observed in our ATLAS-Seq study would be driven by RNA-protein interactions. Various genetic approaches have dissected mechanisms through which RBPs recruit multiple RNAs to RNA super-assemblies. The super-assemblies driven by the Whi3 RBP are an example of this phenomenon. Mutation and deletion analyses identified a polyQ-containing prion-like domain that allows Whi3 to aggregate into super-assemblies, while RRM RNA binding domains further recruit RNAs to these super-assemblies (Zhang et al., 2015).

In model 2, although RBPs are localized to the coassembly, it is the direct intermolecular RNA-RNA interactions, through Watson-Crick base pairing, that drive RNA cosegregation. This model was proposed for stress granule assembly by the Roy Parker lab (Tauber et al., 2020; Van Treeck et al., 2018) and for Whi3 granules in filamentous fungi by Amy Gladfelter (Langdon et al., 2018). The Gladfelter team provided the most elegant demonstration of how RNA-RNA interactions are crucial for RNP super-assembly formation. To address this question, the Gladfelter team developed a reconstituted super-assembly of RNAs and RBPs that were known to cosegregate in Ashbya gossypii fungi and employed a technique called SHAPE-MaP that profiles the structure of RNAs of interest in this system (Langdon et al., 2018). By using complementary oligonucleotides that would compete for binding with the RNA-RNA loop interactions, the Gladfelter team was able to assess the significance of direct RNA-RNA interactions to the assembly of cosegregating RNAs (Langdon et al., 2018). When RNA-RNA binding was disrupted, recruitment of the two RNAs to the same assembly of cosegregating RNAs was eliminated. We could take a similar approach for RNA assemblies identified in our ATLAS-Seq gradient and, in this way, determine whether model 1, model 2, or a combination of both is the likely mechanism behind the RNA cosegregation we observed in Chapter II. The number and diversity of RNA clusters identified in our study suggest that membranous compartmentalization is just one segregation mechanism. Phase separations and other RNP super-assemblies driven by RBP-RNA and RNA-RNA interactions provide alternative RNA clustering mechanisms through supramolecular assembly that are not dependent on membranous organelle association.

Deeper proteomic analysis

Using our mass spectrometry analysis, we were able to identify 440 peptides with high confidence. The number of peptides we could identify was limited by the depth of our mass spectrometry. Deeper mass spectrometry would allow us to expand our characterization of our gradient data. First, it would permit us to better index which organelles and protein complexes are localized at specific regions of our gradient by extending the identified peptides to less abundant proteins. Of the 440 peptides identified in our mass spectrometry analysis, 134 were classified as RBPs. Expanding the list of proteins identified by our methodology would provide a larger list of RBPs whose profiles across the gradient we could cross-correlate, allowing us to identify RBPs that may function together as cofactors to localize RNAs, much like the ASH1 locasome previously described. Expanded mass spectrometry would also allow us to further probe our finding that the profile of most RNAs across the gradient is anti-correlated with the profile of the protein they encode. We were able to analyze thousands of RNA profiles, but due to limitations in the number of identified peptides, we had to limit the study to a few hundred proteins and RNAs. Increasing the number of proteins could help us to better characterize the cosegregation of RNAs and proteins. It could also allow us to determine whether additional cellular compartments and protein complexes beyond the endomembrane system are exceptions to this global result. Taken together, deeper mass spectrometry could strengthen our understanding of the composition of RNP complexes and their potential RNA targets.
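
The cross-correlation of RBP profiles proposed above could be sketched as follows. This is a minimal illustration, not the analysis pipeline used in this work: it assumes each RBP is represented by its relative abundance in each gradient fraction, uses Pearson correlation as the similarity measure, and applies an arbitrary cutoff of 0.9. The function names, the RBP names, and the abundance values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length gradient profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if denom == 0:
        return float("nan")  # a flat profile has no defined correlation
    return sum(a * b for a, b in zip(dx, dy)) / denom


def correlated_pairs(profiles, threshold=0.9):
    """Return (name_a, name_b, r) for every pair of profiles with r >= threshold."""
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:  # NaN compares False, so flat profiles are skipped
            pairs.append((a, b, r))
    return pairs


# Hypothetical relative abundances across six gradient fractions (assumed data).
example_profiles = {
    "RBP_A": [0.05, 0.10, 0.40, 0.30, 0.10, 0.05],
    "RBP_B": [0.04, 0.12, 0.38, 0.31, 0.10, 0.05],
    "RBP_C": [0.40, 0.30, 0.10, 0.10, 0.05, 0.05],
}

for name_a, name_b, r in correlated_pairs(example_profiles):
    print(f"{name_a} ~ {name_b}: r = {r:.2f}")
```

Pairs that pass the cutoff would be candidates for RBPs that cosediment and may act together. Correlated sedimentation alone does not demonstrate a physical or functional interaction, so such candidates would still require experimental validation.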

References

Bertrand, E., Chartrand, P., Schaefer, M., Shenoy, S.M., Singer, R.H., and Long, R.M. (1998). Localization of ASH1 mRNA Particles in Living Yeast. Molecular Cell 2, 437–445.

Hurt, E., Luo, M.-J., Röther, S., Reed, R., and Sträßer, K. (2004). Cotranscriptional recruitment of the serine-arginine-rich (SR)-like proteins Gbp2 and Hrb1 to nascent mRNA via the TREX complex. PNAS 101, 1858–1862.

Johansson, H.E., Liljas, L., and Uhlenbeck, O.C. (1997). RNA Recognition by the MS2 Phage Coat Protein. Seminars in Virology 8, 176–185.

Langdon, E.M., Qiu, Y., Niaki, A.G., McLaughlin, G.A., Weidmann, C.A., Gerbich, T.M., Smith, J.A., Crutchley, J.M., Termini, C.M., Weeks, K.M., et al. (2018). mRNA structure determines specificity of a polyQ-driven phase separation. Science 360, 922–927.

Mourelatos, Z., Abel, L., Yong, J., Kataoka, N., and Dreyfuss, G. (2001). SMN interacts with a novel family of hnRNP and spliceosomal proteins. The EMBO Journal 20, 5443–5452.

Niranjanakumari, S., Lasda, E., Brazas, R., and García-Blanco, M.A. (2002). Reversible cross-linking combined with immunoprecipitation to study RNA-protein interactions in vivo. Methods 26, 182–190.

Rao, Y., Hao, R., Wang, B., and Yao, T.-P. (2014). A Mec17-Myosin II Effector Axis Coordinates Microtubule Acetylation and Actin Dynamics to Control Primary Cilium Biogenesis. PLoS ONE 9, e114087.

Rossbach, O., Hung, L.-H., Khrameeva, E., Schreiner, S., König, J., Curk, T., Zupan, B., Ule, J., Gelfand, M.S., and Bindereif, A. (2014). Crosslinking-immunoprecipitation (iCLIP) analysis reveals global regulatory roles of hnRNP L. RNA Biology 11, 146–155.

Rossoll, W., Kröning, A.-K., Ohndorf, U.-M., Steegborn, C., Jablonka, S., and Sendtner, M. (2002). Specific interaction of Smn, the spinal muscular atrophy determining gene product, with hnRNP-R and gry-rbp/hnRNP-Q: a role for Smn in RNA processing in motor axons? Human Molecular Genetics 11, 93–105.

Shepard, K.A., Gerber, A.P., Jambhekar, A., Takizawa, P.A., Brown, P.O., Herschlag, D., DeRisi, J.L., and Vale, R.D. (2003). Widespread cytoplasmic mRNA transport in yeast: identification of 22 bud-localized transcripts using DNA microarray analysis. PNAS 100, 11429–11434.

Slobodin, B., and Gerst, J.E. (2010). A novel mRNA affinity purification technique for the identification of interacting proteins and transcripts in ribonucleoprotein complexes. RNA 16, 2277–2290.

Spille, J.-H., Hecht, M., Grube, V., Cho, W.-K., Lee, C., and Cissé, I.I. (2019). A CRISPR/Cas9 platform for MS2-labelling of single mRNA in live stem cells. Methods 153, 35–45.

Tauber, D., Tauber, G., Khong, A., Van Treeck, B., Pelletier, J., and Parker, R. (2020). Modulation of RNA Condensation by the DEAD-Box Protein eIF4A. Cell 180, 411–426.e416.

Van Nostrand, E.L., Pratt, G.A., Shishkin, A.A., Gelboin-Burkhart, C., Fang, M.Y., Sundararaman, B., Blue, S.M., Nguyen, T.B., Surka, C., Elkins, K., et al. (2016). Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nature Methods 13, 508–514.

Van Treeck, B., Protter, D.S.W., Matheny, T., Khong, A., Link, C.D., and Parker, R. (2018). RNA self-assembly contributes to stress granule formation and defining the stress granule transcriptome. PNAS 115, 2734–2739.

Zhang, H., Elbaum-Garfinkle, S., Langdon, E.M., Taylor, N., Occhipinti, P., Bridges, A.A., Brangwynne, C.P., and Gladfelter, A.S. (2015). RNA Controls PolyQ Protein Phase Transitions. Molecular Cell 60, 220–230.
