Downloaded from rnajournal.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Cytoplasmic long non-coding RNAs are differentially regulated and translated during human neuronal differentiation

Katerina Douka1,2, Isabel Birds1,2, Dapeng Wang2, Andreas Kosteletos1,2, Sophie Clayton1,3, Abigail Byford1,3, Elton J. R. Vasconcelos2, Mary J. O’Connell5, Jim Deuchars2,3, Adrian Whitehouse1,2,4 and Julie L. Aspden1,2,*

1 School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, LS2 9JT, UK.
2 LeedsOmics, University of Leeds, UK.
3 School of Biomedical Sciences, Faculty of Biological Sciences, University of Leeds, LS2 9JT, UK.
4 Astbury Centre for Structural Molecular Biology, University of Leeds, Leeds, LS2 9JT, UK.
5 School of Life Sciences, Faculty of Medicine and Health Sciences, The University of Nottingham, Nottingham, NG7 2UH, UK.
* Corresponding author: [email protected]

Abstract

The expression of long non-coding RNAs is highly enriched in the human nervous system. However, the function of neuronal lncRNAs in the cytoplasm and their potential translation remain poorly understood. Here we performed Poly-Ribo-Seq to understand the interaction of lncRNAs with the translation machinery and the functional consequences during neuronal differentiation of human SH-SY5Y cells. We discovered 237 cytoplasmic lncRNAs upregulated during early neuronal differentiation, 58-70% of which are associated with polysome translation complexes. Amongst these polysome-associated lncRNAs, we find 45 small ORFs to be actively translated, 17 specifically upon differentiation. 15/45 of the translated lncRNA-smORFs exhibit sequence conservation within Hominoidea, suggesting they are under strong selective constraint in this clade. Profiling of publicly available datasets revealed that 8/45 of the translated lncRNAs are dynamically expressed during human brain development and 22/45 are associated with cancers of the central nervous system.
One translated lncRNA we discovered is LINC01116, which is induced upon differentiation and contains an 87-codon smORF exhibiting increased ribosome profiling signal upon differentiation. The resulting LINC01116 peptide localises to neurites. Knockdown of LINC01116 results in a significant reduction of neurite length in differentiated cells, indicating it contributes to neuronal differentiation. Our findings indicate cytoplasmic lncRNAs interact with translation complexes, are a non-canonical source of novel peptides, and contribute to neuronal function and disease. Specifically, we demonstrate a novel functional role for LINC01116 during human neuronal differentiation.

Introduction

Long non-coding RNAs (lncRNAs) are >200 nt in length and thought to lack the ability to encode proteins. LncRNAs are less conserved, yet more tissue and developmental-stage specific than mRNAs (Tsagakis, Douka et al. 2020). Early work indicated that the majority of lncRNAs were predominantly nuclear and localized to chromatin (Derrien, Johnson et al. 2012, Djebali, Davis et al. 2012). However, it has become increasingly clear that many lncRNAs are exported to the cytoplasm, and recent estimates are that ~54% of lncRNAs are detected in the cytoplasm (Carlevaro-Fita, Rahim et al. 2016). Although many lncRNAs have been found to bind proteins, biological functions have only been determined for a relatively small number of lncRNAs.

Several lncRNAs have been found to play key roles in development and differentiation; e.g. lnc-31 during myoblast differentiation (Dimartino, Colantoni et al. 2018). LncRNAs are particularly enriched in the nervous system of Drosophila melanogaster, mouse and human. It is estimated that ~40% of human lncRNAs are specifically expressed in the brain (Derrien, Johnson et al. 2012), where they display precise spatiotemporal expression profiles (Ponting, Oliver et al. 2009). A subset of nuclear neuronal lncRNAs have been found to regulate neuronal differentiation in mouse and human (Chodroff, Goodstadt et al. 2010, Lin, Chang et al. 2014, Winzi 2018, Carelli, Giallongo et al. 2019). However, only a small number of cytoplasmic lncRNAs have had their biological and molecular functions elucidated. These include lncRNAs found to associate with translation complexes (Carrieri, Cimatti et al. 2012) and to have specific cytoplasmic functions in post-transcriptional gene regulation.
For example, the BACE1-AS transcript, which is significantly upregulated in the brains of Alzheimer’s disease patients, base-pairs with beta-secretase-1 (BACE1) mRNA, stabilising it (Faghihi, Zhang et al. 2010). Another example is BC200, which represses translation initiation in dendrites by disrupting the formation of pre-initiation 48S complexes (Wang, Iacoangeli et al. 2002). Together these studies indicate that there may be many lncRNAs present in the cytoplasm, potentially playing important roles during neuronal development and differentiation, that are yet to be discovered.

Ribosome profiling (Ribo-Seq) in a range of organisms and tissue types has revealed the translation of a variety of non-canonical ORFs, including small ORFs (smORFs) <100 codons in length from within lncRNAs (Guo, Ingolia et al. 2010, Ingolia, Brar et al. 2013, Aspden, Eyre-Walker et al. 2014, Duncan and Mata 2014, Blair, Hockemeyer et al. 2017, Fujii, Shi et al. 2017, Rodriguez, Chun et al. 2019). Although these translation events remain controversial, it is clear that lncRNAs can interact with the translation machinery (Ruiz-Orera and Alba 2019). Limited ribosome profiling signal found on smORFs might be the result of sporadic binding of a single ribosome and may not necessarily correspond to active translation (Patraquim, Mumtaz et al. 2020). We previously developed Poly-Ribo-Seq to distinguish those lncRNAs that are bound by multiple ribosomes, and therefore actively translated, from non-specific background signal (Aspden, Eyre-Walker et al. 2014). A small but growing number of smORF peptides translated from lncRNAs have been found to exhibit cellular and organismal functions (Pueyo and Couso 2008, Magny, Pueyo et al. 2013, Anderson, Anderson et al. 2015, Chen, Brunner et al. 2020, Spencer, Sanders et al. 2020, Wang, Fan et al. 2020). Therefore, the identification of genuine smORF translation events from lncRNAs by ribosome profiling is an important first step in assessing the wider importance of these smORF peptides. To date, a robust assessment of lncRNA translation in the context of neuronal differentiation, where lncRNA expression is enriched, has been missing.

Given i) the large number of lncRNAs enriched in the human central nervous system, ii) recently revealed lncRNA roles in differentiation, and iii) evidence of translation of lncRNAs to produce small peptides, we reasoned that lncRNAs may functionally interact with polysomes and be translated during neuronal differentiation. This work aimed to probe the dynamic interactions of lncRNAs with the translation machinery and identify actively translated cytoplasmic lncRNAs during early neuronal differentiation. This will be important to understand the biological role of cytoplasmic lncRNAs and to identify novel peptides with potentially biologically and medically important functions.

Here, we have performed Poly-Ribo-Seq (Aspden, Eyre-Walker et al.
2014), an adaptation of ribosome profiling, to determine the population of lncRNAs present in the cytoplasm, assess their interaction with the translation machinery and establish which lncRNAs are translated. We used the human neuronal cell line SH-SY5Y to provide a model of neuronal differentiation and to generate sufficient material for Poly-Ribo-Seq. We followed up our transcriptome-wide analysis by probing a subset of candidate lncRNAs in more detail, in terms of their enrichment in the cytoplasm, their precise association with translation complexes and ORF tagging experiments. For the translated lncRNAs we identified, we have assessed their conservation and their importance to neuronal development and disease using previously published datasets. For one translated candidate lncRNA, LINC01116, we characterised its functional contribution to neuronal differentiation.

Results

Differentiation of SH-SY5Y cells with retinoic acid results in reduced translation levels

To dissect the importance of cytoplasmic lncRNAs and their ribosome associations in early neuronal differentiation, we profiled the differentiation of SH-SY5Y cells with retinoic acid (RA) for 3 days. This treatment results in neuronal differentiation as indicated by neurite elongation, which can be seen by immunostaining for neuronal βIII-tubulin (TuJ1) (Sup 1A); quantification of neurite length reveals significant elongation upon RA treatment (Sup 1B). There is also increased expression of neuronal markers: more cells express c-Fos upon differentiation (Sup 1C&D). There is a concomitant reduction in cell proliferation, as seen by the reduced number of Ki67+ cells (Sup 1E&F). Together these data indicate that our RA treatment of SH-SY5Y cells leads to neuronal differentiation.

To determine whether this model of RA-induced SH-SY5Y differentiation is suitable for studying translational dynamics, the translational output of these cells was assessed by polysome profiling (Sup 2A). This revealed that differentiation results in down-regulation of global translation. Quantification of translation complexes across the sucrose gradients indicates that levels of polysomes are reduced with respect to 80S monosomes, resulting in a reduced polysome to monosome ratio (Sup 2B). This down-regulation of translation is accompanied by a shift of ribosomal protein (RP) mRNAs from polysomes to monosomes: e.g. RpL26 mRNA (Sup 2C), RpS28 (Sup 2D) and RpL37 (Sup 2E), as measured by RT-qPCR across gradient fractions. This reduced synthesis of RPs has previously been reported during neuronal differentiation (Blair, Hockemeyer et al. 2017, Chau, Shannon et al. 2018). Together these data indicate that the model of RA-induced neuronal differentiation of SH-SY5Y cells, with dynamically regulated translation, provides an ideal system in which to study cytoplasmic RNAs, their interaction with the translation machinery and their contribution to neuronal differentiation.
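The polysome to monosome comparison described above comes down to a simple ratio, which the following Python sketch illustrates. It is not the authors' quantification procedure: it assumes each peak of the sucrose gradient trace has already been integrated into a single area value, and the fraction labels and numbers are hypothetical, chosen only to show the arithmetic.

```python
def polysome_to_monosome_ratio(peak_areas, monosome_label="80S"):
    """Return summed polysome peak area divided by the 80S monosome peak area.

    peak_areas maps a (hypothetical) fraction label to an integrated
    absorbance peak area. Labels starting with "poly" count as polysomes.
    """
    monosome = peak_areas[monosome_label]
    if monosome <= 0:
        raise ValueError("monosome peak area must be positive")
    polysome = sum(area for label, area in peak_areas.items()
                   if label.startswith("poly"))
    return polysome / monosome


# Hypothetical peak areas (arbitrary units), NOT measured values.
control = {"40S": 1.0, "60S": 1.5, "80S": 4.0,
           "poly_2": 2.0, "poly_3": 2.0, "poly_4plus": 4.0}
ra_treated = {"40S": 1.0, "60S": 1.5, "80S": 5.0,
              "poly_2": 1.5, "poly_3": 1.5, "poly_4plus": 2.0}

print(polysome_to_monosome_ratio(control))     # 2.0
print(polysome_to_monosome_ratio(ra_treated))  # 1.0
```

In this toy example the ratio falls from 2.0 to 1.0, the direction of change reported for RA treatment; the magnitudes carry no meaning.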
Cytoplasmic lncRNA expression is regulated during neuronal differentiation

To profile RNA, ribosome association, and translational changes upon differentiation we employed Poly-Ribo-Seq (Aspden, Eyre-Walker et al. 2014) with some minor modifications to adapt it to human neuronal cells (Fig 1A). This adaptation of ribosome profiling (Ribo-Seq) can detect which RNAs are a) cytoplasmic, b) polysome associated and c) translated (Fig 1A). We sequenced i) poly-A selected cytoplasmic RNA, ‘Total’ RNA-Seq, ii) polysome-associated poly-A RNAs, ‘Polysome’ RNA-Seq, and iii) ribosomal footprints, Ribo-Seq, from ‘Control’ and ‘RA’ differentiated cells (Fig 1A).

To determine which lncRNAs are expressed, present in the cytoplasm, and regulated during differentiation we first analysed Total RNA-Seq (Sup 3A). PCA revealed that RA-treated samples cluster separately from Control samples and biological replicates generally cluster together (Sup 3B). We detect large numbers of lncRNAs expressed and present (i.e. with RPKM ≥ 1) in the cytoplasm. In the control conditions there were 801 lncRNA genes expressed in the cytoplasm and 916 lncRNA genes in differentiated cells. To understand the potential role and regulation of cytoplasmic lncRNAs during neuronal differentiation we performed differential expression analysis between Control and RA samples at the gene level. We observed that 178 lncRNA genes were up-regulated and 100 down-regulated during differentiation in the total cytoplasm (Sup 3C). We also performed differential expression analysis at the gene level for Polysomal RNA-Seq samples (Fig 1A). Within the Polysomes, we identified 237 lncRNA genes that were upregulated during differentiation whilst only 82 were downregulated (Fig 1B). Comparing the lncRNAs differentially regulated in Total and Polysomes populations, the majority, i.e. 71% of the upregulated (126/178) and 71% of the downregulated lncRNAs (58/82), found in Polysomes were also found in Total (Sup 3D&E). Significant induction of specific lncRNAs during differentiation, such as DLGAP1-AS2, is suggestive of a regulatory role during neuronal differentiation (Sup 3F). An assessment of the types of lncRNA regulated during neuronal differentiation revealed that the majority are either intergenic or anti-sense lncRNAs, 216/236 for upregulated (Fig 1C) and 74/82 for downregulated (Fig 1D). In summary, neuronal differentiation results in differential expression of ~300 lncRNAs within the cytoplasm.

We validated the differentiation-induced changes in a subset of 7 lncRNAs (Sup 4A). By RT-qPCR the expression of 6/7 candidate lncRNAs was significantly (p<0.05: SNAP25-AS1, LINC01116; p<0.01: ACE254633.1, DLGAP1-AS2, DLGAP1-AS1, LINC02143) upregulated upon differentiation, as was determined by RNA-Seq analysis. Fold-changes were highly correlated between RNA-Seq and RT-qPCR (Sup 4B). To enable us to focus on lncRNAs with potential neuronal functions we selected candidate lncRNAs that exhibited the highest fold increase in levels upon differentiation. To understand whether our candidate lncRNAs are genuine cytoplasmic lncRNAs or whether their cytoplasmic population represents a small minority we assessed their sub-cellular distribution. The majority of these candidate lncRNAs (6/7) are specifically enriched in the cytoplasm, rather than the nucleus (Fig 1E, Sup 4C), in contrast to the known nuclear lncRNA Xist.
Together these data indicate that ~900 lncRNAs are localized to the cytoplasm, with the majority likely enriched there, and that ~30% of these cytoplasmic lncRNAs are dynamically expressed during neuronal differentiation.

Association of lncRNAs with polysomes during neuronal differentiation

To determine the propensity of cytoplasmic lncRNAs to associate with translation complexes and how this is regulated during differentiation we compared lncRNA levels between the whole cytoplasm (Total-RNA-Seq) and the polysomes (Polysome-RNA-Seq). This analysis indicated that the vast majority of lncRNAs (Control: 98% and RA: 99%) are neither polysome enriched nor depleted (Fig 2A&B). This suggests that most lncRNAs are not actively targeted to polysomes but are nonetheless present in translation complexes. A small number (Control: 12 and RA: 10) of lncRNAs are specifically depleted from the polysomes (Fig 2A&B). This suggests that the roles these lncRNAs play are likely elsewhere in the cytoplasm and not directly connected to translation. Nine of the 12 lncRNAs depleted in Control are not depleted upon differentiation, indicating their polysome association is regulated during differentiation (Sup 5A). There is a smaller proportion of anti-sense lncRNAs within these polysome-depleted populations (Sup 5B&C) as compared to the proportion displayed by those lncRNAs differentially expressed during differentiation (Fig 1C&D). This may indicate that anti-sense lncRNAs are more likely to be present in polysomes and their role in the polysomes could be linked to their anti-sense characteristics.

To understand the precise nature of the association of our candidate lncRNAs with translation complexes, their distribution was profiled across sucrose gradient fractions. This confirmed that these lncRNAs associate with polysome complexes within the cytoplasm, and also revealed the precise translation complexes they interact with in terms of ribosomal subunits, 80S monosomes and different polysomes. We first profiled LINC02143, which is induced >22-fold during differentiation (Sup 4A) and highly enriched in the cytoplasm compared to the nucleus (Fig 1E). LINC02143 was mainly found in monosomes (80S) and small polysomes (2-3 ribosomes) (Fig 2C). DLGAP1-AS1 was also found to associate with small polysomes (2-4 ribosomes), as well as ribosomal subunits and 80S monosomes (Fig 2D). Therefore, both LINC02143 and DLGAP1-AS1 could either be translated or regulate translation.

Another lncRNA whose levels significantly increase during differentiation is LINC01116 (Fig 1B and Sup 4A), which has previously been shown to be involved in the progression of glioblastoma (GBM) (Brodie, Lee et al. 2017). LINC01116 is enriched in the cytoplasm (Fig 1E) and detected at high levels in the 80S (monosome) fraction and in small and medium polysomes (2-7 ribosomes) (Fig 2E). Upon differentiation there is an increase in the amount of LINC01116 present in disomes, compared to Control. This is consistent with the upregulation of LINC01116 transcript in the polysomes detected by RNA-Seq, indicating a likely functional interaction of LINC01116 with polysomes during differentiation.
In fact, the majority of LINC01116 transcript was found to associate with polysomes in both undifferentiated cells (Control: 66%) and upon differentiation (RA: 57%), suggesting it could either be translated or be associating with translation complexes (Fig 2E). Overall, these data indicate that the majority of cytoplasmic lncRNAs are polysome associated but not enriched. Comparing the differentiation-induced lncRNA changes between Total and Polysome populations, as well as specific candidate lncRNAs, also suggests that polysome association is dynamic during differentiation.

Translation of lncRNA-smORFs during differentiation

To better understand the association of lncRNAs with polysome complexes and their potential translation we analysed our third and final dataset, ribosome footprinting from our Poly-Ribo-Seq experiments (Fig 1A, ‘Actively translated mRNAs and lncRNAs’). Triplet periodicity analysis reveals good framing, specifically at footprint lengths of 31 and 33 nt (Sup 6A, Sup Table 1). On average 95% of ribosome footprints mapped to CDSs, whilst in RNA-Seq this was only 53% (Sup 6B). Together these attributes indicate that our Ribo-Seq quality is high and represents genuine translation events. Since 31 and 33 nt reads exhibited high triplet periodicity they were selected for downstream analysis of translation events to identify ORFs that were translated (Fig 3A). Overall, we are able to detect the translation of both canonical protein-coding ORFs and non-canonical ORFs, including upstream (uORFs) and downstream ORFs (dORFs), in both Control and RA conditions (Table 1).

We analysed our Ribo-Seq data in order to determine if any of the polysome-associated lncRNAs are engaged with ribosomes and translated. Using our pipeline (Fig 3A) we detected the translation of 45 small ORFs within lncRNAs (lncRNA-smORFs): 28 in Control and 23 during differentiation, with 6 of these translated in both conditions (Fig 3B). Only two of these lncRNA-smORFs have previously been characterised, CRNDEP (Szafron, Balcerak et al. 2015) and HAND2-AS1 (van Heesch, Witte et al. 2019); the remaining 43 represent novel ORFs. The level at which these lncRNA-smORFs are translated was assessed by determining their Translational Efficiencies (TEs), the number of ribosome footprints relative to RNA abundance. The TEs of translated lncRNA-smORFs were similar to those measured for protein-coding ORFs (Fig 3C), providing further evidence that these are genuine translation events. In contrast, the dORFs that we detect are translated at much lower efficiencies (Fig 3C). This is likely to be because ribosomes would have to reinitiate after the main ORF, which would occur at a lower efficiency. As an additional assessment of whether the translation of lncRNA-smORFs represents genuine translation events we compared the pattern of ribosome footprints across the smORFs with that across protein-coding CDSs (Sup 6C-F). In both Control and RA conditions metagene plots show the distribution of footprints is very similar between lncRNA-smORFs (Sup 6C&D) and protein-coding CDSs (Sup 6E&F), specifically around start and stop codons.
There is a substantial drop-off of footprints at the stop codon for both, indicative of genuine translation events. Together our Ribo-Seq analysis reveals that a subset of polysome-associated lncRNAs are translated.

To better understand these translated lncRNA-smORFs we profiled their features. Analysis of the 45 translated smORFs from lncRNAs shows that they are all <300 aa in length with a median size of 60 aa (Fig 3D) (56 aa in Control and 64 aa in RA: Sup 7A). Previous analysis has indicated that Drosophila smORF peptides exhibit specific amino acid usage, indicating that they are genuine proteins, and show propensity to form transmembrane alpha-helices (Aspden, Eyre-Walker et al. 2014). Therefore, we profiled the amino acid composition of our lncRNA-smORFs, uORFs and dORFs, compared to protein-coding ORFs and to the frequencies expected by chance (Sup 7B). lncRNA-smORFs exhibit similar frequencies to known protein-coding ORFs. Specifically, smORFs possess lower than expected levels, but not as low as known protein-coding ORFs. Amino acid usage does not suggest that any smORF groups have propensity to form transmembrane alpha-helices.

From within the set of lncRNAs we identified as induced during differentiation (RNA-Seq), several contained translated smORFs (Ribo-Seq). One of these is LINC01116, which contains a 71-codon smORF detected as translated by our Ribo-Seq data. Ribosome profiling signal is substantially higher upon differentiation (Fig 3E and Sup 7C), mainly as a result of increased lncRNA transcript abundance. Overall, ~80% of reads that map to this ORF are in frame 2, whereas outside this ORF, the few reads mapping to the remaining lncRNA sequence are far more equally distributed between the three possible frames (Fig 3E). Such robust framing is highly indicative of genuine translation of this specific smORF, from within the LINC01116 lncRNA. Together, analysis of our Ribo-Seq data has led to the discovery of 43 novel lncRNA-smORFs with robust indicators of translation.

Peptide synthesis from smORFs in lncRNAs during differentiation

Our pipeline is highly stringent, i.e., there are many additional ORFs that display good framing but do not pass our thresholds for numbers of footprinting reads or exhibit background signal in the rest of the lncRNA. Therefore, we are confident these translation events are taking place. To validate peptide synthesis from these translation events, we have taken two complementary strategies: mass spectrometry analysis and transfection of ORF tagging constructs. Analysis of previously published mass spectrometry datasets from SH-SY5Y cells (undifferentiated and RA-treated) (Murillo, Pla et al. 2018, Brenig, Grube et al. 2020) supports the production of peptides from 8 lncRNA-smORFs (4 Control and 4 RA) (Fig 4A). This relatively low level of support is to be expected given the small size of these peptides and therefore the reduced chance of producing peptides >8 aa from digestion for mass spectrometry detection (Saghatelian and Couso 2015). Overall, mass spectrometry supports peptide synthesis from 18% of lncRNA-smORFs detected by Poly-Ribo-Seq.
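The framing evidence described earlier for the LINC01116 smORF, where ~80% of reads within the ORF fall in a single frame, rests on a simple tally of footprint positions modulo 3. The following Python sketch shows that tally only and is not the authors' pipeline. It assumes each footprint has already been reduced to one transcript coordinate (for example an inferred P-site), and the positions are hypothetical. Frames are labelled 0 to 2 relative to the ORF start and need not match the frame numbering used in the figures.

```python
from collections import Counter


def frame_fractions(positions, orf_start):
    """Fraction of footprint positions falling in each of the three frames.

    positions: transcript coordinates, one per footprint (hypothetical input).
    orf_start: coordinate of the first nucleotide of the start codon.
    Returns [frame0, frame1, frame2], with frame 0 in phase with the ORF.
    """
    counts = Counter((pos - orf_start) % 3 for pos in positions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no footprint positions supplied")
    return [counts[frame] / total for frame in range(3)]


# Hypothetical footprint positions inside an ORF starting at coordinate 100:
# eight are in phase with the start codon, two are not.
orf_start = 100
inside_orf = [100, 103, 106, 109, 112, 115, 118, 121, 122, 126]

print(frame_fractions(inside_orf, orf_start))  # [0.8, 0.1, 0.1]
```

A translated ORF is expected to show one dominant frame, as in this toy input, whereas background reads elsewhere on the transcript would be spread closer to one third per frame.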
To validate translation of our lncRNA-smORFs that were not identified in previous mass spectrometry but were detected as translated by our Poly-Ribo-Seq analyses, we used a transfection tagging approach. We cloned the lncRNA sequence 5’ of the putative ORF, termed the 5’-UTR, and the smORF, without its stop codon, with a C-terminal 3x FLAG tag (Fig 4B). The FLAG tag did not have its own start codon, so any FLAG signal is the result of translation from the lncRNA-smORF. Two candidate lncRNA-smORFs were selected that did not have mass spectrometry support: LINC00478 and LINC01116. A 37-codon smORF was detected as translated by our Poly-Ribo-Seq from LINC00478 in both conditions (Fig 4C). Transfection of LINC00478-smORF-FLAG into undifferentiated SH-SY5Y cells produced FLAG signal in both nuclear and cytoplasmic compartments (Fig 4D). FLAG signal was also seen when LINC00478-smORF-FLAG transfected SH-SY5Y cells were treated with RA (Sup 7A). This RA FLAG signal was only ever detected in the nucleus. Similar results were seen in HEK293 cells (Sup 7B), but because of the higher transfection efficiency in HEK293 compared with SH-SY5Y cells, we detect FLAG signal in more cells. Together this indicates that translation of the LINC00478-smORF results in the synthesis of a peptide, and the specific localization of this peptide is indicative of peptide function.

The second candidate lncRNA-smORF detected by our Poly-Ribo-Seq that we tagged was in LINC01116. Tagging of this LINC01116-smORF (Fig 4E) generated FLAG signal in the cytoplasm of SH-SY5Y cells, which is localized to neurites (Fig 4F). FLAG signal was also present in LINC01116 transfections in HEK293 cells (Sup 7C), but because of the higher transfection efficiency in HEK293 compared with SH-SY5Y cells, we detect FLAG signal in more cells.

The LINC01116-smORF detected by Poly-Ribo-Seq is 71 codons in length, but inspection of the lncRNA sequence upstream of the smORF reveals a second potential ATG start codon (Fig 4E and Sup 7D). Although there was little Ribo-Seq signal to support this 5’ start codon, it is possible translation of the LINC01116-smORF initiates there. The two potential start codons were assessed for similarity to the Kozak sequence consensus using NetStart1.0 (Pedersen and Nielsen 1997). Both exhibited scores >0.5 (AUG1 = 0.545, AUG2 = 0.645), indicating both are in good context and therefore either could be used to initiate translation. Given the scanning model of translation initiation it seems likely that the first AUG would be used. To determine if the 5’ start codon is used, it was mutated and the effect on production of FLAG signal measured (Fig 4E). No FLAG signal was present in transfections where the 5’ start codon was mutated (1) (Fig 4G). This suggests that the first start codon is necessary for the translation of the LINC01116-smORF and that the resulting peptide is 87 aa long. Although FLAG signal is present in a low number of cells, the no-transfection controls and the start-codon mutant (1) indicate that the FLAG signal is dependent on translation of the LINC01116-smORF (Sup 7E). These results indicate that translation of the LINC01116-smORF results in peptide synthesis and that this 87 aa peptide exhibits a distribution suggestive of a function in neuronal differentiation.

Translated lncRNA-smORFs exhibit sequence conservation

Another indicator of coding potential and of peptide functionality is sequence conservation. Therefore, we assessed the extent to which the sequences of our novel lncRNA-smORFs are conserved. Given that lncRNAs in general are poorly conserved we used species closely related to humans: the other 4 great apes (P. abelii, P. paniscus, P. troglodytes, G. gorilla) as well as N. leucogenys (gibbon) and M. musculus. To ensure detection of sequence conservation for these short smORFs irrespective of annotation in other species, we used three complementary BLAST strategies using transcript nt sequence, smORF nt sequence and protein aa sequence.

Initial searches using the entire lncRNA transcript sequence (nt) and BLASTn (Altschul, Gish et al. 1990) returned results for ~78% of translated lncRNAs (35/45), many of which had short alignment lengths of 30-100 nt. Although some of these results may represent conservation of the smORFs, many are due to small areas of sequence overlap along the rest of the lncRNA.
LncRNAs rarely exhibit the same levels of conservation as mRNAs (Johnsson, Lipovich et al. 2014) but may contain short “modules” of higher sequence conservation, as described for Xist lncRNA (Brockdorff 2018). To take account of this, a second round of searches was performed on the initial search results, using the nt sequence of the smORF (BLASTn) (Altschul, Gish et al. 1990) followed by manual cross validation. This identified 14 lncRNA-smORFs as exhibiting nt sequence conservation in at least one of the apes or in mouse (Fig 5A). For the majority of these conserved lncRNA-smORFs, conservation is high across the smORF sequence, and lower across the rest of the transcript. One such example is AL162386.2-smORF (ENST00000442428.1), which exhibits high sequence conservation when compared to gorilla (G. gorilla) and orangutan (P. abelii), with 100% and 99% smORF nt sequence identity respectively (Fig 5B). When entire transcripts are aligned, this percentage sequence identity drops to 74% with gorilla (ENSGGOT00000060708.1) and 65% with orangutan (ENSPPYT00000022401.2), indicating the smORF is the most conserved part of these transcripts. Together this suggests that the AL162386.2-smORF, like canonical protein-coding CDSs, is under greater selective pressure than its UTRs.

To further corroborate these results, a tBLASTn (Altschul, Gish et al. 1990) search of the lncRNA-smORF aa sequences was performed, which employs transcript databases translated in all 6 frames. This removes the noise of synonymous substitutions, which can have a significant effect, particularly in smORFs (Ladoukakis, Pereira et al. 2011). For the majority of smORFs, the results matched those of the first BLASTn strategy, and evidence of conservation was found for a further 3 lncRNA-smORFs (ENST00000454935.1_477_633, ENST00000557660.5_42_186, ENST00000453910.5_151_262) that appear to have undergone some frameshift (Fig 5A).

The third BLAST strategy at the aa sequence level, using BLASTp (Altschul, Gish et al.
1990), identified homologous peptide sequences for 18% (8/45) of the translated lncRNA-smORFs. This strategy only identifies peptide sequences previously annotated as protein-coding regions in the search species. However, all the returned proteins were unreviewed, uncharacterised proteins, with no evidence at protein, transcript or homology levels in the UniProt database (Bateman, Martin et al. 2019). This potentially suggests that automated annotation pipelines recognised the coding potential of these smORFs, unlike in the more curated human genome annotation. Overall, the combination of these 3 layers of sequence conservation analysis, as assessed by 3 BLAST strategies, revealed that 12 of our translated lncRNA-smORFs exhibit sequence conservation within Hominidae, 3 additional smORFs are also found in gibbons (Fig 4A), and 2 translated smORFs are detectable at the greater evolutionary distance of human to mouse, but not found in all apes (Fig 5A). In total, we have discovered homologs for 17/45 of the lncRNA-smORFs in at least 1 of the searched species (Fig 5A). Together this analysis indicates 38% of translated lncRNA-smORFs exhibit some level of sequence conservation, indicating that it is likely that they are translated in other species.
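The counts and percentages quoted in this section can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the text; the variable names are illustrative, and rounding to the nearest whole percent is an assumption about how the percentages were reported.

```python
TOTAL_TRANSLATED_SMORFS = 45  # translated lncRNA-smORFs detected by Poly-Ribo-Seq


def percent(count, total=TOTAL_TRANSLATED_SMORFS):
    """Percentage of the translated smORF set, rounded to a whole number."""
    return round(100 * count / total)


# Counts as stated in the text (names are illustrative only).
conserved_in_great_apes = 12   # conserved within Hominidae
also_found_in_gibbon = 3       # additional smORFs also found in gibbons
found_in_mouse_not_all_apes = 2

with_any_homolog = (conserved_in_great_apes
                    + also_found_in_gibbon
                    + found_in_mouse_not_all_apes)

print(with_any_homolog)            # 17
print(percent(with_any_homolog))   # 38
print(percent(8))                  # 18 (BLASTp hits, 8/45)
print(percent(35))                 # 78 (whole-transcript BLASTn hits, 35/45)
```

The great ape and gibbon counts together give 15, which matches the 15/45 figure for conservation among apes quoted in the abstract.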

We took advantage of the identification of homologous smORFs in other species to determine if our translated smORFs exhibit signals of conserved protein‐coding regions. Multi‐species sequence alignments were made and phylogenetic codon analysis was performed with PhyloCSF (Lin, Jungreis et al. 2011). Of the 17 lncRNA‐smORFs with homologs in other species, 14 could be assessed, based on the organisms in which the homologs were identified. Four smORFs exhibited positive PhyloCSF scores, indicating that they are likely to represent conserved coding regions, whilst the other 10 had negative scores (Sup Table 2). The four positive‐scoring lncRNA‐smORFs are in AC020928.2, ENTPD1‐AS1, LINC00839 and THAP9‐AS1. This analysis provides further support that these 4 lncRNA‐smORFs likely encode peptides.

Translated lncRNAs are associated with neuronal functions and diseases

To probe the potential biological function, expression and association with human disease, we profiled the 45 translated lncRNAs using published data. Dynamic expression of lncRNAs can indicate the tissues and developmental timepoints at which lncRNAs may function. Therefore, we first assessed whether our translated lncRNA genes exhibited developmentally dynamic expression. In lncExpDB (Sarropoulos, Marin et al. 2019), genes that show large changes in expression at different time points during development of a specific organ are annotated as ‘dynamic’. 39% of our translated lncRNA genes exhibit ‘dynamic’ expression compared with 19% of the total population of lncRNA genes present in lncExpDB (Fig 6A). This indicates that translated lncRNA genes are more likely to have developmentally regulated expression than lncRNAs in general, suggesting biological roles for these lncRNAs.
We found that 62% of these dynamic translated lncRNAs are regulated in the brain, compared with 20% of all dynamic lncRNAs (Fig 6B); therefore, our translated lncRNAs may function in the brain during development. When we consider our own RNA‐Seq of Control and RA treated SH‐SY5Y cells, 22% of the translated lncRNAs exhibit differential expression in the cytoplasm, with 20% upregulated and 2% downregulated upon differentiation (Fig 6C). This potentially indicates that the biological role of these translated lncRNAs may be of broader neuronal importance than just in this differentiation model.

By examining data from the FANTOM6 project we were able to probe potential cellular functions for 3 of our translated lncRNAs. siRNA knockdowns of LINC01116, FGD5‐AS1 and TUG1 were performed in human dermal fibroblasts followed by RNA‐Seq to understand the global effects of depleting these lncRNAs. GO term analysis of these published data showed that genes associated with neuronal function were enriched for all 3 of our translated lncRNAs (Fig 6D). This is particularly striking given that the knockdown was performed in a non‐neuronal cell type, and suggests that all 3 lncRNAs possess neuronal functions.

To investigate potential roles for our translated lncRNAs in human disease we examined published association studies specifically for neurological diseases and disorders (Chen, Wang et al. 2013, Li, Sun et al. 2016, Rappaport, Twik et al. 2017, Bao, Yang et al. 2019). This revealed that 68% of the translated lncRNAs have an association with cancers of the central nervous system (CNS) (Fig 6E). This is consistent with our discovery of their translation in a neuroblastoma cell line (SH‐SY5Y). In addition, 21% of the translated lncRNAs are associated with neurodegenerative diseases and 18% with neurodevelopmental disorders (Fig 6E). Overall, examination of published data on our translated lncRNAs indicates that they likely have neuronal functions, potentially during neuronal development, and may contribute to neuronal diseases.

LINC01116 contributes to neuronal differentiation

To dissect the potential role of the translated lncRNAs during neuronal differentiation we performed siRNA knockdown in SH‐SY5Y cells. We selected the candidate lncRNA LINC01116 because we discovered that it is induced during differentiation and translated to produce a neurite‐localised peptide. We performed LINC01116 siRNA knockdown in both undifferentiated and differentiated SH‐SY5Y cells, achieving an 89‐94% reduction in LINC01116 levels (Sup 8A). LINC01116 knockdown had a limited effect on cell viability (Sup 8B). The extent of differentiation was then assessed by Tuj1 immunofluorescence, which revealed that LINC01116 knockdown resulted in a significant reduction of neurite length in RA treated SH‐SY5Y cells (Fig 7A, zoom in Fig 7B), compared to scrambled siRNA treated SH‐SY5Y cells (Fig 7C). However, there was no effect of the knockdown in undifferentiated cells (Fig 7C). This suggests that LINC01116 is involved in the regulation of neuritic process formation during neuronal differentiation.
To examine potential effects of LINC01116 knockdown on differentiation further, we assessed the expression levels of the noradrenergic marker MOXD1, which is important in neural crest development. LINC01116 siRNA knockdown upon differentiation resulted in a reduction of MOXD1 expression levels, further indicating a role for LINC01116 in neuronal differentiation (Fig 7D). However, LINC01116 knockdown had no effect on proliferation, as measured by the % of Ki67+ cells (Sup 8C&D), or on the cell cycle, as measured by E2F1 mRNA RT‐qPCR (Sup 8E). LINC01116 likely functions early in the differentiation pathway since its levels are significantly upregulated within the first 24 hours of RA induced differentiation (Sup 8F). Expression of LINC01116 then declines rapidly by day 8 (Sup 8G). Together these results suggest that LINC01116 functions during early differentiation, contributing to neurite process formation.

Discussion

In this work we have dissected the relationship of lncRNAs with the translation machinery during human neuronal differentiation using RA treated SH‐SY5Y cells as a model. We discovered that ~800‐900 lncRNA genes are expressed and exported to the cytoplasm. 85‐90% of these cytoplasmic lncRNAs are associated with polysome complexes, suggesting that they are either being translated or regulating the translation of the mRNAs with which they interact. Moreover, the association of lncRNAs with polysomes is dynamic during differentiation, as shown by the differential polysome enrichment of lncRNAs in Control and RA treated cells. These results reveal that many lncRNAs are present in the cytoplasm, enriched there, and associated with translation complexes.

We characterized LINC02143, which was found to associate with polysomes, in more detail. It is an intergenic lncRNA with no known function, which is induced upon differentiation. It is detected in 80S and small polysome fractions, indicating that it interacts with the translation machinery, but it is not detected as translated. A number of anti‐sense polysome‐associated lncRNAs appear to be upregulated upon differentiation. Amongst them is DLGAP1‐AS1, which is anti‐sense to Disks large‐associated protein 1 (DLGAP1), a protein‐coding gene involved in chemical synaptic transmission. DLGAP1‐AS1 interacts with actively translating polysomes both in Control and upon differentiation, but it is not translated. The population of lncRNAs depleted from polysomes contains fewer anti‐sense lncRNAs relative to other populations, suggesting that anti‐sense lncRNAs are preferentially localised to polysomes. These polysome‐associated anti‐sense lncRNAs could potentially regulate the translation of their ‘sense’ mRNA through base‐pairing, as is the case with BACE1‐AS (Faghihi, Zhang et al. 2010) and UCHL1‐AS (Carrieri, Cimatti et al. 2012).
Ribosome profiling of the actively translating polysomes allowed us to distinguish between the lncRNAs that simply associate with the polysome complexes and those that are being actively translated. We identified 45 translated lncRNA‐smORFs, 43 of which are novel ORFs. These translated lncRNA‐smORFs exhibit high levels of triplet periodicity and their translational efficiencies are similar to those of protein‐coding genes. We can therefore be confident that these are real translation events leading to the production of substantial peptide levels rather than background, spurious events (Guttman, Russell et al. 2013, Bazzini, Johnstone et al. 2014, Ruiz‐Orera and Alba 2019, Patraquim, Mumtaz et al. 2020). The size distribution of our novel translated ORFs indicates that the majority are indeed smORFs (<100 aa). The general pattern we identified is that dORFs > lncRNA‐smORFs > uORFs in size. This is consistent with previous studies where a wide range of peptide lengths was discovered (Aspden, Eyre‐Walker et al. 2014, Chong, Müller et al. 2020). The amino acid composition of these translated smORFs supports the conclusion that they are translated into peptides. However, it does not suggest that they are enriched for transmembrane alpha‐helices, in contrast to the smORFs characterized in D. melanogaster (Aspden, Eyre‐Walker et al. 2014).

Overall, we have independent evidence for peptide synthesis for 12/45 lncRNA‐smORFs. 8 of these come from published mass spectrometry data from SH‐SY5Y cells (Brenig, Grube et al. 2020). In general, we find our lncRNA‐smORFs translated in the same treatment (undifferentiated or differentiated) in which these mass spectrometry datasets detect the smORF peptides (7/8). An 18% mass spectrometry detection level may seem low but, given the limitations of detecting small peptides by mass spectrometry, this represents a substantial level of validation. Two translation events were validated by a FLAG‐tagging transfection assay: the LINC01116 and LINC00478 lncRNA‐smORFs. The production of 2/45 lncRNA‐smORF peptides is corroborated by previous studies in non‐neuronal cells. HAND2‐AS1 (translated in Control and RA) is translated in human and rodent heart and encodes an integral membrane component of the endoplasmic reticulum (van Heesch, Witte et al. 2019). CRNDE, which is only translated upon differentiation, encodes a previously characterised nuclear peptide (CRNDEP) (Szafron, Balcerak et al. 2015). The translation of these smORFs in multiple cell types provides substantial support for the production of peptides and their potential function.

We also discovered that 24% of the lncRNA‐smORFs we find translated show sequence conservation across the Hominidae. This suggests that the other great apes have the potential to translate very similar peptides. This provides additional evidence that these translation events are not translational noise. Of course, it will be interesting to uncover the function of these small peptides in the future. 4 of the conserved lncRNA‐smORFs are under purifying selection and are therefore likely to encode peptides.
The LINC01116‐smORF DNA sequence is on the opposite strand to a SINE element, suggesting that this lncRNA and smORF have likely evolved from a SINE transposable element (TE). This is consistent with previous observations that 39% of lncRNA sequences are derived from TEs (Carlevaro‐Fita, Rahim et al. 2016) and with the finding that LINC01116 is human specific, i.e. not found in other apes. Many other small ORFs have also been found to originate from Alu elements, e.g. 287 human uORFs (Shen, Lin et al. 2011). Together this suggests that LINC01116‐smORF has recently evolved from a TE.

We found that 22% of the translated lncRNAs are differentially expressed during neuronal differentiation of SH‐SY5Y. Analysis of publicly available datasets revealed that our translated lncRNAs show regulated expression during human development, specifically in the brain, more so than lncRNAs in general. Many of these translated lncRNAs also exhibit associations with neuronal diseases. The FANTOM6 project data suggest that LINC01116, FGD5‐AS1 and TUG1 have cellular roles in neuronal function (Ramilowski, Yip et al. 2020). Together this indicates that these translated lncRNAs play roles in neuronal development and differentiation, and likely contribute to neurological diseases.

Here we have discovered that LINC01116 produces an 87aa peptide that exhibits cytoplasmic localisation, and specifically is detected near the cell membrane and in neuritic processes. The
upregulation of LINC01116 expression upon differentiation, coupled with the localisation of its peptide, led us to further investigate its potential role in differentiation. Knockdown of LINC01116 upon differentiation appears to impede neurite outgrowth and results in the reduction of the mRNA levels of the noradrenergic marker MOXD1. Our data suggest that LINC01116 is involved in the regulation of neuronal differentiation, consistent with the fact that it is moderately expressed in the developing human forebrain and highly expressed in the developing human midbrain and spinal cord (Lindsay, Xu et al. 2016). The effects of siRNA knockdown in human dermal fibroblasts also support a role for LINC01116 in neuronal migration (Ramilowski, Yip et al. 2020). LINC01116 has previously been found to be involved in two other cancer models: it is involved in the progression of glioblastoma (Brodie, Lee et al. 2017) and it is upregulated in gefitinib‐resistant non‐small cell lung cancer cells (Wang, He et al. 2020). siRNA knockdown of LINC01116 in both these cell types results in decreased expression of stem‐cell markers (Nanog, SOX2 and Oct4) and reduced cell proliferation. This suggests that LINC01116 promotes cell proliferation in these systems, indicating that the downstream effects of LINC01116 may vary according to cell type. However, knockdown of LINC01116 also inhibited migration of glioma stem cells (Brodie, Lee et al. 2017), while overexpression of LINC01116 promoted invasion and migration of gastric cancer cells (Su, Zhang et al. 2019). This suggests a potential role for LINC01116 in the formation of cell membrane protrusions, which is consistent with the role we have discovered for LINC01116 in neurite development. It is yet to be determined whether this function of LINC01116 during neuronal differentiation is performed at the lncRNA or peptide level.
To conclude, our findings indicate that many lncRNAs are localised in the cytoplasm and that they likely play functional roles, as indicated by their regulation during differentiation and polysome association. Given the large number of lncRNAs we found to be associated with polysomes in the cytoplasm, it is likely that future work will assign the functions of many more lncRNAs to translational regulation. We have identified 43 novel translation events, many of which are regulated during differentiation. The lncRNA‐smORFs we discover here represent a general population whose products have not yet been characterized. As demonstrated for LINC01116, lncRNAs and the small peptides encoded therein have the potential to contribute to important cellular functions, development and disease.

Materials and Methods

Cell culture

Human neuroblastoma SH‐SY5Y cells were cultured in Dulbecco’s Modified Eagle Medium (DMEM 4.5g/L Glucose with L‐Glutamine) supplemented with 1% (v/v) Penicillin/Streptomycin and 10% Fetal Bovine Serum (FBS) at 37 °C, 5% CO2. Neural induction commenced at passage 4 and was performed as described previously (Korecka, van Kesteren et al. 2013, Forster, Köglsberger et al. 2016)
with minor alterations. All‐trans retinoic acid (RA, Sigma) was added to cells 24h after plating, at a final concentration of 30µM for 3 days.

Immunocytochemistry

Cells were seeded on Poly‐D‐Lysine/mouse laminin coated 12mm round coverslips (Corning BioCoat™ Cellware) and fixed with 4% paraformaldehyde (PFA) (Affymetrix) for 20 min at room temperature (RT). A permeabilization step (0.1% Triton‐X for 10 min at RT) was performed, followed by blocking at RT in blocking buffer (3% BSA, 1X PBS or 5% NGS, 1X PBS and 0.1% Triton‐X) for 30 min. Primary antibodies (Sup methods) were applied in 3% BSA, 1X PBS or 0.5% NGS, 1X PBS, 0.1% Triton‐X and incubated at RT for 2h or at 4 °C overnight. Cells were washed and labelled with Alexa 488, Alexa 555, or Alexa 633 at 1:500 dilution for 2h at RT in 0.5% NGS, 1X PBS, 0.1% Triton‐X. Cells were mounted in VECTASHIELD mounting medium and analysed using an LSM 700 confocal microscope (Zeiss) and ImageJ.

cDNA synthesis and quantitative Real Time PCR (RT‐qPCR)

Equal amounts of RNA (whole cell, nuclear and cytoplasmic lysates) or equal volumes (polysome fractions) were subjected to cDNA synthesis, using qScript (Quanta Bio) according to the manufacturer’s instructions. qPCR was performed using the CFX Connect™ thermal cycler, with quantification using SYBR Green fluorescent dye (PowerUp™ SYBR Green Master Mix, Thermo Fisher Scientific). Primers were designed to anneal to exon‐exon junctions, where possible, or to common exons between alternative transcripts (Sup methods). Target mRNA and lncRNA levels were assessed by absolute quantification, by means of a standard curve, or by relative quantification, using the ΔΔCq method.

Polysome Profiling

RA was added to SH‐SY5Y cells 3 days prior to harvesting. Cells were treated with cycloheximide (Sigma) at 100μg/ml for 3 min at 37 °C, washed (1X PBS, 100μg/ml cycloheximide) and trypsinised for 5 min at 37 °C.
Subsequently, cells were pelleted, washed (1X PBS, 100μg/ml cycloheximide), and resuspended for 45 min in ice cold lysis buffer (Sup methods): 50mM Tris‐HCl pH8, 150mM NaCl, 10mM MgCl2, 1mM DTT, 1% IGEPAL, 100μg/ml cycloheximide, Turbo DNase 24U/µL (Invitrogen), RNasin Plus RNase Inhibitor 90U (Promega), cOmplete Protease Inhibitor (Roche). Cells were then subjected to centrifugation at 17,000 x g for 5 min, to pellet nuclei. Cytoplasmic lysates were loaded onto 18%‐60% sucrose gradients (~70 x 10^6 cells per gradient) at 4 °C and subjected to
ultracentrifugation (121,355 x g (avg), 3.5h, 4 °C) in an SW‐40 rotor. Gradients were fractionated using a GRADIENT STATION (Biocomp) and absorbance at 254 nm was monitored using a Biorad detector.

Poly‐Ribo‐Seq

~20% of cytoplasmic lysate was kept for polyA selection (Total RNA control) and ~80% was loaded onto 18%‐60% sucrose gradients (~70 x 10^6 cells per gradient) at 4 °C and subjected to ultracentrifugation (121,355 x g (avg), 3.5h, 4 °C) in an SW‐40 rotor. Polysome fractions were pooled from control and from differentiated cells. ~25% of polysomes were kept for polyA selection (Polysome‐associated RNA). The remaining 75% was diluted in 100mM Tris‐HCl pH8, 30mM NaCl, 10mM MgCl2. RNaseI (EN601, 10U/µl, 0.7‐1U/million cells) was subsequently added and incubated overnight at 4 °C. RNaseI was deactivated using SUPERase inhibitor (200U/gradient) for 5 min at 4 °C. Samples were concentrated using 30 kDa molecular weight cut‐off columns (Merck), loaded on a sucrose cushion (1M sucrose, 50mM Tris‐HCl pH8, 150mM NaCl, 10mM MgCl2, 40U RNase Inhibitor) and subjected to ultracentrifugation at 204,428 x g (avg) at 4 °C for 4h (TLA110). Pellets were resuspended in TRIzol (Ambion, Life Technologies) and processed for RNA purification.

RNA purification from cytoplasmic lysates and RNaseI footprinted samples was performed by Trizol RNA extraction, following the manufacturer’s instructions. RNA purification from polysome fractions was performed by isopropanol precipitation, followed by TURBO DNase treatment (Thermofisher) (according to the manufacturer’s instructions), acidic phenol/chloroform RNA purification and ethanol precipitation at ‐80 °C overnight. RNA concentration was determined by Nano‐drop 2000 software. Two rounds of polyA selection from total cytoplasmic lysate and polysome fractions were performed using oligo (dT) Dynabeads (Invitrogen) according to the manufacturer’s instructions. PolyA RNA was fragmented by alkaline hydrolysis. 28–34 nt ribosome footprints and 50–80 nt mRNA fragments were gel purified in a 10% (w/v) polyacrylamide‐TBE‐Urea gel at 300V for 3.5h in 1X TBE. Ribosome footprints were subjected to rRNA depletion (Illumina RiboZero rRNA removal kit).

5’ stranded libraries were constructed using NEBNext Multiplex Small RNA Library Prep. Resulting cDNA was PCR amplified and gel purified prior to sequencing. Libraries were subjected to 75bp single‐end RNA‐Seq using a NextSeq500 Illumina sequencer, High Output Kit v2.5 (75 Cycles) (Next Generation Sequencing Facility, Faculty of Medicine, University of Leeds).

RNA‐Seq data analysis

RNA‐Seq reads were trimmed with Cutadapt (v.19.1) (Martin 2011) and filtered with fastq_quality_filter (v.0.0.13) (Hannon 2010) to remove low‐quality reads (90% of the read was required to have a Phred score above 20). Filtered reads were mapped (Liao, Smyth et al. 2013) to the human genome reference (the lncRNA GENCODE Release 19 (Frankish, Diekhans et al. 2019) annotation added to
mRNA annotation from UCSC (Haeussler, Zweig et al. 2019) human genome assembly (hg19) from iGenomes) with Rsubread (v.1.22.0) (Liao, Smyth et al. 2013) and uniquely mapped reads were reported. BAM file sorting and indexing was performed with SAMtools (v.1.3.1) (Li, Handsaker et al. 2009). Subsequently, summarised read counts for all genes were calculated using featureCounts (Liao, Smyth et al. 2014). For normalization, RPKM values were calculated. Differential expression analysis was conducted with DESeq2 (v.1.12.0) (Love, Huber et al. 2014) based on two cut‐offs: padj < 0.05 and an absolute log2FoldChange > 1. Gene ontology analysis was performed with GOrilla (Gene Ontology enRIchment anaLysis and visuaLizAtion tool) (Eden, Navon et al. 2009).

Ribo‐Seq analysis

Quality reports of Polysome‐associated RNA‐Seq and Ribo‐Seq data were made using FastQC (v.0.11.9) (Andrews 2010). Adaptor sequences were trimmed using Cutadapt (v.210) (Martin 2011) with a minimum read length of 25bp, and untrimmed outputs were retained for RNA‐Seq reads. Low‐quality reads (score <20 for 10% or more of the read) were then discarded using FASTQ Quality Filter, FASTX‐Toolkit (v.0.0.14) (Gordon 2010). Human rRNA sequences were retrieved from RiboGalaxy (Michel, Mullan et al. 2016) and high confidence hg38 tRNA sequences from GtRNAdb Release 17 (Chan and Lowe 2016). One base was removed from the 3’ end of reads to improve alignment quality, and reads originating from rRNA and tRNA were aligned and removed using Bowtie2 (v.2.4.1) (Langmead and Salzberg 2012).

The splice‐aware aligner STAR (v2.7.5c) (Dobin, Davis et al. 2012) was used to map the remaining reads to the human reference genome (GRCh38.p12), GENCODE release 30 (Frankish, Diekhans et al. 2019). The STAR (v2.7.5c) (Dobin, Davis et al. 2012) genome index was built with a sjdbOverhang of 73. SAMtools (v.1.10) (Li, Handsaker et al.
2009) was used to create sorted, indexed BAM files of the resulting alignments.

Metaplots of aligned Ribo‐seq data were generated using the metaplots.bash script from the Ribotaper (v1.3) (Calviello, Mukherjee et al. 2016) pipeline. These show the distance between the 5’ ends of Ribo‐seq reads and annotated start and stop codons from CCDS ORFs, allowing the locations of P‐sites to be inferred. Read lengths exhibiting the best triplet periodicity were selected for each replicate, along with appropriate offsets (Sup 5, Sup Table 1).

Actively translated smORFs were then identified using Ribotaper (v1.3) (Calviello, Mukherjee et al. 2016). Initially, this requires an exon to contain more than 5 P‐sites in order to pass to quality control steps. Identified ORFs were then required to have a 3‐nt periodic pattern of Ribo‐seq reads, with 50% or more of the P‐sites in‐frame. In the case of multiple start codons, the most upstream in‐frame start codon with a minimum of five P‐sites between it and the next ATG was selected. ORFs
for which >30% of the Ribo‐seq coverage was only supported by multimapping reads were also subsequently filtered. For a smORF to be considered actively translated in a condition, we required that it be identified in at least two of the three biological replicates for the condition.

Specific metaplots were also created for the 45 translated lncRNA smORFs, and 100 randomly selected translated CCDS ORFs, to compare ribosome enrichment around the start and stop codons in our protocol. P‐sites were computed for each position in a 75 nt window around the start and stop codons, and scaled by the total number of reads in the two windows for each transcript. The mean normalised counts were then taken for each position in the two windows, and plotted.

Translational efficiency (TE) was estimated for all translated ORFs in each condition, where TE was equal to the mean number of P‐sites per ORF, normalised by the median P‐sites per ORF per replicate, divided by the mean number of RNA sites per ORF, normalised by the median RNA sites per ORF per replicate.

smORF peptide analysis

For each of our ORF sets (protein coding, lncRNA‐smORF, uORF and dORFs), the average amino acid compositions were calculated. Random control expected frequencies were taken from King and Jukes (King and Jukes 1969).

Two published SH‐SY5Y cell mass spectrometry proteomics datasets were analysed: PXD010776 (Murillo, Pla et al. 2018) and PXD014381 (Brenig, Grube et al. 2020). Binary raw files (*.raw) were downloaded from PRIDE then converted to human‐readable MGF format using ThermoRawFileParser (Hulstaert, Shofstahl et al. 2020). The amino acid sequences of our translated uORFs, dORFs and lncRNA‐smORFs were added to the whole Homo sapiens proteome dataset (20,379 entries) downloaded from UniProtKB (Bateman, Martin et al. 2019) in November 2019.
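
The translational efficiency (TE) estimate defined in the Ribo‐Seq analysis above can be read in more than one way, so the following is only a minimal sketch of one interpretation: within each replicate, per‐ORF counts are divided by that replicate's median count, the normalised values are averaged across replicates, and the Ribo‐Seq mean is divided by the RNA‐Seq mean. The function names, ORF identifiers and toy counts are hypothetical and are not taken from the authors' pipeline.

```python
from statistics import mean, median


def normalise_by_replicate_median(counts_by_replicate):
    """Divide each ORF count by the median count of its replicate.

    counts_by_replicate: list of dicts (one per replicate), ORF id -> count.
    Assumes every replicate has a non-zero median.
    """
    normalised = []
    for counts in counts_by_replicate:
        med = median(counts.values())
        normalised.append({orf: c / med for orf, c in counts.items()})
    return normalised


def translational_efficiency(psites_by_replicate, rna_by_replicate):
    """TE per ORF: mean normalised P-site count / mean normalised RNA count."""
    ribo = normalise_by_replicate_median(psites_by_replicate)
    rna = normalise_by_replicate_median(rna_by_replicate)
    te = {}
    for orf in ribo[0]:
        ribo_mean = mean(rep[orf] for rep in ribo)
        rna_mean = mean(rep[orf] for rep in rna)
        # ORFs without RNA signal get NaN rather than a division error.
        te[orf] = ribo_mean / rna_mean if rna_mean > 0 else float("nan")
    return te


# Toy data: two replicates, three hypothetical ORFs.
psites = [
    {"orf_a": 10, "orf_b": 20, "orf_c": 40},
    {"orf_a": 20, "orf_b": 40, "orf_c": 80},
]
rna = [
    {"orf_a": 100, "orf_b": 100, "orf_c": 100},
    {"orf_a": 50, "orf_b": 50, "orf_c": 50},
]
te = translational_efficiency(psites, rna)
print(te)
```

Normalising by the replicate median before averaging keeps replicates with different sequencing depths on a comparable scale, which is the role the median plays in the description above.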
The new FASTA file was then used as a customized database for Comet (v2019.01.2) (Eng, Jahan et al. 2013) search engine runs that scanned all MS/MS files (*.mgf) against it.

Default settings in Comet were used with the following exceptions according to the MS/MS data type. iTRAQ‐4plex (PXD010776): decoy_search = 1, peptide_mass_tolerance = 10.00, fragment_bin_tol = 0.1, fragment_bin_offset = 0.0, theoretical_fragment_ions = 0, spectrum_batch_size = 15000, clear_mz_range = 113.5‐117.5, add_Nterm_peptide = 144.10253, add_K_lysine = 144.10253, minimum_peaks = 8. Label‐free (PXD014381): decoy_search = 1, peptide_mass_tolerance = 10.00, fragment_bin_tol = 0.02, fragment_bin_offset = 0.0, theoretical_fragment_ions = 0, spectrum_batch_size = 15000. CometUI (Eng, Jahan et al. 2013) was employed for analysing MS/MS data and setting a false discovery rate (FDR) threshold of 10% per
peptide identification. This FDR threshold was selected due to the expected low abundance levels of the target smORFs.

Conservation analysis

Protein, cDNA and ncRNA sequence data for H. sapiens, P. abelii, P. paniscus, P. troglodytes, G. gorilla, N. leucogenys, and M. musculus were obtained from Ensembl (release 100) (Yates, Achuthan et al. 2020). LncRNAs are poorly conserved, so we selected 5 species of apes with well‐annotated genomes (4 of these are great apes), and M. musculus represents an outgroup. These data formed the subject database for subsequent homology searches.

A number of criteria were considered to deem an ORF “conserved”. At the protein level, these included a pairwise distance of 50% or less, syntenic positions in the genome, and the finding of ortholog groups exhibiting similar conservation to the human ORF. At the transcript level (using cDNA and ncRNA data), the above criteria were considered, along with the conservation of a start codon and subsequent sequence. If the same results were returned multiple times by the search strategies described below, they were also given extra consideration. In all cases the focus was on the conservation of the ORF sequence, not necessarily the surrounding transcript.

Sequence homology searches were performed using BLASTp (e‐value = 0.001), where the 45 translated human lncRNA peptide sequences formed the queries and the protein sequences for P. abelii, P. paniscus, P. troglodytes, G. gorilla, N. leucogenys, and M. musculus formed the subject database (Altschul, Gish et al. 1990). Results were filtered to remove anything with <75% identity, unless a result was the lowest e‐value hit for a given query in a given species. Results were returned for 12 lncRNA peptides, and these were manually cross validated using the Ensembl Genome Browser and using multiple sequence alignments generated in ClustalOmega (Sievers, Wilm et al. 2011).
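
The BLASTp filtering rule described above (discard hits below 75% identity unless the hit has the lowest e‐value for its query in that species) can be sketched as follows. This is an illustration only, not the authors' script: the `Hit` record, the `filter_hits` function, the query names and the example values are all hypothetical.

```python
from collections import namedtuple

# Hypothetical minimal record for one BLAST hit.
Hit = namedtuple("Hit", "query species subject pident evalue")


def filter_hits(hits, min_identity=75.0):
    """Keep hits with >= min_identity % identity, plus the lowest e-value
    hit(s) for each (query, species) pair regardless of identity."""
    best = {}
    for h in hits:
        key = (h.query, h.species)
        if key not in best or h.evalue < best[key]:
            best[key] = h.evalue
    return [
        h for h in hits
        if h.pident >= min_identity or h.evalue == best[(h.query, h.species)]
    ]


# Made-up example hits for two hypothetical queries.
hits = [
    Hit("smORF_A", "G. gorilla", "gg1", 100.0, 1e-30),
    Hit("smORF_A", "G. gorilla", "gg2", 60.0, 1e-5),   # low identity, not best
    Hit("smORF_B", "P. abelii", "pa1", 70.0, 1e-8),    # low identity, but best
    Hit("smORF_B", "P. abelii", "pa2", 68.0, 1e-4),    # low identity, not best
]
kept = filter_hits(hits)
print([h.subject for h in kept])
```

Retaining the best hit per query and species even when it falls below the identity threshold means that divergent but plausible homologs still reach the manual cross validation step.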
Default parameter settings were applied in the msa package in R (Bodenhofer, Bonatesta et al. 2015).

The transcript sequences of the 45 translated lncRNAs were searched against transcriptome databases created by combining the cDNA and ncRNA data for each species, using BLASTn (e‐value = 0.001) (Altschul, Gish et al. 1990). Results of this BLAST were used to filter the initial BLAST databases. The ORF portions of the 45 translated lncRNAs were extracted and searched against these filtered databases using BLASTn (e‐value = 0.001) (Altschul, Gish et al. 1990).

The homology searches confirmed the genes of origin for all 45 lncRNA‐smORFs in H. sapiens at the nucleotide and peptide level. For the nucleotide sequence searches, homologs were identified in the remaining species for 18 of the lncRNA ORF queries. These were cross validated as described above (i.e. manually and using MSAs), resulting in 14 lncRNA‐smORFs with evidence of sequence
conservation based on transcript sequences. For the protein sequences, the remaining species returned homologs for 21 of the lncRNA peptide queries. As some queries had many spurious results, they were further filtered to select the transcript(s) with the lowest e‐value for each query in each species. These were cross validated as above, resulting in 16 lncRNA peptides with evidence of sequence conservation based on transcript sequences. We combined evidence from both approaches into a final dataset consisting of 17 lncRNA‐smORFs with evidence of conservation in at least one of the 6 species queried.

The nucleotide alignments of the 17 lncRNA‐smORFs were manually curated and trimmed, and evaluated using the 58 mammals model in PhyloCSF (Lin, Jungreis et al. 2011). As Bonobo is not in this model, 3 sequences could not be evaluated, and the putative Bonobo ORF was removed from a further 3 alignments.

Cytoplasmic/Nuclear fractionation of SH‐SY5Y cells

Cells were harvested and washed with 1X PBS. Cells were lysed in whole cell lysis buffer (Sup methods) (500µL buffer per 10^6 cells) on ice for 30min. Whole cell lysate aliquots were removed and the remainder was subjected to centrifugation at 1,600 x g for 8 min to pellet nuclei. Nuclear and cytoplasmic fractions were subjected to two further clearing steps by centrifugation (3,000 x g and 10,000 x g respectively). Nuclei were lysed in RIPA buffer (Sup methods). ~10% of both nuclear and cytoplasmic lysates were used for Western Blot and ~90% were subjected to RNA extraction (ZYMO R1055).

Western Blot

Samples were diluted in 4X Laemmli sample buffer (Biorad) (277.8 mM Tris‐HCl, pH 6.8, 4.4% LDS, 44.4% (v/v) glycerol, 0.02% bromophenol blue); 5% β‐mercaptoethanol (Sigma) was added prior to heating at 95 °C for 5 min, and samples were loaded on 10% SDS gels.
Gel electrophoresis was performed using the 702 BioRad Mini‐PROTEAN 3 gel electrophoresis system (Bio‐Rad Laboratories, Hercules, CA, USA). 703 Proteins were transferred to nitrocellulose membranes (AmershamTM ProtranTM) and blocked with 5% 704 fat‐free milk powder in 1XPBS, 0.05% Tween‐20 (Sigma) for 1h at RT. Blots were incubated with 705 primary antibodies overnight (Table 1). Blots were then washed in PBS‐T and incubated with 706 secondary (anti‐mouse HRP) at RT for 2h. Membranes were washed 3 times with PBS‐T, prior 707 to application of ECL (Biological Industries). Chemiluminescent signal was detected with Chemi‐Doc 708 (BIO‐RAD). All membranes were probed for ‐tubulin as loading control. 709 710 Analysis of pubicly available lncRNA data

Dynamic differential expression analysis data were accessed through lncExpDB (Cardoso‐Moreira, Halbert et al. 2019, Sarropoulos, Marin et al. 2019), where the R package maSigPro was used to identify developmentally dynamic lncRNAs, i.e., genes that show large changes in expression during development of a specific organ. 32 lncRNA genes identified by Poly‐Ribo‐Seq as translated were present in lncExpDB. Disease association analysis used data from lncRNADisease (Chen, Wang et al. 2013, Bao, Yang et al. 2019), Differential Expression Atlas (https://www.ebi.ac.uk/gxa/experiments?species=homo%20sapiens), Cancer RNA‐Seq Nexus (Li, Sun et al. 2016) and Malacards (Rappaport, Nativ et al. 2013, Rappaport, Twik et al. 2014, Rappaport, Twik et al. 2017). Genes were considered related to a disease if they showed significant differential expression between diseased and control conditions (log2 fold change >1, p<0.05) or if they had been experimentally validated in the literature.

smORF tagging
5’‐UTRs and CDSs of putative smORFs (lacking stop codon) were generated by PCR (Sup methods), using NEB High Fidelity DNA Polymerase (Q5). A C‐terminal 3xFLAG tag was incorporated within the reverse primer (Sup methods) by PCR and products were cloned into the NheI and EcoRV restriction sites (Sup methods) of the pcDNA3.1‐hygro vector (Addgene, kindly provided by Mark Richards, Bayliss group, University of Leeds). Start codon mutations were generated by site directed mutagenesis (Q5 Site directed mutagenesis kit, NEB).
Plasmid transfections were performed using Lipofectamine 3000 (Thermo) following the manufacturer’s instructions. After 48 h, the cells were fixed for 20 min with 4% paraformaldehyde, washed with 1X PBS, 0.1% Triton X‐100 (PBS‐T) and processed for immunocytochemistry as previously described. Imaging was conducted using an EVOS fluorescent microscope.

siRNA knockdown
siRNA knockdown was performed using Lincode siRNA SMARTpool (Dharmacon) (LINC01116 transcript: R‐027999‐00‐0005 SMARTpool). Lincode Non‐targeting Pool (D‐001810‐10) was used as scrambled control. Cells were seeded in 24‐well plates (10^5 cells/well) and siRNAs were transfected using Lipofectamine RNAiMAX (ThermoFisher) as per the manufacturer’s instructions.

General statistics and plots
Statistical analyses were performed in R (R Core Team, 2019), using packages including stringr (Wickham 2019), dplyr (Wickham, 2017), tidyr (Wickham, 2017), protr (Xiao, Cao et al. 2015), ggplot2
(Wickham, 2016), ggstatsplot (Patil 2018), knitr (Xie 2020), seqinr (Charif D 2007), ggbeeswarm (Clarke and Sherrill‐Mix 2017), and EnhancedVolcano (Blighe, Rana et al. 2018).
Experimental values (RT‐qPCR, area under polysome graphs, % of cells) from independent samples with equal variances were assessed using a 2‐tailed unpaired Student’s t‐test. The results are shown as mean±SEM values of 3 independent replicates. The exact p values are specified in each figure legend. p values < 0.05 were considered statistically significant.

Data availability
Poly‐Ribo‐Seq datasets are deposited in GEO with ID GSE166214.

Acknowledgements
We wish to thank Dr Eric Hewitt for providing the SH‐SY5Y cells; Dr Iosifina Sampson and Dr Mark Richards (Bayliss group) for providing HEK293 cells and mammalian vectors. We thank the Next Generation Sequencing facility, at St James University Hospital, Leeds, UK for performing Next Generation Sequencing. We also thank the Imaging Facility, Faculty of Biological Sciences, University of Leeds, UK for their assistance in confocal microscopy. Parts of this work were undertaken on ARC3, part of the High Performance Computing facilities at the University of Leeds, UK.

Author contributions
KD designed and performed experiments, acquired, analysed and interpreted data, drafted and revised the manuscript. IB designed and performed experiments, acquired, analysed and interpreted data, drafted and revised the manuscript. DW analysed and interpreted data, and revised the manuscript. AK analysed and interpreted data, and revised the manuscript. SC acquired and analysed data, and revised the manuscript. AB acquired and analysed data, and revised the manuscript. EJRV analysed data and revised the manuscript. MOC helped interpret portions of the data and critique the output for important intellectual content, and revised the manuscript. JD helped design experiments, interpret portions of the data and critique the output for important intellectual content, and revised the manuscript. AW helped design experiments, interpret portions of the data and critique the output for important intellectual content, and revised the manuscript. JLA conceived the work, interpreted data, drafted and revised the manuscript.

Funding
This work was funded by MRC (MR/N000471/1). KA was funded by Leeds Anniversary Research Scholarship (LARS). AB’s summer project was funded by White Rose BBSRC Doctoral Training
Partnership in Mechanistic Biology (BB/M011151/1). AK is supported by a studentship from MRC Discovery Medicine North (DiMeN) Doctoral Training Partnership (MR/N013840/1). JA is funded by the University of Leeds (University Academic Fellow scheme).

Competing Interests
None of the authors have any competing interests.

Figure legends

Figure 1: Cytoplasmic lncRNA expression is regulated during neuronal differentiation
(A) Schematic of Poly‐Ribo‐Seq, with 3 levels of analysis: i) total cytoplasmic, ii) polysome‐associated and iii) translated lncRNAs. (B) Volcano plot of differential expression analysis of polysome‐associated lncRNAs (labelled by geneIDs and names for candidate lncRNAs) between Control and RA populations. 237 lncRNAs are upregulated during differentiation and 82 downregulated (log2 fold‐change cutoff=1, padj<0.05). Pie chart of types of lncRNAs (C) upregulated and (D) downregulated upon differentiation (intergenic; anti‐sense; sense‐overlapping; retained intron; sense‐intronic; uncharacterized; NMD target). (E) LncRNAs of interest that are induced are specifically localised to the cytoplasm, as shown by subcellular fractionation RT‐qPCR. XIST lncRNA was used as a nuclear and GAPDH mRNA as a cytoplasmic positive control (n=3, SE is plotted, Student’s t‐test, p>0.05).

Figure 2: Association of lncRNAs with polysomes during neuronal differentiation
Volcano plots displaying the significantly differentially localised lncRNAs (labelled by their geneIDs) between Total and Polysome populations in (A) Control and (B) RA, with log2 fold‐change cutoff=1, padj<0.05. (C‐E) RT‐qPCR of lncRNAs across sucrose gradient fractions (n=3, SE is plotted). (C) LINC02143 is found in 80S and small polysome fractions during differentiation. 5% of the transcript is detected in the 80S (monosome) fraction and 66% in small polysome complexes. (D) DLGAP1‐AS2 is found in 80S and small polysome fractions both in control and RA treated cells. On average, 63% of the transcript is detected in the polysome fractions in Control and 49% upon differentiation. (E) LINC01116 is found in 80S and 2‐7 polysome fractions both in control and RA treated cells. On average, 66% of the LINC01116 transcript is detected in the polysome fractions in Control and 57% upon differentiation.

Figure 3: Translation of lncRNA smORFs
(A) Workflow for identification of translated ORFs from Ribo‐seq and RNA‐seq, see materials and methods for details. (B) Venn diagram of lncRNA‐smORFs translated in Control and
RA, with overlap. (C) Plots of Translational Efficiencies for protein‐coding ORFs, lncRNA‐smORFs, dORFs and uORFs. (D) Length distribution of translated ORFs in lncRNAs, dORFs and uORFs (in codons). (E) Poly‐Ribo‐Seq profile for LINC01116 in RA treatment. RNA‐Seq (Polysome) reads are grey and ribosome P sites in purple, turquoise and yellow according to frame. Purple lines mark beginning and end of translated smORF. All possible start and stop codons are indicated below. Framing within and outside translated smORF shown on left.

Figure 4: Peptide production from smORFs in lncRNAs
(A) Venn diagram showing overlap of lncRNA‐smORFs detected between our Poly‐Ribo‐Seq and publicly available mass spectrometry data from SH‐SY5Y (purple). Control in blue, RA in pink. (B) Schematic of tagging construct for LINC00478; lncRNA sequence upstream of smORF and smORF, excluding its stop codon, cloned upstream of 3X FLAG, which is lacking its own start codon. FLAG signal is therefore dependent on smORF translation. (C) Poly‐Ribo‐Seq profile for LINC00478 in RA treatment. RNA‐Seq (Polysome) reads are grey and ribosome P sites in purple, turquoise and yellow according to frame. Purple lines mark beginning and end of translated smORF. All possible start and stop codons are indicated below. Framing within and outside translated smORF shown on right. (D) Confocal images of FLAG‐tagged LINC00478 peptide in SH‐SY5Y cells (Control), showing (i) nuclear and (ii) cytoplasmic distribution; green is FLAG, magenta is hnRNPK (marking nuclei) and blue is DAPI (scale bar is 20μm). (E) Schematic of tagging constructs for LINC01116 (WT and start codon mutant Δ1). (F) Confocal images of FLAG‐tagged WT LINC01116 peptide showing cytoplasmic localisation, near cell membrane and neuritic processes (magnification of insert is 3X). (G) Δ1 start codon mutant, showing no FLAG signal, in SH‐SY5Y cells; green is FLAG and blue is DAPI (scale bar is 20μm).

Figure 5: Translated lncRNA‐smORFs exhibit sequence conservation in great apes.
(A) Phylogram with lncRNA‐smORFs for which evidence of sequence conservation was found represented as circles, coloured according to how sequence conservation was identified. Phylogram built in iTOL (Letunic and Bork 2011) using data from TimeTree (Kumar, Stecher et al. 2017), scale in 10 MYA along. (B) Portion of human ENST00000442428 lncRNA nt alignment with gorilla and orangutan nt sequences, showing the smORF. Alignment built in ClustalOmega (Sievers, Wilm et al. 2011) displayed in JalView (Waterhouse, Procter et al. 2009).

Figure 6: Potential biological importance of translated lncRNAs in neural development, differentiation and disease.
(A) Percentage of translated lncRNA genes and all lncRNA genes, which exhibit dynamic expression during human development according to lncExpDB (Sarropoulos, Marin et al. 2019). (B) Percentage of the dynamically expressed, translated lncRNAs, which show dynamic expression in each organ, for total and translated lncRNA populations, according to lncExpDB (Sarropoulos, Marin et al. 2019). (C) Proportion of translated lncRNAs, which exhibit differential expression during differentiation of SH‐SY5Y cells. (D) GO terms associated with changes in RNA‐Seq levels upon siRNA knockdown of 3 of the translated lncRNAs (LINC01116, FGD5‐AS1 and TUG1) performed by FANTOM6 in human dermal fibroblasts. (E) Percentage of translated lncRNA genes found to be associated with neuronal diseases and disorders according to lncRNADisease, Differential Expression Atlas, Cancer RNA‐Seq Nexus, Malacards.

Figure 7: LINC01116 contributes to neuronal differentiation but does not affect cell cycle progression.
(A) Representative immunofluorescence images of Control and RA SH‐SY5Y cells, transfected with siRNA targeting LINC01116 and scrambled control, after staining for Tuj (βIII‐tubulin) at day 3 post‐differentiation (scale bar=200μm). White windows magnified in (B). (C) Quantification of neurite length in Control and RA treated cells upon knockdown shows a significant reduction of neurite length in the differentiated cells upon LINC01116 knockdown (N=3 biological replicates, n>100 measurements, Student's t‐test p<0.05). (D) RT‐qPCR of differentiation marker MOXD1 in Control and RA treated cells, transfected with siRNA targeting LINC01116 and scrambled control, shows significant reduction of MOXD1 expression in differentiated cells with reduced LINC01116 levels at day 3 post‐differentiation (n=3 biological replicates, Student's t‐test p<0.05).

References
Altschul, S. F., W. Gish, W. Miller, E. W. Myers and D. J. Lipman (1990). "Basic local alignment search tool." Journal of Molecular Biology 215(3): 403‐410.
Anderson, D. M., K. M. Anderson, C. L. Chang, C. A. Makarewich, B. R. Nelson, J. R. McAnally, P. Kasaragod, J. M. Shelton, J. Liou, R. Bassel‐Duby and E. N. Olson (2015). "A micropeptide encoded by a putative long noncoding RNA regulates muscle performance." Cell 160(4): 595‐606.
Andrews, S. (2010). FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
Aspden, J. L., Y. C. Eyre‐Walker, R. J. Phillips, U. Amin, M. A. Mumtaz, M. Brocard and J. P. Couso (2014). "Extensive translation of small Open Reading Frames revealed by Poly‐Ribo‐Seq." Elife 3: e03528.
Bao, Z., Z. Yang, Z. Huang, Y. Zhou, Q. Cui and D. Dong (2019). "LncRNADisease 2.0: an updated database of long non‐coding RNA‐associated diseases." Nucleic Acids Res 47(D1): D1034‐D1037.
Bateman, A., M. J. Martin, S. Orchard, M. Magrane, E. Alpi, B. Bely, M. Bingley, R. Britto, B. Bursteinas, G. Busiello, H. Bye‐A‐Jee, A. Da Silva, M. De Giorgi, T. Dogan, L. G. Castro, P. Garmiri, G. Georghiou, D. Gonzales, L. Gonzales, E. Hatton‐Ellis, A. Ignatchenko, R. Ishtiaq, P. Jokinen, V. Joshi, D. Jyothi, R. Lopez, J. Luo, Y. Lussi, A. MacDougall, F. Madeira, M. Mahmoudy, M. Menchi, A. Nightingale, J. Onwubiko, B. Palka, K. Pichler, S. Pundir, G. Y. Qi, S. Raj, A. Renaux, M. R. Lopez, R. Saidi, T. Sawford, A. Shypitsyna, E. Speretta, E. Turner, N. Tyagi, P. Vasudev, V. Volynkin, T. Wardell, K. Warner, X. Watkins, R. Zaru, H. Zellner, A. Bridge, I. Xenarios, S. Poux, N. Redaschi, L. Aimo, G. Argoud‐Puy, A. Auchincloss, K. Axelsen, P. Bansal, D. Baratin, M. C. Blatter, J. Bolleman, E. Boutet, L. Breuza, C. Casals‐Casas, E. de Castro, E. Coudert, B. Cuche, M. Doche, D. Dornevil, A. Estreicher, L. Famiglietti, M. Feuermann, E. Gasteiger, S. Gehant, V. Gerritsen, A. Gos, N. Gruaz, U. Hinz, C. Hulo, N. Hyka‐Nouspikel, F. Jungo, G. Keller, A. Kerhornou, V. Lara, P. Lemercier, D. Lieberherr, T. Lombardot, X. Martin, P. Masson, A. Morgat, T. B. Neto, S. Paesano, I. Pedruzzi, S. Pilbout, M. Pozzato, M. Pruess, C. Rivoire, C. Sigrist, K. Sonesson, A. Stutz, S. Sundaram, M. Tognolli, L. Verbregue, C. H. Wu, C. N. Arighi, L. Arminski, C. M. Chen, Y. X. Chen, J. Cowart, J. S. Garavelli, H. Z. Huang, K. Laiho, P. McGarvey, D. A. Natale, K. Ross, C. R. Vinayaka, Q. H. Wang, Y. Q. Wang, L. S. Yeh, J. Zhang and C. UniProt (2019). "UniProt: a worldwide hub of protein knowledge." Nucleic Acids Research 47(D1): D506‐D515.
Bazzini, A. A., T. G. Johnstone, R. Christiano, S. D. Mackowiak, B. Obermayer, E. S. Fleming, C. E. Vejnar, M. T. Lee, N. Rajewsky, T. C. Walther and A. J. Giraldez (2014). "Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation." EMBO J 33(9): 981‐993.
Blair, J. D., D. Hockemeyer, J. A. Doudna, H. S. Bateup and S. N. Floor (2017). "Widespread Translational Remodeling during Human Neuronal Differentiation." Cell Reports 21(7): 2005‐2016.
Blighe, K., S. Rana and M. Lewis (2018). EnhancedVolcano: Publication‐ready volcano plots with enhanced colouring and labeling. https://github.com/kevinblighe/EnhancedVolcano.
Bodenhofer, U., E. Bonatesta, C. Horejs‐Kainrath and S. Hochreiter (2015). "msa: an R package for multiple sequence alignment." Bioinformatics 31(24): 3997‐3999.
Brenig, K., L. Grube, M. Schwarzländer, K. Köhrer, K. Stühler and G. Poschmann (2020). "The Proteomic Landscape of Cysteine Oxidation That Underpins Retinoic Acid‐Induced Neuronal Differentiation." J Proteome Res 19(5): 1923‐1940.
Brockdorff, N. (2018). "Local Tandem Repeat Expansion in Xist RNA as a Model for the Functionalisation of ncRNA." Non‐Coding RNA 4(4): 28.
Brodie, S., H. K. Lee, W. Jiang, S. Cazacu, C. Xiang, L. M. Poisson, I. Datta, S. Kalkanis, D. Ginsberg and C. Brodie (2017). "The novel long non‐coding RNA TALNEC2, regulates tumor cell growth and the stemness and radiation response of glioma stem cells." Oncotarget 8(19): 31785‐31801.
Calviello, L., N. Mukherjee, E. Wyler, H. Zauber, A. Hirsekorn, M. Selbach, M. Landthaler, B. Obermayer and U. Ohler (2016). "Detecting actively translated open reading frames in ribosome profiling data." Nature Methods 13(2): 165‐170.
Cardoso‐Moreira, M., J. Halbert, D. Valloton, B. Velten, C. Chen, Y. Shao, A. Liechti, K. Ascenção, C. Rummel, S. Ovchinnikova, P. V. Mazin, I. Xenarios, K. Harshman, M. Mort, D. N. Cooper, C. Sandi, M. J. Soares, P. G. Ferreira, S. Afonso, M. Carneiro, J. M. A. Turner, J. L.
VandeBerg, A. Fallahshahroudi, P. Jensen, R. Behr, S. Lisgo, S. Lindsay, P. Khaitovich, W. Huber, J. Baker, S. Anders, Y. E. Zhang and H. Kaessmann (2019). "Gene expression across mammalian organ development." Nature 571(7766): 505‐509.
Carelli, S., T. Giallongo, F. Rey, E. Latorre, M. Bordoni, S. Mazzucchelli, M. C. Gorio, O. Pansarasa, A. Provenzani and C. Cereda (2019). "HuR interacts with lincBRN1a and lincBRN1b during neuronal stem cells differentiation." RNA Biology 16(10): 1471‐1485.
Carlevaro‐Fita, J., A. Rahim, R. Guigó, L. A. Vardy and R. Johnson (2016). "Cytoplasmic long noncoding RNAs are frequently bound to and degraded at ribosomes in human cells." RNA 22(6): 867‐882.
Carrieri, C., L. Cimatti, M. Biagioli, A. Beugnet, S. Zucchelli, S. Fedele, E. Pesce, I. Ferrer, L. Collavin, C. Santoro, A. R. R. Forrest, C. Piero, S. Biffo, E. Stupka and S. Gustincich (2012). "Long non‐coding antisense RNA controls Uchl1 translation through an embedded SINEB2 repeat." Nature 491(7424): 454‐457.
Chan, P. P. and T. M. Lowe (2016). "GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes." Nucleic Acids Research 44(D1): D184‐D189.
Charif D, L. J. (2007). SeqinR 1.0‐2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. Structural approaches to sequence evolution: Molecules, networks, populations.
Chau, K. F., M. L. Shannon, R. M. Fame, E. Fonseca, H. Mullan, M. B. Johnson, A. K. Sendamarai, M. W. Springel, B. Laurent and M. K. Lehtinen (2018). "Downregulation of ribosome biogenesis during early forebrain development." Elife 7.
Chen, G., Z. Wang, D. Wang, C. Qiu, M. Liu, X. Chen, Q. Zhang, G. Yan and Q. Cui (2013). "LncRNADisease: a database for long‐non‐coding RNA‐associated diseases." Nucleic Acids Res 41(Database issue): D983‐986.
Chen, J., A.‐D. Brunner, J. Z. Cogan, J. K. Nuñez, A. P. Fields, B. Adamson, D. N. Itzhak, J. Y. Li, M. Mann, M. D. Leonetti and J. S. Weissman (2020). "Pervasive functional translation of noncanonical human open reading frames." Science 367(6482): 1140‐1146.
Chodroff, R. A., L. Goodstadt, T. M. Sirey, P. L. Oliver, K. E. Davies, E. D. Green, Z. Molnár and C. P. Ponting (2010). "Long noncoding RNA genes: conservation of sequence and brain expression among diverse amniotes."
Chong, C., M. Müller, H. Pak, D. Harnett, F. Huber, D. Grun, M. Leleu, A. Auger, M. Arnaud, B. J. Stevenson, J. Michaux, I. Bilic, A. Hirsekorn, L. Calviello, L. Simó‐Riudalbas, E. Planet, J. Lubiński, M. Bryśkiewicz, M. Wiznerowicz, I. Xenarios, L. Zhang, D. Trono, A. Harari, U. Ohler, G. Coukos and M. Bassani‐Sternberg (2020). "Integrated proteogenomic deep sequencing and analytics accurately identify non‐canonical peptides in tumor immunopeptidomes." Nat Commun 11(1): 1293.
Clarke, E. and S. c. Sherrill‐Mix (2017). ggbeeswarm: Categorical Scatter (Violin Point) Plots.
Derrien, T., R. Johnson, G. Bussotti, A. Tanzer, S. Djebali, H. Tilgner, G. Guernec, D. Martin, A. Merkel, D. G. Knowles, J. Lagarde, L. Veeravalli, X. Ruan, Y. Ruan, T. Lassmann, P. Carninci, J. B. Brown, L. Lipovich, J. M. Gonzalez, M. Thomas, C. A. Davis, R. Shiekhattar, T. R. Gingeras, T. J. Hubbard, C. Notredame, J. Harrow and R. Guigó (2012). "The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression." Genome Res 22(9): 1775‐1789.
Dimartino, D., A. Colantoni, M. Ballarino, J. Martone, D. Mariani, J. Danner, A. Bruckmann, G. Meister, M. Morlando and I. Bozzoni (2018). "The Long Non‐coding RNA lnc‐31 Interacts
with Rock1 mRNA and Mediates Its YB‐1‐Dependent Translation." Cell Reports 23(3): 733‐740.
Djebali, S., C. A. Davis, A. Merkel, A. Dobin, T. Lassmann, A. Mortazavi, A. Tanzer, J. Lagarde, W. Lin, F. Schlesinger, C. Xue, G. K. Marinov, J. Khatun, B. A. Williams, C. Zaleski, J. Rozowsky, M. Röder, F. Kokocinski, R. F. Abdelhamid, T. Alioto, I. Antoshechkin, M. T. Baer, N. S. Bar, P. Batut, K. Bell, I. Bell, S. Chakrabortty, X. Chen, J. Chrast, J. Curado, T. Derrien, J. Drenkow, E. Dumais, J. Dumais, R. Duttagupta, E. Falconnet, M. Fastuca, K. Fejes‐Toth, P. Ferreira, S. Foissac, M. J. Fullwood, H. Gao, D. Gonzalez, A. Gordon, H. Gunawardena, C. Howald, S. Jha, R. Johnson, P. Kapranov, B. King, C. Kingswood, O. J. Luo, E. Park, K. Persaud, J. B. Preall, P. Ribeca, B. Risk, D. Robyr, M. Sammeth, L. Schaffer, L. H. See, A. Shahab, J. Skancke, A. M. Suzuki, H. Takahashi, H. Tilgner, D. Trout, N. Walters, H. Wang, J. Wrobel, Y. Yu, X. Ruan, Y. Hayashizaki, J. Harrow, M. Gerstein, T. Hubbard, A. Reymond, S. E. Antonarakis, G. Hannon, M. C. Giddings, Y. Ruan, B. Wold, P. Carninci, R. Guigó and T. R. Gingeras (2012). "Landscape of transcription in human cells." Nature 489(7414): 101‐108.
Dobin, A., C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. Chaisson and T. R. Gingeras (2012). "STAR: ultrafast universal RNA‐seq aligner." Bioinformatics 29(1): 15‐21.
Duncan, C. D. S. and J. Mata (2014). "The translational landscape of fission‐yeast meiosis and sporulation." Nature Structural & Molecular Biology 21(7): 641‐647.
Eden, E., R. Navon, I. Steinfeld, D. Lipson and Z. Yakhini (2009). "GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists." BMC Bioinformatics 10(1): 48.
Eng, J. K., T. A. Jahan and M. R. Hoopmann (2013). "Comet: An open‐source MS/MS sequence database search tool." Proteomics 13(1): 22‐24.
Faghihi, M. A., M. Zhang, J. Huang, F. Modarresi, M. P. Van der Brug, M. A. Nalls, M. R. Cookson, G. St‐Laurent, 3rd and C. Wahlestedt (2010). "Evidence for natural antisense transcript‐mediated inhibition of microRNA function." Genome Biol 11(5): R56.
Forster, J. I., S. Köglsberger, C. Trefois, O. Boyd, A. S. Baumuratov, L. Buck, R. Balling and P. M. A. Antony (2016). "Characterization of Differentiated SH‐SY5Y as Neuronal Screening Model Reveals Increased Oxidative Vulnerability." Journal of Biomolecular Screening 21(5): 496‐509.
Frankish, A., M. Diekhans, A.‐M. Ferreira, R. Johnson, I. Jungreis, J. Loveland, J. M. Mudge, C. Sisu, J. Wright, J. Armstrong, I. Barnes, A. Berry, A. Bignell, S. Carbonell Sala, J. Chrast, F. Cunningham, T. Di Domenico, S. Donaldson, I. T. Fiddes, C. García Girón, J. M. Gonzalez, T. Grego, M. Hardy, T. Hourlier, T. Hunt, O. G. Izuogu, J. Lagarde, F. J. Martin, L. Martínez, S. Mohanan, P. Muir, F. C. P. Navarro, A. Parker, B. Pei, F. Pozo, M. Ruffier, B. M. Schmitt, E. Stapleton, M.‐M. Suner, I. Sycheva, B. Uszczynska‐Ratajczak, J. Xu, A. Yates, D. Zerbino, Y. Zhang, B. Aken, J. S. Choudhary, M. Gerstein, R. Guigó, T. J. P. Hubbard, M. Kellis, B. Paten, A. Reymond, M. L. Tress and P. Flicek (2019). "GENCODE reference annotation for the human and mouse genomes." Nucleic Acids Research 47(D1): D766‐D773.
Fujii, K., Z. Shi, O. Zhulyn, N. Denans and M. Barna (2017). "Pervasive translational regulation of the cell signalling circuitry underlies mammalian development." Nature Communications 8(1): 14443.
Gordon, A. (2010). FASTQ/A short‐reads pre‐processing tools. http://hannonlab.cshl.edu/fastx_toolkit/.
Guo, H., N. T. Ingolia, J. S. Weissman and D. P. Bartel (2010). "Mammalian microRNAs predominantly act to decrease target mRNA levels." Nature 466(7308): 835‐840.
Guttman, M., P. Russell, N. T. Ingolia, J. S. Weissman and E. S. Lander (2013). "Ribosome profiling provides evidence that large noncoding RNAs do not encode proteins." Cell 154(1): 240‐251.
Haeussler, M., A. S. Zweig, C. Tyner, M. L. Speir, K. R. Rosenbloom, B. J. Raney, C. M. Lee, B. T. Lee, A. S. Hinrichs, J. N. Gonzalez, D. Gibson, M. Diekhans, H. Clawson, J. Casper, G. P. Barber, D. Haussler, R. M. Kuhn and W. J. Kent (2019). "The UCSC Genome Browser database: 2019 update." Nucleic Acids Research 47(D1): D853‐D858.
Hannon, G. J. (2010). "FASTX‐Toolkit."
Hulstaert, N., J. Shofstahl, T. Sachsenberg, M. Walzer, H. Barsnes, L. Martens and Y. Perez‐Riverol (2020). "ThermoRawFileParser: Modular, Scalable, and Cross‐Platform RAW File Conversion." Journal of Proteome Research 19(1): 537‐542.
Ingolia, N. T., G. A. Brar, S. Rouskin, A. M. McGeachy and J. S. Weissman (2013). "Genome‐Wide Annotation and Quantitation of Translation by Ribosome Profiling." 4.18.11‐‐14.18.19.
Johnsson, P., L. Lipovich, D. Grander and K. V. Morris (2014). "Evolutionary conservation of long non‐coding RNAs; sequence, structure, function." Biochimica Et Biophysica Acta‐General Subjects 1840(3): 1063‐1071.
King, J. L. and T. H. Jukes (1969). "Non‐Darwinian evolution." Science 164(3881): 788‐798.
Korecka, J. A., R. E. van Kesteren, E. Blaas, S. O. Spitzer, J. H. Kamstra, A. B. Smit, D. F. Swaab, J. Verhaagen and K. Bossers (2013). "Phenotypic Characterization of Retinoic Acid Differentiated SH‐SY5Y Cells by Transcriptional Profiling." PLoS ONE 8(5): e63862.
Kumar, S., G. Stecher, M. Suleski and S. B. Hedges (2017). "TimeTree: A Resource for Timelines, Timetrees, and Divergence Times." Molecular Biology and Evolution 34(7): 1812‐1819.
Ladoukakis, E., V. Pereira, E. G. Magny, A. Eyre‐Walker and J. P. Couso (2011). "Hundreds of putatively functional small open reading frames in Drosophila." Genome Biology 12(11): 17.
Langmead, B. and S. L. Salzberg (2012). "Fast gapped‐read alignment with Bowtie 2." Nature Methods 9(4): 357‐359.
Letunic, I. and P. Bork (2011). "Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy." Nucleic Acids Research 39: W475‐W478.
Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis and R. Durbin (2009). "The Sequence Alignment/Map format and SAMtools." Bioinformatics (Oxford, England) 25(16): 2078‐2079.
Li, J. R., C. H. Sun, W. Li, R. F. Chao, C. C. Huang, X. J. Zhou and C. C. Liu (2016). "Cancer RNA‐Seq Nexus: a database of phenotype‐specific transcriptome profiling in cancer cells." Nucleic Acids Res 44(D1): D944‐951.
Liao, Y., G. K. Smyth and W. Shi (2013). "The Subread aligner: fast, accurate and scalable read mapping by seed‐and‐vote." Nucleic Acids Research 41(10): e108.
Liao, Y., G. K. Smyth and W. Shi (2014). "featureCounts: an efficient general purpose program for assigning sequence reads to genomic features." Bioinformatics (Oxford, England) 30(7): 923‐930.
Lin, M. F., I. Jungreis and M. Kellis (2011). "PhyloCSF: a comparative genomics method to distinguish protein coding and non‐coding regions." Bioinformatics 27(13): i275‐282.
Lin, N., K.‐Y. Chang, Z. Li, K. Gates, Z. A. Rana, J. Dang, D. Zhang, T. Han, C.‐S. Yang, T. J. Cunningham, S. R. Head, G. Duester, P. D. S. Dong and T. M. Rana (2014). "An Evolutionarily Conserved Long Noncoding RNA TUNA Controls Pluripotency and Neural Lineage Commitment." Molecular Cell 53(6): 1005‐1019.
Lindsay, S. J., Y. Xu, S. N. Lisgo, L. F. Harkin, A. J. Copp, D. Gerrelli, G. J. Clowry, A. Talbot, M. J. Keogh, J. Coxhead, M. Santibanez‐Koref and P. F. Chinnery (2016). "HDBR Expression: A Unique Resource for Global and Individual Gene Expression Studies during Early Human Brain Development." Frontiers in Neuroanatomy.
Love, M. I., W. Huber and S. Anders (2014). "Moderated estimation of fold change and dispersion for RNA‐seq data with DESeq2." Genome Biology 15(12): 550.
Magny, E. G., J. I. Pueyo, F. M. Pearl, M. A. Cespedes, J. E. Niven, S. A. Bishop and J. P. Couso (2013). "Conserved regulation of cardiac calcium uptake by peptides encoded in small open reading frames." Science 341: 1116‐1120.
Martin, M. (2011). "Cutadapt removes adapter sequences from high‐throughput sequencing reads." EMBnet.journal 17(1): 3.
Michel, A. M., J. P. A. Mullan, V. Velayudhan, P. B. F. O'Connor, C. A. Donohue and P. V. Baranov (2016). "RiboGalaxy: A browser based platform for the alignment, analysis and visualization of ribosome profiling data." RNA Biology 13(3): 316‐319.
Murillo, J. R., I. Pla, L. Goto‐Silva, F. C. S. Nogueira, G. B. Domont, Y. Perez‐Riverol, A. Sánchez and M. Junqueira (2018). "Mass spectrometry evaluation of a neuroblastoma SH‐SY5Y cell culture protocol." Anal Biochem 559: 51‐54.
Patil, I. (2018). ggstatsplot: 'ggplot2' Based Plots with Statistical Details. CRAN.
Patraquim, P., M. A. S. Mumtaz, J. I. Pueyo, J. L. Aspden and J. P. Couso (2020). "Developmental regulation of canonical and small ORF translation from mRNAs." Genome Biol 21(1): 128.
Pedersen, A. G. and H. Nielsen (1997). "Neural network prediction of translation initiation sites in eukaryotes: perspectives for EST and genome analysis." Proc Int Conf Intell Syst Mol Biol 5: 226‐233.
Ponting, C. P., P. L. Oliver and W. Reik (2009). "Evolution and Functions of Long Noncoding RNAs." Cell 136(4): 629‐641.
Pueyo, J. I. and J. P. Couso (2008). "The 11‐aminoacid long Tarsal‐less peptides trigger a cell signal in Drosophila leg development." Dev Biol 324: 192‐201.
Ramilowski, J. A., C. W. Yip, S. Agrawal, J. C. Chang, Y. Ciani, I. V. Kulakovskiy, M. Mendez, J. L. C. Ooi, J. F. Ouyang, N. Parkinson, A. Petri, L. Roos, J. Severin, K. Yasuzawa, I. Abugessaisa, A. Akalin, I. V. Antonov, E. Arner, A. Bonetti, H. Bono, B. Borsari, F. Brombacher, C. J. Cameron, C. V. Cannistraci, R. Cardenas, M. Cardon, H. Chang, J. Dostie, L. Ducoli, A. Favorov, A. Fort, D. Garrido, N. Gil, J. Gimenez, R. Guler, L. Handoko, J. Harshbarger, A. Hasegawa, Y. Hasegawa, K. Hashimoto, N. Hayatsu, P. Heutink, T. Hirose, E. L. Imada, M. Itoh, B. Kaczkowski, A. Kanhere, E. Kawabata, H. Kawaji, T. Kawashima, S. T. Kelly, M. Kojima, N. Kondo, H. Koseki, T. Kouno, A. Kratz, M. Kurowska‐Stolarska, A. T. J. Kwon, J. Leek, A. Lennartsson, M. Lizio, F. López‐Redondo, J. Luginbühl, S. Maeda, V. J. Makeev, L. Marchionni, Y. A. Medvedeva, A. Minoda, F. Müller, M. Muñoz‐Aguirre, M. Murata, H. Nishiyori, K. R. Nitta, S. Noguchi, Y. Noro, R. Nurtdinov, Y. Okazaki, V. Orlando, D. Paquette, C. J. C. Parr, O. J. L. Rackham, P. Rizzu, D. F. Sánchez Martinez, A. Sandelin, P. Sanjana, C. A. M. Semple, Y. Shibayama, D. M. Sivaraman, T. Suzuki, S. C. Szumowski, M. Tagami, M. S. Taylor, C. Terao, M. Thodberg, S. Thongjuea, V. Tripathi, I. Ulitsky, R. Verardo, I. E. Vorontsov, C. Yamamoto, R. S. Young, J. K. Baillie, A. R. R. Forrest, R. Guigó, M. M. Hoffman, C. C. Hon, T. Kasukawa, S. Kauppinen, J. Kere, B. Lenhard, C. Schneider, H. Suzuki, K. Yagi, M. J. L. de Hoon, J. W. Shin and P. Carninci (2020). "Functional annotation of human long noncoding RNAs via molecular phenotyping." Genome Res 30(7): 1060‐1072.
Rappaport, N., N. Nativ, G. Stelzer, M. Twik, Y. Guan‐Golan, T. I. Stein, I. Bahir, F. Belinky, C. P. Morrey, M. Safran and D. Lancet (2013). "MalaCards: an integrated compendium for diseases and their annotation." Database (Oxford) 2013: bat018.
Rappaport, N., M. Twik, N. Nativ, G. Stelzer, I. Bahir, T. I. Stein, M. Safran and D. Lancet (2014). "MalaCards: A Comprehensive Automatically‐Mined Database of Human Diseases." Curr Protoc Bioinformatics 47: 1.24.21‐19.
Rappaport, N., M. Twik, I. Plaschkes, R. Nudel, T. Iny Stein, J. Levitt, M. Gershoni, C. P. Morrey, M. Safran and D. Lancet (2017). "MalaCards: an amalgamated human disease compendium with diverse clinical and genetic annotation and structured search." Nucleic Acids Res 45(D1): D877‐D887.
Rodriguez, C. M., S. Y. Chun, R. E. Mills and P. K. Todd (2019). "Translation of upstream open reading frames in a model of neuronal differentiation." BMC Genomics 20(1): 391.
Ruiz‐Orera, J. and M. M. Alba (2019). "Conserved regions in long non‐coding RNAs contain abundant translation and protein–RNA interaction signatures." NAR Genomics and Bioinformatics 1(1): e2.
Saghatelian, A. and J. P. Couso (2015). "Discovery and characterization of smORF‐encoded bioactive polypeptides." Nat Chem Biol 11(12): 909‐916.
Sarropoulos, I., R. Marin, M. Cardoso‐Moreira and H. Kaessmann (2019). "Developmental dynamics of lncRNAs across mammalian organs and species." Nature 571(7766): 510‐514.
Shen, S., L. Lin, J. J. Cai, P. Jiang, E. J. Kenkel, M. R. Stroik, S. Sato, B. L. Davidson and Y. Xing (2011). "Widespread establishment and regulatory impact of Alu exons in human genes." Proc Natl Acad Sci U S A 108(7): 2837‐2842.
Sievers, F., A. Wilm, D. Dineen, T. J. Gibson, K. Karplus, W. Z. Li, R. Lopez, H. McWilliam, M. Remmert, J. Soding, J. D. Thompson and D. G. Higgins (2011). "Fast, scalable generation of high‐quality protein multiple sequence alignments using Clustal Omega." Molecular Systems Biology 7: 6.
Spencer, H. L., R. Sanders, M. Boulberdaa, M. Meloni, A. Cochrane, A.‐M. Spiroski, J. Mountford, C. Emanueli, A. Caporali, M. Brittan, J. Rodor and A. H. Baker (2020). "The LINC00961 transcript and its encoded micropeptide, small regulatory polypeptide of amino acid response, regulate endothelial cell function." Cardiovascular Research.
Su, X., J. Zhang, X. Luo, W. Yang, Y. Liu, Y. Liu and Z. Shan (2019). "LncRNA LINC01116 Promotes Cancer Cell Proliferation, Migration And Invasion In Gastric Cancer By Positively Interacting With lncRNA CASC11." OncoTargets and Therapy 12: 8117‐8123.
Szafron, L. M., A. Balcerak, E. A. Grzybowska, B. Pienkowska‐Grela, A. Felisiak‐Golabek, A. Podgorska, M. Kulesza, N. Nowak, P. Pomorski, J. Wysocki, T. Rubel, A. Dansonka‐Mieszkowska, B. Konopka, M. Lukasik and J. Kupryjanczyk (2015). "The Novel Gene CRNDE Encodes a Nuclear Peptide (CRNDEP) Which Is Overexpressed in Highly Proliferating Tissues." PLoS One 10(5): e0127475.
Tsagakis, I., K. Douka, I. Birds and J. L. Aspden (2020). "Long non‐coding RNAs in development and disease: Conservation to mechanisms." J Pathol.
van Heesch, S., F. Witte, V. Schneider‐Lunitz, J. F. Schulz, E. Adami, A. B. Faber, M. Kirchner, H. Maatz, S. Blachut, C. L. Sandmann, M. Kanda, C. L. Worth, S. Schafer, L. Calviello, R. Merriott, G. Patone, O. Hummel, E. Wyler, B. Obermayer, M. B. Mücke, E. L. Lindberg, F. Trnka, S. Memczak, M. Schilling, L. E. Felkin, P. J. R. Barton, N. M. Quaife, K. Vanezis, S. Diecke, M. Mukai, N. Mah, S. J. Oh, A. Kurtz, C. Schramm, D. Schwinge, M. Sebode, M. Harakalova, F. W. Asselbergs, A. Vink, R. A. de Weger, S. Viswanathan, A. A. Widjaja, A. Gärtner‐Rommel, H. Milting, C. Dos Remedios, C. Knosalla, P. Mertins, M. Landthaler, M. Vingron, W. A. Linke, J. G. Seidman, C. E. Seidman, N. Rajewsky, U. Ohler, S. A. Cook and N. Hubner (2019). "The Translational Landscape of the Human Heart." Cell 178: 242‐260 e229.
Wang, H., A. Iacoangeli, S. Popp, I. A. Muslimov, I. Hiroaki, N. Sonenberg, I. B. Lomakin and H. Tiedge (2002). "Dendritic BC1 RNA: Functional Role in Regulation of Translation Initiation." Journal of Neuroscience 22(23): 10232‐10241.
Wang, L., J. Fan, L. Han, H. Qi, Y. Wang, H. Wang, S. Chen, L. Du, S. Li, Y. Zhang, W. Tang, G. Ge, W. Pan, P. Hu and H. Cheng (2020). "The micropeptide LEMP plays an evolutionarily conserved role in myogenesis." Cell Death & Disease 11(5): 357.
Waterhouse, A. M., J. B. Procter, D. M. A. Martin, M. Clamp and G. J. Barton (2009). "Jalview Version 2‐a multiple sequence alignment editor and analysis workbench." Bioinformatics 25(9): 1189‐1191.
Wickham, H. (2019). stringr: Simple, Consistent Wrappers for Common String Operations.
Winzi, M. a. (2018). "The long noncoding RNA lncR492 inhibits neural differentiation of murine embryonic stem cells." PLoS One 13(1): e0191682.
Xiao, N., D. S. Cao, M. F. Zhu and Q. S. Xu (2015). "protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences." Bioinformatics 31(11): 1857‐1859.
Xie, Y. (2020). knitr: A General‐Purpose Package for Dynamic Report Generation in R.
Yates, A. D., P. Achuthan, W. Akanni, J. Allen, J. Alvarez‐Jarreta, M. R. Amode, I. M. Armean, A. G. Azov, R. Bennett, J. Bhai, K. Billis, S. Boddu, J. C. Marugan, C. Cummins, C. Davidson, K. Dodiya, R. Fatima, A. Gall, C. G. Giron, L. Gil, T. Grego, L. Haggerty, E. Haskell, T. Hourlier, O. G. Izuogu, S. H. Janacek, T. Juettemann, M. Kay, I. Lavidas, T. Le, D. Lemos, J. G. Martinez, T. Maurel, M. McDowall, A. McMahon, S. Mohanan, B. Moore, M. Nuhn, D. N. Oheh, A. Parker, A. Parton, M. Patricio, M. P. Sakthivel, A. I. A. Salam, B. M. Schmitt, H. Schuilenburg, D. Sheppard, M. Sycheva, M. Szuba, K. Taylor, A. Thormann, G. Threadgold, A. Vullo, B. Walts, A. Winterbottom, A. Zadissa, M. Chakiachvili, B. Flint, A. Frankish, S. E. Hunt, G. Ilsley, M. Kostadima, N. Langridge, J. E. Loveland, F. J. Martin, J. Morales, J. M. Mudge, M. Muffato, E. Perry, M. Ruffier, S. J. Trevanion, F. Cunningham, K. L. Howe, D. R. Zerbino and P. Flicek (2020). "Ensembl 2020." Nucleic Acids Research 48(D1): D682‐D688.

Figure 1 (image not reproduced; recoverable labels only). (A) Control and RA; cytoplasm: 80% of cytoplasm for polysome fractionation, 20% of cytoplasm; 25% of polysomes: polysome-associated RNAs; ribosome foot-printing; i) all cytoplasmic poly-A (RNA-Seq), ii) polysome-associated poly-A (RNA-Seq), iii) actively translated mRNAs & lncRNAs (Ribo-Seq). (B) Volcano plot, Control vs RA: x-axis Log2 fold change, y-axis −Log10 P; 82 downregulated, 237 upregulated; total = 3380 variables. (C) Upregulated and (D) downregulated lncRNA types: intergenic, anti-sense, sense (overlapping), retained intron, sense (intronic), uncharacterised, NMD target. (E) Log2 (Cytoplasmic/Nuclear) for SNAP25-AS1, AC254633.1, LINC01116, SERPINB9P1, DLGAP1-AS2, DLGAP1-AS1, LINC02143, XIST and GAPDH.

Figure 1: Cytoplasmic lncRNA expression is regulated during neuronal differentiation

(A) Schematic of Poly-Ribo-Seq, with three levels of analysis: i) total cytoplasmic, ii) polysome-associated and iii) translated lncRNAs. (B) Volcano plot of differential expression analysis of polysome-associated lncRNAs (labelled by geneIDs and names for candidate lncRNAs) between Control and RA populations. 237 lncRNAs are upregulated during differentiation and 82 downregulated (log2 fold-change cutoff=1, padj<0.05). Pie charts of types of lncRNAs (C) upregulated and (D) downregulated upon differentiation (intergenic; anti-sense; sense-overlapping; retained intron; sense-intronic; uncharacterised; NMD target). (E) Induced lncRNAs of interest are specifically localised to the cytoplasm, as shown by subcellular fractionation RT-qPCR. XIST lncRNA was used as a nuclear and GAPDH mRNA as a cytoplasmic positive control (n=3, SE is plotted, Student’s t-test, p>0.05).
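The thresholds quoted in the legend (log2 fold-change cutoff=1, padj<0.05) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `DEResult` type, the `classify` function, the strict inequalities and the example rows are all assumptions made for illustration, and missing adjusted p-values are simply treated as not significant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DEResult:
    gene_id: str
    log2_fold_change: float
    padj: Optional[float]  # None when no adjusted p-value is available


def classify(result: DEResult, lfc_cutoff: float = 1.0, padj_cutoff: float = 0.05) -> str:
    """Return 'up', 'down' or 'ns' (strict inequalities are an assumption)."""
    if result.padj is None or result.padj >= padj_cutoff:
        return "ns"
    if result.log2_fold_change > lfc_cutoff:
        return "up"
    if result.log2_fold_change < -lfc_cutoff:
        return "down"
    return "ns"


# Hypothetical rows for illustration only, not values from the paper.
rows = [
    DEResult("geneA", 2.4, 0.001),
    DEResult("geneB", -1.7, 0.02),
    DEResult("geneC", 0.6, 0.0001),  # significant but below the fold-change cutoff
    DEResult("geneD", 3.1, None),    # no adjusted p-value
]
labels = {r.gene_id: classify(r) for r in rows}
counts = {k: list(labels.values()).count(k) for k in ("up", "down", "ns")}
print(labels)
print(counts)
```

Applied to a full results table, the "up" and "down" tallies would correspond to the kind of counts reported in panel B.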

Figure 2 (image not reproduced; recoverable labels only). (A) Control and (B) RA volcano plots, polysome depleted vs polysome enriched: x-axis Log2 fold change, y-axis −Log10 P; totals of 1972 and 1095 variables; labelled geneIDs include ENSG00000188206.5, ENSG00000271327.1, ENSG00000260852.1, ENSG00000260597.1 and ENSG00000261584.1. (C) LINC02143, (D) DLGAP1-AS1 and (E) LINC01116: y-axis Starting Quantity, x-axis gradient fractions 1-12 (RNPs, 40S, 60S, 80S, polysomes), Control and RA.

Figure 2: Association of lncRNAs with polysomes during neuronal differentiation

Volcano plots displaying the significantly differentially localised lncRNAs (labelled by their geneIDs) between Total and Polysome populations in (A) Control and (B) RA, with log2 fold-change cutoff=1, padj<0.05. (C-E) RT-qPCR of lncRNAs across sucrose gradient fractions (n=3, SE is plotted). (C) LINC02143 is found in 80S and small polysome fractions during differentiation. 5% of the transcript is detected in the 80S (monosome) fraction and 66% in small polysome complexes. (D) DLGAP1-AS2 is found in 80S and small polysome fractions both in Control and RA treated cells. On average, 63% of the transcript is detected in the polysome fractions in Control and 49% upon differentiation. (E) LINC01116 is found in 80S and 2-7 polysome fractions both in Control and RA treated cells. On average, 66% of the LINC01116 transcript is detected in the polysome fractions in Control and 57% upon differentiation.
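Percentages such as "66% in small polysome complexes" can be read as the share of the summed RT-qPCR starting quantity that falls in a group of gradient fractions. A minimal sketch of that arithmetic follows, assuming this reading; the function name, the fraction-to-complex mapping and the quantities are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def percent_by_complex(
    quantities: Dict[int, float], groups: Dict[str, List[int]]
) -> Dict[str, float]:
    """Percentage of the summed starting quantity found in each group of fractions."""
    total = sum(quantities.values())
    if total <= 0:
        raise ValueError("total starting quantity must be positive")
    return {
        name: 100.0 * sum(quantities[f] for f in fractions) / total
        for name, fractions in groups.items()
    }


# Hypothetical starting quantities for gradient fractions 1-12.
quantities = {
    1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 2.0, 6: 2.0,
    7: 3.0, 8: 3.0, 9: 2.0, 10: 2.0, 11: 1.0, 12: 1.0,
}
# Hypothetical assignment of fractions to complexes (an assumption).
groups = {"80S": [5, 6], "small polysomes": [7, 8, 9], "large polysomes": [10, 11, 12]}

shares = percent_by_complex(quantities, groups)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Averaging such percentages over the three replicates would give per-condition values comparable to those quoted in the legend.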

Figure 3 (image not reproduced; recoverable labels only). (A) Workflow: raw data; adaptor removal (Cutadapt); Fastq quality filtering (FASTX-Toolkit) and read quality check (FastQC); removal of reads matching reference RNA sequences (Bowtie2; the RNA classes are not recoverable from the extraction); genome alignment to the reference genome (STAR); read length selection (metaplots produced around start and stop codons; selection of read lengths with good framing); RiboTaper (identifies translated ORFs in coding and non-coding regions); replicate threshold (cutoff: translated ORF found in 2/3 replicates). (B) lncRNA-smORFs translated, Control and RA. (C) log2 TE by ORF type, with padj and mean annotations: Control lncRNA (n = 28), protein coding (n = 16,281), protein coding dORF (n = 4), protein coding uORF (n = 56); RA lncRNA (n = 23), protein coding (n = 12,745), protein coding dORF (n = 1), protein coding uORF (n = 27). (D) Peptide Length (aa) by ORF Type: dORF (n=5), lncRNA ORF (n=45), uORF (n=71). (E) LINC01116: Number of P sites and RNA seq coverage against Transcript Position, start and stop codons, ORF framing and rest-of-transcript framing (P sites per frame 0, 1, 2).

Figure 3: Translation of lncRNA smORFs

(A) Workflow for identification of translated ORFs from Ribo-seq and RNA-seq, see Materials and Methods for details. (B) Venn diagram of lncRNA-smORFs translated in Control and RA, with overlap. (C) Plots of Translational Efficiencies for protein-coding ORFs, lncRNA-smORFs, dORFs and uORFs. (D) Length distribution of translated ORFs in lncRNAs, dORFs and uORFs (in codons). (E) Poly-Ribo-Seq profile for LINC01116 in RA treatment. RNA-Seq (Polysome) reads are grey and ribosome P sites in purple, turquoise and yellow according to frame. Purple lines mark beginning and end of translated smORF. All possible start and stop codons are indicated below. Framing within and outside translated smORF shown on left.

Translated ORFs        Control   RA       Overlap   Total
Protein-Coding ORFs    16,282    12,745   10,014    19,013
uORFs                  56        27       12        71
dORFs                  4         1        0         5
lncRNA-smORFs          28        23       6         45

Table 1: Translation of small ORFs

Number of ORFs detected as translated in Poly-Ribo-Seq
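The Total column of Table 1 is consistent with counting each ORF translated in either condition once, that is Total = Control + RA − Overlap. The short check below uses the values transcribed from Table 1; the function and variable names are illustrative only.

```python
# Rows transcribed from Table 1: (Control, RA, Overlap, Total).
TABLE_1 = {
    "Protein-Coding ORFs": (16282, 12745, 10014, 19013),
    "uORFs": (56, 27, 12, 71),
    "dORFs": (4, 1, 0, 5),
    "lncRNA-smORFs": (28, 23, 6, 45),
}


def union_size(control: int, ra: int, overlap: int) -> int:
    """Number of distinct ORFs translated in Control or RA (inclusion-exclusion)."""
    return control + ra - overlap


# Compare the computed union with the reported Total for every row.
consistent = {
    name: union_size(control, ra, overlap) == total
    for name, (control, ra, overlap, total) in TABLE_1.items()
}
print(consistent)

# Condition-specific lncRNA-smORFs follow by subtracting the overlap.
control, ra, overlap, total = TABLE_1["lncRNA-smORFs"]
control_only = control - overlap
ra_only = ra - overlap
print(control_only, ra_only)  # 22 17
```

The RA-only value of 17 agrees with the 17 lncRNA-smORFs described in the abstract as translated specifically upon differentiation.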

Figure 4 (image not reproduced; recoverable labels only, assignment of some labels to panels is inferred). (B) LINC00478 lncRNA tagging construct: 5'-UTR (150 nt), smORF (111 nt), 3X FLAG. (D) LINC00478 in SH-SY5Y, i) nuclear and ii) cytoplasmic: FLAG, hnRNPK, DAPI/MERGE; scale bars 20 μm. (E) LINC01116 lncRNA tagging constructs, WT and Δ1: 5'-UTR (608 nt), smORF (213 bp), 3X FLAG, with ATG and AAA codon labels. (F) LINC01116-WT in SH-SY5Y and (G) LINC01116-Δ1: FLAG, DAPI/MERGE; scale bars 20 μm.

Figure 4: Peptide production from smORFs in lncRNAs

(A) Venn diagram showing overlap of lncRNA-smORFs detected between our Poly-Ribo-Seq and publicly available mass spectrometry data from SH-SY5Y (purple). Control in blue, RA in pink. (B) Schematic of tagging construct for LINC00478; lncRNA sequence upstream of smORF and smORF, excluding its stop codon, cloned upstream of 3X FLAG, which is lacking its own start codon. FLAG signal is therefore dependent on smORF translation. (C) Poly-Ribo-Seq profile for LINC00478 in RA treatment. RNA-Seq (Polysome) reads are grey and ribosome P sites in purple, turquoise and yellow according to frame. Purple lines mark beginning and end of translated smORF. All possible start and stop codons are indicated below. Framing within and outside translated smORF shown on right. (D) Confocal images of FLAG-tagged LINC00478 peptide in SH-SY5Y cells (Control), showing (i) nuclear and (ii) cytoplasmic distribution; green is FLAG, magenta is hnRNPK (marking nuclei) and blue is DAPI (scale bar is 20 μm). (E) Schematic of tagging constructs for LINC01116 (WT and start codon mutant Δ1). (F) Confocal images of FLAG-tagged WT LINC01116 peptide showing cytoplasmic localisation, near cell membrane and neuritic processes (magnification of insert is 3X). (G) Δ1 start codon mutant, showing no FLAG signal, in SH-SY5Y cells; green is FLAG and blue is DAPI (scale bar is 20 μm).

Figure 5 (image not reproduced; recoverable labels only). (A) Phylogram with translated lncRNA-smORFs labelled by ENST transcript ID (IDs not recoverable from the extraction) and a row marking evidence of ORF conservation. (B) Nucleotide alignment with start codon and stop codon marked.

Figure 5: Translated lncRNA-smORFs exhibit sequence conservation in great apes.

(A) Phylogram with lncRNA-smORFs for which evidence of sequence conservation was found represented as circles, coloured according to how sequence conservation was identified. Phylogram built in iTOL (Letunic, I. et al. 2006) using data from TimeTree (Kumar, S. et al. 2017), scale in 10 MYA along. (B) Portion of human ENST00000442428 lncRNA nt alignment with gorilla and orangutan nt sequences, showing the smORF. Alignment built in ClustalOmega (Sievers, F. et al. 2011) displayed in JalView (Waterhouse, A.M. et al. 2009).

Figure 6 (image not reproduced; recoverable labels only). Panel titles: "Proportion of lncRNAs with developmentally dynamic expression" (all lncRNAs vs translated lncRNAs; dynamic vs not dynamic; x-axis Percentage); "Organ distribution of developmentally dynamic lncRNAs" (liver, kidney, heart, gonads, brain; translated lncRNA vs total lncRNA; x-axis Percentage); "Percentage of translated lncRNAs associated with neurological diseases and disorders" (CNS cancer, neurodegenerative, neurodevelopment, mood disorder, no disease association; x-axis Percentage); "Regulation of translated lncRNAs during differentiation" (up, down, no change).

Panel D, downregulated pathways and normalised enrichment scores, as ordered in the extracted text:
LINC01116: GO_MYELIN_SHEATH -1.488; GO_NEURON_MIGRATION -1.625; GO_LARGE_RIBOSOMAL_SUBUNIT -2.437; GO_SMALL_RIBOSOMAL_SUBUNIT -2.436; GO_POSITIVE_REGULATION_OF_CANONICAL_WNT_SIGNALING_PATHWAY -1.889
FGD5-AS1: GO_NEURON_PROJECTION_GUIDANCE -2.086; GO_NEURON_PROJECTION_MEMBRANE -2.069; GO_NEURON_RECOGNITION -2.066; GO_MAIN_AXON -1.99; GO_SYNAPTIC_SIGNALING -1.981; GO_NEURON_PROJECTION_MORPHOGENESIS -1.958
TUG1: GO_SYNAPTIC_MEMBRANE -1.885; GO_SMAD_BINDING -1.885; GO_NEGATIVE_REGULATION_OF_WNT_SIGNALING_PATHWAY -1.708; GO_NEURON_PROJECTION_MORPHOGENESIS -1.836; GO_CELL_MORPHOGENESIS_INVOLVED_IN_NEURON_DIFFERENTIATION -1.820; GO_CANONICAL_WNT_SIGNALING_PATHWAY -1.800

Figure 6: Potential biological importance of translated lncRNAs in neural development, differentiation and disease.

(A) Percentage of translated lncRNA genes and all lncRNA genes that exhibit dynamic expression during human development according to LncExpDB (Sarropoulos et al, Cardoso-Moreira et al). (B) Percentage of dynamically expressed lncRNAs that show dynamic expression in each organ, for total and translated lncRNA populations, according to LncExpDB (Sarropoulos et al, Cardoso-Moreira et al). (C) Proportion of translated lncRNAs that exhibit differential expression during differentiation of SH-SY5Y cells. (D) GO terms associated with changes in RNA-Seq levels upon siRNA knockdown of 3 of the translated lncRNAs (LINC01116, FGD5-AS1 and TUG1) performed by FANTOM6 in human dermal fibroblasts. (E) Percentage of translated lncRNA genes found to be associated with neuronal diseases and disorders according to lncRNADisease, Differential Expression Atlas, Cancer RNA-Seq Nexus and MalaCards.

Figure 7 (image not reproduced; recoverable labels only). (A, B) Tuj1/DAPI, Tuj1 and DAPI images of Control and RA cells with scrambled siRNA or LINC01116 siRNA; scale bars 200 μm. (C) Neurite length (μm) for scr and LINC01116 siRNA in Control and RA (SE is plotted, Student’s t-test, N=3, n>100, p<0.05). (D) MOXD1 mRNA levels (RT-qPCR) for scr and LINC01116 siRNA in Control and RA (SE is plotted, Student’s t-test, n=3, p<0.05).

Figure 7: LINC01116 contributes to neuronal differentiation but does not affect cell cycle progression.

(A) Representative immunofluorescence images of Control and RA SH-SY5Y cells, transfected with siRNA targeting LINC01116 and scrambled control, after staining for Tuj1 (βIII-tubulin) at day 3 post-differentiation (scale bar=200 μm). White windows magnified in (B). (C) Quantification of neurite length in Control and RA treated cells upon knockdown shows a significant reduction of neurite length in the differentiated cells upon LINC01116 knockdown (N=3 biological replicates, n>100 measurements, Student's t-test p<0.05). (D) RT-qPCR of differentiation marker MOXD1 in Control and RA treated cells, transfected with siRNA targeting LINC01116 and scrambled control, shows significant reduction of MOXD1 expression in differentiated cells with reduced LINC01116 levels at day 3 post-differentiation (n=3 biological replicates, Student's t-test p<0.05).

Cytoplasmic long non-coding RNAs are differentially regulated and translated during human neuronal differentiation

Katerina Douka, Isabel Birds, Dapeng Wang, et al.

RNA published online June 30, 2021

Supplemental Material: http://rnajournal.cshlp.org/content/suppl/2021/06/30/rna.078782.121.DC1

Accepted Manuscript: Peer-reviewed and accepted for publication but not copyedited or typeset; accepted manuscript is likely to differ from the final, published version.

Open Access: Freely available online through the RNA Open Access option.

Creative Commons License: This article, published in RNA, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.


Published by Cold Spring Harbor Laboratory Press for the RNA Society