<<

IMF YJMBI-63988; No. of pages: 15; 4C: 3, 4, 7, 8, 10

1 2 Rise of the RNA Machines: Exploring the Structure of

3 Long Non-Coding

4

Irina V. Novikova, Scott P. Hennelly, Chang-Shung Tung and Karissa Y. Sanbonmatsu Q15

6 Los Alamos National Laboratory, Los Alamos, NM 87545, USA

7 Correspondence to Karissa Y. Sanbonmatsu: [email protected] 8 http://dx.doi.org/10.1016/j.jmb.2013.02.030 9 Edited by A. Pyle 1011 12 Abstract

13 Novel, profound and unexpected roles of long non-coding RNAs (lncRNAs) are emerging in critical aspects of 14 regulation. Thousands of lncRNAs have been recently discovered in a wide range of mammalian 15 systems, related to development, , cancer, brain function and hereditary disease. The structural 16 of these lncRNAs presents a brave new RNA world, which may contain a diverse zoo of new 17 architectures and mechanisms. While structural studies of lncRNAs are in their infancy, we describe existing 18 structural data for lncRNAs, as well as crystallographic studies of other RNA machines and their implications 19 for lncRNAs. We also discuss the importance of dynamics in RNA machine mechanism. Determining 20 commonalities between lncRNA systems will help elucidate the and mechanistic role of lncRNAs in 21 disease, creating a structural framework necessary to pursue lncRNA-based therapeutics. 22 © 2013 Published by Elsevier Ltd. 24 23

25 Introduction rather than the exception in the case of eukaryotic 50 . 51 26 RNA is primarily known as an intermediary in gene LncRNAs are defined by the following: (i) lack of 52 11 27 expression between DNA and . Over the coding potential and (ii) transcript length (N200 nt). 53 28 past several decades, other roles for RNA have These transcripts are generally nuclear retained and 54 29 been identified, which include synthesis, transcribed by RNA II, with many that are 55 12–15 30 gene regulation and nucleic processing. How- spliced and polyadenylated. LncRNAs may be 56 31 ever, these RNAs were considered as outliers in a intronic, intergenic (large intervening non-coding 57 32 that consists mainly of protein-coding RNAs or lincRNAs) or antisense to the protein- 58 33 RNAs. While this may be the case for and coding (overlapping one or more ). 59 1 34 certain unicellular organisms, in While transcript lengths are normally in the range 60 35 higher are markedly different. Recent 1000–10,000 residues, the Air and kcnq1ot1 61 16–18 36 deep studies in humans have shown lncRNAs have lengths that exceed 90 kb. 62 37 that more than two-thirds of the is actively Interestingly, in 2013, a novel “monster” non-coding 63 2 38 transcribed. Since protein-coding genes constitute transcript, XACT (252 kb in length), was found to 64 3 39 a very small fraction of the genome, non-coding originate from the active X in human 65 19 40 RNAs represent the vast majority (more than 80% in pluripotent cells. Many lncRNAs play key roles in 66 4 20–22 23 12,24–29 41 mammals). Recent studies estimate approximately signaling, development, cancer, em- 67 30 31–34 42 15,000 long non-coding RNAs (lncRNAs) in bryonic stem pluripotency, brain function, 68 5 26,35–38 43 humans. In light of their expression profiles specific subcellular compartmentalization, 69 12,39,40 41,42 44 to cell, cell cycle, tissue, developmental stage and remodeling, biology and stress 70 43 45 disease, it is difficult to precisely quantify the number response. studies and functional 71 46 of human lncRNAs. We note that tens of thousands aspects of lncRNAs have been discussed by Rinn 72 6–10 47 of lncRNAs have been profiled in 2012. This and Chang in a recent review article, as well as in 73 6 48 recent explosion of newly discovered lncRNAs many references therein. Since lncRNAs are often 74 49 suggests that non-coding RNAs may be the norm associated with modification and chromatin 75

0022-2836/$ - see front matter © 2013 Published by Elsevier Ltd. J. Mol. Biol. (2013) xx, xxx–xxx

Please cite this article as: Novikova, I. et al., Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs, J. Mol. Biol. (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030 2 Review: Structure of RNAs and RNA Machines Q2

76 remodeling, epigenetic effects represent a unifying resolution structure of the bacterial 133 77 theme. While the mechanism by which epigenetic (N4500 nt) required more than two decades for its 134 78 factors find their targets is largely unknown, many solution, structural studies of lncRNAs are 135 64,65 79 recent studies indicate that lncRNAs may be a formidable. Fundamental questions regarding 136 80 necessary component of the network that guides the structure of lncRNAs remain unanswered, 137 81 these factors to their chromatin targets. In addition to including: 138 82 epigenetic effects, the origin of lncRNAs is not well 83 understood. While the of simple organisms (1) Is lncRNA mechanism dominated by pri- 139 84 are dominated by protein-coding genes, the advent mary sequence, higher-order structure or 140 85 of non-coding RNAs in higher organisms gave rise to both? 141 86 an incredible sequence space for exploration of new (2) Do lncRNAs contain sub-domains? 142 87 function at the RNA level, creating sophisticated (3) Are lncRNAs complexed with many pro- 143 88 regulatory networks. Possible evolution mechanisms teins or do they exist as isolated RNAs that 144 89 may include pseudogenization of protein-coding transiently interact with proteins? 145 90 genes, insertion of transposons and duplication of 91 genes. Investigations into the open area of the origin (4) Are lncRNAs compact or extended? 146 92 and evolution of lncRNAs have the potential to yield 147 93 exciting results. Here, we attempt to provide a starting point for 148 structural studies of lncRNAs. Because tertiary 149 94 RNA machines structures have not been solved to date, we review 150 tertiary structures of other RNA machines. We next 151 95 RNA molecules are able to form complex molec- summarize currently known structural features of 152 96 ular machines. The ribosome is one of the most well eukaryotic lncRNAs (Fig. 1). Finally, we discuss 153 97 known RNA machines since it has many moving lncRNA structure in the context of previously studied 154 44,45 98 parts and may be powered by GTP hydrolysis. RNA molecular machines and corresponding struc- 155 99 In certain circumstances, the ribosome can function ture/function relationships. 156 46 100 without GTP hydrolysis (“factor-free ”). 101 Factor-free translation is quite inefficient in compar- Crystallographic studies of RNA machines in 157 102 ison to GTP-based translation and in comparison to bacteria and other unicellular organisms 158 103 the far more efficient protein-based molecular 47 104 motors (e.g., ). The ribosome also pro- Three-dimensional structure is often integral to the 159 105 cesses information by performing a look-up table function and mechanism of a biomolecular system. 160 106 operation, converting the 4-letter alpha- Here, we review crystallographic structures of other 161 107 bet into the 20-letter protein alphabet. The group I RNA systems. This review is by no means compre- 162 108 and group II self-splicing are also molecular hensive but highlights the diversity of mechanism 163 109 machines as they use chemical energy to bring and composition of RNA structure. The group I 164 48–50 110 together distant regions of RNA. RNase P and , the group II intron, the ribosome and RNase P 165 111 RNA also catalyze chemical reactions to are the only RNAs with lengths greater than 200 nt 166 51–53 48,66 112 accomplish their function. The is that have been studied at high resolution. 167 54 113 another type of RNA molecular machine. Ribos- The group I and group II introns are self-splicing 168 114 witches are molecular switches that sense their RNAs that catalyze their own cleavage. The struc- 169 115 environment and allow decisions to ture of the group I intron consists largely of an 170 116 be made on the basis of environmental inputs such isolated RNA (~200 nt), organized into a well- 171 55–60 117 as ligand concentration. More complicated defined architecture by co-axially stacked helices 172 118 tandem act as logical AND gates. In connected by single-stranded regions. The pre-2S 173 119 addition, complex combinations of can be state of the group I intron includes an and also 174 120 arranged into logic circuitry (e.g., OR, NAND, NOR contains mainly co-axially stacked helices con- 175 61 67 121 and NOT gates). Finally, RNA scaffolding that nected by single-stranded regions. Four of the 176 122 controls information flow could be analogous to an helices are capped by . This structure 177 123 integrated circuit (a device that has no moving parts consists largely of an isolated RNA with one small 178 124 but controls highly complex information flow in spliceosomal protein (U1A) bound to its extremity. 179 125 computers). Recent studies suggest that several The active site contains two base triples, critical for 180 126 different lncRNA systems may act as scaffolding, precise positioning of ions and . 181 62 127 controlling information flow in epigenetic systems. The structure of the post-catalytic state of the 182 128 Structural studies are often critical in deciphering group II intron, solved by Toor et al., contains 183 129 RNA machines and the mechanism of RNA action. approximately 400 nt, organized into more than 10 184 130 Few structural studies of individual lncRNAs have helices, with most capped by RNA stem–loop 185 68 131 been performed due to their enormous size and very structures. The RNA contains many co-axially 186 20,63 132 recent discovery. Considering that the high- stacked helices, forming a relatively compact overall 187

Please cite this article as: Novikova, I. et al., Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs, J. Mol. Biol. (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030 Q2 eiw tutr fRA n N Machines RNA and RNAs of Structure Review: laect hsatcea:Nvkv,I. Novikova, Biol. as: article this cite Please (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030 tal. et ieo h N ahns xlrn h tutr fLn o-oigRNAs, Non-Coding Long of Structure the Exploring Machines: RNA the of Rise ,

Fig. 1. Examples of non-coding RNAs studied with structural methods. Purple, lncRNAs with existing structural information. Orange, examples of non-coding RNAs with solved crystallographic structures. “3D” denotes systems with crystallographic structures; “2D” denotes systems with experimentally derived secondary structures of the entire RNA. .Mol. J. 3 4 Review: Structure of RNAs and RNA Machines Q2

188 structure. A large number of long-range tertiary intricate five-way junction with many stacked non- 191 189 interactions connect the helices, including a kissing helical . Several base triples exist in the 192 190 loop interaction, a zipper interaction and an Z-anchor region, nearby the magnesium binding 193

Fig. 2. Tertiary structures of previously solved RNA complexes. (a) The group II intron, solved by Chan et al., is a highly structured isolated RNA with a compact core.48 (b) The ribosome, solved by Gao et al., is highly structured and compact.70 Pink, rRNA; purple, tRNA; cyan, ribosomal proteins; yellow, elongation factor EF-G. Different protein factors bind to the same binding sites, regulating protein synthesis. (c) RNase P.51 RNase P is structured and contains a single protein binding . Pink, RNase P RNA; purple, tRNA; yellow, RNase P protein. (d) Cellulosome protein-based molecular machine. The cellulosome represents a relatively disordered but highly efficient molecular machine. Red, white and pink, cellulose substrate; gray, cellulase subunit; purple, dockerin connecting protein; blue, cohesin (disordered linker protein).

Please cite this article as: Novikova, I. et al., Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs, J. Mol. Biol. (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030 Review: Structure of RNAs and RNA Machines 5 Q2

194 sites associated with catalysis. Marcia and Pyle also large-scale conformational fluctuations occur simul- 253 195 uncovered a reversible conformational change in a taneously and at different timescales. Protein binding 254 196 study of the group II intron, which included 14 or GTP hydrolysis events act to synchronize the 255 197 structures, capturing several different intermediates fluctuations, shifting the equilibrium to the next basin 256 69 198 of splicing consistent with biochemical studies. A in the energy landscape and allowing the ribosome to 257 45,73 199 structure of the group II intron prior to the first step of progress through the elongation cycle. 258 48 200 splicing shows a high degree of similarity in overall RNase P (~330–400 nt) can be considered a 259 201 architecture to the post-splicing configuration canonical RNA that is compact, highly structured 260 51 202 (Fig. 2a). Interestingly, a significant conformational and binds a single protein (Fig. 2c). RNase P is an 261 203 change occurs in the core of the intron, where a RNA-based multiple-turnover that catalyzes 262 204 single-stranded region forms a kink motif, producing 5′ end maturation of tRNA and other RNAs. While 263 205 a reversal in the direction of the RNA strand at this the complex is dominated by RNA, a small protein 264 206 location. It has also been suggested that the component (approximately 10% of total mass) 265 207 is mechanistically similar to the group increases the affinity of tRNA to RNase P. The 266 68 208 II intron. overall architecture of the RNA is similar to the self- 267 209 The largest RNA tertiary structure solved to date is splicing introns, consisting mainly of co-axially 268 70 210 the ribosome (Fig. 2b). The architecture of the stacked helices, connected by various tertiary 269 74 211 ribosome comprises two major subunits, whose interactions. 270 212 RNA structure is formed by a large number of The structure of the intact telomerase RNA, TER 271 213 shorter RNA helices and a small number of longer (~450 nt in humans), has yet to be solved; however, 272 214 helices organized into a compact geometry. Approx- this system represents an interesting RNA in the 273 52 215 imately 50 proteins (in the case of bacteria) are context of lncRNAs. The template for telomeric 274 216 interspersed throughout the ribosome complex, repeat extension resides within the TER. The 275 217 producing a highly compact, globular architecture. telomerase RNA (TER) binds to telomerase reverse 276 218 The globular domains of ribosomal proteins them- transcriptase (TERT), facilitating addition 277 219 selves typically reside on the exterior of the complex to telomeric regions. Crystal structures of portions of 278 220 and possess long tails extending to the interior of the the telomerase RNA have been solved. These 279 221 ribosome, near the active sites in some cases. The structures show a high degree of organization with 280 222 70S complex is interconnected by an enormous several tertiary interaction motifs, including a well- 281 223 number of tertiary contacts, including both RNA/RNA studied and functionally important near 282 224 and RNA/protein interactions. The structure of the the RNA template sequence. The structural integrity 283 225 ribosome is uniquely designed for its function: the of the RNA is dependent to some degree on the 284 75 226 ribosome must (i) accept only correct tRNAs carrying interactions of TERT and accessory proteins. In 285 227 amino corresponding to the mRNA codon, (ii) yeast, the precise location of the telomerase subunit 286 228 perform chemistry (adding the to the (est1p's binding domain on this RNA) is not the key 287 71 229 nascent protein) and (iii) move the tRNAs through to its function and can be positioned within the RNA 288 72 53 230 the ribosome. A large cavity exists between the at different locations while maintaining function. 289 231 small and large subunits. This cavity is used to These important experiments suggest that the 290 232 transport the tRNA ligands through the ribosome. telomerase RNA may act as flexible scaffolding for 291 233 The six major active sites of the ribosome are RNA protein binding. Another telomerase-associated 292 234 based. In addition to ribosomal proteins, other RNA is the -repeat-containing RNA 293 235 protein factors bind to the ribosome, facilitating the (TERRA). This RNA is transcribed from positions 294 236 initiation and termination of translation, translocation near the telomeric ends and includes sequence from 295 237 of the ribosome along the mRNA, improved fidelity of the telomeric repeats. This RNA has been found to 296 238 tRNA selection and regulation of protein synthesis. associate with chromatin. TERRA appears to local- 297 239 While ribosomal proteins bind all over the ribosome, ize near the . Its association is negatively 298 240 protein factors bind at a few select sites, associating regulated by effectors of the nonsense-mediated 299 76 241 and dissociating with the ribosome at various stages. RNA decay pathway. The presence of increased 300 77 242 These include the GTP-associated center and the concentrations of TERRA may inhibit telomerase. 301 243 three tRNA binding sites (the aminoacyl site, the 244 peptidyl site and the exit site). Elongation factors EF- LncRNA mechanisms based on primary structure 302 245 Tu and EF-G each bind to GTP-associated center 246 during different stages of the elongation. In several cases, mechanistic information has 303 247 Several different RNA action mechanisms allow been gained by studying the primary structure of 304 248 the ribosome to accomplish its function. Locally, a the lncRNA. Here, the role of lncRNA is mainly to 305 249 conformational selection mechanism may act at the provide sequence specificity to a process. Such 306 250 decoding center (the aminoacyl site on the small mechanisms are well known in other RNA systems 307 251 subunit) during certain stages of elongation. During such as RNAi (RNA interference), where base 308 252 translocation of the ribosome along the mRNA, many pairing between short regulatory RNAs [siRNA 309

Please cite this article as: Novikova, I. et al., Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs, J. Mol. Biol. (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030 6 Review: Structure of RNAs and RNA Machines Q2

310 (small interfering RNA) or miRNA (micro RNA)] and tration shifts the equilibrium between secondary 367 78 311 mRNA plays a critical role in gene regulation. structures, resulting in the formation or destruction 368 312 RISC (RNA-induced silencing complex) complexes of a transcriptional helix. In riboswitch 369 313 use these short RNAs as templates to locate and studies, the action mechanism (i.e., that the RNA is a 370 78–84 314 silence their mRNA targets. The protein-based ligand-based molecular switch) is determined by 371 315 RISC complex provides the structural architecture studying the secondary structure, without the need 372 316 and is responsible for either directly cleaving the for tertiary structure studies. Breaker et al., who have 373 85 317 target or down-regulating translation. discovered most of the known riboswitches, used 374 318 A first example is the 1/2-sbsRNA, which facilitates chemical probing experiments (in-line probing) to 375 319 mRNA decay by base pairing with the 3′-untrans- determine the secondary structure of the riboswitch 376 13 56,57,90,91 320 lated region (STAU1-mediated mRNA decay). in the presence or absence of the ligand. 377 321 Once bound, the lncRNA/mRNA complex is recog- Their technique has been validated in many crystal- 378 54,59,92,93 322 nized by the double-stranded RNA binding protein, lographic studies. While many computa- 379 323 STAU1, triggering mRNA decay. Another example is tional techniques have been developed to predict 380 324 the DHFR lncRNA, which is transcribed from a minor secondary structure, predicting the secondary struc- 381 325 of the dihydrofolate reductase gene and ture of long RNA sequences remains a significant 382 94–98 326 plays a key role in the epigenetic mechanism of challenge. One promising strategy uses ma- 383 86 95–97 327 promoter-specific repression of . chine learning approaches. Our experimentally 384 328 Here, the lncRNA forms a complex with the major determined secondary structures are convenient 385 329 promoter and also interacts with the transcription benchmarks for new predictive algorithms. 386 330 factor IIB, disrupting the preinitiation complex. Our group has produced the first experimentally 387 331 Additional evidence exists in other systems for the derived secondary structure of an intact lncRNA, the 388 20 332 cotranscriptional recruitment of chromatin modifying steroid receptor RNA (SRA). This 389 333 complexes utilizing base pairing between the na- lncRNA co-activates a variety of sex hormone 390 87 334 scent RNA and its antisense transcript. Antisense receptors (ER, AR, TR, GR, RAR) and has been 391 335 lncRNAs have been also shown to regulate splicing shown to directly interact with several proteins (ER, 392 99–101 336 by interacting with mRNAs. The long non-coding SHARP, SLIRP, DAX-1, SF-1, TR, Pus1p). 393 337 Zeb2 natural antisense transcript is antisense to a 5′ SRA has been found to associate with CTCF and is 394 338 splice site of the Zeb2 mRNA. Binding of Zeb2 thought to play a scaffolding role in the transcription 395 102 339 inhibits splicing of mRNA and, as a result, preserves complex. SRA is also strongly associated with 396 340 the internal ribosome entry site, necessary for breast cancer and may be useful as an early onset 397 88 341 efficient translation. Linc-MD1, associated with tumorigenesis marker. As many lncRNAs are similar 398 342 muscle differentiation, is an example of an lncRNA in size to ribosomal subunits, we followed the 399 343 that serves as a decoy for miRNAs. This lncRNA footsteps of Noller, Woese and Gutell by (i) 400 344 regulates miRNA action by providing alternative performing extensive chemical probing to produce 401 345 binding sites, effectively titrating the miR-133 and a secondary structure of the full lncRNA (~870 nt) 402 89 346 miR-135 away from their targets. These binding and (ii) validating with analysis across 403 103,104 347 sites tend to be relatively short (i.e., much shorter multiple sequences (Fig. 3). 404 348 than the total length of the lncRNA). The much larger Our experiments uncovered an intricate and highly 405 349 remainder of the lncRNA may play an additional structured two-dimensional architecture of the 406 350 structural role in the lncRNA's mechanism of action lncRNA, organized into four major sub-domains. To 407 351 yet to be determined. determine the secondary structure, we employed 408 four methods of structural probing: selective 2′- 409 352 Structural probing studies of lncRNAs hydroxyl acylation analyzed by primer extension 410 (SHAPE), in-line probing, DMS probing and RNase 411 353 Secondary structure often plays a critical role in VI digestion. The four methods yielded complemen- 412 354 RNA mechanism, underpinning overall tertiary ar- tary information about the secondary structure of the 413 355 chitecture by defining helices, bulges, stem–loops, intact lncRNA and were highly consistent with each 414 356 internal loops, junctions and sub-domains. Second- other. Overall, the lncRNA comprises 25 helical 415 357 ary structure alone can provide the basis of function segments (H1–H25), 16 terminal loops, 15 internal 416 358 when (i) the secondary structure defines a platform loops and 5 junction regions (Fig. 3), consistent with 417 359 for unique protein recognition or (ii) a single the 16S rRNA (~1540 nt), which contains 45 helices, 418 360 sequence produces two competing secondary struc- 31 terminal loops, 26 internal loops and 18 junction 419 361 tures, as in the case of the riboswitch. The riboswitch regions. In the SRA system, helices H1–H7 com- 420 362 represents a quintessential secondary-structure- prise domain I. Helices H10–H14 comprise domain 421 55–60 363 based RNA mechanism. Here, a single sequence II. Helices H8, H9 and H15–H21 comprise domain III, 422 364 of RNA has two different secondary structures, while helices H22–H25 comprise domain IV. While 423 365 which compete with each other to determine the domains I–III are highly conserved, representing the 424 366 outcome of gene expression. Often, ligand concen- core region of SRA, domain IV is evolutionarily 425

Please cite this article as: Novikova, I. et al., Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs, J. Mol. Biol. (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030 Review: Structure of RNAs and RNA Machines 7 Q2

Fig. 3. First experimentally derived secondary structure of an intact higher lncRNA, to our knowledge. The human SRA has a length of 870 nt, organized into 4 sub-domains and 25 helices. Four biochemical techniques were used to obtain the secondary structure (SHAPE, in-line, DMS and RNase V1). The vast majority of helices were validated with covariance analysis based on multiple across . Yellow, domain I; blue, domain II; purple, domain III; green, domain IV. Inset: secondary structure annotated with SHAPE probing results (SHAPE). Orange/red, high reactivity nucleotides; yellow, medium reactivity; gray/black, low/no mobility.

426 divergent and highly variable in sequence across though the helical density is slightly lower compared 432 427 mammals. Therefore, our initial expectation was that to the rest of the sequence. This is consistent with 433 428 this region would have little structure because there the recently solved crystallographic structure of the 434 429 was no obvious sequence conservation across 80S yeast ribosome, whose expansion segments 435 430 species. However, we found that this domain does are highly variable in sequence yet highly structured 436 105 431 comprise a number of helical components, even in the crystallographic study. 437

Please cite this article as: Novikova, I. et al., Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs, J. Mol. Biol. (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030 8 Review: Structure of RNAs and RNA Machines Q2

438 We also found that the SRA RNA secondary to select the correct regional fold, but for some RNA 461 439 structure contains many highly conserved, - sections, more information was required. In these 462 440 rich elements, primarily located in single-stranded regions, we opted to use an additional experimental 463 441 regions of RNA. The same preference toward single- technique, Shotgun Secondary Structure determi- 464 442 stranded regions is also found in riboswitches and nation, where sub-fragments of the full RNA are 465 443 , attributed to the base-stacking propen- probed to determine modularly folded sub-domains 466 20 444 sity of . For example, in the eukaryotic of SRA. This technique enabled us to determine 467 445 ribosome, expansion segment 6 (ES6) contains much of the secondary structures of domain II and 468 105 446 purine-rich single-stranded regions. While, at domain III. We note that several interesting studies 469 447 first glance, this secondary structure might suggest of viral secondary structures have been performed 470 106–109 448 a significant amount of disorder, the crystallographic using SHAPE probing. To obtain the second- 471 449 structure reveals a high degree of organization, with ary structure fold, we incorporated SHAPE reactivity 472 450 many of the single-stranded nucleotides participat- data with structure predictions based on thermody- 473 451 ing in non-Watson-Crick base pairs (Fig. 4). Using a namic parameters. Covariance analysis and frag- 474 452 multiple sequence alignment across 45 species for ment-based probing strategies such as the shotgun 475 453 the SRA gene, we were able to calculate the degree methodology can be used to help validate these 476 454 of covariation of base pairs between species. structures. 477 455 Helices H2, H3, H4, H6, H7, H8, H9, H12, H13, In addition to studies of single intact lncRNAs, 478 456 H14, H15, H19, H20 and H21 possess at least one Chang et al. are pioneering genome-wide studies of 479 457 covariant . RNA secondary structure. Performing RNase diges- 480 458 Generally, the longer the RNA sequence, the more tion as a function of temperature, they have 481 459 alternative folding choices are present. Probing produced a technique capable of measuring RNA 482 63 460 experiments on SRA were sufficient in most cases folding energies on a genome-wide scale. The 483

U G G U (a) U U G U C U U U U A U A G U G C A U C U C G A U G C U G G A U U U U U G U CC A A G A C G G U U A U A G A G C A U G G U A U A A A U A CG A G U AA C A G G U A A U A G

G U U C A U U UU A A U C G G A C G U A A U A A G U A A A U A C U

(b)

Fig. 4. Expansion segment ES6 of the yeast ribosome small subunit. (a) Secondary structure of ES6 displays a large internal loop (22 nt) and a large stem–loop (13- nt loop). (b) Despite the lack of canonical base pairs, both the internal loop and the stem–loop are highly structured in the recent X-ray structure of the yeast ribo- some solved by Ben-Shem et al.105 Many of the purines in the internal loop form noncanonical base pairs.

Please cite this article as: Novikova, I. et al., Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs, J. Mol. Biol. (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030 Review: Structure of RNAs and RNA Machines 9 Q2

484 experiments reveal key features near the by recruiting the PRC2 and -specific demethy- 543 485 in mRNAs. The study also suggests that lncRNAs, on lase (LSD), which each modify at targeted 544 486 average, appear to be structured, with a level of loci. Deletions of portions of the HOTAIR RNA show 545 487 structural content in between the structural content of that PRC2 binds to a 300-nt region in the 5′ end of 546 488 the ribosome and the structural content of a typical HOTAIR and LSD binds to a 646-nt region at the 3′ 547 489 mRNA. Lucks et al. have developed a method to end. The intervening sequence may spatially orga- 548 490 perform SHAPE probing with the Illumina platform, nize the two interaction sites. This region may also 549 491 which may also be scalable to genome-wide contain motifs necessary for targeting. The exact 550 110 492 studies. New SHAPE reagents for in vivo RNA structures of the LSD1 binding motifs have not been 551 493 probing, with better solubility and increased half-, determined. Another system, the growth-arrest 552 111 494 have recently been introduced by Spitale et al. specific non-coding transcript (), contains a 553 495 Interesting structural elements involved in non- hairpin element. This element is responsible for 554 496 canonical 3′ end processing mechanisms have regulation of the glucocorticoid receptor (GR) via a 555 26,27,112 497 been determined for other lncRNA. Matura- decoy mechanism, mimicking the DNA hormone- 556 21 498 tion of the NEAT1_v2 (MEN beta) and MALAT1 responsive element. 557 499 transcripts relies on the formation of a cloverleaf 500 (four-way junction) secondary structure. It is a highly Possibilities for lncRNA three-dimensional 558 501 conserved element, which mimics pre-tRNA sub- structure 559 502 strate and recruits the RNase P complex. Subse- 503 quently, RNase P cleaves upfront of this element, In the case of lncRNAs, many interesting ques- 560 504 generating the mature ends of NEAT1_v2 and tions are yet to be answered. For example, should 561 505 MALAT1 transcripts. The cleaved 3′ end fragments we expect to encounter structural motifs already met 562 506 undergo an additional processing by RNase Z to in known RNA systems? What are the structural 563 507 yield shorter tRNA-like transcripts, named differences at the RNA level across various organ- 564 508 mascRNA (generated from MALAT1) and menRNA isms? In light of the rapid turnover of lncRNAs, do we 565 509 (generated from NEAT1_v2). expect the evolution of unique lineage-specific 566 510 The human accelerated region RNA, HAR1, structural elements? 567 511 associated with neocortex development, also con- We currently lack information on the tertiary 568 512 tains a cloverleaf element, as evidenced by structure of lncRNAs. In addition, it is not known if 569 113 513 structural probing studies. Interestingly, signifi- lncRNAs exist in ribonucleoprotein complexes 570 514 cant sequence and structure divergence is observed (RNPs) or predominantly as isolated RNAs. With 571 515 between human and chimpanzee (18 ), regard to tertiary structure, the ribosome is an 572 114,115 516 attributed to brain evolution. In chimpanzee, interesting system for comparison since the ribosome 573 517 structure probing of this region revealed an extended is the only RNA system N1 kb whose crystallographic 574 113 518 hairpin structure. It has been suggested that the structure has been solved. In terms of the composition 575 519 chimpanzee RNA may form a cloverleaf element in of a typical lncRNA complex, the total number of 576 520 the presence of an additional protein. unique lncRNAs and unique proteins is a useful 577 521 Another element observed in lncRNAs is the constraint. As mentioned above, the number of 578 522 double stem–loop, often associated with chromatin human lncRNAs has been estimated to be 579 5 523 remodeling. This is a particularly interesting example ~15,000, while the number of protein-coding genes 580 122 524 because many lncRNAs have been shown to play is ~21,000. While an lncRNA-based ribosome-like 581 525 important roles in chromatin remodeling, a focus of RNP complex could exist, it is unlikely that most 582 15,116,117 526 many transcriptome-wide lncRNA studies. lncRNAs are ribosome-like RNPs, given the limited 583 527 In embryonic stem cells, Zhao et al. and Surface et al. number of unique protein-coding genes relative to the 584 528 discovered N9000 lncRNAs that interact with the number of unique lncRNA transcripts. Ruling out 585 118,119 529 polycomb repressive complex (PRC2). Bio- ribosome-like RNA complexes, the following possibil- 586 530 chemical analysis of PRC2-interacting lncRNAs ities remain for the composition of lncRNA complexes: 587 531 showed that binding occurs through EZH2, a (1) RNP complexes with many repeats of a few 588 532 component of PRC2. In another study, a CLIP-seq proteins, (2) RNP complexes with only a few proteins 589 533 investigation of RNAs associated with the SFRS1 or (3) isolated RNAs that transiently bind proteins as 590 534 splicing factor uncovered N6000 spliced non-coding needed for function. For scenario (1), ~10 protein 591 535 RNAs. Overall, chromatin remodeling lncRNAs copies per 1 kb of lncRNA would be required to 592 536 appear to act in trans via association with chromatin produce complexes with a similar protein-to-RNA ratio 593 116 537 modifying . It has been suggested, for as the ribosome (e.g., 20 protein copies for HOTAIR). 594 538 the system, that some of these interactions may In scenarios (2) and (3), lncRNA complexes would 595 120,121 539 occur via double stem–loop RNA elements. be similar in composition to RNase P, telomerase 596 540 Double stem–loop RNA motifs have also been RNA or the group I/II introns. An “RNase-P-like” 597 541 implicated in a PRC2-binding region of the HOTAIR lncRNA complex would contain a highly structured 598 12 542 lncRNA. HOTAIR (2.2 kb) regulates HoxD genes and compact RNA core with a single protein binding 599

Please cite this article as: Novikova, I. et al., Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs, J. Mol. Biol. (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030 10 Review: Structure of RNAs and RNA Machines Q2

600 site (Fig. 5a). Alternatively, the RNA could be Here, the RNA may transiently bind proteins that 613 601 structurally decentralized without a compact core, each perform a specific function. 614 602 containing several distinct protein binding sites 603 (Fig. 5c). The lncRNA may also act as a flexible Supramolecular lncRNA complexes 615 604 structural scaffold, as suggested for the telomerase 605 RNA (Fig. 5b). In loose terms, such a flexible RNA The large size of certain lncRNA systems has 616 606 could be analogous to the cellulosome, where many produced new kinds of RNA complexes on a scale 617 607 cellulase subunits are connected by long disordered not previously studied in the context of structural 618 608 linker proteins (Fig. 2d). The cellulase subunits do biology. In X chromosome inactivation, the human 619 609 not act coherently but allow the cellulosome to XIST is responsible for on the inactive 620 610 efficiently process cellulose. Finally, the lncRNA X chromosome (Xi). At 17 kb, XIST is one of the 621 611 could exist predominantly as an isolated, highly largest lncRNAs. A large number of XIST copies are 622 612 structured RNA, similar to the self-splicing introns. transcribed to physically coat the X chromosome. 623

(a) (b)

(c)

Fig. 5. Homology models representing potential paradigms for lncRNA tertiary structure. (a) Compact core. The lncRNA (pink) is highly structured and has a protein (yellow) binding site that may bind different proteins. (b) Loosely organized protein binding domain with relatively unstructured RNA. (c) Decentralized. The lncRNA possesses several protein (yellow) binding sites but no compact core.

Please cite this article as: Novikova, I. et al., Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs, J. Mol. Biol. (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030 Review: Structure of RNAs and RNA Machines 11 Q2

624 FISH experiments suggest that XIST RNAs not only possible that the full mechanistic arsenal of previ- 679 625 interact with chromatin but also crosstalk to produce ously studied RNAs may be employed in various 680 626 a very large structural architecture on the scale of the lncRNA systems. Because we are in the early stages 681 123–125 627 entire X chromosome. The kcnq1ot1 lncRNA of lncRNA research, it is not clear if majority of these 682 16,18 628 (108 kb) is also thought to coat . systems provide scaffolding for gene regulation 683 629 Other lncRNAs participate in subcellular struc- complexes or engage in more active roles, such as 684 630 ture formation, including NEAT1 ( catalysis, molecular switching or information proces- 685 27,37,38,126 631 formation) and MALAT1/NEAT2 (nuclear sing. With the identification of many thousands of 686 28,36,33,127,128 632 speckle formation). NEAT1 non-coding lncRNAs in recent years, it is clear that new 687 633 RNA has two isoforms: a short NEAT1_V1 (3.7 kb) structural studies will play a key role in demystifying 688 634 and a long NEAT1_V2 (22.7 kb). While the shorter these strange, new RNA machines. 689 635 transcript aids in the paraspeckle assembly, a longer 636 transcript NEAT1_V2 acts as an essential scaffold, 637 creating forming a structural network with sausage- 129,130 638 like morphology. To date, we identify up to 35 639 paraspeckle-associated proteins, which range from Acknowledgements 690 640 the splicing factors, the 3′ end processing enzymes 129 The work was performed under the auspices of the 691 641 and the disease-related proteins. US Department of Energy via Los Alamos National 692 Laboratory, Laboratory Directed Research and 693 642 From structure to mechanism: lncRNA dynamics Development. 694

643 The dynamics of lncRNAs must be studied to fully 644 understand their mechanisms. Timescales, in partic- Received 20 December 2012; 695 645 ular, have been shown to play a critical role in Received in revised form 21 February 2013; 696 646 understanding the mechanism of other molecular Accepted 25 February 2013 697 647 machines. In the ribosome system, rapid kinetics 698 648 studies placed many useful constraints on the order of Keywords: 699 649 events in tRNA selection and translocation prior to the long non-coding RNA; 700 131,132 650 solution of X-ray structures of the ribosome. lncRNA; 701 651 With structural data in hand, single-molecule studies lincRNA; 702 652 have identified more sub-steps of these processes, lncRNA structure; 703 653 improved our understanding of transitions between epigenetics 704 654 states and produced a new framework for 705 73 655 mechanism. Abbreviations used: 706 656 As lncRNAs have only recently emerged as a lncRNA, long non-coding RNA; SRA, steroid receptor 707 657 class of RNAs, very little is known about the relevant RNA activator; SHAPE, selective 2′-hydroxyl acylation 708 658 timescales of lncRNA function. One recently discov- analyzed by primer extension; LSD, lysine-specific de- 709 659 ered lncRNA, DBE-T, is an illustrative example with methylase; RNPribonucleoprotein complex. 710 660 respect to processes that can occur in lncRNA 40 661 systems. Here, the DBE-T lncRNA is an important References 711 662 part of the epigenetic switch associated with 663 facioscapulohumeral muscular dystrophy. DBE-T is 1. Taft, R. J., Pheasant, M. & Mattick, J. S. (2007). The 712 relationship between non-protein-coding DNA and 713 664 cis-acting RNA and tethers epigenetic factors to the eukaryotic complexity. BioEssays, 29, 288–299. 714 665 D4Z4 binding element (DBE). As a result, histone 2. Djebali, S., Davis, C. A., Merkel, A., Dobin, A., 715 666 occurs. Here, timescales related to Lassmann, T., Mortazavi, A. et al. (2012). Landscape 716 667 transcription, folding, protein binding to the RNA, of transcription in human cells. Nature, 489, 101–108. 717 668 protein binding to the chromatin and histone meth- 3. Mehler, M. F. & Mattick, J. S. (2007). Noncoding 718 669 ylation should be studied to determine the rate- RNAs and RNA editing in brain development, 719 670 limiting step. functional diversification, and neurological disease. 720 Physiol. Rev. 87, 799–823. 721 4. Kapranov, P., Cheng, J., Dike, S., Nix, D. A., 722 671 Summary Duttagupta, R., Willingham, A. T. et al. (2007). RNA 723 maps reveal new RNA classes and a possible 724 672 LncRNAs have emerged as a new class of RNAs, function for pervasive transcription. Science, 316, 725 673 playing important roles in development, stem cells, 1484–1488. 726 674 cancer, brain disease and epigenetic mechanism. 5. Derrien, T., Johnson, R., Bussotti, G., Tanzer, A., 727 675 LncRNA mechanism may be based on sequence, Djebali, S., Tilgner, H. et al. (2012). The GENCODE 728 676 secondary structure, tertiary structure or a new v7 catalog of human long noncoding RNAs: analysis 729 677 combination of these mechanisms. In light of the of their , evolution, and expression. 730 678 high degree of diversity among lncRNAs, it is Genome Res. 22, 1775–1789. 731

Please cite this article as: Novikova, I. et al., Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs, J. Mol. Biol. (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030 12 Review: Structure of RNAs and RNA Machines Q2

732 6. Rinn, J. L. & Chang, H. Y. (2012). Genome regulation 22. Lin, D., Pestova, T. V., Hellen, C. U. & Tiedge, H. 796 733 by long noncoding RNAs. Annu. Rev. Biochem. 81, (2008). Translational control by a small RNA: 797 734 145–166. dendritic BC1 RNA targets the eukaryotic initiation 798 735 7. Ziats, M. N. & Rennert, O. M. (2012). Aberrant factor 4A mechanism. Mol. Cell. Biol. 28, 799 736 expression of long noncoding RNAs in autistic brain. 3008–3019. 800 737 J. Mol. Neurosci. 49, 589–593. 23. Pauli, A., Rinn, J. L. & Schier, A. F. (2011). Non- 801 738 8. Brunner, A. L., Beck, A. H., Edris, B., Sweeney, R. T., coding RNAs as regulators of embryogenesis. Nat. 802 739 Zhu, S. X., Li, R. et al. (2012). Transcriptional profiling Rev., Genet. 12, 136–149. 803 740 of lncRNAs and novel transcribed regions across a 24. Huarte, M. & Rinn, J. L. (2010). Large non-coding 804 741 diverse panel of archived human cancers. Genome RNAs: missing links in cancer? Hum. Mol. Genet. 19, 805 742 Biol. 13, R75. R152–R161. 806 743 9. Sati, S., Ghosh, S., Jain, V., Scaria, V. & Sengupta, 25. Tsai, M. C., Spitale, R. C. & Chang, H. Y. (2011). 807 744 S. (2012). Genome-wide analysis reveals distinct Long intergenic noncoding RNAs: new links in 808 745 patterns of epigenetic features in long non-coding cancer progression. Cancer Res. 71,3–7. 809 746 RNA loci. Nucleic Acids Res. 40, 10018–10031. 26. Wilusz, J. E., Freier, S. M. & Spector, D. L. (2008). 3′ 810 747 10. Zhang, X., Sun, S., Pu, J. K., Tsang, A. C., Lee, D., end processing of a long nuclear-retained noncoding 811 748 Man, V. O. et al. (2012). Long non-coding RNA RNA yields a tRNA-like cytoplasmic RNA. Cell, 135, 812 749 expression profiles predict clinical in 919–932. 813 750 glioma. Neurobiol. Dis. 48,1–8. 27. Sunwoo, H., Dinger, M. E., Wilusz, J. E., Amaral, 814 751 11. Banfai, B., Jia, H., Khatun, J., Wood, E., Risk, B., P. P., Mattick, J. S. & Spector, D. L. (2009). MEN 815 752 Gundling, W. E., Jr et al. (2012). Long noncoding epsilon/beta nuclear-retained non-coding RNAs are 816 753 RNAs are rarely translated in two human cell lines. up-regulated upon muscle differentiation and are 817 754 Genome Res. 22, 1646–1657. essential components of . Genome 818 755 12. Gupta, R. A., Shah, N., Wang, K. C., Kim, J., Res. 19, 347–359. 819 756 Horlings, H. M., Wong, D. J. et al. (2010). Long 28. Tano, K. & Akimitsu, N. (2012). Long non-coding 820 757 non-coding RNA HOTAIR reprograms chromatin RNAs in cancer progression. Front. Genet. 3, 219. 821 758 state to promote cancer metastasis. Nature, 464, 29. Tani, H., Nakamura, Y., Ijiri, K. & Akimitsu, N. (2010). 822 759 1071–1076. Stability of MALAT-1, a nuclear long non-coding RNA 823 760 13. Gong, C. & Maquat, L. E. (2011). lncRNAs transacti- in mammalian cells, varies in various cancer cells. 824 761 vate STAU1-mediated mRNA decay by duplexing Drug Discoveries Ther. 4, 235–239. 825 762 with 3′ UTRs via Alu elements. Nature, 470, 284–288. 30. Dinger, M. E., Amaral, P. P., Mercer, T. R., Pang, 826 763 14. Ponting, C. P., Oliver, P. L. & Reik, W. (2009). K. C., Bruce, S. J., Gardiner, B. B. et al. (2008). 827 764 Evolution and functions of long noncoding RNAs. Long noncoding RNAs in mouse embryonic stem 828 765 Cell, 136, 629–641. cell pluripotency and differentiation. Genome Res. 829 766 15. Guttman, M., Amit, I., Garber, M., French, C., Lin, M. F., 18, 1433–1445. 830 767 Feldser, D. et al. (2009). Chromatin signature reveals 31. Sone, M., Hayashi, T., Tarui, H., Agata, K., Takeichi, 831 768 over a thousand highly conserved large non-coding M. & Nakagawa, S. (2007). The mRNA-like noncod- 832 769 RNAs in mammals. Nature, 458, 223–227. ing RNA Gomafu constitutes a novel nuclear domain 833 770 16. Korostowski, L., Sedlak, N. & Engel, N. (2012). The in a subset of neurons. J. Cell Sci. 120, 2498–2506. 834 771 Kcnq1ot1 long non-coding RNA affects chromatin 32. Mercer, T. R., Dinger, M. E., Sunkin, S. M., Mehler, 835 772 conformation and expression of Kcnq1, but does not M. F. & Mattick, J. S. (2008). Specific expression of 836 773 regulate its imprinting in the developing heart. PLoS long noncoding RNAs in the mouse brain. Proc. Natl. 837 774 Genet. 8, e1002956. Acad. Sci. USA, 105, 716–721. 838 775 17. Sleutels, F., Zwart, R. & Barlow, D. P. (2002). The 33. Bernard, D., Prasanth, K. V., Tripathi, V., Colasse, S., 839 776 non-coding Air RNA is required for silencing autoso- Nakamura, T., Xuan, Z. et al. (2010). A long nuclear- 840 777 mal imprinted genes. Nature, 415, 810–813. retained non-coding RNA regulates synaptogenesis 841 778 18. Pandey, R. R., Mondal, T., Mohammad, F., Enroth, by modulating gene expression. EMBO J. 29, 842 779 S., Redrup, L., Komorowski, J. et al. (2008). 3082–3093. 843 780 Kcnq1ot1 antisense noncoding RNA mediates line- 34. Dharap, A., Nakka, V. P. & Vemuganti, R. (2012). 844 781 age-specific transcriptional silencing through chro- Effect of focal ischemia on long noncoding RNAs. 845 782 matin-level regulation. Mol. Cell, 32, 232–246. Stroke, 43, 2800–2802. 846 783 19. Vallot, C., Huret, C., Lesecque, Y., Resch, A., 35. Clemson, C. M., Hutchinson, J. N., Sara, S. A., 847 784 Oudrhiri, N., Bennaceur-Griscelli, A. et al. (2013). Ensminger, A. W., Fox, A. H., Chess, A. & 848 785 XACT, a long noncoding transcript coating the active Lawrence, J. B. (2009). An architectural role for a 849 786 X chromosome in human pluripotent cells. Nat. nuclear noncoding RNA: NEAT1 RNA is essential 850 787 Genet. 45, 239–241. for the structure of paraspeckles. Mol. Cell, 33, 851 788 20. Novikova, I. V., Hennelly, S. P. & Sanbonmatsu, K. Y. 717–726. 852 789 (2012). Structural architecture of the human long 36. Tano, K., Mizuno, R., Okada, T., Rakwal, R., Shibato, 853 790 non-coding RNA, steroid receptor RNA activator. J., Masuo, Y. et al. (2010). MALAT-1 enhances cell 854 791 Nucleic Acids Res. 40, 5034–5051. motility of lung adenocarcinoma cells by influencing 855 792 21. Kino, T., Hurt, D. E., Ichijo, T., Nader, N. & Chrousos, the expression of motility-related genes. FEBS Lett. 856 793 G. P. (2010). Noncoding RNA gas5 is a growth 584, 4575–4580. 857 794 arrest- and starvation-associated of the 37. Mao, Y. S., Sunwoo, H., Zhang, B. & Spector, D. L. 858 795 glucocorticoid receptor. Sci. Signaling, 3, ra8. (2011). Direct visualization of the co-transcriptional 859

Please cite this article as: Novikova, I. et al., Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs, J. Mol. Biol. (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030 Review: Structure of RNAs and RNA Machines 13 Q2

860 assembly of a nuclear body by noncoding RNAs. Nat. 54. Batey, R. T., Gilbert, S. D. & Montange, R. K. (2004). 924 861 Cell Biol. 13, 95–101. Structure of a natural -responsive riboswitch 925 862 38. Ip, J. Y. & Nakagawa, S. (2011). Long non-coding complexed with the metabolite . Na- 926 863 RNAs in nuclear bodies. Dev., Growth Differ. [Epub ture, 432, 411–415. 927 864 ahead of print]. 55. Montange, R. K. & Batey, R. T. (2008). Riboswitches: 928 865 39. Ietswaart, R., Wu, Z. & Dean, C. (2012). Flowering emerging themes in RNA structure and function. 929 866 time control: another window to the connection Annu. Rev. Biophys. 37, 117–133. 930 867 between antisense RNA and chromatin. Trends 56. Winkler, W., Nahvi, A. & Breaker, R. R. (2002). 931 868 Genet. 28, 445–453. derivatives bind messenger RNAs directly 932 869 40. Cabianca, D. S., Casa, V., Bodega, B., Xynos, A., to regulate bacterial gene expression. Nature, 419, 933 870 Ginelli, E., Tanaka, Y. & Gabellini, D. (2012). A long 952–956. 934 871 ncRNA links copy number variation to a polycomb/- 57. Tucker, B. J. & Breaker, R. R. (2005). Riboswitches 935 872 trithorax epigenetic switch in FSHD muscular dys- as versatile gene control elements. Curr. Opin. 936 873 trophy. Cell, 149, 819–831. Struct. Biol. 15, 342–348. 937 874 41. Swiezewski, S., Liu, F., Magusin, A. & Dean, C. 58. Hennelly, S. P. & Sanbonmatsu, K. Y. (2011). 938 875 (2009). Cold-induced silencing by long antisense Tertiary contacts control switching of the SAM-I 939 876 transcripts of an Arabidopsis Polycomb target. riboswitch. Nucleic Acids Res. 39, 2416–2431. 940 877 Nature, 462, 799–802. 59. Stoddard, C. D., Montange, R. K., Hennelly, S. P., 941 878 42. Song, J., Angel, A., Howard, M. & Dean, C. (2012). Rambo, R. P., Sanbonmatsu, K. Y. & Batey, R. T. 942 879 Vernalization—a cold-induced epigenetic switch. J. (2010). Free state conformational sampling of the 943 880 Cell Sci. 125, 3723–3731. SAM-I riboswitch domain. Structure, 18, 944 881 43. Mizutani, R., Wakamatsu, A., Tanaka, N., Yoshida, 787–797. 945 882 H., Tochigi, N., Suzuki, Y. et al. (2012). Identification 60. Hayes, R. L., Noel, J. K., Mohanty, U., Whitford, P. C., 946 883 and characterization of novel genotoxic stress- Hennelly, S. P., Onuchic, J. N. & Sanbonmatsu, 947 884 inducible nuclear long noncoding RNAs in mamma- K. Y. (2012). Magnesium fluctuations modulate RNA 948 885 lian cells. PLoS One, 7, e34949. dynamics in the SAM-I riboswitch. J. Am. Chem. Soc. 949 886 44. Frank, J., Jr & Gonzalez, R. L. (2010). Structure and 134, 12043–12053. 950 887 dynamics of a processive Brownian motor: the trans- 61. Penchovsky, R. & Breaker, R. R. (2005). Computa- 951 888 lating ribosome. Annu. Rev. Biochem. 79,381–412. tional design and experimental validation of oligonu- 952 889 45. Sanbonmatsu, K. Y. (2012). Computational studies cleotide-sensing allosteric . Nat. 953 890 of molecular machines: the ribosome. Curr. Opin. Biotechnol. 23, 1424–1433. 954 891 Struct. Biol. 22, 168–174. 62. Guttman, M. & Rinn, J. L. (2012). Modular regulatory 955 892 46. Gavrilova, L. P., Kostiashkina, O. E., Koteliansky, V. E., principles of large non-coding RNAs. Nature, 482, 956 893 Rutkevitch, N. M. & Spirin, A. S. (1976). Factor-free 339–346. 957 894 (“non-enzymic”) and factor-dependent systems of 63. Wan, Y., Qu, K., Ouyang, Z., Kertesz, M., Li, J., 958 895 translation of polyuridylic acid by Escherichia coli Tibshirani, R. et al. (2012). Genome-wide measure- 959 896 ribosomes. J. Mol. Biol. 101, 537–552. ment of RNA folding energies. Mol. Cell, 48, 960 897 47. Kitao, A., Yonekura, K., Maki-Yonekura, S., Samatey, 169–181. 961 898 F. A., Imada, K., Namba, K. & Go, N. (2006). Switch 64. Appelt, K., Dijk, J., Reinhardt, R., Sanhuesa, S., 962 899 interactions control energy frustration and multiple White, S. W., Wilson, K. S. & Yonath, A. (1981). The 963 900 flagellar filament structures. Proc. Natl. Acad. Sci. of ribosomal proteins from the 50S 964 901 USA, 103, 4894–4899. subunit of the Escherichia coli and Bacillus stear- 965 902 48. Chan, R. T., Robart, A. R., Rajashankar, K. R., Pyle, othermophilus ribosome. J. Biol. Chem. 256, 966 903 A. M. & Toor, N. (2012). Crystal structure of a group II 11787–11790. 967 904 intron in the pre-catalytic state. Nat. Struct. Mol. Biol. 65. Schuwirth, B. S., Borovinskaya, M. A., Hau, C. W., 968 905 19, 555–557. Zhang, W., Vila-Sanjurjo, A., Holton, J. M. & Cate, 969 906 49. Pyle, A. M. (2010). The tertiary structure of group II J. H. (2005). Structures of the bacterial ribosome at 970 907 introns: implications for biological function and 3.5 Ǻ resolution. Science, 310, 827–834. 971 908 evolution. Crit. Rev. Biochem. Mol. Biol. 45, 215–232. 66. Pyle, A. M., Fedorova, O. & Waldsich, C. (2007). 972 909 50. Toor, N., Keating, K. S. & Pyle, A. M. (2009). Folding of group II introns: a model system for large, 973 910 Structural insights into RNA splicing. Curr. Opin. multidomain RNAs? Trends Biochem. Sci. 32, 974 911 Struct. Biol. 19, 260–266. 138–145. 975 912 51. Reiter, N. J., Osterman, A., Torres-Larios, A., 67. Adams, P. L., Stahley, M. R., Kosek, A. B., Wang, J. 976 913 Swinger, K. K., Pan, T. & Mondragon, A. (2010). & Strobel, S. A. (2004). Crystal structure of a self- 977 914 Structure of a bacterial P holoenzyme splicing group I intron with both exons. Nature, 430, 978 915 in complex with tRNA. Nature, 468, 784–789. 45–50. 979 916 52. Mitchell, M., Gillis, A., Futahashi, M., Fujiwara, H. & 68. Toor, N., Keating, K. S., Taylor, S. D. & Pyle, A. M. 980 917 Skordalakes, E. (2010). Structural basis for telome- (2008). Crystal structure of a self-spliced group II 981 918 rase catalytic subunit TERT binding to RNA template intron. Science, 320, 77–82. 982 919 and telomeric DNA. Nat. Struct. Mol. Biol. 17, 69. Marcia, M. & Pyle, A. M. (2012). Visualizing group II 983 920 513–518. intron catalysis through the stages of splicing. Cell, 984 921 53. Zappulla, D. C. & Cech, T. R. (2004). Yeast 151, 497–507. 985 922 telomerase RNA: a flexible scaffold for protein sub- 70. Gao, Y. G., Selmer, M., Dunham, C. M., Weixlbaumer, 986 923 units. Proc. Natl. Acad. Sci. USA, 101, 10024–10029. A., Kelley, A. C. & Ramakrishnan, V. (2009). The 987

Please cite this article as: Novikova, I. et al., Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs, J. Mol. Biol. (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030 14 Review: Structure of RNAs and RNA Machines Q2

988 structure of the ribosome with elongation factor G transcript regulates Zeb2/Sip1 gene expression 1052 989 trapped in the posttranslocational state. Science, during Snail1-induced epithelial–mesenchymal tran- 1053 990 326, 694–699. sition. Genes Dev. 22, 756–769. 1054 991 71. Rodnina, M. V., Beringer, M. & Wintermeyer, W. 89. Cesana, M., Cacchiarelli, D., Legnini, I., Santini, T., 1055 992 (2006). Mechanism of bond formation on the Sthandier, O., Chinappi, M. et al. (2011). A long 1056 993 ribosome. Q. Rev. Biophys. 39, 203–225. noncoding RNA controls muscle differentiation by 1057 994 72. Whitford, P. C., Ahmed, A., Yu, Y., Hennelly, S. P., functioning as a competing endogenous RNA. Cell, 1058 995 Tama, F., Spahn, C. M. et al. (2011). Excited states of 147, 358–369. 1059 996 ribosome translocation revealed through integrative 90. Winkler, W. C., Cohen-Chalamish, S. & Breaker, R. R. 1060 997 molecular modeling. Proc. Natl Acad. Sci. USA, 108, (2002). An mRNA structure that controls gene 1061 998 18943–18948. expression by binding FMN. Proc. Natl Acad. Sci. 1062 999 73. Blanchard, S. C. (2009). Single-molecule observa- USA, 99, 15908–15913. 1063 1000 tions of ribosome function. Curr. Opin. Struct. Biol. 91. Winkler, W. C., Nahvi, A., Sudarsan, N., Barrick, J. E. 1064 1001 19, 103–109. & Breaker, R. R. (2003). An mRNA structure that 1065 1002 74. Reiter, N. J., Chan, C. W. & Mondragon, A. (2011). controls gene expression by binding S-adenosyl- 1066 1003 Emerging structural themes in large RNA molecules. . Nat. Struct. Biol. 10, 701–707. 1067 1004 Curr. Opin. Struct. Biol. 21, 319–326. 92. Montange, R. K. & Batey, R. T. (2006). Structure of 1068 1005 75. Cole, D. I., Legassie, J. D., Bonifacio, L. N., Sekaran, the S-adenosylmethionine riboswitch regulatory 1069 1006 V. G., Ding, F., Dokholyan, N. V. & Jarstfer, M. B. mRNA element. Nature, 441, 1172–1175. 1070 1007 (2012). New models of telomerase RNA 93. Gilbert, S. D., Rambo, R. P., Van Tyne, D. & Batey, 1071 1008 from experimentally derived constraints and model- R. T. (2008). Structure of the SAM-II riboswitch 1072 1009 ing. J. Am. Chem. Soc. 134, 20070–20080. bound to S-adenosylmethionine. Nat. Struct. Mol. 1073 1010 76. Azzalin, C. M., Reichenbach, P., Khoriauli, L., Biol. 15, 177–182. 1074 1011 Giulotto, E. & Lingner, J. (2007). Telomeric repeat 94. Deigan, K. E., Li, T. W., Mathews, D. H. & Weeks, K. M. 1075 1012 containing RNA and RNA surveillance factors at (2009). Accurate SHAPE-directed RNA structure de- 1076 1013 mammalian chromosome ends. Science, 318, termination. Proc. Natl Acad. Sci. USA, 106,97–102. 1077 1014 798–801. 95. Hamada, M., Sato, K. & Asai, K. (2010). Prediction of 1078 1015 77. Schoeftner, S. & Blasco, M. A. (2008). Developmen- RNA secondary structure by maximizing pseudo- 1079 1016 tally regulated transcription of mammalian telomeres expected accuracy. BMC , 11, 586. 1080 1017 by DNA-dependent RNA polymerase II. Nat. Cell 96. Hamada, M., Sato, K. & Asai, K. (2011). Improving the 1081 1018 Biol. 10, 228–236. accuracy of predicting secondary structure for aligned 1082 1019 78. Lau, N., Lim, L., Weinstein, E. & Bartel, D. (2001). An RNA sequences. Nucleic Acids Res. 39, 393–402. 1083 1020 abundant class of tiny RNAs with probable regulatory 97. Rivas, E., Lang, R. & Eddy, S. R. (2012). A range of 1084 1021 roles in . Science, 294, complex probabilistic models for RNA secondary 1085 1022 858–862. structure prediction that includes the nearest-neigh- 1086 1023 79. Reinhart, B., Weinstein, E., Rhoades, M., Bartel, B. & bor model and more. RNA, 18, 193–212. 1087 1024 Bartel, D. (2002). in . Genes Dev. 98. Bernhart, S. H., Hofacker, I. L., Will, S., Gruber, A. R. 1088 1025 16, 1616–1626. &Stadler,P.F.(2008).RNAalifold:improved1089 1026 80. Lewis, B., Shih, I., Jones-Rhoades, M., Bartel, D. & consensus structure prediction for RNA alignments. 1090 1027 Burge, C. (2003). Prediction of mammalian micro- BMC Bioinformatics, 9, 474. 1091 1028 RNA targets. Cell, 115, 787–798. 99. Ghosh, S. K., Patton, J. R. & Spanjaard, R. A. (2012). 1092 1029 81. Tang, G., Reinhart, B., Bartel, D. & Zamore, P. A small RNA derived from RNA coactivator SRA 1093 1030 (2003). A biochemical framework for RNA silencing blocks steroid receptor signaling via inhibition of 1094 1031 in plants. Genes Dev. 17, 49–63. Pus1p-mediated pseudouridylation of SRA: evi- 1095 1032 82. Zamore, P. (2002). Ancient pathways programmed dence of a novel RNA binding domain in the N- 1096 1033 by small RNAs. Science, 296, 1265–1269. terminus of steroid receptors. , 51, 1097 1034 83. Zamore, P. D. & Haley, B. (2005). Ribo-gnome: the 8163–8172. 1098 1035 big world of small RNAs. Science, 309, 1519–1524. 100. Xu, B., Yang, W. H., Gerin, I., Hu, C. D., Hammer, G. D. 1099 1036 84. Ghildiyal, M. & Zamore, P. D. (2009). Small silencing & Koenig, R. J. (2009). Dax-1 and steroid receptor 1100 1037 RNAs: an expanding . Nat. Rev., Genet. 10, RNA activator (SRA) function as transcriptional 1101 1038 94–108. coactivators for steroidogenic factor 1 in steroido- 1102 1039 85. Tomari, Y. & Zamore, P. D. (2005). Perspective: genesis. Mol. Cell. Biol. 29, 1719–1734. 1103 1040 machines for RNAi. Genes Dev. 19, 517–529. 101. Colley, S. M. & Leedman, P. J. (2009). SRA and its 1104 1041 86. Martianov, I., Ramadass, A., Serra Barros, A., Chow, binding partners: an expanding role for RNA- 1105 1042 N. & Akoulitchev, A. (2007). Repression of the binding coregulators in nuclear receptor-mediated 1106 1043 human dihydrofolate reductase gene by a non- gene regulation. Crit. Rev. Biochem. Mol. Biol. 44, 1107 1044 coding interfering transcript. Nature, 445, 666–670. 25–33. 1108 1045 87. Bertani, S., Sauer, S., Bolotin, E. & Sauer, F. (2011). 102. Yao, H., Brick, K., Evrard, Y., Xiao, T., Camerini- 1109 1046 The noncoding RNA mistral activates Hoxa6 and Otero, R. D. & Felsenfeld, G. (2010). Mediation of 1110 1047 Hoxa7 expression and differentiation by CTCF transcriptional insulation by DEAD-box RNA- 1111 1048 recruiting MLL1 to chromatin. Mol. Cell, 43, binding protein p68 and steroid receptor RNA 1112 1049 1040–1046. activator SRA. Genes Dev. 24, 2543–2555. 1113 1050 88. Beltran, M., Puig, I., Pena, C., Garcia, J. M., Alvarez, 103. Woese, C. R., Magrum, L. J., Gupta, R., Siegel, R. B., 1114 1051 A. B., Pena, R. et al. (2008). A natural antisense Stahl, D. A., Kop, J. et al. (1980). Secondary structure 1115

Please cite this article as: Novikova, I. et al., Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs, J. Mol. Biol. (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030 Review: Structure of RNAs and RNA Machines 15 Q2

1116 model for bacterial 16S ribosomal RNA: phylogenetic, 118. Zhao, J., Ohsumi, T. K., Kung, J. T., Ogawa, Y., 1178 1117 enzymatic and chemical evidence. Nucleic Acids Res. Grau, D. J., Sarma, K. et al. (2010). Genome-wide 1179 1118 8, 2275–2293. identification of polycomb-associated RNAs by RIP- 1180 1119 104. Noller, H. F. & Woese, C. R. (1981). Secondary seq. Mol. Cell, 40, 939–953. 1181 1120 structure of 16S ribosomal RNA. Science, 212, 119. Surface, L. E., Thornton, S. R. & Boyer, L. A. (2010). 1182 1121 403–411. Polycomb group proteins set the stage for early 1183 1122 105. Ben-Shem, A., Garreau de Loubresse, N., Melnikov, lineage commitment. Cell Stem Cell, 7, 288–298. 1184 1123 S., Jenner, L., Yusupova, G. & Yusupov, M. (2011). 120. Zhao, J., Sun, B. K., Erwin, J. A., Song, J. J. & Lee, 1185 1124 The structure of the eukaryotic ribosome at 3.0 Ǻ J. T. (2008). Polycomb proteins targeted by a short 1186 1125 resolution. Science, 334, 1524–1529. repeat RNA to the mouse X chromosome. Science, 1187 1126 106. Legiewicz, M., Badorrek, C. S., Turner, K. B., Fabris, 322, 750–756. 1188 1127 D., Hamm, T. E., Rekosh, D. et al. (2008). 121. Wutz, A., Rasmussen, T. P. & Jaenisch, R. (2002). 1189 1128 Resistance to RevM10 inhibition reflects a confor- Chromosomal silencing and localization are mediat- 1190 1129 mational switch in the HIV-1 Rev response element. ed by different domains of Xist RNA. Nat. Genet. 30, 1191 1130 Proc. Natl Acad. Sci. USA, 105, 14365–14370. 167–174. 1192 1131 107. Legiewicz, M., Zolotukhin, A. S., Pilkington, G. R., 122. Bernstein, B. E., Birney, E., Dunham, I., Green, E. D., 1193 1132 Purzycka, K. J., Mitchell, M., Uranishi, H. et al. Gunter, C. & Snyder, M. (2012). An integrated 1194 1133 (2010). The RNA transport element of the murine encyclopedia of DNA elements in the human 1195 1134 musD requires long-range intramo- genome. Nature, 489, 57–74. 1196 1135 lecular interactions for function. J. Biol. Chem. 285, 123. Jeon, Y., Sarma, K. & Lee, J. T. (2012). New and 1197 1136 42097–42104. Xisting regulatory mechanisms of X chromosome 1198 1137 108. Purzycka, K. J., Legiewicz, M., Matsuda, E., Eizentstat, inactivation. Curr. Opin. Genet. Dev. 22, 62–71. 1199 1138 L. D., Lusvarghi, S., Saha, A. et al. (2013). Exploring 124. Lee, J. T. (2011). Gracefully ageing at 50, X- 1200 1139 Ty1 retrotransposon RNA structure within -like chromosome inactivation becomes a paradigm for 1201 1140 particles. Nucleic Acids Res. 41,463–473. RNA and chromatin control. Nat. Rev., Mol. Cell Biol. 1202 1141 109. Watts, J. M., Dang, K. K., Gorelick, R. J., Leonard, 12, 815–826. 1203 1142 C. W., Bess, J. W., Jr, Swanstrom, R. et al. (2009). 125. Kim, D. H., Jeon, Y., Anguera, M. C. & Lee, J. T. 1204 1143 Architecture and secondary structure of an entire (2011). X-chromosome epigenetic reprogramming in 1205 1144 HIV-1 RNA genome. Nature, 460, 711–716. pluripotent stem cells via noncoding genes. Semin. 1206 1145 110. Lucks, J. B., Mortimer, S. A., Trapnell, C., Luo, S., Cell Dev. Biol. 22, 336–342. 1207 1146 Aviran, S., Schroth, G. P. et al. (2011). Multiplexed 126. Sasaki, Y. T., Ideue, T., Sano, M., Mituyama, T. & 1208 1147 RNA structure characterization with selective 2′- Hirose, T. (2009). MENepsilon/beta noncoding 1209 1148 hydroxyl acylation analyzed by primer extension RNAs are essential for structural integrity of nuclear 1210 1149 sequencing (SHAPE-Seq). Proc. Natl Acad. Sci. paraspeckles. Proc. Natl Acad. Sci. USA, 106, 1211 1150 USA, 108, 11063–11068. 2525–2530. 1212 1151 111. Spitale, R. C., Crisalli, P., Flynn, R. A., Torre, E. A., 127. Gutschner, T., Hammerle, M., Eissmann, M., Hsu, J., 1213 1152 Kool, E. T. & Chang, H. Y. (2013). RNA SHAPE Kim, Y., Hung, G. et al. (2012). The non-coding RNA 1214 1153 analysis in living cells. Nat. Chem. Biol. 9, 18–20. MALAT1 is a critical regulator of the metastasis 1215 1154 112. Wilusz, J. E. & Spector, D. L. (2010). An unexpected of lung cancer cells. Cancer Res. 73, 1216 1155 ending: noncanonical 3′ end processing mecha- 1180–1189. 1217 1156 nisms. RNA, 16, 259–266. 128. Miyagawa, R., Tano, K., Mizuno, R., Nakamura, Y., 1218 1157 113. Beniaminov, A., Westhof, E. & Krol, A. (2008). Ijiri, K., Rakwal, R. et al. (2012). Identification of cis- 1219 1158 Distinctive structures between chimpanzee and and trans-acting factors involved in the localization of 1220 1159 human in a brain noncoding RNA. RNA, 14, MALAT-1 noncoding RNA to nuclear speckles. RNA, 1221 1160 1270–1275. 18, 738–751. 1222 1161 114. Pollard, K. S., Salama, S. R., Lambert, N., Lambot, 129. Naganuma, T., Nakagawa, S., Tanigawa, A., Sasaki, 1223 1162 M. A., Coppens, S., Pedersen, J. S. et al. (2006). An Y. F., Goshima, N. & Hirose, T. (2012). Alternative 3′- 1224 1163 RNA gene expressed during cortical development end processing of long noncoding RNA initiates 1225 1164 evolved rapidly in humans. Nature, 443, 167–172. construction of nuclear paraspeckles. EMBO J. 31, 1226 1165 115. Pollard, K. S., Salama, S. R., King, B., Kern, A. D., 4020–4034. 1227 1166 Dreszer, T., Katzman, S. et al. (2006). Forces 130. Souquere, S., Beauclair, G., Harper, F., Fox, A. & 1228 1167 shaping the fastest evolving regions in the human Pierron, G. (2010). Highly ordered spatial organiza- 1229 1168 genome. PLoS Genet. 2, e168. tion of the structural long noncoding NEAT1 RNAs 1230 1169 116. Guttman, M., Donaghey, J., Carey, B. W., Garber, within paraspeckle nuclear bodies. Mol. Biol. Cell, 1231 1170 M., Grenier, J. K., Munson, G. et al. (2011). lincRNAs 21, 4020–4027. 1232 1171 act in the circuitry controlling pluripotency and 131. Rodnina, M. V. & Wintermeyer, W. (2001). Fidelity of 1233 1172 differentiation. Nature, 477, 295–300. aminoacyl-tRNA selection on the ribosome: kinetic 1234 1173 117. Sanford, J. R., Wang, X., Mort, M., Vanduyn, N., and structural mechanisms. Annu. Rev. Biochem. 1235 1174 Cooper, D. N., Mooney, S. D. et al. (2009). Splicing 70, 415–435. 1236 1175 factor SFRS1 recognizes a functionally diverse 132. Rodnina, M. V. & Wintermeyer, W. (2001). Ribosome 1237 1176 landscape of RNA transcripts. Genome Res. 19, fidelity: tRNA discrimination, proofreading and in- 1238 1177 381–394. duced fit. Trends Biochem. Sci. 26, 124–130. 1239 1240

Please cite this article as: Novikova, I. et al., Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs, J. Mol. Biol. (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030