Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs

Irina V. Novikova, Scott P. Hennelly, Chang-Shung Tung and Karissa Y. Sanbonmatsu

Los Alamos National Laboratory, Los Alamos, NM 87545, USA

Correspondence to Karissa Y. Sanbonmatsu: [email protected]
http://dx.doi.org/10.1016/j.jmb.2013.02.030
Edited by A. Pyle

Abstract

Novel, profound and unexpected roles of long non-coding RNAs (lncRNAs) are emerging in critical aspects of gene regulation. Thousands of lncRNAs have been recently discovered in a wide range of mammalian systems, related to development, epigenetics, cancer, brain function and hereditary disease. The structural biology of these lncRNAs presents a brave new RNA world, which may contain a diverse zoo of new architectures and mechanisms. While structural studies of lncRNAs are in their infancy, we describe existing structural data for lncRNAs, as well as crystallographic studies of other RNA machines and their implications for lncRNAs. We also discuss the importance of dynamics in RNA machine mechanism. Determining commonalities between lncRNA systems will help elucidate the evolution and mechanistic role of lncRNAs in disease, creating a structural framework necessary to pursue lncRNA-based therapeutics.

© 2013 Published by Elsevier Ltd.

0022-2836/$ - see front matter © 2013 Published by Elsevier Ltd. J. Mol. Biol. (2013) xx, xxx–xxx
Please cite this article as: Novikova, I. et al., Rise of the RNA Machines: Exploring the Structure of Long Non-Coding RNAs, J. Mol. Biol. (2013), http://dx.doi.org/10.1016/j.jmb.2013.02.030

Introduction

RNA is primarily known as an intermediary in gene expression between DNA and proteins. Over the past several decades, other roles for RNA have been identified, which include protein synthesis, gene regulation and nucleic acid processing. However, these RNAs were considered as outliers in a transcriptome that consists mainly of protein-coding RNAs. While this may be the case for bacteria and certain unicellular organisms,[1] transcriptomes in higher eukaryotes are markedly different. Recent deep sequencing studies in humans have shown that more than two-thirds of the genome is actively transcribed.[2] Since protein-coding genes constitute a very small fraction of the genome,[3] non-coding RNAs represent the vast majority (more than 80% in mammals).[4] Recent studies estimate approximately 15,000 long non-coding RNAs (lncRNAs) in humans.[5] In light of their expression profiles specific to cell, cell cycle, tissue, developmental stage and disease, it is difficult to precisely quantify the number of human lncRNAs. We note that tens of thousands of lncRNAs have been profiled in 2012.[6–10] This recent explosion of newly discovered lncRNAs suggests that non-coding RNAs may be the norm rather than the exception in the case of eukaryotic organisms.

LncRNAs are defined by the following: (i) lack of coding potential and (ii) transcript length (>200 nt).[11] These transcripts are generally nuclear retained and transcribed by RNA polymerase II, with many that are spliced and polyadenylated.[12–15] LncRNAs may be intronic, intergenic (large intervening non-coding RNAs or lincRNAs) or antisense to the protein-coding genes (overlapping one or more exons). While transcript lengths are normally in the range 1000–10,000 residues, the Air and kcnq1ot1 lncRNAs have lengths that exceed 90 kb.[16–18] Interestingly, in 2013, a novel "monster" non-coding transcript, XACT (252 kb in length), was found to originate from the active X chromosome in human pluripotent cells.[19] Many lncRNAs play key roles in signaling,[20–22] development,[23] cancer,[12,24–29] embryonic stem cell pluripotency,[30] brain function,[31–34] subcellular compartmentalization,[26,35–38] chromatin remodeling,[12,39,40] plant biology[41,42] and stress response.[43] Cell biology studies and functional aspects of lncRNAs have been discussed by Rinn and Chang in a recent review article, as well as in many references therein.[6]

Since lncRNAs are often associated with histone modification and chromatin remodeling, epigenetic effects represent a unifying theme. While the mechanism by which epigenetic factors find their targets is largely unknown, many recent studies indicate that lncRNAs may be a necessary component of the network that guides these factors to their chromatin targets. In addition to epigenetic effects, the origin of lncRNAs is not well understood. While the genomes of simple organisms are dominated by protein-coding genes, the advent of non-coding RNAs in higher organisms gave rise to an incredible sequence space for exploration of new function at the RNA level, creating sophisticated regulatory networks. Possible evolution mechanisms may include pseudogenization of protein-coding genes, insertion of transposons and duplication of genes. Investigations into the open area of the origin and evolution of lncRNAs have the potential to yield exciting results.

RNA machines

RNA molecules are able to form complex molecular machines. The ribosome is one of the most well known RNA machines since it has many moving parts and may be powered by GTP hydrolysis.[44,45] In certain circumstances, the ribosome can function without GTP hydrolysis ("factor-free translation").[46] Factor-free translation is quite inefficient in comparison to GTP-based translation and in comparison to the far more efficient protein-based molecular motors (e.g., flagellum).[47] The ribosome also processes information by performing a look-up table operation, converting the 4-letter nucleic acid alphabet into the 20-letter protein alphabet. The group I and group II self-splicing introns are also molecular machines as they use chemical energy to bring together distant regions of RNA.[48–50] RNase P and telomerase RNA also catalyze chemical reactions to accomplish their function.[51–53] The riboswitch is another type of RNA molecular machine.[54] Riboswitches are molecular switches that sense their environment and allow gene expression decisions to be made on the basis of environmental inputs such as ligand concentration.[55–60] More complicated tandem riboswitches act as logical AND gates. In addition, complex combinations of aptamers can be arranged into logic circuitry (e.g., OR, NAND, NOR and NOT gates).[61] Finally, RNA scaffolding that controls information flow could be analogous to an integrated circuit (a device that has no moving parts but controls highly complex information flow in computers). Recent studies suggest that several different lncRNA systems may act as scaffolding, controlling information flow in epigenetic systems.[62]

Structural studies are often critical in deciphering RNA machines and the mechanism of RNA action. [...] resolution structure of the bacterial ribosome (>4500 nt) required more than two decades for its solution, structural studies of lncRNAs are formidable.[64,65] Fundamental questions regarding the structure of lncRNAs remain unanswered, including:

(1) Is lncRNA mechanism dominated by primary sequence, higher-order structure or both?
(2) Do lncRNAs contain sub-domains?
(3) Are lncRNAs complexed with many proteins or do they exist as isolated RNAs that transiently interact with proteins?
(4) Are lncRNAs compact or extended?

Here, we attempt to provide a starting point for structural studies of lncRNAs. Because tertiary structures have not been solved to date, we review tertiary structures of other RNA machines. We next summarize currently known structural features of eukaryotic lncRNAs (Fig. 1). Finally, we discuss lncRNA structure in the context of previously studied RNA molecular machines and corresponding structure/function relationships.

Crystallographic studies of RNA machines in bacteria and other unicellular organisms

Three-dimensional structure is often integral to the function and mechanism of a biomolecular system. Here, we review crystallographic structures of other RNA systems. This review is by no means comprehensive but highlights the diversity of mechanism and composition of RNA structure. The group I intron, the group II intron, the ribosome and RNase P are the only RNAs with lengths greater than 200 nt that have been studied at high resolution.[48,66]

The group I and group II introns are self-splicing RNAs that catalyze their own cleavage. The structure of the group I intron consists largely of an isolated RNA (~200 nt), organized into a well-defined architecture by co-axially stacked helices connected by single-stranded regions. The pre-2S state of the group I intron includes an exon and also contains mainly co-axially stacked helices connected by single-stranded regions.[67] Four of the helices are capped by tetraloops. This structure consists largely of an isolated RNA with one small spliceosomal protein (U1A) bound to its extremity. The active site contains two base triples, critical for precise positioning of magnesium ions and catalysis.

The structure of the post-catalytic state of the group II intron, solved by Toor et al., contains