An Overview of the Introns-First Theory
J Mol Evol
DOI 10.1007/s00239-009-9279-5

David Penny · Marc P. Hoeppner · Anthony M. Poole · Daniel C. Jeffares

Received: 13 August 2009 / Accepted: 8 September 2009
© Springer Science+Business Media, LLC 2009

Abstract We review the introns-first hypothesis a decade after it was first proposed. It is that exons emerged from non-coding regions interspersed between RNA genes in an early RNA world, and it is a subcomponent of a more general ‘RNA-continuity’ hypothesis. The latter is that some RNA-based systems, especially in RNA processing, are ‘relics’ that can be traced back either to the RNA world that preceded both DNA and encoded protein synthesis or to the later ribonucleoprotein (RNP) world (before DNA took over the main coding role). RNA-continuity is based on independent evidence, in particular the relative inefficiency of RNA catalysis compared with protein catalysis, and leads to a wide range of predictions concerning the origin of the ribosome, the spliceosome, small nucleolar RNAs, RNases P and MRP, and mRNA; it is also consistent with the wide involvement of RNA processing and regulation of RNA in modern eukaryotes. While there may still be cause to withhold judgement on intron origins, there is strong evidence against the ‘very-late’ intron invasion model, in which introns were uncommon in the last eukaryotic common ancestor (LECA) and expanded only within extant eukaryotic groups. Similarly, it is clear that there are selective forces on the numbers and positions of introns; their existence may not always be neutral. There is still a range of viable alternatives, including introns first, early, and ‘latish’ (i.e. well established in LECA), and regardless of which is ultimately correct, it pays to separate out the various questions and to focus on testing the predictions of the sub-theories.

Keywords Introns · RNA world · Eukaryote origins · RNP world · Spliceosome · Introns early

D. Penny (corresponding author)
Allan Wilson Center, Massey University, Palmerston North, New Zealand
e-mail: [email protected]

M. P. Hoeppner · A. M. Poole
Department of Molecular Biology and Functional Genomics, Stockholm University, 106 91 Stockholm, Sweden

A. M. Poole
School of Biological Sciences, University of Canterbury, Christchurch 8140, New Zealand

D. C. Jeffares
Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK

Introduction

The introns-first theory was published just over a decade ago (Poole et al. 1998; Jeffares et al. 1998), and aimed to account for the origin of mRNA within an evolutionary framework for the origin of genetically encoded protein synthesis in the late stages of the RNA world (see Fig. 1 for a summary). There are three aspects to this hypothesis: that mRNA arose by co-option of expressed non-functional RNA, that the co-opted RNAs were interspersed between functional RNA genes, and that the core of the spliceosome, some extant genes, and their introns may be relics from this very early period. The last of these is directly testable in principle. Introns-first is thus part of a much wider analysis of the expectations of the continuity of RNA systems from the RNA and ribonucleoprotein (RNP) worlds to modern organisms. This more general RNA-continuity theory (Fig. 2a; see Penny and Collins 2009) is that many classes of RNA in modern eukaryotes have existed since these earlier phases; in our terminology they are ‘relics’, though of course the associated proteins would only have arisen after encoded protein synthesis developed. Thus, the introns-first hypothesis is an independent sub-hypothesis of this more general model, and the introns-first model needs to be evaluated independently; for example, the RNA-continuity hypothesis could stand even if introns-first were eventually rejected. We will see that the introns-first model shifts the focus onto eukaryotes, and if it could be established that eukaryotes were indeed formed de novo from an archaeal and a bacterial cell (see Embley and Martin 2006 for different models), then that would be very strong evidence against either introns first or early. Figure 2b shows the contrast between the RNA-continuity model and the more common idea that an early complexity of RNA control mechanisms from the RNP world was lost in prokaryotes (archaea and bacteria) and reinvented in eukaryotes.

The last decade has seen a major expansion in our knowledge of the roles of RNA in eukaryotes, expanding both the classes of RNA that are known and the questions that need to be addressed. An early focus was on ubiquitous RNAs with a processing function (e.g. rRNA, tRNA, snRNA, small nucleolar RNA [snoRNA], srpRNA, RNase P, and RNase MRP) and on the exon/intron structure of eukaryote genes (see Gilbert 1987; Cavalier-Smith 2002; Rodríguez-Trelles et al. 2006; Di Giulio 2008a, b, and references therein). However, the discovery of the widespread and complex roles of RNA in eukaryote cells (including RNAi) has broadened the discussion to the extent that we now refer to the ‘RNA infrastructure’ of the eukaryote cell (Collins and Penny 2009). In particular, a range of regulatory RNAs have been identified in the last decade, including the many classes of small RNA involved in RNAi, such as miRNA, siRNA and piRNA, as well as their role in epigenetics (Carthew and Sontheimer 2009).

Fig. 1 The main components of the introns-first model. Introns and intron splicing arose in an RNA world organism where both the genome and the enzymes are composed of RNA. The double-stranded RNA genome (at top) contained RNA genes (filled boxes), interspersed with non-RNA-coding sequences (open boxes). Transcription produces single pre-processed transcripts. These transcripts are then processed (spliced) to produce mature functional RNAs. This processing also produces non-functional RNA byproducts as the functional RNAs (such as snoRNAs) are liberated. Some such byproducts were subsequently recruited to non-templated protein synthesis as a means to stabilize the interaction between two charged tRNAs during non-genetically encoded peptide synthesis (by pairing with what subsequently became the ‘anticodon’ loop). This model for the origin of mRNA suggests co-evolution of the genetic code and these earliest transcripts. The introns-first hypothesis proposed that the first proteins were initially selected for their propensity to stabilize functional RNA and were not catalytic. Hence, introns are derived from RNA genes, and these were present prior to the evolution of protein-coding segments (exons).

Fig. 2 Two models for the origin of the high RNA complexity in eukaryotes. a Under the RNA-continuity model, the basic system of RNA processing in modern eukaryotes evolved in an earlier ribonucleoprotein stage of the origin of life, an RNP world. The model involves streamlining of RNA processing separately in bacteria and archaea, with the latter having retained some snoRNAs. There would be continued evolution (including expansion) of the RNA infrastructure in eukaryotes. b Under the RNA re-expansion model, there may have been the same early complex system of RNA processing, but this was largely lost via streamlining or replacement before LUCA and subsequently re-expanded in eukaryotes. This model has one loss and one gain. The order of branching of archaea, bacteria, and eukaryotes is deliberately ambiguous; branching is independent of the models, and alternative topologies are consistent with either model. (Modified from Penny and Collins (2009).)

Introns First, Early, Late, and the RNA-Continuity Hypothesis

A basic question is the extent to which the RNA infrastructure arose de novo in eukaryotes, or whether there is continuity of many classes of RNA, including those restricted to eukaryotes, from the later stages of the origin of life through to the present. Most researchers appear to consider eukaryotes as ‘advanced’ cells that must in some way be derived from ‘primitive’ prokaryotes. Although this is definitely possible, we think that, under the three-domains view of life, the data are currently insufficient to exclude the alternative that eukaryotes have remained relatively inefficient in, for example, their processing of mRNA. In contrast, we could consider bacteria and archaea as having evolved a very fast and efficient mRNA processing system, whether through thermoreduction (Forterre 1995), r-selection (Poole et al. 1998), efficient selection in large populations (Lynch 2007), chronic energy stress (Valentine 2007), or other causes. Currently, it is important to remain open-minded about different interpretations of eukaryote origins and to focus on using the available data to test the different models. It is unhelpful to decide prematurely on just one model.

The first step is to outline the timing of the different hypotheses. For the origin of spliceosomal introns, a common distinction is between scenarios that consider introns either as a late addition via horizontal transfer or as a remnant of the RNA or RNP worlds. There are plausible, detailed models for the late origin and spread of the spliceosome and spliceosomal introns in eukaryotes, these having derived in a stepwise manner from group II introns (Hickey 1992; Stoltzfus 1999): introns-late. An origin for group II introns in the RNA world has likewise been proposed, a form of the introns-early model (Gilbert and de Souza 1999) distinct from the original exon theory of genes.

Thus, we use the division between the RNA world, the RNP world, and the DNA world as our primary distinction, though agreeing that rigid distinctions are overly simplistic. For example, as illustrated in Fig. 3a, there will be overlaps between first/early and early/late. Again, the intron/exon structure could, in principle, have arisen in a common ancestor of eukaryotes and archaea (the introns-