Structural and Sequence Diversity of Eukaryotic Transposable Elements

Genes Genet. Syst. (2019) 94, p. 233–252

Kenji K. Kojima1,2*

1Genetic Information Research Institute, 465 Fairchild Drive, Suite 201, Mountain View, CA 94043, USA
2Department of Life Sciences, National Cheng Kung University, No. 1, Daxue Road, East District, Tainan 701, Taiwan

(Received 10 May 2018, accepted 12 July 2018; J-STAGE Advance published date: 9 November 2018)

The majority of eukaryotic genomes contain a large fraction of repetitive sequences that primarily originate from transpositional bursts of transposable elements (TEs). Repbase serves as a database for eukaryotic repetitive sequences and has now become the largest collection of eukaryotic TEs. During the development of Repbase, many new superfamilies/lineages of TEs, which include Helitron, Polinton, Ginger and SINEU, were reported. The unique composition of protein domains and DNA motifs in TEs sometimes indicates novel mechanisms of transposition, replication, anti-suppression or proliferation. In this review, our current understanding regarding the diversity of eukaryotic TEs in sequence, protein domain composition and structural hallmarks is introduced and summarized, based on the classification system implemented in Repbase. Autonomous eukaryotic TEs can be divided into two groups: Class I TEs, also called retrotransposons, and Class II TEs, or DNA transposons. Long terminal repeat (LTR) retrotransposons, including endogenous retroviruses, non-LTR retrotransposons, tyrosine recombinase retrotransposons and Penelope-like elements, are well accepted groups of autonomous retrotransposons. They share reverse transcriptase for replication but are distinct in the catalytic components responsible for integration into the host genome.
Similarly, at least three transposition machineries have been reported in eukaryotic DNA transposons: DDD/E transposase, tyrosine recombinase and HUH endonuclease combined with helicase. Among these, TEs with DDD/E transposase are dominant and are classified into 21 superfamilies in Repbase. Non-autonomous TEs are either simple derivatives generated by internal deletion, or are composed of several units that originated independently.

Key words: transposase, transposon, Repbase, retrotransposon, reverse transcriptase

Edited by Kenji Ichiyanagi
* Corresponding author. E-mail: [email protected]
DOI: http://doi.org/10.1266/ggs.18-00024

INTRODUCTION

Transposable elements (TEs), also known as transposons, mobile DNA, or mobile elements, include a variety of DNA segments that can, in a process called transposition, move (or duplicate) from one location in the genome to another.

Repbase was first established as a database of human repeat sequences in 1992 (Jurka et al., 1992). Now, Repbase contains diverse eukaryotic repeat sequences that are categorized by organism and repeat type (Bao et al., 2015). In the development of Repbase, two things became clear. First, the majority of eukaryotic interspersed repeat sequences originate from TEs, which are active now or were active in the past. The majority of Medium reiterated repeats families found in the human genome have been classified into various TE superfamilies (Kojima, 2018a). The origins of these interspersed repeats were not initially obvious. Eukaryotic repeat sequences not derived from TEs are microsatellites, satellite repeats arrayed in tandem, multicopy genes (such as ribosomal RNA genes), histone genes, and occasionally integrated viruses (Bao et al., 2015).

Second, the mechanisms and components responsible for transposition vary among TEs. Repbase contributed significantly to revealing the diversity of TEs. Many TE superfamilies were described by the team at the Genetic Information Research Institute (GIRI) who have maintained and expanded Repbase (Bao et al., 2015). The discovery of Helitron opened a new window in the world of TE studies because this superfamily encodes a unique protein set (Kapitonov and Jurka, 2001). Characterization of the superfamily of gigantic TEs, Polinton, allowed us to create a vague boundary between TEs and viruses (Kapitonov and Jurka, 2006; Krupovic et al., 2014a). Some recently characterized groups of TEs include Ginger1, Ginger2 (Bao et al., 2010), Dada (Kojima and Jurka, 2013b) and SINEU (Kojima, 2015). In addition, studies outside Repbase cannot be neglected. Recent examples that were characterized by other teams are Zisupton (Böhne et al., 2012), Spy (Han et al., 2014) and Teratorn (Inoue et al., 2017).

After transposition, many types of TEs are flanked by short (1–20 bp) direct repeats called target site duplications (TSDs), which are derived from the target sequence (Kapitonov and Jurka, 2008). However, certain TE types, such as Helitron, some terminal inverted repeat (TIR)-bearing TEs, and CR1 retrotransposons, do not produce TSDs. The length of a TSD is usually characteristic of the TE's group and its relatives, but may also vary across groups in a specific superfamily. TEs constitute the majority of repetitive sequences in most eukaryotic genomes. In fact, TEs can be viewed as intra-genomic parasites. Some viruses, such as retroviruses, behave like TEs. TEs also have diverse evolutionary impacts on their host genome.

The aim of this review is to introduce and summarize our present understanding of the diversity in eukaryotic TEs in sequence, protein domain composition, as well as structural hallmarks that include TSDs or terminal signatures (long terminal repeats, terminal inverted repeats, polyA tail, etc.). I focus on protein domain composition because (1) it is tightly related to the mechanism of transposition, and (2) it can be easily detected by bioinformatics analysis during the initial characterization of TEs.

TE CLASSIFICATION BASED ON BIOINFORMATICS

The concept that the highest rank of classification in TEs is linked to the mechanism of mobilization is well accepted. Historically, eukaryotic TEs are divided into two classes: Class I and Class II (Finnegan, 1989). Despite several objections from critics (Piégu et al., 2015; Arensburger et al., 2016), this simple classification has worked very well to date. Class I includes retrotransposons, which transpose through an RNA intermediate. Because reverse transcriptase (RT) is the only enzyme that can efficiently catalyze reverse transcription, all autonomous retrotransposons encode RT. Class II includes DNA transposons, which do not use RNA as transposition intermediates. In other words, Class I includes all transposons that encode RT and their non-autonomous derivatives, while Class II includes all other autonomous transposons that lack RT and their non-autonomous derivatives.

Class I is subdivided into two large categories that are distinguished by the presence of long terminal repeats (LTRs): LTR retrotransposons and non-LTR retrotransposons. Recent studies have revealed additional groups of eukaryotic retrotransposons that are distinguishable from these two by the transposition mechanism and/or the phylogeny of RT. They are DIRS retrotransposons (or tyrosine recombinase-encoding retrotransposons) (Glöckner et al., 2001; Poulter and Goodwin, 2005) and Penelope-like retrotransposons (Penelope-like elements) (Arkhipova et al., 2003). It should be mentioned that even though DIRS is the abbreviation of Dictyostelium intermediate repeat sequence, retrotransposons related to DIRS have been found in diverse species and the term DIRS is now used as the name of a group whose members show similar protein domain composition. In this review, names representing a superfamily or group are not shown as abbreviations to avoid confusion about their distribution. These four groups are distinct in the origins of the catalytic components (endonuclease or recombinase) that are responsible for integration into the host genome. In the classification implemented in Repbase, DIRS retrotransposons are included in LTR retrotransposons and Penelope-like retrotransposons in non-LTR retrotransposons. This expedient classification was introduced primarily for practical reasons to avoid over-subclassification, and it does not mean that Repbase ignores the unique properties of DIRS and Penelope-like retrotransposons.

Due to the lack of any conserved protein domains among DNA transposons, the classification of DNA transposons is less widely accepted than that of retrotransposons. The machinery of transposition is the framework for classification of TEs. In general, the machinery is tightly linked to the composition of the protein domains encoded by TEs. When considering eukaryotic and prokaryotic TEs together, the transposases encoded by DNA transposons are classified into six types: DDD/E transposase, DEDD transposase, tyrosine recombinase (YR), serine recombinase (SR), HUH nuclease and Cas1 nuclease (Siguier et al., 2006; Chandler et al., 2013; Krupovic et al., 2014b). Among these, DEDD transposase, SR and Cas1 nuclease have not been found in any eukaryotic TEs. YR is encoded by Crypton (Goodwin et al., 2003; Kojima and Jurka, 2011a), while HUH nuclease is encoded by Helitron (Kapitonov and Jurka, 2001). All other groups of eukaryotic DNA transposons are thought to encode DDD/E transposase.

Table 1 is a brief comparison between the classification systems of Repbase (Bao et al., 2015), Wicker et al. (2007), and Arkhipova (2017). They are largely consistent, except for several minor conflicts. It is notewor-
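
The class rule described in this section (an autonomous element that encodes RT is Class I; an autonomous element that lacks RT is Class II, with YR, HUH nuclease or DDD/E transposase as the machinery) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the procedure used by Repbase: the domain labels ("RT", "YR", "HUH", "DDD/E") and the function name are hypothetical, and the input is assumed to be the set of protein domains already detected in an autonomous element. Non-autonomous derivatives encode no such domains and would have to be assigned through their autonomous relatives.

```python
# Hypothetical domain labels for this sketch; they are not Repbase identifiers.
RT = "RT"

# Transposition machineries reported in the text for eukaryotic DNA transposons.
DNA_MACHINERIES = {
    "HUH": "HUH nuclease (Helitron)",
    "YR": "tyrosine recombinase (Crypton)",
    "DDD/E": "DDD/E transposase",
}


def classify_autonomous_te(domains):
    """Return (class, machinery) for an autonomous TE from its detected domains."""
    found = set(domains)
    # Class I: all autonomous retrotransposons encode reverse transcriptase,
    # so RT is checked first (DIRS elements encode both RT and YR).
    if RT in found:
        return ("Class I", "reverse transcriptase")
    # Class II: autonomous elements lacking RT; report the machinery if recognized.
    for label, machinery in DNA_MACHINERIES.items():
        if label in found:
            return ("Class II", machinery)
    # Assumption: the caller guarantees autonomy, so the class is still Class II.
    return ("Class II", "unassigned")


if __name__ == "__main__":
    print(classify_autonomous_te({"RT", "YR"}))        # DIRS-like domain set
    print(classify_autonomous_te({"HUH", "helicase"}))  # Helitron-like domain set
    print(classify_autonomous_te({"DDD/E"}))
```

Checking RT before any other domain mirrors the text: a DIRS-like element that carries both RT and a tyrosine recombinase is still a retrotransposon, whereas a tyrosine recombinase without RT points to a Crypton-like DNA transposon.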
