Massive amplification of rolling-circle transposons in the lineage of the bat Myotis lucifugus
Ellen J. Pritham* and Cédric Feschotte*

Department of Biology, University of Texas, Arlington, TX 76019

Edited by Susan R. Wessler, University of Georgia, Athens, GA, and approved December 18, 2006 (received for review October 28, 2006)

Rolling-circle (RC) transposons, or Helitrons, are a newly recognized group of eukaryotic transposable elements abundant in the genomes of plants, invertebrates, and zebrafish. We provide evidence for the colonization of a mammalian genome by Helitrons, which has not been reported previously. We identified and characterized two families of Helitrons in the little brown bat Myotis lucifugus. The consensus sequence for the first family, HeliBat1, displays the hallmarks of an autonomous Helitron, including coding capacity for an ≈1,500-aa protein with an RC replication motif and a region related to the SF1 superfamily of DNA helicases. The HeliBatN1 family is a nonautonomous Helitron family that is only distantly related to HeliBat1. The two HeliBat families have attained high copy numbers (≈15,000 and >100,000 copies, respectively) and make up at least ≈3% of the M. lucifugus genome. Sequence divergence and cross-species analyses indicate that both HeliBat families have amplified within the last ≈30–36 million years and are restricted to the lineage of vesper bats. We could not detect the presence of Helitrons in any other order of placental mammals, despite the broad representation of these taxa in the databases. We describe an instance of HeliBat-mediated transduction of a host gene fragment that was subsequently dispersed in ≈1,000 copies throughout the M. lucifugus genome. Given the demonstrated propensity of RC transposons to mediate the duplication and shuffling of host genes in bacteria and maize, it is tempting to speculate that the massive amplification of Helitrons in vesper bats has influenced the evolutionary trajectory of these mammals.

Chiroptera | Helitron | horizontal transfer | mammalian genome | transposable elements

Author contributions: E.J.P. and C.F. designed research, performed research, analyzed data, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS direct submission.

Abbreviations: MYA, million years ago; RC, rolling circle; TE, transposable element; WGS, whole genome shotgun sequences.

Data deposition: Consensus sequences for the HeliBat families described in this study were deposited in Repbase, www.girinst.org/repbase (accession numbers and coordinates of the individual sequences used to reconstruct the consensus are listed in SI Table 1).

*To whom correspondence may be addressed. E-mail: [email protected] or [email protected].

This article contains supporting information online at www.pnas.org/cgi/content/full/0609601104/DC1.

© 2007 by The National Academy of Sciences of the USA
www.pnas.org/cgi/doi/10.1073/pnas.0609601104
PNAS | February 6, 2007 | vol. 104 | no. 6 | 1895–1900

The largest fraction of most eukaryotic genomes is made up of interspersed repetitive DNA. Transposable elements (TEs) represent the major type of interspersed repeats and often constitute the single largest component of the genetic material. For example, approximately half of the human genome is made of TEs (1), and at least 60% of the maize genome is occupied by TEs (2). There is evidence that TEs have contributed profoundly to shaping eukaryotic genomes through their movement and amplification (for review, see refs. 3–5).

Mammalian TEs described so far fall within three types: non-long-terminal repeat (non-LTR) retrotransposons, retroviral-like (LTR) elements, and DNA transposons. The non-LTR retrotransposons are, by far, the most abundant type of TEs in the genomes of human, mouse, rat, and dog (1, 6–8). In humans, two predominant families of non-LTR retrotransposons (Alu and L1) account for more than one-fourth of the genome and have been major players in the structural genomic evolution of humans and other primates (9, 10). Much less is known about the nature and impact of TEs in the genome of other mammalian lineages.

We report on the discovery of rolling-circle (RC) transposons, also known as Helitrons, in the genome of the little brown bat, Myotis lucifugus. Although Helitrons and Helitron-related sequences have been identified in the genome of plants, fungi, invertebrates, and fish (11–14), these elements have not been described previously in a mammalian species. Helitrons are distinguished from other classes of TEs by their overall structure and by the putative enzymatic activities encoded by autonomous copies (11). Helitrons share structural and sequence similarities with a disparate group of prokaryotic mobile elements and single-stranded DNA (ssDNA) viruses that use RC replication, a process that involves the nicking, displacement, and ligation of an ssDNA intermediate (15–18). Helitrons are poorly characterized TEs in eukaryotes, but they make up ≈2% of the small genomes of Arabidopsis thaliana and Caenorhabditis elegans (11). Furthermore, recent genetic and sequence analyses have implicated Helitrons in large-scale duplication and exon shuffling of thousands of genes in the maize genome (19–22). We show here that Helitrons are a major component (at least 3%) of the M. lucifugus genome, and, as such, they have likely played an important role in shaping some mammalian genomes.

Results

Discovery of HeliBats. Homology-based searches (tblastn) of the whole genome shotgun sequences (WGS) database using as a query a protein domain (≈160 residues) containing the RC motif of a sea urchin Helitron (E.J.P., unpublished data) yielded 94 hits with significant e values (<10⁻⁴) to M. lucifugus contigs. The first 75 hits span the entire RC domain and display 40–57% identity and 63–76% similarity to the query sequence. No significant hits were obtained with any other placental mammals, despite the current representation in the WGS database of 35 other species from most mammalian orders. Two significant hits were obtained with sequences from the platypus Ornithorhynchus anatinus (e values of 2 × 10⁻⁵ and 0.042). Closer inspection confirmed that these platypus contigs contain highly degenerated remnants of Helitron coding sequences (data not shown). Thus, a family of Helitrons also seems to have colonized the genome of an ancestral mammalian or monotreme species.

Using one of the M. lucifugus hits as a seed, we retrieved and aligned 15 closely related DNA sequences and their flanks. This multiple alignment was used to derive a consensus sequence of 5,503 bp that seems to represent a full-length member of a Helitron family that we named HeliBat1. The HeliBat1 consensus sequence possesses all of the hallmarks of a potential autonomous Helitron (11). First, the termini of the consensus and individual copies in the genome are defined by 5′-TC and CTAG-3′ motifs. Second, a short palindromic motif (18 bp with one mismatch and a single T nucleotide spacer) is located 11 nucleotides (nt) upstream of the 3′ terminus of the consensus (Fig. 1). These terminal features are consistent with those of Helitrons previously characterized in A. thaliana, rice, mosquito, and nematode (11, 12).

Fig. 1. Terminal sequence features of HeliBat elements and other Helitrons. The 5′ and 3′ terminal sequences characteristic of Helitrons are shaded in black, and the 3′ palindromic motifs are underlined. The flanking A and T host nucleotides are in lowercase. Ml, M. lucifugus; Ce, C. elegans; Ag, Anopheles gambiae; Os, Oryza sativa; At, A. thaliana.

The HeliBat1 consensus sequence contains a long ORF that potentially encodes a 1,496-aa protein. This putative protein can be aligned over its entire length with the proteins encoded by potential autonomous Helitrons previously identified from the aforementioned species and other eukaryotic species [see supporting information (SI) Fig. 4]. The HeliBat1 protein is of similar length to other Helitron proteins (typically 1,300–1,700 aa) and contains the same domain organization (Fig. 2). The N-terminal region contains predicted zinc-finger-like motifs. The central region shares similarity with the Rep domain of other Helitron proteins (11) and contains a motif that can be confidently aligned with the so-called “two-His” replication initiator motifs of some prokaryotic mobile elements, plasmids, and ssDNA viruses (Fig. 2). The Rep motif is essential for the life cycle of these genetic elements, because it catalyzes the cleavage

Fig. 2. Genetic organization and predicted functional protein domains of HeliBat1. (Top) A schematic representation of the genetic organization of HeliBat1 and domain structure of the putative encoded protein. ZF, zinc-finger-like motifs; Rep, RC replication initiator motif; Helicase, region similar to SF1 superfamily of DNA helicases. (Middle) An alignment of the REP motif of HeliBat1, representative Helitrons from seven other species [abbreviations as in Fig. 1, plus Sp, Strongylocentrotus purpuratus; Cg, Chaetomium globosum (a fungus); Dr, Danio rerio] and several RC viruses and plasmids (SVTS, Spiroplasma plectrovirus; RepSC, Streptomyces cyaneus plasmid; RepBB, Bacillus borstelensis plasmid; RepAA, Actinobacillus actinomycetemcomitans plasmid; TRAARHISN, Rhizobium sp.; NGR234Pf3, Pseudomonas aeruginosa bacteriophage). The positions of the two histidines and two tyrosines known to be critical for catalytic activity of the RC elements are highlighted above the alignment. (Bottom) An alignment of the seven conserved motifs of SF1 superfamily DNA helicases from yeast (P07271), baculovirus (T30397), bacteria (P55418), and T4 phage (P32270) with the corresponding regions of HeliBat1 and other Helitron proteins.

source, the HeliBatN1 family has been markedly more successful at propagating than HeliBat1 (see below). A consensus sequence for a distinct subfamily of HeliBatN1, called HeliBatN2, was constructed from the alignment of 15 copies.
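The text describes deriving family consensus sequences from multiple alignments of individual copies but does not state the rule used. As an illustration only, the following minimal sketch assumes a simple majority rule per alignment column. The function name `majority_consensus`, the gap symbol, the use of "N" for unresolved columns, and the toy sequences are all hypothetical and are not taken from the study.

```python
from collections import Counter


def majority_consensus(aligned, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    Assumption (not from the paper): a column is dropped when more than
    half of the copies carry a gap; otherwise the base present in more
    than half of the non-gap copies is used, and "N" marks columns with
    no such majority.
    """
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*aligned):
        column = [base.upper() for base in column]
        gaps = column.count(gap)
        if gaps * 2 > len(column):
            continue  # mostly gaps: treat as an insertion in a few copies
        counts = Counter(base for base in column if base != gap)
        base, n = counts.most_common(1)[0]
        # Require a strict majority among the non-gap bases.
        consensus.append(base if n * 2 > len(column) - gaps else "N")
    return "".join(consensus)


# Hypothetical toy alignment of four copies (not real HeliBat sequences).
copies = [
    "TCATGCA-CTAG",
    "TCATGCAACTAG",
    "TCTTGCA-CTAG",
    "TCATGGA-CTAG",
]
print(majority_consensus(copies))
```

Real consensus building for repeat families often also handles hypermutable CpG sites and ambiguous bases, which this sketch leaves out.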