MOLECULAR AND CELLULAR BIOLOGY, Nov. 1992, p. 5102-5110
Vol. 12, No. 11
0270-7306/92/115102-09$02.00/0
Copyright © 1992, American Society for Microbiology

Distinct Families of Site-Specific Retrotransposons Occupy Identical Positions in the rRNA Genes of Anopheles gambiae

NORA J. BESANSKY,¹,²* SUSAN M. PASKEWITZ,¹† DIANE MILLS HAMM,¹ AND FRANK H. COLLINS¹,²

Malaria Branch, Division of Parasitic Diseases, National Center for Infectious Diseases, Centers for Disease Control, Atlanta, Georgia 30333,¹ and Department of Biology, Emory University, Atlanta, Georgia 30322²

Received 15 May 1992/Returned for modification 1 July 1992/Accepted 27 August 1992

Two distinct site-specific retrotransposon families, named RT1 and RT2, from the sibling mosquito species Anopheles gambiae and A. arabiensis, respectively, were previously identified. Both were shown to occupy identical nucleotide positions in the 28S rRNA gene and to be flanked by identical 17-bp target site duplications. Full-length representatives of each have been isolated from a single species, A. gambiae, and the nucleotide sequences have been analyzed. Beyond insertion specificity, RT1 and RT2 share several structural and sequence features which show them to be members of the LINE-like, or non-long-terminal-repeat retrotransposon, class of reverse transcriptase-encoding mobile elements. These features include two long overlapping open reading frames (ORFs), poly(A) tails, the absence of long terminal repeats, and heterogeneous 5' truncation of most copies. The first ORF of both elements, particularly ORF1 of RT1, is glutamine rich and contains long tracts of polyglutamine reminiscent of the opa repeat. Near the carboxy ends, three cysteine-histidine motifs occur in ORF1 and one occurs in ORF2. In addition, each ORF2 contains a region of sequence similarity to reverse transcriptases and integrases.
Alignments of the protein sequences from RT1 and RT2 reveal 36% identity over the length of ORF1 and 60% identity over the length of ORF2, but the elements cannot be aligned in the 5' and 3' noncoding regions. Unlike that of RT2, the 5' noncoding region of RT1 contains 3.5 copies of a 500-bp subrepeat, followed by a poly(T) tract and two imperfect 55-bp subrepeats, the second spanning the beginning of ORF1. The pattern of distribution of these elements among five sibling species in the A. gambiae complex is nonuniform. RT1 is present in laboratory and wild A. gambiae, A. arabiensis, and A. melas but has not been detected in A. quadriannulatus or A. merus. RT2 has been detected in all available members of the A. gambiae complex except A. merus. Copy number fluctuates, even among the offspring of individual wild female A. gambiae mosquitoes. These findings reflect a complex evolutionary history balancing gain and loss of copies against the coexistence of two elements competing for a conserved target site in the same species for perhaps millions of years.

* Corresponding author.
† Present address: Department of Entomology, University of Wisconsin, Madison, WI 53706.

Among the transposable elements that encode reverse transcriptase (RT) are those without long terminal repeats (LTRs). Examples of this type, sometimes referred to as non-LTR retrotransposons, are the mammalian LINE-1 (L1) elements and the Drosophila melanogaster I factors, F elements, and jockey (6). Comparisons among the many elements characterized from a broad phylogenetic spectrum of organisms have revealed several common structural and sequence features. Full-length elements, typically about 6 kb long, include one or two long open reading frames (ORFs) and poly(A)- or (A)-rich terminal tracts. Most copies, thousands in mammals and tens in other organisms, are heterogeneously truncated at the 5' end. The coding region includes domains with RT homology and may also contain one or more cysteine-histidine (Cys) motifs reminiscent of the nucleic acid-binding domains of retroviral nucleocapsid proteins. Recent reports have demonstrated transposition through an RNA intermediate for the I factor (37, 56) and the mouse L1 element (25). Indeed, the RT enzymes encoded by CRE1, jockey, and a human L1 element have been shown to be functional (30, 33, 50), and the RT activity is associated with virus-like particles of L1 RNA and protein in the microsomal fraction of human and mouse embryonal carcinoma cells (20, 49). However, unlike retroviruses, L1-like elements are not infectious (43), and the absence of LTRs indicates a very different replication strategy that is not well understood. Peptide alignments show that the RT and capsid-like domains of non-LTR retrotransposons are more closely related to one another than to those of other RT-encoding mobile elements (23, 51, 66, 68). However, although the coding region may occupy over 80% of the length of a given element, the sequence diversity among characterized non-LTR retrotransposons is high enough to preclude amino acid alignment of any but very short segments.

Many non-LTR retrotransposons, such as mammalian L1 elements (32) and T1 elements from mosquitoes in the Anopheles gambiae complex (8, 9), are widely dispersed in the host genomic DNA and have no apparent insertion site specificity. However, some elements of this same class exhibit a preference for a particular target site. In three different trypanosomatid protozoa, three distinct elements (CRE1, SLACS, and CZAR) interrupt some portion of the array of spliced leader RNA genes at the same conserved sequence (2). The Drosophila G element occupies a precise site in the intergenic spacers of the rRNA genes (22). Similarly, some 28S rRNA genes are interrupted by non-LTR retrotransposons in most insects (34). Most of the rDNA insertions studied belong to the R1 and R2 families of elements that are found at highly conserved sites located 74 bp apart and termed R1 and R2 sites, respectively. Exceptions from the Japanese beetle (34) and fungus gnat (41) insert about 30 bp upstream of the R2 site, one from a nematode (3) inserts between the two sites, and the mosquito elements reported here, RT1 and RT2, insert about 630 bp downstream (55).

The A. gambiae complex, named for the member species that is the most important vector of malaria in the world, is a group of at least six tropical African mosquito species that are morphologically indistinguishable as adults. They are so closely related that the divergence from a common ancestor of the most anthropophilic members may have accompanied the shift in settlement patterns and land use associated with the origins of agriculture in Africa in the last several thousand years (17). Nucleotide sequence comparisons using regions of the ribosomal and mitochondrial DNAs are consistent with this view (11).

Previously described non-LTR retrotransposons from diverse species have been too diverged to align, except for a few relatively short conserved amino acid sequence motifs typical of RT and putative nucleic acid-binding proteins. Sequence comparisons among distinct but less diverged element families from sibling species may help define the importance of less understood regions of the ORFs, particularly the more rapidly evolving gag-like ORF1. Analysis of ORF2 structural domains from distinct site-specific elements may elucidate those residues which dictate target site preference and recognition. Comparison between two elements competing for the same target site in A. gambiae and other species in the complex may also provide an evolutionary picture of their transmission, propagation, and loss. With these goals in mind, we isolated from A. gambiae full-length representatives of two non-LTR retrotransposon families with identical target site specificities and determined and analyzed their complete nucleotide sequences.

FIG. 1. Structural organization of RT1 and RT2 and their rDNA insertion site relative to those of R1 and R2. Cross-hatched boxes, coding regions; stippled boxes, subrepeat regions; black and open boxes, coding and transcribed spacer sequences, respectively; S, SalI sites used for subcloning. Not drawn to scale.

MATERIALS AND METHODS

Isolation and sequencing of clones. Genomic clones from EMBL3 phage libraries containing RT1 and RT2 insertions in the 3' region of the 28S rRNA gene of A. gambiae and A. arabiensis were isolated as described previously (55). Full-length RT1 clone Agr23 from A. gambiae was obtained from the initial screening. A full-length representative of RT2 from A. gambiae, Ag106, was selected from the EMBL3 library by using the KpnI-PstI fragment of Aar23, a truncated RT2 element from A. arabiensis, as a probe (55). Overlapping restriction fragments from each bacteriophage

melting-temperature agarose gel. The insert was excised from the gel and labeled with [32P]dCTP to 5 × 10^8 cpm/µg by random primer extension in the gel overnight (27).

Southern analysis. DNA was prepared from individual adult mosquitoes as described previously (15), digested with 15 U of SalI (Boehringer Mannheim Biochemicals, Indianapolis, Ind.) for 90 min, electrophoresed on a 0.8% agarose gel, and transferred to GeneScreen Plus membrane (Du Pont, NEN Research Products, Boston, Mass.). Hybridization was performed overnight in 0.25% nonfat dry milk (Carnation)-6x SSC (1x SSC is 0.15 M NaCl plus 0.015 M sodium citrate) at 65°C. The blot was washed three times for 15 min each time in prewarmed 0.1x SSC-0.1% sodium dodecyl sulfate at 65°C. Activity was recorded both by autoradiography and by a Betascope 603 Blot Analyzer (Betagen Corp., Waltham, Mass.) made available by the Centers for Disease Control Biotechnology Core Facility. The latter was used in accordance with the manufacturer's specifications to record beta emissions directly from the hybridized membranes.

Northern (RNA) analysis. Total RNA and polyadenylated mRNA were isolated from embryos, larvae, pupae, and adults of an A.