Proc. Nati. Acad. Sci. USA Vol. 89, pp. 4018-4022, May 1992 P-element homologous sequences are tandemly repeated in the of Drosophila guanche WOLFGANG J. MILLER*, SYLVIA HAGEMANNt, ELEONORE REITERt, AND WILHELM PINSKERtt *Institut fur Botanik, Abteilung Cytologie und Genetik, Rennweg 14, A-1030 Vienna, Austria; and tInstitut fur Allgemeine Biologie, Abteilung Genetik, Wahringerstrasse 17, A-1090 Vienna, Austria Communicated by Bruce Wallace, December 23, 1991

ABSTRACT In Drosophila guanche, P-homologous se- In germ-line cells, the copy number of P elements is quences were found to be located in a tandem repetitive array genetically regulated by a complex mixture of chromosomal (copy number: 20-50) at a single genomic site. The cytological and cytoplasmic factors (11). At least two mechanisms can be position on the polytene chromosomes was determined by in situ distinguished: the maternally transmitted cytotype (12) and a hybridization (chromosome 0: 85C). of one com- chromosomally inherited control system (13). In the P cyto- plete repeat unit (3.25 kilobases) revealed high sequence simi- type, the cytoplasmic state of P-strain individuals, transpo- larity between the central coding region comprising 0 to sition is repressed in somatic cells and in the germ line. In 2 and the corresponding section of the Drosophila melanogaster general the P cytotype develops gradually as a result of the P element. The rest of the sequence has diverged considerably. increasing number of P-element copies. M-strains, on the 3 has no coding function and the inverted repeats have other hand, have devoid of P elements. In the M disappeared. The P homologues ofD. guanche apparently have cytotype, which is found in eggs derived from M-strain lost their mobility but have retained the coding capacity for a females, transposition of paternally introduced P elements is protein similar to the 66-kDa P-element of D. mela- not inhibited in the germ line. An intermediate repression nogaster. Divergence between different repeat units indicates potential is found in M'- and Q-strains (14). The chromoso- early amplification of the sequence at this particular genomic mally inherited system of transpositional regulation is site. The presence of a common P-element site at 85C in thought to be based on a particular class of modified P Drosophila subobscura, Drosophila madeirensis, and D. guanche elements, the KP element (13, 15), but other factors may be suggests that clustering ofthe sequence at this location took place also involved (11). The molecular mechanisms underlying before the phylogenetic radiation of the three species. transposition control in the germ line are not completely elucidated, but defective P-element copies are supposed to The P elements of Drosophila melanogaster are perhaps the play an important role in the repression of transpositional best characterized eukaryotic transposons (for review, see activity. About two-thirds of the P-element copies present in refs. 1 and 2). For the evolutionary biologist the P-element a true P-strain genome carry internal deletions possibly family provides an excellent system for studies of the evo- lutionary dynamics of mobile DNA (3-5). In general the generated by imprecise double-strand gap repair after trans- autonomous propagation of transposable elements in a ge- position events (16) and are capable of passive transposition nome can have deleterious side effects for the host, such as only. The defective P-element copies that could influence the high mutability, chromosomal rearrangements, and sterility regulation of transpositional activity in the germ line fall into (6). Consequently, the following mechanisms favoring the at least three distinct functional classes (8): (i) copies lacking elimination of these genomic parasites may counterbalance open reading frame (ORF) 3 sequences, thus producing the their proliferation (7): (i) random processes (e.g., deletions), 66-kDa P-element repressor protein exclusively (17, 18); (ii) (ii) regulating mechanisms caused by the element itself, or small nonautonomous P elements with large internal dele- (iii) active defense mechanisms developed by the host ge- tions that could dilute by titration, thereby nome. Thus, in its most extreme form, the evolution of a reducing the frequency of transposition (19); (iii) P elements mobile sequence can be seen as a succession of replicative coding for truncated proteins that interfere with the dimer- bursts, dispersion, and finally elimination, a pattern that ization of the transposase (8). The potential of a defective P resembles the evolutionary behavior of organisms more than element to suppress transposition depends not only on the the evolutionary behavior of stationary DNA sequences. sequence but also on the genomic position (18, 20, 21). Although this simplified model is certainly not the only way Recent results show that the P cytotype can be elicited by transposons could evolve, it may apply to P elements and only two P-element copies (22). related systems. In natural populations of D. melanogaster, there is a In D. melanogaster, transposition activity ofP elements is tendency to accumulate particular defective P-element cop- regulated in a tissue-specific manner (for review, see ref. 8). ies preventing transpositional activity. Two cases have been Tissue specificity is achieved by differential splicing of the described in detail. Nitasaka et al. (23) found a defective P full-length transcript (9). The third is removed only in element responsible for the P cytotype in a strain from a the germ line, giving rise to an 87-kDa protein that functions Japanese population. Due to a 58-base-pair (bp) deletion near as a transposase necessary for genomic mobility. In somatic the end of ORF2, this deficient P element codes for a cells, the third intron remains and a shorter 66-kDa protein is truncated protein with repressor function. Another deletion produced, acting as a repressor of transposition (10). There- derivative, the KP element (13), is present in many copies in fore, propagation of P elements is confined to the germ line the genome of some strains. KP elements are thought to be and the damage to the host organism resulting from P-ele- involved in the suppression of transpositional activity (15). ment-induced is minimized. Abbreviation: ORF, open reading frame. The publication costs of this article were defrayed in part by page charge tTo whom reprint requests should be addressed. payment. This article must therefore be hereby marked "advertisement" §The sequence reported in this paper has been deposited in the in accordance with 18 U.S.C. §1734 solely to indicate this fact. GenBank data base (accession no. M81221).

4018 Downloaded by guest on September 25, 2021 Genetics: Miller et al. Proc. Natl. Acad. Sci. USA 89 (1992) 4019 Defective P elements also have been detected in the genomes An answer to this question is given by the analysis of of Drosophila nebulosa (24) and Drosophila bifasciata (25). P-homologous EMBL3 clones isolated from a nonamplified In the present study P-homologous sequences were iso- genomic library of D. guanche. Fifteen clones (EMBL3-Gpl lated and sequenced from a genomic library of Drosophila to EMBL3-GplS) were digested with BamHI, Xho I, orBgl II. guanche, a member of the Drosophila obscura group. Anx- With each of the digests, the fragment pattern was olabehere et al. (26) report that P-element sequences are found to be homogenous among all the clones tested. A single widespread in this species group. The most intense hybrid- 3.25-kb fragment was observed in the Xho I and Bgl II digests. ization signals in squash blots probed with the D. melano- With BamHI one large fragment of -23 kb was obtained, gaster P element were obtained with D. guanche, indicating indicating that there are no restriction sites for this enzyme in either a high copy number or close similarity in sequence. D. the inserts of the 15 clones. From these results it can be guanche is a species endemic to the island of Tenerife concluded that the P-homologous sequences of D. guanche (Canary Islands) and is considered to be a primitive repre- are tandemly repeated in the genome, one repeat unit being sentative of the group (27). The closest relatives in the D. 3.25 kb long. This interpretation is supported by a restriction obscura group are the species Drosophila subobscura and experiment carried out with the clone EMBL3-Gpl. After Drosophila madeirensis (28-30). Our investigation provides partial digestion with Bgl II, an additional fragment of 6.5 kb sequence data to elucidate the complicated evolutionary representing the dimer ofthe repeat unit can be detected (Fig. history of the P-element family in the genus Drosophila. 1). Since rearrangements within phage clones are known to Furthermore, the present status of the D. guanche P element arise sometimes as artifacts of cloning, the experiment was may contribute to the understanding of the evolutionary life repeated using genomic DNA. The additional multimers cycle of transposable elements in the gene pool of a species. shown in Fig. 2 clearly prove that tandemly repeated P homologues actually occur in the genome. MATERIAL AND METHODS Another line of evidence for the clustered array of P Fly Stocks. Wild strains of D. guanche (Tenerife 1985), D. homologues in the D. guanche genome comes from in situ madeirensis (Madeira 1987), and D. subobscura (Sweden hybridization experiments. When the Bgl II fragment of the 1977) were derived from mass cultures of the original popu- clone EMBL3-Gpl was used as a probe, only a single lation samples collected in the years indicated. The two hybridization signal was detected on the polytene chromo- cytological marker strains of D. subobscura (03+4+6 and somes ofD. guanche: this signal was localized at position 85C 03+4+2) were established from inbred lines maintained at the of chromosome 0 on the cytological map of Molt6 et al. (36). University of Tubingen (Federal Republic of Germany). Analogous experiments carried out with the closely related Cloning Procedures. High molecular weight DNA for the species D. subobscura and D. madeirensis revealed that both genomic library of D. guanche was prepared according to species possess P-homologous sequences at cytologically Hagemann et al. (25). The genomic DNA was partially equivalent positions. In D. subobscura the site at 85C is the digested with Sau3A and ligated into phage AEMBL3 (31). only one in the genome. The exact chromosomal position in Screening was done by the methods of Lansman et al. (24), D. subobscura was further confirmed by in situ hybridization using the probe pr25.1, which contains a complete P element experiments with inversion strains. Since the P-element site of D. melanogaster (32). A second P-element probe used for is located inside of inversion 06 but outside of 02, it must be control experiments was bifl, a P-homologous sequence situated between the proximal breakpoints of the two inver- isolated from the genome ofD. bifasciata (25), which consists only of 5' and 3' noncoding sequences. Procedures for 1 2 3 4 5 6 restriction enzyme digestion, agarose gel electrophoresis, gel blot analysis, nick translation of probes, and filter hybrid- ization followed standard methods (33). w - w 4e DNA Sequencing. The P-homologous DNA fragments se- 23.1 lected for sequencing were subcloned into the vector pBlue- script I KS(+). Sequencing was accomplished by the dideoxy- 9.4 nucleotide chain-termination technique (34) using T7 DNA Dimer 6.50 6.5 polymerase (Pharmacia). In Situ Hybridization. Larval salivary gland chromosomes were prepared and hybridized to biotin-labeled probes ac- cording to Engels et al. (35). Detection was carried out as 4.3 described in the Detek instruction manual (Enzo Biochem). Monomer 3.25 RESULTS Genomic Organization. According to preliminary squash- blot and dot-blot assays (data not shown), the genome of D. 2.3 guanche was expected to harbor multiple copies, 20-50, of - 2.0 P-homologous sequences. However, gel-blot analysis of ge- nomic DNA digested with Bgl II and probed with the D. melanogaster P element pi7r25.1 revealed only a single P-ho- mologous fragment of 3.25 kilobases (kb). From the intensity of the hybridization signal, it was clear that this 3.25-kb fragment could not be derived from a single P-element copy in the D. guanche genome. Thus the observed pattern could in (i) D. guanche be explained the following two ways. The FIG. 1. Southern blot of EMBL3-Gpl partially digested with Bgl P element is somewhat longer than that of D. melanogaster II and probed with pir25.1. Lanes: 1, undigested control; 2, Bgl II at and contains two restriction sites for Bgl II. The genomic 0.625 unit/Ag; 3, 1.25 units/,Ag; 4, 2.5 units/,ug; 5, 5 units/,ug; 6, distribution could be either dispersed or tandem repetitive. complete digestion. The positions of the monomer and the dimer of (ii) The P element ofD. guanche has only one Bgl II site and the 3.25-kb repeat unit are indicated. The additional weak bands multiple copies are tandemly arranged in a cluster. probably represent fragments containing vector DNA. Downloaded by guest on September 25, 2021 4020 Genetics: Miller et al. Proc. Natl. Acad. Sci. USA 89 (1992) common P-element site at 85C in the three species suggests that clustering of the sequence at this location took place before the phylogenetic radiation of D. subobscura, D. madeirensis, and D. guanche. Sequence Data. The base sequence of one complete repeat unit (Gl) from the EMBL3-Gpl insert is shown in Fig. 3. The 192 5 -- Pentarnrer P-homologous region was aligned to the sequence of the T(lltrar canonical P element of D. melanogaster, and the results of ---U1*- this interspecific sequence comparison are shown in Table 1. 9 75 It .1 A good match between the two sequences is obtained only in q the central segment extending from the 5' end ofexon 0 to the 3' end of exon 2 (positions 146-1993 in Fig. 3). In the U .o- ODi fnl sequences upstream and downstream of this apparently con- served subregion, the similarity is significantly lower (2 = 182.9; degree of freedom = 1; P < 0.001). As a consequence, . the delimitation of the P homologue (2938 bp) in the repeat unit suggested by the computer alignment may be question- able. The 8-bp target sequences flanking each P element in D. 3.25 ----- Monolrier w 4 melanogaster as direct repeats are completely missing, and the entire sequence between the conserved P-element ho- mologues (including the intervening spacer segment) does not contain stretches of DNA resembling direct repeats. There- fore, the target site duplications have either disappeared by multiple mutations or were located outside of the boundaries of the amplified segment. Neither the 31-bp terminal inverted repeats (positions 1-30 and 2908-2938, according to the FIG. 2. Southern blot analysis of genomic DNA of D. guanche alignment) nor the 11-bp subterminal inverted repeats (posi- partially digested with Bg1 II and probed with the D. guanche P tions 123-132 and 2790-2802) have retained their comple- homologue (insert of EMBL3-Gpl). Lane 1 contains a BgI 11(0.0625 mentary structures, and there are no inverted repeats else- unit/A&g) digest; multimers of the 3.25-kb repeat unit (from the dimer where in the sequence. It can be assumed that these P to the pentamer) can be identified. Additional bands probably result homologues have lost the capability for passive mobilization from length variation in the cluster. Lane 2 contains a BgI II (1 by transposase (provided from intact P elements) in the unit/Ag) digest; the 3.25-kb monomer and the dimer can be seen. remote past. Since then the recognition sites for the trans- posase have been further eroded by random mutations. sions indicated on the map of Kunze-Muhl and Muller (37). In contrast to the noncoding flanking regions, the protein- In D. madeirensis, which has the identical banding pattern in coding sequences of exons 0 to 2 appear intact. If only this chromosomal segment (29), an additional signal, al- nucleotide substitutions are considered, the average similar- though of weaker intensity, is observed at position 85A, close ity to the D. melanogaster P element is 70%. Small deletions to the main P-element site. This second site may have been and insertions of up to 7 bp are found in all three exons. Since generated by a small cytologically undetectable chromosome these mutations occur either as triplet changes or are com- rearrangement or by the insertion of a mobile element. A pensated by mutations in the adjoining codons, the ORF is similar disruption of a has been described in the not disrupted. The splicing signals for 1 (positions same species for the histone site (30). The presence of a 430-493) and 2 (positions 1141-1261) are conserved and

CAGMATTATAGCCGTATGGAATACCAGTTTTACATGTCTGGTCACACTCTGTCGCCAGCTTTTCAATTGCACATCTACATAGATTCGGMAAAGGTTGAACAAATTTTCTATTTTTTTCMA 120 TTGCTMACCGTATGAGCGCCAAAAAATGACGTGGTGCAGTGTATGCGGCAAAGTCGCTAACCATGTGMAATTGGTTCATGTCCCGGTATGTTTAGAGMGAGAAAATTGTGGGAGCAAAT 2240 TCTGGACTGCAGCTTTGCTGTGAACTCGAAAATTTGCGATTCACATTTCGATGCTTCACAGTGGAGGTCGCCACCAAAGGAAGGGCAGATATATMAAAGAAGGCGACTGAAAGCTGATGC 3360 TGTTCCTCATGGCGAGCCTGAACCAAAATTTGTTAAATTAGGCTTCGCAAACTCGAGTACGCAAACAGAgtaagctaaacatgctcaaataaatatttacacaaatgccaatactaatt 480 gt t tgt aatgtagGGACAACGTGATAAATCATGCAATACGTGTGGAGAACGAGAGTCTGAGGAAACAGMATCGTCGMTGCAGAAAGAAATGCATTCTTTACGTCAGCAACTTGAGGACT 6600 -TTAAGGAATTGGAAATCAGTCTAAAAACCATTTTCACCGAGACCCAAATAAATATATTGAAGAGTGGAGGCAAGAGAGCCGTATTTAATGCAACAGACATGTCTGCTGCTATTTGCCTTC 72 0 ATACCGCGGGACCTCCTGCATACAATCATCTTTATAGAAAAGGATTTCCTTTGCCGAGTCGAGCAACATTATACAGATGGTTGGCAGATGTTAACATCAGTACAGGTACCCTTGATGTCG 84 0 TTATAGATCTCATGGAAMATGAGGAAATGCCCGMAGTGGATAAGCTATGCGTTTTGTCCTTTGATGAAATGAAGGTCGCTGCTGCATTCGAGCATGACAGCTCAGCGGATGTTGATTATG 960 AGCCAAGCACCTATGTTCAACTGGCCATAGCTCGTGGCTTGAATAAGTCTTGGGAGCAGCCAGTATTCTTCGATTTCAGTACTCTAATGGACGCGGACACTCTGCATTCAATMATAAACA 1080 AACTGCACAAAAGAGGCTATCCCGTAGTAGCTATTGTCTCTGATTTGGGCGCTGGAAATCAAACATTATGGACAGAGCTTGGCATATCGGAGAgtaagt t tct tatacgagt taaaat cc 120 0 gaaat ttaatt ta t ttt ttgt tt tagCMAGGAATTGGTTTACCCATCCAGCGGATGAAGATTTGAAAATATTCGTTTTCTCGGATACGCCAACTCTGATAAAGTTGGTCCGTGACCAGTA 132 0 TGTTGACTCCGGACTAATCATCAACGGAAAACGATTGACTAAATCGACAGTCCAGCAGACAATTAGCCACTGTGCTAAGCCAGATGTGTCGATGTCATTCAATATTACCGATAATCACCT 144 0 TAATATTGGACCGCTGGCCAAACAGAATATCAAATTGGCAACGCAACTGTTCTCCAATACAACCGGCAGCTTCATAAGACGATGCAATGCATTGGGTTATAACGTACAGAACGCATCCGA 1 560 AACCGCTGATCTTTTTAAAATTATCAATGATTGGTTCGGGGTCTTCAACTCAAAGTCTTCAACATCAAATTCCATCGAACCGACGCAGCCTTATGGCAAACAAATCGAAATCCAACGAGG 168 0 CATTTTGGCCAAAATGTCTGAGATTATGAGCTCGGAGATACTTGGAGTGGGAGCCCACAGCCTGCCCTTCCAAAAAGGCATTTTAGTCAACAATGCATCCCTTGMAGGTTTGTATTGCTA 180 0 TCTATCGTCGGAAAAGTACGAGATCGAATACATTTTTACAAGTCGTCTCTCCCAAGACATTGTTGAGAACTTTTTTATGCCCATGCGGCCTAAAGGTGAACAGTTTGAGCATCCGACTCC 1 920 ACTTCAATTTAAGTTTATGTTAAGAAAATATATATCAGGTATGACAAAATTAAATAAACCAATTGATAAATAATT GATTTGAACAATACAAATTGTGACATAGAAATAATTGTTTTAAGT 2 04 0 TATTTTATATTTCATATTAAATTTGAGGACTCATATTATCAGAGGAATGATATGGTTGTATATCTAGGTTCCTGCTCCATGGAGCTGTATTAAAGAGCTGTAGCATGTGCTCAAGTCTAA 216 0 GTTTGCTAAAGTACCGCATTGTTTTTCACTGCAGACGCTCGCGTATGATCGCTGTGCGGCTGAAATTTCCTTGGCCGCTGCTGGCAGTACCTATGGGATCAGGATGCGCACATTGGCCGC 228 0 ATGGTATAAGTTGTCTGTAGGCAAAAAGTGCGCTGAAAGTCCTCGAAATAGAACTGCATTTCCAGAGTGATGAACAGTCGGATGAGGTCCAGGGCCAGCTTGGCATCGTGCCTGGCCAGC 240 0 TTGGCCGCGTATAAGCTGTTCGTCTGTGCCCAGAGCTCCCTCAAAGACCTGATAGTTCTCGCAATTGGAGCGAAGGAGCTrGCAGCAGGTCCAGCAGGAACTTGGGTCGCTGTTCGAGCGT 2 52 0 GTTGGAAATGTCGTGTAAGAACTCATCATCCAAGTTTAGCGGTATTTTGAGATTAATTATACCCGTTACTCGATATACACGAATATACTAGGGTATATTGTATTTGTGGGGAAACAATGT 2 64 0 AATTCACAGAAGGAAGCCTTTCCTTCGCCAAAAGAATATATATTTTGATCAGCATGAATAGCCGAGTCAATGTGTCAACCTGTCTATTTTTTTTTAATTTCTGTACGCCCCCGCAAAACA 2 760 GCGAAAATCTGCTATATCCACAATTTTAAAGATCGAGAAAACTAAAAATGCCATTCCGTAGGGAATGAATATAGCTATCAGATCAACGAATTGGGATCAGATTGGAITCATTATTACAGCC 2 880 AGAATGAAGAAATTAATTTGCAGTGGCTAAACCCTCCCCGCCCCGCAGCTTTATTATGt t ttgcacat tcgaaaa tgcgagctaaaagagcgtgt tggcgtaagaagtga tgtaca tgta 3 000 gaagtaga tgtaaa tgacat tgtagaaacaa ttt taaatggcaaatgtaaaagttgaaagttgaaagaaaaaaaaaaaaggtacaca tatactaactgtgtaccgggtataaaagt tgtg 3120 a cgctaagaagcgt ctta caca tcccct tct cgtt t ttacct tcggt tct tatggaa tct tca tccgga ctgt t ttgcaaagcaaa caaaacaaaa cagctca cagctgt tt 3 23 2 FIG. 3. Sequence of one P-homologous repeat unit (G1) of D. guanche. The sequence consists of two subunits: the P homologue (roman type, positions 1-2938) and the intervening sequence (lowercase italic type, positions 2939-3232). Presumptive protein coding sequences ofexons 0, 1, and 2 are underlined. Introns 1 and 2 are given in lowercase type. The four additional stop codons after exon 2 are also underlined. Downloaded by guest on September 25, 2021 Genetics: Miller et al. Proc. Natl. Acad. Sci. USA 89 (1992) 4021 Table 1. Interspecific sequence comparison DISCUSSION Length, Sequence differences, bp % The rather high degree of sequence divergence between the P Region bp Id Sub Del Ins sim homologues ofD. guanche and the D. melanogasterP element indicates that the two sequences have evolved independently 5' IR 31 15 15 1 0 48.4 over a long period of time. This finding provides additional 5' NCR 122 57 57 7 1 46.7 evidence for the theory postulating recent invasion of the D. Exon 0 290 192 92 6 0 66.2 melanogaster gene pool by a P element horizontally intro- Intron 1 64 36 22 0 6 56.3 duced from Drosophila willistoni (38). D. melanogaster is Exon 1 680 468 200 0 12 68.8 phylogenetically closer related to D. guanche than to D. Intron 2 54 35 17 1 1 64.8 willistoni (39). Nevertheless the P elements of D. melano- Exon 2 732 515 211 0 6 70.4 the D. Intron 3 203 105 83 2 13 51.7 gaster and D. willistoni are almost identical, whereas Exon 3 582 236 325 8 13 40.5 guanche P homologue is considerably different from both. It 3' NCR 174 69 101 0 4 39.7 can be assumed that the P homologue of D. guanche has 3' IR 31 12 19 0 0 38.7 remained continuously in the same gene pool since the com- Total 2963 1740 1142 25 56 58.7 mon ancestor ofthe lineages leading to the D. willistoni group and the branch that later separated into the D. melanogaster Sequence differences between the D. guanche P homologue (Gi) group and the D. obscura group. Therefore, the P elements of and the canonical P element of D. melanogaster in various subre- D. 53 gions from the 5' end to the 3' end. Id, identical base-pair positions; melanogaster and D. guanche have been separated Sub, substitutions; Del, deletions; Ins, insertions; % sim, percent million years, the estimated divergence time between D. sequence similarity including insertions and deletions; IR, terminal guanche and D. willistoni (40). It would be interesting to inverted repeats; NCR, noncoding regions. compare the sequence of the D. guanche P homologue with related P sequences from other species of the D. obscura located at positions homologous to those in the D. melano- group to see whether a group-specific P element exists with gaster sequence. These two introns resemble the correspond- shared substitutions characteristic of the lineage. Unfortu- ing regions in the D. melanogaster P element in length and in nately, the sequences of the internally deleted P elements of sequence. In exons 0 to 2, the sequence changes in D. D. bifasciata (25) do not overlap with the functionally con- guanche did not generate stop codons and the ORFs are served regions of the D. guanche P homologue. The degen- preserved. Altogether the section comprising exons 0 to 2 is erated flanking regions of the D. guanche P homologue, more like a functional coding sequence than a degenerated however, have diverged at a much faster rate, thus obscuring derived from an immobilized transposon. the phylogenetic relationships. The region after the 3' end of exon 2 has considerably The conservation of exons 0 to 2 suggests that the P diverged from the D. melanogaster sequence. Intron 3 has homologue of D. guanche may code for a truncated protein lost the 3' splice signal. In the adjacent segment equivalent to similar to the 66-kDa P-element repressor (18) of D. melano- exon 3 of the D. melanogaster P element, the original reading gaster. The amino acid sequence of the putative protein is frame-has shifted because of randomly distributed noncom- given in Fig. 4. Provided that the third intron is not removed pensating deletions and insertions. Moreover, all possible because ofthe at the 3' splice signal, the translational reading frame is terminated by a stop codon at positions reading frames are blocked by several stop codons and it is 1991-1993. Four additional stop codons are found among the rather obvious that this sequence has no coding function. following 15 base triplets (Fig. 3). The protein predicted from To estimate the amount of variation among the P homo- the DNA sequence consists of 575 amino acids and has a logues in the cluster, two additional copies (G2 and G3) molecular weight of 65,341. With 63.6% identical amino acid derived from a different EMBL3 clone (EMBL3-Gp7) were positions, the homology to the P-element repressor of D. partially sequenced (260 bp of exon 0). Although the three P melanogaster would be rather high. Since 33% of the amino homologues belong to a tandemly clustered , they acid replacements are conservative changes, one is inclined to have diverged considerably (8.6-11.9%o), a finding that sug- postulate similar functions for the two proteins. gests early amplification of the sequence at this particular Although transcriptional activity of the D. guanche P genomic site. This result is consistent with the interspecific homologue has not been proved, it is tempting to speculate comparison of the chromosomal location of P homologues in why this terminally degenerated P-element derivative has D. guanche and its close relatives. In spite of the substitu- accumulated in the genome of D. guanche. The most plau- tional divergence, the translational reading frame of G2 and sible assumption is that a P-element repressor coding se- G3 has been preserved. quence was amplified at position 85C as a molecular defense According to these data the P homologues of D. guanche mechanism to suppress transposition. After switching of the can be described as stationary transposon derivatives that are cytotype from the M to the P state, all mobile P-element degenerated at their 5' and 3' termini but have some residual sequences (full-sized transposase producers and internally coding capacity. With respect to the internal coding region, deleted passive elements) became immobilized and started to the existence of differently structured P elements can be degenerate by random mutations. The process of degenera- tion may have gone so far that these elements are no longer excluded on the basis of restriction experiments. Theoreti- detectable by hybridization techniques. cally, however, it could be that internally deleted P-element An alternative hypothesis for the origin of the D. guanche derivatives with intact terminal repeat structures are present P homologue appears also reasonable. Accordingly, the P in the genome of D. guanche in addition to the tandemly homologues ofD. guanche may represent an ancient genomic repeated type. To test this possibility, Southern blots of sequence that predates mobile P elements. A copy of this genomic DNA of D. guanche were hybridized with the gene was captured by primordial P elements and converted P-element probe bifl of D. bifasciata (25), a sequence into a functional transposase gene by addition of a fourth consisting exclusively of noncoding termini. Since no signals exon. The formation of a transposon derived from a P were detected in these control experiments, we assume that element by a similar process has been described (41). The the genome of D. guanche does not contain sequences interspecific distribution pattern ofP elements (26, 38, 42, 43) homologous to the 31-bp terminal inverted repeats found in could be interpreted to mean that this transposon family was all P elements studied so far. present in the progenitor of the genus Drosophila. On the Downloaded by guest on September 25, 2021 4022 Genetics: Miller et al. Proc. Natl. Acad. Sci. USA 89 (1992)

Dg MTWCSVCGKVANHVKLVHVPVCLEKRKLWEQILDCSFAVNSKICDSHFDASQWRSPPKEG 60 2. Engels, W. R. (1989) in Mobile DNA, eds. Berg, D. E. & Howe, DM KY RF C AM I I RA M _ T ND KMAA M. M. (Am. Soc. Microbiol., Washington), pp. 437-484. Dg QIYKRRRLKADAVPHG--EPEPKFVKLGFANSSTQTEDNVINHAIRVENESLRKQNRRMQ 120 3. Kaplan, N., Darden, T. & Langley, C. H. (1985) Genetics 109, Dn TF m SRVI ERI E YT* aOL--- N- K ZU V 459-480. Dg KEMHSLRQQLEDFKELEISLKTIFTETQINILKSGGKRAVFNATDKSAAICLHTAGPPAY 180 4. Ajioka, J. W. & Hartl, D. L. (1989) in Mobile DNA, eds. Berg, DMI RA I Em E RR D R N Q X S_ I T _ D. E. & Howe, M. M. (Am. Soc. Microbiol., Washington), pp. Dg NHLYRKGFPLPSRATLYRWLADVNISTGTLDVVIDLHENEEMPEVDKLCVLSFDEMKVAA 240 939-958. DM K T S D ER C E D-JDVDA A 5. Charlesworth, B. & Langley, C. H. (1989) Annu. Rev. Genet. 23, 251-287. Dg AFEHDSSADVDYEPSTY ,IARGLNKSWEQPVFFDFSTLMDADTLHSIINKLHKRGYP 300 6. Kidwell, M. G., Kidwell, J. F. & Sved, J. DM I IV D Y K K NRB P NI RK A. (1977) Genetics 86, L 813-833. Dg 7. Orgel, L. E. & Crick, F. H. C. (1980) Nature (London) 284, 604- DM T K SKT S DH NH 607. Dg IINGKRLTKSTVQQTISHCAKPDVSMSFNITDNHLNIGPLAKQNIKLATQLFSNTTGSFI 420 8. Rio, D. C. (1990) Annu. Rev. Genet. 24, 543-578. Dm T K K I BALH _ L Il_ K E I VRt BV 9. Laski, F. A., Rio, D. C. & Rubin, G. M. (1986) Cell 44, 7-19. Dg RRCNALGYNVQNASETADLFKIINDWFGVFNSKSSTSNSIEPTQPYGKQIEIQRGILAKM 480 10. Rio, D. C., Laski, F. A. & Rubin, G. M. (1986) Cell 44, 21-32. DM XS PIE T F LM DI L C _S LD ND HR 11. Raymond, J. D., Ojala, T. A., White, J. & Simmons, M. J. (1991) Dg SEIMSSEILGVGAHSLPFQKGILVNNASLEGLYCYLSEKYEIEYIFTSRLSQDIVENFFI4 540 Genet. Res. 57, 227-234. DM BETQ OR- I D K Q UFMQ _ H G 12. Engels, W. R. (1979) Proc. Natl. Acad. Sci. USA 76, 4011-4015. Dg PMRPKGEQFEHPTPLQFKFMLRKYISGMTKL----NKPIDKX 582 13. Black, D. M., Jackson, M. S., Kidwell, M. G. & Dover, G. A. § DM 9 !R D YR I N =Q V (1987) EMBO J. 6, 4125-4135. 14. Kidwell, M. G. (1983) Proc. Natl. Acad. Sci. USA 80, 1655-1659. FIG. 4. Amino acid alignment between the D. melanogaster 15. Jackson, M. S., Black, D. M. & Dover, G. A. (1988) Genetics 120, repressor protein sequence (Dm) and the presumed translation 1003-1013. 16. Engels, W. R., Johnson-Schlitz, D. M., Eggleston, W. B. & Sved, product of the D. guanche P homologue (Dg). Conservative replace- J. (1990) Cell 62, 515-525. ments are shown in uppercase type; nonconservative replacements 17. Robertson, H. M. & Engels, W. R. (1989) Genetics 123, 815-824. are underlined. Dashes indicate amino acid deletions. 18. Misra, S. & Rio, D. C. (1990) Cell 62, 269-284. 19. Simmons, M. J. & Bucholz, L. M. (1985) Proc. NatI. Acad. Sci. basis of the "ancient gene" hypothesis, it is, however, USA 82, 8119-8123. difficult to explain why P-homologous sequences are com- 20. Preston, C. R. & Engels, W. R. (1989) Prog. Nucleic AcidRes. Mol. pletely absent in many species (38, 44, 45). The hypothesis Biol. 36, 71-85. 21. Simmons, M. J., Raymond, J. D., Rasmusson, K. E., Miller, would require the repeated loss of both types of P-homolo- L. M., McLarnon, C. F. & Zunt, J. R. (1990) Genetics 124, 663- gous sequences (mobile P elements and the stationary source 676. gene) in separate phylogenetic lineages. Although this pos- 22. Ronsseray, S., Lehmann, M. & Anxolabdhere, D. (1991) Genetics sibility cannot be ruled out, it seems less convincing than the 129, 501-512. 23. Nitasaka, E., Mukai, T. & Yamazaki, T. (1987) Proc. Natl. Acad. "terminal-degeneration" model outlined above. Sci. USA 84, 7605-7608. Parallel investigations in the three related species D. 24. Lansman, R. A., Shade, R. O., Grigliatti, T. A. & Brock, H. W. guanche, D. madeirensis, and D. subobscura suggest that the (1987) Proc. Natd. Acad. Sci. USA 84, 6491-6495. amplification of the repressor-producing P homologue at the 25. Hagemann, S., Miller, W. J. & Pinsker, W. (1990) J. Mol. Evol. 31, 478-484. cytological site 85C occurred before the phylogenetic radiation 26. Anxolabehtre, D., Nouaud, D. & Pdriquet, G. (1985) Genet. Sel. of this species cluster. Therefore, any mobile P elements Evol. 17, 579-584. should have been eliminated long ago. This, however, raises 27. Monchis, M. (1976) Bol. R. Soc. Esp. Hist. Nat. (Secc. Biol.) 74, the question as to why a repressor-producing sequence is still 197-213. conserved in the absence ofmobile P elements. Consequently, 28. Cabrera, V. M., Gonzales, A. M., Larruga, J. M. & Gull6n, A. (1983) Evolution 37, 675-689. the now useless P repressor genes should have degenerated at 29. Krimbas, C. B. & Loukas, M. (1984) Heredity 53, 469-482. a rate similar to that ofthe immobilized P elements. To explain 30. Felger, I. & Pinsker, W. (1987) Z. Zool. Syst. Evolutionsforsch. 25, this strange phenomenon, one has to postulate some additional 127-140. function for the protein that provides a selective advantage to 31. Frischauf, A. M., Lehrach, H., Poustka, A. & Murray, N. (1983) J. the fly. A change of function in connection with tandem Mol. Biol. 170, 827-842. 32. O'Hare, K. & Rubin, G. M. (1983) Cell 34, 25-35. amplification has been described for a rudimentary P element 33. Sambrook, J., Fritsch, E. F. & Maniatis, T. (1989) in Molecular in D. melanogaster (46). In that special case, the insertion of Cloning: A Laboratory Manual (Cold Spring Harbor Lab., Cold four P elements into the control region of the Gpdh gene was Spring Harbor, NY), 2nd Ed. found to influence transcriptional activity. The P derivative 34. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. apparently serves as an binding a regulatory host Sci. USA 74, 5463-5467. 35. Engels, W. R., Preston, C. R., Thompson, P. & Eggleston, W. B. protein. For the P homologues of D. guanche, however, we (1986) Focus 8, 6-8. have to postulate a functional change at the protein level. The 36. Molt6, M. D., DeFrutos, R. & Martinez-Sebastian, M. J. (1987) protein encoded in the cluster possibly has acquired an as- Genetica 75, 55-70. signment distinct from its former repressor function. Thus the 37. Kunze-Mdhl, E. & Muller, E. (1958) Chromosoma 9, 559-570. P element of this species may have entered a further stage in 38. Daniels, S. B., Peterson, K. R., Strausbaugh, L. D., Kidwell, M. G. & Chovnick, A. (1990) Genetics 124, 339-355. the evolutionary life cycle of a transposon: molecular domes- 39. Throckmorton, L. H. (1975) in Handbook of Genetics, ed. King, tication, the transition of a former genomic parasite to a R. C. (Plenum, New York), Vol. 3, pp. 421-469. stationary gene beneficial to the host. 40. Beverly, S. M. & Wilson, A. C. (1984) J. Mol. Evol. 21, 1-13. 41. Tsubota, S. I. & Dang-Vu, H. (1991) Proc. Natl. Acad. Sci. USA 88, We are very much obliged to V. M. Cabrera (Tenerife) who 693-697. 42. Lansman, R. A., Stacey, S. H. W. provided the strains of D. guanche and D. madeirensis. We are N., Grigliatti, T. A. & Brock, (1985) Nature (London) 318, 561-563. grateful to D. Schweizer (Vienna) for the kind hospitality granted to 43. Simonelig, M. & Anxolabehere, D. (1991) Proc. Natl. Acad. Sci. our group in his laboratories throughout the project and for valuable USA 88, 6102-6106. advice. We greatly appreciate the comments on the manuscript by M. 44. Brookfield, J. F. Y., Montgomery, E. & Langley, C. H. (1984) Breitenbach (Vienna) and D. Sperlich (Tubingen). The work was Nature (London) 310, 330-332. supported by a grant from the Fonds zur Forderung der wissen- 45. Stacey, S. N., Lansman, R. A., Brock, H. W. & Grigliatti, T. A. schaftlichen Forschung (Project P6575B). (1986) Mol. Biol. Evol. 3, 522-534. 46. Itoh, M., Iwabuchi, M., Yoshida, K. & Hori, S. H. (1989) Biochem. 1. Engels, W. R. (1983) Annu. Rev. Genet. 17, 315-344. Genet. 27, 699-718. Downloaded by guest on September 25, 2021