Mol Genet Genomics DOI 10.1007/s00438-009-0453-7

ORIGINAL PAPER

Drosophila P transposons of the urochordata Ciona intestinalis

Stefanie Kimbacher · Ingrid Gerstl · Branko Velimirov · Sylvia Hagemann

Received: 29 January 2009 / Accepted: 21 April 2009 © Springer-Verlag 2009

Abstract P transposons belong to the eukaryotic DNA transposons, which are transposed by a cut and paste mechanism using a P-element-coded transposase. They have been detected in Drosophila, and reside as single copies and stable homologous sequences in many vertebrate species. We present the P elements Pcin1, Pcin2 and Pcin3 from Ciona intestinalis, a species of the most primitive chordates, and compare them with those from Ciona savignyi. They showed typical DNA transposon structures, namely terminal inverted repeats and target site duplications. The coding region of Pcin1 consisted of 13 small exons that could be translated into a P-transposon-homologous protein. C. intestinalis and C. savignyi displayed nearly the same phenotype. However, their P elements were highly divergent and the assumed P transposase from C. intestinalis was more closely related to the transposase from Drosophila melanogaster than to the transposase of C. savignyi. The present study showed that P elements with typical features of transposable DNA elements may be found already at the base of the chordate lineage.

Keywords P transposon · Ciona · Drosophila · THAP9 · Urochordata · Evolution

Communicated by S. Hohmann.

Electronic supplementary material The online version of this article (doi:10.1007/s00438-009-0453-7) contains supplementary material, which is available to authorized users.

S. Kimbacher · I. Gerstl · B. Velimirov · S. Hagemann (✉)
Laboratory of Microbiology, Molecular Biology and Virology (MMV), Centre of Anatomy and Cell Biology, Medical University of Vienna, Waehringerstrasse 10, 1090 Vienna, Austria
e-mail: [email protected]

Introduction

P elements were first discovered in Drosophila melanogaster and belong to the DNA transposons (Kidwell et al. 1977). The genome of a typical P strain of D. melanogaster contains 50–60 P element copies, depending on geographical location (Kidwell 1983; Anxolabéhère et al. 1985a). In addition to complete autonomous elements, there are a number of internally deleted and thus non-autonomous elements (O'Hare et al. 1992).

Full-sized autonomous D. melanogaster P elements have a length of 2,907 bp (O'Hare and Rubin 1983). The four exons of the element code for the transposase, an 87-kD protein that catalyzes transposition in the germ line (Rio et al. 1986; Laski et al. 1986). For the transposition event, 31-bp terminal inverted repeats (TIRs) are essential and 11-bp subterminal inverted repeats (STIRs) function as transpositional enhancers. P element transposition occurs exclusively in the germ line because that is the only place where the transposase is produced. The mechanism behind this is the tissue-specific, alternative splicing of intron 3 in the pre-mRNA. This intron is only spliced in germ line cells, which leads to assembly of the 87-kD transposase that is encoded by all four exons. In somatic cells (and in the germ line), the third intron is not spliced, which gives rise to the 66-kD repressor protein (Laski et al. 1986; Misra and Rio 1990).

P element sequences can be found in almost all species groups of the subgenus Sophophora (Anxolabéhère et al. 1985b; Lansman et al. 1985; Daniels and Strausbaugh 1986), but they are absent in species most closely related to D. melanogaster (Brookfield et al. 1984) and in geographically isolated species. These observations, together with the fact that P elements are completely absent from old laboratory strains of D. melanogaster, indicate that the latter has obtained its P element by horizontal transmission from another species. This transfer must have occurred within the last century, and was followed by a rapid spread of P elements through the new host (Anxolabéhère et al. 1988).

P-homologous sequences have also been found outside the genus Drosophila, for example, in the Australian sheep blowfly Lucilia cuprina (Perkins and Howells 1992), the house fly Musca domestica (Lee et al. 1999) and in Scaptomyza pallida (Simonelig and Anxolabéhère 1991; Hagemann et al. 1996; Haring et al. 1998).

P elements had been thought to occur exclusively in the genomes of dipteran insects until our group described a P-homologous sequence in vertebrates (Hagemann and Pinsker 2001). It was first found by a BLAST search of the human genome using the cDNA sequence of a 3′-truncated P element from Drosophila subsilvestris, and it was named Phsa (P homolog of Homo sapiens). Phsa corresponds to a sequence designated as THAP9 that was described 2 years later by Roussigne et al. (2003a). The THAP domain was originally identified as a DNA-binding domain in the human protein THAP1, which is a nuclear pro-apoptotic factor (Roussigne et al. 2003b). Phsa has a length of 19,533 bp and consists of six exons. It is a stationary, single-copy gene, located on the long arm of chromosome 4, and encodes a hypothetical protein of 903 amino acids, with still unknown function (Hammer et al. 2005).

Pgga is the single-copy, stationary and presumably coding P-homologous sequence in Gallus gallus, in which it is located at an orthologous position to Phsa. Such stationary, single-copy P-homologous sequences have been detected in several other vertebrate species, but only as rudiments in rodents (Hammer et al. 2005). These rudiments are also located at orthologous positions to Phsa and Pgga. In contrast, several P-homologous sequences have been found in the genome of zebrafish, named Pdre (P homolog of Danio rerio). Some of these have structural features typical of transposable elements and large open reading frames (ORFs) that suggest coding capacity (Hammer et al. 2005; Hagemann and Hammer 2006).

To improve our knowledge about the evolution of P elements within the vertebrates, we proceeded down the phylogenetic lineage to the base of the chordate ancestry and chose Ciona species for our further studies, from which THAP9-like sequences have been reported (Quesneville et al. 2005). Ciona species belong to the so-called urochordata and are located phylogenetically at the base of the chordate lineage. Therefore, they constitute a suitable model system for studying chordate genome organization and evolution. Their free-swimming, tadpole-like larvae show typical chordate features, such as a prominent notochord, a dorsal hollow nerve cord and branchial segments. The present study focused on Ciona intestinalis, an ascidian species that is widely distributed throughout coastal regions around the world (Hoshino and Nishikawa 1985; Dehal et al. 2002). Its P elements were compared with those from Ciona savignyi, which is distributed in the Pacific Ocean and has recently spread from Japan to the West coast of the US. These two species are often described as being closely related (Jiang and Smith 2005). Morphologically, they are quite similar and they also share essential mechanisms of early development (Johnson et al. 2004). Systematically, C. intestinalis and C. savignyi are members of the class Ascidiacea or sea squirts and belong to the subphylum Tunicata or Urochordata. Urochordata is part of the phylum Chordata, together with Cephalochordata and Vertebrata, and diverged from the last common ancestor of all chordates at least 520 million years ago (Johnson et al. 2004). Recent studies, based on phylogenetic analyses of genome sequences, have shown that the Tunicata are the closest living relatives of the Vertebrata (Delsuc et al. 2006).

In this study, we investigated P transposons from Ciona sp. and showed that they exhibited typical features of DNA transposons, such as TIRs and target site duplications (TSDs). The finding of P transposons in two species of Ciona provides compelling evidence that they existed already as jumping genes at the base of vertebrate evolution, and that their stable integration into the genome in higher vertebrates resulted from a molecular domestication event during evolution (Miller et al. 1992).

Materials and methods

All laboratory techniques were carried out using standard methods.

In silico analysis

The in silico analyses before and after the laboratory experiments were carried out using BLAST (http://www.ncbi.nlm.nih.gov) and BLAT (http://genome.ucsc.edu/) database searches (Kent 2002), and MegAlign (DNASTAR Lasergene) to create alignments. The P elements from C. savignyi were found in the Repbase Update (Jurka et al. 2005), which is the most commonly used database of repetitive DNA elements. Phylogenetic analysis was carried out with the PhyML online web server (Guindon et al. 2005; http://atgc.lirmm.fr/phyml/) and MrBayes (Ronquist and Huelsenbeck 2003; http://mrbayes.csit.fsu.edu/). In silico and molecular biological studies were combined because the TIR-containing ends of the C. intestinalis P elements could not be detected by in silico analysis alone.

Hybridization methods: library screening and Southern blotting

The genomic C. intestinalis library was screened with a Digoxigenin-labeled, PCR-amplified and cloned fragment of 1,011 bp, which was amplified with the primers Pcin+ (5′-CAAATACCTGCTGCTAATGCAACC-3′) and Pcin− (5′-TAACCAGACTGAAACACATCTGGG-3′) using genomic DNA from C. intestinalis as a template. Positive plaques were detected by the enzyme-induced color reaction using BCIP (5-bromo-4-chloro-3-indolyl phosphate; Roche Diagnostics; Mannheim, Germany) and NBT (Nitro blue tetrazolium chloride; Roche Diagnostics) as substrates. For the isolation of phage DNA, we used the Lambda Mini Kit from Qiagen (Germany). Southern blot hybridizations were carried out re-using the hybridization solution from the library screening.

Cloning, subcloning and sequencing

The PCR-amplified hybridization probe was cloned using the pGEM®-T Easy Vector System (Promega, USA). SmaI-digested, phosphatase-treated pGEM®-3Z DNA (Promega, USA) was used for subcloning blunt-end fragments eluted from agarose gels by the GFX PCR DNA and Gel Band Purification Kit from GE Healthcare (USA). The (recombinant) plasmids were transformed into competent Escherichia coli JM109 cells (Promega, USA). Sequencing was carried out in our laboratory using the SequiTherm Excel II DNA Sequencing Kit (Epicentre Biotechnologies, USA) and an LI-COR 4200, or by 4base lab (Germany).

Results

P-homologous sequences in C. intestinalis

In the BLAST search using the P-homologous amino acid sequence deduced from Pdre2 of zebrafish (BX890548) and the program tblastn (Table 1), the most significant hit was a short cDNA sequence of 700 bp from C. intestinalis (BW487041). Using the translated sequence, P transposase similarity was reflected by its homology to different THAP9 proteins (e-values up to 3e-41), which corresponded to the P-homologous proteins in vertebrates. We identified the corresponding genomic C. intestinalis sequence (AABS01002034) and the cDNA full insert sequence clone of 1,673 bp (AK113587). A BLAT search with this cDNA sequence led to multiple hits that allowed us to assume that repeated P-homologous sequences exist within the genome of C. intestinalis, but we could not determine their terminal sequences. Therefore, at this point, the laboratory experiments were started to obtain P element sequences not represented in the available genomic sequence data.

Library screening and characterization of obtained clones

Based on the alignment of the genomic sequence (AABS01002034) and the cDNA sequence (AK113587), we designed the primer pair Pcin+ and Pcin−, which bound to exonic regions and amplified a 1,011-bp genomic probe. Screening a genomic library, we received more than 30 hybridization signals and took six clones (7, 11, 12, 13, 17 and 19), which were chosen randomly for further analysis. The clones were double restricted with ScaI and PvuII and the fragments electrophoresed, blotted and hybridized, to identify those with P-homologous sequences. After hybridization, all clones exhibited one fragment of ~3,500 bp (fragment 1) and another of ~1,700 bp (fragment 2), although four of the six analyzed clones originated from different genomic positions (Supplementary S1). We concluded that the two fragments, which were subcloned and partially sequenced, were parts of a larger repeat unit of ~5,200 bp.

In silico analyses based on the obtained sequence data

The regions outside the already sequenced parts had to be analyzed, in order to identify the presumed ends of the repeat unit, which were thought to display the P element termini. To this end, we combined fragments 1 and 2 from clone 13 (Fig. 1), which resulted in a 3,148-bp screening probe (Supplementary S2). The correct orientation of the two fragments was given by the homology between them and the 1,011-bp hybridization probe, which overlapped the 3′ region of fragment 1 and the 5′ region of fragment 2 (Fig. 1). This artificial in silico screening probe included

Table 1 Most significant results of BLAST search using the amino acid sequence deduced from Pdre2

Acc. no.   Description of the sequences                                          e-value
BW487041   Nori Satoh unpublished cDNA library, C. intestinalis, mRNA sequence   6e-44
DN528037   1316550 MARC 7BOV Bos taurus cDNA 3′, mRNA sequence                   8e-38
BW502377   Nori Satoh unpublished cDNA library, C. intestinalis, mRNA sequence   8e-38
CR551045   Bos taurus, embryonic and extraembryonic tissues                      7e-37
           1262874 MARC 7BOV Bos taurus cDNA 5′, mRNA sequence                   2e-36
           1257998 MARC 7BOV Bos taurus cDNA 3′, mRNA sequence                   3e-36


Fig. 1 The strategy for in silico searching for the terminal sequences of C. intestinalis P elements was as follows. Fragment 1 (F1/3,500 bp) and fragment 2 (F2/1,700 bp), which were identified as two subunits of a larger repeat unit, were partially sequenced (black regions) and joined together to construct an artificial in silico screening probe of 3,148 bp (Supplementary S2), which was used for BLAT searching. The homologous regions of three hits (scaffolds 202 and 272, and chromosome 01q; Table 2) started with the first nucleotide (1) and ended with the last nucleotide (3,148) of the screening probe. These sequences were used to determine the ends of the repetitive sequence by comparing their flanking regions. This led to the detection of the TIR-containing termini of the C. intestinalis P elements (shown as black-framed white arrows), designated as Pcin1, Pcin2 and Pcin3
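The selection rule described in the Fig. 1 caption (keep only hits whose homology runs from the first to the last nucleotide of the 3,148-bp probe) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by the authors: the `BlatHit` class, its field names and the `full_length_hits` helper are hypothetical, and the rows are transcribed by hand from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlatHit:
    """One row of Table 2 (field names are illustrative assumptions)."""
    score: int
    probe_start: int   # first probe position covered by the hit
    probe_end: int     # last probe position covered by the hit
    identity: float    # percent identity
    location: str      # scaffold number or chromosome arm
    strand: str


PROBE_LENGTH = 3148  # length of the artificial screening probe (bp)

# Rows transcribed from Table 2.
HITS = [
    BlatHit(2937, 1, 3148, 99.1, "202", "-"),
    BlatHit(1453, 1, 2369, 93.3, "59", "+"),
    BlatHit(1350, 847, 3148, 98.4, "138", "-"),
    BlatHit(1263, 1, 2972, 96.8, "181", "-"),
    BlatHit(1126, 868, 3147, 94.2, "275", "-"),
    BlatHit(1121, 1, 3148, 97.9, "272", "-"),
    BlatHit(1112, 1, 2365, 98.2, "297", "+"),
    BlatHit(1090, 1, 3148, 97.0, "01q", "+"),
]


def full_length_hits(hits, probe_length=PROBE_LENGTH):
    """Return the hits that cover the probe from position 1 to its last base."""
    return [h for h in hits
            if h.probe_start == 1 and h.probe_end == probe_length]


if __name__ == "__main__":
    for hit in full_length_hits(HITS):
        print(hit.location, hit.strand, hit.identity)
```

With the Table 2 rows as input, the filter keeps the three locations named in the text: scaffolds 202 and 272 and chromosome 01q.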

Table 2 Result of BLAT search using an artificial probe of 3,148 bp, deduced from sequence data of clone 13

Score   Probe start   Probe end   Identity (%)   Location   Strand   Start       End         Span
2,937   1             3,148       99.1           202        -        65,901      71,034      5,134
1,453   1             2,369       93.3           59         +        114,569     116,511     1,943
1,350   847           3,148       98.4           138        -        118,386     122,613     4,228
1,263   1             2,972       96.8           181        -        121,117     122,977     1,861
1,126   868           3,147       94.2           275        -        20,179      23,056      2,878
1,121   1             3,148       97.9           272        -        46,173      50,573      4,401
1,112   1             2,365       98.2           297        +        12,169      16,400      4,232
1,090   1             3,148       97.0           01q        +        1,359,987   1,361,379   1,393

stretches of non-determined nucleotides and spanned the 5′, as well as the 3′ end of the internal 5,200-bp fragment of the assumed C. intestinalis P element. The result of the BLAT search is shown in Table 2. Three sequences, namely those located on scaffolds 202 and 272 and on chromosome 01q, were considered especially interesting for further analysis, because their homology with the 3,148-bp screening probe started at position 1 and ended at position 3,148. This confirmed our assumption that the two subcloned fragments were parts of a longer repeat unit. The spanned regions had a length of 5,134 bp (on scaffold 202), 4,401 bp (on scaffold 272), and 1,393 bp (on chromosome 01q). Thus, the sequence located on scaffold 202 was supposed to be the most complete.

The P elements of C. intestinalis

The nucleotides located upstream and downstream of the sequences located on scaffolds 202 and 272 and chromosome 01q were aligned in order to analyze the borders of the repetitive sequence, and hence the P element terminal sequences. It was possible to determine three of the C. intestinalis P elements (Table 3), which were designated as Pcin1 (located on scaffold 202; Supplementary S3),

Table 3 Most important features of Pcin1, Pcin2 and Pcin3

Element   Location   Position (nt)         Strand   Length (bp)   TIR (bp)   TSD (bp)   Coding region   Special features
Pcin1     202 (a)    65,895–71,508         Minus    5,614         29         None       +               Duplication of 157 bp at the 5′ end
Pcin2     272 (a)    46,167–51,043         Minus    4,877         29         7          ?               Duplication of 153 bp at the 5′ end
Pcin3     01q (b)    1,359,670–1,361,385   Plus     1,716         29         7          –

(a) Scaffold
(b) Chromosome
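As a worked check of the coordinates, the element lengths in Table 3 follow from the inclusive positions (end − start + 1), and comparing them with the probe-homologous regions of Table 2 shows how far each element extends beyond the region found by BLAT. The sketch below is illustrative only; the `Region` type and the `flank_extensions` helper are hypothetical names, and the numbers are transcribed from Tables 2 and 3.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Inclusive, 1-based genomic interval (illustrative helper type)."""
    start: int
    end: int

    def length(self) -> int:
        if self.end < self.start:
            raise ValueError("end must not be smaller than start")
        return self.end - self.start + 1


# Element positions from Table 3.
ELEMENTS = {
    "Pcin1": Region(65_895, 71_508),
    "Pcin2": Region(46_167, 51_043),
    "Pcin3": Region(1_359_670, 1_361_385),
}

# Probe-homologous regions of the corresponding BLAT hits from Table 2.
BLAT_REGIONS = {
    "Pcin1": Region(65_901, 71_034),
    "Pcin2": Region(46_173, 50_573),
    "Pcin3": Region(1_359_987, 1_361_379),
}


def flank_extensions(element: Region, hit: Region) -> tuple[int, int]:
    """Bases by which the element extends past the hit on each side
    (low-coordinate side, high-coordinate side)."""
    return hit.start - element.start, element.end - hit.end


if __name__ == "__main__":
    for name, element in ELEMENTS.items():
        low, high = flank_extensions(element, BLAT_REGIONS[name])
        print(name, element.length(), low, high)
```

The computed lengths (5,614, 4,877 and 1,716 bp) agree with the Length column of Table 3, and the spans of the BLAT regions agree with the Span column of Table 2.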

Table 4 Genomic excision footprints

Chromosome   Position (nt)         Strand   Sequence
05q          4,084,754–4,084,825   Plus     gtatcagCATAGTACTCTCTTAagagaagagagtactatggtatctctctTAAGAGAGTACTATGgtatcag
01p          3,837,258–3,837,299   Plus     ctgagatCATAGTACTCTCATAAGAGAGTACTATGctgagat

TSDs are in lower case (underlined in the original); TIRs are in upper case (bold in the original)
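The two footprints in Table 4 can be inspected programmatically: the flanking 7-bp TSDs should be identical on both sides, and the retained element termini should read as reverse complements of each other. The following Python sketch makes that check under stated assumptions; the function names are hypothetical, the TSD length of 7 bp is taken from Table 3, and the sequences are copied from Table 4.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat_length(seq: str) -> int:
    """Length of the longest perfect inverted repeat anchored at both ends
    of seq, compared case-insensitively and capped at half the length."""
    s = seq.upper()
    rc = reverse_complement(s)
    k = 0
    while k < len(s) // 2 and s[k] == rc[k]:
        k += 1
    return k


def split_footprint(footprint: str, tsd_len: int = 7):
    """Split a footprint into (left TSD, inner sequence, right TSD)."""
    return (footprint[:tsd_len],
            footprint[tsd_len:-tsd_len],
            footprint[-tsd_len:])


# Sequences copied from Table 4.
FOOTPRINTS = {
    "05q": "gtatcagCATAGTACTCTCTTAagagaagagagtactatggtatctctctTAAGAGAGTACTATGgtatcag",
    "01p": "ctgagatCATAGTACTCTCATAAGAGAGTACTATGctgagat",
}

if __name__ == "__main__":
    for name, footprint in FOOTPRINTS.items():
        left, inner, right = split_footprint(footprint)
        print(name, left == right, inverted_repeat_length(inner))
```

For the 01p footprint, the first 12 nucleotides inside the TSDs (CATAGTACTCTC) are the exact reverse complement of the last 12 (GAGAGTACTATG), and both footprints start and end with the CATAG…CTATG termini discussed in the text.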

Pcin2 (located on scaffold 272; Supplementary S4) and Pcin3 (located on chromosome 01q; Supplementary S5). All three elements have 29-bp long TIRs; Pcin2 and Pcin3 are flanked by TSDs of 7 bp, whereas, in the sequence 202 (Pcin1), the TSD was missing (Table 3). Pcin1 had a length of 5,614 bp and its 29-bp TIRs were conserved. At the 5′ end, a perfect duplication of 157 bp, including the 29-bp TIR, was found, which corresponded to a 5′ duplication in Pcin2. Comparison with the cDNA showed that the coding region of Pcin1 consisted of 13 small exons and 12 introns. All introns had intact splicing sites. The coding sequence was flanked by 5′ and 3′ non-coding sequences.

Pcin2 had a length of 4,877 bp with perfectly conserved 29-bp TIRs and a 7-bp TSD. As mentioned above, at the 5′ end, 153 bp were duplicated, with one substitution at position 152. This duplication included the 7-bp TSD and the 29-bp TIR. The internal region of this sequence was not determined in the scaffold sequence, thus its coding capacity was questionable.

Pcin3 was an internally deleted element of 1,716 bp with conserved 29-bp TIRs and a 7-bp TSD. The 5′ breakpoint corresponded with position 620 in Pcin1 and the 3′ breakpoint with position 4,355 in Pcin1. Hence, they were located within the first intron and the 3′ non-coding sequence.

All three analyzed C. intestinalis P elements showed characteristic features of potentially active DNA transposons, namely TIRs (in Pcin1, Pcin2 and Pcin3) and TSDs (in Pcin2 and Pcin3). Neither the in silico constructed transcript from Pcin1 nor the cDNA obtained from the database (AK113587) had an intact reading frame. Therefore, we suppose that either the coding master sequence was not identified, or that the Pcin elements were already silenced. The first hypothesis was supported by genomic excision footprints that were perfectly conserved and flanked by perfectly conserved target-site duplications (Table 4). Pcin1, Pcin2 and Pcin3, as well as the two transcripts of C. intestinalis, showed divergence of 0.9–2% in their corresponding regions.

Discussion

The P elements from C. intestinalis are highly divergent from those of C. savignyi

Three P-homologous sequences, P1_cis, P6_cis and P1a_cis, from C. savignyi, are found in the Repbase Update (Jurka et al. 2005). One of these, namely P1_cis (Supplementary S6), can be translated into a P-homologous protein. The second element, P1a_cis, has multiple internal deletions with respect to P1_cis. P6_cis is depicted as a P element because of a short region (324 bp) with encoded similarity to P transposases, although this is the only indication of any relationship with P elements. All three elements were posted by Smit in the Repbase Update in 2005 (Jurka et al. 2005). P1_cis has a length of 4,062 bp, exhibits 26-bp TIRs (with one substitution), and is flanked by an 8-bp TSD. It has two large overlapping ORFs. ORF1 encodes a protein that contains a THAP4 domain, which is a member of the THAP superfamily. The larger ORF2 probably encodes a P-transposase homolog. P1a_cis is 2,614 bp long and its 26-bp TIRs are perfectly conserved.

Comparative analysis of P1_cis and the Pcin elements has shown that the P elements of C. savignyi and C. intestinalis are highly divergent. The TIRs of the C. savignyi elements P1_cis and P1a_cis are 26 bp long, whereas all three C. intestinalis elements had 29-bp TIRs and only the first (considering the left TIR) and last (considering the right TIR) five nucleotides (CATAG…CTATG) were identical between the TIRs of the two species. The TSD in C. savignyi is 8 bp long, which is the typical length for P-transposon TSDs. The 7-bp TSDs that flank the C. intestinalis

Fig. 2 The phylogenetic trees show the relationship of P elements and P-homologous sequences from several organisms. The left tree was calculated by the online web server PhyML, the right one with the program MrBayes. They were based on the protein alignment (amino acids 269–594) shown in Fig. 3. The following sequences were used: Pcin1 from C. intestinalis (Pcin); P1_cis from C. savignyi (Pcis); p25.1 from D. melanogaster (Pmel); Pdre2 from Danio rerio (Pdre); Pgga from chicken (Pgga); and Phsa from human (Phsa). The trees do not reflect the phylogenetic relationships, and place the P element of C. savignyi (Pcis) adjacent to that of zebrafish (Pdre; weak bootstrap value of 64 in the left tree) and the P element of C. intestinalis adjacent to that of Drosophila (Pmel; bootstrap value of 99 in the right tree)

elements Pcin2 and Pcin3 were unusual for P transposons. Considering the coding regions, Pcin1 comprised 13 small exons with sizes between 34 and 299 bp (Supplementary S3), whereas there are two large ORFs in P1_cis (Supplementary S6), and only one of them codes for a putative transposase. Although the translated sequences of P1_cis and Pcin1 gave significant hits to P transposases of other species (vertebrates and drosophilids) in a BLAST search, they showed a high divergence rate of 24.3% within the region used for the phylogenetic trees shown in Fig. 2. Highly divergent amino acid sequences have also been reported for the huntingtin gene product of C. intestinalis and C. savignyi: the 27.5% divergence is comparable to that observed between mammals and fishes (Gissi et al. 2006). Analyses of 18S rRNA sequences show that the pairwise divergence of the two Ciona sp. is slightly less than that between human and frog, and slightly greater than between human and chicken (Johnson et al. 2004). It is supposed that this is caused by the presence of long branches for these species (Gissi et al. 2006; Supplementary S7).

Relationship between P elements from Ciona sp. and other species groups

When analyzing the relationship between P elements, it is always tempting to look at their TIRs and reasonable to compare their deduced amino acid sequences. The TIRs are essential features for P-element transposition, and they are bound by the inverted repeat binding proteins encoded by the organism itself (Rio and Rubin 1988). The 29-bp TIRs of the three P elements from C. intestinalis (Pcin1, Pcin2 and Pcin3) were conserved perfectly, whereas the 26-bp TIR from the C. savignyi P element was conserved perfectly in P1a_cis, but had one substitution in P1_cis. There was only a weak sequence homology of ~50% between the TIRs of Pcin, P1a_cis and the canonical P element from D. melanogaster. Only the first three nucleotides of the left TIR and last three nucleotides of the right TIR were highly conserved, which emphasized their importance for the transposition event.

For comparison at the amino acid level, the canonical P transposase (Pmel) from D. melanogaster was aligned to the P element amino acid sequences from C. intestinalis (Pcin1) and C. savignyi (P1_cis), and to the putative proteins from zebrafish (Pdre), chicken (Pgga) and human (Phsa) (Supplementary S8). The amino acid alignment from position 269–594 (Fig. 3) was used to calculate phylogenetic trees using the online web server PhyML and MrBayes. The trees obtained (Fig. 2) did not reflect the phylogenetic relationships and placed the P element of C. savignyi (Pcis) adjacent to that of zebrafish (Pdre) and the P element of C. intestinalis (Pcin) adjacent to that of Drosophila (Pmel). This implies that the P element of C. intestinalis is more closely related to that of Drosophila (bootstrap value: 99%) than to that of C. savignyi. Discrepancies between the phylogenetic relationships of P elements and their host species within the Drosophilidae (Brookfield et al. 1984; Hagemann et al. 1996; Haring et al. 1998) suggest that the P elements are not merely vertically inherited, but can also be

Fig. 3 The amino acid sequences of P elements and P-homologous sequences of six species were aligned: Pcin1 from C. intestinalis (Pcin); P1_cis from C. savignyi (Pcis); p25.1 from D. melanogaster (Pmel); Pdre2 from Danio rerio (Pdre); Pgga from G. gallus (Pgga); and Phsa from H. sapiens (Phsa). Amino acids that were identical in at least two species are shown in gray in the original figure

Pcin YHPDVVRWAIELFSRSPSAYEHL-RTTGVLTLPAKSTI------ASYRNSISPEPGLNTKALEEIERVVKLNGNLSA----YVTLDE
Pcis YTEKIKKFAMTLHTYSPKAYQFI-RKHL--TLPHPRTLSK------WLSVFKGNAGITAESIEAIK---NKQAKIDYKLYYSLILDE
Pmel FNSDDISTAICLHTAGPRAYNHLYKKGF--PLPSRTTLYR------WLS----DVDIKRGCLDVVIDLMDSDGVDDADKLCVLAFDE
Pdre YSEELKQFATTLHFYSPKAYDYV-RDNFQKALPHSHTIRN------WYSSVSADPGFTVASFTALKCHVEENKERGKETVCALMMDE
Pgga YCPAMRQFVCLLHLRHHAAYEQL-RKVF--PLPHPASLSKQVCPRPFCVRGCVCGTTPAWLSNDAAAPGFSNDMFLQLQEKVERGEQAYC--YCALMVQE
Phsa YSAEMKQFACTLYLCSSKVYDYV-RKIL--KLPHSSILR------TWLSKCQPSPGFNSNIFSFLQRRVENGDQLYQ--YCSLLIKS

Pcin MKIRENLI-IKNGKLI----GFVDFGKD-LVQTKSQLASHLLVFYLRSSDREISIPLAWYPTHSTPPHVLAMLFWQLLWECESRGVQIHAVVCDGHPSNR
Pcis MAIKKQVEWN--GKEY---TGFINLGEN-LDDDTLPVAANALVFMLVPINANWKIPIAYFFTDGLSGKTLAELVRKVLSCLHDNNILVCSLTCDGCGSNQ
Pmel MKVAAAFEYDSSADIVYEPSDYVQLA------IVRGLKKSWKQPVFFDFNTRMDPDTLNNILR----KLHRKGYLVVAIVSDLGTGNQ
Pdre MYIHKMTEFA--GDQF---HGYVDIGTG-EIDNTL--ATQALVLMVVAVNESWKIPIAYFLITGMDGSEKANVIRESLSRLHEVGVKVISLTCDAPTTNL
Pgga MSLQKCQEWDRQSQRL---TGFIDLGAGSLNADEAPLASEVVVVMAAGISSPWRAPLGYFFVSGVTGCLLAQLLRQAISKLNNIGVTVLAVTSGATACGA
Phsa MPLKQQLQWDPSSHSL---QGFMDFGLGKLDADETPLASETVLLMAVGIFGHWRTPLGYFFVNRASGYLQAQLLRLTIGKLSDIGITVLAVTSDATAHSV

Pcin TFFKLIACRPLNVSGPDLYTAINPYACDRHIFLCSDPSHLLKTARNSLFSSKPGGSRYMNNNGDIL-WTHILSVYKTEQQNLMLSRCRLSSAHIYLNSYT
Pcis SMLNELGVPVRYPIKQSYFL--HPENPNQKISVFLDNCHMFKLMRNLL--AQ-KKLLQNSTTGQQIKWDYITNLHNIQTKEGLKLANKLKRNHIEYGTQ-
Pmel KLWTELGI----SESKTWFS--HPADDHLKIFVFSDTPHLIKLVRNHY--VDSG----LTINGKKLTKKTIQEALHLCNKSDLSILFKINENHINVRSLA
Pdre AMIRELGADLNINNMKPYFM--HPEDPTQKVHVILDACHMLKLLRNAF--A--SSLEFETEDGNKIKWKYIEALNELQEKEGLRLGNKLKMAHLQWRKQ-
Pgga ETARALGVRINPQRIRCAFH--HPPSSAHCIAYFFDVCHALHLIRNTL--QYFQKIQWLSDTVQ---WQHVVELATLQEEK-L--LGPRSGHPVSKETY-
Phsa QMAKALGIHIDGDDMKCTFQ--HPSSSSQQIAYFFDSCHLLRLIRNAF--QNFQSIQFINGIAH---WQHLVELVALEEQE-LSNMERIPSTLANLKNH-

Pcin KMKVNLAREVLSWNVGK----CL-----EQIPAANATAKFVLLFAKWFDL
Pcis KMKVSLAVQTLSASTGKAIQ-CLRQLGYAQFQGSAETENFIMLTDRLFDV
Pmel KQKVKLATQLFSNTTASSIRRCY-SLGYD-IENATETADFFKLMNDWFDI
Pdre KMKVHLAAQLFSSSVADAIEFCEQGLKMEEFKGCAATVQFLRTVDAAFDV
Pgga QLKVNLAAPLFSEGVADALEH-LQKLGLASFQNCGGTVKFLRTMSRLCDV
Phsa VLKVNSATQLFSESVASALEY-LLSLDLPPFQNCIGTIHFLRLINNLFDI

transmitted horizontally. Further investigations will tell us whether horizontal transmission can also be considered within the chordate lineage as a mechanism for P element propagation.

Conclusion

The present study demonstrates that P elements with typical features of mobile DNA elements have been found already at the base of the chordate lineage. It can be speculated that P elements were present in a common ancestor of the urochordata and vertebrates, or considering the well-known Drosophila P elements, even within the common ancestor of protostomia and deuterostomia.

Acknowledgments The genomic C. intestinalis library was kindly provided by Weiyang Shi (Berkeley). We thank Daniela Schneider for various discussions, and the anonymous reviewers for valuable suggestions to improve the manuscript.

References

Anxolabéhère D, Nouaud D, Periquet G, Tchen P (1985a) P-Element distribution in Eurasian populations of Drosophila melanogaster: A genetic and molecular analysis. Proc Natl Acad Sci USA 82:5418–5422
Anxolabéhère D, Nouaud D, Périquet G (1985b) Séquences homologues à l'élément P chez des espèces de Drosophila du groupe obscura et chez Scaptomyza pallida (Drosophilidae). Génét Sél Evol 17:579–584
Anxolabéhère D, Kidwell MG, Periquet G (1988) Molecular characteristics of diverse populations are consistent with the hypothesis of a recent invasion of Drosophila melanogaster by mobile P elements. Mol Biol Evol 5:252–269
Brookfield JF, Montgomery E, Langley CH (1984) Apparent absence of transposable elements related to the P elements of D. melanogaster in other species of Drosophila. Nature 310:330–332
Daniels SB, Strausbaugh LD (1986) The distribution of P-element sequences in Drosophila: the willistoni and saltans species groups. J Mol Evol 23:138–148
Dehal P, Satou Y, Campbell RK, Chapman J, Degnan B, De Tomaso A, Davidson B, Di Gregorio A, Gelpke M, Goodstein DM, Harafuji N, Hastings KE, Ho I, Hotta K, Huang W, Kawashima T, Lemaire P, Martinez D, Meinertzhagen IA, Necula S, Nonaka M, Putnam N, Rash S, Saiga H, Satake M, Terry A, Yamada L, Wang HG, Awazu S, Azumi K, Boore J, Branno M, Chin-Bow S, DeSantis R, Doyle S, Francino P, Keys DN, Haga S, Hayashi H, Hino K, Imai KS, Inaba K, Kano S, Kobayashi K, Kobayashi M, Lee BI, Makabe KW, Manohar C, Matassi G, Medina M, Mochizuki Y, Mount S, Morishita T, Miura S, Nakayama A, Nishizaka S, Nomoto H, Ohta F, Oishi K, Rigoutsos I, Sano M, Sasaki A, Sasakura Y, Shoguchi E, Shin-i T, Spagnuolo A, Stainier D, Suzuki MM, Tassy O, Takatori N, Tokuoka M, Yagi K, Yoshizaki F, Wada S, Zhang C, Hyatt PD, Larimer F, Detter C, Doggett N, Glavina T, Hawkins T, Richardson P, Lucas S, Kohara Y, Levine M, Satoh N, Rokhsar DS (2002) The draft genome of Ciona intestinalis: Insights into chordate and vertebrate origins. Science 298:2157–2167
Delsuc F, Brinkmann H, Chourrout D, Philippe H (2006) Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature 439:965–968
Gissi C, Pesole G, Cattaneo E, Tartari M (2006) Huntingtin gene evolution in Chordata and its peculiar features in the ascidian Ciona genus. BMC Genomics 7:288
Guindon S, Lethiec F, Duroux P, Gascuel O (2005) PHYML Online--a webserver for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res 33(Web server issue):W557–W559
Hagemann S, Hammer SE (2006) The implications of DNA transposons in the evolution of P elements in zebrafish (Danio rerio). Genomics 5:572–579
Hagemann S, Pinsker W (2001) Drosophila P transposons in the human genome? Mol Biol Evol 18:1979–1982
Hagemann A, Miller WJ, Pinsker W (1996) Repeated horizontal transfer of P transposons between Scaptomyza pallida and Drosophila bifasciata. Genetica 98:43–51
Hammer SE, Strehl S, Hagemann S (2005) Homologs of Drosophila P transposons were mobile in zebrafish but have been domesticated in a common ancestor of chicken and human. Mol Biol Evol 22:833–844
Haring E, Hagemann S, Pinsker W (1998) Transcription and splicing patterns of M- and O-type P elements in Drosophila bifasciata, D. helvetica, and Scaptomyza pallida. J Mol Evol 46:542–551
Hoshino Z, Nishikawa T (1985) Taxonomic studies of Ciona intestinalis (L.) and its allies. Publ Seto Mar Lab 30:61–79
Jiang D, Smith WC (2005) Self- and cross fertilization in the solitary ascidian Ciona savignyi. Biol Bull 209:107–112
Johnson DS, Davidson B, Brown CD, Smith WC, Sidow A (2004) Noncoding regulatory sequences of Ciona exhibit strong correspondence between evolutionary constraint and functional importance. Genome Res 14:2448–2456
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J (2005) Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110:462–467
Kent WJ (2002) BLAT--the BLAST-like alignment tool. Genome Res 12:656–664
Kidwell MG (1983) Evolution of hybrid dysgenesis determinants in Drosophila melanogaster. Proc Natl Acad Sci USA 80:1655–1659
Kidwell MG, Kidwell JF, Sved JA (1977) Hybrid dysgenesis in Drosophila melanogaster: a syndrome of aberrant traits including mutation, sterility and male recombination. Genetics 86:813–833
Lansman RA, Stacey SN, Grigliatti TA, Brock HW (1985) Sequences homologous to the P mobile element of Drosophila melanogaster are widely distributed in the subgenus Sophophora. Nature 318:561–563
Laski FA, Rio DC, Rubin GM (1986) Tissue specificity of Drosophila P element transposition is regulated at the level of mRNA splicing. Cell 44:7–19
Lee SH, Clark JB, Kidwell MG (1999) A P element-homologous sequence in the house fly Musca domestica. Insect Mol Biol 8:491–500
Miller WJ, Hagemann S, Reiter E, Pinsker W (1992) P-element homologous sequences are tandemly repeated in the genome of Drosophila guanche. Proc Natl Acad Sci USA 89:4018–4022
Misra S, Rio DC (1990) Cytotype control of Drosophila P element transposition: the 66 kD protein is a repressor of transposase activity. Cell 62:269–284
O'Hare K, Rubin GM (1983) Structures of P transposable elements and their sites of insertion and excision in the Drosophila melanogaster genome. Cell 34:25–35
O'Hare K, Driver A, McGrath S, Johnson-Schlitz DM (1992) Distribution and structure of cloned P elements from the Drosophila melanogaster P strain. Genet Res 60:33–41
Perkins HD, Howells AJ (1992) Genomic sequences with homology to the P element of Drosophila melanogaster occur in the blowfly Lucilia cuprina. Proc Natl Acad Sci USA 89:10753–10757
Quesneville H, Nouaud D, Anxolabéhère D (2005) Recurrent recruitment of the THAP DNA-binding domain and molecular domestication of the P-transposable element. Mol Biol Evol 22:741–746
Rio DC, Rubin GM (1988) Identification and purification of a Drosophila protein that binds to the terminal 31-base-pair inverted repeat of the P transposable element. Proc Natl Acad Sci USA 85:8929–8933
Rio DC, Laski FA, Rubin GM (1986) Identification and immunochemical analysis of biologically active Drosophila P element transposase. Cell 44:21–32
Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574
Roussigne M, Kossida S, Lavigne AC, Clouaire T, Ecochard V, Glories A, Amalric F, Girard JP (2003a) The THAP domain: a novel protein motif with similarity to the DNA-binding domain of P element transposase. Trends Biochem Sci 28:66–69
Roussigne M, Cayrol C, Clouaire T, Amalric F, Girard JP (2003b) THAP1 is a nuclear proapoptotic factor that links prostate-apoptosis-response-4 (Par-4) to PML nuclear bodies. Oncogene 22:2432–2442
Simonelig M, Anxolabéhère D (1991) A P element of Scaptomyza pallida is active in Drosophila melanogaster. Proc Natl Acad Sci USA 88:6102–6106
