Copyright © 1996 by the Genetics Society of America

Multiple Non-LTR Retrotransposons in the Genome of Arabidopsis thaliana

David A. Wright,* Ning Ke,* Jan Smalle,†,1 Brian M. Hauge,†,2 Howard M. Goodman† and Daniel F. Voytas*

*Department of Zoology and Genetics, Iowa State University, Ames, Iowa 50011 and †Department of Genetics, Harvard Medical School and Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts 02114

Manuscript received June 13, 1995
Accepted for publication October 14, 1995

ABSTRACT

DNA sequence analysis near the Arabidopsis thaliana ABI3 gene revealed the presence of a non-LTR retrotransposon insertion that we have designated Ta11-1. This insertion is 6.2 kb in length and encodes two overlapping reading frames with similarity to non-LTR retrotransposon proteins, including reverse transcriptase. A polymerase chain reaction assay was developed based on conserved amino acid sequences shared between the Ta11-1 reverse transcriptase and those of non-LTR retrotransposons from other species. Seventeen additional A. thaliana reverse transcriptases were identified that range in nucleotide similarity from 48 to 88% (Ta12-Ta28). Phylogenetic analyses indicated that the A. thaliana sequences are more closely related to each other than to elements from other species, consistent with the vertical evolution of these sequences over most of their evolutionary history. One sequence, Ta17, is located in the mitochondrial genome. The remaining sequences are nuclear and of low copy number among 17 diverse A. thaliana ecotypes tested, suggesting that they are not highly active in transposition. The paucity of retrotransposons and the small genome size of A. thaliana support the hypothesis that most repetitive sequences have been lost from the genome and that mechanisms may exist to prevent amplification of extant element families.

Arabidopsis thaliana has the smallest known genome among higher plants (LEUTWILER et al. 1984). With only 10^8 bp, it lies at one end of the spectrum of genome size that extends as high as 10^11 bp for some monocot species (BENNETT and SMITH 1976). An unusual feature of the A. thaliana genome is the scarcity of interspersed repetitive DNA (LEUTWILER et al. 1984; PRUITT and MEYEROWITZ 1986). Although highly repeated and moderately repeated sequences together make up ~20% of the genome, the interspersed repeat fraction constitutes only ~2% of the total nuclear DNA (MEYEROWITZ 1992). For most eucaryotes, interspersed repeat sequences are typically mobile genetic elements, the most abundant of which are the retrotransposons, mobile elements that replicate by reverse transcription of an mRNA intermediate.

Two major classes of retrotransposons have been identified that are present in the genomes of diverse eucaryotes (DOOLITTLE et al. 1989; XIONG and EICKBUSH 1990). These retrotransposon classes are distinguished by whether or not they are flanked by long terminal direct repeats (LTRs) and are simply referred to as the LTR and non-LTR retrotransposons. The LTR retrotransposons are further composed of two distinct element groups, the Ty1/copia and the Ty3/gypsy group elements, named after representative retrotransposon families from Saccharomyces cerevisiae and Drosophila melanogaster (XIONG and EICKBUSH 1990). The Ty1/copia and Ty3/gypsy groups are distinguished by the order of proteins encoded by their pol genes and by similarities among the sequences of their reverse transcriptases.

We and others have previously assessed the distribution and diversity of Ty1/copia group elements among a wide variety of plant species (FLAVELL et al. 1992a,b; VOYTAS et al. 1992; HIROCHIKA and HIROCHIKA 1993). This was accomplished using a polymerase chain reaction assay that specifically amplified Ty1/copia group reverse transcriptases. Species representing each Division of the plant kingdom were found to carry Ty1/copia group elements, and many species were found to harbor multiple diverse families of these retrotransposons. In cotton, for example, nine distinct lineages of Ty1/copia group elements were identified from the analysis of 89 partial reverse transcriptase sequences (VANDERWIEL et al. 1993). More recently, analysis of sequences in the DNA databases has revealed Ty1/copia retrotransposon insertions adjacent to 21 plant genes, further indicating the abundance of these elements in plants (WHITE et al. 1994). Although less well documented, examples of plant Ty3/gypsy group elements have also been reported (SMYTH et al. 1989; PURUGGANAN and WESSLER 1994), as have two examples of non-LTR retrotransposons (SCHWARZ-SOMMER et al. 1987; LEETON and SMYTH 1993).

Corresponding author: Daniel F. Voytas, 2208 Molecular Biology Building, Iowa State University, Ames, IA 50011. E-mail: [email protected]
1 Present address: Laboratorium Genetika, Universiteit Gent, Ledeganckstraat 35, B-9000 Gent, Belgium.
2 Present address: Transkaryotic Therapies, Inc., 195 Albany St., Cambridge, MA 02139.

Genetics 142: 569-578 (February, 1996)

Retrotransposons have influenced plant genome organization by contributing to genome size. This occurs because retrotransposition is a replicative process and, unlike the DNA transposons, retrotransposons do not excise. Rather, a single insertion can be transcribed to yield mRNA templates, which in turn are reverse transcribed to generate numerous progeny elements. The extent to which retrotransposons can contribute to genome bulk is well exemplified by lily, which has one of the largest characterized plant genomes (BENNETT and SMITH 1976). Two retrotransposon families have been identified in lily, the del1 and del2 elements, which have ~13,000 and 250,000 copies, respectively (SENTRY and SMYTH 1985; SMYTH et al. 1989). The del2 elements alone account for ~4% of the lily genome. This amounts to ~10^9 base pairs, or 10 times the size of the entire A. thaliana genome (SMYTH et al. 1989; SMYTH 1991).

While the A. thaliana genome is small and lacks abundant interspersed repeats, it nonetheless carries numerous diverse Ty1/copia group retrotransposons (VOYTAS and AUSUBEL 1988; VOYTAS et al. 1990; KONIECZNY et al. 1991). We have previously identified and characterized 10 such families that are comparable in number and diversity to element families in species like cotton, which have considerably larger genomes (10^8 bp vs. 5 × 10^9 bp) (KONIECZNY et al. 1991; VANDERWIEL et al. 1993). Unlike their counterparts in other plants, the A. thaliana elements have not amplified to appreciable copy numbers. Each element family is represented by one or few insertions.

Here we present evidence for the presence of a class of retrotransposons not previously described in A. thaliana: the non-LTR retrotransposons. One such element, Ta11-1, was identified through analysis of DNA sequences flanking the ABI3 gene. Sequence similarity between Ta11-1 and related retrotransposons provided the basis for a polymerase chain reaction assay, which was used to characterize the number and diversity of A. thaliana non-LTR retrotransposons. Members of 17 additional element families were identified, including one that resides in the mitochondrial genome. As with the A. thaliana Ty1/copia group retrotransposons, the non-LTR retrotransposons are also all of low copy number. This suggests that, at least in part, the small size of the A. thaliana genome is the consequence of the failure of retrotransposons to successfully amplify and colonize its genome.

MATERIALS AND METHODS

Plant material: The A. thaliana ecotypes represent wild populations that have been collected from around the world (KRANZ and KIRCHHEIM 1987): Kas, India; Co, Portugal; Sei, Italy; No, Germany; Mv, United States; Ll, Spain; Cvi, Cape Verde Islands; Fi, Finland; Ba, Great Britain; Hau, Denmark; Aa, Germany; Ge, Switzerland; Ms, Soviet Union; Ag, France; Mh, Poland. Columbia (Col) and Landsberg (La) are standard laboratory strains; Landsberg carries the recessive erecta mutation.

DNA manipulations: Standard methods were used for DNA manipulations, including the purification of plant, yeast and [...] DNA, the preparation of Southern filters, the screening of yeast artificial chromosome (YAC) and lambda phage libraries, and enzymatic manipulation of cloned DNAs (AUSUBEL et al. 1987). All filter hybridizations were conducted at 65° by the method of CHURCH and GILBERT (1984). DNA fragments used as hybridization probes were separated on low melting agarose gels; gel slices containing desired DNA fragments were excised, melted and used directly for radiolabeling by random priming (Promega).

The presence of Ta11-1 was initially suggested from the sequence of clone 4711, which was isolated from a Columbia genomic library (GIRAUDAT et al. 1992). To obtain the entire Ta11-1 insertion as well as flanking sequences, a Columbia YAC library (GRILL and SOMERVILLE 1991) was screened using cosmid 4711 as a hybridization probe. A single clone was identified (EG15A3), from which a 7.2-kb BglII fragment was subcloned that was predicted to span the 5' end of Ta11-1 (Figure 1). Because Southern hybridization analyses suggested that Landsberg did not carry a Ta11-1 insertion, the empty target site was cloned using a Landsberg genomic lambda phage library (VOYTAS et al. 1990); hybridization probes included a 1.1-kb EcoRI/HindIII fragment 3' of Ta11-1 (probe A, Figure 1) and an ~0.8-kb BglII/XbaI fragment near the 5' end of the insertion (probe B, Figure 1). The library screen yielded one phage clone with probe A (λDW1) and three unique clones with probe B (λDW2, λDW3, λDW4). λDW1 did not hybridize to phage isolated with probe B. These data, as well as additional DNA sequence of Ta11-1, suggested that YAC EG15A3 was a chimera, with the junction residing at the BamHI site in Ta11-1 (see RESULTS). The bona fide Ta11-1 insertion, therefore, was cloned from a Columbia genomic lambda phage library (kind gift of J. MULLIGAN and R. DAVIS). The library was screened using a 1.8-kb HindIII fragment from λDW1 adjacent to the site of Ta11-1 insertion (probe D, Figure 1).

PCR assay for reverse transcriptases: Two conserved amino acid sequence domains shared among non-LTR retrotransposon reverse transcriptases were used to design completely degenerate oligonucleotide primers (DV0144 = GGGATCCNGGNCCNGAYGGNWT and DV0145 = GGAATTCGGNSWNARNGGRYMNCCYTG, where R = A + G; Y = C + T; M = A + C; S = G + C; W = A + T; N = A + G + C + T). Amplifications were carried out in 50-µl reactions that included 0.5 µg of Landsberg genomic DNA, 50 pmol of each primer, 2.5 units of Taq DNA polymerase and buffer supplied by the manufacturer (Gibco) with a final MgCl2 concentration of 2.5 mM. Amplifications were conducted for 30 cycles (denaturation, 94° for 1 min; annealing, 47° for 1 min; extension, 72° for 2 min). The resultant 600-bp product was gel purified on low melting agarose gels and cloned into pT7Blue T-vector (Novagen). A total of 26 independent clones were sequenced.

DNA sequencing and sequence analysis: Templates for sequencing Ta11-1 were generated by subcloning specific restriction fragments or through γδ-transposon mutagenesis (STRATHMANN et al. 1991) (TN1000, Gold Biotechnology). Oligonucleotides were also synthesized to prime some sequencing reactions. DNA sequences were determined by the DNA Sequencing Facility of Iowa State University or with the fmol DNA sequencing system (Promega). Data analysis was performed using the GCG computer software (DEVEREUX et al. 1984). The phylogenetic tree was constructed by the neighbor-joining method (SAITOU and NEI 1987) using the program MEGA (KUMAR et al. 1993). Amino acid sequences were used for the analysis; no distance correction formula was applied. All stop codons and frameshifts were eliminated from the data set. The tree was unrooted, and bootstrap analysis was conducted with 1000 replicates. GenBank accession numbers for DNA sequences described in this manuscript are as follows: Ta11-1, L47193; Ta11-1 target site, L47211; Ta12-Ta28, L47175, L47181-92, L47194-5 and L47289-90.

FIGURE 1. Restriction endonuclease maps at the site of the Ta11-1 insertion. The location of Ta11-1 is shown with respect to the ABI3 gene as well as the chromosome 3 morphological markers, gl1 and hy2. All restriction endonuclease sites are shown for EcoRI (E), BamHI (B), PstI (P) and KpnI (K). Other relevant sites discussed in the Materials and Methods include BglII (Bg) and XbaI (X). The left-most boundary of the insert in cosmid 4711 is shown as a jagged line. The heavy line in YAC EG15A3 denotes chimeric DNA. The two lambda phage clones, λDW15 and λDW1, encompass the Ta11-1 insertion in Columbia (Col) or the empty target site in Landsberg (La), respectively. Bars designated A, B, C and D indicate restriction fragments used as hybridization probes.

RESULTS

Identification of Ta11, an A. thaliana non-LTR retrotransposon: The A. thaliana ABI3 gene was cloned using a map-based strategy that entailed constructing a detailed map of closely linked restriction fragment length polymorphisms surrounding the abi3 locus (GIRAUDAT et al. 1992). Through this analysis, ABI3 was found to reside within a cosmid clone designated 4711. DNA sequence analysis of >23 kb of 4711 was performed to identify potential coding regions, and one open reading frame (ORF) was shown to encode ABI3 (GIRAUDAT et al. 1992). In addition, a large ORF of 985 amino acids was found at one end of 4711 proximal to the gl1 locus (Figure 1). Comparison of the amino acid sequence of this ORF to the DNA sequence databases revealed strong similarity to reverse transcriptases of non-LTR retrotransposons (see below), suggesting that this sequence is an insertion of a novel A. thaliana retrotransposon. We refer to this sequence as Ta11-1 to indicate that it is the first insertion identified from the eleventh characterized A. thaliana retrotransposon family.

The ORF encoded by Ta11-1 begins at the junction between vector and insert in cosmid 4711. To obtain the entire insertion, 4711 was used as a hybridization probe to screen a yeast artificial chromosome (YAC) library. One YAC was identified, EG15A3, from which a 7.2-kb BglII fragment was subcloned that extended the region 5' of the ORF by ~5.2 kb. Of this BglII fragment, 2.8 kb was sequenced, and the potential coding region was found to extend an additional 1.6 kb, to the BamHI site depicted in Figure 1.

The ends of non-LTR retrotransposons are difficult to discern because they are not flanked by terminal repeat sequences. Because Ta11-1 was identified from the Columbia ecotype, Southern hybridizations were used to screen several other A. thaliana ecotypes to determine whether any lacked this insertion (data not shown). The common laboratory strain Landsberg was found to carry a number of restriction fragment length polymorphisms in this region when compared with Columbia, suggesting that either Ta11-1 was absent or rearrangements at this site had occurred between the two strains. A genomic lambda phage library of Landsberg DNA was therefore screened using hybridization probes from either side of Ta11-1 to clone the putative empty target site (Figure 1).

If the presence of Ta11-1 in Columbia were due to a simple retrotransposon insertion, then the sequences immediately 5' and 3' of the ORF would be predicted to be juxtaposed in Landsberg. However, hybridization analyses revealed that phage clones isolated from Landsberg with 5' and 3' probes did not overlap and were separated by ≥10.7 kb. DNA sequence analysis further revealed that the Ta11-1 ORF stopped at the BamHI site (Figure 1). Because BamHI was used to construct the YAC library, we considered that the YAC clone may be a chimera, consisting of sequences from two different regions of the A. thaliana genome. To determine if this was the case, Ta11-1 was cloned from a second library, a lambda phage library of Columbia DNA. The region 5' of the Ta11-1 [...] in the phage clones differed from the corresponding region of the YAC clone, indicating that the YAC was a chimera. The Ta11-1 coding region was found to extend ~1 kb 5' of the BamHI site (Figure 1).

Delimiting the ends of Ta11-1: The DNA sequence of Ta11-1 and the presumed target site from Landsberg made it possible to delimit the ends of the element.

[Figure 2, panel A: diagram of the insertion site in Col and La, labeled Domain I, Ta11-1 and Domain II, with the lengths 197 bp, 6077 bp and 233 bp and the repeats a and b marked.]

[Figure 2, panel B: sequence rows as extracted; the alignment bars between Col and La are omitted.]

Domain I
Col  GTATCATTATATTATTTTGTAA CTATATTCACTATGG ......
La   GTATCATTATATTATTT.GTAA CTATATTCACTATGG [GTTGGAAGACCAATGT TTAAGTCTTCTTTTT (150 bp) TAATTAGTAATTGCTA]

Ta11-1
Col  [ACAAGAAATCTCTCTTTTGGGCTTAAAACCCTCTTTCTCTCTTAACAATATTCATCAATATATCATTATTCAGTG-]
La   ......

Domain II
Col  [CTATATTCACTATGG GTTTGGAAGACCAATGTACACCTATTATAAGT 470 bp AAATATTTATAGCAGC] TATAATACGGTGGATTATAA
La   ...... TATAATACGGTGGATTATAA

FIGURE 2. The site of Ta11-1 integration in Columbia (Col) and Landsberg (La). (A) Diagram depicting DNA organization at the site of insertion. Bars indicate the presumptive 15-bp target site duplication (a) and the adjacent 17 bp (b) that is present in both Col and La. The distance in bp is given between the beginning of Ta11-1 and the first methionine (M) in ORF1 and between the stop codon (*) and the putative 3' target site duplication (a). (B) The DNA sequence of features depicted in A.
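The search for a target site duplication illustrated in Figure 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure the authors used: the function name, the flanking sequences and the toy element are hypothetical, and only the 15-bp repeat (a) is taken from Figure 2. The 3 to 16 bp search range follows the duplication sizes cited from HUTCHISON et al. (1989).

```python
def find_target_site_duplication(seq, junction, search_from, min_len=3, max_len=16):
    """Return (repeat, position) for the longest direct repeat, or None.

    The candidate repeat is the k bases immediately 5' of `junction`
    (the index of the first base of the element). It is searched for,
    in the same orientation, at or after index `search_from`.
    """
    for k in range(max_len, min_len - 1, -1):
        if k > junction:
            continue  # not enough upstream sequence for a repeat of this length
        candidate = seq[junction - k:junction]
        hit = seq.find(candidate, search_from)
        if hit != -1:
            return candidate, hit
    return None


# Toy example. Only TSD comes from Figure 2; everything else is hypothetical.
TSD = "CTATATTCACTATGG"                             # repeat "a" in Figure 2
UPSTREAM = "TTGACCGTTG"                             # hypothetical 5' flank
ELEMENT = "ACAAGAAATCTC" + "ACGT" * 25 + "A" * 11   # hypothetical element ending in poly(A)
DOWNSTREAM = "GGCCTTAAGG"                           # hypothetical 3' flank

toy = UPSTREAM + TSD + ELEMENT + TSD + DOWNSTREAM
junction = len(UPSTREAM) + len(TSD)                 # first base of the element
result = find_target_site_duplication(toy, junction, search_from=junction)
print(result)
```

In this toy sequence the 16-base candidate fails because the base 5' of the upstream copy (G) differs from the adenine that precedes the downstream copy, so the search settles on the 15-base repeat. With real sequence, a purely string-based search can extend a repeat into an adjacent poly(A) tract, so the reported boundary still needs to be judged by eye.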

The Landsberg sequence contains an insertion of 197 bp not present in Columbia (Domain I, Figure 2). Because non-LTR retrotransposons create target site duplications ranging from 3 to 16 bp (HUTCHISON et al. 1989), sequences at the 5' junction of Ta11-1 were compared with sequences downstream of the ORF. Fifteen base pairs immediately adjacent to Ta11-1 (a, Figure 2) are repeated in direct orientation 284 bases downstream of the Ta11-1 stop codon. The 3' repeat is immediately preceded by 11 adenine residues. Poly(A) stretches typically define the 3' ends of non-LTR retrotransposons and reflect the use of a polyadenylated mRNA as a template for reverse transcription (HUTCHISON et al. 1989; LUAN et al. 1993). The 3' repeat is flanked by 17 bases (b, Figure 2) that are present in Domain I of Landsberg (with one mismatch). In addition to this sequence, there is an additional 201 bp insertion at the 3' end of Ta11-1 that is unique to Columbia (Domain II, Figure 2). While it is possible that Domain II is part of Ta11-1, there is no evidence of a target duplication adjacent to this sequence, and it is not preceded by poly(A). Until additional elements are characterized to confirm our interpretation of the element boundaries, we designate Ta11-1 as the 6077 bp between the presumptive target site duplication (a, Figure 2). This interpretation suggests that the integration of Ta11-1 resulted in the loss of Domain I and the addition of Domain II. Alternatively, Ta11-1 and Domain II could have been deleted from Landsberg. Concomitant with this deletion was the gain of Domain I, perhaps through recombination with a Ta11 element elsewhere in the genome.

Ta11-1 encodes ORFs with similarity to non-LTR retrotransposons: Ta11-1 encodes two open reading frames, designated ORF1 and ORF2, of 487 and 1339 amino acids, respectively. These ORFs are separated by a -1 frame-shift (Figure 3), which is similar to the organization of ORFs encoded by most LTR retrotransposons and retroviruses, as well as some non-LTR retrotransposons (e.g., LEETON and SMYTH 1993; DANILEVSKAYA et al. 1994). For the LTR retroelements, the expression of the second reading frame is typically regulated by ribosomal frameshifting (JACKS 1990). The frameshift event is facilitated by ribosomal pausing, frequently provided by RNA secondary structure immediately downstream of the frameshift site. Two stem/loops are present in Ta11-1 downstream of the frameshift site, which may serve such a function (data not shown).

Three amino acid sequence domains are present in ORF1 and ORF2 that show similarity to proteins encoded by non-LTR retrotransposons. Two of these are short cysteine motifs or finger domains, which are located in the Ta11-1 ORFs at positions comparable with those of other non-LTR retrotransposons (MCCLURE 1991). The motif in ORF1 has the general structure CX2CX4HX4C, and this motif is also found in gag proteins of LTR retrotransposons and retroviruses (COVEY 1986). Near the carboxy terminus of the second ORF is another finger domain that has the general structure of CX1-3CX7-8HX4C (Figure 3). Although the function of these sequences is unknown, they are generally thought to be involved in binding nucleic acids (COVEY 1986; MCCLURE 1991).

The second and most striking region of sequence con-

[Figure 3, panel A: map of Ta11-1 showing ORF1 and ORF2, the cysteine (Cys) motifs, the reverse transcriptase region and a scale bar. Panel B headings: Cysteine motif, ORF1; Cysteine motif, ORF2.]

[Figure 3, panel B: alignment of reverse transcriptase domains for Ta11-1, cin4, del2, Tx1, LINE-1 and R2Dm; sequence rows not recoverable.]

[Figure 4: alignment rows for LINE-1, R2Dm, cin4, del2, Ta11 and Ta12-Ta28; sequence rows not recoverable.]

FIGURE 4. Alignment of non-LTR retrotransposon reverse transcriptases identified by the polymerase chain reaction. Black boxes indicate amino acids conserved in 20 of the 22 aligned sequences. Asterisks over the sequences indicate the amino acids encoded by the primers. Asterisks embedded in the sequence depict stop codons, periods represent gaps inserted to optimize alignment, and X denotes missing data. Reverse transcriptase domains I-IV are shown (see Figure 3) (XIONG and EICKBUSH 1990). Related non-LTR retrotransposon sequences are as in Figure 3.

sequences to assess A. thaliana non-LTR retrotransposon diversity.

The two amino acid sequence domains chosen for oligonucleotide design lie near the amino terminus of reverse transcriptase (Figure 3B). The 5'-most domain actually precedes the "core" reverse transcriptase (as defined by XIONG and EICKBUSH 1990), and this domain is well conserved among non-LTR retrotransposon reverse transcriptases. The second domain corresponds to domain IV (XIONG and EICKBUSH 1990), so the resultant PCR product spans three conserved amino acid domains to aid in the identification of novel reverse transcriptases. It should be noted that the PCR assay was not predicted to amplify Ta11-1, because the 5' primer does not match the sequence of this element due to an amino acid substitution (Figure 3B).

Consistent with the spacing of the amino acid sequence domains, the PCR primers amplified a ~600-bp DNA fragment from Landsberg genomic DNA. This PCR product was cloned into a plasmid vector and DNA sequences were obtained for 26 independent clones. The nucleotide and derived amino acid sequences were compared with the DNA sequence databases. Twenty-two clones encoded ORFs with strong similarity to non-LTR retrotransposon reverse transcriptases (Figure 4). Six of these ORFs had stop codons, three had a shift in reading frame, and one had both a stop codon and a frame shift. This suggested that these clones are derived from nonfunctional elements or that errors were induced during DNA amplification. All 22 clones showed a high degree of similarity to the three conserved amino acid domains included between the PCR primers (Figure 4). Two clones did not show any apparent reverse transcriptase similarity or similarity to sequences in the DNA databases, and two had the same primer at both ends, indicating that they were likely spurious amplification products.

Among the 26 clones, 17 distinct reverse transcriptases were identified that ranged in nucleotide similarity from 48 to 88%. Because of the high level of nucleotide differences between clones, we concluded that the 17 clones represent distinct retrotransposon reverse transcriptases. We designated these new retrotransposon families as Ta12-Ta28. Among the 26 clones, two clones were identified for each of Ta12, Ta13, Ta15, Ta17 and Ta22. The low level of redundancy among clones in the sample suggests that many more distinct non-LTR retrotransposons are resident in the A.

[Figure 5 tree: leaf labels recovered from the image include Ta11, Ta16-Ta22, Ta24-Ta27, cin4, LINE-1 and R2Dm; scale bar 0.1.]

FIGURE 5. Neighbor-joining tree of partial non-LTR retrotransposon reverse transcriptase sequences. The tree is unrooted, and branch lengths are proportional to the percent amino acid distance between sequences. Numerals adjacent to branches indicate percentage of bootstrap replicates supporting that branch.

thaliana genome that were not identified in this initial screen.

Relationships among the A. thaliana non-LTR reverse transcriptases were assessed using the neighbor-joining method (SAITOU and NEI 1987). Included in the data set were reverse transcriptases from other closely related non-LTR retrotransposons. The A. thaliana sequences formed a single, strongly supported clade by both neighbor-joining analysis (Figure 5) and parsimony analysis (not shown), indicating that they are more closely related to each other than to reverse transcriptases from other species. This is consistent with the vertical evolution of these sequences over most of their evolutionary history.

Each of the clones encoding a distinct reverse transcriptase was hybridized to Southern filters containing genomic DNA from a collection of 17 diverse A. thaliana ecotypes (e.g., Figure 6). All of the sequences were of low copy number and ranged from zero to no more than seven distinct insertions among the ecotypes (data not shown). In most cases, hybridization patterns were very similar among the ecotypes, suggesting that each had the same complement of retrotransposon insertions at the same chromosomal locations. The Co ecotype had the most retrotransposons, with five, six and seven members of the Ta20, Ta11 and Ta27 families, respectively (data not shown).

Ta17 resides in the mitochondrial genome: The Ta17 reverse transcriptase hybridized to a single restriction fragment among the ecotypes analysed. The intensity of the hybridization signal, however, was considerably greater than those observed with the other clones (Figure 6). This suggested that Ta17 was highly represented in the genomic DNA, and the hybridization signal was similar to that observed with chloroplast or mitochondrial DNA probes (data not shown).

The DNA sequence of Ta17 was compared with the data that are accumulating as part of the effort to sequence the entire A. thaliana mitochondrial genome. A perfect match was found in the mitochondrial sequence, confirming its organellar location (KNOOP et al. 1996). The Ta17 sequence has a frame shift and a stop codon, suggesting that it no longer encodes an active reverse transcriptase. Reverse transcriptases are known to be associated with some group II mitochondrial introns, particularly those of fungi (XIONG and EICKBUSH 1990). These reverse transcriptases, however, are clearly distinct from non-LTR retrotransposons (XIONG and EICKBUSH 1990). Because phylogenetic analysis places Ta17 well within the clade of other A. thaliana nuclear retrotransposons (Figure 5), we conclude that the presence of Ta17 in the mitochondria likely represents a transfer of sequences from the nucleus to the mitochondria, or a rare transposition event into the mitochondrial genome.

DISCUSSION

The Ta11 non-LTR retrotransposons: The non-LTR retrotransposons are among the most abundant class of interspersed repetitive sequences found in the genomes of higher eucaryotes. In plants, however, only two families of non-LTR retrotransposons have thus far been reported: the cin4 elements of maize and the del2 elements of lily (SCHWARZ-SOMMER et al. 1987; LEETON and SMYTH 1993). Sequence analysis of a region near the A. thaliana ABI3 gene revealed an open reading frame with similarity to non-LTR retrotransposon

reverse transcriptases. We have shown that this sequence is a member of an A. thaliana non-LTR retrotransposon family, which we have designated Ta11.

FIGURE 6. Southern hybridization analysis of Ta11 and Ta17. Filters were prepared with genomic DNAs from 16 diverse A. thaliana ecotypes digested with BamHI and BglII (La data not shown). Hybridization probes were from Ta11-1 (probe C, Figure 1) (A) or the Ta17 PCR product (B). The Ta11-1, Ta11-2 and Ta17 insertions are indicated.

The Ta11 insertion near ABI3 (Ta11-1) has a structural organization similar to other characterized non-LTR retroelements. It has two open reading frames separated by a -1 frameshift, which encode three conserved amino acid sequence domains. These include two short cysteine motifs, which are present in ORF1 and ORF2 at comparable positions in other non-LTR retrotransposons. Although the function of these motifs remains unclear, it is likely that they bind nucleic acids during replication and integration (COVEY 1986; MCCLURE 1991). ORF2 also encodes a region of 315 amino acids that shows significant similarity to reverse transcriptase, including all seven conserved domains characteristic of retroelement reverse transcriptases (XIONG and EICKBUSH 1990).

Ta11-1 is not present in the Landsberg ecotype. Comparison of DNA sequence from Landsberg at the site of insertion to the Columbia sequence revealed what are apparently two insertion/deletion events in addition to the element. Although these sequence rearrangements make it difficult to definitively determine the ends of Ta11-1, we have identified a 15-bp direct repeat on either side of the ORFs that may represent a target site duplication. The presumptive 3' target site duplication is immediately preceded by 11 adenine residues, similar to the 3' ends of other non-LTR retrotransposon insertions (HUTCHISON et al. 1989).

In contrast to most non-LTR retrotransposons, Ta11-1 appears structurally intact. Non-LTR elements frequently have 5' deletions, which result from incomplete reverse transcription that is initiated at the poly(A) tract (LUAN et al. 1993). With the exception of the frame-shift, the open reading frame of Ta11-1 is uninterrupted by stop codons for over 5.5 kb. This structural integrity suggests that Ta11-1 has transposed relatively recently and has not accumulated numerous mutations since its insertion. In this regard, Ta11 is similar to the LTR retrotransposons we have previously characterized in A. thaliana. For example, the Ta3-1 insertion does not have any stop codons or frame-shifts in its >4.1 kb ORF (KONIECZNY et al. 1991). Definitive evidence on the functional integrity of Ta11-1 will require determining whether or not it is capable of transposition. Nonetheless, the structural integrity of this element and other A. thaliana retrotransposons suggests that some may be transposition competent.

An abundance of A. thaliana non-LTR retrotransposons: Alignment of the Ta11 reverse transcriptase with those from related retrotransposons revealed several well conserved amino acid sequence domains. A PCR assay was developed based on these conserved sequences to identify related A. thaliana retrotransposons. Of 26 cloned amplification products analyzed, 17 were found to encode putative protein products with significant similarity to non-LTR retrotransposon reverse transcriptases. These 17 sequences range in nucleotide identity from 48 to 88%, which is comparable with the divergence observed between distinct families of A. thaliana Ty1/copia retrotransposons (KONIECZNY et al. 1991). Each unique clone, therefore, was considered to represent a different retrotransposon family, designated Ta12-Ta28. Because the 17 reverse transcriptases were identified from the sequence of only 26 independent clones, it is unlikely that we have exhaustively identified all non-LTR retrotransposon sequences in the A. thaliana genome.

The PCR assay previously developed to amplify Ty1/copia group elements was based primarily on plant retrotransposon sequences and, as a consequence, was found to robustly amplify only plant DNAs (VOYTAS et al. 1992). More exhaustive searches, however, have recently identified Ty1/copia group elements in a few nonplant lineages, including fish, reptiles and amphibians (FLAVELL and SMITH 1992; FLAVELL et al. 1995). The non-LTR reverse transcriptase assay described here is based on conserved amino acid sequences from both plant and animal elements, suggesting that it may be less taxonomically restricted. In support of this, we have amplified PCR products of the appropriate size from maize, cotton and human DNA; sequence analysis of the human clones indicated that they were, in fact, human LINE-1 non-LTR retrotransposons (data not shown).

Population dynamics of A. thaliana retrotransposons: Retrotransposon populations are continually in flux, and variations in copy number reflect the amplification and loss of specific element lineages from the genome. Such dynamics are well exemplified in S. cerevisiae, where the extensive genome characterization efforts have provided a more comprehensive picture of retrotransposon populations. S. cerevisiae element copy numbers range from 25 to 35 copies for the Ty1 family, 10-15 copies for Ty2, and 1-4 copies each of Ty3 and Ty4 (BOEKE and SANDMEYER 1992). Active Ty5 elements, however, appear to have been completely lost from the genome of this species (ZOU et al. 1995).

In contrast, all of the characterized A. thaliana retrotransposons are of low copy number. The 17 non-LTR elements reported here range from zero to no more than seven insertions among 17 diverse ecotypes analyzed. Similar low copy numbers were also observed for the 10 families of A. thaliana Ty1/copia group elements (KONIECZNY et al. 1991; VOYTAS et al. 1992). No retrotransposon families have yet been characterized in A. thaliana that are of appreciable copy number. Both empirical and theoretical studies of population dynamics suggest that element populations are maintained as a result of transpositional increase in copy number, balanced by some opposing force or forces (CHARLESWORTH and LANGLEY 1989). Regulation of rates of transposition, coupled with loss through recombination (see below), may be important contributing factors in the scarcity of these sequences in the A. thaliana genome.

It was surprising to find that one non-LTR element, Ta17, resides in the mitochondrial genome. Typical of higher plants, the A. thaliana mitochondrial genome is large (372 kb) (KLEIN et al. 1994), and the presence of Ta17 suggests that some of this DNA may have originated from [...]. This is further supported by the previous documentation of a degenerate reverse transcriptase in the mitochondrial genome of the plant Oenothera (SCHUSTER and BRENNICKE 1987) and the identification of other retrotransposon reverse transcriptases in the A. thaliana mitochondrial genome (KNOOP et al. 1996). Other reverse transcriptases are known to reside in mitochondria and plastids as part of some group II introns; however, their sequences are distinct from those of other non-LTR retroelements (XIONG and EICKBUSH 1990).

Origins of A. thaliana retrotransposons: We have previously offered two scenarios to explain the presence of the numerous low copy retrotransposons in A. thaliana (KONIECZNY et al. 1991; see also CUMMINGS 1994): These elements may have been introduced into the genome from another species by horizontal transfer and, once resident, failed to proliferate. Alternatively, retrotransposons may have been abundant in the ancestor of A. thaliana and were lost from the genome through a process of genome reduction. We presently favor the second scenario based on several lines of evidence: 1) Ty1/copia elements have been identified in all plant lineages, indicating that they are ancient components of plant genomes and therefore were likely present in the ancestor of A. thaliana (FLAVELL et al. 1992a; VOYTAS et al. 1992; HIROCHIKA and HIROCHIKA 1993). 2) Relationships among reverse transcriptases of A. thaliana and other plant non-LTR sequences are consistent with organismal relationships, suggesting that these sequences have been vertically transmitted. Although phylogenetic analysis of the A. thaliana Ty1/copia group elements implied that horizontal transfer may have played a role in their evolutionary history, the high level of divergence among the sequences and the limited number of characters analyzed (100 amino acids) made it difficult to convincingly argue for episodes of horizontal transfer. 3) Finally, because of the limited phylogenetic framework for the genus Arabidopsis, we have used cotton as a model to study plant Ty1/copia group evolution (VANDERWIEL et al. 1993). Species within the cotton genus (Gossypium) have well-defined organismal phylogenies, and exhaustive characterization of retrotransposons from these species indicated that retrotransposon evolution is completely consistent with vertical transmission; no evidence for horizontal transfer was uncovered.

If retrotransposons were lost from the A. thaliana genome, then the ancestor of A. thaliana likely had multiple active elements that evolved into the diverse families found today. The identification of transposition-competent retrotransposons will make it possible to determine if regulation of element activity has contributed to their present-day low copy number. In addition, the genome sequencing projects for A. thaliana and other plants will make it possible to evaluate more precisely the role played by retrotransposon amplification and loss in plant genome architecture.
Regulation dopsis, we have used cotton as a model to study plant of rates of transposition, coupled with loss through re- Tyl/copia group evolution (VANDERWIELet al. 1993). combination (see below) may be important contribut- Species within the cotton genus (Gossypium) have well- ing factors in the scarcity of these sequences in the A. defined organismal phylogenies, and exhaustive charac- thaliana genome. terization of retrotransposons from these species indi- It was surprising to find that one non-LTR element, cated that retrotransposon evolution is completely con- Tal 7, resides in the mitochondrial genome. Typical of sistent withvertical transmission; no evidence for higher plants, the A. thaliana mitochondrial genome is horizontal transfer was uncovered. large (372 kb) (KLEIN et al. 1994), and the presence of If retrotransposons were lost from the A. thaliana ge- Tal 7 suggests that some of this DNA may have origi- nome, then the ancestorof A. thaliana likelyhad multi- nated from . This is further ple active elements thatevolved into thediverse families supported by the previous documentation of a degener- found today. The identification of transposition-compe- ate reverse transcriptase in the mitochondrial genome tent retrotransposonswill make it possibleto determine of the plant Oenothera (SCHUSTERand BRENNICKE if regulation of element activity has contributed to their 1987) and the identification of other retrotransposon present-day low copy number. In addition, the genome reverse transcriptases in the A. thaliana mitochondrial sequencing projects for A. thaliana and otherplants will genome (KNOOP et al. 1996).Other reverse tran- make it possibleto evaluate more precisely the role scriptases are known to reside in mitochondriaand plas- played by retrotransposon amplification and loss in tids as part of some group I1 introns, however their plant genome architecture. sequences are distinct from those of other non-LTR retroelements (XIONG and EICKBUSH1990). 
Phyloge- We are grateful to SARINE SONKEI., VOI.KER KNOOP and AXEL BRFN netic analysis indicates that Tal7is more closely related NICKE for making available unpublished data. We would like to ac- knowledge DAVIDNISSEN and BRYANWuI.FEKUH1.F. for assistance in to nuclear retrotransposons and therefore is likely not DNA cloning andsequencing, and JONATHAN WENDEI.and RICH a component of a group I1 intron. Although Southern CRONNfor assistance with the phylogenetic analysis. We are grateful hybridization analysis did not reveal the presence of to MIKE CUMMINGSfor critically reading the manuscript. This work nuclear copies of Ta17, we favor the hypothesis that was supported by a grant from the Iowa State University Biotechnol- this element arose by a transfer event from the nucleus ogy Council to D.F.V. and a grant fromHoechst AG to Massachusetts General Hospital. This is Journal Paper No. 5-16378 of the Iowa Agri- to the mitochondria or by a transposition event to the culture and HomeEconomics Experiment Station, Ames, IA Project genome of this organelle. No. 3120. 578 D. A. Wright et al.

LITERATURE CITED

AUSUBEL, F. M., R. BRENT, R. E. KINGSTON, D. D. MOORE, J. G. SEIDMAN et al., 1987 Current Protocols in Molecular Biology. Greene Publishing Associates/Wiley Interscience, New York.
BENNETT, M. D., and J. B. SMITH, 1976 Nuclear DNA amounts in angiosperms. Proc. R. Soc. Lond. B Biol. Sci. 274: 227-274.
BOEKE, J. D., and S. B. SANDMEYER, 1992 Yeast transposable elements, pp. 193-261 in The Molecular Biology of the Yeast Saccharomyces: Genome Dynamics, Protein Synthesis, and Energetics, edited by J. R. BROACH, J. R. PRINGLE, and E. W. JONES. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
CHARLESWORTH, B., and C. H. LANGLEY, 1989 The population genetics of Drosophila transposable elements. Annu. Rev. Genet. 23: 251-287.
CHURCH, G., and W. GILBERT, 1984 Genomic sequencing. Proc. Natl. Acad. Sci. USA 81: 1991-1995.
COVEY, S. N., 1986 Amino acid sequence homology in gag region of reverse transcribing elements and the coat protein gene of cauliflower mosaic virus. Nucleic Acids Res. 14: 623-633.
CUMMINGS, M. P., 1994 Transmission patterns of eukaryotic transposable elements: arguments for and against horizontal transfer. Trends Ecol. Evol. 9: 141-145.
DANILEVSKAYA, O., F. SLOT, M. PAVLOVA and M.-L. PARDUE, 1994 Structure of the Drosophila HeT-A transposon: a retrotransposon-like element forming telomeres. Chromosoma 103: 215-224.
DEVEREUX, J. P., P. HAEBERLI and O. SMITHIES, 1984 A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12: 387-395.
DOOLITTLE, R. F., D.-F. FENG, M. S. JOHNSON and M. A. MCCLURE, 1989 Origins and evolutionary relationships of retroviruses. Q. Rev. Biol. 64: 1-30.
FLAVELL, A. J., and D. SMITH, 1993 A Ty1/copia group retrotransposon sequence in a vertebrate. Mol. Gen. Genet. 233: 322-326.
FLAVELL, A. J., E. DUNBAR, R. ANDERSON, S. R. PEARCE, R. HARTLEY et al., 1992a Ty1-copia group retrotransposons are ubiquitous and heterogeneous in plants. Nucleic Acids Res. 20: 3639-3644.
FLAVELL, A. J., D. B. SMITH and A. KUMAR, 1992b Extreme heterogeneity of Ty1-copia group retrotransposons in plants. Mol. Gen. Genet. 231: 233-242.
tionary Genetics Analysis, Version 1.0, Pennsylvania State University, University Park, PA.
KRANZ, A. R., and B. KIRCHHEIM, 1987 Genetic resources in Arabidopsis. Arabidopsis Inform. Serv. 24.
LEETON, P. R. J., and D. R. SMYTH, 1993 An abundant LINE-like element amplified in the genome of Lilium speciosum. Mol. Gen. Genet. 237: 97-104.
LEUTWILER, L. S., B. R. HOUGH-EVANS and E. M. MEYEROWITZ, 1984 The DNA of Arabidopsis thaliana. Mol. Gen. Genet. 194: 15-23.
LUAN, D. D., M. H. KORMAN, J. L. JAKUBCZAK and T. H. EICKBUSH, 1993 Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72: 595-605.
MCCLURE, M. A., 1991 Evolution of retroposons by acquisition or deletion of retrovirus-like genes. Mol. Biol. Evol. 8: 835-857.
MEYEROWITZ, E. M., 1992 Introduction to the Arabidopsis genome, pp. 100-118 in Methods in Arabidopsis Research, edited by C. KONCZ, N.-H. CHUA, and J. SCHELL. World Scientific Publishing, Singapore.
PRUITT, R. E., and E. M. MEYEROWITZ, 1986 Characterization of the genome of Arabidopsis thaliana. J. Mol. Biol. 187: 169-183.
PURUGGANAN, M. D., and S. R. WESSLER, 1994 Molecular evolution of magellan, a maize Ty3/gypsy-like retrotransposon. Proc. Natl. Acad. Sci. USA 91: 11674-11678.
SAITOU, N., and M. NEI, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4: 406-425.
SCHUSTER, W., and A. BRENNICKE, 1987 Plastid, nuclear and reverse transcriptase sequences in the mitochondrial genome of Oenothera: is genetic information transferred between organelles via RNA? EMBO J. 6: 2857-2863.
SCHWARZ-SOMMER, Z., L. LECLERCQ, E. GOBEL and H. SAEDLER, 1987 Cin4, an insert altering the structure of the A1 gene in Zea mays, exhibits properties of nonviral retrotransposons. EMBO J. 6: 3873-3880.
SENTRY, J. W., and D. R. SMYTH, 1985 An element with long terminal repeats and its variant arrangements in the genome of Lilium henryi. Mol. Gen. Genet. 215: 349-354.
SMYTH, D. R., 1991 Dispersed repeats in plant genomes. Chromosoma 100: 355-359.
FLAVELL, A. J., V. JA(:I