Proc. Natl. Acad. Sci. USA Vol. 86, pp. 6235-6239, August 1989
Genetics

Structure and coding properties of Bs1, a maize retrovirus-like transposon

(transposable element/retrotransposon/reverse transcription/translation frameshift)

YOUNG-KWAN JIN AND JEFFREY L. BENNETZEN

Department of Biological Sciences, Purdue University, West Lafayette, IN 47907

Communicated by John D. Axtell, May 12, 1989 (received for review July 11, 1988)

ABSTRACT We have sequenced Bs1, an insertion element isolated from a null allele of the Adh1 locus encoding alcohol dehydrogenase in maize. The Bs1 element is 3203 base pairs (bp) in length, has 302-bp identical long terminal direct repeats (LTRs), and created a 5-bp flanking direct duplication of target Adh1 DNA upon insertion. The 5' LTR is followed by a canonical primer binding site with homology to the plant initiator methionyl-tRNA, and the 3' LTR is directly preceded by a polypurine stretch like that observed in retroviruses and retrotransposons. Bs1 encodes two overlapping open reading frames specifying peptides of 740 and 168 amino acids. The longer open reading frame specifies a peptide with amino acid homology to the protease and nucleic acid binding moiety of retroviruses and retrotransposons. The deduced amino acid sequence encoded by Bs1 lacks convincing homology to the polymerase (reverse transcriptase) encoded by retroposons, even though this polymerase-encoding domain is routinely the most conserved region of any such element. The sequence and relatively small size of Bs1 suggest that this element is a deleted retrotransposon that inserted into Adh1 with the aid of a reverse transcriptase function provided in trans. In vitro transcribed Bs1 complementary RNA was translated in vitro to produce both a protein of 81 kDa representing open reading frame 1 (ORF1) and one of the 95-kDa size predicted for the frame-shifted fusion of ORF1 and ORF2. As with many other retroposons, the efficiency of translational initiation at the AUG beginning ORF1 was not noticeably affected by the presence of one or more upstream, unproductive AUGs in the complementary RNA transcript.

By both structural and mechanistic criteria, eukaryotic insertion sequences and transposable elements can be divided into two major categories. One class of elements is bounded by inverted terminal repeat sequences and transposes through excision (1, 2) and/or enhanced replication (3) of the integrated element. Examples of this class of transposable elements are the controlling elements of maize (4) and the P elements of Drosophila (5). A second class of eukaryotic insertion sequences, the "retroposons," integrate into the chromosome after reverse transcription of element-encoded RNA (6), are particularly abundant in animals, and are the only type of insertion sequence yet discovered in fungi. Both the retroviruses (6) and the L1 elements (7) of vertebrates and the "retrotransposons" Ty of yeast (8) and copia of Drosophila (9) fall into this latter category.

Despite the early discovery and characterization of numerous distinct transposable elements in maize (1, 4), a canonical retrovirus or retrotransposon has not been described in plants. The DNA caulimoviruses of plants have the structural properties and encode the reverse transcriptase activity definitional of a retrovirus or retrotransposon but apparently do not integrate into chromosomal DNA (10). Recently, Schwarz-Sommer et al. (11) found that the Cin4 insertion element of maize could encode a reverse transcriptase activity and may fall into the subcategory of nonviral retrotransposons.

In 1985, Johns and coworkers (12) reported the isolation of the Bs1 insertion element from a null allele of the maize Adh1 locus encoding alcohol dehydrogenase, together with the sequence of its terminal repeats. The identical, direct terminal repeats of this approximately 3.3-kilobase pair (kb) insertion and its low genomic copy number led these investigators to describe Bs1 as a "copia-like" transposable element of maize. We report here the complete sequence* of Bs1 and both structural and functional evidence showing that Bs1 is the DNA form of a retrovirus-like transposon.

*The sequence reported in this paper has been deposited in the GenBank data base (accession no. M25397).

Abbreviations: BSMV, barley stripe mosaic virus; CaMV, cauliflower mosaic virus; LTR, long terminal repeat; ORF, open reading frame; cRNA, complementary RNA.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Downloaded by guest on September 29, 2021

MATERIALS AND METHODS

Biologicals. Escherichia coli strain HB101 was used for all plasmid cloning, subcloning, and maintenance. Plasmids pJ1 and pK18, containing overlapping portions of the Bs1 element, were provided by M. A. Johns (Northern Illinois University). Bs1 contains a single HindIII site at bp 1937-1942. Plasmids pJL-24, pJL+138, pJL+174, pJL+248, and pJL+267 were constructed by ligating a BamHI/HindIII fragment containing the 5' 60% of Bs1 from pK18 and an Xba I/HindIII fragment containing the 3' 40% of Bs1 from pJ1 into a BamHI/Xba I-digested derivative of pGEM1 (Promega Biotec). BamHI and Xba I linkers were obtained from New England Biolabs. Restriction enzymes were purchased from Boehringer Mannheim, BRL, and New England Biolabs. Klenow DNA polymerase was obtained from Boehringer Mannheim, and BAL-31 was supplied by New England Biolabs. Radioactive triphosphates and wheat germ translation extracts were purchased from Amersham. [35S]Methionine and [3H]leucine were from New England Nuclear. Phage T7 RNA polymerase, RNasin, and rabbit reticulocyte lysates were supplied by Promega Biotec. All enzymes and extracts were used approximately as specified by the manufacturer.

DNA Sequencing. DNA was sequenced by the technique of Maxam and Gilbert (13) on restriction fragments end-labeled with Klenow DNA polymerase or phage T4 polynucleotide kinase. DNA sequencing products were resolved on 43-cm-long, 0.4-mm-thick 6%, 8%, 12%, and/or 20% denaturing acrylamide gels. Between 200 and 350 bp of DNA sequence were read from a given labeled end.

In Vitro RNA Synthesis. Complementary RNA (cRNA) was separately synthesized off restriction enzyme-digested
pJL-24, pJL+138, pJL+174, pJL+248, and pJL+267 by following the protocol advised by Promega Biotec and in reactions containing added 7-methylguanosine(5')triphospho(5')guanosine (New England Biolabs) to provide a 5' cap (14). The quantity and integrity of in vitro transcribed Bs1 RNA were confirmed by agarose slab gel electrophoresis.

In Vitro Translation and Analysis. Approximately equivalent quantities of Bs1 cRNAs were added to either wheat germ or rabbit reticulocyte lysate translation extracts. Reactions were run for 1 hr, following the manufacturers' specifications. Translation products were resolved on 10% or 12.5% polyacrylamide/NaDodSO4 gels containing a 5% acrylamide stacking phase. Gels were immersed in Amplify (Amersham), dried, and fluorographed by standard procedures.

RESULTS

Both strands of the Bs1 element were DNA-sequenced by the technique of Maxam and Gilbert (13). Bs1 is 3203 bp in length and terminates in identical 302-bp direct repeats (Figs. 1 and 2). The Bs1 element encodes two long, overlapping open reading frames (ORFs) that could specify peptides of 740 amino acids and 168 amino acids (Figs. 1 and 2). Within the sequence domain that encodes ORF1 are two nonoverlapping ORFs specifying peptides of 181 and 133 amino acids in a second reading frame (Fig. 1). As observed for most integrated DNA forms of retroviruses and retrotransposons (6, 16), the long terminal repeats (LTRs) start with TG and end with CA, the 5' LTR is followed after 2 bp by the sequence 5'-TGG-3', and the 3' LTR is directly preceded by a polypurine tract (Fig. 2). These consistent and strong retroviral homologies permit the identification of the 5' and 3' LTRs and orient the expected direction of expression of Bs1 relative to Adh1 transcription in the mutant allele Adh1-S5446 (Fig. 1). Insertion of Bs1 into Adh1 created a 5-bp duplication of flanking target DNA (5'-GCCAC-3').

Comparative deduced amino acid sequence analysis indicates sequence homology between two domains within ORF1 of Bs1 and the protease and nucleic acid binding moieties of other integrated retroviruses, retrotransposons, and cauliflower mosaic virus (CaMV) (Fig. 2). These sequences have been observed to be particularly well conserved, even across kingdom boundaries (16-19). These homologies are encoded near each other in the 5' half of the Bs1 element, as they are in all other similar elements studied (6, 10, 16-18). Without previous exception, the most conserved deduced amino acid sequence among retroposons (6, 16-19) is observed in the region that encodes reverse transcriptase. Although the deduced amino acid sequence of Bs1 does contain some homology with the most conserved consensus sequence of reverse transcriptase, it also specifies two nearby arginines and a cysteine, which would be incompatible with all other reverse transcriptase sequences (Fig. 2).

Reverse transcription of DNA off retroviral or retrotransposon RNA is generally primed by the 3' end of a host cellular tRNA (6). This priming event often is initiated 2 bp downstream from the 5' LTR. The sequence 5'-TGGTATCAAAGGTCACCGATCCTGG-3', beginning 2 bp downstream from the Bs1 5' LTR, shares 21 bp of homology with the 3' end of the only initiator methionyl-tRNAs sequenced in plants, those from the monocot wheat and the dicots Lupinus luteus and Phaseolus vulgaris (20, 21). The homology observed is found in three interrupted blocks of 8, 4, and 9 bp (Fig. 3), thereby providing a theoretical overall binding energy equivalent to approximately 13 or 14 bp of continuous homology. Initiator methionyl-tRNA sequences are highly conserved in nature (20, 21), and it is likely that the maize initiator methionyl-tRNA has an identical 3' sequence. Reverse transcriptions of Ty, copia, and CaMV (22) are primed by initiator methionyl-tRNAs.

To compare the translational competence of Bs1 RNA with the ORFs predicted by the Bs1 DNA sequence, we engineered the Bs1 element for expression off bacteriophage SP6 and T7 promoters in vitro. The progressive exonuclease BAL-31 was used to digest both strands of DNA toward and, in some cases, into the 5' and 3' Bs1 LTRs. Oligonucleotides encoding BamHI (5') or Xba I (3') restriction sites were ligated onto the ends of the digestion products. These fragments were then inserted between the opposing T7 and SP6 promoters in the plasmid pGEM1, and the endpoints of the BAL-31 digestions were identified by DNA sequencing. The resultant plasmids were linearized by digestion with the appropriate restriction enzyme (generally Xba I) and used as templates for the in vitro synthesis of RNA. In vitro synthesized and capped RNA was translated in vitro in both wheat germ and rabbit reticulocyte extracts.

In vitro translation of the cRNA predicted to be the Bs1 coding strand, by the criteria of retroviral and retrotransposon DNA sequence hallmarks, yielded a number of specific peptides. The largest of these peptides migrated at apparent sizes of 95 kDa or 81 kDa (Fig. 4). Translation in both wheat germ and rabbit reticulocyte extracts produced either 95-kDa or 81-kDa peptides. Addition of the RNase inhibitor RNasin to the translation reactions did not lead to any qualitative change in the pattern of peptides observed (Fig. 4B, lanes 1, 2, 7, and 8). In addition, an in-frame deletion of the 480-bp

[Figure 1: map of the Adh1-S BamHI/Xho I fragment carrying the Bs1 insertion, showing restriction sites (B, Xb, H, H, Xh), the Bs1 LTRs, the four largest Bs1 ORFs, and a 500-bp scale bar.]

FIG. 1. Map and orientation of the Bs1 insertion in the mutant allele Adh1-S5447. The bar graph represents an approximately 6.6-kb BamHI/Xho I restriction fragment containing the Adh1-S allele. Solid bars denote the 10 exons of Adh1-S (15). The upper line indicates the orientation and presumed size of the primary transcript of Adh1-S. Immediately below the bar graph is a representation of the insertion site and orientation of the Bs1 element. Bold arrows denote the LTRs of Bs1, and the thin arrows indicate the position, orientation, and extent of the four largest ORFs detected in the Bs1 sequence. B, BamHI; H, HindIII; Xb, Xba I; Xh, Xho I.
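As a worked check on the Methods (a sketch, not the authors' procedure): the single HindIII site at bp 1937-1942 is what allows the BamHI/HindIII and Xba I/HindIII subclone fragments to carry roughly the 5' 60% and 3' 40% of Bs1. The Python snippet below finds the HindIII recognition sequence (AAGCTT) in a 30-nt excerpt and converts the hit to 1-based Bs1 coordinates. The excerpt and its starting coordinate (1921) are read off the original Fig. 2 printout and should be treated as assumptions. The helper find_sites is hypothetical, and the percentage ignores the linker and flanking DNA in the real constructs.

```python
# Sketch: locate a restriction site in a sequence excerpt and report 1-based
# Bs1 coordinates. The excerpt is ASSUMED to begin at Bs1 position 1921.
EXCERPT_START = 1921                       # assumption, read from Fig. 2
EXCERPT = "AGACATCATTGGTGTGAAGCTTGGGGATAT"   # assumption, read from Fig. 2
BS1_LENGTH = 3203                          # bp, from the Results
HINDIII = "AAGCTT"                         # HindIII recognition sequence


def find_sites(seq: str, site: str, offset: int) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) coordinates of every match of site."""
    hits = []
    i = seq.find(site)
    while i != -1:
        start = offset + i  # index 0 of seq corresponds to position `offset`
        hits.append((start, start + len(site) - 1))
        i = seq.find(site, i + 1)
    return hits


sites = find_sites(EXCERPT, HINDIII, EXCERPT_START)
print(sites)  # [(1937, 1942)]

# Approximate share of the element lying 5' of the site.
five_prime_share = sites[0][0] / BS1_LENGTH
print(f"5' share ~{five_prime_share:.0%}, 3' share ~{1 - five_prime_share:.0%}")
```

With these inputs the site falls at 1937-1942, and 1937/3203 is about 0.60, in line with the 60%/40% split described in the Methods.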

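The Results use several structural criteria to identify and orient the LTRs: identical LTRs that start with TG and end with CA, a primer binding site beginning with TGG 2 bp after the 5' LTR, a polypurine tract directly 5' of the 3' LTR, and a direct duplication of target DNA. These criteria can be written as simple string checks. The Python sketch below assumes the element boundaries are already known. The helper ltr_hallmarks is hypothetical, and the test construct is a toy: only the primer binding site sequence and the 5-bp GCCAC duplication come from the text. The 21-bp "LTR", the internal region, the purine tract, and the flanking DNA are invented stand-ins (the real Bs1 LTR is 302 bp).

```python
PURINES = set("AG")


def ltr_hallmarks(upstream: str, element: str, downstream: str,
                  ltr_len: int, tsd_len: int = 5) -> dict:
    """Check retroelement hallmarks for an element whose boundaries are known."""
    ltr5, ltr3 = element[:ltr_len], element[-ltr_len:]
    # Count the run of purines immediately 5' of the 3' LTR (polypurine tract).
    ppt_len = 0
    for base in reversed(element[:-ltr_len]):
        if base not in PURINES:
            break
        ppt_len += 1
    # Target-site duplication: last bases before vs. first bases after the element.
    left, right = upstream[-tsd_len:], downstream[:tsd_len]
    return {
        "identical_ltrs": ltr5 == ltr3,
        "ltrs_tg_to_ca": ltr5.startswith("TG") and ltr5.endswith("CA"),
        "tgg_2bp_after_5ltr": element[ltr_len + 2:ltr_len + 5] == "TGG",
        "ppt_len": ppt_len,
        "target_duplication": left if left == right else None,
    }


# Toy construct: hypothetical except PBS and the GCCAC duplication.
TOY_LTR = "TGTCAGATATGGTTGCTAACA"      # 21-bp stand-in that starts TG, ends CA
PBS = "TGGTATCAAAGGTCACCGATCCTGG"      # from the Results
INTERNAL = "ATGGAGCCCACC"              # stand-in for the coding region
TOY_PPT = "AAGGAGGGAG"                 # stand-in purine tract
element = TOY_LTR + "AT" + PBS + INTERNAL + TOY_PPT + TOY_LTR

report = ltr_hallmarks("TTCGAGCCAC", element, "GCCACGGTAC", len(TOY_LTR))
print(report)
```

In a real analysis, the element boundaries would themselves have to be inferred, for example by searching for long direct repeats. That step is not shown here.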
[Figure 2: the 3203-bp Bs1 nucleotide sequence (GenBank accession no. M25397), numbered from the first base of the 5' LTR, with the deduced ORF1 and ORF2 amino acid sequences shown above the DNA. Labeled features include the PBS, the ORF1 start within an NcoI site, Mlu I sites, the protease core, the NBP cysteine motif, the "pol?" region, Bgl I, Kpn I, Bgl II, and Stu I sites, and the PPT.]

FIG. 2. Nucleotide sequence of Bs1 and the deduced amino acid sequences of ORF1 and ORF2. The LTRs of Bs1 are underlined, as are the adjacent sequences of the presumptive primer binding site (PBS) and polypurine tract (PPT). Regions with strong amino acid homology to the retroviral protease core and nucleic acid binding protein (NBP) are boxed, and conserved amino acids are indicated by asterisks. A region with some amino acid homology to retroposon polymerase (reverse transcriptase) is also boxed and labeled "pol?". Amino acids consistent with a conserved reverse transcriptase sequence are marked by asterisks, while the cysteine and two arginines incompatible with all known polymerase sequences are marked by minus signs. Restriction sites mentioned in the text are indicated below their recognition sequence.

Mlu I fragment in Bs1, which has homology to retroviral and retrotransposon protease-encoding regions, still allowed the synthesis of some lower molecular weight peptides (Fig. 4B, lane 9). We believe these small peptides are due to aberrant internal initiation and/or premature termination in the in vitro translation system. Digestion of the DNA template with the restriction enzymes Kpn I, Stu I, or Bgl I (which have sites within Bs1) prior to in vitro RNA synthesis led to the subsequent translation of truncated peptides (Fig. 4A, lanes 3, 10, and 11). The molecular weights of these truncated peptides indicated that their initiation point is just downstream of the 5' LTR in Bs1. The translation initiation sequence nearest to this region is the 5'-ATG-3' that begins ORF1 (Figs. 2 and 3). The sequence around this ATG, 5'-CCATGG-3', fits the consensus for eukaryotic translation initiation sites (5'-CCATGG-3') (23, 24).

In vitro synthesized RNAs with various lengths of flanking Adh1 and Bs1 5' LTR sequence were translated in parallel in

vitro (Fig. 4). In all cases, the 95-kDa or 81-kDa peptide was synthesized whether the cRNA 5' end entered element-encoded sequences at Bs1 nucleotides -24, +126, +138, +174, +248, or +267. In addition, the quantity of 95-kDa or 81-kDa peptide produced was similar in all cases (Fig. 4A, lanes 1, 2, 4, 8, and 9; Fig. 4B, lanes 5-8), even if one or more upstream AUGs (in or out of frame with the ORF1 AUG) were present in the transcript. This ability to overcome nonproductive, upstream AUG sequences is an unusual trait, characteristic of retroviral and retrotransposon RNAs (17-19, 25).

DISCUSSION

Several lines of evidence indicate that Bs1 is the DNA form of a retrovirus-like transposon. The DNA sequence of Bs1 exhibits the LTR structure, primer binding site, polypurine tract, and long overlapping ORFs characteristic of integrated retroviral DNA. In vitro translation of Bs1 cRNA yields the peptides predicted both by the ORF1 sequence and by a translational frame shift, like that seen in retroposon RNA translation (6, 16-19, 26, 27), which would fuse ORF1 with ORF2. Also, as with retroviral RNAs, efficient translation of ORF1 occurs in vitro despite nonproductive, upstream AUGs within the Bs1 cRNA transcript. Finally, the peptide predicted by ORF1 of Bs1 has amino acid sequence homology with the protease and nucleic acid binding proteins of other retroviral and retrotransposon ORF sequences. These results show that retrotransposons are not limited to animals and fungi. As with the retrotransposon Ty1 of yeast, it is possible that overproduction of a fully functional transcript of Bs1 in vivo would lead to the production of detectable retrovirus-like particles.

The only significant dissimilarity between the Bs1 sequence and the sequences of other retroviruses, retrotransposons, and CaMV is the absence of a reverse transcriptase consensus sequence encoded by Bs1. Reverse transcriptase is usually the most highly conserved protein encoded by

[Figure 3: alignment of the 3' end of the initiator methionyl-tRNA with the Bs1 primer binding site (5'-TGGTATCAAAGGTCACCGATCCTGG-3'), showing the interrupted blocks of homology.]

FIG. 3. Homology between the 3' end of the initiator methionyl-tRNA of maize and the primer binding site of Bs1. The Bs1 DNA sequence is presented, although Bs1 RNA would be primed for in vivo first-strand DNA synthesis.
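This is a back-of-the-envelope check of the peptide sizes discussed above. It assumes the common rule of thumb of about 110 Da per amino acid residue, which is only an approximation: real masses depend on composition, and gel mobility only approximates mass. By this estimate, ORF1 alone (740 residues) predicts roughly 81 kDa, which matches the 81-kDa product. A frameshift inside the region where ORF1 and ORF2 overlap fuses the start of ORF1 to the end of ORF2. The fusion is therefore roughly 740 + 168 residues minus the overlap, so the simple sum of 908 residues (about 100 kDa) is only an upper bound. Under the same rule, the observed 95-kDa product corresponds to about 864 residues.

```python
AVG_RESIDUE_DA = 110.0       # ASSUMED average mass per amino acid residue (Da)
ORF1_AA, ORF2_AA = 740, 168  # ORF lengths from the Results


def estimated_kda(n_residues: int) -> float:
    """Rough protein mass in kDa from a residue count."""
    return n_residues * AVG_RESIDUE_DA / 1000


def residues_for(kda: float) -> float:
    """Inverse estimate: residue count implied by an apparent size in kDa."""
    return kda * 1000 / AVG_RESIDUE_DA


print(f"ORF1 alone: ~{estimated_kda(ORF1_AA):.1f} kDa")                        # ~81.4
print(f"ORF1 + ORF2 (upper bound): ~{estimated_kda(ORF1_AA + ORF2_AA):.1f} kDa")  # ~99.9
print(f"95 kDa implies ~{residues_for(95):.0f} residues")                       # ~864
```

These numbers are consistent with the interpretation given in the text, but by themselves they do not locate the frameshift site.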

[Figure 4: fluorographs of Bs1 cRNA translation products (panels A and B), with size markers at 110, 97, 68, and 43 kDa. Lanes are labeled by cRNA 5' end (-24, 126, 138, 174, 248, 267), by truncating enzyme (Kpn, Stu, Bgl), and as "Mlu", "-RNA", "-in", or MW markers.]

FIG. 4. In vitro translation of Bs1 cRNA. Translations were performed in either wheat germ extracts (lanes 1-4 and 6-11 in A and lanes 1-3 in B) or rabbit reticulocyte lysates (lanes 5-9 in B). The open arrows to the left of each gel depiction indicate the 95-kDa (A) or 81-kDa (B) peptides that are the largest proteins synthesized from Bs1 cRNA. The numbers below the figure indicate the sequence position of the most 5' Bs1 nucleotide in the cRNA transcript. These were transcribed off plasmids pJL-24, pJL+138, pJL+174, pJL+248, and pJL+267. The 3' end of each cRNA was generally specified by digestion of the template with Xba I (lanes 1, 2, 4, 8, and 9 in A and lanes 1, 2, 5, 6, 7, and 8 in B). The Xba I site had been added to the 3' end of Bs1 at nucleotides 3061, 3199, or 3219. Constructs with the same 5' Bs1 end and any of these three 3' ends were found to give the same translation results. The designations Kpn, Stu, and Bgl indicate translations of truncated cRNAs made from templates digested with Kpn I, Stu I, or Bgl I, respectively. One translation was derived from RNA made from an Mlu I-deleted version of pJL+174 and is designated "Mlu." The molecular weight (MW) markers used in these experiments (marked by arrowheads) were phosphorylase b (97.4 kDa), bovine serum albumin (68 kDa), ovalbumin (43 kDa), and three wheat germ extract translation products of brome mosaic virus RNA (lane 6 in A) of 110 kDa, 97 kDa, and 35 kDa. The ovalbumin marker in lane 4 of B degraded in storage. Lanes "-RNA" designate that no RNA was added to the translation reaction, and lanes "-in" designate that no RNasin (an inhibitor of RNase action) was added to the translation mix, as it was to all other reactions.

retroposons.
For example, homology is observed between the reverse transcriptase of the nonintegrating plant virus CaMV and animal and fungal retroposon proteins (19). Therefore, the Bs1 element must either lack its own functional reverse transcriptase or encode a reverse transcriptase with an unusually divergent sequence.

The Bs1 element was recovered as an insertion in the Adh1 locus from a maize line infected with barley stripe mosaic virus (BSMV) (12). The Bs1 sequence lacks homology with BSMV (12), and there is no evidence that BSMV encodes a reverse transcriptase activity. In addition, the tRNA-like structures at the 3' ends of the BSMV RNAs have no homology with the Bs1 tRNA primer binding site (28). Hence, it is unlikely that BSMV directly provided a component necessary for Bs1 transposition into the Adh1 locus. It is more likely that the stress induced in the recipient plant by BSMV infection activated Bs1 (29, 30). In this regard, unstressed plants containing integrated Bs1 elements apparently transcribe Bs1 RNA poorly, if at all (12).

The maize line in which the Bs1 insertion in Adh1 was isolated has two copies per haploid genome of elements with an identical restriction map to Bs1 (12, 31). Since no other Bs1 sequence homology was detected in this maize line, it is likely that the Bs1 insertion in Adh1-S5446 was derived from one of these endogenous elements. If one of these Bs1-like elements was the source of the insertion in Adh1-S5446, then this element must either encode all of its own transposition functions, including reverse transcriptase, or these functions must have been provided in trans. The relatively small size of these endogenous elements, together with their similarity to Bs1 in restriction map (12, 31) and, presumably, in sequence, makes it nearly as unlikely that they encode a functional reverse transcriptase as it is for Bs1. Reverse transcriptase enzymes do produce a relatively high level of substitution errors in reverse transcription (0.1%) (6). Therefore, a finite possibility exists that this or some other variety of replication error led to an integrated Bs1 element that lacked the reverse transcriptase-encoding homology present in one or both of the endogenous Bs1 elements. Alternatively, the structurally altered versions of Bs1 detected in some maize lines (12) could be a source of a Bs1-related element that encodes its own reverse transcriptase.

Reverse transcriptase activity may be provided in trans in plant nuclei either by endogenous elements like Cin4 (11) or by nonintegrating, reverse transcribed viruses like CaMV (22). In potato, an intronless pseudogene derived from actin mRNA has been discovered recently (32). Once Bs1 RNA was transcribed, its proposed efficient priming by the maize initiator methionyl-tRNA could yield an integration-competent DNA form of the element through a reverse transcriptase provided by an unrelated element.

Ty, copia, and CaMV utilize the host initiator methionyl-tRNA to prime reverse transcription of their RNA transcript. The close juxtaposition of the RNA template's primer binding site to the point at which translation is initiated may facilitate this interaction. Moreover, because the initiator methionyl-tRNA is the most conserved of all tRNA species, an identical 3' end would be available in a broad range of potential host species. In evolution, this could both aid in dispersal of a retroviral element and enhance the frequency of other horizontal DNA transfer events. From a genetic engineering perspective, this observation suggests that appropriately constructed Bs1 derivatives might prove to be powerful tools for transposon tagging and gene transfer in a broad array of plant species.

We thank M. A. Johns for providing subclones of Bs1, G. Gustafson for the complete DNA sequence of BSMV prior to its publication, and P. T. Gilham for advice and software related to DNA sequence analysis. This research was supported by an Individual Research Project Award in Plant Biology from the McKnight Foundation.

1. McClintock, B. (1949) Carnegie Inst. Washington Yearb. 48, 142-154.
2. Brink, R. A. & Nilan, R. A. (1952) Genetics 37, 519-544.
3. Bennetzen, J. L., Fracasso, R. P., Morris, D. W., Robertson, D. S. & Skogen-Hagenson, M. J. (1987) Mol. Gen. Genet. 208, 57-62.
4. Nevers, P., Shepherd, N. S. & Saedler, H. (1986) Adv. Bot. Res. 12, 103-203.
5. Engels, W. R. (1983) Annu. Rev. Genet. 17, 315-344.
6. Varmus, H. E. (1983) in Mobile Genetic Elements, ed. Shapiro, J. A. (Academic, New York), pp. 411-503.
7. Rogers, J. H. (1985) Int. Rev. Cytol. 93, 187-279.
8. Boeke, J. D., Garfinkel, D. J., Styles, C. A. & Fink, G. R. (1985) Cell 40, 491-500.
9. Shiba, T. & Saigo, K. (1983) Nature (London) 302, 119-124.
10. Hull, R., Covey, S. N. & Maule, J. (1987) J. Cell Sci. Suppl. 7, 213-229.
11. Schwarz-Sommer, Zs., Leclercq, L., Gobel, E. & Saedler, H. (1987) EMBO J. 6, 3873-3880.
12. Johns, M. A., Mottinger, J. & Freeling, M. (1985) EMBO J. 4, 1093-1102.
13. Maxam, A. & Gilbert, W. (1977) Proc. Natl. Acad. Sci. USA 74, 560-564.
14. Contreras, R., Cheroutre, H., Degrave, W. & Fiers, W. (1982) Nucleic Acids Res. 10, 6353-6362.
15. Dennis, E. S., Gerlach, W. L., Pryor, A. J., Bennetzen, J. L., Inglis, A., Llewellyn, D., Sachs, M. M., Ferl, R. J. & Peacock, W. J. (1984) Nucleic Acids Res. 12, 3983-4000.
16. Clare, J. & Farabaugh, P. (1985) Proc. Natl. Acad. Sci. USA 82, 2829-2833.
17. Mount, S. M. & Rubin, G. (1985) Mol. Cell. Biol. 5, 1630-1638.
18. Marlor, R. L., Parkhurst, S. M. & Corces, V. G. (1986) Mol. Cell. Biol. 6, 1129-1134.
19. Toh, H., Hayashida, H. & Miyata, T. (1983) Nature (London) 305, 827-829.
20. Ghosh, H. P., Ghosh, K., Simsek, M. & RajBhandary, U. L. (1982) Nucleic Acids Res. 10, 3241-3247.
21. Sprinzl, M., Hartmann, T., Meissner, F., Moll, J. & Vorderwülbecke, T. (1987) Nucleic Acids Res. 15, r53-r188.
22. Pfeiffer, P. & Hohn, T. (1983) Cell 33, 781-789.
23. Kozak, M. (1986) Cell 44, 283-292.
24. Heidecker, G. & Messing, J. (1986) Annu. Rev. Plant Physiol. 37, 439-466.
25. Kozak, M. (1984) Nucleic Acids Res. 12, 3873-3893.
26. Jacks, T. & Varmus, H. E. (1985) Science 230, 1237-1242.
27. Mellor, J., Fulton, S. M., Dobson, M. J., Wilson, W., Kingsman, S. M. & Kingsman, A. J. (1985) Nature (London) 313, 243-246.
28. Gustafson, G., Armour, S. L., Gamboa, G. C., Burgett, S. G. & Shepherd, J. W. (1989) Virology 170, 370-377.
29. Sprague, G. F., McKinney, H. H. & Greeley, L. (1963) Science 141, 1052-1053.
30. McClintock, B. (1984) Science 226, 796-801.
31. Bennetzen, J. L., Brown, W. E. & Springer, P. S. (1988) in Plant Transposable Elements, ed. Nelson, O. E. (Plenum, New York), pp. 237-250.
32. Drouin, G. & Dover, S. A. (1987) Nature (London) 328, 557-558.