Gene movement by Helitron transposons contributes to the haplotype variability of

Jinsheng Lai*, Yubin Li*, Joachim Messing*, and Hugo K. Dooner*†‡

*The Waksman Institute, Rutgers, The State University of New Jersey, Piscataway, NJ 08855; and †Department of Plant Biology, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901

Edited by Susan R. Wessler, University of Georgia, Athens, GA, and approved May 9, 2005 (received for review April 11, 2005) Different maize inbred lines are polymorphic for the presence or with a large amount of repetitive DNA. A comparison of the absence of genic sequences at various allelic chromosomal loca- Rph7 locus in two barley cultivars established that colinearity was tions. In the bz genomic region, located in 9S, sequences homol- restricted to Ͻ35% of the two sequences, principally because of ogous to four different genes from rice and Arabidopsis are differences in blocks (9). Interestingly, a gene present in line McC but absent from line B73. It is shown here that encoding a truncated helicase was present in only one of the two this apparent intraspecific violation of genetic colinearity arises cultivars. On the other hand, no cases of gene acquisition or loss from the movement of genes or gene fragments by Helitrons, a were found in a comparison of two different orthologous regions recently discovered class of eukaryotic transposons. Two Helitrons, between rice subspecies (10, 11). This finding suggests that the HelA and HelB, account for all of the genic differences distinguish- type of variation detected in maize and barley may not be a ing the two bz locus haplotypes. HelA is 5.9 kb long and contains general feature of plant . The functional significance of sequences for three of the four genes found only in the McC bz the ‘‘plus–minus’’ type of variation is also unclear, because the genomic region. A nearly identical copy of HelA was isolated from genes that vary among accessions of the same species are present a 5S chromosomal location in B73. Both the 9S and 5S sites appear in multiple copies (3), and many of them are clearly to be polymorphic in maize, suggesting that these Helitrons have or gene fragments (8, 9). Independent of its generality or been active recently. Helitrons lack the strong predictive terminal functional significance, the described variation raises an impor- features of other transposons, so the definition of their ends is tant question: How did it arise? Evidence presented here greatly facilitated by the identification of their vacant sites in indicates that the apparent intraspecific violations of genetic Helitron-minus lines. The ends of the 2.7-kb HelB Helitron were colinearity in maize and, probably, barley, arise from the move- discerned from a comparison of the McC haplotype sequence with ment of genes or gene fragments by Helitrons, a recently dis- that of yet a third line, Mo17, because the HelB vacant site is covered type of eukaryotic transposon (12). deleted in B73. Maize Helitrons resemble rice Pack-MULEs in their Helitrons were found by computational analysis of genomic ability to capture genes or gene fragments from several loci and sequences from Arabidopsis, rice, and move them around the , features that confer on them a (12) and were later reported to be the causative agents of two potential role in gene . spontaneous mutations in maize (13, 14). These transposons account for 2% of the genomes of Arabidopsis and C. elegans but genome variability ͉ Helitrons ͉ bz locus ͉ corn ͉ polymorphisms had escaped detection because they lack structural features, such as terminal inverted repeats or target site duplications, that can be easily detected by computer-assisted searches (15). Instead, ecent studies have uncovered exceptional haplotype vari- the transposons have 5Ј-TC and 3Ј-CTRR termini, carry a 16- to ability in maize. Comparison of the bz genomic region of R 20-bp palindrome of variable sequence Ϸ10–12 bp upstream of McC (1, 2), a line that had been used extensively in genetic the 3Ј terminus, and insert invariably between host nucleotides analyses, with that of the unrelated standard inbred B73 (3) A and T. The putative autonomous elements reconstructed from revealed unexpected differences between them. First, retrotrans- the Arabidopsis and rice genome sequences are large (5.5–15 kb) poson clusters, which make up the bulk of the maize genome and encode proteins with homology to a DNA helicase and an (4–6), differed in composition and location relative to the genes ssDNA-binding protein. Although these proteins are not similar in the region so that the two sequences could be aligned only at to known transposases, the predicted helicases share motifs with the genes they had in common. Second, and most strikingly, the replication-initiation proteins of rolling-circle (RC) repli- some genes present in McC were absent from B73, indicating cons, which catalyze both cleavage and ligation of DNA. Hence, that genetic colinearity was violated within the species. Nonco- Helitrons were postulated to transpose by RC replication. How- linear haplotypes were also found in a comparison of the ever, the vast majority of Helitrons are nonautonomous elements genomic intervals containing the z1C zein in B73 that vary greatly in size and do not encode the set of proteins and BSSS53, inbred lines derived from the same synthetic encoded by the putative autonomous element. Kapitonov and population (7). The lengths of the z1C regions in the two inbreds Jurka (12) argued that the Helitron’s helicase and ssDNA- varied by 50% because of differences in the number of zein and binding protein were likely to have evolved from host proteins other genes and in the sizes of the retrotransposon clusters recruited by ancestral RC transposons, because of the conser- flanking them. Similar extensive nonhomologies were reported vation of their structure and their similarity to between the allelic regions of inbreds B73 and Mo17 at three known host proteins. Along those lines, Feschotte and Wessler additional chromosomal locations in the genome (8). That study (15) proposed that the acquisition of host genes by RC elements established that more than one-third of the predicted genes were present in just one inbred at the loci examined, although many of the unshared genes appeared to be truncated. The observation This paper was submitted directly (Track II) to the PNAS office. that genes not shared between inbreds violate the maize–rice Abbreviations: BAC, bacterial artificial chromosome; GSS, genome survey sequence; RC, colinearity usually displayed by shared genes prompted the rolling-circle. authors to speculate that unshared genes originated from inser- Data deposition: The sequences reported in this paper have been deposited in the GenBank tions of a yet-unknown nature rather than deletions. database (accession nos. AC159612, DQ000639, and DQ003206). High intraspecific haplotype variability is not restricted to ‡To whom correspondence should be addressed. E-mail: [email protected]. maize, having been recently described in barley, another species © 2005 by The National Academy of Sciences of the USA

9068–9073 ͉ PNAS ͉ June 21, 2005 ͉ vol. 102 ͉ no. 25 www.pnas.org͞cgi͞doi͞10.1073͞pnas.0502923102 Downloaded by guest on September 28, 2021 Fig. 1. Organization of the Ϸ10-kb gene sequences from the McC bz haplotype that are missing from the B73 bz haplotype. (A) Structure of the McC region in 9S (from ref. 3). Predicted genes are shown as pentagons pointing in the direction of transcription. are shown in bronze and in yellow. The proximal end is at 0 kb and the distal end at 10 kb. (B) Diagram of the BLASTN results, with the red bars representing the two contigs formed by overlapping GSS sequences from B73. (C) Structure of HelA and HelB, the two Helitrons that, together, account for all of the genes present in the bz genomic region of McC but not of B73. Helitron terminal sequences are in uppercase and the host target sequences are in lowercase. The stem-and-loop structure in each Helitron symbolizes the palindrome close to the 3Ј end (not shown to scale; the palindrome does not overlap any gene). The location of the gene sequences derived from a reanalysis of genomic and cDNA sequences in the database is shown below each Helitron. The hypro3 gene was misannotated originally and does not span both Helitrons. (D) Exon–intron structure of the sequenced hypro2 cDNA clone from HelA2 (GenBank accession no. DQ000639). Exons are represented as thick straight lines and

introns as thin angular lines. PLANT BIOLOGY

must have occurred frequently enough to permit the eventual ing strategy on an Applied Biosystems 3730xl DNA sequencer capture of useful genes or exons and viewed them as potential and analyzed as described in ref. 2. ‘‘exon-shuffling machines.’’ The Helitrons recovered in the two spontaneous mutants of Characterization of a hypro2 cDNA Clone. The B73 cDNA clone maize would appear to fit that view because both are large and (GenBank accession no. CO522311) was obtained from the carry fragments of several unrelated genes (13, 16). The two University of Arizona (www.genome.arizona.edu͞orders). A Helitrons described here, termed HelA and HelB, are also large transposon minilibrary was made by following the manufactur- and carry sequences from still other genes. There are several er’s (Finnzymes, Helsinki) instructions, and 10 randomly se- copies of these Helitrons in the maize genome. Together, HelA lected clones were sequenced from both ends. Sequencing and HelB can account for all the genes that are present in the bz reactions were performed with the ABI PRISM BigDye Ter- genomic region of McC but absent from the same region in B73. minator Cycle Sequencing Ready Reaction kit V3.1 (Applied None of these genes appears to be related to those carried by the Biosystems) and analyzed on an Applied Biosystems 3730xl putative autonomous plant Helitrons and were most likely re- DNA sequencer. cruited from the host and moved around the genome by Helitron elements. PCR Amplification and DNA Sequencing. The 5Ј end of HelA2 and its flanking sequence were amplified from total genomic DNA of Materials and Methods the McC line by using primer pairs cdl3͞2hp-up1 and 2hp8͞ Identification, Sequencing, and Analysis of B73 Clone b0511I12. An 2hp-up2. The corresponding sequences at the 3Ј end were Ϸ10-kb fragment containing the four genes (cdl1, hypro2, hypro3, amplified with primer pairs cdl-up3͞cdl7. All amplification and rlk) that are present in the bz genomic region of McC, but reactions were carried out with the Expand High Fidelity PCR not of B73, was used as query to search the genome survey system (Roche). The PCR products were sequenced as described sequence (GSS) maize database of GenBank (Fig. 1A). Most of above. The following primers were used: cdl-3, ACATGGTTC- the sequences in this database are from the inbred B73. One of CATCCACGCTT; 2hp-up1, GGTCTGTCGACTACGTTC- the highest-scoring hits in the BLASTN analysis was a 937-bp CTT; cdl-up3, GCATCGCTGCATCAATGTCGAA; cdl7, bacterial artificial chromosome (BAC) end sequence (GenBank GCAGTACAGGAGACTCGTA; 2hp8, TACAGGCACG- accession no. CL205862) that had homology to the coding region CAGGAGCGTAGAA; and 2hp-up2, GGCTAACTGGCAT- of the predicted gene hypro2. That BAC end came from a clone GCTCTGTA. (b0570E18) that had been anchored in the maize physical map (www.genome.arizona.edu). Based on the physical map, a subset Sequence Data Deposition. The sequences described here have of clones that overlapped with b0570E18 was selected. PCR been deposited in GenBank under the following accession nos.: experiments with template DNA from the selected BAC clones B73 clone b0511I12, AC159612; B73 hypro2 cDNA, DQ000639; and primers designed according to the BAC end sequences were and McC HelA2, DQ003206. conducted to confirm the presence of the hypro2 sequence in these BACs and to determine which end of the b0570E18 Results corresponded to the hypro2 sequence. Clone b0511I12 was The Genes Absent from the B73 bz Genomic Region Are Present chosen for sequencing because the physical map and PCR Elsewhere in the Genome. Fu and Dooner (3) found that the genes experiment suggested that it had the hypro2 sequence in the exhibiting plus–minus variation in the bz genomic region were middle. The BAC clone was sequenced by the shotgun sequenc- members of multigene families, i.e., that other copies were

Lai et al. PNAS ͉ June 21, 2005 ͉ vol. 102 ͉ no. 25 ͉ 9069 Downloaded by guest on September 28, 2021 Fig. 2. Organization of a 50-kb sequence from the B73 chromosome 5 BAC b0511I12 (GenBank accession no. AC159612). A gene island is flanked on either side by retrotransposon clusters. The genes cdl and hypro2 are adjacent to each other at the left end of the gene island (marked by the dotted-perimeter box). Part of the predicted hypro3 gene is nested within hypro2. The other predicted genes in the gene island are: 3, hypothetical protein; 4, esterase; 5, aldehyde oxidase; 6, ␤-1,3-glucanase; 7, replication factor; and 8, hypothetical protein.

present elsewhere in the genomes of B73 and various maize the element carrying cdl and hypro2 as HelA, the version of the inbreds. A major question raised by that finding is whether those element in 9S as HelA-1, and the one in 5S as HelA-2. genes are also adjacent to each other at the other locations. To An examination of their end sequences revealed that both obtain a preliminary answer to this question, a BLASTN analysis fragments have typical Helitron features (Fig. 3), beginning with of the GenBank maize GSS database was conducted by using as TC and ending with CTAG, sequence motifs that occur at the 5Ј query the 9.9-kb fragment from McC that contains the sequences and 3Ј termini, respectively, of Helitrons. Both have palindromic from the four genes that were found to be absent from the bz sequences 11 bp upstream of the 3Ј terminus and are flanked by genomic region of B73 (Fig. 1A). These four genes were an A at the 5Ј terminus and a T at the 3Ј terminus. An alignment originally assigned tentative designations based on their homol- of their terminal sequences with those of previously described ogy to other plant genes encoding either confirmed or predicted maize Helitrons reveals strong sequence conservation. Their proteins: cdl, a cell-division-like protein; hypro2 and hypro3, 5Ј-terminal 12 bp, which are identical to each other, match those Arabidopsis hypothetical proteins; and rlk, a receptor-like kinase of the Helitron insertions in the sh2-7527 mutation (13) and the (1). The maize GSS database contains mostly sequences from ba1-Ref mutation (14, 16) at 11 of the 12 positions. Their the inbred B73 and have been contributed by projects that use 3Ј-terminal 30 base pairs, which include the palindrome and enrichment procedures for the gene space in the genome (17, differ from each other at only one site, are also conserved: the 18). The database also contains the end sequences from HelA-1 terminus matches those of the Helitronsinsh2 and ba1 Ϸ475,000 B73 BAC clones, many of which have been anchored at 26 and 24 positions, respectively. More importantly, the to the genetic map (6). sequences flanking the insertion in the McC bz genomic region The analysis revealed that sequences 99% identical to those in are identical to sequences flanking an AT dinucleotide in the bz McC are present elsewhere in the genome of B73 and that at least genomic region of B73 (3) and of other lines, such as Mo17 (8) some of the genes are adjacent to each other. Based on the and A188 (Q. Wang and H.K.D., unpublished work), which also sequence overlaps, cdl, hypro2, and part of hypro3, which is now lack cdl and hypro2 at that location. The most parsimonious known to have been misannotated (see The Genic Content of the explanation for the polymorphism detected in today’s modern Helitrons, below), would form one contig, and the rest of hypro3 inbreds is that the unoccupied site was never visited. This and rlk would form a separate contig (Fig. 1B). To confirm the explanation is also in agreement with the current view that RC linked arrangement of cdl and hypro2 in B73, the corresponding transposition does not result in excision of the element from the genomic fragment was isolated. One of the overlaps in the donor site (22). However, because the actual mechanism of transposition of Helitrons is not known, the sequence present in cdl-hypro2 contig revealed by BLAST was provided by an end the bz genomic region of B73, Mo17, and A188 will be referred sequence from a B73 BAC clone (b0570E18) that had already to as the ‘‘vacant site,’’ to leave open the possibility that a been anchored to the genetic map, close to the tip of 5S Helitron may have resided there at one time in the past and to (www.genome.arizona.edu). The BAC end sequence was virtu- distinguish it from the footprint-bearing ‘‘empty sites’’ produced ally identical to the predicted coding region of hypro2. With that by the excision of most class II DNA transposons. BAC as seed, a larger BAC contig was identified in the B73 physical map, and the BAC clone b0511I12 was selected for sequencing after determining, on the basis of PCR experiments, that it probably contained the entire cdl-hypro2 contig.

The Plus–Minus Variation of the bz Genomic Region Arises from Gene Movements Mediated by Helitrons. Analysis of the BAC sequence revealed that a gene island consisting of eight genes was flanked on either side by retrotransposon clusters of the type originally described at the Adh1 locus (19) and later found to be of general occurrence in the maize genome (2, 20, 21). As shown in Fig. 2, at the left end of the gene island, there is a 6,000-bp fragment that includes cdl and hypro2 and has Ͼ99% identity with a corresponding 5,869-bp fragment from the Bz-McC genomic region. The sequences differ only in 11 SNPs, nine short indels of 1–7 bp, and one larger indel of 133 bp. Thus, the intraspecies breakdown of colinearity reported in the maize bz genomic Fig. 3. Termini of the maize Helitrons HelA-1 and HelB from McC 9S, HelA-2 from B73 5S, the Helitron insertions in mutants sh2-7527 (13) and ba1-Ref (14, region (3) can be partly explained by the movement of cdl and 16), and the rice Helitron2࿝OS (12). Helitron sequences are in uppercase letters hypro2 from one location of the maize genome to another. and the invariant host nucleotides where the Helitrons insert are in lowercase Evidence presented below supports the argument that the letters. Conserved nucleotides at the 5Ј and 3Ј termini are in bold uppercase movement is mediated by Helitrons. Hence, we have designated letters and the inverted repeats at the 3Ј termini are underlined.

9070 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0502923102 Lai et al. Downloaded by guest on September 28, 2021 The Genic Content of the Helitrons. Fu et al. (1) concluded, on the basis of the differential hybridization of specific gene probes to RNA from either wild-type or a deletion mutant, that at least some of the genes now known to be carried by Helitrons were expressed. However, because these genes are members of mul- tiple gene families, unambiguous evidence for their expression can be provided only by the isolation of their respective cDNAs. A cDNA clone (GenBank accession no. CO522311) with 100% identity to the B73 hypro2 gene was identified in the maize EST database, sequenced in its entirety, and confirmed to be derived from the hypro2 gene of HelA-2 on the basis of its almost complete sequence identity (only one mismatch in 1,586 bp). The inferred hypro2 exon–intron structure is shown in Fig. 1D. The transcript begins close to the 5Ј end of HelA, spans more than 3.5 Fig. 4. The Helitron structure of bz haplotypes McC, B73, and Mo17. The kb of HelA sequence, and helps to define five exons, with exons Helitrons are shown as downward-pointing triangles, and the corresponding 2 and 3 being separated by a large 1.8-kb intron. Conceptual vacant sites in B73 and Mo17 are represented by thick short vertical lines. The translation of the transcript revealed premature stop codons in vacant site for HelB is missing in B73. all reading frames, indicating that the cDNA clone does not encode a functional protein. Furthermore, careful examination of the gene’s exon–intron structure showed it to be chimeric: Maize lines are clearly polymorphic for the presence vs. exons 3–5 correspond to exons 6–8 of a putative glycosyl absence of HelA-1 in 9S. The possibility that a similar type of hydrolase (GH) in rice [National Center for Biotechnology polymorphism also occurs for HelA-2 in 5S was investigated by Information (NCBI) protein database accession no. BAD36734] PCR. Primers based on sequences internal to and flanking and Arabidopsis (accession no. BAB09947); exon 2 corresponds HelA-2 in 5S were used to amplify the junctions between HelA-2 to exon 2 of the GH and exon 1 is of unknown origin. A 1.8-kb and 5S sequences in line McC (which is essentially W22, except intron separates exon 2 from exon 6, but the sequences for the for the bz genomic region) and a series of other Corn Belt intervening GH exons 3–5 are completely missing from the

inbreds. Positive amplification products were obtained from genomic DNA. Thus, although hypro2 is expressed, it is clearly PLANT BIOLOGY McC, W22, A188, A636, B73, H99, M14, Mo17, BSSS53, and a . 4Co63 but not from W23, suggesting that, similar to its coun- No cDNAs corresponding to the three other genic sequences terpart in 9S, the 5S site is polymorphic for the presence vs. in HelA or HelB have been recovered, either from McC cDNA absence of a Helitron. In fact, a band for the vacant site could be libraries (1) or by RT-PCR using mRNA templates from a amplified only with primers flanking HelA-2 from the inbred diversity of McC and B73 tissues (data not shown). A reexam- W23. The sequence of the PCR product from McC (GenBank ination of the structure of the predicted cdl gene shows that it, accession no. DQ003206) confirmed that the correct junction too, consists of the terminal exons of a gene with multiple exons. The cdl sequence is homologous to exons 8–10 from a family of sequences between HelA-2 and 5S had been amplified and genes encoding a cell-division-like protein in rice (NCBI protein established that the HelA-2 Helitrons of B73 and McC are 99.8% database accession no. BAD53799) and Arabidopsis (accession identical to each other. Thus, McC has a copy of HelA in 9S and no. AAN86163). These exons, separated by their respective another one in 5S, as does W22 (L. He and H.K.D., unpublished introns, are present at the 3Ј end of HelA1, in the orientation work). Most of the Corn Belt lines examined have the copy of opposite that of hypro2. As originally annotated (1), the hypro3 HelA in 5S, but lack the one in 9S (3). W23 has a copy in 9S (3) gene spanned sequences that are now known to be split between but not in 5S. HelA and HelB, but a reexamination of the sequence reveals that Upon discovering that a Helitron accounted partly for the hypro3 is shorter and contained entirely within HelA. The hypro3 difference in apparent gene content of the bz genomic region of fragment is actually found within the large intron of hypro2,in different inbred lines, an attempt was made to determine the opposite transcriptional orientation, and contains coding whether the plus–minus variation for the other gene sequences information for a truncated protein with high similarity to a rice unique to the McC haplotype could also be explained by Helitron putative serine protease (NCBI protein database accession no. movement. Sequence alignments were performed between the BAD82560) and an Arabidopsis hypothetical protein (accession variable genomic regions of McC and those of B73 and Mo17, no. BAB11289). The genes encoding both of these proteins both of which lack all four genes in the region (3, 8). The latter consist of five exons, of which exons 2, 3, and 4 and part of 5 are comparison proved fruitful. A 2,712-bp sequence was identified present in HelA. Finally, the rlk gene of HelB contains only part to be present in McC and absent from the corresponding location of the first exon of a two-exon gene annotated as a putative in Mo17. Similar to the sequence found in HelA, this sequence receptor-like protein kinase in rice (NCBI protein database has typical Helitron features, so it has been called HelB. It begins accession no. BAA94519) and Arabidopsis (accession no. AAO64924). Thus, HelA and HelB resemble the two previously with TC, ends with CTAG, is flanked by A and T residues at the described Helitron elements of maize in carrying only gene 5Ј and 3Ј ends, respectively, and has an 8-bp palindromic fragments. sequence 10 bp upstream of the 3Ј end (Fig. 3). Its termini are less related to those of previously described maize Helitrons than Discussion to those of HelA and appear to be closer to those of rice Helitron2 Basis of the Plus–Minus Variation. Data presented here establish (12). An exact vacant site was identified in Mo17 but not in B73 that the differences in the apparent gene content of the bz or A188 (Q. Wang and H.K.D., unpublished work), haplotypes genomic regions of two Corn Belt inbreds reported by Fu and that may have suffered a deletion in this region (Fig. 4). The Dooner (3), and referred to as plus–minus variability, are not sequence separating the two Helitrons in McC is short, just 892 because of deletions from the (Ϫ) line, as originally thought, bp. Together, then, HelA and HelB can account for all the genes but because of additions in the (ϩ) line. The additions have found to be present in the bz genomic region of McC but not of been caused by the movement of complex Helitron trans- B73 (Fig. 1C). posons, which ferry gene fragments from one location of the

Lai et al. PNAS ͉ June 21, 2005 ͉ vol. 102 ͉ no. 25 ͉ 9071 Downloaded by guest on September 28, 2021 Fig. 5. Diagram of the origin of an McC-type haplotype from a Mo17-type progenitor haplotype by the transposition of Helitrons. Helitrons HelA and HelB are represented as thick gray lines in the donor chromosomes. Putative RC-transposition intermediates are shown as circles inserting into the vacant target AT dinucleotides of a Mo17-like haplotype (shown here, for simplicity, in the same 5Ј–3Ј orientation). The resulting transposition product would be an McC-like haplotype.

maize genome to another. Hence, differences between allelic the ends of the HelA transposon and the identification of the AT haplotypes do not appear to have the same basis as the target dinucleotide in the B73 vacant site. differences between homeologous regions of the maize ge- The smaller of the two Helitrons, HelB, is 2,712 bp long and nome examined to date, which clearly originated by deletion carries in it a fragment of an rlk gene that has close homologues (23–25). Since its paleotetraploid origin (26, 27), maize has in rice and Arabidopsis. The vacant AT site for this Helitron is been undergoing extensive gene loss as it approaches a diploid missing in B73 but present in Mo17. Thus, the ends of HelB could state. In most cases, entire genes have been deleted, although be determined only from an alignment of the McC and Mo17 partial gene deletions have been documented (23). The gene genomic sequences. This comparison highlights the value of the loss or fractionation of the duplicated regions has been vertical sampling of one genomic region for the precise identi- extensive. At the adh1 (23) and lg2 (25) orthologous regions, fication of genomic sequences, such as those from complex fractionation is nearly complete. In a comprehensive study of Helitrons, that lack the strong structural features required for five other duplicated regions from different chromosomal global genomic computational searches. A schematic diagram locations, Lai et al. (24) found that at least 50% of the outlining the possible origin of an McC-type haplotype from a duplicated genes from the two orthologues had been lost over Mo17-type progenitor haplotype is presented in Fig. 5. a very short period of time, estimated to be as short as 5 million Fu and Dooner (3) speculated that, if found to be common at years. Thus, the haplotype noncolinearity described at the bz other locations in the genome, the plus–minus variability un- locus is not simply a residual manifestation of this reduction to covered at bz could contribute to the phenomenon of heterosis diploidy but an altogether different phenomenon that oc- or hybrid vigor in maize. They reasoned that genes absent from curred more recently. certain polymorphic locations of the genome might be comple- The reported noncolinearity between the bz-locus haplotypes mented by copies of those genes at other polymorphic locations. in McC and B73 arose from the independent insertion of two On the other hand, Song and Messing (7) showed that the level different Helitrons Ϸ900 bp away from each other at a location of expression of most zein genes that were present in one line and just distal to bz in 9S. Both Helitrons are present in line McC and absent in another did not show simple additive patterns when the absent from lines B73 (3) and Mo17 (8). Both share the following two lines were intercrossed. Recently, Brunner et al. (8) have structural features of Helitrons (12): (i) they begin with a TC (5Ј found that plus–minus variability may be common throughout end) and end with CTAG (3Јend), (ii) they have a 10- to 16-bp the maize genome. However, evidence presented here indicates palindrome Ϸ11 bp upstream of the 3Ј end, and (iii) they are that the sequences displaying that kind of variability are often inserted at an AT host dinucleotide. This site is referred to as the gene fragments ferried around the genome by Helitron trans- occupied site in McC and as the vacant site in B73 and Mo17. posons. Therefore, this plus–minus variability, unlike that of the The larger of the two Helitrons, HelA-1, is 5,869 bp long and z1C1 locus, would contribute to heterosis, as envisioned origi- carries in it fragments from three separate genes (cdl, hypro2, and nally, only when intact genes rather than gene fragments have hypro3) that have homology to rice and Arabidopsis genes. As been captured by Helitrons. Alternatively, the Helitron-mediated documented in Results, none of these genes is complete. cdl and movement of large blocks of DNA into the vicinity of genes hypro3 are in one orientation and hypro2 is in the opposite could lead to differences in gene expression from the placement orientation. Interestingly, hypro3 is contained within the long of those genes within a novel sequence context. second intron of hypro2. An almost identical copy of this Helitron, termed HelA-2, was isolated from a chromosome 5 Gene Movement by Helitrons. The Helitrons described here differ BAC clone of B73. HelA-2 is slightly longer (6,000 bp) as a from those originally described in Arabidopsis, rice, and C. consequence of a 133-bp indel polymorphism, yet its overall elegans, in that they lack sequences similar to replication protein sequence is Ͼ99% identical to that of HelA-1. A 1.6-kb hypro2 A (RPA) and DNA helicases (12). They resemble, instead, other transcript from HelA-2 was identified in the B73 EST collection, Helitron insertions previously described in maize. The Helitrons but the presence of premature stop codons in every reading in the sh2-7527 (13) and ba1-Ref (14, 16) mutants and in a BAC frame indicates that this transcript does not encode a functional clone of the 19-kDa zein (16) are heterogeneous in protein. Unlike most other transposons, Helitrons do not have size and contain portions of at least 11 different genes, none of clear terminal features, such as terminal inverted repeats or which is related to RPA or DNA helicases. If, as suggested for target-site duplication, that mark their limits. The availability, in helicase and RPA (12), these genes are being recruited from the this instance, of two closely related Helitron copies from two host, the requirements for gene capture by nonautonomous different genomic locations greatly facilitated the definition of maize Helitrons do not appear to be very stringent.

9072 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0502923102 Lai et al. Downloaded by guest on September 28, 2021 The capture of genes or gene fragments from the host by end but that the normal 3Ј palindrome termination signal is transposable elements has been documented in several plants bypassed, leading to the replication and capture of adjacent (28–32). The maize Helitrons share several features with the sequences until a new cryptic downstream palindrome is en- Pack-MULEs (mutator-like elements) recently described in rice countered that can serve as a terminator. The capture of either (33): Most of the sequences captured are gene fragments, not complete or partial gene sequences by this mechanism would complete genes; a single element can contain fragments from depend on where in a gene the Helitron was inserted initially. multiple genes; sequence acquisition is at the DNA level, as Helitrons that lose their mobilization machinery would become indicated by the conservation of introns; and transcripts can nonautonomous elements, although, possibly, the majority of initiate within the element (e.g., hypro2) or outside of the nonautonomous elements has a different origin. Given the element, producing chimeric transcripts (13, 16). Based on the apparently minimal requirements for transposition, nonautono- above features of Pack-MULEs, their abundance, and the large mous Helitrons could be very small, as are the nonautonomous fraction (one-fifth) that contain fragments from multiple loci, Ds1 and dTph elements in maize and petunia, respectively (34, Jiang et al. (33) have argued that Pack-MULEs have the poten- 35). In fact, the abundant nonautonomous Helitron elements tial to create new plant genes through the multiplication, rear- Helitrony2 and Helitrony3 of C. elegans are just 249 and 195 bp rangement, and fusion of fragments from multiple genomic loci. long, respectively. It is conceivable that most of the complex Although maize Helitrons have just begun to be characterized Helitrons of maize are not derived from autonomous elements but from these much more numerous defective elements, which (13, 16), their properties shared with Pack-MULEs suggest that could readily pick up adjacent host sequences in the presence of they have a similar potential. an autonomous element. The mechanism of host-sequence acquisition by Helitronsis not known, but Feschotte and Wessler (15) have proposed a We thank members of the Dooner laboratory and the anonymous model based on the observation that the transposition of RC reviewers for constructive comments on the manuscript. This work was replicons, such as IS91 (22), has minimal cis requirements. The supported by National Science Foundation Grant DBI 03-20683 (to model postulates that RC replication initiates correctly at the 5Ј H.K.D. and J.M.).

1. Fu, H., Park, W., Yan, X., Zheng, Z., Shen, B. & Dooner, H. K. (2001) Proc. 19. SanMiguel, P., Tikhonov, A., Jin, Y. K., Motchoulskaia, N., Zakharov, D., Natl. Acad. Sci. USA 98, 8903–8908. Melake-Berhan, A., Springer, P. S., Edwards, K. J., Lee, M., Avramova, Z. & 2. Fu, H., Zheng, Z. & Dooner, H. K. (2002) Proc. Natl. Acad. Sci. USA 99, Bennetzen, J. L. (1996) Science 274, 765–768. PLANT BIOLOGY 1082–1087. 20. Song, R., Llaca, V., Linton, E. & Messing, J. (2001) Genome Res. 11, 1817–1825. 3. Fu, H. & Dooner, H. K. (2002) Proc. Natl. Acad. Sci. USA 99, 9573–9578. 21. Ramakrishna, W., Emberton, J., Ogden, M., SanMiguel, P. & Bennetzen, J. L. 4. SanMiguel, P. & Bennetzen, J. L. (1998) Ann. Bot. (London) 82, 37–44. (2002) Plant Cell 14, 3213–3223. 5. Meyers, B. C., Tingey, S. V. & Morgante, M. (2001) Genome Res. 11, 22. del Pilar Garcillan-Barcia, M., Bernales, I., Mendiola, M. V. & de la Cruz, F. 1660–1676. (2002) in Mobile DNA II, eds. Craig, N. L., Craigie, R., Gellert, M. & 6. Messing, J., Bharti, A. K., Karlowski, W. M., Gundlach, H., Kim, H. R., Yu, Y., Lambowitz, A. M. (Am. Soc. Microbiol., Washington, DC), pp. 891–904. Wei, F., Fuks, G., Soderlund, C. A., Mayer, K. F. & Wing, R. A. (2004) Proc. 23. Ilic, K., SanMiguel, P. J. & Bennetzen, J. L. (2003) Proc. Natl. Acad. Sci. USA Natl. Acad. Sci. USA 101, 14349–14354. 100, 12265–12270. 7. Song, R. & Messing, J. (2003) Proc. Natl. Acad. Sci. USA 100, 9055–9060. 24. Lai, J., Ma, J., Swigonova, Z., Ramakrishna, W., Linton, E., Llaca, V., Tanyolac, 8. Brunner, S., Fengler, K., Morgante, M., Tingey, S. & Rafalski, A. (2005) Plant B., Park, Y. J., Jeong, O. Y., Bennetzen, J. L. & Messing, J. (2004) Genome Res. Cell 17, 343–360. 14, 1924–1931. 9. Scherrer, B., Isidore, E., Klein, P., Kim, J. S., Bellec, A., Chalhoub, B., Keller, 25. Langham, R. J., Walsh, J., Dunn, M., Ko, C., Goff, S. A. & Freeling, M. (2004) B. & Feuillet, C. (2005) Plant Cell 17, 361–374. 166, 935–945. 10. Song, R., Llaca, V. & Messing, J. (2002) Genome Res. 12, 1549–1555. 26. Gaut, B. S. & Doebley, J. (1997) Proc. Natl. Acad. Sci. USA 94, 6809–6814. 11. Ma, J., Devos, K. M. & Bennetzen, J. L. (2004) Genome Res. 14, 860–869. 27. Swigonova, Z., Lai, J., Ma, J., Ramakrishna, W., Llaca, V., Bennetzen, J. L. & 12. Kapitonov, V. V. & Jurka, J. (2001) Proc. Natl. Acad. Sci. USA 98, 8714–8719. Messing, J. (2004) Genome Res. 14, 1916–1923. 13. Lal, S. K., Giroux, M. J., Brendel, V., Vallejos, C. E. & Hannah, L. C. (2003) Plant Cell 15, 381–391. 28. Talbert, L. E. & Chandler, V. L. (1988) Mol. Biol. Evol. 5, 519–529. 14. Gallavotti, A., Zhao, Q., Kyozuka, J., Meeley, R. B., Ritter, M. K., Doebley, 29. Bureau, T. E., White, S. E. & Wessler, S. R. (1994) Cell 77, 479–480. J. F., Pe, M. E. & Schmidt, R. J. (2004) Nature 432, 630–635. 30. Jin, Y. K. & Bennetzen, J. L. (1994) Plant Cell 6, 1177–1186. 15. Feschotte, C. & Wessler, S. R. (2001) Proc. Natl. Acad. Sci. USA 98, 8923–8924. 31. Takahashi, S., Inagaki, Y., Satoh, H., Hoshino, A. & Iida, S. (1999) Mol. Gen. 16. Gupta, S., Gallavotti, A., Stryker, G. A., Schmidt, R. J. & Lal, S. K. (2005) Plant Genet. 261, 447–451. Mol. Biol. 57, 115–127. 32. Elrouby, N. & Bureau, T. E. (2001) J. Biol. Chem. 276, 41963–41968. 17. Whitelaw, C. A., Barbazuk, W. B., Pertea, G., Chan, A. P., Cheung, F., Lee, Y., 33. Jiang, N., Bao, Z., Zhang, X., Eddy, S. R. & Wessler, S. R. (2004) Nature 431, Zheng, L., van Heeringen, S., Karamycheva, S., Bennetzen, J. L., et al. (2003) 569–573. Science 302, 2118–2120. 34. Gerlach, W. L., Dennis, E. S., Peacock, W. J. & Clegg, M. T. (1987) J. Mol. Evol. 18. Palmer, L. E., Rabinowicz, P. D., O’Shaughnessy, A. L., Balija, V. S., 26, 329–334. Nascimento, L. U., Dike, S., de la Bastide, M., Martienssen, R. A. & 35. Gerats, A. G., Huits, H., Vrijlandt, E., Marana, C., Souer, E. & Beld, M. (1990) McCombie, W. R. (2003) Science 302, 2115–2117. Plant Cell 2, 1121–1128.

Lai et al. PNAS ͉ June 21, 2005 ͉ vol. 102 ͉ no. 25 ͉ 9073 Downloaded by guest on September 28, 2021