<<

Insect Molecular Biology

Insect Molecular Biology (2011) 20(1), 15–27 doi: 10.1111/j.1365-2583.2010.01046.x

A novel class of miniature transposable elements (MITEs) that contain hitchhiking (GTCY)n

B. S. Coates*, J. A. Kroemer*, D. V. Sumerford*† and Gutman, 1987). PCR-based molecular genetic markers R. L. Hellmich*† are often designed to assay for fragment-size variation *USDA-ARS, Corn Insects & Crop Research amongst alleles at loci, and rely on unique Unit, Genetics Laboratory; and †Department of DNA sequence flanking independent tandem repeats to Entomology, Iowa State University, Ames, IA, USA attain specificity (Tautz, 1989; Weber & May, 1989). In contrast, instances occur when PCR amplification prod- ucts are weak or contain unintended background frag-

Abstractimb_1046 15..27 ments derived from >1 locus (Economou et al., 1990). The movement of miniature inverted repeat transpos- are difficult to determine correctly from these able elements (MITEs) modifies structure and problematic loci, and analysis of data often result in devia- function. We describe the microsatellite-associated tions from Hardy–Weinberg equilibrium within natural interspersed nuclear element 2 (MINE-2), that inte- populations and Mendelian expectations within pedigrees grates at consensus WTTTT target sites, creates (Terro et al., 2006). Analogous difficulties are often dinucleotide TT target site duplications (TSDs), and encountered when microsatellite markers are developed forms predicted MITE-like secondary structures; a 5Ј for of Lepidoptera (Anthony et al., 2001; Meglécz subterminal inverted repeat (SIR; AGGGTTCCGTAG) et al., 2004), and may in part be affected by the movement that is partially complementary to a 5Ј inverted repeat of transposable elements (TEs; Coates et al., 2009, (IR; ACGAAGCCCT) and 3Ј-SIRs (TTACGGAACCCT). 2010).

A (GTCY)n microsatellite is hitchhiking downstream of Transposable elements are portions of genome DNA conserved 5Ј MINE-2 secondary structures, causing capable of ‘jumping’ amongst loci, and can affect microsat- flanking sequence similarity amongst mobile micro- ellites by inserting in proximity or by carrying hitchhiking satellite loci. Transfection of insect cell lines indicates microsatellites within the TE itself (Chen & Li, 2007; Yang & that MITE-like secondary structures are sufficient to Barbash, 2008; Coates et al., 2009, 2010). Miniature mediate genome integration, and provides insight into inverted-repeat TEs (MITEs) are class II TEs that use a the transposition mechanism used by MINE-2s. ‘cut-and-paste’ mechanism to move a DNA segment amongst loci, and are nonreplicative as a result of the Keywords: transposons, repetitive element, PCR excision of the entire MITE and subsequent reinsertion at competition. different genome locations (Craig, 1995). MITEs have characteristic hairpin-like secondary structures that include Introduction terminal inverted repeats (TIRs; Wessler et al., 1995) or Microsatellites are composed of short units subterminal inverted repeats (SIRs) that may be required repeated in tandem, and can show allelic size variation for recognition by enzymes that facilitate their mobility (Tu because of the addition or of repeat units by slip & Orphanidis, 2001). As a result of a lack of protein coding strand mispairing during DNA replication (Levinson & sequence, MITEs are small in size (100 to 600 bp) and are referred to as non-autonomous TEs because movement depends upon the function of enzymes encoded within First published online 24 October 2010. ancestral autonomous TEs (MacRae & Clegg, 1992). Thus, Correspondence: Brad S. Coates, USDA-ARS, Corn Insects & Crop MITEs are hypothesized to result from deletions within Genetics Research, 113 Genetics Lab, Iowa State University, Ames, IA 50010, USA. Tel.: + 1 515 294 0668; fax: + 1 515 294 2265; e-mail: ancestral autonomous TEs or fortuitous proximity of juxta- [email protected] posed TIRs within the genome (Tsubota & Huong, 1991;

© 2010 The Authors Insect Molecular Biology © 2010 The Royal Entomological Society 15 16 B. S. Coates et al.

MacRae & Clegg, 1992). These autonomous TE-encoded integration of MINE-2 elements into insect cell lines, which enzymes are dual-function, and act as both an endonu- suggests that sequence within these regions is conserved clease and integrase during excision (‘cut’) and integration as a result of the requirement for active transposition. (‘paste’), respectively. The integrase functions in site- specific recognition ofa4to5bpgenome sequence that is Results and discussion followed by endonuclease-mediated cleavage that leaves a pair of nucleotide overhangs at the site of integration, and Annotation of MITE-like sequence and secondary is analogous to type II restriction enzymes commonly used structural features in molecular biology applications (Roberts, 2005). Follow- Although coding regions are important for cellular ing of the MITE into the cut site, the nucleotide functions, the movement of TEs within a genome is known overhangs are filled in by DNA polymerase, which results in to cause chromosomal changes and alterations in gene sequence duplication flanking the TE, and are referred to expression (Kidwell & Lisch, 2001; Eichler & Sankoff, as the target site duplications (TSDs). Excision is essen- 2003; Kazazian, 2004; Feschotte & Pritham, 2007; tially integration in reverse, and results in the ‘cutting’ of the Feschotte, 2008). Knowledge of the type and number of MITE from a genome location for eventual reinsertion into TEs in a genome is fundamental to the study of changes another genome location. As movement is nonreplicative in genome structure, function and adaptation, and is a the entire MITE ‘jumps’ from an existing integration posi- critical initial step in order to define future comparative tion, but indications of past TE inserts remain because of studies. A 555 bp sequence was previously described retention of the TSD (Dawid & Rebbert, 1981). within an of the Helicoverpa zea cyp321A2 gene Nonreplicative ‘cut-and-paste’ movement may pose a (GenBank accession no. DQ788841 in Fig. 1) and was quandary as to how MITEs increase copy number within a annotated as a retroelement-like short interspersed genome. This propagation is shown to occur indirectly nuclear element (SINE; Chen & Li, 2007). This annotation through DNA gap repair and DNA replication events. In the was based upon the observation of AAAAA sequences 5′ first instance, MITE excision from one homologous chro- and 3′ of the integrated TE, which were assumed to rep- mosome pair in a diploid organism can invoke the DNA resent target site duplications (TSDs) similar to those gap repair mechanism, causing restoration of the MITE generated by Bos taurus (Szemraj et al., 1995) and Gallus when present upon the . Simi- gallus retroelements (Salva & Birch, 1989). This TE was larly, DNA gap repair acts upon a sister chromatid when subsequently referred to as HzSINE1 (Chen & Li, 2007). MITE integration occurs at a novel genome SINEs are class I non- (non-LTR) TEs location, where the newly integrated TE is used as tem- that propagate or make copies of themselves by a repli- plate for repair (Engels et al., 1990). A second method cative mechanism that uses an RNA intermediate tran- takes place during DNA replication, when a MITE excises scribed from the SINE itself (Weiner, 2002). The 5′ regions from the newly replicated strand with subsequent of SINEs retain homology to tRNA, 5S rRNA or 7SL RNA- re-integration at novel positions within the parental strand like from which they were derived (Okada, 1991; leading to a net MITE copy number gain (Ros & Kunze, 2001). These methods of MITE propagation have contrib- uted to the proliferation of hitchhiking microsatellites within of the insect species Ostrinia nubilalis (Coates et al., 2009), and Drosophila sp. (Locke et al., 1999; Vivas et al., 1999; Miller et al., 2000; Wilder & Hollocher, 2001; Yang et al., 2006; Yang & Barbash, 2008). Despite initial descriptions, the extent to which propagation of MITE-like elements have contributed to the proliferation of microsat- ellite loci within the genomes of Lepidoptera remain largely unknown. To partially address this question, we herein describe a novel group of MITEs called the microsatellite-associated interspersed nuclear element family 2 (MINE-2), that contain an internal hitchhiking Figure 1. Amplification and expression of Helicoverpa zea short (GTCY)n microsatellite. Evidence suggests that MINE-2 interspersed nuclear element 1 [HzSINE1; H. zea microsatellite- elements are conserved within the genomes of Lepi- associated interspersed nuclear element 2 (HzMINE-2)] from (A) doptera, and sequence immediately flanking the (GTCY)n genomic DNA (gDNA) for the 555 bp HzSINE1 (HzMINE-2) and microsatellite is highly conserved as a result of involve- ~1250 bp ribosomal protein small subunit 13 (rpS13) gene products. (B) mRNA-enriched and mRNA subtracted pools from larvae under ment in the formation of TIR and IR structures. These normal growth conditions (n; 27 °C) and heat shock conditions secondary structures are shown to facilitate the stable (h; 48 °C for 30 min).

© 2010 The Authors Insect Molecular Biology © 2010 The Royal Entomological Society, 20, 15–27 Lepidopteran mobile microsatellites 17

Frenkel et al., 2003), and the region includes an RNA Lack of an RNA pol III promoter. The of polymerase III (pol III) promoter that is necessary for syn- an RNA intermediate is critical for propagation of thesis of the RNA intermediate. Transcription terminates retroelement-like SINEs within a genome (Weiner, 2002). at a thymide-rich motif at the 3′ boundary of the SINE and A ~550 bp fragment representing the full-length HzSINE1 immediately upstream of the poly(adenosine) tail, and pro- which was amplified by PCR from H. zea genomic DNA vides a clear demarcation of the SINE boundary (Daniels (Fig. 1A), was not correspondingly detected by RT-PCR & Deininger, 1985). The 3′ region of a SINE usually con- that used cDNA template prepared from either mRNA- tains sequence that is similar to the terminal regions of the enriched or mRNA-subtracted pools collected from whole long interspersed nuclear elements (LINEs) group of class fourth H. zea instars under normal and heat stressed I TEs. The genome integration of the SINE RNA interme- conditions (mRNA-subtracted pool = total RNA less diate is dependent upon recognition of the 3′ sequence by mRNA; Fig. 1B). RT-PCR evidence indicated that the LINE-encoded endonuclease (EN)/reverse transcriptase HzSINE1 transcript was not present among nonpolyade- (RT) that carries out target site cleavage and conversion nylated RNA, from RT-PCR evidence indicated that the of the SINE transcript into DNA at the integration site nonpolyadenylated pools do not contain HzSINE1 or (Eickbush, 1992; Luan et al., 1993; Feng et al., 1996; rpS13 transcripts (Fig. 1B). HzSINE1 appears to be Okada et al., 1997). As a result of mobility that is based expressed within polyadenylated mRNA fractions at levels upon the hijacking of enzymatic machinery of other retro- lower than the rps13 control. Lower transcription levels elements, SINEs are also referred to as non-autonomous may be a result of their integration within transcripts driven elements where their continued transposition within a by RNA pol II promoters as was observed in the initial genome depends upon maintenance of the related description (Chen & Li, 2007). autonomous LINEs. RT-PCR also showed that transcription is not up- Cross-species comparison is often used to predict and regulated in response to heat shock treatment, which classify TEs based upon sequence and structural typically resulted in ~50-fold increases in transcription homologies (Yang & Barbash, 2008; Coates et al., 2009, amongst SINEs previously described by Jeang and Latch- 2010). For example, the Bombyx mori TE Bm1 is a man (1992). Additionally, computational predictions indi- tRNA-derived SINE that was shown to have ~66% cated that no recognizable RNA pol III promoter elements sequence homology to Drosophila tRNA-like SINEs are located within the HzSINE1 sequence, nor was homol- (Wilson et al., 1988), and further evidence for this Bm1 ogy shown to any tRNA, 5S rRNA or 7SL RNA gene from annotation came from identification of a RNA pol III pro- B. mori or D. melanogaster (BLASTN results not shown). moter and transcripts within tissues from fifth instar B. Lack of promoter or sequence homologies provides evi- mori. Initial TE classifications are often revised pending dence that HzSINE1 is probably not a derivative of a acquisition of new comparative or empirical evidence, nuclear gene typical of all known eukaryotic SINEs, which and this situation is highlighted in the case of Drosophila may result in the absence of transcription under normal or DINE-1 elements. DINE-1s were first described as a cell stress conditions. Furthermore, the absence of RNA SINE-like TE based upon small size, non-autonomous intermediates suggests that either HzSINE1 is a nonfunc- nature and high frequency in the Drosophila melano- tional version of an ancestral mobile SINE, or that its gaster genome (Locke et al., 1999). Structural evidence propagation does not occur by reverse transcription. from other Drosophila species provided comparative evidence for the classification of DINE-1s as non- Integration creates a target site duplication. MITEs autonomous MITE-like TEs (Vivas et al., 1999; Miller propagate by ‘cut-and-paste’ movement of the DNA region et al., 2000; Yang et al., 2006); where this evidence (see Introduction), and all described MITEs from insect included conserved TIRs and short dinucleotide TSDs genomes tend to create short A + T-biased TSDs during similar to MITEs in the Zea mays genome (Bureau & integration. For example, Aedes aegypti Wukong, Jujin Wessler, 1992), and lack of an RNA pol III promoter. and Weneng MITEs are flanked by tetranucleotide TAYA More recently, Kapitonov & Jurka (2007) proposed the TSDs (Y = C or T), A. aegypti Nemo1 and Culex pipiens reclassification of DINE-1s as a -like TE based mimo elements, respectively, generate CA and TA TSDs upon the contention that the TEs do not create a TSD (Tu, 1997; Feschotte & Mouches, 2000), and DINE-1 upon genome integration. The re-annotation of initial TE elements from Drosophila create TT dinucleotide TSDs descriptions is not atypical (Sabot et al., 2005). Here, we during MITE-like propagation (Yang & Barbash, 2008). A present three lines of evidence suggesting that HzSINE1 comparison of a H. zea cyp321a2 gene coding sequence is a MITE-like TE that shares features with Drosophila with a HzSINE1 integration (DQ788841) and another DINE-1s, which suggests evolution from a common without integration (AY113689) allowed us to characterize ancestral TE or convergent evolution because of a the genome target site and nucleotide changes that prob- common mechanism used for transposition. ably occur during insertion of HzSINE1. The

© 2010 The Authors Insect Molecular Biology © 2010 The Royal Entomological Society, 20, 15–27 18 B. S. Coates et al.

Figure 2. Model for Helicoverpa zea microsatellite-associated interspersed nuclear element 2 (HzMINE-2) integration within the H. zea cytochrome P450 cyp321a2 gene coding sequence (GenBank accession no. DQ788841) by comparison to a H. zea cyp321a2 allele sequence lacking HzMINE-2 (GenBank accession no. AY113689). The putative endonuclease recognition site is enclosed in a box in A), and TT target site duplications (TSD) are indicated by asterisks (*) in panel D). See text for detailed description of the proposed integration mechanism. has adjacent glutamine and lysine codons that form at TTTTT nucleotide sequences on the cyp321a2 noncoding strand (Fig. 2A), which curiously fits the degenerate WTTTT recognition site at which Drosophila DINE-1s inte- grate (W = an A or T nucleotide; Yang & Barbash, 2008). The model for MITE-based integration into the cyp321a1 gene involves cleavage of the TTTTT recognition site by an autonomous TE-encoded endonuclease, which creates dinucleotide overhangs (Fig. 2B), between which the MITE is inserted (Fig. 2C). The overhangs are subse- quently filled in by DNA polymerase during chromosomal gap repair, which effectively results in the duplication of the flanking the point of integration (Fig. 2D). This proposed model for HzSINE1 integration into the H. zea cyp321a1 coding sequence is consistent with mecha- nisms used by MITE-like elements (Tu, 1997; Feschotte & Mouches, 2000), and is an explanation for the presence of TTTTT duplications flanking the HzSINE1 insertion that also takes into account nucleotides present within the gene coding sequence prior to integration. Furthermore, our model for HzSINE1 integration as a MITE is analo- gous to that described for Drosophila DINE-1s, in that integration occurs at a WTTTT endonuclease recognition and cleavage site, and results in the creation of TT dinucleotide TSDs (Yang & Barbash, 2008).

Structural homology to MITEs. MITEs have secondary structures composed of TIRs or SIRs (see Introduction). Predictions from the program MFOLD (Zuker, 2003) indicated that the HzSINE1 sequence forms a stable MITE-like secondary structure [Gibbs free energy (DG) = –93.08 kcal/mol; Fig. 3]. This intramolecular base pairing within HzSINE1 includes features that may be necessary for transposition as a MITE (Dufresne et al., 2006), shows structural homology with those of MITE-like DINE-1s from Drosophila (Yang & Barbash, 2008; Fig. 4A) and includes a12bp5′-SIR (AGGGTTCCGTAG) that base pairs with a Figure 3. Predicted miniature inverted repeat ′ (MITE)-like secondary structure of the Helicoverpa zea reverse complementary 3 -SIR (TTACGGAACCCT; DG microsatellite-associated interspersed nuclear element 2 (HzMINE-2) change of –18.72 kcal/mol; Fig. 4B) and a 9 bp stem-loop sequence from positions 3474 to 2919 in GenBank accession no. structure downstream of the 3′-SIR. An alternative sec- DQ788841. Stability of the structure is indicated by a total Gibbs free ′ ′ energy (DG) estimation of –93.08 kcal/mol, in which a 5 subterminal ondary structure was predicted to form between the 5 -SIR inverted repeat (5′-SIR) base pairs with a complementary 3′ subterminal witha5′-IR (ACGGAACCCT), but involves only the first 10 inverted repeat (3′-SIR) in addition to a 3′ stem-loop structure.

© 2010 The Authors Insect Molecular Biology © 2010 The Royal Entomological Society, 20, 15–27 Lepidopteran mobile microsatellites 19

A) B)

C)

Figure 4. Secondary structures conserved between Drosophila interspersed nuclear elements (DINE-1s) and microsatellite interspersed nuclear element (MINE-2) from Lepidoptera. (A) A schematic showing the positions of 5′ subterminal inverted repeat (5′-SIR) that is complementary 5′ inverted repeat (5′-IR) and a 3′ subterminal inverted repeat (3′-SIR), a (GTCY)n microsatellite, and a 3′ stem-loop hairpin structures. [Numbers between brackets (|) indicate the base pairs involved in secondary structure formation]. (B) Alternate secondary structures involving the 5′-SIR, and (C) alignment of 5′ and 3′ terminal regions from full-length MINE-2 elements from Lepidoptera. Structural features are as indicated in Fig. 2A and boxed nucleotides represent the proposed target site duplication (see Fig. 3). nucleotides of the 5′-SIR and shows lower stability com- cate that structural homology is present between pared to base pairing with the 3′-SIR (–15.58 kcal/mol; HzSINE1 and the DINE-1 group of MITE-like elements Fig. 4B). Similarities with DINE-1s are also present within from Drosophila, and represent new structural and com- the primary structure (DNA sequence) of HzSINE1, with parative evidence suggesting that HzSINE1 should be these similarities including a common (GTCY)n microsat- reclassified as a MITE. Additionally, because of the pres- ellite repeat located downstream of the 5′-IR, and 5′- and ence of a hitchhiking (GTCY)n microsatellite repeat within 3′-boundaries, respectively, composed of TA and GT HzSINE1 (Fig. 4C), we named this MITE-like sequence nucleotides (Fig. 4C, Y = C or T nucleotide). the H. zea microsatellite-associated interspersed nuclear The results of the above analyses suggest that element 2 (HzMINE-2), which conforms to the convention HzSINE1 is not transcribed in H. zea, and may be due to started for the lepidopteran MINE-1 element (Coates absence of an RNA pol III promoter necessary for the et al., 2010). In the following sections we show that propagation of SINE-like TEs. In contrast, our data indi- HzMINE-2-related TEs are present within other lepi-

© 2010 The Authors Insect Molecular Biology © 2010 The Royal Entomological Society, 20, 15–27 20 B. S. Coates et al. dopteran genomes, and are capable of active transposi- structure-based predications has increased the robust- tion that leads to modification of genome structure in ness of TE predictions (Bergman & Quesneville, 2007). Lepidoptera. For instance, DINE-1s were predicted within 12 Droso- phila genomes using homology to a short but highly conserved sequence inclusive of the 5′-SIR, in which Annotation of lepidopteran MINE-2 elements secondary structures were subsequently confirmed Structural homology between HzMINE-2 and the Droso- (Wilder & Hollocher, 2001; Yang & Barbash, 2008). Simi- phila DINE-1s (Fig. 4A) suggests that these features may larly, a short sequence containing a conserved 5′-SIR was be shared amongst MINE-2s within the lepidopteran used to predict Helitron-like TEs amongst species of lineage and could be annotated within other genomes. Lepidoptera (Coates et al., 2010), and CACTA TEs from Copies of a TE within the genome of a species or different Triticeae plants (cereal crop species; Wicker et al., 2003). species often show sequence variation, which imposes Results from a BLASTN query of National Center for challenges for sequence prediction. In contrast to gene Biotechnology Information (NCBI) nonrestricted (nr) data- coding sequences, which tend to be highly conserved bases with HzMINE-2 positions 1 to 90 (inclusive of the within and amongst species because of functional con- 5′-SIR and downstream 5′-IR) showed that 67 accessions straints, a high level of divergence often occurs amongst from 17 species of Lepidoptera share Ն74.2% similarity copies of a TE (Lerat et al., 2003). This rapid sequence over Ն74 bp to the queries HzMINE-2 sequence (Sup- evolution is observed as the accumulation of point muta- porting Information Table S1). An identical BLASTN search tions, sequence rearrangements, insertion and deletion of B. mori genome contigs obtained from NCBI for strains or interruptions caused by the integration of Dazao and p50T resulted in identification of seven and other TEs (San Miguel et al., 1996). Integration of novel 12 BmMINE-2 integrations, respectively (Table 1). From regions into MITEs is facilitated by capture of genomic these, full-length MINE-2 sequences were predicted from DNA sequence following the double strand break and gap eight annotated NCBI nr database gene sequences repair that occurs during propagation (Gorbunova & Levy, [two H. zea accessions (HzMINE-2; DQ788841 and 1997). This sequence variation appears to be tolerated EF462994), a Heliothis virescens chemosensory receptor because the termini and their secondary structures are 10 transcript (HvMINE-2; AJ748325 positions 1141 to often the only requirement for recognition by enzymatic 1475), the Yponomeuta cagnagellus pheromone binding machinery and transposition (Craig et al., 2002). As a protein gene (YcMINE-2; AF177661 positions 1321 to consequence, sole use of homology-based methods for 1735), Lymantria dispar vitellogenin (LdMINE-2; U90756 TE identification tends to be insufficient across wide positions 2015 to 2313), Mamestra brassicae heat shock species boundaries in Lepidoptera (Coates et al., 2010) transcription factor (MbMINE-2; AB354712 positions as well as other species (Terro et al., 2006; Yang & 4614 to 5160), the Choristoneura fumiferana ecdysone Barbash, 2008). Despite these difficulties, the inclusion of receptor-A gene (CfMINE-2; AF092030 positions 3493 to

Table 1. Microsatellite-associated interspersed nuclear element family 2 (MINE-2) sequences predicted within Bombyx mori contig assemblies from strains Dazao and p50T. Explanation of sequence features are given in the legend of Fig. 3

Accession no. Start End 5′-SIR 5′-IR Msat 3′-SIR

BABH01010022 10183 9691 GGTTCCGTA ACGGAACCC (GTCY)9 TACGAAGCC

BABH01027278 881 1542 GGTTCCGTA ACGGAACCC (GTCY)7 TACGAAGTC BAAB01087666 865 1450 GGTTCCGTA ACGGAACCC (GTCY)7 TACGAAGTC AADK01028266 1578 933 GGTTCCGTA ATTGAACCC (GTCY)8 TACGAAGCC AADK01011588 9655 10147 GGTTCCGTA ACGGAACCC (GTCY)8 TACGAAGCC AADK01028297 912 1573 GGTTCCGTA ACGGAACCC (GTCY)7 TACGAAGTC BAAB01212543 1762 1116 GGTTCCGTA ATTGAACCC (GTCY)8 TACGAAGCC BABH01040647 6140 6786 GGTTCCGTA ATTGAACCC (GTCY)8 TACGAAGCC AADK01014598 2218 2864 GGTTCCGTA ACGGAACCC (GTCC)4 TACGAAACC AADK01040120 258 909 GGTTCCGTA ACGGAACCC (GTCC)4 TACGAAACC BAAB01194571 588 1240 GGTTCCGTA ACGGAACCC (GTCC)4 TACGAAACC BABH01004434 18735 18083 GGTTTCGTA ACGGAACCC (GTCY)6 TACGGAGCC BAAB01076824 1043 391 GGTTCCGTA ACGGAACCC (GTCY)6 TACGGAGCC BABH01013366 8230 8888 GGAACCTTCCGTA ACGGAACCC (GTCY)8 TACGAAGCC AADK01010489 2624 1966 GGAACCTTCCGTA ACGGAACCC (GTCY)8 TACGAAGCC BAAB01095520 6486 5828 GGAACCTTCCGTA ACGGAACCC (GTCY)8 TACGAAGCC

BABH01030861 1530 891 GGAACCTTCCGTA ACGGAACCC (GTCY)11 TACGAAGCC AADK01010552 6269 6938 GGAACCTTCCGTA ACGGAACCC (GTCY)11 TACGAAGCC

BAAB01044298 1183 1852 GGAACCTTCCGTA ACGGAACCC (GTCY)11 TACGAAGCC

IR, inverted repeat; SIR, subterminal IR.

© 2010 The Authors Insect Molecular Biology © 2010 The Royal Entomological Society, 20, 15–27 Lepidopteran mobile microsatellites 21

Figure 5. Alignment conserved 5′ and 3′ regions from microsatellite interspersed nuclear element 2 (MINE-2) family members from contigs assembled from the Bombyx mori whole genome sequence (BmMINE-2; GenBank contig accession nos. listed). Miniature inverted-repeat transposable element (MITE)-like structural features are indicated in the schematic and alignments as described in Fig. 1.

4002) and within the Plutella xylostella g-aminobutyric acid tions predicted for BmMINE-2s. In all 19 instances, the receptor (Rdl-1) gene (PxMINE-2; FJ668709 positions 3′-termini of BmMINE-2s consisted of a GTTC immedi- 277 to 938)], and also for all 19 BmMINE-2s. Similarly to ately upstream of a TTTT sequence (Fig. 5), conforming to HzMINE-2, other MINE-2s from Lepidoptera have con- the integration model that results in a TT TSD (see corre- served 5′-SIR regions (Figs 4, 5), in which seven to 13 sponding boxed nucleotides in Figure 5) flanked by the TT nucleotides are predicted to with the 3′-SIR remnant of the WTTTT endonuclease recognition site (Figs 4C, 5; DG = –8.0 to –24.0 kcal/mol; structures not shown to occur during HzMINE-2 integration. Variation at shown). All MINE-2s are also likely to form a stem-loop 3′-termini of MITEs may not be uncommon as Drosophila structure immediately upstream of the 3′ terminus, which DINE-1s have TTGT, TCGT, TGGT and TCAT 3′-terminal suggests that this secondary structure may be required for nucleotide sequences (Yang & Barbash, 2008). Analo- enzymatic recognition and subsequent mobility (Galagan gous variation was also shown at the 3′-terminus of et al., 2005). In contrast, mutation in the 5′-IR sequence MINE-1 Helitrons; the silkmoth GTAA differed from the was observed in representative full-length MbMINE-2 and CTAA sequence for all other Lepidoptera. This compara- CfMINE-2 TEs, which reduces the stability of base pairing tive evidence indicates that absolute conservation of the in the 5′-SIR to –2.4 and –0.8 kcal/mol, respectively, and 3′-terminus of MINE-2s may not be required for precise suggests that this structure may not be conserved excision, or that there has been co-evolution between amongst MINE-2s. variant terminal motifs and corresponding enzymatic Nucleotides located at the boundaries of a TE (5′- and machinery within the genome of each species (Weiner 3′-termini) are involved in integration and excision reac- et al., 1986). tions, and are often highly conserved amongst related In contrast to the relatively high conservation of MINE-2 TEs (Kapitonov & Jurka, 2001). Alignment of full-length sequences within 5′- and 3′- regions that form MITE-like MINE-2 integrations provided homology-based evidence secondary structures, the central region shows evidence that TA and CRGT nucleotides are conserved at the 5′- of substitution, block deletions and insertion between and 3′-termini (Fig. 4C, R = A or G nucleotide), with excep- species (Fig. 4A). Corresponding variation within a

© 2010 The Authors Insect Molecular Biology © 2010 The Royal Entomological Society, 20, 15–27 22 B. S. Coates et al.

Figure 6. Schematic of synthetic constructs used in transfection of Spodoptera frugiperda Sf9 and Trichoplusia ni HighFive cells, and results after a 16-day selection with 200 mg/ml hygromycin B. (A) Transfection with transient reporter constructs showed no survival in either cell line, whereas (B) the identical reporter constructs inserted into a multicloning site (MCS) flanked by 5′ subterminal inverted repeat (5′-SIR), 5′-IR, 3′-SIR, and 3′ stem-loop sequences derived from the Helicoverpa zea microsatellite-associated interspersed nuclear element 2 (HzMINE-2) conferred survival to transfected Sf9 and HighFive cells. The ie1-YFP-PA construct was used as a control that conferred no resistance to Hygromycin B. See Fig. 4 for description of symbols used to indicate MINE-2 secondary structures and Experimental procedures for details on constructs. species is present amongst BmMINE-2s. In this group, in required for continued propagation within a genome addition to in the (GTCY)n microsatellite (Craig et al., 2002; Galagan et al., 2005). The secondary (mean = 7.8 Ϯ 1.6 repeats per integration; Fig. 5), an structures formed at the 5′- and 3′-termini of MINE-2s are imperfect array of AWCTTT repeats and a 174 bp block also highly conserved amongst species of Lepidoptera insertion/deletion results in BmMINE-2 lengths of 467 to despite several million years since speciation events 668 bp (Supporting Information Figure S1). Analogous (Figs 4, 5), suggesting that the sequences are selectively sequence variation was described in a region of the A + retained or are represented in higher proportions within T-rich region of Drosophila DINE-1 elements, genomes because of a critical role played in transposition and also suggests that this central variable region of of the MITE. Our results showed that Sf9 and HighFive MINE-2 and other MITE-like elements may not serve a cells remained viable and continued to grow in media with functional role in mobility (Locke et al., 1999; Vivas et al., 200 mg/ml hygromycinB3weeks post-transfection with 1999). Similarly, we show in the next section that insertion the pHzMINE-2_Hygro, YFP/Hygro and Hygro/YFP con- of different length reporter genes into these hypervariable structs, compared to a lack of growth following the same regions of the HzMINE-2 sequence did not inhibit the hygromycin selection of Sf9 cells transfected with tran- ability of synthetic HzMINE-2 elements to integrate into sient pHygro, pYFP/Hygro and pHygro/YFP control con- insect cell lines (Fig. 6) providing further support that structs (Fig. 6). Specifically, these results demonstrated these central regions are non-essential to mobile element that expression of the hygromycin resistance gene from transposition. the control reporters was indeed transient and sustained for <3 weeks in both cell lines (Fig. 6B). In contrast, the hygromycin B resistance gene was expressed and cells The 5Ј- and 3Ј-termini of HzMINE-2 are sufficient to withstood selection for a greater and indefinite duration mediate transposition when transfected with the pHzMINE-2 constructs. Further- Sequences near the termini of TEs tend to be highly more, the presence of HzMINE-2-derived 5′ and 3′ sec- conserved due to the formation of secondary structures ondary structure flanking the reporter genes appears to

© 2010 The Authors Insect Molecular Biology © 2010 The Royal Entomological Society, 20, 15–27 Lepidopteran mobile microsatellites 23

Table 2. Microsatellite-associated interspersed nuclear element family 2 (MINE-2) sequences Accessions MINE-2 integrations predicted within bacterial artificial chromosome full insert sequences from species of Species Number Gb Total Mean/BAC Genome frequency Lepidoptera used to estimate the frequency of integration within each genome sequence Bicyclus anynana 11 1.3 4 0.36 Ϯ 0.92 3.07 ¥ 10-6 sample Bombyx mori 51 8.0 1 0.02 Ϯ 0.14 1.24 ¥ 10-7 Helicoverpa armigera 18 2.0 6 0.44 Ϯ 0.51 3.06 ¥ 10-6 Heliconius melpomene 24 3.1 9 0.38 Ϯ 0.58 2.89 ¥ 10-6 Heliconius numata 15 1.6 12 0.67 Ϯ 0.82 7.54 ¥ 10-6 Spodoptera frugiperda 17 2.0 10 0.41 Ϯ 0.87 1.89 ¥ 10-6

BAC, bacterial artificial chromosome; Gb, gigabase. facilitate the stable integration into the DNA of both Sf9 copy number (Hartl et al., 1992). This phenomenon can and HighFive cells, which thereby leads to sustained lead to the inability of the autonomous TE to effectively expression well beyond that observed during transient propagate and it may eventually be eliminated from a conditions. These results are significant because of the genome by random genetic drift. Additionally, self- indication that terminal secondary structures are sufficient induced repression is known to occur as copy number to mediate genome integration across species bound- thresholds are reached, such that further expansion aries, and suggest that the pHzMINE-2_multicloning site could be held in check (La Rouzic & Capy, 2006; Hartl (pHzMINE-2_MCS) construct can be used for stable et al., 1997). In conjunction with stochastic events, these transformation of insect cells and may represent a novel mechanisms affect TE expansion such that MITE copy system for use in insect cell tissue culture. number differences are observed within and amongst species. Indeed, 334 to 5424 DINE-1 integrations are present amongst 12 Drosophila genome sequences MINE-2 between species (Yang & Barbash, 2008), and analogously were shown Transfection experiments show that HzMINE-2s can inte- amongst TEs within non-insect genomes (Frenkel et al., grate into a genome if the transpositional machinery is 2003; Feuk et al., 2006). Copy number variation can present that recognizes similar MITE-like SIRs or TIRs. lead to significant differences in chromosome structure HzMINE-2s may therefore be descendents of a common and function (Sebat, 2007), and may actually result in ancestral autonomous TE and/or may be using a similar changes that facilitate speciation (Rose & Doolittle, mechanism for propagation because of convergent evo- 1983). lution. Furthermore, HzMINE-2s appear to have been retained within the lepidopteran insect lineage. BLASTN searches of the NCBI nr database resulted in 49 hits to Experimental procedures bacterial artificial chromosome full insert sequences from Annotation of a MITE-like mobile element from Lepidoptera different species of Lepidoptera (Table 2). By treating these sequences as a random and representative The HzSINE1 sequence was identified from positions 3473 to 3392 of GenBank accession no. DQ78884 and RNA polymerase sample of each species’ genome, we were able to III promoter motifs predicted using the program POLIIISCAN estimate a ~60-fold difference in MINE-2 integration (Naykova et al., 2003). The B. mori tRNAs, 5S rRNA or 7SL 7 frequency amongst genomes (range 1.24 ¥ 10- to 7.54 RNA-like genes were downloaded from GenBank (Supporting ¥ 10-6; Table 2). This estimate is greater than the ~16- Information File S1) and loaded into a local database using fold range observed for DINE-1 integrations across com- BIOEDIT (Hall, 1999), and homology to the HzSINE1 determined -5 plete Drosophila sp. genome sequences (334 to by a BLASTN search (cut-offs: Ն50% similarity, E-values Յ 10 ). 5424; Yang & Barbash, 2008) and may be a crude esti- Extraction of total RNA and mRNA enrichments from whole fourth instar H. zea, and first strand cDNA synthesis was conducted as mate because of possible selection for gene-rich regions described by Coates et al. (2008). PCR of cDNA template used within BAC inserts, such at the sampling completed primers pairs HzMINE-2f1 (5′-TTT TAG GGT TCC GTA GCC AAA d’Alençon et al. (2010), but are legitimate estimates in ATG AC-3′) with HzMINE-2r2 (5′-AAA AAC CGG CCA AGT GCG) lieu of full genome sequences for all species. ina20ml reaction that also included 1.5 mM MgCl2,50mM The retention of a non-autonomous TE in a genome is dNTPs, 2.0 pmol of each primer, 4 ml5¥ thermal polymerase a paradox when copy numbers increase several hundred buffer (Promega, Madison, WI, USA), and 0.3125 U Taq DNA fold beyond that of the autonomous TE, at which point it polymerase (Promega). Reactions were placed on an Eppendorf Mastercycler (Eppendorf, Hamburg, Germany), denatured for is hypothesized that endonuclease/integrase enzyme 2 min at 94 °C, cycled 30 times at 94 °C for 30 s, 60 °C for 30 s titres may become insufficient for continued propagation and 72 °C for 1 min, and then incubated at 72 °C for 10 min. The of TEs within a family, and is viewed to be a major products were visualized after separating the entire 20 ml PCR hurdle to overcome in order for MITEs to increase in reaction on 1.5% agarose. Secondary structures of HzSINE1

© 2010 The Authors Insect Molecular Biology © 2010 The Royal Entomological Society, 20, 15–27 24 B. S. Coates et al. were predicted by using the program MFOLD (Zuker, 2003), and 2rRE [5′-ACC GGT GAG CTC GGT ACC GGA TCC GAA TTC created using the partial function and pair probabilities algorithm GTC GAC AGA TCT TTT CCG AGT AAA TCG GCT GT-3′; and DNA parameters at 25 °C. Structural features described for engineered MCS composed of AgeI, SacI, KpnI, BamHI, EcoRI, DINE-1s by Yang & Barbash (2008; 5′-SIR, 5′-IR, GTCY repeat SalI and BglII sites are in bold] and primers HzMINE-2f2RE unit microsatellite, 3′-SIR and 3′ stem-loop structures) were iden- (5′-AGA TCT GTC GAC GAA TTC GGA TCC GGT ACC GAG tified via manual evaluation of the MFOLD output. Target site CTC ACC GGT TTT CAA CGA TTT TTG GTT TGA-3′) with specificity and creation of TSDs by HzSINE1 were determined by HzMINE-2r2m (5′-TTT ACG CGT AAA AAC CGG CCA AGT comparing DNA sequence of the H. zea P450 gene with an GCG; engineered MluI overhang in bold). PCR reactions were integrated TE (GenBank accession no. DQ788841) with a H. zea carried out as described previously, except that 0.3125 U Tli P450 gene that lack the integrated TE (GenBank accession no. proofreading DNA polymerase was used (Promega). A total of AY113689). Sequences were imported into the MEGA 4.0 software 5 ml of each PCR product was visualized on 1.5% agarose, then package alignment application in FASTA format (Tamura et al., purified with the Qiagen PCR Quickspin column procedure 2007), and aligning with the CLUSTALW algorithm (gap opening (Qiagen, Valencia, CA, USA). penalty 15, gap extension penalty 6.66, weight matrix Interna- Equal volumes of the two PCR amplification products were tional Union of and transition weight of 0.5) with combined, denatured at 96 °C for 10 min, cooled slowly so that manual adjustments made. Derived amino acid sequences were sequences would re-anneal at overlapping MCS regions, then obtained via translation in MEGA 4.0 and coding sequence was incubated at 72 °C for 30 min with the PCR reaction mixture integrated into the nucleotide alignment. The HzMINE-2 integra- described above to fill in ends and generate a product with tion was manually inserted into the alignment, and resulting HzSINE1 ends and a central MCS region. The engineered PCR changes were compared to models of MITE TSDs proposed by product was diluted 1:10, digested with XhoI and MluI restriction Yang & Barbash (2008). endonucleases (New England Biolabs, Beverly, MA, USA) according to the manufacturer’s instructions, and ligated into a compatibly digested and purified 13Xf+ using T4 ligase The MINE-2 family within Lepidoptera (Promega) to create the p13Zf+-HzMINE-2-MCS vehicle for Positions 1 to 90 of the HzSINE1 sequence (GenBank acces- subsequent insertion of reporter sequences. sion no. DQ788841) was used as a BLASTN query against the A pGEM®-13Zf+ plasmid vector (Promega) with the Autographa NCBI nucleotide collection (nr/nt; Current, 10 May 2009), and californica multicapsid nucleopolyhedrovirus (AcMNPV) viral imme- the resulting ‘hit’ table was downloaded in .csv format (‘hit’ cri- diate early 1 (ie1) constitutive insect promoter and Simian Virus 40 teria length Ն100 bp and Ն80% similarity). Contig assemblies polyadenylation (PA) signal flanking yellow fluorescent protein (YFP) for the B. mori strains Dazao (GenBank accession nos. and Hygromycin B resistance gene (HygroBr) were assembled as AADK01000001 to AADK01066482), and p50 (GenBank acces- previously described by Kroemer & Webb (2006). Full length ie1- sion nos. BAAB000001 to BAAB01213289) were downloaded in YFP-PAand ie1-HygroBr reporter genes were amplified by PCR from FASTA format, and a local database were constructed in BioEdit recombinant vectors using gene-specific primers described by (Hall, 1999). A BLASTN search of the B. mori local database Kroemer & Webb (2006), which incorporated flanking restriction used the first 90 bp of the HzSINE1 reverse complement as a enzyme sites allowing directional insertion into p13Zf+-for the query, and ‘hit’ tables were generated as described above. creation of the transient constructs p13Zf+-ie1-YFP-PA, p13Zf+-ie1- GenBank accessions for all hits were retrieved in FASTA format, HygroBr-PA, p13Zf+-ie1-YFP-PA-ie1-HygroBr-PA and p13Zf+-ie1- and trimmed to remove nucleotide +10 of putative HzSINE1 HygroBr-PA-ie1-YFP-PA (Fig. 6). Inserts from transient constructs start sites, and a multiple performed in were PCR amplified with Platinum High Fidelity Taq DNApolymerase MEGA 4.0 (Tamura et al., 2007) as described above. The (Invitrogen, Carlsbad, CA, USA), digested and directionally inserted 3′-termini of aligned sequences were predicted by the combi- into a compatibly digested p13Zf+-HzMINE-2-MCS vector to nation of a homology search of the GenBank nr/nt database generate p13Zf+-HzMINE-2 -ie1-YFP-PA, p13Zf+-HzMINE-2-ie1- using the final 50 nucleotides of HzSINE1 as the query and a HygroBr-PA, p13Zf+-HzMINE-2-ie1-YFP-PA-ie1-HygroBr-PA and search with the MEGA 4.0-generated alignment with a partial p13Zf+-HzMINE-2-ie1-HygroBr-PA-ie1-YFP-PA constructs (Fig. 6). 3′-SIR sequence (CGGAACCCT). Secondary structures were Spodoptera frugiperda ovarian tissue-derived Sf9 (Invitrogen), predicted for all HzSINE1 elements from Lepidoptera using the and Trichoplusia ni ovarian tissue-derived High Five (Invitrogen) program MFOLD (Zuker, 2003) as described above, from which insect cell lines were maintained in Sf900 II serum free medium ′ ′ ′ ′ DINE-1-like 5 -SIR, 5 -IR (GTCY)n microsatellite, 3 -SIR and 3 (SFM) (Invitrogen) or Express 5 SFM (Invitrogen) for Sf9 and stem-loop structures sequence structures were manually HighFive cells, respectively. The Sf9 and High Five cells were annotated. The genome recognition site and TSD were plated at 75–80% confluency (1.6–2.5 ¥ 106 cells/ml) in six-well manually determined for all aligned sequences for which 3′ tissue culture plates. In vitro transfections of all recombinant termini were predicted using MITE integration models as 13Zf+ constructs (Fig. 6) used the Invitrogen Bac-N-Blue trans- described above. fection procedure according to the manufacturer’s instructions. Briefly, 500 ng of recombinant 13Zf+ vector was combined with 1 ml Sf-900 II SFM (Sf9 cells) or 1 ml Express 5 SFM (HighFive Ј Ј The 5 - and 3 -termini of HzMINE-2 are sufficient cells) and 15 mL Cellfectin II reagent (Invitrogen). The mixture to mediate transposition was incubated for 15 min at room temperature and overlaid drop- The HzSINE1 sequence was amplified from H. zea genomic wise onto appropriate cells in individual wells of the six-well DNA by PCR separately using primers HzMINE- plates. Overlaid cells were incubated at room temperature and 2f1x (5′-AAA CTC GAG TTT TAG GGT TCC GTA GCC AAA gently agitated (2 rocking motions/min) for 4 h. Following the 4 h ATG AC-3′; engineered XhoI overhang in bold) with HzMINE- transfection, an additional 1 ml of the appropriate complete

© 2010 The Authors Insect Molecular Biology © 2010 The Royal Entomological Society, 20, 15–27 Lepidopteran mobile microsatellites 25 growth medium was added and cells were incubated at 24 °C for Coates, B.S., Sumerford, D.V., Hellmich, R.L. and Lewis, L.C. an additional 48 h. Cells were then assayed visually for transfec- (2009) Repetitive genomic elements in a European corn borer, tion success via monitoring YFP expression from p13Zf+-ie1- Ostrinia nubilalis, bacterial artificial chromosome library were YFP-PA and p13Zf+-HzMINE-2-ie1-YFP-PA control indicated by bacterial artificial chromosome end sequencing using a Zeiss model 510 inverted fluorescence microscope (Carl and development of sequence tag site markers: implications Zeiss International, Jena, Germany). All wells transfected with for lepidopteran genomic research. Genome 52: 57–67. hygromycin Br cassettes were placed under selection with Coates, B.S., Sumerford, D.V., Hellmich, R.L. and Lewis, L.C. 200 mg/ml hygromycin B (Sigma-Aldrich, St. Louis, MO, USA) for (2010) A Helitron-like transposon superfamily from Lepi-

16 days at 28 °C, and monitored over an additional incubation doptera disrupts (GAAA)n microsatellites and is responsible period of 5 days until complete lysis of all control wells, including for flanking sequence similarity within a microsatellite family. wells transiently transfected with p13Zf+-ie1-HygroBr-PA, was J Mol Evol 70: 278–288. observed. Surviving cells in HzMINE-2 transfected wells were Craig, N.L. (1995) Unity in transposition reactions. Science 270: documented at 20 ¥ magnification with a Nikon digital camera 253–254. (Nikon, Tokyo, Japan) fixed to an inverted Nikon light microscope Craig, N.L., Craigic, R., Gellert, M. and Lambowitz, A.M. (2002) (model E200). Mobile DNA II. American Society of Microbiology Press, Washington, DC. Daniels, G.R. and Deininger, P.L. (1985) Repeat sequence Acknowledgements families derived from mammalian tRNA genes. Nature 317: This research is a joint contribution from the United 819–822. States Department of Agriculture (USDA) Agriculture Dawid, I.B. and Rebbert, M.L. (1981) Nucleotide sequences at the boundaries between gene and insertion regions in the Research Service and the Iowa Agriculture and Home rDNA of Drosophila melanogaster. Nucleic Acids Res 9: 5011– Economics Experiment Station, Ames (project 3543). 5020. This article reports the results of research only. Mention Dufresne, M., Hua-Van, A., El Wahab, H.A., Ben M’Barek, S., of proprietary products does not constitute an endorse- Vasnier, C., Teysset, L. et al. (2006) Transposition of a fungal ment or a recommendation by the USDA, or Iowa State MITE through the action of a Tc1-like transposase. Genetics University for its use. This work was supported by the 175: 441–452. USDA, Agricultural Research Service (ARS), CRIS Economou, E.P., Bergen, A.W., Warren, A.C. and Antonarakis, S.E. (1990) The poly(A) tract of Alu repetitive elements is project 016. polymorphic in the . Proc Natl Acad Sci USA 87: 2951–2954. Conflicts of interest Eichler, E.E. and Sankoff, D. (2003) Structural dynamics of eukaryotic chromosome evolution. Science 301: 793– None declared. 797. Eickbush, T.H. (1992) Transposing without ends: the non-LTR References retrotransposable elements. New Biol 4: 430–440. Engels, W.R., Johnson-Schlitz, D.M., Eggleston, W.B. and Sved, d’Alençon, E., Sezutsu, H., Legeai, F., Permal, E., Bernard- J. (1990) High-frequency loss in Drosophila is Samain, S., Gimenez, S. et al. (2010) Extensive synteny con- homolog dependent. Cell 62: 515–525. servation of holocentric chromosomes in Lepidoptera despite Feng, Q., Moran, J.V., Kazazian, H.H. Jr and Boeke, J.D. high rates of local genome rearrangements. Proc Natl Acad (1996) Human L1 encodes a conserved Sci USA 107: 7680–7685. endonuclease required for retrotransposition. Cell 87: 905– Anthony, N., Gelembiuk, G., Raterman, D., Nice, C. and Ffrench- 916. Constant, R. (2001) Brief report: isolation and characterization Feschotte, C. (2008) Transposable elements and the evolution of of microsatellite markers from the endangered Karner blue regulatory networks. Nat Rev Genet 9: 397–405. butterfly Lycaeides melissa samuelis (Lepidoptera). Hereditas Feschotte, C. and Mouches, C. (2000) Recent amplification of 134: 271–273. miniature inverted-repeat transposable elements in the vector Bergman, C. and Quesneville, H. (2007) Discovering and detect- mosquito Culex pipiens: characterization of the Mimo family. ing transposable elements in genome sequences. Brief Gene 250: 109–116. Bioinform 8: 382–392. Feschotte, C. and Pritham, E.J. (2007) DNA transposons and the Bureau, T.E. and Wessler, S.R. (1992) Tourist: a large family of evolution of eukaryotic genomes. Annu Rev Genet 41: 331– inverted-repeat element frequently associated with maize 368. genes. Plant Cell 4: 1283–1294. Feuk, L., Carson, A.R. and Scherer, S.W. (2006) Structural varia- Chen, S. and Li, X. (2007) Transposable elements are enriched tion in the human genome. Nature Reviews Genetics 7: within or in close proximity to xenobiotic-metabolizing cyto- 85–97. chrome P450 genes. BMC Evol Biol 7: 46. Frenkel, F.E., Chaley, M.B., Korotkov, E.V. and Skryabin, K.G. Coates, B.S., Sumerford, D.V., Hellmich, R.L. and Lewis, L.C. (2003) Evolution of tRNA-like sequences and genome vari- (2008) Mining an Ostrinia nubilalis midgut expressed ability. Gene 335: 57–71. sequence tag (EST) library for candidate genes and single Galagan, J.E., Calvo, S.E. and Cuomo, C. (2005) Sequencing of nucleotide polymorphisms (SNPs). Insect Mol Biol 17: 607– Aspergillus nidulans and comparative analysis with A. fumi- 621. gatus and A. oryzae. Nature 438: 1105–1115.

© 2010 The Authors Insect Molecular Biology © 2010 The Royal Entomological Society, 20, 15–27 26 B. S. Coates et al.

Gorbunova, V. and Levy, A.A. (1997) Non-homologous DNA end changes in the nucleotide sequences of the intragenic pro- joining in plant cells is associated with deletions and filler DNA moter regions of eukaryotic genes for tRNAs of all specifici- insertions. Nucleic Acids Res 25: 4650–4657. ties. J Mol Evol 57: 520–532. Hall, T.A. (1999) BioEdit: a user-friendly biological sequence Okada, N. (1991) SINEs: short interspersed repeated elements of alignment editor and analysis program for Windows 95/98/NT. the eukaryotic genome. Trends Ecol Evol 6: 358–361. Nucl Acids Symp Ser 41: 95–98. Okada, N., Hamada, M., Ogiwara, I. and Ohshima, K. (1997) Hartl, D.L., Lozovskaya, E.R. and Lawrence, J.G. (1992) Nonau- SINEs and LINEs share common 3′-sequences: a review. tonomous transposable elements in and eukary- Gene 205: 229–243. otes. Genetica 86: 47–53. Roberts, R.L. (2005) How restriction enzymes became the work- Hartl, D.L., Lohe, A.R. and Lozovskaya, E.R. (1997) Modern horses of molecular biology. Proc Natl Acad Sci USA 105: thoughts on an ancient marinere: function, evolution, regula- 5905–5908. tion. Ann Rev Genet 31: 337–358. Ros, F. and Kunze, R. (2001) Regulation of activator/dissociation Jeang, K.L. and Latchman, D.S. (1992) The transposition by replication and DNA methylation. Genetics immediate-early protein ICP27 stimulates the transcription of 157: 1723–1733. cellular Alu repeated sequences by increasing the activity of Rose, M.R. and Doolittle, W.F. (1983) Molecular biological transcription factor TFIIIC. Biochem J 284: 667–673. mechanisms of speciation. Science 220: 157–162. Kapitonov, V.V. and Jurka, J. (2001) Rolling-circle transposons in Sabot, F., Guyot, R., Wicker, T., Chantret, N., Laubin, B., Chal- . Proc Natl Acad Sci USA 98: 8714–8719. houb, B., Leroy, P., Sourdille, P. and Bernard, M. (2005) Kapitonov, V.V. and Jurka, J. (2007) Helitrons in fruit flies. Updating of transposable element annotations from large Rephase Reports 7: 271–132. wheat genomic sequences reveals diverse activities and gene Kazazian, H.H., Jr (2004) Mobile elements: drivers of genome associations. Mol Genet 274: 119–130. evolution. Science 303: 1626–1632. Salva, R. and Birch, J.B. (1989) Evidence that chicken CRI ele- Kidwell, M.G. and Lisch, D.R. (2001) Perspective: transposable ments represent a novel family of retrotransposons. Mol Cell elements, parasitic DNA, and genome evolution. Evolution 55: Biol 9: 3562–3566. 1–24. San Miguel, P., Tikhonov, A., Jin, Y.K., Motchoulskaia, N., Kroemer, J.A. and Webb, B.A. (2006) Divergences in protein Zakharov, D., Melake-Berhan, A. et al. (1996) Nested ret- activity and cellular localization within the Campoletis rotransposons in the intergenic regions of the maize genome. sonorensis ichnovirus vankyrin family. J Virol 80: 12219– Science 274: 765–767. 12228. Sebat, J. (2007) Major changes in our DNA lead to major changes La Rouzic, A. and Capy, P. (2006) models in our thinking. Nat Genet 39: S3–S5. of competition between transposable element subfamilies. Szemraj, J., Plucienniczak, G., Jaworski, J. and Plucienniczak, A. Genetics 174: 785–793. (1995) Bovine Alu-like sequences mediated transposition of a Lerat, E., Rizzon, C. and Biémont, C. (2003) Sequence diver- new site-specific retroelement. Gene 152: 261–264. gence within transposable element families in the Drosophila Tamura, K., Dudley, J., Nei, M. and Kumar, S. (2007) MEGA4: melanogaster genome. Genome Res 13: 1889–1896. molecular evolutionary genetic analysis (MEGA) software Levinson, G. and Gutman, G.A. (1987) Slipped strand mispairing: version 4.0. Mol Biol Evol 24: 1596–1599. a major mechanism for DNA sequence evolution. Mol Biol Tautz, D. (1989) Hypervariability of simple sequences as a Evol 4: 203–221. general source for polymorphic DNA markers. Nucleic Acids Locke, J., Howard, L.T., Aippersbach, N., Podemski, L. and Res 17: 6463–6471. Hodgetts, R.B. (1999) The characterization of DINE-1, a short, Terro, N., Neumeier, H., Gudavalli, R. and Schlotterer, C. (2006) interspersed repetitive element present on chromosome and Silene tatarica microsatellites are frequently located in repeti- in the centric heterochromatin of Drosophila melanogaster. tive DNA. J Evol Biol 19: 1612–1619. Chromosoma 108: 356–366. Tsubota, S.I. and Huong, D.V. (1991) Capture of flanking DNA Luan, D., Korman, M.H., Jakubczak, J.L. and Eickbush, T.H. by a P element in Drosophila melanogaster: creation (1993) Reverse transcription of R2Bm RNA is primed by a nick of a transposable element. Proc Natl Acad Sci USA 88: 693– in the chromosomal target site: a mechanism for non-LTR 697. retrotransposition. Cell 72: 595–605. Tu, Z. (1997) Three novel families of miniature inverted-repeat MacRae, A.F. and Clegg, M.T. (1992) Evolution of Ac and transposable elements are associated with genes of the Ds1 elements in select grasses (Poaceae). Gentica 86: yellow fever mosquito, Aedes aegypti. Proc Natl Acad Sci 55–66. USA 94: 7475–7480. Meglécz, E., Petenian, F., Danchin, E., D’Acier, A.C., Rasplus, Tu, Z. and Orphanidis, S.P. (2001) Microuli, a family of miniature J.Y. and Faure, E. (2004) High similarity between flanking subterminal inverted-repeat transposable elements (MSITEs): regions of different microsatellites detected within each of two transposition without terminal inverted repeats. Mol Biol Evol species of Lepidoptera: Parnassius apollo and Euphydryas 18: 893–895. aurinia. Mol Ecol 13: 1693–1700. Vivas, M.V., Garcíia-Planells, J., Ruiz, C., Marfany, G., Paricio, Miller, W.J., Nagel, A., Bachmann, J. and Bachmann, L. (2000) N., Gonzàlez-Duarte, R. et al. (1999) GEM, a cluster of repeti- Evolutionary dynamics of the SGM transposon family in the tive sequences in the Drosophila subobscura genome. Gene Drosophila obscura species group. Mol Biol Evol 17: 1597– 229: 47–57. 1609. Weber, J.L. and May, P.E. (1989) Abundant class of human DNA Naykova, T.M., Kondrakhin, Y.V., Rogozin, I.B., Voevoda, M.I., polymorphisms which can be typed using the polymerase Yudin, N.S. and Romaschenko, A.G. (2003) Concerted chain reaction. Am J Hum Genet 44: 388–396.

© 2010 The Authors Insect Molecular Biology © 2010 The Royal Entomological Society, 20, 15–27 Lepidopteran mobile microsatellites 27

Weiner, A.M. (2002) SINEs and LINEs: the art of biting the hand Zuker, M. (2003) Mfold web server for nucleic acid folding that feeds you. Curr Opin Cell Biol 14: 343–350. and hybridization prediction. Nucleic Acids Res 31: 3406– Weiner, A.M., Deininger, P.L. and Efstratiadis, A. (1986) Nonviral 3415. : genes, , and transposable ele- ments generated by the reverse flow of genetic information. Supporting Information Annu Rev Biochem 55: 631–661. Wessler, S.R., Bureau, T.E. and White, S.E. (1995) Additional Supporting Information may be found in the LTR-retrotransposons and MITEs: important players in online version of this article under the DOI reference: the evolution of plant genomes. Curr Opin Genet Dev 5: 10.1111/j.1365-2583.2010.01046.x 814–521. Wicker, T., Guyot, R., Yahiaoui, N. and Keller, B. (2003) CACTA Table S1. Results of a BLASTn search of the National Center for Biotech- transposons in Triticeae. A diverse family of high-copy repeti- nology Information (NCBI) nonrestricted (nr) database in tab-delimited tive elements. Plant Physiol 132: 52–63. format. Wilder, J. and Hollocher, H. (2001) Mobile elements and the Figure S1. Nucleotide alignment of full-length Bombyx mori MINE-2 genesis of microsatellites in dipterans. Mol Biol Evol 18: 384– (BmMINE-2) elements identified within assembled genomic contigs. 392. Miniature inverted repeat transposable element (MITE)-like structural Wilson, E.T., Condliffe, D.P. and Sprague, K.U. (1988) Transcrip- features are as indicated in the schematic and alignments in Fig. 1. tional properties of BmX, a moderately repetitive silkworm gene that is an RNA polymerase III template. Mol Cell Biol 8: File S1. The FASTA formatted DNA sequence of GenBank accessions 624–631. that show regions homologous to the MINE-2 class of Miniature inverted repeat transposable element (MITE)-like elements. Yang, H.P. and Barbash, D.A. (2008) Abundant and species- specific DINE-1 transposable elements in 12 Drosophila Please note: Neither the Editors nor Wiley-Blackwell are genomes. Genome Biol 9: R39. responsible for the content or functionality of any support- Yang, H.P., Hung, T.L., You, T.L. and Yang, T.H. (2006) Genome- wide comparative analysis of the highly abundant transpos- ing materials supplied by the authors. Any queries (other able element DINE-1 suggests a recent transpositional burst than missing material) should be directed to the corre- in Drosophila yakuba. Genetics 173: 189–196. sponding author for the article.

© 2010 The Authors Insect Molecular Biology © 2010 The Royal Entomological Society, 20, 15–27