J Mol Evol (2004) 58:280–290 DOI: 10.1007/s00239-003-2550-2

The Utility of the Neglected Mitochondrial Control Region for Evolutionary Studies in Lepidoptera (Insecta)

Marta Vila,1,2 Mats Björklund1

1 Department of Ecology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18 D, SE-752 36, Uppsala, Sweden
2 IUX-Edificio de Servicios Centrais de Investigación, University of A Coruña, Campus de Elviña, E-15071, A Coruña, Galicia, Spain

Received: 24 February 2003 / Accepted: 15 September 2003

Abstract. The mitochondrial control region (=AT-rich-region) is a rarely used genetic marker in phylogeographic studies and population genetic surveys. Reasons for this are that the high AT content and the presence of tandem repeats and indels pose technical and analytical problems. We provide a new pair of primers and the first taxonomically wide-scale description of control region (CR) structure in an insect order after sequencing it in 31 lepidopteran species. We assessed levels of variation occurring in the CR and cytochrome oxidase I (COI) by sequencing and comparison. Intrapopulation analyses in five species of butterflies showed that CR was more variable than COI. Interpopulation variation from three populations of Erebia triaria and E. palarica was slightly lower in the CR than COI with regard to single-nucleotide polymorphisms, but the results were concordant between both markers and highly congruent with regard to population differentiation. Using 15 species of Satyrinae we found that the CR has the same, or stronger, phylogenetic resolution as COI. Our results indicate that the CR can be of importance in addition to COI in population genetic studies. Alignments for the whole CR are direct and unambiguous at the intraspecific level. Indels show phylogenetic signal, but make this marker more complex to use than COI for higher phylogenetic analyses. Nevertheless, alignments at the generic level are straightforward over one part of the CR. The combination CR+COI appears to be a very promising phylogenetic tool to resolve fast-evolving -level phylogenies.

Key words: mtDNA control region; Cytochrome oxidase I; Insecta; Lepidoptera; Erebia; Indels; Structure; Phylogeography; Phylogeny

Correspondence to: Mats Björklund; email: mats.bjorklund@ebc.uu.se

Introduction

During the last decade there has been a remarkable increase in population genetic and phylogeographic studies using DNA sequence data. This has been greatly aided by the availability of effective markers. One popular marker is the mitochondrial control region (CR), which has been used in many animal taxa (Merilä et al. 1997; Donaldson and Wilson 1999; Rankin-Baransky et al. 2001; Larizza et al. 2002).

In Insecta, however, the CR remains a largely unused and little-known genetic marker in most of the more than 30 extant orders (Roehrdanz and Degrugillier 1998; Caterino et al. 2000; Lessinger and Azeredo-Espin 2000; Schultheis et al. 2002a; Mardulyn et al. 2003). In Lepidoptera, information on the CR is almost nonexistent; it has been described for seven species of Lycaenids by Taylor et al. (1993) and sequenced for two (McKechnie et al. 1993a,b). Thus, CR data from only nine species of Lepidoptera appear in sequence databases, although the whole mitochondrial genomes are given for Bombyx mori (Lee et al. 2000) and B. mandarina (Yukuhiro et al. 2001).

The usefulness of the CR, also known as the AT-rich region, for species-level phylogenies and population-level studies in Insecta has been questioned or remains controversial (Taylor et al. 1993; Zhang and Hewitt 1997; Caterino et al. 2000). The remarkable lack of CR sequences in Insecta may be due to (i) difficulties in amplification and/or sequencing, (ii) the assumption that CR might not evolve faster than other regions of the mtDNA in (Zhang and Hewitt 1997), and (iii) putative strong selective constraints related to its extreme richness in AT and to the presence of highly conserved structural elements (Zhang and Hewitt 1997). Drosophila is the subject of more studies about the insect CR (Clary and Wolstenholme 1987; Monnerot et al. 1990; Monforte et al. 1993; Inohira et al. 1997). The objective of these has mainly been to survey its structure and evolution, not variation at the population level or higher, with the exception of Brehm et al. (2001). There are only six studies using the CR as a genetic marker at the population level in Insecta: McKechnie et al. (1993a, b), Zhang and Hewitt (1996), Atkinson and Adams (1997), Mardulyn (2001), and Schultheis et al. (2002b), besides Vandewoestijne and Nice et al. (unpublished).

The aim of this paper is to evaluate the population genetic and phylogenetic information contained by the lepidopteran CR. First, we describe the structure of CR at the generic and family levels in Lepidoptera. Thus, we have sequenced 29 species ranging across five families of butterflies and two Prodoxidae moths.

Second, we survey the usefulness of the CR as a genetic marker at the intraspecific level. We focus on the amount of variability found within and among three populations of Erebia triaria and E. palarica (Nymphalidae: Satyrinae). We compare CR sequences with those from cytochrome oxidase I (COI) since the latter is the main marker used at the intraspecific level of analysis in Insecta (Caterino et al. 2000; e.g., Juan et al. 1998; Bogdanowicz et al. 2000; Shufran et al. 2000; Althoff et al. 2001; Williams 2002). In addition, we have sequenced both markers in a single population from five more widespread species. Overall, the mean sequence diversity of these seven species sheds some light on the degree of variability expected from both markers at the intrapopulation and intraspecific level.

Third, we compare the genetic divergence of the CR among 15 Satyrinae species with that obtained from COI since the latter is also frequently used for phylogenetic analyses in Lepidoptera (e.g., Brower 1994; Zimmermann et al. 2000; Rubinoff and Sperling 2002). We perform phylogenetic analyses using both parsimony and distance methods on ten species of Erebia, one Maniola and four species of Coenonympha, and compare the topologies obtained by using both CR and COI sequences to evaluate their relative power in resolving species and genus level phylogenies. Inter- level analyses are beyond the scope of this paper due to the problems in weighting large indels, needed for phylogenetic analyses, at such taxonomic level.

Materials and Methods

Specimens

Individuals from Erebia triaria (n = 60) and E. palarica (n = 60) were hand-netted in Galicia (NW of Spain) between 1998 and 2000 and preserved in 95% ethanol until analysis. The average distance between the three locations was 96 km for E. triaria and 46 km for E. palarica. Both species occur in spatial sympatry in Courel and Queixa, although their flight periods do not coincide. For conservation purposes and to avoid the risk of amplifying sperm DNA from the female spermatophores, only males were caught. Other species were either collected by the authors themselves or provided by colleagues (see Appendix I for collectors and locations).

DNA Extraction, Amplification, and Sequencing

Most genomic DNA was extracted from adult thorax tissue using the DNeasy Tissue Kit (Qiagen) or a Chelex protocol. Legs (Parnassius apollo) or larvae (Prodoxus decipiens and P. quinquepunctellus) were used when thorax was not available.

Amplifications were performed in 30-µl volumes containing 1× PCR Buffer II (Applied Biosystems), 2–2.5 mM MgCl2, 1.5 U AmpliTaq polymerase (Applied Biosystems), a 1 mM concentration of GeneAmp dNTP (Perkin Elmer), a 0.7 µM concentration of each primer J6 (Zhang et al. 1995) + Lep12S (Taylor et al. 1993) or SeqLepMet + LepAT2B (this paper) and 50–70 ng of DNA. Primer SeqLepMet (5′ TGA GGT ATG ARC CCA AAA GC 3′) lies in the 5′ end of the tRNA-methionine gene, whereas primer LepAT2B (5′ ATT AAA TTT TTG TAT AAC CGC AAC 3′), antisense to SeqLepMet, is located toward the 5′ extreme of the 12S rRNA gene. Primers SeqLepMet and LepAT2B correspond in their 3′ ends to positions 10,361 and 9667, respectively, in the Bombyx mori mtDNA sequence (Lee et al. 2000). CR PCR started with 95°C for 2 min, followed by 35 cycles at 94°C for 60 s, 51°C for 90 s, and 65°C for 60 s, and a final step at 65°C for 7 min. DNA extractions from 5-year-old dried Erebia triaria were not suitable for PCR amplification of the CR and hardly yielded any COI products.

COI length is 1.5 kilobases (kb) but we focused on its second half following the variability survey presented by Lunt et al. (1996) in other Insecta. The fragment used for intraspecific surveys in Erebia triaria and Erebia palarica (786-bp length) started at position 717 of COI. For species-level phylogenies, a fragment of about 1250 bp (starting at position 313) was used. COI (PCR product, 2 kb) was amplified in 30-µl reaction volumes containing 1× PCR buffer, 2 mM MgCl2, 1 U Taq, 1 mM dNTP, a 1 µM concentration of each primer C1-J-1751 (Simon et al. 1994) and C2-N-3662 (Crozier and Crozier 1993), and 50–70 ng of DNA. The PCR cycle differed from that of the CR only in annealing at 53°C and extension for 90 s.

PCR products were electrophoresed on 2% agarose gels and visualized under UV light after ethidium bromide staining. Products were purified with Microcon PCR Centrifugal Filter Devices (Amicon–Millipore) and used as template for direct sequencing on an Applied Biosystems ABI-310 DNA sequencer. All PCR products were sequenced in both directions.
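The SeqLepMet primer contains the IUPAC ambiguity code R (A or G), so it stands for more than one concrete oligonucleotide. The following is a minimal Python sketch, not part of the original methods, of how such a degenerate primer can be expanded and reverse-complemented. The function names `expand_primer` and `reverse_complement` are illustrative assumptions; only the primer sequences come from the text.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def clean(primer: str) -> str:
    """Remove the triplet spacing used in the text and upper-case the primer."""
    return primer.replace(" ", "").upper()


def expand_primer(primer: str) -> list[str]:
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in clean(primer)))]


def reverse_complement(primer: str) -> str:
    """Reverse complement, keeping ambiguity codes consistent."""
    return clean(primer)[::-1].translate(COMPLEMENT)


# Primer sequences as given in the text (5' to 3').
SEQLEPMET = "TGA GGT ATG ARC CCA AAA GC"
LEPAT2B = "ATT AAA TTT TTG TAT AAC CGC AAC"

if __name__ == "__main__":
    print(expand_primer(SEQLEPMET))   # two variants, differing at the R position
    print(reverse_complement(LEPAT2B))
```

Under these assumptions SeqLepMet expands to two 20-mers, while LepAT2B, which has no ambiguity codes, expands to itself.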

Fig. 1. General structure of the lepidopteran mitochondrial control region and location of variable sites. (A) Arrows indicate approximate location of the annealing sites of CR primers used in the present study. E1 and E2 flank a highly conserved stem–loop secondary structure in Insecta. Hom stands for a stretch of thymines and G for Section G, both characteristic of the lepidopteran CR. Length of the CR is shown as in Erebia and Coenonympha. The number of polymorphic sites is shown along the aligned CR from (B) nine species of Erebia and (C) four of Coenonympha. Nucleotide substitution sites (white) and number of nucleotides involved in indels (shaded) were calculated for each 20 consecutive sites. Alignment files are available from the authors upon request.
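The per-window counts described in the caption can be sketched as follows. This is an illustrative Python example under simple assumptions, not the authors' procedure: a column containing a gap in some but not all sequences is counted as an indel position, and a gap-free column with more than one nucleotide is counted as a substitution site. The function name `window_counts` and the toy alignment are hypothetical.

```python
def window_counts(alignment: list[str], window: int = 20) -> list[tuple[int, int]]:
    """Return (substitution sites, indel positions) for each block of `window` columns."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    counts = []
    for start in range(0, length, window):
        subs = indels = 0
        for col in range(start, min(start + window, length)):
            column = [seq[col] for seq in alignment]
            if "-" in column:
                # Gap present in some sequences only: polymorphic indel position.
                if any(base != "-" for base in column):
                    indels += 1
            elif len(set(column)) > 1:
                # No gaps, more than one nucleotide: substitution site.
                subs += 1
        counts.append((subs, indels))
    return counts


# Toy alignment of 25 columns (invented data, for illustration only).
toy = [
    "A" * 20 + "TTTTT",
    "A" * 5 + "G" + "A" * 14 + "TT-TT",
    "A" * 20 + "TTTTC",
]

if __name__ == "__main__":
    print(window_counts(toy))
```

With the toy data, the first window holds one substitution site and the second holds one substitution site and one indel position.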

CR PCR products amplified with the primer pair J6+Lep12S were successfully cycle-sequenced using SeqLepMet and LepAT2B as internal primers and 90 ng of purified PCR product. Three internal primers were used for COI: SeqIntCOIf (5′ CWT CWT TTT TTG AYC CAG CWG GAG 3′, sense strand [this paper]), LepLEUr (5′ CCA TTA CWT ATA RTC TGC CAT ATT 3′, antisense strand in the tRNA-leucine [this paper]), and LepCOIIr (5′ CGT ART GAW GGA AGR GCA ATA 3′, antisense strand in the COII gene [this paper]). Primers SeqIntCOIf, LepLEUr, and LepCOIIr correspond in their 3′ ends with positions 12,494, 13,365, and 13,412 according to Bombyx mori mtDNA sequence (Lee et al. 2000).

Several facts indicated that the mtDNA PCR products were mitochondrial, and not of nuclear origin (Numts; see Roehrdanz and Degrugillier 1998; Bensasson et al. 2001). First, no unexpected stop codons or frameshift mutations were present in the coding sequences. Second, several long PCR reactions were performed in Erebia triaria, E. palarica, and E. ligea. Products of 5-kb length (obtained with primers 16SR by Xiong and Kocher [1991] and C1-J-1751 by Simon et al. [1994]) ranged across both COI and the CR and yielded sequences identical to those from the usual PCR. Long PCR reactions were performed using the Expand Long Template PCR System (Roche). The manufacturer's instructions were followed (Buffer III) and the PCR program set at 94°C for 2 min, 10 cycles at 94°C for 10 s and 63°C for 12.5 min, then 20 cycles at 94°C for 10 s and 63°C for 12.5 min, increasing 20 s with each cycle, with a final extension of 7 min at 68°C. Third, trees based on the two independently amplified and separated mtDNA fragments were congruent, which would be unlikely if the data set consisted of mixed mtDNA and Numts. Fourth, overlapping parts of sequences produced by different primers (i.e., J6+Lep12S and SeqLepMet+LepAT2B) were identical, except in Prodoxus decipiens, where two different sequences with common features were obtained. It is unlikely that several primer combinations would preferentially amplify a Numt.

Sequence Analysis

Sequences were aligned with AutoAssembler 2.1 (Applied Biosystems) and ClustalX (Thompson et al. 1997). The sequences are deposited in GenBank (see Appendix I for Accession numbers [supplementary material]). Codon positions were identified by alignment with sequence from the Satyrinae Coenonympha tullia (Caterino et al. 2001) and the Nymphalinae Melitaea didyma (Zimmermann et al. 2000), whereas alignments with tRNA-Met and 12S rDNA were performed with the Lycaenidae butterflies and Heliothinae moths published by Taylor et al. (1993) as well as Bombyx mitochondrial genomes (Lee et al. 2000; Yukuhiro et al. 2001).

We reported CR sequences as plus strand and in the same orientation as previous literature (Taylor et al. 1993; Zhang et al. 1995; Zhang and Hewitt 1997; Caterino et al. 2000). Secondary structures (not shown) were inferred using the Mfold Program, Washington University (Mathews et al. 1999; Zuker et al. 1999).

Phylogenetic Analysis

Standard indices of genetic variation, number of segregating sites (S), nucleotide diversity (π), haplotype diversity (h), average number of nucleotide differences (k), θ (from S) per gene, and Tajima's D, were calculated in DnaSP (Rozas and Rozas 1999). Most parsimonious trees (MP) were obtained for 15 Satyrinae with a Nymphalinae (Melitaea latonigena) as outgroup. GenBank accession numbers are AY346219–AY346234 for COI and AY346235–AY346250 for the CR.

We assessed branch support through 1000 bootstrap replicates and Bremer (1994) support values, the latter calculated with Autodecay 4.0 (Eriksson 1998). COI trees were calculated including all variable positions, following Björklund (1999). A strict consensus was calculated when obtaining several trees. Neighbor-joining (NJ) trees were generated using the Tamura–Nei (TN) model for the CR and the general time-reversible (GTR) model for both COI and the combination of CR+COI. Again, branch support was assessed with 1000 bootstrap replicates. In both approaches, gaps were treated as missing data (pairwise deletion). Phylogenetic and molecular evolutionary analyses were conducted using PAUP* version 4.0b10 (Swofford 1998). The best-fit model of DNA substitution for each data set was chosen by performing hierarchical likelihood ratio tests using PAUP*4 and Modeltest 3.06 (Posada and Crandall 1998).

Results

Structure of the CR

The CR is flanked by methionine tRNA (Met), followed by isoleucine (Ile) on one side and by the 12S rDNA gene on the other (Fig. 1A). The CR appears

Table 1. Description of the control region of 31 Lepidoptera

Taxon                         N    Length (bp)  AT%     H    Indel(a)          SNP
Erebia triaria                60   412–413      91.7    11   (T)1              8 Ts
Erebia palarica               60   400–405      91.8    8    3(TA) (T)1hom     3 Ts
Erebia euryale                3    371          89.94   2    —                 1 Ts
Erebia cassioides             2    379          94.19   2    (TA)1             —
Erebia meolans                5    405–406      91.9    2    (TA)1hom          1 Ts
Erebia ligea                  2    370          90.13   2    —                 1 Ts
Erebia epiphron               2    369          89.97   1    —                 —
Erebia pandrose               3    374–375      91.45   2    (T)1hom           —
Erebia oeme                   1    >300         89
Erebia gorge                  1    387          91.2
Coenonympha hero              5    407          91.79   2    —                 1 Ts
Coenonympha arcania           1    408          90.93
Coenonympha glycerion         1    405          92.6
Coenonympha pamphilus         1    411          92.46
Arethusana arethusa           2    689–715      95.37   2    (TnAm)            1 Tv, 1 Ts
Pyronia tithonus              3    454–456      91.5    3    (TA)1             1 Ts
Maniola jurtina               1    >380         92.11
Aphantopus hyperantus         1    >363         89.81
Parnassius apollo             2    504–505      92.31   2    (T)1              1 Ts
Inachis io                    1    338          92.9
Aglais urticae                *    359          92.48
Melitaea latonigena           5    343–345      94.3    2    (TA)1             1 Ts
Melitaea didymoides           1    345          94.2
Erynnis montanus              1    357          93.28
Daimio tethys                 2    415          91.81   1    —                 —
Artogeia napi                 1    400          91.25
Artogeia rapae                1    347          91.07   —
Plebejus argus                5    377–380      90.8    3    (TA)1 (T)1        —
Aricia agestis                5    390–392      93.85   2    (T)2hom           1 Tv
Callophrys rubi               1    >359         92.75
Prodoxus quinquepunctellus    3    498          92.6    1    —                 —

Note. H, Number of haplotypes. Indels were classified as occurring in (i) TA repeats (TA) or (ii) the homopolymer (hom). Single-nucleotide polymorphisms (SNP) are specified as transitions (Ts) or transversions (Tv). Taxa with symbols ( or >) beside the length value lack 1–10 nucleotides. (*) S. Vandewoestijne, unpublished data.
a 3(T)1hom means three indel polymorphic sites of one single thymine in the homopolymer.

to have a fairly conserved length at the intraspecific level (412–413 bp in Erebia triaria and 400–405 bp in Erebia palarica), although it is more variable at the level of congeneric species (average for 10 Erebia species = 377 bp, SD = 31.5, and average for 4 Coenonympha species = 408 bp, SD = 2.5; Table 1). Arethusana arethusa (Nymphalidae: Satyrinae), Leptidea sinapis (Pieridae), and Parnassius apollo (Papilionidae) had a longer CR (500–700 bp), due mainly to an increase in the size and copy number of repeat units.

After visually comparing the CR sequences of the seven taxa used for intraspecific variability analysis (Tables 2 to 4), we identified nine sections in the lepidopteran CR, most of them after Zhang et al. (1995) and Zhang and Hewitt (1997) (see Appendix II for details [supplementary material]).

Intraspecific variation occurred both as (i) mono-/dinucleotide indels occurring in Sections A and D, the homopolymer, and Section G and (ii) SNPs, mainly transitions, occurring in Sections A, C, and D and the different sections around putative Section F. The most variable part appears to be Section D (Tables 2A, 2B, and 4) and thus it would be the most useful for population studies. We screened congeneric interspecific diversity in nine species of Erebia (E. oeme excluded) and four of Coenonympha. Variation appears as both indels and nucleotide substitutions (Figs. 1B and C). In Erebia, long indels appear particularly in the transition between Section C and Section D, with mono-/dinucleotide indels more likely to appear in Sections A, B, and D and in the homopolymer. In the Coenonympha species, the longest deletion was of five nucleotides in Section A, the rest being short indels scattered in Sections A, B, C, and D and the homopolymer. We did not observe any signs of either length or sequence heteroplasmy; the ambiguity in a single individual (haplotype ET10; Table 2A) is likely due to the low quality of the sequence at its very beginning. Adenine and thymine contents (Table 1) were on average 92.0% (SD = 1.5%). All taxa surveyed showed a higher content of T (mean, 49.2%; SD, 1.86%) than A (mean, 42.7%; SD, 1.79%) with the exception of Daimio tethys (44.1% T and 47.7% A).

Amplification and sequencing with CR primers developed for Erebia were successful in all taxa listed

Table 2A. Mitochondrial DNA polymorphic sites in the control region of Erebia triaria haplotypes found in three NW Iberian populations

Columns: haplotype, GenBank accession No., nucleotides at the polymorphic positions | individuals per population (XIS, COU, QUE) and total (N)

Position: 30 61 66 117 128 135 166 169 177
Section:  A  C  C  D   D   D   D   D   D

ET0   AY350485  — T A A A T A G A | 0 6 4 10
ET1   AY350486  C G G | 1 0 0 1
ET2   AY346235  T | 0 4 0 4
ET3   AY350487  T G | 0 1 0 1
ET4   AY350488  G | 0 1 0 1
ET5   AY350489  C A | 0 6 1 7
ET6   AY350490  T C | 0 1 0 1
ET7   AY350491  G | 0 0 11 11
ET8   AY350492  C | 1 0 4 5
ET9   AY365141  C G | 18 0 0 18
ET10  *         N | 0 1 0 1

Note. XIS, Xistral; COU, Courel; QUE, Queixa. Length of the analyzed fragment: 361 bp. Individuals per population: number of individuals sharing a particular haplotype. (*) Not submitted to public databases. (—) Lack of a nucleotide in this site.

Table 2B. Mitochondrial DNA polymorphic sites in control region of Erebia palarica: CR haplotypes found in three NW Iberian populations

Columns: haplotype, GenBank accession No., nucleotides at the polymorphic positions | individuals per population (ANC, COU, QUE) and total (N)

Position: 28 78 79 122 123 151 172 173 312
Section:  A  D  D  D   D   D   D   D   F

EP0  AY346236  T — — A T A T A C | 13 14 7 34
EP1  AY350480  C — — | 1 0 0 1
EP2  AY350481  — — G | 4 0 0 4
EP3  AY350482  T A — — | 0 1 0 1
EP4  AY350483  — — T | 0 0 1 1
EP5  AY350484  — — — — | 2 5 11 18
EP6  AY365142  — — — — | 0 0 1 1

Note. Seven CR haplotypes, as polymorphism in the homopolymer is not considered. Individuals per population: number of individuals sharing a particular haplotype. ANC, Ancares; COU, Courel; QUE, Queixa. Length of the analyzed fragment: 354 nucleotides. (—) Lack of a nucleotide in this site.

in Table 1. For details about failure in other species, see Appendix III (supplementary material).

Intraspecific Variation in the CR Compared to COI

Haplotype Distribution

We did not find any segregating sites after the homopolymer region of the CR in the studied populations of Erebia triaria and E. palarica. Two individuals of E. palarica showed a single-nucleotide insertion (T) in the homopolymer; however, this polymorphism is not considered in this section, since even bidirectional sequencing can in some cases be ambiguous regarding the exact number of nucleotides at the homopolymer. Therefore, a fragment (prior to the homopolymer) of 361 bp in E. triaria and 354 bp in E. palarica was analyzed.

In E. triaria 10 haplotypes were found (Table 2A; haplotype ET10 defined by an ambiguity not considered), most of which differ in terms of transitional SNPs, although in 3 haplotypes there is an insert of a single T. The populations differ considerably in their frequency of the different haplotypes. For example, by far the most common haplotype in the Xistral population (ET9) was not found in the other two populations, while the most common haplotype in Queixa (ET7) was not found elsewhere. In E. palarica seven haplotypes were found (Table 2B), and haplotypes differ both in terms of transitional SNPs and in terms of TA indels. The most common haplotype (EP0) was found at a high frequency in all populations, while the second most common (EP5) was found at a high frequency in Queixa, but at much lower frequencies in the other two populations.

There were 14 COI haplotypes in E. triaria (Table 2C). All substitutions were synonymous third-position substitutions except for one nonsynonymous first-position substitution. The populations differed in their haplotype frequency distribution. For example, the most common haplotype in Xistral (ET7) was not found in the other populations, while the most common haplotype in the two other populations (ET0) was not found in Xistral. This corresponds to the pattern found in the CR. See Appendix IV (supplementary material) for results about intraspecific variability in other parts of COI. A detailed analysis of the spatial structure of haplotypes will be published elsewhere (Vila et al., unpublished).

Table 2C. Mitochondrial DNA polymorphic sites in cytochrome oxidase I of Erebia triaria haplotypes in this study

Columns: haplotype, GenBank accession No., nucleotides at the polymorphic positions | individuals per population (XIS, COU, QUE) and total (N)

Position: 857 923 947 974 989 1007 1019 1040 1055 1097 1124 1175 1176 1298 1307 1424 1499 1502

ET0   AY346219  C T C G A T C C G T G G G T G A T T | 0 14 12 26
ET1   AY350467  T A T A A C | 0 4 1 5
ET2   AY350468  T T A T A A C | 0 1 0 1
ET3   AY350469  A | 0 1 0 1
ET4   AY350470  T T A A C | 0 0 4 4
ET5   AY350471  G | 0 0 2 2
ET6   AY350472  A | 0 0 1 1
ET7   AY350473  T T T A A C A C | 13 0 0 13
ET8   AY350474  T T T A A A C A C | 1 0 0 1
ET9   AY350475  T T T A A C A C C | 1 0 0 1
ET10  AY350476  T T T A C A C A C | 2 0 0 2
ET11  AY350477  T G C T T A A C A C | 1 0 0 1
ET12  AY350478  T C T T A A C A C | 1 0 0 1
ET13  AY350479  T C T T A A C A C | 1 0 0 1

Note. XIS, Xistral; COU, Courel; QUE, Queixa. Length of the analyzed fragment: 786 bp. Individuals per population: number of individuals sharing a particular haplotype.

Table 2D. Mitochondrial DNA polymorphic sites in cytochrome oxidase I of Erebia palarica haplotypes

Columns: haplotype, GenBank accession No., nucleotides at the polymorphic positions | individuals per population (ANC, COU, QUE) and total (N)

Position: 839 926 1059 1193 1280 1298 1322

EP0  AY346220  A T C C T A T | 13 0 20 33
EP1  AY350463  C T | 3 0 0 3
EP2  AY350464  C | 1 3 0 4
EP3  AY350465  G G | 1 17 0 18
EP4  AY350466  T C | 2 0 0 2

Note. ANC, Ancares; COU, Courel; QUE, Queixa. Individuals per population: number of individuals sharing a particular haplotype. Length of the analyzed fragment: 786 bp.

Sequence Variability

When combined over populations, the CR and COI showed similar levels of sequence variability regarding SNPs (Table 3), taking into account that a longer sequence of COI was used. Erebia triaria appeared to be more variable than E. palarica (Table 3), and by combining CR and COI sequences 20 different haplotypes were obtained for the former and 13 for the latter. Dinucleotide gaps were more frequent in Erebia palarica, where the TA motif (positions 172–173; Table 2B) appeared as a deletion in 2 of 20 individuals from Ancares, 6 of 20 from Courel, and 11 of 20 from Queixa, indicating differences among populations (see Lunt et al. [1998] for a review on test of genetic subdivision by indels).

Sequences of CR and COI in five individuals from the same population of two Lycaenidae species, one Nymphalinae, and two other Satyrinae (Table 4) showed that the CR can provide a higher mean sequence diversity (π) than COI. Note the apparent exception of Plebejus argus, whose CR only showed variability as indels: three different haplotypes from two segregating sites, comparable to COI, which resulted in three different haplotypes from two transitions.

The average pairwise distance calculated for the 60 individuals of Erebia triaria was 0.005 (SD = 0.003) for the CR and 0.009 (SD = 0.003) for the third codon positions of COI, both under the TN model. The same calculations for E. palarica resulted in an average of 0.001 (SD = 0.0006) for the CR and an average of 0.006 (SD = 0.0056) for COI.

The correlation between distances from COI third codon positions (786 bp; most variable part) and the CR (361 bp) in the 20 different haplotypes of E. triaria was 0.74 (pairwise n = 190), with a slope of

Table 3. Comparison of the mitochondrial DNA molecular diversity of Erebia triaria and E. palarica provided by the control region and cytochrome oxidase I

Gene  Species      N   Size (bp)  S      PI   H   h (SD)        π (SD)           k     θ (per gene)  Tajima's D
CR    E. triaria   60  360–361    8 + 1  4+1  10  0.83 (0.025)  0.0044 (0.0002)  1.60  1.71          −0.1847
CR    E. palarica  60  350–356    3 + 3  1+2  7   0.62 (0.038)  0.0005 (0.0002)  0.19  0.64          −1.3883
COI   E. triaria   60  786        18     12   14  0.76 (0.045)  0.0057 (0.0002)  4.49  3.86          0.4983
COI   E. palarica  60  786        7      7    5   0.61 (0.048)  0.0016 (0.0002)  1.31  1.50          −0.3337

Note. Number of segregating sites (S) as number of substitutions plus number of indels. PI, parsimony informative sites; H, number of haplotypes. Haplotype diversity (h) and nucleotide diversity (π) per site with standard deviation (SD) in parentheses. k, average number of nucleotide differences. Theta (θ, from S) per gene.
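Two of the indices in Table 3 can be sketched in a few lines. The Python example below is illustrative only and is not the DnaSP implementation. It assumes that "θ (from S)" refers to Watterson's estimator, θ = S / a_n with a_n = 1 + 1/2 + ... + 1/(n − 1), and that haplotype diversity follows the usual sample formula h = n/(n − 1) · (1 − Σ p_i²). The function names are hypothetical.

```python
def watterson_theta(segregating_sites: int, sample_size: int) -> float:
    """Watterson's theta per gene: S divided by the harmonic sum a_n."""
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return segregating_sites / a_n


def haplotype_diversity(counts: list[int]) -> float:
    """Sample haplotype diversity from the number of individuals per haplotype."""
    n = sum(counts)
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # COI of E. triaria: S = 18 in a sample of 60 individuals (Table 3).
    print(round(watterson_theta(18, 60), 2))
    # COI of E. palarica: totals per haplotype EP0 to EP4 (column N of Table 2D).
    print(round(haplotype_diversity([33, 3, 4, 18, 2]), 2))
```

Under these assumptions the two calls give roughly 3.86 and 0.61, in line with the COI rows of Table 3, which suggests that the tabulated values were derived in a comparable way.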

Table 4. Intrapopulation mitochondrial DNA genetic diversity estimated from the control region and cytochrome oxidase I in five species of Lepidoptera

Columns: species; NH (CR, COI); size of COI (bp); S (CR, COI); CR type of mutations(a) and location in CR section; COI type of mutations; π (SD) (CR, COI); pairwise distance (SD) (CR, COI)

Plebejus argus: NH 3, 3; COI 1239 bp; S 0+2, 2; CR (AT)1 in D and (T)1 in G; COI 2 Ts; π 0, 0.0006 (0.0005); distance 0, 0.002 (0.0019)
Aricia agestis: NH 3, 1; COI 1197 bp; S 1+1, 0; CR 1 Tv in A and (T)2 in homopolymer; COI none; π 0.001 (0.0011), 0; distance 0.001 (0.001), 0
Melitaea latonigena: NH 3, 3; COI 1248 bp; S 1+1, 2; CR (TA)1 in D, 2 Tv in F, and 3 Ts in F; COI 2 Ts; π 0.0065 (0.0033), 0.0008 (0.0006); distance 0.007 (0.0038), 0.003 (0.0024)
Erebia meolans: NH 2, 2; COI 1547 bp(b); S 1+1, 1; CR 1 Ts in F and (T)1 in homopolymer; COI 1 Ts; π 0.001 (0.0009), 0.0003 (0.0003); distance 0.001 (0.0012), 0
Coenonympha hero: NH 2, 1; COI 1255 bp; S 1+0, 0; CR 1 Ts in F; COI none; π 0.0015 (0.0016), 0; distance 0.001 (0.0018), 0

Note. Number of haplotypes (NH). Segregating sites (S), split for CR into number of indels plus number of single-nucleotide polymorphisms (SNP). See Table 1 for CR length. Mean sequence diversity per site (π) and standard deviation (SD). Transitions (Ts) and transversions (Tv). Pairwise distances averaged under Tamura–Nei model. CR GenBank accession numbers: P. argus (AY351424–AY351426), A. agestis (AY351427–AY351429), M. latonigena (AY346250, AY351409, AY351410), E. meolans (AY346239, AY351403), and C. hero (AY346245, AY351408). COI GenBank accession numbers: P. argus (AY350457–AY350459), A. agestis (AY350456), M. latonigena (AY346234, AY350460, AY350461), E. meolans (AY346223, AY350455), and C. hero (AY346229).
a (N)n, nucleotides involved in indel polymorphisms.
b No segregating sites occurred in the cytochrome oxidase II fragment that this sequence includes.
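The mean sequence diversity per site reported in Table 4 is an average over all pairs of sequences. The sketch below shows the idea in Python with uncorrected p-distances and pairwise deletion of gapped sites. This is a simplifying assumption: the values in the paper were averaged under the Tamura–Nei model, which this example does not implement. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping sites gapped in either sequence."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # pairwise deletion: ignore this site for this pair only
        compared += 1
        diffs += x != y
    return diffs / compared


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean uncorrected pairwise distance over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    toy = ["ATTA", "ATTA", "ACTA"]  # invented data, for illustration only
    print(nucleotide_diversity(toy))
    # 20 haplotypes yield 190 pairwise comparisons, the n quoted in the text.
    print(len(list(combinations(range(20), 2))))
```

The number of pairs grows as n(n − 1)/2, which is why 20 combined haplotypes give 190 comparisons, and why 9 Erebia plus 4 Coenonympha species compared within genera give 36 + 6 = 42.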

3.52, indicating a more than threefold faster rate of substitution of COI than the CR in this species. At the species level the corresponding correlation coefficient was 0.56 (pairwise n = 42) using nine species of Erebia (E. oeme omitted due to incomplete sequence) and the four Coenonympha species. However, this correlation was entirely due to four data points, all of them with very low divergences (<0.05 for the CR, <0.15 for COI). If those data points are removed, the correlation disappears. Saturation could play a role in keeping such low CR distances, according to Caccone et al. (1996). However, this is unlikely in a relatively unconstrained fragment such as the CR. In addition, saturation should also have influenced third codon positions in COI since they have AT contents comparable to those of the CR (91.5% in E. triaria and 93.3% in E. palarica) and this was not the case.

Phylogeny

Interspecific (Intrageneric) Divergence

The average of CR pairwise distances (under the TN model) was 0.087 (SD = 0.018) for nine Erebia species (E. oeme omitted) and 0.119 (SD = 0.0461) for four Coenonympha species. COI third codon positions showed an average pairwise distance of 0.269 (SD = 0.089) for Erebia species and 0.349 (SD = 0.13) for Coenonympha. Note that C. glycerion/C. arcania was undefined. If all positions were taken

branching as well as the C. pamphilus/arcania–hero split. CR improvement regarding topology meant that Erebia epiphron and E. pandrose did not appear as unresolved species within a polytomy as happened with COI (Fig. 2B) but became highly supported as part of the ligea/euryale clade (Fig. 2A). The support for the E. cassioides/E. gorge branch was higher in the COI tree. Combining both sequences (Fig. 2C; one tree with a length of 1193 steps, CI = 0.67, RI = 0.65), branch support for the Maniola/Erebia split as well as the C. glycerion/C. arcania–hero improved. In addition, a decrease in polytomies in the genus Erebia was observed. Neighbor-joining trees using COI and CR differed in topology and resolution. The COI tree (under the GTR model) showed a branching pattern very similar to that of the MP COI tree (Fig. 2B), with the exception of the appearance of E. triaria/E. pandrose grouping together with only 58% bootstrap support. The CR tree (under the TN model) agreed in the E. meolans/palarica and C. pamphilus/C. arcania–hero–glycerion groupings (Fig. 2A) but failed in distinguishing the genera Maniola and Erebia. When adding the CR and COI sequences, the tree very much resembled the corresponding MP tree (Fig. 2C), with the only exceptions a higher bootstrap value for the Maniola/Erebia split and weak support (55% bootstrap) for grouping of E. triaria and Fig. 2. Phylogenetic reconstruction of 15 Satyrinae species with E. pandrose. Melitaea latonigena (Nymphalinae) as outgroup. Strict consensus of the (A) CR nine most parsimonious trees, (B) COI three most parsimonious, trees and (C) CR+COI single most parsimonious Discussion tree. Bootstrap support proportions (1000 replicates) are indicated above branches if more than 50% support was obtained; Bremer support values are depicted below them. Structure and Evolution of the CR

into account, the average of COI pairwise distances (under the GTR model) was 0.0671 (SD = 0.0181) for the nine Erebia species and 0.083 (SD = 0.025) for the four Coenonympha species.

Interspecific (Subfamily) Phylogeny

We tested the utility of the CR at the subfamily level of phylogenies by calculating trees using 15 Satyrinae taxa (10 species of Erebia, 4 species of Coenonympha, and Maniola jurtina), with the Nymphalinae Melitaea latonigena as outgroup. We compared the topologies obtained from CR, COI, and both markers together.

The maximum parsimony trees (CR: nine trees with a length of 382 steps, consistency index [CI] = 0.74, retention index [RI] = 0.76; COI: three trees with a length of 835 steps, CI = 0.62, RI = 0.55) showed slightly better topology resolution when based on the CR (Figs. 2A and B). The CR tree had higher bootstrap support for the Maniola/Erebia

We showed that CR primers SeqLepMet and 2B worked for most lepidopteran species (see supplementary information) and can be used as internal primers in combination with the already published primers (see Materials and Methods), improving the sequencing process.

There appear to be two groups of CRs (see Zhang et al. 1995; Zhang and Hewitt 1997): Group 1, found in fruit flies, where a conserved domain is followed by a variable domain; and Group 2, to which grasshoppers, locusts, and mosquitoes belong, where no conserved and variable domains can be differentiated. Here, we classified butterflies and moths as belonging to Group 2. Furthermore, the lepidopteran CR was more similar to the type found in the locust Schistocerca gregaria, where the entire CR does not appear to be tandemly repeated, than to that of the grasshopper Chorthippus parallelus. It was possible to identify most features previously defined by Zhang and Hewitt (1997). The downstream Section F was not as straightforwardly identified by its GA richness, but its flanking regions showed certain similarities to those of other insects (see Zhang et al. 1995).
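The pairwise-distance summaries quoted in the Results (a mean and SD over all species pairs) can be illustrated with a minimal sketch. It makes two assumptions that differ from the study: it uses the uncorrected p-distance rather than GTR-corrected distances, and it skips any site where either sequence has a gap or missing base. The function names, the set of gap symbols, and the toy alignment are hypothetical and are not data from the paper.

```python
from itertools import combinations
from statistics import mean, stdev

GAP_CHARS = set("-?N")  # symbols treated as missing data (an assumption)


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites, skipping every site where
    either sequence has a gap or missing base (pairwise deletion)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # site ignored for this pair only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def summarize_distances(alignment):
    """Return (mean, sample SD) of all pairwise p-distances in an alignment
    given as a dict of name -> aligned sequence."""
    distances = [p_distance(alignment[x], alignment[y])
                 for x, y in combinations(sorted(alignment), 2)]
    return mean(distances), stdev(distances)


# Toy alignment, invented for illustration only.
toy_alignment = {
    "taxon_1": "ATTA-TATAA",
    "taxon_2": "ATTATTATAA",
    "taxon_3": "ATCATTGTAA",
}
avg, sd = summarize_distances(toy_alignment)
```

Because indel sites are skipped pair by pair, a distance computed this way counts substitutions only and ignores any signal carried by CR indels.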

Based on the data obtained we suggest that the CR could provide useful information for intraspecific studies of population differentiation and, also, for phylogenetic work at the generic level. Congeneric diversity in nine species of Erebia and four of Coenonympha appeared as both indels and nucleotide substitutions (Fig. 1). Since length was very conservative within both genera from Section E1 onward, those sections are straightforward to use for phylogenetic analysis.

The absence of length heteroplasmy in the CR, mainly in the tandem repeat, has already been reported in butterflies (Taylor et al. 1993) and the locust Schistocerca gregaria (Zhang et al. 1995) and appears to be a widespread phenomenon in Insecta (Zhang and Hewitt 1997).

Sequence Variation in the CR Compared to COI

The fact that the lepidopteran CR is prone to insertions and deletions was previously known for Papilio spp. (Sperling 1991), Plebejus argus (Brookes et al. 1997), and Heliconius spp. (A.V.Z. Brower, unpublished data). Our work showed that there was length variation in the CR from the intrapopulational to the interfamiliar level. The alignments were straightforward and unambiguous (i) for the whole CR at the intraspecific level and (ii) from about Section E1 onward at the interspecific level. If only SNPs were taken into account, the CR had the same resolution as COI to reveal population structure, and we recommend the use of both markers as complementary information. The CR carried sufficient phylogenetic signal to be informative at the congeneric species level and, also, at the subfamily level (Figs. 2A and C). Nevertheless, when the CR is used for phylogenetic work, a clear strategy with regard to the indels is desirable (see McGuire et al. 2001).

There was a clear difference in terms of intraspecific variability in the two species examined at the population level. Although the most variable part was Section D in both, in E. triaria most of the variation appeared as transitions (Table 2A), and in E. palarica as TA indels (Table 2B). At present we cannot provide any good explanation for the difference between the species.

Overall, the CR showed a level of nucleotide diversity similar to or higher than that of COI (exceptions are Erebia triaria and Plebejus argus), as shown in Tables 3 and 4, despite being considerably shorter than the COI fragment. Some species, such as Erebia palarica and Plebejus argus, tended to show variability as indels. Whether indels have arisen rather than substitutions in those species or past demographic events have caused the differences is currently unknown.

Given that COI and the CR are linked on the mtDNA and recombination is believed to be absent, a correlation between the levels of divergence in COI and CR could be expected (Ruokonen and Kvist 2002). Comparing pairwise distances from COI third codon positions versus the CR in Erebia triaria, we found that the former evolved more than three times faster than the latter. However, this took into account only substitutions, and inclusion of CR indels in the calculation of distances is required for a full estimation of the rate of evolution.

Phylogeny

The CR evolves more slowly than the third position of mtDNA protein coding genes ND4 and ND5 in some insects (Caccone et al. 1996). In contrast, some Drosophila CR domains have a faster rate of evolution than six functional mitochondrial regions (Barrio et al. 1994; Brehm et al. 2001). If indels are not included, our Erebia triaria and E. palarica intraspecific results showed that the CR evolved more slowly than the third codon position of the COI coding gene. Note that other species showed higher divergence in the CR than in third codon positions of COI at the intraspecific level (Table 4), but the difference in sample sizes precludes further analyses. At the generic level (nine species of Erebia and four of Coenonympha) such a correlation disappeared when the four closest pairs of species were removed from the analysis.

The MP trees calculated with both the CR and COI were similar apart from the degree of support for different branches. The only topological difference was a higher resolution resulting in the grouping of E. pandrose/E. oeme/E. ligea/E. euryale in the CR tree (Fig. 2A). The combination of CR and COI sequences (Fig. 2C) clarified the position of one of the four unresolved species that appeared in the COI tree. All trees consistently showed three Erebia associations: cassioides–gorge, euryale–ligea, and palarica–meolans. These and the clade pandrose–oeme–ligea–euryale (MP CR; Fig. 2A) agreed with the morphological classification (Warren 1936).

The NJ method performed worse with the CR than with COI sequences, although when both markers were combined, topology and support were very similar to those of the MP trees. This low resolution of the NJ with CR data was probably due to too little variation and undefined distances, which makes distance methods less efficient with this kind of sequence. In both approaches, gaps were treated as missing data (pairwise deletion), so this was a conservative estimate of the actual potential of the CR.

The CR was not variable enough to resolve, by itself, the phylogeny of six Jalmenus species thought to have speciated very recently (Taylor et al. 1993). In contrast, the CR adds significant information to the phylogeny of Erebia (Fig. 2). The lack of resolution in Erebia phylogenies with other markers has been argued to be the result of a rapid diversification (Martin et al. 2000, 2002). Nevertheless, the combination CR+COI provided higher support for the ligea-euryale pair than other mtDNA sequences did (see Martin et al. 2000).

To conclude, the present study has demonstrated that the combination of both the CR and COI is effective in elucidating intraspecific genetic structure and is promising for elucidating the phylogeny of fast-evolving lepidopteran genera.

Acknowledgments. The authors are very grateful to the following persons for help in the lab, analyses, identifications, unpublished data, and/or enlightening discussions: C. Vilà, J.R. Vidal-Romaní, A. Ödeen, D. Andersson, E. Karvonen, N. Ryrholm, O. Kudrna, A. Härlid, W. Delport, A.V.Z. Brower, C. Nice, and S. Vandewoestijne as well as colleagues, all listed in Appendix I (supplementary material), who have provided us with specimens. J. Vila, D. Romero, S. Uceira, J. Vidal, M. Ulloa, M. Fernández, and L. López kindly helped in the field. C. Vilà, M. Carlsson, M. Webster, and A. Ödeen reviewed early drafts. F.A.H. Sperling and an anonymous referee improved the manuscript with their comments. This work was financially supported by the IUX at University of A Coruña (Spain), an FPU fellowship (Spanish Ministry of Education) to M.V., and grants from the Swedish Natural Science Research Council to M.B.

References

Althoff DM, Groman JD, Segraves KA, Pellmyr O (2001) Phylogeographic structure in the Bogus yucca Prodoxus quinquepunctellus (Prodoxidae): Comparisons with coexisting pollinator yucca moths. Mol Phylogenet Evol 21:117–127

Atkinson L, Adams ES (1997) Double-strand conformation polymorphism (DCSP) analysis of the mitochondrial control region generates highly variable markers for population studies in a social insect. Insect Mol Biol 6:369–376

Barrio E, Latorre A, Moya A (1994) Phylogeny of the Drosophila obscura species group deduced from mitochondrial DNA sequences. J Mol Evol 39:478–488

Bensasson D, Zhang D-X, Hartl DL, Hewitt GM (2001) Mitochondrial pseudogenes: Evolution’s misplaced witnesses. Trends Ecol Evol 16:314–321

Björklund M (1999) Are third positions really that bad? A test using vertebrate cytochrome b. Cladistics 15:191–197

Bogdanowicz SM, Schaefer PW, Harrison RG (2000) Mitochondrial DNA variation among worldwide population of Gypsy moths, Lymantria dispar. Mol Phylogenet Evol 15:487–495

Brehm A, Harris DJ, Hernández M, Cabrera VM, Larruga JM, Pinto FM, González AM (2001) Structure and evolution of the mitochondrial DNA complete control region in the Drosophila subobscura subgroup. Insect Mol Biol 10:573–578

Bremer K (1994) Branch support and tree stability. Cladistics 10:295–304

Brookes MI, Graneau YA, King P, Rose OC, Thomas CD, Mallet JLB (1997) Genetic analysis of founder bottlenecks in the rare British butterfly Plebejus argus. Conserv Biol 11:648–661

Brower AVZ (1994) Phylogeny of Heliconius butterflies inferred from mitochondrial DNA sequences (Lepidoptera: Nymphalidae). Mol Phylogenet Evol 3:159–174

Caccone A, García BA, Powell JR (1996) Evolution of the mitochondrial DNA control region in the Anopheles gambiae complex. Insect Mol Biol 5:51–59

Caterino MS, Cho S, Sperling FAH (2000) The current state of insect molecular systematics: A thriving tower of Babel. Annu Rev Entomol 45:1–54

Caterino MS, Reed RD, Kuo MM, Sperling FAH (2001) A partitioned likelihood analysis of swallowtail butterfly phylogeny (Lepidoptera: Papilionidae). Syst Biol 50:106–127

Clary DO, Wolstenholme DR (1987) Drosophila mitochondrial DNA: Conserved sequences in the A+T-rich region and supporting evidence for a secondary structure model of the small ribosomal RNA. J Mol Evol 25:116–125

Crozier RH, Crozier YC (1993) The mitochondrial genome of the honeybee Apis mellifera: Complete sequence and genome organization. Genetics 133:97–117

Donaldson KA, Wilson Jr RR (1999) Amphi-panamic geminates of snook (Percoidei: Centropomidae) provide a calibration of the divergence rate in the mitochondrial DNA control region of fishes. Mol Phylogenet Evol 13:208–213

Eriksson T (1998) AutoDecay version 4.0. Distributed by the author. Department of Botany, Stockholm University. http://www.bergianska.se/personal/TorstenE/

Inohira K, Kara T, Matsuura ET (1997) Nucleotide sequence divergence in the A+T rich region of mitochondrial DNA in Drosophila simulans and Drosophila mauritiana. Mol Biol Evol 14:814–822

Juan C, Ibrahim LM, Oromí P, Hewitt GM (1998) The phylogeography of the darkling beetle Hegeter politus in the eastern Canary Islands. Proc R Soc Lond B Biol Sci 265:135–140

Larizza A, Pesole G, Reyes A, Sbisà E, Saccone C (2002) Lineage specificity of the evolutionary dynamics of the mtDNA D-loop region in rodents. J Mol Evol 54:145–155

Lee JS, Kim YS, Sung SH, Hwang JS, Lee DS, Suh DS (2000) Bombyx mori mitochondrion, complete genome. Direct submission to GenBank. Accession: AF149768

Lessinger AC, Azeredo-Espin AML (2000) Evolution and structural organisation of mitochondrial DNA control region of myiasis-causing flies. Med Vet Entomol 14:71–80

Lunt DH, Zhang D-X, Szymura JM, Hewitt GM (1996) The insect cytochrome oxidase I gene: Evolutionary patterns, and conserved primers for phylogenetic studies. Insect Mol Biol 5:153–165

Lunt DH, Whipple LE, Hyman BC (1998) Mitochondrial DNA variable number tandem repeats (VNTRs): Utility and problems in molecular ecology. Mol Ecol 7:1441–1455

Mardulyn P (2001) Phylogeography of the Vosges mountains populations of Gonioctena pallida (Coleoptera: Chrysomelidae): A nested clade analysis of mitochondrial DNA haplotypes. Mol Ecol 10:1751–1763

Mardulyn P, Termonia A, Milinkovitch MC (2003) Structure and evolution of the mitochondrial control region of leaf beetles (Coleoptera: Chrysomelidae): A hierarchical analysis of nucleotide sequence variation. J Mol Evol 56:38–45

Martin JF, Gilles A, Descimon H (2000) Molecular phylogeny, and evolutionary patterns of the European satyrids (Lepidoptera: Satyridae) as revealed by mitochondrial gene sequences. Mol Phylogenet Evol 15:70–82

Martin JF, Gilles A, Lörtscher M, Descimon H (2002) Phylogenetics and differentiation among the western taxa of the Erebia tyndarus group (Lepidoptera: Nymphalidae). Biol J Linn Soc Lond 75:319–332

Mathews DH, Sabina J, Zuker M, Turner DH (1999) Expanded sequence dependence of thermodynamic parameters provides robust prediction of RNA secondary structure. J Mol Biol 288:910–940

McGuire G, Denham MC, Balding DJ (2001) Models of sequence evolution for DNA sequences containing gaps. Mol Biol Evol 18:481–490

McKechnie SW, Hoffmann AA, Kovacs IV, Cacoyianni Z, Naughton NE, Katsabanas S (1993a) Genetic variation among Australian populations of native budworm Helicoverpa punctigera (Lepidoptera: Noctuidae). In: Corey SA, Dall DJ, Milne WM (eds) Pest control and sustainable agriculture. CSIRO, Melbourne, pp 428–431

McKechnie SW, Spackman ME, Naughton NE, Kovacs IV, Ghosn M, Hoffmann AA (1993b) Assessing budworm population structure in Australia using the AT-rich region of mitochondrial DNA. Beltwide Cotton Conf Proc 2:838–839

Merilä J, Björklund M, Baker AJ (1997) Historical demography and present day population structure of the greenfinch, Carduelis chloris: An analysis of mtDNA control-region sequences. Evolution 51:946–956

Monforte A, Barrio E, Latorre A (1993) Characterization of the length polymorphism in the A+T-rich region of the Drosophila obscura group species. J Mol Evol 36:214–223

Monnerot M, Solignac M, Wolstenholme DR (1990) Discrepancy in divergence of the mitochondrial and nuclear genomes of Drosophila teissieri and Drosophila yakuba. J Mol Evol 30:500–508

Posada D, Crandall KA (1998) Modeltest: Testing the model of DNA substitution. Bioinformatics 14:817–818

Rankin-Baransky K, Williams CJ, Bass AL, Bowen BW, Spotila JR (2001) Origin of loggerhead turtles stranded in the northeastern United States as determined by mitochondrial DNA analysis. J Herpetol 35:638–646

Roehrdanz RL, Degrugillier ME (1998) Long sections of mitochondrial DNA amplified from fourteen orders of insects using conserved polymerase chain reaction primers. Ann Entomol Soc Am 91:771–778

Rozas J, Rozas R (1999) DnaSP version 3: An integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174–175

Rubinoff D, Sperling FAH (2002) Evolution of ecological traits and wing morphology in Hemileuca (Saturniidae) based on a two-gene phylogeny. Mol Phylogenet Evol 25:70–86

Ruokonen M, Kvist L (2002) Structure and evolution of the avian mitochondrial control region. Mol Phylogenet Evol 23:422–432

Schultheis AS, Weigt LA, Hendricks AC (2002a) Arrangement, and structural conservation of the mitochondrial control region of two species of Plecoptera: Utility of tandem repeat-containing regions in studies of population genetics, and evolutionary history. Insect Mol Biol 11:605–610

Schultheis AS, Weigt LA, Hendricks AC (2002b) Gene flow, dispersal, and nested clade analysis among populations of the stonefly Peltoperla tarteri in the southern Appalachians. Mol Ecol 11:317–327

Shufran KA, Burd JD, Anstead JA, Lushai G (2000) Mitochondrial DNA sequence divergence among greenbug (Homoptera: Aphididae) biotypes: Evidence for host-adapted races. Insect Mol Biol 9:179–184

Simon C, Frati F, Beckenbach A, Crespi B, Liu H, Flook P (1994) Evolution, weighting, and phylogenetic utility of mitochondrial gene sequences and a compilation of conserved polymerase chain reaction primers. Ann Entomol Soc Am 87:651–701

Sperling FAH (1991) Mitochondrial DNA restriction sites of Papilio species: Alignment, length variation, and phylogenetic relationships. In: Mitochondrial DNA phylogeny, speciation, and host plant coevolution of Papilio butterflies. PhD thesis. Cornell University, Ithaca, NY, pp 90–109

Swofford DL (1998) PAUP*: Phylogenetic analysis using parsimony (*and other methods), v. 4.0b. Sinauer, Sunderland, MA

Taylor MFJ, McKechnie SW, Pierce N, Kreitman M (1993) The Lepidopteran mitochondrial control region: Structure and evolution. Mol Biol Evol 10:1259–1272

Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The ClustalX windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 24:4876–4882

Warren BCS (1936) Monograph of the genus Erebia. British Museum Natural History, London

Williams BL (2002) Conservation genetics, extinction, and taxonomic status: A case history of the Regal Fritillary. Conserv Biol 16:148–157

Xiong B, Kocher TD (1991) Comparison of mitochondrial DNA sequences of seven morphospecies of black flies (Diptera: Simuliidae). Genome 34:306–311

Yukuhiro K, Sezutsu H, Itoh M, Shimizu K, Banno Y (2001) Not trivial level of sequence divergence, and sequence rearrangements of mitochondrial genome between the wild mulberry silkmoth, Bombyx mandarina, and its close relative, the domesticated silkmoth Bombyx mori. Direct submission to GenBank. Accession number: AB070263

Zhang D-X, Hewitt GM (1996) The use of DNA markers in population genetics, and ecological studies of the desert locust Schistocerca gregaria (Orthoptera: Acrididae). In: Symondson WOC, Liddell JE (eds) The ecology of agricultural pests. Biochemical approaches. Chapman and Hall, London, pp 213–230

Zhang D-X, Hewitt GM (1997) Insect mitochondrial control region: A review of its structure, evolution, and usefulness in evolutionary studies. Biochem Syst Ecol 25:99–120

Zhang D-X, Szymura JM, Hewitt GM (1995) Evolution, and structural conservation of the control region of insect mitochondrial DNA. J Mol Evol 40:382–391

Zimmermann M, Wahlberg N, Descimon H (2000) Phylogeny of Euphydryas checkerspot butterflies (Lepidoptera: Nymphalidae) based on mitochondrial DNA sequence data. Ann Entomol Soc Am 93:347–355

Zuker M, Mathews DH, Turner DH (1999) Algorithms, and thermodynamics for RNA secondary structure prediction: A practical guide. In: Barciszewski J, Clark BFC (eds) RNA biochemistry and biotechnology. Kluwer Academic, Dordrecht, pp 11–43

Vila M & Björklund M (2004) The utility of the neglected mitochondrial control region for evolutionary studies in Lepidoptera (Insecta). J Mol Evol 58, 280-290.

Appendix I. Classification and number of specimens (N) of the species used in this study, collecting locations and providers. GenBank Accession numbers for both CR and COI are given.

Note. “Request” stands for incomplete CR sequences, not submitted to public databases and available directly from the authors: C. sylvicola (2 individuals sequenced), 600 bp of the CR available from SeqLepMet (5’ end) and 250 bp of the CR from LepAT2B (3’ end); P. guttata (2 individuals sequenced), 518 bp available from 5’ and 118 bp from 3’; L. sinapis (1 individual sequenced), 524 bp available from 5’ and 42 bp from 3’; P. decipiens (3 individuals sequenced), two different sequences available, unable to identify the true CR. Frans Cupedo provided us with six dried specimens of Erebia triaria (Montes Universales, NE Spain) collected in 1996, see Materials and Methods.

Appendix II. Structure of Control Region (CR).

The following sections in the Lepidopteran CR were identified according to Zhang et al. (1995) and Zhang and Hewitt (1997). Descriptions involved the seven species mentioned in Tables 3 and 4, although particular features from other Lepidoptera (Appendix I) are included.

1. Section A was located downstream of the tRNA-Met and included a run of T's flanked by purines. The number of T's was eight or nine in E. triaria and Aricia agestis, and six in E. palarica, E. meolans and Plebejus argus. In Coenonympha hero there were runs of either six or nine T's, the latter flanked upstream by a purine. In Melitaea latonigena the run of T's was interrupted by two A's.

2. Section B was a small fragment (~20 bp) always downstream of Section A. The 3’ end was similar in most taxa to the consensus (CATTT) identified by Zhang et al. (1995). Examples of deviant Section B haplotypes were (5’ TGTTT GCCAA ATTCT TTT 3’) in E. triaria and (5’ TGTTA CCAAA ACAAT ATT 3’) in Bombyx mandarina. The Lepidopteran Section B had a higher GC content (28.3% in E. triaria) than that reported by Zhang et al. (1995).

3. Section C was usually ~30 bp long and fairly conserved in base composition (very AT-rich) and in its 3’ end consensus (5’ ATGTA 3’). Section C could not be identified in Melitaea latonigena and Bombyx mandarina. Two transitions occurred in this segment in E. triaria.

4. Section D was where most intraspecific variability was found, both as indels and SNPs. For example, six out of eight substitutions of the CR in Erebia triaria occurred in this fragment. It showed an extremely high AT content (99.9 % in most cases). Section D may form secondary structures, as suggested by the base complementarity in the same strand. The number of possible stem-loops could be one or two. In Plebejus argus and Aricia agestis there was a small GC-rich hairpin (not shown) prior to the typical AT-rich sequence. Section D was 180 nucleotides long in Erebia triaria and the motif (5’ TATAATTTATAAATT 3’) was repeated twice (N1 and N2) along it; three of the transitions occurred before N1 and the other three occurred after N2. Microsatellite-like repeats were usually present in Section D, for example, two monomorphic repeats (TA)8 and (TA)7 in E. triaria and two polymorphic ones, TA (seven or eight repeats) and AT (five or six repeats), in E. palarica. Three specimens of Pyronia tithonus yielded three different haplotypes, all differentiated in Section D due to an indel polymorphism (TA repeated eight or nine times) and a transition. In the case of Arethusana arethusa, where Section D spanned about 380 bp, there was a 26 bp block repeated ten or eleven times (length polymorphism) like a minisatellite. The first three blocks had differences at the 8th nucleotide (T, A, C respectively), while the rest of the repeats were fixed for A. After these repeats, a 100 % AT fragment of about 50 bp (with several ATTTA repeats) led to the end of Section D.

5. Blocks E1 and E2 flanked a stem-loop secondary structure of about 50 bp in Lepidoptera (not shown). E1 and E2 were both highly conserved motifs, as previously reported by Zhang et al. (1995): the E1 consensus started with (5’ WTATA 3’) and ended with a motif akin to (5’ CTTT 3’), and both were found in all taxa surveyed so far. The E2 consensus (5’ G(A)nTWW 3’) was also very conserved, e.g. (5’ GAAAATTAT 3’) in E. triaria, (5’ GAATTT 3’) in Plebejus argus and (5’ GAATT 3’) in Bombyx mandarina. E2 was defined as (5’ GGATT 3’) in Coenonympha hero and as (5’ GTAAT 3’) in Arethusana arethusa. Unfortunately, it could not be identified in either Aricia agestis or Melitaea latonigena. The lack of block E2 in the latter was not representative of the Nymphalinae, since Inachis io showed a typical E2.

6. Section F was difficult to identify as defined by Zhang et al. (1995). Block F (= GA rich) should be flanked by an AT-rich sequence upstream and a GC-rich fragment downstream. This organisation showed slight variations. For instance, in E. triaria we found a 99.9 % AT-rich section, followed by a CT-rich fragment (72 %) and a subsequent GA-rich region (60 %), i.e., the putative Block F. Variants of that organisation were found in E. palarica and E. meolans ([GC]-[AT]-[GA]) and in Bombyx mandarina ([CT]-[GA]-[AT]). The size of Section F was variable, from ~70 bp in Plebejus argus to ~450 bp in Bombyx mori, but extremely conservative at the level of congeneric species (Figure 1B, 1C). E. palarica, E. meolans, Plebejus argus, Coenonympha hero and Melitaea latonigena all had intraspecific substitutions in this section (Tables 2 and 4).

7. Homopolymer. Toward the end of the control region, adjacent to the 12S rDNA, there was a string of about 19-23 thymines (T) present in all the taxa examined so far, including moths (Bombyx mori, B. mandarina, Prodoxus quinquepunctellus and P. decipiens). The length of this homopolymer was fairly conserved intraspecifically, although single or dinucleotide insertions were detected in some individuals of Erebia palarica, Erebia meolans and Aricia agestis. In addition, we found one (A) or two (AC) inserts in that long repeat in Coenonympha species. Thus, C. arcania showed the pattern [16(T)-(A)-3(T)], C. hero [16 or 17(T)-(A)-5(T)], C. glycerion [19(T)-(A)-3(T)] and C. pamphilus [19(T)-(AC)-3(T)].

8. Small GA or GC rich region. Right after the homopolymer, a small GC or GA rich fragment appeared in all taxa (for instance (5’ AAGATACACA 3’) in E. triaria) that could be identified as the GC rich region that should flank Block F downstream according to Zhang et al. (1995).

9. Section G. This small CR fragment was always identified by the starting consensus defined by Zhang et al. (1995) (5’ TTTT 3’). It was intraspecifically identical in all the taxa examined, except in Arethusana arethusa, which had one transversion, and in Plebejus argus, where two out of five individuals had a T inserted just before the 12S gene.
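Several of the features listed above (AT content, microsatellite-like TA runs, the terminal T homopolymer, and the E2 consensus) can be located with simple pattern matching. The following is a minimal sketch, not the procedure used in the study: the function names are hypothetical, W is read as A or T, the E2 consensus is assumed to need at least one A (n >= 1), and the toy sequences combine motifs quoted above with invented flanking bases.

```python
import re


def at_content(seq):
    """Fraction of A and T bases in a sequence (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in seq) / len(seq)


def find_tandem_repeats(seq, unit="TA", min_copies=5):
    """Return (start, copies) for each uninterrupted run of `unit`
    repeated at least `min_copies` times."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_copies))
    return [(m.start(), len(m.group()) // len(unit))
            for m in pattern.finditer(seq.upper())]


def longest_homopolymer(seq, base="T"):
    """Length of the longest uninterrupted run of `base`."""
    runs = re.findall("%s+" % re.escape(base), seq.upper())
    return max((len(run) for run in runs), default=0)


# E2 consensus 5' G(A)nTWW 3', reading W as A or T and assuming n >= 1.
E2_PATTERN = re.compile(r"GA+T[AT][AT]")

# Toy sequences: repeat counts follow the text, flanking bases are invented.
section_d = "GC" + "TA" * 8 + "GG" + "TA" * 7 + "C"          # (TA)8 and (TA)7
homopolymer_region = "ACG" + "T" * 16 + "A" + "T" * 3 + "GA"  # 16(T)-(A)-3(T)

repeats = find_tandem_repeats(section_d)
longest_t_run = longest_homopolymer(homopolymer_region)
e2_hits = [motif for motif in ("GAAAATTAT", "GAATTT", "GGATT")
           if E2_PATTERN.match(motif)]
```

A strict pattern like this flags only motifs that follow the consensus from their first base, so deviant variants such as (5’ GGATT 3’) are not matched and would still need to be identified from the alignment by eye.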

Appendix III. Failure and Success in CR amplification.

The complete CR is not yet available for Leptidea sinapis, Carterocephalus sylvicola and Prodoxus decipiens because the sequences obtained from the two directions did not overlap and/or the beginning toward the tRNA-Met end was missing. The primers did not consistently amplify the CR in Ochlodes ochracea (Hesperiidae), Polytremis pellucida (Hesperiidae) or Papilio demoleus (Papilionidae). This can partly be attributed to the poor quality of the last specimen, which was dry and four years old. The primers were successful in the African silk moth Gonometa postica (Lasiocampidae) for both amplification and sequencing (W. Delport pers. comm.). In Prodoxus decipiens, depending on whether the PCR amplification was performed with the primer pair J6+Lep12S or SeqLepMet+LepAT2B, we obtained different sequences showing common features, which suggests that one might be a nuclear mitochondrial pseudogene (NUMT). Those sequences, like the partial CRs mentioned above, are available directly from the authors.

Appendix IV. Intraspecific variability in the M3-M5 region of the COI.

We also surveyed the level of nucleotide diversity in another COI region, upstream of the one presented in Results, which is highly variable in other insects and known as M3-M5 (see Lunt et al. 1996). This fragment, 297 bp long and starting at position 313, resulted from bidirectional direct sequencing of the usual ~2 kb COI PCR product with primers C1-J-1751 and C1-N-2191 (Simon et al. 1994). Our results showed little variability. Only three haplotypes (Accession numbers AY346219, AY346251, AY350462) resulted from sequencing 29 specimens of Erebia triaria (10, 8 and 11 individuals from the Xistral, Courel and Queixa populations, respectively), owing to two synonymous transitions at third codon positions. The first transition (C/T) occurred only in the Courel population (frequency within population = 0.5). The second one (A/G) occurred only in the Queixa population (frequency within population = 0.182). The 37 analysed Erebia palarica samples (14, 12 and 11 individuals from the Ancares, Courel and Queixa populations, respectively) were monomorphic (AY346220).
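The within-population frequencies reported above follow directly from haplotype counts. The sketch below is illustrative only: the haplotype labels are hypothetical, and the per-population counts (4 of 8 in Courel, 2 of 11 in Queixa) are inferred from the reported frequencies and sample sizes rather than taken from the data. The haplotype diversity function uses Nei's unbiased estimator, which is added here for illustration and is not a statistic reported in this appendix.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Map each haplotype label to its frequency within one population."""
    if not haplotypes:
        raise ValueError("no individuals sampled")
    counts = Counter(haplotypes)
    return {hap: n / len(haplotypes) for hap, n in counts.items()}


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two individuals")
    freqs = haplotype_frequencies(haplotypes).values()
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


# Counts inferred from the reported frequencies (assumption); labels invented.
populations = {
    "Xistral": ["common"] * 10,
    "Courel": ["common"] * 4 + ["variant_CT"] * 4,
    "Queixa": ["common"] * 9 + ["variant_AG"] * 2,
}
frequencies = {name: haplotype_frequencies(haps)
               for name, haps in populations.items()}
```

With these counts, 4/8 gives the reported 0.5 for Courel and 2/11 rounds to the reported 0.182 for Queixa.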
