Molecular Phylogenetics and Evolution 79 (2014) 169–178

Contents lists available at ScienceDirect

Molecular Phylogenetics and Evolution

journal homepage: www.elsevier.com/locate/ympev

Towards a mitogenomic phylogeny of ⇑ Martijn J.T.N. Timmermans a,b, , David C. Lees c, Thomas J. Simonsen a a Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD, United Kingdom b Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom c Department of Zoology, Cambridge University, Downing Street CB2 3EJ, United Kingdom article info abstract

Article history: The backbone phylogeny of Lepidoptera remains unresolved, despite strenuous recent morphological and Received 13 January 2014 molecular efforts. Molecular studies have focused on nuclear protein coding genes, sometimes adding a Revised 11 May 2014 single mitochondrial gene. Recent advances in sequencing technology have, however, made acquisition of Accepted 26 May 2014 entire mitochondrial genomes both practical and economically viable. Prior phylogenetic studies utilised Available online 6 June 2014 just eight of 43 currently recognised lepidopteran superfamilies. Here, we add 23 full and six partial mitochondrial genomes (comprising 22 superfamilies of which 16 are newly represented) to those Keywords: publically available for a total of 24 superfamilies and ask whether such a sample can resolve deeper tRNA rearrangement lepidopteran phylogeny. Using recoded datasets we obtain topologies that are highly congruent with Illumina prior nuclear and/or morphological studies. Our study shows support for an expanded LR-PCR including , Thyridoidea, plume ( and Pterophoroidea; possibly along with Pooled mitochondrial genome assembly Epermenioidea), , , Mimallonoidea and . Regarding other controversially positioned higher taxa, Doidae is supported within the new concept of and sister to (or part of) Macroheterocera, while among butterflies, Danainae and not Libytheinae are sister to the remainder of the family. At the deepest level, we suggest that a tRNA rearrangement occurred at a node between and Ditrysia + + Tischeriidae. Ó 2014 Elsevier Inc. All rights reserved.

1. Introduction includes a number of biological model organisms, many severe pest , and comprises many of the best known and most Reconstructing a stable phylogenetic framework for Lepidop- popular invertebrates, emphasising that studies into lepidopteran tera at the superfamily level is an ongoing process that, despite phylogeny and evolution are of both scientific and public interest. considerable effort over recent years, is still far from complete. It is therefore not surprising that the past few years have seen sev- Kristensen and Skalski (1999) overview summarised the most eral attempts to resolve the higher-level phylogeny of Lepidoptera comprehensive morphology-based hypothesis of the last century based on comprehensive molecular datasets. The ‘‘LepTree project’’ for the phylogeny of the order. While the early part of Lepidoptera has already resulted in a number of benchmark publications evolution was presented as a well-resolved ‘‘Hennigean comb’’, the focused both at the superfamily level (e.g. Regier et al., 2009, phylogeny of the vast majority of the order (the suborder Ditrysia 2013; Cho, et al. 2011; Bazinet et al., 2013; Sohn et al., 2013) and which comprise more than 95% of all known species) was almost for clusters of several larger superfamilies (Zwick et al., 2011; entirely unresolved apart from a few higher-level clades. This Regier et al., 2012a,b). The ‘‘Ditrysia project’’ has so far resulted overview made it abundantly clear that large-scale molecular in a superfamily-level molecular phylogeny (Mutanen et al., analyses were needed to resolve the higher-level phylogeny of 2010) and a phylogeny of Papilionoidea (Heikkila et al., 2012), ditrysian Lepidoptera. while associated studies have focused on the superfamilies With more than 157,000 described species (van Nieukerken Gelechioidea (Kaila et al., 2011) and (Zahiri et al., et al., 2011), Lepidoptera (butterflies and moths) represent one of 2011, 2012, 2013a,b), as well as the large family the evolutionarily most successful lineages of phytophagous (Karsholt et al., 2013). Both the LepTree project and the Ditrysia (e.g. Grimaldi and Engel, 2005). Furthermore, the order project have provided considerable new resolution to the higher phylogeny of Ditrysia, but many groupings have been mutually exclusive between the studies and others were not well supported. ⇑ Corresponding author. E-mail addresses: [email protected] (M.J.T.N. Timmermans), Many of these poorly supported groupings are found in the same [email protected] (D.C. Lees), [email protected] (T.J. Simonsen). parts of the Lepidoptera Tree of Life where morphology alone has http://dx.doi.org/10.1016/j.ympev.2014.05.031 1055-7903/Ó 2014 Elsevier Inc. All rights reserved. 170 M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178 failed to yield conclusive evidence for evolutionary relationships. It representatives of 22 Lepidoptera superfamilies (Table 1). PCR is therefore clear that despite considerable efforts, we still remain reactions were performed with Takara LA taq and followed cycling far from a unified consensus of the phylogeny of the higher conditions described in Timmermans et al. (2010). Two primer Lepidoptera. For example, sister groups of some of the most prom- pairs were used for long-range PCR: One pair (SytbF/reverse inent lepidopteran radiations (Papilionoidea and Gelechioidea) complement of Jerry, JerryRC: Simon et al., 1994) targeting a 7 kb remain as yet unknown. To maximise our chances for arriving at fragment running from COI to CytB across the AT-rich region, such a ‘‘consensus hypothesis’’, we need to look at broadening both and one pair (Jerry: Simon et al., 1994/SytbR) targeting the oppo- taxon and character sampling in future phylogenetic studies. site 9 kb half of the circular genome (Fig. 1). Not all PCRs were Within the realm of molecular characters, full mitochondrial successful, and for some species one fragment was obtained genomes can be a data-rich and relatively accessible source of (Table 1; Fig. 2). For Pennisetia hylaeiformis the JerryRC-SytbR PCR information, and promising results have been obtained for other failed and a shorter fragment was targeted by replacing the JerryRC families and orders (e.g. Timmermans et al., 2010; Haran primer with stevCO3F (GCWTGATAYTGACAYTTYGT) located in et al., 2013; Simon and Hadrys, 2013). So far, the mt-phylogenomic COIII. In contrast to Timmermans et al. (2010), who used Roche’s studies with the most comprehensive lepidopteran taxon sampling 454 platform, sequencing was performed using Illumina’s Solexa remain the Kim et al. (2011), Lu et al. (2013) analyses that focused technology. Fragments were pooled before TruSeq library almost entirely on the polyphyletic group ‘‘’’ construction (800 bp insert size) and paired-end sequenced (PE: (essentially Papilionoidea plus Macroheterocera sensu van 2 250 bp; V2 Reagent Kit) on an Illumina Miseq aiming for 1% Nieukerken et al., 2011). of the machines run capacity. In addition to these Lepidoptera Here we characterise full mitochondrial genomes using a samples, the pool also contained 22 partial beetle mitochondrial method that relies on the in silico separation of DNA sequence genomes, which are not further discussed here. Three ‘baits’ were data from pooled PCR fragments (Timmermans et al., 2010). We generated to assign anonymous contigs to the voucher specimens. acquired mitochondrial genomes of a diverse set of lepidopterans These baits included a 400 bp fragment of CytB (primer pair: and present the most comprehensive mt-phylogenomic study of SytbF + SytbR) and two COI fragments (primer pairs LepCoxF1: Lepidoptera to date (mainly focused on the subgroup Heterone- ACNAATCATAARGATATTGGNAC + JerryRC and Jerry + sCOIR: CARA- ura). We explore the evolutionary patterns of the different genes TTTCDGARCATTG) (Fig. 1). ‘Bait’ sequences were bi-directionally in the mt-genome across heteroneuran Lepidoptera and assess sequenced using Sanger technology and edited in Geneious version the utility of mt-genomes in lepidopteran phylogenetics. 5.6 (Biomatters, Auckland, New Zealand).

2. Materials and methods 2.2. Sequence assembly and annotation

2.1. PCR amplification and sequencing Raw sequence reads were pre-processed using the Trimmomat- ic (Lohse et al., 2012) and Prinseq-lite (Schmieder and Edwards, Wet-lab methods are based on those described in Timmermans 2011) perl scripts. Bases with a Phred quality score below 20 were et al. (2010). Mitochondrial genomes were PCR amplified from trimmed. Sequence reads with a high quality length below 150 bp

Table 1 Details of Lepidoptera samples (29) subjected to mitochondrial genome sequencing.

Superfamily Family Species Voucher number Sequence length (bp) Mean coverage Accession number

1 Papilionoidea Papilionidae Papilio dardanus BMNH746800 15,337 405 JX313686 2 Geometridae Chloroclysta truncata BMNH1043250 15,828 428 KJ508061 3 Drepanoidea Drepana arcuata CWM-96-0579* 15,300 254 KJ508053 4 Incertae sedis Doidae Doa BOLD:AAC3419 05-srnp-63413* 15,228 185 KJ508058 5 Noctuoidea Acronicta psi BMNH1043251 15,348 224 KJ508060 6 Noctuoidea Noctuidae Noctua pronuba BMNH1043252 15,315 461 KJ508057 7 Mimallonoidea Mimallonidae Lacosoma valva RWH-96-0919* 16,108 66 KJ508050 8 Apatelopteryx phenax BMNH730001 15,648 41 KJ508055 9 Thyridoidea Rhodoneura mellea TH1 15,615 129 KJ508038 10 Pyraloidea angustea BMNH1044343 15,386 383 KJ508052 11 Tortricoidea Epiphyas postvittana BMNH1044344 15,451 495 KJ508051 12 Ischnusia culiculina IC1 8134 124 KJ508043 13 Pterophoroidea Emmelina monodactyla BMNH1044345 15,252 410 KJ508063 14 Alucitoidea Alucitidae montana NB-06-0080* 15,272 639 KJ508059 15 Epermenioidea Epermenia illigerella MM08-2525* 7983 141 KJ508039 16 Urodus decens DA-03-3349* 15,279 315 KJ508062 17 Pennisetia hylaeiformis BMNH1039831 14,199 262 KJ508049 18 Cossoidea Zeuzera coffeae AYK-04-0891-03* 7983 45 KJ508046 19 Gelechioidea eupositica MJM-96-0277* 15,347 466 KJ508047 20 Gelechioidea Endrosis sarcitrella BMNH1044346 15,317 245 KJ508037 21 Gelechioidea Oecogonia novimundi BMNH1044347 15,408 485 KJ508036 22 Gelechioidea Perimede sp. JWB-08-0217* 15,131 228 KJ508041 23 Phyllonorycter froelichiella BMNH1044348 15,538 466 KJ508048 24 Gracillarioidea Gracillariidae Phyllonorycter platani BMNH1044349 15,791 50 KJ508044 25 Gracillarioidea Gracillariidae Cameraria ohridella BMNH1044350 15,512 368 KJ508042 26 Tineola bisselliella BMNH1044351 15,661 556 KJ508045 27 Tischeriidae Astrotischeria sp. KN-06-1154* 15,079 311 KJ508056 28 Stigmella roborella BMNH1044352 15,565 351 KJ508054 29 Micropterigoidea Micropterix calthella UNK-93-0057* 9150 70 KJ508040

* Asterisked voucher numbers indicate that genomic DNA was provided by the ‘‘Leptree’’ initiative. Numbers in the first column followed by a minus sign indicate that the mitochondrial genome is incomplete. The Papilio dardanus data was combined with an existing DNA sequence to produce a full mitochondrial genome. The Doa species is represented by a BOLD cluster number. M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178 171

Fig. 1. PCR setup applied. For Illumina sequencing, the circular mitochondrial genome was amplified in two overlapping pieces, both spanning from COI to CytB. To assign species names to the assembled mitochondrial contigs, three ‘bait’ sequences were generated using Sanger technology located in Cox1 50, Cox1 30, and CytB (grey boxes).

Fig. 2. Sequence coverage of each of the 29 mitochondrial fragments. Horizontal axis depicts fragment length (x 1 K). Vertical axis depicts read coverage (as given). Note that coverage is highly variable, differs between the two LR-PCR products and tends to peak where the two PCR products overlap. 172 M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178 and reads containing ‘Ns’ were discarded. Sequences for which represents the probable extant sister lineage (Micropterigoidea) only one read of the pair passed this quality filter were also dis- to all other Lepidoptera and was shown to be highly divergent from carded. Remaining reads were assembled using Roche 454/ the others (see Results section). To test the effect of data Newbler software (settings: -rip -mi 95 -ml 100). Mitochondrial incompleteness, 10 partial genome sequences were removed and contigs were identified by linking them to Sanger ‘bait’ sequences all the analyses repeated using the 96 remaining taxa. Therefore, using stand-alone BLAST (Altschul et al., 1990). The P. dardanus in total nine tree searches were performed (nc123, degen, and contig was combined with an unpublished, incomplete mitochon- RY, all for 106 taxa, 105 taxa, and 96 taxa). drial genome that was generated as part of a different 454 based study (Timmermans et al., unpublished) to form a single complete 2.4. Phylogenetically informative tRNA translocation mitochondrial genome. Mitochondrial tRNAs were annotated using COVE (Eddy and Durbin, 1994) as described in (Timmermans and A putative phylogenetically informative tRNA translocation has Vogler, 2012) and positions of protein coding and rRNA genes were been reported for Lepidoptera (Cao et al., 2012). The inferred determined using Geneious (version 5.6) by transferring annota- ancestral insect mitochondrial genome contains three tRNAs tions of a reference genome to newly obtained mitochondrial between the control region and ND2 in the order tRNA-Ile, sequences. To obtain estimates of mean coverage Illumina reads tRNA-Glu, tRNA-Met, but for all sequenced ditrysian genomes this were remapped to the mitochondrial contigs using Geneious order is tRNA-Met, tRNA-Ile, tRNA-Glu. The have been (minimum overlap 150 bp, a maximum of 2% mismatches and a reported to show the ancestral order (Cao et al., 2012), and there- maximum gap size of 10). All annotated mitochondrial genome fore this translocation event must have occurred after this lineage sequences were submitted to GenBank (accession numbers: diverged from the more derived Lepidoptera. To pinpoint this JX313686; KJ508036–KJ508063). translocation event, gene order of the newly generated genome sequences was investigated. In addition, PCR was performed on 2.3. Phylogenetic analyses members of Adeloidea (Nematopogon sp., Hertfordshire, UK) and Palaephatidae (Azaleodes fuscipes, Queensland, Australia) for which Annotated mitochondrial genomes were combined with 77 to our knowledge no full mitochondrial genomes are available. PCR publicly available Lepidoptera mitochondrial genomes that were used a primer pair that only amplifies the order tRNA-Met, tRNA-Ile, available at the cut-off date for our analysis, 07-05-2013 (Suppl. tRNA-Glu, with a primer in tRNA-Met (AAGCTWTTRRGYTCATACC) Table 1), and FeatureExtract (Wernersson, 2005) was used to and in tRNA-Ile (CTATCANAATANYCCTTT) and an annealing tem- extract the individual protein coding genes. Protein coding gene perature of 40 °C. matrices were parsed through ClustalW (Thompson et al., 1994) using the transAlign perl script (Bininda-Emonds, 2005) for codon-based alignment. Uncertainly aligned regions were removed 3. Results using the GBlocks program (Castresana, 2000), using ‘‘minimum length of a block: 10’’ and ‘‘allowed gap positions: with half’’. The 3.1. Data quality and assembly rRNA genes were aligned using MAFFT (http://align.bmr.kyushu-u. ac.jp/mafft/) using the Q-INS-i method (Katoh et al., 2005; Katoh A total of 333,246 read-pairs were generated for the sequence et al., 2009) and regions of uncertain alignment were pinpointed library. After quality control 322,129 R1 reads and 317,979 R2 using Aliscore (Misof and Misof, 2009) and removed. Protein cod- reads were retained with a mean sequence length of 249 bp, which ing and rRNA genes were concatenated into a single data matrix represented 316,940 full sequence pairs. Following assembly and (named ‘‘nc123’’). Saturation plots, where F84 genetic distance manual concatenation of unassembled fragments, 23 full and 6 was plotted against transitions and transversions, were generated partial lepidopteran mitochondrial genomes were obtained for this data matrix using DAMBE (Xia and Xie, 2001). (Table 1; Fig. 2), which covered nearly all the added DNA template. Two additional data matrices were generated from this concat- The only genome for which we did not obtain the full sequence of enated dataset by recoding nucleotide positions. The first recoding the added PCR product was M. calthella, for which ca. half of the scheme involved recoding the first codon position of all protein- SytbF-JerryRC fragment was missing. The bait sequences unequiv- coding genes to either purines or pyrimidines (RY) and removing ocally assigned all contigs to species. the fast evolving third codon positions (Hassanin, 2006; Pons Reads were mapped back onto final contigs, which revealed the et al., 2010). The second scheme applied the Degen script (Regier mean coverage to be very high, but also very variable (mean cover- et al., 2010; Zwick et al., 2012) to convert every codon to a ‘degen- age: 296, stdev: 169; Fig. 2). In general, highest coverage was erated’ one, changing all synonymous sites to IUPAC ambiguity observed for the overlapping fragment in CytB that was present nomenclature. The datasets were split in 41 (nc123 dataset, degen on both LR-PCR products. In addition, coverage of the PCR fragment dataset) and 28 partitions (RY coded dataset) respectively. Parti- that includes the AT-rich region was generally higher than cover- tionFinder (Lanfear et al., 2012) was subsequently used to merge age of the other half of the genome. The mean coverage was lowest these partitions in the most optimal scheme and select a model for Apatelopteryx phenax (Lasiocampidae), but still more than 40X. for each of the predicted partition sets using the Bayesian Informa- tion Criterion. This procedure followed a two-tiered approach. 3.2. Data matrix, model selection and partitioning schemes Within PartitionFinder RAxML (version 7.4.1) (Stamatakis et al., 2005) was first used to obtain partition scheme, after which PhyML The new sequences were combined with 77 publicly available (version 20111219) (Guindon and Gascuel, 2003) was applied to mitochondrial genomes to generate a data matrix of 13,311 base select the best fitting model for each partition. Phylogenetic anal- pairs, of which 4.3% was missing or represented gaps. yses were performed on the CIPRES portal (Miller et al., 2010) The saturation plot (Supplementary Fig. 1) on the concate- using MrBayes 3.2.2 (Ronquist et al., 2011) with two independent nated dataset revealed that one species (M. calthella) was highly runs each with four chains. The analysis was run for 12 million divergent compared to the remaining taxa. This species repre- generations with sampling every 100 generations. The first 3 mil- sented the sister lineage of all other taxa included in this study lion generations (25%) were discarded as burn-in. and the high divergence is therefore not surprising—in particular All analyses were performed with (106-taxa dataset) and as none of the seven superfamilies traditionally placed between without (105-taxa dataset) Micropterix calthella, a species which Micropterigoidea and Hepialoidea (e.g. Kristensen and Skalski, M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178 173

1999) were included in the analyses as material of sufficient qual- In agreement with above-mentioned saturation plots the third ity was not available at the time of the sequence run. However, as codon positions, which are highly degenerate, showed the highest its divergence from the others might affect its correct placement substitution rates, mainly involving transitional changes. The third (due to missing data and long-branch attraction), we decided to codon positions of the forward strand and third codon positions on perform phylogenetic analyses with and without this species. This the reverse strand were grouped in two separate partitions that confirmed the divergence and difficult placing of this lineage differed in their nucleotide base frequencies, with the ‘forward’ (Supplementary Fig. 2). We therefore excluded M. calthella from partition being somewhat richer in C and the reverse partition further analyses and used the two representatives of Hepialoidea richer in G, reflecting the well-known CG bias between the two as outgroups, focussing our study on the large lepidopteran mitochondrial strands (e.g. Pons et al., 2010). subgroup which contains > 99% of all Lepidoptera The remaining two partitions contained the first codon posi- (Kristensen and Skalski, 1999; van Nieukerken et al., 2011)and tions of the cytochrome genes and the first codon positions of has appeared as monophyletic in all previous studies of the ND genes (combined with two codon positions of ATP8 and Lepidoptera phylogeny. the two rRNA genes) and showed intermediate rates, again biased PartitionFinder suggested five partitions to be optimal for the towards transitional changes. It is worth mentioning that ND1, nc123 datasets. The saturation plot indicated substitutional satura- ND2, ND3, ND4, ND4L and ND5 and ND6 are all sub-units of com- tion, specifically involving transitional changes at third codon posi- plex I of the mitochondrial respiratory chain. Although highly spec- tions. Nucleotide recoding reduced the number of partitions to ulative, their grouping might therefore reflect a constraint imposed three, indicating that both recoding schemes were effective in by their close interaction in this key enzyme. The same might hold reducing saturation and/or compositional heterogeneity (Fig. 3). true for the partition that contains the first codon positions of COI, Removing the incomplete genomes had only a minor effect on par- COII and COIII. These three genes are all subunits of complex IV of tition schemes, only affecting the placement of the second codon the respiratory chain. positions of ND4L (nc123 datasets) and ATP8 (degen datasets). For the 105-taxa datasets, the main difference between the degen 3.4. Problems with existing Genbank data and RY-coded partition schemes is that the two rRNA genes were assigned a separate partition for the RY-coded dataset, whereas We acknowledge that the existing public domain data surely they were merged with most of the first codon positions of the contain errors. This particularly applies to unpublished submis- degen alignment. Strikingly, for the nc123 datasets the ‘‘cyto- sions, where we observed highly similar sequence stretches chrome’’ genes all differed in the assignment of their first codon between about 300–2000 bp in length that might indicate foreign position (assigned to partition 1) compared to the NADH genes contaminations. This is found between Celastrina hersilia and on the forward strand (assigned to partition 4), whereas their sec- Parnassius bremeri, Papilio xuthus and Teinopalpus aureus, and ond and third codon positions were assigned to the same partitions Troides aeacus and Luehdorfia chinensis. We also note that the (partition 2 and 3 respectively). The NADH genes on the forward sequence labelled ‘‘Celastrina hersilia’’ represents Cyaniris semiargus strand differed from the NADH genes on the reverse strand in that according a query of the COI-5P (the DNA barcode fragment) on their third codon position was assigned to a different partition BOLD. Furthermore, although it does not contain long obviously (partition 3 and 5 respectively), but their first codon positions shared segments, the Ochlodes venata sequence seems likely to be shared a partition (partition 4) (Fig. 3). chimaeric, with mixed sequence from a contaminating family (perhaps to which ‘‘C. hersilia’’ belongs); indeed on 3.3. Base frequencies and substitution rates in the mitochondrial BOLD its COI-5P is 5.4% divergent from a genuine Ochlodes venata genome (JQ922041). These errors likely caused some topological artefacts such as the wrong placement of the so-labelled C. hersilia and Each of the five partitions of the non-recoded nc123 datasets Ochlodes venata that group with the wrong families in the 96 and (Fig. 3) differed in base frequencies and/or substitution rates as 105-exemplar nc123 analyses (Supplementary Fig. 3). It appears estimated by MrBayes (Supplementary Fig. 3 for nc123 105-taxa that the recoding schemes that we applied corrects these problems dataset). As expected, the partition that contained the majority of to some extent. This might be due to the schemes primarily second codon positions showed the lowest rate of substitution, reducing the signal of the most variable first and third codon reflecting the absence of degeneracy of these sites and the positions, thereby (e.g. in the case of Ochlodes venata) changing associated constraint on mutational change. the relative amount of suspect sequence compared to genuine data.

Fig. 3. Partition schemes selected by PartitionFinder for each of the 105 and 96 taxa data matrices. Each coloured box represents one of the three codon positions of a gene. Numbers in the boxes refer to the models that were selected and are: nc123: (1) GTR + I + G, (2) GTR + I + G, (3) GTR + G, (4) GTR + I + G, (5) GTR + G; Degen: (1) GTR + I + G, (2) GTR + I + G, (3) GTR + G; RY: (1) SYM + G, (2) GTR + I + G, (3) GTR + I + G (RNA had a unique partition in this case). 174 M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178

3.5. Lepidopteran relationships and the effect of recoding and the RY-105 analysis. The expanded Obtectomera (including the last mentioned four superfamilies and, in the 105-taxa analy- All analyses resulted in highly comparable trees (Fig. 4, Supple- ses, Epermenioidea) are recovered in all analyses except the mentary Fig. 4), yet with some noteworthy differences between nc123 and degen 105-taxa analyses, but again, the support values analyses. Here we focus largely on a comparison of RY and degen were variable (Fig. 4, Supplementary Fig. 3). recoding, with reference to the nc123 analysis where necessary. Within the Papilionoidea, the well-known relationship of Lyca- Tortricoidea are recovered as sister to the remaining Apoditry- enidae (presumed sister to the unsampled ) + Nymphal- sia in all analyses except the 105-taxa degen analysis, but with idae is recovered in all analyses except for the non-recoded (nc123) variable support (Fig. 4, Supplementary Fig. 3). Urodoidea are 96-taxa and 105-taxa analyses, where the hesperiid Ochlodes recovered as sister to the ‘‘Obtectomera sensu lato’’ in all the 96- venata is placed as sister to Lycaenidae (Fig. 4, Supplementary taxa analyses (Fig. 4, Supplementary Fig. 3), while this superfamily Fig. 3). This could be related to problems with the ‘‘Celastrina her- is placed in an unresolved basal apoditrysian polytomy in the 105- silia’’ sequence, see above. All analyses place Doidae as well sup- taxa analyses (Fig. 4, Supplementary Fig. 3). A sister relationship ported sister to our exemplar of Drepanoidea. Both degen-96 and between Alucitidoidea + Pterophoroidea is recovered in all analy- degen-105 analyses support a sister group relationship between ses, albeit with variable support. A sister relationship between Mimallonidae and Lasiocampoidea. However, in contrast, the 96 Thyridoidea and Gelechioidea is recovered in all 96-taxa analyses and 105 RY-coded datasets place Lacosoma as a well-supported

Fig. 4. Phylogenetic relationships among Lepidoptera. Left: based on the RY analysis of the 96-taxa data matrix. The preferred partition scheme was identical to the one for the 105 taxa RY recoded dataset as presented in Fig. 3. Major superfamilies are highlighted by coloured branches. Numbers are Bayesian posterior probabilities. Note that some very long branches have been truncated (indicated by cross-hatching) to fit the tree in the figure space. Family codes precede the species names using the first two letters of the family name (Table 1 and Supplementary Table 1) with the exception of He: Hesperiidae, Hp: , Ti: Tischeriidae and Tn: Tineidae. Right: Schematic representation of the topologies that were obtained for the RY and degen recoded 105 taxa data matrices. M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178 175 sister to the remaining Macroheterocera (Fig. 4; Supplementary We base our discussion on the results from the analyses of the Fig. 3). Within the ‘core’ macroheterocerans, all analyses except recoded RY and degen datasets (Fig. 4), as saturation effects might degen-105 taxa place and as sister taxa have affected the results from the analyses of the non-recoded with high support. Overall, nodes within the superfamilies are well datasets, resulting in spurious groupings not reflecting evolution- supported for each of the datasets, and we treat these results in ary relationships (Supplementary Fig. 3). Our analyses represents more detail in the discussion section. the most comprehensive mitochondrial genome based phyloge- netic study of Lepidoptera and reveal a number of interesting, 3.6. The effects of missing data well-supported relationships in Lepidoptera phylogeny, most of which are in agreement with nuclear DNA studies of Regier et al. When we compare the 96-taxa and 105-taxa analyses, it is clear (2009, 2013), Cho et al. (2011) and the COI + nuclear DNA study that the incomplete sequences affect support values in the part of of Mutanen et al. (2010). Ditrysia are monophyletic and Tineoidea the trees where they are placed in the 105-taxa analyses (mainly in are supported as the sister of the remaining Ditrysia (note that the lower apoditrysian grade), and that some of the taxa in ques- Regier et al. (2013), Bazinet et al. (2013) found the former super- tion are placed spuriously. For example, the RY-105 analysis rather family to constitute a grade), with Gracillarioidea + Yponomeutoi- implausibly placed the incomplete Zygaenoidea sequence as sister dea comprising the next lineage within Ditrysia as the sister to to other apoditrysians including Tortricidae. Otherwise, inclusion Apoditrysia. None of the analyses resulted in a monophyletic of incomplete mitochondrial genomes generally resulted in some , and greater sampling there is needed in future. resolution among the lower apoditrysian superfamilies. For Apoditrysia are recovered as monophyletic, but Tortricoidea are example, the RY-105 analysis recovers the two Cossoidea families placed unconventionally as the sister group of the remaining clade. as sisters, albeit with low support (Fig. 4). Within Obtectomera However, this is an arrangement previously suggested by the non- sensu lato (Fig. 4), there is weak support for a sister relationship synonymous ‘degen1’ 38-taxon trees in Bazinet et al. (2013; Fig. 4 between Epermenioidea and the plume superfamilies and 5). In the 46-taxon tree in the same study, Urodoidea and Alucitidoidea + Pterophoroidea in both the recoded 105-taxa Tortricoidea switch places respective to our 96-exemplar result. analyses, but the nc123 analysis fail even to place the latter two Other relationships among lower apoditrysians remain largely superfamilies in Obtectomera. unresolved, likely due to our modest level of superfamily sampling from this section of Lepidoptera (our study is missing eight putative lower apoditrysian superfamilies). Even Bazinet et al.’s 3.7. Gene order 741 nuclear genes study using 40 apoditrysian exemplars failed to resolve the lower Apoditrysia (their Fig. 1), but they too The above-mentioned derived tRNA translocation appears to be were missing six of the 26 apoditrysian superfamilies. Within important for determining the relationship between Ditrysia, Tortricidae, our results confirm the well-known split between Tischeriidae and the remaining higher non-ditrysian superfamilies. Tortricinae and Olethreutinae (e.g. Regier et al., 2012a). When analysing newly generated full genome sequences we The support we found for a broadened Obtectomera (Obtecto- discovered that the Tischeriidae share this translocation with the mera sensu lato) with Pterophoroidea, Thyridoidea and Gelechioi- Ditrysia, but the Nepticulidae do not. To determine whether this dea all grouping together with the more conventional (‘‘core’’) translocation could be a potential synapomorphy for Ditrysia and obtectomerans (Papilionoidea, Pyraloidea, Mimallonioidea and Tischeriidae, we sequenced the relevant region in the mt-genome Macroheterocera) is in line with the study of Bazinet et al. in an Australian Palaephatoidea and an Adeloidea (two superfam- (2013). Significantly, Gelechioidea never grouped with other lower ilies that where were not included here in the full mt-genome Apoditrysia (note that van Nieukerken et al., 2011 for lack of con- sequencing) using standard PCR. We found that the derived trans- sensus among taxonomists simply arranged them at the base of the location was present in Palaephatoidea but not in Adeloidea Apoditrysia). We found some support in the RY-96 tree for a sister (Fig. 5), supporting the hypothesis that Palaephatidae (at least, relationship of Gelechioidea to the traditionally obtectomeran Australian members currently ascribed to the family) and superfamily Thyridoidea, but admittedly our taxon sampling may Tischeriidae are closest relatives to the Ditrysia, which is also in be too limited to draw any firm conclusion here. Also, in the ML agreement with recent phylogenetic hypotheses (Mutanen et al., trees of Bazinet et al. (2013) Gelechioidea form a clade with 2010; Regier et al., 2013). Pterophoroidea and Thyridoidea. A close relationship between Pterophoroidea and Alucitoidea (the two superfamilies of plume 4. Discussion moths), which is supported in both 96-taxa trees, is in line with earlier suggestions based on morphology (e.g. Kristensen and Our study greatly augments the phylogenetic scope of available Skalski, 1999). In both 105-taxa trees Pterophoroidea and Alucitoi- lepidopteran mitochondrial genomes, adding 16 new superfamilies dea group with Epermenioidea (an incomplete sequence, thus to the eight previously available, and 23 new full mitochondrial absent from 96-exemplar analyses), albeit weakly supported. Note genomes to the 77 available at the time of our analysis. also that in the 96-taxa RY tree, the ‘‘plume moths’’ are supported as sister to the ‘‘core’’ obtectomerans. As other studies have also reported, Papilionoidea are not mac- rolepidoptera as formerly conceived e.g. by Kristensen and Skalski (1999), but rather Pyraloidea are the sister of the bulk of ‘classical macro moths’ (Macroheterocera + Mimallonidae). Papilionoidea have probably been the subject of more phylogenetic studies than almost any comparable group of hexapods (e.g. Simonsen et al., 2012), and are similarly well represented in our analyses. In the present study, we were unable to include Hedylidae, Calliduloidea and Hyblaeoidea, and thus to test close relationships with Papilio- noidea as suggested by Regier et al. (2009), Mutanen et al. (2010), Fig. 5. Phylogenetically informative tRNA re-arrangement hypothesized to be synapomorphic to Ditrysia, Tischeriidae and Australian Palaephatoidea (if not the and Heikkila et al. (2012). However, our results otherwise confirm unsequenced Andesianidae). the overall results in these studies in as much as Hesperiidae are 176 M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178 more closely related to the clade comprising , Lycaenidae roceran superfamilies with strong support. A position within Mac- and Nymphalidae, than any of these are to Papilionidae. The last roheterocera would be in line with morphological views, while the three studies had Thyrididae as closely related to Papilionoidea. topology of the RY-coded dataset agrees with Fig. 2 of Bazinet et al. Here, however, Thyrididae are placed in the basal obtectomeran (2013) in which it is sister to it. The second contentious taxon, Doi- polytomy as in Bazinet et al. (2013), but that study did not sample dae, has variously been placed in Noctuoidea based on morphology Papilionoidea, Hedylidae, Calliduloidea or Hyblaeoidea. Within (Kitching and Rawlins, 1999), or as mentioned above in a clade Papilionidae, the subfamilies Parnassinae and Papilioninae are with Mimallonidae and Drepanoidea based on molecular data recovered as monophyletic in accordance with previous molecular (Regier et al., 2009, 2013; Mutanen et al., 2010 did not include and morphological studies (e.g. Miller, 1987; Nazari et al., 2007; Doidae), or within the new concept of Drepanoidea, as sister to Simonsen et al., 2011), but contrary to Simonsen et al. (2011), Drepanidae (Bazinet et al., 2013). Here our results agree with Teinopalpini (represented only by Teinopalpus) and not Troidini Regier et al. (2009, 2013) and Bazinet et al. (2013) by placing (represented only by Troides) are the sister of Papilio; this may Doidae as the sister of Drepanidae with strong support, for both however reflect problematic public gene data (see above). Within RY and degen analyses. The support in RY trees for a clade consist- the family Nymphalidae, the most remarkable result is that milk- ing of Bombycidae + Sphingidae rather than the more convention- weed butterflies (Danainae) are placed as the sister to the remain- ally illustrated Sphingidae + grouping among the "SBS" der of the family. This is in conflict with both Heikkila et al. (2012) group of Zwick et al. (2011) also deserves highlighting. We note where Danainae together with Libytheinae form the sister group of that prior studies have not mostly shown a strongly supported res- the rest of the family, and the two most comprehensive studies olution within this clade, but see Breinholt and Kawakara (2013). focused on Nymphalidae: Freitas and Brown (2004: morphology), Finally and relevant to the deep phylogeny of the Lepidoptera, and Wahlberg et al. (2009: molecules), which both have Libythei- we found that the tRNA rearrangement previously reported for Dit- nae as the sister of the remaining Nymphalidae. However, our rysia is present in Tineoidea, Tischerioidea and Palaephatoidea but results in this respect are identical to the mt-genomic results by not in the monotrysian Adeloidea (Fig. 5). This synapomorphy Lu et al. (2013), and as noted by Simonsen et al. (2012), establish- therefore supports the mitochondrial topologies presented here ing which taxon is the sister group of the remaining Nymphalidae and other recent morphological and molecular studies (Mutanen seems to be one of the major challenges in butterfly systematics. et al., 2010; Regier et al., 2013), which showed Tischeriidae (and Within Nymphalidae, the sister group relationship between Helic- Palaephatidae) to be more closely related to Ditrysia than to oniinae and Limenitidinae, and the close relationship between Adeloidea. Nymphalinae and Apaturinae respectively, are in agreement with We emphasise the efficiency of our approach in obtaining these previous molecular studies (i.e. Brower, 2000; Wahlberg et al., mitochondrial genomes and phylogenetic results. We applied a 2003, 2009; Wahlberg and Wheat, 2008), but not with the morpho- cheap, next generation sequence (Illumina Solexa) based approach. logical study by Freitas and Brown (2004). Even though we aimed for only 1% of a MiSeq (Reagent Kit v2) While our sampling of Pyraloidea comprises only five of the 10 run capacity, sequence output was higher than needed with a subfamilies in Crambidae and one of the five subfamilies of Pyral- mean mitochondrial genome coverage of 300x. Extrapolating idae (and none of the three unplaced subfamilies) that are recogni- these results suggests that more than 3000 full mitochondrial sed by Regier et al. (2012b), our results do confirm the monophyly genomes can be obtained in a single sequence run. Here we of Crambidae (albeit with only one subfamily to test it used relatively expensive ‘Sanger bait sequences’ to associate against), and the sister group relationship between Spilomelinae mitochondrial genome contigs to specimens. To further reduce and Pyraustinae found by Regier et al. (2012b). But in contrast to costs, one could rely on the comprehensive BOLD barcode database that study, we find that Acentropinae, and not Crambinae, appears and replace custom Sanger baits with publicly available COI to be the sister group of . However, given our relatively sequences. limited taxon sampling, we do not necessarily consider this result conclusive. 5. Conclusion The last area where our taxon sampling is sufficient for us to make significant comparisons is the Macroheterocera (sensu van Overall, our study reveals the promise of mitochondrial gen- Nieukerken et al., 2011). The most problematic higher taxa to place omes for deep level phylogenetics of Lepidoptera, but also its lim- within Macroheterocera have generally been the small, enigmatic itations. As with previous studies, we fail to resolve conclusively families Mimallonidae and Doidae. Based on morphology, Mimal- the Apoditrysia. This is unsurprising due to limited taxon sampling, lonidae have been placed in (Kristensen and and it could well be that even adding a single exemplar represen- Skalski, 1999; Lemaire and Minet, 1999), but molecular analyses tation of each remaining superfamily would resolve their relation- have continuously failed to support this placement: while Regier ships. We find it encouraging that we can, independently of et al. (2009, 2013) consistently placed Mimallonidae in a clade nuclear genome information, recover the expanded concepts of with Drepanoidea, Doidae and (not sampled here). clades as deep as Macroheterocera and Obtectomera, and also Mutanen et al. (2010) recovered the family either as the sister the Apoditrysia, with representatives of a limited set of superfam- group of Gelechioidea (their Fig. 2, albeit with negligible posterior ilies. Moreover, the method we present here paves the way for a support), or as sister to Macroheterocera (their Fig. 1 and their Sup- phylogenetically comprehensive data sampling of lepidopteran plementary Fig. 1), with poor (60%) bootstrap support. We do not mitochondrial genomes including most of the 133 families. While find any support for including Mimallonidae in either Bombycoi- analyses of such a dataset on its own would surely provide consid- dea or in a clade including Drepanidae and Doidae. Our degen erable new insights into lepidopteran evolution, combining it with and nc123 analyses rather unconventionally recover Mimallonoi- already existing nuclear datasets could finally result in a strong dea as sister to Lasiocampoidea whereas the RY analyses more con- phylogenetic signal from different loci and gene histories. servatively recover Mimallonoidea as sister to the remaining Macroheterocera. All our analyses thus strongly support a close relationship between Mimallonoidea and Macroheterocera. The Acknowledgments RY result is also in agreement with Mutanen et al. (2010; their Sup- plementary Fig. 1) and with the 741-gene study of Bazinet et al. We are very grateful to Charles W. Mitter and Kim T. Mitter, (2013), who find Mimallonidae sister to the main five macrohete- Department of Entomology, University of Maryland, USA for M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178 177 donation of crucial specimens for sequencing from the LepTree tis- Kitching, I.J., Rawlins, J.E., 1999. The Noctuoidea. In: Kristensen, N.P. (Ed.), sue collection, and Edward (Ted) Edwards, Australian National Handbook of Zoology, Volume 1: Evolution, Systematics, and Biogeography. Walter de Gruyter, Berlin, New York, pp. 355–401. Insect Collection for loan of Australian Palaephatidae for the anal- Kristensen, N.P., Skalski, A.W., 1999. Phylogeny and palaeontology. In: Kristensen, yses. Library preparation and sequencing was performed at the N.P. (Ed.), Handbook of Zoology, Volume 1: Evolution, Systematics, and Department of Biochemistry (University of Cambridge). This work Biogeography. Walter de Gruyter, Berlin, New York, pp. 7–25. Lanfear, R., Calcott, B., Ho, S.Y.W., Guindon, S., 2012. PartitionFinder: is supported by the Initiative of the NHM London. combinedselection of partitioning schemes and substitution models for DCL’s sampling was supported by NSF #8316-07 and ERC grant phylogenetic analyses. Mol. Biol. Evol. 29, 1695–1701. EMARES #250325. MJTNT is a NERC Postdoctoral Fellow (NE/ Lemaire, C., Minet, J., 1999. The Bombycoidea and their relatives. In: Kristensen, N.P. (Ed.), Handbook of Zoology, Volume 1: Evolution, Systematics, and I021578/1). The authors thank M. Djernæs, I.J. Kitching and A.P. Biogeography. Walter de Gruyter, Berlin, New York, pp. 321–353. Vogler from NHM London for discussions and comments. This Lohse, M., Bolger, A.M., Nagel, A., Fernie, A.R., Lunn, J.E., Stitt, M., Usadel, B., 2012. study used the computational infrastructure of the Bioinformatics RobiNA: a user-friendly, integrated software solution for RNA-Seq- based transcriptomics. Nucleic Acids Res. 40, W622–W627. Support Service (Imperial College London; maintained by James Lu, H.-F., Su, T.-J., Luo, A.R., Zhu, C.-D., Wu, C.-S., 2013. Characterization of the Abbott) and of the Department of Life Sciences (NHM London; complete mitochondrion genome of diurnal moth Amata emma (Butler) maintained by Peter Foster). (Lepidoptera: ) and its phylogenetic implications. PLoS ONE 8, e72410. Miller, J.S., 1987. Phylogenetic studies in the Papilioninae (Lepidoptera, Appendix A. Supplementary material Papilionidae). Bull. Am. Mus. Nat. Hist. 186, 365–512. Miller, M.A., Pfeiffer, W., Schwartz, T., 2010. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. In: Proceedings of the Gateway Supplementary data associated with this article can be found, in Computing Environments Workshop (GCE), 14 Nov. 2010, New Orleans, LA, pp. the online version, at http://dx.doi.org/10.1016/j.ympev.2014.05. 2011–2018. 031. Misof, B., Misof, K., 2009. A Monte Carlo approach successfully identifies randomness in multiple sequence alignments: a more objective means of data exclusion. Syst. Biol. 58, 21–34. References Mutanen, M., Wahlberg, N., Kaila, L., 2010. Comprehensive gene and taxon coverage elucidates radiation patterns in moths and butterflies. Proc. R. Soc. Biol. Sci. Ser. B 277, 2839–2848. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local Nazari, V., Zakharov, E.V., Sperling, F.A.H., 2007. Phylogeny, historical biogeography, alignment search tool. J. Mol. Biol. 215, 403–410. and taxonomic ranking of Parnassiinae (Lepidoptera, Papilionidae) based on Bazinet, A.L., Cummings, M.P., Mitter, K.T., Mitter, C.W., 2013. Can RNA-Seq resolve morphology and seven genes. Mol. Phylogenet. Evol. 42, 131–156. the rapid radiation of advanced moths and butterflies (Hexapoda Lepidoptera: Pons, J., Ribera, I., Bertranpetit, J., Balke, M., 2010. Nucleotide substitution rates for Apoditrysia)? An exploratory study. PLoS ONE 8, e82615. the full set of mitochondrial protein-coding genes in Coleoptera. Mol. Bininda-Emonds, O.R.P., 2005. TransAlign: using amino acids to facilitate the Phylogenet. Evol. 56, 796–807. multiple alignment of protein-coding DNA sequences. BMC Bioinformatics 6, Regier, J.C., Zwick, A., Cummings, M.P., Kawahara, A.Y., Cho, S., Weller, S., Roe, A., 156. Baixeras, J., Brown, J.W., Parr, C., Davis, D.R., Epstein, M., Hallwachs, W., Breinholt, J.W., Kawahara, A.Y., 2013. Phylotranscriptomics: saturated third codon Hausmann, A., Janzen, D.H., Kitching, I.J., Solis, M.A., Yen, S.-H., Bazinet, A.L., positions radically influence the estimation of trees based on next-gen data. Mitter, C., 2009. Toward reconstructing the evolution of advanced moths and Genome Biol. Evol. 5, 2082–2092. butterflies (Lepidoptera: Ditrysia): an initial molecular study. BMC Evol. Biol. 9, Brower, A.V.Z., 2000. Phylogenetic relationships among the Nymphalidae 280. (Lepidoptera) inferred from partial sequences of the wingless gene. Proc. R. Regier, J.C., Shultz, J.W., Zwick, A., Hussey, A., Ball, B., Wetzer, R., Martin, J.W., Soc. Biol. Sci. Ser. B 267, 1201–1211. Cunningham, C.W., 2010. relationships revealed by phylogenomic Cao, Y.-Q., Ma, C., Chen, J.-Y., Yang, D.-R., 2012. The complete mitochondrial analysis of nuclear protein-coding sequences. Nature 463, 1079–1083. genomes of two ghost moths, Thitarodes renzhiensis and Thitarodes yunnanensis: Regier, J.C., Brown, J.W., Mitter, C., Baixeras, J., Cho, S., Cummings, M.P., Zwick, A., the ancestral gene arrangement in Lepidoptera. BMC Genomics 13, 276. 2012a. A molecular phylogeny for the leaf-roller moths (Lepidoptera: Castresana, J., 2000. Selection of conserved blocks from multiple alignments for Tortricidae) and its implications for classification and life history evolution. their use in phylogenetic analysis. Mol. Biol. Evol. 17, 540–552. PLoS ONE 7, e35574. Cho, S., Zwick, A., Regier, J.C., Mitter, C., Cummings, M.P., Yao, J., Du, Z., Zhao, H., Regier, J.C., Mitter, C., Solis, M.A., Hayden, J.E., Landry, B., Nuss, M., Simonsen, T.J., Kawahara, A.Y., Weller, S., Davis, D.R., Baixeras, J., Brown, J.W., Parr, C., 2011. Yen, S.-H., Zwick, A., Cummings, M.P., 2012b. A molecular phylogeny for the Can deliberately incomplete gene sample augmentation improve a phylogeny pyraloid moths (Lepidoptera: Pyraloidea) and its implications for higher-level estimate for the advanced moths and butterflies (Hexapoda: Lepidoptera)? Syst. classification. Syst. Entomol. 37, 635–656. Biol. 60, 782–796. Regier, J.C., Mitter, C., Zwick, A., Bazinet, A.L., Cummings, M.P., Kawahara, A.Y., Sohn, Eddy, S.R., Durbin, R., 1994. RNA sequence analysis using covariance models. J.-C., Zwickl, D.J., Cho, S., Davis, D.R., Baixeras, J., Brown, J., Parr, C., Weller, S., Nucleic Acids Res. 22, 2079–2088. Lees, D.C., Mitter, K.T., 2013. A large-scale, higher-level, molecular phylogenetic Freitas, A.V.L., Brown, K.S., 2004. Phylogeny of the nymphalidae (Lepidoptera). Syst. study of the insect order Lepidoptera (moths and butterflies). PLoS ONE 8, Biol. 53, 363–383. e58568. Grimaldi, D., Engel, M.S., 2005. Evolution of the Insects. Cambridge University Press, Ronquist, F., Huelsenbeck, J.P., Teslenko, M., 2011. Draft MrBayes version 3.2 New York. Manual. Tutorials and Model Summaries. . large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704. Schmieder, R., Edwards, R., 2011. Quality control and preprocessing of metagenomic Haran, J., Timmermans, M.J.T.N., Vogler, A.P., 2013. Mitogenome sequences stabilize datasets. Bioinformatics 27, 863–864. the phylogenetics of weevils (Curculionoidea) and establish the monophyly of Simon, C., Frati, F., Beckenback, A., Crespi, B., Liu, H., Flook, P., 1994. Evolution, larval ectophagy. Mol. Phylogenet. Evol. 67, 156–166. weighting, and phylogenetic utility of mitochondrial gene sequences and a Hassanin, A., 2006. Phylogeny of Arthropoda inferred from mitochondrial sequences compilation of conserved polymerase chain reaction. Ann. Ent. Soc. Am. 87, strategies for limiting the misleading effects of multiple changes in pattern and 651–701. rates of substitution. Mol. Phylogenet. Evol. 38, 100–116. Simon, S., Hadrys, H., 2013. A comparative analysis of complete mitochondrial Heikkila, M., Kaila, L., Mutanen, M., Pena, C., Wahlberg, N., 2012. Cretaceous origin genomes among Hexapoda. Mol. Phylogenet. Evol. 69, 393–403. and repeated tertiary diversification of the redefined butterflies. Proc. R. Soc. Simonsen, T.J., Zakharov, E.V., Djernaes, M., Cotton, A.M., Vane-Wright, R.I., Sperling, Biol. Sci. Ser. B 279, 1093–1099. F.A.H., 2011. Phylogenetics and divergence times of Papilioninae (Lepidoptera) Kaila, L., Mutanen, M., Nyman, T., 2011. Phylogeny of the mega-diverse Gelechioidea with special reference to the enigmatic genera Teinopalpus and Meandrusa. (Lepidoptera): adaptations and determinants of success. Mol. Phylogenet. Evol. Cladistics 27, 113–137. 61, 801–809. Simonsen, T.J., de Jong, R., Heikkila, M., Kaila, L., 2012. Butterfly morphology in a Karsholt, O., Mutanen, M., Lee, S., Kaila, L., 2013. A molecular analysis of the molecular age – does it still matter in butterfly systematics? Arthropod Struct. Gelechiidae (Lepidoptera, Gelechioidea) with an interpretative grouping of its Dev. 41, 307–322. taxa. Syst. Entomol. 38, 334–348. Sohn, J.-C., Regier, J.C., Mitter, C., Davis, D., Landry, J.-F., Zwick, A., Cummings, M.P., Katoh, K., Kuma, K., Toh, H., Miyata, T., 2005. MAFFT version 5: improvement in 2013. A molecular phylogeny for Yponomeutoidea (Insecta, Lepidoptera, accuracy of multiple sequence alignment. Nucleic Acids Res. 33, 511–518. Ditrysia) and its implications for classification, biogeography and the Katoh, K., Asimenos, G., Toh, H., 2009. Multiple alignment of DNA sequences with evolution of host plant use. PLoS ONE 8, e5506. MAFFT. Methods Mol. Biol. 537, 39–64. Stamatakis, A., Ludwig, T., Meier, H., 2005. RAxML-III: a fast program for maximum Kim, M.J., Kang, A.R., Jeong, H.C., Kim, K.-G., Kim, I., 2011. Reconstructing likelihood-based inference of large phylogenetic trees. Bioinformatics 21, 456– intraordinal relationships in Lepidoptera using mitochondrial genome data 463. with the description of two newly sequenced lycaenids, Spindasis takanonis and Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the Protantigius superans (Lepidoptera: Lycaenidae). Mol. Phylogenet. Evol. 61, 436– sensitivity of progressive multiple sequence alignment through sequence 445. 178 M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178

weighting, position-specific gap penalties and weight matrix choice. Nucleic Wahlberg, N., Leneveu, J., Kodandaramaiah, U., Pena, C., Nylin, S., Freitas, A.V.L., Acids Res. 22, 4673–4680. Brower, A.V.Z., 2009. Nymphalid butterflies diversify following near demise at Timmermans, M.J.T.N., Vogler, A.P., 2012. Phylogenetically informative the Cretaceous/Tertiary boundary. Proc. R. Soc. Biol. Sci. Ser. B 276, 4295–4302. rearrangements in mitochondrial genomes of Coleoptera, and monophyly of Wernersson, R., 2005. FeatureExtract – extraction of sequence annotation made aquatic elateriform beetles (Dryopoidea). Mol. Phylogenet. Evol. 63, 299–304. easy. Nucleic Acids Res. 33, W567–W569. Timmermans, M.J.T.N., Dodsworth, S., Culverwell, C.L., Bocak, L., Ahrens, D., Xia, X., Xie, Z., 2001. DAMBE: software package for data analysis in molecular Littlewood, D.T.J., Pons, J., Vogler, A.P., 2010. Why barcode? High-throughput biology and evolution. J. Hered. 92, 371–373. multiplex sequencing of mitochondrial genomes for molecular systematics. Zahiri, R., Kitching, I.J., Lafontaine, J.D., Mutanen, M., Kaila, L., Holloway, J.D., Nucleic Acids Res. 38, e197. Wahlberg, N., 2011. A new molecular phylogeny offers hope for a stable family van Nieukerken, E.J., Kaila, L., Kitching, I.J., Kristensen, N.P., Lees, D.C., Minet, J., level classification of the Noctuoidea (Lepidoptera). Zool. Scr. 40, 158–173. Mitter, C., Mutanen, M., Regier, J.C., Simonsen, T.J., Wahlberg, N., Yen, S.-H., Zahiri, R., Holloway, J.D., Kitching, I.J., Lafontaine, J.D., Mutanen, M., Wahlberg, N., Zahiri, R., Adamski, D., Baixeras, J., Bartsch, D., Bengtsson, B.A., Brown, J.W., 2012. Molecular phylogenetics of Erebidae (Lepidoptera, Noctuoidea). Syst. Bucheli, S.R., Davis, D.R., de Prins, J., de Prins, W., Epstein, M.E., Gentili-Poole, P., Entomol. 37, 102–124. Gielis, C., Haettenschwiler, P., Hausmann, A., Holloway, J.D., Kallies, A., Karsholt, Zahiri, R., Lafontaine, D., Schmidt, C., Holloway, J.D., Kitching, I.J., Mutanen, M., O., Kawahara, A.Y., Koster, S., Kozlov, M.V., Lafontaine, J.D., Lamas, G., Landry, J.- Wahlberg, N., 2013a. Relationships among the basal lineages of Noctuidae F., Lee, S., Nuss, M., Park, K.-T., Penz, C., Rota, J., Schintlmeister, A., Schmidt, B.C., (Lepidoptera, Noctuoidea) based on eight gene regions. Zool. Scr. 42, 488–507. Sohn, J.-C., Solis, M.A., Tarmann, G.M., Warren, A.D., Weller, S., Yakovlev, R.V., Zahiri, R., Lafontaine, J.D., Holloway, J.D., Kitching, I.J., Schmidt, B.C., Kaila, L., Zolotuhin, V.V., Zwick, A., 2011. Order Lepidoptera Linnaeus, 1758. Zootaxa Wahlberg, N., 2013b. Major lineages of (Lepidoptera, Noctuoidea) 3148, 212–221. elucidated by molecular phylogenetics. Cladistics 29, 337–359. Wahlberg, N., Wheat, C.W., 2008. Genomic outposts serve the phylogenomic Zwick, A., Regier, J.C., Mitter, C., Cummings, M.P., 2011. Increased gene sampling pioneers: designing novel nuclear markers for genomic DNA extractions of yields robust support for higher-level clades within Bombycoidea lepidoptera. Syst. Biol. 57, 231–242. (Lepidoptera). Syst. Entomol. 36, 31–43. Wahlberg, N., Weingartner, E., Nylin, S., 2003. Towards a better understanding of Zwick, A., Regier, J.C., Zwickl, D.J., 2012. Resolving discrepancy between nucleotides the higher systematics of Nymphalidae (Lepidoptera: Papilionoidea). Mol. and amino acids in deep-level arthropod phylogenomics: differentiating serine Phylogenet. Evol. 28, 473–484. codons in 21-amino-acid models. PLoS ONE 7, e47450.