A Nuclear Gene for Higher Level Phylogenetics: Phosphoenolpyruvate Carboxykinase Tracks Mesozoic-Age Divergences Within Lepidoptera (Insecta)

Timothy P. Friedlander,* Jerome C. Regier,* Charles Mitter,† and David L. Wagner‡
*Center for Agricultural Biotechnology, University of Maryland; †Department of Entomology, University of Maryland; and ‡Department of Ecology and Evolutionary Biology, University of Connecticut

The sequence of phosphoenolpyruvate carboxykinase (PEPCK) has been previously identified as a promising candidate for reconstructing Mesozoic-age divergences (Friedlander, Regier, and Mitter 1992, 1994). To test this hypothesis more rigorously, 597 nucleotides of aligned PEPCK coding sequence (~30% of the coding region) were generated from 18 species representing Mesozoic-age lineages of moths (Insecta: Lepidoptera) and outgroup taxa. Relationships among basal Lepidoptera are well established by morphological analysis, providing a strong test for the utility of a gene which has not previously been used in systematics. Parsimony and other phylogenetic analyses were conducted on nucleotides by codon positions (nt1, nt2, nt3) separately and in combination, and on amino acids, for comparison to the test phylogeny. The highest concordance was achieved with nt1 + nt2, for which one of two most-parsimonious trees was identical to the test phylogeny, and with all nucleotides when nt3 was downweighted sevenfold or higher, for which a single most-parsimonious tree identical to the test phylogeny resulted. Substitutions in nt3 approached saturation in many, but not all, pairwise comparisons, and their exclusion or severe downweighting greatly increased the degree of concordance with the test phylogeny. Neighbor-joining analysis confirms this finding. The utility of PEPCK for phylogenetics is demonstrated over a time span for which few other suitable genes are currently available.
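The codon-position character sets (nt1, nt2, nt3) and the sevenfold downweighting of nt3 mentioned above can be illustrated with a minimal sketch. It assumes an in-frame coding sequence with no frame-shifting gaps; the function names and the toy sequence are hypothetical and are not taken from the software used in this study.

```python
from typing import Dict, List, Sequence


def split_codon_positions(coding_seq: str) -> Dict[str, str]:
    """Split an in-frame coding sequence into nt1, nt2 and nt3 character sets."""
    return {
        "nt1": coding_seq[0::3],  # first codon positions
        "nt2": coding_seq[1::3],  # second codon positions
        "nt3": coding_seq[2::3],  # third codon positions
    }


def site_weights(n_sites: int, scheme: Sequence[int] = (7, 7, 1)) -> List[int]:
    """Per-site weights for an nt1:nt2:nt3 weighting scheme (default 7:7:1)."""
    return [scheme[i % 3] for i in range(n_sites)]


toy = "ATGGCCAAGTTT"  # hypothetical four-codon fragment, not PEPCK data
parts = split_codon_positions(toy)
weights = site_weights(len(toy))
```

Under a 7:7:1 scheme, one change at a first or second position costs as much as seven changes at third positions, which is the sense in which nt3 is "downweighted sevenfold".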

Key words: concordance study, Lepidoptera, Mesozoic, molecular systematics, nuclear gene, phosphoenolpyruvate carboxykinase, phylogenetics, sequence character partitions, PEPCK.

Address for correspondence and reprints: Timothy Friedlander, Center for Agricultural Biotechnology, 2113 Agriculture/Life Sciences Surge Building, University of Maryland, College Park, Maryland 20742-3351. E-mail: [email protected].

Mol. Biol. Evol. 13(4):594-604. 1996
© 1996 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Introduction

Concordance among independent data sets is a principal criterion for robustness of phylogenetic hypotheses (e.g., Penny and Hendy 1986; Miyamoto and Cracraft 1991; Hillis 1995; Miyamoto and Fitch 1995). Therefore, molecular systematists require access to multiple unlinked gene sequences. Currently, only organellar and nuclear ribosomal DNAs, for which “universal” PCR primers are available, have been widely applied to systematic questions. Additional sequences are needed to address relationships on which organellar or ribosomal sequences prove uninformative or misleading.

Our laboratory has been systematically searching for protein-encoding nuclear gene sequences that will be phylogenetically useful at a variety of taxonomic levels in [...]. Fourteen candidates were identified in an initial screening based on criteria of gene size, structure, copy number, and conservation (Friedlander, Regier, and Mitter 1992). The phylogenetic information content of five of these for which enough metazoan sequences were available was tested by the criterion of concordance, that is, their ability to recover groupings securely established by previous evidence (Friedlander, Regier, and Mitter 1994). These studies confirmed that all five genes carry phylogenetic information and suggested further that they had utility spanning an enormous temporal range (<10 MYA to >500 MYA). For example, amino acid sequences of the highly conserved protein elongation factor-1α (EF-1α) have been used to reconstruct kingdom- and phylum-level phylogenies, dating to the Paleozoic and earlier (Creti et al. 1991; Rivera and Lake 1992). By contrast, synonymous nucleotide changes within the same gene have recently proven useful for inferring species- and [...]-level relationships (Cho et al. 1995). Our studies suggest that analysis of synonymous changes in many protein-encoding genes should prove useful for Tertiary-age divergences, with genes of highly conserved protein sequence having the additional advantages of unambiguous sequence alignment and relative ease of PCR primer definition.

Resolving Mesozoic-age systematic questions presents additional challenges because highly conserved protein sequences such as EF-1α may be insufficiently variable, while most synonymous changes in the same gene may be multiply substituted and uninterpretable, particularly without extensive taxon sampling. What is needed are genes in which nonsynonymous characters evolve more rapidly than in highly conserved sequences such as EF-1α, but considerably more slowly than synonymous changes in such genes. Even when such genes are found, their application raises challenges absent from the study of recent divergences. Primer definition will be less straightforward because the sequence is more variable. Furthermore, the issues of character weighting and data set partitioning will be relevant to analyzing these anciently diverged sequences. For example, if synonymous changes are indeed saturated, then their downweighting, or even removal from the data set, may be justified.

Our earlier study identified two nuclear genes, not previously exploited for systematics, which are likely to be informative about Mesozoic-age divergences: phosphoenolpyruvate carboxykinase (PEPCK; E.C. 4.1.1.32), the subject of this report, with approximately 1,941 bp coding sequence, and dopa decarboxylase (Friedlander, Regier, and Mitter 1994; Fang et al., in prep.). PEPCK


catalyzes the first step of gluconeogenesis, interconverting oxaloacetate and phosphoenolpyruvate. Gene copy number is low, possibly single copy in Lepidoptera and Diptera, based on genomic Southern hybridizations with a Drosophila probe (Friedlander, Regier, and Mitter 1992). There are similarly low estimates of copy number in vertebrates (Yoo-Warren et al. 1983; Hod, Yoo-Warren, and Hanson 1984), although a second, quite divergent, paralogous PEPCK sequence has been identified that is specifically targeted for export to the mitochondria (Weldon et al. 1990). PEPCK’s potential as a phylogenetic marker was supported by its recovery of expected relationships among six published sequences: two nematodes, an insect, a bird, rat, and human. However, our limited sampling of taxa did not permit a precise identification of the time frame over which PEPCK would be useful.

To gauge the phylogenetic utility of PEPCK more fully, we have applied this gene to a test case consisting of basal divergences within the insect order Lepidoptera, which are Mesozoic in age (fig. 1; Kukalova-Peck 1991; Ross and Jarzembowski 1993; Labandeira et al. 1994). Intensive research on primitive Lepidoptera over the past 2 decades has yielded strong morphological evidence from all life stages on many, though not all, aspects of basal phylogeny (Kristensen 1984, 1994; Davis 1986; Kobayashi and Ando 1988; Nielsen and Common 1991). Although this morphological phylogeny is not beyond all doubt, it is surely approximately correct. Numbers of morphological synapomorphies (Kristensen 1984) are mapped on the test phylogeny (fig. 1). Thus, concordance between it and PEPCK would support application of PEPCK both to currently debated aspects of lepidopteran phylogeny and to other Mesozoic-age systematic questions. Availability of the “known” phylogeny also allows objective judgment of alternative approaches to phylogenetic analysis of this gene (Miyamoto and Fitch 1995), some of which are explored here.

[Figure 1: tree diagram. Terminal lineages: Ditrysia, “Monotrysia” (Tischeriidae), Exoporia, “Dacnonypha”, Heterobathmiina, Aglossata, Zeugloptera, Trichoptera (caddisflies), Mecoptera (scorpionflies), Siphonaptera (fleas), Diptera (flies). Named clades: Heteroneura, Lepidoptera (moths and butterflies), Amphiesmenoptera, Antliophora, Mecopterida.]

FIG. 1. “Test phylogeny” of the Lepidoptera and the other mecopteroid orders (Trichoptera, Mecoptera, Siphonaptera, Diptera) as sampled in this study, consisting of groupings strongly supported by morphology (see text). Numbers of species sampled in each lineage and the approximate numbers of extant species in that clade are listed in parentheses. For clades in which more than one species was sampled, average pairwise divergence values by codon position (nt1/nt2/nt3) are displayed above each branch, and average pairwise divergence values for amino acids are displayed below each branch. Numbers of morphological synapomorphies supporting major lepidopteran clades (Kristensen 1984) are indicated with arrows.

Materials and Methods

Specimens

The species names, number of individuals sampled, life history stage, and geographical source are listed in table 1, along with GenBank accession numbers for their PEPCK sequences. Field-collected, live moths were temporarily stored dry at the temperature of liquid nitrogen or in 100% ethanol at 0°C for up to 3 days, followed by long-term storage at -80°C. Storage at -20°C in 100% ethanol at least up to 1 year also yielded satisfactory templates for this study. Specimens from each of the collections are vouchered in freezers at the University of Maryland, and all are authoritatively identified. Whole single individuals were extracted (U.S. Biochemical Corp. DNA/RNA Isolation Kit #73750) for nucleic acids, except for Agathiphaga queenslandensis, Dyseriocrania griseocapitella, Ctenocephalides felis, and Drosophila melanogaster, for which multiple individuals were pooled prior to extraction.

Table 1
Sampled Taxa with 14 Test Clades Indicated

Classification / Name              Code  Sample  Source                  GenBank Accession No.
1. Amphiesmenoptera (Lepidoptera, Trichoptera)
2. Lepidoptera (Zeugloptera, Aglossata, Heterobathmiina, Glossata)
3. Zeugloptera (Micropterigidae)
    Epimartyria auricrinella       Eau   1A      Canada                  U28442
    Micropterix calthella          Mea   1A      England                 U28437
4. Aglossata + Heterobathmiina + Glossata
   Aglossata (Agathiphagidae)
    Agathiphaga queenslandensis    Aqu   5L      Australia               U28446
5. Heterobathmiina + Glossata
   Heterobathmiina (Heterobathmiidae)
    Heterobathmia pseuderiocrania  Hps   1L      Argentina               U28440
6. Glossata (“Dacnonypha,” Neolepidoptera)
7. “Dacnonypha” (Eriocraniidae)
    Dyseriocrania griseocapitella  Dgr   5L      Maryland                U28443
    Eriocrania semipurpurella      Ese   1L      West Virginia           U28441
8. Neolepidoptera (Exoporia, Heteroneura)
9. Exoporia (Hepialidae)
    Korscheltellus gracilis        Kgr   1A      New Hampshire           U28439
    Sthenopis argenteomaculatus    Sar   1L      New York                U28435
10. Heteroneura (“Monotrysia,” Ditrysia)
11. “Monotrysia” (Tischeriidae)
    Tischeria badiiella            Tba   1A      Connecticut             U28434
    Tischeria citrinipennella      Tci   1A      Connecticut             U28433
12. Ditrysia (Tineidae; Lymantriidae)
    Tinea pellionella              Tpe   5A      Australia (lab colony)  U28431
    Lymantria dispar               Ldi   1A      Maryland                U28438
13. Trichoptera (caddisflies: Limnephiloidea (Brachycentridae); Hydropsychoidea (Hydropsychidae))
    Brachycentrus nigrosoma        Bni   1A      Maryland                U28445
    Macrostemum zebrata            Mze   1A      Maryland                U28436
1. Antliophora (Mecoptera, Siphonaptera, Diptera)
   Mecoptera (scorpionflies: Bittacidae)
    Apterobittacus apterus         Aap   1A      California              U28447
   Siphonaptera (fleas: Pulicidae)
    Ctenocephalides felis          Cfe   20A     Maryland (lab colony)   U28444
14. Diptera (flies: Nematocera (Tipulidae); Brachycera (Drosophilidae))
    Tipula paterifera              Tpa   1A      Maryland                U28432
    Drosophila melanogaster        Dme   10A     Maryland (lab colony)   Y00402

NOTE: Code names are used for inventory purposes and also refer to corresponding species in figures 2 and 3 of this report. “Sample” refers to the number of specimens extracted and their developmental stage (A = adult, L = larva).

Taxon Sampling

The species sampled represent the major groups in a widely accepted phylogeny and classification of Lepidoptera (fig. 1). PEPCK sequences were generated from 12 species of moths, including representatives of all four

suborders, and all major subdivisions of the largest one, Glossata. Two species were sampled from each, except that only single species were available for the species-poor, relictual suborders Aglossata and Heterobathmiina, which comprise single Southern Hemisphere genera containing 2 and 10 species, respectively (fig. 1). Within Glossata, the monophyly of Dacnonypha and Monotrysia broadly defined is still debated. To provide a securely resolved test phylogeny, each was represented by two species from a single monophyletic family, Eriocraniidae and Tischeriidae, respectively.

Outgroups for this study consisted of two species of Trichoptera (caddisflies), sister-order of Lepidoptera (moths and butterflies), and four species of Antliophora (flies, fleas, scorpionflies = Diptera + Siphonaptera + Mecoptera), sister-superorder of Amphiesmenoptera (Lepidoptera + Trichoptera) (Kristensen 1991). The PEPCK coding sequence from Drosophila melanogaster (Gundelfinger et al. 1987) was verified by sequencing both strands in the aligned coding region reported here.

PCR and Sequencing

Direct amplification of genomic DNA proved unsuccessful for fragments containing more than approximately 200 nucleotides of coding DNA, whereas much larger fragments could be amplified by RT-PCR (Kawasaki 1990). This observation suggests the presence of numerous, relatively large introns in lepidopteran PEPCK genes, similar to the gene structure documented in chicken (Hod, Yoo-Warren, and Hanson 1984) and rat (Beale et al. 1985). In contrast, much of the coding region in Drosophila melanogaster appears to be without introns (unpublished data). Our limited observations also demonstrate that PEPCK mRNA is present during both larval and adult stages.

Initially, highly specific PEPCK oligonucleotide primers were defined by comparison of published sequences from an insect (Drosophila melanogaster), a bird (Gallus gallus), and a mammal (Rattus norvegicus). While amplification within Diptera was generally successful, only a few moth sequences were obtained. Based on these new sequences, we surmised that greater success with PCR would result by increasing the degree of primer degeneracy. This proved to be the case, as a new set of more highly degenerate primers successfully amplified the 18 species reported in this study (tables 1 and 2). Many of these primers were localized internal to the original primer pairs in order to incorporate the new sequence information from Lepidoptera. However, only about 50% of the lepidopteran species tested were successfully amplified. Not surprisingly, success rate varied across groups (high in Ditrysia, low in “Monotrysia”). The largest consistently amplifiable PEPCK coding fragment, 680 bp in length or 35% of the total coding sequence in Drosophila melanogaster, was obtained using 284dF and 511drc. Agathiphaga queenslandensis could only be amplified with 18.5dF and 22.5drc; the amplified sequence largely overlaps that obtained from the other primer set.

Sequences were obtained from single-stranded DNA templates (U.S. Biochemical Corp. Sequenase 2.0 DNA Sequencing Kit #70770; Sanger, Nicklen, and Coulson 1977) generated by asymmetric PCR amplification (McCabe 1990) from agarose gel-isolated (QIAGEN QIAEX gel extraction kit #20020), double-stranded DNA templates produced through RT-PCR (Perkin Elmer GeneAmp RNA PCR Kit #N808-0017). Prior to sequencing, the asymmetric PCR reactions were processed with a purification kit (QIAGEN QIAquick-spin) or by ultrafiltration (Amicon Centricon- or -100). Both strands were sequenced. No evidence for multiple PEPCK sequences was observed in any species, consistent with expression of a single orthologous gene.

Table 2
Oligonucleotide Primer Sequences and Amplicon Sizes for PEPCK

Name      Sequence (5’ to 3’)
284dF     GAG GGC TGG CTR GCM GAR CAY ATG (901)
18.5dF    TGT GGN AAR ACC AAY YTG GCC ATG (991)
19.5dF    GGN GAY GAY ATI GCB TGG ATG (1051)
20.5dF    GGI GTI TGG TGG GAR GGI ATG G (1226)
21dNrc    CAI AAY CTI GAR TTI GGR TGN GC (1305)
511drc    GGM CGC ATT GCR AAY GGR TCR TGC AT (1532)
22.5drc   GAA CCA RTT RAC RTG RAA GAT C (1630)

Amplicon sizes
          284dF    18.5dF   19.5dF   20.5dF
21dNrc    449 bp   359 bp   296 bp   122 bp
511drc    680 bp   590 bp   527 bp   353 bp
22.5drc   774 bp   684 bp   621 bp   447 bp

NOTE: Primer name abbreviations: d = degenerate, F = forward, N = new version, rc = reverse complement. Nonstandard nucleotide abbreviations: B = C/G/T, M = A/C, N = A/C/G/T, R = A/G, Y = C/T, I = inosine. Numbers in parentheses following the primer sequences correspond to the 3’ end of the primer as localized in the Drosophila melanogaster sequence (GenBank accession no. Y00402).

Table 3
Tree Statistics for Character Set Mappings on Test Phylogeny in Figure 1

Character Set      Tree Length   Add Steps   Consistency Index   Retention Index
nt1 + nt2          422           0           0.490               0.554
Amino acids        318           6           0.619               0.587
nt2 only           162           2           0.532               0.604
nt1 only           260           8           0.462               0.519
nt3 only           1,233         58          0.358               0.310
All nucleotides    1,655         25          0.390               0.378

NOTE: Numbers in “Add Steps” are the additional steps required over most-parsimonious solutions for that character set (see table 4).

Data Analysis

An approximately 624-bp region of PEPCK, corresponding to about one third of the coding region, was aligned for phylogenetic analysis (fig. 2). With the exceptions mentioned below, alignment was straightforward using the GAP program of the UWGCG genetics software package (Devereux, Haeberli, and Smithies 1984). Missing data were coded as question marks. Alignment was problematical only in the immediate vicinity of two small gaps (nucleotide characters 319-327, 379-384). These gap regions (nucleotide characters 319-348, 370-384; see asterisks in fig. 2) total 7.2% of the fragment length, and their effect on tree topology was explored through parsimony analysis of data sets with varying numbers of gap region characters excluded. Three data sets discussed below out of 20 tested represent the range of outcomes. (1) No removal of gap regions, using the alignment shown in figure 2 and coding gaps as question marks, resulted in 10 minimum-length trees with a maximum of 12 out of 14 clades recovered from the test phylogeny (fig. 1). (2) Complete removal of the gap regions resulted in 11 minimum-length trees, one of which was congruent with the test phylogeny. (3) Removal of only gap characters plus a few especially problematically aligned adjoining characters (nucleotide characters 319-330, 370-384; see underlined asterisks in fig. 2) resulted in two minimum-length trees, one congruent with the test phylogeny and the other recovering 13 out of 14 clades (fig. 6A and B). Unless otherwise indicated, analyses presented in Results and Discussion, including figures 6 and 7 and tables 3-5, are based on this third data set. Nucleotides were translated into amino acid sequences (fig. 3); the analyzed data set excluded residue characters 107-111 and 124-128, corresponding to nucleotide characters 319-330 and 370-384.

Pairwise sequence divergences by character set (nucleotides, nucleotides by codon position, amino acids) were obtained using PAUP 3.0, 3.1, 3.2 (Swofford 1991). Average pairwise divergence values plotted on the test phylogeny were used as evidence for or against saturation (fig. 1). The frequency distribution for number of character states per character (site) was plotted as a second estimator of levels of character substitution and potential saturation, for both nucleotides and amino acids (fig. 4). Nucleotide frequencies were calculated by codon position (fig. 5) to look for compositional bias.

The PEPCK sequence character sets were mapped onto the test phylogeny using the tree constraints option

[Figure 2: nucleotide alignment in five blocks; rows are the 18 taxa in the order Aap, Cfe, Tpa, Dme, Bni, Mze, Eau, Mea, Aqu, Hps, Dgr, Ese, Kgr, Sar, Tba, Tci, Tpe, Ldi.]

FIG. 2. Aligned PEPCK nucleotide sequences (EMBL accession no. DS24063). The sequences displayed correspond to the cDNA coding region 883-1503 from Drosophila melanogaster (GenBank accession no. Y00402). Taxa are referenced by code name (see table 1). Lower case letters are unconfirmed nucleotide calls. X, gap; ?, not sequenced; ***, gap region; underlined ***, subset of gap regions excluded from data sets analyzed in figures 6 and 7 and tables 3-5.
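The pairwise divergences by codon position described under Data Analysis, and the Kimura two-parameter distance used for the neighbor-joining trees, can be sketched as follows. This is an illustration only, not the PAUP or PHYLIP implementation: it assumes divergence is the uncorrected proportion of differing sites, that sites carrying a gap (X), missing data (?) or any ambiguity code in either sequence are skipped, and that lower case (unconfirmed) calls count as calls. Function names and the toy sequences are hypothetical. The Kimura (1980) distance is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def compare(seq_a, seq_b):
    """Count comparable sites, transitions and transversions in two aligned sequences."""
    n = ts = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID or b not in VALID:
            continue  # skip gaps (X), missing data (?) and ambiguity codes
        n += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    return n, ts, tv


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites."""
    n, ts, tv = compare(seq_a, seq_b)
    return (ts + tv) / n


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance from transition (P) and transversion (Q) proportions."""
    n, ts, tv = compare(seq_a, seq_b)
    p, q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def divergence_by_codon_position(seq_a, seq_b):
    """p-distance computed separately for nt1, nt2 and nt3 of an in-frame alignment."""
    return {f"nt{i + 1}": p_distance(seq_a[i::3], seq_b[i::3]) for i in range(3)}


# Hypothetical toy sequences (not PEPCK data); X marks a gap as in figure 2.
a = "ATGGCCAAGTTT"
b = "ATGGCTAAXTTC"
by_position = divergence_by_codon_position(a, b)
```

In this toy pair both differences fall at third positions, so nt1 and nt2 show zero divergence while nt3 shows two differences over three comparable sites.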

of PAUP, with both unambiguous and unambiguous + maximally ambiguous changes (described as minimum and maximum possible lengths in PAUP) assigned to each branch (fig. 6). Overall fit to the tree was measured by the consistency index excluding uninformative characters (Swofford 1991), and by the retention index (Farris 1989) (table 3).

Most-parsimonious trees for each data set were obtained through multiple heuristic searches in PAUP, using TBR branch swapping from 100 random addition replicates. “Simple” and “closest” addition options frequently did not recover all islands of most-parsimonious trees, and on occasion recovered only trees of greater length. For the total nucleotide data set, the following weighting schemes (nt1:nt2:nt3) were tested: 1:1:1, 1:2:0, 5:5:1, 6:6:1, 7:7:1, 10:10:1, 100:100:1, and 1,000:1,000:1. Decay indices (Bremer 1988; Donoghue et al. 1992) and bootstrap values (100 replicates) (Felsenstein 1985) for the nt1 + nt2 data set were calculated as measures of branch support (fig. 6A).

Neighbor-joining trees (Saitou and Nei 1987) were calculated using the Kimura two-parameter distance (Kimura 1980), using the DNADIST and NEIGHBOR programs in PHYLIP 3.5 (Felsenstein 1992).

Concordance of PEPCK trees with the test phylogeny was judged by two criteria: similarity in fit to the characters, measured by the increase in length required by the test phylogeny as compared to the most-parsimonious tree(s) for the same characters (table 4), and topological similarity, measured by the number of possible test clades recovered (fig. 7, table 5).

Table 4
Tree Statistics for Parsimony Analyses

Character Set      No. of Most-Parsimonious Trees   Tree Length   Consistency Index   Retention Index
nt1 + nt2          2                                422           0.490               0.554
Amino acids        1                                312           0.633               0.611
nt2 only           3                                160           0.539               0.615
nt1 only           7                                252           0.478               0.549
nt3 only           2                                1,175         0.376               0.360
All nucleotides    1                                1,630         0.396               0.394

Table 5
Clade Recovery by Character Set Based on Parsimony Analyses

Data sets: nt1 + nt2, Amino acids, nt2, nt1, nt3, All nt

1. Amphiesmenoptera (or Antliophora)         X X
2. Diptera (flies)                           X X X X
3. Trichoptera (caddisflies)                 X X X +
4. Lepidoptera (moths and butterflies)       X X +
5. Zeugloptera (Micropterigidae)             X X X X X
6. Aglossata + Heterobathmiina + Glossata    X
7. Heterobathmiina + Glossata                X X
8. Glossata                                  X X
9. Dacnonypha (Eriocraniidae)                X X X X + X
10. Neolepidoptera                           +
11. Exoporia (Hepialidae)                    X X X X X X
12. Heteroneura                              X X X X
13. Monotrysia (Tischeriidae)                X X X X X X
14. Ditrysia                                 X X X X + X

NOTE: Blank spaces indicate clades not present in any minimum-length trees. X, clade present in strict consensus; +, clade present in at least one most-parsimonious tree but not present in strict consensus.

[Figure 3: amino acid alignment of the 18 taxa in three blocks (residues 1-80, 81-160, 161-208), with asterisks marking the gap regions.]

FIG. 3. Aligned PEPCK amino acid sequences (EMBL accession no. DS24063). The nucleotide sequences shown in figure 2 were conceptually translated. Taxa are referenced by code name (see table 1). X, gap; ?, not translatable or sequenced; ***, gap region; underlined ***, subset of gap regions excluded from data sets analyzed in figure 7 and tables 3-5.

Results and Discussion

Tests of Data Set Utility and Nucleotide Saturation

Average pairwise divergences between clades provide an estimate for levels of character substitution. These are plotted by character set (nt1, nt2, nt3, amino acids) for each node of the test phylogeny (fig. 1). Divergence in amino acid sequence ranges from 4% between confamilial and congeneric moths to 20% across the Mecopterida, consistent with a character set that is not yet saturated. With the exception of Glossata, divergence levels increase toward the base of the tree. nt1 and nt2 reveal qualitatively similar trends to that for amino acids with values ranging from 2% and 1% to 16% and 12%, respectively. For nt1, Aglossata and Heterobathmiina are slightly less divergent on average from members of their respective sister-clades than expected. For the nt3 data set, most terminal clades and all internal nodes have values greater than 50%. No trend of increasing divergence values with taxonomic depth is evident. We conclude that nt3 data sets are near saturation at divergence levels approaching 60%.

The number of character state changes on the test phylogeny is another estimator of levels of character substitution (fig. 4). Amino acids are generally conservative. Fifty-seven percent of the amino acid sites are invariant and another 22% exhibit just one or two state changes. However, 7% have 7-10 state changes, indicating that there is a subset of rapidly evolving amino acid sites (fig. 4D).

Consistent with its low pairwise divergence values, nt2 is the most conservative codon position overall with a maximum of six observed state changes and 72% invariant sites (fig. 4B). nt1 is somewhat more variable (58% invariant), but much less so than nt3, which displays a roughly symmetrical peak, apart from a residuum of 5% invariant (fig. 4C). The few invariant nt3 nucleotides are embedded within completely nondegenerate codons (four methionines and six tryptophans). The overall high variability of nt3 supports the earlier conclusion that the nt3 data set is approaching saturation in this study.

An unbiased base composition is a desirable feature in a phylogenetic study (Lockhart et al. 1994). Overall, the PEPCK sequences in this study are relatively unbiased, displaying only slightly less thymine than other bases (fig. 5). When codon positions are separated, larger differences are apparent, particularly for nt1 (elevated adenine and guanine). Variation in base composition among taxa is substantially greater in nt3 than in nt1 or nt2.

Mapping of PEPCK Character Sets Onto the Test Phylogeny

Six data sets (amino acids, all nt, nt1, nt2, nt1 + nt2, nt3) were each mapped onto the test phylogeny to determine levels of homoplasy (table 3). nt2 and amino acids revealed the lowest levels of homoplasy, followed in order by nt1 + nt2, nt1, all nt, and nt3.
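The homoplasy measures reported for these mappings (tree length, consistency index, retention index) can be made concrete with a minimal sketch. It assumes unordered characters with no missing data on a fixed, rooted binary tree, counts steps with Fitch parsimony, and uses the ensemble definitions CI = M/S and RI = (G - S)/(G - M), where S is the observed number of steps, M the minimum and G the maximum possible over all characters. It does not exclude uninformative characters, as the consistency index reported here does, and it is not the PAUP implementation; the tree, taxon names and characters below are hypothetical.

```python
from collections import Counter


def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a fixed rooted binary tree.

    `tree` is a nested 2-tuple of taxon names; `states` maps taxon name -> state.
    """
    steps = 0

    def post_order(node):
        nonlocal steps
        if isinstance(node, str):  # leaf
            return {states[node]}
        left, right = (post_order(child) for child in node)
        common = left & right
        if common:
            return common
        steps += 1  # disjoint state sets imply one change at this node
        return left | right

    post_order(tree)
    return steps


def fit_indices(tree, characters):
    """Return (tree length, ensemble consistency index, ensemble retention index)."""
    s = m = g = 0
    for states in characters:
        counts = Counter(states.values())
        s += fitch_steps(tree, states)            # observed steps on this tree
        m += len(counts) - 1                      # fewest steps possible on any tree
        g += len(states) - max(counts.values())   # most steps possible on any tree
    return s, m / s, (g - s) / (g - m)


# Hypothetical five-taxon tree and two binary characters.
tree = ((("A", "B"), "C"), ("D", "E"))
characters = [
    {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1},  # fits the tree with one step
    {"A": 0, "B": 1, "C": 0, "D": 1, "E": 0},  # homoplastic: two steps
]
length, ci, ri = fit_indices(tree, characters)
```

Lower values of either index mean that more of the observed change is homoplastic on the tree, which is how the ranking of character sets in table 3 should be read.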

FIG. 4.-Distribution of number of character state changes per character on the test phylogeny. A, nt1 data set. B, nt2 data set. C, nt3 data set. D, Amino acids data set. Data sets analyzed for this figure excluded gap regions entirely (see figs. 2 and 3).

FIG. 5.-Average nucleotide base compositions for 18 test taxa. A, All nt. B, nt1. C, nt2. D, nt3. Bars bracket maximum ranges. A, adenine; C, cytosine; G, guanine; T, thymine. Data sets analyzed for this figure excluded gap regions entirely (see fig. 2).

Phylogenetic Analysis of PEPCK Character Sets

Most-parsimonious (MP) trees were constructed using the same six character sets. Levels of homoplasy by character set followed the same relative order as when mapped onto the test phylogeny (table 4). However, only for the nt1 + nt2 data set was the test phylogeny included among the MP trees (fig. 6). For the amino acid, all nt, nt1, nt2, and nt3 character sets, tree lengths for the test phylogeny were longer than for the MP phylogenies by 2%, 2%, 3%, 1%, and 5%, respectively (tables 3 and 4). All character sets except amino acids and total nucleotides yielded multiple MP trees (table 4).

Topological concordance with “known” relationships was assessed by determining how many of the 14 test clades within the phylogeny are recovered in the MP trees (fig. 7). The two best MP trees (13 and 14 clades recovered) were from the nt1 + nt2 data set. The single tree based on analysis of all nucleotides recovered only six clades, presumably because of high homoplasy levels from nt3, for which only two or four clades were recovered.

Another measure of topological concordance is the number of clades in the test phylogeny that are also recovered in the strict consensus of the MP trees (table 5). nt1 + nt2 recovers the most clades (13), followed by nt2 (11), amino acids (8), nt1 (7), all nt (6), and nt3 (2). The tree for amino acids (not shown) included several groupings (e.g., Diptera + Trichoptera; Exoporia + Ditrysia) that contradict the test phylogeny.

The distribution of nt1 + nt2 characters on the test phylogeny, as well as bootstrap values and decay indices, show that not all groups are equally strongly supported (fig. 6A). In particular, all terminal clades within Lepidoptera, namely, Ditrysia, Monotrysia, Exoporia, Dacnonypha, and Zeugloptera, are well supported. All of the more inclusive groupings, including the Lepidoptera, have markedly lower levels of support.

Neighbor-joining (NJ) analysis (Saitou and Nei 1987) on the nt1 + nt2 data set also recovered the concordance tree, with the exception that Heterobathmiina was reversed with Dacnonypha and Aglossata was reversed with Zeugloptera (12 of 14 clades recovered, fig. 6C). NJ analysis on the amino acid set was only slightly less concordant, with Heterobathmiina uniquely misplaced as sister group to Exoporia (12 of 14 clades recovered, not shown). NJ analysis on all nucleotides resulted in loss of Diptera and Lepidoptera, as well as numerous misgroupings within Lepidoptera (not shown).

The effect of downweighting nt3, but not completely eliminating it from the data set, has also been explored for a range of weighting schemes. When nt3 is downweighted between sevenfold and 1,000-fold, the test phylogeny is recovered as the sole MP tree.

FIG. 6.-Trees produced on analysis of the nt1 + nt2 data set for 18 test taxa. A, One of two most-parsimonious trees; it is congruent with the test phylogeny (fig. 1). Bootstrap values followed by decay indices are above branches. Minimum followed by maximum branch lengths are displayed below branches. B, The other most-parsimonious tree. Note the altered placement of “Dacnonypha” relative to Exoporia. C, Neighbor-joining tree for the same nt1 + nt2 data set for 18 test taxa, drawn proportionally to branch lengths. Note the altered placement of both Aglossata and Heterobathmiina relative to the test phylogeny.

Utility of PEPCK

Our central finding is that when nt3 is downweighted or eliminated and when ambiguously aligned gap region data (nucleotide characters 319-330, 370-384) are excluded, the 624-bp PEPCK fragment recovers the “known” relationships for Lepidoptera and outgroups quite well, that is, parsimony and neighbor-joining analyses of nt1 + nt2 recover 12 to 14 of the 14 test clades (fig. 6). Even when the gap region data are included, a strict consensus of MP trees recovers 9 of 14 clades, with one tree recovering 12 of 14 (not shown). Our results demonstrate the phylogenetic utility of PEPCK at this level of systematic inquiry and argue for its application to unresolved relationships within basal Lepidoptera and other groups of Mesozoic age.

Decay indices ranging from 9 to 13 (fig. 6A) indicate good support for Zeugloptera, “Dacnonypha,” Exoporia, and “Monotrysia.” By contrast, none of the more inclusive groupings, such as Lepidoptera, Glossata, Neolepidoptera, or Heteroneura, is nearly so well supported (decay indices between zero and two). Weak support for individual branches emphatically does not mean that PEPCK contains no information on these larger groupings (see Sanderson 1989). The recovery of the more inclusive clades by analysis of the nt1 + nt2 data set cannot represent chance, though conflicting signals are also present. Stronger support may be expected from PEPCK by sampling more than the approximately one third of the coding sequence examined here. Increasing taxon sampling density may also greatly enhance the strength of our conclusions.

A comparable result, of complete congruence with morphology despite weak support for some deep clades, emerged from 18S ribosomal DNA studies of a nearly identical set of taxa (Wiegmann 1994; Wiegmann et al., unpublished data). Perhaps because the data set is larger (1,357 bp with >300 informative sites), several clades only weakly supported by PEPCK were strongly established in the 18S rDNA result, including the monophyly of the Lepidoptera and Neolepidoptera. By contrast, the “Dacnonypha,” with a decay index of one, was strongly supported by PEPCK. The fact that both 18S rDNA and PEPCK are concordant with the test phylogeny, yet provide different levels of support for different groups, suggests that eventual combination of these genes will provide powerful evidence on basal lepidopteran phylogeny.

Identifying Useful Data Set Partitions

Additional lines of evidence, apart from concordance with the test phylogeny, support our conclusion that nt3 should be severely downweighted or eliminated for analysis of PEPCK on this time scale. First, pairwise divergences at nt3 are mostly above 0.50 (fig. 1), a value above which saturation is highly likely. Under the Jukes-Cantor (1969) model, for example, a pairwise divergence value of 0.55 results when each nucleotide is, on the average, substituted once. Direct evidence for saturation lies in the failure of nt3 divergence values to increase with the order of divergence on the test phylogeny, outside the replicates within some of the more recent clades, in particular, Monotrysia and Exoporia. This result contrasts sharply with the monotonic increases seen in nt1, nt2, and amino acid sequence divergences (fig. 1).
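The 0.55 value quoted above follows from the Jukes-Cantor relation between the expected number of substitutions per site, d, and the expected proportion of differing sites, p = (3/4)(1 − e^(−4d/3)). The Python sketch below is illustrative only; the function names are hypothetical and are not taken from any program used in this study. It reproduces p ≈ 0.55 at d = 1 and shows that p approaches the model's ceiling of 0.75 as d grows.

```python
import math

def jc_expected_divergence(d):
    """Expected proportion of differing sites after d substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc_distance(p):
    """Jukes-Cantor corrected distance for an observed divergence p (p < 0.75)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) under the Jukes-Cantor model")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# One substitution per site on average gives a divergence of about 0.55.
print(round(jc_expected_divergence(1.0), 2))

# Divergence saturates toward 0.75 as substitutions accumulate.
for d in (0.5, 1.0, 2.0, 5.0):
    print(d, round(jc_expected_divergence(d), 3))
```

Because the curve flattens quickly beyond d ≈ 1, observed divergences above roughly 0.5 carry little information about how many substitutions actually occurred, which is the sense in which such sites are described as approaching saturation.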
Our data suggest that nt3 is subject to constraints beyond those in the Jukes-Cantor model, as the limiting divergence appears to be less than 0.60, rather than 0.75. Additionally, the nucleotide composition at nt3 is significantly more variable than at nt1 or nt2 (fig. 5), and this may create additional analytical difficulties (Lockhart et al. 1994). Not surprisingly, nt3 shows much greater homoplasy on the test phylogeny than nt1 or nt2.

Similar saturation and phylogenetic uninformativeness of nt3 at deep levels have been reported for a number of other genes (Edwards, Arctander, and Wilson 1991; Irwin, Kocher, and Wilson 1991). These effects were clearly evident in partial sequences of the elongation factor-1α (EF-1α) gene from a subset of the basal lepidopteran lineages (T. P. Friedlander, unpublished data). Numerous discordances with morphological groupings were apparent in the most-parsimonious tree for these data. Almost all informative changes were synonymous, with maximal pairwise nt3 divergence levels of only about 0.35.

FIG. 7.-Number of clades present in the test phylogeny that are recovered in each of the most-parsimonious trees for six character sets (all nt, nt1, nt2, nt1 + nt2, nt3, amino acids). Individual boxes correspond to single most-parsimonious trees. The horizontal axis is the number of clades shared with the test phylogeny.

Given the benefits of downweighting or eliminating nt3, it might seem that concordance could be further improved by distinguishing among character subsets within nt1 + nt2, which contain synonymous and nonsynonymous substitutions, or within amino acids. There is clearly evolutionary rate heterogeneity among sites. For example, 57% of the amino acid sites are invariant, yet more than 10% are substituted at least five times (fig. 4). Thus, some sites may be saturated, even at moderate overall levels of divergence. Three kinds of variable amino acid sites are definable when mapped on the test phylogeny: those that change once (e.g., characters 33 and 34 in fig. 3); those that change multiple times among a limited number of character states (e.g., characters 31 and 83), yielding high homoplasy; and those that change multiple times among numerous character states, yielding low homoplasy (e.g., characters 73 and 81). It seemed plausible that removal of the second class, which results mostly from alternations between serine and threonine, would improve concordance. However, doing so resulted only in loss of resolution. Similar attempts to subdivide the nt1 + nt2 data set were also not helpful.

With nt3 eliminated, we still must choose between amino acid and nucleotide codings. Amino acids show less homoplasy on the test phylogeny and on the most-parsimonious trees than nt1 or nt1 + nt2, but slightly more than nt2 (tables 3 and 4). However, MP trees for nt1 + nt2 and nt2 are more concordant with the test phylogeny than those for amino acids, as indicated by proportional difference in tree length (0% and 1.3%, versus 1.9%), and maximum number of test clades recovered (14 and 12, versus 8 of 14) (fig. 7, table 5). One explanation is that the slight disadvantage of the nucleotide data in noisiness is overcome by the larger number of characters. Another explanation would be that some of the amino acid replacement site characters are positively misleading.

In conclusion, using a well-supported morphological hypothesis of basal lepidopteran relationships as a guide, our results show that PEPCK has much potential for estimating phylogenetic splits of Mesozoic age. This is a time span for which few other appropriate genes are now available, particularly in the nuclear genome, and particularly in insects (Brower and DeSalle 1994). Further development and systematic application of PEPCK is amply justified.

Acknowledgments

We gratefully acknowledge S. Cho, L. Friedlander, M. Gentili, R. Leclerc, J. Leonard, E. Nielsen, D. O’Brochta, M. Scoble, F. Sperling, A. Venables, and B. Wiegmann for collection of some of the specimens used in this study. Laboratory assistance was provided by K. Horst and S. Zhao. Constructive and thoughtful comments from A. Brower, R. Harrison, and one anonymous reviewer helped shape the final draft of this paper. This research was supported by funds from the National Science Foundation, grant DEB-9212669, the Center for Agricultural Biotechnology, the U.S. Department of Agriculture-NRI CGP grant 90-37250-5482, and the Maryland Agricultural Experiment Station.

LITERATURE CITED

BEALE, E. G., N. B. CHRAPKIEWICZ, H. A. SCOBLE, R. J. METZ, D. P. QUICK, R. L. NOBLE, J. E. DONELSON, K. BIEMANN, and D. K. GRANNER. 1985. Rat hepatic cytosolic phosphoenolpyruvate carboxykinase (GTP). J. Biol. Chem. 260:10748-10760.
BREMER, K. 1988. The limits of amino acid sequence data in angiosperm phylogenetic reconstruction. Evolution 42:795-803.
BROWER, A. V. Z., and R. DESALLE. 1994. Practical and theoretical considerations for choice of a DNA sequence region in insect molecular systematics, with a short review of published studies using nuclear gene regions. Ann. Entomol. Soc. Am. 87:702-716.
CHO, S., A. MITCHELL, J. C. REGIER, C. MITTER, R. W. POOLE, T. P. FRIEDLANDER, and S. ZHAO. 1995. A highly conserved nuclear gene for low-level phylogenetics: elongation factor-1α recovers morphology-based tree for heliothine moths. Mol. Biol. Evol. 12:650-656.
CRETI, R., F. CITARELLA, O. TIBONI, A. SANANGELANTONI, P. PALM, and P. CAMMARANO. 1991. Nucleotide sequence of a DNA region comprising the gene for elongation factor 1α (EF-1α) from the ultrathermophilic archaeote Pyrococcus woesei: phylogenetic implications. J. Mol. Evol. 33:332-342.
DAVIS, D. R. 1986. A new family of monotrysian moths from austral South America (Lepidoptera: ), with a phylogenetic review of the Monotrysia. Smithsonian Contrib. Zool. No. 434.
DEVEREUX, J., P. HAEBERLI, and O. SMITHIES. 1984. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12:387-395.
DONOGHUE, M., R. G. OLMSTEAD, J. F. SMITH, and J. D. PALMER. 1992. Phylogenetic relationships of Dipsacales based on rbcL sequences. Ann. Mo. Bot. Gard. 79:333-345.
EDWARDS, S. V., P. ARCTANDER, and A. C. WILSON. 1991. Mitochondrial resolution of a deep branch in the genealogical tree for perching birds. Proc. R. Soc. Lond. B 43:99-107.
FARRIS, J. S. 1989. The retention index and the rescaled consistency index. Cladistics 5:417-419.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.
FELSENSTEIN, J. 1992. PHYLIP (phylogeny inference package) version 3.5. Department of Genetics, University of Washington, Seattle.
FRIEDLANDER, T. P., J. C. REGIER, and C. MITTER. 1992. Nuclear gene sequences for higher level phylogenetic analysis: 14 promising candidates. Syst. Biol. 41:483-490.
FRIEDLANDER, T. P., J. C. REGIER, and C. MITTER. 1994. Phylogenetic information content of five nuclear gene sequences in animals: initial assessment of character sets from concordance and divergence studies. Syst. Biol. 43:511-525.

GUNDELFINGER, E. D., I. HERMANS-BORGMEYER, G. GRENNINGLOH, and D. ZOPF. 1987. Nucleotide and deduced amino acid sequence of the phosphoenolpyruvate carboxykinase (GTP) from Drosophila melanogaster. Nucleic Acids Res. 15:6745.
HILLIS, D. M. 1995. Approaches for assessing phylogenetic accuracy. Syst. Biol. 44:3-16.
HOD, Y., H. YOO-WARREN, and R. W. HANSON. 1984. The gene encoding the cytosolic form of phosphoenolpyruvate carboxykinase (GTP) from the chicken. J. Biol. Chem. 259:15609-15614.
IRWIN, D. M., T. D. KOCHER, and A. C. WILSON. 1991. Evolution of the cytochrome B gene of mammals. J. Mol. Evol. 32:128-144.
JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. MUNRO, ed. Mammalian protein metabolism. Vol. 3. Academic Press, New York.
KAWASAKI, E. S. 1990. Amplification of RNA. Pp. 21-27 in M. A. INNIS, D. H. GELFAND, J. J. SNINSKY, and T. J. WHITE, eds. PCR protocols: a guide to methods and applications. Academic Press, San Diego, Calif.
KIMURA, M. 1980. A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.
KOBAYASHI, Y. P., and H. ANDO. 1988. Phylogenetic relationships among the lepidopteran and trichopteran suborders (Insecta) from the embryological standpoint. Z. Zool. Systematik Evolutionsforschung 26:186-210.
KRISTENSEN, N. P. 1984. Studies on the morphology and systematics of primitive Lepidoptera (Insecta). Steenstrupia 10:141-191.
KRISTENSEN, N. P. 1991. Phylogeny of extant hexapods. Pp. 125-140 in CSIRO, eds. The insects of Australia, 2nd ed. Vol. I. Cornell Univ. Press, Ithaca, N.Y.
KRISTENSEN, N. P. 1994. Evolutionary biology of primitive Lepidoptera: an overview. Abstr., Soc. Europ. Lepidop. Congr., Lednice, 1994, pp. 8-10.
KUKALOVA-PECK, J. 1991. Fossil history and the evolution of hexapod structures. Pp. 141-179 in CSIRO, eds. The insects of Australia, 2nd ed. Vol. I. Cornell Univ. Press, Ithaca, N.Y.
LABANDEIRA, C. C., D. L. DILCHER, D. R. DAVIS, and D. L. WAGNER. 1994. Ninety-seven million years of angiosperm-insect association: paleobiological insights into the meaning of coevolution. Proc. Natl. Acad. Sci. USA 91:12278-12282.
LOCKHART, P. J., M. A. STEEL, M. D. HENDY, and D. PENNY. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11:605-612.
MCCABE, P. C. 1990. Production of single-stranded DNA by asymmetric PCR. Pp. 76-83 in M. A. INNIS, D. H. GELFAND, J. J. SNINSKY, and T. J. WHITE, eds. PCR protocols: a guide to methods and applications. Academic Press, San Diego, Calif.
MIYAMOTO, M. M., and J. CRACRAFT. 1991. Phylogenetic inference, DNA sequence analysis, and the future of molecular systematics. Pp. 3-17 in M. M. MIYAMOTO and J. CRACRAFT, eds. Phylogenetic analysis of DNA sequences. Oxford Univ. Press, New York.
MIYAMOTO, M. M., and W. M. FITCH. 1995. Testing species phylogenies and phylogenetic methods with congruence. Syst. Biol. 44:64-76.
NIELSEN, E. S., and I. F. B. COMMON. 1991. Lepidoptera (moths and butterflies). Pp. 817-915 in CSIRO, eds. The insects of Australia, 2nd ed. Vol. II. Cornell Univ. Press, Ithaca, N.Y.
PENNY, D., and M. D. HENDY. 1986. Estimating the reliability of evolutionary trees. Mol. Biol. Evol. 3:403-417.
RIVERA, M. C., and J. A. LAKE. 1992. Evidence that eukaryotes and eocyte prokaryotes are immediate relatives. Science 257:74-76.
ROSS, A. J., and E. A. JARZEMBOWSKI. 1993. Arthropoda (Hexapoda; Insecta). Pp. 363-426 in M. J. BENTON, ed. The fossil record 2. Chapman & Hall, New York.
SAITOU, N., and M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
SANDERSON, M. J. 1989. Confidence limits on phylogenies: the bootstrap revisited. Cladistics 5:113-129.
SANGER, F., S. NICKLEN, and A. R. COULSON. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.
SWOFFORD, D. L. 1991. PAUP: phylogenetic analysis using parsimony, version 3.0. Illinois Natural History Survey, Champaign.
WELDON, S. L., A. RANDO, A. S. MATATHIAS, Y. HOD, P. A. KALONICK, S. SAVON, J. S. COOK, and R. W. HANSON. 1990. Mitochondrial phosphoenolpyruvate carboxykinase from the chicken. J. Biol. Chem. 265:7308-7317.
WIEGMANN, B. M. 1994. The earliest radiation of the Lepidoptera: evidence from 18S rDNA. Dissertation, University of Maryland, College Park, Md.
YOO-WARREN, H., J. E. MONAHAN, J. SHORT, H. SHORT, A. BRUZEL, A. WYNSHAW-BORIS, H. M. MEISNER, D. SAMOLS, and R. W. HANSON. 1983. Isolation and characterization of the gene coding for cytosolic phosphoenolpyruvate carboxykinase (GTP) from the rat. Proc. Natl. Acad. Sci. USA 80:3656-3660.

RICHARD G. HARRISON, reviewing editor

Accepted January 2, 1996