Hierarchical Analysis of Variation in the Mitochondrial 16S rRNA Gene Among

James B. Whitfield* and Sydney A. Cameron*†
*Department of Entomology and †Department of Biological Sciences, University of Arkansas

Nucleotide sequences from a 434-bp region of the 16S rRNA gene were analyzed for 65 taxa of Hymenoptera (ants, bees, wasps, parasitoid wasps, sawflies) to examine the patterns of variation within the gene fragment and the taxonomic levels for which it shows maximum utility in phylogeny estimation. A hierarchical approach was adopted in the study through comparison of levels of sequence variation among taxa at different taxonomic levels. As previously reported for many holometabolous insects, the 16S data reported here for Hymenoptera are highly AT-rich and exhibit strong site-to-site variation in substitution rate. More precise estimates of the shape parameter (α) of the gamma distribution and the proportion of invariant sites were obtained in this study by employing a reference phylogeny and utilizing maximum-likelihood estimation. The effectiveness of this approach to recovering expected phylogenies of selected hymenopteran taxa has been tested against the use of maximum parsimony. This study finds that the 16S gene is most informative for phylogenetic analysis at two different levels: among closely related species or populations, and among tribes, subfamilies, and families. Maximization of the phylogenetic signal extracted from the 16S gene at higher taxonomic levels may require consideration of the base composition bias and the site-to-site rate variation in a maximum-likelihood framework.

Key words: Hymenoptera, molecular phylogeny, insects, mitochondrial DNA.

Address for correspondence and reprints: James B. Whitfield, Department of Entomology, 321 Agriculture Building, University of Arkansas, Fayetteville, Arkansas 72701. E-mail: jwhitfi[email protected].

Mol. Biol. Evol. 15(12):1728–1743. 1998
© 1998 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Introduction

Selecting a gene for phylogenetic analysis requires matching the level of sequence variation to the desired taxonomic level of study. Several recent papers have focused on the identification of genes that are useful for phylogenetic analysis at different taxonomic levels (Brower and DeSalle 1994; Friedlander, Regier, and Mitter 1994; Graybeal 1994; Simon et al. 1994; Cho et al. 1995). For many of these genes, sequence data are available from a relatively small sample of taxa with roughly known divergence times. These studies permit estimates of sequence divergence rates (e.g., number of nucleotide substitutions or percentage of sequence divergence over time), providing information on the relative rate of change of a gene. However, estimates of sequence divergence rate calculated from a small sample of taxa may not be appropriate when applied more generally (Graybeal 1994), because unsampled lineages may differ in divergence rate. This problem will resolve itself as sequence data are collected from additional genes for increasingly larger numbers of taxa.

A few mitochondrial (mtDNA) genes have been studied extensively within recently diverged lineages (<5 MYA). These genes (12S rRNA, 16S rRNA, cytochrome oxidase I) exhibit nearly the same divergence rate, which is linear with time and approximates 2.3% per Myr for silent sites (Brower 1994). However, when more anciently diverged lineages (>75 MYA) are compared, different mtDNA genes exhibit considerable variation in sequence divergence rate (Cummings, Otto, and Wakeley 1995), with some showing greater conservation than others. Furthermore, a number of constraints can influence variation in the rate of nucleotide substitution among sites within a gene (Wheeler and Honeycutt 1988; Mindell and Honeycutt 1990; Hillis and Dixon 1991; Kraus et al. 1992). These constraints may be of a general nature, such as variation in the rate of substitution by codon position in protein-coding genes or by secondary structural position in rRNA genes, or they may be lineage-specific (some taxa appear to evolve more slowly than others; e.g., DeSalle and Templeton 1988; Hasegawa and Kishino 1989).

Graybeal (1994) pointed out that any given gene's potential phylogenetic utility at a particular taxonomic level depends not only on the percentage of sequence divergence at that level, but also on the shape of the sequence divergence accumulation curve. For example, at a given observed divergence level, genes in which only a few sites are "free to vary" (sensu Palumbi 1989) will contain more superimposed changes (i.e., be more saturated with nucleotide substitutions) than those in which many sites are able to change. The pattern that emerges when few sites are free to vary is a sequence divergence accumulation curve that is strongly convex near the origin, then flattens out at a low level over the remainder of the distribution of divergence times as additional substitutions are superimposed and thus go unobserved. (This curve contrasts sharply with the more linear curve seen with recently diverged lineages or with genuinely highly conserved genes.) The low overall sequence divergence level for older divergences might suggest a strongly conserved gene appropriate for higher-level comparisons, when, in fact, the available variation may be useful only at lower taxonomic levels, among recently diverged taxa. An appreciation of the distribution of variable sites across the gene is therefore important in examining the phylogenetic utility of a gene for a particular taxonomic level.

Factors other than the distribution of rate variation among sites can determine the shape of the sequence divergence accumulation curve. For instance, in many holometabolous insects, including Hymenoptera and Drosophila, mtDNA exhibits a nucleotide composition

1728 16S Hierarchical Variation in Hymenoptera 1729

which is strongly biased toward adenine and thymine (AT bias). For some groups, the mean percentage of AT can be higher than 80% (Cameron 1993; Crozier and Crozier 1993; Simon et al. 1994; Dowton and Austin 1994, 1997a, 1997b; Whitfield 1997). When the base composition is biased to that degree, obviously, the ratio of transversions (tv) to transitions (ti) increases, as does the probability of convergent substitutions to the more common bases. Both of these increases further reduce the ability to correctly estimate the number and proportion of hidden mutations and, hence, the ability to correct sequence divergence for hidden changes.

Fortunately, increasingly complex models of substitutional change continue to be incorporated into methods of correcting sequence divergence rates for saturation (Jukes and Cantor [1969] and the models that followed), ti/tv bias (Kimura 1980), base composition bias (Tajima and Nei 1984; Tamura 1992; Tamura and Nei 1993), and rate variation among sites (Yang 1995; Nielsen 1997). Thus, even though the functional constraints on rRNA, tRNA, and a large variety of protein-coding genes are not well understood, it is still possible to investigate the patterns of variation and phylogenetic utility of these genes in some detail.

In this paper, we examine the mitochondrial large-subunit (16S) rRNA gene for hierarchical patterns of sequence variation and divergence among a large and diverse array of hymenopteran insects (ants, bees, wasps, parasitoids, and sawflies). We address several aspects of 16S sequence variation: (1) the observed patterns of sequence divergence at different taxonomic levels; (2) how those patterns are affected by correcting the sequence divergences under different models; (3) the inferred ti : tv ratios at various hierarchical levels; (4) base composition bias within the Hymenoptera, and the potential effects of this bias on rates of change and phylogenetic informativeness; and (5) the distribution of variable sites across the surveyed gene and the position of those sites relative to the inferred secondary structure of the molecule.

For the vast majority of Hymenoptera, fossils are not available for accurate estimates of divergence times (Whitfield 1998). Thus, explicit plots of sequence divergence against time (as in Brower 1994; Graybeal 1994) are not possible. Instead, we have taken a hierarchical approach in this study, comparing sequence divergences for species within genera, for genera within tribes or subfamilies, for subfamilies within families, for families within superfamilies, and for superfamilies within the order. We realize that taxonomic levels are necessarily subjective and somewhat arbitrarily determined among different taxa (although this problem should be minimized by confining comparisons to within a single order of insects), and that this subjectivity will introduce an unknown amount of variability into our analyses. What this approach lacks in determination of rate accuracy may, however, be counterbalanced by (1) the large and diverse number of our taxonomic comparisons and (2) the applicability of our methods without prior estimates of fossil-calibrated divergence times.

Prior Uses and Criticisms of the 16S Gene for Phylogeny

Sequences coding for the 16S rRNA gene have been used for estimating phylogenies over a notable range of taxonomic levels (see table 1 for a survey of studies involving insects). The existence of sites that are changing at widely differing rates within this single gene (Hillis and Dixon 1991; Simon et al. 1994) suggests that 16S sequences contain historical information that is useful at more than one level of phylogenetic divergence. In a recent review, Simon et al. (1994) suggested that data from 16S might be useful primarily for phylogenetic estimation at higher levels, because few sites were variable at lower levels among closely related species, and even some of those quickly saturated. In contrast, Engel and Schultz (1997), in a reanalysis of Cameron's (1991, 1993) data for estimating relationships among the corbiculate bees, suggested that Apis (honey bee) species relationships recovered from 16S sequences (Cameron et al. 1992) were strongly congruent with those inferred from morphological data.

In cases in which well-corroborated phylogenies are available, concordance of DNA-based results with well-researched phylogenies based on morphology and other evidence often provides a good test of the phylogenetic informativeness of molecular data (e.g., Friedlander et al. [1996] for a nuclear protein-coding gene; Smith [1989] for rRNA genes). In this context, attempts to use the 16S gene for estimation of relationships at higher taxonomic levels have met with mixed (although significant) success. Cameron (1993) obtained a tribal phylogeny of the corbiculate bees which conflicts with morphology-based phylogenies (Roig-Alsina and Michener 1993) but is fully concordant with results from other genes, including the nuclear large-subunit (28S) rRNA gene (Sheppard and McPheron 1991; unpublished data), the major opsin gene (Mardulyn and Cameron 1998), and mitochondrial cytochrome b (Koulianos et al. 1998). Dowton and Austin (1994) and Flook and Rowell (1997a, 1997b) used 16S sequence data to recover relationships among the superfamilies of Hymenoptera and Orthoptera, respectively, which were largely consistent with those based on morphology. Other 16S phylogenetic analyses of hymenopteran families (Dowton et al. 1997) and subfamilies (Whitfield 1997; Dowton, Austin, and Antolin 1998) have recovered relationships that are largely congruent with those based on morphology.

In these higher-level molecular studies, considerable phylogenetic "noise" is present, presumably due to the saturation of many of the variable sites at those levels. Such noise has often been nullified or reduced by incorporating compensatory calculations or weights into parsimony or maximum-likelihood (ML) analyses (Swofford et al. 1996). Greater knowledge of how noise accumulates with increasing divergence level and further elucidation of patterns of variation within the 16S gene will greatly assist in the extraction of meaningful phylogenetic signal from the sequence data (Dowton and Austin 1997a; Flook and Rowell 1997a, 1997b;

Table 1
A Selection of Published Phylogenetic Studies of Taxa Using 16S Sequence Data

Order | Taxon | Hierarchical Levels | References
Blattaria | Entire order | Superfamilies, families | Kambhampati (1995)
Coleoptera | Gonioctena (Chrysomelidae) | Species | Mardulyn, Milinkovitch, and Pasteels (1997)
Coleoptera | Ophraella (Chrysomelidae) | Species | Funk et al. (1995)
Coleoptera | Cicindelidae | Species, populations | Vogler and DeSalle (1993); Vogler et al. (1993a, 1993b)
Diptera | Drosophilidae | Subgenera, species groups, species | DeSalle (1992a, 1992b); DeSalle et al. (1987)
Diptera | Simuliidae | Sibling species | Xiong and Kocher (1993a, 1993b)
Homoptera | Cicadellidae | Genera | Fang et al. (1993)
Hymenoptera | Apis (Apidae) | Species | Cameron (1991); Cameron et al. (1992); Engel and Schultz (1997)
Hymenoptera | Microgastrine Braconidae | Genera | Mardulyn and Whitfield (1998)
Hymenoptera | Apidae | Tribes, genera | Cameron (1993)
Hymenoptera | Entire order | Superfamilies, families | Derr et al. (1992a, 1992b); Dowton and Austin (1994, 1997a, 1997b)
Hymenoptera | Proctotrupomorpha, Evaniomorpha | Families | Dowton et al. (1997)
Hymenoptera | Microgastroid Braconidae | Subfamilies | Whitfield (1997)
Hymenoptera | Braconidae | Subfamilies | Dowton, Austin, and Antolin (1998)
Lepidoptera | Spodoptera and other Noctuidae | Populations, some species | Pashley and Ke (1992)
Orthoptera | Entire order | Suborders, superfamilies | Flook and Rowell (1997b)
Orthoptera | Caelifera | Superfamilies, families | Flook and Rowell (1997a)

Whitfield 1997). These are the goals of the analyses reported below.

Materials and Methods

Sources of Sequence Data

We examined 16S sequences that originated from five different phylogenetic studies: (1) an analysis of subfamily relationships within the hymenopteran lineage of microgastroid Braconidae (Whitfield 1997); (2) an analysis of relationships among the four tribes of corbiculate bees within the family Apidae sensu Roig-Alsina and Michener (1993) (Cameron 1993); (3) an analysis of hymenopteran relationships focusing on the superfamilies Evanioidea and Proctotrupoidea s. l. (Dowton et al. 1997); (4) a preliminary analysis of the phylogenetic utility of the 16S gene in the order Hymenoptera (Derr et al. 1992a, 1992b); and (5) a survey of hymenopteran relationships, with a special focus on the nonsawfly taxa (Dowton and Austin 1994). Our analyses utilize a 434-bp portion of the 3′ end of the 16S gene that was shared among each of the data sets from these studies. This portion of the gene corresponds to positions 13470–13894 in Apis mellifera (Crozier and Crozier 1993). A list of all 65 taxa examined in this analysis, along with their current classifications, GenBank accession numbers (when available), and source references, is provided in table 2.

Sequence Alignment

All sequences were entered unaligned into SeqApp, version 1.9a (Gilbert 1993), and checked for accuracy. Sequences from two braconid wasps (fig. 1a and b) and a bumble bee were fitted by hand to the 16S secondary-structure model of Gutell (1993). These, in turn, served as templates for aligning sequences within the superfamilies Ichneumonoidea and Apoidea, respectively, using CLUSTAL W (Thompson, Higgins, and Gibson 1994). These two blocks of aligned sequences within superfamilies were then aligned to one another. Finally, sequences from the remaining taxa were aligned to this set of aligned sequences using CLUSTAL W. A completely automated alignment of all 65 taxa using the default parameters within CLUSTAL W was used in a previous set of hierarchical analyses (SAC, 1996 meeting of the Society for the Study of Evolution/Society of Systematic Biologists). The automated alignment procedure resulted in general patterns of variability in the data that were virtually identical to those reported here. However, excluding considerations of secondary structure has been found to be less effective for phylogeny estimation for these taxa (Whitfield 1997). The patterns observed below account for secondary structure.

Hierarchical Comparisons

The taxa used at each taxonomic (hierarchical) level of comparison are given in table 3. Exemplars were selected so that each taxon is represented only once at the next lowest hierarchical level. For instance, to compare divergences at the level of genera within subfamilies, one exemplar was selected from each of one or more genera within a subfamily to represent that genus in the calculation of pairwise matrices. This procedure

Table 2
Taxa Examined in this Analysis, Along with Their Current Classifications and Sources

Superfamily | Family | Subfamily | Genus | Species | GenBank Accession No. | Source
Tenthredinoidea | Tenthredinidae | Undetermined | Undetermined | Undetermined | Not submitted | Derr et al. (1992a)
Tenthredinoidea | Pergidae | Perginae | Perga | condei | U06953 | Dowton and Austin (1994)
Tenthredinoidea | Pergidae | Phylacteophaginae | Phylacteophaga | froggattii | U06954 | Dowton and Austin (1994)
Cephoidea | Cephidae | Cephinae | Hartigia | trimaculata | U06955 | Dowton and Austin (1994)
Orussoidea | Orussidae | Orussinae | Orussus | terminalis | U06956 | Dowton and Austin (1994)
Evanioidea | Evaniidae | Evaniinae | Evania | Undetermined | U06975 | Dowton and Austin (1994)
Evanioidea | Gasteruptiidae | Hyptiogastrinae | Eufoenus | Undetermined | U06972 | Dowton and Austin (1994)
Evanioidea | Gasteruptiidae | Gasteruptiinae | Gasteruption | Undetermined | U06974 | Dowton and Austin (1994)
Trigonalyoidea | Trigonalyidae | Trigonalyinae | Orthogonalys | pulchella | U06973 | Dowton and Austin (1994)
Trigonalyoidea | Trigonalyidae | Trigonalyinae | Poecilogonalys | costalis | U06971 | Dowton and Austin (1994)
Megalyroidea | Megalyridae | Megalyrinae | Megalyra | Undetermined | U39955 | Dowton et al. (1997)
Ceraphronoidea | Ceraphronidae | Ceraphroninae | Aphanogmus | Undetermined | U39949 | Dowton et al. (1997)
Ceraphronoidea | Megaspilidae | Megaspilinae | Conostigmus | Undetermined | U39951 | Dowton et al. (1997)
Proctotrupoidea | Pelecinidae | Pelecininae | Pelecinus | polyturator | U39956 | Dowton et al. (1997)
Proctotrupoidea | Proctotrupidae | Proctotrupinae | Codrus | Undetermined | U39950 | Dowton et al. (1997)
Proctotrupoidea | Proctotrupidae | Proctotrupinae | Disogmus | areolator | U39953 | Dowton et al. (1997)
Proctotrupoidea | Vanhorniidae | Vanhorniinae | Vanhornia | eucnemidarum | U06969 | Dowton and Austin (1994)
Proctotrupoidea | Roproniidae | Roproniinae | Ropronia | garmani | U06968 | Dowton and Austin (1994)
Proctotrupoidea | Heloridae | Helorinae | Helorus | Undetermined | U39954 | Dowton et al. (1997)
Proctotrupoidea | Diapriidae | Ambositrinae | Diphoropria | Undetermined | U39952 | Dowton et al. (1997)
Proctotrupoidea | Diapriidae | Diapriinae | Spilomicrus | Undetermined | U39957 | Dowton et al. (1997)
Platygastroidea | Scelionidae | Scelioninae | Scelio | fulgidus | U06964 | Dowton and Austin (1994)
Platygastroidea | Scelionidae | Telonominae | Trissolcus | basalis | U06962 | Dowton and Austin (1994)
Cynipoidea | Figitidae | Anacharitinae | Anacharis | zealandica | U39948 | Dowton et al. (1997)
Cynipoidea | Ibaliidae | Ibaliinae | Ibalia | leucospoides | U06970 | Dowton and Austin (1994)
Chalcidoidea | Aphelinidae | Aphelininae | Aphytis | melinus | U06965 | Dowton and Austin (1994)
Chalcidoidea | Aphelinidae | Coccophaginae | Encarsia | formosa | U06966 | Dowton and Austin (1994)
Chalcidoidea | Pteromalidae | Pteromalinae | Pteromalus | puparum | U06967 | Dowton and Austin (1994)
Ichneumonoidea | Ichneumonidae | Ichneumoninae | Ichneumon | promissorius | U06960 | Dowton and Austin (1994)
Ichneumonoidea | Ichneumonidae | Campopleginae | Venturia | canescens | U06961 | Dowton and Austin (1994)
Ichneumonoidea | Ichneumonidae | Pimplinae | Xanthopimpla | stemmator | Not submitted | Derr et al. (1992a, 1992b)
Ichneumonoidea | Braconidae | Braconinae | Digonogastra | kimballi | Not submitted | Derr et al. (1992a, 1992b)
Ichneumonoidea | Braconidae | Braconinae | Bracon | hebetor | U68145 | Whitfield (1997)
Ichneumonoidea | Braconidae | Ichneutinae | Paroligoneurus | Undetermined | U68148 | Whitfield (1997)
Ichneumonoidea | Braconidae | Agathidinae | Alabagrus | stigma | Not submitted | Derr et al. (1992a, 1992b)
Ichneumonoidea | Braconidae | Meteorinae | Meteorus | pulchricornis | U68146 | Whitfield (1997)
Ichneumonoidea | Braconidae | Neoneurinae | Neoneurus | mantis | U68147 | Whitfield (1997)
Ichneumonoidea | Braconidae | Cheloninae | Chelonus | Undetermined | U68150 | Whitfield (1997)
Ichneumonoidea | Braconidae | Cheloninae | Ascogaster | argentifrons | U68145 | Whitfield (1997)
Ichneumonoidea | Braconidae | Cardiochilinae | Toxoneuron | nigriceps | U69151 | Whitfield (1997)
Ichneumonoidea | Braconidae | Miracinae | Mirax | lithocolletidis | U68152 | Whitfield (1997)
Ichneumonoidea | Braconidae | Microgastrinae | Pholetesor | bedelliae | U68153 | Whitfield (1997)
Ichneumonoidea | Braconidae | Microgastrinae | Microgaster | canadensis | U68154 | Whitfield (1997)
Ichneumonoidea | Braconidae | Microgastrinae | Microplitis | Undetermined | U68155 | Whitfield (1997)
Ichneumonoidea | Braconidae | Microgastrinae | Cotesia | autographae | U68156 | Whitfield (1997)
Ichneumonoidea | Braconidae | Microgastrinae | Cotesia | congregata | U68157 | Whitfield (1997)
Ichneumonoidea | Braconidae | Microgastrinae | Cotesia | glomerata | U06958 | Dowton and Austin (1994)
Ichneumonoidea | Braconidae | Microgastrinae | Cotesia | orobenae | U68158 | Whitfield (1997)
Ichneumonoidea | Braconidae | Microgastrinae | Cotesia | rubecula | U06959 | Dowton and Austin (1994)
Vespoidea | Vespidae | Polistinae | Polistes | versicolor | Not submitted | Derr et al. (1992a)
Vespoidea | Formicidae | Myrmeciinae | Myrmecia | forficata | U06963 | Dowton and Austin (1994)
Apoidea | Apidae | Apinae | Xylocopa | virginica | L22905 | Cameron (1993)
Apoidea | Apidae | Apinae | Eufriesea | caerulescens | L22904 | Cameron (1993)
Apoidea | Apidae | Apinae | Eulaema | polychroma | L22903 | Cameron (1993)
Apoidea | Apidae | Apinae | Bombus | avinoviellus | L22897 | Cameron (1993)
Apoidea | Apidae | Apinae | Bombus | pennsylvanicus | L22896 | Cameron (1993)
Apoidea | Apidae | Apinae | Melipona | compressipes | L22899 | Cameron (1993)
Apoidea | Apidae | Apinae | Scaptotrigona | luteipennis | L22900 | Cameron (1993)
Apoidea | Apidae | Apinae | Trigona | hypogaea | L22901 | Cameron (1993)
Apoidea | Apidae | Apinae | Trigona | pallens | L22902 | Cameron (1993)
Apoidea | Apidae | Apinae | Apis | cerana | L22892 | Cameron (1993)
Apoidea | Apidae | Apinae | Apis | dorsata | L22893 | Cameron (1993)
Apoidea | Apidae | Apinae | Apis | florea | L22894 | Cameron (1993)
Apoidea | Apidae | Apinae | Apis | koschevnikovi | L22895 | Cameron (1993)
Apoidea | Apidae | Apinae | Apis | mellifera | L22891 | Cameron (1993)

eliminated unnecessary duplication of divergence calculations from sets of closely related species to other sets of closely related species, while still maintaining a relatively large sample size of comparisons. Initial calculations using the entire pairwise matrix indicated that choice of exemplar taxon had little effect on the results.

Obtaining Sequence Divergence Data and Statistics

Aligned sequences were reformatted appropriately and entered into MEGA, version 1.01 (Kumar, Tamura, and Nei 1993), for calculation of sequence statistics and measures of sequence divergence from pairwise matrices. For each pairwise comparison, we calculated the ti : tv ratio, the uncorrected sequence divergence (p-distance), and several corrected divergence estimates, including the Jukes and Cantor (1969), Kimura (1980) two-parameter, Tajima and Nei (1984), and Tamura and Nei (1993) models. In the Tamura-Nei divergence estimates, a gamma distribution was assumed using a shape parameter of 0.4, consistent with (but slightly higher than) estimates of the gamma shape parameter reported by Yang (1996) for insect 16S rRNA. Estimates of sequence divergence using the Hasegawa, Kishino, and Yano (1985; subsequently referred to as HKY85) and General Time Reversible (Yang 1994; subsequently referred to as GTR) models were obtained using PAUP* 4.0 beta version b1 (Swofford 1998), with the estimated parameters obtained as described in the following paragraph.

More precise estimates of site-to-site rate variation were obtained by applying ML analysis (PAUP* test version 4.0d54; Swofford 1997) to a relatively well corroborated reference tree topology for the 65 hymenopteran taxa (table 1; see discussion of the reference phylogeny below). A gamma distribution was initially inferred using four rate categories and the HKY85 model (Hasegawa, Kishino, and Yano 1985). The gamma shape parameter, the proportion of invariant sites, and ti : tv ratio were estimated from the data (with reference to the phylogeny), using the fast ML method (Rogers and Swofford 1998) to obtain starting branch lengths. Alternative methods for estimating site-to-site rate variation (e.g., Nielsen 1997) have been proposed, but these require divergence time or branch length information that is not currently available for Hymenoptera.

Using the reference phylogeny to consider subclades at several hierarchical levels, we implemented MacClade, version 3.06 (updated version of Maddison and Maddison 1992), to obtain site-to-site estimates of the actual number of changes inferred (results summarized in fig. 7). MacClade was also used with the reference phylogeny to obtain graphical depictions of the frequencies of changes from one nucleotide to another (results summarized in fig. 4).

The Reference Phylogeny

The higher-level phylogeny of Hymenoptera, as estimated from paleontological, morphological, and molecular data, was recently reviewed by Whitfield (1998). Using the consensus phylogeny from that review as a foundation, we constructed a composite phylogeny (fig. 2) using results from Cameron (1993) for Apidae, Whitfield (1997) for Braconidae, and Dowton and Austin (1994) and Dowton et al. (1997) for some family- and superfamily-level hymenopteran relationships. Relationships among exemplar taxa from these studies were used to reconstruct the tips of the tree. Because this reference phylogeny is based on data from multiple sources, it is likely to be relatively accurate, although it could differ in some minor details from a maximum-parsimony tree estimated in an actual combined analysis. A combined analysis is not possible at the present time, because major differences in taxon representation exist between studies. However, the reference topology (fig. 2) has the advantage of being largely corroborated by both molecular and morphological data. Therefore, rate parameters estimated with reference to this phylogeny should be relatively robust.

Testing the Estimated Parameters in Phylogeny Estimation

To determine the effects of the estimated ML parameters and their ability to recover the correct tree relative to parsimony analysis, two reduced sets of exemplar taxa were selected. One comprised species of bees within the tribe Apini, and the other comprised superfamily representatives. These were selected because relationships among the taxa have been well corroborated from multiple studies (apine bees: Alexander 1991; Cameron 1991, 1993; Engel and Schultz 1997; superfamilies: see Whitfield 1998 for a review of the hymenopteran superfamily relationships based on molecular, morphological, and fossil data). "Expected" phylogenies could thus be specified for these well-corroborated groups. The 16S data were subjected to equally weighted maximum-parsimony analysis and two ML analyses: one using the HKY85 model (Hasegawa, Kishino, and Yano 1985), assuming a gamma distribution of among-site rate variation estimated from the empirical base frequencies and using estimates of the shape parameter and proportion of invariant sites from the analyses described above in Obtaining Sequence Divergence Data and Statistics; and one using the GTR (Yang 1994) model (same site-to-site rate variation assumptions) after estimation of the general rate matrix in an initial run on the entire data set. Each analysis was run as a branch-and-bound search and repeated as a bootstrap analysis (heuristic search, 400 replications) using PAUP*. The percentage of clades correct (Hillis, Huelsenbeck, and Cunningham 1994) and the slightly more sensitive bootstrapped percentage of clades correct (Cunningham 1997) were calculated as measures of the ability to recover the expected phylogenies. Likelihood ratio tests (Huelsenbeck and Rannala 1997) were conducted on a series of analyses to determine which of the estimated parameters (alone or in combination) significantly improved the ML estimation.

FIG. 1. Secondary structures for portions of the 16S gene analyzed for two braconid wasps in different subfamilies, Pholetesor bedelliae (Viereck) and Toxoneuron nigriceps (Viereck), as fitted to the structural model of Gutell (1993).

Table 3
Comparisons Used for Hierarchical Divergence Estimates

Among species within genera (n = 22 pairwise comparisons)
  Within Apis: A. mellifera, A. cerana, A. dorsata, A. koschevnikovi, A. florea
  Within Bombus: B. avinoviellus, B. pennsylvanicus
  Within Cotesia: C. autographae, C. congregata, C. glomerata, C. orobenae, C. rubecula
  Within Trigona: T. hypogaea, T. pallens

Among genera within subfamilies (n = 38 pairwise comparisons)
  Within Apinae: Apis (mellifera), Bombus (pennsylvanicus), Eufriesea (caerulescens), Eulaema (polychroma), Trigona (hypogaea), Scaptotrigona (luteipennis), Melipona (compressipes), Xylocopa (virginica)
  Within Braconinae: Bracon (hebetor), Digonogastra (kimballi)
  Within Cheloninae: Chelonus (sp.), Ascogaster (argentifrons)
  Within Microgastrinae: Microplitis (sp.), Microgaster (canadensis), Cotesia (glomerata), Pholetesor (bedelliae)
  Within Proctotrupinae: Codrus (sp.), Disogmus (areolator)
  Within Trigonalyinae: Orthogonalys (pulchella), Poecilogonalys (costalis)

Among subfamilies within families (n = 43 pairwise comparisons)
  Within Aphelinidae: Encarsia formosa (Coccophaginae), Aphytis melinus (Aphelininae)
  Within Braconidae: Alabagrus stigma (Agathidinae), Ascogaster argentifrons (Cheloninae), Bracon hebetor (Braconinae), Cotesia glomerata (Microgastrinae), Meteorus pulchricornis (Meteorinae), Mirax lithocolletidis (Miracinae), Neoneurus mantis (Neoneurinae), Paroligoneurus sp. (Ichneutinae), Toxoneuron nigriceps (Cardiochilinae)
  Within Diapriidae: Diphoropria sp. (Ambositrinae), Spilomicrus sp. (Diapriinae)
  Within Gasteruptiidae: Gasteruption sp. (Gasteruptiinae), Eufoenus sp. (Hyptiogastrinae)
  Within Ichneumonidae: Ichneumon promissorius (Ichneumoninae), Venturia canescens (Campopleginae), Xanthopimpla stemmator (Pimplinae)
  Within Scelionidae: Scelio fulgidus (Scelioninae), Trissolcus basalis (Telonominae)

Among families within superfamilies (n = 21 pairwise comparisons)
  Within Ceraphronoidea: Aphanogmus sp. (Ceraphronidae), Conostigmus sp. (Megaspilidae)
  Within Chalcidoidea: Aphytis melinus (Aphelinidae), Pteromalus puparum (Pteromalidae)
  Within Cynipoidea: Anacharis zealandica (Figitidae), Ibalia leucospoides (Ibaliidae)
  Within Evanioidea: Evania sp. (Evaniidae), Gasteruption sp. (Gasteruptiidae)
  Within Proctotrupoidea: Codrus sp. (Proctotrupidae), Helorus sp. (Heloridae), Pelecinus polyturator (Pelecinidae), Ropronia garmani (Roproniidae), Spilomicrus sp. (Diapriidae), Vanhornia eucnemidarum (Vanhorniidae)
  Within Tenthredinoidea: Tenthredinidae sp., Perga condei (Pergidae)
  Within Vespoidea: Myrmecia forficata (Formicidae), Polistes versicolor (Vespidae)

Among superfamilies within Hymenoptera (n = 91 pairwise comparisons)
  Representing Apoidea: Apis mellifera
  Representing Cephoidea: Hartigia trimaculata
  Representing Ceraphronoidea: Aphanogmus sp.
  Representing Chalcidoidea: Pteromalus puparum
  Representing Cynipoidea: Ibalia leucospoides
  Representing Evanioidea: Evania sp.
  Representing Ichneumonoidea: Ichneumon promissorius
  Representing Megalyroidea: Megalyra sp.
  Representing Orussoidea: Orussus terminalis
  Representing Platygastroidea: Scelio fulgidus
  Representing Proctotrupoidea: Codrus sp.
  Representing Tenthredinoidea: Phylacteophaga froggattii
  Representing Trigonalyoidea: Orthogonalys pulchella
  Representing Vespoidea: Polistes versicolor
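The numbers of pairwise comparisons listed in table 3 are consistent with comparing every pair of exemplars within each group and summing over the groups at a level. The following is a minimal sketch of that bookkeeping in Python, not the procedure actually run in MEGA; the group sizes are read off table 3, and the names `GROUP_SIZES`, `pairwise_comparisons`, and `COUNTS` are illustrative assumptions.

```python
# Number of exemplars in each group at each hierarchical level, read off table 3.
# The dictionary and function names are hypothetical, chosen for illustration.
GROUP_SIZES = {
    "species within genera": [5, 2, 5, 2],                   # Apis, Bombus, Cotesia, Trigona
    "genera within subfamilies": [8, 2, 2, 4, 2, 2],         # Apinae ... Trigonalyinae
    "subfamilies within families": [2, 9, 2, 2, 3, 2],       # Aphelinidae ... Scelionidae
    "families within superfamilies": [2, 2, 2, 2, 6, 2, 2],  # Ceraphronoidea ... Vespoidea
    "superfamilies within Hymenoptera": [14],                # one exemplar per superfamily
}


def pairwise_comparisons(sizes):
    """Sum the number of unordered pairs, n * (n - 1) / 2, over all groups."""
    return sum(n * (n - 1) // 2 for n in sizes)


COUNTS = {level: pairwise_comparisons(sizes) for level, sizes in GROUP_SIZES.items()}

if __name__ == "__main__":
    for level, count in COUNTS.items():
        print(f"{level}: n = {count}")
```

Under this reading the five levels give 22, 38, 43, 21, and 91 comparisons, matching the values of n reported in table 3.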

FIG. 2. The reference phylogeny used to estimate the number of evolutionary changes. See text for origin of this phylogeny, and table 2 for complete names and classifications of the taxa.

FIG. 3. Sequence divergence, uncorrected and corrected using the models of Jukes and Cantor (1969), Tajima and Nei (1984), Kimura (1980), Tamura and Nei (1993), HKY85 (Hasegawa, Kishino, and Yano 1985), and GTR (Yang 1994), plotted against taxonomic level. See text for further explanation and interpretation of the specific models used.
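To make the simplest of these corrections concrete, the sketch below computes the uncorrected p-distance, the transition and transversion counts, and the standard Jukes-Cantor and Kimura two-parameter distances for one aligned pair. It also includes a gamma-corrected Jukes-Cantor distance as a simpler stand-in for the gamma-corrected Tamura-Nei estimate used in the study. This is an illustration in Python under stated assumptions, not the MEGA or PAUP* implementation; the function names and the toy sequences are hypothetical, and gaps or ambiguity codes are simply skipped.

```python
import math

# Purine-purine and pyrimidine-pyrimidine changes are transitions.
TRANSITION_PAIRS = {frozenset("AG"), frozenset("CT")}


def count_differences(seq1, seq2):
    """Return (compared sites, transitions, transversions) for an aligned pair."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        if frozenset((a, b)) in TRANSITION_PAIRS:
            transitions += 1
        else:
            transversions += 1
    return sites, transitions, transversions


def jukes_cantor(p):
    """Jukes-Cantor distance: d = -(3/4) ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(ti_prop, tv_prop):
    """Kimura two-parameter distance from transition (P) and transversion (Q) proportions."""
    a = 1.0 - 2.0 * ti_prop - tv_prop
    b = 1.0 - 2.0 * tv_prop
    if a <= 0.0 or b <= 0.0:
        raise ValueError("divergence too large for the Kimura correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


def jukes_cantor_gamma(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates (shape parameter alpha)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


def summarize(seq1, seq2, alpha=0.4):
    """Collect the distance estimates for one pair (alpha = 0.4 as assumed in the text)."""
    sites, ti, tv = count_differences(seq1, seq2)
    if sites == 0:
        raise ValueError("no comparable sites")
    p = (ti + tv) / sites
    return {
        "ti": ti,
        "tv": tv,
        "p": p,
        "jc": jukes_cantor(p),
        "k2p": kimura_2p(ti / sites, tv / sites),
        "jc_gamma": jukes_cantor_gamma(p, alpha),
    }


if __name__ == "__main__":
    # Toy 10-site alignment, invented for illustration only.
    print(summarize("AAAAAAAAAA", "AAAAAAAGGC"))
```

For the same observed p-distance, the gamma-corrected estimate exceeds the uniform-rate estimate, and the gap widens as the shape parameter shrinks.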

Results

Nucleotide Divergences at Various Taxonomic Levels

Figure 3 depicts the mean pairwise percentages of nucleotide divergence among taxa at five hierarchical levels (species, genera, subfamilies, families, and superfamilies). The data are represented as uncorrected (raw p-distance) and corrected, applying the Jukes and Cantor (1969), Kimura (1980) two-parameter, Tajima and Nei (1984), Tamura and Nei (1993), HKY85, and GTR (Yang 1994) substitution models. All of these correction methods correct, to some degree, for multiple nucleotide replacements (saturation) at a site. The Jukes-Cantor model is the simplest in assuming equal base frequencies, ti : tv substitution rates, and no site-to-site rate variation. Correcting for transition bias with the Kimura two-parameter model resulted in divergence estimates virtually indistinguishable from those based on the Jukes-Cantor model, indicating that transition bias has relatively little effect on these data (see also below). Correcting for base composition bias using the Tajima-Nei model results in a small but distinguishable increase in estimated level of divergence at higher taxonomic levels. This increase is small despite the relatively strong AT bias in Hymenoptera. Applying the Tamura-Nei assumption of gamma-distributed (shape parameter α = 0.4) rates across sites and estimating the proportion of invariant sites from the data results in major increases in the estimated divergences at all levels above species. Finally, the HKY85 and GTR models, using the gamma shape parameter and proportion of invariant sites estimated from the data, also resulted in major increases in estimated divergence (with the HKY85 estimates strongly resembling the Tamura-Nei estimates). Golding (1983) noted that failure to account for site-to-site rate variation (when it is substantial) can result in an underestimation of the actual number of substitutions, clearly an influential factor with our 16S data. The estimated divergence levels among families and superfamilies are extremely high, nearing or exceeding 100%, clearly the result of superimposed changes at highly variable sites.

Nucleotide Composition Bias

It has previously been noted that the mtDNA of insects in general (Simon et al. 1994), and Hymenoptera in particular (Dowton and Austin 1997a, 1997b), exhibits a significantly larger proportion of A and T nucleotides as compared with C and G. Our findings from the large hymenopteran data set confirm these reports (table 4) both in magnitude and direction of the bias. AT content is highest in groups considered to be relatively recently diverged in the hymenopteran phylogeny (bees, chalcidoids, scelionids, and some endoparasitoid braconids). The base composition bias is obviously reflected in the substitution bias toward A's and T's at different hierarchical levels (fig. 4).

Table 4
Mean Percentages of A's and T's for 16S data from Hymenopteran Taxa

Superfamily          N    Mean % A+T
Tenthredinoidea      3    77.4
Cephoidea            1    78.5
Orussoidea           1    75.1
Evanioidea           3    80.6
Trigonalyoidea       2    82.1
Megalyroidea         1    82.1
Ceraphronoidea       2    81.4
Proctotrupoidea      8    83.1
Platygastroidea      2    85.9
Cynipoidea           2    83.6
Chalcidoidea         3    83.6
Ichneumonoidea      21    83.4
Vespoidea            2    81.5
Apoidea             14    81.2
Total               65    82.2

Transition/Transversion Bias

The uncorrected ti : tv ratios for Hymenoptera (fig. 5) are unusually low (especially for species-level comparisons) and increase with divergence time, contrary to expectations (Wakeley 1996). It is possible that we lack sufficient comparisons at the population and closely related species levels to detect the characteristic dominance of transitions at low taxonomic levels, but there is no indication that transitions predominate at any level in our comparisons (fig. 6). The overall ti : tv ratio estimated for all sequences using ML is also unusually low (0.28).

Site-to-Site Rate Variation

In our initial estimates of corrected nucleotide divergence among taxa at different taxonomic levels (fig. 3), we employed the Tamura-Nei model and assumed a gamma distribution of rates across sites and a shape parameter (α) of 0.4. The shape parameter value was relatively close to the value of 0.31 estimated from 16S data for 17 eukaryotes (Yang and Kumar 1996). Our subsequent estimate of α, using ML in conjunction with the reference phylogeny (fig. 2), resulted in the considerably higher value of 0.8728. The estimated proportion of invariant sites was 0.1281.

Length-Variable Regions

Several regions of the 434-bp fragment of the 16S molecule exhibit considerable length variation (indels), especially in higher-taxon comparisons. These length-variable regions (fig. 1) do not consistently correspond to any specific structural features of the molecule (e.g., loop regions), but do occur in the same positions across taxa within the Hymenoptera. They also correspond to those regions of figure 7 with the largest numbers of estimated changes. Some of these variable regions can be reconciled with the highly variable regions found in Orthoptera (Flook and Rowell 1997a, 1997b).

Hierarchical Accumulation of Variation

Figure 7 shows the estimated number of changes at each site within the 434-bp region for several taxonomic levels, optimized onto the reference phylogeny using maximum parsimony. From this, it is clear that the most highly variable sites are clumped in distribution and vary in magnitude at different taxonomic levels. For example, while variability may be relatively low for comparisons among subfamilies, it becomes enormous when different superfamilies are compared.

Efficacy of the Estimated Parameters in Phylogeny Estimation

Figures 8 and 9 depict results of analyses designed to assess which of three approaches comes closer to recovering the expected phylogenies (figs. 8A and 9A): (1) unweighted parsimony, (2) ML under the HKY85 model

FIG. 4. Relative frequency of types of base changes, as estimated under parsimony assumptions using MacClade, version 3.06 (Maddison and Maddison 1992) and the reference phylogeny in figure 2.

FIG. 5. Uncorrected pairwise ti : tv ratio, plotted against taxonomic level of comparison within Hymenoptera.

of evolutionary change incorporating site-to-site rate variation, or (3) ML under the GTR model (also similarly incorporating site-to-site rate variation). At both moderately low (fig. 8) and high (fig. 9) taxonomic levels, the more complex ML models incorporating our parameter estimates improve the ability of the analyses to recover the expected phylogeny (figs. 8D and 9D). However, at the higher taxonomic level, even the complex ML methods result in a tree with unacceptably low bootstrap support for most nodes (fig. 9D), suggesting that these data are not useful at the superfamily level or that taxon sampling needs to be more complete. Nevertheless, the use of additional data from other sources would be advisable to correctly estimate the entire phylogeny.

Likelihood ratio tests (Huelsenbeck and Rannala 1997) indicate that taking the empirical base frequencies into account significantly improves the fit of the ML model to the tree (P ≪ 0.01), as do the models accounting for both base frequencies and site-to-site rate variation (P ≪ 0.01). However, accounting for ti : tv ratio alone has little effect (P > 0.05).

Discussion

Unusual Features of Hymenopteran 16S Sequence Data

Hymenopteran mtDNA exhibits one of the highest proportions of AT nucleotides of any organism yet measured (Cameron 1991, 1993; Cameron et al. 1992; Crozier and Crozier 1993; Simon et al. 1994; Dowton and Austin 1994, 1997a, 1997b; Whitfield 1997). A current hypothesis for this AT richness is that strand-specific compositional bias (a predominance of G→A transitions, perhaps the result of asymmetries in stem base-pairing capabilities) has led to an increase in A content, followed by an increase in T content (reviewed in Dowton and Austin 1997b). Dowton and Austin (1997b) showed that the AT content of 16S increases from the

FIG. 6. Observed number of transitions plotted against number of transversions from all pairwise comparisons within Hymenoptera. A large number of superimposed points are hidden in the cloud to the right.

FIG. 7. Number of changes by site at various taxonomic levels within Hymenoptera. The number of changes at each position was estimated using MacClade 3.06 (Maddison and Maddison 1992) and the reference phylogeny in figure 2.

base to the tips of the tree, suggesting that AT bias has continued to accumulate over time within lineages of Hymenoptera. Our data are consistent with their findings.

This high AT bias could explain, in part, the strikingly low ti : tv ratio observed at all taxonomic levels. Furthermore, the fact that the ti : tv ratio increases with increasing taxonomic levels (fig. 5) suggests that the AT bias changes among lineages (which it does to some degree; see table 4). Nonetheless, the ti : tv ratio is exceptionally low compared with those of other organisms, even after compensating for the AT bias using ML. Clearly, there must be other factors operating here which require further investigation.
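The two summary statistics discussed here, A+T percentage per sequence and the uncorrected pairwise ti : tv ratio, are simple to compute from an alignment. The following is a minimal sketch assuming aligned sequences held in a plain dictionary; the function names and the dictionary layout are hypothetical, gaps and ambiguity codes are simply skipped, and this is not the procedure or software used in the study.

```python
from itertools import combinations

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def at_percent(seq):
    """Percentage of A and T among unambiguous bases (gaps and ambiguity codes skipped)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * sum(b in "AT" for b in bases) / len(bases)


def titv_counts(seq1, seq2):
    """Count observed (uncorrected) transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ti = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ti += 1  # A<->G or C<->T
        else:
            tv += 1  # any purine<->pyrimidine change, including A<->T
    return ti, tv


def pairwise_titv(alignment):
    """Map each pair of names to its ti/tv ratio, or None when no transversions are observed.

    `alignment` is assumed to be a dict of name -> aligned sequence.
    """
    ratios = {}
    for (name1, seq1), (name2, seq2) in combinations(sorted(alignment.items()), 2):
        ti, tv = titv_counts(seq1, seq2)
        ratios[(name1, name2)] = ti / tv if tv else None
    return ratios
```

In strongly AT-rich sequences most observed differences are A↔T changes, which are transversions, so a sketch like this makes it easy to see how compositional bias alone can pull the observed ratio down.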

FIG. 8. Performance of three phylogenetic estimation methods in recovering an expected phylogeny (A) for a subset of taxa (species, genera, and tribes of bees) in figure 2. The three methods compared are (B) unweighted maximum parsimony, (C) ML using the HKY85 (Hasegawa, Kishino, and Yano 1985) model with the parameters estimated in this study (base frequencies, ti : tv ratio, proportion of invariant sites, gamma shape parameter), and (D) ML using the GTR model (same site-to-site rate assumptions as in C). −ln Like = negative log likelihood; % CC = percentage of clades correct (Hillis, Huelsenbeck, and Cunningham 1994); BV % CC = bootstrapped percentage of clades correct (Cunningham 1997). Numbers on branches represent bootstrap proportions (400 replications). Tree lengths and log likelihoods are based on the shortest trees from branch-and-bound searches using PAUP* (Swofford 1997–1998).
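The "percentage of clades correct" score in this caption can be illustrated with a small sketch. It assumes rooted trees written as nested tuples of taxon names, treats a clade as the set of terminal taxa below an internal node, and divides the number of expected clades recovered by the number of expected clades. The tree encoding, the function names, and the choice of denominator are assumptions made for illustration and may differ in detail from the cited definition.

```python
def clades(tree):
    """Return the set of clades (frozensets of taxon names) in a rooted nested-tuple tree.

    The root clade containing every taxon is excluded because it is trivially shared.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    root = leaves(tree)
    found.discard(root)
    return found


def percent_clades_correct(expected_tree, estimated_tree):
    """Percentage of clades in the expected tree that also appear in the estimated tree."""
    expected = clades(expected_tree)
    if not expected:
        raise ValueError("expected tree has no non-trivial clades")
    shared = expected & clades(estimated_tree)
    return 100.0 * len(shared) / len(expected)


# Hypothetical five-taxon example: the estimate misplaces taxon B.
expected = ((("A", "B"), "C"), ("D", "E"))
estimated = ((("A", "C"), "B"), ("D", "E"))
score = percent_clades_correct(expected, estimated)
```

Representing each clade as a frozenset makes the comparison independent of the order in which sister groups are written.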

16S rRNA sequences in general possess unique patterns of length variation and site-to-site rate variation that are absent in sequences of protein-coding genes. These patterns are most conspicuous in comparisons of distantly related taxa. Moreover, our investigation reveals that the location of the most variable sites is consistent across a wide array of taxa. Knowledge of the locations of high variability reported here for 16S should be useful for future investigations of Hymenoptera and other insect groups.

Treatment of 16S Data in Phylogenetic Analysis

At least two aspects of AT-richness have important implications for phylogenetic analysis. First, it has the effect of reducing a majority of sites to two-state characters (A or T), thus increasing the potential for homoplasy. Second, branch lengths will tend to be underestimated unless the AT bias is considered. The simplest way to compensate for AT bias is to downweight AT transversions in parsimony analyses (Knight and Mindell 1993; Collins, Wimberger, and Naylor 1994; Dowton and Austin 1997a). Differences among taxa in compositional bias can be compensated for by employing log-determinant (transformed) distances, as was done by Lockhart et al. (1994) for 16S data from honey bees. Although this method appears to successfully compensate for AT bias among lineages, we expect that a more informative general strategy will employ an ML approach, in which site-to-site rate variation can be considered simultaneously with compositional bias. The GTR (Yang 1994) model of sequence change appears to provide the best correction among those tried here, no doubt because it allows for AT transversions to be treated as an independent rate.

FIG. 9. As figure 8, except that taxa are exemplars of superfamilies represented in figure 2.

A number of phylogenetic studies have excluded the length-variable regions of 16S in some analyses (Cameron 1991; Dowton and Austin 1994; Dowton et al. 1997; Flook and Rowell 1997a, 1997b; Whitfield 1997), often with little effect on the outcome, although at higher levels, it appears that exclusion improves the signal : noise ratio (Flook and Rowell 1997b; Whitfield 1997). Alignment of length-variable regions to the 16S secondary structure (using the Gutell [1993] model) significantly increased the overall phylogenetic signal : noise ratio in at least one analysis (Whitfield 1997) and yielded a larger fraction of useful sequence data.

Recent work (reviewed in Yang 1996) has shown that when significant site-to-site rate variation exists in sequence data, it is important to account for this variation in order to obtain an accurate estimate of phylogeny. Rate variability across sites is more difficult to reconcile than compositional bias in phylogenetic analysis. Unfortunately, parsimony methods as currently implemented do not deal effectively with rate variation across sites (Yang 1996). At present, ML methods incorporating the gamma distribution, as well as mixed-distribution models incorporating both gamma-distributed sites and estimates of the proportion of invariant sites, are the most effective and easily implemented methods to accommodate site-to-site rate variation.

Finally, it is evident at the level of distant family and superfamily comparisons that the number of estimated changes at many sites within 16S becomes so large as to render the gene phylogenetically useless at those taxonomic levels. Our analyses of the data at these levels (fig. 9B–D) indicate that the data are poor at recovering expected clades. Figure 7 provides a graphic depiction of why the gene cannot resolve basal divergences within the Hymenoptera (Dowton and Austin 1994).

In summary, our analyses of patterns of variation in 16S sequences of Hymenoptera suggest that the 16S gene may be useful for phylogenetic analysis across a relatively wide range of taxonomic levels, with the following caveats:

1. When using 16S sequences in phylogenetic analyses, the magnitude of base composition bias (typically AT bias) requires consideration, particularly for higher-taxon comparisons. One approach is to downweight AT transversions in parsimony analysis. However, compensation for this bias is probably best accomplished using ML methods, in which site-to-site rate variation can be simultaneously accommodated.

2. Site-to-site variation in substitution rate is well documented for hymenopteran (and other insect) 16S sequence data, and should be incorporated into models for phylogenetic estimation, especially in studies of higher taxa (above the level of distantly related species). Attempts to estimate phylogeny using 16S data at higher taxonomic levels without considering site-to-site rate variation are likely to produce inaccurate estimations of branch lengths and perhaps even wrong topologies. We highly recommend estimating the shape parameter for the gamma distribution from the data, rather than employing previously published values, until a wider range of taxa have been investigated fully. Our data suggest a higher value for α than has previously been estimated for insects (0.87 compared with 0.3–0.4).

3. The 16S gene may be useful for estimating relationships among closely related species if a sufficient number of variable sites can be found. It clearly contains phylogenetically useful signal at the tribal/subfamily and close family levels. However, among genera and distantly related species groups, the highly variable sites appear to be saturated with substitutions, while too few of the conserved sites exhibit variation (Simon et al. 1994; Mardulyn and Whitfield 1998). Results of several phylogenetic studies suggest that the upper limit of utility for 16S is exceeded at the superfamily and subordinal levels (Dowton and Austin [1994] and Dowton et al. [1997] for Hymenoptera; Flook and Rowell [1997a, 1997b] for Orthoptera). This conclusion is strongly supported by our phylogenetic tests (especially fig. 9) and by the extremely high divergence levels among superfamilies (fig. 3).

Our analysis of the hierarchical utility of 16S nucleotide sequences for phylogeny estimation is somewhat limited in scope due to the intensive computational effort involved in summarizing such complex patterns. As future analyses continue to clarify patterns of variation within this and other genes, the task of matching the appropriate sequence data to specific evolutionary questions should become easier. At the same time, development of appropriate models of sequence change for each gene remains a critical step in phylogenetic analysis of DNA sequences. It is still a challenge to utilize such models in analyses incorporating multiple data sets, but this situation is likely to improve.

Acknowledgments

We thank Chris Simon, Karl Kjer, Mike Antolin, and Thomas Buckley for discussions about secondary structure and its incorporation into sequence alignment; Patrick Mardulyn, John Wakeley, and David Pollock for discussion of trends in our data and suggestions for their presentation; Paul Flook, Mark Dowton, and David Swofford for sharing manuscripts and preprints of relevant papers; and David Swofford for discussion of ML estimation of site-to-site rate heterogeneity and for prerelease versions of PAUP*. David Rand and two anonymous reviewers provided many useful comments on an earlier draft of the manuscript. This work was supported by NSF grants to J.B.W. (BSR-9111938) and S.A.C. (GER-9450117) and by USDA grant 9501893 to J.B.W.

LITERATURE CITED

ALEXANDER, B. A. 1991. Phylogenetic analysis of the genus Apis (Hymenoptera: Apidae). Ann. Entomol. Soc. Am. 84:137–149.
BROWER, A. V. Z. 1994. Rapid morphological radiation and convergence among races of the butterfly Heliconius erato inferred from patterns of mitochondrial DNA evolution. Proc. Natl. Acad. Sci. USA 91:6491–6495.
BROWER, A. V. Z., and R. DESALLE. 1994. Practical and theoretical considerations for choice of a DNA sequence region in insect molecular systematics, with a short review of published studies using nuclear gene regions. Ann. Entomol. Soc. Am. 87:702–716.
CAMERON, S. A. 1991. A new tribal phylogeny of the Apidae inferred from mitochondrial DNA sequences. Pp. 71–87 in D. R. SMITH, ed. Diversity in the genus Apis. Westview Press, Boulder, Colo.
CAMERON, S. A. 1993. Multiple origins of advanced eusociality in bees inferred from mitochondrial DNA sequences. Proc. Natl. Acad. Sci. USA 90:8687–8691.
CAMERON, S. A., J. N. DERR, A. D. AUSTIN, J. B. WOOLLEY, and R. A. WHARTON. 1992. The application of nucleotide sequence data to phylogeny of the Hymenoptera: a review. J. Hymenopt. Res. 1:63–79.
CHO, S., A. MITCHELL, J. C. REGIER, C. MITTER, R. W. POOLE, T. P. FRIEDLANDER, and S. ZHAO. 1995. A highly conserved nuclear gene for low-level phylogenetics: elongation factor-

1α recovers morphology-based tree for heliothine moths. Mol. Biol. Evol. 12:650–656.
COLLINS, T. M., P. H. WIMBERGER, and G. P. NAYLOR. 1994. Compositional bias, character-state bias, and character-state reconstruction using parsimony. Syst. Biol. 43:482–496.
CROZIER, R. H., and Y. C. CROZIER. 1993. The mitochondrial genome of the honeybee Apis mellifera: complete sequence and genome organization. Genetics 133:97–117.
CUMMINGS, M., S. OTTO, and J. WAKELEY. 1995. Sampling properties of DNA sequence data in phylogenetic analysis. Mol. Biol. Evol. 12:814–822.
CUNNINGHAM, C. W. 1997. Can three incongruence tests predict when data should be combined? Mol. Biol. Evol. 14:733–740.
DERR, J. N., S. K. DAVIS, J. B. WOOLLEY, and R. A. WHARTON. 1992a. Variation and phylogenetic utility of the large ribosomal subunit of mitochondrial DNA from the insect order Hymenoptera. Mol. Phylogenet. Evol. 1:136–147.
DERR, J. N., S. K. DAVIS, J. B. WOOLLEY, and R. A. WHARTON. 1992b. Reassessment of the 16S rRNA nucleotide sequence from members of the parasitic Hymenoptera. Mol. Phylogenet. Evol. 1:338–341.
DESALLE, R. 1992a. The phylogenetic relationships of flies in the family Drosophilidae deduced from mtDNA sequences. Mol. Phylogenet. Evol. 1:31–40.
DESALLE, R. 1992b. The origin and possible time of divergence of the Hawaiian Drosophilidae: evidence from DNA sequences. Mol. Biol. Evol. 9:905–916.
DESALLE, R., T. FREEDMAN, E. M. PRAGER, and A. C. WILSON. 1987. Tempo and mode of sequence evolution in mitochondrial DNA of Hawaiian Drosophila. J. Mol. Evol. 26:157–164.
DESALLE, R., and A. R. TEMPLETON. 1988. Founder effects accelerate the rate of mitochondrial DNA evolution in Hawaiian Drosophila. Evolution 42:1076–1084.
DOWTON, M., and A. D. AUSTIN. 1994. Molecular phylogeny of the insect order Hymenoptera: apocritan relationships. Proc. Natl. Acad. Sci. USA 91:9911–9915.
DOWTON, M., and A. D. AUSTIN. 1997a. Evidence for AT-transversion bias in wasp (Hymenoptera: Symphyta) mitochondrial genes and its implications for the origin of parasitism. J. Mol. Evol. 44:398–405.
DOWTON, M., and A. D. AUSTIN. 1997b. The evolution of strand-specific compositional bias. A case study in the hymenopteran mitochondrial 16S rRNA gene. Mol. Biol. Evol. 14:109–112.
DOWTON, M., A. D. AUSTIN, and M. F. ANTOLIN. 1998. Evolutionary relationships among the Braconidae (Hymenoptera: Ichneumonoidea) inferred from partial 16S rDNA gene sequences. Insect Mol. Biol. 7:129–150.
DOWTON, M., A. D. AUSTIN, N. DILLON, and E. BARTOWSKY. 1997. Molecular phylogeny of the apocritan wasps with particular reference to the Proctotrupomorpha and Evaniomorpha. Syst. Entomol. 22:245–255.
ENGEL, M. S., and T. R. SCHULTZ. 1997. Phylogeny and behavior in honey bees (Hymenoptera: Apidae). Ann. Entomol. Soc. Am. 90:43–53.
FANG, Q., W. C. BLACK JR., H. D. BLOCKER, and R. F. WHITCOMB. 1993. A phylogeny of New World Deltocephalus-like leafhopper (Homoptera: Cicadellidae) genera based on mitochondrial 16S ribosomal DNA sequences. Mol. Phylogenet. Evol. 2:119–131.
FLOOK, P. K., and C. H. F. ROWELL. 1997a. The phylogeny of the Caelifera (Insecta, Orthoptera) as deduced from mtrRNA gene sequences. Mol. Phylogenet. Evol. 8:89–103.
FLOOK, P. K., and C. H. F. ROWELL. 1997b. The effectiveness of mitochondrial RNA gene sequences for the reconstruction of the phylogeny of an insect order (Orthoptera). Mol. Phylogenet. Evol. 8:177–192.
FRIEDLANDER, T. P., J. C. REGIER, and C. MITTER. 1994. Phylogenetic information content of five nuclear gene sequences in animals: initial assessment of character sets from concordance and divergence studies. Syst. Biol. 43:511–525.
FRIEDLANDER, T. P., J. C. REGIER, C. MITTER, and D. L. WAGNER. 1996. A nuclear gene for higher level phylogenetics: phosphoenolpyruvate carboxykinase tracks Mesozoic-age divergences within Lepidoptera (Insecta). Mol. Biol. Evol. 13:594–604.
FUNK, D. J., D. J. FUTUYMA, G. ORTI, and A. MEYER. 1995. Mitochondrial DNA sequences and multiple data sets: a phylogenetic study of phytophagous beetles (Chrysomelidae: Ophraella). Mol. Biol. Evol. 12:627–640.
GILBERT, D. 1993. SeqApp. Version 1.9a. Distributed by the author, Department of Biology, Indiana University, Bloomington.
GOLDING, G. B. 1983. Estimates of DNA and protein sequence divergence: an examination of some assumptions. Mol. Biol. Evol. 1:125–142.
GRAYBEAL, A. 1994. Evaluating the phylogenetic utility of genes: a search for genes informative about deep divergences among vertebrates. Syst. Biol. 43:174–193.
GUTELL, R. R. 1993. Comparative studies of RNA: inferring higher-order structure from patterns of sequence variation. Curr. Opin. Struct. Biol. 3:313–322.
HASEGAWA, M., and H. KISHINO. 1989. Heterogeneity of tempo and mode of mitochondrial DNA evolution among mammalian orders. Jpn. J. Genet. 64:243–258.
HASEGAWA, M., H. KISHINO, and T. YANO. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.
HILLIS, D. M., and M. T. DIXON. 1991. Ribosomal DNA: molecular evolution and phylogenetic inference. Q. Rev. Biol. 66:411–453.
HILLIS, D. M., J. P. HUELSENBECK, and C. W. CUNNINGHAM. 1994. Application and accuracy of molecular phylogenies. Science 264:671–676.
HUELSENBECK, J. P., and B. RANNALA. 1997. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 276:227–232.
JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21–132 in H. N. MUNRO, ed. Mammalian protein metabolism. Academic Press, New York.
KAMBHAMPATI, S. 1995. A phylogeny of cockroaches and related insects based on DNA sequence of mitochondrial ribosomal RNA genes. Proc. Natl. Acad. Sci. USA 92:2017–2020.
KIMURA, M. 1980. A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.
KNIGHT, A., and D. P. MINDELL. 1993. Substitution bias, weighting of DNA sequence evolution, and the phylogenetic position of Fea's viper. Syst. Biol. 42:18–31.
KOULIANOS, S., R. SCHMID-HEMPEL, D. W. ROUBIK, and P. SCHMID-HEMPEL. 1998. Phylogenetic relationships within the Apinae (Hymenoptera) and the evolution of eusociality. J. Evol. Biol. 11 (in press).
KRAUS, F., L. JARECKI, M. M. MIYAMOTO, S. M. TANHAUSER, and P. J. LAIPIS. 1992. Mispairing and compensational changes during the evolution of mitochondrial ribosomal RNA. Mol. Biol. Evol. 9:770–774.
KUMAR, S., K. TAMURA, and M. NEI. 1993. MEGA: molecular evolutionary genetics analysis. Version 1.01. Pennsylvania State University, University Park.
LOCKHART, P. J., M. A. STEEL, M. D. HENDY, and D. PENNY. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11:605–612.

MADDISON, W. P., and D. R. MADDISON. 1992. MacClade: analysis of phylogeny and character evolution. Version 3. Sinauer, Sunderland, Mass.
MARDULYN, P., and S. A. CAMERON. 1998. The major opsin in bees (Insecta: Hymenoptera): a promising nuclear gene for higher level phylogenetics. Mol. Phylogenet. Evol. (in press).
MARDULYN, P., M. C. MILINKOVITCH, and J. M. PASTEELS. 1997. Phylogenetic analyses of DNA and allozyme data suggest that Gonioctena leaf beetles (Coleoptera: Chrysomelidae) experienced convergent evolution in their history of host-plant family shifts. Syst. Biol. 46:699–725.
MARDULYN, P., and J. B. WHITFIELD. 1998. Phylogenetic signal in the COI, 16S, and 28S genes for inferring relationships among genera of Microgastrinae (Hymenoptera: Braconidae): evidence of a high rate of diversification in this group of parasitoids. Mol. Phylogenet. Evol. (in press).
MINDELL, D. P., and R. L. HONEYCUTT. 1990. Ribosomal RNA in vertebrates: evolution and phylogenetic applications. Ann. Rev. Ecol. Syst. 21:541–566.
NIELSEN, R. 1997. Site-by-site estimation of the rate of substitution and the correlation of rates in mitochondrial DNA. Syst. Biol. 46:346–353.
PALUMBI, S. R. 1989. Rates of molecular evolution and the fraction of nucleotide positions free to vary. J. Mol. Evol. 29:180–187.
PASHLEY, D. P., and L. D. KE. 1992. Sequence evolution in mitochondrial ribosomal and ND-1 genes in Lepidoptera: implications for phylogenetic analyses. Mol. Biol. Evol. 9:1061–1075.
ROGERS, J. S., and D. L. SWOFFORD. 1998. A fast method for approximating maximum likelihoods of phylogenetic trees from nucleotide sequences. Syst. Biol. 47:7–89.
ROIG-ALSINA, A., and C. D. MICHENER. 1993. Studies of the phylogeny and classification of long-tongued bees (Hymenoptera: Apoidea). Univ. Kans. Sci. Bull. 55:124–162.
SHEPPARD, W. S., and B. A. MCPHERON. 1991. Ribosomal DNA diversity in Apidae. Pp. 89–102 in D. R. SMITH, ed. Diversity in the genus Apis. Westview Press, Boulder, Colo.
SIMON, C., F. FRATI, A. BECKENBACH, B. CRESPI, H. LIU, and P. FLOOK. 1994. Evolution, weighting and phylogenetic utility of mitochondrial gene sequences and a compilation of conserved polymerase chain reaction primers. Ann. Entomol. Soc. Am. 87:651–701.
SMITH, A. B. 1989. RNA sequence data in phylogenetic reconstruction: testing the limits of its resolution. Cladistics 5:321–344.
SWOFFORD, D. L. 1997/1998. PAUP*: phylogenetic analysis using parsimony (* and other methods). Test versions 4.0d54–d64 used with permission of author; beta version released officially 1998 by Sinauer, Sunderland, Mass.
SWOFFORD, D. L., G. J. OLSEN, P. J. WADDELL, and D. M. HILLIS. 1996. Phylogenetic inference. Pp. 407–514 in D. M. HILLIS, C. MORITZ, and B. K. MABLE, eds. Molecular systematics. 2nd edition. Sinauer, Sunderland, Mass.
TAJIMA, F., and M. NEI. 1984. Estimation of evolutionary distance between nucleotide sequences. Mol. Biol. Evol. 1:269–285.
TAMURA, K. 1992. Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G+C-content biases. Mol. Biol. Evol. 9:678–687.
TAMURA, K., and M. NEI. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10:512–526.
THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
VOGLER, A. P., and R. DESALLE. 1993. Phylogeographic patterns in coastal North American tiger beetles, Cicindela dorsalis, inferred from mitochondrial DNA sequences. Evolution 47:1192–1202.
VOGLER, A. P., R. DESALLE, T. ASSMANN, C. B. KNISLEY, and T. D. SCHULTZ. 1993a. Molecular population genetics of the endangered tiger beetle Cicindela dorsalis (Coleoptera: Cicindelidae). Ann. Entomol. Soc. Am. 86:142–152.
VOGLER, A. P., C. B. KNISLEY, S. B. GLUECK, J. M. HILL, and R. DESALLE. 1993b. Using molecular and ecological data to diagnose endangered populations of the puritan tiger beetle Cicindela puritana. Mol. Ecol. 2:375–383.
WAKELEY, J. 1996. The excess of transitions among nucleotide substitutions: new methods of estimating transition bias underscore its significance. TREE 11:158–163.
WHEELER, W. C., and R. L. HONEYCUTT. 1988. Paired sequence difference in ribosomal RNAs: evolutionary and phylogenetic implications. Mol. Biol. Evol. 5:90–96.
WHITFIELD, J. B. 1997. Molecular and morphological data suggest a single origin of the polydnaviruses among braconid wasps. Naturwissenschaften 84:502–507.
WHITFIELD, J. B. 1998. Phylogeny and evolution of host-parasitoid interactions in Hymenoptera. Ann. Rev. Entomol. 43:129–151.
XIONG, B., and T. D. KOCHER. 1993a. Intraspecific variation in sibling species of Simulium venustum and Simulium verecundum complexes (Diptera: Simuliidae) revealed by the sequence of the mitochondrial 16S rRNA gene. Can. J. Zool. 71:1202–1206.
XIONG, B., and T. D. KOCHER. 1993b. Phylogeny of sibling species of Simulium venustum and S. verecundum (Diptera: Simuliidae) based on sequences of the mitochondrial large subunit rRNA gene. Mol. Phylogenet. Evol. 4:293–303.
YANG, Z. 1994. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39:105–111.
YANG, Z. 1995. Evaluation of several methods for estimating phylogenetic trees when substitution rates differ over nucleotide sites. J. Mol. Evol. 40:689–697.
YANG, Z. 1996. Among-site rate variation and its impact on phylogenetic analysis. TREE 11:367–372.
YANG, Z., and S. KUMAR. 1996. Approximate methods for estimating the pattern of nucleotide substitution and the variation of substitution rate among sites. Mol. Biol. Evol. 13:650–659.

DAVID M. RAND, reviewing editor

Accepted September 4, 1998