Hierarchical Analysis of Variation in the Mitochondrial 16S rRNA Gene Among Hymenoptera
James B. Whitfield* and Sydney A. Cameron*†
*Department of Entomology and †Department of Biological Sciences, University of Arkansas

Nucleotide sequences from a 434-bp region of the 16S rRNA gene were analyzed for 65 taxa of Hymenoptera (ants, bees, wasps, parasitoid wasps, sawflies) to examine the patterns of variation within the gene fragment and the taxonomic levels for which it shows maximum utility in phylogeny estimation. A hierarchical approach was adopted in the study through comparison of levels of sequence variation among taxa at different taxonomic levels. As previously reported for many holometabolous insects, the 16S data reported here for Hymenoptera are highly AT-rich and exhibit strong site-to-site variation in substitution rate. More precise estimates of the shape parameter (α) of the gamma distribution and the proportion of invariant sites were obtained in this study by employing a reference phylogeny and utilizing maximum-likelihood estimation. The effectiveness of this approach to recovering expected phylogenies of selected hymenopteran taxa has been tested against the use of maximum parsimony. This study finds that the 16S gene is most informative for phylogenetic analysis at two different levels: among closely related species or populations, and among tribes, subfamilies, and families. Maximization of the phylogenetic signal extracted from the 16S gene at higher taxonomic levels may require consideration of the base composition bias and the site-to-site rate variation in a maximum-likelihood framework.

Key words: Hymenoptera, molecular phylogeny, insects, mitochondrial DNA.

Address for correspondence and reprints: James B. Whitfield, Department of Entomology, 321 Agriculture Building, University of Arkansas, Fayetteville, Arkansas 72701. E-mail: jwhitfi[email protected].

Mol. Biol. Evol. 15(12):1728-1743. 1998
© 1998 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Introduction

Selecting a gene for phylogenetic analysis requires matching the level of sequence variation to the desired taxonomic level of study. Several recent papers have focused on the identification of genes that are useful for phylogenetic analysis at different taxonomic levels (Brower and DeSalle 1994; Friedlander, Regier, and Mitter 1994; Graybeal 1994; Simon et al. 1994; Cho et al. 1995). For many of these genes, sequence data are available from a relatively small sample of taxa with roughly known divergence times. These studies permit estimates of sequence divergence rates (e.g., number of nucleotide substitutions or percentage of sequence divergence over time), providing information on the relative rate of change of a gene. However, estimates of sequence divergence rate calculated from a small sample of taxa may not be appropriate when applied more generally (Graybeal 1994), because unsampled lineages may differ in divergence rate. This problem will resolve itself as sequence data are collected from additional genes for increasingly larger numbers of taxa.

A few mitochondrial (mtDNA) genes have been studied extensively within recently diverged lineages of arthropods (<5 MYA). These genes (12S rRNA, 16S rRNA, cytochrome oxidase I) exhibit nearly the same divergence rate, which is linear with time and approximates 2.3% per Myr for silent sites (Brower 1994). However, when more anciently diverged lineages (>75 MYA) are compared, different mtDNA genes exhibit considerable variation in sequence divergence rate (Cummings, Otto, and Wakeley 1995), with some showing greater conservation than others. Furthermore, a number of constraints can influence variation in the rate of nucleotide substitution among sites within a gene (Wheeler and Honeycutt 1988; Mindell and Honeycutt 1990; Hillis and Dixon 1991; Kraus et al. 1992). These constraints may be of a general nature, such as variation in the rate of substitution by codon position in protein-coding genes or by secondary structural position in rRNA genes, or they may be lineage-specific (some taxa appear to evolve more slowly than others; e.g., DeSalle and Templeton 1988; Hasegawa and Kishino 1989).

Graybeal (1994) pointed out that any given gene's potential phylogenetic utility at a particular taxonomic level depends not only on the percentage of sequence divergence at that level, but also on the shape of the sequence divergence accumulation curve. For example, at a given observed divergence level, genes in which only a few sites are "free to vary" (sensu Palumbi 1989) will contain more superimposed changes (i.e., be more saturated with nucleotide substitutions) than those in which many sites are able to change. The pattern that emerges when few sites are free to vary is a sequence divergence accumulation curve that is strongly convex near the origin, then flattens out at a low level over the remainder of the distribution of divergence times as additional substitutions are superimposed and thus go unobserved. (This curve contrasts sharply with the more linear curve seen with recently diverged lineages or with genuinely highly conserved genes.) The low overall sequence divergence level for older divergences might suggest a strongly conserved gene appropriate for higher-level comparisons, when, in fact, the available variation may be useful only at lower taxonomic levels, among recently diverged taxa. An appreciation of the distribution of variable sites across the gene is therefore important in examining the phylogenetic utility of a gene for a particular taxonomic level.

Factors other than the distribution of rate variation among sites can determine the shape of the sequence divergence accumulation curve. For instance, in many holometabolous insects, including Hymenoptera and Drosophila, mtDNA exhibits a nucleotide composition which is strongly biased toward adenine and thymine (AT bias). For some groups, the mean percentage of AT can be higher than 80% (Cameron 1993; Crozier and Crozier 1993; Simon et al. 1994; Dowton and Austin 1994, 1997a, 1997b; Whitfield 1997). When the base composition is biased to that degree, obviously, the ratio of transversions (tv) to transitions (ti) increases, as does the probability of convergent substitutions to the more common bases. Both of these increases further reduce the ability to correctly estimate the number and proportion of hidden mutations and, hence, the ability to correct sequence divergence for hidden changes.

Fortunately, increasingly complex models of substitutional change continue to be incorporated into methods of correcting sequence divergence rates for saturation (Jukes and Cantor [1969] and the models that followed), ti/tv bias (Kimura 1980), base composition bias (Tajima and Nei 1984; Tamura 1992; Tamura and Nei 1993), and rate variation among sites (Yang 1995; Nielsen 1997). Thus, even though the functional constraints on rRNA, tRNA, and a large variety of protein-coding genes are not well understood, it is still possible to investigate the patterns of variation and phylogenetic utility of these genes in some detail.

In this paper, we examine the mitochondrial large-subunit (16S) rRNA gene for hierarchical patterns of sequence variation and divergence among a large and diverse array of hymenopteran insects (ants, bees, wasps, parasitoids, and sawflies). We address several aspects of 16S sequence variation: (1) the observed patterns of sequence divergence at different taxonomic levels; (2) how those patterns are affected by correcting the sequence divergences under different models; (3) the inferred ti : tv ratios at various hierarchical levels; (4) base composition bias within the Hymenoptera,

Prior Uses and Criticisms of the 16S Gene for Phylogeny

Sequences coding for the 16S rRNA gene have been used for estimating phylogenies over a notable range of taxonomic levels (see table 1 for a survey of studies involving insects). The existence of sites that are changing at widely differing rates within this single gene (Hillis and Dixon 1991; Simon et al. 1994) suggests that 16S sequences contain historical information that is useful at more than one level of phylogenetic divergence. In a recent review, Simon et al. (1994) suggested that data from 16S might be useful primarily for phylogenetic estimation at higher levels, because few sites were variable at lower levels among closely related species, and even some of those quickly saturated. In contrast, Engel and Schultz (1997), in a reanalysis of Cameron's (1991, 1993) data for estimating relationships among the corbiculate bees, suggested that Apis (honey bee) species relationships recovered from 16S sequences (Cameron et al. 1992) were strongly congruent with those inferred from morphological data.

In cases in which well-corroborated phylogenies are available, concordance of DNA-based results with well-researched phylogenies based on morphology and other evidence often provides a good test of the phylogenetic informativeness of molecular data (e.g., Friedlander et al. [1996] for a nuclear protein-coding gene; Smith [1989] for rRNA genes). In this context, attempts to use the 16S gene for estimation of relationships at higher taxonomic levels have met with mixed (although significant) success. Cameron (1993) obtained a tribal phylogeny of the corbiculate bees which conflicts with morphology-based phylogenies (Roig-Alsina and Michener 1993) but is fully concordant with results from oth-