Higher-level phylogeny of mosquitoes (Diptera: Culicidae): mtDNA data support a derived placement for Toxorhynchites

ANDREW MITCHELL, FELIX A. H. SPERLING and DONAL A. HICKEY

Insect Syst Evol Mitchell, A., Sperling, F. A. H. & Hickey, D. A.: Higher-level phylogeny of mosquitoes · · (Diptera: Culicidae): mtDNA data support a derived placement for Toxorhynchites. Syst. Evol. 33: 163-174. Copenhagen, July 2002. ISSN 1399-560X.

We assess the potential of complete coding sequences of the mitochondrial cytochrome oxi­ dase genes (COl and COil) and the intervening tRNA-Leucine gene for use in high­ er-level systematics, and apply this data to an outstanding question: the phylogenetic affinities of Toxorhynchites. Traditionally placed in its own subfamily and regarded as sister group to Culicinae, recent morphological data instead have suggested that this distinctive genus belongs well within the Culicinae. Published molecular systematic studies seemingly conflict with this new morphological data or are ambiguous. The mitochondrial gene data that we pres­ ent show good potential for elucidating suprageneric relationships in Culicidae, and strongly support the placement of Toxorhynchites well within the Cnlicinae. Reexamination of pub­ lished data sets suggests that there is no substantive conflict among data sets on this issue.

Andrew Mitchell, School of Molecular and Cellular Biosciences, University of Natal, Private Bag XOl, Scottsville, 3209, South Africa ([email protected]). Felix A. H. Sperling, Department of Biological Sciences, University of Alberta, Edmonton, Alberta, T6G 2E9, Canada ([email protected]). Donal A. Hickey, Department of Biology, University .of Ottawa, Ottawa, Ontario, KlN 6N5, Canada ([email protected]).

Introduction tionships among subfamilies by Harbach & Mosquito has received intense attention Kitching ( 1998). However, their novel results have as a direct result of the immense medical impor­ yet to gain wide acceptance. This is largely be­ tance of many mosquitoes as vectors of disease­ cause of apparent conflict between their data and causing organisms. The realisation that accurate published data sets, both morphological and delimitation of species boundaries and an in-depth molecular. In this study we assess the potential of knowledge of geographic variation are essential . mitochondrial DNA sequence data for resolving for vector control has resulted in a focus on spe­ contentious issues in mosquito higher-level sys­ cies-level taxonomy in Culicidae. The higher-level tematics by examining the phylogenetic placement phylogeny of Culicidae is much more poorly of Toxorhynchites. understood. Among the reasons for this is the im­ Recent classifications of Culicidae recognize portance of nomenclatural stability in this impor­ three subfamilies: Anophelinae, with 3 genera, tant group, which understandably has led to con­ Culicinae, with 34 genera arranged in 10 tribes, -servative nomenclature. One could also argue, and Toxorhynchitinae, with a single genus, Toxo­ however, that the widespread importance of mos­ rhynchites. The distinctiveness of Toxorhynchites quitoes is in itself a strong motivation for provid­ was recognized early on, because the species are ing an up-to-date classification that more closely among the largest and most spectacular of all mos­ reflects phylogeny. Nevertheless, the earlier phe­ quitoes. Toxorhynchites females do not take blood netic work has largely pre-empted later cladistic meals, and oviposit in small, natural plant contain­ studies, and most systematists have followed the ers where the larvae are predatory (Steffan & classification of Edwards (1932) with little modi­ Evenhuis 1985). Toxorhynchites species have even fication. Rigorous phylogenetic methodology has been used as biological control agents of other only recently been applied to the problem of rela- mosquitoes, though with limited success (Steffan

© Insect Systematics & Evolution (Groups 6 & 7) 164 Mitchell, A. et al. INSECT SYST. EVOL. 33:2 (2002) & Evenhuis 1981). Despite these differences, a material are listed in Tab. 1. Specimens were col­ few authors considered the apparent peculiarity of lected in the wild or taken from laboratory the Toxorhynchitinae to have been overempha­ colonies and then either frozen live at -70°C or sized. Belkin (1962), for example, treated the preserved a few days in 95% ethanol before being group as a tribe of equal rank to the ten tribes of frozen at -70°C. Identifications of wild-collected Culicinae, and considered them closely related to material were provided by D. M. Wood {Agri­ the Sabethini. This hypothesis was supported by culture and Agri-Food Canada, Ottawa, Canada). Harbach & Kitching's (1998) morphology-based Our taxon sample included eleven mosquito cladistic analysis of all mosquito genera. species, including four anophelines, five culicines Molecular data, in contrast, have seemingly sup­ (from four tribes), and two toxorhynchitines. The ported the traditional placement of Toxorhynchites outgroup comprised two species, including a rep­ (Besansky & Fahey 1997) or have been equivocal resentative of Chaoboridae, most likely the sister (Miller et al. 1997).~ group of Culicidae (Oosterbroek & Courtney Which of these alternative placements for 1995) or at least very closely related (Pawlowski Toxorhynchites is better supported? We present et al. 1996), and a more distantly related culico­ sequence data .of the complete mitochondrial morph (a blackfly, Simuliidae). Published se­ cytochrome oxidase genes, COI and COil, and the quences .were obtained from GenBank for Ano­ intervening tRNA-Leucine gene (a total of 2.3kb ), pheles quadrimaculatus (Mitchell et al. 1993) and in order to assess the utility of these genes for A. gambiae (Beard et al. 1993). All other se­ mosquito higher-level phylogenetics, and to pro­ quences are new data. vide additional data bearing on this problem. Laboratory protocols. - For most specimens, DNA was extracted from individual specimens Materials and methods using a phenol!chloroform procedure as described by Sperling et al. (1994). For two specimens (Sa­ Abbreviations. - CO, cytochrome oxidase; GTR, bethes L-yaneus and Toxorhynchites sp.) DNA was general time reversible; ML, maximum likelihood; extracted using the Qiagen DNeasy Tissue Kit MP, maximum parsimony; mp, most parsimonious. (Qiagen Inc., Valencia, California). PCR was Taxon sampling. - Taxa sampled and sources of accomplished using the protocol of Sperling et al.

Table 1. Taxon sampling.

Taxon Source•

Culicidae Anophelinae (Anopheles) earlei Ottawa, Canada; D. M. Wood AF425843 Anopheles (Anopheles) quadrimaculatus Mitchell et al. (1993) NC000875 Anopheles (Cellia) gambiae Beard et al. ( 1993) NC002084 Anopheles (Cellia) stephensi LC, University of Maryland; V. Kulasekara AF425844 Culicinae Aedes atropalpus LC, University of Ottawa; T. Amason AF425845 Aedes aegypti LC, University of Ottawa; D. Hickey AF425846 Culex tarsalis LC, Queens University; V. Walker AF425847 Culiseta impatiens Ottawa, Canada; D. M. Wood AF425848 Sabethes cyaneus LC, University of Notre Dame; N. Besansky AF425840 Toxorhychitinae Toxorhynchites rutilus York Co., Pennsylvania; D. M. Wood AF425849 Toxorhynchites sp. Vietnam; D.C. Currie AF425850 Chaoboridae Chaoborus sp. Ottawa, Canad~; D. M. Wood AF425842 Simuliidae Cnephia dakotensis Ottawa, Canada; D. M. Wood AF425841 •LC =Laboratory colony bThe complete alignment is also available from TreeBase (http://www.herbaria.harvard.edultreebase/). INSECT SYST. EVOL. 33:2 (2002)

(1994). the end primers TY-J-1460 (=K698: pe1formed using identical protocols. All phyloge­ 5' -TAC AAT ITA TCG CCT AAA CIT CAG CC netic analyses were performed PAUP* ver­ and TK-N-3782 5'- GAG ACC AIT sions 4.0b4a and 4.0b8 Altivec (Swofford 1999). ACT TGC ITT CAG TCA TCT -3') and a variety lnc:on:gruem:e length difference (ILD) tests of primers developed by us consisted of 1000 heuristic from a comparison of GenBank sequences of etitions, each with 10 random addition sequelltces Vrt?sn.vmta yakuba, gambiae, and A. of taxa. A branch-and-bound maximum parsimony quadrimaculatus. For Cnephia. da- (MP) search was performed on our rntDNA data kotensis, Chaoborus sp., and Sabethes cyaneus, set. Further MP on our and all other the above end primers did not amplification MP analyses, of 1000 random addition products. For these taxa we developed far- sequences of ther out from the COIICOIT region. upstream sequences and, in primer used for Cnephia and Sabetl}.es was TW-J- genes, also on the inferred amino acid seaner1ces 1305 5'- GIT AAA TAAA'CT AAT AGC (740 characters). analyses CIT CAA AGC TG -3') and for Chaoborus was 1000 repetitions, each with 10 random addition TY-J-14601 5'-ITACAATTTCTTACT sequences of taxa. Constraint searches were car­ TAA GTT CAG CC -3'). The downstream ried out for each data set to determine the number used for both and Chaoborus was of additional steps needed to accommodate alter­ 3862 (5'- GGC CGT CTG ACA AAC TAA TGT native phylogenetic hypotheses. Bremer support TAT -3') from Simonet al. (1994). For most spec­ (or 'decay') indices (Bremer 1988) were calculat­ imens PCR · were sequenced directly ed for the morphological data set & Kit- using the Taq DyeDeoxyT111 Terminator Cycle 1998). Se(tUeJ:tciilg System (Applied Biosystems, Inc.) To assess the effects of biased base composition and fractionated on an ABI 373 automated DNA oh1vlo:£!:el1tV reconstruction we dis- sequencer. For two specimens ana!vs<~s based on the LogDet et and Toxorhynchites sp.) PCR were and compared these trees with those sequenced directly using the DYEnamicTM ETter­ derived the general time reversible minator cycle sequencing kit Phar­ model. macia Biotech; Cleveland, Ohio) and fractionated Maximum likelihood (ML) analyses were on an ABI 377 automated DNA sequencer. formed on all molecular data sets, and an identical protocol. Following Frati et al. (1997), 16 Sequences were assembled into models (four substitution models permuted with using SeqEd (Applied Bio­ four rate-distribution models) were evaluated on City, California) and Se­ the most parsimonious trees for each data set. qm~ncher (UI~neCodes Corp.; Ann Arbor, Michi­ Likelihood ratio tests were used to determine gan). Alignment of the protein-coding sequences which models had significantly higher likelihood and most of the intervening tRNA region was triv­ scores. The best model was then chosen from the ial, owing to the absence of insertions or '"""''"'""'""• set of models with the best score, with preference and therefore it was performed by hand using the to the model with the fewest free parame­ software Se-Al (Rambault 1996). The aligned DNA ters. This model was selected for further analyses, sequences were translated to amino acids using the with all parameters estimated once from the data insect mitochondrial code, in MacClade on the mp trees, and fixed Heuristic 3.03 (Maddison & Maddison 1992). Alignment- searches under the ML criterion used ambiguous sites were excluded to data analy- 100 random addition sequences for taxa with TBR sis for both the nucleotide amino acid data branch Bootstrap analyses employed sets. The aligned data set is deposited in Treebase 1000 each with a single addi- (http://www .herbaria.harvard.edu!treebase/). tion sequence of taxa. As for MP, constraint The data from published and new molecular searches were carried out for each data set to data sets were not combinable because of differ­ determine the ML score for alternative phyloge­ ences in taxon sampling, therefore separate phylo­ netic hypotheses. Kishino-Hasegawa tests (Ki­ genetic were performed on all data sets. shino & Hasegawa 1989), Shimodaira-Hasegawa All phylogenetic analyses of molecular data were tests (Shimodaira & 1999), Templeton 166 Mitchell, A. et al. INSECT SYST. EVOL. 33:2 (2002) Table 2. Base frettu::m when invariable sites were excluded (p = The complete consisted of 2345 sites.

Table 3. Summary statistics for data partitions mapped onto mp tree for all data (Pig. 3).

Combined COl con tRNA- Amino data Leu acids nt.l nt. 2 nt. 3 nt.l nt. 2 nt.3 No. characters 2279 512 512 512 228 228 228 59 740 No.(%) variable 877 (38) 137 (27) 47 (9) 399 (78) 76 (33) 26 (ll) 181 (79) 11 (19) 181 No.(%) infonn.• 620 (27) 91 (18) 26 (5) 304 (59) 50 (22) 18 (8) 129 (57) 2 (3) 114 No. steps 2279 301 73 1187 154 37 512 15 360 Cl, excl. unin:f.b 0.45 0.49 0.62 0.43 0.52 0.68 0.41 0.67 0.70 RI 0.37 0.52 0.73 0.28 0.57 0.27 0.60 0.73

•Number of parsimony-infonnative characters ~>consistency index, excluding parsimony-uninfonnative characters INSECT SYST. EVOL. 33:2 (2002) COI-COII phylogeny 167

Figure 1. One of three most parsimoni•ous (mp) tree.~ from equally-weighted of all sites, COI-COTI nucleotides. Length 2279 CI (excL uninf.) =0.45, R1 =0.37. Numbers above'-~·--'--- are boot;;trap values. Branches that collapse in the consensus tree are indicated by dashed lines.

Maximum parsimony (MP) analyses. -Equally­ analysis recovered as monophyletic the Anophel­ weighted parsimony of all sites resulted in inae, Anopheles (Anopheles), Anopheles (Cellia), three most parsimonious (mp) trees of 2279 steps and Toxorhynchites (Fig. was monophyl­ (Fig. 1). Data partitions were mapped onto the mp etic in only one of the three trees. Anophelinae tree to assess variation irr data quality 3). was basal within Culicidae, and Culicinae was Codon positions of COI and COH had very simi­ paraphyletic with respect Toxorhynchitinae. Toxo­ lar of variable and parsimony-inform­ rhynchites was firmly supported as sister-group to ative characters, and similar consistency indices Sabethes within the Culicinae. A search conducted (Cis) and retention indices (Ris). The tRNA-Leu using logarithmic-weighting produced a gene was less variable. The combined COIICOll tree which differed only in that Culex was the most amino acid data had the highest CI and Rl of all basal followed by Culiseta, then a mono­ partitions. phyletic Aedes, there was sup­ In the ingroup (Culicidae) the mean uncorrected port for A.edes monophyly. pairwise sequence divergence for all sites was For the inferred amino acid data set, a mp 13.5% with the values ranging from 7.9% between tree of 360 steps was recovered (Fig. which Anopheles earlei and "4. quadrimaculatus to differed from Fig. 1 in two respects: (i)subgenus 17.3% between A. quadrimaculatus and Toxorhyn­ Anopheles (Cellia) was paraphyletic with respect chites rutilus. For ingroup/outgroup comparisons to Anopheles (Anopheles), and (ii) there was the mean divergence was 18.0%. strong support for the monophyly of Aedes. The three mp trees from the equally-weighted Further analyses were performed on the amino 168 Mitcl;tell, A. et al. INSECT SYST. EVOL. 33:2 (2002) A B

Coephia dakotensis

Chaoborus sp .

. Anopheles stephensi 100 84 An

Anopheles quadrimaculatus

Anopheles earlei 97 98 Sabethes cyaneus 92 Toxorhynchites rutilus

Toxorhynchites sp.

Aedes atropalpus

Aedes aegypti

Culex tarsalis

Culiseta impatiens

- 10 steps -10 steps Figure 2. Most parsimonious (mp) trees derived using inferred amino acids for COl and COIL Numbers above branches are bootstrap values. (A) All amino acids coded as separate character states, length== 360 steps, CI (excl. uninf.) == 0.70, RI == 0.73. (B) Isoleucine, leucine and valine coded as equivalent character states, length= 279 steps, CI (excl. uninf.) =0.71, RI =0.76. acids with leucine, isoleucine, and valine and analyses. Bootstrap support for the monophyly of coded as equivalent character states (see Discus­ subgenus Anopheles (Cellia) was very strong sion). The resulting mp tree (Fig. 2B) recovered under the LogDet model (97%) but only moderate Anopheles (Cellia) as monophyletic with 70% (78%) weak under the GTR model, as in the parsi­ bootstrap support, otherwise the topology was mony analyses. identical to Fig. 2A. Maximum likelihood (ML) analyses.- Likelihood Minimum evolution (ME) analyses.- Unlike other ratio tests showed that the most appropriate substi­ distance measures, the LogDet model (Lockhart et tution model for this data set was the general time a!., 1994) is considered to be relatively insensitive reversible model (Lanave et a!., 1984), with in­ to base composition bias effects. ME analyses variable sites and gamma-distributed rates, i.e., were therefore performed using both LogDet and GTR +I+ I'. For this model the mp tree in Fig. 1 general time reversible (GTR; Lanave et al., 1984) had a likelihood score of -In L = 12,387.81, signif­ models. Both trees recovered similar relationships icantly better than all other models tested. (The to the mp trees, with only weakly supported dif­ next best model was GTR +I'; 2Aln L = 9.05, 1 ferences in the relationships among basal d.f., p < 0.005). Heuristic searches performed culicines. Toxorhynchites was always strongly under this model produced a single ML tree with a supported as sister-group to Sabethes, as in the MP score of -ln L = 12,386.63 (Fig. 3). INSECT SYST. EVOL. 33:2 (2002) COI-COII phylogeny of mosquitoes 169

sp.

,...-- Anopheles quadrimaculatus

Anopheles earlei

...--- Anopheles n:;;,nnn•~""

100 "-- Anopheles stephensi

---Culiseta impatiens

1.!.!:~,,...... --- Culex tarsa!is

Aedes atropalpus

---Aedes aegypti

---- Sabethes cyaneus

---- Toxorhynchites rutilus

"--- Toxorhynchites sp. - 0.05 substitutions/site

3. Maximum likelihood (ML) tree for COl-COil nucleotides, obtained under the GTR +I+ r u .•>ou.v.:>. Numbers above branches are bootstrap values. Model parameters: Nucleotide frequencies: 0=0.13385, T=0.41376; Substitution rate matrix: A1C=3.023, AJG=l8.37, A1T=21.36, C/T=60.29, Proportion of invariable sites: 0.47238; r- distribution shape parameter (a)=0.88003.

A search was performed with the Culicinae con­ Excluding third codon and applying suc­ strained to be monophyletic, and the ML tree cessive approximations character weighting obtained had a score ofln L = 12,414.93. This tree 1969) the authors obtained amp tree which was to all the parsimony and likelihood cortgn1ent with the traditional classification of trees obtained for this data set and described pre­ "'·'"~'-'~ Anophelinae was basal and Toxo­ viously. The Shimodaira-Hasegawa test (S-H test) rllynctutume was sister group to a monophyletic was used in addition to the Kishino-Hasegawa test (K-H test) because of recent concerns over the bias Be:;an:skv & Fahey's was pro- of the former test (Goldman et al., 2000). The tree the PILEUP program in the GCG in which Culicinae was constrained to be mono- Sequence Package (Genetic Computer nh,,l,..,,.. was the tree with a significantly Group 1994), using the default The likelihood than ML tree (K-H test, p = authors noted an alignment-ambiguous 0.019 and S-H test,p = between 117 an4 189, but to keep this in their data set because it had lit- tle effect on tree topology. They also coded the (1997) Data set of Besansky & gaps within this region as additional two-state Besansky & (1997) analysed data from the characters at the end of the data set. On examina­ nuclear, protein-encoding white gene for 13 tion of the inferred amino acid sequences it species of and two chaoborid appeared to us that Toxorhynchites was too diver- 170 Mitchell, A. et al. INSECT SYST. EVOL. 33:2 (2002)

....--- Chaoborus the full data set (their fig. 6). Rather than being placed as sister group to the Culicinae, Toxorhyn­ .---- Eucorethra chites was placed within the Culicinae, specifical­ ly within the Sabethini as sister group to Sabethes. An. albimanus There was substantial support for this placement, albeit under SACW: a 97% bootstrap value for An.gamblae 100 Toxorhynchites being sister group to Sabethes, and a 99% bootstrap value for the monophyly of (Sabethini + Toxorhynchites). Bironella We determined the most appropriate ML model for this reduced data set, inCluding third codon positions, to be that of Hasegawa et al. (1985), with invariable sites and gamma-distributed rates, Deinocerites i.e., HKY85 +I+ r. Under this model we found a Toxorhynchites single ML tree (Fig. 4) in which Anophelinae was the most basal group, although paraphyletic with Sabethes respect to all other mosquitoes, and Toxorhyn­ chites was placed well within Culicinae, as sister Trlpteroides group to Sabethini. Pairwise Kishino,Hasegawa tests revealed no significant differences (p = 0.84 Ae. triseriatus and p = 0.60, respectively) between the ML tree Ha. equinus and trees resulting from searches in which either Anophelinae was constrained to be monophyletic, Ae. aegypti or Toxorhynchites was constrained to be the sister group to Culicinae. Ae. alboplctus -0.1 substitutions/site Data set of Harbach & Kitching (1998) Figure 4. Maximum likelihood (ML) tree for the white All trees produced by Harbach & Kitching ( 1998) gene data set (Besansky & Fahey 1997) obtained under HKY85 +I+ r model, -ln L =5,682.76. Numbers above placed Toxorhynchites well within the Culicinae, branches are bootstrap values under ML criterion. although there was substantial variation among trees in the tribal-level relationships within Cul­ icinae. We calculated Bremer support (or 'decay') indices for this data set. Using PAUP*4.0b2a gent in this region, as well as in the flanking (Swofford 1999) we found 130460 trees of length regions, to be aligned reliably with the other s 351 steps, i.e., up to one step longer than the mp sequences. Given the lack of clear homology for tree. The strict consensus of these trees had seven these characters, we tested the effects of excluding of 36 nodes resolved within the ingroup (i.e., all the alignment-ambiguous region from analyses, other nodes had decay indices of 1). The most and not adding binary characters coding for the strongly supported clade was Anophelinae, with a gaps to the data set. decay index of eight. The Culicinae clade, which For our reanalysis we excluded from the data set contained Toxorhynchites, had a decay index of a block of 99 alignment-ambiguous nucleotides, five, although it cost only three steps to place corresponding to positions 100-198 in Besansky & Toxorhynchites as sister-group to the Culicinae. Fahey's (1997) alignment (their fig. 3). We repeat­ Kishino-Hasegawa tests, Templeton tests and win­ ed the parsimony analysis on this reduced data set, ning site tests confirmed that this difference in tree excluding third codon positions. We found 2 mp lengths was not statistically significant (p > 0.5 for trees of 383 steps, the strict consensus of which all tests). was identical to Besansky & Fahey's fig. 5. However, applying successive approximations Discussion character weighting (SACW) to this reduced data set resulted in a different topology to that obtained For the mtDNA data set, we noted heterogeneity when Besansky & Fahey (1997) applied SACW to of base composition in first and third codon posi- INSECT SYST. EVOL. 33:2 (2002) COI-COII phylogeny of mosquitoes 171 tions. An apparent phylogenetic trend in base com­ affect phylogeny reconstruction even when it is position is discernible in Tab. 2: The proportion of based on the inferred amino acid guanine decreases as one ascends the tree, Similarly, among taxon base bias at 11.7% in the decreasing to 8- might account for the low support(47%) 9% in Anophelinae, 5-7% in other than for the monophyly of Anopheles (Cellia) under the Sabethes, and finally 2-3% in Sabethes and Toxo­ ML criterion. rhynchites. Base composition biases are known to The of parsimony, distance and affect phylogenetic inferences. The questiQn of likelihood on very similar tree topologies whether base comp9sition is a reflection of phy- gives us confidence in these results. Within or is driving the homoplasious association Culicidae, our COI-COII data set to be taxa can be addressed by comparing distance providing useful phylogenetic signal recon- trees derived under the LogDet model (Lockhart et structing suprageneric relationships and potential­ al. with trees derived under conventional ly for relationships as welL In con­ models. The LogDet model is robust to among­ trast, Foley et al. (1998) found that the COil gene taxon variation in base composition, therefore if did not provide resolution of deeper rela­ both trees give the same pattern of relationships, tionships within Anopheles. Indeed, plots of one can be reasonably confident that base compo­ uncorrected versus ML-corrected dis­ sition bias is not driving the results. This was tances for third codon position sites of con in indeed the case for this data set, with LogDet and both data sets (not shown) indicate that these sites GTR-model distance trees recovering the same are approaching saturation. However, the situation pattern of relationships. is different for flrst and second codon positions. The full amino acid data set recovered erro­ This is illustrated by a comparison of uncorrected neous relationships within Anopheles, placing sequence divergence values for each Anopheles (Cellia) gambiae as sister group to position of the con gene for et al. Anopheles (Anopheles), with 84% sup- (1998) ingroup (Anopheles) versus our ingroup In our (e.g., Mitchell et aL (all mosquitoes): values at ftrst, second amino acid can reach local saturation and third codon at 8.0%, and before their corresponding nucleotide sequences. 32.3%, for Anopheles This is most likely due to loss of the phylogenetic ces, and at 7 .5%, and 34.2% for information in synonymous substitutions and con­ mosquito sequences in our data set. Third codon vergence among amino acids (Simmons 2000). posttton divergences level off at a little more than Naylor & Brown (1997) noted for a data set com­ in both data sets and therefore appear to be at prising complete mitochondrial genome sequences or near local saturation. We note, however, that that leucine, isoleucine and valine produced the this does not necessarily preclude their phyloge- poorest fits of all the amino acids. We noted many netic atthis level Orti & Meyer 1996; homoplasious substitutions among these amino Baker et 2001). In contrast, first and second acids in our data set therefore, in an attempt to codon position divergences continue to increase trace the source of between the amino when taxon sampling is extended from Anopheles acid and nucleotide trees, we coded I and Vas to include the other mosquito genera in our data equivalent character states and repeated the analy­ set, so these sites clearly have not reached satura- ses. The mp tree derived by this method 2B) tion in et al.'s data set. Thus it would appear recovered the expected relationships that the of this gene for An:oohel'es. with moderate for Anopheles reconstructing suprageneric rather than intl~aglener- ic relationships stems from the increased the distance~-·"· .. ,-~ b()()tstrap first and second codon divergences for the monophyly of (Cellia) (although we note that COil third position diver­ (97%) under the LogDet model than gences increase to when ingroup/out- under GTR model (78%), suggesting a possi­ comparisons are made for our data set). ble role played by base composition bias in the Co.mpari1mn of the summary statistics for our poor performance of the full amino acid data set in 3) with those for Foley et al.'s data (Tab. recovering this clade. Indeed, Foster & 4) provides further evidence supporting our argu­ (1999) described how base composition bias can ment for the phylogenetic utility of this gene 172 Mitchell,A.etal. INSECT SYST. EVOL. 33:2 (2002) region at deeper levels within Culicidae. For Foley Toxorhynchites well within the Culicinae. ML et al.'s 79% of the total tree length comes analyses of this place Toxorftynchites from third codon sites, which have an RI well within the Culicinae, as sister-group to the of 0.448, and 21% of the total tree comes Sabethini. However, the phylogenetic signal for from fl.rst and second positions, with 0.511 this placement was not strong under the ML crite­ and 0.644, respectively. In our data set, third rion, and placement of Toxorhynchites in the tradi­ codon positions of the COn gene (RI == 0 .27) con­ tional position could not be ruled out. tribute 71% of the can tree length while first Krzywinski et al. (2001) examined the higher- and second positions Ris == 0.66 and 0.81, level of Anophelinae mito- respectively) contribute of the total tree chondrial DNA and__ cyf]:l genes) nuclear length. Thus in our data which deep- ribosomal DNA sequences. While the cyt b data er relationships, more data comes from the slower­ saturated rapidly and proved uninformative evolving first and second codon positions. These at the very lowest levels, ND5 and 28S sites also are less homoplasious in our data set were informative of anopheline phylogeny, than in Foley et al.'s (1998) data set. ularly in combination. Unfortunately (for our pur­ i\nother factor potentially accounting for the poses) this study did not include non-mosquito greater phylogenetic information content of our outgroups, making it difficult to draw fim1 conclu­ data set versus that of Foley et al. (1998) is the sions about the phylogenetic affinities of number of characters in our data set, which Toxorhynchites. Instead the trees were rooted on also included the entire COI gene, is more Uranotaenia, presumably because Harbach & than twice the length of COIL In combination, the Kitching (1998) hypothesized that Uranotaenia COil, and tRNA-Leu genes show much and are basal to all other Culicinae. tential for of mosquito ... ~, .....,,,-,.., These two taxa both are placed in logeny at many levels. It is therefore still poss1bJle Krzywinski et al.'s (2001) l'v:iL tree, but Toxo­ that a combined COl and COH data set prove rhynchites is well separated from instead informative of the deeper splits in Anopheles as being placed within the Culicinae. This placement well. agreement with both Harbach & L'-u•~un1~:; Miller et al. presented data from the and our COI-COII data. nuclear 18S and 5.8S rRNA genes to address Krzywinski et al.'s (2001) ND5 data contained logenetic relationships of the Culicomorpha. much phylogenetic signal despite the small size of taxon sample of four mosquito species included the utilized (525 bp) and the fact that this is representatives of each of the three subfamilies of one ef the fastest-evolving genes in the mitochon­ Culicidae, thus their data is relevant to the place­ drial genome (Clary & Wolstenholme 1985; cited ment of Toxorhynchites within the family. Miller in Krzywinski et al. 2001). We can be confident et al.'s ML tree Toxorhynchites as sister therefore that the 2.3 kb of more slowly group to Culex, sister to this pair COl-COil data presented in this paper will be (65% bootstrap support for these taxa informative at these and probably consider­ a clade) and Anopheles basal. Bootstrap sup­ ably deeper. port for the placement of Toxorhynchites within Harbach & Kitching's (1998) mp trees place the Culicinae was weak. Our reanalysis of this Toxorhynchites within the Culicinae. However, data set found that placement of Toxorhynchites in given that it costs three extra to enforce the traditional position, as sister-group to all a sister-group between and Culicinae, could not be excluded as a significantly Toxorhynchitinae, we suggest that this traditional less likely hypothesis on the basis of Kishino­ hypothesis has not been refuted by their Hasegawa tests = 0.97). data set considered on its own. Analyses of white gene data set (Besansky Considered alone, only our COI-COII data set & Fahey 1997) from which third codon positions had sufficient resolving to sta- had been excluded were entirely dependent on tistically between a placement Toxo- inclusion/exclusion of the alignment-ambiguous rhynchites as sister group to the Culidnae, and a region and associated binary 'gap characters.' more derived placement well within the Culicinae Exclusion of the sites re­ (the preferred hypothesis), based on Kishino­ sulted in strong support for the placement of Hasegawa/Shimodaira-Hasegawa tests and non- INSECT SYST. EVOL. 33:2 (2002) COI-COII phylogeny of mosquitoes 173

parametric tests. Bootstrap values supporting the cussion. Research funding was provided by NSERC latter, more derived placement were very high for Canada Operating Grants to F. A. H. S. and D. A. H. our data set, i.e., flies (Diopsidae). Syst. Bioi. 50: 87-105. Beard, C. B., Hamm, D. M. & Collins, F. H. (1993) The most powerful tool at our disposal for corrobora­ mitochondrial genome of the mosqnito Anopheles tion of phylogenetic hypotheses (Miyamoto & gambiae: DNA sequence, genome organization, and Cracraft 1991). That all five data sets examined comparisons with mitochondrial sequences of other support a derived placement of Toxorhynchites . Insect Mol. Bioi. 2: 103-124. Belkin, J. N. (1962) The mosquitoes of the South Pacific within Culicinae gives us confidence in this phy­ (Diptera, Culicidae). Vol. 1: ix + 608 pp. University logenetic hypothesis. of California Press, Berkeley.· Besansky, N. J., & Fahey, G. T. (1997) Utility of the Conclusions. - Both the new data presented here white gene in estimating phylogenetic relationships and the previously published DNA sequence data among mosquitoes (Diptera: Culicidae). Mol. Biol. support the placement of Toxorhynchites within Evol. 14: 442-454. Bremer, K. (1988) The limits of amino-acid sequence the Culicinae, and the COI-COII data does so with data in angiosperm phylogeny reconstruction. the greatest confidence. Combining data from Evolution 42: 795-803. these different sources could improve phylogenet­ Caterino, M.S., Cho, S. & Sperling, F. A. H. (2000) The current state of insect molecular systematics: a thriv­ ic signal (Mitchell et al. 2000) and lead to a more ing tower of Babel. Ann. Rev. Ent. 45: 1-54. decisive result, but this cannot be done in this case Edwards, F. W. (1932) Fam. Culicidae. Genera Insect. because existing studies (which apparently were 194: 1-258. Tervueren, Bruxelles. commenced at about the same time) have sampled Fanis, J. S. (1969) A successive approximations ap­ proach to character weighting. Syst. Zool. 18: 374- different taxa and different genes. We recommend 385. that future DNA sequencing efforts aiming to Fanis, J. S., Killlersjo, M. Klnge, A. G. & Bult, C. resolve mosquito subfamily and tribal relation­ (1994) Testing the significance of incongruence. ships conclusively should capitalize on the pub­ Cladistics 10: 315-319. Foley, D. H., Bryan, J. H. Yeates, D. & A. Saul. (1998) lished data, where appropriate, and choose their Evolution and systematics of Anopheles: insights taxa in such a way that combined data sets can be from a molecular phylogeny of Australasian mosqui­ assembled (Caterino et al. 2001). Taxon sampling toes. Mol. Phylog. Evol. 9: 262-275. incompatibilities notwithstanding, all existing Foster, P. G. & Hickey, D. A. (1999) Compositional bias may affect both DNA-based and protein-based phylo­ DNA sequence data is in broad agreement with the genetic reconstructions. J. Mol. Evol. 48: 284-290. morphology-based study of Harbach & Kitching Frati, F., Simon, C. Sullivan, J. & Swofford, D. L. (1998) regarding the placement of Toxorhynchites. (1997) Evolution of the mitochondrial cytochrome Following these authors, Toxorhynchitinae should oxidase II gene in Collembola. J. Mol. Evol. 44: 145- 158. be reduced to tribal status, although further taxon Genetic Computer Group (1994) Program manual for sampling is needed to establish where they are the Wisconsin package. Version 8. Madison, Wiscon­ placed within Culicinae. It appears that complete sin. sequences of the COI, COII, and tRNA-Leu genes Goldman, N., Anderson, J.P. & Rodrigo, A. G. (2000) Likelihood-based tests of topologies in phylogenetics. will be effective in further resolving the higher­ Syst. Bioi. 49: 652-670. level relationships of Culicidae in general, and the Harbach, R. E. & Kitching, I. J. (1998) Phylogeny and ~.phylogenetic placement of Toxorhynchites within classification of the Culicidae (Diptera). Syst. Ent. 23: 327-370. the Culicinae in particular. Hasegawa, M., Kishino, H. & Yano, T. (1985) Dating of the human-ape splitting by a molecular clock of mito­ Acknowledgements chondrial DNA. J. Mol. Evol. 21: 160-174. Hnelsenbeck, J.P. (1995) Performance of phylogenetic We thank D. M. Wood for species identifications, T. methods in simulation. Syst. Biol. 44: 17-48. Amason, N. Besansky, D. C. Cunie, V. Kulasekara, V. Kishino, H. & Hasegawa, M. (1989) Evaluation of the Walker and D. M. Wood for providing specimens, J. maximum likelihood estimate of the evolutionary tree Leibovitz for excellent technical assistance, and anony­ topologies from DNA sequence data, and the branch­ mous reviewers for constructive criticism. F. A. H. S. ing order in Hominoidea. J. Mol. Evol. 29: 170-179. thanks J. Huelsenbeck for productive preliminary dis- Krzywinski, J., Wilkerson, R. C. & Besansky, N. J. 174 Mitchell, A. et al. INSECT SYST. EVOL. 33:2 (2002)

(2001) Evolution of mitochondrial &Jd ribosomal Oosterbroek, P.,.& (1995) Phylogeny of the gene sequences in Anophelinae {Dipter-a: Culicidae): nematocerous fanrilies Diptera (Insecta). Zool. J. Implications for phylogeny reconstruction. l'rfol. linn. Soc. 115: 267-311. Phylog. Evol. 18: 479-487. Orti, G. & Meyer, A. (1996) Molecular evolution of Lanave,C.,Preparata, Saccone, C. & Serio, G. (1984) ymin and the phylogenetic resolution of A new method for substitu- among euteleost fishes. Mol. Biol. tion rates. J. Mol. Evol. 20: 73. Lockhart, P. J., Steel, M.A., M.D. & Penny, D. Pa•wkowskL Szadziewski, R. '""'ll"''~"'""• (1994) Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11: 605-612. Maddison, W.'P. & Maddi!!On, D. R .. (1992) MacClade: Analysis character evolution. version Version Associates, Sunde(land, Al.htrnl Massachusetts. Shimodaira, H. & Hasegawa, M. (1999) Multiple com­ Miller, B. R., Crabtree, M. B. & Savage, H. M. (1997) parisons of log-likelihoods with applications to phy­ Phylogenetic relationships of the Culicomorpha logenetic inference. Mol. Btol. Evol. 16:1114-1116. inferred from 18S and 5.8S ribosomal DNA Siinmons, M. P. (2000) A fundamental problem with· sequen.ces (Diptera: Nematocera).!nsect Mol. Bioi. 6: amino acid sequence characters for phylogenetic &Jalyses. Cladistics 16: 274-282. Cho, S., Regier, J. C., Mitter, C.,Poole, R. Simon, C., Beckenbach, A., Crespi, B., Liu, H. Ma1tthews.M. M. (1997) Phylogenetic of & Flook, P. weighting, and phylo- elong1l.tic•n factor-1? in Noctuoidea (Insecta: genetic utility of gene sequences and a dOlltelra): the limits of synonymous substitution. compilation of conserved polymerase chain reaction 14: 381-390. Ann. ent. Soc. Am. 87: 651-701. MitcheH, A., Mitter, C. & Regier, J. C. (2000) More taxa -8pt~rling,F.A. H., Anderson, G. S. & Hickey, D. (1994) or more characters revfsited: combining data from '-'1''"""'"'"""' approach to the identification of insect nuclear, protein-encoding ana- used for postmortem interval estimation. J. lyses ofNoctuoidea Syst. Bioi. Sci. 39: 418-427. 49: 202-224. W. A. & Evenhuis, N. L. of Mitchell, S. E., Cockburn, A. F. & Seawright, J. A. lOJI:orf!yn.cmites. Ann. Rev. Ent. 26: (1993) The mitochondrial genome of Anopheles A. & Evenhuis. N. L. quadrimaculatus A: complete nucleotide se- of subgenus To.wrhynchites \...,.·'""'""'· quence and gene Genome 36: 1058- J. med. Ent. Honolulu 22: 1073. Swofford, D. L. (1999) PAUP*. l"'hYiOfl.•metlc Miyamoto, M. M. & Cracraft, J. (1991) Phylogenetic using parsimony (*and other ... ~,.. ~,~,. inference, DNA analysis, and the future of Sinauer Associates, Sunderland, MaLssaiChllSeltts molecular 3c17 in Miyamoto, M. M. Templeton, A R. (1983) Convergent evolution and non­ & analysis of DNA sequen- parametric inferences from restriction fragment and ces. Press, New York. DNA sequence data. Pp. 151-179 in Weir, B.: Statis­ Naylor, G. J.P. & Brown, W. M. (1997) Structural biol­ tical analysis of DNA sequence Marcel Dekker, ogy and phylogenetic estimation. Nature 388: 527- New York. 528.

View publication stats