Molecular Phylogenetics and Evolution 68 (2013) 671–682


Species tree reconstruction of a poorly resolved clade of () using multiple nuclear loci

Joshua S. Williams a,⇑, John H. Niedzwiecki b, David W. Weisrock a

a Department of Biology, University of , Lexington, KY 40506, USA
b Department of Biology, Belmont University, Nashville, TN 37212, USA

Article history: Received 13 June 2012; Revised 14 April 2013; Accepted 16 April 2013; Available online 28 April 2013

Keywords: Coalescent analysis; Concordance analysis; Bayesian analysis; Gene tree; Species tree; Ambystoma

Abstract

The analysis of diverse data sets can yield different phylogenetic estimates that challenge systematists to explain the source of discordance. The mole salamanders (family Ambystomatidae) are a classic example of this phylogenetic conflict. Previous attempts to resolve the ambystomatid tree using allozymic, morphological, and mitochondrial sequence data have yielded different estimates, making it unclear which data source best approximates ambystomatid phylogeny and which ones yield phylogenetically inaccurate reconstructions. To shed light on this conflict, we present the first multi-locus DNA sequence-based phylogenetic study of the Ambystomatidae. We utilized a range of analyses, including coalescent-based methods of species-tree estimation that account for incomplete lineage sorting within a locus and concordance-based methods that estimate the number of sampled loci that support a particular clade. We repeated these analyses with the removal of individual loci to determine if any locus has a disproportionate effect on our phylogenetic results. Collectively, these results robustly resolved many deep and relatively shallow clades within Ambystoma, including the placement of A. gracile and A. talpoideum as the sister clade to a clade containing all remaining ambystomatids, and the placement of A. maculatum as the sister lineage to all remaining ambystomatids excluding A. gracile and A. talpoideum. Both Bayesian coalescent and concordance methods produced similar results, highlighting strongly supported branches in the species tree. Furthermore, coalescent-based analyses that excluded loci produced overlapping species-tree posterior distributions, suggesting that no particular locus (including mtDNA) disproportionately contributed to our species-tree estimates. Overall, our phylogenetic estimates have greater similarity with previous allozyme and mitochondrial sequence-based phylogenetic estimates. However, intermediate depths of divergence in the ambystomatid species tree remain unresolved, potentially highlighting a region of rapid species radiation or a hard polytomy, which limits our ability to comment on previous morphologically-based taxonomic groups.

Published by Elsevier Inc.

⇑ Corresponding author. Address: DNA Analysis Facility on Science Hill at Yale University, 170 Whitney Ave., ESC Room 150, New Haven, CT 06511, USA. Fax: +1 203 432 7394. E-mail address: [email protected] (J.S. Williams).

1055-7903/$ - see front matter Published by Elsevier Inc. http://dx.doi.org/10.1016/j.ympev.2013.04.013

1. Introduction

Systematists are often challenged to explain phylogenetic conflict arising from the analysis of diverse data sets (e.g., morphological and molecular data) (Shaffer et al., 1991; Wiens and Hollingsworth, 2000). Individual data sets can be phylogenetically misleading if parallel evolution has produced homoplastic characters, a problem inherent in both morphological and molecular characters (Hillis, 1987). Furthermore, properties of the underlying phylogeny itself can facilitate inaccurate estimation when terminal branches are long relative to internal branches [e.g., long branch attraction (Felsenstein, 1978)]. While further exploration of individual data sets can sometimes identify the source of discordance (e.g., Wiens and Hollingsworth, 2000), in other studies individual data sets can yield convincingly strong support, precluding resolution of the conflict. In these situations, collection of additional data from an independently evolving source will be necessary to elucidate phylogenetic history and shed light on the source of the initial phylogenetic conflict.

Phylogenetic reconstruction of multiple independent loci can also yield discordance among gene trees, a product of a number of processes, including incomplete lineage sorting (deep coalescence) and lateral gene transfer (Maddison, 1997). Concatenated phylogenetic analysis of loci represents a traditional approach to resolving a prevailing species tree from a collection of loci, with the potential to yield strongly supported trees when large numbers of loci are considered, even when there is gene tree discordance (Rokas and Carroll, 2005). However, despite this potential, concatenation has been shown to have a high probability for statistical inconsistency when there is substantial gene tree discordance, resulting in strongly supported yet inaccurate phylogenies (Kubatko and Degnan, 2007; Weisrock et al., 2012). Methods that estimate a species tree from independently estimated gene trees, including those that account for the stochastic nature of genetic drift in the lineage-sorting process, may be more likely to reconstruct the true species phylogeny, despite strongly supported discordance among gene trees (Edwards et al., 2007; Leache and Rannala, 2011).

The phylogeny of species of the family Ambystomatidae represents a classic example of phylogenetic conflict arising from the analysis of very different data sources (Shaffer et al., 1991). The Ambystomatidae are broadly distributed across North America, and feature a diverse array of life-history phenotypes. This includes a radiation of US and Mexican species (the tiger salamanders) that vary in their propensity to metamorphose, and a group of unisexual populations in eastern North America that are putatively of hybrid origin. Phylogenetic analyses of allozymic and morphological data sets collected from sexual Ambystoma species (and representative tiger salamander species) yield a number of discordant topological patterns. The strongest of these involves a morphologically supported clade, named Linguaelapsus, comprising Ambystoma annulatum, A. barbouri, A. cingulatum, A. mabeei, and A. texanum (Fig. 1A) (Kraus, 1988). In contrast, parsimony and maximum-likelihood analysis of allozymic data strongly support very different placements of these taxa within the Ambystoma tree (Fig. 1B). A number of other relationships differ between the two data sources, including the morphological placement of A. gracile as the sister lineage to all other Ambystoma species. However, the Linguaelapsus clade has been highlighted as a particularly striking example of discordance between morphological and molecular data sets (Shaffer et al., 1991). Combined analysis of the data only weakly supports the Linguaelapsus clade (Jones et al., 1993; Shaffer et al., 1991), but individual data sets each strongly support different placements for its component taxa. Whether or not this discordance results from homoplasy in one, or both, data sets, or extreme non-independence among characters is not clear.

Phylogenetic reconstruction of the Ambystomatidae using multiple independent sources of DNA sequence data represents one step towards elucidating the factors causing this strong discordance. There have been previous attempts to reconstruct ambystomatid phylogeny using mtDNA sequence data. Bogart (2003) presented a phylogenetic tree based on cytochrome b (cytb) and 16S sequence data that included all species of Ambystoma, except for A. annulatum. This tree placed A. texanum and A. barbouri together in a clade, with A. mabeei maintaining a close relationship with these two species; however, the Linguaelapsus clade was not recovered. In a subsequent mtDNA study of the origin of unisexual ambystomatids using cytb sequence data, Robertson et al. (2006) resolved a clade that contained Linguaelapsus species (A. annulatum was not sampled) along with representative tiger salamander taxa. However, this clade, and the majority of interspecific relationships, received very low parsimony and Bayesian branch support. This study may also have been misled by the inclusion of a nuclear paralog of cytb (Bi and Bogart, 2010).

In this study, we have presented the first multi-locus nuclear DNA sequence data set ever collected to resolve phylogenetic relationships among sexual species of Ambystoma. We included representatives of nearly all sexual species, including multiple representatives of the diverse tiger salamander clade. Sequence data were generated from 14 nuclear loci, a majority of which were located on separate linkage groups, and a mitochondrial locus. To avoid the complexities and challenges associated with sampling alleles from heterozygous individuals for use in concatenated phylogenetic analysis (Weisrock et al., 2012), we focused on analyses that allow for the independent reconstruction of locus-specific gene trees. We explored gene-tree discordance among loci using a Bayesian estimate of clade concordance (Ane et al., 2007), and we focused our species-tree reconstruction on methods that use a coalescent model to account for gene-tree discordance through mechanisms of incomplete lineage sorting (Heled and Drummond, 2010; Kubatko et al., 2009). Simulation studies indicate that these methods outperform concatenation in species-tree reconstruction when phylogenetic history features short branches and/or large effective population sizes (Leache and Rannala, 2011), conditions that should increase the probability of an incomplete lineage sorting event and discordance among gene trees. Finally, we gauge whether any individual gene has a disproportionate effect on species tree reconstruction. Overall, our results provide strong resolution for many regions of the ambystomatid phylogeny, yet highlight a region of the tree containing short internal branch lengths and weak branch support that still lacks strong evidence, for or against, the Linguaelapsus clade.

2. Materials and methods

Fig. 1. Morphological (A) and allozyme (B) based phylogenetic hypotheses for Ambystoma from previous published work (Kraus, 1988; Shaffer et al., 1991). (A) The morphological tree was based on 32 morphological characters. Numbers on branches represent bootstrap values. (B) The allozyme tree was based on 26 allozyme characters. Numbers on branches represent jackknife values. In both trees members of Linguaelapsus are highlighted in red. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

2.1. Taxonomic and genetic sampling

A total of 33 individuals were sampled from 18 extant Ambystoma species, with 1–4 representative individuals per species (Table 1). This sampling included five representative lineages of the taxonomically diverse A. tigrinum species complex based on

Table 1 Information for the individuals of Ambystoma and Dicamptodon used in this study. Tissue number abbreviations are as follows: DWW, David W. Weisrock tissue collection; HBS, H. Bradley Shaffer tissue collection; JSE, Jim Demastes tissue collection; LSU, Louisiana State University Collection of Genetic Resources; MVZ, Museum of Vertebrate Zoology, University of California, Berkeley.

Species | Tissue source | Locality | # of loci sequenced
Ambystoma annulatum | DWW 0364 | Warren Co., MO, USA | 15
A. barbouri | DWW 0363 | Jessamine Co., KY, USA | 15
A. bishopi | HBS 18028 | Jackson Co., FL, USA | 15
A. bishopi | HBS 18036 | Okaloosa Co., FL, USA | 15
A. cingulatum | HBS 8197 | Liberty Co., FL, USA | 15
A. cingulatum | HBS 18030 | Baker Co., FL, USA | 14
A. gracile | MVZ 161801 | Mendocino Co., CA, USA | 15
A. gracile | MVZ 173465 | Lane Co., OR, USA | 15
A. jeffersonianum | LSU H1207 | PA, USA | 15
A. laterale | MVZ 188017 | Hants Co., Nova Scotia, Canada | 15
A. laterale | JSE60a | Beherens Ponds, Linn Co., IA, USA | 15
A. laterale | MVZ 173468 | Cook Co., IL, USA | 15
A. mabeei | MVZ 144890 | Scotland Co., SC, USA | 15
A. macrodactylum | MVZ 137198 | Missoula Co., MT, USA | 15
A. macrodactylum | MVZ 144895 | Linn Co., OR, USA | 15
A. macrodactylum | MVZ 161822 | Santa Cruz Co., CA, USA | 15
A. maculatum | MVZ 144934 | Wake Co., NC, USA | 15
A. maculatum | MVZ 187999 | Halifax Co., Nova Scotia, Canada | 15
A. maculatum | LSU H15983 | LA, USA | 15
A. opacum | LSU H513 | LA, USA | 15
A. talpoideum | LSU H15996 | LA, USA | 15
A. talpoideum | MVZ 144946 | Berkeley Co., SC, USA | 15
A. texanum | LSU H18514 | LA, USA | 15
A. texanum | MVZ 144954 | Douglas Co., KS, USA | 15
A. tigrinum | HBS 7247 | Goshen Co., WY, USA | 15
A. tigrinum | HBS 7877 | Washington Co., UT, USA | 15
A. mexicanum | DWW 1774 | Area Laguna del Toro, Mexico | 15
A. californiense | HBS 6687 | Jepson Prairie, Solano Co., CA, USA | 13
A. californiense | HBS 26367 | Sonoma Co., CA, USA | 15
A. tigrinum | DWW 2548 | Alachua Co., FL, USA | 15
A. tigrinum | DWW 2554 | Alachua Co., FL, USA | 15
A. ordinarium | HBS 25134 | San Jose Lagunillas, Mexico | 15
A. ordinarium | HBS 24978 | El Pedregoso, Mexico | 15
Dicamptodon aterrimus | MVZ 203271 | Idaho Co., ID, USA | 1
D. aterrimus | MVZ 187983 | Valley Co., ID, USA | 3
D. aterrimus | MVZ 187986 | Valley Co., ID, USA | 3
D. copei | MVZ 197777 | Grays Harbor Co., WA, USA | 3
D. copei | MVZ 223515 | Mason Co., WA, USA | 4
D. ensatus | MVZ 230027 | San Mateo Co., CA, USA | 3
D. ensatus | MVZ 249022 | Napa Co., CA, USA | 3
D. tenebrosus | MVZ 246114 | Mendocino Co., CA, USA | 3
D. tenebrosus | MVZ 187929 | Trinity Co., CA, USA | 1

divergent lineages in the mtDNA gene tree (Shaffer and McKnight, 1996), and all diploid sexual species outside this clade. The Ambystomatidae also contains a complex of unisexual populations with a complicated evolutionary history (Bi and Bogart, 2010), and representatives from this group were not included in this study. To root the Ambystoma tree, 1–2 samples were included from all four extant Dicamptodon species (seven total individuals). Dicamptodon is the most appropriate outgroup for phylogenetic reconstruction within Ambystoma as numerous molecular studies consistently establish these genera as sister clades (Frost et al., 2006; Larson, 1991; Roelants et al., 2007; Weisrock et al., 2005; Wiens et al., 2005). DNA was extracted from tissues using a Qiagen DNeasy Blood and Tissue kit following the standard protocol. Genomic DNA quantity and quality were assessed using a NanoDrop 2000 Spectrophotometer (Thermoscientific) and through electrophoresis on a 1.3% agarose gel.

Mitochondrial DNA sequence data were collected from a region encompassing the nad2 gene region and the tRNATrp and tRNAAla genes using primers previously published in Weisrock et al. (2001). Nuclear sequence data were collected from 14 loci identified from EST-based genome resources developed for the Mexican Axolotl, A. mexicanum, and eastern tiger salamander, A. tigrinum (Putta et al., 2004). Primer sequences for each locus are in Table 2. All PCR reactions were performed in a total volume of 20 µL, and comprised 14.1 µL of water, 2 µL of Taq buffer (with MgCl2), 0.4 µL of dNTPs, 0.7 µL of each primer, 0.1 µL of Taq DNA polymerase, and 2 µL of template DNA. In each reaction we used approximately 50 ng of genomic DNA. Most loci were PCR amplified with 35 cycles of denaturing at 95 °C for 45 s, annealing at 55 °C for 45 s, and extension at 72 °C for 30 s. All PCR runs opened with 95 °C for 3 min and concluded with a 5 min extension stage at 72 °C. For loci and individuals that were troublesome to amplify, a gradient PCR was used with the same PCR protocol as outlined above, except that the annealing phase consisted of a 45–65 °C gradient across the 12 columns of the thermal cycler block. To avoid contamination, negative controls were run for each set of PCRs using 2 µL of water instead of DNA. All reactions were run on a 1.3% agarose gel, using 0.8 µL of EZ-Vision One 6X loading dye with 4 µL of PCR product for each well.

To phase alleles from heterozygous individuals with indels or multiple polymorphic sites, PCR products were cloned using an Invitrogen TOPO-TA Cloning Kit. Culture plates were made with LB agar, 40 mg of X-gal per mL of dimethylformamide, and mixed with 50 µg/mL of kanamycin. Colonies were grown overnight at 37 °C, subsequently picked from plates, and lysed in 25 µL TE Buffer for 5 min at 95 °C. 2 µL of lysed cells were used in the PCR protocol outlined above using standard M13 forward and reverse primers. Four separate colonies were sequenced for each cloning

Table 2 Forward and reverse primers for all amplified loci.

Locus | Forward primer | Reverse primer
AMOTL2 | 5′-AATTATATTCCCTTTCCATGTCTGTC-3′ | 5′-TGCAGAAATATTTACGATTCTAGCAC-3′
CD163L1 | 5′-TACTACTGTCCTCACAACACATGAAC-3′ | 5′-AAACAGCTGCAGATATGTTAAACAAG-3′
CD81 | 5′-CTACAGGACACATTTAGCAGATCACT-3′ | 5′-ACATTCAGGTTACCAAGACAAGAAG-3′
E14E10 | 5′-TGAGGACTTCATCTTACACTCTGAAC-3′ | 5′-TATATAGCTGCGAGACCACAAAATAC-3′
E16C7 | 5′-GACAGGAGAATGAGTGAGTTACAAAA-3′ | 5′-AGAAGTGTTTCAACAGCATTATATCG-3′
FMO3 | 5′-CAGTATCGTTTAACAGGGCCAG-3′ | 5′-GTTACTAACCAATCAAACAGCAAGAA-3′
IQGAP1 | 5′-AGTTATGCATTGGTTCTTATGTTCAC-3′ | 5′-AAACAAAGGAATGTTTTGAATGACTT-3′
KCTD3 | 5′-CTTCACCAACAAAGTTAAGCACATCT-3′ | 5′-AAATTAACCCTGAATAGTGCCATC-3′
LHX2 | 5′-TAACTGACTTGACTAACCCCACTATG-3′ | 5′-GTCCATTGTACAAAGCCTCTATTAAA-3′
M13 | 5′-GTAAAACGACGGCCAG-3′ | 5′-CAGGAAACAGCTATGAC-3′
mtDNA | 5′-AAGCTTTCGGGCCCATACC-3′ | 5′-GCGTTTAGCTGTTAACTAAA-3′
PDXDC1 | 5′-ACATAGGTTTAAAATGTGAACAGTGC-3′ | 5′-GTCGTCAAATACAAAGCAAACAGTAT-3′
PSME3 | 5′-GGAGAACACTGAAGTGAAAATAACAA-3′ | 5′-GCATGTACCACTACTGATCTGAAACT-3′
SEC22B | 5′-ATCATGTTAATAGTGTATGTGCGGTT-3′ | 5′-ATTTACACAGATTCTGCAGTACAAGG-3′
TRMT5 | 5′-CCAGCTGTTAAAGTAAAGAAGGAAGT-3′ | 5′-GTTTTAAAAATTTCATAAGGCAGCTC-3′
ZFR | 5′-TGATAGCTCTTAAAAGAAACCAGACA-3′ | 5′-GTAGCTCAAAATCCATGACAGTAAGA-3′
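As a reading aid for Table 2, the following minimal sketch tabulates the length and GC fraction of a few of the listed primers. It is an illustration only and not part of the authors' workflow; the helper names `gc_fraction` and `summarise` are hypothetical, and only three loci are copied in to keep the example short.

```python
# Illustrative only: length and GC fraction for a few primers from Table 2.
# `gc_fraction` and `summarise` are hypothetical helper names, not from the study.

PRIMERS = {
    # locus: (forward primer, reverse primer), written 5' to 3'
    "M13": ("GTAAAACGACGGCCAG", "CAGGAAACAGCTATGAC"),
    "FMO3": ("CAGTATCGTTTAACAGGGCCAG", "GTTACTAACCAATCAAACAGCAAGAA"),
    "mtDNA": ("AAGCTTTCGGGCCCATACC", "GCGTTTAGCTGTTAACTAAA"),
}


def gc_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def summarise(primers):
    """Return one (locus, fwd_len, fwd_gc, rev_len, rev_gc) tuple per locus."""
    rows = []
    for locus, (fwd, rev) in primers.items():
        rows.append((locus, len(fwd), gc_fraction(fwd), len(rev), gc_fraction(rev)))
    return rows


if __name__ == "__main__":
    for locus, fwd_len, fwd_gc, rev_len, rev_gc in summarise(PRIMERS):
        print(f"{locus}: fwd {fwd_len} nt (GC {fwd_gc:.2f}), rev {rev_len} nt (GC {rev_gc:.2f})")
```

Extending `PRIMERS` with the remaining rows of Table 2 would give the same summary for every locus.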

reaction. If only one allele was recovered, then four more clones were sequenced. Cloning sometimes revealed patterns consistent with PCR recombination in the cloned PCR products (i.e., recovering three to four potential alleles in the clone products). For cloning products exhibiting PCR recombinant patterns, we performed a subsequent round of PCR on the genomic DNA aimed at minimizing the potential for PCR recombination by reducing the number of amplification cycles from 35 to 30 (Cronn et al., 2002; Zylstra et al., 1998). Overall, this had the intended effect of reducing the recovery of extra alleles from heterozygous individuals.

All PCR reactions were cleaned with a 1:5 dilution of ExoSAP-IT following the standard manufacturer's protocol. Sequencing reactions were performed using BigDye Terminator v3.1 and the individual PCR primers originally used for PCR. Samples were sequenced in both the forward and reverse directions on an ABI 3730 sequencer located in the University of Kentucky's Advanced Genetic Technologies Center. Sequences were analyzed, edited, and aligned using Geneious Pro version 5.3.3 (Drummond et al., 2010). All data alignments included two haploid gene copies from each individual (i.e., alleles). This was the case for both heterozygous and homozygous individuals. The two sequences were arbitrarily labeled A and B. All GenBank accession numbers for sequences generated and used in this study are in Table S1.

2.2. Phylogenetic reconstruction of individual gene trees

To estimate the model of nucleotide substitution for each gene we analyzed individual gene alignments (including both of the intraindividual A and B alleles) in JModelTest 0.1.1 (Guindon and Gascuel, 2003; Posada, 2008). We performed these analyses on alignments that included all Ambystoma and Dicamptodon sequence data. In addition, for the purpose of using these data in some downstream analyses that exclude Dicamptodon, we also estimated evolutionary models for alignments that were exclusive to Ambystoma sequence data. For all data sets, the Akaike Information Criterion (AIC) was used to determine the best-fit substitution model. Bayesian posterior distributions of gene trees for each locus were estimated using MrBayes version 3.1.2 (Ronquist and Huelsenbeck, 2003). For each locus, analyses were performed with four runs containing four MCMC chains each. Each analysis was run for 25 million generations with trees and parameters sampled every 5000 generations. We performed four replicate analyses for each locus using different starting conditions determined by random-number seeds. Tree scores (lnL values) and ESS estimates from the four independent MCMC runs were analyzed with Tracer v1.5 (Rambaut and Drummond, 2009) to detect whether the posterior distribution of all runs for a locus had converged and whether the program had been run long enough to provide independent samples of the posterior distribution, where an ESS of 200 or greater for combined replicate runs was considered representative of adequate posterior sampling. In all analyses, replicate runs reached the same stable posterior distribution before 2.5 million generations. Using MrBayes, we generated a 50% majority-rule consensus tree based on the four replicate runs (using a 2.5 million generation burnin for each replicate).

2.3. Coalescent-based species-tree estimation

We used two different analytical approaches to estimate a species tree within a coalescent framework. First, we used a Bayesian MCMC analysis implemented in the program BEAST version 1.6.1 (Drummond and Rambaut, 2007; Heled and Drummond, 2010) to estimate a posterior distribution of the species tree based on gene trees estimated from the individual loci. BEAST analyses were performed using the mitochondrial and nuclear loci, as well as only the nuclear loci. Gene trees were estimated for the individual loci (including both of the intraindividual A and B alleles) using the best-fit substitution models identified for each locus (as described above) and using a relaxed uncorrelated lognormal clock (Drummond et al., 2006). Differences in ploidy between the mitochondrial and nuclear genome were set to account for the smaller effective population size of the mtDNA locus. Species-tree estimation was modeled with a Yule process. Four replicate analyses were run, each for 500 million generations with sampling events every 50,000 generations. Replicate analyses were each started using a different random-number seed. Tracer was used to assess –lnL and ESS values for convergence across replicate analyses, where an ESS of 200 or greater for combined independent runs for each scenario was regarded as sufficient posterior sampling. In all cases, replicate analyses for a particular set of loci converged on the posterior distribution before 200 million generations. The program LogCombiner was used to combine posterior distributions across replicates using a burnin of 200 million generations, and we used the program TreeAnnotator to generate a maximum clade credibility (MCC) tree.

To estimate the effect of each gene on the species-tree posterior distribution, we ran a number of additional analyses, including: (1) an analysis that excluded the mtDNA locus, and (2) 14 rounds of analysis that excluded a single nuclear locus. All of these subsequent analyses were performed as described above for the total set of loci. The likelihoods from replicate runs of each particular analysis converged on the same posterior distribution before 250 million generations. The program LogCombiner was used to combine posterior distributions of all BEAST analyses across replicates
using a burnin of 250 million generations. To calculate a measure of dissimilarity among trees from these analyses, we calculated Robinson–Foulds (RF) distances between species-tree posterior distributions using the program Treedist in PHYLIP version 3.69 (Felsenstein, 2004). In addition, to visualize the relative degree of similarity among each posterior distribution, 100 random samples from each posterior distribution were plotted in ordination space using multidimensional scaling (MDS) in the Mesquite module Tree Set Viz v2.1 (Hillis et al., 2005; Maddison and Maddison, 2004). Unweighted RF distances, which measure the dissimilarity between the topology of two trees, were calculated for all pairwise tree comparisons and used in the MDS analyses. The default step size in Tree Set Viz was used in all analyses and MDS was allowed to proceed until the first six decimal positions of the stress-function value ceased to change. To avoid being trapped in local optima, this procedure was repeated multiple times to ensure that similar results were being achieved. The results of MDS analyses were plotted as two-dimensional representations of multidimensional space.

We also performed a fossil-calibrated BEAST species tree analysis of the Ambystomatidae by calibrating the root of the Ambystoma tree using BEAST v1.7.4 (Drummond et al., 2012). The oldest fossil for Ambystoma is A. tiheni, which is from the Late Eocene (37.2–33.9 million years ago) (Holman, 2006). This fossil was used to calibrate the root of Ambystoma by placing a lognormal distribution on the root of the tree at 35.55 million years with a standard deviation of 1.65 million years. All other running conditions were identical to those in the above BEAST analyses.

As a second method of coalescent-based species-tree reconstruction, we estimated the maximum-likelihood (ML) species trees using the program STEM version 2.0 (Kubatko et al., 2009). STEM requires input of an ultrametric gene tree generated for each individual locus. We estimated gene trees for each locus (including both of the intraindividual A and B alleles) using a Bayesian approach implemented in BEAST version 1.6.1 (Drummond and Rambaut, 2007). Best fitting substitution models were used for each locus, and analyses were performed using a relaxed uncorrelated lognormal molecular clock. Four replicate analyses were performed for each locus. All BEAST analyses were run for 100 million generations, sampling every 10,000 generations. We assessed convergence by assessing the distribution of lnL and parameter values (with an ESS of 200 or greater indicating adequate sampling of the posterior) over the course of each run using Tracer, and by comparing these values across replicate runs. All replicate analyses converged on the same stable distribution prior to 10 million generations, and we excluded samples from this portion of the run prior to summarizing the posterior distribution. The MCC tree for each locus was summarized from the combined posterior distribution of each replicate analysis.

Single-locus MCC gene trees were input into STEM, and ML species trees were estimated using all mitochondrial and nuclear loci and using only nuclear loci. Because it was unclear what θ value would best represent our data, we used a range of prior values for θ (0.0001, 0.0006, 0.001, 0.006, 0.01, 0.1, 1, 10) to account for the potential effect of ancestral population size on our results. Analyses were run using a simulated annealing search for 10 million generations, while discarding the first 1 million generations as burnin. A total of four replicate analyses were executed for each θ prior.

2.4. Bayesian concordance of gene trees

To determine the proportion of gene trees that supported a particular clade, we calculated Bayesian concordance factors using BUCKy version 1.4.0 (Ane et al., 2007; Larget et al., 2010). All BUCKy analyses were performed on posterior distributions of individual gene trees generated from Ambystoma-specific data sets in MrBayes. We focused these analyses on Ambystoma based on the lack of complete nuclear-gene sampling for Dicamptodon and as an effort to reduce the number of tips in the analyzed trees. BUCKy analyses were conducted using both mitochondrial and nuclear gene trees, and using nuclear gene trees exclusively. For all nuclear loci, both the A and B alleles from an individual were present in the input gene trees, and we used a function within BUCKy to choose the allele designated with the A label (which had been arbitrarily assigned to one of the two gene copies as described above) from each individual. All BUCKy analyses were run for 10 million generations after an initial burnin of 1 million generations. For each analysis, four independent replicate runs were performed, each with four MCMC chains. We ran multiple analyses using a range of Dirichlet process priors (α = 0.001, 0.01, 0.1, 1.0, 10.0, 100.0) where α is a parameter indicating the a priori degree of discordance between different genes.

3. Results

A total of 14 nuclear loci and one mitochondrial locus (nad2 and the adjacent tRNATrp and tRNAAla genes) were sequenced for most of the 33 Ambystoma individuals. The exceptions were the PSME3 locus (31 Ambystoma individuals) and the CD81 locus (32 individuals). For Dicamptodon, we were able to sequence three nuclear loci for all seven individuals; however, we were unable to generate successful PCR or sequence data for the remaining nuclear loci. New mtDNA data were generated for only one Dicamptodon copei individual (MVZ223515). Two additional Dicamptodon sequences (GenBank Accessions AY916017 and AY916018) from a previous study (Weisrock et al., 2005) were used in the mtDNA alignment. Overall, this totaled 4276 bp of aligned nuclear sequence data and 1183 bp of aligned mtDNA data. The nuclear DNA contained a total of 1688 variable sites and 988 parsimony-informative sites across all Ambystoma and Dicamptodon individuals (Table 3). Within Ambystoma, the nuclear data contained a total of 1581 variable and 926 parsimony-informative sites. The mtDNA contained a total of 1035 variable sites and 524 parsimony-informative sites across all Ambystoma and Dicamptodon individuals (Table 3). Within Ambystoma, the mtDNA data contained a total of 827 variable and 441 parsimony-informative sites. All data used in this study are available from the Dryad online repository at http://dx.doi.org/10.5061/dryad.2gq14.

3.1. Individual gene-tree reconstruction

For all loci analyzed with MrBayes, the independent replicate runs had sampling patterns after 2.5 million generations that indicated convergence on the posterior distribution, including overlapping plots of stabilized lnL values and similar majority-rule consensus topologies generated from each replicate analysis. Majority-rule consensus trees generated from the combined posterior distributions for each locus are presented in Fig. S1. Replicate Bayesian posterior distributions of trees for each locus generated in BEAST (for use in STEM analyses) exhibited similar signs of convergence. Furthermore, the topologies of majority-rule consensus trees generated from MrBayes were generally consistent with those of the MCC trees estimated in BEAST. Differences between these two analyses were generally in ambiguous regions of the tree; unresolved regions of the MrBayes consensus trees were left as polytomies, while corresponding relationships in BEAST trees were resolved as bifurcating branches with low posterior probabilities. Tree files for all majority-rule consensus trees generated in MrBayes and MCC trees generated in BEAST are available from Dryad (link provided above).

Table 3. Information for the loci sequenced in this study.

| Locus (a) | Alignment length (bp) | Number of Ambystoma individuals sequenced | Number of Dicamptodon individuals sequenced | Linkage group (b) | Number of variable sites | Number of parsimony informative sites | Substitution model | Number of distinct topologies in posterior distribution (c) | Rate of evolution (meanRate) (d) |
|---|---|---|---|---|---|---|---|---|---|
| mtDNA | 1183 | 33 | 3** | mtDNA | 827/1035* | 441/524* | GTR+I+Γ | 2143/3957* | 5.605 |
| AMOTL2 | 349 | 33 | 0 | 8 | 165 | 113 | HKY+I | 18,000 | 0.893 |
| CD163L1 | 379 | 33 | 0 | 4 | 116 | 92 | HKY+Γ | 18,000 | 0.688 |
| CD81 | 372 | 32 | 0 | 6 | 202 | 110 | HKY+Γ | 18,000 | 0.871 |
| E14E10 (e) | 184 | 33 | 7 | 5 | 34/41* | 18/25* | GTR+I/GTR+Γ* | 18,000 | 0.259 |
| E16C7 (e) | 373/387* | 33 | 7 | 5 | 133/223* | 94/139* | HKY+I+Γ | 18,000 | 0.743 |
| FMO3 | 384 | 33 | 0 | 10 | 143 | 107 | GTR+Γ | 18,000 | 0.908 |
| IQGAP1 | 271 | 33 | 0 | 6 | 136 | 63 | GTR+Γ | 18,000 | 0.629 |
| KCTD3 | 211 | 33 | 0 | 13 | 94 | 56 | HKY+Γ | 18,000 | 0.707 |
| LHX2 | 157 | 33 | 7 | 7 | 23/33* | 18/28* | HKY+Γ | 18,000 | 0.271 |
| PDXDC1 | 225 | 33 | 0 | 3 | 89 | 54 | GTR+Γ | 18,000 | 0.624 |
| PSME3 | 475 | 31 | 0 | 11 | 176 | 110 | GTR+Γ | 18,000 | 0.637 |
| SEC22B | 397 | 33 | 0 | 10 | 130 | 64 | GTR+Γ | 18,000 | 0.419 |
| TRMT5 | 214 | 33 | 0 | 14 | 50 | 27 | GTR+I+Γ | 18,000 | 0.390 |
| ZFR | 271 | 33 | 0 | 2 | 90 | 47 | GTR+I | 18,000 | 0.332 |
| Total nuclear | 4262/4276* | – | – | – | 1581/1688* | 926/988* | – | – | |
| Total nuclear + mtDNA | 5445/5459* | – | – | – | 2408/2723* | 1367/1512* | – | – | |

* Value that differs when Dicamptodon is included.
** Two of these sequences were originally published in Weisrock et al. (2005).
a Full names for nuclear loci with determined ortholog: AMOTL2 (Angiomotin-like 2), CD163L1 (CD163 Antigen-like 1), CD81 (CD81 Antigen), FMO3 (Flavin-containing Monooxygenase 3), IQGAP1 (IQ Motif-containing GTPase-activating Protein 1), KCTD3 (Potassium Channel Tetramerization Domain-containing Protein 3), LHX2 (LIM Homeobox Gene 2), PDXDC1 (Pyridoxal-dependent Decarboxylase Domain-containing Protein 1), PSME3 (Proteasome Activator Subunit 3), SEC22B (SEC22 Vesicle-trafficking Protein Homolog B), TRMT5 (tRNA Methyltransferase 5), ZFR (Zinc Finger RNA Binding Protein).
b Based on the linkage map of Smith et al. (2005).
c Based on the combined posterior distributions from replicate MrBayes analyses. Because every sample of the posterior distribution had a different topology, all nuclear loci have the same number of distinct topologies.
d Derived from the meanRate parameter in BEAST. Units for these values entail the number of substitutions per site averaged across the tree. These values come from the joint posterior distributions from BEAST analyses using all loci, using the meanRate value for each locus.
e Original EST locus name; human ortholog not determined.

3.2. BEAST species tree analyses

The BEAST species tree generated using all mitochondrial and nuclear loci produced a monophyletic Ambystoma with a high posterior probability (PP) of 1.0 (Fig. 2A). Within the Ambystoma clade, eight branches were resolved that exhibited moderate to strong PP support and either contained multiple species or were monotypic with no strong placement with other species. Ambystoma gracile and A. talpoideum were placed in a clade (designated clade A) with a PP = 0.91, and this clade was supported as the sister group to a clade containing all remaining Ambystoma species (PP = 1.0). Within this larger clade, A. maculatum (clade B) was placed as the sister lineage to the remaining Ambystoma clades (C–H) with strong support (PP = 1.0). There was weak branch support for the relationships among A. macrodactylum (clade C), A. opacum (clade D), and a clade of remaining Ambystoma species (clades E–H), with the latter two clades placed as sister groups with a PP = 0.48. Clades E–H were each individually supported by strong PPs, although relationships among these clades received lower measures of branch support. Ambystoma jeffersonianum and A. laterale (clade E) were resolved as sister taxa with a PP = 0.96. Ambystoma mabeei, A. barbouri, and A. texanum formed a clade (clade F) with a PP = 1.0. Clades E and F were resolved as sister groups, although with weaker levels of branch support (PP = 0.73). Ambystoma annulatum, A. bishopi, and A. cingulatum formed a clade (clade G) with a PP = 1.0. Finally, all sampled tiger salamander taxa (A. californiense, A. mexicanum, A. ordinarium, and A. tigrinum) formed a clade (clade H) with a PP = 1.0. Clades G and H were resolved as sister groups, again, with weaker branch support (PP = 0.78) than that seen for the individually identified clades. Collectively, clades E–H formed a monophyletic group with strong branch support (PP = 0.98).

BEAST analyses that excluded individual loci produced species-tree posterior distributions that largely overlapped in ordination space with that of the full data analysis (Fig. 3); however, there are two notable deviations. Exclusion of the mtDNA data produced an overlapping, but slightly different posterior distribution, compared to the full-data analysis (Fig. 3A). These differences were manifested in two different ways. First, measures of branch support for some of the terminal clades described above changed (Fig. 2B). When the mtDNA data were excluded, branch support increased slightly for clade A (PP = 0.93), which corresponds with the individual mtDNA gene tree not resolving A. gracile and A. talpoideum as sister species (Fig. S1). In addition, the exclusion of mtDNA data yielded decreased branch support for clade E (PP = 0.77). Second, exclusion of mtDNA data yielded some alternative phylogenetic relationships among clades C–H, although these involved branches that received low levels of branch support from both sets of analyses. Clade D was placed as the sister lineage to a clade containing clades C and clades E–H (PP = 0.54). Clades E–G were grouped together to the exclusion of clade H with a PP = 0.50. In addition, support for the placement of clades E–H in a larger clade was reduced to a PP = 0.71. The average RF distance between analyses that included and excluded the mtDNA locus was 7.2 (Table S2).

Overall, removal of a single nuclear locus had less of an effect on the posterior distribution of species trees compared to the removal of the mtDNA locus (Fig. 3B). Average RF distances between posterior distributions from the nuclear exclusion analyses ranged from 5.60 to 6.74 (Table S2). Given both the higher number of variable sites in the mitochondrial alignment vs. individual nuclear locus alignments as well as the higher rate of evolution in the mitochondrial data (Table 3), this is not unexpected.

We also attempted a time-calibrated BEAST analysis of the Ambystomatidae by setting a date on the root of Ambystoma. However, independent runs failed to converge on a similar posterior distribution, even after running analyses for 500 million generations. As a result, we do not discuss these results further. However, all files associated with these analyses can be found in the online Dryad repository.

[Fig. 2 tree graphics, panels A and B, are not reproduced here. Scale bars: (A) 0.03 substitutions/site; (B) 0.0050 substitutions/site.]

Fig. 2. Maximum clade credibility (MCC) trees generated from the posterior distributions of Bayesian coalescent species-tree reconstructions performed in BEAST. (A) MCC tree generated from analyses that used all mitochondrial and nuclear loci. (B) MCC tree generated from analyses that used only nuclear loci. Numbers on branches represent BEAST posterior probabilities (left of the slash) and Bayesian concordance factors (minimum number of loci supporting a branch, right of the slash). Bayesian concordance factors are not included on all branches due to the exclusion of Dicamptodon outgroup taxa in Bayesian concordance analysis. Branches in red highlight relationships that differ between the two trees. In this study, we consider clades that have posterior probabilities above 0.95 and Bayesian concordance factors of 2 or greater to be strongly supported.

3.3. STEM analyses

Species-tree estimation using STEM produced results that varied according to the θ prior used in an analysis. Overall, phylogenetic relationships estimated in STEM analyses contained few similarities to those reconstructed with Bayesian species-tree analyses and Bayesian concordance analyses. Here we focus on the STEM results produced using θ = 0.001, which had the greatest similarity to the results produced using other reconstruction methods. STEM analysis produced a single ML tree with a lnL = −241915.13 (Fig. 4). The ML tree placed A. gracile and A. talpoideum as sister lineages (Fig. 4) and placed this clade as the sister clade to all other ambystomatids. The ML tree also resolved all representative tiger salamander species as a clade. Most other well-supported clades present in the BEAST and BUCKy results (see below) were not resolved in the ML STEM tree. The STEM tree with the next highest lnL (−241976.85) only differed from the ML tree in that A. jeffersonianum and A. opacum were grouped as sister species, and no other notable differences were found between these two trees. Calculation of RF distances between all trees resulting from STEM analyses across all explored θ priors against the total-data BEAST MCC tree resulted in RF distances ≥ 20, reflecting the many topological differences between these results.

3.4. Bayesian concordance analyses

In analyses using all 15 loci (mtDNA and nuclear data), and analyses using only nuclear loci, there was no difference in results across a range of α priors (0.001, 0.01, 0.1, 1, 10, and 100). Bayesian concordance analysis using all 15 mtDNA and nuclear loci produced a primary concordance (PC) tree with many similarities to the 15-locus species tree estimated in BEAST (Figs. 2A and 5A). Clades A–H each received a minimum concordance factor (CF) of 3.7 with a 95% credible interval of 3 and 5 (a CF could not be estimated for taxon D due to the sampling of a single A. opacum individual). The branch separating clade A from clades B–H received a CF of 11.7 (95% credibility interval: 10, 13). In addition, the monophyly of a group comprising clades C–H was supported with a CF = 5.8 (95% CI: 4, 8), and the grouping of clades E–H as a larger clade received a CF = 2.4 (with a lower bound 95% CI indicating support from at least two loci). Concordance factors for the remaining inter-clade relationships in the PC tree were low, with 95% CIs that included zero or one. Alternative phylogenetic relationships resolved in the BEAST species tree that were not present in the PC tree also received low CFs (Fig. 2A). For example, the BEAST tree placed clades E and F as sister groups with a PP = 0.73, while Bayesian concordance analysis gave it a very low CF of 0.1.

Bayesian concordance analysis of only the nuclear loci produced similar results to those that included mtDNA (Fig. 5B). In these analyses, a CF = 14 would be the highest value that could be given. Differences in the PC tree between the two sets of analyses were restricted to branches that had 95% credibility intervals that included a CF < 2. For example, the nuclear PC tree included the Linguaelapsus clade (clade F + clade G) with a CF = 1.4 and a 95% CI that included 1 and 3 (Fig. 5B), while the mtDNA + nuclear PC tree placed clades G and H as sister clades with a CF = 2.1 and a 95% CI that included 1 and 3 (Fig. 5A). Overall, branches with lower bounds on their 95% CIs that included CFs ≥ 2 were consistent across the PC trees generated from all loci (mtDNA and nuclear), and only the nuclear loci. Removal of mtDNA did decrease CFs for most branches. Concordance factors for individual clades A–H each decreased by approximately one, as did the branches ancestral to clades B–H and clades C–H. This effect was slightly less pronounced for the ancestral branch leading to clades E–H (CF = 2.4 for all loci vs. CF = 2.0 for all nuclear loci); however, the upper bound of the 95% CI did drop by two in the nuclear analyses. While many absolute CFs were decreased by the exclusion of mitochondrial data, we point out that concordance values presented in this way are relative to the number of loci used in BUCKy analyses.

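The Robinson–Foulds (RF) distances reported in Sections 3.2 and 3.3 count the groupings that are present in one tree but absent from the other. The sketch below illustrates the rooted form of that calculation and is not the software used in the study. It assumes trees written as nested tuples of tip names, collapses each lettered clade (C–H) to a single tip, and uses two backbone topologies loosely based on the relationships described in Section 3.2; the arrangement within clades E–G in the second tree is an assumption made for illustration.

```python
def clades(tree):
    """Return the non-trivial clades (frozensets of tip names) of a rooted tree.

    `tree` is a nested tuple of tip-name strings, e.g. (("A", "B"), "C").
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    root = walk(tree)
    found.discard(root)  # the root clade is shared by any two trees on the same tips
    return found


def rf_distance(tree_a, tree_b):
    """Rooted RF distance: number of clades found in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical backbones with each lettered clade collapsed to one tip.
all_loci = ("C", ("D", (("E", "F"), ("G", "H"))))
nuclear_only = ("D", ("C", ("H", ("G", ("E", "F")))))

print(rf_distance(all_loci, nuclear_only))
```

Only clades E–H and E+F are shared by these two example backbones, so every other internal grouping contributes to the distance. Averaging such pairwise distances over trees sampled from two posterior distributions gives a summary of the kind reported in Table S2.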
Fig. 3. Ordination plots based on multidimensional scaling of trees sampled from the posterior distributions (PDs) generated from Bayesian species tree analyses. Each dot represents a tree sampled from the posterior distribution. Distances between dots represent Robinson–Foulds distances between trees, where two dots that are closer in ordination space have a more similar topology than two dots that are farther apart in ordination space. (A) The PDs resulting from analysis of all 15 (mitochondrial + nuclear) loci and analysis of all nuclear loci (excluding mtDNA). The final stress value of this analysis was 0.276388. Minimum convex polygons encompass the distribution of trees from each analysis. (B) The PDs resulting from analysis of all 15 (mitochondrial + nuclear) loci and analyses that excluded a single nuclear locus. The final stress value of this analysis was 0.301906. Minimum convex polygons encompass the distribution of trees from each analysis.

[Fig. 4 tree graphic is not reproduced here. Tip labels, top to bottom: Ambystoma talpoideum, A. gracile, A. californiense, A. mexicanum, A. ordinarium, A. tigrinum, A. macrodactylum, A. jeffersonianum, A. opacum, A. annulatum, A. texanum, A. barbouri, A. maculatum, A. laterale, A. mabeei, A. cingulatum, A. bishopi. Scale bar: 0.3 substitutions/site.]

Fig. 4. Coalescent-based species tree generated via a maximum likelihood analysis in STEM. The tree is based on input gene trees from all nuclear loci and the mtDNA locus. A range of θ values was used in STEM analyses. The tree presented here was generated using θ = 0.001. Branches in red indicate those that differ with the BEAST maximum clade credibility tree using all loci (Fig. 2A).

4. Discussion

4.1. Interpreting phylogenetic resolution within the Ambystomatidae

In contrast to the lack of resolution between conflicting morphological and allozymic data sets in previous phylogenetic studies of Ambystoma, we found considerable phylogenetic resolution among data sets generated from independent nuclear and mitochondrial loci. We reached this conclusion based primarily on patterns of posterior probability support resulting from Bayesian coalescent species-tree reconstruction in BEAST and from Bayesian concordance factors. In general, we considered clades that received high posterior probabilities (in the range of 0.95 or greater) and concordance factors with a minimum lower confidence estimate of 2.0 (as reported in the 95% credibility interval) to represent confidently supported relationships.

Whether or not the BEAST posterior probabilities should be interpreted as the probability that a clade is present in the species tree is not entirely clear (Alfaro and Holder, 2006), and can depend on a number of aspects of the analysis, including priors and model assumptions (e.g., no gene tree discordance due to gene flow). Recent simulation studies suggest that Bayesian implementations of the multispecies coalescent can produce very accurate estimates of the species tree under certain species tree histories (Leache and Rannala, 2011). Here, we interpret the species-tree posterior probabilities as a measure of the certainty that our data support a particular clade, and given our broad sampling of loci across the genome, we infer such clades to be strongly supported estimates of the phylogeny for ambystomatids.

The interpretation of concordance factors as measures of branch support is less clear, and they cannot be viewed in the same light as a posterior probability or bootstrap value. Instead, they represent the number of sampled loci with reconstructed gene trees that reflect a particular relationship. Because discordance for a relationship can exist across gene trees due to a number of factors (Maddison, 1997), CFs for a true branch in the species tree can be

far less than 1.0 (measured as a proportion). This can present a challenge to the systematist when CFs do not overwhelmingly indicate concordance across a majority of loci. We choose to view CFs with a lower 95% confidence boundary of 2.0 (measured as the number of our sampled loci) as a substantial measure of support for a branch, with the caveat that a contrasting branch did not share an equal or greater CF. While alternative relationships may lack concordant patterns (and yield low CFs) simply through poor resolution of the gene tree, we view this as a useful ad hoc interpretation of our results.

Fig. 5. Primary concordance trees generated with Bayesian concordance analysis. (A) Primary concordance tree based on analyses using gene trees from all nuclear loci and the mtDNA locus. Numbers on branches are concordance factors (CFs) and represent the minimum number of loci supporting a branch. The 95% credibility interval of CFs is presented inside brackets. In this tree, a CF = 1 would indicate support from a single gene tree, while a CF = 15 indicates that the clade is found in gene trees reconstructed from all 15 loci. (B) Primary concordance tree based on analyses using gene trees from all nuclear loci. Numbers on branches are concordance factors (minimum number of loci supporting a branch) with their 95% credibility interval presented inside brackets. Lettered clade labels in A and B correspond to those presented in Fig. 2. Red branches highlight relationships that differ between primary concordance trees from the total data analysis and the nuclear analysis. Asterisks indicate species with alleles sampled from multiple individuals. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

By combining these two sets of results, we robustly resolved a number of interspecific relationships within Ambystoma. First, A. gracile and A. talpoideum were resolved as sister species, and this group was placed as the sister clade to all remaining ambystomatids (Fig. 2). The strong measures of support for this deep relationship, either when including or excluding the mtDNA data, are noteworthy in that the mtDNA data on their own were equivocal about the placement of A. gracile and A. talpoideum outside of a clade containing all other Ambystoma species (Fig. S1). We also note that the very high CF for these species presented in Fig. 2 does not necessarily support A. talpoideum and A. gracile as sister species, but strongly groups all other ambystomatids excluding these two species in an unrooted framework (this is because Dicamptodon was not included in the Bayesian concordance analyses). As a result, our inference of support for their placement in a clade comes primarily from the posterior-probability support in the BEAST trees. Nonetheless, the deep divergence between the A. talpoideum and A. gracile clade and a clade containing all remaining ambystomatids is one of the most strongly supported relationships within the Ambystomatidae.

Second, we resolved A. maculatum as the sister lineage to a clade of all remaining Ambystoma species (Fig. 2). Interestingly, these first two sets of relationships are somewhat similar to the previous allozyme-based results, which placed these three species outside a clade containing all remaining ambystomatids (Fig. 1B). However, no previous studies provide the same strongly supported reconstructions among A. gracile, A. maculatum, and A. talpoideum found in this study.

Third, we were able to resolve a number of more terminal multi-species clades. This includes the grouping of A. laterale and A. jeffersonianum as a clade (Clade E), the grouping of A. mabeei as a clade with A. barbouri and A. texanum (Clade F), the grouping of A. annulatum as a clade with A. bishopi and A. cingulatum (Clade G), and the resolution of a tiger-salamander clade (Clade H). Many of these findings represent additional support for already well-accepted phylogenetic relationships. However, the sister-species relationship between A. jeffersonianum and A. laterale represents a substantial deviation from previous morphological and allozymic phylogenetic estimates (Kraus, 1988; Shaffer et al., 1991). Interestingly, our results are consistent with previous mtDNA (cytb) results, which placed these two species as sister lineages, albeit with weak measures of branch support (Robertson et al., 2006).

Finally, our results support the monophyletic grouping of clades E, F, G, and H. The posterior support for this combined relationship is strong when mitochondrial and nuclear loci are analyzed together (PP = 0.98; Fig. 2A), but drops when the mitochondrial data are excluded (PP = 0.71; Fig. 2B). Interestingly, however, the lower bound on the 95% credibility interval of CFs is 2.0 in both sets of analyses (Fig. 5), indicating that even when the mtDNA data are removed, this relationship is still supported by concordant patterns in at least two nuclear loci.

4.2. Phylogenetic ambiguity in the Ambystomatidae

While analysis of our multi-locus data resolves much of the species-tree history for the Ambystomatidae, a fair degree of ambiguity still exists in some portions of the tree. In particular, the placement of A. macrodactylum, A. opacum, and the branch leading to the combined clade of clades E, F, G, and H remains unclear. There was disagreement among data sets (i.e., mtDNA and nuclear loci vs. nuclear loci) and analyses in the resolution of these relationships; however, this disagreement among trees coincided with very low branch support. A similar pattern of low branch support and poor phylogenetic resolution is seen among clades E–H.

The weakly supported branches are roughly clustered in the same region of the phylogeny, occurring after the deeper and well supported branching events involving A. gracile, A. talpoideum, A. maculatum, and the clade of remaining ambystomatids, but before the well supported branches leading to clades E through H. The lack of resolution in this region of the species tree could be due to a number of factors. This region of the species tree features short branch lengths (a pattern most evident in the BEAST trees; Fig. 2), potentially suggesting a period of rapid diversification in the history of Ambystoma (Shaffer, 1993). Such an event could affect phylogenetic resolution in two ways. First, short branching events coupled with large effective population sizes would be expected to increase the prevalence of deep coalescent events in gene trees (Maddison, 1997) and provide a challenge to their accurate reconstruction, even with a relatively large number of genes (Edwards et al., 2007; Leache and Rannala, 2011). This assumes that the true branching pattern is dichotomous and not a hard polytomy. For a rapid radiation event, however, the diversification of species may have occurred quickly enough such that a dichotomous branching phylogeny may not adequately represent the actual phylogeny (Stanley et al., 2011). It is not clear how to distinguish between these two alternatives (rapid branching vs. hard polytomy) in a multi-species coalescent framework.

Second, as internal branches in the species tree become shorter, the probability of mutations in genes marking those events decreases. As a result, increasing the number of genes used in species-tree reconstruction will not necessarily result in an increase in phylogenetic information (Huang et al., 2010). It is not completely clear in this study whether the regions of poor resolution found here are tied to any of these factors. However, a number (but not all) of our individual gene trees contain strongly supported branches involving clades that are poorly resolved in our species-tree reconstructions (Fig. S1), suggesting that many loci contain adequate phylogenetic information at the gene-tree level. If a hard polytomy does not explain the poor resolution of intermediate regions of the ambystomatid tree, a more complete resolution of the ambystomatid species tree may be possible by sampling a larger number of loci from the same EST-based pool of genomic resources, and by including greater numbers of individuals for each species (McCormack et al., 2009).

We also note that the STEM-based ML estimate of the species tree produced results that were largely inconsistent with the BEAST estimates of the species tree and the primary concordance trees estimated in BUCKy. One factor that may contribute to this starkly contrasting estimate is that STEM uses a single reconstructed gene tree as the representative of each locus. In contrast, Bayesian species-tree analyses reconstruct a joint posterior distribution for each gene tree, and Bayesian concordance analyses utilize a posterior distribution of gene trees for each locus. In both of these cases, the variance in gene-tree reconstruction is accommodated in the reconstruction of the encompassing phylogenetic history. Several of our loci exhibited limited variation, and produced consensus gene trees (used as input for our STEM analyses) that contained many poorly resolved branches. The reduced information in these point estimates of the gene trees may have constrained the STEM analyses, and it may be necessary for either a higher number of gene trees or for more well-resolved gene-tree estimates to be input into STEM for more robust species-tree estimation.

4.3. Effects of individual loci on phylogenetic reconstruction

An important consideration in multi-locus species-tree reconstruction studies involves an assessment of the influence of different components of the data on the overall phylogenetic signal. Recent studies have examined the effect of the sampling ratio of individuals to loci, demonstrating that greater numbers of loci lead to an increase in accuracy and precision for more deeply diverged branches, while more recent rapid radiations can benefit from greater sampling of individuals per species (Maddison and Knowles, 2006; McCormack et al., 2009). Furthermore, while increasing the number of loci is generally expected to increase accuracy and precision, not all loci are equal in their information content. As a result, the increase in phylogenetic precision is expected to be contingent upon the information content of the added loci (Camargo et al., 2012; O'Neill et al., 2013).

Here, we examined an equally important aspect of our molecular sampling: the effect that a particular locus has on our species-tree estimates. By plotting samples from the posterior distributions of analyses that excluded loci in multidimensional space (Hillis et al., 2005), we were able to assess the effect of each locus on the species-tree estimate. We found that the exclusion of any one locus did not greatly change the posterior estimate of the species tree (Figs. 3 and S2). The largest shift in the posterior distribution was seen with the removal of the mtDNA locus, which produced an overlapping, but slightly shifted distribution of sampled trees in ordination space (Fig. 3A). This result is not particularly surprising given that the mtDNA alignment was almost three times larger than any individual nuclear locus, and had far more informative sites than any individual nuclear locus, as well as a much higher rate of evolution (Table 3). However, while excluding the mitochondrial locus could have had substantial effects on species-tree estimates, given the disproportionate amount of information it contained, the nuclear data alone produced largely overlapping posterior distributions, and the only changes in topology were in branches weakly supported with or without its inclusion (Fig. 2). Analyses that excluded a single nuclear locus, or that excluded the mtDNA and a single nuclear locus, also produced posterior distributions that overlapped with each other and with the posterior distribution of the total data analysis (Figs. 3B and S2). The MCC trees constructed after excluding single loci were very similar, with the only differences involving branches with short lengths and low posterior probability support (results not shown).

Overall, the most important conclusion derived from these exclusion analyses is that our species-tree reconstruction using BEAST has not been biased by the substitution patterns of any individual locus. While this does not mean that all loci are contributing equally to the resolution of ambystomatid phylogeny, it does establish that no single locus is driving the overall resulting species tree. The relatively large and variable mtDNA data set is not overwhelmingly influencing the analysis, an important result given the demonstration of a disproportionate influence of highly variable mitochondrial data when combined with less-variable nuclear data in other salamander species trees (Fisher-Reid and Wiens, 2011).

4.4. Comparison to previous hypotheses

This multi-locus study of ambystomatid salamanders has yielded a phylogeny that has some similarity to previous hypotheses (Bogart, 2003; Kraus, 1988; Robertson et al., 2006; Shaffer et al., 1991), but also contrasts in numerous ways. Like both the morphological and the allozyme-based trees, estimates using our multi-locus sequence data maintain the sister-species relationships between A. barbouri and A. texanum, between A. annulatum and A. cingulatum, and between A. tigrinum and A. californiense. The placement of A. gracile as one of the early diverging lineages was the only remaining similarity between our multi-locus tree and the morphological hypothesis; no other species relationship contained in the morphological estimate was found in the sequence-based phylogenies.

In contrast, our multi-locus phylogenetic results have many more similarities to previous phylogenetic hypotheses generated by allozyme data and cytb mtDNA data. This includes the placement of A. gracile, A. talpoideum, and A. maculatum as early diverging lineages in the tree (although our study provides robust support for a novel placement of the A. gracile–A. talpoideum clade as the sister group to all remaining ambystomatids), and the placement of A. mabeei as the sister lineage to the clade of A. barbouri and A. texanum.

However, despite the analysis of a rather large amount of DNA sequence data in this study, we are unable to say anything conclusive about the evolution of the Linguaelapsus clade and the nature of the "clash" between previously analyzed allozymic and morphological data. Similar to the allozyme-based results, our multi-locus species tree includes two components of the Linguaelapsus clade as strongly supported groups (Fig. 2): (1) A. barbouri, A. mabeei, and A. texanum (clade F), and (2) A. annulatum, A. bishopi, and A. cingulatum (clade G) [A. bishopi and A. cingulatum were only recently designated as two species (Pauly et al., 2007)]. Furthermore, none of our coalescent analyses support grouping all six species as a clade. Concordance analysis of nuclear loci alone did produce a primary concordance tree that contained the Linguaelapsus clade; yet, this received a very low concordance factor (CF = 1.4) with a 95% credibility interval indicating it may be supported by only a single locus. Indeed, only one of our 15 loci produced a gene tree that resolved the Linguaelapsus clade (Fig. S1). Overall, however, none of our phylogenetic results strongly rejected the Linguaelapsus clade. Instead, clades F and G were intermingled with two other clades (clades E and H). Each of these was individually strongly supported, but there was at best weak support for relationships among clades.

Our results do seem to suggest that, if the Linguaelapsus clade does indeed reflect a true evolutionary grouping within the Ambystomatidae, its ancestral branch likely evolved during a very rapid period of diversification, and the multiple morphological synapomorphies that support its resolution in morphological-based phylogenetic trees evolved over a very short duration of time. We make this conclusion based upon our above discussion of the potential for a hard polytomy or a rapid radiation across intermediate portions of the ambystomatid tree. While our work here provides substantial insights into the phylogenetic history of the Ambystomatidae, further resolution of the previously identified "clash" between morphological and molecular data sets will require the generation and analysis of additional data.

Acknowledgments

We thank Jim Demastes, Paul Moler, Greg Pauly, Brad Shaffer, the Louisiana State University Museum of Zoology, and the Museum of Vertebrate Zoology at the University of California, Berkeley for their contribution of tissue samples.

References

Alfaro, M.E., Holder, M.T., 2006. The posterior and the prior in Bayesian phylogenetics. Annu. Rev. Ecol. Evol. Syst. 37, 19–42.
Ane, C., Larget, B., Baum, D.A., Smith, S.D., Rokas, A., 2007. Bayesian estimation of concordance among gene trees. Mol. Biol. Evol. 24, 412–426.
Bi, K., Bogart, J.P., 2010. Time and time again: unisexual salamanders (genus Ambystoma) are the oldest unisexual vertebrates. BMC Evol. Biol. 10, 238.
Bogart, J.P., 2003. Genetics and systematics of hybrid species. In: Sever, D.M. (Ed.), Reproductive Biology and Phylogeny of Urodela (Amphibia). Science Publishers, Inc., Enfield, New Hampshire, pp. 109–134.
Camargo, A., Avila, L.J., Morando, M., Sites, J.W., 2012. Accuracy and precision of species trees: effects of locus, individual, and base pair sampling on inference of species trees in lizards of the Liolaemus darwinii Group (Squamata, Liolaemidae). Syst. Biol. 61, 272–288.
Cronn, R., Cedroni, M., Haselkorn, T., Grover, C., Wendel, J.F., 2002. PCR-mediated recombination in amplification products derived from polyploid cotton. Theor. Appl. Genet. 104, 482–489.
Drummond, A.J., Rambaut, A., 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214.
Drummond, A.J., Ho, S.Y., Phillips, M.J., Rambaut, A., 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, e88.
Drummond, A.J.A.B., Buxton, S., Cheung, M., Cooper, A., Duran, C., Field, M., Heled, J., Kearse, M., Markowitz, S., Moir, R., Stones-Havas, S., Sturrock, S., Thierer, T., Wilson, A., 2010. Geneious v5.3.
Drummond, A.J., Suchard, M.A., Xie, D., Rambaut, A., 2012. Bayesian Phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973.
Edwards, S., Liu, L., Pearl, D., 2007. High-resolution species trees without concatenation. Proc. Natl. Acad. Sci. USA 104, 5936–5941.
Felsenstein, J., 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27, 401–410.
Felsenstein, J., 2004. PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the Author. Department of Genome Sciences, University of Washington, Seattle.
Fisher-Reid, M.C., Wiens, J.J., 2011. What are the consequences of combining nuclear and mitochondrial data for phylogenetic analysis? Lessons from Plethodon salamanders and 13 other vertebrate clades. BMC Evol. Biol. 11, 300.
Frost, D.R., Grant, T., Faivovich, J., Bain, R.H., Haas, A., Haddad, C.F.B., De Sa, R.O., Channing, A., Wilkinson, M., Donnellan, S.C., Raxworthy, C.J., Campbell, J.A., Blotto, B.L., Moler, P., Drewes, R.C., Nussbaum, R.A., Lynch, J.D., Green, D.M., Wheeler, W.C., 2006. The tree of life. Bull. Am. Mus. Nat. Hist. 297, 1–370.
Guindon, S., Gascuel, O., 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704.
Heled, J., Drummond, A.J., 2010. Bayesian inference of species trees from multilocus data. Mol. Biol. Evol. 27, 570–580.
Hillis, D.M., 1987. Molecular versus morphological approaches to systematics. Annu. Rev. Ecol. Syst. 18, 23–42.
Hillis, D.M., Heath, T.A., St John, K., 2005. Analysis and visualization of tree space. Syst. Biol. 54, 471–482.
Holman, J.A., 2006. Fossil Salamanders of North America. University Press.
Huang, H.T., He, Q.I., Kubatko, L.S., Knowles, L.L., 2010. Sources of error inherent in species-tree estimation: impact of mutational and coalescent effects on accuracy and implications for choosing among different methods. Syst. Biol. 59, 573–583.
Jones, T.R., Kluge, A.G., Wolf, A.J., 1993. When theories and methodologies clash – a
We also thank Eric O’Neill, phylogenetic reanalysis of the North-American ambystomatid salamanders (Caudata, Ambystomatidae). Syst. Biol. 42, 92–101. Stephanie Mitchell, Alex Noble, and Ana Mendia for invaluable lab- Kraus, F., 1988. An empirical-evaluation of the use of the ontogeny polarization oratory assistance. Randal Voss and Paul Hime provided comments criterion in phylogenetic inference. Syst. Zool. 37, 106–141. that improved this manuscript. We also thank the University of Kubatko, L.S., Degnan, J.H., 2007. Inconsistency of phylogenetic estimates from Kentucky Information Technology Department and Center for concatenated data under coalescence. Syst. Biol. 56, 17–24. Kubatko, L.S., Carstens, B.C., Knowles, L.L., 2009. STEM: species tree estimation using Computational Sciences for computing time on the Lipscomb High maximum likelihood for gene trees under coalescence. Bioinformatics 25, 971– Performance Computing Cluster and for access to other supercom- 973. puting resources, and we also thank the Yale University Biomedical Larget, B.R., Kotha, S.K., Dewey, C.N., Ane, C., 2010. BUCKy: gene tree/species tree reconciliation with Bayesian concordance analysis. Bioinformatics 26, 2910– High Performance Computing Center for allowing us computing 2911. time on the Louise High Performance Computing Cluster (NIH Larson, A., 1991. A molecular perspective on the evolutionary relationships of the Grants RR19895 and RR029676-01). Funding and support for this salamander families. Evol. Biol. 25, 211–277. Leache, A.D., Rannala, B., 2011. The accuracy of species tree estimation under work was provided from the University of Kentucky, a Common- simulation: a comparison of methods. Syst. Biol. 60, 126–137. wealth of Kentucky NSF EPSCoR Grant (# 0814194) in support of Maddison, W.P., 1997. Gene trees in species trees. Syst. Biol. 46, 523–536. Ecological Genomics training and research, NSF Grant Maddison, W.P., Knowles, L.L., 2006. 
Inferring phylogeny despite incomplete lineage sorting. Syst. Biol. 55, 21–30. DEB0949532 (to DWW), and a Society of Systematic Biologists Maddison, W.P., Maddison, D.R., 2004. Mesquite: A Modular System for Graduate Student Award (to JSW). Evolutionary Analysis. Version 1.05. . McCormack, J.E., Huang, H., Knowles, L.L., 2009. Maximum likelihood estimates of species trees: how accuracy of phylogenetic inference depends upon the Appendix A. Supplementary material divergence history and sampling design. Syst. Biol. 58, 501–508. O’Neill, E.M., Schwartz, R., Bullock, C.T., Williams, J.S., Shaffer, H.B., Aguilar-Miguel, X., Parra-Olea, G., Weisrock, D.W., 2013. Parallel tagged amplicon sequencing Supplementary data associated with this article can be found, in reveals major lineages and phylogenetic structure in the North American the online version, at http://dx.doi.org/10.1016/j.ympev.2013.04. tiger salamander (Ambystoma tigrinum) species complex. Mol. Ecol. 22, 013. 111–129. 682 J.S. Williams et al. / Molecular Phylogenetics and Evolution 68 (2013) 671–682

Pauly, G.B., Piskurek, O., Shaffer, H.B., 2007. Phylogeographic concordance in the southeastern : the flatwoods salamander, Ambystoma cingulatum, as a test case. Mol. Ecol. 16, 415–429.
Posada, D., 2008. JModelTest: phylogenetic model averaging. Mol. Biol. Evol. 25, 1253–1256.
Putta, S., Smith, J.J., Walker, J.A., Rondet, M., Weisrock, D.W., Monaghan, J., Samuels, A.K., Kump, K., King, D.C., Maness, N.J., Habermann, B., Tanaka, E., Bryant, S.V., Gardiner, D.M., Parichy, D.M., Voss, S.R., 2004. From biomedicine to natural history research: EST resources for ambystomatid salamanders. BMC Genom. 5, 54.
Rambaut, A., Drummond, A.J., 2009. Tracer v1.5.
Robertson, A.V., Ramsden, C., Niedzwiecki, J., Fu, J.Z., Bogart, J.P., 2006. An unexpected recent ancestor of unisexual Ambystoma. Mol. Ecol. 15, 3339–3351.
Roelants, K., Gower, D.J., Wilkinson, M., Loader, S.P., Biju, S.D., Guillaume, K., Moriau, L., Bossuyt, F., 2007. Global patterns of diversification in the history of modern . Proc. Natl. Acad. Sci. USA 104, 887–892.
Rokas, A., Carroll, S.B., 2005. More genes or more taxa? The relative contribution of gene number and taxon number to phylogenetic accuracy. Mol. Biol. Evol. 22, 1337–1344.
Ronquist, F., Huelsenbeck, J.P., 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574.
Shaffer, H.B., 1993. Phylogenetics of model organisms – the laboratory axolotl, Ambystoma mexicanum. Syst. Biol. 42, 508–522.
Shaffer, H.B., McKnight, M.L., 1996. The polytypic species revisited: genetic differentiation and molecular phylogenetics of the tiger salamander Ambystoma tigrinum (Amphibia: Caudata) complex. Evolution 50, 417–433.
Shaffer, H.B., Clark, J.M., Kraus, F., 1991. When molecules and morphology clash – a phylogenetic analysis of the North-American ambystomatid salamanders (Caudata, Ambystomatidae). Syst. Zool. 40, 284–303.
Smith, J.J., Kump, D.K., Walker, J.A., Parichy, D.M., Voss, S.R., 2005. A comprehensive expressed sequence tag linkage map for tiger salamander and Mexican axolotl: enabling gene mapping and comparative genomics in Ambystoma. Genetics 171, 1161–1171.
Stanley, E.L., Bauer, A.M., Jackman, T.R., Branch, W.R., Mouton, P.L.F.N., 2011. Between a rock and a hard polytomy: rapid radiation in the rupicolous girdled lizards (Squamata: Cordylidae). Mol. Phylogenet. Evol. 58, 53–70.
Weisrock, D.W., Macey, J.R., Ugurtas, I.H., Larson, A., Papenfuss, T.J., 2001. Molecular salamander clade: rapid branching of numerous highly divergent lineages in Mertensiella luschani associated with the rise of Anatolia. Mol. Phylogenet. Evol. 18, 434–448.
Weisrock, D.W., Harmon, L.J., Larson, A., 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic data. Syst. Biol. 54, 758–777.
Weisrock, D.W., Smith, S.D., Chan, L.M., Biebouw, K., Kappeler, P.M., Yoder, A.D., 2012. Concatenation and concordance in the reconstruction of mouse lemur phylogeny: an empirical demonstration of the effect of allele sampling in phylogenetics. Mol. Biol. Evol. 29, 1615–1630.
Wiens, J.J., Hollingsworth, B.D., 2000. War of the iguanas: conflicting molecular and morphological phylogenies and long-branch attraction in iguanid lizards. Syst. Biol. 49, 143–159.
Wiens, J.J., Bonett, R.M., Chippindale, P.T., 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst. Biol. 54, 91–110.
Zylstra, P., Rothenfluh, H.S., Weiller, G.F., Blanden, R.V., Steele, E.J., 1998. PCR amplification of murine immunoglobulin germline V genes: strategies for minimization of recombination artefacts. Immunol. Cell Biol. 76, 395–405.