Comparing Species Tree Estimation with Large Anchored

Ruane et al. BMC Evolutionary Biology
DOI 10.1186/s12862-015-0503-1

RESEARCH ARTICLE Open Access

Comparing species tree estimation with large anchored phylogenomic and small Sanger-sequenced molecular datasets: an empirical study on Malagasy pseudoxyrhophiine snakes

Sara Ruane1*, Christopher J. Raxworthy1, Alan R. Lemmon2, Emily Moriarty Lemmon2 and Frank T. Burbrink1,3

Abstract

Background: Using molecular data generated by high throughput next generation sequencing (NGS) platforms to infer phylogeny is becoming common as costs go down and the ability to capture loci from across the genome goes up. While there is a general consensus that greater numbers of independent loci should result in more robust phylogenetic estimates, few studies have compared phylogenies resulting from smaller datasets for commonly used genetic markers with the large datasets captured using NGS. Here, we determine how a 5-locus Sanger dataset compares with a 377-locus anchored genomics dataset for understanding the evolutionary history of the pseudoxyrhophiine snake radiation centered in Madagascar. The Pseudoxyrhophiinae comprise ~86 % of Madagascar’s serpent diversity, yet they are poorly known with respect to ecology, behavior, and systematics. Using the 377-locus NGS dataset and the summary statistics species-tree methods STAR and MP-EST, we estimated a well-supported species tree that provides new insights concerning intergeneric relationships for the pseudoxyrhophiines.

Results: The African pseudoxyrhophiine Duberria is the sister taxon to the Malagasy pseudoxyrhophiine genera, providing evidence for a monophyletic radiation in Madagascar. In addition, within Madagascar, the two major clades inferred correspond largely to the aglyphous and opisthoglyphous genera, suggesting that feeding specializations associated with tooth venom delivery may have played a major role in the early diversification of this radiation. We compared tree topologies from concatenated and species-tree methods using different datasets (subsets of the NGS dataset and the Sanger dataset), including a *BEAST species-tree analysis, to determine how each method performs when using varying numbers of loci. We also examined how different numbers of loci affect bootstrap support values.

Conclusions: Our results suggest that ≥50 loci may be necessary to confidently infer phylogenies when using summary species-tree methods, but that the coalescent-based method *BEAST consistently recovers the same topology using only 15 loci. These results reinforce that datasets with small numbers of markers may result in misleading topologies, and further, that the method of inference used to generate a phylogeny also has a major influence on the number of loci necessary to infer robust species trees.

Keywords: Madagascar, Anchored phylogenomics, Lamprophiidae, Next-generation sequencing

* Correspondence: [email protected]; [email protected]
1Department of Herpetology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
Full list of author information is available at the end of the article

© 2015 Ruane et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Phylogenetic studies are undergoing a massive jump in the scale of the molecular datasets used to estimate phylogenies, due to the ease of collecting hundreds of loci sampled throughout the genomes of non-model taxa (e.g., [1, 2]). Increasing the number of loci is expected to have a positive effect on phylogenetic estimation [3]. Both simulation and empirical studies [4–8] demonstrate strong correlation between the number of independent loci and phylogenetic accuracy, though the exact number of markers required to resolve relationships varies depending on the informativeness of the markers, the method of inference used, the number of taxa, and the time-scale being examined. Typically though, the ability to generate hundreds or thousands of loci that provide the appropriate amount of variation at differing time scales has been a challenge, especially across many non-model taxa. Recently, two different next-generation sequencing (NGS) protocols have produced large datasets composed of generally longer loci (in contrast to shorter length reads from restriction site-associated markers) useful for estimating phylogenies at varying temporal scales: the ultra-conserved element procedure of Faircloth et al. [9] and the anchored phylogenomics approach of Lemmon et al. [1]. These methods differ in the specific regions targeted and the numbers of loci produced, yet both produce orthologous markers across multiple, non-model taxa with substantial genetic variation for inferring phylogenies at both shallow and deep-time scales [1, 2, 7, 9–16].

While generating DNA datasets that cover the genome has become easier, estimating species-tree phylogenies with these data remains problematic [17]. Multi-species coalescent methods that jointly estimate gene trees and species trees, such as *BEAST [18], have proved robust for species-tree estimation. Although these full-coalescent methods can be highly accurate when using relatively few loci (e.g., [7]), they may not be suitable for the large numbers of loci produced using NGS techniques due to computational time and a lack of convergence as the number of taxa or loci increases [14, 19].

Alternatively, methods that use summarized information from user-provided gene trees to quickly estimate species trees, such as MP-EST [20] and STAR [21], are promising for analyzing NGS datasets. These methods can accommodate many taxa and loci and have the statistically desirable properties of being accurate when used with large numbers of loci and low levels of missing data [19, 20]. However, species-tree methods that depend on summarized gene-tree uncertainty may suffer when markers are short and uninformative [22, 23] or when incomplete-lineage sorting is not the main cause of gene-tree discordance [24–26]. While the concern of uninformative markers can be circumvented by using high quality datasets (i.e., longer loci with more informative sites), gene-tree discordance due to mutational variance or migration between species is a problem that still exists when estimating species trees even under a full-coalescent model such as *BEAST [24, 26, 27]. Despite these potential challenges, summary statistics approaches remain a viable option for NGS dataset analyses and several recent empirical studies with these types of data have used MP-EST and STAR to estimate well-resolved species trees that confirm previously hypothesized taxonomy as well as discover novel relationships (e.g., [13, 15, 28]).

Here, we use an anchored phylogenomics dataset to construct a generic-level species tree for the Malagasy pseudoxyrhophiines and simultaneously explore how different datasets, with respect to locus number, influence phylogenetic inference. The subfamily Pseudoxyrhophiinae is part of the family Lamprophiidae, a mainly African radiation of snakes [29–31]. Pseudoxyrhophiines are among the most poorly studied of Colubroids, with little known with respect to ecology and reproduction (e.g., [32–34]), as well as basic morphology (e.g., hemipenial structure; [35]). This is unfortunate since pseudoxyrhophiines are unique among the world’s snake fauna as being the only island snake lineage where the majority of diversification takes place in situ on the island rather than through dispersal from the mainland [36], potentially making pseudoxyrhophiines an excellent model system for determining what factors promote ecological and morphological diversification within a closed system.

Of the currently recognized 89 species of pseudoxyrhophiine, 80 are endemic to Madagascar (excepting possible introductions to the Comoros; [37]), with the remaining taxa distributed in mainland Africa (five spp.), the Comoros islands (three spp.), and Socotra (one sp.). Previous studies have indicated the African and Socotran species are the sister lineage(s) to a monophyletic radiation of Malagasy/Comoros taxa [38, 39] but this has not been supported by the most recent phylogenetic estimates for the group [30, 31]. Prior molecular phylogenetic studies have included up to 54 of the recognized species of Pseudoxyrhophiinae for a single gene [40] and no study has used more than 10 loci to determine relationships among the genera, with the majority of taxa having only 1–6 loci available [30]. These studies also used concatenated gene-tree methodologies, rather than species-tree approaches, which are more likely to be misleading when using small numbers of loci [6]. Although under certain circumstances concatenation results in identical topologies when compared to species trees [6, 41, 42], empirical studies have demonstrated that concatenation may overestimate branch lengths, causing inaccuracies in downstream phylogenetic analyses (e.g., [43]).

Using an NGS dataset comprising 377 loci covering 77 % of the genera of Pseudoxyrhophiinae, we first estimate species trees using the full dataset and the
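
The comparison outlined in the abstract, in which subsets of the 377-locus dataset are analysed to see how inference changes with the number of loci, can be illustrated with a minimal sketch of the subsampling step alone. This excerpt does not describe the authors' actual subsampling scheme, so everything below is an assumption made for illustration: the function name `draw_locus_subsets`, the subset sizes (5, 15 and 50, chosen to echo numbers mentioned in the abstract), the three replicates per size, and the use of uniform random sampling without replacement.

```python
import random


def draw_locus_subsets(n_loci, subset_sizes, n_replicates, seed=0):
    """Draw replicate random subsets of locus indices, without replacement.

    Hypothetical helper for illustration only; it is not taken from the
    paper or from any phylogenetics package.
    """
    rng = random.Random(seed)  # fixed seed so the draws are repeatable
    subsets = {}
    for size in subset_sizes:
        if size > n_loci:
            raise ValueError(f"cannot draw {size} loci from {n_loci}")
        # Each replicate is a sorted list of distinct indices in [0, n_loci).
        subsets[size] = [
            sorted(rng.sample(range(n_loci), size))
            for _ in range(n_replicates)
        ]
    return subsets


# 377 is the number of loci in the NGS dataset; the sizes and the
# replicate count are assumptions chosen for this example.
subsets = draw_locus_subsets(377, (5, 15, 50), n_replicates=3)

for size, replicates in subsets.items():
    print(size, "loci:", len(replicates), "replicate subsets")
```

In a workflow of this kind, each list of indices would select the gene trees (or alignments) passed to a species-tree method, and the resulting topologies and bootstrap support values would then be compared across subset sizes.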
