Analysis of Phylogenomic Tree Space Resolves Relationships Among Marsupial Families
Total Page:16
File Type:pdf, Size:1020Kb
Syst. Biol. 67(3):400–412, 2018 © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: [email protected] DOI:10.1093/sysbio/syx076 Advance Access publication September 22, 2017 Analysis of Phylogenomic Tree Space Resolves Relationships Among Marsupial Families ,∗ , , DAV I D A. DUCHÊNE1 ,JASON G. BRAGG2 3,SEBASTIÁN DUCHÊNE4,LINDA E. NEAVES5,SALLY POTTER2 5,CRAIG MORITZ2, REBECCA N. JOHNSON5,SIMON Y. W. H O1, AND MARK D. B. ELDRIDGE5 1School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia; 2Research School of Biology, Australian National University, Canberra, ACT 2601, Australia; 3National Herbarium of NSW, The Royal Botanic Gardens and Domain Trust, Sydney, NSW 2000, Australia; 4Centre for Systems Genomics, The University of Melbourne, Melbourne, VIC 3010, Australia; and 5Australian Museum Research Institute, Australian Museum, 1 William Street, Sydney, NSW 2000, Australia ∗ Correspondence to be sent to: School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia; E-mail: [email protected] Downloaded from https://academic.oup.com/sysbio/article-abstract/67/3/400/4175806 by University of Idaho user on 05 September 2018 Simon Y. W. Ho and Mark D. B. Eldridge contributed equally to this work. Received 29 July 2017; reviews returned 8 September 2017; accepted 8 September 2017 Associate Editor: Matthew Hahn Abstract.—A fundamental challenge in resolving evolutionary relationships across the tree of life is to account for heterogeneity in the evolutionary signal across loci. Studies of marsupial mammals have demonstrated that this heterogeneity can be substantial, leaving considerable uncertainty in the evolutionary timescale and relationships within the group. Using simulations and a new phylogenomic data set comprising nucleotide sequences of 1550 loci from 18 of the 22 extant marsupial families, we demonstrate the power of a method for identifying clusters of loci that support different phylogenetic trees. We find two distinct clusters of loci, each providing an estimate of the species tree that matches previously proposed resolutions of the marsupial phylogeny. We also identify a well-supported placement for the enigmatic marsupial moles (Notoryctes) that contradicts previous molecular estimates but is consistent with morphological evidence. The pattern of gene-tree variation across tree-space is characterized by changes in information content, GC content, substitution-model adequacy, and signatures of purifying selection in the data. In a simulation study, we show that incomplete lineage sorting can explain the division of loci into the two tree-topology clusters, as found in our phylogenomic analysis of marsupials. We also demonstrate the potential benefits of minimizing uncertainty from phylogenetic conflict for molecular dating. Our analyses reveal that Australasian marsupials appeared in the early Paleocene, whereas the diversification of present-day families occurred primarily during the late Eocene and early Oligocene. Our methods provide an intuitive framework for improving the accuracy and precision of phylogenetic inference and molecular dating using genome-scale data. [Mammals; marsupials; multispecies coalescent; phylogenomics; tree space.] Genome-scale data have helped to resolve some Some molecular systematic studies identify the clade stubborn phylogenetic problems across the tree of Eomarsupialia, which excludes all American marsupials life, including the deep relationships among placental (Archer and Hand 1984). But Australasian marsupials mammals (Meredith et al. 2011; Song et al. 2012; have also been reported to be paraphyletic (Cardillo Tarver et al. 2016). Nevertheless, heterogeneity in the et al. 2004; Bininda-Emonds et al. 2007). Many other phylogenetic signal across loci has led to persistent deep divergences in the marsupial phylogeny have been uncertainty in estimates of the evolutionary timescale resolved inconsistently across studies. One example is and relationships of marsupials, the sister group of that of the possums, which are generally acknowledged placental mammals (Phillips et al. 2006; Mitchell et al. to comprise two separate superfamilies: Phalangeroidea 2014). Heterogeneous signals across loci can be the (brushtail possums, cuscuses, and pygmy possums) and product of rapid diversification (Degnan et al. 2006), Petauroidea (gliders and ringtail possums). However, low data quality, or poor substitution-model fit (Gatesy studies have disagreed on whether the closer relative and Springer 2013; Arcila et al. 2017). However, there are of Macropodiformes (bettongs, kangaroos, potoroos, few available methods for characterizing and evaluating wallabies, and allies) is Phalangeroidea (Szalay 1994; the phylogenetic signal across loci in genome-scale data. Meredith et al. 2008, 2009; Phillips and Pratt 2008)or This methodological gap undermines the reliability of Petauroidea (Meredith et al. 2011; Mitchell et al. 2014). phylogenetic inference and our understanding of the The marsupial moles (Notoryctes; family Notoryctidae) causes of incongruent signals across the genome (Gatesy and the American monito del monte (Dromiciops, family and Springer 2014; Springer and Gatesy 2015). One Microbiotheriidae) have also proven to be difficult common approach is to evaluate the strength of support cases, being placed in various positions using different for specific phylogenetic hypotheses across a given set of molecular data sets (Szalay 1994; Kirsch et al. 1997; loci (Song et al. 2012; Linkem et al. 2016; Arcila et al. 2017). Springer et al. 1998; Horovitz and Sanchez-Villagra 2003; But as the numbers of taxa and loci in phylogenetic data Asher et al. 2004; Beck 2008; Meredith et al. 2008; Beck sets increase, so does the need for intuitive methods for et al. 2016). exploring the information content across the data. To describe the patterns of diversification and lineage Australasian marsupials are a group of about 334 sorting across marsupial taxa, we aimed to visualize species found in Australia and parts of South-East the trees supported by different loci across the genome Asia. They underwent rapid diversification, which as points in phylogenetic tree space (Hillis et al. 2005; has complicated efforts to resolve their relationships. Matsen 2006; Höhna and Drummond 2012; Gori et al. 400 [17:05 14/4/2018 Sysbio-OP-SYSB170078.tex] Page: 400 400–413 2018 DUCHÊNE ET AL.—MARSUPIAL PHYLOGENOMICS 401 2016; Huang et al. 2016). In this space, trees inferred clusters independently for estimating the divergence from the most informative loci might be placed close times among marsupial families and find that a minor to each other and potentially to the species tree. In form of bias in date estimates can arise from this type of contrast, gene trees that differ markedly from the species data selection. Using simulations of gene-tree evolution, tree might be placed in their own, separate clusters. we find that incomplete lineage sorting along the branch In practice, however, the distances between gene trees leading to the Macropodiformes, and its sister group is can be distorted when visualized in Euclidean space sufficient to explain the clustering patterns seen in the (Hillis et al. 2005; Matsen 2006; Höhna and Drummond data. 2012; Gori et al. 2016; Huang et al. 2016). Owing to this distortion and the unknown drivers of gene-tree clustering, the potential for such visualization methods MATERIALS AND METHODS to serve as objective tools for describing genomic Downloaded from https://academic.oup.com/sysbio/article-abstract/67/3/400/4175806 by University of Idaho user on 05 September 2018 information is unclear (Stockham et al. 2002; Nye 2008; Sample Collection, Targeted Nuclear Enrichment, and Darlu and Guénoche 2011). If such methods are effective, Bioinformatics gene trees in separate clusters can be compared for We sampled 45 species of marsupials, representing 18 the strength of their statistical support and for any of the 22 extant families. Our data set did not include the biochemical or other characteristics that might be shared South American families Caenolestidae, Caluromydiae, by the corresponding loci. Loci that carry strong and and Glironiidae, and our samples from the Australian congruent phylogenetic signals can then be investigated family Petauridae failed to yield sequence data of in detail, with the intention of improving the accuracy sufficient quality. Our samples were mostly obtained and precision of phylogenomic inference. from the Australian Museum Research Institute frozen There is also an increasing interest in improving tissue collection (Supplementary Table S1; data available the accuracy and precision of molecular dating using on Dryad http://dx.doi.org/10.5061/dryad.353q5). We genome-scale data (Tong et al. 2017). This presents used a custom in-solution capture approach to target a number of challenges, including accounting for loci in marsupials. The approach we used to identify heterogeneous evolutionary signals across loci (Ho 2014; target exons is outlined by Bragg et al. (2016a), and a Angelis et al. 2017; Foster and Ho 2017). Many methods criterion was imposed such that targets were greater for molecular dating assume that the underlying tree than 220 bp and could be identified in at least topology is known, which can lead to substantial six out of eight previously sequenced genomes and amounts of error