Improved Transcriptome Sampling Pinpoints 26 Ancient and More Recent Polyploidy Events in Caryophyllales, Including Two Allopolyploidy Events
Total Page:16
File Type:pdf, Size:1020Kb
Research Improved transcriptome sampling pinpoints 26 ancient and more recent polyploidy events in Caryophyllales, including two allopolyploidy events Ya Yang1,4, Michael J. Moore2, Samuel F. Brockington3, Jessica Mikenas2, Julia Olivieri2, Joseph F. Walker1 and Stephen A. Smith1 1Department of Ecology & Evolutionary Biology, University of Michigan, 830 North University Avenue, Ann Arbor, MI 48109-1048, USA; 2Department of Biology, Oberlin College, 119 Woodland St, Oberlin, OH 44074-1097, USA; 3Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA, UK; 4Present address: Department of Plant and Microbial Biology, University of Minnesota, Twin Cities, 1445 Gortner Avenue, St Paul, MN 55108, USA Summary Authors for correspondence: Studies of the macroevolutionary legacy of polyploidy are limited by an incomplete sam- Ya Yang pling of these events across the tree of life. To better locate and understand these events, we Tel: +1 612 625 6292 need comprehensive taxonomic sampling as well as homology inference methods that accu- Email: [email protected] rately reconstruct the frequency and location of gene duplications. Stephen A. Smith We assembled a data set of transcriptomes and genomes from 168 species in Caryophyl- Tel: +1 734 764 7923 lales, of which 43 transcriptomes were newly generated for this study, representing one of Email: [email protected] the most densely sampled genomic-scale data sets available. We carried out phylogenomic Received: 30 May 2017 analyses using a modified phylome strategy to reconstruct the species tree. We mapped the Accepted: 9 August 2017 phylogenetic distribution of polyploidy events by both tree-based and distance-based meth- ods, and explicitly tested scenarios for allopolyploidy. New Phytologist (2018) 217: 855–870 We identified 26 ancient and more recent polyploidy events distributed throughout doi: 10.1111/nph.14812 Caryophyllales. Two of these events were inferred to be allopolyploidy. Through dense phylogenomic sampling, we show the propensity of polyploidy throughout the evolutionary history of Caryophyllales. We also provide a framework for utilizing tran- Key words: allopolyploidy, Caryophyllales, scriptome data to detect allopolyploidy, which is important as it may have different macroevo- genome duplication, Ks plot, modified phylome, polyploidy. lutionary implications compared with autopolyploidy. Introduction polyploidy events, both are indirect methods that have insuffi- cient resolution and can be misleading (Kellogg, 2016). Ks plots The prevalence and evolutionary consequences of polyploidy in between syntenic blocks from individual sequenced genomes plants have been discussed at length in the field of macroevolu- have the advantage of being sensitive enough to detect ancient tion (Soltis et al., 2015; Lohaus & Van de Peer, 2016). Poly- and nested polyploidy events (Jaillon et al., 2007; Jiao et al., ploidy has been correlated with acceleration of speciation (Tank 2011, 2012, 2014). However, this technique suffers from the et al., 2015; Smith et al., 2017), surviving mass extinction typically sparse taxon sampling available in whole-genome data. (Fawcett et al., 2009; Vanneste et al., 2014a), evolutionary inno- Distribution of polyploidy events inferred using Ks plots from vations (Vanneste et al., 2014b; Edger et al., 2015), and niche genomic data, whether or not taking synteny into consideration shift (Smith et al., 2017). While there is little disagreement about (Fawcett et al., 2009; Vanneste et al., 2014a), awaits re- the importance of polyploidy in angiosperm evolution, the fre- examination with more comprehensive taxon sampling. An alter- quency and phylogenetic locations of these events often remain native to Ks plots is the detection of polyploidy from chromo- unclear. Several limitations in methodology and sampling have some counts. This method has the best signal for recent events limited our ability to accurately locate polyploidy events. and is most often restricted to the genus level or below (Wood Until recently, most studies of polyploidy have employed et al., 2009; Mayrose et al., 2010, 2011). either dating synonymous distances (Ks) among paralogous gene Recent advances in transcriptome and genome sequencing pairs (Vanneste et al., 2013) or ancestral character reconstruction offer the ability not only to measure Ks distances but also to use of chromosome counts (Mayrose et al., 2010; Glick & Mayrose, gene tree topology to validate these. A combination of both 2014). While these have facilitated the discovery of many approaches has allowed for the identification and placement of Ó 2017 The Authors New Phytologist (2018) 217: 855–870 855 New Phytologist Ó 2017 New Phytologist Trust www.newphytologist.com New 856 Research Phytologist polyploidy events across the tree of life (Cannon et al., 2015; exploration of the macroevolutionary consequences of polyploidy Edger et al., 2015; Li et al., 2015; Marcet-Houben & Gabaldon, (for example, Smith et al., 2017). 2015; Yang et al., 2015; Huang et al., 2016; Xiang et al., 2016). Despite this rapid increase in the number and precision of Materials and Methods mapped polyploidy events, the sampling strategy for many of these studies was aimed at resolving deeper phylogenetic relation- Taxon sampling, laboratory procedure, and sequence ships. Testing hypotheses regarding the rich macroevolutionary processing legacy of polyploidy requires more extensive sampling of genomes and transcriptomes within a major plant clade. To date, We included 178 ingroup data sets (175 transcriptomes and three only a few such data sets with sufficient sampling are available genomes; Supporting Information Table S1) representing 168 (Huang et al., 2016; Xiang et al., 2016). Furthermore, with a few species in 27 out of the 39 Caryophyllales families (Byng et al., exceptions (Kane et al., 2009; Lai et al., 2012; Estep et al., 2014; 2016; Thulin et al., 2016). Among these, 43 transcriptomes were Hodgins et al., 2014), most of these studies have assumed newly generated for this study (Table S2). In addition, 40 out- autopolyploidy and have not explicitly tested for allopolyploidy. group genomes across angiosperms were used for rooting gene Despite the rich body of literature on gene expression, transposon trees (Table S1). Tissue collection, RNA isolation, library prepa- dynamics, formation of novel phenotypes, and gene silencing ration, sequencing, assembly, and translation followed previously and loss in recently formed allopolyploids (reviewed by Soltis & published protocols (Brockington et al., 2015; Yang et al., 2017) Soltis, 2016; Steige & Slotte, 2016), the long-term effects of with minor modifications (Tables S1, S2). allopolyploidy events remain poorly understood. The plant order Caryophyllales offers an excellent opportu- Caryophyllales homology and orthology inference from nity to explore phylogenomic processes in plants. Caryophyl- peptide sequences using a ‘modified phylome’ strategy lales forms a well-supported clade of c. 12 500 species distributed among 39 families (Byng et al., 2016; Thulin et al., We employed a modified phylome strategy for reconstructing 2016), with an estimated crown age of c.67–121 Myr (Bell orthogroups (Fig. S1). An ‘orthogroup’ includes the complete set et al., 2010; Moore et al., 2010; Smith et al., 2017). Species of of genes in a lineage from a single copy in their common ances- the Caryophyllales are found on every continent including tor. Each node in an orthogroup tree can represent either a speci- Antarctica and in all terrestrial ecosystems as well as aquatic ation event or a gene duplication event. An orthogroup differs systems, occupying some of the most extreme environments on from a homolog group in that the former is inferred from the lat- earth, including the coldest, hottest, driest, and most saline ter by extracting rooted ingroup lineages separated by outgroups. habitats inhabited by vascular plants. Familiar members of the The modified phylome procedure consisted of two major steps. group include cacti, living stones, a diverse array of carnivorous First, ‘backbone homolog groups’ were constructed using peptide plants (e.g. the sundews, Venus flytrap, and tropical pitcher sequences from three Caryophyllales and 40 outgroup genomes. plants), and several important crop plants (e.g. beet, spinach, Second, peptides from transcriptomes were sorted to each back- amaranth, and quinoa). Such extraordinary diversity makes bone homolog. This two-step procedure allowed us to avoid the Caryophyllales a prime system for investigating polyploidy vs computationally intensive all-by-all homology search for diversification rate, character evolution, and niche shifts. Previ- constructing orthogroups. ous analyses using transcriptomes representing 67 species across To construct the backbone homolog groups, we started from Caryophyllales located 13 polyploidy events (Yang et al., the proteome of the best annotated genome, sugar beet (Beta 2015). By generating 43 new transcriptomes, we have vulgaris; Fig. S1; Dohm et al., 2014; http://bvseq.molgen.mpg.de/ expanded the previous sampling to include lineages with key v.1.2, accessed 25 June 2015). Sequences from each beet locus evolutionary transitions, across a data set that now includes were used to search against a database that consisted of combined 168 species of Caryophyllales. proteomes from all 43 genomes using SWIPE v.2.0.11 (Rognes, The size of this data set makes an all-by-all homology search 2011) with an E-value cutoff of 0.01. The top 100