Research

Improved transcriptome sampling pinpoints 26 ancient and more recent events in , including two allopolyploidy events

Ya Yang1,4, Michael J. Moore2, Samuel F. Brockington3, Jessica Mikenas2, Julia Olivieri2, Joseph F. Walker1 and Stephen A. Smith1 1Department of Ecology & Evolutionary Biology, University of Michigan, 830 North University Avenue, Ann Arbor, MI 48109-1048, USA; 2Department of Biology, Oberlin College, 119 Woodland St, Oberlin, OH 44074-1097, USA; 3Department of Sciences, University of Cambridge, Cambridge, CB2 3EA, UK; 4Present address: Department of Plant and Microbial Biology, University of Minnesota, Twin Cities, 1445 Gortner Avenue, St Paul, MN 55108, USA

Summary Authors for correspondence: Studies of the macroevolutionary legacy of polyploidy are limited by an incomplete sam- Ya Yang pling of these events across the tree of life. To better locate and understand these events, we Tel: +1 612 625 6292 need comprehensive taxonomic sampling as well as homology inference methods that accu- Email: [email protected] rately reconstruct the frequency and location of gene duplications. Stephen A. Smith We assembled a data set of transcriptomes and genomes from 168 in Caryophyl- Tel: +1 734 764 7923 lales, of which 43 transcriptomes were newly generated for this study, representing one of Email: [email protected] the most densely sampled genomic-scale data sets available. We carried out phylogenomic Received: 30 May 2017 analyses using a modified phylome strategy to reconstruct the species tree. We mapped the Accepted: 9 August 2017 phylogenetic distribution of polyploidy events by both tree-based and distance-based meth- ods, and explicitly tested scenarios for allopolyploidy. New Phytologist (2018) 217: 855–870 We identified 26 ancient and more recent polyploidy events distributed throughout doi: 10.1111/nph.14812 Caryophyllales. Two of these events were inferred to be allopolyploidy. Through dense phylogenomic sampling, we show the propensity of polyploidy throughout the evolutionary history of Caryophyllales. We also provide a framework for utilizing tran- Key words: allopolyploidy, Caryophyllales, scriptome data to detect allopolyploidy, which is important as it may have different macroevo- genome duplication, Ks plot, modified phylome, polyploidy. lutionary implications compared with autopolyploidy.

Introduction polyploidy events, both are indirect methods that have insuffi- cient resolution and can be misleading (Kellogg, 2016). Ks plots The prevalence and evolutionary consequences of polyploidy in between syntenic blocks from individual sequenced genomes have been discussed at length in the field of macroevolu- have the advantage of being sensitive enough to detect ancient tion (Soltis et al., 2015; Lohaus & Van de Peer, 2016). Poly- and nested polyploidy events (Jaillon et al., 2007; Jiao et al., ploidy has been correlated with acceleration of speciation (Tank 2011, 2012, 2014). However, this technique suffers from the et al., 2015; Smith et al., 2017), surviving mass extinction typically sparse taxon sampling available in whole-genome data. (Fawcett et al., 2009; Vanneste et al., 2014a), evolutionary inno- Distribution of polyploidy events inferred using Ks plots from vations (Vanneste et al., 2014b; Edger et al., 2015), and niche genomic data, whether or not taking synteny into consideration shift (Smith et al., 2017). While there is little disagreement about (Fawcett et al., 2009; Vanneste et al., 2014a), awaits re- the importance of polyploidy in angiosperm evolution, the fre- examination with more comprehensive taxon sampling. An alter- quency and phylogenetic locations of these events often remain native to Ks plots is the detection of polyploidy from chromo- unclear. Several limitations in methodology and sampling have some counts. This method has the best signal for recent events limited our ability to accurately locate polyploidy events. and is most often restricted to the level or below (Wood Until recently, most studies of polyploidy have employed et al., 2009; Mayrose et al., 2010, 2011). either dating synonymous distances (Ks) among paralogous gene Recent advances in transcriptome and genome sequencing pairs (Vanneste et al., 2013) or ancestral character reconstruction offer the ability not only to measure Ks distances but also to use of chromosome counts (Mayrose et al., 2010; Glick & Mayrose, gene tree topology to validate these. A combination of both 2014). While these have facilitated the discovery of many approaches has allowed for the identification and placement of

Ó 2017 The Authors New Phytologist (2018) 217: 855–870 855 New Phytologist Ó 2017 New Phytologist Trust www.newphytologist.com New 856 Research Phytologist

polyploidy events across the tree of life (Cannon et al., 2015; exploration of the macroevolutionary consequences of polyploidy Edger et al., 2015; Li et al., 2015; Marcet-Houben & Gabaldon, (for example, Smith et al., 2017). 2015; Yang et al., 2015; Huang et al., 2016; Xiang et al., 2016). Despite this rapid increase in the number and precision of Materials and Methods mapped polyploidy events, the sampling strategy for many of these studies was aimed at resolving deeper phylogenetic relation- Taxon sampling, laboratory procedure, and sequence ships. Testing hypotheses regarding the rich macroevolutionary processing legacy of polyploidy requires more extensive sampling of genomes and transcriptomes within a major plant clade. To date, We included 178 ingroup data sets (175 transcriptomes and three only a few such data sets with sufficient sampling are available genomes; Supporting Information Table S1) representing 168 (Huang et al., 2016; Xiang et al., 2016). Furthermore, with a few species in 27 out of the 39 Caryophyllales families (Byng et al., exceptions (Kane et al., 2009; Lai et al., 2012; Estep et al., 2014; 2016; Thulin et al., 2016). Among these, 43 transcriptomes were Hodgins et al., 2014), most of these studies have assumed newly generated for this study (Table S2). In addition, 40 out- autopolyploidy and have not explicitly tested for allopolyploidy. group genomes across angiosperms were used for rooting gene Despite the rich body of literature on gene expression, transposon trees (Table S1). Tissue collection, RNA isolation, library prepa- dynamics, formation of novel phenotypes, and gene silencing ration, sequencing, assembly, and translation followed previously and loss in recently formed allopolyploids (reviewed by Soltis & published protocols (Brockington et al., 2015; Yang et al., 2017) Soltis, 2016; Steige & Slotte, 2016), the long-term effects of with minor modifications (Tables S1, S2). allopolyploidy events remain poorly understood. The plant order Caryophyllales offers an excellent opportu- Caryophyllales homology and orthology inference from nity to explore phylogenomic processes in plants. Caryophyl- peptide sequences using a ‘modified phylome’ strategy lales forms a well-supported clade of c. 12 500 species distributed among 39 families (Byng et al., 2016; Thulin et al., We employed a modified phylome strategy for reconstructing 2016), with an estimated crown age of c.67–121 Myr (Bell orthogroups (Fig. S1). An ‘orthogroup’ includes the complete set et al., 2010; Moore et al., 2010; Smith et al., 2017). Species of of genes in a lineage from a single copy in their common ances- the Caryophyllales are found on every continent including tor. Each node in an orthogroup tree can represent either a speci- Antarctica and in all terrestrial ecosystems as well as aquatic ation event or a gene duplication event. An orthogroup differs systems, occupying some of the most extreme environments on from a homolog group in that the former is inferred from the lat- earth, including the coldest, hottest, driest, and most saline ter by extracting rooted ingroup lineages separated by outgroups. habitats inhabited by vascular plants. Familiar members of the The modified phylome procedure consisted of two major steps. group include cacti, living stones, a diverse array of carnivorous First, ‘backbone homolog groups’ were constructed using peptide plants (e.g. the sundews, Venus flytrap, and tropical pitcher sequences from three Caryophyllales and 40 outgroup genomes. plants), and several important crop plants (e.g. beet, spinach, Second, peptides from transcriptomes were sorted to each back- amaranth, and quinoa). Such extraordinary diversity makes bone homolog. This two-step procedure allowed us to avoid the Caryophyllales a prime system for investigating polyploidy vs computationally intensive all-by-all homology search for diversification rate, character evolution, and niche shifts. Previ- constructing orthogroups. ous analyses using transcriptomes representing 67 species across To construct the backbone homolog groups, we started from Caryophyllales located 13 polyploidy events (Yang et al., the proteome of the best annotated genome, sugar beet (Beta 2015). By generating 43 new transcriptomes, we have vulgaris; Fig. S1; Dohm et al., 2014; http://bvseq.molgen.mpg.de/ expanded the previous sampling to include lineages with key v.1.2, accessed 25 June 2015). Sequences from each beet locus evolutionary transitions, across a data set that now includes were used to search against a database that consisted of combined 168 species of Caryophyllales. proteomes from all 43 genomes using SWIPE v.2.0.11 (Rognes, The size of this data set makes an all-by-all homology search 2011) with an E-value cutoff of 0.01. The top 100 hits with bit impractical. Hence, we developed a ‘modified phylome’ strategy scores higher than 50, and bit scores of at least 20% of the self-hit to build homolog and ortholog groups for species tree inference. were retained and aligned using MAFFT v.7.215 (Katoh & Stand- In addition, we used an all-by-all approach to build lineage- ley, 2013), with ‘-genafpair -maxiterate 1000’. The alignments specific homolog gene sets (Yang & Smith, 2014), and took were trimmed using PHYUTILITY v.2.2.6 (Smith & Dunn, 2008) advantage of recent developments in gene tree-based methods for with ‘-clean 0.1’, and trees were constructed using RAXML v.8.1.5 mapping polyploidy events (Cannon et al., 2015; Li et al., 2015; (Stamatakis, 2014) with the model PROTCATWAG. After visual Yang et al., 2015). Our dense sampling allowed us to take chro- inspection of c. 10 resulting trees to evaluate outliers, any terminal mosome counts into consideration, and begin to explore branch that was longer than 2 (absolute cutoff) or longer than 1 allopolyploidy events. These improved methods for tree building and more than 10 times as long as its sister (relative cutoff) was and mapping of gene duplications, along with our improved trimmed. Internal branches longer than 1 were separated (Yang & taxon sampling, enabled the most extensive exploration of poly- Smith, 2014). We retained trees that contained the original beet ploidy yet attempted in a major plan clade. The results reported bait locus, combining those groups that shared beet locus IDs (i.e. here help establish the necessary foundation for further had gene duplication within Caryophyllales).

New Phytologist (2018) 217: 855–870 Ó 2017 The Authors www.newphytologist.com New Phytologist Ó 2017 New Phytologist Trust New Phytologist Research 857

The resulting backbone homolog groups constructed from the et al., 2015), with Stegnosperma halimifolium (Stegnospermataceae) 43 genomes were then used to place the remaining 175 and the three Caryophyllales genomes Beta vulgaris (beet, Caryophyllales transcriptomes (Fig. S1). First, peptide sequences Chenopodiaceae; Dohm et al.,2014),Spinacia oleracea (spinach, from each of the Caryophyllales transcriptomes were reduced Chenopodiaceae; Dohm et al., 2014), and Dianthus caryophyllus using CD-HIT v.4.6 (-c 0.99 -n 5; Fu et al., 2012). The resulting (carnation, ; Yagi et al., 2014) as outgroups; sequences were then used in SWIPE analyses comparing the PORT, the ‘Portullugo clade’ which consists of Mollugi- sequences to the beet proteome to identify matching backbone naceae + Portulacineae (Edwards & Ogburn, 2012) with the three homolog groups. A new tree representing each expanded Caryophyllales genomes as outgroups; AMAR, Amaran- homolog group, with both genome and transcriptome data, was thaceae + Chenopodiaceae, with carnation and Phaulothamnus estimated using the same alignment and phylogenetic reconstruc- spinescens () as outgroups; CARY, Caryophyl- tion settings as for the backbone homolog tree. To reduce iso- laceae, with spinach, beet and Phaulothamnus spinescens (Achato- forms in transcriptome data sets, monophyletic and paraphyletic carpaceae) as outgroups; and NCORE, the clade that is sister to tips that belonged to the same taxon were removed, leaving only the rest of Caryophyllales, with all three Caryophyllales genomes the tip with the highest number of characters in the trimmed plus Microtea debilis, Physena madagascariensis and Simmondsia alignment (Yang & Smith, 2014). Spurious tips and long internal chinensis as outgroups. branches were cut using the same settings as for the backbone An all-by-all approach was used for homology inference in tree. For homolog groups with > 1000 and < 5000 sequences, each subclade following Yang & Smith (2014) with minor modi- alignments were constructed using PASTA v.1.6.3 (Mirarab et al., fications (Methods S1). The final alignments from homolog trees 2014a,b) with default settings, and were trimmed by PHYUTILITY with no taxon duplication (i.e. one-to-one orthologs), no more with ‘-clean 0.01’, and phylogenetic trees were estimated using than one missing taxon (except requiring full taxon occupancy FASTTREE v.2.1.8 (Price et al., 2010) with the model WAG. An for CARY and PHYT), and an average bootstrap value of at least initial internal branch length cutoff of 2 was used after reducing 80 were trimmed with PHYUTILITY ‘-clean 0.5’. Trimmed align- tips and trimming spurious tips with the same cutoffs as for the ments with at least 300 columns were the final orthologs. backbone trees. A second round of alignment and refining was carried out for these larger homolog groups. Homolog groups > Species tree inference 5000 were ignored. After obtaining final homologs using the modified phylome We used two alternative approaches for constructing species trees approach, we carried out orthology inference following the for both the entire Caryophyllales using peptides (‘modified phy- ‘rooted ingroup’ method in Yang & Smith (2014). Briefly, for lome data set’) and each of the five subclades using coding each Caryophyllales orthogroup extracted from a final homolog, sequences (CDSs) (‘the subclade data set’). First, a supermatrix we walked from the root toward the tip. When two sister nodes was constructed by concatenating trimmed ortholog alignments. shared one or more taxa, the side with a smaller number of taxa A maximum likelihood tree was estimated from the supermatrix was separated and both subtrees were taken into account in the using RAXML, partitioning each locus, with the model set to next round until all subtrees contained only one sequence per PROTCATAUTO for peptides and GTRCAT for coding taxon. For each resulting tree with at least 160 taxa, sequences sequences for each individual partition. Node support was evalu- were pooled, re-aligned using PRANK v.140110 (L€oytynoja & ated by the internode certainty all (ICA) scores (Salichos et al., Goldman, 2010) with default settings, and trimmed with PHYU- 2014) calculated in RAXML using final ortholog trees as input. 0 TILITY with ‘-clean 0.3 , and a new ortholog tree was estimated Probabilistic correction was used to take incomplete taxon occu- using RAXML with PROTCATAUTO. A set of more stringent pancy into consideration (Kobert et al., 2016; Stamatakis, 2016). cutoffs was used to produce the final ortholog trees: absolute tip As implementation of ICA score calculation was updated in more cutoff of 0.6, relative tip cutoff of 0.3, and an internal branch recent releases of RAXML, we used RAXML v.8.2.9 for calculat- cutoff of 0.4. Aligned sequences were pooled according to ing ICA scores. remaining tips and trimmed (using PHYUTILITY with ‘-clean 0.3’), In addition to the concatenated analyses, we also searched for and remaining alignments with at least 150 characters and 160 the maximum quartet support species tree (MQSST) using AS- taxa were used for species tree inference. TRAL-II v.4.10.12 (Mirarab et al., 2014b; Mirarab & Warnow, 2015) starting from maximum likelihood trees estimated from individual orthologs. Tree uncertainty was evaluated using 100 All-by-all homology search and orthology inference in each multi-locus bootstrap replicates (Seo et al., 2005; Seo, 2008; of five Caryophyllales subclades from coding sequences Mirarab et al., 2014a,b), starting from 200 Fast bootstrap trees Uncertainty in alignment and tree inference increases with data set for each final ortholog calculated in RAXML. size. Given the absence of polyploidy events along the backbone of Caryophyllales (Smith et al., 2015; Yang et al., 2015), we divided Mapping polyploidy events based on subclade orthogroup Caryophyllales into five subclades according to previous phyloge- tree topology netic analysis (Yang et al., 2015): PHYT: + the ‘Phyto- laccoid clade’ which consists of , s.l. To map polyploidy events in each subclade, we extracted (i.e. including Agdestis), , and Sarcobataceae (Yang orthogroups from each subclade homolog tree, requiring no more

Ó 2017 The Authors New Phytologist (2018) 217: 855–870 New Phytologist Ó 2017 New Phytologist Trust www.newphytologist.com New 858 Research Phytologist

than two missing ingroup taxa. When two or more taxa over- Orthogroup tree Species tree lapped between the two daughter clades, a gene duplication event A was recorded to the most recent common ancestor (MRCA) on A 2n = 4x (a) A single taxon A the subclade species tree (Yang et al., 2015). In this procedure, B 2n = 2x from an B C 2n = 2x each node on a species tree can be counted at most once per autopolyploidy C orthogroup to avoid nested gene duplications inflating the num- event Node with ber of duplications scored. Two alternative filters were applied low support for comparison. The first filter required an average bootstrap per- (b) A single taxon A centage of each orthogroup to be at least 50. Alternatively, we from an B A 2n = 4x A also tested a local topology filter that only mapped a gene dupli- allopolyploidy B 2n = 2x event C C 2n = 2x cation event when the sister clade of the gene duplication node in the orthogroup contained a subset of the taxa in the correspond- (c) A single taxon A ing sister clade in the species tree (Cannon et al., 2015; Li et al., from an A 2n = 4x B 2015). allopolyploidy B 2n = 2x event with one A C 2n = 2x parental lineage C Distribution of synonymous distance among gene pairs missing (Ks plots) A B A 2n = 4x (d) Multiple taxa For each of the ingroup Caryophyllales data sets, a Ks plot of A B 2n = 4x from an C 2n = 2x within-taxon paralog pairs was created following the same proce- autopolyploidy B dure as Yang et al. (2015) based on BLASTP hits. Similarly, we car- event C ried out a second Ks analysis based on BLASTN between CDSs A Node with without first reducing highly similar sequences to maximize B low support (e) Multiple taxa A 2n = 4x detection of more recent polyploidy events. In cases where tree- C from an B 2n = 4x based mapping was ambiguous, or comparison of within- vs A allopolyploidy C 2n = 2x between-species Ks peaks could help inform allopolyploidy, we B event D 2n = 2x also calculated the Ks distribution of between-species reciprocal D best BLASTN hit pairs. Ks values < 0.01 were excluded to avoid iso- forms from de novo assembled transcriptomes. (f) Multiple taxa A from an B A 2n = 4x allopolyploidy C B 2n = 4x Chromosome counts event with one D C 2n = 2x parental lineage A D 2n = 2x Chromosome counts were obtained from the Chromosome missing B Counts Database (ccdb.tau.ac.il accessed 5 October 2015). When counts in this database were unavailable or inconsistent, counts Polyploidy event mapped by orthogroup tree topology were obtained from the Jepson eFlora (ucjeps.berkeley.edu/ Polyploidy event supported by within-taxon Ks plots eflora/ accessed 5 October 2015) and Flora of (www.efloras.org/ accessed 5 October 2015). Fig. 1 Scenarios of polyploidy events (Kellogg, 2016). Letters ‘A–D’ represent taxon names, followed by chromosome numbers with the base number ‘x’, and schematic Ks plots (the x-axis is Ks values, and the y-axis A total evidence approach for mapping polyploidy events is the number of paralogous gene pairs). We considered six scenarios for mapping polyploidy events, including taking orthogroup tree topology, within-taxon Ks same lineage in which all taxa share the same within-taxon Ks plots, and chromosome counts into consideration (Fig. 1). peak, then this was inferred as autopolyploidy (Fig. 1d). How- When polyploidy events occurred without subsequent specia- ever, if an excess of duplication events was found on a lineage tion (or for which only one taxon was represented in our sam- ancestral to the lineage in which all taxa share the same pling; Fig. 1a–c), only a single within-taxon Ks plot would within-taxon Ks peak, this was inferred to be an allopolyploidy show a peak. In these instances, because we required at least event (Fig. 1e,f). We indicated the node with low support in two overlapping taxa between sister clades in the orthogroup Fig. 1 to highlight that allopolyploidy can lead to nodes with tree to record a gene duplication event, no gene duplication low support when both parental lineages are present or nodes was recorded from topology-based mapping. The polyploidy that are well supported when one of the parental lineages is event was therefore mapped to the terminal branch with the missing. We did not consider polyploidy events that were sup- Ks peak. However, when a polyploidy event was followed by ported by chromosome count alone, as these can be within- lineage diversification, we used information from both Ks peaks population variation (Caperta et al., 2016), and an increase in and the orthogroup tree topologies to map duplication events number can represent chromosome fission instead of duplica- (Fig. 1d–f). If we saw an excess of duplication events along the tion (Fishman et al., 2014; Chester et al., 2015).

New Phytologist (2018) 217: 855–870 Ó 2017 The Authors www.newphytologist.com New Phytologist Ó 2017 New Phytologist Trust New Phytologist Research 859

Results RAXML and ASTRAL recovered nearly identical species tree topologies Data availability RAXML and ASTRAL analyses recovered identical topologies for Raw reads for newly generated transcriptomes were deposited most branches (Figs 2–6, S2–S4). We consider branches with an in the NCBI Sequence Read Archive (BioProject: ICA score higher than 0.5 as strongly supported, as ICA scores < PRJNA388222; Table S2). Assembled sequences, alignments, 0.5 suggest that the dominant bipartition is present in < 80% of and trees were deposited in Dryad (https://doi.org/10.5061/ ortholog trees (Salichos et al., 2014). As multi-locus bootstrap dryad.st3gt). Scripts used were also archived in Dryad, with support percentages increase with the number of loci (Seo, 2008) notes and updates for modified phylomes available from and given that each of our final ortholog set contained > 100 loci https://bitbucket.org/yangya/genome_walking_2016 and those (Table 1), we consider multi-locus bootstrap values < 100 as low for building lineage-specific homologs and mapping poly- support. Using this set of criteria, most branches from subclade ploidy events available from https://bitbucket.org/blackrim/clus data sets (Figs 2–6, S2) and the majority of the branches from tering. modified phylomes (Figs S3, S4) were well supported.

0.0% Boerhavia torreyana 2n = 52 Phytolaccoid Clade + Aizoaceae (PHYT) 0.00 2.3% Boerhavia purpurascens 0.86 8.0% Boerhavia coccinea SRA 2n = 52 1.0 Boerhavia coccinea 2n = 52 0.0% 7.9% 0.02 Boerhavia burbidgeana 0.72 0.1% Anulocaulis leiosolenus var. gypsogenus 2n = 24 Above branch: % gene duplication 0.17 1.2% Anulocaulis eriosolenus 2n = 24 5.4% Below branch: ICA score 0.50 0.39 Nyctaginia capitata * Alternative topology compared to ASTRAL 4.9% Allionia incarnata 2 2n = 40 1 2.2% 0.3% 0.94 Allionia incarnata 2n = 40 Polyploidy event placed by tree topology 0.03 –0.01 Cyphomeris gypsophiloides 1 Polyploidy event supported by Ks plots * 3.0% Mirabilis jalapa SRA 2n = 58 7.9% 5.0% Mirabilis jalapa 2n = 58 0.90 0.10 0.94 Mirabilis multiflora 2n = 66 Nyctaginaceae 1.9% Abronia bigelovii 1.9% 51.6% 1 0.94 Abronia nealleyi 2n = 92 0.86 1.0 Tripterocalyx crux-maltae 1 0.3% Acleisanthes lanceolata 3.5% 0.15 1.1% Acleisanthes chenopodioides 2.7% 0.1 substitution per site 0.53 1.0 Acleisanthes acutifolia 0.94 Acleisanthes obtusa 7.3% 1.3% stipitata 2n = 34 1.0 0.94 Bougainvillea spectabilis 2n = 34 0.3% Pisonia aculeata 2n = 134,136 0.9% 0.86 Guapira obtusata 1.4% 0.79 0.22 Pisonia umbellifera 2n = 134,136 2.5% Hilleria latifolia 2n = 36 1.7% 1.7% 1.0 Rivina humilis 2n = 108,126 Petiveriaceae 0.78 Petiveria alliacea 2n = 36 4.6% 0.56 0.01 Seguieria aculeata 0.0% Anisomeria littoralis 0.20 2.3% Phytolacca dioica 2n = 36 33.2% 2 0.35 0.6% Phytolacca bogotensis 2n = 36 0.89 11.6% Phytolaccaceae 2 0.94 0.94 Phytolacca americana 2n = 18 Ercilla volubilis 20 4.9% Sarcobatus vermiculatus 2n = 36 Sarcobataceae 1.0 Sarcobatus vermiculatus MJM 2n = 36 PHYT 4.7% 15 0.0% Sesuvium portulacastrum 2n = 16 0.72 1.3% 0.45 ingroups Sesuvium verrucosum 2n = 16 10 0.4% 1.0 Cypselea humifusa 0.39 3.2% Zaleya pentandra 2n = 16 Aizoaceae 5 PHYT2 PHYT1 1.0 9.6% Trianthema portulacastrum 2n = 26 28

0.5% Number of branches 1.0 Trianthema portulacastrum SRA 2n = 26,28 0.49 0 0.6% echinatum 2n = 54 0 102030405060 0.94 Mesembryanthemum crystallinum 2n = 18 Percent gene duplication Chenopodiaceae Beta vulgaris 2n = 18 Chenopodiaceae Spinacia oleracea 2n = 12 Caryophyllaceae Stegnospermataceae Stegnosperma halimifolium Dianthus caryophyllus 2n = 30

Fig. 2 Best tree from RAXML analysis of the concatenated supermatrix of the phytolaccoid clade and Aizoaceae (PHYT). Percentage values above branches indicate the proportion of orthogroups showing duplication filtered by bootstrap percentage. Internode certainty all (ICA) values are given below branches.

Ó 2017 The Authors New Phytologist (2018) 217: 855–870 New Phytologist Ó 2017 New Phytologist Trust www.newphytologist.com New 860 Research Phytologist

Portulacineae + Molluginaceae (PORT) 0.1 substitution per site

Above branch: % gene duplication 0.3% suffrutescens 2n = 8 Below branch: ICA score 0.75 1.7% Portulaca pilosa 2n = 8 0.54 Alternative topology compared to ASTRAL 2.61% * Portulaca amilis 2n = 18 1 Polyploidy event placed by tree topology 0.5% 0.9 0.11 Portulaca grandiflora 2n = 10,18 1 Polyploidy event supported by Ks plots 1.0% Portulaca umbraticola 2n = 18 0.23 6.6% Portulaca cryptopetala 1.0 2.1% Portulaca oleracea 2n = 18 0.27 7.9% 0.1% Portulaca oleracea SRA 2n = 18 1.0 0.08 Portulaca molokiniensis 2n = 40 12 0.2% Talinopsis frutescens 10 8 0.9 filamentosa 2n = 18 2.8% 6 0.5% Lophophora williamsii 2n = 22 4 0.78 0.3% 0.93 Echinocereus pectinatusi 2n = 22 PORT1 PORT3 0.16 2 0.9% Nopalea cochenillifera 2n = 22

Number of branches –0.09 0 Pereskia aculea 2n = 22 Cactaceae 0.0 0.1 0.2 0.3 11.2% 0.0% 4.3% –0.1 Pereskia grandifolia 2n = 22 Percent gene duplication 0.93 0.56 Leuenbergeria lychnidiflora 2n = 22 0.0% –0.09 * Leuenbergeria bleo 2n = 22 1.6% 2.4% paniculatum 2n = 24 0.12 Portulacineae 0.9% 1.0 Talinum sp. Talinaceae 0.8 Talinum portulacifolium 2n = 48 1 19.1% PORT Basella alba 2n = 44,48 1 0.93 2 Basellaceae ingroups 20.5% 3 Claytonia nevadensis 2n = 14 23.8% 0.4% 1.0 3 Claytonia virginica 2n = 12 to 190 0.58 1 parviflorus 2n = 24

0.41 0.0% Mollugo nudicaulis 2n = 54 0.2% 0.16 Mollugo cerviana 2n = 18 0.01 Molluginaceae 2.5% 4 Mollugo verticillata 2n = 64 0.72 Mollugo pentaphylla 2n = 18 Chenopodiaceae Beta vulgaris 2n = 18 Chenopodiaceae Spinacia oleracea 2n = 12 Caryophyllaceae Dianthus caryophyllus 2n = 30

300

Between

250 Mollugo pentaphylla 200 and Talinum sp. PORT1 (Ks = 0.76) Between 200 Ks = 0.64 150 Portulaca pilosa Mollugo pentaphylla PORT1 (Ks = 0.9) and Portulaca pilosa 150 (Ks = 0.9) 100 Gene pair counts Gene pair counts 100 Talinum sp. 50 50 Mollugo Mollugo pentaphylla pentaphylla 0 0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.5 1.0 1.5 Ks Ks

Fig. 3 Best tree from RAXML analysis of the concatenated supermatrix of Portulacineae and Molluginaceae (PORT). Percentage values above branches indicate the proportion of orthogroups showing duplication filtered by bootstrap percentage. Internode certainty all (ICA) values are given below branches. Selected Ks plots based on BLASTN hits are shown below trees. Ks values < 0.01 are not shown. We recovered between 152 and 736 one-to-one orthologs and of 98–100% and character occupancies of 87–93% (Table 1). 0.2–1.1 million trimmed CDS columns from each of the five Four clades showed different relationships between RAXML and subclades. The concatenated supermatrices had gene occupancies ASTRAL, with little support for either alternative relationships

New Phytologist (2018) 217: 855–870 Ó 2017 The Authors www.newphytologist.com New Phytologist Ó 2017 New Phytologist Trust New Phytologist Research 861

Above branch: % gene duplication + Chenopodiaceae (AMAR) Below branch: ICA score Alternative topology compared to ASTRAL * 0.0% prostrata 2n = 18 0.06 1 Polyploidy event placed by tree topology 1.2% Atriplex hortensis 2n = 18 1.3 % 0.97 1 Polyploidy event supported by Ks plots Atriplex rosea 2n = 18 0.81 0.3% Extriplex californica 2n = 18 3.3% 1.0 spinosa 2n = 36 0.97 0.2% 1.8% Chenopodium giganteum 2n = 54 0.1 0.3% 0.86 Chenopodium quinoa 2n = 36 0.47 Spinacia oleracea 2n = 12 0.4% lanata 2n = 18 0.71 Agriophyllum squarrosum 2n = 18 Chenopodiaceae 15 0.0% Suaeda fruticosa 2n = 18 0.4% 0.1% Suaeda linearis 2n = 54 0.09 0.01 0.0 10 5.9% Bienertia sinuspersici 2n = 18 0.88 4.5% Salicornia europaea 2n = 18 1.3% 0.94 Salicornia europaea GB 2n = 18 5 0.3% AMAR3 AMAR1 0.03 0.72 0.8% Haloxylon ammodendron 2n = 18 0.5% 1.0 Halogeton glomeratus 2n = 18 Number of branches 0 0.9 Bassia scoparia 2n = 18 0.0 0.1 0.2 0.3 0.4 0.59% Beta vulgaris 2n = 18 Percent gene duplication 1.0 Beta maritima 2n = 18 3.2% Alternanthera philoxeroides 2n = 64 14.1% Alternanthera brasilian 0.21 8.1% 0.97 2n = 28,32,34,40,88 AMAR 3.5% 0.9 1.19% Alternanthera caracasana 2n = 96 ingroups 0.51 2.0% 0.81 Alternanthera tenella 2n = 28 –0.0 0.1% vermiculare 2n = 30 * 1.5% Gossypianthus lanuginosus 6.2% 0.64 3.8% 0.79 0.35 1 Guilleminea densa var. aggregata 0.93 36.7% Froelichia latifolia 2n = 54,58 Amaran- 1 Tidestromia lanuginosa 2n = 20 0.97 thaceae 0.3% lanata 2n = 16 0.7% 0.92 2 2n = 32 0.55 5.9% Amaranthus hypochondriacus 2n = 32 0.84 3.5% Amaranthus hypochondriacus PAC 2n = 32 3 25.3% 0.77 Amaranthus retroflexus 2n = 32,34 0.7% 3 1.0 Amaranthus cruentus 2n = 32,34 0.1 Amaranthus tricolor 2n = 32,34 Achatocarpaceae Phaulothamnus spinescens Caryophyllaceae Dianthus caryophyllus 2n = 30

0. 1 substitution per site Between Aerva Aerva javanica javanica and Aerva lanata (Ks = 0.24)

8000 800 Tidestromia lanuginosa

6000 Between 600 Tidestromia lanuginosa and AMAR2 (Ks = 0.23) 4000 Aerva javanica 400 (red, Ks = 0.46) AMAR2 AMAR1

Gene pair counts (Ks = 0.23) 2000 (green, Ks = 0.50) Gene pair counts 200 Aerva Aerva javanica lanata 0 0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Ks Ks

Fig. 4 Best tree from RAXML analysis of the concatenated supermatrix of Amaranthaceae and Chenopodiaceae (AMAR). Percentage values above branches indicate the proportion of orthogroups showing duplication filtered by bootstrap percentage. Internode certainty all (ICA) values are given below branches. Selected Ks plots based on BLASTN hits are shown below trees. Ks values < 0.01 are not shown.

(Figs 2–6 marked with ‘*’, S2): Cyphomeris gypsophiloides (Nyc- Anulocaulis + Boerhavia in the ASTRAL tree (bootstrap = 95); taginaceae, PHYT) was sister to Allionia in the RAXML tree species in Leuenbergeria (Cactaceae, PORT) were monophyletic (ICA = 0.01) but was sister to the clade Nyctaginia + in the RAXML tree (ICA = 0.09) but were paraphyletic to the

Ó 2017 The Authors New Phytologist (2018) 217: 855–870 New Phytologist Ó 2017 New Phytologist Trust www.newphytologist.com New 862 Research Phytologist

Caryophyllaceae (CARY)

2.9% Cerastium arvense 2n = 36 2.0% 1.0 2 Cerastium fontanum subsp. vulgare Above branch: % gene duplication 2n = 72 0.1% 0.98 Lepyrodiclis stellarioides 2n = 68 Below branch: ICA score 0.08 Arenaria serpyllifolia 2n = 20 Alternative topology compared to ASTRAL 1.2% * 5.4% 3 Honckenya peploides 2n = 68 1 0.36 Polyploidy event placed by tree topology 0.2% 2.6% 0.78 4 Schiedea membranacea 2n = 44,60 1 Polyploidy event supported by Ks plots 0.98 5 Scleranthus polycarpos 2n = 44 1.3% 0.13 Colobanthus quitensis 2n = 80 0.06 6 0.7% Velezia rigida 2n = 28 0.1% 1.0 Dianthus caryophyllus 2n = 30 –0.04 1.0% * Gypsophila repens 2n = 34 0.96 14 Saponaria officinalis 2n = 28 12 0.5% Silene latifolia 2n = 24 10 0.5% Silene latifolia SRA 2n = 24 1 22.8% 0.94 0.24 8 0.8 Silene conica 2n = 20,24 1 0.0% 6 1.5% 0.01 0.23 Silene noctiflora 2n = 24 4 CARY1 1.0% Silene vulgaris ALN 2n = 24 2 8.6%

Number of branches 0.97 Silene vulgaris 2n = 24 0 0.98 0.1 0.2 0.3 21.1% 0.0 1 5.1% 0.8% Silene paradoxa 2n = 24 Percent gene duplication 0.96 0.94 0.96 Silene acaulis subsp. subacaulescens 2n = 24 0.7% Agrostemma githago 2n = 24 0.39 4.2% Eremogone procera 2n = 22 0.8% 0.98 Eremogone hookeri subsp. desertorum CARY 0.7 2n = 44,66 ingroups media 2n = 18 2.6% Drymaria cordata 2n = 24,36 0.4% 7 0.2% 0.96 Drymaria subumbellata 2n = 40,42,44 0.45 0.04 0.6% Illecebrum verticillatum 2n = 10 0.97 2.1% Polycarpaea repens 2n = 18 0.92 Herniaria latifolia 2n = 18 0.1% Telephium imperati 2n = 18 0.77 Corrigiola litoralis 2n = 18 Chenopodiaceae Spinacia oleracea 2n = 12 Chenopodiaceae Beta vulgaris 2n = 18 Achatocarpaceae Phaulothamnus spinescens

Between 0.1 substitution per site Silene latifolia and 200 Spergularia media (Ks = 0.55) Schematic phylogram showing the genome duplication and cladogenic events in the Honckenya+Schiedea+Scleranthus clade

150 Silene latifolia Honckenya vs Schiedea CARY1 (Ks = 0.56) Honckenya vs Scleranthus 100 Schiedea vs Scleranthus 3 Honckenya peploides

Gene pair counts 50 4 Schiedea membranacea Spergularia media 0 Ks 5 Scleranthus polycarpos 0.0 0.2 0.4 0.6 0.8 1.0

Honckenya vs Schiedea CARY5 (Ks = 0.08) CARY5 (Ks = 0.04) (Ks = 0.04) 1200 CARY3 (green, 1500 Ks = 0.06) 1000 6000 CARY3 Honckenya Scleranthus Scleranthus (Ks = 0.06) Scleranthus vs Schiedea (subsampled (subsampled 800 (Ks = 0.27) to fit in plot) CARY4 to fit in plot)1000 4000 CARY4 (blue, Ks = 0.22,

600 Honckenya vs (blue, Ks = 0.22, best seen in counts Fig. S6d) Gene pair Scleranthus best seen in Honckenya counts counts (red, Ks = 0.24) Fig. S6d)

400 Gene pair Gene pair 500 Schiedea 2000 200 Schiedea

0 0 0 0.0 0.1 0.2 0.3 0.0 0.1 0.2 0.3 0.4 0.0 0.1 0.2 0.3 0.4 Ks Ks Ks

Fig. 5 Best tree from RAXML analysis of the concatenated supermatrix of Caryophyllaceae (CARY). Percentage values above branches indicate the proportion of orthogroups showing duplication filtered by bootstrap percentage. Internode certainty all (ICA) values are given below branches. Selected Ks plots based on BLASTN hits are shown below trees. Ks values < 0.01 are not shown.

rest of Cactaceae in the ASTRAL tree (bootstrap = 69); Tidestromia Alternanthera in the RAXML tree (ICA = 0.00), but was sister lanuginosa (Amaranthaceae, AMAR) was sister to the clade of to Alternanthera in the ASTRAL tree (bootstrap = 50); and Froelichia + Guilleminea + Gossypianthus + Blutaparon + Saponaria officinalis (Caryophyllaceae, CARY) was sister to

New Phytologist (2018) 217: 855–870 Ó 2017 The Authors www.newphytologist.com New Phytologist Ó 2017 New Phytologist Trust New Phytologist Research 863

0.5% hastatulus 2n = 18 0.5% 0.96 Rumex acetosa 2n = 14,15 3.1% 0.96 Rumex palustris 2n = 60 Clade sister to the rest of 0.9 1.8% nobile 0.96 Rheum rhabarbarum 2n = 22 5.3% 4.8% Reynoutria japonica 2n = 44 0.73 Caryophyllales (NCORE) 0.93 3.4% Reynoutria japonica 2 2n = 44 0.0% Fallopia convolvulus 2n = 20 3.5% 9.9% 0.29 Muehlenbeckia platyclada 2n = 20 0.25 0.96 0.07 aviculare 2n = 40,60 3.7% Fagopyrum tataricum 2n = 16 18.8% 0.93 Fagopyrum esculentum 2n = 16 0.87 3.5% minor 2 2n = 40 8.0% Persicaria minor 2n = 40 42.8% 1 9.9% 0.9 0.96 Persicaria tinctoria 2n = 40,42,44 1.0 1 1.0 Persicaria virginiana 2n = 44 1.5% 0.9% Ruprechtia salicifolia 2n = 28 2.8% 0.59 Pterostegia drymarioides 2n = 28 0.25 0.17 Antigonon leptopus 2n = 14,40,44,48 0.1% Plumbago auriculata 2n = 14 0.3% 2 –0.22 Limonium spectabile 2n = 18 Plumbaginaceae 0.01 1.5% Reaumuria trigyna 2n = 22 5.4% 3 0.96 Reaumuria soongarica 2n = 22 5.3% 0.74 hispida 2n = 24 1.8% 1.0 Frankenia laevis 2n = 30 Frankeniaceae 0.73 0.2% 4 Aldrovanda vesiculosa 2n = 48 2.8% 5 0.31 Dionaea muscipula 2n = 32 Droseraceae 0.9% 0.96 Drosera binata 2n = 32 2.0% 0.03 15.5% Nepenthes alata SFB 2n = 80 6 Nepenthes alata 2n = 80 Nepenthaceae 0.77 1.0 Drosophyllum lusitanicum 2n = 12 Drosophyllaceae Chenopodiaceae Spinacia oleracea 2n = 12 Chenopodiaceae Beta vulgaris 2n = 18 0.9 0.42 0.23 Caryophyllaceae Dianthus caryophyllus 2n = 30 0.73 Microteaceae Microtea debilis 2n = 18 Physenaceae Physena madagascariensis Simmondsiaceae Simmondsia chinensis 2n = 52

7 0.04 substitution per site Above branch: % gene duplication 6 Below branch: ICA score 5 4 Alternative topology compared to ASTRAL * 3 1 NCORE1 Polyploidy event placed by tree topology 2 1 Polyploidy event supported by Ks plots 1 Number of branches 0 0.0 0.1 0.2 0.3 0.4 0.5 Percent gene duplication

Fig. 6 Best tree from RAXML analysis of the concatenated supermatrix of the clade sister to the rest of the Caryophyllales (NCORE). Percentage values above branches indicate the proportion of orthogroups showing duplication filtered by bootstrap percentage. Internode certainty all (ICA) values are given below branches.

Table 1 Statistics for homology and orthology inference. PHYT, PORT, AMAR, CARY and NCORE are subclades within Caryophyllales (see Fig. 7)

Caryophyllales PHYT PORT AMAR CARY NCORE

Data type Peptides Coding sequences (CDSs) Homology inference Modified phylome All-by-all Orthology inference Rooted ingroup One-to-one orthologs (Yang & Smith, 2014) Number of taxa (ingroup + outgroup) 175 + 40 45 + 429+ 337+ 231+ 331+ 6 Minimum number of taxa per ortholog 160 49 31 38 34 36 Supermatrix dimension taxa 9 loci 178 9 624 (215 669) 49 9 152 (217 033) 32 9 171 39 9 315 34 9 736 37 9 213 (columns) (230 873) (453 842) (1 130 082) (325 966) Supermatrix gene/character occupancy 92.6%/80.1% 100%/92.5% 97.8%/86.5% 98.1%/87.2% 100%/92.2% 97.5%/87.8% Minimal ingroup taxa for Not applicable 43 27 35 29 29 mapping gene duplication Number of orthogroups used for Not applicable 2843 3577 4713 6686 1649 mapping gene duplications

Ó 2017 The Authors New Phytologist (2018) 217: 855–870 New Phytologist Ó 2017 New Phytologist Trust www.newphytologist.com New 864 Research Phytologist

Gypsophila + Dianthus + Velezia in the RAXML tree Mollugo pentaphylla (Molluginaceae). However, similar compar- (ICA = 0.04), but was sister to Dianthus + Velezia in the ASTRAL ison of Portulaca pilosa (Portulacineae) vs M. pentaphylla showed tree (bootstrap = 63). overlapping Ks peaks (Ks = 0.9; Fig. 3, lower right), probably Among the 15 045 homolog groups we obtained using the attributable to a higher molecular rate in Portulaca compared modified phylome approach, 15 had > 5000 sequences and were with Talinum. Therefore, phylogenetic uncertainty probably at ignored, while the rest were used for subsequent orthology infer- least partly contributed to the ambiguity in mapping. Both ence. The final concatenated matrix consisted of 624 loci and PORT2 and PORT4 were recovered by taxon-specific Ks peaks, 215 669 amino acids, with a final gene occupancy of 92.6% and and both had relatively high chromosome counts compared with character occupancy of 80.1% (Table 1). The modified phylome close relatives (Figs 3, S6b). PORT3 was supported by shared Ks approach recovered identical species tree topologies except for peaks and gene duplications in orthogroup trees (21%/18%), one branch that had little support from either analysis (Figs S3, whereas chromosome counts were uninformative. S4): Leuenbergeria (Cactaceae) was monophyletic in the RAXML At least three polyploidy events were recovered in the AMAR tree (ICA = 0.18) but was polyphyletic in the ASTRAL tree (boot- clade (Figs 4, S5c, S6c). AMAR1 was detected by an elevated per- strap = 28). The modified phylome approach recovered an identi- centage of gene duplications mapped to the branch uniting cal species tree topology compared to that recovered by the Alternanthera + Gossypianthus + Blutaparon + Froelichia + Aerva subclade analysis. When subclade trees had different topologies (37%/35%). However, species of Aerva lacked the AMAR1 Ks between RAXML and ASTRAL, the modified phylome tree agreed peak shared by Alternanthera + Gossypianthus + Blutaparon with the subclade ASTRAL results in the placement of Cyphomeris + Froelichia at 0.4–0.65. Further examination of the between- gypsophiloides (ICA = 0.23 and bootstrap = 97), whereas the posi- species Ks peak of Aerva javanica vs Tidestromia lanuginose tion of Leuenbergeria was recovered in the same incongruent posi- (Ks = 0.46) shows that it was more ancient than the within- tions as recovered by RAXML and ASTRAL in the subclade tree species Ks peak AMAR2 at 0.23 in A. javanica (Fig. 4, lower left), analyses. The modified phylome approach recovered both suggesting that paralogs in A. javanica coalesced more recently Tidestromia lanuginosa (0.19/81) and Saponaria officinalis (0.38/ than coalescing with taxa outside of Aerva, and AMAR1 and 98) in the same positions as found in the RAXML results for the AMAR2 were two distinct polyploidy events. The Ks peak within subclade trees. A. javanica (AMAR2; Ks = 0.23) overlapped with the between- species Ks peak of A. javanica vs A. lanata (Ks = 0.24; Fig. 4, lower right). Given that A. javanica had a higher molecular sub- Twenty-six polyploidy events were mapped stitution rate than A. lanata according to their relative branch Twenty-six polyploidy events were inferred by using a total evi- lengths (Fig. 4), paralogous copies within A. javanica probably dence approach of orthogroup tree topology, shared Ks peaks, coalesced along the branch leading to A. javanica before coalesc- and chromosome counts (Figs 2–6). Overall, the two orthogroup ing with A. lanata. The lack of the AMAR2 peak in A. lanata as tree filtering strategies, by bootstrap percentage or local tree well as chromosome counts (2n = 32 for A. javanica vs 2n = 16 topology, produced almost identical results for frequency of gene for A. lanata) further supported the location of AMAR2 along duplication (Figs 2–6, S5). The frequency of gene duplications the terminal branch leading to A. javanica. Based on the lack of was strongly associated with the inferred polyploidy events the AMAR1 peak in both Aerva species, we inferred that AMAR1 (Figs 2–6). In our Ks analysis, we only considered Ks peaks that was an allopolyploidy event, with one parental lineage closely were similar in height or taller than the peak, c. Ks = 2, that corre- related to Aerva and the other parental lineage missing, consistent sponds to the early eudicot paleohexaploidy that predates the with the scenario in Fig. 1(f). origin of Caryophyllales (Dohm et al., 2012; Jiao et al., 2012; At least seven polyploidy events were recovered in the CARY Yang et al., 2015). clade (Figs 5, S5d, S6d). CARY1 was detected through an elevated Two polyploidy events in the PHYT clade were supported by percentage of gene duplication in two adjacent nodes (21%/20% both homolog tree topology and shared Ks peaks (Figs 2, S5a, onthenodethatincludedSpergularia media, and 23%/20% in the S6a). The frequency of orthogroups showing evidence of gene node excluded S. media;Figs5,S5d).However,S. media did not duplication was 52% filtered by bootstrap percentage and 45% share the CARY1 Ks peak (Fig. S6d). The reciprocal best hits Ks filtered by tree topology for PHYT1, and 33% and 31%, respec- peak between S. media and Silene latifolia indicated that paralogs tively, for PHYT2. Both were significantly higher percentages derived from CARY1 coalesced within S. latifolia at similar Ks val- compared with remaining branches (Fig. 2). ues compared to coalescing with S. media (Fig. 5), suggesting that At least four polyploidy events were recovered in the PORT phylogenetic uncertainty at least partly contributed to the fact that clade. PORT1 was mapped to both the MRCA of Portulacineae CARY1 mapped to two adjacent nodes. (19%/17%; filtered by bootstrap vs tree topology) and its parent Nested in CARY1, five taxa showed Ks peaks (Figs 5, S6d). node (24%/22%) from gene duplications (Figs 3, S5b). How- Among them, within-species Ks peak CARY2 was observed only in ever, Molluginaceae did not share the Ks peak that was present in Cerastium fontanum (2n = 72) but missing from its sister all members of Portulacineae (Fig. S6b). Within-species paralogs Cerastium arvense (2n = 36). Honckenya peploides had a within- in Portulacineae (represented by the PORT1 Ks peak in Talinum species Ks peak c.0.06(CARY3),itssisterSchiedea membranacea sp. at Ks = 0.64; Fig. 3, lower left) coalesced at lower Ks values had a Ks peak at 0.22 (CARY4), whereas their reciprocal best hit Ks compared with the Ks peak at 0.76 between Talinum sp. and peak was at 0.08, suggesting that CARY3 was restricted to the

New Phytologist (2018) 217: 855–870 Ó 2017 The Authors www.newphytologist.com New Phytologist Ó 2017 New Phytologist Trust New Phytologist Research 865 terminal branch leading to H. peploides. The observation that the Discussion within-species Ks peak in S. membranacea (CARY4) was older than its split with H. peploides, but that CARY4 was not shared with Our analyses add to a growing body of literature that suggests that H. peploides, suggested an allopolyploid origin of S. membranacea polyploidy events are much more prevalent than previously thought with one parental lineage closely related to H. peploides while the (Cannon et al., 2015; Edger et al., 2015; Li et al.,2015;Yanget al., other parental lineage was missing from our taxon sampling. Pair- 2015; Huang et al., 2016; Xiang et al., 2016; Mandakova et al., wise comparison among S. membranacea, H. peploides,and 2017). The data set presented here uniquely contributes to this Scleranthus polycarpos showed that S. polycarpos had a polyploidy question by greatly improving taxon sampling of transcriptomes in event CARY5 (Ks = 0.04) that was more recent than its split with a major plant clade (168 species in Caryophyllales) whose evolu- Schiedea or Honckenya and is therefore restricted to S. polycarpos. tionary history spans a time period that encompasses both deep CARY6, also nested in CARY1, was mapped to the terminal branch divergences and more recent events during the Neogene (Smith leading to Colobanthus quitensis.AlthoughColobanthus was weakly et al., 2017). Likewise, our improved homology search and filtering supported to be sister to the clade consisting of approaches aid in identifying the phylogenetic locations of poly- Honckenya + Schiedea + Scleranthus (ICA = 0.13; bootstrap = 100), ploidy events. Moreover, we consider multiple lines of evidence for the peak at Ks = 0.15 was more recent than the Honckenya/ pinpointing the phylogenetic locations of polyploidy events, includ- Scleranthus (Ks = 0.24) or the Schiedea/Scleranthus (Ks = 0.27) split ing orthogroup tree topology, Ks plots, and chromosome counts. and is therefore inferred to be independent of CARY3, CARY4 or These approaches identified 26 polyploidy events across Caryophyl- CARY5. In addition to CARY2–6 which were nested in CARY1, lales, include 10 newly reported and 16 previously identified (Yang one additional polyploidy event, CARY7, independent of CARY1 et al., 2015; Walker et al., 2017b). Importantly, two of these 26 was mapped to the MRCA of Drymaria cordata and events are suggested to be allopolyploidy events. Drymaria subumbellata by a shared Ks peak. Both species had high chromosome counts relative to their sister lineages. Species trees based on transcriptome data are concordant At least six polyploidy events were inferred in the NCORE clade with previous analyses (Figs 6, S5e, S6e). Both gene duplications (43%/34%) and shared Ks peaks supported a polyploidy event at the base of Polygonaceae The species trees we recovered are highly concordant with previous (NCORE1). NCORE2–5 were supported by Ks peaks, and analyses of family-level relationships (Figs 7, S3, S4; Cuenoud NCORE6 was supported by both Ks peaks and chromosome et al., 2002; Brockington et al., 2009; Sch€aferhoff et al.,2009; counts. We inferred NCORE5 (base of Droseraceae) and Arakaki et al., 2011; Yang et al., 2015). As seen in previous analy- NCORE6 (branch leading to Nepenthes alata, Nepenthaceae) as ses, the placements of Sarcobataceae and Stegnospermataceae two separate polyploidy events on sister branches given that very remain poorly supported despite using hundreds of loci. We found low frequencies of gene duplication events were mapped to the weak support for two nodes that had previously received high sup- MRCA of Droseraceae + Nepenthaceae (0.9%/1.7%), compared port using a small number of loci. Previous analyses using plas- with the MRCA of Droseraceae (2.8%/3.0%) and Nepenthaceae tomes recovered Cactaceae as being sister to Portulacaceae with (16%/15%). 100% bootstrap support (Arakaki et al., 2011), but we recovered In addition to the polyploidy events detected from each of the Anacampserotaceae + Portulacaceae as sisters to Cactaceae (modi- five subclades, three of the four taxa along the grade paraphyletic to fied phylome ICA = 0.31 and ASTRAL multi-locus bootstrap = 100). PHYT + PORT + AMAR + CARY also each had a peak at Ks < 1: Previous studies recovered strong to moderate support for the Stegnosperma halimifolium (Fig. S6a), Physena madagascariensis monophyly of Portulacineae + Molluginaceae (likelihood boot- (Fig. S6e), and Simmondsia chinensis (Fig. S6e). No polyploidy event strap = 100; Arakaki et al., 2011; Bayesian posterior probabil- has been inferred along the Caryophyllales backbone leading to beet ity = 0.94 and parsimony bootstrap < 50; Nyffeler & Eggli, 2010), (Chenopodiaceae) from genome analysis (Dohm et al., 2012, but we found low support for the relationship (ICA = 0.06 and 2014), indicating Ks peaks mapped to this grade probably represent multi-locus bootstrap = 93). This confirms that, while individual lineage-specific polyploidy events. Also, the relatively high chromo- loci can be informative, there is a large amount of phylogenetic some count of S. chinensis (2n = 52) compared with M. debilis conflict among gene trees (Smith et al., 2015; Walker et al., 2017a, (2n = 18), the only taxon in this grade that did not experience a b). Future studies should dissect these cases of discordance using a polyploidy event, further supports the lineage-specific nature of the gene-by-gene approach (Arcila et al., 2017; Brown & Thomson, polyploidy events along this grade. In addition, P. spinescens (Acha- 2017; Shen et al., 2017; Walker et al., 2017a,b). tocarpaceae; sister to AMAR) also probably had its lineage-specific polyploidy event, as it had a Ks peak that was not shared with its sis- Many polyploidy events are associated with taxonomic ter clade (Figs 4, S6c). units and/or habitat shifts As we excluded Ks values < 0.01 when plotting to avoid iso- forms in de novo assembled transcriptome data that resulted in A notable pattern emerged showing that many polyploidy events very high Ks counts, occasionally there were apparent peaks at Ks occurred on branches leading to major taxa and/or involved clear < 0.2. We have plotted all Ks values on a different scale to zoom habitat shifts (Fig. 7). For example, PHYT1 is located on the in on these Ks peaks (shown in Fig. S6 only when Ks peak < 0.2 branch representing a transition from trees and large in was confirmed). wetter environments within the Neotropics to a radiation of arid-

Ó 2017 The Authors New Phytologist (2018) 217: 855–870 New Phytologist Ó 2017 New Phytologist Trust www.newphytologist.com New 866 Research Phytologist

Boerhavia torreyana Boerhavia purpurascens Boerhavia coccinea SRA Boerhavia burbidgeana Boerhavia coccinea Anulocaulis eriosolenus Anulocaulis leiosolenus Nyctaginia capitata Cyphomeris gypsophiloides Allionia incarnata 2 Allionia incarnata Mirabilis jalapa SRA Mirabilis jalapa Mirabilis multiflora Nyctaginaceae Abronia bigelovii = Polyploidy event Tribe Nyctagineae Abronia nealleyi Tripterocalyx crux-maltae Acleisanthes chenopodioides PHYT1 Acleisanthes lanceolata Acleisanthes acutifolia Acleisanthes obtusa Bougainvillea spectabilis Bougainvillea stipitata Pisonia aculeata Guapira obtusata Pisonia umbellifera Hilleria latifolia Rivina humilis Petiveria alliacea Petiveriaceae PHYT2 Seguieria aculeata Phytolacca bogotensis ‘Phytolaccoid Phytolacca americana Anisomeria littoralis clade’ Phytolacca dioica Phytolaccaceae Ercilla volubilis Sarcobatus vermiculatus PHYT Sarcobatus vermiculatus MJM Sesuvium portulacastrum Sarcobataceae Sesuvium verrucosum Cypselea humifusa Zaleya pentandra Trianthema portulacastrum Aizoaceae Trianthema portulacastrum SRA Mesembryanthemum crystallinum Delosperma echinatum Portulaca suffrutescens Portulaca pilosa Portulaca amilis Portulaca grandiflora Portulaca umbraticola Portulaca cryptopetala Portulacaceae Portulaca oleracea Portulaca oleracea SRA Portulaca molokiniensis Talinopsis frutescens Anacampseros filamentosa Anacampserotaceae Lophophora williamsii Echinocereus pectinatus Nopalea cochenillifera Pereskia grandifolia Pereskia aculeata Cactaceae Leuenbergeria lychnidiflora Leuenbergeria bleo Talinum sp Talinum paniculatum Talinaceae Portulacineae Talinum portulacifolium Basella alba Basellaceae PORT2 PORT3 Claytonia virginica PORT1 Claytonia nevadensis Phemeranthus parviflorus Montiaceae Mollugo cerviana Mollugo nudicaulis PORT Mollugo verticillata Molluginaceae Mollugo pentaphylla PORT4 Atriplex prostrata Atriplex hortensis Atriplex rosea Extriplex californica Chenopodium quinoa Chenopodium giganteum Spinacia oleracea Agriophyllum squarrosum Suaeda fruticosa Suaeda linearis Chenopodiaceae Bienertia sinuspersici Salicornia europaea Salicornia europaea GB Haloxylon ammodendron Halogeton glomeratus Bassia scoparia Beta maritima Beta vulgaris Alternanthera brasiliana Alternanthera philoxeroides AMAR Alternanthera sessilis Alternanthera caracasana Alternanthera tenella Blutaparon vermiculare AMAR1 Gossypianthus lanuginosus Guilleminea densa Froelichia latifolia Tidestromia lanuginosa Amaranthaceae Aerva lanata Aerva javanica Amaranthus hypochondriacus AMAR2 Amaranthus hypochondriacus PAC AMAR3 Amaranthus retroflexus Amaranthus tricolor Amaranthus cruentus Phaulothamnus spinescens Achatocarpaceae Schiedea membranacea CARY4 Honckenya peploides CARY3 Scleranthus polycarpos CARY5 Colobanthus quitensis CARY6 Cerastium arvense Cerastium fontanum Lepyrodiclis stellarioides CARY2 Arenaria serpyllifolia Velezia rigida Dianthus caryophyllus Gypsophila repens Saponaria officinalis Silene noctiflora CARY1 Silene conica Silene latifolia SRA Silene latifolia Caryophyllaceae Silene vulgaris Silene vulgaris ALN Silene acaulis Silene paradoxa Agrostemma githago Eremogone hookeri Eremogone procera Spergularia media CARY7 Drymaria cordata Drymaria subumbellata CARY Illecebrum verticillatum Polycarpaea repens Herniaria latifolia Corrigiola litoralis Telephium imperati Stegnosperma halimifolium Stegnospermataceae Microtea debilis Microteaceae Physena madagascariensis Simmondsia chinensis Physenaceae Simmondsiaceae Rumex acetosa Rumex hastatulus Rumex palustris Rheum rhabarbarum Rheum nobile Reynoutria japonica 2 Reynoutria japonica Muehlenbeckia platyclada Fallopia convolvulus Polygonum aviculare Fagopyrum tataricum Polygonaceae Fagopyrum esculentum Persicaria minor NCORE1 Persicaria minor 2 Persicaria tinctoria Persicaria virginiana NCORE Pterostegia drymarioides Ruprechtia salicifolia NCORE2 Antigonon leptopus Limonium spectabile Plumbago auriculata Plumbaginaceae Reaumuria soongarica NCORE3 Reaumuria trigyna Tamarix hispida Tamaricaceae Frankenia laevis Frankeniaceae NCORE5 NCORE4 Aldrovanda vesiculosa Dionaea muscipula Drosera binata Droseraceae NCORE6 Nepenthes alata Nepenthes alata SFB Nepenthaceae Drosophyllum lusitanicum Drosophyllaceae 0.09

New Phytologist (2018) 217: 855–870 Ó 2017 The Authors www.newphytologist.com New Phytologist Ó 2017 New Phytologist Trust New Phytologist Research 867

Fig. 7 Species tree inferred by RAXML analysis of the supermatrix from modified phylomes. Polyploidy events are labeled according to Figs 2–6. When orthogroup tree topology vs Ks plots placed a polyploidy event on different branches as a result of either phylogenetic uncertainty or allopolyploidy, we placed the event at the most recent common ancestor of taxa that share a within-taxon Ks peak. and semiarid-adapted herbs and recognized as Tribe taxon sampling. Moving forward, genome and transcriptome Nyctagineae of Nyctaginaceae (Douglas & Manos, 2007; Dou- data will be essential for investigating selection, homeolog expres- glas & Spellenberg, 2010). Similarly, PORT1 at the base of Por- sion, gene silencing and loss in contributing to these divergence tulacineae is associated with the evolution of succulence (Nyffeler events following allopolyploidy. et al., 2008; Edwards & Ogburn, 2012; Ogburn & Edwards, 2013). Additional polyploidy events are inferred along the Improved homology inference methods improve polyploidy branch leading to Polygonaceae (Schuster et al., 2013) and the mapping branch leading to Droseraceae, a carnivorous family (Rivadavia et al., 2003). In the original phylome approach (Huerta-Cepas et al., 2011), Similar cases of polyploidy events at or near the base of major each sequence from a seed species was used to search against a clade origins include seed plants (Jiao et al., 2011), angiosperms database of sequenced genomes. The resulting homologous (Jiao et al., 2011), monocots (Jiao et al., 2014), early sequences were filtered and aligned, and phylogenetic trees were (Jiao et al., 2012), and Asteraceae (Barker et al., 2016; Huang constructed. In this study, we made three modifications to the et al., 2016). This hints at a potential relationship between original phylome approach to enhance the ability to accommo- genome duplication and evolutionary innovations (Soltis & date transcriptome data (Fig. S1). First, we merged putative Soltis, 2016). However, branches leading to major recognizable homolog groups that represent gene duplication within taxonomic units tend to be relatively long and thus had more Caryophyllales, ensuring that our final orthogroups are nonover- time to accumulate changes. Hence, while correlations between lapping. Second, given the presence of multiple transcript iso- polyploidy and evolutionary novelty are intriguing, we must be forms and the inherent incompleteness of transcriptome data cautious in assuming that polyploidy is the cause of such innova- sets, we used transcriptome sequences as queries to search against tion (Smith et al., 2017). the beet proteome for sorting transcriptome-derived sequences into backbone orthogroups constructed with genomes only. Lastly, to remove spurious tips and isoforms, we added tip- Inferring allopolyploidy from transcriptome data trimming and long-branch cutting procedures. By taking this Methods of polyploidy detection developed for genomes or low- two-step, baited approach we were able to process a large number copy nuclear loci are inadequate for data sets with isoforms and of taxa without going through the time-consuming all-by-all missing duplicated copies (Lott et al., 2009; Jones et al., 2013; homology search required by ORTHOMCL (Li et al., 2003) and Marcussen et al., 2015; Thomas et al., 2017). While we applied a ORTHOFINDER (Emms & Kelly, 2015). A second advantage of stringent filter to minimize missing taxa in orthogroups (no more this modified phylome approach is that it avoids a graph-based than two missing), differential gene loss or silencing following clustering step, and hence is not biased by sequence length or polyploidy events remained a problem. Given that our goal of phylogenetic relatedness among taxa (Emms & Kelly, 2015). accurately pinpointing the phylogenetic location of polyploidy The modified phylome approach is more effective than other events and searching for allopolyploidy is highly dependent on baited approaches such as HAMSTR (Ebersberger et al., 2009) in taxon sampling, we only explored cases when Ks vs orthogroup that it explicitly takes gene tree topology into account in distin- tree-based mapping disagreed with each other. guish orthologs from paralogs. However, because both the origi- Two allopolyploidy events were inferred in this study. We nal phylome and the modified methodology start with a seed inferred that the AMAR1 (Fig. 4) paleopolyploidy event followed genome, the resulting orthogroup set is dependent on the quality by a nested, more recent polyploidy event (AMAR2) together and gene content of the seed genome. were responsible for the observed Ks peaks, instead of a single, In addition to the modified phylome approach, to overcome deeper event as previously reconstructed (Yang et al., 2015). phylogenetic uncertainty associated with deep timescales we Schiedea also has a complex history (Fig. 5). While the polyploid employed a second homology inference strategy that inferred sub- origin of Schiedea was previously identified (Kapralov et al., clade species trees using all-by-all homology search and Markov 2009; Yang et al., 2015), by including its close relatives, clustering (van Dongen, 2000) of filtered hits, followed by align- Honckenya and Scleranthus, we show that all three each had their ment and tree trimming (Yang & Smith, 2014). We use the origi- own lineage-specific polyploidy event. Schiedea probably had a nal Markov clustering software MCL (van Dongen, 2000) instead parental lineage other than the lineage leading to Honckenya (see of software packages such as ORTHOMCL (Li et al., 2003) and schematic phylogram in Fig. 5). The putative allopolyploid origin ORTHOFINDER (Emms & Kelly, 2015) that aim to directly obtain of Schiedea adds to a growing list of Hawaiian endemic radiations orthogroups using filtered and normalized BLAST hits. The normal- with similar putative allopolyploid origins (Barrier et al., 1999; ization procedures used by these software packages were based on Yang & Berry, 2011; Marcussen et al., 2012; Roy et al., 2015), genome-derived data and were yet to be evaluated using transcrip- and demonstrates the importance of increased transcriptomic tome data sets that had isoforms, missing data, and assembly

Ó 2017 The Authors New Phytologist (2018) 217: 855–870 New Phytologist Ó 2017 New Phytologist Trust www.newphytologist.com New 868 Research Phytologist

errors. By using the original MCL with relatively low inflation value carried out laboratory work; Y.Y. analyzed data and led the writing. (i.e. coarse clusters) and taking advantage of outgroup information All authors edited the manuscript and approved the final version. to root and extract orthogroups, we were able to minimize the loss of gene duplication information in our data set. References Using our techniques, we found that, for each inferred poly- ploidy event, approximately one-third of genes show clear evi- Arakaki M, Christin PA, Nyffeler R, Lendel A, Eggli U, Ogburn RM, Spriggs E, dence of duplication (i.e. they retain at least two overlapping taxa Moore MJ, Edwards EJ. 2011. Contemporaneous and recent radiations of the world’s major lineages. Proceedings of the National Academy of between paralogs), similar to the numbers identified in both tran- Sciences, USA 108: 8379–8384. scriptomes and genomes (Yang et al., 2015). For example, the Arcila D, Ortı G, Vari R, Armbruster JW, Stiassny MLJ, Ko KD, Sabaj MH, PHYT2, AMAR1 and AMAR3 events follow this ‘one-third rule’ Lundberg J, Revell LJ, Betancur -RR. 2017. Genome-wide interrogation (Figs 2–6). When there is phylogenetic uncertainty, gene duplica- advances resolution of recalcitrant groups in the tree of life. Nature Ecology & tion events may be mapped to two adjacent nodes, each with Evolution 1: 0020. Barker MS, Li Z, Kidder TI, Reardon CR, Lai Z, Oliveira LO, Scascitelli M, lower percentages, such as observed for PORT1 and CARY1. Rieseberg LH. 2016. Most Compositae (Asteraceae) are descendants of a Percentages of gene duplication can be inflated during rapid paleohexaploid and all share a paleotetraploid ancestor with the Calyceraceae. diversifications, where short internodes and phylogenetic uncer- American Journal of 103: 1203–1211. tainty make it difficult to distinguish paralogous copies from iso- Barrier M, Baldwin BG, Robichaux RH, Purugganan MD. 1999. Interspecific forms using tree topology. Such inflated percentages of gene hybrid ancestry of a plant adaptive radiation: allopolyploidy of the Hawaiian silversword alliance (Asteraceae) inferred from floral homeotic gene duplication can be seen at the base of Cactaceae and Silene (with- duplications. Molecular Biology and Evolution 16: 1105–1113. out polyploidy), and at PHYT1 and NCORE1 (following a poly- Bell CD, Soltis DE, Soltis PS. 2010. The age and diversification of the ploidy event). angiosperms re-revisited. American Journal of Botany 97: 1296–1303. Moving forward, additional taxon sampling of genomes and Brockington SF, Alexandre R, Ramdial J, Moore MJ, Crawley S, Dhingra A, transcriptomes will be essential to identify additional polyploidy Hilu K, Soltis DE, Soltis PS. 2009. Phylogeny of the Caryophyllales sensu lato: revisiting hypotheses on pollination biology and differentiation in the events and pinpoint their phylogenetic locations. Understanding core Caryophyllales. International Journal of Plant Sciences 170: 627–643. the legacy of ancient polyploidy events in plant macroevolution Brockington SF, Yang Y, Gandia-Herrero F, Covshoff S, Hibberd JM, Sage RF, will require many other forms of improved data as well, including Wong GKS, Moore MJ, Smith SA. 2015. Lineage-specific gene radiations functional studies of traits, molecular pathways, and genes them- underlie the evolution of novel betalain pigmentation in Caryophyllales. New – selves. Only then will we have a more comprehensive functional Phytologist 207: 1170 1180. Brown JM, Thomson RC. 2017. Bayes factors unmask highly variable framework for understanding differential gene retention and information content, bias, and extreme influence in phylogenomic analyses. diploidization following polyploidy events, and how polyploidy Systematic Biology 66: 517–530. is linked to character evolution and niche shifts. Byng JW, Chase MW, Christenhusz MJM, Fay MF, Judd WS, Mabberley DJ, Sennikov AN, Soltis DE, Soltis PS, Stevens PF et al. 2016. An update of the Angiosperm Phylogeny Group classification for the orders and families of Acknowledgements flowering plants: APG IV. Botanical Journal of the Linnean Society 181:1–20. Cannon SB, McKain MR, Harkess A, Nelson MN, Dash S, Deyholos MK, The authors thank H. Flores Olvera, H. Ochoterena, N. Douglas, Peng Y, Joyce B, Stewart CN, Rolf M. 2015. Multiple polyploidy events in S. Lavergne, T. Stoughton, L. Crawford, G. Kadereit, R. Puente, the early radiation of nodulating and nonnodulating legumes. Molecular Biology L. Majure, M. Howard, D. Anderson, M. Palmer, and K. Thiele and Evolution 32: 193–210. ß~ for assisting with obtaining plant material; the Cambridge Univer- Caperta AD, Castro S, Loureiro J, Rois AS, Conceicao S, Costa J, Rhazi L, Espırito Santo D, Arsenio P. 2016. Biogeographical, ecological and ploidy sity Botanic Gardens, Bureau of Land Management, White Sands variation in related asexual and sexual Limonium taxa (Plumbaginaceae). Missile Range, US Forest Service, State Parks, Rancho Botanical Journal of the Linnean Society 183:75–93. Santa Ana Botanic Garden, Desert Botanical Garden, Millennium Chester M, Riley RK, Soltis PS, Soltis DE. 2015. Patterns of chromosomal Seed Bank, and Booderee National Park for granting access to their variation in natural populations of the neoallotetraploid Tragopogon mirus (Asteraceae). Heredity 114: 309–317. plant material; M. R. M. Marchan-Rivadeneira and M. Parks for help with laboratory work; L. Y. Chen for help revising the Cuenoud P, Savolainen V, Chatrou LW, Powell M, Grayer ReJ, Chase MW. 2002. Molecular phylogenetics of Caryophyllales based on nuclear 18S rDNA manuscript; and three anonymous reviewers for constructive sug- and plastid rbcL, atpB, and matK DNA sequences. American Journal of Botany gestions. The molecular work of this study was conducted in part 89: 132–144. in the Genomic Diversity Laboratory at the University of Michi- Dohm JC, Lange C, Holtgr€awe D, Sorensen€ TR, Borchardt D, Schulz B, gan. Support came from the University of Michigan, Oberlin Col- Lehrach H, Weisshaar B, Himmelbauer H. 2012. Palaeohexaploid ancestry for Caryophyllales inferred from extensive gene-based physical and genetic lege, the National Geographic Society, the US National Science mapping of the sugar beet genome (Beta vulgaris). Plant Journal 70: 528–540. Foundation (DEB 1054539, DEB 1352907, and DEB 1354048), Dohm JC, Minoche AE, Holtgrawe D, Capella-Gutierrez S, Zakrzewski F, and a Natural Environment Research Council Independent Tafer H, Rupp O, Sorensen TR, Stracke R, Reinhardt R et al. 2014. The Research Fellowship (NERC RG69516). genome of the recently domesticated crop plant sugar beet (Beta vulgaris). Nature 505: 546–549. van Dongen S. 2000. Graph clustering by flow simulation. PhD thesis, University Author contributions of Utrecht, Utrecht, the Netherlands. Douglas NA, Manos PS. 2007. Molecular phylogeny of Nyctaginaceae: S.A.S., S.F.B., M.J.M., and Y.Y. designed research; M.J.M., Y.Y., , biogeography, and characters associated with a radiation of J.M., J.O., S.F.B., and J.F.W. collected samples and xerophytic genera in North America. American Journal of Botany 94: 856–872.

New Phytologist (2018) 217: 855–870 Ó 2017 The Authors www.newphytologist.com New Phytologist Ó 2017 New Phytologist Trust New Phytologist Research 869

Douglas N, Spellenberg R. 2010. A new tribal classification of Nyctaginaceae. Kellogg EA. 2016. Has the connection between polyploidy and diversification Taxon 59: 905–910. actually been tested? Current Opinion in Plant Biology 30:25–32. Ebersberger I, Strauss S, von Haeseler A. 2009. HaMStR: profile hidden markov Kobert K, Salichos L, Rokas A, Stamatakis A. 2016. Computing the internode model based search for orthologs in ESTs. BMC Evolutionary Biology 9: 157. certainty and related measures from partial gene trees. Molecular Biology and Edger PP, Heidel-Fischer HM, Bekaert M, Rota J, Gl€ockner G, Platts AE, Evolution 33: 1606–1617. Heckel DG, Der JP, Wafula EK, Tang M et al. 2015. The butterfly plant Lai Z, Kane NC, Kozik A, Hodgins KA, Dlugosch KM, Barker MS, Matvienko arms-race escalated by gene and genome duplications. Proceedings of the M, Yu Q, Turner KG, Pearl SA et al. 2012. Genomics of Compositae weeds: National Academy of Sciences, USA 112: 8362–8366. EST libraries, microarrays, and evidence of introgression. American Journal of – Edwards EJ, Ogburn RM. 2012. Angiosperm responses to a low-CO2 World: Botany 99: 209 218. CAM and C4 photosynthesis as parallel evolutionary trajectories. International Li Z, Baniaga AE, Sessa EB, Scascitelli M, Graham SW, Rieseberg LH, Barker Journal of Plant Sciences 173: 724–733. MS. 2015. Early genome duplications in conifers and other seed plants. Science Emms DM, Kelly S. 2015. OrthoFinder: solving fundamental biases in whole Advances 1: e1501084. genome comparisons dramatically improves orthogroup inference accuracy. Li L, Stoeckert CJJ, Roos D. 2003. OrthoMCL: identification of ortholog groups Genome Biology 16: 157. for eukaryotic genomes. Genome Research 13: 2178–2189. Estep MC, McKain MR, Vela Diaz D, Zhong J, Hodge JG, Hodkinson TR, Lohaus R, Van de Peer Y. 2016. Of dups and dinos: evolution at the K/Pg Layton DJ, Malcomber ST, Pasquet R, Kellogg EA. 2014. Allopolyploidy, boundary. Current Opinion in Plant Biology 30:62–69. diversification, and the Miocene expansion. Proceedings of the Lott M, Spillner A, Huber KT, Moulton V. 2009. PADRE: a package for National Academy of Sciences, USA 111: 15149–15154. analyzing and displaying reticulate evolution. Bioinformatics 25: 1199–1200. Fawcett J, Maere S, Van de Peer Y. 2009. Plants with double genomes might L€oytynoja A, Goldman N. 2010. webPRANK: a phylogeny-aware multiple have had a better chance to survive the Cretaceous-Tertiary extinction event. sequence aligner with interactive alignment browser. BMC Bioinformatics 11: 579. Proceedings of the National Academy of Sciences, USA 106: 5737–5742. Mandakova T, Li Z, Barker MS, Lysak MA. 2017. Diverse genome organization Fishman L, Willis JH, Wu CA, Lee YW. 2014. Comparative linkage maps suggest following 13 independent mesopolyploid events in Brassicaceae contrasts with that fission, not polyploidy, underlies near-doubling of chromosome number convergent patterns of gene retention. Plant Journal 91:3–21. within monkeyflowers (Mimulus; Phrymaceae). Heredity 112:562–568. Marcet-Houben M, Gabaldon T. 2015. Beyond the whole-genome duplication: Fu L, Niu B, Zhu Z, Wu S, Li W. 2012. CD-HIT: accelerated for clustering the phylogenetic evidence for an ancient interspecies hybridization in the Baker’s next generation sequencing data. Bioinformatics 28: 3150–3152. yeast lineage. PLoS Biology 13: e1002220. Glick L, Mayrose I. 2014. ChromEvol: assessing the pattern of chromosome Marcussen T, Heier L, Brysting AK, Oxelman B, Jakobsen KS. 2015. From number evolution and the inference of polyploidy along a phylogeny. Molecular gene trees to a dated allopolyploid network: insights from the angiosperm Biology and Evolution 31: 1914–1922. genus Viola (Violaceae). Systematic Biology 64:84–101. Hodgins KA, Lai Z, Oliveira LO, Still DW, Scascitelli M, Barker MS, Kane Marcussen T, Jakobsen KS, Danihelka J^o, Ballard HE, Blaxland K, Brysting NC, Dempewolf H, Kozik A, Kesseli RV et al. 2014. Genomics of AK, Oxelman B. 2012. Inferring species networks from gene trees in high- Compositae crops: reference transcriptome assemblies and evidence of polyploid North American and Hawaiian violets (Viola, Violaceae). Systematic hybridization with wild relatives. Molecular Ecology Resources 14: 166–177. Biology 61: 107–126. Huang CH, Zhang C, Liu M, Hu Y, Gao T, Qi J, Ma H. 2016. Multiple Mayrose I, Barker MS, Otto SP. 2010. Probabilistic models of chromosome polyploidization events across Asteraceae with two nested events in the early number evolution and the inference of polyploidy. Systematic Biology 59: 132– history revealed by nuclear phylogenomics. Molecular Biology and Evolution 33: 144. 2820–2835. Mayrose I, Zhan S, Rothfels C, Magnuson-Ford K, Barker M, Rieseberg L, Huerta-Cepas J, Capella-Gutierrez S, Pryszcz LP, Denisov I, Kormes D, Otto S. 2011. Recently formed polyploid plants diversify at lower rates. Science Marcet-Houben M, Gabaldon T. 2011. PhylomeDB v3.0: an expanding 333: 1257. repository of genome-wide collections of trees, alignments and phylogeny- Mirarab S, Nguyen N, Warnow T. 2014a. PASTA: ultra-large multiple sequence based orthology and paralogy predictions. Nucleic Acids Research 39: D556– alignment. In: Sharan R, ed. RECOMB 2014, LNBI 8394. Basel, Switzerland: D560. Springer International Publishing, 177–191. Jaillon O, Aury J, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Mirarab S, Reaz R, Bayzid MS, Zimmermann T, Swenson MS, Warnow T. Aubourg S, Vitulo N, Jubin C et al. 2007. The grapevine genome sequence 2014b. ASTRAL: genome-scale coalescent-based species tree estimation. suggests ancestral hexaploidization in major angiosperm phyla. Nature 449: Bioinformatics 30: i541–i548. 463–467. Mirarab S, Warnow T. 2015. ASTRAL-II: coalescent-based species tree Jiao Y, Leebens-Mack J, Ayyampalayam S, Bowers J, McKain M, McNeal J, Rolf estimation with many hundreds of taxa and thousands of genes. Bioinformatics M, Ruzicka D, Wafula E, Wickett N et al. 2012. A genome triplication 31:44–52. associated with early diversification of the core eudicots. Genome Biology 13:R3. Moore MJ, Soltis PS, Bell CD, Burleigh JG, Soltis DE. 2010. Phylogenetic Jiao Y, Li J, Tang H, Paterson AH. 2014. Integrated syntenic and phylogenomic analysis of 83 plastid genes further resolves the early diversification of eudicots. analyses reveal an ancient genome duplication in monocots. Plant Cell 26: Proceedings of the National Academy of Sciences, USA 107: 4623–4628. 2792–2802. Nyffeler R, Eggli U. 2010. Disintegrating Portulacaceae: a new familial Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, classification of the suborder Portulacineae (Caryophyllales) based on molecular Tomsho LP, Hu Y, Liang HY, Soltis PS et al. 2011. Ancestral polyploidy in and morphological data. Taxon 59: 227–240. seed plants and angiosperms. Nature 473:97–100. Nyffeler R, Eggli U, Ogburn M, Edwards E. 2008. Variations on a theme: Jones G, Sagitov S, Oxelman B. 2013. Statistical inference of allopolyploid repeated evolution of succulent life forms in the Portulacineae (Caryophyllales). species networks in the presence of incomplete lineage sorting. Systematic Haseltonia 14:26–36. Biology 62: 467–478. Ogburn RM, Edwards EJ. 2013. Repeated origin of three-dimensional Kane NC, King MG, Barker MS, Raduski A, Karrenberg S, Yatabe Y, Knapp SJ, venation releases constraints on the evolution of succulence in plants. Current Rieseberg LH. 2009. Comparative genomic and population genetic analyses Biology 23: 722–726. indicate highly porous genomes and high levels of gene flow between divergent Price MN, Dehal PS, Arkin AP. 2010. FastTree 2, approximately maximum- Helianthus species. Evolution 63: 2061–2075. likelihood trees for large alignments. PLoS ONE 5: e9490. Kapralov MV, Stift M, Filatov DA. 2009. Evolution of genome size in Hawaiian Rivadavia F, Kondo K, Kato M, Hasebe M. 2003. Phylogeny of the sundews, endemic genus Schiedea (Caryophyllaceae). Tropical Plant Biology 2:77–83. Drosera (Droseraceae), based on chloroplast rbcL and nuclear 18S ribosomal Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software DNA sequences. American Journal of Botany 90: 123–130. version 7: improvements in performance and usability. Molecular Biology and Rognes T. 2011. Faster Smith-Waterman database searches with inter-sequence Evolution 30: 772–780. SIMD parallelisation. BMC Bioinformatics 12: 221.

Ó 2017 The Authors New Phytologist (2018) 217: 855–870 New Phytologist Ó 2017 New Phytologist Trust www.newphytologist.com New 870 Research Phytologist

Roy T, Cole LW, Chang T-H, Lindqvist C. 2015. Untangling reticulate Wood TE, Takebayashi N, Barker MS, Mayrose I, Greenspoon PB, Rieseberg evolutionary relationships among New World and Hawaiian mints LH. 2009. The frequency of polyploid speciation in vascular plants. Proceedings (Stachydeae, Lamiaceae). Molecular Phylogenetics and Evolution 89:46–62. of the National Academy of Sciences, USA 106: 13875–13879. Salichos L, Stamatakis A, Rokas A. 2014. Novel information theory-based Xiang Y, Huang CH, Hu Y, Wen J, Li S, Yi T, Chen H, Xiang J, Ma H. 2016. measures for quantifying incongruence among phylogenetic trees. Molecular Evolution of Rosaceae fruit types based on nuclear phylogeny in the context of Biology and Evolution 31: 1261–1271. geological times and genome duplication. Molecular Biology and Evolution 34: Sch€aferhoff B, Muller€ KF, Borsch T. 2009. Caryophyllales phylogenetics: 262–281. disentangling Phytolaccaceae and Molluginaceae and description of Yagi M, Kosugi S, Hirakawa H, Ohmiya A, Tanase K, Harada T, Kishimoto K, Microteaceae as a new isolated family. Willdenowia 39: 209–228. Nakayama M, Ichimura K, Onozaki T et al. 2014. Sequence analysis of the Schuster TM, Setaro SD, Kron KA. 2013. Age estimates for the buckwheat genome of carnation (Dianthus caryophyllus L.). DNA Research 21: 231–241. family Polygonaceae based on sequence data calibrated by fossils and with a Yang Y, Berry PE. 2011. Phylogenetics of the Chamaesyce clade (Euphorbia, focus on the amphi-Pacific Muehlenbeckia. PLoS ONE 8: e61261. Euphorbiaceae): reticulate evolution and long-distance dispersal in a prominent – Seo T-K. 2008. Calculating bootstrap probabilities of phylogeny using multilocus C4 lineage. American Journal of Botany 98: 1486 1503. sequence data. Molecular Biology and Evolution 25: 960–971. Yang Y, Moore MJ, Brockington SF, Soltis DE, Wong GK-S, Carpenter EJ, Seo T-K, Kishino H, Thorne JL. 2005. Incorporating gene-specific variation Zhang Y, Chen L, Yan Z, Xie Y et al. 2015. Dissecting molecular evolution in when inferring and evaluating optimal evolutionary tree topologies from the highly diverse plant clade Caryophyllales using transcriptome sequencing. multilocus sequence data. Proceedings of the National Academy of Sciences, USA Molecular Biology and Evolution 32: 2001–2014. 102: 4436–4441. Yang Y, Moore MJ, Brockington SF, Timoneda A, Feng T, Marx H, Walker JF, Shen X-X, Hittinger CT, Rokas A. 2017. Contentious relationships in Smith SA. 2017. An efficient field and laboratory workflow for plant phylogenomic studies can be driven by a handful of genes. Nature Ecology & phylotranscriptomic projects. Applications in Plant Sciences 5: 1600128. Evolution 1: 0126. Yang Y, Smith SA. 2014. Orthology inference in non-model organisms using Smith SA, Brown JW, Yang Y, Brockington SF, Bruenn R, Drummond CP, transcriptomes and low-coverage genomes: improving accuracy and matrix Walker JF, Last N, Douglas NA, Moore MJ. 2017. Disparity, diversity, and occupancy for phylogenomics. Molecular Biology and Evolution 31: 3081–3092. duplications in the Caryophyllales. New Phytologist. doi: 10.1111/nph.14772. Smith SA, Dunn CW. 2008. Phyutility: a phyloinformatics tool for trees, alignments and molecular data. Bioinformatics 24: 715–716. Supporting Information Smith SA, Moore MJ, Brown JW, Yang Y. 2015. Analysis of phylogenomic Additional Supporting Information may be found online in the datasets reveals conflict, concordance, and gene duplications with examples from animals and plants. BMC Evolutionary Biology 15: 150. Supporting Information tab for this article: Soltis PS, Marchant DB, Van de Peer Y, Soltis DE. 2015. Polyploidy and genome evolution in plants. Current Opinion in Genetics & Development 35:119–125. Fig. S1 Workflow for the modified phylome approach. Soltis PS, Soltis DE. 2016. Ancient WGD events as drivers of key innovations in – angiosperms. Current Opinion in Plant Biology 30: 159 165. Fig. S2 Maximum quartet support species tree (MQSST) from Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post- analysis of large phylogenies. Bioinformatics 30: 1312–1313. ASTRAL analysis of individual subclade orthologous gene trees. Stamatakis A. 2016. The RAxML v8.2.X manual. [WWW document] URL https://github.com/stamatak/standard-RAxML/tree/master/manual [accessed 4 Fig. S3 Phylogram from RAXML analysis of the concatenated March 2017]. 624-gene supermatrix from modified phylomes, with ICA scores Steige KA, Slotte T. 2016. Genomic legacies of the progenitors and the on branches. evolutionary consequences of allopolyploidy. Current Opinion in Plant Biology 30:88–93. Tank DC, Eastman JM, Pennell MW, Soltis PS, Soltis DE, Hinchliff CE, Fig. S4 Maximum quartet support species tree (MQSST) from Brown JW, Sessa EB, Harmon LJ. 2015. Nested radiations and the pulse of ASTRAL analysis of 624 orthologous gene trees from modified angiosperm diversification: increased diversification rates often follow whole phylomes. genome duplications. New Phytologist 207: 454–467. Thomas GWC, Ather SH, Hahn MW. 2017. Gene-tree reconciliation with MUL-trees to resolve polyploidy events. Systematic Biology. doi: 10.1093/ Fig. S5 Proportion of orthogroups showing duplications filtered sysbio/syx1044. [Epub ahead of print.] by local tree topology. Thulin M, Moore AJ, El-Seedi H, Larsson A, Christin P-A, Edwards EJ. 2016. Phylogeny and generic delimitation in Molluginaceae, new pigment data in Fig. S6 Distribution of within-taxon synonymous distances (Ks) – Caryophyllales, and the new family Corbichoniaceae. Taxon 65: 775 793. among paralogs gene pairs. Vanneste K, Baele G, Maere S, Van de Peer Y. 2014a. Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous-Paleogene boundary. Genome Research 24: 1334–1347. Table S1 Sources of data and settings for assembly and transla- Vanneste K, Maere S, Van de Peer Y. 2014b. Tangled up in two: a burst of tion genome duplications at the end of the Cretaceous and the consequences for plant evolution. Philosophical Transactions of the Royal Society of London. Series Table S2 Information for the 43 newly sequenced transcrip- B, Biological sciences 369: 20130353. Vanneste K, Van de Peer Y, Maere S. 2013. Inference of genome tomes duplications from age distributions revisited. Molecular Biology and Evolution 30: 177–190. Methods S1 All-by-all homology search for subclade data sets. Walker JF, Brown JW, Smith SA. 2017a. Analyzing contentious relationships and outlier genes in phylogenomics. bioRxiv. doi: 10.1101/115774. Please note: Wiley Blackwell are not responsible for the content Walker JF, Yang Y, Moore MJ, Mikenas J, Timoneda A, Brockington SF, Smith SA. 2017b. Widespread paleopolyploidy, gene tree conflict, and recalcitrant or functionality of any Supporting Information supplied by the relationships among the carnivorous Caryophyllales. American Journal of authors. Any queries (other than missing material) should be Botany 104: 858–867. directed to the New Phytologist Central Office.

New Phytologist (2018) 217: 855–870 Ó 2017 The Authors www.newphytologist.com New Phytologist Ó 2017 New Phytologist Trust