
Research

Conflicting phylogenomic signals reveal a pattern of reticulate evolution in a recent high-Andean diversification (Asteraceae: Astereae: Diplostephium)

Oscar M. Vargas1,2, Edgardo M. Ortiz1 and Beryl B. Simpson1

1Integrative Biology and Plant Resources Center, The University of Texas at Austin, Austin, TX 78712, USA; 2Department of Ecology and Evolutionary Biology, University of Michigan, 830 N. University Ave, Ann Arbor, MI 48109, USA

Author for correspondence: Oscar M. Vargas
Tel: +1 512 300 9236
Email: [email protected]

Received: 15 January 2017
Accepted: 19 February 2017

New Phytologist (2017) 214: 1736–1750
doi: 10.1111/nph.14530

Key words: ddRAD, Diplostephium, genome skimming, hybridization, introgression, phylogenomics, rapid diversification, reticulate evolution.

Summary

High-throughput sequencing is helping biologists to overcome the difficulties of inferring the phylogenies of recently diverged taxa. The present study analyzes the phylogenetic signal of genomic regions with different inheritance patterns using genome skimming and ddRAD-seq in a species-rich Andean genus (Diplostephium) and its allies.

We analyzed the complete nuclear ribosomal cistron, the complete chloroplast genome, a partial mitochondrial genome, and a nuclear-ddRAD matrix separately with phylogenetic methods. We applied several approaches to understand the causes of incongruence among datasets, including simulations and the detection of introgression using the D-statistic (ABBA-BABA test).

We found significant incongruence among the nuclear, chloroplast, and mitochondrial phylogenies. The strong signal of hybridization found by simulations and the D-statistic among genera and inside the main clades of Diplostephium indicates reticulate evolution as a main cause of phylogenetic incongruence.

Our results add evidence for a major role of reticulate evolution in events of rapid diversification. Hybridization and introgression confound chloroplast and mitochondrial phylogenies in relation to the species tree as a result of the uniparental inheritance of these genomic regions. Practical implications regarding the prevalence of hybridization are discussed in relation to the phylogenetic method.

Introduction

Rapid diversifications usually occur in landscapes like archipelagos and mountain ranges that provide disjunct terranes suitable for speciation via isolation and ecological divergence (Givnish, 1997). The Andean Cordillera is one of these landscapes in which numerous plant groups have experienced high rates of speciation during and after its recent uplift (Madriñán et al., 2013; Luebert & Weigend, 2014; Hughes & Atchison, 2015). Phylogenies of Andean taxa based on Sanger sequencing often suffer from lack of resolution and support, especially in crown clades where accelerated speciation occurred (e.g. Emshwiller, 2002; Rauscher, 2002; Sanchez-Baracaldo, 2004; Bell & Donoghue, 2005; Hughes & Eastwood, 2006; Nürk et al., 2013; Zapata, 2013). Poorly resolved relationships within Andean taxa confirm the prediction that phylogenies resulting from rapid radiations are difficult to estimate because of low and conflicting signal caused by short internodes (Whitfield & Lockhart, 2007). High-throughput sequencing, a technology capable of producing orders of magnitude more data than obtained by Sanger sequencing, has created new opportunities for overcoming the difficulties of working with recently diversified taxa (Bock et al., 2014; Ma et al., 2014; Mort et al., 2015). Here, we implement genome skimming (Straub et al., 2012) and double-digest restriction site-associated DNA sequencing (ddRAD; Peterson et al., 2012) to infer the phylogenetic patterns of one of the most species-rich genera of Andean Asteraceae and its relatives.

Diplostephium is a main component of the tropical high Andean flora. The genus traditionally comprises 111 species (Vargas, 2011) characterized by a woody habit (from 10 cm decumbent subshrubs to 10 m tall trees) and radiate capitula with white to purple rays (Blake, 1928; Cuatrecasas, 1969; Vargas & Madriñán, 2006). Diplostephium inhabits the high elevations of the Talamanca Cordillera (Costa Rica), the Northern Andes (Venezuela–Colombia–Ecuador), and the Central Andes (Peru–Bolivia). Most species of the genus (c. 60) inhabit the paramo, a northern Andean ecosystem known for its high plant diversity (Luteyn, 1999), island-like geography (Simpson, 1974), and a large number of species-rich genera (Madriñán et al., 2013; Luebert & Weigend, 2014). In addition to the paramo, some Diplostephium species inhabit the Central Andean puna and the upper limit of the high Andean forest. Diplostephium belongs to

New Phytologist (2017) 214: 1736–1750. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust. www.newphytologist.com

Astereae, where it has traditionally been classified as part of the Chiliotrichum group, a subset of the subtribe Hinterhuberinae (Nesom, 1994; Nesom & Robinson, 2007). Molecular phylogenies have shown that Astereae subtribes and their subdivisions are polyphyletic (Noyes & Rieseberg, 1999; Sancho & Karaman-Castro, 2008; Brouillet et al., 2009; Karaman-Castro & Urbatsch, 2009; Sancho et al., 2010; Vargas & Madriñán, 2012) and need to be recircumscribed. Diplostephium is positioned in the ‘South American lineages’ grade (Brouillet et al., 2009), but its ambiguous position in the Astereae phylogeny (Noyes & Rieseberg, 1999; Brouillet et al., 2009; Karaman-Castro & Urbatsch, 2009; Vargas & Madriñán, 2012) has obscured its place among its closest genera. The low molecular variation found by Vargas & Madriñán (2012) in Diplostephium highlighted the problems of inferring the phylogeny of a rapidly diversified taxon with limited sampling using Sanger sequencing.

Genome skimming, or shallow shotgun sequencing, is an approach in which whole genomic DNA is sequenced in order to recover high-copy DNA regions suitable for phylogenetic analyses (Straub et al., 2012). The biparentally inherited (Volkov et al., 2007) complete nuclear ribosomal DNA has proved to be a useful marker for inferring species-level phylogenies (Linder et al., 2000; Straub et al., 2012; Bock et al., 2014). The mostly nonrecombinant and uniparentally inherited chloroplast DNA (Birky, 1995; Jansen & Ruhlman, 2012) has been used to reconstruct the phylogeny of Asteraceae at the tribal, generic, and species levels (Kim et al., 2005; Panero & Funk, 2008; Bock et al., 2014; Panero et al., 2014). Finally, the mostly nonrecombinant uniparentally inherited mitochondrial DNA (Birky, 1995) has been employed to infer the evolutionary history of angiosperms at the family and order levels (Qiu et al., 2010; Sun et al., 2015) and, more recently, to study phylogenetic patterns among species (Bock et al., 2014). Restriction site-associated DNA sequencing (RAD-Seq, e.g. ddRAD, GBS, etc.), on the other hand, is a technique that surveys hundreds or thousands of loci mainly from the nuclear genome (Peterson et al., 2012).

The comparison among markers with different inheritance patterns has the potential to elucidate hybridization and introgression events (Rieseberg & Soltis, 1991; Hardig et al., 2000; Bock et al., 2014; Sun et al., 2015) hypothesized to play an important role in cases of evolutionary bursts of speciation (Anderson & Stebbins, 1954; Seehausen, 2004). Because rapid divergence happens during a small window of time, it is expected that the descendants of a diversification burst are prone to interbreed before reproductive barriers develop. This expectation has been hypothesized as beneficial because hybridization among nascent lineages can result in genotypes potentially adapted to unexploited niches (Anderson & Stebbins, 1954; Seehausen, 2004). Diplostephium is thus an excellent model to test such ideas, as the genus appeared to have undergone a recent diversification associated with the uplift of the Andes (Vargas & Madriñán, 2012).

In this study, we focused on uncovering the phylogenetic patterns of Diplostephium and its allied genera. The aims of this paper are to compare the phylogenetic signal among nuclear-ddRAD, nuclear ribosomal, chloroplast, and mitochondrial DNA; and to test for infra- and intergeneric introgressive hybridization in Diplostephium and its Andean relatives.

Materials and Methods

Assembly of the genome skimming datasets

A total of 91 samples were sequenced. The ingroup contained 74 samples of Diplostephium (69 species, c. 62% of the total number of species) and 14 samples from 13 allied genera in Astereae (Supporting Information Table S1). We chose ingroup and outgroup genera based on their phylogenetic positions in Astereae inferred by Brouillet et al. (2009). The majority of the samples (63) were collected in the field where leaf tissue was dried using silica gel. The remaining samples were taken from herbarium specimens deposited in ANDES, F, FMB, HUA, HUSA, TEX, US and USM (Table S1).

We performed total genomic DNA extractions with the DNeasy Plant Mini Kit (Qiagen) following the manufacturer’s protocol. To increase the yield of DNA from herbarium material, we added 50 μl of proteinase K (Qiagen; activity > 600 mAU ml⁻¹) to the lysis solution and incubated it overnight at 45°C after the initial 10 min incubation. Standard genomic DNA Illumina paired-end libraries with an average fragment size of c. 400 bp were prepared at the Genomic Sequencing and Analysis Facility (GSAF) at The University of Texas at Austin (UT) and then sequenced using an Illumina HiSeq 2500 (Illumina Inc., San Diego, CA, USA). The sequencing targeted 10 million paired-end reads (2 × 100 bp length) per sample. We inspected the quality of the reads with the program FASTQC v.0.10.1 (Andrews, 2010) and filtered the raw data with the program ‘process_shortreads’ of the software package STACKS v.1.20 (Catchen et al., 2011). The command discarded any reads with uncalled bases and trimmed terminal nucleotides that averaged a Phred score of ≤ 10 (90% confidence in base calls) over a sliding window of 10 bp.

We employed a two-step strategy to create three subsets of reads per sample: nuclear ribosomal, chloroplast, and mitochondrial. By including only the necessary subsets of genomic data to perform the assemblies, we aimed to reduce noise and increase the accuracy and computational efficiency in our downstream analyses. Our first step consisted of performing a de novo assembly of each sample using all the genomic data. The software RAY v.2.3.1 (Boisvert et al., 2012) performed the assembly using three different k-mer values visually selected from the report graph provided by KMERGENIE (Chikhi & Medvedev, 2014). The contigs resulting from the RAY assembly were parsed with the ‘-search’ function of RAY using Helianthus annuus L. nuclear ribosomal, chloroplast, and mitochondrial reference sequences (GenBank accessions HM638217.1, NC_007977.1, and KF815390.1, respectively); this step provided a subset of RAY contigs. The second step consisted of separating the genomic reads into nuclear ribosomal, chloroplast, and mitochondrial subsets by mapping all reads against the RAY contigs produced after the first step. For example, whole-genome reads of Diplostephium haenkei were mapped against their own RAY chloroplast contigs to obtain a


chloroplast subset of reads. BOWTIE v.2.2.3 (Langmead & Salzberg, 2012) was used to perform the mapping. Then, we analyzed with different pipelines the three subsets of reads obtained per sample to obtain the corresponding sequence alignments (Fig. S1).

Owing to the intraindividual polymorphic nature of the nuclear ribosomal DNA, we assembled these sequences in two steps. First, we created a de novo draft assembly with the nuclear ribosomal reads subset in each sample using SPADES v.3.5.0 (Bankevich et al., 2012) with the ‘-careful’ option, and 21, 33, 55, 77 and 89 as k-mer sizes. The nuclear ribosomal contigs obtained by SPADES were merged in GENEIOUS v.7.1.4 (Kearse et al., 2012), allowing overlapping regions of two contigs to assemble with some mismatches by using the ‘medium sensitivity’ setting. Taking into account the fact that the nuclear ribosomal DNA is arranged in tandem repeats, the complete nuclear ribosomal contig produced by GENEIOUS was treated as circular, aiming to increase the accuracy of the back-mapping. The first nucleotide position of the circular contig was set at the first base right (5′ to 3′ direction) of the TATAGGGGG promoter found at the end of the nontranscribed spacer (NTS) (Linder et al., 2000). Second, we back-mapped the nuclear ribosomal reads to their circular draft assembly in GENEIOUS using a minimum overlap identity of 95%. From the back-mapping obtained for each sample, we calculated three consensus sequences per sample using different threshold percentages (50%, 75% and 90%; Fig. S1). By doing this, we were able to account for the different levels of intraindividual polymorphisms found in the nuclear ribosomal tandem repeats and evaluate their effects on phylogenetic estimation. The 50% consensus only called a nucleotide in a polymorphic position if the nucleotide was present in > 50% of the reads containing that position. Therefore, the 50% consensus sequence set had fewer ambiguities than the 75% and 90% consensus sequences. All the 273 nuclear ribosomal sequences (three per sample) were aligned with MAFFT v.7.017 (Katoh et al., 2002) into a master alignment. We corrected the master alignment by hand using GENEIOUS. Finally, we extracted from the master alignment the three matrices corresponding to the three percentage thresholds (nr50, nr75 and nr90). Each matrix was analyzed separately.

A de novo assembly for each sample’s chloroplast reads subset was performed with SPADES with the same options used for the nuclear ribosomal assembly. First, we annotated the chloroplast genome of D. haenkei using DOGMA (Wyman et al., 2004) with subsequent manual correction in GENEIOUS using Guizotia abyssinica (L.f.) Cass. as a reference (GenBank accession EU549769.1). Then, we used the chloroplast of D. haenkei to merge and annotate the chloroplast-SPADES contigs of the remaining 90 samples using GENEIOUS (Fig. S1). We codified the gaps produced by nonoverlapping contigs as missing data. We employed MAFFT to create the chloroplast genome alignment that was later corrected by hand in GENEIOUS. For the phylogenetic analysis, we only included one inverted repeat in our alignment and manually removed three unalignable regions of c. 2300 sites in total.

Owing to the high degree of rearrangements and the discontinuity found in the mitochondrial assemblies performed with SPADES, it was not feasible to make a direct alignment of our mitochondrial genomes. Instead, we de novo assembled the mitochondrial genome of Diplostephium hartwegii and then used this genome as a reference for assembly by mapping the remaining 90 samples. We chose D. hartwegii based on the continuity and the high coverage of its mitochondrial contig obtained by RAY. We expected all the mitochondrial assemblies with the exception of D. hartwegii to have missing data at the boundaries of DNA blocks where rearrangements occurred relative to D. hartwegii. With this strategy, we intended to obtain a gapped mitochondrial genome assembly for each sample in which the order of the mitochondrial blocks matched that of D. hartwegii, making their alignment feasible. We employed SPADES (with the same parameters used for the nuclear ribosomal assemblies) and MITOFY v.1.3.1 (Alverson et al., 2010) to assemble and annotate, respectively, the mitochondrial genome of D. hartwegii. We corrected by hand our MITOFY annotation using GENEIOUS based on that of Cucumis sativus L. (Alverson et al., 2011). Before mapping, we filtered the mitochondrial reads of each sample because we detected a small proportion of chloroplast reads mixed in. The intraindividual presence of chloroplast reads in the mitochondrial subset is explained by the similarity of some tRNAs and the transference of DNA between these two genomes (Alverson et al., 2011). To filter out the chloroplast reads from the mitochondrial reads subsets, we mapped every sample’s mitochondrial reads subset against its de novo chloroplast genome assembled in this study (e.g. the mitochondrial reads of D. colombianum were mapped to the chloroplast genome of D. colombianum) using a 97% minimum overlap identity in GENEIOUS. The reads not mapped to the chloroplast genome were mapped to the mitochondrial reference (D. hartwegii) to create a consensus sequence per sample (Fig. S1) using the ‘medium sensitivity’ setting in GENEIOUS. MAFFT was used to perform the alignment of the 91 mitochondrial sequences. We visually inspected the matrix in GENEIOUS, and regions difficult to align, or with significant amounts of missing data, were excluded from the matrix. We also removed the mitochondrial rRNA genes rrn5, rrnL, and rrnS from the alignment because we found bacterial DNA matching some hyperconserved regions of these genes. We suspect that the bacterial DNA in our dataset came from bacteria living on the leaf surfaces of our plant samples.

Assembly of the nuclear-ddRAD dataset

The RAD-Seq protocol known as ddRAD (Peterson et al., 2012) was used to prepare a reference library that included a subset of 44 samples (Table S1) to which we could map all the unused nuclear reads that resulted from genome skimming. The restriction enzymes EcoRI and SphI were used to digest total genomic DNA, with a size selection range for the resulting fragments (excluding adapters) of 354–414 bp. Library sequencing, carried out using an Illumina HiSeq 4000 at the UT-GSAF, produced c. 83 million 2 × 150 bp paired-end reads. The reference library

was demultiplexed with the software DEML v.1 (Renaud et al., 2014) which allowed us to confidently assign > 99% of the reads to their corresponding samples while tolerating up to two mismatches in the barcodes. The sequence quality assessed using FASTQC v.0.11.5 (Andrews, 2010) revealed low quality over the restriction overhang regions and a large portion of the R2 reads. Therefore, we excluded R2 reads from subsequent analyses and trimmed the restriction overhang of the R1 files, as well as 40 nucleotides from the end of the reads, to obtain sequences of 100 bp using the script ‘reformat.sh’ from the toolkit BBTOOLS v.36.38 (Bushnell, 2016). To guarantee that the assembly was done using only nuclear reads, we used the script ‘bbsplit.sh’ from BBTOOLS to create sets of reads that did not match the nuclear ribosomal, chloroplast, or mitochondrial sequences of our ingroup, or the Coliphage phiX174 viral genome (GenBank NC_001422.1) used in Illumina runs to increase nucleotide diversity. The demultiplexed, trimmed, and filtered reads were used as input for the software IPYRAD v.0.5.1 (Eaton & Overcast, 2016) in which the assembly was performed with the parameters specified in Methods S1. This reference ddRAD library contained 32 424 loci. A custom script ‘loci_sampler.py’ (https://github.com/edgardomortiz/radscripts.git) was used to select the most common sequence of each aligned locus, producing a collection of unique sequences each representing a different locus, hereafter called the ddRAD reference.

In order to expand the taxon sampling to the full set of species and to increase the coverage of samples per locus, we designed a pipeline called ‘SHOTGUN2RAD’ (https://github.com/edgardomortiz/shotgun2rad.git) to map genome skimming reads to our ddRAD reference (or any other type of RAD data) and produce multiple loci alignments across species. The pipeline performs quality filtering and adapter removal with CUTADAPT v.1.12 (Martin, 2011), merges the quality-filtered pairs that overlap using VSEARCH v.2.0.3 (Rognes et al., 2016), maps the merged and unmerged reads to the ddRAD reference using BWA v.0.7.12 (Li & Durbin, 2009), and parses the resulting BWA alignments to match the ‘CLUSTS.GZ’ format produced by step 3 (within-sample clustering) of PYRAD v.3.0.66 (Eaton, 2014). From that point, SHOTGUN2RAD runs PYRAD steps 4–7 to produce loci alignments across species and a variety of matrix formats for phylogenetic analyses. The parameters used for SHOTGUN2RAD and PYRAD are indicated in Methods S2 and S3. Two datasets were derived from our ddRAD pipeline, a loci dataset (nuclear-ddRAD-loci) and a matrix containing only variable sites (nuclear-ddRAD). All computational processes related to the assembly of matrices were carried out at the Texas Advanced Computing Center at UT (http://www.tacc.utexas.edu).

Phylogenetic analyses

We performed independent phylogenetic analyses for each genome skimming dataset using Bayesian inference (BI) and maximum likelihood (ML). We evaluated a nonpartitioned (M0) vs a partitioned model (M1). The M1 of the chloroplast and mitochondrial matrices contained a coding and a noncoding partition. The M1 of the nuclear ribosomal DNA dataset contained three partitions: rRNA, transcribed spacers (the external transcribed spacer (ETS), the ITS), and the nontranscribed spacer (NTS). We compared M0 and M1 for each dataset by calculating the stepping stone marginal likelihood (Xie et al., 2011) of both model schemes with MRBAYES v.3.2.2 (Ronquist et al., 2012) on the CIPRES Science Gateway Server (Miller et al., 2010) using 10 million generations, two runs, four chains per run, and assuming a GTR+Γ model. The stepping stone marginal likelihood values were compared using Bayes factors (Fan et al., 2011). To calculate the model of evolution of the partitions, we employed a mixed strategy. First, we inferred the use of +Γ and +I parameters in our partitions using the corrected Akaike information criterion (AICc) (Hurvich & Tsai, 1989) employed in JMODELTEST v.2.1.7 (Guindon & Gascuel, 2003; Darriba et al., 2012). Then, we calculated the substitution parameters among nucleotides with reversible jump Markov chain Monte Carlo (rjMCMC) simulations (Huelsenbeck et al., 2004) using MRBAYES with 10 million generations, two runs, four chains per run, and the +Γ and/or +I parameters if suggested by the AICc. A final MRBAYES analysis with 10 million generations, two runs, and four chains per run was performed using the best partition model along with the substitution parameters inferred for each dataset. We performed the ML analyses with RAXML v.8.1.11 (Stamatakis, 2014) in the CIPRES portal using the partitioned schemes favored, 100 rapid bootstrap replicates, and the GTR+Γ model of evolution (as recommended by the RAXML manual; Stamatakis, 2015). For all Bayesian analyses we used a burn-in fraction of 0.25. We employed TRACER v.1.6.0 (Rambaut et al., 2014) to confirm the convergence of parallel MCMC runs.

We analyzed the nuclear-ddRAD matrix with ML using the RAXML-HPC2 Workflow available in CIPRES (Miller et al., 2010). Two separate analyses were performed, one consisting of 100 independent runs and one consisting of 100 thorough bootstraps, both run with the Ascertainment Bias option enabled. The 100 independent runs were summarized into a majority-rule consensus tree with ‘SUMTREES.PY’ v.4.10 (Sukumaran & Holder, 2010a) available in the package DENDROPY v.4.10 (Sukumaran & Holder, 2010b). Support was obtained from the 100 RAXML thorough bootstraps using ‘SUMTREES.PY’. The same matrix was analyzed with SVDQUARTETS (Chifman & Kubatko, 2014) implemented in PAUP* v.4.0a150 (Swofford, 2002) using all quartets and performing 100 bootstraps. We excluded outgroups from both analyses because of their high amounts of missing data. FIGTREE v.1.4.2 (Rambaut, 2014) allowed us to inspect, compare, and export trees to image editors.

Congruence assessment and visualization

We visually inspected the congruence of the topologies obtained by different matrices with the multidimensional scaling of tree space using the Robinson–Foulds distance implemented in TREESETVIZ v.3.0 (Amenta & Klingner, 2002) as a package for MESQUITE v.3.04 (Maddison & Maddison, 2015). The first comparison was made among the results obtained by the BI and ML analysis of the three nuclear ribosomal datasets (nr50, nr75, and nr90). The second comparison contrasted the trees from the


analyses of the nuclear ribosomal 90%, chloroplast, mitochondrial, and nuclear-ddRAD datasets. From the BI analyses, we sampled the maximum clade credibility tree (MCCT from SUMTREES.PY) and 50 random Bayesian topologies from each dataset after a 0.25 burn-in. From the ML and SVDq analyses, we sampled the best tree obtained and 50 random bootstrap replicates from each subset. A hierarchical likelihood-ratio congruence test among the nuclear ribosomal 90%, chloroplast, and mitochondrial datasets was performed with CONCATERPILLAR v.1.8a (Leigh et al., 2008) coupled with RAXML v.7.2.8 (Stamatakis, 2006). We identified rogue taxa by using the Procrustean Approach to Cophylogeny (PACo; Balbuena et al., 2013) with the pipeline designed by Perez-Escobar et al. (2016), specifically aimed at identifying incongruent taxa between nuclear and chloroplast phylogenies. The approach of Perez-Escobar et al. (2016) assumes that the chloroplast phylogeny is dependent on the nuclear phylogeny and transforms the topologies into matrices of patristic distances, which are then transformed into Euclidean principal coordinate matrices using the method proposed by de Vienne et al. (2011). Taxa with significantly incongruent phylogenetic positions are identified by comparing the observed distances with those produced by permutations. Because this approach performs best with phylograms (Perez-Escobar et al., 2016), we used the BI trees obtained from the nuclear ribosomal 90% and chloroplast matrices that have low amounts of missing data and therefore a more accurate branch estimation.

Hybridization assessment

To distinguish between incomplete lineage sorting (ILS) and hybridization, we employed the method described by Joly et al. (2009) implemented in the software JML v.1.3.0 (Joly, 2012). This method uses the posterior distribution of coalescent species trees, estimated population sizes, and branch length from a *BEAST v.2.3.0 (Heled & Drummond, 2010) output to simulate a coalescent scenario with no migration. Hybridization is identified by comparing the minimum pairwise distance between the simulated and the empirical datasets. If an observed distance is significantly smaller than the simulated one, then ILS can be rejected, suggesting hybridization (Joly et al., 2009; Joly, 2012). Joly et al.’s (2009) approach assumes that the marker used to calculate genetic distances does not recombine and its power increases with longer sequences (Joly, 2012); this makes our chloroplast matrix an excellent candidate to simulate and compare genetic distances. We calculated a coalescent tree in *BEAST using the nuclear ribosomal and chloroplast matrices. We ran five independent analyses in *BEAST using 200 million generations, sampling every 1000, and with a GTR+Γ model. For the JML input, we randomly sampled 2000 trees from the five *BEAST runs using LOGCOMBINER v.2.3 (Drummond et al., 2012) after a burn-in of 0.25. With JML, we carried out 2000 simulations of chloroplast genome pairwise distance comparisons. To visualize the results, we created a matrix map in R (R Core Team, 2016) indicating significant pairwise comparisons after the Benjamini–Hochberg correction.

To detect introgression, we calculated Patterson’s D-statistic (ABBA-BABA test; Huson et al., 2005; Green et al., 2010; Durand et al., 2011) on selected quartets of taxa with the nuclear-ddRAD-loci dataset employing the software PYRAD. We chose the quartets based on the JML results and the comparison between nuclear-ddRAD and chloroplast topologies. The D-statistic compares the frequencies of ‘ABBA’ and ‘BABA’ site patterns in a four-taxon phylogeny. The two patterns should be equally frequent in a scenario with ILS and no gene flow; if their frequencies differ significantly, then introgression between two of the taxa is inferred. Significant patterns were identified using a Z-score > 3, which corresponds to a conservative α = 0.01 (Eaton & Ree, 2013) after the Benjamini–Hochberg correction.

Time calibration of the phylogeny

We could not implement a primary calibration for our dataset because of the poor fossil record for Astereae. Therefore, in order to obtain a calibration point for our ingroup, we extracted the ITS region (c. 700 bp) of our nuclear ribosomal alignment and combined it with that of Strijk et al. (2012), which contains sequences from Heliantheae, , and Anthemideae providing external nodes for the time calibration. We aligned the ITS matrix with MAFFT and employed MRBAYES to obtain a topology using 30 million generations, two runs, four chains, a temperature of 0.001, and a GTR+Γ+I model. We calculated the chronogram with BEAST v.2.3.0 (Drummond et al., 2012) using the MRBAYES topology (fixed), a lognormal relaxed clock, a Yule model of speciation, 100 million generations, a sampling frequency of 40 thousand generations, and the same calibration priors used by Strijk et al. (2012). We produced the control file using BEAUTI v.2.3 (Drummond et al., 2012). The sets of trees from the two runs were combined after a burn-in of 0.25 using LOGCOMBINER v.2.3 (Drummond et al., 2012) before calculating the chronogram with TREEANNOTATOR 2.3 (Drummond et al., 2012).

Because the ITS matrix (c. 700 bp) did not provide enough phylogenetic information to produce a robust and resolved topology in the part of the tree corresponding to our ingroup, we calibrated our Bayesian ddRAD tree (fixed) using the complete nuclear ribosomal 90% matrix (c. 13 kb), employing secondary calibration points derived from our ITS chronogram and the literature. We aligned to our matrix the nuclear ribosomal cistron of Helianthus annuus (GenBank accession KF767534) using MAFFT and employed the following calibration points on the tree: a mean of 32.4 million yr (Myr) based on Kim et al. (2005) with a normal distribution and a sigma of 11 (95% CI: 14.4–50.6) to the root of the tree representing the crown clade of ; and a mean of 11.21 Myr with a normal distribution and a sigma of 3.1 (95% CI: 6.11–16.3) to the most recent common ancestor of the ‘South American Lineages’ (inferred in our ITS chronogram). To calculate the chronogram, we employed BEAST with the same parameters used in the calibration of the ITS dataset and the model and partition scheme calculated for the phylogenetic analysis of the nuclear ribosomal matrix.

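The ABBA-BABA logic described under Hybridization assessment can be sketched as follows. This is a minimal illustration, not the PYRAD implementation used in the study: it assumes four aligned sequences of equal length ordered as P1, P2, P3 and outgroup O for the tree (((P1,P2),P3),O), counts only biallelic sites, and omits the resampling step that yields the Z-score. The function name `d_statistic` and the toy sequences are hypothetical.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Patterson's D for the tree (((P1,P2),P3),O) from aligned sequences.

    The outgroup allele is treated as ancestral (A) and the other allele as
    derived (B). Returns a tuple (D, n_abba, n_baba).
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    abba = baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        # Skip gaps, ambiguity codes, and missing data.
        if not {a, b, c, o} <= valid:
            continue
        # Require exactly two alleles, with P3 carrying the derived one.
        if len({a, b, c, o}) != 2 or c == o:
            continue
        if a == o and b == c:    # ABBA: P2 shares the derived allele with P3
            abba += 1
        elif b == o and a == c:  # BABA: P1 shares the derived allele with P3
            baba += 1
    total = abba + baba
    d = (abba - baba) / total if total else float("nan")
    return d, abba, baba


# Toy alignment: three ABBA sites, one BABA site, one invariant site, one gap.
p1 = "AAGTC-"
p2 = "ATCAGA"
p3 = "ATCTGA"
out = "AAGACA"
print(d_statistic(p1, p2, p3, out))
```

A positive D reflects an excess of ABBA sites (P2 and P3 share derived alleles more often than P1 and P3), and a negative D an excess of BABA sites. In practice, significance comes from resampling loci, as with the Z-scores reported in the text.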

Consensus sequences were deposited in GenBank (Table S1). Subsets of reads, aligned matrices, and control files are available at Dryad (https://doi.org/10.5061/dryad.cn74h).

Results

Characteristics of the datasets

The nuclear ribosomal alignments comprise 13 362 characters, of which 1203–1425 (9.00–10.66%) are parsimony-informative characters (PICs) depending on the consensus threshold (the number of PICs reported here always excludes the outgroup; Table S2). While the nr50 matrix contains only 25 ambiguities, the nr75 and the nr90 have 3201 and 5012 ambiguities, respectively. The chloroplast genome of D. haenkei contains 85 genes plus 36 tRNAs and eight rRNAs in 152 292 bp (Fig. S2). No significant rearrangements or gene losses were found across the sampled species relative to Guizotia abyssinica with the exception of the loss of the rps19 gene for Baccharis genistelloides and trnT-GGU for B. genistelloides, B. tricuneata, and Llerasia caucana. Fifty-six of the 91 genomes have missing data in their assemblies as a result of the low coverage of reads obtained in regions with a high number of repeats. The chloroplast alignment is 135 440 characters long, of which 2169 (1.69%) are PICs (Table S2). The de novo mitochondrial genome of D. hartwegii has a total length of 277 718 bp, containing 56 genes plus three chloroplast-like tRNAs (Fig. S3). The final mitochondrial matrix has a length of 209 392 and contains 1730 (0.83%) PICs. The nuclear-ddRAD-loci dataset used for introgression analyses has a total of 19 987 loci. The nuclear-ddRAD matrix containing only variable SNPs, employed for the phylogenetic analyses, has a total of 244 255 sites with 96 421 (39.5%) PICs (Table S2).

Phylogenetic analysis

In all cases, the Bayes factors favor a partitioned matrix analysis over a nonpartitioned one (Tables S3, S4). Overall, all the phylogenetic topologies obtained are well resolved with highly supported nodes (Figs 1, S4–S11). The two topologies obtained by BI or SVDq and ML for each of the matrices analyzed (e.g. the BI-chloroplast tree compared with the ML-chloroplast tree) are consistent excluding clades or taxa with low support (bootstrap support (BS) < 80%, Bayesian posterior probability < 0.9).

Nuclear ribosomal topologies

The BI and ML backbones of the nr50, nr75 and nr90 trees are all highly congruent, the exception being the position of Diplostephium meyenii, which has low support on all of the trees (Figs 1b, S5–S9). The visualization of the Robinson–Foulds tree distances of the six nuclear ribosomal tree subsets (BI-nr50, BI-nr75, BI-nr90, ML-nr50, ML-nr75 and ML-nr90) shows four clusters of topologies (Fig. S12). None of the four clouds is formed exclusively by the trees of one subset. This visualization emphasizes the fact that despite the general agreement among the nuclear ribosomal topologies, there are some incongruences located towards the tip of the trees (Figs 1b, S5–S9). For example, Diplostephium schultzii, a species represented by two samples, is polyphyletic in the BI-nr50 and ML-nr50 topologies (Figs S5, S7), whereas it is monophyletic in the BI-nr75, BI-nr90, ML-nr75 and ML-nr90 topologies (Figs 1b, S6, S8, S9). Because the nr50, nr75 and nr90 backbones are congruent and the nr90 matrix is the most conservative in a phylogenetic context, capturing a considerable amount of intraindividual polymorphisms (informative to MrBayes and RAxML; Ronquist et al., 2011; Stamatakis, 2015), we selected the nr90 dataset to make comparisons with the chloroplast and mitochondrial datasets. For the remaining part of the paper we will therefore refer to the nuclear ribosomal 90% dataset simply as ‘nuclear ribosomal.’

Incongruence among the genomic datasets

To avoid confusion in our results and discussion, we only use the BI MCCT topologies from the genome skimming matrices and the ML topology from the nuclear-ddRAD matrix when comparing all dataset topologies, as alternative topologies from the same matrices are almost identical.

Incongruence is significant among the topologies obtained from the nuclear-ddRAD, nuclear ribosomal, chloroplast, and mitochondrial genomic regions, with the four trees recovering different numbers of clades of Diplostephium taxa and contrasting generic relationships (Fig. 1). The topology in Fig. 1(a) shows two labeled groups which we refer to as Diplostephium A and B. While in the nuclear-ddRAD and nuclear ribosomal topologies species of group A comprise a monophyletic group that is sister to a clade containing non-Diplostephium genera and species of Diplostephium group B, in the chloroplast and mitochondrial topologies, species of group A (except for D. inesianum in the mitochondrial topology) are nested within a clade containing Diplostephium group B species and the rest of Astereae genera sampled. Topological incongruence among our datasets is also obvious in the visualization of the Robinson–Foulds tree distances (Fig. S13), depicting four independent clouds of topologies corresponding to the nuclear-ddRAD, nuclear ribosomal, chloroplast, and mitochondrial tree subsets. The chloroplast and mitochondrial clouds are positioned more closely together relative to the nuclear-ddRAD and nuclear ribosomal trees. The hierarchical likelihood ratio test rejects the concatenation of the most congruent genome skimming dataset combination, chloroplast plus mitochondrial DNA (P < 0.0001). Normalized squared residuals values obtained by PACo identify 35 (38%) of the 91 tips as rogue taxa (Fig. 2).

Hybridization assessment

A strong signal of hybridization in our data is revealed by JML (Fig. 2); 1647 (40%) out of 4095 pairwise chloroplast distances are significantly smaller than expected in a scenario with ILS and no migration (P < 0.05) after the Benjamini–Hochberg correction. The Patterson’s D-statistic (ABBA-BABA) performed in selected quartets of taxa shows a signal of introgression among genera and inside Diplostephium groups A and B for many of the quartets evaluated (Tables 1, S5).


Fig. 1 Phylogenies obtained from the different datasets: (a) nuclear double-digest restriction site-associated DNA sequencing (ddRAD), maximum likelihood; (b) nuclear ribosomal 90%, Bayesian inference (BI) maximum clade credibility tree (MCCT); (c) chloroplast, BI-MCCT; and (d) mitochondrial, BI-MCCT. Dots indicate highly supported nodes (bootstrap support > 90, Bayesian posterior probability > 0.9). Shaded areas indicate the distribution of Diplostephium taxa according to the map on the right (e).

Chronograms

The ITS chronogram shows poor support and resolution in nodes positioned in the ‘South American Lineages’ (Fig. S14); nevertheless, it provides a time estimate for the origin of the ‘South American lineages’ (node R, mean = 11.2 Myr, 95% CI: 6.2–16.4) employed to calibrate the nuclear-ddRAD topology with the complete nuclear ribosomal matrix. The time calibration of our nuclear topology using the nuclear ribosomal matrix (Fig. 3) provides a rough estimation of the age of Diplostephium groups A (mean = 6.5 Myr, 95% CI: 1.3–11.4) and B (mean = 7.3 Myr, 95% CI: 2.3–12.5), and the divergence of South American genera (mean = 10.17 Myr, 95% CI: 3.2–16.2).

Fig. 2 JML matrix mapped on the nuclear double-digest restriction site-associated DNA sequencing (ddRAD) topology showing significant pairwise comparisons (black squares) in which the chloroplast genetic distance was found to be significantly smaller than expected under a scenario with incomplete lineage sorting with no migration, suggesting hybridization. Ingroup rogue taxa identified by Procrustean Approach to Cophylogeny are indicated by the orange squares. Numbers following taxon names indicate the haploid number of chromosomes; the star indicates that the count was made in a different species of the same genus.

Discussion

Phylogenetic incongruence and hybridization

Incongruence among the nuclear-ddRAD, nuclear ribosomal, chloroplast, and mitochondrial datasets is found at the generic and species levels, especially between the nuclear vs organellar trees. The only clade recovered consistently across phylogenies (excluding D. inesianum for the mitochondrial topology) is a clade that contains a group of Diplostephium species restricted mostly to the Northern Andes, labeled group A in the nuclear-ddRAD topology (Figs 1, 3). However, despite this Northern Andean clade being recovered in the four genomic phylogenies, its position relative to other genera and the rest of species of Diplostephium is incongruent when nuclear and organellar topologies are compared. Incongruence among different markers can be produced by phylogenetic uncertainty, incomplete lineage sorting, and/or hybridization (Rieseberg & Soltis, 1991; Maddison, 1997; Huelsenbeck et al., 2000). To rule out phylogenetic uncertainty, we performed a PACo analysis to identify rogue taxa. Because PACo performs simulations incorporating phylogenetic uncertainty, the presence of significant outliers provides evidence for the phylogenies being affected by ILS


Table 1 Patterson’s D-statistic showing the tests for which introgression between P3 and either P1 or P2 was detected using a Z-score threshold of 3 (see Supporting Information Table S1 for full species details)

P1                 P2                  P3                  Outgroup              Z      BABA    ABBA    nloci

Introgression among genera
D. ochraceum       D. violaceum        B. bartsiifolia     D. haenkei            4.99   8.5     35.25   198
B. genistelloides  L. sophiifolia      D. ericoides        D. ochraceum          4.8    30.75   84      159
B. genistelloides  B. tricuneata       D. espinosae        D. frontinense        5.06   77.25   169.5   291
B. genistelloides  L. sophiifolia      D. espinosae        D. frontinense        4.35   50.38   106.88  204
B. genistelloides  B. tricuneata       D. ochraceum        D. haenkei            4.58   184     89.5    303
B. genistelloides  B. tricuneata       D. violaceum        D. haenkei            3.55   172.5   96.5    309
B. genistelloides  B. tricuneata       H. alienus          E. notobellidiastrum  5.65   121.88  48.88   190

Introgression inside Diplostephium group A
D. mutiscuanum     D. oblongifolium    D. antioquense      D. colombianum        8.15   223.12  542.88  1447
D. revolutum       D. rosmarinifolium  D. colombianum      B. tricuneata         6.03   167.38  56.88   333
D. revolutum       D. rosmarinifolium  D. eriophorum       B. tricuneata         4.87   146.75  64      337
D. revolutum       D. rosmarinifolium  D. juajibioyi       D. colombianum        3.84   232.12  361.88  1321
D. tenuifolium     D. ochraceum        D. rosmarinifolium  D. colombianum        5.05   478.12  294.12  1550

Introgression inside Diplostephium group B
D. callilepis      D. goodspeedii      D. haenkei          D. azureum            13.5   238.38  715.38  1350
D. meyenii         P. quadrangularis   D. cinereum         B. tricuneata         19.44  498     76.75   444
D. cinereum        D. sp. nov. CAJ2    D. meyenii          F. hypsophila         11.6   337     89.25   452
D. foliosissimum   D. sagasteguii      D. meyenii          B. tricuneata         11.14  78.62   309.12  473
D. barclayanum     D. sagasteguii      D. meyenii          B. tricuneata         9.42   81.38   277.62  473
D. haenkei         D. callilepis       D. meyenii          F. hypsophila         9.3    300.25  92.25   490
D. goodspeedii     D. callilepis       D. meyenii          F. hypsophila         5.28   242.38  108.62  466
D. meyenii         P. quadrangularis   D. sp. nov. CAJ2    B. tricuneata         19.44  498     76.75   444
D. sp. nov. JUN3   D. sp. nov. JUN     D. sp. nov. JUN2    D. azureum            3.35   411.5   551.25  1294
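To make the relationship between the ABBA and BABA counts in Table 1 and the D-statistic explicit, the following Python sketch computes D = (ABBA − BABA) / (ABBA + BABA) for two rows of the table. It is an illustrative sketch, not the pipeline used in the study: the names `DTest`, `patterson_d`, and `interpret` are hypothetical, the Z-scores are taken as reported rather than recomputed, and the sign convention (positive D implies gene flow between P2 and P3, negative D between P1 and P3) is the commonly used one and is assumed here.

```python
from typing import NamedTuple, Optional, Tuple


class DTest(NamedTuple):
    """One four-taxon test, with counts copied from Table 1."""
    p1: str
    p2: str
    p3: str
    z: float      # Z-score as reported in Table 1
    baba: float   # BABA site-pattern count
    abba: float   # ABBA site-pattern count


def patterson_d(abba: float, baba: float) -> float:
    """Patterson's D: (ABBA - BABA) / (ABBA + BABA)."""
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA/BABA site patterns")
    return (abba - baba) / total


def interpret(test: DTest, z_threshold: float = 3.0) -> Tuple[float, Optional[str]]:
    """Return D and the taxon inferred to share alleles with P3.

    Assumed convention: D > 0 points to P2, D < 0 points to P1.
    Returns None as partner when |Z| is below the threshold or D is zero.
    """
    d = patterson_d(test.abba, test.baba)
    if abs(test.z) < z_threshold or d == 0:
        return d, None
    return d, (test.p2 if d > 0 else test.p1)


# Two rows copied from Table 1.
TABLE1_EXAMPLES = [
    DTest("D. ochraceum", "D. violaceum", "B. bartsiifolia", 4.99, 8.5, 35.25),
    DTest("D. meyenii", "P. quadrangularis", "D. cinereum", 19.44, 498, 76.75),
]

if __name__ == "__main__":
    for row in TABLE1_EXAMPLES:
        d, partner = interpret(row)
        print(f"P3 = {row.p3}: D = {d:.3f}, inferred partner = {partner}")
```

Under these assumptions, the first row gives a positive D (excess ABBA) and the second a negative D (excess BABA), which is why Table 1 is described as detecting introgression between P3 and either P1 or P2.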

or hybridization. The result that 38% of our samples are rogue indicates extensive ILS or hybridization in our dataset.

To distinguish between ILS and hybridization, we employed the method proposed by Joly et al. (2009), which is incorporated into the software JML (Joly, 2012). The significantly small pairwise distances of c. 40% of all possible comparisons among chloroplast genomes suggest high degrees of hybridization. Although JML results can be difficult to interpret because they only show species pairs affected by hybridization and do not incorporate a model to detect ancient hybridization, our visualization of the JML results mapped next to a phylogeny (Fig. 2) provides a new approach to formulate hybridization hypotheses. This visualization shows a strong signal of ancient hybridization because significant comparisons are present among different genera and among almost all comparisons between species of Diplostephium groups A and B.

We employed a nuclear-ddRAD-loci dataset to calculate empirically the amount of introgression among selected taxa using the Patterson’s D-statistic. Our results (Table 1) show that introgression has occurred among genera and inside Diplostephium groups A and B. Even though we did not test for every possible hybridization event in our D-statistic framework, the significant tests reported (Table 1) suggest at least 12 reticulate events (different tests may represent the same event). In the specific case of Diplostephium sp. nov. CAJ2, D. meyenii, and Parastrephia quadrangularis, the D-statistic identifies a strong signal of hybridization, also supported by morphological data. We hypothesize that Diplostephium sp. nov. CAJ2 is the product of a cross between D. meyenii and P. quadrangularis because Diplostephium sp. nov. CAJ2 has scale-like leaves similar to P. quadrangularis but heterogamous capitula with ray flowers like D. meyenii and group B species (Fig. 4a–c). Similarly, a high Z-score suggests that D. cinereum (Fig. 4d) has hybridized with D. meyenii or P. quadrangularis (Table 1), all of which occur in the Peruvian Andes in parapatry, and morphological hybrids have been observed in the field (V. Quipuscoa, pers. comm.). The incongruence among genomic phylogenies, the JML results, and the signal of introgression inferred by the D-statistic suggest a complex pattern of reticulate evolution in our ingroup.

The high number of instances of hybridization seen in our dataset explains to a great extent the incongruence obtained among our genomic phylogenies. Numerous authors have proposed that horizontal gene transfer via introgression could cause deviation of the chloroplast and mitochondrial phylogenies from the species tree (Rieseberg & Soltis, 1991; Moore, 1995; Hardig et al., 2000; Sun et al., 2015). Because organellar DNA inheritance is mostly uniparental and there is generally no recombination following fertilization (Birky, 1995; Jansen & Ruhlman, 2012), an event of hybridization could completely replace the original chloroplast and mitochondrial DNA of a lineage with an alien one, confounding their phylogenies relative to the species tree (Rieseberg & Soltis, 1991). Chloroplast and mitochondrial genomes have the potential to be fixed rapidly because their effective population size (Ne) is one-quarter that of a nuclear autosomal gene (Moore, 1995), making organellar genomes less prone to ILS. Additionally, the discrepancies among the phylogenies of chloroplast and mitochondrial DNA suggest a decoupling of inheritance of these two genomes. The evidence presented here leads us to conclude that the trees obtained from the chloroplast and mitochondrial DNA unequivocally deviate from the species tree.

Fig. 3 Chronogram calculated on the maximum likelihood double-digest restriction site-associated DNA sequencing (ddRAD) topology using the nuclear ribosomal 90% matrix. Numbers at nodes indicate the average estimated ages, while bars show their 95% confidence intervals. Stars mark the calibration points used. The time axis is in millions of years ago (30 to 0).

The nuclear-ddRAD tree is, to our knowledge, the best species tree hypothesis, taking into account the numerous loci sampled from the nuclear genome and its phylogenetic robustness

Ó 2017 The Authors New Phytologist (2017) 214: 1736–1750 New Phytologist Ó 2017 New Phytologist Trust www.newphytologist.com New 1746 Research Phytologist

(a) (b)

(c) (d)

Fig. 4 Photos of taxa with high genetic hybridization signal. (a) Parastrephia quadrangularis; (b) Diplostephium meyenii; (c) Diplostephium sp. nov. CAJ2; and (d) Diplostephium cinereum. Notice the hybrid morphology of Diplostephium sp. nov. CAJ2 in relation to P. quadrangularis and D. meyenii.

(Fig. 1a). This topology fits the morphological classification and biogeography of our ingroup better than any other phylogeny obtained in this study. The nuclear-ddRAD topology indicates that Diplostephium (sensu Cuatrecasas, 1969) is biphyletic (if Parastrephia is included in group B species), with both clades diverging approximately in the late Miocene. Because the nomenclatural type of the genus, D. ericoides, is a member of group B species (Diplostephium s.s.), group A must be circumscribed as a different taxon (O. M. Vargas, unpublished).

From the genome skimming dataset, the nuclear ribosomal DNA is better at capturing signal to infer the species tree than the chloroplast and mitochondrial DNA, because the nuclear ribosomal cistron is biparentally inherited (Volkov et al., 2007), it recombines (Hughes & Petersen, 2001; Ambrose & Crease, 2011), and its copies are located on multiple chromosomes (Phillips et al., 1971; Alvarez & Wendel, 2003). The nuclear ribosomal tree is also largely congruent with the nuclear-ddRAD tree (Fig. 1a,b).

Extensive introgression in South American Astereae is probably a result of a lack of reproductive barriers. Even though the pollination biology of our ingroup is still understudied, Diplostephium and its allies lack obvious pollinator specialization (Cuatrecasas, 1969), probably because of the low diversity of pollinators in the high Andes (Berry & Calvo, 1989). Additionally, the complex topography of the Andes might promote speciation via geographic isolation while allowing gene flow in the early stages of differentiation before reproductive barriers develop. Available reports of chromosome numbers (Chromosome Counts Database, http://ccdb.tau.ac.il; Fig. 2) suggest that the ancient hybridization identified in our data did not result in polyploidy in the majority of cases, because the only sample with a chromosome count report deviating from the base chromosome number in Astereae (n = 9; Nesom & Robinson, 2007) is Diplostephium jaramilloi (n = 18). However, more data about karyotypes are needed to confirm these statements, especially for Diplostephium group A species.

Practical considerations

Our results demonstrate that hybridization is prevalent in Diplostephium and its allies, causing the chloroplast and mitochondrial phylogenies to deviate from the species tree. In conjunction with documented cases of reticulate evolution in other organisms like heliconiine butterflies (Mallet et al., 2007) and cichlid fishes (Genner & Turner, 2012), our results support the hypothesis that hybridization might be a common process in events of rapid evolution (Anderson & Stebbins, 1954; Seehausen, 2004). Because hybridization might be frequent during evolutionary bursts of speciation, ancient and recent diversifications (and radiations) are especially prone to the phylogenetic biases explained here. We urge systematists to avoid concatenating data with different inheritance patterns (e.g. chloroplast and nuclear markers)
without testing for congruence, and to apply caution when using chloroplast and mitochondrial phylogenies as the basis for taxonomic classification, time calibrations, historical biogeographic reconstructions, and comparative phylogenetic analyses.

Phylogenetic methods have focused on a bifurcating model of evolution, but dichotomous trees cannot illustrate processes like hybridization and horizontal gene transfer (Doolittle, 1999; Huson & Bryant, 2006; Rieppel, 2010). Recent advances in explicit phylogenetic network estimation (Solís-Lemus & Ané, 2016; Wen et al., 2016) promise better models to understand the evolutionary history of taxa that have undergone hybridization and lateral gene transfer, yet these approaches are still in their infancy and are only efficient in cases of recent hybridization with few reticulate events.

Approaches that identify rogue terminals (e.g. PACo and PARAfit; Legendre et al., 2002; Balbuena et al., 2013; Pérez-Escobar et al., 2016) are often used to remove taxa from phylogenetic analysis and decrease incongruence among datasets (Philippe et al., 2009; Regier et al., 2013; Salichos & Rokas, 2013; Reginato & Michelangeli, 2016). While PACo identified 38% of our taxa as rogue, our hybridization analysis suggests that c. 94% of our sampled taxa are affected by ancient or recent hybridization. Therefore, removing taxa identified as rogue from our analysis would not eliminate topological incongruence produced by hybridization. We encourage biologists to uncover the reason for incongruence (instead of removing rogue taxa), as the answer to this question could elucidate noteworthy biological processes.

Our study demonstrates that ancient hybridization is worthy of attention and should be modeled in phylogenetic inference. Future research should focus on understanding the role of hybridization in diversification and on developing methods to quantify ancient introgression and accurately infer reticulate networks. The evidence presented in our study builds on numerous reports (Sessa et al., 2012; Sochor et al., 2015; Sun et al., 2015; Wu et al., 2015) that emphasize a central role of reticulation in plant evolution.

Acknowledgements

We thank Fernando Alzate, Felipe Cardona, Michael Dillon, Álvaro Idárraga, Santiago Madriñán, Haydee Montoya, Victor Quipuscoa, Adriana Prieto, Rusty Russell, Katya Romoleroux, Luis Sánchez, and Tom Wendt for providing access to their herbaria. Hamilton Beltrán provided the pictures of Diplostephium cinereum and Diplostephium sp. nov. CAJ 2. Jordan Bemmels, Joseph Brown, David Cannatella, Amalia Diaz, Robert Jansen, Thomas Juenger, Teofil Nakov, Juan Palacio, Sarah Sussman, Ning Wang, and Mao-Lun Weng provided help and comments during the development of this study. We also want to thank the three anonymous reviewers for their constructive comments. This project was funded by the University of Texas at Austin (Plant Biology Program Award, the C. L. Lundell Chair of Systematic Botany, and The Linda Escobar Award), the Garden Club of America (2012 Award in Tropical Botany) and the National Science Foundation (postdoctoral support FESD 1338694 and DEB 1240869). Research permits: resolution 0145 of 'El Ministerio de Ambiente y Desarrollo Sostenible' and permit 37 of 'Sistema Ambiental Sina' (Colombia); authorization 10-2012-IC-FLO-DPAP-MA of 'Ministerio del Medio Ambiente' (Ecuador); and resolution 0369-2011-AG-DGFFS-DGEFFS of 'Dirección de Gestión Forestal y de Fauna Silvestre' (Peru).

Author contributions

O.M.V. and E.M.O. performed the research, collected the samples, and analyzed the data. B.B.S. and O.M.V. designed the research and interpreted the results. The writing was primarily carried out by O.M.V.

References

Alvarez I, Wendel JF. 2003. Ribosomal ITS sequences and plant phylogenetic inference. Molecular Phylogenetics and Evolution 29: 417–434.
Alverson AJ, Rice DW, Dickinson S, Barry K, Palmer JD. 2011. Origins and recombination of the bacterial-sized multichromosomal mitochondrial genome of cucumber. The Plant Cell 23: 2499–2513.
Alverson AJ, Wei X, Rice DW, Stern DB, Barry K, Palmer JD. 2010. Insights into the evolution of mitochondrial genome size from complete sequences of Citrullus lanatus and Cucurbita pepo (Cucurbitaceae). Molecular Biology and Evolution 27: 1436–1448.
Ambrose CD, Crease TJ. 2011. Evolution of the nuclear ribosomal DNA intergenic spacer in four species of the Daphnia pulex complex. BMC Genetics 12: 13.
Amenta N, Klingner J. 2002. Case study: visualizing sets of evolutionary trees. 8th IEEE Symposium on Information Visualization 2002: 71–74.
Anderson E, Stebbins GL. 1954. Hybridization as an evolutionary stimulus. Evolution 8: 378–388.
Andrews S. 2010. FastQC: a quality control tool for high throughput sequence data. [WWW document] URL http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ [accessed 1 January 2015].
Balbuena JA, Míguez-Lozano R, Blasco-Costa I. 2013. PACo: a novel procrustes application to cophylogenetic analysis. PLoS ONE 8: e61048.
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD et al. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology 19: 455–477.
Bell CD, Donoghue MJ. 2005. Phylogeny and biogeography of Valerianaceae (Dipsacales) with special reference to the South American valerians. Organisms Diversity and Evolution 5: 147–159.
Berry PE, Calvo RN. 1989. Wind pollination, self-incompatibility, and altitudinal shifts in pollination systems in the high Andean genus Espeletia (Asteraceae). American Journal of Botany 76: 1602–1614.
Birky CW. 1995. Uniparental inheritance of mitochondrial and chloroplast genes: mechanisms and evolution. Proceedings of the National Academy of Sciences, USA 92: 11331–11338.
Blake SF. 1928. Review of the genus Diplostephium. American Journal of Botany 15: 43–64.
Bock DG, Kane NC, Ebert DP, Rieseberg LH. 2014. Genome skimming reveals the origin of the Jerusalem artichoke tuber crop species: neither from Jerusalem nor an artichoke. New Phytologist 201: 1021–1030.
Boisvert S, Raymond F, Godzaridis E, Laviolette F, Corbeil J. 2012. Ray Meta: scalable de novo metagenome assembly and profiling. Genome Biology 13: R122.
Brouillet L, Lowrey TK, Urbatsch LE, Karaman-Castro V, Sancho G, Wagstaff SJ, Semple JC. 2009. Astereae. In: Funk VA, Stuessy T, Bayer R, eds. Systematics, evolution and biogeography of Compositae. Vienna, Austria: International Association of Plant Taxonomists, 589–629.


Bushnell B. 2016. BBTools: a suite of bioinformatic tools used for DNA and RNA sequence data analysis. [WWW document] URL http://jgi.doe.gov/data-and-tools/bbtools/ [accessed 1 July 2016].
Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH. 2011. Stacks: building and genotyping loci de novo from short-read sequences. G3 (Bethesda, Md.) 1: 171–182.
Chifman J, Kubatko L. 2014. Quartet inference from SNP data under the coalescent model. Bioinformatics 30: 3317–3324.
Chikhi R, Medvedev P. 2014. Informed and automated k-mer size selection for genome assembly. Bioinformatics 30: 31–37.
Cuatrecasas J. 1969. Prima Flora Colombiana. 3. Compositae-Astereae. Webbia 24: 1–335.
Darriba D, Taboada GL, Doallo R, Posada D. 2012. jModelTest 2: more models, new heuristics and parallel computing. Nature Methods 9: 772.
Doolittle WF. 1999. Phylogenetic classification and the universal tree. Science 284: 2124–2129.
Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution 29: 1969–1973.
Durand EY, Patterson N, Reich D, Slatkin M. 2011. Testing for ancient admixture between closely related populations. Molecular Biology and Evolution 28: 2239–2252.
Eaton DAR. 2014. PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics 30: 1844–1849.
Eaton DAR, Overcast I. 2016. ipyrad: interactive assembly and analysis of RADseq data sets. [WWW document] URL http://ipyrad.readthedocs.io/ [accessed 1 August 2016].
Eaton DAR, Ree RH. 2013. Inferring phylogeny and introgression using RADseq data: an example from flowering plants (Pedicularis: Orobanchaceae). Systematic Biology 62: 689–706.
Emshwiller E. 2002. Biogeography of the Oxalis tuberosa alliance. Botanical Review 68: 128–152.
Fan Y, Wu R, Chen MH, Kuo L, Lewis PO. 2011. Choosing among partition models in Bayesian phylogenetics. Molecular Biology and Evolution 28: 523–532.
Genner MJ, Turner GF. 2012. Ancient hybridization and phenotypic novelty within lake Malawi's cichlid fish radiation. Molecular Biology and Evolution 29: 195–206.
Givnish TJ. 1997. Adaptive radiation and molecular systematics: aims and conceptual issues. In: Givnish TJ, Sytsma KJ, eds. Molecular evolution and adaptive radiation. New York, NY, USA: Cambridge University Press, 1–54.
Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y et al. 2010. A draft sequence of the Neandertal genome. Science 328: 710–722.
Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology 52: 696–704.
Hardig TM, Soltis PS, Soltis DE. 2000. Diversification of the North American shrub genus Ceanothus (Rhamnaceae): conflicting phylogenies from nuclear ribosomal DNA and chloroplast DNA. American Journal of Botany 87: 108–123.
Heled J, Drummond AJ. 2010. Bayesian inference of species trees from multilocus data. Molecular Biology and Evolution 27: 570–580.
Huelsenbeck JP, Larget B, Alfaro ME. 2004. Bayesian phylogenetic model selection using reversible jump Markov chain Monte Carlo. Molecular Biology and Evolution 21: 1123–1133.
Huelsenbeck JP, Rannala B, Masly JP. 2000. Accommodating phylogenetic uncertainty in evolutionary studies. Science 288: 2349–2350.
Hughes CE, Atchison GW. 2015. The ubiquity of alpine plant radiations: from the Andes to the Hengduan Mountains. New Phytologist 207: 275–282.
Hughes CE, Eastwood R. 2006. Island radiation on a continental scale: exceptional rates of plant diversification after uplift of the Andes. Proceedings of the National Academy of Sciences, USA 103: 10334–10339.
Hughes KW, Petersen RH. 2001. Apparent recombination or gene conversion in the ribosomal ITS region of a Flammulina (Fungi, Agaricales) hybrid. Molecular Biology and Evolution 18: 94–96.
Hurvich CM, Tsai C. 1989. Regression and time series model selection in small samples. Biometrika 76: 297–307.
Huson DH, Bryant D. 2006. Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution 23: 254–267.
Huson DH, Klöpper T, Lockhart PJ, Steel MA. 2005. Reconstruction of reticulate networks from gene trees. In: Miyano S, Mesirov J, Kasif S, Istrail S, Pevzner PA, Waterman M, eds. Research in computational molecular biology. Berlin, Heidelberg, Germany: Springer, 233–249.
Jansen RK, Ruhlman TA. 2012. Plastid genomes of seed plants. In: Bock R, Knoop V, eds. Genomics of chloroplasts and mitochondria, advances in photosynthesis and respiration. Dordrecht, the Netherlands: Springer, 103–126.
Joly S. 2012. JML: testing hybridization from species trees. Molecular Ecology Resources 12: 179–184.
Joly S, McLenachan PA, Lockhart PJ. 2009. A statistical approach for distinguishing hybridization and incomplete lineage sorting. American Naturalist 174: E54–E70.
Karaman-Castro V, Urbatsch LE. 2009. Phylogeny of Hinterhubera group and related genera (Hinterhuberinae: Astereae) based on the nrDNA ITS and ETS sequences. Systematic Botany 34: 805–817.
Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30: 3059–3066.
Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C et al. 2012. Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28: 1647–1649.
Kim KJ, Choi KS, Jansen RK. 2005. Two chloroplast DNA inversions originated simultaneously during the early evolution of the sunflower family (Asteraceae). Molecular Biology and Evolution 22: 1783–1792.
Langmead B, Salzberg S. 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods 9: 357–359.
Legendre P, Desdevises Y, Bazin E. 2002. A statistical test for host-parasite coevolution. Systematic Biology 51: 217–234.
Leigh JW, Susko E, Baumgartner M, Roger AJ. 2008. Testing congruence in phylogenomic analysis. Systematic Biology 57: 104–115.
Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.
Linder CR, Goertzen LR, Heuvel BV, Francisco-Ortega J, Jansen RK. 2000. The complete external transcribed spacer of 18S-26S rDNA: amplification and phylogenetic utility at low taxonomic levels in Asteraceae and closely allied families. Molecular Phylogenetics and Evolution 14: 285–303.
Luebert F, Weigend M. 2014. Phylogenetic insights into Andean plant diversification. Frontiers in Ecology and Evolution 2: 1–17.
Luteyn J. 1999. Páramos: a checklist of plant diversity, geographical distribution, and botanical literature. New York, NY, USA: New York Botanical Garden.
Ma PF, Zhang YX, Zeng CX, Guo ZH, Li DZ. 2014. Chloroplast phylogenomic analyses resolve deep-level relationships of an intractable bamboo tribe Arundinarieae (Poaceae). Systematic Biology 63: 933–950.
Maddison WP. 1997. Gene trees in species trees. Systematic Biology 46: 523–536.
Maddison WP, Maddison DR. 2015. Mesquite: a modular system for evolutionary analysis. Version 3.04. [WWW document] URL http://mesquiteproject.org [accessed 1 January 2016].
Madriñán S, Cortés AJ, Richardson JE. 2013. Páramo is the world's fastest evolving and coolest biodiversity hotspot. Frontiers in Genetics 4: 192.
Mallet J, Beltrán M, Neukirchen W, Linares M. 2007. Natural hybridization in heliconiine butterflies: the species boundary as a continuum. BMC Evolutionary Biology 7: 28.
Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet Journal 17: 10–12.
Miller MA, Feiffer WP, Schwartz T. 2010. Creating the CIPRES science gateway for inference of large phylogenetic trees. Gateway Computing Environments Workshop (GCE) 2010: 1–8.
Moore WM. 1995. Inferring phylogenies from mtDNA variation: mitochondrial-gene trees versus nuclear-gene trees. Evolution 49: 718–726.
Mort ME, Crawford DJ, Kelly JK, Santos-Guerra A, Menezes de Sequeira M, Moura M, Caujapé-Castells J. 2015. Multiplexed-shotgun-genotyping data resolve phylogeny within a very recently derived insular lineage. American Journal of Botany 102: 634–641.


Nesom GL. 1994. Subtribal classification of the Astereae (Asteraceae). Phytologia 76: 193–274.
Nesom GL, Robinson H. 2007. Tribe Astereae. In: Kadereit JW, Jeffrey C, eds. The families and genera of vascular plants vol 8. Berlin, Germany: Springer, 284–342.
Noyes RD, Rieseberg LH. 1999. ITS sequence data support a single origin for North American Astereae (Asteraceae) and reflect deep geographic divisions in Aster s.l. American Journal of Botany 86: 398–412.
Nürk NM, Scheriau C, Madriñán S. 2013. Explosive radiation in high Andean Hypericum – rates of diversification among New World lineages. Frontiers in Genetics 4: 175.
Panero JL, Freire SE, Ariza Espinar L, Crozier BS, Barboza GE, Cantero JJ. 2014. Resolution of deep nodes yields an improved backbone phylogeny and a new basal lineage to study early evolution of Asteraceae. Molecular Phylogenetics and Evolution 80: 43–53.
Panero JL, Funk VA. 2008. The value of sampling anomalous taxa in phylogenetic studies: major clades of the Asteraceae revealed. Molecular Phylogenetics and Evolution 47: 757–782.
Pérez-Escobar OA, Balbuena JA, Gottschling M. 2016. Rumbling orchids: how to assess divergent evolution between chloroplast endosymbionts and the nuclear host. Systematic Biology 65: 51–65.
Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE. 2012. Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS ONE 7: e37135.
Philippe H, Derelle R, Lopez P, Pick K, Borchiellini C, Boury-Esnault N, Vacelet J, Renard E, Houliston E, Quéinnec E et al. 2009. Phylogenomics revives traditional views on deep animal relationships. Current Biology 19: 706–712.
Phillips RL, Kleese RA, Wang SS. 1971. The nucleolus organizer region of maize (Zea mays L.): chromosomal site of DNA complementary to ribosomal RNA. Chromosoma 36: 79–88.
Qiu YL, Li LB, Wang B, Xue JY, Hendry TA, Li RQ, Brown JW, Liu Y, Hudson GT, Chen ZD. 2010. Angiosperm phylogeny inferred from sequences of four mitochondrial genes. Journal of Systematics and Evolution 48: 391–425.
R Core Team. 2016. R: a language and environment for statistical computing. [WWW document] URL http://www.R-project.org/ [accessed 1 January 2016].
Rambaut A. 2014. FigTree v1.4.2. [WWW document] URL http://tree.bio.ed.ac.uk/ [accessed 1 January 2016].
Rambaut A, Suchard MA, Xie D, Drummond AJ. 2014. Tracer v1.6. [WWW document] URL http://beast.bio.ed.ac.uk/Tracer [accessed 1 January 2016].
Rauscher JT. 2002. Molecular phylogenetics of the Espeletia complex (Asteraceae): evidence from nrDNA ITS sequences on the closest relatives of an Andean adaptive radiation. American Journal of Botany 89: 1074–1084.
Regier JC, Mitter C, Zwick A, Bazinet AL, Cummings MP, Kawahara AY, Sohn JC, Zwickl DJ, Cho S, Davis DR et al. 2013. A large-scale, higher-level, molecular phylogenetic study of the insect order Lepidoptera (moths and butterflies). PLoS ONE 8: e58568.
Reginato M, Michelangeli FA. 2016. Untangling the phylogeny of Leandra s.str. (Melastomataceae, Miconieae). Molecular Phylogenetics and Evolution 96: 17–32.
Renaud G, Stenzel U, Maricic T, Wiebe V, Kelso J. 2014. deML: robust demultiplexing of Illumina sequences using a likelihood-based approach. Bioinformatics 31: 770–772.
Rieppel O. 2010. Species monophyly. Journal of Zoological Systematics and Evolutionary Research 48: 1–8.
Rieseberg LH, Soltis DE. 1991. Phylogenetic consequences of cytoplasmic gene flow in plants. Evolutionary Trends in Plants 5: 64–84.
Rognes T, Flouri T, Nichols B, Quince C, Mahé F. 2016. VSEARCH: a versatile open source tool for metagenomics. PeerJ 4: e2584.
Ronquist F, Huelsenbeck JP, Teslenko M. 2011. MrBayes version 3.2 manual: tutorials and model summaries. [WWW document] URL http://mrbayes.sourceforge.net/index.php [accessed 1 January 2016].
Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic Biology 61: 539–542.
Salichos L, Rokas A. 2013. Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497: 327–331.
Sánchez-Baracaldo P. 2004. Phylogenetics and biogeography of the neotropical fern genera Jamesonia and Eriosorus (Pteridaceae). American Journal of Botany 91: 274–284.
Sancho G, Hind DJN, Pruski JF. 2010. Systematics of Podocoma (Asteraceae: Astereae): a generic reassessment. Botanical Journal of the Linnean Society 163: 486–513.
Sancho G, Karaman-Castro V. 2008. A phylogenetic study in American Podocominae (Asteraceae: Astereae) based on morphological and molecular data. Systematic Botany 33: 762–775.
Seehausen O. 2004. Hybridization and adaptive radiation. Trends in Ecology and Evolution 19: 198–207.
Sessa EB, Zimmer EA, Givnish TJ. 2012. Reticulate evolution on a global scale: a nuclear phylogeny for New World Dryopteris (Dryopteridaceae). Molecular Phylogenetics and Evolution 64: 563–581.
Simpson BB. 1974. Glacial migrations of plants: island biogeographical evidence. Science 185: 698–700.
Sochor M, Vašut RJ, Sharbel TF, Trávníček B. 2015. How just a few makes a lot: speciation via reticulation and apomixis on example of European brambles (Rubus subgen. Rubus, Rosaceae). Molecular Phylogenetics and Evolution 89: 13–27.
Solís-Lemus C, Ané C. 2016. Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. PLoS Genetics 12: 1–21.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688–2690.
Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30: 1312–1313.
Stamatakis A. 2015. The RAxML v8.1.X manual. [WWW document] URL http://sco.h-its.org/exelixis/web/software/raxml/index.html [accessed 1 January 2016].
Straub SCK, Parks M, Weitemier K, Fishbein M, Cronn RC, Liston A. 2012. Navigating the tip of the genomic iceberg: next-generation sequencing for plant systematics. American Journal of Botany 99: 349–364.
Strijk JS, Noyes RD, Strasberg D, Cruaud C, Gavory F, Chase MW, Abbott RJ, Thébaud C. 2012. In and out of Madagascar: dispersal to peripheral islands, insular speciation and diversification of Indian ocean daisy trees (Psiadia, Asteraceae). PLoS ONE 7: e42932.
Sukumaran J, Holder MT. 2010a. SumTrees: phylogenetic tree summarization v.4.10. [WWW document] URL https://github.com/jeetsukumaran/DendroPy [accessed 1 July 2016].
Sukumaran J, Holder MT. 2010b. DendroPy: a Python library for phylogenetic computing. Bioinformatics 26: 1569–1571.
Sun M, Soltis DE, Soltis PS, Zhu X, Burleigh JG, Chen Z. 2015. Deep phylogenetic incongruence in the angiosperm Rosidae clade. Molecular Phylogenetics and Evolution 83: 156–166.
Swofford DL. 2002. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4. Sunderland, MA, USA: Sinauer Associates.
Vargas OM. 2011. A nomenclator of Diplostephium (Asteraceae: Astereae): a list of species with their synonyms and distribution. Lundellia 14: 32–51.
Vargas OM, Madriñán S. 2006. Clave para la identificación de las especies del género Diplostephium (Asteraceae, Astereae) en Colombia. Revista de la Academia Colombiana de Ciencias Exactas Físicas y Naturales 30: 489–494.
Vargas OM, Madriñán S. 2012. Preliminary phylogeny of Diplostephium (Asteraceae): speciation rate and character evolution. Lundellia 15: 1–15.
de Vienne DM, Aguileta G, Ollier S. 2011. Euclidean nature of phylogenetic distance matrices. Systematic Biology 60: 826–832.
Volkov RA, Komarova NY, Hemleben V. 2007. Ribosomal DNA in plant hybrids: inheritance, rearrangement, expression. Systematics and Biodiversity 5: 261–276.
Wen D, Yu Y, Hahn MW, Nakhleh L. 2016. Reticulate evolutionary history and extensive introgression in mosquito species revealed by phylogenetic network analysis. Molecular Ecology 25: 2361–2372.
Whitfield JB, Lockhart PJ. 2007. Deciphering ancient rapid radiations. Trends in Ecology and Evolution 22: 258–265.


Wu J, Nyman T, Wang D-C, Argus GW, Yang Y-P, Chen J-H. 2015. Phylogeny of Salix subgenus Salix s.l. (Salicaceae): delimitation, biogeography, and reticulate evolution. BMC Evolutionary Biology 15: 1–13.
Wyman SK, Jansen RK, Boore JL. 2004. Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20: 3252–3255.
Xie W, Lewis PO, Fan Y, Kuo L, Chen MH. 2011. Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology 60: 150–160.
Zapata F. 2013. A multilocus phylogenetic analysis of Escallonia (Escalloniaceae): diversification in montane South America. American Journal of Botany 100: 526–545.

Supporting Information

Additional Supporting Information may be found online in the Supporting Information tab for this article:

Fig. S1 Diagram representing the sequence assembly pipeline for the genome skimming dataset.

Fig. S2 Circular representation of the chloroplast genome of Diplostephium haenkei.

Fig. S3 Circular representation of the mitochondrial genome of Diplostephium hartwegii.

Fig. S4 Nuclear ddRAD tree obtained with SVDquartets.

Fig. S5 Nuclear ribosomal maximum clade credibility tree obtained by Bayesian inference with the 50% consensus matrix.

Fig. S6 Nuclear ribosomal maximum clade credibility tree obtained by Bayesian inference with the 75% consensus matrix.

Fig. S7 Nuclear ribosomal best tree obtained by maximum likelihood with the 50% consensus matrix.

Fig. S8 Nuclear ribosomal best tree obtained by maximum likelihood with the 75% consensus matrix.

Fig. S9 Nuclear ribosomal best tree obtained by maximum likelihood with the 90% consensus matrix.

Fig. S10 Chloroplast best tree obtained by maximum likelihood.

Fig. S11 Mitochondrial best tree obtained by maximum likelihood.

Fig. S12 Visualization of Robinson–Foulds tree distances among the nuclear ribosomal 50%, 75% and 90% consensus datasets.

Fig. S13 Visualization of Robinson–Foulds tree distances among the ddRAD, nuclear ribosomal 90%, chloroplast, and mitochondrial (red) datasets.

Fig. S14 Chronogram of the ITS matrix.

Table S1 List of specimens with their voucher and GenBank information

Table S2 Descriptive statistics of the matrices obtained

Table S3 Bayes factor comparison between the two partition models calculated with MRBAYES in each genome skimming dataset

Table S4 Models of evolution inferred by JMODELTEST and MRBAYES for the genome skimming datasets

Table S5 Complete list of Patterson's D-statistic calculations

Methods S1 Parameters file used for assembly of the reference ddRAD library in IPYRAD.

Methods S2 Parameters file used for assembly of the nuclear-ddRAD dataset in SHOTGUN2RAD.

Methods S3 Parameters file used for assembly of the nuclear-ddRAD dataset in PYRAD.

Please note: Wiley Blackwell are not responsible for the content or functionality of any Supporting Information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.
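
Table S5 reports Patterson's D-statistic (ABBA-BABA test; Green et al., 2010; Durand et al., 2011). For readers unfamiliar with the statistic, the following is a minimal sketch of how a single D value can be computed from four aligned sequences ordered as (((P1, P2), P3), outgroup). It is an illustration only, not the pipeline used in this study: the function names `abba_baba_counts` and `patterson_d` and the toy sequences are hypothetical, one haploid sequence per taxon is assumed, and the significance testing that a real analysis requires (typically by resampling loci) is omitted.

```python
from typing import Tuple


def abba_baba_counts(p1: str, p2: str, p3: str, outgroup: str) -> Tuple[int, int]:
    """Count ABBA and BABA site patterns in four aligned sequences.

    The outgroup allele is treated as ancestral ("A"); the allele carried by
    P3 at a counted site is derived ("B"). Hypothetical helper for illustration.
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")

    valid = set("ACGT")
    abba = 0
    baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        if not {a, b, c, o} <= valid:
            continue  # skip gaps, Ns and ambiguity codes
        if c == o:
            continue  # P3 must carry the derived allele
        if a == o and b == c:
            abba += 1  # P1 ancestral, P2 and P3 derived
        elif b == o and a == c:
            baba += 1  # P2 ancestral, P1 and P3 derived
    return abba, baba


def patterson_d(abba: int, baba: int) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA); NaN when no informative sites exist."""
    total = abba + baba
    if total == 0:
        return float("nan")
    return (abba - baba) / total


# Toy alignment (invented data): two ABBA sites and one BABA site.
p1 = "AAGACA"
p2 = "ATGTCC"
p3 = "ATGACC"
og = "AAGTCA"
n_abba, n_baba = abba_baba_counts(p1, p2, p3, og)
d_value = patterson_d(n_abba, n_baba)
```

Under incomplete lineage sorting alone, ABBA and BABA patterns are expected in equal numbers, so D is expected to be near zero; an excess of ABBA (D > 0) is consistent with gene flow between P2 and P3, and an excess of BABA (D < 0) with gene flow between P1 and P3.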

New Phytologist (2017) 214: 1736–1750. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust. www.newphytologist.com