Molecular Phylogenetics and Evolution xxx (xxxx) xxx


A phylogenomic approach resolves the backbone of Prunus (Rosaceae) and identifies signals of hybridization and allopolyploidy

Richard G. J. Hodel *, Elizabeth Zimmer, Jun Wen

Department of Botany, National Museum of Natural History, MRC 166, Smithsonian Institution, Washington, DC 20013-7012, USA

ARTICLE INFO

Keywords: Allopolyploidy; Cytonuclear discord; Gene conflict; Hybridization; Phylogenomics; Phylotranscriptomics

ABSTRACT

The genus Prunus, which contains 250–400 species, has ample genomic resources for the economically important taxa in the group including , , and . However, the backbone of Prunus, specifically the position of the racemose group relative to the solitary and corymbose groups, remains phylogenetically uncertain. Surprisingly, phylogenomic analyses to resolve relationships in the genus are lacking. Here, we assemble transcriptomes from 17 Prunus species representing four subgenera, and use existing transcriptome assemblies, to resolve key relationships in the genus using a phylogenomic approach. From the transcriptomes, we constructed 21-taxon datasets of putatively single-copy nuclear genes with 591 and 379 genes, depending on taxon-occupancy filtering. Plastome sequences were obtained or assembled for all species present in the nuclear data set. The backbone of Prunus was resolved consistently in the nuclear and chloroplast phylogenies, but we found substantial cytonuclear discord within subgenera. Our nuclear phylogeny recovered a monophyletic racemose group, contrasting with previous studies finding paraphyly that suggests repeated allopolyploidy early in the evolutionary history of the genus. However, we detected multiple species with histories consistent with hybridization and allopolyploidy, including a deep hybridization event involving subgenus Amygdalus and the Armeniaca clade in subgenus Prunus. Analyses of gene tree conflict revealed substantial discord at several nodes, including the crown node of the racemose group. Alternative gene tree topologies that conflicted with the species tree were consistent with a paraphyletic racemose group, highlighting the complex reticulated evolutionary history of this group.

1. Introduction

The genomic age has demonstrated that the resolution of phylogenetic relationships often requires many unlinked loci from the nuclear genome (Sang, 2002; Zimmer and Wen, 2015). It has become clear that relying on only a few loci, or many linked loci (i.e., chloroplast genomes), can lead to incorrect phylogenomic inferences due to idiosyncrasies or biases in linked organellar genomes (Moore, 1995; Rothfels et al., 2017). Processes such as incomplete lineage sorting, hybridization, polyploidy, and horizontal gene transfer complicate phylogenetic inference (Maddison and Wiens, 1997; Eaton and Ree, 2013; Folk et al., 2018; Knowles et al., 2018). Approximately one quarter of species have undergone hybridization (Rieseberg and Willis, 2007), and polyploidy appears to accompany at least 15% of speciation events in angiosperms ( et al., 2009), making both key mechanisms driving diversification and complicating phylogenetic inference. Instances of polyploidy and/or hybridization are ubiquitous in the Rosaceae (e.g., Wang et al., 2019; Liu et al., 2020), a well-studied clade of angiosperms.

In many understudied clades, collecting phylogenomic data is expensive and difficult, which explains their lack of resolved relationships. Meanwhile, some clades have ample genomic resources, but still lack phylogenetic resolution of key relationships. One notable example of the latter is Prunus (Rosaceae), an angiosperm genus of approximately 250–400 species that contains economically important taxa such as cherries, peaches, , almonds, and plums. The genus also contains many evergreen and deciduous species occurring throughout the temperate regions of the northern hemisphere and in the tropics and subtropics. The genus is characterized by reproductive features such as superior ovary position and a monocarpellate pistil developing into a drupe, and vegetative characters including a solid pith and glands (Chin et al., 2013; Chin et al., 2014). Within Prunus, there are three major groups defined by inflorescence architecture. These include the

* Corresponding author. E-mail address: [email protected] (R. G. J. Hodel). https://doi.org/10.1016/j.ympev.2021.107118 Received 29 September 2020; Received in revised form 1 February 2021; Accepted 8 February 2021 Available online 18 February 2021 1055-7903/© 2021 Elsevier Inc. All rights reserved.

deciduous solitary flower group, which includes species such as and , the deciduous corymbose group (i.e., cherries) with flowers arranged in a corymb or umbel, and the racemose group characterized by flowers borne on a raceme, and containing both tropical and temperate species and deciduous and evergreen representatives (Chin et al., 2013).

There are substantial genomic resources for the genus Prunus, with genome assemblies available for seven species: apricot (P. armeniaca; Jiang et al., 2019), sweet cherry (P. avium; Shirasawa et al., 2017), almond (P. dulcis; Sánchez-Pérez et al., 2019), Chinese plum (P. mume; Zhang et al., 2012), peach (P. persica; Verde et al., 2013), Japanese plum (P. salicina; Xue et al., 2019), and Yoshino cherry (P. yedoensis; Shirasawa et al., 2019). However, surprisingly, there has not been a phylogenomic investigation focusing on resolving key relationships within Prunus. Phylogenetic studies on Prunus have used several plastid markers, the nuclear ribosomal ITS, and a few nuclear loci and have produced some significant insights, but many questions remain (Bortiri et al., 2001, 2002; Lee and Wen, 2001; Wen et al., 2008; Chin et al., 2010, 2014). Phylogenetic relationships within Prunus are not fully resolved, particularly along the backbone of the phylogeny (Chin et al., 2014; Zhao et al., 2016; Xiang et al., 2017; Zhao et al., 2018; see Table 1). Cytonuclear discord and multiple copies of low-copy nuclear loci suggest that allopolyploidy and/or ancient hybridization is obscuring phylogenetic relationships (Zhao et al., 2016). The relationships among major groups within Prunus defined by inflorescence morphology (solitary, corymbose, and racemose lineages) are poorly resolved (Chin et al., 2014). Chloroplast data show a sister relationship between the solitary and corymbose groups, but nuclear data have not clearly resolved the relationship among the solitary, corymbose, and racemose groups.

Shi et al. (2013) used a concatenated matrix of 12 chloroplast genes, ITS, and two single-copy nuclear genes to infer with maximum support that the racemose lineage was sister to a clade of the solitary + corymbose lineages. The topology recovered by Shi et al. (2013) is concordant with chloroplast phylogenies from other studies (Chin et al., 2014), suggesting that, under concatenation, the signal in their data matrix may have been dominated by the 12 chloroplast genes they used (10,677 bp of the alignment) relative to the nuclear genes (2,275 bp). Subsequent studies also found the racemose group sister to the other two groups using chloroplast data (Chin et al., 2014) but found a paraphyletic racemose group when using nuclear data (both Chin et al., 2014; Zhao et al., 2016). A larger-scale systematic study (Xiang et al., 2017) used a phylotranscriptomic approach to resolve phylogenetic relationships in the Rosaceae and included 11 specimens of Prunus. They investigated many datasets ranging from 113 to 882 genes, using a concatenation approach with 113 genes and ASTRAL with 11 different gene sets. They recovered a monophyletic racemose lineage with their concatenation analysis and two of the ASTRAL datasets with few genes (113, 166 genes), but a paraphyletic racemose group in all other phylogenies. All of their ASTRAL phylogenies with greater than 256 genes indicated some degree of paraphyly in the racemose group. Although Xiang et al. (2017) were targeting genes to resolve the Rosaceae as opposed to Prunus, the levels of uncertainty along the backbone of Prunus in their study while using hundreds of nuclear genes highlight the need for a study focused on resolving Prunus.

A query of the NCBI sequence read archive (SRA; https://www.ncbi.nlm.nih.gov/sra) found that assembled transcriptomes and/or RNA-Seq data were available for 21 species in Prunus, with at least two species represented in each of the three major groups (Table 2). We have assembled transcriptomes to generate a phylogenomic dataset of hundreds of nuclear genes for 21 Prunus species plus an outgroup. Specifically, our goals are to: 1) use this phylogenomic dataset to resolve the relationships of the backbone of the genus; 2) identify key nodes in the genus with underlying gene tree conflict; 3) investigate the causes of observed conflict at these nodes.

Table 1
Previous phylogenetic studies investigating the relationships among the major clades in Prunus. For each study, we report the number of species included, the type of genetic data used, and the key results regarding the inferred relationships of the racemose, corymbose, and solitary groups.

Study | Species | Genetic data | Result
Shi et al., 2013 | 84 | 12 chloroplast genes, 3 nuclear genes | Racemose sister to (corymbose, solitary)
Chin et al., 2014 | 81 | 4 chloroplast genes, nuclear ITS | Chloroplast: Racemose sister to (corymbose, solitary); Nuclear ITS: Paraphyletic racemose group
Zhao et al., 2016 | 47 | 3 nuclear genes | Paraphyletic racemose group
Xiang et al., 2017 | 124 | Transcriptomic datasets of 113–882 nuclear genes | 12 phylogenies presented; 2 show Racemose sister to (corymbose, solitary), 10 show paraphyletic racemose group

2. Materials and methods

2.1. Dataset construction

We queried publicly available nucleotide sequence databases to generate a phylogenomic dataset for as many Prunus species as possible. Raw reads were downloaded from the NCBI SRA for 21 accessions of Prunus species plus one outgroup, Malus domestica (apple) (Table 2). Reads were either obtained from RNA-Seq or whole genome sequencing projects (Table 2). We downloaded transcriptomes from three Prunus species and the outgroup with annotated draft genomes (P. avium, P. mume, P. persica, and M. domestica). For species with available RNA-Seq data, but not assembled transcriptomes, we assembled transcriptomes for use in this project (N = 17); all analyses were run on the Smithsonian Institution High Performance Cluster (SI/HPC, “Hydra”) (https://doi.org/10.25572/SIHPC). For these 17 species, we followed step one (i.e., the script ‘filter_fq.py’) of the Yang and Smith protocol for building a phylogenomic dataset (Morales-Briones et al., 2021; Yang and Smith, 2014) to clean and quality filter the data. First, Rcorrector (Song and Florea, 2015) was used to correct random sequencing errors in the raw fastq files. Next, problematic read pairs that could not be corrected were removed from the dataset. Trimmomatic (Bolger et al., 2014) was used to remove sequencing adapters and any low-quality sequences.

At this step, chloroplast reads were filtered out using Bowtie2 (Langmead and Salzberg, 2012) with the chloroplast database as a reference. We obtained publicly available plastome data for 16 species and assembled plastomes for the remaining five species using the reads that mapped to a chloroplast reference (P. avium) using the software FastPlast (https://github.com/mrmckain/Fast-Plast). MAFFT (Katoh and Standley, 2013) was used to align the plastomes and the resulting alignments were trimmed using TrimAl with the -automated1 heuristic, and with the ‘pxclsq’ command in phyx (phylogenetic tools for unix; Brown et al., 2017) to filter the alignment based on either 10% or 30% column occupancy required. Each of these three alignments was used in subsequent plastome phylogenetic analyses (Supplemental Table S1).

The retained nuclear reads were run through FastQC (Andrews, 2010) to check the quality of the reads and to detect and then remove overrepresented reads. The resulting fastq files were normalized using the nkbc_normalize.pl script and then used for de novo transcriptome assembly in Trinity (Haas et al., 2013). We processed the transcriptomes using MarkerMiner (Chamala et al., 2015) to construct a dataset of putatively single copy nuclear (SCN) genes. Briefly, MarkerMiner uses reciprocal BLAST (Altschul et al., 1990) queries on all inputted filtered transcriptome assemblies to identify putative orthologs using a user-specified reference (in our case, Malus domestica). The orthologs are screened against a curated set of genes with SCN status across the


angiosperm phylogeny (De Smet et al., 2013). MarkerMiner then clusters transcripts by reference protein ID, reorients them based on the reference sequence, and aligns the retained orthologs using MAFFT (Katoh and Standley, 2013). We filtered the MAFFT alignments outputted by MarkerMiner using TrimAl with the -automated1 heuristic, and used the ‘pxclsq’ command in phyx to filter the alignment based on either 10% or 30% column occupancy required. Each of these three alignments was used in subsequent nuclear phylogenetic analyses (Supplemental Table S1). Although MarkerMiner uses a set of strictly or mostly SCN genes for dataset construction, we used TreSpEx (Struck, 2014) to remove any remaining potential paralogues from all alignments. The a priori paralogy screening function with a bootstrap threshold of 95 was used, and we discarded any genes identified as potential paralogs. For each alignment we used a complete data matrix with 591 genes and filtered out genes with data for fewer than 10 species; the resulting data matrix contained 379 genes. We also obtained chromosome count data for all species with data available, primarily from Index to Plant Chromosome Numbers (IPCN) Chromosome Reports (http://legacy.tropicos.org/NameSearch.aspx?projectid=9) hosted by Tropicos, to assess possible polyploidy within Prunus and its subgenera.

Table 2
The 22 species used in the phylogenomic analysis, including 21 Prunus species plus the outgroup (Malus domestica). The group based on floral architecture, subgenus, and origin of the data (NCBI accession) are listed. For each taxon where data are available, the sporophytic chromosome count number from the Tropicos IPCN chromosome reports is listed, along with the source listed on Tropicos. There were no chromosome count data available for P. mira, P. sibirica, and P. humilis.

Group | Subgenus | Species | NCBI accession | Sporophytic chromosome count | Chromosome citation
outgroup | – | Malus domestica | GCF_002114115.1_ASM211411v1_rna | 34 | Fuchs et al., 1995
corymbose | Cerasus | | SRR5112800 | 16 | González Zapatero et al., 1988
corymbose | Cerasus | | GCF_002207925.1_PAV_r1.0_rna | 16 | Montgomery et al., 1997
corymbose | Cerasus | Prunus campanulata | SRR8186852 | 16 | Oginuma, 1987
corymbose | Cerasus | Prunus pseudocerasus | SRR1568248 | 24/32 | Oginuma, 1987/Iwatsubo et al., 2002
corymbose | Cerasus | Prunus yedoensis | DRR170194 | 16 | Chen, 1993
corymbose | Cerasus | Prunus subhirtella | SRR10737061 | 16 | Iwatsubo et al., 2004
corymbose | Cerasus | Prunus sargentii | SRR3738810 | 16 | Nishikawa, 1985
solitary | Amygdalus | Prunus dulcis | ERR2182602 | 16 | https://www.rosaceae.org/species/prunus/prunus_dulcis
solitary | Amygdalus | | SRR8369796 | – | –
solitary | Amygdalus | Prunus persica | GCF_000346465.2_Prunus_persica_NCBIv2_rna | 16 | Terraciano and Cialzeta, 1979
solitary | Amygdalus | | SRR3658840 | 16 | Guo et al., 1986
solitary | Prunus | Prunus salicina | SRR8302195 | 16/24/32 | Oginuma, 1987/Lin et al., 1991/Eremin and Rassvetaeva, 1992
solitary | Prunus | Prunus domestica | SRR5074769 | 16/32/48 | Lin et al., 1991/Singhal et al., 1990
solitary | Prunus | Prunus mume | GCF_000346735.1_P.mume_V1.0_rna | 16 | Oginuma, 1987
solitary | Prunus | | SRR10505010 | 16 | Li et al., 2020
solitary | Prunus | Prunus sibirica | SRR9595637 | – | –
solitary | Prunus | | ERR2040422 | 32 | Corrias, 1980
solitary | Prunus | Prunus triloba | SRR5286140 | 64 | Chen et al., 2003
solitary | Prunus | | SRR10913940 | – | –
racemose | Padus | Prunus virginiana | SRR6134089 | 16 | Love and Love, 1982
racemose | Padus | Prunus serotina | SRR6134152 | 32 | Javurkova, 1980

3. Phylogenomic inference

We used several approaches to construct the phylogenies to account for the effect of sequence alignment, and ML model and search parameter choice. We used two sets of genes: ones present in ten or more taxa (N = 379), and all genes (N = 591), and we constructed concatenated supermatrices of 379 and 591 nuclear genes, respectively. We ran unpartitioned ML analyses on each supermatrix using both RAxML v8.2.11 (Stamatakis, 2014) with the GTRGAMMA model of evolution, 20 independent ML searches, and 100 bootstrap replicates, and IQ-TREE (Nguyen et al., 2015) with automated model selection using ModelFinder (Kalyaanamoorthy et al., 2017) and 1,000 rapid bootstraps. Next, a partitioned analysis was run in RAxML using the above parameter settings with the GTRGAMMA model with the α shape parameter of the rate heterogeneity model γ, empirical base frequencies, and evolutionary rates independently estimated for each partition. Additionally, a partitioned analysis with branch lengths unlinked between partitions was run in IQ-TREE using the TIM3 + F + R3 and GTR + F + R3 models of evolution, depending on the optimal model of evolution for each alignment-trimming strategy (Supplemental Table S1).

For the individual gene trees, RAxML was first used to estimate a phylogeny for each gene using the GTRGAMMA model of evolution, 20 independent ML searches, and 100 bootstrap replicates. We generated both unrooted gene trees for use in coalescent analyses, and rooted gene trees to use in downstream analyses (i.e., phyparts). We used ASTRAL (Mirarab et al., 2014) to estimate a species tree in a framework consistent with the coalescent using 1) 379 unrooted gene trees, and 2) 591 unrooted gene trees. Quartet support values were used to assess confidence in the ASTRAL trees. All of the above analyses (partitioned/unpartitioned RAxML, partitioned/unpartitioned IQ-TREE, ASTRAL) were run on both gene sets (379, 591) and on all three alignments (pxclsq-30, pxclsq-10, TrimAl), which resulted in 30 separate nuclear phylogenetic trees (Supplemental Table S1). A phylogeny for the plastome matrix was inferred using RAxML with the GTRGAMMA model of evolution, 20 independent ML searches, and 100 bootstrap replicates. We also used IQ-TREE to infer the phylogeny using the TVM + F + R2 model of evolution, as selected by ModelFinder, and 1000 rapid bootstraps. The ‘cophylo’ function in the R package phytools (Revell, 2012) was used to visualize concordance between the nuclear and plastome phylogenies.

3.1. Assessing conflict

We used several approaches to quantify gene tree conflict within the nuclear phylogenies. The program phyparts (Smith et al., 2015) was used to identify gene tree concordance and discordance for each node in the species tree. Because phyparts does not identify the process that generated gene tree conflict, we used PhyloNetworks (Solís-Lemus et al., 2017) to determine if conflict could be better explained by hybridization as opposed to incomplete lineage sorting.
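The concordance bookkeeping that phyparts performs at each node can be illustrated with plain sets of taxa. The sketch below is a simplified illustration rather than phyparts itself: the function name `classify_gene_tree`, the set-based tree representation, and the toy gene trees are assumptions made for this example, and bootstrap support and missing-outgroup handling are ignored.

```python
def classify_gene_tree(species_clade, gene_clades, gene_taxa):
    """Classify one rooted gene tree against one species-tree clade.

    species_clade: set of taxa descending from the species-tree node
    gene_clades:   list of sets, one per clade in the rooted gene tree
    gene_taxa:     set of all taxa sampled in the gene tree
    """
    # Restrict the species-tree clade to the taxa this gene actually sampled.
    inside = species_clade & gene_taxa
    outside = gene_taxa - species_clade
    # With fewer than two ingroup taxa, or no other taxa, the gene cannot
    # say anything about this clade.
    if len(inside) < 2 or not outside:
        return "uninformative"
    clades = [set(c) for c in gene_clades]
    if any(clade == inside for clade in clades):
        return "concordant"
    for clade in clades:
        overlap = clade & inside
        # Two rooted clades conflict when they overlap but neither
        # contains the other.
        if overlap and overlap != inside and (clade - inside):
            return "conflicting"
    return "uninformative"


# Hypothetical toy gene trees (not results from the study).
padus = {"P. serotina", "P. virginiana"}
taxa = {"P. serotina", "P. virginiana", "P. persica", "P. avium"}
tree_a = [{"P. serotina", "P. virginiana"}, {"P. persica", "P. avium"}]
tree_b = [{"P. serotina", "P. persica"},
          {"P. serotina", "P. persica", "P. avium"}]

print(classify_gene_tree(padus, tree_a, taxa))  # concordant
print(classify_gene_tree(padus, tree_b, taxa))  # conflicting
```

A gene tree that samples only one of the two Padus species is returned as uninformative, which is one reason fewer genes can inform a node than are present in the dataset.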
The program phyparts compares rooted gene trees with the rooted species tree topology to identify

concordant, discordant, and uninformative gene trees at each node. Because rooted gene trees were used, fewer gene trees were available for this analysis due to the absence of the outgroup in some gene trees: 320 for the dataset requiring 10+ species present in a gene tree (379 unrooted gene trees), and 438 for the less stringently filtered dataset (591 unrooted gene trees). We ran phyparts for each alignment/taxon occupancy combination (N = 6) against the dominant species tree topology. For any nodes with conflicting topologies in the nuclear phylogeny, we ran the ‘alternative relationship test’ in phyckle (Smith et al., 2020) to investigate the nature and impact of gene trees discordant with the species tree. The alternative relationship test uses two or more user-specified bipartitions and uses these as a constraint when running RAxML to infer every gene tree. Log-likelihood scores are calculated and compared for each gene tree to determine which topology (i.e., between the user-inputted bipartitions) is optimal. The number of gene trees and/or the summed difference in log-likelihoods between gene trees are used to determine support for one bipartition versus others.

Two approaches were used to investigate potential hybridization events in Prunus: Hybrid Detector (HyDe; Blischak et al., 2018) and SNaQ (Solís-Lemus et al., 2016), with the latter implemented in PhyloNetworks (Solís-Lemus et al., 2017). HyDe uses phylogenetic invariants arising under a coalescent model with hybridization to detect and assign probability of hybridization of three ingroup taxa relative to an outgroup taxon. The parameter γ represents the probability of gene trees with a hybrid population sister to parent A arising under the parental population trees, whereas 1-γ would be the probability of the hybrid population being sister to parent B. The HyDe analysis was run exhaustively on all possible taxa combinations, with M. domestica set as the outgroup.

In lineages with hypothesized reticulate evolution, strictly bifurcating phylogenies may not accurately capture evolutionary history, so we estimated a phylogenomic network using the pseudolikelihood method SNaQ, which explicitly accommodates introgression/gene flow and incomplete lineage sorting. The RAxML-inferred gene trees for all 591 genes were used as input and were summarized using quartet concordance factors (i.e., the proportion of gene trees with a given quartet). Networks are optimized based on branch lengths and inheritance probabilities in phylogenetic network space using a pseudo-deviance score. This score is a multiple of the network’s log-likelihood score, up to a constant that corresponds to a network fitting the data perfectly, with lower pseudo-deviance scores indicating a better fit. We investigated hmax values ranging from 0 to 6. The pxclsq-10-trimmed 379-gene ASTRAL tree was used as a starting tree for the initial network optimization (hmax = 0) and for subsequent hmax values, the best network estimated by the preceding hmax value was used as the starting topology. Ten independent runs were used for each hmax value and the optimal number of hybridization events was assessed by plotting hmax versus the log-likelihood score for the optimal network for each hmax value.

4. Results

4.1. Phylogenomic inferences

All major clade relationships were identical in all inferred nuclear trees (Fig. 1, Supplemental Table S1); details about the datasets used are

Fig. 1. The nuclear (left) and plastome (right) topologies for 21 Prunus species plus the outgroup (Malus domestica). The nuclear phylogeny shows the dominant nuclear topology shared by 23 of 30 datasets analyzed (see Supplemental Table S1, Materials and Methods). At the nodes, symbols indicate levels of bootstrap support from the RAxML/IQ-TREE trees and quartet support from the ASTRAL trees. Triangles indicate a node has bootstrap support of 100% in all 24 ML datasets and ASTRAL quartet support greater than 45. Diamonds show nodes with bootstrap support of 100% in all 24 ML datasets and ASTRAL quartet support greater than 65. Crosses denote nodes that have 100% bootstrap support in all 24 ML datasets and ASTRAL quartet support greater than 85. In the plastome tree, asterisks show nodes where bootstrap support is greater than 95% in all six plastome datasets. All trees, with support values, are available in Supplemental Figure S1. The dashed lines in the center of the plot show instances of cytonuclear discord. Ingroup branch lengths represent nucleotide substitutions per site, but the branch length of the outgroup taxon (Malus domestica) is not to scale to facilitate visualization. The alternative nuclear topology for subgenus Amygdalus is depicted in the inset box in the upper left of the nuclear phylogeny. For both phylogenies, subgenera (Prunus, Amygdalus, Cerasus, Padus) are color-coded and floral arrangement is indicated on the far right. Photo credits: Bin-Bin Liu, Xianyun Mu, and Yubing Wang.
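The dataset counts quoted in the Fig. 1 caption (30 nuclear analyses, 24 of them ML) follow from crossing the gene sets, alignments, and inference methods listed in the Methods. The short sketch below enumerates that grid; the variable names and method labels are illustrative assumptions, and only the counts and the 23-of-30 figure come from the text.

```python
from itertools import product

# Factors taken from the Methods; the labels themselves are illustrative.
GENE_SETS = (379, 591)
ALIGNMENTS = ("pxclsq-30", "pxclsq-10", "TrimAl")
METHODS = (
    "RAxML unpartitioned",
    "RAxML partitioned",
    "IQ-TREE unpartitioned",
    "IQ-TREE partitioned",
    "ASTRAL",
)

# One entry per nuclear phylogenetic analysis.
grid = [
    {"genes": genes, "alignment": alignment, "method": method}
    for genes, alignment, method in product(GENE_SETS, ALIGNMENTS, METHODS)
]

ml_runs = [run for run in grid if run["method"] != "ASTRAL"]
astral_runs = [run for run in grid if run["method"] == "ASTRAL"]

# Share of analyses recovering the dominant topology (23 of 30 in Fig. 1).
dominant_share = 23 / len(grid)

print(len(grid), len(ml_runs), len(astral_runs))  # 30 24 6
print(round(dominant_share, 3))                   # 0.767
```

The 23-of-30 share is just above the 75% threshold used to define the dominant nuclear topology.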

listed in Supplemental Table S2. Most inferred nuclear trees were similar within major clades and/or subgenera, except that P. persica and P. davidiana are sisters in most trees but P. persica is sister to P. mira in nearly one quarter of the nuclear phylogenies, all of them using the entire set of genes as opposed to the 10+ taxon set (Supplemental Fig. S1, Supplemental Table S1). Henceforth, the term ‘nuclear phylogeny’ refers to the dominant nuclear topology, present in greater than 75% of nuclear trees. Notably, the P. persica - P. mira sister relationship was also found in the plastome tree (Supplemental Figure S1F). For the most part, our study did not recover the cytonuclear discord along the backbone of the genus that was detected in previous studies. The relationships among major groups based on floral/inflorescence architecture and subgenera were the same (Fig. 1). However, within the major clades, there was substantial variation between our nuclear and chloroplast phylogenies, especially within the subgenus Cerasus clade (Fig. 1).

4.2. Racemose taxa

The racemose taxa in Prunus were only represented in this study by two species in subgenus Padus, P. serotina and P. virginiana, but in all analyses there was a monophyletic racemose group sister to the solitary + corymbose groups. With only two racemose species, it is difficult to make strong conclusions about the relationship of the morphologically diverse racemose lineages with the rest of Prunus. Although a monophyletic racemose group was strongly supported by bootstrap values (Fig. 1, Supplemental Fig. S1), there was a substantial amount of underlying gene tree conflict. At the node defining a clade of P. serotina + P. virginiana, 125 genes supported the species tree topology, but 75 conflicted (Fig. 2). In summary, of the 200 genes that could inform this relationship, over a third of them favored a relationship in which subgenus Padus was paraphyletic.

Fig. 2. The phyparts tree indicating conflict at each node of the nuclear phylogeny using the 591-gene alignment. Because phyparts uses rooted gene trees, only 438 genes were included in this analysis. Here the conflicts are relative to the dominant nuclear topology (Fig. 1). At each node, the pie chart shows the proportion of homologs supporting the clade defined by the node (blue), the proportion supporting the primary alternative for that clade (green), the proportion supporting all other alternatives for the clade (red), and the proportion of homologs with less than 50% bootstrap support (gray). Along each branch, the top number shows the number of genes concordant with the species tree at the associated node, and the bottom number represents the number of genes discordant with the species tree for the clade of interest. For six nodes, there is a letter A-F located to the upper left of the pie chart for ease of reference to these nodes in the text. The phyparts tree using the gene alignment that required 10+ taxa presence is displayed in Supplemental Figure S2. Subgenera are color-coded and floral arrangement is indicated on the far right.

4.3. Subgenus Prunus

There were a number of differences in the nuclear and chloroplast phylogenies (Fig. 1). Prunus domestica and P. salicina were sister in both phylogenies and in previous studies (Shi et al., 2013, Chin et al., 2014). In each phylogeny, the subgenus was divided into two strongly supported clades; however, they were not the same. In both cases, P. armeniaca and P. mume were in the same clade (also observed in Shi et al., 2013, Chin et al., 2014), as were P. prostrata, P. humilis, and P. triloba (P. prostrata and P. triloba unsampled in Shi et al., 2013; clade unresolved in Chin et al., 2014). However, these five taxa formed a clade in our chloroplast phylogeny, but not in our nuclear phylogeny, where the Armeniaca clade (P. armeniaca, P. mume, and P. sibirica) formed a clade with P. domestica and P. salicina that was sister to the clade of P. triloba + P. prostrata + P. humilis. Prunus sibirica was always sister to P. domestica and P. salicina in the chloroplast phylogeny, although in the nuclear tree the clade P. mume + P. armeniaca + P. sibirica was sister to P. domestica and P. salicina (P. sibirica + P. mume + P. armeniaca were sister to P. salicina + P. domestica in Shi et al., 2013; not observed in chloroplast data in Chin et al., 2014).

4.4. Subgenus Amygdalus

With the taxa in our study, the monophyly of subgenus Amygdalus was strongly supported, but within the subgenus there was conflict between the dominant and alternative nuclear topologies (Fig. 1). The phyckle alternative relationship analysis identified two outlier genes that likely drove the minority of analyses supporting the alternative topology (P. mira - P. persica sister; see following section). The chloroplast phylogeny also had a P. mira - P. persica sister relationship, explaining why previous studies found conflicting relationships: usually P. mira - P. persica sister in chloroplast phylogenies (Shi et al., 2013; Chin et al., 2014), or a third relationship, P. davidiana - P. mira sister, in nuclear data (Chin et al., 2014).

4.5. Subgenus Cerasus

The one similarity between the nuclear and chloroplast phylogenies in subgenus Cerasus was P. mahaleb being placed with strong support as sister to the rest of the subgenus, which was also observed in Shi et al. (2013), Chin et al. (2014), and Zhao et al. (2016). Otherwise, there were substantial differences between the nuclear and chloroplast phylogenies (Fig. 1). In the nuclear tree, P. avium was strongly supported as sister to the remainder of the subgenus excluding P. mahaleb, whereas in the chloroplast tree P. avium was sister to P. campanulata + P. sargentii + P.

pseudocerasus, although there was less than 100% bootstrap support (Fig. 1). In summary, there was incongruence between chloroplast and nuclear phylogenies regarding the sister taxon or clade for every species in subgenus Cerasus with the exception of P. mahaleb. Due to taxon sampling differences, it was impossible to treat these relationships as congruent or discordant with previous studies.
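The alternative relationship test described in Section 3.1 scores each gene under two constrained topologies, then counts supporting genes and sums the per-gene log-likelihood differences. The sketch below shows that tally and the effect of excluding genes with an extreme difference. It is a minimal illustration and not phyckle's implementation: the function `tally_support`, its arguments, and the toy log-likelihoods are assumptions made for this example.

```python
def tally_support(lnl_a, lnl_b, outlier_cutoff=None):
    """Tally per-gene support for bipartition A versus bipartition B.

    lnl_a, lnl_b:   dicts mapping gene name -> log-likelihood of the best
                    tree found under constraint A / constraint B
    outlier_cutoff: if given, genes with |delta lnL| above it are skipped
    """
    summary = {
        "A": {"genes": 0, "sum_dlnl": 0.0},
        "B": {"genes": 0, "sum_dlnl": 0.0},
    }
    for gene in lnl_a.keys() & lnl_b.keys():
        delta = lnl_a[gene] - lnl_b[gene]
        if delta == 0:
            continue  # the gene does not discriminate between constraints
        if outlier_cutoff is not None and abs(delta) > outlier_cutoff:
            continue  # treat as an outlier gene and exclude it
        winner = "A" if delta > 0 else "B"
        summary[winner]["genes"] += 1
        summary[winner]["sum_dlnl"] += abs(delta)
    return summary


# Hypothetical values: two genes weakly favor A, one strongly favors B.
lnl_a = {"g1": -1000.0, "g2": -2000.0, "g3": -3000.0}
lnl_b = {"g1": -1002.5, "g2": -2001.0, "g3": -2850.0}

all_genes = tally_support(lnl_a, lnl_b)
no_outliers = tally_support(lnl_a, lnl_b, outlier_cutoff=100)
print(all_genes)    # A: 2 genes, sum 3.5; B: 1 gene, sum 150.0
print(no_outliers)  # A: 2 genes, sum 3.5; B: 0 genes, sum 0.0
```

In the toy input, more genes favor A while the summed difference favors B until the single extreme gene is excluded, which mirrors the pattern reported for the P. persica sister relationship.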

4.6. Quantifying conflict

An edge-based phylogenomic support test implemented in phyckle (Smith et al., 2020) on the 591-gene dataset revealed that the P. persica - P. davidiana sister relationship was supported by more genes (109) than the P. persica - P. mira sister relationship, which was supported by 105 genes (Table 3). Meanwhile, the summed difference in log-likelihood scores supported the P. persica - P. mira sister relationship (sum ΔlnL = 1390.93) over the P. persica - P. davidiana sister relationship (sum ΔlnL = 855.674). However, when extreme outlier genes were excluded (defined as ΔlnL > 100 for a single gene), both the number of genes and summed difference in log-likelihood scores supported the P. persica - P. davidiana sister relationship. The cutoff of ΔlnL > 100 is arbitrary and conservative; smaller ΔlnL cutoffs yielded qualitatively consistent results. All results were consistent among different alignments (Table 3, Supplemental Fig. S1).

Table 3
Results from phyckle showing how individual genes support the conflicting bipartitions in subgenus Amygdalus, for the 591-gene and 379-gene datasets. For the P. davidiana - P. persica and P. mira - P. persica sister relationships, the number of supporting genes and the sum of the differences in log-likelihood (lnL) are given, with all genes and with outlier genes removed. Rows within each dataset correspond to the three alignments.

Dataset | Genes, P. davidiana - P. persica | Genes, P. mira - P. persica | Sum lnL difference, P. davidiana - P. persica | Sum lnL difference, P. mira - P. persica | Genes, P. davidiana - P. persica (outliers removed) | Genes, P. mira - P. persica (outliers removed) | Sum lnL difference, P. davidiana - P. persica (outliers removed) | Sum lnL difference, P. mira - P. persica (outliers removed)
591-gene | 109 | 105 | 855.674 | 1390.93 | 109 | 103 | 855.674 | 637.090
591-gene | 107 | 106 | 901.440 | 1621.907 | 107 | 100 | 901.440 | 693.935
591-gene | 118 | 98 | 941.171 | 1623.679 | 118 | 96 | 941.171 | 677.183
379-gene | 101 | 105 | 840.828 | 965.946 | 101 | 104 | 840.828 | 624.730
379-gene | 102 | 103 | 884.618 | 1029.821 | 102 | 102 | 884.618 | 687.839
379-gene | 109 | 99 | 892.106 | 1028.449 | 109 | 98 | 892.106 | 681.592

Although the vast majority of relationships inferred in both nuclear and chloroplast phylogenies had 100% bootstrap support, there was substantial underlying conflict. The ASTRAL quartet support scores ranged from 37% to 98%, with the lowest score occurring at the uncertain P. persica sister relationship. We investigated conflict in the nuclear trees using phyparts (Smith et al., 2015) and found several nodes with substantial underlying gene tree conflict, especially along the backbone of the phylogeny (Fig. 2). ASTRAL quartet support scores approximately corresponded to the phyparts results in that higher quartet support scores for a given node were generally associated with fewer conflicting genes for that node. Notably, some nodes had more gene trees in conflict with the species tree than congruent to the species tree (Fig. 2, Supplemental Fig. S2). At three nodes along the backbone (defining the clades Cerasus, Amygdalus + Prunus, and Cerasus + Amygdalus + Prunus), approximately 2.5–3 times more genes supported the species tree than conflicted with it. However, for the node defining a monophyletic Padus in our species tree, approximately 35% of the

informative gene trees supported a non-monophyletic Padus (Fig. 2); the detected as of two primary alternative topologies both supported a paraphyletic Padus. differences well sum as and taxa) taxa) taxa) taxa) taxa) taxa) 4.7. Tests of hybridization relationship shown log-likelihood genes taxa) taxa) taxa) taxa) taxa) taxa) other other other other other other is of Of the 4,041 hypotheses tested in the HyDe analysis, 41 showed of other other other other other other )|(all )|(all )|(all significant evidence of a hybridization event (Table 4). Nine of the 21 )|(all )|(all )|(all conflicting sum Prunus species were determined to have significant evidence of hy­ )|(all )|(all )|(all )|(all )|(all )|(all number the the persica persica persica bridization, but only five species (P. sargentii, P. domestica, P. triloba, P. persica persica persica relationship P. P. P. P. P. P. but , , , , , , both persica persica persica davidiana, P. sibirica) had more than two inferred hybridization events persica persica persica P. P. P. P. P. P. (Table 4). Whereas significant γ values close to 0.5 are indicative of a each , , , , , , γ

recent hybridization event, significant values closer to 0 or 1 can investigate davidiana mira davidiana mira davidiana mira davidiana mira davidiana mira davidiana mira topology, to P. P. P. P. P. P. P. P. P. P. P. P.

reflectsignatures of ancient hybridization retained in extant taxa. In our removed, ( ( ( ( ( ( Bipartition Bipartition ( ( ( ( ( ( HyDe analysis, 30 of 41 γ values were greater than 0.7 or less than 0.3. supporting

The optimal number of hybridization events inferred in the SNaQ were nuclear analysis

network analysis was one, with the hybridization edge between subge­ genes the of

nus Amygdalus and the P. mume - P. armeniaca - P. sibirica (Armeniaca) in % % 30% 10 clade in subgenus Prunus (Fig. 3). The Armeniaca clade was 80.0% sister 30% 10 phyckle + relationship

to (P. domestica P. salicina) and 20.0% sister to subgenus Amygdalus shown. the number TrimAl pxclsq pxclsq TrimAl pxclsq pxclsq (Fig. 3). Allowing higher values of hmax decreased the pseudo-deviance observed of ------are the score. However, as hmax increased beyond one there were only mini­ persica

mal improvements in how the data fitthe model; we therefore followed 3 = results convention (e.g., Blair et al., 2019) and considered hmax 1 to be mira-P. Amygdalus Amygdalus Amygdalus Conflict Conflict Amygdalus Amygdalus Amygdalus alignments P. optimal for this dataset (Supplemental Fig. S3). Table The partitions, relationship
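
The comparison reported for the phyckle analysis (counting supporting genes and summing ΔlnL per relationship, with and without genes above the ΔlnL > 100 cutoff) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the phyckle implementation: the function name, the input layout, and the six per-gene values are hypothetical, because per-gene ΔlnL values are not given in the text.

```python
from collections import defaultdict


def summarize_support(gene_results, outlier_cutoff=None):
    """Tally genes and summed delta lnL per favored relationship.

    gene_results: iterable of (gene_id, favored_relationship, delta_lnl).
    outlier_cutoff: if given, genes with delta_lnl above it are skipped.
    """
    summary = defaultdict(lambda: {"genes": 0, "sum_delta_lnl": 0.0})
    for _gene_id, favored, delta_lnl in gene_results:
        if outlier_cutoff is not None and delta_lnl > outlier_cutoff:
            continue  # treat this gene as an outlier and leave it out
        summary[favored]["genes"] += 1
        summary[favored]["sum_delta_lnl"] += delta_lnl
    return dict(summary)


# Hypothetical per-gene results; "g5" plays the role of an extreme outlier.
data = [
    ("g1", "davidiana", 12.0),
    ("g2", "davidiana", 8.5),
    ("g3", "davidiana", 20.0),
    ("g4", "mira", 5.0),
    ("g5", "mira", 350.0),
    ("g6", "mira", 3.0),
]

all_genes = summarize_support(data)
filtered = summarize_support(data, outlier_cutoff=100.0)
print(all_genes)
print(filtered)
```

In this toy input a single high-ΔlnL gene is enough to make the summed ΔlnL favor one relationship even though the remaining genes favor the other, which is the pattern the outlier filter is meant to expose.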


Table 4. The Hybrid Detector (HyDe) results with significant γ values. For each set of parental taxa (P1, P2), a γ value significantly different from 0 indicates a signal of hybridization in the taxon under 'Hybrid'. Non-significant results (the remainder of the 4,041 tests) are not shown. Whereas significant γ values close to 0.5 are indicative of a recent hybridization event, γ values closer to 0 or 1 can reflect signals of ancient hybridization retained in extant taxa. Subgenera are color-coded to correspond to Figs. 1–3.
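
The interpretation of γ used here can be written as a simple rule of thumb. The Python sketch below assumes the 0.3 and 0.7 bounds quoted in the text as the boundary between intermediate and extreme γ values; the function names and the five example γ values are hypothetical and do not reproduce Table 4 or the HyDe package.

```python
def classify_gamma(gamma, low=0.3, high=0.7):
    """Label a significant HyDe gamma value by how far it is from 0.5."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between 0 and 1")
    if gamma < low or gamma > high:
        return "consistent with older hybridization"
    return "consistent with more recent hybridization"


def fraction_extreme(gammas, low=0.3, high=0.7):
    """Fraction of gamma values falling below `low` or above `high`."""
    gammas = list(gammas)
    if not gammas:
        raise ValueError("no gamma values supplied")
    extreme = sum(1 for g in gammas if g < low or g > high)
    return extreme / len(gammas)


# Hypothetical gamma values, for illustration only.
example_gammas = [0.12, 0.48, 0.81, 0.55, 0.93]
print([classify_gamma(g) for g in example_gammas])
print(fraction_extreme(example_gammas))

# The proportion reported in the text: 30 of 41 significant tests.
reported_fraction = 30 / 41
print(round(reported_fraction, 3))
```

The reported 30 of 41 corresponds to roughly 73% of significant tests, matching the "nearly three quarters" stated in the Discussion.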

5. Discussion

Our study combined hundreds of nuclear genes and plastome sequences to measure gene tree conflict and assess how hybridization shaped the complex reticulate evolutionary history of Prunus. We resolved the branching order of the backbone of Prunus using phylogenomic data and found a monophyletic racemose group and a lack of cytonuclear discord in the backbone relationships of the genus, contradicting previous studies (Zhao et al., 2016). While the dominant species tree topology for the taxa sampled strongly supports a monophyletic racemose group, the frequency of gene trees in conflict with the dominant topology indicates a substantial number of gene trees reflecting a paraphyletic racemose group, consistent with hypotheses of reticulate evolution in the racemose Prunus. The inference of an optimal species tree network with one hybridization event suggests that incomplete lineage sorting is not the sole or predominant cause of conflict, and that other processes (e.g., hybridization) driving discord should be considered. We obtained high bootstrap support for the backbone of the genus, but other metrics of support (phyparts analyses of gene tree discord and ASTRAL quartet support scores) showed substantial underlying conflict at a number of nodes. Below we discuss the potential causes and implications of the conflict detected at several key nodes.

5.1. Hybridization versus allopolyploidy

Several analyses support the conclusion that the genomes of some Prunus species have been shaped by ancient hybridization, allopolyploidy, or both. Cytonuclear discord in previous studies hinted at conflict, but discordance was confirmed and fully characterized in the present study. We consider the lines of evidence supporting histories of hybridization and/or allopolyploidy at each node with a substantial amount of gene tree conflict (i.e., nodes with more gene trees in conflict with the species tree than in concord; nodes labeled A-F in Fig. 2).


Fig. 3. The optimal network inferred in the SNaQ/PhyloNetworks analysis. This analysis determined that a network with hmax = 1 fit the data best based on changes in the pseudo-deviance network scores (shown in Supplemental Figure S3). The light blue line shows the hybridization event in the network, which is interpreted to mean that Prunus mume + P. armeniaca + P. sibirica was 80.0% (dark blue label) sister to P. domestica + P. salicina and 20.0% (light blue label) sister to subgenus Amygdalus. The sporophytic chromosome number from IPCN Chromosome Reports is displayed to the right of each species label. Subgenera are color-coded and floral arrangement is indicated on the far right.
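
The choice of hmax from the pseudo-deviance scores can be illustrated with a simple stopping rule: keep adding hybridization edges only while the score improves by more than some relative amount. The Python sketch below is an assumption made for illustration, not the SNaQ procedure. The study judged the score changes by inspection (Supplemental Figure S3), and the function name, the 10% threshold, and the example scores are all hypothetical.

```python
def choose_hmax(scores, min_relative_gain=0.10):
    """Pick the largest hmax whose added edge still improves the score enough.

    scores: dict mapping hmax (0, 1, 2, ...) to a positive pseudo-deviance
            score, where lower is better.
    min_relative_gain: smallest relative drop in score that justifies
            allowing one more hybridization edge (hypothetical threshold).
    """
    hs = sorted(scores)
    best = hs[0]
    for prev, cur in zip(hs, hs[1:]):
        if scores[prev] <= 0:
            raise ValueError("scores must be positive")
        gain = (scores[prev] - scores[cur]) / scores[prev]
        if gain < min_relative_gain:
            break  # further edges give only minimal improvement
        best = cur
    return best


# Hypothetical pseudo-deviance scores for hmax = 0..3.
example_scores = {0: 500.0, 1: 300.0, 2: 290.0, 3: 285.0}
selected = choose_hmax(example_scores)
print(selected)
```

With these made-up scores the first hybridization edge lowers the score by 40% and later edges by only a few percent, so the rule stops at one edge, mirroring the qualitative reasoning described for this dataset.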

Analyses such as SNaQ and HyDe are designed to detect hybridization but can also identify polyploid events resembling hybridization (i.e., allopolyploidy; Wagner et al., 2020). Chromosome count data are available for most species examined and each subgenus had a distribution of chromosome counts that suggests a history of frequent genome doubling (Table 2). Previous studies (Zhao et al., 2016) have noted that the distribution of chromosome counts suggests that polyploidy may have occurred several times in the evolutionary history of Prunus and at some nodes inferences of hybridization may be due to allopolyploidy (Table 2, Fig. 3). Notably, the three deeper nodes in subgenus Prunus have more gene trees in conflict with the species tree than in concord (nodes A, B, D in Fig. 2). At node A, the source of conflict is likely hybridization, as identified by the SNaQ analysis (Fig. 3). A hybridization event is supported, but not necessarily an allopolyploidy event, because all species involved in the hybridization event have the same chromosome count (2n = 16, or unknown in the case of P. sibirica; Table 2, Fig. 3).

At other nodes with high gene tree conflict in subgenus Prunus (nodes B, C, D in Fig. 2), allopolyploidy may explain discord. Many HyDe results implicate at least one member of subgenus Prunus as either a parent or hybrid in a hybridization event, and the HyDe results, in combination with chromosome count data, suggest a history of reticulation at these nodes that may include allopolyploidy. At nodes C and D, the conflict was likely due to taxa with high chromosome numbers in this clade (P. triloba, P. prostrata; Table 2, Fig. 3). More than half (21 of 41) of the significant HyDe results placed P. triloba, the species with the highest chromosome count (2n = 64), as the hybrid taxon (Table 2, Fig. 3). High chromosome numbers may also explain the conflict at node B: half of the species (four of eight) sampled in this clade with known chromosome counts have sporophytic counts greater than 16 (Table 2, Fig. 3). Of the 41 significant results (out of the 4,041 hypotheses tested) in the HyDe analysis, 25 hybridization events inferred a hybrid species that had a diploid chromosome count greater than 16 (P. domestica (2n = 16/32/48), P. triloba (2n = 64); Table 2, Fig. 3). As noted above, P. domestica and P. triloba are present in several clades defined by nodes with high gene tree conflict (nodes A-D in Fig. 2). The gene tree conflict at node F is likely not caused by allopolyploidy because no species in this clade differ in their chromosome counts (Table 2, Figs. 2 and 3).

Previous studies, which showed cytonuclear discord and a paraphyletic racemose group, proposed repeated allopolyploidy early in the evolutionary history of this genus as an explanatory mechanism (Zhao et al., 2016). Our finding of a monophyletic racemose lineage at the node defining the subgenus Padus refutes this hypothesis. However, an investigation of the discordant genes revealed that alternative gene tree topologies showed P. virginiana and P. serotina successively sister to subgenera Prunus + Amygdalus + Cerasus. Although there was uncertainty at several deeper nodes in the tree (nodes A, B, D, and the node defining subgenus Padus) as expected based on previous studies, conflict was also detected at nodes shallower in the tree (C, E; Fig. 2). As noted above, the gene tree discord deeper in the tree may have been driven by hybridization (node A) and allopolyploidy (nodes B, D). The conflict at shallow node E led to two nuclear topologies, which differed by a single relationship in the tree (the P. persica - P. mira - P. davidiana relationship; Fig. 1). In the conflict at node E, we determined that the secondary topology was driven by two outlier genes favoring the alternative topology of P. persica - P. mira sister with disproportionate influence on the species tree.

The excessive impact of outlier genes identified by phyckle highlights how it can be risky to make strong inferences based on just a few nuclear genes (Walker et al., 2018). The previous studies (Lee and Wen, 2001; Wen et al., 2008; Chin et al., 2010, 2014; Zhao et al., 2016) may have based their inferences on genes that do not track the species tree history well; our study used hundreds of loci to identify a well-supported species tree, but also identified many genes with different histories. Previous studies primarily have used bootstrap values and posterior probabilities to evaluate support for phylogenetic relationships, which can lead to a false sense that there is no uncertainty about a relationship when these values are 100%/1.0. Our study highlights that even when traditional support values indicate the highest possible confidence in relationships, there can be substantial underlying conflict. Thoroughly exploring datasets by using multiple alignment techniques, phylogenetic inference methods, and conflict analysis strategies is necessary in phylogenomic studies.

5.2. Depth of hybridization/allopolyploidy events

The SNaQ analysis supports a hybridization event relatively deep in the tree: a one-hybridization network was optimal, with hybridization between subgenus Amygdalus and a clade (the Armeniaca clade) within subgenus Prunus, as opposed to a recent event between two more closely related species. Moreover, the γ value from the SNaQ analysis (Fig. 3) was 0.8, likely implying a more ancient hybridization event and/or subsequent introgression(s). One explanation for the observed cytonuclear conflict is a history of chloroplast capture, when the cytoplasm of one species is replaced by that of another species following hybridization and subsequent introgression (Smith and Sytsma, 1990; Rieseberg and Soltis, 1991). This phenomenon has been documented in diverse lineages, including the Rosaceae (Liu et al., 2020) as well as other clades (e.g., Saxifragaceae (Soltis and Kuzoff, 1995) and Apiaceae (Yi et al., 2015)). The hybridization event between subgenus Amygdalus and the Armeniaca clade within subgenus Prunus inferred by SNaQ may have led to one or more chloroplast capture events, explaining frequent cytonuclear discord in subgenera Amygdalus and Prunus.

HyDe analyses showed that a substantial number of species have signals of ancient rather than recent hybridization (Table 4). The HyDe analysis indicated most significant instances of hybridization are older events deeper in the tree; nearly three quarters of hybridization events (30 of 41) had a HyDe γ value less than 0.3 or greater than 0.7 (Table 4). For example, all γ values inferring that a racemose or corymbose taxon is involved in a hybridization event, either as a parent or hybrid, are greater than 0.64 or less than 0.22 (Table 4). Given the order of divergences (the racemose and corymbose groups are successively sister to the solitary group), the distribution of HyDe γ values across Prunus groups conforms with the expectation that lower γ values would capture more ancient hybridization events (Table 4, Fig. 1).

5.3. Conclusions and prospects

In this study, we used a phylogenomic approach to resolve relationships along the backbone of Prunus, but some relationships were not inferred unequivocally. While previous studies hypothesized that there was a history of reticulate evolution in this genus, likely due to ancient hybridization and/or allopolyploidy, our study, with hundreds more nuclear loci, confirmed that there was substantial gene tree conflict in Prunus. We demonstrated that at several nodes, conflict could have been caused by hybridization and/or allopolyploidy, as supported by the SNaQ and HyDe analyses and chromosome count data. We found a monophyletic racemose group, but conflicting gene trees indicate that some loci show a history of a paraphyletic racemose group. Because this study used publicly available data, the taxon sampling was not ideal. Although we limited our study to putatively single copy orthologs, a history of allopolyploidy in Prunus implies that some genes in our dataset could be paralogs, explaining some of the discordance among trees. Future studies will need to significantly expand sampling on the understudied racemose lineages, especially the tropical lineages in Asia, Africa and the Neotropics (Wen et al., 2008; Chin et al., 2014; Zhao et al., 2016, 2018), as the racemose Prunus includes more than 50% of the species diversity in the genus. A two-step approach could be fruitful: first using dense taxon sampling and an approach such as Hyb-Seq to resolve the phylogeny and identify precise nodes on the tree where there are large amounts of discord, and then identifying loci that are candidates for protocols using single molecule sequencing (e.g., PURC, Rothfels et al., 2017). Specifically, the PURC protocol can use genes identified as single copy in a diploid taxon to target all the copies (alleles, homeologs, or paralogs) for a given accession that are amplified by a given primer pair using PacBio (or other long-read, single-molecule) sequencing. The PURC pipeline uses all gene copies present in accessions to generate multilocus species networks that infer reticulation events (i.e., allopolyploidy). The analyses of sequences of hundreds of nuclear loci and plastomes with a broad taxon sampling will untangle the complex reticulate evolutionary history of Prunus, emphasizing that thorough sampling of nuclear genomes and the racemose group is a critical next step to understand the processes that generated this diverse genus.

CRediT authorship contribution statement

Richard G. J. Hodel: Conceptualization, Data curation, Visualization, Writing - original draft, Project administration, Funding acquisition, Resources. Elizabeth Zimmer: Conceptualization, Supervision, Writing - review & editing, Funding acquisition. Jun Wen: Conceptualization, Supervision, Visualization, Writing - review & editing.

Acknowledgements

We acknowledge the following funding sources: Smithsonian Institution Peter Buck Postdoctoral Fellowship. We thank the associate editor and three anonymous reviewers for many helpful comments that greatly improved the manuscript. We thank Dr. Jacob Landis for helpful discussions and input on processing transcriptomic data.

Appendix A. Supplementary material

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ympev.2021.107118.

References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215 (3), 403–410.
Andrews, S., 2010. FastQC: A quality control tool for high throughput sequence data [Online]. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
Blair, C., Bryson Jr., R.W., Linkem, C.W., Lazcano, D., Klicka, J., McCormack, J.E., 2019. Cryptic diversity in the Mexican highlands: Thousands of UCE loci help illuminate phylogenetic relationships, species limits and divergence times of montane rattlesnakes (Viperidae: Crotalus). Mol. Ecol. Resour. 19 (2), 349–365. https://doi.org/10.1111/1755-0998.12970.
Blischak, P.D., Chifman, J., Wolfe, A.D., Kubatko, L.S., 2018. HyDe: A Python package for genome-scale hybridization detection. Syst. Biol. 67, 821–829. https://doi.org/10.1093/sysbio/syy023.
Bolger, A.M., Lohse, M., Usadel, B., 2014. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120. https://doi.org/10.1093/bioinformatics/btu170.
Bortiri, E., Oh, S.-H., Jiang, J., Baggett, S., Granger, A., Weeks, C., Buckingham, M., Potter, D., Parfitt, D.E., 2001. Phylogeny and systematics of Prunus (Rosaceae) as determined by sequence analysis of ITS and the chloroplast trnL-trnF spacer DNA 26, 797–807. https://doi.org/10.1043/0363-6445-26.4.797.
Bortiri, E., Oh, S.-H., Gao, F.-Y., Potter, D., 2002. The phylogenetic utility of nucleotide sequences of sorbitol 6-phosphate dehydrogenase in Prunus (Rosaceae). Amer. J. Bot. 89, 1697–1708. https://doi.org/10.1043/0363-6445-26.4.797.
Brown, J.W., Walker, J.F., Smith, S.A., 2017. Phyx: phylogenetic tools for unix. Bioinformatics 33, 1886–1888. https://doi.org/10.1093/bioinformatics/btx063.
Chamala, S., García, N., Godden, G.T., Krishnakumar, V., Jordon-Thaden, I.E., Smet, R. De, Barbazuk, W.B., Soltis, D.E., Soltis, P.S., 2015. MarkerMiner 1.0: A new application for phylogenetic marker development using Angiosperm transcriptomes. Appl. Plant Sci. 3, 1400115. https://doi.org/10.3732/apps.1400115.
Chen, R.Y., 1993. Chromosome Atlas of Chinese Trees and Their Close Wild Relatives. Chromosome Atlas Chin. Princ. Econ. Pl. 1.
Chen, R.Y., Song, W.Q., Li, X.I., Li, M.X., Liang, G.I., Chen, C.B., 2003. Chromosome Atlas of Garden Flowering Plants in China. Chromosome Atlas of Major Economic Plants Genome in China, 3. Science Press, Beijing.
Chin, S.-W., Lutz, S., Wen, J., Potter, D., 2013. The Bitter and the Sweet: Inference of Homology and Evolution of Leaf Glands in Prunus (Rosaceae) through Anatomy, Micromorphology, and Ancestral–Character State Reconstruction. Int. J. Plant Sci. 174 (1), 27–46. https://doi.org/10.1086/668219.
Chin, S.-W., Shaw, J., Haberle, R., Wen, J., Potter, D., 2014. Diversification of almonds, peaches, plums and cherries – Molecular systematics and biogeographic history of Prunus (Rosaceae). Mol. Phylogenet. Evol. 76, 34–48. https://doi.org/10.1016/j.ympev.2014.02.024.
Chin, S.-W., Wen, J., Johnson, G., Potter, D., 2010. Merging Maddenia with the morphologically diverse Prunus (Rosaceae). Bot. J. Linn. Soc. 164, 236–245. https://doi.org/10.1111/j.1095-8339.2010.01083.x.


Corrias, B., 1980. Numeri cromosomici per la Flora Italiana. Inform. Bot. Ital. 12, 121–123.
De Smet, R., Adams, K.L., Vandepoele, K., Van Montagu, M.C.E., Maere, S., Van De Peer, Y., 2013. Convergent gene loss following gene and genome duplications creates single-copy families in flowering plants. Proc. Natl. Acad. Sci. U. S. A. 110, 2898–2903. https://doi.org/10.1073/pnas.1300127110.
Eaton, D.A.R., Ree, R.H., 2013. Inferring phylogeny and introgression using RADseq data: An example from flowering plants (Pedicularis: Orobanchaceae). Syst. Biol. 62, 689–706. https://doi.org/10.1093/sysbio/syt032.
Eremin, G.V., Rassvetaeva, E.G., 1992. Polymorphism of chromosome numbers in Prunoideae Focke. Tezisy III Soveshchanie po Kariologii Rastenii 21–22.
Folk, R.A., Soltis, P.S., Soltis, D.E., Guralnick, R., 2018. New prospects in the detection and comparative analysis of hybridization in the tree of life. Am. J. Bot. 105, 364–375. https://doi.org/10.1002/AJB2.1018.
Fuchs, J., Brandes, I., Schubert, I., 1995. Telomere sequence localization and karyotype evolution in higher plants. Pl. Syst. Evol. 196, 227–241.
González Zapatero, M.A., Elena-Roselló, J.A., Andrés, F.N., 1988. Números cromosómicos para la flora Española. Lagascalia 15, 112–119.
Guo, Z.H., Lu, Z.R., Li, G.Q., Mu, Y.I., 1986. Karyotype analysis of Prunus davidiana Franchet and P. kansuensis Rehd. J. Hebei Agric. Univ. 9, 1–5.
Haas, B.J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P.D., Bowden, J., Couger, M.B., Eccles, D., Li, B., Lieber, M., MacManes, M.D., Ott, M., Orvis, J., Pochet, N., Strozzi, F., Weeks, N., Westerman, R., William, T., Dewey, C.N., Henschel, R., LeDuc, R.D., Friedman, N., Regev, A., 2013. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8 (8), 1494–1512. https://doi.org/10.1038/nprot.2013.084.
Iwatsubo, Y., Kawasaki, T., Naruhashi, N., 2002. Chromosome numbers of 193 cultivated taxa of Prunus. J. Phytogeogr. Taxon. 50, 21–34.
Iwatsubo, Y., Sengi, Y., Naruhashi, N., 2004. Chromosome numbers of 36 cultivated taxa of Prunus subg. Cerasus in Japan. J. Phytogeogr. Taxon. 52, 73–76.
Javurkova, V., 1980. Chromosome number reports LXIX. Taxon 29, 713–714.
Jiang, F., Zhang, J., Wang, S., Yang, L.i., Luo, Y., Gao, S., Zhang, M., Wu, S., Hu, S., Sun, H., Wang, Y., 2019. The apricot (Prunus armeniaca L.) genome elucidates Rosaceae evolution and beta-carotenoid synthesis. Hortic. Res. 6 (1). https://doi.org/10.1038/s41438-019-0215-6.
Kalyaanamoorthy, S., Minh, B.Q., Wong, T.K.F., von Haeseler, A., Jermiin, L.S., 2017. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Meth. 14 (6), 587–589. https://doi.org/10.1038/nmeth.4285.
Katoh, K., Standley, D.M., 2013. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol. Biol. Evol. 30 (4), 772–780. https://doi.org/10.1093/molbev/mst010.
Knowles, L.L., Huang, H., Sukumaran, J., Smith, S.A., 2018. A matter of phylogenetic scale: Distinguishing incomplete lineage sorting from lateral gene transfer as the cause of gene tree discord in recent versus deep diversification histories. Am. J. Bot. 105 (3), 376–384. https://doi.org/10.1002/ajb2.1064.
Langmead, B., Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nat. Meth. 9 (4), 357–359. https://doi.org/10.1038/nmeth.1923.
Lee, S., Wen, J., 2001. A phylogenetic analysis of Prunus and the Amygdaloideae (Rosaceae) using ITS sequences of nuclear ribosomal DNA. Am. J. Bot. 88 (1), 150–160. https://doi.org/10.2307/2657135.
Li, W., Liu, L., Zhou, W., Wang, Y., Ding, X., Fan, G., Zhang, S., Liao, K., 2020. Karyotypic Characteristics and Genetic Relationships of Apricot Accessions from Different Ecological Groups. J. Am. Soc. Hortic. Sci. 146, 68–76. https://doi.org/10.21273/JASHS04956-20.
Lin, S.H., Pu, F.S., Zhang, J.Y., Gao, X.Y., Li, X.J., 1991. Observation on the chromosome number of Prunus. China 2, 8–10.
Liu, B.-B., Campbell, C.S., Hong, D.-Y., Wen, J., 2020. Phylogenetic relationships and chloroplast capture in the Amelanchier-Malacomeles-Peraphyllum clade (Maleae, Rosaceae): Evidence from chloroplast genome and nuclear ribosomal DNA data using genome skimming. Mol. Phylogenet. Evol. 147, 106784. https://doi.org/10.1016/j.ympev.2020.106784.
Love, A., Love, D., 1982. Taxon 31, 344–360.
Maddison, W.P., 1997. Gene trees in species trees. Syst. Biol. 46, 523–536. https://doi.org/10.1093/sysbio/46.3.523.
Mirarab, S., Reaz, R., Bayzid, M.S., Zimmermann, T., Swenson, M.S., Warnow, T., 2014. ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics 30, i541–i548. https://doi.org/10.1093/bioinformatics/btu462.
Montgomery, L., Khalaf, M., Bailey, J.P., Gornal, K.J., 1997. Contributions to a cytological catalogue of the British and Irish flora. Watsonia 21, 365–368.
Moore, W.S., 1995. Inferring phylogenies from mt DNA variation: mitochondrial-gene trees versus nuclear-gene trees. Evolution 49, 718–726. https://doi.org/10.1111/j.1558-5646.1995.tb02308.x.
Morales-Briones, D.F., Kadereit, G., Tefarikis, D.T., Moore, M.J., Smith, S.A., Brockington, S.F., Timoneda, A., Yim, W.C., Cushman, J.C., Yang, Y., 2021. Disentangling sources of gene tree discordance in phylotranscriptomic datasets: a case study from Amaranthaceae s.l. Syst. Biol. 70, 219–235. https://doi.org/10.1093/sysbio/syaa066.
Nguyen, L.T., Schmidt, H.A., Von Haeseler, A., Minh, B.Q., 2015. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274. https://doi.org/10.1093/molbev/msu300.
Nishikawa, T., 1985. Chromosome counts of flowering plants of Hokkaido. J. Hokkaido Univ. Educ., Sect. 2B 35, 97–111.
Oginuma, K., 1987. Karyomorphological studies on Prunus in Japan. J. Sci. Hiroshima Univ., Ser. B, Div. 2, Bot. 21, 1–66.
Revell, L.J., 2012. phytools: An R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3, 217–223. https://doi.org/10.1111/j.2041-210X.2011.00169.x.
Rieseberg, L., Soltis, D.E., 1991. Phylogenetic consequences of cytoplasmic gene flow in plants. Evol. Trend. Plant. 5, 65–84.
Rieseberg, L.H., Willis, J.H., 2007. Plant Speciation. Science 317 (5840), 910–914. https://doi.org/10.1126/science.1137729.
Rothfels, C.J., Pryer, K.M., Li, F.-W., 2017. Next-generation polyploid phylogenetics: rapid resolution of hybrid polyploid complexes using PacBio single-molecule sequencing. New Phytol. 213 (1), 413–429. https://doi.org/10.1111/nph.14111.
Sánchez-Pérez, R., Pavan, S., Mazzeo, R., Moldovan, C., Aiese Cigliano, R., Del Cueto, J., Ricciardi, F., Lotti, C., Ricciardi, L., Dicenta, F., López-Marqués, R.L., Møller, B.L., 2019. Mutation of a bHLH transcription factor allowed almond domestication. Science 364 (6445), 1095–1098. https://doi.org/10.1126/science.aav8197.
Sang, T., 2002. Utility of Low-Copy Nuclear Gene Sequences in Plant Phylogenetics. Crit. Rev. Biochem. Mol. Biol. 37 (3), 121–147. https://doi.org/10.1080/10409230290771474.
Shi, S., Li, J., Sun, J., Yu, J., Zhou, S., 2013. Phylogeny and Classification of Prunus sensu lato (Rosaceae): Phylogeny and Classification of Prunus. J. Integr. Plant Biol. 55 (11), 1069–1079. https://doi.org/10.1111/jipb.12095.
Shirasawa, K., Esumi, T., Hirakawa, H., Tanaka, H., Itai, A., Ghelfi, A., Nagasaki, H., Isobe, S., 2019. Phased genome sequence of an interspecific hybrid flowering cherry, “Somei-Yoshino” (Cerasus × yedoensis). DNA Res. https://doi.org/10.1093/dnares/dsz016.
Shirasawa, K., Isuzugawa, K., Ikenaga, M., Saito, Y., Yamamoto, T., Hirakawa, H., Isobe, S., 2017. The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding. DNA Res. 24, 499–508. https://doi.org/10.1093/dnares/dsx020.
Smith, R.L., Sytsma, K.J., 1990. Evolution of Populus nigra (Sect. Aigeiros): Introgressive Hybridization and the Chloroplast Contribution of Populus alba (Sect. Populus). Am. J. Bot. 77 (9), 1176. https://doi.org/10.2307/2444628.
Singhal, V.K., Gill, B.S., Sidhu, M.S., 1990. Cytology of woody members of Rosaceae. Proc. Indian Acad. Sci., Pl. Sci. 100, 17–21.
Smith, S.A., Moore, M.J., Brown, J.W., Yang, Y.a., 2015. Analysis of phylogenomic datasets reveals conflict, concordance, and gene duplications with examples from animals and plants. BMC Evol. Biol. 15 (1). https://doi.org/10.1186/s12862-015-0423-0.
Smith, S.A., Walker-Hale, N., Walker, J.F., Brown, J.W., 2020. Phylogenetic conflicts, combinability, and deep phylogenomics in plants. Syst. Biol. 69, 579–592. https://doi.org/10.1093/sysbio/syz078.
Solís-Lemus, C., Bastide, P., Ané, C., 2017. PhyloNetworks: A package for phylogenetic networks. Mol. Biol. Evol. 34, 3292–3298. https://doi.org/10.1093/molbev/msx235.
Solís-Lemus, C., Ané, C., 2016. Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. PLoS Genet. 12. https://doi.org/10.1371/journal.pgen.1005896.
Soltis, D.E., Kuzoff, R.K., 1995. Discordance between nuclear and chloroplast phylogenies in the Heuchera group (Saxifragaceae). Evolution 49 (4), 727–742. https://doi.org/10.1111/j.1558-5646.1995.tb02309.x.
Song, L., Florea, L., 2015. Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads. GigaSci 4 (1). https://doi.org/10.1186/s13742-015-0089-y.
Stamatakis, A., 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313. https://doi.org/10.1093/bioinformatics/btu033.
Struck, T.H., 2014. Trespex-detection of misleading signal in phylogenetic reconstructions based on tree information. Evol. Bioinforma. 10, 51–67. https://doi.org/10.4137/EBo.s14239.
Terraciano, L.B.C.D., Cialzeta, J.C., 1979. Cruzamientos interespecíficos como métodos de mejoramiento. Bol. Genet. 10, 29–33.
The International Peach Genome Initiative, Verde, I., Abbott, A.G., Scalabrin, S., Jung, S., Shu, S., Marroni, F., Zhebentyayeva, T., Dettori, M.T., Grimwood, J., Cattonaro, F., Zuccolo, A., Rossini, L., Jenkins, J., Vendramin, E., Meisel, L.A., Decroocq, V., Sosinski, B., Prochnik, S., Mitros, T., Policriti, A., Cipriani, G., Dondini, L., Ficklin, S., Goodstein, D.M., Xuan, P., Fabbro, C.D., Aramini, V., Copetti, D., Gonzalez, S., Horner, D.S., Falchi, R., Lucas, S., Mica, E., Maldonado, J., Lazzari, B., Bielenberg, D., Pirona, R., Miculan, M., Barakat, A., Testolin, R., Stella, A., Tartarini, S., Tonutti, P., Arús, P., Orellana, A., Wells, C., Main, D., Vizzotto, G., Silva, H., Salamini, F., Schmutz, J., Morgante, M., Rokhsar, D.S., 2013. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat. Genet. 45 (5), 487–494. https://doi.org/10.1038/ng.2586.
Walker, J.F., Brown, J.W., Smith, S.A., 2018. Analyzing contentious relationships and outlier genes in phylogenomics. Syst. Biol. https://doi.org/10.1093/sysbio/syy043.
Wang, Y., Chen, Q., Chen, T., Zhang, J., He, W., Liu, L., Luo, Y., Sun, B., Zhang, Y., Tang, H.R., Wang, X.R., 2019. Allopolyploid origin in Rubus (Rosaceae) inferred from nuclear granule-bound starch synthase I (GBSSI) sequences. BMC Plant Biol. 19, 303. https://doi.org/10.1186/s12870-019-1915-7.
Wagner, N.D., He, L., Hörandl, E., 2020. Phylogenomic relationships and evolution of polyploid Salix species revealed by RAD sequencing data. Front. Plant Sci. 11, 1077. https://doi.org/10.3389/FPLS.2020.01077.
Wen, J., Berggren, S.T., Lee, C.-H., Ickert-Bond, S., Yi, T.-S., Yoo, K., Xie, L., Shaw, J., Potter, D., 2008. Phylogenetic inferences in Prunus (Rosaceae) using chloroplast ndhF and nuclear ribosomal ITS sequences. J. Syst. Evol. 46, 322–332. https://doi.org/10.3724/SP.J.1002.2008.08065.

Wood, T.E., Takebayashi, N., Barker, M.S., Mayrose, I., Greenspoon, P.B., Rieseberg, L.H., 2009. The frequency of polyploid speciation in vascular plants. Proc. Natl. Acad. Sci. U. S. A. 106, 13875–13879. https://doi.org/10.1073/pnas.0811575106.
Xiang, Y., Huang, C.H., Hu, Y., Wen, J., Li, S., Yi, T., Chen, H., Xiang, J., Ma, H., 2017. Evolution of Rosaceae fruit types based on nuclear phylogeny in the context of geological times and genome duplication. Mol. Biol. Evol. 34, 262–281. https://doi.org/10.1093/molbev/msw242.
Xue, S., Shi, T., Luo, W., Ni, X., Iqbal, S., Ni, Z., Huang, X., Yao, D., Shen, Z., Gao, Z., 2019. Comparative analysis of the complete chloroplast genome among Prunus mume, P. armeniaca, and P. salicina. Hortic. Res. 6. https://doi.org/10.1038/s41438-019-0171-1.
Yang, Y., Smith, S.A., 2014. Orthology inference in nonmodel organisms using transcriptomes and low-coverage genomes: Improving accuracy and matrix occupancy for phylogenomics. Mol. Biol. Evol. 31, 3081–3092. https://doi.org/10.1093/molbev/msu245.
Yi, T.S., Jin, G.H., Wen, J., 2015. Chloroplast capture and intra- and inter-continental biogeographic diversification in the Asian - New World disjunct plant genus Osmorhiza (Apiaceae). Mol. Phylogenet. Evol. 85, 10–21. https://doi.org/10.1016/j.ympev.2014.09.028.
Zhao, L., Jiang, X.-W., Zuo, Y., Liu, X.-L., Chin, S.-W., Haberle, R., Potter, D., Chang, Z.-Y., Wen, J., 2016. Multiple events of allopolyploidy in the evolution of the racemose lineages in Prunus (Rosaceae) based on integrated evidence from nuclear and plastid data. PLoS ONE 11, e0157123. https://doi.org/10.1371/journal.pone.0157123.
Zhao, L., Potter, D., Xu, Y., Liu, P.L., Johnson, G., Chang, Z.Y., Wen, J., 2018. Phylogeny and spatio-temporal diversification of Prunus subgenus Laurocerasus section Mesopygeum (Rosaceae) in the Malesian region. J. Syst. Evol. 56, 637–651. https://doi.org/10.1111/jse.12467.
Zhang, Q., Chen, W., Sun, L., Zhao, F., Huang, B., Wang, J., Yang, W., Tao, Y., Yuan, Z., Fan, G., Xing, Z., Han, C., Pan, H., Zhong, X., Shi, W., Liang, X., Du, D., Sun, F., Xu, Z., Hao, R., Lv, T., Lv, Y., Zheng, Z., Sun, M., Luo, L., Cai, M., Gao, Y., Wang, J., Yin, Y., Xu, X., Cheng, T., Wang, J., 2012. The genome of Prunus mume. Nat. Commun. 3. https://doi.org/10.1038/ncomms2290.
Zimmer, E.A., Wen, J., 2015. Using nuclear gene data for plant phylogenetics: Progress and prospects II. Next-gen approaches. J. Syst. Evol. https://doi.org/10.1111/jse.12174.
