
Research

Diversification of Rosaceae since the Late Cretaceous based on plastid phylogenomics

Shu-Dong Zhang1*, Jian-Jun Jin1,2*, Si-Yun Chen1, Mark W. Chase3,4, Douglas E. Soltis5,6,7, Hong-Tao Li1, Jun-Bo Yang1, De-Zhu Li1 and Ting-Shuang Yi1

1Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China;
2Kunming College of Life Sciences, University of Chinese Academy of Sciences, Kunming, 650201, China;
3Science Directorate, Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3DS, UK;
4School of Biology, University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia;
5Florida Museum of Natural History, University of Florida, Gainesville, FL 32611-7800, USA;
6Department of Biology, University of Florida, Gainesville, FL 32611, USA;
7Genetics Institute, University of Florida, Gainesville, FL 32608, USA

Authors for correspondence:
De-Zhu Li
Tel: +86 871 65223503
Email: [email protected]

Ting-Shuang Yi
Tel: +86 871 65223136
Email: [email protected]

Received: 1 November 2016
Accepted: 26 December 2016

New Phytologist (2017)
doi: 10.1111/nph.14461

Key words: diversification, phylogenomics, plastome, rapid radiation, Rosaceae, systematic error.

Summary

• Phylogenetic relationships in Rosaceae have long been problematic because of frequent hybridisation, apomixis and presumed rapid radiation, and their historical diversification has not been clarified.
• With 87 genera representing all subfamilies and tribes of Rosaceae and six of the other eight families of Rosales (outgroups), we analysed 130 newly sequenced plastomes together with 12 from GenBank in an attempt to reconstruct deep relationships and reveal temporal diversification of this family.
• Our results highlight the importance of improving sequence alignment and the use of appropriate substitution models in plastid phylogenomics. Three subfamilies and 16 tribes (as previously delimited) were strongly supported as monophyletic, and their relationships were fully resolved and strongly supported at most nodes. Rosaceae were estimated to have originated during the Late Cretaceous with evidence for rapid diversification events during several geological periods. The major lineages rapidly diversified in warm and wet habitats during the Late Cretaceous, and the rapid diversification of genera from the early Oligocene onwards occurred in colder and drier environments.
• Plastid phylogenomics offers new and important insights into deep phylogenetic relationships and the diversification history of Rosaceae. The robust phylogenetic backbone and time estimates we provide establish a framework for future comparative studies on rosaceous evolution.

*These authors contributed equally to this work.

© 2017 The Authors. New Phytologist © 2017 New Phytologist Trust. New Phytologist (2017). www.newphytologist.com

Introduction

Phylogenomics applies genomic data to reconstruct the evolutionary history of organisms (Eisen, 1998; Eisen & Fraser, 2003; Delsuc et al., 2005) and has been applied to tackle rapidly radiating clades (Hackett et al., 2008; Jian et al., 2008; Bewick et al., 2012; Zhou et al., 2012; Ruhfel et al., 2014; Wickett et al., 2014) and hybridisation events (Marcet-Houben & Gabaldon, 2015; Sun et al., 2015) in diverse lineages. Due to their moderate size, moderate nucleotide substitution rates and freedom from problems of paralogy (Clegg et al., 1994), plastid DNA sequences have been widely used for the reconstruction of plant phylogenies. A plastome phylogenomics approach has been successfully applied to resolve many enigmatic relationships within angiosperms and across all green plants (Jansen et al., 2007; Moore et al., 2007, 2010; Zhong et al., 2010; Xi et al., 2012; Barrett et al., 2014; Ruhfel et al., 2014). Phylogenetic relationships among major clades and genera in Rosaceae have historically been a major phylogenetic challenge (Morgan et al., 1994; Potter et al., 2007), exemplifying a pattern resulting from multiple rapid radiations and reticulation events (Campbell et al., 2007). We use plastid phylogenomics here to better resolve phylogenetic relationships within Rosaceae.

Rosaceae are one of the most diverse angiosperm families with c. 90 genera and 3000 species with a global distribution. They are especially rich in the temperate and warmer regions of the Northern Hemisphere (Potter et al., 2007). Rosaceous species have adapted to a wide variety of environments, ranging from mesic to xeric communities and including tropical ecosystems. Many species of Rosaceae are also economically important, as edible fruits (including raspberries and loquats), ornamentals (including crab apples, firethorns, rowans and hawthorns) and timbers (plums and pears). For this reason, in part, the entire nuclear genome has been sequenced for at least nine species (with others in progress) including apple (Velasco et al., 2010), five strawberry species (Shulaev et al., 2011; Hirakawa et al., 2014), peach (Ahmad et al., 2011; International Peach Genome et al., 2013), pear (Wu et al., 2013) and mei (Sun et al., 2013).

As an ecologically and economically important group, Rosaceae have been a particular focus of many botanists, and a series of taxonomic and phylogenetic studies have been published (Potter et al., 2007). Recent molecular phylogenetic studies have strongly supported the monophyly of Rosaceae and their sister relationship to a clade formed by the other eight families of the order Rosales (Soltis et al., 2011; Zhang et al., 2011; The Angiosperm Phylogeny Group, 2016). Previous molecular phylogenetic studies based on a limited number of loci have greatly advanced our understanding of rosaceous relationships (Morgan et al., 1994; Evans et al., 2000; Potter et al., 2002, 2007; Lo & Donoghue, 2012; Chin et al., 2014). However, following the studies reviewed earlier, portions of the tree remained unresolved, with weakly supported and conflicting relationships. Relationships among the three subfamilies (Rosoideae, Dryadoideae and Amygdaloideae) remain contentious (Potter et al., 2007; Chin et al., 2014; Li et al., 2015), and those among some tribes and genera are also controversial and not fully resolved (Potter et al., 2007; Lo & Donoghue, 2012; Chin et al., 2014), especially those within subfamily Dryadoideae and subtribe Malinae (formerly Pyrinae or Maloideae).

Concomitantly, major classifications of Rosaceae based on morphology alone, or on both morphology and molecular data, do not agree regarding the delimitation of subfamilies, supertribes, tribes, subtribes and even some genera (Hutchinson, 1964; Schulze-Menz, 1964; Takhtajan, 1997; Kalkman, 2004; Potter et al., 2007). Homoplasy of morphological characters that have been the basis for classifications, frequent interspecific and even intergeneric hybridisation, apomixis and rapid radiation have all created difficulties in both classification and phylogenetic reconstruction of Rosaceae (Campbell et al., 2007; Potter et al., 2007; Lo & Donoghue, 2012).

The large-scale evolutionary history of Rosaceae remains poorly understood, although previous dating analyses have provided some insights. Most previous studies estimated a stem age of Rosaceae between 90 and 80 Ma in the Late Cretaceous (Wikstrom et al., 2001; Magallon & Castillo, 2009; Wang et al., 2009; Bell et al., 2010; Hohmann et al., 2015; Tank et al., 2015), whereas Li et al. (2015) estimated 106.1 or 106.5 Ma. Seemingly unreasonably young ages of only 40.8 or 31.9 Ma (Xue et al., 2012) and much too old ages of 367–170 Ma (Jeong et al., 1999) have also been obtained. Only a few studies have examined the divergence times among the major clades and genera of the family (Lo & Donoghue, 2012; Chin et al., 2014). The lack of a robust phylogenetic framework and time tree has hindered development of a full understanding of the diversification of Rosaceae.

A fully resolved, strongly supported, time-calibrated phylogenetic tree of Rosaceae will provide an important framework for studying classification, diversification, biogeography, phenotypic evolution, gene function and comparative genomics of this important family. Integrating 123 newly and nine previously sequenced plastomes representing all previously recognised major lineages of Rosaceae, this study applies multiple phylogenetic reconstruction methods in combination with appropriate models of sequence evolution to estimate phylogenetic relationships. The major objectives of this study are to: resolve the phylogenetic relationships among major clades and genera of Rosaceae; test the utility of plastome sequence data to resolve phylogenetic relationships of putatively rapidly radiating groups; and explore the temporal diversification patterns of Rosaceae with respect to palaeoenvironmental changes.

Materials and Methods

Taxon sampling

We sampled 132 species from 79 genera (Supporting Information Table S1) to represent all three recognised subfamilies and 16 tribes of Rosaceae, and 10 species to represent six of the other eight families of Rosales as outgroups. We included representatives of many genera that have been recently combined with other genera or for which circumscriptions are contentious. The voucher specimens were deposited in the herbarium of Kunming Institute of Botany, Chinese Academy of Sciences (KUN).

Plastome sequencing and data assembly

Total genomic DNA was extracted from 100 mg of fresh leaves using a modified CTAB (cetrimonium bromide) method of Yang et al. (2014). For most species, the plastomes were amplified in overlapping fragments using the long-range PCR method of Yang et al. (2014), and PCR fragments were pooled together in roughly equal concentrations for subsequent sequencing. The plastomes of Kerria japonica and two Filipendula species failed to be amplified using this method, so 0.5 µg of total genomic DNA was directly used for sequencing. The DNA samples were sheared into fragments of c. 500 bp and used to construct libraries according to the manufacturer's manual (Illumina, San Diego, CA, USA). Paired-end sequencing of 90 bp was conducted on an Illumina HiSeq 2000 at BGI-Shenzhen, and > 200 Mb (> 2 Gb for K. japonica and Filipendula spp.) of sequence data for each sample was obtained. We assembled and annotated the sequenced plastomes following Yang et al. (2014); the plastome of Rosa odorata var. gigantea (GenBank accession number KF753637) was used as the reference genome for assembly of all other accessions.

Sequence alignment

Including the 12 previously sequenced plastomes in GenBank, plastome sequences (using both coding and noncoding regions) from 142 species were aligned with MAFFT v.6.833 (Katoh et al., 2005) using the default settings. The plastomes of Rosaceae are conserved in gene order and content, so the alignment was straightforward; some poorly aligned regions were manually adjusted in GENEIOUS v.8.1 (Kearse et al., 2012).
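The manual adjustment step is not described further. As a purely illustrative sketch, and not the authors' procedure, one simple way to shortlist candidate regions for manual inspection is to scan an alignment for windows with a high average gap fraction. The function names, the window size and the threshold below are assumptions chosen for the toy example.

```python
from typing import Dict, List, Tuple


def gap_fraction_per_column(alignment: Dict[str, str]) -> List[float]:
    """Return the fraction of gap characters ('-') in each alignment column."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [sum(s[i] == "-" for s in seqs) / len(seqs) for i in range(length)]


def flag_gappy_windows(
    alignment: Dict[str, str], window: int = 5, threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges whose mean gap fraction
    is at least `threshold`; overlapping or touching windows are merged."""
    fractions = gap_fraction_per_column(alignment)
    flagged: List[Tuple[int, int]] = []
    for start in range(0, max(len(fractions) - window + 1, 0)):
        mean_gap = sum(fractions[start:start + window]) / window
        if mean_gap >= threshold:
            end = start + window
            if flagged and start <= flagged[-1][1]:
                flagged[-1] = (flagged[-1][0], end)  # extend the previous range
            else:
                flagged.append((start, end))
    return flagged


# Toy alignment (hypothetical sequences, not data from the study).
toy = {
    "taxon_A": "ATGC--A--TGCATGC",
    "taxon_B": "ATGCA----TGCATGC",
    "taxon_C": "ATGC-----TGCATGC",
}
print(flag_gappy_windows(toy, window=4, threshold=0.75))  # [(4, 9)]
```

Flagged ranges are only a prompt for visual inspection; whether a region is truly misaligned still has to be judged by eye, as was done here in GENEIOUS.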


Phylogenetic analyses

We first analysed the whole plastome data set (the 'WP' data set) with standard Bayesian inference (BI) and maximum likelihood (ML) methods. The data set was unpartitioned, and the best-fit model was determined by MODELTEST v.3.7 (Posada & Crandall, 1998) with the Akaike Information Criterion (AIC) (Posada & Buckley, 2004). BI was performed with MRBAYES v.3.1.2 (Ronquist & Huelsenbeck, 2003). Two independent Markov chain Monte Carlo (MCMC) runs were performed, each with three heated chains and one cold chain, for 30 million generations. Each run started from a random tree with default priors, and trees were sampled every 100 generations, with the first 25% discarded as burn-in. Stationarity was considered to be reached when the average standard deviation of split frequencies was < 0.01. The ML analyses were performed with RAXML v.7.2.6 (Stamatakis, 2006). The ML tree was inferred with the combined rapid bootstrap (100 replicates) and search for the ML tree (the '-f a' option). The GTRGAMMA model was used in all analyses as suggested (RAXML manual).

Phylogenomics has in some cases been shown to be susceptible to systematic errors produced by poor alignment of the data matrix and improper models of sequence evolution (Jeffroy et al., 2006; Rodriguez-Ezpeleta et al., 2007; Philippe et al., 2011). Several strategies were applied to reduce potential effects of systematic errors in our analyses. First, we used only the exons of protein-coding genes (the 'PCG' data set). Second, we removed ambiguously aligned sites in the WP data set using GBLOCKS v.0.91b (Castresana, 2000; Talavera & Castresana, 2007) with the following parameters: minimum sequences per conserved position, 72; minimum sequences per flank position, 120; maximum number of contiguous nonconserved positions, 8; minimum block length, 10; allowed gap positions, none (the 'GBDN' data set). Third, we analysed the WP, PCG and GBDN data sets with two site-heterogeneous models, the CAT and CAT-GTR models (WPCAT, WPCAT-GTR, PCGCAT, PCGCAT-GTR, GBDNCAT and GBDNCAT-GTR for short, respectively) using PHYLOBAYES v.3.3f (Lartillot et al., 2009). To reduce the computational burden, only one to two samples for each major clade were included in the analyses. We ran two independent tree searches and stopped them after the likelihood of the sampled trees had stabilised, indicating that the two runs had converged (maxdiff < 0.3).

Molecular dating

To alleviate the computational burden, the reduced GBDN data set keeping one sample for each genus was used for dating and diversification analyses. Penalised likelihood (PL) dating analyses were conducted using the TREEPL program (Smith & O'Meara, 2012). This program allows for better optimisation with large trees by combining stochastic optimisation with hill-climbing gradient-based methods. The ML tree reconstructed with the GBDN data set (with branch lengths generated by RAXML) was used as the input tree. To identify the appropriate level of rate heterogeneity in the phylogram, a data-driven cross-validation analysis was conducted with TREEPL. One thousand bootstrap (BS) replicates with branch lengths were also generated using RAXML for calculating the confidence age intervals with TreeAnnotator as implemented in BEAST (Bouckaert et al., 2014).

The Turonian floral fossils dating to 90 Ma are the earliest fossils assignable to Rosaceae (Crepet & Nixon, 1996). Most recent age estimates are roughly equal to or greater than 90 Ma, such as 88 or 93 Ma (two PL dates) or 103 Ma (Bayesian relaxed clock estimates, Wang et al., 2009), c. 94 Ma (Magallon & Castillo, 2009), 85 or 82 Ma (Bell et al., 2010), c. 97 Ma (Wu et al., 2014), c. 87.7 Ma (Hohmann et al., 2015), c. 89.4 Ma (Tank et al., 2015), and 106.5 or 106.1 Ma (Li et al., 2015). Integrating both the fossil record and previous age estimates, we considered 90 and 106.5 Ma as the minimum- and maximum-age calibrations. Six fossil calibrations were also used as minimum-age calibrations (Table 1); the age of Oemleria janhartfordae was determined by radioactive isotopes, and the youngest ages of strata where other fossil species occurred were applied in this analysis.

Diversification rate analyses

The diversification rate change over time within Rosaceae was explored using several programs based on R applications with the reduced GBDN data set with all outgroups pruned. First, we calculated the semi-logarithmic lineage through time (LTT) plot with APE v.3.5 (Paradis et al., 2004). To check the reliability of the LTT signal in our consensus chronogram, we generated plots for both the consensus chronogram and 1000 chronograms of TREEPL-inferred time-calibrated Rosaceae trees. To avoid

Table 1 Macrofossils of Rosaceae used as calibration points for molecular dating

Node  Anchor fossil           Geological period  Assigned date (Ma)  Reference(s)
C1    Rosales crown           Cretaceous         106.5–90            Crepet & Nixon (1996), Wang et al. (2009), Li et al. (2015)
C2    myricaefolius           Late Eocene        33.9                MacGinite (1953), Evanhoff et al. (2001)
C3    Rosa germerensis        Early Eocene       47.8                Edelman (1975)
C4    wutuensis               Early Eocene       47.8                Li et al. (2011)
C5    Oemleria janhartfordae  Early Eocene       49.42               Benedict et al. (2011)
C6    cretacea                Cretaceous         66.0                Knobloch & Mai (1986)
C7    Malinae crown           Middle Eocene      45.0                Velasco et al. (2010)
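As a minimal sketch of how the calibrations in Table 1 act as age bounds, the snippet below stores each node label with its minimum age and, for C1 only, a maximum age, then reports which estimated ages would violate a bound. The dictionary layout, the function name and the example ages are illustrative assumptions; TREEPL takes its constraints through its own configuration file, which is not reproduced here.

```python
from typing import Dict, List, Optional, Tuple

# (minimum age, maximum age) in Ma, keyed by the node labels of Table 1.
# None means only a minimum age is given for that calibration.
CALIBRATIONS: Dict[str, Tuple[float, Optional[float]]] = {
    "C1": (90.0, 106.5),
    "C2": (33.9, None),
    "C3": (47.8, None),
    "C4": (47.8, None),
    "C5": (49.42, None),
    "C6": (66.0, None),
    "C7": (45.0, None),
}


def violated_calibrations(estimated_ages: Dict[str, float]) -> List[str]:
    """Return the node labels whose estimated age falls outside its bounds."""
    violations = []
    for node, age in estimated_ages.items():
        minimum, maximum = CALIBRATIONS[node]
        too_young = age < minimum
        too_old = maximum is not None and age > maximum
        if too_young or too_old:
            violations.append(node)
    return violations


# Hypothetical ages for illustration only (not results from the study).
example_ages = {"C1": 106.5, "C3": 48.19, "C5": 49.42, "C6": 60.0}
print(violated_calibrations(example_ages))  # ['C6']
```

An age that sits exactly on a bound, as for C1 and C5 in the example, is treated as satisfying the constraint.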


misinterpretation of the plots, we restricted our analyses to the accurately estimated part of the LTT plot, which is least influenced by the missing taxa. We determined the time point at which incomplete taxon sampling would begin to have a significant effect on the LTT plot following the methods of Couvreur et al. (2011). A threshold was arbitrarily defined to include 85% of genus stem nodes, which corresponds to nodes older than 13.61 Ma being included in subsequent diversification analyses.

Second, we used an ML method to assess homogeneity of diversification rates across lineages with LASER v.2.4-1 (Rabosky, 2006). Two rate-constant models (pure birth and birth–death) and three rate-flexible diversification models (a logistic density-dependent speciation rate, an exponential density-dependent and a yule2rate model) were evaluated in our study.

Third, we used the TREEPAR package (Stadler, 2011) to assess speciation and extinction rates through time, and to specifically detect potential rapid and global changes in diversification rates that might be due to environmental factors. We employed the 'bd.shifts.optim' function, which allows estimating discrete changes in speciation and extinction rates and mass extinction events in under-sampled trees (Stadler, 2011). At each time t, the rates are allowed to change, and the species may undergo a shift in diversification. TREEPAR analyses were run with the following settings: start = 0, end = crown age estimated by dating analyses, grid = 1 Ma, and posdiv = FALSE to allow the diversification rate to be negative (i.e. allowing for periods of declining diversity). We also estimated the 95% highest posterior density in magnitude and timing of rate shifts by using a wrapper for TREEPAR to include 1000 TREEPL-inferred time-calibrated Rosaceae trees.

Results

Characteristics of plastomes and data sets

Complete plastome sequences of 121 rosaceous species and most coding and a small number of noncoding regions of two species of Filipendula were newly sequenced in this study; these sequences were deposited in GenBank (Table S1). These plastid genomes are similar to previously published rosaceous plastomes in size, structure and gene content (Jansen et al., 2011; Wang et al., 2013; Jin et al., 2014).

The whole plastome (WP) data set has an aligned length of 185 590 bp, within which 45 675 parsimony-informative sites (PICs, 24.61%) are detected across Rosaceae (Table 2). Removing ambiguous sites (GBDN) reduces the alignment to 61 429 nucleotides, yielding 11 173 PICs. The PCG data set has an aligned length of 71 563 nucleotides with 14 907 PICs. Compared with WP, the percentages of PICs in PCG and GBDN were lower (20.83 and 18.19%, respectively).

Phylogenetic results

Altogether, we reconstructed 12 separate phylogenetic trees based on different methods of analysis, matrices and evolutionary models: Bayesian and ML analyses on WP, PCG and GBDN, and Bayesian analyses on WP, PCG and GBDN using CAT and CAT-GTR models (Figs 1, 2, S1, S2). Both Bayesian and ML analyses recovered almost identical trees from each data set. All analyses fully resolved phylogenetic relationships among the major clades and most genera, and most resolved relationships received high internal support. Rosaceae were strongly supported as monophyletic. Taxonomic treatment of Rosaceae with three subfamilies and 16 tribes (as previously delimited) was compatible with our results, which were strongly supported. Most previously unresolved phylogenetic relationships among tribes were resolved. Although characterised by short internal branches, the phylogenetic relationships of Malinae were resolved into four clades (clades A, B, C and D), and many intergeneric relationships were clarified with strong support.

Different analyses produced largely congruent topologies (Figs 3, S2). However, two nodes exhibit incongruence: WP weakly supported a relationship of Rosoideae (Dryadoideae + Amygdaloideae), and other analyses strongly supported Amygdaloideae (Dryadoideae + Rosoideae); PCG and GBDN (including PCG

Table 2 Data characteristics, selected model and resolved relationships among tribes in each data set

Data set  Number of sites  Variable sites (%)  Parsimony-informative sites (%)  Best fit model
WP        185 590          63 516 (34.22)      45 675 (24.61)                   GTR + I + G
PCG       71 564           19 550 (27.32)      14 907 (20.83)                   GTR + I + G
GBDN      61 429           14 631 (23.82)      11 173 (18.19)                   GTR + I + G

Relationship (WP): ((((((((Maleae, Gillenieae), Spiraeeae), (Sorbarieae, Amygdaleae)), (Kerrieae, Exochordeae)), Neillieae), Lyonothamneae), Dryadeae), (((((Potentilleae, Roseae), Agrimonieae), Rubeae), Colurieae), Ulmarieae))
Relationship (PCG): ((((((((Maleae, Gillenieae), Spiraeeae), Sorbarieae), Amygdaleae), (Kerrieae, Exochordeae)), Neillieae), Lyonothamneae), ((((((Potentilleae, Roseae), Agrimonieae), Rubeae), Colurieae), Ulmarieae), Dryadeae))
Relationship (GBDN): ((((((((Maleae, Gillenieae), Spiraeeae), Sorbarieae), Amygdaleae), (Kerrieae, Exochordeae)), Neillieae), Lyonothamneae), ((((((Potentilleae, Roseae), Agrimonieae), Rubeae), Colurieae), Ulmarieae), Dryadeae))

For each data set, only ingroups are included. WP, the whole plastome data set; PCG, the data set only including the exons of protein-coding genes; GBDN, the data set of whole plastomes with most ambiguous sites being removed.
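The site counts in Table 2 follow standard definitions: a site is variable if it shows more than one state, and parsimony-informative if at least two states each occur in at least two sequences. A minimal sketch of that counting is given below, together with a check that the percentages in Table 2 follow from the reported counts. The function names, the set of ignored characters and the toy alignment are assumptions made for illustration; the software used by the authors to obtain these counts is not stated in this section.

```python
from collections import Counter
from typing import Dict

AMBIGUOUS = set("-?NnXx")  # characters ignored when scoring a column (assumption)


def is_parsimony_informative(column: str) -> bool:
    """True if at least two states each occur in at least two sequences."""
    counts = Counter(c.upper() for c in column if c not in AMBIGUOUS)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def site_summary(alignment: Dict[str, str]) -> Dict[str, float]:
    """Count total, variable and parsimony-informative sites of an alignment."""
    seqs = list(alignment.values())
    n_sites = len(seqs[0])
    columns = ["".join(s[i] for s in seqs) for i in range(n_sites)]
    variable = sum(
        len({c.upper() for c in col if c not in AMBIGUOUS}) > 1 for col in columns
    )
    informative = sum(is_parsimony_informative(col) for col in columns)
    return {
        "sites": n_sites,
        "variable": variable,
        "informative": informative,
        "informative_pct": round(100 * informative / n_sites, 2),
    }


# Toy alignment (hypothetical): columns 2 and 3 are informative, column 5 is
# variable but not informative.
toy = {"t1": "AAGTC", "t2": "AAGTA", "t3": "ATCTG", "t4": "ATCT-"}
print(site_summary(toy))

# Percentages of parsimony-informative sites recomputed from Table 2.
table2 = {"WP": (185590, 45675), "PCG": (71564, 14907), "GBDN": (61429, 11173)}
pcts = {name: round(100 * pic / sites, 2) for name, (sites, pic) in table2.items()}
print(pcts)  # {'WP': 24.61, 'PCG': 20.83, 'GBDN': 18.19}
```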


[Figure 1: tree image not reproduced; scale bar 0.04.]

Fig. 1 The maximum likelihood (ML) tree of Rosaceae inferred from the whole plastome (WP) data set. Numbers at nodes correspond to ML bootstrap percentages (100 replicates, only values <100% are shown) and Bayesian inference (BI) posterior probabilities (only probabilities <1.0 are shown). A–D indicate the four clades in Malinae.

and GBDN with two site-heterogeneous models) strongly supported tribe Amygdaleae as sister to a clade formed by tribes Sorbarieae, Spiraeeae, Maleae and Gillenieae, whereas WP (both site-homogeneous and site-heterogeneous models) supported a sister relationship of Amygdaleae and Sorbarieae with high support (Figs 3, S2).
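The two incongruent nodes can be made explicit by comparing the tribe-level topologies listed in Table 2 clade by clade. The following is a minimal sketch under stated assumptions: the parser handles only bare Newick strings (names and parentheses, no branch lengths or support values), the function name is hypothetical, and the first tribe of the WP topology is taken to be Maleae, as in the PCG and GBDN rows.

```python
from typing import FrozenSet, Set, Tuple

# Tribe-level topologies transcribed from Table 2 (spaces removed).
WP = ("((((((((Maleae,Gillenieae),Spiraeeae),(Sorbarieae,Amygdaleae)),"
      "(Kerrieae,Exochordeae)),Neillieae),Lyonothamneae),Dryadeae),"
      "(((((Potentilleae,Roseae),Agrimonieae),Rubeae),Colurieae),Ulmarieae))")
PCG = ("((((((((Maleae,Gillenieae),Spiraeeae),Sorbarieae),Amygdaleae),"
       "(Kerrieae,Exochordeae)),Neillieae),Lyonothamneae),"
       "((((((Potentilleae,Roseae),Agrimonieae),Rubeae),Colurieae),Ulmarieae),"
       "Dryadeae))")


def clades(newick: str) -> Set[FrozenSet[str]]:
    """Return the internal clades (as sets of tip names) of a rooted tree
    written in bare Newick; the root clade itself is excluded."""
    text = newick.replace(" ", "").rstrip(";")
    found: Set[FrozenSet[str]] = set()

    def parse(pos: int) -> Tuple[FrozenSet[str], int]:
        if text[pos] == "(":
            tips: Set[str] = set()
            pos += 1  # skip '('
            while True:
                child, pos = parse(pos)
                tips |= child
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character at position {pos}")
            clade = frozenset(tips)
            found.add(clade)
            return clade, pos
        end = pos
        while end < len(text) and text[end] not in ",()":
            end += 1
        return frozenset([text[pos:end]]), end

    root, _ = parse(0)
    found.discard(root)  # the root contains every tip in both trees
    return found


only_wp = clades(WP) - clades(PCG)
only_pcg = clades(PCG) - clades(WP)
for label, group in (("WP only", only_wp), ("PCG only", only_pcg)):
    for clade in sorted(group, key=len):
        print(label, sorted(clade))
```

Each tree contributes two clades absent from the other: WP groups Sorbarieae with Amygdaleae and places Dryadeae with the Amygdaloideae tribes, whereas PCG groups Sorbarieae with Maleae, Gillenieae and Spiraeeae and places Dryadeae with the Rosoideae tribes.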


[Figure 2: tree image not reproduced; scale bar 0.02.]

Fig. 2 The maximum likelihood (ML) tree of Rosaceae inferred from the GBDN (whole plastomes with most ambiguous sites being removed) data set. Numbers at nodes correspond to ML bootstrap percentages (100 replicates, only values <100% are shown) and Bayesian inference (BI) posterior probabilities (only probabilities <1.0 are shown). A dash ('–') indicates that this relationship was not supported in the analysis. A–D indicate the four clades in Malinae.


[Figure 3: four tribe-level cladograms, panels (a)–(d); images not reproduced.]

Fig. 3 Summary of different phylogenetic relationships produced by different data and methods in this study; only tribes are shown. Numbers at nodes correspond to maximum likelihood (ML) bootstrap percentages (100 replicates, only values <100% are shown) and Bayesian inference (BI) posterior probabilities (only probabilities <1.0 are shown). (a) WP, the whole plastome data set; (b) PCG, the data set only including the exons of protein-coding genes; (c) GBDN, the data set of whole plastomes with most ambiguous sites being removed; (d) WPCAT, whole plastomes with the CAT model.

The historical diversification of Rosaceae

The stem age of Rosaceae (Fig. S3) was estimated at 106.50 Ma (95% highest posterior density (HPD): 106.50–106.50 Ma) and the crown age of Rosaceae at 95.09 Ma (95% HPD: 94.46–96.36 Ma). Subfamilies Dryadoideae and Rosoideae diverged at 92.93 Ma (95% HPD: 92.13–94.33 Ma). The crown age of Dryadoideae was estimated at 40.67 Ma (95% HPD: 39.68–48.70 Ma), Rosoideae at 75.78 Ma (95% HPD: 74.45–78.14 Ma) and Amygdaloideae at 90.18 Ma (95% HPD: 89.38–91.02 Ma). The absolute ages for deep nodes of Rosaceae are shown in Table 3, and ages are plotted as a chronogram in Fig. S3.

The LASER results rejected a null hypothesis of temporally homogeneous diversification rates and supported a rate-variable model of diversification, with the diversification rate-constancy statistic ΔAICRC = 28.47. The logistic density-dependent speciation rate model of diversification was statistically supported as the best fitting model for our data. Analysis of the LTT plots generated results similar to the LASER analysis and supported increases in the diversification rate of Rosaceae during the Late Cretaceous and from the early Oligocene onwards (Fig. 4). TREEPAR analyses also supported a diversification model with varying rates (AIC for two shifts, 693.00). A rapid initial diversification in the Late Cretaceous was followed by a decrease in diversification rate. The second rate shift occurred in the early Oligocene with a sharp increase in diversification rate.

Discussion

Reducing systematic error to improve robustness of plastid phylogenomics of Rosaceae

Systematic error can increase as the number of characters used in phylogenomic studies grows, and a strongly supported but incorrect tree will be obtained when erroneous signal dominates the true phylogenetic signal (Delsuc et al., 2005; Jeffroy et al., 2006; Rodriguez-Ezpeleta et al., 2007; Philippe et al., 2011). Inaccurate alignment caused by rapidly evolving sites and improper models of sequence evolution have also been considered as major sources of systematic error in plastid phylogenomics (Jeffroy et al., 2006; Rodriguez-Ezpeleta et al., 2007; Philippe et al., 2011).

Phylogenetics is thought to be greatly dependent on accurate sequence alignment (Morrison & Ellis, 1997; Ogden & Rosenberg, 2006; Hohl & Ragan, 2007). Choosing reliable sequences and removing problematic regions have been suggested as effective methods for reducing alignment artefacts and improving the robustness of phylogenomic reconstructions (Burleigh & Mathews, 2004; Goremykin et al., 2010; Zhong et al., 2011; Parks et al., 2012; Som, 2015). All our analyses (Figs 1–3, S1, S2) produced largely identical deep relationships except for those among three subfamilies and the position of Amygdaleae. Amygdaloideae (Dryadoideae + Rosoideae), a relationship supported by previous molecular studies and some nonmolecular data (see

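The rate-constancy comparison reported in the Results can be made concrete with a short calculation. The sketch below assumes the usual definition of the statistic in LASER (Rabosky, 2006): ΔAICRC is the AIC of the best-fitting rate-constant model minus the AIC of the best-fitting rate-variable model, with AIC = 2k − 2 ln L, so that a positive value favours rate variation. The model labels, log-likelihoods and parameter counts are hypothetical numbers chosen for illustration; they are not the values fitted in this study.

```python
from typing import Dict, Tuple

# Each model maps to (log-likelihood, number of free parameters).
ModelFits = Dict[str, Tuple[float, int]]


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def delta_aic_rc(rate_constant: ModelFits, rate_variable: ModelFits) -> float:
    """AIC of the best rate-constant model minus AIC of the best
    rate-variable model; positive values favour rate variation."""
    best_rc = min(aic(lnl, k) for lnl, k in rate_constant.values())
    best_rv = min(aic(lnl, k) for lnl, k in rate_variable.values())
    return best_rc - best_rv


# Hypothetical fits, for illustration only.
rate_constant_fits = {
    "pure_birth": (-120.0, 1),
    "birth_death": (-119.5, 2),
}
rate_variable_fits = {
    "density_dependent_1": (-110.0, 2),
    "density_dependent_2": (-112.0, 2),
}

delta = delta_aic_rc(rate_constant_fits, rate_variable_fits)
print(f"delta AIC_RC = {delta:.2f}")
```

In practice the significance of the observed statistic is judged against a null distribution obtained from trees simulated under a constant rate, a step that this sketch omits.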

Table 3 Estimated dates (Ma) for the major Rosaceae clades

Node | Point estimate (Ma) | 95% HPD (Ma)
Rosaceae | 95.09 | 94.46–96.36
Amygdaloideae | 90.18 | 89.38–91.02
Rosoideae | 75.78 | 74.45–78.14
Dryadoideae | 40.67 | 39.68–48.70
Rosoideae + Dryadoideae | 92.93 | 92.13–94.33
((Kerrieae, Exochordeae), ((((Gillenieae, Maleae), Spiraeeae), Sorbarieae), Amygdaleae)) + Neillieae | 86.06 | 84.97–87.53
((((Gillenieae, Maleae), Spiraeeae), Sorbarieae), Amygdaleae) + (Kerrieae, Exochordeae) | 85.87 | 84.84–87.35
(((Gillenieae, Maleae), Spiraeeae), Sorbarieae) + Amygdaleae | 84.84 | 83.47–86.15
((Gillenieae, Maleae), Spiraeeae) + Sorbarieae | 84.30 | 82.84–85.68
(Gillenieae, Maleae) + Spiraeeae | 82.75 | 80.82–84.12
Kerrieae + Exochordeae | 84.19 | 82.63–85.56
Gillenieae + Maleae | 54.27 | 53.39–58.73
(((Roseae, Potentilleae), Agrimonieae), Rubeae) + Colurieae | 59.51 | 58.43–67.05
((Roseae, Potentilleae), Agrimonieae) + Rubeae | 57.84 | 56.93–65.66
(Roseae, Potentilleae) + Agrimonieae | 49.17 | 48.48–56.21
Roseae + Potentilleae | 48.19 | 47.80–55.02
Kerrieae | 61.44 | 59.65–67.31
Exochordeae | 49.42 | 49.42–49.42
Spiraeeae | 44.42 | 42.22–46.97
Maleae | 50.06 | 49.38–54.35
Sorbarieae | 31.40 | 29.28–37.06
Neillieae | 18.47 | 16.77–27.52
Colurieae | 39.14 | 38.00–49.55
Potentilleae | 41.87 | 40.85–48.40
Agrimonieae | 36.13 | 35.07–39.58
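Node ages such as those in Table 3 are the raw material for a lineage-through-time (LTT) count. As a minimal sketch, assuming a strictly bifurcating tree whose oldest age is the crown node, each split adds one lineage, so the count just after the k-th oldest split is k + 1. The example uses only five ages taken from Table 3 (crown Rosaceae, Rosoideae + Dryadoideae, and the three subfamily crowns), so it undercounts lineages relative to an LTT plot built from the full dated tree. The function names are hypothetical.

```python
from typing import List, Tuple


def lineages_through_time(node_ages_ma: List[float]) -> List[Tuple[float, int]]:
    """Return (age, lineage count just after the split at that age),
    oldest first, for a bifurcating tree starting at its crown node."""
    ages = sorted(node_ages_ma, reverse=True)
    # The crown split leaves 2 lineages; every later split adds one more.
    return [(age, i + 2) for i, age in enumerate(ages)]


def lineages_at(node_ages_ma: List[float], time_ma: float) -> int:
    """Lineages present at time_ma: one plus the number of splits that are
    strictly older than time_ma."""
    return 1 + sum(1 for age in node_ages_ma if age > time_ma)


# Five point estimates from Table 3 (Ma).
table3_subset = [95.09, 92.93, 90.18, 75.78, 40.67]

ltt = lineages_through_time(table3_subset)
for age, count in ltt:
    print(f"{age:6.2f} Ma: {count} lineages")
print("lineages at 80 Ma:", lineages_at(table3_subset, 80.0))
```

Plotting the logarithm of the count against time gives the familiar LTT curve: a straight line indicates a constant net diversification rate, and changes in slope indicate rate shifts.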

Fig. 4 Diversification analyses in Rosaceae. (a) Lineage-through-time plot (y-axis: log number of lineages); (b) maximum-likelihood birth–death–shift estimates of diversification rates (y-axis: diversification rate). In both panels the x-axis is time in million yr ago, spanning the Upper/Late Cretaceous, Paleocene, Eocene, Oligocene, Miocene and Pl./P. The red plot represents the maximum clade credibility tree, and the grey plots denote 1000 trees. Pl., Pliocene; P., Pleistocene.

later), was obtained with strong support with PCG (the data set including only the exons of protein-coding genes) and GBDN (the data set of whole plastomes with most ambiguous sites removed). GBDN and PCG also strongly supported Amygdaleae as sister to a clade formed by tribes Sorbarieae, Spiraeeae, Maleae and Gillenieae, whereas WP (whole plastome) resolved a sister relationship between Amygdaleae and Sorbarieae (Figs 3, S2). Rapidly evolving and potentially poorly aligned sites in the WP data set could have a significant impact on reconstruction of phylogenetic relationships of these nodes. GBDN and PCG would be expected to produce a robust phylogenetic position of Amygdaleae. The data set including only noncoding regions poorly resolved many relationships (data not shown), and its poorly aligned regions could affect accurate phylogenetic reconstruction.

Previous studies indicated that use of the CAT model could deal with site heterogeneity (Lartillot & Philippe, 2004) and improve the robustness of phylogenetic reconstruction (Campbell et al., 2011; Philippe et al., 2011). Analyses of the PCG and GBDN data sets with site-heterogeneous models and site-homogeneous models produced identical relationships among three subfamilies and the phylogenetic position of Amygdaleae. Analysis of the WP data set applying a complex site-heterogeneous model also resolved Amygdaloideae (Dryadoideae + Rosoideae) as did those for the PCG and GBDN data sets (Fig. S2). Our results again highlight the importance of goodness of fit between substitution model and data in plastid phylogenomics, and a site-heterogeneous mixed model could alleviate effects of systematic errors from base compositional heterogeneity to lineage-specific changes.

Taking all our analysed results into consideration, GBDN and PCG would be expected to resolve most robustly the phylogenetic relationships of Rosaceae. Both GBDN and PCG produced identical topologies, but the former has much higher support at many nodes than the latter with just coding regions. We have therefore used the GBDN tree as the basis for our discussions of

phylogenetic relationships and diversification history of Rosaceae.

Fully resolved deep-level relationships of Rosaceae

Our phylogenomic analyses fully resolved relationships among major clades and genera of Rosaceae, a difficult group hypothesised to have both ancient and recent radiations, hybridisation and apomixis (Campbell et al., 2007; Lo & Donoghue, 2012; Chin et al., 2014). Congruent with previous studies (Potter et al., 2007), Rosaceae were resolved into three clades corresponding to three subfamilies: Amygdaloideae, Dryadoideae and Rosoideae. Phylogenetic relationships among the three subfamilies have been addressed in some previous studies, and conflicting results have been produced: Rosoideae sister to an Amygdaloideae plus Dryadoideae clade by Potter et al. (2007) vs Amygdaloideae sister to a Dryadoideae plus Rosoideae clade by Chin et al. (2014) and Li et al. (2015), which our results strongly support. Dryadoideae and Rosoideae share some nonmolecular features, such as presence of stipules, separation of hypanthium from the ovary and usually achenes as fruits. However, Amygdaloideae and Dryadoideae also share nonmolecular features, such as presence of cyanogenic glycosides and sorbitol, which our results support as plesiomorphic in this clade with a secondary loss in Rosoideae.

The previously recognised supertribes (Kerriodae, Pyrodae and Rosodae), tribes (Agrimonieae, Amygdaleae, Colurieae, Dryadeae, Kerrieae, Maleae, Neillieae, Exochordeae, Potentilleae, Roseae, Rubeae, Sorbarieae, Spiraeeae, Ulmarieae) and subtribes (Agrimoniinae, Fragariinae, Malinae, Sanguisorbinae) were all strongly supported. Our phylogenomic analyses also fully resolved their interrelationships with strong support. Relationships among some tribes of Amygdaloideae have been unresolved or only weakly supported in previous studies (Potter et al., 2007). For instance, Amygdaleae and Neillieae were weakly supported as sisters in previous studies (Potter et al., 2007), whereas our results strongly supported Neillieae as sister to all other members of Amygdaloideae except Lyonothamneae, and Amygdaleae was sister to supertribe Pyrodae plus tribes Spiraeeae and Sorbarieae. Deep relationships of Rosoideae were inconclusive in previous molecular studies, and Rubeae and Colurieae were weakly supported as successively sister to the remaining members of the supertribe Rosodae (Potter et al., 2007). Our results supported Colurieae and then Rubeae as successively sister to supertribe Rosodae instead of vice versa, both with 100% support.

Our analyses also fully resolved some previously unresolved intergeneric relationships and provided additional evidence supporting some of the recently altered generic circumscriptions based on analyses with much more appropriate representation at the species level. Malinae are a typically difficult group for classification and phylogenetic reconstruction due to frequent hybridisation and low levels of sequence divergence, the latter perhaps due to rapid divergence (Lo & Donoghue, 2012). Multiple previous molecular phylogenetic studies failed to resolve relationships among their major clades and genera (Morgan et al., 1994; Evans et al., 2000; Evans & Campbell, 2002; Campbell et al., 2007; Potter et al., 2007; Lo & Donoghue, 2012). The plastid data of Lo & Donoghue (2012) resolved Malinae into five clades, whereas their nuclear data resolved many more, the plastid and nuclear trees exhibiting significant incongruence, and relationships among these clades were not resolved by either of the two data sets. Campbell et al. (2007) reconstructed the phylogenetic relationships among Malinae with five nuclear regions, but only their GBSSI-2A data set resolved Malinae into three clades with strong support. The GBSSI-2A data set also revealed that species of Amelanchier, Malus, Osteomeles and Pyracantha contain multiple copies that are nested within different clades (Campbell et al., 2007).

Our phylogenomic analyses resolved subtribe Malinae into four clades (A, B, C and D), which are largely congruent with the plastid results of Lo & Donoghue (2012) except that Osteomeles, previously unresolved, is sister to the rest of clade B (Figs 1, 2, S1). Relationships among these four clades and most genera typically had high support. However, our results showed significant incongruence with previous studies based on nuclear markers (Campbell et al., 2007; Lo & Donoghue, 2012). Hybridisation is likely to have played a major role in the diversification of genera of Malinae, and this may in part explain the phylogenetic incongruence. Of course, our results based solely on plastid DNA cannot be used to assess effects of hybridisation, so there are important issues to be resolved via future comparison of nuclear and plastid trees. The previously unresolved intergeneric relationships of Dryadoideae and Colurieae were also fully resolved with high support.

Our results also supported some recently clarified circumscriptions of controversial genera, although our taxonomic sampling is insufficient to be considered definitive on its own. The generic circumscription of Potentilla has frequently been altered, and many genera have been segregated from or merged with Potentilla (Potter et al., 2007; Dobes & Paule, 2010). Our results strongly supported the recently published merger of the genera Duchesnea, Horkelia and Horkeliella with Potentilla, and P. fruticosa and P. parvifolia should be included in genus Dasiphora. A recent molecular study indicated that Sibbaldia was a remarkably polyphyletic genus, which was split into five clades (Eriksson et al., 2015). The two Sibbaldia species sampled here (S. micropetala and S. purpurea) were supported as members of Potentilla. Bencomia, Dendriopoterium, Marcetella and Sarcopoterium are closely related, and our results are consistent with their proposed merger with Poterium. Consistent with other studies, a broader circumscription of Prunus (including Amygdalus, Armeniaca, Cerasus, Laurocerasus and Padus) was strongly supported, which was further divided into 'racemose' and 'corymbose + solitary' clades (Chin et al., 2014). Pourthiaea was supported as a separate genus relative to Photinia. Sorbus should be divided into several genera; Cormus, Micromeles and Sorbus s.s. were each supported as separate clades. Stephanandra should be merged with Neillia. More samples of controversial genera and their close relatives should be included in future phylogenomic analyses to thoroughly clarify circumscriptions of all rosaceous genera.


Dating of the origin and historical diversification of Rosaceae

Previous dating analyses have provided controversial stem ages for this family (Jeong et al., 1999; Wikstrom et al., 2001; Magallon & Castillo, 2009; Wang et al., 2009; Bell et al., 2010; Xue et al., 2012; Chin et al., 2014; Hohmann et al., 2015; Li et al., 2015; Tank et al., 2015). Our results indicate the stem age of Rosaceae to be during the Albian stage of the Early Cretaceous (106.50 Ma with 95% HPD: 106.50–106.50 Ma). The crown age of Rosaceae varied among previous studies, 88.3 Ma (95% HPD: 84.2–92.8 Ma) by Chin et al. (2014) and c. 61.2 Ma by Hohmann et al. (2015). Our estimated crown age is 95.09 Ma (95% HPD: 94.46–96.36 Ma). Divergence times among major clades are largely consistent with those of Chin et al. (2014), although they only sampled some representative clades. The three subfamilies diverged in the Late Cretaceous (95.09–92.93 Ma). Divergence times among the 16 tribes were inferred to be the Late Cretaceous to the middle Eocene (90.18–40.67 Ma), when the Earth was warmer and moister, and tropical occurred worldwide (Zachos et al., 2001). The intergeneric divergence times were estimated to be from the Late Cretaceous to Pliocene.

The LTT plots and TREEPAR analyses indicated higher diversification rates of lineages during several different geological ages. Higher rates occurred during the Late Cretaceous, which is congruent with rapid expansion of major lineages of Rosaceae. The fossil record demonstrates a major diversification of Rosaceae during the Eocene (DeVore & Pigg, 2007). Early expansion of Rosaceae is accompanied by formation of extensive boreotropical floras in the northern mid-latitudes during a warm phase peaking in the Late Palaeocene to Early Eocene thermal maximum (Wolfe, 1975; Zachos et al., 2001). Rapid diversification of rosaceous lineages from the early Oligocene onwards corresponds to habitat transitions of rosaceous taxa and their adaptation to a wide variety of environments.

During the Cenozoic, the uplift of mountain ranges was associated with falling global temperatures in the middle- and high-latitude landmasses from warm subtropical and temperate climates in the Eocene to colder, drier, and more strongly seasonal climates from the Oligocene onwards (Molnar & England, 1990; Zachos et al., 2001; Dupont-Nivet et al., 2007; Eldrett et al., 2009). Significant expansion of vegetation adapted to drier and colder climates occurred in large parts of Eurasia and North America (Collinson, 1992; Wolfe, 1992). Extreme climatic and geological shifts during the late Cenozoic may have promoted rapid generic diversification of Rosaceae, a family that contains many genera adapted to montane, dry or temperate biomes. The fossil record indicates that rosaceous taxa have adapted to many different habitats and increasingly became prominent members of the cooler temperate floras of the Neogene (DeVore & Pigg, 2007).

Acknowledgements

This study was supported by grants from the National Key Basic Research Programme of China (2014CB954100), the Sino-Africa Joint Research Center (SAJC201302), the National Science and Technology on Basic Research Programme (2013FY112600), the Science and Technology Innovation of CAS, iFlora Cross and Cooperation team (31129001), and the Talent Project of Yunnan Province (2011CI042). We thank the Brisbane Botanic Garden, Royal Botanic Gardens, Kew, Royal Botanic Garden Edinburgh, the San Francisco Botanical Garden, University of California Botanical Garden at Berkeley, and Kunming Botanical Garden for providing samples, and Jian Wang, Michelle van der Bank, Olivier Maurin and James R. Shevock for sampling. We are also grateful to Bo-Jian Zhong and Peng-Fei Ma for advice on the analyses. This study was facilitated by the Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences.

Author contributions

T-S.Y., S-D.Z. and D-Z.L. designed the research; S-D.Z., H-T.L. and J-B.Y. performed the research; S-D.Z., J-J.J. and S-Y.C. analysed the data; and T-S.Y., S-D.Z., M.W.C., D.E.S. and D-Z.L. wrote the paper.

References

Ahmad R, Parfitt DE, Fass J, Ogundiwin E, Dhingra A, Gradziel TM, Lin D, Joshi NA, Martinez-Garcia PJ, Crisosto CH. 2011. Whole genome sequencing of peach (Prunus persica L.) for SNP identification and selection. BMC Genomics 12: 569–575.
Barrett CF, Specht CD, Leebens-Mack J, Stevenson DW, Zomlefer WB, Davis JI. 2014. Resolving ancient radiations: can complete plastid gene sets elucidate deep relationships among the tropical gingers (Zingiberales)? Annals of Botany 113: 119–133.
Bell CD, Soltis DE, Soltis PS. 2010. The age and diversification of the angiosperms re-revisited. American Journal of Botany 97: 1296–1303.
Benedict JC, DeVore ML, Pigg KB. 2011. Prunus and Oemleria (Rosaceae) flowers from the late early Eocene Republic flora of northeastern Washington State, USA. International Journal of Plant Sciences 172: 948–958.
Bewick AJ, Chain FJ, Heled J, Evans BJ. 2012. The pipid root. Systematic Biology 61: 913–926.
Bouckaert R, Heled J, Kuhnert D, Vaughan T, Wu CH, Xie D, Suchard MA, Rambaut A, Drummond AJ. 2014. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Computational Biology 10: e1003537.
Burleigh JG, Mathews S. 2004. Phylogenetic signal in nucleotide data from seed plants: implications for resolving the seed plant tree of life. American Journal of Botany 91: 1599–1613.
Campbell CS, Evans RC, Morgan DR, Dickinson TA, Arsenault MP. 2007. Phylogeny of subtribe Pyrinae (formerly the Maloideae, Rosaceae): limited resolution of a complex evolutionary history. Plant Systematics and Evolution 266: 119–145.
Campbell LI, Rota-Stabelli O, Edgecombe GD, Marchioro T, Longhorn SJ, Telford MJ, Philippe H, Rebecchi L, Peterson KJ, Pisani D. 2011. MicroRNAs and phylogenomics resolve the relationships of Tardigrada and suggest that velvet worms are the sister group of Arthropoda. Proceedings of the National Academy of Sciences, USA 108: 15920–15924.
Castresana J. 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution 17: 540–552.
Chin SW, Shaw J, Haberle R, Wen J, Potter D. 2014. Diversification of almonds, peaches, plums and cherries – molecular systematics and biogeographic history of Prunus (Rosaceae). Molecular Phylogenetics and Evolution 76: 34–48.
Clegg MT, Gaut BS, Learn GH Jr, Morton BR. 1994. Rates and patterns of chloroplast DNA evolution. Proceedings of the National Academy of Sciences, USA 91: 6795–6801.


Collinson ME. 1992. Vegetational and floristic changes around the Eocene/Oligocene boundary in western and central Europe. In: Prothero DR, Berggren WA, eds. Eocene–Oligocene climatic and biotic evolution. Princeton, NJ, USA: Princeton University Press, 437–450.
Couvreur TLP, Pirie MD, Chatrou LW, Saunders RMK, Su YCF, Richardson JE, Erkens RHJ. 2011. Early evolutionary history of the flowering plant family Annonaceae: steady diversification and boreotropical geodispersal. Journal of Biogeography 38: 664–680.
Crepet WL, Nixon KC. 1996. The fossil history of stamens. In: D'Arcy WG, Keating RC, eds. The anther: form, function and phylogeny. Cambridge, UK: Cambridge University Press, 25–28.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nature Reviews Genetics 6: 361–375.
DeVore ML, Pigg KB. 2007. A brief review of the fossil history of the family Rosaceae with a focus on the Eocene Okanogan Highlands of eastern Washington State, USA, and British Columbia, Canada. Plant Systematics and Evolution 266: 45–57.
Dobes C, Paule J. 2010. A comprehensive chloroplast DNA-based phylogeny of the genus Potentilla (Rosaceae): implications for its geographic origin, phylogeography and generic circumscription. Molecular Phylogenetics and Evolution 56: 156–175.
Dupont-Nivet G, Krijgsman W, Langereis CG, Abels HA, Dai S, Fang X. 2007. Tibetan plateau aridification linked to global cooling at the Eocene–Oligocene transition. Nature 445: 635–638.
Edelman DW. 1975. The Eocene Germer Basin flora of south-central Idaho. MS thesis, University of Idaho, Moscow, ID, USA.
Eisen JA. 1998. Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Research 8: 163–167.
Eisen JA, Fraser CM. 2003. Phylogenomics: intersection of evolution and genomics. Science 300: 1706–1707.
Eldrett JS, Greenwood DR, Harding IC, Huber M. 2009. Increased seasonality through the Eocene to Oligocene transition in northern high latitudes. Nature 459: 969–973.
Eriksson T, Lundberg M, Töpel M, Östensson P, Smedmark JEE. 2015. Sibbaldia: a molecular phylogenetic study of a remarkably polyphyletic genus in Rosaceae. Plant Systematics and Evolution 301: 171–184.
Evanhoff E, Gregory-Wodzicki KM, Johnson KR. 2001. Fossil flora and stratigraphy of the Florissant Formation, Colorado. Denver, CO, USA: Denver Museum of Nature & Science Series 4.
Evans RC, Alice LA, Campbell CS, Kellogg EA, Dickinson TA. 2000. The granule-bound starch synthase (GBSSI) gene in the Rosaceae: multiple loci and phylogenetic utility. Molecular Phylogenetics and Evolution 17: 388–400.
Evans RC, Campbell CS. 2002. The origin of the apple subfamily (Maloideae; Rosaceae) is clarified by DNA sequence data from duplicated GBSSI genes. American Journal of Botany 89: 1478–1484.
Goremykin VV, Nikiforova SV, Bininda-Emonds OR. 2010. Automated removal of noisy data in phylogenomic analyses. Journal of Molecular Evolution 71: 319–331.
Hackett SJ, Kimball RT, Reddy S, Bowie RC, Braun EL, Braun MJ, Chojnowski JL, Cox WA, Han KL, Harshman J et al. 2008. A phylogenomic study of birds reveals their evolutionary history. Science 320: 1763–1768.
Hirakawa H, Shirasawa K, Kosugi S, Tashiro K, Nakayama S, Yamada M, Kohara M, Watanabe A, Kishida Y, Fujishiro T et al. 2014. Dissection of the octoploid strawberry genome by deep sequencing of the genomes of Fragaria species. DNA Research 21: 169–181.
Hohl M, Ragan MA. 2007. Is multiple-sequence alignment required for accurate inference of phylogeny? Systematic Biology 56: 206–221.
Hohmann N, Wolf EM, Lysak MA, Koch MA. 2015. A time-calibrated road map of Brassicaceae species radiation and evolutionary history. Plant Cell 27: 2770–2784.
Hutchinson J. 1964. The genera of flowering plants. Vol. 1. Dicotyledons. Oxford, UK: Clarendon Press.
International Peach Genome I, Verde I, Abbott AG, Scalabrin S, Jung S, Shu S, Marroni F, Zhebentyayeva T, Dettori MT, Grimwood J et al. 2013. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nature Genetics 45: 487–494.
Jansen RK, Cai Z, Raubeson LA, Daniell H, Depamphilis CW, Leebens-Mack J, Muller KF, Guisinger-Bellian M, Haberle RC, Hansen AK et al. 2007. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proceedings of the National Academy of Sciences, USA 104: 19369–19374.
Jansen RK, Saski C, Lee SB, Hansen AK, Daniell H. 2011. Complete plastid genome sequences of three rosids (Castanea, Prunus, Theobroma): evidence for at least two independent transfers of rpl22 to the nucleus. Molecular Biology and Evolution 28: 835–847.
Jeffroy O, Brinkmann H, Delsuc F, Philippe H. 2006. Phylogenomics: the beginning of incongruence? Trends in Genetics 22: 225–231.
Jeong SC, Ritchie NJ, Myrold DD. 1999. Molecular phylogenies of plants and Frankia support multiple origins of actinorhizal symbioses. Molecular Phylogenetics and Evolution 13: 493–503.
Jian S, Soltis PS, Gitzendanner MA, Moore MJ, Li R, Hendry TA, Qiu YL, Dhingra A, Bell CD, Soltis DE. 2008. Resolving an ancient, rapid radiation in Saxifragales. Systematic Biology 57: 38–57.
Jin GH, Chen SY, Yi TS, Zhang SD. 2014. Characterization of the complete chloroplast genome of apple (Malus × domestica, Rosaceae). Plant Diversity and Resources 36: 468–484.
Kalkman C. 2004. Rosaceae. In: Kubitzki K, ed. The families and genera of vascular plants. Vol. 6. Flowering plants – dicotyledons: Celastrales, Oxalidales, Rosales, Cornales, Ericales. Berlin, Germany: Springer, 343–386.
Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Research 33: 511–518.
Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C et al. 2012. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28: 1647–1649.
Knobloch E, Mai D. 1986. Monographie der Früchte und Samen in der Kreide von Mitteleuropa. Praha, Czech Republic: Rozpravy Ústředního ústavu geologického.
Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25: 2286–2288.
Lartillot N, Philippe H. 2004. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Molecular Biology and Evolution 21: 1095–1109.
Li HL, Wang W, Mortimer PE, Li RQ, Li DZ, Hyde KD, Xu JC, Soltis DE, Chen ZD. 2015. Large-scale phylogenetic analyses reveal multiple gains of actinorhizal nitrogen-fixing symbioses in angiosperms associated with climate change. Scientific Reports 5: 14023.
Li Y, Smith T, Liu CJ, Awasthi N, Yang J, Wang YF, Li CS. 2011. Endocarps of Prunus (Rosaceae: Prunoideae) from the early Eocene of Wutu, Shandong Province, China. Taxon 60: 555–564.
Lo EY, Donoghue MJ. 2012. Expanded phylogenetic and dating analyses of the apples and their relatives (Pyreae, Rosaceae). Molecular Phylogenetics and Evolution 63: 230–243.
MacGinite HD. 1953. Fossil plants of the Florissant beds of Colorado. Washington, DC, USA: Carnegie Institute of Washington.
Magallon S, Castillo A. 2009. Angiosperm diversification through time. American Journal of Botany 96: 349–365.
Marcet-Houben M, Gabaldon T. 2015. Beyond the whole-genome duplication: phylogenetic evidence for an ancient interspecies hybridization in the baker's yeast lineage. PLoS Biology 13: e1002220.
Molnar P, England P. 1990. Late Cenozoic uplift of mountain ranges and global climate change: chicken or egg? Nature 346: 29–34.
Moore MJ, Bell CD, Soltis PS, Soltis DE. 2007. Using plastid genome-scale data to resolve enigmatic relationships among basal angiosperms. Proceedings of the National Academy of Sciences, USA 104: 19363–19368.
Moore MJ, Soltis PS, Bell CD, Burleigh JG, Soltis DE. 2010. Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots. Proceedings of the National Academy of Sciences, USA 107: 4623–4628.
Morgan DR, Soltis DE, Robertson KR. 1994. Systematic and evolutionary implications of rbcL sequence variation in Rosaceae. American Journal of Botany 81: 890–903.


Morrison DA, Ellis JT. 1997. Effects of nucleotide sequence alignment on phylogeny estimation: a case study of 18S rDNAs of apicomplexa. Molecular Biology and Evolution 14: 428–441.
Ogden TH, Rosenberg MS. 2006. Multiple sequence alignment accuracy and phylogenetic inference. Systematic Biology 55: 314–328.
Paradis E, Claude J, Strimmer K. 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289–290.
Parks M, Cronn R, Liston A. 2012. Separating the wheat from the chaff: mitigating the effects of noise in a plastome phylogenomic data set from Pinus L. (Pinaceae). BMC Evolutionary Biology 12: 100.
Philippe H, Brinkmann H, Lavrov DV, Littlewood DT, Manuel M, Worheide G, Baurain D. 2011. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biology 9: e1000602.
Posada D, Buckley TR. 2004. Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Systematic Biology 53: 793–808.
Posada D, Crandall KA. 1998. MODELTEST: testing the model of DNA substitution. Bioinformatics 14: 817–818.
Potter D, Eriksson T, Evans RC, Oh S, Smedmark JEE, Morgan DR, Kerr M, Robertson KR, Arsenault M, Dickinson TA et al. 2007. Phylogeny and classification of Rosaceae. Plant Systematics and Evolution 266: 5–43.
Potter D, Gao F, Bortiri PE, Oh SH, Baggett S. 2002. Phylogenetic relationships in Rosaceae inferred from chloroplast matK and trnL-trnF nucleotide sequence data. Plant Systematics and Evolution 231: 77–89.
Rabosky DL. 2006. LASER: a maximum likelihood toolkit for detecting temporal shifts in diversification rates from molecular phylogenies. Evolutionary Bioinformatics 2: 247–250.
Rodriguez-Ezpeleta N, Brinkmann H, Roure B, Lartillot N, Lang BF, Philippe H. 2007. Detecting and overcoming systematic errors in genome-scale phylogenies. Systematic Biology 56: 389–399.
Ronquist F, Huelsenbeck JP. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574.
Ruhfel BR, Gitzendanner MA, Soltis PS, Soltis DE, Burleigh JG. 2014. From algae to angiosperms-inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes. BMC Evolutionary Biology 14: 23.
Schulze-Menz GK. 1964. Rosaceae. In: Melchior H, ed. Engler's Syllabus der Pflanzenfamilien II. Berlin, Germany: Gebrüder Borntraeger, 209–218.
Shulaev V, Sargent DJ, Crowhurst RN, Mockler TC, Folkerts O, Delcher AL, Jaiswal P, Mockaitis K, Liston A, Mane SP et al. 2011. The genome of woodland strawberry (Fragaria vesca). Nature Genetics 43: 109–116.
Smith SA, O'Meara BC. 2012. treePL: divergence time estimation using penalized likelihood for large phylogenies. Bioinformatics 28: 2689–2690.
Soltis DE, Smith SA, Cellinese N, Wurdack KJ, Tank DC, Brockington SF, Refulio-Rodriguez NF, Walker JB, Moore MJ, Carlsward BS et al. 2011. Angiosperm phylogeny: 17 genes, 640 taxa. American Journal of Botany 98: 704–730.
Som A. 2015. Causes, consequences and solutions of phylogenetic incongruence. Briefings in Bioinformatics 16: 536–548.
Stadler T. 2011. Mammalian phylogeny reveals recent diversification rate shifts. Proceedings of the National Academy of Sciences, USA 108: 6187–6192.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688–2690.
Sun L, Zhang Q, Xu Z, Yang W, Guo Y, Lu J, Pan H, Cheng T, Cai M. 2013. Genome-wide DNA polymorphisms in two cultivars of mei (Prunus mume Sieb. et Zucc.). BMC Genetics 14: 98.
Sun M, Soltis DE, Soltis PS, Zhu X, Burleigh JG, Chen Z. 2015. Deep phylogenetic incongruence in the angiosperm clade Rosidae. Molecular Phylogenetics and Evolution 83: 156–166.
Takhtajan A. 1997. Diversity and classification of flowering plants. New York, NY, USA: Columbia University Press.
Talavera G, Castresana J. 2007. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Systematic Biology 56: 564–577.
Tank DC, Eastman JM, Pennell MW, Soltis PS, Soltis DE, Hinchliff CE, Brown JW, Sessa EB, Harmon LJ. 2015. Nested radiations and the pulse of angiosperm diversification: increased diversification rates often follow whole genome duplications. New Phytologist 207: 454–467.
The Angiosperm Phylogeny Group. 2016. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Botanical Journal of the Linnean Society 181: 1–20.
Velasco R, Zharkikh A, Affourtit J, Dhingra A, Cestaro A, Kalyanaraman A, Fontana P, Bhatnagar SK, Troggio M, Pruss D et al. 2010. The genome of the domesticated apple (Malus × domestica Borkh.). Nature Genetics 42: 833–839.
Wang H, Moore MJ, Soltis PS, Bell CD, Brockington SF, Alexandre R, Davis CC, Latvis M, Manchester SR, Soltis DE. 2009. Rosid radiation and the rapid rise of angiosperm-dominated forests. Proceedings of the National Academy of Sciences, USA 106: 3853–3858.
Wang S, Shi C, Gao LZ. 2013. Plastid genome sequence of a wild woody oil species, Prinsepia utilis, provides insights into evolutionary and mutational patterns of Rosaceae chloroplast genomes. PLoS ONE 8: e73946.
Wickett NJ, Mirarab S, Nguyen N, Warnow T, Carpenter E, Matasci N, Ayyampalayam S, Barker MS, Burleigh JG, Gitzendanner MA et al. 2014. Phylotranscriptomic analysis of the origin and early diversification of land plants. Proceedings of the National Academy of Sciences, USA 111: E4859–E4868.
Wikstrom N, Savolainen V, Chase MW. 2001. Evolution of the angiosperms: calibrating the family tree. Proceedings of the Royal Society of London, B, Biological Sciences 268: 2211–2220.
Wolfe JA. 1975. Some aspects of plant geography of the Northern Hemisphere during the late Cretaceous and Tertiary. Annals of the Missouri Botanical Garden 62: 264–279.
Wolfe JA. 1992. Climatic, floristic, and vegetational changes near the Eocene/Oligocene boundary in North America. In: Prothero DR, Berggren WA, eds. Eocene–Oligocene climatic and biotic evolution. Princeton, NJ, USA: Princeton University Press, 421–436.
Wu J, Wang Z, Shi Z, Zhang S, Ming R, Zhu S, Khan MA, Tao S, Korban SS, Wang H et al. 2013. The genome of the pear (Pyrus bretschneideri Rehd.). Genome Research 23: 396–408.
Wu Z, Gui S, Quan Z, Pan L, Wang S, Ke W, Liang D, Ding Y. 2014. A precise chloroplast genome of Nelumbo nucifera (Nelumbonaceae) evaluated with Sanger, Illumina MiSeq, and PacBio RS II sequencing platforms: insight into the plastid evolution of basal eudicots. BMC Plant Biology 14: 289.
Xi Z, Ruhfel BR, Schaefer H, Amorim AM, Sugumaran M, Wurdack KJ, Endress PK, Matthews ML, Stevens PF, Mathews S et al. 2012. Phylogenomics and a posteriori data partitioning resolve the Cretaceous angiosperm radiation Malpighiales. Proceedings of the National Academy of Sciences, USA 109: 17519–17524.
Xue JH, Dong WP, Cheng T, Zhou SL. 2012. Nelumbonaceae: systematic position and species diversification revealed by the complete chloroplast genome. Journal of Systematics and Evolution 50: 477–487.
Yang JB, Li DZ, Li HT. 2014. Highly effective sequencing whole chloroplast genomes of angiosperms by nine novel universal primer pairs. Molecular Ecology Resources 14: 1024–1031.
Zachos J, Pagani M, Sloan L, Thomas E, Billups K. 2001. Trends, rhythms, and aberrations in global climate 65 Ma to present. Science 292: 686–693.
Zhang SD, Soltis DE, Yang Y, Li DZ, Yi TS. 2011. Multi-gene analysis provides a well-supported phylogeny of Rosales. Molecular Phylogenetics and Evolution 60: 21–28.
Zhong B, Deusch O, Goremykin VV, Penny D, Biggs PJ, Atherton RA, Nikiforova SV, Lockhart PJ. 2011. Systematic error in seed plant phylogenomics. Genome Biology and Evolution 3: 1340–1348.
Zhong B, Yonezawa T, Zhong Y, Hasegawa M. 2010. The position of Gnetales among seed plants: overcoming pitfalls of chloroplast phylogenomics. Molecular Biology and Evolution 27: 2855–2863.
Zhou X, Xu S, Xu J, Chen B, Zhou K, Yang G. 2012. Phylogenomic analysis resolves the interordinal relationships and rapid diversification of the laurasiatherian mammals. Systematic Biology 61: 150–164.

Supporting Information

Additional Supporting Information may be found online in the Supporting Information tab for this article:

Fig. S1 The maximum-likelihood tree of Rosaceae based on the PCG (the exons of protein-coding genes) data set.

Fig. S2 The tree inferred from the PHYLOBAYES analyses under CAT and CAT-GTR models.

Fig. S3 Chronogram of Rosaceae based on the GBDN (whole plastomes with most ambiguous sites being removed) data set.

Table S1 List of 142 taxa sampled in this study

Please note: Wiley Blackwell are not responsible for the content or functionality of any Supporting Information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.

