Botany

SUPERMATRIX ANALYSES AND MOLECULAR CLOCK ROOTING OF FABALES: EXPLORING THE EFFECTS OF OUTGROUP CHOICE AND LONG BRANCH ATTRACTION ON TOPOLOGY

Journal: Botany

Manuscript ID cjb-2019-0109.R2

Manuscript Type: Article

Date Submitted by the Author: 16-Nov-2019

Complete List of Authors: Aygoren Uluer, Deniz; Ahi Evran Universitesi, Cicekdagi Vocational College, Department of Plant and Animal Production
Forest, Félix; Royal Botanic Gardens Kew
Hawkins, Julie; University of Reading, School of Biological Sciences, Lyle Building

Keyword: Fabales, long branch attraction, molecular clock rooting, rapid radiation

Is the invited manuscript for consideration in a Special Issue?: Not applicable (regular submission)

https://mc06.manuscriptcentral.com/botany-pubs Page 1 of 45 Botany

SUPERMATRIX ANALYSES AND MOLECULAR CLOCK ROOTING OF FABALES: EXPLORING THE EFFECTS OF OUTGROUP CHOICE AND LONG BRANCH ATTRACTION ON TOPOLOGY

Deniz Aygoren Uluer1,4, Félix Forest2, Julie A. Hawkins3

1, 4 Ahi Evran University, Cicekdagi Vocational College, Department of Plant and Animal Production, Boyalik Mahallesi, Stadyum Caddesi, Turan Sok. No:18. 40700 Cicekdagi, Kirsehir, Turkey, [email protected]

2 Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3DS, United Kingdom, [email protected]

3 School of Biological Sciences, Lyle Building, University of Reading, Whiteknights, Reading, Berkshire, RG6 6BX, United Kingdom, [email protected]

4Author for correspondence: Deniz Aygoren Uluer, Ahi Evran University, Cicekdagi Vocational College, Department of Plant and Animal Production, Boyalık Mahallesi, Stadyum Caddesi, Turan Sok. No:18 40700 Cicekdagi, Kirşehir, Turkey, email: [email protected], Work phone: +903862805500, Fax: +903862805528.

ABSTRACT

Fabales is a cosmopolitan angiosperm order which consists of four families, Leguminosae (Fabaceae), Polygalaceae, Surianaceae and Quillajaceae. Despite the great interest in this group, a convincing phylogeny of the order is still not available. Therefore, the aim of the current study is to explicitly test for possible long branch attraction (LBA) problems within Fabales for the first time and to determine whether low tree stemminess and unequal branch lengths could worsen this problem. Supermatrix analysis of Fabales was carried out using previously published plastid matK, trnL, rbcL and newly sequenced nuclear sqd1 regions for 678 taxa in total, including 43 outgroup taxa from families of Fabidae. We employed additional analyses, such as simulations, network analyses, sampling of different outgroup taxa (random or real), removal of fast evolving sites and fast evolving taxa, and molecular clock rooting, to identify LBA and/or rooting problems. These analyses clearly show that the Fabales phylogeny has been influenced by the sampling of outgroup taxa, but not by LBA. However, network analyses show that there is a consistent, albeit weak, phylogenetic signal among the rapidly radiated Fabales families, which can be traced by further analyses. While molecular clock rooting analysis yielded a (Leguminosae(Polygalaceae(Surianaceae+Quillajaceae))) topology with strong support for the first time here, supermatrix analyses yielded a ((Leguminosae+Polygalaceae)(Surianaceae+Quillajaceae)) topology with low-moderate support.

Keywords — Fabales, long branch attraction, molecular clock rooting, rapid radiation.

INTRODUCTION

Fabales is divided into four families which are very diverse morphologically and molecularly: Leguminosae, Polygalaceae, Surianaceae and Quillajaceae (Bello et al. 2009; APG IV 2016). Molecular studies and fossil evidence suggest an ancient-rapid radiation for Fabales (e.g., Crane et al. 1990; Zi-Chen et al. 2004; Lavin et al. 2005; Pigg et al. 2008; Bello et al. 2009). The monophyly of the order is strongly supported by several studies (e.g., Bello et al. 2009; Bello et al. 2012; APG IV 2016; Koenen et al. 2019), but the overall phylogenetic relationships across the order and the position of the root remain controversial, changing from one study to another; a situation common in higher level phylogenetic studies of ancient, rapid radiations (Bello et al. 2009). This unresolved phylogenetic problem for Fabales also hinders further evolutionary investigations such as estimating diversification rates (e.g., Smith et al. 2011; Koenen et al. 2013). In almost every phylogenetic study of Fabales to date, different rootings and different phylogenetic relationships have been found (e.g., Crayn et al. 1995; Doyle et al. 2000; Savolainen et al. 2000; Soltis et al. 2000; Kajita et al. 2001; Persson 2001; Wojciechowski et al. 2004; Lavin et al. 2005; Forest et al. 2007; Bruneau et al. 2008; Bello et al. 2009; Soltis et al. 2011; Bello et al. 2012; Sun et al. 2016; Koenen et al. 2019); however, none of these studies has focussed on the reason for this incongruence and only a few have employed broad enough taxon sampling with suitable outgroup taxa.

Ancient rapid radiations have been one of the hardest problems for phylogenetic studies to resolve, due to short internal branches, which reflect a limited time span between speciation events and carry a weak phylogenetic signal compared with long external branches. Other problems include high extinction rates before or after a rapid radiation, evolutionary rate heterogeneity among branches and long branch attraction (LBA) artefacts, which can arise even when closely related outgroup sequences are sampled (Felsenstein 1978; Wägele 1999; Fishbein et al. 2001; Rokas and Carroll 2006; Shavit et al. 2007; Whitfield and Lockhart 2007; Murdock 2008; Jian et al. 2008; Whitfield and Kjer 2008; Kodandaramaiah et al. 2010; Philippe et al. 2011; Rothfels et al. 2012). Ancient-rapid radiations have been described as having low tree stemminess sensu Smith (1994), bush-like sensu Rokas and Carroll (2006), broom-and-handle sensu Crisp et al. (2004) or starburst phylogenies sensu Albertson et al. (1999). These types of problematic phylogenies have been reported for many angiosperm groups such as Mesangiospermae (Zeng et al. 2014), (Moore et al. 2010; Soltis et al. 2011), Brassicaceae (Huang et al. 2015), Fabales (Bello et al. 2009; Bello et al. 2012) and early-diverging Leguminosae (Azani et al. 2017). In the worst cases, rapid radiations may be represented as hard or near-hard polytomies, such as when the genes used carry too little phylogenetic signal to allow the inference of relationships, especially along the internal branches (Braby et al. 2005; Whitfield and Kjer 2008; Kodandaramaiah et al. 2010). However, only a few hard polytomy cases have been reported until now (e.g., Kodandaramaiah et al. 2010), and Fabales is not one of them (Bello et al. 2009). Gene tree incongruence has been reported as more serious for these short internal branches (Salichos and Rokas 2013; Sun et al. 2015). Additionally, in these cases, phylogenetic results may be very sensitive to the specific genes used, and to the phylogenetic method and outgroup taxon choice (Roberts et al. 2009; Kirchberger et al. 2014; Borowiec et al. 2019), because outgroup taxa may recover an incorrect or random root (Smith 1994). Although it is often trivialised, rooting is one of the hardest steps in phylogenetic reconstruction (Boykin et al. 2010). Due to the high levels of homoplasy, particularly in molecular data, finding the “correct-rooted tree” is always less probable than finding the “correct-unrooted tree” (Sourdis and Krimbas 1987; Smith 1994; Graham et al. 2002). In the case of rapid radiations, this situation becomes even more severe (Smith 1994; Shavit et al. 2007; Sterli 2010) as the combination of low tree stemminess (i.e., rapid radiations) and high levels of homoplasy reduces the chance of finding the correct root (Smith 1994), as in the case of Fabales.

On the other hand, LBA was first described by Felsenstein (1978) as the spurious grouping of unrelated long-branched taxa, driven by homoplasy. LBA has been attributed to poor taxon sampling due to the extinction or unavailability of extant taxa, the Maximum Parsimony method, distant outgroup taxa and outgroup taxa which have very different base compositions from ingroup taxa (Wheeler 1990; Shavit et al. 2007; Dabert et al. 2010; Kodandaramaiah et al. 2010; Li et al. 2012; Grant 2019). Over time, analyses of both empirical and simulated data have confirmed the widespread distribution of this phenomenon. LBA can also be an important problem, particularly for rapid radiations, because even closely related outgroup taxa can introduce long branches to the phylogeny, compared with the short internal branches of the ingroup (Lyons-Weiler et al. 1998; Moreira and Philippe 2000; Wägele and Mayer 2007). Furthermore, ancient radiations may be more prone to LBA as a result of branch length heterogeneity, base homoplasy and compositional bias, even when model-based methods, such as maximum likelihood (ML), are employed (Wheeler 1990; Foster and Hickey 1999; Fishbein et al. 2001; Gribaldo and Philippe 2002).

For these reasons, any study aiming to resolve a particularly difficult phylogenetic question needs to be mindful of the influence of rooting, outgroup effect and LBA as sources of systematic error. Therefore, the aim of the current study is to explicitly test for possible LBA problems within Fabales for the first time and to determine whether low tree stemminess and unequal branch lengths could worsen this problem. Tree-based methods were explored to identify the root position of Fabales, with broad taxon sampling and several outgroup sequences, based on current literature. To discover the possible causes of an unresolved Fabales phylogeny, simulations and novel analyses, such as employing different alignments, different outgroup selections that are closely or distantly related, and network analyses, were employed.

MATERIALS AND METHODS

DNA extraction, amplification, sequencing and taxon sampling

Our primary dataset contained widely sequenced (for all Fabales families) and published matK (excluding trnK introns), trnL (excluding trnL-trnF intergenic spacer) and rbcL plastid gene regions obtained from the National Center for Biotechnology Information (NCBI/GenBank) for 678 taxa in total: 615 taxa from Leguminosae, 14 taxa from Polygalaceae, five taxa from Surianaceae and one taxon representing Quillajaceae. This sampling represents 80% of Leguminosae genera (3% of species richness of the family), 70% of Polygalaceae genera (1.4% of species richness of the family) and all genera of Surianaceae and Quillajaceae. Forty-three outgroup taxa representing all families in the Fabidae orders were also included (Celastrales, Cucurbitales, Fagales, Malpighiales, Oxalidales, Rosales, Zygophyllales). For the ingroup, we sampled one species from each available genus in Fabales and for the outgroup we sampled one species from each family in the Fabidae. The sequence data were gathered and concatenated from multiple accessions, and if a DNA marker was only available in a different species, we gathered data from congeneric species. The number of taxa and the alignment lengths for 36 analyses are provided in Table 1, and most of these analyses were conducted on our primary dataset with 678 taxa, unless otherwise indicated. The NCBI/GenBank accession numbers for these previously published DNA sequences are provided in Supplementary data¹.
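The sampling figures above can be cross-checked with a short tally. This is only a bookkeeping sketch: the counts are those stated in the text, while the function and variable names are illustrative assumptions. It makes explicit that the four family counts describe the ingroup and that the total of 678 taxa is reached once the 43 outgroup taxa are added.

```python
# Taxon counts as stated in the text for the primary dataset.
INGROUP = {
    "Leguminosae": 615,
    "Polygalaceae": 14,
    "Surianaceae": 5,
    "Quillajaceae": 1,
}
OUTGROUP_TAXA = 43


def tally(ingroup, outgroup):
    """Return (ingroup total, overall total) for a sampling scheme."""
    ingroup_total = sum(ingroup.values())
    return ingroup_total, ingroup_total + outgroup


ingroup_total, total = tally(INGROUP, OUTGROUP_TAXA)
print(f"ingroup: {ingroup_total}, with outgroups: {total}")
```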

Our second dataset comprising newly sequenced nuclear sqd1 (UDP sulfoquinovose synthase gene), which is a low copy nuclear gene, 267 base pairs (bp) long in Angiosperm families, and easy to align due to the lack of indels and (Li et al. 2008) and published plastid matK (sqd1+ matK with no outgroups dataset) was also used in this study to investigate the effect of outgroup taxa (i.e., network analyses). The present study is the first attempt to use a low copy nuclear marker to comprehensively address the problem of poor resolution of interfamilial relationships in Fabales.

Total genomic DNA samples used in previous studies (Forest 2004; Bello 2008; Babineau et al. 2013) were newly sequenced here for sqd1. Despite several attempts, including the design of a set of specific internal primers, 12 of the 16 available Polygalaceae samples could not be amplified. The primers from Li et al. (2008) were used for the amplification of the sqd1 region. Polymerase chain reactions (PCR) were performed in 25 µL reaction volumes, with 12.5 µL of Biomix (Bioline, 2x), 0.5 µL of bovine serum albumin (Sigma-Aldrich, 20 mg/mL), 1 µL of each primer (10 mM) and 0.5 µL of template DNA, made up to 25 µL with distilled water. If the first PCR attempt was unsuccessful, 1 µL of dimethyl sulfoxide (DMSO) was added. The PCR profile for the sqd1 gene region consisted of an initial denaturation at 95 °C for 2 seconds, followed by 30 cycles of denaturation at 94 °C for 40 seconds, annealing at 54 °C for 30 seconds and extension at 72 °C for 40 seconds, and was completed with a final extension at 72 °C for 5 minutes. Amplifications were performed on a GeneAmp PCR system 2700 (Applied Biosystems). Products were sent to Source BioScience (Nottingham, UK) for purification and sequencing.

¹Supplementary data are available with the article through the journal Web site at http://nrcresearchpress.com/doi/suppl/10.1139/cjb-2019-0109.R2.
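The reaction mix and cycling profile described above involve simple arithmetic that can be sketched as follows. The volumes and times are those given in the text; the assumptions made here, and labelled in the code, are that "each primer" means one forward and one reverse primer, that the optional DMSO replaces an equal volume of water within the 25 µL reaction, and that ramping time between temperatures is ignored. The function names are illustrative only.

```python
# Volumes (in µL) of the reaction components listed in the text.
# Assumption: "each primer" refers to one forward and one reverse primer.
COMPONENTS_UL = {
    "Biomix (2x)": 12.5,
    "bovine serum albumin": 0.5,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "template DNA": 0.5,
}
FINAL_VOLUME_UL = 25.0


def water_volume(components, final_volume, dmso=0.0):
    """Water needed to bring the components (plus optional DMSO) to the final volume."""
    used = sum(components.values()) + dmso
    if used > final_volume:
        raise ValueError("components exceed the final reaction volume")
    return final_volume - used


# Cycling profile from the text, in seconds.
INITIAL_DENATURATION_S = 2
CYCLES = 30
CYCLE_STEPS_S = (40, 30, 40)  # denaturation, annealing, extension
FINAL_EXTENSION_S = 5 * 60


def hold_time_seconds():
    """Total programmed hold time, ignoring temperature ramping."""
    return INITIAL_DENATURATION_S + CYCLES * sum(CYCLE_STEPS_S) + FINAL_EXTENSION_S


print(water_volume(COMPONENTS_UL, FINAL_VOLUME_UL))            # standard reaction
# Assumption: DMSO replaces water rather than increasing the total volume.
print(water_volume(COMPONENTS_UL, FINAL_VOLUME_UL, dmso=1.0))
print(hold_time_seconds() / 60)                                # minutes
```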

This matrix comprised 78 sequences, 61 of them generated for this study and 17 obtained from GenBank (Supplementary data¹). The matrix included 69 samples from Leguminosae, six from Polygalaceae, two from Surianaceae and one from Quillajaceae. Outgroup taxa were excluded in the network analyses of sqd1+matK data. The NCBI/GenBank accession numbers for these previously published and newly produced DNA sequences are provided in Supplementary data², including newly generated 78 sqd1 sequences.

Sequences were assembled and aligned using the Geneious alignment option in Geneious Pro 4.8.4 (Kearse et al. 2012) with the automatic pairwise alignment tool and subsequently edited manually. Equivocal base calls at the beginning and end of assembled complementary strands were trimmed. All indels were scored as missing data.

To robustly support a Fabales phylogeny, we employed several analyses, which are introduced below. However, it should be noted that instead of following a step-by-step workflow, our aim was to recover the same root from as many analyses as possible.

Methods to find a possible LBA and a rooting problem

a. Comparing unrooted and rooted topologies

If the support for the nodes of an unrooted tree decreases with the addition of outgroup taxa, or the phylogenetic relationships of the ingroup change with the inclusion of outgroup taxa, the low support or altered topology potentially indicates an uncertain root position (Smith 1994; Lyons-Weiler et al. 1998; Ware et al. 2008; Rich and Xu 2011). For this reason, outgroup sequences were removed from our primary dataset (Supplementary data³) in Geneious Pro 4.8.4 prior to the analyses, and ML analyses were implemented (Supplementary data⁴). The outgroup-free “sqd1+matK” dataset with 78 taxa was also subjected to ML analyses. We named these outgroup-free analyses “network analyses” to address the phylogenetic inconsistency within Fabales both with maternally inherited plastid and biparentally inherited nuclear DNA.

²Supplementary data are available with the article through the journal Web site at http://nrcresearchpress.com/doi/suppl/10.1139/cjb-2019-0109.R2.

All ML analyses were implemented in RAxML through the web server at http://embnet.vital-it.ch/raxml-bb/ (Stamatakis et al. 2008). For these analyses, we allowed missing data. A gamma model of rate heterogeneity and ML search options were selected, and for each data set outgroup taxa were defined and partitions were set by gene. The best scoring trees with bootstrap values were saved, after bootstrap analyses of 1000 replicates. We consider support for the topologies derived from bootstrap analyses as “moderate” if bootstrap support values ranged between 80% and 90%, and “high” if they were 90% or higher. The Interactive Tree of Life (iTOL) online tool (https://itol.embl.de/) (Letunic and Bork 2016) was used to visualize tree files.

b. Comparing results obtained with different outgroup taxa

Previous studies have shown that the ingroup topology may change not only with the addition of outgroup taxa, but also with the choice and density of outgroup taxa, rather than because of uncertain relationships within the ingroup (e.g., Graham et al. 2002; Kodandaramaiah et al. 2010; Thomas et al. 2013). Indeed, if the rooting does not change with random, distant and/or non-random outgroup taxa, this may indicate the possible presence of LBA (Qiu et al. 2001).

³ ⁴Supplementary data are available with the article through the journal Web site at http://nrcresearchpress.com/doi/suppl/10.1139/cjb-2019-0109.R2.

For this reason, with our primary dataset, we employed the methods of Qiu et al. (2001) to investigate whether there is a problem with LBA within Fabales. Four types of outgroup sequences were used. First, 10 random outgroup sequences, including poly A, T, G, C and poly ATGC sequences and sequences constituting 38% A+T or 38% G+C, were generated by Seq-Gen v1.3 (Rambaut and Grassly 1997). These nonbiological sequences were included in the analyses to address whether the outgroup signal is random or not (Qiu et al. 2001). Second, the Fabales tree was rooted with different outgroup taxa, such as closely related Rosales (16 sequences) or distantly related Brassicales (Brassica), Ceratophyllales (Ceratophyllum), Malpighiales (Viola)+Zygophyllales (Bulnesia), Vitales (Vitis) and Zygophyllales (Bulnesia), to attempt to reveal any possible LBA problems with both closely related and distant outgroup taxa. Third, misaligned real outgroup sequences were created manually. Fourth, simulations based on real ingroup and outgroup sequences were generated by Seq-Gen v1.3 (Rambaut and Grassly 1997) for 10,000 bp of data for 31 taxa, including five different orders as outgroup taxa, to show the effect of a large amount of data on the Fabales phylogeny.
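To illustrate what nonbiological outgroup sequences of the first type look like, the following sketch builds homopolymer, repeated-motif and fixed-composition random sequences. It is a simple stand-in written for illustration and is not how Seq-Gen generates data; the function names, the seed and the choice of the 3,894 bp primary alignment length as the sequence length are assumptions.

```python
import random


def homopolymer(base, length):
    """A 'poly-X' sequence made of a single repeated base."""
    return base * length


def repeat_motif(motif, length):
    """A repeated-motif sequence such as poly-ACGT, truncated to `length`."""
    repeats = length // len(motif) + 1
    return (motif * repeats)[:length]


def random_sequence(length, gc_fraction, rng):
    """Random sequence whose G+C content equals `gc_fraction` (up to rounding)."""
    n_gc = round(length * gc_fraction)
    bases = [rng.choice("GC") for _ in range(n_gc)]
    bases += [rng.choice("AT") for _ in range(length - n_gc)]
    rng.shuffle(bases)
    return "".join(bases)


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


rng = random.Random(2019)   # fixed seed so the sketch is reproducible
ALIGNMENT_LENGTH = 3894     # assumed: length of the unreduced primary alignment

artificial_outgroups = {
    "polyA": homopolymer("A", ALIGNMENT_LENGTH),
    "polyACGT": repeat_motif("ACGT", ALIGNMENT_LENGTH),
    "random_38GC": random_sequence(ALIGNMENT_LENGTH, 0.38, rng),
    "random_38AT": random_sequence(ALIGNMENT_LENGTH, 0.62, rng),
}
for name, seq in artificial_outgroups.items():
    print(name, len(seq), round(gc_content(seq), 2))
```

Each artificial sequence would then be appended to the ingroup alignment as the sole outgroup before the tree search is repeated.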

c. Removing long-branched taxa

While removing long-branched taxa can compromise the taxon sampling, it can also reduce conflict and low resolution due to LBA artefacts, which are caused by including both slowly and rapidly evolving branches (Philippe and Laurent 1998; Dabert et al. 2010). For this reason, long-branched taxa, identified using the ML analyses, were first removed manually from the Leguminosae data matrices of our primary dataset. Second, Duparquetia (Leguminosae) was also excluded from the ML analysis to assess any impact of this long-branched taxon on sister group relationships (please note that this taxon was represented with a long branch not only in Bruneau et al. (2008), but also in our ML analysis with our “primary data”). Third, to detect any LBA artefacts related to Fabales families, each family was excluded from the analyses sequentially.

d. Removing fast evolving sites

Removing fast evolving positions, and particularly 3rd codon positions, from the alignment may help to deal with saturation, non-phylogenetic signal (i.e., noisy data), LBA and homoplasy problems (Smith 1994; Philippe and Laurent 1998; Brinkmann et al. 2005; Rodriguez-Ezpeleta et al. 2007; Wu et al. 2013; Zhong et al. 2014; Sun et al. 2016). For this purpose, the length of the alignment of our primary dataset was decreased from 3,894 bp to 3,519 bp (S3), 3,293 bp (S2), 2,950 bp (S1) and eventually 2,167 bp (S0), with the exclusion of saturated nucleotides by SlowFaster, which is an effective program for minimising the impacts of LBA artefacts and possible phylogenetic noise in a data set caused by saturation of nucleotide positions (Kostka et al. 2008).

e. Alignment quality

Increasing the alignment quality by removing the ambiguous sites may decrease a possible non-phylogenetic signal and LBA (Smith 1994; Harrison and Langdale 2006; Dabert et al. 2010; Philippe et al. 2011; Thomas et al. 2013). To do this, BMGE (Block Mapping and Gathering with Entropy) (Criscuolo and Gribaldo 2010) was used to remove both ambiguous sites and saturated characters from the alignment, by removing nucleotides above a defined threshold score (Criscuolo and Gribaldo 2010); therefore, it is useful software for addressing both alignment quality and LBA effects. BMGE differs from SlowFaster because it not only removes the saturated nucleotides, but also trims the ambiguous sites from the alignment. For BMGE analyses, all outgroup sequences were retained (in total 678 taxa, primary dataset), and one alignment was generated with 2,995 bp.

f. Molecular clock rooting

Molecular clock rooting, which assumes that the timespan between tips and the root is the same across all lineages, naturally provides a root to a phylogenetic tree, even if the data violate the molecular clock assumption (Huelsenbeck et al. 2002; Boykin et al. 2010). Furthermore, adding fossils in molecular clock rooting analyses can improve the results (Huelsenbeck et al. 2002; Drummond et al. 2006; Renner et al. 2008; Thomas et al. 2013).
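The clock assumption stated above can be made concrete: on a strictly clock-like (ultrametric) rooted tree, every tip lies at the same distance from the root. The sketch below checks this property for two small trees. It is an illustration only, not part of the BEAST analysis; the nested-list tree representation, the function names and all branch lengths are hypothetical.

```python
import math

# A node is either a tip name (str) or a list of (child, branch_length) pairs.
# Branch lengths are hypothetical and chosen only for illustration.
CLOCK_LIKE = [
    ("Leguminosae", 3.0),
    ([("Polygalaceae", 2.0),
      ([("Surianaceae", 1.0), ("Quillajaceae", 1.0)], 1.0)], 1.0),
]
NOT_CLOCK_LIKE = [
    ("Leguminosae", 3.0),
    ([("Polygalaceae", 5.0),   # one lineage accumulates far more change
      ([("Surianaceae", 1.0), ("Quillajaceae", 1.0)], 1.0)], 1.0),
]


def root_to_tip(node, depth=0.0):
    """Return {tip: distance from the root} for a nested-list tree."""
    if isinstance(node, str):
        return {node: depth}
    distances = {}
    for child, length in node:
        distances.update(root_to_tip(child, depth + length))
    return distances


def is_ultrametric(tree, tol=1e-9):
    """True when all tips are equidistant from the root, as a strict clock implies."""
    values = list(root_to_tip(tree).values())
    return all(math.isclose(v, values[0], abs_tol=tol) for v in values)


print(root_to_tip(CLOCK_LIKE))
print(is_ultrametric(CLOCK_LIKE), is_ultrametric(NOT_CLOCK_LIKE))
```

Clock rooting exploits this property in reverse: among the possible root positions on an unrooted tree, it favours the one under which the tips come closest to being equidistant from the root.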

In the current study, molecular clock rooting analyses were run in BEAST v1.8.0 (Drummond and Rambaut 2007) on the CIPRES portal (Miller et al. 2010). The primary dataset alignment was imported into BEAUti v1.8.0 to generate BEAST input files (Drummond et al. 2012). We allowed BEAST to infer topology and branch lengths. An uncorrelated lognormal relaxed clock model (Drummond et al. 2006) was used, and the underlying model of molecular evolution was set to GTR+G+I for each of the individual genes. The models of evolution were estimated using jModelTest 2.1.10 (Guindon and Gascuel 2003; Darriba et al. 2012). The Yule process was used as the tree prior, with a randomly generated starting tree. The MCMC was run for 20 million generations, with sampling every 1000th generation. The resulting trees and log files from two independent runs were combined using LogCombiner v1.8.0 (Drummond and Rambaut 2007). Tracer v1.6 (Rambaut et al. 2014) was used to check convergence statistics. We used TreeAnnotator v1.8.0 (Drummond and Rambaut 2007) to produce maximum clade credibility trees. The Interactive Tree of Life (iTOL) online tool (https://itol.embl.de/) (Letunic and Bork 2016) was used to visualize tree files.
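The MCMC settings above determine how many sampled states enter the combined posterior. The following sketch works through that arithmetic. The chain length, sampling interval and number of runs come from the text; the burn-in fraction is purely hypothetical, because no burn-in value is stated, and the function names are illustrative.

```python
CHAIN_LENGTH = 20_000_000   # generations per run (from the text)
SAMPLE_EVERY = 1_000        # sampling interval (from the text)
INDEPENDENT_RUNS = 2        # runs combined with LogCombiner (from the text)


def samples_per_run(chain_length, sample_every):
    """Number of sampled states, not counting the initial state at generation 0."""
    return chain_length // sample_every


def combined_samples(chain_length, sample_every, runs, burnin_fraction=0.0):
    """Samples left after discarding the same burn-in fraction from every run."""
    per_run = samples_per_run(chain_length, sample_every)
    kept = per_run - int(per_run * burnin_fraction)
    return kept * runs


print(samples_per_run(CHAIN_LENGTH, SAMPLE_EVERY))
print(combined_samples(CHAIN_LENGTH, SAMPLE_EVERY, INDEPENDENT_RUNS))
# Hypothetical 10% burn-in; the text does not state the value that was used.
print(combined_samples(CHAIN_LENGTH, SAMPLE_EVERY, INDEPENDENT_RUNS, burnin_fraction=0.10))
```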

Twenty-four ingroup fossil calibration points were used to constrain divergence times (Table 2). Most were adopted directly from Lavin et al. (2005), Bruneau et al. (2008) and Simon et al. (2009), though two were from later publications (i.e., Pan et al. 2012; Jia and Manchester 2014). The rationale for fossil selection followed Simon et al. (2009). While the inclusion of a set of outgroup fossil constraints can render the 60-70 Ma legume stem node constraint redundant, we still preferred to include the 60-70 Ma legume stem node constraint used by Lavin et al. (2005) in the molecular clock rooting analysis, because this boundary is largely based on the earliest legume fossils (Lavin et al. 2005; Simon et al. 2009) and the lack of any conclusive Leguminosae fossils prior to ca. 58 Ma (Herendeen et al. 1992; Wing et al. 2009; Koenen et al. 2013). A recent study reported that the oldest fossil record for Leguminosae dated to 65.35 Ma (Lyson et al. 2019). However, that study provides little information about the leaflet and seedpod of this oldest legume fossil, including the legume group to which it could potentially be assigned. Furthermore, the age of the fossil still falls within the range we used to constrain the legume stem node (i.e., 60-70 Ma). Therefore, we decided not to include this unconfirmed fossil in our analyses.

Two new fossils are used for Fabales. Instead of the 34 Ma old Cercis (Leguminosae) fossil (MacGinitie 1953), which was used as a calibration point by Lavin et al. (2005), Bruneau et al. (2008) and Simon et al. (2009), we included fossil leaves and fruits from Oregon, USA to represent the oldest fossil record of Cercis at ~36 Ma (Jia and Manchester 2014) (calibration point C in the current study). Similarly, Newtonia (Leguminosae) fossil seeds from Ethiopia from 22-21 Ma indicate the earliest fossil record for this genus (Pan et al. 2012) (calibration point Y in the current study). Fossils were set as minimum age constraints with lognormal prior distributions to account for possible errors in fossil age estimates and gaps in the fossil record (Forest 2009; Heads 2010). No fossils from Polygalaceae (e.g., Crane et al. 1990; Magallon et al. 1999; Pigg et al. 2004, 2008) or Surianaceae (e.g., Kruse 1954; Zi-Chen et al. 2004) were used due to their unconfirmed status (Lavin et al. 2005; Forest et al. 2007; Bello et al. 2009).

RESULTS

Results of all analyses with internal node support values are presented in Table 1. A summary of the ML phylogenetic tree (primary dataset) showing Fabales and its closest outgroups is presented in Figure 1 (Supplementary data⁵).

Employing two types of alignment alteration software, BMGE (Criscuolo and Gribaldo 2010) and SlowFaster (Kostka et al. 2008), did not improve the resolution of the phylogenetic trees nor yield higher support values for the interfamilial relationships. Indeed, excluding more and more data with both SlowFaster (Kostka et al. 2008) and BMGE (Criscuolo and Gribaldo 2010) generally yielded lower support values for the phylogenetic relationships within Fabales, which may indicate that the software could be removing existing phylogenetic signal. On the other hand, results of this study suggest that the Fabales phylogeny does not suffer from LBA, because both the root and the tree topology changed with different outgroup sampling. Random outgroup sequences such as poly A, C, G, T and poly ACGT sequences, or misaligned outgroup sequences, yielded non-monophyletic Fabales or at least one non-monophyletic family. The support values were not high when the Fabales tree was rooted either by a closely related outgroup (Rosales) or by distant outgroup taxa (Brassicales, Ceratophyllales, Malpighiales, Vitales and Zygophyllales). The outgroup simulation analyses based on real outgroup sequences yielded moderately to highly supported trees, but with different topologies (Table 1). Removing long branches of Leguminosae and Duparquetia (Leguminosae) did not improve the resolution of the phylogenetic trees or yield higher support values for the interfamilial relationships. In contrast, exclusion of the Polygalaceae and Surianaceae families, and network analyses, yielded moderately to highly supported internal nodes (Supplementary data⁶).

⁵Supplementary data are available with the article through the journal Web site at http://nrcresearchpress.com/doi/suppl/10.1139/cjb-2019-0109.R2.

Molecular clock rooting analysis yielded monophyletic families for Fabales (posterior probability, PP, of 1.0 for each family), and surprisingly, indicated Leguminosae as the root of Fabales (Figure 2, Supplementary data⁷). While the Quillajaceae+Surianaceae (QS) relationship was supported with 0.98 PP, the ((QS)P) relationship, which also includes Polygalaceae (P), received 1.0 PP. The outgroup-free molecular clock analysis yielded an age of 71.4 million years (Ma) for Leguminosae, 59.6 Ma for the (Surianaceae+Quillajaceae) crown clade, 48.2 Ma for crown Polygalaceae and 39.8 Ma for Surianaceae.

Interestingly, the same network was recovered from all the analyses described, and support values for the interfamilial relationships of Fabales were very high when the outgroup taxa were removed (Table 1).
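At the family level, the recovery of a single network alongside differing rooted trees can be illustrated by comparing the unrooted splits of the two rooted topologies reported in this study. In the sketch below, the nested-tuple tree representation and the function names are illustrative assumptions, and the third topology is a hypothetical alternative included only for contrast.

```python
def leaves(tree):
    """Set of tip names below a node; a tip is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def unrooted_splits(tree):
    """Non-trivial bipartitions of the tips, which do not depend on root position."""
    all_tips = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        rest = all_tips - clade
        # Splits with fewer than two tips on a side are shared by every tree.
        if len(clade) >= 2 and len(rest) >= 2:
            splits.add(frozenset([clade, rest]))
        for child in node:
            visit(child)

    visit(tree)
    return splits


# Rooted family-level topologies reported in the text.
CLOCK_ROOTED = ("Leguminosae", ("Polygalaceae", ("Surianaceae", "Quillajaceae")))
ML_ROOTED = (("Leguminosae", "Polygalaceae"), ("Surianaceae", "Quillajaceae"))
# Hypothetical alternative, included only for contrast.
ALTERNATIVE = (("Leguminosae", "Surianaceae"), ("Polygalaceae", "Quillajaceae"))

print(unrooted_splits(CLOCK_ROOTED) == unrooted_splits(ML_ROOTED))
print(unrooted_splits(CLOCK_ROOTED) == unrooted_splits(ALTERNATIVE))
```

Both reported topologies contain the single split separating Leguminosae and Polygalaceae from Surianaceae and Quillajaceae, so at this level they differ only in where the root is placed on the same network.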

⁶ ⁷Supplementary data are available with the article through the journal Web site at http://nrcresearchpress.com/doi/suppl/10.1139/cjb-2019-0109.R2.

Phylogenetic relationships

Overall the results of our ML analysis with our primary dataset were consistent with existing hypotheses of relationships. The order Fabales was monophyletic (99%) (Figure 1), while within

Fabales, a ((Leguminosae+Polygalaceae) (Surianaceae+Quillajaceae)) topology was obtained.

Monophyletic Polygalaceae (97%) was sister to monophyletic Leguminosae (96%) with only 48% support. Quillajaceae was sister to monophyletic Surianaceae (98%) with only 83% bootstrap support. Within Polygalaceae, was sister to all remaining taxa. Monophyletic

Carpolobieae (99%) was sister to monophyletic Polygaleae (55%), and non-monophyletic Moutabeae was sister to this pair. Within Leguminosae, the orchid-like flowered Duparquetia

(Duparquetioideae) was sister to all legumes with 96% bootstrap support. The sister pair monophyletic (100%) and monophyletic (99100%) was the second diverging clade with very weak bootstrap support (only 28%). Monophyletic (99%) was sister to remaining Leguminosae with also weak support (63%). Monophyletic

(86%) was sister to monophyletic Papilionoideae (83%) with 88% support (Figure 1). Within

Papilionoideae, the (Cardoso et al. 2012a; 2013), Vataireoid clade (Ireland et al. 2000;

Pennington et al. 2001; Wojciechowski 2013) and Andira clade (Cardoso et al. 2012a) were not monophyletic, while the ADA clade (Amburaneae+Dipterygeae+Angylocalyceae; Cardoso et al.

2012a), Swartzieae, Cladrastis clade, Exostyleae (Herendeen 1995), Genistoid s.l. (Wojciechowski et al. 2004; Cardoso et al. 2012b), Dalbergioid s.l. (Lavin et al. 2001), Baphieae (Pennington et al. 2001) and the NPAAA (non-protein-amino-acid-accumulating) clade were monophyletic.
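The support categories defined in the Materials and Methods ("moderate" for bootstrap values between 80% and 90%, "high" for 90% or higher) can be applied to some of the values reported above as a small sketch. The label "low" for values under 80% is an assumption made here, since the text does not name that class, and the function name is illustrative.

```python
def support_category(bootstrap_percent):
    """Classify a bootstrap value using the thresholds given in the Methods."""
    if not 0 <= bootstrap_percent <= 100:
        raise ValueError("bootstrap support must lie between 0 and 100")
    if bootstrap_percent >= 90:
        return "high"
    if bootstrap_percent >= 80:
        return "moderate"
    return "low"  # label assumed here; the text does not name this class


# Bootstrap values quoted in the paragraph above.
REPORTED = {
    "Fabales monophyly": 99,
    "Polygalaceae + Leguminosae": 48,
    "Quillajaceae + Surianaceae": 83,
    "Duparquetia sister to all legumes": 96,
}
for relationship, value in REPORTED.items():
    print(f"{relationship}: {value}% ({support_category(value)})")
```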

DISCUSSION

The spurious root of Fabales

Our supermatrix analysis with our primary dataset yielded a ((Leguminosae+Polygalaceae)(Surianaceae+Quillajaceae)) topology for Fabales with low-moderate support, and this topology was also suggested by previous analyses (Bello et al. 2009, 2012) (please note that while we employed rbcL+matK+trnL regions in the current study, Bello et al. (2009) employed rbcL+matK, and Bello et al. (2012) employed rbcL+matK+66 morphological characters). However, similarly to the previous attempts, the current study could not provide a robust answer to clarify the phylogenetic relationships within Fabales. The results of this study clearly suggest that there is no LBA problem within the Fabales phylogeny, because, when LBA is present, the expectation is of a stable root, which does not change by sampling different random and real outgroups (Qiu et al. 2001; Graham et al. 2002; Bergsten 2005). In contrast, by employing different outgroup taxa (real or random, close or distant), the study yielded spurious roots in every case (Table 1). Even closely related outgroup taxa, such as Rosales, yielded topology alterations and weakly supported ingroup relationships, probably caused by a weak phylogenetic signal in the internal branches of Fabales, which are sensitive to the smallest changes (Graham et al. 1998; Graham et al. 2002; Rota-Stabelli and Telford 2008), or otherwise by the possibility of our level of gene and/or taxon sampling not providing enough phylogenetic signal.

The choice of outgroup is a crucial step in phylogenetic tree reconstruction (Pisani et al. 2015;

Jamil et al. 2019) as the outgroup taxa can affect the ingroup topology (Smith 1994; Lyons-Weiler et

al. 1998; Djernæs et al. 2012; Rich and Xu 2011; Drew et al. 2014; Grant 2019), for example through variation in

root position when using different outgroup taxa (e.g., Milinkovitch and Lyons-Weiler 1998;

Puslednik and Serb 2008; Bewick et al. 2012). In such cases neither sampling sister outgroups nor

sampling more outgroup taxa will necessarily help in obtaining a robust tree. However, in the

current study, exclusion of outgroup taxa with both ML and BEAST analyses presented a highly


supported network (Table 1), and the same pattern was also shown by many previous studies (e.g.,

Graham et al. 1998; Rota-Stabelli and Telford 2008; Kodandaramaiah et al. 2010; Kirchberger et al.

2014). Here, the low support values were probably not caused by uncertain relationships (i.e., a topology problem) within the order, because outgroup-free analyses yielded very strong support and stable rooting (Graham et al. 1998). Indeed, for problematic phylogenies, such as rapid radiations that are sensitive to outgroup choice because of weak phylogenetic signal on the internal branches, and in which different outgroup taxa produce different topologies, the best practice is to compare rooted and unrooted trees and, if they differ, to accept the unrooted one (i.e., the network), which is most probably the correct one (Donoghue and Cantino 1984; Milinkovitch and Lyons-Weiler 1998; Tarrío et al. 2000; Holland et al. 2003; Ackerman et al. 2014). Although unrooted trees are useful and can potentially provide plausible hypotheses of ingroup relationships, whether or not outgroup taxa affect the ingroup topology, they do not allow the direction of evolutionary change to be investigated or the monophyly of the ingroup to be tested (Nixon and Carpenter 1993; Smith 1994; Huelsenbeck et al. 2002; Schuettpelz and Hoot 2006).
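The comparison of rooted and unrooted trees described above can be illustrated with a minimal sketch. It assumes that each family is reduced to the single-letter code used in Table 1 (L, P, S, Q) and that the rooted topologies are written as the parenthesised strings reported there; the function names are hypothetical and are not part of any phylogenetics package. For each rooted topology, the sketch lists the non-trivial bipartition that remains once the root is ignored.

```python
def clades(topology):
    """Return every clade (frozenset of taxon letters) in a parenthesised topology."""
    stack, found = [], set()
    for ch in topology:
        if ch == "(":
            stack.append(set())          # open a new, still empty, clade
        elif ch == ")":
            clade = frozenset(stack.pop())
            found.add(clade)
            if stack:                    # pass the members up to the parent clade
                stack[-1] |= clade
        elif ch.isalpha():
            stack[-1].add(ch)            # single-letter taxon code
    return found


def unrooted_splits(topology):
    """Non-trivial bipartitions implied by a rooted topology once the root is ignored."""
    all_taxa = frozenset(ch for ch in topology if ch.isalpha())
    splits = set()
    for clade in clades(topology):
        rest = all_taxa - clade
        if len(clade) >= 2 and len(rest) >= 2:
            splits.add(frozenset({clade, rest}))
    return splits


# Rooted topologies as reported in Table 1 (labels shortened for illustration).
rooted = {
    "supermatrix, primary alignment": "((LP)(QS))",
    "molecular clock rooting": "(((QS)P)L)",
    "Brassicales as OG": "(((QS)L)P)",
    "Ceratophyllales as OG": "(((LP)S)Q)",
    "Slowfaster S0": "((QL)(PS))",
}

reference = unrooted_splits("((LP)(QS))")
for label, topology in rooted.items():
    same = unrooted_splits(topology) == reference
    print(f"{label}: {topology} -> same unrooted split as ((LP)(QS)): {same}")
```

Under these assumptions, the first four topologies differ only in root position and share the bipartition (QS) versus (LP), whereas the Slowfaster S0 result implies a different unrooted tree.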

In the current study, both the root and topology of the tree changed according to the alignment, rooting method and outgroup taxa used. It could be that this random rooting is caused by the ancient-rapid radiation of the order, which is represented by very short internal branches (Bello et al. 2009). Yet, the outgroup-free molecular clock analysis in the current study (Figure 2) estimated an age of ~59.6 Ma for the (Surianaceae+Quillajaceae) crown clade, ~71.4 Ma for Leguminosae and ~48.2 Ma for the crown Polygalaceae, which indicated a non-rapid radiation scenario for the Fabales, in contrast to the outgroup-included molecular dating analysis by Bello et al. (2009). This may be a result of rooting or taxon sampling (i.e., dense sampling of the ingroup and relatively sparse sampling of the outgroup) (Linder et al. 2005; Pirie et al. 2005;

Heath et al. 2008). However, the ancient-rapid radiation of Fabales has been supported before (e.g.,

Bello et al. 2009), and the fossil record of Fabales families also supports a rapid radiation hypothesis

(e.g., Crane et al. 1990; Zi-Chen et al. 2004; Lavin et al. 2005; Pigg et al. 2008) (note that fossils of


Polygalaceae and Surianaceae are unconfirmed), but there is still the possibility that the

incompleteness of the Fabales fossil record is more acute than currently assumed. However, lack of

resolution is a common problem across angiosperms (e.g., Zeng et al. 2014; Huang et al. 2015; Azani

et al. 2017; Koenen et al. 2019), and instead of a rapid radiation scenario, there are still other

possibilities, not only for Fabales, but also for other phylogenies with short internal branches, such

as conflicting gene trees, whole genome duplication, hybridization, introgression, horizontal gene

transfer, incomplete lineage sorting, phylogenetic methods, outgroup choice, a mass extinction

scenario followed by a rapid radiation and inadequate data (Rokas and Carroll 2006; Whitfield and

Lockhart 2007; Crisp and Cook 2009; Kodandaramaiah et al. 2010; Koenen et al. 2013; Salichos and

Rokas 2013; Drew et al. 2014; Zeng et al. 2014; Koenen et al. 2019). Moreover, the effect of taxon

sampling on molecular dating analysis has also been discussed and shown by many studies (e.g., Linder et al. 2005; Pirie et al. 2005; Heath et al. 2008). Dense sampling of the ingroup and relatively sparse sampling of the outgroup can be another problem for the molecular dating analyses of

Fabales, because over-estimation of the age of Fabales and Fabales families is also possible. For this

reason, in the current study, we employed a large outgroup sampling to obtain a more balanced tree

(Smith 1994). However, the effect of both ingroup and outgroup sampling on Fabales molecular

clock analysis is still unknown. To answer these questions, we certainly need more data from not

only the chloroplast, but also the nuclear genome.

One view holds that bringing more data to bear on hard-to-resolve phylogenies, particularly rapid

ancient radiations, is enough to resolve the near-hard polytomies at their base. This view drives the

use of increasingly large amounts of data, made easier by the advent of high-throughput sequencing

methods. Cases in which the “more data” approach has proved useful are not rare in the literature,

such as Rosaceae (Zhang et al. 2017), Mesangiospermae (Zeng et al. 2014), eudicots (Zeng et al. 2017),

Brassicaceae (Huang et al. 2015), Vitaceae (Raman and Park 2016), birds (Reddy et al. 2017) and turtles

(Pereira et al. 2017; Shaffer et al. 2017), because especially in the case of rapid radiations more data


can increase the internal branch lengths and consequently make the phylogeny easier to resolve. In contrast, several authors have highlighted the deficiencies of “big data sets” (e.g., Philippe et al. 2011;

Pisani et al. 2012), because even more data from different genomes may not resolve the topologies of rapidly radiated lineages (e.g., Sun et al. 2015; Rodriguez et al. 2017): larger datasets may yield very strong support values even when the phylogenetic signal is spurious (Rodriguez et al. 2017), and different genomes may yield conflicting results because of biological events, such as hybridization, introgression and incomplete lineage sorting, or because of systematic errors, such as differences in taxon sampling, differences in data sampling and an incorrect evolutionary model (Hughes et al. 2002; Salichos and Rokas 2013;

Zeng et al. 2014; Sun et al. 2015; Reddy et al. 2017; Rodriguez et al. 2017). Recently, Koenen et al.

(2019) employed a concatenation approach with 72 protein-coding chloroplast genes and thousands of nuclear genomic sequences, yet they reported that resolving the root of the Leguminosae is difficult because of biological events such as incomplete lineage sorting, reticulation and ancient polyploidy, together with a combination of short and long branches. Similarly, while simulated data do not always behave like real data (Spinks et al. 2009; Schäferhoff et al. 2010), the simulation analyses in the current study sound a note of caution with respect to interpreting the results of the “more data” approach, because larger datasets can strongly support incorrect phylogenetic relationships and an arbitrary root for Fabales.

Nevertheless, the results of the molecular clock rooting analysis provided a (((QS)P)L) topology with very high posterior probabilities, which is reported for the first time here (Figure 2). On the one hand, it is somewhat convincing that we obtained the same network from all unrooted analyses with very high support values, and it has been shown that molecular clock rooting can yield meaningful results (Huelsenbeck et al. 2002; Holland et al. 2003; Renner et al. 2008; Outlaw and Ricklefs 2011;

Calvignac-Spencer et al. 2014). On the other hand, it is possible that the relative number of legume fossils and the high number of its species compared with other families make it more likely that

Leguminosae is the oldest. Indeed, as Koenen et al. (2013) suggested, the significant fraction of fossils with the same 45-46 Ma age (nine fossils in the current study) may have affected the analysis


adversely (Hughes, pers. comm.). There would be more confidence in molecular clock rooting if

there were more fossils of other families of Fabales.
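The clustering of calibration ages mentioned above can be checked directly against Table 2. The following minimal sketch transcribes the ages listed there and counts the calibrations that fall in the 45-46 Ma window. The variable and function names are hypothetical, and treating calibration A (given as a 60-70 Ma range) separately from the single-valued ages is an assumption made only for this illustration.

```python
# Calibration ages (Ma) transcribed from Table 2. Calibration A is listed as a
# 60-70 Ma range, so it is stored as a (minimum, maximum) tuple.
CALIBRATION_AGES = {
    "A": (60, 70), "C": 36, "D": 46, "E": 24, "F": 24, "F2": 55,
    "G": 53, "H": 46, "I": 45, "I2": 40, "J": 55, "J2": 56,
    "K": 45, "K2": 40, "L": 34, "L2": 10, "M": 45, "M2": 34,
    "N": 45, "O": 45, "Q": 45, "R": 45, "X": 16, "Y": 21,
}


def calibrations_in_window(ages, low, high):
    """Names of calibrations whose single-valued age lies in [low, high]."""
    return sorted(
        name for name, age in ages.items()
        if not isinstance(age, tuple) and low <= age <= high
    )


clustered = calibrations_in_window(CALIBRATION_AGES, 45, 46)
share = len(clustered) / len(CALIBRATION_AGES)
print(clustered)
print(f"{len(clustered)} of {len(CALIBRATION_AGES)} calibrations ({share:.1%})")
```

With the ages as transcribed, nine of the 24 calibration points fall in this window, which matches the number of fossils quoted in the text.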

Surprisingly, successive exclusions of Polygalaceae and Surianaceae also yielded higher support

values for Fabales, and exclusion of Leguminosae slightly increased the support values (Table 1).

However, exclusion of long-branched Leguminosae lineages, such as Duparquetia (Leguminosae),

unexpectedly improved support for the (QS) node, while slightly reducing the (LP) node support

value. It has been claimed that the rapid radiation problem becomes more complicated with the

existence of both short branches (i.e., slowly evolving taxa) and long branches (i.e., fast-evolving

taxa) within the ingroup (Dabert et al. 2010), because an LBA artefact may also occur within the

ingroup, but not as extensively as one that the outgroup taxa could introduce (Bergsten 2005). In the

Fabales ML tree, most Papilionoideae and Polygalaceae are represented by very long branches

compared with the five other very short-branched Leguminosae subfamilies, Quillaja (Quillajaceae),

Surianaceae and outgroup branches (Figure 1). Therefore, obtaining better results, with the

exclusion of Polygalaceae and Surianaceae, could represent the effect of substitution rate

heterogeneity, or of even weaker signals shown by these two families. Indeed, substitution rate

heterogeneity among lineages is a widespread problem in phylogenetics, which is caused by

different mutation rates, generation times, metabolic rates and population size effects (Drummond

et al. 2006; Rutschmann 2006; Ho and Duchêne 2014; Beaulieu et al. 2015; Wei et al. 2017). The

impact of rate heterogeneity among lineages on divergence time analysis is also a well-known issue

and contradicts the core concept of molecular dating, which is that the “molecular clock ticks

regularly for each lineage” (i.e., a universal molecular clock or a strict molecular clock) (Welch and

Bromham 2005; Rutschmann 2006; Forest 2009; Bell 2015). Other than excluding long terminal

branches, which works only if a few branches are outliers in length, relaxed molecular clock

approaches have been implemented as the most appropriate solution for overcoming the rate

heterogeneity problem (Welch and Bromham 2005; Drummond et al. 2006), but these also have

their limits in dealing with extremes. In contrast to the application of a strict molecular clock and to


methods which correct rate heterogeneity among branches, BEAST incorporates rate heterogeneity by employing a relaxed-clock model (Drummond et al. 2006; Forest 2009). However, it has been suggested that great caution should be taken even when relaxed molecular-clock approaches are used. For example, molecular dating analyses yield an older age for the origin of angiosperms than the fossil record indicates, and this has been attributed to both heterogeneous rates of molecular evolution among lineages and taxon sampling (Beaulieu et al. 2015). Interestingly, the molecular clock analysis of Fabales in this study yielded results that are meaningful when compared with the fossil record. Therefore, we believe that while it is obvious that successive exclusions of Polygalaceae and

Surianaceae improved the ML results, the effect of rate heterogeneity among Fabales branches on the molecular clock analyses should be minimal when a relaxed molecular clock model is used.
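The difference between a strict and a relaxed clock discussed above can be made concrete with a small simulation. The sketch below is not the BEAST implementation. It assumes an uncorrelated lognormal model in the spirit of Drummond et al. (2006), in which each branch receives its own rate drawn independently from a lognormal distribution; the mean rate, the value of sigma and the function names are hypothetical values chosen only for illustration.

```python
import math
import random
import statistics


def branch_rates(n_branches, mean_rate, sigma, rng):
    """Draw one substitution rate per branch.

    sigma is the standard deviation of the log-rates; sigma == 0 gives a
    strict clock in which every branch has exactly mean_rate.
    """
    if sigma == 0:
        return [mean_rate] * n_branches
    # Shift the log-mean so that the expected rate equals mean_rate.
    mu = math.log(mean_rate) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


def coefficient_of_variation(rates):
    """Standard deviation of the rates divided by their mean."""
    return statistics.pstdev(rates) / statistics.fmean(rates)


rng = random.Random(1)                       # fixed seed for repeatability
strict = branch_rates(1000, 0.001, 0.0, rng)
relaxed = branch_rates(1000, 0.001, 0.5, rng)

print("strict clock CV: ", coefficient_of_variation(strict))
print("relaxed clock CV:", round(coefficient_of_variation(relaxed), 3))
```

The coefficient of variation is zero under the strict clock and positive under the relaxed clock. The larger it becomes, the more the lineages depart from a clock that "ticks regularly", which is the situation in which even relaxed-clock estimates call for caution.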

Unfortunately, the Fabales phylogeny problem has been waiting to be solved for over 10 years (Forest et al. 2007; Bello et al. 2009; Bello et al. 2012). In this study, the outgroup-free analyses yielded a network (L=Leguminosae, P=Polygalaceae, S=Surianaceae, Q=Quillajaceae; nuclear+chloroplast data), the supermatrix analyses yielded a ((LP)(QS)) topology, and the molecular clock rooting analysis revealed a (((QS)P)L) topology with strong support. Although the network could be the working answer for Fabales, it does not show us the path of evolution within the order. Therefore, there is still a need for further studies to confirm whether ((LP)(QS)) (the supermatrix analysis in the current study), (((QS)P)L) (the molecular clock rooting analysis in the current study) or another topology (e.g., (((QS)L)P); Bello et al. 2009, 2012) is the right answer for

Fabales. However, the fossil record of Fabales (e.g., Crane et al. 1990; Zi-Chen et al. 2004; Lavin et al.

2005; Pigg et al. 2008) also supports a (((QS)P)L) topology, with the oldest fossils assigned to Leguminosae, in contrast to the unconfirmed fossils of Polygalaceae and

Surianaceae. Yet, the possibility of an incomplete fossil record and “random rooting” problem of

Fabales should not be ignored, because it is possible that the relative number of legume fossils and


high number of its species compared to other families make it more likely that Leguminosae is the

oldest. We would have more confidence in molecular clock rooting if there were more fossils of

other families of Fabales. Ultimately, the number of old fossils of Leguminosae may have directed

our analyses towards this interpretation. The current study also yielded some other important

results by using novel approaches to understand the reasons for the rooting problem in Fabales. It is

clear that neither sequencing more outgroup taxa nor employing different outgroup samplings will

provide a robust root for Fabales.

ACKNOWLEDGMENTS

We thank Colin Hughes for constructive suggestions, and Patrick S. Herendeen for valuable fossil

clarifications. We are also grateful to the anonymous reviewers for helpful comments.


Table 1. Details of all analyses performed, including alignment length, number of taxa, family topology and node support retrieved. Non-monophyletic results (for Fabales families or for Fabales itself) and support values for them are marked as X. If not indicated, all analyses refer to the ML results. Amount of missing data is indicated for our “primary alignment” and BMGE alignment within parentheses. Our primary alignment (with and without outgroup sequences), ML trees corresponding to these alignments, and molecular clock rooting analysis results are available as Supplementary data¹. L: Leguminosae, P: Polygalaceae, S: Surianaceae, Q: Quillajaceae, OG: Outgroup(s).

| Data type | Alignment length (bp) | # of taxa | Results | Support values for the deep nodes |
|---|---|---|---|---|
| Different alignments and removing fast evolving sites | | | | |
| Primary alignment (31.3%) | 3894 | 678 | ((LP)(QS)) | 48% for (LP), 83% for (QS) |
| BMGE (31.8%) | 2995 | 678 | ((LP)(QS)) | 53% for (LP), 74% for (QS) |
| Slowfaster S0 | 2167 | 678 | ((QL)(PS)) | 2% for (QL), 14% for (PS) |
| Slowfaster S1 | 2950 | 678 | ((LP)(QS)) | 19% for (LP), 67%, 38% for (QS) |
| Slowfaster S2 | 3293 | 678 | (((QS)L)P) | 77% for (QS), 35% for ((QS)L) |
| Slowfaster S3 | 3519 | 678 | ((LP)(QS)) | 10% for (LP), 55% for (QS) |
| Different outgroups | | | | |
| Poly A outgroups | 3894 | 636 | X | X |
| Poly C outgroups | 3894 | 636 | X | X |
| Poly G outgroups | 3894 | 636 | X | X |
| Poly T outgroups | 3894 | 636 | X | X |
| Poly ACGT outgroups | 3894 | 636 | X | X |
| Brassicales as OG | 3894 | 636 | (((QS)L)P) | 76% for (QS), 63% for ((QS)L) |
| Ceratophyllales as OG | 3894 | 636 | (((LP)S)Q) | 72% for (LP), 46% for ((LP)S) |
| Malpighiales+Zygophyllales as OG | 3894 | 637 | (((QS)L)P) | 84% for (QS), 68% for ((QS)L) |
| Rosales as OG | 3894 | 651 | ((LP)(QS)) | 39% for (LP), 65% for (QS) |
| Vitales as OG | 3894 | 636 | (((QS)L)P) | 89% for (QS), 74% for ((QS)L) |
| Zygophyllales as OG | 3894 | 636 | (((QS)L)P) | 87% for (QS), 81% for ((QS)L) |
| 10 random sequences as OG | 3894 | 645 | X | X |
| 38% AT outgroup | 3894 | 637 | X | X |
| 38% GC outgroup | 3894 | 637 | X | X |
| Misaligned outgroups | 3894 | 678 | X | X |
| Outgroup simulations | | | | |
| Cucurbitales as an OG | 10,000 | 31 | (((QS)L)P) | 100% for (QS), 3% for ((QS)L) |
| Fagales as an OG | 10,000 | 31 | ((LP)(QS)) | 98% for (LP), 100% for (QS) |
| Malpighiales as an OG | 10,000 | 31 | ((LP)(QS)) | 100% for (LP), 97% for (QS) |
| Rosales as an OG | 10,000 | 31 | ((LP)(QS)) | 100% for (LP), 94% for (QS) |
| Zygophyllales as an OG | 10,000 | 31 | (((QS)L)P) | 100% for (QS), 99% for ((QS)L) |
| Network analyses | | | | |
| No outgroup taxa | 3894 | 635 | network | 92% for (QS), 99% for ((QS)P) |
| No outgroup taxa, no Duparquetia (Leguminosae) | 3894 | 634 | network | 89% for (QS), 98% for ((QS)P) |
| Only sqd1+matK with no OG | 2041 | 78 | network | 100% for (QS), 100% for ((QS)P) |
| Removing long branches and each family subsequently | | | | |
| L with only short branches | 3894 | 92 | ((LP)(QS)) | 55% for (LP), 90% for (QS) |
| No L | 3894 | 63 | ((QS)P) | 79% for (QS), 100% for ((QS)P) |
| No P | 3894 | 664 | ((QS)L) | 93% for (QS), 99% for ((QS)L) |
| No S | 3894 | 673 | ((LP)Q) | 83% for (LP), 99% for ((LP)Q) |
| No Q | 3894 | 677 | ((PS)L) | 58% for (PS), 98% for ((PS)L) |
| Molecular clock rooting | | | | |
| Molecular clock rooting | 3894 | 635 | (((QS)P)L) | 0.98 for (QS), 1.0 for ((QS)P) |

1 Supplementary data are available with the article through the journal Web site at http://nrcresearchpress.com/doi/suppl/......


Table 2. Fossils used as calibration points for the Fabales phylogenetic analyses. Sources are provided in the table. MRCA: Most Recent Common Ancestor, Ma: million years ago

| Name | Node constrained | Fossil organ(s) | Geographic location | Age (Ma) | Reference |
|---|---|---|---|---|---|
| A | Leguminosae stem node | Early fossil record of Leguminosae | Various locations | 60-70 | Lavin et al. 2005; Simon et al. 2009 |
| C | Cercis stem node | Cercis leaves and fruits | Western North America | 36 | Jia and Manchester 2014 |
| D | Bauhinia stem node | Bauhinia s.l. leaves | Tanzania | 46 | Jacobs and Herendeen 2004; Bruneau et al. 2008; Simon et al. 2009 |
| E | Hymenaea stem node | Hymenaea flower | Dominican Republic | 24 | Hueber and Langenheim 1986; Lavin et al. 2005; Bruneau et al. 2008; Simon et al. 2009 |
| F | MRCA of Prioria and Oxystigma | Prioria flowers | Dominican Republic | 24 | Poinar and Poinar 1999; Bruneau et al. 2008; Simon et al. 2009 |
| F2 | MRCA of clade of Dimorphandra group | Protomimosoidea buchanensis flowers | Tennessee, USA | 55 | Crepet and Taylor 1985; 1986; Lavin et al. 2005; Bruneau et al. 2008; Simon et al. 2009 |
| G | Daniellia stem node | Daniellia wood | France | 53 | De Franceschi and De Ploëg 2003; Bruneau et al. 2008; Simon et al. 2009 |
| H | Aphanocalyx stem node | Aphanocalyx leaves | Tanzania | 46 | Herendeen and Jacobs 2000; Bruneau et al. 2008; Simon et al. 2009 |
| I | Crudia stem node | Crudia fruits and leaflets | SE USA | 45 | Herendeen and Dilcher 1990; Bruneau et al. 2008; Simon et al. 2009 |
| I2 | Stem node leading to Styphnolobium and Cladrastis | Styphnolobium and Cladrastis fruits and leaves | Tennessee, USA | 40 | Herendeen 1992; Lavin et al. 2005; Simon et al. 2009 |
| J | Papilionoideae stem node | Barnebyanthus buchananensis flowers | SE USA and Wyoming, USA | 55 | Crepet and Herendeen 1992; Herendeen and Wing 2001; Lavin et al. 2005; Bruneau et al. 2008; Simon et al. 2009 |
| J2 | Genistoid crown node | Leaves and pods similar to Bowdichia and Diplotropis | Western Wyoming, USA | 56 | Herendeen and Wing 2001; Lavin et al. 2005; Simon et al. 2009 |
| K | stem node | Swartzia fruits and leaflets | SE USA | 45 | Herendeen 1992; Bruneau et al. 2008; Simon et al. 2009 |
| K2 | Machaerium stem node | Leaflets | Northern Mississippi, USA | 40 | Herendeen 1992; Lavin et al. 2005; Simon et al. 2009 |
| L | Arcoa stem node | Prosopis linearifolia leaves | Florissant Locality, USA | 34 | MacGinite 1953; Lavin et al. 2005; Bruneau et al. 2008; Simon et al. 2009 |
| L2 | MRCA of and Maraniona | Tipuana fruits | Southern Ecuador | 10 | Burnham 1995; Lavin et al. 2005; Simon et al. 2009 |
| M | MRCA of Acrocarpus | Acrocarpus fruit | SE USA | 45 | Bruneau et al. 2008; Simon et al. 2009 |
| M2 | Robinia stem node | Robinia zirkelii wood | North America and Europe | 34 | Lavin et al. 2003; Lavin et al. 2005; Simon et al. 2009 |
| N | Senna stem node | Senna fruits | SE USA and Mexico | 45 | Herendeen 1992; Calvillo-Canadell and Cevallos-Ferriz 2005; Bruneau et al. 2008; Simon et al. 2009 |
| O | Caesalpinia stem node | Mezoneuron fruits | SE and W USA | 45 | Herendeen and Dilcher 1991; Lavin et al. 2005; Bruneau et al. 2008; Simon et al. 2009 |
| Q | MRCA of Acacieae/Ingeae | Ingeae / Acacieae fossil pollen | Egypt | 45 | Guinet et al. 1987; Bruneau et al. 2008; Simon et al. 2009 |
| R | Dinizia stem node | Eumimosoidea plumosa flowers, leaves and fruits | SE USA | 45 | Crepet and Dilcher 1977; Herendeen and Dilcher 1990; Bruneau et al. 2008; Simon et al. 2009 |
| X | Calliandra stem node | Calliandra pollen | Argentina | 16 | Caccavari and Barreda 2000; Simon et al. 2009 |
| Y | Newtonia stem node | Newtonia seeds | Ethiopia | 21 | Pan et al. 2012 |


LITERATURE CITED

Ackerman, M., Brown, D. G., and Loker, D. 2014. Effects of rooting via out-groups on in-group

topology in phylogeny. Int. J. Bioinformatics Res. App. 2, 10: 426-446.

Albertson, R. C., Markert, J. A., Danley, P. D., and Kocher, T. D. 1999. Phylogeny of a rapidly evolving

clade: the cichlid fishes of Lake Malawi, East . Proc. Natl. Acad. Sci. U.S.A. 96(9): 5107-

5110.

Angiosperm Phylogeny Group 2016. An update of the Angiosperm Phylogeny Group classification

for the orders and families of flowering : APG IV. Bot. J. Linn. Soc. 181(1): 1-20.

Azani, N., Babineau, M., Bailey, C.D., Banks, H., Barbosa, A.R., Pinto, R.B., et al. 2017. A new

subfamily classification of the Leguminosae based on a taxonomically comprehensive phylogeny: The Legume Phylogeny Working Group (LPWG). Taxon, 66(1): 44-77.

Babineau, M., Gagnon, E., and Bruneau, A. 2013. Phylogenetic utility of 19 low copy nuclear genes in

closely related genera and species of caesalpinioid legumes. S. Afr. J. Bot. 89: 94-105.

Beaulieu, J. M., O'Meara, B. C., Crane, P., and Donoghue, M. J. 2015. Heterogeneous rates of

molecular evolution and diversification could explain the Triassic age estimate for

angiosperms. Syst. Biol. 64(5): 869-878.

Bell, C. D. 2015. Between a rock and a hard place: Applications of the “molecular clock” in systematic

biology. Syst. Bot. 40(1): 6–13.

Bello, M. A. 2008. Systematics and Floral Evolution of the Order Fabales. University of Reading, UK.

Bello, M. A., Bruneau, A., Forest, F., and Hawkins, J. A. 2009. Elusive relationships within order

Fabales: phylogenetic analyses using matK and rbcL sequence data. Syst. Bot. 34(1): 102–114.

doi: 10.1600/036364409787602348.

Bello, M. A., Rudall, P. J., and Hawkins, J. A. 2012. Combined phylogenetic analyses reveal

interfamilial relationships and patterns of floral evolution in the eudicot order Fabales.

Cladistics, 28: 393-421.


Bergsten, J. 2005. A review of long-branch attraction. Cladistics, 21(2): 163–193. doi:

10.1111/j.1096-0031.2005.00059.x

Bewick, A. J., Chain, F. J. J., Heled, J., and Evans, B. J. 2012. The Pipid Root. Syst. Biol. 61(6): 913–926.

doi: 10.1093/sysbio/sys039

Borowiec, M. L., Rabeling, C., Brady, S. G., Fisher, B. L., Schultz, T. R., and Ward, P. S. 2019.

Compositional heterogeneity and outgroup choice influence the internal phylogeny of the ants.

Mol. Phylogenet. Evol. 134: 111-121.

Boykin, L. M., Kubatko, L. S., and Lowrey, T. K. 2010. Comparison of methods for rooting

phylogenetic trees: A case study using Orcuttieae (Poaceae: Chloridoideae). Mol. Phylogenet.

Evol. 54(3): 687–700. doi: 10.1016/j.ympev.2009.11.016

Braby, M. F., Trueman, J. W., and Eastwood, R. 2005. When and where did troidine butterflies

(: Papilionidae) evolve? Phylogenetic and biogeographic evidence suggests an

origin in remnant Gondwana in the Late Cretaceous. Invertebr. Syst. 19: 113-143.

Brinkmann, H., Van Der Giezen, M., Zhou, Y., De Raucourt, G. P., and Philippe, H. 2005. An empirical

assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics. Syst. Biol.

54(5): 743–757. doi: 10.1080/10635150500234609.

Bruneau, A., Mercure, M., Lewis, G. P., and Herendeen, P. S. 2008. Phylogenetic patterns and

diversification in the caesalpinioid legumes. Botany, 86: 697-718.

Burnham, R. 1995. A new species of winged fruit from the Miocene of Ecuador: Tipuana ecuatoriana

(Leguminosae). Am. J. Bot. 82: 1599-1607.

Caccavari, M. and Barreda, V. 2000. A new calymmate mimosoid polyad from the Miocene of

Argentina. Rev. Palaeobot. Palynol. 109: 197-203.

Calvignac-Spencer, S., Schulze, J. M., Zickmann, F., and Renard, B. Y. 2014. Clock rooting further

demonstrates that Guinea 2014 EBOV is a member of the Zaïre lineage. PLOS Currents


Outbreaks, 2014 Jun 16: 6. doi:

10.1371/currents.outbreaks.c0e035c86d721668a6ad7353f7f6fe86.

Calvillo-Canadell, L. and Cevallos-Ferriz, S. R. S. 2005. Diverse assemblage of Eocene and

Oligocene Leguminosae from Mexico. Int. J. Plant Sci. 166:671-692.

Cardoso, D., De Queiroz, L. P., Pennington, R. T., De Lima, H. C., Fonty, É., Wojciechowski, M. F., and

Lavin, M. 2012a. Revisiting the phylogeny of papilionoid legumes: New insights from

comprehensively sampled early-branching lineages. Am. J. Bot. 99: 1991-2013.

Cardoso, D., De Lima, H. C., Rodrigues, R. S., De Queiroz, L. P., Pennington, R. T., and Lavin, M. 2012b.

The Bowdichia clade of Genistoid legumes: Phylogenetic analysis of combined molecular and

morphological data and a recircumscription of Diplotropis. Taxon, 61(5): 1074-1087.

Cardoso, D., Pennington, R. T., De Queiroz, L. P., Boatwright, J. S., Van Wyk, B. E., Wojciechowski, M. F. et al. 2013. Reconstructing the deep-branching relationships of the papilionoid legumes. S. Afr. J. Bot. 89: 58-75.

Crane, P., Manchester, S., and Dilcher, D. 1990. Fossil leaves and well-preserved reproductive

structures from the Fort Union Formation () near Almont. North Dakota, USA.

Fieldiana Geol. 20: 1-63.

Crayn, D. M., Fernando, E. S., Gadek, P. A., and Quinn, C. J. 1995. A reassessment of the familial

affinity of the Mexican genus Recchia Mocino, Sessé ex DC. Brittonia, 47: 397-402.

Crepet, W. L. and Dilcher, D. L. 1977. Investigations of angiosperms from Eocene of North America:

a mimosoid inflorescence. Am. J. Bot. 64: 714-725.

Crepet, W. L. and Taylor, D. W. 1985. The diversification of the Leguminosae: first fossil evidence of

the and Papilionoideae. Science, 228: 1087-1089.

Crepet, W. L. and Taylor, D. W. 1986. Primitive mimosoid flowers from the Paleocene-Eocene and

their systematic and evolutionary implications. Am. J. Bot. 73: 548-563.


Crepet, W. L. and Herendeen, P. S. 1992. Papilionoid flowers from the early Eocene of southeastern

North America. In Advances in Legume Systematics, Part 4. Edited by P. S. Herendeen and D.

L. Dilcher. Royal Botanic Gardens, Kew, UK. pp. 43-5

Criscuolo, A. and Gribaldo, S. 2010. BMGE (Block Mapping and Gathering with Entropy): A new

software for selection of phylogenetic informative regions from multiple sequence alignments.

BMC Evol. Biol. 10(1). doi: 10.1186/1471-2148-10-210

Crisp, M. D., Cook, L. G., and Steane, D. A. 2004. Radiation of the Australian flora: what can

comparisons of molecular phylogenies across multiple taxa tell us about the evolution of

diversity in present-day communities? Philos. Trans. R. Soc. Lond. B Biol. Sci. No. 359:1551–

1571.

Crisp, M. D. and Cook, L. G. 2009. Explosive radiation or cryptic mass extinction? Interpreting

signatures in molecular phylogenies. Evolution, 63: 2257-2265.

Dabert, M., Witalinski, W., Kazmierski, A., Olszanowski, Z., and Dabert, J. 2010. Molecular phylogeny

of acariform mites (Acari, Arachnida): Strong conflict between phylogenetic signal and long-

branch attraction artifacts. Mol. Phylogenet. Evol. 56(1): 222–241. doi:

10.1016/j.ympev.2009.12.020

Darriba, D., Taboada, G. L., Doallo, R., and Posada, D. 2012. jModelTest 2: more models, new

heuristics and parallel computing. Nat. Methods, 9: 772-772.

De Franceschi, D. and De Ploëg, G. 2003. Origine de l’ambre des faciès sparnaciens (Éocène inférieur)

du Bassin de Paris: le bois de l’arbre producteur. Geodiversitas, 25(4): 633-647.

Djernæs, M., Klass, K. D., Picker, M. D., and Damgaard, J. 2012. Phylogeny of cockroaches (Insecta,

Dictyoptera, Blattodea), with placement of aberrant taxa and exploration of out-group

sampling. Syst. Entomol. 37(1): 65–83. doi: 10.1111/j.1365-3113.2011.00598.x.

Donoghue, M. J. and Cantino, P. D. 1984. The logic and limitations of the outgroup substitution

approach to cladistic analysis. Syst. Bot. 9(2): 192-202.


Doyle, J. J., Chappill, J. A., Bailey, C. D., and Kajita, T. 2000. Towards a comprehensive phylogeny of

legumes: evidence from rbcL sequences and non-molecular data. In Advances in Legume

Systematics, Part 9. Edited by P. S. Herendeen and A. Bruneau. Royal Botanic Gardens, Kew,

UK. pp. 1-20.

Drew, B. T., Ruhfel, B. R., Smith, S. A., Moore, M. J., Briggs, B. G., Gitzendanner, M. A. et al. 2014.

Another look at the root of the angiosperms reveals a familiar tale. Syst. Biol. 63: 368-82.

Drummond, A. J., Ho, S. Y., Phillips, M. J., and Rambaut, A. 2006. Relaxed phylogenetics and dating

with confidence. PLoS Biology, 4: e88.

Drummond, A. J. and Rambaut, A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees.

BMC Evol. Biol. 7(1): 214.

Drummond, A. and Rambaut, A. 2007. LogCombiner v1.4.8. Available online at

http://evolve.zoo.ox.ac.uk/beast/.

Drummond, A. J., Suchard, M. A., Xie, D., and Rambaut, A. 2012. Bayesian phylogenetics with BEAUti

and the BEAST 1.7. Mol. Biol. Evol. 29: 1969-1973.

Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively

misleading. Syst. Biol. 27: 401-410.

Fishbein, M., Hibsch-Jetter, C., Soltis, D. E., and Hufford, L. 2001. Phylogeny of Saxifragales

(angiosperms, eudicots): analysis of a rapid, ancient radiation. Syst. Biol. 50: 817-847.

Forest, F. 2004. Systematics of Fabales and Polygalaceae, with emphasis on Muraltia and the origin

of the Cape flora. University of Reading, UK.

Forest, F., Chase, M. W., Persson, C., Crane, P. R., and Hawkins, J. A. 2007. The role of biotic and

abiotic factors in evolution of ant dispersal in the milkwort family (Polygalaceae). Evolution,

61: 1675-1694.

Forest, F. 2009. Calibrating the tree of life: Fossils, molecules and evolutionary timescales. Ann. Bot.

104(5): 789–794. doi: 10.1093/aob/mcp192.


Foster, P.G. and Hickey, D.A. 1999. Compositional bias may affect both DNA-based and protein-

based phylogenetic reconstructions. J. Mol. Evol. 48(3): 284-290.

Graham, S. W., Olmstead, R. G., and Barrett, S. C. 2002. Rooting phylogenetic trees with distant

outgroups: a case study from the commelinoid monocots. Mol. Biol. Evol. 19: 1769-1781.

Grant, T. 2019. Outgroup sampling in phylogenetics: Severity of test and successive outgroup

expansion. J. Zool. Syst. Evol. Res. 57: 748-763.

Gribaldo, S. and Philippe, H. 2002. Ancient Phylogenetic Relationships. Theor. Popul. Biol. 61(4):

391–408. doi: 10.1006/tpbi.2002.1593.

Guindon, S. and Gascuel, O. 2003. A simple, fast, and accurate algorithm to estimate large

phylogenies by maximum likelihood. Syst. Biol. 52(5): 696-704.

Guinet, P., El Sabrouty, N., Soliman, H. A., and Omran, A. M. 1987. Study of pollen characters of the Leguminosae-Mimosoideae from the Tertiary sediments of the northwest of Egypt

(Translated from French). Mémoires Travaux E. P. H. E., Institute de Montpellier, 17:159-171.

Harrison, J. C. and Langdale, J. A. 2006. A step by step guide to phylogeny reconstruction. Plant J. 45:

561-572.

Heads, M. 2010. Old taxa on young islands: a critique of the use of island age to date island-endemic

clades and calibrate phylogenies. Syst. Biol. 60(2): 204-218.

Heath, T. A., Hedtke, S. M., and Hillis, D. M. 2008. Taxon sampling and the accuracy of phylogenetic

analyses. J. Syst. Evol. 46: 239–257. doi: 10.3724/SP.J.1002.2008.08016

Herendeen, P. S. and Dilcher, D. L. 1990. Reproductive and vegetative evidence for the occurrence of

Crudia (Leguminosae, Caesalpinioideae) in the Eocene of southeastern North America. Bot.

Gaz. 151(3): 402-413.

Herendeen, P. S. and Dilcher, D. L. 1991. Caesalpinia subgenus Mezoneuron (Leguminosae,

Caesalpinioideae) from the Tertiary of North America. Am. J. Bot. 78:1-12.

Herendeen, P. S. 1992. The fossil history of the Leguminosae from the Eocene of southeastern North

America. In Advances in Legume Systematics, Part 4. Edited by P. S. Herendeen, D. L. Dilcher.

Royal Botanic Gardens, Kew, UK. pp. 85-160.

Herendeen, P. S., Crepet, W. L., and Dilcher, D. L. 1992. The fossil history of the Leguminosae:

phylogenetic and biogeographical implications. In Advances in Legume Systematics, Part 4.

Edited by P. S. Herendeen, D. L. Dilcher. Royal Botanic Gardens, Kew, UK. pp. 303-316.

Herendeen, P. S. 1995. Phylogenetic relationships of the tribe Swartzieae. In Advances in Legume

Systematics, Part 7. Edited by M. D. Crisp and J. J. Doyle. Royal Botanic Gardens, Kew, UK. pp.

123-132.

Herendeen, P. S. and Jacobs, B. F. 2000. Fossil legumes from the middle Eocene (46.0 Ma) Mahenge

flora of Singida, Tanzania. Am. J. Bot. 87(9): 1358-1366.

Herendeen, P. S. and Wing, S. 2001. Papilionoid legume fruits and leaves from the Paleocene of northwestern Wyoming. Botany 2001 Abstracts, published by Botanical Society of America.

Available from http://2001.botanyconference.org/section7/abstracts/26.shtml.

Ho, S. Y. W. and Duchêne, S. 2014. Molecular-clock methods for estimating evolutionary rates and

timescales. Mol. Ecol. 23: 5947–5965.

Holland, B. R., Penny, D., and Hendy, M. D. 2003. Outgroup misplacement and phylogenetic

inaccuracy under a molecular clock - A simulation study. Syst. Biol. 52(2): 229–238. doi:

10.1080/10635150390192771.

Huang, C-H., Sun, R., Hu, Y., Zeng, L., Zhang, N., Cai, L. et al. 2015. Resolution of Brassicaceae

phylogeny using nuclear genes uncovers nested radiations and supports convergent

morphological evolution. Mol. Biol. Evol. 33(2): 394–412.

Hueber, F. M. and Langenheim, J. 1986. Dominican amber tree had African ancestors. Geotimes, 31:

8-10.

Huelsenbeck, J. P., Bollback, J. P., and Levine, A. M. 2002. Inferring the root of a phylogenetic tree.

Syst. Biol. 51: 32-43.

Hughes, C.E., Bailey, C.D., and Harris, S.A. 2002. Divergent and reticulate species relationships in

Leucaena (Fabaceae) inferred from multiple data sources: insights into polyploid origins and

nrDNA polymorphism. Am. J. Bot. 89(7): 1057-1073.

Ireland, H., Pennington, R. T., and Preston, J. 2000. Molecular systematics of the Swartzieae. In

Advances in Legume Systematics, Part 9. Edited by P. S. Herendeen and A. Bruneau. Royal

Botanic Gardens, Kew, UK. pp. 217–231.

Jacobs, B. F. and Herendeen, P. S. 2004. Eocene dry climate and woodland vegetation in tropical

Africa reconstructed from fossil leaves from northern Tanzania. Palaeogeogr. Palaeoclimatol.

Palaeoecol. 213(1): 115-123.

Jamil, I., Qamarunnisa, S., and Azhar, A. 2019. Effect of outgroup on phylogeny reconstruction: a

case study of family Solanaceae. Pure Appl. Biol. 8(4): 2213-2227.

Jia, H. and Manchester, S. R. 2014. Fossil Leaves and Fruits of Cercis L. (Leguminosae) from the Eocene of Western North America. Int. J. Plant Sci. 175: 601-612.

Jian, S., Soltis, P. S., Gitzendanner, M. A., Moore, M. J., Li, R., Hendry, T. A. et al. 2008. Resolving an

ancient, rapid radiation in Saxifragales. Syst. Biol. 57: 38-57.

Kajita, T., Ohashi, H., Tateishi, Y., Bailey, C. D., and Doyle, J. J. 2001. rbcL and legume phylogeny, with

particular reference to Phaseoleae, Millettieae, and allies. Syst. Bot. 26: 515-536.

Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S. et al. 2012. Geneious

Basic: an integrated and extendable desktop software platform for the organization and

analysis of sequence data. Bioinformatics, 28: 1647-1649.

Kirchberger, P. C., Sefc, K. M., Sturmbauer, C., and Koblmüller, S. 2014. Outgroup effects on root

position and tree topology in the AFLP phylogeny of a rapidly radiating lineage of cichlid fish.

Mol. Phylogenet. Evol. 70(1): 57–62. doi: 10.1016/j.ympev.2013.09.005.

Kodandaramaiah, U., Peña, C., Braby, M. F., Grund, R., Müller, C. J., Nylin, S. et al. 2010.

Phylogenetics of Coenonymphina (Nymphalidae: Satyrinae) and the problem of rooting rapid

radiations. Mol. Phylogenet. Evol. 54(2): 386–394. doi: 10.1016/j.ympev.2009.08.012.

Koenen, E. J. M., De Vos, J. M., Atchison, G. W., Simon, M. F., Schrire, B. D., De Souza, E. R. et al.

2013. Exploring the tempo of species diversification in legumes. S. Afr. J. Bot. 89: 19-30.

Koenen, E. J., Ojeda, D. I., Steeves, R., Migliore, J., Bakker, F. T., Wieringa, J. J. et al. 2019. The Origin

and Early Evolution of the Legumes are a Complex Paleopolyploid Phylogenomic Tangle closely

associated with the Cretaceous-Paleogene (K-Pg) Boundary. BioRxiv, 577957.

Kostka, M., Uzlikova, M., Cepicka, I., and Flegr, J. 2008. SlowFaster, a user-friendly program for slow-

fast analysis and its application on phylogeny of Blastocystis. BMC Bioinformatics, 9: 4–9. doi:

10.1186/1471-2105-9-341.

Kruse, H. O. 1954. Some Eocene dicotyledonous woods from Eden Valley, Wyoming. The Ohio

Journal of Science, 54: 243-268.

Lavin, M., Wojciechowski, M. F., Richman, A., Rotella, J., Sanderson, M. J., and Matos, A. B. 2001. Identifying Tertiary radiations of Fabaceae in the Greater Antilles: alternatives to cladistic

vicariance analysis. Int. J. Plant Sci. 162: S53-S76.

Lavin, M., Wojciechowski, M. F., Gasson, P., Hughes, C., and Wheeler, E. 2003. Phylogeny of robinioid

legumes (Fabaceae) revisited: Coursetia and Gliricidia recircumscribed, and a

biogeographical appraisal of the Caribbean endemics. Syst. Bot. 28(2): 387-409.

Lavin, M., Herendeen, P. S., and Wojciechowski, M. F. 2005. Evolutionary rates analysis of

Leguminosae implicates a rapid diversification of lineages during the Tertiary. Syst. Biol. 54:

575-594.

Letunic, I. and Bork, P. 2016. Interactive tree of life (iTOL) v3: an online tool for the display and

annotation of phylogenetic and other trees. Nucleic Acids Res. 44(W1): W242-W245.

Li, C., Matthes-Rosana, K. A., Garcia, M., and Naylor, G. J. P. 2012. Phylogenetics of Chondrichthyes

and the problem of rooting phylogenies with distant outgroups. Mol. Phylogenet. Evol. 63(2):

365–373. doi: 10.1016/j.ympev.2012.01.013.

Li, M., Wunder, J., Bissoli, G., Scarponi, E., Gazzani, S., Barbaro, E. et al. 2008. Development of COS

genes as universally amplifiable markers for phylogenetic reconstructions of closely related

plant species. Cladistics, 24(5): 727-745.

Linder, H. P., Hardy, C. R., and Rutschmann, F. 2005. Taxon sampling effects in molecular clock

dating: an example from the African Restionaceae. Mol. Phylogenet. Evol. 35: 569–582.

Luo, A., Zhang, Y., Qiao, H., Shi, W., Murphy, R. W., and Zhu, C. 2010. Outgroup selection in tree

reconstruction: a case study of the family Halictidae (Hymenoptera: Apoidea). Acta Entomol.

Sin. 53(2): 192-201.

Lyons-Weiler, J., Hoelzer, G. A., and Tausch, R. J. 1998. Optimal outgroup analysis. Biol. J. Linn. Soc.

64(4): 493–511. doi: 10.1006/bijl.1998.0229.

Lyson, T.R., Miller, I.M., Bercovici, A.D., Weissenburger, K., Fuentes, A.J., Clyde, W.C., Hagadorn, J.W., Butrim, M.J., Johnson, K.R., Fleming, R.F., and Barclay, R.S. 2019. Exceptional continental

record of biotic recovery after the Cretaceous–Paleogene mass extinction. Science, 366 (6468):

977-983. doi: 10.1126/science.aay2268.

MacGinitie, H. D. 1953. Fossil plants of the Florissant beds, Colorado. Contributions to Paleontology

series. Carnegie Institute of Washington Publ. No. 599. Carnegie Institute of Washington,

Washington D.C. pp. 198.

Magallon, S., Crane, P. R., and Herendeen, P. S. 1999. Phylogenetic pattern, diversity, and

diversification of eudicots. Ann. Mo. Bot. Gard. 297-372.

Milinkovitch, M. C. and Lyons-Weiler, J. 1998. Finding optimal ingroup topologies and convexities

when the choice of outgroups is not obvious. Mol. Phylogenet. Evol. 9(3): 348–357. doi:

10.1006/mpev.1998.0503.

Miller, M. A., Pfeiffer, W., and Schwartz, T. 2010. Creating the CIPRES Science Gateway for inference

of large phylogenetic trees. 2010 Gateway Computing Environments Workshop (GCE), New

Orleans, LA, 2010. IEEE. pp. 1-8. doi: 10.1109/GCE.2010.5676129

Moore, M. J., Soltis, P. S., Bell, C. D., Burleigh, G., and Soltis, D. E. 2010. Phylogenetic analysis of 83

plastid genes further resolves the early diversification of eudicots. Proc. Natl. Acad. Sci.

U.S.A. 107(10): 4623-4628.

Moreira, D. and Philippe, H. 2000. Molecular phylogeny: pitfalls and progress. Int. Microbiol. 3: 9-16.

Murdock, A. G. 2008. Phylogeny of marattioid ferns (Marattiaceae): Inferring a root in the absence of

a closely related outgroup. Am. J. Bot. 95(5): 626–641. doi: 10.3732/ajb.2007308.

Nixon, K. C. and Carpenter, J. M. 1993. On outgroups. Cladistics, 9: 413-426.

Outlaw, D. C. and Ricklefs, R. E. 2011. Rerooting the evolutionary tree of malaria parasites. Proc.

Natl. Acad. Sci. U.S.A. 108(32): 13183–13187. doi: 10.1073/pnas.1109153108.

Pan, A. D., Currano, E. D., Jacobs, B. F., Feseha, M., Tabor, N., and Herendeen, P. S. 2012. Fossil

Newtonia (Fabaceae: Mimoseae) seeds from the early Miocene (22–21 Ma) Mush Valley in

Ethiopia. Int. J. Plant Sci. 173: 290-296.

Pennington, R. T., Lavin, M., Ireland, H., Klitgaard, B., Preston, J., and Hu, J.-M. 2001. Phylogenetic

relationships of basal papilionoid legumes based upon sequences of the chloroplast trnL

intron. Syst. Bot. 26: 537-556.

Pereira, A.G., Sterli, J., Moreira, F.R., and Schrago, C.G. 2017. Multilocus phylogeny and statistical

biogeography clarify the evolutionary history of major lineages of turtles. Mol. Phylogenet.

Evol. 113: 59-66.

Persson, C. 2001. Phylogenetic relationships in Polygalaceae based on plastid DNA sequences from

the trnL-F region. Taxon, 50(3): 763-779.

Philippe, H., Brinkmann, H., Lavrov, D. V., Littlewood, D. T. J., Manuel, M., Wörheide, G. et al. 2011.

Resolving difficult phylogenetic questions: Why more sequences are not enough. PLoS Biology,

9(3). doi: 10.1371/journal.pbio.1000602.

Philippe, H. and Laurent, J. 1998. How good are deep phylogenetic trees? Curr. Opin. Genet. Dev.

8(6): 616-623.

Pigg, K., Wojciechowski, M., and Devore, M. 2004. Samaras from the Late Paleocene Almont and

Beicegel Creek floras of North Dakota, USA, with potential affinities to Securidaca

(Polygalaceae). Abstracts of Botany. Botany 2004 meeting, Salt Lake City, UT. Available at:

http://2004.botanyconference.org/engine/search/index.php?func=detail&aid=594.

Pigg, K. B., DeVore, M. L., and Wojciechowski, M. F. 2008. Paleosecuridaca curtisii gen. et sp. nov.,

Securidaca-like samaras (Polygalaceae) from the late Paleocene of North Dakota and their

significance to the divergence of families within the Fabales. Int. J. Plant Sci. 169(9): 1304-1313.

Pirie, M. D., Chatrou, L. W., Erkens, R. H., Maas, J. W., Van Der Niet, T., Mols, J. B. et al. 2005.

Phylogeny reconstruction and molecular dating in four Neotropical genera of Annonaceae:

the effect of taxon sampling in age estimations. Regnum Vegetabile, 143: 149-174.

Pisani, D., Feuda, R., Peterson, K. J., and Smith, A. B. 2012. Resolving phylogenetic signal from noise

when divergence is rapid: A new look at the old problem of echinoderm class relationships.

Mol. Phylogenet. Evol. 62(1): 27–34. doi: 10.1016/j.ympev.2011.08.028.

Pisani, D., Pett, W., Dohrmann, M., Feuda, R., Rota-Stabelli, O., Philippe, H. et al. 2015. Genomic data

do not support comb jellies as the sister group to all other animals. Proc. Natl. Acad. Sci.

U.S.A. 112: 15402-15407.

Poinar, G.O. and Poinar, R. 1999. The amber forest: a reconstruction of a vanished world. Princeton

University Press, Princeton, N. J.

Posada, D. 2008. jModelTest: phylogenetic model averaging. Mol. Biol. Evol. 25: 1253-1256.

Puslednik, L. and Serb, J. M. 2008. Molecular phylogenetics of the Pectinidae (Mollusca: Bivalvia) and

effect of increased taxon sampling and outgroup selection on tree topology. Mol. Phylogenet.

Evol. 48(3): 1178–1188. doi: 10.1016/j.ympev.2008.05.006.

Qiu, Y. L., Lee, J., Whitlock, B., Bernasconi-Quadroni, F., and Dombrovska, O. 2001. Was the ANITA

rooting of the angiosperm phylogeny affected by long-branch attraction? Mol. Biol. Evol. 18(9):

1745–1753. doi: 10.1093/oxfordjournals.molbev.a003962.

Raman, G. and Park, S. 2016. The complete chloroplast genome sequence of Ampelopsis: gene

organization, comparative analysis, and phylogenetic relationships to other angiosperms.

Front. Plant Sci. 7: 341. doi: 10.3389/fpls.2016.00341.

Rambaut, A. and Grassly, N. C. 1997. Seq-Gen: an application for the Monte Carlo simulation of DNA

sequence evolution along phylogenetic trees. CABIOS. 13: 235-238.

Rambaut, A. and Drummond, A. 2007. TreeAnnotator. Available from http://beast.bio.ed.ac.uk/TreeAnnotator.

Rambaut, A., Suchard, M., Xie, D., and Drummond, A. 2014. Tracer v1.6. Available at

http://beast.bio.ed.ac.uk/Tracer.

Reddy, S., Kimball, R.T., Pandey, A., Hosner, P.A., Braun, M.J., Hackett, S.J. et al. 2017. Why do

phylogenomic data sets yield conflicting trees? Data type influences the avian tree of life more

than taxon sampling. Syst. Biol. 66(5): 857-879.

Renner, S. S., Grimm, G. W., Schneeweiss, G. M., Stuessy, T. F., and Ricklefs, R. E. 2008. Rooting and

dating maples (Acer) with an uncorrelated-rates molecular clock: Implications for North

American/Asian disjunctions. Syst. Biol. 57(5): 795–808. doi: 10.1080/10635150802422282.

Rich, S. M. and Xu, G. 2011. Resolving the phylogeny of malaria parasites. Proc. Natl. Acad. Sci. U.S.A.

108(32): 12973–12974. doi: 10.1073/pnas.1110141108.

Roberts, T. E., Sargis, E. J., and Olson, L. E. 2009. Networks, trees, and treeshrews: Assessing support

and identifying conflict with multiple loci and a problematic root. Syst. Biol. 58(2): 257–270.

doi: 10.1093/sysbio/syp025.

Rodríguez-Ezpeleta, N., Brinkmann, H., Roure, B., Lartillot, N., Lang, B. F., and Philippe, H. 2007.

Detecting and overcoming systematic errors in genome-scale phylogenies. Syst. Biol. 56(3):

389–399. doi: 10.1080/10635150701397643.

Rodríguez, A., Burgon, J.D., Lyra, M., Irisarri, I., Baurain, D., Blaustein, L. et al. 2017. Inferring the

shallow phylogeny of true salamanders (Salamandra) by multiple phylogenomic approaches.

Mol. Phylogenet. Evol. 115: 16-26.

Rokas, A. and Carroll, S. B. 2006. Bushes in the tree of life. PLoS Biology, 4(11): 1899–1904. doi:

10.1371/journal.pbio.0040352.

Rota-Stabelli, O. and Telford, M. J. 2008. A multi criterion approach for the selection of optimal

outgroups in phylogeny: Recovering some support for Mandibulata over Myriochelata using mitogenomics. Mol. Phylogenet. Evol. 48(1): 103–111. doi: 10.1016/j.ympev.2008.03.033.

Rothfels, C. J., Larsson, A., Kuo, L. Y., Korall, P., Chiou, W. L., and Pryer, K. M. 2012. Overcoming deep

roots, fast rates, and short internodes to resolve the ancient rapid radiation of eupolypod II

ferns. Syst. Biol. 61(3): 490–509. doi: 10.1093/sysbio/sys001.

Rutschmann, F. 2006. Molecular dating of phylogenetic trees: a brief review of current methods that

estimate divergence times. Divers. Distrib. 12: 35–48.

Salichos, L. and Rokas, A. 2013. Inferring ancient divergences requires genes with strong

phylogenetic signals. Nature, 497(7449): 327–331. doi: 10.1038/nature12130.

Savolainen, V., Chase, M. W., Hoot, S. B., Morton, C. M., Soltis, D. E., Bayer, C. et al. 2000.

Phylogenetics of flowering plants based on combined analysis of plastid atpB and rbcL gene

sequences. Syst. Biol. 49: 306-362.

Schäferhoff, B., Fleischmann, A., Fischer, E., Albach, D.C., Borsch, T., Heubl, G., and Müller, K.F. 2010.

Towards resolving Lamiales relationships: insights from rapidly evolving chloroplast

sequences. BMC Evol. Biol. 10(1): 352.

Schuettpelz, E. and Hoot, S. B. 2006. Inferring the root of Isoëtes: exploring alternatives in the

absence of an acceptable outgroup. Syst. Bot. 31: 258-270.

Shaffer, H.B., McCartney-Melstad, E., Near, T.J., Mount, G.G., and Spinks, P.Q. 2017. Phylogenomic

analyses of 539 highly informative loci dates a fully resolved time tree for the major clades of

living turtles (Testudines). Mol. Phylogenet. Evol. 115: 7-15.

Shavit, L., Penny, D., Hendy, M. D., and Holland, B. R. 2007. The problem of rooting rapid radiations.

Mol. Biol. Evol. 24(11): 2400–2411. doi: 10.1093/molbev/msm178.

Simon, M. F., Grether, R., De Queiroz, L. P., Skema, C., Pennington, R. T., and Hughes, C. E. 2009.

Recent assembly of the Cerrado, a neotropical plant diversity hotspot, by in situ evolution of

adaptations to fire. Proc. Natl. Acad. Sci. U.S.A. 106(48): 20359–20364. doi: 10.1073/pnas.0903410106.

Smith, A. B. 1994. Rooting molecular trees, problems and strategies. Biol. J. Linn. Soc. 51: 279–292.

Smith, S. A., Beaulieu, J. M., Stamatakis, A., and Donoghue, M. J. 2011. Understanding angiosperm

diversification using small and large phylogenetic trees. Am. J. Bot. 98(3): 404–414. doi:

10.3732/ajb.1000481.

Soltis, D. E., Soltis, P. S., Chase, M. W., Mort, M. E., Albach, D. C., Zanis, M. et al. 2000. Angiosperm

phylogeny inferred from 18S rDNA, rbcL, and atpB sequences. Bot. J. Linn. Soc. 133: 381-461.

Soltis, D. E., Smith, S. A., Cellinese, N., Wurdack, K. J., Tank, D. C., Brockington, S. F. et al. 2011.

Angiosperm phylogeny: 17 genes, 640 taxa. Am. J. Bot. 98(4): 704-730. doi:

10.3732/ajb.1000404.

Sourdis, J. and Krimbas, C. 1987. Accuracy of phylogenetic trees estimated from DNA sequence data.

Mol. Biol. Evol. 4: 159-166.

Spinks, P.Q., Thomson, R.C., Lovely, G.A., and Shaffer, H.B. 2009. Assessing what is needed to resolve

a molecular phylogeny: simulations and empirical data from emydid turtles. BMC Evol. Biol.

9(1): 56.

Stamatakis, A., Hoover, P., and Rougemont, J. 2008. A rapid bootstrap algorithm for the RAxML web

servers. Syst. Biol. 57: 758-771.

Sterli, J. 2010. Phylogenetic relationships among extinct and extant turtles: The position of

pleurodira and the effects of the fossils on rooting crown-group turtles. Contrib. Zool. 79(3):

93–106.

Sun, M., Soltis, D. E., Soltis, P. S., Zhu, X., Burleigh, J. G., and Chen, Z. 2015. Deep phylogenetic

incongruence in the angiosperm clade Rosidae. Mol. Phylogenet. Evol. 83: 156-166.

Sun, M., Naeem, R., Su, J. X., Cao, Z. Y., Burleigh, J. G., Soltis, P. S. et al. 2016. Phylogeny of the

Rosidae: A dense taxon sampling analysis. J. Syst. Evol. 54(4): 363-391.

Tarrío, R., Rodríguez-Trelles, F., and Ayala, F. J. 2000. Tree rooting with outgroups when they differ in

their nucleotide composition from the ingroup: The Drosophila saltans and willistoni groups, a

case study. Mol. Phylogenet. Evol. 16(3): 344–349. doi: 10.1006/mpev.2000.0813.

Thomas, J. A., Trueman, J. W. H., Rambaut, A., and Welch, J. J. 2013. Relaxed phylogenetics and the

palaeoptera problem: Resolving deep ancestral splits in the insect phylogeny. Syst. Biol. 62(2):

285–297. doi: 10.1093/sysbio/sys093.

Wägele, J. 1999. Major sources of errors in phylogenetic systematics. Zoologischer Anzeiger, 238:

329-337.

Wägele, J. W. and Mayer, C. 2007. Visualizing differences in phylogenetic information content of

alignments and distinction of three classes of long-branch effects. BMC Evol. Biol. 7: 1–24. doi:

10.1186/1471-2148-7-147.

Ware, J. L., Litman, J., Klass, K.-D., and Spearman, L. A. 2008. Relationships among the major lineages

of Dictyoptera: the effect of outgroup selection on dictyopteran tree topology. Syst.

Entomol. 33: 429-450.

Wei, R., Yan, Y. H., Harris, A. J., Kang, J. S., Shen, H., Xiang, Q. P. et al. 2017. Plastid phylogenomics

resolve deep relationships among eupolypod II ferns with rapid radiation and rate

heterogeneity. Genome Biol. Evol. 9(6):1646-1657.

Welch, J. J. and Bromham, L. 2005. Molecular dating when rates vary. Trends Ecol. Evol. 20(6): 320-

327.

Wheeler, W. C. 1990. Nucleic acid sequence phylogeny and random outgroups. Cladistics, 6: 363-

367.

Whitfield, J. B. and Kjer, K. M. 2008. Ancient Rapid Radiations of Insects: Challenges for Phylogenetic Analysis. Annu. Rev. Entomol. 53(1): 449–472. doi: 10.1007/s00103-013-1761-y.

Whitfield, J. B. and Lockhart, P. J. 2007. Deciphering ancient rapid radiations. Trends Ecol. Evol.

22(5): 258–265. doi: 10.1016/j.tree.2007.01.012.

Wing, S. L., Herrera, F., Jaramillo, C. A., Gómez-Navarro, C., Wilf, P., and Labandeira, C. C. 2009. Late

Paleocene fossils from the Cerrejón Formation, Colombia, are the earliest record of

Neotropical rainforest. Proc. Natl. Acad. Sci. U.S.A. 106(44): 18627-18632.

Wojciechowski, M. F., Lavin, M., and Sanderson, M. J. 2004. A phylogeny of legumes (Leguminosae)

based on analysis of the plastid matK gene resolves many well-supported subclades within

the family. Am. J. Bot. 91: 1846-1862.

Wojciechowski, M. F. 2013. Towards a new classification of Leguminosae: naming clades using non-

Linnaean phylogenetic nomenclature. S. Afr. J. Bot. 89: 85-93.

Wu, C. S., Chaw, S. M., and Huang, Y. Y. 2013. Chloroplast phylogenomics indicates that Ginkgo

biloba is sister to cycads. Genome Biol. Evol. 5(1): 243–254. doi: 10.1093/gbe/evt001.

Zeng, L., Zhang, Q., Sun, R., Kong, H., Zhang, N., and Ma, H. 2014. Resolution of deep angiosperm

phylogeny using conserved nuclear genes and estimates of early divergence times. Nat.

Commun. 5: 1–12. doi: 10.1038/ncomms5956.

Zeng, L., Zhang, N., Zhang, Q., Endress, P.K., Huang, J., and Ma, H. 2017. Resolution of deep eudicot

phylogeny and their temporal diversification using nuclear genes from transcriptomic and

genomic datasets. New Phytol. 214(3): 1338-1354.

Zhang, S.D., Jin, J.J., Chen, S.Y., Chase, M.W., Soltis, D.E., Li, H.T. et al. 2017. Diversification of

Rosaceae since the Late Cretaceous based on plastid phylogenomics. New Phytol. 214(3): 1355-

1367.

Zhong, B., Xi, Z., Goremykin, V. V., Fong, R., Mclenachan, P. A., Novis, P. M., Davis, C. C. et al. 2013.

Streptophyte algae and the origin of land plants revisited using heterogeneous models with three new algal chloroplast genomes. Mol. Biol. Evol. 31(1): 177-183. doi: 10.1093/molbev/mst200.

Zhi-Chen, S., Wei-Ming, W., and Fei, H. 2004. Fossil pollen records of extant angiosperms in China.

Bot. Rev. 70: 425-458.

Figure 1. Summary of the phylogenetic tree from ML analysis with our primary dataset showing Fabales and its closest outgroup taxa. The Fabales families and the Leguminosae subfamilies (Azani et al. 2017) are indicated.

Figure 2. Chronogram of Leguminosae with our primary dataset. Origin of Leguminosae, Polygalaceae, (Surianaceae+Quillajaceae) and Surianaceae are indicated with dots. The age of Leguminosae is 71.4 Ma, crown Polygalaceae 48.2 Ma, (Surianaceae+Quillajaceae) crown clade 59.6 Ma and crown Surianaceae 39.8 Ma. Ma: million years. Posterior probabilities for the key nodes, calibration points (from A to Y) and six Leguminosae subfamilies are indicated. Scale bar in million years.

Figure 1. Summary of the phylogenetic tree from ML analysis with our “primary dataset” showing Fabales and its closest outgroup taxa. The Fabales families and the Leguminosae subfamilies (LPWG, 2017) are colour coded.

Figure 2. Chronogram of Leguminosae with our “primary dataset”. Origin of Leguminosae (red), Polygalaceae (blue), (Surianaceae+Quillajaceae) (green) and Surianaceae (pink) are indicated with dots. The age of Leguminosae is 71.4 Ma, crown Polygalaceae 48.2 Ma, (Surianaceae+Quillajaceae) crown clade 59.6 Ma and crown Surianaceae 39.8 Ma. Ma: million years. Posterior probabilities for the key nodes, calibration points (in red, from A to Y) and six Leguminosae subfamilies are indicated. Scale bar in million years.
