<<

View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by RERO DOC Digital Library

Published in Molecular Phylogenetics and Evolution 31, issue 2, 711-729, 2004 1 which should be used for any reference to this work

Conflicting phylogenies of balsaminoid families and the polytomy in : combining data in a Bayesian framework

K. Geuten,a,* E. Smets,a P. Schols,a Y.-M. Yuan,b S. Janssens,a P. Kupfer,€ b and N. Pycka

a Laboratory of Systematics, Institute of Botany and Microbiology, K.U.Leuven, Kasteelpark Arenberg 31, B-3001 Leuven, Belgium b Institut de Botanique, Universite de Neucha^tel, Neucha^tel, Switzerland

Abstract

The balsaminoid Ericales, namely , , Tetrameristaceae, and Pellicieraceae have been confidently placed at the base of Ericales, but the relations among these families have been resolved differently in recent analyses. Sister to this basal group is a large polytomy comprising all other families of Ericales, which is associated with short internodes. Because there are more than 13 kb of sequences for a large sampling of representatives, a thorough examination of the available data with novel methods seemed in place. Because of its computational speed, Bayesian phylogenetics allows for the use of parameter-rich models that can accommodate differences in the evolutionary process between partitions in a simultaneous analysis. In addition, there are recently proposed Bayesian strategies of assessing incongruence between partitions. We have applied these methods to the current problems in Ericales phylogeny, taking into account reported pitfalls in Bayesian analysis such as model selection uncertainty. Based on our results we infer several, previously unresolved relationships in the order Ericales. In balsaminoid families, we find that the closest relatives of Balsaminaceae are Marcgraviaceae. In the Ericales polytomy, we find strong support for Pentaphylacaceae sensu APG II as the sister group of Maesaceae. In addition, receive a position as sister to and these families form a monophyletic group together with . At the base of this are and . The positions of and remain uncertain.

Keywords: Ericales; Balsaminaceae; Marcgraviaceae; Tetrameristaceae; Bayesian inference; Combining data; Akaike weights; Model selection; Hierarchical model

1. Introduction Many molecular phylogenetic investigations, often with a larger scope than Ericales, have already included The angiosperm order Ericales received its current Ericales representatives (e.g., Albach et al., 2001; An- circumscription in the APG classification (APG, 1998) derberg et al., 2002; Bremer et al., 2002; Morton et al., and now comprises 23 families and ca. 11,150 . It is 1996; Savolainen et al., 2000; Soltis et al., 1997, 2000). a heterogeneous grouping of former families of Dillenii- These consistently found strong support for the mono- dae, with two families of Rosidae (Balsaminaceae and phyly of the order and were able to find several sup- Roridulaceae) and one of Asteridae (Polemonia- ported groups of families: a balsaminoid, primuloid, ceae) (Cronquist, 1981; Takhtajan, 1997). Some members ternstroemioid (Pentaphylacaceae sensu APG II), and have economic value such as tea or kiwi and others ericoid group have strong support in addition to the are of horticultural importance, e.g., the relation Fouquieriaceae–, Styracaceae– of Balsaminaceae (Smets and Pyck, 2003). The position of Diapensiaceae, and Lecythidaceae–Sapotaceae. But the the order is at the base of the , where it is sister to relatives of Theaceae, Ebenaceae, and Symplocaceae the large euasterids (APG, 2003; Bremer et al., 2002). remain unknown. Apart from relatively closer interfamilial relation- * Corresponding author. Fax: +32-16-32-19-68. ships in Ericales, it has been proven very difficult to E-mail address: [email protected] (K. Geuten). resolve deeper-level relationships, even with many genes. 2

ÔUnusual short branchesÕ or the hypothesis that Ôseveral by a short internode problem and a Bayesian approach of the groups evolved rapidly and simultaneouslyÕ are therefore seemed in place. However, this increase in likely hypotheses for this difficulty (Anderberg et al., sensitivity can result in excessively high support for 2002, p. 686; Soltis et al., 2000, p. 418). Two recent wrong, short internodes (Alfaro et al., 2003). In addi- analyses, Anderberg et al. (2002) and Bremer et al. tion, it has been reported that Bayesian inference may be (2002), have both contributed a denser sampling and sensitive to even small model misspecification (Buckley additional genes to the problem, analyzing their data in et al., 2002; Huelsenbeck et al., 2002) and that BPP a parsimony framework. Anderberg et al. (2002) ana- values can be excessively liberal when concatenated gene lyzed mitochondrial atp1andmatR genes in addition to sequences are used (Suzuki et al., 2002). To obtain an chloroplast atpB, ndhF, and rbcL genes for a sampling overall view on the confidence we can attribute to dif- that represents all families, and in most cases several ferent , we assess model selection uncertainty, and representatives of each family. Their results show the perform several separate and combined analyses using balsaminoid clade basal in Ericales, followed by a different modeling strategies. Fouquieriaceae–Polemoniaceae clade that is sister to a large polytomy of the remaining families. Bremer et al. (2002) analyzed three chloroplast coding genes rbcL, 2. Material and methods ndhF, and matK, in combination with three faster evolving non-coding chloroplast regions which contain 2.1. Taxon sampling introns and spacers in the vicinity of trnT-F, trnV-atpE, and rps16. In accordance with the results of Anderberg We used a sampling of 16 terminal nodes of Ericales et al. (2002), they find the balsaminoid group basal in that allowed us to do computationally intensive ML Ericales but as an immediate sister to the large poly- bootstrap analyses for comparison with Bayesian anal- tomy. yses. The 16 terminal nodes were used to represent Although the conclusions of Anderberg et al. (2002) families or groups of families. Basal taxa were chosen and Bremer et al. (2002) are in almost perfect agreement, when possible, as they often retain a large proportion of the balsaminoid group has been resolved differently in plesiomorphic character states and were previously both analyses. This group consists of four families: identified with high support. We included pairs Balsaminaceae, Marcgraviaceae, Pellicieraceae, and of families that were previously shown to be related Tetrameristaceae; Pellicieraceae and Tetrameristaceae with high support in analyses with a broader sampling have been merged in the single family Tetrameristaceae (Styracaceae–Diapensiaceae, Actinidiaceae–Clethraceae, in APG (2003) and will here be referred to as such. The and Fouquieriaceae–Polemoniaceae). This allowed us to combination of mitochondrial and chloroplast se- test, to a certain extent, the accuracy of our recon- quences of Anderberg et al. (2002), resulted in a strongly structions. To test the resolving power of our analyses, supported basal position of Balsaminaceae. In contrast, we also included taxa of which the phylogenetic position the analysis of the chloroplast sequences by Bremer was still very uncertain. The GenBank accession num- et al. (2002) shows a strongly supported position of bers of the sequences we have used are in Appendix A. Marcgraviaceae at the base of the balsaminoid group. Impatiens or represent Balsaminaceae (Yuan Our preliminary data from ITS, suggested a basal po- et al., submitted), or represent sition of Balsaminaceae. Marcgraviaceae (Ward and Price, 2002), Pelliciera and In this study, we want to address the conflict in the Tetramerista represent Tetrameristaceae. or phylogenies of balsaminoid Ericales by using the likeli- represent Sarraceniaceae, Roridulaceae, and hood and Bayesian framework. In an effort to contrib- Actinidiaceae (Anderberg et al., 2002; Bremer et al., ute to resolving the polytomy in Ericales, we add 2002; Morton et al., 1996); represents Clethra- sequences of the fast evolving nuclear ITS region and ceae, , and (Anderberg et al., 2002; test Bayesian methods for combining all data. This Bremer et al., 2002); Maesa is the representative of study will be complemented by a study in which se- Maesaceae, Primulaceae, Myrsinaceae, and Theophr- quences of the ITS region are analyzed for a large astaceae (Kallersj€ o€ et al., 2000; Morton et al., 1996). sampling of Ericales. We use both Bayesian posterior Ebenaceae are represented by Lissocarpa and some probabilities (BPP) and bootstrap percentages of maxi- sequences (Berry et al., 2001), Fouquieriaceae mum parsimony (MP) and maximum likelihood (ML) by Fouquieria (Schultheis and Baldwin, 1999), Polemo- analyses, as they have been suggested as a potential niaceae by Polemonium and Cobea (Prather et al., 2000), upper and lower bound of node support (Douady et al., Pentaphylacaceae by Sladenia (Anderberg et al., 2002), 2003). Bayesian posterior probability values have a Lecythidaceae and Sapotaceae by and number of advantages. One is their greater sensitivity to Barringtonia (Anderberg et al., 2002; Morton et al., the signal in a dataset (Alfaro et al., 2003; Kauff and 1997), Theaceae by Schima (Prince and Parks, 2001), Lutzoni, 2002). Phylogeny of Ericales has been plagued Symplocacaceae by , Diapensiaceae by 3 and (Ronblom€ and Anderberg, 2002), and (termed matK, trnT-F, trnV-atpE, and rps16 intron). Styracaceae by (Fritsch et al., 2001). We used Coding and different non-coding segments were sepa- the family names proposed by APG (2003) and for the rated according to the definitions entered in GenBank. presentation of our phylogenetic results, we placed the The resulting coding dataset is assembled from four between balsaminoid Ericales and the Ericales partitions: atpB+E, matK, ndhF, and rbcL. The re- polytomy because the balsaminoid clade has been con- sulting non-coding chloroplast dataset consists of four fidently shown to be sister to the other Ericales in most introns and four spacers, assembled into three parti- recent analyses (Anderberg et al., 2002; Bremer et al., tions: tRNA-introns (trnK intron, trnL intron, and trnV 2002). intron), the rps16 intron, and spacer sequences (trnT-L spacer, trnL-F spacer, trnV-trnM spacer, and trnM-atpE 2.2. DNA extraction, amplification, and sequencing spacer). The mitochondrial coding dataset consists of atp1 and matR sequences obtained by Anderberg et al. DNA was extracted using a modified version of the (2002). hot CTAB protocol (Doyle and Doyle, 1987; Saghai- For each of the coding genes, an initial alignment Maroof et al., 1984). We used a Fastprep bead-mill was constructed using CLUSTALX, with default pa- (Qbiogene) with tungsten beads (Qiagen) to homogenize rameters for gap opening and extension (Thompson the dry tissue (herbarium material or silica gel dried). et al., 1997). These preliminary alignments were visu- The lysis buffer used was as described in Chase and Hills ally inspected in MacClade 4.05 for Mac OSX (1991) and the aqueous phase was extracted at least (Maddison and Maddison, 2002) using their reading twice with chloroform–isomylalcohol (24/1 v/v). After frame. No ambiguities were present in the resulting an isopropanol precipitation ()20 °C, overnight) and alignments. The separate matrices were trimmed and subsequent centrifugation, the pellet was washed at least concatenated in PAUPv4b10 for Mac (Swofford, 2002), two subsequent times and air-dried. Finally, the (DNA) which resulted in two combined coding matrices, one pellet was dissolved in 10 mM Tris–HCl (pH 8.5) and for each genome. stored at 4 °C. For the non-coding chloroplast datasets, several For the amplification of the ITS region, we used DNA segments include sequences that are transcribed primer ITS4 of White et al., 1990) and primer ITS7 (50- into transfer RNA. These segments did not show any CGAGAAGTCCACTGAACCTTATC-30). Apart from substitutions for the given sampling and were removed standard components, 10% DMSO (v/v) was included in from the data matrices. Except for the rps16 intron, the the reagent mixture. The temperature profile consisted initial alignments were also constructed using CLU- of 2 min initial denaturation at 94 °C and 30 cycles of STALX, with default parameters. Again, these align- 30 s denaturation at 94 °C, 30 s primer annealing at ments were visually inspected in MacClade 4.05, but a 55 °C and 30 s extension at 72 °C. Reactions were per- manual improvement of the alignment was necessary. formed on a GeneAmp PCR System 9700 (Applied The CLUSTALX alignment of the rps16 intron se- Biosystems). quences was not satisfactory and the alignment proce- For the sequencing reactions we used a BigDye Ter- dure was repeated with DIALIGN 2.2.1 (Morgenstern, minator v3.0/v1.1 sequencing kit (Applied Biosystems), 1999) on the University of Bielefeld web server, using according to the manufacturerÕs instructions. We in- default parameters. This strategy produced a much cluded 5% DMSO in the reaction mixture, as suggested better alignment, in our opinion. for templates with high GC content. Sequencing prod- For the construction of the nuclear ITS dataset with ucts were cleaned according to the isopropanol precip- 16 sequences, we used a DIALIGN 2.2.1 alignment of itation procedure described in the Automated 94 ITS sequences from Ericales. These sequences were Sequencing Chemistry Guide (Applied Biosystems). For in part retrieved from GenBank and in part sequenced capillary electrophoresis and fluorescent detection, an at the Laboratory of Plant Systematics (Accession ABI PRISM 310 automated sequencer (Applied Bio- Nos. AY452668–AY452672). The alignment and systems) was used. analyses of those sequences will be published sepa- rately. The alignment was manually improved for 2.3. Data matrices and alignment analysis and in this dataset, ambiguously aligned or strongly gapped regions were excluded from analysis in Accessions from the papers of Bremer et al. (2002) MacClade 4.05. Next, the sampling analyzed here was and Anderberg et al. (2002) were retrieved from Gen- retained. Bank in batch Entrez queries and compiled in different datasets. The six chloroplast markers used in Bremer 2.4. Phylogenetic analysis: parsimony criterion et al. (2002) consist of two coding markers (rbcL and ndhF) and four markers with a combination of coding Maximum parsimony analyses were conducted using characters and different types of non-coding characters PAUP 4.0b10 for Macintosh. Heuristic searches were 4 conducted with -bisection-and-reconnection (TBR) belief in a hypothesis. In contrast to the classical method branch swapping on 100 random addition replicates, of finding a single optimal likelihood estimate, sam- with 20 held at each step. Support for individual pling-based Bayesian inference focuses on estimating the clades was measured by non-parametric bootstrapping entire distribution of parameters. This density estima- (MP-BP), using 1000 pseudo-replicates. For each pseu- tion is based on a long run (or several parallel long runs) do-replicate, a heuristic search for the most parsimoni- of samples from the posterior density (Congdon, 2001; ous tree was performed by TBR branch swapping on 10 Gelman et al., 1995). random addition starting trees, with 5 trees held at each We conducted Bayesian analyses with MrBayes3b4 step. (Huelsenbeck and Ronquist, 2001), compiled for the above-mentioned Unix IBM server. MrBayes uses the 2.5. Phylogenetic analysis: likelihood methods Markov Chain Monte Carlo method to sample from posterior densities (Larget and Simon, 1999; Yang and Maximum likelihood analyses were conducted using Rannala, 1997). Four chains (one cold, three heated) PAUP 4.0b10 (Swofford, 2002) for Unix on an IBM were started from random trees; after burn-in, the chains Pseries630 Power4 node at the K.U.Leuven University produce an approximation of the posterior distribution. Service Centre for Informatics and Telematics. To obtain reasonable confidence that the chains reached We used Modeltest 3.06 (Posada and Crandall, 1998) stationarity, the analyses were performed twice: once to select the best fitting model under the Akaike Infor- with 5 106 generations and a second time with 2 106 mation Criterion (AIC). Model parameters were esti- generations. Bayesian posterior probabilities (BPP) were mated using the successive approximation strategy estimated as the proportion of trees after burn-in (10,000 suggested by Swofford et al. (1996). Initial model pa- trees) and are represented as percentages here. rameters were estimated on a most parsimonious tree We followed Buckley et al. (2002) in taking an infor- and the resulting estimates were then fixed in a sub- mation theoretic approach to model selection and model sequent heuristic ML search (10 random addition rep- selection uncertainty for the 10 individual datasets licates with 5 trees held at each step). The parameters (Burnham and Anderson, 1998). We used the log likeli- were re-optimized on this new tree and fixed for a sec- hood scores for 24 models from MrModeltest 1.1b, a ond ML search. This was repeated but the same tree simplified version of Modeltest 3.06 (Posada and Cran- (and parameter estimates) was obtained for each of the dall, 1998) written by J. Nylander. We calculated the analyses. Akaike information criterion (AIC) and normalized Ak- Support for groups of taxa was measured using 100 aike weights (xi) in Microsoft Excel and calculated per- non-parametric bootstrap replicates (ML-BP). For each centages (xi=xi þ xj) that help to get an intuitive feeling pseudo-replicate, a heuristic search for the tree with the of the amount of evidence for the best fitting model ðiÞ,in highest likelihood score was performed by TBR branch- comparison to the next-best fitting model ðjÞ. swapping on 10 random addition starting trees, with 5 We performed separate analyses of the ten partitions. trees held at each addition step. In addition, we have used a hierarchical phylogenetic We used the Shimodaira–Hasegawa test (Shimodaira model (Gelman et al., 1995; Suchard et al., 2002) to and Hasegawa, 1999) as implemented in PAUP4b10. analyze our multiple sequence data, which constituted For each of the four partitions (mitochondrial, coding two levels: a partition-level and a higher population- chloroplast, non-coding chloroplast, and nuclear ITS), level (Gelman et al., 1995). Partition-level model pa- we tested the set of three possible topologies for the rameters were unlinked across partitions. At the popu- balsaminoid Ericales families: (Bals(Marc,Tetr)), lations level, 10 multiple rate hyperparameters allowed (Marc(Bals,Tetr)), and (Tetr(Bals,Marc)). ML heuristic rates to vary across partitions. For these parameters, we searches were performed with these topological con- also used an uninformative dirichlet prior (rates ¼ vari- straints enforced. Constraint trees were constructed in able). We used the default priors for the model param- MacClade 4.05. The version of the SH test used here eters: Dirichlet (...,1,...) for the substitution rates and corresponds to TestposNPncd, according to the notation the nucleotide frequencies, and a Uniform prior for the in Goldman et al. (2000). Substitution model parameters proportion of invariant sites ð0; 1Þ and for the shape were re-optimized for each topology tested, with 10,000 parameter of the gamma distribution. Branch lengths bootstrap replicates tested and with RELL approxima- were unconstrained (no molecular clock) and followed tion. an exponential distribution. We used the Bayesian approach for testing incon- 2.6. Phylogenetic analysis: Bayesian methods gruence as suggested by Buckley et al. (2002). PAUP4b10 was used to find (filter) the ML estimate or Bayesian analysis makes use of the updated (or pos- the Bayesian estimate of the most probable topology in terior) probability of a hypothesis that unifies the like- the 95% credible set of trees produced by the separate lihood (of the data, under the hypothesis) and a prior analyses. 5

2.7. Saturation plots and base frequency heterogeneity model from the 24 candidate substitution models. Among site variation (Yang, 1993; Yang, 1996) using To evaluate the effectiveness of the ML models to the invariant sites model was never chosen above a correct for multiple hits, we used saturation plots as mixed invariant sites with gamma model or the use of originally described by Phillipe et al. (1994). Observed the gamma model alone. distances (uncorrected-P) are plotted as a function of Often, in selecting the best approximating model, ML-corrected distances. Deviation from linearity in- there was only weak evidence that this model was better creases for more distant species pairs and indicates an than the next-best model, given the a priori set of underestimation of branch length. When saturation is models. This is evident from the Akaike weights and reached, branch lengths are no longer proportional to percentages (Table 2). The best model was selected the actual amount of evolutionary change and branches confidently only for the nuclear ITS dataset (99.9%), the appear to have equal length. rbcL data (99.9%), the combined chloroplast coding Shifts in base composition among taxa were exam- data and the total combined dataset (99.9%). Model ined by using v2 heterogeneity tests as implemented in selection was highly uncertain for the atpB+atpE, the PAUP4b10 on all included sites and on varied sites only. tRNA-introns and the combined mitochondrial dataset, The ITS dataset was also analyzed using the minimum with the uncertainty being to include or omit extra evolution criterion (ME) with LogDet distances (Lock- among-site variation parameters. It was also uncertain hart et al., 1994). Bootstrapping used 10,000 pseudo- whether and how to incorporate among site variation replicates with TBR swapping on neighbour-joining for both mitochondrial coding genes (atp1 and matR), starting trees. the matK gene, the rps16 intron, the spacers and the combined non-coding datasets. Bayesian parameter estimation was largely in accor- 3. Results dance with this uncertainty in model selection (data not shown). The 95% credibility interval for the shape 3.1. Model selection uncertainty and sequence evolution parameter of the gamma distribution and for the pro- portion of invariant sites was large for the ndhFand Summary statistics of the separate and combined tRNA-introns datasets; estimation of the shape param- datasets are listed in Table 1. Without exception, the eter for the spacers dataset was also associated with a GTR model (Tavare, 1986) was chosen as the best fitting large credibility interval. In contrast, similar Akaike

Table 1 Summary statistics of the partitioned and total combined data Dataset Characters: Varied sites % Contribution Parsimony % Contribution P values of base all/included (% varied of to the total informative to total combined frequency v2 (% included) included combined sites parsimony test: all sites) varied sites (% parsimony informative included sites/ informative varied sites only of included) (*significant at the 5% level) Coding 2495/2285 478 13.1 128 7.6 0.99/0.99 Mitochondrial (91.5) (19.3) (5.3) Coding 6410/6338 1974 51.0 790 50.0 0.99/0.94 Chloroplast (98.9) (29.6) (12.5) Non-coding 3524/2899 1087 28.8 446 28.2 0.99/0.94 Chloroplast (82.2) (36.6) (15.4) Nuclear ITS 892/524 304 8.3 224 14.1 0.71/0.03* (58.7) (58.0) (42.7) atp1 1009/986 180 4.9 71 4.4 1.00/0.99 matR 1486/1322 298 8.2 57 3.6 1.00/0.99 atpB + E 1669/1586 321 8.8 132 8.3 1.00/0.99 matK 1477/1456 625 17.2 268 16.8 0.99/0.99 ndhF 1977/1941 764 20.9 292 18.3 0.99/0.99 rbcL 1287/1279 264 7.2 99 6.2 1.00/0.99 tRNA-introns 1595/1375 441 12.1 159 9.9 0.99/0.99 rps16-intron 990/795 303 8.3 141 8.8 1.00/0.99 Spacers 939/714 343 9.4 157 9.8 1.00/1.00 Total combined 13321/11978 (90.4) 3643 (30.6) 100 1600 (13.1) 100 6

Table 2 Normalized Akaike weights for GTR models with among site rate heterogeneity Dataset GTR GTR + I GTR + G GTR + I + G 95% Mitochondrial coding 0.306 0.113 0.324 (51.4%) 0.134 8462 Chloroplast coding <0.001 <0.001 <0.001 0.999 (99.9%) 31 Chloroplast non-coding <0.001 <0.001 0.359 0.641 (64.1%) 739 Nuclear (ITS) <0.001 <0.001 0.001 0.998 (99.9%) 4756 atp1 0.490 (70.3%) 0.180 0.207 0.087 2528 matR 0.454 (69.3%) 0.167 0.201 0.074 5344 atpB+atpE 0.159 0.059 0.410 (52.4%) 0.372 9330 matK <0.001 <0.001 0.730 (73.0%) 0.270 755 ndhF <0.001 <0.001 0.085 0.915 (91.5%) 1789 rbcL <0.001 <0.001 <0.001 0.962 (99.9%) 4436 tRNA-introns <0.001 <0.001 0.445 0.555 (55.5%) 3389 rps16-intron 0.001 0.004 0.717 (72.1%) 0.278 8205 Spacers 0.014 0.005 0.694 (70.7%) 0.287 9017 Combined <0.001 <0.001 <0.001 0.999 (99.9%) Values for other models not shown, but always <0.1. Between brackets, percentage that the best fitting model is to be preferred over the next-best fitting model. In the right column are the number of trees in the 95% credible set for the separate Bayesian analyses. weights were obtained for the selection of the GTR + G Using alternative models combined in a hierarchical model over the GTR + I + G model for the spacers, model results in a decrease for deeper level relations rps16 intron and matK gene but the associated selection (Fig. 5). uncertainty was not clearly evident from Bayesian pa- Saturation plots show the strongest saturation for rameter estimation. the ITS data and for the third position chloroplast When the set of candidate models was larger (56 coding data (Fig. 1). There was no evidence for base versus 24, data not shown), selection uncertainty for the compositional bias (P values for v2 test in Table 1) in chloroplast coding and chloroplast non-coding datasets any of the partitions, except for the varied sites of the was between the best-fitting model (GTR + I + G) and a ITS data (P ¼ 0:03). In the minimum evolution anal- submodel of this model (TVM + I + G). For the nuclear ysis with LogDet distances, the monophyly of bals- ITS dataset, the chosen model was selected with equally aminoid families (ME-BP 89%), the monophyly of high certainty out of 24 or out of 56 models. The mi- Tetrameristaceae (ME-BP 70%) and the basal position tochondrial dataset did not allow for confident model of Balsaminaceae was highly supported (ME-BP 95%). selection with 24 models and this was even much more There was no bootstrap support >50% for relations so when comparing 56 models: no model was selected between the other Ericales. As a result, the same re- with an Akaike weight >0.15. lations were obtained with LogDet as with ML and We also tested the effect of the use of the next-best MP and Bayesian analyses and we conclude that the model on node posterior probability values for the 10 bias in base frequencies did not influence these analyses separate partitions (Fig. 3) and for the total combined strongly. analysis where a hierarchical model was used (Fig. 5). For the separate analyses, we did not evaluate an al- 3.2. Mitochondrial coding data: atp1 and matR ternative model for the ITS and rbcL data, because the chosen model for these partitions was selected with For both mitochondrial coding genes, there was high certainty (Table 2). The alternative model differed support for Fouquieriaceae as sister to Polemoniaceae from the best model in omitting or including an in- (BPP 87%/90%) and Theaceae as sister to Symploca- variant sites parameter for the chloroplast partitions. ceae (BPP 76%/96%). These same groups were also For both mitochondrial genes, the difference involved retained in the combined analysis. For the atp1 dataset including a gamma shape parameter. In general, the there was high support for Maesaceae–Diapensiaceae use of the alternative model resulted only in relatively (BPP 99%), but this relation was not found in the small changes in posterior probability (not larger than combined mitochondrial analysis. Bootstrap support 10%) as is the case for the tRNA introns, rps16, >50% under the ML or MP criteria, was only present spacers, and matK datasets. For both mitochondrial for the Polemoniaceae–Fouquieriaceae group (ML-BP genes, the use of a gamma distribution to model 60%, MP-BP 66%). Both mitochondrial coding genes, among site variation resulted in a general decrease of separately and combined, obtained high support under posterior probabilities for individual nodes, which was all criteria for Tetrameristaceae as a basal taxon in most extreme for the relations of Maesaceae, Lecyt- balsaminoid Ericales (BPP 99%/100%, MP-BP 94%, hidaceae, and Actinidiaceae for the matR dataset. ML-BP 87%). 7

Fig. 1. Saturation plot of the proportion of uncorrected (observed) sequence divergence on the Y axis versus the proportion of the ML estimated sequence divergence on the X axis.

3.3. Chloroplast coding data: atpB + E, matK, ndhF, and likelihood topology for this dataset both show a rbcL monophyletic group of these four families, with Symp- locaceae sister to Theaceae (ML) or sister to Styraca- The Fouquieriaceae–Polemoniaceae group was found ceae–Diapensiaceae (MP, not shown). Also, there was in the Bayesian analyses of matK, atpB + E and the BPP support (100%) for a sister group relationship of combined coding chloroplast Bayesian analyses (BPP these four families with Actinidiaceae–Clethraceae in 98%, BPP 88%, and BPP 99%, respectively). However, the combined chloroplast analysis; this group was also bootstrap support for these branches was low in com- found, although with no bootstrap support, in the ML bined ML and MP analyses (ML-BP <50%, MP-BP analysis and the MP topology. Actinidiaceae were sister 55%). The rbcLandndhF genes each support the rela- to Clethraceae in the Bayesian analysis of ndhF, matK tion between Pentaphylacaceae and Maesaceae. This and the combined chloroplast coding data (BPP 100%, relation also obtained moderate ML and MP bootstrap 98%, and 100%, respectively) and had support in ML support (ML-BP 70%, MP-BP 73%). The subsequent and MP bootstrap analyses (78 and 89%, respectively). position of Ebenaceae in a monophyletic group with In balsaminoid families, there is moderate to high sup- Fouquieriaceae–Polemoniaceae and Pentaphylacaceae– port for a basal position of Marcgraviaceae in all com- Maesaceae was only supported in the combined Bayes- bined chloroplast coding analyses (BPP 99%, ML-BP ian analysis (BPP 100%), but not in any separate 96%, and MP-BP 64%). Bayesian analysis or combined chloroplast coding ML or MP analysis. A Styracaceae–Diapensiaceae group 3.4. Chloroplast non-coding data: tRNA-introns, rps16- had no significant BPP support when analyzing the intron, and spacers matK partition (BPP 82%) but was strongly supported in the combined chloroplast coding analyses (BPP 99%, The separate Bayesian analyses of the three non- MP-BP 90%), except for the ML bootstrap analysis coding chloroplast partitions gave little evidence for the (ML-BP <50%). Symplocaceae and Theaceae had a same groups that belong to the outgroup Ericales. The close affinity with Styracaceae–Diapensiaceae in the only relation with some posterior probability (BPP 94%) Bayesian analysis of the matK dataset (BPP 65%) and in was Fouquieriaceae–Polemoniaceae for the tRNA-in- the combined chloroplast coding Bayesian analysis (BPP trons dataset. In the combined analyses of non-coding 93%). The most parsimonious tree (not shown) for the chloroplast sequences, the Fouquieriaceae–Polemonia- combined chloroplast coding data and the maximum ceae and Actinidiaceae–Clethraceae have strong support 8

Fig. 2. (A) Bayesian consensus phylograms of analyses of coding genes with best approximating models. Numbers above branches are posterior probabilities with best model/next-best model, values <50% not shown: mitochondrial atp1; mitochondrial matR; chloroplast matK; chloroplast rbcL; chloroplast ndhF; chloroplast atpB + E. (B) Bayesian consensus phylograms of analyses of non-coding regions with best approximating models. Numbers above branches are posterior probabilities with best model/next-best model, values <50% not shown: nuclear ITS region; chloroplast tRNA-introns; chloroplast rps16; chloroplast spacers. 9

(BPP 98%, ML-BP 71%, MP-BP 65%, and BPP 100%, for Polemoniaceae as sister to a large polytomy. A group ML-BP 87%, MP-BP 89%, respectively). of Symplocaceae, Theaceae, Diapensiaceae, and Styrac- For balsaminoid Ericales, the rps16 intron and the aceae is present in the strict consensus, but with no spacers dataset showed some support for Balsamina- bootstrap support. Finally, the ericoid group with Ac- ceae sister to Marcgraviaceae. In contrast, the tRNA- tinidiaceae and Clethraceae is also present. introns data had a high posterior probability for Marcgraviaceae basal (BPP 100%). In the combined 3.6. Total combined analyses analyses there was moderate to strong support for Marcgraviaceae basal (BPP 99%, ML-BP 69%, and All combined analyses with the data treated as a MP-BP 81%). single partition resulted in the same topology, except for the basal clade in balsaminoid Ericales (Fig. 4). The ML 3.5. Nuclear ITS data topology and Bayesian majority rule trees show Bals- aminaceae and Marcgraviaceae as sisters, but the single MP analysis of the nuclear ITS dataset resulted in four most parsimonious tree shows Balsaminaceae as the most parsimonious trees (not shown). The balsaminoid basal group (91% MP-BP). In the Ericales outgroup, group was strongly supported and resolved with Marc- Fouquieriaceae–Polemoniaceae, Pentaphylacaceae– graviaceae sister to Tetrameristaceae and Balsaminaceae Maesaceae, Styracaceae–Diapensiaceae, and Actinidia- at the base. There was low support (63% bootstrap value) ceae–Clethraceae all have high support. Deeper-level

Fig. 2. (continued) 10

Fig. 3. Maximum likelihood phylogenies of four data partitions. Numbers above branches indicate Bayesian posterior probabilities/ ML bootstrap values/MP bootstrap values: mitochondrial coding with K81uf + G: atp1+matR; chloroplast non-coding with TVM + I + G: tRNA-introns + spacers + rps16-intron; chloroplast coding with GTR + I + G: matK+ndhF+rbcL+atpB + E; nuclear ITS region with GTR+I+G. relationships do not have high ML or MP bootstrap In the hierarchical model Bayesian analysis, the rela- support. In contrast, Bayesian analyses show well-sup- tionships in balsaminoid families have a high posterior ported deeper-level relationships but posterior proba- probability (Fig. 5). In the Ericales outgroup, Fouqui- bility values vary with the way models are implemented. eriaceae–Polemoniaceae are basal, but this relation has 11

sister to Symplocacaceae (BPP 100%) and Styracaceae sister was to Diapensiaceae (BPP 100%). Together, these four families form a monophyletic group (BPP 100%) with the Actinidiaceae as sister to Clethraceae (BPP 100%) at the base (BPP 100%). Lecythidaceae have no supported affinity with other families and there is a some evidence for Ebenaceae as a basal clade of Pentaphy- lacaceae–Maesaceae.

3.7. Incongruence tests

We tested whether the topology obtained from the hierarchical model Bayesian analysis was in the 95, 99 or 100% credible set of separate analyses. The number of trees in each 95% credible set is in the last column of Table 2. The topology was never present in one of the credible sets. We tested the three possible hypotheses for the phy- logeny of balsaminoid Ericales families by comparison of ln scores with the Shimodaira–Hasegawa test. The Fig. 4. ML topology of the total combined analysis. Values above application of the SH test for the four different parti- branches indicate 1 model BPP values/ML bootstrap support values/ tions rejects significantly, in some cases, the alternative MP bootstrap support values. hypotheses. The mitochondrial dataset has the Balsaminaceae sister to Marcgraviaceae as best topology. The Marc- graviaceae basal topology is not considered plausible at the 5% significance level with P value 0.0473. But the Balsaminaceae basal topology is not excluded from the confidence set of trees obtained from the mitochondrial dataset (P ¼ 0:1714). Marcgraviaceae basal is the optimal ML topology for the combined dataset of coding chloroplast genes, but both alternative topologies fall within the confidence limit for alternative topologies. Both competing topol- ogies have a P value of 0.1081. Similarly, the concate- nated non-coding dataset of chloroplast data has Marcgraviaceae basal as the best topology. The likeli- hood for a basal position of Balsaminaceae is signifi- cantly lower according to the SH test (P ¼ 0:0440). A sistergroup relation for Marcgraviaceae and Balsamin- aceae has no significantly lower likelihood (P ¼ 0:2889). The nuclear data from the ITS region have the Bals- aminaceae basal topology as most likely. The competing topologies have no significantly lower likelihood (P ¼ 0:0734 or P ¼ 0:0781).

Fig. 5. Consensus phylogram of the hierarchical model Bayesian analysis with best-approximating models for partitions. Numbers 4. Discussion above branches indicate posterior probabilities with best-approxi- mating models/next-best approximating models. 4.1. Incongruence and accuracy

Numerous measures and tests have been devised to no probability above a 5 or 1% significance level. Also in assess conflict. In this study, we used a Bayesian method this analysis, the monophyly of Theaceae–Symploca- to assess global incongruence between partitions and ceae–Styracaceae–Diapensiaceae–Actinidiaceae– Clethr- applied the SH test for the local incongruence in bals- aceae was strongly supported (BPP 100%). Theaceae was aminoid families. 12

Two approaches to check for incongruence in a models. We believe there is a need to further investigate Bayesian framework have been proposed (Buckley, the performance of these tests under various conditions. 2002; Buckley et al., 2002; Kauff and Lutzoni, 2002; The unlikely strong presence of conflict across partitions Reeder, 2003). A first makes use of significant conflict here is in our view probably due to either too high between branches of topologies (Buckley, 2002; Kauff support for short internodes or model inadequacy. and Lutzoni, 2002). Using branch posterior probabilities The uncertainty in the estimation of the relation- as a test statistic for conflict can be misleading, probably ships between balsaminoid families was first evident because posterior probability values are sensitive to from previous analyses. In the combined analysis of model misspecification (Buckley, 2002). Our results Anderberg et al. (2002), balsaminoid Ericales are re- agree with this, because for different partitions there are solved with strong support for a basal position of highly supported branches at either the 5 or 1% signifi- Balsaminaceae. In contrast, the analysis of Bremer cance level, that are not present in any of the other et al. (2002) found equally strong support for a basal partitions or in any of the combined analyses. position of Marcgraviaceae. Several implementations A second way to investigate conflict in a Bayesian of tests of topology in a likelihood framework are re- framework tests whether a topology falls within a con- viewed by Goldman et al. (2000). We favored the use fidence set of trees. Buckley et al. (2002) used the ML of the non-parametric Shimodaira–Hasegawa test be- tree of a combined dataset to check whether this fell cause it is much more conservative in rejecting alter- within the 95% posterior credibility interval of the par- native topologies (Type I error) and less subject to titions, but the ML topology from the different dataset model misspecification (Buckley et al., 2002). The has also been used (Reeder, 2003). If the absence of a outcome of the application of the SH test on the four topology from a confidence set is not attributable to an partitions generated two significant P values at the 5% incomplete exploration of topology space, the imple- level. The topologies under investigation here were in mentation of Buckley et al. (2002) can be interpreted as all cases ML trees that only differ in the position of one follows: when the topology is present in the 95% credible branch. Although this is a practical approach (as im- sets of trees of both separate analyses, the topology can plemented in, e.g., Soltis et al., 2002) this set of trees is be a good representation of the underlying evolutionary not the full representation of all possible trees. Nar- history of both datasets, when it is not, either gross rowing this set of possible trees probably decreased P model misspecification or different evolutionary histo- values and it is a good possibility that testing would ries can be concluded (Bull et al., 1993). When we im- not have been significant when the complete set of trees plement this test here, there is strong evidence for would have been compared. Nevertheless, this testing is conflict between all ten partitions and between the congruent with the combined Bayesian and ML esti- combined mitochondrial, coding chloroplast, non-cod- mates, because the obtained topology, with Balsamin- ing chloroplast, and nuclear partitions. So we could aceae and Marcgraviaceae as most closely related, is conclude that conflict is not just caused by random er- never rejected by the SH test. ror, but either by a different evolutionary mechanism or Congruence between datasets is strong evidence for by incorrect tree reconstruction. For our 10 partitions, accuracy (Huelsenbeck et al., 1996; Penny and Hendy, incongruence is present between partitions that are ex- 1986) and incongruence can be an indication of either pected to have evolved along the same way: between different evolutionary histories or inaccuracy (Bull et al., mitochondrial genes, between coding chloroplast genes, 1993). An obvious way to improve the accuracy of our between non-coding chloroplast genes. Given that the preliminary estimations of the relationships in Ericales conflict in the data as well as between the estimated would be to use a denser taxon sampling (Graybeal, topologies seems to be real from a statistically point of 1998; Hillis et al., 2003; Pollock et al., 2002; Zwickl and view and coincides with the genome-level, it would be Hillis, 2002; but see Rosenberg and Kumar, 2001, 2003). tempting to search for an explanation in a biological The relationships in Ericales found here with ML and process (Wendel and Doyle, 1998). Two candidate MP bootstrap analysis are in agreement with Anderberg processes are lineage sorting and introgression. But et al. (2002) and Bremer et al. (2002) who have used a neither of these processes is very likely to function at the denser sampling. We will also contribute to more accu- phylogenetic level studied here. rate estimations with denser sampling (Geuten et al., in Buckley (2002) warns for possible misleading con- prep). clusions when using node posterior probabilities as a In an initial effort to investigate the effect of denser measure of incongruence. In statistics, an increased taxon sampling for balsaminoid families, we performed emphasis has been placed on estimating a correct con- analyses of ITS data with the same Ericales root but fidence interval in comparison to testing a phylogenetic added seven Impatiens sequence to the accession of hypothesis against a significance level (Gelman et al., Hydrocera triflora for Balsaminaceae and sequences of 1995; Holmes, 2003). But neither the former nor the the Marcgraviaceae genera and Norantea latter approach seems to be applicable for our data and (unpublished own results and Yuan et al., submitted). 13

Parsimony bootstrap analysis for that sampling indi- It seems unlikely that a single model, with few pa- cated similar high support for the relationships de- rameters, can adequately fit our large and heterogeneous picted here with a smaller sampling: monophyly of total combined dataset. In contrast, many parameters balsaminoid Ericales received 96% MP-BP, monophyly could overfit the data, decreasing inferential power. A of Tetrameristaceae received 69%, monophyly of Bals- hierarchical model can have enough parameters to fit aminaceae and Marcgraviaceae both received 100%, and the data, while installing dependence between the pa- Marcgraviaceae was sister to Tetrameristaceae with rameters of different partitions (Gelman et al., 1995; 97%. In addition, we used available rbcLandtrnL-F Suchard et al., 2002). More important than model se- spacer sequences (GenBank) from studies of relation- lection uncertainty and model fit is model adequacy; a ships in Balsaminaceae and in Marcgraviaceae (Fujih- good way to evaluate adequacy could be the use of ashi et al., 2002; Ward and Price, 2002). A combined posterior predictive simulation (Bollback, 2002; Gelman analysis had no support above 50% for Balsaminaceae et al., 1995; Huelsenbeck et al., 2001) as implemented in basal or Marcgraviaceae basal, either in parsimony MAPPS software (Bollback, 2002). This is based on the bootstrap analysis or in Bayesian analysis. rationale that an adequate model should perform well in predicting future observations. But when using more 4.2. Model selection, model fit, and model sensitivity than one gene, the problem of model adequacy overlaps with the question of how to define data partitions. Bayesian data analysis can be divided into three steps: Contiguous DNA segments are in fact made up of sev- setting up a model, calculating, and interpreting the eral different types of nucleotides under different selec- appropriate posterior distribution, and evaluating the fit tive constraints. When working with multiple partitions of the model and the implications for the resulting the question arises how much the data should be parti- posterior distribution (Gelman et al., 1995). The tioned in order to allow a model to adequately model ‘‘model’’ in this context is not restricted to the substi- sequence evolution. It may have been beneficial for our tution model but also incorporates the sampling distri- data analysis to further partition the data. bution and priors used; in the following discussion we We used sensitivity analysis (Chatfield, 1995; Gelman consider only the substitution model. et al., 1995) to evaluate the effect of choosing the alter- Several authors have stressed the importance of native to the selected model in Bayesian analyses. For model fit in Bayesian phylogenetics (Buckley, 2002; the separate analyses, the posterior probability values Bollback, 2002; Larget and Simon, 1999). Model selec- under both models were similar, although changes could tion and model fit are two related criteria. But whether involve support lying above or below a significance level. the chosen model is adequate to model sequence evo- Buckley et al. (2002) noted the same, relatively small lution and to allow for accurate tree reconstruction is effect for their single model combined analysis. When not checked when using one of the available selection alternative models are integrated in a hierarchical methods such as the likelihood ratio test, the Bayesian analysis, deeper-level branches lose support and an im- information criterion or the Akaike information crite- probable, weakly supported branch arises (Lecythida- rion (Bollback, 2002). Using the AIC allows assessing ceae–Actinidiaceae, not shown in Fig. 5). This is an uncertainty in model selection with the help of Akaike important result as this analysis gives an idea of the weights. This is an easy and straightforward procedure. robustness of the hierarchical analysis. However, a We note after Buckley et al. (2002) that evidence for reasonable but inferior model is selected for 8 out of 10 uncertainty obtained in this way is also in general partitions and the combination of these, seemingly re- agreement with Bayesian credibility intervals for model sults in an inferior estimate. We believe that the topol- parameters. ogy that resulted from combining the available data When combining data, a single model is almost al- with the use of a hierarchical model and best-approxi- ways used. The single model ML and Bayesian analyses mating models is a reasonable current estimate of Eri- and the MP analysis all give the same topology with very cales phylogeny (Fig. 5). high posterior probability for all branches. Also when making a single estimate of topology, but allowing dif- 4.3. Appropriateness of ITS data ferent partitions to evolve under a separate model, the same topology is obtained (data not shown). Only when We have used sequences from the ITS region to infer we use a model that hierarchically structures the parti- relationships at an unusually high taxonomic level. The tion parameters by including further multiple rate hy- ITS region has been applied as a phylogenetic marker at perparameters, we obtain a different topology with the inter- and infra-generic level, but use at a higher level Fouquieriaceae–Polemoniaceae basal, which is congru- has also been suggested (Hershkovitz et al., 1999; Sim- ent with the previous study of Anderberg et al. (2002) mons and Freudenstein, 2003). High evolutionary rates and therefore is in our opinion the most reasonable are associated with several possible problems (Yang, topology. 1998). The first is the difficulty of alignment. We have 14 taken two measures to take alignment uncertainty into Our initial effort was to investigate the relationships account. The first is the use of a large ITS data matrix of between balsaminoid families. We find that Balsamina- Ericales to produce an alignment with a smaller genetic ceae and Marcgraviaceae share the most recent common distance between the sequences. Also alignment im- ancestor. However, the position of Balsaminaceae with proves significantly when using more sequences former thealean families Marcgraviaceae and Tetra- (Thompson et al., 1999). The second is the use of a meristaceae, remains enigmatic from a morphological segment-to-segment alignment algorithm that produces point of view, their only synapomorphy being the shared a much more spacious alignment with local blocks presence of raphides (Anderberg et al., 2002; Baretta- (DIALIGN, Morgenstern, 1999). In addition, sequence Kuipers, 1976). We are currently exploring an evolu- regions of high ambiguity were excluded from analysis tionary developmental genetic (evo-devo) approach in the large alignment. (Albert et al., 1998; Theissen and Saedler, 1995) to fur- A second problem is base frequence heterogeneity of ther elucidate the relation between Marcgraviaceae and which the variable positions of the ITS matrix suffer, as Balsaminaceae (Geuten et al., 2003). indicated by the v2 test. But the inference of the rela- One of the newly inferred relationships in the Eri- tionships in balsaminoid Ericales in LogDet analysis cales outgroup is that of Maesaceae, as the basal rep- were no different from the separate ML, MP or Bayesian resentative of the primuloid families, and Sladenia a analyses and there was no support for conflicting rela- basal representative of Pentaphylacaceae sensu APG II. tionships between the LogDet analysis and the ML, MP, Caris et al. (2000) investigated the floral ontogeny of and Bayesian analyses. Maesaceae, but we know of no shared floral ontoge- Third is saturation, a process that has, as an extreme netic characters between Maesaceae and Pentaphylac- consequence that the similarity between sequences will aceae sensu APG II. Although unclear from a floral only depend on nucleotide frequencies, which is not a morphological point of view, there are some interesting good indicator of phylogeny (Xia et al., 2003). The wood anatomical features shared between these two saturation plots show ITS to be strongly saturated. groups. Maesaceae is wood anatomically rather differ- Nevertheless, the topology that is obtained is congruent ent from the other woody primuloid taxa and it has with the topology of other markers (Figs. 1 and 2) ex- recently been suggested that the primuloid taxa (except cept for the relationships in the balsaminoid clade. In for Maesaceae) are essentially herbaceous (Anderberg addition, there is no support (BPP, MP-BP, and ML- et al., 2000; Stevens, 2001). Therefore, we restrict the BP) for conflicting relationships. comparison to Maesaceae and Pentaphylaceae sensu The problem of saturation may have been exagger- APG II (with a focus on Sladenia and Ficalhoa). Both ated (Yang, 1998) and it has been noted that the optimal families consist of evergreen trees with diffuse apo- evolutionary rate can be very high for sequences to be tracheal and scanty paratracheal parenchyma. In most informative (Goldman, 1998; Yang, 1998). More Maesaceae, vessel-perforations are simple or scalari- important than concerns about saturation, there needs form (Metcalfe and Chalk, 1950; Moll and Janssonius, to be enough sequence divergence for a phylogeny to be 1926) and in Pentaphylacaceae perforations are sca- resolved (Rokas et al., 2002). lariform (Deng and Baas, 1990, 1991). Both Sladenia Adding the ITS data had a major impact on the and Maesa have vessel elements in radial groupings parsimony bootstrap support for the position of Bals- (Deng and Baas, 1990; F. Lens, pers. com.). A diag- aminaceae. The total combined MP analysis shows nostic feature for Maesaceae wood, in comparison to strong support (91% MP-BP) for a basal position, but other primuloid woody taxa, is the co-occurence of leaving out ITS characters gives weak to moderate uni- and multiserriate rays (Moll and Janssonius, 1926; support (64% MP-BP) for Marcgraviaceae at the base. F. Lens, pers. com.). The same feature is also present in Ficalhoa (Deng and Baas, 1991) and most other 4.4. Evolutionary implications Pentaphyacaceae sensu APG II genera (Deng and Baas, 1990), although absent in Sladenia. As mentioned above, we consider the topology in We also conclude that Theaceae sensu APG II are Fig. 5 our best estimate of Ericales phylogeny. Through the closest relatives of Symplocaceae. The possible re- the use of the hierarchical model, it acquires some status lations of Symplocaceae are discussed in Caris et al. as a meta-analysis, integrating all available evidence (2002). According to Airy-Shaw (1966), the Symp- (Gelman et al., 1995). We infer several, previously un- locacaeae and Theaceae are only scarcely different, resolved, deeper-level relationships in Ericales and dis- except that Symplocaceae have a racemose inflores- cuss them from a taxonomic and morphological point of cence and an inferior , but Caris et al. (2002) view on the basis of Fig. 5. We do not discuss the po- found no flower ontogenetical evidence to support their sitions of Lecythidaceae and Ebenaceae, because sup- relation. Both families have fascicled (for a port is not significant, and there position is still very detailed account of the ontogeny of Theaceae see uncertain. Erbar, 1986 and Tsou, 1998). Symplocaceae and 15

Theaceae both accumulate aluminium (Jansen et al., submitted) and share the presence of primitive wood anatomical features (van den Oever et al., 1981). The ITS AY452668 AY452669 relationship between Diapensiaceae and Styracaceae AY452670 was strongly supported in previous analyses (Ander- berg et al., 2002) and here we find support for a further relationship of Styracaceae–Diapensiaceae with Thea- ceae–Symplocacaceae. This is not surprising from a 16 rps AJ430993 AJ431014 morphological point of view, as the close affinity of AJ431001 Symplocaceae and Styracaceae was agreed upon by most authors before molecular investigations suggested E Diapensiaceae as the closest relative of Styracaceae atp V- trn AJ429641 AJ429660 (Cronquist, 1981; but not Nooteboom, 1975). Sym- AJ429646 plocacaceae, Theaceae, and Diapensiaceae are strong aluminium accumulators, but Styracaceae are not (Jansen et al., submitted). Pollen of Styracaceae strongly resembles that of Theaceae (Morton and T-F trn AJ430870 AJ430891 Dickison, 1992). AJ430879 We find a large supported group of Theaceae– Symplocaceae–Styracaceae–Diapensiaceae–Actinidiaceae–

Clethraceae. Actinidiaceae take an important position in R mat AF421017 AF421011 this respect because the family was placed in Ericales AF421022 sensu stricto by Dahlgren (1989) and Takhtajan (1997), and in Theales by Cronquist (1981) and Thorne (1992). Cronquist (1981, p. 326) considers Actinidiaceae as ‘‘an early and distinct branch of the Theales, not far re- 1 atp AF420941 AF420933 AF420943 moved from the ancestral order Dilleniales and near to the ancestry of the Ericales (sensu stricto).’’ Dickison (1972) provides flower morphological and anatomical evidence for a close affinity of Actinidiaceae and Ericales F ndh AF421060 AF421065 sensu stricto. Somewhat similar to Actinidiaceae are AF421069 Styracaceae, that was ‘‘considered to have an origin among of thealean affinity and to show a sister group relation with the Ericales (sensu stricto)’’ (Dick- K J429280 J429289 J429303 mat A A ison, 1993 p. 251). A The relationships found here should be confirmed by additional analyses with a broad sampling and by a thorough examination of morphological characters. The presented results can be used as testable phylogenetic B atp AF209647 hypotheses and as a phylogenetic framework to re-in- AJ235503 AJ235529 vestigate morphological characters. K V L .G. 586, A (2000) (2000) (2002) (2002) (2002) (2002) al., euten, al. al. al. al. al. al. (2002) (2002) (2002) t (direct

Acknowledgments G e et et et et et et . .T. al. al. al. Robyns, K R al., et et et

We thank the National Botanic Garden of Belgium, et 249369 Nationaal Herbarium Nederland Leiden branch, Berlin- 0 Savolainen Bremer Anderberg Voucher L Savolainen Bremer Anderberg Voucher Anderberg Reference Soltis submission) Bremer Anderberg Voucher Pennington Dahlem Botanic Garden and the Kew Royal Botanic Gardens for their supply of material. Thanks to Anja Vandeperre (K.U.Leuven) for technical assistance. We also thank two anonymous reviewers for suggestions to rectiflora sp. umbellata triflora s.n.

improve the manuscript. This work was financially repens capensis parviflora rhizophorae peduncularis supported by research grants of the K.U.Leuven OT/01/ A name 25 and the Fund for Scientific Research—Flanders Impatiens Impatiens Impatiens Hydrocera Marcgravia Marcgravia Anderberg Marcgravia Norantea (Belgium) (G.0104.01; 1.5.069.02; 1.5.061.03). N.P. is a Pelliciera Family Species Balsaminaceae Marcgraviaceae Tetrameristaceae

postdoctoral fellow of the K.U.Leuven. Appendix 16

Appendix A (continued) Family Reference atpB matK ndhF atp1 matR trnT-F trnV-atpE rps16 ITS Species name Tetramerista sp. Savolainen et al. (2000) AJ235623 Albach et al. (2001) AJ400887 Bremer et al. (2002) AJ429304 AJ430892 AJ429528 AJ431015 Anderberg et al. (2002) AF420958 Voucher Coode 7925, K AY452671

Actinidiaceae Actinidia chinensis Savolainen et al. (2000) AJ235382 Actinidia arguta Anderberg et al. (2002) AF421043 AF420916 AF420991 Actinidia kolomikta Bremer et al. (2002) AJ429279 AJ430869 AJ429640 AJ430992 Saurauia zahlbruckneri Fritsch et al. (direct AF396452 submission) Clethraceae Anderberg et al. (2002) AF420965 AF421046 AF420919 AF420996 Bremer et al. (2002) AJ429281 AJ430871 AJ430994 Fritsch et al. (direct AF396237 submission) Ebenaceae Lissocarpa guianensis Anderberg et al. (2002) AF420975 AF421062 AF420934 AF421012 Bremer et al. (2002) AJ429287 AJ429645 Diospyros kaki Bremer et al. (2002) AJ430874 Diospyros texana Jackson et al., 1999 AF174622

Fouquieriaceae Fouquieria sp. Anderberg et al. (2002) AF420970 AF421056 AF420928 AF421006 Anderberg s.n Fouquieria diguetii Bremer et al. (2002) AJ429285 AJ430876 AJ429643 AJ430998 Schultheis and Baldwin AH007889 (direct submission) Polemoniaceae scandens Savolainen et al. (2000) AJ235440 AF421049 Anderberg et al. (2002) AF420921 AF420999 Polemonium Bremer et al. (2002) AJ429292 AJ430882 AJ429649 AJ431004 pulcherrimum Bell and Patterson AF027704 (direct submission) Pentaphylacaceae Sladenia celastrifolia Anderberg et al. (2002) AF420988 AF421081 AF421040 Bremer et al. (2002) AJ429297 AJ430081 AJ429654 AJ431009 Wang et al. (direct AY096029 submission) Maesaceae Maesa tenera Kallersj€ o€ et al. (2000) AF213781 AF213750 Bremer et al. (2002) AJ429288 AJ430878 AJ431000 Anderberg et al. (2002) AF420937 AF421015 Maesa japonica Voucher K. Geuten, LV AY452672

Lecythidaceae Savolainen et al. (2000) AJ235540 Albach et al. (2001) AJ236258 Anderberg et al. (2002) AF420960 AF421041 17

References

Airy-Shaw, H.K., 1966. A Dictionary of the Flowering Plants and Ferns. Cambridge University Press, Cambridge. AF208700 AF354641 AF396229 AY049796 AY143589 Albach, D.C., Soltis, P.S., Soltis, D.E., Olmstead, R.G., 2001. Phylogenetic analysis of asterids based on sequences of four genes. Ann. Mol. Bot. Gard. 88, 163–212. Albert, V.A., Gustafsson, M.H.G., DiLaurenzio, L., 1998. Ontogentic systematics, molecular developmental genetics, and the angiosperm AJ431017 AJ430999 AJ431012 AJ431011 . In: Soltis, D.E., Soltis, P.S., Doyle, J.J. (Eds.), Molecular Systematics of Plants II, DNA sequencing. Kluwer Academic Publishers, Boston, pp. 349–374. Alfaro, M.E., Zoller, S., Lutzoni, F., 2003. Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov

J429662 chain Monte Carlo sampling and bootstrapping in assessing AJ429644 AJ429658 AJ429657 A phylogenetic confidence. Mol. Biol. Evol. 20, 255–266. Anderberg, A.A., Stahl, B., Kallersjo, M., 2000. Maesaceae, a new primuloid family in the order Ericales s.l. Taxon 49, 183–187. Anderberg, A.A., Rydin, C., Kallersjo, M., 2002. Phylogenetic relationships in the order Ericales SL: analyses of molecular data AJ430877 AJ430894 AJ430889 AJ430888 from five genes from the and mitochondrial genomes. Am. J. Bot. 89, 677–687. APG, 1998. An ordinal classification for the families of flowering plants. Ann. Mol. Bot. Gard. 85, 531–553. APG, 2003. An update of the angiosperm phylogeny group classifi- AF421038 AF421029 AF421001 AF421032 cation for the orders and families of flowering plants: APG II. Bot. J. Linn. Soc. 141, 399–436. Baretta-Kuipers, T., 1976. Comparative wood anatomy of Bonnetia- ceae, Theaceae, and Guttiferae. Leiden Bot. Series 3, 76–101. Berry, P.E., Savolainen, V., Sytsma, K.J., Hall, J.C., Chase, M.W., 2001. Lissocarpa is sister to Diospyros (Ebenaceae). Kew. Bull. 56, AF420948 AF420954 AF420923 AF420950 725–729. Bollback, J.P., 2002. Bayesian model adequacy and choice in phylog- enetics. Mol. Biol. Evol. 19, 1171–1180. Bremer, B., Bremer, K., Heidari, N., Erixon, P., Olmstead, R.G., Anderberg, A.A., Kallersjo, M., Barkhordarian, E., 2002. Phylog- AF421073 AF421075 AF421052 AF421084 enetics of asterids based on 3 coding and 3 non-coding chloroplast DNA markers and the utility of non-coding DNA at higher taxonomic levels. Mol. Phylogenet. Evol. 24, 274–301. Buckley, T.R., 2002. Model misspecification and probabilistic tests of topology: evidence from empirical data sets. Syst. Biol. 51, 509–

AJ429286 AJ429306 AJ429301 AJ429283 AJ429300 523. Buckley, T.R., Arensburger, P., Simon, C., Chambers, G.K., 2002. Combined data, Bayesian phylogenetics, and the origin of New Zealand Cicada genera. Syst. Biol. 51, 4–19. Bull, J.J., Huelsenbeck, J.P., Cunningham, C.W., Swofford, D.L.,

AJ235617 AF420967 AF420984 Waddell, P.J., 1993. Partitioning and combining data in phyloge- netic analysis. Syst. Biol. 42, 384–397. Burnham, K.P., Anderson, D.R., 1998. Model Selection and Inference: A Practical Information-theoretic Approach. Springer-Verlag, New York. (2000) (2002) (2002) (2002) (2002) Anderberg (direct

al. Caris, P., Ronse Decraene, L.P., Smets, E., Clinckemaillie, D., 2000. al. al. al. al. (2002) (2002) (2002) (2002) (2002) et et et et et Floral development of three Maesa species, with special emphasis al. al. al. al. al. Shi and (direct et et et et et (direct (direct on the position of the genus within Primulales. Ann. Bot. 86, 87–97. al. and Caris, P., Decraene, L.P.R., Smets, E., D, 2002. The uncertain et € onblom systematic position of Symplocos (Symplocaceae): evidence from a Bremer Shi submission) Anderberg Bremer Tang submission) Savolainen Bremer Anderberg Fritsch submission) Anderberg Bremer R (2002) Anderberg Bremer Fritsch submission) floral ontogenetic study. Int. J. Plant Sci. 163, 67–74. Chase, M.W., Hills, H.H., 1991. Silica gel: an ideal material for field preservation of samples for DNA studies. Taxon 40, 215–220. Chatfield, C., 1995. Model uncertainty, data mining and statistical Chung siatica inference. J. R. Stat. Soc. A 158, 419–466. a 1351 costata bogotensis sp. chinensis

lapponica Congdon, P., 2001. Bayesian Statistical Modeling. Wiley, Chichester. uperba s

officinalis obtusifolius Cronquist, A., 1981. An Integrated System of Classification of urceolata Flowering Plants. Columbia University Press, New York.

Anderberg Dahlgren, G., 1989. The last Dahlgrenogram, system of classifica- Barringtonia Schima Symplocos Symplocos Symplocos Symplocos Diapensia Galax Styrax Styrax

Theaceae Symplocaceae and Diapensiaceae Styracaceae tion of the . In: Tan, K. (Ed.), Plant , 18

Phytogeography and Releated Subjects: The Davis and Hedge Kauff, F., Lutzoni, F., 2002. Phylogeny of the Gyalectales and Festschrift. Edinburgh University Press, Edinburgh, pp. 249–260. Ostropales (Ascomycota, Fungi): among and within order rela- Deng, L., Baas, P., 1990. Wood anatomy of trees and from tionships based on nuclear ribosomal RNA small and large II: Theaceae. IAWA Bull. 11, 337–378. subunits. Mol. Phylogenet. Evol. 25, 138–156. Deng, L., Baas, P., 1991. The wood anatomy of the Theaceae. IAWA Larget, B., Simon, D.L., 1999. Markov chain Monte Carlo algorithms Bull. 12, 333–353. for the Bayesian analysis of phylogenetic trees. Mol. Biol. Evol. 16, Dickison, W.C., 1972. Observations on the floral morphology of some 750–759. species of Saurauia, Actinidia, and Clematoclethra. J. Mitchell Soc. Lockhart, P.J., Steel, M.A., Hendy, M.D., Penny, D., 1994. Recov- 88, 43–54. ering evolutionary trees under a more realistic model of sequence Dickison, W.C., 1993. Floral anatomy of the Styracaceae, including evolution. Mol. Biol. Evol. 11, 605–612. observations on intra-ovarian trichomes. Bot. J. Linn. Soc. 112, Maddison, D.R., Maddison, W.P., 2002. MacClade. Sinauer Associ- 223–255. ates, Sunderland, MA. Douady, C.J., Delsuc, F., Boucher, Y., Doolittle, W.F., Douzery, Metcalfe, C.R., Chalk, L., 1950. Anatomy of the dicotyledons. 2 vols. E.J.P., 2003. Comparison of Bayesian and maximum likelihood Clarendon Press, Oxford. bootstrap measures of phylogenetic reliability. Mol. Biol. Evol. 20, Moll, J.W., Janssonius, H.H., 1906–1936. Micrographie des Holzes der 248–254. auf Java vorkommenden Baumarten. Leiden. Doyle, J.J., Doyle, J.J., 1987. A rapid DNA isolation procedure for Morgenstern, B., 1999. DIALIGN 2: improvement of the segment-to- small quantities of fresh leaf tissue. Phytochem. Bull. 19, 11–15. segment approach to multiple sequence alignment. Bioinformatics Erbar, C., 1986. Untersuchungen zur Entwicklung der spiraligen Blute€ 15, 211–218. von pseudocamellia (Theaceae). Bot. Jahrb. Syst. 106, Morton, C.M., Dickison, W.C., 1992. Comparative pollen morphol- 391–407. ogy of the Styracaceae. Grana 31, 1–15. Fritsch, P.W., Morton, C.M., Chen, T., Meldrun, C., 2001. Phylogeny Morton, C.M., Chase, M.W., Kron, K.A., Swensen, S.M., 1996. A and biogeography of the Styracaceae. Int. J. Plant Sci. 162, S95– molecular evaluation of the monophyly of the order Ebenales S116. based upon rbcL sequence data. Syst. Bot. 21, 567–586. Fujihashi, H., Akiyama, S., Ohba, H., 2002. Origin and relationships Morton, C.M., Chase, M.W., Kron, K.A., Swensen, S.M., 1997. A of the Sino-Himalayan Impatiens (Balsaminaceae) based on molecular evaluation of the order Ebenales based upon rbcL molecular phylogenetic analysis, chromosome numbers and gross sequence data. Syst. Bot. 21, 567–586. morphology. J. Jpn. Bot. 77, 284–295. Nooteboom, H.P., 1975. Revision of the Symplocaceae of the Old Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 1995. Bayesian World, New Caledonia excepted. Leiden University Press, Leiden. Data Analysis. Chapman & Hall, London. Penny, D., Hendy, M., 1986. Estimating the reliability of evolutionary Geuten, K., Becker, A., Yuan, Y.-M., Theissen, G., Smets, E., 2003. trees. Mol. Biol. Evol. 3, 403–417. MADS-box genes in flowers of Balsaminaceae and Marcgravia- Phillipe, H., Sorhannus, U., Baroin, A., Perasso, R., Gasse, F., ceae. In: Bayer, C., Dressler, S., Schneider, J., Zizka, G. (Eds.), Adoutte, A., 1994. Comparison of molecular and paleontological 16th International Symposium Biodiversity and Evolutionary data in diatoms suggests a major gap in the fossil record. J. Evol. Biology, 17th International Senckenberg Conference, 21–27 Sep- Biol. 7, 247–265. tember 2003. Abstracts: 157. Plamarum Hortus Francofurtensis 7. Pollock, D.D., Zwickl, D.J., McGuire, J.A., Hillis, D.M., 2002. Frankfurt am Main. 264 p. Increased taxon sampling is advantageous for phylogenetic infer- Goldman, N., 1998. Phylogenetic information and experimental design ence. Syst. Biol. 51, 664–671. in molecular systematics. Proc. R. Soc. Lond. B 265, 1779– Posada, D., Crandall, K.A., 1998. Modeltest: testing the model of 1786. DNA substitution. Bioinformatics 14, 817–818. Goldman, N., Anderson, J.P., Rodrigo, A.G., 2000. Likelihood-based Prather, A., Ferguson, C.J., Jansen, R.K., 2000. Polemoniaceae tests of topologies in phylogenetics. Syst. Biol. 49, 652–670. phylogeny and classification: implications of sequence data from Graybeal, A., 1998. Is it better to add taxa or characters to a difficult the chloroplast gene ndhF. Am. J. Bot. 87, 1300–1308. phylogenetic problem. Syst. Biol. 47, 9–17. Prince, L.M., Parks, C.R., 2001. Phylogenetic relationships of Thea- Hershkovitz, M.A., Zimmer, E.A., Hahn, W.J., 1999. Ribosomal ceae inferred from chloroplast DNA sequence data. Am. J. Bot. 88, DNA sequences and angiosperm systematics. In: Hollingsworth, 2309–2320. P.M., Bateman, R.M., Gornall, R.J. (Eds.), Molecular Systematics Reeder, T.W., 2003. A phylogeny of the Australian Sphenomorphus and Plant Evolution. Taylor & Francis, London, pp. 268–326. group (Scincidae: Squamata) and the phylogenetic placement of the Hillis, D.M., Pollock, D.D., McGuire, J.A., Zwickl, D.J., 2003. Is crocodile skinks (Tribolonotus): Bayesian approaches to assessing sparse taxon sampling a problem for phylogenetic inference. Syst. congruence and obtaining confidence in maximum likelihood Biol. 52, 124–126. inferred relationships. Mol. Phylogenet. Evol. 27, 384–397. Holmes, S., 2003. Statistics for phylogenetic trees. Theor. Popul. Biol. Ronblom,€ K., Anderberg, A.A., 2002. Phylogeny of Diapensiaceae 63, 17–32. based on molecular data and morphology. Syst. Bot. 27, 383– Huelsenbeck, J.P., Bull, J.J., Cunningham, C.W., 1996. Combining 395. data in phylogenetic analysis. Trends Ecol. Evol. 11, 152–158. Rokas, A., Nylander, J.A.A., Ronquist, F., Stone, G.N., 2002. A Huelsenbeck, J.P., Ronquist, F., 2001. MRBAYES: Bayesian inference maximum-likelihood analysis of eight phylogenetic markers in of phylogenetic trees. Bioinformatics 17, 754–755. gallwasps (Hymenoptera: Cynipidae): implications for insect phy- Huelsenbeck, J.P., Ronquist, F., Nielsen, R., Bollback, J.P., 2001. logenetic studies. Mol. Phylogenet. Evol. 22, 206–219. Evolution—Bayesian inference of phylogeny and its impact on Rosenberg, M.S., Kumar, S., 2001. Incomplete taxon sampling is not a evolutionary biology. Science 294, 2310–2314. problem for phylogenetic inference. Proc. Natl. Acad. Sci. USA 98, Huelsenbeck, J.P., Larget, B., Miller, R.E., Ronquist, F., 2002. 10751–10756. Potential applications and pitfalls of Bayesian inference of Rosenberg, M.S., Kumar, S., 2003. Taxon sampling, bioinformatics, phylogeny. Syst. Biol. 51, 673–688. and phylogenomics. Syst. Biol. 52, 119–124. Kallersj€ o,€ M., Bergqvist, G., Anderberg, A., 2000. Generic realignment Saghai-Maroof, M.A., Soliman, K.M., Jorgensen, R.A., Allard, R.W., in primuloid families of the Ericales s.l.: a phylogenetic analysis 1984. Ribosomal DNA spacer-length polymorphisms in barley: based on DNA sequences from three chloroplast genes and mendelian inheritance, chromosomal location, and population morphology. Am. J. Bot. 87, 1325–1341. dynamics. Proc. Natl. Acad. Sci. USA 81, 8014–8018. 19

Savolainen, V., Chase, M.W., Hoot, S.B., Morton, C.M., Soltis, D.E., Takhtajan, A., 1997. Diversity and Classification of Flowering Plants. Bayer, C., Fay, M.F., de Bruijn, A.Y., Sullivan, S., Qiu, Y.L., 2000. Columbia University Press, New York. Phylogenetics of flowering plants based on combined analysis of Tavare, S., 1986. Some probabilistic and statistical problems on the plastid atpB and rbcL gene sequences. Syst. Biol. 49, 306–362. analysis of DNA sequences. Lect. Math. Life Sci. 17. Schultheis, L.M., Baldwin, B.G., 1999. Molecular phylogenetics of Theissen, G., Saedler, H., 1995. MADS-box genes in plant ontogeny Fouquieriaceae: evidence from nuclear rDNA ITS studies. Am. J. and phylogeny: HaeckelÕs Ôbiogenetic lawÕ revisited. Curr. Opin. Bot. 86, 578–589. Genet. Dev. 5, 628–639. Shimodaira, H., Hasegawa, M., 1999. Multiple comparisons of log- Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, likelihoods with applications to phylogenetic inference. Mol. Biol. D.G., 1997. The CLUSTAL_X windows interface: flexible strate- Evol. 16, 1114–1116. gies for multiple sequence alignment aided by quality analysis Simmons, M.P., Freudenstein, J.V., 2003. The effects of increasing tools. Nucleic Acids Res. 25, 4876–4882. genetic distance on alignment of, and tree construction from rDNA Thompson, J.D., Plewniak, F., Poch, O., 1999. A comprehensive internal transcribed spacer sequences. Mol. Phylogenet. Evol. 26, comparison of multiple sequence alignment programs. Nucleic 444–451. Acids Res. 27, 2682–2690. Smets, E., Pyck, N., 2003. Ericales (Rhododendron). Encyclopedia of Thorne, R.F., 1992. An updated phylogenetic classification of the Life Sciences. Macmillan Reference Ltd. Available from: . Tsou, C.-H., 1998. Early floral development of Camellioideae (The- Soltis, D.E., Soltis, P.S., Nickrent, D.L., Johnson, L.A., Hahn, W.J., aceae). Am. J. Bot. 85, 1531–1547. Hoot, S.B., Sweere, J.A., Kuzoff, R.K., Kron, K.A., Chase, M.W., van den Oever, L., Baas, P., Zandee, M., 1981. Comparative wood Swensen, S.M., Zimmer, E.A., Chaw, S.M., Gillespie, L.J., Kress, anatomy of Symplocos and latitude and altitude of provenance. W.J., Systsma, K.J., 1997. Angiosperm phylogeny inferred from IAWA Bull. 2, 3–24. 18S ribosomal DNA sequences. Ann. Mol. Bot. Gard. 84, 1–49. Ward, N.M., Price, R.A., 2002. Phylogenetic relationships of Marc- Soltis, D.E., Soltis, P.S., Chase, M.W., Mort, M.E., Albach, D.C., graviaceae: insights from three chloroplast genes. Syst. Bot. 27, Zanis, M., Savolainen, V., Hahn, W.H., Hoot, S.B., Fay, M.F., 149–160. Axtell, M., Swensen, S.M., Prince, L.M., Kress, W.J., Nixon, K.C., Wendel, J.F., Doyle, J.J., 1998. Phylogenetic incongruence: window Farris, J.S., 2000. Angiosperm phylogeny inferred from 18S rDNA, into genome history and molecular evolution. In: Soltis, D.E., rbcL, and atpB sequences. Bot. J. Linn. Soc. 133, 381–461. Soltis, P.E., Doyle, J.J. (Eds.), Molecular Systematics of Plants. Soltis, D.E., Soltis, P.E., Zanis, M.J., 2002. Phylogeny of plants Kluwer Academic Publishers, Boston, pp. 265–296. based on evidence from eight genes. Am. J. Bot. 89, 1670–1681. White, T.J., Bruns, T., Lee, S., Taylor, J., 1990. Amplification and Stevens, P.F., 2001. onwards. Angiosperm Phylogeny Website. Version direct sequencing of fungal ribosomal RNA genes for phylogenet- 3, May 2002 (but more or less continuously updated since). ics. In: Innis, M., Gelfand, D., Sninsky, J., White, T. (Eds.), PCR Available from: . San Diego, CA, pp. 315–322. Suchard, M.A., Kitchen, C.M., Sinsheimer, J.S., Weiss, R.E., 2002. Xia, X., Xie, Z., Salemi, M., Chen, L., Wang, Y., 2003. An index of Hierarchical phylogenetic models for analyzing multipartite se- substitution saturation and its application. Mol. Phylogenet. Evol. quence data. Departments of Biomathematics, Human Genetics 26, 1–7. and Biostatistics, University of , Los Angeles, Los Yang, Z., 1993. Maximum likelihood estimation of phylogeny from Angeles, p. 34. DNA sequences when substitution rates differ over sites. Mol. Biol. Suzuki, Y., Glazko, G.V., Nei, M., 2002. Overcredibility of molecular Evol. 10, 1396–1401. phylogenies obtained by Bayesian phylogenetics. Proc. Natl. Acad. Yang, Z., 1996. Among-site rate variation and its impact on Sci. USA 99, 16138–16143. phylogenetic analyses. Trends Ecol. Evol. 11, 367–372. Swofford, D.L., Olsen, G.J., Waddell, P.J., Hillis, D.M., 1996. Yang, Z.H., Rannala, B., 1997. Bayesian phylogenetic inference using Phylogenetic inference. In: Hillis, D.M., Moritz, C., Mable, B.K. DNA sequences: A Markov Chain Monte Carlo method. Mol. (Eds.), Molecular Systematics. Sinauer Associates, Sunderland, Biol. Evol. 14, 717–724. MA, pp. 407–514. Yang, Z., 1998. On the best evolutionary rate for phylogenetic Swofford, D.L. 2002. PAUP*. Phylogenetic Analysis Using Parsimony analysis. Syst. Biol. 47, 125–133. (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Zwickl, D.J., Hillis, D.M., 2002. Increased taxon sampling greatly MA. reduces phylogenetic error. Syst. Biol. 51, 588–598.