MOLECULAR PHYLOGENETICS AND EVOLUTION Molecular Phylogenetics and Evolution 33 (2004) 922–935 www.elsevier.com/locate/ympev

Rabbits, if anything, are likely Glires

Emmanuel J.P. Douzerya,*, Dorothe´e Huchonb

a Laboratoire de Pale´ontologie, Phyloge´nie et Pale´obiologie, CC064, Institut des Sciences de lÕEvolution UMR 5554/CNRS, Universite´ Montpellier II; Place E. Bataillon, 34 095 Montpellier Cedex 05, France b Department of Zoology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel

Received 1 July 2004 Available online 16 September 2004

Abstract

Rodentia (e.g., mice, , dormice, squirrels, and guinea pigs) and (e.g., , , and ) are usually grouped into the Glires. Status of this controversial superorder has been evaluated using morphology, paleontology, and mitochon- drial plus nuclear DNA sequences. This growing corpus of data has been favoring the monophyly of Glires. Recently, Misawa and Janke [Mol. Phylogenet. Evol. 28 (2003) 320] analyzed the 6441 amino acids of 20 nuclear proteins for six placental (, mouse, , human, cattle, and dog) and two outgroups (chicken and xenopus), and observed a position of the two murine among the former. They concluded that ‘‘the Glires hypothesis was rejected.’’ We here reanalyzed [loc. cit.] data set under maximum likelihood and Bayesian tree-building approaches, using phylogenetic models that take into account among-site variation in evolutionary rates and branch-length variation among proteins. Our observations support both the association of rodents and lagomorphs and the monophyly of (= Supraprimates) as the most likely explanation of the protein alignments. We conducted simulation studies to evaluate the appropriateness of lissamphibian and avian outgroups to root the placental tree. When the outgroup-to-ingroup evolutionary distance increases, maximum parsimony roots the topology along the long Mus-Rattus branch. Maximum likelihood, in contrast, roots the topology along different branches as a function of their length. Maximum like- lihood appears less sensitive to the ‘‘long-branch attraction artifact’’ than is parsimony. Our phylogenetic conclusions were con- firmed by the analysis of a different protein data set using a similar sample of species but different outgroups. We also tested the effect of the addition of afrotherian and xenarthran taxa. Using the linearized tree method, [loc. cit.] estimated that mice and rats diverged about 35 million years ago. Molecular dating based on the Bayesian relaxed molecular clock method suggests that the 95% credibility interval for the split between mice and rats is 7–17 Mya. We here emphasize the need for appropriate models of sequence evolution (matrices of amino acid replacement, taking into account among-site rate variation, and independent parameters across independent protein partitions) and for a taxonomically broad sample, and conclude on the likelihood that rodents and lagomorphs together constitute a monophyletic group (Glires). 2004 Elsevier Inc. All rights reserved.

Keywords: Rodentia; Lagomorpha; Glires; Maximum likeliood; Bayesian inference

What, if anything, is a rabbit? squirrels, and guinea pigs) and the lagomorphs (e.g., (Wood, 1957) rabbits, hares, and pikas). Rodentia and Lagomorpha are usually clustered to form the Glires based on 1. Introduction the shared traits of enlarged, ever-growing incisors adapted for gnawing. Rodents possess one pair of inci- Two orders of placental mammals are typically small sors on the upper jaw while Lagomorphs possess two gnawing : the rodents (e.g., mice, rats, dormice, pairs. For this reason, Rodentia have also been termed and Lagomorpha Duplicidentata. * Corresponding author. Fax: +33 4 67 14 36 10. The taxon Glires, first proposed by Linnaeus in 1758, E-mail address: [email protected] (E.J.P. Douzery). remains controversial after literally centuries of debate.

1055-7903/$ - see front matter 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.ympev.2004.07.014 E.J.P. Douzery, D. Huchon / Molecular Phylogenetics and Evolution 33 (2004) 922–935 923

In 1957, Albert E. Wood reviewed the many conflicting the two murine rodents. They indicated that ‘‘all theories in an article titled, ‘‘What, if anything, is a branches of [their] tree are well supported except by rabbit?’’ Based on morphological characters, Lagomor- the Bayesian method with rate variation,’’ but con- pha has been variously considered as related to Tric- cluded that ‘‘the Glires hypothesis was rejected.’’ onodonta (a group of Mesozoic mammals), We respond to the study by Misawa and Janke (2003) Marsupialia, Artiodactyla, Condylarthra (fossil by reanalyzing their protein alignments under probabi- ‘‘ungulates’’), , Insectivora, Macroscelidea, or listic tree-building approaches. Using phylogenetic mod- Scandentia (reviewed by Wood, 1957; Li et al., 1987). els that take into account among-site rate variation, However, the Glires concept has never been rejected, mouse, rat, and rabbit sequences appear monophyletic. since none of the alternative hypotheses appears satis- However, the data do not contain placental taxa that factory. The notion of Glires has been revived by pale- are likely to disrupt the monophyly of Glires (e.g., ontological discoveries, notably the Eurymilidae and Scandentia: Tupaia)(Waddell and Shelley, 2003). We Mimotonidae (fossil mammals considered to be close also investigate whether the use of distant—lissamphib- to the ancestral stock of Rodentia and Lagomorpha, ian and avian—outgroups is appropriate to root the pla- e.g., Li et al., 1987). Almost at the same time, this notion cental tree. By simulation, we show that maximum was also challenged by the first extensive multigene anal- parsimony artefactually roots the topology along the yses and studies of the mitochondrial genome (e.g., long Mus-Rattus branch, whereas maximum likeli- DÕErchia et al., 1996; Graur et al., 1991; Graur et al., hood—a more robust tree-building approach (Gaut 1996). These papers suggested that rodents might be and Lewis, 1995)—is less sensitive to this undesirable paraphyletic and unrelated to lagomorphs. This conclu- property, even for distant outgroups. Our reanalysis sion led morphologists to reconsider the position of lag- shows that Misawa and JankeÕs misleading conclusions omorphs. Recent studies support a close relationship are likely to be the result of phylogenetic analyses based between rodents and lagomorphs (e.g., Landry, 1999; on models of sequence evolution that fit the data poorly, Luckett and Hartenberger, 1993; Meng et al., 1994; and on the use of outgroups that are too divergent. Cor- Wallau et al., 2000). In parallel, it has been argued that recting for these problems suggests the monophyly of conclusions from molecular data about the paraphyly of Glires. Glires reflect small species samples, the use of oversim- plified models of evolution, tree rooting problems, or long-branch attraction artifacts in tree construction 2. Materials and methods (e.g., Adachi and Hasegawa, 1996; Halanych, 1998; Lin et al., 2002a; Philippe, 1997; Philippe and Douzery, 2.1. Data sets 1994; Reyes et al., 2004; Sullivan and Swofford, 1997). Many recent molecular studies indicate that Glires are Four data sets were analyzed, the nuclear and mito- monophyletic. Some individual nuclear genes such as chondrial ones of Misawa and Janke (2003) as well as pepsinogen C (Narita et al., 2001), IRBP (DeBry and two data sets based on Murphy et al. (2001b). The nu- Sagel, 2001; Stanhope et al., 1992), e-globin, and GHR clear data set of Misawa and Janke (2003) includes 20 (Waddell and Shelley, 2003) favor the monophyly of proteins (6441 amino-acid sites) and the mitochondrial Glires, as well as various combined nuclear and data set the 12 H-strand encoded proteins (3559 sites). mitochondrial data sets. Lagomorphs join rodents The Murphy et al. (2001b) data sets were chosen in or- in analyses of 5.7 kb for 28 mammals (Madsen et al., der to study the impact of taxon sampling on the mono- 2001), 2.4 kb for 42 taxa (Huchon et al., 2002), 5.1 kb phyly of Glires, because they include the sequences of for 50 taxa (Delsuc et al., 2002), 9.8 kb for 64 taxa mouse, rat, rabbit, human, dog, and bovine (boreoeu- (Murphy et al., 2001a), 16.4 kb for 44 taxa (Murphy therian placentals), and marsupial outgroups, with afro- et al., 2001b), 17 kb for 44 taxa (Amrine-Madsen therian and xenarthran (non-boreoeutherian) placentals et al., 2003), and proteins for 38–47 taxa (Waddell et al., included or excluded. Because Misawa and Janke (2003) 2001). The monophyly of Glires is also supported by worked on protein sequences, the data set of Murphy the analysis of indels in nuclear genes (de Jong et al., et al. (2001b) was translated into protein. Some minor 2003). corrections were made to the DNA alignments of Following the approach of Graur et al. (1991, 1996); Murphy et al. (2001b), to better take into account the Misawa and Janke (2003) concatenated proteins, maxi- reading frame. The concatenated protein alignment mized sequence length with no missing data at the ex- based on Murphy et al. (2001b) included 15 protein- pense of broad taxon sampling, and analyzed 6441 coding genes for a total of 4333 amino acids: A2AB, amino-acid sites for six placental mammals (rat, mouse, ADORA3, ADRB2, ATP7A, BDNF, BRCA1, CNR1, rabbit, human, cattle, and dog) and two outgroups EDG1, IRBP, PNOC, RAG1, RAG2, TYR, vWF, (chicken and xenopus). The authors surprisingly ob- and ZFX. Two data sets were considered: an 8-taxon served a basal position within placental mammals of matrix and a 10-taxon matrix. The 8-taxon matrix is 924 E.J.P. Douzery, D. Huchon / Molecular Phylogenetics and Evolution 33 (2004) 922–935 composed of the former 6 boreoeutherians with two log-likelihood between the 105 possible rootings of ro- marsupial outgroups: Didelphis and a diprotodontian. dents, lagomorphs, primates, carnivores, and cetartio- The 10-taxon matrix is the same as the 8-taxon matrix, dactyls by Xenopus and Gallus was evaluated by the with the addition of two taxa: the elephant representing non-parametric test of Kishino and Hasegawa (1989), the Proboscidea and sloth representing the Xenarthra. conducted with the conservative Shimodaira (2002) cor- These sequences were added in order to include repre- rection for multiple tree comparisons (SH test). The dif- sentative taxa from the four major placental . Se- ference in log-likelihoods of the best root [R] and the quence alignments are available upon request. alternative 104 rootings [A](d =lnL[R]ÀlnL[A]), the standard error (r) of this difference, and the confidence 2.2. Bayesian inference of phylogeny PSH values were computed by PAML 3.1. Because of the conservative behavior of the SH test The nuclear data set of Misawa and Janke (2003),and (Shimodaira, 2002), we evaluated the statistical signifi- the reduced 8- and 10-taxon versions of the Murphy et al. cance of lnL decrease for alternative rooting by using (2001b) data set were analyzed using the Bayesian the SOWH test (Swofford et al., 1996), based on Monte approach implemented in MrBayes version 3.0b4 (Ron- Carlo simulation, and also known as parametric boot- quist and Huelsenbeck, 2003), under a JTT model of strap. We used PSeq-Gen (Grassly et al., 1997), version protein evolution (Jones et al., 1992). Among-site rate 1.1, to generate 100 matrices of 8 taxa and 6441 amino variation was described by a discrete Gamma distribu- acids, simulated under a JTT + F + C4 model of protein tion with four categories (C4)(Yang, 1996). Four evolution. The reference topology was ((human: Metropolis-coupled Markov chains Monte Carlo 0.059608, ((rabbit: 0.080175, (mouse: 0.035348, rat: (MCMCMC) were run for 100,000 generations with 0.030112) :0.110667): 0.019303, (dog: 0.062351, bovine: the program default prior distributions, and trees were 0.086717): 0.020118): 0.012799): 0.305140, chicken: sampled every 20 generations. The choice of the total 0.226552, and xenopus: 0.783356). This tree, where pri- number of generations represents a compromise between mates are basal, corresponded to the best alternative un- computing time and rapidity of MCMCMC convergence der the constraint that the root is not on the branch due to the limited number of taxa (7 ingroups plus one leading to the dog plus bovine clade. For each of the outgroup). Among the 5000 saved trees, the number to 100 matrices obtained, PAML was used to calculate be discarded as a burn-in was determined empirically the lnL of the ‘‘basal Primates’’ and ‘‘Glires’’ trees, with after checking for log-likelihood (lnL) stationarity. The a full optimization of the a parameter of the Gamma reliability of nodes was measured by their posterior distribution. Because of the a priori definition of the probabilities (PP). Topologies were also compared topologies compared, this version of the SOWH test cor- according to their tree posterior probability (TPP). responded to the mnemonic code ‘‘priPfud’’ defined by To circumvent the problem of overcredibility of Goldman et al. (2000). Bayesian posterior probabilities for measuring the The same analysis was repeated with 100 simulations reliability of tree nodes (Suzuki et al., 2002; Waddell under the reference topology corresponding to the et al., 2001, 2002), bootstrapped posterior probabilities second best root (murine rodents are basal): ((((human: (BootPP) were computed, i.e., PP estimated by 0.070593, (dog: 0.062105, bovine: 0.086517): 0.020670): MCMCMC after bootstrap resampling of characters 0.017181, rabbit: 0.081891): 0.018747, (mouse: (Douady et al., 2003; Waddell et al., 2002). Here, we 0.035158, rat: 0.030281): 0.094743): 0.297491, chicken: computed BootPP after 50 bootstrap replicates on ami- 0.223176, xenopus: 0.777447). PAML was again used no acids, conducted exactly under the same conditions to calculate the lnL of the ‘‘basal murines’’ and ‘‘Glires’’ as those of the original MCMCMC run. Half of the trees. sampled trees were discarded as a burn-in to ensure that MCMCMC reached stationarity. Following Douady 2.4. Separate versus combined analysis et al. (2003) and Waddell et al. (2002), BootPP were cal- culated by the majority rule consensus of the 50 · 2500 The program COMBINE (Pupko et al., 2002) was remaining trees. used to study the effect of the ML model used to com- bine the different genes on the highest likelihood phylog- 2.3. ML inference of phylogeny eny obtained. The nuclear and the mitochondrial amino acid data sets of Misawa and Janke (2003) were ana- The nuclear data set of Misawa and Janke (2003) was lyzed under four different models of sequence evolution. also analyzed under maximum likelihood (ML) by Three methods of branch length estimation were consid- PAML (Yang, 1997), version 3.1, and using the ered: the concatenate model assumed the same set of JTT + F + C4 model. The option ‘‘+F’’ allows incorpo- branch lengths for all proteins; the proportional model ration of the amino acid frequencies of the protein con- assumed that branch lengths are proportional among sidered. The statistical significance of the differences in proteins; and the separate model assumed a different E.J.P. Douzery, D. Huchon / Molecular Phylogenetics and Evolution 33 (2004) 922–935 925 set of branch lengths for each protein. For the concate- 2.5. Simulation study of the impact of outgroup distance nate model, two models of among-site rate variation to the ingroup were used: the homogeneous model that assumes that all sites have the same rate of evolution; and the hetero- An analysis of the impact of the evolutionary dis- geneous Gamma model (one a parameter was used for tance between the ingroup and the outgroup was con- the concatenate data set, i.e., the ‘‘one-alpha’’ model). ducted on the nuclear data set of Misawa and Janke For the proportional and the separate models, we as- (2003). Based on the JTT + F + C4 model of protein sumed different Gamma distributions for each protein, evolution, we used PSeq-Gen 1.1 to generate five sets to describe among-site rate variation (different a param- of 100 matrices – 8 taxa · 6441 amino acids, with eters were considered for each gene, i.e., the ‘‘N-alpha’’ empirical amino acid frequencies deduced from the ori- model). The concatenate model without among-site rate ginal data set, Gamma parameter a = 0.63, and branch variation and the concatenate model with a single a lengths imported from the best tree identified by parameter, are the models used by Misawa and Janke PAML (‘‘Glires tree’’): (((human: 0.079, (rabbit: (2003). The proportional and the separate models with 0.080, (mouse: 0.035, rat: 0.030): 0.111): 0.019): N alpha parameters are the best models identified by 0.010, (dog: 0.064, bovine: 0.087): 0.011): 0.308, chick- Pupko et al. (2002) to combine sequences. en: 0.229, xenopus: 0.782) – each under the same The nuclear genes were analyzed under the JTT mod- topology, but with five varying lengths for the el and the mitochondrial genes under the mtREV model branches leading to the ingroup, to Gallus, and to (Adachi and Hasegawa, 1996). Three topologies were Xenopus. We changed the distance of the outgroup compared under the four models (Fig. 1): the best nucle- to the ingroup by rescaling these three branches by ar topology of Misawa and Janke (2003) (‘‘basal-rodents factors 0.01, 0.10, 1.00 (hence corresponding to the ori- tree’’), the best mitochondrial topology of Misawa and ginal data), 10, and 100. Then, PAML 3.1 was used to Janke (2003) (‘‘mitochondrial tree’’), and the topology compare the seven possible rootings along the that supports the monophyly of Glires (‘‘Glires tree,’’ branches of the model tree, for each of the 5 · 100 e.g., Madsen et al., 2001; Murphy et al., 2001a; Murphy simulated matrices, and under two optimality criteria. et al., 2001b; Huchon et al., 2002; Delsuc et al., 2002; These comparisons were first conducted under maxi- Amrine-Madsen et al., 2003 ; Waddell et al., 2001; Wad- mum parsimony (MP) by measuring the number of ex- dell and Shelley, 2003). For each topology, the a param- tra steps required to observe the alternative topologies eter of the Gamma distribution was estimated with the relative to the most parsimonious one; and then under program COMBINE. This program uses a discrete ML by measuring the d/r ratio of the lnL difference Gamma distribution with four categories. The different between two competing topologies (d) and its standard trees were compared under the same model using one- error (r). For both approaches, the number of times tailed Kishino–Hasegawa test (KH test; Kishino and the true (simulated) topology was recovered was Hasegawa, 1989). The correction of Shimodaira (2002) recorded. for multiple tree comparisons is not implemented in the program COMBINE. To compare the different 2.6. Relaxed molecular clock dating models, the Akaike Information Criterion (AIC) was used (Sakamoto et al., 1986). The statistical difference To estimate divergence ages among placentals using between two AIC values was evaluated using the test the nuclear protein data set of Misawa and Janke of Linhart (1988) which is similar to the KH test (2003), we applied the Bayesian relaxed molecular clock (Kishino and Hasegawa, 1989). approach of Kishino et al. (2001) and Thorne et al.

Fig. 1. Topologies compared under four different models for combining different genes. (A) ‘‘Basal-rodents tree’’ following Misawa and Janke (2003). (B) ‘‘Mitochondrial tree’’ following Misawa and Janke (2003). (C) ‘‘Glires tree.’’ 926 E.J.P. Douzery, D. Huchon / Molecular Phylogenetics and Evolution 33 (2004) 922–935

(1998), implemented in the DIVTIME 5b package. 3. Results and discussion Following Misawa and Janke (2003) and others (e.g., Kumar and Hedges, 1998), we used 310 Mya for the 3.1. Bayesian inference of phylogeny based on nuclear split between the chicken and the placentals, by con- sequences straining the root age of the ingroup to be bounded be- tween 309 and 311 Mya (but see Reisz and Mu¨ller, The MCMCMC analysis of the 20 concatenated nucle- 2004). The program ESTBRANCHES (Thorne et al., ar proteins under a model incorporating among-site rate 1998) was used to estimate the variance–covariance ma- variation reached stationarity very quickly (Fig. 2). Inde- trix of branch lengths under a JTT model of protein evo- pendent runs with varying number of generations lution. Then, the program DIVTIME (Thorne et al., (100,000, 300,000, or 1,000,000) and tree sampling (every 1998) was used to estimate the posterior divergence 20 or 100 generations) yielded exactly the same topology times after 10,000 sampling of the MCMC – each sam- and posterior probabilities. The maximum posterior ple separated by 100 cycles – and preceded by 100,000 probability (MAP) tree is depicted in Fig. 3. Clade poster- cycles of burn-in. The following priors for Gamma dis- ior probabilities support the association of Oryctolagus tributions were used: 310 ± 155 Mya for root age in the with Mus + Rattus (Glires: PP = 1.00), Homo + Glires absence of constraints on node times, 0.10 ± 0.05 amino (PP = 0.80), and Bos + Canis (PP = 0.99). In Fig. 2,we acid replacements/Mya/100 sites at root node, and see that the Markov chains massively visited two topolo- 0.003 ± 0.003 for the rate autocorrelation per Mya. gies, the MAP one (i.e., the ‘‘Glires topology’’), and the The maximum number of time units between root and same topology but with a rooting on the branch leading tip was set to 540 Mya. To evaluate whether different to the primates (i.e., the ‘‘mitochondrial topology’’). This methods of clock relaxation gave similar results, the can be quantified by the tree posterior probabilities, which same analysis with the same paleontological constraint range from 0.794 for the MAP tree, to 0.198 (Homo root- was performed under the non-parametric rate smooth- ing), and then to 0.004 (Bos rooting), 0.003 (Mus + Rattus ing approach (Sanderson, 1997) implemented in the rooting), and 0.001 (Canis rooting) (Fig. 3). software r8s, with optimization via PowellÕs method. The exact relationship between Bayesian posterior The optimal exponential penalty was 1.05 for smoothing probabilities and bootstrap percentages is still poorly the local estimates of substitution rates from a branch to understood (Alfaro et al., 2003; Cummings et al., the neighboring branch. 2003; Douady et al., 2003; Waddell et al., 2002).

Fig. 2. Log-likelihood (lnL) of the trees as a function of the number of MCMCMC generations for the nuclear data set of Misawa and Janke (2003), comprising 20 concatenated proteins. The topologies visited by the chains actually corresponded to five different rootings of the placental tree ((Bos, Canis), (Homo,(Oryctolagus,(Mus, Rattus)))), and are indicated by black circles (root on the branch leading to euarchontoglires, i.e., the ‘‘Glires tree’’), white triangles (root on the branch leading to human, i.e., the ‘‘mitochondrial tree’’), gray diamonds (root on the branch leading to murine rodents, i.e., the ‘‘basal-rodents tree’’ found by Misawa and Janke (2003) for the same data set), gray squares (root on the branch leading to bovine), and gray circles (root on the branch leading to dog). The former two topologies are massively favored by the Bayesian analysis. E.J.P. Douzery, D. Huchon / Molecular Phylogenetics and Evolution 33 (2004) 922–935 927

(BootPP = 0.84), whereas Primates stands in trifurca- tion relative to the former two clades (BootPP = 0.39). Though its bootstrapped PP is not maximal, the Glires clade is retrieved, which is in sharp contrast with the conclusions of Misawa and Janke (2003) obtained under NJ, MP, ML, quartet puzzling, and Bayesian analysis without rate variation on the same data set.

3.2. Separate versus combined ML analysis

It has been shown that the model used to combine different sequences under the ML criterion can have an impact on the best phylogeny obtained (Pupko et al., 2002). For this reason, we tried to identified the Fig. 3. Maximum posterior probability tree obtained after 100,000 most appropriate model for the nuclear and the mito-

MCMCMC generations under the JTT + C4 model for the 20 concat- chondrial data sets of Misawa and Janke (2003). enated nuclear proteins of Misawa and Janke (2003). For each node, the For the nuclear data set, the best model is the sepa- posterior probability (PP) is given, together with its bootstrapped PP rate model with 20 a parameters (Table 1). The differ- (BootPP in bold) computed after 50 replicates. Arrows indicate the five ences in AIC between the best model and the others is best rootings of the placental tree, with italicized values corresponding to the tree PP (TPP). The marginal log-likelihood of the tree is highly significant (PAIC < 0.001; data not shown). Under lnL = À54391.0, with Gamma parameter a = 0.63 (95% credibility this model, the nuclear data set favors the ‘‘Glires tree’’ interval: 0.58–0.67), and total tree length = 1.89 (1.81–1.97). Branch relative to the ‘‘mitochondrial tree’’ and the ‘‘basal-ro- lengths are proportional to the expected number of replacements per dents tree’’ (Fig. 1). In order to verify that the ‘‘Glires each of the 6441 amino acid (AA) sites. The length of the branch leading tree’’ was indeed the best tree we computed, under the to Xenopus was reduced by a factor of three. separate model, the likelihood of the 15 trees that de- scribe all the possible relationships between the five fol- Credibility values of the Bayesian phylogenetic inference lowing taxa: Rodentia (Mus + Rattus), Oryctolagus, seem to constitute an overestimated index of the reliabil- Homo, Laurasiatheria (Canis + Bos), and the outgroups ity of tree nodes under some circumstances (Cummings (Xenopus, Gallus). The ‘‘Glires tree’’ remains the best et al., 2003; Douady et al., 2003; Suzuki et al., 2002; tree among the 15 others, although the Kishino–Hase- Waddell et al., 2001) whereas bootstrap percentages gawa test indicates that the differences in likelihood might be too conservative (Hillis and Bull, 1993; Wilcox are not significant. The ‘‘Glires tree’’ is not significantly et al., 2002). To circumvent this problem, some authors better than the ‘‘basal-rodents tree’’ (PKH = 0.17), but is have suggested computing bootstrapped posterior prob- significantly better than the ‘‘mitochondrial tree’’ abilities (Douady et al., 2003; Waddell et al., 2002). (PKH = 0.02). Similar results were obtained with the While time consuming, this approach helps to discrimi- proportional model with twenty a parameters, and with nate between moderately and strongly supported nodes the concatenate model with one a parameter. The for which initial PP stands above 0.95. Here, the boot- ‘‘Glires tree’’ was not significantly better than the ‘‘bas- strapped Bayesian analysis is consistent with the mono- al-rodents tree’’ (PKH = 0.33 under the proportional phyly of Glires (BootPP = 0.72, Fig. 3)andBos + Canis model, and PKH = 0.34 under the concatenate model),

Table 1 Akaike Information Criterion (AIC) values for the 20 nuclear proteins data set of Misawa and Janke (2003) Branch length model Concatenate Concatenate Proportional Separate Among-site rate variation model Homogeneous One alpha 20 alpha 20 alpha df 13 14 52 280 Mitochondrial tree lnL À55670.02 À54435.76 À53561.77 À52795.58 AIC 111366.03 108899.52 107227.54 106151.15 Basal-rodents tree lnL À55571.40 À54393.13 À53528.98 À52761.46 AIC 111168.80 108814.27 107161.95 106082.91 Glires tree lnL À55608.09 À54386.81 À53522.41 À52745.53 AIC 111242.19 108801.61 107148.81 106051.06 df is the number of degrees of freedom, lnL is the Log-likelihood value, and AIC is À2 · lnL +2· df. ‘‘One alpha’’ indicates that a single a parameter of the Gamma distribution was used under the concatenate model; ‘‘20 alpha’’ indicates that different a parameters were used for each gene under the proportional and the separate models. The best AIC value is indicated in bold. The best lnL and AIC values are italicized for each model. 928 E.J.P. Douzery, D. Huchon / Molecular Phylogenetics and Evolution 33 (2004) 922–935 but it was significantly better than the ‘‘mitochondrial The mitochondrial genome based-analysis did not sup- tree’’ (PKH = 0.03 under the proportional model, and port the monophyly of Glires (Table 2). There is currently PKH = 0.009 under the concatenate model). When the no consensus concerning the phylogenetic positions of homogeneous model was used with the concatenate lagomorph and taxa based on mitogenomic anal- model the best tree was the ‘‘basal-rodents tree,’’ in ysis. Few mitochondrial studies support the monophyly agreement with Misawa and Janke (2003). However, of Glires once the tree is rooted (e.g., Cao et al., 2000; Pup- KH tests indicate that this tree is not significantly better ko et al., 2002; Reyes et al., 2004; Waddell et al., 2001), than the ‘‘Glires tree’’ (PKH = 0.07). This result is at var- many do not support it (e.g., Arnason et al., 2002; Schmitz iance with the high bootstrap values that support the et al., 2000), and the likelihood of alternative topologies is ‘‘basal-rodents tree’’ in Misawa and Janke (2003). The usually not significantly different (Arnason et al., 2002; mitochondrial tree was also rejected under the homoge- Cao et al., 2000; Pupko et al., 2002). This discrepancy neous model (PKH = 0.001). might come from the fact that mitochondrial sequences For the mitochondrial data set, there was no statisti- seem to contain less phylogenetic signal than the nuclear cal difference between the separate model with 12 a genes at deep taxonomic levels (Springer et al., 2001; parameters and the proportional model with 12 a but see Reyes et al., 2004). A complementary explanation parameters (PAIC = 0.15). However, these two models might come from the observation that ‘‘the placement of were significantly better than the concatenate model the outgroup sequence can have a confounding effect on with one a parameter or with the homogeneous model the ingroup tree, whereby the ingroup is correct when (PAIC < 0.001). Because the predictions based on a com- using the ingroup sequences alone, but with the inclusion plex model are more prone to error than those based on of the outgroup the ingroup tree becomes incorrect’’ a simpler model (Nei and Kumar, 2000, p. 154), the best (Holland et al., 2003). In their mitochondrial analysis, model for the mitochondrial data set was the propor- Lin et al. (2002a,b) found monophyly of Glires in their tional model with 12 a parameters. In opposition to unrooted tree, but rooting the tree by marsupials and a Misawa and Janke (2003), our analysis of the mitochon- monotreme then induced a shift in the root location, that drial data set always indicated that the ‘‘basal-rodents led to the paraphyly of Glires. However, it has recently tree’’ was the best tree relative to ‘‘the mitochondrial been shown that a better agreement between mito- tree’’ and the ‘‘Glires tree’’ (Table 2). This discrepancy chondrial and nuclear topologies is found when com- between the observation of Misawa and Janke (2003) prehensive taxon sampling is used, compositional bias and our results might result from the fact we used the is taken into account, and robust methods are used mtREV model. Under the JTT model of amino acid (Reyes et al., 2004). substitution and the concatenate homogeneous model, About the effect of different ML models of branch the ‘‘mitochondrial tree’’ (lnL = À25951.76) was the length estimation on the inferred phylogeny, our results best tree compared with the ‘‘basal-rodents tree’’ on the nuclear and mitochondrial proteins confirm the (lnL = À25956.96) and the ‘‘Glires tree’’ (lnL = observations of Pupko et al. (2002). A concatenate À25989.03). However, KH tests indicated that the model is less appropriate for phylogenetic analysis than ‘‘basal-rodents tree’’ was not significantly better than are separate or proportional models. Unfortunately, the two others when among-site rate variation is taken these models are time consuming and not available in into account (PKH > 0.05), whatever the model consid- most phylogenetic analysis packages. Additionally, ered. Only under the concatenate homogeneous model models that account for among-site rate variation here did the ‘‘Glires tree’’ appear to be significantly less likely appear significantly better than the homogeneous model than the ‘‘basal-rodents tree’’ (PKH = 0.04). (PAIC < 0.001).

Table 2 Akaike Information Criterion (AIC) values for the 12 mitochondrial protein data set of Misawa and Janke (2003) Branch length model Concatenate Concatenate Proportional Separate Among-site rate variation model Homogeneous One alpha 12 alpha 12 alpha df 13 14 36 168 Mitochondrial tree lnL À24618.68 À23931.57 À23767.79 À23614.30 AIC 49263.37 47891.14 47607.58 47564.61 Basal-rodents tree lnL À24610.63 À23920.05 À23758.09 À23607.91 AIC 49247.27 47868.10 47588.19 47551.82 Glires tree lnL À24646.17 À23929.87 À23767.57 À23616.30 AIC 49318.34 47887.74 47607.14 47568.59 df is the number of degrees of freedom, lnL is the Log-likelihood value, and AIC is À2 · lnL +2· df. ‘‘One alpha’’ indicates that a single a parameter of the Gamma distribution was used under the concatenate model; ‘‘12 alpha’’ indicates that different a parameters were used for each gene under the proportional and the separate models. The best AIC value is indicated in bold. The AIC value under the separate model is not statistically better than the AIC value under the proportional model. The best lnL and AIC values are italicized for each model. E.J.P. Douzery, D. Huchon / Molecular Phylogenetics and Evolution 33 (2004) 922–935 929

3.3. Analysis of alternative rooting evaluate the statistical significance of the difference in lnL between the best tree and the two first alternative The SH test for multiple tree comparisons was used rootings. The tests rejected the two alternative rootings to evaluate the significant differences among alternative at P < 0.01, as the lnL of the most likely rooting was rooting. The null hypothesis is that all 105 rootings com- never greater than the lnL of the alternative rootings pared—including the ML one—are equally good expla- used to simulate the 100 parametric bootstrap replicates. nations of the data. We computed the log-likelihood of the 105 possibilities of connecting rodents, lagomorphs, 3.4. Simulations: effect of distance between outgroup and primates, carnivores, and cetartiodactyls into a bifurcat- ingroup ing subtree, rooted by Xenopus and Gallus. These 15 · 7 = 105 rootings arise because there are 15 possible To test whether the lissamphibian (Xenopus) and avian unrooted trees with five placental orders, each with (Gallus) outgroup was too distant from the ingroup to 2 · 5 À 3 = 7 branches, i.e., seven potential locations provide a reliable root, a Monte Carlo simulation study for the root. The best rooting—i.e., the one yielding was conducted. We evaluated the ability of MP and the highest likelihood—was the ‘‘Glires tree.’’ The root ML to recover the simulated topology after 100 replicates was along the branch separating Carnivora + Cetartio- by varying the outgroup-to-ingroup distance. The dactyla—members of Laurasiatheria—from Glires + lengths of the three branches leading to xenopus, chicken, Primates—members of Euarchontoglires (= Supra- and placentals were thus rescaled from 0.01 to 100 times primates) (Fig. 4). The 104 other rootings were ranked (Table 3). While the original tree corresponds to a scaling by increasing level of statistical rejection by the SH test factor of 1.00, the situation for which the outgroup was (PSH values ranging from 0.97 to less than 0.001). very close to the direct placental ancestor corresponded Among them, 85 were statistically rejected to a scaling factor of 0.01. Conversely, a random rooting (PSH < 0.05). The 19 other rootings occur along the behavior of the outgroup was simulated by a scaling branches of three unrooted trees among 15 possible factor of 100, and we should expect a random choice ones. In these three unrooted trees, Rodentia, Lagomor- of ingroup branches, proportionally to their length pha, and Primates are directly connected (Fig. 4). The (Wheeler, 1990). Scaling factors of 0.1 and 10 represented third best likelihood score corresponded to the rooting two intermediates between the two extreme situations. observed by Misawa and Janke (2003). For scaling factors of 0.01 and 0.1, both MP and ML The first and second best alternative rootings (Fig. 4) recovered the true ingroup root (Table 3). For very low differed from the best rooting by, respectively, 1.5 and outgroup-to-ingroup distances (scaling factor of 0.01), 5.8 units of lnL. Two SOWH tests were conducted to alternative roots were all rejected, by D = 47 to 136 extra

Fig. 4. Confidence levels of the Shimodaira–Hasegawa test for the location of the root of the placental tree based on the nuclear data set of Misawa and Janke (2003). The 105 possible rootings of the placental tree by Xenopus and Gallus were evaluated and ranked according to their increasing level of rejection. Log-likelihoods were computed under the JTT + F + C4 model. 85 rooting possibilities were statistically rejected. The 19 remaining positions of the root, which are not statistically rejected, are located on three unrooted trees. Those unrooted trees always group rodents, lagomorphs and primates. The circled numbers on the unrooted trees indicate the various alternatives positions of the root. The numbers inside the circle, and the italicized numbers on histogram bars, reflect the ranking of the rooted trees according to their [PSH]. The star indicates the best tree according to Misawa and Janke (2003). The vertical dashed line corresponds to the PSH = 0.05 rejection level. CAR, CET, LAG, PRI, and ROD stand for Carnivora, Cetartiodactyla, Lagomorpha, Primates, and Rodentia, respectively. 930

Table 3 Impact of the evolutionary distance between the lissamphibian and avian outgroup and the ingroup for recovering the simulated topology under maximum parsimony (MP) and maximum likelihood (ML) based on the nuclear protein data set of Misawa and Janke (2003) ...Duey .Hco oeua hlgntc n vlto 3(04 922–935 (2004) 33 Evolution and Phylogenetics Molecular / Huchon D. Douzery, E.J.P.

Scaling = 0.01 Scaling = 0.1 Scaling = 1 (real) Scaling = 10 Scaling = 100 MP ML MP ML MP ML MP ML MP ML True root [4%] 100% 100% 100% 100% 67% 100% 0% 41% 0% 0% D =0 d/r = 0 0 0 2 ± 3 0 40 ± 10 0.2 ± 0.3 43 ± 10 0.8 ± 0.6 Root #1 [15%] 0% 0% 0% 0% 10% 0% 1% 16 % 0% 10% 47 ± 7 4.8 ± 0.5 43 ± 8 4.5 ± 0.5 12 ± 8 2.9 ± 0.5 29 ± 11 0.6 ± 0.5 31 ± 11 0.8 ± 0.6 Root #2(*)[21%] 0% 0% 0% 0% 4% 0% 88 % 3% 90% 18% 136 ± 12 7.7 ± 0.5 119 ± 12 7.4 ± 0.4 24 ± 12 5.7 ± 0.5 0 ± 1 1.3 ± 0.7 0 ± 2 0.7 ± 0.6 Root #3 [17%] 0% 0% 0% 0% 15% 0% 7% 6% 8% 32% 51 ± 7 4.5 ± 0.4 44 ± 7 4.3 ± 0.4 9 ± 7 3.1 ± 0.5 17 ± 12 0.7 ± 0.5 21 ± 13 0.6 ± 0.6 Root #4 [4%] 0% 0% 0% 0% 2% 0% 0% 14% 0% 1% 47 ± 7 4.8 ± 0.4 43 ± 8 4.5 ± 0.5 17 ± 8 2.9 ± 0.5 36 ± 9 0.6 ± 0.5 39 ± 8 1.0 ± 0.9 Root #5 [15%] 0% 0% 0% 0% 0% 0% 3% 3% 2% 21% 135 ± 12 7.7 ± 0.5 120 ± 11 7.4 ± 0.5 38 ± 12 5.7 ± 0.5 21 ± 9 1.3 ± 0.7 20 ± 10 0.7 ± 0.6 Root #6 [12%] 0% 0% 0% 0% 2% 0% 1% 17% 0% 18% 49 ± 7 4.5 ± 0.4 43 ± 7 4.3 ± 0.4 16 ± 7 3.1 ± 0.5 30 ± 13 0.7 ± 0.6 33 ± 12 0.7 ± 0.6 Outgroup-to-ingroup distances are expressed as a scaling factor (ranging from 0.01 to 100) for the three (dashed) branches leading to Xenopus and Gallus. A scaling factor of 1.00 corresponds to the real data. Seven locations of the root are evaluated: the true root is the one used for simulations and corresponds to the highest-likelihood rooting (cf. Fig. 4). The six alternative rootings (1–6) are numbered according to Fig. 4 and correspond to the outgroup branching on Homo (#1), murine rodents (#2), Bos (#3), Glires (#4), Oryctolagus (#5), and Canis (#6). Random rooting of the ingroup by the outgroup would occur at a percentage given between brackets, assuming that each of the nine ingroup branches randomly attracts the root according to its length. One hundred Monte Carlo simulations of 6441 amino acids for eight taxa were conducted for each of the five outgroup-to-ingroup distances. Each entry of the table contains two lines. The first line reports the percentage of times the simulated topology was recovered among the 100 replicates. The second line gives for the MP reconstruction the number of extra steps (D) relative to the best tree found with the simulated data set, or for the ML reconstructions the ratio between the difference in likelihood relative to the best tree and the standard error of this difference (d/r). All values are expressed as mean ± standard error over the 100 replicates. For the scaling factor of 1.00, outgroup-to-ingroup branches (dashed lines) are eight times longer than ingroup branches on average (full lines). (*) Root found by Misawa and Janke (2003). E.J.P. Douzery, D. Huchon / Molecular Phylogenetics and Evolution 33 (2004) 922–935 931 steps, and d/r = 4.5 to 7.7 standardized log-likelihoods. scaling factor. In contrast, and under our simulation These values represent the average differences (over the conditions, ML did not strongly support one given alter- 100 simulations) between the score (number of steps native topology when amino acid information was insuf- under MP, or lnL units under ML) of the simulated ficient for reaching proper conclusions, as was the case topology (root between euarchontoglires and laurasi- here when the outgroup randomly rooted the ingroup. atherians) and the score of the six alternative rootings. The root was placed randomly according to the branch These d/r ratios corresponded to PSH < 0.01. For mod- lengths of the ingroup. Such differences of behavior be- erate outgroup-to-ingroup distances (scaling factor of tween MP and ML have been previously observed by 0.10), alternative rootings were all rejected by D =43 Swofford et al. (2001). Because Misawa and JankeÕs to 120 extra steps and d/r = 4.3 to 7.4 (corresponding (2003) data fall in the area where ML still correctly to PSH < 0.05). Under the conditions of Misawa and localizes the ingroup root, the conclusion of this simula- JankeÕs (2003) data set (scaling factor of 1.00), ML again tion study is that the ML method should confidently recovered the true root in all cases (mean d/r ranged root the placental ingroup, despite the apparent very from 2.9 to 5.7, corresponding to PSH < 0.12). However, long phylogenetic distance between mammals and the the performance of MP began to decrease, with only lissamphibian and avian taxa. 67% of true root recovery. The mean number of extra steps to discriminate among alternative rootings also 3.5. Other nuclear proteins: the Murphy et al. (2001b) diminished (D ranged from 9 to 38). In the case of exces- data set sive outgroup-to-ingroup distance, MP never recovered the true root (Table 3), but massively favored rooting Bayesian analysis of the 15 concatenated proteins along the Mus + Rattus branch. For a scaling factor of from Murphy et al. (2001b) for the same ingroup sam- 10, the performance of ML began to diminish. The true pling as Misawa and Janke (2003) indicated the mono- root was recovered in 41% of the cases, though with lit- phyly of Glires, but Homo falls into an external tle discriminating power (d/r for alternative roots ran- position relative to the five other placentals. When two ged from 0.6 to 1.3, corresponding to PSH of 0.27 to additional placentals were added, one afrotherian and 0.61). For a scaling factor of 100, ML randomly rooted one xenarthran, a tree in which Glires, euarchontoglires, the ingroup, with recovery percentages being approxi- and laurasiatherians are monophyletic was recovered mately those expected under a random rooting propor- (Fig. 5; Waddell et al., 2001). This point emphasizes tionally to ingroup branch lengths. The two preferred the need for denser taxon sampling to increase the reli- rootings were along the bovine and the murine branches, though all seven possibilities behaved simi- larly, with d/r of 0.6–0.8, corresponding to PSH of 0.46–0.57. As expected, the ingroup was more correctly rooted when the length of the branches leading to lissamphib- ian and avian outgroup taxa were shorter (Graham et al., 2002; Huelsenbeck et al., 2002; Wheeler, 1990). A clear difference of behavior in recovering the true root was observed between MP and ML for increasing out- group-to-ingroup distances. MP artificially favored a rooting along the longest ingroup branch—the one lead- ing to murine rodents (cf. Fig. 3)—which might explain the topology observed by Misawa and Janke (2003). For the 6441 amino acid data set, the critical interval of out- Fig. 5. Bayesian analysis of 15 concatenated nuclear proteins from the group-to-ingroup distance scaling where the perfor- data set of Murphy et al. (2001b) with two different taxon samplings mance of MP decreased was between 0.10 and 1.00, among placentals. The search of the maximum posterior probability whereas the performance of ML decreased for out- tree was conducted with and without the afrotherian (elephant) and group-to-ingroup distance scaling between 1.00 and xenarthran (sloth) species. For the 8-taxon data set, the same placentals as in Misawa and Janke (2003) were considered, but a 10.00. More importantly, MP converged towards an closer (marsupial) outgroup was used. For each node, the posterior artifactual rooting with increasing outgroup-to-ingroup probability (PP) is given. The 10-taxon tree (on the left) has a marginal distance by systematically picking-up the root along the log-likelihood of lnL = À35402.8, with Gamma parameter a = 0.54 long Mus–Rattus branch. This is likely to be diagnostic (95% credibility interval: 0.49–0.59), and total tree length = 1.32 (1.26– of a long-branch attraction phenomenon (Felsenstein, 1.38). The 8-taxon tree (on the right) has a marginal log-likelihood of lnL = À31106.1, with Gamma parameter a = 0.55 (95% credibility 1978) of the murine branch by the distant outgroup. interval: 0.50–0.61), and total tree length = 1.11 (1.06–1.17). Branch This feature was reinforced as the branch lengths of lengths are proportional to the expected number of replacements per the lissamphibian and avian taxa were increased by the each of the 4333 amino acid (AA) sites. 932 E.J.P. Douzery, D. Huchon / Molecular Phylogenetics and Evolution 33 (2004) 922–935 ability of phylogenetic inferences (Adachi and Hase- explain the three times more recent estimate obtained gawa, 1995; Lecointre et al., 1993; Philippe and Douz- here for the split between mice and rats. ery, 1994; Zwickl and Hillis, 2002). Among the four Other Bayesian divergence times (Fig. 6) are 55 major placental clades, the elephant and the sloth break Mya for the first split among Glires (95% credibility the long isolated branch ancestral to placentals, and interval = 40–74 Mya), and 77 Mya (60–98 Mya) for facilitate the rooting by more distant taxa like marsupi- the placentals represented here (all boreoeutherians). als (Delsuc et al., 2002) or even avians or These estimates are 1.2–1.5 times younger than those lissamphibians. obtained by Springer et al. (2003) on nuclear genes only. This might be explained by the sparse taxon sampling 3.6. Molecular dating with relaxed clocks considered here, combined with the lack of representa- tives of the two other major placental clades—Afrothe- Our Bayesian divergence ages based on Misawa and ria and Xenarthra—and the very distant fossil reference. JankeÕs (2003) data set show that mice and rats sepa- However, even if our estimate of the mouse/rat split is rated 11 Mya with a 95% credibility interval of 7–17 1.5 times deeper—i.e., 16 Mya—the present molecular Mya (Fig. 6). Using the non-parametric rate smoothing estimate conforms more with the 12–14 Mya of the pale- approach for clock relaxation yields a Mus–Rattus split ontological estimates (Catzeflis et al., 1992) and to the at 13 Mya. These results contrast with the 38–44 Mya 16 Mya of some recent molecular studies based on local estimate found by Misawa and Janke (2003) on the same (Douzery et al., 2003) or relaxed (Adkins et al., 2003; data set but with the linearized tree method of molecular Springer et al., 2003) molecular clocks than to other dating, and with a topology with murines basalmost molecular estimates of 23 Mya (Adkins et al., 2001), among placentals. Springer et al. (2003) observed, for 33 Mya (Nei et al., 2001), 41 Mya (Kumar and Hedges, a larger data set, that the age of the mouse/rat split de- 1998), or 42 Mya (Huchon et al., 2000) based on lineage- pends on the tree rooting. Divergence of the two murid specific or global clock methods. It should be noted that taxa is estimated at 16 Mya when the Afrotheria are ba- all these molecular dating estimates ignore sources of sal in the placental tree (Murphy et al., 2001b)or35 uncertainty due to errors in sequence determination, Mya when the murid rodents are the most basal placen- sequence alignment, model assumptions, topology, tals (Arnason et al., 2002). The difference in topology edge lengths, ancestral polymorphism, and calibration between our best tree (Fig. 3), the topology of which is (Waddell et al., 1999). in agreement with Madsen et al. (2001) and Murphy et al. (2001a,b), but which contrasts with that of the tree published by Misawa and Janke (2003), is likely to 4. Conclusion

Misawa and JankeÕs (2003) phylogenetic conclusions are drawn from congruent results from independent tree-building methods. However, their analyses fail to take among-site rate variation into account. The single situation where Misawa and JankeÕs (2003) results did not support a basal position for rodents is when the tree was inferred using the Bayesian approach with among- site rate variation. Model comparisons using AIC indicate that accounting for among-site rate variation significantly improves the likelihood of the data. When a parameters are included in the model, the best phylo- genetic hypothesis supports the monophyly of Glires, except for the mitochondrial data set. Our ML studies also support the monophyly of Glires. According to sta- tistical tests of alternative topologies (KH and SH tests), we cannot exclude that Glires might be paraphyletic, but Fig. 6. Chronogram of 6 placentals and one avian outgroup, with Bayesian divergence times estimated on the nuclear data set of Misawa this result is hampered by the fact that SOWH tests re- and Janke (2003). For each node, the posterior divergence time is given jected the two best alternative rootings. Contrary to the in millions years (My). The star indicates that the root age was a priori claim of Misawa and Janke (2003), the monophyly of bounded between 309 and 311 Ma, based on the synapsid/diapsid Glires is a valid and likely explanation of the alignment paleontological calibration at 310 Ma. 95% credibility intervals are based on 20 nuclear proteins, whereas a basal position indicated by horizontal rectangles, in white and hatched for lower (recent) and upper (ancient) intervals respectively. Branch lengths are of murine rodents relative to rabbit, human, bovine proportional to time in million years. Branches leading to the chicken and dog is less likely. Additionally, the existence of the and to the placentals were reduced by a factor of four. Glires clade is supported by other molecular data sets, E.J.P. Douzery, D. Huchon / Molecular Phylogenetics and Evolution 33 (2004) 922–935 933 as indicated by our protein analysis of a subset of the se- Chain Monte Carlo sampling and bootstrapping in assessing quences published by Murphy et al. (2001b). It is also phylogenetic confidence. Mol. Biol. Evol. 20, 255–266. supported by morphological (Landry, 1999; Luckett Amrine-Madsen, H., Koepfli, K., Wayne, R.K., Springer, M.S., 2003. A new phylogenetic marker, apolipoprotein B, provides compelling and Hartenberger, 1993; Meng et al., 1994), paleonto- evidence for eutherian relationships. Mol. Phylogenet. Evol. 28, logical (Archibald et al., 2001; but see Fostowicz-Frelik 225–240. and Kielan-Jaworowska (2002) for a different opinion), Archibald, J.D., Averianov, A.O., Ekdale, E.G., 2001. Late Creta- and chromosomal (Stanyon et al., 2003) studies. Our re- ceous relatives of rabbits, rodents, and other extant eutherian sults emphasize once more the importance of accounting mammals. Nature 414, 62–65. Arnason, U., Adegoke, J.A., Bodin, K., Born, E.W., Esa, Y.B., for among-site rate variation (e.g., Sullivan and Swof- Gullberg, A., Nilsson, M., Short, R.V., Xu, X., Janke, A., 2002. ford, 1997; Yang, 1996) in phylogenetic tree reconstruc- Mammalian mitogenomic relationships and the root of the tions based on probabilistic approaches. Based on a eutherian tree. Proc. Natl. Acad. Sci. USA 99, 8151–8156. topology with monophyly of Glires, a Bayesian relaxed Cao, Y., Fujiwara, M., Nikaido, M., Okada, N., Hasegawa, M., 2000. molecular clock dating of the age of the Mus–Rattus Interordinal relationships and timescale of eutherian evolution as inferred from mitochondrial genome data. Gene 259, 149–158. split yields an estimate of 7–17 Mya, compatible with Catzeflis, F.M., Aguilar, J.-P., Jaeger, J.-J., 1992. Muroid rodents: the fossil record. phylogeny and evolution. Trends Ecol. Evol. 7, 122–126. Cummings, M.P., Handley, S.A., Myers, D.S., Reed, D.L., Rokas, A., Winka, K., 2003. Comparing bootstrap and posterior probability values in the four-taxon case. Syst. Biol. 52, 477–487. Acknowledgments de Jong, W.W., van Dijk, M.A., Poux, C., Kappe, G., van Rheede, T., Madsen, O., 2003. Indels in protein-coding sequences of Euarch- Axel Janke and Kazuharu Misawa kindly communi- ontoglires constrain the rooting of the eutherian tree. Mol. Phylogenet. Evol. 28, 328–340. cated their protein alignments. Helpful comments on DeBry, R.W., Sagel, R.M., 2001. Phylogeny of Rodentia (Mammalia) the manuscript were provided by Fre´de´ric Delsuc, Doyle inferred from the nuclear-encoded gene IRBP. Mol. Phylogenet. McKey, and two anonymous reviewers. This work has Evol. 19, 290–301. been supported by the ‘‘Genopole Montpellier Langue- Delsuc, F., Scally, M., Madsen, O., Stanhope, M.J., de Jong, W.W., doc-Roussillon,’’ the ‘‘Action Bioinformatique inter- Catzeflis, F., Springer, M.S., Douzery, E.J.P., 2002. Molecular phylogeny of living xenarthrans and the impact of character and EPST’’ of the CNRS, and by IFR119 ‘‘Biodiversite´ taxon sampling on the placental tree rooting. Mol. Biol. Evol. 19, Continentale Me´diterrane´enne et Tropicale’’ (Montpel- 1656–1671. lier) and INFOBIOGEN (Evry, France) computing DÕErchia, A.M., Gissi, C., Pesole, G., Saccone, C., Arnason, U., 1996. facilities. Additional access to computers was kindly pro- The guinea-pig is not a rodent. Nature 381, 597–600. vided by Herve´ Philippe (Montre´al, Canada), and Mar- Douady, C.J., Delsuc, F., Boucher, Y., Doolittle, W.F., Douzery, E.J.P., 2003. Comparison of Bayesian and maximum likelihood cel Me´chali and Denis Pugne`re (Institut de Ge´ne´tique bootstrap measures of phylogenetic reliability. Mol. Biol. Evol. 20, Humaine de Montpellier [IGH], CNRS UPR 1142). This 248–254. publication is the contribution No. EPML-003 of the Douzery, E.J.P., Delsuc, F., Stanhope, M.J., Huchon, D., 2003. Local Equipe-Projet multi-laboratoires CNRS-STIC ‘‘Me´th- molecular clocks in three nuclear genes: divergence times for odes informatiques pour la biologie mole´culaire’’ and rodents and other mammals, and incompatibility among fossil calibrations. J. Mol. Evol. 57, S201–S213. No. 2004-041 of the Institut des Sciences de lÕEvolution Felsenstein, J., 1978. Cases in which parsimony or compatibility de Montpellier (UMR 5554—CNRS). methods will be positively misleading. Syst. Zool. 27, 401–410. Fostowicz-Frelik, L., Kielan-Jaworowska, Z., 2002. Lower incisor in zalambdalestid mammals (Eutheria) and its phylogenetic implica- tions. Acta Pal. Pol. 47, 177–180. References Gaut, B.S., Lewis, P.O., 1995. Success of maximum likelihood phylogeny inference in the four-taxon case. Mol. Biol. Evol. 12, Adachi, J., Hasegawa, M., 1995. Phylogeny of whales: dependence 152–162. of the inference on species sampling. Mol. Biol. Evol. 12, 177– Goldman, N., Anderson, J.P., Rodrigo, A.G., 2000. Likelihood-based 179. tests of topologies in phylogenetics. Syst. Biol. 49, 652–670. Adachi, J., Hasegawa, M., 1996. Instability of quartet analyses of Graham, S.W., Olmstead, R.G., Barrett, S.C.H., 2002. Rooting molecular sequence data by the maximum likelihood method: the phylogenetic trees with distant outgroups: a case study from the Cetacea/Artiodactyla relationships. Mol. Phylogenet. Evol. 6, 72– commelinoid monocots. Mol. Biol. Evol. 19, 1769–1781. 76. Grassly, N.C., Adachi, J., Rambaut, A., 1997. PSeq-Gen: an appli- Adkins, R.M., Gelke, E.L., Rowe, D., Honeycutt, R.L., 2001. cation for the Monte Carlo simulation of protein sequence Molecular phylogeny and divergence time estimates for major evolution along phylogenetic trees. Comput. Appl. Biosci. 13, rodent groups: evidence from multiple genes. Mol. Biol. Evol. 18, 559–560. 777–791. Graur, D., Duret, L., Gouy, M., 1996. Phylogenetic position of Adkins, R.M., Walton, A.H., Honeycutt, R.L., 2003. Higher-level the order Lagomorpha (rabbits, hares and allies). Nature 379, 333– systematics of rodents and divergence time estimates based on 335. two congruent nuclear genes. Mol. Phylogenet. Evol. 26, 409– Graur, D., Hide, W.A., Li, W.-H., 1991. Is the guinea-pig a rodent?. 420. Nature 351, 649–652. Alfaro, M.E., Zoller, S., Lutzoni, F., 2003. Bayes or bootstrap. A Halanych, K.M., 1998. Lagomorphs misplaced by more characters and simulation study comparing the performance of Bayesian Markov fewer taxa. Syst. Biol. 47, 138–146. 934 E.J.P. Douzery, D. Huchon / Molecular Phylogenetics and Evolution 33 (2004) 922–935

Hillis, D.M., Bull, J.J., 1993. An empirical test of bootstrapping as a radiation using Bayesian phylogenetics. Science 294, method for assessing confidence in phylogenetic analysis. Syst. 2348–2351. Biol. 42, 182–192. Narita, Y., Oda, S., Takenaka, O., Kageyama, T., 2001. Phylogenetic Holland, B.R., Penny, D., Hendy, M., 2003. Outgroup misplacement position of Eulipotyphla inferred from the cDNA sequences of and phylogenetic inaccuracy under a molecular clock - A simula- pepsinogens A and C. Mol. Phylogenet. Evol. 21, 32–42. tion study. Syst. Biol. 52, 229–238. Nei, M., Kumar, S., 2000. Molecular Evolution and Phylogenetics. Huchon, D., Catzeflis, F., Douzery, E.J.P., 2000. Variance of Oxford University Press, New York. molecular datings, evolution of rodents, and the phylogenetic Nei, M., Xu, P., Glazko, G., 2001. Estimation of divergence times affinities between Ctenodactylidae and Hystricognathi. Proc. R. from multiprotein sequences for a few mammalian species and Soc. Lond. B 267, 393–402. several distantly related organisms. Proc. Natl. Acad. Sci. USA 98, Huchon, D., Madsen, O., Sibbald, M.J.J.B., Ament, K., Stanhope, M., 2497–2502. Catzeflis, F., de Jong, W.W., Douzery, E.J.P., 2002. Rodent Philippe, H., 1997. Rodent monophyly: pitfalls of molecular phylog- phylogeny and a timescale for the evolution of Glires: evidence enies. J. Mol. Evol. 45, 712–715. from an extensive taxon sampling using three nuclear genes. Mol. Philippe, H., Douzery, E., 1994. The pitfalls of molecular phylogeny Biol. Evol. 19, 1053–1065. based on four species as illustrated by the Cetacea/Artiodactyla Huelsenbeck, J.P., Bollback, J.P., Levine, A.M., 2002. Inferring the relationships. J. Mammal. Evol. 2, 133–152. root of a phylogenetic tree. Syst. Biol. 51, 32–43. Pupko, T., Huchon, D., Cao, Y., Okada, N., Hasegawa, M., 2002. Jones, D.T., Taylor, W.R., Thornton, J.M., 1992. The rapid genera- Combining multiple data sets in a likelihood analysis: which tion of mutation data matrices from protein sequences. Comput. models are the best?. Mol. Biol. Evol. 19, 2294–2307. Applic. Biosci. 8, 275–282. Reisz, R.R., Mu¨ller, J., 2004. Molecular timescales and the fossil Kishino, H., Hasegawa, M., 1989. Evaluation of the maximum record: a paleontological perspective. Trends Genet. 20, 237–241. likelihood estimate of the evolutionary tree topologies from Reyes, A., Gissi, C., Catzeflis, F., Nevo, E., Pesole, G., Saccone, C., DNA sequence data, and the branching order in Hominoidea. J. 2004. Congruent mammalian trees from mitochondrial and nuclear Mol. Evol. 29, 170–179. genes using Bayesian methods. Mol. Biol. Evol. 21, 397–403. Kishino, H., Thorne, J.L., Bruno, W.J., 2001. Performance of a Ronquist, F., Huelsenbeck, J.P., 2003. MrBayes 3: Bayesian phyloge- divergence time estimation method under a probabilistic model of netic inference under mixed models. Bioinformatics 19, 1572–1574. rate evolution. Mol. Biol. Evol. 18, 352–361. Sakamoto, Y., Ishiguro, M., Kitagawa, G., 1986. Akaike Information Kumar, S., Hedges, S.B., 1998. A molecular timescale for vertebrate Criterion Statistics. D. Reidel Pub., Dordrecht. evolution. Nature 392, 917–920. Sanderson, M.J., 1997. A nonparametric approach to estimating Landry, S.O.J., 1999. A proposal for a new classification and divergence times in the absence of rate constancy. Mol. Biol. Evol. nomenclature for the Glires (Lagomorpha and Rodentia). Mitt. 14, 1218–1231. Mus. Nat.kd. Berl. Zool. Reihe 75, 283–316. Schmitz, J., Ohme, M., Zischler, H., 2000. The complete mitochondrial Lecointre, G., Philippe, H., Leˆ, H.L.V., Le Guyader, H., 1993. Species genome of Tupaia belangeri and the phylogenetic affiliation of sampling has a major impact on phylogenetic inference. Mol. Scandentia to other eutherian orders. Mol. Biol. Evol. 17, 1334– Phylogenet. Evol. 2, 205–224. 1343. Li, C.-K., Wilson, R.W., Dawson, M.R., Krishtalka, L., 1987. Shimodaira, H., 2002. An approximately unbiased test for phyloge- The origin of rodents and lagomorphs. In: Genoways, H.H. netic tree selection. Syst. Biol. 51, 492–508. (Ed.), Current Mammalogy. Plenum Press, New York, pp. 97– Springer, M.S., DeBry, R.W., Douady, C., Amrine, H.M., Madsen, 108. O., de Jong, W.W., Stanhope, M.J., 2001. Mitochondrial versus Lin, Y.-H., McLenachan, P.A., Gore, A.R., Phillips, M.J., Ota, R., nuclear gene sequences in deep-level mammalian phylogeny Hendy, M.D., Penny, D., 2002b. Four new mitochondrial genomes reconstruction. Mol. Biol. Evol. 18, 132–143. and the increased stability of evolutionary trees of mammals from Springer, M.S., Murphy, W.J., Eizirik, E., OÕBrien, S.J., 2003. improved taxon sampling. Mol. Biol. Evol. 19, 2060–2070. Placental mammal diversification and the -Tertiary Lin, Y.-H., Waddell, P.J., Penny, D., 2002a. and vole mitochon- boundary. Proc. Natl. Acad. Sci. USA 100, 1056–1061. drial genomes increase support for both rodent monophyly and Stanhope, M.J., Czelusniak, J., Si, J.-S., Nickerson, J., Goodman, M., glires. Gene 294, 119–129. 1992. A molecular perspective on mammalian evolution from the Linhart, H., 1988. A test whether two AICÕs differ significantly. South gene encoding interphotoreceptor retinoid binding protein, with African Statist. J. 22, 153–161. convincing evidence for bat monophyly. Mol. Phylogenet. Evol. 1, Luckett, W.P., Hartenberger, J.-L., 1993. Monophyly or polyphyly of 148–160. the order Rodentia: possible conflict between morphological and Stanyon, R., Stone, G., Garcia, M., Froenicke, L., 2003. Reciprocal molecular interpretations. J. Mammal. Evol. 1, 127–147. chromosome painting shows that squirrels, unlike murid rodents, Madsen, O., Scally, M., Douady, C.J., Kao, D.J., DeBry, R.W., have a highly conserved genome organization. Genomics 82, 245– Adkins, R., Amrine, H., Stanhope, M.J., de Jong, W.W., Springer, 249. M.S., 2001. Parallel adaptative radiations in two major clades of Sullivan, J., Swofford, D.L., 1997. Are guinea pigs rodents? The placental mammals. Nature 409, 610–614. importance of adequate models in molecular phylogenetics. J. Meng, J., Wyss, A.R., Dawson, M.R., Zhai, R., 1994. Primitive fossil Mammal. Evol. 4, 77–86. rodent from inner Mongolia and its implications for mammalian Suzuki, Y., Glazko, G.V., Nei, M., 2002. Overcredibility of molecular phylogeny. Nature 370, 134–136. phylogenies obtained by Bayesian phylogenetics. Proc. Natl. Acad. Misawa, K., Janke, A., 2003. Revisiting the Glires concept - Sci. USA 99, 16138–16143. phylogenetic analysis of nuclear sequences. Mol. Phylogenet. Evol. Swofford, D.L., Olsen, G.J., Waddell, P.J., Hillis, D.M., 1996. 28, 320–327. Phylogenetic inference. In: Hillis, D.M., Moritz, C., Mable, B.K. Murphy, W.J., Eizirik, E., Johnson, W.E., Zhang, Y.P., Ryder, O.A., (Eds.), Molecular Systematics. Sinauer, Sunderland, MA, pp. 407– OÕBrien, S., 2001a. Molecular phylogenetics and the origins of 514. placental mammals. Nature 409, 614–618. Swofford, D.L., Waddell, P.J., Huelsenbeck, J.P., Foster, P.G., Lewis, Murphy, W.J., Eizirik, E., OÕBrien, S.J., Madsen, O., Scally, M., P.O., Rogers, J.S., 2001. Bias in phylogenetic estimation and its Douady, C., Teeling, E., Ryder, O.A., Stanhope, M.J., de Jong, relevance to the choice between parsimony and likelihood methods. W.W., Springer, M.S., 2001b. Resolution of the early placental Syst. Biol. 50, 525–539. E.J.P. Douzery, D. Huchon / Molecular Phylogenetics and Evolution 33 (2004) 922–935 935

Thorne, J.L., Kishino, H., Painter, I.S., 1998. Estimating the rate of Wallau, B.R., Schmitz, A., Perry, S.F., 2000. Lung morphology in evolution of the rate of molecular evolution. Mol. Biol. Evol. 15, rodents (Mammalia, Rodentia) and its implications for systematics. 1647–1657. J. Morphol. 246, 228–248. Waddell, P.J., Cao, Y., Hasegawa, M., Mindell, D.P., 1999. Assessing Wheeler, W.C., 1990. Nucleic acid sequence phylogeny and random the cretaceous superordinal divergence times within birds and outgroups. Cladistics 6, 363–367. placental mammals by using whole mitochondrial protein Wilcox, T.P., Zwickl, D.J., Heath, T.A., Hillis, D.M., 2002. Phyloge- sequences and an extended statistical framework. Syst. Biol. 48, netic relationships of the dwarf boas and a comparison of Bayesian 119–137. and bootstrap measures of phylogenetic support. Mol. Phylogenet. Waddell, P.J., Shelley, S., 2003. Evaluating placental inter-ordinal Evol. 25, 361–371. phylogenies with novel sequences including RAG1, ?-fibrinogen, Wood, A.E., 1957. What, if anything, is a rabbit. Evolution 11, 417– ND6, and mt-tRNA, plus MCMC-driven nucleotide, amino acid, 425. and codon models. Mol. Phylogenet. Evol. 28, 197–224. Yang, Z., 1996. Among-site rate variation and its impact on Waddell, P.J., Kishino, H., Ota, R., 2001. A phylogenetic foundation phylogenetic analyses. Trends Ecol. Evol. 11, 367–372. for comparative mammalian genomics. Gen. Inform. 12, 141–154. Yang, Z., 1997. PAML: a program package for phylogenetic analysis Waddell, P.J., Kishino, H., Ota, R., 2002. Very fast algorithms for by maximum likelihood. CABIOS 13, 555–556. evaluating the stability of ML and Bayesian phylogenetic trees Zwickl, D.J., Hillis, D.M., 2002. Increased taxon sampling greatly from sequence data. Gen. Inform. 13, 82–92. reduces phylogenetic error. Syst. Biol. 51, 588–598.