Gene 259 (2000) 139–148 www.elsevier.com/locate/gene

Phylogenetic position of turtles among amniotes: evidence from mitochondrial and nuclear genes

Ying Cao a, Michael D. Sorenson b, Yoshinori Kumazawa c, David P. Mindell d, Masami Hasegawa a,*
a The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan
b Department of Biology, Boston University, Boston, MA 02215, USA
c Department of Earth and Planetary Sciences, Nagoya University, Chikusa-ku, Nagoya 464-8602, Japan
d Department of Biology and Museum of Zoology, University of Michigan, Ann Arbor, MI 48109, USA

Received 18 April 2000; received in revised form 19 July 2000; accepted 8 September 2000
Received by T. Gojobori

Abstract

Maximum likelihood analysis, accounting for site-heterogeneity in evolutionary rate with the Γ-distribution model, was carried out with amino acid sequences of 12 mitochondrial proteins and nucleotide sequences of mitochondrial 12S and 16S rRNAs from three turtles, one squamate, one crocodile, and eight birds. The analysis strongly suggests that turtles are closely related to archosaurs (birds+crocodilians), and it supports both Tree-2: (((birds, crocodilians), turtles), squamates) and Tree-3: ((birds, (crocodilians, turtles)), squamates). A more traditional Tree-1: (((birds, crocodilians), squamates), turtles) and a tree in which turtles are basal to other amniotes were rejected with high statistical significance. Tree-3 has recently been proposed by Hedges and Poling [Science 283 (1999) 998–1001] based mainly on nuclear genes. Therefore, we re-analyzed their data using the maximum likelihood method, and evaluated the total evidence of the analyses of mitochondrial and nuclear data sets. Tree-1 was again rejected strongly. The most likely hypothesis was Tree-3, though Tree-2 remained a plausible candidate. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Diapsid affinity of turtles; Maximum likelihood; Molecular phylogeny; Reptile; Total evaluation

Abbreviations: AIC, Akaike Information Criterion; ATP6, ATPase 6; COB, cytochrome b; COX1, cytochrome oxidase subunit 1; KH test, Kishino–Hasegawa test; LDH, lactate dehydrogenase; ML, maximum likelihood; MS test, multiple-comparisons test of the standardized statistics; mtDNA, mitochondrial DNA; mt-protein, mitochondrial protein; mtREV model, general reversible Markov model for amino acid substitution of mt-proteins; ND1, NADH dehydrogenase subunit 1; rRNA, ribosomal RNA.

* Corresponding author. Tel.: +81-3-5421-8748; fax: +81-3-3446-1695/+81-3-5421-8796. E-mail address: [email protected] (M. Hasegawa)

0378-1119/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0378-1119(00)00425-X

1. Introduction

Phylogenetic relationships among the major groups of amniotes are not yet well resolved, and the position of Testudines (turtles) in particular remains uncertain. Among morphologists, alternative views have been presented by Romer (1966) and Løvtrup (1980). While both agree on placing crocodiles as the sister-group of birds, in monophyletic Archosauria, Romer placed archosaurs closest to squamates (lizards and snakes), whereas Løvtrup placed archosaurs closest to turtles. Romer’s view has been the consensus, while Løvtrup’s has been a minority view. Most morphological analyses since the late 1980s (e.g. Gauthier et al., 1988; Laurin and Reisz, 1995; Lee, 1995) consistently supported the basal phylogenetic position of turtles as a sister-group of both archosaurs and squamates.

Traditionally, living turtles have been considered the basal lineage in the amniote tree, mainly because of their lack of temporal fenestrae in the skull (anapsid), a condition that has been interpreted as primitive (e.g. Benton, 1990). This view, however, was recently challenged in a morphological analysis by Rieppel and deBraga (1996), who suggested an affinity between turtles and diapsids, as did Løvtrup (1980). In contrast to Løvtrup, however, Rieppel and deBraga placed turtles as sister to lepidosaurs (squamates plus tuatara). In either case, turtles must have lost both the upper and lower temporal fenestrae (deBraga and Rieppel, 1997). The validity of Rieppel and deBraga’s interpretation was questioned by Wilkinson et al. (1997) and Lee (1997), although Wilkinson et al. (1997) acknowledged the difficulty of determining the phylogenetic placement of turtles and suggested molecular approaches to be promising in this respect.

Recent molecular studies bearing on turtle relationships all suggest a diapsid affinity for turtles, but also point to an apparent contradiction in data from the mitochondrial and nuclear genomes. Analyses based on complete mtDNA sequences of Eastern painted turtle (Chrysemys picta; Mindell et al., 1999) and African side-necked turtle (Pelomedusa subrufa; Zardoya and Meyer, 1998) both placed turtles as sister to archosaurs (birds and alligators), although representative lepidosaurs were not available for analyses of the complete mt genome. With lepidosaurs included in an analysis of mt rRNAs only, turtles grouped with archosaurs (Zardoya and Meyer, 1998). Kumazawa et al. (1998) and Kumazawa and Nishida (1999) obtained the first complete mtDNA sequences for lepidosaurs, from the Ryukyu odd-tooth snake (Dinodon semicarinatus) and the blue-tailed mole skink (Eumeces egregius lividus). With additional data from another turtle species (green turtle, Chelonia mydas), they obtained strong support for the turtle/archosaur grouping (Tree-2 in Fig. 1; Kumazawa and Nishida, 1999). Similarly, Platz and Conlon (1997) found a sister relationship between turtles and archosaurs, excluding lepidosaurs as an outgroup, based on amino acid sequences of the pancreatic polypeptide.

Other studies using evidence from the nuclear genome have reached a somewhat different conclusion. Using mainly nuclear-encoded proteins and ribosomal RNAs, Hedges and Poling (1999) suggested a sister relationship between turtles and crocodilians with birds placed as the sister taxon to this group (Tree-3 in Fig. 1).
In other words, Archosauria was found to be paraphyletic with turtles placed within the group. The same result was obtained by Kirsch and Mayer (1998) using DNA–DNA hybridization data. In Kumazawa and Nishida’s maximum likelihood analysis of mt-proteins, the Hedges and Poling suggestion of a turtle/crocodile grouping was rejected at the 5% level. Thus, nuclear and mitochondrial data appear to provide a different resolution of the placement of turtles within Diapsida. However, phylogenetic inferences can vary with taxonomic sampling (Philippe and Douzery, 1994; Adachi and Hasegawa, 1996c; Halanych, 1998), so we analyze here the updated mtDNA database along with the nuclear sequences used by Hedges and Poling (1999).

Fig. 1. Three candidate trees for relationships among birds, crocodiles, turtles, and squamates.

2. Materials and methods

2.1. Sequence data

The complete mtDNA sequences used in this study are from the following 21 species: Macropus robustus (wallaroo; Janke et al., 1997; database accession number Y10524); Didelphis virginiana (opossum; Janke et al., 1994; Z29573); Ornithorhynchus anatinus (platypus; Janke et al., 1996; X83427); Corvus frugilegus (rook; Härlid and Arnason, 1999; Y18522); Gallus gallus (chicken; Desjardins and Morais, 1990; X52392); Aythya americana (redhead duck; Mindell et al., 1999; AF090337); Rhea americana (greater rhea; Mindell et al., 1999; AF090339); Falco peregrinus (peregrine falcon; Mindell et al., 1999; AF090338); Vidua chalybeata (village indigobird; Mindell et al., 1999; AF090341); Smithornis sharpei (grey-headed broadbill; Mindell et al., 1999; AF090340); Struthio camelus (ostrich; Härlid et al., 1997; Y12025); Alligator mississippiensis (American alligator; Mindell et al., 1999; AF069428); Eumeces egregius lividus (blue-tailed mole skink; Kumazawa and Nishida, 1999; AB016606); Pelomedusa subrufa (African side-necked turtle; Zardoya and Meyer, 1998; AF039066); Chrysemys picta (eastern painted turtle; Mindell et al., 1999; AF069423); Chelonia mydas (green turtle; Kumazawa and Nishida, 1999; AB012104); Xenopus laevis (African clawed toad; Roe et al., 1985; M10217); Cyprinus carpio (carp; Chang et al., 1994; X61010); Crossostoma lacustre (loach; Tzeng et al., 1992; M91245); Oncorhynchus mykiss (trout; Zardoya et al., 1995; L29771); Mustelus manazo (gummy shark; Cao et al., 1998; AB015962). We did not use the snake sequence (Kumazawa et al., 1998) because of its extremely rapid evolutionary rate. In the extensive analyses, we did not use eutherian mammals because of the ambiguity of the relationship among eutherians, marsupials and monotremes (Janke et al., 1996, 1997), but a preliminary analysis including eutherians did not give a significantly different result (data not shown).

The 12 proteins encoded in the H-strand of mtDNA were carefully aligned by eye. All positions with gaps or ambiguous alignment and overlapping regions between ATP6 and ATP8 and between ND4 and ND4L were excluded from phylogenetic analyses. The total number of remaining codons is 3235.

The small (12S) and the large (16S) mitochondrial rRNA sequences were aligned manually taking into consideration models of secondary structure (Neefs et al., 1993; Gutell et al., 1993) following Cao et al. (1994). Alignment positions with gaps and regions of ambiguous alignment were excluded from all analyses. Total numbers of remaining sites are 781 and 1074, respectively, for 12S and 16S.

The alignments of the following nuclear genes used by Hedges and Poling (1999) were also analyzed: 18S rRNA, 28S rRNA, α-crystallin A, α-enolase, α-globin, β-globin, c-mos proto-oncogene, lactate dehydrogenase A (LDHa), lactate dehydrogenase B (LDHb), and myoglobin. We did not use calcitonin and insulin sequences because of their short length. See Hedges and Poling (1999) for details on the number of taxa included in the analyses for each gene.

The alignments used in this work are available at http://www.evol.ism.ac.jp

2.2. Phylogenetic methods

All 15 possible trees for Aves, Crocodylia, Squamata and Testudines with Mammalia, Amphibia, Osteichthyes and Chondrichthyes as the outgroup were analyzed by the maximum likelihood (ML) method (Felsenstein, 1981; Kishino et al., 1990). Since the sister-group status

Fig. 2. A maximum likelihood tree of vertebrates estimated from the concatenated amino acid sequences of the 12 mt-proteins using ProtML with the mtREV-F model (Adachi and Hasegawa, 1996b). Relationships among birds were constrained to the traditional view as shown. For each internal branch, the percentage bootstrap probability (after fixing the relationships within subtrees attached to that branch; local bootstrap probability) (Adachi and Hasegawa, 1996b) was estimated with the RELL method using 10⁴ replications. The horizontal length of each branch is proportional to the estimated number of amino acid substitutions.
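The RELL method named in the Fig. 2 caption resamples per-site log-likelihoods that have already been estimated, rather than re-optimising the likelihood for each bootstrap replicate. The following is a minimal sketch of that resampling idea, not the MOLPHY implementation: the function name `rell_bootstrap`, its input layout, and the demonstration values are assumptions made for illustration, and the sketch compares a generic set of candidate trees rather than the local rearrangements around one branch.

```python
import random


def rell_bootstrap(site_lnl, n_reps=10_000, seed=0):
    """Estimate RELL bootstrap proportions for candidate trees.

    site_lnl maps a tree label to its per-site log-likelihoods (hypothetical
    input layout; every list must have the same length). Sites are resampled
    with replacement and the stored site log-likelihoods are re-summed, so
    no likelihood is re-optimised.
    """
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    if any(len(site_lnl[name]) != n_sites for name in names):
        raise ValueError("all trees need the same number of sites")
    wins = dict.fromkeys(names, 0)
    for _ in range(n_reps):
        # One replicate: draw n_sites site indices with replacement.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {name: sum(site_lnl[name][i] for i in sample) for name in names}
        # The tree with the highest resampled total wins this replicate.
        wins[max(names, key=totals.get)] += 1
    return {name: wins[name] / n_reps for name in names}


if __name__ == "__main__":
    # Synthetic per-site values for illustration only, not data from the paper.
    rng = random.Random(42)
    base = [rng.uniform(-6.0, -2.0) for _ in range(300)]
    demo = {
        "Tree-2": base,
        "Tree-3": [x + rng.gauss(-0.01, 0.2) for x in base],
    }
    print(rell_bootstrap(demo, n_reps=2_000))
```

The returned fractions play the role of the bootstrap probabilities shown on the branches, under the assumption that per-site log-likelihoods can be exported from the ML program in use.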

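The figure of 15 candidate trees for four ingroup lineages rooted by an outgroup is the number of rooted binary topologies on four taxa, (2n−3)!! = 5 × 3 × 1 = 15 for n = 4. As a hedged illustration (the function names and the nested-frozenset representation are choices made here, not part of the original analysis), the topologies can be enumerated and the three named trees located among them:

```python
from itertools import combinations


def rooted_trees(taxa):
    """Yield each rooted binary topology on `taxa` exactly once.

    A leaf is its label; an internal node is a frozenset of its two children,
    so the left/right order of children does not matter.
    """
    taxa = tuple(taxa)
    if len(taxa) == 1:
        yield taxa[0]
        return
    first, rest = taxa[0], taxa[1:]
    # Keep `first` on one side of the root split so that each unordered
    # split is generated once; the other side must stay non-empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = (first,) + extra
            right = tuple(t for t in rest if t not in extra)
            for left_tree in rooted_trees(left):
                for right_tree in rooted_trees(right):
                    yield frozenset((left_tree, right_tree))


def canon(nested):
    """Convert nested 2-tuples such as (("a", "b"), "c") to the same form."""
    if isinstance(nested, str):
        return nested
    return frozenset(canon(child) for child in nested)


TAXA = ("bird", "crocodile", "turtle", "squamate")
ALL_TREES = set(rooted_trees(TAXA))

# The three candidate trees, written as in the abstract.
TREE_1 = canon(((("bird", "crocodile"), "squamate"), "turtle"))
TREE_2 = canon(((("bird", "crocodile"), "turtle"), "squamate"))
TREE_3 = canon((("bird", ("turtle", "crocodile")), "squamate"))

if __name__ == "__main__":
    print(len(ALL_TREES))
    print(all(t in ALL_TREES for t in (TREE_1, TREE_2, TREE_3)))
```

Representing each internal node as an unordered pair means that two parenthetical strings differing only in the order of sister groups map to the same tree, which is the sense in which the topologies are counted.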

Fig. 3. P-values of KH test for the 15 possible trees among birds, crocodilians, turtles, and squamates with mammals as an outgroup for (A) the 12 individual mt-proteins, the 12S and 16S mt-rRNAs, and the total of the separate analyses of mitochondrial genes and for (B) the 18S and 28S rRNAs, the 10 nuclear proteins, and the total of the nuclear genes. Tree topologies: (1) (((bird, crocodile), squamate), turtle); (2) (((bird, crocodile), turtle), squamate); (3) ((bird, (turtle, crocodile)), squamate); (4) (((bird, turtle), crocodile), squamate); (5) ((bird, crocodile), (turtle, squamate)); (6) (((bird, turtle), squamate), crocodile); (7) ((bird, turtle), (crocodile, squamate)); (8) (bird, ((turtle, crocodile), squamate)); (9) ((bird, squamate), (turtle, crocodile)); (10) ((bird, (crocodile, squamate)), turtle); (11) (((bird, squamate), crocodile), turtle); (12) (bird, (turtle, (crocodile, squamate))); (13) (bird, ((turtle, squamate), crocodile)); (14) (((bird, squamate), turtle), crocodile); (15) ((bird, (turtle, squamate)), crocodile).

of Mammalia to other groups of Amniota has been established by previous analyses (e.g. Mindell et al., 1999), we assumed this relationship in our study. Further, the relationships within each group were assumed to be fixed as they are in Fig. 2, such that the 15 trees represent rearrangements only of the main branches between groups. The ProtML program in the MOLPHY package (vers. 2.3) (Adachi and Hasegawa, 1996b) and the CodeML program in the PAML package (Yang, 1997) were applied to the protein sequences with the mtREV-F model for mitochondrial proteins (Adachi and Hasegawa, 1996a) and with the JTT-F model for the nuclear-encoded proteins (Jones et al., 1992). The BaseML program in the PAML package was applied to the rRNA sequences with the HKY85 model (Hasegawa et al., 1985). For the CodeML and BaseML programs, the discrete Γ-distribution model with eight categories was used to accommodate site-heterogeneity. To evaluate the total evidence from separate analyses of individual genes, the TotalML program in MOLPHY was applied to the output files of the ProtML, CodeML, and BaseML programs. For each of the 15 possible trees, a standard error of the log-likelihood difference from the ML tree was estimated by Kishino and Hasegawa’s (1989) formula, and the P-value of the Kishino–Hasegawa (KH) test was calculated.

Hedges and Poling (1999) concatenated the sequences of several genes for combined analyses, but in preparing the concatenated sequences, they chose only one sequence from each group and omitted other sequences. This procedure did not make optimal use of all the available information. Furthermore, the analysis of concatenated sequences under a single model of sequence evolution does not satisfactorily take into account rate heterogeneity among genes, even if the Γ-distribution model is used, as it has been shown that the total evaluation based on a summation of log-likelihood scores from separate analyses of individual genes with the Γ model is preferable (Cao et al., 1999).

3. Results

A basal divergence for the passeriform lineage relative to other birds was suggested in earlier analyses based on 12 mt protein-coding genes and 2 mt-rRNA genes combined (Mindell et al., 1997) and the mt cytb gene (Härlid et al., 1997, 1998). This was an unexpected result, differing from the traditional view (e.g. Storer, 1971). Introduction of the Γ-distribution (Yang, 1996) in the ML analysis of the 12 mt-proteins, however, greatly reduced the log-likelihood difference between a more traditional ratite-basal tree and the ML passerine-basal tree (Mindell et al., 1999; table 5), suggesting a possible artifactual attraction between a passerine bird and alligator which is an outgroup taxon to birds. Recent studies of nuclear and mitochondrial genes by van Tuinen et al. (2000) and Groth and Barrowclough (1999) are consistent with this view. Fig. 2 shows the ProtML tree of the concatenated 12 mt-proteins with the constraint that ratites are basal among birds. Although rooting the avian tree with the Smithornis lineage gives a higher log-likelihood than this tree by 34.8±26.3 (P=0.19) without Γ, the difference reduces to 2.0±16.8 (P=0.90) with Γ. Therefore, we assumed that ratites are basal among birds in the rest of our analyses.

Fig. 2 is consistent with the tree of Kumazawa and Nishida (1999) which was based on a more limited sampling of species. Turtle is the closest relative of archosaurs (birds+crocodilians) in this tree. The tree in Fig. 2 was estimated by assuming that all sites are equally free to vary, but phylogeny estimation can sometimes be inconsistent under such an assumption (Lockhart et al., 1996; Yang, 1996; Sullivan and Swofford, 1997). Therefore, we accommodate rate-heterogeneity among sites (Yang, 1997) as well as among different genes. It has been shown that rate heterogeneity among genes is not sufficiently well approximated by a single discrete Γ-distribution model of concatenated sequences, and that the summation of log-likelihood scores from the separate analyses of individual genes is preferable to a single analysis of concatenated sequences (Cao et al., 1999; Adachi et al., 2000). Therefore, individual genes were analyzed separately (branch lengths were estimated by maximum likelihood for each individual gene with a different shape parameter for the Γ-distribution estimated for each gene) and the log-likelihoods for each gene were summed to determine the ML tree for a total analysis of all the data.

Inclusion of the Γ-distribution for individual mitochondrial genes did not change the ML tree; i.e. Tree-2 (Fig. 1) is preferred based on the combined likelihood score for the mitochondrial data in total (Fig. 3A). For all the individual mitochondrial genes, either Tree-2 or 3 has the best likelihood score except for ND3 and COB which, respectively, yield Trees-4 and 9; however, their scores are only slightly better than the scores for Trees-2 and 3 (P-values are 0.95 and 0.67, respectively, for ND3 and 0.70 and 0.80 for COB). None of the individual mitochondrial genes discriminate significantly against Trees-2 and 3 (the lowest P-value is 0.23 for Tree-3 in the analysis of ND5). Although the combined analysis of all mitochondrial genes (including rRNAs) yields Tree-2 as optimal, the log-likelihood of Tree-3 is lower by only 2.7±10.8 (±1 S.E.) than that of Tree-2 (P=0.80). While the combined likelihood scores for the nuclear genes do not strongly reject any of the 15 alternative trees (the lowest P-value is 0.01 for Tree-7) (Fig. 3B), the total analysis of the mitochondrial genes rejects all trees other than Trees-2, 3 and 4 with P-values lower than 0.001 (Table 1).

The basal position of squamates among extant reptiles is well established by our total analysis of the mitochondrial and nuclear data (Table 1; P-values for other relationships are lower than 0.0004), as well as by previous studies (Kumazawa and Nishida, 1999; Hedges and Poling, 1999). Although Tree-3 is best supported, Tree-2 has a log-likelihood score lower by only 25.2±17.6 (P=0.15) and cannot be rejected. P-values by the multiple-comparisons of the standardized statistics (MS) method (Shimodaira and Hasegawa, 1999) for Trees-2 and 4 are 0.315 and 0.012, respectively, which are larger than 0.152 and 0.003 given by the KH test presented in Table 1. In this sense the MS test is more conservative than the Kishino–Hasegawa test. It is noteworthy that even the conservative MS test gave P-values lower than 0.002 for all trees other than Trees-2, 3 and 4.

An additional concern in the analysis of nuclear-encoded proteins is accurate identification of the homologous relationship (orthology versus paralogy) for LDHa and LDHb genes across species. This is crucial for phylogenetic analyses, though not always easy to establish (Stock et al., 1997), and when LDHa and LDHb were excluded from the analysis, the combined

likelihood scores for the mitochondrial plus nuclear data were almost identical for Trees-2 and 3; Tree-2 has a log-likelihood score lower than that of Tree-3 by only 3.7±14.6 (P=0.80).

Table 1
Comparison of log-likelihood scores among the 15 possible trees. ‘Mitochondria’ refers to the total of the separate analyses of individual genes for 12 mt-proteins plus 12S and 16S rRNAs, ‘nuclear’ to the total of the 10 nuclear genes, and ‘total’ to the total of the individual analyses of the mitochondrial and nuclear genes. The log-likelihood values of the highest likelihood trees are given in angle brackets, and the differences in log-likelihood of alternative trees from that of the ML tree are shown with their S.E. following ±. P-values are shown for the ‘total’ analysis.

Tree  Mitochondria   Nuclear        Total                      Total (excl. LDHs)
                                    Log-likelihood  P-value    Log-likelihood  P-value
1     −70.5±19.3     −36.0±18.8     −103.8±26.9     1.1×10⁻⁴   −84.1±23.6      3.7×10⁻⁴
2     ⟨−67 098.8⟩    −27.9±16.8     −25.2±19.0      0.18       −3.7±15.6       0.81
3     −2.7±8.5       ⟨−17 255.3⟩    ⟨−84 356.8⟩     1          ⟨−80 693.2⟩     1
4     −15.8±11.6     −33.2±17.2     −46.3±20.7      0.03       −26.6±17.3      0.12
5     −68.6±19.7     −34.1±17.4     −100.0±26.2     1.4×10⁻⁴   −77.4±23.2      8.5×10⁻⁴
6     −59.1±18.1     −40.3±19.4     −96.7±26.5      2.6×10⁻⁴   −76.4±23.3      1.0×10⁻³
7     −57.4±18.2     −43.6±19.0     −98.4±26.4      1.9×10⁻⁴   −79.1±22.9      5.5×10⁻⁴
8     −78.1±18.8     −3.3±11.9      −78.6±22.2      4.0×10⁻⁴   −80.6±21.6      1.9×10⁻⁴
9     −73.9±18.7     −9.5±13.6      −80.8±23.1      4.7×10⁻⁴   −85.6±23.1      2.1×10⁻⁴
10    −88.5±21.6     −41.4±19.1     −127.2±28.9     1.1×10⁻⁵   −109.2±25.6     2.0×10⁻⁵
11    −91.7±21.0     −29.7±17.5     −118.7±27.3     1.4×10⁻⁵   −110.4±25.3     1.3×10⁻⁵
12    −95.1±21.3     −29.7±16.3     −122.1±26.8     5.2×10⁻⁶   −108.1±24.3     8.7×10⁻⁶
13    −95.4±21.6     −27.4±15.2     −120.0±26.5     6.0×10⁻⁶   −103.7±24.0     1.6×10⁻⁵
14    −92.5±21.1     −27.2±17.6     −117.1±27.5     2.1×10⁻⁵   −105.9±25.9     4.3×10⁻⁵
15    −90.7±22.0     −33.7±18.2     −121.7±28.6     2.1×10⁻⁵   −100.2±25.8     1.0×10⁻⁴

4. Discussion

Table 2 presents log-likelihood scores for the three trees including the birds/crocodilians/turtles clade when the 12 mt-proteins were analyzed with the mtREV-F model. The separate and concatenated analyses of 12 mt-proteins were compared in terms of the Akaike Information Criterion (AIC), where the AIC score = −2 × ln L + 2 × (number of parameters). The minimum AIC estimate is a natural extension of the classical maximum likelihood estimate when comparing models with different numbers of parameters, and the model that minimizes AIC is considered to be the most appropriate model (Akaike, 1974; Sakamoto et al., 1986). The mtREV-F model uses the amino acid frequencies of the data (the number of parameters is 19) and, for a bifurcating tree with 21 species, 39 branch lengths must be estimated. For the Γ-distribution model, one additional parameter, α (shape parameter of the Γ-distribution), must be estimated. Therefore, the number of estimated parameters is 59 (=39+19+1) for each ML analysis with the Γ-distribution. For Tree-2 with the concatenated analysis of the 12 proteins, AIC was 2 × 49 413.4 + 2 × 59 = 98 944.8, while AIC for the separate analysis was reduced to 2 × 48 522.8 + 2 × 59 × 12 = 98 461.6. This indicates that the separate analysis better approximates the underlying evolutionary process than the concatenated analysis, which does not explicitly assume heterogeneity of the substitution process across genes. This holds even when site-heterogeneity is taken into account with the Γ-distribution in the concatenated analysis. Separate analyses with the Γ-distribution model for each of the individual proteins provided the best approximation of the data among models we considered, and should be more reliable than analyses which yield higher AIC values (Cao et al., 1999; Adachi et al., 2000). There seems to be a tendency to exaggerate support for a particular tree, whether or not the tree is true, when the assumed model clearly fails to accommodate the full complexity of the substitution process (e.g. Hasegawa and Adachi, 1996).

Table 2
Comparison of log-likelihood scores for trees based on 12 mitochondrial proteins in which birds, crocodilians and turtles form a monophyletic group relative to squamates

Tree  Separate       Concatenated
2     ⟨−48 522.8⟩    ⟨−49 413.4⟩
3     −2.8±7.9       −7.1±6.4
4     −14.5±9.7      −10.3±5.4

Why does the concatenated analysis not approximate the data as well as the separate analysis, even though the Γ-distribution model is applied? We suggest that the pattern of rate variation among lineages and among nucleotide or amino acid positions differs between different genes, such that a single Γ-distribution for all the genes provides a less accurate approximation of the evolutionary process.

Although molecular phylogenetics has become a powerful tool in elucidating the evolutionary history of organisms, a single gene does not necessarily contain sufficient information to resolve the problem at hand, and therefore, it is necessary to consider as many different loci as possible and to evaluate the total evidence. The ML method is particularly suitable for this purpose. Given a model, one can calculate the likelihood as the probability that one tree yielded the observed data, and each gene can reasonably be regarded as evolving independently from other genes. Therefore, the total support for a particular tree can be evaluated by simply summing up the estimated log-likelihoods of individual genes for that tree, and the total log-likelihoods for different trees can then be compared (Adachi and Hasegawa, 1996b; Hasegawa et al., 1997; Cao et al., 1999).

Hedges and Poling’s (1999) analysis of the 11 nuclear proteins gave 100% bootstrap support for Tree-3 using ML analysis (99 and 97% support by the neighbor-joining and parsimony). These support values seem much higher than that obtained by our analysis. This discrepancy may be due to their neglect of heterogeneity among sites as well as among genes. Furthermore, Mannen and Li (1999) obtained Tree-3 with very high bootstrap support by collecting and analyzing sequence data of LDHa, LDHb, and α-enolase from several reptiles. However, they analyzed nucleotide sequences and ignored rate variation among different codon positions. Reanalysis of their data at the amino acid sequence level suggests that the data does not contain sufficient information to reliably discriminate among trees (although Mannen and Li’s data contain more species than those of Hedges and Poling, results do not differ much from those presented in Fig. 3B; data not shown).

In general agreement with results presented here, Mindell et al. (1999) found slight preference for a tree with alligator and turtle as sisters based on 12 mt-proteins and ML analysis assuming equal rates of change across sites; however, this changed to slight preference for a sister relationship between turtle and archosaurs on accommodating rate heterogeneity across sites.

In our ML analyses using concatenated mitochondrial proteins (Table 2), Tree-3 was not significantly worse than Tree-2 (7.1±6.4; P=0.27). Similar ML analyses by Kumazawa and Nishida (1999) statistically rejected Tree-3 (19.2±8.9; P=0.03), but the log-likelihood differences in their analyses were slightly overestimated due to errors in the mtREV24 matrix of PUZZLE version 4.0 (Strimmer and von Haeseler, 1996) as noted in the PUZZLE online manual (http://www.tree-puzzle.de/manual.html). The corresponding difference using PUZZLE version 4.0.2, in which the mtREV error was fixed, was 15.0±7.6 (P=0.05), still supporting the rejection of Tree-3 at the 5% significance level. We have conducted extensive analyses using the data sets analyzed in this paper and that used by Kumazawa and Nishida (1999). These analyses suggest that the apparent discrepancy is attributable to subtle differences in the alignment (i.e. higher stringency in our alignment using 3235 sites than in theirs using 3465 sites), different choice of outgroup taxa (i.e. mammals versus fishes), and, most importantly, different sampling of ingroup taxa (i.e. birds and turtles) (data not shown). These results point to the importance of sampling sufficient numbers of ingroup taxa for more reliable phylogenetic analyses. Further efforts to determine complete mtDNA sequences from representative crocodilian and squamate lineages should thus be encouraged in this respect.

Tree-3 is the best supported tree by our total analysis of mitochondrial and nuclear data, but it is not supported by presently available morphological data. If Tree-3 is the true tree, turtles may have lost the morphological characteristics indicating a close relationship with crocodilians. The total evaluation of the accumulated molecular data does not reliably discriminate between Trees-2 and 3, and more data are needed to resolve this issue. Concerning mitochondrial data, denser species sampling, particularly for Crocodylia, might be helpful, because only American alligator with a long branch is represented in our data set. Alternatively, failure to find a single resolution for the branching order among birds, crocodilians, and turtles might indicate that divergences for these three lineages occurred successively in a short time period.

The diapsid affinity of turtles has recently been supported from a morphological standpoint (see e.g. a review by Rieppel and Reisz, 1999), but the currently available morphological data do not seem to have sufficient resolution as to which diapsids are closest relatives of turtles. In this respect, it is noteworthy that the total molecular analyses of the present study rejected possibilities other than the archosaurian affiliation of turtles. We feel our results provide reasonable topological constraints with which to re-evaluate the morphological characters.

Acknowledgements

We thank Hidetoshi Shimodaira for discussions and for his help in the analysis of the multiple comparisons, Mitsuko Kitahara for drawing Fig. 1. This work was supported by grants from the Japan Society for the Promotion of Sciences (Y.C. and M.H.), Yamada Science Foundation (M.H.) and U.S. National Science Foundation (M.D.S. and D.P.M.).

References

Adachi, J., Hasegawa, M., 1996a. Model of amino acid substitution in proteins encoded by mitochondrial DNA. J. Mol. Evol. 42, 459–468.

Adachi, J., Hasegawa, M., 1996b. MOLPHY: Programs for Molecular Phylogenetics vers. 2.3. In: Computer Science Monographs No. 28. Institute of Statistical Mathematics, Tokyo.
Adachi, J., Hasegawa, M., 1996c. Instability of quartet analyses of molecular sequence data by the maximum likelihood method: the Cetacea/Artiodactyla relationships. Mol. Phyl. Evol. 6, 72–76.
Adachi, J., Waddell, P., Martin, W., Hasegawa, M., 2000. Plastid genome phylogeny and a model of amino acid substitution for proteins encoded by chloroplast DNA. J. Mol. Evol. 50, 348–358.
Akaike, H., 1974. A new look at the statistical model identification. IEEE Trans. Autom. Contr. AC-19, 716–723.
Benton, M., 1990. Vertebrate Palaeontology. Unwin Hyman, London.
Cao, Y., Adachi, J., Hasegawa, M., 1994. Eutherian phylogeny as inferred from mitochondrial DNA sequence data. Jpn. J. Genet. 69, 455–472.
Cao, Y., Waddell, P., Okada, N., Hasegawa, M., 1998. The complete mitochondrial DNA sequence of the shark Mustelus manazo: evaluating rooting contradictions to living bony vertebrates. Mol. Biol. Evol. 15, 1637–1646.
Cao, Y., Kim, K., Ha, J., Hasegawa, M., 1999. Model dependence of the phylogenetic inference: relationship among carnivores, perissodactyls and cetartiodactyls as inferred from mitochondrial genome sequences. Genes Genet. Syst. 74, 211–217.
Chang, Y.-s., Huang, F.-l., Lo, T.-b., 1994. The complete nucleotide sequence and gene organization of carp (Cyprinus carpio) mitochondrial genome. J. Mol. Evol. 38, 138–155.
deBraga, M., Rieppel, O., 1997. Reptile phylogeny and the interrelationships of turtles. Zool. J. Linn. Soc. 120, 281–354.
Desjardins, P., Morais, R., 1990. Sequence and gene organization of the chicken mitochondrial genome: a novel gene order in higher vertebrates. J. Mol. Biol. 212, 599–634.
Felsenstein, J., 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17, 368–376.
Gauthier, J., Kluge, A.G., Rowe, T., 1988. Amniote phylogeny and the importance of fossils. Cladistics 4, 105–209.
Groth, J.G., Barrowclough, G.F., 1999. Basal divergences in birds and the phylogenetic utility of the nuclear RAG-1 gene. Mol. Phyl. Evol. 12, 115–123.
Gutell, R., Gray, M., Schnare, M., 1993. A compilation of large subunit (23S and 23S-like) ribosomal RNA structures: 1993. Nucl. Acids Res. 21, 3055–3074.
Halanych, K., 1998. Lagomorph misplaced by more characters and fewer taxa. Syst. Biol. 47, 138–146.
Hasegawa, M., Adachi, J., 1996. Phylogenetic position of cetaceans relative to artiodactyls: reanalysis of mitochondrial and nuclear sequences. Mol. Biol. Evol. 13, 710–717.
Hasegawa, M., Kishino, H., Yano, T., 1985. Dating of the human–ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22, 160–174.
Hasegawa, M., Adachi, J., Milinkovitch, M., 1997. Novel phylogeny of whales supported by total molecular evidence. J. Mol. Evol. 44, Suppl. 1, 117–120.
Härlid, A., Arnason, U., 1999. Analyses of mitochondrial DNA nest ratite birds within the Neognathae-supporting a neotenous origin of ratite morphological characters. Proc. R. Soc. London, Ser. B 266, 305–309.
Härlid, A., Janke, A., Arnason, U., 1997. The mtDNA sequence of the ostrich and the divergence between paleognathous and neognathous birds. Mol. Biol. Evol. 14, 754–761.
Härlid, A., Janke, A., Arnason, U., 1998. The complete mitochondrial genome of Rhea americana and early avian divergence. J. Mol. Evol. 46, 669–679.
Hedges, S., Poling, L., 1999. A molecular phylogeny of reptiles. Science 283, 998–1001.
Janke, A., Feldmaier-Fuchs, G., Thomas, W., von Haeseler, A., Pääbo, S., 1994. The marsupial mitochondrial genome and the evolution of placental mammals. Genetics 137, 243–256.
Janke, A., Gemmell, N., Feldmaier-Fuchs, G., von Haeseler, A., Pääbo, S., 1996. The mitochondrial genome of a monotreme, the platypus (Ornithorhynchus anatinus). J. Mol. Evol. 42, 153–159.
Janke, A., Xu, X., Arnason, U., 1997. The complete mitochondrial genome of the wallaroo (Macropus robustus) and the phylogenetic relationship among Monotremata, Marsupialia and Eutheria. Proc. Natl. Acad. Sci. USA 94, 1276–1281.
Jones, D., Taylor, W., Thornton, J., 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8, 275–282.
Kirsch, J., Mayer, G., 1998. The platypus is not a rodent: DNA hybridization, amniote phylogeny and the palimpsest theory. Philos. Trans. R. Soc. London, Ser. B 353, 1221–1237.
Kishino, H., Hasegawa, M., 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29, 170–179.
Kishino, H., Miyata, T., Hasegawa, M., 1990. Maximum likelihood inference of protein phylogeny, and the origin of chloroplasts. J. Mol. Evol. 31, 151–160.
Kumazawa, Y., Nishida, M., 1999. Complete mitochondrial DNA sequences of the green turtle and blue-tailed mole skink: statistical evidence for archosaurian affinity of turtles. Mol. Biol. Evol. 16, 784–792.
Kumazawa, Y., Ota, H., Nishida, M., Ozawa, T., 1998. The complete nucleotide sequence of a snake (Dinodon semicarinatus) mitochondrial genome with two identical control regions. Genetics 150, 313–329.
Laurin, M., Reisz, R.R., 1995. A reevaluation of early amniote phylogeny. Zool. J. Linn. Soc. 113, 165–223.
Lee, M.S.Y., 1995. Historical burden in systematics and the interrelationships of ‘parareptiles’. Biol. Rev. 70, 459–547.
Lee, M.S.Y., 1997. Reptile relationships turn turtle …. Nature 389, 245–246.
Lockhart, P., Larkum, A., Steel, M., Waddell, P., Penny, D., 1996. Evolution of chlorophyll and bacteriochlorophyll: the problem of invariant sites in sequence analysis. Proc. Natl. Acad. Sci. USA 93, 1930–1934.
Løvtrup, S., 1980. The Phylogeny of Vertebrata. Plenum Press, New York.
Mannen, H., Li, S.-L., 1999. Molecular evidence for a clade of turtles. Mol. Phyl. Evol. 13, 144–148.
Mindell, D., Sorenson, M., Huddleston, C., Miranda Jr., H., Knight, A., Sawchuk, S., Yuri, T., 1997. Phylogenetic relationships among and within select avian orders based on mitochondrial DNA. In: Mindell, D. (Ed.), Avian Molecular Evolution and Systematics. Academic Press, San Diego, pp. 213–247.
Mindell, D., Sorenson, M., Dimcheff, D., Hasegawa, M., Ast, J., Yuri, T., 1999. Interordinal relationships of birds and other reptiles based on whole mitochondrial genomes. Syst. Biol. 48, 138–152.
Neefs, J.-M., Van de Peer, Y., De Rijk, P., Chapelle, S., De Wachter, R., 1993. Compilation of small ribosomal subunit RNA structures. Nucl. Acids Res. 21, 3025–3049.
Philippe, H., Douzery, E., 1994. The pitfalls of molecular phylogeny based on four species, as illustrated by the Cetacea/Artiodactyla relationships. J. Mammal. Evol. 2, 133–152.
Platz, J., Conlon, J., 1997. … and turn back again. Nature 389, 246–246.
Rieppel, O., deBraga, M., 1996. Turtles as diapsid reptiles. Nature 384, 453–455.
Rieppel, O., Reisz, R.R., 1999. The origin and early evolution of turtles. Annu. Rev. Ecol. Syst. 30, 1–22.
Roe, B., Ma, D.-P., Wilson, R., Wong, J.-H., 1985. The complete nucleotide sequence of the Xenopus laevis mitochondrial genome. J. Biol. Chem. 260, 9759–9774.
Romer, A., 1966. Vertebrate Paleontology. University of Chicago Press, Chicago.

Sakamoto, Y., Ishiguro, M., Kitagawa, G., 1986. Akaike Information nucleotide sequence of the Crossostoma lacustre mitochondrial Criterion Statistics. Reidel, Dordrecht. genome: conservation and variations among vertebrates. Nucl. Shimodaira, H., Hasegawa, M., 1999. Multiple comparisons of log- Acids Res. 20, 4853–4858. likelihoods with applications to phylogenetic inference. Mol. Biol. van Tuinen, M., Sibley, C., Hedges, S., 2000. The early history of Evol. 16, 1114–1116. modern birds inferred from DNA sequences of nuclear and mito- Stock, D.W., Quattro, J.M., Whitt, G.S., Powers, D.A., 1997. Lactate chondrial ribosomal genes. Mol. Biol. Evol. 17, 451–457. dehydrogenase (LDH) gene duplication during evolution: Wilkinson, M., Thorley, J., Benton, M., 1997. Uncertain turtle rela- the cDNA sequence of the LDH of the tunicate Styela plicata. Mol. tionships. Nature 387, 466–466. Biol. Evol. 14, 1273–1284. Yang, Z., 1996. Among-site rate variation and its impact on phyloge- Storer, R.W., 1971. Classification of birds. In: Farner, D.S., King, J.R. netic analyses. Trends Ecol. Evol. 11, 367–372. (Eds.), Avian Biology vol. 1. Academic Press, New York, pp. 1–18. Yang, Z., 1997. PAML: a program package for phylogenetic analysis Strimmer, K., von Haeseler, A., 1996. Quartet puzzling: a quartet by maximum likelihood. CABIOS 13, 555–556. maximum-likelihood method for reconstructing tree topologies. Zardoya, R., Meyer, A., 1998. Complete mitochondrial genome sug- Mol. Biol. Evol. 13, 964–969. gests diapsid affinities of turtles. Proc. Natl. Acad. Sci. USA 95, Sullivan, J., Swofford, D., 1997. Are guinea pigs rodents? The impor- 14 226–14 231. tance of adequate models in molecular phylogenetics. J. Mammal. Zardoya, R., Garrido-Pertierra, A., Bautista, J., 1995. The complete Evol. 4, 77–86. nucleotide sequence of mitochondrial DNA genome of the rainbow Tzeng, C.-S., Hui, C.-F., Shen, S.-C., Huang, P., 1992. The complete trout, Oncorhynchus mykiss. J. Mol. Evol. 41, 942–951.