Arthropod Relationships Revealed by Phylogenomic Analysis of Nuclear Protein-Coding Sequences
Total Page:16
File Type:pdf, Size:1020Kb
Vol 463 | 25 February 2010 | doi:10.1038/nature08742 LETTERS Arthropod relationships revealed by phylogenomic analysis of nuclear protein-coding sequences Jerome C. Regier1, Jeffrey W. Shultz1,2,3, Andreas Zwick1, April Hussey1, Bernard Ball4, Regina Wetzer5, Joel W. Martin5 & Clifford W. Cunningham4 The remarkable antiquity, diversity and ecological significance of protein-coding genes (not including 18% missing data). This data set arthropods have inspired numerous attempts to resolve their deep builds on a previous study that sequenced the same gene regions for 12 phylogenetic history, but the results of two decades of intensive arthropods and one tardigrade outgroup (446 kb)16. With the excep- molecular phylogenetics have been mixed1–7. The discovery that tion of three nodes, that study showed unconvincing bootstrap sup- terrestrial insects (Hexapoda) are more closely related to aquatic port, which improved only modestly when we added 44 arthropods Crustacea than to the terrestrial centipedes and millipedes2,8 previously sequenced for only three genes (an additional 227 kb)16.The (Myriapoda) was an early, if exceptional, success. More typically, present study enlarges our earlier data set nearly fourfold, greatly analyses based on limited samples of taxa and genes have generated improving support throughout the entire phylogeny. results that are inconsistent, weakly supported and highly sensitive Sequences for each of the 62 genes from 80 taxa were obtained using to analytical conditions7,9,10. Here we present strongly supported PCR primers designed to amplify genes determined a priori to be single- results from likelihood, Bayesian and parsimony analyses of over copy orthologues in Drosophila melanogaster when compared with 41 kilobases of aligned DNA sequence from 62 single-copy nuclear Caenorhabditis elegans and Homo sapiens16. Alignment of these ortho- protein-coding genes from 75 arthropod species. These species logues was based on translated amino-acid sequences, which are highly represent every major arthropod lineage, plus five species of conserved. For example, the most divergent pair of protein sequences in tardigrades and onychophorans as outgroups. Our results strongly the entire data set is still identical at 73% of their amino-acid sites. support Pancrustacea (Hexapoda plus Crustacea) but also strongly Uncertainty about homology at sites bracketing small insertion–deletion favour the traditional morphology-based Mandibulata11 (Myriapoda (indel) regions resulted in exclusion of approximately 6.5% of all sites16. plus Pancrustacea) over the molecule-based Paradoxopoda (Myria- The phylogeny shown in Figs 1 and 2 has largely robust bootstrap poda plus Chelicerata)2,5,12. In addition to Hexapoda, Pancrustacea support from four methods of analysis that are well suited to inferring includes three major extant lineages of ‘crustaceans’, each spanning deep-level phylogenies (Fig. 1). Additional likelihood, Bayesian and a significant range of morphological disparity. These are Oligostraca parsimony analyses shown in Supplementary Figs 1–6 also support all (ostracods, mystacocarids, branchiurans and pentastomids), Veri- major conclusions described here. With the major exception of crustacea (malacostracans, thecostracans, copepods and branchio- ordinal relationships within Arachnida, bootstrap values above 80% pods) and Xenocarida (cephalocarids and remipedes). Finally, within and posterior probabilities of 1.0 pertain in Fig. 1 and Supplementary Pancrustacea we identify Xenocarida as the long-sought sister group Figs 3–5. With few exceptions, we also recovered clades widely to the Hexapoda, a result confirming that ‘crustaceans’ are not mono- accepted by morphologists, which is an informal criterion that sup- phyletic. These results provide a statistically well-supported phylo- ports our conclusions based on bootstrap analyses. Furthermore, genetic framework for the largest animal phylum and represent a step there was little evidence of strong conflict among 68 gene regions towards ending the often-heated, century-long debate on arthropod from 62 nuclear protein-coding genes. Single-gene analyses showed relationships. that 95% of all relationships with .70% bootstrap support agreed The molecular phylogeny of Arthropoda has proven difficult to with the phylogeny from the concatenated genes shown in Fig. 1 resolve. In an attempt to overcome this, we increased both taxon and (Supplementary Table 3). This paucity of strong conflict is consistent gene sampling relative to earlier studies. Our broad taxon sample with the orthology of these sequences. Of the six newly named groups includes the basal lineages of Hexapoda, every class of traditional shown in bold in Fig. 1, only two had more than 70% bootstrap ‘Crustacea’, every class in Myriapoda, every order in Arachnida support in single-gene analyses (one gene each; Supplementary and multiple representatives from Xiphosura, Pycnogonida and the Table 3). This suggests that the strong support in Fig. 1 is the result outgroups Onychophora and Tardigrada. of the cumulative phylogenetic signal across numerous gene regions. Until recently, arthropod molecular phylogenetics relied mainly Bootstrap values derived for amino acids were sometimes lower than upon nuclear ribosomal DNA and mitochondrial sequences. Our data those derived from nucleotides. This result is due, at least in part, to come from the complementary DNA of single-copy nuclear protein- the failure of amino-acid models to distinguish between serine resi- coding genes, which represent the largest source of data for phyloge- dues encoded by TCN and AGY codons (A.Z. and J.C.R., manuscript netics. Three phylogenomic studies of single-copy nuclear genes in the in preparation), and is consistent with recent work questioning the past year included 9 (ref. 13), 32 (ref. 14) and 6 (ref. 15) arthropod taxa, performance of widely implemented amino-acid models in com- respectively (165 kilobases (kb), 319 kb and 432 kb of DNA sequences, parison with nucleotide and codon models17,18. not including missing data). The present study of 75 arthropods brings At the deepest level, our phylogeny strongly supports Mandi- to bear 2.6 megabases of aligned arthropod DNA from 62 single-copy bulata11 (Pancrustacea plus Myriapoda), a controversial result that 1Center for Biosystems Research, University of Maryland Biotechnology Institute, 2Department of Entomology, 3Institute for Bioscience and Biotechnology Research, University of Maryland, College Park, Maryland 20742, USA. 4Department of Biology, Duke University, Durham, North Carolina 27708, USA. 5Natural History Museum of Los Angeles County, Los Angeles, California 90007, USA. 1079 ©2010 Macmillan Publishers Limited. All rights reserved LETTERS NATURE | Vol 463 | 25 February 2010 degen1 BP noLRall1+nt2 BP Polyneoptera Blattodea 94 99 Orthoptera Periplaneta 99 94 89 95 Acheta codon BP amino-acid BP Neoptera 97 100 Dermaptera 97 96 Forficula x: x% = BP and node 81 95 100 100 Antheraea Pterygota Lepidoptera 100 100100 100 Cydia recovered in MLA 99 99 100 100 Prodoxus 92 99 –: BP < 50% and node Dicondylia Palaeoptera Ephemeroptera 100 100 Hexagenia not recovered in MLA 100 100 69 59 100 100 Ephemerella 100 100 76 89 Odonata 100 100 Ischnura [53]: BP = 53% and node Insecta 100 100 Libellula 100 100 not recovered in MLA Zygentoma 100 100 Ctenolepisma 100 100 100 100 Nicoletia Hexapoda Archaeognatha 100 100 Pedetontus 100 99 Entomobryomorpha 100 100 Machiloides Collembola 98 95 Tomoceridae 97 100 Entomobryidae Tomocerus Miracrustacea 100 100 79 95 Orchesella 94 98 Entognatha 86 89 100 100 Poduridae Podura 89 87 Diplura Campodeidae 79 <50 100 100 Japygidae Eumesocampa Remipedia 100 100 Metajapyx Xenocarida 93 100 Speleonectes 89 54 Hutchinsoniella Cephalocarida Thoracica Sessilia 97 99 Semibalanus 100 100 Altocrustacea 97 96 Chthamalus 93 89 Thecostraca 100 100 100 100 Pedunculata Lepas 81 <50 Communostraca 100 100 Rhizocephala Loxothylacus 84 57 Eumalacostraca 87 51 Eucarida Libinia Peracarida Multicrustacea 86 97 Malacostraca 100 100 58 88 Armadillidium 100 100 Hoplocarida 100 100 100 100 Neogonodactylus 100 100 Phyllocarida 98 53 Nebalia Cyclopoida 100 100 Mesocyclops Copepoda 100 100 100 100 Acanthocyclops Pancrustacea 86 88 100 100 Calanoida Eurytemora 100 100 80 <50 Diplostraca Cladocera 100 100 Spinicaudata Daphnia 100 100 Phyllopoda 100 100 100 100 Limnadia 100 100 Laevicaudata Vericrustacea Branchiopoda 100 100 Lynceus 100 100 100 100 Notostraca Triops 100 100 Anostraca 100 100 Streptocephalus 100 100 Artemia Ichthyostraca 100 100 Pentastomida Armillifer <50 <50 Branchiura Mandibulata 100 100 Argulus – 57 Mystacocarida 99 99 Oligostraca 100 100 Derocheilocaris Myodocopa 100 100 Harbansus 99 99 92 99 Ostracoda 60 78 Podocopa 100 100 Skogsbergia <50 70 Cypridopsis Chilognatha 55 71 Callipodida Polyzoniida Abacion Diplopoda 100 100 – 52 Polyzonium Progoneata 99 95 100 100 Spirobolida Narceus Polyxenida 67 72 97 96 Polyxenus 69 73 Symphyla 100 100 Hanseniella 92 92 Pauropoda 100 100 Scutigerella Myriapoda 100 100 97 65 Scolopendromorpha Eurypauropus 100 100 Pleurostigmophora 99 99 Scolopendra 93 59 Lithobiomorpha Chilopoda 100 98 Lithobius 100 100 77 90 CraterostigmomorphaCraterostigmus Arthropoda 100 100 Scutigeromorpha Uropygi Scutigera 100 100 Pedipalpi 100 98 Schizomida Stenochrus Thelyphonida 100 100 Tetrapulmonata 100 85 100 99 Mastigoproctus Pulmonata 99 97 100 99 Amblypygi Phrynus 65 75 97 97 Araneae Aphonopelma 94 72 Scorpiones <50 – 100 100 Heterometrus Ricinulei100 100 Hadrurus