The Nematode–Arthropod Clade Revisited: Phylogenomic Analyses from Ribosomal Protein Genes Misled by Shared Evolutionary Biases
Total Page:16
File Type:pdf, Size:1020Kb
Cladistics Cladistics 23 (2007) 130–144 10.1111/j.1096-0031.2006.00132.x The nematode–arthropod clade revisited: phylogenomic analyses from ribosomal protein genes misled by shared evolutionary biases Stuart J. Longhorn1,2,3, Peter G. Foster2 and Alfried P. Vogler1,3 1Department of Entomology and 2Department of Zoology, Natural History Museum, Cromwell Road, London, SW7 5BD, UK, 3Division of Biology, Imperial College London, Silwood Park Campus, Ascot, SL5 7PY, UK Accepted 29 June 2006 Abstract Phylogenetic analysis of major groups of Metazoa using genomic data tends to recover the sister relationships of arthropods and chordates, contradicting the proposed Ecdysozoa (the molting animals), which group the arthropods together with nematodes and relatives. Ribosomal protein genes have been a major data source in phylogenomic studies because they are readily detected as Expressed Sequence Tags (ESTs) due to their high transcription rates. Here we address the debate about the recovery of Ecdysozoa in genomic data by building a new matrix of carefully curated EST and genome sequences for 25 ribosomal protein genes of the small subunit, with focus on new insect sequences in addition to the Diptera sequences generally used to represent the arthropods. Individually, each ribosomal protein gene showed low phylogenetic signal, but in simultaneous analysis strong support emerged for many expected groups, with support increasing linearly with increased gene number. In agreement with most studies of metazoan relationships from genomic data, our analyses contradicted the Ecdysozoa (the putative sister relationship of arthropods and nematodes), and instead supported the affinity of arthropods with chordates. In addition, relationships among holometabolan insects resulted in an unlikely basal position for Diptera. To test for biases in the data that might produce an erroneous arthropod– chordate affinity we simulated sequence data on tree topologies with the alternative arthropod–nematode sister relationships, applying a model of amino acid sequence evolution estimated from the real data. Tree searches on these simulated data still revealed an arthropod–chordate grouping, i.e., the topologies used to simulate the data were not recovered correctly. This suggests that the arthropod–chordate relationships may be obtained erroneously also from the real data even if the alternative topology (Ecdysozoa) represents the true phylogeny. Whereas denser taxon sampling in the future may recover the Ecdysozoa, our analyses demonstrate that recent phylogenomic studies may be affected by as yet unspecified biases in amino acid sequence composition in the model organisms with available genomic data. Ó The Willi Hennig Society 2007. Molecular data have greatly changed the perspective are separated from another major group of protostome on relationships of the Bilateria, i.e., the phyla of animals phyla with spiral cleavage (Spiralia), which includes the with bilateral symmetry developing from three germ Lophotrochozoa (Annelida, Mollusca, Brachipoda, etc.) layers (triploblasts). These data suggest that the tradi- and the formerly basal Rotifera and Plathyhelminthes tional view of an acoel-like ancestor, which progressively (Adoutte et al., 2000). Initially, the revised taxonomic acquired a coelome and differentiated internal organs, is divisions were based on 18S rRNA evidence alone invalid. The revised phylogenetic analyses moves the (Aguinaldo et al., 1997), but a growing suite of nuclear pseudocoelomate Nematoda away from the base of the markers and combined morphological–molecular studies bilaterian tree to a position near the Arthropoda with have continued to support these findings (Manuel et al., which they were grouped as the Ecdysozoa, the ‘‘molting 2000; Peterson and Eernisse, 2001; Giribet, 2002; animals’’ (Aguinaldo et al., 1997; see Fig. 1). The latter Ruiz-Trillo et al., 2002; Mallatt et al., 2004). Many of the changes, including the close relationships of arthro- *Corresponding author: E-mail address: [email protected] pods and nematodes, had already been anticipated in the Ó The Willi Hennig Society 2007 S. J. Longhorn et al. / Cladistics 23 (2007) 130–144 131 Echinodermata Deuterostomia Brachiopoda Mollusca Chordata Lophotrochozoa Articulata Mollusca Annelida Brachiopoda Spiralia Spiralia Annelida Arthropoda Mollusca Rotifera Platyhelminthes Protostomia Platyzoa Annelida Protostomia Protostomia Coelomata Platyhelminthes Nematoda Arthropoda Nematoda Rotifera Ecdysozoa Rotifera Pseudocoelomata Arthropoda Chordata Nematoda Echinodermata Echinodermata Bilateria Acoelomata Deuterostomia Platyhelminthes Deuterostomia sensu lato Chordata Brachiopoda Bilateria Bilateria Cnidaria Cnidaria Ctenophora Ctenophora Ctenophora Cnidaria Porifera Porifera Porifera Adoutte et al., 2000; adapted from Hyman, 1951 Giribet, 2002 Nielsen et al., 1996 Fig. 1. Relationships of selected metazoan phyla. (A) Traditional viewpoint based on morphological data according to (Adoutte et al., 2000; adapted from Hyman, 1951). (B) The ‘‘new phylogeny’’ of Metazoa based on combined morphological and molecular data (Giribet, 2002). (C) Morphology based analysis prior to the formal establishment of Ecdysozoa (Nielsen et al., 1996). These trees were simplified by pruning terminals from the published topologies. Three taxa (Chordata, Arthropoda, Nematoda) critical for the debate about Coelomata are shown in bold. morphological literature (Eernisse et al., 1992; Nielsen tode C. elegans (Copley et al., 2004) and idiosyncratic et al., 1996). However, controversy specifically about the rates of sequence evolution (Telford, 2004; Dopazo and validity of the Ecdysozoa remained based on morpho- Dopazo, 2005), while phylogenetic inferences are further logical evidence (e.g., Nielsen, 2001; Scholtz, 2002), while impeded by the presumed rapid diversification of major the conclusions from 18S rRNA analyses were heavily metazoan lineage (Rokas et al., 2005). The removal of criticized (Wa¨gele et al., 1999) but defended by others the fastest evolving sites or genes also led to a reduction (Zrzavy, 2001). in support for the Coelomata (Philip et al., 2005; With the emergence of genome data for several Philippe et al., 2005), possibly indicating the role of animal species, deep metazoan relationships have long-branch attraction (LBA) phenomena. Similarly, become a test case for the use of such data in the exclusion of sites with different substitution rates phylogenetics. Owing to the taxon sampling, initial between D. melanogaster and C. elegans (relative to studies have mainly focused on the validity of the humans) resulted in the first genomic scale data set that Ecdysozoa. The surprising result was that the arthro- provided support for the Ecdysozoa (Dopazo and pods (Drosophila melanogaster) grouped with the repre- Dopazo, 2005). Recent studies that incorporate a sentative of chordates (Homo sapiens) rather than broader taxonomic sampling of arthropods than Dip- nematodes (Caenorhabditis elegans), apparently refuting tera alone, like inclusion of a representative Lepidoptera the Ecdysozoa (Mushegian et al., 1998; Hausdorf, 2000; (silk worm, Bombyx) and Hymenoptera (honey bee, Blair et al., 2002). This affinity of arthropods with Apis) plus representatives of Spiralia, are in better chordates broadly reflects the traditional view of close agreement with the revised phylogenetic scheme, argu- relationships of taxa possessing a true body cavity, a ing against a basal position of nematodes (i.e., against coelom, which current literature refers to as the the Coelomata). However, even with improved taxon ‘‘Coelomata hypothesis’’. The arthropod–chordate sampling, phylogenetic analyses based on genomic data (Coelomata) clade continues to be supported by studies may not support a monophyletic Ecdysozoa due to using genome and Expressed Sequence Tags (ESTs), presumed erroneous grouping of Nematoda with the even when several hundred (Blair et al., 2005; Wolf Platyhelminthes (Philippe et al., 2004, 2005). et al., 2004; Philip et al., 2005) or several thousand Another trend apparent with the advent of genome (Copley et al., 2004; Dopazo et al., 2004) genes were level analysis is the largely automated data compilation, used. The most recent study of all available eukaryotic where BLAST searches are used to retrieve similar genome sequences, selecting genes with well-established sequences without evaluating orthology, based on orthology, also supports this relationship (Ciccarelli automated alignments (Dopazo and Dopazo, 2005; et al., 2006). However, it has been argued that some of Philip et al., 2005). While this may be acceptable these studies are affected by inherent biases in genome practice with well-curated genome data, it may be less evolution, such as consistent loss of genes in the nema- so using EST sequences due to the need for first 132 S. J. Longhorn et al. / Cladistics 23 (2007) 130–144 compiling full-length transcripts from multiple redund- frames (ORFs) and generate putative protein sequences. ant ESTs, which frequently differ in sequence to various RP sequences for the following taxa were first obtained degrees. The difficulty of establishing gene orthology is a from the non-redundant (nr) protein database of Gen- general problem of genome level data, and many loci fail Bank: H. sapiens (Uechi et al., 2001), D. melanogaster, in rigorous tests of orthology (Blair et al., 2005; Philip C. elegans, Ictalurus punctatus (Vertebrata, Pisces) et al., 2005). Stringent data filtering to remove likely (Karsi et al., 2002),