<<

Downloaded from http://mbe.oxfordjournals.org/ at American Museum of NaturalArticle History on February 17, 2015 2007). ons, please Shultz 1990, ; 3 ization of terrestrialization ology prediction, ancient nalyses have divided extant a, Tetrapulmonata, and all alez, ). The interrelationships of Weygoldt and Paulus 1979 Vanessa L. Gonz 3 ict, transcriptomics, orth upported by locus groups of variable evolutionary 3 Morphological phylogenetic a of , Euchelicerat nce Access publication August 8, 2014 2963 ates, is generally accepted, but has garnered little support von ReumontChelicerata remain et among thethropod al. systematics. last Chelicerates major constitute 2012 challenges theest second for larg- branch ar- of theiconic tree lineages, ofscorpions. life such and include as several horseshoeChelicerata crabs, into , threeXiphosura (horseshoe and groups: crabs), and Pycnogonidalicerates) Arachnida (e.g., (sea (terrestrial che- spiders), Implicit in this schema is the minim events, namely bymutual the monophyly (=Xiphosura + Arachnida) of common ancestor. isnumber Pycnogonida of morphological The characters, and presently embryological evidence, Euchelicerata supported by a ived relationships are demonstrably robust. Consistently Attempts to resolve the internal phylogeny of chelicerates alyses of gene-tree discordance indicate marked incongru- adth of taxonomic sampling or in depth of sequencing. To scorpiones, and potentially engender long- erez-Porro, ps supported by subsets of slow-evolving genes include e approaches to optimization of matrix composition. Here, we ty for Molecular Biology and Evolution. All rights reserved. For permissi ; useum of Comparative Zoology, Harvard University trapulmonata (previously named ). Alicia R. P 2 Rota-Stabelli and Gonzalo Giribet 1 ong-branch attraction artifacts and does not mitigate gene-tree discordance, Simon et al. 2012 data, including two complete genomes. We analyzed multiple data sets contain- ; nonmonophyly of Arachnida with increasing support upon incrementing the and its three major rami— me- and genome-scale data, ion of the hexapod sister group Stefan T. Kaluziak, Campbell et al. 2011; ,1 E-mail: [email protected]. Ward C. Wheeler, 4 Meusemann et al. 2010 ; Nicolas Vidal 31(11):2963–2984 doi:10.1093/molbev/msu235 Adva Arthropoda, , concatenation, topological confl ). Multiple phylogenomic efforts have been directed

Hejnol et al. 2009;

The Author 2014. Published by Oxford University Press on behalf of the Socie Department of Biological Sciences, George Washington University Division of Invertebrate Zoology, AmericanMarine Museum Science of Center, Natural Northeastern History, University NewDepartment York, of NY Organismic and Evolutionary Biology, M Regier et al. 2010 ( et al. 2011 toward resolving internal relationships of Pancrustacea,unite which hexapods and the paraphyleticcontributed “crustaceans,” to and the have identificat *Corresponding author: Associate editor: rapid radiation. ß Mol. Biol. Evol. ence in relationships among chelicerate orders, whereas der show that phylogenetic signal forset. the monophyly Accelerated of Arachnida evolutionary is restricted rates to in the 500 , slowest-evolving genes Pseudo in the data rate, revealing significant conflicts in phylogenetic signal. An recovered and supported relationshipsorders include represented the monophyly by multiple terminals. Relationshi Key words: Subsequent to decades of pitchedrelationships contention over of the inter- emerge Arthropoda, regarding a theWith consensus shape the of has advent the of begun transcripto arthropod to tree of life. e-mail: [email protected] Introduction the monophyly of Arthropoda Chelicerata, Myriapoda,“Tetraconata”)—is and strongly Pancrustaceasister supported, group (alternatively, relationship together(e.g., of with and a velvet worms address the internal phylogeny of this17 group, species we in employed combination a with phylogenomicing approach, existing up generating to transcriptomic data 1,235,912 for sites across 3,644 loci, using alternativ branch attraction artifacts,number yielding of concatenated genes. Mutually exclusive hypotheses are s + ; aTetrapulmonata, Scorpiones, andbranch Pseudoscorpiones. length comprised distribution We exacerbatesregardless of demonstrate l of that Ricinulei, highproposed outgroup gene for , representation selection the clade for without comprising and outgroups Scorpiones regard + that Te Solifugae; for are model and organisms. Arachnopulmonata a (new name) clade is comprised of Chelicerata represents one of the oldestgroup groups to of arthropods, the with remaining a extant fossilhave arthropods, record achieved the extending to little mandibulates. the consensus, ,phylogeny. and due The is to monophyly sister of marked Arachnida, discord thefrom terrestrial in molecular chelicer both data, morphological which and have molecular been hypotheses limited of either chelicerate in bre Abstract 4 1 2 3 Conflicts in Phylogenetic Signal Prashant P. Sharma,* Gustavo Hormiga, Phylogenomic Interrogation of Arachnida Reveals Systemic Sharma et al. . doi:10.1093/molbev/msu235 MBE Downloaded from http://mbe.oxfordjournals.org/

FIG.1. (A–J) Exemplars of chelicerate diversity. (A) Anoplodactylus evansi (Pycnogonida), (B) Limulus polyphemus (Xiphosura) spawning, (C) Pachylicus acutus (Opiliones), (D) Neoscona arabesca (Araneae), (E) Centruroides sculpturatus (Scorpiones), (F) Ricinoides atewa (Ricinulei), (G) Eremobates sp. (Solifugae) consuming a smaller conspecific, (H) Synsphyronus apimelus (Pseudoscorpiones), (I) Mastigoproctus giganteus (), (J) Damon variegatus (). (K–N) Hypotheses of chelicerate ordinal relationships, from (K) Wheeler and Hayashi (1998) (morphology and ribosomal at American Museum of Natural History on February 17, 2015 sequence data); (L) Giribet et al. (2002) (two ribosomal genes); (M) Shultz (2007) (morphology); and (N) Regier et al. (2010) (62 nuclear genes, analysis with degenerated codons). Image of Anoplodactylus evansi courtesy of Mick Harris. Image of Limulus polyphemus courtesy of Peter Funch. and molecular sequence data (Wheeler et al. 1993; Wheeler Meusemann et al. 2010; Pepato et al. 2010; Regier et al. and Hayashi 1998; Mallatt et al. 2004; Regier et al. 2010), in- 2010; Arabi et al. 2012; von Reumont et al. 2012). The largest cluding phylogenomic approaches (e.g., Roeding et al. 2009; phylogenetic effort sampling all extant arachnid lineages, Meusemann et al. 2010). In contrast, due to extensive mor- based on 62 Sanger-sequenced loci, recovered strong support phological character conflict, basal relationships of a puta- only for the monophyly of Euchelicerata, Tetrapulmonata tively monophyletic Arachnida have remained recalcitrant (Araneae, Amblypygi, and Uropygi), and relationships to resolution, and many hypotheses have been proposed within the tetrapulmonates; Arachnida was modestly sup- for these interrelationships (e.g., Shultz 1990, 2007; Wheeler ported in that study by only two of four analyses, and basal and Hayashi 1998; Giribet et al. 2002)(fig. 1). For example, relationships of arachnids were highly unstable (Regier et al. mites (Acariformes) and (Parasitiformes) are tradition- 2010). ally united as by such characters as the subcapitulum Multiple aspects of Arachnida make the internal resolution and the “hexapod larva” (a six-legged postembryonic stage, of this group prone to recalcitrance. First, the origins of che- which also occurs in Ricinulei); this relationship conflicts with licerates are presumed to be ancient, occurring during or prior the presence of a prominent sejugal furrow in camel spiders to the Cambrian, with rapid diversification of extant arachnid (Solifugae) and mites (Shultz 2007; Dunlop et al. 2012), as well orders (Dunlop 2010; Rota-Stabelli et al. 2013). A considerable as the fundamentally divergent morphology of the two con- number of arachnid lineages equal in rank to orders has gone stituent lineages of Acari (Van der Hammen 1989). Some extinct, such as trigonotarbids (early arachnids present by the authors disputed even the monophyly of Arachnida, on the ) and phalangiotarbids (arachnids of unknown affinity basis of “primitive” characters occurring in and present in the and ), in addition to Opiliones (harvestmen) (e.g., Van der Hammen 1989). non-arachnid chelicerate groups such as eurypterids and Paralleling morphology, molecular data have not resolved chasmataspidids (sea scorpions), and synziphosurines relationships within Arachnida either, much less affirmed the (stem-group horseshoe crabs or stem-group euchelicerates) validity of this group (fig. 1). Analyses of nucleotide sequence (Dunlop 2010; Briggs et al. 2012; Lamsdell 2013). Many extant data infrequently recover the monophyly of Arachnida, al- orders also demonstrate the signature of prolonged historic though very few studies have comprehensively sampled all extinction, that is, marked spans of time between origin and extant arachnid orders (Wheeler and Hayashi 1998; Giribet diversification, and the retention of few extant species. This et al. 2002; Masta et al. 2009, 2010; Roeding et al. 2009; phenomenon is epitomized by horseshoe crabs, which

2964 Phylogenomic Interrogation of Arachnida . doi:10.1093/molbev/msu235 MBE appeared in the fossil record in the and are sur- Illumina platform libraries. Sampled outgroup taxa to vived by just four extant species that diverged in the Arachnida consisted of Xiphosura and Pycnogonida, as well (Obst et al. 2012). The combination of ancient as basally branching Mandibulata (myriapods) and diversification and subsequent extinctions has presumably Onychophora. We used recently developed tools for orthol- produced short internodes in extant chelicerate phylogeny, ogy prediction and optimization of matrix composition. The a documented affliction for phylogenetic reconstruction largest data set of 3,644 genes and greater than 1.2 million (Rokas and Carroll 2006; Salichos and Rokas 2013). amino acid sites (at 25% gene occupancy) included exemplars Second, difficulty of phylogenetic resolution is com- of all extant arachnid orders except for (the sister pounded by accelerated evolution in certain arachnid group of Thelyphonida and part of Uropygi). Here we show orders. Specifically, mitochondrial genes of Pseudoscorpiones, that phylogenetic analysis of the 3,644-ortholog data set yields Acariformes, and Parasitiformes (or at least ) the nonmonophyly of Arachnida with 100% bootstrap sup- have demonstrably higher rates of evolution, and larger var- port, due to suspect placement of Pseudoscorpiones, iability in nucleotide composition, than other Euchelicerata Acariformes, and Parasitiformes. We demonstrate that this

(Arabi et al. 2012). Similarly, branch lengths inferred from result is an LBA artifact by examining nodal support as a Downloaded from nuclear ribosomal gene alignments suggest heterotachy in function of concatenation by evolutionary rate. Intriguingly, the same groups (as well as and some araneo- a set of genes supporting the monophyly of Arachnida morph spiders), even upon accounting for secondary strongly conflicts with other such sets, which support other structure (Pepato et al. 2010). Significantly accelerated rates hypotheses also consistent with morphology (e.g., the place- of evolution in certain lineages underlie long-branch attrac- ment of Pseudoscorpiones), indicating nonuniformity of phy- http://mbe.oxfordjournals.org/ tion (LBA), the artifactual close relationship of lineages with logenetic signal. Significant incongruence among gene trees long-branch lengths (Felsenstein 1978; Huelsenbeck 1997; corroborates the pervasiveness of phylogenetic conflict at the Bergsten 2005; Philippe et al. 2005). Because outgroup taxa base of Arachnida. Nevertheless, we demonstrate consistent are often undersampled, the placement of long-branch taxa support for the monophyly of Chelicerata, Euchelicerata, and sister to, or part of a grade with, outgroup terminals is symp- several derived of arachnids. tomatic of LBA artifacts. Application of sequence data at genomic scales, compounded by inclusion of relatively few Results terminals, can result in maximal support values for spurious Orthology Assignment and Supermatrix Assembly at American Museum of Natural History on February 17, 2015 placement of long-branch terminals (reviewed by Bergsten We sequenced cDNA from 16 arachnid and one xiphosuran 2005). species using the Illumina GAII and HiSeq 2500 platforms, and A third and final aspect of arachnid biology hindering phy- combined these with published transcriptomes and genomes logenetic resolution may be variability of genome size. At one from 30 other species, including some libraries sequenced in extreme, the genome of the mite Tetranychus urticae is our laboratories (Riesgo et al. 2012; Fernandez et al. 2014). among the smallest arthropod genomes, comprising approx- Assembly statistics for new libraries, as well as published tran- imately 18,224 coding genes (Grbic et al. 2011). In contrast, scriptomes assembled from raw reads, are provided in table 1. the genome of the Mesobuthus martensii contains approximately 32,016 genes, more than any other Subsequent to open reading frame (ORF) prediction and se- genome (Cao et al. 2013). For reference, approximately lection of the longest ORFs per unigene, 3,055–18,814 peptide sequences were retained for Illumina libraries and complete 15,771 genes occur in the fruit fly Drosophila melanogaster (FlyBase Consortium, Release 5), and approximately 21,000– genomes, 815–1,127 for 454 Life Science platform libraries, 25,000 in Homo sapiens (Genome Reference Consortium). and 54–10,646 for Sanger-sequenced EST libraries. Furthermore, whereas genome compaction in mites is asso- Orthology assessment of this 48-taxon data set with the ciated with gene loss (Grbic et al. 2011), scorpions appear to Orthologous MAtrix (OMA) stand-alone algorithm recovered have undergone high gene family turnover, as well as gene 77,774 orthogroups. Subsequent to alignment and culling of duplications and neofunctionalization of resulting paralogs ambiguously aligned positions, a supermatrix was con- (Cao et al. 2013; Sharma et al. 2014). Such marked historical structed by retaining a minimum gene occupancy threshold disparity in the evolution of gene families across arachnids of 13 taxa (i.e., occupancy of 425% of the data set), resulting may have hindered accurate assessments of orthology in data in 3,644 concatenated orthologs (1,235,912 aligned sites; sets composed of Sanger-sequenced nuclear genes, particu- 36.23% occupied). The number of orthologs per taxon larly some commonly utilized phylogenetic markers assumed varied from 6 (the palpigrade Prokoenenia wheeleri;based to be single-copy orthologs throughout arthropods (e.g., on published Sanger data) to 3,215 (the harvestman Hedin et al. 2010), but subsequently shown to contain nu- Sitalcina lobata). Due to the paucity of data for the palpigrade, merous paralogs using cloning techniques and/or transcrip- its phylogenetic placement was analyzed separately. tomic resources (Riesgo et al. 2012; Clouse et al. 2013). Toward improving resolution within Arachnida, we com- Maximum-Likelihood Analyses piled a data set of 48 taxa, including new transcriptomic Phylogenetic analysis of the 3,644-ortholog matrix under max- libraries for 17 species published herein, two complete ge- imum likelihood (ML) recovered a fully resolved tree topology nomes (the mite T. urticae and the Ixodes scapularis), with 100% bootstrap support for the monophyly of and previously published expressed sequence tag (EST) and Chelicerata and Euchelicerata (fig. 2). Contrary to the

2965 Sharma et al. . doi:10.1093/molbev/msu235 MBE

Table 1. Species Sequenced and Analyzed in This Study. Source Raw Reads Contigs/ESTs/Genes Peptides Filtered Peptides Onychophora Epiperipatus sp. GenBank (Sanger ESTs) — 1,868 698 480 Peripatopsis sedgwicki GenBank (454) — 10,476 5,489 2,552 Peripatopsis capensis De novo (Illumina HiSeq) 40,774,042 92,516 17,452 12,846 Myriapoda Archispirostreptus gigas Diplopoda GenBank (454) — 4,008 1,237 815 Alipes grandidieri Chilopoda GenBank (Illumina GAII) 32,332,034 147,732 34,505 18,814 Scutigera coleoptrata Chilopoda GenBank (Sanger ESTs) — 2,400 961 570 Pycnogonida Anoplodactylus eroticus Pycnogonida GenBank (454) — 3,744 2,291 1,127 Endeis spinosa Pycnogonida GenBank (454) — 4,069 3,302 2,656 Xiphosura Downloaded from Limulus polyphemus Xiphosura De novo (Illumina HiSeq) 65,099,444 110,362 33,041 17,824 Carcinoscorpius rotundicauda Xiphosura GenBank (Sanger ESTs) — 575 202 199 Arachnida Damon variegatus Amplypygi De novo (Illumina GAII) 64,733,221 64,715 19,686 11,823

Acanthoscurria gomesiana Araneae GenBank (Sanger ESTs) — 6,790 2,203 1,567 http://mbe.oxfordjournals.org/ Liphistius malayanus Araneae De novo (Illumina HiSeq) 62,897,982 61,775 16,526 11,221 Frontinella communis Araneae De novo (Illumina HiSeq) 66,183,160 155,930 53,936 18,978 Leucauge venusta Araneae De novo (Illumina HiSeq) 49,301,974 189,630 62,318 17,591 Neoscona arabesca Araneae De novo (Illumina HiSeq) 57,103,328 150,856 43,252 16,594 Blomia tropicalis Acariformes GenBank (Sanger ESTs) — 1,432 925 888 Glycyphagus domesticus Acariformes GenBank (Sanger ESTs) — 2,589 1,309 1,303 Suidasia medadensis Acariformes GenBank (Sanger ESTs) — 3,585 2,323 1,766

Tetranychus urticae Acariformes GenBank (whole genome) — 18,313 18,313 14,276 at American Museum of Natural History on February 17, 2015 Amblyomma americanum Parasitiformes GenBank (Sanger ESTs) — 6,502 1,266 971 Amblyomma variegatum Parasitiformes GenBank (Sanger ESTs) — 3,992 2,248 1,419 Dermacentor andersoni Parasitiformes GenBank (Sanger ESTs) — 3,994 614 452 Ixodes scapularis Parasitiformes GenBank (whole genome) — 20,473 20,473 15,288 Rhipicephalus appendiculatus Parasitiformes GenBank (Sanger ESTs) — 19,123 10,778 6,303 Rhipicephalus microplus Parasitiformes GenBank (Sanger ESTs) — 52,902 20,939 10,646 Larifuga sp. Opiliones De novo (Illumina HiSeq) 38,354,659 115,648 23,750 10,510 Metabiantes sp. Opiliones De novo (Illumina HiSeq) 72,266,554 85,203 30,754 12,843 Metasiro americanus Opiliones GenBank (Illumina GAII) — 83,792 30,041 16,556 Pachylicus acutus Opiliones De novo (Illumina HiSeq) 18,681,026 69,803 23,379 14,202 Phalangium opilio Opiliones De novo (Illumina GAII) 16,225,145 131,889 40,234 15,277 Vonones ornata Opiliones De novo (Illumina HiSeq) 66,096,788 114,248 37,076 19,208 Leiobunum verrucosum Opiliones GenBank (Illumina HiSeq) — 42,636 26,407 13,329 Protolophus singularis Opiliones GenBank (Illumina HiSeq) — 287,150 63,477 13,987 Ortholasma coronadense Opiliones GenBank (Illumina HiSeq) — 45,807 23,846 10,780 Trogulus martensi Opiliones GenBank (Illumina HiSeq) — 55,077 23,233 12,765 Hesperonemastoma modestum Opiliones GenBank (Illumina HiSeq) — 55,565 32,409 8,845 Sclerobunus nondimorphicus Opiliones GenBank (Illumina HiSeq) — 56,727 26,599 11,518 Sitalcina lobata Opiliones GenBank (Illumina HiSeq) — 64,104 26,410 14,828 Siro boyerae Opiliones GenBank (Illumina HiSeq) — 38,618 18,657 11,387 Prokoenenia wheeleri Palpigradi GenBank (Sanger ESTs) — 56 54 54 Synsphyronus apimelus Pseudoscorpiones De novo (Illumina HiSeq) 65,443,655 103,556 31,567 17,820 Pseudocellus pearsei Ricinulei De novo (Illumina HiSeq) 91,403,481 29,077 7,687 5,922 Ricinoides atewa Ricinulei De novo (Illumina GAII) 52,766,585 97,327 31,331 14,324 Centruroides vittatus Scorpiones De novo (Illumina HiSeq) 45,691,843 14,215 3,859 3,055 Pandinus imperator Scorpiones GenBank (454) — 17,253 2,219 2,209 Eremobates sp. Solifugae De novo (Illumina GAII) 86,794,767 94,687 25,474 11,765 Mastigoproctus giganteus Uropygi De novo (Illumina GAII) 25,983,006 116,600 32,827 17,674

2966 Phylogenomic Interrogation of Arachnida . doi:10.1093/molbev/msu235 MBE Downloaded from http://mbe.oxfordjournals.org/ at American Museum of Natural History on February 17, 2015

FIG.2. Relationships of Arachnida inferred from ML analysis of 3,644 orthologs. Numbers on nodes indicate bootstrap resampling frequencies (asterisks indicate a value of 100%). Uppercase text indicates terminals represented by complete genomes; boldface lowercase text indicates terminals with novel transcriptomic data from this study. Bars on right represent number of genes available for each terminal. Below: Global view of the sequence alignment. traditional basal split within Euchelicerata between Xiphosura Matrix Reduction and Concatenation by Gene and the terrestrial Arachnida, the ML analysis supported the Occupancy paraphyly of Arachnida, with Xiphosura sister to Ricinulei. To assess the impact of heterogeneous levels of missing data, The clade Pseudoscorpiones + Acariformes was recovered we generated three supermatrices. Two of these were sister to the remaining Euchelicerata; this clade in turn generated using the MARE algorithm, retaining either all 47 formed part of a grade with Acariformes with respect to terminals (516 orthologs, 122,436 sites, 39.99% occupied, the remaining Euchelicerata and Xiphosura. All orders repre- call handle “MARE1”) or only the 30 libraries sented by multiple terminals were recovered as monophyletic comprising Illumina data sets and whole genomes (1,237 with 100% bootstrap support, often in spite of marked dis- orthologs, 373,349 sites, 62.49% occupied; call handle parities in gene representation. The greatest such disparity is “MARE2”). A third matrix was generated by removing from exemplified by the Illumina-sequenced library of Limulus poly- our original 3,644-ortholog alignment all terminals with smal- phemus, which has over 2 orders of magnitude greater rep- ler libraries, that is, the same number of taxa as MARE2 (3,644 resentation than the Sanger-sequenced EST library of orthologs, 1,235,912 sites, 55.34% occupied, call handle “IL- Carcinoscorpius rotundicauda (2,431 vs. 14 orthologs, respec- GEN”). tively); support for Xiphosura was nevertheless maximal. All In spite of limited topological differences, none of nodes in this topology were fully supported, excepting the these three matrices recovered the monophyly of two corresponding to sister relationships of two sets of con- Arachnida (fig. 3 and supplementary fig. S1, Supplementary generic ticks (Amblyomma and Rhipicephalus). Longer Material online). In the two 30-taxon matrices (MARE2 branch lengths were observed for the orders Acariformes, and IL-GEN), L. polyphemus was consistently recovered Parasitiformes, and Pseudoscorpiones, with respect to the as nested within a paraphyletic Arachnida with strong remaining taxa. support (bootstrap resampling frequency [BS]  98%).

2967 Sharma et al. . doi:10.1093/molbev/msu235 MBE Downloaded from http://mbe.oxfordjournals.org/

FIG.3. Left: of arachnid relationships inferred from ML analysis post-matrix reduction, retaining only terminals with Illumina-sequenced transcriptomes or complete genomes. Right: Cladogram of chelicerate relationships inferred from ML analysis upon manual exclusion of Sanger- at American Museum of Natural History on February 17, 2015 sequenced EST and 454 library data from the 3,644-ortholog supermatrix. Orders of interest are indicated with shading. Numbers on nodes indicate bootstrap resampling frequencies. Bars on right represent number of genes available for each terminal. Below: Global view of the sequence alignment.

In all three topologies, Parasitiformes and the clade Support for Arachnida was equal to or near zero for all ma- Pseudoscorpiones + Acariformes formed a grade sister to trices except for one, which enforced gene occupancy as rep- the remaining Euchelicerata, and together with Solifugae in resentationinatleast30taxa(table 2, fig. 4).TheMLtree IL-GEN (a topology congruent with the supermatrix topology topology based on this exceptional matrix (98 orthologs, in fig. 2). 16,594 sites, 68.20%, occupancy, call handle “AL30”) recovered To infer the impact of gene occupancy on phylogenetic the monophyly of Arachnida with moderate support signal, another six matrices were constructed wherein the (BS = 79%; supplementary fig. S2, Supplementary Material threshold of gene occupancy was serially increased by five online). taxa, up to representation for a minimum of 35 out of 47 taxa. A summary of matrix composition and amount of miss- Concatenation by Percent Pairwise Identity ing data for these six matrices is provided in table 2. Bootstrap To assess how nodal support for alternative hypotheses of resampling frequency for various topological hypotheses of chelicerate relationships is affected by rate of molecular evo- interest showed no consistent effect of missing data (fig. 4). lution, a series of 15 matrices was assembled wherein

Table 2. Nodal Support for Three Ingroup Clades as a Function of Gene Occupancy Threshold. Bootstrap Resampling Frequency Gene Occupancy Threshold Proportion Number of Chelicerata Euchelicerata Arachnida Missing/Gaps Orthologs (sites) At least 13 taxa 0.638 3,644 (1,235,912) 100 100 0 At least 15 taxa 0.606 2,806 (896,210) 100 100 0 At least 20 taxa 0.520 1,152 (312,646) 100 100 0 At least 25 taxa 0.448 352 (77,516) 100 100 0 At least 30 taxa 0.318 98 (16,594) 99 100 79 At least 35 taxa 0.249 23 (4,021) 72 100 1

2968 Phylogenomic Interrogation of Arachnida . doi:10.1093/molbev/msu235 MBE Downloaded from http://mbe.oxfordjournals.org/ at American Museum of Natural History on February 17, 2015

FIG.4. (A) Bootstrap resampling frequency for phylogenetic hypotheses inferred from concatenation of orthologs by gene occupancy. (B) Global view of sequence alignment of largest matrix (13 taxa). (C) Global view of sequence alignment of densest matrix (35 taxa). orthologs were concatenated in order of percent pairwise for a group of gastropods. To facilitate systematic discourse identity, beginning with the most conserved (i.e., similar). more broadly, we rename this group Arachnopul-monata. We added genes at increments of 100 until 1,000 genes had In contrast, the trajectory corresponding to Arachnida been concatenated, and increments of 500 subsequently showed initial increase to maximum bootstrap frequency until 3,500 genes had been concatenated. We scored upon concatenation of the 500 slowest-evolving orthologs. the bootstrap resampling frequency for various topological Thereafter, nodal support for Arachnida decreased to zero by hypotheses. For such nodes as Chelicerata and Euchelicerata, the concatenation of the 1,000th slowest-evolving ortholog. we observed a trajectory of nodal support theoretically As illustrated in figure 6, the topology inferred from the 500 expected with accruing data: monotonic or nearly slowest-evolving orthologs recovers three separate clades monotonic increase in bootstrap resampling frequency until within Arachnida: Arachnopulmonata; a clade comprised of fixation at the maximum value (fig. 5). A similar result was Opiliones, Solifugae, and Ricinulei (a trio of orders with a obtained for some derived relationships that are consistent completely segmented opisthosoma and a tracheal tubule with morphology, such as Pedipalpi (Amblypygi + Uropygi), system for respiration); and a clade comprised of Pseudoscor- Tetrapulmonata (Araneae + Pedipalpi), and a clade we piones, Acariformes, and Parasitiformes. term Arachnopulmonata new name, which comprises Like Arachnida, a peak of support is observed for the place- Scorpiones + Tetrapulmonata. Albeit recovered with moder- ment of Pseudoscorpiones as part of a clade with ate support in the majority of their analyses, Regier et al. Arachnopulmonata. Specifically, much of that nodal support (2010) used the term “Pulmonata” for this clade, a places Pseudoscorpiones as sister group to Scorpiones (figs. 5 name introduced by Firstman (1973) for a clade of pulmonate and 6). Unlike Arachnida, nodal support for these hypotheses arachnids, but which had already been in use for decades is more restricted to the 200 slowest-evolving orthologs,

2969 Sharma et al. . doi:10.1093/molbev/msu235 MBE Downloaded from http://mbe.oxfordjournals.org/ at American Museum of Natural History on February 17, 2015

FIG.5. Bootstrap resampling frequency for phylogenetic hypotheses inferred from sequential concatenation of orthologs in order of percent pairwise identity (most to least conserved). which are insufficient to support the monophyly of ortholog data set and for MARE1 did not achieve conver- Arachnida (fig. 6). gence (0.3 < maximum discrepancy < 1) after 7 months of To ensure that results from this method of concatenation computation on 500 parallelized processors. We examined were not driven by missing data, we ranked the 3,644 ortho- posterior distributions under the four independent runs for logs in order of percent pairwise identity and mapped the both data sets and observed that all chains had recovered the oldest most recent common ancestor (MRCA) recoverable same consensus topology for each data set. Under either data for each orthogroup. Dense representation was observed for set, internodes of infinitesimal length separated major clades the MRCAs of Arachnida, Euchelicerata and Arthropoda, and within Euchelicerata; the result is indistinguishable from a the root of the tree (supplementary fig. S3, Supplementary basal polytomy for both data sets (fig. 7). Bayesian inference Material online). Considering the composition of three ma- analysis of the 500 slowest-evolving ortholog data set recov- trices of significance (200-slowest evolving orthologs, 500- ered the clade (Pseudoscorpiones + Scorpiones), but not slowest evolving orthologs, complete matrix), the proportion arachnid monophyly (fig. 7 compare with fig. 6). The two of orthologs wherein at least the MRCA of Arachnida was Bayesian inference topologies were almost identical due to represented exceeded 98% for all matrices. Only 72 (1.98%) of extensive overlap of orthologs between the two data sets; of orthologs in the complete matrix were restricted to sampling the 516 genes in MARE1, 301 are shared with the 500 slowest- of derived arachnids groups, and these were evenly distrib- evolving ortholog set. uted with respect to ordination of evolutionary rate. These results indicate that missing data affected the sampling of Discussion basal nodes in a nonsignificant and random way. Conflict, Resolution The phylogenetic data set amassed in this study is the largest Bayesian Inference Analyses deployed to test basal relationships of Arachnida to date. Runs of PhyloBayes implementing infinite mixture models Nevertheless, the tree topology recovered by ML analysis of (GTR + CAT) were implemented for the 500 slowest-evolving the 3,644 supermatrix (fig. 2) recapitulates a recurring

2970 Phylogenomic Interrogation of Arachnida . doi:10.1093/molbev/msu235 MBE Downloaded from http://mbe.oxfordjournals.org/ at American Museum of Natural History on February 17, 2015

FIG.6. Left: Tree topology inferred from ML analysis of the 200 slowest-evolving genes. Right: Tree topology inferred from ML analysis of the 500 slowest-evolving genes. Orders of interest are indicated with shading. Numbers on nodes indicate bootstrap resampling frequencies. Bars on right represent number of genes available for each terminal. Below: Global views of the sequence alignments. phenomenon in molecular phylogenetic studies of Parasitiformes) with known higher rates of molecular evolu- Chelicerata: The putative nonmonophyly of Arachnida sup- tion as a grade at the base of Euchelicerata with maximum ported by this analysis is a counterintuitive result, overturning bootstrap support, whether based on transcriptomic data a key relationship supported by morphological cladistic stud- (Roeding et al. 2009), nuclear ribosomal genes (Pepato et al. ies (e.g., Wheeler and Hayashi 1998; Giribet et al. 2002; Shultz 2010), or mitochondrial genes (Arabi et al. 2012). High or even 2007). Specifically, the nested placement of Xiphosura has optimal nodal support in genomic-scale analyses can belie been recovered in other phylogenomic assessments, albeit inaccuracy of the tree topology, interpartition conflicts, and with incomplete sampling of arachnid orders (Roeding et al. other biases (Felsenstein 1978; Rokas et al. 2003; Salichos and 2009; Meusemann et al. 2010). Shultz (2007) observed that Rokas 2013) and may be symptomatic of an LBA phenome- arachnids are putatively united by a series of synapomorphies, non (reviewed by Bergsten 2005). Furthermore, the inclusion such as aerial respiration, an anteriorly/anteroventrally di- of smaller (454 and Sanger-sequenced EST) libraries in our rected mouth, and the loss of some characters observed in data set engenders the possibility that the nonmonophyly of marine chelicerates (e.g., walking leg gnathobases, cardiac Arachnida is an artifact stemming from missing data. lobe). Counterarguments to morphological support for arach- The effect of missing data in phylogenetic analysis is con- nid monophyly, which are grounded in convergence of char- troversial. Simulations have shown that highly incomplete acters driven by independent episodes of terrestrialization, are terminals can be reliably placed in phylogenies, and often difficult to falsify, due to the fragmentary nature of the fossil confer advantages over excluding such terminals, but may record and limitations in parameterizing the historical prob- engender LBA artifacts in some cases (Wiens 2006). Other ability of terrestrialization events (Shultz 2007; von Reumont analyses of empirical data sets have implicated missing data et al. 2012). as misleading, specifically with respect to inflating nodal sup- However, we regarded with suspicion the placement port values for problematic nodes, but data sets selected to of three orders (Pseudoscorpiones, Acariformes, and balance representation of sequence data have not been

2971 Sharma et al. . doi:10.1093/molbev/msu235 MBE Downloaded from http://mbe.oxfordjournals.org/ at American Museum of Natural History on February 17, 2015

FIG.7. Consensus of PhyloBayes tree topologies across four independent runs. Left: Tree topology inferred using the 500 slowest-evolving genes. Right: Tree topology inferred using the MARE1 matrix. Dotted line represented abbreviated branch length for Glycyphagus domesticus (lengths of 1.545 and 1.076 for left and right topologies, respectively).

shown to resolve ambiguous nodes either (Dell’Ampio et al. although the measures of matrix completeness for these re- 2014; Fernandez et al. 2014). duced data sets are comparable to phylogenomic counter- To address this possibility, we combined concatenation parts among pancrustaceans and myriapods (e.g., approaches with ML inference of gene-tree topologies. In Meusemann et al. 2010; von Reumont et al. 2012; the 3,644-ortholog data set, all arachnid orders were repre- Fernandez et al. 2014). Matrix reduction with MARE while sented by at least one Illumina data set or complete genome, retaining all 47 terminals yielded nonmonophyly of Arachnida andwethuscalculatedlargenumbers of potentially informa- as well, with nonsignificant nodal support at the base of tive gene trees (i.e., trees containing at least one member of Euchelicerata (supplementary fig. S1, Supplementary each descendant branch and two distinct outgroups) for all Material online). nodes corresponding to interordinal relationships, in the Sequential concatenation of six matrices with varying level range of 701–3,116 for nodes corresponding to basal arachnid of gene occupancy did not suggest a uniform effect of missing relationships (fig. 8). However, the proportion of those po- data either for several relationships of interest (fig. 4). The tentially informative gene trees that were congruent with a smallest matrix with gene occupancy of over 35 taxa per gene given node varied markedly. Generally, internodes corre- (23 orthologs; 75.10% sequence occupancy) recovered sponding to the monophyly of derived clades (e.g., orders some relationships with slightly lower node support that sampled with multiple terminals; Tetrapulmonata) were were robustly supported elsewhere (e.g., Chelicerata, characterized by a higher proportion of congruent gene Tetrapulmonata, Arachnopulmonata), consistent with the trees, in contrast to basal nodes within Euchelicerata. These theoretical expectation of increased nodal support with ac- data suggest that the gene tree incongruence observed is not cruing data for robust nodes. Support for Arachnida was con- entirely attributable to missing data. sistently low, excepting its recovery with moderate support in We conducted additional analyses to investigate whether the AL30 matrix (gene occupancy of at least 30 taxa, 98 nonmonophyly of Arachnida was attributable to missing orthologs, 68.20% sequence occupancy; supplementary fig. data. Neither manual exclusion of 454 and Sanger-sequenced S2, Supplementary Material online). This recovery of a mono- libraries nor matrix reduction with MARE yielded arachnid phyletic Arachnida through exclusive use of molecular data, monophyly for their respective 30-terminal data sets (fig. 3), albeit with limited support, is a rare result (Regier et al. 2010).

2972 Phylogenomic Interrogation of Arachnida . doi:10.1093/molbev/msu235 MBE Downloaded from http://mbe.oxfordjournals.org/ at American Museum of Natural History on February 17, 2015

FIG.8. Relationships of Arachnida inferred from ML analysis of 3,644 orthologs (as in fig. 2). Numbers on nodes indicate number of gene trees congruent with nodes; numbers in parentheses indicate total number of potentially informative genes (for nodes of interest). Area shaded in gray indicates Tetrapulmonata, with branch lengths extended for clarity.

Nevertheless, as this threshold of gene occupancy (represen- those internodes (supplementary figs. S4 and S5, tation for at least 30 taxa) is at neither extreme of the param- Supplementary Material online), corroborating previous in- eter’s spectrum of values, we considered this result ferences of ancient and rapid radiation inferred from the fossil insufficient evidence for a systemic and pervasive effect of record and molecular phylogenetic efforts (Giribetetal.2002; missing data on tree topology inference. We therefore inves- Dunlop 2010; Regier et al. 2010). tigated another aspect of our data set that might reveal phy- We also considered the possibility that evolutionary rate logenetic support for Arachnida, namely, rate of molecular and degree of missing data were conflated. We dissected the evolution. relationship between these two variables and found that the Serial concatenation by evolutionary rate was conducted matrices concatenated by evolutionary rate had approxi- for 15 matrices wherein orthologs were concatenated in order mately the same amount of gene occupancy (36.23– of percent pairwise identity, beginning with the most con- 43.92%), with slightly higher occupancy in the slowest- served. In contrast to concatenation by gene occupancy, we evolving data sets (supplementary fig. S6, Supplementary observed distinct patterns in trajectories of nodal support Material online). In contrast, matrices concatenated by gene upon addition of genes, enabling discernment of robustly occupancy exhibit a clear negative relationship with evolu- supported nodes that were insensitive to evolutionary rate tionary rate, with denser matrices including a larger propor- (e.g., Chelicerata, Euchelicerata, Tetrapulmonata) and nodes tion of slowly evolving genes (supplementary fig. S6, that were supported by a subset of genes with slower rates Supplementary Material online)—genes that are more con- (e.g., Arachnida, Pseudoscorpiones + Scorpiones) (fig. 5). served across taxa. The latter correlation suggests that evolu- Intriguingly, the matrix of the 200- and 500-slowest evolving tionary rate is a stronger explanatory variable than amount of orthologs recovered interrelationships between arachnid missing data, not only for phylogenetic signal but also for the clades characterized by very short internodes in spite of rel- amount of missing data itself. In spite of this trend, concate- atively large numbers of potentially informative genes at nation in order of percent pairwise identity did not affect

2973 Sharma et al. . doi:10.1093/molbev/msu235 MBE representation of basal nodes (supplementary fig. S3, observed an unresolved star topology, with numerous reticu- Supplementary Material online), with the ingroup (MRCA lations corresponding to basal divergences of Arachnida (fig. of Arachnida) represented in over 98% of orthologs. The neg- 10). The retention of gene tree incongruence irrespective of ative relationship between evolutionary rate and gene occu- various approaches to matrix optimization (e.g., matrix reduc- pancy could reflect biological pattern (e.g., orthologs with tion, gene occupancy, evolutionary rate) indicates that con- conserved function and sequence may be expressed in flicts at the base of Arachnida are not solely attributable to many taxa), the algorithmic operation of OMA (Altenhoff stochastic sampling error, but are systemic. The greater et al. 2013), or a combination of the two. amount of reticulation in supernetworks of slowly evolving The number of different ML topologies obtained with var- genes corroborates previous reports of greater gene tree in- ious matrices is strongly suggestive of systemic incongruence congruence in slowly evolving partitions (Salichos and Rokas of phylogenetic signal, attributable in part to LBA artifacts; in 2013;butseeBetancur-R. et al. 2014) all topologies we obtained from the incremental concatena- The sum of these analyses suggests that phylogenetic res- tion series with nonmonophyly of Arachnida, arachnid olution of Arachnida is beleaguered by incongruent signal in paraphyly was caused by the suspect placement of different genomic regions, which in turn is a function of evo- Downloaded from Pseudoscorpiones, Acariformes, and/or Parasitiformes outside lutionary rate and exacerbated by LBA artifacts in rapidly of the clade uniting the remaining arachnids and Xiphosura evolving lineages. The combination of long-branch taxa, inci- (figs. 2, 3, 6,and7 and supplementary fig. S1, Supplementary dent heterotachy, and discordant results stemming from con- Material online). Consistent with this prediction as well, two sideration of evolutionary rate closely reflects historical separate Bayesian inference analyses (the 500-slowest evolv- disputes over the root of Metazoa, and tradeoffs between http://mbe.oxfordjournals.org/ ing orthologs and MARE1) yielded a polytomy or infinitesimal amount and parameterization of sequence data (Dunn branch lengths at the base of Arachnida (fig. 7), with long et al. 2008; Philippe et al. 2009, 2011; Schierwater et al. 2009; branches subtending the problematic orders. The LG4X + F Nosenko et al. 2013). Beyond demonstrating concealed phy- and CAT + GTR models utilized in ML and Bayesian inference logenetic support for arachnid monophyly, these results are analyses, respectively, were chosen for their superior ability to illuminative in understanding why a morphologically well- accommodate site-heterogeneous patterns of molecular evo- circumscribed group has historically proven difficult to re- lution, particularly in the event of LBA artifacts (Lartillot and cover in molecular phylogenetic analyses. We note that con-

Philippe 2004, 2008; Le et al. 2012). But upon examination of catenation by order of evolutionary rate proved an effective at American Museum of Natural History on February 17, 2015 branch lengths across postburn-in samples from the approach of identifying this phylogenetic incongruence. Serial PhyloBayes runs (based on the 500 slowest-evolving ortholog concatenation strategies that rely upon random sampling of data set), we continued to observe significantly longer branch orthologs (e.g., RADICAL; Narechania et al. 2012; Simon et al. lengths under a Bayesian relative rates test (Wilcox et al. 2004) 2012) were trialed in our data set, but did not identify incon- for all three problematic orders (Acariformes, Parasitiformes, gruence when it was highly localized. For example, unless a and Pseudoscorpiones) with respect to the MRCA of random concatenation series captures a significant propor- Euchelicerata, in comparison to the remaining euchelicerate tion of that subset supporting Pseudoscorpiones + taxa (P << 0.0001 for all comparisons; fig. 9). Together with Arachnopulmonata (200 out of 3,644 orthologs), this incon- the ML topologies, these analyses reinforce the inference of gruence will not be identified. Apropos, of the 98 orthologs in short internodes in the euchelicerate tree of life, precipitated the AL30 matrix, which supported the monophyly of by rapid divergence of basal lineages, and marked heterotachy Arachnida, 47 are among the 500 slowest evolving orthologs, in problematic orders. further indicating that rate of evolution has greater explana- However, concatenation methods have been critiqued for tory power than missing data toward accounting for arachnid their tendency to mask phylogenetic conflict when strong nonmonophyly. gene tree incongruence is incident (Jeffroy et al. 2006; Kubatko and Degnan 2007; Nosenko et al. 2013; Salichos and Rokas 2013), and some authors have advocated against Evaluating Historical Hypotheses Based on the use of bootstrap resampling in phylogenomic analyses Morphology: and Solifugae (Salichos and Rokas 2013; Dell’Ampio et al. 2014). Several We observed a nodal support profile similar to that of newly developed metrics for quantifying gene tree incongru- Arachnida for the clades (Pseudoscorpiones + Acari) and ence are inapplicable to partial trees (e.g., internode certainty; (Ricinulei + Solifugae) (fig. 5), neither historically supported Salichos and Rokas 2013). Therefore, we visualized the dom- by analyses of morphological characters (see Shultz 2007 for inant bipartitions among theMLgenetreetopologiesby the most recent analysis and summary of hypotheses). constructing supernetworks using the SuperQ method Monophyly of Acari (Acariformes + Parasitiformes) is conten- (Grunewald€ et al. 2013), which decomposes all gene trees tious, as molecular sequence data do not uniformly support into quartets to build supernetworks where edge lengths cor- this clade (Giribet et al. 2002; Pepato et al. 2010; Regier et al. respond to quartet frequencies (see Fernandez et al. 2014). 2010;butseeMeusemann et al. 2010) and relevant morpho- We inferred supernetworks for four data sets: 1) 3,644 ortho- logical characters may have undergone extensive conver- logs, 2) the 200 slowest-evolving orthologs, 3) the 500 slowest- gence. For example, a six-legged postembryonic stage evolving orthologs, and 4) the 1,237 orthologs of the MARE2 occurs not only in Acariformes and Parasitiformes but also data set, which retain only the 30 data-rich taxa. Invariably, we in the distantly related Ricinulei. The embryology and genome

2974 Phylogenomic Interrogation of Arachnida . doi:10.1093/molbev/msu235 MBE Downloaded from http://mbe.oxfordjournals.org/ at American Museum of Natural History on February 17, 2015

FIG.9. Bayesian relative rates test, indicating patristic distances from MRCA of Euchelicerata to constituent terminals. Samples are drawn from 4,000 postburn-in tree topologies (1,000 randomly sampled trees per chain) from analyses of the 500 slowest-evolving ortholog data set in Phylobayes. of T. urticae have also demonstrated the loss of opisthosomal abrogation of the appendage-patterning transcription factor segmentation and such conserved genes as the Hox transcrip- Distal-less (Schoppmeier and Damen 2001; Khila and Grbic tion factor abdominal-A, found in all other surveyed arach- 2007; Sharma et al. 2013). These data underscore the evolu- nids (Hughes and Kaufman 2002; Grbic et al. 2011; Sharma tionary lability of cheliceral segment number and enditic et al. 2012b). mouthparts, and disfavor their utility as phylogenetic charac- A greater measure of confidence was once placed in the ter systems. sister relationship of pseudoscorpions and solifuges Few morphological characters could potentially unite (Weygoldt and Paulus 1979; Shultz 1990, 2007;butsee Pseudoscorpiones and Acari. Nevertheless, the relationship Dunlop et al. 2012), with these forming part of a clade with of Pseudoscorpiones and Acariformes is consistent with the scorpions and harvestmen (fig. 1) (Weygoldt and Paulus 1978; presence of prosomal silk glands in these orders (produced Shultz 1990, 2007; Giribet et al. 2002). The sister relationship from cheliceral glands in the former and salivary glands in of Pseudoscorpiones and Solifugae is consistent with a some lineages of the latter; in contrast, silk is produced number of morphological characters, such as two-segmented from dedicated opisthosomal organs). A reciprocal BLAST , the shape of the epistome (a structure covering (Basic Local Alignment Search Tool) search against spider the mouth), and similarity in walking leg segments (Shultz and Tetranychus silk protein sequences revealed no hits in 2007). However, recent advances in comparative develop- the transcriptome of the Synsphyronus api- mental genetics have elucidated a simple mechanism for melus, and thus we are presently unable to evaluate the ho- the transition from the three- to the two-segmented chelic- mology of mite and pseudoscorpion silks based on their eral state: Loss of the expression domain of the transcription protein structure. factor dachshund, which in three-segmented chelicerae is Interestingly, in contrast to Arachnida, nodal support for uniquely expressed in the proximal segment (Prpic and the clade (Ricinulei + Solifugae) is more recalcitrant to con- Damen 2004; Sharma et al. 2012a, 2013; Barnett and catenation of fast-evolving genes. Only after concatenation of Thomas 2013). Similarly, the highly variable gnathobasic 3,000 orthologs does nodal support for this relationship mouthparts of spiders (the pedipalpal “maxilla”), mites (the become fixed at zero (fig. 5). Neither this relationship, nor rutellum), and harvestmen (the stomotheca) all result from the placement of Opiliones as sister group of the same embryonic tissues, whose growth is disrupted by (Ricinulei + Solifugae), has been supported previously in the

2975 Sharma et al. . doi:10.1093/molbev/msu235 MBE Downloaded from http://mbe.oxfordjournals.org/ at American Museum of Natural History on February 17, 2015

FIG.10. Supernetwork representation of quartets derived from individual ML gene trees, for four different data sets. Phylogenetic conflict is represented by reticulations. Edge lengths correspond to quartet frequencies. Shading indicates base of arachnid radiation. literature (very limited support was obtained for opisthosomal “sacs,” three pairs of putatively respiratory Ricinulei + Solifugae by Regier et al. 2010),andbothwarrant structures that appear in opisthosomal sternites 4–6. These further anatomical and embryological investigation. structures were historically inferred to be either the precur- sors or the remnants of book lungs, but a respiratory function of these structures has never been conclusively demonstrated. The Placement of Palpigradi As we were unable to include genome-scale data for The 47 total species in the supermatrix represented all but Palpigradi, we separately assessed the placement of this two orders of Chelicerata: Schizomida and Palpigradi. order using the 56 Sanger-sequenced genes for P. wheeleri Schizomida is a lineage consistently recovered as sister from Regier et al. (2010). These were analyzed as part of the group of Thelyphonida, and the pair of orders comprises matrix with the 500 slowest-evolving orthologs. the clade Uropygi (Shultz 1990, 2007; Giribet et al. 2002; This analysis recovered P. wheeleri as sister group to Pepato et al. 2010; Regier et al. 2010). Several characters Parasitiformes, with little support (BS = 50%), and modest unite Schizomida and Thelyphonida, including a unique reduction of nodal support elsewhere in the tree; relation- mating behavior (Shultz 1990), and the mutual monophyly ships among and within other orders were unchanged of the two orders has never been tested (i.e., one may con- (supplementary fig. S7, Supplementary Material online). stitute a nested lineage of the other). Moreover, given the placement of Palpigradi within a group In contrast, the placement of Palpigradi, an order of mi- of lineages with suspect placement, as well as its infinitesimal nuscule (typically < 1 mm in length) chelicerates (see Giribet representation in the matrix (three orthologs out of 500), we et al. 2014),haslongbeenindispute.Variousanalysesof consider the placement of this order insufficiently addressed different data classes have suggested that palpigrades are in this study. sister to Acariformes (Van der Hammen 1989; Regier et al. 2010), Acari (Pepato et al. 2010), Tetrapulmonata (Shultz 1990; Wheeler and Hayashi 1998), Solifugae (Giribetetal. Outgroup Sampling and LBA Artifacts 2002), or all remaining arachnids (Shultz 2007). At the heart One of the commonly invoked solutions for redressing topo- of the contention is the interpretation of palpigrade logical instability is depth of taxonomic sampling, and

2976 Phylogenomic Interrogation of Arachnida . doi:10.1093/molbev/msu235 MBE particularly of closely related outgroup taxa. This is a practice analyzed and ensuing LBA artifacts (see also Fernandez et al. that a subset of the authors has historically advocated in 2014). Results from the data set including Daphnia (a 52- arthropod molecular phylogenies (Wheeler et al. 1993; taxon data set) are therefore not presented here. Wheeler and Hayashi 1998; Giribet et al. 2001). However, As shown in figure 11, ML topologies of identical data for the majority of the analyses conducted in this study, sam- partitions augmented with pancrustacean outgroups demon- pling of Mandibulata was restricted to three myriapod spe- strate significantly longer patristic distances for Pancrustacea cies, out of a total of ten outgroups (three onychophorans, in comparison to Myriapoda (i.e., with respect to the MRCA three myriapods, two pycnogonids, and two xiphosurans). of arthropods; see also Regier et al. 2010), corroborating the This deliberate selection of mandibulate exemplars was preferential selection of myriapods as outgroups in the main driven by the topologically stable position of Myriapoda in analyses of this study. The branch lengths of Pancrustacea as a the arthropod tree of life (Regier et al. 2010; Rehm et al. 2014) whole are comparable to those of Acariformes, and the in- and their relatively short-branch lengths in comparison to all clusion of Pancrustacea renders Chelicerata nonmonophy- of Pancrustacea, as inferred by recent phylogenomic analyses letic in some analyses, due to the phylogenetic proximity of

(Regier et al. 2010). An argument favoring selection of out- Pancrustacea and Pycnogonida. Analysis of an augmented Downloaded from group taxa that have short patristic distances from their MARE2matrix(whichlackspycnogonids)didnotyieldman- MRCA with the ingroup taxa (i.e., low substitution rates) dibulate monophyly either, with euchelicerates nested within has been previously articulated by several workers as a solu- a paraphyletic Mandibulata (not shown). As demonstrated by tion for overcoming LBA artifacts (Bergsten 2005; Rota- visualizations of gene tree conflict (fig. 12), these placements

Stabelli and Telford 2008), which constitute a documented of Pancrustacea are directly attributable to LBA, with a large http://mbe.oxfordjournals.org/ affliction of arachnid phylogenies (Pepato et al. 2010; Arabi number of genes recovering a close relationship of Parhyale et al. 2012). Although the restriction of mandibulate exem- and Acariformes, particularly in slower-evolving gene parti- plars to myriapods means that nodal support for Myriapoda tions that are more susceptible to topological incongruence may be overestimated in our analyses (due to the conflation (Salichos and Rokas 2013). Although larger data sets do mit- of mandibulate synapomorphies with support for myriapod igate this problem (3,644-ortholog and MARE2 matrices; fig. monophyly), our focus was on arachnid internal phylogeny, 12), the lack of resolution at the base of Arachnida is entirely and the inclusion of Pycnogonida and Xiphosura as the most unaffected by the addition of pancrustacean outgroups, re- closely related outgroup taxa to Arachnida was the greater gardless of outgroup data set size (fig. 12,comparewithfig. at American Museum of Natural History on February 17, 2015 imperative for this study. Comparable considerations in out- 10), and Arachnida was not recovered as monophyletic by the group sampling among arthropods are exemplified by the augmented data sets. phylogenomic analyses of Regier et al. (2010), who sampled These analyses reinforce the importance of outgroup just five outgroups for their analysis, but specifically chose the choice when addressing lineages with demonstrable long- closest relatives of Arthropoda (three Onychophora and two branch ingroup taxa and justify the limitation of mandibulate Tardigrada), due to their focus on arthropod internal relation- sampling to myriapods in our central analyses. The addition ships. Brewer and Bond (2013) usedatick,awaterflea(pan- of outgroup data sets with nearly complete gene sampling crustacean), and a centipede to root their internal millipede (e.g., Drosophila, Tribolium)cannotsurmountLBAartifacts phylogeny. In an analysis of Opiliones, Hedin et al. (2012) used when a large swath of basally branching pancrustacean taxa is one outgroup (in one analysis, two): a tick or a chimeric not represented. Admittedly, the basal placement of scorpion terminal, the latter constructed from libraries of Pancrustacea may be greatly improved by extensively sam- three different families of Scorpiones. These last two cases pling basally branching lineages throughout Mandibulata, but represent a much sparser outgroup sampling than the one this is demonstrably the case even using a minuscule fraction conducted here. of the genes deployed herein (e.g., eight-gene analysis of Nevertheless, to demonstrate that sampling of outgroups Giribet et al. 2001; the 62-gene analysis of Regier et al. without regard for branch length distributions can in fact 2010). For the present analysis, pancrustacean genomic re- exacerbate topological conflict when LBA artifacts already sources comparable to those of model organisms like the four plague the ingroup, we conducted a separate family of anal- species we analyzed (in addition to the chelicerate data we yses wherein we added four pancrustacean taxa, retaining the generated) remain limited, and particularly for such key line- criterion of comparable data (i.e., Illumina libraries or com- ages as copepods, ostracods, remipedes, cephalocarids, and plete, well-annotated genomes): the dipteran Drosophila mel- collembolans. anogaster (complete genome), the coleopteran Tribolium castaneum (complete genome), the hemipteran Oncopeltus fasciatus (Illumina transcriptome; Ewen-Campen et al. 2011), Terrestrialization and Morphological Convergence and the amphipod Parhyale hawaiensis (Zeng et al. 2011). Although a number of aforementioned problems plague in- Due to incident concerns about missing data in this data ference of chelicerate relationships, perhaps the most haunt- set (see above), 454 libraries for pancrustaceans were not ing concern is that the pursuit and identification of analyzed. A fifth published pancrustacean data set phylogenetic signal supporting the monophyly of Arachnida (Daphnia pulex, complete genome) was added in a prelimi- is itself an idiosyncratic objective. A feature common to early nary analysis, but proved greatly problematic, due to a branch hypotheses of arthropod phylogeny, largely driven by mor- length significantly longer than all other pancrustacean taxa phological data, was the monophyly of Atelocerata, a clade

2977 Sharma et al. . doi:10.1093/molbev/msu235 MBE Downloaded from http://mbe.oxfordjournals.org/ at American Museum of Natural History on February 17, 2015

FIG.11. Left: Tree topology inferred from ML analysis of the 200 slowest-evolving genes, with four pancrustacean outgroups added. Right: Tree topology inferred from ML analysis of the 500 slowest-evolving genes, with four pancrustacean outgroups added. Orders of interest are indicated with shading. Numbers on nodes indicate bootstrap resampling frequencies. Bars on right represent number of genes available for each terminal. Below: Global views ofthesequencealignments. comprising myriapods and hexapods (e.g., Snodgrass 1938; terrestrialization events in the chelicerate branch of the tree Schram and Emerson 1991; Wheeler et al. 1993; Fortey et al. is greatly diminished. Integral to the reconstruction of terres- 1997; Emerson and Schram 1998; Kraus 1998; Wheeler 1998). trialization is the phylogenetic placement of scorpions, which Numerous aspects of anatomy united the two terrestrial have an impressive Paleozoic fossil record demonstrating mandibulate clades, such as the absence of tritocerebral ap- marine (or at least aquatic) origins (Jeram 1998; Dunlop pendages, the uniramy of present appendages, and characters et al. 2007; Dunlop 2010). Some workers therefore considered pertaining to tracheal respiration. This hypothesis was pro- scorpions sister to the remaining arachnids (Weygoldt and gressively overturned in favor of Pancrustacea (a clade uniting Paulus 1979) or separate from arachnids and sister to hexapods with a paraphyletic assemblage collectively termed Xiphosura (Van der Hammen 1989;ref.Dunlop and “crustaceans”) by accruing molecular data (e.g., Turbeville Webster 1999 for a review of these hypotheses). In contrast, et al. 1991; Friedrich and Tautz 1995; Giribet et al. 1996, others have inferred a single terrestrialization event in the 2001; Giribet and Ribera 1998; Zrzavy et al. 1998; ancestor of Arachnida, based on comparative morphology Koenemann et al. 2010; Meusemann et al. 2010; Campbell and putative homology of scorpion and tetrapulmonate re- et al. 2011; Rota-Stabelli et al. 2011). The diphyly of spiratory systems (Firstman 1973; Scholtz and Kamenz 2006; Atelocerata rendered a large number of its putative synapo- Kamenz et al. 2008) and circulatory systems (Wirkner et al. morphies homoplastic, the mechanism of which is inferred to 2013); implicit in this inference is the assumption that the be convergence due to terrestrial habitat (Legg et al. 2013). MRCA of scorpions and tetrapulmonates was the ancestral If terrestrialization has driven morphological convergence arachnid (i.e., the topology of Shultz 1990, 2007). Given the in the sister group of Chelicerata, then any epistemological strongly supported placement of Scorpiones sister to justification for minimizing the number of inferred Tetrapulmonata—a result recovered elsewhere with

2978 Phylogenomic Interrogation of Arachnida . doi:10.1093/molbev/msu235 MBE Downloaded from http://mbe.oxfordjournals.org/ at American Museum of Natural History on February 17, 2015

FIG.12. Supernetwork representation of quartets derived from individual ML gene trees, for four different data sets, with four pancrustacean outgroups added (indicated by boxes). Phylogenetic conflict is represented by reticulations. Edge lengths correspond to quartet frequencies. Shading indicates base of arachnid radiation. moderate support (Regier et al. 2010)—or in some analyses, sampling of constituent terminals (particularly of basally the sister relationship of (Scorpiones + Pseudoscorpiones) to branching exemplars of the problematic orders). We also Tetrapulmonata (fig. 6), the tree topology at hand obviates note that derived relationships, particularly within the well- inference of a terrestrial arachnid ancestor. The phylogenetic sampled orders Opiliones and Araneae, unambiguously sup- proximity of Scorpiones to Tetrapulmonata is certainly con- port traditional systematics within these groups (Shultz 1998; sistent with the homology of their respiratory and circulatory Coddington et al. 2004; Giribet et al. 2010; Sharma and Giribet systems (i.e., implies a single origin of the arachnid book lung), 2011; Hedin et al. 2012). butitimpliesthatasingleterrestrialization event is not a viable reconstruction of chelicerate evolutionary history, Conclusion even if Arachnida is recovered as monophyletic. Chelicerata We assessed the phylogeny of Arachnida using the largest therefore could encompass an evolutionary history with mul- phylogenomic assessment to date and conducted 40 separate tiple terrestrialization events, in a manner comparable to analyses, varying different parameters of matrix assembly and Mandibulata (von Reumont et al. 2012). algorithmic approach. We identified Pseudoscorpiones, Due to the incidence of heterotachy for some chelicerate Acariformes, and Parasitiformes as problematic long-branch orders and concealed phylogenetic signal for several plausible taxa, even upon inclusion of full genomes and use of sophis- phylogenetic hypotheses, we tentatively favor elements of the ticated substitution models. We consistently observed short tree topology recovered by the slower-evolving orthologs that internodes in basal relationships attributable to ancient rapid are consistent with morphological analyses, such as arachnid radiation. Serial concatenation in order of percent pairwise monophyly, the placement of Pseudoscorpiones within or identity demonstrates systemic conflicts in phylogenetic sister to Arachnopulmonata, and the relationship of signal. Topological conflict at the base of Arachnida is re- (Opiliones + (Ricinulei + Solifugae)). We add the caveat that tained among gene trees across data sets, regardless of evo- these relationships should be rigorously tested using broader lutionary rate or minimization of missing data. Slow-evolving

2979 Sharma et al. . doi:10.1093/molbev/msu235 MBE orthologs recovered the monophyly of Arachnida, the clade information and exported as separate files for left and right (Pseudoscorpiones + Arachnopulmonata), and the relation- mate pairs. This process was repeated for each sample and ship of (Opiliones + (Ricinulei + Solifugae)). Addition of out- reads corresponding to rRNA sequences were removed. groups without regard for evolutionary rate is shown to De novo assemblies were conducted using Trinity exacerbate LBA artifacts. A number of relationships was un- (Grabherr et al. 2011; Haas et al. 2013) using paired read ambiguously supported in all analyses. Future investigations files, with the exception of hardware specifications (num- of chelicerate phylogeny should emphasize taxonomic sam- bers of dedicated CPUs). For the single-end read library of pling efforts for breaking long branches by identifying basally M. giganteus, a path reinforcement distance of 25 was branching and putatively slowly evolving exemplars of the enforced; for paired-end libraries, a value of 75 was enforced. orders Pseudoscorpiones, Acariformes, and Parasitiformes. Raw reads have been deposited intheNCBI(NationalCenter Improving matrix representation of Palpigradi is mandatory for Biotechnology Information) Sequence Read Archive data- for assessing its placement within Chelicerata. base. All assemblies, alignments, matrices, and tree topologies generated in this study are deposited in the Dryad Digital

Materials and Methods Repository. Downloaded from Species Sampling and Molecular Techniques We added 30 other taxa to the data set of 18 Illumina- sequenced chelicerates (table 1). Peptide sequences from Taxa were sampled toward maximizing phylogenetic repre- whole genomes of I. scapularis and T. urticae were accessed sentation within Chelicerata. Given the availability of whole from the public databases Vectorbase and Ensembl, respec- genomes for both Parasitiformes and Acariformes, we focused tively. A pair of outgroup species, Alipes grandidieri http://mbe.oxfordjournals.org/ on generating Illumina paired-end read data for other cheli- (Myriapoda) and Peripatopsis capensis (Onychophora), were cerate orders. New transcriptomic data were thus generated sequenced by a subset of the authors using the same methods for 17 euchelicerate taxa. Collecting locality information and and platforms for a separate study (Fernandez et al. 2014). A statistics on sequencing yields are provided in table 1 and subset of smaller, assembled 454 libraries and Sanger-se- supplementary table S1, Supplementary Material online. All quenced ESTs were accessed from GenBank (table 1). tissues were collected fresh andpreparedimmediatelyfor Finally, we obtained reads for eight harvestman libraries pub- RNA extraction. Total RNA was extracted using TRI lished by Hedin et al. (2012), and sanitized and assembled

Reagent (Invitrogen). mRNA samples were prepared with these de novo in Trinity as described above. at American Museum of Natural History on February 17, 2015 the Dynabeads mRNA Purification Kit (Invitrogen), following the manufacturer’s protocol. Quality of mRNA was measured Orthology Assignment with picoRNA assays in an Agilent 2100 Bioanalyzer (Agilent Technologies); quantity was measured with a RNA assay in Redundancy reduction was done with CD-HIT (Fu et al. 2012) Qubit fluorometer (Life Technologies). for all Trinity assemblies (95% similarity). Resulting culled as- Almost all cDNA libraries were constructed with the semblies were processed in TransDecoder (Haas et al. 2013)in TruSeq for RNA Sample Preparation kit (Illumina) following order to identify candidate ORFs within the transcripts. the manufacturer’s instructions; three spider libraries Predictedpeptideswerethenprocessedwithafurtherfilter (Frontinella communis, Leucauge venusta,andNeoscona ara- to select only one peptide per putative unigene, by choosing besca) were prepared using the PrepX mRNA kit for Apollo the longest ORF per Trinity subcomponent with a python 324 (Integen). Samples were barcoded with TruSeq adaptors, script, thus removing the variation in the coding regions of facilitating multiplexed sequencing runs, with up to four spe- Trinity assemblies due to alternative splicing, closely related cimens per lane sequenced on the Illumina Hi-Seq 2500 plat- paralogs, and allelic diversity. Peptide sequences with all final form. Concentration of cDNA was measured using dsDNA candidate ORFs were retained as fasta files. We assigned pre- High Sensitivity assays in an Agilent 2100 Bioanalyzer (Agilent dictedORFsintoorthologousgroupsacrossallsamplesusing Technologies), as well as a Qubit fluorometer (Invitrogen). OMA stand-alone v.0.99t (Altenhoff et al. 2011, 2013). All Samples were run using a Illumina Genetic Analyzer II or input files were single line multi-fasta files and the parame- Illumina HiSeq 2500 platform with paired-end reads of ters.drw file specified retained all default settings. We paralle- 100–150 bp at the FAS Center for Systems Biology at lized all-by-all local alignments across 200 CPUs. In a separate Harvard University, excepting the Mastigoproctus giganteus family of analyses, additional outgroups were added to the library, which was sequenced using single-end reads of 100 bp. output from the principal OMA run using best reciprocal Sequenced results were quality filtered accordingly to a BLAST hits, implemented in a customized script. threshold average quality Phred score of 30 and adaptors trimmed. Retained reads shorter than 25 bp were discarded. Phylogenomic Analyses Ribosomal RNA (rRNA) was filtered out using Bowtie v.1.0.0 We constructed an amino acid supermatrix by concatenating (Langmead et al. 2009), by constructing a set of metazoan the set of OMA groups containing 13 or more taxa (gene rRNA sequences from GenBank with the command, “bowtie- occupancy of 25%), which were extracted using a Unix script. build.” Each sample was sequentially aligned to the index These 3,644 orthogroups were aligned individually using allowing up to two mismatches through Bowtie v.1.0.0, re- MUSCLE v.3.6 (Edgar 2004). Ambiguously aligned positions taining all unaligned reads in FASTQ format (command line: were culled using GBlocks v.0.91b (Castresana 2000), as well as “-a –best”). Unaligned results were paired using read header regions where more than 50% of the columns constituted

2980 Phylogenomic Interrogation of Arachnida . doi:10.1093/molbev/msu235 MBE missing data (using the command –b5 = h). Trimmed and Philippe 2004). Four independent chains were run for orthogroups were concatenated using Phyutility 2.6 (Smith 6,965–10,145 cycles, and the initial 5,000 cycles were discarded and Dunn 2008) for this and other phylogenomic matrices as burn-in, with convergence assessed using the maximum (see below). bipartition discrepancies across chains. Due to the degree of missing data for the palpigrade P. wheeleri, represented by 56 Sanger-sequenced genes, we Supplementary Material principally analyzed our data set without this species, Supplementary figures S1–S7 and table S1 areavailableat adding the palpigrade terminal in a final analysis to infer Molecular Biology and Evolution online (http://www.mbe. the placement of this enigmatic order. Attempts to obtain oxfordjournals.org/). new transcriptomes of Eukoenenia spelaea (from Slovakia) and Eukoenenia sp. (from Mexico) were unsuccessful due to Acknowledgments the diminutive size and scarcity of these . Mark S. Harvey, Lionel Monod, Peter J. Schwendinger, and To assess the impact of gene occupancy on nodal support Piotr Naskrecki kindly provided samples of Synsphyronus api- and tree topology, we constructed five other matrices at dif- melus, Liphistius malayanus, and Ricinoides atewa.Antje Downloaded from ferent thresholds of gene occupancy. An additional 15 matri- Fischer and Woods Hole Marine Biological Laboratory pro- ces were constructed based on evolutionary rate, for which vided specimens of Limulus polyphemus.Sonia C.S. Andrade, percent pairwise identity was employed as a proxy. This Jerome Murienne, and Ana Riesgo prepared some of the method was chosen to approximate evolutionary rate be- Illumina libraries. Thorsten Burmester provided the 454 tran- cause it is agnostic to tree topology. Percent pairwise identity scriptomic assembly of Pandinus imperator. Marshal Hedin http://mbe.oxfordjournals.org/ was calculated for each orthogroup alignment in Geneious contributed raw read data for the eight Opiliones libraries we v.6.1.6, and is computed by taking all possible pairs of bases at reassembled. Adrian Altenhoff, James Cuff, Geoffrey F. Dilly, the same column and scoring a hit (one) when a pair is Paul Edmon, Mike Ethier, and Rosa Fernandez facilitated com- identical; this value is divided by the total number of pairs puting operations. Discussions with Casey Dunn and Antonis for each column, and all column values are then averaged Rokas refined some of the ideas presented. This research was over the length of the alignment. Cells with indels (“-”) and supported by internal funds from the Museum of missing taxa do not contribute to pairwise calculations. The Comparative Zoology to G.G., by National Science percent pairwise identity was calculated for each of the 3,644 Foundation grants DEB-1144492 and DEB-114417 to G.H. at American Museum of Natural History on February 17, 2015 orthogroup alignments subsequent to treatment with and G.G., and by the National Science Foundation GBlocks v. 0.91b. This ensured that regions with large indels Postdoctoral Research Fellowship in Biology under Grant did not affect the calculation of percent pairwise identity. No. DBI-1202751 to P.P.S. The authors are indebted to As an algorithmic alternative to optimization of matrix Editor-in-Chief Sudhir Kumar, Associate Editor Nicolas construction, we used the software MARE (MAtrix Vidal, and three anonymous reviewers for reviewing this REduction; http://mare.zfmk.de, last accessed November 11, manuscript. 2013), which estimates informativeness of every orthogroup based on weighted geometry quartet mapping (Nieselt- References Struwe and von Haeseler 2001). We analyzed the 3,644 Altenhoff AM, Gil M, Gonnet GH, Dessimoz C. 2013. Inferring hierar- orthogroups with gene occupancy over 25% using MARE, chical orthologous groups from orthologous gene pairs. PLoS One 8: and thereby constructed two matrices. First, we implemented e53786. MARE, enforcing retention of all libraries. Second, we re- Altenhoff AM, Schneider A, Gonnet GH, Dessimoz C. 2011. OMA 2011: peated algorithmic matrix reduction, but enforced retention orthology inference among 1000 complete genomes. Nucleic Acids Res. 39:D289–D294. of only those 30 libraries constituting Illumina libraries or Arabi J, Judson ML, Deharveng L, Lourenc¸o WR, Cruaud C, Hassanin A. whole genomes (i.e., keeping only terminals with significantly 2012. Nucleotide composition of CO1 sequences in Chelicerata more data than 454 or Sanger-sequenced EST counterparts). (Arthropoda): detecting new mitogenomic rearrangements. JMol Asaseparatetestofinfluenceofdataquantity,wecon- Evol. 74:81–95. structed a manual reduction of the 3,644-ortholog superma- Barnett AA, Thomas RH. 2013. The expression of limb gap genes in the mite Archegozetes longisetosus reveals differential patterning mech- trix, retaining only the 30 libraries constituting Illumina anisms in chelicerates. Evol Dev. 15:280–292. libraries or whole genomes. Berger SA, Krompass D, Stamatakis A. 2011. Performance, accuracy, and ML analyses were conducted using RAxML v.7.7.5 (Berger Web server for evolutionary placement of short sequence reads et al. 2011) and Bayesian inference using PhyloBayes MPI 1.4e under maximum likelihood. Syst Biol. 60:291–302. (Lartillot et al. 2013). For RAxML v.7.7.5, we implemented a Bergsten J. 2005. A review of long-branch attraction. Cladistics 21: 163–193. unique LG4X + F model for each gene (Le et al. 2012). Betancur-RR, Naylor G, Ortı G. 2014. Conserved genes, sampling error, Bootstrap resampling frequencies were estimated with 500 and phylogenomic inference. Syst Biol. 63:257–262. replicates using a rapid bootstrapping algorithm (Stamatakis Brewer MS, Bond JE. 2013. Ordinal-level phylogenomics of the arthropod et al. 2008). Analyses with PhyloBayes MPI 1.4e were limited class Diplopoda (millipedes) based on an analysis of 221 nuclear to smaller data sets, as implementation of PhyloBayes for large protein-coding loci generated using next-generation sequence anal- yses. PLoS One 8:e79935. matrices requires intractable amounts of time for conver- Briggs DEG, Siveter DJ, Sieveter DJ, Sutton MD, Garwood RJ, Legg D. gence. We implemented PhyloBayes MPI 1.4e using the 2012. Silurian horseshoe crab illuminates the evolution of arthropod site-heterogeneous CAT + GTR model of evolution (Lartillot limbs. Proc Natl Acad Sci U S A. 109:15702–15705.

2981 Sharma et al. . doi:10.1093/molbev/msu235 MBE

Campbell LI, Rota-Stabelli O, Edgecombe GD, Marchioro T, Longhorn SJ, Giribet G, McIntyre E, Christian E, Espinasa L, Ferreira RL, Francke OF, Telford MJ, Philippe H, Rebecchi L, Peterson KJ, Pisani D. 2011. Harvey MS, Isaia M, Kovacˇ L, McCutchen L, et al. 2014. The first MicroRNAs and phylogenomics resolve the relationships of phylogenetic analysis of Palpigradi (Arachnida)—the most enig- Tardigrada and suggest that velvet worms are the sister group of matic arthropod order. Invertebr Syst. 28. doi:10.1071/IS13057. Arthropoda. Proc Natl Acad Sci U S A. 108:15920–15924. Giribet G, Ribera C. 1998. The position of arthropods in the animal Cao Z, Yu Y, Wu Y, Hao P, Di Z, He Y, Chen Z, Yang W, Shen Z, He X, kingdom: a search of a reliable outgroup for internal arthropod et al. 2013. The genome of Mesobuthus martensii reveals a unique phylogeny. Mol Phylogenet Evol. 9:481–488. adaptation model of arthropods. Nat Commun. 4:2602. GiribetG,VogtL,Perez Gonzalez A, Sharma P, Kury AB. 2010. A multi- Castresana J. 2000. Selection of conserved blocks from multiple align- locus approach to harvestmen (Arachnida: Opiliones) phylogeny ments for their use in phylogenetic analysis. MolBiolEvol.17: with emphasis on biogeography and the systematics of Laniatores. 540–552. Cladistics 26:408–437. Clouse RM, Sharma PP, Giribet G, Wheeler WC. 2013. Elongation factor- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, 1, a putative single-copy nuclear gene, has divergent sets of para- Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al. 2011. Full- logs in an arachnid. Mol Phylogenet Evol. 68:471–481. length transcriptome assembly from RNA-Seq data without a ref- Coddington JA, Giribet G, Harvey MS, Prendini L, Walter DE. 2004. erence genome. Nat Biotechnol. 29:644–652. Arachnida. In: Cracraft J, Donoghue MJ, editors. Assembling the Grbic M, Van Leeuwen T, Clark RM, Rombauts S, RouzeP,Grbic ´ V, tree of life. New York: Oxford University Press. p. 296–318. Osborne EJ, Dermauw W, Ngoc PC, Ortego F, et al. 2011. The Downloaded from Dell’Ampio E, Meusemann K, Szucsich NU, Peters RS, Meyer B, Borner J, genome of Tetranychus urticae reveals herbivorous pest adaptations. Petersen M, Aberer AJ, Stamatakis A, Walzl MG, et al. 2014. Decisive Nature 479:487–492. data sets in phylogenomics: lessons from studies on the phyloge- Grunewald€ S, Spillner A, Bastkowski S, Bogershausen A, Moulton V. netic relationships of primarily wingless insects. MolBiolEvol.31: 2013. SuperQ: computing supernetworks from quartets. IEEE/ACM 239–249. Trans Comput Biol Bioinform. 10:151–160. Dunlop JA. 2010. Geological history and phylogeny of Chelicerata. Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, http://mbe.oxfordjournals.org/ Arthropod Struct Dev. 39:124–142. Couger MB, Eccles D, Li B, Lieber M, et al. 2013. De novo transcript Dunlop JA, Kamenz C, Scholtz G. 2007. Reinterpreting the morphology sequence reconstruction from RNA-seq using the Trinity platform of the scorpion Liassoscorpionides. Arthropod Struct Dev. 36: for reference generation and analysis. Nat Protoc. 8:1494–1512. 245–252. Hedin M, Derkarabetian S, McCormack M, Richart C, Shultz JW. 2010. Dunlop JA, Kruger€ J, Alberti G. 2012. The sejugal furrow in camel spiders The phylogenetic utility of the nuclear protein-coding gene EF-1 and acariform mites. Arachnol Mitt. 43:8–15. for resolving recent divergences in Opiliones, emphasizing intron Dunlop JA, Webster M. 1999. Fossil evidence, terrestrialization, and evolution. JArachnol.38:9–20. arachnid phylogeny. JArachnol.27:86–93. Hedin M, Starett J, Akhter S, Schonhofer€ AL, Shultz JW. 2012.

Dunn CW, Hejnol A, Matus DQ, Pang K, Browne WE, Smith SA, Seaver Phylogenomic resolution of Paleozoic divergences in harvestmen at American Museum of Natural History on February 17, 2015 E, Rouse GW, Obst M, Edgecombe GD, et al. 2008. Broad phyloge- (Arachnida, Opiliones) via analysis of next-generation transcriptome nomic sampling improves resolution of the animal tree of life. data. PLoS One 7:e42888. Nature 452:745–749. Hejnol A, Obst M, Stamatakis A, Ott M, Rouse GW, Edgecombe GD, Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accu- Martinez P, Bagun~a J, Bailly X, Jondelius U, et al. 2009. Assessing the racy and high throughput. Nucleic Acids Res. 32:1792–1797. root of bilaterian animals with scalable phylogenomic methods. Proc Emerson MJ, Schram FR. 1998. Theories, patterns, and reality: game plan RSocLondBBiolSci.276:4261–4270. for arthropod phylogeny. In: Fortey RA, Thomas RH, editors. Huelsenbeck JP. 1997. Is the Felsenstein zone a fly trap? Syst Biol. 46: Arthropod relationships. London: Chapman & Hall. p. 67–86. 69–74. Ewen-Campen B, Shaner N, Panfilio K, Suzuki Y, Roth S, Extavour CG. Hughes CL, Kaufman TC. 2002. Hox genes and the evolution of the 2011. The maternal and early embryonic transcriptome of the milk- arthropod body plan. Evol Dev. 4:459–499. weed bug Oncopeltus fasciatus. BMC Genomics 12:61. Jeffroy O, Brinkmann H, Delsuc F, Philippe H. 2006. Phylogenomics: the Felsenstein J. 1978. Cases in which parsimony or compatibility methods beginning of incongruence? Trends Genet. 22:225–231. will be positively misleading. Syst Biol. 27:401–410. Jeram AJ. 1998. Phylogeny, classifications and evolution of Silurian and Fernandez R, Laumer CE, Vahtera V, Libro S, Kaluziak ST, Sharma PP, Devonian scorpions. In: Selden PA, editor. In: Proceedings of the 17th Perez-Porro AR, Edgecombe GD, Giribet G. 2014. Evaluating topo- European Colloquium of . Edinburgh: British logical conflict in centipede phylogeny using transcriptomic data Arachnological Society, Burnham Beeches. p. 17–31. sets. MolBiolEvol.31:1500–1513. Kamenz C, Dunlop JA, Scholtz G, Kerp H, Hass H. 2008. Microanatomy Firstman B. 1973. The relationship of the chelicerate arterial system to of Early Devonian book lungs. Biol Lett. 4:212–215. the evolution of the endosternite. JArachnol.1:1–54. Khila A, Grbic M. 2007. Gene silencing in the spider mite Tetranychus Fortey RA, Briggs DEG, Wills MA. 1997. The Cambrian evolutionary urticae: dsRNA and siRNA parental silencing of the Distal-less gene. ‘explosion’ recalibrated. BioEssays 19:429–434. Dev Genes Evol. 217:241–251. Friedrich M, Tautz D. 1995. Ribosomal DNA phylogeny of the major Koenemann S, Jenner RA, Hoenemann M, Stemme T, von Reumont extant arthropod classes and the evolution of myriapods. Nature BM. 2010. Arthropod phylogeny revisited, with a focus on crusta- 376:165–167. cean relationships. Arthropod Struct Dev. 39:88–110. Fu LM, Niu BF, Zhu ZW, Wu ST, Li WZ. 2012. CD-HIT: accelerated for Kraus O. 1998. Phylogenetic relationships between higher taxa of tra- clustering the next-generation sequencing data. Bioinformatics 28: cheate arthropods. In: Fortey RA, Thomas RH, editors. Arthropod 3150–3152. relationships. London: Chapman & Hall. p. 295–303. Giribet G, Carranza S, Bagun~a J, Riutort M, Ribera C. 1996. First molec- Kubatko LS, Degnan JH. 2007. Inconsistency of phylogenetic estimates ular evidence for the existence of a Tardigrada + Arthropoda clade. from concatenated data under coalescence. Syst Biol. 56:17–24. Mol Biol Evol. 13:76–84. Lamsdell JC. 2013. Revised systematics of Palaeozoic ‘horseshoe crabs’ Giribet G, Edgecombe GD, Wheeler WC. 2001. Arthropod phylog- and the myth of monophyletic Xiphosura. Zool J Linn Soc. 167:1–27. eny based on eight molecular loci and morphology. Nature 413: Langmead B, Trapnell C, Pop M, Salzberg SL. 2009. Ultrafast and 157–161. memory-efficient alignment of short DNA sequences to the Giribet G, Edgecombe GD, Wheeler WC, Babbitt C. 2002. Phylogeny and human genome. Genome Biol. 10:R25. systematic position of Opiliones: a combined analysis of chelicerate Lartillot N, Philippe H. 2004. A Bayesian mixture model for across-site relationships using morphological and molecular data. Cladistics 18: heterogeneities in the amino-acid replacement process. Mol Biol 5–70. Evol. 21:1095–1109.

2982 Phylogenomic Interrogation of Arachnida . doi:10.1093/molbev/msu235 MBE

Lartillot N, Philippe H. 2008. Improvement of molecular phylogenetic efficiency estimation of genomic sampling in non-model taxa. Front inference and the phylogeny of Bilateria. Philos Trans R Soc Lond B Zool. 9:33. Biol Sci. 363:1463–1472. Roeding F, Borner J, Kube M, Klages S, Reinhardt R, Burmester T. 2009. A Lartillot N, Rodrigue N, Stubbs D, Richer J. 2013. PhyloBayes MPI: phy- 454 sequencing approach for large scale phylogenomic analysis of logenetic reconstruction with infinite mixtures of profiles in a par- the common emperor scorpion (Pandinus imperator). Mol allel environment. Syst Biol. 62:611–615. Phylogenet Evol. 53:826–834. Le SQ, Dang CC, Gascuel O. 2012. Modeling protein evolution with Rokas A, Carroll SB. 2006. Bushes in the tree of life. PLoS Biol. 4:e352. several amino acid replacement matrices depending on site rates. Rokas A, King N, Finnerty J, Carroll SB. 2003. Conflicting phylogenetic Mol Biol Evol. 29:2921–2936. signals at the base of the metazoan tree. Evol Dev. 5:346–359. Legg DA, Sutton MD, Edgecombe GD. 2013. Arthropod fossil data in- Rota-Stabelli O, Campbell L, Brinkmann H, Edgecombe GD, Longhorn SJ, crease congruence of morphological and molecular phylogenies. Nat Peterson KJ, Pisani D, Philippe H, Telford MJ. 2011. A congruent Commun. 4:2485. solution to arthropod phylogeny:phylogenomics,microRNAsand Mallatt JM, Garey JR, Shultz JW. 2004. Ecdysozoan phylogeny and morphology support monophyletic Mandibulata. Proc R Soc Lond B Bayesian inference: first use of nearly complete 28S and 18S rRNA Biol Sci. 278:298–306. gene sequences to classify the arthropods and their kin. Mol. Rota-Stabelli O, Daley AC, Pisani D. 2013. Molecular timetrees reveal a Phylogenet. Evol. 31:178–191. Cambrian colonization of land and a new scenario for ecdysozoan Masta SE, Longhorn SJ, Boore JL. 2009. Arachnid relationships based on evolution. Curr Biol. 23:392–398. Downloaded from mitochondrial genomes: asymmetric nucleotide and amino acid Rota-Stabelli O, Telford MJ. 2008. A multi criterion approach for the bias affects phylogenetic analyses. Mol Phylogenet Evol. 50:117–128. selection of optimal outgroups in phylogeny: recovering some sup- Masta SE, McCall A, Longhorn SJ. 2010. Rare genomic changes and port for Mandibulata over Myriochelata using mitogenomics. Mol mitochondrial sequences provide independent support for congru- Phylogenet Evol. 48:103–111. ent relationships among the sea spiders (Arthropoda, Pycnogonida). Salichos L, Rokas A. 2013. Inferring ancient divergences requires genes Mol Phylogenet Evol. 57:59–70. with strong phylogenetic signals. Nature 497:327–331. http://mbe.oxfordjournals.org/ Meusemann K, von Reumont BM, Simon S, Roeding F, Strauss S, Kuck€ P, Schierwater B, Eitel M, Jakob W, Osigus HJ, Hadrys H, Dellaporta SL, Ebersberger I, Walzl M, Pass G, Breuers S, et al. 2010. A phylogenomic Kolokotronis SO, Desalle R. 2009. Concatenated analysis sheds light approach to resolve the arthropod tree of life. MolBiolEvol.27: on early metazoan evolution and fuels a modern “urmetazoon” 2451–2464. hypothesis. PLoS Biol. 7:e1000020. Narechania A, Baker RH, Sit R, Kolokotronis S-O, DeSalle R, Planet PJ. Scholtz G, Kamenz C. 2006. The book lungs of Scorpiones and 2012. Random addition concatenation analysis: a novel approach to Tetrapulmonata (Chelicerata, Arachnida): evidence for homology the exploration of phylogenomic signal reveals strong agreement and a single terrestrialization event of a common arachnid ancestor. between core and shell genomic partitions in the cyanobacteria. Zoology 109:2–13.

Genome Biol Evol. 4:30–43. Schoppmeier M, Damen WGM. 2001. Double-stranded RNA interfer- at American Museum of Natural History on February 17, 2015 Nieselt-Struwe K, von Haeseler A. 2001. Quartet-mapping, a generaliza- ence in the spider Cupiennius salei:theroleofDistal-less is evolu- tion of the likelihood-mapping procedure. MolBiolEvol.18: tionarily conserved in arthropod appendage formation. Dev Genes 1204–1219. Evol. 211:76–82. Nosenko T, Schreiber F, Adamska M, Adamski M, Eitel M, Hammel J, Schram FR, Emerson MJ. 1991. Arthropod pattern theory: a new ap- Maldonado M, Muller€ WE, Nickel M, Schierwater B, et al. 2013. Deep proach to arthropod phylogeny. Mem Queensl Mus. 31:1–18. metazoan phylogeny: when different genes tell different stories. Mol Sharma PP, Giribet G. 2011. The evolutionary and biogeographic history Phylogenet Evol. 67:223–233. of the armoured harvestmen—Laniatores phylogeny based on ten Obst M, Faurby S, Bussarawit S, Funch P. 2012. Molecular phylog- molecular markers, with the description of two new families of eny of extant horseshoe crabs (Xiphosura, Limulidae) indicates Opiliones (Arachnida). Invertebr Syst. 25:106–142. Paleogene diversification of Asian species. Mol Phylogenet Evol. Sharma PP, Schwager EE, Extavour CG, Giribet G. 2012a. Evolution of the 62:21–26. chelicera: a dachshund domain is retained in the deutocerebral ap- Pepato AR, da Rocha CEF, Dunlop JA. 2010. Phylogenetic position of the pendage of Opiliones (Arthropoda, Chelicerata). Evol Dev. 14: acariform mites: sensitivity to homology assessment under total 522–533. evidence. BMC Evol Biol. 10:235. Sharma PP, Schwager EE, Extavour CG, Giribet G. 2012b. Hox Philippe H, Brinkmann H, Lavrov DV, Littlewood DTJ, Manuel M, gene expression in the harvestman Phalangium opilio reveals diver- Worheide€ G, Baurain D. 2011. Resolving difficult phylogenetic ques- gent patterning of the chelicerate opisthosoma. Evol Dev. 14: tions: why more sequences are not enough. PLoS Biol. 9:e1000602. 450–463. Philippe H, Derelle R, Lopez P, Pick K, Borchiellini C, Boury-Esnault N, Sharma PP, Schwager EE, Extavour CG, Wheeler WC. 2014. Hox gene VaceletJ,RenardE,HoulistonE,Queinnec E, et al. 2009. duplications correlate with posterior heteronomy in scorpions. Proc Phylogenomics revives traditional views on deep animal relation- RSocLondBBiolSci.281:20140661. ships. Curr Biol. 19:706–712. Sharma PP, Schwager EE, Giribet G, Jockusch EL, Extavour CG. 2013. Philippe H, Zhao Y, Brinkmann H, Rodrigue N, Delsuc F. 2005. Distal-less and dachshund pattern both plesiomorphic and Heterotachy and long-branch attraction in phylogenetics. BMC apomorphic structures in chelicerates: RNA interference Evol Biol. 5:50. in the harvestman Phalangium opilio (Opiliones). Evol Dev. 15: Prpic N-M, Damen WGM. 2004. Expression patterns of leg genes in the 228–242. mouthparts of the spider Cupiennius salei (Chelicerata: Arachnida). Shultz JW. 1990. Evolutionary morphology and phylogeny of Arachnida. Dev Genes Evol. 214:296–302. Cladistics 6:1–38. RegierJC,ShultzJW,ZwickA,HusseyA,BallB,WetzerR,MartinJW, Shultz JW. 1998. Phylogeny of Opiliones (Arachnida): an assessment of Cunningham CW. 2010. Arthropod relationships revealed by phy- the “Cyphopalpatores” concept. J Arachnol. 26:257–272. logenomic analysis of nuclear protein-coding sequences. Nature 463: Shultz JW. 2007. A phylogenetic analysis of the arachnid orders based on 1079–1083. morphological characters. Zool J Linn Soc. 150:221–265. Rehm P, Meusemann K, Borner J, Misof B, Burmester T. 2014. Simon S, Narechania A, DeSalle R, Hadrys H. 2012. Insect phylogenomics: Phylogenetic position of Myriapoda revealed by 454 transcriptomic exploring the source of incongruence using new transcriptomic sequencing. Mol Phylogenet Evol. 77:25–33. data. Genome Biol Evol. 4:1295–1309. Riesgo A, Andrade SCS, Sharma PP, Novo M, Perez-Porro AR, Vahtera V, Smith SA, Dunn CW. 2008. Phyutility: a phyloinformatics Gonzalez VL, Kawauchi GY, Giribet G. 2012. Comparative descrip- tool for trees, alignments and molecular data. Bioinformatics 24: tion of ten transcriptomes of newly sequenced invertebrates and 715–716.

2983 Sharma et al. . doi:10.1093/molbev/msu235 MBE

Snodgrass RE. 1938. Evolution of the Annelida, Onychophora and Wheeler WC, Hayashi CY. 1998. The phylogeny of the extant chelicerate Arthropoda. Smithson. Misc. Collect. 97:1–159. orders. Cladistics 14:173–192. Stamatakis A, Hoover P, Rougemont J. 2008. A rapid bootstrap algo- Wiens JJ. 2006. Missing data and the design of phylogenetic analyses. rithm for the RAxML Web servers. Syst Biol. 57:758–771. J Biomed Inform. 39:34–42. Turbeville JM, Pfeifer DM, Field KG, Raff RA. 1991. The phylogenetic Wilcox TP, Garcıa de Leon FJ, Hendrickson DA, Hillis DM. 2004. status of arthropods, as inferred from 18S rRNA sequences. Mol Biol Convergence among cave catfishes: long-branch attraction Evol. 8:669–686. and a Bayesian relative rates test. Mol Phylogenet Evol. 31: Van der Hammen L. 1989. An introduction to comparative arachnology. 1101–1103. Leiden (United Kingdom): SPB Academic Publishing. Wirkner CS, Togel€ M, Pass G. 2013. The arthropod circulatory system. In: von Reumont BM, Jenner RA, Wills MA, Dell’ampio E, Pass G, Minelli A, Boxshall G, Fusco G, editors. Arthropod biology and evo- Ebersberger I, Meyer B, Koenemann S, Iliffe TM, Stamatakis A, lution: molecules, development,andmorphology.Heidelberg et al. 2012. Pancrustacean phylogeny in the light of new phyloge- (Germany): Springer Press. p. 343–391. nomic data: support for Remipedia as the possible sister group of Zeng V, Villanueva KE, Ewen-Campen B, Alwes F, Browne WE, Extavour Hexapoda. MolBiolEvol.29:1031–1045. CG. 2011. De novo assembly and characterization of a maternal and Weygoldt P, Paulus HF. 1979. Untersuchungen zur Morphologie, developmental transcriptome for the emerging model crustacean Taxonomie und Phylogenie der Chelicerata. Z Zool Syst Evol. 17: Parhyale hawaiensis. BMC Genomics 12:581. 85–116, 177–200. ZrzavyJ,Hyp sa V, Vlaskov a M. 1998. Arthropod phylogeny: taxonomic Downloaded from Wheeler WC. 1998. Sampling, groundplans, total evidence and the sys- congruence, total evidence and conditional combination tematics of arthropods. In: Fortey RA, Thomas RH, editors. approaches to morphological and molecular data sets. In: Fortey Arthropod relationships. London: Chapman & Hall. p. 87–96. RA, Thomas RH, editors. Arthropod relationships. London: Wheeler WC, Cartwright P, Hayashi CY. 1993. Arthropod phylogeny: a Chapman & Hall. p. 97–107. combined approach. Cladistics 14:173–192. http://mbe.oxfordjournals.org/ at American Museum of Natural History on February 17, 2015

2984