SUPPLEMENTARY MATERIAL
Phylogenomic Analysis of Ultraconserved Elements Resolves the Evolutionary and Biogeographic History of Segmented Trapdoor Spiders
XIN XU, YONG-CHAO SU, SIMON Y. W. HO, MATJAŽ KUNTNER, HIROTSUGU ONO, FENGXIANG LIU, CHIA-CHEN CHANG, NATAPOT WARRIT, VARAT SIVAYYAPRAM, KHIN PYAE PYAE AUNG, DINH SAC PHAM, Y. NORMA-RASHID, AND DAIQIN LI

MATERIALS AND METHODS

Taxon Selection and Genomic Resources

We sampled 185 liphistiid specimens, representing two subfamilies, all eight genera, and 166 putative species. These include 90 of the 137 known species (World Spider Catalog 2020), along with 76 undescribed species. We also selected 25 outgroup taxa, representing two infraorders (Mygalomorphae and Araneomorphae) of the spider order Araneae and three other arachnid orders (Amblypygi, Scorpiones, and Thelyphonida) that are considered to be close relatives of Araneae (Starrett et al. 2017; Fernández et al. 2018; Lozano-Fernandez et al. 2019). These outgroup taxa include seven mygalomorphs sampled in this study, as well as seven mygalomorphs, seven araneomorphs, and four non-spider arachnids from the study by Starrett et al. (2017) (Supplementary Table S1). We extracted genomic DNA from the tissue of one or two legs of each specimen, depending on body size, using a Maxwell RSC automated DNA/RNA extraction robot (Promega, USA) with Maxwell RSC blood DNA extraction kits, following the manufacturer's protocols. We quantified all genomic DNA extractions using a Quantum assay (Promega, USA) and normalized them to 20 ng/µL in 50 µL of double-distilled H2O (a total of 1000 ng). We sent genomic DNA to the Genomic Sequencing Core (University of Kansas, USA) for library preparation and low-coverage genome sequencing. Paired-end DNA libraries were prepared with an insert size of 300 bp and sequenced as PE100 reads on the Illumina HiSeq 2500 system.
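The normalization target above (20 ng/µL in 50 µL, i.e. 20 × 50 = 1000 ng of DNA per sample) follows from the standard dilution relation C1V1 = C2V2. The sketch below computes the volumes of extract and water needed for one sample. The function name is hypothetical, and the source does not state how the dilutions were actually pipetted; this only illustrates the arithmetic.

```python
from typing import Tuple


def normalization_volumes(stock_ng_per_ul: float,
                          target_ng_per_ul: float = 20.0,
                          final_volume_ul: float = 50.0) -> Tuple[float, float]:
    """Return (µL of DNA extract, µL of ddH2O) from C1 * V1 = C2 * V2.

    Hypothetical helper: C1 is the measured extract concentration, C2 the
    target concentration, and V2 the final volume.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        # A dilute extract cannot be brought up to the target by adding water.
        raise ValueError("extract concentration is below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul  # V1
    water_ul = final_volume_ul - stock_ul                            # top up to V2
    return stock_ul, water_ul


# Example: an extract measured at 100 ng/µL
print(normalization_volumes(100.0))  # prints (10.0, 40.0)
```

In this example, 10 µL of extract plus 40 µL of water gives 50 µL at 20 ng/µL, i.e. 1000 ng in total.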
We sent the probe sequences (described in the next section) and genomic DNA from 192 specimens (185 liphistiids and seven mygalomorphs) to Chiral Technologies, Inc. (West Chester, PA, USA) for library preparation and sequencing. We then followed standard workflows for sequence-capture enrichment with universal blockers, library preparation for Illumina sequencing with on-bead reactions and NEB/Kapa reagents, and post-enrichment limited-cycle PCR (http://ultraconserved.org/#protocols). We quantified the final pooled libraries by qPCR and sent them for sequencing as PE100 reads on the Illumina HiSeq 2500 system.

Probe Design

Because the available UCE probe sets target few UCE loci in liphistiids, we designed a liphistiid-specific probe set. No whole-genome sequences are available for liphistiids, so we began by generating low-coverage genomes for Liphistius aff. malayanus (subfamily Liphistiinae) and Songthela xianningensis (subfamily Heptathelinae). We removed adapters and trimmed low-quality bases from the raw demultiplexed reads using Cutadapt v1.12 (Martin 2011). We assessed the quality of the trimmed reads using FastQC v0.11.5 (Andrews 2010) and assembled the two genomes de novo using ABySS v2.0.2 (Jackman et al. 2017). To design the liphistiid-specific UCE probe set, we used the two new low-coverage liphistiid genomes as the ingroup taxa. We used the genome of the mygalomorph Brazilian whiteknee tarantula Acanthoscurria geniculata (NCBI, PRJNA222716; 5.8 Gbp; Theraphosidae; Sanggaard et al. 2014) as the outgroup, and the genome of the araneomorph African social velvet spider Stegodyphus mimosarum (NCBI, PRJNA222714; 2.7 Gbp; Eresidae; Sanggaard et al. 2014) as the base genome. We designed the liphistiid-specific probe set from these genomes using the PHYLUCE workflow (Faircloth 2016).
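Cutadapt's documentation describes its quality-trimming option as the BWA algorithm: subtract the cutoff from each base quality, accumulate sums from the 3′ end, and cut where the accumulated sum is most negative. The following is a minimal sketch of that 3′ rule, following BWA's formulation, which stops scanning once good-quality bases outweigh the poor tail. It assumes Phred scores already decoded to integers. The function name and the cutoff of 20 in the example are hypothetical, because the source does not report the trimming settings used.

```python
from typing import Sequence


def three_prime_trim_position(qualities: Sequence[int], cutoff: int) -> int:
    """Return the index at which to cut a read (keep qualities[:index]).

    Scanning from the 3' end, accumulate (cutoff - quality) and cut where
    this running sum is largest, i.e. where the sum of (quality - cutoff)
    is most negative. Stop once the running sum drops below zero.
    """
    running = 0
    best = 0
    cut = len(qualities)  # default: keep the whole read
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - qualities[i]
        if running < 0:
            break  # good-quality bases now dominate; stop scanning
        if running > best:
            best = running
            cut = i
    return cut


# Hypothetical read: good bases followed by a low-quality 3' tail.
read_qualities = [30, 30, 30, 30, 10, 5, 2]
cut = three_prime_trim_position(read_qualities, cutoff=20)
print(cut, read_qualities[:cut])  # prints: 4 [30, 30, 30, 30]
```

A read whose 3′ bases are all above the cutoff is returned untrimmed, because the running sum becomes negative at the first base examined.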
To capture more conserved loci, we also combined the Arachnida probe set (Arachnida-UCE-1.1K-v1.fasta; Faircloth 2017) with our liphistiid-specific probe set in the final probe-design step. The Arachnida probe set (1,120 UCE loci) was designed from ten arachnid taxa, including five species of Araneae. We converted the genome assemblies of L. aff. malayanus, Songthela xianningensis, A. geniculata, and Stegodyphus mimosarum to 2bit format using faToTwoBit, simulated short reads from the two liphistiid genomes and the A. geniculata genome using ART v2.5.8 (Huang et al. 2012), and then aligned these reads to the base genome using the raw-read aligner stampy v1.0.22 (Lunter and Goodson 2011). We used SAMtools v1.9 (Li et al. 2009) to remove unmapped reads, producing reduced alignment BAM files, and then used BEDTools v2.27.1 (Quinlan and Hall 2010) to convert these reduced BAM files to BED format. This allowed us to merge putatively conserved regions that lay <100 bp apart and to remove regions that were <80 bp long, were >25% masked in the base genome, or contained ambiguous bases ('N' or 'X'). We extracted the remaining conserved loci from the base genome and used them to design temporary oligonucleotide bait sequences. We then identified and removed duplicate loci from these temporary baits using the default criterion (≥50% sequence identity over ≥50% of sequence length); the remaining baits formed the proto-'liphistiid-specific' probe set. In an in silico test, we matched the Arachnida probe set against our two liphistiid genomes using lastz v1.02 (Harris 2007) and extracted the Arachnida probes that matched unique UCE contigs. Finally, we combined these unique Arachnida probes with the proto-'liphistiid-specific' probe set to form our final 'liphistiid-specific' probe set.
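The merging and filtering rules above can be illustrated with a small, self-contained sketch. It is not the PHYLUCE or BEDTools implementation: regions are assumed to be BED-style 0-based, half-open (chromosome, start, end) tuples, masked bases are assumed to be soft-masked (lowercase) in the base-genome sequence, and all function names are hypothetical. The thresholds (merge gaps <100 bp, minimum length 80 bp, at most 25% masked, no 'N' or 'X') are the ones stated in the text.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end): 0-based, half-open coordinates, as in BED files
Region = Tuple[str, int, int]


def merge_regions(regions: List[Region], max_gap: int = 100) -> List[Region]:
    """Merge regions on the same chromosome that overlap or lie < max_gap bp apart."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] < max_gap:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def filter_regions(regions: List[Region], genome: Dict[str, str],
                   min_length: int = 80, max_masked: float = 0.25) -> List[Region]:
    """Drop regions that are too short, too heavily soft-masked, or contain N/X."""
    kept: List[Region] = []
    for chrom, start, end in regions:
        seq = genome[chrom][start:end]
        if len(seq) < min_length:
            continue
        masked_fraction = sum(base.islower() for base in seq) / len(seq)
        if masked_fraction > max_masked:
            continue
        if any(base in "NnXx" for base in seq):
            continue
        kept.append((chrom, start, end))
    return kept


# Toy base genome: lowercase marks soft-masked (repetitive) sequence.
genome = {"chr1": "A" * 200 + "a" * 300 + "A" * 100,
          "chr2": "A" * 50 + "N" + "A" * 59}
hits = [("chr1", 0, 50), ("chr1", 120, 200), ("chr1", 500, 560), ("chr2", 10, 100)]
merged = merge_regions(hits)
print(merged)  # [('chr1', 0, 200), ('chr1', 500, 560), ('chr2', 10, 100)]
print(filter_regions(merged, genome))  # [('chr1', 0, 200)]
```

In the toy example, the two nearby chr1 hits merge into one 200-bp region; the 60-bp chr1 region fails the length filter, and the chr2 region is dropped because it contains an 'N'.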
Bioinformatics

We processed the raw demultiplexed UCE reads using the PHYLUCE pipeline (Faircloth 2016). We first used Illumiprocessor (Faircloth 2013) to remove adapters and low-quality bases, and then used Trinity (Grabherr et al. 2011) within PHYLUCE to assemble the trimmed reads. We matched the contigs from all samples to the 'liphistiid-specific' probes using minimum coverage and minimum identity values of 80%. We also extracted UCE loci in silico from our two low-coverage liphistiid genomes and from three other publicly available spider genomes (Latrodectus hesperus, Loxosceles reclusa, and Stegodyphus mimosarum), and we included the arachnid UCE contigs from Starrett et al. (2017) (Supplementary Table S1). We aligned the UCE loci using MAFFT (Katoh and Standley 2013) and trimmed the alignments internally with Gblocks (Castresana 2000; Talavera and Castresana 2007), using the default settings in the PHYLUCE pipeline. We prepared three data sets: one comprising 2,765 UCE loci from 185 taxa ('liphistiid' data set), one comprising 2,801 UCE loci from 25 outgroup taxa ('outgroup' data set), and one comprising 2,822 UCE loci from the 25 outgroup taxa and eight liphistiids, one per genus ('reduced' data set). For the liphistiid data set, we produced subsets of the data with taxon coverage ranging from 50 to 90%, to allow us to assess the effect of missing data on phylogenetic inference. For the outgroup and reduced data sets, we produced only one subset, based on 50% taxon coverage, to retain more UCE loci.

Phylogenomic Analyses

We performed phylogenomic analyses of the UCE data under a range of conditions and with several different methods to examine their effects on phylogenetic resolution. Gene trees were inferred from UCE loci using maximum likelihood in IQ-TREE v1.6.12 (Nguyen et al. 2015). Node support was estimated using ultrafast bootstrapping with 1000 replicates (Hoang et al. 2018) and the Shimodaira-Hasegawa approximate likelihood-ratio test (SH-aLRT) with 1000 replicates (Guindon et al. 2010). We used ModelFinder (Kalyaanamoorthy et al. 2017) to select the substitution model for each UCE locus. To investigate the amount of gene-tree conflict around each branch, we obtained the quartet support values for the alternative topologies from ASTRAL. To check for unusual gene trees and UCE loci, we midpoint-rooted the gene trees and calculated their mean root-to-tip distances using the R package phytools (Revell 2012). Where the mean root-to-tip distance exceeded 2 substitutions per site, we checked the alignment for errors and excluded problematic loci.

For the liphistiid, outgroup, and reduced data sets, we inferred the species tree using concatenation and summary-coalescent methods. We first concatenated the UCE loci and analysed the resulting alignment using maximum likelihood in IQ-TREE, with the substitution model selected by ModelFinder. Node support was estimated using ultrafast bootstrapping with 1000 replicates and SH-aLRT with 1000 replicates. We also inferred the species tree using the summary-coalescent approach in ASTRAL-III (Zhang et al. 2018), based on the individual gene trees estimated using maximum likelihood in IQ-TREE. We then estimated branch lengths on the ASTRAL species-tree topology using a maximum-likelihood analysis of the concatenated UCE loci in IQ-TREE.

Phylogenomic Dating

We used Bayesian and likelihood-based methods to estimate the evolutionary timescale for liphistiid spiders, and explored the impacts of tree topology, data filtering, data partitioning, and clock models. For these analyses, we focused on the 185-taxon liphistiid data set with 70% taxon coverage, because we considered the tree inferred from it to be our best phylogenetic estimate.

Fossil calibrations.—We calibrated our dating analysis using three fossil-based age constraints and two secondary calibrations.
The secondary calibrations were based on molecular date estimates by Fernández et al. (2018). First, we specified an age range of 419–462 Ma for the root node, representing the divergence between Scorpiones (Centruroides AR015 + Bothriurus AR021) and Araneae + Thelyphonida (Mastigoproctus AR011) + Amblypygi (Damon AR010). Second, we specified an age range of 387–438 Ma for the split between Araneae and Thelyphonida + Amblypygi. Mesothelae fossils have recently been found in mid-Cretaceous amber from Myanmar. These fossils were diagnosed and described from juveniles, except for two that were based on males. Compared with all extant liphistiids, the two male fossil taxa Parvithele muelleri Wunderlich, 2017, and P.