
Copyedited by: AV MANUSCRIPT CATEGORY: Systematic Biology

AUTHOR QUERIES

AQ1: Please check whether the edit made to the article title is appropriate.
AQ2: Please check that all names have been spelled correctly and appear in the correct order. Please also check that all initials are present. Please check that the author surnames (family name) have been correctly identified by a pink background. If this is incorrect, please identify the full surname of the relevant authors. Occasionally, the distinction between surnames and forenames can be ambiguous, and this is to ensure that the authors’ full surnames and forenames are tagged correctly, for accurate indexing online. Please also check all author affiliations.
AQ3: Please provide department name (if any) for affiliations ‘2, 4–8, 15, 16’ and also provide full road and district address and Zip or postal code for affiliations ‘3, 4, 6, 8, 9, 11, 15’.
AQ4: Please clarify whether this is “Reeder et al. 2015a” or “Reeder et al. 2015b” throughout the article.
AQ5: Please check that the text is complete and that all figures, tables and their legends are included.
AQ6: Please check that special characters, equations, dosages and units, if applicable, have been reproduced accurately.
AQ7: Permission to reproduce any third party material in your paper should have been obtained prior to acceptance. If your paper contains figures or text that require permission to reproduce, please confirm that you have obtained all relevant permissions and that the correct permission text has been used as required by the copyright holders. Please contact [email protected] if you have any questions regarding permissions.
AQ8: Please check whether the section hierarchy is OK as set.
AQ9: Please spell out qPCR, MCMC, GTR, UCE (if necessary).
AQ10: Please note that the unit ‘My’ has been changed to ‘Ma’ throughout the article. Please confirm.
AQ11: Please note that the Refs. [Greene 1997; Pyron 2017; and Anisimova et al. 2011] are not listed in the references list. Please add them to the list or delete the citations.
AQ12: Please confirm whether the citations of Supplementary material are OK as set.
AQ13: If you have submitted files to Dryad, please can you check the Dryad URL to make sure it contains the correct doi and is linked to your data package? Please confirm in the proof if it is correct or if it needs to be changed.
AQ14: Please confirm whether the “Funding” section is OK as set.
AQ15: Please update the missing volume number and page numbers for Ref. [Burbrink et al. 2019].
AQ16: Please confirm whether Ref. [Estes et al. 1988] is OK as set.
AQ17: There is no mention of Ref. [Fry 2005] in the text. Please insert a citation in the text or delete the reference as appropriate.
AQ18: Please provide complete details for Refs. [Gao and Norell 1998; Kuhn 2008; Rhodin et al. 2015].
AQ19: Please provide the missing volume number for Refs. [Minh et al. 2018; Robinson and Foulds 1981].
AQ20: Please provide the missing publisher name and publisher location for Ref. [R Core Team 2015].
AQ21: Please provide the missing publisher location for Ref. [Zhang 2010].
AQ22: If applicable, figures have been placed as close as possible to their first citation. Please check that they are complete and that the correct figure legend is present. Figures in the proof are low resolution versions that will be replaced with high resolution versions when the journal is printed.
AQ23: This figure is currently intended to appear in color online and black and white in print. Please check the black and white versions at the end of the proof, and if necessary, re-word the text and legend to avoid references to color.
AQ24: We noticed Figure 6 (B) has 5 parts and the received source figure PDF ‘Fig_6.Var_Imp_NEW’ has only 4 parts. Kindly note, we have included the 5th part in Fig. 6 (B); please confirm if this is fine or advise if the 5th part needs to be deleted.

[18:22 24/9/2019 Sysbio-OP-SYSB190063.tex] Page: 1 1–19 Copyedited by: AV MANUSCRIPT CATEGORY: Systematic Biology

Journal: Systematic Biology
Article Doi: 10.1093/sysbio/syz062
Article Title: Interrogating Genomic-Scale Data for Squamata (Lizards, Snakes, and Amphisbaenians) Show no Support for Key Traditional Morphological Relationships
First Author: Frank T. Burbrink
Corr. Author: Hussam Zaher

Syst. Biol. 0(0):1–19, 2019 © The Author(s) 2019. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please email: [email protected] DOI:10.1093/sysbio/syz062

Interrogating Genomic-Scale Data for Squamata (Lizards, Snakes, and Amphisbaenians) Show no Support for Key Traditional Morphological Relationships

FRANK T. BURBRINK1, FELIPE G. GRAZZIOTIN2, R. ALEXANDER PYRON3, DAVID CUNDALL4, STEVE DONNELLAN5,6, FRANCES IRISH7, J. SCOTT KEOGH8, FRED KRAUS9, ROBERT W. MURPHY10, BRICE NOONAN11, CHRISTOPHER J. RAXWORTHY1, SARA RUANE12, ALAN R. LEMMON13, EMILY MORIARTY LEMMON14, AND HUSSAM ZAHER15,16,∗

1Department of Herpetology, The American Museum of Natural History, 79th Street at Central Park West, New York, NY 10024, USA; 2Laboratório de Coleções Zoológicas, Instituto Butantan, Av. Vital Brasil, 1500—Butantã, São Paulo—SP 05503-900, Brazil; 3Department of Biological Sciences, The George Washington University, Washington, DC 20052, USA; 4Biological Sciences, W. Packer Avenue, Lehigh University, Bethlehem, PA 18015, USA; 5South Australian Museum, North Terrace, Adelaide SA 5000, Australia; 6School of Biological Sciences, University of Adelaide, SA 5005 Australia; 7Biological Sciences, Moravian College, 1200 Main St, Bethlehem, PA 18018, USA; 8Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT 2601, Australia; 9Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA; 10Department of Natural History, Royal Ontario Museum, 100 Queens Park, Toronto, ON M5S 2C6, Canada; 11Department of Biology, University of Mississippi, Oxford, MS 38677, USA; 12Department of Biological Sciences, 206 Boyden Hall, Rutgers University, 195 University Avenue, Newark, NJ 07102, USA; 13Department of Scientific Computing, Florida State University, Dirac Science Library, Tallahassee, FL 32306-4102, USA; 14Department of Biological Science, Florida State University, 319 Stadium Drive, Tallahassee, FL 32306-4295, USA; 15Museu de Zoologia da Universidade de São Paulo, São Paulo, Brazil CEP 04263-000, Brazil; and 16Centre de Recherche sur la Paléobiodiversité et les Paléoenvironnements (CR2P), UMR 7207 CNRS/MNHN/Sorbonne Université, Muséum national d’Histoire naturelle, 8 rue Buffon, CP 38, 75005 Paris, France

∗Correspondence to be sent to: Museu de Zoologia da Universidade de São Paulo, São Paulo, Brazil CEP 04263-000, Brazil; E-mail: [email protected]

Received XX XXXX XXXX; reviews returned XX XXXX XXXX; accepted XX XXXX XXXX Associate Editor: Robert Thomson

Abstract.—Genomics is narrowing uncertainty in the phylogenetic structure for many amniote groups. For one of the most diverse and species-rich groups, the squamates (lizards, snakes, and amphisbaenians), an inverse correlation between the number of taxa and loci sampled still persists across all publications using DNA sequence data, and reaching a consensus on the relationships among them has been highly problematic. In this study, we use high-throughput sequence data from 289 samples covering 75 families of squamates to address phylogenetic affinities, estimate divergence times, and characterize residual topological uncertainty in the presence of genome-scale data. Importantly, we address genomic support for the traditional taxonomic groupings Scleroglossa and Macrostomata using novel machine-learning techniques. We interrogate genes using various metrics inherent to these loci, including parsimony-informative sites (PIS), phylogenetic informativeness, length, gaps, number of substitutions, and site concordance to understand why certain loci fail to find previously well-supported molecular clades and how they fail to support species-tree estimates. We show that both incomplete lineage sorting and poor gene-tree estimation (due to a few undesirable gene properties, such as an insufficient number of PIS) may account for most gene and species-tree discordance. We find overwhelming signal for Toxicofera, and also show that none of the loci included in this study supports Scleroglossa or Macrostomata. We comment on the origins and diversification of Squamata throughout the Mesozoic and underscore remaining uncertainties that persist in both deeper parts of the tree (e.g., relationships between Dibamia, Gekkota, and remaining squamates; and between the three toxicoferan clades Iguania, Serpentes, and Anguiformes) and within specific clades (e.g., affinities among gekkotan, pleurodont iguanian, and colubroid families).
[Neural network; gene interrogation; lizards; snakes; genomics; phylogeny.]

Well-supported phylogenies inferred using both thorough taxon-sampling and genome-scale sequence data are paramount for understanding phylogenetic structure and settling debates about higher-level taxonomy. Phylogenomic analyses can provide reliable trees for downstream use in comparative biology (Garland et al. 2005; Wortley et al., 2005; Heath et al., 2008; Ruane et al., 2015; Burbrink et al., 2019) and help unravel evolutionary complexity (Philippe et al., 2011), such as deep-time phylogenetic reticulation (Burbrink and Gehara, 2018). In recent years, well-resolved phylogenies of birds and mammals used both large numbers of genes and living taxa (Prum et al., 2015; Liu et al., 2017). Unfortunately, among amniotes, squamates have fallen behind and all comparative and taxonomic studies still rely on phylogenetic structure estimated from either a handful of genes and a large number of species (Pyron et al., 2013), a small number of lineages with phylogenomic data (Streicher and Wiens 2017), or, in a few cases, an intermediate range of ∼50 loci and 161 taxa (Wiens et al., 2012; Reeder et al. 2015). Furthermore, phylogenetic thinking about such groups often reflects historical morphological hypotheses that are weakly congruent or incongruent with recent phylogenomic estimates (Conrad 2008; Gauthier et al., 2012; Losos et al., 2012).

The order Squamata comprises almost 10,800 extant lizards, snakes, and amphisbaenians (Uetz et al. 2018) showing continuous diversification since the , with many groups surviving the Cretaceous/Tertiary mass extinction (Evans 2003; Evans and Jones 2010; Jones et al., 2013). Extant squamates occur in nearly all habitats globally and show huge variation in body size, body shape, limb types (including repeated complete limb loss), oviparous and viviparous reproductive modes, complex venoms, and extremely varied diets that include plants, invertebrates and vertebrates (Vitt and Pianka 2005; Vitt and Caldwell 2009;
Colston et al., 2010; Pyron and Burbrink, 2014; Zaher et al., 2014; Fry 2015). Squamates are important research organisms in the fields of behavior, ecology, and macroevolution, for which major studies on their speciation, biogeography, and latitudinal richness gradients have contributed to the basic understanding of how diversity accumulates across the globe (O’Connor and Shine 2004; Ricklefs et al., 2007; Pyron and Burbrink 2012; Pyron, 2014; Burbrink et al., 2015; Esquerré and Scott Keogh 2016; Esquerré et al., 2017).

Attempts to understand relationships among squamates over the last 250 years reflect the input from hundreds of researchers spanning morphological, small-scale molecular, and now phylogenomic data sets, marked by several key milestones (Oppel 1811; Camp 1923; Underwood 1967; Estes et al., 1988; Townsend et al., 2004; Vidal and Hedges 2005, 2009; Conrad 2008; Wiens et al., 2010, 2012; Gauthier et al., 2012; Pyron et al., 2013, 2014; Reeder et al. 2015; Streicher and Wiens 2016). Although research from both morphological and molecular studies has converged toward similar content of most family-level groups, relationships among these groups and, more intriguingly, the deepest divisions within squamates remain highly contentious (Losos et al., 2012). Squamate relationships inferred from molecular sequence-based phylogenetics that are fundamentally at odds with historical interpretations of morphological data include: 1) monophyly of Toxicofera (snakes, iguanians, and anguiforms; Vidal and Hedges 2005), 2) polyphyly of Anilioidea (pipe-snakes) and Macrostomata (large-gaped snakes including Acrochordidae, Boidea, , Colubriformes, Pythonidae, Tropidophiidae, and Ungaliophiidae), 3) paraphyly of Scolecophidia (worm-snakes), 4) phylogenetic affinities of Dibamia and Amphisbaenia within squamates (Hallermann 1998; Rieppel and Zaher 2000; Conrad 2008; Gauthier et al., 2012), and 5) phylogenetic affinities of Heloderma and Shinisaurus within anguiformes (Gao and Norell 1998).

In brief, all cladistic morphological studies of extant and extinct taxa since the landmark analysis of Estes et al. (1988), which itself tested the original divisions of Camp (1923), have supported a basal split between Iguania, represented by Pleurodonta and Acrodonta, and Scleroglossa, which includes gekkotans and all remaining (“autarchoglossan”) squamate groups (see Conrad [2008] and Gauthier et al. [2012] for reviews). This arrangement at the root of crown-Squamata is supported by a suite of morphological characters that range from 2 to 10 unambiguous synapomorphies uniting Scleroglossa as the sister group of Iguania (Conrad 2008; Gauthier et al., 2012). Alternatively, all molecular phylogenetic studies since Saint et al. (1998), Townsend et al. (2004) and Vidal and Hedges (2005) have demonstrated that snakes, anguimorphs, and iguanians share a more-recent common ancestor to the exclusion of other squamates; this clade is collectively referred to as Toxicofera (Vidal and Hedges 2005). Importantly, re-analysis of combined molecular and morphological data has found quantitatively similar hidden morphological support for Toxicofera as well as for Scleroglossa (Reeder et al. 2015), which suggests that most traits supporting a basal Iguania/Scleroglossa split are the result of convergent ecological adaptations. However, Simões et al. (2018) recently rejected this basal split into Scleroglossa using a new morphological data matrix containing characters that support the molecular phylogeny (at least in part). This suggests that coding of some of these morphological traits may have been historically in error or reflected homoplasy.

Many morphological studies recovered a single origin for large-gaped snakes, referred to as Macrostomata (Cundall et al., 1993; Rieppel et al., 2003; Conrad 2008; Wilson et al., 2010; Gauthier et al., 2012; Zaher and Scanferla 2012; Simões et al., 2018). This group has been defined by a large number of skull features, all involving the dentigerous upper and lower jaws, palatal, and suspensorium bones, which contribute to an increase of gape size (Rieppel 1988; Cundall and Irish 2008). However, Macrostomata was found to be paraphyletic based on fossil taxa and morphological data alone (Lee and Scanlon 2002; Rieppel et al., 2003; Scanlon 2006; Rieppel, 2012). Similarly, molecular studies using mtDNA and/or single-copy nuclear genes also demonstrate paraphyly in Macrostomata, where Aniliidae (non-Macrostomata) and Tropidophiidae (Macrostomata) represent sister taxa to the exclusion of other macrostomatan families (Vidal and Hedges, 2004; Pyron and Burbrink, 2012; Wiens et al., 2012; Streicher and Wiens, 2016); this has also been reinforced by some unconventional morphological traits (Siegel et al., 2011). Finally, a recent study combining morphological and molecular data from extant and fossil species using tip dating methods showed that macrostomatan morphological features evolved early in snakes and subsequently reversed multiple times (Harrington and Reeder, 2017).

Some authors have argued that because morphological data are subjectively coded and demonstrably susceptible to convergence among key traits, they should be considered biased against estimating correct relationships when compared with more voluminous and objectively coded molecular data (Wiens et al., 2010; Reeder et al. 2015). Molecular data, therefore, should not be as influenced by ontogeny, environment, convergence, or user-coded bias when compared with morphological traits. However, a few examples of molecular convergence within mtDNA in squamates exist (Castoe et al., 2009) and among particular genes in other organisms (Parker et al., 2013; Projecto-Garcia et al., 2013; Zou and Zhang, 2016). Although instances of genome-wide convergence occur (Foote et al., 2015), it nevertheless seems unlikely to expect convergence across the nuclear and mtDNA genomes while yielding identical topologies with respect to groups like Toxicofera and .

Although genome-scale data should thus be optimal for resolving squamate relationships, most molecular studies have been limited by either having many taxa
for few genes (Pyron et al., 2013) or having few taxa and many genes (Streicher and Wiens, 2017). For many studies, understanding how many genes support a species tree and what signal exists for alternative arrangements is currently unknown for most genomic-scale data sets. Thus, a study using genome-scale molecular data and dense sampling of extant lineages is needed to confirm the robustness of the molecular signal, particularly with respect to methods that can interrogate phylogenetic signal across loci with respect to the species tree (e.g., Arcila et al., 2017).

Here, we sample 289 species from nearly all squamate families for 394 anchored phylogenomic loci (AHE; Lemmon et al., 2012) to investigate species-tree relationships within Squamata, assess support for all nodes, and estimate divergence dates. We then examine the influence of each locus on the overall topology. Specifically, we use machine-learning techniques (Lek et al., 1996; Zhang, 2010) to quantify and understand why particular genes do not support the standard molecular relationships and determine if there is hidden support for Scleroglossa and Macrostomata. Given past morphological evidence for these relationships, we expect at least some loci in any phylogenetic context (concatenated or species trees) should support these relationships. Thus, the genomic distribution of congruence and incongruence should highlight the underlying biological mechanisms for these topological disputes and potentially provide resolution. Results from our research solidify relationships among most extant squamates, further clarify their taxonomy, and highlight remaining problems.

METHODS AND MATERIALS

Data set

Using anchored phylogenomics (Lemmon et al., 2012), we generated a genomic data set for 289 species representing 75 families of squamates and one rhynchocephalian (see Supplemental material available on Dryad at http://dx.doi.org/10.5061/dryad.6392n3s). Our taxonomic arrangement followed Vidal and Hedges (2005, 2009), Conrad (2008), and Vidal et al. (2010) for higher-level squamatan clades; Pyron et al. (2013) and Barker et al. (2015) for Booidea and Pythonoidea familial and generic levels; Zaher et al. (2009) and Kelly et al. (2009), including nomenclatural suggestions of Savage (2015) and Rhodin et al. (2015) for higher and familial caenophidian clades. Whole genomic DNA was extracted from tissue using the Qiagen DNeasy kit following manufacturer’s protocols at the Center for Anchored Phylogenomics at Florida State University and data were assembled using Anchored Phylogenomics (www.anchoredphylogeny.com). Following Qubit fluorometer quantification using a dsDNA HS Assay kit (Invitrogen™), we sonicated up to 1 µg of the extracted DNA to a size range of 150–400 bp using a Covaris E220 Focused-ultrasonicator. We then prepared libraries following Lemmon et al. (2012) using a Beckman-Coulter FXp liquid-handling robot. After ligating 8 bp indexes, we pooled libraries in groups of 16 and enriched the library pools using the AHE probes developed for squamates by Ruane et al. (2015) and Tucker et al. (2016). After verifying the quality and quantity of the enriched libraries by bioanalysis and qPCR, we sequenced the libraries at the Florida State University translational laboratory on eight lanes of Illumina HiSeq 2500 with paired-end 150-bp protocol (∼395 Gb of total data).

Following sequencing, we demultiplexed reads passing the Casava high-chastity filter, allowing for no index mismatches. Next, we merged overlapping read pairs to remove library adapters and correct for sequencing errors (Rokyta et al., 2012). We then assembled the reads using the quasi-de novo approach described by Ruane et al. (2015) and Prum et al. (2015), with Calamaria pavimentata and Anolis carolinensis as references. We avoided sequences derived from low-level contamination or misindexing by removing assembly clusters containing fewer than 500 reads. We also verified the identity of some tissue samples by comparing mitochondrial genes to previously published mitochondrial sequences (mainly cytb and the rRNA genes). Mitochondrial sequences were assembled through a map-to-reference approach using the Geneious mapper in Geneious R9 (Biomatters Ltd.; Kearse et al., 2012). We established orthology by clustering consensus sequences using pairwise-distances, as described by Hamilton et al. (2016). For some loci, we removed aberrant sequences resulting from low-level contamination or low-divergence paralogs that passed our primary filters by using a modified version of the orthology assessment method described by Hamilton et al. (2016). In this modified approach, we iteratively clustered the sequences using pairwise-distances for each nested taxonomic level. We then aligned orthologous sequences using MAFFT v7.023b (Katoh and Standley, 2013). We trimmed and masked the alignments following Hamilton et al. (2016), but with MINGOODSITES = 13, MINPROPSAME = 0.3, and MISSINGALLOWED = 82, then inspected the alignments manually in Geneious R9 to identify and remove remaining aberrant sequences.

Phylogeny

We estimated models of substitution for each locus using ModelFinder (Chernomor et al., 2016; Kalyaanamoorthy et al., 2017), which uses maximum likelihood to fit 22 substitutional models including up to 6 free-rate gamma categories. We first estimated phylogeny and tree support using the ultrafast nonparametric bootstrap approximation (n = 1000; Minh et al., 2013) over the partitioned concatenated data sets. Support was also estimated using the Shimodaira–Hasegawa-like approximate likelihood ratio test (SH; Shimodaira and Hasegawa, 1999; Anisimova et al. 2011). With the locus-partitions and substitution models, we generated gene trees with support for each locus in IQ-TREE v1.6.6 (Nguyen et al., 2015). We then generated
a species tree with ASTRAL III, using the IQ gene trees as inputs, with support estimated using local posterior probabilities on quadripartitions with default hyperparameter inputs (Yule prior for branch lengths and the species tree set to 0.5; Mirarab and Warnow, 2015). We also ran ASTRAL using 1000 bootstrapped IQ trees under the multilocus bootstrapping feature. We compared these concatenated and species trees using Robinson–Foulds (RF) distances (Robinson and Foulds, 1981), which provide a measure of topological dissimilarity, and then determined if all measures of support were correlated among shared branches using Spearman rank correlation (Supplementary material available on Dryad). The Squamata tree was rooted with its closest living relative, the rhynchocephalian Sphenodon punctatus; outgroup status of this taxon has been discussed in other recent molecular and morphological phylogenies (Gauthier et al., 2012; Jones et al., 2013; Chen et al., 2015; Harrington et al., 2016).

Because bootstraps, SH, or posterior probabilities do not provide comprehensive measures of underlying agreement or disagreement among sites and genes for supporting any topological arrangement, we examined site and gene concordance factors (sCF and gCF, respectively). Here, gCF indicates the percentage of gene trees showing a particular branch from a species tree (Ané et al., 2007; Baum, 2007), whereas sCF indicates the percentage of sites supporting that branch. Values of sCF have a lower bound of 33% given the three possible quartets for each node, whereas gCF values are calculated from full gene trees, which may not resolve a particular node, yielding a lower bound of 0%. We estimated gCF and sCF values across all nodes of the ASTRAL species tree and concatenated IQ trees using IQ-TREE v1.6.6 (Nguyen et al. 2015; Minh et al. 2018) and associated R code (http://www.robertlanfear.com/blog/files/concordance_factors.html).

Because discordance as estimated here between gene trees and the species tree may be caused by either incomplete lineage sorting (ILS) or poorly estimated gene trees, we attempted to isolate these issues across all nodes. If ILS is driving gene and species-tree differences, then the two discordant topologies at a particular node (i.e., not the primary node from the species-tree topology) should be equivalent for gCF and for sCF, though sites may be linked within single loci and provide unreliable estimates. Using a χ² test for genes and sites separately, we sum the number of genes and the number of sites supporting the two discordant topologies at each node to determine if they deviate significantly from being evenly represented. If they do not deviate significantly, it is reasonable to expect that discordance among gene trees and among sites is due to ILS (Huson et al., 2005; Green et al., 2010; Martin et al., 2015). We estimated χ² between genes and sites using the script provided here: http://www.robertlanfear.com/blog/files/concordance_factors.html.

Divergence Dates

We estimated divergence dates (with error) using the penalized-likelihood approach in TreePL (Smith and O’Meara, 2012) with three phylogenetic data sets. A full MCMC-based relaxed phylogenetics method was not computationally feasible for a data set of this size. The first data set used for dating was composed of 1000 concatenated bootstrapped (pseudoreplicated) data sets generated from the IQTREE rapid bootstrap function (UFBoots). Second, because tree space may not have been widely explored using this method of generating bootstrapped trees (Smith et al., 2018), we also generated 100 pseudoreplicates in RAxML 1.6.7 (Stamatakis, 2014) and estimated phylogeny for each in IQTREE. Third, we also generated a test tree by fitting the concatenated data to the ASTRAL topology, producing a species-tree topology with branch lengths in substitution rates. We then re-estimated the phylogeny and generated dated trees using TreePL; this method produced dates given tree uncertainty from the two bootstrap replicates and the ASTRAL tree. For TreePL, we chose the best smoothing parameter to balance a tradeoff between rates across the tree being clock-like or completely saturated by using a cross-validation approach. This approach sequentially removed terminal taxa, produced an estimate of these branches from the remaining data given an optimal smoothing parameter, and then produced an appropriate smoothing parameter from the fit of the real branch and the pruned branch (Sanderson, 2002). We iterated this 14 times over a range of smoothing parameters from 1 × 10⁻⁷ to 1 × 10⁴ and implemented the thorough option to ensure that the run iterates until convergence. To calibrate these trees, we followed Jones et al. (2013), Alencar et al. (2016), and Zaher et al. (2018), and added other taxa, to utilize 26 fossils and locations on the phylogeny for estimating divergence dates (see Supplementary material available on Dryad).

Locus Support for Morphological Topology

We examined how often individual loci recovered the traditional Scleroglossa/Iguania division and a monophyletic Macrostomata (including Acrochordidae, Boidea, Bolyeriidae, Calabariidae, Candoidae, Charinidae, Erycidae, Loxocemidae, Pythonidae, Sanziniidae, Tropidophiidae, Ungaliophiidae, Xenopeltidae, and all 17 families of Colubroides recognized herein). For Scleroglossa, we tested for monophyly of all squamates excluding Iguania. For Macrostomata, we tested for monophyly of Amerophidia (Aniliidae + Tropidophiidae), a group that rejects Macrostomata, as aniliids are canonical non-macrostomatans, and tropidophiids are canonical macrostomatans (Vidal and Hedges, 2004). This grouping in particular is an example of conflict between molecular, osteological, and soft-tissue characters (see Siegel et al., 2011; Hsiang et al., 2015). To test a more


difficult phylogenetic relationship, we also examined the placement of Dibamia, here inferred in the species tree as sister to Gekkota.

We first estimated the probability of Toxicofera and Amerophidia monophyly, respectively, using the measures of support in the species tree (quadripartition and bootstrapped IQ gene trees) and asked how many loci also recover these relationships. For the remaining loci that did not show these relationships, we asked how often they find the basic Scleroglossa/Iguania split and monophyly of Macrostomata and whether these outlier loci together recovered these traditional relationships using species-tree methods. Similarly, we also examined key placements for Dibamia as the sister group to: 1) all other squamates, 2) Gekkota, and 3) all other squamates excluding Gekkota.

Genomic Interrogation Using Neural Networks

Using a custom R script (R Core Team, 2015; script provided as Supplementary material available on Dryad), we examined the frequency of genes that support each node of the dated species tree, similar to Smith et al. (2018). We assessed this support over all dated nodes to understand where and when particular phylogenetic relationships were poorly supported by the density of gene trees that disagree with the species tree. We then attempted to investigate why certain loci did not show the same relationships as the species tree.

Using RF distances (Robinson and Foulds, 1981) between each gene tree and the species tree (RFgtst) as the response variable, we chose characteristics of genes known to affect resolution, support, or topological accuracy such as gene length, base-pair composition, alignment gappiness, and variable sites (Rosenberg and Kumar, 2003; Felsenstein, 2004; Wortley et al., 2005; Nagy et al., 2012; López-Giráldez et al., 2013; Som, 2015; Duchêne et al., 2017), which here included the following parameters as a standard set of predictor variables: 1) number of sites, 2) base-pair content, 3) number of parsimony-informative sites (PIS), 4) number of gaps, 4) maximum gap length, 5) mean gap length, 6) number of segregating sites with gaps, 7) standard deviation of gap length, 8) sites with more than one substitution, 9) number of bases observed at each site (1–4), 10) maximum or minimum phylogenetic informativeness, and 11) gene and site concordance (gCF and sCF). Metrics 1–9 were generated with a custom script using the Ape package in R (Paradis et al., 2004). RF distances were estimated using a custom script based on the package phangorn. Metric 10 was estimated using a custom script based on the package phyloinformR (Dornburg et al., 2016), which estimates rates per site from the web server Phydesign (Lopez-Giraldez and Townsend, 2011), and metric 11 was generated in IQ-TREE v1.6.6 (Nguyen et al., 2015).

To understand if these variables can predict RFgtst, we used artificial neural network (NN) regressions in caret (Kuhn, 2008). Artificial NNs are a type of deep-learning method used as an alternative to likelihood techniques to address complex questions with many predictor variables in population genomics (Libbrecht and Noble, 2015; Sheehan et al., 2016). As such, they are a powerful tool for understanding non-linear interactions among numerous parameters to predict responses resulting in a range of simple to complex models regardless of the statistical distribution of those variables or the relationships among them (Lek et al., 1996; Zhang, 2010). These methods have been successfully used to address phylogenetic and comparative tree-based tests previously (Burbrink et al., 2017; Burbrink and Gehara, 2018).

In brief, in the typical NN with multilayer feed-forward back-propagation used here, the basic structure began with genetic input variables (input neurons), joined by weighted synapses to hidden neurons, and then finally ending in an output neuron. Every node was connected to every other node in the previous layer, and each node was provided with an activation value, defined as the difference between the weighted sum of all inputs and a bias (threshold) parameter; connected hidden neurons activate an output node that is compared with a known (dependent) value.

Specific to our NN, we scaled all predictor variables by the minimum and maximum range of each data category. We separated the data into 70% standard training and 30% test. We also tested accuracy at other training and test percentages, respectively: 50/50, 60/40, 70/30, 80/20, and 90/10. Each of these was run using 1000 maximum iterations, which ensured convergence. We resampled the data using the default 25 bootstrap replicates to reach convergence with the following tuning parameters: weight decay, root mean squared error (RMSE), r², and mean absolute error (MAE). We examined the power of these variables and models to predict the response (RFgtst) by recomposing the test and training sets over 100 iterations and comparing the resulting test statistics (RMSE, r², and MAE) to those from randomized response variables for each of these 100 estimated models. Over these replicates, we also identified the top five most important model variables (Supplementary Fig. S1 available on Dryad).

Multicollinearity among variables may be problematic for constructing model inferences if there is a linear dependence among these independent variables (De Veaux and Ungar, 1994; Hastie et al., 2009; Dormann et al., 2013). Although overparameterization is often not a problem for NN given that they are used for prediction of the system and not necessarily for interpretation, previous studies have demonstrated that machine-learning techniques in general may be sensitive to changes in variables over collinear data (Shan et al., 2006; Dormann et al., 2013). Our NN models contain 26 variables that mostly describe properties of genes; therefore, we used variance inflation factors (VIF; Montgomery et al., 2012) to filter collinearity data in the R package “Faraway” (Faraway, 2002). Here, we chose standard VIF >10 and removed all but one variable greater than this value. We


then repeated the NN analyses as with all of the variables described above.

Similarly, we used NN to understand if the same variables used to describe genes, but now also including RFgtst, could be used to classify why loci do or do not support the monophyly of Toxicofera, Amerophidia (which implies lack of support for Macrostomata), and the Gekkota and Dibamia sister relationship (GD). Our binary predictions were classified as “1” if the locus supported the relationship and “0” if it did not. We again used the “train” function with the identical set up as above but replaced the tuning parameters with weighted decay and model accuracy. This was replicated 500 times for recomposing test and training data sets, estimating accuracy, and then testing this against the accuracy of randomized responses. For each of these replicates, we also tested performance using a confusion matrix (Townsend, 1971; Kuhn, 2008) comparing predicted with actual classification from the training data.

RESULTS

Assemblies

We were able to recover 99.0% (SD = 2.3%) of the 394 target AHE loci (Supplementary material available on Dryad). An average of 23.5% (SD = 10.0%) of the reads mapped to the target region. The resulting consensus sequences averaged 1775 bp (SD = 180 bp). We obtained for each locus an average of 1.7 consensus sequences (assembly clusters, SD = 0.67), indicating that the loci were low copy but not all loci had single copies. The average coverage of these consensus sequences was 218 (SD = 83).

Data set

Loci had on average retained 92.4% (SD = 10.8) of all taxa represented in the species tree (n = 289). Mean length and number of PIS were 1302 bp (range: 170–2075, SD = 282.72) and 749 bp (range: 61–1455; SD = 282.7167), respectively. We found that 18 models provided a best fit for substitutions across all loci, with the TVM (transversion model, AG = CT, unequal base frequencies) fitted to 27% of loci, GTR fitted to 18.1% of models, with either a five or six free-rate parameter (R) model describing rate heterogeneity.

Phylogeny

Tree estimates using either concatenated or species-tree methods produced similar topologies (Figs. 1 and 2), with RF distances of 44 between these trees being only 7.7% of the maximum RF distance. All methods showed strong nodal support. All measures of support (ASTRAL with local posterior probabilities, ASTRAL with 1000 IQ tree UF bootstraps, concatenated partitioned IQ trees with 1000 bootstraps, and concatenated partitioned IQ trees with SH likelihoods) showed that 90%, 92%, 93%, and 95% of nodes, respectively, were supported above 0.95 (Supplementary material available on Dryad). For shared nodes, support was significantly correlated among methods (ρ = 0.55–0.66; P = 2.2 × 10⁻¹⁶).

All phylogenies generally showed relationships among groups that have been commonly recovered in previous genomic or combined molecular-morphological studies (Figs. 1 and 2; Vidal and Hedges, 2005; Wiens et al., 2012; Pyron et al., 2013; Reeder et al., 2015a; Streicher and Wiens, 2017). We found unambiguous support (i.e., support values of 1.0) for all main groups, including Unidentata, Episquamata, Toxicofera, Gekkota, Scincomorpha, Laterata, Anguiformes, Iguania, and Serpentes (Fig. 1). The tree supported an early division between Gekkota/Dibamia and the remainder of Squamata, followed by a Scincomorpha and Episquamata sister relationship, and within Episquamata a sister relationship between Laterata and Toxicofera. Resolution and support within each of these groups was generally unambiguous (Supplementary material available on Dryad). For instance, most of the nodes within Colubroides received probability support values of 1.0, except for sister relationships between + Grayiidae and + Pseudoxyrhophiidae, and the placement of Natricidae and within and , respectively. Likewise, the remainder of Serpentes was well supported except for the placement of Candoiidae and Bolyeriidae, and the sister relationship between the two clades containing the most recent common ancestors (MRCA) of 1) Pythonidae and Bolyeriidae and 2) Calabariidae and Boidae. We also did not find strong support for relationships among some families in Pleurodonta (Iguania) and Gekkota, the placement of Dibamia as sister to Gekkota, and placement of Anguiformes or Iguania (or their MRCA) as sister to Serpentes within Toxicofera (Fig. 2).

Tree Support

Our estimates of sCF and gCF were correlated across the species tree of squamates (Fig. 3) but we note that both of these measures fell well below standard measures of Pp support. For many of the standard squamate relationships, including Toxicofera and Amerophidia, gCF were above 50%, whereas sCF remained low. This indicated that estimating support from sites alone, such as in bootstraps, may not provide credible estimates of support with genomic-scale data. We also demonstrated a significant relationship between gCF and sCF and branch length (Fig. 3).

Most of the gene-to-species-tree discordance can be described by ILS, where 82.6% of nodes did not show significant differences between the two discordant topologies. Although likely not as reliable given non-independence among sites, sCF shows 39.7% of nodes not showing significance among discordance sites.
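The ILS test described in the Methods asks whether the two discordant topologies at a node are equally frequent. It can be sketched as a chi-square test with one degree of freedom. The Python below is a minimal illustration, not the authors' R script; the function name and the example counts are hypothetical.

```python
import math


def discordance_chi2(n_alt1, n_alt2):
    """Chi-square test (1 df) that two discordant topologies are equally frequent.

    n_alt1, n_alt2: counts of genes (or sites) supporting each of the two
    discordant topologies at one node. Returns (chi2, p_value).
    """
    total = n_alt1 + n_alt2
    if total == 0:
        raise ValueError("no genes or sites support either discordant topology")
    expected = total / 2.0
    chi2 = ((n_alt1 - expected) ** 2 + (n_alt2 - expected) ** 2) / expected
    # With one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for a single node (not taken from the study).
chi2, p = discordance_chi2(40, 60)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under this reading, a non-significant result at a node is consistent with ILS, whereas a strong imbalance between the two discordant topologies points to another cause.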


[Figure 1 image: dated species tree of Squamata with family-level tip labels and higher taxonomic groupings; geological time axis: Jurassic, Cretaceous, Paleogene, Neogene, Quaternary.]

FIGURE 1. Dated species tree for Squamata with all major taxonomic categories indicated. Dark circles represent areas of local posterior probabilities on quadripartitions <95%. Black stars represent the location of fossils. Tip labels and dating error estimates are available in the Supplementary material available on Dryad.


[Figure 2 image: circular family-level phylogeny with higher groupings labeled (Squamata, Unidentata, Episquamata, Toxicofera, Serpentes, Iguania, Anguiformes, Laterata, Scincomorpha, Gekkota, Dibamia); legend: PP Support <0.95.]

FIGURE 2. Reduced dated, circle phylogeny showing family-level relationships and higher. Low support (<95%) from local posterior probabilities on quadripartitions, bootstraps, and SH tests are indicated according to the legend.
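The RF comparison between the concatenated and species trees reported in the Results (a distance of 44, or 7.7% of the maximum) can be checked with a short sketch. It assumes both trees are unrooted and fully resolved on the same 289 tips, in which case the maximum RF distance is 2(n − 3). The helper names and the toy five-taxon trees are illustrative and do not come from the authors' scripts.

```python
def max_rf(n_tips):
    """Maximum RF distance between two unrooted, fully resolved trees on n_tips taxa."""
    if n_tips < 4:
        return 0
    return 2 * (n_tips - 3)


def rf_distance(splits_a, splits_b):
    """RF distance: number of non-trivial splits present in only one of the trees.

    Each tree is a set of frozensets, one per internal branch, holding the taxa
    on the side of the branch that excludes a fixed reference taxon.
    """
    return len(splits_a ^ splits_b)


# Toy trees on taxa A-E with reference taxon "A" (hypothetical).
tree1 = {frozenset({"C", "D", "E"}), frozenset({"D", "E"})}  # ((A,B),C,(D,E))
tree2 = {frozenset({"B", "D", "E"}), frozenset({"D", "E"})}  # ((A,C),B,(D,E))
print(rf_distance(tree1, tree2), "of a maximum", max_rf(5))

# Scaling the reported distance of 44 by the assumed maximum for 289 tips.
print(f"{100 * 44 / max_rf(289):.1f}% of the maximum RF distance")
```

The same scaling by 2(n − 3) is one way to obtain a scaled RF distance between each gene tree and the species tree, although the exact scaling used for RFgtst is not stated here.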

Finally, logistic regression for the presence or absence of Toxicofera and Amerophidia given sCF was significant (P = 0.012 and 3.99 × 10⁻⁵), whereas the presence of the Dibamia/Gekkota node showed no relationship with sCF (P = 0.45).

For each node of the species tree, more genes recovered a particular node than did not. This does not indicate, however, that the majority of gene topologies were the same as the species tree (Fig. 4); 72% of nodes were supported by the majority of gene trees. Most of the main groups showed high species-tree and individual-locus agreement, though we note that at 90–95 Ma, relationships among Pleurodonta families (Iguanians) and the placement of Candoiidae and Bolyeriidae among Pythonoidea and Booidea, respectively, showed strong discordance between the number of gene trees showing the species-tree relationship. Most loci yielded strong phylogenetic informativeness over substantially large substitution rates to infer the origin and diversification throughout the history of Squamata sampled in our dated trees (Supplementary material available on Dryad).

Neither concatenated analyses nor species trees supported the traditional Scleroglossa or Macrostomata. In addition, no loci recovered those groups either. In contrast, Toxicofera and Amerophidia were well supported (100%) in concatenated and species-tree estimates (Supplementary material available on Dryad). Among individual genes, 75% and 69% of loci recovered monophyletic Toxicofera and Amerophidia, respectively. However, the sister relationship between Gekkota/Dibamia (GD) was inferred in all concatenated and species-tree estimates but with low support (only 42.5% of individual loci).

We also generated species trees using ASTRAL III from the loci not supporting Toxicofera, Amerophidia, and GD. For the former, we found that among these smaller sets of loci, species trees did not support Scleroglossa, but rather placed Laterata sister to Serpentes, followed by Iguania and then Anguimorpha (Supplementary material available on Dryad). When forcing a sister relationship between Laterata and Serpentes, estimating a best supporting IQTree for this constraint, and


[Figure 3 image. Top panel: site concordance factors (y-axis, 0–100) against gene concordance factors (x-axis, 0–100), with points shaded by quadripartition support (0.4–1.0) and major clades labeled. Bottom panel: gene and site concordance factors (y-axis) against branch length in coalescent units (x-axis, 0–5).]

FIGURE 3. Plot showing the relationship between site and gene concordance factors (sCF and gCF) relative to quadripartition support from ASTRAL (top) and concordance factors regressed against branch length in coalescent units (bottom).
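The gCF definition given in the Methods (the percentage of gene trees showing a particular branch of the species tree) can be illustrated with a simplified sketch. It assumes every gene tree contains all taxa and represents each tree as a set of splits; it is not how IQ-TREE computes the statistic internally, and the function name and toy data are hypothetical.

```python
def gene_concordance_factor(branch, gene_tree_splits):
    """Percentage of gene trees that contain the given branch (split).

    branch: frozenset of taxa on one side of a species-tree branch, always the
        side that excludes a fixed reference taxon.
    gene_tree_splits: list of sets of frozensets, one set per gene tree,
        encoded with the same reference taxon.
    """
    if not gene_tree_splits:
        raise ValueError("need at least one gene tree")
    hits = sum(1 for splits in gene_tree_splits if branch in splits)
    return 100.0 * hits / len(gene_tree_splits)


# Toy example: four gene trees on taxa A-E, reference taxon "A" (hypothetical).
species_branch = frozenset({"D", "E"})
gene_trees = [
    {frozenset({"C", "D", "E"}), frozenset({"D", "E"})},
    {frozenset({"B", "D", "E"}), frozenset({"D", "E"})},
    {frozenset({"C", "D", "E"}), frozenset({"D", "E"})},
    {frozenset({"C", "D", "E"}), frozenset({"C", "D"})},
]
print(gene_concordance_factor(species_branch, gene_trees))
```

Because a gene tree may simply lack the branch, this percentage can fall to 0%, in contrast to the roughly 33% floor for sCF noted in the Methods.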

calculating gCF and sCF for that nodal constraint, we find very little support, essentially random, for this arrangement: gCF = 4.32 and sCF = 33.1. For those loci not supporting Amerophidia, we found Aniliidae as sister to Alethinophidia, but still not supporting Macrostomata, given the inclusion of Uropeltoidea in Alethinophidia (Supplementary material available on Dryad). Finally, we found that 73% of the loci did not recover the GD relationship but instead resolved Dibamia as sister to all squamates (43% of loci) or as sister to the remaining Squamata after Gekkota (30% of loci; Supplementary material available on Dryad).

Divergence Dating

Our divergence dates were largely congruent with recent studies focused on estimating divergence times in Squamata (e.g., Mulcahy et al., 2012; Jones et al., 2013; Pyron, 2016). Our analysis, though, provided an unprecedented taxonomic and genomic coverage at lower hierarchical levels of the squamate tree that enabled new estimates of divergence dates among extant families. Between the different sets of bootstrapped phylogeny, we found that the absolute mean difference for dated nodes was only 0.668 Ma, and all dates at all shared nodes were correlated (ρ = 0.925, P = 2.2 × 10⁻¹⁶). Our estimates of divergence dates (Figs. 1 and 2; Supplementary material available on Dryad) suggested the origin of crown-Squamata to be in the Early Jurassic (190 Ma), though recent fossil evidence suggested this may have occurred earlier in the Late Triassic (206 Ma; Simões et al., 2018). Deep divergences within the tree of Squamata occurred between the Early and Middle Jurassic, with the divergence among Gekkota–Dibamia,


Scincomorpha–Episquamata, Laterata–Toxicofera, Lacertibaenia–Teiioidea, and Iguania–Anguiformes occurring between 190 and 155 Ma.

A large number of modern groups diverged within the Cretaceous: gekkotan, cordyloid, teiioid, anguiform, acrodontan, pleurodontan, and typhlopoid extant families. Divergence of crown-pleurodont iguanian families occurred within a short interval during the Early , between ∼98 and 79 Ma, following the end of the opening of the South Atlantic and during a period of isolation of the western Gondwanan landmasses (McLoughlin, 2001). Similarly, their sister-group acrodontan extant families, Chamaeleonidae and Agamidae, diverged at ∼100 Ma. Afrophidian stem-Uropeltoidea, Pythonoidea, Booidea, and Caenophidia also diverged within the Late Cretaceous, although most of their extant families diverged after the K/Pg boundary. Notable exceptions are the Cylindrophiidae, Bolyeriidae, Xenopeltidae, Calabariidae, Acrochordidae, and Xenodermidae, which diverged in the Late Cretaceous. All extant families of the most diverse group of squamates, the Colubriformes (>3600 extant species), diverged within the Paleogene (Pareidae, Viperidae, Homalopsidae) or throughout the Eocene (all remaining families).

Gene Interrogation via NN: Scleroglossa and Macrostomata

To understand discordance between gene trees and species trees beyond ILS, we used two NN analyses. For the first, we used a regression approach and modeled predictions for the RFgtst discordance. All NN analyses converged and accuracy did not vary among training data sets ranging in size from 0.5 to 0.9 (mean and SD r² = 0.87, 0.015). We estimated an average RMSE, r², and MAE (SD in parentheses) of 0.06 (0.013), 0.86 (0.07), 0.05 (0.009), respectively. These strongly suggested that the NN was accurate, particularly relative to random responses (test between real and random accuracy metrics; P < 2.2 × 10⁻¹⁶), which were on average 0.24 (RMSE), 0.01 (r²), 0.17 (MAE; Fig. 5). The top five most important variables for predicting RFgtst by each locus were: mean and SD of sCF, PIS, phylogenetic informativeness, frequency of sites with three observable alternate bases, and frequency of sites with four observable alternate bases (Fig. 6). Individually, Bayesian correlation (BayesFirstAid; https://github.com/rasmusab/bayesian_first_aid) between each of these variables and RFgtst showed strong negative correlation (Fig. 6). We also tested the efficacy of the NN approach by filtering for multicollinearity using VIF and removed all but five variables (mean and SD sCF, PIS, number of tips, and number of gaps). This essentially produced the same prediction as using all 26 variables (accuracy: RMSE = 0.06, r² = 0.87, MAE = 0.047) with variable importance ranked at 1.0 for all uncorrelated variables: PIS, number of tips, number of gaps, and standard deviation and mean sCF.

For the second machine-learning approach, we used NN classification to understand if any of the properties of these loci along with RFgtst could predict why particular genes failed to recover Toxicofera, Amerophidia, and Dibamia/Gekkota. We determined that accuracy (scaled between 0 and 1) when running these models over the training data set only had a modest increase over randomizing the responses (Fig. 5; accuracy for Toxicofera = 0.08, Amerophidia = 0.08, Dibamia/Gekkota = 0.13), suggesting difficulty determining why certain loci fail to find these three relationships. Similarly, area under the ROC curve was low (0.60–0.70), and confusion matrices were not significant (P = 0.16–0.44). The top-ranked importance variables were RFgtst (0.96–1.0) and phylogenetic informativeness (0.92–0.96) for Toxicofera and Dibamia/Gekkota. For Amerophidia, we found that the variables RFgtst (0.94) and frequency of adenosine bases (0.75) ranked highest.

DISCUSSION

Using a genome-scale data set with a thorough and diverse sampling of taxa, we corroborate nearly all recent molecular studies in estimating strong support for several fundamental groupings in Squamata: Unidentata, Scincomorpha, Episquamata, Laterata, Toxicofera, and Amerophidia. Using gene-interrogation techniques, we do not find any support in the genome for the traditional morphology-based Scleroglossa/Iguania division nor for monophyly of large-gaped snakes, Macrostomata. Although the former has mainly fallen out of use in the literature because of the rise of DNA sequence-based phylogenies of snakes, the latter has remained in use given analyses of morphological data that strongly support it (Conrad, 2008; Gauthier et al., 2012; Zaher and Scanferla, 2012; Hsiang et al., 2015).

Artificial NNs

Artificial NNs show that loci yielding discontinuity with the species tree, as measured by scaled RF distance, can be characterized by a few general properties. Although traditional support remains high across most parts of this phylogeny, both gCF and sCF provide another view where in several cases loci fail to infer key nodes (Figs. 3 and 4). We developed this NN approach to better understand if particular properties of genes or simple ILS can account for discordance (Figs. 5 and 6). Overall, most nodes reveal a pattern consistent with ILS. When using NN and filtering for multicollinearity to predict the degree of gene and species-tree discordance, we found the following properties of genes and samples were important for resolving species-tree phylogenies: the number of PIS, mean site concordance across all nodes in the tree, the number of terminals sampled, and, importantly, variance in concordance sites across all nodes of each gene tree.

As expected, genes with higher mean sCF across all nodes are better at producing trees concordant with the


[Figure 4 image: densities of loci that support or do not support the species tree (x-axis, −300 to 300) against node date (y-axis, 0–250 Ma), with major taxonomic groupings labeled at their nodes.]

FIGURE 4. Density of loci supporting (red) or not supporting (blue) nodes in the species tree scaled against dates of nodes (Ma). Major taxonomic groupings are indicated along these locus densities by particular node.

species trees. This gene property is correlated with the number of PIS (ρ = 0.400, P < 1.1 × 10⁻¹⁵). In turn, with our tests of multicollinearity, PIS is correlated with a large number of other properties including phylogenetic informativeness over time (related to substitution rate over time), length of the gene, base-pair frequencies, and number of segregating sites. Interestingly, high variance in sCF predicts higher discordance between the gene and species-tree topologies as measured by RF. This suggests that, for the main topology estimated using species-tree techniques, genes with high sCF variance are concordant with some nodes and not others, again likely associated with the number of PIS per locus. In addition, both gCF and sCF are correlated with branch lengths, which indicates that difficult areas of a phylogeny to infer with credible support will more likely be those where the time between divergences was small. This has been known from the phylogenetic literature using both concatenated and multispecies coalescent-based methods (Philippe et al., 1994; Xu and Yang, 2016).

In summary, gene-tree and species-tree discordance is a mix of ILS, which is easy to ameliorate given coalescent-based phylogenetic inference, various properties of the genes, like site concordance and PIS, and short times between divergences, which may be difficult to resolve with any amount of data. It is likely that this NN method could serve to filter loci prior to final species-tree estimation. Care should be taken in cases where a majority of loci feature poor properties for tree inference, such as low PIS; preliminary species trees would likely be poorly estimated and supported, thus providing unusable estimates of sCF and gCF.

Squamate Phylogenomics

Some authors have suggested that molecular convergence, such as that found in previous phylogenetic studies (e.g., Castoe et al., 2009), may account for an erroneous estimation of Toxicofera (see Losos et al., 2012). However, this suggestion, even when using a handful of loci, seems improbable given the overwhelming support for the group estimated across many individual markers (including nuclear, mitochondrial, and structural [SINE] loci) and from concatenated and species trees. In addition, there appears to be very little unambiguous evidence for Scleroglossa in most morphological data sets (Reeder et al. 2015). Molecular convergence has been found within loci, for instance, within burrowing squamates in mtDNA (Castoe et al., 2009), and even across the genome in some limited examples (Foote et al., 2015). However, expecting convergence across most or all heritable markers including UCE (Streicher and Wiens, 2017), AHE loci, and mtDNA is unlikely given independence and function of these loci and the extremely low probability that taxa would randomly show the same relationships across these genes.

We do not find a single locus supporting Scleroglossa, whereas a majority of all possible loci containing all taxa recover Toxicofera (Figs. 3 and 4). Moreover, both concatenated and species-tree estimates support Toxicofera at 100%. The loci that do not recover Toxicofera are also problematic for most relationships, yielding higher gene-to-species-tree RF values. Interestingly, species-tree estimates from these loci also do not support Scleroglossa and still show some support for Toxicofera (Supplementary material available on Dryad). Previous research with a smaller data set also failed to find

[18:22 24/9/2019 Sysbio-OP-SYSB190063.tex] Page: 11 1–19 Copyedited by: AV MANUSCRIPT CATEGORY: Systematic Biology

[Figure 5 graphic: four density plots of NN accuracy. Top panel: RFst-gt (x-axis: accuracy, 0.0 to 1.0; y-axis: density). Lower three panels: Amerophidia, Toxicofera, and Dibamia/Gekkota (x-axis: accuracy, 0.5 to 0.9; y-axis: density), each comparing random and real classifications.]

FIGURE 5. Accuracy of NNs. The top graph shows accuracy in predicting the RF distances between gene and species trees (RFst-gt) using a neural-net regression approach. The bottom three graphs show results from neural-network classification analyses used to determine why particular loci fail to find the indicated three phylogenetic groups. The figures represent the density of accuracy from known (real) classifications (0 = does not support the group, 1 = does support the group) and from randomly shuffled classifications.
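The comparison between real and randomly shuffled classifications in Figure 5 can be illustrated with a minimal sketch. It is not the authors' neural-network pipeline: the single synthetic predictor, the class-mean threshold classifier, the sample sizes, and every function name below are assumptions chosen only to show why shuffled labels give a chance-level baseline.

```python
import random
import statistics


def fit_threshold(xs, ys):
    """Fit a one-feature classifier: the midpoint between the two class means."""
    mean0 = statistics.mean(x for x, y in zip(xs, ys) if y == 0)
    mean1 = statistics.mean(x for x, y in zip(xs, ys) if y == 1)
    return (mean0 + mean1) / 2, mean1 >= mean0


def holdout_accuracy(xs, ys, rng):
    """Train on a random half of the loci and return accuracy on the other half."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    threshold, one_is_high = fit_threshold(
        [xs[i] for i in train], [ys[i] for i in train]
    )
    hits = sum(
        (xs[i] > threshold) == (ys[i] == 1 if one_is_high else ys[i] == 0)
        for i in test
    )
    return hits / len(test)


# Hypothetical data: 200 loci, one informative property, label 1 = supports the group.
rng = random.Random(42)
labels = [0] * 100 + [1] * 100
feature = [rng.gauss(3.0 * y, 1.0) for y in labels]

# Accuracy with the known (real) classifications.
real_acc = [holdout_accuracy(feature, labels, rng) for _ in range(50)]

# Accuracy after shuffling the classifications, which breaks any real association.
shuffled_acc = []
for _ in range(50):
    permuted = labels[:]
    rng.shuffle(permuted)
    shuffled_acc.append(holdout_accuracy(feature, permuted, rng))

print(round(statistics.mean(real_acc), 2), round(statistics.mean(shuffled_acc), 2))
```

If the two accuracy distributions overlap heavily, the predictors carry little information about whether a locus recovers the group; clear separation, as in this toy case, suggests that they do.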

gene-support for Scleroglossa and supported Toxicofera (Reeder et al. 2015). Given that characters supporting Scleroglossa (Conrad, 2008; Gauthier et al., 2012) may be problematic due to potential convergence arising from independent adaptation to burrowing (Reeder et al. 2015), we discourage any further use of Scleroglossa as a nomen outside of a historical context. On the other hand, sister-group relationships within Toxicofera remain unresolved given that the clade formed by Iguania and Anguiformes received low support values in one method (quadripartition support), despite the large number of loci used in our study. This relationship has been inferred previously using genomic data with greater support but using fewer taxa (Streicher and Wiens, 2017). Therefore, given independent support between UCEs and AHE data, it may be likely that the node subtending Iguania/Anguiformes is correct.

It is important to note that, in our study, Serpentes shows a long unbranched interval of almost 50 Ma separating it from the remaining two toxicoferan clades (Fig. 1). The lack of representation of a number of key extinct lineages capable of resolving the placement of snakes by filling this geochronological gap may also confound phylogeny. These remaining two toxicoferan clades are also subtended by an unusually short branch, <1.4 my, where 98% of branching times in this tree exceed this length. Unsolved relationships within Toxicofera and other difficult regions in this tree (e.g.,


2019 BURBRINK ET AL.—GENOMIC RELATIONSHIPS OF SQUAMATES 13

[Figure 6 graphic. a) Bar chart of variable importance (frequency, 0 to 100) for the gene properties used as NN predictors. b) Scatterplots of RF distance (GT-ST) against each predictor (N = 370), with the distribution of ρ above each panel: SD of site concordance factors (median ρ = 0.69; 95% HDI 0.63 to 0.75); sites with 4 bases (median ρ = −0.70; 95% HDI −0.76 to −0.64); parsimony informative sites (median ρ = −0.82; 95% HDI −0.85 to −0.78); phylogenetic informativeness (median ρ = −0.73; 95% HDI −0.78 to −0.67); sites with 3 bases (median ρ = −0.78; 95% HDI −0.82 to −0.73).]

FIGURE 6. A) Variable importance (top five from each of 100 replicates) from a regression NN analysis designed to predict the scaled response of RF distances for each gene tree (GT) against the species tree (ST). B) Bayesian correlations between RF distances and each of the four most important predictor variables. The distribution of correlations (ρ) between each pair of variables is indicated above the four panels.
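The response variable in Figure 6, the scaled RF distance between a gene tree and the species tree, can be sketched as follows. This is a minimal illustration assuming fully resolved trees on the same taxa, written here as nested tuples of taxon names; the toy five-taxon trees and all function names are hypothetical, and the source does not state which software computed RF.

```python
def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions of a tree, treating it as unrooted."""
    all_taxa = leaves(tree)
    anchor = min(all_taxa)  # each split is stored as the side lacking this taxon
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - below if anchor in below else below
        if 2 <= len(side) <= len(all_taxa) - 2:
            found.add(side)
        return below

    visit(tree)
    return found


def normalized_rf(tree_a, tree_b):
    """Robinson-Foulds distance divided by its maximum, 2(n - 3), for n > 3 taxa."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must share the same taxon set")
    n = len(leaves(tree_a))
    return len(splits(tree_a) ^ splits(tree_b)) / (2 * (n - 3))


# Toy example: the gene tree swaps taxa B and C relative to the species tree.
species_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "C"), "B"), ("D", "E"))
print(normalized_rf(species_tree, gene_tree))
```

Dividing by 2(n − 3) puts trees with different numbers of taxa on a common 0 to 1 scale, which is what allows a single regression across loci.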


relationships within Pleurodonta), therefore, may be due to excessively short internodes, which may require data from the entire genome to estimate with high support.

Similarly, we find strong support for a deep sister relationship between Aniliidae (“Microstomata”) and Tropidophiidae (Macrostomata) using both concatenated and species-tree methods (bootstrap and Pp support = 100%) and among loci (present in 69% of gene trees), indicating that the taxon Macrostomata defined by snakes with expanded gapes (Zaher, 1998; Conrad, 2008; Wilson et al., 2010; Gauthier et al., 2012; Hsiang et al., 2015) is invalid. This suggests that large gapes have either evolved or been lost multiple times, which is consistent with Harrington and Reeder (2017), and it may indicate that previous hypotheses on the origin of a “macrostomatan” diet in Serpentes require reanalysis (e.g., Greene 1997; Rodriguez-Robles et al., 1999). Unfortunately, the phylogenetic affinities of a number of extinct alethinophidian lineages with key macrostomatan features, such as Pachyrhachis, Haasiophis, Eupodophis, Yurlunggur, Wonambi, and Sanajeh, remain in dispute (Conrad, 2008; Gauthier et al., 2012; Zaher and Scanferla, 2012; Reeder et al. 2015). A more accurate placement of these fossils at the base of the alethinophidian tree might help clarify higher-level affinities between amerophidian and afrophidian lineages, including the position of uropeltids and tropidophiids.

Within Anguiformes, both concatenated and species-tree estimates support Shinisaurus as the sister taxon to varanids and Helodermatidae as the sister taxon to Anguioidea (Diploglossidae, Anniellidae, Anguidae, and Xenosauridae). This topology conflicts with the one suggested by morphological data sets, including those with expanded fossil sampling, where helodermatids and Shinisaurus are traditionally recognized as the sister groups of varanids and Xenosaurus, respectively (McDowell and Bogert, 1954; Gao and Norell, 1998; Gauthier, 1998; Gauthier et al., 2012). However, Conrad (2008) has shown that Shinisaurus and Xenosaurus were not closely related, the former being the sister group of varanoids (including Helodermatidae) and the latter the sister group of Anguidae. Conrad’s (2008) morphological tree closely matches molecular estimates of anguiform affinities, including ours, with the exception of the phylogenetic position of Helodermatidae. The distinct and strongly supported evidence provided by morphological and molecular data regarding the phylogenetic affinities of Helodermatidae within Anguiformes is another point of conflict that still awaits a solution.

Finally, the placement of Dibamia within Squamata has been difficult to resolve; molecular studies place Dibamia as the sister to Gekkota, sister to the remaining Squamata, or sister to Unidentata (Townsend et al., 2004; Vidal and Hedges, 2005; Pyron et al., 2013; Reeder et al. 2015; Streicher and Wiens, 2017). Most morphological studies place Dibamia with other limbless, burrowing taxa such as Amphisbaenia or Serpentes (Evans and Barbadillo, 1998, 1999; Hallermann, 1998; Lee, 1998; Rieppel and Zaher, 2000; Evans et al., 2005; Conrad, 2008; Gauthier et al., 2012), though authors typically acknowledge that many characters showing this relationship may be due to convergence reflecting shared ecology. Our analyses inferred Dibamia as the sister to Gekkota, but with poor support among methods and loci, where both gCF and sCF were split evenly among primary and discordant nodes for both metrics. The next-most-probable placement is sister to all other Squamata, which was also found using UCEs in Streicher and Wiens (2017), then sister to Episquamata. Importantly, our results never find Dibamia closely related to other limbless or burrowing taxa. In general, other methods characterizing genomic data sets have also had difficulty confidently estimating deep relationships, for which particular genes may have biased results (Brown and Thomson, 2016).

Phylogenetic Structure and the Origin of Squamates

The structure of our phylogeny is extremely similar between concatenated and species-tree methods, with RF distances being only 7.7% of the maximum RF. Support among all nodes for all measures of support is high, with >90% of nodes having 95% support or higher. Although we infer a tree similar to those published previously (Pyron et al., 2013; Reeder et al. 2015; Streicher and Wiens, 2017), where the tree is largely structured as (Unidentata(Episquamata(Toxicofera))), we note that several key nodes remain poorly supported (Figs. 1 and 2). For example, several deep relationships such as the sister to Serpentes (Anguiformes or Iguania, or both) within Toxicofera remain uncertain.

Similarly, support is low for resolving relationships among pleurodont families (the primarily New World iguanians), which is similar to results from previous molecular and morphological attempts to understand these relationships (Etheridge and Frost, 1989; Townsend et al., 2004; Reeder et al. 2015; Streicher et al., 2015). Comparable with Streicher and Wiens (2016), we found excessively short branch lengths subtending relationships among iguanian families, here ranging from 0.3 to 1.6 my, which are in the lower 0.4–1.7% of shortest internodes on our dated tree. It is likely, given their rapid divergence in the Upper Cretaceous, that signal across the genome to confidently estimate this area of the tree may remain difficult to extract (Rokas and Carroll, 2008). We also find poor support for the placement of Candoiidae within Booidea and Bolyeriidae with respect to Booidea and Pythonoidea, relationships also dating to the Upper Cretaceous. However, dense taxon sampling within these relictual families is not possible. Finally, support for relationships among some of the families of the rapidly diverging and diverse Colubroidea and Elapoidea is lacking, mirroring numerous previous studies (Lawson et al., 2005; Zaher et al., 2009; Pyron et al., 2011). We are presently sampling Colubroidea and Elapoidea more densely to better estimate those interfamilial relationships. As with all regions in the Squamate Tree of Life with poorly supported, short


nodes, we are hopeful that imminent whole-genome studies may be capable of increasing support in that particular region of the tree, though strong signal for a particular arrangement may always remain elusive.

Our results provide a solid foundation for the origins of all groupings of Squamata, with major pulses of diversification occurring in the Jurassic, Cretaceous, and Paleogene (Figs. 1 and 2). These results generally agree with previous studies using genetic and morphological data (Mulcahy et al., 2012; Pyron and Burbrink, 2014; Harrington and Reeder, 2017), though disparity in sampling loci and taxa makes direct comparisons among studies difficult. For example, the root times for Squamata in a recent paper were found to be older, with crown-Squamata originating in the Late Triassic (∼206 Ma; Simões et al., 2018). However, Simões et al. (2018) expanded the concept of Squamata by including Megachirella and Marmoretta from the Middle Triassic and Early Jurassic, respectively; these two poorly preserved taxa have been identified previously as stem-lepidosaurs (Evans and Jones, 2010). Our results are concordant with divergence dates for the origin of Squamata and for the major divisions giving rise to higher-level groups within squamates given by Pyron (2017) and Jones et al. (2013), but we note that many of our calibrations were taken from those studies. Unidentata, Episquamata, and Toxicofera all arose in the Early to Middle Jurassic. In addition, within these groups, diversification producing the primary taxonomic divisions for the following taxa also occurred in the Jurassic: Scincomorpha, Laterata, Iguania, Cordylioidea, Pleurodonta, Acrodonta, and the root of Dibamia and Gekkota.

Similar to recent combined fossil, morphological, and molecular studies (Jones et al., 2013; Pyron, 2016; Simões et al., 2018), a large number of diverse groups subsequently originated in the Cretaceous, including the crown Serpentes and their major divisions into Typhlopoidea, Amerophidia, Alethinophidia, Afrophidia, and Caenophidia. Within these divisions, well-known and widely distributed groups such as Pythonoidea, Booidea, and Uropeltoidea diversified as well. Within lizards, we see origins and diversification within Gekkota, Teiioidea, Pleurodonta, and Acrodonta. Although understanding how the K/Pg boundary affected rates of diversification within Squamata requires additional information from the fossil record, it is clear that the groups originating in the Mesozoic rapidly diversified into all major families during the Cenozoic, ultimately producing the 10,800 currently known extant species. Most of these extant families of squamates diversified massively throughout the Paleogene and Neogene, underscoring the origins and diversification of the many hyperdiverse families of Colubriformes.

CONCLUSION

We provide a robust, dated phylogenomic estimate of phylogenetic relationships among Squamata sampling widely across almost all major lizard and snake groups. This research provides a solid framework for understanding the relationships and dates of origins of all extant squamates, showing that most major families diversified prior to the K/Pg boundary. We also provide a novel framework for interrogating genes using artificial-intelligence techniques to understand how particular loci differ from species-tree estimates. Importantly, we show that both ILS and poor tree estimation given properties of genes associated with the number of informative sites may produce significant discordance among gene and species trees. All analyses fail to support the two traditional groupings of squamates into Scleroglossa and Macrostomata, and instead support a consistent pattern grouping the squamates into Unidentata, Scincomorpha, Episquamata, Laterata, and Toxicofera. We also highlight areas of topological uncertainty among particular groups, such as relationships among pleurodont families and deep toxicoferan relationships, that represent potential avenues of novel research using whole genomes and denser taxon sampling to properly infer these relationships.

SUPPLEMENTARY MATERIAL

Data available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.6392n3s.

ACKNOWLEDGMENTS

We thank the following curators who kindly provided tissue samples for our study: G. Schneider (UMMZ), L. Densmore, L. Grismer, A. Bauer, T. Jackman, R. Brown and C. K. Onn (KU), C. Austin, R. Brumfield and D. Dittman (LSUMNS), J. Rosado (MCZ), M. Hagemann (BPBM), A. Wynn (USNM), J. Vindum (CAS), J. McGuire and C. Spencer (MVZ), D. Kizirian (AMNH), A. Resetar (FMNH), K. Krysko and T. Lott (UF), and D. Dittman (LSUMZ).

FUNDING

This research was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo [grant number BIOTA-FAPESP 2011/50206-9 to H.Z.], National Science Foundation [grant numbers DEB-1257926 to F.T.B., DEB-1441719 to R.A.P., DEB-1257610 to C.J.R.], and Australian Research Council Discovery [grant number DP120104146 to J.S.K. and S.C.D.].

REFERENCES

Alencar L.R.V., Quental T.B., Grazziotin F.G., Alfaro M.L., Martins M., Venzon M., Zaher H. 2016. Diversification in vipers: phylogenetic relationships, time of divergence and shifts in speciation rates. Mol. Phylogenet. Evol. 105:50–62.
Ané C., Larget B., Baum D.A., Smith S.D., Rokas A. 2007. Bayesian estimation of concordance among gene trees. Mol. Biol. Evol. 24:412–426.


Arcila D., Ortí G., Vari R., Armbruster J.W., Stiassny M.L.J., Ko K.D., Sabaj M.H., Lundberg J., Revell L.J., Betancur-R R. 2017. Genome-wide interrogation advances resolution of recalcitrant groups in the tree of life. Nat. Ecol. Evol. 1:20.
Barker D.G., Barker T.M., Davis M.A., Schuett G.W. 2015. A review of the systematics and taxonomy of Pythonidae: an ancient serpent lineage. Zool. J. Linn. Soc. 175:1–19.
Baum D.A. 2007. Concordance trees, concordance factors, and the exploration of reticulate genealogy. Taxon 56:417–426.
Brown J.M., Thomson R.C. 2016. Bayes factors unmask highly variable information content, bias, and extreme influence in phylogenomic analyses. Syst. Biol. 66:517–530.
Burbrink F.T., Gehara M. 2018. The biogeography of deep time phylogenetic reticulation. Syst. Biol. 67:743–755.
Burbrink F.T., Lorch J.M., Lips K.R. 2017. Host susceptibility to snake fungal disease is highly dispersed across phylogenetic and functional trait space. Sci. Adv. 3:e1701387.
Burbrink F.T., McKelvy A.D., Pyron R.A., Myers E.A. 2015. Predicting community structure in snakes on Eastern Nearctic islands using ecological neutral theory and phylogenetic methods. Proc. R. Soc. B Biol. Sci. 282:20151700.
Burbrink F.T., Ruane S., Kuhn A., Rabibisoa N., Randriamahatantsoa B., Raselimanana A.P., Andrianarimalala M.S.M., Cadle J.E., Lemmon A.R., Lemmon E.M., Nussbaum R.A., Jones L., Pearson R.G., Raxworthy C.J. 2019. The origins and diversification of the exceptionally rich gemsnakes (Colubroidea: Lamprophiidae: ) in Madagascar. Syst. Biol. In Press.
Camp C.L. 1923. Classification of the lizards. Bull. Am. Mus. Nat. Hist. 48:289–480.
Castoe T.A., de Koning A.P.J., Kim H.M., Gu W.J., Noonan B.P., Naylor G., Jiang Z.J., Parkinson C.L., Pollock D.D. 2009. Evidence for an ancient adaptive episode of convergent molecular evolution. Proc. Natl. Acad. Sci. USA 106:8986–8991.
Chen M.-Y., Liang D., Zhang P. 2015. Selecting question-specific genes to reduce incongruence in phylogenomics: a case study of jawed vertebrate backbone phylogeny. Syst. Biol. 64:1104–1120.
Chernomor O., von Haeseler A., Minh B.Q. 2016. Terrace aware data structure for phylogenomic inference from supermatrices. Syst. Biol. 65:997–1008.
Colston T.J., Costa G.C., Vitt L.J. 2010. Snake diets and the deep history hypothesis. Biol. J. Linn. Soc. 101:476–486.
Conrad J.L. 2008. Phylogeny and systematics of squamata (Reptilia) based on morphology. Bull. Am. Mus. Nat. Hist. 310:1–182.
Cundall D., Irish F.J. 2008. The snake skull. In: Gans C., Gaunt A.S., Adler K., editors. Biology of the reptilia. Ithaca: Cornell University Press. p. 349–692.
Cundall D., Wallach V., Rossman D.A. 1993. The systematic relationships of the snake genus Anomochilus. Zool. J. Linn. Soc. 109:275–299.
De Veaux R.D., Ungar L.H. 1994. Multicollinearity: a tale of two nonparametric regressions. New York (NY): Springer. p. 393–402.
Dormann C.F., Elith J., Bacher S., Buchmann C., Carl G., Carré G., Marquéz J.R.G., Gruber B., Lafourcade B., Leitão P.J., Münkemüller T., McClean C., Osborne P.E., Reineking B., Schröder B., Skidmore A.K., Zurell D., Lautenbach S. 2013. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography (Cop.). 36:27–46.
Dornburg A., Fisk J.N., Tamagnan J., Townsend J.P. 2016. PhyInformR: phylogenetic experimental design and phylogenomic data exploration in R. BMC Evol. Biol. 16:262.
Duchêne D.A., Duchêne S., Ho S.Y.W. 2017. New statistical criteria detect phylogenetic bias caused by compositional heterogeneity. Mol. Biol. Evol. 34:1529–1534.
Esquerré D., Scott Keogh J. 2016. Parallel selective pressures drive convergent diversification of phenotypes in pythons and boas. Ecol. Lett. 19:800–809.
Esquerré D., Sherratt E., Keogh J.S. 2017. Evolution of extreme ontogenetic allometric diversity and heterochrony in pythons, a clade of giant and dwarf snakes. Evolution (NY) 71:2829–2844.
Estes R., Pregill G.K., Camp C.L., Charles L. 1988. Phylogenetic relationships of the lizard families: essays commemorating Charles L. Camp. In: Estes R., Pregill G.K., editors. Phylogenetic relationships of the lizard families: essays commemorating Charles L. Camp. Stamford: Stanford University Press. p. 119–282. Camp Memorial Symposium on the Phylogenetic Relationships of the Lizard Families 1982: Knoxville T.
Etheridge R.E., Frost D.R. 1989. A phylogenetic analysis and taxonomy of iguanian lizards (Reptilia, Squamata). Univ. Kansas Mus. Nat. Hist. Misc. Publ. 81:1–76.
Evans S., Wang Y., Li C. 2005. The early Cretaceous Chinese lizard, Yabeinosaurus: resolving an enigma. J. Syst. Palaeontol. 3:319–335.
Evans S.E. 2003. At the feet of the dinosaurs: the early history and radiation of lizards. Biol. Rev. Camb. Philos. Soc. 78:513–51.
Evans S.E., Barbadillo L.J. 1998. An unusual lizard (Reptilia: Squamata) from the Early Cretaceous of Las Hoyas, Spain. Zool. J. Linn. Soc. 124:235–265.
Evans S.E., Barbadillo L.J. 1999. A short-limbed lizard from the Lower Cretaceous of Spain. Syst. Paleontol. 60:73–85.
Evans S.E., Jones M.E.H. 2010. The origin, early history and diversification of Lepidosauromorph reptiles. Berlin, Heidelberg: Springer. p. 27–44.
Faraway J.J. 2002. Practical regression and anova using R. Bath: University of Bath.
Felsenstein J. 2004. Inferring phylogenies. Sunderland (MA): Sinauer Associates.
Foote A.D., Liu Y., Thomas G.W.C., Vinař T., Alföldi J., Deng J., Dugan S., van Elk C.E., Hunter M.E., Joshi V., Khan Z., Kovar C., Lee S.L., Lindblad-Toh K., Mancia A., Nielsen R., Qin X., Qu J., Raney B.J., Vijay N., Wolf J.B.W., Hahn M.W., Muzny D.M., Worley K.C., Gilbert M.T.P., Gibbs R.A. 2015. Convergent evolution of the genomes of marine mammals. Nat. Genet. 47:272–5.
Fry B.G. 2005. From genome to “venome”: molecular origin and evolution of the snake venom proteome inferred from phylogenetic analysis of toxin sequences and related body proteins. Genome Res. 15:403–420.
Fry B.G. 2015. Venomous reptiles and their toxins: evolution, pathophysiology, and biodiscovery. Oxford: Oxford University Press.
Gao K., Norell M. 1998. Taxonomic revision of Carusia (Reptilia, Squamata) from the late Cretaceous of the Gobi Desert and phylogenetic relationships of anguimorphan lizards. American Museum novitates. 1–51.
Garland T., Bennett A.F., Rezende E.L. 2005. Phylogenetic approaches in comparative physiology. J. Exp. Biol. 208:3015–3035.
Gauthier J.A. 1998. Fossil xenosaurid and anguid lizards from the Early Eocene Wasatch Formation, southeast Wyoming, and a revision of the Anguioidea. Rocky Mt. Geol. 21:7–54.
Gauthier J.A., Kearney M., Maisano J.A., Rieppel O., Behlke A.D.B. 2012. Assembling the squamate tree of life: perspectives from the phenotype and the fossil record. Bull. Peabody Mus. Nat. Hist. 53:3–308.
Green R.E., Krause J., Briggs A.W., Maricic T., Stenzel U., Kircher M., Patterson N., Li H., Zhai W., Fritz M.H.-Y., Hansen N.F., Durand E.Y., Malaspinas A.-S., Jensen J.D., Marques-Bonet T., Alkan C., Prüfer K., Meyer M., Burbano H.A., Good J.M., Schultz R., Aximu-Petri A., Butthof A., Höber B., Höffner B., Siegemund M., Weihmann A., Nusbaum C., Lander E.S., Russ C., Novod N., Affourtit J., Egholm M., Verna C., Rudan P., Brajkovic D., Kucan Ž., Gušic I., Doronichev V.B., Golovanova L. V, Lalueza-Fox C., de la Rasilla M., Fortea J., Rosas A., Schmitz R.W., Johnson P.L.F., Eichler E.E., Falush D., Birney E., Mullikin J.C., Slatkin M., Nielsen R., Kelso J., Lachmann M., Reich D., Pääbo S. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.
Hallermann J. 1998. The ethmoidal region of Dibamus taylori (Squamata: Dibamidae), with a phylogenetic hypothesis on dibamid relationships within Squamata. Zool. J. Linn. Soc. 122:385–426.
Hamilton C.A., Lemmon A.R., Lemmon E.M., Bond J.E. 2016. Expanding anchored hybrid enrichment to resolve both deep and shallow relationships within the spider tree of life. BMC Evol. Biol. 16:212.
Harrington S.M., Leavitt D.H., Reeder T.W. 2016. Squamate phylogenetics, molecular branch lengths, and molecular apomorphies: a response to McMahan et al. Copeia 104:702–707.


Harrington S.M., Reeder T.W. 2017. Phylogenetic inference and divergence dating of snakes using molecules, morphology and fossils: new insights into convergent evolution of feeding morphology and limb reduction. Biol. J. Linn. Soc. 121:379–394.
Hastie T., Tibshirani R., Friedman J. 2009. The elements of statistical learning. New York (NY): Springer New York.
Heath T.A., Hedtke S.M., Hillis D.M. 2008. Taxon sampling and the accuracy of phylogenetic analyses. J. Syst. Evol. 46:239–257.
Hsiang A.Y., Field D.J., Webster T.H., Behlke A.D., Davis M.B., Racicot R.A., Gauthier J.A. 2015. The origin of snakes: revealing the ecology, behavior, and evolutionary history of early snakes using genomics, phenomics, and the fossil record. BMC Evol. Biol. 15:87.
Huson D.H., Klöpper T., Lockhart P.J., Steel M.A. 2005. Reconstruction of reticulate networks from gene trees. Springer, Berlin, Heidelberg. p. 233–249.
Jones M.E., Anderson C., Hipsley C.A., Müller J., Evans S.E., Schoch R.R. 2013. Integration of molecules and new fossils supports a Triassic origin for (lizards, snakes, and tuatara). BMC Evol. Biol. 13:208.
Kalyaanamoorthy S., Minh B.Q., Wong T.K.F., von Haeseler A., Jermiin L.S. 2017. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods. 14:587–589.
Katoh K., Standley D.M. 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30:772–80.
Kearse M., Moir R., Wilson A., Stones-Havas S., Cheung M., Sturrock S., Buxton S., Cooper A., Markowitz S., Duran C., Thierer T., Ashton B., Meintjes P., Drummond A. 2012. Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28:1647–1649.
Kelly C.M.R., Barker N.P., Villet M.H., Broadley D.G. 2009. Phylogeny, biogeography and classification of the snake superfamily Elapoidea: a rapid radiation in the late Eocene. Cladistics Int. J. Willi Hennig Soc. 25:38–63.
Kuhn M. 2008. Caret: Classification and regression training package. R package version 6.0-77. 28.
Lawson R., Slowinski J.B., Crother B.I., Burbrink F.T. 2005. Phylogeny of the Colubroidea (Serpentes): new evidence from mitochondrial and nuclear genes. Mol. Phylogenet. Evol. 37:581–601.
Lee M.S.Y. 1998. Convergent evolution and character correlation in burrowing reptiles: towards a resolution of squamate relationships. Biol. J. Linn. Soc. 65:369–453.
Lee M.S.Y., Scanlon J.D. 2002. Snake phylogeny based on osteology, soft anatomy and ecology. Biol. Rev. 77:333–401.
Lek S., Delacoste M., Baran P., Dimopoulos I., Lauga J., Aulagnier S. 1996. Application of neural networks to modelling nonlinear relationships in ecology. Ecol. Modell. 90:39–52.
Lemmon A.R., Emme S.A., Lemmon E.M. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst. Biol. 61:727–744.
Libbrecht M.W., Noble W.S. 2015. Machine learning applications in genetics and genomics. Nat. Rev. Genet. 16:321–332.
Liu L., Zhang J., Rheindt F.E., Lei F., Qu Y., Wang Y., Zhang Y., Sullivan C., Nie W., Wang J., Yang F., Chen J., Edwards S. V, Meng J., Wu S. 2017. Genomic evidence reveals a radiation of placental mammals uninterrupted by the KPg boundary. Proc. Natl. Acad. Sci. USA 114:E7282–E7290.
López-Giráldez F., Moeller A.H., Townsend J.P. 2013. Evaluating phylogenetic informativeness as a predictor of phylogenetic signal for metazoan, fungal, and mammalian phylogenomic data sets. Biomed. Res. Int. 2013:621604.
Lopez-Giraldez F., Townsend J.P. 2011. PhyDesign: a webapp for profiling phylogenetic informativeness. BMC Evol. Biol. 11:152.
Losos J., Hillis D., Greene H. 2012. Evolution. Who speaks with a forked tongue? Science 338:1428–1429.
Martin S.H., Davey J.W., Jiggins C.D. 2015. Evaluating the use of ABBA–BABA statistics to locate introgressed loci. Mol. Biol. Evol. 32:244–257.
McDowell S.B., Bogert C.M. 1954. The systematic position of Lanthanotus and the affinities of the anguinomorphan lizards. Bull. Am. Mus. Nat. Hist. 105:1–142.
McLoughlin S. 2001. The breakup history of Gondwana and its impact on pre-Cenozoic floristic provincialism. Aust. J. Bot. 49:271.
Minh B.Q., Hahn M.W., Lanfear R. 2018. New methods to calculate concordance factors for phylogenomic datasets. bioRxiv 487801.
Minh B.Q., Nguyen M.A.T., von Haeseler A. 2013. Ultrafast approximation for phylogenetic bootstrap. Mol. Biol. Evol. 30:1188–1195.
Mirarab S., Warnow T. 2015. ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 31:i44–i52.
Montgomery D.E., Peck E.A., Vining G.G. 2012. Introduction to linear regression analysis. Hoboken (NJ): Wiley.
Mulcahy D.G., Noonan B.P., Moss T., Townsend T.M., Reeder T.W., Sites J.W., Wiens J.J. 2012. Estimating divergence dates and evaluating dating methods using phylogenomic and mitochondrial data in squamate reptiles. Mol. Phylogenet. Evol. 65:974–991.
Nagy L.G., Kocsubé S., Csanádi Z., Kovács G.M., Petkovits T., Vágvölgyi C., Papp T. 2012. Re-mind the gap! Insertion–deletion data reveal neglected phylogenetic potential of the nuclear ribosomal internal transcribed spacer (ITS) of fungi. PLoS One 7:e49794.
Nguyen L.-T., Schmidt H.A., von Haeseler A., Minh B.Q. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating Maximum-Likelihood phylogenies. Mol. Biol. Evol. 32:268–274.
O’Connor D.E., Shine R. 2004. Parental care protects against infanticide in the lizard Egernia saxatilis (Scincidae). Anim. Behav. 68:1361–1369.
Oppel M. 1811. Die ordnungen, familien, und gattungen der reptilien, als prodrom einer naturgeschichte derselben. Munich: Joseph Lindauer.
Paradis E., Claude J., Strimmer K. 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290.
Parker J., Tsagkogeorga G., Cotton J.A., Liu Y., Provero P., Stupka E., Rossiter S.J. 2013. Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502:228–31.
Philippe H., Brinkmann H., Lavrov D. V., Littlewood D.T.J., Manuel M., Wörheide G., Baurain D. 2011. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol. 9:e1000602.
Philippe H., Chenuil A., Adoutte A. 1994. Can the explosion be inferred through molecular phylogeny? Development 1994:15–25.
Projecto-Garcia J., Natarajan C., Moriyama H., Weber R.E., Fago A., Cheviron Z.A., Dudley R., McGuire J.A., Witt C.C., Storz J.F. 2013. Repeated elevational transitions in hemoglobin function during the evolution of Andean hummingbirds. Proc. Natl. Acad. Sci. USA 110:20669–74.
Prum R.O., Berv J.S., Dornburg A., Field D.J., Townsend J.P., Lemmon E.M., Lemmon A.R. 2015. A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature 526:569–573.
Pyron R.A. 2014. Temperate extinction in squamate reptiles and the roots of latitudinal diversity gradients. Glob. Ecol. Biogeogr. 23:1126–1134.
Pyron R.A. 2016. Novel approaches for phylogenetic inference from morphological data and total-evidence dating in squamate reptiles (lizards, snakes, and amphisbaenians). Syst. Biol. 66:syw068.
Pyron R.A., Burbrink F.T. 2012. Extinction, ecological opportunity, and the origins of global snake diversity. Evolution (NY) 66:163–178.
Pyron R.A., Burbrink F.T. 2014. Early origin of viviparity and multiple reversions to oviparity in squamate reptiles. Ecol. Lett. 17:13–21.
Pyron R.A., Burbrink F.T., Colli G.R., Montes de Oca A.N., Vitt L.J.J., Kuczynski C.A.A., Wiens J.J., de Oca A.N.M. 2011. The phylogeny of advanced snakes (Colubroidea), with discovery of a new subfamily and comparison of support methods for likelihood trees. Mol. Phylogenet. Evol. 58:329–342.
Pyron R.A., Burbrink F.T., Wiens J.J. 2013. A phylogeny and revised classification of Squamata, including 4161 species of lizards and snakes. BMC Evol. Biol. 13:93.
Pyron R.A., Hendry C.R., Chou V.M.V.M., Lemmon E.M., Lemmon A.R., Burbrink F.T. 2014. Effectiveness of phylogenomic data and coalescent species-tree methods for resolving difficult nodes in the phylogeny of advanced snakes (Serpentes: Caenophidia). Mol. Phylogenet. Evol. 81:221–231.
Pyron R.A., Hsieh F.W., Lemmon A.R., Lemmon E.M., Hendry C.R. 2016. Integrating phylogenomic and morphological data to assess






[Figure: family-level phylogeny with tips from Sphenodontidae (Rhynchocephalia) to Colubridae. Labeled clades: Squamata, Dibamia, Gekkota, Unidentata, Scincomorpha (Scincoidea, Cordyloidea), Episquamata, Laterata (Teiioidea, Lacertibaenia, Amphisbaenia, Amphisbaenoidea), Toxicofera, Iguania (Acrodonta, Pleurodonta), Anguiformes (Neoanguimorpha, Anguioidea, Paleoanguimorpha), Serpentes (Typhlopoidea, Amerophidia, Alethinophidia, Afrophidia, Uropeltoidea, Pythonoidea, Booidea, Caenophidia, Colubroides, Colubriformes, Endoglyptodonta, Elapoidea, Colubroidea). Time axis labeled by period: Triassic, Jurassic, Cretaceous, Paleogene, Neogene, Quaternary.]


[Figure: family-level phylogeny of Squamata drawn with curved tip labels, from Sphenodontidae to Colubridae. Labeled clades: Squamata, Dibamia, Gekkota, Unidentata, Scincomorpha, Episquamata, Laterata, Toxicofera, Iguania, Anguiformes, Serpentes. Legend: PP Support <0.95.]


[Figure, two panels. Panel 1: Site Concordance Factors (0 to 100) plotted against Gene Concordance Factors (0 to 100), with a Quadripartition Support scale (0.4 to 1.0); labeled nodes: Serpentes, Gekkota, Anguiformes, Gekkota/Dibamia, Episquamata, Iguania, Amerophidia, Scincomorpha, Toxicofera, Laterata, Unidentata. Panel 2: Concordance Factors (Gene Concordance Factor and Site Concordance Factor) plotted against Branch Length (coalescent units, 0 to 5).]


[Figure: Node Date (0 to 250) plotted on a horizontal axis running from −300 to 300, with the two sides labeled "Does Not Support Species Tree" and "Supports Species Tree". Labeled nodes: Candoiidae,Boidae; Pleurodonta Relationships; Colubroides; Amerophidia; Bolyeridae(Xenopeltidae(Loxocemida,Pythonidae)); Serpentes; Anguiformes; Iguania; Laterata; Toxicofera; Scincomorpha; Episquamata; Unidentata; Dibamia/Gekkota; Squamata.]

[Figure, four panels. Panel 1: Density of RFst-gt (horizontal axis 0.0 to 1.0). Panels 2 to 4: Density plotted against Accuracy (0.5 to 0.9) for Amerophidia, Toxicofera, and Dibamia/Gekkota. Legend: Random, Real.]


[Figure, two parts. a) Variable Importance (Freq, 0 to 100) by predictor; legible predictor labels include Gap Size SD, Phylogen Inf, Pars inf sites, Max gap size, Mean gap size, # gaps, Sites w 1, 2, 3, and 4 bases, ML Diff, SD Site Conc factors, and Mean site conc factors. b) RF Distance GT-ST (0.0 to 0.6) plotted against five predictors, each with the reported median ρ and 95% HDI (N = 370): SD Site Concordance Factors, median ρ = 0.69 (0.63 to 0.75); Sites with 4 Bases, median ρ = −0.70 (−0.76 to −0.64); Parsimony Informative Sites, median ρ = −0.82 (−0.85 to −0.78); Phylogenetic Informativeness, median ρ = −0.73 (−0.78 to −0.67); Sites w 3 bases, median ρ = −0.78 (−0.82 to −0.73).]
