SUPPLEMENTARY INFORMATION VOLUME: 1 | ARTICLE NUMBER: 0020

In the format provided by the authors and unedited.

Supplementary Information

Genome-wide interrogation advances resolution of recalcitrant groups in the tree of life

Dahiana Arcila¹,², Guillermo Ortí¹, Richard Vari²§, Jonathan W. Armbruster³, Melanie L. J. Stiassny⁴, Kyung D. Ko¹, Mark H. Sabaj⁵, John Lundberg⁵, Liam J. Revell⁶ & Ricardo Betancur-R.⁷,²*

¹Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, DC, 20052, United States
²Department of Vertebrate Zoology, National Museum of Natural History Smithsonian Institution, PO Box 37012, MRC 159, Washington, DC, 20013, United States
³Department of Biological Sciences, Auburn University, Auburn, AL 36849, United States
⁴Department of Ichthyology, Division of Vertebrate Zoology, American Museum of Natural History, New York, New York, United States
⁵Department of Ichthyology, The Academy of Natural Sciences, 1900 Benjamin Franklin Parkway, Philadelphia, Pennsylvania 19103, United States
⁶Department of Biology, University of Massachusetts Boston, Boston, Massachusetts 02125, United States
⁷Department of Biology, University of Puerto Rico – Río Piedras, PO Box 23360, San Juan, Puerto Rico

This PDF file includes:

Supplementary Methods
Supplementary Notes
Figs. S1 to S7
Tables S1 to S8

Other Supplementary Materials for this manuscript:

A compressed folder containing all data files is available from Zenodo.org (http://dx.doi.org/10.5281/zenodo.51603). These include:

Database S1. List of target loci and probes designed for pilot study
Database S2. Scripts for quasi de novo assembler
Database S3. DNA sequence alignments, gene and multi-locus trees for pilot study
Database S4. List of loci selected for Otophysi
Database S5. List of target loci and probes designed for Otophysi
Database S6. DNA and protein sequence alignments for Otophysi

NATURE ECOLOGY & EVOLUTION | DOI: 10.1038/s41559-016-0020 | www.nature.com/natecolevol 1 © 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. SUPPLEMENTARY INFORMATION

Database S7. List of presence and absence of and genes examined for Otophysi
Database S8. Gene sequence annotation using Blast2Go. GO = Gene ontology
Database S9. Summary of gene properties and selection criteria for subset assembly (selected loci for each subset given in bold)
Database S10. Gene and multi-locus trees for Otophysi
Database S11. GGI tutorial dataset
Database S12. GGI trees (Otophysi) and species tree analyses using STAR and ASTRAL-2
Database S13. GGI trees for metazoans, birds, mammals, and yeast

Supplementary Methods

Pilot study (ray-finned fishes). We ran an initial experiment to test the efficacy of target capture techniques for obtaining genome-level data for a large number of exons across the diversity of fishes. Taxonomic sampling for this experiment consisted of 4 otophysans (one representative per order) and 10 distantly related ray-finned fish species (Table S5; Database S1). Exon markers were selected using the EvolMarkers pipeline¹ to screen the genomes of zebrafish (Danio rerio) and medaka (Oryzias latipes) via pairwise BLAST searches, identifying single-copy, slowly evolving exon regions according to a 50% dissimilarity criterion. This method defined a set of 3,957 putatively orthologous exons longer than 200 bp. Custom biotinylated RNA bait libraries for the pilot study were designed from zebrafish sequences following the specifications of the MYbaits target enrichment system (http://www.mycroarray.com). A total of 15,376 oligonucleotide baits (120 bp long) tiling over the 3,957 exon sequences at 2x density were designed using the py_tiler.py script (https://github.com/faircloth-lab/uce-probe-design)².

DNA extractions were performed using DNeasy extraction kits (Qiagen, Inc.), and the resulting genomic DNA was sheared to a target size of 500 bp using a Covaris sonicator. Library preparation and indexing followed standard Illumina protocols. Target enrichment (TE) was conducted using a modification of the Mamanova et al. protocol³, as optimized by Faircloth et al.² Indexed libraries enriched for the 3,957 exons were pooled and sequenced in two lanes of Illumina HiScan.

Reads were assembled into contigs with a quasi de novo approach⁴ using custom scripts (Database S2) that perform the following steps. First, reads from each of the 14 species were mapped against the reference zebrafish genome using GNUMAP v3.0.2⁵, setting similarity (a), genome mer-size (m), and matching seed hashes (k) to 0.55, 7, and 5, respectively. Matching reads and their pair-mates (FASTQ files) were collected separately for each species and locus. Second, each set of collected reads was assembled separately using the de novo algorithm with pair-mate mode implemented in Trinity⁶. The number of contigs per locus obtained for each species varied from zero to four. Third, pairwise alignments against the reference were conducted for the contigs from each locus separately using BLAST searches. If multiple contigs were retrieved for a single locus with a similarity value <97%, they were flagged as potential paralogs and discarded from downstream analyses; if two assembled contigs had a similarity value >97%, presumably representing allelic variants, they were subsequently collapsed using IUPAC ambiguity


codes. Finally, the coding region for each orthologous exon was extracted from the assembled contigs for all species and subsequently aligned using MAFFT v7.023⁷ (Table S5).

For 279 loci, assemblies produced more than one contig with <97% similarity for at least one species; these loci were therefore removed from further analyses. In addition, 2,637 exons were discarded because they were captured with low efficiency and either had extremely low sequencing coverage or were successfully assembled in fewer than 6 species. The final dataset consisted of 1,041 single-copy genes (288,581 sites). To test the efficacy of these markers in resolving ray-finned fish phylogeny, we conducted two analyses: first, a Maximum Likelihood (ML) analysis of the concatenated dataset in RAxML v8.2.6⁸ based on the GTRGAMMA model; second, a summary multi-species coalescent analysis in ASTRAL-2⁹, using as input the 1,041 gene trees obtained with RAxML (Database S3). Both methods converged on a single well-supported topology that is highly congruent with previous phylogenetic hypotheses for fishes¹⁰,¹¹ (Fig. S5). This preferred set of 1,041 single-copy exons was used to create a new capture library customized for otophysan taxa.

Supplementary Notes

Testing the assumption of subclade monophyly of individual gene genealogies. The main goal of the GGI approach is to address a systematic question when a number of well-supported clades of undisputed monophyly have unresolved relationships (in our case, the relationships among otophysan orders/suborders). The assumption of monophyly for these "undisputed" clades is based on organismal phylogenies supported by several lines of evidence and is more likely to be met for ancient divergences spanning tens or hundreds of millions of years of evolution (e.g., Otophysi, and the other groups analyzed here) than for shallow species divergences (e.g., humans, gorillas and chimpanzees)¹². The question is whether, in these cases, forcing the monophyly of organismal clades onto individual gene trees is acceptable or reasonable. Rosenberg¹³ calculated that under a neutral coalescent model the vast majority of genes in a genome (99.99–99.999%) require 5.3–8.3 coalescence time units to achieve monophyly (for either one species or two sister species). While it is theoretically possible to estimate branch lengths in coalescent time units directly using multi-species coalescent approaches (e.g., ASTRAL-2), in practice these can be severely underestimated due to high levels of gene tree error¹⁴, which is evidently the case with our dataset (Figs. S1, S2). Thus, to test whether this assumption is met for all major otophysan groups (i.e., cypriniforms, gymnotiforms, siluriforms, characoids and citharinoids; Fig. 1), we first estimated the length of the branches subtending these clades in millions of years using fossil-calibrated phylogenies. A time-calibrated version of our tree and trees published by others¹⁵,¹⁶ show that the stem lineages subtending these clades span 20–65 million years (MY) (Table S7). With these branch lengths in MY (T), it is possible to estimate coalescent time units, given by T/(2 × Ne × generation time) for diploid species, where Ne is the effective population size¹². Assuming Ne = 100,000 and a generation time of 5 years, the minimum estimated length in coalescent units for all otophysan stem lineages is 20–65 (see also the range of estimates in Table S8). Note that these parameters are conservatively estimated. For instance, freshwater fishes are


typically confined to small geographic regions with restricted breeding, and their population sizes are likely smaller¹⁷. Furthermore, the mean age at maturation for several freshwater fishes is estimated at 2.5 years (North American species)¹⁸. These estimates of coalescent units nonetheless suggest that the assumption of subclade monophyly is met in this case for the vast majority of genes in the otophysan dataset (Table S8). It is possible that, for some genes, allelic polymorphisms are maintained by selection for much longer periods of time, resulting in deeper coalescence events than those predicted under the neutral model. Nevertheless, as discussed in the main text, it seems unlikely that those isolated instances of deep coalescence would introduce systematic bias into our GGI procedure.

Fig. S1 (right) shows that individual (unconstrained) gene trees support the monophyly of these undisputed clades with very low frequency. But Fig. S2 also shows that gene tree error is rampant (hence our initial motivation to devise GGI). Because the signal supporting the monophyly of each subclade is strong (morphological, mitochondrial, multi-locus, genomic) and the coalescent estimates above indicate ample time for gene trees to achieve subclade monophyly, the failure of gene trees to resolve these subclades in most cases is likely due to a combination of poor signal and systematic error, as we discuss in the main text, rather than to incomplete lineage sorting (ILS).

In addition to Otophysi, Table S8 presents estimates of branch lengths in coalescent time units for subclades in the bird and mammal datasets (no fossil-calibrated trees were found for metazoans and yeasts).
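As a worked illustration of the conversion from branch length in years to coalescent time units, T/(2 × Ne × generation time), the following minimal Python sketch may help. The function and variable names are hypothetical and not part of the authors' scripts; the input values (stem branches of 20–65 MY, Ne = 100,000, generation times of 5 and 2.5 years) and the 5.3–8.3 unit thresholds are those quoted in the text.

```python
def coalescent_units(t_years: float, ne: float, generation_time_years: float) -> float:
    """Branch length in coalescent units for a diploid species: T / (2 * Ne * g)."""
    if ne <= 0 or generation_time_years <= 0:
        raise ValueError("Ne and generation time must be positive")
    return t_years / (2.0 * ne * generation_time_years)


# Values quoted in the text (not new estimates).
STEM_BRANCHES_YEARS = (20e6, 65e6)      # shortest and longest stem lineages
NE = 100_000                            # assumed effective population size
MONOPHYLY_THRESHOLDS = (5.3, 8.3)       # coalescent units needed for monophyly

for generation_time in (5.0, 2.5):      # years; 2.5 is the mean age at maturation cited
    lengths = [coalescent_units(t, NE, generation_time) for t in STEM_BRANCHES_YEARS]
    exceeds = min(lengths) > max(MONOPHYLY_THRESHOLDS)
    print(f"g = {generation_time} y: {lengths[0]:.1f} to {lengths[1]:.1f} units; "
          f"above the 8.3-unit threshold: {exceeds}")
```

With a generation time of 5 years the sketch reproduces the 20–65 units given above; the shorter generation time doubles both values, which is why the 5-year figure is the more conservative one.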
Note that although some of these estimates fall below the minimum thresholds for genome-wide monophyly (i.e., 5.3–8.3 units¹³), modified GGI-based coalescent analyses that use a combination of unconstrained and constrained gene trees obtained results identical to those using the constrained-only version (see below and Figs. 5, S6, S7).

GGI analyses based on a mixture of constrained and unconstrained tree searches. During early stages of our study, an anonymous reviewer proposed a different approach for GGI that relaxes the constraints of subclade monophyly applied to each gene tree. The proposal involved conducting AU tests to compare the constrained gene tree topologies (shown in Fig. 1) against the unconstrained ML tree for each gene. If the unconstrained ML tree is not significantly better than the second-ranked constrained tree according to the AU test, then it is reasonable and justified to take the constrained tree as a better estimate of the gene tree. But in cases where the unconstrained ML tree is significantly better than all constrained trees, the unconstrained tree should be selected and used for species tree analyses. For the Otophysi dataset, that implies using 16 rather than 15 gene tree searches for each AU test (15 constrained trees and one unconstrained tree).

We performed such tests using both DNA and protein gene alignments and found that for 60–70% of the genes in the Otophysi dataset the unconstrained tree is significantly better than the second-ranked constrained tree (Fig. S6). This result is, however, neither unexpected (see Figs. S1, S2) nor entirely relevant for the GGI approach, since we are interested in knowing which of the 15 plausible hypotheses is a better fit for the data in each gene alignment. For example, assume that for a given gene history deep coalescences result in gymnotiform paraphyly (i.e., some gymnotiforms are closer to siluriforms than to other gymnotiforms), but the monophyly of all other clades in the H0 tree is still supported (i.e., citharinoids, characoids, characiforms, siluriforms,


gymnotiforms + siluriforms). In this case, while the unconstrained tree is expected to be significantly better than any of the constrained trees due to gymnotiform paraphyly, the predefined genealogy to be favored would still be H0 (i.e., as in a model-fitting framework). Several factors other than ILS that are associated with gene tree estimation error may lead to significant incongruence between an unconstrained ML gene tree and each of the reasonable hypotheses represented by the 15 constraints (e.g., model misspecification, systematic biases, horizontal gene transfer, hidden paralogy¹⁹), but understanding the forces behind this incongruence is not the main goal of this procedure. Instead, we are attempting to identify the models (plausible evolutionary histories) that best fit each gene partition by ranking them according to the AU test. In addition, we emphasize that the coalescent time unit estimates shown above suggest that all otophysan subclades should be monophyletic for the vast majority of gene trees.

Notably, the overall results obtained with GGI are robust to the use of unconstrained ML gene trees (Fig. S6). We find that although most unconstrained ML gene trees are statistically better than the rank 2 constrained trees (714 for DNA and 594 for proteins; P = 0.05), analyses including these unconstrained ML trees still recover the result obtained with constrained searches alone (Figs. 2A–B): H0 is supported much more often than Ha10, the second most frequent topology (260 vs. 25 for DNA, and 275 vs. 32 for proteins; Fig. S6). Furthermore, the GGI-based species tree analyses using a combination of constrained and significantly better unconstrained ML gene trees as input for ASTRAL-2 support the H0 tree for both DNA and protein alignments (Fig. S6).
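The selection rule for the mixed analysis can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: it assumes that, for each gene, an AU test comparing the unconstrained ML tree with the constrained trees has already been run elsewhere, and that the P-value of each constrained topology is available. All function names, labels and numbers in the toy input are invented for the example.

```python
from collections import Counter
from typing import Dict

ALPHA = 0.05  # significance threshold quoted in the text (P = 0.05)


def select_gene_tree(au_pvalues: Dict[str, float], alpha: float = ALPHA) -> str:
    """Pick the tree to pass on to the species tree analysis for one gene.

    au_pvalues maps each constrained hypothesis label (e.g. "H0", "Ha10") to
    its AU-test P-value in a comparison that includes the unconstrained tree.
    If even the best constrained tree is rejected, the unconstrained tree is
    significantly better than all constraints and is used instead.
    """
    if not au_pvalues:
        raise ValueError("no constrained trees supplied")
    best_label = max(au_pvalues, key=au_pvalues.get)
    if au_pvalues[best_label] < alpha:
        return "unconstrained"
    return best_label


def tally_selected(genes: Dict[str, Dict[str, float]], alpha: float = ALPHA) -> Counter:
    """Count how often each topology is selected across genes."""
    return Counter(select_gene_tree(pvals, alpha) for pvals in genes.values())


# Toy input with invented P-values for three genes and three hypotheses.
toy_genes = {
    "gene1": {"H0": 0.62, "Ha10": 0.21, "Ha3": 0.01},
    "gene2": {"H0": 0.01, "Ha10": 0.003, "Ha3": 0.002},
    "gene3": {"H0": 0.08, "Ha10": 0.40, "Ha3": 0.04},
}
print(tally_selected(toy_genes).most_common())
```

In this sketch a gene contributes its best-ranked constrained tree unless every constraint is rejected at the chosen threshold, which mirrors the mixture of constrained and significantly better unconstrained trees used as ASTRAL-2 input.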
In addition to otophysans, the AU tests comparing unconstrained and constrained ML gene trees find similar results for metazoans, with the GGI-based ASTRAL-2 tree supporting the Ctenophora-sister hypothesis (Fig. S7A). Similar results were also obtained for mammals and yeasts, except that in these two cases the number of unconstrained trees that are significantly better is much smaller (4–8%; Figs. S7B–C). For both datasets, the ASTRAL-2 analyses find the same topologies as those favored by GGI (Figs. 2E–F). While we also analysed the Neoaves dataset using the unconstrained and constrained ML gene trees (Fig. S7D), we did not use these as input for ASTRAL-2 because, as discussed in the paper, this requires testing 105 competing topologies in addition to the unconstrained trees, an analysis beyond the scope of this study. The pattern of support among constrained gene trees for birds (Fig. S7D) is, however, similar to that reported in Fig. 2D.

Taken together, these analyses indicate that our initial results are robust to possible violations of the assumption of subclade monophyly across the vast majority of genes. Even if independent estimates of branch lengths in coalescent time units suggest a high probability of monophyly (Table S8), we advise potential users of GGI to test the robustness of their results using a combination of constrained and unconstrained tree searches as input for summary coalescent methods.

1 Li, C., Riethoven, J. J. & Naylor, G. J. P. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour 12, 967–971 (2012).
2 Faircloth, B. C. et al. Ultraconserved Elements Anchor Thousands of Genetic Markers Spanning Multiple Evolutionary Timescales. Systematic Biology 61, 717–726, doi:10.1093/sysbio/sys004 (2012).


3 Mamanova, L. et al. Target-enrichment strategies for next-generation sequencing. Nat Meth 7, 111–118, doi:http://www.nature.com/nmeth/journal/v7/n2/suppinfo/nmeth.1419_S1.html (2010).
4 Lemmon, A. R., Emme, S. A. & Lemmon, E. M. Anchored Hybrid Enrichment for Massively High-Throughput Phylogenomics. Systematic Biology 61, 727–744, doi:10.1093/sysbio/sys049 (2012).
5 Clement, N. L. et al. The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing. Bioinformatics (Oxford, England) 26, 38–45, doi:10.1093/bioinformatics/btp614 (2010).
6 Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 8, 1494–1512, doi:10.1038/nprot.2013.084 (2013).
7 Katoh, K. & Standley, D. M. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution 30, 772–780, doi:10.1093/molbev/mst010 (2013).
8 Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics (Oxford, England) 30, 1312–1313, doi:10.1093/bioinformatics/btu033 (2014).
9 Mirarab, S. et al. ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics (Oxford, England) 30, i541–548, doi:10.1093/bioinformatics/btu462 (2014).
10 Near, T. J. et al. Resolution of ray-finned fish phylogeny and timing of diversification. Proceedings of the National Academy of Sciences 109, 13698–13703, doi:10.1073/pnas.1206625109 (2012).
11 Betancur-R., R. et al. The tree of life and a new classification of bony fishes. PLoS Currents Tree of Life 2013 Apr 18, doi:10.1371/currents.tol.53ba26640df0ccaee75bb165c8c26288 (2013).
12 Degnan, J. H. & Rosenberg, N. A. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends in Ecology & Evolution 24, 332–340, doi:10.1016/j.tree.2009.01.009 (2009).
13 Rosenberg, N. A. The shapes of neutral gene genealogies in two species: probabilities of monophyly, paraphyly, and polyphyly in a coalescent model. Evolution 57, 1465–1477 (2003).
14 Sayyari, E. & Mirarab, S. Fast Coalescent-Based Computation of Local Branch Support from Quartet Frequencies. Mol Biol Evol 33, 1654–1668, doi:10.1093/molbev/msw079 (2016).
15 Chen, W. J., Lavoue, S. & Mayden, R. L. Evolutionary origin and early biogeography of otophysan fishes (Ostariophysi: Teleostei). Evolution 67, 2218–2239, doi:10.1111/evo.12104 (2013).
16 Nakatani, M., Miya, M., Mabuchi, K., Saitoh, K. & Nishida, M. Evolutionary history of Otophysi (Teleostei), a major clade of the modern freshwater fishes: Pangaean origin and Mesozoic radiation. BMC Evolutionary Biology 11, 177, doi:10.1186/1471-2148-11-177 (2011).


17 Yi, S. & Streelman, J. T. Genome size is negatively correlated with effective population size in ray-finned fish. Trends Genet 21, 643–646, doi:10.1016/j.tig.2005.09.003 (2005).
18 Mims, M. C., Olden, J. D., Shattuck, Z. R. & Poff, N. L. Life history trait diversity of native freshwater fishes in North America. Ecology of Freshwater Fish 19, 390–400, doi:10.1111/j.1600-0633.2010.00422.x (2010).
19 Maddison, W. P. Gene Trees in Species Trees. Systematic Biology 46, 523–536 (1997).
20 Fink, S. V. & Fink, W. L. Interrelationships of the Ostariophysan Fishes (Teleostei). Zoological Journal of the Linnean Society 72, 297–353 (1981).
21 Dimmick, W. W. & Larson, A. A molecular and morphological perspective on the phylogenetic relationships of the otophysan fishes. Molecular Phylogenetics and Evolution 6, 120–133 (1996).
22 Alves-Gomes, J. A. in Gonorynchiformes and Ostariophysan relationships (eds T. Grande, F. J. Poyato-Ariza, & R. Diogo) (Science Publishers, 2010).
23 Saitoh, K., Miya, M., Inoue, J. G., Ishiguro, N. B. & Nishida, M. Mitochondrial genomics of ostariophysan fishes: Perspectives on phylogeny and biogeography. Journal of Molecular Evolution 56, 464–472, doi:10.1007/s00239-002-2417-y (2003).
24 Lavoue, S. et al. Molecular systematics of the gonorynchiform fishes (Teleostei) based on whole mitogenome sequences: Implications for higher-level relationships within the Otocephala. Molecular Phylogenetics and Evolution 37, 165–177, doi:10.1016/j.ympev.2005.03.024 (2005).
25 Peng, Z., Diogo, R. & He, S. Teleost fishes (Teleostei). The timetree of life, 335–337 (2009).
26 Chakrabarty, P., McMahan, C., Fink, W., Stiassny, M. L. & Alfaro, M. in ASIH – American Society of Ichthyologists and Herpetologists. (eds M. L. Crump & M. A. Donnelly).
27 Betancur, R. R., Orti, G. & Pyron, R. A. Fossil-based comparative analyses reveal ancient marine ancestry erased by extinction in ray-finned fishes. Ecol Lett 18, 441–450, doi:10.1111/ele.12423 (2015).
28 Meredith, R. W. et al. Impacts of the Cretaceous Terrestrial Revolution and KPg Extinction on Mammal Diversification. Science 334, 521–524, doi:10.1126/science.1211028 (2011).
29 Hedges, S. B., Marin, J., Suleski, M., Paymer, M. & Kumar, S. Tree of Life Reveals Clock-Like Speciation and Diversification. Mol Biol Evol 2015, 835–845 (2015).
30 Martin, A. P. & Palumbi, S. R. Body size, metabolic rate, generation time, and the molecular clock. Proc. Natl. Acad. Sci. USA, 4087–4091 (1993).
31 Charlesworth, B. Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nat Rev Genet 10, 195–205 (2009).
32 Prum, R. O. et al. A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature 526, 569–573, doi:10.1038/nature15697 (2015).
33 Saether, B. E. et al. Generation time and temporal scaling of bird population dynamics. Nature 436, 99–102 (2005).


Figure S1. Resolution and support for undisputed taxonomic groups (orders, suborders and families) that are independent of the main hypotheses tested (Fig. 1) | Results from 45 multi-locus analyses show high levels of concordance across methods in terms of presence/absence of clades (absences denoted by oblique lines) as well as nodal support (grey scale). By contrast, a large proportion of individual gene trees fail to resolve these groups. These comparisons highlight differences in phylogenetic accuracy and congruence between multi-locus analyses and individual gene trees. Twelve families represented by single individuals in our dataset were not tested.


Figure S2. Assessments of phylogenetic precision for multi-locus trees and individual gene trees | a, dispersion in multidimensional tree space based on unweighted Robinson-Foulds distances; b, plots of mean support values across all nodes for each tree obtained with different methods (ExaBayes: posterior probabilities; all other analyses: bootstrap values).


[Figure S3 image: species tree with tip labels for the sampled otophysan species. Labelled clades: Cypriniformes, Citharinoidei, Gymnotiformes, Siluriformes, Characoidei. Annotated support: GGI-ST 100%.]

Figure S3. GGI-based species trees for Otophysi using ASTRAL-2 (complete dataset).


Figure S4. Flowchart of experimental design and methodological approaches used.


Figure S5. Phylogenomic tree of selected ray-finned fish from pilot study | Both phylogenetic methods conducted (concatenation in RAxML; species tree in ASTRAL-2) resulted in the same topology, with high bootstrap support (black circles 100% BS; grey circles 75–85% BS).


Figure S6. Testing the assumption of subclade monophyly under GGI, using a combination of significantly better unconstrained and constrained gene trees for Otophysi | Lines represent the cumulative number of genes with phylogenetic signal (X-axis) rejecting or accepting any of the 15 alternative topologies with highest probability in favour of the unconstrained or a constrained ML tree and their associated P-values (Y-axis) according to the approximately unbiased (AU) topology test. Coalescent species tree analyses used both constrained and significantly better unconstrained gene trees as input for ASTRAL-2 (*Table S1).


Figure S7. Testing the assumption of subclade monophyly under GGI, using a combination of significantly better unconstrained and constrained gene trees for other groups | Lines represent the cumulative number of genes with phylogenetic signal (X-axis) rejecting or accepting any of the 15 alternative topologies with highest probability in favour of the unconstrained or a constrained ML tree and their associated P-values (Y-axis) according to the approximately unbiased (AU) topology test. Coalescent species-tree analyses used both constrained and significantly better unconstrained gene trees as input for ASTRAL-2. a, metazoans; b, eutherian mammals (treeshrew); c, yeast phylogeny; d, Neoaves (mousebird).


Table S1. Data used by previous studies addressing interrelationships among major lineages in Otophysi.

| Study | Code | Markers | Length (bp/chrs) | Number of species |
|---|---|---|---|---|
| Fink and Fink²⁰ | FF81 | Morphology | 127 | |
| Dimmick and Larson²¹ | DL96 | 1 nuclear gene and mt rDNA | 2,477 | 9 |
| Alves-Gomes²² | AG10 | mt rDNA | 701 | 60 |
| Saitoh et al.²³ | SA03 | mt genome | 8,096 | 13 |
| Near et al.¹⁰ | NE12 | 10 nuclear genes | 7,587 | 16 |
| Lavoue et al.²⁴ | LA05 | mt genome | 10,395 | 13 |
| Peng et al.²⁵ | PO09 | mt genome | 6,198 | 17 |
| Betancur-R et al.¹¹ | BE13 | 20 nuclear genes and mt rDNA | 20,853 | 98 |
| Chakrabarty et al.²⁶ | CK13 | 351 UCEs | 133,424 | 35 |
| Nakatani et al.¹⁶ | NA11 | 10 nuclear genes and mt genome | 17,918 | 66 |
| Chen et al.¹⁵ | CH13 | 5 nuclear genes | 4,518 | 95 |

mt = mitochondrial


Table S2. Description of the 45 multi-locus datasets analysed, including information on number of loci, total length of the alignments (sites), and number of parsimony informative sites.

| Analyses | Method | DNA loci | DNA sites | DNA Parsim-IS | Protein loci | Protein sites | Protein Parsim-IS |
|---|---|---|---|---|---|---|---|
| Concatenation | RAxML | 1051 | 279,012 | 123,714 | 1051 | 93,457 | 27,055 |
| Concatenation | ExaBayes | 1051 | 279,012 | 123,714 | 1051 | 93,457 | 27,055 |
| Concatenation | FastTree | 1051 | 279,012 | 123,714 | 1051 | 93,457 | 27,055 |
| Concatenation | TNT | 1051 | 279,012 | 123,714 | 1051 | 93,457 | 27,055 |
| Concatenation | High bootstrap | 200 | 65,709 | 31,189 | 200 | 20,954 | 8,987 |
| Concatenation | Congruence | 210 | 61,914 | 28,065 | 210 | 21,154 | 8,394 |
| Concatenation | Conserved | 200 | 48,348 | 17,660 | 200 | 15,791 | 1,487 |
| Concatenation | SL > 350*/100** sites | 205 | 60,225 | 27,446 | 205 | 28,907 | 10,026 |
| Concatenation | At least 200 spp. | 231 | 68,682 | 30,657 | 231 | 22,441 | 5,965 |
| Concatenation | Li et al. (2013) | 243 | 60,147 | 26,595 | 243 | 20,049 | 5,106 |
| Concatenation | Inoue et al. (2015) | 175 | 44,559 | 19,651 | 175 | 14,853 | 3,878 |
| Concatenation | Stationarity (Low DI) | 200 | 46,677 | 20,118 | n/a | n/a | n/a |
| Concatenation | AT richness | 200 | 52,809 | 23,763 | n/a | n/a | n/a |
| Species trees | ASTRAL-2 | 1051 | 279,012 | 123,714 | 1051 | 93,457 | 27,055 |
| Species trees | STAR | 1050 | 279,012 | 123,714 | 1051 | 93,457 | 27,055 |
| Species trees | NJst | 1050 | 279,012 | 123,714 | 1051 | 93,457 | 27,055 |
| Species trees | High bootstrap | 200 | 65,709 | 31,189 | 200 | 20,954 | 8,987 |
| Species trees | Congruence | 210 | 61,914 | 28,065 | 210 | 21,154 | 8,394 |
| Species trees | Conserved | 200 | 48,348 | 17,660 | 200 | 15,791 | 1,487 |
| Species trees | SL > 350*/100** sites | 205 | 60,225 | 27,446 | 205 | 28,907 | 10,026 |
| Species trees | At least 200 spp. | 231 | 68,682 | 30,657 | 231 | 22,441 | 5,965 |
| Species trees | Li et al. (2013) | 243 | 60,147 | 26,595 | 243 | 20,049 | 5,106 |
| Species trees | Inoue et al. (2015) | 175 | 44,559 | 19,651 | 175 | 14,853 | 3,878 |
| Species trees | Stationarity (Low DI) | 200 | 46,677 | 20,118 | n/a | n/a | n/a |
| Species trees | AT richness | 200 | 52,809 | 23,763 | n/a | n/a | n/a |

SL > = sequence length greater than 350 sites (*DNA alignments) or 96 sites (**protein alignments); Parsim-IS = parsimony informative sites; DI = disparity index; n/a = not applicable


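Table S3. Summary of the results using gene genealogy interrogation (GGI) on the five datasets analysed.

Otophysi

| Hypothesis tested | DNA: number of rank 1 gene trees | DNA: number of rank 1 gene trees that are also significant | PROT: number of rank 1 gene trees | PROT: number of rank 1 gene trees that are also significant |
|---|---|---|---|---|
| H0 (FF81*) | 459 | 327 | 314 | 197 |
| Ha01 | 31 | 0 | 57 | 1 |
| Ha02 | 44 | 0 | 51 | 4 |
| Ha03 | 40 | 0 | 47 | 9 |
| Ha04 | 40 | 0 | 56 | 0 |
| Ha05 | 51 | 0 | 86 | 5 |
| Ha06 | 44 | 1 | 72 | 5 |
| Ha07 (SA03*) | 22 | 0 | 56 | 4 |
| Ha08 (LA05*) | 0 | 0 | 0 | 0 |
| Ha09 (CK13) | 33 | 0 | 55 | 6 |
| Ha10 (NA11*) | 174 | 69 | 146 | 39 |
| Ha11 | 39 | 0 | 45 | 2 |
| Ha12 | 25 | 0 | 22 | 1 |
| Ha13 | 24 | 0 | 23 | 2 |
| Ha14 | 25 | 0 | 21 | 0 |

Other datasets

| Dataset | Hypothesis tested | Number of rank 1 gene trees | Number of rank 1 gene trees that are also significant |
|---|---|---|---|
| Metazoans | Porifera-sister | 38 | 0 |
| Metazoans | Ctenophora-sister | 134 | 7 |
| Metazoans | Porifera + Ctenophora | 37 | 1 |
| Yeasts | Saccharomyces castellii + Saccharomyces s.s | 426 | 38 |
| Yeasts | Candida glabrata + Saccharomyces s.s | 332 | 24 |
| Yeasts | Saccharomyces castellii + Candida glabrata | 312 | 17 |
| Eutherian mammals | Sister to Rodentia + Lagomorpha + Primates | 97 | 1 |
| Eutherian mammals | Sister to Primates | 155 | 3 |
| Eutherian mammals | Sister to Rodentia + Lagomorpha | 162 | 7 |
| Neoavian birds | Sister to other Afroaves | 12 | 9 |
| Neoavian birds | Sister to Coraciimorphae + owl | 4 | 0 |
| Neoavian birds | Sister to owl | 35 | 2 |
| Neoavian birds | Sister to Coraciimorphae 1 | 31 | 1 |
| Neoavian birds | Sister to other landbirds | 66 | 2 |
| Neoavian birds | Sister to Accipitrimorphae | 31 | 1 |
| Neoavian birds | Sister to Australaves | 25 | 0 |
| Neoavian birds | Sister to Coraciimorphae 2 | 55 | 1 |

* Hypotheses resolved by more than one previous study; s.s = sensu stricto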
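The tallies reported in Table S3 can be illustrated with a minimal sketch. It assumes, hypothetically, that each gene comes with a dictionary of AU-test P-values, one per constrained topology. The topology with the highest P-value is treated as rank 1, and it is counted as significant when every competing topology is rejected at P < 0.05. The function name, the input layout, and the 0.05 threshold are illustrative assumptions, not the authors' scripts.

```python
from collections import Counter

ALPHA = 0.05  # assumed rejection threshold for the AU test


def tally_rank1(au_pvalues, alpha=ALPHA):
    """Count rank-1 topologies per hypothesis.

    au_pvalues: dict mapping gene name -> {hypothesis: AU P-value}.
    Returns two Counters: (rank1, rank1_significant).
    """
    rank1 = Counter()
    rank1_sig = Counter()
    for gene, pvals in au_pvalues.items():
        # Rank hypotheses by P-value, highest first.
        ranked = sorted(pvals, key=pvals.get, reverse=True)
        best = ranked[0]
        rank1[best] += 1
        # Significant only if all alternatives are rejected.
        if all(pvals[h] < alpha for h in ranked[1:]):
            rank1_sig[best] += 1
    return rank1, rank1_sig


# Toy input with made-up P-values (not data from the study).
toy = {
    "gene1": {"H0": 0.97, "Ha01": 0.02, "Ha10": 0.01},
    "gene2": {"H0": 0.60, "Ha01": 0.05, "Ha10": 0.40},
    "gene3": {"H0": 0.03, "Ha01": 0.01, "Ha10": 0.99},
}
rank1, rank1_sig = tally_rank1(toy)
print(dict(rank1))
print(dict(rank1_sig))
```

In the toy input, gene2 supports H0 as rank 1 but is not counted as significant, because one alternative is not rejected.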


Table S4. General properties of all datasets analysed to test the generality of the Gene Genealogy Interrogation approach.

| Dataset description | Number of genes | Number of taxa | Number of hypotheses | Number of sites |
|---|---|---|---|---|
| Otophysi | 1,051 | 225 | 15 | 279,012 |
| Metazoa | 209 | 76 | 3 | 60,768 |
| Birds (mousebird) | 259 | 90 | 8* | 390,000 |
| Mammals (tree shrew) | 413 | 18 | 3 | 139,600 |
| Yeast | 1,070 | 23 | 3 | 20,289 |

* Only 8 (out of 105) possible unrooted trees were
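The hypothesis counts in Table S4 (3, 15, and the 105 possible trees mentioned in the footnote) match the number of unrooted, fully bifurcating topologies for four, five, and six terminal lineages under the standard (2n − 5)!! formula. A minimal sketch of that count follows. The function name is hypothetical, and reading each dataset as four, five, or six lineages is an inference from the numbers, not something the table states.

```python
def n_unrooted_topologies(n_tips):
    """Number of unrooted, fully bifurcating topologies for n_tips labelled tips.

    Uses the double-factorial formula (2n - 5)!! for n >= 3.
    """
    if n_tips < 3:
        raise ValueError("need at least 3 tips")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product for n = 3).
    for k in range(3, 2 * n_tips - 4, 2):
        count *= k
    return count


for n in (4, 5, 6):
    print(n, n_unrooted_topologies(n))
```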


Table S5. List of specimens examined and catalog numbers.

| Order | Family | Species | Catalog Number |
|---|---|---|---|
| Characiformes | Acestrorhynchidae | Acestrorhynchus microlepis | AUM 3537 |
| Characiformes | Alestidae | Alestopetersius hilgendorfi | AMNH 034-3321 |
| Characiformes | Alestidae | Bathyaethiops greeni | AMNH 024-2395 |
| Characiformes | Alestidae | Brycinus grandisquamis | AMNH 115-11429 |
| Characiformes | Alestidae | Brycinus poptae | AMNH 025-2455 |
| Characiformes | Alestidae | Bryconalestes longipinnis | AUM 5548 |
| Characiformes | Alestidae | Alestopetersius tumbensis | AMNH 052-5189 |
| Characiformes | Alestidae | Hydrocynus goliath | AMNH 022-2138 |
| Characiformes | Alestidae | Hydrocynus vittatus | AMNH 075-7445 |
| Characiformes | Alestidae | Micralestes acutidens | AUM 5674 |
| Characiformes | Alestidae | Rhabdalestes septentrionalis | AMNH 113-11282 |
| Characiformes | Anostomidae | Abramites hypselonotus | GO 77 |
| Characiformes | Anostomidae | Anostomus ternetzi | AUM 3075 |
| Characiformes | Anostomidae | Laemolyta proxima | AUM 3059 |
| Characiformes | Anostomidae | Leporinus agassizii | AUM 3060 |
| Characiformes | Anostomidae | Leporinus brunneus | AUM 3071 |
| Characiformes | Anostomidae | Leporinus ortomaculatus | AUM 3065 |
| Characiformes | Anostomidae | Leporinus striatus | AUM 3064 |
| Characiformes | Anostomidae | Petulanos spiloclistron | AUM 3057 |
| Characiformes | Anostomidae | Pseudanos winterbottomi | AUM 3084 |
| Characiformes | Anostomidae | Schizodon vittatus | AMNH 075-7445 |
| Characiformes | Anostomidae | Synaptolaemus latofasciatus | AUM 3064 |
| Characiformes | Chalceidae | Chalceus epakros | AUM 3601 |
| Characiformes | Characidae | Acestrocephalus sardina | AUM 3532 |
| Characiformes | Characidae | Acrobrycon ipanquianus | ANSP 180776 |
| Characiformes | Characidae | Aphyocharax dentatus | UFRGS13585 TEC1707A |
| Characiformes | Characidae | Jupiaba abramoides | AUM 4709 |
| Characiformes | Characidae | Brachychalcinus orbicularis | AUM 3056 |
| Characiformes | Characidae | Piabarchus stramineus | UFRGS12898 TEC1049 |
| Characiformes | Characidae | Ceratobranchia obtusirostris | AUM 4046 |
| Characiformes | Characidae | Paracheirodon axelrodi | GO 213 |
| Characiformes | Characidae | Corynopoma riisei | ROM 8900 |
| Characiformes | Characidae | Creagrutus beni | UFRGS 12811 TEC1521 |
| Characiformes | Characidae | Cyanocharax uruguayensis | UFRGS 10692 TEC393B |
| Characiformes | Characidae | Exodon paradoxus | AUM 3536 |
| Characiformes | Characidae | Glandulocauda melanopleura | UFRGS 12885 TEC831 |
| Characiformes | Characidae | Hemibrycon surinamensis | AUM 4711 |
| Characiformes | Characidae | Hollandichthys multifasciatus | UFRGS 11792 TEC 841A |
| Characiformes | Characidae | Odontostilbe pequira | AUM 4047 |
| Characiformes | Characidae | Hyphessobrycon sovichthys | AUM 4799 |
| Characiformes | Characidae | Bryconamericus leptorhynchus | UFRGS1812B |
| Characiformes | Characidae | Jupiaba meunieri | AUM 4688 |
| Characiformes | Characidae | Knodus smithi | MUSM 35458 |
| Characiformes | Characidae | Lepidocharax burnsi | UFRGS 12886 TEC1018 |
| Characiformes | Characidae | Leptagoniates steindachneri | GO 123 |
| Characiformes | Characidae | Markiana nigripinnis | UFRGS 1160A |
| Characiformes | Characidae | Mimagoniates rheocharis | UFRGS 12896 911 |
| Characiformes | Characidae | Bryconamericus lethostigmus | UFRGS12537 TEC1239 |
| Characiformes | Characidae | Piabina argentea | UFRGS 11373 TEC1211 |
| Characiformes | Characidae | Prionobrama filigera | ANSP 180765 |
| Characiformes | Characidae | Prodontocharax alleni | ANSP 182951 |
| Characiformes | Characidae | Pseudocorypoma doriae | UFRGS 12389 TEC693A |
| Characiformes | Characidae | Roeboexodon guyanensis | AUM 3605 |
| Characiformes | Characidae | Roeboides descalvadensis | UFRGS13496 TEC1618B |
| Characiformes | Characidae | Scopaeocharax sp | UFRGS 10692 TEC393B |
| Characiformes | Characidae | Tetragonopterus chalceus | AUM 3599 |
| Characiformes | Chilodontidae | Caenotropus labyrinthicus | AUM 3096 |
| Characiformes | Citharinidae | Citharinus congicus | AMNH 081-8014 |
| Characiformes | Citharinidae | Citharinus gibbosus | AMNH 102-10177 |
| Characiformes | Crenuchidae | Characidium sp | AUM 4777 |
| Characiformes | Crenuchidae | Characidium purpuratum | AUM 4061 |
| Characiformes | Crenuchidae | Characidium zebra | AUM 4856 |
| Characiformes | Ctenoluciidae | Boulengerella lateristriga | AUM 3257 |
| Characiformes | Curimatidae | Curimata roseni | ANSP 189094 |
| Characiformes | Curimatidae | Curimatopsis crypticus | ANSP 189091 |
| Characiformes | Curimatidae | Cyphocharax abramoides | AUM 4709 |
| Characiformes | Curimatidae | Psectrogaster ciliata | ANSP 189093 |
| Characiformes | Cynodontidae | Cynodon gibbus | AUM 3244 |
| Characiformes | Cynodontidae | Hydrolycus armatus | AUM 3529 |
| Characiformes | Distichodontidae | Belonophago hutsebouti | AMNH 025-2468 |
| Characiformes | Distichodontidae | Distichodus decemmaculatus | AMNH 026-2531 |
| Characiformes | Distichodontidae | Distichodus fasciolatus | AUM 5670 |
| Characiformes | Distichodontidae | Distichodus fasciolatus | AMNH 069-6958 |
| Characiformes | Distichodontidae | Eugnathichthys macroterolepis | AMNH 097-9609 |
| Characiformes | Distichodontidae | Nannocharax uniocellatus | AMNH 081-8052 |
| Characiformes | Distichodontidae | Ichthyborus ornatus | AMNH 034-3337 |
| Characiformes | Distichodontidae | Mesoborus crocodilus | AMNH 097-9623 |
| Characiformes | Distichodontidae | Nannocharax taenia | AMNH 108-10751 |
| Characiformes | Distichodontidae | Neolebias ansorgii | GO 48 |
| Characiformes | Distichodontidae | Phago boulengeri | AMNH 102-10181 |
| Characiformes | Distichodontidae | Xenocharax spilurus | AMNH 088-8739 |
| Characiformes | Erythrinidae | Hoplerythrinus unitaeniatus | AUM 4875 |
| Characiformes | Erythrinidae | Hoplias aimara | AUM 4938 |
| Characiformes | Gasteropelecidae | Carnegiella marthae | AUM3958 |
| Characiformes | Gasteropelecidae | Thoracocharax stellatus | AUM 4002 |
| Characiformes | Hemiodontidae | Bivibranchia fowleri | AUM 3245 |
| Characiformes | Hemiodontidae | Hemiodus argenteus | AUM 3250 |
| Characiformes | Hemiodontidae | Hemiodus thayeria | AUM 3258 |
| Characiformes | Hepsetidae | Hepsetus odoe | AUM 5790 |
| Characiformes | Iguanodectidae | Bryconops melanurus | AUM 4794 |
| Characiformes | Lebiasinidae | Lebiasina aureoguttata | AUM 3719 |
| Characiformes | Lebiasinidae | Lebiasina provenzanoi | AUM 4823 |
| Characiformes | Lebiasinidae | Pyrrhulina filamentosa | AUM 3938 |
| Characiformes | Parodontidae | Apareiodon orinocoensis | AUM 3957 |
| Characiformes | Parodontidae | Parodon guyanensis | AUM 3605 |
| Characiformes | Prochilodontidae | Ichthyoelephas longirostris | ANSP 1305 |
| Characiformes | Prochilodontidae | Prochilodus scrophus | AUM 3735 |
| Characiformes | Prochilodontidae | Semaprochilodus varii | ANSP 187435 |
| Characiformes | Serrasalmidae | Acnodon oligacanthus | GO 401 |
| Characiformes | Serrasalmidae | Colossoma macropomum | GO 332 |
| Characiformes | Serrasalmidae | Myleus pacu | AUM 3042 |
| Characiformes | Serrasalmidae | Myloplus rubripinnis | AUM 3051 |
| Characiformes | Serrasalmidae | Mylossoma sp | AUM 3733 |
| Characiformes | Serrasalmidae | Ossubtus sp | GO 253 |
| Characiformes | Serrasalmidae | Pristobrycon striolatus | GO 400 |
| Characiformes | Serrasalmidae | Pygocentrus nattereri | AUM 3043 |
| Characiformes | Serrasalmidae | Serrasalmus gouldingi | AUM 3045 |
| Characiformes | Serrasalmidae | Serrasalmus manueli | AUM 3054 |
| Characiformes | Serrasalmidae | Serrasalmus rhombeus | AUM 3041 |
| Characiformes | Triportheidae | Triportheus rotundatus | AUM 3556 |
| Cypriniformes | Cobitidae | Acanthocobitis sp | AUM 3795 |
| Cypriniformes | Cobitidae | Acantopsis sp | AUM 5030 |
| Cypriniformes | Cyprinidae | Amblyrhynchichthys micracanthus | AUM 5035 |
| Cypriniformes | Cyprinidae | Incisilabeo behri | AUM 5025 |
| Cypriniformes | Cyprinidae | Barbonymus gonionotus | AUM 5095 |
| Cypriniformes | Cyprinidae | Opsarius tileo | AUM 3793 |
| Cypriniformes | Cyprinidae | Cabdio morar | AUM 3800 |
| Cypriniformes | Cyprinidae | Gibelion catla | AUM 5042 |
| Cypriniformes | Cyprinidae | Crossocheilus latius | AUM 3794 |
| Cypriniformes | Cyprinidae | Esomus danrica | AUM 3760 |
| Cypriniformes | Cyprinidae | Garra fasciacauda | AUM 5009 |
| Cypriniformes | Cyprinidae | Henicorhynchus sp | AUM 5059 |
| Cypriniformes | Cyprinidae | Labiobarbus leptocheilus | AUM 5099 |
| Cypriniformes | Cyprinidae | Macrhybopsis storeriana | AUM 7 |
| Cypriniformes | Cyprinidae | Osteochilus lini | AUM 5040 |
| Cypriniformes | Cyprinidae | Parachela sp | AUM 5066 |
| Cypriniformes | Cyprinidae | Probarbus jullieni | AUM 5023 |
| Cypriniformes | Cyprinidae | Rasbora daniconius | AUM 5111 |
| Cypriniformes | Cyprinidae | Salmostoma phulo | AUM 3779 |
| Cypriniformes | Cyprinidae | Sikukia gudgeri | AUM 5083 |
| Cypriniformes | Paedocypridae | Paedopcypris sp | MT-001 |
| Cypriniformes | Psilorhynchidae | Psilorhynchus sp | AUM 3784 |
| Cypriniformes | Cyprinidae | Danio rerio | GO 23 |
| Gymnotiformes | Apteronotidae | Apternotus sp | AUM 3732 |
| Gymnotiformes | Apteronotidae | Apteronotus albifrons | AUM 3921 |
| Gymnotiformes | Apteronotidae | Sternarchorhynchus stewarti | AUM 3745 |
| Gymnotiformes | Gymnotidae | Electrophorus electricus | AUM 3843 |
| Gymnotiformes | Hypopomidae | Brachyhypopomus brevirostris | AUM 3222 |
| Gymnotiformes | Hypopomidae | Brachyhypopomus sp | AUM 3920 |
| Gymnotiformes | Hypopomidae | Hypopygus lepturus | AUM 3224 |
| Gymnotiformes | Rhamphichthyidae | Gymnorhamphichthys hypostomus | AUM 3202 |
| Gymnotiformes | Rhamphichthyidae | Gymnorhamphichthys rondoni | AUM 3201 |
| Gymnotiformes | Rhamphichthyidae | Rhamphichthys rostratus | AUM 3220 |
| Gymnotiformes | Sternopygidae | Eigenmannia virescens | AUM 3218 |
| Gymnotiformes | Sternopygidae | Rhabdolichops jegui | ANSP 189021 |
| Gymnotiformes | Sternopygidae | Sternopygus macrurus | AUM 3198 |
| Siluriformes | Aspredinidae | Bunocephalus amaurus | AUM 3909 |
| Siluriformes | Aspredinidae | Ernstichthys sp | AUM 3988 |
| Siluriformes | Aspredinidae | Xyliphius kryptos | AUM 4054 |
| Siluriformes | Astroblepidae | Astroblepus sp | AUM 5348 |
| Siluriformes | Auchenipteridae | Auchenipterichthys longimanus | AUM 3321 |
| Siluriformes | Auchenipteridae | Centromochlus heckelii | AUM 3328 |
| Siluriformes | Auchenipteridae | Glanidium leopardum | AUM 4687 |
| Siluriformes | Auchenipteridae | Tatia sp | AUM 3320 |
| Siluriformes | Auchenipteridae | Trachelyichthys sp | AUM 3325 |
| Siluriformes | Auchenipteridae | Trachelyopterichthys taeniatus | AUM 3316 |
| Siluriformes | Auchenipteridae | Trachelyopterus sp | AUM 3330 |
| Siluriformes | Bagridae | Bagrus ubangensis | AUM 5668 |
| Siluriformes | Bagridae | Sperata aor | AUM 3818 |
| Siluriformes | Callichthyidae | Callichthys callichthys | ANSP 179110 |
| Siluriformes | Cetopsidae | Cetopsis plumbea | AUM 4035 |
| Siluriformes | Cetopsidae | Denticetopsis macilenta | AUM 3310 |
| Siluriformes | Cetopsidae | Helogenes marmoratus | AUM 3952 |
| Siluriformes | Clariidae | Channallabes apus | AUM 5737 |
| Siluriformes | Claroteidae | Chrysichthys cranchii | AUM 5676 |
| Siluriformes | Claroteidae | Parauchenoglanis balayi | AUM 5825 |
| Siluriformes | Diplomystidae | Diplomystes nahuelbutaensis | ANSP 180476 |
| Siluriformes | Doradidae | Acanthodoras spinosissimus | AUM 3267 |
| Siluriformes | Doradidae | Doras micropoeus | AUM 3262 |
| Siluriformes | Doradidae | Oxydoras niger | AUM 3301 |
| Siluriformes | Doradidae | Platydoras costatus | AUM 3268 |
| Siluriformes | Doradidae | Rhinodoras armbrusteri | AUM 3298 |
| Siluriformes | Doradidae | Rhynchodoras woodsi | AUM 4038 |
| Siluriformes | Doradidae | Scorpiodoras sp | AUM 3269 |
| Siluriformes | Heptapteridae | Brachyglanis sp | AUM 3179 |
| Siluriformes | Heptapteridae | Cetopsorhamdia insidiosa | AUM 3104 |
| Siluriformes | Heptapteridae | Chasmocranus brevior | AUM 4730 |
| Siluriformes | Heptapteridae | Imparfinis hasemani | AUM 3157 |
| Siluriformes | Heptapteridae | Leptorhamdia marmorata | AUM 4821 |
| Siluriformes | Heptapteridae | Mastiglanis sp | AUM 3122 |
| Siluriformes | Heptapteridae | Pimelodella cristata | AUM 3147 |
| Siluriformes | Heptapteridae | Pimelodella leptosoma | AUM 3106 |
| Siluriformes | Heptapteridae | Rhamdia quelen | AUM 3105 |
| Siluriformes | Ictaluridae | Ameiurus nebulosus | ANSP 185099 |
| Siluriformes | Ictaluridae | Noturus leptacanthus | AUM 37 |
| Siluriformes | Loricariidae | Acanthicus adonis | AUM 5787 |
| Siluriformes | Loricariidae | Ancistrus leucostictus | AUM 3936 |
| Siluriformes | Loricariidae | Chaetostoma breve | AUM 4063 |
| Siluriformes | Loricariidae | Farlowella amazonum | ANSP 199910 |
| Siluriformes | Loricariidae | Cteniloricaria platystoma | AUM 3894 |
| Siluriformes | Loricariidae | Hypostomus taphorni | AUM 4262 |
| Siluriformes | Loricariidae | Lamontichthys filamentosus | AUM 4024 |
| Siluriformes | Loricariidae | Lasiancistrus schomburgkii | AUM 4056 |
| Siluriformes | Loricariidae | Loricaria sp | AUM 3656 |
| Siluriformes | Loricariidae | Peckoltia lineola | AUM 3881 |
| Siluriformes | Loricariidae | Pseudancistrus nigrescens | AUM 3609 |
| Siluriformes | Mochokidae | Chiloglanis occidentalis | AUM 5622 |
| Siluriformes | Mochokidae | Synodontis batesii | AUM 5839 |
| Siluriformes | Nematogenyidae | Nematogenys inermis | ANSP 180477 |
| Siluriformes | Pangasiidae | Helicophagus leptorhynchus | AUM 5067 |
| Siluriformes | Pangasiidae | Pseudolais pleurotaenia | AUM 5052 |
| Siluriformes | Pimelodidae | Megalonema platycephalum | AUM 3103 |
| Siluriformes | Pimelodidae | Phractocephalus hemioliopterus | AUM 3118 |
| Siluriformes | Pimelodidae | Pimelodus albofasciatus | AUM 3100 |
| Siluriformes | Pimelodidae | Platysilurus mucosus | AUM 4019 |
| Siluriformes | Pimelodidae | Pseudoplatystoma fasciatum | AUM 3099 |
| Siluriformes | Pimelodidae | Sorubim elongatus | AUM 3107 |
| Siluriformes | Plotosidae | Plotosus canius | INHS 93712 |
| Siluriformes | Pseudopimelodidae | Batrochoglanis villosus | AUM 3169 |
| Siluriformes | Pseudopimelodidae | Cephalosilurus sp | AUM 3918 |
| Siluriformes | Pseudopimelodidae | Microglanis poecilus | AUM 3173 |
| Siluriformes | Schilbeidae | Clupisoma garua | AUM 3821 |
| Siluriformes | Schilbeidae | Eutropiichthys murius | AUM 3826 |
| Siluriformes | Siluridae | Kryptopterus cryptopterus | AUM 5050 |
| Siluriformes | Siluridae | Phalacronotus bleekeri | AUM 5058 |
| Siluriformes | Siluridae | Wallago attu | AUM 3766 |
| Siluriformes | Sisoridae | Gagata cenia | AUM 3780 |
| Siluriformes | Sisoridae | Gogangra viridescens | AUM 3788 |
| Siluriformes | Trichomycteridae | Haemomaster venezuelae | AUM 4223 |
| Siluriformes | Trichomycteridae | Henonemus sp | AUM 4034 |
| Siluriformes | Trichomycteridae | Ituglanis amazonicus | AUM 4205 |
| Siluriformes | Trichomycteridae | Paracanthopoma parva | AUM 4222 |
| Siluriformes | Trichomycteridae | Pseudostegophilus nemurus | AUM 4023 |
| Siluriformes | Trichomycteridae | Trichomycteridae sp | AUM 4217 |
| Siluriformes | Trichomycteridae | Vandellia sp | AUM 4228 |

AMNH: American Museum of Natural History, New York, USA
ANSP: Academy of Natural Sciences, Philadelphia, Pennsylvania, USA
AUM: Auburn University Natural History Museum, Auburn, Alabama, USA
GO: The George Washington University, Washington, DC, USA
INHS: Illinois Natural History Survey, University of Illinois, Champaign, Illinois, USA
MUSM: Museo de Historia Natural "Javier Prado" de la Universidad Nacional Mayor de San Marcos, Lima, Peru
ROM: Royal Ontario Museum, Toronto, Canada
UFRGS: Universidade Federal do Rio Grande do Sul, Departamento de Zoologia, Brazil


Table S6. Description of species sampled for the pilot study and data processing.

| Family | Species | Total number of reads | Reads mapped to reference | Loci mapped to reference | Single-copy loci (similarity > 97%) |
|---|---|---|---|---|---|
| Adrianichthyidae | Oryzias latipes | 11,284,58 | 416,815 | 669 | 551 |
| Amiidae | Amia calva | 7,958,122 | 516,985 | 707 | 583 |
| Aplocheilidae | Pachypanchax sakaramyi | 3,208,756 | 191,962 | 589 | 465 |
| Apteronotidae | Apteronotus albifrons | 2,506,082 | 558,652 | 1090 | 913 |
| Ariidae | Brustarius solidus | 12,301,320 | 456,444 | 906 | 762 |
| Carangidae | Caranx hippos | 6,049,418 | 346,062 | 820 | 713 |
| Chanidae | Chanos chanos | 3,711,118 | 283,111 | 876 | 724 |
| Characidae | Astyanax mexicanus | 32,037,054 | 1,874,071 | 1486 | 1,270 |
| Clupeidae | Pellona flavipinnis | 14,744,898 | 868,083 | 955 | 797 |
| Cyprinidae | Danio rerio | 27,109,090 | 11,124,272 | 3947 | 3,667 |
| Kneriidae | Kneria sp. | 12,034,058 | 597,480 | 828 | 697 |
| Osteoglossidae | Osteoglossum bicirrhosum | 2,372,852 | 108,917 | 462 | 362 |
| Rivulidae | Nematolebias whitei | 10,553,528 | 443,216 | 641 | 526 |
| Cyprinidae | Tanichthys sp. | 18,603,974 | 1,936,994 | 3198 | 2,936 |


Table S7. Age of branch stems subtending each major clade in Otophysi (in MY).

| Clade | This tree* | Chen et al.¹⁵ | Nakatani et al.¹⁶ |
|---|---|---|---|
| Gymnotiformes | 61.0 | 65.2 | 39.0 |
| Siluriformes | 19.6 | 22.2 | 35.5 |
| Characoidei | 23.0 | 26.1 | 23.8 |
| Citharinoidei | 46.4 | 49.6 | 61.7 |

*Otophysi tree grafted into the Fish Tree of Life (FToL)¹¹,²⁷ and time-calibrated using TreePL and secondary calibrations (based on 61 primary fossil calibrations across all fishes; details in Betancur-R. et al.¹¹,²⁷). We did not use the original FToL¹¹ because Distichodus, the only representative examined of Citharinoidei in that study, was later shown to be contaminated with Ctenoluciidae, a Characoidei (see ref. 15).


Table S8. Estimates of subtending branch lengths in coalescent time units for subclades in three of the five datasets analyzed here. Stem ages are estimated from fossil-calibrated phylogenies (no data available for metazoans and yeasts). Stem lengths in coalescent units shown in bold are below the minimum theoretical values (5.3–8.3) required for the vast majority of genes in a genome (99.99–99.999%) to achieve monophyly¹³. Note that although low values (or lack of estimates) may be indicative of possible violations of the underlying assumption of subclade monophyly, the modified GGI-based coalescent analyses that use a combination of unconstrained and constrained gene trees obtain results identical to those using the constrained-only version (Figs. 5, S6, S7).

| Dataset | Subclade | Stem length (in MY) | Generation time (years) | Effective population size (Ne) | Stem length (coalescent time units) |
|---|---|---|---|---|---|
| Otophysi | Gymnotiformes | 39.0-61.0ᵃ | 2.5-5.0¹⁸ | 10000-100000ᵇ | 20-122 |
| Otophysi | Siluriformes | 19.6-35.5ᵃ | 2.5-5.0¹⁸ | 10000-100000ᵇ | 20-71 |
| Otophysi | Characoidei | 23.0-26.1ᵃ | 2.5-5.0¹⁸ | 10000-100000ᵇ | 23-52 |
| Otophysi | Citharinoidei | 46.4-61.7ᵃ | 2.5-5.0¹⁸ | 10000-100000ᵇ | 46-123 |
| Eutherian mammals | Rodentia | 9.2²⁸,²⁹ | 0.2-0.3³⁰ | 400000-500000³¹ | 30.6-57.2 |
| Eutherian mammals | Lagomorpha | 26.2²⁸ | 0.2-0.3³⁰ | 400000-500000³¹ | 87.3-163.7 |
| Eutherian mammals | Primates | 1.8-5.2²⁸,²⁹ | 8.0-15³⁰ | 10400-25200³¹ | 2.3-31.2 |
| Neoavian birds | Coraciimorphae | 0.88-0.99²⁹,³² | 1.0-12³³ | 10000-100000ᵇ | 0.4-49.5 |
| Neoavian birds | Owl | 12.9-14.5²⁹,³² | 1.0-12³³ | 10000-100000ᵇ | 5.3-725.0 |
| Neoavian birds | Australaves | 1.5-3.89²⁹,³² | 1.0-12³³ | 10000-100000ᵇ | 6.2-194.5 |
| Neoavian birds | Accipitrimorphae | 3.02-5.5²⁹,³² | 1.0-12³³ | 10000-100000ᵇ | 1.25-275 |

ᵃ Averaged from values presented in Table S7.
ᵇ Conservative range of values for freshwater fishes and birds.
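The conversion behind the last column of Table S8 can be sketched under the usual diploid convention, in which one coalescent unit equals 2Ne generations, so that units = (stem length in years / generation time) / (2 × Ne). The table does not state the formula, so this convention and the function names below are assumptions. With them, the Lagomorpha inputs (26.2 MY, 0.2-0.3 years, Ne 400000-500000) give roughly 87.3 to 163.75 units, close to the 87.3-163.7 reported.

```python
def coalescent_units(stem_my, gen_time_years, ne):
    """Convert a stem length in million years to coalescent units.

    Assumes a diploid population: units = generations / (2 * Ne).
    """
    generations = stem_my * 1e6 / gen_time_years
    return generations / (2 * ne)


def coalescent_range(stem_my, gen_time, ne):
    """Return (min, max) coalescent units given (low, high) input ranges."""
    # Shortest stem, longest generation time, largest Ne gives the minimum.
    lo = coalescent_units(stem_my[0], gen_time[1], ne[1])
    # Longest stem, shortest generation time, smallest Ne gives the maximum.
    hi = coalescent_units(stem_my[1], gen_time[0], ne[0])
    return lo, hi


# Lagomorpha row of Table S8: 26.2 MY, 0.2-0.3 years, Ne 400000-500000.
lo, hi = coalescent_range((26.2, 26.2), (0.2, 0.3), (400_000, 500_000))
print(f"{lo:.2f} to {hi:.2f} coalescent units")
```

Taking the extremes of each input range gives the widest possible interval; narrower intervals would follow if any input were held fixed.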
