<<

Error, signal, and the placement of Ctenophora sister to all other

Nathan V. Whelana,1, Kevin M. Kocotb, Leonid L. Morozc, and Kenneth M. Halanycha aMolette Biology Laboratory for Environmental and Climate Change Studies, Department of Biological Sciences, Auburn University, Auburn, AL 36849; bSchool of Biological Sciences, University of Queensland, St. Lucia, QLD 4101, Australia; and cDepartment of Neuroscience and McKnight Brain Institute, University of Florida, Gainesville, FL 32610

Edited by David M. Hillis, The University of Texas at Austin, Austin, TX, and approved March 24, 2015 (received for review February 18, 2015)

Elucidating relationships among early lineages has been recovered a sister relationship between and ctenophores difficult, and recent phylogenomic analyses place Ctenophora sister in analyses where taxa with high amounts of missing data were to all other extant animals, contrary to the traditional view of Porifera excluded, and support for this was highest in Bayesian inference as the earliest-branching animal lineage. To date, phylogenetic support with the CAT (17) substitution model. However, ctenophores for either ctenophores or sponges as sister to other animals has been were recovered as sister to all other extant animals in maximum- limited and inconsistent among studies. Lack of agreement among likelihood analyses with greater taxon sampling (Bayesian in- phylogenomic analyses using different data and methods obscures ference never converged for datasets with more than 19 taxa). how complex traits, such as epithelia, neurons, and muscles evolved. A The CAT model is a site-heterogeneous model that may handle consensus view of animal evolution will not be accepted until datasets LBA artifacts better than site-homogeneous substitution models and methods converge on a single hypothesis of early metazoan like GTR (18). Therefore, LBA plausibly influenced the phylo- relationships and putative sources of systematic error (e.g., long- genetic position of ctenophores in analyses of Ryan et al. (6) that branch attraction, compositional bias, poor model choice) are assessed. recovered ctenophores-sister. In Moroz et al. (5), strong nodal Here, we investigate possible causes of systematic error by expanding support for ctenophores sister to all other extant animals dis- taxon sampling with eight novel transcriptomes, strictly enforcing appeared when both the strictest orthology criteria were enforced orthology inference criteria, and progressively examining potential and ctenophore taxon sampling increased. Thus, further consid- causes of systematic error while using both maximum-likelihood with eration of systematic error influencing phylogeny reconstruction robust data partitioning and Bayesian inference with a site-heteroge- at the base of the animal tree is desirable. The ctenophore-sister hypothesis has challenged our under- neous model. We identified ribosomal protein genes as possessing standing of early metazoan evolution, but given conflicting results a conflicting signal compared with other genes, which caused some (2–10), this and other hypotheses must be carefully scrutinized. past studies to infer ctenophores and cnidarians as sister. Importantly, Ideally, if robust datasets are assembled and causes of systematic biases resulting from elevated compositional heterogeneity or ele- error are accounted for, different datasets and analytical methods vated substitution rates are ruled out. Placement of ctenophores as will converge on a single phylogenetic hypothesis (19). However, sister to all other animals, and monophyly, are strongly sup- practical barriers exist in assembling robust datasets free of sys- ported under multiple analyses, herein. tematic error. For example, phylogenomic datasets are prone to missing data given the incomplete nature of transcriptome and phylogenomics | Metazoa | Ctenophora | Porifera | Cnidaria even genome sequences, and orthology determination among

esolving relationships among extant lineages at the base of Significance Rthe metazoan tree is integral to understanding evolution of complex animal traits, including nervous systems and gastrula- Traditional interpretation of animal phylogeny suggests traits, tion. Historically, sponges and placozoans, both of which have such as mesoderm, muscles, and neurons, evolved only once relatively simple body plans and lack neurons, have been con- given the assumed placement of sponges as sister to all other sidered to diverge from other animals earlier than ctenophores, animals. In contrast, placement of ctenophores as the first cnidarians, and bilaterians (1). Phylogenomic studies have resulted branching animal lineage raises the possibility of multiple origins in controversial hypotheses placing either Placozoa (Fig. 1A)(2), B – of many complex traits considered important for animal di- ctenophores (ctenophore-sister hypothesis) (Fig. 1 )(37), or a versification and success. We consider sources of potential error clade of ctenophores and sponges (Fig. 1C) (6) as sister to all – and increase taxon sampling to find a single, statistically robust remaining animals. Others (7 10) have claimed nontraditional placement of ctenophores as our most distant animal relatives, findings resulted from systematic error and argued for traditional contrary to the traditional understanding of animal phylogeny. placement of sponges as sister to all remaining animals (Eumetazoa, Furthermore, ribosomal protein genes are identified as creating or Porifera-sister hypothesis) (Fig. 1D) and a sister relationship D conflict in signal that caused some past studies to recover a sister between ctenophores and cnidarians (Coelenterata) (Fig. 1 ). relationship between ctenophores and cnidarians. Limited statistical support for various hypotheses and conflict among, and even within studies, has undermined confidence in Author contributions: N.V.W., K.M.K., L.L.M., and K.M.H. designed research; N.V.W. and our understanding of early animal evolution. Basal metazoan K.M.K. performed research; N.V.W. and K.M.K. analyzed data; and N.V.W., K.M.K., L.L.M., and K.M.H. wrote the paper. relationships must be resolved with greater consistency before a EVOLUTION consensus viewpoint is widely accepted. The authors declare no conflict of interest. Long-branch attraction (LBA) (11), which occurs when two This article is a PNAS Direct Submission. divergent lineages are artificially inferred as related because of Freely available online through the PNAS open access option. substitutional saturation (11), is perhaps the most often evoked – Data deposition: The sequence reported in this paper has been deposited in the NCBI explanation for controversial or spurious phylogenetic results (7 Sequence Read Archive, www.ncbi.nlm.nih.gov/sra (accession no. PRJNA278284). Tran- 10, 12, 13). Additional sources of systematic error include poor scriptome assemblies, phylogenetic datasets, and an annotation file were deposited to taxon or character sampling (10, 14, 15), large amounts of missing figshare, figshare.com (doi: 10.6084/m9.figshare.1334306). data (16), and model misspecification (16, 17). Such errors have 1To whom correspondence should be addressed. Email: [email protected]. been implicated as influencing the position of ctenophores in This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. metazoan phylogeny studies (5–8). For example, Ryan et al. (6) 1073/pnas.1503453112/-/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.1503453112 PNAS | May 5, 2015 | vol. 112 | no. 18 | 5773–5778 but not definitively, paralogous) from 104 OGs. Datasets with certain and both certain and uncertain paralogs pruned were starting points for progressively filtering other causes of system- atic error. Overall, 25 hierarchical datasets that had progressively fewer characters, but controlled for more potential causes of systematic errors, were assembled (Fig. 2 and Table S2) (all datasets have been deposited on figshare, doi 10.6084/m9.fig- share.1334306). The percentage of gene occupancy and missing data ranged from 70–82% and 35–44%, respectively (Table S2). Other than progressive data filtering, differences between data- sets analyzed here and those used in previous studies of basal metazoan relationships (3–10) are increased character sampling and less missing data compared with some studies and increased nonbilaterian taxon sampling. In contrast to Nosenko et al. (7) Fig. 1. Phylogenetic hypotheses from previous molecular studies. (A)Placozoa- sister hypothesis (2). (B) Ctenophora-sister hypothesis (3–7). Placozoa was not and Philippe et al. (9), both of which relied heavily on ribosomal included in some studies that found support for this topology. (C) Ctenophora + proteins (i.e., 52% and 71%, respectively), our dataset did not Porifera-sister hypothesis (6). (D) Traditional Porifera-sister hypothesis (7–10). contain a large representation of any one gene class (e.g., only 8 of 250 were ribosomal protein genes) (SI Methods). distantly related species can be difficult (20, 21). Computational Ctenophores Sister to Other Extant Animals and Monophyletic Sponges. limitations of complex phylogenetic methods can also prevent All maximum-likelihood analyses with outgroups resulted in to- ≥ using what may be the best theoretical phylogenetic method. pologies with strong support [ 97% bootstrap support (BS)] for Nevertheless, both data quality and appropriate methods should ctenophores sister to all other extant animal lineages (datasets 1–21 in Figs. 2 and 3 and Figs. S1–S4). Importantly, Phylobayes be emphasized if deep relationships of any organismal group are +Γ to be robustly resolved. (26) analyses using the CAT-GTR model also resulted in Here, we have assembled a more comprehensive phyloge- phylogenies with Ctenophora sister to all other animals with 100% posterior probability (PP), and 100% PP for sponge monophyly nomic dataset of metazoan lineages that branched early in ani- B C mal evolution than previous studies to alleviate taxon sampling (datasets 6 and 16 in Figs. 2 and 3 and Fig. S5 and ). Inferred relationships among major sponge lineages (i.e., Demospongiae + concerns. We have sequenced transcriptomes of eight additional + species and used other deeply sequenced publicly available tran- Hexactinellida sister to Calcarea Homoscleromorpha) were consistent with morphology (27, 28) and most other molecular scriptomes (including some not used in past studies). Additionally, – we use a number of data-filtering steps to explore the sensitivity of analyses that have recovered monophyletic sponges (datasets 1 27 in Figs. 2 and 3 and Figs. S1–S5) (4, 5, 8, 9, 28–30). Alternative these results to potential sources of error. This process includes hypotheses of basal animal relationships (Fig. 1) were rejected by strict orthology determination, removal of taxa and genes that every phylogenetic analysis as measured by the approximately may cause LBA, and removal of heterogeneous genes that may unbiased (AU) (31) test (P ≤ 0.001) (Table 1). Overall, our results cause model misspecification. Regardless of how data were fil- overwhelmingly reject alternative hypotheses to ctenophores sister tered, all maximum-likelihood analyses with model partitioning to all other extant animals (Table 1). and all Bayesian inference analyses using a site-heterogeneous Ribosomal protein genes can have conflicting signal with most model recover ctenophores as sister to all other animals with other genes (32). The datasets of Philippe et al. (9) and Nosenko strong support. We identify overreliance on ribosomal protein et al. (7), which recovered cnidarians and ctenophores sister, had genes in some datasets (7, 9) as the source of incongruence among high proportions of ribosomal protein genes (67 of 128 and 87 of previous phylogenomic studies. We also find strong support for 122 genes, respectively). Nosenko et al. (7) analyzed a dataset – sponge monophyly in contrast to previous reports (7, 22 24). without ribosomal protein genes and recovered ctenophores sister to all other animals, but a similar analysis has not been done for Results the original Philippe et al. (9) dataset. If certain topologies are Datasets and Accounting for Biases. Orthology filtering of tran- recovered only with the use of a high proportion of one group of scriptome and genome data from 76 species resulted in 251 genes (e.g., ribosomal protein genes), this may indicate a phylo- orthologous groups (OGs) and 81,008 aligned amino acid sites genetic signal that conflicts with the true evolutionary history. As (Tables S1 and S2). TreSpEx (25) further identified 83 “certain” such, we analyzed the Philippe et al. (9) dataset with ribosomal paralogs (i.e., sequences TreSpEx classified as high-confidence proteins removed (67 of 128) using maximum-likelihood and paralogs) from 10 OGs. TreSpEx also identified 2,684 “un- Bayesian inference, and neither reconstruction placed ctenophores certain” paralogs (i.e., sequences TreSpEx classified as possibly, and cnidarians sister as in the original study (Fig. S5 D and E).

1 Full dataset

“Certain” paralogs “Uncertain” & “certain” 2 12 removed paralogs removed

Heterogeneous Taxa with high LB Heterogeneous 3 Taxa with high LB 8 13 18 scores removed genes removed scores removed genes removed Fig. 2. Hierarchy of datasets with progressive data filtering. Numbers associated with each datasets are Genes with high LB Genes with high LB Genes with high LB references throughout the text. Datasets 5, 11, 15, 4 Genes with high LB 9 14 19 scores removed scores removed scores removed scores removed 21 have as an outgroup, and datasets 22, 23, 24, 25 have all outgroups removed. Outgroups Slowest Lowest half Taxa with high LB Outgroups Slowest Lowest half Taxa with high LB RAxML analyses were used for each dataset, and removed evolving half RCFV genes scores removed removed evolving half RCFV genes scores removed shaded datasets were also analyzed with Phylo- 5, 22 6 7 10 15, 24 16 17 20 Bayes. Data matrix statistics (e.g., number of sites, Outgroups Outgroups 11, 23 21, 25 percentage of missing data, and so forth) for each removed removed dataset can be found in Table S2.

5774 | www.pnas.org/cgi/doi/10.1073/pnas.1503453112 Whelan et al. Priapulus caudatus SI Methods Lithobius forficatus protein classes ( ; figshare), rather than consisting of a Drosophila melanogaster Daphnia pulex majority of ribosomal proteins (7, 9). Hemithiris psittacea Tubulanus polymorphus Inaccurate orthology assignment can also introduce systematic Lottia gigantea Capitella teleta error into phylogenomic analyses. Although relationships among Saccoglossus kowalevskii Strongylocentrotus purpuratus basal lineages were unaffected, removal of paralogs as identified 79 Petromyzon marinus Homo sapiens by TreSpEx appeared to have the greatest effect on support for Danio rerio 9494 AiptasiaAiptasia pallidapallida HormathiaH thihi digitata dig it some critical nodes. For example, most topologies with both BoloceraBolocera tuediae NematostellaNematostella vectensisvectensis certain and uncertain paralogs removed had strong support for Acropora digitiferadigitifera PlatygyraPlatygyrraa carnosacarnosa sponge monophyly (i.e., ≥ 95% BS) (datasets 12–14 and 18–20 in EunicellaEunicella verrucosaverrucosa AAureliaurelia auritaaurita Figs. 2 and 3 and Figs. S2F, S3 A, B, and F, and S4 A and B), but Periphylla periphylla AbylopsisAAbbl l i tetragona four analyses with only certain paralogs removed recovered low Craseoa latheticalthti Agalma eleganselegans < Nanomia bijugabijuga support ( 90% BS) for sponge monophyly (datasets 5, 7, 9, and Physalia physalis E A D E HydraHy d viridissima i idi i 10 in Figs. 2 and 3 and Figs. S1 and S2 , , and ). Hydra vulgaris HydraHHdyg d oligactis li ti Because outgroup sampling has the potential to influence TrichoplaxTrichoplax adhaerensadhaerens HyalonemaHHyalonema populiferumpopuliferum rooting of the animal tree, we explored outgroup sampling as RossellaRosseella fibulatafibufi lata SympagellaSympaagella nux well. When all outgroups except two choanoflagellates were re- AphrocallistesAphrocAhph callil s ttes vastus AmphimedonA queenslandicaqueennslandica moved (datasets 5, 11, 15, and 21 in Fig. 2), inferred nonbilaterian Petrosiaifiifi ficiformis PseudospongosoritesPseudospongosorites suberitoidessuberitoides Latrunculia apicalis relationships were identical as in analyses we performed with full CrellaCClllrella eleganselegans outgroup sampling (datasets 5, 11, 15, and 21 in Figs. 2 and 3 and 61 Kirkpatrickia variolosa Tethya wilhelma E E C C SpongillaSpg alba Figs. S1 , S2 , S3 ,andS4 ), but support for sponge mono- Ephydatiay muelleri 96 Chondrilla nucula phyly decreased. In these analyses the leaf-stability indices for IrciniaIrcinia fasciculatafasciculata SyconSycon coactum homoscleromorph and calcareous sponges were less than 0.94, Sycon ciliatum Oscarellall carmela l but in all other analyses they were greater than 0.97 (Fig. S5 F and CorticiumCorticium candelabrum Euplokamis dunlapaep G Coeloplana astericolaasterricola ). Regardless, when choanoflagellates were the only outgroup, ValliculaV lli l sp. PleurobrachiaPleurobraPl chia bacheibachei ctenophores were still recovered as the deepest split within the PleurobrachiaPleurobrachia sp.sp Dryodora glandiformisglandifoormis animal tree with 100% BS support. Analyses with all outgroup 65 BeroeBBbileroe abyssicolaabyb ssicolaa – 99 Mnemiopsis leidyii taxa removed (datasets 22 25 in Fig. 2) recovered identical re- Bolinopsis infundibulum MertensiidaeM t iid sp.sp. lationships among major metazoan lineages as other analyses MonosigaMi ovata t Salpingoeca rosetta (Figs. S4 D–F and S5A). However, we observed low support for owczarzaki vibrans Amoebidium parasiticum relationships among ctenophores, sponges, and placozoans in Sphaeroforma arctica Rhizopus oryzae these analyses. This resulted from the long placozoan branch 95 Mortierella verticillata Spizellomyces punctatus being attracted to ctenophores in the absence of outgroup taxa as Allomyces macrogynus 0.2 indicated by bootstrap tree topologies and leaf-stability index for Trichoplax of less than 0.92, whereas leaf-stability indices were Fig. 3. Reconstructed maximum-likelihood topology of metazoan relation- greater than 0.99 in all other analyses (Fig. S5 F and G). ships inferred with dataset 10. Maximum likelihood and Bayesian topologies inferred with other datasets (Fig. 2) have identical basal branching patterns Discussion (Figs. S1–S5). Nodes are supported with 100% bootstrap support unless oth- erwise noted. Support, as inferred from each dataset (Fig. 2), for nodes cov- Placement of Ctenophores Sister to all Remaining Animals Is Not ered by black boxes are in Table 1. Sensitive to Systematic Errors. Every analysis conducted herein strongly supported the ctenophore-sister hypothesis (Fig. 3 and Table 1). A major hurdle to wide acceptance of ctenophores as The maximum-likelihood analysis recovered strong support for sister to other animals has been that different analyses have ctenophores as sister to all other metazoan lineages (BS = 93) yielded conflicting hypotheses of early animal phylogeny (2–9). (Fig. S5D). However, Bayesian inference (Fig. S5E) recovered Sensitivity to the selected model of molecular evolution has been – sponges as sister to all other metazoans, but support for this and especially problematic (2 9). In contrast, both maximum-likeli- other deep nodes were low (PP ≤ 90). hood analyses using data partitioning and Bayesian analyses using the CAT-GTR model of our datasets resulted in identical Systematic Biases and Their Effect on Phylogenetic Inference. Long- branching patterns among ctenophores, sponges, placozoans, branch (LB) scores (28), a measurement for identifying taxa and cnidarians, and bilaterians. Past critiques of studies that found OGs that could cause LBA, were calculated for each species and ctenophores to be sister to all other animals have emphasized the OG with TreSpEx (25). In total, we identified six “long-branched” CAT model as the most appropriate model for deep phyloge- taxa, all nonmetazoans (Fig. S6A and Table S2), and 28 OGs nomics because it is an infinite mixture model that accounts for with high LB scores compared with other OGs (Fig. S6 B and C). site-heterogeneity (7, 8, 29). Notably, when the CAT-GTR model was used here (datasets 6 and 16 in Fig. 2), we recovered cteno- We found complete congruence in relationships among basal phores-sister to all other metazoans (Fig. 3 and Fig. S5 B and C). metazoan phyla in trees inferred with (datasets 1, 2, 8, 12, and 18 – – – – – – The argument for LBA (7 10) or saturated datasets (7, 8) as the in Fig. 2) and without (datasets 3 7, 9 11, 13 17, 19 21, and 22 reason past studies found ctenophores to be sister to all other an- 25 in Fig. 2) taxa and genes that had high LB scores, and nodal imals seems to have been overstated. The recovered position of support for critical nodes showed little variation among analyses – ctenophores was identical in analyses with (datasets 1, 2, 8, 12, and (Fig. 3 and Figs. S1 S5). Removing OGs with high amino acid 18 in Fig. 2 and Figs. S1 A and B, S2 B and F,andS3F) and without compositional heterogeneity (datasets 7–11, 17–21, 23, and 25 in – – – – C–F (datasets 3 7, 9 11, 13 17, and 19 25 in Fig. 2, and Figs. S1 , S2 EVOLUTION Fig. 2) also had no effect on branching order (Fig. 3 and Figs. S2 A and C–E, S3 A–E, S4,andS5 A–C) taxa and genes with high LB A–E, S3 E and F, S4 A–E, and S5A). Topologies inferred with scores, and analyses with the slowest evolving genes (datasets 6 and only the slowest evolving half of OGs assembled here (datasets 6 16 in Fig. 2 and Fig. S7)alsorecoveredctenophoressistertoall and 16 in Fig. 2) (i.e., least saturated and least prone to homo- other animals (Fig. 3 and Figs. S1F, S3D,andS5 B and C). Fur- plasy; see Fig. S7 for saturation plots) recovered high support for thermore, despite the long internal branch leading to the cteno- ctenophores sister to all other animals and sponge monophyly phore clade, the position of this lineage did not change in any with both maximum-likelihood (BS = 100) (Fig. 3 and Figs. S1F analysis including those when outgroups were removed (datasets 5, and S3D) and Bayesian inference using the CAT-GTR model 11, 15, 21, and 22–25 in Fig. 2 and Figs. S1E, S2E, S3C,andS4 C– (PP = 1) (Fig. 3 and Fig. S5 B and C). Importantly, our datasets F). If this branch was being artificially attracted toward outgroups, of the slowest evolving half of OGs were of a broad range of then employment of different outgroup schemes would be expected

Whelan et al. PNAS | May 5, 2015 | vol. 112 | no. 18 | 5775 Table 1. AU test P values for alternative hypotheses of animal relationships and support for ctenophora-sister and sponge monophyly Porifera Dataset (no.) Porifera-sister Coelenterata Placozoa-sister Porifera + Ctenophora Ctenophora-sister monophyly

Full dataset (1) 2.E-04 5.E-06 1.E-76 2.E-05 100 BS 95 BS “Certain” paralogs 2.E-04 1.E-05 3.E-03 5.E-79 100 BS 90 BS removed (2) Taxa with high LB scores 6.E-06 2.E-06 1.E-62 2.E-07 100 BS 94 BS removed (3) Genes with high LB scores 1.E-02 2.E-04 2.E-32 3.E-47 100 BS 94 BS removed (4) -only 4.E-45 1.E-04 1.E-05 6.E-104 100 BS 68 BS outgroup (5) All outgroups removed (22) N/A 3.E-55 N/A N/A N/A 84 BS Slowest evolving half 2.E-78 2.E-11 2.E-04 1.E-05 100 BS/100 PP 95 BS/100 PP of genes (6) Genes with lowest half of 3.E-06 2.E-57 9.E-47 2.E-03 100 BS 53 BS RCFV values (7) Heterogeneous genes 7.E-52 2.E-49 9.E-05 3.E-05 100 BS 90 BS removed (8) Genes with high LB scores 9.E-41 9.E-05 2.E-04 2.E-66 99 BS 90 BS removed (9) Taxa with high LB scores 7.E-22 5.E-61 9.E-06 7.E-06 100 BS 88 BS removed (10) Choanoflagellate-only 1.E-03 3.E-36 1.E-02 6.E-104 92 BS 82 BS outgroup (11) All outgroups removed (23) N/A 2.E-04 N/A N/A N/A 92 BS “Certain” and “uncertain” 6.E-39 1.E-05 4.E-44 4.E-05 100 BS 99 BS paralogs (12) Taxa with high LB scores 6.E-05 8.E-06 6.E-30 8.E-09 100 BS 99 BS removed (13) Genes with high LB scores 8.E-105 9.E-05 1.E-45 5.E-102 99 BS 97 BS removed (14) Choanoflagellate-only 5.E-59 5.E-39 7.E-05 1.E-72 100 BS 66 BS outgroup (15) All outgroups removed (24) N/A 2.E-07 N/A N/A N/A 92 BS Slowest evolving half of 1.E-04 1.E-52 1.E-11 1.E-07 100 BS/100 PP 94 BS/100 PP genes (16) Genes with lowest half of 5.E-09 1.E-68 5.E-07 3.E-72 100 BS 93 BS RCFV values (17) Heterogeneous genes 7.E-07 4.E-10 6.E-77 8.E-09 99 BS 100 BS removed (18) Genes with high LB scores 2.E-46 1.E-08 2.E-04 1.E-57 100 BS 96 BS removed (19) Taxa with high LB scores 1.E-03 3.E-92 1.E-66 3.E-113 100 BS 100 BS removed (20) Choanoflagellate-only 2.E-61 1.E-31 2.E-04 4.E-40 100 BS 77 BS outgroup (21) All outgroups removed (25) N/A 9.E-05 N/A N/A N/A 96 BS Philippe et al. (7) Maximum 0.001 0.010 0.008 0.075 93 BS 42 BS likelihood Philippe et al. (7) Bayesian —— — — N/A 99 PP inference

Dataset numbers are as in Fig. 2. to result in different ctenophore placement. Maximum-likelihood a sister relationship between ctenophores and cnidarians (7, 9) and Bayesian inference using the CAT-GTR model of the least should be the focus of identifying problems with phylogenetic saturated datasets (datasets 6 and 16 in Fig. 2 and Fig. S7)re- reconstruction. A benefit of phylogenomic datasets is that mul- covered identical basal relationships as our other analyses (Fig. 3 tiple gene classes and many parts of the genome are analyzed. As and Figs. S1F, S3D,andS5 B and C), also indicating homoplasy and such, phylogenomic datasets should not rely too heavily on a model choice did not bias results. Given the consistency among our single gene class. Past molecular studies that found support for analyses that were designed to have different levels of potential Coelenterata and the Porifera-sister (7, 9) hypotheses appear to biases, we conclude that the ctenophore-sister hypothesis is robust have been strongly affected by a disproportionate reliance (i.e., > to systematic errors. 50%) on ribosomal protein genes. Nosenko et al. (7) and Philipe Rather than focusing on long branches, fast evolving genes, or et al. (9) found support for ctenophores sister to cnidarians, but model misspecification as influencing the position of cteno- Nosenko et al. (7) recovered ctenophores sister to all other ani- phores, the individual genes underlying datasets that resulted in mals when ribosomal proteins were excluded. Ribosomal protein

5776 | www.pnas.org/cgi/doi/10.1073/pnas.1503453112 Whelan et al. datasets from these studies are less saturated than datasets as- not a good proxy for metazoan evolution (40). Challenges to sembled here based on linear regression of patristic distance long-held viewpoints of morphological complexity and assumed versus uncorrected genetic distance (Fig. S7)(7–9, 25). This lower improbability of convergent evolution must not be dismissed mutational saturation has been the primary rational for empha- simply because they seem unlikely at face value, especially con- sizing ribosomal genes when inferring deep animal relationship sidering a growing body of evidence that supports convergent (7). However, standard measurements of sequence saturation (7, evolution of many animal traits including neurons (5, 6, 41–43). 8, 25) average across the length of the sequence. Thus, a sequence Furthermore, the Porifera-sister hypothesis lacks critical evalu- with a few variable, highly saturated sites may appear less satu- ation and homology of many characters in these taxa still need rated than a sequence with numerous variable sites but less sat- thorough analysis (44). Overall, findings presented here robustly uration per site. Furthermore, extremely low mutation rates can support ctenophores as sister to all remaining animals. indicate selection and result in too little phylogenetic information, both of which could lead to the inference of incorrect relation- Methods ships (33). Our maximum-likelihood analysis of Philipe et al.’s(9) Taxon and Character Sampling. Taxon sampling included previously available dataset with ribosomal genes removed recovered support for data and eight new transcriptomes from two choanoflagellates, three glass ctenophores as sister to all other animals (Fig. S5D). Basal re- sponges (Hexactinellida), two , and a deep-sea cnidarian (Scy- lationships were poorly resolved in the Bayesian analysis, which phozoa) (Table S1). Briefly, RNA was extracted, reverse-transcribed, and am- may be a result of too few characters, but ctenophores and cni- plified using the SMART kit (Clontech), and sequenced on an Illumina HiSeq darians were not recovered as sister (Fig. S5E). Notably, a study (SI Methods). Raw or assembled transcriptome data for 68 additional species focused only on myxozoan cnidarians (34) used the same matrix were retrieved from public databases (Table S1). as Philippe et al. (9), but added two highly divergent cnidarians Raw Illumina transcriptome data were digitally normalized using nor- and recovered ctenophores sister to all other animals. Ribosomal malize-by-median.py (45) with a k-mer size of 20, a desired coverage of 30, × 9 protein genes have previously been identified as a potential and four hash tables with a lower bound of 2.5 10 . Normalized Illumina source of phylogenetic error (32), and the above indicates that reads were assembled using default parameters in Trinity v20131110 (46). Raw 454 transcriptome data were assembled with Newbler (47). ctenophores sister to cnidarians as in Philippe et al. (9) was caused by either limited cnidarian taxon sampling, misleading Orthology Determination and Data Filtering. Putative orthologs were de- signal in ribosomal genes, or both. More work is needed to assess termined for each species using HaMStR v13.2 (48) using the model organism saturation, possible convergent evolution, and selective pressures core ortholog set. OGs, determined by HaMStR, were further processed us- of ribosomal proteins in ctenophores, sponges, placozoans, and ing a custom pipeline that filtered OGs with too much missing data (i.e., OGs cnidarians. However, differences in topologies when ribosomal with less than 37 species), aligned sequences, and filtered potential paralogs proteins are included or excluded strongly imply a misleading (https://github.com/kmkocot)(SI Methods). signal in ribosomal protein genes. Put simply, it appears highly TreSpEx (25), which requires individual gene trees for each OG, was used improbable that all genes other than ribosomal protein genes to identify putative paralogs and exogenous contamination missed by our could be recovering an incorrect phylogeny. initial orthology inference approach. Gene trees were inferred with RAxML v.8.0.2 (49) with 100 rapid bootstrap replicates followed by a full maximum- Sponges Are Monophyletic. Sponge monophyly, although less con- likelihood inference; each tree was inferred with the LG+Γ model, which was troversial than the phylogenetic positions of ctenophores, cnidar- by far the most common best-fitting model when the complete dataset was ians, and placozoans remains an important question as several partitioned (see below). Paralogs in the initial dataset were detected with studies have supported sponge paraphyly (7, 22–24). In regards to TreSpEx using the automated BLAST method and the prepackaged Capitella inferring the characteristics of the metazoan ancestor, sponge teleta and Helobdella robusta BLAST databases. This method identified two paraphyly, coupled with sponges being at the base of the metazoan classes of paralogs: “certain” or sequences that are high-confidence paralogs, phylogeny, is an attractive hypothesis that implies the metazoan and “uncertain” or sequences that are potential paralogs. From the initial, ancestor was sponge-like. However, sponge monophyly was re- 251 gene dataset, we created one dataset by removing certain paralogs and covered in all of our analyses and best supported when the strictest another dataset with both certain and uncertain paralogs pruned (Fig. 2); orthology criteria were applied with TreSpEx (Fig. 3 and Table 1), after pruning, OGs with fewer than 37 taxa were removed. LB scores (25) were calculated for each taxon and OG with TreSpEx. Following Struck (25), which also removes sequences resulting from sample contamina- these values were plotted in R (Fig. S6) (50), and outliers were identified as tion (e.g., endosymbionts).This observation suggests that spurious taxa or genes that could cause LBA artifacts. After removal of taxa and genes, paralogs or sequence contamination may have been a source of each OG was ranked by evolutionary rate with a custom python script (https:// error when sponges were found paraphyletic, but datasets that github.com/nathanwhelan) following Telford et al. (51). Datasets with only have recovered sponge paraphyly were also much smaller than the slowest half of remaining genes were then generated to assess if fast those analyzed here (e.g., refs. 7, 22–24). Sponge monophyly and evolving homoplasious genes were biasing inferences (Fig. 2). the well-supported ctenophore-sister hypothesis complicates in- We used BaCoCa (52) and two metrics (χ2-test of heterogeneity and rel- ferring the ancestral condition of metazoan and other major ative composition frequency variability; RCFV) (53) to identify genes with metazoan groups (e.g., Placozoa + Cnidaria + Bilateria) because amino acid compositional heterogeneity. Some datasets were further fil- many sponge characteristics are likely apomorphic traits. Our tered by removing non-choanoflagellate outgroups and all outgroup taxa to robust support of sponge monophyly agrees with morphology determine if outgroup choice affects inferred relationships. Saturation of (35) and most other large molecular datasets (5, 6, 8–10). each filtered dataset with full outgroup sampling was explored with TreSpEx and plotted in R (Fig. S7) to provide a further metric to compare datasets. Conclusions For more than a century, sponges were traditionally considered Phylogenetics. In addition to removing compositionally heterogeneous genes sister to all other extant metazoans because unlike ctenophores, from some datasets, two approaches were used to handle site-heterogeneity: (i) partitioning schemes for each dataset and associated protein substitution cnidarians, and bilaterians, they lack true tissues and body sym- EVOLUTION metry (36, 37). Sponges also possess choanocyte cells that are models were determined using the relaxed clustering method in PartitonFinder similar in morphology to choanoflagellates, the sister group to (54) with 20% clustering and the corrected Akaike information criterion; (ii)a site-heterogeneous mixture model, CAT-GTR+Γ, was used in PhyloBayes (26). metazoans (36). However, Mah et al. (38) found that homology Maximum-likelihood topologies were inferred with RAxML using partitions as between choanoflagellates and sponge choanocytes is not as de- indicated by ParitionFinder, associated best-fit substitution models, and the finitive as previously assumed. Similarly, some authors have argued gamma parameter to model rate-heterogeneity. Nodal support was measured that Placozoans are sister to all other animals because they lack with 100 fast bootstrap replicates. Phylobayes analyses were run with two chains neural and muscular systems and also share similarities in mito- until the maxdiff statistic between chains was below 0.3 as measured by bpcomp chondrial genome size with choanoflagellaes (2, 39). A common (26). Convergence was also assessed with tracecomp (26) to ensure each pa- theme of these two hypotheses is the placement of morphologically rameter had a maximum discrepancy between chains of less than 0.3 and an simple animals near the base of the animal tree, but complexity is effective sample size of at least 50. Computational demands and convergence

Whelan et al. PNAS | May 5, 2015 | vol. 112 | no. 18 | 5777 issues prevented us from using the CAT-GTR+Γ model for most datasets. inferred with the same partitioning scheme and models used for unconstrained Therefore, Bayesian phylogenies are only reported for the two analyses of the phylogenetic inference. Per site log-likelihoods for trees were calculated in RAxML slowest evolving half of OGs. Leaf-stability indices (55) for each taxon were and AU tests were performed in Consel (57). measured in PhyUtility (56) to identify potentially unstable taxa in each dataset. Maximum-likelihood and Bayesian inference trees were also inferred from ACKNOWLEDGMENTS. We thank members of the Molette Biology Labora- the Philippe et al. (9) dataset with ribosomal proteins removed (i.e., 67 of 128 tory for Environmental and Climate Change Studies at Auburn University for genes) to determine if a single gene class-biased phylogenetic inference. help with bioinformatics and data collection, especially Damien Waits. This Ribosomal proteins were filtered from the original dataset following data work was made possible in part by a grant of high-performance computing matrix annotations (9) and the matrix was split into individual genes for resources and technical support from the Alabama Supercomputer Authority model testing using a custom R script (https://github.com/nathanwhelan). and was supported by the US National Aeronautics and Space Administra- The AU test (31) was used to determine if a priori hypotheses of basal tion (Grant NASA-NNX13AJ31G) and in part by National Science Foundation metazoan relationships could be rejected (Fig. 1). Topological constraints (Grant 1146575). This is Molette Biology Laboratory Contribution 36 and were enforced in RAxML and the most likely tree given this constraint was Auburn University Marine Biology Program Contribution 128.

1. Dohrmann M, Wörheide G (2013) Novel scenarios of early animal evolution—Is it time 29. Dohrmann M, Janussen D, Reitner J, Collins AG, Wörheide G (2008) Phylogeny and to rewrite textbooks? Integr Comp Biol 53(3):503–511. evolution of glass sponges (Porifera, Hexactinellida). Syst Biol 57(3):388–405. 2. Dellaporta SL, et al. (2006) Mitochondrial genome of Trichoplax adhaerens supports 30. Voigt O, Adamski M, Sluzek K, Adamska M (2014) Calcareous sponge genomes reveal placozoa as the basal lower metazoan phylum. Proc Natl Acad Sci USA 103(23): complex evolution of α-carbonic anhydrases and two key biomineralization enzymes. 8751–8756. BMC Evol Biol 14(1):230. 3. Dunn CW, et al. (2008) Broad phylogenomic sampling improves resolution of the 31. Shimodaira H (2002) An approximately unbiased test of phylogenetic tree selection. animal tree of life. Nature 452(7188):745–749. Syst Biol 51(3):492–508. 4. Hejnol A, et al. (2009) Assessing the root of bilaterian animals with scalable phylo- 32. Bleidorn C, et al. (2009) On the phylogenetic position of Myzostomida: Can 77 genes genomic models. Proc Biol Sci 276(1802):4261–4270. get it wrong? BMC Evol Biol 9(1):150. 5. Moroz LL, et al. (2014) The ctenophore genome and the evolutionary origins of neural 33. Edwards SV (2009) Natural selection and phylogenetic analysis. Proc Natl Acad Sci USA systems. Nature 510(7503):109–114. 106(22):8799–8800. 6. Ryan JF, et al.; NISC Comparative Sequencing Program (2013) The genome of the 34. Nesnidal MP, Helmkampf M, Bruchhaus I, El-Matbouli M, Hausdorf B (2013) Agent of ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Science whirling disease meets orphan worm: phylogenomic analyses firmly place Myxozoa in 342(6164):1242592. Cnidaria. PLoS ONE 8(1):e54576. 7. Nosenko T, et al. (2013) Deep metazoan phylogeny: When different genes tell dif- 35. Ax P (1996) Multicellular Animals: A New Approach to the Phylogenetic Order in – ferent stories. Mol Phylogenet Evol 67(1):223 233. Nature (Springer, Berlin). 8. Philippe H, et al. (2011) Resolving difficult phylogenetic questions: Why more se- 36. Nielsen C (2008) Six major steps in animal evolution: Are we derived sponge larvae? quences are not enough. PLoS Biol 9(3):e1000602. Evol Dev 10(2):241–257. 9. Philippe H, et al. (2009) Phylogenomics revives traditional views on deep animal re- 37. Srivastava M, et al. (2010) The Amphimedon queenslandica genome and the evolu- – lationships. Curr Biol 19(8):706 712. tion of animal complexity. Nature 466(7307):720–726. 10. Pick KS, et al. (2010) Improved phylogenomic taxon sampling noticeably affects 38. Mah JL, Christensen-Dalsgaard KK, Leys SP (2014) Choanoflagellate and choanocyte – nonbilaterian relationships. Mol Biol Evol 27(9):1983 1987. collar-flagellar systems and the assumption of homology. Evol Dev 16(1):25–37. 11. Felsenstein J (1978) Cases in which parsimony and compatability methods will be 39. Osigus H-J, Eitel M, Bernt M, Donath A, Schierwater B (2013) Mitogenomics at the positively misleading. Syst Zool 27(4):401–410. base of Metazoa. Mol Phylogenet Evol 69(2):339–351. 12. Boussau B, et al. (2014) Strepsiptera, phylogenomics and the long branch attraction 40. Halanych KM (1999) Metazoan phylogeny and the shifting comparative framework. problem. PLoS ONE 9(10):e107709. Recent Developments in Comparative Endocrinology and Neurobiology, eds Roubos EW, 13. Straub SCK, et al. (2014) Phylogenetic signal detection from an ancient rapid radia- Wendelaar-Bonga SE, Vaudry H, De Loof A (Shaker, Maastrict, The Netherlands), pp 3–7. tion: Effects of noise reduction, long-branch attraction, and model selection in crown 41. Dunn CW, Giribet G, Edgecombe GD, Hejnol A (2014) Animal phylogeny and its clade Apocynaceae. Mol Phylogenet Evol 80:169–185. evolutionary implications. Annu Rev Ecol Evol Syst 45:371–395. 14. Heath TA, Hedtke SM, Hillis DM (2008) Taxon sampling and the accuracy of phylo- 42. Liebeskind BJ, Hillis DM, Zakon HH (2015) Convergence of ion channel genome genetic analyses. J Syst Evol 46(3):239–257. content in early animal evolution. Proc Natl Acad Sci USA 112(8):E846–E851. 15. Jeffroy O, Brinkmann H, Delsuc F, Philippe H (2006) Phylogenomics: The beginning of 43. Moroz LL (2015) Convergent evolution of neural systems in ctenophores. J Exp Biol incongruence? Trends Genet 22(4):225–231. 218(Pt 4):598–611. 16. Roure B, Baurain D, Philippe H (2013) Impact of missing data on phylogenies inferred 44. Halanych KM (2015) The ctenophore lineage is older than sponges? That cannot be from empirical phylogenomic data sets. Mol Biol Evol 30(1):197–214. right! Or can it? J Exp Biol 218(Pt 4):592–597. 17. Lartillot N, Brinkmann H, Philippe H (2007) Suppression of long-branch attraction 45. Brown T, Howe C, Zhang A, Pyrkosz Q, Brom AB (2012) A reference-free algorithm for artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol computational normalization of shotgun sequencing data. arXiv:1203.4802. 7(Suppl 1):S4. 46. Haas BJ, et al. (2013) De novo transcript sequence reconstruction from RNA-seq using 18. Tavaré S (1986) Some probabilistic and statistical problems in the analysis of DNA – sequences. Lect Math Life Sci 17:57–86. the Trinity platform for reference generation and analysis. Nat Protoc 8(8):1494 1512. 19. Philippe H, Delsuc F, Brinkmann H, Lartillot N (2005) Phylogenomics. Annu Rev Ecol 47. Margulies M, et al. (2005) Genome sequencing in microfabricated high-density pi- – Evol Syst 36:541–562. colitre reactors. Nature 437(7057):376 380. 20. Altenhoff AM, Dessimoz C (2009) Phylogenetic and functional assessment of ortho- 48. Ebersberger I, Strauss S, von Haeseler A (2009) HaMStR: Profile hidden markov model logs inference projects and methods. PLOS Comput Biol 5(1):e1000262. based search for orthologs in ESTs. BMC Evol Biol 9:157. 21. Gabaldón T (2008) Large-scale assignment of orthology: Back to phylogenetics? Ge- 49. Stamatakis A (2014) RAxML version 8: A tool for phylogenetic analysis and post- – nome Biol 9(10):235. analysis of large phylogenies. Bioinformatics 30(9):1312 1313. 22. Borchiellini C, et al. (2001) Sponge paraphyly and the origin of Metazoa. J Evol Biol 50. R Core Development Team (2014) R: A Language and Environment for Statistical 14(1):171–179. Computing (R Foundation for Statistical Computing, Vienna, Austria). 23. Sperling EA, Pisani D, Peterson KJ (2007) Poriferan paraphyly and its implications for 51. Telford MJ, et al. (2013) Phylogenomic analysis of echinoderm class relationships Precambrian paleobiology. Geol Soc Lond Spec Publ 286:355–368. supports Asterozoa. Proc Biol Sci 281(1786):20140479. — 24. Sperling EA, Peterson KJ, Pisani D (2009) Phylogenetic-signal dissection of nuclear 52. Kück P, Struck TH (2014) BaCoCa A heuristic software tool for the parallel assess- housekeeping genes supports the paraphyly of sponges and the monophyly of Eu- ment of sequence biases in hundreds of gene and taxon partitions. Mol Phylogenet metazoa. Mol Biol Evol 26(10):2261–2274. Evol 70:94–98. 25. Struck TH (2014) TreSpEx-detection of misleading signal in phylogenetic reconstruc- 53. Zhong M, et al. (2011) Detecting the symplesiomorphy trap: A multigene phyloge- tions based on tree information. Evol Bioinform Online 10:51–67. netic analysis of terebelliform annelids. BMC Evol Biol 11(1):369. 26. Lartillot N, Rodrigue N, Stubbs D, Richer J (2013) PhyloBayes MPI: Phylogenetic re- 54. Lanfear R, Calcott B, Kainer D, Mayer C, Stamatakis A (2014) Selecting optimal par- construction with infinite mixtures of profiles in a parallel environment. Syst Biol titioning schemes for phylogenomic datasets. BMC Evol Biol 14:82. 62(4):611–615. 55. Thorley JL, Wilkinson M (1999) Testing the phylogenetic stability of early tetrapods. 27. van Soest RWM (1984) Deficient Merlia normani Kirkpatrick, 1908, from the Curacao J Theor Biol 200(3):343–344. reefs, with a discussion on the phylogenetic interpretation of sclerosponges. Contrib 56. Smith SA, Dunn CW (2008) Phyutility: A phyloinformatics tool for trees, alignments Zool 54(2):211–219. and molecular data. Bioinformatics 24(5):715–716. 28. Gazave E, et al. (2012) No longer Demospongiae: Homoscleromorpha formal nomi- 57. Shimodaira H, Hasegawa M (2001) CONSEL: For assessing the confidence of phylo- nation as a fourth class of Porifera. Hydrobiologia 687(1):3–10. genetic tree selection. Bioinformatics 17(12):1246–1247.

5778 | www.pnas.org/cgi/doi/10.1073/pnas.1503453112 Whelan et al. Supporting Information

Whelan et al. 10.1073/pnas.1503453112 SI Methods was also removed from the dataset to minimize missing data. Transcriptome Sequencing. Frozen aliquots of Acanthoeca specta- HaMStR sometimes pulls two identical sequences from a single bilis (ATCC PRA-103) and Salpingoeca pyxidium (ATCC 50929) taxon for any given gene so the script uniqHaplo.pl (raven.iab.alaska. cultures were purchased from ATCC and a 10-μL aliquot of each edu/∼ntakebay/teaching/programming/perl-scripts/uniqHaplo.pl) was culture was used for RNA extraction with the RNAqueous Micro used to remove duplicate sequences. As an initial data-filtering RNA extraction kit (Life Technologies). Notably, Acanthoeca step before sequence alignment, if an ambiguous amino acid (i.e., spectabilis was grown in a xenic culture with the bacteria Klebsiella an X) was present in the first 20 or last 20 amino acids then the 5′ or pneumoniae (ATCC 700831) and Enterobacter aerogenes (ATCC 3′ end, respectively, was trimmed to this X as it may indicate se- 13048) and Salpingoeca pyxidium was grown in a xenic culture quencing errors. with Klebsiella pneumoniae (ATCC 700831). Tissue cuttings were Sequences were aligned using MAFFT (2) with the “auto” and taken from the apical portion of the sponges Latrunculia apicalis, “localpair” parameters and 1,000 maximum iterations. MAFFT Kirpatrickia variolosa, Hyalonema populiferum,andRossella fibulata. outputs sequences in an interleaved format so nentferner.pl was One “orb” of the unusual hexactinellid Sympagella nux was taken again used to remove newline characters from sequences. Un- for RNA extractions. A tissue clip was taken from the margin of alignable regions were then removed with aliscore (3) and alicut the bell of the deep-sea cnidarian Periphylla periphylla. RNA (4). Alignment columns with only gaps were subsequently re- was extracted from all sponges and P. periphylla with TRIzol moved, and any gene with an alignment less than 50-bp long (invitrogen). The Qiagen RNeasy kit and on-column DNase di- after trimming was removed. For each gene, a custom javascript gestion was used to purify TRIzol extracted RNA. cDNA libraries (https://github.com/kmkocot) then removed any sequence that of all eight taxa were constructed with the SMART cDNA library did not overlap other sequences by at least 20 amino acids to construction kit (Clontech Laboratories). cDNA library construction minimize the number of end gaps in alignments that would likely followed manufacturer’s instructions except that the provided 3′ oligo not add to information content in phylogenetic inference. Fi- was replaced with the Cap-Trsa-CV oligo following Meyer et al. (1). nally, any gene that had fewer than 37 taxa after alignment and The Advantage 2 PCR system (Clontech) was used to amplify full- trimming was removed to further minimize missing data. length cDNA using a minimum number of PCR cycles (∼17–21). The last step removed putative paralogs with PhyloTreePrunner Amplified cDNA was shipped to Hudson Alpha Institute for Bio- (5), which is a tree-based method. Gene trees for each OG were technology in Huntsville, AL for Illumina library preparation and 2 × inferred using FastTreeMP (6) with the “slow” and “gamma” 100-base pair (A. spectabilis, S. pyxidium, K. variolosa, L. apicalis)or parameters. Gene trees and corresponding OGs were input into 2 × 150-base pair (H. populiferum, R. fibulata, S. nux, P. periphylla) PhyloTreePrunner and inferred paralogs were removed from paired-end sequencing on an Illumina HiSEq. 2500 using ∼one- each OG. Alignments of each OG were concatenated using eighth lane per taxon. FASconCAT v1.0 (7). An automated wrapper script for this procedure is available from https://github.com/kmkocot. Initial Orthology Inference Procedure. Before any filtering steps, all sequences were backed up and new line characters were removed OG Annotation. To ensure no one gene class dominated our from interleaved sequences with nentferner.pl (https://github. matrices, OGs were annotated with PFAM domains (8) using the com/mptrsen/HaMStRad). All sequences shorter than 50 bp Trinotate pipeline (trinotate.github.io). An annotation spread- were then removed. Any gene with fewer than 37 taxa (∼47%) sheet was placed on figshare (doi: 10.6084/m9.figshare.1334306)

1. Meyer E, et al. (2009) Sequencing and de novo analysis of a coral larval transcriptome 5. Kocot KM, Citarella MR, Moroz LL, Halanych KM (2013) PhyloTreePruner: A phyloge- using 454 GSFlx. BMC Genomics 10:219. netic tree-based approach for selection of orthologous sequences for phylogenomics. 2. Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: Evol Bioinform Online 9(3940):429–435. Improvements in performance and usability. Mol Biol Evol 30(4):772–780. 6. Price MN, Dehal PS, Arkin AP (2010) FastTree 2—Approximately maximum-likelihood 3. Misof B, Misof K (2009) A Monte Carlo approach successfully identifies randomness in trees for large alignments. PLoS ONE 5(3):e9490. multiple sequence alignments: A more objective means of data exclusion. Syst Biol 7. Kück P, Meusemann K (2010) FASconCAT: Convenient handling of data matrices. Mol 58(1):21–34. Phylogenet Evol 56(3):1115–1118. 4. Kück P (2009) ALICUT: A Perlscript Which Cuts ALISCORE Identified RSS (Department of 8. Finn RD, et al. (2014) Pfam: the protein families database. Nucleic Acids Res Bioinformatics, Zoologisches Forschungsmuseum, Bonn, Germany). 42(Database issue):D222–D230.

Whelan et al. www.pnas.org/cgi/content/short/1503453112 1of11 Hemithiris psittacea Drosophila melanogaster A Tubulanus polymorphus B Daphnia pulex C Petromyzon marinus Lottia gigantea Lithobius forficatus Danio rerio Capitella teleta Priapulus caudatus 82 Homo sapiens Priapulus caudatus 99 Lottia gigantea Saccoglossus kowalevskii Lithobius forficatus Capitella teleta Strongylocentrotus purpuratus Daphnia pulex Hemithiris psittacea Drosophila melanogaster Drosophila melanogaster 99 Tubulanus polymorphus Daphnia pulex Strongylocentrotus purpuratus Saccoglossus kowalevskii Lithobius forficatus Saccoglossus kowalevskii Strongylocentrotus purpuratus Priapulus caudatus 79 Petromyzon marinus Homo sapiens 77 Petromyzon marinus Lottia gigantea Danio rerio Danio rerio Capitella teleta Eunicella verrucosa Homo sapiens Hemithiris psittacea Acropora digitifera Eunicella verrucosa Tubulanus polymorphus Platygyra carnosa Bolocera tuediae Platygyra carnosa Nematostella vectensis Aiptasia pallida Acropora digitifera Aiptasia pallida 99 Hormathia digitata Nematostella vectensis Hormathia digitata Nematostella vectensis Bolocera tuediae Bolocera tuediae Platygyra carnosa Hormathia digitata Hydra viridissima Acropora digitifera Aiptasia pallida Hydra oligactis Hydra viridissima Eunicella verrucosa Hydra vulgaris Hydra oligactis Hydra oligactis Physalia physalis Hydra vulgaris Hydra vulgaris Nanomia bijuga Agalma elegans Hydra viridissima Agalma elegans Nanomia bijuga Abylopsis tetragona Abylopsis tetragona Abylopsis tetragona Craseoa lathetica Craseoa lathetica Craseoa lathetica Nanomia bijuga Aurelia aurita Agalma elegans Periphylla periphylla Physalia physalis Trichoplax adhaerens Aurelia aurita Physalia physalis Oscarella carmela Periphylla periphylla Aurelia aurita Corticium candelabrum Trichoplax adhaerens Periphylla periphylla Sycon coactum Oscarella carmela Trichoplax adhaerens Sycon ciliatum Corticium candelabrum Oscarella carmela Hyalonema populiferum Sycon coactum Corticium candelabrum Sycon coactum 95 Aphrocallistes vastus Sycon ciliatum Sympagella nux Hyalonema populiferum Sycon ciliatum Rossella fibulata Aphrocallistes vastus Hyalonema populiferum 99 Chondrilla nucula 94 Aphrocallistes vastus 90 Sympagella nux Ircinia fasciculata Rossella fibulata Rossella fibulata Spongilla alba 55 Tethya wilhelma Sympagella nux Ephydatia muelleri Pseudospongosorites suberitoides Ircinia fasciculata Pseudospongosorites suberitoides Latrunculia apicalis Chondrilla nucula Tethya wilhelma Kirkpatrickia variolosa Petrosia ficiformis Crella elegans 36 Amphimedon queenslandica Kirkpatrickia variolosa Crella elegans Spongilla alba Latrunculia apicalis Spongilla alba Ephydatia muelleri Amphimedon queenslandica Ephydatia muelleri Pseudospongosorites suberitoides Petrosia ficiformis Petrosia ficiformis Latrunculia apicalis Euplokamis dunlapae Amphimedon queenslandica Vallicula sp. Ircinia fasciculata Kirkpatrickia variolosa Crella elegans Coeloplana astericola Chondrilla nucula 52 Pleurobrachia bachei Euplokamis dunlapae Tethya wilhelma Pleurobrachia sp. Vallicula sp. Euplokamis dunlapae Dryodora glandiformis Coeloplana astericola Coeloplana astericola Beroe abyssicola Dryodora glandiformis Vallicula sp. 99 Mnemiopsis leidyi Beroe abyssicola Pleurobrachia sp. 63 Mertensiidae sp. Bolinopsis infundibulum Pleurobrachia bachei Bolinopsis infundibulum Mertensiidae sp. Dryodora glandiformis Monosiga ovata 90 Beroe abyssicola Mnemiopsis leidyi Salpingoeca rosetta Mnemiopsis leidyi Pleurobrachia sp. 98 Monosiga brevicollis Mertensiidae sp. Pleurobrachia bachei Acanthoeca spectabilis Bolinopsis infundibulum Monosiga brevicollis Salpingoeca pyxidium Salpingoeca rosetta Salpingoeca rosetta Capsaspora owczarzaki Monosiga ovata Ministeria vibrans Monosiga ovata Ministeria vibrans Sphaeroforma arctica Acanthoeca spectabilis Capsaspora owczarzaki Amoebidium parasiticum Salpingoeca pyxidium Allomyces macrogynus Spizellomyces punctatus Capsaspora owczarzaki Mortierella verticillata Neurospora crassa Ministeria vibrans 44 Rhizopus oryzae Aspergillus fumigatus 39 Amoebidium parasiticum 90 Spizellomyces punctatus 70 Saccharomyces cerevisiae Sphaeroforma arctica Sphaeroforma arctica Mortierella verticillata Spizellomyces punctatus Amoebidium parasiticum Rhizopus oryzae 55 Rhizopus oryzae Allomyces macrogynus Mortierella verticillata 0.2 0.4 79 Saccharomyces cerevisiae Neurospora crassa Aspergillus fumigatus Allomyces macrogynus 97 Priapulus caudatus Lithobius forficatus 0.4 Drosophila melanogaster D Tubulanus polymorphus E F Daphnia pulex Hemithiris psittacea Drosophila melanogaster 57 Hemithiris psittacea Lottia gigantea Daphnia pulex Tubulanus polymorphus Capitella teleta Lithobius forficatus Lottia gigantea Priapulus caudatus Priapulus caudatus 58 Capitella teleta Drosophila melanogaster Capitella teleta Saccoglossus kowalevskii Daphnia pulex Lottia gigantea Strongylocentrotus purpuratus Lithobius forficatus Hemithiris psittacea 48 Petromyzon marinus Strongylocentrotus purpuratus 99 Tubulanus polymorphus Homo sapiens Saccoglossus kowalevskii Danio rerio Petromyzon marinus 86 Petromyzon marinus Bolocera tuediae Homo sapiens Danio rerio Aiptasia pallida Danio rerio Homo sapiens 89 65 Hormathia digitata Strongylocentrotus purpuratus Eunicella verrucosa Nematostella vectensis Saccoglossus kowalevskii Nematostella vectensis Acropora digitifera Eunicella verrucosa Bolocera tuediae Platygyra carnosa Acropora digitifera Aiptasia pallida Eunicella verrucosa Hormathia digitata Platygyra carnosa Aurelia aurita Platygyra carnosa Nematostella vectensis Periphylla periphylla Acropora digitifera Bolocera tuediae Abylopsis tetragona Aurelia aurita Aiptasia pallida Craseoa lathetica Periphylla periphylla 99 Hormathia digitata Agalma elegans Hydra viridissima Aurelia aurita Nanomia bijuga Hydra oligactis Periphylla periphylla Physalia physalis Hydra vulgaris Hydra viridissima Hydra oligactis Agalma elegans Hydra vulgaris Hydra vulgaris Nanomia bijuga Hydra oligactis Hydra viridissima Abylopsis tetragona Abylopsis tetragona Trichoplax adhaerens Craseoa lathetica Craseoa lathetica Oscarella carmela Physalia physalis Nanomia bijuga Corticium candelabrum Trichoplax adhaerens Agalma elegans Sycon coactum Corticium candelabrum Physalia physalis Sycon ciliatum Oscarella carmela Trichoplax adhaerens Hyalonema populiferum Sycon coactum Hyalonema populiferum Rossella fibulata Sycon ciliatum Aphrocallistes vastus 95 Sympagella nux Hyalonema populiferum Rossella fibulata Aphrocallistes vastus 94 Sympagella nux Amphimedon queenslandica Sympagella nux Rossella fibulata Petrosia ficiformis 99 Ircinia fasciculata Aphrocallistes vastus Pseudospongosorites suberitoides Chondrilla nucula Chondrilla nucula Latrunculia apicalis Amphimedon queenslandica Ircinia fasciculata Crella elegans Petrosia ficiformis Petrosia ficiformis Kirkpatrickia variolosa Spongilla alba 44 Amphimedon queenslandica Tethya wilhelma Ephydatia muelleri Pseudospongosorites suberitoides Spongilla alba Tethya wilhelma Pseudospongosorites suberitoides 68 Ephydatia muelleri Tethya wilhelma 64 Latrunculia apicalis 98 Chondrilla nucula Kirkpatrickia variolosa 53 Latrunculia apicalis Ircinia fasciculata Crella elegans Crella elegans Euplokamis dunlapae Ephydatia muelleri Kirkpatrickia variolosa Coeloplana astericola Spongilla alba Corticium candelabrum Vallicula sp. Euplokamis dunlapae Oscarella carmela Pleurobrachia bachei Vallicula sp. Sycon coactum Pleurobrachia sp. Coeloplana astericola Sycon ciliatum Dryodora glandiformis Pleurobrachia bachei Euplokamis dunlapae Beroe abyssicola 99 Pleurobrachia sp. Coeloplana astericola 99 Mnemiopsis leidyi Dryodora glandiformis Vallicula sp. 83 Bolinopsis infundibulum 82 Beroe abyssicola Pleurobrachia sp. Mertensiidae sp. Mertensiidae sp. Pleurobrachia bachei Allomyces macrogynus Bolinopsis infundibulum Dryodora glandiformis Rhizopus oryzae Mnemiopsis leidyi 65 Beroe abyssicola Mortierella verticillata Monosiga ovata Mertensiidae sp. Spizellomyces punctatus Salpingoeca rosetta Bolinopsis infundibulum Amoebidium parasiticum Ministeria vibrans 71 Mnemiopsis leidyi Sphaeroforma arctica Capsaspora owczarzaki Ministeria vibrans Monosiga ovata Sphaeroforma arctica Capsaspora owczarzaki Salpingoeca rosetta Amoebidium parasiticum 82 Monosiga ovata Allomyces macrogynus 0.2 Salpingoeca rosetta Rhizopus oryzae Mortierella verticillata 0.2 Spizellomyces punctatus

0.2

Fig. S1. Maximum-likelihood topologies. (A) Full dataset (Fig. 2, dataset 1). (B) Dataset with certain paralogs removed (Fig. 2, dataset 2). (C) Dataset with certain paralogs and taxa with high LB scores removed (Fig. 2, dataset 3). (D) Dataset with certain paralogs and taxa and gene with high LB scores removed (Fig. 2, dataset 4). (E) Dataset with certain paralogs, taxa and genes with high LB scores, and all outgroups except choanoflagellates removed (Fig. 2, dataset 5). (F) Dataset with slowest evolving half of genes after certain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 6). Bootstrap values for each node are 100% unless otherwise noted.

Whelan et al. www.pnas.org/cgi/content/short/1503453112 2of11 83 Hemithiris psittacea Saccharomyces cerevisiae Drosophila melanogaster A Tubulanus polymorphus B Neurospora crassa C Daphnia pulex 54 Lottia gigantea Aspergillus fumigatus Lithobius forficatus 91 Capitella teleta Rhizopus oryzae Priapulus caudatus Priapulus caudatus Mortierella verticillata Hemithiris psittacea Drosophila melanogaster Spizellomyces punctatus Tubulanus polymorphus 94 30 Daphnia pulex Allomyces macrogynus Lottia gigantea Amoebidium parasiticum Lithobius forficatus Capitella teleta 55 Sphaeroforma arctica Petromyzon marinus 98 Saccoglossus kowalevskii Capsaspora owczarzaki Homo sapiens Strongylocentrotus purpuratus Ministeria vibrans Danio rerio 55 Petromyzon marinus 83 Aspergillus fumigatus Strongylocentrotus purpuratus Danio rerio Salpingoeca pyxidium Saccoglossus kowalevskii Homo sapiens 99 Monosiga ovata Eunicella verrucosa Eunicella verrucosa 88 Salpingoeca rosetta Platygyra carnosa Platygyra carnosa Monosiga brevicollis Acropora digitifera Acropora digitifera Euplokamis dunlapae Bolocera tuediae Bolocera tuediae Vallicula sp. Hormathia digitata Aiptasia pallida Coeloplana astericola 97 Aiptasia pallida 93 Hormathia digitata Pleurobrachia sp. Nematostella vectensis Nematostella vectensis Pleurobrachia bachei Physalia physalis Aurelia aurita Dryodora glandiformis Agalma elegans Periphylla periphylla Beroe abyssicola Nanomia bijuga Hydra viridissima Mertensiidae sp. Craseoa lathetica 99 Bolinopsis infundibulum Hydra vulgaris Abylopsis tetragona Mnemiopsis leidyi Hydra oligactis Hydra oligactis Oscarella carmela Nanomia bijuga Hydra vulgaris Corticium candelabrum Hydra viridissima Agalma elegans Sycon coactum Aurelia aurita Craseoa lathetica Sycon ciliatum Periphylla periphylla Abylopsis tetragona Hyalonema populiferum 90 Trichoplax adhaerens Physalia physalis Aphrocallistes vastus Hyalonema populiferum Trichoplax adhaerens Rossella fibulata 99 Aphrocallistes vastus Petrosia ficiformis Sympagella nux Rossella fibulata Amphimedon queenslandica Ircinia fasciculata Sympagella nux Pseudospongosorites suberitoides Chondrilla nucula Amphimedon queenslandica Tethya wilhelma Petrosia ficiformis Petrosia ficiformis Amphimedon queenslandica Crella elegans Spongilla alba 85 Ephydatia muelleri Kirkpatrickia variolosa Ephydatia muelleri Spongilla alba Latrunculia apicalis Pseudospongosorites suberitoides Latrunculia apicalis Spongilla alba Tethya wilhelma Kirkpatrickia variolosa Ephydatia muelleri Latrunculiaapicalis Crella elegans 55 Chondrilla nucula Pseudospongosorites suberitoides 90 Kirkpatrickia variolosa Ircinia fasciculata 66 Tethya wilhelma Crella elegans Hyalonema populiferum Trichoplax adhaerens Chondrilla nucula 97 Aphrocallistes vastus Eunicella verrucosa Ircinia fasciculata 53 Rossella fibulata Platygyra carnosa Corticium candelabrum Sympagella nux Acropora digitifera Oscarella carmela Sycon coactum Nematostella vectensis Sycon coactum Sycon ciliatum Bolocera tuediae Sycon ciliatum Corticium candelabrum Hormathia digitata Euplokamis dunlapae Oscarella carmela Aiptasia pallida Coeloplana astericola Euplokamis dunlapae Craseoa lathetica Vallicula sp. Dryodora glandiformis 99 Coeloplana astericola Abylopsis tetragona Beroe abyssicola Vallicula sp. Agalma elegans Nanomia bijuga Mertensiidae sp. Pleurobrachia bachei 98 Physalia physalis Bolinopsis infundibulum Pleurobrachia sp. 99 Hydra viridissima Mnemiopsis leidyi Dryodora glandiformis Hydra oligactis Pleurobrachia bachei Beroe abyssicola 85 Hydra vulgaris Pleurobrachia sp. Bolinopsis infundibulum 98 Aurelia aurita Salpingoeca rosetta Mertensiidae sp. 99 Periphylla periphylla Monosiga brevicollis Mnemiopsis leidyi Priapulus caudatus Monosiga ovata Monosiga ovata Daphnia pulex Acanthoeca spectabilis Salpingoeca rosetta Drosophila melanogaster Salpingoeca pyxidium 99 Ministeria vibrans Lithobius forficatus Sphaeroforma arctica Capsaspora owczarzaki Capitella teleta Amoebidium parasiticum Sphaeroforma arctica Lottia gigantea 74 Ministeria vibrans Amoebidium parasiticum Tubulanus polymorphus Capsaspora owczarzaki 85 Allomyces macrogynus Hemithiris psittacea Mortierella verticillata Spizellomyces punctatus Saccoglossus kowalevskii Rhizopus oryzae 88 30 Mortierella verticillata Strongylocentrotus purpuratus Allomyces macrogynus Rhizopus oryzae 83 Petromyzon marinus Aspergillus fumigatus Homo sapiens 63 Neurospora crassa 0.2 Danio rerio Saccharomyces cerevisiae Spizellomyces punctatus 0.4 0.3

Priapulus caudatus Saccoglossus kowalevskii D Lithobius forficatus EFStrongylocentrotus purpuratus Drosophila melanogaster Danio rerio Daphnia pulex Homo sapiens Hemithiris psittacea Drosophila melanogaster Petromyzon marinus Tubulanus polymorphus Lithobius forficatus Daphnia pulex Lottia gigantea Drosophila melanogaster Lithobius forficatus 80 Capitella teleta Daphnia pulex Priapulus caudatus Saccoglossus kowalevskii Priapulus caudatus Capitella teleta 79 Strongylocentrotus purpuratus Capitella teleta Petromyzon marinus Lottia gigantea Lottia gigantea Homo sapiens Tubulanus polymorphus Tubulanus polymorphus Danio rerio 99 Hemithiris psittacea Hemithiris psittacea 94 Aiptasia pallida Petromyzon marinus Eunicella verrucosa Hormathia digitata Homo sapiens Nematostella vectensis Bolocera tuediae Danio rerio Bolocera tuediae 89 Nematostella vectensis Strongylocentrotus purpuratus Hormathia digitata 91 Aiptasia pallida Acropora digitifera Saccoglossus kowalevskii Platygyra carnosa Platygyra carnosa Eunicella verrucosa Eunicella verrucosa Acropora digitifera Acropora digitifera Aurelia aurita Aurelia aurita Platygyra carnosa Periphylla periphylla Periphylla periphylla Nematostella vectensis Abylopsis tetragona Hydra viridissima Craseoa lathetica Bolocera tuediae Hydra vulgaris Agalma elegans Aiptasia pallida Hydra oligactis Nanomia bijuga 98 Hormathia digitata Abylopsis tetragona Physalia physalis Aurelia aurita Craseoa lathetica Hydra viridissima 99 Periphylla periphylla Agalma elegans Hydra vulgaris Hydra viridissima Nanomia bijuga Hydra oligactis Hydra vulgaris Physalia physalis Trichoplax adhaerens Trichoplax adhaerens Hydra oligactis Hyalonema populiferum Hyalonema populiferum Abylopsis tetragona Rossella fibulata Rossella fibulata Craseoa lathetica Sympagella nux Sympagella nux Nanomia bijuga Aphrocallistes vastus Aphrocallistes vastus Agalma elegans Amphimedon queenslandica Ircinia fasciculata Physalia physalis Petrosia ficiformis Chondrilla nucula Pseudospongosorites suberitoides Trichoplax adhaerens Amphimedon queenslandica Latrunculia apicalis Hyalonema populiferum Petrosia ficiformis Crella elegans Aphrocallistes vastus Ephydatia muelleri Kirkpatrickia variolosa Rossella fibulata Spongilla alba 61 Pseudospongosorites suberitoides Tethya wilhelma Sympagella nux 99 88 Tethya wilhelma Spongilla alba Ircinia fasciculata Latrunculia apicalis Ephydatia muelleri Chondrilla nucula 50 Kirkpatrickia variolosa 96 Chondrilla nucula Amphimedon queenslandica Crella elegans Ircinia fasciculata Petrosia ficiformis Sycon coactum Corticium candelabrum Spongilla alba Sycon ciliatum Oscarella carmela Ephydatia muelleri Oscarella carmela Sycon ciliatum Pseudospongosorites suberitoides Sycon coactum Corticium candelabrum 82 Euplokamis dunlapae Euplokamis dunlapae Tethya wilhelma Dryodora glandiformis Coeloplana astericola 60 Latrunculia apicalis Beroe abyssicola Vallicula sp. Crella elegans Mnemiopsis leidyi Pleurobrachia bachei Kirkpatrickia variolosa Mertensiidae sp. Pleurobrachia sp. Corticium candelabrum Bolinopsis infundibulum Dryodora glandiformis Oscarella carmela Pleurobrachia bachei Beroe abyssicola Sycon coactum 65 Pleurobrachia sp. Mnemiopsis leidyi Sycon ciliatum 99 Vallicula sp. Bolinopsis infundibulum 54 Euplokamis dunlapae Coeloplana astericola Mertensiidae sp. Coeloplana astericola 99 Acanthoeca spectabilis Monosiga ovata Vallicula sp. Salpingoeca pyxidium Salpingoeca rosetta 99 Pleurobrachia sp. Salpingoeca rosetta Capsaspora owczarzaki Pleurobrachia bachei Monosiga brevicollis Ministeria vibrans 6 Monosiga ovata Amoebidium parasiticum Dryodora glandiformis Ministeria vibrans Sphaeroforma arctica Beroe abyssicola Capsaspora owczarzaki Rhizopus oryzae Mertensiidae sp. Sphaeroforma arctica 95 Mortierella verticillata Bolinopsis infundibulum Amoebidium parasiticum Spizellomyces punctatus Mnemiopsis leidyi Saccharomyces cerevisiae Allomyces macrogynus Monosiga ovata Neurospora crassa Salpingoeca rosetta 59 0.2 Aspergillus fumigatus 0.2 Allomyces macrogynus Spizellomyces punctatus 56 Rhizopus oryzae Mortierella verticillata

0.4

Fig. S2. Maximum-likelihood topologies. (A) Dataset with only genes with the lowest 50% of RCFV values after certain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 7).(B) Dataset with certain paralogs and heterogeneous genes removed (Fig. 2, dataset 8). (C) Dataset with certain paralogs, heterogeneous genes, and genes with high LB scores removed (Fig. 2, dataset 9). (D) Dataset with certain paralogs, heterogeneous genes, and taxa and genes with high LB scores removed (Fig. 2, dataset 10). (E) Dataset with certain paralogs, heterogeneous genes, taxa and genes with high LB scores, and all outgroups except choanoflagellates removed (Fig. 2, dataset 11). (F) Dataset with certain and uncertain paralogs removed (Fig. 2, dataset 12). Bootstrap values for each node are 100% unless otherwise noted.

Whelan et al. www.pnas.org/cgi/content/short/1503453112 3of11 A Strongylocentrotus purpuratus B Saccoglossus kowalevskii C Saccoglossus kowalevskii Saccoglossus kowalevskii Strongylocentrotus purpuratus Strongylocentrotus purpuratus Petromyzon marinus Petromyzon marinus Petromyzon marinus Danio rerio Homo sapiens Danio rerio Homo sapiens Danio rerio Homo sapiens Priapulus caudatus Tubulanus polymorphus 99 Priapulus caudatus 80 73 Lithobius forficatus Hemithiris psittacea 78 98 Drosophila melanogaster Lottia gigantea Drosophila melanogaster Daphnia pulex Capitella teleta Daphnia pulex Lithobius forficatus Priapulus caudatus 99 Hemithiris psittacea Capitella teleta Drosophila melanogaster Tubulanus polymorphus Lottia gigantea Daphnia pulex Lottia gigantea Tubulanus polymorphus Lithobius forficatus Capitella teleta Aiptasia pallida Hemithiris psittacea Eunicella verrucosa 91 Hormathia digitata Aiptasia pallida Nematostella vectensis 87 Bolocera tuediae Bolocera tuediae Hormathia digitata Nematostella vectensis Aiptasia pallida Bolocera tuediae Acropora digitifera 95 Hormathia digitata Nematostella vectensis Platygyra carnosa Platygyra carnosa Acropora digitifera Eunicella verrucosa Acropora digitifera Platygyra carnosa Aurelia aurita Aurelia aurita Eunicella verrucosa Periphylla periphylla Periphylla periphylla Aurelia aurita Abylopsis tetragona 98 Hydra viridissima Craseoa lathetica Periphylla periphylla Hydra oligactis Agalma elegans Physalia physalis Hydra vulgaris Nanomia bijuga Abylopsis tetragona Agalma elegans Physalia physalis Craseoa lathetica Nanomia bijuga Hydra viridissima Nanomia bijuga Abylopsis tetragona Hydra vulgaris Agalma elegans Craseoa lathetica Hydra oligactis Hydra vulgaris Physalia physalis Trichoplax adhaerens Hydra oligactis Trichoplax adhaerens 99 Hyalonema populiferum Hydra viridissima Hyalonema populiferum Rossella fibulata Trichoplax adhaerens Sympagella nux Sympagella nux 98 Ircinia fasciculata Rossella fibulata Aphrocallistes vastus Chondrilla nucula Aphrocallistes vastus Amphimedon queenslandica Petrosia ficiformis 99 Chondrilla nucula Petrosia ficiformis Amphimedon queenslandica Ircinia fasciculata Pseudospongosorites suberitoides Petrosia ficiformis Latrunculia apicalis Spongilla alba Amphimedon queenslandica Crella elegans Ephydatia muelleri Pseudospongosorites suberitoides Pseudospongosorites suberitoides 63 Kirkpatrickia variolosa Tethya wilhelma Tethya wilhelma Tethya wilhelma Latrunculia apicalis Spongilla alba Kirkpatrickia variolosa 99 59 97 51 Kirkpatrickia variolosa Ephydatia muelleri Crella elegans Crella elegans Chondrilla nucula Latrunculia apicalis Ephydatia muelleri Ircinia fasciculata Hyalonema populiferum Spongilla alba Sycon coactum Rossella fibulata Sycon ciliatum 66 Corticium candelabrum Sympagella nux Oscarella carmela Oscarella carmela Aphrocallistes vastus Corticium candelabrum Sycon coactum Corticium candelabrum Euplokamis dunlapae Sycon ciliatum Coeloplana astericola 99 Oscarella carmela Euplokamis dunlapae Vallicula sp. Sycon ciliatum Vallicula sp. Pleurobrachia bachei Sycon coactum Coeloplana astericola Pleurobrachia sp. Euplokamis dunlapae Pleurobrachia bachei Dryodora glandiformis Coeloplana astericola Pleurobrachia sp. Beroe abyssicola Vallicula sp. Dryodora glandiformis 76 Mnemiopsis leidyi Pleurobrachia bachei Beroe abyssicola 93 Bolinopsis infundibulum Pleurobrachia sp. Mertensiidae sp. 99 Mertensiidae sp. Dryodora glandiformis Bolinopsis infundibulum Monosiga ovata Beroe abyssicola Mnemiopsis leidyi Salpingoeca rosetta Mnemiopsis leidyi Monosiga ovata Ministeria vibrans Bolinopsis infundibulum Salpingoeca rosetta Capsaspora owczarzaki Mertensiidae sp. Ministeria vibrans Allomyces macrogynus Capsaspora owczarzaki Rhizopus oryzae Salpingoeca rosetta Sphaeroforma arctica Mortierella verticillata Monosiga ovata 76 Amoebidium parasiticum 93 Spizellomyces punctatus 0.2 Allomyces macrogynus Amoebidium parasiticum Rhizopus oryzae Sphaeroforma arctica Mortierella verticillata 0.2 Spizellomyces punctatus 0.2

Saccoglossus kowalevskii D E F Strongylocentrotus purpuratus Priapulus caudatus 97 Saccoglossus kowalevskii Lithobius forficatus Strongylocentrotus purpuratus Daphnia pulex Petromyzon marinus 98 Strongylocentrotus purpuratus Drosophila melanogaster Saccoglossus kowalevskii Homo sapiens Capitella teleta Petromyzon marinus Danio rerio Lottia gigantea Danio rerio Hemithiris psittacea Hemithiris psittacea 66 Homo sapiens 40 Tubulanus polymorphus Tubulanus polymorphus 83 Lottia gigantea Lottia gigantea Homo sapiens 78 Capitella teleta Capitella teleta Danio rerio Tubulanus polymorphus Priapulus caudatus 99 Petromyzon marinus Eunicella verrucosa 69 Hemithiris psittacea 73 Lithobius forficatus Nematostella vectensis Priapulus caudatus Drosophila melanogaster 99 Drosophila melanogaster Daphnia pulex Bolocera tuediae 60 Daphnia pulex 78 Aiptasia pallida Hormathia digitata 87 Lithobius forficatus Hormathia digitata 99 Aiptasia pallida Nematostella vectensis Bolocera tuediae Acropora digitifera 99 Bolocera tuediae Nematostella vectensis Platygyra carnosa Aiptasia pallida Acropora digitifera Nanomia bijuga 99 99 Agalma elegans 87 Hormathia digitata Platygyra carnosa Abylopsis tetragona Acropora digitifera Eunicella verrucosa Craseoa lathetica Platygyra carnosa Aurelia aurita Physalia physalis Eunicella verrucosa Periphylla periphylla Hydra viridissima Physalia physalis Abylopsis tetragona Hydra vulgaris Nanomia bijuga Craseoa lathetica Hydra oligactis Agalma elegans Agalma elegans 99 Aurelia aurita Abylopsis tetragona Nanomia bijuga Periphylla periphylla Craseoa lathetica Physalia physalis Trichoplax adhaerens Hydra viridissima Hydra viridissima Sycon ciliatum Hydra vulgaris Hydra vulgaris Sycon coactum Hydra oligactis Hydra oligactis Oscarella carmela Aurelia aurita Trichoplax adhaerens Corticium candelabrum Periphylla periphylla Oscarella carmela Hyalonema populiferum Trichoplax adhaerens Corticium candelabrum Aphrocallistes vastus Sycon coactum Sycon coactum Rossella fibulata Sycon ciliatum 97 Sycon ciliatum Sympagella nux Corticium candelabrum Hyalonema populiferum 99 Chondrilla nucula Oscarella carmela 99 Rossella fibulata Ircinia fasciculata Hyalonema populiferum 93 Sympagella nux Petrosia ficiformis 94 Aphrocallistes vastus Aphrocallistes vastus Amphimedon queenslandica Rossella fibulata Amphimedon queenslandica Pseudospongosorites suberitoides Sympagella nux Petrosia ficiformis Kirkpatrickia variolosa Chondrilla nucula Pseudospongosorites suberitoides Crella elegans Ircinia fasciculata Latrunculia apicalis Latrunculia apicalis 32 Petrosia ficiformis Tethya wilhelma Crella elegans Amphimedon queenslandica Spongilla alba Kirkpatrickia variolosa Ephydatia muelleri 45 Ephydatia muelleri Tethya wilhelma Spongilla alba Euplokamis dunlapae Spongilla alba Pseudospongosorites suberitoides Vallicula sp. Ephydatia muelleri Kirkpatrickia variolosa Coeloplana astericola Crella elegans 99 Chondrilla nucula Dryodora glandiformis Latrunculia apicalis Ircinia fasciculata Beroe abyssicola 64 Tethya wilhelma Euplokamis dunlapae Mnemiopsis leidyi Euplokamis dunlapae Coeloplana astericola Mertensiidae sp. Vallicula sp. Vallicula sp. Bolinopsis infundibulum Coeloplana astericola Pleurobrachia bachei 59 Pleurobrachia sp. Pleurobrachia bachei Pleurobrachia sp. Pleurobrachia bachei Pleurobrachia sp. Dryodora glandiformis 88 Acanthoeca spectabilis Bolinopsis infundibulum 75 Beroe abyssicola 88 Salpingoeca pyxidium 62 86 Mertensiidae sp. 87 Mnemiopsis leidyi 1 Monosiga ovata Mnemiopsis leidyi 91 Bolinopsis infundibulum Monosiga brevicollis 62 Beroe abyssicola Mertensiidae sp. Salpingoeca rosetta Dryodora glandiformis Monosiga ovata Capsaspora owczarzaki Salpingoeca rosetta Salpingoeca rosetta Ministeria vibrans Monosiga ovata Capsaspora owczarzaki Sphaeroforma arctica Sphaeroforma arctica Ministeria vibrans Amoebidium parasiticum Amoebidium parasiticum Allomyces macrogynus Mortierella verticillata 59 99 Ministeria vibrans Rhizopus oryzae Rhizopus oryzae Capsaspora owczarzaki 99 Mortierella verticillata Spizellomyces punctatus 75 Saccharomyces cerevisiae Mortierella verticillata Spizellomyces punctatus 84 Neurospora crassa Rhizopus oryzae Amoebidium parasiticum Aspergillus fumigatus Spizellomyces punctatus Sphaeroforma arctica 71 Allomyces macrogynus Allomyces macrogynus 0.2 0.2 0.4

Fig. S3. Maximum-likelihood topologies. (A) Dataset with certain and uncertain paralogs and taxa with high LB scores removed (Fig. 2, dataset 13). (B) Dataset with certain and uncertain paralogs and taxa and gene with high LB scores removed (Fig. 2, dataset 14). (C) Dataset with certain and uncertain paralogs, taxa and genes with high LB scores, and all outgroups except choanoflagellates removed (Fig. 2, dataset 15). (D) Dataset with slowest evolving half of genes after certain and uncertain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 16). (E) Dataset with only genes with the lowest 50% of RCFV values after certain and uncertain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 17). (F) Dataset with certain and uncertain paralogs and heterogeneous genes removed (Fig. 2, dataset 18). Bootstrap values for each node are 100% unless otherwise noted.

Whelan et al. www.pnas.org/cgi/content/short/1503453112 4of11 Saccoglossus kowalevskii Strongylocentrotus purpuratus Strongylocentrotus purpuratus Saccoglossus kowalevskii A Eunicella verrucosa B Petromyzon marinus C Acropora digitifera Homo sapiens Petromyzon marinus Platygyra carnosa Danio rerio Homo sapiens Nematostella vectensis Priapulus caudatus Danio rerio 98 Bolocera tuediae 70 Lithobius forficatus Drosophila melanogaster Aiptasia pallida Drosophila melanogaster Daphnia pulex 84 73 Hormathia digitata Daphnia pulex 98 Aurelia aurita Lithobius forficatus 99 Hemithiris psittacea Periphylla periphylla Priapulus caudatus Tubulanus polymorphus Physalia physalis 99 Tubulanus polymorphus Lottia gigantea Abylopsis tetragona Hemithiris psittacea Craseoa lathetica Capitella teleta Capitella teleta Agalma elegans 85 Aiptasia pallida Lottia gigantea Nanomia bijuga Hormathia digitata Hydra viridissima Bolocera tuediae Eunicella verrucosa Hydra oligactis Nematostella vectensis Acropora digitifera Hydra vulgaris Acropora digitifera Platygyra carnosa Saccoglossus kowalevskii Platygyra carnosa Nematostella vectensis Strongylocentrotus purpuratus Eunicella verrucosa Bolocera tuediae Petromyzon marinus Aurelia aurita Aiptasia pallida Danio rerio Periphylla periphylla 87 Homo sapiens Abylopsis tetragona Hormathia digitata Priapulus caudatus Craseoa lathetica 97 Aurelia aurita 68 93 Drosophila melanogaster Agalma elegans Periphylla periphylla Daphnia pulex Nanomia bijuga Hydra viridissima Lithobius forficatus Physalia physalis Capitella teleta Hydra vulgaris Hydra viridissima Lottia gigantea Hydra oligactis Tubulanus polymorphus Hydra vulgaris Abylopsis tetragona Hydra oligactis Hemithiris psittacea Craseoa lathetica Trichoplax adhaerens Trichoplax adhaerens Nanomia bijuga Oscarella carmela Amphimedon queenslandica Agalma elegans Corticium candelabrum Petrosia ficiformis Sycon coactum Pseudospongosorites suberitoides Physalia physalis Sycon ciliatum Latrunculia apicalis Trichoplax adhaerens Hyalonema populiferum Crella elegans Hyalonema populiferum Rossella fibulata 70 Kirkpatrickia variolosa Aphrocallistes vastus Sympagella nux 96 Tethya wilhelma Rossella fibulata Aphrocallistes vastus Spongilla alba Sympagella nux Spongilla alba Ephydatia muelleri 99 Ircinia fasciculata Ephydatia muelleri Chondrilla nucula Pseudospongosorites suberitoides Ircinia fasciculata Chondrilla nucula Latrunculia apicalis Hyalonema populiferum Amphimedon queenslandica Crella elegans Rossella fibulata Petrosia ficiformis Kirkpatrickia variolosa 53 Sympagella nux Spongilla alba Tethya wilhelma Amphimedon queenslandica Aphrocallistes vastus Ephydatia muelleri Petrosia ficiformis Sycon coactum Pseudospongosorites suberitoides 77 Chondrilla nucula Sycon ciliatum Tethya wilhelma Oscarella carmela Ircinia fasciculata Latrunculia apicalis Euplokamis dunlapae Corticium candelabrum 54 Crella elegans Pleurobrachia bachei Euplokamis dunlapae Pleurobrachia sp. Coeloplana astericola Kirkpatrickia variolosa Beroe abyssicola Sycon coactum 99 Vallicula sp. Mnemiopsis leidyi Pleurobrachia bachei 99 Sycon ciliatum Bolinopsis infundibulum Pleurobrachia sp. Corticium candelabrum Mertensiidae sp. Dryodora glandiformis Oscarella carmela Dryodora glandiformis Beroe abyssicola 79 Euplokamis dunlapae Coeloplana astericola Mnemiopsis leidyi Vallicula sp. Coeloplana astericola 96 Bolinopsis infundibulum Monosiga ovata Vallicula sp. 74 Mertensiidae sp. Monosiga brevicollis Monosiga ovata Dryodora glandiformis 96 Salpingoeca rosetta Salpingoeca rosetta Beroe abyssicola 93 Acanthoeca spectabilis Mertensiidae sp. Salpingoeca pyxidium Ministeria vibrans Ministeria vibrans Capsaspora owczarzaki Bolinopsis infundibulum 97 Capsaspora owczarzaki Amoebidium parasiticum Mnemiopsis leidyi 57 Amoebidium parasiticum 79 Sphaeroforma arctica Pleurobrachia sp. Sphaeroforma arctica Allomyces macrogynus Pleurobrachia bachei Rhizopus oryzae 70 Spizellomyces punctatus Monosiga ovata Rhizopus oryzae Mortierella verticillata Salpingoeca rosetta Mortierella verticillata Spizellomyces punctatus Saccharomyces cerevisiae 0.2 Neurospora crassa 0.2 Aspergillus fumigatus 78 Allomyces macrogynus

0.3

D E Priapulus caudatus F Saccoglossus kowalevskii Drosophila melanogaster Strongylocentrotus purpuratus Saccoglossus kowalevskii Daphnia pulex Danio rerio 86 Strongylocentrotus purpuratus Danio rerio Lithobius forficatus Homo sapiens Homo sapiens Lottia gigantea Petromyzon marinus Petromyzon marinus Capitella teleta Priapulus caudatus Priapulus caudatus 99 Drosophila melanogaster Drosophila melanogaster Hemithiris psittacea 81 Daphnia pulex Tubulanus polymorphus Daphnia pulex Lithobius forficatus Saccoglossus kowalevskii Lithobius forficatus Lottia gigantea Strongylocentrotus purpuratus Lottia gigantea Capitella teleta Capitella teleta Hemithiris psittacea 88 Danio rerio 99 Tubulanus polymorphus Homo sapiens 99 Hemithiris psittacea Eunicella verrucosa Petromyzon marinus Tubulanus polymorphus Acropora digitifera Eunicella verrucosa Platygyra carnosa 99 Eunicella verrucosa Acropora digitifera Bolocera tuediae Acropora digitifera Aiptasia pallida Platygyra carnosa Platygyra carnosa 97 Hormathia digitata Bolocera tuediae Bolocera tuediae Nematostella vectensis Aiptasia pallida Hydra viridissima Aiptasia pallida 86 Hormathia digitata 71 Hydra vulgaris 99 Hormathia digitata Nematostella vectensis Hydra oligactis Nematostella vectensis Physalia physalis Hydra viridissima Hydra viridissima 63 Abylopsis tetragona Hydra vulgaris Craseoa lathetica Hydra vulgaris 69 Hydra oligactis Nanomia bijuga Hydra oligactis Physalia physalis Agalma elegans Physalia physalis Aurelia aurita Abylopsis tetragona Abylopsis tetragona Periphylla periphylla Craseoa lathetica Trichoplax adhaerens Craseoa lathetica Nanomia bijuga Hyalonema populiferum Nanomia bijuga Aphrocallistes vastus Agalma elegans Agalma elegans Rossella fibulata Aurelia aurita Aurelia aurita Sympagella nux Periphylla periphylla Chondrilla nucula Periphylla periphylla Trichoplax adhaerens Ircinia fasciculata Trichoplax adhaerens Petrosia ficiformis 98 Chondrilla nucula Hyalonema populiferum Amphimedon queenslandica Ircinia fasciculata Pseudospongosorites suberitoides Aphrocallistes vastus Petrosia ficiformis Latrunculia apicalis Rossella fibulata Amphimedon queenslandica Kirkpatrickia variolosa 84 Sympagella nux Pseudospongosorites suberitoides 44 Crella elegans Tethya wilhelma Ircinia fasciculata Latrunculia apicalis Spongilla alba Chondrilla nucula Kirkpatrickia variolosa Ephydatia muelleri Petrosia ficiformis 99 Crella elegans Sycon coactum 51 Amphimedon queenslandica Sycon ciliatum Tethya wilhelma Corticium candelabrum Pseudospongosorites suberitoides Spongilla alba Oscarella carmela Latrunculia apicalis Ephydatia muelleri Vallicula sp. Kirkpatrickia variolosa Hyalonema populiferum Coeloplana astericola 92 Crella elegans Aphrocallistes vastus Pleurobrachia bachei 57 92 Pleurobrachia sp. Tethya wilhelma Rossella fibulata Beroe abyssicola Spongilla alba Sympagella nux Bolinopsis infundibulum Ephydatia muelleri Corticium candelabrum Mertensiidae sp. Mnemiopsis leidyi Sycon coactum 98 Oscarella carmela Dryodora glandiformis 95 Sycon ciliatum Sycon coactum Euplokamis dunlapae Corticium candelabrum Sycon ciliatum 0.2 Oscarella carmela Vallicula sp. Vallicula sp. Coeloplana astericola Coeloplana astericola Pleurobrachia bachei Pleurobrachia bachei Pleurobrachia sp. Pleurobrachia sp. Beroe abyssicola Beroe abyssicola Bolinopsis infundibulum Bolinopsis infundibulum Mertensiidae sp. Mertensiidae sp. Mnemiopsis leidyi Mnemiopsis leidyi Dryodora glandiformis Dryodora glandiformis Euplokamis dunlapae Euplokamis dunlapae 0.2 0.2

Fig. S4. Maximum-likelihood topologies. (A) Dataset with certain and uncertain paralogs, heterogeneous genes, and genes with high LB scores removed (Fig. 2, dataset 19). (B) Dataset with certain and uncertain paralogs, heterogeneous genes, and taxa and genes with high LB scores removed (Fig. 2, dataset 20). (C) Dataset with certain and uncertain” paralogs, heterogeneous genes, taxa and genes with high LB scores, and all outgroups except choanoflagellates removed (Fig. 2, dataset 21). (D) Dataset with certain paralogs, taxa and genes with high LB scores, and all outgroups removed (Fig. 2, dataset 22). (E) Dataset with certain paralogs, heterogeneous genes, taxa and genes with high LB scores, and all outgroups removed (Fig. 2, dataset 23). (F) Dataset with certain and uncertain paralogs, taxa and genes with high LB scores, and all outgroups removed (Fig. 2, dataset 24). Bootstrap values for each node are 100% unless otherwise noted.

Whelan et al. www.pnas.org/cgi/content/short/1503453112 5of11 A B C Saccoglossus kowalevskii Petromyzon marinus Strongylocentrotus purpuratus Strongylocentrotus purpuratus Homo sapiens Saccoglossus kowalevskii Danio rerio Danio rerio Petromyzon marinus Homo sapiens Homo sapiens Strongylocentrotus purpuratus Danio rerio Petromyzon marinus Saccoglossus kowalevskii 0.68 Priapulus caudatus Priapulus caudatus Priapulus caudatus Lithobius forficatus 98 0.83 Lithobius forficatus 92 Drosophila melanogaster Daphnia pulex Daphnia pulex Daphnia pulex Drosophila melanogaster Drosophila melanogaster Lithobius forficatus Tubulanus polymorphus Tubulanus polymorphus Lottia gigantea Hemithiris psittacea Hemithiris psittacea Capitella teleta Lottia gigantea Lottia gigantea Hemithiris psittacea Capitella teleta Capitella teleta Tubulanus polymorphus Eunicella verrucosa Eunicella verrucosa Eunicella verrucosa Platygyra carnosa Platygyra carnosa Acropora digitifera Acropora digitifera Acropora digitifera Nematostella vectensis Platygyra carnosa Nematostella vectensis Bolocera tuediae Bolocera tuediae Bolocera tuediae Hormathia digitata Hormathia digitata Aiptasia pallida 0.98 Aiptasia pallida Aiptasia pallida 85 Hormathia digitata Hydra viridissima Hydra viridissima Nematostella vectensis Hydra vulgaris Hydra vulgaris 55 Hydra viridissima Hydra oligactis Hydra oligactis Hydra vulgaris Physalia physalis Physalia physalis Hydra oligactis Craseoa lathetica Craseoa lathetica Physalia physalis Abylopsis tetragona Abylopsis tetragona Abylopsis tetragona Nanomia bijuga Nanomia bijuga Craseoa lathetica Agalma elegans Agalma elegans Aurelia aurita Aurelia aurita Nanomia bijuga Periphylla periphylla Periphylla periphylla Agalma elegans Trichoplax adhaerens Trichoplax adhaerens Aurelia aurita Sycon ciliatum Sycon ciliatum Periphylla periphylla Sycon coactum Sycon coactum Trichoplax adhaerens Oscarella carmela Oscarella carmela Chondrilla nucula Corticium candelabrum Corticium candelabrum Ircinia fasciculata Hyalonema populiferum Hyalonema populiferum Petrosia ficiformis Sympagella nux Sympagella nux Amphimedon queenslandica Rossella fibulata Rossella fibulata Pseudospongosorites suberitoides Aphrocallistes vastus Aphrocallistes vastus Latrunculia apicalis Ircinia fasciculata Ircinia fasciculata Chondrilla nucula Kirkpatrickia variolosa Chondrilla nucula Spongilla alba Spongilla alba Crella elegans 61 Ephydatia muelleri Ephydatia muelleri Tethya wilhelma Pseudospongosorites suberitoides Pseudospongosorites suberitoides Spongilla alba Tethya wilhelma Tethya wilhelma Ephydatia muelleri Latrunculia apicalis 0.69 0.66 Latrunculia apicalis Hyalonema populiferum Kirkpatrickia variolosa Kirkpatrickia variolosa 96 Aphrocallistes vastus Crella elegans Crella elegans Rossella fibulata Petrosia ficiformis Petrosia ficiformis Sympagella nux Amphimedon queenslandica Amphimedon queenslandica Corticium candelabrum Euplokamis dunlapae Euplokamis dunlapae Vallicula sp. 99 Oscarella carmela Vallicula sp. Coeloplana astericola Sycon coactum Coeloplana astericola Pleurobrachia bachei Pleurobrachia bachei Sycon ciliatum Pleurobrachia sp. Pleurobrachia sp. Vallicula sp. Dryodora glandiformis Dryodora glandiformis Coeloplana astericola 0.87 Mnemiopsis leidyi 0.99 Mertensiidae sp. Pleurobrachia bachei Mertensiidae sp. Bolinopsis infundibulum Pleurobrachia sp. Bolinopsis infundibulum 0.84 Mnemiopsis leidyi Beroe abyssicola Beroe abyssicola Beroe abyssicola 99 Bolinopsis infundibulum Salpingoeca rosetta Salpingoeca rosetta Mertensiidae sp. Monosiga ovata Monosiga ovata 98 Mnemiopsis leidyi Sphaeroforma arctica Sphaeroforma arctica Amoebidium parasiticum Dryodora glandiformis Amoebidium parasiticum Ministeria vibrans Euplokamis dunlapae Ministeria vibrans Capsaspora owczarzaki Capsaspora owczarzaki Spizellomyces punctatus 0.2 Spizellomyces punctatus Allomyces macrogynus Allomyces macrogynus Rhizopus oryzae Rhizopus oryzae Mortierella verticillata Mortierella verticillata

0.3 0.3 D E 0.55 Xenoturbella bocki F Strongylocentrotus purpuratus 120 Saccoglossus kowalevskii Oscarella Petromyzon marinus Calcarea 72 Heterochone calyx Danio rerio 100 Oopsacas minuta Homoscleromorpha 0.99 Molgula tectigormis 86 Amphimedon queenslandica 42 Ciona intestinalis 95 81 Branchiostoma 80 Ephydatia Tubifex tubifex Carteriospongia foliascens Helobdella Trichoplax adhaerens Capitella sp. Euprymna 60 Pedicellin cernua 86 85 Crassostrea Density 0.96 Euprymna 99 Aplysia californica 43 Pedicellina cernua Crassostrea 40 Capitella sp. Aplysia californica Euperipatoides Tubifex tubifex 0.61 Helobdella Pediculus humanus 20 29 Euperipatoides Nasonia vitripennis 77 Ixodes scapularis Daphnia 57 Anoplodactylus eroticus Ixodes scapularis 0 Scutigera coleoptera 0.99 Scutigera coleoptera 47 Nasonia vitripennis 0.70 Anoplodactylus eroticus Pediculus humanus Cyanea capillata 0.97 0.98 0.99 1.00 Daphnia Hydra magnipapillata 43 Xenoturbella bocki Podocoryne carnea G 89 Strongylocentrotus purpuratus Hydractini echinata Saccoglossus kowalevskii Clytia hemisphaerica 50 Petromyzon marinus 0.90 Nematostella vectensis Trichoplax adhaerens 55 78 100 Danio rerio Metridium senile 93 50 Molgula tectiformis Acropora Ciona intestinalis Montastrae faveolata 40 Branchiostoma Pleurobrachia pileus Cyanea capillata Metrinsiidae sp. Hydra magnipapillata 0.94 Mnemiopsis leiydi Podocoryne carnea 30 93 Trichoplax adhaerens 98 Hydractini echinata Sycon raphanus Clytia hemisphaerica Leucetta chagosensis

Nematostella vectensis Oscarella Density 20 Metridium senile 0.99 Oopsacas minuta 99 Montastrae faveolata Acropora 0.62 Heterochone calyx Leucetta chagosensis Amphimedon queenslandica 10 Sycon raphanus Suberites Pleurobracia pileus 0.90 Ephydatia 76 Mnemiopsis leiydi Carteriospongia foliascens Metridium senile Monosiga ovata 0 Monosiga brevicollis Proterospongia sp. Proterospongia sp. Monosiga brevicollis 0.92 0.94 0.96 0.98 1.00 Monosiga ovata Capsaspora owczarzaki 99 Capsaspora owczarzaki Sphaeroforma arctica Average Leaf Stability Index Amoebidium parasiticum Amoebidium parasiticum Sphaeroforma arctica Rhizopus orizae Phycomyces blakesleeanus Phycomyces blakesleeanus 80 Rhizopus orizae Allomyces macrogynus Allomyces macrogynus Spizellomyces punctatus Batrachochochytrium dendrobatidis Batrachochytrium dendrobatidis Spizellomyces punctatus 0.2 0.07

Fig. S5. (A) Maximum-likelihood topology of dataset with certain and uncertain paralogs, heterogeneous genes, taxa and genes with high LB scores, and all outgroups removed (Fig. 2, dataset 25). (B) Bayesian inference topology of dataset of slowest evolving half of genes after certain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 6). (C) Bayesian inference topology of dataset of slowest evolving half of genes after certain and uncertain paralogs and taxa and genes with high LB scores were removed (Fig. 2, dataset 16). (D) Maximum-likelihood phylogeny of Philippe et al. (1) dataset with ribosomal protein genes removed. Chimeric taxa are labeled only by their name as in the original publication. (E) Bayesian inference topology of Philippe et al. (1) dataset with ribosomal protein genes removed. Chimeric taxa are labeled only by their genus name as in the original publication. Bootstrap values or posterior probabilities for each node are 100% unless otherwise noted. (F) Density plot of average leaf stability indices for each taxon across all datasets. (G) Density plot of average leaf stability indices for each taxon in datasets without outgroups.

1. Philippe H, et al. (2009) Phylogenomics revives traditional views on deep animal relationships. Curr Biol 19(8):706–712.

Whelan et al. www.pnas.org/cgi/content/short/1503453112 6of11 A

6 taxa

B

10 genes

C

18 genes

Fig. S6. Density plots of (A) average LB scores for each taxon, (B) average upper quartile LB score for each OG, and (C) SD of LB score for each OG. Shaded areas include taxa or genes that were considered to have “high” LB scores and were trimmed from respective datasets (see Fig. 2).

Whelan et al. www.pnas.org/cgi/content/short/1503453112 7of11 Dataset 3 Dataset 2 Dataset 3 Dataset 2 Dataset 4 Dataset 8 Dataset 4 Dataset 8 Dataset 6 Dataset 9 Dataset 6 Dataset 9 Dataset 7 Dataset 10 Dataset 7 Dataset 10 Density Density 0123456 0123456 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 −0.1 0.0 0.1 0.2 0.3 0.4 0.0 0.1 0.2 0.3 0.4 −0.2 0.0 0.2 0.4 0.6 0.8 1.0 −0.2 0.0 0.2 0.4 0.6 0.8 1.0

Dataset 13 Dataset 12 Dataset 13 Dataset 12 Dataset 14 Dataset 18 Dataset 14 Dataset 18 Dataset 16 Dataset 19 Dataset 16 Dataset 19 Dataset 17 Dataset 20 Dataset 17 Dataset 20 Density Density 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 0123456 0123456 −0.2 0.0 0.2 0.4 0.6 0.8 1.0 −0.2 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.1 0.2 0.3 0.4 0.0 0.1 0.2 0.3 R 2 of Linear Regression Slope of Linear Regression Nosenko Ribosomal Nosenko Ribosomal Nosenko Non-Ribosomal Nosenko Non-Ribosomal Philippe et al. Ribosomal Philippe et al. Ribosomal Philippe et al. Non-Ribosomal Philippe et al. Non-Ribosomal Density Density 0.0 0.5 1.0 1.5 2.0 2.5

012345 −0.2 0.0 0.2 0.4 0.6 0.8 1.0 −0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 R 2 of Linear Regression Slope of Linear Regression

Fig. S7. Density plots of gene specific slope and R2 of linear regression between patristic and uncorrected pairwise distances for each dataset. Datasets with higher values for each metric are less saturated. Dataset numbers are referenced as in Fig. 2.

Whelan et al. www.pnas.org/cgi/content/short/1503453112 8of11 Table S1. Raw data for each species used in orthology searches Sequence No. of reads, transcripts Source or Species type or predicted proteins collection locality NCBI or other accession

Fungi Allomyces macrogynus Whole genome 19,446 proteins Broad Institute www.broadinstitute.org/annotation/genome/ multicellularity_project/MultiHome.html Aspergillus fumigatus† Whole genome 11,485 proteins InParanoid database Mortierella verticillata Whole genome 12,569 proteins Broad www.broadinstitute.org/annotation/genome/ multicellularity_project/MultiHome.html Neurospora crassa† Whole genome 9,822 proteins InParanoid database Rhizopus oryzae Whole genome 16,971 proteins InParanoid database Saccharomyces cerevisiae* Whole genome 6,590 proteins InParanoid database Spizellomyces punctatus Whole genome 9,424 proteins Broad Institute www.broadinstitute.org/annotation/genome/ multicellularity_project/MultiHome.html Ichthyosporea Amoebidium parasiticum Illumina 105,237 transcripts Broad Institute www.broadinstitute.org/annotation/genome/ multicellularity_project/MultiHome.html Capsaspora owczarzaki Whole genome 101,23 proteins Broad Institute www.broadinstitute.org/annotation/genome/ multicellularity_project/MultiHome.html Ministeria vibrans Illumina 19,953,296 NCBI SRA SRR343051 Sphaeroforma arctica Whole genome 18,730 proteins Broad Institute www.broadinstitute.org/annotation/genome/ multicellularity_project/MultiHome.html Choanoflagellata Acanthoeca spectabilis* Illumina 41,408,758 reads ATCC SRR1915695 Monosiga brevicollis* Whole genome 10336 proteins Joint Genome Institute Monosiga ovata Sanger 29,495 sequences dbEST Salpingoeca pyxidium* Illumina 34,177,888 reads ATCC SRR1915694 Salpingoeca rosetta Whole genome 11,731 proteins Broad Institute www.broadinstitute.org/annotation/genome/ multicellularity_project/MultiHome.html Placozoa Trichoplax adhaerans Whole genome 11,179 proteins Joint Genome Institute Cnidaria Acropora digitifera Whole genome 33,366 proteins OIST Marine genomics Unit Abylopsis tetragona Illumina 43150352 reads NCBI SRA SRR871525 Bolocera tuediae 454 546,903 reads NCBI SRA SRR504347 Agalma elegans Illumina 107,996,364 reads NCBI SRA SRR871526 Nanomia bijuga Illumina 105,066,886 reads NCBI SRA SRR871527 Aiptasia pallida Illumina 536616844 reads NCBI SRA SRR696721; SRR696732; SRR696745 Aurelia aurita 454 1,099,012 reads NCBI SRA SRR040475; SRR040476; SRR040477; (strain Roscoff) SRR040478; SRR040479 Eunicella verrucosa Illumina 70,071,835 reads NCBI SRA SRR1324944; SRR1324945 Hydra oligactis 454 1,017,333 reads NCBI SRA SRR040466; SRR040467; SRR040468; SRR040469 Hydra viridissima 454 1,048,810 reads NCBI SRA SRR040470; SRR040471; SRR040472; SRR040473 Hormathia digitata 454 546,846 reads NCBI SRA SRR504348 Hydra vulgaris Sanger 167,982 sequences dbEST Nematostella vectensis Whole genome 27,273 proteins Joint Genome Institute Periphylla periphylla Illumina 44, 443,566 45°55.33N 125°05.47 W SRR1915828 Physalia physalis Illumina 72,963,546 reads NCBI SRA SRR871528 Platygyra carnosus Illumina 72,963,546 reads NCBI SRA SRR402974; SRR402975 Craseoa lathetica Illumina 76,466,398 reads NCBI SRA SRR871529 Porifera Amphimedon Sanger 63,542 sequences dbEST queenslandica Aphrocallistes vastus Illumina preassembled Evolution & Research Archive Chondrilla nucula Illumina preassembled Dataverse Network (1) Corticium candelabrum Illumina preassembled Dataverse Network (1) Crella elegans Illumina preassembled Dryad doi: 10.5061/dryad.50dc6/3 Ephydatia muelleri Illumina 156,515,380 reads NCBI SRA SRR1041944 Hyalonema populiferum Illumina 62,031,394 reads 43°55.376N 127°24.65W SRR1916923 Ircinia fasciculata Illumina preassembled Dataverse Network (1) Kirkpatrickia variolosa Illumina 59,234,424 reads 77°54.228S 170°57.915E SRR1916957

Whelan et al. www.pnas.org/cgi/content/short/1503453112 9of11 Table S1. Cont. Sequence No. of reads, transcripts Source or Species type or predicted proteins collection locality NCBI or other accession

Latrunculia apicalis Illumina 49,432,908 reads 82°54.228S 175°57.915 E SRR1915755 Oscarella carmela Illumina Preassembled www.compagen.org Petrosia ficiformis Illumina preassembled Dataverse Network (1) Pseudospongosorites Illumina preassembled Dataverse Network (1) suberitoides Rossella fibulata Illumina 79,607,548 reads 76°14.716S 174°30.247E SRR1915835 Spongilla alba Illumina preassembled Dataverse Network (1) Sycon coactum Illumina preassembled Dataverse Network (1) Sympagella nux Illumina 30,065,060 reads 28°10.26N 89°56.80W SRR1916581 Tethya wilhelma 454 442,949 reads NCBI SRA ERR216193 Ctenophora Beroe abyssicola Illumina 45,444,644 reads NCBI SRA SRR777787 Bolinopsis infundibulum Illumina 43,377,170 reads NCBI SRA SRR786491 Coeloplana astericola Illumina 41,685,220 reads NCBI SRA SRR786490 Dryodora glandiformis Illumina 41,269,166 reads NCBI SRA SRR777788 Euplokamis dunlapae Illumina 68,302,698 reads NCBI SRA SRR777663 Mertensiidae sp. Illumina 47,454,246 reads NCBI SRA SRR786492 Mnemiopsis leidyi Illumina 49,782,900 reads NCBI SRA SRR789900 Pleurobrachia bachei Whole genome 80,151 proteins neurobase.rc.ufl.edu/ pleurobrachia Pleurobrachia sp. Illumina 50626422 reads NCBI SRA SRR789901 Vallicula sp. Illumina 49,090,372 reads NCBI SRA SRR786489 Bilateria Homo sapiens Whole genome N/A HaMStR core orthologs Danio rerio Whole genome 54,390 proteins EnSEMBL Petromyzon_marinus Whole genome 12,444 proteins EnSEMBL Saccoglossus kowalevskii Sanger 202,190 sequences dbEST Strongylocentrotus Whole genome 28,474 proteins InParanoid purpuratus Capitella teleta Whole genome 32,415 proteins Joint Genome Institute Hemithiris psittacea Illumina 60,730,022 reads NCBI SRA SRR1611556 Lottia gigantea Whole genome 22,899 proteins Joint Genome Institute Tubulanus polymorphus Illumina 39,262,732 reads NCBI SRA SRR1611583 Daphnia pulex Whole genome 30,137 proteins Joint Genome Institute Drosophila melanogaster Whole genome N/A HaMStR core orthologs Priapulus caudatus Illumina 57,331,982 reads NCBI SRA SRR1611567 Lithobius forficatus Illumina 57,098,550 reads NCBI SRA SRR1159752

Taxa in bold were sequenced in this study. *Taxa identified as possibly causing LBA based on LB scores.

1. Riesgo A, Farrar N, Windsor PJ, Giribet G, Leys SP (2014) The analysis of eight transcriptomes from all Poriferan classes reveals surprising genetic complexity in sponges. Mol Biol Evol 31(5):1102–1120.

Whelan et al. www.pnas.org/cgi/content/short/1503453112 10 of 11 Table S2. Data statistics for each phylogenetic dataset Missing data Dataset (no.) Taxa Genes Sites Gene occupancy, % (including gaps), %

Full dataset (1) 76 251 81,008 75 40 Certain paralogs removed (2) 76 250 80,735 75 40 Taxa with high LB scores removed (3) 70 250 80,735 75 41 Genes with high LB scores removed (4) 70 231 73,323 75 41 All outgroups except Choanoflagellates removed (5) 62 231 73,323 73 44 All outgroups removed (6) 60 231 73,323 73 44 Slowest evolving half of genes (7) 70 115 33,403 80 35 Genes with lowest half of RCFV values (8) 70 115 41,933 81 43 Heterogeneous genes removed (9) 76 228 66,571 76 38 Genes with high LB scores removed (10) 76 210 59,733 75 38 Taxa with high LB scores removed (11) 70 210 59,733 82 38 All outgroups except Choanoflagellates removed (12) 62 210 59,733 74 40 All outgroups removed (13) 60 210 59,733 75 40 Certain and uncertain paralogs removed (14) 76 209 60,768 72 41 Taxa with high LB scores removed (15) 70 209 60,768 78 42 Genes with high LB scores removed (16) 70 191 54,193 78 42 All outgroups except Choanoflagellates removed (17) 62 191 54,193 70 44 All outgroups removed (18) 60 191 54,193 71 44 Slowest evolving half of genes (19) 70 89 23,680 78 35 Genes with lowest half of RCFV values (20) 70 96 33,069 80 40 Heterogeneous genes removed (21) 76 195 52,543 73 39 Genes with high LB scores removed (22) 76 178 46,542 73 39 Taxa with high LB scores removed (23) 70 178 46,542 79 34 All outgroups except Choanoflagellates removed (24) 62 178 46,542 71 41 All outgroups removed (25) 60 178 46,542 72 41

Datasets nested in others have the same data filtered as corresponding upper-level datasets (see Fig. 2).

Whelan et al. www.pnas.org/cgi/content/short/1503453112 11 of 11