<<

bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Lack of support for Deuterostomia prompts reinterpretation of the first .

2

3 Authors and Affiliations

4 Paschalia Kapli1, Paschalis Natsidis1, Daniel J. Leite1, Maximilian Fursman1,&, Nadia Jeffrie1, 5 Imran A. Rahman2, Hervé Philippe3, Richard R. Copley4, Maximilian J. Telford1*

6 7 1. Centre for ’s Origins and Evolution, Department of Genetics, Evolution and Environment, 8 University College London, Gower Street, London WC1E 6BT, UK 9 10 2. Oxford University Museum of Natural History, Parks Road, Oxford OX1 3PW, UK

11 3. Station d’Ecologie Théorique et Expérimentale, UMR CNRS 5321, Université Paul Sabatier, 12 09200 Moulis, France

13 4. Laboratoire de Biologie du Développement de Villefranche-sur-mer (LBDV), Sorbonne 14 Université, CNRS, 06230 Villefranche-sur-mer, France

15 & Current Address: Geology and Palaeoenvironmental Research Department, Institute of 16 Geosciences, Goethe University Frankfurt - Riedberg Campus, Altenhöferallee 1, 60438 17 Frankfurt am Main

18

19

20

21

22

23 *Correspondence: [email protected].

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

24 Abstract

25 The bilaterally symmetric (Bilateria) are considered to comprise two monophyletic 26 groups, Protostomia and Deuterostomia. Protostomia contains the and the 27 ; Deuterostomia contains the Chordata and the (Hemichordata, 28 Echinodermata and ). Their names refer to a supposed distinct origin of the 29 (stoma) in the two , but these groups have been differentiated by other 30 embryological characters including embryonic patterns and different ways of forming 31 their and . is not consistently supported by recent 32 studies. Here we compare support for Protostomia and Deuterostomia using five recently 33 published, phylogenomic datasets. Protostomia is always strongly supported, especially by 34 longer and higher quality genes. Support for Deuterostomia is always equivocal and barely 35 higher than support for paraphyletic alternatives. Conditions that can cause tree reconstruction 36 errors - inadequate models, short internal branch, faster evolving genes, and unequal branch 37 lengths - correlate with statistical support for monophyletic . Simulation 38 experiments show that support for Deuterostomia could be explained by systematic error. A 39 survey of molecular characters supposedly diagnostic of deuterostomes shows many are not valid 40 synapomorphies. The branch between bilaterian and deuterostome common ancestors, if real, is 41 very short. This finding fits with growing evidence suggesting the common ancestor of all 42 Bilateria had many deuterostome characteristics. This finding has important implications for our 43 understanding of early evolution and for the interpretation of some enigmatic 44 such as vetulicolians and banffiids. 45

46

47

48

49

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

50 Main 51 52 In the case of a burst of diversification such as is apparent at the base of the Cambrian, 53 reconstructing the evolutionary history of taxa is difficult. Limited time between 54 (short internal branches) lead to the accumulation of a small amount of phylogenetic signal and 55 this signal is easily blurred by the numerous substitutions occurring later (long terminal 56 branches). These problems are exacerbated by heterogeneities in the evolutionary process which 57 can result in model violations that can cause systematic errors1. Very short internal branches also 58 strongly imply minimal evolutionary change meaning that common ancestors of nested clades 59 are expected to have been highly similar. 60 61 The deuterostome branch has weak support. 62 The different topologies relating Chordata, Xenambulacraria and Protostomia seen in recently 63 published animal phylogenies suggest that the support for monophyletic deuterostomes (Fig. 1a) 64 may be low, in contrast with the consistent, strong support for the monophyly of the . 65 To study this difference, we have gathered five recent, independently generated, phylogenomic 66 datasets covering the diversity of animal phyla (i.e., 2–6). We use these data to investigate the 67 support for different topologies relating the Chordata, Xenambulacraria, Ecdysozoa and 68 Lophotrochozoa. We first asked whether the branches leading to protostomes and deuterostomes 69 were of similar length, under the assumption that both are monophyletic. 70 71 Using a fixed topology taken from the original publications for each dataset (modified when 72 necessary to enforce monophyly of deuterostomes), we optimised branch lengths using the site- 73 homogeneous LG+F+G model. In Figure 1b we see that, while the overall amount of change in 74 branches leading from the bilaterian common ancestor to both protostomes and deuterostomes 75 differs between datasets, in all cases, the branch leading to the protostomes is approximately 76 twice the length of the branch leading to the deuterostomes. This is a conservative measure of the 77 difference, as better fitting models (see later) give longer estimates for the branch 78 and shorter estimates for the deuterostome branch (for the Laumer dataset the protostome branch 79 is estimated as 1.75 times longer than deuterostome under LG and 5.7 times longer under CAT-

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

80 LG). This difference between protostome and deuterostome branches is reinforced when we look 81 at the change in log likelihood (∆lnL) between a fully resolved tree compared to trees in which 82 either the protostome or deuterostome branch is collapsed into a polytomy. The ∆lnL observed 83 on collapsing the branch leading to protostomes is considerably greater for all five datasets than 84 the ∆lnL when the deuterostome branch is collapsed (Fig. 1c), showing considerably less signal 85 supporting deuterostomes than protostomes. 86 87 To get a first indication of whether this weaker support for deuterostomes results from conflict 88 between genes or from a lack of signal across genes and across datasets, we repeated these 89 measures of branch lengths and ∆lnL on individual genes in all five datasets (Fig. 1d,e). We find 90 that, for the great majority of genes across all datasets, as for the concatenated datasets, there is a 91 longer branch leading to the protostomes than to deuterostomes (P longer: 3376 (70%); D longer: 92 1402 (29%), Supplementary Data 1) and consistently stronger support (larger ∆lnL) for 93 protostomes than for deuterostomes (∆lnL P larger: 3312 (68.6%); ∆lnL D larger: 1345 (27.8%), 94 Supplementary Data 2). Most genes support monophyletic deuterostomes much less strongly 95 than they support protostomes. 96 97 Minority of genes support deuterostome monophyly. 98 While the difference in branch lengths and likelihood indicate weaker evidence for the 99 monophyly of deuterostomes across different genes relative to the support for protostomes, the 100 measures presented are based on trees in which deuterostomes were constrained to be 101 monophyletic, meaning that we could not show any topological conflict between genes. To 102 measure conflict between genes we considered each gene individually and compared the 103 lnLikelihood of a tree supporting protostome monophyly (“PM”) or deuterostome monophyly 104 (“DM”) with the lnLikelihoods of two alternative topologies for each of these two clades: 105 Protostome “P1”: Lophotrochozoa sister to Deuterostomia; Protostome Paraphyly 106 “P2”: Ecdysozoa sister to Deuterostomia; Deuterostome Paraphyly “D1”: Xenambulacraria sister 107 to Protostomia; Deuterostome Paraphyly “D2”: Chordata sister to Protostomia. (see Fig. 2). 108 We visualised the relative lnLikelihoods for the three topologies for every gene on a triangular 109 plot (Fig. 2). For the protostome trees, most genes have a strong preference for a single topology

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

110 (77.31% on average across datasets with the LG+F+G model, Extended Data Table 1); amongst 111 the genes that strongly prefer one topology, we see a clear majority supporting monophyletic 112 protostomes (58.94% vs 9.71% and 8.66% for paraphyletic protostomes P1 and P2 respectively, 113 Extended Data Table 1). 114 115 For the deuterostomes, the results are more equivocal; fewer genes have strong preference for a 116 single topology (66.99% on average, Extended Data Table 2) and, amongst genes that do prefer 117 one topology, there is only a small excess in the number of genes strongly supporting 118 monophyletic deuterostomes over the other two topologies (28.65% vs 19.17% and 19.17% for 119 paraphyletic deuterostomes D1 and D2 respectively, Extended Data Table 2). Overall, these 120 experiments show that, while the protostome is strongly supported over either of the two 121 topologies with paraphyletic protostomes, the same is not true for monophyletic deuterostomes 122 which have a very small advantage over either of the alternative topologies. 123 124 For all five datasets, the gene alignments strongly supporting the monophyletic protostomes 125 topology were longer and had higher monophyly scores (a measure of their ability to support 126 known clades) compared to the gene alignments that supported either of the two alternative 127 topologies (Extended Data Fig. 1). For the deuterostomes, there was minimal evidence of a 128 significant difference of alignment length between genes supporting DM versus D1 or D2 129 (Extended Data Fig. 1) and there were no significant differences in monophyly scores between 130 genes supporting DM versus D1 or D2 (Extended Data Table 3). Monophyletic Protostomia, but 131 not monophyletic Deuterostomia, is strongly preferred across five independent datasets by a 132 clear majority of individual genes which are on average longer and have a stronger phylogenetic 133 signal. 134 135 Deuterostome monophyly could result from systematic error. 136 Whatever the true topology relating Xenambulacraria, Chordata and Protostomia, we have 137 shown that the branch separating them is short and might therefore be especially sensitive to 138 systematic errors. Systematic errors are generated by heterogeneities in the process of 139 substitution that are not accommodated by the models used and are exacerbated in faster

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

140 evolving datasets. In the context of unequal rates of evolution amongst taxa this may lead to a 141 (LBA) artefact7. We next consider the possibility that monophyletic 142 Deuterostomia could result from systematic error. 143 144 For each dataset we measured the average branch lengths within each of the protostome, 145 and xenambulacrarian clades (average distance from bilaterian common ancestor to 146 each within each clade) and the average distance from each taxon to the 147 bilaterian common ancestor. The branches leading to the outgroups and to the protostomes are 148 always longer than the branch leading to the deuterostomes (Fig. 3a). Under LBA, the long- 149 branched protostomes and outgroup might be expected to be mutually attracted resulting in 150 clustering of short branched Chordata and Xenambulacraria. We note that the three datasets 151 which supported monophyletic deuterostomes2,3,6 had the largest rate inequalities with the longest 152 average protostome and outgroup branch lengths relative to chordate and xenambulacrarian 153 branches (Fig. 3a). 154 155 To compare the relative rates of the 5 datasets we removed the effect of taxon sampling by 156 reducing all 5 datasets to the 8 all have in common and measured the rate of substitution 157 across the tree; the three datasets that support monophyletic Deuterostomia are the fastest 158 evolving (substitutions per site across the 8 taxon tree - Philippe: 1.971, Marletaz: 2.2082, 159 Cannon: 2.537, Laumer: 2.6578, Rouse: 2.8741). 160 161 Finally, while two datasets support paraphyletic deuterostomes under site heterogeneous 162 models4,5 which account for across site amino acid preference variability8, all five datasets 163 support monophyletic deuterostomes when using site homogeneous models. Unequal rates, faster 164 evolving loci and inadequate site homogeneous models are all known to promote LBA artifacts, 165 especially in the context of short internal nodes1,7,9 and all these conditions correlate with support 166 for Deuterostomia. 167 168 We further examined the possible effect of model misspecification by comparing branch lengths 169 estimated using different models. We used the Laumer et al. (2019)3 dataset which has the

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

170 clearest branch length heterogeneity (Fig. 3a). For computational speed, we selected 36 taxa 171 focussing on maintaining the taxonomic diversity of the full dataset and excluding taxa with a 172 large percentage of missing data. We randomly selected 50,000 sites from the full alignment of 173 106,186 sites. We conducted cross validation comparing 4 increasingly complex models (LG, 174 LG+G, C60+LG+G and CAT+LG+G). As expected less complex models have significantly 175 worse fit to the data than the more complex (average cross validation scores LG: -74877.78, 176 LG+G: -70348.95, C60+LG+G: -68707.89, CAT+LG+G: -68031.36). We estimated the lengths 177 of the branches leading to the deuterostomes, the protostomes and the outgroup with these 178 different models. Using models that fit the data increasingly well, we see that the estimated 179 length of the deuterostome branch decreases, and the length of the protostome and outgroup 180 branches increase (Fig. 3b). Using inadequate models could therefore underestimate the 181 likelihood of convergence between Protostomia and outgroup. This would be predicted to result 182 in a long branch attraction between protostomes and outgroups resulting in monophyletic 183 deuterostomes. 184 185 To evaluate further the idea that model misspecification in the context of unequal evolutionary 186 rates between taxa might lead to a long branch artifact, we followed a data simulation approach. 187 We ask whether any of the three topologies relating the Protostomia, Chordata and 188 Xenambulacraria (DM, and the two paraphyletic alternatives D1 and D2, Fig. 4a) could gain 189 artifactual support as a result of systematic error. Using the 50,000 positions and 36 taxa from 190 the Laumer dataset, we estimated parameters using the three alternative topologies (DM, D1, D2) 191 under the best fitting CAT+LG+G model implemented in phylobayes10 (Fig. 4a). For each 192 topology we simulated 100 datasets using the parameters we had estimated. 193 194 For data simulated according to the DM topology (Chordata plus Xenambulacraria) we are able 195 to reconstruct monophyletic deuterostomes (DM) correctly in 100% of replicates under both 196 correct (site heterogeneous) and model violating (site homogeneous models) (Fig. 4b). In 197 contrast, for the data simulated under the two alternative hypotheses (D1, D2), recovering the 198 true relationships was more challenging. As predicted by the consistency of probabilistic models, 199 under the correct site heterogeneous model the correct topology was always recovered. Under the

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

200 model-violating site homogeneous model, which would be expected to exacerbate LBA artifacts, 201 the true tree was recovered in 60% and 88% of the datasets simulated under the D1 and D2 202 hypotheses respectively (Fig. 4b). In both cases, the remaining 40% (D1) and 12% (D2) of the 203 datasets incorrectly recovered monophyletic deuterostomes (the DM topology). This result 204 suggests that if either D1 or D2 were the true tree, LBA stemming from a combination of the 205 long branches leading to protostomes and outgroups, short internal branches and model 206 misspecification, could result in artifactual support for deuterostome monophyly. 207 208 To explore the hypothesis of unequal rates of evolution lending exaggerated support to the 209 monophyletic Deuterostomia topology (DM), we re-inferred trees using the same simulated data 210 after first removing the 13 longest branched taxa of the protostome and outgroup clades, a 211 commonly adopted strategy to reduce LBA artifacts11,12. We re-inferred the trees using the poorly 212 fitting site homogeneous model (LG+F+G) which had resulted in some incorrect topologies with 213 the full dataset. Regardless of the topology under which the data were simulated (DM, D1 or D2) 214 removing these long branches resulted in recovering the correct topology in 100% of cases (Fig. 215 4b). Despite the model violation there was no long branch attraction artifact. 216 217 We contrast this result with an equivalent experiment designed to exaggerate any artifact caused 218 by rate heterogeneity. We removed the 13 shortest protostome and outgroup taxa from the 219 simulated data and again re-inferred the tree topologies under the site homogeneous LG+F+G 220 model. Regardless of the topology under which the data were simulated (DM, D1 or D2), we 221 recovered monophyly of the deuterostomes (DM topology) in 100% of replicates (Fig. 4b). 222 These results show that, if deuterostomes are paraphyletic (either D1 or D2 topology), then 223 conditions we have shown to affect real datasets could easily result in artifactual support for 224 monophyletic deuterostomes (Fig. 4c). 225 226 Reappraising deuterostome molecular synapomorphies. 227 Studies of different classes of molecular characters also suggest stronger support for 228 Prostostomia than Deuterostomia. Protostomes have 12 unique microRNA families13 compared 229 to 1 in deuterostomes and protostomes share 58 ’near intron pairs’ compared to just 7 in

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

230 deuterostomes. Protostomes also share a highly distinct, conserved variant of the mitochondrial 231 NAD5 protein 14 and a hidden break in their 28S ribosomal RNA15. 232 233 In their comparison of and chordate genomes, Simakov et al.16 conducted a 234 systematic search for genes unique to the deuterostomes. Using the greater number and diversity 235 of sequenced genomes now available, we have identified orthologs of many of these genes in 236 non-bilaterian metazoans and/or in protostomes. Overall, we have found evidence for 20 out of 237 31 of these ‘deuterostome novelties’ in protostomes and/or non bilaterian metazoans suggesting 238 they are bilaterian plesiomorphies (Supplementary Data 4). Equivalent searches for characters 239 unique to either Chordata plus Protostomia or Xenambulacraria plus Protostomia have not been 240 conducted and it is therefore not clear how to interpret the 11 remaining Chordata plus 241 Xenambulacraria specific characters. If deuterostomes are not monophyletic, these 11 genes 242 must have been lost in protostomes. 243 244 Simakov et al. also describe a deuterostomian ‘pharyngeal gene cluster’16. This is a micro- 245 syntenic block of 7 genes (nkx2.1, nkx2.2, msxlx, pax1/9, slc25A21, mipol1 and foxA), found 246 complete only in and Chordata. Several of these genes have functional links to 247 patterning and the formation of pharyngeal slits. Different protostomes do have some of 248 these genes linked (nkx2.1 nkx2.2 - various protostomes); nkx2.2 msxlx (Lottia); pax1/9 249 slc25A21 (Lottia); mipol1 foxA (various protostomes) and pax1/9 foxA (distant but on the same 250 chromosome in Caenorhabditis)17. Msxlx is adjacent to 3 nkx2.1/2-type genes in and 251 is also relatively close to foxA in the genomes of medusozoan and octocoral cnidarians 252 (cnidarians lack pax1/9; msxlx is also absent in hexacorals). We have also identified a linkage 253 between msxlx and pax1/9 in the genome of the protostome Phoronis australis (separated by 254 0.34 mb, with 17 intervening genes) (Supplementary Data 4). These links raise the possibility 255 that at least five (nkx2.1 nkx2.2 msxlx pax1/9 slc25A21) and perhaps all of the genes in the 256 cluster were linked in the bilaterian common ancestor and that the cluster is a bilaterian character 257 that has been dispersed in different protostome lineages (Fig. 5a). 258 259

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

260 Discussion 261 262 We have shown that the branch leading to monophyletic Deuterostomia, if it exists, is short and 263 weakly supported; for comparison we demonstrate unequivocal strong support for Protostomia 264 especially from higher quality gene alignments. Comparisons of rates and relative branch lengths 265 between data sets and the effects of model misspecification show that support for monophyletic 266 Deuterostomia correlates with conditions expected to enhance systematic error. The contention 267 that support for monophyletic Deuterostomia could result from systematic error is supported by 268 our simulation experiments. Systematic error is especially likely to affect our ability to 269 reconstruct relationships between taxa separated by the very short branches we have described. 270 271 The short branches and gene-tree heterogeneity we observe suggest that Chordata, 272 Xenambulacraria and Protostomia emerged from two closely spaced events. The gene- 273 tree discordance could reflect true differences in the evolutionary histories of genes resulting 274 from incomplete sorting18. Testing this will require the development of methods that 275 accommodate the LBA-causing heterogeneities affecting these data19. 276 277 Such a short deuterostome branch has important consequences for our understanding of character 278 evolution at the base of the Bilateria as (assuming it exists) it implies a short period of time 279 separating the last bilaterian common ancestor (Urbilateria) and the last deuterostome common 280 ancestor (Urdeuterostomia)4,20. Hybridisation between lineages following speciation is another 281 process that would blur the distinction between these ancestral taxa. If the deuterostomes are 282 paraphyletic, as some of our analyses suggest, then the last common ancestor of Chordata and 283 Xenambulacraria was the last common ancestor of the Bilateria meaning Urbilateria was a 284 deuterostome and characters common to Xenambulacraria and Chordata are bilaterian 285 plesiomorphies. 286 287 The idea that Urbilateria possessed some deuterostome characters is not new. Grobben included 288 the in his Deuterostomia (Grobben 1908)21. Chaetognaths are now known 289 to be lophotrochozoan protostomes5,22 but have the three main deuterostome characters23 of radial

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

290 cleavage, forming their coeloms and mesoderm by and their forms in the 291 vicinity of the closed blastopore - the mouth is a secondary opening - and by this definition they 292 are deuterostomes. Deuterostomy and radial cleavage are also found in the Ecdysozoa as 293 observed in priapulid worms24,25. Radial cleavage, enterocoely and deuterostomy are characters 294 found in both protostomes and deuterostomes and are therefore parsimoniously reconstructed as 295 primitive characteristics of the Bilateria. 296 297 Members of both Chordata and Xenambulacraria also have pharyngeal slits, a post anal tail and a 298 form of endostyle (a pharyngeal tissue that secretes iodine-rich mucus for filter feeding)26. Our 299 results suggest these were also characteristics of Urbilateria (rather than deuterostome 300 synapomorphies) implying they have been altered beyond recognition or lost in the lineage 301 leading to the protostomes. Our evidence that the pharyngeal cluster of genes likely existed in the 302 protostome ancestor implies that stem protostomes may have had pharyngeal slits. 303 304 If Urbliateria had these deuterostome characters there are major implications for understanding 305 early animal evolution. A number of forms from the Cambrian Period have been 306 interpreted as stem deuterostomes because, while lacking most defining characteristics of extant 307 deuterostome phyla, they appear to possess pharyngeal slits27–32 (Fig. 5b). The presence of 308 pharyngeal slits in vetulicolians27,29,30, has been used as evidence for placing them as stem- or 309 total-group deuterostomes, with the bipartite vetulicolian taken as a model for the 310 ancestral deuterostome27,30,33. If pharyngeal slits were present in Urbilateria, this removes the key 311 character supporting placement of vetulicolians with deuterostomes and the presence in 312 vetulicolians of a terminal anus and segmentation30 could indicate that they are stem protostomes 313 that had lost a post anal tail but retained pharyngeal slits. Banffiids, which lack pharyngeal slits 314 but are otherwise morphologically similar to vetulicolians32 (Conway Morris et al., 2015), might 315 occupy a more derived position in the protostome stem group.

316 This suite of complex characters in Urbilateria also suggests a relatively large, filter-feeding 317 animal arguing against a small and simple urbilaterian2,34. A large Urbilateria would suggest that 318 the apparent lack of bilaterians cannot be easily explained as the result of poor

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

319 preservation potential due to small size and simple morphology, and emphasises the gap between 320 molecular clock estimates for the origin of bilaterians and their oldest fossil evidence35. 321 322 While we have shown that support for monophyletic Deuterostomia is exaggerated by systematic 323 errors, we have not attempted definitively to resolve this polytomy. The conclusions we have 324 reached as regards character evolution are, however, unlikely to be affected by resolving the 325 relationships between these extremely short branches. Nevertheless, recent analyses employing 326 large datasets and the best available models give tentative support to a relationship 327 between the Chordata and Protostomia to the exclusion of the Xenambulacraria. Based on the 328 shared character of a centralised in Chordata and Protostomia, we propose the 329 name of Centroneuralia for this putative clade. 330

331 Acknowledgments

332 We are grateful to members of Telford and Yang labs (CLOE, UCL) for discussions and 333 comments on the manuscript. This work was funded by BBSRC grant BB/R016240/1 (PK); by 334 a Leverhulme Trust Research Project Grant RPG-2018-302 (DJL); and by the European 335 Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie 336 grant agreement No 764840 IGNITE (PN).

337 Author Contributions

338 Conceptualization: MJT, PK, HP; Investigation: PK, PN, DJL, MF, NJ, RRC; Data Curation: 339 PK, PN, DJL; Writing – Original Draft: MJT; Writing – Review & Editing: PK, PN, DJL, RRC, 340 HP, IAR; Visualization: PK, PN, DJL, IAR, MJT; Supervision: MJT; Funding Acquisition: 341 MJT.

342 Declaration of Interests

343 The authors declare no competing interests.

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

344 Materials and Correspondence

345 Correspondence and material requests should be addressed to [email protected]

346

347

348 Figure and Figure Legends

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

349 Figure 1. The deuterostome branch is short and weakly supported compared to the 350 protostome branch. 351 a. Canonical tree showing relationships between major metazoan clades. The Bilateria consists of 352 two branches, Protostomia (Red: Lophotrochozoa and Ecdysozoa) and Deuterostomia (Blue: 353 Chordata and Xenambulacraria). 354 b. Comparison of branch lengths from the bilaterian common ancestor to the deuterostome 355 common ancestor (blue) and to the protostome common ancestor (red). In five independently 356 curated large datasets the protostome branch is twice as long as the deuterostome branch.

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

357 c. Comparison of estimated protostome and deuterostome branch lengths for individual genes. 358 Genes from all five datasets are ordered by the difference in length between protostome and 359 deuterostome branches. In 3376 out of 4826 genes (69.9%) the protostome branch is longer. In 360 1402 out of 4826 genes (29%) the deuterostome branch is longer. c. Comparison of decrease in 361 lnLikelihood of a tree in which either the protostome or deuterostome branch has been collapsed 362 (∆lnL). In five independently curated large datasets collapsing the protostome branch reduces 363 lnL considerably more than collapsing the deuterostome branch. 364 cd. Comparison of estimated protostome and deuterostome branch lengths for individual genes. 365 Genes from all five datasets are ordered by the difference in length between protostome and 366 deuterostome branches. In 3376 out of 4826 genes (69.9%) the protostome branch is longer. In 367 1402 out of 4826 genes (29%) the deuterostome branch is longer. 368 e. Comparison of protostome and deuterostome ∆lnL for individual genes. Genes from all five 369 datasets are ordered by the difference in ∆lnL between protostomes and deuterostomes. In 3312 370 out of 4826 genes (68.6%) the protostome ∆lnL is larger. In 1345 out of 4826 genes (27.6%) the 371 deuterostome ∆lnL is larger. All analyses used the LG+F+G model.

15 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

372

374 Figure 2. Most genes prefer monophyletic Protostomia over alternatives, few genes prefer 375 monophyletic Deuterostomia. 376 Triangular plots showing relative support for the three alternative topologies shown at the 377 corners of the triangles. Each dot represents a gene and its position is determined by the relative 378 lnLikelihoods (measured using the LG+F+G model) of the three constrained trees calculated 379 using that gene alignment. Genes in coloured corners show a high preference for the 380 corresponding topology. The numbers of genes found in the different coloured sectors of the 381 large triangle after estimating likelihoods with the LG+F+G model are shown below. In the 382 triangle plots, all genes from the 5 datasets are visualised together, in the bar plots each named 383 dataset is represented by a bar.

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

384 a. Triangle plot comparing support for monophyletic Protostomia (PM) versus two alternative 385 topologies with paraphyletic Protostomia (P1, P2). 386 b. Bar plot showing that across five datasets the majority of genes strongly prefer the 387 monophyletic Protostomia topology. 388 c. Triangle plot comparing support for monophyletic Deuterostomia (DM) versus two alternative 389 topologies with paraphyletic Deuterostomia (D1, D2). 390 d. Bar plot showing that across five datasets a minority of genes strongly prefer the monophyletic 391 Deuterostomia topology over paraphyletic topologies or the grey areas (representing weak or no 392 preference). 393

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

394

395 Figure 3. Evidence from empirical data for systematic error supporting monophyletic 396 Deuterostomia.

397 a. Bar chart showing the average branch lengths within Chordata, Xenambulacraria, Protostomia 398 and from outgroup taxa to the root of the Bilateria. Datasets with the greatest branch length 399 heterogeneity (Rouse, Cannon, Laumer) are those that support deuterotome monophyly. 400 b. Box plot showing the lengths of the branches leading to the Deuterostomia, Protostomia and 401 from outgroups to Bilateria estimated using different models of evolution and 50,000 sites, 36 402 species from the Laumer dataset. Cross validation shows that models fit the data increasingly 403 well in the order LG < LG+G < C60+LG+G < CAT+LG+G. With improving model fit, the 404 estimated length of the deuterostome branch decreases, and estimated lengths of the protostome 405 and outgroup branches increase. Model violations (e.g. using site homogeneous models such as

18 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

406 LG) can lead to an underestimate of the likelihood of change on long branches. This would be 407 predicted to promote LBA between the protostomes and outgroup and to give artifactual support 408 to deuterostomes. Less well-fitting models tend to support monophyletic deuterostomes.

409

19 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

410

411 Figure 4. Simulations show model violations and unequal rates result in artifactual support 412 for monophyletic Deuterostomia. 413 a. 100 datasets were simulated using parameters estimated under a site heterogeneous model 414 (CAT+LG+G) for each of the three topologies shown (coloured boxes). 415 b. For each simulated dataset a maximum likelihood tree was reconstructed under four 416 conditions: Site heterogeneous C60+LG+F+G model (Het); model violating site homogenous 417 LG+F+G model (Hom); homogeneous model with long branch protostomes and outgroup taxa 418 removed (HomNL); homogeneous model with short branch protostomes and outgroup taxa 419 removed (HomNS). For each condition, the number of times the three possible topologies were 420 reconstructed is shown in the bar charts - the colour indicates the topology supported. 421 Data simulated under the DM topology always yield the correct topology under all conditions. 422 Under the site heterogeneous model, D1 and D2 data always yield a correct topology.

20 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

423 Under the model violating site homogeneous model, D1 and D2 data yield an incorrect topology 424 in 40% and 12% of replicates respectively. The incorrect tree is always DM. 425 Under the site homogeneous model with long branch protostomes and outgroups removed 426 (reducing LBA) D1 and D2 data always yield a correct topology. 427 Under the site homogeneous model with short branch protostomes and outgroups removed 428 (enhancing LBA) D1 and D2 data always yield an incorrect topology. The incorrect tree is 429 always DM. 430 c. Interpretation of simulation results as an LBA artifact. The tree topologies under which data 431 were simulated (true tree) are shown in the middle. The long branches leading to the protostomes 432 and outgroups are indicated in red. Under conditions that minimise systematic error (Left: site 433 heterogeneous model or reduction of branch length heterogeneity by removing long branches) 434 the correct tree (DM, D1 or D2) is always reconstructed. Under conditions that enhance 435 systematic error (Right: model violating site homogeneous models or increase of branch length 436 heterogeneity by removing short branches) the DM topology is reconstructed using data 437 simulated under all three topologies.

21 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

438 Figure 5. Character evolution implied by a short or non-existent deuterostome branch.

439 a. Distribution of characters on a with an unresolved polytomy at the base of 440 the Bilateria involving Protostomia (P), Xenambulacraria (XA) and Chordata. (L = 441 Lophotrochozoa; E = Ecdysozoa; B = Bilateria). Presence or absence of the characters 442 discussed in the main text are indicated (blue box = present; white box = absent; blue/white box

22 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

443 = variable; n/a = not applicable). The pharyngeal cluster contains the seven genes indicated by 444 coloured boxes. The cluster has not been found intact in any single protostome, but most 445 possible pairs/triplets of adjacent genes are found linked in one or more protostomes or non- 446 bilaterian metazoans (box represents various protostomes see text for details), implying the 447 common ancestor of protostomes had an intact cluster. Urbilateria is the common ancestor of all 448 three clades (dotted line) and its characteristics can be inferred as those present in Chordata and 449 Xenambulacraria (and, for some characters, in Protostomia). All characters previously 450 considered to be diagnostic of deuterostomes are likely to have been present in Urbilateria as 451 indicated.

452 b. Cambrian bilaterians. 1, The lophotrochozoan Lingulella chengjiangensis (Cambrian Series 453 2, Province, China). 2, The ecdysozoan Chuandianella ovata (Cambrian Series 2, 454 Yunnan Province, China). 3, The xenambulacrarian Protocinctus mansillaensis (Cambrian 455 Series 3, Spain). 4, The chordate fengjiaoa (Cambrian Series 2, Yunnan 456 Province, China). 5, The problematic bilaterian Vetulicola cuneata (Cambrian Series 2, Yunnan 457 Province, China). Pharyngeal slits (red arrows) are present in Vetulicola and Myllokunmingia, 458 while a segmented bipartite body (yellow arrows) is a feature of Vetulicola and Chuandianella. 459 If Urbilateria possessed pharyngeal slits, as discussed in the main text, Vetulicola could 460 represent a stem protostome. Images 1, 2, 4 and 5 courtesy of Yunnan Key Laboratory for 461 Palaeobiology and MEC International Joint Laboratory for Palaeobiology and 462 Palaeoenvironment, Yunnan University, Kunming, China. Image 3 courtesy of Samuel Zamora. 463 Scale bars: 5 mm (1–3); 10 mm (4, 5).

464

23 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

465 Methods

466 Data Availability

467 The simulation results, subsamples of datasets and any custom scripts used in this study are 468 available in the GitHub repository: https://github.com/MaxTelford/MonoDeutData

469

470 Datasets

471 All analyses were based on five recently published phylogenomic datasets (2–6) covering all 472 major clades of the animal phylogeny. For convenience we will refer to each dataset by the name 473 of the first author of each study (i.e., “Marletaz”, “Philippe”, “Laumer”, “Cannon'' and “Rouse”). 474 In all our analyses, we kept the full set of taxa apart from the fast evolving 475 species that have been shown to be prone to phylogenetic inference errors4. In our analyses the 476 Cannon dataset consisted of 67 taxa and 881 genes (“proteinortho.phy” in the original study), the 477 Laumer dataset consisted of 422 genes and 152 taxa (167 taxa in dataset M in the original study), 478 the Marletaz dataset consisted of 70 species and 1174 genes (alignments provided in the original 479 study after the HMMClean and BMGE filtering), the Philippe dataset consisted of 51 taxa and 480 1173 genes (trimmed alignments provided in the original study) and finally the Rouse dataset 481 consisted of 26 taxa and 1178 partitions (“70.fas” in the original study). Each dataset contained 482 several representatives of the major metazoan clades, specifically the two Protostomia clades 483 Lophotrochozoa and Ecdysozoa, the two Deuterostomia clades and Xenambulacraria 484 as well as several non-bilaterian phyla.

485 Measuring support for deuterostome and protostome branches

486 The protostomes and the deuterostomes are traditionally treated as two equally strongly 487 supported clades. To assess whether indeed molecular supports this perception of 488 bilaterian evolution we compared the branch lengths of the two clades across the five 489 concatenated datasets as well as for each gene of each dataset individually. For each dataset we

24 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

490 assumed the tree topology as reported in the original studies, specifically, for Cannon, Laumer, 491 Marletaz, Rouse the trees we used relate to Fig. 2, Fig. 2a, Fig. 2, Fig. 3 from their respective 492 publications, while for the Philippe data we assumed the relationships as in the “PHILIPPE- 493 ALLSPP-NOACOEL-CATGTR-100JP.tre” from the original publication. For the Cannon, 494 Laumer and Rouse data, are considered to be sister to Ambulacraria rather than 495 . This position for Xenoturbella in the absence of the long branched Acoelomorpha, is 496 supported by these datasets. For the Philippe and Marletaz data we altered the tree only to 497 enforce deuterostome monophyly as this was not supported in the published phylogenies. For the 498 analyses performed per gene we cropped the original trees using newick-tools36 such that only 499 the taxa present in the relevant gene-alignment were present. Given the fixed topology, the 500 branch lengths were estimated with IQ-TREE37 under the state frequency heterogeneous model 501 LG+F+G model. We used the optimised topologies to measure the length of the branch leading 502 from the common ancestor of Bilateria (protostomes plus deuterostomes) to the common 503 ancestors respectively of deuterostomes and protostomes.

504 We performed a second analysis where we measured the difference in the likelihood score 505 between the fully resolved phylogeny and for the topology after collapsing either the protostome 506 of the deuterostome branch into a polytomy.

507 Collapsed Protostomia branch = (Lophotrochozoa,Ecdysozoa,(Ambulacraria,Chordata))

508 Collapsed Deuterostomia branch = ((Lophotrochozoa,Ecdysozoa),Ambulacraria,Chordata).

509 For calculating the lnLikelihoods we used the LG+F+G model. We measured the difference in 510 lnLikelihood in both the five full concatenated alignments and for each gene from each dataset 511 individually.

512 Support for monophyletic versus paraphyletic deuterostomes and protostomes.

513 We measured the proportion of genes in each dataset that strongly support the monophyly of the 514 deuterostome clade “DM” over the two alternative paraphyletic topologies. Specifically, we 515 compared the lnLikelihood of the topologies assuming monophyletic deuterotomes “DM” and

25 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

516 two possible paraphyletic alternatives

517 Deuterostome Paraphyly “D1”: (Chordata,(Xenambulacraria,(Lophotrochozoa,Ecdysozoa)))

518 Deuterostome Paraphyly “D2”: (Xenambulacraria,(Chordata,(Lophotrochozoa,Ecdysozoa)))

519 We performed the equivalent analyses for the protostomes, specifically, we compared the 520 lnLikelihood of the topologies assuming monophyletic protostomes “PM” and the two possible 521 paraphyletic alternatives

522 Protostome Paraphyly “P1”: (Ecdysozoa,(Lophotrochozoa,(Xenambulacraria,Chordata)))

523 Protostome Paraphyly “P2”: (Lophotrochozoa,(Ecdysozoa,(Xenambulacraria,Chordata)))

524

525 To achieve this in both cases we calculated the log-likelihoods for the three alternative 526 topologies using IQ-TREE under the LG+F+G model.

527 In all cases the topologies were fixed as described earlier and we manually adjusted them to 528 produce the two alternative hypotheses rendering protostomes or deuterostomes paraphyletic. As 529 before, each of the topologies was trimmed to the taxa present in each gene. We visualised the 530 relative support of each gene for the three alternative topologies affecting either deuterostomes 531 or protostomes using a method adapted from that described in 38. By scaling the three likelihoods 532 in the range [0,1] such that lnLn1 + lnL2 + lnL3 = 1 (the relevant python script 533 “likelihood_transform.py” is available in https://github.com/MaxTelford/MonoDeutData) we 534 could use their transformed values as coordinates in a ternary plot, whose corners represent the 535 three topologies. We considered a gene to support a particular topology if the corresponding 536 scaled likelihood was larger than ⅔ (and the likelihoods for the other two topologies was 537 therefore smaller than ⅓). We divided the triangle into compartments to reflect these cutoffs and 538 plotted the points using the R package ‘ggtern’39 (the relevant R script “plot_triangles.R” is 539 available in https://github.com/MaxTelford/MonoDeutData).

540 Phylogenetic informativeness of genes supporting each topology

26 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

541 To identify potential reasons for individual genes supporting alternative topologies for both the 542 deuterostome and the protostome clades we evaluated two parameters known to be related to 543 errors in phylogenetic inference. For each gene alignment from each dataset we first measured 544 the alignment length. Subsequently we inferred the Maximum Likelihood phylogeny for each 545 gene with IQ-TREE under the LG+F+G model and measured the monophyly score of the tree 546 (this is a measure of a data set’s ability to reconstruct known monophyletic groups and is 547 described in 4). We used a Welch’s t-test to determine whether the difference in these scores 548 between the sets of genes supporting different topologies was significant (Extended Data Table 549 3).

550

551 Clade lengths across datasets

552 We assessed how the average branch length of the two clades Protostomes and Deuterostomes 553 and of the outgroup clan differ across the five datasets. Using the complete alignments, we 554 estimated the branch lengths under the LG+F+G model assuming the DM topology with IQ- 555 TREE. To measure the average tip to stem branch length in each clade or clan we used the 556 “pynt.py” script (available at https://github.com/MaxTelford/XenoCtenoSims). We additionally 557 measured the tree length (sum of all branch lengths) for the five datasets after reducing each of 558 them to the 8 species they all had in common (i.e. Amphimedon queenslandica, Nematostella 559 vectensis, Strongylocentrotus purpuratus, intestinalis, Peripatopsis capensis, Priapulus 560 caudatus, , Lottia gigantea).

561

562 Measuring branch lengths with different models and cross-validation

563 We compared the estimates for the deuterostome, protostome and bilaterian stem branch lengths 564 across four models accommodating or not state composition heterogeneity across sites. We 565 performed this analysis only for the Laumer dataset that showed the strongest support for the 566 deuterostome monophyly. To lessen the computational burden, we reduced the dataset to 36 taxa

27 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

567 that cover all major branches of the phylogeny selecting taxa with fewest missing data and 568 randomly selected 50,000 of the original 106,186 alignment sites (“reduced-Laumer”). Using the 569 reduced-Laumer dataset and assuming the DM topology we performed a phylobayes-mpi 570 (version 1.810) run for four substitution models, i.e., LG+G (homogeneous model in terms of 571 state composition heterogeneity across sites), C10+LG+G (heterogeneous model assuming 10 572 different sets of amino acid state frequencies40), C60+LG+G (heterogeneous model assuming 60 573 different sets of amino acid state frequencies40) and the infinite mixture model CAT+LG+G 574 model8. We performed 10,000 MCMC cycles for all models except for the CAT+LG+G for 575 which we performed 20,000. For all four runs we used the final 5,000 posterior tree samples to 576 calculate the average stem branch length of the deuterostomes, protostomes and bilaterians.

577 To compare how the branch length estimates relate to the fitness of each of the four models we 578 performed cross validation analyses as described in the phylobayes-mpi manual. Using the 579 reduced-Laumer dataset, we created 10 training datasets each consisting of 10,000 sites and 10 580 test datasets of 2,000 sites each. We ran phylobayes-mpi for 3,000 MCMC cycles for each of the 581 training datasets and for each of the four models. Subsequently, using the “readpbmpi” program 582 (distributed with the phylobayes-mpi) under the “-cv” option we calculated the posterior mean 583 cross-validation score with a burnin of 2000 and sampling a frequency of 0.1.

584 Simulations - Systematic Error

585 We simulated sequence alignments using parameters that match those measured from empirical 586 sequences under the three topological hypotheses relating the deuterostome clades. As before we 587 based these analyses on the reduced-Laumer dataset. The parameter estimates and simulations 588 were carried out with phylobayes-mpi.

589 The two steps were

590 1) We estimated the posteriors of branch lengths and model parameters using the CAT+LG+G 591 model on three alternative fixed topologies relating the deuterostome clades. For each fixed 592 topology we performed 20,000 MCMC cycles with a sampling frequency of 1.

28 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

593 2) Using the final 5,000 posterior samples we sub-sampled with a frequency of 1 in 50, which 594 gave us a subset of 100 posterior samples. Using these combinations of branch lengths and 595 model parameters we simulated data with the “readpb_mpi” tool under the “ppred” option.

596 For phylobayes runs using all three fixed topologies, the effective sampling size (ESS), for some 597 of the model parameters was lower than 100, but in all cases >50, possibly suggesting inadequate 598 MCMC mixing and lack of convergence. This is a known problem of phylobayes when using 599 large datasets. To ascertain whether the results from our simulations were affected by inadequate 600 mixing (or by a sampling effect), we repeated the procedure for a smaller fraction (10,000 601 randomly selected sites) of the Laumer alignment. In this case we performed 10,000 MCMC 602 samples, and the ESS values were above 100 for all parameters with a 25% burnin. The results of 603 the simulations based on this dataset were in broad agreement with the results based on the larger 604 dataset (Extended Data Table 4).

605 For each of the simulated datasets (simulated under the best fitting state-frequency 606 heterogeneous CAT model) we performed two phylogenetic inference analyses using IQ-TREE, 607 i) under the state-frequency homogeneous model LG+F+G and ii) the state-frequency 608 heterogeneous model C60+LG+F+G under the PMSF approximation (Wang et al., 2018). The 609 C60 model implements an approximation of the CAT model of Phylobayes but with 60 610 precomputed site frequency categories40 and is considerably faster to run in a maximum 611 likelihood framework41, enabling the analysis of multiple simulated datasets. We wanted to test 612 the possible influence of an LBA attracting long branched protostomes to the long branch 613 leading to the outgroup leading to artifactual support for monophyletic deuterostomes. We 614 repeated the previous analyses after removing the longest protostome branches (i.e., Diuronotus 615 aspetos, Geocentrophora applanata, Schmidtea mediterranea, Echinococcus multilocularis, 616 Adineta vaga, Loa loa, , dujardini) and the outgroup taxa (i.e., 617 hydriforme, Amphimedon queenslandica, Beroe abyssicola, Mnemiopsis leidyi, 618 Salpingoeca rosetta). Finally, we performed an equivalent experiment by removing the 13 619 shortest protostomes (Capitella teleta, Helobdella robusta, Phyllochaetopterus sp., Daphnia 620 pulex, Limulus polyphemus, Hemithiris psittacea, Membranipora membranacea, Lottia gigantea, 621 Neomenia sp., Phoronis psammophila, Priapulus caudatus) and outgroup taxa (Craspedacusta

29 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

622 sowerbyi, Nematostella vectensis) and inferred the tree topologies under the LG+F+G model.

623 In all cases we measured the proportion of simulated datasets supporting each of the three 624 potential topologies.

626 Gene order 627 628 Gene order was determined from NCBI genome gff files, or gff files distributed with sequence 629 data from the Marine Genomics Unit of Okinawa Institute of Science and Technology 630 (https://marinegenomics.oist.jp/), or other online resources related to original publications. 631 Orthologs of selected genes were determined from phylogenies of corresponding Pfam domains 632 (e.g. , Forkhead etc.). Briefly, Pfam hidden Markov models were searched against a 633 metazoan protein database (hmmsearch), sequences were aligned (hmmalign) and then trimmed 634 (esl-alimask, esl-alimanip) based on posterior probabilities of correct alignments and length of 635 remaining sequence. Phylogenies were constructed using IQ-TREE, allowing the optimum 636 model to be selected from the LG set (i.e. ‘-mset LG’ in IQ-TREE). Orthologs were identified by 637 inspection of the phylogenetic tree around the relevant gene (e.g. msxlx, pax1/9 etc.). 638 639 Deuterostome novelties 640 641 Deuterostome novelties of Simakov et al. were searched against selected protostome and non- 642 bilaterian metazoan data sets. The majority of gene sets (Lingula, Priapulus etc.) were obtained 643 from the NCBI genomes resource (accessible at: 644 https://www.ncbi.nlm.nih.gov/genome/browse#!/overview/), excepting Phoronis australis 645 (OIST, see above). 646 647 transcriptomes were assembled from reads retrieved from the European Nucleotide 648 Archive: 649 650 P. jani42 : SRR3417194_1.fastq.gz, SRR3417194_2.fastq.gz

30 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

651 652 C. candelabrum43: SRR499817_1.fastq.gz, SRR499817_2.fastq.gz, SRR499820_1.fastq.gz, 653 SRR499820_2.fastq.gz, SRR504694_1.fastq.gz, SRR504694_2.fastq.gz 654 655 Sequences were processed into ‘left’ and ‘right’ read files adding ‘/1’ and ‘/2’ respectively to 656 identifiers (e.g. for a ‘_1.fastq’ file: perl -pe 's/^@(SRR\S+)/\@$1\/1/' ) and then assembled 657 using Trinity-v2.8.444 with default parameters: 658 659 trinityrnaseq-Trinity-v2.8.4/Trinity --seqType fq --max_memory 128G --left fastq/left.fastq -- 660 right fastq/right.fastq --CPU 32 661 662 Oscarella carmela proteins were downloaded from compagen 663 (http://www.compagen.org/datasets.html, file OCAR_T-PEP_130911) and Phoronis australis 664 from OIST (see above). 665 666 Candidate orthologs were identified via reciprocal best hits and phylogenetic analysis of relevant 667 Pfam domains, as described above (section ‘Gene order’). 668 669 Sequences were also searched against the NR database of the NCBI to test likely monophyly of 670 metazoan (in this analysis generally sponge + bilaterian) proteins based on clear separation of bit 671 scores, using blastp (v2.10.0+)45. 672

31 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

673 References

674 675 1. Kapli, P., Yang, Z. & Telford, M. J. Phylogenetic tree building in the genomic age. Nat. 676 Rev. Genet. 21, 428–444 (2020).

677 2. Cannon, J. T. et al. Xenacoelomorpha is the sister group to Nephrozoa. Nature 530, 89–93 678 (2016).

679 3. Laumer, C. E. et al. Revisiting metazoan phylogeny with genomic sampling of all phyla. 680 Proc. R. Soc. B Biol. Sci. 286, 20190831 (2019).

681 4. Philippe, H. et al. Mitigating anticipated effects of systematic errors supports sister-group 682 relationship between Xenacoelomorpha and Ambulacraria. Curr. Biol. 29, 1818-1826.e6 683 (2019).

684 5. Marlétaz, F., Peijnenburg, K. T. C. A., Goto, T., Satoh, N. & Rokhsar, D. S. A new 685 spiralian phylogeny places the enigmatic arrow among Gnathiferans. Curr. Biol. 686 29, 312-318.e3 (2019).

687 6. Rouse, G. W., Wilson, N. G., Carvajal, J. I. & Vrijenhoek, R. C. New deep-sea species of 688 Xenoturbella and the position of Xenacoelomorpha. Nature 530, 94–97 (2016).

689 7. Felsenstein, J. Cases in which parsimony or compatibility methods will be positively 690 misleading. Syst. Biol. 27, 401–410 (1978).

691 8. Lartillot, N. & Philippe, H. A Bayesian mixture model for across-site heterogeneities in 692 the amino-acid replacement process. Mol. Biol. Evol. 21, 1095–1109 (2004).

693 9. Lartillot, N., Brinkmann, H. & Philippe, H. Suppression of long-branch attraction artefacts 694 in the animal phylogeny using a site-heterogeneous model. BMC Evol. Biol. 7, S4 (2007).

695 10. Lartillot, N., Rodrigue, N., Stubbs, D. & Richer, J. Phylobayes mpi: Phylogenetic 696 reconstruction with infinite mixtures of profiles in a parallel environment. Syst. Biol. 62, 697 611–615 (2013).

698 11. Aguinaldo, A. M. A. et al. Evidence for a clade of , and other 699 moulting animals. Nature 387, 489–493 (1997).

700 12. Bergsten, J. A review of long-branch attraction. 21, 163–193 (2005).

701 13. Tarver, J. E. et al. Well-annotated microRNAomes do not evidence pervasive miRNA 702 loss. Genome Biol. Evol. 10, 1457–1470 (2018).

32 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

703 14. Papillon, D., Perez, Y., Caubit, X. & Le Parco, Y. Identification of Chaetognaths as 704 protostomes is supported by the analysis of their mitochondrial genome. Mol. Biol. Evol. 705 21, 2122–2129 (2004).

706 15. Natsidis, P., Schiffer, P. H., Salvador-Martínez, I. & Telford, M. J. Computational 707 discovery of hidden breaks in 28S ribosomal RNAs across and consequences 708 for RNA Integrity Numbers. Sci. Rep. 9, 1–10 (2019).

709 16. Simakov, O. & Kawashima, T. Independent evolution of genomic characters during major 710 metazoan transitions. Dev. Biol. 427, 179–192 (2017).

711 17. Wang, W., Zhong, J., Su, B., Zhou, Y. & Wang, Y. Q. Comparison of Pax1/9 locus 712 reveals 500-Myr-old syntenic block and evolutionary conserved noncoding regions. Mol. 713 Biol. Evol. 24, 784–791 (2007).

714 18. Maddison, W. P. Gene trees in species trees. Syst. Biol. 46, 523–536 (1997).

715 19. Roch, S., Nute, M. & Warnow, T. Long-Branch attraction in species tree estimation: 716 inconsistency of partitioned likelihood and topology-based summary methods. Syst. Biol. 717 68, 281–297 (2019).

718 20. Telford, M. J., Budd, G. E. & Philippe, H. Phylogenomic insights into animal evolution. 719 Curr. Biol. 25, PR876-R887 (2015).

720 21. Grobben, K. Die systematische einteilung des tierreichs. Verhandlungen der Zool. 721 Gesellschaft Österreich 58, 491–511 (1908).

722 22. Marlétaz, F. et al. Chaetognath : a protostome with deuterostome-like 723 development. Curr. Biol. 16, PR577-R578 (2006).

724 23. Willmer, P. relationships: patterns in animal evolution. (Cambridge 725 University Press, 1990).

726 24. Martín-Durán, J. M., Janssen, R., Wennberg, S., Budd, G. E. & Hejnol, A. Deuterostomic 727 development in the protostome Priapulus caudatus. Curr. Biol. 22, P2161-2166 (2012).

728 25. Wennberg, S. A., Janssen, R. & Budd, G. E. Early of the 729 priapulid Priapulus caudatus. Evol. Dev. 10, 326–338 (2008).

730 26. Ruppert, E. E. Key characters uniting and chordates: Homologies or 731 homoplasies? Can. J. Zool. 83, 8–23 (2005).

732 27. Shu, D. G. et al. Primitive deuterostomes from the Chengjiang Lagerstätte (Lower 733 Cambrian, China). Nature 414, 419–424 (2001).

33 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

734 28. Shu, D. et al. A new species of yunnanozoan with implications for deuterostome 735 evolution. Science 299, 1380–1384 (2003).

736 29. Shu, D. G., Conway Morris, S., Zhang, Z. F. & Han, J. The earliest history of the 737 deuterostomes: The importance of the Chengjiang Fossil-Lagerstätte. Proc. R. Soc. B Biol. 738 Sci. 277, 165–174 (2010).

739 30. Ou, Q. et al. Evidence for gill slits and a pharynx in Cambrian vetulicolians: implications 740 for the early evolution of deuterostomes. BMC Biol. 10, (2012).

741 31. Han, J., Morris, S. C., Ou, Q., Shu, D. & Huang, H. Meiofaunal deuterostomes from the 742 Cambrian of Shaanxi (China). Nature 542, 228–231 (2017).

743 32. Morris, S. C., Halgedahl, S. L., Selden, P. & Jarrard, R. D. Rare primitive deuterostomes 744 from the Cambrian (Series 3) of Utah. J. Paleontol. 89, 631–636 (2015).

745 33. Peterson, K. J. & Eernisse, D. J. The phylogeny, evolutionary developmental , and 746 paleobiology of the Deuterostomia: 25 of new techniques, new discoveries, and new 747 ideas. Org. Divers. Evol. 16, 401–418 (2016).

748 34. Hejnol, A. & Martindale, M. Q. Acoel development indicates the independent evolution of 749 the bilaterian mouth and anus. Nature 456, 382–386 (2008).

750 35. Cunningham, J. A., Liu, A. G., Bengtson, S. & Donoghue, P. C. J. The origin of animals: 751 Can molecular clocks and the fossil record be reconciled? BioEssays 39, 1–12 (2017).

752 36. Flouri, T., Stamatakis, A. & Kapli, P. newick-tools: A novel software for simulating and 753 processing phylogenetic trees. https://github.com/xflouris/newick-tools (2018).

754 37. Nguyen, L. T., Schmidt, H. A., Von Haeseler, A. & Minh, B. Q. IQ-TREE: A fast and 755 effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. 756 Evol. 32, 268–274 (2015).

757 38. Strimmer, K. & Von Haeseler, A. Likelihood-mapping: A simple method to visualize 758 phylogenetic content of a sequence alignment. Proc. Natl. Acad. Sci. U. S. A. 94, 6815– 759 6819 (1997).

760 39. Hamilton, N. & Ferry, M. ggtern: Ternary diagrams using ggplot2. J. Stat. Software, Code 761 Snippets 87, 1–17 (2018).

762 40. Quang, L. S. et al. Empirical profile mixture models for phylogenetic reconstruction. 763 Bioinformatics 24, 2317–2323 (2008).

764 41. Wang, H. C., Minh, B. Q., Susko, E. & Roger, A. J. Modeling site heterogeneity with 765 posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Syst.

34 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

766 Biol. 67, 216–235 (2018).

767 42. Simion, P. et al. A large and consistent phylogenomic dataset supports as the 768 sister group to all other animals. Curr. Biol. 27, 958–967 (2017).

769 43. Ludeman, D. A., Farrar, N., Riesgo, A., Paps, J. & Leys, S. P. Evolutionary origins of 770 sensation in metazoans: Functional evidence for a new sensory organ in sponges. BMC 771 Evol. Biol. (2014) doi:10.1186/1471-2148-14-3.

772 44. Grabherr, M. G. ., Brian J. Haas, Moran Yassour Joshua Z. Levin, Dawn A. Thompson, 773 Ido Amit, Xian Adiconis, Lin Fan, Raktima Raychowdhury, Qiandong Zeng, Zehua Chen, 774 Evan Mauceli, Nir Hacohen, Andreas Gnirke, Nicholas Rhind, Federica di Palma, Bruce 775 W., N. & Friedman, and A. R. Trinity: Reconstructing a full-length transcriptome without 776 a genome from RNA-Seq data. Nat. Biotechnol. 29, 644 (2013).

777 45. Camacho, C. et al. BLAST+: Architecture and applications. BMC Bioinformatics 10, 421 778 (2009). 779 780

35 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review)Philippe is the author/funder, who has granted bioRxiv a license to displayPhilippe the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

750 750

h h

t t

g g

n n

e e

L L

t

t 500 500

n n

e e

m m

n n

g g

i i

l l

A A

250 250

0 0 Chordata Chordata Chordata Ecdysozoa Ecdysozoa Ecdysozoa Xenambulacraria Xenambulacraria Xenambulacraria Lophotrochozoa Lophotrochozoa Lophotrochozoa Ecdysozoa Lophotrochozoa Ecdysozoa Ecdysozoa Xenambulacraria Chordata Lophotrochozoa Ecdysozoa Lophotrochozoa Lophotrochozoa Chordata Xenambulacraria Outgroup Outgroup Outgroup Outgroup Outgroup Outgroup Cannon Laumer Cannon Laumer 1000 1000

2000 2000

h h

t t g

g 750 750

n n e

e 1500 1500

L L

t t n

n 500 500 e

e 1000 1000

m m

n n

g g i

i 500 500 l

l 250 250

A A 0 0 0 Marletaz Rouse Marletaz Rouse

1000 1000 4000 4000

h h

t t g

g 750 750 n

n 3000 3000

e e

L L

t t n

n 2000 500 2000 500

e e

m m

n n

g g i

i 1000 1000 l

l 250 250

A A 0 0 0 0

Extended Data Figure 1 | Longer genes tend to support monophyletic Protostomia but not monophyletic Deuterostomia. Box plots showing the distribution of alignment lengths for genes strongly supporting the three alternative topologies shown (sets of genes selected using the data shown in figure 2). Longer alignments are expected to contain more phylogenetic signal. Asterisks indicate significance at p < 0.05 using a Welch t-test for equal means. Left: Data from all five named datasets comparing the lengths of gene alignments supporting monophyletic Protostomia versus those supporting two alternative topologies with paraphyletic Protostomia. For most datasets the genes supporting monophyletic Protostomia are significantly longer than the genes supporting the alternative topologies. Right: Data from all five datasets comparing lengths of gene alignment supporting monophyletic Deuterostomia versus two alternative topologies with paraphyletic Deuterostomia. For most datasets the genes supporting monophyletic Deuterostomia are not significantly longer than the genes supporting the alternative topologies. bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Extended Data Table 1 | Average alignment length and monophyly scores of genes supporting the monophyly of Protostomia (PM), its alternatives (P1/P2), as well as of genes showing no preference to any of the three topologies (Px). Dataset Topology Proportion Alignment Monophyly Length Score Cannon PM 61.75% 269.6 0.628 P1 9.20% 214 0.589 P2 7.92% 232.4 0.55 Px 14.22% 195 0.568

Laumer PM 65.88% 408.7 0.564 P1 7.35% 309.3 0.554 P2 7.58% 310.8 0.559 Px 12.56% 215.3 0.536

Marletaz PM 61.50% 426.2 0.634 P1 9.20% 346.8 0.594 P2 7.92% 324 0.584 Px 14.22% 245.5 0.577

Philippe PM 58.65% 323.4 0.626 P1 9.55% 293.3 0.59 P2 8.70% 277.7 0.582 Px 14.32% 224.9 0.564

Rouse PM 46.94% 340.3 0.494 P1 11.46% 339.8 0.474 P2 10.02% 323.8 0.466 Px 23.01% 321.1 0.478

Average PM 58.94% 353.6 0.589 P1 9.71% 300.6 0.56 P2 8.66% 293.7 0.548 Px 14.82% 240.4 0.546

bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Extended Data Table 2 | Average alignment length and monophyly scores of genes supporting the monophyly of Deuterostomia (DM), its alternatives (D1/D2), as well as of genes showing no preference to any of the three topologies (Dx). Dataset Topology Proportion Alignment Monophyly Length Score Cannon DM 28.60% 389 0.621 D1 22.02% 424.7 0.638 D2 20.09% 419.5 0.622 Dx 17.25% 264.3 0.58

Laumer DM 35.07% 286.2 0.603 D1 14.93% 265.9 0.6 D2 15.17% 244.7 0.599 Dx 22.99% 189.8 0.617

Marletaz DM 27.34% 412.3 0.562 D1 18.99% 407.3 0.58 D2 20.27% 362.6 0.558 Dx 20.7% 240.7 0.533

Philippe DM 25.58% 309.1 0.617 D1 20.55% 324.8 0.615 D2 23.02% 314.3 0.612 Dx 21.06% 237 0.577

Rouse DM 26.66% 350.4 0.48 D1 29.35% 323.6 0.496 D2 17.32% 362.7 0.496 Dx 24.28% 297.1 0.484

Average DM 28.65% 349.4 0.577 D1 19.17% 349.3 0.586 D2 19.17% 340.8 0.577 Dx 21.26% 245.8 0.558

bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Extended Data Table 3 | Results of t-test comparing alignment length of genes supporting the monophyly of either Protostomia (PM) or Deuterostomia (DM) and the two alternatives (P1/P2 and D1/D2 correspondingly) Genes Supporting Genes Supporting Dataset topology 1 topology 2 P-value Laumer DM(148) D1(64) 0.04945* Philippe DM(300) D1(270) 0.6855 Cannon DM(252) D1(177) 0.241 Marletaz DM(321) D1(238) 0.08241 Rouse DM(314) D1(204) 0.3317 Laumer DM(148) D2(63) 0.4596 Philippe DM(300) D2(241) 0.2372 Cannon DM(252) D2(194) 0.1436 Marletaz DM(321) D2(223) 0.8731 Rouse DM(314) D2(228) 0.008017*

Laumer PM(278) P1(32) 0.1285 Philippe PM(688) P1(102) 0.001891* Cannon PM(544) P1(80) 0.00001572* Marletaz PM(722) P1(93) 0.00001684* Rouse PM(553) P1(118) 0.1573 Laumer PM(278) P2(31) 0.004006* Philippe PM(688) P2(112) 0.02411* Cannon PM(544) P2(97) 0.0002559* Marletaz PM(722) P2(108) 0.000005674* Rouse PM(553) P2(135) 0.9662 *Statistically significantly different

Page 1 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.01.182915; this version posted July 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Extended Data Table 4 | Number of simulation replicates supporting the three alternative topologies under three different true tree hypotheses (D1, D2, DM) for different datasets and models True Tree: D1 True Tree: D2 True Tree: DM Dataset Model D1 DM D2 D2 DM D1 DM other 36-taxa-50K LG+F+G 60 40 - 88 12 - 100 - 36-taxa-50K C60+LG+F+G 100 - - 100 - - 100 - 23-taxa-50K-no-longs LG+F+G 100 - - 100 - - 100 - 23-taxa-50K-no-shorts LG+F+G - 100 - - 100 - 100 - 36-taxa-10K LG+F+G 70 30 - 96 4 - 100 - 36-taxa-10K C60+LG+F+G 86 14 - 100 - - 100 - 23-taxa-10K-no-longs LG+F+G 93 5 2 100 - - 100 - 23-taxa-10K-no-shorts LG+F+G - 100 - - 100 - 100 -