<<

bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 2 Title: Molecular data from Orthonectid worms show they

3 are highly degenerate members of Annelida not

4 phylum .

5 Authors: Philipp H. Schiffer1, Helen E. Robertson1, Maximilian J. Telford1*

6

7 Affiliations: 1 Centre for ’s Origins and Evolution, Department of Genetics, Evolution

8 and Environment, University College London, Gower Street, London, WC1E 6BT, UK

9 *Correspondence to: [email protected]

10

11

12 Summary:

13 The Mesozoa are a group of tiny, extremely simple, vermiform endoparasites of various

14 marine (Fig. 1). There are two recognised groups within the Mesozoa: the

15 (Fig. 1a,b; with a few hundred cells including a nervous system made up of just

16 10 cells [1]) and the Dicyemids (Fig. 1c; with at most 42 cells [2]). They are

17 classic ’Problematica’ [3] - the name Mesozoa suggests an evolutionary position

18 intermediate between Protozoa and Metazoa (animals) [4] and implies their simplicity is a

19 primitive state, but molecular data have shown they are members of within

20 [5-8] which would mean they derive from a more complex ancestor. Their precise

21 phylogenetic affinities remain uncertain, however, and ascertaining this is complicated by the

22 very fast evolution observed in genes from both groups, leading to the common systematic

23 error of Long Branch Attraction (LBA) [9]. Here we use mitochondrial and nuclear gene bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

24 sequence data, and show beyond doubt that both dicyemids and orthonectids are members of

25 the Lophotrochozoa. Carefully addressing the effects of systematic errors due to unequal

26 rates of evolution, we show that the phylum Mesozoa is polyphyletic. While the precise

27 position of dicyemids remains unresolved within Lophotrochozoa, we unequivocally identify

28 orthonectids as members of the phylum Annelida. This result reveals one of the most extreme

29 cases of body plan simplification in the ; our finding makes sense of an

30 -like cuticle in orthonectids [1] and suggests the circular muscle cells repeated along

31 their body [10] may be segmental in origin.

32

33 Results

34 Using a new assembly of available genomic and transcriptomic sequence data we identified an

35 almost complete mitochondrial genome from Intoshia linei (2 ribosomal RNAs, 20 transfer

36 RNAs and all protein coding genes apart from atp8) and recovered 9 individual mitochondrial

37 gene containing contigs from Dicyema japonicum and from a second unidentified species

38 (Dicyema sp.; cox1, 2, 3; cob; and nad1, 2, 3, 4, 5). Cob, nad3, nad4, and nad5 had not

39 previously been identified in any Dicyma species. All studied possess a unique,

40 derived combination of amino acid signatures and conserved deletions in their mitochondrial

41 NAD5 genes. Comparing the NAD5 protein coding regions of Intoshia and Dicyema to those

42 of other Metazoa shows that both share almost all of the conserved signatures [11]

43 (Fig. 2a). This signature is significantly more complex than the two amino acids of the

44 Lox5/DoxC signature from Dicyema previously published [11-13] and shows beyond doubt

45 that both groups are protostomes.

46 It has been suggested that mesozoans are derived from the parasitic neodermatan . If

47 this were correct mesozoans would be expected to share two changes in mitochondrial genetic bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

48 code that unite all rhabditophoran flatworms, where the triplet AAA codes for Asparagine (N)

49 rather than the normal Lysine (K) and ATA codes for Isoleucine (I) rather than the usual

50 Methionine (M) [14]. We inferred the mitochondrial genetic codes for Dicyema and Intoshia.

51 Both groups have the standard invertebrate mitochondrial code arguing against a relationship

52 with the parasitic rhabditophoran platyhelminths (table S1).

53 We next aligned the mitochondrial genes of Intoshia and three species of Dicyema with

54 orthologs from a diversity of other Metazoans and concatenated these to produce a matrix of

55 2,969 reliably aligned amino acids from 69 species. Phylogenetic analyses of this

56 comparatively small data set is not expected to be as reliable as a much larger set of nuclear

57 genes and aspects of the topology and observed branch lengths suggest it was affected by LBA

58 (Fig. 2b). To reduce the effects of LBA on the inference of the affinities of the mesozoans we

59 removed the taxa with the longest branches and considered the position of the dicyemids and

60 orthonectid separately (as both are very long branched). We were unable to resolve the position

61 of the dicyemids (although they are clearly lophotrochozoans), but found some support for

62 placing the orthonectid Intoshia linei with the (Fig. 2c and figures S1, 2). Intoshia

63 linei has a unique mitochondrial gene although the order of the genes nad1, nad6, and

64 cob match that seen in the Lophotrochozoan ground plan and the early branching annelid

65 Owenia (Fig. 2d).

66 We next assembled a data set of 469 orthologous genes, 227,187 reliably aligned amino acids,

67 from 45 species of animals including Intoshia linei and two species of Dicyema. After

68 removing positions in the concatenated alignment with less than 50% occupancy we had an

69 alignment length of 190,027 amino acids and average completeness of ~68%. Intoshia linei

70 was 65% complete, while Dicyema japonicum and Dicyema sp. were 77% and 43% complete

71 respectively (table S2). We conducted a bayesian phylogenetic analyses of these data with the

72 site heterogeneous CAT+G4 model in Phylobayes [15]. To provide an additional, conservative bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

73 estimate of clade support and to enable further analyses in a practical time frame, we also used

74 jackknife subsampling. For each jackknife analysis we took 50 random subsamples of 30,000

75 amino acids each and ran 2,000 cycles (phylobayes CAT+G4) per sample. All 50 subsamples

76 were summarised into a single tree with the first 1800 trees from each excluded as ‘burnin’

77 [16].

78 We observed strong support for a clade of Lophotrochozoa (excluding ) including both

79 dicyemids and the orthonectid (Bayesian Posterior Probability (PP) = 1.0; Jackknife Proportion

80 (JP) = 0.97) (Fig. 3a). The dicyemids and orthonectids were not each other’s closest relatives;

81 the position of the dicyemids within the Lophotrochozoa was not resolved; they were not the

82 sister group of the platyhelminths nor of the gastrotrichs in our analysis. The position of the

83 orthonectid Intoshia, in contrast, was resolved as being within the clade of annelids (Fig. 2a PP

84 = 0.97; JP = 0.74).

85 We next asked whether there was any effect from long branched dicyemids on the strength of

86 support for inclusion of Intoshia within the Annelida - Intoshia also being a long-branched

87 taxon. Repeating our jackknife analyses with dicyemids excluded increased the support for

88 Intoshia as an annelid from JP = 0.74 to JP = 0.86 (Fig. 3b) showing that when the expected

89 LBA between Dicyema spp and Intoshia is prevented, there is stronger support for including

90 the orthonectid in Annelida. An equivalent analysis omitting Intoshia did not help to resolve

91 the position of dicyemids (figure S3).

92 To test further the support for Intoshia being a member of Annelida, we reasoned that an

93 analysis restricted to genes showing the strongest signal supporting monophyletic Annelida

94 should give stronger support to Intoshia within Annelida but only if it is indeed a member of

95 the clade; if not, support should decrease when using this subset of genes. We first removed all

96 mesozoan sequences from each individual gene alignment and reconstructed a tree for each

97 gene. We ranked these gene trees according to the proportion of all annelids present in a given bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

98 gene data set that were observed united in a clade. We concatenated the genes (now including

99 mesozoans) from strongest supporters of monophyletic Annelida to weakest. We repeated our

100 jackknife analyses using the best quarter of genes. An analysis of the genes that most strongly

101 support monophyletic Annelida results in an increase support for inclusion of Intoshia within

102 Annelida from JP = 0.74 to JP = 0.94 (Fig. 3c).

103 Our results suggest that recent findings of a close relationship between Intoshia and Dicyema

104 and the linking of both these taxa to rapidly evolving gastrotrichs and platyhelminths [7,8] is

105 due to long branch attraction. To test this prediction we exaggerated the expected effects of

106 LBA on our own data set by using less well fitting models. We first conducted cross validation

107 comparing the site heterogeneous CAT+G4 model we have used to the site homogenous

108 LG+G4 and show that LG+G4 is a significantly less good fit to our data (CAT+G4 is better

109 than LG+G4: ΔlnL = 9787 +/- 249.265). We used the less well fitting LG+G4 model to

110 reanalyse the jackknife replicates of a data set including our four most complete annelids. We

111 observed a topology clearly influenced by LBA in which long branched taxa including

112 flatworms, annelids, rotifers and were grouped. We also observed within this ‘LBA

113 assemblage’ the two longest branched clades, dicyemids and the orthonectid as each other’s

114 closest relatives. As a further test we reanalysed the published data set [8] which had linked

115 orthonectid and dicyemid with platyhelminths and gastrotrichs. When we removed the most

116 obvious source of LBA - the long branched dicyemid - we found that the orthonectid Intoshia

117 was, as expected, found not with platyhelminths or gastrotrichs but with the two annelids

118 present in this data set, again providing evidence of the effects of long branch attraction (Fig

119 4).

120 Discussion

121 We have analysed the first, almost complete mitochondrial genome sequence of an orthonectid

122 mesozoan and added to the known mitochondrial genes of to provide two powerful bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

123 rare genomic changes. Our analyses of mitochondrial NAD5 gene sequences show

124 unequivocally that both Dicyemida and Orthonectida are members of the protostomes and the

125 absence of rhabditophoran mitochondrial genetic code changes rejects existing ideas

126 that either group might be derived from parasitic flatworms. Both groups show unusually high

127 rates of evolution and this required steps to test for and avoid the possible effects of long branch

128 attraction, not least between the orthonectids and dicyemids.

129 Our mitochondrial data set and our large, taxonomically broad set of nuclear genes with a low

130 percentage of missing data, analysed with well fitting, site heterogeneous models of sequence

131 evolution, do not support the close relationship between orthonectids and dicyemids.

132 Orthonectids are annelids and not members of the Mesozoa and the phylum Mesozoa sensu

133 lato is an unnatural polyphyletic assemblage. We were unable to place the dicyemids more

134 precisely and they may be considered a phylum in their own right. Experiments manipulating

135 the expected effects of LBA strongly suggest previous phylogenies were affected by this

136 important source of systematic error. Finding the orthonectids and dicyemids not closely

137 associated demonstrates a remarkable instance of convergent evolution in two unrelated,

138 miniaturised parasites.

139 The finding that the orthonectid Intoshia is a member of the Annelida shows that it has evolved

140 its extraordinary simplicity by drastic simplification from a much more complex annelid

141 common ancestor. Our phylogenetic analyses could not more precisely place Intoshia within

142 the annelids, however, a short stretch of mitochondrial genes (nad1, nad6, cob) that are found

143 in the same order as in the lophotrochozoan ancestor and in the early branching annelid Owenia

144 fusiformis but not in the pleistoannelid ground plan argues for a position outside of the

145 Pleistoannelida [17] (Fig 2d). Possible evidence of an ancestral segmented body plan is still

146 apparent in the series of circular muscles regularly spaced along the antero-posterior axis of

147 Intoshia (Fig 1b), along with similarly repeated bands of cilia (Fig 1 and ref [18]). Further bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

148 analysis of the genome, embryology and morphology of Intoshia or other orthonectids are

149 predicted to show additional clues as to their cryptic annelidan ancestry.

150 Methods 151 152 Genome and transcriptome assemblies.

153 We downloaded genomic (Intoshia linei: SRR4418796, SRR4418797) and transcriptomic

154 (Dicyema sp.: SRR827581; Dicyema japonicum: DRR057371) data from the NCBI Short

155 Read Archives and DDBJ, and used Trimmomatic [19] to clean residual adapter sequences

156 from the sequencing reads and to remove low quality bases. We used the clc assembly cell

157 (clcBIO/Qiagen; v.5.0) to re-assemble the I. linei genome and the Trinity pipeline [20]

158 (v.2.3.2) to assemble the Dicyema. sp. and D. japonicum transcriptomes using default

159 settings. We additionally assembled transcriptomes for Phascolopsis gouldii,

160 Spiochaetopterus sp., Arenicola marina, Sabella pavonina, Magelona pitelkai,

161 Pharyngocirrus tridentiger and Bonellia viridis from SRA datasets (SRR1654498,

162 SRR1224605, SRR2005653, SRR2005708, SRR2015609, SRR2016714, SRR2017645)

163 using the same approach.

164

165 Identifying mitochondrial genome fragments.

166 Using mitochondrial gene protein coding sequences from flatworms as queries [21] we used

167 tblastn [22] and blastp to search for Dicyema sp. and D. japonicum mitochondrial fragments

168 in the Trinity RNA-Seq assemblies, and screened the I. linei genome re-assembly in a similar

169 way. Positively identified ORFs were then blasted against NCBI nr to detect possible

170 contamination from species in the RNA-Seq data. For each Dicyema sp. gene-bearing

171 contig, we also found additional contigs which had strongly matching blast hits to Octopus or bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

172 other (or in some cases to the gastropod mollusc Aplysia) and we discarded

173 these as likely contaminations.

174

175 Annotating mitochondrial genomes.

176 Using blast we identified a 14.2kb mitochondrial contig in the assembled I. linei genome,

177 which we annotated using MITOS [23]. The location of protein-coding genes were manually

178 verified from MITOS prediction, and inferred to start from the first in-frame start codon

179 (ATN, GTG, TTG, or GTT). The C-terminal of the protein-coding genes was inferred to be

180 the first in-frame stop codon (TAA, TAG or TGA). We aligned the Intoshia and Dicyema

181 NAD5 genes with those from 5 protostomes, 4 , and 2 non-bilaterian species in

182 the Geneious software to visualise Protostome specific signatures in the sequence.

183

184 Mitochondrial Phylogenetics

185 We grouped the mesozoan mitochondrial protein coding genes with their orthologs from 65

186 other species selected to cover the diversity of the Metazoa including diploblasts,

187 deuterostomes and ecdysozoans but with an emphasis on the diversity of Lophotrochozoa.

188 We aligned each set of orthologs using Muscle [24] v3.8.31 using default parameters and

189 trimmed these alignments to exclude unreliably aligned positions using TrimAl [25] (version

190 1.2 rev 59 using default settings). Finally, we concatenated the trimmed alignments of all

191 genes into a supermatrix of 2969 positions. We inferred a phylogeny with phylobayes (4.1b)

192 under the CAT+G4 model. We ran 10 independent chains for 10,000 cycles each. We

193 summarised all ten chains (bpcomp) discarding the first 8,000 trees from each as burnin. We

194 reconstructed additional mitochondrial phylogenies omitting (i) the long branching flatworm

195 species, (ii) all long branch taxa and also Intoshia, and (iii) long branch taxa and the Dicyema bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

196 species. Here and elsewhere we visualised and edited phylogenetic trees with FigTree

197 (v1.4.3; http://tree.bio.ed.ac.uk/software/figtree/).

198

199 Nuclear gene orthology determination

200 We chose to add the mesozoan data to sets of orthologous genes that were previously

201 successfully used to infer lophotrochozoan phylogeny [26,27]. We first used Orthofinder [28]

202 (v.1.0.8) to calculate orthologous relationships between the genes predicted for I. linei in the

203 recent genome paper [8] and our Dicyema sp. gene predictions. To ensure robustness of the

204 analysis we included several outgroup species (Supplementary table 2) In particular, as we

205 were concerned about potential contamination by the hosts of the parasitic Dicyema we

206 included the Octopus bimaculoides proteome. Since the published phylogenomic studies

207 included few annelid species we added our own Trinity assemblies of several additional

208 species (see above). We then extracted all orthologous groups containing the Octopus and the

209 two mesozoan taxa from the Orthofinder output and inserted these sequences into the original

210 alignments. This resulted in 590 orthologous groups. With the aid of OMA [29] and custom

211 Perl scripts we filtered these groups to contain single copy orthologs of all species. We re-

212 aligned each set of orthologs using clustal-omega [30]; we removed unreliably aligned

213 positions from each alignment using TrimAl; finally we constructed individual gene trees

214 from these trimmed alignments using phyml [31] (v20160207). Using Python code and the

215 ETE3 toolkit we checked each tree for instances where sequences from Octopus and Dicyema

216 sp. were each other’s closest relatives (suggesting the sequence is an Octopus contaminant)

217 and removed the 5 alignments where the trees had this topology from our set. We

218 concatenated all single trimmed alignments of 45 taxa into a supermatrix of 227,646 bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

219 positions. We used a custom script to eliminate all positions in the alignment with less than

220 50% occupancy.

221

222 Nuclear Gene Phylogenomic analyses

223 Using the mpi version of phylobayes (in v.1.7) run over four independent chains for 5000

224 cycles and discarding the first 4500 trees as burnin we reconstructed a phylogeny using this

225 alignment under both the CAT+G4 model of molecular evolution. To provide a conservative

226 measure of clade support and to test different data samples in a reasonable time we also

227 reconstructed trees using 50 jackknife sub-samples of 30,000 positions each from the

228 supermatrix. We used phylobayes 4.1c with the aid of the gnu-parallel command line tool

229 [32] and the UCL HPC cluster. We used the CAT+G4 model, and also compared results

230 from LG+G4. We ran phylobayes for 2000 cycles per jackknife sample which consistently

231 resulted in a plateauing of the likelihood score. We summarised all 50 of these phylobayes

232 analyses per model (using bpcomp) discarding the first 1800 sampled trees per jackknife as

233 burnin. We also tested the effect of different species compositions in our dataset by

234 performing phylobayes jackknife sampling with different subsets of taxa.

235

236 Cross validation

237 We compared the fit of CAT+G4 and LG+G4 models to our data using cross validation as

238 described in the phylobayes user manual. We ran 10 replicates and for each replicate we used

239 a randomly selected 30,000 positions of the data as a training set and 10,000 randomly bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

240 selected positions as the test set. Log likelihood scores were averaged over the ten replicates

241 using the sumcv command.

242

243 Ranking genes according to support for monophyletic Annelida.

244 We first removed all Intoshia and Dicyema sequences from each individual gene alignment.

245 For each individual gene, we reconstructed a tree from the aligned protein coding sequences

246 using Ninja [33]. Each tree was parsed using a custom script to find the proportion of

247 annelids in the data set present in the largest clade of annelids found. The tree was given a

248 score which was calculated as the number of annelids in the largest clade/total number of

249 annelids on the tree. Trees with larger monophyletic annelid clades scored highest. The

250 genes were then concatenated in order of their score. We took the first 25% of positions from

251 this concatenation (those genes with the strongest signal supporting monophyletic annelids)

252 and analysed jackknife replicates as before.

253 254 Acknowledgements: 255 256 The authors are grateful to Prof George Slyusarev (Saint Petersburg State University, Russia)

257 for providing images of Intoshia linei and for sharing an unpublished book chapter and to Dr

258 Tsai-Ming Lu and colleagues (Okinawa Institute of Science and Technology, Japan) for

259 sharing data. The authors also thank Fraser Simpson (UCL) for help in editing Figure 1b. The

260 research was funded by ERC grant (ERC-2012-AdG 322790) to MJT. Alignments have been

261 deposited with Zenodo (XXX) and phylogenetic trees are available through treebase.org

262 (XXX).

263

264 bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

265 Author Contributions

266 Conceived the study: MJT. Planned the study: MJT and PHS. Assembled the data sets: PHS.

267 Analysed the data: PHS and MJT. Drafted the manuscript: MJT and PHS. Analysed

268 mitochondrial data: HER.

269

270 Declaration of Interests:

271 The authors declare no competing financial interests. 272 273 274 Figure Legends 275 276 Fig. 1: The mesozoans Intoshia variabili and Dicyema typus 277 278 A. Differential Interference contrast micrograph of an Intoshia variabili female showing 279 repeated bands of ciliated cells. Picture G. Slyusarev (St Petersburg State University, 280 Russia). 281 B. Confocal image of a phalloidin stained female specimen of Intoshia linei reveals repeated 282 set of circular muscles. Picture G. Slyusarev (St Petersburg State Univ.). 283 C. Rhombogen stage of a dicyemid (Dicyema typus from the Octopus) adapted from Hyman 284 L.H. The Invertebrates: Protozoa through McGraw-Hill, New York 1940(19). 285 Anterior to right in all images. 286 287 Fig. 2: Analyses of the phylogenetic positions of Dicyema and Intoshia based on 288 mitochondrial gene sequences. 289 290 A. Alignment of the mitochondrial NAD5 gene from selected protostomes, deuterostomes, 291 and outgroups, highlighting derived substitutions and amino acid deletions shared by the 292 orthonectids, dicyemids, and other protostomes. 293 B. A mitochondrial bayesian phylogeny based on 2969 positions places orthonectids and 294 dicyemids inside Lophotrochozoa, but the unlikely assemblage of Intoshia linei and 295 flatworms with annelids suggest this is affected by systematic error. 296 C. Mitochondrial bayesian phylogeny omitting the long branching taxa including Dicyema 297 gives some support for a position of Intoshia within Annelida.D. Order of the Intoshia nad1, 298 nad6, and cob mitochondrial genes in comparison to the early branching annelid Owenia 299 fusiformis, the pleistoannelid ground plan and the lophotrochozoan ground plan (see ref [17]). 300 bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

301 Fig. 3: Analyses of the phylogenetic positions of Dicyema and Intoshia based on nuclear 302 gene sequences. 303 A. A bayesian phylogeny reconstructed from 190,027 aligned amino acid positions analysed 304 under the CAT+G4 model. Support values are from bayesian posterior probabilities (PP) and 305 from 50 jackknifed sub-samples of 30,000 residues (JP support values in brackets). Both 306 analyses reveal Mesozoa to be polyphyletic and place Intoshia linei in Annelida (see Supp 307 Fig 4a for support values). 308 B. A repeat of the jackknife analysis omitting the long-branching Dicyema species eliminates 309 the potential for LBA between Intoshia and Dicyema. This leads to an increase in the support 310 for a position of Intoshia within Annelida from JP 0.74 to JP 0.86 JP. (Only lophotrochozoan 311 part of the tree shown, see Supplementary Fig 4c for full tree). 312 C. Bayesian jackknife using CAT+G4 model using the best quarter of genes supporting 313 monophyletic annelids leads to increased support for Intoshia within Annelida to JP 0.94 314 even with the inclusion of the Dicyema species. (Only lophotrochozoan part of the tree 315 shown, see Supplementary Fig 4d for full tree). 316

317 Fig. 4: Reanalysis of a published data set addressing potential LBA between mesozoans 318 supports annelid affinity for Intoshia. 319 Repeating the analyses on a previously published data set [8] excluding the long branching 320 Dicyema leads to Intoshia being placed with the annelids, showing the likely effect of LBA 321 on the original analysis. Support values are bayesian posterior probabilities (PP). A B bioRxiv preprint was notcertifiedbypeerreview)istheauthor/funder.Allrightsreserved.Noreuseallowedwithoutpermission. doi: https://doi.org/10.1101/235549 ; C this versionpostedDecember18,2017. The copyrightholderforthispreprint(which

Fig. 1: The mesozoans Intoshia variabili and Dicyema typus A. Differential Interference contrast micrograph of an Intoshia variabili female showing repeated bands of ciliated cells. Picture G. Slyusarev (St Petersburg State University, Russia). B. Confocal image of a phalloidin stained female specimen of Intoshia linei reveals repeated set of circular muscles. Picture G. Slyusarev (St Petersburg State Univ.). C. Rhombogen stage of a dicyemid (Dicyema typus from the Octopus) adapted from Hyman L.H. The Invertebrates: Protozoa through Ctenophora McGraw-Hill, New York 1940(19). Anterior to right in all images. 180 190 260 270 280 290 Dicyema Lophotrochozoa bioRxiv preprint Intoshia rrnS rrnL nad1 nad6 cob nad4L nad4 A Octopus D Phonoris

Taenia Intoshia linei was notcertifiedbypeerreview)istheauthor/funder.Allrightsreserved.Noreuseallowedwithoutpermission. Brugia Drosophila -nad4 -cox1 nad1 nad6 doi: cob nad3 rrnL

Xenopus https://doi.org/10.1101/235549 Pongo Owenia fusiformis Hydra Paramecium nad5 nad2 nad1 nad6 cob nad4L nad4 Pleistoannelida B C cox2 atp8 cox3 nad6 cob atp6 nad5

Acropora_tenuis Acropora_tenuis ; Nematostella_sp Nematostella_sp this versionpostedDecember18,2017. Aurelia_aurita Trichoplax_adhaerens Trichoplax_adhaerens Xenoturbella_bocki Aurelia_aurita 0.62 Strongylocentrotus_pallidus Xenoturbella_bocki Saccoglossus_kowalevskii Strongylocentrotus_pallidus Balanoglossus_carnosus Saccoglossus_kowalevskii Myxine_glutinosa 1 0.83 0.98 Lampetra_fluviatilis Balanoglossus_carnosus Salmo_salar Doliolum_nationalis 0.96 Ornithorhynchus_anatinus 0.74 Ciona_intestinalis Homo_sapiens Myxine_glutinosa Doliolum_nationalis Lampetra_fluviatilis Ciona_intestinalis Salmo_salar Branchiostoma_floridae 1 Asymmetron_inferum Ornithorhynchus_anatinus Priapulus_caudatus Homo_sapiens Tribolium_castaneum 1 Branchiostoma_floridae Locusta_migratoria Asymmetron_inferum Limulus_polyphemus Priapulus_caudatus Daphnia_pulex Trichinella_spiralis Tribolium_castaneum Strongyloides_stercoralis Locusta_migratoria 1 Steinernema_carpocapsae Limulus_polyphemus Schmidtea_mediterranea Daphnia_pulex Spadella_cephaloptera Spadella_cephaloptera Sagitta_inflata

Sagitta_inflata The copyrightholderforthispreprint(which Decipisagitta_decipiens 1 Siphonodentalium_lobatum Decipisagitta_decipiens Mytilus_trossulus Siphonodentalium_lobatum 0.87 Flustrellidra_hispida Mytilus_trossulus Bugula_neritina Thais_clavigera Pupa_strigosa Lineus_alborostratus Biomphalaria_tenagophila Aplysia_californica 0.92 Nautilus_macromphalus 0.68 Thais_clavigera Octopus_vulgaris Lineus_alborostratus Octopus_ocellatus Nautilus_macromphalus Sepia_officinalis Octopus_vulgaris Sepia_esculenta Octopus_ocellatus Sepioteuthis_lessoniana Sepia_officinalis Sepia_esculenta Loligo_bleekeri Sepioteuthis_lessoniana 0.94 Dosidicus_gigas 0.3 Loligo_bleekeri Pupa_strigosa Dosidicus_gigas Biomphalaria_tenagophila Terebratulina_retusa Aplysia_californica Laqueus_rubellus Sipunculus_nudus Terebratulina_retusa Orbinia_latreillii Laqueus_rubellus 0.41 Urechis_caupo Flustrellidra_hispida Platynereis_dumerilii Bugula_neritina Myzostoma_seymourcollegiorum 0.8 Sipunculus_nudus Marphysa_sanguinea Intoshia_linei 0.69 Intoshia_linei 0.57 Microstomum_lineare Orbinia_latreillii Schistosoma_mansoni 0.39 Clymenella_torquata Fasciola_hepatica Platynereis_dumerilii 0.33 Taenia_crassiceps Myzostoma_seymourcollegiorum Echinococcus_shiquicus Urechis_caupo Echinococcus_multilocularis Dicyema_sp Marphysa_sanguinea 1 Dicyema_misakiense Escarpia_spicata Dicyema_japonicum Lumbricus_terrestris Clymenella_torquata Perionyx_excavatus Escarpia_spicata Amynthas_jiriensis Lumbricus_terrestris Annelida Perionyx_excavatus Amynthas_jiriensis 0.1

0.1 Fig. 2: Analyses of the phylogenetic positions of Dicyema and Intoshia based on mitochondrial gene sequences. A. Alignment of the mitochondrial NAD5 gene from selected protostomes, deuterostomes, and outgroups, highlighting derived substitutions and amino acid deletions shared by the orthonectids, dicyemids, and other protostomes. B. A mitochondrial bayesian phylogeny based on 2969 positions places orthonectids and dicyemids inside Lophotrochozoa, but the unlikely assemblage of Intoshia linei and flatworms with annelids suggest this is affected by systematic error. C. Mitochondrial bayesian phylogeny omitting the long branching taxa including Dicyema gives some support for a position of Intoshia within Annelida. D. Order of the Intoshia nad1, nad6, and cob mitochondrial genes in comparison to the early branching annelid Owenia fusiformis, the pleistoannelid ground plan and the lophotrochozoan ground plan (see ref [17]). Amphimedon_queenslandica Sycon_coactum Trichoplax_adhaerens Nematostella_vectensis bioRxiv preprint Hydra_vulgaris Saccoglossus_kowalevskii Rhabdopleura_sp Strongylocentrotus_purpuratus Patiria_miniata was notcertifiedbypeerreview)istheauthor/funder.Allrightsreserved.Noreuseallowedwithoutpermission.

A Homo_sapiens doi: Branchiostoma_floridae Priapulus_caudatus Fig. 3: Analyses of the phylogenetichttps://doi.org/10.1101/235549 positions of Trichinella_spiralis Tribolium_castaneum Dicyema and Intoshia based on nuclear gene Pediculus_humanus Daphnia_pulex sequences. Macracanthorhynchus_hirudinaceus Adineta_ricciae A. A bayesian phylogeny reconstructed from Neomenia_megatrapezata Octopus_bimaculoides 190,027 aligned amino acid positions analysed 1 Lymnea_stagnalis Lottia_gigantea under the CAT+G4 model. Support values are Dicyema_sp

(0.97) Dicyema_japonicum ; from bayesian posterior probabilitiesthis versionpostedDecember18,2017. (PP) and Cerebratulus_sp Tubulanus_polymorphus from 50 jackknifed sub-samples of 30,000 Cephalothrix_linearis Mesodasys_laticaudatus residues (JP support values in brackets). Both Macrodasys_sp Lepidodermella_squamata analyses reveal Mesozoa to be polyphyletic and Macrostomum_lignano Maritigrella_crozieri Monocelis_sp place Intoshia linei in Annelida (see Supp Fig 4a Schistosoma_mansoni Dendrocoelum_lacteum for support values). Stenostomum_sp Magelona_pitelkai B. A repeat of the jackknife analysis omitting the Golfingia_vulgaris Pharyngocirrus_tridentiger long-branching Dicyema species eliminates the Intoshia_linei

0.97 The copyrightholderforthispreprint(which Sabella_pavonina potential for LBA between Intoshia and Dicyema. Capitella_tellata (0.74) Bonellia_viridis This leads to an increase in the support for a Helobdella_robusta 0.1 Alvinella_pompejana Annelida position of Intoshia within Annelida from JP 0.74

0.1 to JP 0.86 JP. (Only lophotrochozoan part of the Macracanthorhynchus_hirudinaceus Macracanthorhynchus_hirudinaceus Adineta_ricciae Adineta_ricciae tree shown, see Supplementary Fig 4c for full B C Mesodasys_laticaudatus Mesodasys_laticaudatus Macrodasys_sp tree). Macrodasys_sp Lepidodermella_squamata Lepidodermella_squamata Neomenia_megatrapezata C. Bayesian jackknife using CAT+G4 model using Neomenia_megatrapezata Octopus_bimaculoides Octopus_bimaculoides Lymnea_stagnalis the best quarter of genes supporting monophyletic Lymnea_stagnalis Lottia_gigantea Lottia_gigantea Dicyema_sp annelids leads to increased support for Intoshia Cerebratulus_sp Dicyema_japonicum Tubulanus_polymorphus Cerebratulus_sp within Annelida to JP 0.94 even with the inclusion Cephalothrix_linearis Tubulanus_polymorphus of the Dicyema species. (Only lophotrochozoan Macrostomum_lignano Cephalothrix_linearis Macrostomum_lignano Maritigrella_crozieri part of the tree shown, see Supplementary Fig 4d Maritigrella_crozieri Monocelis_sp Monocelis_sp Schistosoma_mansoni for full tree). Schistosoma_mansoni Dendrocoelum_lacteum Dendrocoelum_lacteum Stenostomum_sp Stenostomum_sp Magelona_pitelkai Magelona_pitelkai Golfingia_vulgaris Golfingia_vulgaris 0.86 Pharyngocirrus_tridentiger 0.94 Pharyngocirrus_tridentiger Intoshia_linei Sabella_pavonina Sabella_pavonina Intoshia_linei Helobdella_robusta Helobdella_robusta Capitella_tellata Capitella_tellata Bonellia_viridis Bonellia_viridis Alvinella_pompejana Alvinella_pompejana Annelida

0.5 0.50.5 0.5 bioRxiv preprint Daphnia_pulex 1 Tribolium_castaneum 1

Drosophila_melanogaster was notcertifiedbypeerreview)istheauthor/funder.Allrightsreserved.Noreuseallowedwithoutpermission.

Limnognathia_maerski doi: 1 Macracanthorhynchus_hirudinaceus https://doi.org/10.1101/235549 1 Brachionus_koreanus Mesodasys_laticaudatus 1 Macrodasys_sp 0.91 Lepidodermella_squamata 1 1 0.96 Diuronotus_aspetos 0.98 Stenostomum_sthenum 1 ;

Stenostomum_leucops this versionpostedDecember18,2017. 1 Microstomum_lineare Prostheceraeus_vittatus 1 1 Geocentrophora_applanata 1 1 Monocelis_fusca 1 Schistosoma_mansoni 0.99 1 Bothrioplana_semperi Octopus_bimaculatus 1 Lottia_gigantea 0.94 Intoshia_linei 0.93 Helobdella_robusta The copyrightholderforthispreprint(which 0.98 Annelida Capitella_teleta Gnathostomula_paradoxa 1 Austrognathia_sp Homo_sapiens 1 1 Ciona_intestinalis Branchiostoma_floridae

0.1

Fig. 4: Reanalysis of a published data set addressing potential LBA between mesozoans supports annelid affinity for Intoshia.

Repeating the analyses on a previously published data set [7] excluding the long branching Dicyema leads to Intoshia being placed with the annelids, showing the likely effect of LBA on the original analysis. Support values are bayesian posterior probabilities (PP). bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

322 Supplementary Tables 323 324 Supplementary Table 1 325 Predicted correspondence of nucleotide triplets to amino acids in Intoshia and three Dicyema 326 species. For each triplet, the amino acid corresponding to the triplet in the standard 327 invertebrate mitochondrial code is shown, the number of observations of the triplet to 328 prediction is based on, the predicted amino acid and its score and finally the second highest 329 scoring amino acid prediction. The triplets AAA and ATA are highlighted in green and likely 330 errors highlighted in blue. Likely errors are mostly associated with very low numbers of 331 observed GC rich triplets in these very AT rich mitochondrial genomes. 332 333 Supplementary Table 2 334 List of species used in the final phylogenetic analysis, data sources, and representation in the 335 final alignment. 336 337 338 Supplementary Figures: 339 340 Figure S1. Related to Figure 2. 341 Phylogram and corresponding cladogram of a Bayesian analysis of our mitochondrial data set 342 omitting the long-branching flatworm species. Phylobayes CAT+G4 model was run in 10 343 independent runs for 10,000 cycles each on an alignment with 2969 positions and 8000 trees 344 were discarded as burnin. 345 346 Figure S2. Related to Figure 2. 347 Phylogram and corresponding cladogram of a Bayesian analysis of our mitochondrial data set 348 omitting Intoshia linei. Phylobayes CAT+G4 model was run in 10 independent runs for 349 10,000 cycles each on an alignment with 2969 positions and 8000 trees were discarded as 350 burnin. 351 352 Figure S3. Related to Figure 3. 353 A phylogram based on our analysis of the jackknifed dataset omitting Intoshia linei. Contrary 354 to the improvement in placing I. linei observed when excluding the Dicyema species, the 355 exclusion of I. linei does not lead to a better resolution of the Dicyema species’ position. This 356 can be seen as further evidence for the non-affiliation of orthonectids and dicyemids and the 357 correct inference that orthonectids are part of Annelida. bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

358 Figure S4. Related to Figure 3. 359 A. Cladogram corresponding to Fig 3a showing all PP support values for the CAT+G4 360 phylogeny based on the full alignment of 190,027 amino acid positions. 361 362 B. A cladogram including JP support values based on 50 jackknife subsamples of 30,000 363 amino acid positions each independently analysed for 2000 cycles under the CAT+G4 model 364 in phylobayes and summarised with the bpcomp command setting 1800 as burnin. As in the 365 analysis of the full dataset I. linei is found within the annelids and phylum Mesozoa is found 366 as an unnatural assemblage. 367 368 C. Cladogram corresponding to Fig 3b showing all support values. 369 370 D. Cladogram corresponding to Fig 3c showing all support values. 371 372 References:

373 374 [1] Slyusarev GS, Starunov VV. The structure of the muscular and nervous systems of 375 the female Intoshia linei (Orthonectida). Org Divers Evol 2015;16:65–71. 376 [2] Furuya H, Hochberg FG, Tsuneki K. Cell number and cellular composition in 377 infusoriform larvae of dicyemid mesozoans (Phylum Dicyemida). Zool Sci 378 2004;21:877–89. 379 [3] Nielsen C. Animal Evolution. Oxford University Press; 2011. 380 [4] Dodson EO. A note on the systematic position of the Mesozoa. Syst Zoo 1956;5:37. 381 [5] Suzuki TG, Ogino K, Tsuneki K, Furuya H. Phylogenetic analysis of dicyemid 382 mesozoans (phylum Dicyemida) from innexin amino acid sequences: Dicyemids are 383 not related to Platyhelminthes. J Parasitol 2010;96:614–25. 384 [6] Hanelt B, Van Schyndel D, Adema CM, Lewis LA, Loker ES. The phylogenetic 385 position of Rhopalura ophiocomae (Orthonectida) based on 18S ribosomal DNA 386 sequence analysis. Mol Biol Evol 1996;13:1187–91. 387 [7] Lu T-M, Kanda M, Satoh N, Furuya H. The phylogenetic position of dicyemid 388 mesozoans offers insights into spiralian evolution. Zool Letts 2017;3:419. 389 [8] Mikhailov KV, Slyusarev GS, Nikitin MA, Logacheva MD, Penin AA, Aleoshin VV, et 390 al. The genome of Intoshia linei affirms orthonectids as highly simplified spiralians. 391 Curr Biol 2016;26:1768–74. 392 [9] Philippe H, Brinkmann H, Copley RR, Moroz LL, Nakano H, Poustka AJ, et al. 393 Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 394 2011;470:255–8. 395 [10] Slyusarev GS. Fine structure and development of the cuticle of Intoshia variabili 396 (Orthonectida). Acta Zool 2000;81:1–8. 397 [11] Telford MJ, Copley RR. Improving animal phylogenies with genomic data. Trends 398 Genet 2011;27:186–95. 399 [12] Telford MJ. Turning Hox “signatures” into synapomorphies. Evol Dev 2000;2:360–4. 400 [13] Kobayashi M, Furuya H, Holland PW. Dicyemids are higher animals. Nature 401 1999;401:762–2. 402 [14] Telford MJ, Herniou EA, Russell RB, Littlewood DT. Changes in mitochondrial 403 genetic codes as phylogenetic characters: two examples from the flatworms. P Natl 404 Acad Sci Usa 2000;97:11359–64. 405 [15] Lartillot N, Blanquart S, Lepage T. PhyloBayes 3.3 a Bayesian software for 406 phylogenetic reconstruction and molecular dating using mixture models. 2012. bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

407 [16] Simion P, Philippe H, Baurain D, Jager M, Richter DJ, Di Franco A, et al. A large 408 and consistent phylogenomic dataset supports as the sister group to all 409 other animals. Curr Biol 2017;27:958–67. 410 [17] Weigert A, Golombek A, Gerth M, Schwarz F, Struck TH, Bleidorn C. Evolution of 411 mitochondrial gene order in Annelida. Mol Phyl Evol 2016;94:196–206. 412 [18] Slyusarev GS, Kristensen RM. Fine structure of the ciliated cells and ciliary rootlets 413 of Intoshia variabili (Orthonectida). Zoomorphology 2002;122:33–9. 414 [19] Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina 415 sequence data. Bioinformatics 2014;30:2114–20. 416 [20] Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De 417 novo transcript sequence reconstruction from RNA-seq using the Trinity platform for 418 reference generation and analysis. Nat Protoc 2013;8:1494–512. 419 [21] Robertson HE, Lapraz F, Egger B, Telford MJ, Schiffer PH. The mitochondrial 420 genomes of the acoelomorph worms Paratomella rubra, Isodiametra pulchra and 421 Archaphanostoma ylvae. Sci Rep 2017;7:1847. 422 [22] Altschul SF, Gish W, Miller W, Myers EW. Basic local alignment search tool. Journal 423 of Mol Biol 1990;215:403–10. 424 [23] Bernt M, Donath A, Jühling F, Externbrink F, Florentz C, Fritzsch G, et al. MITOS: 425 improved de novo metazoan mitochondrial genome annotation. Mol Phyl Evol 426 2013;69:313–9. 427 [24] Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and 428 space complexity. BMC Bioinformatics 2004;5:113. 429 [25] Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. TrimAl: a tool for automated 430 alignment trimming in large-scale phylogenetic analyses. Bioinformatics 431 2009;25:1972–3. 432 [26] Egger B, Lapraz F, Tomiczek B, Müller S, Dessimoz C, Girstmair J, et al. A 433 transcriptomic-phylogenomic analysis of the evolutionary relationships of flatworms. 434 Curr Biol 2015;25:1347–53. 435 [27] Struck TH, Wey-Fabrizius AR, Golombek A, Hering L, Weigert A, Bleidorn C, et al. 436 Platyzoan paraphyly based on phylogenomic data supports a noncoelomate 437 ancestry of . Mol Biol Evol 2014;31:1833–49. 438 [28] Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole genome 439 comparisons dramatically improves orthogroup inference accuracy. Genome Biol 440 2015;16:E9–13. 441 [29] Altenhoff AM, Škunca N, Glover N, Train C-M, Sueki A, Piližota I, et al. The OMA 442 orthology database in 2015: function predictions, better support, synteny view 443 and other improvements. Nucleic Acids Res 2015;43:D240–9. 444 [30] Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, et al. Fast, scalable 445 generation of high-quality protein multiple sequence alignments using Clustal 446 Omega. Mol Syst Biol 2011;7:1–6. 447 [31] Morrison DA. Increasing the efficiency of searches for the maximum likelihood tree 448 in a phylogenetic analysis of up to 150 nucleotide sequences. Systematic Biol 449 2007;56:988–1010. 450 [32] Tange O. Gnu parallel-the command-line power tool. The USENIX Magazine; 2011. 451 [33] Wheeler TJ. Large-Scale Neighbor-Joining with NINJA. Algorithms in Bioinformatics, 452 vol. 5724, Berlin, Heidelberg: Springer Berlin Heidelberg; 2009, pp. 375–89.

~50µm bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure S1. Related to Figure 2. Acropora_tenuis Nematostella_sp Aurelia_aurita Trichoplax_adhaerens Xenoturbella_bocki 1 Strongylocentrotus_pallidus 0.94 1 Saccoglossus_kowalevskii 1 Balanoglossus_carnosus 0.55 1 Doliolum_nationalis 0.69 Ciona_intestinalis 1 Myxine_glutinosa 0.74 Lampetra_fluviatilis 1 Salmo_salar Acropora_tenuis 1 Ornithorhynchus_anatinus Nematostella_sp 0.64 1 Homo_sapiens Aurelia_aurita 1 Branchiostoma_floridae Trichoplax_adhaerens 1 Asymmetron_inferum Xenoturbella_bocki Priapulus_caudatus Strongylocentrotus_pallidus Saccoglossus_kowalevskii 0.96 0.97 Tribolium_castaneum 0.97 Locusta_migratoria Balanoglossus_carnosus 0.96 Limulus_polyphemus Doliolum_nationalis Daphnia_pulex Ciona_intestinalis Spadella_cephaloptera Myxine_glutinosa 0.99 Lampetra_fluviatilis 0.78 Sagitta_inflata Salmo_salar 1 Decipisagitta_decipiens Ornithorhynchus_anatinus Siphonodentalium_lobatum Homo_sapiens 0.74 Mytilus_trossulus Branchiostoma_floridae Thais_clavigera Asymmetron_inferum 0.39 Lineus_alborostratus Priapulus_caudatus 0.58 Nautilus_macromphalus Tribolium_castaneum 0.94 0.41 Octopus_vulgaris Locusta_migratoria 0.83 0.99 Octopus_ocellatus Limulus_polyphemus 0.85 Sepia_officinalis Daphnia_pulex 0.94 Sepia_esculenta Spadella_cephaloptera 0.38 0.87 Sagitta_inflata 0.99 Sepioteuthis_lessoniana Decipisagitta_decipiens 0.97 Loligo_bleekeri Siphonodentalium_lobatum Dosidicus_gigas Mytilus_trossulus 0.87 Pupa_strigosa Thais_clavigera 0.99 Biomphalaria_tenagophila Lineus_alborostratus 0.63 Aplysia_californica Nautilus_macromphalus Terebratulina_retusa Octopus_vulgaris 0.99 Laqueus_rubellus Octopus_ocellatus 0.73 Flustrellidra_hispida Sepia_officinalis 1 Bugula_neritina Sepia_esculenta Sipunculus_nudus Sepioteuthis_lessoniana 0.47 0.32 Dicyema_sp Loligo_bleekeri 1 Dosidicus_gigas 0.98 Dicyema_misakiense Pupa_strigosa 0.43 Dicyema_japonicum Biomphalaria_tenagophila Intoshia_linei Aplysia_californica 0.32 Orbinia_latreillii Terebratulina_retusa 0.39 Clymenella_torquata Laqueus_rubellus Platynereis_dumerilii Flustrellidra_hispida 0.42 0.63 Myzostoma_seymourcollegiorum Bugula_neritina Urechis_caupo Sipunculus_nudus 0.44 0.57 Marphysa_sanguinea Dicyema_sp Escarpia_spicata Dicyema_misakiense 0.89 Lumbricus_terrestris Dicyema_japonicum 0.92 Perionyx_excavatus Intoshia_linei 0.94 Orbinia_latreillii Amynthas_jiriensis Clymenella_torquata Platynereis_dumerilii Myzostoma_seymourcollegiorum 0.1 Urechis_caupo Marphysa_sanguinea Escarpia_spicata Lumbricus_terrestris Perionyx_excavatus Amynthas_jiriensis

Figure S1: 0.1 Phylogram and corresponding cladogram of a Bayesian analysis of our mitochondrial data set omitting the long-branching flatworm species. Phylobayes CAT+G4 model was run in 10 independent runs for 10,000 cycles each on an alignment with 2969 positions and 8000 trees were discarded as burnin. bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure S2. Related to Figure 2. Acropora_tenuis Nematostella_sp Trichoplax_adhaerens Aurelia_aurita Xenoturbella_bocki 1 Strongylocentrotus_pallidus 0.85 1 Saccoglossus_kowalevskii 1 Balanoglossus_carnosus 0.57 1 Doliolum_nationalis 0.72 Ciona_intestinalis Myxine_glutinosa 0.66 1 Lampetra_fluviatilis 1 Salmo_salar 0.99 Ornithorhynchus_anatinus 0.64 1 Homo_sapiens Branchiostoma_floridae 1 1 Asymmetron_inferum Acropora_tenuis Priapulus_caudatus Nematostella_sp 0.95 0.97 Tribolium_castaneum Trichoplax_adhaerens 0.97 Locusta_migratoria Aurelia_aurita 0.97 Limulus_polyphemus Xenoturbella_bocki Daphnia_pulex Strongylocentrotus_pallidus Spadella_cephaloptera Saccoglossus_kowalevskii 1 Sagitta_inflata Balanoglossus_carnosus 0.76 Decipisagitta_decipiens Doliolum_nationalis 1 Ciona_intestinalis Thais_clavigera Myxine_glutinosa 0.88 Siphonodentalium_lobatum Lampetra_fluviatilis Mytilus_trossulus Salmo_salar Lineus_alborostratus Ornithorhynchus_anatinus 0.56 Nautilus_macromphalus Homo_sapiens Octopus_vulgaris Branchiostoma_floridae 0.92 0.89 0.99 Octopus_ocellatus Asymmetron_inferum 0.59 0.91 Sepia_officinalis Priapulus_caudatus 0.97 Sepia_esculenta Tribolium_castaneum 0.93 Locusta_migratoria 0.98 Sepioteuthis_lessoniana Limulus_polyphemus 0.98 Loligo_bleekeri Daphnia_pulex Dosidicus_gigas Spadella_cephaloptera Dicyema_sp Sagitta_inflata 1 Decipisagitta_decipiens 0.99 Dicyema_misakiense Thais_clavigera 0.91 Dicyema_japonicum Siphonodentalium_lobatum Pupa_strigosa Mytilus_trossulus 0.71 Biomphalaria_tenagophila Lineus_alborostratus Aplysia_californica Nautilus_macromphalus Terebratulina_retusa Octopus_vulgaris 0.99 Laqueus_rubellus Octopus_ocellatus 0.76 Sepia_officinalis 1 Flustrellidra_hispida Sepia_esculenta 0.74 Bugula_neritina Sepioteuthis_lessoniana Sipunculus_nudus Loligo_bleekeri Orbinia_latreillii Dosidicus_gigas 0.81 Clymenella_torquata Dicyema_sp 0.8 Urechis_caupo Dicyema_misakiense Dicyema_japonicum 0.68 Platynereis_dumerilii Pupa_strigosa Myzostoma_seymourcollegiorum Biomphalaria_tenagophila 0.56 Marphysa_sanguinea Aplysia_californica Escarpia_spicata Terebratulina_retusa 0.96 Lumbricus_terrestris Laqueus_rubellus 0.96 Perionyx_excavatus Flustrellidra_hispida 0.98 Amynthas_jiriensis Bugula_neritina Sipunculus_nudus Orbinia_latreillii Clymenella_torquata 0.1 Urechis_caupo Platynereis_dumerilii Myzostoma_seymourcollegiorum Marphysa_sanguinea Escarpia_spicata Lumbricus_terrestris Perionyx_excavatus Amynthas_jiriensis

Figure S2: 0.1

Phylogram and corresponding cladogram of a Bayesian analysis of our mitochondrial data set omitting Intoshia linei. Phylobayes CAT+G4 model was run in 10 independent runs for 10,000 cycles each on an alignment with 2969 positions and 8000 trees were discarded as burnin. bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure S3. Related to Figure 3. Amphimedon_queenslandica Sycon_coactum Trichoplax_adhaerens Nematostella_vectensis 1 Hydra_vulgaris 0.97 0.96 Saccoglossus_kowalevskii 1 Rhabdopleura_sp 0.87 Strongylocentrotus_purpuratus 1 Patiria_miniata Homo_sapiens 1 Branchiostoma_floridae 1 Priapulus_caudatus 0.9 Trichinella_spiralis 0.82 1 Tribolium_castaneum 1 Pediculus_humanus Daphnia_pulex Macracanthorhynchus_hirudinaceus 1 1 Adineta_ricciae Amphimedon_queenslandica Mesodasys_laticaudatus Sycon_coactum 1 Macrodasys_sp Trichoplax_adhaerens Neomenia_megatrapezata Nematostella_vectensis 0.84 0.98 Octopus_bimaculoides Hydra_vulgaris 0.94 Saccoglossus_kowalevskii 1 Lymnea_stagnalis Rhabdopleura_sp Lottia_gigantea Strongylocentrotus_purpuratus 1 Dicyema_sp Patiria_miniata Dicyema_japonicum Homo_sapiens 0.98 Cerebratulus_sp Branchiostoma_floridae 1 Tubulanus_polymorphus Priapulus_caudatus 0.98 Cephalothrix_linearis Trichinella_spiralis Tribolium_castaneum 0.53 Macrostomum_lignano Pediculus_humanus 1 Maritigrella_crozieri Daphnia_pulex 1 Monocelis_sp 0.99 1 Macracanthorhynchus_hirudinaceus 1 Schistosoma_mansoni Adineta_ricciae Dendrocoelum_lacteum Mesodasys_laticaudatus Stenostomum_sp Macrodasys_sp Magelona_pitelkai Neomenia_megatrapezata 0.79 Golfingia_vulgaris Octopus_bimaculoides Lymnea_stagnalis 0.93 Pharyngocirrus_tridentiger Lottia_gigantea 0.98 Sabella_pavonina Dicyema_sp 0.98 Helobdella_robusta Dicyema_japonicum 0.51 1 Capitella_tellata Cerebratulus_sp Bonellia_viridis Tubulanus_polymorphus Alvinella_pompejana Cephalothrix_linearis Macrostomum_lignano Maritigrella_crozieri 0.1 Monocelis_sp Schistosoma_mansoni Dendrocoelum_lacteum Stenostomum_sp Figure S3: Magelona_pitelkai Golfingia_vulgaris Pharyngocirrus_tridentiger A phylogram based on our analysis of the jackknifed dataset omitting Intoshia linei. Contrary to the Sabella_pavonina improvement in placing I. linei observed when excluding the Dicyema species, the exclusion of I. linei does Helobdella_robusta Capitella_tellata not lead to a better resolution of the Dicyema species’ position. This can be seen as further evidence for the Bonellia_viridis Alvinella_pompejana non-affiliation of orthonectids and dicyemids and the correct inference that orthonectids are part of Annelida. 0.1 bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure S4. Related to Figure 3. A B Amphimedon_queenslandica Amphimedon_queenslandica Sycon_coactum Sycon_coactum Trichoplax_adhaerens Trichoplax_adhaerens Nematostella_vectensis 1 Nematostella_vectensis 1 Hydra_vulgaris 1 Saccoglossus_kowalevskii Hydra_vulgaris 1 0.96 Rhabdopleura_sp Saccoglossus_kowalevskii 1 0.96 1 Strongylocentrotus_purpuratus Rhabdopleura_sp 1 1 Patiria_miniata Strongylocentrotus_purpuratus 0.86 1 Homo_sapiens Patiria_miniata 1 1 Branchiostoma_floridae Homo_sapiens 1 Priapulus_caudatus Branchiostoma_floridae 0.75 Trichinella_spiralis 1 Priapulus_caudatus 0.75 0.75 Tribolium_castaneum 1 0.92 Trichinella_spiralis 1 Pediculus_humanus Tribolium_castaneum Daphnia_pulex 0.82 1 1 Macracanthorhynchus_hirudinaceus Pediculus_humanus 1 1 Adineta_ricciae Daphnia_pulex Neomenia_megatrapezata Macracanthorhynchus_hirudinaceus 1 1 Octopus_bimaculoides 1 Adineta_ricciae 1 Lymnea_stagnalis 0.75 1 Mesodasys_laticaudatus Lottia_gigantea 1 Dicyema_sp 0.5 Macrodasys_sp 1 Dicyema_japonicum Lepidodermella_squamata 1 Cerebratulus_sp Neomenia_megatrapezata 1 Tubulanus_polymorphus 0.88 0.98 1 Octopus_bimaculoides Cephalothrix_linearis 0.93 Lymnea_stagnalis 0.75 Mesodasys_laticaudatus 1 1 Lottia_gigantea Macrodasys_sp 0.58 Dicyema_sp 1 Lepidodermella_squamata 1 Macrostomum_lignano Dicyema_japonicum 0.75 1 Maritigrella_crozieri 0.97 Cerebratulus_sp 1 0.99 Tubulanus_polymorphus Monocelis_sp 0.98 1 1 Schistosoma_mansoni 1 Cephalothrix_linearis Dendrocoelum_lacteum Macrostomum_lignano Stenostomum_sp 1 Maritigrella_crozieri Magelona_pitelkai 1 Monocelis_sp 0.97 Golfingia_vulgaris 0.94 1 Schistosoma_mansoni 1 Pharyngocirrus_tridentiger 0.98 Dendrocoelum_lacteum 1 Intoshia_linei Sabella_pavonina Stenostomum_sp Capitella_tellata 1 1 Magelona_pitelkai Bonellia_viridis 1 0.74 Golfingia_vulgaris Helobdella_robusta 1 0.75 Pharyngocirrus_tridentiger Alvinella_pompejana Intoshia_linei 0.6 Sabella_pavonina 0.1 Helobdella_robusta 0.77 Capitella_tellata 0.99 Bonellia_viridis Alvinella_pompejana

0.1 C D

Amphimedon_queenslandica Amphimedon_queenslandica Sycon_coactum Sycon_coactum Trichoplax_adhaerens Trichoplax_adhaerens Nematostella_vectensis 1 1 Nematostella_vectensis Hydra_vulgaris 0.97 Hydra_vulgaris 0.97 Saccoglossus_kowalevskii 0.97 Homo_sapiens Rhabdopleura_sp 0.97 1 0.67 Branchiostoma_floridae 0.9 Strongylocentrotus_purpuratus 1 0.99 Saccoglossus_kowalevskii Patiria_miniata Rhabdopleura_sp Homo_sapiens 1 1 0.99 1 Strongylocentrotus_purpuratus Branchiostoma_floridae Patiria_miniata 1 Priapulus_caudatus Priapulus_caudatus 0.91 Trichinella_spiralis 0.68 1 Trichinella_spiralis 0.8 Tribolium_castaneum 0.94 1 1 Tribolium_castaneum 1 Pediculus_humanus 1 Pediculus_humanus Daphnia_pulex Daphnia_pulex Macracanthorhynchus_hirudinaceus 1 1 1 Macracanthorhynchus_hirudinaceus 1 Adineta_ricciae Adineta_ricciae Mesodasys_laticaudatus 1 Mesodasys_laticaudatus Macrodasys_sp 1 0.55 1 0.82 Macrodasys_sp Lepidodermella_squamata Lepidodermella_squamata 0.86 Neomenia_megatrapezata Neomenia_megatrapezata 1 Octopus_bimaculoides 0.99 0.99 Octopus_bimaculoides 0.92 Lymnea_stagnalis 0.99 Lymnea_stagnalis 1 1 Lottia_gigantea Lottia_gigantea Cerebratulus_sp Dicyema_sp 0.99 0.59 1 0.99 Tubulanus_polymorphus Dicyema_japonicum 0.97 Cephalothrix_linearis Cerebratulus_sp 1 0.51 Macrostomum_lignano 1 Tubulanus_polymorphus 1 Cephalothrix_linearis Maritigrella_crozieri 0.54 1 Monocelis_sp 0.8 Macrostomum_lignano 0.95 1 Schistosoma_mansoni 1 Maritigrella_crozieri 1 Dendrocoelum_lacteum 1 Monocelis_sp 0.96 1 Schistosoma_mansoni Stenostomum_sp 0.52 1 Magelona_pitelkai Dendrocoelum_lacteum 0.86 Golfingia_vulgaris Stenostomum_sp Magelona_pitelkai 0.81 Pharyngocirrus_tridentiger Intoshia_linei 0.94 Golfingia_vulgaris 0.65 Sabella_pavonina 0.99 Pharyngocirrus_tridentiger Helobdella_robusta 0.88 Sabella_pavonina 0.72 Capitella_tellata 0.7 Intoshia_linei 0.99 Bonellia_viridis 0.43 Helobdella_robusta 0.66 Capitella_tellata Alvinella_pompejana 0.99 0.54 Bonellia_viridis Alvinella_pompejana 0.1

0.1 Figure S4: A. Cladogram corresponding to Fig 3a showing all PP support values for the CAT+G4 phylogeny based on the full alignment of 190,027 amino acid positions. B. A cladogram including JP support values based on 50 jackknife subsamples of 30,000 amino acid positions each independently analysed for 2000 cycles under the CAT+G4 model in phylobayes and summarised with the bpcomp command setting 1800 as burnin. As in the analysis of the full dataset I. linei is found within the annelids and phylum Mesozoa is found as an unnatural assemblage. C. Cladogram corresponding to Fig 3b showing all support values. D. Cladogram corresponding to Fig 3c showing all support values.