bioRxiv preprint doi: https://doi.org/10.1101/235549; this version posted December 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
1 2 Title: Molecular data from Orthonectid worms show they
3 are highly degenerate members of phylum Annelida not
4 phylum Mesozoa.
5 Authors: Philipp H. Schiffer1, Helen E. Robertson1, Maximilian J. Telford1*
6
7 Affiliations: 1 Centre for Life’s Origins and Evolution, Department of Genetics, Evolution
8 and Environment, University College London, Gower Street, London, WC1E 6BT, UK
9 *Correspondence to: [email protected]
10
11
12 Summary:
13 The Mesozoa are a group of tiny, extremely simple, vermiform endoparasites of various
14 marine animals (Fig. 1). There are two recognised groups within the Mesozoa: the
15 Orthonectida (Fig. 1a,b; with a few hundred cells including a nervous system made up of just
16 10 cells [1]) and the Dicyemids (Fig. 1c; with at most 42 cells [2]). They are
17 classic ‘Problematica’ [3] - the name Mesozoa suggests an evolutionary position
18 intermediate between Protozoa and Metazoa (animals) [4] and implies their simplicity is a
19 primitive state, but molecular data have shown they are members of Lophotrochozoa within
20 Bilateria [5-8] which would mean they derive from a more complex ancestor. Their precise
21 phylogenetic affinities remain uncertain, however, and ascertaining this is complicated by the
22 very fast evolution observed in genes from both groups, leading to the common systematic
23 error of Long Branch Attraction (LBA) [9]. Here we use mitochondrial and nuclear gene
24 sequence data, and show beyond doubt that both dicyemids and orthonectids are members of
25 the Lophotrochozoa. Carefully addressing the effects of systematic errors due to unequal
26 rates of evolution, we show that the phylum Mesozoa is polyphyletic. While the precise
27 position of dicyemids remains unresolved within Lophotrochozoa, we unequivocally identify
28 orthonectids as members of the phylum Annelida. This result reveals one of the most extreme
29 cases of body plan simplification in the animal kingdom; our finding makes sense of an
30 annelid-like cuticle in orthonectids [1] and suggests the circular muscle cells repeated along
31 their body [10] may be segmental in origin.
32
33 Results
34 Using a new assembly of available genomic and transcriptomic sequence data we identified an
35 almost complete mitochondrial genome from Intoshia linei (2 ribosomal RNAs, 20 transfer
36 RNAs and all protein coding genes apart from atp8) and recovered 9 individual mitochondrial
37 gene-containing contigs from Dicyema japonicum and from a second unidentified species
38 (Dicyema sp.; cox1, 2, 3; cob; and nad1, 2, 3, 4, 5). Cob, nad3, nad4, and nad5 had not
39 previously been identified in any Dicyema species. All protostomes studied possess a unique,
40 derived combination of amino acid signatures and conserved deletions in their mitochondrial
41 NAD5 genes. Comparing the NAD5 protein coding regions of Intoshia and Dicyema to those
42 of other Metazoa shows that both share almost all of the conserved protostome signatures [11]
43 (Fig. 2a). This signature is significantly more complex than the two amino acids of the
44 Lox5/DoxC signature from Dicyema previously published [11-13] and shows beyond doubt
45 that both groups are protostomes.
46 It has been suggested that mesozoans are derived from the parasitic neodermatan flatworms. If
47 this were correct mesozoans would be expected to share two changes in mitochondrial genetic
48 code that unite all rhabditophoran flatworms, where the triplet AAA codes for Asparagine (N)
49 rather than the normal Lysine (K) and ATA codes for Isoleucine (I) rather than the usual
50 Methionine (M) [14]. We inferred the mitochondrial genetic codes for Dicyema and Intoshia.
51 Both groups have the standard invertebrate mitochondrial code arguing against a relationship
52 with the parasitic rhabditophoran platyhelminths (table S1).
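
The logic of this genetic code test can be illustrated with a short sketch. It is not the procedure used in the study (the text does not name the software used to infer the codes). It assumes a hypothetical input of (codon, amino acid) pairs, where each pair couples a codon from the query species with the amino acid conserved at the same alignment column in reference species. The function names and the toy counts are illustrative only.

```python
from collections import Counter, defaultdict

# Diagnostic codons named in the text: AAA and ATA are read differently in the
# standard invertebrate mitochondrial code and in the rhabditophoran code.
DIAGNOSTIC = {
    "AAA": {"K": "invertebrate", "N": "rhabditophoran"},
    "ATA": {"M": "invertebrate", "I": "rhabditophoran"},
}


def infer_codon_meanings(pairs):
    """Assign each codon the amino acid it is most often aligned with."""
    tallies = defaultdict(Counter)
    for codon, amino_acid in pairs:
        tallies[codon.upper()][amino_acid] += 1
    return {codon: counts.most_common(1)[0][0] for codon, counts in tallies.items()}


def classify_code(inferred):
    """Report which code each diagnostic codon is consistent with."""
    return {
        codon: options.get(inferred.get(codon), "undetermined")
        for codon, options in DIAGNOSTIC.items()
    }


# Toy data: hypothetical counts, not taken from the study.
toy_pairs = (
    [("AAA", "K")] * 8 + [("AAA", "N")] * 2 + [("ATA", "M")] * 5 + [("ATA", "I")]
)
print(classify_code(infer_codon_meanings(toy_pairs)))
```

A majority vote per codon is the simplest possible decision rule; a real analysis would also need to restrict the tally to well conserved alignment columns.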
53 We next aligned the mitochondrial genes of Intoshia and three species of Dicyema with
54 orthologs from a diversity of other Metazoans and concatenated these to produce a matrix of
55 2,969 reliably aligned amino acids from 69 species. Phylogenetic analysis of this
56 comparatively small data set is not expected to be as reliable as that of a much larger set of nuclear
57 genes, and aspects of the topology and observed branch lengths suggest it was affected by LBA
58 (Fig. 2b). To reduce the effects of LBA on the inference of the affinities of the mesozoans we
59 removed the taxa with the longest branches and considered the position of the dicyemids and
60 orthonectid separately (as both are very long branched). We were unable to resolve the position
61 of the dicyemids (although they are clearly lophotrochozoans), but found some support for
62 placing the orthonectid Intoshia linei with the annelids (Fig. 2c and figures S1, 2). Intoshia
63 linei has a unique mitochondrial gene order, although the order of the genes nad1, nad6, and
64 cob matches that seen in the Lophotrochozoan ground plan and the early branching annelid
65 Owenia (Fig. 2d).
66 We next assembled a data set of 469 orthologous genes, 227,187 reliably aligned amino acids,
67 from 45 species of animals including Intoshia linei and two species of Dicyema. After
68 removing positions in the concatenated alignment with less than 50% occupancy we had an
69 alignment length of 190,027 amino acids and average completeness of ~68%. Intoshia linei
70 was 65% complete, while Dicyema japonicum and Dicyema sp. were 77% and 43% complete
71 respectively (table S2). We conducted a Bayesian phylogenetic analysis of these data with the
72 site heterogeneous CAT+G4 model in Phylobayes [15]. To provide an additional, conservative
73 estimate of clade support and to enable further analyses in a practical time frame, we also used
74 jackknife subsampling. For each jackknife analysis we took 50 random subsamples of 30,000
75 amino acids each and ran 2,000 cycles (phylobayes CAT+G4) per sample. All 50 subsamples
76 were summarised into a single tree with the first 1800 trees from each excluded as ‘burnin’
77 [16].
78 We observed strong support for a clade of Lophotrochozoa (excluding Rotifers) including both
79 dicyemids and the orthonectid (Bayesian Posterior Probability (PP) = 1.0; Jackknife Proportion
80 (JP) = 0.97) (Fig. 3a). The dicyemids and orthonectids were not each other’s closest relatives;
81 the position of the dicyemids within the Lophotrochozoa was not resolved; they were not the
82 sister group of the platyhelminths nor of the gastrotrichs in our analysis. The position of the
83 orthonectid Intoshia, in contrast, was resolved as being within the clade of annelids (Fig. 3a; PP
84 = 0.97; JP = 0.74).
85 We next asked whether there was any effect from long branched dicyemids on the strength of
86 support for inclusion of Intoshia within the Annelida - Intoshia also being a long-branched
87 taxon. Repeating our jackknife analyses with dicyemids excluded increased the support for
88 Intoshia as an annelid from JP = 0.74 to JP = 0.86 (Fig. 3b) showing that when the expected
89 LBA between Dicyema spp and Intoshia is prevented, there is stronger support for including
90 the orthonectid in Annelida. An equivalent analysis omitting Intoshia did not help to resolve
91 the position of dicyemids (figure S3).
92 To test further the support for Intoshia being a member of Annelida, we reasoned that an
93 analysis restricted to genes showing the strongest signal supporting monophyletic Annelida
94 should give stronger support to Intoshia within Annelida but only if it is indeed a member of
95 the clade; if not, support should decrease when using this subset of genes. We first removed all
96 mesozoan sequences from each individual gene alignment and reconstructed a tree for each
97 gene. We ranked these gene trees according to the proportion of all annelids present in a given
98 gene data set that were observed united in a clade. We concatenated the genes (now including
99 mesozoans) from strongest supporters of monophyletic Annelida to weakest. We repeated our
100 jackknife analyses using the best quarter of genes. An analysis of the genes that most strongly
101 support monophyletic Annelida results in increased support for inclusion of Intoshia within
102 Annelida, from JP = 0.74 to JP = 0.94 (Fig. 3c).
103 Our results suggest that recent findings of a close relationship between Intoshia and Dicyema
104 and the linking of both these taxa to rapidly evolving gastrotrichs and platyhelminths [7,8] are
105 due to long branch attraction. To test this prediction we exaggerated the expected effects of
106 LBA on our own data set by using less well fitting models. We first conducted cross validation
107 comparing the site heterogeneous CAT+G4 model we have used to the site homogeneous
108 LG+G4 and show that LG+G4 is a significantly worse fit to our data (CAT+G4 is better
109 than LG+G4: ΔlnL = 9787 +/- 249.265). We used the less well fitting LG+G4 model to
110 reanalyse the jackknife replicates of a data set including our four most complete annelids. We
111 observed a topology clearly influenced by LBA in which long branched taxa including
112 flatworms, annelids, rotifers and nematodes were grouped. We also observed within this ‘LBA
113 assemblage’ the two longest branched clades, dicyemids and the orthonectid as each other’s
114 closest relatives. As a further test we reanalysed the published data set [8] which had linked
115 orthonectid and dicyemid with platyhelminths and gastrotrichs. When we removed the most
116 obvious source of LBA - the long branched dicyemid - we found that the orthonectid Intoshia
117 was, as expected, found not with platyhelminths or gastrotrichs but with the two annelids
118 present in this data set, again providing evidence of the effects of long branch attraction (Fig
119 4).
120 Discussion
121 We have analysed the first, almost complete mitochondrial genome sequence of an orthonectid
122 mesozoan and added to the known mitochondrial genes of Dicyemida to provide two powerful
123 rare genomic changes. Our analyses of mitochondrial NAD5 gene sequences show
124 unequivocally that both Dicyemida and Orthonectida are members of the protostomes and the
125 absence of rhabditophoran flatworm mitochondrial genetic code changes rejects existing ideas
126 that either group might be derived from parasitic flatworms. Both groups show unusually high
127 rates of evolution and this required steps to test for and avoid the possible effects of long branch
128 attraction, not least between the orthonectids and dicyemids.
129 Our mitochondrial data set and our large, taxonomically broad set of nuclear genes with a low
130 percentage of missing data, analysed with well fitting, site heterogeneous models of sequence
131 evolution, do not support the close relationship between orthonectids and dicyemids.
132 Orthonectids are annelids and not members of the Mesozoa and the phylum Mesozoa sensu
133 lato is an unnatural polyphyletic assemblage. We were unable to place the dicyemids more
134 precisely and they may be considered a phylum in their own right. Experiments manipulating
135 the expected effects of LBA strongly suggest previous phylogenies were affected by this
136 important source of systematic error. Finding the orthonectids and dicyemids not closely
137 associated demonstrates a remarkable instance of convergent evolution in two unrelated,
138 miniaturised parasites.
139 The finding that the orthonectid Intoshia is a member of the Annelida shows that it has evolved
140 its extraordinary simplicity by drastic simplification from a much more complex annelid
141 common ancestor. Our phylogenetic analyses could not place Intoshia more precisely within
142 the annelids; however, a short stretch of mitochondrial genes (nad1, nad6, cob) that are found
143 in the same order as in the lophotrochozoan ancestor and in the early branching annelid Owenia
144 fusiformis but not in the pleistoannelid ground plan argues for a position outside of the
145 Pleistoannelida [17] (Fig 2d). Possible evidence of an ancestral segmented body plan is still
146 apparent in the series of circular muscles regularly spaced along the antero-posterior axis of
147 Intoshia (Fig 1b), along with similarly repeated bands of cilia (Fig 1 and ref [18]). Further
148 analyses of the genome, embryology and morphology of Intoshia or other orthonectids are
149 predicted to reveal additional clues as to their cryptic annelidan ancestry.
150 Methods
151
152 Genome and transcriptome assemblies.
153 We downloaded genomic (Intoshia linei: SRR4418796, SRR4418797) and transcriptomic
154 (Dicyema sp.: SRR827581; Dicyema japonicum: DRR057371) data from the NCBI Short
155 Read Archives and DDBJ, and used Trimmomatic [19] to clean residual adapter sequences
156 from the sequencing reads and to remove low quality bases. We used the clc assembly cell
157 (clcBIO/Qiagen; v.5.0) to re-assemble the I. linei genome and the Trinity pipeline [20]
158 (v.2.3.2) to assemble the Dicyema sp. and D. japonicum transcriptomes using default
159 settings. We additionally assembled transcriptomes for Phascolopsis gouldii,
160 Spiochaetopterus sp., Arenicola marina, Sabella pavonina, Magelona pitelkai,
161 Pharyngocirrus tridentiger and Bonellia viridis from SRA datasets (SRR1654498,
162 SRR1224605, SRR2005653, SRR2005708, SRR2015609, SRR2016714, SRR2017645)
163 using the same approach.
164
165 Identifying mitochondrial genome fragments.
166 Using mitochondrial gene protein coding sequences from flatworms as queries [21] we used
167 tblastn [22] and blastp to search for Dicyema sp. and D. japonicum mitochondrial fragments
168 in the Trinity RNA-Seq assemblies, and screened the I. linei genome re-assembly in a similar
169 way. Positively identified ORFs were then blasted against NCBI nr to detect possible
170 contamination from host species in the RNA-Seq data. For each Dicyema sp. gene-bearing
171 contig, we also found additional contigs which had strongly matching blast hits to Octopus or
172 other cephalopods (or in some cases to the gastropod mollusc Aplysia) and we discarded
173 these as likely contaminations.
174
175 Annotating mitochondrial genomes.
176 Using blast we identified a 14.2kb mitochondrial contig in the assembled I. linei genome,
177 which we annotated using MITOS [23]. The locations of protein-coding genes were manually
178 verified from the MITOS predictions, and inferred to start from the first in-frame start codon
179 (ATN, GTG, TTG, or GTT). The C-terminus of each protein-coding gene was inferred to be
180 the first in-frame stop codon (TAA, TAG or TGA). We aligned the Intoshia and Dicyema
181 NAD5 genes with those from 5 protostomes, 4 deuterostomes, and 2 non-bilaterian species in
182 the Geneious software to visualise Protostome specific signatures in the sequence.
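
The start and stop rule described above can be written down as a minimal sketch. The codon sets are copied from the text; the function name `find_orf`, the `frame_offset` parameter and the example sequence are hypothetical. The sketch does not reproduce MITOS or the manual verification step.

```python
# Start codons from the text: ATN (ATA, ATC, ATG, ATT), GTG, TTG and GTT.
START_CODONS = {"ATA", "ATC", "ATG", "ATT", "GTG", "TTG", "GTT"}
# Stop codons as listed in the text.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf(sequence, frame_offset=0):
    """Return (start, end) of the first ORF in the given reading frame.

    `start` is the index of the first in-frame start codon and `end` is the
    index just past the first in-frame stop codon that follows it.
    Returns None if no complete ORF is found.
    """
    sequence = sequence.upper()
    start = None
    for i in range(frame_offset, len(sequence) - 2, 3):
        codon = sequence[i:i + 3]
        if start is None:
            if codon in START_CODONS:
                start = i
        elif codon in STOP_CODONS:
            return start, i + 3
    return None


# Hypothetical example sequence, not taken from the Intoshia contig.
example = "CCCATGAAACCCTAAGGG"
print(find_orf(example))
```

Keeping the codon sets as module-level constants makes it easy to rerun the scan under a different set of start or stop codons.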
183
184 Mitochondrial Phylogenetics
185 We grouped the mesozoan mitochondrial protein coding genes with their orthologs from 65
186 other species selected to cover the diversity of the Metazoa including diploblasts,
187 deuterostomes and ecdysozoans but with an emphasis on the diversity of Lophotrochozoa.
188 We aligned each set of orthologs using Muscle [24] v3.8.31 with default parameters and
189 trimmed these alignments to exclude unreliably aligned positions using TrimAl [25] (version
190 1.2 rev 59 using default settings). Finally, we concatenated the trimmed alignments of all
191 genes into a supermatrix of 2969 positions. We inferred a phylogeny with phylobayes (4.1b)
192 under the CAT+G4 model. We ran 10 independent chains for 10,000 cycles each. We
193 summarised all ten chains (bpcomp) discarding the first 8,000 trees from each as burnin. We
194 reconstructed additional mitochondrial phylogenies omitting (i) the long branching flatworm
195 species, (ii) all long branch taxa and also Intoshia, and (iii) long branch taxa and the Dicyema
196 species. Here and elsewhere we visualised and edited phylogenetic trees with FigTree
197 (v1.4.3; http://tree.bio.ed.ac.uk/software/figtree/).
198
199 Nuclear gene orthology determination
200 We chose to add the mesozoan data to sets of orthologous genes that were previously
201 successfully used to infer lophotrochozoan phylogeny [26,27]. We first used Orthofinder [28]
202 (v.1.0.8) to calculate orthologous relationships between the genes predicted for I. linei in the
203 recent genome paper [8] and our Dicyema sp. gene predictions. To ensure robustness of the
204 analysis we included several outgroup species (Supplementary table 2). In particular, as we
205 were concerned about potential contamination by the hosts of the parasitic Dicyema we
206 included the Octopus bimaculoides proteome. Since the published phylogenomic studies
207 included few annelid species we added our own Trinity assemblies of several additional
208 species (see above). We then extracted all orthologous groups containing the Octopus and the
209 two mesozoan taxa from the Orthofinder output and inserted these sequences into the original
210 alignments. This resulted in 590 orthologous groups. With the aid of OMA [29] and custom
211 Perl scripts we filtered these groups to contain single copy orthologs of all species. We re-
212 aligned each set of orthologs using clustal-omega [30]; we removed unreliably aligned
213 positions from each alignment using TrimAl; finally we constructed individual gene trees
214 from these trimmed alignments using phyml [31] (v20160207). Using Python code and the
215 ETE3 toolkit we checked each tree for instances where sequences from Octopus and Dicyema
216 sp. were each other’s closest relatives (suggesting the sequence is an Octopus contaminant)
217 and removed the 5 alignments where the trees had this topology from our set. We
218 concatenated all single trimmed alignments of 45 taxa into a supermatrix of 227,646
219 positions. We used a custom script to eliminate all positions in the alignment with less than
220 50% occupancy.
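
The occupancy filter in the last step can be sketched as follows. The authors' custom script is not available here, so this is an assumption about how such a filter might look: the alignment is held as a dictionary of equal-length strings, and the characters `-`, `?` and `X` are treated as missing data (the choice of missing-data symbols is ours, not stated in the text).

```python
def filter_by_occupancy(alignment, min_occupancy=0.5, missing="-?X"):
    """Drop alignment columns in which fewer than `min_occupancy` of taxa have data.

    `alignment` maps taxon name to an aligned sequence; all sequences must
    have the same length.
    """
    sequences = list(alignment.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same aligned length")

    kept_columns = []
    for col in range(length):
        present = sum(1 for seq in sequences if seq[col] not in missing)
        # Columns with less than 50% occupancy are removed, so 50% is kept.
        if present / len(sequences) >= min_occupancy:
            kept_columns.append(col)

    return {
        name: "".join(seq[col] for col in kept_columns)
        for name, seq in alignment.items()
    }


# Toy alignment with hypothetical taxa.
toy = {"taxon_a": "MK-L", "taxon_b": "M--L", "taxon_c": "-K-L", "taxon_d": "---L"}
print(filter_by_occupancy(toy))
```

In the toy alignment only the third column, which is empty in every taxon, falls below the threshold and is removed.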
221
222 Nuclear Gene Phylogenomic analyses
223 Using the MPI version of phylobayes (v.1.7), run over four independent chains for 5000
224 cycles and discarding the first 4500 trees as burnin, we reconstructed a phylogeny using this
225 alignment under the CAT+G4 model of molecular evolution. To provide a conservative
226 measure of clade support and to test different data samples in a reasonable time we also
227 reconstructed trees using 50 jackknife sub-samples of 30,000 positions each from the
228 supermatrix. We used phylobayes 4.1c with the aid of the gnu-parallel command line tool
229 [32] and the UCL HPC cluster. We used the CAT+G4 model, and also compared results
230 from LG+G4. We ran phylobayes for 2000 cycles per jackknife sample which consistently
231 resulted in a plateauing of the likelihood score. We summarised all 50 of these phylobayes
232 analyses per model (using bpcomp) discarding the first 1800 sampled trees per jackknife as
233 burnin. We also tested the effect of different species compositions in our dataset by
234 performing phylobayes jackknife sampling with different subsets of taxa.
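
The jackknife subsampling step can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: columns are drawn at random without replacement within each subsample (the text says only that the subsamples were random), the alignment is a dictionary of equal-length strings, and the function name and seed are hypothetical. Each subsample would then be analysed separately in phylobayes, which is outside the scope of the sketch.

```python
import random


def jackknife_samples(alignment, n_samples=50, sample_size=30000, seed=0):
    """Yield `n_samples` sub-alignments of `sample_size` randomly chosen columns.

    Columns are sampled without replacement within each subsample and kept in
    their original order.
    """
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    if sample_size > length:
        raise ValueError("sample_size exceeds alignment length")
    for _ in range(n_samples):
        columns = sorted(rng.sample(range(length), sample_size))
        yield {
            name: "".join(seq[col] for col in columns)
            for name, seq in alignment.items()
        }


# Toy alignment with hypothetical taxa; real use would keep the defaults
# of 50 subsamples of 30,000 positions each.
toy_alignment = {"taxon_a": "ACDEFGHIKL", "taxon_b": "ACDEYGHIKV"}
toy_samples = list(jackknife_samples(toy_alignment, n_samples=3, sample_size=4))
print(len(toy_samples), [len(s["taxon_a"]) for s in toy_samples])
```

Fixing the seed makes the set of subsamples reproducible, which matters when the same replicates are reanalysed under a second model, as was done here with LG+G4.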
235
236 Cross validation
237 We compared the fit of CAT+G4 and LG+G4 models to our data using cross validation as
238 described in the phylobayes user manual. We ran 10 replicates and for each replicate we used
239 a randomly selected 30,000 positions of the data as a training set and 10,000 randomly
240 selected positions as the test set. Log likelihood scores were averaged over the ten replicates
241 using the sumcv command.
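
As a rough illustration of how a cross validation result of the form "ΔlnL = mean +/- spread" is obtained, the sketch below averages per-replicate differences in held-out log likelihood between two models. It is not the phylobayes `sumcv` implementation; in particular, the use of the sample standard deviation is our assumption, and the scores are made-up numbers.

```python
from statistics import mean, stdev


def summarise_cross_validation(scores_model_a, scores_model_b):
    """Return the mean and sample standard deviation of per-replicate
    differences in test-set log likelihood (model A minus model B)."""
    if len(scores_model_a) != len(scores_model_b):
        raise ValueError("both models need one score per replicate")
    differences = [a - b for a, b in zip(scores_model_a, scores_model_b)]
    return mean(differences), stdev(differences)


# Hypothetical held-out log likelihoods for three replicates.
cat_scores = [-100.0, -110.0, -105.0]
lg_scores = [-120.0, -125.0, -130.0]
delta, spread = summarise_cross_validation(cat_scores, lg_scores)
print(f"delta lnL = {delta:.1f} +/- {spread:.1f}")
```

A positive mean difference that is large relative to its spread indicates that the first model predicts the held-out positions better.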
242
243 Ranking genes according to support for monophyletic Annelida.
244 We first removed all Intoshia and Dicyema sequences from each individual gene alignment.
245 For each individual gene, we reconstructed a tree from the aligned protein coding sequences
246 using Ninja [33]. Each tree was parsed using a custom script to find the proportion of
247 annelids in the data set present in the largest clade of annelids found. The tree was given a
248 score which was calculated as the number of annelids in the largest clade/total number of
249 annelids on the tree. Trees with larger monophyletic annelid clades scored highest. The
250 genes were then concatenated in order of their score. We took the first 25% of positions from
251 this concatenation (those genes with the strongest signal supporting monophyletic annelids)
252 and analysed jackknife replicates as before.
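
The scoring rule above (annelids in the largest annelid clade divided by all annelids on the tree) can be sketched without any tree library. The authors' custom script is not available here, so several things are assumptions: trees are represented as nested tuples with taxon names as leaves, the tree is taken to be rooted outside the annelids, and the taxon sets in the example are invented for illustration.

```python
def leaves(node):
    """Return the set of leaf names below a node (a leaf is a string)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def largest_pure_clade(node, group):
    """Size of the largest clade whose leaves all belong to `group`."""
    tips = leaves(node)
    if tips <= group:
        return len(tips)
    if isinstance(node, str):
        return 0
    return max(largest_pure_clade(child, group) for child in node)


def annelid_score(tree, annelids):
    """Largest annelid-only clade divided by the number of annelids on the tree."""
    present = leaves(tree) & set(annelids)
    if not present:
        return 0.0
    return largest_pure_clade(tree, present) / len(present)


# Hypothetical gene tree: three of the four annelids form a clade.
toy_tree = ((("Capitella", "Helobdella"), "Sabella"), (("Lottia", "Magelona"), "Octopus"))
toy_annelids = {"Capitella", "Helobdella", "Sabella", "Magelona"}
print(annelid_score(toy_tree, toy_annelids))
```

Genes would then be sorted by this score in descending order before concatenation, so that a score of 1.0 (all annelids monophyletic) ranks first.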
253
254 Acknowledgements:
255
256 The authors are grateful to Prof George Slyusarev (Saint Petersburg State University, Russia)
257 for providing images of Intoshia linei and for sharing an unpublished book chapter and to Dr
258 Tsai-Ming Lu and colleagues (Okinawa Institute of Science and Technology, Japan) for
259 sharing data. The authors also thank Fraser Simpson (UCL) for help in editing Figure 1b. The
260 research was funded by ERC grant (ERC-2012-AdG 322790) to MJT. Alignments have been
261 deposited with Zenodo (XXX) and phylogenetic trees are available through treebase.org
262 (XXX).
263
264
265 Author Contributions
266 Conceived the study: MJT. Planned the study: MJT and PHS. Assembled the data sets: PHS.
267 Analysed the data: PHS and MJT. Drafted the manuscript: MJT and PHS. Analysed
268 mitochondrial data: HER.
269
270 Declaration of Interests:
271 The authors declare no competing financial interests.
272
273
274 Figure Legends
275
276 Fig. 1: The mesozoans Intoshia variabili and Dicyema typus
277
278 A. Differential interference contrast micrograph of an Intoshia variabili female showing
279 repeated bands of ciliated cells. Picture G. Slyusarev (St Petersburg State University,
280 Russia).
281 B. Confocal image of a phalloidin stained female specimen of Intoshia linei reveals a repeated
282 set of circular muscles. Picture G. Slyusarev (St Petersburg State Univ.).
283 C. Rhombogen stage of a dicyemid (Dicyema typus from the Octopus) adapted from Hyman
284 L.H. The Invertebrates: Protozoa through Ctenophora McGraw-Hill, New York 1940(19).
285 Anterior to right in all images.
286
287 Fig. 2: Analyses of the phylogenetic positions of Dicyema and Intoshia based on
288 mitochondrial gene sequences.
289
290 A. Alignment of the mitochondrial NAD5 gene from selected protostomes, deuterostomes,
291 and outgroups, highlighting derived substitutions and amino acid deletions shared by the
292 orthonectids, dicyemids, and other protostomes.
293 B. A mitochondrial bayesian phylogeny based on 2969 positions places orthonectids and
294 dicyemids inside Lophotrochozoa, but the unlikely assemblage of Intoshia linei and
295 flatworms with annelids suggests this is affected by systematic error.
296 C. Mitochondrial bayesian phylogeny omitting the long branching taxa including Dicyema
297 gives some support for a position of Intoshia within Annelida. D. Order of the Intoshia nad1,
298 nad6, and cob mitochondrial genes in comparison to the early branching annelid Owenia
299 fusiformis, the pleistoannelid ground plan and the lophotrochozoan ground plan (see ref [17]).
300
301 Fig. 3: Analyses of the phylogenetic positions of Dicyema and Intoshia based on nuclear
302 gene sequences.
303 A. A bayesian phylogeny reconstructed from 190,027 aligned amino acid positions analysed
304 under the CAT+G4 model. Support values are from bayesian posterior probabilities (PP) and
305 from 50 jackknifed sub-samples of 30,000 residues (JP support values in brackets). Both
306 analyses reveal Mesozoa to be polyphyletic and place Intoshia linei in Annelida (see Supp
307 Fig 4a for support values).
308 B. A repeat of the jackknife analysis omitting the long-branching Dicyema species eliminates
309 the potential for LBA between Intoshia and Dicyema. This leads to an increase in the support
310 for a position of Intoshia within Annelida from JP 0.74 to JP 0.86. (Only lophotrochozoan
311 part of the tree shown, see Supplementary Fig 4c for full tree).
312 C. Bayesian jackknife using CAT+G4 model using the best quarter of genes supporting
313 monophyletic annelids leads to increased support for Intoshia within Annelida to JP 0.94
314 even with the inclusion of the Dicyema species. (Only lophotrochozoan part of the tree
315 shown, see Supplementary Fig 4d for full tree).
316
317 Fig. 4: Reanalysis of a published data set addressing potential LBA between mesozoans
318 supports annelid affinity for Intoshia.
319 Repeating the analyses on a previously published data set [8] excluding the long branching
320 Dicyema leads to Intoshia being placed with the annelids, showing the likely effect of LBA
321 on the original analysis. Support values are bayesian posterior probabilities (PP).
Fig. 1: The mesozoans Intoshia variabili and Dicyema typus A. Differential Interference contrast micrograph of an Intoshia variabili female showing repeated bands of ciliated cells. Picture G. Slyusarev (St Petersburg State University, Russia). B. Confocal image of a phalloidin stained female specimen of Intoshia linei reveals repeated set of circular muscles. Picture G. Slyusarev (St Petersburg State Univ.). C. Rhombogen stage of a dicyemid (Dicyema typus from the Octopus) adapted from Hyman L.H. The Invertebrates: Protozoa through Ctenophora McGraw-Hill, New York 1940(19). Anterior to right in all images. 180 190 260 270 280 290 Dicyema Lophotrochozoa bioRxiv preprint Intoshia rrnS rrnL nad1 nad6 cob nad4L nad4 A Octopus D Phonoris
Taenia Intoshia linei was notcertifiedbypeerreview)istheauthor/funder.Allrightsreserved.Noreuseallowedwithoutpermission. Brugia Drosophila -nad4 -cox1 nad1 nad6 doi: cob nad3 rrnL
Xenopus https://doi.org/10.1101/235549 Pongo Xenoturbella Owenia fusiformis Hydra Paramecium nad5 nad2 nad1 nad6 cob nad4L nad4 Pleistoannelida B C cox2 atp8 cox3 nad6 cob atp6 nad5
Acropora_tenuis Acropora_tenuis ; Nematostella_sp Nematostella_sp this versionpostedDecember18,2017. Aurelia_aurita Trichoplax_adhaerens Trichoplax_adhaerens Xenoturbella_bocki Aurelia_aurita 0.62 Strongylocentrotus_pallidus Xenoturbella_bocki Saccoglossus_kowalevskii Strongylocentrotus_pallidus Balanoglossus_carnosus Saccoglossus_kowalevskii Myxine_glutinosa 1 0.83 0.98 Lampetra_fluviatilis Balanoglossus_carnosus Salmo_salar Doliolum_nationalis 0.96 Ornithorhynchus_anatinus 0.74 Ciona_intestinalis Homo_sapiens Myxine_glutinosa Doliolum_nationalis Lampetra_fluviatilis Ciona_intestinalis Salmo_salar Branchiostoma_floridae 1 Asymmetron_inferum Ornithorhynchus_anatinus Priapulus_caudatus Homo_sapiens Tribolium_castaneum 1 Branchiostoma_floridae Locusta_migratoria Asymmetron_inferum Limulus_polyphemus Priapulus_caudatus Daphnia_pulex Trichinella_spiralis Tribolium_castaneum Strongyloides_stercoralis Locusta_migratoria 1 Steinernema_carpocapsae Limulus_polyphemus Schmidtea_mediterranea Daphnia_pulex Spadella_cephaloptera Spadella_cephaloptera Sagitta_inflata
Sagitta_inflata The copyrightholderforthispreprint(which Decipisagitta_decipiens 1 Siphonodentalium_lobatum Decipisagitta_decipiens Mytilus_trossulus Siphonodentalium_lobatum 0.87 Flustrellidra_hispida Mytilus_trossulus Bugula_neritina Thais_clavigera Pupa_strigosa Lineus_alborostratus Biomphalaria_tenagophila Aplysia_californica 0.92 Nautilus_macromphalus 0.68 Thais_clavigera Octopus_vulgaris Lineus_alborostratus Octopus_ocellatus Nautilus_macromphalus Sepia_officinalis Octopus_vulgaris Sepia_esculenta Octopus_ocellatus Sepioteuthis_lessoniana Sepia_officinalis Sepia_esculenta Loligo_bleekeri Sepioteuthis_lessoniana 0.94 Dosidicus_gigas 0.3 Loligo_bleekeri Pupa_strigosa Dosidicus_gigas Biomphalaria_tenagophila Terebratulina_retusa Aplysia_californica Laqueus_rubellus Sipunculus_nudus Terebratulina_retusa Orbinia_latreillii Laqueus_rubellus 0.41 Urechis_caupo Flustrellidra_hispida Platynereis_dumerilii Bugula_neritina Myzostoma_seymourcollegiorum 0.8 Sipunculus_nudus Marphysa_sanguinea Intoshia_linei 0.69 Intoshia_linei 0.57 Microstomum_lineare Orbinia_latreillii Schistosoma_mansoni 0.39 Clymenella_torquata Fasciola_hepatica Platynereis_dumerilii 0.33 Taenia_crassiceps Myzostoma_seymourcollegiorum Echinococcus_shiquicus Urechis_caupo Echinococcus_multilocularis Dicyema_sp Marphysa_sanguinea 1 Dicyema_misakiense Escarpia_spicata Dicyema_japonicum Lumbricus_terrestris Clymenella_torquata Perionyx_excavatus Escarpia_spicata Amynthas_jiriensis Lumbricus_terrestris Annelida Perionyx_excavatus Amynthas_jiriensis 0.1
0.1 Fig. 2: Analyses of the phylogenetic positions of Dicyema and Intoshia based on mitochondrial gene sequences. A. Alignment of the mitochondrial NAD5 gene from selected protostomes, deuterostomes, and outgroups, highlighting derived substitutions and amino acid deletions shared by the orthonectids, dicyemids, and other protostomes. B. A mitochondrial bayesian phylogeny based on 2969 positions places orthonectids and dicyemids inside Lophotrochozoa, but the unlikely assemblage of Intoshia linei and flatworms with annelids suggest this is affected by systematic error. C. Mitochondrial bayesian phylogeny omitting the long branching taxa including Dicyema gives some support for a position of Intoshia within Annelida. D. Order of the Intoshia nad1, nad6, and cob mitochondrial genes in comparison to the early branching annelid Owenia fusiformis, the pleistoannelid ground plan and the lophotrochozoan ground plan (see ref [17]). Amphimedon_queenslandica Sycon_coactum Trichoplax_adhaerens Nematostella_vectensis bioRxiv preprint Hydra_vulgaris Saccoglossus_kowalevskii Rhabdopleura_sp Strongylocentrotus_purpuratus Patiria_miniata was notcertifiedbypeerreview)istheauthor/funder.Allrightsreserved.Noreuseallowedwithoutpermission.
Fig. 3: Analyses of the phylogenetic positions of Dicyema and Intoshia based on nuclear gene sequences.
A. A Bayesian phylogeny reconstructed from 190,027 aligned amino acid positions analysed under the CAT+G4 model. Support values are from Bayesian posterior probabilities (PP) and from 50 jackknifed sub-samples of 30,000 residues (JP support values in brackets). Both analyses reveal Mesozoa to be polyphyletic and place Intoshia linei in Annelida (see Supp Fig 4a for support values).
B. A repeat of the jackknife analysis omitting the long-branching Dicyema species eliminates the potential for LBA between Intoshia and Dicyema. This leads to an increase in the support for a position of Intoshia within Annelida from JP 0.74 to JP 0.86. (Only the lophotrochozoan part of the tree is shown; see Supplementary Fig 4c for the full tree.)
C. Bayesian jackknife under the CAT+G4 model using the best quarter of genes supporting monophyletic annelids leads to increased support for Intoshia within Annelida, to JP 0.94, even with the inclusion of the Dicyema species. (Only the lophotrochozoan part of the tree is shown; see Supplementary Fig 4d for the full tree.)
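The jackknife procedure named in this legend (50 sub-samples of 30,000 residues drawn from the 190,027-position alignment) can be sketched as follows. This is a minimal illustration in Python, not the authors' pipeline: the function name `jackknife_columns`, the dictionary representation of the alignment, and the choice to draw columns without replacement within each replicate are all assumptions made for the example.

```python
import random


def jackknife_columns(alignment, n_sites, n_replicates, seed=0):
    """Draw column sub-samples from an alignment.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Returns a list of dicts, one per replicate, with the same taxa.
    Assumption: columns are drawn without replacement within a replicate.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    total = lengths.pop()
    if n_sites > total:
        raise ValueError("cannot sample more columns than the alignment has")

    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        # The same column indices are used for every taxon, in alignment order.
        cols = sorted(rng.sample(range(total), n_sites))
        replicates.append(
            {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}
        )
    return replicates


if __name__ == "__main__":
    # Toy alignment with hypothetical sequences; the settings in the legend
    # would correspond to n_sites=30000 and n_replicates=50.
    toy = {
        "Intoshia_linei": "MKLVAGTRSW",
        "Capitella_teleta": "MKLIAGSRSW",
        "Dicyema_japonicum": "MRLVSGTKAW",
    }
    for rep in jackknife_columns(toy, n_sites=4, n_replicates=3):
        print(rep)
```

Each replicate would then be analysed independently, so that support for a clade reflects how consistently it is recovered from different subsets of the data.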
Fig. 4: Reanalysis of a published data set addressing potential LBA between mesozoans supports annelid affinity for Intoshia.
Repeating the analyses on a previously published data set [7] excluding the long-branching Dicyema leads to Intoshia being placed with the annelids, showing the likely effect of LBA on the original analysis. Support values are Bayesian posterior probabilities (PP).
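Excluding long-branching taxa before re-running an analysis, as described for Dicyema here, amounts to filtering the alignment by taxon name. The sketch below is a hedged illustration only: the helper names `drop_taxa` and `strip_all_gap_columns`, the dictionary representation of the alignment, and the treatment of `-` and `?` as gap or missing characters are assumptions, not part of the published workflow.

```python
def drop_taxa(alignment, prefixes):
    """Return a copy of the alignment without taxa whose names start with any prefix."""
    prefixes = tuple(prefixes)
    return {
        taxon: seq
        for taxon, seq in alignment.items()
        if not taxon.startswith(prefixes)
    }


def strip_all_gap_columns(alignment, gap_chars="-?"):
    """Remove columns left with only gap/missing characters after taxon removal."""
    if not alignment:
        return {}
    n_cols = len(next(iter(alignment.values())))
    keep = [
        i
        for i in range(n_cols)
        if any(seq[i] not in gap_chars for seq in alignment.values())
    ]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}


if __name__ == "__main__":
    # Toy alignment with hypothetical sequences.
    toy = {
        "Dicyema_sp": "MKAL",
        "Dicyema_japonicum": "MRAL",
        "Intoshia_linei": "M--V",
        "Capitella_teleta": "MK-I",
    }
    reduced = strip_all_gap_columns(drop_taxa(toy, ["Dicyema"]))
    print(reduced)
```

Removing columns that become empty is optional, but it keeps the reduced alignment free of sites that carry no information for the remaining taxa.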
Supplementary Tables

Supplementary Table 1
Predicted correspondence of nucleotide triplets to amino acids in Intoshia and three Dicyema species. For each triplet, the table shows the amino acid corresponding to the triplet in the standard invertebrate mitochondrial code, the number of observations of the triplet on which the prediction is based, the predicted amino acid and its score, and finally the second-highest-scoring amino acid prediction. The triplets AAA and ATA are highlighted in green and likely errors are highlighted in blue. Likely errors are mostly associated with very low numbers of observed GC-rich triplets in these very AT-rich mitochondrial genomes.

Supplementary Table 2
List of species used in the final phylogenetic analysis, data sources, and representation in the final alignment.

Supplementary Figures:

Figure S1. Related to Figure 2.
Phylogram and corresponding cladogram of a Bayesian analysis of our mitochondrial data set omitting the long-branching flatworm species. Phylobayes CAT+G4 model was run in 10 independent runs for 10,000 cycles each on an alignment with 2969 positions and 8000 trees were discarded as burnin.

Figure S2. Related to Figure 2.
Phylogram and corresponding cladogram of a Bayesian analysis of our mitochondrial data set omitting Intoshia linei. Phylobayes CAT+G4 model was run in 10 independent runs for 10,000 cycles each on an alignment with 2969 positions and 8000 trees were discarded as burnin.

Figure S3. Related to Figure 3.
A phylogram based on our analysis of the jackknifed dataset omitting Intoshia linei. Contrary to the improvement in placing I. linei observed when excluding the Dicyema species, the exclusion of I. linei does not lead to a better resolution of the Dicyema species’ position. This can be seen as further evidence for the non-affiliation of orthonectids and dicyemids and the correct inference that orthonectids are part of Annelida.

Figure S4. Related to Figure 3.
A. Cladogram corresponding to Fig 3a showing all PP support values for the CAT+G4 phylogeny based on the full alignment of 190,027 amino acid positions.
B. A cladogram including JP support values based on 50 jackknife subsamples of 30,000 amino acid positions, each independently analysed for 2000 cycles under the CAT+G4 model in phylobayes and summarised with the bpcomp command setting 1800 as burnin. As in the analysis of the full dataset, I. linei is found within the annelids and phylum Mesozoa is found as an unnatural assemblage.
C. Cladogram corresponding to Fig 3b showing all support values.
D. Cladogram corresponding to Fig 3c showing all support values.

References:
[1] Slyusarev GS, Starunov VV. The structure of the muscular and nervous systems of the female Intoshia linei (Orthonectida). Org Divers Evol 2015;16:65–71.
[2] Furuya H, Hochberg FG, Tsuneki K. Cell number and cellular composition in infusoriform larvae of dicyemid mesozoans (Phylum Dicyemida). Zool Sci 2004;21:877–89.
[3] Nielsen C. Animal Evolution. Oxford University Press; 2011.
[4] Dodson EO. A note on the systematic position of the Mesozoa. Syst Zool 1956;5:37.
[5] Suzuki TG, Ogino K, Tsuneki K, Furuya H. Phylogenetic analysis of dicyemid mesozoans (phylum Dicyemida) from innexin amino acid sequences: Dicyemids are not related to Platyhelminthes. J Parasitol 2010;96:614–25.
[6] Hanelt B, Van Schyndel D, Adema CM, Lewis LA, Loker ES. The phylogenetic position of Rhopalura ophiocomae (Orthonectida) based on 18S ribosomal DNA sequence analysis. Mol Biol Evol 1996;13:1187–91.
[7] Lu T-M, Kanda M, Satoh N, Furuya H. The phylogenetic position of dicyemid mesozoans offers insights into spiralian evolution. Zool Letts 2017;3:419.
[8] Mikhailov KV, Slyusarev GS, Nikitin MA, Logacheva MD, Penin AA, Aleoshin VV, et al. The genome of Intoshia linei affirms orthonectids as highly simplified spiralians. Curr Biol 2016;26:1768–74.
[9] Philippe H, Brinkmann H, Copley RR, Moroz LL, Nakano H, Poustka AJ, et al. Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 2011;470:255–8.
[10] Slyusarev GS. Fine structure and development of the cuticle of Intoshia variabili (Orthonectida). Acta Zool 2000;81:1–8.
[11] Telford MJ, Copley RR. Improving animal phylogenies with genomic data. Trends Genet 2011;27:186–95.
[12] Telford MJ. Turning Hox “signatures” into synapomorphies. Evol Dev 2000;2:360–4.
[13] Kobayashi M, Furuya H, Holland PW. Dicyemids are higher animals. Nature 1999;401:762.
[14] Telford MJ, Herniou EA, Russell RB, Littlewood DT. Changes in mitochondrial genetic codes as phylogenetic characters: two examples from the flatworms. P Natl Acad Sci USA 2000;97:11359–64.
[15] Lartillot N, Blanquart S, Lepage T. PhyloBayes 3.3 a Bayesian software for phylogenetic reconstruction and molecular dating using mixture models. 2012.
[16] Simion P, Philippe H, Baurain D, Jager M, Richter DJ, Di Franco A, et al. A large and consistent phylogenomic dataset supports sponges as the sister group to all other animals. Curr Biol 2017;27:958–67.
[17] Weigert A, Golombek A, Gerth M, Schwarz F, Struck TH, Bleidorn C. Evolution of mitochondrial gene order in Annelida. Mol Phyl Evol 2016;94:196–206.
[18] Slyusarev GS, Kristensen RM. Fine structure of the ciliated cells and ciliary rootlets of Intoshia variabili (Orthonectida). Zoomorphology 2002;122:33–9.
[19] Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 2014;30:2114–20.
[20] Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 2013;8:1494–512.
[21] Robertson HE, Lapraz F, Egger B, Telford MJ, Schiffer PH. The mitochondrial genomes of the acoelomorph worms Paratomella rubra, Isodiametra pulchra and Archaphanostoma ylvae. Sci Rep 2017;7:1847.
[22] Altschul SF, Gish W, Miller W, Myers EW. Basic local alignment search tool. J Mol Biol 1990;215:403–10.
[23] Bernt M, Donath A, Jühling F, Externbrink F, Florentz C, Fritzsch G, et al. MITOS: improved de novo metazoan mitochondrial genome annotation. Mol Phyl Evol 2013;69:313–9.
[24] Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 2004;5:113.
[25] Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. TrimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 2009;25:1972–3.
[26] Egger B, Lapraz F, Tomiczek B, Müller S, Dessimoz C, Girstmair J, et al. A transcriptomic-phylogenomic analysis of the evolutionary relationships of flatworms. Curr Biol 2015;25:1347–53.
[27] Struck TH, Wey-Fabrizius AR, Golombek A, Hering L, Weigert A, Bleidorn C, et al. Platyzoan paraphyly based on phylogenomic data supports a noncoelomate ancestry of spiralia. Mol Biol Evol 2014;31:1833–49.
[28] Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol 2015;16:E9–13.
[29] Altenhoff AM, Škunca N, Glover N, Train C-M, Sueki A, Piližota I, et al. The OMA orthology database in 2015: function predictions, better plant support, synteny view and other improvements. Nucleic Acids Res 2015;43:D240–9.
[30] Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 2011;7:1–6.
[31] Morrison DA. Increasing the efficiency of searches for the maximum likelihood tree in a phylogenetic analysis of up to 150 nucleotide sequences. Systematic Biol 2007;56:988–1010.
[32] Tange O. GNU Parallel - the command-line power tool. The USENIX Magazine; 2011.
[33] Wheeler TJ. Large-Scale Neighbor-Joining with NINJA. Algorithms in Bioinformatics, vol. 5724, Berlin, Heidelberg: Springer Berlin Heidelberg; 2009, pp. 375–89.
Figure S1. Related to Figure 2.
Phylogram and corresponding cladogram of a Bayesian analysis of our mitochondrial data set omitting the long-branching flatworm species. Phylobayes CAT+G4 model was run in 10 independent runs for 10,000 cycles each on an alignment with 2969 positions and 8000 trees were discarded as burnin.
Figure S2. Related to Figure 2.
Phylogram and corresponding cladogram of a Bayesian analysis of our mitochondrial data set omitting Intoshia linei. Phylobayes CAT+G4 model was run in 10 independent runs for 10,000 cycles each on an alignment with 2969 positions and 8000 trees were discarded as burnin.
Figure S3. Related to Figure 3.
A phylogram based on our analysis of the jackknifed dataset omitting Intoshia linei. Contrary to the improvement in placing I. linei observed when excluding the Dicyema species, the exclusion of I. linei does not lead to a better resolution of the Dicyema species’ position. This can be seen as further evidence for the non-affiliation of orthonectids and dicyemids and the correct inference that orthonectids are part of Annelida.
Figure S4. Related to Figure 3.
A. Cladogram corresponding to Fig 3a showing all PP support values for the CAT+G4 phylogeny based on the full alignment of 190,027 amino acid positions.
B. A cladogram including JP support values based on 50 jackknife subsamples of 30,000 amino acid positions, each independently analysed for 2000 cycles under the CAT+G4 model in phylobayes and summarised with the bpcomp command setting 1800 as burnin. As in the analysis of the full dataset, I. linei is found within the annelids and phylum Mesozoa is found as an unnatural assemblage.
C. Cladogram corresponding to Fig 3b showing all support values.
D. Cladogram corresponding to Fig 3c showing all support values.
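One simple way to read a jackknife support value such as those shown in these cladograms is as the fraction of replicate trees in which a given clade appears. The sketch below illustrates that reading only; the values in the figure were summarised with bpcomp and may have been computed differently. The function name `clade_support`, the representation of each tree as a set of clades (each a frozenset of taxon names), and the toy replicates are assumptions made for the example.

```python
def clade_support(replicate_clades, clade):
    """Fraction of replicate trees that contain the given clade.

    replicate_clades: list with one entry per replicate; each entry is a set
    of clades, and each clade is a frozenset of taxon names.
    clade: any iterable of taxon names.
    """
    if not replicate_clades:
        raise ValueError("need at least one replicate")
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return hits / len(replicate_clades)


if __name__ == "__main__":
    # Four hypothetical replicates; these are not results from the paper.
    with_annelids = frozenset({"Intoshia_linei", "Capitella_teleta", "Helobdella_robusta"})
    with_dicyema = frozenset({"Intoshia_linei", "Dicyema_sp"})
    replicates = [{with_annelids}, {with_annelids}, {with_dicyema}, {with_annelids}]
    print(clade_support(replicates, with_annelids))
```

Under this reading, a clade recovered in every replicate would receive a support of 1, and a clade never recovered would receive 0.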