Supplemental Methods

Enrichment Cultures of Paraphelidium tribonemae Paraphelidium tribonemae was maintained in an enriched culture together with its Tribonema gayaum in mineral medium (KNO3, 2 g/L; KH2PO4, 0.3 g/L; MgSO4, 0.15 g/L; EDTA, 10 mg/L; FeSO4, 5 mg/L; NaBO3, 1.4 mg/L; (NH4)6Mo7O2, 4.1 mg/L; CaCl2, 0.6 mg/L; ZnSO4, 0.1 mg/L; CuSO4, 50 mg/L,

Co(NO3)2, 20 mg/L) or in Volvic™ at room temperature in the presence of white light [1]. The culture was progressively cleaned from other heterotrophic by transfer, by micromanipulation under the optical microscope, of infected host filaments to enriched cultures containing micromanipulated uninfected T. gayanum filaments.

WGA Staining, Epifluorescence and Scanning Electron Microscopy In order to test whether chitin was present in different cycle stages of Paraphelidium tribonemae, we incubated an actively growing culture with Wheat Germ Agglutinin (WGA) conjugated to Texas Red (W7024 Life Technologies) following the manufacturer instructions. WGA was added to a final concentration of 5µg/mL in the cell-culture medium, incubated for 10 minutes at room temperature and washed with Volvic™ water. Cells were then observed under a LEICA DM2000LED fluorescence microscope with an HCX PL FLUOTAR 100x/1.30 oil PH3 objective. Pictures were taken with a LEICA DFC3000G camera using the LEICA Application Suite v4.5 and edited with ImageJ (http://imagej.nih.gov/ij/). For scanning electron microscopy (SEM) observations, we transferred 1-week-old cultures to a Petri dish containing mineral medium and two coverslips. Cells were let to settle overnight before fixation. Cells were fixed for 45 minutes with a cocktail of 1% OsO4 and 1% HgCl2 in culture medium. The fixation solution was removed by washing the samples three times with distilled water for 10 minutes each. Samples were then dehydrated by passing through ethanol series (30%, 50%, 70%, 90%, 96%, 100%) for 10 minutes each. Samples were subjected to critical-point drying and subsequently sputter-coated with platinum. Images were taken with a Zeiss Sigma FE-SEM at 1kV acceleration voltage.

Transcriptome Sequencing To obtain the most complete representation of the aphelid transcriptome across different cell-cycle stages, we extracted RNA from a young culture (three days after aphelid inoculation to T. gayanum) rich in aphelid zoospores and cysts, and eight old cultures (five to seven days after inoculation) almost without T. gayanum living cells and rich in aphelid plasmodia, resting and zoospores. In order to minimize algal growth and therefore algal sequences obtained after RNA-seq, we maintained the cultures in absence of light. Total RNA was extracted with the RNeasy mini Kit (Qiagen), quantified by Qubit (ThermoFisher Scientific) and sent to Eurofins Genomics (Germany) for de novo transcriptome sequencing. Two cDNA Illumina libraries corresponding, respectively, to the ‘young’ and ‘old’ enrichment cultures were constructed after polyA mRNA selection and paired-end (2x125 bp) sequenced with Illumina HiSeq 2500 Chemistry v4. Raw read sequences have been deposited in NCBI under accession number PRJNA402032.

1

Transcriptome Assembly, Decontamination and Annotation A total of 71,015,565 and 58,826,379 reads, respectively, were obtained for the libraries of the ‘young’ and ‘old’ cultures. From these, we obtained 34,844,871 (‘young’) and 25,569,845 (‘old’) paired-end reads. After quality/Illumina Hiseq adapter trimming using Trimmomatic v0.33 [2] with “ILLUMINACLIP:adapters.fasta:2:30:10 SLIDINGWINDOW:4:30 HEADCROP:12 MINLEN:113” parameters, we retained 8,601,196 forward and 6,535,164 reverse reads (‘young’) and 10,063,604 forward and 4,563,537 reverse unpaired reads (‘old’). All resulting reads were assembled using Trinity v2.2.0 [3] with “min_kmer_cov 2” and “normalize_reads” options. The final transcriptome assembly (Partr_v0) contained 68,130 contig sequences or putative transcripts. 47 of these sequences were identified as ribosomal RNAs using RNAMMER v1.2 [4] and included sequences of Paraphelidium tribonemae, its food source Tribonema gayanum, chloroplasts and bacteria. No other partial or complete eukaryotic ribosomal genes were detected. To decontaminate P. tribonemae from host and bacterial sequences, we used Blobtools v0.9.19 [5] (http://zenodo.org/record/845347#.Wfm7unZryHs) to visualize the taxonomic assignment of assembled sequences. As input files, in addition to our assembly file, we used a read map file obtained with the fast gapped-read alignment program Bowtie2 v2.3.2 [6] and a formatted hit list obtained from applying diamond-blastx [7] to our sequences against the NCBI RefSeq database (e-value threshold 1e-10). To build the blobDB file, we used the first 10 hits for each putative transcript with the bestsum as tax-rule criterion to get a taxonomic affiliation at species rank. Sequences with no hit (41,836 sequences, 61.4% of the assembly) and sequences affiliated to bacteria, archaea or viruses were removed. The remaining 21,688 eukaryotic sequences were then blasted against a partial REFSEQ protein database (e-value threshold 1e-10) containing sequences of 5 fungi (Spizellomyces punctatus DAOM BR117, Batrachochytrium dendrobatidis JAM81, Agaricus bisporus var. bisporus H97, var. grubii H99 and Cryptococcus neoformans var. neoformans JEC21) plus 6 stramenopiles (Nannochloropsis gaditana CCMP526, Phytophthora sojae, Phytophthora parasitica INRA-310, Phytophthora infestans T30-4, Aureococcus anophagefferens and Aphanomyces invadans). If the first hit was a stramenopile (potential host origin), the sequence was discarded. 13,786 transcripts were then assumed to be free of contamination. Transdecoder v2 (http://transdecoder.github.io) with search_pfam option yielded 16,841 peptide sequences that were filtered using Cd-hit v4.6 [8] with a 100% identity resulting in 13,035 peptides. These peptides were functionally annotated with eggNOG-mapper v4.5.1 [9] using both DIAMOND and HMMER mapping mode, eukaryotes as taxonomic scope, all orthologs and non-electronic terms for Gene Ontology evidence. We thus got a total of 10,669 annotated peptides as the version 1 (Partr_v1) of the Paraphelidium tribonemae predicted proteome. To further verify the decontamination of the Paraphelidium tribonemae transcriptome, we subsequently generated transcriptomic data for the Tribonema gayanum host using the same methods and parameters for sequencing, assembly and annotation, as described above. Then, we blasted the Paraphelidium tribonemae version 1 proteins against a database of Tribonema gayanum proteins. This allowed us to exclude 219 additional sequences (100% identical to Tribonema’s). To accommodate differences due to misassembly or allele variance, we also excluded Paraphelidium sequences >95% identical to Tribonema proteins, which allowed us to exclude 11 additional proteins as contaminant. After these additional cleaning steps, version 1.5 of the Paraphelidium tribonemae predicted proteome contained 10,439 proteins cleaned of contamination (Table S1). The completeness of this new transcriptome was estimated to 91.4% with BUSCO v2 [10] using the eukaryotic ortholog database 9 with the protein option. Paraphelidium

2 tribonemae metatranscriptomic nucleotide contig assembly and version 1.5 of the predicted proteome are deposited in figshare (https://figshare.com/authors/Guifr_Torruella/3846172) under Creative Commons 4.0 licence.

Phylogenomic Analyses We updated a previously assembled 93 single-copy protein domain (SCPD) dataset [11] with data from the Paraphelidium tribonemae transcriptome assembly, thirteen derived microsporidian lineages as well as Mitosporidium daphniae [12], Paramicrosporidium saccamoebae [13]and Amphiamblys sp. [14]. Some stramenopile taxa such Ectocarpus siliculosus [15], Thalassiosira pseudonana [16] or Phytophthora infestans (GenBank NW_003303758.1) were also included in order to further prevent any previously undetected Tribonema gayanum contamination. Markers were aligned with MAFFT v7 [17], using the method L-INS-i with 1000 iterations. Unambiguously aligned regions were trimmed with TrimAl v1.2rev59 [18] with the automated1 algorithm, then approximate Maximum Likelihood (ML) trees were inferred using FastTree v2.1.7 [19] and all visually inspected in Geneious v6.0.6 [20] to check for possible contamination. Contamination-free alignments were relieved from distant outgroup taxa; we kept only sequences belonging to , mainly holomycotan sequences, and retained apusomonad and breviate sequences as outgroup. Sequences belonging to the selected taxa were concatenated using Alvert.py from the package Barrel-o-Monkeys (http://rogerlab.biochemistryandmolecularbiology.dal.ca/Software/Software.htm#Monkeybarrel). This resulted in our initial dataset, SCPD48 (containing data from 48 species), with 22,207 amino acids. After the publication of the genome sequence of Paramicrosporidium saccamoebae [13], we included it in our dataset (SCPD49, 49 species), which now contained thirteen microsporidians and 22,976 amino acids. Bayesian phylogenetic trees were inferred using PhyloBayes-MPI v1.5 [21] under the CAT-Poisson evolutionary model, applying the Dirichlet process and removing constant sites. Two independent MCMC chains for each dataset were run for >15,000 generations, saving one every 10 trees. Phylogenetic analyses were stopped once convergence thresholds were reached after a burn-in of 25% (i.e., maximum discrepancy < 0.1 and minimum effective size > 100 using bpcomp; see PhyloBayes manual for details) (Figure S1A). We also applied ML phylogenetic reconstruction to the SCPD48 dataset using IQ-TREE v1.5.3 [22] with the profile mixture model C60 [23]. Statistical support was obtained with 1,000 ultrafast bootstraps [24] and 1000 replicates of the SH-•like approximate likelihood ratio test [25] (Figure S1B). Trees in Newick format were visualized with FigTree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/), rooted with non- taxa and exported as pdf, with further edition with Inkscape vector graphics software. Datasets and alignments used for these analyses are available at http://www.ese.u-psud.fr/article986.html?lang=en. All trees are provided in Newick format in Supplementary file all_trees.txt. Additional datasets for phylogenomic analyses. Since the inclusion of Paramicrosporidium saccamoebae changed the topology of the tree (Paraphelidium being now sister to Fungi) but the statistical support was not maximal, we carried out phylogenomic analyses with several additional datasets. First, to limit potential long-branch attraction artefacts due to fast-evolving , we created a subset of our dataset by discarding thirteen long-branch Microsporidia from individual alignments before realigning, trimming and concatenating with the same parameters as before. This resulted in a dataset containing 36 species (SCPD36; Figure S1E and Figure S1F) and, since less taxa were included, trimmed alignments resulted longer, the supermatrix containing 24,413 amino acids. Second, we carried out the

3 same analyses on the same taxon sampling (49 and 36 species, respectively) using two other protein datasets that have been previously used in phylogenomic studies of Microsporidia, for which we updated taxon sampling and performed sequence pruning to eliminate doubtful orthologs as revealed single by marker trees (data not shown). The resulting datasets were called BMC49 (Figure S1G and Figure S1H) and BMC36 (Figure S1I and Figure S1J) for a set of 53 protein markers [26] concatenated into matrices of 15,744 and 16,840 amino acidic positions, respectively; and GBE49 (Figure S1K and Figure S1L) and GBE36 (Figure S1M and Figure S1N) with 259 protein markers [14] amounting to 87,045 and 96,029 amino acids each. To alleviate the local computational burden, most of these analyses were carried out using CIPRES Science Gateway [27]. Tree topology test. In order to test if the best topology obtained was significantly better than previously obtained topologies using other markers, we tested whether constrained alternative topologies could be rejected for different datasets. We used Mesquite [28] to constrain topologies representing the opisthosporidian monophyly (the monophyly of aphelids, rozellids and microsporidians), the monophyly of aphelids with either blastocladiomycotans, chytridiomycotans or both, chytridiomycotans and blastocladiomycotans. The constrained topologies without any branch-length were reanalyzed with the “-g” option of IQ-TREE and the best-fitting model for each dataset. The resulting trees were then concatenated and AU tests were performed for each dataset with the “-z” and “-au” options as described in the advanced documentation of IQ-TREE. The results of alternative topology tests for each dataset can be found in Table S2. Reducing heterogeneity by progressively removing the fastest-evolving sites. To minimize systematic error due to the inclusion of fast-evolving sites in our protein datasets, we progressively removed the fastest evolving sites at steps of 5% sites removed at a time. Among-site evolutionary rates were inferred using IQ-TREE “-wsr” option and its best-fitting model for each dataset (Table S2). In total, we created 19 subsets for each dataset. We then reconstructed phylogenetic trees using IQ-TREE with the same best fitting model as for the whole dataset. To know how supported were alternative topologies in bootstrapped trees, we used CONSENSE from the PHYLIP package [29] and interrogated the .UFBOOT file using a Python script (M Kolisko, pers. comm). The results are provided in Table S2 and plotted in Figure 1.

Homology Searches and Phylogenetic Analyses of Specific Proteins All protein sets used in this study were obtained in May 2017 from the NCBI protein database (https://www.ncbi.nlm.nih.gov/protein/), except the following: Chromosphaera perkinsii, Corallochytrium limacisporum and Ichthyophonus hoferi genome data were obtained from [30]; Spizellomyces punctatus, Gonapodya prolifera, Batrachochytrium dendrobatidis, Allomyces macrogynus, Catenaria anguillulae and Blastocladiella britannica were retrieved from MycoCosm portal, Joint Genome Institute* (http://www.jgi.doe.gov/); Parvularia atlantis (previously sp. ATCC50694 from https://doi.org/10.6084/m9.figshare.3898485.v4); and Paramicrosporidium saccamoebae from NCBI in January 2018. [* These sequence data were produced by the US Department of Energy Joint Genome Institute http://www.jgi.doe.gov/ in collaboration with the user community.] Chitin and cell-wall related proteins. Query sequences to retrieve chitin and related cell-wall proteins (chitin synthases – CHS –, chitin deacetylases – CDA – and chitinases – CTS –) were obtained from the study of Rozella allomycis [31] except for 1,3-beta glucan synthase (FKS1), which was obtained from

4

Saccharomyces cerevisiae S288C. We used sequences from fumigatus as initial queries for cellulases [32] and also opisthokont homologs retrieved by BLASTp against a database from our Paraphelidium transcriptome v0 and V1.5. was confirmed by searching for the identified sequences in the eggNOG [9] annotation file (Table S1), performing reverse BLAST analyses against the non-redundant NCBI database and by HMMER analysis [33] (http://hmmer.org). Cell wall proteins, in particular chitinases and cellulases, were also annotated using CAZyme identification tools (http://csbl.bmb.uga.edu/dbCAN/annotate.php). We produced comprehensive alignments including sequences retrieved from previous studies and sequences identified by BLAST in GenBank or in the mycoCLAP database [34] with MAFFT web server [17] applying the progressive L-INS-i algorithm (except for the multidomain CHS sequences for which E-INS-i was used) (https://mafft.cbrc.jp/alignment/software/algorithms/algorithms.html). Alignments were trimmed from gaps and ambiguously aligned sites using Trimal v1.2rev59 [18] with the automated1 algorithm. Maximum Likelihood trees were inferred using IQ-TREE web server [22] with the LG+G4 evolutionary model and visualized with FigTree. All trees are provided in Newick format in Supplementary file all_trees.txt. Myosins. Retrieval of myosin protein sequences was performed by querying the HMM profiles of the myosin head domain (Pfam accession: PF00063) against our database of Paraphelidium tribonemae proteins, using the hmmersearch utility of HMMER v3.1b2 (http://hmmer.org) and the profile-specific gathering threshold cut-off. For each retrieved homolog, the protein segment spanning the Pfam- defined myosin head domain was aligned against the pan-eukaryotic alignment of myosins of [35], using MAFFT v7.245 [17] with the G-INS-i algorithm optimized for global sequence homology. The alignment was run for up to 10³ cycles of iterative refinement, and then it was manually examined, curated and trimmed (a process that included the removal of non-homologous amino acid positions). Then, the trimmed alignment, totaling 997 sequences and 400 sites, was analyzed using ML phylogenetic inference with IQ-TREE v1.5.1 [22] under the LG model with a 10-categories free-rate distribution, which was selected as the best-fitting model according to the IQ-TREE TESTNEW algorithm as per the Bayesian information criterion (BIC). The best-scoring tree was searched for up to 100 iterations, starting from 100 initial parsimonious trees; statistical supports for the bipartitions were drawn from 1000 ultra-fast bootstrap [24] replicates with a 0.99 minimum correlation as convergence criterion, and 1000 replicates of the SH-like approximate likelihood ratio test. Then, we classified the myosin homologs of Paraphelidium tribonemae according to the phylogenetic framework of Sebé-Pedrós et al. [35]. In order to characterize the classified homologs (Figure S4 and Table S4B), we recorded the protein domain architectures of the full-length proteins using Pfamscan and the 29th release of the Pfam database [36]. Datasets and alignments used for these analyses are available here (http://www.ese.u- psud.fr/article986.html?lang=en). All trees are provided in Newick format in Supplementary file all_trees.txt.

Comparative analyses of proteins involved in primary metabolism as well as cytoskeleton, membrane- trafficking and phagotrophy Primary metabolism. To get a broad comparative view of the metabolic capabilities of Paraphelidium tribonemae, we performed a statistical multivariate analysis similar to the Principal Component Analysis previously used in the Rozella allomycis genome paper (Figure 4 in [31]) to show that this organism grouped with other eukaryotic parasites. We initially searched in P. tribonemae for the presence of

5

6,469 unique orthologous groups as per eggNOG web server v4.5.1 [9], corresponding to 8 primary metabolism categories as per Gene Ontology v1.4. The correspondence between GO terms and primary metabolism COGs was obtained from http://geneontology.org/external2go/cog2go, and was as follows: [C] Energy production and conversion (912 OGs); [G] Carbohydrate transport and metabolism (1901 OGs); [E] Amino acid transport and metabolism (714 OGs); [F] Nucleotide transport and metabolism (351 OGs); [H] Coenzyme transport and metabolism (375 OGs); [I] Lipid transport and metabolism (965 OGs); [P] Inorganic ion transport and metabolism (842 OGs) and [Q] Secondary metabolites biosynthesis, transport and catabolism (570 OGs). From these categories, we were able to identify a total of 1,172 non-ambiguous OGs in the P. tribonemae transcriptome that were shared with a set of other 41 protist species (Table S3). These 41 protist species represent a broader taxonomic sampling as compared to the previous study on R. allomyces [31]. The eukaryotes included were 11 opisthosporidians (Paramicrosporidium saccamoebae – Par_sa, Amphiamblys sp. - Amp_sp, Antonospora locustae - Ant_lo, Encephalitozoon cuniculi - Enc_cu, Encephalitozoon intestinalis - Enc_in, - Ent_bi, Mitosporidium daphniae - Mit_da, Nematocida parisii - Nem_pa, Nosema ceranae - Nos_ce, Rozella allomycis - Roz_al, Paraphelidium tribonemae - Par_tr); 10 fungi including members of most phyla (Spizellomyces punctatus* - Spi_pu, Gonapodya prolifera - Gon_pr, Batrachochytrium dendrobatidis - Bat_de, Allomyces macrogynus - All_ma, Catenaria anguillulae - Cat_an, Blastocladiella britannica - Bla_br, Mortierella verticillata - Mor_ve, Coprinopsis cinerea - Cop_ci, Ustilago maydis - Ust_ma, Schizosaccharomyces pombe - Sch_po); 2 nucleariids (Fonticula alba - Fon_al, Parvularia atlantis - Par_at); 7 holozoans (Salpingoeca rosetta - Sal_ro, Monosiga brevicollis - Mon_br, Capsaspora owczarzaki - Cap_ow, Chromosphaera perkinsii - Chr_pe, Corallochytrium limacisporum - Cor_li, Ichthyophonus hoferi - Ich_ho, Sphaeroforma arctica - Sph_ar), 3 amoebozoans (Acanthamoeba castellanii - Aca_ca, Dictyostelium purpureum - Dic_pu, Heterostelium pallidum - Het_pa, Entamoeba histolytica - Ent_hi), the free-living apusomonad Thecamonas trahens - The_tr and Naegleria gruberi - Nae_gr and 6 parasites (Toxoplasma gondii - Tox_go, falciparum - Pla_fa, Trypanosoma brucei - Try_br, Leishmania major - Lei_ma, Phytophora infestans - Phy_in). We annotated all 41 protein sets using eggNOG-mapper v4.5.1 [37] using DIAMOND mapping mode, eukaryotes as taxonomic scope, all orthologs and non-electronic terms for Gene Ontology evidence. Then we counted the OGs of interest and performed a series of exploratory multivariate analyses, as follows. The species-wise OG counts were transformed into a presence/absence matrix (encoded as 0/1), or binary OG profile. This binary OG profile excluded all OGs with no identifiable orthologs in any of the 41 species used in this analysis (e.g., proteins related to exclusive prokaryotic metabolism or photosynthesis) leaving a total number of 1,172 primary metabolism OGs (Table S3). The similarity between each species’ binary COG profile was assessed using the Pearson’s correlation coefficient (r statistic) as implemented in the R stats library [38]. Next, we built a complementary species distance matrix by defining dissimilarity as 1-r. Finally, we analyzed this dissimilarity matrix using a Principal Coordinate Analysis (PCoA) as implemented in the R ape library [39], with default parameters. For each PCoA, we represented the two vectors with the highest fraction of explained variance (Figure 3A and Figure S3A), which were in all cases higher than the fractions expected under the broken stick model. In addition, we plotted binary OG profiles of each species in a presence/absence heatmap, produced using the heatmap.2 function in the R gplots library [40]. Species’ order was defined by a Ward hierarchical clustering of the afore-mentioned interspecific Pearson’s correlation coefficients. OGs’ order was defined by Ward hierarchical clustering on euclidean distances. All clustering and distance analyses were performed using R stats library [38]. We then represented the raw species clustering (Pearson correlation+Ward) in a separate heatmap, scaling the color code to display positive Pearson

6 correlation values (0 to 1) (Figures 3B). Finally, each sub-set of primary metabolism COGs was also analyzed separately (categories C, E, F, G, H, I, P and Q) using the same hierarchical clustering as above in order to cluster species according to their metabolism gene content (based on Pearson+Ward) (Figures S3C-F). Cytoskeleton, membrane trafficking, phagotrophy-related proteins. Likewise, we carried out similar comparisons and statistical analyses of proteins involved in phagotrophy, membrane trafficking and cytoskeleton. For this, we looked for proteins broadly related to those systems (actin cytoskeleton, exocytosis, fungal , exosome, endosome, mitochondrial biogenesis, ER-Golgi, lysosome, autophagy, peroxisome) in the KEGG BRITE reference database (http://www.genome.jp/kegg/brite.html). We identified a total of 849 proteins related to membrane trafficking (ko04131), 279 cytoskeletal proteins (ko04812) and 361 proteins related to phagotrophy (KEGG maps map04144_endocytosis, map04145_phagosome, map04146_peroxisome, map04142_lysosome, map04810_actin_regulation, map04141_autophagy and map04071_sphingolipid_signal), which were used to carry out PCoA and clustering analyses as above (Figure S4A). Since the eggNOG annotation file did not provide specific KOs (KEGG orthologs) but only KEGG maps, we used the following approach to get the KO profile for the 41 taxa. All proteins in 35 genomes (Table 1 in [41]) were clustered to a non-redundant set using cd-hit at a 90% identity threshold. Those 35 nr90 genomes were compared in an all-vs-all BLAST analysis with an e- value cutoff of 1x10-3. The all-vs-all BLAST output was clustered using the MCL algorithm with an inflation parameter of 2.0. The resultant clusters were retained if the proteins in the cluster were derived from 3 or more organisms. This was done to avoid lineage specific proteins. Proteins from each cluster were aligned using MAFFT, the alignments were trimmed with trimAl, and HMM profiles were built for each alignment using HMMer3. This analysis resulted in a set of 14,095 HMMs. All proteins from the UniProt-SwissProt database were searched against the entire set of 14,095 HMMs. Each HMM was assigned a best-hit protein annotation from the UniProt-SwissProt database. KEGG ortholog (KO) identifiers (IDs) associated with each HMM were obtained from UniProtKB IDs and KO IDs using the mappings available from UniProtKB. Additional KO IDs were mapped to the HMMs by searching the HMMs against all proteins annotated to a given KO ID. The best hit bit score among those proteins to the HMM was compared to the best-hit bit score of the UniProtKB annotation for that HMM. If the best hit KO ID bit score was at least 80% of the best hit UniProtKB protein bit score, and there was no KO ID already associated with the UniProtKB annotation, the KO ID was transferred to that HMM. To build a presence-absence matrix associated with KO IDs, the HMMs were searched against all proteins in a genome. Best hit HMMs below threshold (total protein e-value <=1x10-5 and a best single domain e- value <=1e-4) were assigned for each protein in a genome. The best hit HMMs and their associated KO IDs were used for presence/absence calls. If at least one protein in a genome had a best hit to a given HMM, the associated KO was considered present for that genome. From a total of 1,568 OGs associated with the KEGG maps of interest, 633 KO IDs were identified in the UniProtKB annotations of the HMMs. An additional 62 KO IDs were mapped to HMMs by considering similarity of bit scores between the best UniProt/SwissProt protein hit to the HMM and the best KO ID protein hit to the HMM (Table S3A). Those 695 KO IDs were used to perform the same multivariate statistical analyses as per the metabolism section for the 41 taxa. Finally, we looked for some of the phagolysosome KEGG maps in Paraphelidium tribonemae and Rozella allomycis and compared them using KEGG tools (Figure S4C-E). To do so, we first annotated the two protein sets using BlastKOALA [42] with Eukaryotes as group and

7 genus_eukaryotes KEGG GENES database. We then uploaded our annotations in the KEGG Mapper Reconstruct Pathway and BRITE servers www.genome.jp/kegg/mapper.html.

Supplemental References 1. Karpov, S.A., Tcvetkova, V.S., Mamkaeva, M.A., Torruella, G., Timpano, H., Moreira, D., Mamanazarova, K.S., and Lopez-Garcia, P. (2017). Morphological and genetic diversity of : new aphelid Paraphelidium tribonemae gen. et sp. nov. J Eukaryot Microbiol 64, 204-212. 2. Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120. 3. Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., et al. (2011). Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29, 644-652. 4. Lagesen, K., Hallin, P., Rodland, E.A., Staerfeldt, H.H., Rognes, T., and Ussery, D.W. (2007). RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 35, 3100-3108. 5. Laetsch, D.R., and Blaxter, M.L. (2017). BlobTools: Interrogation of genome assemblies [version 1; referees: 2 approved with reservations]. F1000Research 6, 1287. 6. Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357-359. 7. Buchfink, B., Xie, C., and Huson, D.H. (2015). Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59-60. 8. Fu, L., Niu, B., Zhu, Z., Wu, S., and Li, W. (2012). CD-HIT: accelerated for clustering the next- generation sequencing data. Bioinformatics 28, 3150-3152. 9. Huerta-Cepas, J., Szklarczyk, D., Forslund, K., Cook, H., Heller, D., Walter, M.C., Rattei, T., Mende, D.R., Sunagawa, S., Kuhn, M., et al. (2016). eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res 44, D286-293. 10. Simao, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., and Zdobnov, E.M. (2015). BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210-3212. 11. Torruella, G., de Mendoza, A., Grau-Bove, X., Anto, M., Chaplin, M.A., del Campo, J., Eme, L., Perez- Cordon, G., Whipps, C.M., Nichols, K.M., et al. (2015). Phylogenomics reveals convergent evolution of lifestyles in close relatives of animals and fungi. Curr Biol 25, 2404-2410. 12. Haag, K.L., James, T.Y., Pombert, J.F., Larsson, R., Schaer, T.M., Refardt, D., and Ebert, D. (2014). Evolution of a morphological novelty occurred before genome compaction in a lineage of extreme parasites. Proc Natl Acad Sci U S A 111, 15480-15485. 13. Quandt, C.A., Beaudet, D., Corsaro, D., Walochnik, J., Michel, R., Corradi, N., and James, T.Y. (2017). The genome of an intranuclear parasite, Paramicrosporidium saccamoebae, reveals alternative to obligate intracellular . Elife 6. 14. Mikhailov, K.V., Simdyanov, T.G., and Aleoshin, V.V. (2017). Genomic survey of a hyperparasitic microsporidian Amphiamblys sp. (Metchnikovellidae). Genome Biol Evol 9, 454-467. 15. Cock, J.M., Sterck, L., Rouze, P., Scornet, D., Allen, A.E., Amoutzias, G., Anthouard, V., Artiguenave, F., Aury, J.M., Badger, J.H., et al. (2010). The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature 465, 617-621. 16. Armbrust, E.V., Berges, J.A., Bowler, C., Green, B.R., Martinez, D., Putnam, N.H., Zhou, S., Allen, A.E., Apt, K.E., Bechner, M., et al. (2004). The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science 306, 79-86. 17. Katoh, K., Kuma, K., Toh, H., and Miyata, T. (2005). MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33, 511-518.

8

18. Capella-Gutierrez, S., Silla-Martinez, J.M., and Gabaldon, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972-1973. 19. Price, M.N., Dehal, P.S., and Arkin, A.P. (2010). FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490. 20. Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C., et al. (2012). Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647-1649. 21. Lartillot, N., Lepage, T., and Blanquart, S. (2009). PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25, 2286-2288. 22. Nguyen, L.T., Schmidt, H.A., von Haeseler, A., and Minh, B.Q. (2015). IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32, 268-274. 23. Le, S.Q., Lartillot, N., and Gascuel, O. (2008). Phylogenetic mixture models for proteins. Phil Transof Royal Soc B 363, 3965-3976. 24. Minh, B.Q., Nguyen, M.A., and von Haeseler, A. (2013). Ultrafast approximation for phylogenetic bootstrap. Mol Biol Evol 30, 1188-1195. 25. Anisimova, M., Gil, M., Dufayard, J.-F., Dessimoz, C., and Gascuel, O. (2011). Survey of branch support methods demonstrates accuracy, power, and robustness of fast likelihood-based approximation schemes. Syst Biol 60, 685-699. 26. Capella-Gutierrez, S., Marcet-Houben, M., and Gabaldon, T. (2012). Phylogenomics supports microsporidia as the earliest diverging clade of sequenced fungi. BMC Biol 10, 47. 27. Miller, M.A., Pfeiffer, W., and Schwartz, T. (2010). Creating the CIPRES Science Gateway for inference of large phylogenetic trees. In Proceedings of the Gateway Computing Environments Workshop, Volume 14 Nov. 2010. (New Orleans, LA; pp. 1-8), pp. 1-8. 28. Maddison, W.P., and Maddison, D.R. (2018). Mesquite: a modular system for evolutionary analysis. Version 3.40 p. http://mesquiteproject.org. 29. Felsenstein, J. (1999). PHYLIP - Phylogeny Inference Package. 3.6 Edition. (Seattle, WA: University of Washington). 30. Grau-Bove, X., Torruella, G., Donachie, S., Suga, H., Leonard, G., Richards, T.A., and Ruiz-Trillo, I. (2017). Dynamics of genomic innovation in the unicellular ancestry of animals. Elife 6, pii: e26036. 31. James, T.Y., Pelin, A., Bonen, L., Ahrendt, S., Sain, D., Corradi, N., and Stajich, J.E. (2013). Shared signatures of parasitism and phylogenomics unite Cryptomycota and Microsporidia. Current Biology 23, 1548-1553. 32. Kubicek, C.P., Starr, T.L., and Glass, N.L. (2014). Plant cell wall-degrading enzymes and their secretion in plant-pathogenic fungi. Annu Rev Phytopathol 52, 427-451. 33. Finn, R.D., Clements, J., and Eddy, S.R. (2011). HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 39, W29-37. 34. Murphy, C., Powlowski, J., Wu, M., Butler, G., and Tsang, A. (2011). Curation of characterized glycoside hydrolases of fungal origin. In Database (Oxford). p. bar020. 35. Sebe-Pedros, A., Grau-Bove, X., Richards, T.A., and Ruiz-Trillo, I. (2014). Evolution and classification of myosins, a paneukaryotic whole-genome approach. Elife 6, 290-305. 36. Finn, R.D., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Mistry, J., Mitchell, A.L., Potter, S.C., Punta, M., Qureshi, M., Sangrador-Vegas, A., et al. (2016). The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res 44, D279-285. 37. Huerta-Cepas, J., Forslund, K., Coelho, L.P., Szklarczyk, D., Jensen, L.J., von Mering, C., and Bork, P. (2017). Fast genome-wide functional annotation through orthology assignment by eggNOG- Mapper. Mol Biol Evol 34, 2115-2122. 38. R Development Core Team (2017). R: A language and environment for statistical computing., http://www.r-project.org Edition. (Vienna, Austria: R Foundation for Statistical Computing). 39. Paradis, E., Claude, J., and Strimmer, K. (2004). APE: Analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289-290.

9

40. Warnes, G.R., Bolker, B., Bonebakker, L., Gentleman, R., Liaw, W.H.A., Lumley, T., Maechler, M., Magnusson, A., Moeller, S., Schwartz, M., et al. (2016). gplots: Various R programming tools for plotting data. R package version 2, 1. 41. Burns, J.A., Pittis, A.A., and Kim, E. (2018). Gene-based predictive models of trophic modes suggest Asgard archaea are not phagocytotic. Nat Ecol Evol 2, 697-704. 42. Kanehisa, M., Sato, Y., and Morishima, K. (2016). BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J Mol Biol 428, 726-731.

10

49sp - with LB-Microsporidia 36sp - without LB-Microsporidia PhyloBayes CAT Poisson IQ-TREE best model+C60 PhyloBayes CAT Poisson IQ-TREE best model+C60 Apusomonads/Breviates C D E F

Thecamonas trahens SCPD49 Thecamonas trahens SCPD49 Thecamonas trahens SCPD36 Thecamonas trahens SCPD36 Nucleariids Opisthosporidia Fungi 1 100/100 1 100/100 CAT+Poi LG+R8+C60 CAT+Poi LG+R7+C60 Pygsuia biforma Pygsuia biforma Pygsuia biforma Pygsuia biforma Sphaerothecum destruens Sphaerothecum destruens 0.99 0.9 Corallochytrium limacisporum 0.7 48.2/54 Corallochytrium limacisporum Corallochytrium limacisporum 0.3 Amoebidium parasiticum 0.4 0.99 47.5/71 0.7 100/100 1 Creolimax fragrantissima 100/100Amoebidium parasiticum Creolimax fragrantissima Creolimax fragrantissima Amoebidium parasiticum Creolimax fragrantissima 1 100/100 1 100/100 Amoebidium parasiticum Sphaerothecum destruens 1 Ministeria vibrans 100/100 Ministeria vibrans 1 33.4/27 Capsaspora owczarzaki Capsaspora owczarzaki Sphaerothecum destruens Corallochytrium limacisporum 99.8/82 74.5/41 1 Salpingoeca kvevrii Amphimedon queenslandica Ministeria vibrans Ministeria vibrans 1 Monosiga brevicollis 100/100Nematostella vectensis 1 100/100 0.67 99.7/100 0.68 Capsaspora owczarzaki Capsaspora owczarzaki 1 Didyomeca costata 100/100 Xenopus tropicalis Amphimedon queenslandica Didyomeca costata Salpingoeca kvevrii 1 100/100 1 99.6/95 Amphimedon queenslandica Xenopus tropicalis 1 100/100 1 11.9/57 Salpingoeca kvevrii Monosiga brevicollis Nematostella vectensis Nematostella vectensis Monosiga brevicollis 0.94 100/100 1 100/100 Parvularia atlantis Parvularia atlantis 1 Didyomeca costata 100/100 Xenopus tropicalis 1 100/100 1 100/100 Fonticula alba Fonticula alba Amphimedon queenslandica Monosiga brevicollis Rozella allomycis Rozella allomycis 1 100/100 100/100 Xenopus tropicalis Salpingoeca kvevrii 1 Paramicrosporidium saccamoebae Paramicrosporidium saccamoebae 1 100/100 4 3 / 6 5 A 1 Mitosporidium daphniae Mitosporidium daphniae Nematostella vectensis Didyomeca costata 65.4/91 Amphiamblys sp. WSBS2006 Amphiamblys sp. WSBS2006 0.99 Parvularia atlantis Parvularia atlantis 0.87 Nematocida parisii Nematocida parisii 1 100/100 Thecamonas trahens 1 100/ 99.9/98 Fonticula alba Fonticula alba 1 Edhazardia aedis Edhazardia aedis 1 BI CAT-Poisson SCPD Pygsuia biforma Pseudoloma neurophilia 100 100/100 11.1/59 Pseudoloma neurophilia Rozella allomycis Rozella allomycis Posterior Probabilities 1 1 Vavraia culicis 100/100 1 1 Sphaerothecum destruens 1 Trachipleistophora hominis Paramicrosporidium saccamoebae 100/100 100/100 1 100/100 Paramicrosporidium saccamoebae 1 0.3 Trachipleistophora hominis 100/100 Vavraia culicis 100/100 Corallochytrium limacisporum 0.94 Antonospora locustae Anncaliia algerae Mitosporidium daphniae Mitosporidium daphniae 1 1 1 100/100 Creolimax fragrantissima Anncaliia algerae 100/100 Antonospora locustae 1 Paraphelidium tribonemae Paraphelidium tribonemae 1 1 Vittaforma corneae 99.7/99 Enterocytozoon bieneusi 100/100 Amoebidium parasiticum 1 100/100 Allomyces macrogynus Allomyces macrogynus Enterocytozoon bieneusi 1 Vittaforma corneae 1 100/100 Ministeria vibrans 1 Nosema ceranae 100/100 Catenaria anguillulae Blastocladiella britannica 1 Nosema ceranae 0.88 1 1 100/100 96.7/95 100/100 Capsaspora owczarzaki 1 Ordospora colligata 100/100 Ordospora colligata Blastocladiella britannica Catenaria anguillulae Monosiga brevicollis Encephalitozoon intestinalis Encephalitozoon intestinalis Gonapodya prolifera 1 Paraphelidium tribonemae Paraphelidium tribonemae Gonapodya prolifera 1 1 Salpingoeca kvevrii Allomyces macrogynus Allomyces macrogynus 1 Spizellomyces punctatus 98.9/99 100/100 Spizellomyces punctatus 0.98 100/100 1 Didyomeca costata 1 Catenaria anguillulae Blastocladiella britannica Batrachochytrium dendrobatidis 100/100 1 0.66 1 100/100 Batrachochytrium dendrobatidis Blastocladiella britannica 19.3/53 Catenaria anguillulae Amphimedon queenslandica Gonapodya prolifera 98.7/ Gonapodya prolifera coronatus 1 100/100 1 99.2/99 Xenopus tropicalis 1 1 Spizellomyces punctatus 96 Spizellomyces punctatus 0.97 Coemansia reversa 7.1/51 Coemansia reversa 1 1 Batrachochytrium dendrobatidis 100/100 Batrachochytrium dendrobatidis 1 Nematostella vectensis Schizosaccharomyces pombe Saitoella complicata Conidiobolus coronatus 100/96 Conidiobolus coronatus 1 Parvularia atlantis 1 1 99.9/100 1 Saitoella complicata 100/100 1 Coemansia reversa Coemansia reversa 100/100 Schizosaccharomyces pombe Fonticula alba Schizosaccharomyces pombe Saitoella complicata 1 100/100 1 100/100 Ustilago maydis Phanerochaete Paraphelidium tribonemae 1 Saitoella complicata Schizosaccharomyces pombe 1 1 100/100 100/100 Phanerochaete chrysosporium 100/100 1 Ustilago maydis Phanerochaete chrysosporium 1 100/100 Ustilago maydis Rozella allomycis 1 100/100 Phanerochaete chrysosporium Ustilago maydis Rhizophagus irregularis Rhizophagus irregularis 1 100/100 1 Mitosporidium daphniae Rhizophagus irregularis Rhizophagus irregularis 1 Mortierella verticillata 99.5/99 1 Mortierella verticillata 1 Amphiamblys sp. WSBS2006 Mortierella verticillata 99.8/99 Mortierella verticillata 1 1 Mucor circinelloides Mucor circinelloides Mucor circinelloides 94.1/96 Mucor circinelloides Nematocida parisii 1 94.1/95 100/100 1 100/100 1 0.95 Lichtheimia hyalospora Lichtheimia hyalospora Lichtheimia hyalospora Lichtheimia hyalospora Edhazardia aedis 1 Pseudoloma neurophilia 1 1 Vavraia culicis 1 Trachipleistophora hominis 0.85 Antonospora locustae 1 1 Anncaliia algerae 1 Vittaforma corneae G H I J 1 Enterocytozoon bieneusi 1 Nosema ceranae 1 Thecamonas trahens BMC49 Pygsuia biforma BMC49 Thecamonas trahens BMC36 Pygsuia biforma BMC36 Pygsuia biforma 100/100 Thecamonas trahens 1 99.9/100 1 Ordospora colligata CAT+Poi LG+F+R7+C60 Pygsuia biforma CAT+Poi Thecamonas trahens LG+R6+C60 1 Corallochytrium limacisporum Corallochytrium limacisporum Corallochytrium limacisporum Encephalitozoon intestinalis Sphaerothecum destruens 0.6 100/ Sphaerothecum destruens 0.4 Corallochytrium limacisporum 0.3 0.4 1 0.99 100 99.8/100 Allomyces macrogynus 1 Creolimax fragrantissima 100/100 Amoebidium parasiticum Sphaerothecum destruens Sphaerothecum destruens Amoebidium parasiticum Creolimax fragrantissima 100/ 99.8/93 1 0.97 46.3/66 0.96 Creolimax fragrantissima Amoebidium parasiticum Catenaria anguillulae Ministeria vibrans Capsaspora owczarzaki 100 100/100 1 1 100/100 1 1 Blastocladiella emersonii Capsaspora owczarzaki Ministeria vibrans Amoebidium parasiticum Creolimax fragrantissima 0.98 Didyomeca costata 74.4/76 Didyomeca costata Ministeria vibrans 87.3/81 Ministeria vibrans Gonapodya prolifera 1 Salpingoeca kvevrii 100/100 Salpingoeca kvevrii 1 100/100 1 1 1 100/100 Capsaspora owczarzaki Capsaspora owczarzaki Spizellomyces punctatus 1 Monosiga brevicollis 100/100 Monosiga brevicollis 1 Amphimedon queenslandica Amphimedon queenslandica Amphimedon queenslandica Batrachochytrium dendrobatidis 1 100/100 0.83 Didyomeca costata 89.2/85 1 1 100/100 1 Xenopus tropicalis 100/100 100/100 Nematostella vectensis Salpingoeca kvevrii Nematostella vectensis Conidiobolus coronatus 99.9/100 1 Nematostella vectensis Xenopus tropicalis 1 1 Monosiga brevicollis 100/100 Xenopus tropicalis Coemansia reversa 1 Parvularia atlantis 100/100 Parvularia atlantis 1 99.9/100 Fonticula alba Fonticula alba 1 Amphimedon queenslandica Didyomeca costata Schizosaccharomyces pombe Rozella allomycis Rozella allomycis 1 100/100 1 100/100 Xenopus tropicalis Salpingoeca kvevrii 1 Saitoella complicata 1 Paramicrosporidium saccamoebae 89.7/97 Mitosporidium daphniae 1 100/100 1 1 Mitosporidium daphniae Paramicrosporidium saccamoebae Nematostella vectensis Monosiga brevicollis Ustilago maydis Amphiamblys sp. WSBS2006 100/100 Amphiamblys sp. WSBS2006 1 0.95 Parvularia atlantis Parvularia atlantis Phanerochaete chrysosporium 1 Nematocida parisii Nematocida parisii 1 100/100 1 0.99 Vittaforma corneae 100/ 100/100 Edhazardia aedis Fonticula alba Fonticula alba Rhizophagus irregularis 1 100 Enterocytozoon bieneusi 100/100 73.8/84 Pseudoloma neurophilia Rozella allomycis Rozella allomycis 1 1 100/100 1 Mortierella verticillata Nosema ceranae 100/100 Vavraia culicis 1 1 100/100 1 Paramicrosporidium saccamoebae 100/100 Paramicrosporidium saccamoebae BMC Ordospora colligata Trachipleistophora hominis 1 100/100 1 Mucor circinelloides 1 100/100 Mitosporidium daphniae 1 1 Encephalitozoon intestinalis Anncaliia algerae Mitosporidium daphniae Lichtheimia hyalospora 1 100/100 100/100 Paraphelidium tribonemae 1 Antonospora locustae Antonospora locustae 1 Paraphelidium tribonemae Anncaliia algerae 82.9/85 Enterocytozoon bieneusi 100/100 0.67 100/100 Gonapodya prolifera Gonapodya prolifera Edhazardia aedis 100/100 Vittaforma corneae 100/100 1 Pseudoloma neurophilia Nosema ceranae 1 Spizellomyces punctatus Spizellomyces punctatus 1 100/100 1 1 100/100 100/100 1 Vavraia culicis 100/100 Ordospora colligata Batrachochytrium dendrobatidis Batrachochytrium dendrobatidis Trachipleistophora hominis Encephalitozoon intestinalis Allomyces macrogynus Paraphelidium tribonemae Paraphelidium tribonemae Allomyces macrogynus Gonapodya prolifera Gonapodya prolifera 1 1 Catenaria anguillulae 98.5/99 100/100 Blastocladiella britannica 1 Spizellomyces punctatus 100/100 Spizellomyces punctatus 1 100/100 1 100/100 Blastocladiella britannica Catenaria anguillulae 1 Batrachochyrium dendrobatidis 99.9/ Batrachochytrium dendrobatidis 100 Schizosaccharomyces pombe Conidiobolus coronatus Allomyces macrogynus Allomyces macrogynus 99.9/98 1 1 Catenaria anguillulae 100/100 Blastocladiella britannica 1 1 100/100 Coemansia reversa 99.9/96 100/100 Saitoella complicata 1 Blastocladiella britannica Catenaria anguillulae 1 B Ustilago maydis Saitoella complicata Schizosaccharomyces pombe Conidiobolus coronatus 100/100 1 100/100 1 1 Saitoella complicata 83.2/91 Coemansia reversa Phanerochaete chrysosporium 100/100 Schizosaccharomyces pombe 1 100/100 Ustilago maydis Saitoella complicata 1 Conidiobolus coronatus Phanerochaete chrysosporium Pygsuia biforma 1 Phanerochaete chrysosporium 100/100 Schizosaccharomyces pombe 100/100 100/100 100/100 1 Ustilago maydis 100/100 1 Conidiobolus coronatus Phanerochaete chrysosporium Coemansia reversa Thecamonas trahens ML Poisson+G+C60 1 100/100 57.3/71 Coemansia reversa Ustilago maydis 0.99 Rhizophagus irregularis Rhizophagus irregularis Sphaerothecum destruens SH-aLRT/UFBOOT 81.4/91 69/69 1 Rhizophagus irregularis Rhizophagus irregularis Mortierella verticillata 0.55 1 Mortierella verticillata 100/100 Corallochytrium limacisporum 0.3 Mortierella verticillata 99.9/100 Mortierella verticillata 34.7/67 87/86 1 Mucor circinelloides Lichtheimia hyalospora 0.87 Mucor circinelloides Mucor circinelloides Amoebidium parasiticum 57.8/85 100/100 100/100 100/100 1 Lichtheimia hyalospora Mucor circinelloides 1 Lichtheimia hyalospora Creolimax fragrantissima Lichtheimia hyalospora 100/100 Ministeria vibrans 100/100 Capsaspora owczarzaki 99.5/88 Amphimedon queenslandica 100/100 Nematostella vectensis 97.8/98 100/100 Xenopus tropicalis Monosiga brevicollis K L M N 100/100 Salpingoeca kvevrii 81/89 Didyomeca costata Thecamonas trahens Pygsuia biforma Thecamonas trahens Pygsuia biforma 100/100 1 Pygsuia biforma GBE49 100/100 GBE49 1 GBE36 100/100 GBE36 Parvvularia atlantis Thecamonas trahens Pygsuia biforma Thecamonas trahens 100/100 Corallochytrium limacisporum CAT+Poi Corallochytrium limacisporum LG+F+R8+C60 CAT+Poi LG+R7+C60 Fonticula alba 0.6 64.3/68 Corallochytrium limacisporum Corallochytrium limacisporum Sphaerothecum destruens Sphaerothecum destruens 0.4 0.3 0.3 Paraphelidium tribonemae 1 1 Creolimax fragrantissima 99.9/100 77.2/79 1 100/100 Creolimax fragrantissima Sphaerothecum destruens Sphaerothecum destruens Rozella allomycis Amoebidium parasiticum Amoebidium parasiticum 1 100/100 100/100 99.8/100 0.98 Ministeria vibrans 100/100 Creolimax fragrantissima Creolimax fragrantissima Mitosporidium daphniae 1 100/100 Capsaspora owczarzaki 1 1 100/100 Capsaspora owczarzaki Ministeria vibrans Amoebidium parasiticum Amoebidium parasiticum 100/100 Amphiamblys sp. WSBS2006 1 Didyomeca costata 99.4/90 Didyomeca costata Ministeria vibrans 67.9/77 Ministeria vibrans Nematocida parisii 1 Salpingoeca kvevrii 100/100 Salpingoeca kvevrii 1 100/100 100/ 100/100 1 100/100 Capsaspora owczarzaki Capsaspora owczarzaki Edhazardia aedis 1 Monosiga brevicollis 100/100 Monosiga brevicollis 100 23.8/56 Amphimedon queenslandica Amphimedon queenslandica Didyomeca costata Didyomeca costata 100/100 Pseudoloma neurophilia 1 100/100 1 100/97 1 Xenopus tropicalis Nematostella vectensis 1 100/100 100/100 Trachipleistophora hominis 1 100/100 Salpingoeca kvevrii Salpingoeca kvevrii 100/100 Nematostella vectensis 100/100 Xenopus tropicalis 1 100/100 Vavraia culicis Parvularia atlantis Fonticula alba 1 Monosiga brevicollis 100/100 Monosiga brevicollis 100/100 1 100/100 1 100/100 Anncaliia algerae Fonticula alba Parvularia atlantis Amphimedon queenslandica Amphimedon queenslandica 100/100 Rozella allomycis Rozella allomycis 1 100/100 100/100 Antonospora locustae Xenopus tropicalis Nematostella vectensis 1 Paramicrosporidium saccamoebae 100/ Mitosporidium daphniae 1 100/100 94.6/73 Enterocytozoon bieneusi 99.6/99 100/100 1 Mitosporidium daphniae 100 Paramicrosporidium saccamoebae Nematostella vectensis Xenopus tropicalis Vittaforma corneae 0.5 Amphiamblys sp. WSBS2006 100/100 Amphiamblys sp. WSBS2006 Parvularia atlantis 100/100 Fonticula alba Nosema ceranae 1 Nematocida parisii Nematocida parisii 1 100/100 1 100/100 Fonticula alba 100/100 Edhazardia aedis 100/ Edhazardia aedis Parvularia atlantis Ordospora colligata 0.79 100/100 1 Pseudoloma neurophilia 100 100/100 98.4/97 100/100 Pseudoloma neurophilia Rozella allomycis Rozella allomycis Encephalitozoon intestinalis 1 1 1 1 Vavraia culicis 100/100 Vavraia culicis Paramicrosporidium saccamoebae 100/100 Allomyces macrogynus Trachipleistophora hominis 1 100/100 Paramicrosporidium saccamoebae 100/100 1 100/100 Trachipleistophora hominis 100/100 GBE Mitosporidium daphniae Blastocladiella emersonii 1 Antonospora locustae Anncaliia algerae Mitosporidium daphniae 100/100 1 100/100 Catenaria anguillulae Anncaliia algerae 100/100 Antonospora locustae 1 Paraphelidium tribonemae Paraphelidium tribonemae 1 Vittaforma corneae 99.6/99 Enterocytozoon bieneusi Gonapodya prolifera 100/100 Gonapodya prolifera 1 Enterocytozoon bieneusi 100/100 Gonapodya prolifera 100/100 1 Vittaforma corneae 1 98.2/90 100/100 Spizellomyces punctatus 100/100 Spizellomyces punctatus Nosema ceranae Nosema ceranae 1 1 Batrachochytrium dendrobatidis 100/100 1 Ordospora colligata 100/100 Batrachochytrium dendrobatidis 100/100 Batrachochytrium dendrobatidis 1 100/100 Ordospora colligata 100/100 Spizellomyces punctatus Encephalitozoon intestinalis Encephalitozoon intestinalis Conidiobolus coronatus Paraphelidium tribonemae Allomyces macrogynus Allomyces macrogynus 99.9/100 Paraphelidium tribonemae 1 1 98/96 Coemansia reversa Gonapodya prolifera Catenaria anguillulae 100/100 Gonapodya prolifera 1 100/100 Blastocladiella britannica Saitoella complicata 1 Spizellomyces punctatus 100/100 Spizellomyces punctatus Blastocladiella britannica 100/100 100/100 0.5 1 Batrachochyrium dendrobatidis 100/100 Catenaria anguillulae Schizosaccharomyces pombe 100/100 Batrachochytrium dendrobatidis 100/100 Conidiobolus coronatus 100/100 Allomyces macrogynus Allomyces macrogynus 1 1 Coemansia reversa Phanerochaete chrysosporium 1 1 100/100 100/100 Catenaria anguillulae 100/100 Blastocladiella britannica Coemansia reversa 99.6/98 100/100 1 100/100 Conidiobolus coronatus Ustilago maydis Blastocladiella britannica Catenaria anguillulae Schizosaccharomyces pombe 99.8/100 Rhizophagus irregularis Conidiobolus coronatus Coemansia reversa 1 Ustilago maydis 1 1 Coemansia reversa 100/100 1 Saitoella complicata 100/100 100/100 98.9/98 Conidiobolus coronatus 1 100/100 Phanerochaete chrysosporium Mortierella verticillata 100/100 92.9/94 Schizosaccharomyces pombe Ustilago maydis Ustilago maydis Mucor circinelloides 1 Saitoella complicata 100/100 1 Saitoella complicata 100/100 1 1 Phanerochaete chrysosporium Phanerochaete chrysosporium 100/100 Lichtheimia hyalospora Ustilago maydis 100/100 100/100 Schizosaccharomyces pombe 1 Saitoella complicata 1 Phanerochaete chrysosporium 100/100 Rhizophagus irregularis 100/100 100/100 Schizosaccharomyces pombe 1 Rhizophagus irregularis 1 Rhizophagus irregularis 1 Rhizophagus irregularis Mortierella verticillata Mortierella verticillata Mortierella verticillata 1 Mortierella verticillata 1 100/100 Mucor circinelloides 97.1/94 76.2/81 Mucor circinelloides 70.9/81 Lichtheimia hyalospora 1 Lichtheimia hyalospora 1 Lichtheimia hyalospora 100/100 Lichtheimia hyalospora 100/100 Mucor circinelloides Mucor circinelloides

Figure S1. Phylogenomic trees showing the position of Paraphelidium tribonemae (related to Figure 1). (A) Unrooted Bayesian phylogenetic tree from the SCPD48 matrix. Consensus tree from 2 chains of PhyloBayes-MPI using CAT-Poisson mixture model of evolution. Statistical support is in the form of posterior probabilities. (B) Unrooted maximum likelihood (ML) phylogenetic tree estimated from the SCPD48 matrix. The best tree was reconstructed using IQ-TREE with the best tting model of evolution (indicated in each panel) + C60. Statistical support in ML trees represents SH-aLRT / 1000 ultrafast bootstrap replicates (in %). Equivalent Bayesian and ML trees were carried out with three dierent datasets (SCPD, Torruella et al 2015; BMC, Capella-Gutiérrez et al. 2012; GBE, Mikhailov et al. 2016) for two sets of species including or not long-branch microsporidians (49 species and 36 species, respectively): (C-D) SCPD49; (E-F) SCPD36; (G-H) BMC49 ; (I-J) BMC36; (K-L) GBE49 and (M-N) GBE36. A. CHS 7 hyaluronan synthases B. CDS 87.9/81 Bacteria

87.9/81 104 CHS Division I

EPZ34936.1 Rozella allomycis 100/100 EPZ36133.1 Rozella allomycis 74.2/93 EPZ35818.1 Rozella allomycis 98/100 EPZ33379.1 Rozella allomycis 40.7/95 EPZ32222.1 Rozella allomycis 99.8/100 EPZ30852.1 Rozella allomycis 94.3/100 100/100 EPZ30851.1 Rozella allomycis 4 diatomea Partr_v15_DN23247_c0_g1_i1_m35436 94.1/100 100/100 Partr_v15_DN26012_c0_g1_i2_m758 53.3/76 Division II Partr_v15_DN26012_c0_g1_i3_m759 3 Corallochytrium class V-like 35.4/69 83.9/100 Partr_v15_DN25951_c0_g2_i2_m68817 96/100 Partr_v15_DN25951_c0_g2_i3_m68818 KNE73184_Amac_17376 Allomyces macrogynus Partr_v15_DN25951_c0_g3_i1_m68819 99.4/97 100/100 77.1/100 68.5/97 KNE73173_Amac_17369 Allomyces macrogynus Partr_v15_DN25951_c0_g3_i3_m68822 KNE73196_Amac_17386 Allomyces macrogynus 83.2/99 Partr_v15_DN25951_c0_g3_i2_m68820 73.5/60 96.9/100 100/100 Partr_v15_DN28371_c0_g1_i2_m78609 Partr_v15_DN25951_c0_g3_i4_m68823 Partr_v15_DN28371_c0_g1_i3_m78611 82.7/94 XP_021877953.1 Lobosporangium transversale 99.9/99 KNE54726_Amac_00686 Allomyces macrogynus 100/100 XP_016603936.1 Spizellomyces punctatus KNE56376_Amac_02187 Allomyces macrogynus 85.1/76 33.6/94 KNC96965_Spun_SPPG_07786 Spizellomyces_punctatus EPZ34745.1 Rozella allomycis 55.8/97 EIE77284_Rory_11650 100/100 92.6/94 99.7/100 73.1/99 EIE84025_Rory_16690 Rhizopus oryzae 20.5/57 EIE86913_Rory_654 Rhizopus oryzae 84.1/79 79.7/90 XP_013239593 Mitosporidium daphniae XP_956333_Ncra_3881NCU04352 Neurospora crassa 86.5/92 94.6/98 38.1/96 95/98 20.1/78 81.8/97 80.8/97 97.3/100 XP_956331_Ncra_3879NCU04350 Neurospora crassa XP_021946650.1 Folsomia 99.9/100 KNE55428_Amac_01322 Allomyces macrogynus 90.6/97 KNE60082_Amac_05513 Allomyces macrogynus XP_008074727.1 Vavraia culicis 87.8/94 98.6/100 98.9/100 KNE68553_Amac_12722 Allomyces macrogynus 0/43 XP_013058202.1 Nematocida parisii 100/100 XP_018299568.1 Phycomyces blakesleeanus KNE70564_Amac_14686 Allomyces macrogynus 84.6/100 59.9/94 XP_018294993.1 Phycomyces blakesleeanus 99.7/100 KNE62523_Amac_07735 Allomyces macrogynus 98.4/100 73.5/67 KNE65789_Amac_09765 Allomyces macrogynus 99.8/100 XP_021883125.1 Lobosporangium transversale 98.6/100 97.4/100 71.3/91 99.7/100 KNE62507_Amac_07719 Allomyces macrogynus XP_021881283.1 Lobosporangium transversale 93.9/100 KNE65811_Amac_09783 Allomyces macrogynus 74.1/77 EPZ30799.1 Rozella allomycis 98.5/100 KNE62547_Amac_07755 Allomyces macrogynus 85.1/100 XP_013237698.1 Mitosporidium daphniae KNE65765_Amac_09745 Allomyces macrogynus 99/100 XP_013238414.1 Mitosporidium daphniae EIE92315_Rory_5687 Rhizopus oryzae 0/41 20.4/99 EPZ32702.1 Rozella allomycis EIE89976_Rory_8615 Rhizopus oryzae Class V/VII 98.4/100 87.8/83 Partr_v15_DN25396_c1_g1_i1_m21679 66/98 EIE85857_Rory_12406 Rhizopus oryzae 99.2/100 EIE90773_Rory_17367 Rhizopus oryzae 99.9/100 Partr_v15_DN25537_c0_g1_i1_m20618 92.1/92 97.3/100 83.1/93 Partr_v15_DN25537_c0_g1_i2_m20619 100/100 EIE86453_Rory_230 Rhizopus oryzae 82.1/94 EIE79737_Rory_6001 Rhizopus oryzae PIA19446_Crev_37221 Coemansia reversa 93.2/100 95.7/100 PIA19447_Crev_79355 Coemansia reversa 75.7/94 PIA15172_Crev_82160 Coemansia reversa SH-aLRT (%) / 1,000 ultrafast bootstrap (%) 83.2/97 KND03454_Spun_SPPG_00937 Spizellomyces_punctatus 93.2/97 68.8/52 20.5/85 KND01644_Spun_SPPG_03441 Spizellomyces_punctatus 0.5 82.3/93 63.9/96 KNC98434_Spun_SPPG_06138 Spizellomyces_punctatus KNC98261_Spun_SPPG_06659 Spizellomyces_punctatus Partr_v15_DN28871_c0_g1_i1_m33284 87.7/100 EPZ33986_Rall_3114 Rozella_allomycis 77.2/99 Partr_v15_DN28978_c0_g1_i1_m25194 0/12 KNC96835_Spun_SPPG_07669 Spizellomyces_punctatus 81.9/100 100/100 KNE73155_Amac_17354 Allomyces macrogynus 82.5/98 KNE73222_Amac_17409 Allomyces macrogynus 93.6/99 EIE80454_Rory_6641 Rhizopus oryzae 91.5/78 87.5/100 EIE85440_Rory_2233 Rhizopus oryzae EIE88321_Rory_9123 Rhizopus oryzae Fungi 93.7/97

42.3/95 PIA13574_Crev_94493 Coemansia reversa 0/9 99.4/100 PIA18245_Crev_39016 Coemansia reversa 100/100 95.9/100 PIA14231_Crev_47458 Coemansia reversa PIA14233_Crev_94078 Coemansia reversa KNC95895_Spun_SPPG_08656 Spizellomyces_punctatus 100/100 50.3/85 KNE62280_Amac_07514 Allomyces macrogynus KNE72520_Amac_16564 Allomyces macrogynus C. CTSI 100/100 78.4/100 KNE62318_Amac_07548 Allomyces macrogynus 41.4/70 KNE72821_Amac_16925 Allomyces macrogynus PIA19404_Crev_90709 Coemansia reversa KND03389_Spun_SPPG_00877 Spizellomyces_punctatus EPZ33276_Rall_3847 Rozella_allomycis OIR58135 Amphiamblys sp WSBS2006 95.1/100 98.2/99 23.9/96 OIR56309 Amphiamblys sp WSBS2006 OIR59110 Amphiamblys sp WSBS2006 Embryophyta KCZ75424 Anncaliia algerae 90.7/99 89.8/49 81.9/97 76.7/80 jgi|Antlo1|232|1126 Antonospora locustae XP_013239321.1 Mitosporidium daphniae 95/98 EJW04385 Edhazardia aedis 67.1/53 XP_013060203 Nematocida parisii 99.7/100 100/100 KRH94744 Pseudoloma neurophilia G A 90.7/95 ELQ75318 Trachipleistophora hominis 65/73 XP_002651382 Enterocytozoon bieneusi KKO76571 Nosema ceranae 88.2/97 85.6/72 85.7/100 XP_014564521 Ordospora colligata OIR58586.1 Amphiamblys sp WSBS2006 99/100 95.4/100 100/100 66.2/69 KPM06133.1 Sarcoptes scabiei XP_003072355 Encephalitozoon cuniculi 87.3/71 100/100 KNE70001_Amac_14841 Allomyces macrogynus KNE71056_Amac_15310 Allomyces macrogynus SH-aLRT (%) / 1,000 ultrafast bootstrap (%) 99.7/100 15.8/46 99.6/100 KNE63274_Amac_08416 Allomyces macrogynus Partr_v0_DN25308_c0_g1_i1_m21985 KNE72544_Amac_16987 Allomyces macrogynus 0.9 86.2/53 79.3/37 ODM92412.1 Orchesella cincta 92.4/99 Amac_15340 Allomyces macrogynus 73.8/100 ODM90353.1 Orchesella cincta 92.1/94 0/87 KNE70026_Amac_14861 Allomyces macrogynus 96.5/99 ODM99250.1 Orchesella cincta Amac_15341 Allomyces macrogynus Class IV ODM92610.1 Orchesella cincta ODN03228.1 Orchesella cincta 89.9/98 100/100 EIE78190_Rory_3307 Rhizopus oryzae 78.5/95 EIE87013_Rory_748 Rhizopus oryzae WP_041563109.1 Nocardia brasiliensis WP_052280453.1 Nocardia vulneris 86.2/94 99.9/100 28.4/26 Ecdysozoa 42/65 PIA14591_Crev_46483 Coemansia reversa 76.2/72 Ncra_8263NCU09324 Neurospora crassa 77.7/87 20.9/73 EPZ34250.1 Rozella allomycis 5/59 EIE78483_Rory_3578 Rhizopus oryzae 81.8/86 EIE78332_Rory_3441 Rhizopus oryzae EPZ35997.1 Rozella allomycis 77/82 90.5/63 81.9/63 94.2/100 99.6/100 EIE75721_Rory_10213 Rhizopus oryzae EIE81461_Rory_14289 Rhizopus oryzae Amac_06169 Allomyces macrogynus Harpetalles XP_013236426 Mitosporidium daphniae 83.3/55 81.3/58 KNC97233_Spun_SPPG_07620 Spizellomyces_punctatus G A 78/90 86.8/74 37.8/79 EPZ33633_Rall_3453 Rozella_allomycis Partr_v15_DN28730_c0_g1_i1_m62801 92.4/82 Ecdysozoa 10/59 82.8/93 100/100 EIE87635_Rory_13323 Rhizopus oryzae Ecdysozoa EIE79997_Rory_6234 Rhizopus oryzae 89/52 KND01488_Spun_SPPG_03289 Spizellomyces_punctatus 86.7/61 Microsporidia 98.3/100 Partr_v15_DN27712_c1_g1_i1_m67620 84.7/80 Partr_v15_DN28894_c0_g1_i3_m34349 44/67 KND01274_Spun_SPPG_03084 Spizellomyces_punctatus EPZ36206_Rall_926 Rozella_allomycis 30.5/36 KNC99172_Spun_SPPG_05426 Spizellomyces_punctatus 22.1/44 PIA16683_Crev_30126 Coemansia reversa 80.7/56 SH-aLRT (%) / 1,000 ultrafast bootstrap (%) 99.5/100 98.6/97 Partr_v15_DN25428_DN27941 0.4 D. FKS1 100/100 50.4/49 KNE54083.1 Allomyces macrogynus 87.3/63 97.9/100 EIE87764_Rory_13441 Rhizopus oryzae ORX89476.1 Basidiobolus meristosporus EIE88502_Rory_9284 Rhizopus oryzae 86.7/98 ORX87727.1 Basidiobolus meristosporus 100/100 ORX87725.1 Basidiobolus meristosporus 99.2/100 ORX81310.1 Basidiobolus meristosporus KXN64983.1 Conidiobolus coronatus 49.8/89 KXN71726.1 Conidiobolus coronatus 68.1/76 OMH85289.1 Zancudomyces culiseta 100/100 100/100 CDS06604.1 Lichtheimia ramosa 100/100 100/100 CDH56750.1 EPB92646.1 Mucor circinelloides E. GH3 55.4/98 SAM02600.1 Absidia glauca 74.7/75 CAA41693_PGA28I_ASPNG_P26213_14alphaDgalacturonan_glycanohydrolase_Aspergillus_niger_GH28 SH-aLRT (%) / 1,000 ultrafast bootstrap (%) 64.2/50 WP_074288912_gl ycosyl_hydrolase_Burkholderia 93.4/98 WP_075300158_gl ycosyl_hydrolase_Burkholderia 0.5 100/100 100/100 99.8/75 Bacteria Beta-glu cosidase

68.1/95 100/100

97.1/85 Bacteria GH_3 90.7/99

99.7/100 Dikarya ADB61039.1_glh_3_domain_p rotein_Halo terrigena_tu rkmenica 100/76

99.7/99 Bacteria Beta-glu cosidase

86.7/86 100/100 82.3/89 92.6/98 BAL46040.1_glucocerebrosidase_Cryptococcus_neoformans_var._grubii 71.7/92 Partr_v15_DN27943_c0_g1_i1_m11403 97.5/100 F. GH5 100/100 98.8/79 Bacteria Beta-glu cosidase 28.8/95 100/100 89.4/100 48/99 85.9/87 100/100 99.7/100 53/99 GAN07816.1_glycoside_hydrolase_family_5_protein_Mucor_ambiguus XP_004337787.1_gh_3_Acanthamoeba_castellanii BAJ64239.1_putative_glycosidase_Anaerolinea_thermophila_UNI-1 22.3/75 WP_068928920_h rotein_Planobispo ra_rosea 100/100 AAN51810.1_hypothetical_protein_LB_251_Leptospira_interrogans_serovar_Lai_str._56601 100/100 99.3/97 ABJ80534.1_Conserved_hypothetical_protein_Leptospira_borgpetersenii_serovar WP_031165647_gl ycosyl_hydrolase_Streptosporangium_roseum ADN02922.1_hypothetical_protein_STHERM_c19870_Spirochaeta_thermophila_DSM_6192 92.8/77 99.4/100 ACZ88873.1_ conserved_h rotein_Streptosporangium_roseum 98/99 ABZ97122.1_Conserved_hypothetical_protein_Leptospira_bi exa_serovar 85.9/100 100/100 ABS62456.1_conserved_hypothetical_protein_Parvibaculum_lavamentivorans_DS-1 WP_012892608_gl ycosyl_hydrolase_Streptosporangium_roseum 83.6/100 ACM31353.1_conserved_hypothetical_protein__plasmid__Agrobacterium_radiobacter_K84 10/41 Partr_v15_DN23181_c0_g3_i1_m43973_GH3 51.2/89 98.7/98 98.3/90 XP_001745556_ h rotein_Monosiga_b revicollis 99.2/100 XP_001743204_ h rotein_par Monosiga_b revicollis 100/100 91.5/100 92.5/84 XP_004999057_ h rotein_PTSG_00514_Salpingoeca_ r 31.9/44 92.8/99 100/100 Partr_v15_DN28324_c1_g1_i1_m78825 AAO41704_BGL3A_PIRSP_Q875K3_betaDglucoside_glucohydrolase_Piromyces_sp 10.9/56 Partr_v15_DN28324_c2_g1_i5_m78840 CAP58431_BGL3A_RHIMI_B0JE65_betaDglucoside_glucohydrolase_Rhizomucor_miehei 97.9/100 81.6/92 Partr_v15_DN28324_c2_g1_i2_m78830 CAE01320_BGL3A_UROFA_Q70KQ7_betaDglucoside_glucohydrolase_Uromyces_fabae Partr_v15_DN28324_c2_g1_i4_m78838 85.7/72 41.2/72 BAB85988_BGL3A_PHACH_Q8TGC6_betaDglucoside_glucohydrolase_Phanerochaete_chrysosporium 40.7/79 EPZ33623.1_Glycoside_hydrolase_Rozella_allomycis 95.8/99 16.8/65 91.9/95 Partr_v15_DN27291_c0_g1_i1_m38474 100/100 Partr_v15_DN27291_c0_g1_i2_m38475 42.2/70 BAE43955.1_membrane_beta-glucosidase_1_Physarum_polycephalum 94.7/66 86.3/98 100/100 BAF02538.1_membrane_beta-glucosidase_2_Physarum_polycephalum 95.5/86 68.8/56 0/83 74.3/44 100/100 68.5/85 BAF02537.1_membrane_beta-glucosidase_3_Physarum_polycephalum 99.1/100 98.7/99 GAN08468.1_glycosyl_hydrolase_Mucor_ambiguus 96.7/73

90.5/88 Fungi Beta-D-glucoside glucohydrolase 88.4/64

99/100 Dikarya GH5_12 WP_025226355_h rotein_Fimbriimonas_ ginsengisoli CBK63660.1_Beta-glu cosidase-related_gl ycosidases_ A 83.9/93 BAE61738.1_unnamed_protein_product_Aspergillus_oryzae 89.4/94 95.7/99 Bacteria Beta-glu cosidase

78.3/95 BAB02435.1_cytochrome_P450_Arabidopsis_thaliana 95.8/90 ADE81554.1_glucan_1,4-beta -glu cosidase_Prevotella_rumini cola BAC84995_GAL5F_TRIVI_Q76FP5__Trichoderma_viride_endo-1,6-beta-galactanase AAC49731_ZEX5A_ORPJO_P87211__Orpinomyces_joyonii 99.8/100 82.1/94 92.2/91 99.8/100 Amoebozoa GH_3 98.4/100 EPZ34200.1_Glycoside_hydrolase_Rozella_allomycis 89.5/100 73.6/97 100/100 ACL75496.1_gh_3_domain_p rotein_Clost ridium_ c 98.8/100 100/100 CAP69143.1_unnamed_protein_product_Podospora_anserina ADL69943.1_gh_3_domain_p rotein_Thermoanae robacterium_the rmosacchar XP_004344126.1_betaxylosidase_Acanthamoeba_castellanii 100/100 XP_004346982.2_beta- xylosidase_ Capsaspo ra_owczarzaki 99.9/99 70.8/99 Partr_v15_DN5092_c0_g1_i1_m31392 95.1/98 95.9/100 100/100 EAA64470_XYL3C_EMENI_Q5BAS1_4betaDxylan_xylohydrolase_Emericella_nidulans 87/100 Embryophyta GH5_11

97.7/98 ADV43554.1_gl ycoside_hydrolase_family_3_domain_p rotein_Bacteroides_hel cogenes 97.7/88 CBL11432.1_Beta-glu cosidase-related_gl ycosidases_ Roseburia_int CBW23395.1_puta ve_exported_hydrolase_Bacteroides_fragilis 82.4/95 82.3/44 AAO76887.1_periplasmic_beta -glu cosidase_p recursor_Bacteroides_thetaiotaomic ron

72.1/82 93.3/85 4-beta-D-glucan_4-glucanohydrolase GH5_5

91.3/76 Bacteria Beta-glu cosidase

75.2/67

98.7/100 Bacteria Beta-glu cosidase

91.6/95 100/100 85.5/42 4-beta-D-glucan_4-glucanohydrolase GH5_4

78.1/85

62.5/83 Bacteria N-acetylhexosaminidase

96.7/90 67/79 74.3/44 ACL70728.1_beta -glu cosidase_Halothe rmothrix_orenii 52.9/96 ADJ13805.1_beta -glu cosidase_Halal kalicoccus_jeotgali

Bacteria Beta-glu cosidase 95.2/98 72.2/96

99.7/100 100/100 Partr_v0_DN28061_c1_g1_i2_m56950_gh3_gh3c_fn3like_betaDglucoside_glucohydrolase 4-beta-D-glucan_4-glucanohydrolase GH5_4 98.4/99 90.5/100 XP_014149803.1_h rotein_Sphae roforma_arc 100/100 XP_014151984.1_h rotein_Sphae roforma_arc 95.4/95 SH-aLRT (%) / 1,000 ultrafast bootstrap (%)

0.6 90.5/99 Amoebozoa beta_glucosidase

ARN65235.1_Negative_regulator_of_ agellin_synthesis_FlgM_Vibrio_vulnicus

Amoebozoa Apusomonads Holozoa 60.2/39 Opisthosporidia Fungi Nucleariids 95.9/99 3-beta-D-glucan_glucohydrolase GH5_2 SH-aLRT (%) / 1,000 ultrafast bootstrap (%)

0.7

Figure S2. Maximum Likelihood phylogene�c trees for cell-wall related proteins (chi�n metabolism and cellulases) (related to Figure 2). (A) Chi�n synthases (CHS). (B) Chi�n deacetylase (CDA). (C) Chi�nase class I (CTS I) Glycoside hydrolase 19. (D) 1,»-beta-glucan synthase (FKS1). To build this tree, the two domains PF02364 and PF14288 encoded by separate transcripts in Paraphelidium have been concatenated. (E) Glycoside hydrolase family 3 (GH3). (F) Glycoside hydrolase family 5 (GH5). Topologies and sta�s�cal supports were obtained with IQTREE LG NEWTEST model of evolu�on. Node support (in %) represent 1,000 SH-like approximate likelihood ra�o test (SH-aLRT) and ultrafast bootstraps. correlation (positive only) A Nae_gr B pairwise species correlation C Energy production and conversion OGs across species Amoebozoa 140 3 . 0

Apusomonad Het_pa 60 Dic_pu Count Holozoa The_tr Aca_ca 0 Ent_hi Try_br 0.2 0.6 1 Mon_br Value 2 Nucleariids Tox_go . Lei_ma Sal_ro 0 Tox_go Tox_go Opisthosporidia Pla_fa Pla_fa Try_br Try_br Fungi Pla_fa Phy_in Lei_ma Lei_ma Ent_hi Sch_po Roz_al 1 Cap_ow Ich_ho . Other Mit_da Roz_al

0 Ich_ho Chr_pe Par_sa Mit_da Sph_ar Par_at Phy_in Fon_al Nae_gr Fon_al Nae_gr The_tr Amp_sp Cor_li Aca_ca Aca_ca 0 The_tr Het_pa . Phy_in Dic_pu 0 Par_at Het_pa Chr_pe Roz_al Dic_pu Cap_ow Nem_pa Mit_da Cor_li Sph_ar Chr_pe Cor_li Cap_ow Sal_ro 1 Par_sa .

Axis 2 (13 % variance) Ant_lo Sph_ar Mon_br 0 Ich_ho Par_at − Ent_bi Sal_ro Fon_al Mon_br Par_sa Enc_in Paraphelidium Spi_pu Cat_an Nos_ce Bat_de All_ma

2 Bat_de Mor_ve Bla_br

. Enc_cu Par_tr Bat_de 0 All_ma Gon_pr Spi_pu − Cat_an Cat_an Par_tr Sch_po Bla_br Mor_ve Bla_br Mor_ve Gon_pr All_ma Gon_pr Cop_ci Ust_ma Ust_ma 3 Spi_pu Cop_ci

. Cop_ci Ust_ma Ent_bi 0 Sch_po Amp_sp − Enc_in Enc_cu Enc_in Nos_ce Enc_cu −0.4 −0.2 0.0 0.2 Nem_pa Nem_pa Ent_bi Nos_ce Ant_lo Ant_lo Axis 1 (24.1% variance) Amp_sp Ent_hi Cor_li Par_tr Sal_ro Ent_hi Pla_fa Ant_lo Ent_bi Try_br The_tr Bla_br Ich_ho Par_at Mit_da All_ma Phy_in Fon_al Enc_in Roz_al Dic_pu Spi_pu Cop_ci Sph_ar Par_sa Bat_de Chr_pe Het_pa Cat_an Nae_gr Lei_ma Mon_br Mor_ve Aca_ca Tox_go Gon_pr Enc_cu Nos_ce Sch_po Ust_ma Cap_ow Amp_sp Nem_pa D Amino acid transport and metabolism OGs across species E Nucleotide transport and metabolism OGs across species F Lipid transport and metabolism OGs across species

The_tr Sal_ro Cor_li Aca_ca Mon_br Cap_ow Par_at Phy_in Phy_in Fon_al The_tr Sph_ar Het_pa Ich_ho Chr_pe Dic_pu Cor_li Ich_ho Nae_gr Sph_ar Sal_ro Roz_al Chr_pe Mon_br Mit_da Cap_ow The_tr Par_sa Par_at Aca_ca Try_br Fon_al Het_pa Lei_ma Tox_go Dic_pu Ent_hi Pla_fa Nae_gr Tox_go Roz_al Try_br Pla_fa Mit_da Lei_ma Nem_pa Par_sa Cat_an Ant_lo Try_br All_ma Enc_in Lei_ma Bla_br Enc_cu Nae_gr Par_tr Nos_ce Ent_hi Spi_pu Amp_sp Het_pa Bat_de Ent_bi Dic_pu Mor_ve Sph_ar Aca_ca Ust_ma Cor_li Spi_pu Cop_ci Ich_ho Gon_pr Par_at Phy_in Par_tr Gon_pr Chr_pe Mor_ve Fon_al Cap_ow Bat_de Sch_po Sal_ro Cat_an Roz_al Mon_br Bla_br Mit_da Cat_an All_ma Par_sa Bla_br Ust_ma Tox_go All_ma Cop_ci Pla_fa Mor_ve Sch_po Ant_lo Bat_de Enc_in Amp_sp Par_tr Enc_cu Nem_pa Gon_pr Amp_sp Ent_bi Spi_pu Nos_ce Nos_ce Ust_ma Nem_pa Enc_in Cop_ci Ent_bi Enc_cu Sch_po Ant_lo Ent_hi G H

I J

Figure S3. Comparative analyses of metabolic gene content (related to Figure 3A-B). (A) Principal coordinate analysis (PCoA) of 41 eukaryotic species according to their pro le of presence/absence of 1172 ortholog groups (OGs) related to 8 primary metabolic functions (COG categories C,E, F, G, H, I, P and Q); simpli ed version in Figure 3A. (B) Clustering of 41 eukaryotic species according to their pro le of presence/absence of the same primary metabolism 1172 OGs. Color code is scaled to display positive Pearson’s r (0 to 1). (C-F) Binary heat maps of orthologs (OGs) in 41 eukaryotic genomes/transcriptomes. (C) Energy production and conversion (167 OGs, category C). (D) Amino acid transport and metabolism (160 OGs, category E). (E) Nucleotide transport and metabolism (73 OGs, category F). (F) Lipid transport and metabolism (166 OGs, category I). (G) KEGG metabolic pathways map01100. (H) KEGG Oxidative phosphorylation map00190. (I) KEGG Amino acid synthesis map01230. (J) KEGG Steroid biosynthesis map00100. KEGG maps compare presence/absence of the corresponding proteins in Paraphelidium tribonemae (green) and Rozella allomycis (pink); both (blue). Full names for species acronyms are provided in supplementary text. correlation 90.9/91 Spar_SPRG_03401T0 8.3/82 Spar_SPRG_19487T0 89.7/91 Spar_SPRG_03402T0 Myosin 24.9/52 Spar_SPRG_03400T0 pairwise species correlation Spar_SPRG_03391T0 XXVII 100/100 92.2/100 Pinf_XP_002894917 (positive only) 97.1/100 Pinf_XP_002909135 99.9/100 Phso_360938Ps Pyul_PYU1_T004123 45/20 96.8/100 Pinf_XP_002905000 100/100 Phso_479774Ps 98.9/100 Pyul_PYU1_T010653 A B F Spar_SPRG_05263T0 Amoebozoa 92.6/94 98.8/100 Pinf_XP_002908680 100/100 Pyul_PYU1_T012799 48.5/20 Spar_SPRG_02722T0 89.5/100 Pinf_XP_002907688 99/100 Phso_323276Ps 99.8/100 Pyul_PYU1_T004400

60 Spar_SPRG_08351T0 Apusomonad 100/100

Count Spar_SPRG_20600T0 87.5/7395.7/100 100/100 Pinf_XP_002900773 The_tr 99.8/100 Phso_258484Ps 100/100 Pyul_PYU1_T010236 Spar_SPRG_01581T0 Pinf_XP_002896847 66.1/27 99/100 99.2/100 Phso_534588Ps Fungi 0 100/100 Pyul_PYU1_T002103 Spar_SPRG_05311T0 99.9/100 Esil_CBJ28361 99.6/100 Esil_CBN77168 Esil_CBJ27444 Esil_CBN73968 Aca_ca 100/100 0/12 Esil_CBN73971 0.2 0.6 1 0/24 Ngad_31040 97.4/98 Ptri_XP_002179003 Opisthosporidia 92.2/100 100/100 Thps_Thaps36563__Tpse_XP_002296378 Ngad_01953 Tgon_XP_002370590 Value 94.2/74 94.5/100 86.1/100 Pfal_Q8IE50_Q8IE50 Het_pa Pmar_XP_002772878 Phy_in 100/100 98.7/100 Pmar_XP_002788507 2 99.9/100 Pmar_XP_002771112 . Holozoa Pmar_XP_002772278 Nae_gr 89.9/51 Thps_Thaps12645__Tpse_XP_002294196 65.3/100 39.9/99 Thps_Thaps38230__Tpse_XP_002297047 56.4/82 Ptri_XP_002184750 0 Thps_Thaps37376__Tpse_XP_002293770 76.4/67 97.9/100 Nae_gr Dic_pu The_tr Ptri_XP_002181360 73.5/90 Thps_Thaps12637__Tpse_XP_002286167 99.9/100 98.4/100 Thps_Thaps33225__Tpse_XP_002289050 Nucleariid Het_pa 8.3/60 Thps_Thaps12602__Tpse_XP_002288954 100/100 Ptri_XP_002181018 Thps_Thaps37960__Tpse_XP_002294433 Gthe_44931 Phy_in 100/100 Dic_pu 64.9/98 Gthe_48710 100/100 Gthe_158776 Cryptophyta 89.3/98 Gthe_72379 79.1/19 Gthe_165588 orphan Aca_ca Gthe_158791 Other Try_br 95.8/76 Ent_hi Gthe_57369 98.1/27 75.1/75 Gthe_82670 Gthe_68130 Cap_ow Cor_li 69.6/19 100/100 Spar_SPRG_08570T0 Spar_SPRG_17979T0 82.6/100 Ptri_XP_002176359 Mon_br Chr_pe 98.8/100 Ptri_XP_002176413 Myosin Ptri_XP_002179793 100/100 84.6/100 Ptri_XP_002186480 XXI 97.8/100 Thps_Thaps38363__Tpse_XP_002297174 Lei_ma Sph_ar 77.6/67 Thps_Thaps31803__Tpse_XP_002287576 88.5/489.42/69 Esil_CBN79305 Ngad_02898

1 Esil_CBJ26663 Ent_bi 83.3/66 Ich_ho Pinf_XP_002899967 Sal_ro 92.9/100 . 99.9/100 Phso_345966Ps 100/100 Pyul_PYU1_T013349 Spar_SPRG_21053T0 Par_at 81.8/65 Pyul_PYU1_T008784 0 0/96 Pyul_PYU1_T008785 84.6/72 84.5/100 Pla_fa 75.9/100 Pyul_PYU1_T008786 Pinf_XP_002907993 Fon_al 98.3/100 100/100 Phso_346079Ps

iance) Ant_lo Spar_SPRG_20560T0 Tox_go 94.9/100 100/100 Spar_SPRG_20562T0 Cap_ow Spar_SPRG_20561T0 Pinf_XP_002908185 Nos_ce 96.2/100 94.9/100 Phso_359958Ps Pyul_PYU1_T012064

a r Chr_pe 26.3/67 Sal_ro 82/62 Pinf_XP_002908251 99.1/100 100/100 Phso_492922Phs 99.2/100 Pyul_PYU1_T012096 100/100 Spar_SPRG_14408T0 Mon_br Spar_SPRG_20988T0 Enc_in 25.4/9259.3/95 Spar_SPRG_02517T0 100/100 Pinf_XP_002998411 100/100 Phso_338448Ps Try_br 67.8/96 Par_at Pyul_PYU1_T012592 Pinf_XP_002906363 Enc_cu 83.5/64 82.5/100 Nem_pa 100/100 Phso_468885Ps 100/100 Pyul_PYU1_T000659 Lei_ma 97.3/96 Spar_SPRG_05392T0 64.9/100 Pinf_XP_002896433 98.4/100 Phso_318067Ps Fon_al Ent_hi 100/100 Pyul_PYU1_T006463

0 97.7/98 Spar_SPRG_02231T0 99.8/100 Pinf_XP_002904187 . 18.5/96 Phso_565410Ps Sph_ar Tox_go Spar_SPRG_22058T0 93.9/91 75.3/97 0/61 Spar_SPRG_22057T0

0 Pyul_PYU1_T011542 Esil_CBJ26119 Pla_fa 71.6/71 98.4/75 Esil_CBJ31129 Emhu_EOD14709__Ehux_461567 Ich_ho Patl_Unigene5378_16482 Arth_19668774At__Atha_AT5G43900 Spi_pu 0/76 Amp_sp Arth_19668776At__Atha_AT5G43900 92.1/100 Myosin 99.9/100 Arth_19668775At__Atha_AT5G43900 Cor_li 82/69 Arth_19658216At__Atha_AT1G04160 XI Bat_de 62.1/90 Arth_19648960At__Atha_AT4G28710 0/37 Arth_19641901At__Atha_AT2G20290 Sbic_04g037210 84.7/45 100/100 Sbic_01g000330 Gon_pr 46.2/59 Sbic_02g010040 91.1/99 97/100 98/100 Arth_19640394At__Atha_AT2G33240 Arth_19651579At__Atha_AT1G04600 Acoe_005_00115 Par_tr 82.906/.19/0065 Acoe_034_00206 64.8/100 Sbic_01g008180 99.5/100 Arth_19642330At__Atha_AT2G31900 Bla_br Acoe_004_00667 100/100 Arth_19649567At__Atha_AT1G54560 37/11 Arth_19653305At__Atha_AT1G08730 Acoe_017_00666 99.6/19020/69 96.2/99 Arth_19660616At__Atha_AT3G58160 Axis 2 (13.5 % v All_ma Acoe_034_00422 1 9141.1/.99/911 Sbic_10g008210

. Acoe_076_00141 Cat_an 86.1/88 Sbic_03g032770 99.8/100 91.3/99 Sbic_09g026840 35/93 Sbic_03g026110 0 9715.3.8//18070 100/100 Arth_19670543At__Atha_AT5G20490 Roz_al Arth_19670544At__Atha_AT5G20490 Paraphelidium Arth_19654032At__Atha_AT1G17580

− Ppat_XP_001764154 59.8/770/96 Ust_ma 99.7/100 Ppat_XP_001770954 94.6/97 Ppat_XP_001770957 98.5/100 Smoe_XP_002966591 Mit_da Smoe_XP_002982428 95.1/100 Smoe_XP_002975092 Cop_ci 99.9/100 Smoe_XP_002988982 40.6/93 Acoe_020_00543 83.2/68 69.6/94 Sbic_04g034830 Mor_ve 100/94 Sbic_01g023150 100/100 Arth_19646160At__Atha_AT4G33200 Roz_al Arth_19646161At__Atha_AT4G33200 Otau_XP_003079970 Sch_po Crei_XP_001693409 84.2/39 78.7/100 Par_sa Gon_pr 100/100 Crei_XP_001692310 46.3/99 Vcar_XP_002947426 Crei_XP_001697846 Spi_pu Par_sa 98.7/98 77.3/99 99.9/100 Vcar_XP_002954496 8.2/55 Chva_34523Cv__Cvar_EFN58364 Patl_Unigene6096_18679 Amp_sp Chva_25512Cv__Cvar_EFN53725 95.4/100 Acas_g1473 80.1/96 Acas_g6612 Myosin 96.5/94 2 Bat_de 85.3/73 Acas_g13946 Mit_da Didi_DDB0185050__Ddis_XP_645195 99.7/100 XI

. Ust_ma 71.9/65 Ppal_EFA74941_1 90.3/100 Acas_g14730 76.8/87 96.8/98 Acas_g16003 Acas_g15556 0 Enc_in 92/99 100/100 Ppal_EFA79637_1 Cop_ci Mor_ve Didi_DDB0233685__Ddis_XP_001733030 Acas_g16115

− Enc_cu 100/100 Rhde_RO3G_04477T0__Rory_6035 91/100 Rhde_RO3G_14882T0__Rory_15113 15/95 Pbla_Phybl2_136641 Myosin Cat_an 100/100 Rhde_RO3G_00315T0__Rory_10110 Nos_ce 90.3/100 Sch_po All_ma Rhde_RO3G_08439T0__Rory_16422 V Pbla_Phybl2_137096 78.9/9939.8/100 Rhde_RO3G_14960T0__Rory_15185 Bla_br Nem_pa 70.3/99 Rhde_RO3G_04533T0__Rory_6086 72.7/100 Pbla_Phybl2_154650 94.89/120.02/100 Pbla_Phybl2_185687 Rhde_RO3G_05700T0__Rory_13880 Ent_bi Mver_00810 100/100 90.7/67 86.7/7481.3/99 Mver_02983 Rhir_1651Ri Ccin_CC1G_03780 Ant_lo 28.2/75 77.3/86 Usma_71020943__Umay_XP_760702 Cneo_XP_570120 0/16 Aory_CADAORAT00004113 94.6/16 Tmel_XP_002838719 99.4/99 92.3/45 Asfu_104351Af 98.7/95 Neca_8063Nc Tmel_XP_002838720 97.3/86 96.3/99 Sace_sace_5989Sc 94.5/76 Sace_sace_53Sc 98.2/99 Spom_NP_588492_1 17.7/42 Spom_NP_596233_1 −0.4 −0.3 −0.2 −0.1 0.0 0.1 0.2 Cor_li 88.3/99 Par_tr Gpro_145765 Sal_ro Pla_fa Ent_hi Ant_lo Ent_bi 100/100 Try_br The_tr Bla_br Ich_ho Par_at

Mit_da Gpro_157325 All_ma Phy_in Fon_al Enc_in 99.7/100 Dic_pu Cop_ci Roz_al Spi_pu Sph_ar Par_sa Bat_de Chr_pe Het_pa Cat_an Lei_ma Nae_gr Gpro_107502 Mon_br Mor_ve Aca_ca Tox_go Enc_cu Gon_pr 96.8/100 Nos_ce Sch_po Ust_ma

Cap_ow Spun_SPPG_01285

Amp_sp 95.8/100 Nem_pa Bade_13697Bad__Bden_Batde5_13697 85.6/99 88.4/100 Alma_AMAG_04247T0__Amac_04247 99.8/100 Alma_AMAG_04247T1__Amac_04247 100/100 Alma_AMAG_01753T0__Amac_04247 94.9/99 99.8/100 Alma_AMAG_12690T0__Amac_12690 Axis 1 (27.6 % variance) Alma_AMAG_14653T0 Partr_v1_DN28973_c0_g1_i3_m24892 93.8/49 Roal_1662Ra 41.3/97 Patl_Unigene47_204 4/91 Patl_Unigene6736_20668 92.6/86 Falb_H696_04966T0 Patl_Unigene5492_16811 58.2/99 Hsap_ENSP00000285039 97.2/100 Hsap_ENSP00000350945 91.6/86 Hsap_ENSP00000261839 62.9/86 Dpul_308269 93.6/94 Dmel_FBpp0088684 94.8/47 99.6/100 Ocar_g2311_t1 81/8459 .2/91 Ocar_g1152_t1 Ctel_225225 27.8/8487.7/95 Nvec_XP_001633882 Mlei_ML00121a 90.6/60 99.9/100 Mbre_XP_001743646 Sros_PTSG_01196 0/26 38.5/6803 .9/59 Cele_F36D4_3b Aque_Nk52_evm70s215_WARNING_NOHOMOLOG_rspp_3 33.3/387.59/27 Mvib_comp3686_c0_seq1_m1727 Clim_evm72s109 94.3/99 C Cfra_5427T1 98.7/97 Nk52_evm70s215 53.1/70 Cowc_CAOG_05212 Clim_evm54s157 43.8/11 92.1/71 89.5/93 Nk52_evm18s485 Nk52_evm35s1569 97.7/77 Patl_Unigene3786_11549 Hsap_ENSP00000409936 72.6/100 Myosin 90.1/100 Dpul_307123 84.9/100 Nvec_XP_001624060 92/100 Ctel_222718 XI 96/78 Cfra_6244T1 Nk52_evm21s156 99.4/100 Arth_19666626At__Atha_AT5G54280 100/100 Arth_19666627At__Atha_AT5G54280 Myosin 88.8/78 Arth_19643859At__Atha_AT4G27370 91.8/100 Acoe_063_00017 VIII 45.1/45 Acoe_063_00023 61.3/47 Acoe_008_00012 80/59 Sbic_02g036390 100/100/67 Arth_19658869At__Atha_AT3G19960 97.8/98 Arth_19653625At__Atha_AT1G50360 93.2/100 Arth_19658870At 13/93 Arth_19657716At__Atha_AT1G50360 31.6/97 97.4/100 Sbic_01g018770 Acoe_035_00049 99.6/100 Smoe_XP_002975651 97.4/100 Smoe_XP_002978834 99.7/100 100/100 Smoe_XP_002972734 Smoe_XP_002993349 72.8/95 Ppat_XP_001755337 95.9/95 Ppat_XP_001777862 100/100 91.5/99 Ppat_XP_001774306 97.7/100 Ppat_XP_001768683 Ppat_XP_001775964 99.9/100 Chva_26915Cv__Cvar_EFN52211 97.7/100 Chva_55866Cv 100/100 Crei_XP_001696752 Vcar_XP_002956099 100/100 Emhu_EOD08320 93.8/99 Emhu_EOD12524 Myosin 99.9/100 Emhu_EOD19243 66.9/98 Emhu_EOD31403 XXXII 94/100 100/100 Emhu_EOD19444 Emhu_EOD23846 Emhu_EOD27394 63.4/84 0/50 Emhu_EOD39780__Ehux_108791 100/100 Emhu_EOD15565__Ehux_108791 99.7/100 Emhu_EOD17958 100/100 Emhu_EOD16190__Ehux_76086 Emhu_EOD38059__Ehux_200475 0/93 Emhu_EOD12602__Ehux_436181 75.9/100 Emhu_EOD12642__Ehux_436181 97.6/82 94.5/100 Emhu_EOD18854__Ehux_436181 100/100 Emhu_EOD19528__Ehux_415936 100/100 Emhu_EOD26082__Ehux_115214 84.8/100 Emhu_EOD36429__Ehux_110275 98.5/100 Emhu_EOD27093__Ehux_450146 69/87 100/100 Emhu_EOD30167__Ehux_450146 Emhu_EOD24030__Ehux_450657 90.9/80 Emhu_EOD42146__Ehux_194888 93.6/96 Emhu_EOD42145__Ehux_194887 Bsal_BS07780 97/100 Pinf_XP_002903656 98.3/100 Phso_362097Ps Myosin 99.9/100 Pyul_PYU1_T009132 Spar_SPRG_20866T0 XXX 58.7/100 99.9/100 Pinf_XP_002908139 96.4/100 Phso_559706Ps 99.9/100 Pyul_PYU1_T013495 88/100 Spar_SPRG_11634T0 65.2/13 98.1/100 Esil_CBJ27150 36.3/87 Esil_CBN74765 100/100 Esil_CBJ30507 30.6/80 Esil_CBN74726 96/100 Esil_CBN79332 95/100 Pinf_XP_002908958 99.9/100 Phso_334143Ps 99.1/100 65.1/99 Pyul_PYU1_T004004 100/100 Spar_SPRG_00446T0 99.4/100 Spar_SPRG_06261T0 Spar_SPRG_06260T0 Emhu_EOD06811__Ehux_106740 41/100 Pinf_XP_002897677 90.1/100 Phso_443504Ps Myosin 92.5/100 Pinf_XP_002897358 94.4/100 Pinf_XP_002895407 98.4/100 Pinf_XP_002899382 XXXI/XXII 95.7/81 99.1/100 Phso_437327Ps 98.4/100 Pyul_PYU1_T010842 (unclear) 31.3/63 Spar_SPRG_19490T0 69.7/71 Pmar_XP_002765635 78.9/72 Tgon_XP_002367823 88.5/100 Pmar_XP_002765636 60.1/94 Pfal_Q8IHW0_Q8IHW0 100/100 Emhu_EOD14449__Ehux_210818 12.4/40 Emhu_EOD16408__Ehux_210818 99.8/100 Pmar_XP_002765212 0/56 97.7/100 Pmar_XP_002776935 Tgon_XP_002364981 Gvul_c12237_g1_i1_m.10012 100/100 Cele_F47G6_4 16.6/68 Cele_Y66H1A_6b Myosin 44.2/98 Mlei_ML12625a 65.2/100 Ctel_154601 93.2/100 Dpul_319997 VI 47.9/99 Hsap_ENSP00000412500 Aque_XP_003386276 96.6/8180.09/100 Ocar_g10472_t1 48.2/22 91.1/100 Nvec_XP_001628100 100/100 Mbre_XP_001742947 94.5/100 Sros_PTSG_02918 99.7/100 98.1/100 Cowc_CAOG_04414 Mvib_comp12907_c1_seq1_m21039 94.5/100 Patl_Unigene1224_3902 Ttra_AMSG_00252 53.9/100 Nspp_Mvib_comp13670_c1_seq1_m24493_WARNING_CONTAMINANT_Nspp9111_fr2 0/89 Falb_H696_04004T0 91.5/100 31.9/99 Patl_Unigene8793_26908 22.5/22 34.5/62 Mvib_comp13670_c1_seq1_m24493 96/99 Patl_Unigene7432_22865 70.5/100 42/62 Cowc_CAOG_03022 98.6/100 Mvib_comp15617_c0_seq1_m36937 91.7/100 Mvib_comp10197_c2_seq1_m11662 56.8/19 Cpar_29728Contig7421 83.4/99 Emhu_EOD09565__Ehux_448407 Emhu_EOD15761 100/100 Haptophyta Emhu_EOD40187 Emhu_EOD13884__Ehux_103727 100/100 orphan Emhu_EOD35775__Ehux_103727 50.6/41 62.4/99 100/100 Emhu_EOD03955__Ehux_210538 92.9/98 Emhu_EOD16234__Ehux_210538 91.3/98 Emhu_EOD40370__Ehux_51153 49/98 Emhu_EOD39709__Ehux_62642 Emhu_EOD17106__Ehux_462187 89.3/49 100/100 Emhu_EOD39603__Ehux_462187 Emhu_EOD41995__Ehux_447539 50.9/85 Patl_Unigene5331_16354 98.4/76 5.5/88 Gthe_78981 63.8/30 Patl_Unigene10547_32262 Nspp_Sros_PTSG_11202_WARNING_DISTANT_NOHOMOLOG_Nspp9812_fr6 35.8/98 Bnat_21545 99.8/100 Bnat_45663 Rhizaria 51.3/87 Bnat_91633 0/83 Bnat_132297 orphan 95.8/96 Rfil_3764830_1s1f1 94.9/99 Rfil_3802156_2s Bnat_28683 Rfil_3729363_1s1f1 97.4/99 90.2/100 97.9/100 Rfil_3720345_1s 97.4/100 Rfil_24918_1c Rfil_3723408_1s4r1 46.1/72 92/99 0/96 Rfil_16933_5cr1 88.5/99 Rfil_35285_1c Rfil_27726_3c Clim_evm18s243 Dmel_FBpp0082685 78.5/94 Myosin 46.1/93 Dpul_64163 56.8/98 Ctel_191825 II 99.6/100 Hsap_ENSP00000346291 44.2/93 Hsap_ENSP00000386096 96.6/100 Nvec_Smar_009217_WARNING_NOHOMOLOG_nvp_21 Ocar_g6279_t1 100/100 95.9/100 Aque_XP_003387144 66.3/97 Mlei_ML02153a 99.6/100 Phso_561276Ps 100/100 Pyul_PYU1_T014382 Spar_SPRG_08074T0 95/89 98.8/100 Dpul_40739 90.7/100 Dmel_FBpp0089186 34.4/93 Ctel_96159 100/100 Nvec_XP_001634718 73/90 Aque_XP_003384558 Ocar_g2767_t1 91.9/100 Cele_F11C3_3_1 100/100 Cele_R06C7_10_1 81.7/76 Cele_T18D3_4 24.7/74 Cele_F58G4_1 100/100 Cele_F45G2_2a D 25.6/97 Cele_K12F2_1 98.5/100 Dpul_192727 96.7/63 88.7/97 Dmel_FBpp0080452 42.5/10072.9/100 Hsap_ENSP00000245503 Ctel_118513 89.4/59 Nvec_XP_001636958 48.1/58 98/100 Mlei_ML02275a Mlei_ML022011a 96.7/100 Aque_XP_003382837 79/99 Ocar_g1608_t1 83.3/100 Mvib_comp15821_c0_seq2_m39426 Nk52_evm16s62 Hsap_ENSP00000352834 Dpul_306188 96.4/100 94.8/100 Dmel_FBpp0072306 Cele_F52B10_1 99.9/100 99.8/100 86.6/96 Cele_F20G4_3 96.5/95 98.5/100 Hsap_ENSP00000379613 Hsap_ENSP00000444076 78.5/100 Hsap_ENSP00000353590 53.9/97 G 75.7/97 Myosin Skow_XP_002742413 91.5/876.58/98 Aque_XP_003382828 Mlei_ML305521a 11.2/5 Ocar_g8625_t1 Ctel_120713 Cfra_6331T1 49.8/5 99.7/100 Cowc_CAOG_03939 91.8/100 67/5 Mbre_XP_001742581 100/100 Sros_PTSG_04749 89.1/100 Patl_Unigene1265_4021 Dpul_306979 35.7/89 Patl_Unigene1265_4021_bis I c/h 0/76 Falb_H696_05665T0 8.8/96 Patl_Unigene1745_5371 99.5/100 88/8 97.3/100 Patl_Unigene5818_17774 65/95 Patl_Unigene9327_28473 14.1/6 Ttra_AMSG_04291 Dmel_FBpp0072567 12.6/91 100/100 Ttra_AMSG_09069 Ttra_AMSG_08437 66.5/6 Cele_Y11D7A_14 Gvul_c9624_g2_i1_m.5694 Spun_SPPG_07224 Nvec_Cgig_10014042_WARNING_NOHOMOLOG_nvp_12 43.6/94 91.3/96 Gpro_46743 93.3/91 99.9/100 Alma_AMAG_17973T0 89.3/20 Alma_AMAG_20747T0 91.2/91 Roal_5807Ra Aque_XP_003389506 8883.2/.29/480 Partr_v1_DN28942_c0_g1_i1_m25835 68.3/93 Mver_07901 Rhir_337673Ri 95.3/99 68.9/5 Ccin_CC1G_09915 98.1/100 94.8/100 Cneo_XP_569902 Ocar_g8308_t1 89.7/79 Usma_71018405__Umay_XP_759433 79.6/11 100/100 Mdap_gi_692166808_gb_KGG50005.1 Mdap_gi_914959489_ref_XP_013236441.1 Pbla_Phybl2_21117 100/100 36.4/95 39.7/95 Rhde_RO3G_12113T0__Rory_13107 96/100 99.9/100 Pbla_Phybl2_71839 Mbre_XP_001745332 90.4/12 Rhde_RO3G_00933T0__Rory_10685 84.9/100 Aory_CADAORAT00006906 99.6/100 98.6/100 Asfu_104657Af 93.4/12 98.7/100 Neca_2185Nc 93.5/18 Tmel_XP_002835249 Sros_PTSG_04553 Spom_NP_588114_1 16/18 67.8/97 99.3/100 Spom_NP_593816_1 Sace_sace_2968Sc 97.7/100 Ecun_NP_584783 Npar_EIJ87373 Mbre_XP_001744998 Ppal_EFA84750_1 32.9/87 95.8/94 99.9/100 81/100 Didi_DDB0191444 2.5/76 Acas_g7928 100/100 Ngru_XP_002671980 Ngru_XP_002680829 Mvib_comp14411_c0_seq1_m28339 Patl_Unigene8792_26902 65.4/86 Usma_71020057__Umay_XP_760259 24.7/84 Patl_Unigene2558_7895 81.5/86 94.6/87 Ccin_CC1G_13719 Cneo_XP_568003 Myosin Patl_Unigene1450_4564 91.8/87 Pbla_Phybl2_106479 99.8/100 86.3/100 Rhde_RO3G_15776T0__Rory_9588 If 41.5/63 90/8965.7/100 Rhir_348439Ri Mver_02573 65.4/65 Spom_NP_595402_1 Patl_Unigene1476_4627 87.8/65 Tmel_XP_002840103 Sace_sace_3700Sc 51.8/73 100/100 99.4/87 Sace_sace_4787Sc 100/100 Aory_CADAORAT00002556 85.3/86 Asfu_107121Af Falb_H696_02314T0 99.1/92 Neca_319Nc 37.5/76 Alma_AMAG_11379T0__Amac_13842 100/100 22.7/99 Alma_AMAG_13842T0__Amac_13842 98.9/99 100/100 Partr_v1_DN27688_c0_g1_i1_m65219 Partr_v1_DN27688_c0_g1_i2_m65222 Cowc_CAOG_04501 Patl_Unigene1228_3914 60.7/84 92.8/84 Patl_Unigene3145_9641 99.6/10690.4/99 Nk52_evm13s167 Patl_Unigene1474_4621 98.5/92 Ppal_EFA78349_1 32.2/96 100/100 Partr_v1_DN27789_c1_g1_i1_m66972 97.3/99 Didi_DDB0215355__Ddis_XP_643060 Acas_g5382 Ngru_XP_002669917 100/100 90/81 Pinf_XP_002906848 94.9/100 97.1/100 Phso_537381Phs 99/100 Pyul_PYU1_T007191 Partr_v1_DN27789_c1_g1_i3_m66975 18.8/87 Spar_SPRG_01615T0 97.9/98 Mvib_comp8208_c2_seq1_m7232 90.2/78 95.3/90 100/100 Lmaj_XP_001686193 Bsal_BS22830 Ppal_EFA78234_1 Spun_SPPG_08031 99.6/100 99.9/100 Didi_DDB0191351__Ddis_XP_636382 Acas_g13932 91.6/100 Hsap_MYO1F_HUMAN_unconventional_WARNING 73.5/95 Hsap_ENSP00000288235 37.8/83 Mlei_ML015752a Hsap_ENSP00000306382 Aque_XP_003386902 82.7/86 80.1/92 Ctel_95339 100/100 Dpul_309941_bis 93.3/100 40.3/56 Cele_F29D10_4 99.3/67 71.2/57 Dpul_309941 Hsap_ENSP00000300119 Myosin 64/16400.3/84 Ocar_g5747_t1 92.3/83 Nvec_8181v105106_WARNING_NOHOMOLOG_nvp_18 Mbre_XP_001743180 92/90 91.8/96100/100 Sros_PTSG_04512 89.2/8851 .9/96 Cowc_CAOG_02719 Ctel_166394 81.3/21 Mvib_comp12887_c1_seq1_m20918 34/78 Nk52_evm7s212 90.3/98 Clim_evm80s215 82.6/100 Spun_SPPG_01225 98.3/100 55.2/100 Bade_91391Bd__Bden_Batde5_91391 Dpul_324552 I a/b 94.5/99 Clim_evm80s215_bis 85.6/80 0/99 Gpro_59391 98.9/99 46/87 Roal_1278Ra 66.3/87 Patl_Unigene5624_17186 97.4/100 0/89 Falb_H696_00830T0 Patl_Unigene2559_7896 Aque_XP_003384926 89.9/80 81.6/87 Patl_Unigene328_1117 Cfra_6349T1 82.2/72 Ttra_AMSG_11869 94/98 89.8/99 Acas_g1560 Patl_Unigene4542_13883 Nvec_XP_001641578 99/99 Ppal_EFA81458_1 61.7/68 94.7/95 100/100 59.4/94 Didi_DDB0191347__Ddis_XP_643446 Ttra_AMSG_02590 99.8/100 Ppal_EFA84632_1 Ocar_g3678_t1 75.8/14 100/100 Didi_DDB0185086__Ddis_XP_643950 61/93 Acas_g1837 Ngru_XP_002680436 100/100 Pinf_XP_002998568 99.8/100 Phso_347790Ps Mbre_XP_001749663 28.7/82 Spar_SPRG_18776T0 90.8/89 Esil_CBN78037 0/22 Gvul_c2745_g1_i1_m.1158 100/100 Hsap_ENSP00000324527 99.4/100 29.5/95 Hsap_ENSP00000258787 Ctel_167519 Myosin Myosin Sros_PTSG_08080 92.8/99 63.3/99 Dpul_305891 89.9/99 Dmel_FBpp0079625 I d/g I a/b/c/h/d/g/k 79.7/100 Nvec_XP_001637898 94.8/100 99.7/100 Aque_Lgig_143556_WARNING_NOHOMOLOG_rspp_9 Fig S4G Patl_Unigene1473_4620 Aque_XP_003391678 89.3/99 Cele_T02C12_1 99.8/63 Sros_PTSG_07873 88.8/100 84.3/9750 .9/99 Mbre_XP_001746657 Ocar_g878_t1 56.9/9898 .2/89 Cowc_CAOG_07648 Mvib_comp13454_c3_seq1_m23460 Mvib_comp14986_c0_seq1_m31842 30.6/73 Nk52_evm80s151 Partr_v1_DN28584_c1_g1_i1_m72858 93.4/99 100/100 84.3/76 Partr_v1_DN28584_c1_g1_i3_m72864 23.7/92 Falb_H696_01540T0 Mvib_comp15884_c1_seq6_m40327 94.7/100 Patl_Unigene4818_14740 44.3/83 Patl_Unigene2761_8550 97.4/100 Hsap_ENSP00000306382 0/93 Hsap_ENSP00000300119 96/100 Dpul_324552 Myosin Cowc_CAOG_04618 81.2/21 0/85 Ctel_166394 I a/b 92.6/99 Nvec_XP_001641578 94.8/100 Aque_XP_003384926 71.7/100 Ocar_g3678_t1 Mbre_XP_001749663 46.9/55 100/100 Dpul_305891 31.7/15 Sros_PTSG_08080 78.6/55 Cowc_CAOG_04618 87.6/90 Mvib_comp13454_c3_seq1_m23460 89/98 95.2/22 Mvib_comp15884_c1_seq6_m40327 8.3/10 Patl_Unigene1473_4620 Dmel_FBpp0079625 Patl_Unigene1476_4627 36/16 Hsap_ENSP00000352834 61.6/75 97.5/100 Myosin 86.3/99 Hsap_ENSP00000444076 Myosin 94.6/100 Ctel_120713 95.3/99 Dpul_306979 100/100 Nvec_XP_001637898 26.5/100 Dmel_FBpp0072567 I c/h 93/100 Nvec_Cgig_10014042_WARNING_NOHOMOLOG_nvp_12 79.3/100 Aque_XP_003389506 100/100 Ocar_g8308_t1 96.8/73 99.7/100 Mbre_XP_001745332 Sros_PTSG_04553 Hsap_ENSP00000324527 94.3/91 63.2/89 Mvib_comp14411_c0_seq1_m28339 I d/g 90.8/33 82.1/97 43.3/89 Mbre_XP_001744998 100/100 Cowc_CAOG_06292 24.2/88 Mvib_comp15194_c1_seq1_m33224 97.6/100 Myosin Mbre_XP_001747996 Hsap_ENSP00000258787 100/100 Sros_PTSG_07359 85.6/72 Partr_v1_DN27789_c1_g1_i1_m.66972 I k 100/100 44.4/48 78.1/92 96.7/97 Partr_v1_DN27789_c1_g1_i3_m.66975 84.4/94 0/14 Spun_SPPG_08031 Patl_Unigene1450_4564 Ctel_167519 94.5/28 91.4/93 Falb_H696_02314T0 Patl_Unigene1228_3914 76.8/100 Ttra_AMSG_07376 95.9/100 Ttra_AMSG_07592 Ttra_AMSG_09890 Aque_Lgig_143556_WARNING_NOHOMOLOG_rspp_9 93/41 17/100 99.9/100 Ppal_EFA85721_1 99.3/99 98.8/97 Didi_DDB0220021__Ddis_XP_636359 100/100 93.9/93 Ttra_AMSG_07283 100/100 Ppal_EFA77144_1 49.1/92 Didi_DDB0216200__Ddis_XP_636580 Aque_XP_003391678 Nk52_evm36s2531 86.4/44 94.2/70 Dmel_FBpp0099956 0/82 Emhu_EOD29489__Ehux_457182 100/100 Emhu_EOD29490__Ehux_457182 86.8/95 Emhu_EOD27387__Ehux_457182 E Emhu_EOD20192__Ehux_75288 Cele_T02C12_1 100/100 84.9/86 Emhu_EOD14055 98.7/100 100/100 12.6/17 49.8/88 Ttra_AMSG_02589 Esil_CBN78038 Patl_Unigene5950_18224 Sros_PTSG_07873 Didi_DDB0215392 Patl_Unigene10223_31227 Ngru_XP_002676561 96.4/100 97.1/100 99.9/100 Ngru_XP_002683472 59.2/98 Ngru_XP_002682373 Mbre_XP_001746657 Ngru_XP_002676946 Ptet_A0BUP0_A0BUP0 32.7/48 100/100 89.9/95 94.8/100 Ptet_A0E835_A0E835 Ptet_A0D4P3_A0D4P3 81.6/100 Myosin 90.3/100 Ptet_A0CKH1_A0CKH1 Tthe_XP_001018813 Ocar_g878_t1 48.1/33 87.2/30 Tthe_XP_001014642 XIV 100/100 Ptet_A0BQ00_A0BQ00 Ptet_A0D6K3_A0D6K3 Tthe_XP_001032548 63.2/33 95.1/30 93.1/100 Cowc_CAOG_07648 95.6/100 Tthe_XP_001032549 98.6/75 Tthe_XP_001026495 84.5/86 92.6/100 Tthe_XP_001015568 98.9/100 39.7/86 Tthe_XP_001027753 100/96 94.5/100 Tthe_XP_001010638 Tthe_XP_001013713 Mvib_comp14986_c0_seq1_m31842 100/100 Tthe_XP_001017158 86.8/99 Tthe_XP_001007637 Tthe_XP_001013408 54.3/100 Pfal_Q8IDR3_MYOA Nk52_evm80s151 0/86 99.6/100 Tgon_XP_002368942 95.7/100 Tgon_XP_002365418 99.1/100 Tgon_XP_002366595 71.4/21 100/100 Tgon_XP_002364848 58.9/100 24.5/6 Pfal_Q8I465_Q8I465 Falb_H696_01540T0 78.2/99 Tgon_XP_002366825 94.7/78 Pmar_XP_002780157 63.8/33 64.1/78 Cfra_4645T1_tris 45.4/95 Alma_AMAG_16259T0 Patl_Unigene7878_24198 69.7/57 Spar_SPRG_14367T0 Patl_Unigene4818_14740 Ptri_XP_002181408 90.4/100 Myosin 94.7/98 62.4/100 Thps_Thaps35059__Tpse_XP_002291453 53.5/100 Ngad_20744 XXIX 72.5/99 Ngad_06667 Esil_CBJ27624 Patl_Unigene2761_8550 100/100 Pinf_XP_002998463 98.6/100 97.6/100 Phso_362634Ps 22.7/59 100/100 Pyul_PYU1_T008890 99.2/100 Spar_SPRG_20216T0 100/100 Ptet_A0CVB8_A0CVB8 Partr_v1_DN28584_c1_g1_i1_m72858 98.7/100 Ptet_A0E4V4_A0E4V4 Tthe_XP_001028088 Gvul_c13205_g2_i1_m.14159 100/100 87.3/92 58.5/92 Gvul_c13205_g2_i4_m.14163 Myosin 98/94 Gvul_c13205_g2_i2_m.14161 Partr_v1_DN28584_c1_g1_i3_m72864 66.4/77 Gvul_c13205_g2_i3_m.14162 IV 79.9/84 Gvul_c14951_g1_i1_m.21123 Bnat_44500 18/68 43.3/100 93.4/92 Bnat_138429 98/94 Bnat_89109 Cowc_CAOG_06292 100/100 Bnat_127316 55.9/89 Bnat_140911 63.8/90 Bnat_41654 87.2/98 Gvul_c14372_g1_i1_m.20838 96.7/99 79.2/97 Rfil_40463_3c Mvib_comp15194_c1_seq1_m33224 79.8/99 Bnat_34821 Gvul_c13144_g1_i1_m.13443 18/56 0/87 Rfil_3788464_1s1f1 99.9/100 Rfil_3727747_1s1f1 100/100 82.8/96 97/100 Rfil_115205_2c Mbre_XP_001747996 82.7/92 Rfil_28520_1c Myosin Rfil_3780735_1s1r1 33/99 100/100 Rfil_39044_1c 100/100 Rfil_143346_1s 0/19 Acas_g6163 52.2/56 99.7/100 Sros_PTSG_07359 Acas_g2037 0/9 Gvul_c12237_g2_i1_m.10014 74.5/45 Partr_v1_DN22453_c0_g1_i1_m32098 32.5/48 Patl_Unigene6823_20915 Acas_g9362 99.5/8768/61 Emhu_EOD11728 Ttra_AMSG_07376 I k 100/100 99.6/100 Emhu_EOD23932 Esil_CBJ33811 40.2/89 69.3/100 Ttra_AMSG_00218 99.8/100 Ttra_AMSG_02747 88.2/99 Ttra_AMSG_08941 Ttra_AMSG_07592 76.3/49 Ttra_AMSG_07795 96.9/96 Emhu_EOD14276 100/100 51.4/64 Emhu_EOD23971 Patl_Unigene9376_28622 85.9/66 48.6/51 98.3/100 Bnat_36277 Ttra_AMSG_09890 Bnat_75564 61.8/39 100/100 Bnat_132607 65.1/88 Partr_v1_DN18655_c1_g1_i1_m5526 Gvul_c13443_g7_i1_m.17847 99.9/100 Esil_CBJ27565 Ppal_EFA85721_1 90.5/100 Esil_CBJ27566 Myosin 100/100 Ngad_04122 100/100 Esil_CBJ33547 XXXI 93.2/95 Pinf_XP_002900600 97.4/100 97.8/100 100/100 Phso_517698Ps (partial) Didi_DDB0220021__Ddis_XP_636359 100/100 Pyul_PYU1_T000322 74.9/37 53.3/89 Spar_SPRG_12183T0 90.2/100 Pinf_XP_002999253 96/100 Phso_359062Ps 100/100 Pyul_PYU1_T005362 Spar_SPRG_05632T0 100/100 Ttra_AMSG_07283 98.6/100 98/100 Spar_SPRG_01374T0 18/62 Emhu_EOD15095__Ehux_431051 32.5/91 Emhu_EOD20019__Ehux_431051 100/100 Emhu_EOD16798__Ehux_416078 Ppal_EFA77144_1 Emhu_EOD16797__Ehux_66080 79.9/100 Bsal_BS68940c 61.5/100 Bsal_BS09245 100/100 74.4/100 Bsal_BS64760 Myosin 83/99 Bsal_BS35940 84.6/34 Bsal_BS25900 XIII Didi_DDB0216200__Ddis_XP_636580 8.6/94 96/100 Bsal_BS44555 75.3/100 Bsal_BS08970 88/100 Bsal_BS70125 99.8/95 Bsal_BS21125 60.3/73 Bsal_BS32790 94.7/95 Bsal_BS59865 96.1/100 Ngru_XP_002680898 97.8/95 Ngru_XP_002681567 Lmaj_XP_001685713 88.3/35 100/100 86.3/56 Bsal_BS70030 100/100 Emhu_EOD23137 Emhu_EOD27336 94.4/71 Didi_DDB0191100 Patl_Unigene5494_16813 0.2 Dmel_FBpp0291459 0/23 56.2/23 Aque_XP_003386525 Myosin 98.4/100 Mbre_XP_001744128 95.1/100 Sros_PTSG_02189 XXII 100/100 Ocar_g340_t1 30.4/85 Mvib_comp15915_c0_seq1_m41032 90.8/92 Cowc_CAOG_08401 13.1/81 Cowc_CAOG_01574 38.9/99 Falb_H696_05852T0 82.8/100 Patl_Unigene3980_12104 52.2/81 Patl_Unigene1906_5938 100/100 Partr_v1_DN29016_c0_g1_i2_m58671 Partr_v1_DN29016_c0_g1_i3_m58677 78/94 27.6/82 13/18 97.5/92 Roal_5156Ra 82.9/100 Spun_SPPG_07005 Gpro_94128 96/93 Patl_Unigene7831_24042 Patl_Unigene9359_28565 99.2/99 Acas_g11880 96.9/22 38.3/36 Acas_g13118 Myosin 53.9/38 Acas_g9693 79.9/97 Acas_g10618 XXV Patl_Unigene4647_14195_bis 70.1/95 100/100 Ppal_EFA75546_1 88.7/95 Didi_DDB0185049__Ddis_XP_644171 90.3/96 81.5/95 Acas_g205 Patl_Unigene4100_12473 88.2/100 Acas_g13843 99.3/100 Acas_g9824 Acas_g1449 Cfra_4645T1 92.5/100 Dpul_304348 88.2/17 Dmel_FBpp0080282 Myosin 11.3/16 Ctel_226057 99/100 Hsap_ENSP00000386331 VII 85.9/98 Hsap_ENSP00000374175 87.1/9865 .4/96 Cele_T10H10_1 Dmel_FBpp0079097 32.3/87 Ocar_g9435_t1 Nvec_XP_001619773 47.1/100 Sros_PTSG_01118 88.1/78 65.9/100 Mbre_XP_001745924 36.3/25 Mbre_XP_001745923 98.6/5160.09/7 Patl_Unigene8185_25121 75/40 Ocar_g9829_t1 91.8/100 Nvec_XP_001637626 Aque_Saqv_comp10335_c1_seq21_m_47003_WARNING_NOHOMOLOG_rspp_2 96.4/95 Clim_evm40s25 Ttra_AMSG_02296 84.5/79 81.2/100 100/100 Ttra_AMSG_12322 99.8/100 Ttra_AMSG_05986 90.3/100 Ttra_AMSG_09426 orphan 73.7/89 94/100 Ttra_AMSG_06739 Ttra_AMSG_07281 Patl_Unigene8845_27063 44.8/96 Ctel_227847 Hsap_ENSP00000205890 90.7/99 Myosin 88.4/100 Dpul_191611 0/90 Dmel_FBpp0271775 57.1/69 Nvec_XP_001624024 XV Patl_Unigene6054_18553 38.4/52 93.4/99 42.6/70 Ocar_g2829_t1 96.6/99 Patl_Unigene4647_14195 99.9/100 Sros_PTSG_06636 59.5/93 Mbre_XP_001748290 84.7/100 Cowc_CAOG_08449 79.7/8597.3/95 Mvib_comp15616_c1_seq1_m36918 Mvib_comp15942_c0_seq1_m41490 75/49 Clim_evm9s51 75.5/52 74.6/65 Patl_Unigene4798_14670 Patl_Unigene3934_11981 Patl_Unigene5379_16484 93.1/74 99.5/100 Partr_v1_DN28430_c2_g1_i1_m41531 80.1/59 64/99 Partr_v1_DN28430_c5_g1_i2_m41535 71.8/100 Partr_v1_DN29015_c0_g1_i2_m58764 26.5/60 Patl_Unigene12292_37350 Figure S4. Comparative analyses of proteins related to cytoskeleton, membrane tra cking and phagotrophy Gpro_115647 77.3/89 Hsap_ENSP00000421280 91.9/86 Patl_Unigene4102_12479 Myosin 91.4/86 Nvec_XP_001636848 94.9/100 Ocar_g1598_t1 X 64.3/96 97.9/86 Aque_XP_003383212 99.9/100 Mbre_XP_001744901 91.6/88 Sros_PTSG_00762 87.3/99 Cowc_CAOG_02007 99.3/9366 .7/98 Mvib_comp15717_c0_seq1_m38007 Cowc_CAOG_07167 81.4/100 Cfra_7676T1 Clim_evm27s66 Patl_Unigene1225_3905 (related to Figures 3C-D). Sros_PTSG_03170 100/100 0/48 Mbre_XP_001745779 Myosin 32.2/52 Ocar_g4520_t1 Mbre_XP_001744231 99.8/100 III-like 94/89 Sros_PTSG_11624 88.4/88 100/100 Mbre_XP_001747308 88.7/89 Sros_PTSG_08880 100/100 Mbre_XP_001746075 Sros_PTSG_07948 Hsap_ENSP00000265944 90.9/89 97.9/100 78.2/95 Hsap_ENSP00000335100 III Aque_XP_003388016 0/39 100/100 Dpul_97999 (A) Principal coordinate analysis (PCoA) of 41 eukaryotic species according to their pro le of presence/absence of 99.6/9429 .8/49 Dmel_FBpp0079064 Ctel_226099 79.5/84 Nvec_XP_001635694 93/100 Mbre_XP_001750699 11.5/68 Sros_PTSG_08457 73.9/80 Sros_PTSG_10698 22.5/53 99.9/100 Mbre_XP_001746769 90/76 Sros_PTSG_01758 64.9/93 Ocar_g9905_t1 Hsap_ENSP00000349145 XVI 73.1/100 Mvib_comp15601_c0_seq2_m36800 89.9/57 Mvib_comp14521_c0_seq1_m28969 Cowc_CAOG_00840 99.6/100 Mbre_XP_001745802 695 KEGG groups of orthologs (KOGs) related to Cytoskeleton (KO04812), membrane tracking (KO04131) and Sros_PTSG_03041 87.4/56 Sros_PTSG_01006 74.4/56 59.3/99 98.8/100 Mbre_XP_001746805 100/100 Sros_PTSG_03054 88.9/100 Sros_PTSG_02707 77.3/100 Mbre_XP_001749588 88.4/56 Sros_PTSG_09865 37.3/62 Sros_PTSG_05946 99.6/100 Mbre_XP_001745395 Sros_PTSG_06008 7.5/61 99.3/100 Sros_PTSG_01875 95.5/62 4.1/87 60.9/97 Mbre_XP_001745239 86.9/97 Aque_XP_003388838 Mbre_XP_001744165 phagotrophy (ko04071, ko04142, ko04144, ko04145, map03050, map04130, map04141, map04146 and map04810; 83.3/100 64.2/100 Sros_PTSG_02425 80.7/100 Mbre_XP_001746511 99.3/100 Sros_PTSG_09835 85.4/79 99.2/100 Mbre_XP_001746510 98.6/100 Sros_PTSG_11202 Mbre_XP_001746313 Patl_Unigene501_1595 62.6/96 Patl_Unigene8890_27202 17.7/80 Patl_Unigene6370_19639 Nk52_evm23s2657 80/98 Dpul_64918 27/98 Ctel_226894 Myosin IX 19.1/87 94.8/84 80/98 Cele_F56A6_2 see Table S3); simpli ed version in Figure 3C. (B). Clustering of 41 eukaryotic species according to their pro le of Ctel_5381 98.1/98 99.7/100 Hsap_MYO9A_HUMAN_unconventional_WARNING 77.8/26 Hsap_ENSP00000380444 46.4/20 Nvec_XP_001630897 Ocar_g1966_t1 95.4/4943.8/39 Mlei_ML16038a Aque_XP_003385348 94.3/97 0/35 Gvul_c7097_g1_i1_m.3396 0/9 Mvib_comp115234_c0_seq1_m45092 99.9/88 73.1/96 94.2/37 Gvul_c9624_g1_i1_m.5692 Cowc_CAOG_02063 Patl_Unigene6822_20909 86.9/100 Patl_Unigene11935_36332 presence/absence of the same 695 OGs as in (A). Color code is scaled to display positive Pearson’s r (0 to 1). (C-E) 0/72 Patl_Unigene3141_9623_bis 99.4/100 Patl_Unigene3141_9623 Patl_Unigene12845_38987 Ecun_XP_955751 99.4/100 Ppal_EFA84521_1 90.9/99 Didi_DDB0232322__Ddis_XP_643200 97.5/100 Emhu_EOD08069 Emhu_EOD30970 54.3/100 Pbla_Phybl2_95920 Pbla_Phybl2_95943 98.7/100 Myosin 99.1/100 Rhde_RO3G_10151T0__Rory_2233 95/91 Rhde_RO3G_13033T0__Rory_9123 XVII 0/92 Rhde_RO3G_05161T0__Rory_6642 55.8/95 98/100 Pbla_Phybl2_107887 KEGG maps compare presence/absence of the corresponding proteins in Paraphelidium tribonemae (green) and Pbla_Phybl2_107631 85.2/95 94.5/92 Mver_06404 Ccin_CC1G_10992 85.6/99 Rhir_346613Ri 42.3/96 Mver_04343 100/100 Asfu_102004Af Neca_6176Nc 81.7/90 99.3/100 Aory_CADAORAT00008380 72.1/96 Asfu_102005Af 100/100 Tmel_XP_002839897 97.3/95 Neca_6174Nc 100/100 Alma_AMAG_17354T0 Alma_AMAG_20714T0 Rhde_RO3G_04443T0__Rory_6001 Rozella allomycis (pink). (C) KEGG Phagosome map04145. (D) KEGG Lysosome map04142. (E) KEGG 100/100 99/100 Rhde_RO3G_11165T0__Rory_230 97.5/92 47/100 Pbla_Phybl2_115666 89.7/100 Pbla_Phybl2_95968 98.5/100 Rhde_RO3G_10568T0__Rory_12406 96.5/100 Pbla_Phybl2_85927 95.7/100 Rhde_RO3G_15485T0__Rory_17367 99.9/100 Pbla_Phybl2_153081 98.2/100 Rhde_RO3G_17187T0__Rory_5687 100/100 Pbla_Phybl2_17470 90.8/97 Rhde_RO3G_14688T0__Rory_8615 44.7/100 Ccin_CC1G_11270 14/64 99.2/100 Usma_71018241__Umay_XP_759351 60.3/97 Cneo_XP_571494 map04144. (F-G) Phylogenetic tree of myosin genes using the Myosin head domain (Pfam PF00063), using Maximum 94.1/97 Rhir_348546Ri Mver_06060 99.7/100 Spun_SPPG_06138 42/100 Bade_35424Bd__Bden_Batde5_35424 90.5/100 Bade_86403Bd__Bden_Batde5_86403 84.7/84 Bade_23120Bd__Bden_Batde5_23120 Bade_8502Bad__Bden_Batde5_8502 69.3/99 91.6/100 Bade_34502Bd__Bden_Batde5_34502 Spun_SPPG_06659 91/76 99.9/99 97.9/84 Spun_SPPG_03441 Spun_SPPG_00937 99.9/100 Alma_AMAG_01322T0 99.9/100 Alma_AMAG_05513T0 Alma_AMAG_12722T0 likelihood in IQ-TREE (supports are SH-like approximate likelihood ratio test / UFBS, respectively) based in the classi - 100/100 83.2/100 Alma_AMAG_14686T0 100/100 Alma_AMAG_07735T0 97.2/100 Alma_AMAG_09765T0 99.9/88 100/100 Alma_AMAG_07719T0__Amac_09783 Alma_AMAG_09783T0__Amac_09783 92.4/76 99.9/100 100/100 Alma_AMAG_07755T0__Amac_09745 Alma_AMAG_09745T0__Amac_09745 26.2/78 Gpro_289059 Cowc_CAOG_06481 83.3/100 Partr_v1_DN28871_c0_g1_i1_m33284 78.9/67 Partr_v1_DN28978_c0_g1_i1_m25194 30.4/67 Partr_v1_DN46183_c0_g1_i1_m1798 cation by Sebé-Pedrós et al. (2014). (F) Myosin class I phylogeny. (G) All eukaryotic myosins. Species are color-coded Roal_3114Ra

0.4 according to taxonomy.