
Accepted Manuscript

Multigene eukaryote phylogeny reveals the likely protozoan ancestors of opisthokonts (animals, fungi, choanozoans) and Amoebozoa

Thomas Cavalier-Smith, Ema E. Chao, Elizabeth A. Snell, Cédric Berney, Anna Maria Fiore-Donno, Rhodri Lewis

PII: S1055-7903(14)00279-6
DOI: http://dx.doi.org/10.1016/j.ympev.2014.08.012
Reference: YMPEV 4996

To appear in: Molecular Phylogenetics and Evolution

Received Date: 24 January 2014
Revised Date: 2 August 2014
Accepted Date: 11 August 2014

Please cite this article as: Cavalier-Smith, T., Chao, E.E., Snell, E.A., Berney, C., Fiore-Donno, A.M., Lewis, R., Multigene eukaryote phylogeny reveals the likely protozoan ancestors of opisthokonts (animals, fungi, choanozoans) and Amoebozoa, Molecular Phylogenetics and Evolution (2014), doi: http://dx.doi.org/10.1016/j.ympev.2014.08.012

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Multigene eukaryote phylogeny reveals the likely protozoan ancestors of opisthokonts (animals, fungi, choanozoans) and Amoebozoa

Thomas Cavalier-Smith1, Ema E. Chao1, Elizabeth A. Snell1, Cédric Berney1,2, Anna Maria Fiore-Donno1,3, and Rhodri Lewis1

1Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK.
2Present address: Department of Sciences, The Natural History Museum, Cromwell Road, London SW7 5BD, UK.
3Present address: Zoology Institute, Terrestrial Ecology Group, Faculty of Mathematics and Natural Sciences, University of Cologne, Biozentrum Köln, Zülpicher Str. 47b, D-50674 KÖLN, Germany.

Author for correspondence: Thomas Cavalier-Smith, email: [email protected].

ABSTRACT
Animals and fungi independently evolved from the protozoan Choanozoa, these three groups constituting a major branch of the eukaryotic evolutionary tree known as opisthokonts. Opisthokonts and the protozoan phylum Amoebozoa (amoebae plus slime moulds) were previously argued to have evolved independently from the little-studied, largely flagellate, protozoan phylum Sulcozoa. Sulcozoa are a likely evolutionary link between opisthokonts and the more primitive excavate flagellates that have ventral feeding grooves and the most primitive known mitochondria. To extend earlier sparse evidence for the ancestral (paraphyletic) nature of Sulcozoa, we sequenced transcriptomes from six gliding flagellates (two apusomonads; three planomonads; Mantamonas).
Phylogenetic analyses of 173-192 genes and 73-122 eukaryote-wide taxa show Sulcozoa as deeply paraphyletic, confirming that opisthokonts and Amoebozoa independently evolved from sulcozoans by losing their ancestral ventral groove and dorsal pellicle: Apusozoa (apusomonads plus anaerobic breviate amoebae) are robustly sisters to opisthokonts and probably paraphyletic, breviates diverging before apusomonads; Varisulca (planomonads, Mantamonas, and the non-gliding flagellate Collodictyon) are sisters to opisthokonts plus Apusozoa and Amoebozoa, and possibly holophyletic; Glissodiscea (planomonads, Mantamonas) may be holophyletic, but Mantamonas sometimes groups with Collodictyon instead. Taxon and gene sampling slightly affect tree topology; for the closest branches in Sulcozoa and opisthokonts, proportionally reducing missing data eliminates conflicts between homogeneous-model maximum-likelihood trees and evolutionarily more realistic site-heterogeneous trees. Sulcozoa, opisthokonts, and Amoebozoa constitute an often-pseudopodial ‘podiate’ clade, one of only three eukaryotic ‘supergroups’. Our trees indicate that evolution of the sulcozoan dorsal pellicle, ventral pseudopodia, and ciliary gliding (probably simultaneously) generated podiates from Malawimonas-like excavate flagellates.

Key words: cell evolution, eukaryote phylogeny, podiates, transcriptome sequencing, Sulcozoa

1. Introduction

Phylogenetically, all eukaryotes have been assigned to just three supergroups: podiates, corticates (kingdoms Plantae and Chromista), and Eozoa (excavates and Euglenozoa) (Cavalier-Smith, 2013a). The entirely heterotrophic podiates, the focus of this paper, include Animalia, Fungi, and three protozoan phyla (Sulcozoa, Amoebozoa, Choanozoa); they are so called because of the general presence of pseudopodia, except in the derived Fungi that lost them (Cavalier-Smith, 2013a).
Originally ‘excavates’ excluded Euglenozoa (Simpson and Patterson, 1999), but later these distinctive flagellates were included despite not sharing excavate morphology (Cavalier-Smith, 2002, 2003; Simpson, 2003), under the influence of a probably erroneous assumption about the location of the eukaryotic root; here we follow Cavalier-Smith (2010a, 2013a) in excluding Euglenozoa from excavates. Multigene trees usually show corticates as a clade, but are contradictory concerning the boundary between the excavate Eozoa and the putatively podiate phylum Sulcozoa. The problem lies in the uncertain phylogenetic position of the excavate flagellate Malawimonas (Brown et al., 2013; Derelle and Lang, 2012; Hampl et al., 2009; Zhao et al., 2012; Zhao et al., 2013). Some multigene trees show podiates as a clade with Malawimonas one node deeper (e.g. Brown et al., 2013); others place Malawimonas within podiates, typically as sister to the sulcozoan flagellate Collodictyon (e.g. Zhao et al., 2012, 2013). Either position is consistent with the postulate that the evolutionary transitions between the three supergroups (Fig. 1) all involved biciliate ventrally grooved cells morphologically similar to Malawimonas (Cavalier-Smith, 2013a). Podiates are proposed to have arisen from a Malawimonas-like ancestor by evolving ventral pseudopodia and a novel dorsal semi-rigid pellicle, and by abandoning swimming in the plankton in favour of a benthic habitat, gliding on the posterior cilium over surfaces in search of prey (Cavalier-Smith, 2013a).

To clarify the deep branching of podiates one must determine the relationships of the various groups of Sulcozoa, a recently established protozoan phylum of mainly gliding flagellates. Sulcozoa are morphologically unified by a unique cell structure combining a ventral feeding groove, pseudopodia to catch prey (not for locomotion), and a rigid dorsal pellicle (Cavalier-Smith, 2013a).
Sulcozoa comprise subphyla Apusozoa (apusomonads and breviates) and Varisulca, proposed as the most basal podiate group (Cavalier-Smith, 2013a). Sulcozoa are phylogenetically important as likely ancestors of two major groups whose cell structure is proposed to have been radically simplified from the complex cytoskeleton of the ventral feeding groove of Sulcozoa and excavates: opisthokonts, whose ancestor lost the anterior cilium and became uniciliate like sperm, and Amoebozoa, which developed pseudopodia for efficient locomotion (Cavalier-Smith, 2013a). A 16-gene tree weakly grouped three sulcozoan orders as a clade, but was based on only three or four genes for apusomonads and planomonads (Katz et al., 2011), the best studied gliding sulcozoan flagellates; a sulcozoan clade was not seen in a 30-gene tree with four Sulcozoa, because Malawimonas was weakly within Sulcozoa (Grant et al., 2012). A study of 159 genes weakly excluded Malawimonas from Sulcozoa and podiates, but showed two or three distinct sulcozoan lineages at the base of podiates, though it sampled only five Sulcozoa from three orders, with only Collodictyon representing subphylum Varisulca (Brown et al., 2013). Brown et al. (2013) concluded that Apusozoa were paraphyletic; their trees also confirmed earlier evidence that Collodictyon branched deeper still, as the most divergent podiate lineage to date, making Sulcozoa the ancestral (paraphyletic) podiate group.

Most Sulcozoa are gliding, not swimming, flagellates: apusomonads (Cavalier-Smith and Chao, 2010), planomonads (Cavalier-Smith et al., 2008), and Mantamonas (Glücksman et al., 2011) (the latter two grouped as Glissodiscea) glide over surfaces on their posterior cilium, held rigidly behind the cell like a ski.
Sulcozoa also include Diphylleida (Brugerolle et al., 2002; Cavalier-Smith, 2003), alga-eating grooved swimming flagellates with two or four cilia, which emit pseudopodia from their groove and whose cytoskeleton is substantially modified compared with that of gliding Sulcozoa (Cavalier-Smith and Chao, 2010; Cavalier-Smith, 2013a). Diphylleids, sometimes included in Apusozoa or (Cavalier-Smith, 2003), were recently grouped with Glissodiscea and the non-flagellate Rigifilida (Yabuki et al., 2013) as Varisulca (Cavalier-Smith, 2013a). Previously Glissodiscea included the gliding flagellate Discocelia (Vørs, 1988; Cavalier-Smith, 2013a), but rRNA trees put it in (Cavalier-Smith et al. in prep.). Breviate amoebae lost the posterior cilium and glide instead on the remaining anterior cilium, held rigidly ahead (Heiss et al., 2013), and have become secondarily anaerobic. Ciliary gliding, pseudopodia, and dorsal pellicle differentiate Sulcozoa from the excavate phylum Loukozoa, which includes Malawimonas, secondarily anaerobic flagellates like , and flagellates with the most primitive mitochondria (Lang et al., 1997; Burger et al., 2013).

A better sulcozoan phylogeny is crucial for understanding key steps in eukaryote cell evolution before the origins of animals and fungi. This is impeded by the lack of multigene data for most sulcozoan lineages, whose relationships rely on morphological evidence and weakly resolved and conflicting rDNA phylogenies (Cavalier-Smith and Chao, 2010; Cavalier-Smith et al., 2008; Glücksman et al., 2011; Paps et al., 2013; Yabuki et al., 2013). Previously, large-scale transcriptomic data existed for only six sulcozoans, not all included together in one tree: the apusomonads Thecamonas trahens (the only sulcozoan with a completely sequenced genome) and Manchomonas bermudensis, which grouped as sister to opisthokonts (Brown et al., 2013; Grant et al., 2012; Torruella et al., 2012; Zhao et al., 2013); three uniciliate breviate amoebae, Breviata anathema, Subulatomonas tetraspora and Pygsuia biforma, now grouped with apusomonads in Apusozoa (Cavalier-Smith, 2013a) but previously regarded as Amoebozoa (Minge et al., 2009); and the diphylleid Collodictyon, which grouped weakly as sister to opisthokonts plus Apusozoa and Amoebozoa (Brown et al., 2013; Zhao et al., 2012, 2013). The first transcriptome-derived multigene tree, with fewer proteins and no apusomonads, grouped breviates with Amoebozoa (Minge et al., 2009). However, breviates are now sisters either to apusomonads (forming a clade Apusozoa on maximum likelihood (ML) trees) or to opisthokonts plus apusomonads on evolutionarily more realistic site-heterogeneous (CAT-GTR-GAMMA) Bayesian trees (Brown et al., 2013; Grant et al., 2012; Zhao et al., 2012, though their fast-site removal trees contradictorily placed them in Amoebozoa).

This paper studies basal podiate phylogeny, and Sulcozoa in particular, by sequencing partial transcriptomes from sulcozoan flagellates representing three major groups, two not previously represented on multigene trees.
We did cDNA pyrosequencing for six sulcozoan species from three different orders: Apusomonadida (Thecamonas oxoniensis, Multimonas media (Cavalier-Smith and Chao, 2010)); Mantamonadida (Mantamonas plastica (Glücksman et al., 2011)); Planomonadida (Ancyromonas sigmoides, Fabomonas tropica, Nutomonas howeae (Glücksman et al., 2013)). For the evolutionarily key Mantamonas and planomonads we provide the first large-scale protein data. New apusomonad sequences supplement those for Thecamonas trahens and Manchomonas bermudensis (Brown et al., 2013; Torruella et al., 2012). For the first time we include both the breviate amoeboflagellates Pygsuia (Brown et al., 2013) and Subulatomonas (Grant et al., 2012) in the same tree, making our phylogenetic analyses the most comprehensive to date. Overall they indicate that Sulcozoa comprise at least three phylogenetically distinct lineages at the base of podiates, strongly support the separate phyletic positions of the sulcozoan subphyla Apusozoa and Varisulca (Cavalier-Smith, 2013a), and show that Apusozoa, Sulcozoa, Loukozoa, and Excavata are probably all ancestral paraphyletic groups of key importance for understanding the early stepwise evolution of eukaryotic cells ancestral to higher kingdoms. Our analyses provide further evidence for excavate paraphyly by strongly suggesting that Malawimonas is sister to podiates whereas all other Eozoa are evolutionarily more distant; the evolutionarily most realistic CAT-model trees show podiates as a clade.

2. Materials and Methods

RNA was extracted from uniprotist cultures of all six sulcozoans, cDNA libraries made, 454-pyrosequencing done, and 192-gene alignments (52,824 amino acid positions) constructed for >200 eukaryotes. Supplementary Table S3 lists the genes. Because of the complexity of data preparation, comprehensive technical details are in the electronic supplementary material.
Briefly, however, RNA was extracted using the TRI-Reagent (Sigma) protocol (Chomczynski and Sacchi, 1987); cDNA libraries were constructed by Vertis AG using their full-length enriched protocol and PCR amplification; those for Multimonas media and Ancyromonas sigmoides were normalised. Sequencing was multiplexed: tagging individual libraries with specific oligonucleotide markers allowed computational separation of taxa after pooling DNA for two 454-sequencing runs on two separate half plates that also included 18 other individually tagged libraries (reported in separate papers). Some cross-contamination between certain samples must have occurred at some stage, as some incorrectly labelled sequences were unambiguously identified in the ML trees run for quality control of all 192 single-gene alignments; these were excluded from the multigene analyses (as were similar contaminants identified in databases from other laboratories). Bootstrapped single-gene trees combined with careful inspection of all alignments were used to remove duplicate and incorrect paralogue sequences. Amino acid positions included in the analyses were similar to those of the seed alignments from published multigene analyses (see supplementary material), to which our new sequences and additional outgroup sequences were added. Sequences were concatenated for multigene trees by SCaFoS (Roure et al., 2007) for each gene and taxon sample analysed.

A eukaryote-wide subset of 75 taxa was used for phylogenetic analysis by the best available site-heterogeneous amino-acid substitution model (PhyloBayes-MPI v.1b, GTR-CAT-Γ with four rate categories) (Lartillot and Philippe, 2008; Philippe et al., 2011) and for maximum likelihood (ML) by RAxML-MPI v.7.2.8 or 7.7.2 using PROTGAMMALGF (a site-homogeneous model), as detailed in supplementary material.
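The concatenation step can be illustrated with a minimal Python sketch. It is not SCaFoS, whose ortholog selection and sequence-fusion options are considerably more elaborate; it only shows the basic supermatrix idea assumed here: per-gene alignments are joined end to end, a taxon lacking a gene is padded with gaps of that gene's width, and the overall gap percentage (the measure quoted for the reduced alignments in the Results) is computed. The function names `concatenate` and `gap_percentage`, and the choice to count '-', '?' and 'X' as missing, are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: one gene alignment maps taxon name -> aligned amino-acid string.
Alignment = Dict[str, str]

MISSING = set("-?X")  # assumption: gaps, unknowns and ambiguous residues all count as missing


def concatenate(genes: List[Alignment], taxa: List[str]) -> Tuple[Alignment, List[Tuple[int, int]]]:
    """Join gene alignments end to end into a supermatrix.

    A taxon absent from a gene is padded with gaps of that gene's width.
    Also returns the (start, end) column range of each gene (end exclusive).
    """
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[int, int]] = []
    start = 0
    for gene in genes:
        widths = {len(seq) for seq in gene.values()}
        if len(widths) != 1:
            raise ValueError("each gene alignment needs >=1 sequence, all of equal length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(gene.get(taxon, "-" * width))
        partitions.append((start, start + width))
        start += width
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}, partitions


def gap_percentage(matrix: Alignment) -> float:
    """Percentage of all supermatrix cells that are missing."""
    total = sum(len(seq) for seq in matrix.values())
    missing = sum(ch in MISSING for seq in matrix.values() for ch in seq)
    return 100.0 * missing / total


if __name__ == "__main__":
    genes = [{"A": "MKV", "B": "MRV"}, {"A": "LL", "C": "LI"}]
    matrix, partitions = concatenate(genes, ["A", "B", "C"])
    for taxon, seq in matrix.items():
        print(taxon, seq)                    # A MKVLL, B MRV--, C ---LI
    print(partitions)                        # [(0, 3), (3, 5)]
    print(round(gap_percentage(matrix), 2))  # 33.33
```

Padding absent genes with gaps, rather than dropping the taxon, is what lets sparsely sampled lineages such as Varisulca enter the analysis at all, at the cost of the high gap percentages whose effects are examined below.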
After Pygsuia sequences became available (Brown et al., 2013), we used both methods to study a second subset of 73 taxa that excluded all eozoan outgroup sequences but was substantially augmented by including additional podiate and corticate taxa (i.e. neozoa), as well as by adding extra sequences for already included taxa from recently sequenced complete genomes, previously represented only by incomplete transcriptomes in our first analyses. To reduce the proportion of missing data, analyses for these 73 neozoa excluded 19 less well sampled genes (marked in Table S3) and thus used only 173 genes. To study the effects of incomplete data for some taxa, we also ran 73-taxon trees by both methods for shorter alignment subsets that excluded successively more of the less well represented genes; for that purpose we used SCaFoS (Roure et al., 2007) to construct shorter alignments restricted to genes missing in less than 50%, less than 40%, and less than 30% of taxa. ML trees were also run for taxonomically richer alignments (e.g. 86 or 122 taxa, not computationally practicable for PhyloBayes) for the 192 genes, as well as for alignments including only the 178 genes represented in our new sulcozoan data, and for 98 eukaryote-wide taxa using the same 173 genes as for the 73-taxon trees (and for reduced alignments for these taxa restricted to genes with <50%, <40%, and <30% missing data). For the 192-gene alignment we also used the program SlowFaster (Kostka et al., 2008) to make five other reduced alignments, which excluded the fastest-evolving 5%, 10%, 19%, 30% and 40% of amino acid positions (see Supplementary Material), and calculated ML trees for each. New sequences obtained in this work have been deposited in GenBank under six BioProject numbers: PRJNA195917-9; PRJNA195922; PRJNA195923; PRJNA195925.

3. Results

3.1 Distinct phylogenetic positions of Varisulca and Apusozoa

The two varisulcan groups lacking any previous phylogenetic trees based on numerous genes (planomonads and Mantamonas) invariably branch in all our analyses at the very base of podiates, close to Collodictyon, the only other varisulcan previously with extensive multigene sequences. All three planomonads form a maximally supported clade by both methods in all trees, with Nutomonas sister to Ancyromonas (invariably maximally supported), robustly confirming the weakly supported 18S rDNA tree topology (Glücksman et al., 2013). In Figure 2, podiates are a clade by both methods (moderately supported by CAT, insignificantly by ML) and all five Varisulca clearly branch below the last common ancestor of opisthokonts and Amoebozoa. By contrast, the two lineages of the other sulcozoan subphylum, Apusozoa, branch above the opisthokont-amoebozoan divergence as sisters to opisthokonts only, either as a single clade by ML or as two successive breviate and apusomonad clades by CAT, just as Brown et al. (2013) previously showed. For the first time this shows unambiguously that Mantamonas is not related to apusomonads, contrary to 28S rDNA trees (Glücksman et al., 2011), but is more closely related to Collodictyon (with which it is weakly sister by CAT, PP 0.51) or planomonads (weakly sister by ML: BS 57%). Though these distinct and contrasting phylogenetic positions of the two sulcozoan subphyla Varisulca and Apusozoa are consistent by both methods, ML and CAT differ in their internal phylogeny: with ML, Apusozoa (breviates, apusomonads) and Glissodiscea (planomonads, Mantamonas) are both holophyletic, but with CAT both are paraphyletic. The two methods also contradict each other for Malawimonas: sister to podiates by CAT (0.78) and insignificantly (39%) to by ML.
Taxon-rich trees are commonly better than sparse ones because more internal branches are broken up, allowing more accurate phylogenetic reconstruction, so we also ran trees with substantially more neozoan and podiate taxa. Fig. 3 omits the more distant excavate outgroups to allow denser taxonomic sampling of neozoa (podiates and corticates). It also omits most genes not represented in our six new transcriptomes, so as to reduce possible artefacts from too high a proportion of missing data (Roure et al., 2013), and includes three more apusozoan sequences unavailable when Fig. 2 was run (Brown et al., 2013); moreover, for some taxa, extra genes absent from their partial transcriptomes were added from complete genomes. Theoretically all four differences should make the Fig. 3 trees more accurate. Reassuringly, despite considerable differences in taxon and gene sampling, podiate and sulcozoan topology for CAT is identical to Fig. 2 except for a different, and now convincingly and robustly supported, internal phylogeny of the better sampled apusomonads. Moreover, ML now also weakly shows the Mantamonas/Collodictyon clade (59%) and thus agrees with CAT in this respect; also, support for (Apusozoa, opisthokonts) increased from 69 to 89%, as did support for the branching of Varisulca below obazoa (from 59 to 81%). These improvements to the ML tree show that good taxon and gene sampling matters more for ML than for CAT, which maximally supports these clades in both trees. Corticate basal phylogeny is also improved, with Plantae monophyletic in Fig. 3, not polyphyletic as in Fig. 2; however, chromists still appear paraphyletic, with cryptists closer to Plantae than to other Chromista. Thus podiate basal tree topology is more stable than that of corticates.
Nonetheless, the closeness of the varisulcan and Malawimonas/metamonad divergences, and the conflicts between ML and CAT near this interface between podiates and excavates on both trees, made further tests of tree stability desirable, for which numerous additional trees were run. Overall these trees drew on 152 eukaryotic taxa and represent the most thorough and extensive multigene analysis to date of basal eukaryotic branching. To orient the reader in the more detailed description below, Table 1 summarises the different figures and alignments analysed and their purposes.

A key issue is the potential distorting effect of disproportionately high levels of missing sequence in some taxa, an artefact to which ML seems more prone than CAT, though neither is immune (Roure et al., 2013). This is especially important for Varisulca, Malawimonas, and the metamonad anaeromonads, as no genomes or near-complete transcriptomes are available for any of them and they therefore inevitably have proportionally more missing data than most other clades on the trees. We therefore investigated this for the Fig. 3 alignment by systematically excluding genes missing in higher proportions of the taxa (Fig. 3). To study whether increasing taxon sampling substantially above the number computationally feasible for the CAT trees of the 75- and 73-taxon alignments used for Figs 2 and 3 would make ML trees more compatible with the probably more accurate CAT trees, we carried out similar sparsely-represented-gene removal tests on a eukaryote-wide 98-taxon sample including more excavates than before (Fig. 4). A different test, used by others concerned about the inconstant position of Malawimonas on trees (Derelle and Lang, 2012; Hampl et al., 2009; Zhao et al., 2012), is faster-evolving-site removal, which examines tree reproducibility when slower-evolving sites are given greater weight.
As this sometimes makes ML trees agree better with CAT trees, Figure 5 shows an ML tree for the largest eukaryote-wide alignment to date for protist multigene analyses (122 taxa), and summarises the results of removing faster-evolving sites (the corresponding individual trees after faster-site removal are in Figs S3-S5).

Finally, to investigate the effects of including more excavate outgroups on the basal branching order of podiates and related excavates, a sample of 86 eukaryotes and 178 genes (excluding some of the most sparsely represented genes in our larger alignment) was analysed by CAT and LG ML (Fig. S1), primarily to see the effect of including the long-branch excavates, excluded from other trees because of concerns that they might distort the rest of the tree. This gave precisely the same sulcozoan topology as Fig. 2 for ML and Fig. 3 for CAT, showing that for Sulcozoa it does not matter whether long-branch diplomonads and trichomonads are both included (Fig. S1), both excluded (Fig. 2) or only Parabasalia included (Fig. 3), though excavate topology differed in Fig. S1. Fig. S1 excluded Chromista, partly to reduce computational time and partly because Deschamps and Moreira (2009) showed that including chromist outgroups can distort the basal branching order of Plantae (in Fig. 2 cryptists even intruded into Plantae); we wanted to check the internal branching order of Plantae in the absence of possibly distorting chromists, using a much larger alignment and many more outgroups and genes than did Deschamps and Moreira (2009). The CAT topology of Fig. S1 confirms that found by Deschamps and Moreira in the absence of chromists (red algae and Viridiplantae as sisters), but the corresponding ML topology (Fig. S1A) was contradictory, with glaucophytes strongly (92%) sisters of Viridiplantae (as in some other ML trees, Figs 4, 5, but not Fig. 2). However, in contrast to Fig.
S1, where metamonads were seemingly polyphyletic (trichozoans sister to obazoa, not to anaeromonads), Metamonada are a clade with maximal support, branching as sister to podiates/Malawimonas plus corticates in ML trees (Figs 5, S1A). More importantly for this paper, excluding all Chromista, like excluding all Eozoa, did not alter the sulcozoan branching order, which is therefore robust to immense changes in the sampling of distant taxa.

3.2. The global eukaryotic tree

Corticates are a maximally supported clade in the CAT trees of Figs 2, 3, and S2; they are also a clade, with low to strong support, in all ML trees (Figs 2-5). The Fig. 2/3 CAT trees have a maximally supported bipartition between podiates and Eozoa. Fig. 2 also shows this bipartition weakly by ML. All CAT trees show Malawimonas and metamonads as more closely related to podiates (usually their immediate outgroups) than to other excavates or to corticates. Three ML trees did not show the podiate/eozoan bipartition because Malawimonas intrudes into podiates and into Varisulca, becoming sister to Collodictyon and displacing Mantamonas (Figs 4, 5, S1), as Zhao et al. (2012) observed with a CAT model. Almost all branches within opisthokonts are consistently maximally supported. Neither method showed Plantae and Chromista as clades, because Cryptista was within Plantae: sister to glaucophytes in ML (BS 62%), but contradictorily sister to the red algal/green clade in PhyloBayes CAT (PP 0.98). With both phylogenetic methods and taxon samples (Figs 2, 3), Sulcozoa appear as two distinct groups corresponding to subphyla Apusozoa and Varisulca, each weakly holophyletic on ML trees.

3.3. Detailed Sulcozoan Phylogeny

3.3.1. Apusozoa

The anaerobic breviate amoeboflagellates are sisters to the aerobic pseudopodial apusomonads in ML trees for 192 or 173 genes: support for an apusozoan clade is 71% for Fig. 2 and 75% for Fig. 3.
However, all CAT trees for the present taxon samples instead show apusomonads as sister to opisthokonts, with breviates branching one node more deeply as their sisters (CAT: PP 1 in Figs 2, 3 and Supplementary Fig. S1 for 86 taxa). Fig. 3 shows that when genes with proportionally lower taxonomic representation are successively excluded from the analyses, this contradiction between LG and CAT trees disappears and eventually Apusozoa become paraphyletic by both methods with identical topology, albeit with lower support by the less realistic LG than by CAT. The ML trees for the full alignment and for the alignment using only genes missing in <50% of taxa (30,538 amino acid positions, 35.21% gaps) showed Apusozoa as a clade; those using only genes missing in <40% (17,470 amino acids, 27.39% gaps) or <30% (8680 amino acids, 21.22% gaps) of taxa showed Apusozoa as paraphyletic, with topology identical to the CAT trees. Thus Apusozoa are probably paraphyletic, as Brown et al. (2013) also argued from the markedly better fit of the data to CAT than to LG. The ML contradiction seen when genes with >40% of taxa missing are included is probably because the site-homogeneous LG model is less able than the evolutionarily more realistic CAT model to reconstruct ancestral states for the apusozoan clade when genes are missing for too high a fraction of the taxa (Roure et al., 2013). This multigene tree has the most comprehensive sampling of Apusozoa to date; it shows for the first time that Subulatomonas is robustly sister to Breviata, not to Pygsuia, and that Thecamonas trahens is much more closely related to Manchomonas than to Thecamonas oxoniensis, proving that Thecamonas as presently constituted is paraphyletic (as suspected from 18S rDNA trees: Cavalier-Smith and Chao, 2010).

3.3.2. Varisulca

Our trees are the first using numerous genes to include glissodisceans (planomonads and Mantamonas).
Planomonads are always a clade with maximal support. Nutomonas and Ancyromonas are maximally supported as sisters, which decisively supports their grouping as Ancyromonadidae, in contrast to the earlier diverging Fabomonas, in agreement with ciliary structure and much less robust 18S rDNA trees (Glücksman et al., 2013). Several CAT trees weakly group Mantamonas with Collodictyon; support for this clade varies with taxon and gene sampling from PP 0.5 to 0.87 (Figs 2, 3, S1). With LG this clade is seen only if Malawimonas is excluded (Fig. 3), and even then only with the complete gene set (59% support); whenever less well represented genes are excluded, Mantamonas groups instead with planomonads with weak support (<50%: BS 39%; <40%: 54%; <30%: 26%), forming a clade corresponding to the class Glissodiscea; the ML support value peaking at the intermediate degree of gene removal suggests this may be the correct topology. With CAT, removing genes missing in >50% of taxa moved Mantamonas away from Collodictyon, breaking Varisulca into three distinct lineages: planomonads sister to Amoebozoa/Apusozoa/opisthokonts (PP 0.88), Collodictyon sister to all of them (PP 0.52), and Mantamonas the deepest-branching sister to all other podiates; however, the CAT tree for <40% missing retained a Mantamonas/Collodictyon clade. Most strikingly, when only genes missing in less than 30% of taxa were used, CAT topology agreed with that of all three gene-removal ML trees (PP 0.64 for Glissodiscea, Fig. S2; based on both chains, which converged, unlike those of the other 73-taxon CAT trees).
The fact that varisulcan topology agrees between the two methods when the proportion of missing data is sufficiently reduced, and that this is also the only topology found that agrees with the morphological reasons for establishing class Glissodiscea (Cavalier-Smith, 2013a), makes it likely that Glissodiscea is a clade and that CAT and ML trees may be distorted in contradictory ways when the proportion of missing data is too high. Had we not conducted the sparse-gene removal study for both methods, we might have concluded (probably wrongly) that the 192- and 173-gene CAT trees were correct and that Glissodiscea are paraphyletic. Collodictyon was sister to Glissodiscea (making a varisulcan clade) in all Fig. 3 trees showing clade Glissodiscea (<50%: BS 65%; <40%: 63%; <30%: 12%, PP 0.45), weakly indicating that Varisulca are holophyletic. Whenever Malawimonas is included in ML trees it groups with Collodictyon (BS 71-85%), and Mantamonas groups instead (with weak to moderate support) with planomonads, forming a glissodiscean clade (BS 57% for Fig. 2, 61% for Fig. 4).

These trees provide the first strong evidence that both planomonads and Mantamonas branch more deeply than the common ancestor of Amoebozoa and opisthokonts, in the same tree region as the diphylleid Collodictyon, and that Varisulca collectively are the most deeply divergent of all podiates; i.e. Mantamonas does not group with apusomonads as it did on 28S rDNA trees (Glücksman et al., 2011). Less strongly, we conclude that Varisulca are probably a clade. The trees for subalignments with proportionally less missing data suggest that incomplete data are a problem in several other areas of the tree, most seriously at the base of corticates, as indicated by the changes in support values that this produced (Fig. 3).
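The sparse-gene removal test on which these conclusions rest can be sketched in a few lines of Python. This is a hedged illustration, not the SCaFoS procedure actually used: it assumes that a taxon counts as missing for a gene when it has no sequence or only gap/unknown characters, and it keeps a gene only if the fraction of missing taxa is strictly below the cutoff, matching the wording "genes missing in less than 50% of taxa". The function names and the toy genes are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: gene name -> {taxon name -> aligned sequence for that gene}.
GeneSet = Dict[str, Dict[str, str]]


def has_data(seq: str) -> bool:
    """Assumption: a taxon is present for a gene if it has at least one real residue."""
    return any(ch not in "-?X" for ch in seq)


def missing_fraction(gene: Dict[str, str], taxa: List[str]) -> float:
    """Fraction of the taxon sample with no usable sequence for this gene."""
    absent = sum(1 for taxon in taxa if not has_data(gene.get(taxon, "")))
    return absent / len(taxa)


def genes_below(genes: GeneSet, taxa: List[str], cutoff: float) -> List[str]:
    """Genes missing in strictly fewer than `cutoff` (a 0-1 fraction) of the taxa."""
    return [name for name, gene in genes.items() if missing_fraction(gene, taxa) < cutoff]


if __name__ == "__main__":
    taxa = ["t1", "t2", "t3", "t4", "t5"]
    genes = {
        "geneA": {t: "MK" for t in taxa},                           # 0/5 missing
        "geneB": {"t1": "MK", "t2": "MR", "t3": "M-", "t4": "--"},  # t4 all gaps, t5 absent: 2/5
        "geneC": {"t1": "LL", "t2": "LI"},                          # 3/5 missing
    }
    for cutoff in (0.5, 0.4, 0.3):
        print(cutoff, genes_below(genes, taxa, cutoff))
    # 0.5 ['geneA', 'geneB']
    # 0.4 ['geneA']
    # 0.3 ['geneA']
```

Each surviving gene set would then be re-concatenated and re-analysed; for the 73-taxon sample, the three cutoffs reduced the alignment to 30,538, 17,470 and 8680 amino acid positions respectively.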
To see whether such conflicts are reduced by better taxon sampling within Eozoa (but excluding the longest diplomonad branch present in Fig. S1), we also analysed an enlarged eukaryote-wide alignment of 98 taxa with RAxML-MPI v. 7.7.2 (PROTGAMMALGF). We used all 173 genes (45,190 amino acid positions, 47.15% gaps), plus three reduced alignments including only genes with <50% (24,657 amino acid positions, 34.87% gaps), <40% (17,470 amino acid positions, 27.39% gaps), and <30% (8,680 amino acid positions, 21.22% gaps) missing taxa. As Fig. 4 shows, in all four trees Malawimonas was still within podiates (BS 91%, 91%, 58%, 33%) and Varisulca (75%, 86%, 36%, 28%), and sister to Collodictyon (85%, 81%, 73%, 71%). The steady drop in support for this position, and the consequently reduced conflict with the CAT trees, supports the interpretation that there are not enough data for a site-homogeneous model to reconstruct ancestral sequences for Malawimonas and Varisulca accurately. Mantamonas was always sister to planomonads (clade Glissodiscea: BS 66%, 77%, 35%, 38%). As in the 73-taxon neozoan trees (Fig. 3), support for Apusozoa being a clade declined markedly as genes with more missing data were removed (Fig. 4: BS 86%, 77%, 41%, -). In the <30% missing-gene tree, breviates moved to group weakly with anaeromonad excavates and Amoebozoa, probably an artefact of insufficient data. Within podiates, most established groups (opisthokonts, Amoebozoa, breviates, and planomonads) were invariably maximally supported as clades. So were apusomonads, except in the two smallest (<40% and <30%) alignments, where they remained strongly holophyletic with an internal topology robustly identical to that of the 73-taxon tree.

To see whether the intrusion of Malawimonas into podiates on ML trees is just a consequence of using the LG model, we also ran a RAxML GTRGAMMA tree for the complete alignment from Fig. 4.
Its only topological difference was that Harosa were misplaced below the divergence between podiates/Malawimonas and other corticates (not shown). This suggests that, for the long-branch Harosa, the possibly overgeneralised GTR model reconstructs ancestral sequences less well than LG. LG's parameters were estimated from a much larger empirical data set than the present alignment and may therefore be more accurate. Most bootstrap values were identical, though a few non-maximal ones were higher and a similar number lower.

Supplementary Fig. S1 has a PhyloBayes tree topology identical to those of Figs 2 and 3, except for the weakly supported internal branching of Jakobea and Apusomonadida. The main features of the CAT trees are therefore insensitive to removing all 10 chromists and adding 21 taxa to the remaining groups. The added taxa include trichozoan metamonads, which were omitted from Fig. 2 for two reasons: their long branches, and the finding of Hampl et al. (2009) that their presence or absence altered the position of Malawimonas on their 48-taxon 143-gene trees. Reassuringly, using many more taxa and genes stabilised the position of Malawimonas on our Bayesian trees; neither its position nor that of any sulcozoan group altered with big changes in metamonad sampling. Unexpectedly, however, Fig. S1 shows trichozoan metamonads within podiates, separately from anaeromonad metamonads; the corresponding ML tree put Malawimonas sister to Collodictyon (78% support). The position of Malawimonas may therefore be more affected by taxon sampling on ML than on CAT trees. ML trees for an even larger sample of 122 taxa also placed Malawimonas within Varisulca, and robustly monophyletic Metamonada significantly lower (Fig. 5).

3.4. Removing faster sites less useful than proportionally reducing missing data

Earlier studies included no sulcozoans or only one. In marked contrast to them, successive removal of up to 40% of faster-evolving sites did not cause Malawimonas to move downwards and join other excavates (Fig. 3). The composition and separate positions of Varisulca and Apusozoa remained identical and strongly supported. Within Varisulca, however, Malawimonas and Mantamonas interchanged positions with 10% or more removal. The resulting planomonad/Malawimonas ‘clade’ moved immediately below the new Mantamonas/Collodictyon clade (which remained sister to opisthokonts/Apusozoa/Amoebozoa), and the positions of these two clades remained stable when 19%-40% of faster sites were removed. Thus after 10% faster-site removal the varisulcan ML tree topology was the same as with CAT, with one exception: Malawimonas was also within Varisulca, as sister to planomonads only, rather than sister to all podiates. Reducing the proportion of missing data reversed the likely artefactual holophyly of Apusozoa (Fig. 3); fast-site removal neither reduced its stability nor reversed it (Fig. 5). For some clades, therefore, reducing missing data seems a better way of reducing site-homogeneous-model tree reconstruction artefacts. Removing faster sites did move Mantamonas to be sister to Collodictyon. This made the tree contradict both Fig. 4 and the CAT <30% tree, which showed a glissodiscean clade, arguably making it worse. Faster-site removal is probably not a good way of eliminating long-branch problems, and could remove some real phylogenetic signal. Previous studies interpreted the downward movement of Malawimonas after fast-site removal as reflecting true history (Derelle and Lang, 2012; Hampl et al., 2009; Zhao et al., 2012). Our analyses, using more genes and taxa, do not support that interpretation.
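Faster-site removal, as discussed above, amounts to ranking alignment columns by an estimated evolutionary rate and deleting the fastest fraction. The sketch below is illustrative only. Real analyses estimate per-site rates with a phylogenetic model on a tree. Here that rate is replaced by a crude, tree-free proxy: the number of distinct residues in a column minus one, which is the minimum number of changes any tree would need at that site. The proxy, the function names, and the toy alignment are assumptions for illustration, not the procedure used in this or the cited studies.

```python
from typing import Dict, List, Sequence


def variability_proxy(matrix: Dict[str, str]) -> List[int]:
    """Crude per-column rate proxy: distinct non-gap residues minus one.

    This is a lower bound on the number of changes at the site on any tree;
    it stands in for model-based per-site rates, which are not computed here.
    """
    seqs = list(matrix.values())
    n_sites = len(seqs[0])
    proxy = []
    for i in range(n_sites):
        states = {s[i] for s in seqs if s[i] != "-"}
        proxy.append(max(len(states) - 1, 0))  # all-gap columns score 0
    return proxy


def remove_fastest_sites(matrix: Dict[str, str], rates: Sequence[float],
                         fraction: float) -> Dict[str, str]:
    """Delete the `fraction` of columns with the highest rates.

    Ties are broken by column index so the result is deterministic.
    """
    n_sites = len(rates)
    n_remove = int(fraction * n_sites)  # truncate toward zero
    fastest_first = sorted(range(n_sites), key=lambda i: (-rates[i], i))
    dropped = set(fastest_first[:n_remove])
    keep = [i for i in range(n_sites) if i not in dropped]
    return {t: "".join(seq[i] for i in keep) for t, seq in matrix.items()}


# Toy alignment, invented for illustration only.
ALIGNMENT = {"A": "MKVLA", "B": "MKILS", "C": "MRVLT", "D": "MKVLW"}
RATES = variability_proxy(ALIGNMENT)
for frac in (0.2, 0.4):
    print(frac, remove_fastest_sites(ALIGNMENT, RATES, frac))
```

Removed columns are lost for every taxon, including short-branch ones. This is why the procedure can discard genuine signal along with the saturated positions it targets.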
In one case, site removal wrongly placed Breviata within Amoebozoa, with apparently robust support (Zhao et al., 2012). Adding many more Sulcozoa here eliminated that signal-loss artefact. It anchored Breviata in a probably more correct position, one also supported by ultrastructural characters (Cavalier-Smith, 2013a; Heiss et al., 2013). Breaking up long branches by adding more taxa, especially shorter-branch ones, is a sounder way than faster-site removal of reducing long-branch artefacts.

In all analyses, both site-heterogeneous and site-homogeneous trees robustly place all varisulcan lineages (planomonads, Collodictyon, Mantamonas) below the common ancestor of Amoebozoa and opisthokonts. They put Apusozoa (apusomonads and breviates) above that common ancestor, specifically with opisthokonts. This confirms previous arguments that Sulcozoa are a paraphyletic phylum ancestral independently to Amoebozoa and opisthokonts (Cavalier-Smith, 2013a). Figs 2, 3, and S1 have very different taxon sampling and somewhat different gene sampling; they confirm that the paraphyly of Apusozoa on site-heterogeneous trees is robust to taxon sampling. Figs 2-5 and S1-S5 have extremely different taxon and gene sampling and use two contrasting algorithms; all show Varisulca as the deepest-branching podiate clade.

3.5. Paraphyly of excavates

All our trees show that the excavate Malawimonas is more closely related to podiates than to corticates or to jakobid or percolozoan excavates. Our best CAT trees suggest that metamonad excavates are sister to podiates plus Malawimonas (Fig. 2, where metamonads are confined to the short-branch anaeromonads). Alternatively, they may even branch within podiates (Fig. S1, which also includes the long-branch Trichozoa). We trust the latter tree less, as its large taxon number allowed running only one chain (which plateaued well). We have also found, using larger taxon samples including chromists and an additional parabasalian (not shown), that all metamonads more often than not form a robust clade on CAT trees that is sister to podiates plus Malawimonas. All but two ML trees contradictorily placed Metamonada as sister to podiates/Malawimonas plus corticates. This deeper position compared with CAT trees may, however, be a long-branch artefact. Support for it declined progressively as the proportion of missing data was lowered (93%, 83%, 67%, -), and the <30% tree (Fig. S4) placed anaeromonads, but not Trichozoa, weakly within podiates. Thus all our trees show that both Malawimonas and Metamonada are more closely related to podiates than they are to . The 159-protein (43,319 amino acid positions) 68-taxon trees of Brown et al. (2013) weakly grouped Malawimonas and the sole included metamonad (Trimastix). We never saw that grouping on our trees, which sample excavates more broadly. However, the position of their Malawimonas/Trimastix clade by CAT (and of Malawimonas alone by ML) was the same as that of these genera separately on our CAT trees. This is made less obvious by their arbitrary rooting of the tree between podiates and other eukaryotes. That rooting is contradicted by -rooted ribosomal protein and mitochondrial protein multigene trees (Lasek-Nesselquist and Gogarten, 2013; Zhao et al., 2013). It is also contradicted by the dozen characters discussed by Cavalier-Smith (2010a, 2013a). All of these place the root within Eozoa, either between excavates and Euglenozoa or (mitochondrial protein tree only) within excavates between Malawimonas and jakobids. Support for the deeper branching of metamonads than Malawimonas in Fig. 2 is not high (0.61, 79). Our trees therefore cannot exclude the possibility that Malawimonas and metamonads are a clade that is sister to podiates (Brown et al., 2013). Our taxonomically better-sampled Fig. 2 instead suggests two successively branching independent sister clades to podiates, as do several unshown CAT trees with still greater taxon sampling.

Irrespective of where one places the eukaryotic root, excavates are clearly a paraphyletic or ancestral group. At least two of the three groups podiates, corticates, and Euglenozoa evidently evolved from excavates by independently losing the ultrastructural characters that define excavates. If the root is either between jakobids and Malawimonas or between jakobids and all other eukaryotes, then all three groups evolved from excavates. If the root were between podiates and all other eukaryotes, then corticates and Euglenozoa evolved from excavates. If it were between Euglenozoa and Excavata, as the largest mass of evidence indicates, then only corticates and podiates evolved from excavates, and Euglenozoa diverged earlier.

3.6. Corticate outgroup phylogeny with contrasting algorithms and alignments

Proteins used for multigene trees consistently fit the site-heterogeneous CAT model better than the homogeneous LG model (Philippe et al., 2011). It is therefore unsurprising that the Fig. S1 CAT tree correctly shows the monophyly of vascular plants, whereas its ML tree wrongly groups the tracheophyte Selaginella with the moss Physcomitrella. The practical superiority of CAT is also shown by another improvement in Plantae. Viridiplantae are sister to Rhodophyta with maximal support on all three PhyloBayes trees, and very weakly so on the 75-taxon ML tree, which is probably correct from other evidence. The 86-, 98-, and 122-taxon ML trees instead wrongly group them with Glaucophyta (92%, 32%, and 66% support).
Intrusion of Cryptista into Plantae on one PhyloBayes tree (Fig. 2), sometimes also seen by others (Brown et al., 2013; Burki et al., 2012), is probably incorrect, and did not occur in the better-sampled Fig. 3. Some ML trees (Figs 2, 4) show the same, but Figs 3 and 5 show holophyletic Plantae. Instability in the ML basal branching order within Plantae, and the inclusion or not of Cryptista, may simply result from basal corticate stems being shorter than those of Sulcozoa. That makes their order even harder to reconstruct reliably.

Corticates normally form a reasonably well supported clade on multigene trees, as they do here. Their basal branching order, however, is notoriously unstable and differs in almost all recently published trees (Burki et al., 2009, 2012, 2013; Brown et al., 2013), which the inconsistencies amongst our trees for the branching order of confirm. The simplest explanation of that instability is that only extremely short stems separate basal corticate lineages on the tree, so very few subsequently stable amino acid substitutions occurred in these short intervals. Combined with the large number of very early diverging corticate clades, this makes basal tree reconstruction intrinsically harder than for podiates or Eozoa. It also indicates that Plantae and Chromista both underwent an extremely rapid, near-simultaneous radiation shortly after the symbiotic origin of plastids and the secondary transfer of a red algal plastid to chromists (Cavalier-Smith, 2013b). The addition here of several Varisulca shows that the basal podiate radiation also took place essentially contemporaneously with that of corticates and was similarly rapid.

3.7. Opisthokont phylogeny

In marked contrast to podiates and corticates, basal opisthokont radiations were later and well spread out over time. This makes them inherently easier to resolve, and they are robustly supported on all trees.
However, Figs 3 and 4 include the flagellate parasite Rozella. It is sometimes classified as a chytridiomycete fungus, but was recently excluded from Fungi and placed within the protozoan phylum Choanozoa (Cavalier-Smith, 2013a). In Fig. 3 the four chains did not converge. This was partly because in one chain Rozella was sister to fungi, whereas in another it was within Chytridiomycota as sister to Piromyces, and partly because of branching differences within chromists. The same lack of convergence, with contrasting positions for Rozella, was evident for the <50% and <40% missing-taxa alignments. Strikingly, when the proportion of missing data was further reduced below 30%, the CAT chains converged (maxdiff 0.233). Rozella was then sister to all fungi with maximal support, not within Chytridiomycota, and was excluded from fungi with strong support (PP 0.98, Fig. S2), in agreement with the ML tree of James et al. (2013). This suggests that proportionally higher amounts of missing data exacerbate the difficulty of resolving its position. James et al. (2013) showed that on ML 200-gene trees Rozella is sister to microsporidia, which were excluded from our trees because of their very long branch. So neither Rozella nor microsporidia branch within fungi; they are jointly their sisters. Allomycetes and are robustly sisters here, not the alternatives on some previous trees (e.g. Torruella et al., 2012; see Cavalier-Smith (2013a) for discussion). This argues against separate fungal phyla for them, and for including both in Chytridiomycota.

Previously the position of Filasterea within opisthokonts was uncertain. Some multigene trees put them as sister to choanoflagellates plus animals, forming a group called Filozoa (Shalchian-Tabrizi et al., 2008). Others with different taxon/gene sampling (e.g. Brown et al., 2013) put them as sister to Ichthyosporea. Three of our site-heterogeneous trees maximally support the latter grouping (Figs 2, S1, and Fig. 3 for the fewest genes). However, the three most gene-rich trees for one taxon sample (the smallest) aberrantly placed Filasterea with animals/choanoflagellates (Fig. 3). The tree for this taxon sample with proportionally least missing data nevertheless has Filasterea and Ichthyosporea as sisters (PP 0.99, Fig. S2), as in all 13 ML trees and in Brown et al. (2013). 16 of 19 trees show Filasterea/Ichthyosporea as a clade, including all with the richest taxon sampling, mostly with strong (usually maximal) support. We therefore name it here:

3.7.1. New opisthokont superclass Filosporidia and probable clade

New choanozoan superclass Filosporidia Cavalier-Smith (under ICZN). Diagnosis: trophic cells either naked with long thin filodigits (Cavalier-Smith, 2013a) not organised as a collar, or walled cells that usually reproduce by naked amoebae or uniciliate zoospores. Phylogenetically includes all Choanofila except choanoflagellates, i.e. Filasterea, Corallochytrea, and Ichthyosporea.

4. Discussion

4.1. Overall eukaryote phylogeny

A recent synthesis suggested that podiates first evolved pellicle, ciliary gliding, and pseudopodia in their last common ancestor and constitute one of only three eukaryotic supergroups (Cavalier-Smith, 2013a). The other two supergroups (corticates and Eozoa) were ancestrally non-pseudopodial biciliates, though pseudopodia evolved independently in a minority of their lineages (Cavalier-Smith, 1997). Notable examples are filose and reticulose pseudopodia in the chromist and eruptive lobose pseudopodia in the excavate phylum Percolozoa (Heterolobosea and others: Cavalier-Smith and Nikolaev, 2008). All are generally morphologically distinct from podiate pseudopodia. Our Bayesian trees clearly separate the three supergroups and strongly support the monophyly of both podiates and corticates. ML support for podiates, however, is often weak, the strongest being 80% on the 10% fast-site-removal tree.
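Whether a named group can be a clade at all, and whether that depends on root position, is a purely topological question that can be checked mechanically. The sketch below uses a deliberately simplified, hypothetical nine-terminal topology. It loosely follows the branching order described above for our CAT trees, with "Jakobea+Percolozoa" standing in for jakobid and percolozoan excavates and Apusozoa treated as one terminal; it is not one of our trees. For every possible root edge, the code lists the rootings under which a given set of terminals forms a clade. An empty list means the set is non-holophyletic wherever the root is placed, as argued above for excavates. All names in the code are illustrative assumptions.

```python
from itertools import count
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple, Union

Tree = Union[str, tuple]  # a terminal name, or a tuple of subtrees


def to_adjacency(tree: Tree) -> Tuple[Dict[str, List[str]], Set[str]]:
    """Turn a nested-tuple tree into an undirected adjacency list."""
    adj: Dict[str, List[str]] = {}
    leaves: Set[str] = set()
    ids = count()

    def visit(node: Tree) -> str:
        if isinstance(node, str):
            adj.setdefault(node, [])
            leaves.add(node)
            return node
        name = f"_n{next(ids)}"  # label for an internal node
        adj[name] = []
        for child in node:
            c = visit(child)
            adj[name].append(c)
            adj[c].append(name)
        return name

    visit(tree)
    return adj, leaves


def clades_rooted_on(adj: Dict[str, List[str]], leaves: Set[str],
                     u: str, v: str) -> Set[FrozenSet[str]]:
    """Leaf sets of all subtrees when the tree is rooted on edge (u, v)."""
    clades: Set[FrozenSet[str]] = set()

    def collect(node: str, parent: str) -> FrozenSet[str]:
        below = frozenset([node]) if node in leaves else frozenset()
        for nb in adj[node]:
            if nb != parent:
                below |= collect(nb, node)
        clades.add(below)
        return below

    collect(u, v)
    collect(v, u)
    return clades


def rootings_where_clade(adj: Dict[str, List[str]], leaves: Set[str],
                         group: Iterable[str]) -> List[Tuple[str, ...]]:
    """Root edges under which `group` is a clade; an empty list means never."""
    target = frozenset(group)
    edges = sorted({tuple(sorted((a, b))) for a in adj for b in adj[a]})
    return [e for e in edges if target in clades_rooted_on(adj, leaves, *e)]


# Hypothetical, simplified topology for illustration only.
TREE = ("Euglenozoa", ("Jakobea+Percolozoa", ("Corticates", ("Metamonada",
        ("Malawimonas", ("Varisulca", ("Amoebozoa",
        ("Apusozoa", "Opisthokonta"))))))))
ADJ, LEAVES = to_adjacency(TREE)

GROUPS = {
    "excavates": {"Jakobea+Percolozoa", "Metamonada", "Malawimonas"},
    "Sulcozoa": {"Varisulca", "Apusozoa"},
    "podiates": {"Varisulca", "Amoebozoa", "Apusozoa", "Opisthokonta"},
}
for label, members in GROUPS.items():
    n = len(rootings_where_clade(ADJ, LEAVES, members))
    print(f"{label}: clade under {n} of {len(ADJ) - 1} possible root edges")
```

On this toy topology the podiates are a clade for every root edge outside them. Excavates and Sulcozoa are a clade for none, which is the sense in which their paraphyly does not depend on where the root lies.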
Adding six Sulcozoa to the tree improves understanding of large-scale eukaryote phylogeny by showing that all Varisulca consistently branch more deeply within podiates than do Apusozoa or Amoebozoa. Irrespective of where one places the eukaryotic root, on all our trees both Sulcozoa and excavates in the classical sense (Simpson and Patterson, 1999) are highly paraphyletic ancestral groups. This is what O’Kelly (1993) originally assumed for excavates; he initiated the excavate concept, though not the name (Simpson and Patterson, 1999). Multigene trees previously put apusomonads as sister to opisthokonts (Brown et al., 2013; Derelle and Lang, 2012; Torruella et al., 2012), in accord with the earliest rDNA trees for an apusomonad (Cavalier-Smith and Chao, 1995). Adding more apusomonads here strongly supports this relationship between Apusozoa and opisthokonts. It also robustly confirms that apusomonads are an ancient, genetically diverse clade that probably first radiated even earlier than animals or fungi (Cavalier-Smith and Chao, 2003, 2010). Our CAT trees indicate for the first time that Malawimonas and metamonads are probably successive sisters to podiates. They also indicate that corticates are probably sisters to podiates/Malawimonas/metamonads, not to podiates alone.

4.2. Sulcozoan phylogeny and podiate evolution

Our multigene trees, together with those of Brown et al. (2013), provide strong sequence evidence for a deduction from cytoskeletal morphology: that the uniciliate breviate amoebae are more closely related to apusomonads than to Amoebozoa (Cavalier-Smith, 2013a). Our Bayesian trees strongly suggest that Apusozoa are paraphyletic: apusomonads are sisters to opisthokonts, whereas breviates branch slightly more deeply as sister to opisthokonts plus apusomonads.
That is more likely to be correct than the alternative grouping of Breviata with apusomonads by ML. The homogeneous LG model used by RAxML fits the evolutionary behaviour of the genes included in our analyses less well than does the site-heterogeneous CAT model, as Brown et al. (2013) also argued. Our systematic removal of taxonomically less well represented genes reduces the conflict between ML and CAT trees, eventually making the ML trees agree with the CAT trees. This strongly supports the interpretation that apusozoan holophyly on ML trees is an artefact. It also suggests that missing data make inference of ancestral states for Apusozoa more difficult for homogeneous models than for site-heterogeneous ones, thereby distorting tree topology (see the excellent discussion of these problems in Roure et al. (2013)). Breviates are clearly unrelated to Amoebozoa, contrary to original assumptions (Cavalier-Smith et al., 2004) and to early multigene trees that used only homogeneous models and included no Sulcozoa (Minge et al., 2009). Though more closely related to apusomonads than to Amoebozoa, breviates diverged from apusomonads before the primary radiation of extant apusomonads. Our trees represent three of the five deep-branching clades of that poorly resolved basal apusomonad radiation (Cavalier-Smith and Chao, 2010). The apusomonad/opisthokont affinity of breviates shown here and independently by Brown et al. (2013) agrees with the taxonomic composition of the sulcozoan subphylum Apusozoa as recently revised. That revision added Breviatea and excluded Glissodiscea and Rigifilida, both now grouped instead with Collodictyon and other diphylleids as the new sulcozoan subphylum Varisulca (Cavalier-Smith, 2013a).
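The intuition for why site-heterogeneous profile models such as CAT can fit better than a single set of equilibrium frequencies can be shown with a deliberately stripped-down toy. It ignores the tree and the substitution process entirely and scores only the residue composition of each alignment column. Each column is scored twice: once under a single frequency vector shared by all sites (the homogeneous assumption), and once under a fixed two-component mixture of site profiles. The four-letter alphabet, profiles, and weights are invented, and nothing is estimated. The extra parameters of the mixture are also not penalised. The sketch therefore illustrates the idea only; it is not how PhyloBayes or RAxML compute likelihoods.

```python
import math
from typing import Dict, List, Sequence

Profile = Dict[str, float]  # residue -> equilibrium frequency


def loglik_single(columns: Sequence[str], freqs: Profile) -> float:
    """Every site shares one frequency vector (homogeneous assumption)."""
    return sum(math.log(freqs[a]) for col in columns for a in col if a != "-")


def loglik_mixture(columns: Sequence[str], profiles: List[Profile],
                   weights: List[float]) -> float:
    """Each site comes from one of several profiles, summed out per site."""
    total = 0.0
    for col in columns:
        residues = [a for a in col if a != "-"]
        per_profile = [math.log(w) + sum(math.log(p[a]) for a in residues)
                       for p, w in zip(profiles, weights)]
        top = max(per_profile)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(x - top) for x in per_profile))
    return total


# Invented toy data: one hydrophobic-looking and one charged-looking column.
COLUMNS = ["LLIL", "DEDD"]
UNIFORM: Profile = {"L": 0.25, "I": 0.25, "D": 0.25, "E": 0.25}
HYDROPHOBIC: Profile = {"L": 0.45, "I": 0.45, "D": 0.05, "E": 0.05}
CHARGED: Profile = {"L": 0.05, "I": 0.05, "D": 0.45, "E": 0.45}

single = loglik_single(COLUMNS, UNIFORM)
mixture = loglik_mixture(COLUMNS, [HYDROPHOBIC, CHARGED], [0.5, 0.5])
print(f"single profile: {single:.3f}  two-profile mixture: {mixture:.3f}")
```

A one-component mixture with weight 1 reduces exactly to the single-profile score. The gain comes only from letting different columns draw on different compositions, which a homogeneous model cannot do.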
We found here, for the first time, a strong grouping of Breviata and Subulatomonas. It is consistent with both sharing a tenuous relict pellicular layer (Cavalier-Smith, 2013a), which seems absent from Pygsuia (Brown et al., 2013). This implies that the Pygsuia lineage lost the typical sulcozoan pellicular layer after it diverged from the common ancestor of Breviata and Subulatomonas. A similar loss of a tenuous pellicular layer was postulated for Sulcomonas within diphylleids (Cavalier-Smith, 2013a). A thorough study of Breviata cytoskeletal ultrastructure (Heiss et al., 2013) strengthens the view that its single cilium is a derived state resulting from loss of the posterior cilium (Cavalier-Smith, 2013a). That loss was convergent with the similar loss in the uniciliate phalansteriids and Archamoebae (both Amoebozoa). In marked contrast, opisthokonts lost the anterior cilium and simplified the ancestral sulcozoan skeleton in a different way, by convergently evolving more symmetrical cone-like microtubular arrays. Our trees are in harmony with these ultrastructural interpretations. This congruence reinforces the idea that podiate ancestors were grooved biciliate cells with an asymmetric excavate-like cytoskeleton (Cavalier-Smith, 2013a).

That means that the ‘unikont’ condition (having a single centriole) of phalansteriids and Archamoebae is not ancestral for podiates, as was once postulated (Cavalier-Smith, 2000, 2002). It is not even ancestral for Amoebozoa, which must now be considered originally biciliate (Cavalier-Smith, 2013a; Heiss et al., 2013). We have clearly shown three things: breviates are derived from biciliate Sulcozoa; on our trees both opisthokonts and Amoebozoa are nested within Sulcozoa; and Sulcozoa comprise three distinct clades. Together these mean that the former grouping of opisthokonts plus Amoebozoa alone, originally called unikonts (Cavalier-Smith, 2002), is polyphyletic.
Usage of ‘unikonts’ has sometimes been extended to include Apusozoa (e.g. Torruella et al., 2012). The name is now inappropriate for such a broader group, being phylogenetically and descriptively misleading. Cavalier-Smith (2013a) proposed the clade name ‘podiates’ to embrace the original unikonts (opisthokonts and Amoebozoa; often uniciliate but seldom unicentriolar) plus all Sulcozoa (generally biciliate, sometimes uni-, tetra- or non-ciliate). It is a more appropriate substitute for that broadened usage of unikont.

Bayesian trees with proportionally more missing data strongly suggest that planomonads are the deepest-branching Sulcozoa (Figs 2, 3, S1). If so, their ancestor would probably have lacked pseudopodia. Only the dorsal pellicle and ciliary gliding (causing ventral vane loss) would have evolved when Sulcozoa evolved from a Malawimonas-like ancestor. Pseudopodia would then have arisen slightly later, in the common ancestor of all Sulcozoa other than planomonads (Fig. 2). Myosin II (used for podiate pseudopodial motility) has not been found in planomonads (Richards and Cavalier-Smith, 2005). Nor has it been found in non-podiates other than the heterolobosean, which might have acquired it by lateral gene transfer. Myosin II might therefore have originated immediately after other podiates diverged from planomonads, at the same time as pseudopodia, and been instrumental in their origin. Alternatively, the ML topology of Figs 2-5 and of the CAT tree with proportionally least missing data (Fig. S2) may be correct. In that case the varisulcan paraphyly on trees with proportionally more missing data is a tree-reconstruction artefact (which is not unlikely), and Varisulca are holophyletic. Pseudopodia would then probably have evolved in the ancestral podiate, and planomonads (the only Sulcozoa without pseudopods) would have lost them, as previously postulated (Cavalier-Smith, 2013a). Genome sequences are needed for all varisulcan lineages to see whether myosin II also originated then.
If one judges lateral gene transfer of myosin II to be unlikely (Sebé-Pedrós et al., 2014), then it would probably have arisen in an early neokaryote and been lost independently by Giardia and corticates. In either case, its extensive use by early podiates was probably a significant feature of their pseudopodial function. Complete genome sequences for all the deepest-branching sulcozoan lineages would be valuable, not only to firm up their phylogeny. They would also clarify the origins of the unique podiate pseudopodial cytoskeleton, which later played a central role in the development, physiology, and evolution of animals and Amoebozoa.

Within planomonads, the sister grouping of Nutomonas and Ancyromonas is much stronger on our trees than it was on rDNA trees. There, the extremely long branch of Nutomonas 18S rDNA reduced bootstrap support for this relationship. We conclude that Ancyromonadidae Glücksman et al. (2013) is certainly holophyletic, as are Planomonadida and Apusomonadida.

Ultrastructure is unavailable for Mantamonas, but should clearly tell us which of three positions is correct. Mantamonads may be more closely related to planomonads (ML trees and the <30% missing-gene CAT tree) or to diphylleids (most other CAT trees in Figs 2, 3, S1). Alternatively, they may be a third varisulcan lineage and the most divergent podiate of all (Fig. 3, CAT <50% missing-gene sample). Even without ultrastructure, our trees show that the strong grouping of Mantamonas with apusomonads on 28S rDNA trees (Glücksman et al., 2011) was phylogenetically incorrect. By contrast, 18S rDNA trees closely fit our multigene trees; they weakly grouped Mantamonas with planomonads or did not significantly resolve its position relative to other Varisulca (Glücksman et al., 2011, 2013). This shows that a single-gene tree with low bootstrap support may sometimes be truer than a contradictory one with high support.
This may occasionally also be true of multigene trees (Philippe et al., 2011). The best test of their validity is therefore not the level of statistical ‘support’. It is congruence with independent data; stability to the addition of many more taxa; stability to reduction in the proportion of missing data; and consistency when better phylogenetic algorithms are used.

The closeness of the branches at the base of podiates revealed by our trees implies a rapid early radiation. That radiation is even more striking at the base of corticates, especially compared with the better-spaced and well-resolved deepest branches following the putatively primary eukaryotic divergence of Euglenozoa and excavates. This explosive, bush-like radiation near the base of neozoa makes trees difficult to resolve even with scores of genes, as does incompleteness in the data. The observed weak resolution is to be expected in these circumstances (Philippe et al., 2011). Though Apusozoa are probably paraphyletic and Varisulca probably holophyletic, further work is needed to confirm this. Better resolution than achieved here will require full genome sequences or near-complete transcriptomes for several representatives of each sulcozoan lineage. These include Rigifilida (varisulcan non-flagellate filose amoebae: Yabuki et al., 2013), for which transcriptomic data are currently lacking. Our trees unsurprisingly conflict over the likely paraphyly or holophyly of Apusozoa and Varisulca; we have resolved and partially explained that conflict. In contrast, all our trees unambiguously indicate that Sulcozoa as a whole are paraphyletic. All Apusozoa are more closely related to opisthokonts than they are to Amoebozoa, whereas all Varisulca branch below the bifurcation between Amoebozoa and Apusozoa/opisthokonts. Support for that conclusion varied with taxon sampling, gene sampling, and method, reaching 1.0 with PhyloBayes (Figs 2, 3) and 88% with ML (Fig. 5).
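Both the posterior clade probabilities quoted above and the maxdiff convergence statistic reported earlier for the CAT chains derive from bipartition (split) frequencies in samples of trees. The code below is a minimal sketch under stated assumptions. It assumes each sampled unrooted tree has already been reduced to a list of its non-trivial splits, each given as one side, i.e. a set of taxon names. It normalises every split to the side lacking a fixed reference taxon and computes per-chain split frequencies. It then takes maxdiff for two chains as the largest absolute difference in frequency over all splits seen in either chain. This follows the usual two-chain definition of the statistic, but it is an illustrative reimplementation, not PhyloBayes code; the toy chains are invented.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Set

Split = FrozenSet[str]
SampledTree = List[Iterable[str]]  # non-trivial splits, each given as one side


def normalise(side: Iterable[str], taxa: Set[str]) -> Split:
    """Represent a split by the side that does not contain the reference taxon."""
    side = frozenset(side)
    ref = min(taxa)  # alphabetically first taxon serves as a fixed reference
    return frozenset(taxa - side) if ref in side else side


def split_frequencies(chain: List[SampledTree], taxa: Set[str]) -> Dict[Split, float]:
    """Fraction of sampled trees in which each split occurs."""
    counts: Counter = Counter()
    for tree in chain:
        for split in {normalise(side, taxa) for side in tree}:
            counts[split] += 1
    return {s: c / len(chain) for s, c in counts.items()}


def maxdiff(chain1: List[SampledTree], chain2: List[SampledTree],
            taxa: Set[str]) -> float:
    """Largest between-chain difference in split frequency (two chains)."""
    f1 = split_frequencies(chain1, taxa)
    f2 = split_frequencies(chain2, taxa)
    return max((abs(f1.get(s, 0.0) - f2.get(s, 0.0)) for s in set(f1) | set(f2)),
               default=0.0)


# Invented toy chains over five taxa.
TOY_TAXA = {"A", "B", "C", "D", "E"}
T1 = [{"A", "B"}, {"A", "B", "C"}]  # ((A,B),C,(D,E))
T2 = [{"A", "C"}, {"D", "E"}]       # ((A,C),B,(D,E))
CHAIN1 = [T1, T1, T1, T2]
CHAIN2 = [T1, T2, T2, T2]
print(split_frequencies(CHAIN1, TOY_TAXA))
print("maxdiff =", maxdiff(CHAIN1, CHAIN2, TOY_TAXA))
```

Normalising each split to one canonical side matters because the same bipartition can be written either way (here ABC|DE is given as {A, B, C} in T1 and as {D, E} in T2).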
We conclude that the sulcozoan body plan was most likely ancestral to all other podiate eukaryotes. In other words, dorsal pellicle, ventral pseudopodia, and posterior ciliary gliding all evolved in the podiate common ancestor of Varisulca, Amoebozoa, Apusozoa, and opisthokonts. That ancestor already had a ventral groove supported by a cytoskeleton of two asymmetric posterior microtubular centriolar roots (R2 being split). It inherited this from a common ancestor of Malawimonas and podiates, i.e. a loukozoan excavate. If so, opisthokonts and Amoebozoa must have lost the ancestral sulcozoan ciliary gliding, ventral groove, and dorsal pellicle independently, as earlier argued (Cavalier-Smith, 2013a). This gives strong evidence that descendants of Sulcozoa underwent four independent ciliary losses and radical cytoskeletal simplifications. These gave rise to the more symmetric cone-like skeletons found in opisthokonts and breviate Sulcozoa, both of which retained two centrioles. They also gave rise to those of the amoebozoan phalansteriids and Archamoebae, which lost the posterior centriole altogether. Those two groups may have lost it in a common ancestor. Independent loss is more likely, however, if Archamoebae are more closely related than phalansteriids are to the basically biciliate myxogastrid slime moulds, as some 18S rDNA trees suggest.

A theoretically possible alternative is that Sulcozoa are polyphyletic. But that would mean that dorsal pellicles, posterior ciliary gliding, and ventral pseudopodia each evolved three times independently. This three-fold combination would also have arisen only in this part of the eukaryotic tree and within a relatively short time interval. That requires nine unlikely coincidences and is a most unparsimonious hypothesis.
The simplest interpretation is that posterior ciliary gliding evolved once only in podiates, in their last common ancestor, and played a key role in their early evolution. It probably did so independently in the early evolution of Euglenozoa and of the chromist phylum Cercozoa, and perhaps even in the origin of cilia (Cavalier-Smith, 2014). As in those two phyla, ancestral podiate ciliary gliding was probably replaced by swimming several times, as ancestrally surface-associated Sulcozoa gave rise to swimming planktonic descendant lineages. Note that Amorphea (Adl et al., 2012) is not a synonym for podiates, as it refers only to the podiate subclade that excludes Varisulca.

4.3. Paraphyly of Excavata and the position of Malawimonas

Even the first 143-gene (35,584 positions) ML tree for 48 eukaryotes, using the WAG substitution matrix, showed Excavata as paraphyletic, with Malawimonas sister to podiates (no Sulcozoa included) (Hampl et al., 2009). That result was argued to be an artefact of including long-branch Trichozoa. However, both CAT and ML-LG can find the same topology when Trichozoa are excluded (Fig. 2; Brown et al., 2013), so that argument is no longer defensible. Two other previous multigene papers also provided suggestive evidence that Malawimonas may be more closely related to opisthokonts/Amoebozoa than to other excavates (Derelle and Lang, 2012; Zhao et al., 2012). All three of our most gene-rich Bayesian trees show this. One of those papers showed Malawimonas as sister to Collodictyon (Zhao et al., 2012), a position generally assumed to be a tree-reconstruction artefact. That work showed that the position of Malawimonas was exceptionally sensitive to taxon and molecular site sampling. Our CAT trees are more taxon-rich and gene-rich. They provide further evidence that Malawimonas is more closely related to podiates than to other excavates or corticates, and suggest that the same may also be true for metamonads.
That is consistent with the argument that supergroup Eozoa is ancestral to neozoa and thus paraphyletic (Cavalier-Smith, 2010a,c, 2013a, 2014). The idea that a Malawimonas-like cell was ancestral to Sulcozoa (Cavalier-Smith and Chao, 2010) and to other podiates (Cavalier-Smith, 2013a) is now very strongly supported. This holds whether the excavate Malawimonas is truly sister to podiates, as our Bayesian and a few ML trees indicate, or is instead sister to neozoa (or even within podiates), as in other ML trees. Adding six Sulcozoa and many more genes to the tree somewhat stabilised the position of Malawimonas compared with earlier, taxonomically sparser studies (Hampl et al., 2009; Zhao et al., 2012). Several factors still make the branching order in this region somewhat problematic: the closeness of Malawimonas to the base of the podiate/corticate bifurcation and to Varisulca, and the incompleteness of the data for both Malawimonas and Varisulca in the absence of genome sequences. Nonetheless, Malawimonas is probably sister to podiates, whereas corticates include Colponema, which similarly has a ventral groove with an almost indistinguishable microtubular cytoskeleton and a posterior cilium with a ventral vane (Cavalier-Smith, 2013a; Mignot and Brugerolle, 1975). This fits the idea that podiates and corticates both evolved from a Malawimonas-like loukozoan excavate (Cavalier-Smith, 2013a), but suggests that they did so independently and are not directly sisters. If Malawimonas really is sister to podiates, and Metamonada sister to this clade as the Fig. 2 CAT tree indicates, even 'core excavates' are paraphyletic.
Contrary to a common misconception, demonstrating the paraphyly of ancestral groups like Excavata, Sulcozoa, Apusozoa, and Choanozoa does not make them less interesting for evolution or unacceptable as taxa.
Both paraphyletic and holophyletic groups arise from the monophyletic origin of a particular character set, in contrast to polyphyletic ones, which involve character convergence. The distinction between paraphyly and holophyly is more artificial than that between monophyly (i.e. paraphyly plus holophyly: Ashlock, 1974) and polyphyly, and evolutionarily less important. Clear demonstration of paraphyly reveals ancestral groups and so helps us understand early cell evolution, because it allows one to specify successive evolutionary steps in morphology with confidence. An example is the vertebrate progression through jawless fish, jawed fish, amphibians, and reptiles to (independently) birds and mammals. Paraphyletic groups per se are thus not evolutionarily misleading but highly informative; what is confusing is to treat paraphyletic groups as if they were holophyletic clades (see Cavalier-Smith, 2010b).

4.4. Importance of increasing gene and taxon representation in multigene trees
We found two instances (paraphyly of Apusozoa and exclusion of Rozella from Fungi) where removing genes with more missing data increased the accuracy of ML, making it agree more with CAT trees. A third case was Varisulca: removing genes with more missing data changed both CAT and ML trees in Fig. 3 and harmonised their previously conflicting topologies. This provides evidence for the holophyly of Glissodiscea and Varisulca, and shows that excessive proportions of missing data had distorted both CAT and ML trees in different ways. A fourth example of likely distortion was the apparent paraphyly of the Ichthyosporea/Filasterea clade (Filosporidia) in the CAT tree for one taxon sample only (Fig. 3). This indicates that, with such distortion, a model with globally better fit can sometimes give a less accurate topology for an individual clade than one with globally worse fit. One possible reason, as Roure et al. (2013) suggest, is that errors of an inaccurate model may occasionally reinforce rather than contradict the true topology, depending on the historical evolutionary accidents in a particular clade. That particular tree had a smaller taxon sample than the others (Figs 2, 4, S3-5) that showed holophyletic Filosporidia, consistent with richer taxon sampling generally improving trees.
We showed that analysing alignments with differing proportions of partially missing genes can help identify such local exceptions to the reasonable expectation that models with globally better fit to the data should generally yield more accurate trees. Our extensive analyses of the effects of proportionally reducing missing data had not previously been done for deep protist branching. They confirm earlier studies with other data showing that potentially misleading tree distortions can affect both CAT and ML tree-reconstruction algorithms. They also support the argument of Roure et al. (2013) that trees with fewer but taxonomically more completely represented genes can sometimes be more accurate than those with many more genes, if too high a fraction of the latter's genes are unequally represented across taxa.
In contrast to these examples, the stem at the base of the clade comprising opisthokonts, Apusozoa, and Amoebozoa is consistently well supported by both ML and CAT trees, showing that the placement of Varisulca as the deepest-branching podiate lineage is robust. This difference is perhaps attributable to the availability of whole genome sequences for many opisthokonts and Amoebozoa and for one apusozoan, whereas none is available for any varisulcan. The basal branching of Sulcozoa, and therefore of podiates as a whole, will probably be resolved more robustly only when at least one complete genome or near-complete transcriptome is available for each of the three varisulcan lineages as well as for a breviate.
The same problem applies to the still more uncertain position of Mantamonas. The stem at the base of the clade containing it and Collodictyon in Fig. 4 is just as short. Indeed, this grouping is only weakly supported, and only by the full alignment and, for CAT, by the <40% alignment (Fig. 3); it disappears for both ML and CAT with proportionally less missing data (Fig. 3). ML always recovered a glissodiscean clade. By contrast, the CAT tree with <50% missing data placed Mantamonas on its own, as sister neither to planomonads nor to Collodictyon but to Collodictyon plus Amoebozoa, Apusozoa, and opisthokonts (low support, 0.51). The topology of holophyletic Varisulca in the <30% missing-gene tree (Fig. S2) ought technically to be the most reliable. It is in harmony with ML trees that exclude Malawimonas and with all the cell evolutionary arguments underlying the establishment of that subphylum (Cavalier-Smith, 2013a). Although that makes it overall the most likely topology, more nearly complete transcriptomes for taxa branching in this region are essential to test it more thoroughly. Unlike all other sulcozoan clades, diphylleids and Mantamonas are each represented by only a single species, which makes accurate tree construction even more difficult. It will therefore be desirable to break up these long branches by adding other members of both groups (and a rigifilid, which rDNA suggests is related to diphylleids: Yabuki et al., 2013).
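The missing-data tests discussed in this section rely on subalignments built by dropping genes that are absent from more than a given fraction of taxa (the <50%, <40%, and <30% missing-gene alignments). This section does not describe the software used for that step. The following is therefore only a minimal Python sketch of the filtering logic. It assumes each gene is summarised by its alignment length and the set of taxa for which it has data. The function names and toy data are hypothetical, and excluding a gene that is missing from exactly the threshold fraction of taxa is an assumption.

```python
"""Sketch: build '<X% missing' subalignments by dropping sparsely sampled genes.

Hypothetical helper names and toy data; not the authors' pipeline.
"""
from typing import Dict, List, Set, Tuple


def missing_count(present: Set[str], taxa: List[str]) -> int:
    """Number of taxa in the analysis that lack data for a gene."""
    return sum(1 for t in taxa if t not in present)


def subalignment(genes: Dict[str, Tuple[int, Set[str]]],
                 taxa: List[str],
                 max_missing_pct: int) -> Tuple[List[str], int]:
    """Return (kept gene names, concatenated length in amino acids).

    A gene is kept only if it is missing from strictly fewer than
    `max_missing_pct` percent of taxa. Integer arithmetic avoids
    floating-point ties at the threshold.
    """
    n = len(taxa)
    kept = [name for name in sorted(genes)
            if 100 * missing_count(genes[name][1], taxa) < max_missing_pct * n]
    total = sum(genes[name][0] for name in kept)
    return kept, total


if __name__ == "__main__":
    taxa = [f"taxon{i}" for i in range(10)]
    # gene name -> (alignment length, taxa with data); purely illustrative
    genes = {
        "geneA": (300, set(taxa)),        # 0% missing
        "geneB": (200, set(taxa[:7])),    # 30% missing
        "geneC": (150, set(taxa[:6])),    # 40% missing
        "geneD": (250, set(taxa[:5])),    # 50% missing
        "geneE": (100, set(taxa[:8])),    # 20% missing
    }
    for pct in (50, 40, 30):
        kept, total = subalignment(genes, taxa, pct)
        print(f"<{pct}% missing: {kept} -> {total} positions")
```

With these toy numbers, the <50% filter keeps four genes (750 positions), <40% keeps three (600), and <30% keeps two (400). Each stricter cut-off trades alignment length for taxonomic completeness, which is the trade-off weighed in this section.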

5. Conclusions
We conclude that Varisulca (diphylleids plus Glissodiscea) are the most divergent podiates and are probably holophyletic, with important implications for the origin and earliest evolution of podiates. In harmony with 18S rDNA trees, and contrary to 28S rDNA trees (Glücksman et al., 2011), we conclusively show that Mantamonas is more closely related to planomonads and diphylleids than to apusomonads. We show for the first time that, within breviates, Subulatomonas is more closely related to Breviata than to Pygsuia, and that the apusomonad genus Thecamonas as currently constituted is paraphyletic or polyphyletic. Our analyses clarify the reasons for, and largely resolve, previous contradictions between site-homogeneous ML and site-heterogeneous Bayesian trees concerning the holophyly or paraphyly of Apusozoa. They also resolve similar contradictions seen here for Varisulca and two previously inconsistent branching patterns within opisthokonts. They confirm the conclusion that Apusozoa are paraphyletic (Brown et al., 2013): as breviates and apusomonads are probably not sisters, their shared apusozoan phenotype is ancestral to that of opisthokonts.
Our trees are consistent with the subdivision of Sulcozoa into two subphyla. They confirm that breviates belong in Apusozoa (not Amoebozoa) and that Apusozoa are sister to opisthokonts. They also show that Sulcozoa are paraphyletic and thus ancestral to all other podiate eukaryotes, with subphylum Varisulca sister to opisthokonts plus Apusozoa and Amoebozoa. This means that the simple microtubular skeletons of animal and other opisthokont cells arose by radical simplification of a much more complex cell body plan. That body plan first evolved in association with a ventral feeding groove in the ancestor of biciliate excavates, from which all eukaryotes other than Euglenozoa are argued to have descended (Cavalier-Smith, 2010a, 2013a).
Overall, once initial conflicts between CAT and ML trees are resolved by reducing the proportion of missing data, our trees provide substantial support for the thesis that the sulcozoan dorsal pellicle, ventral pseudopodia, and posterior ciliary gliding all evolved simultaneously and coadaptively. On this view, they arose during the origin of ancestral podiates from a swimming, non-gliding, non-pseudopodial Malawimonas-like excavate (Cavalier-Smith, 2010a, 2013a). Moreover, CAT trees place Malawimonas and anaeromonad metamonads as sisters to podiates, as in Brown et al. (2013), further strengthening that idea. Our analyses clarify hitherto contradictory interpretations of the phylogenetic position of malawimonads and make it likely that Loukozoa and excavates as a whole are paraphyletic.

Appendix A. Electronic Supplementary Information is available in the online version of the paper.

Acknowledgements
This work was supported by the Natural Environment Research Council (grant number NE/E004156/1) and by the Leverhulme Trust (grant number R1008101). RL thanks NERC for a research studentship. We thank Oxford University Advanced Research Computing (ARC) for access to their clusters and for efficient advice. We also thank Matthew Brown, Fabien Burki, Jessica Grant, Vladimir Hampl, Timothy James, Naiara Rodríguez-Ezpeleta, and Kamran Shalchian-Tabrizi for protein alignments or sequences.

References

Adl, S.M., Simpson, A.G.B., Lane, C.E., Lukeš, J., Bass, D., Bowser, S.S., Brown, M.W., Burki, F., Dunthorn, M., Hampl, V., et al., 2012. The revised classification of eukaryotes. J. Eukaryot. Microbiol. 59, 429-493.
Ashlock, P., 1974. The use of cladistics. Ann. Rev. Ecol. Syst. 5, 81-99.
Brown, M.W., Sharpe, S.C., Silberman, J.D., Heiss, A.A., Lang, B.F., Simpson, A.G.B., Roger, A.J., 2013. Phylogenomics demonstrates that breviate flagellates are related to opisthokonts and apusomonads. Proc. R. Soc. B 280, 20131755.
Brugerolle, G., Bricheux, G., Philippe, H., Coffe, G., 2002. Collodictyon triciliatum and Diphylleia rotans (=Aulacomonas submarina) form a new family of flagellates (Collodictyonidae) with tubular mitochondrial cristae that is phylogenetically distant from other flagellate groups. Protist 153, 59-70.
Burger, G., Gray, M.W., Forget, L., Lang, B.F., 2013. Strikingly bacteria-like and gene-rich mitochondrial genomes throughout jakobid protists. Genome Biol. Evol. 5, 418-438.
Burki, F., Inagaki, Y., Bråte, J., Archibald, J.M., Keeling, P.J., Cavalier-Smith, T., Sakaguchi, M., Hashimoto, T., Horak, A., Kumar, S., Klaveness, D., Jakobsen, K.S., Pawlowski, J., Shalchian-Tabrizi, K., 2009. Large-scale phylogenomic analyses reveal that two enigmatic protist lineages, Telonemia and Centroheliozoa, are related to photosynthetic chromalveolates. Genome Biol. Evol. 1, 231-238.
Burki, F., Okamoto, N., Pombert, J.-F., Keeling, P.J., 2012. The evolutionary history of haptophytes and cryptophytes: phylogenomic evidence for separate origins. Proc. R. Soc. B 279, 2246-2254.
Cavalier-Smith, T., 1997. Amoeboflagellates and mitochondrial cristae in eukaryote evolution: megasystematics of the new protozoan subkingdoms Eozoa and Neozoa. Arch. Protistenkd. 147, 237-258.
Cavalier-Smith, T., 2000. Flagellate megaevolution: the basis for eukaryote diversification. In: Green, J., Leadbeater, B. (Eds.), The Flagellates. Taylor and Francis, London, pp. 361-390.
Cavalier-Smith, T., 2002. The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int. J. Syst. Evol. Microbiol. 52, 297-354.
Cavalier-Smith, T., 2003. The excavate protozoan phyla Metamonada Grassé emend. (Anaeromonadea, Parabasalia, Carpediemonas, Eopharyngia) and Loukozoa emend. (Jakobea, Malawimonas): their evolutionary affinities and new higher taxa. Int. J. Syst. Evol. Microbiol. 53, 1741-1758.
Cavalier-Smith, T., 2009. Megaphylogeny, cell body plans, adaptive zones: causes and timing of eukaryote basal radiations. J. Eukaryot. Microbiol. 56, 26-33.
Cavalier-Smith, T., 2010a. Kingdoms Protozoa and Chromista and the eozoan root of the eukaryotic tree. Biol. Lett. 6, 342-345.
Cavalier-Smith, T., 2010b. Deep phylogeny, ancestral groups, and the four ages of life. Phil. Trans. R. Soc. B 365, 111-132.
Cavalier-Smith, T., 2010c. Origin of the cell nucleus, mitosis and sex: roles of intracellular coevolution. Biol. Direct 5, 7.
Cavalier-Smith, T., 2013a. Early evolution of eukaryote feeding modes, cell structural diversity, and classification of the protozoan phyla Loukozoa, Sulcozoa, and Choanozoa. Eur. J. Protistol. 49, 115-178.
Cavalier-Smith, T., 2013b. Symbiogenesis: mechanisms, evolutionary consequences, and systematic implications. Annu. Rev. Ecol. Evol. Syst. 44, 145-172.
Cavalier-Smith, T., 2014. The neomuran revolution and phagotrophic origin of eukaryotes and cilia in the light of intracellular coevolution and a revised tree of life. In: Keeling, P.J., Koonin, E.V. (Eds.), The Origin and Evolution of Eukaryotes. Cold Spring Harbor Perspectives in Biology, pp. 41-77. doi:10.1101/cshperspect.a016006.
Cavalier-Smith, T., Chao, E.E., 1995. The opalozoan Apusomonas is related to the common ancestor of animals, fungi, and choanoflagellates. Proc. R. Soc. B 261, 1-6.
Cavalier-Smith, T., Chao, E.E., 2003. Phylogeny of Choanozoa, Apusozoa, and other Protozoa and early eukaryote megaevolution. J. Mol. Evol. 56, 540-563.
Cavalier-Smith, T., Chao, E.E., 2010. Phylogeny and evolution of Apusomonadida (Protozoa: Apusozoa): new genera and species. Protist 161, 549-576.
Cavalier-Smith, T., Nikolaev, S., 2008. The zooflagellates Stephanopogon and Percolomonas are a clade (Percolatea: Percolozoa). J. Eukaryot. Microbiol. 55, 501-509.
Cavalier-Smith, T., Chao, E.E., Oates, B., 2004. Molecular phylogeny of Amoebozoa and the evolutionary significance of the unikont Phalansterium. Eur. J. Protistol. 40, 21-48.
Cavalier-Smith, T., Chao, E.E., Stechmann, A., Oates, B., Nikolaev, S.I., 2008. Planomonadida ord. nov. (Apusozoa): ultrastructural affinity with Micronuclearia podoventralis and deep divergences within Planomonas gen. nov. Protist 159, 535-562.
Chomczynski, P., Sacchi, N., 1987. Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal. Biochem. 162, 156-159.
Derelle, R., Lang, B.F., 2012. Rooting the eukaryotic tree with mitochondrial and bacterial proteins. Mol. Biol. Evol. 29, 1277-1289.
Deschamps, P., Moreira, D., 2009. Signal conflicts in the phylogeny of the primary photosynthetic eukaryotes. Mol. Biol. Evol. 26, 2745-2753.
Glücksman, E., Snell, E.A., Berney, C., Chao, E.E., Bass, D., Cavalier-Smith, T., 2011. The novel marine gliding zooflagellate genus Mantamonas (Mantamonadida ord. n.: Apusozoa). Protist 162, 207-221.
Glücksman, E., Snell, E.A., Cavalier-Smith, T., 2013. Phylogeny and evolution of Planomonadida (Sulcozoa): eight new species and new genera Fabomonas and Nutomonas. Eur. J. Protistol. 49, 179-200.
Grant, J., Lahr, D.J.G., Rey, F.E., Gordon Burleigh, J., Gordon, J.I., Knight, R., Molestina, R.E., Katz, L.A., 2012. Gene discovery from a pilot study of the transcriptomes from three diverse microbial eukaryotes: Corallomyxa tenera, Chilodonella uncinata, and Subulatomonas tetraspora. Prot. Genom., 3-18.
Hampl, V., Hug, L., Leigh, J.W., Dacks, J.B., Lang, B.F., Simpson, A.G.B., Roger, A.J., 2009. Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic "supergroups". Proc. Natl. Acad. Sci. USA 106, 3859-3864.
Heiss, A.A., Walker, G., Simpson, A.G.B., 2013. The flagellar apparatus of Breviata anathema, a eukaryote without a clear supergroup affinity. Eur. J. Protistol. 49, 354-372.
Katz, L.A., Grant, J., Parfrey, L.W., Gant, A., O'Kelly, C.J., Anderson, O.R., Molestina, R.E., Nerad, T.A., 2011. Subulatomonas tetraspora nov. gen. nov. sp. is a member of a previously unrecognized major clade of eukaryotes. Protist 162, 762-773.
Lang, B.F., Burger, G., O'Kelly, C.J., Cedergren, R., Golding, G.B., Lemieux, C., Sankoff, D., Turmel, M., Gray, M.W., 1997. An ancestral mitochondrial DNA resembling a eubacterial genome in miniature. Nature 387, 493-497.
Lartillot, N., Philippe, H., 2008. Improvement of molecular phylogenetic inference and the phylogeny of Bilateria. Phil. Trans. R. Soc. B 363, 1463-1472.
Lasek-Nesselquist, E., Gogarten, J.P., 2013. The effects of model choice and mitigating bias on the ribosomal tree of life. Mol. Phylogenet. Evol. 69, 17-38.
Minge, M.A., Silberman, J.D., Orr, R.J., Cavalier-Smith, T., Shalchian-Tabrizi, K., Burki, F., Skjæveland, Å., Jakobsen, K.S., 2009. Evolutionary position of breviate amoebae and the primary eukaryote divergence. Proc. R. Soc. B 276, 597-604.
O'Kelly, C., 1993. The jakobid flagellates: structural features of Jakoba, Reclinomonas and Histiona and implications for the early diversification of eukaryotes. J. Eukaryot. Microbiol. 40, 627-636.
Paps, J., Medina-Chacón, L.A., Marshall, W., Suga, H., Ruiz-Trillo, I., 2013. Molecular phylogeny of unikonts: new insights into the position of apusomonads and ancyromonads and the internal relationships of opisthokonts. Protist 164, 2-12.
Philippe, H., Brinkmann, H., Lavrov, D.V., Littlewood, D.T.J., Manuel, M., Wörheide, G., Baurain, D., 2011. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol. 9, e1000602.
Richards, T.A., Cavalier-Smith, T., 2005. Myosin domain evolution and the primary divergence of eukaryotes. Nature 436, 1113-1118.
Roure, B., Baurain, D., Philippe, H., 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol. Biol. Evol. 30, 197-214.
Roure, B., Rodríguez-Ezpeleta, N., Philippe, H., 2007. SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evol. Biol. 7, S2.
Sebé-Pedrós, A., Grau-Bové, X., Richards, T.A., Ruiz-Trillo, I., 2014. Evolution and classification of myosins, a paneukaryotic whole genome approach. Genome Biol. Evol. 6, 290-305.
Shalchian-Tabrizi, K., Minge, M.A., Espelund, M., Orr, R.J., Ruden, T., Jakobsen, K.S., Cavalier-Smith, T., 2008. Multigene phylogeny of Choanozoa and the origin of animals. PLoS ONE 3, e2098.
Simpson, A.G.B., 2003. Cytoskeletal organization, phylogenetic affinities and systematics in the contentious taxon Excavata (Eukaryota). Int. J. Syst. Evol. Microbiol. 53, 1759-1777.
Simpson, A.G.B., Patterson, D.J., 1999. The ultrastructure of Carpediemonas membranifera (Eukaryota) with reference to the "excavate hypothesis". Eur. J. Protistol. 35, 353-370.
Torruella, G., Derelle, R., Paps, J., Lang, B.F., Roger, A.J., Shalchian-Tabrizi, K., Ruiz-Trillo, I., 2012. Phylogenetic relationships within the Opisthokonta based on phylogenomic analyses of conserved single-copy protein domains. Mol. Biol. Evol. 29, 531-544.
Yabuki, A., Eikrem, W., Takishita, K., Patterson, D.J., 2013. Fine structure of Telonema subtilis Griessmann, 1913: a flagellate with a unique cytoskeletal structure among eukaryotes. Protist 164, 556-569.
Zhao, S., Burki, F., Bråte, J., Keeling, P.J., Klaveness, D., Shalchian-Tabrizi, K., 2012. Collodictyon - an ancient lineage in the tree of eukaryotes. Mol. Biol. Evol. 29, 1557-1568.
Zhao, S., Shalchian-Tabrizi, K., Klaveness, D., 2013. Sulcozoa revealed as a paraphyletic group in mitochondrial phylogenomics. Mol. Phylogenet. Evol. 69, 462-468.

Fig. 1. Likely relationships amongst the eukaryote supergroups (podiates, corticates, Eozoa), highlighting major steps in eukaryote cell diversification.
Group names in red are opisthokonts; those in green were ancestrally photosynthetic. The non-photosynthetic phagotrophic last common ancestor of all eukaryotes had two cilia, stemming from a pair of nucleus-attached centrioles anchored within the cell by three microtubular roots R1-R3 (red lines). The root of the eukaryotic tree is shown within Eozoa, as explained elsewhere (Cavalier-Smith, 2010a,c, 2013a, 2014), though this precise position is not universally accepted. It lies between two groups. The first is the excavates, with a ventral feeding groove supported by a split right microtubular root R2 and a left root R1, and with divergent centrioles and cilia. The second is the grooveless Euglenozoa, whose unsplit, reflexed R2 supports their cytopharynx with complex mouth parts (centrioles and ciliary bases parallel). The groove-supporting microtubular skeleton of loukozoan excavates was arguably inherited by neozoa. It was simplified when Choanozoa (ancestral opisthokonts) lost the anterior cilium while evolving a filopodial collar (Cavalier-Smith, 2013a). Our multigene trees show that the excavate flagellate Malawimonas branches beside (or, less likely, within) Sulcozoa (Figs 2, 3, S1, S3-5). They thus prove that the ancestors of podiates, prior to the origin of their dorsal pellicle and posterior ciliary gliding, had an excavate-like cytoskeleton and a vane-bearing posterior cilium, like the chromist Colponema and many loukozoan excavates. The sulcozoan dorsal rigidifying pellicular layer(s) (lost by opisthokonts and Amoebozoa) overlying the dorsal root R3 are shown in blue, pseudopodia in brown, and mitochondria in purple.
Chloroplasts originated in a biciliate corticate with cortical alveoli by symbiogenetic enslavement of a cyanobacterium. After green plants and red algae diverged, chromists acquired chloroplasts by secondary transfer of a red algal cell (red arrow) into a host cell similar to Colponema (Alveolata), whose cortical alveoli and excavate-like cytoskeleton represent the ancestral state for corticates (Cavalier-Smith, 2013a,b).

Fig. 2. Bayesian GTR-CAT tree for 75 eukaryotes based on 52,824 amino acid positions in 192 protein-coding genes. Two well-converged chains (maxdiff 0.080514) were summed. Support values for bipartitions are posterior probabilities (PP, left) and bootstrap percentages (BS, right; 100 pseudoreplicates) for a separate RAxML PROTGAMMALGF tree of the same alignment. A black blob means that both were maximal (1.0 or 100%); bipartitions not found in ML are shown by -. The number of amino acids included for each taxon follows its name; the six sulcozoan taxa sequenced here are in bold. In eight genera, sequences for closely related species are combined as specified in Supplementary Table S1. Vertical labels denote the three eukaryotic supergroups and capitals the four derived kingdoms; all other branches belong to the basal kingdom Protozoa. The tree is rooted between Euglenozoa and excavates, as shown by universal ribosomal protein trees (Lasek-Nesselquist and Gogarten, 2013) and a dozen other arguments (Cavalier-Smith, 2010a, 2013a). Arrows show likely points of origin of the sulcozoan pellicle, ciliary gliding, and pseudopodia if this topology, with paraphyletic Varisulca, were correct. However, Varisulca may instead be holophyletic, as in all corresponding ML trees and in the 73-taxon CAT tree with the lowest proportion of missing data (Fig. S2), which is arguably more likely (see text). In that case, all three of these key sulcozoan characters probably evolved simultaneously in the ancestral podiate. The near-simultaneous rapid basal radiations of podiates and corticates may be related to the origin of chloroplasts (perhaps corresponding to increased fossil cell complexity ~800 My ago; Cavalier-Smith, 2013a,b). These radiations apparently occurred long after the primary eukaryote bifurcation between excavates and Euglenozoa.

Fig. 3. Bayesian GTR-CAT tree for 73 neozoan eukaryotes based on 45,194 amino acid positions in 173 protein-coding genes. Support values for bipartitions are posterior probabilities, given leftmost for the complete alignment, then for the <50% missing-genes subalignment (30,538 amino acids), then for the <40% missing-genes subalignment (17,470 amino acids). They are followed by bootstrap percentages (100 pseudoreplicates) for separate RAxML PROTGAMMALGF trees, given first for the complete alignment and then for the smaller <50%, <40%, and <30% (8680 amino acids) missing-gene subalignments. Black blobs indicate maximal (1.0 or 100%) support in all eight trees; bipartitions not found in the relevant trees are shown by -. The number of amino acids included for each taxon in the complete alignment follows its name (augmented for some taxa by adding extra genes compared with Figs 2, 5, and S1); the six sulcozoan taxa we sequenced are in bold. This tree is from one of four independent chains that did not converge, because each had a different topology within corticates. As all four had identical topologies for all Sulcozoa, the one most consistent with other evidence for chromists is used for this figure. As discussed in the text, two chains also conflicted with the others over the position of Rozella within opisthokonts.
The two chains run for the CAT trees for the <50% and <40% missing-gene alignments also did not converge, because of the same conflict over Rozella. Support values for these trees are from the chain with the topology shown, and both pairs of chains agreed for all Sulcozoa and chromists. However, the two chains for the CAT tree for the <30% missing-gene alignment did converge (maxdiff 0.228). In that tree Varisulca had a different topology (see Fig. S2), identical to that in ML for the <40% and <30% missing-gene alignments (see text).

Fig. 4. Maximum likelihood tree for 98 eukaryotes based on 45,194 amino acid positions in 173 protein-coding genes. Support values for bipartitions are bootstrap percentages (100 pseudoreplicates) for separate RAxML PROTGAMMALGF trees, given first for the complete alignment and then for the smaller <50%, <40%, and <30% missing-gene subalignments, as in Fig. 3. Black blobs indicate maximal (100%) support in all four trees; bipartitions not found in the relevant trees are shown by -. The number of amino acids included for each taxon in the complete alignment follows its name (augmented for some taxa by adding extra genes compared with Figs 2, 5, and S1). The complete trees for the subalignments, showing the number of amino acids retained for each taxon, are in the supplementary material (Figs S3-5); the six sulcozoan taxa we sequenced are in bold.

Fig. 5. Maximum likelihood tree for 122 eukaryotes based on 47,510 amino acid positions in 178 genes (14 genes unrepresented in our new sulcozoan data excluded). This RAxML PROTGAMMALGF (4 gamma rates) tree is rooted within Eozoa between Euglenozoa and Excavata, following Cavalier-Smith (2010a). The number of amino acids included for each taxon follows its name; the six sulcozoan taxa we sequenced are in bold. Vertical labels denote the three eukaryotic supergroups and capitals the four derived kingdoms; all other branches belong to the basal kingdom Protozoa. Bootstrap support values (100 pseudoreplicates) for bipartitions are shown from left to right for (1) the complete data, (2) after removing the fastest 19% of amino acid positions, and (3) after removing the fastest 40%. A black blob means 100% support in all 6 trees with varying site removal. In six cases, sequences for closely related species are combined as specified in Supplementary Table S1.

Table 1. Summary of the numbers of genes and taxa for the various trees
____________________________________________________________________________________________________________
Figure           Taxa  Composition                      Algorithm  Genes           Amino acids  Purpose
____________________________________________________________________________________________________________
Figure 2         75    eukaryote-wide                   CAT/ML     192             52,824       simple summary
Figure 3         73    neozoa (no Eozoa)                CAT/ML     173 or fewer*   45,194*      missing data test
Figure 4 + S3-5  98    eukaryote-wide                   ML         173 or fewer*   45,194*      missing data test for more taxa
Figure 5         122   eukaryote-wide                   ML         178             47,510       faster site removal
Figure S1        86    non-chromists, extra excavates   CAT        178             47,510       taxon sampling test
Figure S1A       86    non-chromists, extra excavates   ML         178             47,510       taxon sampling test
Figure S2        73    neozoa (no Eozoa)                CAT        **              8,680        missing data test
____________________________________________________________________________________________________________

* These tests calculated trees both for the full 173-gene alignment and after omitting successively more genes for which data were absent for >50%, >40%, and >30% of taxa, and thus also analysed fewer genes and amino acids. The indicated number of amino acids is for 173 genes; the numbers for the smaller alignments with proportionally less missing data are given in the Fig. 3 legend.
** This tree is for the same taxa and gene set as Figure 3 after excluding all genes missing in >30% of taxa.
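The Fig. 5 analyses (listed as "faster site removal" in Table 1) repeat ML after deleting the fastest-evolving 19% or 40% of amino acid positions. This section does not state how per-site rates were estimated. The Python sketch below therefore assumes that a rate for every alignment column has already been computed by some external tool. The helper `remove_fastest_sites` and the toy alignment are hypothetical, and rounding the number of removed columns down is an assumption.

```python
"""Sketch: remove the fastest-evolving fraction of alignment columns.

Per-site rates are assumed to be precomputed elsewhere; hypothetical helper.
"""
import math
from typing import Dict, List


def remove_fastest_sites(alignment: Dict[str, str], rates: List[float],
                         fraction: float) -> Dict[str, str]:
    """Drop the `fraction` of columns with the highest assumed rates.

    `rates[i]` is the (assumed, precomputed) rate of column i. The number of
    columns removed is floor(fraction * number_of_columns).
    """
    n_sites = len(rates)
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("each sequence needs exactly one character per rate")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    n_remove = math.floor(fraction * n_sites)
    # Rank columns from fastest to slowest; ties are broken by column index
    # so that the result is deterministic.
    fastest_first = sorted(range(n_sites), key=lambda i: (-rates[i], i))
    dropped = set(fastest_first[:n_remove])
    kept = [i for i in range(n_sites) if i not in dropped]
    # Surviving columns keep their original order in every sequence.
    return {taxon: "".join(seq[i] for i in kept)
            for taxon, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"taxonA": "ACDEFGHIKL", "taxonB": "ACDEFGHIKM"}
    toy_rates = [0.1, 2.5, 0.3, 0.2, 3.0, 0.5, 0.1, 1.2, 0.4, 0.9]
    for frac in (0.19, 0.40):
        print(frac, remove_fastest_sites(toy, toy_rates, frac))
```

Removing whole columns, rather than masking individual residues, keeps every sequence the same length, so the reduced alignment can be passed straight back to a tree-building program.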

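The Fig. 2 and Fig. 3 legends judge agreement between independent CAT chains by "maxdiff" (0.080514 and 0.228). In the bpcomp comparison of PhyloBayes, maxdiff is the largest discrepancy, across all bipartitions, between the posterior frequencies estimated by the chains. The PhyloBayes documentation treats values below 0.1 as a good run and values below 0.3 as acceptable. The Python sketch below computes this quantity for two chains. It assumes that bipartition frequencies have already been tallied from each chain's sampled trees. The function name and toy values are hypothetical, and this is not the PhyloBayes code.

```python
"""Sketch: maxdiff between two MCMC chains from bipartition frequencies."""
from typing import Dict, FrozenSet

# A bipartition is encoded by the set of taxa on one side of the split.
# Both chains must use the same canonical side (here: the side that excludes
# a fixed reference taxon) so that identical splits compare equal.
Split = FrozenSet[str]


def maxdiff(chain1: Dict[Split, float], chain2: Dict[Split, float]) -> float:
    """Largest absolute difference in split frequency between two chains.

    A split never sampled by one chain counts as frequency 0 in that chain.
    """
    splits = set(chain1) | set(chain2)
    return max((abs(chain1.get(s, 0.0) - chain2.get(s, 0.0)) for s in splits),
               default=0.0)


if __name__ == "__main__":
    # Toy splits for six taxa A-F, each written as the side that excludes F.
    ab, cd = frozenset({"A", "B"}), frozenset({"C", "D"})
    ac, ad = frozenset({"A", "C"}), frozenset({"A", "D"})
    run1 = {ab: 1.0, cd: 0.9, ac: 0.1}
    run2 = {ab: 1.0, cd: 0.7, ad: 0.25}
    print(maxdiff(run1, run2))  # prints 0.25 (from split {A, D})
```

A shared canonical encoding matters. Without it, the same split could be stored under two different keys and appear to be sampled by only one chain, which would inflate maxdiff.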
podiates ANIMALIA FUNGI Microsporidia

corticates Amoebozoa Choanozoa PLANTAE CHROMISTA

CHLOROPLAST Sulcozoa groove Colponema ciliary gliding pseudopodia cortical alveoli

neozoa

cytopharynx Eozoa Excavata groove Euglenozoa Jakoba (Loukozoa)

[Figure: multigene phylogenetic tree of eukaryotes showing PLANTAE, CHROMISTA, Sulcozoa, opisthokonts (ANIMALIA, Choanozoa, FUNGI), Amoebozoa and Excavata.]

[Figure: two multigene phylogenetic trees of eukaryotes (scale bars 0.3) showing Sulcozoa (Apusozoa, Varisulca), opisthokonts (ANIMALIA, Choanozoa, FUNGI), Amoebozoa, CHROMISTA, PLANTAE and Excavata, with branch support values.]

[Figure: multigene phylogenetic tree of eukaryotes (scale bar 0.2) showing Sulcozoa, opisthokonts (ANIMALIA, Choanozoa, FUNGI), Amoebozoa, CHROMISTA, PLANTAE and Excavata.]

[Figure: schematic eukaryote tree labelled podiates, corticates, neozoa and Eozoa, annotated with groove, ciliary gliding, pseudopodia, cortical alveoli and cytopharynx.]

• Sulcozoa, with a rigid dorsal pellicle and ventral pseudopodia, are basal podiates
• Site-heterogeneous trees with little missing data show three basal sulcozoan lineages
• Apusozoa are paraphyletic, with breviates sister to opisthokonts plus apusomonads
• Varisulca are sister to opisthokonts, Apusozoa, and Amoebozoa, and may be a clade
• Rapid early radiation of podiates and their close outgroups hinders multigene tree resolution
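The paraphyly of Apusozoa stated above can be made concrete with a small tree-traversal sketch. It assumes a hypothetical encoding in which a rooted tree is written as nested Python tuples (this is not the authors' software or file format). Every node defines a clade, the set of leaves below it, and a group is monophyletic only if some clade contains exactly its members. The topology is a deliberate simplification of the highlight: Varisulca is collapsed to a single outgroup leaf and Amoebozoa is left out. The smallest clade containing breviates and apusomonads also contains opisthokonts, which is why Apusozoa is not a clade.

```python
from typing import FrozenSet, List, Tuple, Union

# Hypothetical encoding: a rooted tree is a leaf name (str) or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return the names of all leaves below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    names: FrozenSet[str] = frozenset()
    for child in tree:
        names = names | leaf_set(child)
    return names


def all_clades(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set of every node; each node defines one clade."""
    clades = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            clades.extend(all_clades(child))
    return clades


def smallest_enclosing_clade(tree: Tree, group: FrozenSet[str]) -> FrozenSet[str]:
    """Return the smallest clade containing every member of the group."""
    candidates = [c for c in all_clades(tree) if group <= c]
    if not candidates:
        raise ValueError("group contains taxa that are not in the tree")
    return min(candidates, key=len)


def is_monophyletic(tree: Tree, group: FrozenSet[str]) -> bool:
    """A group is a clade iff its smallest enclosing clade has no extra members."""
    return smallest_enclosing_clade(tree, group) == group


# Simplified topology (assumption): breviates sister to apusomonads plus
# opisthokonts, with Varisulca as a single outgroup leaf; Amoebozoa omitted.
topology: Tree = ("Varisulca", ("Breviatea", ("Apusomonadida", "opisthokonts")))

apusozoa = frozenset({"Breviatea", "Apusomonadida"})
print(is_monophyletic(topology, apusozoa))
print(sorted(smallest_enclosing_clade(topology, apusozoa) - apusozoa))
```

Run as written, this prints `False` and then `['opisthokonts']`: the taxa left over in the smallest enclosing clade are exactly those whose exclusion makes Apusozoa paraphyletic in this simplified topology.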