
Accepted Manuscript

Multigene phylogeny resolves deep branching of Amoebozoa

Thomas Cavalier-Smith, Anna Maria Fiore-Donno, Ema Chao, Alexander Kudryavtsev, Cédric Berney, Elizabeth A. Snell, Rhodri Lewis

PII: S1055-7903(14)00278-4 DOI: http://dx.doi.org/10.1016/j.ympev.2014.08.011 Reference: YMPEV 4995

To appear in: Molecular Phylogenetics and Evolution

Received Date: 17 April 2014 Revised Date: 2 August 2014 Accepted Date: 11 August 2014

Please cite this article as: Cavalier-Smith, T., Fiore-Donno, A.M., Chao, E., Kudryavtsev, A., Berney, C., Snell, E.A., Lewis, R., Multigene phylogeny resolves deep branching of Amoebozoa, Molecular Phylogenetics and Evolution (2014), doi: http://dx.doi.org/10.1016/j.ympev.2014.08.011

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Multigene phylogeny resolves deep branching of Amoebozoa

Thomas Cavalier-Smith (a), Anna Maria Fiore-Donno (a, b), Ema Chao (a), Alexander Kudryavtsev (c), Cédric Berney (a, d), Elizabeth A. Snell (a), Rhodri Lewis (a)

(a) Department of Zoology, University of Oxford, Oxford, United Kingdom
(b) Present address: Terrestrial Ecology Group, Biozentrum Köln, University of Cologne, Germany
(c) Department of Invertebrate Zoology, Faculty of Biology, St. Petersburg State University, Saint-Petersburg, Russia
(d) Present address: Station Biologique de Roscoff, UMR7144/Groupe Evolution et PaléOcéans, CNRS/Université Paris VI, Roscoff, France

Author for correspondence: Thomas Cavalier-Smith, [email protected]

ABSTRACT

Amoebozoa is a key phylum for eukaryote phylogeny and evolutionary history, but its phylogenetic validity has been questioned since included species are very diverse: amoebo-flagellate slime-moulds, naked and testate amoebae, and some flagellates. 18S rRNA gene trees have not firmly established its internal topology. To rectify this we sequenced cDNA libraries for seven diverse Amoebozoa and conducted phylogenetic analyses for 109 eukaryotes (17-18 Amoebozoa) using 60-188 genes. We conducted Bayesian inferences with the evolutionarily most realistic site-heterogeneous CAT-GTR model and maximum likelihood analyses. These unequivocally establish the monophyly of Amoebozoa, showing a primary dichotomy between the previously contested subphyla Lobosa and Conosa. Lobosa, the entirely non-flagellate lobose amoebae, are robustly partitioned into the monophyletic classes Tubulinea, with predominantly tube-shaped pseudopodia, and Discosea, with flattened cells and different locomotion. Within Conosa, 60/70-gene trees with very little missing data show a primary dichotomy between the aerobic infraphylum Semiconosia (Mycetozoa and Variosea) and secondarily anaerobic Archamoebae. These phylogenetic features are entirely congruent with the most recent major amoebozoan classification emphasising locomotion modes, pseudopodial morphology, and ultrastructure. However, 188-gene trees, where proportionally more taxa have sparser gene representation, weakly place Archamoebae as sister to Macromycetozoa instead, possibly a tree reconstruction artefact of differentially missing data.

Keywords: amoebae phylogenomics; eukaryote-wide phylogeny; Nolandella; Himatismenida; Varipodida; Semiconosia

1 Introduction

Amoebae are unicellular eukaryotes that produce pseudopodia and change shape as they move: they were formerly all lumped as Sarcodina (Levine et al., 1980), a name signifying that their cells are a simple protoplasmic mass that uses pseudopodia for movement, not complex cilia as in flagellates and ciliates. From the 1970s to 1990s, increased application of electron microscopy and the first molecular phylogenies confirmed earlier suspicions that Sarcodina was too broadly defined and that the distinction between amoebae and flagellates was often artificial. It is now clear that amoebae are polyphyletic and belong in three evolutionarily distinct major groups not directly related to each other, each of which also includes diverse flagellate lineages. One distinctive group named Heterolobosea (Page and Blanton, 1985) comprises protists whose amoebae move by sudden, "eruptive" bulging and which often have a separate non-amoeboid flagellate stage; these belong in the excavate phylum Percolozoa (Cavalier-Smith 2003), which also includes purely flagellate genera like Percolomonas and (Nikolaev and Cavalier-Smith 2007). Very different are the tremendously diverse rhizarian ‘amoebae’ (Cavalier-Smith, 2002) with net-like or thread-like pseudopodia; they belong in the "SAR" clade (Burki et al., 2007), now formally subkingdom Harosa of the kingdom Chromista (Cavalier-Smith, 2010a). The third major amoeboid group, phylum Amoebozoa (Cavalier-Smith, 1998), is the subject of this paper; they are of special evolutionary interest, being related to the common ancestor of animals and fungi (Cavalier-Smith et al., 2004; Cavalier-Smith, 2013).

Amoebozoa typically have non-eruptive, generally blunt and broad (i.e. lobose) pseudopodia; cells are naked or with a shell (testate) or bear a dorsal organic cell coat (Himatismenida), though some have other protoplasmic forms such as the massive macroscopic plasmodia of many slime moulds (myxomycete Mycetozoa).

Amoebozoa comprise two subphyla: the purely amoeboid Lobosa that never have cilia and mostly have blunt and broad cytoplasmic projections, and Conosa with both amoeboid and flagellate lineages and often more pointed or even mildly branched subpseudopodia. In species number, perhaps over 2400 (Pawlowski et al., 2012), Amoebozoa are a major phylum, yet are underrepresented in molecular databases (Pawlowski, 2008; Pawlowski and Burki, 2009). Currently, their phylogeny is based either on few species in well-resolved but extremely sparse multigene trees, or on a plethora of species in basally ill-resolved and often partially contradictory small subunit ribosomal RNA gene (18S rDNA) trees, leaving several phylogenetic questions debatable.

These include: (1) their monophyly or otherwise (here, monophyly = holophyly of Ashlock (1971) and Cavalier-Smith (2010b)); (2) the phylogenetic validity of the present classification into classes and orders based primarily on locomotory modes and ultrastructure as interpreted in the light of single-gene trees (Smirnov et al., 2011; Cavalier-Smith 2013); (3) the unstable position of Archamoebea (e.g. Mastigamoeba, Entamoeba, Phreatamoeba); and (4) that of the even harder to place Himatismenida, with a unique dorsal cell coat (e.g. Cochliopodium, Ovalopodium and Parvamoeba). Existing data for multigene analyses are taxonomically biased (Fig. 1) and too sparse (only a dozen species) to answer these questions, so we have partially sequenced seven new transcriptomes and thereby answer all four. Most completed or in-progress sequences belong to subphylum Conosa and only two to subphylum Lobosa (only one to each class, a deficiency especially serious for the diverse and contested class Discosea), and none to the conosan classes Variosea or Protostelea. Comprehensive reviews are available for Amoebozoa (Pawlowski, 2008; Schilde and Schaap, 2013; Smirnov et al., 2011), so we outline as essential background only recent studies defining these key questions that we now resolve.

1.1 Are Amoebozoa monophyletic?

Amoebozoa encompass such a variety of forms and life-styles that the group has been criticized for lack of unifying ultrastructural characters (Parfrey et al., 2006; Yoon et al., 2008). Numerous 18S rDNA phylogenies including a wide array of Amoebozoa failed to resolve their basal branching or consistently show monophyly (e.g. Altan et al., 2012; Kudryavtsev et al., 2011; Pombert et al., 2013; Ptáčková et al., 2013). Several recent eukaryote-wide multigene phylogenies (>10 genes) based on 5-10 Amoebozoa clearly recovered their monophyly (Baurain et al., 2010; Brown et al., 2013; Derelle and Lang, 2012; Grant et al., 2012; Lasek-Nesselquist and Gogarten, 2013), but others failed to do so, at least in part of the analyses (Burki et al., 2013; Grant et al., 2012; Parfrey et al., 2010; Yoon et al., 2008). An anaerobic amoeboflagellate of unstable position, Breviata anathema (first misidentified as the amoebozoan Mastigamoeba invertens), was thought to be either weakly related to the flagellate apusomonads (Walker et al., 2006) or the most divergent branch of Amoebozoa (Cavalier-Smith et al. 2004; Paps et al., 2013; Shalchian-Tabrizi et al., 2008) or even to belong within Amoebozoa as sister to Archamoebea (Shalchian-Tabrizi et al., 2008). Recently, it was convincingly shown that Breviatea (Breviata, Subulatomonas, Pygsuia) are not amoebozoans, but a distinct sulcozoan clade between Amoebozoa and opisthokonts that is probably sister to opisthokonts plus apusomonads (Brown et al., 2013; Cavalier-Smith et al., submitted; Katz et al., 2011). Exclusion of Breviatea was a major step in defining the boundaries of Amoebozoa, but multigene analyses (>100 genes) with broader sampling are necessary to establish more convincingly whether the remaining Amoebozoa are monophyletic.

1.2 The deepest division in Amoebozoa: Conosa/Lobosa or Tubulinea/other Amoebozoa?

Recently a new classification of Amoebozoa integrating conflicting evidence from single-gene trees with cell morphology and locomotory patterns was developed (Cavalier-Smith et al. 2004; Cavalier-Smith, 2013; Smirnov et al., 2005, 2011) (Fig. 1). The greatest morphological disparity lies between subphyla Conosa and Lobosa (Cavalier-Smith 1998, 2013). Flagellate stages of Conosa (Mycetozoa, Variosea, Archamoebea) have a complex microtubular skeleton, simplest in Variosea and Archamoebea (essentially a partial or complete cone of microtubules), giving the name to the subphylum. Even non-flagellate Conosa (e.g. ) have extensive cytoplasmic microtubules, usually absent in Lobosa (Tubulinea, Discosea). Despite that disparity, a clear-cut primary bifurcation between Conosa and Lobosa is seldom seen in broadly sampled rDNA phylogenies, one exception being Zhao et al. (2012). 18S rDNA trees often intermingle Lobosa and Conosa (Cavalier-Smith et al. 2004; Kudryavtsev et al., 2011; Lahr et al., 2011, 2013). In some, Tubulinea (e.g. , Hartmannella) were sister to all other Amoebozoa, i.e. Discosea plus Conosa (Altan et al., 2012; Brown et al., 2007, 2011; Kudryavtsev and Pawlowski, 2013; Nikolaev et al., 2005, 2006; Pombert et al., 2013; Shadwick et al., 2009; Smirnov et al., 2008; Tekle et al., 2008). Thus, phylogenetic support is wanting both for the monophyly of Lobosa or Conosa and for where the primary dichotomy (root) of the amoebozoan tree lies, so some question the validity of both subphyla (Grant et al., 2012; Lahr et al. 2011; Parfrey et al., 2010).

1.3 Monophyly and phylogeny of Conosa

Validity of Conosa has been challenged partly because of the unstable position on sequence trees of class Archamoebea (e.g. Mastigamoeba, Entamoeba, ), whose mitochondria degenerated to or . Archamoebea have long branches in 18S rDNA trees, sometimes relatively well-supported as a clade (Cavalier-Smith et al., 2004; Fiore-Donno et al., 2010; Kudryavtsev et al., 2005, 2011; Nikolaev et al., 2006; Shadwick et al., 2009), and sometimes only weakly or insignificantly (Pombert et al., 2013; Smirnov et al., 2008; Tekle et al., 2008), even in two-gene (Lahr et al., 2011) or multigene trees (Lahr et al., 2013), so their monophyly has been questioned. Even less clear is their position relative to other groups, highly unstable and mostly unresolved by 18S rDNA. Placement in Conosa was supported on sparsely sampled multigene trees restricted to six Amoebozoa (Burki et al. 2013).

The conosan class Variosea is controversial as it lacks obvious unifying characters and varies widely from broad, flattened, aciliate cells with pointed subpseudopodia (Filamoeba, Flamella) (Fig. 1) and sometimes branched pseudopodia (Acramoeba, Grellamoeba) to multi- or uniciliate cells (, ). It is understudied, with many environmental sequences (Kudryavtsev et al., 2011; Pombert et al., 2013). Most 18S rDNA trees group varioseans weakly with Mycetozoa but not as a single clade or in a consistent order (Fiore-Donno et al., 2010; Kudryavtsev et al., 2011; Kudryavtsev and Pawlowski, 2013; Ptáčková et al., 2013; Shadwick et al., 2009). This region of the amoebozoan rDNA tree also includes class Protostelea, characterized by "fruiting bodies" (often a single spore on a stalk), whose coherence has been similarly criticised. It was grouped with the other fruiting amoebae (Macromycetozoa, i.e. Dictyostelea; Myxomycetes, which includes and Ceratiomyxida: Cavalier-Smith, 2013) as Mycetozoa (Olive, 1975), but the diversity of protostelid ciliary apparatus and locomotive forms led Spiegel (1981) to suggest they are polyphyletic. However, Cavalier-Smith (2013) argued that most are genuinely related to other Mycetozoa, their cytoskeletal differences arising mainly by differential loss, and grouped Mycetozoa and Variosea as infraphylum Semiconosia, characterised by an incomplete microtubular cytoskeletal cone contrasting with the complete cone of infraphylum Archamoebae. 18S rDNA trees show that a few spore-bearing amoebae are phylogenetically entirely distinct from Macromycetozoa, but most group in a bushlike array, predominantly nearer the base of Macromycetozoa than Variosea, with which a few (notably Soliformovum) commonly intermingle (Fiore-Donno et al., 2010; Kudryavtsev et al., 2011; Shadwick et al., 2009). Because of the low resolution of single-gene trees we do not know whether Variosea and Protostelea are phylogenetically distinct groups, or should be merged into one or subdivided into several.

1.4 Monophyly and composition of Lobosa

Lobosa have two classes: Tubulinea, a highly supported clade in most trees, with five orders (Smirnov et al., 2011), and the more diverse and broadly defined Discosea with at least nine (Smirnov et al., 2011). Most Tubulinea have broad, cylindrical pseudopodia, as exemplified by Amoeba, Echinamoeba, , Nolandella, and Vermamoeba, classified in three distinct orders of naked amoebae, and the single order (Arcellinida) of lobose testate amoebae. Suggested synapomorphies are tubular pseudopodia and monoaxial cytoplasmic flow, found in all five orders, though the derived order Leptomyxida (e.g. Leptomyxa, , ) can alternate between a typical tubular and a flattened reticulose (net-like) morphology.

Discosea are flattened and exhibit polyaxial cytoplasmic flow or flow without a pronounced axis. In most the cell moves as a whole, without separate pseudopodia (Smirnov et al., 2011), and may have a smooth outline or form subpseudopodia of different shapes. As Discosea encompass nine different morphotypes their phylogenetic unity is more debated (Smirnov et al., 2011); many rDNA trees weakly suggest paraphyly or polyphyly (Brown et al., 2011; Cole et al., 2010; Kudryavtsev et al., 2011, 2014; Kudryavtsev and Pawlowski, 2013; Lahr et al., 2011, 2013; Pombert et al., 2013; Ptáčková et al., 2013).

1.5 Himatismenida, previously hard to place amoebae with a dorsal organic cell coat

Himatismenida are placed in Discosea (subclass Flabellinia) mainly because they share the typical discosean flat cell shape (Smirnov et al., 2011) (Fig. 1). Their unique character is a dorsal, flexible, transparent thick cell coat, including intricate tridimensional scales in Cochliopodium only (e.g. Kudryavtsev, 2006). Represented in rDNA databases by several species of Cochliopodium (Kudryavtsev et al., 2005) and non-scaly Ovalopodium and Parvamoeba (Kudryavtsev et al., 2011), they form a clear, long-branch clade of unstable position in Amoebozoa in 18S rDNA trees; inclusion in Discosea and even Flabellinia is sometimes weakly supported (Kudryavtsev et al., 2011, 2014).

Better delimiting the pivotal phylum Amoebozoa and firmly establishing its deep phylogeny will stimulate further research, as one cannot properly ask evolutionary questions about a phylogenetically shapeless group. Therefore we produced and sequenced cDNA libraries for seven Amoebozoa and made 188-gene phylogenetic analyses for 109 eukaryotes, as well as for alignments of 60-122 genes with proportionally less missing data to make conclusions more reliable (Roure et al., 2013). Our new sequences include varipodid Variosea (Filamoeba, Flamella), Nolandella abertawensis (Tubulinea), representing a different, most divergent order (Nolandida) from the available Vermamoeba (Echinamoebida), and four Discosea from three orders: Stenamoeba (Thecamoebida), Vexillifera (Dactylopodida), and a scaly Cochliopodium and non-scaly Ovalopodium (both Himatismenida). This is the largest multigene dataset to date for Amoebozoa, including the first multigene data for Variosea and now five Discosea, not just one species; it represents 12 different amoebozoan orders, nearly half those now recognised (Fig. 1 and Cavalier-Smith, 2013).

2 Materials and Methods

2.1 Cultures and RNA extraction

Seven strains were grown as noted in Table 1. For each, amoebae from 10-20 Petri dishes were scraped, collected in an Eppendorf tube, and centrifuged at low speed. Cochliopodium minutoidum, Ovalopodium desertum, and Flamella fluviatilis cells were immediately transferred into RNALater buffer (Ambion, Texas, USA) and frozen; total RNA isolation was performed by Vertis Biotechnologie AG (Germany) using a mirVana RNA isolation kit (Ambion, Texas, USA). For other taxa, the pellet was washed with the same sterile medium in which amoebae were kept. RNA was extracted using TRIzol Reagent (Life Technologies Corporation, California, USA), following the manufacturer's protocol.

2.2 cDNA libraries, 454 pyrosequencing, sequence assembly

cDNA libraries were created by Vertis Biotechnologie AG from 10-20 µg of total RNA, as already described (Cavalier-Smith et al., submitted). Briefly, 454 pyrosequencing of the cDNA libraries was done at the University of Liverpool Advanced Genomics Facility. The sequencing process combined the seven libraries used in this study with 17 other protistan libraries in two separate runs of 12 taxa individually tagged by specific oligonucleotides (described previously in detail: Cavalier-Smith et al., submitted). The University of Liverpool processed raw data files using 454 base-calling software; individual reads from each library were assembled with minimum overlap of 30 bp using proprietary 454 software, Newbler. In total, 231,881 reads were obtained, leading to 89,577 assembled sequences (Table 1). Primers and adapters were removed using cross_match (part of the phred/phrap suite) (Ewing and Green, 1998) with slightly modified parameters (-minmatch 12 -minscore 17) to better identify short vector stubs as previously described (Cavalier-Smith et al., 2014). However, the assembly parameters were perhaps too stringent, and many useful unassembled sequences ("repeats") were included to increase total gene representation. In addition to the seven strains listed in Table 1, a similar library prepared for Vannella simplex (order Vannellida) was poor and yielded sequences for only 13 of the 188 genes. New sequences obtained in this work are deposited under GenBank BioProject numbers given in Table 1.

2.3 Alignments

We combined two published multi-gene protein alignments (Burki et al., 2007; Rodríguez-Ezpeleta et al., 2007) as seed for adding our new sequences. We added many more outgroup sequences later available in NCBI and JGI and from http://www.broadinstitute.org/scientific-community/data or the Protist EST database TbestDB (http://tbestdb.bcm.umontreal.ca for Diplonema papillatum and Rhynchopus euleeides). As meaningful signal can be obscured by noise (Philippe et al., 2011), for accurate paralogue, duplicate, and contaminant removal we carefully checked every single-gene alignment at least twice independently; RAxML trees with 100-400 fast bootstraps were run initially for all 188 genes, examined visually, and repeated after corrections as a last check. SCaFoS (Roure et al., 2007) was used to concatenate the alignments. To increase gene sampling for nine taxa we combined some closely related species in the same into an artificial (Table 3), selecting the longest sequence when two were present. Full technical details for data assembly are in Cavalier-Smith et al. (submitted); we discovered that the pyruvate kinase gene of Phreatamoeba balamuthi included in earlier multigene analyses (e.g. Zhao et al., 2012; Brown et al., 2013) is a lateral gene transfer from and two previously used ’ genes were actually from its food ; these and other problematic genes were excluded here. The 188 genes analysed are listed with their representation in our ESTs in Supplementary Table S1. As new names were published for some taxa during the course of this work, Supplementary Table S2 is a key to all names used for the sequences.
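The rule stated above for merged taxa (keep the longest sequence when two closely related species both have a given gene) can be illustrated with a minimal Python sketch. It is not the SCaFoS implementation: the function names, the toy gene and species labels, and the choice to ignore '-', '?' and 'X' when measuring length are all assumptions made for illustration.

```python
from typing import Dict, List

# Assumption: gaps, missing-data symbols and unknown residues do not count as data.
GAP_CHARS = set("-?Xx")


def ungapped_length(seq: str) -> int:
    """Count residues in an aligned sequence, ignoring gap/unknown characters."""
    return sum(1 for ch in seq if ch not in GAP_CHARS)


def build_composite(gene_alignments: Dict[str, Dict[str, str]],
                    members: List[str]) -> Dict[str, str]:
    """For each gene, keep the longest sequence among the member species.

    gene_alignments maps gene name -> {species name -> aligned sequence}.
    Genes for which no member species has a sequence are left out.
    """
    composite = {}
    for gene, seqs in gene_alignments.items():
        candidates = [seqs[sp] for sp in members if sp in seqs]
        if candidates:
            composite[gene] = max(candidates, key=ungapped_length)
    return composite


# Hypothetical toy data: two related species merged into one composite taxon.
genes = {
    "geneA": {"species_1": "MKV--LA", "species_2": "MKVTTLA"},
    "geneB": {"species_1": "GHWQ"},
    "geneC": {"species_3": "AAAA"},
}
composite = build_composite(genes, ["species_1", "species_2"])
print(composite)
```

In this toy case the composite takes geneA from species_2 (the longer sequence) and geneB from species_1, and has no entry for geneC, which neither member species provides.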

2.4 Phylogenetic analyses

We analysed two main taxon samples: a eukaryote-wide alignment of 109 taxa representing all major eukaryotic groups (included as a Supplementary file, with gene boundaries and number of taxa represented for each gene specified in Table S3), to investigate the monophyly and root position of Amoebozoa, and the 17 Amoebozoa alone, to investigate their mutual relationships while avoiding any tree distortion by outgroups. For the 188 genes, the first alignment had 51,352 amino acid positions and the other 50,906. Maximum likelihood (ML) analyses used RAxML HPC-MPI (version 7.0.4 for 17 taxa and 8.0.3 for 109 taxa) (Stamatakis et al., 2005; Stamatakis et al., 2008), with the LG amino-acid substitution matrix, gamma distribution approximated by four rate categories, amino acid frequencies estimated from the data (F), and rapid bootstrapping, shown to give equivalent results to the "slow" method (Lasek-Nesselquist and Gogarten, 2013; Stamatakis et al., 2008), followed by a thorough ML search. For 109 taxa we used 100 and for 17 taxa 1000 bootstrap resamplings.

Bayesian inference used PhyloBayes MPI v.1.5a (Lartillot et al., 2013) with the site-heterogeneous CAT-GTR model (Lartillot and Philippe, 2004) and a gamma distribution of rates across sites approximated by four categories. At least two independent chains were run. To reduce influences of missing data (Roure et al., 2013), we created by SCaFoS (Roure et al., 2007) three reduced alignments for the 17 Amoebozoa sample, excluding genes missing in >50%, >40% and >30% of the taxa, with 97, 87, and 70 genes (16,981; 13,719; 10,157 amino acids) remaining respectively. For the 109-taxon trees the corresponding restricted gene samples had 122, 86, and 60 genes (25,747; 15,439; 11,100 amino acids). These six alignments had steadily lower proportions of missing genes. For these 60-122 concatenated genes ML (RAxML PROTGAMMALGF) and CAT-GTR phylogenies were constructed as above. All 17-species PhyloBayes chains converged well, with maxdiffs of 0.0156 for 188 genes (11,512 trees summed, after removing the first 1,004 as burnin); 0.0631256 for 97 genes (2,304 trees, burnin 156); 0.0327248 for 87 genes (11,644 trees, burnin 1,513); and 0.101553 for 70 genes (17,294 trees, burnin 793). For 109-taxon trees convergence was less good: complete for the Amoebozoa, but topological contradictions remained for basal corticates and Parabasalia between the two chains, so their results are described individually (maxdiff 1 (meandiff 0.0189999) for 188 genes; 0.97824 (meandiff 0.0110837) for 60 genes). In addition, for the 13 genes sequenced for Vannella simplex we made 18-species Amoebozoa-only trees (3,786 amino acids) by both methods (PhyloBayes converged well: maxdiff 0.0473378).
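The maxdiff and meandiff values quoted above summarise how far two chains disagree on bipartition (split) posterior frequencies. The following Python sketch shows the idea under stated assumptions: it is not the PhyloBayes code, the function name and toy frequencies are hypothetical, and averaging over the union of bipartitions seen in either chain is an assumption about how the mean is taken.

```python
from typing import Dict, FrozenSet, Tuple

# A bipartition is represented by the set of taxa on one side of the split.
Bipartition = FrozenSet[str]


def compare_chains(freq_a: Dict[Bipartition, float],
                   freq_b: Dict[Bipartition, float]) -> Tuple[float, float]:
    """Return (maxdiff, meandiff) between two chains.

    freq_a and freq_b map each bipartition to its posterior frequency in the
    post-burnin tree sample of one chain; a bipartition absent from a chain
    has frequency 0 there.
    """
    all_bips = set(freq_a) | set(freq_b)
    diffs = [abs(freq_a.get(b, 0.0) - freq_b.get(b, 0.0)) for b in all_bips]
    if not diffs:
        return 0.0, 0.0
    return max(diffs), sum(diffs) / len(diffs)


# Hypothetical toy frequencies for two chains over taxa A-E.
chain_a = {
    frozenset({"A", "B"}): 1.0,
    frozenset({"A", "B", "C"}): 0.9,
    frozenset({"C", "D"}): 0.2,
}
chain_b = {
    frozenset({"A", "B"}): 1.0,
    frozenset({"A", "B", "C"}): 0.6,
    frozenset({"D", "E"}): 0.1,
}
maxdiff, meandiff = compare_chains(chain_a, chain_b)
print(f"maxdiff={maxdiff:.3f} meandiff={meandiff:.3f}")
```

Under this definition a maxdiff of 1 means that at least one bipartition is present in every sampled tree of one chain and absent from the other, which is compatible with a low meandiff when nearly all other bipartitions agree.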

3 Results

3.1 Eukaryote-wide analyses

The CAT-GTR and maximum-likelihood (ML) eukaryote-wide trees (Fig. 2: 109 taxa, 188 genes) both show monophyletic Amoebozoa, with high support (1.0/89). Amoebozoa are split into two clades, Lobosa and Conosa, with maximal Bayesian and weaker ML support (Lobosa: 1.0/65; Conosa: 1.0/68). Both algorithms show all six classes (Tubulinea, Discosea, Variosea, Myxomycetes, Dictyostelea, and Archamoebea) as monophyletic and place Variosea with weak support as sister to Archamoebea plus Macromycetozoa. Amoebozoa are sisters of Apusozoa plus opisthokonts with maximal support (1.0) by CAT and moderate support (75) by ML; breviates never group with Amoebozoa. Besides Amoebozoa, both methods consistently retrieve most phylogenetically well-established groups, e.g. opisthokonts (1.0/100), (1.0/100), (1.0/100), Harosa (= SAR) (1.0/95), and corticates (1.0/76). However, as many others previously observed, ML and CAT-GTR topologies were contradictory for basal branching in corticates, and for the positions of breviates, , and . The /Plantae region is the most difficult of all to resolve on global eukaryote trees as almost every published multigene tree is different (e.g. Burki et al., 2009, 2012, 2013); here one CAT chain showed kingdom Plantae as monophyletic but the other and ML did not. Our trees clearly show that monophyly of Amoebozoa is much easier to resolve and better supported than that of Plantae.

In podiates Apusozoa (i.e. Apusomonadida and Breviatea) appear monophyletic in ML (93) in accordance with other LG-ML phylogenies (Cavalier-Smith et al., submitted; Grant et al., 2012; Brown et al., 2013), but this branching is strongly contradicted by our present and previous CAT-GTR analyses, where Apusozoa are paraphyletic with Breviatea sister to Apusomonadida plus opisthokonts (1.0). Brown et al. (2013) and Cavalier-Smith et al. (submitted) argued that apusozoan paraphyly is the more likely topology as CAT-GTR better fits the data than does the LG model. In our ML trees, the podiate Collodictyon is sister to the excavate Malawimonas (BS 90%) as in Grant et al. (2012), possibly artefactual as it is not retrieved by our CAT-GTR analyses, which placed Collodictyon alone as sister to other podiates (1.0 chain 2 Fig. 1) or to podiates plus the long-branch Parabasalia (1.0 chain 1); whereas Malawimonas branches more deeply in a weakly supported clade with anaeromonads, which is sister to podiates plus Parabasalia (1.0), with maximal support (1.0) on both chains for its not grouping with Collodictyon and branching more deeply than podiates plus Parabasalia. ML shows all excavate metamonads as a well supported clade that is sister to podiates plus Malawimonas, in agreement with the only previous CAT trees that included the long-branch (Kamikawa et al. 2014). However, CAT split metamonads, grouping Parabasalia with podiates (inconsistent deep positions on the two chains), and anaeromonads weakly with Malawimonas. Though the monophyly and precise position of metamonads are unresolved on these trees, all place them closer to podiates/Malawimonas than to other excavates ( and Heterolobosea). One CAT chain shows maximally supported bipartitions between all three eukaryote supergroups (podiates, corticates, Eozoa: Cavalier-Smith, 2013) (Fig. 2), whereas ML blurs the podiate/eozoan boundary by grouping the podiate Collodictyon with the excavate Malawimonas, as did the other CAT chain in a contradictory way by placing trichomonads within podiates above Collodictyon.

Given the proportionally low gene sampling of many Amoebozoa and of Malawimonas and Collodictyon, which branch just below the divergence of Amoebozoa, Apusozoa, and opisthokonts, there is a risk that the 188-gene trees are distorted for these taxa by this gene sampling, as explained in detail by Roure et al. (2013); they argued that in such cases it may be better to analyse alignments with fewer genes overall but proportionally less missing data. Therefore we repeated the analysis with a reduced alignment of the 60 genes (mostly ribosomal proteins) having the best coverage across the tree, i.e. proportionally least missing data (all genes with >30% taxa missing from the alignment excluded). As can be seen from the additional support values on Figure 2, the main structure of this more densely represented 60-gene tree is identical to, and as well supported as, the 188-gene tree. However there are inconsistencies for basal corticates and for Parabasalia as noted above; furthermore with CAT, Malawimonas moved to become sister to Collodictyon (0.95, 0.87 on the two chains) in agreement with ML.
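The gene-selection criterion used for the reduced alignments (exclude every gene for which more than a given fraction of taxa lack a sequence) can be sketched in Python as below. This is an illustrative sketch only, not the SCaFoS procedure used for the actual analyses; the function name, the presence table, and the toy taxon and gene labels are hypothetical.

```python
from typing import Dict, List, Set


def select_genes(presence: Dict[str, Set[str]],
                 taxa: List[str],
                 max_missing: float) -> List[str]:
    """Keep genes whose fraction of missing taxa does not exceed max_missing.

    presence maps each gene to the set of taxa that have a sequence for it.
    A gene is excluded only when it is missing in MORE than max_missing of
    the taxa (e.g. max_missing=0.3 excludes genes missing in >30% of taxa).
    """
    taxon_set = set(taxa)
    n_taxa = len(taxa)
    kept = []
    for gene, present in presence.items():
        n_missing = n_taxa - len(present & taxon_set)
        if n_missing / n_taxa <= max_missing:
            kept.append(gene)
    return kept


# Hypothetical toy data: 10 taxa and four genes with decreasing coverage.
taxa = [f"taxon_{i}" for i in range(1, 11)]
presence = {
    "geneA": set(taxa),       # missing in 0% of taxa
    "geneB": set(taxa[:7]),   # missing in 30%
    "geneC": set(taxa[:6]),   # missing in 40%
    "geneD": set(taxa[:4]),   # missing in 60%
}
for threshold in (0.5, 0.4, 0.3):
    print(threshold, select_genes(presence, taxa, threshold))
```

Tightening the threshold from 0.5 to 0.3 drops geneC here, mirroring how the stricter criteria above leave fewer genes but a smaller proportion of missing data.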

Within Amoebozoa, the 60-gene trees strongly contradict the 188-gene trees for Conosa in one crucial respect: showing monophyly of Semiconosia (Mycetozoa plus Variosea) with strong support by both methods and both CAT chains (1.0, 1.0, 85). Instead the 188-gene trees showed Archamoebea as sister to Macromycetozoa with insignificant ML support (1.0, 41), making Semiconosia paraphyletic. According to the reasoning of Roure et al. (2013), the substantially stronger support for semiconosian monophyly on the 109-taxon 60-gene trees (least likely to suffer from missing data artefact) suggests that this topology is probably correct and their apparent paraphyly with 188 genes may be a tree reconstruction artefact.

Himatismenida are invariably monophyletic with maximal support, and within Discosea, but the branching order of the four orders within Discosea is generally poorly supported and was contradictory between methods for the 188-gene sample. For the 60-gene sample all three trees grouped Vexillifera and Acanthamoeba as a clade (0.71, 0.81, 76). In the 60-gene ML tree only, Discosea became weakly paraphyletic, as Tubulinea moved into it to become sister to Himatismenida with insignificant support (9%). CAT consistently showed monophyly for Discosea with maximal support on both 188-gene and 60-gene trees, but LG ML did so only on the 188-gene tree, with insignificant support (41). As the LG model is evolutionarily less realistic, that is consistent with the expectation that it will be less able to calculate the correct topology. Analyses with an intermediate number of genes (86, 122) also showed the monophyly of Discosea with good support with CAT; for Conosa topology was as for 188 genes.

3.2 Amoebozoa-only analyses

The Amoebozoa-only 188-gene tree (Fig. 3) has a maximally supported basal dichotomy between Conosa and Lobosa with CAT-GTR; although the dichotomy is also present with ML, its support is negligible. In Conosa all four classes are clades with maximal support, as are Tubulinea and Himatismenida. All classes are monophyletic as in Figure 2, with maximal support for all except Discosea. However, topology differs from Fig. 2 in precisely the two aspects that were inconsistent with method or gene sample in Fig. 2: branching within Discosea and Semiconosia. Branching within Conosa is exactly as in the 60-gene tree for 109 taxa for ML, with 100% support for the monophyly of Semiconosia, which strongly supports our conclusion above that their paraphyly in the 188-gene ML tree of Fig. 2 is probably an artefact, interestingly seen only in the presence of distant outgroups. Oddly, for these taxa CAT-GTR shows Archamoebea as sister to Macromycetozoa with 0.97 support, contradicting ML but in agreement with the Fig. 2 188-gene tree. However, when the 17-amoebozoa alignment is restricted to the 70 best represented genes, CAT-GTR and LG ML agree in convincingly showing Semiconosia as monophyletic (0.96, 97; Fig. 4). This consistently high support in all 60-gene and 70-gene trees (those with proportionally least missing data) strongly supports the monophyly of Semiconosia. For intermediate proportions of missing data (87, 97 genes) the putatively artefactual topology (Archamoebea with Macromycetozoa) was seen with maximal support by CAT and insignificant or low support by ML (Figs 3, 4). Thus both ML and CAT-GTR can be misled by higher fractions of missing data.

Internal branching of Discosea varies somewhat, as in the eukaryote-wide tree. The most widely supported grouping is Acanthamoeba with Vexillifera, seen in six of the eight Amoebozoa-only trees (in all those with the lowest proportion of missing data) and in all eukaryote-wide trees, always with moderate to strong support. This consistency suggests that this clade is probably genuine, and is further supported by its being found by both methods with the 60-gene 109-taxon tree. However, the position of Stenamoeba is variable: sometimes sister to all other Discosea, sometimes to Himatismenida, and sometimes to Centramoebida/Dactylopodida. In the 17-taxon 70-gene ML tree only, Archamoebea moved into Discosea as sister to Himatismenida (78% support), making Discosea seemingly paraphyletic, almost certainly an artefact of insufficient data overall coupled with the lower ability of LG ML to reconstruct the correct tree.

The 13-gene trees (Supplementary Fig. S1) for 18 Amoebozoa including Vannella simplex (Discosea: Vannellida) gave strong support by both methods for clades Semiconosia

(0.96, 84), Variosea (1.0, 100), Macromycetozoa (0.99, 92), Myxomycetes (1.0, 100),

Dictyostelea (1.0, 100), Archamoebae (0.99, 94), Varipodida (1.0, 100), Himatismenida (0.89,

76), and Tubulinea (0.87, 96). PhyloBayes alone weakly supported also clades Conosa and

Lobosa (0.55), neither being found with ML (as often on single-gene trees), further evidence for the higher accuracy of CAT. Discosea was paraphyletic with both methods and with contradictory insignificantly supported internal phylogeny. If all 188 genes are included the

18 amoebozoa ML tree more weakly supports Semiconosia as a clade, but CAT placed

Archamoebae as sister to Macromycetozoa (0.85). These trees confirm that Vannellida are

Discosea, but clearly do not include enough genes to clarify relationships amongst the five studied discosean orders.

4 Discussion

4.1 Monophyly of Amoebozoa

Using the most comprehensive dataset for eukaryote-wide multigene phylogeny to date

(188 genes; 17 amoebozoans; 92 other eukaryotes), we found Amoebozoa to be monophyletic with high support in all analyses (Fig. 2). This confirms Cavalier-Smith's (1998) early vision of Amoebozoa, and shows that the lack of unifying ultrastructural features cannot be invoked to reject the group. Monophyly of Amoebozoa was challenged because the long branches of the fast-evolving Archamoebea tend to be attracted outside Amoebozoa (Grant et al., 2012;

Parfrey et al., 2010). Our trees show that Archamoebea are invariably within Amoebozoa, and almost certainly belong within a monophyletic Conosa. All our eukaryote-wide trees agree with the recent transfer of Breviatea from Amoebozoa to Apusozoa (Cavalier-Smith, 2013), so the demarcation therein of Amoebozoa is phylogenetically sound.

It is to be expected that an essentially amoeboid phylum that repeatedly lost cilia and grossly simplified or entirely abandoned the complex ancestral eukaryotic microtubular skeleton in favour of novel actomyosin-based pseudopodial motility (with highly plastic cell form) should lack features of ciliary and ciliary root ultrastructure that vary amongst and unify most other essentially flagellate protist phyla. Rather than phylum-specific ultrastructure, the unifying feature of the body plan of Amoebozoa is this very ancestral dependence on whole-cell actomyosin locomotion, which contrasts strongly with the essentially feeding filopodia and reticulopodia of Rhizaria, and in pseudopodial form with the eruptive pseudopodia of

Heterolobosea.

4.2 Lobosa and Conosa constitute the primary amoebozoan dichotomy

For the first time with a representative assemblage of amoebozoan sequences, we establish clearly that the deepest division in Amoebozoa is between two monophyletic groups, subphyla Lobosa and Conosa. The early-diverging position of Tubulinea, seen without significant statistical support in some but not all 18S rDNA phylogenies, is probably a single-gene tree reconstruction artefact. This basic amoebozoan evolutionary dichotomy is recognized in the most comprehensive classification of Amoebozoa (Cavalier-Smith, 2013; Smirnov et al.,

2011), but ignored by that of Adl et al. (2012). It shows the primordial importance of the phylogenetic split between purely amoeboid cells (Lobosa) and the Conosa that ancestrally retained cilia in either the trophic phase (Variosea and Archamoebae) or dispersal stages

(myxomycete Macromycetozoa). The possibility that Conosa is the basal subphylum of

Amoebozoa was weakly suggested by a recent multigene analysis (Zhao et al. 2012 Fig. 3, but contradicted by their Fig. 4); our analyses show that apparent conosan paraphyly was probably an artefact, as already suggested by Cavalier-Smith (2013).

Lobosa and Conosa were both monophyletic by both methods on all trees that included all 188 genes and in Amoebozoa-only trees when genes with >50% or >40% missing data are excluded. Only in the ML tree for the most reduced 17-taxon dataset where genes missing in

>30% of taxa were excluded (i.e. 80% of the sequences were excluded) did Archamoebea jump from Conosa into Lobosa and disrupt their monophyly. Evidently, for Amoebozoa-only reducing the alignment to only 70 genes (10,157 amino acids) prevents ML from recovering the correct position of the long-branch Archamoebae because of insufficient data overall to reconstruct an accurate tree; unsurprisingly this artefact was also present with ML, but not

CAT, in the 13-gene tree using 18 Amoebozoa including Vannella. As this artefact did not occur in the 109-taxon alignment reduced to 60 genes (11,100 amino acids) it seems that breaking the basal stem between Lobosa and Conosa helps ML better reconstruct their monophyly. Even with the 70-gene alignment the site-heterogeneous CAT model showed both subphyla as distinct clades with reasonable support (0.85). This inferior performance of

ML is probably because the LG model assumes that all amino acid positions obey the same rules and applies a single historically constrained matrix to all sites (Le and Gascuel, 2008), an assumption known to be untrue for every multiple alignment tested statistically, thus causing serious model misspecification. By contrast the CAT site-heterogeneous model allows different patterns of substitution at each site; all published tests (e.g. Brown et al.,

2013) show that the genes commonly used in multigene trees fit the CAT model much better than LG, so we give more weight to the high CAT support than the low ML support. Support for lobosan monophyly is maximal or strong on all CAT-GTR trees but much lower on ML trees. The discordance of the 70-gene trees alone (ML alone) shows that reducing the proportion of missing data by excluding genes is not beneficial if it shortens the total length of the alignment beyond a certain point; this example is another confirmation of the argument

(Roure et al., 2013) that reducing the proportion of missing data is generally less important than using the evolutionarily most realistic model available, in particular one taking into account different patterns of amino acid substitution across sites (CAT model) as well as a generalised GTR transition matrix and gamma distribution modelling of intersite rate variation – all as used here.
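
The contrast between the site-homogeneous LG model and the site-heterogeneous CAT-GTR model discussed in this section can be written out explicitly. The following is a sketch in standard notation, not taken from the manuscript: the symbols $D_i$ (alignment column $i$ of $N$), $T$ (tree with branch lengths), $r_i$ (gamma-distributed site rate), $\rho$ (exchangeabilities), $\pi$ (equilibrium amino acid frequencies) and $z_i$ (the frequency profile assigned to site $i$) are assumptions introduced here for illustration, and the likelihoods are shown conditional on $r_i$ and $z_i$.

```latex
% Requires amsmath. Off-diagonal rate-matrix entries only (a \neq b).
% Site-homogeneous (LG): one fixed empirical exchangeability matrix and
% one frequency vector are shared by every site.
\begin{align}
  Q^{\mathrm{LG}}_{ab} &= \rho^{\mathrm{LG}}_{ab}\,\pi_b, &
  L_{\mathrm{LG}} &= \prod_{i=1}^{N} p\!\left(D_i \mid T,\; r_i\,Q^{\mathrm{LG}}\right)
\end{align}
% Site-heterogeneous (CAT-GTR): exchangeabilities \rho are shared and
% estimated from the data, but each site i draws its own frequency
% profile \pi^{(z_i)} from a mixture of profiles.
\begin{align}
  Q^{(z_i)}_{ab} &= \rho_{ab}\,\pi^{(z_i)}_b, &
  L_{\mathrm{CAT}} &= \prod_{i=1}^{N} p\!\left(D_i \mid T,\; r_i\,Q^{(z_i)}\right)
\end{align}
```

The only structural difference is the site index on the frequency profile: under LG every column is scored against the same $\pi$, whereas under CAT-GTR columns with different amino acid preferences can be scored against different profiles.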

4.3 Phylogeny of Conosa

Within Conosa it was previously firmly established that cellular slime moulds and two separate subgroups of archamoebae (, Endolimacidae) lost cilia independently (Cavalier-Smith, 2013). For the first time, our robust trees indicate that

Varipodida must have lost cilia independently of these groups and of Lobosa, and therefore shared a biciliate common ancestor with other Conosa after they diverged from Lobosa.

Maximal support for the Filamoeba/Flamella clade on every tree strongly supports their grouping in one family Filamoebidae and order Varipodida (Cavalier-Smith et al., 2004).

Making a separate family for Flamella only and an unneeded new order just for the original

Filamoebidae (Lahr et al., 2011) was undesirable taxonomic inflation.

The ultrastructural character that gave the name to the conosan lineage of originally ciliated amoebae, the conical microtubular skeleton (Cavalier-Smith, 1998; see also Cavalier-Smith et al., 2004), was recently reinterpreted. It is now considered not as a synapomorphy for

Conosa but as convergent modifications within Variosea and Archamoebea from a more asymmetric ancestral condition of four robust microtubular roots linked to two centrioles, as is still retained by the biciliate Myxomycetes (Karpov et al., 2003) and in the variably ciliate

Protostelea within parvphylum Mycetozoa (Cavalier-Smith, 2013; Yubuki and Leander,

2013). It is now accepted that the eukaryotic ancestral ciliary apparatus consisted of two centrioles and at least three bands of microtubules (roots) attaching them to the rest of the cell cortex (Cavalier-Smith, 2013; Yubuki and Leander, 2013). Cavalier-Smith (2013) explained in detail how the ancestral myxomycete-like asymmetric amoebozoan cytoskeleton may have evolved from the rather similar cytoskeletons of ancestral sulcozoan protozoa resembling the apusozoan amoeboflagellates, which branch on multigene trees between Amoebozoa and opisthokonts. The secondary origins of the cone-like cytoskeleton of the convergently unicentriolar Variosea and Archamoebea are also elucidated there (Cavalier-Smith, 2013). Thus uniciliate conosan amoebozoans with single centrioles (the ‘unikont’ condition), like

Mastigamoeba (Archamoebae) and Phalansterium and some Protostelea (Mycetozoa with unicellular fruiting bodies), have secondarily lost the posterior and its associated microtubular roots (Cavalier-Smith, 2013; Yubuki and Leander, 2013). It follows from our robust amoebozoan tree topology that all non-flagellate conosans arose by secondary ciliary loss, notably Varipodida (e.g. Flamella, Filamoeba, Acramoeba), Entamoeba, and

Dictyostelea.

Our trees provide the strongest evidence to date for the position of the hard-to-place

Archamoebea. As explained above, all four trees with the lowest proportion of missing data strongly indicate that infraphylum Semiconosia (Variosea plus Mycetozoa: Cavalier-Smith,

2013) is monophyletic. Semiconosia was also a strongly supported clade by both methods on the 13-gene 18-Amoebozoa tree that included Vannella. Though one 17-species ML 188-gene tree placed archamoebae within Discosea, we argued above that this placement is erroneous and that their position as sister to Semiconosia, seen in trees with proportionally least missing data, is probably correct. Interestingly, that almost certainly artefactual movement into Lobosa was never seen on the eukaryote-wide trees, which consistently recovered monophyly of both subphyla. The fact that non-amoebozoan outgroups break the stem connecting the bases of Conosa and

Lobosa provides additional information that should enable good algorithms to reconstruct the ancestors of each more accurately. For that reason we do not agree with the widespread assumption (e.g. Shavit et al. 2007) that tree topology for a taxon without outgroups (e.g. Figs

3-4) should necessarily be better than when properly representative outgroups are present (e.g.

Fig. 2).

The sharp conflict between the 188- and 60-gene trees for the precise position of archamoebae within Conosa might not be entirely because of missing data for the 128 excluded genes. A similar conflict was noted in deep phylogeny between multigene trees based on 87 ribosomal protein and 35 non-ribosomal protein trees (Nosenko et al.,

2013). They showed that divergence of the ribosomal proteins had not reached saturation levels, whereas for non-ribosomal proteins amino acid substitution was closer to saturation.

They argued that the concordance of the ribosomal protein tree with traditional morphological interpretations of animal evolution is attributable to this lower saturation of the ribosomal proteins (so more deep synapomorphies would be preserved) and to the higher saturation and greater sensitivity to long-branch artefacts and missing data of the non-ribosomal proteins, which produced a highly discordant tree. Significantly the vast majority of the proteins in our

60-gene and 70-gene alignments with proportionally least missing data are ribosomal proteins.

If in Amoebozoa, as in animals, amino acid substitution in ribosomal proteins is less saturated than in the largely non-ribosomal genes that were excluded, the putatively lower saturation of most of these 60 genes could explain why both methods showed the monophyly of

Semiconosia, whereas the complete gene set contradictorily grouped archamoebae with

Macromycetozoa.

All analyses show a robust clade including Dictyostelea and Myxomycetes, corresponding to the recently erected superclass Macromycetozoa (Cavalier-Smith, 2013;

Fiore-Donno et al., 2010), often also strongly supported in 18S rDNA phylogenies (Cole et al., 2010; Fiore-Donno et al., 2010; Nikolaev et al., 2006; Smirnov et al., 2011). No transcriptome or genome sequences are available yet for Protostelea, the third slime mould class; they are now needed for several lineages in this somewhat heterogeneous group to clarify further the basal branching of Conosa and test the monophyly of Mycetozoa – on rDNA trees some and variosean lineages tend to intermingle, so whether they should remain separate classes will be unclear until such information is obtained. One taxonomic adjustment favoured by the seemingly robust Soliformovum/Acramoeba clade on rDNA trees (Kudryavtsev et al. 2011; Lahr et al., 2011) is that Soliformovum should be transferred from Protostelea to Acramoebidae in Varipodida within Variosea, making a separate family for it unnecessary. Despite there being no transcriptome data for Protostelea, their mostly closer branching with Macromycetozoa than do Variosea on rDNA trees means that their eventual sequencing is unlikely to overturn our conclusion of conosan monophyly.

4.4 Lobosa and its two classes

The well-characterized Tubulinea are always monophyletic with maximum support by all methods, as is the discosean order Himatismenida – unsurprising as their relatively long basal stems will have allowed many synapomorphies to accumulate. As the two tubulinean orders sampled here already straddle the deepest tubulinean bifurcation on rDNA trees

(Kudryavtsev et al., 2011), it is unlikely that new transcriptomes from the other three orders

(all reliably phylogenetically closer to Nolandella than to Vermamoeba) will undermine this robust monophyly of Tubulinea.

The other lobosan class, Discosea, flattened amoebae of very diverse morphotypes, was a clade in all CAT trees using 60-188 genes and in all ML trees except the two noted above, where Archamoebea artefactually moved into Discosea because of insufficient data overall.

Our trees taken together clearly support the monophyly of Discosea, but this conclusion is necessarily provisional as we currently have transcriptome or genome sequences for only four of the nine discosean orders (plus 13 genes for a fifth, Vannellida); they are needed from all others to test this further. Especially if more complete than the present mostly partial transcriptomes, such additional data might better resolve discosean internal branching, now partially contradictory and weakly supported. Interestingly, our analyses found neither of the subclasses Flabellinia (Dactylopodida + Vannellida + Himatismenida) and Longamoebia

(Thecamoebida + Centramoebida) as a clade; we cannot predict whether that will still be true when more taxa are added or if the most frequently found Centramoebida/Dactylopodida clade will then be substantiated.

4.5 Himatismenida belong to Discosea

The problematic Cochliopodium and Ovalopodium, a robust clade whose position relative to other Amoebozoa is unstable in all 18S rDNA trees (Kudryavtsev et al., 2011), invariably branch inside Lobosa and Discosea in our analyses. Their precise position within

Discosea varies and remains somewhat ambiguous. The universal maximal support for sisterhood between the scaly Cochliopodium and non-scaly Ovalopodium with a morphologically simpler fibrous dorsal cell coat strongly supports their grouping in the same family Cochliopodiidae (Kudryavtsev et al., 2011; Smirnov et al., 2011). This emphasises that a thick dorsal cell coat without scales is the key innovation defining Himatismenida, although it may be not unique to it, as Goceviidae, now transferred to Pellitida (Kudryavtsev et al.,

2014), probably evolved a similar cell coat independently of Himatismenida. The scales typical of Cochliopodium probably evolved only after this genus diverged from other himatismenids

(Kudryavtsev et al., 2011; Smirnov et al., 2011).

Overall our trees are more robust for amoebozoan phylogeny than they are for corticates. The inconsistencies amongst basal corticates are probably mainly caused by the near simultaneous divergence of the six major clades shown in Fig. 2. Resolution is probably not helped by relatively sparse taxon sampling of corticates, as we omitted many to make the alignment matrix small enough to be computationally tractable (yet still the largest by far to date for eukaryotes as a whole). This low corticate sampling does not detract from our conclusions concerning the Amoebozoa/podiate part of the tree, which we wanted to sample sufficiently thoroughly to make conclusions for this region of the tree as robust as practicable.

4.6 Conclusion

Our results strongly support the monophyly of Amoebozoa and its basal split into two monophyletic subphyla, Lobosa and Conosa, and also (as far as still somewhat limited taxon sampling allows) the monophyly of classes Tubulinea, Discosea, Variosea, Archamoebea,

Dictyostelea, and Myxomycetes, and the conosan superclass Macromycetozoa. Reducing the proportion of missing data suggests that the conosan infraphylum Semiconosia (Mycetozoa,

Variosea) may also be monophyletic and sister to infraphylum Archamoebae. These clades correspond precisely to all higher taxa except the unsequenced class Protostelea in the integrative amoebozoan classification of Smirnov et al. (2011), as revised by Cavalier-Smith

(2013), which therefore accurately reflects amoebozoan phylogeny. However, when too many genes with sparse taxon sampling are included, Archamoebae (probably artefactually) move into Semiconosia as apparent sister of Macromycetozoa. Within Discosea the frequently observed grouping of Centramoebida with Dactylopodida contradicts present classification into subclasses Longamoebia and Flabellinia. Similar studies are needed for additional amoebozoan orders to test thoroughly and extend these conclusions, especially for the very diverse Discosea and Variosea, as well as Protostelea.

Acknowledgments

This work was supported by the Leverhulme Trust (grant number R1008101) and by

NERC research grants and Research Professorship to TC-S, and by the Russian Foundation for Basic Research to AK (grant number 12-04-01835-a). AK is grateful to Jan Pawlowski

(University of Geneva) for laboratory facilities for RNA isolation. RL thanks NERC for a research studentship. We thank I. Dyková for providing the Filamoeba sinensis culture. We thank Matthew Brown, Fabien Burki, Vladimir Hampl, Naira Rodríguez-Ezpeleta, Laura

Parfrey, Timothy James, and Kamran Shalchian-Tabrizi, for providing gene alignments or protein sequences, and Gertraud Burger for permission to use those of Rhynchopus and

Diplonema.

References

Adl, S.M., Simpson, A.G.B., Lane, C.E., Lukeš, J., Bass, D., Bowser, S.F., Brown, M.W.,

Burki, F., Dunthorn, M., Hampl, V., Heiss, A.A., Hoppenrath, M., Lara, E., Le Gall, L., Lynn,

D.H., McManus, H., Mitchell, E.A.D., Mozley-Standridge, S.E., Parfrey, L.W., Pawlowski, J.,

Rueckert, S., Shadwick, L., Schoch, C.L., Smirnov, A., Spiegel, F.W., 2012. The revised classification of eukaryotes. J. Euk. Microbiol. 59, 429-493. Doi: 10.1111/j.1550-

7408.2012.00644.x.

Altan, D., Coupat-Goutaland, B., Risler, A., Reyrolle, M., Souchon, M., Briolay, J., Jarraud,

S., Doublet, P., Pélandakis, M., 2012. Micriamoeba tesseris nov. gen. nov. sp.: a new taxon of free-living small-sized amoebae non-permissive to virulent legionellae. Protist 163, 888-902.

Doi: 10.1016/j.protis.2012.04.006.

Ashlock, P.D., 1971. Monophyly and associated terms. Syst. Zool. 20, 63-69.

Baurain, D., Brinkmann, H., Petersen, J., Rodriguez-Ezpeleta, N., Stechmann, A., Demoulin,

V., Roger, A.J., Burger, G., Lang, F.B., Philippe, H., 2010. Phylogenomic evidence for separate acquisition of plastids in cryptophytes, , and stramenopiles. Mol. Biol.

Evol. 27, 1698-1709. Doi: 10.1093/molbev/msq059.

Brown, M.W., Spiegel, F.W., Silberman, J.D., 2007. Amoeba at attention: Phylogenetic affinity of pedata. J. Euk. Microbiol. 54, 511-519.

Brown, M.W., Silberman, J.D., Spiegel, F.W., 2011. "Slime Molds" among the Tubulinea

(Amoebozoa): molecular systematics and of Copromyxa. Protist 162, 277-287. Doi:

10.1016/j.protis.2010.09.003.

Brown, M.W., Sharpe, S.C., Silberman, J.D., Heiss, A.A., Lang, F.B., Simpson, A.G.B.,

Roger, A.J., 2013. Phylogenomics demonstrates that breviate flagellates are related to opisthokonts and apusomonads. Proc. R. Soc. B 280, 20131755. Doi:

10.1098/rspb.2013.1755.

Burki, F., Shalchian-Tabrizi, K., Minge, M.A., Skjæveland, Å., Nikolaev, S.I., Jakobsen, K.S.,

Pawlowski, J., 2007. Phylogenomics reshuffles the eukaryotic supergroups. PLoS ONE 2, e790. Doi: 10.1371/journal.pone.0000790.

Burki, F., Shalchian-Tabrizi, K., Pawlowski, J., 2008. Phylogenomics reveals a new

"megagroup" including most photosynthetic eukaryotes. Biol. Lett. 4, 366-369.

Burki, F., Inagaki, Y., Bråte, J., Archibald, J.M., Keeling, P.J., Cavalier-Smith, T., Sakaguchi,

M., Hashimoto, T., Horak, A., Kumar, S., Klaveness, D., Jakobsen, K.S., Pawlowski, J.,

Shalchian-Tabrizi, K., 2009. Large-scale phylogenomic analyses reveal that two enigmatic protist lineages, and Centroheliozoa, are related to photosynthetic chromalveolates.

Genome Biol. Evol. 1, 231-238. Doi: 10.1093/gbe/evp022.

Burki, F., Corradi, N., Sierra, R., Pawlowski, J., Meyer, G.R., Abbott, C.L., Keeling, P.J.,

2013. Phylogenomics of the intracellular parasite Mikrocytos mackini reveals evidence for a in Rhizaria. Curr. Biol. 23, 1541-1547. Doi: 10.1016/j.cub.2013.06.033.

Cavalier-Smith, T., 1993. and the symbiotic origin of the metakaryote cell, in: Ishikawa,

H., Ishida, M., Sato, S. (Eds.), Endocytobiology V. Tübingen University Press, pp. 399-406.

Cavalier-Smith, T., 1998. A revised six-kingdom system of life. Biol. Rev. 73, 203-266.

Cavalier-Smith, T., 2002. The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int. J. Syst. Evol. Microbiol. 52, 297-354.

Cavalier-Smith, T., 2010a. Kingdoms Protozoa and Chromista and the eozoan root of the eukaryotic tree. Biol. Lett. 6, 342-345. Doi: 10.1098/rsbl.2009.0948.

Cavalier-Smith, T., 2010b. Deep phylogeny, ancestral groups, and the four ages of life. Phil.

Trans. R. Soc. B 365, 111-132. Doi: 10.1098/rstb.2009.0161.

Cavalier-Smith, T., 2013. Early evolution of eukaryote feeding modes, cell structural diversity, and classification of the protozoan phyla Loukozoa, Sulcozoa, and Choanozoa. Eur.

J. Protistol. 49, 115-178. Doi: 10.1016/j.ejop.2012.06.001.

Cavalier-Smith, T., 2014. The neomuran revolution and phagotrophic origin of eukaryotes in the light of intracellular coevolution and a revised tree of life. In The Origin and Evolution of Eukaryotes. Eds

P. J. Keeling and E. V. Koonin. Cold Spring Harbor Perspectives in Biology. doi:

10.1101/cshperspect.a016006

Cavalier-Smith, T., Chao, E.E., Oates, B., 2004. Molecular phylogeny of Amoebozoa and the evolutionary significance of the unikont Phalansterium. Eur. J. Protistol. 40, 21-48.

Cavalier-Smith, T., Chao, E.E., Snell, E.A., Berney, C., Fiore-Donno, A.M., Lewis, R., 2014.

Multigene eukaryote phylogeny reveals the likely protozoan ancestors of opisthokonts

(animals, fungi, choanozoans) and Amoebozoa. Mol. Phyl. Evol. in press.

Cavalier-Smith, T., Nikolaev, S., 2008. The zooflagellates Stephanopogon and are a clade (class Percolatea: phylum Percolozoa). J. Eukaryot. Microbiol. 55, 501-509.

Cole, J.C., Anderson, R.O., Tekle, Y.I., Grant, J., Katz, L.A., Nerad, T.A., 2010. A description of a new "Amoebozoan" isolated from the American lobster, Homarus americanus. J. Euk. Microbiol. 57, 40-47. Doi: 10.1111/j.1550-7408.2009.00445.x.

Derelle, R., Lang, F.B., 2012. Rooting the eukaryotic tree with mitochondrial and bacterial proteins. Mol. Biol. Evol. 29, 1277-1289. Doi: 10.1093/molbev/msr295.

Dyková, I., Pecková, H., Fiala, I., Dvořáková, H., 2005. Filamoeba sinensis sp. n., a second species of the genus Filamoeba Page, 1967, isolated from gills of Carassius gibelio (Bloch,

1782). Acta Protozool. 44, 75-80.

Ewing, B., Green, P., 1998. Basecalling of automated sequencer traces using phred. II. Error probabilities. Genome Research 8, 186-194.

Fiore-Donno, A.M., Nikolaev, S.I., Nelson, M., Pawlowski, J., Cavalier-Smith, T., Baldauf,

S.L., 2010. Deep phylogeny and evolution of slime moulds (Mycetozoa). Protist 161, 55-70.

Doi: 10.1016/j.protis.2009.05.002.

Grant, J., Lahr, D.J.G., Rey, F.E., Gordon Burleigh, J., Gordon, J.I., Knight, R., Molestina,

R.E., Katz, L.A., 2012. Gene discovery from a pilot study of the transcriptomes from three diverse microbial eukaryotes: Corallomyxa tenera, Chilodonella uncinata, and

Subulatomonas tetraspora. Prot. Genom., 3-18. Doi: 10.2478/prge-2012-0002.

He, D., Fiz-Palacios, O., Fu, C.-J., Fehling, J., Tsai, C.-C., Baldauf, S.L., 2014. An alternative root for the eukaryote tree of life. Curr. Biol. 24, 465-470. Doi: 10.1016/j.cub.2014.01.036.

Kamikawa, R., Kolisko, M., Nishimura, Y., Yabuki, A., Brown, M.W., Ishikawa, S.A., Ishida, K.,

Roger, A.J., Hashimoto, T., Inagaki, Y., 2014. Gene content evolution in discobid mitochondria deduced from the phylogenetic position and complete mitochondrial genome of Tsukubamonas globosa. Genome Biol. Evol. 6, 306-315.

Karpov, S.A., Novozhilov, Y., Chistiakova, L.V., 2003. A comparative study of zoospore cytoskeleton in impexus, cinerea and epidendrum

(Eumycetozoa). 3, 15-29.

Katz, L.A., Grant, J., Parfrey, L.W., Gant, A., O’Kelly, C.J., Anderson, R.O., Molestina, R.E.,

Nerad, T.A., 2011. Subulatomonas tetraspora nov. gen. nov. sp. is a member of a previously unrecognized major clade of eukaryotes. Protist 162, 762-773. Doi:

10.1016/j.protis.2011.05.002.

Kudryavtsev, A., 2006. “Minute” species of Cochliopodium (Himatismenida): Description of three new fresh- and brackish-water species with a new diagnosis for Cochliopodium minus

Page, 1976. Eur. J. Protistol. 42, 77-89. Doi: 10.1016/j.ejop.2005.12.002.

Kudryavtsev, A., Bernhard, D., Schlegel, M., Chao, E.E., Cavalier-Smith, T., 2005. 18S

Ribosomal RNA gene sequences of Cochliopodium (Himatismenida) and the phylogeny of

Amoebozoa. Protist 156, 215-224.

Kudryavtsev, A., Brown, M.W., Tice, A., Spiegel, F.W., Pawlowski, J., Anderson, O.R., 2014.

A revision of the order Pellitida Smirnov et al., 2011 (Amoebozoa, Discosea) based on ultrastructural and molecular evidence, with description of Endostelium crystalliferum n. sp.

Protist 165, 208-229. Doi: 10.1016/j.protis.2014.02.003.

Kudryavtsev, A., Pawlowski, J., 2013. Squamamoeba japonica n. g. n. sp. (Amoebozoa): A deep-sea amoeba from the Sea of Japan with a novel cell coat structure. Protist 164, 13-23.

Doi: 10.1016/j.protis.2012.07.003.

Kudryavtsev, A., Wylezich, C., Schlegel, M., Walochnik, J., Michel, R., 2009. Ultrastructure,

SSU rRNA gene sequences and phylogenetic relationships of Flamella Schaeffer, 1926

(Amoebozoa), with description of three new species. Protist 160, 21-40.

Kudryavtsev, A., Wylezich, C., Pawlowski, J., 2011. Ovalopodium desertum n. sp. and the phylogenetic relationships of Cochliopodiidae (Amoebozoa). Protist 162, 571-589. Doi:

10.1016/j.protis.2011.04.002.

Lahr, D.J.G., Grant, J., Nguyen, T., Lin, J.H., Katz, L.A., 2011. Comprehensive phylogenetic reconstruction of Amoebozoa based on concatenated analyses of SSU-rDNA and actin genes. PLoS ONE, p. e22780. Doi: 10.1371/journal.pone.0022780.

Lahr, D.J.G., Grant, J., Katz, L.A., 2013. Multigene phylogenetic reconstruction of the

Tubulinea (Amoebozoa) corroborates four of the six major lineages, while additionally revealing that shell composition does not predict phylogeny in the Arcellinida. Protist 164,

323-339. Doi: 10.1016/j.protis.2013.02.003.

Lartillot, N., Philippe, H., 2004. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 21, 1095-1109.

Lartillot, N., Rodrigue, N., Stubbs, D., Richer, J., 2013. PhyloBayes MPI. Phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Syst. Biol. Doi:

10.1093/sysbio/syt022.

Lasek-Nesselquist, E., Gogarten, J.P., 2013. The effects of model choice and mitigating bias on the ribosomal tree of life. Mol. Phyl. Evol. 69, 17-38. Doi: 10.1016/j.ympev.2013.05.006.

Le, S., Gascuel, O., 2008. LG: an improved, general amino-acid replacement matrix. Mol.

Biol. Evol. 25, 1307-1320.

Levine, N., Corliss, J.O., Cox, F., Deroux, G., Grain, J., Honigberg, B., Leedale, G.F.,

Loeblich, A., Lom, J., Lynn, D.H., Merinfeld, E., Page, F.C., Poljansky, G., Sprague, V.,

Vávra, J., Wallace, F., 1980. A newly revised classification of the Protozoa. J. Protozool. 27,

37-58.

Nikolaev, S.I., Mitchell, E.A.D., Petrov, N.B., Berney, C., Fahrni, J.F., Pawlowski, J., 2005.

The testate lobose Amoebae (Order Arcellinida Kent, 1880) finally find their home within

Amoebozoa. Protist 156, 191-202.

Nikolaev, S.I., Berney, C., Petrov, N.B., Mylnikov, A.P., Fahrni, J.F., Pawlowski, J., 2006.

Phylogenetic position of Multicilia marina and the evolution of Amoebozoa. Int. J. Syst. Evol.

Microbiol. 56, 1449-1458.

Nosenko, T., Schreiber, F., Adamska, M., Adamski, M., Eitel, M., Hammel, J., Maldonado,

M., Muller, W.E., Nickel, M., Schierwater, B., Vacelet, J., Wiens, M., Worheide, G., 2013. Deep metazoan phylogeny: when different genes tell different stories. Mol. Phylogenet. Evol. 67, 223-233.

Olive, L.S., 1975. The Mycetozoans. Academic Press, New York.

Page, F.C., 1969a. and pseudopod formation in Vexillifera bacillipedes n. sp., a mayorellid amoeba. Trans. Amer. Micros. Soc. 88, 394-400.

Page, F.C., 1969b. stenopodia n. g., n. sp., a freshwater amoeba. J. Protozool.

16, 437-441. Doi: 10.1111/j.1550-7408.1969.tb02296.x.

Page, F.C., 1980. A light- and electron-microscopic comparison of limax and flabellate marine amoebae belonging to four genera. Protistologica 16, 57-78.

Page, F.C., Blanton, R.L., 1985. The Heterolobosea (Sarcodina: Rhizopoda), a new class uniting the and the (Acrasida). Protistologica 21, 121-132.

Paps, J., Medina-Chacón, L.A., Marshall, W., Suga, H., Ruiz-Trillo, I., 2013. Molecular phylogeny of unikonts: new insights into the position of apusomonads and ancyromonads and the internal relationships of opisthokonts. Protist 164, 2-12. Doi:

10.1016/j.protis.2012.09.002.

Parfrey, L.W., Barbero, E., Lasser, E., Dunthorn, M., Bhattacharya, D., Patterson, D.J., Katz,

L.A., 2006. Evaluating support for the current classification of eukaryotic diversity. PLoS

Genet., p. e220.

Parfrey, L.W., Grant, J., Tekle, Y.I., Lasek-Nesselquist, E., Morrison, H.G., Sogin, M.,

Patterson, D.J., Katz, L.A., 2010. Broadly sampled multigene analyses yield a well-resolved eukaryotic tree of life. Syst. Biol. 59, 518-533. Doi: 10.1093/sysbio/syq037.

Pawlowski, J., 2008. The twilight of Sarcodina: a molecular perspective on the polyphyletic origin of amoeboid protists. Protistology 5, 281-302.

Pawlowski, J., Burki, F., 2009. Untangling the phylogeny of amoeboid protists. J. Euk.

Microbiol. 56, 16-25. Doi: 10.1111/j.1550-7408.2008.00379.x.

Pawlowski, J., Adl, S.M., Audic, S., Bass, D., Belbahri, L., Berney, C., Bowser, S.,

Cepicka, I., Decelle, J., Dunthorn, M., Fiore-Donno, A.M., Gile, G., Holzmann, M., Jahn, R.,

Jirků, M., Keeling, P.J., Kostka, M., Kudryavtsev, A., Lara, E., Lukeš, J., Mann, D.G.,

Mitchell, E.A.D., Nitsche, F., Romeralo, M., Saunders, G., Simpson, A.G.B., Smirnov, A.,

Spouge, J., Stern, R., Stoeck, T., Zimmermann, J., Schindel, D., de Vargas, C., 2012. CBOL

Protist Working Group: Barcoding eukaryotic richness beyond the animal, and fungal kingdoms. PLoS Biology 10, e1001419. Doi: 10.1371/journal.pbio.1001419.

Philippe, H., Brinkmann, H., Lavrov, D.V., Littlewood, D.T.J., Manuel, M., Wörheide, G., Baurain, D., 2011. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biology 9, e1000602. Doi: 10.1371/journal.pbio.1000602.

Pombert, J.-F., Smirnov, A., James, E.R., Janouškovec, J., Gray, M.W., Keeling, P.J., 2013. The complete mitochondrial genome from an unidentified Phalansterium species. Prot. Genom. 1, 25-32. Doi: 10.2478/prge-2013-0002.

Ptáčková, E., Kostygov, A.Y., Chistyakova, L.V., Falteisek, L., Frolov, A.O., Patterson, D.J., Walker, G., Cepicka, I., 2013. Evolution of Archamoebae: morphological and molecular evidence for pelobionts including Rhizomastix, Entamoeba, Iodamoeba, and Endolimax. Protist 164, 380-410. Doi: 10.1016/j.protis.2012.11.005.

Rodríguez-Ezpeleta, N., Brinkmann, H., Burger, G., Roger, A.J., Gray, M.W., Philippe, H., Lang, B.F., 2007. Toward resolving the eukaryotic tree: the phylogenetic positions of jakobids and cercozoans. Curr. Biol. 17, 1420-1425. Doi: 10.1016/j.cub.2007.07.036.

Roure, B., Rodríguez-Ezpeleta, N., Philippe, H., 2007. SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evol. Biol. 7, S2. Doi: 10.1186/1471-2148-7-S1-S2.

Roure, B., Baurain, D., Philippe, H., 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol. Biol. Evol. 30, 197-214.

Schilde, C., Schaap, P., 2013. The Amoebozoa. Methods in Molecular Biology. Springer Science+Business Media. 1-15.

Shadwick, L.L., Spiegel, F.W., Shadwick, J.D.L., Brown, M.W., Silberman, J.D., 2009. Eumycetozoa = Amoebozoa?: SSUrDNA phylogeny of protosteloid slime molds and its significance for the amoebozoan supergroup. PLoS ONE 4, e6754. Doi: 10.1371/journal.pone.0006754.

Shalchian-Tabrizi, K., Minge, M.A., Espelund, M., Orr, R.J., Ruden, T., Jakobsen, K.S., Cavalier-Smith, T., 2008. Multigene phylogeny of Choanozoa and the origin of animals. PLoS ONE 3, e2098. Doi: 10.1371/journal.pone.0002098.

Shavit, L., Penny, D., Hendy, M.D., Holland, B.R., 2007. The problem of rooting rapid radiations. Mol. Biol. Evol. 24, 2400-2411.

Smirnov, A.V., Chao, E.E., Nassonova, E.S., Cavalier-Smith, T., 2011. A revised classification of naked lobose amoebae (Amoebozoa: Lobosa). Protist 162, 545-570. Doi: 10.1016/j.protis.2011.04.004.

Smirnov, A.V., Nassonova, E.S., Berney, C., Fahrni, J.F., Bolivar, I., Pawlowski, J., 2005. Molecular phylogeny and classification of the lobose amoebae. Protist 156, 129-142.

Smirnov, A.V., Nassonova, E.S., Chao, E.E., Cavalier-Smith, T., 2007. Phylogeny, evolution and taxonomy of vannellid amoebae. Protist 158, 295-324.

Smirnov, A.V., Nassonova, E.S., Cavalier-Smith, T., 2008. Correct identification of species makes the amoebozoan rRNA tree congruent with morphology for the order Leptomyxida Page 1987; with description of Acramoeba dendroida n. g., n. sp., originally misidentified as ‘Gephyramoeba sp.’ Eur. J. Protistol. 44, 35-44. Doi: 10.1016/j.ejop.2007.08.001.

Spiegel, F.W., 1981. Phylogenetic significance of the flagellar apparatus in protostelids (Eumycetozoa). BioSystems 14, 491-499.

Stamatakis, A., Ludwig, T., Meier, H., 2005. RAxML-III: A fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21, 456-463.

Stamatakis, A., Hoover, P., Rougemont, J., 2008. A rapid bootstrap algorithm for the RAxML web servers. Syst. Biol. 57, 758-771. Doi: 10.1080/10635150802429642.

Tekle, Y.I., Grant, J., Anderson, R.O., Nerad, T.A., Cole, J.C., Patterson, D.J., Katz, L.A., 2008. Phylogenetic placement of diverse amoebae inferred from multigene analyses and assessment of clade stability within "Amoebozoa" upon removal of varying rate classes of SSU-rDNA. Mol. Phyl. Evol. 47, 339-352. Doi: 10.1016/j.ympev.2007.11.015.

Walker, G., Dacks, J.B., Embley, M.T., 2006. Ultrastructural description of Breviata anathema, n. gen., n. sp., the organism previously studied as ‘Mastigamoeba invertens’. J. Euk. Microbiol. 53, 65-78. Doi: 10.1111/j.1550-7408.2005.00087.x.

Yoon, H.S., Grant, J., Tekle, Y.I., Wu, M., Chaon, B.C., Cole, J.C., Logsdon, J.M.J., Patterson, D.J., Bhattacharya, D., Katz, L.A., 2008. Broadly sampled multigene trees of eukaryotes. BMC Evol. Biol. 8, 1-12. Doi: 10.1186/1471-2148-8-14.

Yubuki, N., Leander, B.S., 2013. Evolution of microtubule organizing centers across the tree of eukaryotes. Plant J. 75, 230-244. Doi: 10.1111/tpj.12145.

Zhao, S., Burki, F., Bråte, J., Keeling, P.J., Klaveness, D., Shalchian-Tabrizi, K., 2012. Collodictyon - an ancient lineage in the tree of eukaryotes. Mol. Biol. Evol. 29, 1557-1568. Doi: 10.1093/molbev/mss001.

Zhao, S., Shalchian-Tabrizi, K., Klaveness, D., 2013. Sulcozoa revealed as a paraphyletic group in mitochondrial phylogenomics. Mol. Phylogenet. Evol. 69, 462-468.

Table 1

Information on the seven sequenced strains
______
Taxon name | Authors | Culture | References | # of reads | # of assembled reads | BioProject #
______
Cochliopodium minutoidum | Kudryavtsev | CCAP 1537/7 | (Kudryavtsev, 2006) | 46,099 | 17,079 | PRJNA222681
Filamoeba sinensis | Dyková, Pecková, Fiala and Dvořáková | Strain CH26, Institute of Parasitology, České Budĕjovice, Czech Republic | (Dyková et al., 2005) | 53,914 | 23,564 | PRJNA222679
Flamella fluviatilis | Kudryavtsev, Wylezich, Schlegel, Walochnik and Michel | CCAP 1525/4 | (Kudryavtsev et al., 2009) | 30,929 | 9,915 | PRJNA222680
Nolandella abertawensis | (Page) Cavalier-Smith and Smirnov | CCAP 1534/9 (Hartmannella abertawensis) | (Page, 1980; Smirnov et al., 2011) | 14,378 | 6,326 | PRJNA222683
Ovalopodium desertum | Kudryavtsev, Wylezich and Pawlowski | CCAP 1530/1 | (Kudryavtsev et al., 2011) | 34,467 | 15,347 | PRJNA222682
Stenamoeba stenopodia | (Page) Smirnov et al. | CCAP 1565/3 (Platyamoeba stenopodia) | (Page, 1969b; Smirnov et al., 2007) | 38,225 | 14,300 | PRJNA222684
Vexillifera bacillipedes | Page | CCAP 1590/1 | (Page, 1969a) | 13,869 | 3,046 | PRJNA222686
______
Total | | | | 231,881 | 89,577 |
______

Table 2.

Closely related species combined into one taxon for phylogenetic analyses (order as in Fig. 2)

______

Ixodes spp. I. ricinus, I. scapularis

Drosophila spp. D. melanogaster, D. pseudoobscura

Trypanosoma spp. T. brucei, T. cruzi

Leishmania spp. L. infantum, L. major

Phytophthora spp. P. infestans, P. parasitica, P. ramorum, P. sojae

Paramecium spp. P. caudatum, P. tetraurelia

Cryptosporidium spp. C. hominis, C. parvum

Polyplacocystis spp. Polyplacocystis (=Raphidiophrys) ambigua, P. contractilis

Chlamydomonas spp. C. incerta, C. reinhardtii

Micromonas spp. M. pusilla, M. sp.

______

Figure Legends

Fig. 1. Systematic position of the species sequenced here (genera and their orders in bold) and available genomes or ESTs (boxed). Classification according to Cavalier-Smith (2013) and Smirnov et al. (2011); from left to right, subphyla (e.g. Lobosa), classes (e.g. Discosea), subclasses (e.g. Flabellinia), orders (e.g. Himatismenida), genera (e.g. Cochliopodium). Photos (not to scale) are of the species sequenced here (see Table 1; all strains sequenced are the type strains so their identity is certain), simply to help non-specialist readers visualize them, not for identification, for which the original papers cited should be consulted; that for Filamoeba sinensis is of a stationary rather than the substantially different locomotive form; credits: A. Kudryavtsev, A. Smirnov, and I. Dyková.

Fig. 2. PhyloBayes tree inferred from concatenated alignments of 188 genes (51,352 positions) for 109 eukaryotic taxa using the CAT-GTR-Γ4 model of substitution (MPI v. 1.4e). Posterior probability support is given for each bipartition (left) and bootstrap percentages (to its right) for a maximum likelihood analysis (RAxML PROTGAMMALGF) of the same alignment; these same support values are also given for separate analyses by the same methods for a reduced alignment of only 60 genes (11,100 amino acid positions) with proportionally much less missing data (i.e. all genes missing from >30% of taxa excluded); dots indicate maximum support in all four analyses. The number of amino acids included for each taxon in the 188-gene alignments follows its name (and in brackets for Amoebozoa only for the 60-gene alignment). The seven species sequenced here are in bold. The tree is rooted within excavates as suggested by multigene mitochondrial protein trees rooted on the α-proteobacterial ancestors of mitochondria (Derelle and Lang, 2012; Zhao et al., 2013) and eubacterial-origin genes generally (He et al., 2014). An alternative, arguably much more convincing, position for the root suggested by trees for ribosomal proteins (of neomuran origin: Cavalier-Smith, 2014) rooted on all prokaryotes (Lasek-Nesselquist and Gogarten, 2013) and a dozen other lines of evidence (Cavalier-Smith, 2010a, 2013, 2014) is shown by the arrow, but was not adopted here purely to allow greater lateral expansion of deeper close branches and easier labelling. For the same reasons the long Entamoeba branch and the long stem at the base of Parabasalia were both halved in length. The scale bar indicates the fraction of substitutions per site. The two chains of each CAT-GTR analysis were identical in topology for Amoebozoa and most outgroups, but differed for corticates and metamonads; therefore the chains were summed individually, and the topology shown is that for chain 2 of the 188-gene tree with monophyletic Plantae, but support values are given for all chains (posterior probabilities (PP) for chain 2 of the 188-gene tree on the left). The other support values are: second from left, PP for chain 1 of the 188-gene tree; third, PP for chain 1 of the 60-gene tree; fourth, PP for chain 2 of the 60-gene tree; fifth, ML BS for the 188-gene tree; sixth, ML BS for the 60-gene tree (chain 1 for the 188-gene trees summed 2059 trees after removing 2743 as burn-in; chain 2 summed 699, burn-in 2177; for the 60-gene trees chain 1 summed 921, burn-in 548; chain 2 summed 1335 trees, burn-in 524). Dots indicate maximal support in all six analyses.

Fig. 3. PhyloBayes tree inferred from concatenated alignments of 188 genes (50,906 positions) for 17 Amoebozoa only using the CAT-GTR-Γ4 model of substitution (MPI v. 1.4e). Posterior probability support (upper row) and bootstrap support (lower row) for a maximum likelihood analysis (RAxML PROTGAMMALGF) of the same alignment; support values are also given for the trees run by both methods on reduced alignments after removing genes missing in >50% of taxa (97 genes, 16,981 amino acids, second figures), >40% of taxa (87 genes, 13,719 amino acids, third figures), and >30% of taxa (70 genes, 10,157 amino acids, last figures); dots indicate maximal support in all eight analyses. The number of amino acids included for each species for the 188-gene analysis follows its name. The tree is rooted as shown by Figure 2. Group names follow Smirnov et al. (2011) and Cavalier-Smith (2013). The seven newly sequenced species are in bold. The scale bar indicates the fraction of substitutions per site.
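
The missing-data thresholds named in this legend (genes missing in >50%, >40% or >30% of taxa are removed) can be illustrated with a short sketch. This is only a toy example and not the pipeline used in the study: the function name `filter_genes_by_missing_taxa`, the dictionary layout and the gene and taxon labels are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def filter_genes_by_missing_taxa(
    presence: Dict[str, Set[str]],
    taxa: List[str],
    max_missing_pct: int,
) -> List[str]:
    """Return (sorted) genes that are missing in at most max_missing_pct % of taxa.

    presence maps a gene name to the set of taxa that have a sequence for it.
    A gene is excluded only when it is missing in MORE than the threshold,
    matching the ">30% of taxa" wording. Integer arithmetic avoids
    floating-point edge cases at the threshold.
    """
    n_taxa = len(taxa)
    kept = []
    for gene, present in presence.items():
        n_missing = sum(1 for taxon in taxa if taxon not in present)
        if n_missing * 100 <= max_missing_pct * n_taxa:
            kept.append(gene)
    return sorted(kept)


# Hypothetical toy data: 10 taxa and three genes with 0%, 40% and 60% missing taxa.
toy_taxa = [f"taxon{i}" for i in range(1, 11)]
toy_presence = {
    "geneA": set(toy_taxa),       # present in all 10 taxa
    "geneB": set(toy_taxa[:6]),   # missing in 4 of 10 taxa (40%)
    "geneC": set(toy_taxa[:4]),   # missing in 6 of 10 taxa (60%)
}

for threshold in (50, 40, 30):
    print(threshold, filter_genes_by_missing_taxa(toy_presence, toy_taxa, threshold))
```

Stricter thresholds keep fewer genes but leave proportionally less missing data, which is the trade-off compared across the 97-, 87- and 70-gene alignments.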

Fig. 4. PhyloBayes tree inferred from concatenated alignments of the 70 genes with least missing data (10,157 positions) for 17 Amoebozoa only using the CAT-GTR-Γ4 model of substitution (MPI v. 1.4e). Posterior probability support is given for each bipartition (upper row) and bootstrap percentages (lower row) for a maximum likelihood analysis (RAxML PROTGAMMALGF) of the same alignment; support values are also given for the trees run by both methods on alignments with successively more but less well represented genes: 87 genes, excluding all missing in >40% of taxa (second figures), 97 genes, excluding those missing in >50% of taxa (third figures), and all 188 genes (last figures); dots indicate maximum support in all eight analyses. The number of amino acids included for each species in the 70-gene alignment follows its name. The seven species sequenced here are in bold. The tree is rooted as shown by Figure 2. The scale bar indicates the fraction of substitutions per site.

Figure 1 [classification diagram with photographs of the seven sequenced genera: Vexillifera, Cochliopodium, Ovalopodium, Stenamoeba, Nolandella, Filamoeba, Flamella]

Figure 2 [phylogenetic tree of 109 eukaryotic taxa, 188 genes; scale bar 0.3]

Figure 3 [phylogenetic tree of 17 Amoebozoa, 188 genes; scale bar 0.3]

Figure 4 [phylogenetic tree of 17 Amoebozoa, 70 genes; scale bar 0.3]