bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 2 3 Barthelonids represent a deep-branching clade with -related

4 organelles generating no ATP.

5

6 Euki Yazaki1*, Keitaro Kume2, Takashi Shiratori3, Yana Eglit 4,5,, Goro Tanifuji6, Ryo

7 Harada7, Alastair G.B. Simpson4,5, Ken-ichiro Ishida7,8, Tetsuo Hashimoto7,8 and Yuji

8 Inagaki7,9*

9

10 1Department of Biochemistry and Molecular Biology, Graduate School and Faculty of 11 Medicine, The University of Tokyo, Tokyo, Japan 12 2Faculty of Medicine, University of Tsukuba, Ibaraki, Japan 13 3Department of Marine Diversity, Japan Agency for Marine-Earth Science and Technology, 14 Yokosuka, Japan 15 4Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada 16 5Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, 17 Halifax, Nova Scotia, Canada 18 6Department of Zoology, National Museum of Nature and Science, Ibaraki, Japan 19 7Graduate School of and Environmental Sciences, University of Tsukuba, Tsukuba, 20 Ibaraki, Japan 21 8Faculty of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan 22 9Center for Computational Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan 23

24 Running head: Phylogeny and putative MRO functions in a new metamonad clade.

25

26 *Correspondence addressed to Euki Yazaki, [email protected] and Yuji Inagaki,

27 [email protected]

1

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

28 Abstract

29 We here report the phylogenetic position of barthelonids, small anaerobic

30 previously examined using light microscopy alone. Barthelona spp. were isolated from

31 geographically distinct regions and we established five laboratory strains. Transcriptomic data

32 generated from one Barthelona strain (PAP020) was used for large-scale, multi-gene

33 phylogenetic (phylogenomic) analyses. Our analyses robustly placed strain PAP020 at the

34 base of the Fornicata clade, indicating that barthelonids represent a deep-branching

35 Metamonad clade. Considering the anaerobic/microaerophilic nature of barthelonids and

36 preliminary electron microscopy observations on strain PAP020, we suspected that

37 barthelonids possess functionally and structurally reduced mitochondria (i.e. mitochondrion-

38 related organelles or MROs). The metabolic pathways localized in the MRO of strain PAP020

39 were predicted based on its transcriptomic data and compared with those in the MROs of

40 fornicates. Strain PAP020 is most likely incapable of generating ATP in the MRO, as no

41 mitochondrial/MRO enzymes involved in substrate-level phosphorylation were detected.

42 Instead, we detected the putative cytosolic ATP-generating enzyme (acetyl-CoA synthetase),

43 suggesting that strain PAP020 depends on ATP generated in the cytosol. We propose two

44 separate losses of substrate-level phosphorylation from the MRO in the clade containing

45 barthelonids and (other) fornicates.

46

47 Key word: Metamonada, phylogenomics, mitochondrion-related organelles

48

2

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

49 Introduction

50 Elucidating the evolutionary relationships among the major groups of is

51 one of the most fundamental but unsettled questions in biology. It is widely accepted that

52 large-scale molecular data for phylogenetic analyses (so-called phylogenomic data) are

53 indispensable to infer ancient splits in the tree of eukaryotes (Burki 2014; Burki et al. 2019;

54 Keeling and Burki 2019). Preparing phylogenomic data has been greatly advanced by the

55 recent technological improvements in sequencing that generate a large amount of molecular

56 data at an affordable cost and in a reasonable time-frame (Bleidorn 2016; Vincent et al.

57 2017). Further, some recent phylogenomic analyses have included uncultured microbial

58 eukaryotes (e.g., Lax et al. 2018), since the libraries for sequencing of the whole-

59 genome/transcriptome can be prepared from a small number of cells (or even a single cell)

60 isolated from an environment sample (Kolisko et al. 2014; Strassert et al. 2019).

61 Despite these advances in experimental techniques, it is realistic to assume that no

62 current phylogenomic analysis has covered the true diversity of eukaryotes. A large number

63 of extant microbial eukaryotes have never been examined using transcriptomic or genomic

64 techniques, and some of them may hold the keys to resolving important unanswered questions

65 in eukaryotic phylogeny and evolution. Thus, to reconstruct the evolutionary relationships

66 among the major eukaryotic assemblages to a resolution that is both accurate and informative,

67 the taxon sampling in phylogenomic analyses has been improved by targeting two classes of

68 organisms: (i) Novel microbial eukaryotes that represent lineages that were previously

69 unknown to science, and (ii) “orphan eukaryotes” that had been reported before, but whose

70 evolutionary affiliations were unresolved by morphological examinations and/or single-gene

71 phylogenies (Zhao et al. 2012; Kamikawa et al. 2014; Yabuki et al. 2014; Burki et al. 2016;

72 Janouškovec et al. 2017; Brown et al. 2018; Lax et al. 2018; Gawryluk et al. 2019; Strassert et

73 al. 2019).

3

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

74 Many of “orphan eukaryotes” were described based solely on morphological

75 information prior to the regular use of gene sequences in phylogenetic/taxonomic studies.

76 One such organism is the small free-living heterotrophic biflagellate Barthelona vulgaris

77 (Bernard et al. 2000). The initial description of B. vulgaris was based on light microscopy

78 observations of cells isolated from marine sediment from Quibray Bay, Australia, and

79 maintained temporarily in nominally anoxic crude culture (Bernard et al. 2000). The

80 morphospecies was later identified at different geographical locations (Lee 2002; Lee 2006)

81 but never examined with methods incorporating molecular data. These past studies identified

82 no special morphological similarity between B. vulgaris and any eukaryotes described to date

83 at the morphological level (Bernard et al. 2000; Lee 2002; Lee 2006). Thus, to clarify the

84 phylogenetic placement of B. vulgaris in the tree of eukaryotes, molecular phylogenetic

85 analyses are required, preferably at the “phylogenomic” scale.

86 We here report five laboratory strains of Barthelona (EYP1702, FB11, LRM2,

87 PAP020 and PCE; Figs. 1A-E) isolated from separate geographical regions, and infer their

88 phylogenetic positions assessed by analyzing both SSU rDNA and phylogenomic data. A

89 SSU rDNA phylogeny robustly united all of the Barthelona strains together, but the precise

90 placement of Barthelona spp. among other eukaryotes remained inconclusive. To infer the

91 precise phylogenetic position of barthelonids, we obtained a transcriptome data from strain

92 PAP020, and analyzed its phylogenetic position from a -wide dataset containing

93 148 genes. The transcriptome data of strain PAP020 was also used for reconstructing the

94 metabolic pathways in a functionally and structurally reduced mitochondrion that is the result

95 of adaptation to anaerobiosis.

96

4

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

97 Results and Discussion 98

99 Phylogenetic position of barthelonids

100 Overall, the maximum-likelihood (ML) and Bayesian phylogenetic analyses of small

101 subunit ribosomal RNA gene (SSU rDNA) sequences resolved known major eukaryote

102 groups with moderate to high statistical support values, but the backbone of the tree remained

103 unresolved (Fig. 2). In the SSU rDNA tree, all of Barthelona sp. strains PAP020, EYP1702,

104 FB11, PCE, and LRM2 grouped together with a ML bootstrap value (MLBP) of 83% and a

105 Bayesian posterior probability (BPP) of 0.98. In this Barthelona clade, strains EYP1702 and

106 PCE were the earliest and second earliest diverging taxa, respectively, and strains PAP020,

107 LRM2 and FB11 formed a tight subclade. The Barthelona clade was sister to a Fornicata

108 clade comprising membranifera, Kipferlia bialata, Dysnectes brevis,

109 sp. and intestinalis (Fig. 2), but statistical support was equivocal

110 (MLBP 56%; BPP 0.86). This possible affinity between Barthelona and fornicates in the SSU

111 rDNA phylogeny is provocative, as both lineages thrive in oxygen-poor environments and

112 possess double-membrane bound MROs instead of typical mitochondria (Simpson and

113 Patterson 1999; Tovar et al. 2003; Yubuki et al. 2007; Yubuki et al. 2013; Kulda et al. 2017;

114 see Fig. S2 for the putative MRO in strain PAP020). Thus, we took a phylogenomic approach

115 to more robustly resolve the position of barthelonids in the tree of eukaryotes.

116 As anticipated, both ML and Bayesian phylogenetic analyses of a multi-gene

117 alignment comprising 148 genes (148-gene alignment) provided us deeper insights into the

118 backbone of the tree of eukaryotes (Fig. 3) than the SSU rDNA analyses (Fig. 2). The

119 backbone tree topology and statistical support values (Fig. 3) agreed largely with those

120 reported in Kamikawa et al. (2014), Yabuki et al. (2014) and Yabuki et al. (2018), which

121 analyzed multi-gene alignments generated from the same core set of 157 single-gene

5

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

122 alignments with mostly similar taxon sampling. The topology includes well established clades

123 including SAR, , and Discoba, but, as is common, did not infer a

124 monophyletic (Cenci et al. 2018; Strassert et al. 2019). Likewise, the 148-gene

125 phylogeny recovered neither the clade of Telonema subtilis and SAR (“T-SAR”; Strassert et

126 al. 2019) nor that of and (; Burki et al. 2016). We suspect

127 that large proportions of missing data in the sequence of T. subtilis and the single included

128 (34 and 35%, respectively), which derived from the transcriptomic data generated

129 by 454 pyrosequencing (Burki et al. 2009), hindered the recoveries of T-SAR and Haptista in

130 the 148-gene phylogeny.

131 The 148-gene phylogeny grouped Barthelona sp. strain PAP020 and 6 fornicates

132 together with a MLBP of 99% and a BPP of 1.0 (Fig. 3). In this clade, strain PAP020

133 occupied the basal position, which was supported fully by both ML and Bayesian analyses.

134 The clade of strain PAP020 and fornicates was connected sequentially with

135 (MLBP 100%; BPP 0.70), then with Paratrimastix pyriformis (representing Preaxostyla), to

136 form the Metamonada clade with a MLBP of 98% and a BPP of 0.98 (Fig. 3). Support for

137 these relationships was hardly affected by exclusion of rapidly evolving alignment positions,

138 until >60% of site were excluded (Fig. S1). We applied the ML tree and three alternative

139 trees, wherein strain PAP020 branched at the base of the Parabasalia clade, the clade of

140 Fornicata + Parabasalia and the Metamonada clade, to an approximately unbiased (AU) test,

141 and all of the alternative trees were rejected (p < 0.001). The results from the phylogenetic

142 analyses of 148-gene alignment consistently and robustly indicated that barthelonids are a

143 previously overlooked Metamonada lineage, which has a specific affinity with the Fornicata

144 clade.

145 There are two uncertain issues related to the taxonomic treatment of barthelonids for

146 future studies. Firstly, molecular phylogenetic analyses alone cannot determine whether

6

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

147 barthelonids are (a) a sister taxon to Fornicata, or (b) the deepest known branch within the

148 taxon of Fornicata. Fornicata is defined by a key ultrastructural characteristic in the flagellar

149 apparatus, namely the so-called “B fiber” forms a conspicuous arching bridge between the

150 two flagellar roots supporting the ventral feeding groove (Simpson 2003). Therefore, we need

151 to investigate the ultrastructure of barthelonid cells in detail for their higher-level taxonomic

152 treatment. The second issue for future studies is whether it is appropriate to classify all of the

153 five strains examined in this study into a single Barthelona. In the Barthelona clade

154 recovered in the SSU rDNA phylogeny (Fig. 2), strains PAP020, LRM2 and FB11 appeared

155 to be closely related to one another but are distant from strains PCE and EYP1702. We need

156 to assess their morphological characteristics to settle this issue.

157 Lack of substrate-level phosphorylation in the mitochondrion-related organelle of

158 Barthelona sp. PAP020.

159 All of the Barthelona strains assessed in this study (strains PAP020, EYP1702, PCE, LRM2

160 and FB11) are grown under oxygen-poor conditions in the laboratory. Our preliminary

161 ultrastructural observation of strain PAP020 did not reveal a typical mitochondrion. Instead

162 we observed a densely stained, double membrane-bounded organelle (Fig. S2). As all

163 studied so far lack typical mitochondria, we suspect that the double membrane-

164 bounded organelle identified in strain PAP020 is the MRO. According to the phylogenetic

165 position of barthelonids deduced from the SSU rDNA and 148-gene phylogeny (Figs. 2 & 3),

166 the metabolic pathways retained in the barthelonid MROs are significant to infer the

167 evolutionary history of the MROs in the Fornicata clade.

168 Leger et al. (2017) proposed that the ancestral fornicate possessed an MRO

169 with a metabolic capacity similar to that of the in parabasalids like

170 vaginalis. Thus, we surveyed the transcriptomic data from strain PAP020 for

171 transcripts encoding hydrogenosomal/MRO proteins that are homologous to Trichomonas

7

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

172 proteins localized in the . Strain PAP020 was predicted to possess the MRO

173 proteins involved in hydrogen production, pyruvate metabolism, amino acid metabolism, Fe-

174 S cluster assembly, anti-oxidant system and protein modification (chaperones and proteases)

175 (Figure 4A; purple and grey ellipses represent the proteins found and not found, respectively;

176 see also Table S2). This suggests that the overall function of the MRO of strain PAP020 is

177 similar to that of the Trichomonas hydrogenosome, except in ATP generation capacity. We

178 did not identify any transcripts encoding two enzymes for anaerobic ATP generation through

179 substrate-level phosphorylation, namely (i) acetate:succinate CoA transferase (ASCT) that

180 transfers coenzyme A (CoA) from acetyl-CoA to succinate and (ii) succinyl-CoA synthase

181 (SCS) that phosphorylates ADP to produce ATP coupled with converting succinyl-CoA back

182 to succinate. We propose that strain PAP020 genuinely lacks ASCT and SCS and that its

183 MRO is incapable of generating ATP.

184 We additionally surveyed the PAP020 data for transcripts encoding cytosol-localizing

185 acetyl-CoA synthase (ACS), which is an alternative mechanism to generate ATP in fornicate

186 cells. Intriguingly, two distinct ACS sequences were retrieved, designated here as ACS2 and

187 ACS3. Although the transcripts encoding both ACS versions most likely cover their N-

188 termini, neither of them was predicted to bear the typical signal to be localized in

189 mitochondria or MROs (i.e. an inferred N-terminal transit peptide). The abundances of the

190 ACS2 and ACS3 transcripts in strain PAP020 were 2249 and 2208 Transcripts Per kilobase

191 Million (TPM; Li and Dewey 2011), respectively, implying that the two Barthelona ACS

192 genes are indistinguishable at the transcription level. We subjected the two ACS sequences to

193 a phylogenetic analysis along with the homologues sampled from diverse ,

194 and eukaryotes (Fig S3). The PAP020 ACS2 sequence formed a clade with fornicate “ACS2”

195 sequences, which Leger et al. (2017) proposed to be cytosolic enzymes. Thus, we suggest that

196 ACS2 is most likely a cytosolic enzyme in strain PAP020 as well. The ACS phylogeny

8

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

197 recovered no strong affinity between PAP020 ACS3 sequence and other homologues (Fig.

198 S3). Neither of our analyses on the ACS3 sequence provided any positive support for MRO

199 localization, and we tentatively consider ACS3 as a cytosolic enzyme in strain PAP020.

200 Altogether, we conclude that strain PAP020 depends entirely on ATP generated by the two

201 cytosol-localizing ACS, as its MRO lacks substrate-level phosphorylation. For a better

202 understanding of ATP synthesis in this organism, the precise subcellular localizations of

203 ACS2 and ACS3 in strain PAP020 need to be confirmed experimentally in the future.

204 Leger et al. (2017) proposed a complex evolution of ATP-generating mechanisms in

205 the Fornicata clade, as follows: (i) The ancestral fornicate species possessed both substrate-

206 level phosphorylation in the MRO as well as ACS2 in the cytosol. (ii) Substrate-level

207 phosphorylation has been inherited vertically to the extant fornicate species, except D. brevis

208 and (see below). (iii) During the evolution of Fornicata, the ancestral cytosol-

209 localizing ACS (i.e. ACS2) was replaced by an evolutionarily distinct ACS (ACS1). (iv) The

210 redundancy in the ATP-generating system allowed the secondary loss of substrate-level

211 phosphorylation in the MRO prior to the separation of the D. brevis plus clade.

212 We here extend the scenario proposed by Leger et al. (2017) by incorporating the data from

213 Barthelona sp. strain PAP020 (See Fig. 4B). Acquisition of ACS2 was hypothesized at the

214 base of the Fornicata clade in the previous work (Leger et al. 2017), but after assessing the

215 data from stain PAP020, this particular event needs to be pushed back at least to the common

216 ancestor of fornicates and barthelonids, as strain PAP020 and multiple early-branching CLOs

217 (e.g., C. membranifera) share ACS2. It is worthy to note that acquisition of ACS2 may extend

218 back to the last common metamonad ancestor, since a possibly directly related ACS2 is also

219 present in (Fig. S3). Secondly, as barthelonids are distantly related to D. brevis and

220 diplomonads, loss of substrate-level phosphorylation in barthelonid MROs can be assumed to

221 have occurred independently from the loss in the common ancestor of D. brevis and

9

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

222 diplomonads (highlighted by blue diamonds in Fig. 3B). Further, barthelonids and the common

223 ancestor of D. brevis and diplomonads seem to have accommodated the loss of MRO-localized

224 substrate-level phosphorylation via possessing evolutionarily distinct ACS homologs (ACS2 and

225 ACS1, represented by yellow and red lines, respectively in Fig. 4B).

226 In the current study, we reconstructed the metabolic pathways in the MRO of only one

227 of the five strains of Barthelona sp. We anticipate that stains PAP020, LRM2 and FB11,

228 which formed a tight clade in the SSU rDNA phylogeny (Fig. 2), may have MROs with the

229 same or a very similar set of metabolic pathways. In future studies, it is important to

230 reconstruct the metabolic pathways in the MROs of strains PCE and/or EYP1702 to further

231 resolve the evolution of MROs and anaerobic metabolism in the Metamonada clade.

232 Considering the large evolutionary distance between PCE/EYP1702 and

233 PAP020/LRM2/FB11 in the SSU rDNA phylogeny (Fig. 2), we may find that the MRO

234 functions of strains PCE and EYP1702 are substantially different from that of strain PAP020

235 deduced in the current study.

236

237 Materials & Methods

238 Isolation and Cultivation

239 We established five laboratory strains of Barthelona sp. in this study (Fig. 1A-E).

240 Strains PAP020 and EYP1702 (Figs. 1A & 1D) were isolated from anaerobic mangrove

241 sediments collected at a seawater lake in the Republic of Palau in November 2011 and

242 October 2017, respectively. The laboratory cultures have been maintained in mTYGM-9

243 medium (http://mcc.nies.go.jp/medium/ja/mtygm9.pdf) with prey bacteria at 18-20 ℃. An

244 anaerobic environment within the laboratory cultures was created by the respiration of prey

245 bacteria. LRM2 (Fig 1B) was isolated from mud of a defunct saltern (now normal salinity) on

10

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

246 the Ebre Delta near San Carles de la Ràpita, Catalonia, Spain in February 2015. FB11 (Fig1C)

247 was isolated from False Bay, an intertidal mud flat on San Juan Island, WA, US in June 2015.

248 PCE (Fig 1E) was isolated from intertidal sediment near Cavendish, PEI, Canada in July,

249 2016. The established cultures were maintained with co-cultured bacteria on 3% LB in sterile

250 natural seawater at 18-21°C.

251 SSU rDNA phylogenetic analysis

252 Total DNA samples of Barthelona sp. strains PAP020, EYP1702, FB11, PCE and

253 LRM2 were extracted from the cultured cells using a DNeasy mini kit (Qiagen) or

254 NucleoSpin® Tissue kit (Macherey-Nagel). Near-complete SSU rDNA fragments were

255 amplified from each DNA sample by PCR, using either primers SR1 and SR12 (Nakayama et

256 al. 1998) or 18F and 18R (Yabuki et al. 2010). The amplification program consisted of 30

257 cycles of denaturation at 94 ℃ for 30 s, annealing at 55 ℃ for 30 s and extension at 72 ℃ for

258 90 s. The amplified product was gel-purified, cloned and sequenced by the Sanger method.

259 We aligned the SSU rDNA sequences of the five Barthelona strains with those of 91

260 phylogenetically diverse eukaryotes by using MAFFT 7.205 (Katoh 2002; Katoh and

261 Standley 2014). After manual exclusion of ambiguously aligned positions, 1,573 nucleotide

262 positions were subjected to ML phylogenetic analyses by using IQTREE v 1. 5. 4 (Nguyen et

263 al. 2015) with the GTR + R6 model, with MLBPs derived from 500 non-parametric bootstrap

264 replicates. The SSU rDNA alignment was also subjected to Bayesian phylogenetic analysis

265 using MrBayes 3.2.3 (Ronquist et al. 2012) with GTR + Γ model. The Markov Chain Monte

266 Carlo (MCMC) run was performed with one cold and three heated chains with default chain

267 temperatures. We ran 3,000,000 generations, and sampled log-likelihood scores and trees

268 with branch lengths every 1,000 generations (the stationarity was confirmed by plotting the

269 log-likelihoods sampled during the MCMC). The first 25% generations were discarded as

11

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

270 burn-in. The consensus tree with branch lengths and BPPs were calculated from the

271 remaining trees.

272 RNA-seq analyses

273 We conducted two RNA-seq runs of Barthelona sp. strain PAP020. The sequence

274 reads from the first analysis was used for a phylogenomic analysis assessing the position of

275 Barthelona spp. in the tree of eukaryotes, while those from the second sequencing run were

276 for surveying the proteins localized in the mitochondrial-related organelle (MRO) in strain

277 PAP020 (see below).

278 For the first RNA-seq run, PAP020 cells, together with bacterial cells in the culture

279 medium, were harvested and subjected to RNA extraction using TRIzol (Life Technologies)

280 by following the manufacturer’s protocol. We shipped the RNA sample to a biotech company

281 (Hokkaido System Science) for cDNA library construction and subsequent sequencing using

282 the Illumina HiSeq 2500 platform, which generated 2.9 x 107 paired-end 100-bp reads (2.9

283 Gb in total). The initial reads were then assembled into 29,251 unique contigs by TRINITY

284 (Grabherr et al. 2011; Haas et al. 2013).

285 For the second RNA-seq run, we separated PAP020 cells from the bacterial cells in

286 the culture medium by a gradient centrifugation using Optiprep (Axis Shield), as reported

287 previously (Tanifuji et al. 2018), with slight modifications (the Optiprep solution containing

288 the eukaryotic cells and bacteria was centrifuged at 2,000 g for 20 min, instead of 800 g for

289 20 min). Total RNA was extracted from the harvested eukaryote-enriched fraction, using

290 TRIzol by following the manufacturer’s protocol. Poly-A tailed RNAs in the RNA sample

291 described above were purified with a Dynabeads™ mRNA Purification Kit (Thermo Fisher

292 Scientific), and then used to construct the cDNA library using the SMART-Seq v4 Ultra Low

293 Input RNA Kit (Takara Bio USA) for Sequencing and Nextera XT DNA Library Preparation

294 Kit (Illumina). The resultant cDNA library was sequenced with the Illumina Miseq platform,

12

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

295 yielding 3.7 x 107 paired-end 300-bp sequence reads (8.6 Gb in total). These were assembled

296 into 21,286 unique contigs using TRINITY.

297 Phylogenomic analyses

298 To elucidate the phylogenetic position of Barthelona sp. strain PAP020, we prepared a

299 phylogenomic alignment by updating an existing dataset comprising 157 genes (Kamikawa et

300 al. 2014; Yabuki et al. 2014; Yabuki et al. 2018, see Table S1). For each of these 157 genes,

301 we added the homologous sequences retrieved from the transcriptomic data of strain PAP020

302 (this study) and four fornicates (Carpediemonas membranifera, Aduncisulcus paluster,

303 Kipferlia bialata and Dysnectes brevis; Leger et al. 2017). Each single-gene alignment was

304 aligned individually by MAFFT 7.205 with the L-INS-i algorithm followed by manual

305 correction and exclusion of ambiguously aligned positions. For each of the single-gene

306 alignments, the ML phylogenetic tree was inferred by RAxML 8.1.20 (Stamatakis 2014)

307 under the LG + Γ + F model with robustness assessed with a 100 replicate bootstrap analysis.

308 Individual single-gene trees were inspected to identify the alignments bearing aberrant

309 phylogenetic signal that disagreed strongly with any of a set of well-established monophyletic

310 assemblages in the tree of eukaryotes, namely Opisthokonta, , Alveolata,

311 Stramenopiles, , Rhodophyta, , Glaucophyta, Haptophyta, Cryptophyta,

312 Jakobida, , Heterolobosea, Diplomonadida, Parabasalia and .

313 Nine out of the 157 single-gene alignments were found to bear idiosyncratic phylogenetic

314 signal and were excluded from the phylogenomic analyses described below. After inspection

315 of single-gene alignments/trees, the remaining 148 single-gene alignments (Table S1) were

316 concatenated into a single phylogenomic alignment containing 83 taxa with 38,816

317 unambiguously aligned amino acid positions (148-gene alignment). The coverage for each

318 single-gene alignment is summarized in Table S1.

13

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

319 ML analyses of 148-gene alignment were conducted by using IQ-TREE v. 1.5.4 with

320 the LG + Γ + F + C60 + PMSF (posterior mean site frequencies) model (Wang et al. 2018)

321 and robustness evaluated with a ML bootstrap analysis on 100 replicates. We also conducted

322 a Bayesian phylogenetic analysis with the CAT + GTR model using PHYLOBAYES 1.5a

323 (Lartillot and Philippe 2004; Lartillot and Philippe 2006; Lartillot et al. 2007). In each

324 analysis, two MCMC runs were run for 5,000 cycles with “burn-in” of 1,250 (‘maxdiff’ value

325 was 0.96743). The consensus tree with branch lengths and Bayesian posterior probabilities

326 (BPPs) were calculated from the remaining trees.

327 The phylogenetic position of Barthelona sp. strain PAP020 inferred from the 148-

328 gene alignment was assessed by an approximately unbiased test (Shimodaira 2002). We

329 modified the ML tree to prepare four alternative tree topologies, in which strain PAP020

330 branches 1) at the base of the Parabasalia clade, 2) at the base of the clade of parabasalids and

331 fornicates, 3) with Paratrimastix pyriformis, and 4) at the base of the Metamonada clade. Site

332 likelihood data were calculated over each of the five trees examined (ML plus four alternative

333 trees) using IQ-TREE and then analyzed in CONSEL ver.0.20 (Shimodaira and Hasegawa

334 2001) with the default settings.

335 We evaluated the contribution of fast-evolving sites in the 148-gene alignment to the

336 position of Barthelona sp. strain PAP020. Individual rates for sites were calculated over the

337 ML tree topology using DIST_EST (Susko et al. 2003) with the LG + Γ + F model. Fast-

338 evolving sites were progressively removed from the original 148-gene alignment in 4,000-

339 position increments, and each of the resulting alignments was subjected to 100 replicate rapid

340 ML bootstrap analysis with RAxML 8.1.20 with the LG + Γ + F model.

14

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

341 Prediction of proteins localized in the mitochondrion-related organelle in Barthelona sp.

342 PAP020

343 We searched for mRNA sequences encoding proteins predicted to be localized to the

344 mitochondrion-related organelle (MRO) in Barthelona sp. strain PAP020, as well as those

345 involved in anaerobic ATP generation. For this we searched among the contigs generated

346 from the second RNA-seq experiment by TBLASTN, using the hydrogenosomal/MRO

347 proteins in and Giardia intestinalis as the queries (Leger et al. 2017).

348 The amino acid sequences deduced from the contigs retrieved by the first BLAST searches

349 were then subjected to BLASTP analyses against NCBI nr database to exclude false positives.

350 The structures of the putative MRO proteins were examined using hmmscan 3.1

351 (http://hmmer.org). We inspected each of the putative MRO proteins for potential

352 mitochondrial targeting sequences using MitoFates (Fukasawa et al. 2015) with default

353 parameters for the fungal sequences, and NommPred (Kume et al. 2018) with parameters for

354 canonical mitochondria and MRO.

355

356 Acknowledgements

357 This work was supported in part by a fund from the grants from the Japan Society for the

358 Promotion of Science (18KK0203 and 19H03280 awarded to YI; 15H05231 and 19KK0185

359 to TH) and by the "Tree of Life" research project of University of Tsukuba, as well as Natural

360 Sciences and Engineering Research Council (Canada) Discovery (Grant 298366-2014 to

361 AGBS). We thank Dr. Noèlia Carrasco for providing access to the Delta de l’Ebre sampling

362 location for strain LRM2. The SSU rDNA sequences of Barthelona spp. were deposited in

363 GenBank/EMBL/DDBJ database under the accession nos. LC506386‒LC506390. The

15

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

364 transcriptome data of strain PAP020 were deposited in DDBJ Sequence Archive under the

365 accession nos. SRA### and SRA$$$$.

366 References

367 Bernard C, Simpson AGB, Patterson DJ. 2000. Some free-living flagellates (protista) from 368 anoxic habitats. Ophelia 52:113–142. 369 Bleidorn C. 2016. Third generation sequencing: Technology and its potential impact on 370 evolutionary biodiversity research. Syst. Biodivers. 14:1–8. 371 Brown MW, Heiss AA, Kamikawa R, Inagaki Y, Yabuki A, Tice AK, Shiratori T, Ishida KI, 372 Hashimoto T, Simpson AGB, et al. 2018. Phylogenomics places orphan protistan 373 lineages in a novel eukaryotic super-group. Genome Biol. Evol. 10:427–433. 374 Burki F. 2014. The eukaryotic tree of life from a global phylogenomic perspective. Cold 375 Spring Harb. Perspect. Biol. 6. 376 Burki F, Inagaki Y, Bråte J, Archibald JM, Keeling PJ, Cavalier-Smith T, Sakaguchi M, 377 Hashimoto T, Horak A, Kumar S, et al. 2009. Large-scale phylogenomic analyses reveal 378 that two enigmatic lineages, and Centroheliozoa, are related to 379 photosynthetic chromalveolates. Genome Biol. Evol. 1:231–238. 380 Burki F, Kaplan M, Tikhonenkov D V., Zlatogursky V, Minh BQ, Radaykina L V., Smirnov 381 A, Mylnikov AP, Keeling PJ. 2016. Untangling the early diversification of eukaryotes: A 382 phylogenomic study of the evolutionary origins of centrohelida, haptophyta and 383 . Proc. R. Soc. B Biol. Sci. 283:20152802. 384 Burki F, Roger A, Brown MW, Simpson AGB. 2019. The new tree of eukaryotes. Trends 385 Ecol. Evol. In press. 386 Cenci U, Sibbald SJ, Curtis BA, Kamikawa R, Eme L, Moog D, Henrissat B, Maréchal E, 387 Chabi M, Djemiel C, et al. 2018. Nuclear genome sequence of the plastid-lacking 388 avonlea provides insights into the evolution of secondary 389 plastids. BMC Biol. 16:137. 390 Fukasawa Y, Tsuji J, Fu SC, Tomii K, Horton P, Imai K. 2015. MitoFates: Improved 391 prediction of mitochondrial targeting sequences and their cleavage sites. Mol. Cell. 392 Proteomics 14:1113–1126. 393 Gawryluk RMR, Tikhonenkov D V., Hehenberger E, Husnik F, Mylnikov AP, Keeling PJ. 394 2019. Non-photosynthetic predators are sister to . Nature 572:240–243.

16

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

395 Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, 396 Raychowdhury R, Zeng Q, et al. 2011. Full-length transcriptome assembly from RNA- 397 Seq data without a reference genome. Nat. Biotechnol. 29:644–652. 398 Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles 399 D, Li B, Lieber M, et al. 2013. De novo transcript sequence reconstruction from RNA- 400 seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 401 8:1494–1512. 402 Janouškovec J, Tikhonenkov D V., Burki F, Howe AT, Rohwer FL, Mylnikov AP, Keeling 403 PJ. 2017. A new lineage of eukaryotes illuminates early mitochondrial genome 404 reduction. Curr. Biol. 27:3717-3724.e5. 405 Kamikawa R, Kolisko M, Nishimura Y, Yabuki A, Brown MW, Ishikawa S a, Ishida K, 406 Roger AJ, Hashimoto T, Inagaki Y. 2014. Gene content evolution in Discobid 407 mitochondria deduced from the phylogenetic position and complete mitochondrial 408 genome of Tsukubamonas globosa. Genome Biol. Evol. 6:306–315. 409 Katoh K. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast 410 Fourier transform. Nucleic Acids Res. 30:3059–3066. 411 Katoh K, Standley DM. 2014. MAFFT: Iterative refinement and additional methods. Methods 412 Mol. Biol. 1079:131–146. 413 Keeling PJ, Burki F. 2019. Progress towards the tree of eukaryotes. Curr. Biol. 29:R808– 414 R817. 415 Kolisko M, Boscaro V, Burki F, Lynn DH, Keeling PJ. 2014. Single-cell transcriptomics for 416 microbial eukaryotes. Curr. Biol. 24:R1081–R1082. 417 Kulda J, Nohýnková E, Cepicka I. 2017. Retortamonadida (with Notes on A Carpediemonas- 418 like organisms and Caviomonadidae). In: Handbook of the : Second Edition. 419 Springer International Publishing. p. 1247–1278. 420 Kume K, Amagasa T, Hashimoto T, Kitagawa H. 2018. NommPred: Prediction of 421 mitochondrial and mitochondrion-related organelle proteins of nonmodel organisms. 422 Evol. Bioinformat. 14:1-12. 423 Lartillot N, Brinkmann H, Philippe H. 2007. Suppression of long-branch attraction artefacts 424 in the phylogeny using a site-heterogeneous model. BMC Evol. Biol. 7:S4. 425 Lartillot N, Philippe H. 2004. A Bayesian mixture model for across-site heterogeneities in the 426 amino-acid replacement process. Mol. Biol. Evol. 21:1095–1109. 427 Lartillot N, Philippe H. 2006. Computing Bayes factors using thermodynamic integration. 428 Syst. Biol. 55:195–207. 17

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

429 Lax G, Eglit Y, Eme L, Bertrand EM, Roger AJ, Simpson AGB. 2018. is 430 a novel supra--level lineage of eukaryotes. Nature 564:410–414. 431 Lee WJ. 2002. Some free‐living heterotrophic flagellates from marine sediments of Inchon 432 and Ganghwa Island, Korea. Korean J. Biol. Sci. 6:125–143. 433 Lee WJ. 2006. Some free-living heterotrophic flagellates from marine sediments of tropical 434 Australia. Ocean Sci. J. 41:75–95. 435 Leger MM, Kolisko M, Kamikawa R, Stairs CW, Kume K, Čepička I, Silberman JD, 436 Andersson JO, Xu F, Yabuki A, et al. 2017. Organelles that illuminate the origins of 437 Trichomonas hydrogenosomes and Giardia . Nat. Ecol. Evol. 1:0092. 438 Li B, Dewey CN. 2011. RSEM: accurate transcript quantification from RNA-Seq data with or 439 without a reference genome. BMC Bioinformatics 12:323. 440 Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: A fast and effective 441 stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 442 32:268–274. 443 Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, 444 Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: Efficient bayesian phylogenetic 445 inference and model choice across a large model space. Syst. Biol. 61:539–542. 446 Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst. 447 Biol. 51:492–508. 448 Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic 449 tree selection. Bioinformatics 17:1246–1247. 450 Simpson AGB. 2003. Cytoskeletal organization, phylogenetic affinities and systematics in the 451 contentious taxon (Eukaryota). Int. J. Syst. Evol. Microbiol. 53:1759–1777. 452 Simpson AGB, Patterson DJ. 1999. The ultrastructure of Carpediemonas membranifera 453 (Eukaryota) with reference to the “excavate hypothesis.” Eur. J. Protistol. 35:353–370. 454 Stamatakis A. 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis of 455 large phylogenies. Bioinformatics 30:1312–1313. 456 Strassert JFH, Jamy M, Mylnikov AP, Tikhonenkov D V., Burki F. 2019. New phylogenomic 457 analysis of the enigmatic Telonemia further resolves the eukaryote tree of life. 458 Mol. Biol. Evol. 36:757–765. 459 Susko E, Field C, Blouin C, Roger AJ. 2003. Estimation of rates-across-sites distributions in 460 phylogenetic substitution models. Syst. Biol. 52:594–603.

18

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

461 Tanifuji G, Takabayashi S, Kume K, Takagi M, Nakayama T, Kamikawa R, Inagaki Y, 462 Hashimoto T. 2018. The draft genome of Kipferlia bialata reveals reductive genome 463 evolution in fornicate parasites. PLoS One 13: e0194487. 464 Tovar J, León-Avila G, Sánchez LB, Sutak R, Tachezy J, Van Der Giezen M, Hernández M, 465 Müller M, Lucocq JM. 2003. Mitochondrial remnant organelles of Giardia function in 466 iron-sulphur protein maturation. Nature 426:172–176. 467 Vincent AT, Derome N, Boyle B, Culley AI, Charette SJ. 2017. Next-generation sequencing 468 (NGS) in the microbiological world: How to make the most of your money. J. Microbiol. 469 Methods 138:60–71. 470 Wang HC, Minh BQ, Susko E, Roger AJ. 2018. Modeling site heterogeneity with posterior 471 mean site frequency profiles accelerates accurate phylogenomic estimation. Syst. Biol. 472 67:216–235. 473 Yabuki A, Gyaltshen Y, Heiss AA, Fujikura K, Kim E. 2018. Ophirina amphinema n. gen., n. 474 sp., a new deeply branching discobid with phylogenetic affinity to . Sci. Rep. 8. 475 Yabuki A, Kamikawa R, Ishikawa S a, Kolisko M, Kim E, Tanabe AS, Kume K, Ishida K-I, 476 Inagaki Y. 2014. represents a basal cryptist lineage: insight into the 477 character evolution in Cryptista. Sci. Rep. 4:4641. 478 Yubuki N, Inagaki Y, Nakayama T, Inouye I. 2007. Ultrastructure and ribosomal RNA 479 phylogeny of the free-living heterotrophic Dysnectes brevis n. gen., n. sp., a 480 new member of the Fornicata. J. Eukaryot. Microbiol. 54:191–200. 481 Yubuki N, Simpson AGB, Leander BS. 2013. Comprehensive ultrastructure of Kipferlia 482 bialata provides evidence for character evolution within the Fornicata (Excavata). Protist 483 164:423–439. 484 Zhao S, Burki F, Bråte J, Keeling PJ, Klaveness D, Shalchian-Tabrizi K. 2012. - 485 an ancient lineage in the tree of eukaryotes. Mol. Biol. Evol. 29:1557–1568. 486

487 Figure legends

488 Figure 1. Light micrographs of Barthelona spp. studied in this study. Strains PAP020,

489 FB11, LRM2, EYP1702 and PCE are shown in (A), (B), (C), (D), and (E), respectively.

490 Flagella are marked by arrowheads. Scale bars = 10 μm.

491

19

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

492 Figure 2. Global eukaryotic phylogeny inferred from small subunit ribosomal DNA

493 sequences. The tree topology was inferred using the maximum-likelihood (ML) method and

494 ML bootstrap values (MLBPs) and Bayesian posterior probabilities (BPPs) were mapped on

495 the ML tree. The nodes marked by dots were supported by MLBPs of 100% and BPPs of 1.0.

496 MLBPs less than 70% are not shown. BPPs of 0.95 or more are marked by diamonds.

497

498 Figure 3. Global eukaryotic phylogeny inferred from a 148-gene alignment. The tree

499 topology was inferred using the maximum-likelihood (ML) method; ML bootstrap values

500 (MLBPs) and Bayesian posterior probabilities (BPPs) were mapped on the ML tree. The

501 Bayesian analysis recovered and identical overall topology. The nodes marked by dots were

502 supported by MLBPs of 98% or more, and BPPs of 0.95 or more. MLBPs less than 60% or

503 BPPs below 0.80 are not shown. The bar graph for each taxon indicates the percent coverage

504 of the amino acid positions in the 148-gene analyses.

505

506 Figure 4. Function and evolution of the mitochondrion-related organelles (MRO) of

507 Barthelona sp. strain PAP020. A. Reconstructed metabolic pathways in the MRO of strain

508 PAP020. Dark purple ellipses indicate that the transcripts encoding hydrogenosomal/MRO

509 proteins were detected in the Barthelona RNA-seq data, and their N-termini were predicted as

510 transit peptides for mitochondria/MRO by MitoFates (Fukasawa et al. 2015) and/or

511 NommPred (Kume et al. 2018). Pale purple ellipses indicate putative hydrogenosomal/MRO

512 proteins lacking N-terminal sequence information or those with N-terminal extensions that

513 were not predicted as mitochondria/MRO localizing by MitoFates and NommPred.

514 Hydrogenosomal/MRO proteins shown in grey ellipses represent the absence of the

515 corresponding transcripts in the Barthelona RNA-seq data. Strain PAP020 possesses two

516 acetatyl CoA synthases (ACS), one corresponds to the cytosolic ACS of multiple fornicates

20

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

517 (ACS2) and the other showed no clear phylogenetic affinity to any known ACS (ACS3; see

518 Fig. S3). We regard ACS2 as a cytosolic protein in strain PAP020 (blue ellipse). As there is

519 no hint for the subcellular localization of ACS3, this version is omitted from the figure. 1, H2-

520 synthesis; 2, pyruvate metabolism; 3, substrate-level phosphorylation; 4, amino acid

521 metabolism; 5, Fe-S cluster assembly; 6, anti-oxidant system; 7, protein modification.

522 Abbreviations: Ala, alanine; Asp, aspartic acid; Cys, cysteine; Glu, glutamic acid; Gly,

523 glycine; α-KG, α-ketoglutaric acid; Trp, tryptophan; OAA, oxaloacetic acid; NAD+/NADH,

524 nicotinamide adenine dinucleotide; NADP+/NADPH, nicotinamide adenine dinucleotide

525 phosphate; HydE/F/G, hydrogenase maturases E/F/G; HydA/[Fe]-Hyd, hydrogenase; Fdx,

526 ferredoxin; NuoE/F, 24/51 kDa of mitochondrial NADH:ubiquinone oxidoreductase; ME,

527 malic enzyme; PFO, pyruvate:ferredoxin oxidoreductase; ASCT, acetate:succinyl-CoA

528 transferase; H/L/P/T, glycine cleavage system protein H/L/P/T; AlaAT, alanine

529 aminotransferase; AspAT, aspartate aminotransferase; TNase, tryptophanse; GDH, glutamate

530 dehydrogenase; HCP, hybrid-cluster protein; SHMT, serine hydroxymethyltransferase; CS,

531 cysteine synthase; MGL, monoacylglycerol lipase; Fxn,; IscS, cysteine desulfurase; IscU/A,

532 iron-sulfur cluster assembly protein; Fxn, frataxin; OsmC, osmotically inducible protein;

533 SOD, superoxide dismutase; Trx, thioredoxin; TrxR, thioredoxin reductases; TrxP,

534 thioredoxin peroxidase; Rbr, rubrerythrin; Cpn60/10, chaperonin 60/10; Hsp70, heat shock

535 protein 70; HscB, heat shock cognate B; GrpE, nucleotide exchange factor for DnaK; DnaJ,

536 heat shock protein 40; MPPα/β, mitochondrial processing peptidase α/β. B. Evolution of ATP

537 generation in barthelonids, parabasalids and selected fornicates. In the clade of fornicates and

538 barthelonids (Fornicata+ clade), substrate-level phosphorylation (blue) was lost on two

539 separate branches. The cytosolic ACS2 (yellow), which was established at the base of the

540 Fornicata+ clade, was replaced by an evolutionarily distinct type of ACS (ACS1; red) during

541 the evolution of fornicates.

21

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

542

543 Table S1: Amino acid positions and coverages of the 148 single-gene alignments used in

544 this study.

545

546 Table S2: Putative MRO proteins of Barthelona sp. strain PAP020 and their predicted

547 subcellular localizations.

548

549 Figure S1: The impact of removal of fast-evolving alignment positions on the

550 phylogenetic relationship among fornicates, parabasalids and Barthelona sp. strain

551 PAP020.

552 Fast-evolving positions in the 148-gene alignments were progressively removed in 4,000

553 position increments. The filtered alignments were individually subjected to rapid ML

554 bootstrap analyses using RAxML. For each data point, we plotted the support values for (i)

555 the sister relationship between strain PAP020 and fornicates (blue), (ii) the monophyly of

556 fornicates (orange), (iii) the sister relationship between strain PAP020 and parabasalids (gray)

557 and (iv) the monophyly of parabasalids (green).

558

559 Figure S2: Transmission electron micrograph image of MRO of Barthelona sp. PAP020

560 Scale bar = 500 nm. A specimen for transmission electron microscopy (TEM) observation

561 was prepared as follows; cultivated cells were centrifuged and fixed with pre-fixation for 1 h

562 at room temperature with a mixture of 2% (v/v) glutaraldehyde, 0.1 M sucrose, and 0.1 M

563 sodium cacodylate buffer (pH 7.2, SCB). Fixed cells were washed with 0.2 M SCB three

564 times. Cells were post-fixed with 1% (v/v) OsO4 with 0.1 M SCB for 1 h at 4 °C. Cells were

565 washed with 0.2 M SCB two times. Dehydration was performed using a graded series of 30–

566 100% ethanol (v/v). After dehydration, cells were placed in a 1:1 mixture of 100% ethanol

22

bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

567 and acetone for 10 min and acetone for 10 min for two cycles. Resin replacement was

568 performed by a 1:1 mixture of acetone and Agar Low Viscosity Resin R1078 (Agar Scientific

569 Ltd, Stansted, England) for 30 min and resin for 2 h. Resin was polymerized by heating at

570 60 °C for 8 h. Ultrathin sections were prepared on a Reichert Ultracut S ultramicrotome

571 (Leica, Vienna, Austria), double stained with 2% (w/v) uranyl acetate and lead citrate

572 (Hanaichi et al., 1986, Sato, 1968), and observed using a Hitachi H-7650 electron microscope

573 (Hitachi High-Technologies Corp., Tokyo, Japan) equipped with a Veleta TEM CCD camera

574 (Olympus Soft Imaging System, Münster, Germany).

575

576 Figure S3: Phylogenetic tree of acetyl-CoA synthase (ACS) sequences.

577 The ACS phylogeny was inferred using the maximum-likelihood (ML) method and ML

578 bootstrap values (MLBPs) were mapped on the ML tree. MLBPs below 50% are not shown.

579 Two ACS sequences of Barthelona sp. strain PAP020 are highlighted in red. The pink- and

580 green-colored sequences are of eukaryotes and archaea, respectively. The clades of ACS1 and

581 ACS2 were defined by referring to the phylogenetic analysis presented in Leger et al. (2017).

582

23 bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Figure 2 available under aCC-BY-NC-ND 4.0 International license.

Barthelona sp. PAP020 Barthelona sp. LRM2 59 Barthelona sp. FB11 83 0.98 Barthelona sp. PCE 0.98 Barthelona sp. EYP1702 56 Giardia intestinalis (M54878.1) 0.86 99 Retortamonas sp. strain caviae (AF439349.1) 98 Dysnectes brevis (AB263123.1) Fornicata 73 Kipferlia bialata NY0173 (GU827604.1) Carpediemonas membranifera isolate BICM (GU827594.1) 98 Paratrimastix pyriformis ATCC50598 (AF244904.1) Preaxostyla sp. ItaOx8 (KJ399552.2) Dictyostelium discoideum isolate 2a3a (KJ394480.1) Amoebozoa Neoparamoeba perurans strain GD-HAC/2/2 (EF216905.1) Micronuclearia podoventralis (AY268038.1) Rigifilida Acrasis helenhemmesae strain Sk6a-1 (GU437222.1) Heterolobosea gruberi strain CCAP1518/1D (X78278.1) Tsukubamonas globosa (AB576851.1) Tsukubea Diplonema papillatum strain ATCC50162 (KF633466.1) gracilis (AF283309.1) Euglenozoa Neobodo saliens (AF174379.1) Trichomonas vaginalis isolate 04 (JX943596.1) Parabasalia Simplicimonas similis isolate PCC3007 (KC953858.1) godoyi isolate AND28 (AY965870.1) libera (AF411288.1) Jakobida americana (AF053089.1) jakobiformis (EF455761.2) Malawimonadida Heterosigma carterae (L42529.1) Tribonema aequale voucher UTEX 50 (HQ710588.1) Nannochloropsis salina CCMP537 (AF045049.1) 95 93 Thalassiosira pseudonana (HM991698.1) Stramenopiles Rhizosolenia setigera strain CCMP1820 (HQ912561.1) Phytophthora megasperma isolate WPC1679A20 (JN635225.1) 91 Hyphochytrium catenoides (X80344.1) Nerada mexicana strain ATCC50535 (AY520453.1) Thraustochytrium aureum (AB022110.1) 81 Toxoplasma gondii (L24381.1) parvum (L25642.1) 92 Chromera velia isolate Mdig4 (JN986791.1) 99 Cochlodinium polykrikoides clone GD1589bp49 (EU418965.1) Alveolata Alexandrium tamarense isolate UTEX2521 (AY883004.1) Perkinsus marinus isolate P-1 (AF324218.1) 99 Paramecium tetraurelia strain hrd (AB252009.1) Tetrahymena pyriformis strain MDL4u (EF070255.1) 76 anathema (AF153206.1) Breviatea Pygsuia biforma strain PCbi66 (KC433554.1) Arabidopsis thaliana (X16077.1) Closterium spinosporum var. crassum R-9-13 (AF352241.1) Pterosperma cristatum strain NIES 221 (AJ010407.1) Chlorella vulgaris KMMCC FC-4 (GQ122370.1) Chloroplastida 99 Chlamydomonas reinhardtii strain JinCheon1 (KF864473.1) Chara globularis (Y16465.1) Mesostigma viride (AJ250108.1) Bangia fuscopurpurea isolate jsqd (KJ023699.1) 74 Porphyra sp. isolate SSR053 (AF136427.1) 73 74 Rhodella violacea (AB045580.1) Rhodophyta Akalaphycus liagoroides (KC157584.1) Cyanidium caldarium strain 61D (AB090833.1) 76 isolate ESP7414 (KC404141.1) Phaeocystis antarctica strain MM-E2C1 (JN381495.1) Haptophyta patelliferum (L34671.1) Pavlova lutheri strain AC44 (JF714236.1) Palpitomonas bilix (AB508339.1) Cryptista Thecamonas trahens strain A-1a (EU542595.1) Picobiliphyte sp. MS609-66 (JN934893.1) 96 Pterocystis tropica isolate tropica (AY749603.1) Sphaerastrum fockii (AY749614.1) Centrohelida micra (JQ245079.1) 78 Cryptomonas paramaecium strain M2452 (AM051196.2) 99 Guillardia theta (X57162.1) 76 Goniomonas truncata (U03072.1) Cryptista Roombia truncata (FJ969717.1) Leucocryptos marina clone 3902 (JF791094.1) Glaucocystis nostochinearum strain SAG 28.80 (KF631385.1) Cyanophora paradoxa strain SAG 45.84 (KF631378.1) Glaucophyta Cyanoptyche gloeocystis strain SAG 34.90 (AJ007275.1) solitarium (AF280078.1) Phalansterium sp. (EF143966.1) Amoebozoa Acanthamoeba castellanii CDC 0184 V014 (U07401.1) Rhizamoeba saxonica strain CCAP1570/2 (EU719197.1) roboscidea strain CCAP 1905/1 (L37037.1) Apusomonadida Mantamonas plastica (GU001154.1) Mantamonadida kenti strain edm11b (JQ340339.1) Telonema subtilis isolate RCC358.7 (AJ564772.1) Telonemia Thaumatomonas sp. (U42446.1) Euglypha rotunda strain Costa Rica (AJ418782.1) 72 Cercomonas longicauda (AF411270.1) Rhizaria chromatophora strain FK01 (FJ456918.1) sp. strain RCC337 (FJ973362.1) Aspergillus oryzae strain FS528 (HM536621.1) Saccharomyces cerevisiae (V01335.1) 99 Homo sapiens (M10098.1) Opisthokonta Drosophila melanogaster (KC177303.1) Monosiga brevicollis (AF100940.1) 0.1 subsutitutions/site 0.4 subsutitutions/site bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 3

Coverage Barthelona sp. PAP020 Giardia intestinalis 99 (x2) vortens 1.0 Dysnectes brevis Fornicata Metamonada 100 Kipferlia bialata 0.70 Aduncisulcus paluster 98 Carpediemonas membranifera 0.98 Trichomonas vaginalis foetus Parabasalia Paratrimastix pyriformis Preaxostyla major Trypanosma cruzi 85 Diplonema papillatum Euglena gracilis 89 marylandensis 89 1.0 lipophora Naegleria gruberi Discoba 1.0 Tsukubamonas globosa Recliomonas americana 60 ‘Seculamonas ecuadoriensis’ 89 0.99 Jakoba libera 1.0 ‘Jakoba bahamensis’ Stygiella incarcerata Phaeodactylum tricornutum Thalassiosira pseudonana Aureococcus anophagefferens Ectocarpus siliculosus Phytophthora infestans Blastocystis hominis Bigelowiella natans Paracercomonas marina Gymnophrys cometa Quinqueloculina sp. SAR 89 filosa 1.0 Plasmodium falciparum Theileria parva Toxoplasma gondii Cryptosporidium parvum Oxyrrhis marina Perkinsus marinus Paramecium caudatum Tetrahymena pyriformis Emiliania huxleyi galbana Prymnesium parvum Haptophyta Pavlova lutheri Picomonas sp. Picozoa Telonema subtilis Telonemia Micromonas sp. Ostreococcus tauri Chlamydomonas reinhardtii Arabidopsis thaliana Chloroplastida Oryza sativa Guilllardia theta Rhodmonas salina 92 Goniomonas sp. 76 1.0 Cryptista 1.0 Roombia truncata Palpitomonas bilix Cyanophora paradoxa Glaucocystis nostochinearum Glaucophyta Chondrus crispus Gracilaria changii Porphyra yezorensis Rhodophyta 91 Cyanidioschyzon merolae 1.0 Galdieria sulphuraria Polyplacocystis contractilis Centrohelida Drosphila melanogaster Homo sapiens Amphimedon queenslandica Monosiga brevicollis 87 Capsaspora owczarzaki 84 Saccharomyces cerevisiae Ustilago maydis Amorphea Blastocladiella emersonii Thecamonas trahens Dictyostelium discoideum Physarum polycephalum Entamoeba histolytica 1.0 Mastigamoeba balamuthi Acanthamoeba castellanii Malawimonas californiensis Malawimonadida 81 Malawimonas jakobidormis Collodictyon triciliatum 0.2 subsutitutions / site 0 % 50 % 100 % bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 4

A B FORNICATA+

acetate ATP ACS3 2+ Ancestor of 1 5 Fe Ala Cpn60 acetyl- Hsp70 ADP diplomonads HydA Fxn IscS CoA Gain of S Cpn10 FeS Cys HscB acetate ATP ACS1 HydF cluster HydE IscU + 7 ACS2 HydG IscA Fdx GrpE LOSS of FORNICATA acetyl- ADP Dysnectes MPPα DnaJ CoA ACS2 [Fe]-Hyd Fdx- + MPPβ brevis H H2 acetate 3 - + MRO substrate-level phosphorylation lost Fdx Fdx succinyl-CoA ATP + ASCT SCS NAD Rbr NuoE NuoF 6 succinate Kipferlia bialata ADP H L NADH HCP NADH NAD+ acetyl- OsmC CoA Gain of malate Fdx+ H2O2 + + ACS2 ME Trx NADP PFO SOD 2 TrxP O2 TrxR pyruvate Fdx- Carpediemonas Trx- H2O NADPH membranifera CO2 T + NH3 + TNase H NAD O-acetylserine indole P L Trp Ala homocysteine + NH3 4 Gly CS H2S MGL NADH Cys Barthelonids ? pyruvate AlaAT α-KG NADH* NAD+ Gly MRO substrate-level phosphorylation lost Glu AspAT Asp NH3 SHMT GDH NADH Ser OAA hydroxylamine Parabasalids bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Supplemental figure 1

100

90

80

70

60

50 PAP020 PAP020 Fornicata Parabasalia

40 Fornicata Parabasalia monophyly monophyly

30 Rapid ML Bootstrap Value (%) 20

10

0 10 20 30 40 50 60 70 80 90 Fast-evolving positions removed (%) bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

N

MRO

Supplemental figure 2 bioRxiv preprint doi: https://doi.org/10.1101/805762; this version posted October 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Supplemental Figure 3 available under aCC-BY-NC-ND 4.0 International license.

27 + 2 other bacteria Rhodothermus marinus (WP_012844279.1) Bacteria Chlorobi Stigmatella aurantiaca (WP_002613218.1) Bacteria delta epsilon Salisaeta longa (WP_022836105.1) Bacteria Bacteroidetes Chlorobi Caldilinea aerophila (WP_014432206.1) Bacteria Caldilineae Geitlerinema sp. PCC 7105 (WP_026097602.1) Bacteria Cyanobacteria Oscillatoriophycideae Armatimonadetes bacterium JGI 0000077-K19 (WP_022820767.1) Bacteria Armatimonadetes Oscillatoria acuminata (WP_015151441.1) Bacteria Cyanobacteria Oscillatoriophycideae Pseudanabaena biceps (WP_009627786.1) Bacteria Cyanobacteria Oscillatoriophycideae Rubrobacter xylanophilus (WP_011565470.1) Bacteria Actinobacteria Candidatus Koribacter versatilis (WP_011523808.1) Bacteria 100 Ergobibamus cyprinoides 540198 98 Ergobibamus cyprinoides 537447 ACS2 93 Ergobibamus cyprinoides 547776 95 Aduncisulcus paluster 80898 41 Carpediemonas membranifera 6359 66 Barthelona sp. PAP020 “ACS2” Trimastix marina 1241743 Thalassiosira pseudonana CCMP1335 (XP_002294673.1) Verrucomicrobium sp. 3C (WP_018290568.1) Bacteria Verrucomicrobium spinosum (WP_009965300.1) Bacteria Chlamydiae Verrucomicrobia 71 Galdieria sulphuraria (XP_005704786.1) Cyanidioschyzon merolae strain 10D (XP_005537438.1) Blastocystis hominis (CBK25099.2) Eukaryota Blastocystis Trimastix marina 1243966 Trimastix marina 1246148 Trimastix marina 1242464 Trimastix marina 1264460 Naegleria gruberi strain NEG-M (XP_002669321.1) 86 10 Candidatus Protochlamydia amoebophila (WP_011176116.1) Bacteria Chlamydiae Verrucomicrobia 9 Alpha + 4 Gamma proteobacteria + Ricinus communis (XP_002536822.1) 8 Bacteria + Ceratitis capitata (XP_004532409.1) 84 15 Proteobacteria Desulfobacula toluolica (WP_014956490.1) Bacteria Proteobacteria delta epsilon + Ceratitis capitata (XP_004528035.1) 15 Bacteria + Phaeodactylum tricornutum CCAP 1055/1 (XP_002180714.1) 10 Bacteria + Thalassiosira pseudonana CCMP1335 (XP_002287791.1) 59 Methanococcus aeolicus (WP_011972920.1) Archaea Methanococci cuspidata 459521 Alicyclobacillus pohliae (WP_018131957.1) Bacteria Bacilli Halomicrobium mukohataei (WP_015762959.1) Archaea Euryarchaeota Halobacteria 2 Chloroflexia Kyrpidia tusciae ( WP_013075795.1) Bacteria Firmicutes Bacilli 2 Bacteria 97 1 + 2 Methanococci Thermoplasmatales archaeon E-plasma (WP_021792680.1) Archaea Euryarchaeota Thermoplasmata Candidatus Methanoperedens nitroreducens (KCZ71212.1) Archaea Euryarchaeota Methanomicrobia 2 Spirochates + 2Archaea 93 Thermococcus barophilus (WP_013467081.1) Archaea Euryarchaeota Thermococci Thermodesulfobacterium thermophilum (WP_022855426.1) Bacteria Pirellula staleyi (WP_012913541.1) Bacteria Planctomycetaceae Planctomyces maris (WP_002644686.1) Bacteria Planctomycetes Planctomycetia 2 Bacteria Draconibacterium orientale (AHW61201.1) Bacteroidetes Chlorobi 2 Bacteria + 2 Archaea 93 2 Bacteria Phaeodactylibacter xiamenensis (KGE88560.1) Bacteria Bacteroidetes Chlorobi unclassified Candidatus Cloacimonetes (WP_020252320.1) Bacteria Cloacimonetes 56 Entamoeba dispar SAW760 (XP_001736656.1) Verrucomicrobiae bacterium DG1235 (WP_008103214.1) Bacteria Chlamydiae Verrucomicrobia 93 Persephonella sp. IF05-L8 (WP_029519783.1) Bacteria Aquificae Hydrogenivirga sp. 128-5-R1-1 (WP_008289767.1) Bacteria Aquificae 83 25 Halobacteria Candidatus Haloredivivus sp. G17 (EHK00603.1) Archaea Nanohaloarchaeota Nanohaloarchaea Candidatus Nanosalina sp. J07AB43 (EGQ43833.1) Archaea Nanohaloarchaeota Nanohaloarchaea Methanocella paludicola (WP_012901660.1) Archaea Euryarchaeota Methanomicrobia 80 6 Methanobacteria Thermodesulfobium narugense (WP_013756762.1) Bacteria Firmicutes Clostridia 6 Bacteria 9 Spironucleus vortens 93 75 Spironucleus vortens 783979 ACS1 73 Spironucleus vortens 813133 52 Spironucleus vortens 730912 Trepomonas sp. PC1 (AGV05441.1) 100 Spironucleus barkhanus (AAM94652.1) (AFV80087.1) 53 Chilomastix caulleryi 481507 94 Chilomastix caulleryi 486133 71 67 Chilomastix caulleryi 521862 81 Chilomastix cuspidata 393293 Chilomastix caulleryi 481506 96 Dysnectes brevis 929219 Dysnectes brevis 948404 Kipferlia bialata 601207 98 Giardia intestinalis ATCC 50803 (XP_001705744.1) Giardia intestinalis ATCC 50581 (EET01173.1) Kipferlia bialata 579953 73 11 Clostridia + 2 Methanobacteria Carboxydothermus ferrireducens (WP_028051693.1) Bacteria Firmicutes Clostridia 9 Bacteria Marinitoga piezophila (WP_014296554.1) Bacteria Candidatus Haloredivivus sp. G17 (EHK02044.1) Archaea Nanohaloarchaeota Nanohaloarchaea Ferroplasma sp. Type II (WP_021788573.1) Archaea Euryarchaeota Thermoplasmata 67 Pentatrichomonas hominis 644092 100 Archaeoglobus sulfaticallidus (WP_015589906.1) Archaea Euryarchaeota Archaeoglobi 11 66 4 Cyanobacteria Barthelona sp. PAP020 “ACS3” 87 8 Thaumarchaeota 93 9 Deltaepsilon proteobacteria Desulfococcus multivorans (WP_020877936.1) Bacteria Proteobacteria deltaepsilon Chilomastix cuspidata 436406 Magnetococcus marinus (WP_011712773.1) Bacteria Proteobacteria Alphaproteobacteria Desulfatibacillum alkenivorans (WP_012610165.1) Bacteria Proteobacteria deltaepsilon 2 Bacteria uncultured Acetothermia bacterium (BAL56255.1) Bacteria Acetothermia 98 8 Bacteria 100 Spironucleus salmonicida 1112340 Spironucleus salmonicida (AFV80068.1) Streptosporangium amethystogenes (WP_030915428.1) Bacteria Actinobacteria 3 Bacteria uncultured Chloroflexi bacterium (BAL54358.1) Bacteria Chloroflexi 2 Bacteria + 1 Arhaea Candidatus Methanomassiliicoccus intestinalis (WP_020448504.1) Archaea Euryarchaeota Methanomicrobia

0.5 subsutitutions / site show MLBPs > 50 %