
The Iron-Sulfur Scaffold Protein HCF101 Unveils The Complexity of Organellar Evolution in SAR, Haptista and Cryptista

Jan Pyrih, Charles University Faculty of Science: Univerzita Karlova Prirodovedecka fakulta
Vojtěch Žárský, Charles University Faculty of Science: Univerzita Karlova Prirodovedecka fakulta
Justin D. Fellows, University of Georgia
Christopher Grosche, University of Marburg: Philipps-Universitat Marburg
Dorota Wloga, Nencki Institute of Experimental Biology: Instytut Biologii Doswiadczalnej im M Nenckiego Polskiej Akademii Nauk
Boris Striepen, University of Georgia
Uwe G. Maier, University of Marburg: Philipps-Universitat Marburg
Jan Tachezy ([email protected]), Charles University: Univerzita Karlova, https://orcid.org/0000-0001-6976-8446

Research article

Keywords: HCF101, Ind1, iron-sulfur cluster, mitochondrion, plastid, evolution

Posted Date: December 15th, 2020

DOI: https://doi.org/10.21203/rs.3.rs-126638/v1

License: This work is licensed under a Creative Commons Attribution 4.0 International License.

Version of Record: A version of this preprint was published on March 19th, 2021. See the published version at https://doi.org/10.1186/s12862-021-01777-x.

The iron-sulfur scaffold protein HCF101 unveils the complexity of organellar evolution in SAR, Haptista and Cryptista

Jan Pyrih (a), Vojtěch Žárský (a), Justin D. Fellows (b), Christopher Grosche (c,d), Dorota Wloga (e), Boris Striepen (b), Uwe G. Maier (c,d), Jan Tachezy (a,f)

(a) Department of Parasitology, Faculty of Science, Charles University, BIOCEV, Průmyslová 595, 252 50 Vestec, Czech Republic.
(b) Department of Cellular Biology, University of Georgia, Athens, Georgia, USA.
(c) Laboratory for Cell Biology, Philipps University Marburg, Karl-von-Frisch-Str. 8, 35032 Marburg, Germany.
(d) LOEWE Center for Synthetic Microbiology (Synmikro), Hans-Meerwein-Str. 6, 35032 Marburg, Germany.
(e) Laboratory of Cytoskeleton and Cilia Biology, Nencki Institute of Experimental Biology of Polish Academy of Sciences, 3 Pasteur Street, 02-093 Warsaw, Poland.
(f) To whom correspondence should be addressed: Jan Tachezy, Department of Parasitology, Faculty of Science, Charles University, BIOCEV, Průmyslová 595, 252 50 Vestec, Czech Republic.
E-mail: [email protected]

Abstract

Background: Nbp35-like proteins (Nbp35, Cfd1, HCF101, Ind1, and ApbC) are P-loop NTPases that serve as components of iron-sulfur cluster (FeS) assembly machineries. In eukaryotes, Ind1 is present in mitochondria, where its function is associated with the assembly of FeS clusters in subunits of respiratory Complex I; Nbp35 and Cfd1 are components of the cytosolic FeS assembly (CIA) pathway; and HCF101 is involved in FeS assembly of photosystem I in plastids of plants (chHCF101). The ApbC protein operates in Bacteria and Archaea. To date, the cellular distribution of these proteins has been considered highly conserved, with only a few exceptions.

Results: We searched for the genes of all members of the Nbp35-like protein family and analyzed their targeting sequences. Nbp35 and Cfd1 were predicted to reside in the cytoplasm, with some exceptions in which Nbp35 localizes to the mitochondria; Ind1 was found in the mitochondria, and HCF101 was predicted to reside in plastids (chHCF101) of all photosynthetically active eukaryotes. Surprisingly, we found a second HCF101 paralog in all members of Cryptista, Haptista, and SAR that was predicted to predominantly target mitochondria (mHCF101), whereas Ind1 appeared to be absent in these organisms. We also identified a few exceptions, as apicomplexans possess mHCF101 predicted to localize in the cytosol and Nbp35 predicted to localize in the mitochondria. Our predictions were experimentally confirmed in selected representatives of Apicomplexa (Toxoplasma gondii), Stramenopila (Phaeodactylum tricornutum, Thalassiosira pseudonana), and Ciliophora (Tetrahymena thermophila) by tagging proteins with a transgenic reporter. Phylogenetic analysis suggested that chHCF101 and mHCF101 evolved from a common ancestral HCF101 independently of the Nbp35/Cfd1 and Ind1 proteins. Interestingly, phylogenetic analysis supports lateral gene transfer of ancestral HCF101 from bacteria rather than its acquisition from either α-proteobacterial or cyanobacterial endosymbionts.

Conclusion: Our searches for Nbp35-like proteins across eukaryotic lineages revealed that SAR, Haptista, and Cryptista possess mitochondrial HCF101. Because only plastid localization of HCF101 was known thus far, the discovery of its mitochondrial paralog explains the confusion regarding the presence of HCF101 in organisms that possibly lost secondary plastids (e.g., ciliates, Cryptosporidium) or possess reduced nonphotosynthetic plastids (apicomplexans).

Keywords

HCF101, Ind1, iron-sulfur cluster, mitochondrion, plastid, evolution

Background

Iron-sulfur (FeS) cluster assembly pathways are essential for all three domains of life: Bacteria, Archaea, and Eukarya. In eukaryotes, there are three main pathways, which are localized in distinct cellular compartments: mitochondria, plastids, and the cytosol. The organellar pathways were acquired through endosymbiosis of proteobacteria and cyanobacteria that evolved into mitochondria and plastids, respectively [1, 2]. The mitochondrial FeS cluster assembly (ISC) machinery operates in nearly all forms of mitochondria, including anaerobic hydrogenosomes [3] and highly reduced mitosomes [4]. The pathway in plastids is called the sulfur utilization factor (SUF) system, which is present in primary [5] as well as secondary plastids [6–8]. The ISC machinery is functionally linked to the third system, the cytosolic FeS cluster assembly (CIA) machinery. Phylogenetic analysis suggested that the CIA pathway was present in the last eukaryotic common ancestor (LECA) and that its components are predominantly of bacterial origin [9, 10]. There are few known exceptions to the highly conserved setup of FeS assembly machineries, and all these exceptions concern protists adapted to anaerobic or microaerobic conditions with modified mitochondria. Archamoebae replaced the ISC pathway with two components of a nitrogen-fixing (NIF) machinery that were acquired by lateral gene transfer (LGT) from ε-proteobacteria [11]. The NIF system operates in the cytosol of Entamoeba histolytica or in the cytosol and hydrogenosomes of Mastigamoeba balamuthi [12]. Similarly, the breviate Pygsuia biforma apparently replaced the ISC system with an archaeal SUF system [13, 14]. Finally, three SUF components (SufC, SufB, and the fused protein SufDSU) of bacterial origin were found in the cytosol of the oxymonad Monocercomonoides sp., which lost its mitochondria [15].

The only proteins that are common to the CIA, ISC, and SUF pathways are P-loop NTPases with the ParA domain: Nbp35/Cfd1, Ind1, and high chlorophyll fluorescence 101 (HCF101), respectively (hereafter Nbp35-like proteins). In CIA, Nbp35/Cfd1 proteins serve in the initial phase of FeS assembly as a [4Fe-4S] scaffold using sulfur and iron that are exported from mitochondria [16]. The FeS cluster is then transferred via Nar1 and the Cia1/Cia2/MMS19 targeting complex to apo-proteins. Ind1 serves as a scaffold in later stages of FeS assembly to deliver [4Fe-4S] clusters specifically to the apo-subunits of mitochondrial respiratory complex I, and thus, the presence of Ind1 closely matches the distribution of complex I [17]. Its necessity for complex I maturation is underscored by the presence of Ind1 in hydrogenosomes, in which complex I is reduced to only two FeS subunits [14, 18]. HCF101 was shown to transport [4Fe-4S] clusters to photosystem I subunits and heterodimeric ferredoxin-thioredoxin reductase complexes in plastids of Arabidopsis thaliana [19, 20].

It is believed that the canonical distribution of FeS cluster assembly machineries, and thus that of machinery-specific Nbp35-like proteins, is highly conserved in eukaryotes, including protists with primary or complex plastids. The latter organelles evolved in eukaryotic hosts from eukaryotic symbionts with green (euglenids and Chlorarachniophyceae) or red (Stramenopila, Alveolata, Haptophytes, and Cryptophytes) plastids [21, 22]. These complex plastids are surrounded by three or more membranes and characterized by the presence of a periplastidal compartment, the extremely reduced cytosol of the endosymbiont, and, in the case of cryptophytes and chlorarachniophytes, of a remnant nucleus (nucleomorph). Interestingly, the presence of the nucleomorph, which is likely dependent on activities of FeS proteins, correlates with the presence of the endosymbiotic CIA, including Nbp35 that is retained in the periplastidal compartment [7], in addition to CIA in the host cytosol. This curious finding further exemplifies the conserved topology of Nbp35 and other CIA components.

The localization of HCF101 has not been experimentally studied in most eukaryotic lineages. Moreover, because HCF101 is essential for photosystem I and consequently photosynthesis, it could be particularly interesting to investigate its presence and cellular localization in organisms that possess non-photosynthesizing plastids such as the apicoplast in apicomplexans. The genes for HCF101 have been noticed in several apicomplexan genomes such as Toxoplasma gondii and Plasmodium falciparum, and their possible localization in the apicoplast has been suggested [23, 24]. However, these HCF101 homologs lack the targeting signals one would expect for proteins localized to the apicoplast [23, 24]. Even more puzzling is the identification of HCF101 in the genome of Cryptosporidium parvum, which has lost its plastid [24]. Therefore, we decided to search for Nbp35-like genes across eukaryotic genomes and to predict their cellular localization based on their organellar targeting presequences. In selected protists, we verified the localization of Nbp35-like proteins experimentally. The most surprising result is the identification of the mitochondrial form of HCF101 in protists with complex plastids.

Results

Distribution of Nbp35-like proteins in eukaryotes

We searched for Nbp35, Cfd1, Ind1, and HCF101 in genomes and transcriptomes across the main eukaryotic lineages, and for each protein we predicted its putative cellular localization (Table 1, Table S1). While Nbp35 was found ubiquitously in all lineages as reported previously [10], Cfd1 was generally present in the Opisthokonta, Amoebozoa, Cryptista, Glaucophyta, and Excavata (Metamonada, Discoba) supergroups but absent in the remaining groups of Archaeplastida, SAR, and Haptista. Diplomonads such as Giardia intestinalis and Spironucleus salmonicida represent the only exception within excavates, as they lack Cfd1 (Table 1) [25]. Nbp35 homologs were missing in only four organisms, most likely due to the incompleteness of the available sequencing data (Table 1). Curiously, in Mastigamoeba balamuthi, there are three Nbp35 paralogs, of which one was predicted to possess an N-terminal hydrogenosomal targeting presequence (Table 1). Furthermore, we also predicted a mitochondrial targeting signal for Nbp35 proteins in Apicomplexa and chromerids.

As expected, Ind1 was predicted to be present in the mitochondria of the Opisthokonta, Amoebozoa, Archaeplastida, and Excavata groups, except for organisms that lack complex I (Table 1). Interestingly, we did not identify Ind1 in any organism with complex plastids. While this is not surprising for apicomplexans that lack complex I, such as Toxoplasma gondii and Plasmodium falciparum, and the evolutionarily related chromerids Chromera velia and Vitrella brassicaformis, Ind1 was also absent in all other lineages of the SAR, Haptista and Cryptista groups, despite the presence of genes for the FeS subunits of Complex I in these organisms [17].

Finally, we searched for genes encoding the HCF101 protein. This protein could be easily distinguished from other Nbp35-like proteins based on the presence of two extra domains, an N-terminal FeS assembly P domain (FSCA, previously domain of unknown function DUF59) and a C-terminal DUF971 [20]. Surprisingly, the distribution of HCF101 was not limited to lineages harboring primary plastids (Viridiplantae, Rhodophyta, Glaucophyta) or complex plastids of red (SAR, Cryptophyta, Haptophyta) or green (chlorarachniophytes, euglenids and some dinoflagellates) origin; the gene was also present in the remaining nonphotosynthetic members of SAR, Haptista, and Cryptista. Every photosynthetically active eukaryote possesses a gene that encodes HCF101 with either an N-terminal primary plastid targeting signal (Archaeplastida) or a bipartite signal, which targets the protein to complex plastids (chHCF101) (Fig. 1). Strikingly, in all members of SAR, Haptista, and Cryptista (formerly referred to as chromalveolates), we found a second HCF101 paralog with predicted mitochondrial localization (mHCF101). The only unexpected variation of this cellular localization was found in Alveolata. In apicomplexans that harbor a nonphotosynthetic apicoplast, HCF101 was predicted to reside in the cytosol, while Nbp35 possesses an N-terminal extension, which may target the protein to the mitochondria. We predicted the same cytosolic distribution of HCF101 and possibly mitochondrial Nbp35 for the evolutionarily related chromerids, which possess photosynthetic plastids and therefore also chHCF101. In other alveolates, such as Perkinsus marinus, which possesses cryptic nonphotosynthetic plastids, and in ciliates that lack plastids, we predicted standard cytosolic localization for Nbp35 and mitochondrial localization of putative mHCF101, whereas chHCF101 is absent. The distribution of Nbp35, mHCF101, and chHCF101 in dinoflagellates is likely similar to that in Stramenopila and ; however, predictions of protein localization in some dinoflagellates had low confidence.

Experimental localization of selected Nbp35-like proteins

The identification of mHCF101 and the unexpected localization predicted for Nbp35 in apicomplexans prompted us to select three protists that are amenable to cell transformation and investigate the localization of Nbp35-like proteins using protein tagging. First, we tested genes from the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana, which possess secondary plastids. P. tricornutum cells were transformed to express homologous eGFP-tagged mHCF101, chHCF101, and Nbp35 as well as heterologous genes from T. pseudonana. Fluorescence microscopy revealed that both mHCF101 proteins labeled structures corresponding to mitochondria, as indicated by colabeling with MitoTracker. These structures were clearly distinct from plastids, in which we observed labeling with chHCF101 (Fig. 2). As expected, Nbp35 labeling corresponded to the cytosol. Next, we tested localization of putative mHCF101 and Nbp35 in the ciliate Tetrahymena thermophila, which lacks plastids. HA-tagged mHCF101 appeared in numerous round mitochondria organized in longitudinal arrays that were also labeled with MitoTracker (Fig. 3). Nbp35 appeared as a diffuse signal within the cell corresponding to the cytosol. Finally, we expressed HCF101 and Nbp35 in T. gondii (Fig. 3). This organism lacks mitochondrial Ind1 and possesses a reduced nonphotosynthetic plastid, the apicoplast. Nbp35 clearly colocalized in tubular structures with the mitochondrial marker F1-ATPase. Putative HCF101 appeared within the cell as a cytosolic protein. No localization of HCF101 to the apicoplast was observed using the antibody against plastidial CPN60. These experimental data confirmed the predicted localization of mHCF101 in diatoms and the ciliate and the mitochondrial localization of Nbp35 in Toxoplasma.

Phylogeny of HCF101

To learn about the evolutionary history of chHCF101 and mHCF101 and to obtain further support for predictions of their cellular localization, we performed phylogenetic analysis. In the first step, we were interested in the relationship between HCF101 and other members of the Nbp35-like protein family. We analyzed a large dataset of 8440 amino acid sequences including mHCF101, chHCF101, Nbp35, Cfd1, and Ind1 as well as prokaryotic homologs of ApbC proteins with the ParA domain. We expected that chHCF101 originated from a cyanobacterial endosymbiont that evolved into a plastid, similar to Ind1, which was acquired with the α-proteobacterial ancestor of mitochondria [9]. However, chHCF101 and mHCF101 formed a common clade with various lineages of bacteria that appeared at the base of the HCF101 subtree, including proteobacteria, the PVC group, and Bacteroidetes. There is no obvious support for the cyanobacterial ancestry of HCF101 and thus for endosymbiotic gene transfer (EGT), although the overall resolution of the tree is low. As expected, Ind1 formed a clade with the majority of eukaryotic sequences and α-proteobacteria at a basal position, which is consistent with an EGT origin of the protein (Fig. 4). Interestingly, Ind1 of kinetoplastids appeared in the Ind1/α-proteobacterial subtree at a position separate from the rest of the eukaryotic sequences, suggesting its specific phylogenetic history. It is possible that, hand in hand with the presence of an atypical Complex I in kinetoplastids, the Ind1 protein also underwent dramatic evolutionary reshaping [26]. Finally, Nbp35/Cfd1 clustered together with various eubacterial and archaebacterial sequences, as observed previously [9, 10].

Because the statistical support was moderate throughout the phylogenetic tree, we also performed prediction of protein domains with a focus on the presence of the HCF101 marker domains FSCA and DUF971 to obtain more information to estimate a possible HCF101 origin (Additional file 1, Table S2). This analysis showed that the majority of bacterial sequences have a FSCA-ParA structure (3420) or contain the ParA domain only (2762), and there are also various other domain combinations. Of note, 18 sequences obtained from proteobacteria, PVC group members, and Bacteroidetes clustered with eukaryotic HCF101 and shared the characteristic FSCA-ParA-DUF971 domain structure of HCF101.
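A domain-based screen of this kind can be illustrated with a minimal Python sketch. It assumes that each sequence has already been annotated with an ordered list of domain names (N- to C-terminus) by some domain-prediction tool. The function names, the class labels, and the toy identifiers are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter

# Domain architecture that the text treats as diagnostic for HCF101.
HCF101_ARCHITECTURE = ("FSCA", "ParA", "DUF971")


def classify_architecture(domains):
    """Label a protein by its ordered domain list (N- to C-terminus).

    The labels are illustrative assumptions, not categories defined in the paper.
    """
    arch = tuple(domains)
    if arch == HCF101_ARCHITECTURE:
        return "HCF101-like"
    if arch == ("FSCA", "ParA"):
        return "FSCA-ParA"
    if arch == ("ParA",):
        return "ParA only"
    return "other"


def tally_architectures(annotations):
    """Count architecture classes over a {sequence_id: [domain, ...]} mapping."""
    return Counter(classify_architecture(d) for d in annotations.values())


# Toy input with made-up identifiers, used only to show the call pattern.
toy = {
    "seq_A": ["FSCA", "ParA", "DUF971"],
    "seq_B": ["FSCA", "ParA"],
    "seq_C": ["ParA"],
    "seq_D": ["ParA", "DUF971"],
}
counts = tally_architectures(toy)
```

Exact tuple matching is a deliberate simplification: any architecture other than the three named ones falls into "other", which mirrors the "various other domain combinations" mentioned in the text without trying to enumerate them.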

In the second step, we focused on more detailed phylogenetic analysis of chHCF101 and mHCF101 (Fig. 5). The phylogenetic tree revealed that chHCF101 and mHCF101 are paralogs that evolved from a common HCF101 ancestor, possibly by duplication events. ChHCF101 and mHCF101 formed two monophyletic groups, albeit with low support. The chHCF101 tree is by and large consistent with the current concept of eukaryotic phylogeny. There are three well-supported clades of Archaeplastida with primary plastids for Viridiplantae, Glaucophyta, and Rhodophyta, together with protists that harbor corresponding secondary plastids. Thus, chHCF101 in Viridiplantae clusters together with Euglenozoa, which possess secondary plastids of green origin. ChHCF101 of Rhodophyta is at the base of Stramenopila, , Cryptophytes, and Haptophytes, which contain Rhodophyta-derived red secondary plastids. However, there are some exceptions. Some dinoflagellates, such as Alexandrium and Symbiodinium, have secondary plastids of red origin yet seem to possess chHCF101 related to the green plastid lineage. Conversely, although chlorarachniophytes acquired secondary plastids of green ancestry, their chHCF101 clustered within orthologs of red plastids. Interestingly, this single-gene phylogeny supports the close relationship between plastids of Haptophytes and Cryptophytes. Branching of the main groups was further supported by a comparison of six conserved amino acid residues (residues 461-466, numbered according to Arabidopsis thaliana) in the DUF971 domain. The common motif for Viridiplantae, Euglenozoa and Dinophyta was D[K,R,Q,T][G,S]Ax[G,S]; chHCF101 of Glaucophyta, Rhodophyta, Stramenopila, and Chromerida possess the highly conserved motif C[R,S]CAxC; and Chlorarachniophyta, Cryptophytes and Haptophytes possess CRSP[A,T,S]N.
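The three DUF971 motif classes can be expressed as regular expressions, which makes the comparison easy to reproduce on an alignment column. The sketch below is a minimal illustration in Python. It assumes that "x" in the motif notation means any residue and that the bracketed, comma-separated lists are alternatives at one position. The dictionary keys, the function name, and the example hexapeptides are hypothetical and are not sequences from the paper.

```python
import re

# Motifs as written in the text, translated to regular expressions.
# "x" (any residue) becomes ".", and "[K,R,Q,T]" becomes the class "[KRQT]".
DUF971_MOTIFS = {
    "D[K,R,Q,T][G,S]Ax[G,S]": re.compile(r"D[KRQT][GS]A.[GS]"),
    "C[R,S]CAxC": re.compile(r"C[RS]CA.C"),
    "CRSP[A,T,S]N": re.compile(r"CRSP[ATS]N"),
}


def classify_duf971_motif(hexapeptide):
    """Return the motif class that a six-residue segment matches, if any.

    The input is expected to be the six residues aligned to positions
    461-466 of Arabidopsis thaliana HCF101 (extraction of that segment
    from an alignment is outside the scope of this sketch).
    """
    segment = hexapeptide.upper()
    for motif, pattern in DUF971_MOTIFS.items():
        if pattern.fullmatch(segment):
            return motif
    return "unclassified"


# Made-up hexapeptides chosen only to exercise each pattern.
examples = ["DKGALG", "CRCAYC", "CRSPTN", "AAAAAA"]
labels = [classify_duf971_motif(s) for s in examples]
```

Using `fullmatch` rather than `search` keeps the check tied to the six aligned positions, so a longer segment that merely contains a motif elsewhere is not counted as a match.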

The observed branching order of mHCF101 is poorly supported (Fig. 5); nevertheless, separation of the chHCF101 and mHCF101 groups provides a tool for our prediction of cell localization, as several sequences included in Table 1 were incomplete and thus preclude confident predictions based on the identification of N-terminal targeting motifs. For example, in dinoflagellates, we found complete sequences of two HCF101 paralogs only for A. tamarense. Sequences of all other dinoflagellates were incomplete; however, phylogenetic analysis clearly separated the mHCF101 group, which included A. tamarense HCF101 with a mitochondrial targeting presequence and formed a subtree of dinoflagellates with high statistical support. The other HCF101 paralogs appeared within the chHCF101 group. We were particularly interested in the origin of HCF101 of apicomplexans, which lacks N-terminal targeting sequences and for which we demonstrated cytosolic localization in T. gondii. HCF101 proteins of T. gondii and other related apicomplexans, including Cystoisospora suis, clearly appeared within the mHCF101 group, at the base of a well-supported subtree of apicomplexans, chromerids, and dinoflagellates. Therefore, apicomplexan HCF101s seem not to be derived from plastids (apicoplast), contrary to previous assumptions [23, 24]. Another interesting question was the origin of HCF101 in ciliates, which lack plastids. Phylogeny of HCF101 showed that HCF101 in ciliates is not related to chHCF101 but clustered within mHCF101s.

Several members of the PVC group, such as Kiritimatiellaceae bacterium and Verrucomicrobia bacterium, are at the base of the HCF101 tree. Although this tree is poorly resolved, it is noteworthy that the bacterial conserved motif of DUF971, C[A,R,N,H]CA[A,L]C, is similar to the motifs in chHCF101 as well as most mHCF101 (Fig. 5). Interestingly, the verrucomicrobial HCF101 clustered with the orthologs of the Glaucophyta group, which is considered to possess the most primitive plastid. Thus, based on the phylogenetic analysis, the presence of bacterial HCF101-like proteins with specific domain structures, and the conserved DUF971 motif, the subset of PVC group members represents the best candidates for the origin of eukaryotic HCF101.

Discussion

In this work, we screened Nbp35-like homologs across eukaryotes and predicted their cellular localization. This analysis revealed the existence of a mitochondrial HCF101 homolog that is common to all tested members of SAR, Haptophytes, and Cryptophytes. Localization of mHCF101 was predicted based on the identification of N-terminal mitochondrial targeting sequences and supported by a phylogenetic analysis that separated mHCF101 from the chHCF101 paralog. Moreover, mitochondrial localization was experimentally verified for mHCF101 encoded in the genomes of two diatoms (T. pseudonana, P. tricornutum) and the ciliate T. thermophila. Curiously, but consistently with the in silico predictions, we found mHCF101 in the cytosol of T. gondii, while Nbp35 was localized to the mitochondrion. Evolutionary analysis of HCF101 proteins and their specific distribution suggested that HCF101 was potentially gained via LGT from bacteria of the PVC lineage, either by a common ancestor of Archaeplastida to serve in the chloroplast (plastid-first hypothesis) or by a common ancestor of Archaeplastida, SAR, Haptista and Cryptista to serve first in mitochondria.

The presence of mHCF101 is coincident with the absence of Ind1, which is involved in the maturation of complex I FeS subunits. This specific distribution suggests that mHCF101 may act as a functional homolog of Ind1. Both proteins share a conserved nucleotide-binding domain characteristic of the Mrp (MetG-related protein)/Nbp35 subclass of ParA P-loop NTPases [27], which includes the conserved CxxC motif. This motif is essential to bind the transient [4Fe4S] cluster that is transferred to the target FeS proteins [20, 28]. It is evident that chHCF101 in plastids and Ind1 in mitochondria transfer labile FeS clusters to different targets. However, both proteins are able to deliver the labile cluster to the S. cerevisiae model [4Fe4S] acceptor protein, isopropylmalate isomerase, in vitro [20, 28]. Thus, the function of HCF101 proteins and Ind1 might be interchangeable. The major difference between HCF101 and Ind1 is the presence of N- and C-terminal domains in the former protein. The FSCA domain is present at the N-terminus of HCF101 (just after the N-terminal targeting sequence) and in a few other eukaryotic proteins involved in FeS assembly, such as Cia2 of the CIA machinery [29] and asymmetric leaves1/2 enhancer7 (AE7), which is a Cia2 homolog in A. thaliana [30]. The FSCA domain in combination with ParA was identified in a large number of bacterial and some archaeal FeS cluster carrier proteins (this work). Importantly, in Staphylococcus aureus, the SufT subunit of the SUF machinery is composed solely of the FSCA domain and acts as an auxiliary FeS cluster maturation factor [31]. Therefore, the fusion of FSCA and an Nbp35-like protein might be beneficial for more efficient transfer of FeS centers to target proteins. The function of the C-terminal DUF971 of HCF101 is currently elusive. However, we noticed that DUF971, present at the C-termini of most chHCF101 and mHCF101 proteins, contains a highly conserved CxCxxC motif that may have the capacity to bind divalent metals [32]. Further studies are required to clarify the function of mHCF101 and of DUF971 in particular.

The evolutionary journey taken by mHCF101 to arrive in the mitochondria of SAR, Haptista, and Cryptista is a puzzle, but multiple evolutionary scenarios could potentially explain the origin of this gene. Our phylogenetic and domain analysis of HCF101 proteins, together with their distribution in eukaryotes, suggested that ancestral HCF101 was not acquired via EGT from cyanobacteria, which possess simple ParA domain-containing proteins without FSCA and DUF971. Rather, it was gained via LGT from bacteria of the PVC lineage that possessed an HCF101-like protein with the FSCA-ParA-DUF971 domain structure and that cluster (although with weak support) with chHCF101 of Glaucophyta. The key question is whether HCF101 was first targeted to chloroplasts or to mitochondria. Considering the chloroplast-first scenario (Fig. 6A), we can hypothesize that HCF101 was acquired by a common ancestor of Archaeplastida, targeted to chloroplasts, and evolved independently in glaucophytes, green algae/land plants, and red algae. Then, HCF101 was transferred via secondary endosymbiosis of green plastids to Euglenozoa and by transfer of red plastids to a putative common ancestor of SAR, Haptista and Cryptista. In this hypothetical ancestor, HCF101 was duplicated, and one of the paralogs was targeted to mitochondria (mHCF101), where it functionally replaced Ind1. Alternatively (mitochondria-first), we can hypothesize that HCF101 was first present in the mitochondria of a common ancestor of Archaeplastida, SAR, Haptista and Cryptista [33, 34] and functioned in parallel with Ind1 (Fig. 6B). HCF101 in Archaeplastida was then retargeted from mitochondria to the plastid (chHCF101), while Ind1 was lost at least twice independently, in a common ancestor of Cryptophytes and in a common ancestor of Haptophytes plus SAR.

The proposed plastid-first scenario for HCF101 evolution is consistent with the “chromalveolate” hypothesis, which is based on the idea that all lineages with a red secondary plastid are monophyletic [35]. In support of this hypothesis, it has been proposed that all members of chromalveolates share a unique SELMA (symbiont-derived ERAD-like machinery) to target proteins into secondary plastids via the endoplasmic reticulum [36, 37]. Furthermore, remnant plastids were discovered in some seemingly aplastidic members of chromalveolates such as Perkinsus marinus [38]. In ciliates, which lack plastids, several proteins of algal origin were previously identified, including a MinD-like hypothetical protein in T. thermophila [39]. In our analysis, we identify this protein as mHCF101, and its mitochondrial localization was experimentally confirmed in Tetrahymena. Thus, in addition to SELMA, the presence of mHCF101 in mitochondria together with the absence of Ind1 is another feature that is common to chromalveolates.

However, an increasing number of phylogenetic studies favor multiple secondary (or serial) endosymbioses in these lineages [33, 40, 41]. They refute the chromalveolate hypothesis by placing Cryptophytes within Archaeplastida and through the discovery of novel groups such as (Cryptista) [42] and (Haptista), in which so far no evolutionary traces of plastids have been found. Thus, their lack of plastids could reflect the primary absence of plastids rather than secondary loss [33]. Interestingly, even these lineages contain mHCF101 instead of Ind1, supporting the idea of multiple independent losses of Ind1.

Tertiary endosymbiosis is another facet that complicates tracing HCF101 evolution, particularly in dinoflagellates. Our phylogenetic analyses revealed that chHCF101 of Karenia clustered within the Haptophytes subtree. This is fully consistent with previous inferences that Karenia and related genera of dinoflagellates with fucoxanthin-containing plastids [43, 44] lost the ancestral secondary plastid, which was replaced by a new plastid from Haptophytes via tertiary endosymbiosis [45–48]. Interestingly, another group of dinoflagellates, including Alexandrium and Symbiodinium, with peridinin-containing plastids of red origin appeared at the base of Viridiplantae in the chHCF101 subtree, which may suggest experience with a green plastid before acquiring the red plastid, as suggested in several studies [40, 48–50]. In contrast to the chHCF101 phylogenies, a monophyletic origin of mHCF101 was observed for both groups of dinoflagellates regardless of the multiple endosymbiotic events, which clearly reflects different evolutionary histories for mHCF101 and chHCF101.

Another example of the complex evolution of chHCF101 is found in Chlorarachniophytes, which possess a complex plastid of green origin [51]. Perplexingly, the phylogeny of chHCF101 suggested a red origin for this protein, which clustered with Haptophytes and Cryptophytes and shared the unique CRSP[T,A,S]N motif of DUF971. However, this finding may not be so surprising. Previous analyses of the chlorarachniophyte Bigelowiella natans classified several genes of likely algal origin as potentially acquired from the red lineage [52]. These ‘red’ genes are rather puzzling, but might have originated from cryptic endosymbioses involving red algae prior to the more recent acquisition of a green lineage endosymbiont [53, 54].

369 Based on our and previous analyses [10], Nbp35 seems to be the only essential FeS

370 cluster assembling P-loop ATPase present in all eukaryotic cells. Typically Nbp35 is a

371 cytosolic member of the CIA machinery; however, there are multiple examples of

372 mitochondrial localization. In this work, we demonstrated targeting of a single Nbp35 to the

373 T. gondii mitochondrion, and we similarly predicted mitochondrial localization for other

374 apicomplexans and chromerids based on their targeting signals. Three Nbp35 genes were

375 observed in the unrelated free-living archamoeba M. balamuthi, of which a single Nbp35

376 paralog possesses the mitochondrial/hydrogenosomal targeting sequence, and its

377 hydrogenosomal localization was supported by previous proteomic data [55]. Dual

378 mitosomal/cytoplasmic localization of two out of three Nbp35 paralogs was observed in

379 G. intestinalis [25]. A common property shared by these organisms with

380 mitochondrion-associated Nbp35 is that they lack Complex I and Ind1. It is tempting to

381 speculate that mitochondrial Nbp35 replaces Ind1 and serves in the delivery of [4Fe4S]

382 clusters to proteins other than Complex I subunits. However, Ind1 is highly specific for

383 Complex I, and its involvement in the maturation of other FeS proteins was not observed [17,

384 28].

385 Conclusions

386 The searches for Nbp35-like proteins across eukaryotic lineages revealed mitochondrial

387 HCF101 homologs that are present exclusively in SAR, Haptista, and Cryptista. Thus, the presence

388 of mHCF101 and lack of Ind1 are the first nonplastidial common features of these lineages

389 formerly grouped under chromalveolates. Phylogeny of the HCF101 protein suggested that

390 both mHCF101 and chHCF101 are paralogs and that an ancestral HCF101 more likely was

391 gained by LGT from bacteria than via EGT.

392

393 Methods

394 Toxoplasma gondii cultivation, genetic manipulation, and microscopy.

395 Tachyzoites of T. gondii derived from strain RH were cultivated and genetically manipulated

396 as described previously [56]. HCF101 (TGME49_318590) and Nbp35 (TGME49_280730)

397 coding sequences were amplified from T. gondii cDNA and cloned in frame with a triple

398 hemagglutinin (HA) epitope tag at the 3’ end into plasmid pDt7s4HA. The constructs were

399 transiently transfected into the T. gondii Δku80/TATi strain [57] using a BTX ECM 630

400 electroporator (Harvard Apparatus). Confluent human foreskin fibroblasts (HFF) were

401 infected with transfected parasites and fixed after 24 hours of infection with 4% formaldehyde

402 and permeabilized with 0.2% Triton X-100. Immunofluorescence microscopy was performed

403 using the primary antibodies anti-HA (Roche), mouse anti-T. gondii mitochondrial

404 F1-ATPase [58], and rabbit anti-apicoplast HSP60 [59]. Secondary antibodies used were goat

405 anti-rat Alexa Fluor 488, goat anti-mouse Alexa Fluor 546, and goat anti-rabbit Alexa Fluor

406 546. Images were obtained on an Applied Precision Delta Vision microscope and were

407 deconvolved and adjusted using Softworx software (GE Healthcare).

408 Tetrahymena thermophila cultivation, genetic manipulation, and microscopy.

409 T. thermophila CU428 strain was cultivated axenically in SPP medium (1% proteose-peptone,

410 0.2% glucose, 0.1% yeast extract, and 0.003% ferric-sodium EDTA) supplemented with an

411 antibiotic-antimycotic mix (Invitrogen, Carlsbad, CA) [60]. The insertion of transgenes into

412 the T. thermophila macronucleus was performed as described previously [61]. Genes coding

413 for Nbp35 (XP_001033404, TTHERM 0312220) and mHCF101 (XP_001007903, TTHERM

414 00538790) were amplified from genomic DNA and inserted into the pFAP44-3HA vector

415 [62], which allows the expression of a C-terminally 3HA-tagged protein under its native promoter

416 [63]. Transfected cells were selected under an increasing concentration of paromomycin (100

417 µg-1000 µg per ml) and decreasing concentration of CdCl2.

418 Living cells of T. thermophila were stained with MitoTracker Red CMXRos (Molecular

419 Probes, Invitrogen) following the manufacturer’s protocol. Then, the cells were spread on

420 polylysine-coated slides and immediately fixed using methanol, permeabilized with acetone,

421 and immunostained with an α-HA tag rat monoclonal antibody (Roche) and an Alexa Fluor 488

422 (green) donkey α-rat antibody (Invitrogen). Nuclei were stained with

423 4′,6-diamidino-2-phenylindole (DAPI). The slides were examined using an Olympus IX81 microscope equipped

424 with an MT20 illumination system.

425

426 Phaeodactylum tricornutum and Thalassiosira pseudonana cultivation, genetic manipulation,

427 and microscopy

428 P. tricornutum (Bohlin, University of Texas Culture Collection, strain 646) and

429 Thalassiosira pseudonana Hasle et Heimdal CCMP1335 were axenically grown in artificial

430 seawater medium, made by dissolving “Tropic marine” salt (Wartenberg, Germany) to obtain

431 35 units of practical salinity and enriched by Guillard’s (F/2) Marine Water Enrichment

432 Solution. The cells were cultivated at 22°C under continuous illumination (80 µmol photons

433 per m2 per s) with agitation (150 rpm) in 250 mL Erlenmeyer flasks to a density of

434 approximately 7 × 10^6 cells/ml.

435 P. tricornutum genes for Nbp35 (XP_002179311), mHCF101 (Joint Genome Institute,

436 JGI portal ID 49356), and chHCF101 (JGI portal ID 1865) and T. pseudonana genes for

437 Nbp35 (XP_002289427), mHCF101 (XP_002290238), and chHCF101 (XP_002293925)

438 were amplified from corresponding cDNA and cloned for expression in P. tricornutum with

439 C-terminal eGFP in vector pPHA-NR4 [36]. Biolistic transfection was carried out as described

440 previously [64] using M10 tungsten particles and 1350 psi rupture discs together with the Bio-

441 Rad Biolistic PDS-1000/He particle delivery system. Transfected cells were grown at 22°C

442 under continuous illumination (80 µmol photons per m2 per s) on plates containing solid f/2

443 medium with 1.3% agar, 1.5 mM NH4+ as the sole nitrogen source, and 75 µg/ml Zeocin™ as a

444 selection marker. Protein expression under the control of the nitrate reductase promoter

445 (pPha-NR vector) was induced by cultivation on 0.9 mM NO3− for 2 days.

446 Transformants were analyzed with a Leica TCS SP2 confocal laser scanning

447 microscope. Mitochondrial localization was verified with MitoTracker® Orange CMTMRos

448 (Life Technologies). The fluorescence of enhanced green fluorescent protein (eGFP) and

449 chlorophyll was excited with an argon laser (65 mW) at 488 nm and detected with two

450 photomultiplier tubes at bandwidths of 500 to 520 nm and 625 to 720 nm for eGFP and

451 chlorophyll fluorescence, respectively. MitoTracker® Orange CMTMRos was excited with a

452 HeNe laser (1.2 mW) at 543 nm, and emission was detected at 560–590 nm. Pictures were

453 assembled in ImageJ (http://imagej.nih.gov/ij/index.html) using the Loci Bio-Formats plug-in

454 (http://www.openmicroscopy.org/site/products/bio-formats).

455

456 Searches for protein sequences and targeting predictions

457 Homologs of Ind1, Nbp35, Cfd1, and Hcf101 proteins were retrieved using the BLAST

458 algorithm [65] from the NCBI nr database (https://www.ncbi.nlm.nih.gov/), JGI genome

459 (https://genome.jgi.doe.gov/portal/), iMicrobe (https://www.imicrobe.us/), and Uniprot

460 (https://www.uniprot.org/). Genes for Cyanophora paradoxa were obtained from the database

461 at http://cyanophora.rutgers.edu/cyanophora/home.php. Protein sequences for Eutreptiella

462 gymnastica and Pyramimonas parkeae were kindly provided by Vladimír Hampl (Charles

463 University, Prague, Czech Republic), Euglena gracilis sequences by Marek Eliáš (University

464 of Ostrava, Czech Republic), and and by Miroslav

465 Oborník (Biology center AS, Czech Budweis, Czech Republic). For each retrieved protein

466 sequence, a given database, dataset, and gene number is indicated in supplementary Table S1.

467 Protein sequences of four Nbp35-like protein categories were aligned using the

468 MUSCLE algorithm [66] in Geneious® 11.1.5 software with default settings. Protein

469 sequences with incomplete N-terminal parts were excluded from further protein localization

470 analysis. In a minority of cases, when N-terminal methionine was absent, but we identified

471 methionine within the first 10 amino acids of the N-terminus, we shortened the sequence, and

472 localization prediction was carried out with lower confidence as indicated in Table 1. Subcellular

473 targeting of proteins was predicted using TargetP-1.1 ([67],

474 http://www.cbs.dtu.dk/services/TargetP-1.1/index.php); TargetP-2.0

475 ([68], http://www.cbs.dtu.dk/services/TargetP/); DeepLoc-1.0

476 ([69], http://www.cbs.dtu.dk/services/DeepLoc/, accurate Profiles protein model); MitoFates

477 ([70], http://mitf.cbrc.jp/MitoFates/cgi-bin/top.cgi); MitoProt ([71],

478 https://ihg.gsf.de/ihg/mitoprot.html); SignalP 4.1 ([72],

479 http://www.cbs.dtu.dk/services/SignalP-4.1/); SignalP 5 ([73],

480 http://www.cbs.dtu.dk/services/SignalP/); Phobius ([74], http://phobius.sbc.su.se/); PSORT II

481 ([75], https://psort.hgc.jp/form2.html); ChloroP ([76],

482 http://www.cbs.dtu.dk/services/ChloroP/); HECTAR v1.3 ([77], https://webtools.sb-roscoff.fr/);

483 Multiloc ([78], https://omictools.com/multiloc-tool); and PlasmoAP ([79];

484 https://plasmodb.org/plasmo/plasmoap.jsp). Furthermore, proteins with detected signal

485 peptides were shortened according to the predicted cleavage site of signal peptidase (SignalP

486 5, HECTAR, and TargetP 2 programs), and the presence of subsequent putative transit

487 peptide was detected with the MitoFates, TargetP2, and ChloroP algorithms. A search for the

488 motif of transit peptide cleavage by stromal processing peptidase was carried out as described

489 [80–82].
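
The N-terminal handling described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used by the authors: the function names, the tuple return value, and the choice to return None for excluded sequences are assumptions made here. The window of 10 residues and the "lower confidence" flag follow the text; the cleavage position passed to the second function would come from an external predictor such as SignalP.

```python
from typing import Optional, Tuple


def trim_to_start_methionine(seq: str, window: int = 10) -> Tuple[Optional[str], bool]:
    """Apply the N-terminal rule to one protein sequence.

    Returns (sequence, low_confidence):
      - starts with M                      -> unchanged, low_confidence False
      - M found within the first `window`  -> shortened to that M, low_confidence True
      - otherwise                          -> (None, False), i.e. excluded (assumed convention)
    """
    seq = seq.strip().upper()
    if seq.startswith("M"):
        return seq, False
    pos = seq.find("M", 0, window)  # search only residues 1..window
    if pos == -1:
        return None, False
    return seq[pos:], True


def remove_signal_peptide(seq: str, cleavage_after: int) -> str:
    """Drop the first `cleavage_after` residues (a predicted signal peptide),
    so that the remainder can be tested for a transit peptide."""
    return seq[cleavage_after:]
```

For example, a hypothetical fragment "AKMLLS" would be shortened to "MLLS" and flagged as a low-confidence case, whereas a fragment with no methionine among its first 10 residues would be excluded.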

490

491 Phylogenetic analysis

492 For the initial analysis of Nbp35-like proteins (Fig. 4), homologs of the ParA domain

493 (PF10609) from the Pfam database were searched in the Uniprot database using HMMER

494 (version 3). A total of 22328 sequences with e-values below the 1e-50 cutoff were selected.

495 Selected sequences were clustered at 90% sequence identity using CD-HIT,

496 and one sequence per cluster was retained to reduce redundancy, resulting in a

497 dataset of 9139 sequences. Sequences were then aligned using MAFFT [83] with default

498 settings, and the multiple sequence alignment was trimmed using BMGE [84] with the

499 BLOSUM30 matrix and a block size of one, resulting in an alignment with 189 aligned amino

500 acid positions. Sequences that were aligned at less than 126 positions (more than 63 gaps)

501 were removed from the dataset, resulting in 8440 sequences. These were again realigned and

502 trimmed resulting in an alignment with 185 aligned amino acid positions. A phylogenetic tree

503 was then inferred using FastTree [85] with default settings.
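
The coverage filter applied to the trimmed alignment (removal of sequences aligned at fewer than 126 of the 189 positions) can be illustrated with the following minimal Python sketch. It assumes the alignment is held as a dictionary of equal-length strings with "-" as the gap character; the function names are hypothetical and the code is not taken from the authors' pipeline.

```python
from typing import Dict


def count_aligned_positions(aligned_seq: str, gap_chars: str = "-") -> int:
    """Number of non-gap characters in one row of a trimmed alignment."""
    return sum(1 for ch in aligned_seq if ch not in gap_chars)


def filter_by_coverage(alignment: Dict[str, str], min_positions: int = 126) -> Dict[str, str]:
    """Keep rows aligned at `min_positions` or more columns.

    With the 189-column alignment described in the text, the default of 126
    corresponds to removing rows with more than 63 gaps.
    """
    return {
        name: seq
        for name, seq in alignment.items()
        if count_aligned_positions(seq) >= min_positions
    }
```

In this sketch a row with exactly 126 aligned residues is kept and a row with 125 is dropped, which matches the "less than 126 positions" wording.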

504 For detailed HCF101 analysis (Fig. 5), a dataset of 107 HCF101 proteins and their

505 homologs was manually assembled. The sequences were aligned using MAFFT [83] with the

506 “--maxiterate 1000” and “--localpair” parameters. The alignment was trimmed using BMGE

507 [84] with the BLOSUM30 matrix and a block size of one, which resulted in 311 aligned

508 amino acid positions. A maximum likelihood phylogenetic tree was inferred using IQ-Tree

509 (version 1.6) [86] with the best selected mixture model LG+C60+G, and the topology was

510 tested using 10 000 ultrafast bootstraps. A Bayesian phylogenetic tree was inferred using

511 PhyloBayes (ver. 3) [87] and the -Poisson model, running two chains for 20 000

512 generations. The first 2 000 generations were discarded (burnin), and every tenth generation

513 was sampled. The chains converged, with a maxdiff value of 0.076.
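
As a worked illustration of the burn-in and sampling scheme stated above, the short Python sketch below lists which generations of one chain would be retained. The zero-based indexing of generations is an assumption made for the example; the exact convention of the software may differ by one.

```python
from typing import List


def retained_generations(total: int = 20_000, burnin: int = 2_000, every: int = 10) -> List[int]:
    """Generations kept after discarding `burnin` and sampling every `every`-th one."""
    return list(range(burnin, total, every))


kept = retained_generations()
print(len(kept))  # (20000 - 2000) / 10 = 1800 samples per chain
```

Under these settings each of the two chains contributes 1800 sampled trees to the consensus.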

514 Domain searches

515 Conserved protein domains were detected by searching sequences against the Pfam database

516 (ver. 32) using HMMER (ver. 3) [88]. Hits with e-values below 1e-5 were considered.
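
The e-value threshold for domain hits can be applied to tabular search output along the lines of the following Python sketch. It assumes whitespace-separated rows in the style of HMMER 3 "--tblout" output, where the first field is the target (domain) name, the third is the query name, and the fifth is the full-sequence E-value. The toy rows, sequence names, and scores are invented for illustration and are not results from this study.

```python
from typing import Iterable, List, Tuple

# Invented example rows (not real search results).
TOY_TBLOUT = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "ParA    PF10609  seqA  -  3.0e-60  201.5  0.1",
    "DUF971  -        seqA  -  2.0e-12   45.0  0.0",
    "DUF59   -        seqB  -  0.02      10.3  0.0",
]


def filter_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> List[Tuple[str, str, float]]:
    """Return (query, domain, evalue) for rows with E-value strictly below `max_evalue`."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        domain, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue < max_evalue:
            hits.append((query, domain, evalue))
    return hits


print(filter_hits(TOY_TBLOUT))
```

The same function with max_evalue=1e-50 would reproduce the stricter cutoff used for the initial ParA domain search.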

517

518 Table 1. Identification of Nbp35-like proteins in selected representatives of eukaryotic

519 lineages and prediction of their cellular localization.

520

Supergroup  Group/Species  Cytosol  Mitochondria  Chloroplast (Green, Red)

Opisthokonta
Metazoa
  Homo sapiens  Nbp35, Cfd1  Ind1
  Drosophila melanogaster  Nbp35, Cfd1  Ind1
Fungi
  Saccharomyces cerevisiae  Nbp35, Cfd1
  Yarrowia lipolytica  Nbp35, Cfd1  Ind1
Amoebozoa
  Acanthamoeba castellanii  Nbp35, Cfd1  Ind1
  Dictyostelium discoideum  Nbp35, Cfd1  Ind1
  Mastigamoeba balamuthi  Nbp35, Cfd1
  Entamoeba histolytica  Nbp35, Cfd1
Archaeplastida
Viridiplantae
  Arabidopsis thaliana  Nbp35  Ind1  chHCF101
  Chlorella variabilis  Nbp35  Ind1  chHCF101
  Coccomyxa subellipsoidea  Nbp35, Ind1#  chHCF101
  Micromonas pusilla  Nbp35  Ind1  chHCF101
  Oryza sativa  Nbp35  Ind1  chHCF101
  Ostreococcus tauri  Nbp35, Ind1  chHCF101
  Physcomitrella patens  Nbp35  Ind1  chHCF101
  Pyramimonas parkeae  Nbp35  Ind1*  chHCF101
  Selaginella moellendorffii  Nbp35  Ind1#  chHCF101#
Glaucophyta
  Glaucocystis sp.  Nbp35  Ind1  chHCF101*
  Gloeochaete wittrockiana  Nbp35*, Cfd1*  Ind1*  chHCF101
Rhodophyta
  Chondrus crispus  Nbp35  na  chHCF101
  Cyanidioschyzon merolae  Nbp35  Ind1  chHCF101
  Erythrolobus madagascarensis  Nbp35*  Ind1*  chHCF101*
  Galdieria sulphuraria  Nbp35  Ind1  chHCF101
  Madagascaria erythrocladioides  Nbp35*  na  chHCF101
  Porphyridium aerugineum  Nbp35  Ind1  chHCF101*
  Rhodosorus marinus  Nbp35*  Ind1*  chHCF101*
  Timspurckia oligopyrenoides  na  Ind1  chHCF101*
SARCH
Alveolata
Apicomplexa
  bovis  mHCF101*  Nbp35#
  Cryptosporidium muris  mHCF101  Nbp35#
  Toxoplasma gondii  mHCF101  Nbp35
  Plasmodium falciparum  mHCF101  Nbp35#
  Plasmodium yoelii  mHCF101  Nbp35#
Chromerida
  Chromera velia  mHCF101  Nbp35#  chHCF101
  Vitrella brassica  mHCF101  Nbp35#  chHCF101
  Perkinsus marinus  Nbp35#  mHCF101#
Dinoflagellata
  Alexandrium monilatum  Nbp35*  mHCF101*  chHCF101*
  Alexandrium tamarense  Nbp35*  mHCF101  chHCF101
  Durinskia baltica  Nbp35  na  chHCF101*
  acuminata  Nbp35*  mHCF101*  chHCF101*
  Karenia brevis  Nbp35*  mHCF101  chHCF101
  marina  Nbp35*  mHCF101*  na
  Symbiodinium sp.  Nbp35#  mHCF101*  chHCF101
Ciliata
  Tetrahymena thermophila  Nbp35  mHCF101
  Oxytricha trifallax  Nbp35  mHCF101
  tetraurelia  Nbp35  mHCF101
  sp.  Nbp35  mHCF101
Rhizaria
Rhizaria
  Ammonia sp.  Nbp35  mHCF101*
  Elphidium margaritaceum  Nbp35  mHCF101*
  Reticulomyxa filosa  Nbp35  mHCF101
  Paulinella chromatophora  na  na  Bacterial HCF101-like
Chlorarachniophyta
  Bigelowiella longifila  Nbp35  mHCF101*  chHCF101*
  Bigelowiella natans  Nbp35*  mHCF101*  chHCF101
  Lotharella globosa  Nbp35*  mHCF101*  chHCF101*
Stramenopila
Oomycota
  Albugo candida  Nbp35  mHCF101#
  Albugo laibachii  Nbp35  mHCF101#
  Aphanomyces astaci  Nbp35  mHCF101
  Aphanomyces invadans  Nbp35  mHCF101
  Phytophthora infestans  Nbp35  mHCF101*
  Phytophthora parasitica  Nbp35  mHCF101
  Saprolegnia diclina  Nbp35  mHCF101
  Aureococcus anophagefferens  Nbp35  mHCF101*  chHCF101*
  Ectocarpus siliculosus  Nbp35#  mHCF101  chHCF101
  Phaeodactylum tricornutum  Nbp35  mHCF101  chHCF101
  Nannochloropsis gaditana  Nbp35  mHCF101  chHCF101
  Schizochytrium aggregatum  Nbp35  mHCF101  na
  Thalassiosira oceanica  Nbp35  mHCF101*  chHCF101
  Thalassiosira pseudonana  Nbp35  mHCF101  chHCF101
  hominis  Nbp35  mHCF101
Haptista
Haptophyta
  polylepis  Nbp35*  mHCF101*  chHCF101*
  Nbp35  mHCF101  chHCF101
  Exanthemachrysis gayraliae  na  mHCF101*  chHCF101*
  oceanica  Nbp35  mHCF101*  chHCF101
  galbana  Nbp35  mHCF101  chHCF101*
  Pleurochrysis carterae  Nbp35  mHCF101  chHCF101*
  parvum  Nbp35*  mHCF101*  chHCF101*
Centrohelida
  Nbp35*  mHCF101*
Cryptista
Cryptophyta
  mesostigmatica  Nbp35*, Cfd1*  mHCF101*  chHCF101*
  curvata  na  mHCF101  na
  Cryptomonas paramecium  Nbp35*, Cfd1*  mHCF101  na
  Geminigera cryophila  Nbp35*, Cfd1  mHCF101  chHCF101*
  theta  Nbp35/Cfd1  mHCF101*  chHCF101#
  Hanusia phi  Nbp35*, Cfd1*  mHCF101  na
  rufescens  Nbp35, Cfd1*  mHCF101*  chHCF101#
  Proteomonas sulcata  Nbp35*, Cfd1  mHCF101*  na
  sp.  Nbp35, Cfd1*  na  chHCF101*
Katablepharida
  Nbp35*, Cfd1*  mHCF101*
  Goniomonas avonlea  Nbp35*, Cfd1*  mHCF101
  Goniomonas pacifica  Nbp35, Cfd1*  mHCF101*
Excavata
Euglenozoa
  Euglena gracilis  Nbp35, na  na  chHCF101*
  Eutreptiella gymnastica  Nbp35, Cfd1  Ind1  chHCF101
  Nbp35, Cfd1  Ind1
  Leishmania major  Nbp35, Cfd1  Ind1
Metamonada
  Trichomonas vaginalis  Nbp35, Cfd1  Ind1
  Giardia intestinalis  Nbp35  Nbp35
Heterolobosea
  Naegleria gruberi  Nbp35, Cfd1  Ind1

521 Proteins with experimentally verified localization are in bold. Taxons in red indicate available

522 genome sequence, taxons in red indicate available transcriptome. * Incomplete sequence of

523 the gene, cellular localization is predicted based on the phylogenetic analysis (Fig. 5); #

524 prediction with low confidence; na, gene was not identified in available transcriptome.

525

526 References

527 1. Tachezy J, Sánchez LB, Müller M. Mitochondrial type iron-sulfur cluster assembly in the

528 amitochondriate eukaryotes Trichomonas vaginalis and Giardia intestinalis, as indicated by

529 the phylogeny of IscS. Mol Biol Evol. 2001;18:1919–28.

530 2. Lill R. Function and biogenesis of iron-sulphur proteins. Nature. 2009;460:831–8.

531 3. Sutak R, Dolezal P, Fiumera HL, Hrdy I, Dancis A, Delgadillo-Correa M, et al.

532 Mitochondrial-type assembly of FeS centers in the hydrogenosomes of the amitochondriate

533 eukaryote Trichomonas vaginalis. Proc Natl Acad Sci U S A. 2004;101:10368–73.

534 4. Tovar J, León-Avila G, Sánchez LB, Sutak R, Tachezy J, Van Der Giezen M, et al.

535 Mitochondrial remnant organelles of Giardia function in iron-sulphur protein maturation.

536 Nature. 2003;426:172–6.

537 5. Takahashi Y, Tokumoto U. A third bacterial system for the assembly of iron-sulfur clusters

538 with homologs in Archaea and plastids. J Biol Chem. 2002;277:28380–3.

539 6. Novák Vanclová AMG, Zoltner M, Kelly S, Soukal P, Záhonová K, Füssy Z, et al.

540 Metabolic quirks and the colourful history of the Euglena gracilis secondary plastid. New

541 Phytol. 2020;225:1578–92. doi:10.1111/nph.16237.

542 7. Grosche C, Diehl A, Rensing SA, Maier UG. Iron-sulfur cluster biosynthesis in algae with

543 complex plastids. Genome Biol Evol. 2018;10:2061–71. doi:10.1093/gbe/evy156.

544 8. Füssy Z, Oborník M. Complex endosymbioses I: From primary to complex plastids,

545 multiple independent events. In: Methods in Molecular Biology. Humana Press Inc.; 2018. p.

546 17–35. doi:10.1007/978-1-4939-8654-5_2.

547 9. Freibert SA, Goldberg A V., Hacker C, Molik S, Dean P, Williams TA, et al. Evolutionary

548 conservation and in vitro reconstitution of microsporidian iron-sulfur cluster biosynthesis. Nat

549 Commun. 2017;8:13932. doi:10.1038/ncomms13932.

550 10. Tsaousis AD, Gentekaki E, Eme L, Gaston D, Roger AJ. Evolution of the cytosolic iron-

551 sulfur cluster assembly machinery in Blastocystis species and other microbial eukaryotes.

552 Eukaryot Cell. 2014;13:143–53. doi:10.1128/EC.00158-13.

553 11. Gill EE, Diaz-Trivino S, Barbera MJ, Silberman JD, Stechmann A, Gaston D, et al. Novel

554 mitochondrion-related organelles in the anaerobic Mastigamoeba balamuthi. Mol

555 Microbiol. 2007;66:1306–20.

556 12. Nývltová E, Šuták R, Harant K, Šedinová M, Hrdý I, Pačes J, et al. NIF-type iron-sulfur

557 cluster assembly system is duplicated and distributed in the mitochondria and cytosol of

558 Mastigamoeba balamuthi. Proc Natl Acad Sci U S A. 2013;110:7371–6.

559 13. Stairs CW, Eme L, Brown MW, Mutsaers C, Susko E, Dellaire G, et al. A SUF Fe-S

560 cluster biogenesis system in the mitochondrion-related organelles of the anaerobic

561 Pygsuia. Curr Biol. 2014;24:1176–86. doi:10.1016/j.cub.2014.04.033.

562 14. Leger MM, Eme L, Hug LA, Roger AJ. Novel hydrogenosomes in the microaerophilic

563 Stygiella incarcerata. Mol Biol Evol. 2016;33:2318–36.

564 doi:10.1093/molbev/msw103.

565 15. Karnkowska A, Vacek V, Zubáčová Z, Treitli SC, Petrželková R, Eme L, et al. A

566 eukaryote without a mitochondrial organelle. Curr Biol. 2016;26:1274–84.

567 16. Pandey AK, Pain J, Dancis A, Pain D. Mitochondria export iron-sulfur and sulfur

568 intermediates to the cytoplasm for iron-sulfur cluster assembly and tRNA thiolation in yeast. J

569 Biol Chem. 2019;294:9489–502.

570 17. Bych K, Kerscher S, Netz DJ, Pierik AJ, Zwicker K, Huynen MA, et al. The iron-sulphur

571 protein Ind1 is required for effective complex I assembly. EMBO J. 2008;27:1736–46.

572 18. Hrdy I, Hirt RP, Dolezal P, Bardonová L, Foster PG, Tachezy J, et al. Trichomonas

573 hydrogenosomes contain the NADH dehydrogenase module of mitochondrial complex I.

574 Nature. 2004;432:618–22.

575 19. Lezhneva L, Amann K, Meurer J. The universally conserved HCF101 protein is involved

576 in assembly of [4Fe-4S]-cluster-containing complexes in Arabidopsis thaliana chloroplasts.

577 Plant J. 2004;37:174–85. doi:10.1046/j.1365-313X.2003.01952.x.

578 20. Schwenkert S, Netz DJA, Frazzon J, Pierik AJ, Bill E, Gross J, et al. Chloroplast HCF101

579 is a scaffold protein for [4Fe-4S] cluster assembly. Biochem J. 2009;425:207–14.

580 doi:10.1042/BJ20091290.

581 21. Keeling PJ. The endosymbiotic origin, diversification and fate of plastids. Philos Trans R

582 Soc Lond B Biol Sci. 2010;365:729–48. doi:10.1098/rstb.2009.0103.

583 22. Archibald JM. The puzzle of plastid evolution. Curr Biol. 2009;19:R81–8.

584 23. Pala ZR, Saxena V, Saggu GS, Garg S. Recent advances in the [Fe–S] cluster biogenesis

585 (SUF) pathway functional in the apicoplast of Plasmodium. Trends Parasitol. 2018;34:800–9.

586 doi:10.1016/j.pt.2018.05.010.

587 24. Seeber F, Soldati-Favre D. Metabolic pathways in the apicoplast of Apicomplexa. Int Rev

588 Cell Mol Biol. 2010;281:161–228. doi:10.1016/S1937-6448(10)81005-6.

589 25. Pyrih J, Pyrihová E, Kolísko M, Stojanovová D, Basu S, Harant K, et al. Minimal

590 cytosolic iron-sulfur cluster assembly machinery of Giardia intestinalis is partially associated

591 with mitosomes. Mol Microbiol. 2016;102:701–14.

592 26. Opperdoes FR, Michels PAM. Complex I of Trypanosomatidae: does it exist? Trends

593 Parasitol. 2008;24:310–7. doi:10.1016/j.pt.2008.03.013.

594 27. Leipe DD, Wolf YI, Koonin E V., Aravind L. Classification and evolution of P-loop

595 GTPases and related ATPases. J Mol Biol. 2002;317:41–72.

596 28. Sheftel AD, Stehling O, Pierik AJ, Netz DJA, Kerscher S, Elsässer H-P, et al. Human

597 Ind1, an iron-sulfur cluster assembly factor for respiratory complex I. Mol Cell Biol.

598 2009;29:6059–73.

599 29. Stehling O, Mascarenhas J, Vashisht AA, Sheftel AD, Niggemeyer B, Rösser R, et al.

600 Human CIA2A-FAM96A and CIA2B-FAM96B integrate iron homeostasis and maturation of

601 different subsets of cytosolic-nuclear iron-sulfur proteins. Cell Metab. 2013;18:187–98.

602 30. Luo D, Bernard DG, Balk J, Hai H, Cui X. The DUF59 family gene AE7 acts in the

603 cytosolic iron-sulfur cluster assembly pathway to maintain nuclear genome integrity in

604 Arabidopsis. Plant Cell. 2012;24:4135–48.

605 31. Mashruwala AA, Bhatt S, Poudel S, Boyd ES, Boyd JM. The DUF59 containing protein

606 SufT is involved in the maturation of iron-sulfur (FeS) proteins during conditions of high FeS

607 cofactor demand in Staphylococcus aureus. PLOS Genet. 2016;12:e1006233.

608 doi:10.1371/journal.pgen.1006233.

609 32. Mesterházy E, Lebrun C, Crouzy S, Jancsó A, Delangle P. Short oligopeptides with three

610 cysteine residues as models of sulphur-rich Cu(i)- and Hg(ii)-binding sites in proteins.

611 Metallomics. 2018;10:1232–44.

612 33. Burki F, Kaplan M, Tikhonenkov D V., Zlatogursky V, Minh BQ, Radaykina L V., et al.

613 Untangling the early diversification of eukaryotes: A phylogenomic study of the evolutionary

614 origins of centrohelida, haptophyta and cryptista. Proc R Soc B Biol Sci. 2016;283.

615 34. Burki F, Roger AJ, Brown MW, Simpson AGB. The New Tree of Eukaryotes. Trends

616 Ecol Evol. 2020;35:43–55.

617 35. Cavalier-Smith T. Principles of protein and lipid targeting in secondary symbiogenesis:

618 Euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J

619 Eukaryot Microbiol. 1999;46:347–66.

620 36. Stork S, Moog D, Przyborski JM, Wilhelmi I, Zauner S, Maier UG. Distribution of the

621 SELMA translocon in secondary plastids of red algal origin and predicted uncoupling of

622 ubiquitin-dependent translocation from degradation. Eukaryot Cell. 2012;11:1472–81.

623 doi:10.1128/EC.00183-12.

624 37. Felsner G, Sommer MS, Gruenheit N, Hempel F, Moog D, Zauner S, et al. ERAD

625 components in organisms with complex red plastids suggest recruitment of a preexisting

626 protein transport pathway for the periplastid membrane. Genome Biol Evol. 2011;3:140–50.

627 doi:10.1093/gbe/evq074.

628 38. Sakamoto H, Suzuki S, Nagamune K, Kita K, Matsuzaki M. Investigation into the

629 physiological significance of the phytohormone abscisic acid in Perkinsus marinus, an oyster

630 parasite harboring a nonphotosynthetic plastid. J Eukaryot Microbiol. 2017;64:440–6.

631 doi:10.1111/jeu.12379.

632 39. Reyes-Prieto A, Moustafa A, Bhattacharya D. Multiple genes of apparent algal origin

633 suggest ciliates may once have been photosynthetic. Curr Biol. 2008;18:956–62.

634 40. Keeling PJ. The number, speed, and impact of plastid endosymbioses in eukaryotic

635 evolution. Annu Rev Plant Biol. 2013;64:583–607.

636 41. Stiller JW, Schreiber J, Yue J, Guo H, Ding Q, Huang J. The evolution of photosynthesis

637 in chromist algae through serial endosymbioses. Nat Commun. 2014;5.

638 42. Burki F, Okamoto N, Pombert JF, Keeling PJ. The evolutionary history of haptophytes

639 and cryptophytes: Phylogenomic evidence for separate origins. Proc R Soc B Biol Sci.

640 2012;279:2246–54. doi:10.1098/rspb.2011.2301.

641 43. Kite GC, Dodge JD. Structural organization of plastid DNA in two anomalously

642 pigmented dinoflagellates. J Phycol. 1985;21:50–6. doi:10.1111/j.0022-3646.1985.00050.x.

643 44. Tengs T, Dahlberg OJ, Shalchian-Tabrizi K, Klaveness D, Rudi K, Delwiche CF, et al.

644 Phylogenetic analyses indicate that the 19′hexanoyloxy-fucoxanthin-containing

645 dinoflagellates have tertiary plastids of haptophyte origin. Mol Biol Evol. 2000;17:718–29.

646 doi:10.1093/oxfordjournals.molbev.a026350.

647 45. Burki F, Imanian B, Hehenberger E, Hirakawa Y, Maruyama S, Keeling PJ.

648 Endosymbiotic gene transfer in tertiary plastid-containing dinoflagellates. Eukaryot Cell.

649 2014;13:246–55.

650 46. Kamikawa R, Yazaki E, Tahara M, Sakura T, Matsuo E, Nagamune K, et al. Fates of

651 evolutionarily distinct, plastid-type glyceraldehyde 3-phosphate dehydrogenase genes in

652 kareniacean dinoflagellates. J Eukaryot Microbiol. 2018;65:669–78.

653 47. Yoon HS, Hackett JD, Van Dolah FM, Nosenko T, Lidie KL, Bhattacharya D. Tertiary

654 endosymbiosis driven genome evolution in dinoflagellate algae. Mol Biol Evol.

655 2005;22:1299–308. doi:10.1093/molbev/msi118.

656 48. Dorrell RG, Howe CJ. Integration of plastids with their hosts: Lessons learned from

657 dinoflagellates. Proc Natl Acad Sci U S A. 2015;112:10247–54.

658 doi:10.1073/pnas.1421380112.

659 49. Hackett JD, Yoon HS, Soares MB, Bonaldo MF, Casavant TL, Scheetz TE, et al.

660 Migration of the plastid genome to the nucleus in a peridinin dinoflagellate. Curr Biol.

661 2004;14:213–8.

662 50. Frommolt R, Werner S, Paulsen H, Goss R, Wilhelm C, Zauner S, et al. Ancient

663 recruitment by chromists of green algal genes encoding enzymes for carotenoid biosynthesis.

664 Mol Biol Evol. 2008;25:2653–67. doi:10.1093/molbev/msn206.

665 51. Ishida K, Green BR, Cavalier-Smith T. Diversification of a chimaeric algal group, the

666 chlorarachniophytes: Phylogeny of nuclear and nucleomorph small-subunit rRNA genes. Mol

667 Biol Evol. 1999;16:321–31. doi:10.1093/oxfordjournals.molbev.a026113.

668 52. Curtis BA, Tanifuji G, Maruyama S, Gile GH, Hopkins JF, Eveleigh RJM, et al. Algal

669 genomes reveal evolutionary mosaicism and the fate of nucleomorphs. Nature. 2012;492:59–

670 65. doi:10.1038/nature11681.

671 53. Ponce-Toledo RI, Moreira D, López-García P, Deschamps P. Secondary plastids of

672 euglenids and chlorarachniophytes function with a mix of genes of red and green algal

673 ancestry. Mol Biol Evol. 2018;35:2198–204. doi:10.1093/molbev/msy121.

674 54. Archibald JM, Rogers MB, Toop M, Ishida K ichiro, Keeling PJ. Lateral gene transfer and

675 the evolution of plastid-targeted proteins in the secondary plastid-containing alga Bigelowiella

676 natans. Proc Natl Acad Sci U S A. 2003;100:7678–83. doi:10.1073/pnas.1230951100.

677 55. Le T, Žárský V, Nývltová E, Rada P, Harant K, Vancová M, et al. Anaerobic peroxisomes

678 in Mastigamoeba balamuthi. Proc Natl Acad Sci. 2020;117:2065-75.

679 doi:10.1073/pnas.1909755117.

680 56. Striepen B, Soldati D. Genetic manipulation of Toxoplasma gondii. In: Toxoplasma

681 gondii. Academic Press; 2007. p. 391–418.

682 57. Sheiner L, Demerly JL, Poulsen N, Beatty WL, Lucas O, Behnke MS, et al. A systematic

683 screen to discover and analyze apicoplast proteins identifies a conserved and essential protein

684 import factor. PLoS Pathog. 2011;7.

685 58. Chen AL, Moon AS, Bell HN, Huang AS, Vashisht AA, Toh JY, et al. Novel insights into

686 the composition and function of the Toxoplasma IMC sutures. Cell Microbiol. 2017;19.

687 doi:10.1111/cmi.12678.

688 59. Agrawal S, van Dooren GG, Beatty WL, Striepen B. Genetic evidence that an

689 endosymbiont-derived endoplasmic reticulum-associated protein degradation (ERAD) system

690 functions in import of apicoplast proteins. J Biol Chem. 2009;284:33683–91.

691 60. Gorovsky MA, Yao MC, Keevert JB, Pleger GL. Chapter 16 Isolation of micro- and

692 macronuclei of Tetrahymena pyriformis. Methods Cell Biol. 1975;9 C:311–27.

693 doi:10.1016/S0091-679X(08)60080-1.

694 61. Wloga D, Camba A, Rogowski K, Manning G, Jerka-Dziadosz M, Gaertig J. Members of

695 the NIMA-related kinase family promote disassembly of cilia by multiple mechanisms. Mol

696 Biol Cell. 2006;17:2799–810.

697 62. Urbanska P, Joachimiak E, Bazan R, Fu G, Poprzeczko M, Fabczak H, et al. Ciliary

698 proteins Fap43 and Fap44 interact with each other and are essential for proper cilia and

699 flagella beating. Cell Mol Life Sci. 2018;75:4479–93. doi:10.1007/s00018-018-2819-7.

700 63. Dave D, Wloga D, Gaertig J. Manipulating ciliary protein-encoding genes in Tetrahymena

701 thermophila. Methods Cell Biol. 2009;93:1–20. doi:10.1016/S0091-679X(08)93001-6.

702 64. Hempel F, Bozarth AS, Lindenkamp N, Klingl A, Zauner S, Linne U, et al. Microalgae as

703 bioreactors for bioplastic production. Microb Cell Fact. 2011;10.

704 65. Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, et al. Gapped

705 BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic

706 Acids Res. 1997;25:3389–402. isi:A1997XU79300002.

707 66. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high

708 throughput. Nucleic Acids Res. 2004;32:1792–7. isi:000220487200025.

709 67. Emanuelsson O, Brunak S, von Heijne G, Nielsen H. Locating proteins in the cell using

710 TargetP, SignalP and related tools. Nat Protoc. 2007;2:953–71.

711 68. Armenteros JJA, Salvatore M, Emanuelsson O, Winther O, Von Heijne G, Elofsson A, et

712 al. Detecting sequence signals in targeting peptides using deep learning. Life Sci Alliance.

713 2019;2. doi:10.26508/lsa.201900429.

714 69. Almagro Armenteros JJ, Sønderby CK, Sønderby SK, Nielsen H, Winther O. DeepLoc:

715 prediction of protein subcellular localization using deep learning. Bioinformatics (Oxford,

716 England). 2017;33:3387–95. doi:10.1093/bioinformatics/btx431.

717 70. Fukasawa Y, Tsuji J, Fu S-C, Tomii K, Horton P, Imai K. MitoFates: Improved prediction

718 of mitochondrial targeting sequences and their cleavage sites. Mol Cell Proteomics.

719 2015;14:1113–26.

720 71. Claros MG, Vincens P. Computational method to predict mitochondrially imported

721 proteins and their targeting sequences. Eur J Biochem. 1996;241:779–86.

722 72. Petersen TN, Brunak S, von Heijne G, Nielsen H. SignalP 4.0: Discriminating signal

723 peptides from transmembrane regions. Nature Methods. 2011;8:785–6.

724 doi:10.1038/nmeth.1701.

725 73. Almagro Armenteros JJ, Tsirigos KD, Sønderby CK, Petersen TN, Winther O, Brunak S,

726 et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat

727 Biotechnol. 2019;37:420–3. doi:10.1038/s41587-019-0036-z.

728 74. Käll L, Krogh A, Sonnhammer ELL. A combined transmembrane topology and signal

729 peptide prediction method. J Mol Biol. 2004;338:1027–36.

730 75. Nakai K, Horton P. PSORT: A program for detecting sorting signals in proteins and

731 predicting their subcellular localization. Trends Biochem Sci. 1999;24:34–5.

732 doi:10.1016/S0968-0004(98)01336-X.

733 76. Emanuelsson O, Nielsen H, von Heijne G. ChloroP, a neural network-based method for

734 predicting chloroplast transit peptides and their cleavage sites. Protein Sci. 1999;8:978–84.

735 doi:10.1110/ps.8.5.978.

736 77. Gschloessl B, Guermeur Y, Cock JM. HECTAR: A method to predict subcellular

737 targeting in heterokonts. BMC Bioinformatics. 2008;9:393. doi:10.1186/1471-2105-9-393.

738 78. Höglund A, Dönnes P, Blum T, Adolph HW, Kohlbacher O. MultiLoc: Prediction of

739 protein subcellular localization using N-terminal targeting sequences, sequence motifs and

740 amino acid composition. Bioinformatics. 2006;22:1158–65.

741 doi:10.1093/bioinformatics/btl002.

742 79. Foth BJ, Ralph SA, Tonkin CJ, Struck NS, Fraunholz M, Roos DS, et al. Dissecting

743 apicoplast targeting in the malaria parasite Plasmodium falciparum. Science. 2003;299:705–8.

744 doi:10.1126/science.1078599.

745 80. Apt KE, Zaslavkaia L, Lippmeier JC, Lang M, Kilian O, Wetherbee R, et al. In vivo

746 characterization of multipartite plastid targeting signals. J Cell Sci. 2002;115:4061–9.

747 doi:10.1242/jcs.00092.

748 81. Woehle C, Dagan T, Martin WF, Gould SB. Red and problematic green phylogenetic

749 signals among thousands of nuclear genes from the photosynthetic and apicomplexa-related

750 Chromera velia. Genome Biol Evol. 2011;3:1220–30. doi:10.1093/gbe/evr100.

751 82. Huesgen PF, Alami M, Lange PF, Foster LJ, Schröder WP, Overall CM, et al. Proteomic

752 amino-termini profiling reveals targeting information for protein import into complex

753 plastids. PLoS One. 2013;8:e74483. doi:10.1371/journal.pone.0074483.

754 83. Katoh K, Standley DM. MAFFT Multiple sequence alignment software version 7:

755 Improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.

756 doi:10.1093/molbev/mst010.

757 84. Criscuolo A, Gribaldo S. BMGE (Block Mapping and Gathering with Entropy): a new

758 software for selection of phylogenetic informative regions from multiple sequence

759 alignments. BMC Evol Biol. 2010;10:210. doi:10.1186/1471-2148-10-210.

760 85. Price MN, Dehal PS, Arkin AP. FastTree 2 - Approximately maximum-likelihood trees

761 for large alignments. PLoS One. 2010.

762 86. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: A fast and effective

763 stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol.

764 2015;32:268–74. doi:10.1093/molbev/msu300.

765 87. Lartillot N, Lepage T, Blanquart S. PhyloBayes 3: a Bayesian software package for

766 phylogenetic reconstruction and molecular dating. Bioinformatics. 2009;25:2286–8.

767 88. Finn RD, Clements J, Arndt W, Miller BL, Wheeler TJ, Schreiber F, et al. HMMER web

768 server: 2015 Update. Nucleic Acids Res. 2015;43:W30–8.

769

770 Declarations

771 Ethics approval and consent to participate

772 Not applicable

773 Consent for publication

774 Not applicable

775 Availability of data and materials

776 All data generated or analysed during this study are included in this published article and its

777 supplementary information files.

778 Competing interests

779 The authors declare that they have no competing interests.

780 Funding

781 The laboratory of JT was supported by NPU II (LQ1604) provided by the Ministry of Education, Youth

782 and Sport (MEYS) of the Czech Republic, CePaViP (CZ.02.1.01/0.0/0.0/16_019/0000759) provided

783 by ERD Funds, and MICOBION funded from EU H2020 (No 810224). CG and UGM were supported

784 by the LOEWE Center for Synthetic Microbiology (Synmikro).

785 Authors' contributions

786 JP and JT conceived the study. JP and VŽ performed bioinformatic analyses. JDF, BS, CG,

787 DW, and UGM performed cell localization studies. JP and JT wrote the paper. All authors read

788 and approved the final manuscript.

789

790 Legends to figures

791 Figure 1. Predictions of N-terminal targeting sequences for chloroplast (chHCF101) and

792 mitochondrial versions of HCF101 (mHCF101) in selected alveolates and stramenopiles. Red

793 color highlights the mitochondrial leader sequence with cleavage sites predicted with

794 MitoFates (star), TargetP 2 (circle), and PSORT II (triangle). Yellow color indicates a signal

795 peptide, which is cleaved right before the phenylalanine residue, a typical feature for signal

796 peptide cleavage in diatoms. This cleavage site was predicted with high support by the

797 TargetP, SignalP, and HECTAR algorithms. Green represents a predicted transit peptide with

798 two possible cleavage sites for stromal processing peptidase, manually predicted based on

799 previous studies [82]. Gray color highlights conserved residues.

800 Figure 2. Localization of Nbp35-like proteins in diatoms. Genes for chHCF101, mHCF101,

801 and Nbp35 from P. tricornutum and T. pseudonana were expressed in P. tricornutum with a

802 C-terminal e-GFP tag (green). PAF, plastid autofluorescence (red); MitoT, MitoTracker Orange

803 (blue); DIC, differential interference contrast. Scale bar 10 µm.

804 Figure 3. Localization of Nbp35 and mHCF101 in T. thermophila and T. gondii. Nbp35 and

805 mHCF101 were expressed under the control of their respective native promoters with a

806 C-terminal HA tag (green). Specific polyclonal antibodies against F1-ATPase (red) and HSP60

807 (red) were used as mitochondrial and apicoplastidal markers, respectively. MitoT, MitoTracker

808 Red (red); DIC, differential interference contrast. Scale bar 10 µm (T. thermophila) and 5 µm

809 (T. gondii).

810 Figure 4. A phylogenetic tree of the Nbp35-like proteins (the ParA family). The tree was built

811 using FastTree, and nodes with support below 0.9 were collapsed (see materials and

812 methods). The leaves are color coded to indicate taxonomy (8840 sequences), with

813 annotations of selected sequences. The scale bar represents the estimated number of amino

814 acid substitutions per site.

815 Figure 5. Phylogeny of HCF101 proteins. The tree was built using the PhyloBayes

816 CAT-Poisson mixture model. Numbers at nodes of the tree indicate statistical support in the form of

817 posterior probability of the PhyloBayes analysis and an ultrafast bootstrap of the IQ-TREE

818 analysis (see materials and methods). The scale bar represents the estimated number of amino

819 acid substitutions per site. The conserved cysteine motif in the DUF971 domain is displayed

820 for each protein sequence when available.

821 Figure 6. Scheme of HCF101 evolution. A. HCF101 distribution explained via the

822 Chromalveolate hypothesis. Upon the acquisition of an HCF101-like protein via LGT from

823 bacteria to the ancestor of Archaeplastida, chHCF101 was established in the plastid.

824 Euglenozoa gained HCF101 via secondary endosymbiosis with a donor containing a green

825 plastid. A common ancestor of Chromalveolata gained a red plastid via secondary

826 endosymbiosis; the gene for chHCF101 was duplicated and one copy was targeted to

827 mitochondria (mHCF101), where it replaced the Ind1 gene. mHCF101 is common to all

828 chromalveolates, while several lineages lost chHCF101 together with loss of the secondary

829 plastid (Cryptosporidium, Ciliates, Oomycota, centrohelids, catablepharids). B. HCF101

830 distribution explained via ‘multiple secondary endosymbiosis’. Mitochondrially localized

831 HCF101 together with Ind1 was present in a common ancestor of Archaeplastida, SAR,

832 Cryptista, and Haptista. Then, in (i) Cryptista and in (ii) a common ancestor of Haptista and

833 SAR, the Ind1 gene was lost, whereas in Archaeplastida Ind1 was retained in the

834 mitochondria and mHCF101 was retargeted to the plastid. Then, chHCF101 was

835 introduced to Cryptophytes, Haptophytes, and certain SAR groups via multiple secondary

836 endosymbioses. chHCF101 is absent in lineages that did not experience secondary

837 endosymbiosis.

838 PR, protein retargeting; GL, gene loss; GD, gene duplication; PL, plastid loss; PES, primary

839 endosymbiosis; SES, secondary endosymbiosis; TES, tertiary endosymbiosis.
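The Figure 4 legend states that nodes with support below 0.9 were collapsed. The following Python sketch illustrates what such collapsing means for a small tree; it is an illustration only and not the procedure used in this study. The `Node` class and the functions `collapse_low_support` and `to_newick` are hypothetical names introduced here, and branch lengths are ignored for simplicity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Minimal tree node (hypothetical helper, not from any phylogenetics library)."""
    name: str = ""
    support: Optional[float] = None  # None for leaves and for the root
    children: List["Node"] = field(default_factory=list)


def collapse_low_support(node: Node, threshold: float = 0.9) -> Node:
    """Return a copy of the tree in which each internal node whose support is
    below `threshold` is removed and its children are attached directly to
    its parent, producing a polytomy."""
    new_children: List[Node] = []
    for child in node.children:
        collapsed = collapse_low_support(child, threshold)
        is_internal = bool(collapsed.children)
        if is_internal and collapsed.support is not None and collapsed.support < threshold:
            # Drop the weakly supported node; promote its children.
            new_children.extend(collapsed.children)
        else:
            new_children.append(collapsed)
    return Node(node.name, node.support, new_children)


def to_newick(node: Node) -> str:
    """Serialize the topology (with support labels, without branch lengths)."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(c) for c in node.children)
    label = "" if node.support is None else f"{node.support:g}"
    return f"({inner}){label}"


# Toy example: one well-supported clade (0.95) and one weak clade (0.4).
tree = Node(children=[
    Node(support=0.95, children=[Node("A"), Node("B")]),
    Node(support=0.4, children=[Node("C"), Node("D")]),
])
print(to_newick(collapse_low_support(tree)))
```

In this toy tree the clade (C,D) has support 0.4 and is dissolved into the root, whereas (A,B) with support 0.95 is kept.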

840 Additional files

841 Additional file 1, Table S1.xls

842 List of retrieved protein sequences that were used for cellular localization predictions.

843 Sequence, Nbp35-like protein category, organism, its taxonomy, and source of the sequence

844 in various databases are highlighted in columns A to L. Completeness of the sequence is

845 indicated in Column M. In the case that the protein sequence was manually corrected, the

846 trimmed sequence is displayed in column N. Columns O to P indicate final protein

847 localization prediction. Columns R to Z, AA to AJ, and AK to AO indicate support for the

848 presence of signal peptide, mitochondrial leader sequence, or plastid transit peptide,

849 respectively.

850 Additional file 2, Table S2.xls

851 Predictions of domain structure in Nbp35 homologs included in phylogenetic analysis (Fig.

852 4).

[Figure 1 image: alignment of the N-terminal regions of mHCF101 from P. tricornutum, T. pseudonana, T. thermophila, and T. gondii, and of chHCF101 from P. tricornutum and T. pseudonana.]

[Figure 4 image: phylogenetic tree of Nbp35-like proteins. Labeled clades: Nbp35, Cfd1, ApbC, Ind1, HCF101. Color key: archaea (TACK group, Euryarchaeota); bacteria (FCB group, PVC group, Terrabacteria group without cyanobacteria, cyanobacteria, Proteobacteria without Alphaproteobacteria, Alphaproteobacteria); SAR; Archaeplastida; Excavata; unknown.]

[Figure 5 image: phylogenetic tree of HCF101 proteins with Ind1, chHCF, and mHCF clades labeled. Node labels give PhyloBayes posterior probability / IQ-TREE ultrafast bootstrap, and the residues of the DUF971 cysteine motif are shown next to each sequence where available.]


Supplementary Files

This is a list of supplementary files associated with this preprint.

TableS1JT.xlsx

TableS2domainpredictions.xlsx