<<

This is a pre-copyedited, author-produced version of an article accepted for publication, following peer review.

Spang, A.; Stairs, C.W.; Dombrowski, N.; Eme, E.; Lombard, J.; Caceres, E.F.; Greening, C.; Baker, B.J. & Ettema, T.J.G. (2019). Proposal of the reverse flow model for the origin of the eukaryotic cell based on comparative analyses of archaeal . Microbiology, 4, 1138–1148

Published version: https://dx.doi.org/10.1038/s41564-019-0406-9

Link NIOZ Repository: http://www.vliz.be/nl/imis?module=ref&refid=310089

[Article begins on next page]

The NIOZ Repository gives free access to the digital collection of the work of the Royal Netherlands Institute for Sea Research. This archive is managed according to the principles of the Open Access Movement, and the Open Archive Initiative. Each publication should be cited to its original source - please use the reference as presented. When using parts of, or whole publications in your own work, permission from the author(s) or copyright holder(s) is always needed. 1 Article to Nature Microbiology

2

3 Proposal of the reverse flow model for the origin of the eukaryotic cell based on

4 comparative analysis of Asgard archaeal metabolism

5

6 Anja Spang1,2,*, Courtney W. Stairs1, Nina Dombrowski2,3, Laura Eme1, Jonathan Lombard1, Eva Fernández

7 Cáceres1, Chris Greening4, Brett J. Baker3 and Thijs J.G. Ettema1,5*

8

9 1 Department of Cell- and Molecular Biology, Science for Life Laboratory, Uppsala University, SE-75123,

10 Uppsala, Sweden

11 2 NIOZ, Royal Netherlands Institute for Sea Research, Department of Marine Microbiology and

12 Biogeochemistry, and Utrecht University, P.O. Box 59, NL-1790 AB Den Burg, The Netherlands

13 3 Department of Marine Science, University of Texas at Austin, Marine Science Institute, Port Aransas, TX-

14 78373, USA

15 4 School of Biological Sciences, Monash University, Clayton, Victoria, Australia

16 5Laboratory of Microbiology, Department of Agrotechnology and Food Sciences, Wageningen University,

17 Stippeneng 4, 6708WE Wageningen, The Netherlands

18

19 * corresponding authors: [email protected], [email protected]

20

21

22 Key words: , Asgard superphylum, eukaryogenesis, endosymbiosis, , ,

23 metabolism

1

24

25

26 Abstract

27 The origin of represents an unresolved puzzle in evolutionary biology. Current research

28 suggests that eukaryotes evolved from a merger between a host of archaeal descent and an

29 alphaproteobacterial . The discovery of the Asgard archaea, a proposed archaeal

30 superphylum that includes -, -, Odin- and Heimdallarchaeota suggested to comprise the closest

31 archaeal relatives of eukaryotes, has helped elucidating the identity of the putative archaeal host. While

32 are assumed to employ a hydrogen-dependent metabolism, little is known about the

33 metabolic potential of other members of the Asgard superphylum. We infer the central metabolic

34 pathways of Asgard archaea using comparative genomics and to be able to refine current

35 models for the origin of eukaryotes. Our analyses indicate that Thor- and Lokiarchaeota encode

36 necessary for via the Wood-Ljungdahl pathway and for obtaining reducing equivalents

37 from organic substrates. In contrast, Heimdallarchaeum LC2 and LC3 encode

38 potentially enabling the oxidation of organic substrates using nitrate or oxygen as electron acceptors. The

39 repertoire of Heimdallarchaeum AB125 and Odinarchaeum indicates that these organisms can

40 ferment organic substrates and conserve energy by coupling ferredoxin re-oxidation to respiratory proton

41 reduction. Altogether, our analyses suggest that Asgard representatives are primarily

42 organoheterotrophs with variable capacity for hydrogen consumption and production. On this basis, we

43 propose the ‘reverse flow model’, an updated symbiogenetic model for the origin of eukaryotes that

44 involves electron or hydrogen flow from an organoheterotrophic archaeal host to a bacterial symbiont.

45

46

47

2

48 Main

49 The exploration of microbial life inhabiting anoxic environments has changed our perception of microbial

50 diversity, evolution, and ecology and unveiled various previously unknown branches in the tree of life1,

51 including the proposed Asgard archaea superphylum2-4. The analysis of genomes from uncultivated

52 has improved our understanding of microbial metabolic diversity and evolution of life on

53 Earth, including the origin of eukaryotes5. During the past years, models that posit that eukaryotes evolved

54 from a between an alphaproteobacterial endosymbiont and an archaeal host cell have gained

55 increasing support (reviewed in6-8). In particular, detailed phylogenomic analyses of Asgard archaea, which

56 currently include Loki-, Thor-, Odin- and Heimdallarchaeota, have suggested that these archaea represent

57 the closest known relatives of eukaryotes2,4. Though this view has been challenged9, additional analyses

58 supported the phylogenetic placement of eukaryotes within the Asgard archaea and reinforced the

59 suggestion that these organisms are key for our understanding of eukaryogenesis10,11. The genomes of all

60 members of the Asgard archaea are enriched in coding for so-called eukaryotic signature proteins

61 (ESPs)2,4,12, indicating that the elusive archaeal ancestor of eukaryotes already contained several building

62 blocks for the subsequent evolution of eukaryotic complexity2,4. In turn, genome analyses of

63 Lokiarchaeum have reinvigorated discussions about the nature of the archaeal ancestor of eukaryotes and

64 about the evolutionary events that led to the origin of the eukaryotic cell13-16. Even though recent data

65 favors hypotheses which underpin a symbiogenetic origin of eukaryotes from only two domains of

66 life6,8,17,18, the timing of the events leading to eukaryogenesis remain unknown19.

67 Here, we perform a comparative analysis of the metabolic potential of the Asgard superphylum,

68 which revealed considerable metabolic versatility in these archaea. In light of our findings, we update

69 previously formulated symbiogenetic hypotheses and present a refined model for the origin of

70 eukaryotes, in which the archaeal host is suggested to be a fermentative organoheterotroph generating

71 reduced compounds that are metabolized by a syntrophic bacterial partner organism.

3

72

73 Results and discussion

74 Genomic analyses of Asgard archaea reveals different metabolic repertoires

75 Our reconstruction of the metabolism of the Asgard archaea, based on comparative genome annotation

76 and phylogenetic analyses, revealed that the enzymatic repertoire differs both within and between the

77 metagenome assembled genomes (MAGs) of representatives of the Loki-, Thor-, Odin- and

78 Heimdallarchaeota4, suggesting that members of these groups are characterized by different physiological

79 lifestyles.

80

81 Loki- and can likely use organic compounds and hydrogen

82 The Wood-Ljungdahl pathway (WLP) allows the reduction of to acetyl-CoA and can

83 be used to support both autotrophic carbon fixation and, when linked to chemiosmotic processes, energy

84 conservation20-22. Both Loki- and Thorarchaeota encode all enzymes for a complete (archaeal) WLP (Figure

85 1, Suppl. Figure 2a, Suppl. Tables 1 and 2, Suppl. Information)3,13,23. The presence of the WLP, together

86 with group 3b and group 3c [NiFe]-hydrogenases (Figure 2, Suppl. Table 2, Suppl. Information), indicate

87 that members of the Loki- and Thorarchaeota may have the ability to grow lithoautotrophically using H2

88 as an electron donor in agreement with previous suggestions13. Based on studies of homologous

89 complexes in methanogens24, it is possible that the group 3c [NiFe]-hydrogenases of Loki- and

90 Thorarchaeota, which are encoded in gene clusters with soluble heterodisulfide reductase subunits

91 (Suppl. Figure 3), bifurcate electrons from H2 to ferredoxin and an unidentified heterodisulfide compound.

92 However, it is currently unclear how these organisms could generate a membrane potential through this

93 process given that membrane-bound hydrogenases or other ferredoxin-dependent complexes, such as

94 the Rnf complex, capable of ion translocation25 could not be identified in the genomes. Unless a currently

95 unidentified complex can couple H2 oxidation to membrane potential generation, H2 may

4

96 exclusively be used to support carbon fixation or fermentation in these lineages. Instead, the metabolic

97 repertoire of Loki- and Thorarchaeota reveals the potential to harness electrons from a variety of organic

98 substrates (Suppl. Table 1 and 2, Suppl. Information), including complex , peptides, amino

99 acids, alcohols, fatty acids (Suppl. Figures 2b and 4-13) and hydrocarbons (Suppl. Figure 14). We also

100 identified putative formate dehydrogenases that may support growth on formate as an energy and/or

101 carbon source. While Lokiarchaeota may oxidize these organic substrates using the reverse WLP, the

102 presence of a canonical respiratory chain complex I in Thorarchaeota indicates that members of this latter

103 group have the additional ability to couple organic carbon oxidation to membrane potential generation

104 through vectorial proton translocation. Additionally, it may be speculated that both Loki- and

105 Thorarchaeota can establish a membrane potential through a scalar mechanism from the oxidation of

106 organic compounds by using their putative membrane-bound heterodisulfide reductase (HdrD) for the re-

107 oxidation of quinol generated by electron transfer flavoproteins (ETF) (Figure 1; Suppl.

108 Information). It is possible that cofactors reduced during organic carbon oxidation may be re-oxidised by

109 hydrogenogenic fermentation using the nicotinamide-dependent group 3b26 and ferredoxin-dependent

110 group 3c [NiFe]-hydrogenases, which are thought to be reversible under physiological conditions (Figures

111 1 and 2). Thus, depending on environmental conditions, members of these groups might employ

112 hydrogenogenic or hydrogenotrophic . For example, depending on the electron yield of a

113 , these organisms could use the WLP as electron sink similar to heterotrophic acetogenic

114 bacteria27. In contrast, smaller organic substrates such as short-chain fatty acids with lower electron

115 yields, could be completely oxidized via the reverse WLP. In case of limited availability of electron

116 acceptors, reduced fermentation products may subsequently be metabolized syntrophically by H2 or

117 formate-consuming organisms, which keep the partial pressure of these intermediates sufficiently low28.

118 Finally, the presence of reductive dehalogenases in Loki- and Thorarchaeota suggests that these

119 organisms are also able to shuttle electrons to organohalide compounds (Suppl. Figure 15; Suppl.

5

120 Information). However, given the lack of the classical membrane-anchors of bacterial reductive

121 dehalogenases, the functions of these enzymes in Asgard archaea remain to be investigated.

122

123 Odinarchaeum may be a thermophilic fermentative heterotroph

124 The metabolic repertoire of Odinarchaeum, which so far represents the only known thermophilic member

125 of the Asgard archaea4, seems more limited (Figures 1 and 3; Suppl. Tables 1 and 2). Its genome encodes

126 only a partial tricarboxylic acid cycle and lacks genes for several key enzymes of the WLP as well as beta-

127 oxidation pathway (Suppl. Figure 2a). Yet, it encodes enzymes indicative of the potential to grow on

128 organic substrates, which could be fermented to acetate by a putative ADP-dependent acetyl-CoA

129 synthetase. Furthermore, the Odinarchaeum genome contains three distinct homologues of group 4

130 respiratory H2-evolving [NiFe] hydrogenases (Figure 2; Suppl. Table 1 and 2, Suppl. Information).

131 Phylogenetic analyses of the large subunit of the [NiFe]-hydrogenase suggest that these hydrogenases

132 form three previously undefined subgroups, together with homologs of other recently obtained archaeal

133 lineages (Figure 2). This supports the notion that the classification of group 4 [NiFe]-hydrogenases needs

134 to be extended29. The [NiFe]-hydrogenase gene clusters of Odinarchaeum encode NuoL-like subunits

135 (Suppl. Figure 3; Suppl. Material), which are thought to mediate sodium/proton translocation30, indicating

136 that these hydrogenases are involved in energy conservation. This is reminiscent of hydrogen-evolving

137 group 4d [NiFe]-hydrogenases encoded by members of the , which are involved in the

30,31 138 fermentation of organic substrates to H2, acetate and carbon dioxide . In these archaea, the

139 transporter-linked membrane-bound group 4 [NiFe]-hydrogenases can couple the thermodynamically

140 favorable transfer of electrons from reduced ferredoxin to protons, to the translocation of ions across the

141 membrane. Subsequently, this ion gradient can be harvested through a Na+- or H+-driven ATP synthase31.

142 The presence of hydrogen-evolving group 4 [NiFe]-hydrogenases in Odinarchaeum and its potential to use

143 organic substrates suggest that this organism can conserve energy by a fermentation process that

6

144 simultaneously generates ATP through substrate-level phosphorylation during carbon oxidation and

145 oxidative phosphorylation during ferredoxin re-oxidation, thereby increasing the overall ATP yield.

146

147 Heimdallarchaeota may grow heterotrophically by fermentation, or anaerobic and aerobic respiration

148 Heimdallarchaeota, encode a versatile metabolic repertoire (Figure 1; Suppl. Figure 2a; Suppl. Tables 1

149 and 2) with the potential to derive reducing equivalents from a range of organic substrates including

150 complex carbohydrates, fatty acids and proteins (Suppl. Figure 2b; Suppl. Information). While they lack

151 most enzymes for the WLP, both the LC2 and LC3 genomes encode various

152 components (Figure 1; Suppl. Tables 1 and 2; Suppl. Information), including an A-type heme-copper

153 oxidase (Suppl. Figure 16a), a bacterial-type nitrate reductase (Nar; Suppl. Figure 16b) and a respiratory

154 chain complex I. These components likely enable the use of oxygen and nitrate as electron acceptors

155 during aerobic and anaerobic respiration, respectively. In agreement with this, a close relative of

156 Heimdallarchaeum LC2 was recently identified in oxygen-rich oceanic surface waters32, while members of

157 the Thor- and Lokiarchaeota so far have only been found in strictly anoxic environments. The genome of

158 this oceanic member of the Heimdallarchaeota (PBWW00000000.1) encodes a terminal oxidase that is

159 similar (identity 63%, E-value 0.0) to that of Heimdallarchaeum LC2 (OLS29256), indicating that some

160 Heimdallarchaeota may be able to occupy both anoxic and oxic niches. In contrast, Heimdallarchaeum

161 AB125 has fewer electron acceptors and lacks proteins homologous to those involved in oxygen or

162 nitrogen cycling. However, similar to Heimdallarchaeum LC2 and Odinarchaeum, its genome encodes a

+ 163 putative H2-evolving group 4 [NiFe]-hydrogenase that may allow ferredoxin re-oxidation using H as

164 electron acceptor (Suppl. Information). Finally, all members of the Heimdallarchaeota may be able to use

165 organohalides as electron acceptors (Suppl. Information) and Heimdallarchaeum AB125 may in addition

166 be able to use hydrocarbons as substrates (Suppl. Figure 14).

167

7

168 The evolution of the metabolic potential of the Asgard archaea

169 The comparative genome analyses of Asgard archaea allowed us to infer key metabolic features of the

170 Last Asgard archaeal Common Ancestor (LAsCA) (Figure 3a). First of all, it is likely that LAsCA had the ability

171 to metabolize organic substrates (Suppl. Information). All Asgard archaea, except for Odinarchaeum, have

172 a large variety of genes coding for proteins of all steps of the beta-oxidation pathway (Suppl. Figures 5-

173 13; Suppl. Information). While the phylogenetic history of the respective enzymes is complex and invokes

174 several horizontal gene transfers (HGTs) from different sources (Suppl. Figures 5-13), at least some of

175 these enzymes (Suppl. Figures 6b, 8-10) seem to have evolved vertically within the Asgard archaea or

176 across the archaeal , supporting the presence of this pathway in LAsCA. Furthermore, all members

177 of the Asgard archaea have the potential to use additional organic substrates including complex organic

178 compounds (Suppl. Tables 1 and 2).

179 The presence of the WLP in both Thor- and Lokiarchaeota and the wide occurrence of the key

180 enzyme of the WLP (CO dehydrogenase/Acetyl-CoA synthetase (CODH/ACS)) in all major archaeal phyla5,33

181 indicate that this pathway was present in LAsCA. However, at least one gene encoding a subunit of the

182 CODH/ACS may have been subjected to HGT after the radiation of the Asgard archaea (Suppl. Figure 17),

183 Furthermore, few to all genes encoding enzymes of the WLP seem to have been lost in Odinarchaeum and

184 currently known Heimdallarchaeota. While an ADP-dependent acetyl-CoA synthetase (IPR014089), an

185 enzyme responsible for the generation of acetate and ATP by substrate-level phosphorylation, is present

186 in all Asgard lineages, phylogenetic analyses are indicative of frequent and recent HGTs. Thus, we cannot

187 infer the presence of this in LAsCA with certainty. Furthermore, while all Asgard archaea encode

188 a homolog of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) family enzymes, which among

189 others include the key enzyme of the Calvin-Benson-Bassham (CBB) cycle (RuBisCO Type I and II) as well

190 as of the related reverse hexulose phosphate pathway34, phylogenetic analyses of these enzymes and the

191 investigation of key residues indicate that the Asgard archaeal homologs belong to RuBisCO sub-types

8

192 that are unrelated to the carbon fixation cycles (Suppl. Information, Suppl. Figure 18a) and may have been

193 acquired horizontally (Suppl. Figure 18a). In line with this, phosphoribulokinase, another key enzyme of

194 RuBisCO-based carbon fixation pathways, does not seem to be encoded by Asgard genomes (Suppl.

195 Information). Therefore, there is currently no indication that LAsCA contained carbon fixation strategies

196 other than the WLP.

197 Though the wide distribution of cytosolic group 3b and 3c [NiFe]-hydrogenases in the analysed

198 Asgard archaea may indicate their presence in LAsCA, phylogenetic analyses of the key subunit do not

199 support such inferences (Figures 2 and 3, Suppl. Tables 1 and 2). For instance, most homologs encoded by

200 the different Asgard lineages are not monophyletic and are thus indicative of extensive HGT events

201 throughout the evolution of this archaeal group (Figure 2). Furthermore, based on the patchy distribution

202 of membrane-bound group 4 [NiFe]-hydrogenases and NADH dehydrogenases in the Asgard genomes and

203 results of our phylogenetic analyses, it is unclear whether these enzymes were part of the LAsCA

204 (Figure 3a). Yet, based on current data, it is plausible that the respective ancestors of

205 Heimdallarchaeota harbored a membrane-bound group 4 [NiFe]-hydrogenase.

206 While phylogenetic analyses indicate that putative reductive dehalogenase-like proteins may

207 have been encoded by LAsCA (Suppl. Figure 15), the absence of a terminal nitrate and oxygen reductase

208 in most of the Asgard archaea studied herein, with the exception of Heimdallarchaeum LC2 and LC3,

209 indicates that these enzyme complexes were acquired more recently in Heimdallarchaeota (Figure 3a).

210 However, the exact timepoint of the acquisition of the genes encoding Nar and the A-type heme-copper

211 oxidase (Suppl. Figure 16a and b) remains to be determined. While the heimdallarchaeal A-type Cox1

212 homologs branch in a cluster with homologs of Ca. Caldiarchaeum subterraneum and , we

213 obtained strong support for the monophyly of the eukaryotic and alphaproteobacterial A-type homologs,

214 supporting the view that eukaryotes inherited their terminal oxidases from the bacterial endosymbiont

215 as indicated earlier35. Similarly, the monophyly of the bacterial-type nitrate reductase of

9

216 Heimdallarchaeum LC2 and LC3 with Methylomirabilis oxyfera, Nitrolancea hollandica and Nitrococcus

217 species, as well as the anaerobic methane-oxidizing euryarchaeote (ANME) Methanoperedens sp. BLZ136,

218 which contains both a bacterial and an archaeal-type Nar, points towards a late acquisition of this gene

219 cluster in Heimdallarchaeota. Thus, terminal reductases for exogenous electron acceptors may have been

220 absent from the LAsCA proteome.

221 Altogether, we conclude that LAsCA likely encoded the WLP, which is consistent with the

222 suggested ancestry of this pathway in archaea33,37. Furthermore, this organism may have had the potential

223 to grow both lithoautotrophically on H2 and CO2 as well as organoheterotrophically using the WLP for

224 growth on organic substrates, including fatty acids and perhaps alkanes or aromatic compounds (Figure

225 3a). It may also have had the ability of fermentative H2 production given the presence of the bidirectional

226 group 3b and 3c [NiFe]-hydrogenases in all Asgard phyla. In contrast, the WLP was lost in the currently

227 known Heimdallarchaeota, which instead seem to have acquired membrane-bound electron acceptors,

228 that could enable respiratory growth on organic substrates. Yet, it is currently unclear whether the WLP

229 was still encoded by the shared ancestor of Heimdallarchaeaota and eukaryotes (Figure 3a and b).

230

231 The reverse flow model for the emergence of the eukaryotic cell

232 Various symbiogenetic scenarios for the emergence of the eukaryotic cell have been proposed in

233 the past, e.g.6-8,16,38. Among these scenarios, the independently formulated but simultaneously published

234 syntrophic hypothesis39,40 (Figure 4a) and hydrogen hypothesis41 (Figure 4b) are perhaps the most

235 articulated and detailed examples. Both of these hypotheses assume a syntrophic interaction based on

236 the transfer of H2 between a H2-dependent and a H2-producing bacterial partner (Figure 4a

237 and b). Following the initial discovery of Lokiarchaeum2, a modified version of the hydrogen hypothesis

238 was proposed (Figure 4c)13. Based on the presence of the WLP in Lokiarchaeota, it was suggested that the

239 archaeal ancestor of eukaryotes likely represented an autotrophic H2-dependent organism. The analysis

10

240 of genomic data of additional Asgard archaea (Figure 1, Suppl. Tables 1 and 2, Suppl. Information)

241 including genomic information of members of the Heimdallarchaeota, which currently seem to represent

242 the closest relatives of eukaryotes among the Asgard archaea (Figure 3b)2,4 allowed us to refine previous

243 syntrophic scenarios for the origin of eukaryotes.

244 In our model, referred to as the ‘reverse flow model’ (Figure 4d), the direction of the syntrophic

245 interaction is suggested to be opposite of what was proposed for the hydrogen and syntrophic hypotheses

246 (Figure 4a and b)6; i.e. the archaeal ancestor of eukaryotes would have used fermentative pathways to

247 produce reduced substrates, which were syntrophically metabolized by the facultative anaerobic

248 alphaproteobacterial ancestor of mitochondria. In the following, we outline our scenario from the

249 perspective of the archaeal host and bacterial symbiont:

250

251 The archaeal host. The metabolic repertoire of Asgard archaea suggests that the archaeal ancestor of

252 eukaryotes had the potential to use organic substrates including fatty acids and alkanes or aromatic

253 compounds for growth (Figure 3a; Supp. Figures 3-13). The fate of reducing equivalents generated during

254 growth on these organics would differ depending on the metabolic repertoire of this organism, substrate

255 availability and the presence of available electron sinks. In anoxic environments, in which electron

28,42 256 acceptors other than CO2 are often scarce, the syntrophic degradation of organic matter is common .

257 usually represent the final reducers of biomass in such environments and outcompete

258 autotrophic acetogens. However, most acetogenic are able to metabolize a wide variety of

259 substrates, often in syntrophy with partner organisms, rather than by using CO2 as electron acceptor

260 through the WLP20. In line with this, we envision a scenario in which the last common ancestor of

261 Heimdallarchaea and eukaryotes degraded small organic compounds in syntrophy with one or more

262 bacterial partners (Figure 4d). For example, the loss of the WLP, a potential electron sink of

263 organoheterotrophically growing acetogens27, and gain of a membrane-bound hydrogenase (a scenario

11

264 observed in Heimdallarchaeota) would have provided a selective pressure for the maintenance of such a

265 relationship. Electrons produced by the archaeal host during the oxidation of organic substrates could

43,44 266 have been transferred to the partner in the form of H2, formate, acetate or via direct electron transfer .

267 Alternatively, this ancestor may have coupled the thermodynamically unfavorable anaerobic oxidation of

268 an alkane (e.g. butane) or related compounds through the shuttling of electrons to a bacterial partner

269 (similar to ANME consortia45) (Figure 4d). This possibility is inspired by the discovery of an “alkyl-coenzyme

270 M reductase” (encoded by mcr genes) in the Helarchaeota46, which is related to the butane-coenzyme M

271 reductases of Syntrophoarchaea, a group of that grows in syntrophy with Ca.

272 Desulfofervidus auxilii47. However, this latter possibility is currently not compatible with the placement of

273 eukaryotes sister to Heimdallarchaeota (Figure 3b), the sparse distribution of these mcr genes within the

274 currently sampled Asgard diversity46 and evidence for HGT of mcr gene homologs in other archaeal

275 lineages5.

276 The bacterial partner. A recent study indicates that mitochondria derive from a lineage that shares

277 a common ancestry with all known excluding Magnetococcales48. While ancestral

278 reconstructions are needed to infer the metabolic potential of the alphaproteobacterial ancestor of

279 mitochondria, it is notable that various extant members of this class encode group 1c and 1d [NiFe]-

280 hydrogenases that are known respiratory H2-uptake hydrogenases and thus allow hydrogenotrophic

29 281 growth . Some anaerobically functioning mitochondria and related of eukaryotes produce H2

282 via a [FeFe]-hydrogenases, the origin of which is still debated49. In particular, the vast majority of

283 alphaproteobacteria do not encode an [FeFe]-hydrogenase29, and the few representatives that do are not

284 from mitochondrial sister groups48,50. Furthermore, robust phylogenetic analyses of these proteins fail to

285 support an alphaproteobacterial origin of these proteins in eukaryotes49 suggesting that the

286 alphaproteobacterial endosymbiont may have oxidized, rather than produced H2. Thus, hydrogen-

287 evolving [FeFe]-hydrogenases and related enzymes, which are the defining feature of mitochondria-

12

288 derived organelles in anaerobic eukaryotes, may have been acquired horizontally from bacterial sources

289 other than the alphaproteobacterial endosymbiont subsequent to eukaryogenesis (Figure 4d).

290 Alphaproteobacteria have been shown to engage in syntrophic interactions within microbial consortia51

291 and to be able to accept electrons from a variety of donors including from extracellular electrons52. We

292 speculate that the alphaproteobacterial ancestor of mitochondria may have served as electron sink for

293 the archaeal partner either under anoxic conditions (e.g. using fumarate or inorganic compounds such as

294 , or nitrate as electron acceptor) or micro-oxic conditions (using oxygen as electron acceptor).

295

296 Altogether, and in agreement with previous syntrophic models for eukaryogenesis, our scenario

297 is based on the observation that syntrophic interactions are widespread, in particular in anoxic

298 environments42, and examples include methane or butane oxidizing euryarchaeota, many of which grow

299 in microbial consortia and are obligately dependent on a partner organism45. While speculative, we

300 propose that the ESPs encoded by Asgard archaea2,4 may have aided the transition from a metabolic

301 syntrophy to a more intricate symbiosis. For example, the generation of -based cellular protrusions

302 could have led to a tighter interaction between the partners allowing for more efficient metabolic

303 coupling, with the bacterial symbiont(s) eventually being effectively engulfed by the host cell. The origin

304 of the defining features of eukaryotic cells including the nucleus and the relative timing of their emergence

305 remain to be established53. However, it is possible that the acquisition of mitochondria was an

306 intermediate event that occurred after the invention of a basic intermembrane-system and actin

307 but predated the origin of the nucleus54. Furthermore, it remains an open question whether

308 membrane could have been of mixed origin during the transition phase55 and were subsequently

309 replaced by bacterial fatty-acid-type lipids in the outer membrane, following endosymbiont gene

310 transfer56, gene acquisitions from other sources57,58 and differential gene loss. Bacterial lipids may have

311 had a selective advantage considering that genes for most metabolic enzymes, including respiratory

13

312 chains, were retained from the endosymbiont rather than the host19 and were likely functioning optimally

313 in bacterial membrane lipids.

314

315 Conclusions

316 Our analyses have revealed that Asgard archaea are metabolically versatile: while Lokiarchaeota may

317 dominantly grow as fermentative organoheterotrophs, Thorarchaeota may be able to derive energy from

318 hydrogen, formate and different organic substrates including fatty acids and hydrocarbons. The presence

319 of a reductive dehalogenase in members of these phyla suggests that they may also be able to use

320 organohalides as electron acceptors during respiration. Odinarchaeum seems to represent an

321 organoheterotrophic fermentative organism that can potentially recycle the ferredoxin reduced during

322 fermentation reactions using H+ as an electron acceptor. In contrast, at least some members of the

323 Heimdallarchaeota appear to have the ability to grow by anaerobic and/or aerobic respiration while

324 others are obligate anaerobes. Based on these analyses, we propose a refined symbiogenetic scenario for

325 the origin of eukaryotes that is based on the transfer of reducing equivalents from the archaeal host to a

326 bacterial partner. We acknowledge that while such a reversed electron flow is most easily reconcilable

327 with the gene set of the herein analyzed Asgard archaeal MAGs, alternative scenarios (Figure 4) cannot

328 be excluded. Subsequent metabolic reconstructions of additional members of this group and of the

329 alphaproteobacteria as well as the functional characterization of ESPs in extant members of the Asgard

330 archaea, together with insights into their cell biology, will be essential to further refine potential symbiotic

331 interactions that may have played a role in the process of eukaryogenesis.

332

333 Methods

334 Annotation.

14

335 The metagenomic bins of the different Asgard members discussed in the manuscript have been annotated

336 automatically and were published in4. For metabolic reconstructions all proteins were analyzed with

337 IPRscan (version: interproscan-5.22-61.0) to determine protein domain information (IPR domains and

338 PFAMs) as well as KO numbers. Additionally, proteins were queried against NCBI nonredundant (nr)

339 database (February 2017) using Diamond blast59 and top hits with and without Asgards were extracted.

340 information for the top blast hits was determined using blastdbcmd. Furthermore, arCOGs and

341 COGs were assigned based on the publicly available arCOGs from4,60. Proteins were also queried against

342 proteins in the Transporter Classification Database61 to identify potential homology to proteins associated

343 with transport functions. Finally, the putative large subunits of [NiFe]-hydrogenases present in Asgard

344 genomes (proteins containing PF00374/PF00346 domains) were extracted and classified using HydDB62,

345 as well as by phylogenetic analyses (see below). The presence of conserved N-and C-terminal C-xx-C motifs

346 characterising the large subunit of [NiFe]-hydrogenases was determined in Jalview. Finally all this data

347 was compiled in a tsv file containing relevant information from all these analyses was generated (Suppl.

348 Table 3). All proteins discussed throughout the manuscript are listed in Suppl. Tables 1-3. Please note that,

349 although the Asgard MAGs are of medium to high quality (Suppl. Figure 1) 63, we cannot currently exclude

350 that the absence of specific genes could be due to genome bin incompleteness.

351

352 Prediction of active enzymes and esterases.

353 Genes encoding for carbohydrate-active enzymes were searched using the Carbohydrate-Active enZYmes

354 (CAZYmes) database using the dbCAN webtool64. To search for peptidases, protein sequences were

355 blasted against the MEROPS peptidase database using blastp and an e-value cutoff of 1e-20 (database

356 downloaded June 201765. Esterases were determined by downloading the HMM profiles from the ESTHER

357 database and searching for positive hits against the Asgard protein sequences using hmmsearch and an

358 e-value cutoff of 1e-1020 (downloaded September 2017)66. The sequence with the best e-value was

15

359 selected in case of multiple hits for a given protein-coding sequence. Protein localization of all positive

360 hits was determined using the server-version of PSORT v3.0 (-a option for archaeal sequences)67.

361 Singletons, hits related to central metabolism (i.e. archaeal proteasome, precursor proteins) and

362 peptidases included in the ESTHER database (to avoid redundancy to the MEROPS database) were not

363 included for further comparisons.

364

365 Phylogenetic analyses of concatenated ribosomal proteins

366 In order to reconstruct a species tree including the Heimallarchaeota Chinese Sea genome B2-JM-08 (WGS

367 accession NJBF000000), we used BLASTp to identify orthologues of 56 ribosomal proteins, which we

368 added to the pre-existing alignments taken from4. Individual protein datasets were aligned using Mafft-

369 LINSi68 and ambiguously aligned positions were trimmed using BMGE (-m BLOSUM30)69. Maximum

370 likelihood (ML) individual phylogenies were reconstructed using IQ-tree v. 1.5.570 under the LG+C20+F+G

371 substitution model with 1000 ultrafast bootstraps and were manually inspected. Trimmed alignments

372 were concatenated into a supermatrix, and an additional dataset was generated by removing DPANN

373 homologues to test the impact of taxon sampling on phylogenetic reconstruction. For each of these

374 concatenated datasets, phylogenies were inferred using ML and Bayesian approaches. ML phylogenies

375 were reconstructed using IQ-tree under the LG+C60+F+G+PMSF model71 (Figure 3b and Suppl. File 1).

376 Statistical support for branches was calculated using 100 bootstraps replicated under the same model. To

377 test robustness of the placement of eukaryotes, the datasets were subjected to several treatments. For

378 the ‘full dataset’ and the ‘no DPANN dataset’ (i.e., 144 and 104 taxa, respectively), we tested the impact

379 of removing the 25% fastest-evolving sites, since it has been shown that in large-scale phylogenetic

380 analyses, these sites are often saturated with multiple substitutions and, as a result of model-

381 misspecification, can generate an artefactual topology 72. The corresponding ML trees were inferred as

382 described above (Suppl. Files 2 and 3). Bayesian phylogenies were reconstructed with Phylobayes73 under

16

383 the LG+GTR model for the ‘no DPANN’ dataset): four independent Markov chain Monte Carlo chains were

384 run for ~40 000 generations. After a burn-in of 20%, convergence was achieved for all chains (maxdiff <

385 0.104). This supermatrix was also recoded into 4 categories in order to ameliorate effects of model

386 misspecification and saturation74 and the corresponding phylogeny was reconstructed with Phylobayes

387 under the CAT+GTR model. Four independent Markov chain Monte Carlo chains were run for ~160 000

388 generations. After a burn-in of 20%, convergence was achieved for all chains (maxdiff < 0.103).

389

390 Phylogenetic analyses of individual protein sequences

391 Cytochrome c oxidase, subunit 1 (PF00115, Cox). All sequences in UniProtKB (http://www.uniprot.org)

392 assigned to PF00115 (967,991 sequences) were extracted from Archaea, Bacteria and eukaryotes

393 excluding Opisthokonta (929,952 sequences), from which only reviewed sequences assigned to PF00115

394 were extracted. This set of sequences includes the distantly related Nitric oxide reductase family.

395 Backbone sequences were filtered by identity using cd-hit75 (cutoff of 65-70 % for eukaryotes and bacteria

396 and 85% for archaea) and by length (cutoff 400 AA). Alignments were iteratively refined by removing long-

397 branches and poorly aligned sequences after inspection of initial alignments (using hmm-align)76 and

398 phylogenies (using FastTree, LG model)77. In addition, Cox homologs reviewed in swissprot and predicted

399 cytochrome c oxidase, subunit 1 homologs of Asgard archaea (only in Heimdallarchaeota) and some

400 additional archaea not present in uniprot were added to this dataset and datasets were filtered again

401 using cd-hit (cutoff 85%). The final alignment of 443 positions is based on an hmm-alignment in which

402 sequences were aligned to hmm-profile of PF00115 using the --trim flag to only keep the conserved Cox1

403 domain. Alignments were trimmed using trimAL78 with the gappyout option and phylogenetic analyses

404 were performed using IQ-tree70 using under the LG+C20 model with ultrafast bootstraps79.

405

17

406 [NiFe]-Hydrogenase, large subunit. Backbone sequences for classifying [NiFe]-hydrogenase were based on

407 HydDB29. To reduce sampling size, the backbone dataset was filtered using cd-hit75 with a sequence

408 identity cutoff of 90% before concatenation with put. large subunit [NiFe]-hydrogenases of Asgard

409 archaea. HeimC3_03700 and HeimC2_23100 were removed from the dataset to prevent potential LBA

410 artifacts as these sequences represented very long branches and have extremely low similarity to known

411 homologs (<30%). They may represent previously uncharacterized groups that should be investigated in

412 the future. All sequences for group 3 and group 4 large subunit [NiFe]-hydrogenase were aligned using

413 Mafft-LINSi68 and trimmed with BMGE (using an entropy score of 0.55 and 0.65, respectively)69. Maximum

414 likelihood phylogenetic analyses were performed using IQ-tree70 with the best-fit model according to BIC

415 LG+C60+R+F (group 4) and LG+C50+R+F (group 3)). Support values were estimated using SH-like

416 approximate likelihood ratio test80 and ultrafast bootstraps79, respectively.

417

418 RuBisCO. All proteins of Asgard archaea assigned to orthologues assigned to COG01850 as well as

419 additional homologs of recently sequenced archaeal genomes were extracted and aligned with a

420 representative set of RuBisCO homologs, (including Family I through Family IV) that was based on a

421 backbone dataset kindly provided by Anantharaman81. Sequences were aligned using Mafft-LINSi68 and

422 trimmed with BMGE (entropy score set to 0.55)69 and the final alignment of 359 aligned positions was

423 subsequently subjected to maximum likelihood analyses using IQ-tree (using the LG+C60+R+F model

424 chosen based on BIC score)70. The presence of conserved amino acids was investigated in Jalview based

425 on residues given in82. Support values were estimated using SH-like approximate likelihood ratio test80

426 and ultrafast bootstraps79, respectively.

427

428 Acetyl-CoA synthase/CO dehydrogenase. Protein sequences with PF03598 encoding the key subunit of

429 Acetyl-CoA synthase/CO dehydrogenase, CdhC (also referred to as CODH/acetyl-CoA synthase beta

18

430 subunit or alpha subunit of ACS/CODH) were extracted from UniProtKB (http://www.uniprot.org) (which

431 included Asgard archaeal homologs) and filtered based on a length cutoff of 400 AA (except for the partial

432 homolog of Lokiarchaeaum) and a sequence identity of 95% using cd-hit75. Furthermore, sequences

433 lacking taxonomy information (labeled as uncultured) and poorly aligned sequences that likely represent

434 distant paralogs were removed. Final phylogenies are based on Mafft-LINSi alignments68 upon trimming

435 with TrimAL 5078, which were subjected to maximum likelihood analyses using IQ-tree70 with LG+C20+F+R.

436 Bacterial cdhC homologs as well as few archaeal homologs that branch within bacteria encode an N-

437 terminal domain absent in the other canonical archaeal homologs (length ca, 250AA). The N-terminal

438 domain was retained upon trimming with TrimAL 50% and yielded longer alignments (712AA) as

439 compared to BMGE. Support values were estimated using SH-like approximate likelihood ratio test80 and

440 ultrafast bootstraps79, respectively.

441

442 Beta-oxidation enzymes. Unless otherwise stated, the top 1000 bacterial, archaeal and eukaryotic

443 sequences were retrieved from Genbank non-redundant database (nr) using each Asgard sequence as a

444 query. These datasets were combined and reduced using cd-hit75 using taxonomy specific sequence

445 identity reduction thresholds for opisthokonts (50%), archaeplastids (50%), bacteria (80%) and archaea

446 (80%). Initial alignments were made with hmmalign using PFAM domains as indicated and trimmed using

447 BMGE69 (-h 0.6 and -m BLOSUM30). Initial trees were generated with Fasttree77. The datasets were

448 further reduced by manual inspection of the trees by removing well supported distant to the Asgard

449 sequences and selecting representative taxa. For example, well supported clades composed of dozens of

450 clostridiales were reduced to between 5-10 diverse representatives. This process was repeated iteratively

451 until there were less than 1000 taxa at which point alignments were generated using mafft-LINSi68 and

452 trees estimated with IQ-tree70 (LG+C20+G+F) with 1000 ultrafast bootstrap replicates79.

19

453 Acyl-CoA dehydrogenase (ACAD, COG1960) sequences were retrieved from83 and used for

454 identifying subcategories of ACAD enzymes and combined with the sequences retrieved from nr

455 (described above). Initial alignments were made as described above using hmmalign with profile

456 alignments for PF02771.14 PF00441.22 and PF02770.17 and concatenated. Acetoacetyl-CoA

457 acyltransferases (COG0183) sequences were retrieved from84 and nr as described above. Initial alignments

458 were generated as described above using the PF00501.26 hmm profile.

459 Alignments for the following proteins were generated as described above using the indicated

460 PFAM hmm profile: AMP-dependent synthetase and (COG0318; PF00501.26), enoyl-CoA hydratase

461 (COG1024; PF00378.18), 3-hydroxy-acyl-CoA dehydrogenase (COG1250; PF00725.20, PF02737.16). Initial

462 alignments of alternative acyl-CoA dehydrogenase (COG2368) were generated using Mafft-LINSi68.

463

464 Pyruvate-Formate superfamily. All sequence assigned to IPR004184 (Pyruvate formate lyase

465 domain) were downloaded from UniProtKB (http://www.uniprot.org) and filtered by length (only archaeal

466 and bacterial sequences between 700 AA and 950 AA were kept). Bacterial homologs were additionally

467 filtered by identity using cd-hit75 with a cutoff of 75%. Note: all homologs of Asgard archaea were kept.

468 Sequences of poor quality as well as poorly aligned sequences were removed upon inspection of initial

469 Mafft-LINSi alignments. The final set of sequences was re-aligned using Mafft-LINSi68 and trimmed with

470 BMGE69 (BLOSUM30, entropy: 0.55) leaving 336 sites, which were subjected to Maximum Likelihood

471 phylogenetic analyses using IQ-tree70 (LG+C20+F+R). Support values were estimated using SH-like

472 approximate likelihood ratio test80 and ultrafast bootstraps79, respectively.

473

474 Nitrate Reductase subunit alpha (NarG). All sequences with similarity to InterPro domain IPR006468

475 (Nitrate reductase, alpha subunit) were downloaded from UniProtKB (http://www.uniprot.org). Note: this

476 IPR domain only includes bacterial-type Nitrate reductases since the archaeal-type NarG are only distantly

20

477 related (e.g. 24% identity and 56% sequence coverage between the bacterial- and archaeal-type

478 (WP_097297416.1, WP_097297472.1) NarG of Methanoperedens sp. BLZ1. Sequences were filtered by

479 length (most sequences between 1300 and 1100 AA, which were thus used as lower and upper thresholds,

480 respectively) and identity (using cd-hit75 with a threshold of 75%) to decrease the amount of sequences

481 while keeping taxonomic representation. Subsequently, sequences were aligned with Mafft-LINSi68, and

482 trimmed using BMGE69 (BLOSUM30, entropy 0.55). The final alignment of 1071 positions was subjected

483 to maximum likelihood phylogenetic analyses using IQ-tree70 (LG+C20+F+R). Support values were

484 estimated using SH-like approximate likelihood ratio test80 and ultrafast bootstraps79, respectively.

485

486 Reductive dehalogenase. Archaeal proteins assigned to IPR028894 and sequences from marine

487 metagenomes (UniProtKB; http://www.uniprot.org) as well as all Asgard homologs assigned to the more

488 broadly defined COG01600 (includes both reductive dehalogenases and epoxyqueuosine reductases)

489 were added to a bacterial backbone dataset kindly provided by Laura Hug and published in85. The even

490 more specific protein domain IPR012832 defining reductive dehalogenases is present in a subset of these

491 Asgard sequences and in the sequences of four Euryarchaeota (2 sequences from Theionarchaea,

492 and Methanohaloarchaeum thermophilum). This set of sequences was pre-aligned using

493 Mafft-LINSi68 to identify and remove poorly aligned or partial sequences (*Note, one lokiarchaeal protein

494 as well as the homolog of Methanohalarchaeum thermophilum was removed in this way). Subsequently,

495 sequences were realigned using Mafft-LINSi68, trimmed with BMGE69 using the BLOSUM30 matrix

496 combined with an entropy of 0.6 leaving 160 sites that subjected to maximum likelihood analyses using

497 IQ-tree70 (using best-fit model according to BIC determined by IQ-tree: LG+C40+R+F). Support values were

498 estimated using SH-like approximate likelihood ratio test80 and ultrafast bootstraps79, respectively. The

499 tree was rooted using midpoint rooting. However, most likely all sequences shaded in light green in Suppl.

500 Figure 15, do not represent bona fide reductive dehalogenases, which is supported by the absence of the

21

501 characteristic domain IPR012832. The substrates that can be degraded by some of the characterized

502 reductive dehalogenases are indicated on the tree based on86,87.

503 All trees were visualized using FigTree (http://tree.bio.ed.ac.uk/publications/) and modified/annotated

504 with adobe illustrator to increase readability.

505

506 Data availability

507 The genomes of the herein analysed Asgard archaea have been made publicly available on NCBI

508 previously2,4. Detailed annotations of the metabolic repertoire are provided in Suppl. Tables 1-3

509 accompanying this manuscript. Raw data files are made available via figshare under the following link:

510 https://figshare.com/s/5f153d1dcacadd3b3ed6.

511

512 Code availability

513 Small custom scripts used for genome annotation and phylogenetic analyses are made available on

514 figshare and can be accessed under the following link: https://figshare.com/s/5f153d1dcacadd3b3ed6.

515

516 Acknowledgements

517 This work was supported by grants of the European Research Council (ERC Starting grant 310039-

518 PUZZLE_CELL to T.J.G.E.), the Swedish Foundation for Strategic Research (SSF-FFL5 to T.J.G.E.), the

519 Swedish Research Council (VR grant 2015-04959 to T.J.G.E. and VR starting grant 2016-03559 to A.S.), the

520 NWO-I foundation of the Netherlands Organisation for Scientific Research (WISE fellowship to A.S.), the

521 European Commission (Marie Curie IEF European grants 625521 to A.S. and 704263 to L.E.), the Wenner-

522 Gren Foundations in Stockholm (UPD2016-0072 to J.L.), the European Molecular Biology Organization

523 (EMBO long-term fellowship ALTF-997-2015 to C.W.S), the Natural Sciences and Engineering Research

22

524 Council of Canada (C.W.S), the Australian Research Council (DE170100310 and DP180101762 to C.G) and

525 the National Science Foundation (DEB: Systematics and Biodiversity Sciences; award number 1737298 to

526 B.J.B.).

527 We would like to thank Kasia Zaremba-Niedzwiedzka and Jimmy Saw for reconstruction some of these

528 genomes and helpful discussions. We would also like to acknowledge Steffen L. Jørgensen, the chief

529 scientist R. B. Pedersen, the scientific party and the entire crew on board the Norwegian research vessel

530 G.O. Sars during the summer 2010 expedition, which allowed us access to samples from Loki’s Castle.

531 Finally, we thank Pierre Offre for discussions on metabolic inferences.

532

533 Author Contributions

534 A.S. and T.J.G.E. conceived the study. A.S., C.S., E.F.C., J.L., C. G., B.J.B. and N.D. analysed genomic data.

535 A.S., C.S. and L.E. performed phylogenetic analyses. A.S. and T.J.G.E. wrote the manuscript with input from

536 all authors and A.S., C.S. and N.D wrote the Supplementary Information. All documents were edited and

537 approved by all authors.

538

539 Author Information

540 Reprints and permissions information is available at www.nature.com/reprints. Readers are welcome to

541 comment on the online version of the paper. Correspondence and requests for materials should be

542 addressed to A.S. ([email protected]) or T.J.G.E. ([email protected])

543 544 Competing interests

545 The authors declare no competing financial interests.

23

546 References

547 1 Hug, L. A. et al. A new view of the . Nature Microbiology 1, doi:Artn 16048 548 10.1038/Nmicrobiol.2016.48 (2016).

549 2 Spang, A. et al. Complex archaea that bridge the gap between and eukaryotes. Nature 550 521, 173-+, doi:10.1038/nature14447 (2015).

551 3 Seitz, K. W., Lazar, C. S., Hinrichs, K. U., Teske, A. P. & Baker, B. J. Genomic reconstruction of a 552 novel, deeply branched sediment archaeal with pathways for acetogenesis and reduction. 553 The ISME Journal, doi:10.1038/ismej.2015.233 (2016).

554 4 Zaremba-Niedzwiedzka, K. et al. Asgard archaea illuminate the origin of eukaryotic cellular 555 complexity. Nature 541, 353-358, doi:10.1038/nature21031 (2017).

556 5 Spang, A., Caceres, E. F. & Ettema, T. J. G. Genomic exploration of the diversity, ecology, and 557 evolution of the archaeal domain of life. Science 357, doi:10.1126/science.aaf3883 558 10.1126/science.aaf3883. (2017).

559 6 Lopez-Garcia, P. & Moreira, D. Open Questions on the Origin of Eukaryotes. Trends Ecol Evol 30, 560 697-708, doi:10.1016/j.tree.2015.09.005 (2015).

561 7 Guy, L., Saw, J. H. & Ettema, T. J. The archaeal legacy of eukaryotes: a phylogenomic perspective. 562 Cold Spring Harb Perspect Biol 6, a016022, doi:10.1101/cshperspect.a016022 (2014).

563 8 Martin, W. F., Garg, S. & Zimorski, V. Endosymbiotic theories for origin. Philos Trans R 564 Soc Lond B Biol Sci 370, 20140330-20140330, doi:10.1098/rstb.2014.0330 565 10.1098/rstb.2014.0330. (2015).

566 9 Da Cunha, V., Gaia, M., Gadelle, D., Nasir, A. & Forterre, P. Lokiarchaea are close relatives of 567 Euryarchaeota, not bridging the gap between prokaryotes and eukaryotes. PLoS Genet 13, e1006810, 568 doi:10.1371/journal.pgen.1006810 (2017).

569 10 Spang, A. et al. Asgard archaea are the closest prokaryotic relatives of eukaryotes. PLoS Genet 14, 570 e1007080-e1007080, doi:10.1371/journal.pgen.1007080 571 10.1371/journal.pgen.1007080. eCollection 2018 Mar. (2018).

572 11 Narrowe, A. B. et al. Complex evolutionary history of translation Elongation Factor 2 and 573 diphthamide biosynthesis in Archaea and parabasalids. Genome Biol Evol, doi:10.1093/gbe/evy154 574 (2018).

575 12 Klinger, C. M., Spang, A., Dacks, J. B. & Ettema, T. J. G. Tracing the Archaeal Origins of Eukaryotic 576 Membrane-Trafficking System Building Blocks. Molecular Biology and Evolution 33, 1528-1541, 577 doi:10.1093/molbev/msw034 (2016).

578 13 Sousa, F. L., Neukirchen, S., Allen, J. F., Lane, N. & Martin, W. F. Lokiarchaeon is hydrogen 579 dependent. Nature Microbiology 1, doi:Artn 16034 10.1038/Nmicrobiol.2016.34 (2016).

24

580 14 Martin, W. F., Tielens, A. G. M., Mentel, M., Garg, S. G. & Gould, S. B. The Physiology of 581 in the Context of Mitochondrial Origin. Microbiology and Molecular Biology Reviews : MMBR 582 81, doi:10.1128/MMBR.00008-17 583 10.1128/MMBR.00008-17. Print 2017 Sep. (2017).

584 15 Zachar, I., Szilagyi, A., Szamado, S. & Szathmary, E. Farming the mitochondrial ancestor as a model 585 of endosymbiotic establishment by natural selection. Proceedings of the National Academy of Sciences of 586 the United States of America 115, E1504-E1510, doi:10.1073/pnas.1718707115 587 10.1073/pnas.1718707115. Epub 2018 Jan 30. (2018).

588 16 Speijer, D. Alternating terminal electron-acceptors at the basis of : How oxygen 589 ignited eukaryotic evolution. Bioessays 39, doi:10.1002/bies.201600174 (2017).

590 17 Koonin, E. V. Origin of eukaryotes from within archaea, archaeal eukaryome and bursts of gene 591 gain: eukaryogenesis just made easier? Philos Trans R Soc Lond B Biol Sci 370, 20140333-20140333, 592 doi:10.1098/rstb.2014.0333 (2015).

593 18 Williams, T. A., Foster, P. G., Cox, C. J. & Embley, T. M. An archaeal origin of eukaryotes supports 594 only two primary domains of life. Nature 504, 231-236, doi:10.1038/nature12779 (2013).

595 19 Lopez-Garcia, P., Eme, L. & Moreira, D. Symbiosis in eukaryotic evolution. J Theor Biol 434, 20-33, 596 doi:10.1016/j.jtbi.2017.02.031 597 10.1016/j.jtbi.2017.02.031. Epub 2017 Feb 28. (2017).

598 20 Ragsdale, S. W. & Pierce, E. Acetogenesis and the Wood-Ljungdahl pathway of CO(2) fixation. 599 Biochim Biophys Acta 1784, 1873-1898, doi:10.1016/j.bbapap.2008.08.012 (2008).

600 21 Schuchmann, K. & Muller, V. Autotrophy at the thermodynamic limit of life: a model for energy 601 conservation in acetogenic bacteria. Nat Rev Microbiol 12, 809-821, doi:10.1038/nrmicro3365 (2014).

602 22 Adam, P. S., Borrel, G., Brochier-Armanet, C. & Gribaldo, S. The growing tree of Archaea: new 603 perspectives on their diversity, evolution and ecology. ISME J 11, 2407-2425, doi:10.1038/ismej.2017.122 604 10.1038/ismej.2017.122. Epub 2017 Aug 4. (2017).

605 23 Liu, Y. et al. Comparative genomic inference suggests mixotrophic lifestyle for Thorarchaeota. 606 ISME J 12, 1021-1031, doi:10.1038/s41396-018-0060-x (2018).

607 24 Wagner, A. et al. Mechanisms of gene flow in archaea. Nat Rev Microbiol, 608 doi:10.1038/nrmicro.2017.41 609 10.1038/nrmicro.2017.41. (2017).

610 25 Buckel, W. & Thauer, R. K. Energy conservation via electron bifurcating ferredoxin reduction and 611 proton/Na(+) translocating ferredoxin oxidation. Biochim Biophys Acta 1827, 94-113, 612 doi:10.1016/j.bbabio.2012.07.002 613 10.1016/j.bbabio.2012.07.002. Epub 2012 Jul 16. (2013).

614 26 Bryant, F. O. & Adams, M. W. Characterization of hydrogenase from the hyperthermophilic 615 archaebacterium, . J Biol Chem 264, 5070-5079 (1989).

25

616 27 Schuchmann, K. & Muller, V. Energetics and application of heterotrophy in acetogenic bacteria. 617 Appl Environ Microbiol, doi:10.1128/AEM.00882-16 (2016).

618 28 Stams, A. J. & Plugge, C. M. Electron transfer in syntrophic communities of anaerobic bacteria and 619 archaea. Nat Rev Microbiol 7, 568-577, doi:10.1038/nrmicro2166 (2009).

620 29 Greening, C. et al. Genomic and metagenomic surveys of hydrogenase distribution indicate H2 is 621 a widely utilised energy source for microbial growth and survival. The ISME Journal 10, 761-777, 622 doi:10.1038/ismej.2015.153 623 10.1038/ismej.2015.153. Epub 2015 Sep 25. (2016).

624 30 Yu, H. et al. Structure of an Ancient Respiratory System. Cell 173, 1636-1649 e1616, 625 doi:10.1016/j.cell.2018.03.071 (2018).

626 31 Schut, G. J., Boyd, E. S., Peters, J. W. & Adams, M. W. The modular respiratory complexes involved 627 in hydrogen and sulfur metabolism by heterotrophic hyperthermophilic archaea and their evolutionary 628 implications. FEMS Microbiol Rev 37, 182-203, doi:10.1111/j.1574-6976.2012.00346.x (2013).

629 32 Tully, B. J., Graham, E. D. & Heidelberg, J. F. The reconstruction of 2,631 draft metagenome- 630 assembled genomes from the global oceans. Sci Data 5, 170203-170203, doi:10.1038/sdata.2017.203 631 10.1038/sdata.2017.203. (2018).

632 33 Adam, P. S., Borrel, G. & Gribaldo, S. Evolutionary history of carbon monoxide 633 dehydrogenase/acetyl-CoA synthase, one of the oldest enzymatic complexes. Proceedings of the National 634 Academy of Sciences of the United States of America 115, E1166-E1173, doi:10.1073/pnas.1716667115 635 10.1073/pnas.1716667115. Epub 2018 Jan 22. (2018).

636 34 Kono, T. et al. A RuBisCO-mediated carbon metabolic pathway in methanogenic archaea. Nat 637 Commun 8, 14007-14007, doi:10.1038/ncomms14007 638 10.1038/ncomms14007. (2017).

639 35 Lang, B. F., Gray, M. W. & Burger, G. Mitochondrial genome evolution and the origin of 640 eukaryotes. Annu Rev Genet 33, 351-397, doi:10.1146/annurev.genet.33.1.351 641 10.1146/annurev.genet.33.1.351. (1999).

642 36 Arshad, A. et al. A -Based Metabolic Model of Nitrate-Dependent Anaerobic 643 Oxidation of Methane by Methanoperedens-Like Archaea. Front Microbiol 6, 1423-1423, 644 doi:10.3389/fmicb.2015.01423 645 10.3389/fmicb.2015.01423. eCollection 2015. (2015).

646 37 Williams, T. A. et al. Integrative modeling of gene and genome evolution roots the archaeal tree 647 of life. PNAS 114(23):E4602-E4611 (2017).

648 38 Zachar, I. & Szathmary, E. Breath-giving cooperation: critical review of origin of mitochondria 649 hypotheses : Major unanswered questions point to the importance of early ecology. Biol Direct 12, 19-19, 650 doi:10.1186/s13062-017-0190-5 651 10.1186/s13062-017-0190-5. (2017).

26

652 39 Moreira, D. & Lopez-Garcia, P. Symbiosis between methanogenic archaea and delta- 653 as the origin of eukaryotes: the syntrophic hypothesis. J Mol Evol 47, 517-530 (1998).

654 40 López-García, P. & Moreira, D. Selective forces for the origin of the eukaryotic nucleus. BioEssays: 655 News and Reviews in Molecular, Cellular and Developmental Biology 28, 525-533, doi:10.1002/bies.20413 656 (2006).

657 41 Martin, W. & Muller, M. The hydrogen hypothesis for the first eukaryote. Nature 392, 37-41, 658 doi:10.1038/32096 659 10.1038/32096. (1998).

660 42 Sieber, J. R., McInerney, M. J. & Gunsalus, R. P. Genomic insights into syntrophy: the paradigm for 661 anaerobic metabolic cooperation. Annu Rev Microbiol 66, 429-452, doi:10.1146/annurev-micro-090110- 662 102844 663 10.1146/annurev-micro-090110-102844. Epub 2012 Jul 9. (2012).

664 43 McGlynn, S. E., Chadwick, G. L., Kempes, C. P. & Orphan, V. J. Single cell activity reveals direct 665 electron transfer in methanotrophic consortia. Nature 526, 531-535, doi:10.1038/nature15512 (2015).

666 44 Wegener, G., Krukenberg, V., Riedel, D., Tegetmeyer, H. E. & Boetius, A. Intercellular wiring 667 enables electron transfer between methanotrophic archaea and bacteria. Nature 526, 587-590, 668 doi:10.1038/nature15733 (2015).

669 45 Knittel, K. & Boetius, A. Anaerobic oxidation of methane: progress with an unknown process. 670 Annual review of microbiology 63, 311-334, doi:10.1146/annurev.micro.61.080706.093130 (2009).

671 46 Seitz, K. W. et al. New Asgard archaea capable of anaerobic hydrocarbon cycling. bioRxiv (2019).

672 47 Laso-Perez, R. et al. Thermophilic archaea activate butane via alkyl-coenzyme M formation. 673 Nature 539, 396-401, doi:10.1038/nature20152 (2016).

674 48 Martijn, J., Vosseberg, J., Guy, L., Offre, P. & Ettema, T. J. G. Deep mitochondrial origin outside the 675 sampled alphaproteobacteria. Nature 557, 101-105, doi:10.1038/s41586-018-0059-5 (2018).

676 49 Leger, M. M., Eme, L., Stairs, C. W. & Roger, A. J. Demystifying Eukaryote Lateral Gene Transfer 677 (Response to Martin 2017 DOI: 10.1002/bies.201700115). Bioessays 40, e1700242, 678 doi:10.1002/bies.201700242 (2018).

679 50 Stairs, C. W. et al. Microbial eukaryotes have adapted to hypoxia by horizontal acquisitions of a 680 gene involved in rhodoquinone biosynthesis. Elife 7, doi:10.7554/eLife.34292 (2018).

681 51 Norlund, K. L. et al. Microbial architecture of environmental sulfur processes: a novel syntrophic 682 sulfur-metabolizing consortia. Environ Sci Technol 43, 8781-8786, doi:10.1021/es803616k (2009).

683 52 Bose, A., Gardel, E. J., Vidoudez, C., Parra, E. A. & Girguis, P. R. Electron uptake by iron-oxidizing 684 phototrophic bacteria. Nat Commun 5, 3391, doi:10.1038/ncomms4391 (2014).

685 53 Eme, L., Spang, A., Lombard, J., Stairs, C. W. & Ettema, T. J. G. Archaea and the origin of 686 eukaryotes. Nat Rev Microbiol 16, 120-120, doi:10.1038/nrmicro.2017.154

27

687 10.1038/nrmicro.2017.154. Epub 2017 Nov 27. (2018).

688 54 Ettema, T. J. Evolution: Mitochondria in the second act. Nature 531, 39-40, 689 doi:10.1038/nature16876 (2016).

690 55 Caforio, A. et al. Converting into an archaebacterium with a hybrid heterochiral 691 membrane. Proceedings of the National Academy of Sciences of the United States of America 115, 3704- 692 3709, doi:10.1073/pnas.1721604115 693 10.1073/pnas.1721604115. Epub 2018 Mar 19. (2018).

694 56 Martin, W. et al. Gene transfer to the nucleus and the evolution of . Nature 393, 162- 695 165, doi:10.1038/30234 696 10.1038/30234. (1998).

697 57 Pittis, A. A. & Gabaldon, T. Late acquisition of mitochondria by a host with chimaeric prokaryotic 698 ancestry. Nature 531, 101-104, doi:10.1038/nature16941 699 10.1038/nature16941. Epub 2016 Feb 3. (2016).

700 58 Roger, A. J., Munoz-Gomez, S. A. & Kamikawa, R. The Origin and Diversification of Mitochondria. 701 Curr Biol 27, R1177-R1192, doi:10.1016/j.cub.2017.09.015 702 10.1016/j.cub.2017.09.015. (2017).

703 59 Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat 704 Methods 12, 59-60, doi:10.1038/nmeth.3176 705 10.1038/nmeth.3176. Epub 2014 Nov 17. (2015).

706 60 Makarova, K. S., Wolf, Y. I. & Koonin, E. V. Archaeal Clusters of Orthologous Genes (arCOGs): An 707 Update and Application for Analysis of Shared Features between , , and 708 . Life (Basel) 5, 818-840, doi:10.3390/life5010818 709 10.3390/life5010818. (2015).

710 61 Saier Jr, M. H., Tran, C. V. & Barabote, R. D. TCDB: the Transporter Classification Database for 711 membrane transport protein analyses and information. Nucleic Acids Research 34, D181-186, 712 doi:10.1093/nar/gkj001 (2006).

713 62 Sondergaard, D., Pedersen, C. N. & Greening, C. HydDB: A web tool for hydrogenase classification 714 and analysis. Sci Rep 6, 34212-34212, doi:10.1038/srep34212 715 10.1038/srep34212. (2016).

716 63 Bowers, R. M. et al. Minimum information about a single amplified genome (MISAG) and a 717 metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol 35, 725-731, 718 doi:10.1038/nbt.3893 (2017).

719 64 Yin, Y. et al. dbCAN: a web resource for automated carbohydrate-active enzyme annotation. 720 Nucleic Acids Research 40, W445-451, doi:10.1093/nar/gks479 721 10.1093/nar/gks479. Epub 2012 May 29. (2012).

722 65 Rawlings, N. D., Barrett, A. J. & Finn, R. Twenty years of the MEROPS database of proteolytic 723 enzymes, their substrates and inhibitors. Nucleic Acids Research 44, D343-350, doi:10.1093/nar/gkv1118

28

724 10.1093/nar/gkv1118. Epub 2015 Nov 2. (2016).

725 66 Lenfant, N. et al. ESTHER, the database of the alpha/beta- fold superfamily of proteins: 726 tools to explore diversity of functions. Nucleic Acids Research 41, D423-429, doi:10.1093/nar/gks1154 727 10.1093/nar/gks1154. Epub 2012 Nov 27. (2013).

728 67 Yu, N. Y. et al. PSORTb 3.0: improved protein subcellular localization prediction with refined 729 localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics 26, 1608-1615, 730 doi:10.1093/bioinformatics/btq249 731 10.1093/bioinformatics/btq249. Epub 2010 May 13. (2010).

732 68 Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence 733 alignment based on fast Fourier transform. Nucleic Acids Research 30, 3059-3066 (2002).

734 69 Criscuolo, A. & Gribaldo, S. BMGE (Block Mapping and Gathering with Entropy): a new software 735 for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol 10, 736 210-210, doi:10.1186/1471-2148-10-210 737 10.1186/1471-2148-10-210. (2010).

738 70 Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic 739 algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32, 268-274, 740 doi:10.1093/molbev/msu300 (2015).

741 71 Wang, H. C., Minh, B. Q., Susko, E. & Roger, A. J. Modeling Site Heterogeneity with Posterior Mean 742 Site Frequency Profiles Accelerates Accurate Phylogenomic Estimation. Syst Biol 67, 216-235, 743 doi:10.1093/sysbio/syx068 (2018).

744 72 Kamikawa, R. et al. Parallel re-modeling of EF-1alpha function: divergent EF-1alpha genes co-occur 745 with EFL genes in diverse distantly related eukaryotes. BMC Evolutionary Biology 13, 131-131, 746 doi:10.1186/1471-2148-13-131 747 10.1186/1471-2148-13-131. (2013).

748 73 Lartillot, N., Rodrigue, N., Stubbs, D. & Richer, J. PhyloBayes MPI: phylogenetic reconstruction 749 with infinite mixtures of profiles in a parallel environment. Syst Biol 62, 611-615, 750 doi:10.1093/sysbio/syt022 751 10.1093/sysbio/syt022. Epub 2013 Apr 5. (2013).

752 74 Susko, E. & Roger, A. J. On reduced alphabets for phylogenetic inference. Molecular 753 Biology and Evolution 24, 2139-2150, doi:10.1093/molbev/msm144 (2007).

754 75 Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation 755 sequencing data. Bioinformatics 28, 3150-3152, doi:10.1093/bioinformatics/bts565 756 10.1093/bioinformatics/bts565. Epub 2012 Oct 11. (2012).

757 76 Eddy, S. R. Accelerated Profile HMM Searches. PLoS Comput Biol 7, e1002195-e1002195, 758 doi:10.1371/journal.pcbi.1002195 759 10.1371/journal.pcbi.1002195. Epub 2011 Oct 20. (2011).

29

760 77 Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2--approximately maximum-likelihood trees for 761 large alignments. PLoS ONE 5, e9490-e9490, doi:10.1371/journal.pone.0009490 762 10.1371/journal.pone.0009490. (2010).

763 78 Capella-Gutierrez, S. et al. trimAl: a tool for automated alignment trimming in large-scale 764 phylogenetic analyses. Bioinformatics 25, 1972-1973, doi:10.1093/bioinformatics/btp348 (2009).

765 79 Minh, B. Q., Nguyen, M. A. & von Haeseler, A. Ultrafast approximation for phylogenetic bootstrap. 766 Mol Biol Evol 30, 1188-1195, doi:10.1093/molbev/mst024 767 10.1093/molbev/mst024. Epub 2013 Feb 15. (2013).

768 80 Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: 769 assessing the performance of PhyML 3.0. Systematic Biology 59, 307-321, doi:10.1093/sysbio/syq010 770 (2010).

771 81 Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected 772 biogeochemical processes in an aquifer system. Nat Commun 7, 13219-13219, 773 doi:10.1038/ncomms13219 774 10.1038/ncomms13219. (2016).

775 82 Wrighton, K. C. et al. RubisCO of a nucleoside pathway known from Archaea is found in diverse 776 uncultivated phyla in bacteria. The ISME Journal 10, 2702-2714, doi:10.1038/ismej.2016.53 777 10.1038/ismej.2016.53. Epub 2016 May 3. (2016).

778 83 Swigonova, Z., Mohsen, A. W. & Vockley, J. Acyl-CoA dehydrogenases: Dynamic history of protein 779 family evolution. J Mol Evol 69, 176-193, doi:10.1007/s00239-009-9263-0 780 10.1007/s00239-009-9263-0. Epub 2009 Jul 29. (2009).

781 84 Dibrova, D. V., Galperin, M. Y. & Mulkidjanian, A. Y. Phylogenomic reconstruction of archaeal fatty 782 acid metabolism. Environ Microbiol 16, 907-918, doi:10.1111/1462-2920.12359 (2014).

783 85 Hug, L. A. et al. Overview of organohalide-respiring bacteria and a proposal for a classification 784 system for reductive dehalogenases. Philos Trans R Soc Lond B Biol Sci 368, 20120322-20120322, 785 doi:10.1098/rstb.2012.0322 786 10.1098/rstb.2012.0322. Print 2013 Apr 19. (2013).

787 86 Jugder, B. E., Ertan, H., Lee, M., Manefield, M. & Marquis, C. P. Reductive Dehalogenases Come of 788 Age in Biological Destruction of Organohalides. Trends Biotechnol 33, 595-610, 789 doi:10.1016/j.tibtech.2015.07.004 790 10.1016/j.tibtech.2015.07.004. (2015).

791 87 Neumann, A., Wohlfarth, G. & Diekert, G. Tetrachloroethene dehalogenase from Dehalospirillum 792 multivorans: cloning, sequencing of the encoding genes, and expression of the pceA gene in Escherichia 793 coli. J Bacteriol 180, 4140-4145 (1998).

794 88 Vignais, P. M. & Billoud, B. Occurrence, classification, and biological function of hydrogenases: an 795 overview. Chem Rev 107, 4206-4272, doi:10.1021/cr050196r (2007).

30

796 89 Rochette, N. C., Brochier-Armanet, C. & Gouy, M. Phylogenomic test of the hypotheses for the 797 evolutionary origin of eukaryotes. Mol Biol Evol 31, 832-845, doi:10.1093/molbev/mst272 798 10.1093/molbev/mst272. Epub 2014 Jan 7. (2014). 799

800

801

31

802 Figures

rD drD d

ase uctase, H uctase, H genase? genase? H+ H+ H+ H+ + H+ P synthase H+ P synthase H disulfide red AT T FQO ? FQO ? A NADH dehydrogen tero + T + T H E reductive dehaloheterodisulfide? H red E reductive dehalo he ? MQ MQ MQ MQ MQ MQ MQ ?(red) ?(red) ? ATP ATP (ox) ? ? ?(red) (ox) (red) + ETF ETFox Fd(ox) NAD ox ?(ox) ?(ox) ADP+Pi Fd ADP+Pi ETF (red) 3c [NiFe]- NADH ETFred Fd(ox) red FAD H 2 Fd(red) ? - + hydrogenase 3c [NiFe]- (ox) FADH H Archaeal FAD H2 Archaeal lipid ? - + hydrogenase ? HdrABC-linked (ox) FADH H biosynthesis (red) biosynthesis HdrABC-linked ?(red) Glucose NADPH H+ 3b [NiFe]- Glucose Ribose-5P hydrogenase Glycerol-1P + Glycerol-1P NADP H2 NADPH H 3b [NiFe]- nonox. PPP NADP H hydrogenase rev. RuMP Ribose-5P 2 Glycerone-P Glycerone-P - CO2 HCOO CO2 Fd - NADH red EMP NADH HCOO EMP Fd Formate Fdred ox F420ox Ribose-1,5-2P Fd NMP CO2 dehydro- ox Formate F420red F420ox genase CO2 CO2 CO2 NADH dehydro- F420red F420red genase CO2 CO2 F420

Gluconeogenesis red F420ox Gluconeogenesis F420ox F420red Beta- F420 Mevalonate F420red Mevalonate WLP ox Beta- WLP ETF Acetyl-CoA F420ox Acetyl-CoA oxidation red pathway ETFred pathway NADH NADH oxidation NADH NADH Fumarate Fumarate Succinate Acetyl-CoA Acetyl-CoA TCA TCA dehydrogenase MQ ADP+Pi MQ ADP+Pi Succinate H+ Succinate H+ ATP NADH ATP NADH 2Pi PPi + H2O 2Pi PPi + H2O Acetate Acetate Lokiarchaeota Thorarchaeota

se ase 2 type ]-hydrogena genase? H+/Na+ H+/Na+ ]-hydrogenase rogenase + + [NiFe + + H /Na H+ H /Na + + P synthase [NiFe + + P synthase H /Na Nitrate reductaseHeme/copper- + + T H /Na AT NADH dehydrogenType 4 Mrp-like antoporterFQO H /Na A H+/Na+ T ? BCP cytochrome/quinol oxidase [NiFe]-hyd E reductive dehaloH+ ? H+ Type 4 Mrp-like antoporterH+/Na+

MQ MQ MQ MQ H+ + ?(red) - H2O H2 H Fd(red) ? NO + H O ATP + H 2 2 ATP 2 ?(ox) + ? NAD ETF - + O 2P PP + H O Fd(ox) H Fd(red) Fd ox NO3 + H 2 i + i 2 H2 NADH (ox) ADP+P H ADP+Pi ETF i Fd red MQ H+ (red) Fd(ox) ? ? (ox) HdrD Archaeal lipid ?(red) Archaeal lipid biosynthesis biosynthesis NADPH H+ 3b [NiFe]- Glucose Fd Glucose (ox) hydrogenase 1 NADP H2 ox. PPP Fd(red) Glycerol-1P Ribose-5P FAD H 3c [NiFe]- Glycerol-1P 1 2 ? - + hydrogenase rev. RuMP (ox) FADH H rev. RuMP Ribose-5P ? HdrABC-linked Ribose-5P CO Glycerone-P (red) Glycerone-P 2 Fered Fe NADH/Fd(red) ox

EMP NADPH EMP NMP NMP F420red + CO2 CO2 CO2 NADPH H 3b [NiFe]- F420ox F420 NADP H hydrogenase red 2 Gluconeogenesis Gluconeogenesis F420 ADP+P WLP ox Mevalonate i ATP Mevalonate pathway Acetyl-CoA Acetate pathway Acetyl-CoA NADH NADH NADH Acetyl-CoA Fumarate ETFred Succinate ADP+P TCA TCA i dehydrogenase Beta- MQ Succinate H+ ATP oxidation 2P PP + H O NADH i i 2 NADH Acetate

Heimdallarchaeota Odinarchaeota

present partial/potentially present 803 absent

804

805 Figure 1. Metabolic potential of different Asgard phyla. Each of the arrows represent functions that were

806 assigned to predicted proteins encoded in the respective genomes (Lokiarchaeota (2 MAGs),

807 Thorarchaeota (3 MAGs), Heimdallarchaeota (3 MAGs), Odinarchaeota (1MAG)). The shading of the

808 arrows depends on the distribution of a specific function across the different members of a groups as

809 indicated in the figure panel. Notes: 1oxidative pentose phosphate pathway (PPP) in Heimdallarchaeum

810 LC2 and LC3 and reductive ribulose monophosphate (RuMP) pathway in Heimdallarchaeum AB125;

811 2location of of nitrate reductase in Asgard homologs likely in the cytoplasm (Suppl.

32

812 Information). Color code as indicated in the figure inlet is as follows: arrows in black indicate enzymatic

813 steps encoded by the respective organisms; grey arrows indicate enzymatic steps for which either only

814 putative enzymes could be identified or enzymatic steps that are only encoded in some of the

815 representatives of a given phylum; dashed grey arrows reveal enzymatic steps for which no (candidate)

816 enzymes could be found in any of the representatives of a particular phylum. The presence/function of

817 enzyme complexes surrounded by black dashed lines is tentative.

33

a A0A151BGK3 Ca. Bathyarchaeota archaeon B25 OLS32788 Ca. Heimdallarchaeota archaeon AB_125 (HeimAB125_05250: non-hydrogenase; CxxE/CxxD) KXH75610 Ca. Thorarchaeota archaeon SMTZ1_45 (AM325_04280: non-hyrogenase; CxxE/no) OLS29934 Ca. Thorarchaeota archaeon AB_25 (ThorAB25_13190: non-hydrogenase; CxxE/no) non-hydrogenase (nuoD) (Aigarchaeota, Bathyarchaeota, , MSBL1 archaea)* put. NuoD OLS28029 Ca. Heimdallarchaeota archaeon LC_2 (HeimC2_09340: non-hydrogenase; CxxE/CxxD) OLS21505 Ca. Heimdallarchaeota archaeon LC_2 (HeimC2_33370: non-hydrogenase; CxxE/CxxE) put. nuoD (Thalassoarchaea) put. nuoD (Altiarchaea, Micrarchaeota, Parvarchaeota) A0A1Q6DX86 Euryarchaeota archaeon HMET1 Group 4e (Bacteria, , Methanomassiliicoccales, Bathyarchaeota) Group 4e WP_013129492_1 Thermosphaera aggregans (NiFe_Group_4?) Group 4i (Methanococcales, Methanobacteriales, Theionarchaea) Group 4i & 4h Group 4h (Methanomicrobia, Methanobacteriales, Methanococcales, MSBL1 archaea) A0A1J4RK35 Ca. Aenigmarchaeota archaeon CG1_02_38_14 A0A151F440 Theionarchaea archaeon DG_70 A0A151EHX0 Theionarchaea archaeon DG_70_1 OLS33139 Ca. Heimdallarchaeota archaeon AB_125 (HeimAB125_01550: Group 4; CxxC/CxxC) OLS24356 Ca. Heimdallarchaeota archaeon LC_2 (HeimC2_23020: Group 4; CxxC/CxxC) put. Group 4g (Bathyarchaeota) put. Group 4g (Bathyarchaeota) “Group 4g” OLS18213 Ca. Odinarchaeota archaeon LCB_4 (OdinLCB4_14520: Group 4; CxxC/CxxC) Group 4g (Methanomicrobia (1), Bacteria) put. Group 4g (Hadesarchaea, MSBL1 archaea) A0A1F5DG73 Ca. Bathyarchaeota archaeon RBG_13_38_9 OLS18223 Ca. Odinarchaeota archaeon LCB_4 (OdinLCB4_14620: Group 4; CxxC/CxxC) Group 4g (Methanobacteriales (1), ) Group 4c (Bacteria) Group 4g (Bacteria) Group 4g (Bacteria) “Group 4c & g” Group 4g (Bacteria) Group 4g (Bacteria) A0A0P9G6Q8 Ca. Bathyarchaeota archaeon BA1 A0A0M0BRL2 miscellaneous Crenarchaeota group 15 archaeon DG_45 OLS18555 Ca. Odinarchaeota archaeon LCB_4 (OdinLCB4_02560: Group 4; CxxC_CxxC) Group 4d (Methanomicrobia, Thermococcales, Aciduluprofundi, Bacteria) Group 4d A0A151BF30 Ca. Bathyarchaeota archaeon B25 WP_012745069_1 Kosmotoga olearia (NiFe_Group_4g) Group 4f (Bacteria, Altiarchaea) Group 4f Group 4a (Bacteria) WP_011751858_1 pendens (NiFe_Group_4a) Group 4a WP_014558423_1_Fervidicoccus_fontis_NiFe_Group_4a Group 4b (Thermococcales) Group 4b (, Thermococcales) Group 4b Group 4b (Bacteria)

0.4

b Group 3d (Bacteria) Group 3d KKK40681 Lokiarchaeum sp. GC14_75 (Lokiarch_49310: put. Group 3c; CxxC/CxxC)1 Group 3c (Bathyarchaeota) OLS27265 Ca. Heimdallarchaeota archaeon LC_3 (HeimC3_04360: Group 3c; CxxC/CxxC) Group 3c (Theionarchaea) Group 3c (Bacteria, Euryarchaeota) Group 3c (, , Bacteria) Group 3c (Anaeromyxobacter) OLS24369 Ca. Heimdallarchaeota archaeon LC_2 (HeimC2_23150: Group 3c; CxxC/CxxC) Group 3c (MSBL1 archaea) Group 3c (Bacteria) WP_015233716_1 gregoryi (NiFe_Group_3c) Group 3c Group 3c (Bacteria) A0A147JTG5 Hadesarchaea archaeon DG_33_1 Group 3c (Methanocella, MSBL1 archaea, Methanobacteriales, Methanococcales, ) Group 3c (Methanobacteriales, Methanocellales, ) Group 3c (Bacteria) Group 3c (Methanococcales) KXH73312 Ca. Thorarchaeota archaeon SMTZ1_45 (AM325_14150: Group 3c; CxxC/no) OLS21753 Ca. Thorarchaeota archaeon AB_25 (ThorAB25_26490: Group 3c; CxxC/no) KXH71847 Ca. Thorarchaeota archaeon SMTZ1_83 (AM324_08070: Group 3c; CxxC/no) OLS29367 Ca. Thorarchaeota archaeon AB_25 (ThorAB25_16580: Group 3c; CxxC/no) KKK41105 Lokiarchaeum sp GC14_75 (Lokiarch_45490: Group 3c; no/no*) Group 3b (Bacteria, Caldiarchaeum, ) Group 3b (Bacteria, Bathyarchaeota) Group 3b (Aciduliprofundum, MSBL1 archaea, Hadesarchaea, DPANN) Group 3b (Bacteria, MSBL1 archaea, Bathyarchaeota, Theionarchaea) Group 3b (Bacteria) OLS27427 Ca. Heimdallarchaeota archaeon LC_3 (HeimC3_04260: Group 3b; CxxC/CxxC) Group 3b (Thermococcales) KXH74491 Ca. Thorarchaeota archaeon SMTZ1_45 (AM325_05810: Group 3b; CxxC/CxxC) KXH73819 Ca. Thorarchaeota archaeon SMTZ1_83 (AM324_16110: Group 3b; CxxC/CxxC) KKK42945 Lokiarchaeum sp GC14_75 (Lokiarch_31190: Group 3b; no/CxxC*) Group 3b (Thermococcales) Group 3b Group 3b (Bathyarchaeota) OLS17582 Ca. Odinarchaeota archaeon LCB_4 (OdinLCB4_08000: Group 3b; CxxC/CxxC) WP_012308976_1 Korarchaeum cryptofilum (NiFe_Group_3b) Group 3b (Bacteria) Group 3b (Bacteria) OLS14703 Ca. Lokiarchaeota archaeon CR_4 (RBG13Loki_1669: Group 3; CxxC/CxxC)2 A0A0M0BR92 miscellaneous Crenarchaeota group 15 archaeon DG_45 Group 3b (Bathyarchaeota) A0A1Q6DUI2 Euryarchaeota archaeon HMET1 Group 3a Group 3a

0.7

100/100 90-99.9/100 or 100/90-99 90-99.9/90-99 80-89.9/80-100 or 80-100/80-89

818 34

819 Figure 2. Complex evolutionary history of Group 3 and Group 4 [NiFe]-hydrogenases in Asgard archaea.

820 Maximum likelihood phylogenetic reconstructions of the large subunit of Group 4 (376 sequences) (a) and

821 Group 3 (543 sequences) (b) [NiFe]-hydrogenases (359 and 319 amino acids, respectively). Asterisk

822 indicate partial genes. Asgard homologs are highlighted in boldface and their annotation includes

88 823 information as to whether the N- and C-terminal CxxC motifs, which ligate H2-binding metal centres , are

824 conserved. Numbers at branches indicate SH-like approximate likelihood ratio test80 and ultrafast

825 bootstrap support values79, respectively. Only bootstrap values above 80 are shown. The phylogeny of

826 large subunit [NiFe]-hydrogenases, group 4 (a) was rooted arbitrarily due to unstable position of the root.

827 The phylogeny of large subunit [NiFe]-hydrogenase Group 3 (b) was rooted between group 3a and all

828 other groups, as determined from a preliminary phylogenetic analysis of large subunit [NiFe]-

829 hydrogenases including all members (group 1-4) (not shown). Scale bars indicate the average number of

830 substitutions per site. 1Gene cluster analysis of this divergent group 3 [NiFe]-hydrogenase homolog

831 suggests that it represents a Group 3c member. 2 While phylogenetic analysis indicates this this sequence

832 is from a group 3b [NiFe]-hydrogenase, it belongs to a gene cluster with an organization typical of Group

833 3c [NiFe]-hydrogenases. Thus, the annotation of this hydrogenase is currently unclear.

834 835

35

a mpounds?

hydrogenaseydrogenasease e pathway iFe]-dehydrogenase β-Oxidation) [N [NiFe]-de dehalogenase

Group Group3b Group 3c NADH 4 [NiFe]-deh dehydrogenATP synthaseTerminalNitrate oxidasReductive reductaseWood-LjungdahlRubisco (excl.Formate typeFatty 4) acidsHydrocarbons/ ( aromatic co Lokiarchaeum sp. GC14_75 Lokiarchaeote CR_4 Odinarchaeote LCB_4 Thorarchaeote AB_25 Thorarchaeote WOR_45 LAsCA Thorarchaeote WOR_83 Heimdallarchaeote AB_125 Heimdallarchaeote LC_2 ? Heimdallarchaeote LC_3 Eukaryotes Hydrogenases Respiration C-fixation Selected substrates

WLP, Type 3b/c [NiFe]-Hyd., use of FA & alkanes, red. dehalogenase Genes likely present in LAsCA Genes likely acquired later in some lineages Loss of WLP, gain of Type 4 [NiFe]- Hyd. unclear

b Ca. Odinarchaeota LCB4 63/-/.64 Odinarchaeota Lokiarchaeum sp. GC14_75 Lokiarchaeota Ca. Lokiarchaeota archaeon CR_4 Ca. Thorarchaeota archaeon SMTZ-83 Ca. Thorarchaeota archaeon AB_25 Thorarchaeota Ca. Thorarchaeota archaeon SMTZ1-45 59/72/1 Ca.Thorarchaeota archaeon SMTZ-45

82/89/.98 Ca. Heimdallarchaeota archaeon LC_3 SChinaSea_Heimdall Heimdallarchaeota Ca. Heimdallarchaeota archaeon AB_125 Ca. Heimdallarchaeota LC_2

Eukaryotes 89/100/1 Bathyarchaeota Aigarchaeota 89/100/1 Thaumarchaeota Korarchaeota

91/82/- Verstraetearchaeota Geoarchaeota

58/71/1 Crenarchaeota

Euryarchaeota 0.1

BS >= 95 (LG+C60+F+G+PMSF), FSR / BS >= 95 (LG+C60+F+G+PMSF), full/ PP=1 (LG+GTR), FSR 836

837 Figure 3. Evolution of the Asgard superphylum and selected metabolic features. Schematic

838 representation of the relationship of the Asgard phyla (9 MAGs) and the distribution of selected metabolic

839 features across the different members of this group (a). Maximum-likelihood phylogenetic analysis of a

840 concatenated set of 56 universal protein markers5 from 144 representative taxa

841 https://figshare.com/s/5f153d1dcacadd3b3ed6 (b). Numbers at branches indicate bootstrap statistical

36

842 support estimated by IQ-TREE (LG+C60+F+G+PMSF model) obtained from 100 replicates on the ‘full’ and

843 ‘fast-sites removed’ datasets, and posterior probabilities estimated by Phylobayes (LG-GTR model) on the

844 ‘fast-sites removed’ dataset (see methods for further detail).

845

846

847 848 849

37

Syntrophic hypothesis organics organics, fumarate, CH4 oxic O anoxic 2 anoxic H O organics 2

- CO2 organics fermentation NO3 glucose CO2 - NO2 organics, CH4 O H2 2 H2 acetate H2 H2 organics pyruvate CO2 H2 WLP H2O CO2 acetate organics CO2 H2 CH4 aerobe CO2 anaerobe

Moreira & Lopez-Garcia, 1998; Lopez-Garcia & Moreira, 2006

Hydrogen hypothesis organics oxic organics anoxic O2 H O organics 2 CO2 glucose

H2 O2 H2 H2 H H2 2 H2 CO2 glucose pyruvate

WLP H2 H2O

CO2 H2

methanogenesis aerobe CO2 anaerobe

CH4 Martin & Müller, 1998

Updated hydrogen hypothesis organics oxic organics anoxic O2 H O organics 2 CO2 glucose

H2 O2 H2 H2 H H2 2 H2 CO2 glucose pyruvate

gluconeo- H2 H2O genesis WLP CO H acetyl-CoA acetate 2 2 aerobe CO2 anaerobe

Sousa et al., 2016

Reverse flow model

oxic H2,or organics (e.g. acetate) organics organics O anoxic 2

O2 H2O

succinate CO2 organics (e.g. FA, fumarate glucose glucose alkanes, alcohols, - e aromatic compounds) HGT & loss HGT & loss H2

+ mitochondrionmitochondri hydrogenosome H e- ? pyruvate pyruvate acetate acetyl-CoA e-

reverse X WLP formate CO aerobe H2 anaerobe 2 H2O - WLP CO2 e CO2 CO2

facultative aerobic alphaproteobacterium fermentative deltaproteobacterium autotrophic methanogenic archaeon autotrophic lokiarchaeum-like archaeon 850 heterotrophic asgard-like archaeon

38

851 Figure 4. Evolutionary scenarios for the origin of the eukaryotic cell

852 (a) Depiction of the syntrophic hypothesis previously proposed by Moreira and Lopez-Garcia, 1998 and

853 200639,40, which invokes two bacterial and one archaeal partner(s) in the origin of the eukaryotic cell; i.e.

854 first a syntrophic relationship was established between a fermentative deltaproteobacterium and a

855 hydrogen-dependent archaeal methanogen, which was incorporated into the cytoplasm of the bacterium

856 through endosymbiosis. Subsequently, a second endosymbiosis event led to the uptake of a facultative

857 aerobic alphaproteobacterium, which was suggested to have oxidized organic compounds and

858 hydrocarbons produced by the host. While this model can explain the origin of the nucleus from the

859 archaeal endosymbiont, it currently lacks support from genomic and phylogenomic analyses57,89. (b)

860 depiction of the hydrogen hypothesis originally proposed by Martin and Müller, 199841, which suggest

861 that a symbiosis between a strictly autotrophic hydrogen-dependent methanogenic archaeaon and an H2-

862 and CO2-producing alphaproteobacterium led to the origin of the eukaryotic cell. (c) depiction of the

863 updated hydrogen hypothesis based on the first analysis of the metabolic repertoire of Lokiarchaeum13,

864 which suggests that the archaeal host was an autotrophic hydrogen-dependent acetogen rather than a

865 methanogen. (d) Our syntrophy model referred to as the ‘reverse flow model’, which is based on our

866 comparative analysis of the metabolic repertoire encoded by the various members of the Asgard

867 archaea2,4. This model suggests that a metabolic syntrophy between anaerobic ancestral Asgard archaea

868 and facultative anaerobic alphaproteobacteriahas has provided the selective force for the establishment

869 of a stable symbiotic interaction that has subsequently led to the origin of the eukaryotic cell,. In this

870 scenario, the archaeal progenitor generated reducing equivalents during growth on small organic

871 substrates (e.g. hydrocarbons and fatty acids) and the bacterial partner utilized these in form of H2, small

872 reduced inorganic or organic compounds or by direct electron transfer. We acknowledge that HGT from

873 free-living bacteria played an important role in this process and contributed to the gene repertoire of the

874 eukaryotic cell including for the evolution of mitochondria-derived organelles such as hydrogenosomes in

39

875 anaerobic protists58. Please note, that our metabolic reconstructions do not provide insights into the

876 origin of the nucleus and the relative timing of eukaryogenesis (see main text for details).

877

878

879

880

881

882

40

Supplementary Information for Proposal of the reverse flow model for the origin of the eukaryotic cell based on comparative analysis of Asgard archaeal metabolism

Anja Spang1,2, Courtney Stairs1, Nina Dombrowski3, Laura Eme1, Jonathan Lombard1, Eva Fernández Cáceres1, Chris Greening, Brett J. Baker3 and Thijs J. G. Ettema1

Table of contents

A. Supplementary discussion 1. Carbon metabolism and the potential for autotrophic carbon fixation 2 Central Carbon metabolism 2 Wood-Ljungdahl pathway 3 Genes encoding for proteins related to methanogenesis 3 3-hydroxypropionate/4-hydroxybutyrate cycle 4 Calvin-Benson-Bassham Cycle 5 2. Asgard archaea have the potential to use various substrates as carbon source including hydrocarbons 6 Carbohydrate active enzymes and peptidases 6 Fatty acids 7 Hydrocarbons 11 3. Electron chains differ among the Asgard lineages 12 [Ni-Fe]-hydrogenases 12 Complex I and alternative NADH:quinone reductases 13 Reductive dehalogenases 14 Terminal oxidase and nitrate reductase in Heimdallarchaeota 14 Other potential components of an electron chain 15 Quinone biosynthesis 16

B. Supplementary References 17-21

C. Supplementary Tables, Figures and Files 22-60 Legends for Suppl. Tables, 1-5 22 Suppl. Figures 1-18 & Suppl. Files 1-3 23-60

1

A. Supplementary text

1. Carbon metabolism and the potential for autotrophic carbon fixation Central Carbon metabolism. All members of the Asgard archaea contain candidate genes encoding enzymes for most steps of glycolysis via the Embden-Meyerhof-Parnas (EMP) pathway as well as gluconeogenesis and the tricarboxylic acid cycle (TCA; Suppl. Tables 1 and 2, Suppl. Figure 2). However, the specific enzymes catalyzing these steps differ among the various lineages (Suppl. Tables 1 and 2). For example, our analysis of enzymes potentially involved in glycolysis revealed that only three members of the Asgard archaea appear to encode enzymes belonging to described hexo- or glucokinase families: the archaeal ADP-dependent phosphofructokinase/glucokinase in the two

Heimdallarchaeota LC3 and AB125 (PF04587) and a ROK (repressor, open reading frame, kinase) family enzyme in Lokiarchaeum GC14-75 (PF00480) (Suppl. Table 2), which could potentially phosphorylate glucose and thus perform the first step of the EMP pathway. While all genomes encode various proteins with Phosphofructokinase (Pfk) B family carbohydrate kinase (PF00294) or FGGY family of carbohydrate kinase domains (PF02782)1, it is currently unclear whether some of those could represent new kinase families acting on glucose as substrate. Additionally, Odinarchaeum lacks candidate genes for a phosphoglucose (EC 5.3.1.8/9) and phosphofructokinase (EC 2.7.1.11/ 2.7.1.146). It is also interesting to note that genes for the bifunctional archaeal fructose-1,6-bisphosphate aldolase/phosphatase (FBPA/FBPase), which has been suggested to represent an ancient carbon fixation enzyme in archaea2, could not be identified in the genomes of Lokiarchaeota and two Heimdallarchaeota (while present in the other Asgard genomes). However, these organisms (except Lokiarchaeum CR4) could perhaps catalyze this reaction in two steps using a fructose-bisphosphate aldolase and an archaeal fructose-1,6-bisphosphatase (Suppl. Tables 1 and 2, Suppl. Figure 2). The TCA is nearly complete in most Asgard genomes (Suppl. Figure 2, Suppl. Tables 1 and 2) except Odinarchaeum and perhaps Lokiarchaeum CR_4, which lack genes for all subunits of a succinyl-CoA synthetase (EC 6.2.1.5) and for a succinate dehydrogenase/fumarate reductase (EC 1.3.5.1/4). Several genomes lack genes for canonical malate dehydrogenases as well as a membrane-bound malate:quinone- (IPR006231), which catalyze the oxidation of malate to oxaloacetate (Suppl. Figure 2). However, in those organisms, malate could be converted to pyruvate using malic enzyme (EC 1.1.1.38/40). Interestingly, the gene repertoire coding for proteins involved in ribose sugar metabolism and the corresponding pathways vary substantially across Asgard lineages (Fig. 1, Suppl. Tables 1 and 2). While Lokiarchaeota (both lineages), Heimdallarchaeum AB125 and Odinarchaeum seem to use the ribulose-5-phosphate pathway, genomes of Thorarchaeota appear to encode genes for enzymes of the non-oxidative pentose phosphate (PP) pathway and Heimdallarchaeum LC2 and LC3 might employ the oxidative PP pathway (Fig. 1, Suppl. Tables 1 and 2). Notably, the Odinarchaeaum genome lacks genes for ribose 5-phosphate isomerase (both Type A (IPR004788) and B (IPR003500, IPR004785) enzymes), which reversibly converts D-ribulose 5-phosphate to D-ribose 5-phosphate. Since D-ribose 5-phosphate represents a precursor for biosynthesis3, a homologue of this enzyme, i.e. of

2

Type A, is encoded by most archaeal genomes independent of whether the respective organisms employ the PP- or ribulose monophosphate (RuMP)-pathway. Prospective genomic analyses of additional members of these archaea is needed to determine whether these patterns reflect different catabolic and anabolic abilities or are a result of genome bin incompleteness.

Wood-Ljungdahl pathway. In accordance with previous analyses4-6, we could identify genes for all enzymes of the Wood-Ljungdahl pathway (WLP)7,8 in Loki- and Thorarchaeota (Suppl. Figure 2). In particular, these genomes encode the archaeal methyl-branch and carbamoyl-branch as well as the key enzyme CO dehydrogenase/acetyl-CoA synthase (ACS/CODH), which is comprised of five subunits (CdhA-E) and links the two branches by synthesizing acetyl-CoA from carbon monoxide and methyl-H4MPT via the methylated corrinoid Fe-S protein CH3-Co(III)FeSP (Suppl. Figure 2, Suppl. Tables 1 and 2)8. In contrast, genomes of Heimdallarchaeota do not seem to encode any of the enzymes involved in this pathway. While the genome of the sole representative of the Odinarchaeota encodes genes for the five steps of the methyl-branch leading to the generation of methyl-H4MPT, it appears to lack genes for all subunits of ACS/CODH (CdhABCDE). In methanogenic archaea, methyl-H4MPT not only plays a role in carbon

9 fixation but is also the key intermediate in methanogenesis . As such, methyl-H4MPT is converted to methyl-CoM and subsequently disproportionated to methane and CO2 by the tetrahydromethanopterin S-methyltransferase complex (Mtr) and methyl-CoM reductase (Mcr), respectively. However, we could not detect genes coding for the eight subunits of the Mtr complex or the five subunits of Mcr in any of the investigated Asgard archaea (see below). This suggests that Loki- and Thorarchaeota are not able to perform methanogenesis, although the use of methylated substrates cannot be excluded (see below). Instead they may use the WLP for autotrophic carbon fixation and perhaps for energy conservation using inorganic (at least Thorarchaeota) and/or organic substrates (Fig. 1). In addition, the WLP could serve as electron sink and thereby allow energetically efficient fermentative growth on various organic substrates (see below) or function in reverse for the complete oxidation of organic substrates7,10. Given the lack of ACS/CODH as well as key enzymes of methanogenesis in Odinarchaeota, the functional role of the enzymes of the methyl-branch in this organism remain unknown. However, it should be noted that we can currently not exclude the possibility that the absence of the respective genes is due to genome incompleteness.

Genes encoding for proteins related to methanogenesis. Surprisingly, genomes of Loki-, Thor- and even Heimdallarchaeota encode various proteins assigned to the uroporphyrinogen decarboxylase (URO-D) (IPR000257) (Suppl. Figure 2, Suppl. Tables 1 and 2), which includes candidate proteins for methylcobalamin:coM methyltransferases. Furthermore, each of these genomes encodes at least one protein assigned to the trimethylamine:corrinoid methyltransferase family and Thorarchaeum SMTZ1-45 in addition contains a gene coding for a putative monomethylamine methyltransferase. Methyltransferases belonging to these protein families allow several methanogenic archaea to grow on methylated compounds such as methyl sulfides, methylamines and methanol11. Notably, homologues of these enzymes encoded by genomes of methanogenic archaea generally

3

contain the amino acid pyrrolysine, which is thought to be essential for catalysis12. However, these methyltransferase families are extremely diverse and not restricted to organisms that encode pyrrolysine. For instance, a trimethylamine:corrinoid methyltransferase that does not contain pyrrolysine has been shown to function as glycine betaine methyltransferase in the gram-positive Desulfitobacterium hafniense13. Since Asgard archaea seem to lack genes coding for pyrrolysine, their putative trimethylamine:corrinoid methyltransferase may demethylate substrates other than trimethylamine. Methyl-groups could be transferred to corrinoid proteins, candidates for which (COG05012) are encoded by genomes several Asgard genomes (Suppl. Table 2). Corrinoid proteins are involved in methyl-transfer and act as substrate for the methylcobalamin:coenzyme M methyltransferase (IPR000257) leading to the formation of methyl-CoM in methanogens11. Lokiarchaeota and Thorarchaeota not only contain genes coding for various proteins assigned to IPR000257 (corrinoid proteins and methyltransferases), but their genomes also reveal the organization of these genes in gene clusters. This suggests that at least Loki- and Thorarchaeota have the ability to convert specific methylated compounds to methyl-CoM. In methanogenic archaea, methyl-CoM is further converted to methane by the key enzyme of methanogenesis: methyl-CoM reductase (Mcr)11. However, in contrast to archaea known to be able to use methylated compounds, all herein studied members of the Asgard archaea lack gene homologues coding for the five subunits of Mcr. Furthermore, their genomes do not encode all eight subunits of the membrane-bound tetrahydromethanopterin S-methyltransferase (MtrA-H), which performs a central energy-conserving and sodium-ion-translocating step during methanogenesis and links the methyl-branch of the WLP to methane generation from methyl-CoM. Yet, genomes of Loki-, Odin- and Thorarchaeota encode the methyltransferase subunit MtrH (Suppl. Table 2), which also occurs in various non- methanogenic archaeal and bacterial lineages14 and has been suggested to also mediate the transfer of a methyl- group between methylcobalamin and tetrahydrofolate13. In contrast, in methanogens the methyl-group is transferred to tetrahydromethanopterin via the corrinoid prosthetic group of MtrA15. Interestingly, in the different Thorarchaeota, one to two genes coding for MtrH are located in gene clusters with genes encoding MtrA homologues as well as a protein of unknown function related to methanogenesis marker protein 14 (IPR008303). This gene cluster is reminiscent of the mtrXAH operon of Methanosarcina barkeri, which was suggested to play a role in the methyl-transfer from methylated substrates to tetrahydromethanopterin16. It therefore seems possible that at least in Thorarchaeota, MtrH and MtrA homologues may provide a link between Methyl-CoM and Methyl-

a H4MPT and/or Methyl-H4F and allow the use of methylated compounds by these archaea.

3-hydroxypropionate/4-hydroxybutyrate cycle. In addition to genes for the WLP, genomes of Loki- and Thorarchaeota but also Heimdallarchaeota encode several enzymes characteristic of the 3-hydroxypropionate/4- hydroxybutyrate (3H4H) carbon fixation pathway (Suppl. Table 1). However, in Archaeoglobus lithotrophicus, whose genome also encodes multiple carbon fixation pathways17, these proteins are involved in the degradation of fatty

a Methyl-H4F represents a key intermediate of the methyl-branch in the bacterial-type WLP, genes of which were also identified in Asgard genomes and may encode enzymes functioning in folate biosynthesis (Suppl. Figure 2).

4

acids and hydrocarbons. It is likely that these enzymes perform similar functions in the Loki-, Thor- and Heimdallarchaeota. For instance, some of the genes encoding these enzymes co-occur with genes for β-oxidation related proteins and are absent in the genome of Odinarchaeum, which also lacks genes coding for β-oxidation- related proteins. Furthermore, in Lokiarchaeum GC14_75 and Heimdallarchaeota LC2 and LC3, some of these genes are encoded in the same region of the chromosome (Suppl. Table 3). For example, in Lokiarchaeum GC14_75, methylmalonyl-CoA mutase (COG2185 and COG1884) can be found within two genes of the short chain dehydrogenase gene (ACAD, COG1960, discussed below). Altogether, this supports the idea that these proteins could be involved in the degradation of fatty acids and hydrocarbons in several Asgard archaea.

Calvin-Benson-Bassham Cycle. Furthermore, genomes of all members of the Asgard archaea encode a protein belonging to the ribulose 1,5-bisphosphate carboxylase family (RuBisCO), the key carbon fixation enzyme of the

Calvin-Benson-Bassham (CBB) cycle (Suppl. Figure 18a), which adds CO2 to ribulose 1,5-bisphosphate (RuBP). Our phylogenetic analyses classify a homologue of each of the analysed Thorarchaeota and Lokiarchaeota as Type-IV Rubisco, which is not known to have canonical RuBisCO activity and may instead act on analogous compounds (Suppl. Figure 18a)18. In agreement with this, key residues of the active site19 are not conserved (Suppl. Figure 18b) in most of these Type-IV sequences. However, genomes of Heimdallarchaeum AB125, Lokiarchaeum CR4, Lokiarchaeum GC14_75 and Heimdallarchaeum LC3 encode homologues that affiliate either with Type-III RuBisCO or lack clear affiliation, and retain all essential active site residues (Suppl. Figure 18b, except for some with one mutation that maintains the hydrophobicity of the respective amino acid side chain). Thus, those sequences may act as bona fide RuBisCO enzymes. However, further research is needed to elucidate the functional context in which these proteins operate. For example, it is possible that they are involved in an AMP salvage pathway20 (Suppl. Tables 1 and 2), like in Thermococcus kodakarensis, rather than in carbon fixation. This is supported by the presence of candidate genes of the AMP salvage pathway in these genomes, i.e. genes coding for a putative AMP phosphorylase as well as a ribose-1,5-bisphosphate isomerase (Suppl. Tables 1 and 2) as well as by the lack of genes encoding additional key enzymes of the CBB and the related carbon fixation pathway referred to as the reductive hexulose-phosphate (RHP) pathway21. For example, proteins with the signature domain for phosphoribulokinase (IPR006082), another key enzyme of RuBisCO-based carbon fixation pathways, are not encoded by Asgard genomes. While, Heimdallarchaeota LC2 and LC3 do encode enzymes belonging the broader phosphoribulokinase/uridine kinase family (IPR006083; Suppl. Table 2), these seem to represent uridine kinases rather than phosphoribulokinases as they lack the signal domain of and are only distantly related to known phosphoribulokinases (e.g. the use of known phophoribulokinases of Synechococcus elongatus (Q31PL2) and Methanospirillum hungatei (Q2FUB5) as query against the heimdallarchaeal genomes did not recover a significant hit). Finally, the RuBisCO homologues of Odinarchaeum LCB_4 and Heimdallarchaeum LC2 have two mutations within the active site residues (Suppl. Figure 18b) and it is therefore unlikely that the corresponding enzymes have retained the canonical carboxylase activity.

5

2. Asgard archaea have the potential to use various substrates as carbon source including hydrocarbons The genomes of all Asgard archaea reveal the potential for the degradation of various substrates including complex carbohydrates as evidenced by various carbon active enzymes, peptidases and esterases as well as amino acids (Suppl. Table 4). Also notable is the large variety of paralogues encoding key enzymes of the β-oxidation pathway in all Asgard archaea except for Odinarchaeum, which may enable these organisms to use fatty acids of varying chain lengths as carbon source. In addition, Loki- and Thorarchaeota genomes encode putative archaeal formate dehydrogenases (Suppl. Tables 1 and 2), suggesting that these organisms are able to also use formate as a carbon and electron source, enabling the reduction of F420. Reduced F420 could subsequently allow carbon fixation via the WLP (Fig. 1). Alternatively, the formate dehydrogenase could be used for the generation of formate from CO2 during the complete oxidation of organic substrates such as fatty acids via the WLP. This may require growth with syntrophic formate-utilizing partner organisms, that would lower the concentration of formate22. Altogether, the potential of using organic substrates for growth is in agreement with a higher relative abundance of Lokiarchaeota (DSAG) in organic-rich sediment layers23,24.

Carbohydrate active enzymes and peptidases. Each Asgard archaea genome has the genetic potential to degrade complex substrates and together they encode 532 carbohydrate-active enzymes (CAZYmes), 338 peptidases and 128 ESTHERs (ESTerases and alpha/beta-Hydrolase Enzymes and Relatives; Suppl. Table 4). Overall, the genome of Odinarchaeum LCB_4 encodes for the fewest CAZYmes (24), peptidases (14) and ESTHERs (1). In contrast, the genomes of Lokiarchaeota encoded for most CAZymes and ESTHERs (~80 and ~40, respectively) and Heimdall- and Thorarchaeota for most of the peptidases (~35). However, the potential to degrade complex substrates is similar among the different Asgard archaea, when compared to their overall proteome size (Suppl. Figure 4). Notably, the main difference is the relative low number of predicted esterases in Odinarchaeum, which is in agreement with the lack of β-oxidation genes encoded in its genome. Potential substrates for these enzymes include complex carbohydrates, such as starch and glycogen (GH13, GH57, GH133), cellulose (GH12, GH5) or arabinan (GH93). Additionally, β-galactosidases (GH2), β-glucosidases (GH3) and α-mannosidases (GH38) might be involved in the degradation of oligosaccharides. A subset of these CAZymes were predicted to be extracellular, suggesting the degradation of more complex carbohydrates outside of and subsequent uptake into the cell. For example, genomes of Heimdallarchaeota and Lokiarchaeota encode for potentially secreted α-amylases (GH57 detected in AB125, LC3 and GC14_75) that cleave sugars from starch25. Consistent with this finding, we detected carbohydrate transporters in their genomes, including a potential disaccharide transport system in Heimdallarchaeota and Lokiarchaeota (transporter classification database (TCDB) ID: 3.A.1.1.22, 3.A.1.1.30, respectively) that could import sugars cleaved by secreted α-amylases (Suppl. Table 3). This observation as well as the presence of genes for most of the glycolytic pathway in Asgard genomes suggests that these genomes are capable to transform more complex carbohydrates and simple sugars to acetyl-CoA.

6

Aside from carbohydrate-degradation, genomes of Asgard archaea encode for several peptidase families (Suppl. Table 4). Few of these peptidases are predicted to be secreted and those mostly belong to the alkaline serine peptidase family S08 (detected in most of the genomes, Suppl. Table 4), which is implicated in nutritional substrate degradation26. In agreement with this, we detected several peptide transport systems across Asgard genomes. These include oligopeptide (Opp, TCDB IDs 3.A.1.5.12, 3.A.1.5.15 and 3.A.1.5.29) and dipeptide transport systems (Dpp, TCDB ID 3.A.1.5.39) in most of the Asgard genomes. Notably, oligopeptide transporters of the Opp/Dpp family can also be involved in complex carbohydrate transport, such as transport of mannans, rhamnose or xylan in maritima27. While genes for the degradation of rhamnose or xylans were absent, we frequently detected genes encoding α-mannosidases (GH38) and Opp transporters in the same genomes but not genetically linked on the same contig (Lokiarchaeum CR_4, Thorarchaeum AB_25, Thorarchaeum SMTZ-45 and SMTZ1-45). Additionally, we detected several uptake mechanisms for amino acids, including the amino acid permease YhdG/CtrA (TCDB ID 2.A.3.6.1) in all genomes except Odinarchaeum, a transport system for branched chain amino acid in Thorarchaeota and Heimdallarchaeota (TCDB IDs 3.A.1.4.1 and 3.A.1.4.10) and basic amino acids in Thorarchaeota (TCDB ID 3.A.1.3.27). Most genomes of Asgard members encode genes involved in the intracellular degradation of peptides include potential oligopeptidases (M03, M13, S09), dipeptidases (M19, M38, S51, T02), aminopeptidases (M24, M29, M42, S33) and carboxypeptidases (M20, M32, S12) (Suppl. Table 4). Our analyses indicate that the key genes encoding proteins for the degradation of various amino acids (e.g. glutamate, alanine, glycine etc.) seem to be present in all lineages (Suppl.Tables 1 and 2). Amino acid degradation products such as pyruvate and oxoglutarate can then enter the TCA and contribute to the generation of reducing equivalents. Finally, the genomes of Odinarchaeum as well as Lokiarchaeota encode a complete suite of enzymes of the , which may tie into the arginine metabolism (Suppl. Tables 1 and 2). Consistent with the presence of fatty acid degradation genes in most genomes, we also find several genes that for example code for proteins belonging to the hydrolase family 4, which includes the monoglyceride lipase/lysophospholipase subfamily. Members of this subfamily have a broad substrate range but generally degrade monoacylglycerides to free fatty acids, which could subsequently be fed into the β-oxidation pathway. Consistent with the absence of this pathway in Odinarchaeum, we did not detect any potential lipases in this genome. Apart from lipid degradation, most genomes except Odinarchaeum also encode for haloalkane dehydrogenases, members of the carbohydrate esterase CE10 family (i.e. para-nitrobenzyl esterases) and the lipase family XIII, which can act on acyl chain esters of butyrate or valerate28.

Fatty acids (FA). One of the characteristics that distinguish archaeal lipids from those of bacteria and eukaryotes is the presence of isoprenoid and not fatty acyl chains29. Over the past three decades, a few notable exceptions to this observation have been reported in the Halobacteria30,31. Indeed, surveys of archaeal genomes revealed components of both fatty acid synthesis and breakdown32,33. In eukaryotes and bacteria, fatty acids are synthesized or oxidized on two different carrier molecules, acyl-carrier protein (ACP) or CoA respectively. However, in most archaea, fatty

7

acids are likely synthesized and oxidized on CoA33,34. While the genomes of all members of the Asgard archaea encode canonical enzymes to synthesize archaeal lipids (with the exception of the lack of genes encoding glycerol- 1-phosphate dehydrogenase in Lokiarchaeota (Fig. 1, Suppl. Figure 2)), we identified multiple homologues of fatty acid metabolism proteins in each of the Asgard genomes except Odinarchaeota. This suggests that most of the Asgard archaea are capable of fatty acid degradation and synthesis. Here, we describe the pathway with respect to fatty acid breakdown although we suspect this pathway could also function in fatty acid synthesis, which has been reported elsewhere32,33 (Suppl. Figure 2).

Long chain fatty acid CoA ligase - COG0318. Fatty acids destined for degradation are first activated with CoA via long chain fatty acid CoA ligase (COG0318) to generate an acyl-CoA. This protein family includes a variety of CoA (e.g., Acyl-CoA synthetase, 2-succinylbenzoate--CoA ligase, medium and long chain fatty acid CoA ligase). With the exception of Odinarchaeota, each of the Asgard genomes encode multiple paralogues of this protein (Suppl. Table 1 and 2). To determine if any of these paralogues could be involved in fatty acid metabolism, we reconstructed the evolutionary history of this protein using sequences retrieved with BLAST against the nr and swissprot databases (see Methods for details, Suppl. Figure 5). We observed that the proteins are distributed across prokaryotic and eukaryotic diversity in multiple paralogues, though poor backbone support makes it difficult to determine the exact function of these proteins. Interestingly, Heimdallarchaeum LC2 has a MenE-like homologue (OLS23766) that might be involved in quinone metabolism (discussed below).

Acyl-CoA dehydrogenase (ACAD) COG1960. The resulting acyl-CoA can be oxidized via either an acyl-CoA dehydrogenase (ACAD, COG1960) or 4-hydroxybutyryl-CoA dehydratase (COG2368) to generate an acyl-2-enoyl- CoA. Most organisms have multiple paralogues of ACAD enzymes that are specific for different chain-lengths of fatty acids35. We observed multiple copies of COG1960 in each of the non-Odinarchaeota Asgards suggesting these organisms might be capable of metabolizing fatty acids (Suppl. Table 5). To help determine the substrate of the various Asgard ACADs, we performed a large phylogenetic analysis of ACADs across the tree of life with an emphasis on those sequences where the substrate specificity is known as previously reported in35 (Suppl. Figure 6). Since these gene families consisted of 1000s of members, we subdivided the dataset into three main clades based on a preliminary phylogenetic analysis (see Methods for details). When possible, sub- distinctions were made based on representative sequences reported in35. The number of distinct paralogues identified in each Asgard genome ranged from 1-6 as indicated by numbers within the circles on Suppl. Figure 2b Clade 1 (Suppl. Figure 6a) consists of two unclassified clades (clade 1a and 1b) with sequences from Lokiarchaeum CR4 and all the Thorarcheota and a sub-clade of medium chain acyl-CoA dehydrogenases (clade 1c; Suppl. Figure 6a). Within clade 2, Heimdallarchaeum LC2 and LC3 resolved in a clade sister to eukaryotic and bacterial ACAD10/11 sequences (Suppl. Figure 6b). The ACAD10 and 11 proteins in humans have been linked to branched chain (16C) and long-chain (20-26C) metabolism respectively36, although the function of these ACADs remains largely unknown in other organisms. The Asgards

8

branch throughout the rest of clade 2, with the majority grouping with other archaeal sequences in the unclassified clade 2c and clade 2d (Suppl. Figure 6b). Clade 3 consists of mostly short chain (clade 3a and 3b) or branched chain (clade 3b and 3e) dehydrogenases (Suppl. Figure 6c). These clade 3 dehydrogenases were found only in the Lokiarchaeota and Heimdallarchaeota genomes but not in Thorarchaeota. In general, the poor resolution of the tree and patchy distribution of the ACAD genes across the Asgards and Archaea makes it challenging to infer the evolutionary history of these proteins. Some Asgard ACAD sequences represent clear cases of recent events of (HGT) from bacteria to archaea (e.g., Heimdallarchaeota OLS19783 and OLS26460, sub-clade 2a; Heimdallarchaeota OLS24605 sub-clade 2e; Heimdallarchaeota in sub-clade 3c, Heimdallarchaeota OLS26526 in sub-clade 3d; Suppl. Figure 6) while others proteins are found throughout archaea (e.g., sub-clades 2c and 2d), suggesting they could be conserved feature of the Asgards or HGTs from other archaea. The lack of studied representatives within many of the ACAD clades makes it difficult to predict the function of the Asgard sequences. However, we suspect that at least Lokiarchaeota and Heimdallarchaeota are capable of metabolizing short branched and unbranched acyl-CoA chains because they have homologues of clade 3a and 3b enzymes (Suppl. Figure 6c). Despite the absence of well-characterized bacterial or eukaryotic ACAD genes in archaea (e.g., medium chain ACAD, clade 1c; very Long chain ACAD, clade 2a), Archaeoglobus fulgidus is able to degrade long-chain fatty acids37. This suggests that there might be alternative paralogues that archaea use to metabolize fatty acids that are distinct from their bacterial and eukaryotic counterparts. Prime candidates for such paralogues could be those sequences found in clade 2c or 2d that are enriched in archaeal representatives.

ACAD alternative - aromatic ring hydroxylase (ARH) or 4-hydroxybutyryl-CoA dehydrogenase - COG2368. Dibrova and colleagues33 recognized that ARH represents an analogue of the ACAD enzymes and only appeared in archaeal genomes with other fatty acid metabolism genes. This prediction is supported by previous studies which demonstrated that ARH has ACAD-like activity in Clostridium38 suggesting that ARH could replace ACAD in fatty acid biosynthesis. We analyzed the evolutionary history of this protein across the tree of life and found that, in general, this gene is rare in archaea (Suppl. Figure 7). Many of the archaeal sequences, including those encoded by genomes of Loki- and Thorarchaeota, cluster together in a divergent clade composed of Cren- and Euryarchaeota suggesting these archaeal paralogues could share a common origin. However, the remainder of the Asgard sequences group mostly with bacteria, suggesting these sequences derive from bacteria-Asgard HGT events.

Enoyl-CoA hydratase - ECH - COG1024. The acyl-2-enoyl-CoA produced by ACAD or the ACAD alternative enzymes discussed above is subsequently hydrated to 3-hydroxyacyl-CoA by the action of enoyl-CoA hydratase (ECH; COG1024). We observed many copies of COG1024 in the Loki-, Thor- and Heimdallarchaeota genomes, and as many as 23 copies in Lokiarchaeum (Suppl. Table 5). Of these 23 Lokiarchaeum ECH homologues, seven were excluded as they were very similar to another copy (due to strain heterogeneity39) or too short for further analysis. Like ACAD,

9

ECH are part of a large superfamily. To refine our inferences, we therefore split the dataset into two distinct clades, ECH1 and ECH2 (Suppl. Figure 8 and 9). In general, support along the backbone of both clade 1 and clade 2 phylogenies is poor. We observed that this gene is present in multiple paralogous copies across prokaryotes and eukaryotes. Each non-Odinarchaeota Asgard genome encoded at least 1 paralogue of ECH as indicated by numbers within the circles on Suppl. Figure 2b. As previously suggested by Dibrova and colleagues, the origin of this gene in Archaea is best explained by multiple independent HGT events from bacteria. For example, Asgard sequences that branch within a clade of only bacteria rather than grouping with sequences of other members of the Asgard archaea likely represent more recent transfer events (e.g., clade 1, OLS26255 Heimdallarchaeum LC 3; clade 2, KKK42140, OLS14134, KKK40657, Lokiarchaeota).

3-hydroxyacyl-CoA dehydrogenase - HDH - COG1250. The 3-hydroxyacyl-CoA generated by ECH is oxidized further by 3-hydroxyacyl-CoA dehydrogenase (HDH; COG1250) to generate 3-ketoacyl-CoA. We observed that most of the non-Odinarchaeota Asgards have at least one copy of this gene with Lokiarchaeota having as many as eight copies. Phylogenetic analyses revealed that there are multiple paralogues of this gene across Asgards as indicated by numbers within the circles in Suppl. Figure 2b. Most of the Asgard sequences branch with other archaea suggesting that these proteins could have an archaeal origin (Suppl. Figure 10). However, some of the Lokiarchaeota sequences branch with bacteria (e.g., KKK40272, KKK40637, KKK42255, and KKK41074) and thus could have been acquired via HGT events.

β-ketothiolase - acetoacetyl-CoA acyltransferase - BKL/AACT - COG0183. The final step of β-oxidation is catalyzed by β-ketothiolase (BKL, COG0183) which cleaves the 3-ketoacyl-CoA to form acetyl-CoA and an acyl-CoA molecule that is two carbons shorter. This family of enzymes includes the acetoacetyl-CoA (AACT) enzymes which catalyze acetoacetyl-CoA formation from acetyl-CoA as the first step in the mevalonate pathway for the biosynthesis of isoprenoids in archaea. All of the Asgard genomes (except Heimdallarchaeum AB125) have at least one copy of BKL/AACT. Initial phylogenetic analysis of this superfamily of proteins revealed two distinct clades of sequences composed of mainly archaea (clade 1; Suppl. Figure 11) or bacteria (clade 2; Suppl. Figure 12). For ease of discussion, we labeled the subclades of clade 1 A-M however the corresponding clade identifiers first described by Dibrova et al. can be found in Suppl. Table 5. The Asgard sequences branch throughout clade 1 with other archaea. In general, the tree topology does not reflect a vertical inheritance pattern for many of the Asgard sequences. In fact, there are no clades that have a representative sequence from each of the Asgard phyla (i.e., Loki-, Thor-, Heimdall- and Odinarchaeota), suggesting that these genes have either been subject to multiple replacements via HGT or were differentially lost among the different Asgard phyla. Within clade 2, many of the Asgard homologues branch with bacteria (e.g., Heimdallarchaeum LC3 OLS23724; Lokiarchaeum CR4 OLS15559, OLS12682, OLS15498, OLS14603), suggesting they might derive from bacteria-Asgard HGT events. However, at least one clade includes seven representative sequences from Loki-, Thor-, and Heimdallarchaeota phyla (Suppl. Figure 11, bottom clade contains

10

Heimdallarchaeum LC3 sequence OLS26180) and branches with other archaea, and could therefore represent an ancient feature of the Asgard superphylum. According to previous studies, some of the archaeal BKL/AACT sequences found in clade 1 (subclades 1B, 1C, 1D and 1H) are encoded next to genes related to mevalonate metabolism33. To expand on this observation, we searched the 15 genes upstream and downstream of the Asgard BKL/AACT for genes related to β-oxidation (COGs 0318, 1960, 2368, 1024, 1042 or 1250) and mevalonate metabolism (COG3425) (Suppl. Figure 13). We found that some of the Asgard BKL/AACT sequences in subclades 1A, 1D, 1H, 1J, 1K, and 1L are linked to β-oxidation associated genes while those in 1B, 1E, and 1L are linked to mevalonate associated genes (e.g., HMG-CoA synthase). Within clade 2, only two of the Lokiarchaeota sequences appear to be linked to β-oxidation genes (Suppl. Figure 12). The gene proximity of the BKL/AACT sequences to various elements of β-oxidation and mevalonate metabolism could be used as a proxy for determining in which pathway these proteins participate, although further functional characterization is needed.

A note on fatty acid biosynthesis. Bacteria and eukaryotes typically have different enzymes dedicated to β-oxidation and biosynthesis of fatty acids. For example, the steps catalyzed by ACAD and HDH are in fact catalyzed by members of the short-chain dehydrogenase protein family (COG1028) namely enoyl-ACP reductase and ketoacyl-ACP reductase. We detected over 150 genes within this protein family across the Asgard superphylum, some of which might be dedicated for fatty acid metabolism, however, these pathways have yet to be characterized in archaea (Suppl. Table 5). Another fundamental difference between β-oxidation and fatty acid biosynthesis is the use of CoA versus acyl-carrier protein (ACP). In general, ACPs are rare in archaea and likely represent a bacterial feature33,34. Notably, however, we identified homologues of ACP in Heimdallarchaeum LC3 and all members of Thorarchaeota (Suppl. Table 5), suggesting that these organisms might be capable of ACP-mediated fatty acid biosynthesis. In the other Asgards and archaea without ACPs, it is possible that the β-oxidation enzymes might be able to synthesize fatty acids using CoA as the carrier molecule as is the presumed mechanism in other archaea33,34. Indeed, recent studies using synthetic biology, have successfully engineered the β-oxidation pathways of E. coli and yeast to elongate fatty acids40,41 suggesting that these pathways have the potential to function in both directions.

Hydrocarbons. Notably, all analyzed Loki- and Thorarchaeota as well as Heimdallarchaeum AB125 contain several genes coding for pyruvate formate lyase (Pfl) domain (IPR004184) proteins (Suppl. Tables 1 and 2). This protein family includes glycyl radical enzymes, some of which are involved in the anaerobic activation of hydrocarbons through the addition of fumarate42. The glycyl is activated by glycyl-activating enzymes (reviewed in43), the genes for which are present in all Asgard archaea and occur in a gene cluster with IPR004184-domain containing proteins in Lokiarchaeum CR4 and Heimdallarchaeum AB125. Our phylogenetic analyses of the Pfl domain proteins show that archaeal homologues including those of Asgard archaea do not represent bona fide Pfl (Suppl. Figure 14). In contrast, they affiliate with those proteins comprising known alkylsuccinate and benzylsuccinate synthases44, choline

11

trimethylamine lyases45 or 4-hydroxyphenlyacetate decarboxylases46, some of which allow the activation of hydrocarbons by fumarate addition. However, the different archaeal lineages comprise various separate clusters in phylogenetic analyses indicating independent gene transfers from different bacterial sources without clear affiliation to specific characterized enzymes. Thus, the function of these enzymes in Asgard archaea is unresolved. Yet, it seems possible, that at least some of these protein homologues could function in the activation of different hydrocarbons. For instance, a Pfl domain protein of Archaeoglobus fulgidus related to a homologue of Thorarchaeum SMTZ1_83 (Suppl. Figure 14: A0A135VWA9, arrow) has been analyzed previously and was suggested to represent an alkylsuccinate synthase involved in the activation of C16-alkanes by fumarate addition37. Furthermore, it was shown that Thermococcus sibiricus has the potential to grow on an alkane likely using its PFL-homologue (Suppl. Figure 14, C6A251, arrow). Prospective analyses are required to test the hypothesis of growth on hydrocarbons in the respective Asgard archaea and to elucidate the exact pathway of hydrocarbon degradation in archaea, which may differ from bacteria. For example, in bacteria, malonyl-CoA decarboxylase (IPR007956, IPR035372) is critical for the alkane degradation pathway47, though we could not detect this protein in archaea. However, several other proteins involved in alkane degradation in bacteria47 belong to enzyme families that also comprise proteins involved in the 3- hydroxypropionate/4-hydroxybutyrate carbon fixation pathway in some members of the TACK archaea. Candidate genes coding for enzymes of these protein families are present in the genomes of the different Asgard lineages (except for Odinarchaeota; Suppl. Tables 1 and 2) but also in several members of the Archaeoglobales, in which they were suggested to enable alkane degradation17.

3) Electron transport chains differ among the Asgard lineages [Ni-Fe]-hydrogenases. All members of the Asgards lack [FeFe]-hydrogenases or the methanogen-specific [Fe]-hydrogenases. However, all Asgard genomes encode various [NiFe]-hydrogenases, which were annotated based on classification of the large subunit using HydDB48, phylogenetic analyses (Suppl. Table 2, Fig. 2) and operon architecture (Suppl. Figure 3). In particular, the Asgard genomes encode [NiFe]-hydrogenases belonging to Group 3 and Group 4, yet lack representatives of Group 1 and Group 2 [NiFe]-hydrogenases. First of all, both the genomes of Thorarchaeota, Heimdallarchaeum LC2 and LC3 as well as of Lokiarchaeum GC14_75 encode genes for [NiFe]- Hydrogenases belonging to Group 3c, which are located in gene clusters encoding all or some components of heterodisulfide reductases (Suppl. Figure 3). In methanogens, hydrogenase (MvhADG)-heterodisulfide reductase

(HdrABC) complexes mediate electron bifurcation, with electrons from H2 oxidation being simultaneously transferred to ferredoxin and heterodisulfide49. However, the substrate for these enzymes outside the methanogens (which use CoM-S-S-CoB) remains to be determined. While, the Heimdallarchaeota (LC2 and LC3) large subunit 3c homologues have retained the C-XX-C motifs at the C-terminus, this motif is not conserved in most of the thor- and lokiarchaeal homologues. It thus needs to be confirmed that these Group 3c [NiFe]-hydrogenases can bind the [NiFe]

49 centers required to mediate H2 oxidation . In addition, Loki-, Odin- and Thorarchaeota as well as Heimdallarchaeum

12

LC3 contain gene clusters encoding [NiFe]-hydrogenases belonging to Group 3b (Suppl. Figure 3), which are bidirectional and may either evolve H2 using electrons from NADPH or recycle H2 to produce NADPH as reported for members of the Thermococcales50,51. While, the Group 3b enzyme complexes of Pyrococcus furiosus were shown to

52 also have sulfhydrogenase activity (i.e. to reduce elemental sulfur to H2S) in vitro , this does not seem to be their physiological function (51 and references therein). Finally, genomes of Heimdall- and Odinarchaeota also encode various membrane-bound Group 4 [NiFe]- hydrogenases, which are only distantly related to known Group 4 subgroups. In fact, the addition of homologues from novel archaeal genomes to our phylogenetic analyses, revealed that Group 4 hydrogenases are even more diverse than anticipated earlier53 and suggests that future efforts are needed to define novel subgroups. Currently, most of these archaeal Group 4 large subunit homologues, including those present in Asgard archaea, comprise novel clusters within the loosely defined respiratory H2-evolving 4g subgroup (Fig. 2). While, the exact function is unconfirmed, it has been suggested that the previously described Group 4g hydrogenases can couple ferredoxin oxidation to proton reduction in a simple electron chain that can contribute to the membrane potential and enhance overall energy yield. In order to get more insights into the potential function of group 4 [NiFe]-hydrogenases of Asgard archaea, we investigated the arrangement of the corresponding gene clusters. This revealed that the arrangement of their gene clusters (Suppl. Figure 3) is highly variable, in agreement with the large sequence diversity of the key subunit of these Group 4 [NiFe]-hydrogenases. However, all group 4 [NiFe]-hydrogenases seem to be membrane-anchored and ferredoxin represents the most likely co-substrate. Furthermore, the gene cluster architecture suggests, that at least two Group 4 hydrogenases of Odinarchaeum (OdinLCB4_14620, OdinLCB4_02560) and the Group 4 hydrogenase of Heimdallarchaeum LC2 (HeimC2_23020) can mediate sodium/proton translocation similar to Pyrococcus furiosus 54 as indicated by the presence of NuoL-like (also Mrp antiporter-like) membrane subunits (Suppl. Figure 3). Currently, the gene cluster that encodes the group 4 [NiFe]- hydrogenase of Heimdallarchaeum AB125 does not include genes encoding NuoL-like subunits (Suppl. Figure 3). However, in this regard, it has to be noted that the contig on which this gene cluster is located starts with the hypD gene (OLS33135). Furthermore, nuoL/M/N-like genes (e.g. OLS33159, OLS33160, OLS29628) are encoded towards the end of two other genomic contigs of this organism. This indicates that the gene cluster displayed in Suppl. Figure 3 is currently incomplete as a result of genome assembly that has split the hydrogenase gene cluster into separate contigs. Thus and based on the similarity to the large subunit of the Group 4 Hydrogenases of Heimdallarchaeum LC2 (Figure 2), it is most likely that the Group 4 hydrogenase of Heimdallarchaeum AB125 is also ion conductive and thus involved in energy conservation.

Complex I and alternative NADH:quinone reductases. Only genomes of Heimdallarchaeota and Thorarchaeota seem to encode a multisubunit NADH dehydrogenase (Complex 1) (Suppl. Table S2) including a homolog of NuoD (Fig. 2). However, candidates for an alternative NADH:quinone reductase are present in Loki- and Thorarchaeota. Genomes of members of these phyla encode several proteins with nitronate monooxygenase

13

domains (PF03060). It was recently reported that this protein family includes a new class of NADH:quinone reductases (PA1024)55. Interestingly, at least one PF03060 domain protein of each Loki- and Thorarchaeota is much more similar to the NADH:quinone reductases of PAO1 (PA1024) (ca. 40% AA identity) than to its nitronate monooxygenase (PA4202) (ca. 28% AA identity). This may suggest that PF03060 domain proteins in these Asgard archaea may represent novel types of quinone reductases and could comprise a NADH:quinone reductases, which could for example function in the re-oxidation of NAD(P)H generated during growth on organic substrates.

Reductive dehalogenases. Surprisingly, genomes of all Asgard archaea except for Odinarchaeum encode various proteins assigned to epoxyqueuosine reductases (COG01600), some of which have significant similarity to reductive dehalogenase domains (IPR028894) (Suppl. Table 2). In bacteria, reductive dehalogenases serve as

56 terminal reductases enabling the degradation of organohalides using H2, different organic substrates (e.g. pyruvate and acetate) or even NADPH as electron donors57. Organohalides such as halogenated phenols, dioxins, biphenyls, and aliphatic hydrocarbons represent persistent environmental pollutants and bacteria able to engage in dehalogenation of these compounds are of great ecological importance. Reductive dehalogenases are corrinoid Fe- S containing proteins and homologous to epoxyqueuosine reductases (COG01600)58. Notably, our phylogenetic analyses of this protein family using a previously described backbone dataset59 revealed that genomes of Loki-, Thor- and Heimdallarchaeum LC2 and LC3 as well as Theionarchaea each encode at least one putative bona fide reductive dehalogenase (Suppl. Figure 15, dark green) suggesting the ability to use organohalides as terminal electron acceptors. The function of other homologs belonging to the epoxyqueuosine reductase family and which lack IPR domains of canonical reductive dehalogenases (Suppl. Figure 15, light green box)58 is currently unclear, yet it may be speculated that some could represent novel electron acceptors with yet unknown the substrates. In organohalide-respiring bacteria, the reductive dehalogenase genes are generally encoded in an operon with a gene encoding a membrane anchoring protein. However, since homologues of these membrane anchors could not be detected in the Asgard archaea, it is difficult to determine if these reductive dehalogenases are membrane- associated or soluble enzymes. Therefore, it will be important to characterize some of these proteins biochemically to determine whether Asgard archaea can indeed engage in the degradation of specific pollutants and elucidate the substrate spectrum of these proteins. For example, in Loki- and Thorarchaeota, which appear to lack classical terminal reductases, these reductive dehalogenases could allow the re-oxidation of the quinols generated during β- oxidation and ETF.

Terminal oxidase and nitrate reductase in Heimdallarchaeota. Surprisingly, genomes of Heimdallarchaeote LC2 and LC3 encode terminal A-type heme-copper oxidases (Suppl. Figure 16a), which may allow the respective organisms to use oxygen as terminal electron acceptor. Furthermore, under anoxic conditions, the presence of a respiratory nitrate reductase (Suppl. Table 1) seems to enable the use of nitrate as terminal acceptor through anaerobic

14

phosphorylation. Notably, a closer investigation of the alpha subunits of the nitrate reductase (NarG) encoded by Heimdallarchaeum LC2 and LC3 genomes, revealed that they belong to the bacterial-type nitrate reductase family (IPR006468) and lack the TAT signal motif common in canonical archaeal-type nitrate reductases which are localized at the extracellular site of the cytoplasmic membrane. Thus, Nar of Heimdallarchaeota is likely located in the cytoplasm similar to the bacterial-type Nar60-62. Phylogenetic analyses of NarG revealed that the Heimdallarchaeota homologues form a monophyletic cluster with the bacterial-type nitrate reductase of Methanoperedens sp. BLZ1 (Suppl. Figure 16b) as well as with a homologue of Methylomirabilis oxyfera, Nitrolancea hollandica, Nitrococcus mobilis and a Nitrobacter sp. species. Most likely, the few archaeal genomes that encode a bacterial version of Nar (Suppl. Figure 16b) have acquired the corresponding genes independently from bacterial sources.

Other potential electron chain components. Besides the presence of terminal reductases and a Complex I in Heimdallarchaeum LC2 and LC3, the genomes of these organisms encode a membrane-bound succinate dehydrogenase/fumarate reductase (Complex II), as well as a putative electron transfer flavoprotein quinone:oxidoreductase complex (ETF-QO; composed of FixA, FixB, FixC and FixX), both of which may allow the reduction of quinone-species (Fig. 1; Suppl. Tables 1 and 2). While genomes of Loki- and Thorarchaeota also encode FixA and FixB homologues as well as candidate proteins for FixC (belonging to arCOG00570), the latter did not occur in a gene cluster with FixX homologues. It remains therefore unclear whether Loki- and Thorarchaeota have a complete membrane-associated ETF-QO. In this regard, it is notable that genomes of Loki- and Thorarchaeota encode a large amounts of homologues belonging to subunits of soluble heterodisulfide reductases comprised of HdrA, HdrB and HdrC as well as of the D subunit of membrane-bound heterodisulfide reductases (which also includes HdrE)9. However, similar to Methanomassiliicoccales63, genomes of Loki- and Thorarchaeota do not seem to encode homologues of HdrE, and the functional role of their HdrD homologues remains to be determined. For example, some of the loki- and thorarchaeal HdrD family homologues (arCOC00333) may be involved in the re-oxidation of quinol species generated by ETF during growth on organic substrates (e.g. β-oxidation) (Fig. 1). It also remains to be determined whether the succinate dehydrogenases/fumarate reductases of Loki- and Thorarchaeota are membrane-associated and could represent another means for the re-oxidation of quinol species (Fig. 1). Furthermore, genomes of most Asgard archaea, except for Odinarchaeoum, encode putative multihaem cytochromes (IPR011031) and/or cytochrome c-552/DMSO reductase-like, haem-binding domains (IPR019020), yet it is unclear whether these proteins play a role in electron transfer in these organisms. In two Thorarchaeota, the multihaeme cytochromes encode additional signature domains for cytochrome c552 (IPR003321), which occurs in formate-dependent cytochrome C nitrite reductases and catalyzes the reduction of nitrite to ammonia (Suppl. Tables 1, and 2)64. It remains to be investigated whether these thorarchaeal proteins, which are only distantly related to known nitrite reductases, could have similar functions. Most notably, Heimdallarchaeum LC2 and LC3 genomes code for cytochrome b561 domain proteins as well as a multitude of cupredoxin domain proteins, the latter of which may

15

represent important electron carriers (besides quinones) in these organisms and may shuttle reducing equivalents to their terminal oxidases (Fig. 1, Suppl. Table 2).

Menaquinone biosynthesis. All Asgard archaea genomes contain several genes encoding proteins potentially involved in ubiquinone/menaquinone biosynthesis. The set of genes is most complete for Heimdallarchaeota, supporting the idea that quinone species are components of their electron chains (Suppl. Table 1, Fig. 1). However, Heimdallarchaeum LC2 is the only representative of the Asgard archaea whose genome encodes homologues for most enzymes of the menaquinone biosynthesis pathway (Suppl. Tables 1 and 2; including MenB, MenF, MenD, MenE and candidates for MenA and MenG). For example, one of the steps is catalyzed by a member of the acyl-CoA synthetase (AMP-forming)/AMP-acid ligase II family. While genomes of all Asgard archaea encode representatives of this enzyme family (mainly functioning in β-oxidation), the Heimdallarchaeum LC2 genome uniquely encodes a homologue, which branches within the MenE clade in phylogenetic analyses and is annotated as a 2- succinylbenzoate-CoA ligase (Suppl. Figure 5). In contrast, the biosynthesis of menaquinones is less clear in Heimdallarchaeum LC3 and AB125. While they lack homologues of most canonical menaquinone biosynthesis genes, these genomes encode a homologue of MqnC, an enzyme of an alternative menaquinone biosynthesis pathway65. However, even though some of the genes annotated as ubiquinone biosynthesis genes may function in this alternative pathway, Heimdallarchaeum LC3 and AB125 seem to lack some enzymes characteristic of this pathway and the exact steps leading to menaquinone are therefore unclear in these latter organisms. Genomes of Loki- and Thorarchaeota encode only few candidate proteins for the biosynthesis of quinone species and the presence of quinones in these organisms needs to be investigated in future studies.

16

B. References 1 Zhang, Y., Zagnitko, O., Rodionova, I., Osterman, A. & Godzik, A. The FGGY carbohydrate kinase family: insights into the evolution of functional specificities. PLoS Comput Biol 7, e1002318- e1002318, doi:10.1371/journal.pcbi.1002318 10.1371/journal.pcbi.1002318. Epub 2011 Dec 22. (2011). 2 Say, R. F. & Fuchs, G. Fructose 1,6-bisphosphate aldolase/phosphatase may be an ancestral gluconeogenic enzyme. Nature 464, 1077-1081, doi:10.1038/nature08884 10.1038/nature08884. Epub 2010 Mar 28. (2010). 3 Brasen, C., Esser, D., Rauch, B. & Siebers, B. Carbohydrate metabolism in Archaea: current insights into unusual enzymes and pathways and their regulation. Microbiology and Molecular Biology Reviews 78, 89-175, doi:10.1128/MMBR.00041-13 10.1128/MMBR.00041-13. (2014). 4 Sousa, F. L., Neukirchen, S., Allen, J. F., Lane, N. & Martin, W. F. Lokiarchaeon is hydrogen dependent. Nature Microbiology 1, doi:Artn 16034 10.1038/Nmicrobiol.2016.34 (2016). 5 Seitz, K. W., Lazar, C. S., Hinrichs, K. U., Teske, A. P. & Baker, B. J. Genomic reconstruction of a novel, deeply branched sediment archaeal phylum with pathways for acetogenesis and sulfur reduction. The ISME Journal, doi:10.1038/ismej.2015.233 (2016). 6 Liu, Y. et al. Comparative genomic inference suggests mixotrophic lifestyle for Thorarchaeota. ISME J 12, 1021-1031, doi:10.1038/s41396-018-0060-x (2018). 7 Ragsdale, S. W. & Pierce, E. Acetogenesis and the Wood-Ljungdahl pathway of CO(2) fixation. Biochim Biophys Acta 1784, 1873-1898, doi:10.1016/j.bbapap.2008.08.012 (2008). 8 Adam, P. S., Borrel, G. & Gribaldo, S. Evolutionary history of carbon monoxide dehydrogenase/acetyl-CoA synthase, one of the oldest enzymatic complexes. Proceedings of the National Academy of Sciences of the United States of America 115, E1166-E1173, doi:10.1073/pnas.1716667115 10.1073/pnas.1716667115. Epub 2018 Jan 22. (2018). 9 Thauer, R. K., Kaster, A. K., Seedorf, H., Buckel, W. & Hedderich, R. Methanogenic archaea: ecologically relevant differences in energy conservation. Nat Rev Microbiol 6, 579-591, doi:10.1038/nrmicro1931 10.1038/nrmicro1931. Epub 2008 Jun 30. (2008). 10 Schuchmann, K. & Muller, V. Energetics and application of heterotrophy in acetogenic bacteria. Appl Environ Microbiol, doi:10.1128/AEM.00882-16 (2016). 11 Borrel, G., Adam, P. S. & Gribaldo, S. Methanogenesis and the Wood-Ljungdahl Pathway: An Ancient, Versatile, and Fragile Association. Genome Biol Evol 8, 1706-1711, doi:10.1093/gbe/evw114 (2016). 12 Krzycki, J. A. Function of genetically encoded pyrrolysine in corrinoid-dependent methylamine methyltransferases. Curr Opin Chem Biol 8, 484-491, doi:10.1016/j.cbpa.2004.08.012 10.1016/j.cbpa.2004.08.012. (2004). 13 Ticak, T., Kountz, D. J., Girosky, K. E., Krzycki, J. A. & Ferguson Jr, D. J. A nonpyrrolysine member of the widely distributed trimethylamine methyltransferase family is a glycine betaine methyltransferase. Proceedings of the National Academy of Sciences of the United States of America 111, E4668-4676, doi:10.1073/pnas.1409642111 10.1073/pnas.1409642111. Epub 2014 Oct 13. (2014). 14 Wang, S., Chen, Y., Cao, Q. & Lou, H. Long-Lasting Gene Conversion Shapes the Convergent Evolution of the Critical Methanogenesis Genes. G3 (Bethesda) 5, 2475-2486, doi:10.1534/g3.115.020180 10.1534/g3.115.020180. (2015).

17

15 Hippler, B. & Thauer, R. K. The energy conserving methyltetrahydromethanopterin:coenzyme M methyltransferase complex from methanogenic archaea: function of the subunit MtrH. FEBS Lett 449, 165-168 (1999). 16 Harms, U. & Thauer, R. K. Identification of the active site histidine in the corrinoid protein MtrA of the energy-conserving methyltransferase complex from Methanobacterium thermoautotrophicum. Eur J Biochem 250, 783-788 (1997). 17 Estelmann, S. et al. Carbon dioxide fixation in 'Archaeoglobus lithotrophicus': are there multiple autotrophic pathways? Fems Microbiology Letters 319, 65-72, doi:10.1111/j.1574- 6968.2011.02268.x 10.1111/j.1574-6968.2011.02268.x. Epub 2011 Apr 4. (2011). 18 Tabita, F. R., Satagopan, S., Hanson, T. E., Kreel, N. E. & Scott, S. S. Distinct form I, II, III, and IV Rubisco proteins from the three kingdoms of life provide clues about Rubisco evolution and structure/function relationships. J Exp Bot 59, 1515-1524, doi:10.1093/jxb/erm361 10.1093/jxb/erm361. Epub 2008 Feb 16. (2008). 19 Solden, L., Lloyd, K. & Wrighton, K. The bright side of microbial dark matter: lessons learned from the uncultivated majority. Curr Opin Microbiol 31, 217-226, doi:10.1016/j.mib.2016.04.020 (2016). 20 Sato, T., Atomi, H. & Imanaka, T. Archaeal type III RuBisCOs function in a pathway for AMP metabolism. Science 315, 1003-1006, doi:10.1126/science.1135999 10.1126/science.1135999. (2007). 21 Kono, T. et al. A RuBisCO-mediated carbon metabolic pathway in methanogenic archaea. Nat Commun 8, 14007-14007, doi:10.1038/ncomms14007 10.1038/ncomms14007. (2017). 22 Stams, A. J. & Plugge, C. M. Electron transfer in syntrophic communities of anaerobic bacteria and archaea. Nat Rev Microbiol 7, 568-577, doi:10.1038/nrmicro2166 (2009). 23 Jorgensen, S. L. et al. Correlating microbial community profiles with geochemical data in highly stratified sediments from the Arctic Mid-Ocean Ridge. Proceedings of the National Academy of Sciences of the United States of America 109, E2846-2855, doi:10.1073/pnas.1207574109 (2012). 24 Jorgensen, S. L., Thorseth, I. H., Pedersen, R. B., Baumberger, T. & Schleper, C. Quantitative and phylogenetic study of the Deep Sea Archaeal Group in sediments of the Arctic mid-ocean spreading ridge. Front Microbiol 4, 299-299, doi:10.3389/fmicb.2013.00299 (2013). 25 Janecek, S. & Blesak, K. Sequence-structural features and evolutionary relationships of family GH57 alpha-amylases and their putative alpha-amylase-like homologues. Protein J 30, 429-435, doi:10.1007/s10930-011-9348-7 10.1007/s10930-011-9348-7. (2011). 26 Rawlings, N. D., Barrett, A. J. & Finn, R. Twenty years of the MEROPS database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Research 44, D343-350, doi:10.1093/nar/gkv1118 10.1093/nar/gkv1118. Epub 2015 Nov 2. (2016). 27 Conners, S. B. et al. An expression-driven approach to the prediction of carbohydrate transport and utilization regulons in the hyperthermophilic bacterium Thermotoga maritima. J Bacteriol 187, 7267-7282, doi:10.1128/JB.187.21.7267-7282.2005 10.1128/JB.187.21.7267-7282.2005. (2005). 28 Levisson, M., van der Oost, J. & Kengen, S. W. Characterization and structural modeling of a new type of thermostable esterase from Thermotoga maritima. FEBS J 274, 2832-2842, doi:10.1111/j.1742-4658.2007.05817.x 10.1111/j.1742-4658.2007.05817.x. Epub 2007 Apr 27. (2007).

18

29 Koga, Y., Nishihara, M., Morii, H. & Akagawa-Matsushita, M. Ether polar lipids of methanogenic bacteria: structures, comparative aspects, and biosyntheses. Microbiol Rev 57, 164-182 (1993). 30 Pugh, E. L., Wassef, M. K. & Kates, M. Inhibition of fatty acid synthetase in cutirubrum and Escherichia coli by high salt concentrations. Canadian journal of biochemistry 49, 953-958 (1971). 31 Falb, M. et al. in (2008). 32 Lombard, J., López-García, P. & Moreira, D. The early evolution of lipid membranes and the three domains of life. Nature Reviews Microbiology 10, 507-515, doi:10.1038/nrmicro2815 (2012). 33 Dibrova, D. V., Galperin, M. Y. & Mulkidjanian, A. Y. Phylogenomic reconstruction of archaeal fatty acid metabolism. Environ Microbiol 16, 907-918, doi:10.1111/1462-2920.12359 (2014). 34 Lombard, J., López-García, P. & Moreira, D. An ACP-independent fatty acid synthesis pathway in archaea: Implications for the origin of . Molecular Biology and Evolution, doi:10.1093/molbev/mss160 (2012). 35 Swigonova, Z., Mohsen, A. W. & Vockley, J. Acyl-CoA dehydrogenases: Dynamic history of protein family evolution. J Mol Evol 69, 176-193, doi:10.1007/s00239-009-9263-0 10.1007/s00239-009-9263-0. Epub 2009 Jul 29. (2009). 36 He, M. et al. Identification and characterization of new long chain acyl-CoA dehydrogenases. Mol Genet Metab 102, 418-429, doi:10.1016/j.ymgme.2010.12.005 (2011). 37 Khelifi, N. et al. Anaerobic oxidation of long-chain n-alkanes by the hyperthermophilic sulfate- reducing archaeon, Archaeoglobus fulgidus. The ISME Journal 8, 2153-2166, doi:10.1038/ismej.2014.58 10.1038/ismej.2014.58. Epub 2014 Apr 24. (2014). 38 Müh, U., Çinkaya, I., Albracht, S. P. J. & Buckel, W. 4-Hydroxybutyryl-CoA dehydratase from aminobutyricum: Characterization of FAD and iron-sulfur clusters involved in an overall non- redox reaction. Biochemistry, doi:10.1021/bi9601363 (1996). 39 Spang, A. et al. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature 521, 173-+, doi:10.1038/nature14447 (2015). 40 Clomburg, J. M., Vick, J. E., Blankschien, M. D., Rodríguez-Moyá, M. & Gonzalez, R. A synthetic biology approach to engineer a functional reversal of the β-oxidation cycle. ACS Synthetic Biology, doi:10.1021/sb3000782 (2012). 41 Lian, J. & Zhao, H. Reversal of the β-oxidation cycle in saccharomyces cerevisiae for production of fuels and chemicals. ACS Synthetic Biology, doi:10.1021/sb500243c (2015). 42 Callaghan, A. V. Enzymes involved in the anaerobic oxidation of n-alkanes: from methane to long- chain paraffins. Front Microbiol 4, 89-89, doi:10.3389/fmicb.2013.00089 10.3389/fmicb.2013.00089. eCollection 2013. (2013). 43 Shisler, K. A. & Broderick, J. B. Glycyl radical activating enzymes: structure, mechanism, and substrate interactions. Arch Biochem Biophys 546, 64-71, doi:10.1016/j.abb.2014.01.020 10.1016/j.abb.2014.01.020. Epub 2014 Jan 31. (2014). 44 Leuthner, B. et al. Biochemical and genetic characterization of benzylsuccinate synthase from Thauera aromatica: a new glycyl radical enzyme catalysing the first step in anaerobic toluene metabolism. Mol Microbiol 28, 615-628 (1998). 45 Craciun, S. & Balskus, E. P. Microbial conversion of choline to trimethylamine requires a glycyl radical enzyme. Proceedings of the National Academy of Sciences of the United States of America 109, 21307-21312, doi:10.1073/pnas.1215689109 10.1073/pnas.1215689109. Epub 2012 Nov 14. (2012). 46 Selmer, T. & Andrei, P. I. p-Hydroxyphenylacetate decarboxylase from Clostridium difficile. A novel glycyl radical enzyme catalysing the formation of p-cresol. Eur J Biochem 268, 1363-1372 (2001).

19

47 Jarling, R. et al. Stereochemical investigations reveal the mechanism of the bacterial activation of n-alkanes without oxygen. Angew Chem Int Ed Engl 51, 1334-1338, doi:10.1002/anie.201106055 10.1002/anie.201106055. Epub 2011 Nov 30. (2012). 48 Sondergaard, D., Pedersen, C. N. & Greening, C. HydDB: A web tool for hydrogenase classification and analysis. Sci Rep 6, 34212-34212, doi:10.1038/srep34212 10.1038/srep34212. (2016). 49 Buckel, W. & Thauer, R. K. Energy conservation via electron bifurcating ferredoxin reduction and proton/Na(+) translocating ferredoxin oxidation. Biochim Biophys Acta 1827, 94-113, doi:10.1016/j.bbabio.2012.07.002 10.1016/j.bbabio.2012.07.002. Epub 2012 Jul 16. (2013). 50 Kanai, T. et al. Distinct physiological roles of the three [NiFe]-hydrogenase orthologs in the hyperthermophilic archaeon Thermococcus kodakarensis. J Bacteriol 193, 3109-3116, doi:10.1128/JB.01072-10 10.1128/JB.01072-10. Epub 2011 Apr 22. (2011). 51 Jenney Jr, F. E. & Adams, M. W. Hydrogenases of the model . Ann N Y Acad Sci 1125, 252-266, doi:10.1196/annals.1419.013 10.1196/annals.1419.013. (2008). 52 Ma, K., Weiss, R. & Adams, M. W. Characterization of hydrogenase II from the hyperthermophilic archaeon Pyrococcus furiosus and assessment of its role in sulfur reduction. J Bacteriol 182, 1864- 1871 (2000). 53 Greening, C. et al. Genomic and metagenomic surveys of hydrogenase distribution indicate H2 is a widely utilised energy source for microbial growth and survival. The ISME Journal 10, 761-777, doi:10.1038/ismej.2015.153 10.1038/ismej.2015.153. Epub 2015 Sep 25. (2016). 54 Yu, H. et al. Structure of an Ancient Respiratory System. Cell 173, 1636-1649 e1616, doi:10.1016/j.cell.2018.03.071 (2018). 55 Ball, J., Salvi, F. & Gadda, G. Functional Annotation of a Presumed Nitronate Monoxygenase Reveals a New Class of NADH:Quinone Reductases. J Biol Chem 291, 21160-21170, doi:10.1074/jbc.M116.739151 10.1074/jbc.M116.739151. Epub 2016 Aug 8. (2016). 56 Bommer, M. et al. Structural basis for organohalide respiration. Science 346, 455-458, doi:10.1126/science.1258118 10.1126/science.1258118. Epub 2014 Oct 2. (2014). 57 Jugder, B. E., Ertan, H., Lee, M., Manefield, M. & Marquis, C. P. Reductive Dehalogenases Come of Age in Biological Destruction of Organohalides. Trends Biotechnol 33, 595-610, doi:10.1016/j.tibtech.2015.07.004 10.1016/j.tibtech.2015.07.004. (2015). 58 Miles, Z. D., McCarty, R. M., Molnar, G. & Bandarian, V. Discovery of epoxyqueuosine (oQ) reductase reveals parallels between halorespiration and tRNA modification. Proceedings of the National Academy of Sciences of the United States of America 108, 7368-7372, doi:10.1073/pnas.1018636108 10.1073/pnas.1018636108. Epub 2011 Apr 18. (2011). 59 Hug, L. A. et al. Overview of organohalide-respiring bacteria and a proposal for a classification system for reductive dehalogenases. Philos Trans R Soc Lond B Biol Sci 368, 20120322-20120322, doi:10.1098/rstb.2012.0322 10.1098/rstb.2012.0322. Print 2013 Apr 19. (2013).

20

60 Arshad, A. et al. A Metagenomics-Based Metabolic Model of Nitrate-Dependent Anaerobic Oxidation of Methane by Methanoperedens-Like Archaea. Front Microbiol 6, 1423-1423, doi:10.3389/fmicb.2015.01423 10.3389/fmicb.2015.01423. eCollection 2015. (2015). 61 Bertero, M. G. et al. Insights into the respiratory electron transfer pathway from the structure of nitrate reductase A. Nat Struct Biol 10, 681-687, doi:10.1038/nsb969 10.1038/nsb969. Epub 2003 Aug 10. (2003). 62 Martinez-Espinosa, R. M. et al. Look on the positive side! The orientation, identification and bioenergetics of 'Archaeal' membrane-bound nitrate reductases. Fems Microbiology Letters 276, 129-139, doi:10.1111/j.1574-6968.2007.00887.x 10.1111/j.1574-6968.2007.00887.x. Epub 2007 Sep 20. (2007). 63 Kroninger, L., Berger, S., Welte, C. & Deppenmeier, U. Evidence for the involvement of two heterodisulfide reductases in the energy-conserving system of Methanomassiliicoccus luminyensis. FEBS J 283, 472-483, doi:10.1111/febs.13594 (2016). 64 Einsle, O. et al. Structure of cytochrome c nitrite reductase. Nature 400, 476-480, doi:10.1038/22802 10.1038/22802. (1999). 65 Hiratsuka, T. et al. An alternative menaquinone biosynthetic pathway operating in microorganisms. Science 321, 1670-1673, doi:10.1126/science.1160446 10.1126/science.1160446. (2008). 66 Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research 25, 1043-1055, doi:10.1101/gr.186072.114 10.1101/gr.186072.114. Epub 2015 May 14. (2015). 67 Bowers, R. M. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol 35, 725-731, doi:10.1038/nbt.3893 (2017). 68 Neumann, A., Wohlfarth, G. & Diekert, G. Tetrachloroethene dehalogenase from Dehalospirillum multivorans: cloning, sequencing of the encoding genes, and expression of the pceA gene in Escherichia coli. J Bacteriol 180, 4140-4145 (1998). 69 Wrighton, K. C. et al. RubisCO of a nucleoside pathway known from Archaea is found in diverse uncultivated phyla in bacteria. The ISME Journal 10, 2702-2714, doi:10.1038/ismej.2016.53 10.1038/ismej.2016.53. Epub 2016 May 3. (2016).

21

C Supplementary Tables and Figures

Suppl. Table 1: Overview of presence/absence of discussed enzymes in Asgard lineages Suppl. Table 2: Annotations for proteins, which serve as candidate enzymes potentially involved in the various metabolic pathways discussed throughout this manuscript. Suppl. Table 3: Automatic annotation of all genes Suppl. Table 4: Carbohydrate active enzymes, peptidases, esterases and information on extracellular localization Suppl. Table 5: Annotation of beta-oxidation genes encoded by Asgard genomes per protein family/phylogeny.

Please note that all Suppl. Tables are provided in two separate excel files referred to as “S-Tables_1-4.xlsx” (includes Suppl. Tables 1-4) and “S-Table5.xlsx” (includes Suppl. Table 5; with each sheet corresponding to one particular protein family of enzymes involved in β-oxidation)

22

Thorarchaeota SMTZ1−45 Thorarchaeota SMTZ−83 Thorarchaeota AB_25 Odinarchaeota LCB4 Lokiarchaeum sp. GC14_75 Lokiarchaeota CR_4 Heimdallarchaeota LC_3 Heimdallarchaeota LC_2 Heimdallarchaeota AB_125

100 50 0 50 100

% contamination, % completeness % redundancy

Suppl. Figure 1: Illustration of the quality of metagenome-assembled genomes (MAGs) of Asgard archaea Barplot visualising the estimated completeness and contamination/redundancy values obtained with checkM66 for the nine Asgard MAGs analysed in this work. The dashed line indicates the thresholds for contamination and completeness cutoffs representing high-quality MAGs according to the recently established metagenomic standards67. Two of the MAGs are of high quality according this criteria (Odinarchaeote LCB4 and Thorarchaeote AB_25), while the other MAGs represent medium quality bins. Purple bars represent the fraction of the contamination values (light blue bars) that is due to strain heterogeneity (redundant markers that share at least 90% of amino acid identity). As previously described in Spang et al. 201539, the Lokiarchaeum sp. GC14_75 MAG contains sequences from closely related strains, which explains the high contaminations/redundancy values observed in this MAG.

23

a

Central Carbon Metabolism

3.1.1.31 6P-D-glucono- D-gluconate-6P 1,5-lactone NUCLEOTIDE Lokiarchaeum sp GC14_75 2.7.6.1 Glucose 1.1.1.44 D-Ribose-5P P-Ribose- BIOSYNTHESIS Lokiarchaeote CR_4 1.1.1.49 Pyrophosphate 2.7.1.1/22 5.3.1.6 HCHO Thorarchaeote AB_25 Thorarchaeote WOR_45 ARCHEAL LIPID Ribulose-5P NUDIX Glucose-6P Thorarchaeote WOR_83 BIOSYNTHESIS 5.1.3.1 5.3.1.6 4.1.2.43 hydrolase? Heimdallarchaeote AB_125 5.3.1.8/9 Heimdallarchaeote LC_2 D-Xylulose-5P D-Ribose-5P Fructose-6P 2.7.4.23 Heimdallarchaeote LC_3 2.2.1.1 (D-Arabino-)3- 2.7.1.11 hexulose-6P Odinarchaeote LCB_4 Glycerol-1P 3.1.3.11 2.7.1.146 3.1.3.11/ 2.2.1.1 D-Sedo- Glyceraldehyde- 4.1.2.13 Fructose-1.6-bi-P heptulose-7P 3P 5.3.1.27 4.1.2.13 2.2.1.2 Glycerone-P 5.3.1.1 Glyceraldehyde-3P D-Erythrose-4P Fructose-6P Ribose-1P 1.2.1.12 base P 2.4.2.57 i 1.2.1.59 Ribose-1,5-2P NMP Nucleosides 1.2.7.6 1,3-bi-P-glycerate 1 1.2.1.9 5.3.1.29 2.7.2.3 CO2 Trimethyl glycine amine ? 4.1.1.39 3-P-glycerate Ribulose 1,5-2P 1.2.1.43 CO2 betaine ? Monomethyl Formate amine ? 5.4.2.1-1/2 1.2.7.12 6.3.4.3 2-P-glycerate Formyl-MF corrinoid protein corrinoid protein Formyl-H4F 2.1.1.250 2.1.1.248 4.2.1.11 2.3.1.101 3.5.4.9 Formyl-H4 PEP ? ? 4.1.1.49 Methenyl-H4 3.5.4.27 4.1.1.32 2.7.9.1/2 1.5.1.5 Methenyl-H4MPT methylated corrinoid protein 4.1.1.31 2.7.1.40 Formate ? 1.1.1.38/40 Methylene-H4F Pyruvate Acetyl-CoA ? 1.5.98.1 CoM 2.3.1.54 MEVALONATE 1.5.1.20 Methylene-H4MPT 1.2.7.-(1) PATHWAY 1.2.4.1/ 2.3.1.12 Methyl-H4F 1.5.98.2 IPR000257 arCOG02410 hydrocarbons 2.3.1.8/9 Acetyl-CoA Acetoacetyl-CoA Methyl-H4MPT corrinoid protein fatty acids BETA OXIDATION 2.1.1.258 2.3.3.1 2.1.1.245 2.1.1.86 cdhD, E 2.3.3.8 Oxaloacetate Citrate Oxaloacetate (mtrA,H only) Methyl-CoM CO2 1.2.7.4 CO Methyl-CoFeSP 1.1.1.37 4.2.1.3 Acetyl-CoA cdhA, cdhB cdhA,B,C,D,E Malate Cis-aconitate CoB 4.2.1.2 4.2.1.3 Methyl-H4F ? TCA 2.8.4.1 Fumarate Isocitrate 2.3.1.169 cdhC 1.3.5.1/4 CoB-CoM Heterodisulfide 1.1.1.41/2 ATP ADP Succinate 2-Oxoglutarate Pi Acetate Acetyl-CoA Methane 6.2.1.5 1.2.7.-(3) arCOG01340 Succinyl-CoA 1.2.4.1/ 2.3.1.12

b Hydrocarbon/ fatty acid metabolism

Acetyl-CoA COG0183-Mevm m m m m m m Hydrocarbon? Acetoacetyl-CoA Fumarate COG3425m Lokiarchaeum sp GC14_75 COG01882 Lokiarchaeote CR_4 Fatty acid 3-hydroxy-3-methylglutaryl-CoA Methylalkyl-succinate? (e.g., Hexadecanoate,n=16) Thorarchaeote AB_25 COG1257 COG0318* Thorarchaeote WOR_45 Mevalonate Thorarchaeote WOR_83 Acyl-CoA (C n) Acyl-CoA (Cn-2 ) Heimdallarchaeote AB_125 COG1577 Heimdallarchaeote LC_2 COG1960* or 2 4 4 4 3 1 6 5 0 Acetyl-CoA COG2368* Mevalonate-5P Heimdallarchaeote LC_3 -oxidation

β Odinarchaeote LCB_4 Acyl-2-enoyl-CoA ? Isopentyl-P 2 2 1 1 1 1 2 2 0 COG1024* COG0183-FA* COG1608 3-hydroxy-acyl-CoA # Paralogue count Isopentyl-di-P * * * * * * Genetic neighbourhood 2 1 1 1 1 1 1 1 0 COG1250* COG1304 * fatty acid metabolism or COG1443 m mevalonate metabolism Aceto-acyl-CoA DMAPP Fatty acid biosynthesis

ISOPRENOID METABOLISM

Suppl. Figure 2: Metabolic map of central carbon and of Asgard archaea For each enzyme, the corresponding enzyme commission (E.C.) or COG are provided and can be cross referenced in Suppl. Table 1 and 2. Coloured circles indicate the genomes in which homologues were detected, Lokiarchaeota (shades of purple), Thorarchaeota (shades of green), Heimdallarchaeota (shades of orange) and Odinarchaeota

24

(blue). a, Central carbon metabolism in Asgard archaea. b, Proposed CoA-coupled β-oxidation and fatty acid biosynthesis pathways of the Asgard archaea. For β-oxidation (counter-clockwise), fatty acids (e.g., Hexadecanoate) are activated with a long chain fatty acid CoA ligase (COG0318) generating an acyl-CoA species which is oxidized by a chain-length specific acyl-CoA dehydrogenase (COG1960) or 4-hydroxybutyryl-CoA dehydratase (COG2368). The resulting acyl-2-enoyl-CoA is further oxidized by enoyl-CoA hydratase (COG1024) and 3-hydroxy-acyl-CoA dehydrogenase (COG1250) to generate aceto-acyl-CoA which is ultimately cleaved by a B-ketothiolase/acetyl-CoA acetyltransferase (COG0183-FA) to generate acetyl-CoA and a Cn-2 acyl-CoA chain. Asterisks indicate that at least one B-ketothiolase/acetyl-CoA orthologue in the represented genome was encoded within 15 genes of one or more genes related to fatty acid metabolism (COG0318, COG1960, COG2368, COG1024, or COG1250). For complex protein families, the number of paralogues identified are shown in each circle. Fatty acid biosynthesis is depicted as the reverse of β-oxidation (clockwise) using CoA33 and not acyl-carrier protein as an acyl carrier. c, Proposed mevalonate biosynthesis pathway. Acetyl-CoA is consecutively acetylated by the actions of B-ketothiolase/acetyl-CoA acetyltransferase (COG0183-Mev) and 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) synthase (COG3425) to produce HMG-CoA which in turn is converted to mevalonate by HMG-CoA reductase (COG1257). Mevalonate is phosphorylated by mevalonate kinase (COG1577) and converted to isopentyl-phosphate by an unknown enzyme. Isopentyl-phosphate is phosphorylated by isopentyl-phosphate kinase (COG1608) to isopentyl-di-phosphate which is isomerized to dimethylallyl pyrophosphate (DMAPP) by isopentyl-diphosphate isomerase 1 (COG1443) or 2 (COG1304). ‘m’ indicates that the COG0183 orthologue was encoded within 15 genes of the HMG-CoA synthase (COG3425). Greyed out text indicates: presence unclear, putative.

25

Group 4 [NiFe]-hydrogenase

Ca. Odinarchaeota archaeon LCB_4 – Group 4 [NiFe]-hydrogenase (OdinLCB4_14520) -18211 OLS18212 OLS18213 -18214 ‘nuoI’ ‘nuoH’ ‘nuoD’ ‘nuoB’

Ca. Odinarchaeota archaeon LCB_4 – Group 4 [NiFe]-hydrogenase (OdinLCB4_14620) -18220 OLS18221 OLS18222 OLS18223 OLS18224 OLS18226 18228 OLS18230 18225 18227 -18229 ‘nuoB’ ‘nuoI’ ‘nuoH’ ‘nuoD’ ‘nuoL’* hdrB ‘hybA’ tRNA-synthetase OLS18233 -8234 -8236 OLS18237 OLS18238 OLS18239 -8235 -18231 OLS18232 put. ATPase ‘nuoJ’ ‘nuoC’ ‘nuoL’ ‘nuoL’* ‘nuoL’*

Ca. Odinarchaeota archaeon LCB_4 – Group 4 [NiFe]-hydrogenase (OdinLCB4_02560) -18549 OLS18550 OLS18552 -18554 OLS18555 OLS18556 -8557 OLS18551 OLS18553 ‘nuoC’ ‘nuoL’ ‘nuoL’* ‘nuoB’ ‘nuoD’ ‘nuoH’ ‘nuoI’

Ca. Heimdallarchaeota archaeon AB_125 – Group 4 [NiFe]-hydrogenase (HeimAB125_01550) 1 OLS33135 -136 OLS33137 -33138 OLS33139 -33140 -33141 / hypD hypC ‘nuoC’ ‘nuoD’ ‘nuoI’ ‘nuoB’

Ca. Heimdallarchaeota archaeon LC_2 – Group 4 [NiFe]-hydrogenase (HeimC2_23020) OLS24348 OLS24349 OLS24350 -4351 -4353 -4354 OLS24355 OLS24356 -24357 -24347 hupD ‘nuoL’* ‘nuoL’ ‘nuoK’ -4352 ‘nuoJ’ ‘nuoI’ ‘nuoH’ ‘nuoD’ ‘nuoB’

Group 3c [NiFe]-hydrogenase

Ca. Lokiarchaeota archaeon GC14_75 – partial Group 3c [NiFe]-hydrogenase (Lokiarch_45490) KKK41105 -41106 KKK41107 mvhA hdrC hdrB

Ca. Lokiarchaeota archaeon GC14_75 – Group 3c [NiFe]-hydrogenase (Lokiarch_49310) KKK40681 KKK40682 KKK40683 (Polyferredoxin – FeS subunit fusion) mvhA mvhG mvhB / mvhD

Ca. Lokiarchaeota archaeon CR_4 – Group 3c [NiFe]-hydrogenase (RBG13Loki_1669) OLS14701 OLS14702 OLS14703 mvhB / mvhD mvhG mvhA

Ca. Thorarchaeota archaeon AB_25 – Group 3c [NiFe]-hydrogenase (ThorAB25_16580) OLS29367 OLS29368 -29369 mvhA mvhG mvhD

Ca. Thorarchaeota archaeon AB_25 – Group 3c [NiFe]-hydrogenase (ThorAB25_26490) OLS21753 OLS21754 mvhA mvhG

Ca. Thorarchaeota archaeon WOR_45 – Group 3c [NiFe]-hydrogenase (AM325_14150) KXH73312 KXH73313 -73315 KXH73316 KXH73317 -73318 -314 mvhA mvhG mvhD hdrA hdrB hdrC

Ca. Thorarchaeota archaeon WOR_83 – Group 3c [NiFe]-hydrogenase (AM324_08070) -71840 KXH71841 KXH71842 KXH71843 -71844 KXH71846 KXH71847 -845 hdrC hdrB hdrA / gtdD hdrA mvhD mvhG mvhA

Ca. Heimdallarchaeota archaeon LC_2 – Group 3c [NiFe]-hydrogenase (HeimC2_23150) -4365 -4366 OLS24367 -24368 OLS24369 OLS24370 -71 -24372 hupD hypA hypB hdrC mvhA mvhG mvhD mvhG

Ca. Heimdallarchaeota archaeon LC_3 – Group 3c [NiFe]-hydrogenase (HeimC3_04360) OLS27261 OLS27262 -27263 OLS27264 OLS27265 -27266 hdrC hdrA mvhD mvhG mvhA hupD

Group 3b [NiFe]-hydrogenase

Ca. Lokiarchaeota archaeon GC14_75 – partial Group 3b [NiFe]-hydrogenase (Lokiarch_31190) KKK42945 hyhL

Ca. Odinarchaeota archaeon LCB_4 – Group 3b [NiFe]-hydrogenase (OdinLCB4_08000) OLS17582 OLS17583 OLS17584 OLS17585 Hydrogenase, Hydrogenase, hyhL hyhS hyhG hyhB large subunit small subunit

Ca. Thorarchaeota archaeon WOR_45 – Group 3b [NiFe]-hydrogenase (AM325_05810) Iron-sulfur subunits Maturation factors KXH74488 KXH74489 KXH74490 KXH74491 hyhB hyhG hyhS hyhL Transmembrane H+/Na+ transporter Ca. Thorarchaeota archaeon WOR_83 – Group 3b [NiFe]-hydrogenase (AM324_16110) modules subunits KXH73816 KXH73817 KXH73818 KXH73819 hyhB hyhG hyhS hyhL Heterodisulfide Diaphorase subunit reductase Ca. Heimdallarchaeota archaeon LC_3 – Group 3b [NiFe]-hydrogenase (HeimC3_04260) hypothetical OLS27427 OLS27428 OLS27429 OLS27430 Other proteins hyhL hyhS hyhG hyhB proteins

26

Suppl. Figure 3: Structure of [NiFe]-hydrogenase gene clusters. Genes were classified based on their top hits to the Conserved Domain Database (CDD) webserver and comparison of their sequences with those of hydrogenase operons represented in the hydrogenase database (HydDB). All genes are drawn to scale. Short gene names are given in italic and are shown in quotes whenever they are distantly related to known proteins. For group 4 [NiFe]- hydrogenases, the homologous genes to Nuo (Complex I; NADH-quinone oxidoreductase) are shown, with the more divergent nuoL homologs asterisked. Abbreviations are as follows: hdr, heterodisulfide reductase; hyh, hydrogenase (group 3b [NiFe]-hydrogenase); mvh, methyl-viologen-reducing hydrogenase (group 3c [NiFe]-hydrogenase); hyp, Hydrogenase maturation factor; nuo, NADH dehydrogenase. 1Note that this gene cluster is located at the end of the contig and thus incomplete. NuoL-like subunits are encoded on other short contigs, which are part of the Heimdallarchaeum AB125 genome. Thus, the encoded Group 4 [NiFe]-hydrogenase of this organism is most likely also ion conductive. Please note that also other gene clusters could be incomplete due to their location at the end of a contig.

27

Heimdallarchaeota Lokiarchaeota Odinarchaeota Thorarchaeota 20 AA CBM CE GH GT

15 PL

C M P S

T Peptidases CAZYmes 10 U

Esterases Hits/Average Protein Nr Protein Hits/Average

5

0

CAZymes Esterases CAZymes Esterases CAZymes Esterases CAZymes Esterases Peptidases Peptidases Peptidases Peptidases

Suppl. Figure 4: Abundance of carbohydrate-active enzymes, peptidases and esterases in the Asgard genomes Carbohydrate-active enzymes (CAZYmes), peptidases and esterases were identified in Asgard genomes using the dbCAN webserver, MEROPs and ESTER databases respectively (see Method section for details). Count data was averaged across the number of genomes represented by each phylum and normalized by the average number of proteins detected in each phylum. AA, auxiliary activities; CBM, carbohydrate-binding module; CE, carbohydrate esterase; GH, glycoside hydrolase; GT, glycosyltransferases; PL, polysaccharide lyase. A, aspartic peptidase; C, cysteine peptidase; M, metallopeptidase; P, mixed peptidase; S, serine peptidases; T, mixed catalytic type; U, Unknown Catalytic Type. Count data is summarized in Suppl. Table 4.

28

Hexadecanoate COG0318 Acyl-CoA (C ) Acyl-CoA (Cn+2 ) n COG1960* or Acetyl-CoA Very long chain fatty acyl CoA ligase Archaea COG2368 -oxidation

β COG0318, clade 1 (of 1) Acyl-2-enoyl-CoA Bacteria COG1024* 516 taxa, 287 sites COG0183* - FA UF IQTREE bootstraps Asgards 3-hydroxy-acyl-CoA LG+C20+G+F Eukaryotes COG1250*

Aceto-acyl-CoA Fatty acid biosynthesis

92 100 Chondromyces crocatus WP_0504311681 Bacteria Nannocystis exedens SFF350521 Bacteria 100 Caldilinea aerophila WP_0144337841 Bacteria 100 thermophilum WP_0588661301 Bacteria 54 100 Candidatus Contendobacter WP_0344341891 100 bacterium KXK197431 Candidatus Thorarchaeota archaeon SMTZ145 KXH73531 100 Thermofilum carboxyditrophus WP_052886938 Archaea archaeon ex4484 74 OYT34602 100 Chlorobium sp KER095701 Bacteria 99 100 abyssi WP_0069266951 Bacteria 100 49 Lewinella cohaerens WP_0205344091 Bacteria Phaeodactylibacter xiamenensis WP_0442235351 48 100 96 Rhodothermus profundi SHK841541 Bacteria Salisaeta longa WP_0228345931 Bacteria 100 100 Ignavibacterium album WP_0145593071 Bacteria bacterium BRHc32 KUO606591 Bdellovibrio bacteriovorus WP_0618345311 Bacteria bacterium OGP594941 Bacteria 81 94 paucihalophilus WP_007977237 Archaea amylolytica WP_008309598 Archaea 79 Halobacterium sp DL1 WP_009761306 Archaea 49 100 100 tibetense WP_006090990 Archaea 100 nitratireducens WP_006672845 Archaea 100 Halovivax ruber XH70 Archaea litoreum WP_008368109 Archaea 100 Nitrococcus mobilis WP_0050006011 Bacteria 96 Candidatus Acetothermus BAL599491 100 66 Clostridiisalibacter paucivorans WP_0515315021 Bacteria Clostridium ultunense WP_0256404461 Bacteria 36 100 97 100 Peptostreptococcaceae bacterium WP_0216757861 Bacteria bacterium KJS473621 Bacteria Desulfovirgula thermocuniculi WP_0351078831 Bacteria 100 100 Lokiarchaeum sp GC14 75 KKK40207 Candidatus Thorarchaeota archaeon AB 25 OLS21444 Candidatus Heimdallarchaeota archaeon LC 3 OLS16734 61 100 100 Chondromyces apiculatus WP_0442396001 Bacteria Polyan 90 Ardenticatena maritima WP_0544927321 Chloroflexi bacterium OGO401281 Symbiobacterium thermophilum WP_0111949311 Bacteria Famil 100 Thermogemmatispora onikobensis WP_0698046471 Bacteria 100 Ktedonobacter racemifer WP_0079141881 Bacteria 100 100 100 bacterium OFW245491 Bacteria Thermoanaerobaculum aquaticum WP_0380499121 Chloroflexi bacterium OGO665941 100 99 sp CP80 WP_081226652 Archaea 72 uncultured sp MG ESQ22365 Archaea 37 thermophila WP_084019815 Archaea 50 100 Thermoflavimicrobium dichotomicum SFJ515381 Bacteria Clostridium ultunense WP_0055856631 Bacteria candidate divison MSBL1 archaeon SCGCAAA259I09 KXA96289 100 Deltaproteobacteria bacterium OGP732271 Bacteria Deltaproteobacteria bacterium OIP322651 Bacteria 100 100 58 niacini WP_0455195121 Bacteria 52 Lysinibacillus sp WP_0534839701 Bacteria 100 sp WP_0670920931 Bacteria 67 Alcanivorax sp ERS105721 Bacteria 100 52 Rhodococcus tukisamuensis SDD948901 Bacteria 65 bacterium KRT779601 Deferrisoma camini WP_0253236731 Bacteria Rhodospirillales bacterium SEP383661 Bacteria 90 100 Mycobacterium tuberculosis P9WQ37 Bacteria Candidatus Rokubacteria OGK844371 Deltaproteobacteria bacterium OGP511191 Bacteria 67 67 Roseiflexus castenholzii WP_0119979481 Bacteria Haliangium ochraceum WP_0128283001 Bacteria 73 Hyalangium minutum WP_0441899991 Bacteria 94 95 bacterium OGR195761 Bacteria Candidatus Rokubacteria OGK866111 100 99 98 Deltaproteobacteria bacterium KPJ774571 Bacteria Desulfobacterales bacterium OEU447861 Bacteria 88 Desulfomonile tiedjei WP_0148123811 Bacteria 100 sp WP_0375625171 Bacteria 100 Candidatus Thorarchaeota archaeon AB 25 OLS30896 oligotrophica WP_0150293981 Bacteria Bacillus pseudofirmus WP_0129581001 Bacteria 100 100 Rhodovulum sp WP_0372070541 Bacteria 99 Paraburkholderia xenovorans WP_0114934691 100 100 lentum WP_0680715301 Bacteria Novosphingobium pentaromativorans WP_0070144311 Bacteria 81 63 Paraburkholderia xenovorans WP_0114936731 46 91 100 Tistrella mobilis WP_0627688511 Bacteria 98 Pseudohaliea rubra WP_0355138601 100 Novosphingobium sp WP_0541223931 Bacteria Desulfatibacillum aliphaticivorans WP_0283148141 Bacteria Desulfatitalea tepidiphila WP_0540305711 Desulfitobacterium hafniense WP_0183062491 Bacteria 99 51 76 Desulfococcus multivorans WP_0208779131 Bacteria 99 Syntrophus sp OHE226271 Bacteria 91 Euryarchaeota archaeon SG85 KYK30443 100 archaeon EMR748431 Archaea 100 Candidatus Bathyarchaeota archaeon B25 KYH39002 archaeon Heimdall LC 3 OLS23830 100 Bacillus sp WP_0663043091 Bacteria 100 100 Aneurinibacillus migulanus WP_0430691971 Bacteria Lysinibacillus sphaericus AOV085621 Bacteria 100 bacterium OGW182731 100 76 Saccharopolyspora shandongensis SDZ098901 Bacteria Pseudonocar Mycobacterium tuberculosis H37Rv NP_2146801 Bacteria 73 Clostridiales bacterium KKM100231 Bacteria 98 100 Peptococcaceae bacterium KJS460901 Bacteria 98 Desulfotomaculum alcoholivorax WP_0346295801 Bacteria Haloterrigena saccharevitans WP_076148453 Archaea 75 99 100 Bacillus sp WP_0663051841 Bacteria 48 bacterium OJF179861 Bacteria Peptococcaceae bacterium KJS677691 Bacteria Deltaproteobacteria bacterium KPK906701 Bacteria 85 Desulfotomaculum gibsoniae WP_0065234761 Bacteria 66 Deltaproteobacteria bacterium OGP595431 Bacteria 98 100 Clostridiales bacterium KKM125071 Bacteria 93 Desulfosporosinus sp KUO713271 Bacteria Deltaproteobacteria bacterium KPK890951 Bacteria 98 Mycobacterium rhodesiae WP_0142094881 Bacteria Chloroflexi bacterium OGO223351 100 100 Peptococcaceae bacterium KJR961141 Bacteria 99 Desulfosporosinus acidiphilus WP_0148262291 Bacteria Desulfotomaculum thermocisternum WP_0273557891 Bacteria 84 Symbiobacterium thermophilum WP_0437129461 Bacteria Famil 100 85 Thermogemmatispora carboxidivorans WP_0528886441 Bacteria 65 Mycobacterium tuberculosis H37Rv NP_2155741 Bacteria Candidatus Rokubacteria KRT736551 100 97 Halorubrum lacusprofundi YP_0025648991 Archaea halophilic archaeon YP_0048099351 Archaea 100 pellirubrum YP_0072822201 Archaea 75 100 Alcanivorax sp WP_0676106121 Bacteria 97 Aquisalimonas asiatica SEO999811 Bacteria 100 98 Ruegeria pomeroyi Q5LRT0 Bacteria 100 82 Rhodospirillales bacterium WP_0272995711 Bacteria Tistrella mobilis WP_0627692711 Bacteria thermophilus THET8 Long chain fatty acid CoA ligase Q5SKN9 Bacteria 100 100 sedula YP_0011913091 Archaea 98 acidocaldarius YP_2557321 Archaea NP_3759101 Archaea 100 100 Archaeoglobus fulgidus NP_0708571 Archaea Ferroglobus placidus YP_0034360961 Archaea 100 Thermoplasma acidophilum NP_3937721 Archaea 100 100 Vulcanisaeta distributa YP_0039007131 Archaea Vulcanisaeta distributa YP_0039023711 Archaea Archaeoglobus fulgidus NP_0690341 Archaea 100 100 Desulfatibacillum alkenivorans SHI518981 Bacteria 100 Desulfococcus oleovorans WP_0121759731 Bacteria 100 Desulfomonile tiedjei WP_0148110511 Bacteria 100 Candidatus Thorarchaeota archaeon SMTZ183 KXH73361 Desulfatirhabdium butyrativorans WP_0513277211 Bacteria Desulfob 100 Smithella sp KFN386541 Bacteria 100 100 100 Pseudonocardia sp OJY491221 Bacteria 75 Thermocrispum municipale WP_0288498601 Bacteria Bacillus rubiinfantis WP_0423556381 Bacteria 68 100 Desulfotomaculum alcoholivorax WP_0273645591 Bacteria Desulfotomaculum sp WP_0666671291 Bacteria 100 100 100 Desulforhopalus singaporensis SDP647491 Bacteria uncultured bacterium EKD379741 89 Desulfatitalea tepidiphila WP_0540335881 Desulfacinum infernum SHF425841 Bacteria Syntrophobacterac 85 78 Archaeoglobus fulgidus NP_0691001 Archaea 84 Ferroglobus placidus YP_0034359101 Archaea lagunensis YP_0071743701 Archaea 100 100 36 Sulfolobus islandicus YP_0034190771 Archaea 100 Vulcanisaeta moutnovskia YP_0042455411 Archaea Sulfolobus islandicus YP_0056448931 Archaea 100 Sulfolobus acidocaldarius YP_2557561 Archaea Sulfolobus acidocaldarius YP_0074341421 Archaea 95 100 Haloarcula hispanica YP_0047954611 Archaea moolapensis YP_0074884691 Archaea 100 occultus YP_0073084911 Archaea 100 Natronococcus jeotgali WP_008422457 Archaea 100 pallidum WP_049917032 Archaea 75 100 jeotgali YP_0037374961 Archaea 79 100 100 Natronomonas moolapensis YP_0074881461 Archaea 100 YP_0034055301 Archaea mediterranei YP_0063495221 Archaea 100 100 Arabidopsis thaliana OX 3702 Q9C9G2 Eukaryota Arabidopsis thaliana OX 3702 Q9C8D4 Eukaryota 100 Oryza brachyantha XP_0066493462 Eukaryota 100 100 Basidiobolus meristosporus CBS 93173 ORY01764 Eukaryota Mortierella elongata AG77 OAQ30182 Eukaryota Spizellomyces punctatus DAOM BR117 Eukaryota 100 69 Viridibacillus arenosi WP_0381879871 Bacteria 100 Effusibacillus pohliae WP_0181331751 Thermoplasmatales archaeon SG8523 KYK30387 Archaea Chloroflexi bacterium OJV993311 100 100 katesii WP_018256075 Archaea 97 Halomicrobium zhouii WP_089817539 Archaea 92 carlsbadense WP_006884443 Archaea 91 Haloarcula marismortui ATCC 43049 AAV45825 ArchaeaHalobacteria Haloarchaeobius iranensis WP_089731253 100 100 Halorientalis regularis WP_092686471 Archaea 8 Halorientalis sp IM1011 WP_077205033 Archaea 100 Halapricum salinum osuccinylbenzoateCoA ligase Halapricum salinum WP_049993302 Halovivax ruber XH70 AGB15946 Archaea 100 100 halophilic archaeon J07HX64 ERH09397 Archaea 83 Halovenus aranensis WP_092702468 MenE 93 Chloroflexi bacterium PID86606 Candidatus Heimdallarchaeota archaeon LC 2 OLS23766 2-succinylbenzoate-CoA ligase 100 Pseudanabaena sp WP_0195021541 Bacteria Pseudanabaena sp SR411 WP_094533551 Bacteria 30 62 Bacillus velezensis A7Z809 Bacteria Q816I1 Bacteria 45 84 100 Geobacillus kaustophilus Q5KVX9 Bacteria 88 96 Tenuibacillus multivorans SDN970061 Bacteria Gracilibacillus lacisalsi WP_018934092 Bacteria 84 88 Listeria innocua serovar 6a Q92AY8 Bacteria faecalis Q838K1 Bacteria 100 88 Alicyclobacillus herbarius WP_0269612131 Bacteria Thermobaculum terrenum WP_012875172 Bacteria 100 98 Staphylococcus aureus P63525 Bacteria Staphylococcus epidermidis Q5HNB2 Bacteria lactis subsp lactis Q9CHK3 Bacteria 92 92 Homo sapiens OX 9606 Q14689 Eukaryota A 100 Danio rerio OX 7955 Q6NVJ5 Eukaryota Actinopte 75 3 Homo sapiens OX 9606 Q9Y2E4 Eukaryota A 98 Drosophila melanogaster OX 7227 Q9W0S9 Eukaryota Neopt Mycobacterium tuberculosis O53580 Bacteria 100 100 Mycobacterium tuberculosis P9WQ57 Bacteria Mycobacterium marinum B2HPY8 Bacteria 100 27 100 Mycobacterium paratuberculosis Q73ZP7 Bacteria Mycobacterium tuberculosis P9WQ41 Bacteria Mycobacterium smegmatis A0R3D6 Bacteria 18 42 100 Staphylococcus haemolyticus Q4L3Q0 Bacteria 100 Staphylococcus epidermidis Q5HRH4 Bacteria 56 Staphylococcus aureus Q5HIA1 Bacteria 17 O07619 Bacteria 69 Clostridium scindens P19409 Bacteria 28 Mycobacterium tuberculosis P9WQ55 Bacteria 100 100 Escherichia coli P37353 Bacteria typhimurium P37418 Bacteria influenzae P44565 Bacteria Arabidopsis thaliana OX 3702 Q8VYJ1 Eukaryota 60 Bacillus subtilis BACSU Putative acyl CoA ligase YdaB P96575 Bacteria Candidatus Lokiarchaeota archaeon CR 4 OLS13925 100 100 Homo sapiens OX 9606 Q96CM8 Eukaryota Sarcopterygii 100 Mus musculus OX 10090 MOUSE Acyl CoA synthetase family member 2 mitochondrial Q8VCW8 Eukaryota Sarcopterygi Danio rerio OX 7955 Q0P4F7 Eukaryota Actinopterygii 59 Bacillus subtilis O31826 Bacteria 100 100 Mycobacterium tuberculosis P96843 Bacteria Rhodococcus jostii Q0S7V5 Bacteria 100 Comamonas testosteroni OX 285 Q7WSH3 Bacteria 99 Eukaryotes (17) 99 Archaeoglobus fulgidus WP_086975395 Archaea Candidatus Bathyarchaeota KPV621591 100 75 100 Sulfolobus acidocaldarius WP_011278943 Archaea Sulfolobus islandicus WP_012710475 Archaea 100 Thermoproteus sp CP80 longchain fatty acidCoA ligase Thermoproteus sp CP80 WP_081226282 Archaea 100 100 Archaeoglobus fulgidus WP_010879007 Archaea 67 acetivorans longchainfattyacidCoA ligase Geoglobus acetivorans WP_048092384 Archaea 100 aeolicum WP_054964100 Archaea Ferroplasma acidiphilum WP_081142773 Archaea 100 75 Lokiarchaeum sp GC14 75 KKK43952 Deltaproteobacteria bacterium RBG 13 43 22 OGP49200 Bacteria Ferroglobus placidus WP_012964621 Archaea 46 88 100 Thermoproteus sp AZ2 longchain fatty acidCoA ligase Thermoproteus sp AZ2 KJR73858 Archaea 100 Thermoproteus tenax WP_014126925 Archaea 100 arsenaticum DSM 13514 ABP50053 ArchaeaThermopro sp ECH B KUO93055 Archaea 74 99 delaneyi WP_055407811 Archaea 92 camini WP_022542167 Archaea 54 Candidatus Caldiarchaeum subterraneum BAJ48826 Archaea Archaeoglobus fulgidus DSM 4304 AAB90214 ArchaeaArchaeoglobac 100 99 miscellaneous Crenarchaeota group6 archaeon AD81 KON33329 64 miscellaneous Crenarchaeota group1 archaeon SG8323 KON32453 100 Candidatus Bathyarchaeota archaeon BA1 KPV65458 82 miscellaneous Crenarchaeota group15 archaeon DG45 KON30935 56 uncultured marine crenarchaeote E377F ADQ54427 Ferroglobus placidus WP_012966056 Archaea 74 Candidatus Geothermarchaeota archaeon ex4572 27 PCN49790 99 100 Lokiarchaeum sp GC14 75 KKK43371 Candidatus Lokiarchaeota archaeon CR 4 OLS13051 91 Lokiarchaeum sp GC14 75 KKK43585 100 100 Deltaproteobacteria bacterium OGP590281 Bacteria Desulfatiglans anilini WP_0352532611 100 Candidatus Lokiarchaeota archaeon CR 4 OLS16266 99 100 Desulfocarbo indianensis WP_0496740791 99 97 Desulfocarbo indianensis WP_082205268 Deltaproteobacteria bacterium OGP616581 Bacteria Deltaproteobacteria bacterium KPJ777671 Bacteria 99 Candidatus Lokiarchaeota archaeon CR 4 OLS15182 Deltaproteobacteria bacterium RBG 16 54 11 OGP82876 Bacteria 99 Anaerolineae bacterium SM23 84 KPL23618 Bacteria 98 Clostridiales bacterium PH28 bin88 KKM08777 Bacteria 99 95 bacterium ML8 F2 OPL10633 candidate division KSB1 bacterium 4484 219 OQX59393 98 82 bacterium OFW554581 Chloroflexi bacterium RBG 13 53 26 OGO02298 95 Carboxydothermus islandicus WP_075865646 Bacteria 96 47 Desulfonauticus sp 38 4375 KUJ95068 Bacteria 79 93 Chloroflexi bacterium 13 1 40CM 4 69 19 OLC50571 Armatimonadetes bacterium CSP13 KRT76067 100 90 Bacillus subtilis P94547 Bacteria Bacillus sp 17376 WP_023627069 Bacteria 99 Lihuaxuella thermophila WP_089966356 96 Alicyclobacillus acidocaldarius subsp acidocaldarius Tc41 AEJ43424 Bacteria 65 Paenibacillus barengoltzii WP_085279024 Bacteria 98 94 Ferroglobus placidus WP_012966703 Archaea 98 Geoglobus ahangari WP_048096179 Archaea 100 Archaeoglobus sulfaticallidus WP_015591500 Archaea 87 Archaeoglobus fulgidus WP_048096076 Archaea 69 Archaeoglobus fulgidus WP_010879854 Archaea Euryarchaeota archaeon RBG 16 68 13 OGS49388 92 Ardenticatena maritima GAP64541 56 Chloroflexi bacterium UTCFX4 OQY89831 2 97 Desulfotomaculum alcoholivorax WP_034629153 Bacteria 97 Peptococcaceae bacterium BICA17 KJS65522 Bacteria 100 100 Desulfosporosinus sp I2 KJR46426 Bacteria 100 Desulfosporosinus sp HMP52 WP_034598778 Bacteria 89 Desulfotomaculum sp WP_0666658381 Bacteria 83 Candidatus Lokiarchaeota archaeon CR 4z OLS15941 Symbiobacterium thermophilum longchain fatty acidCoA ligase partial Symbiobacterium thermophilum OTA40978 Bacteria 75 100 100 Ferroplasma acidiphilum WP_081142318 Archaea 93 Ferroplasma acidarmanus WP_019841341 Archaea 88 Thermoplasmatales archaeon Eplasma EQB66881 Archaea Thermoplasmatales archaeon Iplasmaz EQB66181 Archaea Catenulispora acidiphila WP_0127873171 Bacteria Catenulisporace Thermosulfidibacter takaii BAT709841 Bacteria 100 Pseudohongiella spirulinae WP_0580205741 44 100 Salinisphaera shabanensis WP_084623617 Bacteria 100 100 Pseudomonas pohangensis WP_090193776 Bacteria 100 bacterium 42 54 T18 OUS04916 Bacteria Gammaproteobacteria bacterium 45 16 T64 OUS25477 Bacteria 100 100 marine gamma WP_0072236911 Paraglaciecola arctica WP_007621374 100 Candidatus Magnetoglobus ETR721811 Bacteria 100 Candidatus Magnetoglobus multicellularis str Araruama ETR66006 Bacteria 100 100 Desulfarculus baarsii WP_0132600441 Bacteria Desulfobacterium vacuolatum WP_084070047 Bacteria 100 63 Desulfatibacillum alkenivorans SHK116371 Bacteria Desulfobactera 100 100 Syntrophobacterales bacterium CG03 land 8 20 14 0 80 58 14 PIV04862 Bacteria 100 Desulfobacterales bacterium RIFOXYA12 FULL 46 15 OGR28242 Bacteria Lokiarchaeum sp GC14 75 KKK46283 90 Desulfatibacillum alkenivorans SHJ919871 Bacteria Desulfobactera Desulfatirhabdium butyrativorans WP_0352584351 Bacteria Desulfob 100 100 Candidatus Bathyarchaeota archaeon RBG 13 60 20 OGD52338 92 Candidatus Bathyarchaeota archaeon RBG 13 52 12 OGD57231 miscellaneous Crenarchaeota group15 archaeon DG45 KON30265 100 66 Candidatus Thorarchaeota archaeon AB 25 OLS27009 100 Candidatus Lokiarchaeota archaeon CR 4 OLS13030 100 88 Candidatus Bathyarchaeota archaeon BA2 KPV63969 Candidatus Thorarchaeota archaeon SMTZ45 KXH70137 100 99 Candidatus Thorarchaeota archaeon SMTZ183 KXH73626 100 Candidatus Thorarchaeota archaeon SMTZ45 KXH72056 98 Candidatus Heimdallarchaeota archaeon LC 3 OLS16617 88 Candidatus Heimdallarchaeota archaeon LC 3 OLS28116 Candidatus Heimdallarchaeota archaeon LC 2 OLS20737 Thermoplasmatales archaeon ex4572 165 OYT28544 Archaea 79 98 100 P46450 Bacteria 99 62 pestis OX 632 Q8ZES9 Bacteria 100 Escherichia coli P69451 Bacteria 94 Bdellovibrionales bacterium OFZ292971 Bacteria 100 Thermonema rossianum WP_0380295881 Bacteria 99 Pseudomonas aeruginosa PAO1 NP_2519891 Bacteria Shewanella oneidensis MR1 NP_7192051 Bacteria 92 uncultured marine thaumarchaeote KM3 54 F04 AIF12086 100 45 bacterium 4572 123 OQY02702 Bacteria 99 Smithella sp F21 KFN37712 Bacteria Candidatus Heimdallarchaeota archaeon LC 2 OLS28911 Sulfurovum sp AR WP_008243160 Bacteria 10 100 Candidatus Heimdallarchaeota archaeon AB 125 OLS30582 Candidatus Rokubacteria bacterium RIFCSPLOWO2 02 FULL 68 19 OGL04341 100 100 Geoglobus acetivorans AIY90538 Archaea 100 Sulfolobus acidocaldarius YP_0089471761 Archaea 100 Archaeoglobus fulgidus NP_0694021 Archaea radiodurans R1 NP_2856321 Bacteria 99 Desulfotomaculum alcoholivorax WP_0512735941 Bacteria 100 100 Metallosphaera cuprina YP_0044097541 Archaea Sulfolobus acidocaldarius YP_2565621 Archaea Sulfolobus acidocaldarius SUSAZ AHC51439 Archaea 79 79 Candidatus Heimdallarchaeota archaeon LC 2 OLS28457 Photobacterium gaetbulicola WP_039456786 Bacteria 96 5 Gammaproteobacteria bacterium RIFCSPHIGHO2 12 FULL 63 22 OGT58870 Bacteria 100 Desulfatibacillum alkenivorans WP_073478559 Bacteria 98 Candidatus Magnetomorum sp HK1 KPA15789 Bacteria 99 100 Candidatus Entotheonella palauensis WP_089721897 Bacteria Ktedonobacter sp 13 1 20CM 3 54 15 OLE32264 Bacteria 99 Candidatus Promineofilum breve WP_095044096 100 uncultured Desulfobacterium sp CBX29666 99 Anaerolineae bacterium SG8 19 KPK13160 Bacteria 100 100 bacterium TFI 002 WP_096195203 Bacteria 30 100 Lewinella persica WP_020569801 Bacteria 100 99 marine bacterium AO1C OJJ14012 Bacteria 97 Emticicia oligotrophica WP_015028955 Bacteria 61 Hahella sp CCBMM4 WP_094707688 Bacteria Anaerolineaceae bacterium 4572 52 OQY37283 Bacteria 82 Roseiflexus sp RS1 WP_011955769 Bacteria 99 candidate division KSB1 bacterium 4484 219 OQX57321 100 Deltaproteobacteria bacterium RBG 13 43 22 OGP49451 Bacteria 74 Candidatus Rokubacteria bacterium CSP16 KRT72923 Peptococcaceae bacterium BRH c4b KJS15526 Bacteria 2 Bacillus subtilis O07610 Bacteria 100 Chthonomonas calidirosea WP_082629427 Bacteria 93 Armatimonadetes bacterium JP3 11 OYT75939 78 99 candidate division NC10 bacterium RIFCSPLOWO2 02 FULL 66 22 OGB93186 69 candidate division NC10 bacterium CSP15 KRT71276 Candidatus Vecturithrix granuli GAK59963 100 72 Mycobacterium tuberculosis P9WQ61 Bacteria Mycobacterium marinum B2HIL6 Bacteria 99 98 Aneurinibacillus aneurinilyticus ERI099601 Bacteria Candidatus Omnitrophica bacterium CG12 big fil rev 8 21 14 0 65 42 8 PIW67363 Thermincola ferriacetica WP_052217444 Bacteria 99 99 Lokiarchaeum sp GC14 75 KKK46046 Lokiarchaeum sp GC14 75 KKK42137 81 Firmicutes bacterium ML8 F2 OPL10746 100 100 Desulfobacteraceae bacterium A6 OQW99630 Bacteria 97 Desulfatirhabdium butyrativorans WP_051328388 Bacteria Deltaproteobacteria bacterium RIFOXYA12 FULL 61 11 OGQ97113 Bacteria 15 100 99 Acidobacteria bacterium 13 1 20CM 3 53 8 OLE54160 Bacteria 48 Acidobacteria bacterium OLB17 KXK05301 Bacteria Rubeoparvulum massiliense WP_082146911 87 87 Roseiflexus castenholzii WP_012120293 Bacteria Turneriella parva WP_014802707 Bacteria 100 Deltaproteobacteria bacterium RBG 13 61 14 OGP60424 Bacteria 100 bacterium RIFCSPLOWO2 12 FULL 59 9 OGS05501 Bacteria 98 Elusimicrobia bacterium PCI33276 Bacteria 99 100 Pseudonocardia sp AL04100510 WP_062397232 Bacteria 100 Amycolatopsis methanolica WP_038532535 Bacteria 98 Nocardia niwae WP_063018516 Bacteria Branchiibius sp NY1634622 KYH45714 Bacteria 100 Thermoplasmatales archaeon Eplasma EQB66646 Archaea 100 100 Bacillus sp FJAT45350 WP_096200686 Bacteria Bacillus sp VT1664 WP_083717430 Bacteria Caldalkalibacillus thermarum WP_007505878 Bacteria 100 Cupriavidus necator Q0K844 Bacteria Escherichia coli P38135 Bacteria 100 100 Arabidopsis thaliana OX 3702 Q9SMT7 Eukaryota Brassical 100 Glycine max XP_0146285481 Eukaryota 79 Ricinus communis XP_002509782 Eukaryota Neoptera 47 Selaginella moellendorffii XP_002968990 Eukaryota sulphuraria XP_005708092 Eukaryota 81 Acanthamoeba castellanii XP_0043357861 Eukaryota 99 75 Conidiobolus coronatus NRRL 28638 KXN71044 Eukaryota Ancylistaceae 98 Rhizophagus irregularis DAOM 181602 ERZ96761 Eukaryota Anaeromyces robustus ORX85010 Eukaryota Neocallimast 50 86 100 Polysphondylium pallidum PN500 XP_020431458 Eukaryota Dictyostelium fasciculatum XP_004361679 Eukaryota 100 Acytostelium subglobosum LB1 XP_012748474 Eukaryota Gibberella moniliformis Q8J2R0 Eukaryota 80 Methanocella paludicola YP_0033552871 Archaea 100 100 Mus musculus OX 10090 Q3URE1 Eukaryota 81 Bos taurus OX 9913 Q58DN7 Eukaryota T Arabidopsis thaliana OX 3702 Q8H151 Eukaryota 100 90 Candidatus Nitrosoarchaeum ZP 082582161 Archaea Marine Group I WP_048083190 Thermoplasmatales archaeon EMR746831 Archaea 99 Marinomonas sp WP_0460187341 Bacteria 93 Oleiphilus WP_0685167321 WP_0685167321 Bacteria 87 100 Hahella ganghwensis WP_0204063071 Bacteria Marinobacter sp WP_0709651961 Bacteria Pseudomonas citronellolis WP_0580709971 Bacteria 100 100 92 Luteipulveratus halotolerans WP_0506709831 Bacteria Mycobacterium WP_0590978761 WP_0590978761 Bacteria 100 90 Moritella sp WP_0060319311 Bacteria 100 Acinetobacter calcoaceticus WP_0524380821 Bacteria Pseudomonas nitroreducens WP_0247676711 Bacteria Pseud 96 100 100 Lokiarchaeum sp GC14 75 KKK44928 Lokiarchaeum sp GC14 75 KKK43377 Lokiarchaeum sp GC14 75 KKK43546 73 Haliea salexigens WP_0279492541 Bacteria 96 Lokiarchaeum sp GC14 75 KKK46018 100 Lokiarchaeum sp GC14 75 KKK41395 100 96 Lokiarchaeum sp GC14 75 KKK44636 70 98 Lokiarchaeum sp GC14 75 KKK43328 Lokiarchaeum sp GC14 75 KKK44634 100 Deltaproteobacteria bacterium OGP602851 Bacteria 100 Syntrophomonas zehnderi WP_0464985331 Bacteria Desulfoluna spongiiphila SCY054051 Bacteria 100 Sorangium cellulosum ADZ250061 Bacteria 98 98 Capitella teleta ELU085591 Eukaryota 100 Branchiostoma floridae XP_0025882231 Eukaryota Ciona intestinalis XP_0021236691 Eukaryota Saccoglossus kowalevskii XP_0027322151 Eukaryota 78 100 Hyphomonas sp 346218 OZB17301 Bacteria 100 Sphingorhabdus flavimaris ATW04227 100 93 Caulobacter vibrioides KSB90693 Bacteria Asticcacaulis biprosthecium WP_0062730521 Bacteria 81 Rhizobiales bacterium TMED94 OUV14550 Bacteria Bradyrhizobium sp Leaf396 WP_057197288 Bacteria 100 78 100 100 Drosophila willistoni XP_0020656461 Eukaryota 100 Pogonomyrmex barbatus XP_0116335921 Eukaryota 100 Clunio marinus CRK896351 Eukaryota 37 53 Parasteatoda tepidariorum XP_0159078641 Eukaryota varieornatus GAV095921 Eukaryota Phytophthora infestans XP_0028992901 Eukaryota 63 Pseudomonas jinjuensis SDP024891 Bacteria 100 Ferroplasma acidarmanus WP_009886084 Archaea 15 Acidiplasma WP_048101208 Archaea miscellaneous Crenarchaeota group archaeon SMTZ80 KON27411 100 94 agarilyticus 2 Archaea 97 Haloterrigena turkmenica WP_012944900 Archaea Natrinema pellirubrum WP_006182225 Archaea 100 100 100 Lokiarchaeum sp GC14 75 KKK45592 miscellaneous Crenarchaeota KON269711 Candidatus Lokiarchaeota archaeon CR 4 OLS12888 99 Halorientalis regularis WP_092694607 Archaea Natronomonas pharaonis YP_3308571 Archaea 100 98 76 Haloferax gibbonsii WP_081605159 Archaea 55 uncultured archaeon A07HR60 ESS11779 97 Natrinema sp CBA1119 WP_098726807 Archaea 100 Haloprofundus marisrubri WP_082230564 100 Natronorubrum sediminis WP_090508451 Archaea Halorientalis regularis WP_092690358 Archaea 100 Geoglobus ahangari WP_052747732 Archaea Archaeoglobales archaeon ex4484 92 OYT34221 Archaea

0.9

29

Suppl. Figure 5: Phylogenetic analysis of long chain fatty acid CoA ligase (COG0318). An unrooted was generated using IQ-tree LG+C20+F on an alignment of 516 taxa and 287 sites. Bacteria, Archaea, eukaryotes and Asgard archaea are colored in black, green, blue and purple respectively. Heimdallarchaeota MenE-like homologue is indicated with a star. Raw data files are available via figshare (see Data availability for more details).

30

a b Eukaryotic and bacterial ACAD11 Bacteria OLS19783 Heimdallarchaeota archaeon LC 3 bacterium OHD65915 OLS26460 Heimdallarchaeota archaeon LC 2 Desulfococcus multivorans WP 0208770901 Flavobacteria bacterium MS0242A WP 0088661701 Bacteria Desulfatitalea sp KJS32787 Ardenticatena putative CUS03139 OLS13053 Lokiarchaeota archaeon CR 4 Bacteria KXH70550 Thorarchaeota archaeon SMTZ-45 Myxococcales bacterium SG8 38 KPK14957 Bacteria Clade 2 a OLS29226 Thorarchaeota archaeon AB 25 Herpetosiphon aurantiacus DSM 785ABX04203 Bacteria ACAD11 KXH78047 Thorarchaeota archaeon SMTZ1-83 Bacteria Deltaproteobacteria bacterium RBG 19FT COMBO 46 12 OGP99206 Bacteria bacterium 13 1 40CM 2 60 3 OLD51893 Bacteria Deltaproteobacteria bacterium OGP84622 Clade 1 a Bacteriovorax EPZ50360 Archaea and Bacteria Halobacteriovorax marinus WP 0142459081 Bacteria Bacteria Deltaproteobacteria bacterium OGQ61258 Bacteria Myxococcus xanthus DK 1622 ABF89869 Bacteria Vulgatibacter incomptus WP 0507260261 Bacteria Deltaproteobacteria bacterium OGQ36242 Eukaryotic and bacterial VLCAD Very long chain album SEH12263 Latescibacteria bacterium DG 63 KPJ60103 Acyl-CoA dehydrogenase Actinobacteria bacterium SCGC AG212D09 OAI39001 Bacteria Actinobacteria bacterium RBG 16 68 12 OFW72708 Syntrophothermus lipocalidus WP 013175056 Bacteria Deltaproteobacteria bacterium CSP18 KRT73081 Bacteria Bacteria Bacteria and Archaea Glutaryl-CoA dehydrogenase KXH75636 Thorarchaeota archaeon SMTZ1-45 Archaeal and Bacterial glutaryl-CoA dehydrogenase OLS31131 Thorarchaeota archaeon AB 25 Archaea Candidatus Aminicenantes bacterium RBG 13 64 14 OGD23929 Candidatus Bathyarchaeota archaeon BA2 KPV639711 Clade 1 b KXH69706 Thorarchaeota archaeon SMTZ-45 WP 0112284661 Archaeoglobus fulgidus DSM 4304 NP 0711001 Archaea Clade 2 b Thermus Thermophilus Hb8 1UKW A Archaea Natronorubrum tibetense WP 0060920871 Archaea Sorangium cellulosum KYF52759 Eukaryotes Minicystis rosea APR85246 Roseiflexus ZP01359693.seq Haliangium ochraceum WP 0128284671 Luteimonas mephitis WP 027080353 Bacteria Eukaryotes Medium Chain Acyl-CoA Dehydrogenase Archaea, Bacteria, Eukaryotes Leptospira santarosai str CBC613 EMK05202 KXH69994 Thorarchaeota archaeon SMTZ-45 Proteobacteria bacterium NonnenW8red APJ02516 KXH75819 Thorarchaeota archaeon SMTZ1-83 Bdellovibrionales bacterium GWB1 55 8 OFZ19287 Clade 1 c KXH70517 Thorarchaeota archaeon SMTZ-45 Acidobacteria bacterium 13 1 40CM 4 69 4 OLC54358 Medium chain OLS20954 Thorarchaeota archaeon AB 25 Marine group II euryarchaeote REDSEA S10 B2 LURQ010000151 190 acyl-CoA dehydoroganse Archaeogl acd4 NP 069505 Archaeoglobus sulfaticallidus PM70 1 YP 0079071471 Archaea uncultured marine group II III euryarchaeote AIF18949 candidate divison MSBL1 archaeon SCGCAAA259D14 KXA90515 Marine group II euryarchaeote REDSEA S03 B6 LURP010001401 5 KKK40871 Lokiarchaeum sp GC14 75 Anaerolineae bacterium KPK13161 KKK44188 Lokiarchaeum sp GC14 75 Ardenticatena CUS05166 OLS12395 Lokiarchaeota archaeon CR 4 Brachymonas denitrificans DSM 15123 SEM98185 Thermoplasmatales archaeon DG 70 KYK384401 Archaea Thermorudis peleae WP 051914270 0.3 Thermoplasma volcanium WP 0480540961 Archaea Thermoplasmatales archaeon EQB66180 Archaea OLS13428 Lokiarchaeota archaeon CR 4 OLS14126 Lokiarchaeota archaeon CR 4 OLS13429 Lokiarchaeota archaeon CR 4 KKK41099 Lokiarchaeum sp GC14 75 c KKK43023 Lokiarchaeum sp GC14 75 KKK43023 KKK43024 Lokiarchaeum sp GC14 75 KKK43024 OLS15112 Candidatus Lokiarchaeota archaeon CR 4 Clade 2 c Bacteria OLS15961 Candidatus Lokiarchaeota archaeon CR 4 OLS14435 Lokiarchaeota archaeon CR 4 KKK46150 Lokiarchaeum sp GC14 75 KKK43402 Lokiarchaeum sp GC14 75 KKK46151 Lokiarchaeum sp GC14 75 KXH71300 Thorarchaeota archaeon SMTZ1-45 miscellaneous Crenarchaeota group archaeon SMTZ 80 KON269771 OLS29321 Thorarchaeota archaeon AB 25 OLS12340 Lokiarchaeota archaeon CR 4 KXH77597 Thorarchaeota archaeon SMTZ1-83 Bacteria Clade 3 a KXH77050 Thorarchaeota archaeon SMTZ-45 Bacteria Bacterial short chain Thermoplasmatales archaeon DG 70 1 KYK361041 Archaea Bacteria KXH72519 Thorarchaeota archaeon SMTZ1-45 acyl-CoA dehydrognease Archaea OLS31234 Thorarchaeota archaeon AB 25 Crenarchaeota archaeon 13 1 40CM 3 52 17 OLD13677 OLS26226 Heimdallarchaeota archaeon LC 2 archaeon 13 2 20CM 2 53 6 OLB455281 bacterium BRH c20a KJS20163 OLS14446 Candidatus Lokiarchaeota archaeon CR 4 Tessaracoccus bendigoensis WP 0731880981 uncultured archaeon ANME 1 CBH377421 SCD - Bacteria Candidatus Syntrophoarchaeum butanivorans OFV664321 SBCAD Archaea, Bacteria, Eukaryotes Sulfolobus islandicus LD85 YP 0034190831 Archaea Archaea, Bacteria, Eukaryotes Candidatus Bathyarchaeota archaeon RBG 13 46 16b OGD46862 Bacteria Bacteria and Eukaryotes OLS22720 Heimdallarchaeota archaeon LC 3 Archaea OLS25826 Heimdallarchaeota archaeon LC 3 KXH70609 Thorarchaeota archaeon SMTZ1-45 Clade 3 b Candidatus Wallbacteria bacterium GWC2 49 35 OGM06752 OLS22296 Thorarchaeota archaeon AB 25 Candidatus Eisenbacteria bacterium RBG 16 71 46 OGF13600 Short branched-chain KXH76220 Thorarchaeota archaeon SMTZ-45 Bacteria acyl-CoA dehydrognease KXH77230 Thorarchaeota archaeon SMTZ1-83 SCAD - Eukaryotes Thermoplasmatales archaeon SM1 50 KYK227901 Archaea OLS25108 Heimdallarchaeota archaeon LC 2 Candidatus Bathyarchaeota archaeon RBG 13 46 16b OGD47177 Clade 2 d OLS26159 Heimdallarchaeota archaeon LC 2 Archaea Candidatus bacterium OGI18402 Candidatus Methanoperedens nitroreducens KCZ719001 Bacteria and Archaea Archaea OLS19404 Heimdallarchaeota archaeon LC 3 Bacteria OLS32864 Heimdallarchaeota archaeon AB 125 Archaea and Bacteria OLS26133 Heimdallarchaeota archaeon LC 2 Deltaproteobacteria bacterium RBG 16 49 23 OGP74822 Bacteria Caldilinea aerophila DSM 14535 NBRC 104270 BAL98636 Archaea Ardenticatena CUS02206 97 Archaea Chloroflexi bacterium RBG 16 54 11OGO29950 Archaea and Bacteria Bacteria Eukaryotes Bacteria Clade 3 c Bdellovibrionales bacterium RIFCSPHIGHO2 01 OFZ291561 Bacteria OLS24605 Heimdallarchaeota archaeon LC 3 Actinobacteria bacterium OLB25876 Parcubacteria group bacterium RIFCSPLOWO2 01 OHB22163 Bacteria Bdellovibrio exovorus WP 0154698691 Bacteria Bacteria Legionella sp OJW10602 Bacteria Bacteria !"#$%&'!()*+#,-'.*/01*(22%$34*(56(,'703/8Bacteria Bacteria Bacteria Perkinsus marinus ATCC 50983 XP 0027875281 Eukaryota Clade 2 e Chloroflexi bacterium RBG 13 56 8b OGO05759 Dactylosporangium aurantiacum WP 033357316 Bacteria uncultured archaeon ANME 1 CBH365941 Bacteria !"#$%&'!()*+#,-'.*/01*(22%$34*(59(,'703/8Bacteria Bacteria bacterium GWF2 35 48 OFY290101 Bacteria Niabella drilacis SDE05376 Archaea, Bacteria, Eukaryotes Elusimicrobia bacterium CG1 02 56 21 OIO00021 OLS19814 Heimdallarchaeota archaeon LC 2 Melioribacter roseus WP 0148573991 OLS21102 Heimdallarchaeota archaeon LC 2 0.5 Elusimicrobia bacterium GWA2 66 18 OGR463411 OLS22621 Heimdallarchaeota archaeon LC 3 OLS26565 Heimdallarchaeota archaeon LC 3 Clade 3 d Deltaproteobacteria bacterium RIFOXYA12 FULL 61 11 OGR02661 Ignavibacteria bacterium GWA2 35 8 OGU138401 Bacteria Bacteria d Bacteria Acyl-CoA Dehydrogenase BP > 90 Bacteria Isovaleryl-CoA Dehydrogenase, Bacteria, Eukaryotes Hexadecanoate COG1960, 3 clades BP > 70 Isovaleryl-CoA Dehydrogenase, Archaea UF IQTREE bootstraps COG0318 OLS26193 Heimdallarchaeota archaeon LC 2 LG+C20+G+F Bacillus pseudofirmus WP 0753872151 Acyl-CoA (C ) Acyl-CoA (Cn) Dehalococcoidia bacterium CG2 30 46 19 OIP28045 n+2 Clade Archaea, Bacteria, Eukaryotes COG1960* or Syntrophaceae bacterium OIP92817 Acetyl-CoA COG2368 1a 1b 2a 2b 2c 2d 2e 3a 3b 3c 3d 3e

Bacteria -oxidation Bacteria Clade 3 e β Lokiarchaeum sp GC14 75 Archaea Acyl-2-enoyl-CoA Isovaleryl-CoA dehydrogenase Lokiarchaeota archaeon CR 4 Desulfotomaculum gibsoniae WP 0065235121 Thorarchaeota archaeon AB 25 Deltaproteobacteria bacterium OIP32710 COG1024* COG0183* - FA Thorarchaeota archaeon SMTZ-45 Bacteria Candidatus Rokubacteria bacterium KRT64914 3-hydroxy-acyl-CoA Thorarchaeota archaeon SMTZ1-45 Chloroflexi bacterium RBG 16 68 14 OGO52103 Thorarchaeota archaeon SMTZ1-83 OLS13182 Lokiarchaeota archaeon CR 4 COG1250* Heimdallarchaeota archaeon AB 125 Candidatus Methanoperedens nitroreducens KCZ706461 0.4 Heimdallarchaeota archaeon LC 2 Crenarchaeota archaeon 13 1 40CM 3 53 5 OLD02790 Aceto-acyl-CoA Heimdallarchaeota archaeon LC 3 Fatty acid biosynthesis Odinarchaeote LCB 4 Archaea

Suppl. Figure 6: Asgard archaea have a versatile repertoire of Acyl-CoA dehydrogenases (COG1960). The acyl-CoA dehydrogenase family was subdivided into three large clades (a-c) based on preliminary phylogenetic analysis of sequences retrieved from Swigonova et al. 2009 and the best BLAST hits for each ASGARD sequences. Each unrooted tree was estimated using IQ-tree LG+C20+F for a) Clade 1 (117 taxa, 365 sites), b) Clade 2 (542 taxa, 311 sites), and c) Clade 3 (419 taxa, 300 sites) independently. Bacteria, Archaea, eukaryotes and Asgard archaea are colored in black, green, blue and purple, respectively. Bipartition support values are depicted as open or closed circles for ultra- fast bootstrap values greater than 70 and 90, respectively. A summary of the clade distribution among Asgard

31

genomes is shown in d. d) Clade summaries for each Asgard genome. Raw data files are available via figshare (see Data availability for more details).

32

Hexadecanoate COG0318 Acyl-CoA (C ) Acyl-CoA (Cn+2 ) n 4-hydroxbutyryl-CoA dehydrogenase COG1960* or Archaea Acetyl-CoA COG2368, clade 1 (of 1) COG2368 -oxidation

β 186 taxa, 459 sites Bacteria Acyl-2-enoyl-CoA UF IQTREE bootstraps COG1024* Asgards COG0183* - FA LG+C20+G+F 3-hydroxy-acyl-CoA Eukaryotes COG1250*

Aceto-acyl-CoA Fatty acid biosynthesis 83 Odoribacter splanchnicus WP 013612261 Bacteria 95 Bacteroidetes bacterium GWE2 39 28 OFX78239 Bacteria 92 Butyricimonas sp An62 WP 087419602 Bacteria 100 Tannerella sp oral taxon BU063 isolate Cell 1 3 ETK08580 Bacteria 66 Bacteroidetes oral taxon 274 WP 009217651 Bacteria 99 Eubacterium yurii subsp margaretiae ATCC 43715 EFM38416 Bacteria 100 61 100 Peptoniphilus senegalensis Peptoniphilus senegalensis WP 019107522 Bacteria mediterraneensis Anaerococcus mediterraneensis WP 072536320 Bacteria 100 Anaerosalibacter massiliensis Anaerosalibacter massiliensis WP 042679566 Candidatus Bacteroides periocalifornicus Candidatus Bacteroides periocalifornicus KQM08696 79 100 bacterium A7P90m WP 092440445 Alistipes sp ZOR0009 Alistipes sp ZOR0009 WP 047449302 Bacteria 93 Clostridiales bacterium DRI13 Clostridiales bacterium DRI13 WP 034425949 Bacteria 98 Desulfitibacter sp BRH c19 Desulfitibacter sp BRH c19 KUO51321 Bacteria 100 Deltaproteobacteria bacterium RBG 13 43 22 Deltaproteobacteria bacterium RBG 13 43 22 OGP51839 Bacteria 100 100 Deltaproteobacteria bacterium RBG 16 44 11 Deltaproteobacteria bacterium RBG 16 44 11 OGP67490 Bacteria 100 Desulfomonile tiedjei WP 014809462 Bacteria 98 Thermosinus carboxydivorans WP 007288939 Bacteria Brenneria goodwinii WP 048638278 Bacteria 100 Desulfotomaculum alcoholivorax Desulfotomaculum alcoholivorax WP 027365655 Bacteria 48 100 96 Desulfovirgula thermocuniculi Desulfovirgula thermocuniculi WP 027718192 Bacteria Peptococcaceae bacterium BICA17 Peptococcaceae bacterium BICA17 KJS64096 Bacteria Thermodesulforhabdus norvegica WP 093395080 Bacteria 51 56 Clostridiales bacterium DRI13 Clostridiales bacterium DRI13 WP 034420114 Bacteria Chloroflexi bacterium RBG 16 48 8 Chloroflexi bacterium RBG 16 48 8 OGO18142 100 Clostridiales bacterium DRI13 Clostridiales bacterium DRI13 WP 034423511 Bacteria 63 Clostridiaceae bacterium BRH c20a Clostridiaceae bacterium BRH c20a KJS22906 Bacteria Chloroflexi bacterium RBG 16 54 18 Chloroflexi bacterium RBG 16 54 18 OGO32803 64 Desulfatitalea sp BRH c12 Desulfatitalea sp BRH c12 KJS30819 62 66 48 Thermoplasmatales archaeon DG 70 1 1 1 Archaea 48 Alkalispirochaeta odontotermitis WP 037564729 Desulfobacteraceae bacterium 4484 1903 Desulfobacteraceae bacterium 4484 190 3 OPX38039 Bacteria 96 Deltaproteobacteria bacterium RBG 16 54 11 Deltaproteobacteria bacterium RBG 16 54 11 OGP83646 Bacteria 43 Syntrophaceae bacterium CG2 30 58 14 Syntrophaceae bacterium CG2 30 58 14 OIP91094 Bacteria 100 Desulfococcus sp 4484 241 Desulfococcus sp 4484 241 OQX61890 Bacteria 99 Spirochaetes bacterium RBG 13 51 14 Spirochaetes bacterium RBG 13 51 14 OHD64336 Bacteria 100 100 Lokiarchaeota archaeon CR 4 OLS14149 Spirochaetes bacterium RBG 16 49 21 Spirochaetes bacterium RBG 16 49 21 OHD72371 Bacteria Chloroflexi bacterium RBG 13 50 10 Chloroflexi bacterium RBG 13 50 10 OGN92360 97 Cryomorphaceae bacterium BACL29 MAG121220 KRO90313 Bacteria 89 73 Eisenibacter elegans Eisenibacter elegans WP 026999252 Runella slithyformis WP 013927145 Bacteria 85 Opitutae bacterium TMED102 Opitutae bacterium TMED102 OUV37430 Bacteria 85 Spirosoma rigui Spirosoma rigui WP 080241179 Bacteria 67 62 Chloroflexi bacterium RBG 16 47 49 Chloroflexi bacterium RBG 16 47 49 OGO14180 100 Bradyrhizobium sp LMTR 3 WP 065748565 Bacteria 100 84 Rhizobiales bacterium 6247 Rhizobiales bacterium 6247 OJY10980 Bacteria bacterium RBG 16 66 20 Betaproteobacteria bacterium RBG 16 66 20 OFZ83833 Bacteria 100 Enhygromyxa salina 4hydroxyphenylacetate 3monooxygenase Enhygromyxa salina KIG19191 Bacteria 78 Plesiocystis pacifica WP 006975097 Bacteria 86 Myxococcales bacterium SG8 38 1 Myxococcales bacterium SG8 38 1 KPK53004 Bacteria 64 Rhodospirillaceae bacterium TMED63 Rhodospirillaceae bacterium TMED63 OUU64831 Bacteria 100 Actinobacteria bacterium RBG 16 68 21 OFW66924 91 71 Nannocystis exedens vinylacetylCoADeltaisomerase Nannocystis exedens SFD78078 Bacteria 86 Candidatus Actinomarina minuta aromatic ring hydroxylase Candidatus Actinomarina minuta AGQ19232 79 Heimdallarchaeota archaeon LC 3 OLS27557 Candidatus Nitrososphaera gargensis Ga92 YP 0068640121 Archaea 100 Candidatus sp AR2 YP 0067749211 Archaea Cenarchaeum symbiosum A YP 8749771 Archaea 100 97 Alcanivorax jadensis WP 035249649 Bacteria 100 68 Alcanivorax sp Nap 24 Alcanivorax sp Nap 24 KXJ42863 Bacteria 67 Patulibacter medicamentivorans EHN09625 Bacteria 68 Gammaproteobacteria bacterium 45 16 T64 Gammaproteobacteria bacterium 45 16 T64 OUS31771 Bacteria 99 Pacificimonas flava WP 088711494 100 Haliea salexigens Haliea salexigens WP 027949422 Bacteria 99 100 Cellvibrionales bacterium TMED79 Cellvibrionales bacterium TMED79 OUV04328 Cellvibrionales bacterium TMED21 Cellvibrionales bacterium TMED21 OUT64600 Rhodoplanes sp Z2YC6860 WP 068017906 Bacteria 87 Thermosyntropha lipolytica WP 073090387 Bacteria 86 100 bacterium 50 218 VinylacetylCoA Deltaisomerase Thermoanaerobacterales bacterium 50 218 KUK31993 Bacteria Syntrophothermus lipocalidus WP 013174847 Bacteria 87 Bilophila wadsworthia WP 005024032 Bacteria 100 Actinobacteria bacterium RBG 13 63 9 partial Actinobacteria bacterium RBG 13 63 9 OFW61991 100 Chloroflexi bacterium RBG 16 64 43 Chloroflexi bacterium RBG 16 64 43 OGO46987 93 Anaerolineae bacterium SM23 63 Anaerolineae bacterium SM23 63 KPK94973 Bacteria 100 Desulfosporosinus orientis WP 014183362 Bacteria 100 Desulfosporosinus sp I2 WP 045573733 Bacteria 96 Syntrophomonas zehnderi WP 046499671 Bacteria 84 69 Deltaproteobacteria bacterium RBG 13 61 14 Deltaproteobacteria bacterium RBG 13 61 14 OGP60234 Bacteria 100 Deltaproteobacteria bacterium RBG 13 47 9 partial Deltaproteobacteria bacterium RBG 13 47 9 OGP64414 Bacteria Desulfarculus baarsii WP 013257453 Bacteria 100 Thorarchaeota archaeon SMTZ1 45 KXH73298 95 71 Thorarchaeota archaeon AB 25 OLS21572 Carboxydothermus hydrogenoformans WP 011344224 Bacteria 96 Clostridiales bacterium PH28 bin88 Clostridiales bacterium PH28 bin88 KKM10021 Bacteria 99 99 Desulfobacteraceae bacterium 4484 1902 partial Desulfobacteraceae bacterium 4484 190 2 OPX37824 Bacteria 82 Deltaproteobacteria bacterium RBG 16 58 17 OGP84484 Bacteria 92 Lokiarchaeota archaeon CR 4 OLS15558 Actinobacteria bacterium RBG 13 55 18 OFW55534 96 Clostridium lactatifermentans WP 087988292 Bacteria 94 Sporanaerobacter sp PP176a WP 071140098 Bacteria 98 94 Megasphaera paucivorans WP 091650569 Bacteria 100 Christensenella minuta WP 066522843 Bacteria Emergencia timonensis Emergencia timonensis WP 067539369 Peptococcaceae bacterium BICA18 Peptococcaceae bacterium BICA18 KJS84478 Bacteria 95 100 Desulfotomaculum thermosubterraneum WP 072867350 Bacteria 94 Clostridiales bacterium PH28 bin88 Clostridiales bacterium PH28 bin88 KKM10421 Bacteria 97 98 100 Geobacter metallireducens WP 011365971 Bacteria Desulfosporosinus sp TolM Desulfosporosinus sp TolM KGP77049 Bacteria 38 Desulfotomaculum copahuensis WP 066666307 Bacteria Tepidanaerobacter syntrophicus WP 059033422 Bacteria 100 Chloroflexi bacterium RBG 13 52 14 Chloroflexi bacterium RBG 13 52 14 OGO01838 Archaeoglobus fulgidus DSM 4304 NP 0691691 Archaea 100 Desulfatibacillum alkenivorans WP 015947493 Bacteria 100 Desulfotignum phosphitoxidans WP 006963527 Bacteria 100 96 41 100 bacterium SG8 3 Nitrospira bacterium SG8 3 KPK30761 Bacteria Deltaproteobacteria bacterium RBG 16 54 11 Deltaproteobacteria bacterium RBG 16 54 11 OGP82199 Bacteria 45 100 Deltaproteobacteria bacterium RBG 13 43 22 Deltaproteobacteria bacterium RBG 13 43 22 OGP52204 Bacteria Syntrophaceae bacterium CG2 30 58 14 Syntrophaceae bacterium CG2 30 58 14 OIP93046 Bacteria 76 Dehalococcoidia bacterium SG8 51 3 Dehalococcoidia bacterium SG8 51 3 KPK22682 100 Deltaproteobacteria bacterium RBG 16 44 11 Deltaproteobacteria bacterium RBG 16 44 11 OGP67488 Bacteria 87 99 69 100 Desulfomonile tiedjei WP 014808689 Bacteria Desulfomonile tiedjei WP 014809852 Bacteria Deltaproteobacteria bacterium RBG 16 54 18 Deltaproteobacteria bacterium RBG 16 54 18 OGP94733 Bacteria uncultured Roseburia sp vinylacetylCoADeltaisomerase uncultured Roseburia sp SCI71009 100 Bacillus sp ES3 Bacillus sp ES3 WP 071412666 Bacteria 100 93 Effusibacillus pohliae hypothetical protein Effusibacillus pohliae WP 018133402 Candidatus Rokubacteria bacterium GWA2 73 35 OGK81673 Dehalococcoidia bacterium SM23 28 2 Dehalococcoidia bacterium SM23 28 2 KPK48035 90 100 Sulfolobus acidocaldarius DSM 639 YP 2567291 Archaea 100 100 Sulfolobus tokodaii str 7 NP 3776311 Archaea hospitalis W1 YP 0044592051 Archaea 100 Sulfolobus islandicus HVE10 4 YP 0056470731 Archaea 100 str IM2 NP 5601891 Archaea 100 100 Thermoproteus uzoniensis 76820 YP 0043377441 Archaea Thermoproteus tenax Kra 1 YP 0048928231 Archaea 100 hospitalis KIN4 I YP 0014351841 Archaea fumarii 1A YP 0047813861 Archaea 95 Syntrophus sp GWC2 56 31 Syntrophus sp GWC2 56 31 OHE24129 Bacteria 100 91 Syntrophaceae bacterium CG2 30 58 14 Syntrophaceae bacterium CG2 30 58 14 OIP93452 Bacteria 92 Syntrophus sp GWC2 56 31 Syntrophus sp GWC2 56 31 OHE21963 Bacteria 93 Deltaproteobacteria bacterium SM23 61 Deltaproteobacteria bacterium SM23 61 KPK92058 Bacteria Syntrophorhabdus aromaticivorans Syntrophorhabdus aromaticivorans WP 028893254 Bacteria 94 100 Thermodesulforhabdus norvegica WP 093393062 Bacteria Clostridiales bacterium DRI13 Clostridiales bacterium DRI13 WP 034423693 Bacteria 99 Sulfolobus acidocaldarius DSM 639 YP 2557481 Archaea 39 100 Crenarchaeota archaeon SCGC AAA261 N23 2264888932 Sulfolobus islandicus HVE10 4 YP 0056448861 Archaea 68 DSM 5348 YP 0011913051 Archaea 45 Pyrobaculum oguniense TE7 YP 0052612931 Archaea 27 100 Crenarchaeota archaeon SCGC AAA261 C22 2264892060 100 Sulfolobus tokodaii str 7 NP 3759111 Archaea Vulcanisaeta distributa DSM 14429 YP 0039026331 Archaea 94 100 100 Ferroplasma acidarmanus fer1 YP 0081412561 Archaea Vulcanisaeta moutnovskia 76828 YP 0042459511 Archaea 99 Lokiarchaeota archaeon CR 4 OLS12489 99 Lokiarchaeota archaeon CR 4 OLS15367 100 Thorarchaeota archaeon SMTZ1 45 KXH72787 86 26 Thorarchaeota archaeon AB 25 OLS29561 Lokiarchaeota archaeon CR 4 OLS12834 100 100 Lokiarchaeum sp GC14 75 KKK43500 Archaeoglobus fulgidus DSM 4304 NP 0698601 Archaea 100 Desulfobacteraceae bacterium 4484 1901 hypothetical protein B1H11 02200 partial Desulfobacteraceae bacterium 4484 190 1 OPX39743 Bacteria 100 Deltaproteobacteria bacterium RBG 16 58 17 hypothetical protein A2V87 06905 Deltaproteobacteria bacterium RBG 16 58 17 OGP82118 Bacteria Peptococcaceae bacterium BRH c4b hypothetical protein VR69 16425 Peptococcaceae bacterium BRH c4b KJS14753 Bacteria 39 Archaeoglobus fulgidus DSM 4304 NP 0697181 Archaea 100 100 Archaeoglobus sulfaticallidus PM701 YP 0079076971 Archaea Ferroglobus placidus DSM 10642 YP 0034353521 Archaea Caldisphaera lagunensis DSM 15908 YP 0071742681 Archaea 100 Elusimicrobia bacterium RIFOXYB2 FULL 62 6 Elusimicrobia bacterium RIFOXYB2 FULL 62 6 OGS52501 Bacteria 90 90 Syntrophobacter fumaroxidans WP 011700717 Bacteria 100 Pelotomaculum thermopropionicum SI aromatic ring hydroxylase Pelotomaculum thermopropionicum SI BAF59951 Bacteria Desulfotomaculum thermosubterraneum WP 072868999 Bacteria 96 Clostridiales bacterium PH28 bin88 Clostridiales bacterium PH28 bin88 KKM09002 Bacteria 100 Megasphaera cerevisiae WP 074503171 Bacteria 100 Megasphaera sp BV3C161 Megasphaera sp BV3C161 WP 036242684 Bacteria 91 Intestinimonas butyriciproducens Intestinimonas butyriciproducens WP 058118756 100 Anaerolineaceae bacterium 4572 322 Anaerolineaceae bacterium 4572 32 2 OQY17459 Bacteria 100 Desulfobacteraceae bacterium 4572 123 Desulfobacteraceae bacterium 4572 123 OQY05907 Bacteria 95 100 72 Lokiarchaeum sp GC14 75 KKK46499 Desulfopila aestuarii WP 073615492 Bacteria 97 Deltaproteobacteria bacterium SM23 61 Deltaproteobacteria bacterium SM23 61 KPK90599 Bacteria 99 Clostridiales bacterium DRI13 Clostridiales bacterium DRI13 WP 034424306 Bacteria Clostridiales bacterium PH28 bin88 Clostridiales bacterium PH28 bin88 KKM11653 Bacteria 4- hydroxy phenylacetate-3-hydroxylase Hippea alviniae Hippea alviniae WP 026939490 Bacteria 100 100 Peptococcaceae bacterium BICA18 Peptococcaceae bacterium BICA18 KJS80957 Bacteria 100 100 Anaerosporomusa subterranea WP 066242833 100 Clostridiales bacterium DRI13 Clostridiales bacterium DRI13 WP 034423538 Bacteria Sporolituus thermophilus WP 093689009 Bacteria Desulfuromonas sp DDH964 Desulfuromonas sp DDH964 WP 066729573 Bacteria 100 candidate division Zixibacteria bacterium DG 27 candidate division Zixibacteria bacterium DG 27 KPJ49045 82 candidate division Zixibacteria bacterium RBG 16 53 22 candidate division Zixibacteria bacterium RBG 16 53 22 OGC92899 100 candidate division TA06 bacterium DG 24 KPJ52824 100 Thermoplasmatales archaeon SG8 52 3 1 Archaea 82 Thermoplasmatales archaeon SM1 50 1 Archaea Elusimicrobia bacterium GWA2 69 24 Elusimicrobia bacterium GWA2 69 24 OGR56649 Bacteria

0.5

33

Suppl. Figure 7: Phylogenetic analysis of 4-hydroxy-butyryl-CoA dehydrogenases (COG2368). Phylogenetic tree was estimated using IQ-tree LG+C20+F on an alignment of 186 taxa and 459 sites and is unrooted. Bacteria, Archaea, eukaryotes and Asgards are colored in black, green, blue and purple, respectively. Raw data files are available via figshare (see Data availability for more details).

34

Hexadecanoate COG0318 Acyl-CoA (C ) Acyl-CoA (Cn+2 ) n COG1960* or Enoyl-CoA hydratase Archaea Acetyl-CoA COG2368 COG1024, clade 1 (of 2) -oxidation

β Bacteria Acyl-2-enoyl-CoA 220 taxa, 207 sites COG1024* Asgards COG0183* - FA UF IQTREE bootstraps 3-hydroxy-acyl-CoA LG+C20+G+F Eukaryotes COG1250*

Aceto-acyl-CoA Fatty acid biosynthesis

100 100 Thermoplasma acidophilum DSM 1728 NP 3935991 Archaea 93 Thermoplasma volcanium GSS1 NP 1107171 Archaea Ferroplasma acidarmanus fer1 YP 0081412391 Archaea 99 88 Rheinheimera sp A13L WP 008900523 uniref Bacteria Lokiarchaeum sp GC14 75 KKK41766 ASGARD 100 100 Pyrobaculum calidifontis JCM 11548 YP 0010561961 Archaea 44 Pyrobaculum oguniense TE7 YP 0052613111 Archaea 54 Sulfolobus islandicus M1627 YP 0028428681 Archaea 100 Brevundimonas sp BAL3 WP 008260384 uniref Bacteria Azoarcus sp CIB WP 050416105 uniref Bacteria Pelagibaca bermudensis WP 007797511 uniref Bacteria 53 99 Pyrobaculum aerophilum str IM2 NP 5600421 Archaea 100 Vulcanisaeta moutnovskia 76828 YP 0042442981 Archaea 62 100 Sulfolobus acidocaldarius DSM 639 YP 2566501 Archaea Sulfolobus tokodaii str 7 NP 3758941 Archaea 87 Rhodococcus opacus WP 012690607 uniref Bacteria 48 alni WP 011604258 uniref Bacteria 98 Frankia WP 011604285 uniref Bacteria 100 Halalkalicoccus jeotgali B3 YP 0037382851 Archaea 15 Haloterrigena turkmenica DSM 5511 YP 0034056821 Archaea 100 100 archaeon Heimdall LC 2 OLS29277 ASGARD 97 archaeon Heimdall LC 3 OLS16875 ASGARD 93 Lokiarchaeum sp GC14 75 KKK45968 ASGARD Desulfosporosinus acididurans WP 053006429 uniref Bacteria Leptospira biflexa WP 012387145 uniref Bacteria 97 100 96 Rhodococcus WP 007727023 uniref Bacteria 25 97 Amycolatopsis sp M39 WP 067587278 uniref Bacteria Desulfomonile tiedjei WP 014812730 uniref Bacteria 23 Archaeoglobus fulgidus DSM 4304 NP 0712511 Archaea Haloterrigena turkmenica DSM 5511 YP 0034028701 Archaea 100 Haloarcula marismortui WP 011223966 uniref Archaea 99 Natronococcus occultus WP 015321957 uniref Archaea 91 100 Haloterrigena turkmenica DSM 5511 YP 0034038961 Archaea magadii ATCC 43099 YP 0034789571 Archaea 21 65 100 halophilic archaeon DL31 WP 014050575 uniref Archaea 85 Natronococcus occultus SP4 YP 0073117791 Archaea Halalkalicoccus jeotgali B3 YP 0037386321 Archaea 87 archaeon Heimdall LC 2 OLS16840 ASGARD 74 24 100 Haloarcula hispanica ATCC 33960 YP 0047860071 Archaea Haloterrigena turkmenica DSM 5511 YP 0034053381 Archaea halophilic archaeon DL31 YP 0048095201 Archaea 66 Sulfolobus tokodaii WP 010978008 uniref Archaea Nocardioides sp JS614 WP 011753545 uniref Bacteria Syntrophomonas wolfei WP 011641933 uniref Bacteria 100 100 Bacillus WP 009794142 uniref Bacteria 100 Bacillus megaterium WP 049164532 uniref Bacteria 100 Kyrpidia tusciae WP 013074639 uniref Bacteria Bacillus cereus group WP 000787369 uniref Bacteria 61 Desulfosporosinus orientis WP 014184225 uniref Bacteria 100 59 99 Frankia WP 011601452 uniref Bacteria Desulfarculus baarsii WP 013259248 uniref Bacteria 35 Bhargavaea cecembensis WP 008298544 uniref Bacteria Rhodopseudomonas palustris WP 011471336 uniref Bacteria 99 24 Syntrophothermus lipocalidus WP 013174766 uniref Bacteria 99 Peptococcaceae bacterium CEB3 WP 047829464 uniref Bacteria 46 99 Aerococcus urinae WP 013669571 uniref Bacteria Alicyclobacillus acidocaldarius WP 014464611 uniref Bacteria 95 47 Desulfosporosinus orientis WP 014183353 uniref Bacteria Desulfatibacillum alkenivorans WP 015947404 uniref Bacteria 100 Frankia alni WP 011606943 uniref Bacteria Streptomyces sp PTY087I2 WP 065478073 uniref Bacteria 100 100 Halalkalicoccus jeotgali B3 YP 0037382701 Archaea 19 Haloterrigena turkmenica DSM 5511 YP 0034056651 Archaea P2 NP 3434131 Archaea 7 17 xylanophilus WP 011564725 uniref Bacteria 100 99 100 Geobacillus kaustophilus WP 011231478 uniref Bacteria archaeon Heimdall LC 2 OLS28655 ASGARD 89 Sorangium cellulosum WP 012238553 uniref Bacteria Coraliomargarita akajimensis WP 013041973 uniref Bacteria 70 100 Rhodococcus opacus WP 012691755 uniref Bacteria Frankia symbiont of Datisca glomerata WP 013873382 uniref Bacteria 98 70 Desulfosporosinus orientis WP 014183693 uniref Bacteria 40 Geobacter sp M18 WP 015718743 uniref Bacteria 98 marine gamma proteobacterium HTCC2080 WP 007233860 uniref 41 Acinetobacter oleivorans WP 013197757 uniref Bacteria Syntrophothermus lipocalidus WP 013174671 uniref Bacteria 67 Eggerthella WP 009305556 uniref Bacteria Desulfobacterium autotrophicum WP 015905953 uniref Bacteria 100 83 Drosophila melanogaster NP 610805 39 Anopheles gambiae str PEST XP 309296 83 Branchiostoma floridae XP 002599825 100 99 100 Gallus gallus methylglutaconylCoA hydratase mitochondrial isoform 1 Gallus gallus NP 001239535 Homo sapiens NP 001689 Danio rerio methylglutaconylCoA hydratase mitochondrial Danio rerio NP 001003576 100 Acyrthosiphon pisum NP 001155743 100 100 Glycine max NP 001236189 100 Glycine max XP 003549935 100 Populus trichocarpa XP 006385550 Arabidopsis thaliana NP 193413 85 99 Tetrahymena thermophila SB210 XP 001014909 51 XP 005708127 Dictyostelium discoideum AX4 XP 636218 97 37 Lysinibacillus manganicus WP 036188943 uniref Bacteria 99 94 Virgibacillus sp SK37 WP 040955555 uniref Bacteria Halobacillus sp BAB2008 WP 008638167 uniref Bacteria 99 99 Bacillus WP 003231526 uniref Bacteria Paenibacillus tyrfis WP 036691079 uniref Bacteria Bacillus sp FJAT25509 WP 056469684 uniref Bacteria Stigmatella aurantiaca WP 002615076 uniref Bacteria 74 89 100 subterraneus WP 006904322 uniref Bacteria 98 Kyrpidia tusciae WP 013075283 uniref Bacteria 97 99 Alicycliphilus WP 013520324 uniref Bacteria 91 hinzii WP 029578336 uniref Bacteria Streptomyces clavuligerus WP 003961616 uniref Bacteria 100 Rubrobacter xylanophilus WP 011564738 uniref Bacteria Haliangium ochraceum WP 012828694 uniref Bacteria 85 67 Ruminiclostridium thermocellum WP 003517391 uniref 96 Nocardia farcinica WP 060590886 uniref Bacteria 80 100 Lokiarchaeum sp GC14 75 KKK43567 ASGARD Candidatus Lokiarchaeota archaeon CR 4 OLS16144 ASGARD 97 95 Sphingobium chlorophenolicum WP 013846144 uniref Bacteria 99 Pseudonocardia WP 020628016 uniref Bacteria Syntrophomonas wolfei WP 011640082 uniref Bacteria 88 100 Candidatus Thorarchaeota archaeon SMTZ183 KXH77447 ASGARD 69 Candidatus Thorarchaeota archaeon SMTZ145 KXH74847 ASGARD 100 Desulfobacula toluolica WP 014957162 uniref Bacteria archaeon Heimdall LC 3 3 OLS22333 ASGARD Amycolatopsis mediterranei WP 013225713 uniref Bacteria 94 64 WP 010929921 uniref Bacteria 54 Syntrophothermus lipocalidus WP 013175409 uniref Bacteria Verminephrobacter eiseniae WP 011809574 uniref Bacteria Syntrophothermus lipocalidus WP 013174672 uniref Bacteria 72 100 Mesorhizobium sp STM 4661 WP 006332089 uniref Bacteria 28 Magnetospirillum sp XM1 WP 068435197 uniref Bacteria Labrenzia aggregata WP 055655685 uniref Bacteria 75 WP 003632456 uniref Bacteria 64 Mycobacterium smegmatis WP 003891941 uniref Bacteria 98 77 Desulfococcus oleovorans WP 012175781 uniref Bacteria 80 Pseudomonas sp UW4 WP 015094331 uniref Bacteria 99 Candidatus Nitrospira inopinata WP 062483601 uniref 100 Oscillochloris trichoides WP 006561984 uniref Bacteria Roseiflexus sp RS1 WP 011957907 uniref Bacteria 90 100 63 Stenotrophomonas maltophilia WP 024957873 uniref Bacteria Pseudoxanthomonas suwonensis WP 052633006 uniref Bacteria Niastella koreensis WP 014221770 uniref Bacteria Halomonas sp HL48 WP 027336359 uniref Bacteria 97 69 93 Dethiobacter alkaliphilus WP 008515220 uniref Bacteria 71 Desulfurispirillum indicum WP 013506013 uniref Bacteria 30 Hippea maritima WP 013682398 uniref Bacteria 93 100 Desulfotignum phosphitoxidans WP 006965724 uniref Bacteria Desulfobacula toluolica WP 014957343 uniref Bacteria 92 87 Waddlia chondrophila WP 013182530 uniref Bacteria Bacteriovorax sp Seq25 V WP 021274470 uniref Bacteria 55 65 Leptospira biflexa WP 012387934 uniref Bacteria archaeon Heimdall LC 3 OLS26255 ASGARD Desulfarculus baarsii WP 013258113 uniref Bacteria 100 100 Clostridium perfringens WP 004459261 uniref Bacteria 100 Clostridium WP 011967671 uniref Bacteria Clostridiales WP 008392578 uniref Bacteria Clostridium acetobutylicum WP 010965999 uniref Bacteria 75 77 88 71 Peptoniphilus sp oral taxon 836 WP 009345334 uniref Bacteria 59 21 Clostridium botulinum WP 012343411 uniref Bacteria Carboxydothermus hydrogenoformans WP 011344504 uniref Bacteria Clostridiales WP 003437965 uniref Bacteria 54 Thermoanaerobacter WP 004400800 uniref Bacteria 100 97 Candidatus Nitrosopumilus sp AR2 YP 0067761851 Archaea 42 Cenarchaeum symbiosum A YP 8751151 Archaea Thermomicrobium roseum WP 012641455 uniref Bacteria 82 91 Clostridium kluyveri WP 012102871 uniref Bacteria Clostridium propionicum WP 066050012 uniref Bacteria 87 93 100 Candidatus Thorarchaeota archaeon SMTZ183 KXH72335 ASGARD 90 Candidatus Thorarchaeota archaeon AB 25 OLS31246 ASGARD 94 Candidatus Thorarchaeota archaeon AB 25 OLS31249 ASGARD 92 Archaeoglobus fulgidus DSM 4304 NP 0692711 Archaea 100 92 Desulfosporosinus orientis WP 014185308 uniref Bacteria Intestinimonas butyriciproducens WP 033118374 uniref Roseiflexus castenholzii WP 012121641 uniref Bacteria Clostridium pasteurianum WP 003444509 uniref Bacteria 99 100 Plesiocystis pacifica WP 006976549 uniref Bacteria 83 Pseudogulbenkiania ferrooxidans WP 021477154 uniref Bacteria Halanaerobium saccharolyticum WP 005489316 uniref Bacteria 85 Geobacter metallireducens WP 004511493 uniref Bacteria Cupriavidus necator WP 011615255 uniref Bacteria 100 100 Lokiarchaeum sp GC14 75 KKK41167 ASGARD Lokiarchaeum sp GC14 75 KKK44690 ASGARD 100 marine gamma proteobacterium HTCC2080 WP 007233560 uniref 100 100 100 100 Rhodopseudomonas palustris WP 011441590 uniref Bacteria Bordetella hinzii WP 029580926 uniref Bacteria 16 Desulfococcus oleovorans WP 012174325 uniref Bacteria 88 Candidatus Microthrix parvicella WP 012224500 uniref Bacteria 100 100 Acidiplasma WP 048101207 uniref Archaea 73 100 Ferroplasma acidarmanus fer1 YP 0081412551 Archaea Vulcanisaeta moutnovskia 76828 YP 0042437361 Archaea Archaeoglobus fulgidus DSM 4304 NP 0695191 Archaea 90 Desulfococcus oleovorans WP 012175023 uniref Bacteria 29 Candidatus Caldiarchaeum subterraneum YP 0087966721 Archaea 100 37 Lokiarchaeum sp GC14 75 KKK46062 ASGARD 39 Lokiarchaeum sp GC14 75 2 KKK40381 ASGARD Ferroglobus placidus WP 012966419 uniref Archaea 100 3931 Picrophilus torridus WP 011177822 uniref Archaea Archaeoglobus fulgidus WP 048096729 uniref Archaea archaeon Heimdall LC 2 OLS29106 ASGARD 100 80 Candidatus Thorarchaeota archaeon SMTZ183 KXH72339 ASGARD 64 Candidatus Thorarchaeota archaeon SMTZ145 KXH70915 ASGARD Burkholderia sp YI23 WP 014193462 uniref Bacteria archaeon Heimdall AB 125 OLS31877 ASGARD 98 38 98 Sulfolobus acidocaldarius N8 YP 0074346931 Archaea 98 Sulfolobus islandicus LD85 YP 0034197211 Archaea 99 Sulfolobus tokodaii str 7 NP 3774781 Archaea 4 Acidianus hospitalis W1 YP 0044582771 Archaea 16 Metallosphaera sedula WP 012021928 uniref Archaea Tepidanaerobacter acetatoxydans WP 013778878 uniref Bacteria 7 Escherichia coli WP 001283568 uniref Bacteria Thiomonas WP 013106971 uniref Bacteria 77 99 21 Mesorhizobium metallidurans WP 008877395 uniref Bacteria Mycobacterium smegmatis WP 058125069 uniref Bacteria 23 Thermobaculum terrenum WP 012873964 uniref Bacteria 100 96 Thermoplasma acidophilum WP 010900489 uniref Archaea Thermoplasma volcanium GSS1 NP 1105361 Archaea Picrophilus torridus WP 011177755 uniref Archaea 99 Deferribacter desulfuricans WP 013008772 uniref Bacteria 100 Calditerrivibrio nitroreducens WP 013451445 uniref Bacteria 36 94 Denitrovibrio acetiphilus WP 013012168 uniref Bacteria Flexistipes sinusarabici WP 013887366 uniref Bacteria 100 sp oral taxon 071 WP 000048044 uniref Bacteria Eremococcus coleocola WP 006418863 uniref Bacteria 100 89 Denitrovibrio acetiphilus WP 013012169 uniref Bacteria Flexistipes sinusarabici WP 013887365 uniref Bacteria Candidatus Koribacter versatilis WP 011522234 uniref Bacteria Geobacter metallireducens WP 004511490 uniref Bacteria

0.5

35

Suppl. Figure 8: Phylogenetic analysis of Enoyl-CoA reductase clade 1 (COG1024). Phylogenetic tree was estimated using IQ-tree LG+C20+F on an alignment of 220 taxa and 207 sites and is unrooted. Bacteria, Archaea, eukaryotes and Asgards are colored in black, green, blue and purple respectively. Raw data files are available via figshare (see Data availability for more details).

36

Hexadecanoate COG0318 Acyl-CoA (C ) Acyl-CoA (Cn+2 ) n COG1960* or Enoyl-CoA hydratase Archaea Acetyl-CoA COG2368 COG1024, clade 2 (of 2) -oxidation

β Bacteria Acyl-2-enoyl-CoA 374 taxa, 133 sites COG1024* Asgards COG0183* - FA UF IQTREE bootstraps 3-hydroxy-acyl-CoA LG+C20+G+F Eukaryotes COG1250*

Aceto-acyl-CoA Fatty acid biosynthesis

100 100 Phaeobacter sp CECT 5382 WP 058332540 uniref Bacteria 99 Rhodobacteraceae bacterium O365 WP 068241489 uniref Bacteria 100 Rhodobacter sphaeroides WP 011337545 uniref Bacteria 76 Pseudovibrio sp WM33 WP 063301427 uniref Bacteria Rhodobacteraceae bacterium O365 WP 068242823 uniref Bacteria 74 Asticcacaulis sp AC402 WP 023448984 uniref Bacteria 95 80 Roseiflexus sp RS1 WP 011958520 uniref Bacteria archaeon Heimdall LC 3 OLS23549 ASGARD Ktedonobacter racemifer WP 007909232 uniref Bacteria 74 96 100 Thiomonas WP 013104588 uniref Bacteria 41 Caballeronia glathei WP 035940274 uniref Thiomonas WP 013107395 uniref Bacteria 100 Azoarcus sp BH72 WP 011766794 uniref Bacteria 68 Acinetobacter indicus WP 045795055 uniref Bacteria 61 100 94 Pseudogulbenkiania ferrooxidans WP 021477034 uniref Bacteria 37 Aquitalea WP 045845199 uniref Bacteria 98 Herbaspirillum seropedicae WP 013235032 uniref Bacteria Magnetospirillum sp XM1 WP 068437098 uniref Bacteria 99 Burkholderia lata WP 011354038 uniref Bacteria 76 Ralstonia solanacearum WP 064050468 uniref Bacteria 100 Limnobacter sp MED105 WP 008247470 uniref Bacteria 84 Bordetella ansorpii WP 066419609 uniref Bacteria thermophilus WP 012872967 uniref Bacteria 100 92 100 Kutzneria albida WP 025355251 uniref Bacteria Pseudonocardia autotrophica WP 073578012 uniref Bacteria Micromonospora WP 013287978 uniref Bacteria 29 Desulfosporosinus orientis WP 014186447 uniref Bacteria 100 100 Bradyrhizobiaceae bacterium SG6C WP 009737109 uniref Bacteria 89 100 Rhodopseudomonas palustris WP 011157700 uniref Bacteria Cupriavidus taiwanensis WP 012355636 uniref Bacteria 100 60 Agrobacterium tumefaciens complex WP 010973077 uniref Bacteria Cupriavidus necator WP 013959343 uniref Bacteria Myxococcus xanthus WP 011552779 uniref Bacteria 89 Achromobacter piechaudii WP 006220163 uniref Bacteria Sphingomonas wittichii WP 011952587 uniref Bacteria 100 98 Bacillus subtilis WP 010886512 uniref Bacteria 98 Serratia plymuthica WP 020453841 uniref Bacteria WP 010957414 uniref Bacteria 87 93 Saccharopolyspora erythraea WP 009943562 uniref Bacteria Catenulispora acidiphila WP 012786807 uniref Bacteria 93 99 Nostoc punctiforme WP 012409605 uniref Bacteria 100 Haliangium ochraceum WP 012826359 uniref Bacteria 68 89 Plesiocystis pacifica WP 006972733 uniref Bacteria 98 Yersinia pekkanenii WP 049608915 uniref Bacteria Xenorhabdus nematophila WP 013184043 uniref Bacteria Collimonas arenae WP 038489236 uniref Bacteria 83 97 90 Cupriavidus taiwanensis WP 012355348 uniref Bacteria 85 Bradyrhizobium lablabi WP 057856489 uniref Bacteria Cupriavidus taiwanensis WP 012355549 uniref Bacteria 93 Rhodobacteraceae bacterium O365 WP 068242680 uniref Bacteria Sphingomonas haloaromaticamans WP 070933504 uniref Bacteria 100 61 Pyrobaculum calidifontis JCM 11548 YP 0010554731 Archaea 62 Thermoproteus tenax Kra 1 YP 0048931961 Archaea 64 61 Vulcanisaeta distributa DSM 14429 YP 0039011951 Archaea Thermoproteus uzoniensis 76820 YP 0043376401 Archaea 62 62 Acidilobus saccharovorans WP 013266096 uniref Archaea Caldisphaera lagunensis DSM 15908 YP 0071746121 Archaea 62 100 K1 NP 1469272 Archaea 62 Aeropyrum camini SY1 JCM 12091 YP 0086034471 Archaea 100 Sulfolobus islandicus HVE10 4 YP 0056455601 Archaea 60 Sulfolobus tokodaii str 7 NP 3759421 Archaea Sulfolobus islandicus WP 012713237 uniref Archaea 78 Sorangium cellulosum WP 012239513 uniref Bacteria 100 86 Pseudooceanicola batsensis WP 009803446 uniref Paraburkholderia xenovorans WP 011487921 uniref Ktedonobacter racemifer WP 007908085 uniref Bacteria 100 100 Rhodopseudomonas palustris WP 011156030 uniref Bacteria 95 Comamonas testosteroni WP 034375978 uniref Bacteria 32 Pseudovibrio WP 063290435 uniref Bacteria Sphingopyxis granuli WP 067182044 uniref Bacteria 100 47 Mycobacterium WP 007167596 uniref Bacteria marine gamma proteobacterium HTCC2143 WP 007224778 uniref 11 100 100 100 Mycobacterium tuberculosis WP 003905366 uniref Bacteria Mycobacterium WP 005080429 uniref Bacteria Rhodococcus sp PBTS 1 WP 068102333 uniref Bacteria 66 100 Cupriavidus metallidurans WP 011519907 uniref Bacteria 80 Hyphomonas adhaerens WP 035573072 uniref Bacteria 82 Frankia inefficax WP 013423992 uniref Bacteria 50 Pseudonocardia dioxanivorans WP 013673522 uniref Bacteria 60 Mycobacterium WP 007169955 uniref Bacteria Rhodopseudomonas palustris WP 011156802 uniref Bacteria 55 Pseudooceanicola batsensis WP 009804114 uniref 100 87 64 Novosphingobium aromaticivorans WP 011444179 uniref Bacteria Paraburkholderia xenovorans WP 011493497 uniref 37 Frankia inefficax WP 013424356 uniref Bacteria Lokiarchaeum sp GC14 75 KKK40489 ASGARD 65 100 Thermomonospora curvata WP 012853035 uniref Bacteria Mycobacterium liflandii WP 015356648 uniref Bacteria marine gamma proteobacterium HTCC2143 WP 007223871 uniref 44 59 Aromatoleum aromaticum WP 011238810 uniref Bacteria 87 archaeon Heimdall LC 3 OLS23839 ASGARD 88 Ferroglobus placidus DSM 10642 YP 0034354681 Archaea 97 56 Candidatus Koribacter versatilis WP 011522066 uniref Bacteria 89 Chloracidobacterium thermophilum WP 014099232 uniref Bacteria Geobacter bemidjiensis WP 012529863 uniref Bacteria 12 94 Nocardia farcinica WP 011208805 uniref Bacteria 28 Halovivax ruber XH70 YP 0072852371 Archaea 31 Paraburkholderia xenovorans WP 011493722 uniref Frankia alni WP 011606367 uniref Bacteria 97 97 Streptomyces WP 014046606 uniref Bacteria 97 Streptomyces noursei WP 067344425 uniref Bacteria 10 99 Streptomyces himastatinicus WP 009717311 uniref Bacteria Streptomyces reticuli WP 059247798 uniref Bacteria Caballeronia glathei WP 035935640 uniref 97 77 Brevibacillus brevis WP 015891151 uniref Bacteria 89 Amycolatopsis sp M39 WP 067575287 uniref Bacteria Patulibacter medicamentivorans WP 007570965 uniref Bacteria 100 43 28 Roseovarius sp TM1035 WP 008281412 uniref Bacteria Sinorhizobium medicae WP 011969973 uniref Bacteria 95 Mycobacterium sinense WP 013830735 uniref Bacteria Candidatus Lokiarchaeota archaeon CR 4 OLS14783 ASGARD 100 100 Branchiostoma floridae XP 002595384 uniref Eukaryota 100 Branchiostoma floridae XP 002607030 uniref Eukaryota 100 Trichoplax adhaerens XP 002116876 uniref Eukaryota Nematostella vectensis XP 001630112 uniref Eukaryota 100 100 Homo sapiens NP 001910 uniref Eukaryota Prima Mus musculus NP 034153 UNIPROT Eukaryota Glir 99 Thalassiosira pseudonana CCMP1335 XP 002290549 uniref Eukaryota 97 Phytophthora infestans T304 XP 002907736 uniref Eukaryota 100 97 Drosophila melanogaster NP 609322 uniref Eukaryota Muscomo 100 97 Drosophila melanogaster NP 609324 uniref Eukaryota Muscomo Aedes aegypti XP 001649724 uniref Eukaryota C 75 Acyrthosiphon pisum 3 2transenoylCoA isomerase mitochondriallike Acyrthosiphon pisum NP 001191928 Apis mellifera PREDICTED enoylCoA delta isomerase 1 mitochondriallike Apis mellifera XP 624385 100 97 100 Pseudopedobacter saltans WP 013634600 uniref Pedobacter glucosidilyticus WP 039454871 uniref Bacteria Maribacter sp HTCC2170 WP 013307558 uniref Bacteria 73 100 Lysobacter sp A03 WP 043957485 uniref Bacteria Amantichitinum ursilacus WP 053936004 uniref 74 Desulfococcus oleovorans WP 012175971 uniref Bacteria 76 15 Talaromyces stipitatus ATCC 10500 XP 002486317 97 Desulfomonile tiedjei WP 014807981 uniref Bacteria Ferroglobus placidus DSM 10642 YP 0034363221 Archaea 71 99 95 Erythrobacter sp NAP1 WP 007163228 uniref Bacteria 61 Phaeobacter sp CECT 5382 WP 058334795 uniref Bacteria Pelobacter propionicus WP 011734049 uniref Bacteria 30 87 Leptospira licerasiae WP 008592889 uniref Bacteria 10 Leptospira biflexa WP 012387089 uniref Bacteria 50 archaeon Heimdall LC 3 3hydroxypropionylcoenzyme A dehydratase archaeon Heimdall LC 3 OLS24529 ASGARD 98 9 100 Mycobacterium WP 007166709 uniref Bacteria Geodermatophilus obscurus WP 012947893 uniref Bacteria 10 Amycolatopsis sp M39 WP 067582072 uniref Bacteria Providencia rettgeri WP 042844560 uniref Bacteria 6 Leptospira biflexa WP 012388770 uniref Bacteria 22 Rhodococcus WP 011593692 uniref Bacteria Rhodococcus erythropolis WP 070385651 uniref Bacteria Mycobacterium kansasii WP 023368831 uniref Bacteria 100 49 Cytophaga hutchinsonii WP 011586325 uniref Bacteria 97 tractuosa WP 013453505 uniref Bacteria 87 WP 013061966 uniref Bacteria 75 Lokiarchaeum sp GC14 75 putative enoylCoA hydratase echA8 partial Lokiarchaeum sp GC14 75 KKK45011 ASGARD Granulicella mallensis WP 014266994 uniref Bacteria Chloracidobacterium thermophilum WP 014100070 uniref Bacteria 95 100 87 Waddlia chondrophila WP 013181550 uniref Bacteria 73 Isosphaera pallida WP 013564659 uniref Bacteria Achromobacter xylosoxidans WP 058207472 uniref Bacteria 74 WP 053767930 uniref Bacteria 100 89 Desulfatibacillum alkenivorans WP 012610173 uniref Bacteria 89 Desulfobacterium autotrophicum WP 015905693 uniref Bacteria Nocardioides sp JS614 WP 011757565 uniref Bacteria 100 Rhodococcus opacus WP 015890819 uniref Bacteria 100 100 Mycobacterium tuberculosis complex WP 003414497 uniref Bacteria Mycobacterium chelonae group WP 005081745 uniref Bacteria 84 Ilumatobacter coccineus WP 015440685 uniref Bacteria 100 87 Arabidopsis thaliana NP 193356 uniref Eukaryota 71 94 Oryza brachyantha PREDICTED probable enoylCoA hydratase 1 peroxisomal Oryza brachyantha XP 006649975 82 Thalassiosira pseudonana CCMP1335 XP 002295697 uniref Eukaryota Pseudonocardia dioxanivorans WP 013674225 uniref Bacteria 98 gamma proteobacterium IMCC3088 WP 009574889 uniref gamma proteobacterium HIMB55 WP 009472242 uniref 100 Ostreococcus lucimarinus CCE9901 XP 001419835 uniref Eukaryota Ostreococcus tauri XP 003081364 uniref Eukaryota 99 100 marine gamma proteobacterium HTCC2080 WP 007235910 uniref 88 77 54 Parvularcula bermudensis WP 013300350 uniref Bacteria Phenylobacterium zucineum WP 012523111 uniref Bacteria 99 99 Emticicia oligotrophica WP 015030666 uniref Bacteria Candidatus Lokiarchaeota archaeon CR 4 EnoylCoA hydratase isomerase Lokiarchaeota archaeon CR 4 OLS15548 ASGARD 100 Desulfatibacillum alkenivorans WP 012610432 uniref Bacteria 85 99 100 Ilumatobacter coccineus WP 015440926 uniref Bacteria Ideonella sakaiensis WP 054022008 uniref Bacteria Candidatus Phaeomarinobacter ectocarpi WP 043948747 uniref 97 Sphingomonas wittichii WP 011952744 uniref Bacteria 79 82 Mycobacterium smegmatis WP 058126469 uniref Bacteria 99 79 Rhodobacteraceae WP 028288592 uniref Bacteria WP 054525391 uniref Bacteria Rhodobacteraceae bacterium O365 WP 068242768 uniref Bacteria 100 100 Nocardioides sp JS614 WP 011757536 uniref Bacteria Candidatus Phaeomarinobacter ectocarpi WP 043948572 uniref 15 Frankia alni WP 011603781 uniref Bacteria 100 100 archaeon Heimdall LC 3 1 4Dihydroxy2naphthoylCoA synthase archaeon Heimdall LC 3 OLS23844 ASGARD archaeon Heimdall LC 2 1 4Dihydroxy2naphthoylCoA synthase archaeon Heimdall LC 2 OLS26766 ASGARD Ferroglobus placidus DSM 10642 YP 0034354771 Archaea 100 Clostridiales WP 003511159 uniref Bacteria 50 Solibacillus WP 008403119 uniref Bacteria 90 95 marine gamma proteobacterium HTCC2148 WP 007227481 uniref Chelativorans sp BNC1 WP 011582597 uniref Bacteria 49 90 Pseudomonas putida WP 012051973 uniref Bacteria 88 Amycolatopsis sp M39 WP 067595752 uniref Bacteria Ardenticatena maritima WP 054493681 uniref 100 Rhodococcus jostii WP 011595371 uniref Bacteria Candidatus Phaeomarinobacter ectocarpi WP 043949981 uniref 84 86 100 85 Mycobacterium gastri WP 036414103 uniref Bacteria 85 Mycobacterium smegmatis WP 058126071 uniref Bacteria 11 99 Rhodobacteraceae WP 043753635 uniref Bacteria Bdellovibrio bacteriovorus WP 011165085 uniref Bacteria Desulfobulbus propionicus WP 015723375 uniref Bacteria 82 19 Achromobacter piechaudii WP 006220764 uniref Bacteria Sphingopyxis alaskensis WP 011541159 uniref Bacteria Natronomonas moolapensis 8811 YP 0074873851 Archaea 100 100 Haloarcula hispanica ATCC 33960 YP 0047960691 Archaea 100 Halomicrobium mukohataei DSM 12286 YP 0031783551 Archaea 100 DS2 YP 0035355091 Archaea Halogeometricum borinquense WP 006054882 uniref Archaea 23 100 tiamatea WP 008525329 uniref Archaea Halorhabdus utahensis DSM 12940 YP 0031317721 Archaea Haloterrigena turkmenica DSM 5511 YP 0034053221 Archaea 100 100 Burkholderia pseudomallei WP 004533653 uniref Bacteria 65 95 Acidovorax sp Root70 WP 056638643 uniref Bacteria Azospirillum lipoferum WP 014248081 uniref Bacteria Parageobacillus caldoxylosilyticus WP 061578694 uniref 68 100 100 Patulibacter medicamentivorans WP 007576601 uniref Bacteria 57 73 Gordonia bronchialis WP 012834347 uniref Bacteria archaeon Heimdall LC 2 OLS27212 ASGARD 54 WP 010189256 uniref Bacteria Carboxydothermus hydrogenoformans WP 011345136 uniref Bacteria 100 77 Lokiarchaeum sp GC14 75 KKK43273 ASGARD Lokiarchaeum sp GC14 75 KKK42724 ASGARD Desulfosporosinus orientis WP 014183265 uniref Bacteria 100 65 Sphingomonas wittichii WP 011951195 uniref Bacteria 48 Sphingobium sp EP60837 WP 066523951 uniref Bacteria 99 57 Alicycliphilus WP 013520092 uniref Bacteria 95 Bordetella ansorpii WP 066408907 uniref Bacteria Marinovum algicola WP 048534167 uniref Bacteria 56 Rhodopseudomonas palustris WP 011156974 uniref Bacteria 64 99 Phenylobacterium zucineum WP 012523096 uniref Bacteria 100 Pseudovibrio axinellae WP 068008998 uniref Bacteria 80 98 Gordonia polyisoprenivorans WP 014360135 uniref Bacteria Steroidobacter denitrificans WP 066918033 uniref Bacteria Desulfosporosinus orientis WP 014185322 uniref Bacteria 100 96 Candidatus Lokiarchaeota archaeon CR 4 OLS13324 ASGARD Lokiarchaeum sp GC14 75 KKK43521 ASGARD archaeon Heimdall LC 2 OLS28944 ASGARD 94 47 Homo sapiens NP 004671 uniref Eukaryota Prima 89 Mus musculus NP 034011 UNIPROT Eukaryota Glir 100 Homo sapiens NP 689555 UNIPROT Eukaryota Prim 96 Schistosoma mansoni XP 018654624 uniref Eukaryota Oreochromis niloticus PREDICTED chromodomain Ylike protein Oreochromis niloticus XP 005476945 uniref Eukaryota 96 Nematostella vectensis XP 001623931 uniref Eukaryota Branchiostoma floridae XP 002597640 uniref Eukaryota 100 88 78 Mus musculus NP 001103801 uniref Eukaryota Gl 99 Mus musculus NP 081223 uniref Eukaryota Glire 100 Homo sapiens NP 996667 uniref Eukaryota Prima 50 99 Ornithorhynchus anatinus XP 007667278 uniref Eukaryota Caenorhabditis elegans NP 496330 uniref Eukaryota Ixodes scapularis XP 002399360 uniref Eukaryota 100 Sutterella wadsworthensis WP 005429422 uniref Bacteria Parasutterella excrementihominis WP 008810213 uniref Bacteria 87 100 100 Photobacterium damselae WP 005299845 uniref Bacteria Photobacterium angustum WP 005367235 uniref Bacteria 60 Photobacterium profundum WP 011219012 uniref Bacteria 100 60 100 Achromobacter WP 024067558 uniref Bacteria Bordetella ansorpii WP 066128546 uniref Bacteria Silicibacter sp TrichCH4B WP 009179391 uniref Bacteria Cupriavidus taiwanensis WP 012354065 uniref Bacteria 42 42 Beijerinckia indica WP 012384735 uniref Bacteria 79 Marinomonas aquimarina WP 067206707 uniref Bacteria Shewanella oneidensis WP 011073654 uniref Bacteria Candidatus Contendobacter odensis WP 034435056 uniref 84 59 99 89 Enhydrobacter aerosaccus WP 007116611 uniref Bacteria 68 Reinekea blandensis WP 008046329 uniref Bacteria Marinobacter salarius WP 041335703 uniref Bacteria 47 Mesorhizobium amorphae WP 006199955 uniref Bacteria 100 97 Saccharophagus degradans WP 011468855 uniref Bacteria 86 Teredinibacter turnerae WP 015818864 uniref Bacteria Bermanella marisrubri WP 007016606 uniref Bacteria 91 66 Magnetospirillum sp XM1 WP 068428735 uniref Bacteria 42 63 94 75 Phaeospirillum fulvum WP 021131993 uniref Bacteria Perkinsus marinus ATCC 50983 XP 002777433 uniref Eukaryota Sphingobium chlorophenolicum WP 013849781 uniref Bacteria 89 65 85 Sphingomonas wittichii WP 011952684 uniref Bacteria Amycolatopsis sp M39 WP 067580453 uniref Bacteria WP 001186962 uniref Bacteria 37 Lokiarchaeum sp GC14 75 KKK45214 ASGARD 47 47 Salinisphaera shabanensis WP 006914021 uniref Bacteria Photobacterium profundum WP 011220296 uniref Bacteria Neptuniibacter caesariensis WP 007022118 uniref Bacteria 100 100 Mycobacterium kansasii WP 036402968 uniref Bacteria 93 98 Mycobacterium smegmatis WP 058126322 uniref Bacteria 99 Thermomonospora curvata WP 012852928 uniref Bacteria Frankia alni WP 011606238 uniref Bacteria 100 Desulfococcus oleovorans WP 012174511 uniref Bacteria Desulfatibacillum alkenivorans WP 015947918 uniref Bacteria 100 Aliiglaciecola lipolytica WP 008845082 uniref Phaeobacter sp CECT 5382 WP 058334832 uniref Bacteria Laccaria bicolor S238NH82 XP 001879421 uniref Eukaryota 94 Aspergillus oryzae RIB40 XP 001823380 uniref Eukaryota 97 85 Verticillium alfalfae VaMs102 XP 003002909 uniref Eukaryota Plectosphaerell 75 Talaromyces stipitatus ATCC 10500 XP 002478355 uniref Eukaryota 94 90 Uncinocarpus reesii 1704 XP 002543354 uniref Eukaryota 58 100 Penicillium rubens Wisconsin 541255 XP 002558283 uniref Eukaryota Penicillium rubens Wisconsin 541255 XP 002566133 uniref Eukaryota 86 Leptosphaeria maculans JN3 XP 003841409 uniref Eukaryota Fusarium graminearum PH1 XP 011316407 uniref Eukaryota 97 Candidatus Phaeomarinobacter ectocarpi WP 043948283 uniref Lokiarchaeum sp GC14 75 KKK44846 ASGARD 92 85 100 Mycobacterium sinense WP 013828028 uniref Bacteria 100 Nocardioides dokdonensis WP 068105229 uniref Bacteria 99 Phenylobacterium sp Root700 WP 056735861 uniref Bacteria 99 Alcanivorax WP 011589827 uniref Bacteria Rhodococcus WP 033236355 uniref Bacteria 98 100 Arthrobacter alpinus WP 062007316 uniref Bacteria 66 Amycolatopsis sp M39 WP 067579104 uniref Bacteria 74 98 Mycobacterium colombiense WP 007768396 uniref Bacteria 100 Mycobacterium kansasii WP 023372045 uniref Bacteria 51 Mycobacterium abscessus WP 005084143 uniref Bacteria 100 Nocardia seriolae WP 033085575 uniref Bacteria Mycobacterium hassiacum WP 005632724 uniref Bacteria Desulfococcus oleovorans WP 012175664 uniref Bacteria 99 100 Micromonospora aurantiaca WP 013285324 uniref Bacteria 66 98 Aeromicrobium erythreum WP 067853993 uniref Bacteria 94 Rhodococcus WP 019745702 uniref Bacteria Gordonia polyisoprenivorans WP 014360141 uniref Bacteria 100 45 Aeromicrobium marinum WP 007076733 uniref Bacteria Frankia WP 011605827 uniref Bacteria 100 98 98 Saccharopolyspora erythraea WP 009947886 uniref Bacteria Kutzneria albida WP 025355764 uniref Bacteria Frankia sp EAN1pec WP 020460492 uniref Bacteria 85 Mycobacterium liflandii WP 015356645 uniref Bacteria 100 99 100 Mesorhizobium amorphae WP 006199827 uniref Bacteria 91 Afipia sp P5210 WP 051444351 uniref Bacteria 100 Cupriavidus pinatubonensis WP 011298067 uniref Bacteria 97 marine gamma proteobacterium HTCC2143 WP 007224546 uniref Verminephrobacter eiseniae WP 011810500 uniref Bacteria Phenylobacterium zucineum WP 012523101 uniref Bacteria 100 68 marine gamma proteobacterium HTCC2143 WP 007223543 uniref Rhodopseudomonas palustris WP 011159855 uniref Bacteria Ilumatobacter coccineus WP 015441338 uniref Bacteria 100 100 Lokiarchaeum sp GC14 75 KKK42140 ASGARD 82 Candidatus Lokiarchaeota archaeon CR 4 OLS14634 ASGARD Lokiarchaeum sp GC14 75 KKK40657 ASGARD 93 47 Mycobacterium WP 007171188 uniref Bacteria Azospirillum lipoferum WP 012976680 uniref Bacteria 100 84 Mycobacterium tuberculosis complex WP 003407504 uniref Bacteria 72 Mycobacterium leprae WP 010908217 UNIPROT Bacteria Variovorax boronicumulans WP 070062852 uniref Bacteria 100 70 100 gamma proteobacterium NOR53 WP 009025024 uniref Novosphingobium aromaticivorans WP 011445391 uniref Bacteria 75 Desulfobacterium autotrophicum WP 015904777 uniref Bacteria Rhodococcus opacus WP 012689566 uniref Bacteria 100 100 Rhodococcus equi WP 013417534 uniref Bacteria Mycobacterium rhodesiae WP 014208930 uniref Bacteria 95 89 gamma proteobacterium IMCC3088 WP 009575517 uniref 54 Mycobacterium abscessus WP 017205568 uniref Bacteria Frankia WP 011605100 uniref Bacteria 100 97 Bordetella pertussis WP 010929730 uniref Bacteria 100 Labrenzia alba WP 055113399 uniref Bacteria Ktedonobacter racemifer WP 007918793 uniref Bacteria 97 Plesiocystis pacifica WP 006970172 uniref Bacteria Streptomyces bingchenggensis WP 014175806 uniref Bacteria 95 Sphingopyxis alaskensis WP 011541158 uniref Bacteria 99 Sphingomonas wittichii WP 011952905 uniref Bacteria 100 96 Nocardioides sp JS614 WP 011757533 uniref Bacteria Pseudonocardia dioxanivorans WP 013672721 uniref Bacteria 100 99 Novosphingobium sp AP12 WP 007687459 uniref Bacteria 70 Frankia alni WP 011605257 uniref Bacteria Rhodococcus erythropolis WP 070386433 uniref Bacteria 99 100 Pseudooceanicola batsensis WP 009805929 uniref 100 Rhodopseudomonas palustris WP 011156052 uniref Bacteria 76 marine gamma proteobacterium HTCC2143 WP 007224418 uniref 100 marine gamma proteobacterium HTCC2080 WP 007234476 uniref 59 Bradyrhizobiaceae bacterium SG6C WP 009735068 uniref Bacteria 97 Actinoplanes sp SE50 110 WP 014694784 uniref Bacteria Streptomyces reticuli WP 059252443 uniref Bacteria Rothia dentocariosa WP 013397880 uniref Bacteria

2.0

37

Suppl. Figure 9: Phylogenetic analysis of Enoyl-CoA reductase clade 2 (COG1024). Phylogenetic tree was estimated using IQ-tree LG+C20+F on an alignment of 374 taxa and 133 sites and is unrooted. Bacteria, Archaea, eukaryotes and Asgards are colored in black, green, blue and purple respectively. Raw data files are available via figshare (see Data availability for more details).

38

Hexadecanoate

COG0318

Acyl-CoA (Cn) Acyl-CoA (Cn+2 ) 3-hydroxyacyl-CoA dehydrogenase COG1960* or Archaea Acetyl-CoA COG1250 clade 1 (of 1) COG2368 Bacteria

-oxidation 288 taxa, 240 sites β Acyl-2-enoyl-CoA UF IQTREE bootstraps Eukaryotes LG+C20+G+F COG1024* Asgards COG0183* - FA 3-hydroxy-acyl-CoA COG1250*

Aceto-acyl-CoA Fatty acid biosynthesis

100 Cyclospora cayetanensis XP 022589126 Eukaryota 100 Eimeria praecox 3hydroxybutyrylcoA dehydrogenase putative Eimeria praecox CDI85290 Eukaryota 100 100 Neospora caninum Liverpool TPA 3hydroxybutyrylcoA dehydrogenase putative Neospora caninum Liverpool CEL67408 Eukaryota Toxoplasma gondii COUG 3hydroxyacylCoA dehydrogenase NAD binding domaincontaining protein Toxoplasma gondii COUG PIM03191 Eukaryota 53 Vitrella brassicaformis CCMP3155 unnamed protein Vitrella brassicaformis CCMP3155 CEL97709 Eukaryota 100 Histoplasma capsulatum NAm1 XP 001537416 Eukaryota 28 99 Talaromyces stipitatus ATCC 10500 XP 002482328 Eukaryota 28 Cryptococcus neoformans var neoformans JEC21 XP 572102 Eukaryota Acanthamoeba castellanii str Neff XP 004347130 Eukaryota 30 Stentor coeruleus hypothetical protein SteCoe 23297 Stentor coeruleus OMJ77156 Eukaryota 20 Paramecium tetraurelia strain d42 XP 001459051 Eukaryota 90 Tetrahymena thermophila SB210 XP 001020033 Eukaryota 93 100 Ichthyophthirius multifiliis XP 004039908 Eukaryota Plasmodiophora brassicae hypothetical protein PBRA 003454 Plasmodiophora brassicae CEP03847 Eukaryota 100 Dictyostelium fasciculatum XP 004360665 Eukaryota 100 Acytostelium subglobosum LB1 XP 012750007 Eukaryota 98 Dictyostelium lacteum 3hydroxybutyrylCoA dehydrogenase Dictyostelium lacteum KYQ91444 Eukaryota Dictyostelium purpureum XP 003290685 Eukaryota 98 100 Cucumis melo PREDICTED 3hydroxybutyrylCoA dehydrogenase Cucumis melo XP 008456022 Eukaryota 100 Vigna radiata var radiata uncharacterized protein LOC106756573 isoform X1 Vigna radiata var radiata XP 022634436 Eukaryota 100 Populus trichocarpa XP 006370238 Eukaryota 98 100 Oryza sativa Japonica Group XP 015621276 Eukaryota Zea mays NP 001130188 Eukaryota 95 98 Physcomitrella patens XP 001754185 Eukaryota 100 Bradyrhizobium WP 011084197 Bacteria 100 Rhodopseudomonas palustris WP 011160280 Bacteria 100 Methylobacterium aquaticum WP 060848893 Bacteria Mesorhizobium WP 024922688 100 Bacillus cereus group WP 002113074 Bacteria 94 Bacillus WP 003230294 Bacteria 100 Lampropedia cohaerens WP 046742772 Bacteria 100 Paraburkholderia ginsengiterrae WP 064264991 100 100 Hydrocarboniphaga effusa WP 007186683 Bacteria 100 Pseudomonas WP 014338772 Bacteria 96 Azoarcus sp CIB WP 050416670 Bacteria Chromobacterium violaceum WP 011135634 Bacteria 100 Leucothrix mucor 3hydroxybutyrylCoA dehydrogenase Leucothrix mucor WP 022953370 Bacteria 100 Pedobacter sp Leaf41 WP 056849282 Bacteria Marivirga tractuosa WP 013452714 Bacteria 83 Euryarchaeota archaeon TMED85 3hydroxybutyrylCoA dehydrogenase Euryarchaeota archaeon TMED85 OUU98955 85 uncultured Candidatus Thalassoarchaea euryarchaeote 3hydroxybutyrylCoA dehydrogenase paaH hbd fadB mmgB uncultured Candidatus Thalassoarchaea euryarchaeote ANV80052 82 94 Euryarchaeota archaeon TMED255 3hydroxybutyrylCoA dehydrogenase Euryarchaeota archaeon TMED255 OUX23986 85 Marine Group II euryarchaeote MEDG33 3hydroxybutyrylCoA dehydrogenase Marine Group II euryarchaeote MEDG33 PDH26084 100 95 Marine Group II euryarchaeote MEDG35 3hydroxybutyrylCoA dehydrogenase Marine Group II euryarchaeote MEDG35 PDH23999 uncultured marine group II III euryarchaeote AD1000 88 H01 3hydroxyacylCoA dehydrogenase paaH hbd fadB mmgB uncultured marine group II III euryarchaeote AD1000 88 H01 AIE96994 97 100 Euryarchaeota archaeon TMED97 OUV25020 Euryarchaeota archaeon TMED192 3hydroxybutyrylCoA dehydrogenase Euryarchaeota archaeon TMED192 OUW56787 100 uncultured marine group II euryarchaeote 3hydroxybutyrylCoA dehydrogenase 3hydroxyacylCoA dehydrogenase uncultured marine group II euryarchaeote EHR76632 100 100 Euryarchaeota archaeon TMED103 3hydroxybutyrylCoA dehydrogenase Euryarchaeota archaeon TMED103 OUV40824 97 87 Euryarchaeota archaeon TMED248 3hydroxybutyrylCoA dehydrogenase Euryarchaeota archaeon TMED248 OUX16982 Euryarchaeota archaeon TMED141 3hydroxybutyrylCoA dehydrogenase Euryarchaeota archaeon TMED141 OUV96242 89 Euryarchaeota archaeon SG85 3hydroxybutyrylCoA dehydrogenase Euryarchaeota archaeon SG85 KYK28820 94 Deinococcus radiodurans WP 010887711 Bacteria 98 78 Thermus thermophilus WP 011173325 Bacteria ruber WP 013012719 Bacteria 97 100 Marine Group III euryarchaeote CGEpi4 3hydroxybutyrylCoA dehydrogenase Marine Group III euryarchaeote CGEpi4 OIR20725 Marine Group III euryarchaeote CGEpi1 3hydroxybutyrylCoA dehydrogenase Marine Group III euryarchaeote CGEpi1 OIR18473 Thermoplasmatales archaeon ex4484 36 3hydroxybutyrylCoA dehydrogenase Thermoplasmatales archaeon ex4484 36 OYT49743 Archaea 100 Thermoplasmatales archaeon SG8523 3hydroxybutyrylCoA dehydrogenase Thermoplasmatales archaeon SG8523 KYK34111 Archaea 100 Thermoplasmatales archaeon SG8524 3hydroxybutyrylCoA dehydrogenase Thermoplasmatales archaeon SG8524 KYK22940 Archaea 100 80 100 Thermoplasmatales archaeon SCGC AB540F20 3hydroxyacylCoA dehydrogenase Thermoplasmatales archaeon SCGC AB540F20 EMR75660 Archaea Thermoplasmatales archaeon ex4484 6 3hydroxybutyrylCoA dehydrogenase Thermoplasmatales archaeon ex4484 6 OYT43976 Archaea 100 Euryarchaeota archaeon CG01 land 8 20 14 3 00 38 12 PIV69382 100 Candidatus Altiarchaeales archaeon WOR SM1 79 3hydroxybutyrylCoA dehydrogenase Candidatus Altiarchaeales archaeon WOR SM1 79 ODS37769 Theionarchaea archaeon DG701 3hydroxybutyrylCoA dehydrogenase Theionarchaea archaeon DG701 KYK33847 100 Archaeoglobus sulfaticallidus WP 015591182 Archaea Geoglobus ahangari WP 048095966 Archaea 86 Caldisalinibacter kiritimatiensis WP 006313855 100 Anaerobranca californiensis WP 072908667 Bacteria 66 100 99 Caloranaerobacter sp TR13 WP 054871896 Bacteria Clostridium saccharoperbutylacetonicum WP 015392175 Bacteria 100 Thermoanaerobacterium thermosaccharolyticum WP 013298133 UNIPROT Bacteria 100 Clostridium botulinum D str 1873 3hydroxybutyrylCoA dehydrogenase Clostridium botulinum D str 1873 EES92122 Bacteria 100 Clostridium acetobutylicum WP 034582187 Bacteria 59 100 Clostridium bornimense WP 044037332 Bacteria 59 Oleiphilus WP 068453379 98 Maledivibacter halophilus WP 079489958 99 Clostridiaceae bacterium BRH c20a 3hydroxybutyrylCoA dehydrogenase Clostridiaceae bacterium BRH c20a KJS19562 Bacteria 99 Desulfotomaculum alcoholivorax 3hydroxybutyrylCoA dehydrogenase Desulfotomaculum alcoholivorax WP 027365128 Bacteria 100 Desulfotomaculum sp 46 80 KUK65319 Bacteria 97 89 Mycobacterium abscessus subsp abscessus 3hydroxyacylCoA dehydrogenase Mycobacterium abscessus subsp abscessus SHT29673 Bacteria Myco Dethiobacter alkaliphilus WP 008515217 Bacteria 75 Desulfacinum infernum DSM 9756 3hydroxyacylCoA dehydrogenase Desulfacinum infernum DSM 9756 SHG23897 Bacteria candidate divison MSBL1 archaeon SCGCAAA259E22 KXA93283 100 Candidatus Thorarchaeota archaeon SMTZ145 3hydroxybutyrylCoA dehydrogenase Candidatus Thorarchaeota archaeon SMTZ145 KXH71468 100 Candidatus Thorarchaeota archaeon AB 25 putative 3hydroxybutyrylCoA dehydrogenase Candidatus Thorarchaeota archaeon AB 25 OLS31639 97 97 Candidatus Heimdallarchaeota archaeon LC 3 putative 3hydroxybutyrylCoA dehydrogenase Candidatus Heimdallarchaeota archaeon LC 3 OLS23846 65 85 archaeon RBG 16 50 20 3hydroxybutyrylCoA dehydrogenase archaeon RBG 16 50 20 OFX17113 (possible missasembly/contamination, near assembly gap) archaeon Heimdall LC 2 putative 3hydroxybutyrylCoA dehydrogenase archaeon Heimdall LC 2 OLS20154 91 Candidatus Heimdallarchaeota archaeon LC 2 putative 3hydroxybutyrylCoA dehydrogenase Candidatus Heimdallarchaeota archaeon LC 2 OLS16841 100 Candidatus Syntrophoarchaeum butanivorans 3hydroxyacylCoA dehydrogenase Candidatus Syntrophoarchaeum butanivorans OFV66995 78 Candidatus Syntrophoarchaeum caldarius secreted protein containing 3hydroxyacylCoA dehydrogenase NAD binding Candidatus Syntrophoarchaeum caldarius OFV67828 100 Thermoplasmatales archaeon Aplasma hypothetical protein AMDU1 APLC00006G0039 Thermoplasmatales archaeon Aplasma EQB72615 Archaea Picrophilus torridus WP 011178305 Archaea 93 Euryarchaeota archaeon RBG 16 67 27 3hydroxybutyrylCoA dehydrogenase partial Euryarchaeota archaeon RBG 16 67 27 OGS46529 100 Euryarchaeota archaeon RBG 16 68 13 3hydroxybutyrylCoA dehydrogenase Euryarchaeota archaeon RBG 16 68 13 OGS49359 82 Euryarchaeota archaeon RBG 19FT COMBO 69 17 3hydroxybutyrylCoA dehydrogenase Euryarchaeota archaeon RBG 19FT COMBO 69 17 OGS61163 43 Candidatus Bathyarchaeota archaeon B25 3hydroxybutyrylCoA dehydrogenase Candidatus Bathyarchaeota archaeon B25 KYH38298 97 Natrinema gari WP 008456737 Archaea 97 Natronolimnobius innermongolicus WP 007258647 Archaea 100 99 Natronorubrum sediminis WP 090507612 Archaea 98 Halostagnicola kamekurae WP 092906045 Archaea 100 Halopenitus persicus 3hydroxybutyrylCoA dehydrogenase Halopenitus persicus WP 096389893 Haladaptatus litoreus WP 076429094 Archaea Natronococcus amylolyticus 3hydroxybutyrylCoA dehydrogenase Natronococcus amylolyticus WP 049891430 Archaea 100 Saprolegnia parasitica CBS 22365 XP 012195374 Eukaryota 100 Thraustotheca clavata hydroxyacylcoenzyme A dehydrogenase mitochondrial precursor Thraustotheca clavata OQR95289 Eukaryota 100 Aphanomyces invadans XP 008864050 Eukaryota 93 Pythium insidiosum hydroxyacylcoenzyme A dehydrogenase mitochondrial precursor Pythium insidiosum GAY06457 Eukaryota 100 99 Plasmopara halstedii hydroxyacylcoenzyme a mitochondrial precursor Plasmopara halstedii CEG39381 Eukaryota 41 Albugo candida unnamed protein product Albugo candida CCI41170 Eukaryota Ectocarpus siliculosus hydroxyacylCoenzyme A dehydrogenase Ectocarpus siliculosus CBJ29605 Eukaryota 100 Euglena gracilis L3hydroxyacylCoA dehydrogenase subunit precursor Euglena gracilis AAO72312 Eukaryota 100 Fistulifera solaris 3hydroxyacylCoA dehydrogenase Fistulifera solaris GAX17462 Eukaryota 100 Phaeodactylum tricornutum CCAP 1055 1 XP 002183747 Eukaryota Thalassiosira pseudonana CCMP1335 XP 002297045 Eukaryota 50 100 Homo sapiens NP 005318 Eukaryota 100 100 Mus caroli hydroxyacylcoenzyme A dehydrogenase mitochondrial Mus caroli XP 021013257 Eukaryota 98 Xenopus tropicalis NP 001011073 Eukaryota Thecamonas trahens ATCC 50062 XP 013755800 Eukaryota 61 84 short chain 3hydroxyacylCoA dehydrogenase Angomonas deanei EPY39263 Eukaryota 98 Leishmania braziliensis MHOM BR 75 M2904 XP 001568749 Eukaryota 100 100 EPY30233 Eukaryota Trypanosoma cruzi strain CL Brener XP 819900 Eukaryota 69 Capsaspora owczarzaki ATCC 30864 XP 004347391 Eukaryota 74 Toxocara canis putative 3hydroxyacylCoA dehydrogenase Toxocara canis KHN86975 Eukaryota 100 Candidatus Rokubacteria bacterium GWC2 70 16 3hydroxybutyrylCoA dehydrogenase Candidatus Rokubacteria bacterium GWC2 70 16 OGK88218 92 Candidatus Rokubacteria bacterium 13 1 40CM 2 68 8 3hydroxybutyrylCoA dehydrogenase Candidatus Rokubacteria bacterium 13 1 40CM 2 68 8 OLD38483 92 93 Acidobacteria bacterium RIFCSPHIGHO2 01 FULL 67 28 OFV85237 Bacteria Acidobacteria bacterium 13 1 40CM 2 68 5 3hydroxybutyrylCoA dehydrogenase Acidobacteria bacterium 13 1 40CM 2 68 5 OLD62630 Bacteria 96 73 candidate division Zixibacteria bacterium CG 4 9 14 3 um filter 46 8 3hydroxybutyrylCoA dehydrogenase candidate division Zixibacteria bacterium CG 4 9 14 3 um filter 46 8 PJA27787 100 92 Chloroflexi bacterium 13 1 40CM 4 68 4 3hydroxybutyrylCoA dehydrogenase Chloroflexi bacterium 13 1 40CM 4 68 4 OLC58022 92 Herpetosiphon geysericola WP 054533685 Bacteria 79 100 Armatimonadetes bacterium 13 1 40CM 3 65 7 3hydroxybutyrylCoA dehydrogenase Armatimonadetes bacterium 13 1 40CM 3 65 7 OLD59932 uncultured 3hydroxybutyrylCoA dehydrogenase uncultured prokaryote BAL56908 Archaeoglobus fulgidus KUJ93247 Archaea 54 100 Desulfocarbo indianensis WP 049673869 Bacillus cereus group WP 001007577 Bacteria 100 Candidatus Micrarchaeota archaeon CG1 02 55 22 hypothetical protein AUJ14 02850 Candidatus Micrarchaeota archaeon CG1 02 55 22 OIO26089 99 Candidatus Woesearchaeota archaeon CG10 big fil rev 8 21 14 0 10 44 13 3hydroxybutyrylCoA dehydrogenase Candidatus Woesearchaeota archaeon CG10 big fil rev 8 21 14 0 10 44 13 PIN86290 96 Methanolobus sp YSF03 hypothetical protein Methanolobus sp YSF03 WP 094228671 Archaea 83 Geobacter lovleyi WP 012470456 Bacteria 93 Lokiarchaeum sp GC14 75 putative 3hydroxybutyrylCoA dehydrogenase partial Lokiarchaeum sp GC14 75 KKK45648 archaeon Heimdall AB 125 putative 3hydroxybutyrylCoA dehydrogenase archaeon Heimdall AB 125 OLS32558 100 Candidatus Thorarchaeota archaeon SMTZ145 hypothetical protein AM325 10730 Candidatus Thorarchaeota archaeon SMTZ145 KXH70916 100 Candidatus Thorarchaeota archaeon AB 25 putative 3hydroxybutyrylCoA dehydrogenase Candidatus Thorarchaeota archaeon AB 25 OLS31254 100 Candidatus Thorarchaeota archaeon SMTZ183 hypothetical protein AM324 07530 Candidatus Thorarchaeota archaeon SMTZ183 KXH72340 100 Candidatus Thorarchaeota archaeon AB 25 putative 3hydroxybutyrylCoA dehydrogenase Candidatus Thorarchaeota archaeon AB 25 OLS31247 100 Theionarchaea archaeon DG70 3hydroxybutyrylCoA dehydrogenase Theionarchaea archaeon DG70 KYK31734 100 91 Theionarchaea archaeon DG701 3hydroxybutyrylCoA dehydrogenase Theionarchaea archaeon DG701 KYK34281 archaeon Heimdall LC 2 putative 3hydroxybutyrylCoA dehydrogenase archaeon Heimdall LC 2 OLS24556 Actinobacteria bacterium RBG 13 55 18 3hydroxybutyrylCoA dehydrogenase Actinobacteria bacterium RBG 13 55 18 OFW57061 100 uncultured Acidilobus sp CIS 3hydroxyacylCoA dehydrogenase uncultured Acidilobus sp CIS ESQ21999 100 Acidilobus sp 7A 3hydroxyacylCoA dehydrogenase Acidilobus sp 7A AMD30943 Archaea 100 100 Caldisphaera lagunensis WP 015232167 Archaea Aeropyrum camini WP 022541676 Archaea 81 Pyrodictium delaneyi WP 055410634 Archaea 100 Vulcanisaeta distributa WP 013335392 Archaea 100 Thermoproteus sp CP80 3hydroxyacylCoA dehydrogenase partial Thermoproteus sp CP80 WP 081228210 Archaea 86 Vulcanisaeta thermophila 3hydroxyacylCoA dehydrogenase Vulcanisaeta thermophila WP 069807647 Archaea 50 Pyrobaculum aerophilum str IM2 3hydroxyacylCoA dehydrogenase and enoylCoA hydratase Pyrobaculum aerophilum str IM2 AAL63449 Archaea 99 Sulfolobus tokodaii WP 010978008 Archaea 99 Acidianus manzaensis 3hydroxyacylCoA dehydrogenase Acidianus manzaensis ARM75529 Archaea 85 100 Candidatus Acidianus copahuensis WP 048098673 100 Sulfolobus sp A20 WP 069283260 Archaea 100 Sulfolobus islandicus WP 012716692 Archaea 100 Metallosphaera sedula WP 048806948 Archaea Metallosphaera yellowstonensis WP 009072665 Archaea 72 100 Metallosphaera cuprina 3hydroxybutyrylCoA dehydrogenase Metallosphaera cuprina WP 048057460 Archaea 100 Candidatus Acidianus copahuensis WP 048100649 100 Acidianus hospitalis WP 013777062 Archaea 99 73 Sulfolobus tokodaii WP 052846720 Archaea Sulfolobus sp JCM 16833 3hydroxybutyrylCoA dehydrogenase Sulfolobus sp JCM 16833 WP 054845266 Archaea 78 100 83 maquilingensis 3hydroxyacylCoA dehydrogenase Caldivirga maquilingensis WP 048062861 Archaea Thermocladium sp ECH B 3hydroxyacylCoA dehydrogenase Thermocladium sp ECH B KUO91351 Archaea 100 Ignicoccus hospitalis WP 012123200 Archaea 100 85 Ignicoccus islandicus 3hydroxyacylCoA dehydrogenase Ignicoccus islandicus WP 075049869 Archaea WP 014026963 Archaea Vulcanisaeta sp JCHS 4 3hydroxyacylCoA dehydrogenase Vulcanisaeta sp JCHS 4 KUO80465 Archaea 94 100 Geoglobus acetivorans WP 048090336 Archaea 98 Geoglobus ahangari WP 048095968 Archaea Sulfolobus islandicus WP 012713237 Archaea Metallosphaera cuprina 3hydroxybutyrylCoA dehydrogenase Metallosphaera cuprina WP 083808578 Archaea 93 100 archaeon 3hydroxyacylCoA dehydrogenase Nitrosopumilales archaeon PBO83099 Archaea 100 uncultured marine thaumarchaeote KM3 82 D05 3hydroxybutyrylCoA dehydrogenase uncultured marine thaumarchaeote KM3 82 D05 AIF18342 92 Marine Group I thaumarchaeote SCGC AAA799D07 WP 048083923 100 Candidatus Nitrosotenuis sp AQ6f 3hydroxyacylCoA dehydrogenase Candidatus Nitrosotenuis sp AQ6f WP 100183388 100 Candidatus Nitrosotenuis chungbukensis 3hydroxyacylCoA dehydrogenase Candidatus Nitrosotenuis chungbukensis WP 042686843 99 Candidatus Nitrosoarchaeum limnia SFB1 3hydroxybutyrylCoA dehydrogenase Candidatus Nitrosoarchaeum limnia SFB1 EGG42527 Archaea 82 77 Candidatus Nitrosopumilus adriaticus 3hydroxybutyrylCoA dehydrogenase Candidatus Nitrosopumilus adriaticus AJW70597 Archaea Nitrosotalea sp CS 3hydroxybutyrylCoA dehydrogenase Nitrosotalea sp CS SMH70789 Archaea 81 Haloterrigena turkmenica 3hydroxyacylCoA dehydrogenase Haloterrigena turkmenica WP 058864463 Archaea 83 Natronomonas pharaonis 3hydroxyacylCoA dehydrogenase Natronomonas pharaonis WP 011323868 Archaea 87 88 79 Halococcus agarilyticus 3hydroxyacylCoA dehydrogenase Halococcus agarilyticus WP 049898480 Archaea 100 Salinarchaeum sp HarchtBsk1 WP 020447818 Haloterrigena turkmenica WP 012944903 Archaea halophilic archaeon J07HX64 3hydroxyacylCoA dehydrogenase halophilic archaeon J07HX64 ERH09551 Archaea Candidatus Bathyarchaeota archaeon RBG 13 52 12 hypothetical protein A3K78 11235 Candidatus Bathyarchaeota archaeon RBG 13 52 12 OGD56491 97 100 Lokiarchaeum sp GC14 75 putative 3hydroxybutyrylCoA dehydrogenase Lokiarchaeum sp GC14 75 KKK41626 100 Lokiarchaeum sp GC14 75 putative 3hydroxybutyrylCoA dehydrogenase Lokiarchaeum sp GC14 75 KKK45763 9 Lokiarchaeum sp GC14 75 putative 3hydroxybutyrylCoA dehydrogenase Lokiarchaeum sp GC14 75 KKK44060 87 94 Candidatus Lokiarchaeota archaeon CR 4 3hydroxyacylCoA dehydrogenase Candidatus Lokiarchaeota archaeon CR 4 OLS14927 Desulfatibacillum alkenivorans WP 015948235 Bacteria 89 66 Archaeoglobus fulgidus 3hydroxyacylCoA dehydrogenase Hbd10 partial Archaeoglobus fulgidus KUK06290 Archaea Picrophilus torridus WP 011177822 Archaea 100 Ferroglobus placidus WP 012966419 Archaea Archaeoglobales archaeon ex4484 92 3hydroxyacylCoA dehydrogenase Archaeoglobales archaeon ex4484 92 OYT34728 Archaea 100 Nocardia paucivorans 3hydroxybutyrylCoA dehydrogenase Nocardia paucivorans WP 040790647 Bacteria 100 Nocardia transvalensis 3hydroxybutyrylCoA dehydrogenase Nocardia transvalensis WP 040746720 Bacteria 100 Mycobacterium peregrinum WP 064878550 Bacteria 45 Tomitella biformata 3hydroxybutyrylCoA dehydrogenase Tomitella biformata WP 024794565 Bacteria 59 100 paracollinoides WP 054710888 Bacteria Lactobacillus buchneri WP 014940795 Bacteria Lactobacillus versmoldensis WP 010624123 Bacteria 47 100 Ornithinibacillus scapharcae 3hydroxybutyrylCoA dehydrogenase Ornithinibacillus scapharcae WP 010095408 Bacteria 100 Oceanobacillus limi WP 090871653 Bacteria 61 Solibacillus silvestris WP 014823771 Bacteria 56 Enteractinococcus helveticum WP 043056932 50 100 Lysinibacillus odysseyi WP 036159542 Bacteria 100 Bacillus testis 3hydroxybutyrylCoA dehydrogenase Bacillus testis WP 050615487 Bacteria Paenisporosarcina sp TG20 hypothetical protein Paenisporosarcina sp TG20 WP 019414017 Bacteria 100 Methanobrevibacter sp AbM4 WP 016358742 Archaea 36 100 Methanobrevibacter boviskoreani 3hydroxybutyrylCoA dehydrogenase Methanobrevibacter boviskoreani WP 040681816 Archaea Methanosphaera WP 011406316 97 91 Elizabethkingia WP 034868128 100 Arenibacter certesii 3hydroxybutyrylCoA dehydrogenase Arenibacter certesii WP 026814127 Bacteria 99 100 Chryseobacterium sp YR477 3hydroxybutyrylCoA dehydrogenase Chryseobacterium sp YR477 WP 047434120 Bacteria Sphingobacterium cellulitidis WP 094257195 Bacteria 79 Alicyclobacillus acidocaldarius WP 012810620 Bacteria Corynebacterium camporealensis WP 046453578 Bacteria 100 Rhodococcus sp OK302 WP 094275457 Bacteria 100 Verminephrobacter eiseniae WP 011809063 Bacteria 100 Flavobacteriaceae bacterium 3hydroxybutyrylCoA dehydrogenase Flavobacteriaceae bacterium PCI09167 Bacteria 100 Gillisia sp JM1 3hydroxybutyrylCoA dehydrogenase Gillisia sp JM1 WP 026839065 Bacteria 96 Marinitoga sp 1197 3hydroxybutyrylCoA dehydrogenase Marinitoga sp 1197 KLO20982 Bacteria Reichenbachiella faecimaris WP 084371226 Bacteria 100 Desulfobacterales bacterium PIP40463 Bacteria 70 Desulfococcus oleovorans WP 012175103 Bacteria 66 Desulfobacterium autotrophicum WP 012662675 Bacteria 80 Desulfoluna spongiiphila WP 092213138 Bacteria 72 80 Desulfatibacillum alkenivorans WP 073476939 Bacteria Desulfopila aestuarii WP 073613457 Bacteria 99 100 Desulfotomaculum alcoholivorax 3hydroxybutyrylCoA dehydrogenase Desulfotomaculum alcoholivorax WP 027364072 Bacteria 60 Desulfotomaculum arcticum WP 092471644 Bacteria 85 Anaerolineae bacterium SM23 84 3hydroxybutyrylCoA dehydrogenase Anaerolineae bacterium SM23 84 KPL21469 Bacteria 68 Ornatilinea apprima WP 075062643 67 Candidatus Promineofilum breve WP 095045371 100 Pelobacter carbinolicus WP 011340028 Bacteria 56 Pelobacter acetylenicus WP 083558626 Bacteria 99 Algoriphagus ornithinivorans WP 091654034 Bacteria 100 79 Algoriphagus hitonicola WP 092794084 Bacteria 100 63 Algoriphagus sp M82 WP 067547128 Bacteria 92 Aquiflexum balticum WP 084121902 Bacteria 58 Desulfobacca acetoxidans WP 013706894 Bacteria 100 Lokiarchaeum sp GC14 75 putative 3hydroxybutyrylCoA dehydrogenase partial Lokiarchaeum sp GC14 75 KKK40272 100 Lokiarchaeum sp GC14 75 putative 3hydroxybutyrylCoA dehydrogenase Lokiarchaeum sp GC14 75 KKK40337 Lokiarchaeum sp GC14 75 putative 3hydroxybutyrylCoA dehydrogenase Lokiarchaeum sp GC14 75 KKK42255 Aminiphilus circumscriptus hypothetical protein Aminiphilus circumscriptus WP 029166630 Bacteria 100 100 marine gamma proteobacterium HTCC2148 WP 007229042 Desulfatibacillum alkenivorans WP 012609403 Bacteria 100 Syntrophus sp GWC2 56 31 OHE22119 Bacteria 100 Deltaproteobacteria bacterium SM23 61 hypothetical protein AMJ94 06785 Deltaproteobacteria bacterium SM23 61 KPK91667 Bacteria 100 Deltaproteobacteria bacterium SM23 61 hypothetical protein AMJ94 11040 Deltaproteobacteria bacterium SM23 61 KPK89863 Bacteria 100 100 miscellaneous Crenarchaeota group15 archaeon DG45 hypothetical protein AC482 01610 miscellaneous Crenarchaeota group15 archaeon DG45 KON31162 Desulfobacteraceae bacterium 4572 87 hypothetical protein B6240 12710 Desulfobacteraceae bacterium 4572 87 OQY43302 Bacteria 100 miscellaneous Crenarchaeota group archaeon SMTZ155 hypothetical protein AC480 01315 miscellaneous Crenarchaeota group archaeon SMTZ155 KON30332 Lokiarchaeum sp GC14 75 putative 3hydroxybutyrylCoA dehydrogenase Lokiarchaeum sp GC14 75 KKK41074 99 100 Candidatus Methanoperedens nitroreducens WP 096203572 100 Candidatus Methanoperedens nitroreducens WP 048092142 93 98 bacterium RBG 13 63 9 hypothetical protein A2V70 09745 Planctomycetes bacterium RBG 13 63 9 OHB68955 Bacteria 94 Candidatus Bathyarchaeota archaeon RBG 13 52 12 hypothetical protein A3K78 09710 Candidatus Bathyarchaeota archaeon RBG 13 52 12 OGD56112 Candidatus Omnitrophica bacterium 4484 702 hypothetical protein B6D55 00810 Candidatus Omnitrophica bacterium 4484 70 2 OQX88321 miscellaneous Crenarchaeota group15 archaeon DG45 hypothetical protein AC482 02495 miscellaneous Crenarchaeota group15 archaeon DG45 KON30942 100 Desulfotomaculum alcoholivorax 3hydroxyacylCoA dehydrogenase Desulfotomaculum alcoholivorax WP 027365645 Bacteria 100 Desulfurispora thermophila hypothetical protein Desulfurispora thermophila WP 018086915 Bacteria 100 Peptococcaceae bacterium BICA17 3hydroxyacylCoA dehydrogenase Peptococcaceae bacterium BICA17 KJS69947 Bacteria 81 Carboxydothermus ferrireducens 3hydroxyacylCoA dehydrogenase Carboxydothermus ferrireducens WP 028052188 Bacteria 100 88 Flavobacterium sp 83 3hydroxyacylCoA dehydrogenase Flavobacterium sp 83 WP 035672928 Bacteria 100 Desulfobacca sp RBG 16 60 12 3hydroxyacylCoA dehydrogenase Desulfobacca sp RBG 16 60 12 OGR24237 Bacteria Clostridiales bacterium VE20216 3hydroxyacylCoA dehydrogenase Clostridiales bacterium VE20216 WP 024738853 Bacteria 100 100 Clostridiales bacterium PH28 bin88 3hydroxyacylCoA dehydrogenase Clostridiales bacterium PH28 bin88 KKM10089 Bacteria Oscillibacter valericigenes WP 014118274 Bacteria 100 Corynebacterium aurimucosum 3hydroxyacylCoA dehydrogenase Corynebacterium aurimucosum WP 049156013 Bacteria 100 99 Corynebacterium camporealensis WP 046453228 Bacteria Bacillus daliensis WP 090843749 Bacteria 100 Actinobacteria bacterium 13 2 20CM 2 66 6 hypothetical protein AUI15 32580 Actinobacteria bacterium 13 2 20CM 2 66 6 OLB85612 100 Chloroflexi bacterium 13 1 40CM 66 19 hypothetical protein AUH27 04165 Chloroflexi bacterium 13 1 40CM 66 19 OLC20468 Candidatus Rokubacteria bacterium RIFCSPLOWO2 02 FULL 68 19 3hydroxyacylCoA dehydrogenase Candidatus Rokubacteria bacterium RIFCSPLOWO2 02 FULL 68 19 OGL07203 0.4

39

Suppl. Figure 10: Phylogenetic analysis of 3-hydroxy-acyl-CoA dehydrogease (COG1250). Phylogenetic tree was estimated using IQ-tree LG+C20+F on an alignment of 288 taxa and 240 sites and is unrooted. Bacteria, Archaea, eukaryotes and Asgards are colored in black, green, blue and purple respectively. Raw data files are available via figshare (see Data availability for more details).

40

Hexadecanoate B-ketothiolase/acetyl-CoA acetyltransferase COG0183 COG0318 466 taxa, 279 sites Acyl-CoA (C ) UF IQTREE bootstraps Acyl-CoA (Cn+2 ) n COG1960* or LG+C20+G+F Acetyl-CoA COG2368 -oxidation β Acyl-2-enoyl-CoA 100 Natronomonas pharaonis WP 011323534 Archaea Euryarchaeota Halobacteria COG1024* 99 COG0183* - FA Halovenus aranensis WP 092700094 98 3-hydroxy-acyl-CoA Natronomonas moolapensis WP 015409636 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae Fatty acid metabolism 100 Halococcus thailandensis WP 007742929 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae COG1250* 100 Mevalonate metabolism Halococcus agarilyticus 3ketoacylCoA thiolase Halococcus agarilyticus WP 049899934 Archaea Euryarchaeota Halobacteria Halobacteriales Aceto-acyl-CoA 100 96 Fatty acid biosynthesis Haloarchaeobius iranensis WP 089731673 Fatty acid and mevalonate metabolism Salinarchaeum sp HarchtBsk1 WP 020445068 Haloterrigena turkmenica DSM 5511 YP 0034042671 284165988 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae 100 96 57 Sulfolobus islandicus WP 012712532 Archaea Crenarchaeota Thermoprotei 56 Sulfolobus sp A20 WP 069283513 Archaea Crenarchaeota Thermoprotei Sulfolobales 100 Candidatus Acidianus copahuensis WP 048099087 Metallosphaera yellowstonensis MK1 acetylCoA acetyltransferase Metallosphaera yellowstonensis MK1 EHP70022 Archaea Crenarchaeota Thermoprotei Sulfolobales Sulfolobaceae 100 100 Sulfolobus tokodaii WP 010978035 Archaea Crenarchaeota Thermoprotei Sulfolobales Sulfolobaceae 98 Acidianus manzaensis acetylCoA acetyltransferase Acidianus manzaensis ARM74669 Archaea Crenarchaeota Thermoprotei Sulfolobales Sulfolobaceae Fervidicoccus fontis Kam940 acetylCoA Cacyltransferase 7 Fervidicoccus fontis Kam940 AFH42596 Archaea Crenarchaeota Thermoprotei Fervidicoccales Fervidicoccaceae 100 100 Natronococcus amylolyticus WP 005557417 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae 100 Natronococcus occultus WP 015323414 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae Halogranum amylolyticum acetylCoA acyltransferase Halogranum amylolyticum SEP16724 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae 100 walsbyi DSM 16790 YP 6577441 110667933 A7 100 Halorubrum sp SD626R acetylCoA acyltransferase Halorubrum sp SD626R WP 092561882 Archaea Euryarchaeota Halobacteria Halobacteriales Halorubrum sp E3 acetylCoA acyltransferase partial Halorubrum sp E3 OYR70122 Archaea Euryarchaeota Halobacteria Halobacteriales 100 100 Chloroflexi bacterium RBG 19FT COMBO 62 14 acetylCoA acetyltransferase Chloroflexi bacterium RBG 19FT COMBO 62 14 OGO68546 Anaerolineae bacterium SM23 63 acetylCoA acetyltransferase Anaerolineae bacterium SM23 63 KPK94938 Bacteria 100 58 Anaerolineae bacterium SM23 84 acetylCoA acetyltransferase Anaerolineae bacterium SM23 84 KPL18419 Bacteria 100 100 Anaerolineae bacterium CG1 02 58 13 acetylCoA acetyltransferase Anaerolineae bacterium CG1 02 58 13 OIN94728 Bacteria 100 Longilinea arvoryzae WP 075074885 Bacteria Chloroflexi Anaerolineae Anaerolineales Anaerolineaceae Clade 1A Candidatus Cloacimonas sp 4484 140 acetylCoA acetyltransferase Candidatus Cloacimonas sp 4484 140 OPX25715 100 100 Ignavibacteria bacterium CG1 02 37 35 acetylCoA acetyltransferase Ignavibacteria bacterium CG1 02 37 35 OIO19520 Bacteria 99 Candidatus Hydrogenedentes bacterium CG1 02 42 14 acetylCoA acetyltransferase Candidatus Hydrogenedentes bacterium CG1 02 42 14 OIO31007 100 97 candidate division Zixibacteria bacterium SM23 81 acetylCoA acetyltransferase candidate division Zixibacteria bacterium SM23 81 KPL18758 Hadesarchaea archaeon DG331 hypothetical protein AVW06 02910 Hadesarchaea archaeon DG331 KUO40105 26 94 Hadesarchaea archaeon YNP 45 hypothetical protein APZ16 01450 Hadesarchaea archaeon YNP 45 KUO40876 candidate divison MSBL1 archaeon SCGCAAA261F17 hypothetical protein AKJ44 02680 candidate divison MSBL1 archaeon SCGCAAA261F17 KXB01293 94 94 Lokiarchaeum sp GC14 75 putative acetylCoA acyltransferase Lokiarchaeum sp GC14 75 KKK41862 Candidatus Lokiarchaeota archaeon CR 4 propanoylCoA Cacyltransferase Lokiarchaeota archaeon CR 4 OLS12046 Candidatus Lokiarchaeota archaeon CR 4 acetylCoA acetyltransferase Lokiarchaeota archaeon CR 4 OLS12047 Candidatus Bathyarchaeota archaeon B23 hypothetical protein AYL28 006310 Candidatus Bathyarchaeota archaeon B23 KYH36493 99 82 85 Thermosulfidibacter takaii WP 068549111 Bacteria Aquificae Aquificales Aquificales genera Desulfocarbo indianensis hypothetical protein Desulfocarbo indianensis WP 049674633 99 Deltaproteobacteria bacterium RBG 19FT COMBO 46 12 hypothetical protein A2026 14745 partial Deltaproteobacteria bacterium RBG 19FT COMBO 46 12 OGQ04985 Bacteria Proteobacteria 85 99 Deltaproteobacteria bacterium RIFCSPLOWO2 12 FULL 60 19 hypothetical protein A3F90 03080 Deltaproteobacteria bacterium RIFCSPLOWO2 12 FULL 60 19 OGQ83492 Bacteria Proteobacteria Deltaproteobacteria bacterium TMED58 hypothetical protein CBC18 03955 Deltaproteobacteria bacterium TMED58 OUU32981 Bacteria Proteobacteria 100 100 Candidatus Hydrogenedentes bacterium CG1 02 42 14 hypothetical protein AUJ18 02000 Candidatus Hydrogenedentes bacterium CG1 02 42 14 OIO34254 Candidatus Lindowbacteria bacterium RIFCSPLOWO2 12 FULL 62 27 OGH55637 100 95 Candidatus Rokubacteria bacterium RIFCSPLOWO2 12 FULL 71 22 hypothetical protein A3F92 04305 Candidatus Rokubacteria bacterium RIFCSPLOWO2 12 FULL 71 22 OGL17020 candidate division NC10 bacterium RIFCSPLOWO2 02 FULL 66 22 OGB95270 100 100 Candidatus Thorarchaeota archaeon SMTZ145 hypothetical protein AM325 13495 Thorarchaeota archaeon SMTZ145 KXH73830 Candidatus Thorarchaeota archaeon SMTZ45 hypothetical protein AM326 01895 Thorarchaeota archaeon SMTZ45 KXH77044 Candidatus Thorarchaeota archaeon SMTZ183 hypothetical protein AM324 03610 Thorarchaeota archaeon SMTZ183 KXH77603 100 100 Herpetosiphon aurantiacus DSM 785 YP 0015466321 159900385 Bacteria Chloroflexi Chloroflexi Herpetosiphonales Herpetosiphonaceae 98 J10fl YP 0016338761 163845832 Bacteria Chloroflexi Chloroflexi Chloroflexales Chloroflexaceae 79 Roseiflexus sp RS1 WP 011955217 Bacteria Chloroflexi Chloroflexi Chloroflexales Theionarchaea archaeon DG70 hypothetical protein AYK18 15030 Theionarchaea archaeon DG70 KYK33397 69 100 Anaerolineae bacterium SG8 19 acetylCoA acetyltransferase Anaerolineae bacterium SG8 19 KPK05954 Bacteria 39 Ardenticatena conserved protein of unknown function Ardenticatena CUS03536 Acidobacteria bacterium 13 1 40CM 2 68 10 acetylCoA acetyltransferase Acidobacteria bacterium 13 1 40CM 2 68 10 OLD65706 Bacteria 100 51 Anaerolineae bacterium SG8 19 acetylCoA acetyltransferase Anaerolineae bacterium SG8 19 KPK07183 Bacteria Rhodopirellula maiorica WP 008708803 Bacteria Planctomycetes Planctomycetia Planctomycetales Chloroflexi bacterium UTCFX4 acetylCoA acetyltransferase Chloroflexi bacterium UTCFX4 OQY80116 37 100 98 Thermococcus sibiricus MM 739 AcetylCoA acetyltransferase Thermococcus sibiricus MM 739 ACS89861 Archaea Euryarchaeota Thermococci Thermococcales Thermococcaceae 100 Palaeococcus pacificus WP 048164930 Archaea Euryarchaeota Thermococci Thermococcales 93 39 37 Thermococcus zilligii acetylCoA acetyltransferase Thermococcus zilligii WP 010480132 Archaea Euryarchaeota Thermococci Thermococcales Thermococcaceae Pyrococcus furiosus DSM 3638 NP 5787021 18977345 AM 2 Candidatus Heimdallarchaeota archaeon LC 3 3ketoacylCoA thiolase Heimdallarchaeota archaeon LC 3 OLS23586 Arc I group archaeon ADurb1013 Bin02101 acetylCoA acetyltransferase Arc I group archaeon ADurb1013 Bin02101 KYC53975 100 98 Aeropyrum pernix K1 NP 1482272 118431627 Archaea Crenarchaeota Thermoprotei Desulfurococcales Clade 1B 100 Acidilobus saccharovorans 34515 YP 0038164721 302348834 Archaea Crenarchaeota Thermoprotei Acidilobaceae 99 Ignicoccus hospitalis KIN4 I YP 0014349671 156937171 Archaea Crenarchaeota Thermoprotei Desulfurococcales Desulfurococcaceae Candidatus Korarchaeum cryptofilum OPF8 YP 0017376051 170290789 Archaea 24 100 99 Sulfolobus solfataricus P2 NP 3420631 15897458 AM 2 Pyrolobus fumarii 1A YP 0047808871 347523317 Archaea Crenarchaeota Thermoprotei Desulfurococcales 100 100 Pyrodictium occultum WP 058370260 Archaea Crenarchaeota Thermoprotei Desulfurococcales Pyrodictiaceae Pyrodictium delaneyi WP 055410584 Archaea Crenarchaeota Thermoprotei Desulfurococcales butylicus WP 011822668 Archaea Crenarchaeota Thermoprotei Desulfurococcales Pyrodictiaceae 99 100 100 mucosus WP 013561924 Archaea Crenarchaeota Thermoprotei Desulfurococcales Desulfurococcaceae 32 100 Desulfurococcus amylolyticus WP 012608328 Archaea Crenarchaeota Thermoprotei Desulfurococcales Desulfurococcaceae 27 100 Thermosphaera aggregans WP 013129289 Archaea Crenarchaeota Thermoprotei Desulfurococcales Desulfurococcaceae 100 marinus WP 011839709 Archaea Crenarchaeota Thermoprotei Desulfurococcales Desulfurococcaceae Thermogladius calderae WP 014737608 Archaea Crenarchaeota Thermoprotei Desulfurococcales 100 aggregans DSM 17230 Thiolase Ignisphaera aggregans DSM 17230 ADM27286 Archaea Crenarchaeota Thermoprotei Desulfurococcales Desulfurococcaceae uncultured marine thaumarchaeote KM3 74 C10 acetylCoA acetyltransferase atoB uncultured marine thaumarchaeote KM3 74 C10 AIF16465 98 100 Thermofilum pendens WP 011752432 Archaea Crenarchaeota Thermoprotei Thermofilum WP 020962721 Archaea Crenarchaeota Thermoprotei Thermoproteales Thermofilum uzonense WP 052884345 Archaea Crenarchaeota Thermoprotei Thermoproteales 99 Candidatus Bathyarchaeota archaeon RBG 16 57 9 OGD44973 100 Candidatus Odinarchaeota archaeon LCB 4 3ketoacylCoA thiolase Candidatus Odinarchaeota archaeon LCB 4 OLS17784 99 100 Acidiplasma WP 048101926 Archaea Euryarchaeota Thermoplasmatales Ferroplasma acidarmanus WP 019841569 Archaea Euryarchaeota Thermoplasmata Thermoplasmatales 99 Picrophilus torridus WP 011177184 Archaea Euryarchaeota Thermoplasmata Thermoplasmatales Picrophilaceae 100 100 Thermoplasmatales archaeon Eplasma hypothetical protein AMDU2 EPLC00006G0105 Thermoplasmatales archaeon Eplasma EQB66548 Archaea Euryarchaeota Clade 1C 78 100 Cuniculiplasma divulgatum acetylCoA Cacyltransferase Cuniculiplasma divulgatum SIM35430 100 Thermoplasma acidophilum DSM 1728 NP 3949091 16082423 AM 4 Thermoplasmatales archaeon Iplasma hypothetical protein AMDU3 IPLC00002G0356 Thermoplasmatales archaeon Iplasma EQB65719 Archaea Euryarchaeota Aciduliprofundum sp MAR08339 WP 015283221 Archaea Euryarchaeota 88 86 Archaeoglobus fulgidus DSM 4304 NP 0712381 11499992 AM 1 100 Ferroglobus placidus WP 012965485 Archaea Euryarchaeota Archaeoglobi Archaeoglobales 98 Archaeoglobus veneficus WP 013684688 Archaea Euryarchaeota Archaeoglobi Archaeoglobales Archaeoglobaceae 96 Archaeoglobus sulfaticallidus WP 015590341 Archaea Euryarchaeota Archaeoglobi Archaeoglobales 98 Candidatus Bathyarchaeota archaeon B261 acetylCoA acetyltransferase Candidatus Bathyarchaeota archaeon B261 KYH41650 miscellaneous Crenarchaeota group archaeon SMTZ155 acetylCoA acetyltransferase miscellaneous Crenarchaeota group archaeon SMTZ155 KON30702 100 Candidatus Methanohalarchaeum thermophilum AcetylCoA acetyltransferase Candidatus Methanohalarchaeum thermophilum OKY77828 Methanonatronarchaeum thermophilum WP 086636531 97 100 100 candidate divison MSBL1 archaeon SCGCAAA259B11 acetylCoA acetyltransferase candidate divison MSBL1 archaeon SCGCAAA259B11 KXA90438 uncultured organism acetylCoA acetyltransferase uncultured organism AGF93538 98 candidate divison MSBL1 archaeon SCGCAAA259E19 acetylCoA acetyltransferase candidate divison MSBL1 archaeon SCGCAAA259E19 KXA95695 98 96 candidate divison MSBL1 archaeon SCGCAAA261D19 acetylCoA acetyltransferase candidate divison MSBL1 archaeon SCGCAAA261D19 KXB02160 Hadesarchaea archaeon YNP 45 acetylCoA acetyltransferase Hadesarchaea archaeon YNP 45 KUO39684 89 97 Candidatus Altiarchaeales archaeon IMC4 acetylCoA acetyltransferase Candidatus Altiarchaeales archaeon IMC4 ODS41988 archaeon GW2011 AR15 Betaketoacyl synthase thiolase archaeon GW2011 AR15 AJF61241 100 100 Methanomassiliicoccus luminyensis hypothetical protein Methanomassiliicoccus luminyensis WP 019176549 Archaea Euryarchaeota Methanomicrobia unclassified Methanomicrobia 99 Methanomassiliicoccales archaeon RumEn M1 acetylCoA acetyltransferase partial Methanomassiliicoccales archaeon RumEn M1 KQM11262 Clade 1D 98 99 Euryarchaeota archaeon RBG 13 57 23 acetylCoA acetyltransferase Euryarchaeota archaeon RBG 13 57 23 OGS43563 86 Euryarchaeota archaeon RBG 16 67 27 acetylCoA acetyltransferase Euryarchaeota archaeon RBG 16 67 27 OGS47716 38 Marine Group III euryarchaeote CGEpi2 acetylCoA acetyltransferase Marine Group III euryarchaeote CGEpi2 OIR23110 Thermoplasmatales archaeon SG8523 acetylCoA acetyltransferase Thermoplasmatales archaeon SG8523 KYK30927 Archaea Euryarchaeota 100 Candidatus Micrarchaeota archaeon CG1 02 49 24 acetylCoA acetyltransferase Candidatus Micrarchaeota archaeon CG1 02 49 24 OIO25613 Candidatus Micrarchaeota archaeon CG1 02 60 51 acetylCoA acetyltransferase Candidatus Micrarchaeota archaeon CG1 02 60 51 OIO27664 Candidatus Heimdallarchaeota archaeon LC 2 3ketoacylCoA thiolase Heimdallarchaeota archaeon LC 2 OLS18997 94 69 candidate division Zixibacteria bacterium 4484 95 acetylCoA acetyltransferase candidate division Zixibacteria bacterium 4484 95 OQX90739 91 Candidatus Aminicenantes bacterium 4484 214 acetylCoA acetyltransferase Candidatus Aminicenantes bacterium 4484 214 OQX55011 99 91 Deltaproteobacteria bacterium RIFOXYA12 FULL 61 11 acetylCoA acetyltransferase Deltaproteobacteria bacterium RIFOXYA12 FULL 61 11 OGQ98609 Bacteria Proteobacteria Ignavibacteria bacterium RBG 13 36 8 acetylCoA acetyltransferase Ignavibacteria bacterium RBG 13 36 8 OGU55031 Bacteria 67100 Thermosulfidibacter takaii WP 068550472 Bacteria Aquificae Aquificae Aquificales Aquificales genera incertae sedis 51 95 Candidatus Cloacimonetes bacterium 4572 55 acetylCoA acetyltransferase Candidatus Cloacimonetes bacterium 4572 55 OQY27358 77 Ignavibacteria bacterium GWC2 35 8 acetylCoA acetyltransferase Ignavibacteria bacterium GWC2 35 8 OGU49098 Bacteria 99 candidate division WOR 3 bacterium SM23 42 acetylCoA acetyltransferase candidate division WOR 3 bacterium SM23 42 KPK63728 91 Ignavibacteria bacterium GWF2 33 9 acetylCoA acetyltransferase Ignavibacteria bacterium GWF2 33 9 OGU60641 Bacteria Candidatus Cloacimonas sp 4484 209 acetylCoA acetyltransferase partial Candidatus Cloacimonas sp 4484 209 OQX56383 Desulfurella amilsii WP 086033517 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Desulfurellales 100 99 Rhodobacterales bacterium 59 46 T64 acetylCoA acetyltransferase Rhodobacterales bacterium 59 46 T64 OUS20248 Bacteria Proteobacteria 100 Stappia indica acetylCoA acetyltransferase Stappia indica WP 067222525 Bacteria Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae 100 Ilumatobacter nonamiensis acetylCoA acetyltransferase Ilumatobacter nonamiensis WP 040492959 Bacteria Actinobacteria Actinobacteria Acidimicrobidae Acidimicrobiales Acidimicrobineae 99 Rhodobacteraceae WP 067546477 Bacteria Proteobacteria Alphaproteobacteria Paraburkholderia piptadeniae WP 087732120 Thermovenabulum gondwanense WP 068748737 Bacteria Firmicutes Thermoanaerobacterales Thermoanaerobacterales Family III Incertae Sedis 100 100 97 Caldisalinibacter kiritimatiensis WP 006313856 100 Caloramator sp ALD01 propanoylCoA acyltransferase Caloramator sp ALD01 WP 027308994 Bacteria Firmicutes Clostridia Clostridiales Proteiniborus ethanoligenes WP 091727083 Bacteria Firmicutes Clostridia Clostridiales unclassified Clostridiales 42 Clostridium homopropionicum WP 052220615 Bacteria Firmicutes Clostridia Clostridiales Clostridiaceae 99 Candidatus Schekmanbacteria bacterium RBG 16 38 11 acetylCoA acetyltransferase Candidatus Schekmanbacteria bacterium RBG 16 38 11 OGL46871 61 Candidatus Dadabacteria bacterium CSP12 acetylCoA acetyltransferase acetylCoA Cacetyltransferase Candidatus Dadabacteria bacterium CSP12 KRT66400 100 47 83 Bryobacter aggregatus acetylCoA acetyltransferase Bryobacter aggregatus WP 031495679 Bacteria Acidobacteria group Acidobacteria unclassified Acidobacteria bacterium KBS 96 hypothetical protein Acidobacteriaceae bacterium KBS 96 WP 020722503 Bacteria Fibrobacteres Acidobacteria group Acidobacteria Acidobacteriia 100 Thermoflavimicrobium dichotomicum WP 093231052 Bacteria Firmicutes Thermoactinomycetaceae Aneurinibacillus terranovensis acetylCoA acetyltransferase Aneurinibacillus terranovensis WP 027415899 Bacteria Firmicutes Bacilli Bacillales Paenibacillaceae Aneurinibacillus group 100 Sphaerobacter thermophilus WP 012870600 Bacteria Chloroflexi Sphaerobacteridae Sphaerobacterales Sphaerobacterineae Sphaerobacteraceae 94 Thermorudis peleae acetylCoA acetyltransferase Thermorudis peleae WP 038038393 Clade 1E 86 100 Haloprofundus marisrubri WP 058580537 Peptococcaceae bacterium BRH c4b acetylCoA acetyltransferase Peptococcaceae bacterium BRH c4b KJS17127 Bacteria Firmicutes Clostridia 93 Deltaproteobacteria bacterium RBG 16 47 11 acetylCoA acetyltransferase Deltaproteobacteria bacterium RBG 16 47 11 OGP96102 Bacteria Proteobacteria 99 27 100 Candidatus Methylomirabilis oxyfera conserved protein of unknown function Candidatus Methylomirabilis oxyfera CBE67755 Bacteria unclassified Bacteria Candidatus Rokubacteria bacterium CSP16 hypothetical protein XU13 C0058G0014 Candidatus Rokubacteria bacterium CSP16 KRT70265 100 Clostridium ultunense Esp conserved hypothetical protein Clostridium ultunense Esp CCQ92359 Bacteria Firmicutes Clostridia Clostridiales Clostridiaceae Bacillus thermozeamaize acetylCoA acetyltransferase Bacillus thermozeamaize OUM86099 Bacteria Firmicutes Bacilli Bacillales Bacillaceae 100 100 Candidatus Thorarchaeota archaeon SMTZ145 KXH74809 100 Candidatus Thorarchaeota archaeon SMTZ45 acetylCoA acetyltransferase Thorarchaeota archaeon SMTZ45 KXH76295 30 Candidatus Thorarchaeota archaeon AB 25 putative acetylCoA acyltransferase Thorarchaeota archaeon AB 25 OLS30690 Candidatus Thorarchaeota archaeon SMTZ183 acetylCoA acetyltransferase Thorarchaeota archaeon SMTZ183 KXH71996 100 99 Candidatus Heimdallarchaeota archaeon LC 2 hypothetical protein HeimC2 43120 Heimdallarchaeota archaeon LC 2 OLS19065 36 Candidatus Heimdallarchaeota archaeon LC 2 hypothetical protein HeimC2 43140 Heimdallarchaeota archaeon LC 2 OLS19067 33 Candidatus Heimdallarchaeota archaeon LC 3 3ketoacylCoA thiolase Heimdallarchaeota archaeon LC 3 OLS27524 candidate division TM6 bacterium GW2011 GWF2 37 49 AcetylCoA acyltransferase candidate division TM6 bacterium GW2011 GWF2 37 49 KKQ32163 79 Syntrophaceticus schinkii WP 044664269 Bacteria Firmicutes Clostridia Thermoanaerobacterales Thermoanaerobacterales Family III Incertae Sedis Thaumarchaeota archaeon RBG 16 49 8 hypothetical protein A3K61 03745 Thaumarchaeota archaeon RBG 16 49 8 OHE55795 97 100 Chloroflexi bacterium RBG 16 58 8 propanoylCoA acyltransferase Chloroflexi bacterium RBG 16 58 8 OGO44539 Dehalococcoidia bacterium SG8 51 3 propanoylCoA acyltransferase Dehalococcoidia bacterium SG8 51 3 KPK22228 100 Dehalogenimonas formicexedens WP 076004983 Bacteria Chloroflexi 100 100 Desulfobacterium autotrophicum WP 015906299 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Desulfobacterales Desulfobacteraceae 26 98 Desulfobacteraceae bacterium 4484 1901 propanoylCoA acyltransferase Desulfobacteraceae bacterium 4484 190 1 OPX37315 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Chloroflexi bacterium RBG 13 50 10 propanoylCoA acyltransferase Chloroflexi bacterium RBG 13 50 10 OGN95059 63 100 Dehalococcoidia bacterium DG 18 propanoylCoA acyltransferase Dehalococcoidia bacterium DG 18 KPJ51518 Chloroflexi bacterium RBG 16 60 22 propanoylCoA acyltransferase Chloroflexi bacterium RBG 16 60 22 OGO45028 100 100 Desulfobacteraceae bacterium CG2 30 51 40 propanoylCoA acyltransferase Desulfobacteraceae bacterium CG2 30 51 40 OIP39362 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria 99 Desulfarculus baarsii WP 013257793 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Desulfarculales Desulfarculaceae Archaeoglobus fulgidus DSM 8774 AIG98171 Archaea Euryarchaeota Archaeoglobi Archaeoglobales Archaeoglobaceae 81 Candidatus Rokubacteria bacterium 13 1 40CM 4 69 39 OLC53442 100 100 Natronomonas sp CBA1134 acetylCoA acyltransferase Natronomonas sp CBA1134 WP 075936069 Archaea Euryarchaeota Halobacteria Halobacteriales 32 100 Natronomonas pharaonis WP 011322825 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae Natronorubrum tibetense acetylCoA acyltransferase Natronorubrum tibetense WP 081603685 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae Salinarchaeum sp HarchtBsk1 acetylCoA acyltransferase Salinarchaeum sp HarchtBsk1 WP 081638687 Candidatus Lokiarchaeota archaeon CR 4 hypothetical protein RBG13Loki 0693 partial Lokiarchaeota archaeon CR 4 OLS15701 31 100 100 uncultured marine thaumarchaeote KM3 01 F02 PropanoylCoA Cacyltransferase atoB uncultured marine thaumarchaeote KM3 01 F02 AIE97573 uncultured marine crenarchaeote HF4000 APKG3J11 putative thiolase Cterminal domain protein uncultured marine crenarchaeote HF4000 APKG3J11 ABZ08627 100 Thaumarchaeota archaeon 13 1 40CM 38 12 OLB90598 100 100 Thaumarchaeota archaeon 13 1 40CM 38 12 acetylCoA acetyltransferase Thaumarchaeota archaeon 13 1 40CM 38 12 OLB89791 uncultured crenarchaeote putative acetylCoA acetyltransferase uncultured crenarchaeote CAF28713 uncultured marine thaumarchaeote AD1000 41 C07 AIE93902 100 miscellaneous Crenarchaeota group6 archaeon AD81 hypothetical protein AC479 03475 miscellaneous Crenarchaeota group6 archaeon AD81 KON33724 92 Candidatus Bathyarchaeota archaeon RBG 13 38 9 hypothetical protein A3K80 06990 Candidatus Bathyarchaeota archaeon RBG 13 38 9 OGD53575 92 100 archaeon 13 1 20CM 2 54 9 hypothetical protein AUG19 01020 archaeon 13 1 20CM 2 54 9 OLE77319 89 99 Acidobacteria bacterium 13 1 40CM 2 56 11 hypothetical protein AUI45 10335 Acidobacteria bacterium 13 1 40CM 2 56 11 OLD68536 Bacteria 100 Candidatus Bathyarchaeota archaeon B63 hypothetical protein AYL33 004140 Candidatus Bathyarchaeota archaeon B63 KYH41761 99 archaeon RBG 16 50 20 hypothetical protein A3K71 02445 archaeon RBG 16 50 20 OFX15527 miscellaneous Crenarchaeota group15 archaeon DG45 hypothetical protein AC482 02195 miscellaneous Crenarchaeota group15 archaeon DG45 KON31022 89 Candidatus Odinarchaeota archaeon LCB 4 putative acetylCoA acyltransferase Candidatus Odinarchaeota archaeon LCB 4 OLS17488 100 Clade 1F 100 Candidatus Thorarchaeota archaeon AB 25 3ketoacylCoA thiolase Thorarchaeota archaeon AB 25 OLS31129 Candidatus Thorarchaeota archaeon SMTZ145 hypothetical protein AM325 16150 Thorarchaeota archaeon SMTZ145 KXH75634 Candidatus Thorarchaeota archaeon SMTZ183 hypothetical protein AM324 06290 Thorarchaeota archaeon SMTZ183 KXH73872 100 98 candidate divison MSBL1 archaeon SCGCAAA385M11 hypothetical protein AKJ60 01100 partial candidate divison MSBL1 archaeon SCGCAAA385M11 KXB09045 35 Dehalococcoidia bacterium CG2 30 46 19 hypothetical protein AUK39 05920 Dehalococcoidia bacterium CG2 30 46 19 OIP25903 90 Brevefilum fermentans conserved protein of unknown function Brevefilum fermentans SMX54374 100 Desulfobacterium sp 4572 20 hypothetical protein B6I32 03080 Desulfobacterium sp 4572 20 OQY16579 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Desulfobacterales 99 92 Chloroflexi bacterium RBG 13 51 36 hypothetical protein A2Z77 05475 Chloroflexi bacterium RBG 13 51 36 OGN97478 13 uncultured Chloroflexi bacterium HF4000 28F02 acetylCoA acetyltransferase uncultured Chloroflexi bacterium HF4000 28F02 ADI18694 34 Chloroflexi bacterium GWB2 49 20 OGN72617 100 78 Candidatus Bathyarchaeota archaeon RBG 13 38 9 hypothetical protein A3K80 03725 Candidatus Bathyarchaeota archaeon RBG 13 38 9 OGD53015 87 miscellaneous Crenarchaeota group archaeon SMTZ80 hypothetical protein AC481 02640 miscellaneous Crenarchaeota group archaeon SMTZ80 KON28167 Candidatus Bathyarchaeota archaeon RBG 13 52 12 hypothetical protein A3K78 11225 Candidatus Bathyarchaeota archaeon RBG 13 52 12 OGD56495 Candidatus Bathyarchaeota archaeon B25 hypothetical protein AYL30 006030 Candidatus Bathyarchaeota archaeon B25 KYH37912 100 Desulfotomaculum australicum WP 073163669 Bacteria Firmicutes Clostridia Clostridiales Peptococcaceae 100 Desulfovirgula thermocuniculi 3ketoacylCoA thiolase Desulfovirgula thermocuniculi WP 027718212 Bacteria Firmicutes Clostridia Thermoanaerobacterales 100 100 Carboxydothermus ferrireducens 3ketoacylCoA thiolase Carboxydothermus ferrireducens WP 028052191 Bacteria Firmicutes Clostridia Thermoanaerobacterales Thermoanaerobacteraceae 59 Desulfotomaculum thermocisternum 3ketoacylCoA thiolase Desulfotomaculum thermocisternum WP 027355765 Bacteria Firmicutes Clostridia Clostridiales Peptococcaceae 56 Clostridiales bacterium PH28 bin88 3ketoacylCoA thiolase Clostridiales bacterium PH28 bin88 KKM10087 Bacteria Firmicutes 14 58 Haladaptatus paucihalophilus WP 007981307 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae Euryarchaeota archaeon RBG 13 57 23 3ketoacylCoA thiolase Euryarchaeota archaeon RBG 13 57 23 OGS43624 100 Flavobacteriales bacterium TMED123 3ketoacylCoA thiolase Flavobacteriales bacterium TMED123 OUV75155 Bacteria Bacteroidetes Chlorobi group Bacteroidetes 100 100 Natrialba magadii ATCC 43099 YP 0034787441 289580278 A6 Haloterrigena turkmenica WP 012945476 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae Haloarcula japonica WP 004595068 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae 100 100 Haloterrigena hispanica WP 092935691 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae 100 Haloterrigena mahii WP 066303891 Archaea Euryarchaeota Halobacteria Halobacteriales Halobiforma lacisalsi WP 049918419 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae 100 Halorhabdus utahensis DSM 12940 YP 0031295171 257051684 A5 100 Halogranum amylolyticum acetylCoA Cacetyltransferase Halogranum amylolyticum SEP10630 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae Clade 1G 80 Haloferax mediterranei ATCC 33500 3ketoacylCoA thiolase plasmid Haloferax mediterranei ATCC 33500 AHZ24333 81 100 Pyrobaculum aerophilum str IM2 NP 5600361 18313369 A8 Thermoproteus tenax WP 014127008 Archaea Crenarchaeota Thermoprotei Thermoproteales 100 Thermoproteus uzoniensis WP 013679760 Archaea Crenarchaeota Thermoprotei Thermoproteales Thermoproteaceae 100 Haloarcula marismortui ATCC 43049 YP 1381801 55380331 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae 70 Natronomonas pharaonis DSM 2160 YP 3273561 76802348 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae 100 Lokiarchaeum sp GC14 75 BetaketoadipylCoA thiolase Lokiarchaeum sp GC14 75 KKK45622 100 Lokiarchaeum sp GC14 75 BetaketoadipylCoA thiolase Lokiarchaeum sp GC14 75 KKK44269 100 Desulfatitalea tepidiphila thiolase Desulfatitalea tepidiphila WP 054030131 98 Firmicutes bacterium ML8 F2 thiolase Firmicutes bacterium ML8 F2 OPL14534 100 100 Deltaproteobacteria bacterium RBG 13 61 14 thiolase Deltaproteobacteria bacterium RBG 13 61 14 OGP57556 Bacteria Proteobacteria Ruegeria halocynthiae hypothetical protein Ruegeria halocynthiae WP 037313459 Bacteria Proteobacteria Alphaproteobacteria Rhodobacterales 80 Actinobacteria bacterium RBG 13 55 18 OFW55490 100 100 Syntrophothermus lipocalidus WP 013174650 Bacteria Firmicutes Clostridia Clostridiales Bacillus niacini thiolase Bacillus niacini WP 063251831 Bacteria Firmicutes Bacilli Bacillales Bacillaceae 100 Desulfarculus baarsii WP 013259988 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Desulfarculales Desulfarculaceae 97 99 Desulfotomaculum alcoholivorax thiolase Desulfotomaculum alcoholivorax WP 027364566 Bacteria Firmicutes Clostridia Clostridiales Peptococcaceae Smithella sp SDB thiolase Smithella sp SDB KQC08156 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Syntrophobacterales 100 Spirochaetes bacterium RBG 16 49 21 thiolase Spirochaetes bacterium RBG 16 49 21 OHD70527 Bacteria 98 100 Smithella sp SDB thiolase Smithella sp SDB KQC10101 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Syntrophobacterales 100 100 Desulfatibacillum alkenivorans WP 073473131 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Desulfobacterales Desulfobacteraceae Desulfococcus sp 4484 241 thiolase Desulfococcus sp 4484 241 OQX63305 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Desulfobacterales gamma proteobacterium HdN1 WP 013263741 Smithella sp SDB thiolase Smithella sp SDB KQC11261 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Syntrophobacterales 100 100 100 Rhodocyclaceae bacterium Paddy1 thiolase Rhodocyclaceae bacterium Paddy1 WP 054620013 Bacteria Proteobacteria Betaproteobacteria 100 Achromobacter WP 063960386 Bacteria Proteobacteria Betaproteobacteria 100 Herbaspirillum sp RV1423 thiolase Herbaspirillum sp RV1423 WP 081768700 Bacteria Proteobacteria Betaproteobacteria Burkholderiales 100 Ottowia thiooxydans thiolase Ottowia thiooxydans WP 036593234 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae 99 Oceanibacterium hippocampi WP 085881981 100 100 Desulfobacterium vacuolatum WP 084070079 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Desulfobacterales Desulfobacteraceae 100 Desulforhopalus singaporensis WP 092225217 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Desulfobacterales Desulfobulbaceae 100 Variovorax sp KK3 thiolase Variovorax sp KK3 WP 077001708 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Desulfococcus sp 4484 241 thiolase Desulfococcus sp 4484 241 OQX61548 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Desulfobacterales 50 Actinobacteria bacterium RBG 13 55 18 OFW59796 90 Deltaproteobacteria bacterium RBG 13 43 22 thiolase Deltaproteobacteria bacterium RBG 13 43 22 OGP50224 Bacteria Proteobacteria Acidobacteria bacterium RBG 16 64 8 thiolase partial Acidobacteria bacterium RBG 16 64 8 OFV84838 Bacteria 32 Dehalococcoidia bacterium SM23 28 2 thiolase Dehalococcoidia bacterium SM23 28 2 KPK47970 68 100 100 Bacillus thermozeamaize hypothetical protein BAA01 09900 Bacillus thermozeamaize OUM87298 Bacteria Firmicutes Bacilli Bacillales Bacillaceae Bacillus alveayuensis thiolase Bacillus alveayuensis WP 044894657 Bacteria Firmicutes Bacilli Bacillales Bacillaceae Bacillus pseudofirmus WP 075387468 Bacteria Firmicutes Bacilli Bacillales Bacillaceae 100 Nitrospinae bacterium RIFCSPLOWO2 12 FULL 45 22 thiolase Nitrospinae bacterium RIFCSPLOWO2 12 FULL 45 22 OGW17055 100 Amycolatopsis jejuensis thiolase Amycolatopsis jejuensis WP 051792894 Bacteria Actinobacteria Actinobacteria Pseudonocardineae Pseudonocardiaceae 98 Blastococcus saxobsidens WP 014378043 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Frankineae Clade 1H 100 Peptococcaceae bacterium BRH c23 KJS48594 Bacteria Firmicutes Clostridia Desulfotomaculum sp BICA16 hypothetical protein JL56 14340 Desulfotomaculum sp BICA16 KJS71555 Bacteria Firmicutes Clostridia Clostridiales 100 Desulfobacteraceae bacterium 4572 87 hypothetical protein B6240 01030 Desulfobacteraceae bacterium 4572 87 OQY50777 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria 100 Rhodospirillaceae bacterium BRH c57 hypothetical protein VR70 08735 Rhodospirillaceae bacterium BRH c57 KJS39134 Bacteria Proteobacteria Alphaproteobacteria 100 100 Syntrophorhabdus aromaticivorans hypothetical protein Syntrophorhabdus aromaticivorans WP 028895873 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Syntrophobacterales Syntrophorhabdaceae 100 Desulfatiglans anilini hypothetical protein Desulfatiglans anilini WP 028321592 87 99 Desulfosporosinus sp BRH c37 hypothetical protein APF81 21580 Desulfosporosinus sp BRH c37 KUO77152 Bacteria Firmicutes Clostridia Clostridiales Deltaproteobacteria bacterium RBG 16 49 23 hypothetical protein A2V86 03010 Deltaproteobacteria bacterium RBG 16 49 23 OGP78128 Bacteria Proteobacteria 91 100 100 Pseudonocardia sp EC08061009 thiolase Pseudonocardia sp EC08061009 WP 082538314 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Pseudonocardineae 100 Pseudonocardia autotrophica thiolase Pseudonocardia autotrophica WP 037039857 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Pseudonocardineae Pseudonocardiaceae 59 98 Amycolatopsis sacchari AcetylCoA acetyltransferase Amycolatopsis sacchari SFK08274 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Pseudonocardineae Pseudonocardiaceae Streptomyces melanosporofaciens WP 093459658 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Streptomycineae Streptomycetaceae Streptomyces uncultured delta proteobacterium putative thiolase uncultured delta proteobacterium CAI78630 100 Nitrospinae bacterium RIFCSPLOWO2 12 FULL 45 22 hypothetical protein A3G93 03000 Nitrospinae bacterium RIFCSPLOWO2 12 FULL 45 22 OGW17539 100 100 100 Betaproteobacteria bacterium RIFCSPLOWO2 12 FULL 62 13b propanoylCoA acyltransferase Betaproteobacteria bacterium RIFCSPLOWO2 12 FULL 62 13b OGA35675 Bacteria 100 Streptomyces sp NBRC 109706 propanoylCoA acyltransferase Streptomyces sp NBRC 109706 WP 062214518 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Streptomycineae 92 65 Deltaproteobacteria bacterium SM23 61 propanoylCoA acyltransferase Deltaproteobacteria bacterium SM23 61 KPK90479 Bacteria Proteobacteria Bacillus sp MRMR6 WP 075687438 Bacteria Firmicutes Bacilli Bacillales Bacillaceae 100 Xanthobacter sp 91 hypothetical protein Xanthobacter sp 91 WP 029557356 Bacteria Proteobacteria Alphaproteobacteria Rhizobiales 100 uncultured delta proteobacterium conserved hypothetical protein uncultured delta proteobacterium SBV99679 Burkholderiales bacterium 7064 hypothetical protein BGO72 08595 Burkholderiales bacterium 7064 OJX07498 Bacteria Proteobacteria 100 Natronorubrum tibetense WP 006092095 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae candidate divison MSBL1 archaeon SCGCAAA259J03 hypothetical protein AKJ39 04150 candidate divison MSBL1 archaeon SCGCAAA259J03 KXA96786 Patulibacter medicamentivorans WP 007569907 Bacteria Actinobacteria Actinobacteria Rubrobacteridae 58 87 Deltaproteobacteria bacterium CG1 02 45 11 hypothetical protein AUJ48 00395 Deltaproteobacteria bacterium CG1 02 45 11 OIN97201 Bacteria Proteobacteria 100 Mycobacterium saskatchewanense hypothetical protein AWC23 06410 Mycobacterium saskatchewanense ORW73711 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae 100 Actinobacteria bacterium IMCC26207 WP 052602026 63 Mycobacterium nebraskense WP 047321814 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae 100 100 Haloterrigena daqingensis WP 076581581 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae Halorientalis regularis WP 092686487 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae 100 Candidatus Fraserbacteria bacterium RBG 16 55 9 hypothetical protein A2Z21 10200 Candidatus Fraserbacteria bacterium RBG 16 55 9 OGF52666 100 100 Rhodococcus sp 4J2A2 thiolase Rhodococcus sp 4J2A2 WP 092803297 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacterium holsaticum WP 069408228 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae 48 Mycobacterium europaeum WP 085242896 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Corynebacterineae Mycobacteriaceae 55 100 Archaeoglobus fulgidus DSM 4304 NP 0688751 11497655 AM 3 Lokiarchaeum sp GC14 75 hypothetical protein Lokiarch 28110 partial Lokiarchaeum sp GC14 75 KKK43307 Halorientalis regularis WP 092692634 Archaea Euryarchaeota Halobacteria Halobacteriales Halobacteriaceae 89 Candidatus Peribacteria bacterium RIFCSPHIGHO2 02 FULL 51 15 OGJ62235 89 Candidatus Peribacteria bacterium RIFCSPHIGHO2 01 FULL 55 13 OGJ59271 100 Candidatus Peribacter riflensis ALM10041 89 Candidatus Peribacteria bacterium RIFOXYC2 FULL 58 10 hypothetical protein A3J91 04390 Candidatus Peribacteria bacterium RIFOXYC2 FULL 58 10 OGJ82969 100 100 100 Candidatus Kaiserbacteria bacterium RIFCSPHIGHO2 02 FULL 49 16 hypothetical protein A3C86 03975 Candidatus Kaiserbacteria bacterium RIFCSPHIGHO2 02 FULL 49 16 OGG58704 Candidatus Peribacteria bacterium RIFCSPHIGHO2 01 FULL 51 35 hypothetical protein A2706 02385 Candidatus Peribacteria bacterium RIFCSPHIGHO2 01 FULL 51 35 OGJ56146 19 Candidatus Sungbacteria bacterium RIFCSPLOWO2 02 FULL 54 10 hypothetical protein A3J10 02965 Candidatus Sungbacteria bacterium RIFCSPLOWO2 02 FULL 54 10 OHA13138 Candidatus Peribacteria bacterium RIFCSPHIGHO2 01 FULL 49 38 OGJ54889 Candidatus Lokiarchaeota archaeon CR 4 Thiolase Lokiarchaeota archaeon CR 4 OLS13386 Clade 1I 100 57 100 archaeon GW2011 AR11 AcetylCoA acetyltransferase archaeon GW2011 AR11 KHO50309 100 Parcubacteria bacterium DG 74 1 hypothetical protein AMJ48 00600 Parcubacteria bacterium DG 74 1 KPJ73687 100 archaeon GW2011 AR11 AcetylCoA acetyltransferase archaeon GW2011 AR11 KHO50312 Candidatus bacterium CG2 30 54 11 hypothetical protein AUK40 01600 Candidatus Wirthbacteria bacterium CG2 30 54 11 OIP98414 70 75 100 Candidatus Micrarchaeota archaeon CG1 02 55 22 hypothetical protein AUJ14 02845 Candidatus Micrarchaeota archaeon CG1 02 55 22 OIO26059 Actinobacteria bacterium 66 15 AcetylCoA acetyltransferase Actinobacteria bacterium 66 15 KUK47632 Candidatus Wirthbacteria bacterium CG2 30 54 11 hypothetical protein AUK40 05410 Candidatus Wirthbacteria bacterium CG2 30 54 11 OIP95997 100 100 Candidatus Bathyarchaeota archaeon RBG 13 38 9 hypothetical protein A3K80 05415 Candidatus Bathyarchaeota archaeon RBG 13 38 9 OGD53395 candidate divison MSBL1 archaeon SCGCAAA261O19 hypothetical protein AKJ48 00390 candidate divison MSBL1 archaeon SCGCAAA261O19 KXB05070 Thermofilum adornatus hypothetical protein N186 09695 Thermofilum adornatus AGT36274 Archaea Crenarchaeota Thermoprotei Thermoproteales 95 61 Alloactinosynnema album WP 091381836 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Pseudonocardineae Nocardioides jensenii acetylCoA acetyltransferase Nocardioides jensenii WP 067433096 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Propionibacterineae Nocardioidaceae 61 Deltaproteobacteria bacterium 13 1 40CM 3 71 4 acetylCoA acetyltransferase Deltaproteobacteria bacterium 13 1 40CM 3 71 4 OLC90651 Bacteria Proteobacteria 99 100 Streptacidiphilus oryzae acetylCoA acetyltransferase Streptacidiphilus oryzae WP 037578157 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Streptomycineae Streptomycetaceae Streptomyces cattleya WP 014141521 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Streptomycineae Streptomycetaceae 100 Microtetraspora niveoalba acetylCoA acetyltransferase Microtetraspora niveoalba WP 067180844 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Streptosporangineae Streptosporangiaceae 96 100 Actinomadura flavalba acetylCoA acetyltransferase Actinomadura flavalba WP 018653155 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Streptosporangineae Catenulispora acidiphila WP 012787294 Bacteria Actinobacteria Actinobacteria Actinobacteridae Actinomycetales Catenulisporineae Catenulisporaceae 100 100 Chloroflexi bacterium RBG 13 54 8 hypothetical protein A2Y91 05010 Chloroflexi bacterium RBG 13 54 8 OGO02692 62 Desulfatibacillum alkenivorans WP 015948091 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Desulfobacterales Desulfobacteraceae 100 Archaeoglobus fulgidus DSM 4304 NP 0689721 11497750 A3 Lokiarchaeum sp GC14 75 acetylCoA acetyltransferase Lokiarchaeum sp GC14 75 KKK44143 100 Archaeoglobus fulgidus DSM 4304 NP 0690401 11497818 Archaea Euryarchaeota Archaeoglobi Archaeoglobales Archaeoglobaceae 100 Deltaproteobacteria bacterium SM23 61 hypothetical protein AMJ94 05690 Deltaproteobacteria bacterium SM23 61 KPK92201 Bacteria Proteobacteria 99 Desulfobacteraceae bacterium 4484 1901 hypothetical protein B1H11 00035 Desulfobacteraceae bacterium 4484 190 1 OPX40428 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria 100 100 100 Dehalococcoidia bacterium CG2 30 46 19 hypothetical protein AUK39 06165 Dehalococcoidia bacterium CG2 30 46 19 OIP25801 Smithella sp SCADC hypothetical protein ER57 00150 Smithella sp SCADC KFO68993 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Syntrophobacterales Candidatus Bathyarchaeota archaeon BA1 acetylCoA acetyltransferase Candidatus Bathyarchaeota archaeon BA1 KPV65560 100 99 Candidatus Thorarchaeota archaeon SMTZ183 acetylCoA acetyltransferase Thorarchaeota archaeon SMTZ183 KXH77299 70 Candidatus Thorarchaeota archaeon SMTZ145 acetylCoA acetyltransferase Thorarchaeota archaeon SMTZ145 KXH71335 67 Archaeoglobus fulgidus DSM 4304 NP 0698001 11498572 A2 100 100 Acidilobus saccharovorans 34515 Putative 3ketoacylCoA thiolase Acidilobus saccharovorans 34515 ADL19667 Archaea Crenarchaeota Thermoprotei Acidilobales Acidilobaceae 75 Caldisphaera lagunensis WP 015231816 Archaea Crenarchaeota Thermoprotei Acidilobales Aeropyrum pernix K1 NP 1485781 14602032 Archaea Crenarchaeota Thermoprotei Desulfurococcales Desulfurococcaceae 95 100 Pyrobaculum oguniense TE7 AcetylCoA acetyltransferase Pyrobaculum oguniense TE7 AFA40009 Archaea Crenarchaeota Thermoprotei Thermoproteales Thermoproteaceae 100 Vulcanisaeta moutnovskia 76828 YP 0042449351 325968743 Archaea Crenarchaeota Thermoprotei Thermoproteales Thermoproteaceae 100 Sulfolobus solfataricus P2 NP 3434561 15898851 Archaea Crenarchaeota Thermoprotei Sulfolobales Sulfolobaceae Metallosphaera sedula DSM 5348 YP 0011903711 146303055 Archaea Crenarchaeota Thermoprotei Sulfolobales Sulfolobaceae Carboxydothermus islandicus WP 075866616 Bacteria Firmicutes Clostridia Thermoanaerobacterales 100 Clade 1J 80 archaeon 13 1 40CM 4 53 4 hypothetical protein AUH73 04735 archaeon 13 1 40CM 4 53 4 OLC62389 100 Crenarchaeota archaeon 13 1 40CM 3 52 17 OLD14469 93 archaeon RBG 16 50 20 hypothetical protein A3K71 02450 archaeon RBG 16 50 20 OFX15533 100 95 Chloroflexi bacterium RBG 16 52 11 hypothetical protein A2W33 04745 Chloroflexi bacterium RBG 16 52 11 OGO26908 Anaerolineae bacterium SM23 84 acetylCoA acetyltransferase Anaerolineae bacterium SM23 84 KPL24488 Bacteria 99 100 Candidatus Thorarchaeota archaeon SMTZ183 acetylCoA acetyltransferase Thorarchaeota archaeon SMTZ183 KXH77303 Candidatus Thorarchaeota archaeon SMTZ145 acetylCoA acetyltransferase Thorarchaeota archaeon SMTZ145 KXH71334 100 Archaeoglobus fulgidus DSM 4304 NP 0698011 11498573 A1 100 Ferroglobus placidus DSM 10642 YP 0034369351 288932875 Archaea Euryarchaeota Archaeoglobi Archaeoglobales Archaeoglobaceae 100 100 Vulcanisaeta moutnovskia 76828 YP 0042449341 325968742 Archaea Crenarchaeota Thermoprotei Thermoproteales Thermoproteaceae Thermoproteus tenax Kra 1 YP 0048923111 352681787 Archaea Crenarchaeota Thermoprotei Thermoproteales Thermoproteaceae 100 Acidilobus saccharovorans 34515 YP 0038166971 302349059 Archaea Crenarchaeota Thermoprotei Acidilobales Acidilobaceae 88 100 Sulfolobus solfataricus P2 NP 3434551 15898850 Archaea Crenarchaeota Thermoprotei Sulfolobales Sulfolobaceae Metallosphaera sedula DSM 5348 YP 0011903721 146303056 Archaea Crenarchaeota Thermoprotei Sulfolobales Sulfolobaceae 100 Deltaproteobacteria bacterium SM23 61 hypothetical protein AMJ94 05695 Deltaproteobacteria bacterium SM23 61 KPK92185 Bacteria Proteobacteria 98 Desulfobacteraceae bacterium 4484 1901 hypothetical protein B1H11 00030 Desulfobacteraceae bacterium 4484 190 1 OPX40427 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria 100 100 Lokiarchaeum sp GC14 75 acetylCoA acetyltransferase Lokiarchaeum sp GC14 75 KKK44142 Archaeoglobus fulgidus DSM 4304 NP 0689731 11497751 Archaea Euryarchaeota Archaeoglobi Archaeoglobales Archaeoglobaceae 100 Chloroflexi bacterium RBG 16 64 32 hypothetical protein A2W34 03725 Chloroflexi bacterium RBG 16 64 32 OGO47302 Archaeoglobus fulgidus DSM 4304 NP 0690391 11497817 Archaea Euryarchaeota Archaeoglobi Archaeoglobales Archaeoglobaceae 95 76 Candidatus Schekmanbacteria bacterium RBG 16 38 11 acetylCoA acetyltransferase Candidatus Schekmanbacteria bacterium RBG 16 38 11 OGL47013 79 Candidatus Methylomirabilis oxyfera carrier protein x sterol carrier protein 2 related protein Candidatus Methylomirabilis oxyfera CBE67725 Bacteria unclassified Bacteria 100 candidate division NC10 bacterium RIFCSPLOWO2 02 FULL 66 22 OGB91763 100 93 candidate division Zixibacteria bacterium RBG 16 48 11 acetylCoA acetyltransferase candidate division Zixibacteria bacterium RBG 16 48 11 OGC92804 Parcubacteria group bacterium GW2011 GWA2 47 9 KKU86773 100 100 Candidatus Heimdallarchaeota archaeon LC 2 hypothetical protein HeimC2 27720 Heimdallarchaeota archaeon LC 2 OLS23154 archaeon Heimdall LC 3 hypothetical protein HeimC3 44300 archaeon Heimdall LC 3 OLS19667 Deltaproteobacteria bacterium CG2 30 66 27 acetylCoA acetyltransferase Deltaproteobacteria bacterium CG2 30 66 27 OIP33786 Bacteria Proteobacteria 76 100 100 Deltaproteobacteria bacterium GWC2 42 51 OGP33370 Bacteria Proteobacteria 100 Nitrospinae bacterium RIFCSPLOWO2 02 39 17 acetylCoA acetyltransferase Nitrospinae bacterium RIFCSPLOWO2 02 39 17 OGW06684 bacterium 13 2 20CM 2 62 8 acetylCoA acetyltransferase Nitrospirae bacterium 13 2 20CM 2 62 8 OLB56229 78 Geobacteraceae bacterium GWC2 55 20 OGU01627 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria 91 100 82 Peptococcaceae bacterium CEB3 WP 047830580 Bacteria Firmicutes Clostridia Desulfosporosinus sp OL WP 075365835 Bacteria Firmicutes Clostridia Clostridiales Carboxydothermus ferrireducens acetylCoA acetyltransferase Carboxydothermus ferrireducens WP 028051459 Bacteria Firmicutes Clostridia Thermoanaerobacterales Thermoanaerobacteraceae 100 100 Thermoplasmatales archaeon Aplasma hypothetical protein AMDU1 APLC00005G0045 Thermoplasmatales archaeon Aplasma EQB72666 Archaea Euryarchaeota 100 uncultured archaeon acetylCoA acetyltransferase uncultured archaeon AKA48882 91 Thermoplasma acidophilum DSM 1728 NP 3938301 16081480 Archaea Euryarchaeota Thermoplasmata Thermoplasmatales Thermoplasmataceae 100 100 Acidiplasma cupricumulans WP 055041126 Archaea Euryarchaeota Thermoplasmata Thermoplasmatales Ferroplasma WP 019841461 Archaea Euryarchaeota Thermoplasmata Thermoplasmatales Picrophilus torridus DSM 9790 YP 0239661 48478260 Archaea Euryarchaeota Thermoplasmata Thermoplasmatales Picrophilaceae Clade 1K 97 Thermoplasmatales archaeon Iplasma hypothetical protein AMDU3 IPLC00003G0159 Thermoplasmatales archaeon Iplasma EQB65307 Archaea Euryarchaeota 96 99 Desulfatiglans anilini hypothetical protein Desulfatiglans anilini WP 051184473 Desulfarculus baarsii WP 013258355 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Desulfarculales Desulfarculaceae 100 Desulfopila aestuarii WP 073616679 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Desulfobacterales Desulfobulbaceae 97 100 Lokiarchaeum sp GC14 75 acetylCoA acetyltransferase Lokiarchaeum sp GC14 75 KKK40679 73 Lokiarchaeum sp GC14 75 acetylCoA acetyltransferase Lokiarchaeum sp GC14 75 KKK44137 Candidatus Lokiarchaeota archaeon CR 4 acetylCoA acetyltransferase Lokiarchaeota archaeon CR 4 OLS14444 100 82 WP 010927571 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Hydrogenophaga palleronii hypothetical protein Hydrogenophaga palleronii WP 066272055 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae 96 99 Lokiarchaeum sp GC14 75 acetylCoA acetyltransferase Lokiarchaeum sp GC14 75 KKK41228 97 94 Lokiarchaeum sp GC14 75 acetylCoA acetyltransferase Lokiarchaeum sp GC14 75 KKK41227 Lokiarchaeum sp GC14 75 acetylCoA acetyltransferase partial Lokiarchaeum sp GC14 75 KKK46381 Thermoplasmatales archaeon Eplasma hypothetical protein AMDU2 EPLC00006G0205 Thermoplasmatales archaeon Eplasma EQB66648 Archaea Euryarchaeota 100 100 uncultured Acidilobus sp MG AcetylCoA acetyltransferase uncultured Acidilobus sp MG ESQ22400 66 Acidilobus saccharovorans WP 013265946 Archaea Crenarchaeota Thermoprotei Acidilobales Acidilobaceae Caldisphaera lagunensis WP 015232657 Archaea Crenarchaeota Thermoprotei Acidilobales Caldisphaeraceae Aeropyrum camini SY1 JCM 12091 acetylCoA acetyltransferase Aeropyrum camini SY1 JCM 12091 BAN90075 Archaea Crenarchaeota Thermoprotei Desulfurococcales Desulfurococcaceae 70 97 100 Thermoproteus sp AZ2 acetylCoA acetyltransferase Thermoproteus sp AZ2 KJR73007 Archaea Crenarchaeota Thermoprotei Thermoproteales 100 100 Pyrobaculum oguniense TE7 AcetylCoA acetyltransferase Pyrobaculum oguniense TE7 AFA39931 Archaea Crenarchaeota Thermoprotei Thermoproteales Thermoproteaceae Thermoproteus tenax WP 014126923 Archaea Crenarchaeota Thermoprotei Thermoproteales Thermoproteaceae 100 Vulcanisaeta sp JCHS 4 acetylCoA acetyltransferase Vulcanisaeta sp JCHS 4 KUO81418 Archaea Crenarchaeota Thermoprotei Thermoproteales Pyrodictium delaneyi WP 055407807 Archaea Crenarchaeota Thermoprotei Desulfurococcales Candidatus Caldiarchaeum subterraneum BAJ46646 Archaea Thaumarchaeota 29 100 Archaeoglobus fulgidus DSM 4304 NP 0692741 11498050 A3 1 100 Archaeoglobus sulfaticallidus WP 015590788 Archaea Euryarchaeota Archaeoglobi Archaeoglobales 100 Ferroglobus placidus WP 012965690 Archaea Euryarchaeota Archaeoglobi Archaeoglobales Archaeoglobaceae 100 25 Geoglobus acetivorans WP 048091355 Archaea Euryarchaeota Archaeoglobi Archaeoglobales 100 99 Vulcanisaeta sp AZ3 acetylCoA acetyltransferase Vulcanisaeta sp AZ3 KJR72600 Archaea Crenarchaeota Thermoprotei Thermoproteales Vulcanisaeta thermophila acetylCoA acetyltransferase Vulcanisaeta thermophila WP 069806605 Archaea Crenarchaeota Thermoprotei Thermoproteales 100 Pyrobaculum sp OCT 11 acetylCoA acetyltransferase Pyrobaculum sp OCT 11 KUO90455 Archaea Crenarchaeota Thermoprotei Thermoproteales 100 miscellaneous Crenarchaeota group15 archaeon DG45 acetylCoA acetyltransferase miscellaneous Crenarchaeota group15 archaeon DG45 KON30936 Candidatus Bathyarchaeota archaeon RBG 13 46 16b acetylCoA acetyltransferase Candidatus Bathyarchaeota archaeon RBG 13 46 16b OGD45430 100 candidate divison MSBL1 archaeon SCGCAAA382A13 acetylCoA acetyltransferase candidate divison MSBL1 archaeon SCGCAAA382A13 KXB04321 Clade 1L Methanonatronarchaeum thermophilum WP 086637525 61 100 Chloroflexi bacterium RBG 16 48 7 acetylCoA acetyltransferase Chloroflexi bacterium RBG 16 48 7 OGO16989 candidate divison MSBL1 archaeon SCGCAAA259I09 acetylCoA acetyltransferase candidate divison MSBL1 archaeon SCGCAAA259I09 KXA97022 100 Chloroflexi bacterium RBG 16 51 9 acetylCoA acetyltransferase Chloroflexi bacterium RBG 16 51 9 OGO22476 100 100 84 Crenarchaeota archaeon 13 1 40CM 3 53 5 OLD05060 archaeon 13 2 20CM 2 53 6 OLB45605 100 Candidatus Bathyarchaeota archaeon BA1 acetylCoA acetyltransferase Candidatus Bathyarchaeota archaeon BA1 KPV65333 92 53 Thermoplasmatales archaeon SM150 acetylCoA acetyltransferase Thermoplasmatales archaeon SM150 KYK25484 Archaea Euryarchaeota Thermoplasmatales archaeon SG8523 acetylCoA acetyltransferase Thermoplasmatales archaeon SG8523 KYK34010 Archaea Euryarchaeota 100 Euryarchaeota archaeon SG85 acetylCoA acetyltransferase Euryarchaeota archaeon SG85 KYK30424 100 Deltaproteobacteria bacterium RBG 16 55 12 acetylCoA acetyltransferase Deltaproteobacteria bacterium RBG 16 55 12 OGP98308 Bacteria Proteobacteria Planctomycetes bacterium RBG 16 64 12 acetylCoA acetyltransferase Planctomycetes bacterium RBG 16 64 12 OHB81131 Bacteria Planctomycetes 86 100 54 Candidatus Thorarchaeota archaeon SMTZ45 hypothetical protein AM326 09995 Thorarchaeota archaeon SMTZ45 KXH74294 100 Candidatus Thorarchaeota archaeon SMTZ145 hypothetical protein AM325 10720 Thorarchaeota archaeon SMTZ145 KXH70914 99 Candidatus Thorarchaeota archaeon SMTZ183 hypothetical protein AM324 07520 Thorarchaeota archaeon SMTZ183 KXH72338 100 Candidatus Thorarchaeota archaeon AB 25 putative acetylCoA acyltransferase Thorarchaeota archaeon AB 25 OLS31252 Candidatus Heimdallarchaeota archaeon LC 2 3ketoacylCoA thiolase Heimdallarchaeota archaeon LC 2 OLS29112 88 Theionarchaea archaeon DG70 acetylCoA acetyltransferase Theionarchaea archaeon DG70 KYK31714 88 100 Desulfobacteraceae bacterium 4484 1901 hypothetical protein B1H11 05560 Desulfobacteraceae bacterium 4484 190 1 OPX37712 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria candidate divison MSBL1 archaeon SCGCAAA259E22 hypothetical protein AKJ66 00180 candidate divison MSBL1 archaeon SCGCAAA259E22 KXA94016 Deltaproteobacteria bacterium RBG 16 49 23 acetylCoA acetyltransferase Deltaproteobacteria bacterium RBG 16 49 23 OGP72566 Bacteria Proteobacteria 100 97 Nitrososphaera viennensis WP 075053857 Archaea Thaumarchaeota Nitrososphaerales Nitrososphaeraceae 100 Candidatus Nitrososphaera gargensis Ga92 acetoacetylCoA thiolase or ketoacylCoA thiolase Candidatus Nitrososphaera gargensis Ga9 2 AFU58682 Archaea Thaumarchaeota Nitrososphaerales 99 Candidatus Nitrocosmicus oleophilus AcetylCoA acetyltransferase Candidatus Nitrocosmicus oleophilus ALI36508 P 013481848 Cenarchaeum symbiosum A4 uncultured marine thaumarchaeote KM3 55 F05 3ketoacylCoA thiolase acaB1 atoB uncultured marine thaumarchaeote KM3 55 F05 AIF12396 97 76 99 Desulfotomaculum gibsoniae WP 006523469 Bacteria Firmicutes Clostridia Clostridiales Peptococcaceae 99 100 Syntrophus gentianae WP 093882824 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Syntrophobacterales Syntrophaceae Desulfotomaculum copahuensis WP 066666996 Bacteria Firmicutes Clostridia Clostridiales Desulfomonile tiedjei WP 014809337 Bacteria Proteobacteria delta epsilon subdivisions Deltaproteobacteria Syntrophobacterales Syntrophaceae 99 100 Desulfotomaculum gibsoniae WP 006523688 Bacteria Firmicutes Clostridia Clostridiales Peptococcaceae Candidatus Rokubacteria bacterium RIFCSPLOWO2 12 FULL 71 22 hypothetical protein A3F92 00880 Candidatus Rokubacteria bacterium RIFCSPLOWO2 12 FULL 71 22 OGL17874 Clade 1M 99 92 100 Geoglobus acetivorans 3ketoacylCoA thiolase Geoglobus acetivorans WP 048091316 Archaea Euryarchaeota Archaeoglobi Archaeoglobales Ferroglobus placidus WP 012965541 Archaea Euryarchaeota Archaeoglobi Archaeoglobales Archaeoglobaceae Archaeoglobus fulgidus DSM 4304 NP 0688591 11497639 A9 100 Ferroglobus placidus WP 012966782 Archaea Euryarchaeota Archaeoglobi Archaeoglobales Archaeoglobaceae candidate divison MSBL1 archaeon SCGCAAA259B11 hypothetical protein AKJ61 03165 candidate divison MSBL1 archaeon SCGCAAA259B11 KXA89305 100 Candidatus Heimdallarchaeota archaeon LC 3 putative acetylCoA acyltransferase Heimdallarchaeota archaeon LC 3 OLS23825 Candidatus Heimdallarchaeota archaeon LC 2 putative acetylCoA Cacetyltransferase YhfS Heimdallarchaeota archaeon LC 2 OLS29085

0.4

41

Suppl. Figure 11: Phylogenetic analysis of clade 1 of B-ketothiolase/acetyl-CoA acetyltransferase (COG0183). Phylogenetic tree was estimated using IQ-tree LG+C20+F on an alignment of 466 taxa and 279 sites. Bacteria, Archaea, and Asgards are colored in black, green, and purple respectively. Gene sequences from Asgard that are genetically linked to fatty acid metabolism or mevalonate metabolism or both fatty acid and mevalonate metabolism are shown in yellow, green and grey respectively summarized in Suppl. Figure 13. Raw data files are available via figshare (see Data availability for more details).

42

Hexadecanoate -ketothiolase/acetyl-CoA acetyltransferase 92 COG0183 Clade 2 76 Bacillus subtilis BACSU Acetyl CoA acetyltransferase P45855 UNIPROT Bacteria COG0318 99 Oceanobacillus iheyensis HTE831 NP 6939351 23100468 Bacteria 169 taxa, 338 sites Brevibacillus brevis NBRC 100599 YP 0027749801 226315084 Bacteria Acyl-CoA (C ) Acyl-CoA (Cn) UF IQTREE bootstraps 100 n+2 Heliobacterium modesticaldum Ice1 YP 0016796801 167629181 Bacteria COG1960* or LG+C20+G+F 56 Acetyl-CoA Rubrobacter xylanophilus DSM 9941 YP 6452591 108805322 Bacteria COG2368 -oxidation

63 Thermobaculum terrenum ATCC BAA798 YP 0033221071 269925484 Bacteria Fatty acid metabolism Acyl-2-enoyl-CoA 100 Mycobacterium leprae MYCLE Probable acetyl CoA acetyltransferase P46707 UNIPROT Bacteria 67 Mycobacterium smegmatis MYCS2 Probable acetyl CoA acetyltransferase A0R1Y7 UNIPROT Bacteria COG1024* 100 100 71 Streptomyces coelicolor A32 NP 6295381 21223759 Bacteria COG0183* - FA Corynebacterium variabile DSM 44702 YP 0047605091 340795046 Bacteria 3-hydroxy-acyl-CoA Archaea Bacteria Corynebacterium variabile DSM 44702 YP 0047593811 340793918 Bacteria COG1250* Ferroglobus placidus DSM 10642 YP 0034354411 288931381 Archaea Eukaryotes 100 94 Herpetosiphon aurantiacus DSM 785 YP 0015432991 159897052 Bacteria Aceto-acyl-CoA Asgards

99 Fatty acid biosynthesis 62 Chloroflexus aurantiacus J10fl YP 0016351481 163847104 Bacteria 93 Gemmatimonas aurantiaca T27 YP 0027608011 226226695 Bacteria Deinococcus radiodurans R1 NP 2947961 15806092 Bacteria 96 86 archaeon Heimdall LC 3 AcetylCoA acetyltransferase archaeon Heimdall LC 3 OLS23724 Acidobacterium capsulatum ATCC 51196 YP 0027553531 225873894 Bacteria 100 60 100 Haloterrigena turkmenica DSM 5511 YP 0034064691 284176192 Archaea Halopiger xanaduensis SH6 YP 0045975231 336254416 Archaea Haloarcula marismortui ATCC 43049 YP 1378741 55380024 Archaea 100 44 Staphylococcus aureus STAAM Probable acetyl CoA acyltransferase Q99WM3 UNIPROT Bacteria Staphylococcus epidermidis STAEQ Probable acetyl CoA acyltransferase Q5HS07 UNIPROT Bacteria 42 Streptobacillus moniliformis DSM 12112 YP 0033066221 269124045 Bacteria 100 Listeria innocua Clip11262 NP 4707891 16800521 Bacteria 96 100 Lactobacillus casei BL23 YP 0019879411 191638775 Bacteria Erysipelothrix rhusiopathiae str Fujisawa YP 0045617321 336066874 Bacteria 26695 NP 2074841 15645313 Bacteria 99 100 Rhizobium meliloti RHIME Acetyl CoA acetyltransferase P50174 UNIPROT Bacteria 58 Agrobacterium radiobacter K84 YP 0025462741 222087737 Bacteria 63 100 Rhodobacter capsulatus SB 1003 YP 0035793091 294678694 Bacteria Magnetospirillum magneticum AMB1 YP 4202051 83309941 Bacteria 100 100 Homo sapiens HUMAN Acetyl CoA acetyltransferase cytosolic Q9BWD1 UNIPROT Eukaryota Nematostella vectensis XP 0016303951 156376494 Eukaryota str Wilmington YP 0676611 51473904 Bacteria 100 100 Pseudomonas aeruginosa PSEAE Acetyl CoA acetyltransferase Q9I2A8 UNIPROT Bacteria 100 Azotobacter vinelandii DJ YP 0027995271 226944454 Bacteria 98 66 Escherichia coli ECOLI Probable acetyl CoA acetyltransferase Q46939 UNIPROT Bacteria 58 Azotobacter vinelandii DJ YP 0028022111 226947138 Bacteria 98 Allochromatium vinosum ALLVD Acetyl CoA acetyltransferase P45369 UNIPROT Bacteria YqeH 99 70 Cupriavidus necator CUPNH Acetyl CoA acetyltransferase P14611 UNIPROT Bacteria Chromobacterium violaceum CHRVO Acetyl CoA acetyltransferase Q9ZHI1 UNIPROT Bacteria 100 100 Clostridium acetobutylicum CLOAB Acetyl CoA acetyltransferase P45359 UNIPROT Bacteria 94 Clostridium acetobutylicum ATCC 824 NP 1492421 15004782 Bacteria Peptoclostridium difficile PEPD6 Acetyl CoA acetyltransferase Q18AR0 UNIPROT 99 65 Oceanobacillus iheyensis HTE831 NP 6935531 23100087 Bacteria Brevibacillus brevis NBRC 100599 YP 0027749391 226315043 Bacteria 82 Natranaerobius thermophilus JW NMWNLF YP 0019180671 188586522 Bacteria 100 69 Escherichia coli ECOLI Acetyl CoA acetyltransferase P76461 UNIPROT Bacteria Haemophilus influenzae HAEIN Acetyl CoA acetyltransferase P44873 UNIPROT Bacteria 100 100 Archaeoglobus fulgidus DSM 4304 NP 0700261 11498797 Archaea Ferroglobus placidus DSM 10642 YP 0034368331 288932773 Archaea Archaeoglobus veneficus SNP6 YP 0043421781 327401339 Archaea 100 99 Fusobacterium nucleatum subsp nucleatum ATCC 25586 NP 6033921 19703830 Bacteria 53 94 laidlawii PG8A YP 0016204971 162447365 Bacteria 100 cholerae O1 biovar El Tor str N16961 NP 2330781 15601447 Bacteria 82 Petrotoga mobilis SJ95 YP 0015675721 160901991 Bacteria Kosmotoga olearia TBF 1951 YP 0029403201 239616998 Bacteria Thermomicrobium roseum DSM 5159 YP 0025219451 221632724 Bacteria 88 56 Candidatus Lokiarchaeota archaeon CR 4 acetylCoA acetyltransferase Candidatus Lokiarchaeota archaeon CR 4 OLS15559 Brevibacillus brevis NBRC 100599 YP 0027716761 226311782 Bacteria 56 Candidatus Lokiarchaeota archaeon CR 4 acetylCoA acetyltransferase Candidatus Lokiarchaeota archaeon CR 4 OLS12682 99 58 Chloroflexus aurantiacus J10fl YP 0016350781 163847034 Bacteria 99 Azotobacter vinelandii DJ YP 0027988281 226943755 Bacteria 100 Cupriavidus necator CUPNH Beta ketothiolase BktB Q0KBP1 UNIPROT Bacteria 100 Bordetella avium 197N YP 7869881 187478964 Bacteria Nitrosomonas europaea ATCC 19718 NP 8422641 30250194 Bacteria 96 98 Rhodobacter capsulatus SB 1003 YP 0035795801 294678965 Bacteria 36 Collimonas fungivorans Ter331 YP 0047511491 340785684 Bacteria Rubrobacter xylanophilus DSM 9941 YP 6451291 108805192 Bacteria Deinococcus radiodurans R1 NP 2956831 15806958 Bacteria 100 99 Pseudoalteromonas atlantica PSEA6 3 ketoacyl CoA thiolase Q15VA3 UNIPROT Bacteria 100 Alteromonas mediterranea ALTMD 3 ketoacyl CoA thiolase B4RTU9 UNIPROT Bacteria 95 Idiomarina loihiensis IDILO 3 ketoacyl CoA thiolase Q5QXN5 UNIPROT Bacteria 100 VIBVU 3 ketoacyl CoA thiolase Q8DB48 UNIPROT Bacteria Haliangium ochraceum DSM 14365 YP 0032678501 262196641 Bacteria Nitrosococcus watsonii C113 YP 0037606201 300114045 Bacteria 100 97 99 Micrococcus luteus NCTC 2665 YP 0029572321 239917674 Bacteria 98 Corynebacterium variabile DSM 44702 YP 0047596631 340794200 Bacteria 99 Streptomyces coelicolor A32 NP 6286661 21222887 Bacteria 100 Flavobacterium johnsoniae UW101 YP 0011956561 146301065 Bacteria Collimonas fungivorans Ter331 YP 0047541071 340788642 Bacteria Bacteriovorax marinus SJ YP 0050352011 374288116 Bacteria 60 100 53 Gallus gallus XP 4200041 50745166 Eukaryota 60 Ixodes scapularis XP 0024360881 242002890 Eukaryota 99 Trichoplax adhaerens XP 0021106031 196001471 Eukaryota 99 Ciona intestinalis XP 0021198731 198427827 Eukaryota Nematostella vectensis XP 0016281901 156369857 Eukaryota 100 FadI 69 100 Phaeodactylum tricornutum CCAP 1055 1 XP 0021856191 219120775 Eukaryota 3-ketoacyl-CoA thiolase Thalassiosira pseudonana CCMP1335 XP 0022884231 223997500 Eukaryota 99 100 Rubrobacter xylanophilus DSM 9941 YP 6451931 108805256 Bacteria Truepera radiovictrix DSM 17093 YP 0037063851 297624951 Bacteria Gemmatimonas aurantiaca T27 YP 0027623691 226228263 Bacteria 88 98 100 Sulfolobus solfataricus P2 NP 3438511 15899246 Archaea 99 Metallosphaera sedula DSM 5348 YP 0011911001 146303784 Archaea 99 Sulfolobus solfataricus P2 NP 3444261 15899821 Archaea 87 Haloterrigena turkmenica DSM 5511 YP 0034038891 284165610 Archaea Natrialba magadii ATCC 43099 YP 0034820341 289583624 Archaea 76 100 Citromicrobium sp WPS32 WP 054527015 Bacteria 90 Sphingopyxis WP 039575429 Bacteria 100 96 Lokiarchaeum sp GC14 75 acetylCoA acetyltransferase Lokiarchaeum sp GC14 75 KKK41226 Lokiarchaeum sp GC14 75 acetylCoA acetyltransferase Lokiarchaeum sp GC14 75 KKK46214 99 90 100 Rubrobacter xylanophilus DSM 9941 YP 6445091 108804572 Bacteria 89 Rubrobacter xylanophilus DSM 9941 YP 6446101 108804673 Bacteria Magnetospirillum magneticum AMB1 YP 4223521 83312088 Bacteria Leptospira borgpetersenii serovar Hardjobovis str L550 YP 7988261 116329106 Bacteria 99 Nitrosococcus watsonii C113 YP 0037614321 300114857 Bacteria 20 Bacteriovorax marinus SJ YP 0050356091 374288524 Bacteria 92 Candidatus Nitrospira defluvii YP 0037987251 302038403 100 40 94 Candidatus Lokiarchaeota archaeon CR 4 protein AtoB1 Candidatus Lokiarchaeota archaeon CR 4 OLS15498 47 Candidatus Lokiarchaeota archaeon CR 4 acetylCoA acetyltransferase Candidatus Lokiarchaeota archaeon CR 4 OLS14603 Brevibacillus brevis NBRC 100599 YP 0027711461 226311252 Bacteria Corynebacterium variabile DSM 44702 YP 0047593181 340793855 Bacteria Oceanobacillus iheyensis HTE831 NP 6916101 23098144 Bacteria 100 99 Phaeodactylum tricornutum CCAP 1055 1 XP 0021803061 219119079 Eukaryota Thalassiosira pseudonana CCMP1335 XP 0022915571 224003771 Eukaryota 3-ketoacyl-CoA thiolase 40 82 Monosiga brevicollis MX1 XP 0017483101 167533261 Eukaryota (mitochondrial) 100 97 Homo sapiens HUMAN 3 ketoacyl CoA thiolase mitochondrial P42765 UNIPROT Eukaryota Bos taurus BOVIN 3 ketoacyl CoA thiolase mitochondrial Q3T0R7 UNIPROT Eukaryota 67 100 95 Ustilago maydis 521 XP 7594451 71018429 Eukaryota Puccinia graminis f sp tritici CRL 75367003 XP 0033076381 331212737 Eukaryota 93 95 88 Azotobacter vinelandii DJ YP 0027986171 226943544 Bacteria Haliangium ochraceum DSM 14365 YP 0032689871 262197778 Bacteria Bacteriovorax marinus SJ YP 0050341071 374287022 Bacteria Leptospira borgpetersenii serovar Hardjobovis str L550 YP 7967561 116327036 Bacteria 76 96 100 Staphylococcus haemolyticus STAHJ Putative acetyl CoA C acetyltransferase VraB Q4L3Q1 UNIPROT Bacteria 100 Staphylococcus epidermidis STAEQ Putative acetyl CoA C acetyltransferase VraB Q5HRH3 UNIPROT Bacteria 99 Staphylococcus aureus STAAC Putative acetyl CoA C acetyltransferase VraB Q5HIA0 UNIPROT Bacteria Bacillus subtilis BACSU Putative acetyl CoA C acetyltransferase YhfS O07618 UNIPROT Bacteria Oceanobacillus iheyensis HTE831 NP 6915971 23098131 Bacteria Streptomyces coelicolor A32 NP 6272991 21221520 Bacteria Encephalitozoon cuniculi ENCCU 3 ketoacyl CoA thiolase peroxisomal Q8SVA6 UNIPROT Eukaryota Opisthokonta Fungi Microsporidia Apansporoblastina Unikaryonidae 100 100 Mycobacterium tuberculosis MYCTU Putative acyltransferase Rv0859 O53871 UNIPROT Bacteria 98 Corynebacterium variabile DSM 44702 YP 0047587091 340793246 Bacteria Streptomyces coelicolor A32 NP 6308041 21225025 Bacteria 100 Collimonas fungivorans Ter331 YP 0047518671 340786402 Bacteria 96 75 96 Agrobacterium radiobacter K84 YP 0025433471 222084818 Bacteria 100 Haliangium ochraceum DSM 14365 YP 0032661001 262194891 Bacteria Rhodobacter capsulatus SB 1003 YP 0035766901 294676075 Bacteria 96 88 Gemmatimonas aurantiaca T27 YP 0027627561 226228650 Bacteria 98 Haliangium ochraceum DSM 14365 YP 0032690151 262197806 Bacteria Leptospira borgpetersenii serovar Hardjobovis str L550 YP 7988091 116329089 Bacteria 90 100 Haloterrigena turkmenica DSM 5511 YP 0034038971 284165618 Archaea 86 Natrialba magadii ATCC 43099 YP 0034815211 289583055 Archaea Natronomonas pharaonis DSM 2160 YP 3310181 76802923 Archaea 100 100 96 Streptomyces coelicolor A32 NP 6256091 21219830 Bacteria Micrococcus luteus NCTC 2665 YP 0029579891 239918431 Bacteria 95 Oceanobacillus iheyensis HTE831 NP 6935931 23100127 Bacteria 100 99 100 Thermus thermophilus HB8 YP 1439561 55980659 Bacteria Herpetosiphon aurantiacus DSM 785 YP 0015471481 159900901 Bacteria Rubrobacter xylanophilus DSM 9941 YP 6451001 108805163 Bacteria 100 97 Candidatus Lokiarchaeota archaeon CR 4 acetylCoA Cacyltransferase Candidatus Lokiarchaeota archaeon CR 4 OLS12832 Candidatus Lokiarchaeota archaeon CR 4 acetylCoA acetyltransferase Candidatus Lokiarchaeota archaeon CR 4 OLS14966 Candidatus Thorarchaeota archaeon SMTZ183 hypothetical protein AM324 07455 Candidatus Thorarchaeota archaeon SMTZ183 KXH72490 Streptomyces coelicolor A32 NP 6301381 21224359 Bacteria 100 72 Candidatus Heimdallarchaeota archaeon LC 3 3ketoacylCoA thiolase Candidatus Heimdallarchaeota archaeon LC 3 OLS26180 71 Candidatus Heimdallarchaeota archaeon LC 3 3ketoacylCoA thiolase Candidatus Heimdallarchaeota archaeon LC 3 OLS26181 71 archaeon Heimdall LC 2 3ketoacylCoA thiolase archaeon Heimdall LC 2 OLS23634 100 Candidatus Thorarchaeota archaeon SMTZ145 hypothetical protein AM325 12560 Candidatus Thorarchaeota archaeon SMTZ145 KXH74685 Lokiarchaeum sp GC14 75 3ketoacylCoA thiolase Lokiarchaeum sp GC14 75 KKK41709 97 100 Lokiarchaeum sp GC14 75 3ketoacylCoA thiolase Lokiarchaeum sp GC14 75 KKK45022 Lokiarchaeum sp GC14 75 3ketoacylCoA thiolase Lokiarchaeum sp GC14 75 KKK46066 99 Archaeoglobus fulgidus DSM 4304 NP 0698611 11498633 Archaea 100 98 Thermoplasma acidophilum DSM 1728 NP 3937761 16081433 Archaea Picrophilus torridus DSM 9790 YP 0238011 48478095 Archaea Vulcanisaeta moutnovskia 76828 YP 0042443591 325968167 Archaea 100 100 Sulfolobus solfataricus P2 NP 3438411 15899236 Archaea Pyrobaculum aerophilum str IM2 NP 5608931 18314226 Archaea Thermoproteus uzoniensis 76820 YP 0043374921 327310595 Archaea

0.7

43

Suppl. Figure 12: Phylogenetic analysis of clade 2 of B-ketothiolase/acetyl-CoA acetyltransferase (COG0183). Phylogenetic tree was estimated using IQ-tree LG+C20+F on an alignment of 169 taxa and 338 sites. Bacteria, Archaea, and Asgards are coloured in black, green, and purple respectively. Gene sequences from Asgard that are genetically linked to fatty acid metabolism yellow and summarized in Suppl. Figure 13. Raw data files are available via figshare (see Data availability for more details).

44

a b COG0183 genetic linkage

COG0318COG1960COG2368COG1024COG1250COG3425COG1042 KKK41862 Lokiarchaeum sp GC14 75 Clade 1 A 1 A KXH77044 Thorarchaeota archaeon SMTZ-45 Clade 1 B OLS17784 Odinarchaeota archaeon LCB 4 1 B

Clade 1 C

Clade 1 D OLS18997 Heimdallarchaeota archaeon LC 2 1 D

KXH71996 Thorarchaeota archaeon SMTZ1-83 Clade 1 E KXH74809 Thorarchaeota archaeon SMTZ1-45 KXH76295 Thorarchaeota archaeon SMTZ-45 OLS19065 Heimdallarchaeota archaeon LC 2 1 E Clade 1 F OLS19067 Heimdallarchaeota archaeon LC 2 OLS27524 Heimdallarchaeota archaeon LC 3 Clade 1 G OLS30690 Thorarchaeota archaeon AB 25

CladeAM 3 1 H KKK44269 Lokiarchaeum sp GC14 75 1 H

Clade 1 I KKK44142 Lokiarchaeum sp GC14 75 Clade 1 J 1 J KKK44143 Lokiarchaeum sp GC14 75 KKK44137 Lokiarchaeum sp GC14 75 Clade 1 K 1 K OLS14444 Lokiarchaeota archaeon CR 4 KXH74294 Thorarchaeota archaeon SMTZ-45 Clade 1 L OLS29112 Heimdallarchaeota archaeon LC 2 1 L OLS31252 Thorarchaeota archaeon AB 25 Archaea and Bacteria uncultured marine thaumarchaeote AIF12396 Desulfotomaculum gibsoniae WP 006523469 Syntrophus gentianae WP 093882824 Desulfotomaculum copahuensis WP 066666996 Desulfomonile tiedjei WP 014809337 Desulfotomaculum gibsoniae WP 006523688 Clade 1M Candidatus Rokubacteria bacterium OGL17874 Archaea and Bacteria Ferroglobus placidus WP 012966782 candidate divison MSBL1 archaeon KXA89305 Heimdallarchaeota archaeon LC 3 OLS23825 OLS23825 Heimdallarchaeota archaeon LC 3 1 M Heimdallarchaeota archaeon LC 2 OLS29085 0.4

KKK46066 Lokiarchaeum sp GC14 75 Clade 2 OLS15559 Lokiarchaeota archaeon CR 4 c Clade Summary

1A 1B 1D 1E 1F 1H 1I 1J 1K 1L 1M 2 Lokiarchaeum sp GC14 75 HMG-CoA synthase Lokiarchaeota archaeon CR 4 Enoyl-CoA hydrataseAcyl-CoA synthetase Thorarchaeota archaeon AB 25 Acetyl-CoAAcetyl-CoA dehydrogenase dehydrogenase Thorarchaeota archaeon SMTZ-45 Acetyl-CoA acetyltransferase Thorarchaeota archaeon SMTZ1-45 Thorarchaeota archaeon SMTZ1-83 3-hydroxyacyl-CoA dehydrogenase Heimdallarchaeota archaeon AB 125 Heimdallarchaeota archaeon LC 2 COG00183 linked to fatty acid metabolism genes Heimdallarchaeota archaeon LC 3 Odinarchaeote LCB 4 COG00183 linked to mevalonate metabolism genes Archaea COG00183 linked to fatty acid or mevalonate metabolism gene

Suppl. Figure 13: Phylogenetic and gene linkage analysis of B-ketothiolase/acetyl-CoA acetyltransferase (COG0183). a, The COG0183 protein family was divided into two clades based on previous reports by33 where clade 1 is composed of mostly archaeal sequences previously implicated in fatty acid and mevalonate metabolism. A schematic representation of clade 1 based on a tree constructed with IQ-tree LG+C20+F on an alignment of 466 taxa and 279 sites and unrooted. Bipartition support values are depicted as open or closed circles for ultra-fast bootstrap values greater than 70 and 90 respectively. Full phylogeny can be found in Suppl. Figs 10, 11. b, Gene sequences from Asgard archaea that are genetically linked (i.e., within 15 genes of the B-ketothiolase/acetyl-CoA acetyltransferase gene) to the indicated fatty acid metabolism or mevalonate metabolism or both fatty acid and mevalonate metabolism are shown in yellow, green and grey respectively. c, Distribution of Asgard archaeal sequences in each subclade. Raw data files are available via figshare (see Data availability for more details).

45

99.9/100 A0A0Q1F7T8 Methanolinea sp. SDB A0A099TEH5 Smithella sp SCADC 76.9/98 A0A094K1W7 Smithella sp D17 93.5/99 E0WEX9 Aromatoleum sp OcN1 86.7/100 B8FEM4 Desulfatibacillum alkenivorans AK 01 (Alkylsuccinate synthase 1, AssA1) 98.5/100 B8FF75 Desulfatibacillum alkenivorans AK 01 (Alkylsuccinate synthase 2, AssA2) D8WZ05 Desulfoglaeba alkanexedens 98.5/100 A0A1J5HSQ1 Syntrophaceae bacterium CG2_30_49_12 100/100 A0A0F2NHU6 Peptococcaceae bacterium BRH_c4a 64.2/93 77.9/97 Firmicutes, Epsilonbacteraeota 99.4/100 Firmicutes, 97.7/100 O87943 Thauera aromatica (Benzylsuccinate synthase) 97.3/100 O68395 Thauera aromatica 86/99 B9M021 Geobacter daltonii strain DSM 22248 99.8/100 99/99 A0A101GT22 Desulfotomaculum sp 46_80 A0A0F2SCX8 Peptococcaceae bacterium BRH c23 95.7/100 D8WWC2 Clostridia bacterium enrichment culture clone BF 96.7/100 R4KI57 Desulfotomaculum gibsoniae DSM 7213 96.6/100 K0ND30 Desulfobacula toluolica strain DSM 7467 98.1/100 99.7/100 Epsilonbacteraeota, Firmicutes 93.3/100 A0A124G424 Atribacteria bacterium 34 868 CPR 99.5/100 A0A1J4SET6 Deltaproteobacteria bacterium CG1_02_45_11 95.2/100 A0A1Q9MXX8 Ca. Lokiarchaeota archaeon CR 4 99.4/100 A0A1F9A0C8 Deltaproteobacteria bacterium RBG_13_43_22 99.6/100 A0A0M2U0V4 Clostridiales bacterium PH28 bin88 88.8/91 95.9/100 Epsilonbacteraeota A0A0Q1A2Y0 Smithella sp SDB 83.5/97 98.1/100 Epsilonbacteraeota, Nitrospinae 71.2/99 A0A0Q0VR02 Smithella sp SDB 100/100 Chloroflexi, Nitrospinae 99.1/98 Actinobacteria, Chloroflexi, Firmicutes Q38HX4 Clostridium scatologenes (4-hydroxyphenylacetate decarboxylase) 100/100 A0A1Y3SCI7 Flavonifractor sp An82 86.7/98 A0A1K1XCP8 Olsenella sp kh2p3 93.9/98 Q84F16 Clostridioides difficile (4-hydroxyphenylacetate decarboxylase) A0A075KHH8 Pelosinus sp UFO1 87.6/96 97.6/100 A0A1M5QTX0 Tepidibacter thalassicus DSM 15285 86.3/83 F0SW35 Syntrophobotulus glycolicus strain DSM 8271 FlGlyR Firmicutes 71.5/96 A0A0P6YDT3 Thermanaerothrix daxensis 99.1/100 57.8/84 Actinobacteria, Firmicutes 97.9/100 G8UI84 strain ATCC 43037 100/100 99.6/100 FCB, PVC 98.6/100 97.8/100 Firmicutes, Gammaproteobacteria 87.9/65 98.4/100 Actinobacteria, Firmicutes 80.8/93 A0A1F3MUG0 Bacteroidetes bacterium GWF2_49_14 92.7/100 97.8/100 Epsilonbacteraeota 82.1/91 A0A1V5HA01 Deltaproteobacteria bacterium ADurbBinA179 89.4/9199.9/100 Firmicutes, Alpha-, Gammaproteobacteria, Epsilonbacteraeota 99.9/100 52.1/83 Alphaproteobacteria, Firmicutes 99.8/100 Actinobacteria, Chloroflexi, Epsilonbacteraeota, Firmicutes Actinobacteria, Epsilonbacteraeota, Firmicutes, PVC, 75.7/98 99.6/100 A0A135VMY2 Ca. Thorarchaeota archaeon SMTZ1-45 100/100 A0A1Q9PBI2 Ca. Thorarchaeota archaeon AB_25 93.9/74 A0A135VKS2 Ca. Thorarchaeota archaeon SMTZ-45 99.6/100 90.7/92 Firmicutes, Alpha-, Beta-, Gammaproteobacteria, Epsilonbacteraeota 63/72 91/93 Firmicutes, Actinobacteria, , Epsilonbacteraeota 99.6/100 Spirochaetes, Firmicutes 89.3/97 Actinobacteria, Chloroflexi, Firmicutes, Fusobacteria, Gammaproteobacteriaa, PVC, Spirochaetes 93/82 Firmicutes, Synergistetes 90.1/92 93/99 88.3/81 FCB 78.6/86 A0A1Q9PD56 Ca. Heimdallarchaeota archaeon AB_125 74.7/99 84.2/98 A0A0S8K0S6 candidate division WOR 3 bacterium SM1_77 91/92 Aminicenantes A0A151EJY1 Euryarchaeota archaeon SG8_5 98/100 A0A0S8JE93 candidate division BRC1 bacterium SM23_51 67.6/91 100/100 A0A135VPV8 Ca. Thorarchaeota archaeon SMTZ1-45 A0A1Q9NGQ1 Ca. Thorarchaeota archaeon AB-25 A0A151F9S9 Theion archaeon DG70_1 95.9/98 A0A1W9RK08 Ca. Latescibacteria bacterium 4484_7 89.2/98 A0A151F8P6 Theion archaeon DG70 96.8/96 Armatimonadetes, PVC 77.9/91 91.6/100 A0A151F9E2 Theion archaeon DG70 92/100 Firmicutes, Actinobacteria, Epsilonbacteraeota, FCB, PVC, Spirochaetes, Synergistetes, Chloroflexi 84.2/96 93.2/79 Epsilonbacteraeota, FCB 87.8/92 H3ZMH7 Thermococcus litoralis strain ATCC 51850 DSM 5473 JCM 8560 NS C 99.9/100 88.3/92 Q5JFY6 Thermococcus kodakarensis strain ATCC BAA 918 JCM 12380 KOD1 97.8/100 A0A101DMH6 Thermococcales archaeon 44 46 65/97 C6A251 Thermococcus sibiricus strain MM 739 DSM 12597 99.3/100 91.1/99 B5IEV9 Aciduliprofundum boonei strain DSM 19572 T469 99.6/100 L0HNW0 Aciduliprofundum sp strain MAR08 339 100/100 B1L491 Korarchaeum cryptofilum strain OPF8 Firmicutes, Synergistetes 92.9/100 A0A1F9P3V2 Desulfobacterales bacterium RIFOXYA12 FULL 46_15 100/100 A0A133UUQ0 candidate divison MSBL1 archaeon SCGC AAA259I09 80.7/97 A0A133U8F5 candidate divison MSBL1 archaeon SCGC AAA259B11 93.4/100 80/98 Actinobacteria, Epsilonbacteraeota, Firmicutes, Synergistetes 84.3/87 80.4/99 Firmicutes, Epsilonbacteraeota 95.2/100 Q30W70 Desulfovibrio alaskensis (Choline trimethylamine lyase) 86.1/100 D2KVE9 Streptococcus dysgalactiae subsp equisimilis 99.7/100 A0A1C5SQE6 uncultured Eubacterium 99.4/100 A0A0F8UQ89 Methanosarcina sp 1HT1A1 87.1/82 A0A242HN70 Enterococcus sp 12E11 DIV0728 99.6/100 A0A242AW64 Enterococcus sp 7F3 DIV0205 95.4/100 95.4/96 100/100 Actinobacteria, Epsilonbacteraeota, Firmicutes, Gammaproteobacteria, 87.5/82 Firmicutes, Gammaproteobacteria 91.7/97 92.5/100 Gammaproteobacteria, Epsilonbacteraeota 100/100 Firmicutes 100/100 Firmicutes, Actinobacteria, Epsilonbacteraeota, Beta-, Gammaproteobacteria, FCB 97.6/100 Firmicutes, Actinobacteria, Fusobacteria, Gammaproteobacteria, Spirochaetes, FCB 97/97 Firmicutes, Epsilonbacteraeota, Chloroflexi, Spirochaetes 80.8/51 77.5/100 A0A0F8XJF2 Ca. Lokiarchaeum sp GC14 75 61.6/82 99.9/100 I4C5J8 Desulfomonile tiedjei strain ATCC 49306 87.3/66 99.8/100 A0A0F8W433 Ca. Lokiarchaeum sp GC14 75 A0A0F8VKZ9 Ca. Lokiarchaeum sp GC14 75 86.3/86 A0A0M2U4U2 Clostridiales bacterium PH28 bin88 100/100 90.1/63 Firmicutes, Spirochaetes Firmicutes, Actinobacteria 94.4/98 Firmicutes, Epsilonbacteraeota 84.2/94 Firmicutes 96.7/78 Fusobacteria, Firmicutes, Gammaproteobacteria, PVC, FCB, Spirochaetes, Armatimonadales 100/100 Firmicutes, PVC 98.2/100 A0A1G2Y408 Planctomycetes bacterium GWF2_41_51 85/99 A0A1I4KHE4 Methanobrevibacter olleyae A0A0M0BEB4 miscellaneous Crenarchaeota group archaeon SMTZ1 55 90.5/97 Firmicutes, PVC, Epsilonbacteraeota 99.3/99 94/100 Firmicutes, PVC 19.5/38 100/100 Firmicutes, FCB, PVC, Spirochaetes Firmicutes 100/100 88.3/100 A0A1V5RRX6 Chloroflexi bacterium ADurbBin325 92.9/98 A0A1G1A697 Lentisphaerae bacterium RIFOXYA12 FULL_48_11 75/95 88.6/98 F0YTG2 Clostridium sp D5 96.2/73 A0A151BL95 Ca. Bathyarchaeota archaeon B63 84.8/98 95.3/98 Firmicutes 100/100 Chloroflexi, Firmicutes, PVC 97/92 Armatimonadetes, Firmicutes, FCB, PVC 66.4/94 A0A0F8XYB2 Ca. Lokiarchaeum sp GC14 75 A0A0M2U450 Clostridiales bacterium PH28 bin88 93.6/93 A0A1V5HZJ5 Spirochaetes bacterium ADurbBinA120 81.1/94 Actinobacteria, Chloroflexi, Epsilonbacteraeota, Gammaproteobacteria, Nitrospinae, Spirochaetes 99.8/100 A0A0A7GEG9 Geoglobus acetivorans 54.8/85 98.8/100 A0A0F7IF56 Geoglobus ahangari 99.9/100 N0BKU3 Archaeoglobus sulfaticallidus PM70 1 98.6/100 100/100 98.8/100 A0A101DF56 Archaeoglobus fulgidus A0A135VWA9 Ca. Thorarchaeota archaeon SMTZ1 83 A0A0S7WEY6 Dehalococcoidia bacterium DG 22 79/99 100/100 Actinobacteria, Chloroflexi, Epsilonbacteraeota, Firmicutes, FCB, Spirochaetes, A0A135VQY2 Ca. Thorarchaeota archaeon SMTZ1-45 89.2/100 A0A1Q9MVR6 Ca. Lokiarchaeota archaeon CR_4 89/91 76.7/100 A0A1Q9N0T7 Ca. Lokiarchaeota archaeon CR_4 95.3/100 A0A1Q9MTL8 Ca. Lokiarchaeota archaeon CR_4 97.5/100 A0A1Q9MTL9 Ca. Lokiarchaeota archaeon CR_4 Epsilonbacteraeota 100/100 Firmicutes, Actinobacteria, PVC

0.4

46

Suppl. Figure 14: Maximum likelihood phylogenetic analysis of pyruvate-formate lyase superfamily proteins. Phylogenetic tree was estimated using IQ-tree LG+C20+R+F and is based on a protein alignment of 336 aligned sites and includes 1012 sequences. Support values (SH-like approximate likelihood ratio test and ultrafast bootstraps) are not shown whenever one of the values was below 50. Scale bar indicates number of substitutions per site. Arrows point to proteins of organisms that are known to utilize hydrocarbons. Raw data files are available via figshare (see Data availability for more details).

47

Organohalides: TCB, trichlorobenzene; TeCB, tetrachlorobenzene; PCE, perchloroethylene; TCE, trichloroethylene; DCE, dichloroethene; 100/ 100 8658187VS_Dehalococcoides_mccartyi_VS DCP, dichlorophenol; C3RWJ4_MB_mbrA_Dehalococcoides_mccartyi_MB MbrA (PCE, TCE) cbdbA1453_Dehalococcoides_mccartyi_CBDB1 VC, vinyl chloride; 90.0-99.9/ 100 or 100/ 90-99 KB1_8_KB_1_consortium cbdbA1624_Dehalococcoides_mccartyi_CBDB1 DCA, dichloroethane; GT_1344_Dehalococcoides_mccartyi_GT 90.0-99.9/ 90-99 8658303VS_Dehalococcoides_mccartyi_VS 3Cl-4-HBA, 3-chloro-4-hydroxybenzoate. cbdbA1550_Dehalococcoides_mccartyi_CBDB1 KB13241_6_KB_1_consortium 80.0-89.9/ 80-100 & 80.0-100/ 80-89 cbdbA1570_Dehalococcoides_mccartyi_CBDB1 DET1522_Dehalococcoides_mccartyi_195 D337_WL_Dehalococcoides_mccartyi_WL DET1535_Dehalococcoides_mccartyi_195 KB13109_9_KB_1_consortium 8658324VS_Dehalococcoides_mccartyi_VS 8658241VS_Dehalococcoides_mccartyi_VS 8658249VS_Dehalococcoides_mccartyi_VS KB13109_7_KB_1_consortium KB1_22_KB_1_consortium cbdbA1582_Dehalococcoides_mccartyi_CBDB1 8658312VS_Dehalococcoides_mccartyi_VS KB1_12_KB_1_consortium cbdbA1535_Dehalococcoides_mccartyi_CBDB1 BAV1_0112_Dehalococcoides_mccartyi_BAV1 Q3ZAB8_(AY165309)_Dehalococcoides_mccartyi_FL2 TceA (PCE, 2,3-DCP, TCE, All DCE isomers, VC) KB1338_1_KB_1_consortium D339_WL_Dehalococcoides_mccartyi_WL Q5YD55_BAV1_0847_Dehalococcoides_mccartyi_BAV1 BvcA (VC, TCE, All DCE isomers, 1,2-DCA) KB1_6_KB_1_consortium D2BJ91_Dehalococcoides_mccartyi_VS VcrA (VC, All DCE isomers) GT_1237_Dehalococcoides_mccartyi_GT D338_WL_Dehalococcoides_mccartyi_WL 8657050VS_Dehalococcoides_mccartyi_VS BAV1_0284_Dehalococcoides_mccartyi_BAV1 Dehly0275_Dehalogenimonas_lykanthroporepellens_BL_DC_9 Dehly1152_Dehalogenimonas_lykanthroporepellens_BL_DC_9 8658318VS_Dehalococcoides_mccartyi_VS cbdbA1588_Dehalococcoides_mccartyi_CBDB1 Q3Z9N3_(DET0318)_Dehalococcoides_mccartyi_195_pceA PceA (PCE, 2,3-DCP, TCE, All DCE isomers, VC) cbdbA1495_Dehalococcoides_mccartyi_CBDB1 DET0306_Dehalococcoides_mccartyi_195 KB13241_3_KB_1_consortium 8658300VS_Dehalococcoides_mccartyi_VS DET1519_Dehalococcoides_mccartyi_195 cbdbA1575_Dehalococcoides_mccartyi_CBDB1 8658296VS_Dehalococcoides_mccartyi_VS cbdbA1546_Dehalococcoides_mccartyi_CBDB1 8658289VS_Dehalococcoides_mccartyi_VS 8658308VS_Dehalococcoides_mccartyi_VS cbdbA1578_Dehalococcoides_mccartyi_CBDB1 KB11024_1_KB_1_consortium 8658245VS_Dehalococcoides_mccartyi_VS cbdbA1508_Dehalococcoides_mccartyi_CBDB1 8658254VS_Dehalococcoides_mccartyi_VS DET1528_Dehalococcoides_mccartyi_195 Dehly0156_Dehalogenimonas_lykanthroporepellens_BL_DC_9 Dehly1054_Dehalogenimonas_lykanthroporepellens_BL_DC_9 AY374251_Dehalococcoides_mccartyi_FL2 8658265VS_Dehalococcoides_mccartyi_VS 8657053VS_Dehalococcoides_mccartyi_VS BAV1_0281_Dehalococcoides_mccartyi_BAV1 8658267VS_Dehalococcoides_mccartyi_VS 8657042VS_Dehalococcoides_mccartyi_VS Dehly0910_Dehalogenimonas_lykanthroporepellens_BL_DC_9 8658269VS_Dehalococcoides_mccartyi_VS Dehly0121_Dehalogenimonas_lykanthroporepellens_BL_DC_9 Dehly0274_Dehalogenimonas_lykanthroporepellens_BL_DC_9 Dehly1582_Dehalogenimonas_lykanthroporepellens_st__BL_DC_9 Dehly1540_Dehalogenimonas_lykanthroporepellens_BL_DC_9 Dehly0283_Dehalogenimonas_lykanthroporepellens_BL_DC_9 DET0235_Dehalococcoides_mccartyi_195 KB13107_1_KB_1_consortium cbdbA243_Dehalococcoides_mccartyi_CBDB1 8658272VS_Dehalococcoides_mccartyi_VS BAV1_0119_Dehalococcoides_mccartyi_BAV1 cbdbA80_Dehalococcoides_mccartyi_CBDB1 DET1559_Dehalococcoides_mccartyi_195 Dehly1148_Dehalogenimonas_lykanthroporepellens_BL_DC_9 Dehly1355_Dehalogenimonas_lykanthroporepellens_BL_DC_9 Q3ZWF5_cbdbA84_Dehalococcoides_mccartyi_CBDB1_cbrA CbrA (1,2,3,4-TeCB, 1,2,3-TCB) BAV1_0104_Dehalococcoides_mccartyi_BAV1 DET0311_Dehalococcoides_mccartyi_195 Dehly1524_Dehalogenimonas_lykanthroporepellens_BL_DC_9 AY374245_Dehalococcoides_mccartyi_FL2 cbdbA1618_Dehalococcoides_mccartyi_CBDB1 8658346VS_Dehalococcoides_mccartyi_VS 8658327VS_Dehalococcoides_mccartyi_VS cbdbA1598_Dehalococcoides_mccartyi_CBDB1 MB_rdhA5_Dehalococcoides_mccartyi_MB cbdbA1455_Dehalococcoides_mccartyi_CBDB1 8658189VS_Dehalococcoides_mccartyi_VS 8658239VS_Dehalococcoides_mccartyi_VS 8657058VS_Dehalococcoides_mccartyi_VS BAV1_0276_Dehalococcoides_mccartyi_BAV1__bvcA_ 8658252VS_Dehalococcoides_mccartyi_VS 8658278VS_Dehalococcoides_mccartyi_VS DET0302_Dehalococcoides_mccartyi_195 BAV1_0121_Dehalococcoides_mccartyi_BAV1 DET0876_Dehalococcoides_mccartyi_195 DET0173_Dehalococcoides_mccartyi_195 8658355VS_Dehalococcoides_mccartyi_VS cbdbA1627_Dehalococcoides_mccartyi_CBDB1 DET1538_Dehalococcoides_mccartyi_195 cbdbA1491_Dehalococcoides_mccartyi_CBDB1 cbdbA1542_Dehalococcoides_mccartyi_CBDB1 8658361VS_Dehalococcoides_mccartyi_VS DET1545_Dehalococcoides_mccartyi_195 cbdbA1638_Dehalococcoides_mccartyi_CBDB1 cbdbA1560_Dehalococcoides_mccartyi_CBDB1 D336_WL_Dehalococcoides_mccartyi_WL KB13241_1_KB_1_consortium 8657036VS_Dehalococcoides_mccartyi_VS X1S4K5_marine_sediment_metagenome KB13241_2_KB_1_consortium Dehly1328_Dehalogenimonas_lykanthroporepellens_BL_DC_9 Dehly1530_Dehalogenimonas_lykanthroporepellens_BL_DC_9 Dehly0068_Dehalogenimonas_lykanthroporepellens_BL_DC_9 Dehly0849_Dehalogenimonas_lykanthroporepellens_BL_DC_9 Dehly1514_Dehalogenimonas_lykanthroporepellens_BL_DC_9 A0A0F9RET7_marine_sediment_metagenome Ferp_2321_Ferroglobus_placidus_DSM_10642 Ferroglobus ACT3_rdh03_Dehalobacter_CF Dhaf_0711_Desulfitobacterium_hafniense_DCB_2 Dhaf_0689_Desulfitobacterium_hafniense_DCB_2 ACT3_rdh06_Dehalobacter_CF ACT3_rdh14_Dehalobacter_CF AB194705_Desulfitobacterium_sp__KBC1 AF259791_Desulfitobacterium_sp__Viet_1 Dhaf_0737_Desulfitobacterium_hafniense_DCB_2 ACT3_rdh05_Dehalobacter_CF ACT3_rdh04_Dehalobacter_CF ACT3_rdh15_Dehalobacter_CF Q8RQC9_(AF204275)_Desulfitobacterium_chlororespirans CprA (3Cl-4-HBA, several ortho-chlorophenols) ACT3_rdh16_Dehalobacter_CF Dhaf_0713_Desulfitobacterium_hafniense_DCB_2 ACT3_rdh10_Dehalobacter_CF ACT3_rdh11_Dehalobacter_CF X1JNB6_marine_sediment_metagenome DET0180_Dehalococcoides_mccartyi_195 BAV1_0173_Dehalococcoides_mccartyi_BAV1 8657123VS_Dehalococcoides_mccartyi_VS cbdbA1539_Dehalococcoides_mccartyi_CBDB1 8658274VS_Dehalococcoides_mccartyi_VS A2cp1_0353_Anaeromyxobacter_dehalogenans_2CP_1 Adeh_0329_Anaeromyxobacter_dehalogenans_2CP_C 8658285VS_Dehalococcoides_mccartyi_VS DET1171_Dehalococcoides_mccartyi_195 cbdbA1092_Dehalococcoides_mccartyi_CBDB1 cbdbA1503_Dehalococcoides_mccartyi_CBDB1 P3TCK_04761_Photobacterium_profundum_3TCK ACT3_rdh08_Dehalobacter_CF AJ539533_Dehalobacter_restrictus_DSM_9455 CD1958_Clostridium_difficile_630 ThebaDRAFT_2271_Thermotogales_bacterium_mesG1_Ag_4_2 A0A0F9QYL7_marine_sediment_metagenome X1PZD0_marine_sediment_metagenome AnaeK_0343_Anaeromyxobacter_dehalogenans_K Adeh_0331_Anaeromyxobacter_dehalogenans_2CP_C A2cp1_0355_Anaeromyxobacter_dehalogenans_2CP_1 NPH_5634_delta_proteobacterium_NaphS2 X1FW99_marine_sediment_metagenome ACT3_rdh07_Dehalobacter_CF KKK43376_Lokiarchaeum_sp_GC14_75_Lokiarch_27520_IPR028894 A0A0F9GSC3_marine_sediment_metagenome Lokiarchaeota KKK40854_Lokiarchaeum_sp_GC14_75_Lokiarch_47950_IPR028894 A0A0F9EBH6_marine_sediment_metagenome X1F939_marine_sediment_metagenome A0A0F9U3K6_marine_sediment_metagenome X1EEX8_marine_sediment_metagenome A0A151EJY7_Theionarchaea_archaeon_DG_70_1 Theionarchaea A0A151ET48_Theionarchaea_archaeon_DG_70 DealDRAFT_0257_Dethiobacter_alkaliphilus_AHT_1 HeliMode_pceA_Heliobacterium_modesticaldum_Ice1 Q3LHG0_(AB194706)_Desulfitobacterium_sp__KBC1 PrdA (PCE) AF022812_Sulfurospirillum_multivorans* PceA (TCE) KXH73683_Candidatus_Thorarchaeota_archaeon_SMTZ1_45_AM325_13575_IPR028894 OLS30642_Candidatus_Thorarchaeota_archaeon_AB_25_ThorAB25_08700_IPR028894 Thorarchaeaota KXH70126_Candidatus_Thorarchaeota_archaeon_SMTZ1_83_AM324_01615_IPR028894 OLS29023_Candidatus_Heimdallarchaeota_archaeon_LC_2_HeimC2_02180_IPR028894 OLS16461_Candidatus_Heimdallarchaeota_archaeon_LC_3_HeimC3_54040_IPR028894 Heimdallarchaeaota A0A0F9QPY0_marine_sediment_metagenome ACT3_rdh13_Dehalobacter_CF DcaA (1,2-DCA) Q08GR2_(AM183918)_Desulfitobacterium_dichloroeliminans_LMG_P_21439 B5TS67_AGWLrdhA1_Dehalobacter_WL DcaA (1,2-DCA) Q8GJ31_(AJ439608)_Desulfitobacterium_hafniense AY216592_Desulfitobacterium_sp_PCE_S PceA (PCE, TCE) ACT3_rdh12_Dehalobacter_CF Dhaf_0696_Desulfitobacterium_hafniense_DCB_2 Glov_2870_Geobacter_lovleyi_SZ Ssed1729_Shewanella_sediminis_HAW_EB3 Ssed2100_Shewanella_sediminis_HAW_EB3 VOA_003314_Vibrio_sp__RC586 ACT3_rdh18_Dehalobacter_CF ACT3_rdh19_Dehalobacter_CF ACT3_rdh17_Dehalobacter_CF ACT3_rdh01_1 ACT3_rdh02_1 Jann_1968_Jannaschia_sp__CCS1 SPO1738_Ruegeria_pomeroyi_DSS_3 A0A256YZQ5_Thermoplasmatales_archaeon_ex4484_6 OLS26278_Candidatus_Heimdallarchaeota_archaeon_LC_2_HeimC2_15220_IPR028894 A0A0P9BI83_Thermococcus_sp_EP1 A0A218NZS2_Thermococcus_celer Euryarchaeaota H3ZQM4_Thermococcus_litoralis_strain_ATCC_51850_DSM_5473_JCM_8560_NS_C KKK46025_Lokiarchaeum_sp_GC14_75_Lokiarch_04730 & Asgard A0A2A2HAL0_Methanobacterium_bryantii A0A1D3L1L0_Methanobacterium_congolense K6T0U2_Methanobacterium_sp_Maddingley_MBC34 KXH72092_Candidatus_Thorarchaeota_archaeon_SMTZ1_45_AM325_14585_IPR028894 OLS26771_Candidatus_Thorarchaeota_archaeon_AB_25_ThorAB25_20970_IPR028894 KXH70142_Candidatus_Thorarchaeota_archaeon_SMTZ_45_AM326_00950_IPR028894 KXH73195_Candidatus_Thorarchaeota_archaeon_SMTZ1_83_AM324_16300_IPR028894 OLS28791_Candidatus_Heimdallarchaeota_archaeon_AB_125_HeimAB125_23330_IPR028894 OLS28792_Candidatus_Heimdallarchaeota_archaeon_AB_125_HeimAB125_23340 KKK44698_Lokiarchaeum_sp_GC14_75_Lokiarch_15880 KKK45779_Lokiarchaeum_sp_GC14_75_Lokiarch_06930 KKK43884_Lokiarchaeum_sp_GC14_75_Lokiarch_23150 Euryarchaeaota KKK45567_Lokiarchaeum_sp_GC14_75_Lokiarch_08830 KKK44372_Lokiarchaeum_sp_GC14_75_Lokiarch_18660 & Asgard KKK43393_Lokiarchaeum_sp_GC14_75_Lokiarch_27300 KKK42911_Lokiarchaeum_sp_GC14_75_Lokiarch_31330 KKK45702_Lokiarchaeum_sp_GC14_75_Lokiarch_07790 KKK41514_Lokiarchaeum_sp_GC14_75_Lokiarch_42380 KKK41619_Lokiarchaeum_sp_GC14_75_Lokiarch_41590 KKK45737_Lokiarchaeum_sp_GC14_75_Lokiarch_07190 KKK43374_Lokiarchaeum_sp_GC14_75_Lokiarch_27500 KKK40852_Lokiarchaeum_sp_GC14_75_Lokiarch_47930 OLS12611_Candidatus_Lokiarchaeota_archaeon_CR_4_RBG13Loki_3783 KXH70060_Candidatus_Thorarchaeota_archaeon_SMTZ1_83_AM324_10225 KKK41078_Lokiarchaeum_sp_GC14_75_Lokiarch_46030 COG01600 (epoxyqueosine reductase) of asgard archaea KKK43730_Lokiarchaeum_sp_GC14_75_Lokiarch_24480 KKK40604_Lokiarchaeum_sp_GC14_75_Lokiarch_50040 lacking IPR028894 were used to root phylogeny OLS15865_Candidatus_Lokiarchaeota_archaeon_CR_4_RBG13Loki_0505 KKK43963_Lokiarchaeum_sp_GC14_75_Lokiarch_22430 OLS13975_Candidatus_Lokiarchaeota_archaeon_CR_4_RBG13Loki_2371 OLS14450_Candidatus_Lokiarchaeota_archaeon_CR_4_RBG13Loki_1935 KXH73170_Candidatus_Thorarchaeota_archaeon_SMTZ1_45_AM325_08180 OLS29679_Candidatus_Thorarchaeota_archaeon_AB_25_ThorAB25_15040 KXH70876_Candidatus_Thorarchaeota_archaeon_SMTZ1_83_AM324_09565 OLS15887_Candidatus_Lokiarchaeota_archaeon_CR_4_RBG13Loki_0478 A0A125RBK9_Methanobrevibacter_sp_YE315 A0A256ZK79_Candidatus_Bathyarchaeota_archaeon_ex4484_135 H1Z3M0_Methanoplanus_limicola_DSM_2279

0.6

48

Suppl. Figure 15: Maximum likelihood phylogenetic analysis of reductive dehalogenases. Phylogenetic tree was estimated using IQ-tree LG+C40+R+F and is based on a protein alignment of 160 aligned sites and includes 243 sequences. All support values (SH-like approximate likelihood ratio test and ultrafast bootstraps) below 80 as well as support values of closely related sequences were removed to increase readability and are represented by filled circles according to the color code in the figure legend. Tree was rooted using midpoint rooting. Scale bar indicates the number of substitutions per site. Archaeal clades are shaded in green, bacteria are shaded in violet. Dark green shading refers to archaeal sequences that might represent bona fide reductive dehalogenases while the function of sequences highlighted in light green is less clear. The current outgroup represents homologous sequences of Asgard archaea assigned to COG1600 but lacking significant hit to reductive dehalogenase domains (IPR028894). However, notice that some archaeal sequences which encode IPR028894 cluster with these divergent Asgard sequences. The substrates of some know reductive dehalogenases are indicated based on57,68 and the backbone dataset is based on sequences kindly provided by Laura Hug and described in59. Abr.: Organohalides: TCB, trichlorobenzene; TeCB, tetrachlorobenzene; PCE, perchloroethylene; TCE, trichloroethylene; DCE, dichloroethene; DCP, dichlorophenol; VC, vinyl chloride; DCA, dichloroethane; 3Cl-4-HBA, 3-chloro-4-hydroxybenzoate. *This sequence is 99% similar to W6EQP0, for which a X-ray crystal structure has been obtained56. Raw data files are available via figshare (see Data availability for more details).

49

a 100 Bacteria 95-99 Bacteria 90-94 85-89 Thalassoarchaea Archaea Eukaryotes Alphaproteobacteria Bacteria Bacteria A-type Crenarchaeota OLS27277 Ca. Heimdallarchaeota archaeon LC_3 (HeimC3_04480) OLS29256 Ca. Heimdallarchaeota archaeon LC_2 (HeimC2_01030) E6N8L3 Candidatus Caldiarchaeum subterraneum “Novel euryarchaeota”1 Thaumarchaeota Thermoplasmatales Bacteria, Haloarchaea Bacteria Crenarchaeota Bacteria2 Bacteria, Haloarchaea B-type OLS21932 Ca. Heimdallarchaeota archaeon LC_2 (HeimC2_31680) Bacteria Crenarchaeota Crenarchaeota

Bacteria C-type Bacteria, Archaea NOR

0.9

b A3X059 Nitrobacter sp. Nb_311A A4BST3 Nitrococcus mobilis Nb_231 D5MLH7 Ca. Methylomirabilis_oxyfera I6WH63 Nitrolancea hollandica Lb

100/100 A0A0P7ZK23 Ca. Methanoperedens sp BLZ1 95.0-99.9/100 or 100/95-99 90.0-94.9/100 or 100/90-94 A0A1Q9NM41 Ca. Heimdallarchaeota archaeon LC_2 (HeimC2_27310) 85.0-99.9/85-99 A0A1Q9NYE9 Ca. Heimdallarchaeota archaeon LC_3 (HeimC3_05850) Beta-, Gamma-, Deltaproteobacteria (9), Chlorobi (5), Chloroflexi (2), Omnitrophica (1), CPR (1), Melainabacteria (1) Chloroflexi (2), Thermaceae (2) Nitrospirae (1), Deltaproteobacteria (1) D3S2G1 Ferroglobus placidus strain DSM_10642 E8WP19 Geobacter sp strain M18 (2) Alpha-, Gammaproteobacteria (2) Q8ZSS1 Pyrobaculum aerophilum strain ATCC_51768 U3T9S4 Aeropyrum camini SY1 C3MWH5 Sulfolobus islandicus strain M1425 E1QSW0 Vulcanisaeta distributa strain DSM_14429 Actinobacteria (77) Firmicutes (39) Firmicutes (7) T0M4I4_Thermoplasmatales_archaeon_A_plasma_IPR006468_archaea Deltaproteobacteria (4)

Alpha-, Beta-, Gamma-Proteobacteria (58), Chlorobi (2), Actinobacteria (1), Acidithiobacillus (1), Firmicutes (1)

0.2

Suppl. Figure 16: Maximum likelihood phylogenetic analysis of Heme-copper-type cytochrome c oxidase, subunit 1 and nitrate reductase, alpha subunit. a, Phylogenetic tree of Heme-copper-type cytochrome c oxidase, subunit 1

50

was estimated using IQ-tree LG+C20 and is based on a protein alignment of 443 aligned sites and includes 1078 sequences. Only support values (ultrafast bootstraps) above 85 are shown and are represented by filled circles according to the color code in the figure legend. The tree was rooted using the distantly related nitric-oxide reductase (NOR). b, Phylogenetic tree of the alpha-subunit of the bacterial nitrate reductase family (IPR006468) was estimated using IQ-tree LLG+C20+F+R and is based on a protein alignment of 1071 aligned sites and including 233 sequences. Only support values (SH-like approximate likelihood ratio test and ultrafast bootstraps) above 80 are represented by filled circles according to the color code in the Fig. legend. The tree was rooted arbitrarily. a,b Scale bars indicate number of substitutions per site. Raw data files are available via figshare (see Data availability for more details).

51

Suppl. Figure 17: Maximum likelihood phylogenetic analysis of key subunit of acetyl-CoA synthase/CO dehydrogenase. Phylogenetic tree was estimated for protein sequences with PF03598 encoding the key subunit of Acetyl-CoA synthase/CO dehydrogenase, cdhC (also referred to as CODH/acetyl-CoA synthase beta subunit or alpha subunit of ACS/CODH) using IQ-tree with the LG+C20+F+R. The protein alignment consisted of 712 aligned sites and includes 442 sequences. Only support values (SH-like approximate likelihood ratio test and ultrafast bootstraps) above 80 are represented by filled circles according to the color code in the figure legend. Scale bar indicates number of substitutions per site. Raw data files are available via figshare (see Data availability for more details).

52

a

Rubisco Type IC/ID

Rubisco Type IA/IB

Archaeal Rubisco Type III

HeimC3_2539 100/100 0 90-99.9/100 or 100/90-99 90-99.9/90-99

0 Lokiarch_3017

Rubisco Type IV 0

HeimC2_1001

O

dinLC

B

4_1530 ThorAB25_2003AM R LoAM 325_0614 BG Lokiarch_1248k 324_0831 HeimAB125_16260 13Loki_197ia 2 rc Rubisco 0 h_1414 0 (Thorarchaeote) 0 (Thorarchaeote)0 Rubisco Type II/III 0 i_223 0 Type II k 3

13Lo

BG

R

0.6

b 57 62 120 172 174 176 195 196 198 200 201 291 292 324 331 376 377 400 401 Type IB YP_170840 Synechococcus elongatus PCC_6301 E T N K K G D F K D E H R H K S G G G Type II P50922 Rhodobacter capsulatus ATCC11166 E T N K K G D F K D E H R H K S G G G Type II/III YP_566926 Methanococcoides burtonii DSM_6242 E T N K K G D F K D E H R H K S G G S Type III AAB99239 jannaschii DSM_2661 E T N K K G D L K D E H R H K S G G G BAD86479 Thermococcus kodakaraensis KOD1 E T N K K G D Y K D E H R H K S G G G OLS23198.1 Ca. Heimdallarchaeota archaeon LC_3 (HeimC3_25390) E T N K K G D V K D E H R H K S G G G KKK43062.1 Ca. Lokiarchaeum sp. GC14_75 (Lokiarch_30170) E T N K K G D L K D E H R H K S G G G OLS27808.1 Ca. Heimdallarchaeota archaeon LC_2 (HeimC2_10010) E T N K K G T W K D E H R H K S G G G OLS31478.1 Ca. Heimdallarchaeota archaeon AB125 (HeimAB125_16260) E T N K K G D F K D E H R H K S G G G OLS14054.1 Ca. Lokiarchaeota archaeon CR_4 (RBG13Loki_2232) E T N K K G D V K D E H R H K S G G G OLS18291.1 Ca. Odinarchaeota archaeon LCB_4 (OdinLCB4_15300) E T N K K G T S K D E H R H K S G G G OLS14417.1 Ca. Lokiarchaeota archaeon CR_4 (RBG13Loki_1973) E T N K K G D I K D E H T L K S G G G KKK44923.1 Ca. Lokiarchaeum sp. GC14_75 (Lokiarch_14140) E T N K C G D V K D E H M V K S G G G KKK45122.1 Ca. Lokiarchaeum sp. GC14_75 (Lokiarch_12480) E T N K C G D V K D E H M V K S G G G Type IV KXH74397.1 Ca. Thorarchaeota archaeon SMTZ1-45 (AM325_06140) E T N K C Y D I K D E H M V K S G G A KXH71633.1 Ca. Thorarchaeota archaeon SMTZ1-83 (AM324_08310) E T N K C Y D I K D E H M V K S G G A OLS27625.1 Ca. Thorarchaeota archaeon AB_25 (ThorAB25_20030)* E T N K C Y D I ------CAB13232 Bacillus subtilis subsp subtilis str. 168 G S K K V G D L K D E H P L S S A G G

*partial

53

Suppl. Figure 18: Maximum likelihood phylogenetic analysis of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) and analysis of essential sites. a, Phylogenetic tree was estimated using IQ-tree LG+C60+R+F and is based on a protein alignment of 359 aligned sites and includes 171 sequences. All support values (SH-like approximate likelihood ratio test and ultrafast bootstraps) below 90 as well as support values of closely related sequences were removed to increase readability and are represented by filled circles according to the color code in the figure legend. Scale bar indicates number of substitutions per site. b, Alignment positions of active site residues69 present in Asgard homologues as compared to representative species for each known RuBisCO-type. Amino acids, which are given as single-letter IUPAC code, are numbered according to Synechococcus elongatus PCC 6301 reference sequence. Raw data files are available via figshare (see Data availability for more details).

54

Suppl. File 1: Phylogenetic analyses of universal marker proteins. Maximum-likelihood analysis based on a concatenated set of 56 universal protein markers for archaeal representatives of all major clades (144 taxa, 6453 sites) using IQ-TREE under the LG+C60+F+G+PMSF model. The tree is rooted on the DPANN. Numbers at branches indicate bootstrap statistical support (100 replicates). Raw data files are available via figshare (see Data availability for more details).

(Heimdallarchaeote_AB_125:0.5119457884,((((((((((Candidatus_Bathyarchaeota_archaeon_B25:0.4531130314,C andidatus_Bathyarchaeota_archaeon_B24:0.4731823306)72:0.0400240000,miscellaneous_Crenarchaeota_grou p_archaeon_SMTZ_80:0.6020819648)67:0.0237290000,((miscellaneous_Crenarchaeota_group_6_archaeon_AD8 _1:0.4186633848,(Candidatus_Bathyarchaeota_archaeon_B26_2:0.2162053929,(Candidatus_Bathyarchaeota_ar chaeon_BA2:0.1488715737,Candidatus_Bathyarchaeota_archaeon_BA1:0.1587608596)100:0.1023070000)75:0. 0474330000)100:0.1125940000,((miscellaneous_Crenarchaeota_group_15_archaeon_DG_45:0.2178359671,Tha umarchaeota_archaeon_SCGC_AB_539_E09:0.2870999698)100:0.1023590000,Candidatus_Bathyarchaeota_arch aeon_B23:0.2427957722)100:0.3052240000)100:0.0651900000)98:0.0659720000,(((((Cenarchaeum_symbiosum: 0.2528176005,Nitrosopumilus_maritimus:0.1505108344)100:0.0882820000,Candidatus_Nitrosotenuis_cloacae:0 .1798435586)100:0.0812630000,Candidatus_Nitrosotalea_devanaterra:0.1815256419)100:0.2647440000,(Ca_Ni trososphaera:0.0788246241,Nitrososphaera_viennensis_EN76:0.1004829283)100:0.2441430000)100:0.5402010 000,((Aigarchaeota_archaeon_JGI_0000106_J15:0.3605721576,((Aigarchaeota_archaeon_OS1:0.0367179819,Aig archaeota_archaeon_SCGC_AAA471_B22:0.0617268900)100:0.2609830000,Aigarchaeota_archaeon_SCGC_AAA 471_E14:0.3729145924)100:0.1671580000)90:0.0451010000,Candidatus_Caldiarchaeum_subterraneum:0.44411 03811)100:0.3178320000)98:0.0696250000)100:0.0619260000,(((Geoarchaeon_NAG1:0.2720989573,Geo_AAA4 71:0.2534452452)100:0.5434200000,(((Sulfolobus_acidocalcarius:0.2319193431,Metallosphaera_cuprina:0.2972 266276)100:0.3617150000,(((Pyrolobus_fumarii:0.2051900286,Aeropyrum_pernix:0.4018732523)84:0.05653300 00,Ignicoccus_hospitalis:0.3559141863)98:0.0452870000,Desulfurococcus_kamchatkensis:0.3921172805)100:0.0 606660000)100:0.1281290000,(((Pyrobaculum_aerophilum:0.1815051956,Thermoproteus_uzoni:0.1880619751) 100:0.1580240000,(Caldivirga_maquilingensis:0.3822967131,Vulcanisaeta_distri:0.2599948501)100:0.07239100 00)100:0.2475740000,Thermofilum_pendens:0.4584292309)100:0.1184660000)84:0.0511610000)94:0.03204700 00,((Candidatus_Methanosuratus_petracarbonis_V5:0.0090453197,Candidatus_Methanosuratus_petracarbonis _V4:0.0000026919)100:0.1482610000,((Candidatus_Methanomethylicus_mesodigestum_V1:0.0046958387,Cand idatus_Methanomethylicus_mesodigestum_V2:0.0005984168)100:0.0441760000,Candidatus_Methanomethylic us_oleusabulum_V3:0.0368734041)100:0.1757180000)100:0.5020500000)99:0.0618830000)68:0.0321740000,((K or1:0.5365366568,(Kor4_v2.filtered:0.3036017157,(Kor2:0.2238184972,Ca_Korarchaeum:0.2962272840)100:0.0 724230000)100:0.1627930000)100:0.0883790000,Kor3:0.5793397151)100:0.2823680000)91:0.0414900000,((((((( ((((archaeon_GW2011_AR11:0.4966087548,archaeon_GW2011_AR3:0.4550070139)84:0.0782380000,(archaeon _GW2011_AR4:0.5813326123,archaeon_GW2011_AR15:0.5035291483)62:0.0530790000)69:0.0369600000,arch aeon_GW2011_AR9:0.7132641404)100:0.1073650000,((archaeon_GW2011_AR20:0.4855603253, _archaeon_SCGC_AAA011_G17:0.4730819495)70:0.0793580000,(archaeon_GW2011_AR17:0.6073917730,archa eon_GW2011_AR18:0.6214375403)59:0.0442480000)100:0.0620860000)99:0.0789890000,(((archaeon_GW2011 _AR13:0.3102680013,archaeon_GW2011_AR19:0.3025266702)42:0.0354230000,Nanoarchaeota_archaeon_SCG C_AAA011_D5:0.2702275719)100:0.5334460000,archaeon_GW2011_AR6:0.6534922299)100:0.3003840000)100: 0.0955910000,(Nanoarchaeum_equitans_Kin4_M:0.5414425351,Candidatus_Nanopusillus_acidilobi:0.70002425 70)100:0.3461100000)94:0.0483270000,Candidatus_Parvarchaeum_acidiphilum_ARMAN_4:1.4189863069)100:0 .0545450000,(((((Nanohaloarchaea_archaeon_SG9:0.1948546983,(Nanohaloarchaea_archaeon_PL_Br10_U2g27: 0.1570703854,(Nanohaloarchaea_archaeon_B1_Br10_U2g29:0.1299062234,Candidatus_Haloredivivus_sp_G17: 0.1448038422)69:0.0368570000)100:0.1222030000)89:0.0362240000,Candidatus_Nanosalina_sp_J07AB43:0.290 4755147)96:0.0364810000,(Nanohaloarchaea_archaeon_PL_Br10_U2g16:0.1879147201,Nanohaloarchaea_arch aeon_B1_Br10_U2g1:0.1747657297)100:0.0640860000)96:0.0780700000,Candidatus_Nanosalinarum_sp_J07AB 56:0.4245938503)100:0.8025340000,(archaeon_GW2011_AR5:0.7784679340,((Candidatus_Aenigmarchaeota_ar chaeon_JGI_0000106_F11:0.3480567321,Candidatus_Micrarchaeota_archaeon_RBG_16_49_10:0.4891597952)1

55

00:0.2345570000,(Candidatus_Aenigmarchaeum_subterraneum_SCGC_AAA011_O16:0.4175100808,Candidatus _Micrarchaeota_archaeon_RBG_16_36_9:0.3305634076)100:0.3070420000)87:0.0558810000)100:0.1104580000 )100:0.0636050000)96:0.0381950000,(((((archaeon_GW2011_AR21:0.3689632658,Diapherotrites_archaeon_SCG C_AAA011_K09_6G:0.4743490602)73:0.0589080000,Diapherotrites_archaeon_SCGC_AAA011_N19_5G:0.563585 7648)42:0.0606430000,archaeon_GW2011_AR10:0.4108879722)46:0.0932110000,Nano_AAA011:0.4849340888) 100:0.4424210000,Candidatus_Micrarchaeum_acidiphilum_ARMAN_2:1.3613198185)97:0.0834780000)100:0.04 94900000,(((Candidatus_Altiarchaeales_archaeon_IMC4:0.4270153522,Candidatus_Altiarchaeales_archaeon_W OR_SM1_SCG:0.3334887534)97:0.0733080000,Candidatus_Altiarchaeales_archaeon_WOR_SM1_86_2:0.378317 4389)100:0.0978190000,(Candidatus_Altiarchaeum_sp_CG2_30_32_3053:0.1037902554,Altiarchaeum_SM1_MS I:0.1029317431)100:0.7956100000)100:0.2627390000)100:0.0383750000,(((((Hadesarchaea_archaeon_YNP_45:0 .2456273360,Hadesarchaea_archaeon_DG_33:0.1787536205)99:0.0645250000,(Hadesarchaea_archaeon_DG_33 _1:0.2448048479,Hadesarchaea_archaeon_YNP_N21:0.2000892364)70:0.0312690000)100:0.3482460000,(((Arc_ I_group_archaeon_B15fssc0709_Meth_Bin003:0.0448513845,Arc_I_group_archaeon_U1lsi0528_Bin089:0.10838 45831)100:0.4109600000,Thermoplasmatales_archaeon_DG_70:0.4393983001)100:0.1433250000,(Pyrococcus_f uriosus:0.0807378113,Thermococcus_kodakarensis:0.0883115265)100:0.3189490000)99:0.0461040000)79:0.027 1940000,((Methanococcus_maripaludis:0.3305886271,Methanocaldococcus_jannaschii:0.1743835270)100:0.271 2160000,(((Methanosphaera_stadtmanae:0.2717423870,Methanobacterium_AL:0.1476723137)100:0.09553200 00,Methanothermobacter_thermauto:0.1126648235)100:0.1433530000,Methanothermus_ferv:0.1821868526)1 00:0.2012670000)87:0.0430680000)94:0.0305370000,(((((Methanosaeta_thermophila:0.3907974225,(Methanos arcina_acetivorans:0.3460832542,Candidatus_Methanoperedens_nitroreducens:0.3492219008)100:0.06188600 00)99:0.0514910000,(Methanocella_paludicola:0.4338084092,((Methanocorpusculum_labreanum:0.337877442 8,(Methanoculleus_marisnigri:0.2102787828,Methanoplanus_petrolearius:0.2290565846)98:0.0573700000)100: 0.2484840000,(Haloferax_volcanii:0.1741280583,Haloarcula_marismortui:0.1836330649)100:0.5455300000)86:0 .0553610000)45:0.0320800000)63:0.0456490000,(uncultured_archaeon_ANME_1:0.4963718545,(Candidatus_Sy ntrophoarchaeum_caldarius:0.1397866544,Candidatus_Syntrophoarchaeum_butanivorans:0.0980292217)100:0. 2650430000)100:0.0776360000)100:0.0971060000,(Ferroglobus_placidus:0.1241508911,Archaeoglobus_fulgidus :0.1685411238)100:0.3091740000)100:0.0799120000,(((Thermoplasma_acidoph:0.2832671403,Ferroplasma_aci darmanus:0.3767134251)100:0.4018670000,Aciduliprofundum_boonei:0.2991713160)100:0.1175000000,(Meth anomassiliicoccus:0.4740189710,(Thermoplasmatales_archaeon_SM1_50:0.2641616620,Thermoplasmatales_ar chaeon_SG8_52_3:0.2039809914)100:0.2867050000)91:0.0846780000)100:0.1363960000)100:0.0738590000)96: 0.0268940000)100:0.1050430000)78:0.0342760000,((((Thorarchaeote_AB_25:0.0529313888,(Thor45:0.02041441 84,Candidatus_Thorarchaeota_archaeon_SMTZ_45:0.0327720733)100:0.0398360000)100:0.0750040000,Thor83: 0.1320739741)100:0.5338300000,Odinarchaeote_LCB4:0.5164121338)94:0.0433970000,(Lokiarchaeum_GC14_7 5:0.6968888177,Lokiarchaeote_CR_4:0.5709798690)100:0.2487610000)100:0.0504240000)90:0.0310000000,((((( Plasmodium_falciparum:0.3641438771,Tetrahymena_thermophila:0.5012707949)82:0.0567880000,((((Chlamyd omonas_reinhardtii:0.1853510417,Arabidopsis_tha:0.1649083532)100:0.0845610000,(Bigelowiella_natans:0.34 82763535,Tpseudonana:0.3345170553)87:0.0440040000)76:0.0255950000,(Saccharomyces_cerev:0.3515848858 ,Homo_sapiens:0.1965011570)99:0.0527320000)51:0.0275410000,(Dictyostelium_dis:0.3371247249,Thecamona s_trahens:0.3295523016)55:0.0424430000)84:0.0365720000)66:0.0352300000,(Naegleria_gruberi:0.4297535442 ,Leishmania_infantum:0.5017477514)77:0.0569910000)68:0.0588530000,Entamoeba_histolytica:0.4846889392) 100:0.1100600000,Trichomonas_vaginalis:0.6764020263)100:0.9057540000)100:0.0804370000,(Heimdallarchae ote_LC_3:0.6004704979,SChinaSea_Heimdall:0.4926646175)100:0.2158920000)100:0.1665880000,Heimdallarch aeote_LC_2:0.7012944200);

56

Suppl. File 2: Phylogenetic analyses of universal marker proteins after removal of fast evolving sites. Maximum- likelihood analysis is based on a concatenated set of 56 universal protein markers, after removal of the fastest- evolving sites (144 taxa, 5020 sites) using IQ-TREE under the LG+C60+F+G+PMSF model. Numbers at branches indicate bootstrap statistical support (100 replicates). Raw data files are available via figshare (see Data availability for more details).

(Heimdallarchaeote_AB_125:0.2928752461,((((((((Candidatus_Bathyarchaeota_archaeon_B25:0.2576332972,Ca ndidatus_Bathyarchaeota_archaeon_B24:0.2526796273)96:0.019728,miscellaneous_Crenarchaeota_group_arch aeon_SMTZ_80:0.3263807535)95:0.017056,(((miscellaneous_Crenarchaeota_group_6_archaeon_AD8_1:0.23675 16196,(Candidatus_Bathyarchaeota_archaeon_BA2:0.0861171418,Candidatus_Bathyarchaeota_archaeon_BA1: 0.0974642542)100:0.047737)87:0.023556,Candidatus_Bathyarchaeota_archaeon_B26_2:0.1161076513)100:0.07 1130,((miscellaneous_Crenarchaeota_group_15_archaeon_DG_45:0.1238104185,Thaumarchaeota_archaeon_SC GC_AB_539_E09:0.1504660449)100:0.064200,Candidatus_Bathyarchaeota_archaeon_B23:0.1406423645)100:0. 168348)100:0.033258)96:0.032607,(((((Cenarchaeum_symbiosum:0.1423967759,Nitrosopumilus_maritimus:0.08 80727544)100:0.052961,Candidatus_Nitrosotenuis_cloacae:0.1066986373)100:0.053142,Candidatus_Nitrosotale a_devanaterra:0.1108124325)100:0.154594,(Ca_Nitrososphaera:0.0414908913,Nitrososphaera_viennensis_EN7 6:0.0603424390)100:0.142028)100:0.298809,((Aigarchaeota_archaeon_JGI_0000106_J15:0.1885502863,((Aigarc haeota_archaeon_OS1:0.0262797011,Aigarchaeota_archaeon_SCGC_AAA471_B22:0.0432531308)100:0.143105, Aigarchaeota_archaeon_SCGC_AAA471_E14:0.2030318584)100:0.092678)97:0.025985,Candidatus_Caldiarchaeu m_subterraneum:0.2432666605)100:0.195677)95:0.038464)100:0.038334,(((Kor1:0.2881651482,(Kor4_v2.filtere d:0.1742037378,(Kor2:0.1284213180,Ca_Korarchaeum:0.1586226874)100:0.038116)100:0.101010)100:0.057113 ,Kor3:0.3103890041)100:0.167677,(((Geoarchaeon_NAG1:0.1550803276,Geo_AAA471:0.1430709836)100:0.300 106,(((Sulfolobus_acidocalcarius:0.1334921306,Metallosphaera_cuprina:0.1644045996)100:0.204357,(((Pyrolob us_fumarii:0.1178680447,Aeropyrum_pernix:0.2278035065)97:0.030335,Ignicoccus_hospitalis:0.2079668745)10 0:0.024511,Desulfurococcus_kamchatkensis:0.2350887965)100:0.036425)100:0.080812,(((Pyrobaculum_aerophi lum:0.1043569862,Thermoproteus_uzoni:0.1135277736)100:0.093661,(Caldivirga_maquilingensis:0.2147770111 ,Vulcanisaeta_distri:0.1517204874)100:0.039938)100:0.143774,Thermofilum_pendens:0.2432647327)100:0.067 185)95:0.026303)99:0.015357,((Candidatus_Methanosuratus_petracarbonis_V5:0.0063215991,Candidatus_Met hanosuratus_petracarbonis_V4:0.0000025546)100:0.076680,((Candidatus_Methanomethylicus_mesodigestum_ V1:0.0028693107,Candidatus_Methanomethylicus_mesodigestum_V2:0.0001875353)100:0.027948,Candidatus_ Methanomethylicus_oleusabulum_V3:0.0177341150)100:0.095345)100:0.270462)99:0.029900)83:0.019305)98:0 .029100,(((((Thorarchaeote_AB_25:0.0297522209,(Thor45:0.0124677315,Candidatus_Thorarchaeota_archaeon_ SMTZ_45:0.0197536180)100:0.022932)100:0.037425,Thor83:0.0795752878)100:0.288900,Odinarchaeote_LCB4:0 .2786545193)88:0.018743,(Lokiarchaeum_GC14_75:0.4027761540,Lokiarchaeote_CR_4:0.3345673018)100:0.139 449)100:0.025326,(((((Plasmodium_falciparum:0.2085223709,Tetrahymena_thermophila:0.3194920166)73:0.03 4773,(((Chlamydomonas_reinhardtii:0.1102354254,Arabidopsis_tha:0.0993637014)100:0.055215,(Bigelowiella_ natans:0.2053383662,Tpseudonana:0.2066663550)99:0.028938)79:0.016541,((Saccharomyces_cerev:0.2156302 829,Homo_sapiens:0.1224174531)94:0.027732,(Dictyostelium_dis:0.2083541297,Thecamonas_trahens:0.200881 7613)62:0.024938)61:0.014422)96:0.024404)68:0.020668,(Naegleria_gruberi:0.2690815171,Leishmania_infantu m:0.3024806769)78:0.030179)74:0.039396,Entamoeba_histolytica:0.2968605001)100:0.078008,Trichomonas_va ginalis:0.4194799783)100:0.524230)42:0.010032)67:0.016736,(((((((((((archaeon_GW2011_AR11:0.2704708130,a rchaeon_GW2011_AR3:0.2570247121)96:0.040935,(archaeon_GW2011_AR4:0.3186664594,archaeon_GW2011_ AR15:0.2948153160)92:0.029914)100:0.021840,archaeon_GW2011_AR9:0.4048077455)100:0.065613,((archaeo n_GW2011_AR20:0.2871980917,Nanoarchaeota_archaeon_SCGC_AAA011_G17:0.2795197136)79:0.046001,(arc haeon_GW2011_AR17:0.3455077894,archaeon_GW2011_AR18:0.3572807861)79:0.029573)100:0.029174)100:0. 047009,((archaeon_GW2011_AR13:0.1784562909,(Nanoarchaeota_archaeon_SCGC_AAA011_D5:0.1652680619, archaeon_GW2011_AR19:0.1846220301)98:0.027684)100:0.316135,archaeon_GW2011_AR6:0.3757921568)100: 0.178003)100:0.060528,(Nanoarchaeum_equitans_Kin4_M:0.3136124966,Candidatus_Nanopusillus_acidilobi:0.

57

4411257580)100:0.202730)100:0.029227,Candidatus_Parvarchaeum_acidiphilum_ARMAN_4:0.8461739376)100: 0.034398,((((Nanohaloarchaea_archaeon_SG9:0.1132825771,(((Nanohaloarchaea_archaeon_PL_Br10_U2g27:0.0 945141476,Nanohaloarchaea_archaeon_B1_Br10_U2g29:0.0838380440)99:0.026215,Candidatus_Haloredivivus_ sp_G17:0.0857715585)100:0.076165,Candidatus_Nanosalina_sp_J07AB43:0.1706675622)100:0.023863)100:0.03 3432,(Nanohaloarchaea_archaeon_PL_Br10_U2g16:0.1121395047,Nanohaloarchaea_archaeon_B1_Br10_U2g1: 0.1105100764)100:0.042800)100:0.045492,Candidatus_Nanosalinarum_sp_J07AB56:0.2737061602)100:0.45182 7,(archaeon_GW2011_AR5:0.4375157604,((Candidatus_Aenigmarchaeota_archaeon_JGI_0000106_F11:0.20398 25969,Candidatus_Micrarchaeota_archaeon_RBG_16_49_10:0.2799267749)100:0.142718,(Candidatus_Aenigma rchaeum_subterraneum_SCGC_AAA011_O16:0.2597734948,Candidatus_Micrarchaeota_archaeon_RBG_16_36_ 9:0.2019659819)100:0.156027)99:0.035202)100:0.067332)100:0.042257)99:0.024786,((((archaeon_GW2011_AR 21:0.2311695189,Diapherotrites_archaeon_SCGC_AAA011_K09_6G:0.2739890648)100:0.049886,(Diapherotrites _archaeon_SCGC_AAA011_N19_5G:0.3172921306,archaeon_GW2011_AR10:0.1888879368)85:0.037809)83:0.05 3895,Nano_AAA011:0.2673741530)100:0.263781,Candidatus_Micrarchaeum_acidiphilum_ARMAN_2:0.8154579 369)100:0.053567)100:0.028880,(((Candidatus_Altiarchaeales_archaeon_IMC4:0.2342178374,Candidatus_Altiar chaeales_archaeon_WOR_SM1_SCG:0.1744397512)100:0.045033,Candidatus_Altiarchaeales_archaeon_WOR_S M1_86_2:0.2147362419)100:0.057120,(Candidatus_Altiarchaeum_sp_CG2_30_32_3053:0.0737814278,Altiarcha eum_SM1_MSI:0.0662605690)100:0.460538)100:0.150970)100:0.024629,(((((Hadesarchaea_archaeon_YNP_45:0 .1432494873,Hadesarchaea_archaeon_DG_33:0.1088459762)95:0.036844,(Hadesarchaea_archaeon_DG_33_1:0. 1417478473,Hadesarchaea_archaeon_YNP_N21:0.1134198622)93:0.017617)100:0.191723,(((Arc_I_group_archa eon_B15fssc0709_Meth_Bin003:0.0258586073,Arc_I_group_archaeon_U1lsi0528_Bin089:0.0781824935)100:0.2 13623,Thermoplasmatales_archaeon_DG_70:0.2300603898)100:0.082953,(Pyrococcus_furiosus:0.0457492873,T hermococcus_kodakarensis:0.0513814250)100:0.172184)100:0.028674)91:0.015003,((Methanococcus_maripalu dis:0.1861893675,Methanocaldococcus_jannaschii:0.0968532236)100:0.147919,(((Methanosphaera_stadtmana e:0.1416960487,Methanobacterium_AL:0.0750576720)100:0.049283,Methanothermobacter_thermauto:0.0649 599757)100:0.084065,Methanothermus_ferv:0.0986154825)100:0.110005)100:0.026342)100:0.014888,(((((Meth anosaeta_thermophila:0.2028158404,(Methanosarcina_acetivorans:0.1957996843,Candidatus_Methanoperede ns_nitroreducens:0.1845454941)100:0.030342)100:0.023981,(Methanocella_paludicola:0.2316270532,((Methan ocorpusculum_labreanum:0.1799391639,(Methanoculleus_marisnigri:0.1103367772,Methanoplanus_petroleari us:0.1249728363)100:0.035637)100:0.133049,(Haloferax_volcanii:0.0986221119,Haloarcula_marismortui:0.1061 408156)100:0.297410)100:0.030385)94:0.018797)93:0.024075,(uncultured_archaeon_ANME_1:0.2729942273,(C andidatus_Syntrophoarchaeum_caldarius:0.0710397049,Candidatus_Syntrophoarchaeum_butanivorans:0.0516 405097)100:0.145365)100:0.033684)100:0.052239,(Ferroglobus_placidus:0.0706377564,Archaeoglobus_fulgidus :0.0940691233)100:0.166343)100:0.046167,(((Thermoplasma_acidoph:0.1594383260,Ferroplasma_acidarmanus :0.2068978205)100:0.220409,Aciduliprofundum_boonei:0.1534174837)100:0.062302,(Methanomassiliicoccus:0. 2404226722,(Thermoplasmatales_archaeon_SM1_50:0.1508279051,Thermoplasmatales_archaeon_SG8_52_3:0. 1149976575)100:0.159494)88:0.040828)100:0.071061)100:0.043855)99:0.018153)100:0.059263)100:0.063213,(H eimdallarchaeote_LC_3:0.3338784722,SChinaSea_Heimdall:0.2708131853)100:0.122024)100:0.093803,Heimdall archaeote_LC_2:0.3898869217);

58

Suppl. File 3: Phylogenetic analyses of universal marker proteins after removal of DPANN archaea and fast evolving sites. Maximum-likelihood analysis was based on a concatenated set of 56 universal protein markers, after removal of the fastest-evolving sites, and excluding DPANN representatives (104 taxa, 5008 sites) using IQ-TREE under the LG+C60+F+G+PMSF model. Numbers at branches indicate bootstrap statistical support (100 replicates). Raw data files are available via figshare (see Data availability for more details).

(Heimdallarchaeote_AB_125:0.2753135815,(((((((((Candidatus_Bathyarchaeota_archaeon_B25:0.2336842229,Ca ndidatus_Bathyarchaeota_archaeon_B24:0.2293072611)67:0.0160270000,miscellaneous_Crenarchaeota_group _archaeon_SMTZ_80:0.2994076157)74:0.0127260000,(((miscellaneous_Crenarchaeota_group_6_archaeon_AD8 _1:0.2128375546,(Candidatus_Bathyarchaeota_archaeon_BA2:0.0791803903,Candidatus_Bathyarchaeota_arch aeon_BA1:0.0856296015)100:0.0381420000)73:0.0221990000,Candidatus_Bathyarchaeota_archaeon_B26_2:0.1 032920367)100:0.0643980000,((miscellaneous_Crenarchaeota_group_15_archaeon_DG_45:0.1119863372,Thau marchaeota_archaeon_SCGC_AB_539_E09:0.1421864268)100:0.0560090000,Candidatus_Bathyarchaeota_archa eon_B23:0.1302738407)100:0.1560470000)100:0.0275500000)89:0.0293320000,(((((Cenarchaeum_symbiosum:0 .1256062741,Nitrosopumilus_maritimus:0.0818520869)100:0.0505150000,Candidatus_Nitrosotenuis_cloacae:0. 0941423095)100:0.0464360000,Candidatus_Nitrosotalea_devanaterra:0.1025858591)100:0.1463980000,(Ca_Nit rososphaera:0.0380180376,Nitrososphaera_viennensis_EN76:0.0535685788)100:0.1298000000)100:0.28215700 00,((Aigarchaeota_archaeon_JGI_0000106_J15:0.1742193317,((Aigarchaeota_archaeon_OS1:0.0239533738,Aiga rchaeota_archaeon_SCGC_AAA471_B22:0.0405684018)100:0.1294120000,Aigarchaeota_archaeon_SCGC_AAA4 71_E14:0.1869180340)100:0.0825500000)77:0.0228150000,Candidatus_Caldiarchaeum_subterraneum:0.220614 0559)100:0.1817890000)89:0.0414090000)100:0.0343000000,(((Kor1:0.2651649861,(Kor4_v2.filtered:0.1575735 627,(Kor2:0.1131258394,Ca_Korarchaeum:0.1404138858)100:0.0341560000)100:0.0928520000)100:0.05332800 00,Kor3:0.2874064018)100:0.1528820000,(((Geoarchaeon_NAG1:0.1461450081,Geo_AAA471:0.1233876504)100 :0.2902970000,(((Sulfolobus_acidocalcarius:0.1193094710,Metallosphaera_cuprina:0.1480239482)100:0.186775 0000,(((Pyrolobus_fumarii:0.1042088633,Aeropyrum_pernix:0.2039007103)55:0.0248560000,Ignicoccus_hospita lis:0.1875035844)97:0.0215690000,Desulfurococcus_kamchatkensis:0.2113263888)99:0.0298950000)100:0.0732 740000,(((Pyrobaculum_aerophilum:0.0949735492,Thermoproteus_uzoni:0.0994199202)100:0.0872270000,(Cal divirga_maquilingensis:0.1901766946,Vulcanisaeta_distri:0.1405652332)97:0.0358030000)100:0.1291230000,Th ermofilum_pendens:0.2183540751)100:0.0622840000)58:0.0222840000)99:0.0148690000,((Candidatus_Methan osuratus_petracarbonis_V5:0.0054809690,Candidatus_Methanosuratus_petracarbonis_V4:0.0000025071)100:0. 0713590000,((Candidatus_Methanomethylicus_mesodigestum_V1:0.0030118134,Candidatus_Methanomethylic us_mesodigestum_V2:0.0001516040)100:0.0244820000,Candidatus_Methanomethylicus_oleusabulum_V3:0.01 63362919)100:0.0853660000)100:0.2470540000)95:0.0255920000)91:0.0168890000)96:0.0286730000,((((((Hade sarchaea_archaeon_YNP_45:0.1329479445,Hadesarchaea_archaeon_DG_33:0.0944241647)95:0.0303230000,Ha desarchaea_archaeon_DG_33_1:0.1319438479)57:0.0181990000,Hadesarchaea_archaeon_YNP_N21:0.1006370 334)100:0.1755270000,((Methanococcus_maripaludis:0.1688536747,Methanocaldococcus_jannaschii:0.0900005 940)100:0.1317540000,(((Methanosphaera_stadtmanae:0.1280112983,Methanobacterium_AL:0.0668465289)10 0:0.0462100000,Methanothermobacter_thermauto:0.0557823216)100:0.0769810000,Methanothermus_ferv:0.0 877605990)100:0.1014160000)97:0.0197300000)53:0.0071820000,(((Arc_I_group_archaeon_B15fssc0709_Meth _Bin003:0.0232475234,Arc_I_group_archaeon_U1lsi0528_Bin089:0.0716052256)100:0.2018200000,Thermoplas matales_archaeon_DG_70:0.2031773496)100:0.0804710000,(Pyrococcus_furiosus:0.0417240096,Thermococcus _kodakarensis:0.0458043458)100:0.1542150000)100:0.0297100000)84:0.0130200000,(((((Methanosaeta_therm ophila:0.1834352438,(Methanosarcina_acetivorans:0.1801134713,Candidatus_Methanoperedens_nitroreducen s:0.1705921165)90:0.0237020000)99:0.0214860000,(Methanocella_paludicola:0.2120856965,((Methanocorpusc ulum_labreanum:0.1658945587,(Methanoculleus_marisnigri:0.0959918219,Methanoplanus_petrolearius:0.1132 098362)97:0.0292970000)100:0.1210410000,(Haloferax_volcanii:0.0895252802,Haloarcula_marismortui:0.09837 43921)100:0.2730120000)95:0.0282950000)92:0.0181260000)96:0.0223730000,(uncultured_archaeon_ANME_1: 0.2502074410,(Candidatus_Syntrophoarchaeum_caldarius:0.0652268066,Candidatus_Syntrophoarchaeum_buta

59

nivorans:0.0457084893)100:0.1295230000)98:0.0294570000)100:0.0477320000,(Ferroglobus_placidus:0.062480 1670,Archaeoglobus_fulgidus:0.0835863236)100:0.1508200000)100:0.0397200000,(((Thermoplasma_acidoph:0. 1478047480,Ferroplasma_acidarmanus:0.1871859881)100:0.2056740000,Aciduliprofundum_boonei:0.13914648 50)100:0.0582040000,(Methanomassiliicoccus:0.2232765614,(Thermoplasmatales_archaeon_SM1_50:0.134592 1651,Thermoplasmatales_archaeon_SG8_52_3:0.1101539943)100:0.1414890000)88:0.0409040000)100:0.07010 90000)100:0.0369440000)100:0.0719980000)59:0.0121050000,(((Thorarchaeote_AB_25:0.0234233586,(Thor45:0 .0107773770,Candidatus_Thorarchaeota_archaeon_SMTZ_45:0.0188485947)97:0.0200180000)100:0.034369000 0,Thor83:0.0709674518)100:0.2760890000,(Odinarchaeote_LCB4:0.2535515008,(Lokiarchaeum_GC14_75:0.390 8516406,Lokiarchaeote_CR_4:0.3151145991)100:0.1123350000)63:0.0096020000)99:0.0238650000)82:0.016522 0000,((((Plasmodium_falciparum:0.1909199990,Tetrahymena_thermophila:0.2960092071)67:0.0273520000,(((( Chlamydomonas_reinhardtii:0.1015493084,Arabidopsis_tha:0.0856689814)100:0.0444510000,(Bigelowiella_nat ans:0.1836013735,Tpseudonana:0.1914315382)88:0.0253360000)71:0.0161450000,((Saccharomyces_cerev:0.19 56095640,Homo_sapiens:0.1085922128)83:0.0228620000,(Dictyostelium_dis:0.1917619524,Thecamonas_trahe ns:0.1844618962)48:0.0192750000)41:0.0124220000)61:0.0187360000,(Entamoeba_histolytica:0.2783573085,Le ishmania_infantum:0.2891016899)65:0.0345450000)42:0.0132550000)52:0.0226570000,Naegleria_gruberi:0.25 08036160)100:0.0697630000,Trichomonas_vaginalis:0.4057911732)100:0.5014970000)100:0.0401890000,(Heim dallarchaeote_LC_3:0.3151418784,SChinaSea_Heimdall:0.2473933463)100:0.1156070000)100:0.0862280000,Hei mdallarchaeote_LC_2:0.3630172997);

60