bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

1 Expanding diversity of Asgard archaea and the elusive ancestry of eukaryotes

2

3 Yang Liu1†, Kira S. Makarova2†, Wen-Cong Huang1†, Yuri I. Wolf2, Anastasia Nikolskaya2, Xinxu 4 Zhang1, Mingwei Cai1, Cui-Jing Zhang1, Wei Xu3, Zhuhua Luo3, Lei Cheng4, Eugene V. Koonin2*, Meng 5 Li1*

6 1 Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen 7 University, Shenzhen, Guangdong, 518060, P. R. China

8 2 National Center for Biotechnology Information, National Library of Medicine, National Institutes of 9 Health, Bethesda, Maryland 20894, USA

10 3 State Key Laboratory Breeding Base of Marine Genetic Resources, Key Laboratory of Marine Genetic 11 Resources, Fujian Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, State 12 Oceanic Administration, Xiamen 361005, P. R. China

13 4 Key Laboratory of Development and Application of Rural Renewable Energy, Biogas Institute of 14 Ministry of Agriculture, Chengdu 610041, P.R. China

15 † These authors contributed equally to this work.

16 *Authors for correspondence: [email protected] or [email protected]

17

18

19 Running title: Asgard archaea genomics

20 Keywords:

1

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

21 Abstract

22 Comparative analysis of 162 (nearly) complete genomes of Asgard archaea, including 75 not reported 23 previously, substantially expands the phylogenetic and metabolic diversity of the Asgard superphylum, 24 with six additional phyla proposed. Phylogenetic analysis does not strongly support origin of eukaryotes 25 from within Asgard, leaning instead towards a three- topology, with eukaryotes branching outside 26 archaea. Comprehensive protein domain analysis in the 162 Asgard genomes results in a major expansion 27 of the set of eukaryote signature proteins (ESPs). The Asgard ESPs show variable phyletic distributions 28 and domain architectures, suggestive of dynamic evolution via horizontal gene transfer (HGT), gene loss, 29 gene duplication and domain shuffling. The results appear best compatible with the origin of the 30 conserved core of eukaryote genes from an unknown ancestral lineage deep within or outside the extant 31 archaeal diversity. Such hypothetical ancestors would accumulate components of the mobile archaeal 32 ‘eukaryome’ via extensive HGT, eventually, giving rise to eukaryote-like cells.

2

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

33 Introduction

34 The Asgard archaea are a recently discovered archaeal superphylum that is rapidly expanding, thanks to 35 metagenomic sequencing (1–5). The Asgard genomes encode a diverse repertoire of eukaryotic signature 36 proteins (ESPs) that far exceeds the diversity of ESPs in other archaea. The Asgard ESPs are particularly 37 enriched in proteins involved in membrane trafficking, vesicle formation and transport, cytoskeleton 38 formation and the ubiquitin network, suggesting that these archaea possess a eukaryote-type cytoskeleton 39 and an intracellular membrane system (2).

40 The discovery of the Asgard archaea rekindled the decades old but still unresolved fundamental debate on 41 the evolutionary relationship between eukaryotes and archaea that has shaped around the ‘2-domain (2D) 42 versus 3-domain (3D) tree of life’ theme (6–8). The central question is whether the eukaryotic nuclear 43 lineage evolved from a common ancestor shared with archaea, as in the 3D tree, or from within the 44 archaea, as in the 2D tree. The discovery and phylogenomic analysis of Asgard archaea yielded strong 45 evidence in support of the 2D tree, in which eukaryotes appeared to share common ancestry with one of 46 the Asgard lineages, Heimdallarchaeota (1, 2, 5). However, the debate is not over as arguments have been 47 made for the 3D topology, in particular, based on the phylogenetic analysis of RNA polymerases, some of 48 the most highly conserved, universal proteins (9, 10).

49 Molecular phylogenetic methods alone might be insufficient to resolve the ancient ancestral relationship 50 between archaea and eukarya. To arrive at a compelling solution, supporting biological evidence is crucial 51 (11), because, for example, the transition from archaeal, ether-linked membrane lipids to eukaryotic, 52 ester-linked lipids that constitute eukaryotic (and bacterial) membranes (12) and the apparent lack of the 53 phagocytosis capacity in Asgard archaea (13) are major problems for scenarios of the origin of eukaryotes 54 from Asgard or any other archaeal lineage. Some biochemical evidence indicates that Asgard archaea 55 possess an actin cytoskeleton regulated by accessory proteins, such as profilins and gelsolins, and the 56 endosomal sorting complex required for transport machinery (ESCRT) that can be predicted to function 57 similarly to the eukaryotic counterparts (14–16). Generally, however, the biology of Asgard archaea 58 remains poorly characterized, in large part, because of their recalcitrance to growth in culture (17). To 59 date, only one Asgard archaeon, Candidatus Prometheoarchaeum syntrophicum strain MK-D1, has been 60 isolated and grown in culture (17). This has been reported to form extracellular protrusions that 61 are involved in its interaction with syntrophic , but no visible organelle-like structure and, 62 apparently, little intracellular complexity.

63 The only complete, closed genome of an Asgard archaeon also comes from Candidatus P. syntrophicum 64 strain MK-D1 (17) whereas all other genome sequences were obtained by binning multiple metagenomics

3

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

65 contigs. Furthermore, these genome sequences represent but a small fraction of the Asgard diversity that 66 has been revealed by 16S rRNA sequencing (18, 19).An additional challenge to the study of the 67 relationship between archaea and eukaryotes is that identification and analysis of the archaeal ESPs are 68 non-trivial tasks due to the high sequence divergence of many if not most of these proteins. At present, 69 the most efficient, realistic approach to the study of Asgard archaea and their eukaryotic connections 70 involves obtaining high quality genome sequences and analyzing them using the most powerful and 71 robust of the available computational methods.

72 Here we describe metagenomic mining of the expanding diversity of the superphylum Asgard, including 73 the identification of six additional -level lineages that thrive in a wide variety of ecosystems and 74 are inferred to possess versatile metabolic capacities. We show that these uncultivated Asgard groups 75 carry a broad repertoire of ESPs many of which have not been reported previously. Our in depth 76 phylogenomic analysis of these genomes provides insights into the evolution of Asgard archaea but calls 77 into question the origin of eukaryotes from within Asgard.

78

79 Results

80 Reconstruction of Asgard archaeal genomes from metagenomics data

81 We reconstructed 75 metagenome-assembled genomes (MAGs) from the Asgard superphylum that have 82 not been reported previously. These MAGs were recovered from various water depths of the Yap trench, 83 intertidal mangrove sediments of Mai Po Nature Reserve (Hong Kong, China) and Futian Mangrove 84 Nature Reserve (Shenzhen, China), seagrass sediments of Swan Lake Nature Reserve (Rongcheng, 85 China) and petroleum samples of Shengli oilfield (Shandong, China) (Supplementary Table 1, 86 Supplementary Figure 1). For all analyses described here, these 75 MAGs were combined with 87 87 publicly available genomes, resulting in a set of 162 Asgard genomes. The 75 genomes reconstructed here 88 were, on average, 82% complete and showed evidence of low contamination of about 3%, on average 89 (Supplementary Figure 2).

90

91 Classification of Asgard genes into clusters of orthologs

92 The previous analyses of Asgard genomes detected a large fraction of “dark matter” genes (20). For 93 example, in the recently published complete genome of Candidatus Prometheoarchaeum syntrophicum, 94 45% of the proteins are annotated as “hypothetical”. We made an effort to improve the annotation of 95 Asgard genomes by investigating this dark matter in greater depth, and developing a dedicated platform

4

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

96 for Asgard comparative genomics. To this end, we constructed Asgard Clusters of Orthologous Genes 97 (asCOGs) and used the most sensitive available methods of sequence analysis to annotate additional 98 Asgard proteins, attempting, in particular, to expand the catalogue of Asgard homologs of ESPs (see 99 Materials and Methods for details).

100 Preliminary clustering by sequence similarity and analysis of the protein cluster representation across the 101 genomes identified the set of 76 most complete Asgard MAGs (46 genomes available previously and 30 102 ones reported here) that cover most of the group diversity (Supplementary Table 1). The first version of 103 the asCOGs presented here consists of 14,704 orthologous protein families built for this 76-genome set. 104 The asCOGs cover from 72% to 98% (92% on average) of the proteins in these 76 genomes (additional 105 data file 1). Many asCOGs include individual domains of large, multidomain proteins.

106 The gene commonality plot for the asCOGs shows an abrupt drop at the right end, which reflects a 107 surprising deficit of nearly universal genes (Fig. 1). Such shape of the gene commonality curve appears 108 anomalous compared to other major groups of archaea or bacteria with many sequenced genomes (21). 109 For example, in the case of the TACK superphylum of archaea, for which the number of genomes 110 available is similar to that for Asgard, with a comparable level of diversity, the commonality plot shows 111 no drop at the right end, but instead, presents a clear uptick, which corresponds to the core of genes 112 represented in (almost) all genomes (Fig. 1). Apparently, most of the Asgard genomes remain incomplete, 113 such that conserved genes were missed randomly. Currently, there are only three gene families that are 114 present in all Asgard MAGs, namely, a Zn-ribbon domain, a Threonyl-tRNA synthetase and an 115 aminotransferase (additional data file 1).

116 We employed the asCOG profiles to annotate the remaining 86 Asgard MAGs, including those that were 117 sequenced in the later stages of this work (Supplementary Table 1). On average, 89% of the proteins 118 encoded in these genomes were covered by asCOGs (Supplementary Table 1). Thus, the asCOGs 119 database appears to be an efficient tool for annotation and comparative genomic analysis of Asgard 120 MAGs and complete genomes.

121

122 Expanding the phylogenetic diversity of Asgard archaea

123 Phylogenetic analysis of the Asgard MAGs based on a concatenated alignment of 209 core asCOGs (see 124 Methods and additional data file 2) placed many of the genomes reported here into the previously 125 delineated major Asgard lineages (Fig. 2a, Supplementary Table 1, additional data file 2), namely, 126 Thorarchaeota (n=20), Lokiarchaeota (n=18), Hermodarchaeota (n=9), Gerdarchaeota (n=3),

5

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

127 Helarchaeota (n=2), and Odinarchaeota (n=1). Additionally, we identified 6 previously unknown major 128 Asgard lineages that appear to be strong candidates to become additional phyla (Fig. 2a and b, 129 Supplementary Table 1; see also Taxonomic description of new taxa in the Supplementary Information 130 and additional data file 1). A clade formed by As_085 and As_075 is a deeply branching sister group to 131 the previously recognized Heimdallarchaeota(2). Furthermore, our phylogenetic analysis supported the 132 further split of “Heimdallarchaeota” into 4 phylum-level lineages according to the branch length in the 133 concatenated phylogeny (see Materials and Methods; see also Taxonomic description of new taxa in the 134 Supplementary Information and additional data file 1). The putative phyla within the old 135 Heimdallarchaeota included the previously defined Gerdarchaeota (4), and three additional phyla that 136 could be represented by As_002 (LC2) , As_003 (LC3) and AB_125 (As_001), respectively. Another 3 137 previously undescribed lineages were related, respectively, to Hel-, Loki-, Odin- and Thorarchaeota. 138 Specifically, As_181, As_178 and As_183 formed a clade that was deeply rooted at the Hel-Loki-Odin- 139 Thor clade; As_129 and As_130 formed a sister group to Odinarchaeota; and a lineage represented by 140 As_086 was a sister group to Thorarchaeota. These results were buttressed by the 16S rRNA gene 141 phylogeny, comparisons of the mean amino acid identity and 16S rRNA sequence identity (Fig. 2b, 142 Supplementary Figure 3, Supplementary Figure 4, Supplementary Table 2 and Supplementary Table 3).

143 We propose the name Wukongarchaeota after Wukong, a Chinese legendary figure who caused havoc in 144 the heavenly palace, for the putative phylum represented by MAGs As_085 and As_075 (Candidatus 145 Wukongarchaeum yapensis), and names of Asgard deities in the Norse mythology for the other 5 146 proposed phyla: (1) Hodarchaeota, after Hod, the god of darkness, for MAG As_027 (Candidatus 147 Hodarchaeum mangrove); (2) Kariarchaeota, after Kari, the god of the North wind, for MAG As_030 148 (Candidatus Kariarchaeum pelagius); (3) Borrarchaeota after Borr, the creator god and father of Odin, for 149 MAG As_133 (Candidatus Borrarchaeum yapensis); (4) Baldrarchaeota, after Baldr, the god of light and 150 brother of Thor, for MAG As_130 (Candidatus Baldrarchaeum yapensis); (5) and Hermodarchaeota after 151 Hermod, the messenger of the gods, son of Odin and brother of Baldr, for MAG As_086 (Candidatus 152 Hermodarchaeum yapensis) (Fig. 2a). For details, see Taxonomic Description of new taxa in the 153 Supplementary Information.

154 The gene content of Asgard MAGs agrees well with the phylogenetic structure of the group. The phyletic 155 patterns of the asCOG form clusters that generally correspond to the clades identified by phylogenetic 156 analysis (Fig. 3a), suggesting that gene gain and loss within Asgard archaea largely proceeded in a clock- 157 like manner and/or that horizontal gene exchange preferentially occurred between genomes within the 158 same clade.

159

6

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

160 Phylogenomic analysis and positions of Asgard archaea and eukaryotes in the tree of life

161 The outcome of phylogenetic reconstruction, especially, when deep branchings are involved, such as 162 those that are relevant for the 3D vs 2D conundrum, depends on the phylogenetic methods employed, the 163 selection of genes for phylogeny construction and, perhaps most dramatically, on the species sampling 164 (22–24). In many cases, initially uncertain positions of lineages in a tree settle over time once more 165 representatives of the groups in question and their relatives become available.

166 In our analysis of the universal phylogeny, we aimed to make the species set for phylogenetic 167 reconstruction as broadly representative as possible, while keeping its size manageable, to allow the use 168 of powerful phylogenetic methods. The tree was constructed from alignments of conserved proteins of 169 162 Asgard archaea, 286 other archaea, 98 bacteria and 72 eukaryotes (see Supplementary Material and 170 Methods for details of the procedure including the selection of a representative species set and 171 Supplementary Table 4). Members of 30 families of (nearly) universal proteins that appear to have 172 evolved without much HGT and have been previously employed for the reconstruction of the tree of life 173 (25) were used to generate a concatenated alignment of 7411 positions, after removing low information 174 content positions (Supplementary Table 4). For the phylogenetic reconstruction, we used the IQ-tree 175 program with several phylogenetic models (see Methods and Supplementary Table 5 for details). 176 Surprisingly, the resulting trees had the 3D topology, with high support values for all key bifurcations 177 (Fig. 2c, additional data file 2).

178 A full investigation of the effects of different factors, in particular, the marker gene selection, on the tree 179 topology is beyond the scope of this work. Nonetheless, we addressed the possibility that the 3D topology 180 resulted from the model used for the tree reconstruction and/or the species selection. To this end, we 181 constructed 100 trees from the same alignment by randomly sampling 5 representatives of Asgard 182 archaea, other archaea, bacteria and eukaryotes each. For these smaller sets of species, the best model 183 identified by Williams et al. (LG+C60+G4+F) could be employed, resulting in 50 3D and 50 2D trees 184 (Supplementary Table 5, additional data file 2). Because IQ-tree identified this model as over-specified 185 for such a small alignment, we also tested a more restricted model (LG+C20+G4+F), obtaining 58 3D and 186 42 2D trees for the same set of 100 samples (Supplementary Table 5, additional data file 2).

187 The results of our phylogenetic analysis indicate that: 1) species sampling substantially affects the tree 188 topology; 2) even the set of most highly conserved genes that appear to be minimally prone to HGT, 189 yields conflicting signals for different species sets. Additional markers, less highly conserved and more 190 prone to HGT, are unlikely to improve phylogenetic resolution and might cause systematic error. Notably, 191 the topology of our complete phylogenetic tree (Fig. 2a) within the archaeal clade is mostly consistent

7

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

192 with the tree obtained in a preliminary analysis of a larger set of archaeal genomes and a larger marker 193 gene set (26). Taking into account these observations and the fact that we used the largest set of Asgard 194 archaea and other archaea compared to all previous phylogenetic analyses, the appearance of the 3D 195 topology in our tree indicates that the origin of eukaryotes from within Asgard cannot be considered a 196 settled issue. Various factors affecting the tree topology, including further increased species 197 representation, particularly, of Asgard and the deep branches of the TACK superphylum, such as 198 Bathyarchaeota and Korarchaeota, remain to be explored in order to definitively resolve the evolutionary 199 relationship between archaea and eukaryotes.

200

201 The core gene set of Asgard archaea

202 We next analyzed the core set of conserved Asgard genes which we arbitrarily defined as all asCOGs that 203 are present at least in one third of the MAGs in each of the 12 phylum-level lineages, with the mean 204 representation across lineages >75%. Under these criteria, the Asgard core includes 378 asCOGs 205 (Supplementary Table 6). As expected, most of these protein families, 293 (77%), are universal (present 206 in bacteria, other archaea and eukaryotes), 62 (16%) are represented in other archaea and eukaryotes, but 207 not in bacteria, 15 (4%) are found in other archaea and bacteria, but not in eukaryotes, 7 (2%) are archaea- 208 specific, and only 1 (0.003%) is shared exclusively with eukaryotes (Supplementary Figure 5). Most of 209 the core asCOGs show comparable levels of similarity to homologs from two or all three domains of life. 210 The second largest fraction of the core asCOGs shows substantially greater sequence similarity (at least, 211 25% higher similarity score) to homologous proteins from archaea than to those from eukaryotes and/or 212 bacteria (Supplementary Table 6). Compared with the 219 genes that comprise the pan-archaeal core (27), 213 the Asgard core set lacks 12 genes, each of which, however, is present in some subset of the Asgard 214 genomes. These include three genes of diphthamide biosynthesis and 2 ribosomal proteins, L40E and 215 L37E. The intricate evolutionary history of gene encoding translation elongation factors and enzymes of 216 diphthamide biosynthesis in Asgard has been analyzed previously (28). Also of note is the displacement 217 of the typical archaeal glyceraldehyde-3-phosphate dehydrogenase (type II) by a bacterial one (type I) in 218 most of the Asgard genomes (cog.001204, additional data file 1).

219 Functional distribution of the core asCOGs is shown in Fig. 3b (also see additional data file 1). For 220 comparison, we also derived an extended gene core for the TACK superphylum, using similar criteria (at 221 least 50% in each of the 6 lineages and 75% of the genomes overall, Fig 3b). For at least half of the 222 Asgard core genes, across most functional classes, there were no orthologs in the TACK core. The most 223 pronounced differences were found, as expected, in the category U (intracellular trafficking, secretion,

8

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

224 and vesicular transport). In Asgard archaea, this category includes 19 core genes compared with 7 genes 225 in TACK; 13 of these genes are specific to the Asgard archaea and include components of ESCRT I and 226 II, 3 distinct Roadblock/longin families, 2 distinct families of small GTPases, and a few other genes 227 implicated in related processes (Supplementary Table 6).

228 We compared the protein annotation obtained using asCOGs with the available annotation of ‘Candidatus 229 Prometheoarchaeum syntrophicum’ and found that using asCOGs allowed at least a general functional 230 prediction for 649 of the 1756 (37%) ‘hypothetical proteins’ in this organism, the only one in Asgard with 231 a closed genome. We also identified 139 proteins, in addition to the 80 described originally, that can be 232 considered Eukaryotic Signature Proteins, or ESPs (see next section).

233

234 Eukaryotic features of Asgard archaea gleaned from genome analysis

235 The enrichment of Asgard proteomes with homologs of eukaryote signature proteins (ESPs), such as 236 ESCRTs, components of protein sorting complexes including coat proteins, complete ubiquitin 237 machinery, actins and actin-binding proteins gelsolins and profilins, might be the strongest argument in 238 support of a direct evolutionary relationship between Asgard archaea and eukaryotes (2, 29). However, 239 the definition of ESPs is fuzzy because many of these proteins, in addition to their occurrence in Asgard, 240 are either scattered among several other archaeal genomes, often, from diverse groups (30), or consist of 241 promiscuous domains that are common in archaea, bacteria and eukaryotes, such as WD40 (after the 242 conserved terminal amino acids of the repeat units, also known as beta-transducin repeats), LRR 243 (Leucine-Rich Repeats), TPR (TetratricoPeptide Repeats), HEAT (Huntingtin-EF3-protein phosphatase 244 2A-TOR1) and other, largely, repetitive domains (31, 32). Furthermore, the sequences of some ESPs have 245 diverged to the extent that they become hardly detectable with standard computational methods. Our 246 computational strategy for delineating an extensive yet robust ESP set is described under Materials and 247 Methods. The ESP set we identified contained 505 asCOGs, including 238 that were not closely similar 248 (E-value=10-10, length coverage 75%) to those previously described by Zaremba-Niedzwiedzka et al. 249 (2)(Supplementary Table 7). In a general agreement with previous observations, the majority of these 250 ESPs, 329 of the 505, belonged to the ‘Intracellular trafficking, secretion, and vesicular transport’ (U) 251 functional class, followed by ‘Posttranslational modification, protein turnover, chaperones’ (O), with 101 252 asCOGs (Supplementary Table 7). Among the asCOGs in the U class, 130 were Roadblock/LC7 253 superfamily proteins, including longins, sybindin and profilins, and 94 were small GTPases of several 254 families, such as RagA-like, Arf-like and Rab-like ones, as discussed previously (33).

9

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

255 The phyletic patterns of ESP asCOGs in Asgard archaea are extremely patchy and largely lineage-specific 256 (Fig. 4), indicating that most of the proteins in this set are not uniformly conserved throughout Asgard 257 evolution, but rather, are prone to frequent HGT, gene losses and duplications. These evolutionary 258 processes are correlated in , resulting in the overall picture of highly dynamic evolution (34). 259 Even the most highly conserved ESP asCOG are missing in some Asgard lineages but show multiple 260 duplications in others (Fig. 4 and Supplementary Table 7). Surprisingly, many gaps in the ESPs 261 distribution were detected in the Heimdallarchaeota that include the suspected ancestors of eukaryotes.

262 Characteristically, many ESPs are multidomain proteins, with 37% assigned to more than one asCOG, 263 compared to 17% among non-ESP proteins (Supplementary Table 7). Some multidomain ESPs in Asgard 264 archaea have the same domain organizations as their homologs in eukaryotes, but these are a minority and 265 typically contain only two domains. Examples include the fusion of two EAP30/Vps37 domains (35), and 266 Vps23 and E2 domains (35) in ESCRT complexes, multiple Rag family GTPases, in which longin domain 267 is fused to the GTPase domain, and several others. By contrast, most of the domain architectures of the 268 multidomain ESP proteins were not detected in eukaryotes and often are found only in a narrow subset of 269 Asgard archaea, suggesting extensive domain shuffling during Asgard evolution (Fig. 5a). For example, 270 we identified many proteins containing a fusion of Vps28/Vps23 from ESCRT I complex (35) with C- 271 terminal domains of several homologous subunits of adaptin and COPI coatomer complexes (36, 37), and 272 E3 UFM1-protein ligase 1, which is involved in the UFM1 ubiquitin pathway (38) (Fig. 5a). Generally, a 273 protein with such a combination of domains can be predicted to be involved in ubiquitin-dependent 274 membrane remodeling but, because its domain architecture is unique, the precise function cannot be 275 inferred.

276 The majority of the ESP genes of Asgard archaea do not belong to conserved genomic neighborhoods, but 277 several such putative operons were detected. Perhaps, the most notable one is the ESCRT neighborhood 278 which includes genes coding for subunits of ESCRT I, II and III, and often, components of the ubiquitin 279 system (2), suggesting an ancient link between the two systems that persists in eukaryotes (35). We 280 predicted another operon that is conserved in most Asgard archaea and consists of genes encoding a 281 LAMTOR1-like protein of the Roadblock superfamily, a Rab-like small GTPase, and a protein containing 282 the DENN (differentially expressed in normal and neoplastic cells) domain that so far has been identified 283 only in eukaryotes (Fig. 5b). Two proteins consisting of a DENN domain fused to longin are subunits of 284 the folliculin (FLCN) complex that is conserved in eukaryotes. The FLCN complex is the sensor of amino 285 acid starvation interacting with Rag GTPase and Ragulator lysosomal complex, and a key component of 286 the mTORC1 pathway, the central regulator of cell growth in eukaryotes (39). Some Heimdallarchaea 287 encode several proteins with the exact same domain organization as FLCN (Fig. 5b). Ragulator is a

10

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

288 complex that consists of 5 subunits, each containing the Roadblock domain. In Asgard archaea, however, 289 the GTPase present in the operon is from a family that is distinct from the Rag GTPases, which interact 290 with both FLCN and Ragulator complexes in eukaryotes, despite the fact that Rag family GTPases are 291 abundant in Asgards (33) (Supplementary Table 7). Nevertheless, this conserved module of Asgard 292 proteins is a strong candidate to function as a guanine nucleotide exchange factor for Rab and Rag 293 GTPases, analogously to the eukaryotic FLCN. In eukaryotes, the DENN domain is present in many 294 proteins with different domain architectures that interact with different partners and perform a variety of 295 functions (40, 41). The Asgard archaea also encode other DENN domain proteins, and the respective 296 genes form expanded families of paralogs in Loki, Hel and Heimdall lineages, again, with domain 297 architectures distinct from those in eukaryotes (Fig. 5b) (42).

298 Prompted by the identification of a FLCN-like complex, we searched for other components of the 299 mTORC1 regulatory pathway in Asgard archaea. The GATOR1 complex that consists of three subunits, 300 Depdc5, Nprl2, and Nprl3, is another amino acid starvation sensor that is involved in this pathway in 301 eukaryotes (43). Nitrogen permease regulators 2 and 3 (NPRL2 and NPRL3) are homologous GATOR1 302 subunits that contain a longin domain and a small NPRL2-specific C-terminal domain (43). We identified 303 a protein family with this domain organization in most Thor MAGs and a few Loki MAGs. Several other 304 ESP asCOGs include proteins with high similarity to the longin domain of NPRL2. Additionally, we 305 identified many fusions of the NPRL2-like longin domain with various domains related to prokaryotic 306 two-component signal transduction system (Fig. 5c). Considering the absence of a homolog of 307 phosphatidylinositol 3-kinase, the catalytic domain of the mTOR protein, it seems likely that, in Asgard 308 archaea, the key growth regulation pathway remains centered at typical prokaryotic two-component signal 309 transduction systems whereas at least some of the regulators and sensors in this pathway are “eukaryotic”. 310 The abundance of NPRL2-like longin domains in Asgard archaea implies that the link between this 311 domain and amino acid starvation regulation emerged at the onset of Asgard evolution if not earlier.

312

313 Diverse metabolic repertoires, ancestral metabolism of Asgard archaea, and syntrophic evolution

314 Examination of the distribution of the asCOGs among the 12 Asgard archaeal phyla showed that the 315 metabolic pathway repertoire was conserved among the MAGs of each phylum but differed between the 316 phyla (Fig. 3a). Three distinct lifestyles were predicted by the asCOG analysis for different major 317 branches of Asgard archaea, namely, anaerobic heterotrophy, facultative aerobic heterotrophy, and 318 chemolithotrophy (Fig. 6, Supplementary Figure 9). For the last Asgard archaeal common ancestor

319 (LAsCA), a mixotrophic life style, including both production and consumption of H2, can be inferred

11

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

320 from parsimony considerations (Fig. 6, Supplementary Table 8; see Materials and Methods for further 321 details). Loki-, Thor-, Hermod-, Baldr- and Borrarchaeota encode all enzymes for the complete (archaeal) 322 Wood-Ljungdahl pathway (WLP) and are predicted to oxidize organic substrates, likely, by using the 323 reverse WLP, given the lack of enzymes for oxidation of inorganic compounds (e.g., hydrogen, 324 sulfur/sulfide and nitrogen/ammonia). The genomes of these five Asgard phyla encode homologues of

325 membrane-bound respiratory H2-evolving Group 4 [NiFe] hydrogenase (Supplementary Figure 6) and/or 326 cytosolic cofactor-coupled bidirectional Group 3 [NiFe] hydrogenase (44) (Supplementary Figure 7). 327 Phylogenetic analysis of both group 3 and group 4 [NiFe] hydrogenases showed that Asgard archaea form 328 distinct clades well separated from the functionally characterized hydrogenases, hampering the prediction 329 of their specific functions in Asgard archaea. The functionally characterized group 4 [NiFe] hydrogenases

330 in the Thermococci are involved in the fermentation of organic substrates to H2, acetate and carbon 331 dioxide (45, 46). The presence of group 3 [NiFe] hydrogenases suggests that these Asgard archaea cannot

332 use H2 as an electron donor because they lack the enzyme complex coupling H2 oxidation to membrane

333 potential generation. Thus, in these , bifurcate electrons from H2 are likely to be used to support 334 the fermentation of organic substrates exclusively (45–47).

335 Both Wukongarchaeota genomes (As_075 and As_085) encode a bona fide membrane-bound Group 1k 336 [NiFe] hydrogenase that could mediate hydrogenotrophic respiration using heterodisulfide as the terminal 337 electron acceptor (48, 49) (Fig. 6, Supplementary Figure 8, Supplementary Figure 10). The group 1k 338 [NiFe] hydrogenase is exclusively found in methanogens of the order Methanosarcinales (Euryarchaoeta) 339 (50), and it is the first discovery of the group 1 [NiFe] hydrogenase in the Asgard archaea. 340 Wukongarchaeota also encode all enzymes for a complete WLP and a putative ADP-dependent acetyl- 341 CoA synthetase for acetate synthesis. Unlike all other Asgard archaea, Wukongarchaeota lack genes for 342 citrate cycle and beta-oxidation. Thus, Wukongarchaeota appear to be obligate chemolithotrophic 343 acetogens. The genomes of Wukongarchaeota were discovered only in seawater of the euphotic zone of

344 the Yap trench (0 m and 125 m). Dissolved H2 concentration is known to be the highest in surface 345 seawater, where the active microbial fermentation, compared to deep sea (51), could produce sufficient 346 amounts of hydrogen for the growth of Wukongarchaeota. Hodarchaeota, Gerdarchaeota, Kariarchaeota, 347 and Heimdallarchaeota share a common ancestor with Wukongarchaeota (Fig. 6). However, genome 348 analysis implies different lifestyles for these organisms. Hod-, Gerd- and Kariarchaeota encode various 349 electron transport chain components, including heme/copper-type cytochrome/quinol oxidase, nitrate 350 reductase, and NADH dehydrogenase, most likely, allowing the use of oxygen and nitrate as electron 351 acceptors during aerobic and anaerobic respiration, respectively (44). In addition, Hod-, Gerd- and 352 Heimdallarchaeota encode phosphoadenosine phosphosulfate (PAPS) reductase and adenylylsulfate 353 kinase for sulfate reduction, enabling the use of sulfate as electron acceptor during anaerobic respiration.

12

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

354 Gerd-, Heimdall-, and Hodarchaeota are only found in coastal and deep-sea sedimentary environments, 355 whereas Kariarchaeota were found also in marine water. The versatile predicted metabolic capacities of 356 these groups suggest that Hod-, Gerd- and Kariarchaeota might occupy both anoxic and oxic niches. In 357 contrast, Heimdallarchaeota appear to be able to thrive only in anoxic environments.

358 In the widely considered syntrophy scenarios (52), eukaryogenesis has been proposed to involve 359 metabolic symbiosis (syntrophy) between an archaeon and one or two bacterial partners which, in the

360 original hydrogen/syntrophy hypothesis, were postulated to donate H2 for methane or hydrogen sulfide 361 production by the consortium (53, 54). The syntrophic scenarios were boosted by the discovery of 362 apparent syntrophy between Candidatus P. syntrophicum and Deltaproteobacteria which led to the 363 proposal of the Etangle-Engulf-Endogenize (E3) model of eukaryogenesis (17, 55). Reconstruction of the 364 Lokiarchaeon metabolism has suggested that this organism was hydrogen-dependent, in accord with the 365 hydrogen-syntrophic scenarios (54). In contrast, subsequent analysis of the metabolic potentials of 4

366 Asgard phyla has led to the inference that these organisms were primarily organoheterotrophic and H2- 367 producing, the ‘reverse flow model’ model of protoeukaryote energy metabolism that involves electron or 368 hydrogen flow from an Asgard archaeon to the alphaproteobacterial ancestor of mitochondria, in the 369 opposite direction from that in the original hydrogen-syntrophy hypotheses (44). Here, we discovered a 370 deeply branching Asgard group, Wukongarchaeota, which appears to include obligate hydrogenotrophic 371 acetogens, suggesting the possibility of the LAsCA being a hydrogen-dependent autotroph (Fig. 6). This

372 finding suggests that LAsCA both produced and consumed H2. Thus, depending on the exact relationship 373 between Asgard archaea and eukaryotes that remains to be elucidated, our findings could be compatible

374 with different syntrophic scenarios that postulate H2 transfer from bacteria to the archaeal symbiont or in 375 the opposite direction.

376

377 Conclusions

378 The Asgard archaea that were discovered only 5 years ago as a result of the painstaking assembly of 379 several Loki Castle metagenomes have grown into a highly diverse archaeal superphylum. The most 380 remarkable feature of the Asgards is their apparent evolutionary affinity with eukaryotes that has been 381 buttressed by two independent lines of evidence: phylogenetic analysis of highly conserved genes and 382 detection of multiple ESPs that are absent or far less common in other archaea. The 75 MAGs added here 383 substantially expand the phylogenetic and metabolic diversity of the Asgard superphylum. The extended 384 set of Asgard genomes provides for a phylogeny based on a far more representative species sampling than 385 available previously and a substantially expanded ESP analysis employing powerful computational

13

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

386 methods. This extended analysis reveals 6 putative additional Asgard phyla but does not immediately 387 clarify the Asgard-eukaryote relationship. Indeed, phylogenetic analysis of conserved genes in an 388 expanded set of archaea, bacteria and eukaryotes yields conflicting 3D and 2D signals, with an 389 unexpected preference for the 3D topology. Thus, the conclusion that eukaryotes emerged from within 390 Asgard archaea, in particular, from the Heimdall lineage, appears to be premature. Further phylogenomic 391 study with an even broader representation of diverse archaeal lineages as well as, possibly, even more 392 sophisticated evolutionary models are required to clarify the relationships between archaea and 393 eukaryotes.

394 Our analysis of Asgard genomes substantially expanded the set of ESPs encoded by this group of archaea 395 and revealed numerous, complex domain architectures of these proteins. These results further emphasize 396 the excess of ESPs in Asgards compared to other archaea and provide additional support to the conclusion 397 that most of the Asgard ESPs are involved in membrane remodeling and intracellular trafficking. 398 However, in parallel with the phylogenomic results, detailed analysis of the ESPs reveals a complex 399 picture. Most of the multidomain Asgard ESPs possess domain architectures distinct from typical 400 eukaryotic ones and some of these arrangements include signature prokaryotic domains, suggesting 401 substantial functional differences from the respective eukaryotic systems. Furthermore, virtually all the 402 ESPs show patchy distributions in Asgard and other archaea, indicative of a history of extensive HGT, 403 gene losses and paralogous family expansion. All these findings seem to be best compatible with the 404 model of a dispersed, dynamic archaeal ‘eukaryome’ (30) that widely spreads among archaea via HGT, so 405 far reaching the highest ESP density in the Asgard archaea.

406 The results of this work cannot rule out the possibility of the emergence of eukaryotes from within the 407 Asgard but seem to be better compatible with a different evolutionary scenario under which the conserved 408 core of eukaryote genes involved in informational processes originates from an as yet unknown ancestor 409 group that might be a deep archaeal branch or could lie outside the presently characterized archaeal 410 diversity. These hypothetical ancestral forms might have accumulated components of the mobile archaeal 411 ‘eukaryome’ to an even greater extent than the Asgards archaea, eventually, giving rise to eukaryote-like 412 cells, likely, via a form of syntrophy with one or more bacterial partners. Combined genomic and 413 (undoubtedly, far more challenging) biological study of diverse archaea is essential for further advancing 414 our understanding of eukaryogenesis.

415 416 Author contributions

14

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

417 ML, EVK, KSM and YL initiated the study;; YL performed metagenomic assembly, binning, metabolism 418 analysis; KSM, AN and YIW performed comparative genomic analysis; YL, KSM, YIW and WCH 419 performed phylogenetic analysis; KSM and YIW constructed asCOGs; KSM, YIW, YL, ML, and EVK 420 analyzed the data; YL, KSM, WCH, EVK, and ML wrote the manuscript that was read, edited and 421 approved by all authors.

422

423 Acknowledgements

424 ML and YL are supported by National Natural Science Foundation of China (Grant No. 91851105, 425 31970105 and 31700430), the Key Project of Department of Education of Guangdong Province 426 (No.2017KZDXM071), and the Shenzhen Science and Technology Program (Grant no. 427 JCYJ20170818091727570 and KQTD20180412181334790). KSM, YIW, SAS and EVK are supported 428 by the Intramural Research Program of the National Institutes of Health of the USA (National Library of 429 Medicine).

430

431 Competing interests

432 The authors declare no competing interests

15

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

433 References

434 1. A. Spang, J. H. Saw, S. L. Jørgensen, K. Zaremba-Niedzwiedzka, J. Martijn, A. E. Lind, 435 R. van Eijk, C. Schleper, L. Guy, T. J. G. Ettema, Complex archaea that bridge the gap between 436 prokaryotes and eukaryotes. Nature. 521, 173–179 (2015).

437 2. K. Zaremba-Niedzwiedzka, E. F. Caceres, J. H. Saw, D. B ckstr m, L. Juzokaite, E. 438 Vancaester, K. W. Seitz, K. Anantharaman, P. Starnawski, K. U. Kjeldsen, M. B. Stott, T. 439 Nunoura, J. F. Banfield, A. Schramm, B. J. Baker, A. Spang, T. J. G. Ettema, Asgard archaea 440 illuminate the origin of eukaryotic cellular complexity. Nature. 541, 353–358 (2017).

441 3. F. MacLeod, G. S. Kindler, H. L. Wong, R. Chen, B. P. Burns, Asgard archaea: 442 Diversity, function, and evolutionary implications in a range of microbiomes. AIMS Microbiol. 443 5, 48–61 (2019).

444 4. M. Cai, Y. Liu, X. Yin, Z. Zhou, M. W. Friedrich, T. Richter-Heitmann, R. Nimzyk, A. 445 Kulkarni, X. Wang, W. Li, J. Pan, Y. Yang, J.-D. Gu, M. Li, Diverse Asgard archaea including 446 the novel phylum Gerdarchaeota participate in organic matter degradation. Sci. China Life Sci. 447 63, 886–897 (2020).

448 5. T. A. Williams, C. J. Cox, P. G. Foster, G. J. Szöllősi, T. M. Embley, Phylogenomics 449 provides robust support for a two-domains tree of life. Nat Ecol Evol (2019), 450 doi:10.1038/s41559-019-1040-x.

451 6. T. A. Williams, P. G. Foster, C. J. Cox, T. M. Embley, An archaeal origin of eukaryotes 452 supports only two primary domains of life. Nature. 504, 231–236 (2013).

453 7. C. J. Cox, P. G. Foster, R. P. Hirt, S. R. Harris, T. M. Embley, The archaebacterial origin 454 of eukaryotes. Proc Natl Acad Sci U S A. 105, 20356–20361 (2008).

455 8. N. Yutin, K. S. Makarova, S. L. Mekhedov, Y. I. Wolf, E. V. Koonin, The deep archaeal 456 roots of eukaryotes. Mol. Biol. Evol. 25, 1619–1630 (2008).

457 9. V. Da Cunha, M. Gaia, D. Gadelle, A. Nasir, P. Forterre, Lokiarchaea are close relatives 458 of Euryarchaeota, not bridging the gap between prokaryotes and eukaryotes. PLoS Genet. 13, 459 e1006810 (2017).

460 10. V. D. Cunha, M. Gaia, A. Nasir, P. Forterre, Asgard archaea do not close the debate 461 about the universal tree of life topology. PLoS Genet. 14, e1007215 (2018).

462 11. P. Forterre, The universal tree of life: an update. Front. Microbiol. 6, 717 (2015).

463 12. J. Lombard, P. López-García, D. Moreira, The early evolution of lipid membranes and 464 the three domains of life. Nat. Rev. Microbiol. 10, 507–515 (2012).

465 13. J. A. Burns, A. A. Pittis, E. Kim, Gene-based predictive models of trophic modes suggest 466 Asgard archaea are not phagocytotic. Nat. Ecol. Evol. 6, a016006 (2018).

16

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

467 14. C. Akıl, R. C. Robinson, Genomes of Asgard archaea encode profilins that regulate actin. 468 Nature. 562, 439–443 (2018).

469 15. C. Akıl, L. T. Tran, M. Orhant-Prioux, Y. Baskaran, E. Manser, L. Blanchoin, R. C. 470 Robinson, Insights into the evolution of regulated actin dynamics via characterization of 471 primitive gelsolin/cofilin proteins from Asgard archaea. Proc Natl Acad Sci U S A. 117, 19904– 472 19913 (2020).

473 16. Z. Lu, T. Fu, T. Li, Y. Liu, S. Zhang, J. Li, J. Dai, E. V. Koonin, G. Li, H. Chu, M. Li, 474 Coevolution of Eukaryote-like Vps4 and ESCRT-III Subunits in the Asgard Archaea. mBio. 11, 475 e00417-20, /mbio/11/3/mBio.00417-20.atom (2020).

476 17. H. Imachi, M. K. Nobu, N. Nakahara, Y. Morono, M. Ogawara, Y. Takaki, Y. Takano, 477 K. Uematsu, T. Ikuta, M. Ito, Y. Matsui, M. Miyazaki, K. Murata, Y. Saito, S. Sakai, C. Song, E. 478 Tasumi, Y. Yamanaka, T. Yamaguchi, Y. Kamagata, H. Tamaki, K. Takai, Isolation of an 479 archaeon at the -eukaryote interface. Nature. 577, 519–525 (2020).

480 18. S. M. Karst, M. S. Dueholm, S. J. McIlroy, R. H. Kirkegaard, P. H. Nielsen, M. 481 Albertsen, Nat. Biotechnol., in press, doi:10.1038/nbt.4045.

482 19. R.-Y. Zhang, B. Zou, Y.-W. Yan, C. O. Jeon, M. Li, M. Cai, Z.-X. Quan, Design of 483 targeted primers based on 16S rRNA sequences in meta-transcriptomic datasets and 484 identification of a novel taxonomic group in the Asgard archaea. BMC microbiology. 20, 25 485 (2020).

486 20. K. S. Makarova, Y. I. Wolf, E. V. Koonin, Towards functional characterization of 487 archaeal genomic dark matter. Biochem Soc Trans. 47, 389–398 (2019).

488 21. E. V. Koonin, Y. I. Wolf, Genomics of bacteria and archaea: the emerging dynamic view 489 of the prokaryotic world. Nucleic Acids Res. 36, 6688–6719 (2008).

490 22. A. R. Nabhan, I. N. Sarkar, The impact of taxon sampling on phylogenetic inference: a 491 review of two decades of controversy. Brief Bioinform. 13, 122–134 (2012).

492 23. R. Gouy, D. Baurain, H. Philippe, Rooting the tree of life: the phylogenetic jury is still 493 out. Philos Trans R Soc Lond B Biol Sci. 370, 20140329 (2015).

494 24. J. Felsenstein, Inferring Phylogenies (Oxford University Press, Oxford, New York, 2nd 495 Edition., 2003).

496 25. F. D. Ciccarelli, T. Doerks, C. von Mering, C. J. Creevey, B. Snel, P. Bork, Toward 497 automatic reconstruction of a highly resolved tree of life. Science (New York, N.Y.). 311, 1283– 498 1287 (2006).

499 26. C. Rinke, M. Chuvochina, A. J. Mussig, P.-A. Chaumeil, D. W. Waite, W. B. Whitman, 500 D. H. Parks, P. Hugenholtz, bioRxiv, in press, doi:10.1101/2020.03.01.972265.

17

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

501 27. K. S. Makarova, Y. I. Wolf, E. V. Koonin, Archaeal Clusters of Orthologous Genes 502 (arCOGs): An Update and Application for Analysis of Shared Features between 503 Thermococcales, Methanococcales, and Methanobacteriales. Life (Basel). 5, 818–840 (2015).

504 28. A. B. Narrowe, A. Spang, C. W. Stairs, E. F. Caceres, B. J. Baker, C. S. Miller, T. J. G. 505 Ettema, Complex Evolutionary History of Translation Elongation Factor 2 and Diphthamide 506 Biosynthesis in Archaea and Parabasalids. Genome Biol Evol. 10, 2380–2393 (2018).

507 29. L. Eme, A. Spang, J. Lombard, C. W. Stairs, T. J. G. Ettema, Archaea and the origin of 508 eukaryotes. Nat. Rev. Microbiol. 15, 711–723 (2017).

509 30. E. V. Koonin, N. Yutin, The dispersed archaeal eukaryome and the complex archaeal 510 ancestor of eukaryotes. Cold Spring Harb Perspect Biol. 6, a016188 (2014).

511 31. R. L. Tatusov, N. D. Fedorova, J. D. Jackson, A. R. Jacobs, B. Kiryutin, E. V. Koonin, D. 512 M. Krylov, R. Mazumder, S. L. Mekhedov, A. N. Nikolskaya, B. S. Rao, S. Smirnov, A. V. 513 Sverdlov, S. Vasudevan, Y. I. Wolf, J. J. Yin, D. A. Natale, The COG database: an updated 514 version includes eukaryotes. BMC Bioinformatics. 4, 41 (2003).

515 32. E. V. Koonin, N. D. Fedorova, J. D. Jackson, A. R. Jacobs, D. M. Krylov, K. S. 516 Makarova, R. Mazumder, S. L. Mekhedov, A. N. Nikolskaya, B. S. Rao, I. B. Rogozin, S. 517 Smirnov, A. V. Sorokin, A. V. Sverdlov, S. Vasudevan, Y. I. Wolf, J. J. Yin, D. A. Natale, A 518 comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. 519 Genome Biol. 5, R7 (2004).

520 33. C. M. Klinger, A. Spang, J. B. Dacks, T. J. G. Ettema, Tracing the Archaeal Origins of 521 Eukaryotic Membrane-Trafficking System Building Blocks. Mol Biol Evol. 33, 1528–1541 522 (2016).

523 34. P. Puigbò, A. E. Lobkovsky, D. M. Kristensen, Y. I. Wolf, E. V. Koonin, Genomes in 524 turmoil: quantification of genome dynamics in prokaryote supergenomes. BMC Biol. 12, 66 525 (2014).

526 35. L. Christ, C. Raiborg, E. M. Wenzel, C. Campsteijn, H. Stenmark, Cellular Functions and 527 Molecular Mechanisms of the ESCRT Membrane-Scission Machinery. Trends Biochem Sci. 42, 528 42–56 (2017).

529 36. N. Gomez-Navarro, E. Miller, Protein sorting at the ER-Golgi interface. J Cell Biol. 215, 530 769–778 (2016).

531 37. K. M. K. Kibria, J. Ferdous, R. Sardar, A. Panda, D. Gupta, A. Mohmmed, P. Malhotra, 532 A genome-wide analysis of coatomer protein (COP) subunits of apicomplexan parasites and their 533 evolutionary relationships. BMC Genomics. 20, 98 (2019).

534 38. Y. Wei, X. Xu, UFMylation: A Unique & Fashionable Modification for Life. Genomics 535 Proteomics Bioinformatics. 14, 140–146 (2016).

536 39. R. E. Lawrence, S. A. Fromm, Y. Fu, A. L. Yokom, D. J. Kim, A. M. Thelen, L. N. 537 Young, C.-Y. Lim, A. J. Samelson, J. H. Hurley, R. Zoncu, Structural mechanism of a Rag

18

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

538 GTPase activation checkpoint by the lysosomal folliculin complex. Science. 366, 971–977 539 (2019).

540 40. M.-Y. Su, S. A. Fromm, R. Zoncu, J. H. Hurley, Structure of the C9orf72 ARF GAP 541 complex that is haploinsufficient in ALS and FTD. Nature. 585, 251–255 (2020).

542 41. N. de Martín Garrido, C. H. S. Aylett, Nutrient Signaling and Lysosome Positioning 543 Crosstalk Through a Multifunctional Protein, Folliculin. Front Cell Dev Biol. 8, 108 (2020).

544 42. A. L. Marat, H. Dokainish, P. S. McPherson, DENN domain proteins: regulators of Rab 545 GTPases. J Biol Chem. 286, 13791–13800 (2011).

546 43. K. Shen, R. K. Huang, E. J. Brignole, K. J. Condon, M. L. Valenstein, L. Chantranupong, 547 A. Bomaliyamu, A. Choe, C. Hong, Z. Yu, D. M. Sabatini, Architecture of the human GATOR1 548 and GATOR1-Rag GTPases complexes. Nature. 556, 64–69 (2018).

549 44. A. Spang, C. W. Stairs, N. Dombrowski, L. Eme, J. Lombard, E. F. Caceres, C. 550 Greening, B. J. Baker, T. J. G. Ettema, Proposal of the reverse flow model for the origin of the 551 eukaryotic cell based on comparative analyses of Asgard archaeal metabolism. Nat. Microbiol. 4, 552 1138–1148 (2019).

553 45. H. Yu, C.-H. Wu, G. J. Schut, D. K. Haja, G. Zhao, J. W. Peters, M. W. W. Adams, H. 554 Li, Structure of an Ancient Respiratory System. Cell. 173, 1636-1649.e16 (2018).

555 46. G. J. Schut, E. S. Boyd, J. W. Peters, M. W. W. Adams, The modular respiratory 556 complexes involved in hydrogen and sulfur metabolism by heterotrophic hyperthermophilic 557 archaea and their evolutionary implications. FEMS Microbiol Rev. 37, 182–203 (2013).

558 47. F. O. Bryant, M. W. Adams, Characterization of hydrogenase from the hyperthermophilic 559 archaebacterium, Pyrococcus furiosus. J Biol Chem. 264, 5070–5079 (1989).

560 48. U. Deppenmeier, M. Blaut, B. Schmidt, G. Gottschalk, Purification and properties of a 561 F420-nonreactive, membrane-bound hydrogenase from Methanosarcina strain Gö1. Arch 562 Microbiol. 157, 505–511 (1992).

563 49. R. K. Thauer, A.-K. Kaster, M. Goenrich, M. Schick, T. Hiromoto, S. Shima, 564 Hydrogenases from methanogenic archaea, nickel, a novel cofactor, and H2 storage. Annu Rev 565 Biochem. 79, 507–536 (2010).

566 50. D. Søndergaard, C. N. S. Pedersen, C. Greening, HydDB: A web tool for hydrogenase 567 classification and analysis. Scientific Reports. 6, 34212 (2016).

568 51. R. Conrad, W. Seiler, Methane and hydrogen in seawater (Atlantic Ocean). Deep Sea 569 Res. Part I Oceanogr. Res. Pap. 35, 1903–1917 (1988).

570 52. P. López-García, D. Moreira, The Syntrophy hypothesis for the origin of eukaryotes 571 revisited. Nat Microbiol. 5, 655–667 (2020).

572 53. W. Martin, M. Müller, The hydrogen hypothesis for the first eukaryote. Nature. 392, 37– 573 41 (1998).

19

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

574 54. D. Moreira, P. López-García, Symbiosis Between Methanogenic Archaea and δ- 575 as the Origin of Eukaryotes: The Syntrophic Hypothesis. J Mol Evol. 47, 517–530 576 (1998).

577 55. P. López-García, D. Moreira, Cultured Asgard Archaea Shed Light on Eukaryogenesis. 578 Cell. 181, 232–235 (2020). 579

580

20

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

581

582 .

583

21

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

584

585 Figures

586

587 Figure 1. Gene commonality plots for Asgard archaea and the TACK superphylum.

588 The gene commonality plot showing the number of asCOGs in log scale (Y-axis) that include the given 589 fraction of analyzed genomes (X-axis). The Asgard plot is compared with the TACK superphylum plot 590 based on assignment of TACK genomes to arCOGs.

591

592 Figure 2. Phylogenetic analysis of Asgard archaea and their relationships with eukaryotes.

593 (a) Maximum likelihood tree, inferred with IQ-tree and LG+F+R10 model, constructed from 594 concatenated alignments of the protein sequences from 209 core Asgard Clusters of Orthologs (asCOGs). 595 Only the 12 phylum-level clades are shown, with species within each clade collapsed. See supplementary 596 Methods and Supplementary Table 5 for details.

597 (b) Maximum likelihood tree, inferred with IQ-tree and SYM+R8 model, based on 16S rRNA gene 598 sequences. Red stars in (b) denote MAGs reconstructed in the current study.

599 (c) Phylogenetic tree of bacteria, archaea and eukaryotes, inferred with IQ-tree under LG+R10 model, 600 constructed from concatenated alignments of the protein sequences of 30 universally conserved genes 601 (see Material and Methods for details). The tree shows the relationships between the major clades.

602 The trees are unrooted and are shown in a pseudorooted form for visualization purposes only. The actual 603 trees and alignments are in Additional data file 2 and list of the trees are provided in the Supplementary 604 Table 4 and 5.

605

606 Figure 3. Phyletic patterns of asCOGs and functional distribution of Asgard core genes.

607 (a) Classical Multidimensional Scaling analysis of binary presence-absence phyletic patterns for 13,939 608 asCOGs that are represented in at least two genomes (see Material and Methods for details).

609 (b) Functional breakdown of Asgard core genes (378 asCOGs) compared with TACK superphylum core 610 genes (489 arCOGs). The values were normalized as described in Materials and Methods. Functional 611 classes of genes: J, Translation, ribosomal structure and biogenesis; K, Transcription; L, Replication,

22

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

612 recombination and repair; D, Cell cycle control, cell division, chromosome partitioning; V, Defense 613 mechanisms; T, Signal transduction mechanisms; M, Cell wall/membrane/envelope biogenesis; N, Cell 614 motility; U, Intracellular trafficking, secretion, and vesicular transport; O, Posttranslational modification, 615 protein turnover, chaperones; X, Mobilome: prophages, transposons; C, Energy production and 616 conversion; G, Carbohydrate transport and metabolism; E, Amino acid transport and metabolism; F, 617 Nucleotide transport and metabolism; H, Coenzyme transport and metabolism; I, Lipid transport and 618 metabolism; P, Inorganic ion transport and metabolism; Q, Secondary metabolites biosynthesis, transport 619 and catabolism; R, General function prediction only; S, Function unknown;

620

621 Figure 4. Phyletic patterns of Eukaryotic Signature Proteins (ESPs) encoded in Asgard genomes.

622 All 505 ESP asCOGs are grouped by distance between binary presence-absence phyletic patterns. The 623 most highly conserved ESP asCOGs are shown within the red rectangle. Below the plot of the number of 624 ESP domains in each genome is shown. For details, see Supplementary Table 7. 625

626 Figure 5. Domain architectures of selected Asgard ESPs.

627 (a) ESPs with unique domain architectures. The schematic of each multidomain protein is roughly 628 proportional to the respective protein length. The identified domains are shown inside the arrows 629 approximately according to their location and are briefly annotated. Homologous domains are shown by 630 the same color or pattern.

631 (b) DENN domain proteins in Asgards. Upper part of the figure above the dashed line shows putative 632 operon encoding DENN domain proteins. Genes are shown by block arrows with the length proportional 633 to the size of the corresponding protein. For each protein, the nucleotide contig or genome partition 634 accession number, Asgard genome ID and lineage are indicated. The part of the figure below the dashed 635 line shows domain organization of diverse proteins containing DENN domain. Homologous domains are 636 shown by the same color. The inset on the right side explains identity of the domains

637 (c) NPRL2-like proteins in Asgards. Designations are the same as in Figure 4a.

638

639 Figure 6. Reconstruction and evolution of the key metabolic processes in Asgard archaea.

640 The schematic phylogeny of Asgard archaea is from Figure 1a. 641 LAsCA, Last Asgard Common Ancestor; WLP, Wood-Ljungdahl pathway.

23

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

642 Supplementary Materials 643 Taxonomic description 644 Materials and Methods 645 Supplementary References 646 Tables S1-S8 647 Figures S1-S10 648 Additional Data files are available at ftp://ftp.ncbi.nih.gov/pub/wolf/_suppl/asgard20/

649 Additional data file 1 (Additional_data_file_1.tgz): Complete asCOG data archive

650 Additional data file 2 (Additional_data_file_2.tgz): Phylogenetic trees and alignments archive

651

24

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0.1

0.01

0.001

0.0001 Asgard TACK a b 99.8 Euryarchaeota (231) + DPANN (22) Korarchaeota_Candidatus_Korarchaeum_cryptofilum_OPF8 As_130 Baldrarchaeota Bathyarchaeota (2) + Thaumarchaeota (13) +Aigarchaeota (1) 91.8 100 Thorarchaeota 100 Crenarchaeota (41)

100 As_143 Odinarchaeota As_006_00990 100 100 As_098_03035 Helarchaeota Hela_A_Ga0180301_100789461 100 Lokiarchaeota (28) As_168 100 100 100 Hermodarchaeota As_095_02125 Hermodarchaeota As_086_02466 100 Thorarchaeota (28) 100 Odinarchaeota KP091041 99 JX000838 100 100 JQ817966 100 AB797473 Baldrarchaeota 96 97.4 As_104_02716 As_139 92 Heimdallarchaeota 99.9 As_138 AB797480 94.2 DQ640139 99 OBEP010863780 90.9 98.8 OBEP011359260 100 Lokiarchaeota As_001_00761 99.6 As_141 94.9 97.9 GU553642 100 100 100 Asgard-2 (4) 99.8 As_030_00678 OBEP010743007 100 OBEP010855942 Kariarchaeota 100 100 Helarchaeota 92.3 OBEP010696748 OBEP010845664 100 Gerdarchaeota (11) 100 Borrarchaeota 100 Asgard-4 (9) Former Heimdallarchaeota clade As_105_01019 100 Heimdallarchaeota 100 As_090_03462 99.9 As_109_01519 100 100 OBEP011316084 100 OBEP010918415 100 Kariarchaeota JQ989558 100 100 JX000774 As_003_05426 MDVS01000157 100 100 Gerdarchaeota OBEP010704239 98.5 100 OBEP010794268 Hodarchaeota 99.5 99.4 OBEP010919744 100 OBEP010663114 Hodarchaeota 94.3 As_091_00788

91.3 OBEP011300513 99.7 GQ410928 99.3 100 Wukongarchaeota As_031_03504 OBEP010852077 As_085_00438 Wukongarchaeota

Tree scale: 0.1

Heimdallarchaeota

c Remaining Euryarchaea

Thermococci/Class 1 methanogens

DPANN bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

TACK

Asgard

Eukaryotes

Bacteria

0.5 bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made A available under aCC-BY-ND 4.0 International license.

0.6

Heimdall 0.5

Gerd Kari 0.4

Hod 0.3

Wukong 0.2 Borr 0.1 A2 Hermod Odin Baldr 0 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 Loki -0.1 Hel

-0.2 Thor -0.3

-0.4 A1

B

0.0 0.1 0.2 0.3 0.4 J K L O D N U C 378 E F asCOGs G H I M P Q T V X R S

Asgard only Shared TACK only bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. Borr Odin Kari Thor Hermod Baldr Loki Hel Heimdall Herd Hod Wukong

ESCRT I, II, III; actin, gelsolins and profilins; ubiquitin-system components E1,E2,E3;

600

500

400

300

200

100

0 bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made A available under aCC-BY-ND 4.0 International license.

ZnRx3 Coiled-coil PH vps23 core Loki (MK-D1) QEE16267.1

ZnRx3 E3 UFM1-protein ligase 1/PCI domain adaptin C-terminal Vps28 C-terminal QEE15500.1 Loki (MK-D1)

E3 UFM1-protein ligase 1/PCI domain adaptin C-terminal Vps28 C-terminal longin As_098-p_00119 Hel

TPR repeats adaptin C-terminal Vps28 C-terminal longin As_047-p_01971 Hel PCI-like WD-40 repeats 2 HTH only adaptin C-terminal EAP30 As_050p_03916 Hel

Integrin repeats adaptin C-terminal E3 UFM1-protein ligase 1/PCI domain As_098-p_04464 Hel

Integrin repeats adaptin C-terminal (TRAP_B) Rab GTPase As_043-p_00502 Thor

E3 UFM1-protein ligase 1/PCI domain Znr Znr PCI domain? RING As_051-p_00042 Hod

B

As_006 (565499..563083) Odin DENN Loki MK-D1 As_047 (462..2957) As_128 (726660..729529) Hel Lomgin As_071 (32004..29543) Thor Rab-like GTPAse

Roadblock family LAMTOR1-like

HTH Thor Gerd Heimdall As_071-p_03822 As_053-p_02405 As_001-p_01989 7TM

Ig-like domain Gerd Gerd Gerd As_053-p_01688 As_062-p_00879 As_011-p_01894

C

Hel NPRL2 Loki Hel subfamily As_048 Histidine As_040 As_050 longin kinase 24903..23378 4909..2036 99086..101594 NPRL2 C-terminal Rec domain 7TM Gerd Hod longin As_053 As_051 PAS 160513..162485 572349..567823

Rag GTPase MASE3 Hel Arf GTPase As_047 KaiC 280..5396 Ras GTPase HTH

Odin 7TM sensor As_006 WsaG-like 52317..45192 Membrane protein, possibly involved in polysaccharide utilization

PAS Loki Loki As_004-p_02022 Loki (MK-D1) Loki As_119-p_00144 As_028-p_0166 QEE16479.1 As_094-p_01664 bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

CO2 Anaerobic heterotroph Thor-, Hermod-, Odin-, Baldr-, Loki-, Hel-, Borr-, Heimdallarchaeota Acetate Reverse WLP CO H+ 2 Facultative aerobic heterotroph Thorarchaeota Acetate Kari-, Gerd-, Hodarchaeota Acetyl-CoA H2 H2 e- e- Chemolithotroph Hermodarchaeota Wukongarchaeota β-oxidation Glycolysis WLP Odinarchaeota Alphaproteobacterium Lipids, fatty acids, sugars, alcohols, alkane/aromatic compounds Mixotrophic LAsCA Baldrarchaeota + CO2 H2 H CO2

Acetate Lokiarchaeota Acetate WLP + Reverse WLP CO H Alkane metabolism H+ 2 H uptake Acetate 2 Helarchaeota Acetyl-CoA H Acetyl-CoA 2 H H CO2 2 2 - - H2 production e e Borrarchaeota β-oxidation Glycolysis β-oxidation Glycolysis WLP, Glycolysis,

H2 uptake, Alphaproteobacterium Organics H production(?), Heimdallarchaeota O 2 O metabolism Lipids, fatty acids, sugars, alcohols 2 β-oxidation, 2 acetogenesis WLP Kariarchaeota

+ CO2 H2 H

O metabolism Gerdarchaeota 2 e- Acetate H uptake 2 WLP H2 LAsCA CO Hodarchaeota 2 Acetyl-CoA Fermentation Loss of metabolic potential β-oxidation, H2 production Wukongarchaeota Gluconeogenesis Gain of metabolic potential Organics Lipids, fatty acids, sugars, alcohols Deltaproteobacterium bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

1 Expanding diversity of Asgard archaea and the elusive ancestry of eukaryotes

2 Yang Liu1†, Kira S. Makarova2†, Wen-Cong Huang1†, Yuri I. Wolf2, Anastasia Nikolskaya2, Xinxu 3 Zhang1, Mingwei Cai1, Cui-Jing Zhang1, Wei Xu3, Zhuhua Luo3, Lei Cheng4, Eugene V. Koonin2*, Meng 4 Li1*

5 1 Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen 6 University, Shenzhen, Guangdong, 518060, P. R. China

7 2 National Center for Biotechnology Information, National Library of Medicine, National Institutes of 8 Health, Bethesda, Maryland 20894, USA

9 3 State Key Laboratory Breeding Base of Marine Genetic Resources, Key Laboratory of Marine Genetic 10 Resources, Fujian Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, State 11 Oceanic Administration, Xiamen 361005, P. R. China

12 4 Key Laboratory of Development and Application of Rural Renewable Energy, Biogas Institute of 13 Ministry of Agriculture, Chengdu 610041, P.R. China

14 † These authors contributed equally to this work.

15 *Authors for correspondence: [email protected] or [email protected]

16

1

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

17 Supplementary Information

18 Taxonomic Description of new taxa…………………………………………………………………..….1

19 Materials and Methods…………………………………………………………………………..………..9

20 Supplementary references…...……………………………………………………………………………15

21 Supplementary Figures…………………………………………………………………………………...19

22 Supplementary Tables………………………………………………………………………...…….…….21

23

24

2

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

25 Taxonomic Description of new taxa

26 ‘Candidatus Wukongarchaeum’ (Wu.kong.ar.chae’um. N. L. n. Wukong a legendary Chinese figure, 27 also known as Monkey King, who caused havoc in the heavenly palace); N.L. neut. N. archaeum (from 28 Gr. adj. archaios ancient) archaeon; N. L. neut. N. Wukongarchaeum.

29 ‘Candidatus Wukongarchaeum yapensis’ (yap’ensis N. L. masc. adj. pertaining to Yap trench, which is 30 the geographical position where the first type material of this species was obtained). Type material is the 31 genome designated as As_085 (Yap4.bin4.70) representing ‘Candidatus Wukongarchaeum yapensis’. The 32 genome “As_085” represents a MAG consisting of 2.16 Mbps in 277 contigs with an estimated 33 completeness of 92.52%, an estimated contamination of 4.05%, a 16S and 23S rRNA gene and 14 tRNAs. 34 The MAG recovered from a marine water metagenome (Yap trench, Western Pacific), with an estimated 35 depth of coverage of 31.4, has a GC content of 38%.

36 ‘Candidatus Hodarchaeum’ (Hod.ar.chae’um. N. L. n. Hod a son of Odin in Norse mythology); N.L. 37 neut. N. archaeum (from Gr. adj. archaios ancient) archaeon; N. L. neut. N. Hodarchaeum.

38 ‘Candidatus Hodarchaeum mangrovi’ (man.gro’vi N.L. fem. n. of a mangrove, referring to the isolation 39 of the type material from mangrove soil). Type material is the genome designated as As_027 40 (FT2_5_011) representing ‘Candidatus Hodarchaeum mangrovi’. The genome “As_027” represents a 41 MAG consisting of 4.01 Mbps in 348 contigs with an estimated completeness of 93.61%, an estimated 42 contamination of 0.93%, a 23S rRNA gene and 14 tRNAs. The MAG recovered from mangrove sediment 43 metagenomes (Futian Nature Reserve, China), with an estimated depth of coverage of 17.9, has a GC 44 content of 32.9%.

45 ‘Candidatus Kariarchaeum’ (Ka.ri.ar.chae’um. N. L. n. Kari the god of wind and brother to Aegir in 46 Norse mythology); N.L. neut. N. archaeum (from Gr. adj. archaios ancient) archaeon; N. L. neut. N. 47 Kariarchaeum.

48 ‘Candidatus Kariarchaeum pelagius’ (pe.la’gi.us. L. masc. adj. of or belonging to the sea, referring to 49 the isolation of the type material from the Ocean). Type material is the genome designated as As_030 50 (RS678) representing ‘Candidatus Kariarchaeum pelagius’. The genome “As_030” represents a MAG 51 consisting of 1.41 Mbps in 76 contigs, an estimated completeness of 83.18%, with an estimated 52 contamination of 1.87%, a 23S, 16S and 5S rRNA genes and 18 tRNAs. The MAG recovered from a 53 marine metagenome (Saudi Arabia: Red Sea) has a GC content of 30.11%.

54 ‘Candidatus Borrarchaeum’ (Borr.ar.chae’um. N. L. n. Borr a creator god and father of Odin); N.L. 55 neut. N. archaeum (from Gr. adj. archaios ancient) archaeon; N. L. neut. N. Borrarchaeum.

56 ‘Candidatus Borrarchaeum yapensis’ (yap’ensis N. L. masc. adj. pertaining to Yap trench, which is the 57 geographical position where the first type material of this species was obtained). Type material is the 58 genome designated as As_181 (Yap2000.bin9.141) representing ‘Candidatus Borrarchaeum yapensis’. 59 The genome “As_181” represents a MAG consisting of 3.63 Mbps in 125 contigs, with an estimated 60 completeness of 95.02%, an estimated contamination of 5.61% and 11 tRNAs. The MAG, recovered from 61 a marine water metagenome (Yap trench, Western Pacific) with an estimated depth coverage of 15.04, has 62 a GC content of 37.1%.

3

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

63 ‘Candidatus Baldrarchaeum’ (Bal.dr.ar.chae’um. N. L. n. Baldr the god of light and son of Odin and 64 borther of Thor in Norse mythology); N.L. neut. N. archaeum (from Gr. adj. archaios ancient) archaeon; 65 N. L. neut. N. Baldrarchaeum.

66 ‘Candidatus Baldrarchaeum yapensis’ (yap’ensis N. L. masc. adj. pertaining to Yap trench, which is the 67 geographical position where the first type material of this species was obtained). Type material is the 68 genome designated as As_130 (Yap30.bin9.72) representing ‘Candidatus Baldrarchaeum yapensis’. The 69 genome “As_130” represents a MAG consisting of 2.27 Mbps in 100 contigs, with an estimated 70 completeness of 93.93%, an estimated contamination of 3.74%, a 23S and 16S rRNA gene and 15 tRNAs. 71 The MAG, recovered from a marine water metagenome (Yap trench, Western Pacific) with an estimated 72 depth coverage of 39.99, has a GC content of 45.95%.

73 ‘Candidatus Hermodarchaeum’ (Her.mod.ar.chae’um. N. L. n. Hermod, messengers of the gods in the 74 Norse mythology and son of Odin and brother of Baldr in the Norse mythology); N.L. neut. N. archaeum 75 (from Gr. adj. archaios ancient) archaeon; N. L. neut. N. Hermodarchaeum.

76 ‘Candidatus Hermodarchaeum yapensis’ (yap’ensis N. L. masc. adj. pertaining to Yap trench, which is 77 the geographical position where the first type material of this species was obtained). Type material is the 78 genome designated as As_086 (Yap4.bin9.105) representing ‘Candidatus Hermodarchaeum yapensis’. 79 The genome ‘As_086’ represent a MAG consisting of 2.71 Mbps in 77 contigs, with an estimated 80 completeness of 92.99%, an estimated contamination of 1.87%, a 23S and 16S rRNA gene and 16 tRNAs. 81 The MAG, recovered from a marine water metagenome (Yap trench, Western Pacific) with an estimated 82 depth coverage of 19.24, has a GC content of 44.69%.

83 Candidatus Wukongarchaeaceae (Wu.kong.ar.chae.a.ce’ae. N.L. neut. n. Wukongarchaeum a 84 (Candidatus) type genus of the family; -aceae ending to denote the family; N.L. fem. pl. n. 85 Wukongarchaeaceae the Wukongarchaeum family).

86 The family is delineated based on 209 concatenated Asgard Cluster of Orthologs (AsCOGs) and 16S 87 rRNA gene phylogeny. The description is the same as that of its sole genus and species. Type genus is 88 Candidatus Wukongarchaeum.

89 Candidatus Wukongarchaeales (Wu.kong.ar.chae.a’les. N.L. neut. n. Wukongarchaeum a (Candidatus) 90 type genus of the order; -ales ending to denote the order; N.L fem. pl. n. Wukongarchaeales the 91 Wukongarchaeum order).

92 The order is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 93 description is the same as that of its sole genus and species. Type genus is Candidatus Wukongarchaeum.

94 Candidatus Wukongarchaeia (Wu.kong.ar.chae’i.a. N.L. neut. n. Wukongarchaeum a (Candidatus) type 95 genus of the order of the class; -ia ending to denote the class; N.L fem. pl. n. Wukongarchaeia the 96 Wukongarchaeum class).

97 The class is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 98 description is the same as that of its sole and type order Candidatus Wukongarchaeales.

4

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

99 Candidatus Wukongarchaeota (Wu.kong.ar.chae.o’ta. N.L. neut. n. Wukongarchaeum a (Candidatus) 100 type genus of the class of the phylum; -ota ending to denote the phylum; N.L neut. pl. n. 101 Wukongarchaeota the Wukongarchaeum phylum)

102 The phylum is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 103 description is the same as that of its sole and type class Candidatus Wukongarchaeia.

104 Candidatus Hodarchaeaceae (Hod.ar.chae.a.ce’ae. N.L. neut. n. Hodarchaeum a (Candidatus) type 105 genus of the family; -aceae ending to denote the family; N.L. fem. pl. n. Hodarchaeaceae the 106 Hodarchaeum family).

107 The family is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 108 description is the same as that of its sole genus and species. Type genus is Candidatus Hodarchaeum.

109 Candidatus Hodarchaeales (Hod.ar.chae.a’les. N.L. neut. n. Hodarchaeum a (Candidatus) type genus of 110 the order; -ales ending to denote the order; N.L fem. pl. n. Hodarchaeales the Hodarchaeum order).

111 The order is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 112 description is the same as that of its sole genus and species. Type genus is Candidatus Hodarchaeum.

113 Candidatus Hodarchaeia (Hod.ar.chae’i.a. N.L. neut. n. Hodarchaeum a (Candidatus) type genus of the 114 order of the class; -ia ending to denote the class; N.L fem. pl. n. Hodarchaeia the Hodarchaeum class).

115 The class is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 116 description is the same as that of its sole and type order Candidatus Hodarchaeales.

117 Candidatus Hodarchaeota (Hod.ar.chae.o’ta. N.L. neut. n. Hodarchaeum a (Candidatus) type genus of 118 the class of the phylum; -ota ending to denote the phylum; N.L neut. pl. n. Hodarchaeota the 119 Hodarchaeum phylum)

120 The phylum is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 121 description is the same as that of its sole and type class Candidatus Hodarchaeia.

122 Candidatus Hodarchaeaceae (Hod.ar.chae.a.ce’ae. N.L. neut. n. Hodarchaeum a (Candidatus) type 123 genus of the family; -aceae ending to denote the family; N.L. fem. pl. n. Hodarchaeaceae the 124 Hodarchaeum family).

125 The family is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 126 description is the same as that of its sole genus and species. Type genus is Candidatus Hodarchaeum.

127 Candidatus Hodarchaeales (Hod.ar.chae.a’les. N.L. neut. n. Hodarchaeum a (Candidatus) type genus of 128 the order; -ales ending to denote the order; N.L fem. pl. n. Hodarchaeales the Hodarchaeum order).

129 The order is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 130 description is the same as that of its sole genus and species. Type genus is Candidatus Hodarchaeum.

131 Candidatus Hodarchaeia (Hod.ar.chae’i.a. N.L. neut. n. Hodarchaeum a (Candidatus) type genus of the 132 order of the class; -ia ending to denote the class; N.L fem. pl. n. Hodarchaeia the Hodarchaeum class).

5

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

133 The class is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 134 description is the same as that of its sole and type order Candidatus Hodarchaeales.

135 Candidatus Hodarchaeota (Hod.ar.chae.o’ta. N.L. neut. n. Hodarchaeum a (Candidatus) type genus of 136 the class of the phylum; -ota ending to denote the phylum; N.L neut. pl. n. Hodarchaeota the 137 Hodarchaeum phylum)

138 The phylum is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 139 description is the same as that of its sole and type class Candidatus Hodarchaeia.

140 Candidatus Kariarchaeaceae (Ka.ri.ar.chae.a.ce’ae. N.L. neut. n. Kariarchaeum a (Candidatus) type 141 genus of the family; -aceae ending to denote the family; N.L. fem. pl. n. Kariarchaeaceae the 142 Kariarchaeum family).

143 The family is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 144 description is the same as that of its sole genus and species. Type genus is Candidatus Kariarchaeum.

145 Candidatus Kariarchaeales (Ka.ri.ar.chae.a’les. N.L. neut. n. Kariarchaeum a (Candidatus) type genus of 146 the order; -ales ending to denote the order; N.L fem. pl. n. Kariarchaeales the Kariarchaeum order).

147 The order is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 148 description is the same as that of its sole genus and species. Type genus is Candidatus Kariarchaeum.

149 Candidatus Kariarchaeia (Ka.ri.ar.chae’i.a. N.L. neut. n. Kariarchaeum a (Candidatus) type genus of the 150 order of the class; -ia ending to denote the class; N.L fem. pl. n. Kariarchaeia the Kariarchaeum class).

151 The class is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 152 description is the same as that of its sole and type order Candidatus Kariarchaeales.

153 Candidatus Kariarchaeota (Ka.ri.ar.chae.o’ta. N.L. neut. n. Kariarchaeum a (Candidatus) type genus of 154 the class of the phylum; -ota ending to denote the phylum; N.L neut. pl. n. Kariarchaeota the 155 Kariarchaeum phylum)

156 The phylum is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 157 description is the same as that of its sole and type class Candidatus Kariarchaeia.

158 Candidatus Borrarchaeaceae (Borr.ar.chae.a.ce’ae. N.L. neut. n. Borrarchaeum a (Candidatus) type 159 genus of the family; -aceae ending to denote the family; N.L. fem. pl. n. Borrarchaeaceae the 160 Borrarchaeum family).

161 The family is delineated based on 209 concatenated AsCOGs phylogeny. The description is the same as 162 that of its sole genus and species. Type genus is Candidatus Borrarchaeum.

163 Candidatus Borrarchaeales (Borr.ar.chae.a’les. N.L. neut. n. Borrarchaeum a (Candidatus) type genus of 164 the order; -ales ending to denote the order; N.L fem. pl. n. Borrarchaeales the Borrarchaeum order).

165 The order is delineated based on 209 concatenated AsCOGs phylogeny. The description is the same as 166 that of its sole genus and species. Type genus is Candidatus Borrarchaeum.

6

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

167 Candidatus Borrarchaeia (Borr.ar.chae’i.a. N.L. neut. n. Borrarchaeum a (Candidatus) type genus of the 168 order of the class; -ia ending to denote the class; N.L fem. pl. n. Borrarchaeia the Borrarchaeum class).

169 The class is delineated based on 209 concatenated AsCOGs phylogeny. The description is the same as 170 that of its sole and type order Candidatus Borrarchaeales.

171 Candidatus Borrarchaeota (Borr.ar.chae.o’ta. N.L. neut. n. Borrarchaeum a (Candidatus) type genus of 172 the class of the phylum; -ota ending to denote the phylum; N.L neut. pl. n. Borrarchaeota the 173 Borrarchaeum phylum)

174 The phylum is delineated based on 209 concatenated AsCOGs phylogeny. The description is the same as 175 that of its sole and type class Candidatus Borrarchaeia.

176

177 Candidatus Baldrarchaeaceae (Bal.dr.ar.chae.a.ce’ae. N.L. neut. n. Baldrarchaeum a (Candidatus) type 178 genus of the family; -aceae ending to denote the family; N.L. fem. pl. n. Baldrarchaeaceae the 179 Baldrarchaeum family).

180 The family is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 181 description is the same as that of its sole genus and species. Type genus is Candidatus Baldrarchaeum.

182 Candidatus Baldrarchaeales (Bal.dr.ar.chae.a’les. N.L. neut. n. Bladrarchaeum a (Candidatus) type 183 genus of the order; -ales ending to denote the order; N.L fem. pl. n. Baldrarchaeales the Baldrarchaeum 184 order).

185 The order is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 186 description is the same as that of its sole genus and species. Type genus is Candidatus Baldrarchaeum.

187 Candidatus Baldrarchaeia (Bal.dr.ar.chae’i.a. N.L. neut. n. Baldrarchaeum a (Candidatus) type genus of 188 the order of the class; -ia ending to denote the class; N.L fem. pl. n. Baldrarchaeia the Baldrarchaeum 189 class).

190 The class is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 191 description is the same as that of its sole and type order Candidatus Baldrarchaeales.

192 Candidatus Baldrarchaeota (Bal.dr.ar.chae.o’ta. N.L. neut. n. Baldrarchaeum a (Candidatus) type genus 193 of the class of the phylum; -ota ending to denote the phylum; N.L neut. pl. n. Baldrarchaeota the 194 Baldrarchaeum phylum)

195 The phylum is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 196 description is the same as that of its sole and type class Candidatus Baldrarchaeia.

197

198 Candidatus Hermodarchaeaceae (Her.mod.ar.chae.a.ce’ae. N.L. neut. n. Hermodarchaeum a 199 (Candidatus) type genus of the family; -aceae ending to denote the family; N.L. fem. pl. n. 200 Hermodarchaeaceae the Hermodarchaeum family).

7

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

201 The family is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 202 description is the same as that of its sole genus and species. Type genus is Candidatus Hermodarchaeum.

203 Candidatus Hermodarchaeales (Her.mod.ar.chae.a’les. N.L. neut. n. Hermodarchaeum a (Candidatus) 204 type genus of the order; -ales ending to denote the order; N.L fem. pl. n. Hermodarchaeales the 205 Hermodarchaeum order).

206 The order is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 207 description is the same as that of its sole genus and species. Type genus is Candidatus Hermodarchaeum.

208 Candidatus Hermodarchaeia (Her.mod.ar.chae’i.a. N.L. neut. n. Hermodarchaeum a (Candidatus) type 209 genus of the order of the class; -ia ending to denote the class; N.L fem. pl. n. Hermodarchaeia the 210 Hermodarchaeum class).

211 The class is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 212 description is the same as that of its sole and type order Candidatus Hermodarchaeales.

213 Candidatus Hermodarchaeota (Her.mod.ar.chae.o’ta. N.L. neut. n. Hermodarchaeum a (Candidatus) 214 type genus of the class of the phylum; -ota ending to denote the phylum; N.L neut. pl. n. 215 Hermodarchaeota the Hermodarchaeum phylum)

216 The phylum is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 217 description is the same as that of its sole and type class Candidatus Hermodarchaeia.

218

8

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

219 Materials and Methods

220 Sampling collections and DNA sequencing

221 YT samples were obtained from the Rongcheng Nation Swan Nature Reserve (Rongcheng, China) in 222 November 2018. The sediment cores were collected using columnar samplers at depth intervals of 0–2, 223 21–26, and 36–41 cm at a seagrass meadow and a non-seagrass–covered site nearby. After collection, 224 bulk sediments were immediately sealed in plastic bags, placed in a pre-cooled icebox, and transported to 225 laboratory within 4 hours. For each sample, DNA was extracted from 10 g sediment using PowerSoil 226 DNA Isolation kit (QIAGEN, Germany), according to the manufacturer’s protocol. Following extraction, 227 nucleic acids were sequenced using Illumina HiSeq2500 (Illumina, USA) PE150 by Novogene (Nanjing, 228 China).

229 MP5 samples were obtained from Mai Po Nature Reserve (Hong Kong, China) in September 2014. Three 230 subsurface sediment samples were collected from a site covering with mangrove forest at depth intervals 231 of 0-2, 10-15 and 20-25 cm. Two subsurface sediment samples were taken at an intertidal mudflat with 232 depths of 0-5 and 13-16 cm. Samples were transported back to laboratory as described for YT 233 metagenomes. DNA was extracted from 5g wet sediment per sample using the PowerSoil DNA Isolation 234 Kit (MO BIO, USA) following the manufacturer’s protocol. Metagenomic sequencing data were 235 generated using Illumina HiSeq2500 (Illumina, USA) PE150 by Novogene (Tianjin, China).

236 FT samples were taken from Futian Nature Reserve (Shenzhen, Guangdong, China) in April 2017. 237 Sediment samples were collected as described for YT samples at depth intervals of 0-2, 6-8, 12-14, 20-22, 238 and 28-30 cm. DNA was extracted from 5g wet sediment per sample using DNeasy PowerSoil kit 239 (Qiagen, Germany) as per manufacturer’s instructions. Nucleic acids were sequenced using Illumina 240 HiSeq2000 (Illumina, USA) PE150 by Novogene (Tianjin, China).

241 The surface sediment sample of the CJE metagenome was collected from Changjiang estuary during a 242 cruise in August 2016. The sample was grabbed from the water bed, sealed immediately in a 50 ml tubes 243 and stored in liquid nitrogen onboard. After transportation to laboratory, 10g wet sediments were used for 244 DNA extraction as per manufacturer’s protocol. Nucleic acids were sequenced using Illumina HiSeq2000 245 (Illumina, USA) PE150 by Novogene (Tianjin, China).

246 An oil sand sample was collected from Shengli Oilfield (Shandong, China) into bottles and they were 247 transported to laboratory where they were stored at 4 ℃. The sampl e w as used as i nocul um to perform 248 enrichment with anaerobic medium in vials as described elsewhere (1). After 253d of enrichment, the 249 genomic DNA was extracted as described elsewhere (2). Nucleic acids were sequenced Illumina 250 HiSeq2000 (Illumina, USA) PE150 by Novogene (Tianjin, China).

251 The seawater samples of Yap metagenomes were collected at Yap trench region by CTD SBE911plus 252 (Sea-Bird Electronics, USA) during the 37th Dayang cruise in 2016. 8L of seawater per sample was 253 filtered through a 0.22 µm-mesh membrane filter immediately after recovery onboard. The membrane 254 was then cut into ~0.2 cm2 pieces with a flame-sterilized scissors and added to a PowerBead Tube (MO 255 BIO, USA) and the subsequent steps were implemented according to the manufacturer’s protocol to 256 extract DNA. The DNA per sample was amplified in five separative reactions using REPLI-g Single Cell 257 Kits (Qiagen, Germany) following the manufacturer's protocol, given to the challenging nature of sample 258 retrieval and DNA recovery. The products were pooled together and purified using QIAamp DNA Mini

9

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

259 Kit (Qiagen, Germany) according to the manufacturer’s recommendations. Parallel blank controls were 260 set for sampling, DNA extraction and ampliation with 0.22 µm-mesh membrane filtering Milli-Q water 261 (18.2 MΩ; Millipore, USA). Nucleic acids were sequenced using HiSeq X Ten (Illumina, USA) PE150.

262 Metagenomic assembly, binning and gene calling

263 FT, MP5 and YT metagenomes. These three sets of metagenomes were assembled and binned using the 264 same method. Raw shotgun metagenomic sequencing reads were trimmed with “read_qc” module from 265 metaWRAP (v.1.1) (3). All clean reads from the same set were pooled together prior to de novo assemble 266 to one co-assembly. Clean reads were sent out to MEGAHIT (v1.1.2) with flag “--presets meta-large” for 267 co-assembling job (4). Sequencing coverage was determined for each assembled scaffold by mapping 268 reads from each sample to the co-assembly using Bowtie2 (5). The binning analysis was carried out 8 269 times with 8 different combinations of specificity and sensitivity parameters using MetaBAT2(52) (“-- 270 maxP 60 or 95” AND “--minS 60 or 95” AND “--maxEdges 200 or 500”) on the assembly with a 271 minimum length of 2000 bp (6). DAS Tool (v1.0) was used as a dereplication and aggregation strategy on 272 those eight binning results to construct accurate bins (7). Manual curation was used for reducing the 273 genome contamination based on differential coverages, GC contents, and the presence of duplicate genes.

274 The depth coverage and N50 statistics of 38 Asgard MAGs recovered from YT metagenomes range from 275 7.72 to 298.86 (median: 21.77) and from 6658 to 381755 bps (median: 26337.5 bps) respectively; for 13 276 Asgard MAGs recovered from FT metagenomes, the values range from 6.98 to 33.22 (median 16.89) and 277 from 3889 to 8957 bps (median: 5000 bps), respectively; and for 11 Asgard MAGs recovered from MP5 278 metagenomes, the values range from 7.06 to 54.42 (median: 10) and from 3898 to 19362 bps (median: 279 7581 bps) respectively. (Supplementary Figure 2).

280 CJE metagenome. Raw metagenomic shotgun sequencing reads were trimmed using Sickle 281 (https://github.com/najoshi/sickle) with default settings. The trimmed reads were de novo assembled 282 using IDBA-UD (v 1.1.1) with the parameters: “-mink 65, -maxk 145, -step 10” (8). Sequencing coverage 283 was determined as described above. The binning analysis were performed with MetaBAT2 12 times, with 284 12 combinations of specificity and sensitivity parameters (--m 1500, 2000, or 2500” AND “--maxP 85 or 285 90” AND “--minS 80, 85 and 90”) for further refinement (6). All binning results were merged and refined 286 using DAS Tool (v1.0) (7).

287 2 Asgard MAGs recovered from CJE metagenome have a depth coverage of 19.16 and 20.56, and a N50 288 statistics of 8740 and 5246 bps (Supplementary Figure 2).

289 J65 metagenome. Raw metagenomic shotgun sequencing reads were trimmed with Trimmomatic (v0.38) 290 (9). The clean reads were then fed to SPAdes (v 3.12.0) for de novo assembly with the parameters: “-- 291 meta -k 21, 33, 55, 77” (10). Sequencing coverage was determined using BBMap (v 38.24) toolkit with 292 the parameters: “bbmap.sh minid=0.99” (https://github.com/BioInfoTools/BBMap). MetaBAT2 (v 2.12.1) 293 was used to perform binning analysis with the parameter: “-m 2000” (6).

294 One Asgard MAG recovered from J65 metagenome had a depth coverage of 14.52 and a N50 statistics of 295 10460 bps (Supplementary Figure 2).

10

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

296 Yap metagenome. For each Yap metagenome, raw metagenomic shotgun sequencing reads were trimmed 297 with Trimmomatic (v0.38) (9). Assembly and binning analysis were performed as described for CJE 298 metagenome for each Yap metagenome.

299 The depth coverage and N50 statistics of 24 Asgard MAGs recovered from Yap metagenomes range from 300 7.78 to 82.33 (median: 13.91) and from 5097 to 889102 bps (median: 18155.5 bps) respectively. 301 (Supplementary Figure 2)

302 A total of 89 Asgard MAGs were reconstructed in this work. Additional 95 Asgard MAGs were 303 downloaded from public databases (e.g. NCBI FTP site). For all 184 genomes, a uniform gene calling 304 protocol was applied. Specifically, the completeness, contamination, and strain heterogeneity of the 305 genomes were estimated by using CheckM (v.1.0.12) (11) and DAS Tool under the taxonomic scope of 306 domain (i.e., Bacteria and Archaea). Protein-coding genes were predicted using Prodigal (v 2.6.3) (12) 307 embedded in Prokka (v 1.13) (13). Transfer RNAs (tRNAs) were identified with tRNAscan-SE (v1.23) 308 using the archaeal tRNA model (14). After quality screening, further analysis focused on 162 high quality 309 Asgard MAGs.

310 Genome set

311 The Asgard archaea genome set analyzed in this work consisted of 161 Asgard MAGs and one complete 312 Asgard genome (Supplementary Table 1). For comparison, selected representative genomes of archaea 313 (296), bacteria (100) and eukaryotes (76) were downloaded from Refseq and Genbank using the NCBI 314 FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/).

315 Average amino-acid identity

316 The average amino-acid identity (AAI) across TACK archaeal reference genomes and the 184 Asgard 317 genomes was calculated using compareM (v0.0.23) with the “aai_wf” at default settings 318 (https://github.com/dparks1134/CompareM).

319 asCOGs construction

320 Initial clustering of 250,634 proteins encoded in 76 Asgard MAGs was performed using two approaches: 321 first, footprints of arCOG profiles were obtained by running PSI-BLAST (15), initiated with arCOG 322 alignments, against the set of predicted Asgard proteins. The footprint sequences were extracted and 323 clustered according to the arCOG best hit. The remaining protein sequences (both full-length proteins and 324 the sequence fragments outside of the footprints, if longer than 60 aa) were clustered using MMseqs2 325 (16), with similarity threshold of 0.5. Sequences within clusters were aligned using MUSCLE (17); the 326 resulting alignments were passed through several rounds of merging and splitting. The merging phase 327 involved comparing alignments to each other using HHSEARCH (18), finding full-length cluster-to- 328 cluster matches, merging the sequence sets and re-aligning the new clusters. The splitting phase consisted 329 of the construction of an approximate phylogenetic tree of the sequences using FastTree (19) (gamma- 330 distributed site rates, WAG evolutionary model) with balanced mid-point tree rooting, identification of 331 subtrees maximizing the fraction of species (MAGs) representation and minimizing the number of 332 paralogs, and pruning such subtrees as separate clusters of putative orthologs. Clusters, derived from 333 arCOGs, were prohibited from merging across distinct arCOGs to prevent distant paralogs from forming 334 mixed clusters.

11

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

335 Phylogenetic analysis

336 16S rRNA gene phylogenetic analysis. 16S rRNA gene sequences were identified in 73 genomes of 337 Asgard archaea (26 generated in this study, 47 from public database) using Barrnap (v 0.9) with the “-- 338 arc” option (https://github.com/tseemann/barrnap). These sequences were combined with 46 339 published 16S rRNA sequences of Asgard archaea, to assess the novelty of the sequences obtained in this 340 work. The novelty of 16S rRNA gene sequences was measured in terms of their sequence identity to 341 previously identified Asgard archaeal 16S rRNA gene and phylogenetic relationships. Specifically, the 342 pairwise sequence identity of two Asgard archaeal 16S rRNA gene sequences (>1300bp) was obtained by 343 first globally aligning the sequences with “Stretcher” in EMBOSS package and then calculating the 344 percent identity excluding gaps (20). The Asgard archaeal 16S rRNA gene sequences were aligned with 345 311 reference sequences from Euryarchaeota (n=231), DPANN (n=22), Korarchaeota (n=1), 346 Crenarchaeota (n=41), Bathyarchaeota (n=2), Thaumarchaeota (n=13) and Aigarchaeota (n=1) using 347 mafft-linsi (v7.471) (21) and trimmed with BMGE (v1.12) (settings: -m DNAPAM250:4, -g 0.5) (22).The 348 alignment was used for phylogenetic inference with IQ-Tree (v2.0.6) based on SYM+R8 (selected by 349 ModelFinder) to generate a maximum-likelihood tree (23).

350 Asgard phylogeny. A set of asCOGs that were considered most suitable as phylogenetic markers for 351 Asgard archaea was selected using the preliminary classification of the 76 genomes in AsCOGs into 352 previously described lineages: Loki, Thor, Odin, Hel and Heimdall. The following criteria were adopted: 353 the asCOG have to be i) present in at least half of the genomes in all lineages, ii) present in at least 75% 354 among the 76 genomes, iii) the mean number of paralogs per genome not to exceed 1.25. For the 209 355 asCOGs matching these criteria, the corresponding protein sequences were obtained from the extended set 356 of 162 MAGs and aligned using MUSCLE (17); the ‘index’ paralog to include in the phylogeny was 357 selected for each MAG based on the similarity to the alignment consensus. Alignments were trimmed to 358 exclude columns containing more than 2/3 gap characters and with homogeneity below 0.05 359 (homogeneity is calculated from the score of the consensus amino acid against the alignment column, 360 compared to the score of the perfect match (24) and concatenated, resulting in an alignment of 50,706 361 characters from 162 sequences (one sequence per MAG). Phylogeny was reconstructed using FastTree 362 (gamma-distributed site rates, WAG evolutionary model) (19) and IQ-Tree (LG+F+R10 model, selected 363 by ModelFinder) (23), producing very similar tree topologies.

364 Tree of life. To elucidate the relationships between the Asgard archaea and other major clades of archaea, 365 bacteria and eukaryotes, 30 families of conserved proteins were selected that appear to have evolved 366 mostly vertically (25, 26). The set of 162 MAGs of Asgard archaea was supplemented with 66 TACK 367 archaea and 220 non-TACK archaea (the former, having been described as the closest archaeal relatives 368 of the Asgard archaea (27), were sampled more densely), 98 bacteria and 72 eukaryotes. Prokaryotic 369 genomes were sampled from the set of completely sequenced genomes to represent the maximum 370 diversity within their respective clades (briefly, all proteins, encoded by genomes within a groups were 371 clustered at 75% identity level; distances between genomes were estimated from the number of shared 372 proteins within these clusters; UPGMA trees were reconstructed from the distances, and genome sets 373 maximizing the total tree branch length were selected to represent the groups). The set of eukaryotic 374 genomes was manually selected to represent the maximum possible variety of eukaryotic taxa. Genomes, 375 in which more than four markers were missing, were excluded from the bacterial and eukaryotic sets. 376 When multiple paralogs of a marker were present in a genome, preliminary phylogenetic trees were

12

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

377 constructed from protein sequence alignments, and paralogs with the shortest branches were selected to 378 represent the corresponding genomes in the set. Sequences were aligned using MUSCLE (17), and 379 alignment columns, containing more than 2/3 gap characters or with alignment column homogeneity 380 below 0.05 were removed (24). The resulting concatenated alignments of the 30 markers consisted of 381 7411 sites. Phylogeny was reconstructed using FastTree (gamma-distributed site rates, WAG evolutionary 382 model) (19) and IQ-Tree (23) with three models: LG+R10, selected by IQ-tree ModelFinder as the best 383 fit, GTR20+F+R10 (following the suggestion of Williams et al. (28) to use GTR, we let IQ-tree to select 384 the best version of the GTR model), and LG+C20+G4+F (again, the mixture model was used following 385 the suggestion of Williams et al. (28); we were unable to use the higher-specified C60 model due to 386 memory limitations of our hardware and used the C20 model instead).

387 Ordination of asCOG phyletic patterns using Classical Multidimensional Scaling

388 Binary asCOG presence-absence patterns were compared between pairs of Asgard MAGs using the 389 following procedure: first, similarity between asCOG sets {A} and {B} was calculated as , = | |/ | || | 390 (the number of shared AsCOGs normalized by the geometric mean of the𝑆𝑆𝐴𝐴 number𝐵𝐵 of 391 AsCOGs in the two MAGs); then, the distance between the patterns was calculated as , = ln ( , ). 𝑨𝑨 ∩𝑩𝑩 � 𝑨𝑨 𝑩𝑩 392 The 162x162 distance matrix was embedded into a 2-dimensional space using Classical Multidimensional 𝑑𝑑𝐴𝐴 𝐵𝐵 − 𝑆𝑆𝐴𝐴 𝐵𝐵 393 Scaling analysis implemented as the cmdscale function in R. The projection retained 89% of the original 394 datapoint inertia.

395 Identification and analysis of ESPs

396 To identify eukaryotic signature proteins (ESPs), several strategies were employed. First, ESPs reported 397 by Zaremba-Niedzwiedzka et al. (29) were mapped to asCOGs using PSI-BLAST (15). These asCOGs 398 were additionally examined case by case using HHpred (30) with representative of the respective asCOG 399 or the respective asCOG alignment used as the query. Second, all asCOGs were mapped to CDD profiles 400 (31) using PSI-BLAST, hits to eukaryote-specific domains were selected. Most of the putative ESP 401 asCOGs identified in this search, and all with E-value >1e-10, were additionally examined using HHpred, 402 with a representative of the respective asCOG or the respective asCOG alignment as the query. Third, we 403 analyzed frequently occurring asCOGs (present in at least 50% of Asgard genomes and in at least 30% of 404 Heimdall genomes) that were not annotated automatically with the above two approaches using HHpred 405 with a representative of the respective asCOG or the respective asCOG alignment as the query . Fourth, 406 most of the putative ESP asCOGs detected with these approaches were used as queries for a PSI-BLAST 407 search that was run for three iterations (with inclusion threshold E-value=0.0001) against an Asgard only 408 protein sequence database. Additional unannotated asCOGs with similarity to the (putative) ESPs 409 identified in this search were further examined using HHpred. Fifth, the genomic neighborhoods or all 410 ESPs were examined, and proteins encoded by unannotated neighbor genes were analyzed using HHpred 411 server.

412 Metabolic pathway reconstruction

413 The patterns of gene presence-absence in the asCOGs were used to reconstruct the metabolic pathways of 414 Asgard archaea (Supplementary Table 8). The asCOGs were linked to the KEGG database and to the list 415 of predicted metabolic enzymes of Asgard archaea reported by Spang et al. (32) (Supplementary Table 8). 416 The classification of [NiFe] hydrogenases was performed by comparing the Asgard proteins of

13

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

417 cog.001539, cog.002254, cog.010021, cog.011939 and cog.012499 to HydDB (33). For phylogenetic 418 analysis, the reference sequences of group 1, 3 and 4 [NiFe] hydrogenases were retrieved from HydDB. 419 The sequences were filtered using cd-hit with a sequence identity cut-off of 90% prior to adding 420 orthologous genes of cog.001539, cog.002254, cog.010021, cog.011939 and cog.012499 of Asgard 421 archaea. All sequences for group 1, group 3 and group 4 [NiFe] hydrogenases were aligned using mafft- 422 LINSI (21) and trimmed with BMGE (-m BLOSUM30 -h 0.6) (22). Maximum-likelihood phylogenetic 423 analyses were performed using IQ-tree (23) with the best-fit model (group 1:LG + C60 + R + F, group 3: 424 LG + C60 + R +F and group 4 LG + C50 + R + F), respectively, according to Bayesian information 425 criterion (BIC). Support values were estimated using the SH-like approximate-likelihood ratio test and 426 ultrafast bootstraps, respectively. We adapted a relaxed common denominator approach to determine the 427 presence of certain pathway in one Asgard phyla (32), and combined with maximum parsimony principle 428 (34) to infer the metabolisms of major ancestral forms.

429

430

14

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

431 References

432 1. L. Cheng, T.-L. Qiu, X. Li, W.-D. Wang, Y. Deng, X.-B. Yin, H. Zhang, Isolation and 433 characterization of Methanoculleus receptaculi sp. nov. from Shengli oil field, China. FEMS 434 Microbiology Letters. 285, 65–71 (2008).

435 2. J. Peng, Z. Lü, J. Rui, Y. Lu, Dynamics of the Methanogenic Archaeal Community during Plant 436 Residue Decomposition in an Anoxic Rice Field Soil. Appl. Environ. Microbiol. 74, 2894–2901 (2008).

437 3. G. V. Uritskiy, J. DiRuggiero, J. Taylor, MetaWRAP—a flexible pipeline for genome-resolved 438 metagenomic data analysis. Microbiome. 6, 158 (2018).

439 4. D. Li, C. M. Liu, R. Luo, K. Sadakane, T. W. Lam, MEGAHIT: an ultra-fast single-node solution 440 for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 31, 1674– 441 1676 (2015).

442 5. B. Langmead, S. L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nat Methods. 9, 357– 443 359 (2012).

444 6. D. D. Kang, F. Li, E. Kirton, A. Thomas, R. Egan, H. An, Z. Wang, MetaBAT 2: an adaptive 445 binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ. 7, 446 e7359 (2019).

447 7. C. M. K. Sieber, A. J. Probst, A. Sharrar, B. C. Thomas, M. Hess, S. G. Tringe, J. F. Banfield, 448 Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat 449 Microbiol. 3, 836–843 (2018).

450 8. Y. Peng, H. C. M. Leung, S. M. Yiu, F. Y. L. Chin, IDBA-UD: a de novo assembler for single- 451 cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 28, 1420–1428 (2012).

452 9. A. M. Bolger, M. Lohse, B. Usadel, Trimmomatic: a flexible trimmer for Illumina sequence data. 453 Bioinformatics. 30, 2114–2120 (2014).

454 10. S. Nurk, D. Meleshko, A. Korobeynikov, P. A. Pevzner, metaSPAdes: a new versatile 455 metagenomic assembler. Genome Res. 27, 824–834 (2017).

456 11. D. H. Parks, M. Imelfort, C. T. Skennerton, P. Hugenholtz, G. W. Tyson, CheckM: assessing the 457 quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research. 458 25, 1043–1055 (2015).

459 12. D. Hyatt, G.-L. Chen, P. F. LoCascio, M. L. Land, F. W. Larimer, L. J. Hauser, Prodigal: 460 prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 11, 119 461 (2010).

462 13. T. Seemann, Prokka: rapid prokaryotic genome annotation. Bioinformatics. 30, 2068–2069 463 (2014).

464 14. P. P. Chan, T. M. Lowe, tRNAscan-SE: Searching for tRNA genes in genomic sequences. 465 Methods Mol Biol. 1962, 1–14 (2019).

15

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

466 15. A. A. Schäffer, L. Aravind, T. L. Madden, S. Shavirin, J. L. Spouge, Y. I. Wolf, E. V. Koonin, S. 467 F. Altschul, Improving the accuracy of PSI-BLAST protein database searches with composition-based 468 statistics and other refinements. Nucleic Acids Res. 29, 2994–3005 (2001).

469 16. M. Steinegger, J. Söding, MMseqs2 enables sensitive protein sequence searching for the analysis 470 of massive data sets. Nat Biotechnol. 35, 1026–1028 (2017).

471 17. R. C. Edgar, MUSCLE: a multiple sequence alignment method with reduced time and space 472 complexity. BMC Bioinformatics. 5, 113 (2004).

473 18. J. Söding, Protein homology detection by HMM-HMM comparison. Bioinformatics. 21, 951–960 474 (2005).

475 19. M. N. Price, P. S. Dehal, A. P. Arkin, FastTree 2--approximately maximum-likelihood trees for 476 large alignments. PLoS One. 5, e9490 (2010).

477 20. P. Rice, I. Longden, A. Bleasby, EMBOSS: the European Molecular Biology Open Software 478 Suite. Trends Genet. 16, 276–277 (2000).

479 21. K. Katoh, D. M. Standley, MAFFT Multiple Sequence Alignment Software Version 7: 480 Improvements in Performance and Usability. Molecular Biology and Evolution. 30, 772–780 (2013).

481 22. A. Criscuolo, S. Gribaldo, BMGE (Block Mapping and Gathering with Entropy): a new software 482 for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evolutionary 483 Biology. 10, 210 (2010).

484 23. L.-T. Nguyen, H. A. Schmidt, A. von Haeseler, B. Q. Minh, IQ-TREE: A Fast and Effective 485 Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol Biol Evol. 32, 268–274 486 (2015).

487 24. E. Esterman, Y. I. Wolf, R. Kogay, E. V. Koonin, O. Zhaxybayeva, bioRxiv, in press, 488 doi:10.1101/2020.10.08.331884.

489 25. F. D. Ciccarelli, T. Doerks, C. von Mering, C. J. Creevey, B. Snel, P. Bork, Toward automatic 490 reconstruction of a highly resolved tree of life. Science (New York, N.Y.). 311, 1283–1287 (2006).

491 26. P. Puigbò, Y. I. Wolf, E. V. Koonin, Search for a “Tree of Life” in the thicket of the phylogenetic 492 forest. Journal of biology. 8, 59 (2009).

493 27. A. Spang, J. H. Saw, S. L. Jørgensen, K. Zaremba-Niedzwiedzka, J. Martijn, A. E. Lind, R. van 494 Eijk, C. Schleper, L. Guy, T. J. G. Ettema, Complex archaea that bridge the gap between prokaryotes and 495 eukaryotes. Nature. 521, 173–179 (2015).

496 28. T. A. Williams, C. J. Cox, P. G. Foster, G. J. Szöllősi, T. M. Embley, Phylogenomics provides 497 robust support for a two-domains tree of life. Nat Ecol Evol. 4, 138–147 (2020).

498 29. K. Zaremba-Niedzwiedzka, E. F. Caceres, J. H. Saw, D. B ckstr m, L. Juzokaite, E. Vancaester, 499 K. W. Seitz, K. Anantharaman, P. Starnawski, K. U. Kjeldsen, M. B. Stott, T. Nunoura, J. F. Banfield, A.

16

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

500 Schramm, B. J. Baker, A. Spang, T. J. G. Ettema, Asgard archaea illuminate the origin of eukaryotic 501 cellular complexity. Nature. 541, 353–358 (2017).

502 30. L. Zimmermann, A. Stephens, S.-Z. Nam, D. Rau, J. Kübler, M. Lozajic, F. Gabler, J. Söding, A. 503 N. Lupas, V. Alva, A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred 504 Server at its Core. J Mol Biol. 430, 2237–2243 (2018).

505 31. M. Yang, M. K. Derbyshire, R. A. Yamashita, A. Marchler-Bauer, NCBI’s Conserved Domain 506 Database and Tools for Protein Domain Analysis. Curr Protoc Bioinformatics. 69, e90 (2020).

507 32. A. Spang, C. W. Stairs, N. Dombrowski, L. Eme, J. Lombard, E. F. Caceres, C. Greening, B. J. 508 Baker, T. J. G. Ettema, Proposal of the reverse flow model for the origin of the eukaryotic cell based on 509 comparative analyses of Asgard archaeal metabolism. Nature Microbiology. 4, 1138–1148 (2019).

510 33. D. Søndergaard, C. N. S. Pedersen, C. Greening, HydDB: A web tool for hydrogenase 511 classification and analysis. Scientific Reports. 6, 34212 (2016).

512 34. D. L. Swofford, W. P. Maddison, Reconstructing ancestral character states under Wagner 513 parsimony. Mathematical Biosciences. 87, 199–229 (1987).

514

515

17

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

516 Data availability

517 Asgard archaea genomes generated in this study are available in eLMSG (an eLibrary of Microbial 518 Systematics and Genomics, https://www.biosino.org/elmsg/index) under accession numbers from 519 LMSG_G000000521.1 to LMSG_G000000609.1 and NCBI database under the project XXXXX.

520 Additional Data files are available at ftp://ftp.ncbi.nih.gov/pub/wolf/_suppl/asgard20/

521 Additional data file 1 (Additional_data_file_1.tgz): Complete asCOG data archive

522 Additional data file 2 (Additional_data_file_2.tgz): Phylogenetic trees and alignments archive

523

524

525

18

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

526 Supplementary Figures

527 Supplementary Figure 1 Global distribution of Asgard genomes analyzed in the study. The pie chart 528 shows the proportion of Asgard genomes found in a biotope. Bold letters in the map show the sampling 529 locations.

530 Supplementary Figure 2 a. Distribution of completeness and contamination for 76 Asgard MAGs 531 assessed by CheckM (v 1.0.12). Distribution of depth coverage (b) and N50 statistics (c) for Asgard 532 MAGs reconstructed in this study. The numbers in parentheses indicate the number of Asgard genomes 533 recovered from a given sampling location. The data for this plot can be found in Supplementary Table 1.

534 Supplementary Figure 3 Comparison of the mean amino-acid identity (AAI) of Asgard and TACK 535 superphyla. a. Shared AAI across Asgard and TACK lineages. Background: comparing all Asgard and 536 TACK lineages included in the analyses but excluding archaea belonging to the same lineages and the six 537 phyla proposed in the current work to investigate the distribution of AAI that defines a phylum. The AAI 538 comparison of (b) Thorarchaeota, (c) Hermodarchaeota (d) Odinarchaeota, (e) Baldrarchaeota, (f) 539 Lokiarchaeota, (g) Helarchaeota, (h) Borrarchaeota, (i) Heimdallarchaeota, (j) Kariarchaeota, (k) 540 Gerdarchaeota, (l) Hodarchaeota and (m) Wukongarchaeota to other Asgad and TACK lineages. The 541 lower and upper hinges of the boxplot correspond to the first and third quartiles. Data beyond the 542 whiskers are shown as individual data points. Number in the parenthesis indicates the number of genomes 543 in the lineages. Data for this plot is included in Supplementary Table 2.

544 Supplementary Figure 4 Comparison of the 16S rRNA gene sequence (>1300 bp) identity of Asgard 545 and TACK lineages. a. Shared 16S rRNA gene sequence identity across Asgard and TACK lineages. 546 Background:comparing all Asgard and TACK lineages included in the analyses but excluding archaea 547 belonging to the same lineages and the six phyla proposed in the current study to investigate the 548 distribution of 16S rRNA gene identity that defines a phylum. 16S rRNA gene identity comparison of (b) 549 Thorarchaeota, (c) Hermodarchaeota (d) Odinarchaeota, (e) Lokiarchaeota, (f) Helarchaeota, (g) 550 Heimdallarchaeota, (h) Kariarchaeota, (i) Gerdarchaeota, (j) Hodarchaeota and (k) Wukongarchaeota to 551 other Asgard and TACK lineages. The lower and upper hinges of the boxplot correspond to the first and 552 third quartiles. Data beyond the whiskers are shown as individual data points. Number in the parenthesis 553 indicates the number of 16S rRNA gene sequences compared in the lineages. Line represents a 16S rRNA 554 gene identity of 75%. Data for this plot could be found in Supplementary Table 3.

555 Supplementary Figure 5 Presence-absence of orthologs of Asgard core genes in other archaea, bacteria 556 and eukaryotes.

557 Supplementary Figure 6 Phylogenetic analysis of group 4 [NiFe] hydrogenases in the Asgard archaea. 558 The unrooted maximum-likelihood phylogenetic tree was built from an alignment of 425 sequences 559 including 110 sequences of Asgard archaea, with 308 amino-acid positions.

560 Supplementary Figure 7 Phylogenetic analysis of group 3 [NiFe] hydrogenases in the Asgard archaea. 561 The unrooted maximum-likelihood phylogenetic tree was built from an alignment of 813 sequences 562 including 335 sequences of Asgard archaea, 514 amino-acid positions.

19

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

563 Supplementary Figure 8 Phylogenetic analysis of group 1 [NiFe] hydrogenases in the Asgard archaea. 564 The unrooted maximum-likelihood phylogenetic tree was built from an alignment of 541 sequences 565 including 2 sequences of Wukongarchaeaota, with 745 amino-acid positions.

566 Supplementary Figure 9 A schematic representation of the presence and absence of selected metabolic 567 features in all (putative) phyla of Asgard archaea.

568 Supplementary Figure 10 Gene structure of the contig encoding Group 1 [Ni,Fe]-hydrogenase in 569 Wukongarchaeota. Abbreviations: Ftr, Formylmethanofuran:tetrahydromethanopterin formyltransferase; 570 FwdD Formylmethanofuran dehydrogenase subunit D; FwdB, Formylmethanofuran dehydrogenase 571 subunit B; FwdA, Formylmethanofuran dehydrogenase subunit A; FwdC, Formylmethanofuran 572 dehydrogenase subunit C; HdrC, Heterodisulfide reductase, subunit C; HyaB, Ni,Fe-hydrogenase I large 573 subunit; HyaA, Ni,Fe-hydrogenase I small subunit.

574

575

20

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

576 Supplementary Tables

577 Supplementary Table 1 Genome information, proposed and isolation data

578 Supplementary Table 2 Mean amino-acid identity values (in %) comparing 66 TACK genomes and 184 579 Asgard genomes (162 high quality and 22 low-quality)

580 Supplementary Table 3 The 16S rRNA gene sequence identity (in %) comparing TACK lineages and 581 Asgard lineages. The identity was calculated using sequences longer than 1300 bps

582 Supplementary Table 4 Species and phyletic markers used for the tree of life reconstruction

583 Supplementary Table 5 Data for phylogenetic trees: the trees in the Newick format and the underlying 584 alignments

585 Supplementary Table 6 The asCOGs annotation

586 Supplementary Table 7 Eukaryotic signature proteins in Asgard archaea

587 Supplementary Table 8 The presence-absence of metabolic enzymes in Asgard archaea.

588

21

● ● ● 50°N bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. ● ●● ● ● YT J65 ● ● ● ● CJE ● ●● ● ● ● ● MP5YT ● ● ● ● ●

● ● Yap

3.3% 0.5 % 0°

Latitude 14.2 % 51.3%

14.2 %

50°S 0.5 %

7.1 %

0.5 % 8.1% N

120°W 60°W 0° 60°E 120°E Longitude

● Coastal sediment ● Hot spring ● Hypersaline lake sediment ● Marine water ● Petroleum seep (Marine) Biotope ● Freshwater sediment ● Hydrothermal vent ● Marine sediment ● Petroleum field bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

a 100 b c 300

275

750000 250

75 225

200

175 500000 50 150 N50 (bp) value (%) value 125 depth of coverage (1X) depth of coverage 100 250000 25 75

50

25

0 0 0 Completeness Contamination CJE (n=2) FT (n=13) J65 (n=1) MP5 (n=11) Yap (n=24) YT (n=38) CJE (n=2) FT (n=13) J65 (n=1) MP5 (n=11) Yap (n=24) YT (n=38) a 100

80 60 Average amino acid identity (%) Average

40

Background Aigarchaeota(1) Korarchaeota(2) Helarchaeota(5) Odinarchaeota(2) Borrarchaeota(3) Kariarchaeota(4) Crenarchaeota(45) Bathyarchaeota(5) Thorarchaeota(57) Baldrarchaeota(2) Lokiarchaeota(52) Gerdarchaeota(14) Hodarchaeota(16) Wukongarchaeota(3) Thaumarchaeota(13) Hermodarchaeota(17) Heimdallarchaeota(9)

100 100 b c

80 80

60 60 Average amino acid identity (%) Average Average amino acid identity (%) Average

40 40

Thorarchaeota (57) _vs_ Aigarchaeota (1) Thorarchaeota (57) _vs_ Korarchaeota (2) Thorarchaeota (57) _vs_ Helarchaeota (5) Hermodarchaeota (17) _vs_ Aigarchaeota (1) Hermodarchaeota (17) _vs_ Korarchaeota (2) Hermodarchaeota (17) _vs_ Helarchaeota (5) Thorarchaeota (57) _vs_ Odinarchaeota (2) Thorarchaeota (57) _vs_ BorrarchaeotaThorarchaeota (3) (57) _vs_ Kariarchaeota (4) Hermodarchaeota (17) _vs_ Odinarchaeota (2) Hermodarchaeota (17) _vs_ BorrarchaeotaHermodarchaeota (3) (17) _vs_ Kariarchaeota (4) Thorarchaeota (57)Thorarchaeota _vs_ Crenarchaeota (57) _vs_ (45)Bathyarchaeota (5) Thorarchaeota Thorarchaeota(57) _vs_ Baldrarchaeota (57) _vs_ Lokiarchaeota (2) (52) Thorarchaeota (57)Thorarchaeota _vs_ Gerdarchaeota (57) _vs_ (14)HodarchaeotaThorarchaeota (16) (57) _vs_ Thorarchaeota (57) HermodarchaeotaHermodarchaeota (17) _vs_ Crenarchaeota (17) _vs_ (45)Bathyarchaeota (5)Hermodarchaeota (17) _vs_ ThorarchaeotaHermodarchaeota (57)Hermodarchaeota (17) _vs_ Baldrarchaeota (17) _vs_ Lokiarchaeota (2) (52) HermodarchaeotaHermodarchaeota (17) _vs_ Gerdarchaeota (17) _vs_ (14)Hodarchaeota (16) Thorarchaeota (57) _vs_ Wukongarchaeota (3) Hermodarchaeota (17) _vs_ Wukongarchaeota (3) Thorarchaeota (57) _vs_ ThaumarchaeotaThorarchaeota (13) (57) _vs_ Hermodarchaeota (17) Thorarchaeota (57) _vs_ Heimdallarchaeota (9) Hermodarchaeota (17) _vs_ Thaumarchaeota (13) Hermodarchaeota (17) _vs_ Heimdallarchaeota (9) Hermodarchaeota (17) _vs_ Hermodarchaeota (17)

d 100 e 100

80 80

60 60 Average amino acid identity (%) Average Average amino acid identity (%) Average

40 40

Odinarchaeota (2) _vs_ Aigarchaeota (1) Odinarchaeota (2) _vs_ Korarchaeota (2) Odinarchaeota (2) _vs_ Helarchaeota (5) Baldrarchaeota (2) _vs_ Aigarchaeota (1) Baldrarchaeota (2) _vs_ Korarchaeota (2) Baldrarchaeota (2) _vs_ Helarchaeota (5) Odinarchaeota (2) _vs_ BorrarchaeotaOdinarchaeota (3) (2) _vs_ Kariarchaeota (4) Odinarchaeota (2) _vs_ Odinarchaeota (2) Baldrarchaeota (2) _vs_ Odinarchaeota (2) Baldrarchaeota (2) _vs_ BorrarchaeotaBaldrarchaeota (3) (2) _vs_ Kariarchaeota (4) Odinarchaeota (2)Odinarchaeota _vs_ Crenarchaeota (2) _vs_ (45)Bathyarchaeota (5) Odinarchaeota (2) _vs_ ThorarchaeotaOdinarchaeota (57) Odinarchaeota(2) _vs_ Baldrarchaeota (2) _vs_ Lokiarchaeota (2) (52) Odinarchaeota (2)Odinarchaeota _vs_ Gerdarchaeota (2) _vs_ (14)Hodarchaeota (16) Baldrarchaeota Baldrarchaeota(2) _vs_ Crenarchaeota (2) _vs_ (45)Bathyarchaeota (5) Baldrarchaeota (2) _vs_ Thorarchaeota (57) Baldrarchaeota (2) _vs_ Lokiarchaeota (52) Baldrarchaeota (2)Baldrarchaeota _vs_ Gerdarchaeota (2) _vs_ (14)HodarchaeotaBaldrarchaeota (16) (2) _vs_ Baldrarchaeota (2) Odinarchaeota (2) _vs_ Wukongarchaeota (3) Baldrarchaeota (2) _vs_ Wukongarchaeota (3) Odinarchaeota (2) _vs_ Thaumarchaeota (13)Odinarchaeota (2) _vs_ Hermodarchaeota (17) Odinarchaeota (2) _vs_ Heimdallarchaeota (9) Baldrarchaeota (2) _vs_ Thaumarchaeota (13)Baldrarchaeota (2) _vs_ Hermodarchaeota (17) Baldrarchaeota (2) _vs_ Heimdallarchaeota (9)

f 100 g 100

80 80

60 60

Average amino acid identity (%) Average Average amino acid identity (%) Average 40 40

Lokiarchaeota (52) _vs_ Aigarchaeota (1) Lokiarchaeota (52) _vs_ Korarchaeota (2) Lokiarchaeota (52) _vs_ Helarchaeota (5) Helarchaeota (5) _vs_ Aigarchaeota (1) Helarchaeota (5) _vs_ Korarchaeota (2) Helarchaeota (5) _vs_ Helarchaeota (5) Lokiarchaeota (52) _vs_ Odinarchaeota (2) Lokiarchaeota (52) _vs_ BorrarchaeotaLokiarchaeota (3) (52) _vs_ Kariarchaeota (4) Helarchaeota (5) _vs_ Odinarchaeota (2) Helarchaeota (5) _vs_ BorrarchaeotaHelarchaeota (3) (5) _vs_ Kariarchaeota (4) Lokiarchaeota (52)Lokiarchaeota _vs_ Crenarchaeota (52) _vs_ (45)Bathyarchaeota (5) Lokiarchaeota (52) _vs_ Thorarchaeota (57) Lokiarchaeota (52) _vs_ Baldrarchaeota (2) Lokiarchaeota (52)Lokiarchaeota _vs_ Gerdarchaeota (52) _vs_ (14)HodarchaeotaLokiarchaeota (16) (52) _vs_ Lokiarchaeota (52) Helarchaeota (5)Helarchaeota _vs_ Crenarchaeota (5) _vs_ (45)Bathyarchaeota (5) Helarchaeota (5) _vs_ Thorarchaeota (57) Helarchaeota (5)Helarchaeota _vs_ Baldrarchaeota (5) _vs_ Lokiarchaeota (2) (52) Helarchaeota (5)Helarchaeota _vs_ Gerdarchaeota (5) _vs_ (14)Hodarchaeota (16) Lokiarchaeota (52) _vs_ Wukongarchaeota (3) Helarchaeota (5) _vs_ Wukongarchaeota (3) Lokiarchaeota (52) _vs_ Thaumarchaeota (13)Lokiarchaeota (52) _vs_ Hermodarchaeota (17) Lokiarchaeota (52) _vs_ Heimdallarchaeota (9) Helarchaeota (5) _vs_ Thaumarchaeota (13)Helarchaeota (5) _vs_ Hermodarchaeota (17) Helarchaeota (5) _vs_ Heimdallarchaeota (9)

h 100 i 100

80 80

60 60 Average amino acid identity (%) Average

Average amino acid identity (%) Average

40 40

Borrarchaeota (3) _vs_ Aigarchaeota (1) Borrarchaeota (3) _vs_ Korarchaeota (2) Borrarchaeota (3) _vs_ Helarchaeota (5) Heimdallarchaeota (9) _vs_ Aigarchaeota (1) Heimdallarchaeota (9) _vs_ Korarchaeota (2) Heimdallarchaeota (9) _vs_ Helarchaeota (5) Borrarchaeota (3) _vs_ Odinarchaeota (2) Borrarchaeota (3) _vs_ Kariarchaeota (4) Borrarchaeota (3) _vs_ Borrarchaeota (3) Heimdallarchaeota (9) _vs_ Odinarchaeota (2) HeimdallarchaeotaHeimdallarchaeota (9) _vs_ Borrarchaeota (9) _vs_ (3)Kariarchaeota (4) Borrarchaeota (3)Borrarchaeota _vs_ Crenarchaeota (3) _vs_ (45)Bathyarchaeota (5) Borrarchaeota (3) _vs_ Thorarchaeota (57) Borrarchaeota (3)Borrarchaeota _vs_ Baldrarchaeota (3) _vs_ Lokiarchaeota (2) (52) Borrarchaeota (3)Borrarchaeota _vs_ Gerdarchaeota (3) _vs_ (14)Hodarchaeota (16) HeimdallarchaeotaHeimdallarchaeota (9) _vs_ Crenarchaeota (9) _vs_ (45)Bathyarchaeota (5)Heimdallarchaeota (9) _vs_ Thorarchaeota (57)HeimdallarchaeotaHeimdallarchaeota (9) _vs_ Baldrarchaeota (9) _vs_ Lokiarchaeota (2) (52) HeimdallarchaeotaHeimdallarchaeota (9) _vs_ Gerdarchaeota (9) _vs_ (14)Hodarchaeota (16) Borrarchaeota (3) _vs_ Wukongarchaeota (3) Heimdallarchaeota (9) _vs_ Wukongarchaeota (3) Borrarchaeota (3) _vs_ Thaumarchaeota (13)Borrarchaeota (3) _vs_ Hermodarchaeota (17) Borrarchaeota (3) _vs_ Heimdallarchaeota (9) Heimdallarchaeota (9) _vs_ ThaumarchaeotaHeimdallarchaeota (13) (9) _vs_ Hermodarchaeota (17) Heimdallarchaeota (9) _vs_ Heimdallarchaeota (9)

100 100 j k

80 80

bioRxiv preprint60 doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint 60 (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. Average amino acid identity (%) Average Average amino acid identity (%) Average 40 40

Kariarchaeota (4) _vs_ Aigarchaeota (1) Kariarchaeota (4) _vs_ Korarchaeota (2) Kariarchaeota (4) _vs_ Helarchaeota (5) Gerdarchaeota (14) _vs_ Aigarchaeota (1) Gerdarchaeota (14) _vs_ Korarchaeota (2) Gerdarchaeota (14) _vs_ Helarchaeota (5) Kariarchaeota (4) _vs_ Odinarchaeota (2) Kariarchaeota (4) _vs_ Borrarchaeota (3) Kariarchaeota (4) _vs_ Kariarchaeota (4) Gerdarchaeota (14) _vs_ Odinarchaeota (2) Gerdarchaeota (14) _vs_ BorrarchaeotaGerdarchaeota (3) (14) _vs_ Kariarchaeota (4) Kariarchaeota (4)Kariarchaeota _vs_ Crenarchaeota (4) _vs_ (45)Bathyarchaeota (5) Kariarchaeota (4) _vs_ Thorarchaeota (57) Kariarchaeota (4)Kariarchaeota _vs_ Baldrarchaeota (4) _vs_ Lokiarchaeota (2) (52) Kariarchaeota (4)Kariarchaeota _vs_ Gerdarchaeota (4) _vs_ (14)Hodarchaeota (16) Gerdarchaeota (14)Gerdarchaeota _vs_ Crenarchaeota (14) _vs_ (45)Bathyarchaeota (5) Gerdarchaeota (14) _vs_ Thorarchaeota (57) Gerdarchaeota Gerdarchaeota(14) _vs_ Baldrarchaeota (14) _vs_ Lokiarchaeota (2) (52) Gerdarchaeota (14) _vs_ HodarchaeotaGerdarchaeota (16) (14) _vs_ Gerdarchaeota (14) Kariarchaeota (4) _vs_ Wukongarchaeota (3) Gerdarchaeota (14) _vs_ Wukongarchaeota (3) Kariarchaeota (4) _vs_ Thaumarchaeota (13)Kariarchaeota (4) _vs_ Hermodarchaeota (17) Kariarchaeota (4) _vs_ Heimdallarchaeota (9) Gerdarchaeota (14) _vs_ Thaumarchaeota (13)Gerdarchaeota (14) _vs_ Hermodarchaeota (17) Gerdarchaeota (14) _vs_ Heimdallarchaeota (9)

l 100 m 100

80 80

60 60 Average amino acid identity (%) Average Average amino acid identity (%) Average

40 40

Hodarchaeota (16) _vs_ Aigarchaeota (1) Hodarchaeota (16) _vs_ Korarchaeota (2) Hodarchaeota (16) _vs_ Helarchaeota (5) Wukongarchaeota (3) _vs_ Aigarchaeota (1) Wukongarchaeota (3) _vs_ Korarchaeota (2) Wukongarchaeota (3) _vs_ Helarchaeota (5) Hodarchaeota (16) _vs_ Odinarchaeota (2) Hodarchaeota (16) _vs_ BorrarchaeotaHodarchaeota (3) (16) _vs_ Kariarchaeota (4) Wukongarchaeota (3) _vs_ Odinarchaeota (2) Wukongarchaeota (3) _vs_ BorrarchaeotaWukongarchaeota (3) (3) _vs_ Kariarchaeota (4) Hodarchaeota (16)Hodarchaeota _vs_ Crenarchaeota (16) _vs_ (45)Bathyarchaeota (5) Hodarchaeota (16) _vs_ Thorarchaeota (57) Hodarchaeota (16)Hodarchaeota _vs_ Baldrarchaeota (16) _vs_ Lokiarchaeota (2) (52) Hodarchaeota (16) _vs_ GerdarchaeotaHodarchaeota (14) (16) _vs_ Hodarchaeota (16) WukongarchaeotaWukongarchaeota (3) _vs_ Crenarchaeota (3) _vs_ (45)Bathyarchaeota (5)Wukongarchaeota (3) _vs_ Thorarchaeota (57)WukongarchaeotaWukongarchaeota (3) _vs_ Baldrarchaeota (3) _vs_ Lokiarchaeota (2) (52) WukongarchaeotaWukongarchaeota (3) _vs_ Gerdarchaeota (3) _vs_ (14)Hodarchaeota (16) Hodarchaeota (16) _vs_ Wukongarchaeota (3) Wukongarchaeota (3) _vs_ Wukongarchaeota (3) Hodarchaeota (16) _vs_ Thaumarchaeota (13)Hodarchaeota (16) _vs_ Hermodarchaeota (17) Hodarchaeota (16) _vs_ Heimdallarchaeota (9) Wukongarchaeota (3) _vs_ ThaumarchaeotaWukongarchaeota (13) (3) _vs_ Hermodarchaeota (17) Wukongarchaeota (3) _vs_ Heimdallarchaeota (9) a 100 ● ● ● ● ● ● ● ● ● ● ● ● ● ●

90

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

80

70 16S rRNA gene identity (%)

● ●

60

● ● ● ● ● ● ● ● ● ● ● ● ●

Background Aigarchaeota(1) Korarchaeota(1) Helarchaeota(1) Odinarchaeota(2) Kariarchaeota(4) Gerdarchaeota(8) Crenarchaeota(38) Bathyarchaeota(2) Thorarchaeota(10) Lokiarchaeota(12) Hodarchaeota(13) Thaumarchaeota(13) Hermodarchaeota(1) Wukongarchaeota(1) Heimdallarchaeota(12)

b 100 c 100

90 90

80 80

● ● ● ● ● ● ●

16S rRNA gene identity (%) ● 16S rRNA gene identity (%) 70 ● ● ● 70 ● ● ● ● ● ● ● ●

● 60 ● 60 ●

Thorarchaeota (10) _vs_ Aigarchaeota (1) Thorarchaeota (10) _vs_ Korarchaeota (1) Thorarchaeota (10) _vs_ Helarchaeota (1) Hermodarchaeota (1) _vs_ Aigarchaeota (1) Hermodarchaeota (1) _vs_ Korarchaeota (1) Hermodarchaeota (1) _vs_ Helarchaeota (1) Thorarchaeota (10) _vs_ Odinarchaeota (2) Thorarchaeota (10)Thorarchaeota _vs_ Kariarchaeota (10) _vs_ (4) Gerdarchaeota (8) Hermodarchaeota (1) _vs_ Odinarchaeota (2) HermodarchaeotaHermodarchaeota (1) _vs_ Kariarchaeota (1) _vs_ (4) Gerdarchaeota (8) Thorarchaeota (10)Thorarchaeota _vs_ Crenarchaeota (10) _vs_ (38) Bathyarchaeota (2) Thorarchaeota (10) _vs_ Lokiarchaeota (12) Thorarchaeota (10) _vs_ HodarchaeotaThorarchaeota (13) (10) _vs_ Thorarchaeota (10) HermodarchaeotaHermodarchaeota (1) _vs_ Crenarchaeota (1) _vs_ (38) Bathyarchaeota (2) Hermodarchaeota (1) _vs_ ThorarchaeotaHermodarchaeota (10) (1) _vs_ Lokiarchaeota (12) Hermodarchaeota (1) _vs_ Hodarchaeota (13) Thorarchaeota (10) _vs_ ThaumarchaeotaThorarchaeota (13) (10) _vs_ Hermodarchaeota (1) Thorarchaeota (10) _vs_ Wukongarchaeota (1) Hermodarchaeota (1) _vs_ Thaumarchaeota (13) HermodarchaeotaHermodarchaeota (1) _vs_ Wukongarchaeota (1) _vs_ Hermodarchaeota (1) (1) Thorarchaeota (10) _vs_ Heimdallarchaeota (12) Hermodarchaeota (1) _vs_ Heimdallarchaeota (12)

d 100 e 100

90 90

● ● 80 80 ●

● ●

● ● 16S rRNA gene identity (%) 16S rRNA gene identity (%) ● ●

● ● ● 70 ● 70

Odinarchaeota (2) _vs_ Aigarchaeota (1) Odinarchaeota (2) _vs_ Korarchaeota (1) Odinarchaeota (2) _vs_ Helarchaeota (1) Lokiarchaeota (12) _vs_ Aigarchaeota (1) Lokiarchaeota (12) _vs_ Korarchaeota (1) Lokiarchaeota (12) _vs_ Helarchaeota (1) Odinarchaeota (2)Odinarchaeota _vs_ Kariarchaeota (2) _vs_ (4) Gerdarchaeota (8) Odinarchaeota (2) _vs_ Odinarchaeota (2) Lokiarchaeota (12) _vs_ Odinarchaeota (2) Lokiarchaeota (12)Lokiarchaeota _vs_ Kariarchaeota (12) _vs_ (4) Gerdarchaeota (8) Odinarchaeota (2)Odinarchaeota _vs_ Crenarchaeota (2) _vs_ (38) Bathyarchaeota (2) Odinarchaeota (2) _vs_ ThorarchaeotaOdinarchaeota (10) (2) _vs_ Lokiarchaeota (12) Odinarchaeota (2) _vs_ Hodarchaeota (13) Lokiarchaeota (12)Lokiarchaeota _vs_ Crenarchaeota (12) _vs_ (38) Bathyarchaeota (2) Lokiarchaeota (12) _vs_ Thorarchaeota (10) Lokiarchaeota (12) _vs_ HodarchaeotaLokiarchaeota (13) (12) _vs_ Lokiarchaeota (12) Odinarchaeota (2) _vs_ Thaumarchaeota (13) Odinarchaeota (2) _vs_ Hermodarchaeota (1) Odinarchaeota (2) _vs_ Wukongarchaeota (1) Lokiarchaeota (12) _vs_ Thaumarchaeota (13) Lokiarchaeota (12) _vs_ Hermodarchaeota (1) Lokiarchaeota (12) _vs_ Wukongarchaeota (1) Odinarchaeota (2) _vs_ Heimdallarchaeota (12) Lokiarchaeota (12) _vs_ Heimdallarchaeota (12)

e 100 f 100

90 90

80

80

● ● ● ● ● ● ● ● ● ● ● ● ●

16S rRNA gene identity (%) ● ● ●

16S rRNA gene identity (%) ● 70 ● ● ● ● ● ● ● ● ● ● 70 ● ●

● ● ●

60 ●

60

Helarchaeota (1) _vs_ Aigarchaeota (1) Helarchaeota (1) _vs_ Korarchaeota (1) Helarchaeota (1) _vs_ Helarchaeota (1) Heimdallarchaeota (12) _vs_ Aigarchaeota (1) Heimdallarchaeota (12) _vs_ Korarchaeota (1) Heimdallarchaeota (12) _vs_ Helarchaeota (1) Helarchaeota (1) _vs_ Odinarchaeota (2) Helarchaeota (1)Helarchaeota _vs_ Kariarchaeota (1) _vs_ (4) Gerdarchaeota (8) Heimdallarchaeota (12) _vs_ Odinarchaeota (2) HeimdallarchaeotaHeimdallarchaeota (12) _vs_ Kariarchaeota (12) _vs_ (4) Gerdarchaeota (8) Helarchaeota (1) _vs_Helarchaeota Crenarchaeota (1) _vs_ (38) Bathyarchaeota (2) Helarchaeota (1) _vs_ Thorarchaeota (10) Helarchaeota (1) _vs_ Lokiarchaeota (12) Helarchaeota (1) _vs_ Hodarchaeota (13) HeimdallarchaeotaHeimdallarchaeota (12) _vs_ Crenarchaeota (12) _vs_ (38) Bathyarchaeota (2) Heimdallarchaeota (12) _vs_ Thorarchaeota (10) Heimdallarchaeota (12) _vs_ Lokiarchaeota (12) Heimdallarchaeota (12) _vs_ Hodarchaeota (13) Helarchaeota (1) _vs_ Thaumarchaeota (13) Helarchaeota (1) _vs_ Hermodarchaeota (1) Helarchaeota (1) _vs_ Wukongarchaeota (1) Heimdallarchaeota (12) _vs_ Thaumarchaeota (13) Heimdallarchaeota (12) _vs_ Hermodarchaeota (1) Heimdallarchaeota (12) _vs_ Wukongarchaeota (1) Helarchaeota (1) _vs_ Heimdallarchaeota (12) Heimdallarchaeota (12) _vs_ Heimdallarchaeota (12)

g 100 h 100 ●

● 90 ● ● 90 ●

80 80

● ● ● 16S rRNA gene identity (%)

16S rRNA gene identity (%) ● 70 ● ● ● ● ● ● 70 ● ● ● ● ● ● ●

60 bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the● preprint in perpetuity. It is made 60 available under aCC-BY-ND 4.0 International license.

Kariarchaeota (5) _vs_ Aigarchaeota (1) Kariarchaeota (4) _vs_ Korarchaeota (1) Kariarchaeota (4) _vs_ Helarchaeota (1) Gerdarchaeota (8) _vs_ Aigarchaeota (1) Gerdarchaeota (8) _vs_ Korarchaeota (1) Gerdarchaeota (8) _vs_ Helarchaeota (1) Kariarchaeota (4) _vs_ Odinarchaeota (2) Kariarchaeota (4) _vs_ Gerdarchaeota (8) Kariarchaeota (4) _vs_ Kariarchaeota (4) Gerdarchaeota (8) _vs_ Odinarchaeota (2) Gerdarchaeota (8) _vs_ Kariarchaeota (4) Gerdarchaeota (8) _vs_ Gerdarchaeota (8) Kariarchaeota (4) Kariarchaeota_vs_ Crenarchaeota (4) _vs_ (38) Bathyarchaeota (2) Kariarchaeota (4) _vs_ Thorarchaeota (10) Kariarchaeota (4) _vs_ Lokiarchaeota (12) Kariarchaeota (4) _vs_ Hodarchaeota (13) Gerdarchaeota (8)Gerdarchaeota _vs_ Crenarchaeota (8) _vs_ (38) Bathyarchaeota (2) Gerdarchaeota (8) _vs_ Thorarchaeota (10) Gerdarchaeota (8) _vs_ Lokiarchaeota (12) Gerdarchaeota (8) _vs_ Hodarchaeota (13) Kariarchaeota (4) _vs_ Thaumarchaeota (13) Kariarchaeota (4) _vs_ Hermodarchaeota (1) Kariarchaeota (4) _vs_ Wukongarchaeota (1) Gerdarchaeota (8) _vs_ Thaumarchaeota (13) Gerdarchaeota (8) _vs_ Hermodarchaeota (1) Gerdarchaeota (8) _vs_ Wukongarchaeota (1) Kariarchaeota (4) _vs_ Heimdallarchaeota (12) Gerdarchaeota (8) _vs_ Heimdallarchaeota (12)

i 100 j 100

● ● ● 90 ● 90

80 ● ● 80 ● ● ● 16S rRNA gene identity (%) 16S rRNA gene identity (%)

70

● ● 70 ●

60

Hodarchaeota (13) _vs_ Aigarchaeota (1) Hodarchaeota (13) _vs_ Korarchaeota (1) Hodarchaeota (13) _vs_ Helarchaeota (1) Wukongarchaeota (1) _vs_ Aigarchaeota (1) Wukongarchaeota (1) _vs_ Korarchaeota (1) Wukongarchaeota (1) _vs_ Helarchaeota (1) Hodarchaeota (13) _vs_ Odinarchaeota (2) Hodarchaeota (13)Hodarchaeota _vs_ Kariarchaeota (13) _vs_ (4) Gerdarchaeota (8) Wukongarchaeota (1) _vs_ Odinarchaeota (2) WukongarchaeotaWukongarchaeota (1) _vs_ Kariarchaeota (1) _vs_ (4) Gerdarchaeota (8) Hodarchaeota (13)Hodarchaeota _vs_ Crenarchaeota (13) _vs_ (38) Bathyarchaeota (2) Hodarchaeota (13) _vs_ Thorarchaeota (10) Hodarchaeota (13) _vs_ Lokiarchaeota (12) Hodarchaeota (13) _vs_ Hodarchaeota (13) WukongarchaeotaWukongarchaeota (1) _vs_ Crenarchaeota (1) _vs_ (38) Bathyarchaeota (2) Wukongarchaeota (1) _vs_ Thorarchaeota (10) Wukongarchaeota (1) _vs_ Lokiarchaeota (12) Wukongarchaeota (1) _vs_ Hodarchaeota (13) Hodarchaeota (13) _vs_ Thaumarchaeota (13) Hodarchaeota (13) _vs_ Hermodarchaeota (1) Hodarchaeota (13) _vs_ Wukongarchaeota (1) Wukongarchaeota (1) _vs_ Thaumarchaeota (13) Wukongarchaeota (1) _vs_ Hermodarchaeota (1) Wukongarchaeota (1) _vs_ Wukongarchaeota (1) Hodarchaeota (13) _vs_ Heimdallarchaeota (12) Wukongarchaeota (1) _vs_ Heimdallarchaeota (12) bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Universal 293

Archaea & Eukaryotes 62

Archaea & Bacteria 15

Archaea only 7

Eukaryotes only 1

Asgard only 0

0 50 100 150 200 250 300 350 G

Group 4c WP 012528510.1 Proteobacteria Deltaproteobacteria

Group 4c WP 027187553.1 Proteobacteria Deltaproteobacteria Group 4c WP 026783288.1 Proteobacteria Alphaproteobacteria Group 4c WP 018606884.1 Proteobacteria Betaproteobacteria

Group 4c WP 010939566.1 Proteobacteria Deltaproteobacteria Group 4c WP 008388614.1 Proteobacteria Alphaproteobacteria Group 4c WP 020881764.1 Proteobacteria Deltaproteobacteria

Group 4c WP 007527819.1 Proteobacteria Deltaproteobacteria

Group 4c WP 028535106.1 Proteobacteria Betaproteobacteria Group 4c WP 027191193.1 Proteobacteria Deltaproteobacter

Group 4c WP 015416298.1 Proteobacteria Deltaproteobacteria Group 4c WP 011474902.1 Proteobacteria Alphaproteobacteria Group 4c WP 028108808.1 Proteobacteria Gammaproteobacteria Group 4c WP 028112273.1 Proteobacteria Gammaproteobacteria

Group 4c WP 011389179.1 Proteobacteria Alphaproteobacteria

airetcabomrehtorpoC atoretcabomrehtorpoC 1.685345210 PW g4 puorG g4 PW 1.685345210 atoretcabomrehtorpoC airetcabomrehtorpoC

airetcabonahteM atoeahcrayruE 1.407314310 PW g4 puorG g4 PW 1.407314310 atoeahcrayruE airetcabonahteM

orG Group 4c WP 007468176.1 Proteobacteria Gammaproteobacteria

ietorpomrehT atoeahcranerC 1.665737410 PW g4 puorG g4 PW 1.665737410 atoeahcranerC ietorpomrehT

Tree scale: 1 puorG g4 PW 1.659269020 atoeahcranerC ietorpomrehT

Group 4c WP 012856977.1 Proteobacteria Epsilonproteobacteria puorG g4 PW 1.537257110 atoeahcranerC ietorpomrehT

Group 4c WP 007473061.1 Proteobacteria Epsilonproteobacteria Group 4c WP 014769235.1 Proteobacteria Epsilonproteobacteria

Group 4g WP 028841464.1 Thermodesulfobacteria Thermodesulfobacteria

Group 4c WP 015902767.1 Proteobacteria Epsilonproteobacteria

Group 4c WP 013121779.1

Group 4h WP 011832947.1 Euryarchaeota Methanomicrobia Group 4c WP 013809561.1 Firmicutes Clostridia Group 4hGroup WP 004076146.1 4h WP 013329140.1 Euryarchaeota Euryarchaeota Methanomicrobia Methanomicrobia Group 4c WP 011344721.1 Firmicutes Clostridia Group 4c WP 015594087.1 Firmicutes Bacilli

Group 4h WP 011449071.1 Euryarchaeota Methanomicrobia various Asgard archaea

airetc a b o etorp atle D airetc a b o etor P 1.1 6 2 8 6 3 9 0 0 P W c 4 p u or

Group 4h WP 011844026.1 Euryarchaeota Methanomicrobia airetc a b o etorp atle D airetc a b o etor P 1.8 7 8 7 0 0 6 0 0 P W c 4 p u or G

airetc a b o etorp atle D airetc a b o etor P 1.5 0 5 5 2 6 2 1 0 P W c 4 p u or G Group 4g WP 029009221.1 Proteobacteria Alphaproteobacteria Group 4h WP 014867151.1 Euryarchaeota Methanomicrobia Group 4g WP 013756005.1 Firmicutes Clostridia

ia Group 4h WP 004038499.1 Euryarchaeota Methanomicrobia atoretc a b o m re htorp o C 1.7 2 2 3 6 9 8 1 0 P W g 4 p u

Group 4g WP 027624751.1 Firmicutes Clostridia

Group 4h WP 013644869.1 Euryarchaeota Methanobacteria Group 4g WP 027099538.1 Firmicutes Clostridia Group 4g WP 005210163.1 Firmicutes Clostridia

Group 4h WP 004031618.1 Euryarchaeota Methanobacteria As 129-p 02096

Group 4g WP 027308223.1 Firmicutes Clostridia Group 4h WP 013825480.1 Euryarchaeota Methanobacteria Group 4g WP 012745069.1Thermotogae As 143-p 01022 Group 4g WP 008908605.1 Firmicutes Clostridia As 129-p 02846 Group 4h WP 013295617.1 Euryarchaeota Methanobacteria As 006-p 01423 Group 4h WP 016358530.1 Euryarchaeota Methanobacteria As 168-p 02492 Group 4h WP 019266264.1 Euryarchaeota Methanobacteria

airetc a b o m re htorp o C As 168-p 02277

As 101-p 00069

Group 4h WP 012956197.1 Euryarchaeota Methanobacteria As 171-p 04484

Group 4h WP 011018833.1 Euryarchaeota Methanopyri As 080-p 01531 As 130-p 00452 As 086-p 01626 As 022-p 01727 As 095-p 00419 As 103-p 00367

Group 4h WP 011972581.1 Euryarchaeota Methanococci Group 4d WP 013467760.1 Euryarchaeota Thermococci Group 4d WP 011251041.1 Euryarchaeota Thermococci Group 4h WP 012193964.1 Euryarchaeota Methanococci Group 4d WP 015849571.1 Euryarchaeota Thermococci Group 4d WP 012572555.1 Euryarchaeota Thermococci Group 4h WP 013180984.1 Euryarchaeota Methanococci Group 4d WP 004066458.1 Euryarchaeota Thermococci Group 4d WP 014013086.1 Euryarchaeota Thermococci Group 4d WP 029166411.1 Synergistia Group 4d WP 014734412.1 Euryarchaeota Thermococci Group 4d WP 006301161.1 Synergistetes Synergistia Group 4d WP 011012581.1 Euryarchaeota ThermococciGroup 4d WP 014163350.1 SynergistetesGroup 4d WP Synergistia 013048648.1Group Synergistetes 4d WP 009164181.1 Synergistia Synergistetes Synergistia Group 4d WP 013747503.1 Euryarchaeota ThermococciGroup 4d WP 009200527.1 Synergistetes Synergistia Group 4d WP 005662523.1 Synergistetes Synergistia Group 4d WP 013905871.1 Euryarchaeota Thermococci Group 4d WP 024821984.1 Synergistetes Synergistia Group 4hGroup WP 011973090.1 4h WP 013867209.1 Euryarchaeota Euryarchaeota Methanococci Methanococci

Group 4h WP 018154556.1 Euryarchaeota Methanococci

Group 4h WP 013799750.1 Euryarchaeota Methanococci Group 4d WP 004066437.1 Euryarchaeota Thermococci Group 4d WP 014013072.1 Euryarchaeota Thermococci Group 4h WP 010870018.1 Euryarchaeota Methanococci Group 4h WP 013100690.1 Euryarchaeota Methanococci group 4c

Group 4d WP 008083483.1 Euryarchaeota NA Group 4d WP 015283654.1 Euryarchaeota NA Group 4h WP 004592818.1 Euryarchaeota Methanococci Group 4d WP 011833332.1 Euryarchaeota Methanomicrobia

Group 4d WP 011449544.1 Euryarchaeota Methanomicrobia Group 4i WP 011972283.1 Euryarchaeota Methanococci Group 4d WP 004038443.1Group 4d WP Euryarchaeota 004077410.1 Methanomicrobia Euryarchaeota Methanomicrobia Group 4i WP 012194251.1 Euryarchaeota Methanococci Group 4i WP 013180047.1 Euryarchaeota Methanococci Group 4d WP 013329507.1 Euryarchaeota Methanomicrobia

Group 4i WP 018154032.1 Euryarchaeota Methanococci As 135-p 01818 Group 4i WP 011973026.1 Euryarchaeota Methanococci group 4g Group 4i WP 013866214.1 Euryarchaeota Methanococci group 4g Group 4b WP 020505780.1 Proteobacteria Gammaproteobacteria Group 4b WP 028450152.1 Proteobacteria Betaproteobacteria As 176-p 02784 Group 4i WP 007044316.1 Euryarchaeota Methanococci Group 4b WP 028451288.1 Proteobacteria Betaproteobacteria Group 4i WPGroup 013099747.1 4i WP 004590220.1 Euryarchaeota Euryarchaeota Methanococci Methanococci

Group 4b WP 011388069.1 Proteobacteria Alphaproteobacteria Group 4i WP 019262598.1 Euryarchaeota Methanobacteria Group 4i WP 010870540.1 Euryarchaeota Methanococci Group 4b WP 028842781.1 Nitrospira Group 4i WP 012956799.1 Euryarchaeota Methanobacteria Group 4b WP 028845048.1 Nitrospirae Nitrospira

Group 4i WP 016359143.1 Euryarchaeota Methanobacteria Group 4i WP 011407013.1 Euryarchaeota Methanobacteria

As 003-p 00361 Group 4bGroup WP 013467422.14b WP 012571983.1 Euryarchaeota Euryarchaeota Thermococci Thermococci Group 4i WP 013643876.1 Euryarchaeota Methanobacteria Group 4b WP 014123101.1 Euryarchaeota Thermococci Group 4i WP 013826680.1 Euryarchaeota Methanobacteria Group 4b WP 011839350.1 Crenarchaeota Thermoprotei Group 4i WP 004029221.1 Euryarchaeota Methanobacteria Group 4b WP 014737671.1 Crenarchaeota Thermoprotei Group 4b WP 013128928.1 Crenarchaeota Thermoprotei Group 4i WP 010876862.1 Euryarchaeota Methanobacteria group 4h Group 4b WP 014767907.1 Crenarchaeota Thermoprotei Group 4b WP 013561596.1 Crenarchaeota Thermoprotei

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint Group 4bGroup WP 004068373.14b WP 014012072.1 Euryarchaeota Euryarchaeota Thermococci Thermococci As 002-p 02316 Group 4b WP 012571234.1 Euryarchaeota Thermococci group 4d Group 4b WP 010868591.1 Euryarchaeota Thermococci (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Group 4b WP 015857865.1 Euryarchaeota Thermococci As 116-p 00058 As 106-p 00094 Group 4b WP 015857684.1 Euryarchaeota Thermococci available under aCC-BY-ND 4.0 InternationalAs 062-p 01097 license. Group 4b WP 013905846.1 Euryarchaeota Thermococci As 135-p 00999 As 063-p 01967 Group 4b WP 012572531.1 Euryarchaeota ThermococciGroup 4f WP 007780611.1 Firmicutes Clostridia As 011-p 00088 As 136-p 01227 Group 4a WP 014558423.1 Crenarchaeota ThermoproteiGroup 4f WP 014183752.1 Firmicutes Clostridia Asgard archaea cluster Group 4f WP 013623369.1 Firmicutes Clostridia GroupGroup 4f WP 4f 015756422.1 WP 013843216.1 Firmicutes Firmicutes Clostridia Clostridia Group 4f WP 019553945.1 Firmicutes group 4b Group 4f WP 018133527.1Group 4f WP Firmicutes 028986298.1 Bacilli Firmicutes Bacilli Group 4f WP 006323220.1 Firmicutes Bacilli As 133-p 02944As 053-p 00934 Group 4f WP 014795396.1 Firmicutes Clostridia Group 4f WP 015945149.1 Firmicutes Clostridia Group 4f WP 014902103.1 Firmicutes Clostridia group 4i Group 4f WP 009619272.1 Firmicutes Clostridia As 134-p 02543 As 034-p 00456 Group 4f WP 027938372.1 Firmicutes Negativicutes Group 4f WP 004094701.1 Firmicutes Negativicutes Group 4f WP 019880045.1 Firmicutes Negativicutes As 126-p 02361 Group 4f WP 021167282.1 Firmicutes Negativicutes As 091-p 03606 NONHYDROGENASE WP 027397032.1 Firmicutes Negativicutes As 117-p 00130 As 144-p 01628 As 031-p 00867 As 112-p 03635 As 122-p 00152 group 4f NONHYDROGENASE WP 021718870.1 Firmicutes Negativicutes As 027-p 01766

Group 4a WP 011751858.1 Crenarchaeota Thermoprotei

Group 4a WP 015051661.1 Firmicutes Clostridia As 149-p 01698 As 110-p 03657 Group 4a WP 025773400.1 Firmicutes Clostridia Group 4a WP 013400781.1 Firmicutes Bacilli As 150-p 01260 Group 4a WP 022587809.1 Firmicutes Clostridia Group 4a WP 007473606.1 Proteobacteria Epsilonproteobacteria As 151-p 01362 Group 4a WP 013134545.1 Proteobacteria Epsilonproteobacteria Group 4a WP 024953588.1 Proteobacteria Epsilonproteobacteria As 128-p 01290 Group 4a WP 021091787.1 Proteobacteria Epsilonproteobacteria Group 4a WP 023384105.1 Proteobacteria Epsilonproteobacteria Group 4a WP 024955559.1 Proteobacteria Epsilonproteobacteria Group 4a WP 014770116.1 Proteobacteria Epsilonproteobacteria Group 4a WP 011139635.1 Proteobacteria Epsilonproteobacteria

Group 4a WP 022739022.1 Coriobacteriia Group 4a WP 012802521.1 Actinobacteria Coriobacteriia Group 4a WP 006234109.1 Actinobacteria Coriobacteriia Group 4a WP 006362270.1 Actinobacteria Coriobacteriia

Group 4a WP 011474970.1 Proteobacteria Alphaproteobacteria Group 4a WP 008384313.1 Proteobacteria Alphaproteobacteria Group 4a WP 014249328.1 Proteobacteria Alphaproteobacteria Group 4a WP 029010114.1 Proteobacteria Alphaproteobacteria Group 4a WP 028536385.1 Proteobacteria Betaproteobacteria group 4a Group 4a WP 018609272.1 Proteobacteria Betaproteobacteria Group 4a WP 009113583.1 Proteobacteria Gammaproteobacteria Group 4a WP 013317233.1 Proteobacteria Gammaproteobacteria Group 4a WP 005187619.1 Proteobacteria Gammaproteobacteria Group 4a WP 025797024.1 Proteobacteria Gammaproteobacteria Group 4a WP 007370867.1 Proteobacteria Gammaproteobacteria Group 4a WP 001102339.1 Proteobacteria Gammaproteobacteria Group 4e WP 022184688.1 Actinobacteria Coriobacteriia Group 4a WP 006818557.1 Proteobacteria Gammaproteobacteria Group 4e WP 022379545.1 Actinobacteria Coriobacteriia Group 4a WP 002436255.1 Proteobacteria Gammaproteobacteria Group 4a WP 025151932.1 Proteobacteria Gammaproteobacteria Group 4e WP 022363091.1 Actinobacteria Coriobacteriia Group 4a WP 004245689.1 Proteobacteria Gammaproteobacteria Group 4e WP 012803120.1 Actinobacteria Coriobacteriia Group 4a WP 021804850.1 Proteobacteria Gammaproteobacteria Group 4e WP 015760641.1 Actinobacteria Coriobacteriia Group 4a WP 011092831.1 Proteobacteria Gammaproteobacteria Group 4a WP 002445316.1 Proteobacteria Gammaproteobacteria Group 4e WP 022739883.1 Actinobacteria Coriobacteriia Group 4a WP 017802632.1 Proteobacteria Gammaproteobacteria Group 4e WP 009138996.1 Actinobacteria Coriobacteriia Group 4a WP 020826906.1 Proteobacteria Gammaproteobacteria Group 4e WP 012798463.1 Actinobacteria Coriobacteriia Group 4a WP 025801577.1 Proteobacteria Gammaproteobacteria Group 4a WP 013511773.1 Proteobacteria Gammaproteobacteria Group 4e WP 006361427.1 Actinobacteria Coriobacteriia Group 4a WP 020845056.1 Proteobacteria Gammaproteobacteria Group 4e WP 009142078.1 Actinobacteria Coriobacteriia Group 4a WP 029095340.1 Proteobacteria Gammaproteobacteria Group 4e WP 006720658.1 Actinobacteria Coriobacteriia Group 4a WP 027275601.1 Proteobacteria Gammaproteobacteria 022349360.1 Actinobacteria Coriobacteriia Group 4a WP 005764546.1 Proteobacteria Gammaproteobacteria Group 4e WP Group 4a WP 005542188.1 Proteobacteria Gammaproteobacteria Group 4e WP 006235754.1 Actinobacteria Coriobacteriia Group 4a WP 005760524.1 Proteobacteria Gammaproteobacteria Group 4e WP 027399701.1 Firmicutes Clostridia Group 4a WP 017781337.1 Proteobacteria Gammaproteobacteria Group 4a WP 015466266.1 Proteobacteria Gammaproteobacteria

Group 4e WP 020449518.1 Euryarchaeota Thermoplasmata Group 4a WP 025820124.1 Proteobacteria Gammaproteobacteria Group 4a WP 007469705.1 Proteobacteria Gammaproteobacteria Group 4e WP 008860581.1 Firmicutes Negativicutes Group 4a WP 004726128.1 Proteobacteria Gammaproteobacteria Group 4e WP 022106764.1 Firmicutes Negativicutes Group 4a WP 021019734.1 Proteobacteria Gammaproteobacteria Group 4e WP 007070913.1 Firmicutes Negativicutes

Group 4e WP 027871860.1 Firmicutes Clostridia Group 4e WP 027203889.1 Firmicutes Clostridia Group 4e WP 027431522.1 Firmicutes Clostridia Group 4e WP 026651247.1 Firmicutes Clostridia Group 4e WP 028241974.1 Firmicutes Clostridia Group 4e WP 026835345.1 Firmicutes Clostridia

Group 4e WP 022790212.1 Firmicutes Erysipelotrichia

Group 4e WP Group015536221.1 4e WP 007047619.1Firmicutes Erysipelotrichia Firmicutes Clostridia Group 4e WP 014254276.1 Firmicutes Clostridia group 4e Group 4g WP 013560598.1 Chloroexi Anaerolineae Group 4g WP 011737331.1 Proteobacteria Deltaproteobacteria Group 4e WP 010251151.1 Firmicutes Clostridia Group 4g WP 012415297.1 Elus

Group 4g WP 008528233.1 Euryarchaeota Halobacteria

As 037-p 00889Group 4g WP 015284192.1 Euryarchaeota Methanomicrobia Group 4e WP 011937735.1Group Proteobacteria 4e WP 005213155.1 Deltaproteobacteria Firmicutes Clostridia

Group 4e WP 027716617.1Group Proteobacteria 4e WP 024346172.1 Deltaproteobacteria Firmicutes Clostridia imicro Group 4e WP 026892028.1 Firmicutes Clostridia bia As 135-p 01313 Group 4e WP 012199760.1 Firmicutes Clostridia Group 4e WP 003514863.1 Firmicutes Clostridia group 4g As 134-p 00139 As 024-p 00917 Group 4e WP 013625216.1 Firmicutes Clostridia As 118-p 00580 Group 4e WP 019226287.1 Firmicutes Clostridia As 116-p 02491 As 118-p 02013

As 106-pAs 00181133-p 02734

Group 4e WP 012619310.1 Euryarchaeota Methanomicrobia As 101-p 00691 Group 4e WP 011991040.1 Euryarchaeota Methanomicrobia As 006-p 01413 As 169-p 01856 As 143-p 01011 Group 4e WP 004039320.1 Euryarchaeota Methanomicrobia As 172-p 03633 As 173-p 01569 Group 4e WP 011843203.1 Euryarchaeota Methanomicrobia As 114-p 00864 Group 4e WP 014867437.1 Euryarchaeota Methanomicrobia Group 4e WP 013328417.1 Euryarchaeota Methanomicrobia As 179-p 00390 airetcaboe As 080-p 00859 Group 4e WP 004077817.1 Euryarchaeota Methanomicrobia As 086-p 00898 Group 4e WP 015284201.1 Euryarchaeota Methanomicrobia As 172-p 01874 As 181-p 03248 t As 169-p 01202 or As 178-p 01409 Group 4e WP 011448732.1 Euryarchaeota Methanomicrobia p atleD airetcaboetorP 1.922585020 PW e4puorG PW 1.922585020 airetcaboetorP atleD As 022-p 02023 As 129-p 01460 As 095-p 02348 Group 4e WP 012376920.1 Opitutae Group 4e WP 014777057.1 Proteobacteria Gammaproteobacteria As 130-p 00329

Group 4e WP 012969568.1 Proteobacteria Gammaproteobacteria Group 4e WP 015282154.1 Proteobacteria Gammaproteobacteria

As 049-p 00393 As 052-p 01560 Group 4e WP 014323623.1 Proteobacteria Deltaproteobacteria As 071-p 01758 As 157-p 02708 Group 4e WP 013515288.1 Proteobacteria Deltaproteobacteria As 158-p 00251 Group 4e WP 018125731.1 Proteobacteria Deltaproteobacteria As 160-p 01454

As 012-p 01476 Group 4e WP 022661459.1 Proteobacteria Deltaproteobacteria As 064-p 00591 As 021-p 00409 As 156-p 00963 As 072-p 00743 Group 4e WP 015336113.1 Proteobacteria Deltaproteobacteria As 161-p 02052 Group 4e WP 027178908.1 Proteobacteria Deltaproteobacteria As 029-p 01314 As 043-p 00187 As 043-p 02777

Group 4e WP 015720170.1Group 4e Proteobacteria WP 020879492.1 Deltaproteobacteria Proteobacteria Deltaproteobacteria

82300 p-830 sA p-830 82300 Group 4e WP 020887613.1 Proteobacteria Deltaproteobacteria Group 4e WP 029967123.1 Proteobacteria Deltaproteobacteria sA p-590 24010

Group 4e WP 027182881.1 Proteobacteria Deltaproteobacteria Group 4e WP 028572004.1 Proteobacteria Deltaproteobacteria As 075-p 00728 Group 4e WP 005986921.1 Proteobacteria Deltaproteobacteria puorG e4 PW 1.283671910 atoeahcrayruE atamsalpomrehT As 085-p 01234 Group 4e WP 021758533.1 Proteobacteria Deltaproteobacteria

026486855.1 Firmicutes Clostridia As 099-p 02586 Group 4e WP 010937737.1 Proteobacteria Deltaproteobacteria Group 4e WP 007524167.1 Proteobacteria Deltaproteobacteria As 109-p 02628 As 001-p 00152 As 090-p 01015 As 090-p 02671

ietorpomrehT atoeahcranerC 1.294921310 PW g4 puorG g4 PW 1.294921310 atoeahcranerC ietorpomrehT

As 031-p 02111 As 139-p 01638

Group 4e WP 003868941.1 Firmicutes Clostridia 3 5 2 0 0 p-6 0 0 s A Group 4e WP 012995592.1 Firmicutes Clostridia As 002-p 02309 As 137-p 00310 As 141-p 00737

As 126-p 03708 Group 4e WP 014757614.1 Firmicutes Clostridia 01.1 Firmicutes Clostridia As 107-p 00856 Group 4e WP 011024634.1 Firmicutes Clostridia As 122-p 01751 As 104-p 02344 As 091-p 00679 As 087-p 02378 Group 4e WP 027187946.1 Proteobacteria Deltaproteobacteria 150 sA As 138-p 02965

Group 4e WP - Group 4e WP 028587759.1 Proteobacteria Deltaproteobacteria Group 4e WP 012625292.1Group Proteobacteria4e WP 027190463.1 Deltaproteobacteria Proteobacteria Deltaproteobacteria 13.1 Coprothermobacteria Group 4e WP 012749853.1 Proteobacteria Deltaproteobacteria Group 4e WP 015926708.1 Firmicutes Clostridia Group 4e WP 029228227.1 Firmicutes Clostridia Group 4e WP 028573897.1 Proteobacteria Deltaproteobacteria Group 4e WP 028576381.1 Proteobacteria Deltaproteobacteria Group 4e WP 027631079.1 Firmicutes Clostridia

9 0 7 0 0 p-1 3 1 s A 97500 p

5 0 3 3 0 p-8 1 0 s A 7 9 4 1 0 p-8 9 0 s A various Asgard archaea

Group 4e WP 012302199.1 Firmicutes Clostridia

Group 4e WP 0150512 Group 4e WP 015738579.1 Firmicutes Clostridia Group 4e WP 013174844.1 Firmicutes Clostridia Group 4e WP 0189625

Group 4e WP 010936594.1 Chloroexi Dehalococcoidia

Group 4e WP 011034248.1 Euryarchaeota Methanomicrobia

Group 4e WP 011305188.1 Euryarchaeota Methanomicrobia

Group 4e WP 013638341.1Aqui cae Aqui cae

Group 4e WP 013537236.1Aqui cae Aqui cae

Group 4e WP 012035507.1 Euryarchaeota Methanomicrobia

Group 4e WP 012899275.1 Euryarchaeota Methanomicrobia Group 4e WP 012899230.1 Euryarchaeota Methanomicrobia

Group 4e WP 014405402.1 Euryarchaeota Methanomicrobia

Group 4e WP 012663581.1 Proteobacteria Epsilonproteobacteria

P 013460916.1 Proteobacteria Epsilonproteobacteria Group 4e WP 024789181.1 ProteobacteriaGroup 4e WP Epsilonproteobacteria 007474009.1 Proteobacteria Epsilonproteobacteria

Group 4e WP 008340210.1 Proteobacteria Epsilonproteobacteria

Group 4e W rG

puo

Group 3c WP

PW c3

010878869.1 EuryarchaeotaArchaeoglobi

32510

3

Group 3c WP 027191993.1 Proteobacteria Deltaproteobacteria 027187949.1 P

.617 Tree scale: 0.1 Group 3c WP

ruE 1

roteobacteria Deltaproteobacteria

ato e a h cra y

Group 3c WP 026167551.1 Proteobacteria Deltaproteobacteria

Group 3c WP 020882283.1 Proteobacteria Deltaproteobacteria

980 laH

o

Group 3c WP 012965352.1 EuryarchaeotaArchaeoglobi Group 3c WP 4

4

72 2

889

airetcaboetorpatleD ai 3 airetcaboetorpatleD

0 p-81 0

3

20 p-431 p-431 20

79

20

6

98300 p-110 sA p-110 98300

1 Group 3c WP 011698648.1 Proteobacteria Deltaproteobacteria 7

airetcab rmodesulfobacteria Thermodesulfobacteria 0 p-811 sA p-811 0

1

000

p

10 p-360 sA p-360 10

-

31 sA 1

430 The Group 3c WP

s Group 3c WP 011700738.1 Proteobacteria Deltaproteobacteria 012099129.1 Proteobacteria Deltaproteobacteria rG p A Group 3c WP 027371047.1 Proteobacteria Deltaproteobacteria 027368703.1 Proteobacteria Deltaproteobacteria

-

A

sA

s

63 Group 3c WP

A

Group 3c WP s

6940

601 sA

1 s 1

600 p-5

32

Group 3c WP 011188256.1 Proteobacteria Deltaproteobacteria A

p-811

0

Group 3c WP 015935331.1 Proteobacteria Deltaproteobacteria

0 008870928.1 Proteobacteria Deltaproteobacteria Group 3c WP 013637988.1Aqui cae Aqui cae p Group 3c WP

0

-260 sA -260 r Group 3c WP 028580885.1 Proteobacteria Deltaproteobacteria

Group 3c WP 023408290.1 Proteobacteria Deltaproteobacteria 7 etcaboetorP 04420 p-

01572 8 2 0 P W c3 p u o

A 028318776.1 Proteobacteria Deltaproteobacteria 79 5

0 7 5 0 0 p-3 3 1 s A

Group 3c WP 022668938.1 Proteobacteria Deltaproteobacteria 4435.1 Proteobacteria Deltaproteobacteria

350 s Group 3c WP 029898901.1 Proteobacteria Deltaproteobacteria

-

Group 3c WP 013179971.1 Euryarchaeota Methanococci Group 3c WP 015903075.1 Proteobacteria Deltaproteobacteria Group 3c WP 028840908.1

013756665.1 Firmicutes Clostridia

P1.19548 puorG c3 PW 1.386257510 Group 3c WP 011170767.1 Euryarchaeota Methanococci

r Group 3c WP 008869690.1 Proteobacteria Deltaproteobacteria o

8600 p

t Group 3c WP 012545782.1 Nitrospirae Nitrospira

0

Group 3c WP 011700332.1 Proteobacteria Deltaproteobacteria As 037-p 02044 Group 3c WP 028574551.1 Proteobacteria Deltaproteobacteria Group 3c WP 027183464.1 Proteobacteria Deltaproteobacteria

Group 3c WP 016359076.1 Euryarchaeota Methanobacteria Group 3c WP 013908379.1 Thermodesulfobacteria Thermodesulfobacteria Group 3c WP

Group 3c WP 028317553.1 Proteobacteria Deltaproteobacteria Group 3c WP 028580860.1 Proteobacteria Deltaproteobacteria Group 3c WP 022532976.1 Euryarchaeota Methanomicrobia

Group 3c WP 029966039.1 Proteobacteria Deltaproteobacteria Group 3c WP 012309762.1 Candidatus Korarchaeota NA

Group 3c WP 012956704.1 Euryarchaeota Methanobacteria Group 3c WP 015590675.1 EuryarchaeotaArchaeoglobi

Group 3c WP 011699072.1 Proteobacteria Deltaproteobacteria As 072-p 03930

Group 3c WP 011405926.1 Euryarchaeota Methanobacteria As 087-p 01315 Group 3c WP 022775381.1 Proteobacteria Betaproteobacteria Group 3c WP 011972821.1 Euryarchaeota Methanococci

Group 3c WP 010936390.1 ChloroexiGroup 3c WP Dehalococcoidia orp atle D airetc a b o e Group 3c WP 018154262.1 Euryarchaeota Methanococci t As 139-p 03413

Group 3c WP As 139-p 02380

Group 3c WP Group 3c WP 013799334.1 Euryarchaeota Methanococci 0 p-401 sA Group 3c WP 013826546.1 Euryarchaeota Methanobacteria iretcaboe 0 Group 3c WP 011843851.1 Euryarchaeota Methanomicrobia a

01321 3 Group 3c WP 014867732.1 Euryarchaeota Methanomicrobia Group 3c WP 013643973.1 Euryarchaeota Methanobacteria 4 As 002-p 02321

7 Group 3c WP 019177457.1 Euryarchaeota Thermoplasmata Group 3c WP 004039570.1 Euryarchaeota Methanomicrobia Group 3c WP 004031311.1 Euryarchaeota Methanobacteria 019264891.1 Euryarchaeota Methanobacteria Group 3c WP 012979819.1 Euryarchaeota Methanococci 1 4 5 0 0 p-7 0 1 s A 8178.1 Chloroexi Dehalococcoidia Group 3c WP 020448391.1 Euryarchaeota Thermoplasmata

01573 Group 3c WP 028894885.1 Proteobacteria Deltaproteobacteria

Group 3c WP 015739967.1 Firmicutes Clostridia As 105-p 05004 Group 3c WP 012376486.1 Verrucomicrobia Opitutae

As 141-p 00605 Group 3c WP 013684228.1 Euryarchaeota Archaeoglobi Group 3c WP Group 3c WP As 087-p 02526 As 075-p 01598 2794.1 Euryarchaeota Methanococci As 085-p 01362 As 003-p 00426 As 107-p 01049 As 099-p 02889 Group 3c WP 006930005.1 Calditrichaeota Calditrichae As 087-p 00736 Group 3c WP 014404721.1 Euryarchaeota Methanomicrobia Group 3c WP 013296307.1 Euryarchaeota Methanobacteria Group 3c WP 011018638.1 Euryarchaeota Methanopyri Group 3c WP 012036601.1 Euryarchaeota Methanomicrobia various Asgard archaea Group 3c WP 012302733.1 Firmicutes Clostridia 012901238.1 Eu 01333 As 027-p 01061

Group 3c WP 015617925.1 Firmicutes Clostridia

WP 013866362.1 0634.1 Eu

Group 3c W

ryarchaeota Methanomicrobiaryarchaeota Methanomicrobia

P 013413806.1 Euryarchaeota Methanobacteria

Group 3c WP 013644241.1 Euryarchaeota Methanobacteria

Group 3c WP 004032005.1 Euryarchaeota Methanobacteria

ia retc

a bo Group 3c WP 013826227.1 Euryarchaeota Methanobacteria te o rp a Group 3d WP 014208491.1 Actinobacteria Actinobacteria mma Group 3c WP 012901516.1 Euryarchaeota Methanomicrobia Group 3d WP 011781268.1 ActinobacteriaGroup 3d ActinobacteriaWP 005505390.1 Proteobacteria Gammaproteobacteria Group 3d WP 027856531.1 Proteobacteria Gammaproteobacteria Group 3c WP 014407035.1 Euryarchaeota Methanomicrobia ai G

Group 3c WP Group 3d WP 011516268.1 Proteobacteria NA r etcaboetor

As 091-p 01314

01203 P As 118-p 01389 .1 016967110 PW d3p PW 016967110

6509.1 Euryarchaeota Methanomicrobia Group 3d WP 020174963.1 Proteobacteria Alphaproteobacteria

Group 3d WP 012591738.1 Proteobacteria Alphaproteobacteria

Group 3d WP 006931831.1 Proteobacteria Alphaproteobacteria Group 3d WP 011394103.1 Proteobacteria Gammaproteobacteria Group 3d WP 028451929.1 Proteobacteria Betaproteobacteria

As 169-p 02083 Group 3d WP 008193041.1 Proteobacteria Alphaproteobacteria Group 3d WP 023411848.1 Proteobacteria Gammaproteobacteria

As 022-p 00078 u As 114-p 00674 Group 3d WP 020410523.1 Proteobacteria Gammaproteobacteria orG Group 3d WP 029133296.1 ProteobacteriaGroup 3d WP 020504783.1 Gammaproteobacteria Proteobacteria Gammaproteobacteria Group 3c KKK40681.1 Candidatus Lokiarchaeota NA Group 3d WP 007191642.1Group 3d Proteobacteria WP 028864775.1 Gammaproteobacteria Proteobacteria Gammaproteobacteria Group 3d WP 026596337.1 Proteobacteria Gammaproteobacteria

Group 3d WP 013966114.1 Proteobacteria Betaproteobacteria As 170 As 086-p 02212 Group 3d WP 010959484.1 Proteobacteria GammaproteobacteriaGroup 3d WPGroup 020393771.1 3d WP 009849435.1 Proteobacteria Proteobacteria Gammaproteobacteria Zetaproteobacteria As 179 As 101-p 01887 -p 02317

Group 3c WP 012469611.1 Proteobacteria Deltaproteobacteria -p 00531 As 172-p 00538As 168-p 01774

Group 3d WP 020161995.1 Proteobacteria Gammaproteobacteria

Group 3c WP Group 3c WP 004512544.1 Proteobacteria Deltaproteobacteria Group 3d WP 020565381.1 Proteobacteria Gammaproteobacteria

As 105-p 04491

As 068-p 03211 As 004-p 04576 As 064-p 02031 027358538.1 Proteobacteria Deltaproteobacteria Group 3d WP 027159095.1 Proteobacteria Gammaproteobacteria

Group 3d WP 005370476.1Group Proteobacteria 3d WP 020158180.1 Gammaproteobacteria Proteobacteria Gammaproteobacteria

As 090-p 01439 As 093-p 03438 As 112-p 00898 Group 3d WP 024299641.1 Proteobacteria Gammaproteobacteria As 098-p 00186 Group 3d WP 027148209.1 Proteobacteria Gammaproteobacteria As 102-p 02448 Group 3d WP 019865758.1 Proteobacteria Gammaproteobacteria Group 3d WPGroup 013816969.1 3d WP 017839865.1 Proteobacteria Proteobacteria Gammaproteobacteria Gammaproteobacteria Group 3d WP 026601561.1 Proteobacteria Gammaproteobacteria As 112-p 01564As 089-p 00722

WP 012532312.1 As 091-p 00152 As 093-p 02726 As 126-p 02983

As 122-p 03157 Group 3d WP 011380994.1 ProteobacteriaGroup Betaproteobacteria 3d WP 011765148.1 Proteobacteria Betaproteobacteria

As 109-p 01894 Group 3d WP 004174406.1 Proteobacteria Betaproteobacteria As 105-p 04203 Group 3d WP 018412521.1 Proteobacteria Betaproteobacteria Group 3d WP 009205622.1 Proteobacteria Betaproteobacteria

Group 3d WP 013029742.1 Proteobacteria BetaproteobacteriaGroup 3d WP 022771893.1 Proteobacteria Betaproteobacteria Group 3d WP 015436644.1 Proteobacteria Betaproteobacteria Group 3d WP 018992236.1 Proteobacteria Betaproteobacteria Group 3d WP 011466117.1 Proteobacteria Betaproteobacteria

Group 3d WP 011493687.1 Proteobacteria Betaproteobacteria As 098-p 04914 Group 3d WP 023490135.1 Proteobacteria Gammaproteobacteria Group 3d WP 012346489.1 Proteobacteria Betaproteobacteria As 048-p 00197 As 050-p 00448 As 047-p 01223 As 178-p 04000 Group 3d WP 008384205.1 Proteobacteria Alphaproteobacteria Group 3d WP 028994701.1 Proteobacteria Betaproteobacteria As 181-p 01329 Group 3d WP 008616907.1 Proteobacteria Alphaproteobacteria

As 057-p 00855 As 023-p 00285 As 056-p 02668 As 069-p 01656 Group 3d WP 014238042.1 Proteobacteria Betaproteobacteria Lokiarchaeota Group 3d WP 015765571.1 Proteobacteria Betaproteobacteria As 055-p 02462 As 057-p 03929 Group 3d WP 013293541.1 Proteobacteria Betaproteobacteria Group 3d WP 026687097.1 Proteobacteria Betaproteobacteria Group 3d WP 027457859.1 Proteobacteria Betaproteobacteria As 019-p 02026 Group 3d WP 002937457.1 Proteobacteria Betaproteobacteria Group 3d WP 004322153.1 Proteobacteria Betaproteobacteria

As 125-p 01 Group 3d WP 004252763.1 Proteobacteria Betaproteobacteria

As 008-p 01622 As 007-p 02515 155

As 016-p 00138

As 065-p 01584

As 019-p 02307

As 083-p 03351

As 058

-p 00402 As 165-p 01307

Group 3d WP 013559391.1 Chloroexi Anaerolineae As 165-p 02389

As 162-p 02317

As 029-p 00896 Group 3d WP 012548375.1 Dictyoglomi Dictyoglomia As 021-p 00359 Group 3d WP 009200281.1 Synergistetes Synergistia As 078-p 02217 Group 3d WP 014807493.1 Synergistetes Synergistia As 020-p 02253 Group 3d WP 005661842.1 Synergistetes Synergistia Group 3d WP 014163745.1 Synergistetes Synergistia Group 3d WP 029165455.1 Synergistetes Synergistia As 155-p 01988 Group 3d WP 013048933.1 Synergistetes Synergistia As 013-p 02158 Group 3d WP 024821667.1Group Synergistetes3d WP 008526412.1 Synergistia Euryarchaeota Halobacteria As 160-p 00727 As 068-p 02547 As 058-p 03852

Group 3d WP 009150973.1 Proteobacteria Gammaproteobacteria Group 3d WP 015281226.1 Proteobacteria Gammaproteobacteria As 093-p 01707As 067-p 00631 Group 3d WP 012971066.1 Proteobacteria Gammaproteobacteria

As 092-p 00538 Group 3d WP 020504162.1 Proteobacteria Gammaproteobacteria Group 3d WP 013451671.1 Deferribacteres Deferribacteres Group 3d WP 014777645.1 Proteobacteria Gammaproteobacteria Group 3d WP 013163642.1 Proteobacteria Deltaproteobacteria As 049-p 03013 Group 3d WP 007191859.1 Proteobacteria Gammaproteobacteria As 164-p 00800 Group 3d WP 021287226.1 Proteobacteria Epsilonproteobacteria Group 3d WP 007042613.1 Proteobacteria Gammaproteobacteria

As 163-p 02374

As 009-p 02443 Group 3d WP 028317309.1 Proteobacteria Deltaproteobacteria

As 012-p 01779 As 037-p 02082 Group 3d WP 015724758.1 Proteobacteria Deltaproteobacteria Group 3d WP 013559677.1 Chloroexi Anaerolineae Group 3d WP 028582678.1 Proteobacteria Deltaproteobacteria

As 072-p 01614 Group 3d WP 014808238.1 Proteobacteria Deltaproteobacteria Group 3d WP 011686721.1 Acidobacteriia

Group 3d WP 028585846.1 Proteobacteria Deltaproteobacteria As 038-p 00713

As 071-p 01625 Group 3d WP 011189453.1 Proteobacteria Deltaproteobacteria

As 155-p 01678

Group 3d WP 020876264.1 Proteobacteria Deltaproteobacteria As 013-p 00597

Group 3d WP 0116995Group 61.13d WP Proteobacteria 004513769.1 Deltaproteobacteria Proteobacteria Deltaproteobacteria As 021-p 00022 Group 3d WP 014434113.1 Chloroexi Caldilineae Group 3d WP 014551946.1 Proteobacteria Deltaproteobacteria As 007-p 01615

As 093-p 00608 Group 3d WP 012470833.1 Proteobacteria Deltaproteobacteria

As 065-p 02861 Group 3d WP 027388897.1 Chrysiogenetes Chrysiogenetes As 028-p 01009 Group 3d WP 013506608.1 Chrysiogenetes Chrysiogenetes As 028 -p 011 14 As 029-p 01521 As 157- Group 3d WPGroup 006562473.1 3d WP 015941206.1 Chloroexi Chloroexia Chloroexi Chloroexia p 02084 As 071-p 02924 As 072-p 03276 Group 3d WP 028457902.1 Chloroexi Chloroexia Group 3d WP 012119893.1 Chloroexi Chloroexia

As 037-p 00296 group 3c Group 3d WP 006512117.1 NA Group 3d WP 006512920.1 Cyanobacteria NA As 038-p 00989 Group 3d WP 012167698.1Group 3d WP Cyanobacteria 017324543.1 Cyanobacteria NA NA As 068-p 00674 Group 3d WP 024545803.1 Cyanobacteria NA Group 3d WP 012305837.1 Cyanobacteria NA As 064-p 02310

As 067-p 02304

As 164-p 00543 Group 3d WP 009756785.1 Cyanobacteria NA As 152-p 01686 Group 3d WP 006195766.1 Cyanobacteria NA As 163-p 00277 As 144 Group 3d WP 015153897.1 Cyanobacteria NA -p 01782 Group 3d WP 016874926.1 Cyanobacteria NA As 045-p 02964 Group 3d WP 015113185.1 Cyanobacteria NA

Group 3d WP 011321324.1 Cyanobacteria NA As 1 15-p 00036 Group 3d WP 015138057.1 016952 Cyanobacteria407.1 Cyanobacteria NA NA Group 3d WP As 094-p 01207 Group 3d WP 015079811.1 Cyanobacteria NA As 044-p 00301 Group 3d WP 015216232.1 Cyanobacteria NA Group 3d WP 006276675.1 Cyanobacteria NA As 044-p 00697 As 127-p 02333

Group 3d WP 029552892.1 Cyanobacteria NA As 127-p 01966 Group 3d WP 006170040.1 Cyanobacteria NA As 154-p 01 139 Group 3d WP 015110900.1 Cyanobacteria NA

Group 3d WP 006910395.1 Cyanobacteria NA As 036-p 00882 group 3d Group 3d WP 011378516.1 Cyanobacteria NA As 036-p 00083

Group 3d WP 012629409.1 Cyanobacteria NA As 088 -p 02393 Group 3d WP 015171978.1 Cyanobacteria NA

As 039-p 02722 Group 3d WP 015182972.1 Cyanobacteria NA

As 023-p 00284 As 040-p 00781 Group 3d WP 017305034.1 Cyanobacteria NA Group 3d WP 015144267.1 Cyanobacteria NA

Group 3d WP 019509061.1 Cyanobacteria NA 019498425.1 Cyanobacteria NA As 054-p 01255 Group 3d WP 008273694.1 Cyanobacteria NA Group 3d WP As 132-p 03976 Group 3d WP 012593478.1 Cyanobacteria NA As 077-p 01641 Group 3d WP 015956115.1 Cyanobacteria NA As 176-p 00301 Group 3d WP 016514321.1 Cyanobacteria NA 028947433.1 Cyanobacteria NA Group 3d WP As 004-p 04217

Group 3d WP 006669190.1 Cyanobacteria NA As 175-p 01075 Group 3d WP 015221229.1 Cyanobacteria NA

As 015-p 03118 Group 3d WP 017294725.1 Cyanobacteria NA

As 174-p 00972 Group 3d WP 023067990.1 Cyanobacteria NA Group 3d WP 015227029.1 Cyanobacteria NA

As 119-p 0 017659307.1 Cyanobacteria NA 0225 Group 3d WP

As 111-p 01764 Group 3d WP 006101686.1 Cyanobacteria NA

Group 3d WP 015151979.1 Cyanobacteria NA As 124-p 02519 Group 3d WP 017715405.1 Cyanobacteria NA As 142-p 01660

As 137-p 00534

As 046-p 02481

As 100-p 00502

As 178-p 04240

As 073-p 00518

As 076-p 01403

As 076-p 00959

As 073-p 01465

As 023-p 03063

As 041-p 01275

As 025-p 01167

As 145-p 01590

As 151-p 01275

As 150-p 01868 Group 3c WP 011018550.1 Euryarchaeota Methanopyri

As 128-p 03923 Group 3c WP 013099800.1 Euryarchaeota Methanococci

Group 3c WP 004589824.1 Euryarchaeota Methanococci

0819.1 Euryarchaeota Methanococci Group 3c WP 01579

Group 3c WP 015733560.1 Euryarchaeota Methanococci

Group 3c WP 013799074.1 Euryarchaeota Methanococci

Group 3c WP 013866983.1 Euryarchaeota Methanococci

Group 3c WP 018154225.1 Euryarchaeota Methanococci

Group 3c WP 011973924.1 Euryarchaeota Methanococci

Group 3c WP 011171638.1 Euryarchaeota Methanococci

Group 3c WP 012065828.1 Euryarchaeota Methanococci group 3c Group 3c WP 013180651.1 Euryarchaeota Methanococci

Group 3a WP 012107592.1 Euryarchaeota Methanomicrobia

Group 3a WP 011307257.1 Euryarchaeota Methanomicrobia Group 3a WP 011021012.1 Euryarchaeota Methanomicrobia Group 3a WP 011305486.1 Euryarchaeota Methanomicrobia Group 3a WP 011034943.1 Euryarchaeota Methanomicrobia

WP 015286660.1 Group 3a WP 012616797.1 Euryarchaeota Methanomicrobia

Group 3a WP 004076756.1 Euryarchaeota Methanomicrobia Group 3a WP 011449294.1 Euryarchaeota Methanomicrobia Group 3a WP 011832394.1 Euryarchaeota Methanomicrobia Group 3a WP 014867277.1 Euryarchaeota Methanomicr Group 3a WP 004037261.1 Euryarchaeota Methanomicrobiaobia

Group 3a WP 013414105.1 Euryarchaeota Methanobacteria Group 3a WP 010876915.1 Euryarchaeota Methanobacteria

Group 3a W P 004029354.1 Euryarchaeota Methanobacteria Group 3a WP group 3a 013643821.1 Euryarchaeota Methanobacteria Group 3a WP 013826770.1 Euryarchaeota Methanobacteria Group 3a WP 012956862.1 Euryarchaeota Methanobacteria

Group 3a WP 011406878.1 Euryarchaeota Methanobacteria Group 3a WP 019262624.1 Euryarchaeota Methanobacteria Group 3a WP 016359193.1 Euryarchaeota Methanobacteria

Group 3a WP Group 3a WP 012035102.1 Euryarchaeota Methanomicr 012901155.1 Euryarchaeota Methanomicrobia

Group 3a WP obia 012035310.1 Eu ryarchaeota Methanomicr Group 3a WP 014405594.1 Euryarchaeobiaota Methanomicrobia Group 3a WP 012901 198.1 Euryarchaeota Methanomicrobia Group 3a WP 007044698.1 Euryarchaeota Methanococci

Group 3a WP 013866359.1 Euryarchaeota Methanococci

Group 3a WP 018154259.1 Euryarchaeota Methanococci

Group 3a WP 01

Group 3a WP 197101117076891.14.1 Euryarchaeota Euryarchaeota Methanococci Methanococci

Group 3a WP 013179974.1 Euryarchaeota Methanococci

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

available under aCC-BY-ND 4.0 International license. Group 3a WP 022846160.1 Aqui cae Aqui cae

Group 3a WP 013537869.1Group 3a WP Aqui cae 013638061.1 Aqui cae Aqui cae Aqui cae Group 3a WP Group 3a WP 011019 Group 3a WP 004591340.1 013099931.1 Euryarchaeota Euryarchaeota299.1 EuMethanococci Methanococciryarchae Group 3a WP 015791787.1 Euryarchaeota Methanococci ota Methanopyri

Group 3a WP 011973755.1 Euryarchaeota Methanococci Group 3a WP 013867054.1 Euryarchaeota Methanococci Group 3a WP 007043840.1 Euryarchaeota Methanococci Group 3a WP 018154655.1 Euryarchaeota Methanococci As 012-p 02210 Group 3a WP 014012057.1 Eu Group 3a WP 013180376.1 Euryarchaeota Methanococci Group 3a WP ryarchaeota Thermococci Group 3a WP 011171326.1 Euryarchaeota Methanococci Group 012572 3a WP 521.1 Euryarchaeota Group 3a WP 011972500.1 Euryarchaeota Methanococci 013905833.1 Eu Group 3a WP 015857693.1 Euryarchaeota Thermococci Group 3a WP ryarcha Thermococci eota 014788989.1 Euryarchae The ota Group 3b WP 024444559.1 Actinobacteria Actinobacteria rmococci Group 3b WP 014814971.1 Actinobacteria Actinobacteria Thermococci Group 3b WP 003895383.1 Actinobacteria Actinobacteria Group 3b WP 003924002.1 Actinobacteria Actinobacteria

Group 3b WP 003918688.1 Actinobacteria Actinobacteria As 170 -p 00319 Group 3b WP 007168195.1 Actinobacteria Actinobacteria As 168-p 00720 Group 3b WP 020730473.1 Actinobacteria Actinobacteria Group 3b WP 023365427.1 Actinobacteria Actinobacteria As 172-p 01717

Group 3b WP 005514364.1 Actinobacteria Actinobacteria As 167 Group 3b WP 006331106.1 Actinobacteria Actinobacteria -p 00495 003938391.1 Actinobacteria Actinobacteria As 172-p 00628 Group 3b WP Actinobacteria As 171-p 04703 Group 3b WP 007298856.1 Actinobacteria Group 3b WP 021333622.1 Actinobacteria Actinobacteria As 080-p 00137 018160400.1 Actinobacteria Actinobacteria Group 3b WP Group 3b WP 025357598.1 Actinobacteria Actinobacteria As 022-p 01446 retcabai Group 3b WP 007033070.1 Actinobacteria Actinobacteria aire itcAon cAt 1.3ni aboct As 169-p 00152 10 231310 As 095-p 00463 puorGb3 PW 012891383.1 Actinobacteria Actinobacteria As 103-p 00526 Group 3b WP As 1 Group 3b WP 026214329.1 Actinobacteria Actinobacteria 14-p 02177

As 173-p 01352 Group 3b WP 015662841.1 Actinobacteria Actinobacteria As 080- 1 7 1 4 0 p-1 7 1 s A Group 3b WP 020123404.1 Actinobacteria Actinobacteria 010360429.1 Actinobacteria Actinobacteria p 00428 Group 3b WP As 177-p 00194 Group 3b WP 012998941.1 Actinobacteria Actinobacteria As 086-p 00403 Group 3b WP 004946981.1 Actinobacteria Actinobacteria 004936990.1 Actinobacteria Actinobacteria Group 3b WP Group 3b WP 018384886.1 Actinobacteria Actinobacteria Group 3b WP 003980535.1 Actinobacteria Actinobacteria 023544405.1 Actinobacteria Actinobacteria Group 3b WP 014677317.1 Actinobacteria Actinobacteria Group 3b WP

Group 3b WP 020874487.1 Actinobacteria Actinobacteria

Group 3b WP 013018709.1 Actinobacteria Actinobacteria Group 3b WP 026405334.1 Actinobacteria Actinobacteria Group 3b WP 020524726.1 Actinobacteria Actinobacteria

Group 3b WP 024287469.1 Actinobacteria Actinobacteria

Group 3b WP 027140596.1 Actinobacteria Actinobacteria

Group 3b WP 026311168.1Group 3b W ActinobacteriaP 011438850.1 Actinobacteria Actinobacteria Actinobacteria Group 3b WP 011719855.1 Actinobacteriaaproteobacteria Actinobacteria Alph

Group 3b WP 003611589.1 Proteobacteria Alphaproteobacteria Group 3b WP 018265836.1 Proteobacteria Group 3b WP 018407885.1 Proteobacteria Alphaproteobacteria Group 3b WP 016920195.1 Proteobacteria Alphaproteobacteria 580.1 Proteobacteria Gammaproteobacteria Group 3b WP 008939163.1012136 Proteobacteria Gammaproteobacteria Group 3b WP Group 3b WP 005000139.1 Proteobacteria Gammaproteobacteria

errucomicrobiae Group 3b WP 028214587.1Group 3b WP Proteobacteri 017774733.1a Betap Proteobacteriaroteobacteria Betaproteobacteria Group 3b WP 012383518.1 Proteobacteria Alphaproteobacteria Group 3b WP 012699124.1 Proteobacteria Gammaproteobacteria Group 3b WP 024807859.1 Verrucomicrobia NA Group 3b WP 018290138.1 Verrucomicrobia V

Group 3b WP 009021278.1 Proteobacteria Gammaproteobacteria

aproteobacteria Group 3b WP 012237207.1 Proteobacteria Deltaproteobacteria Alph

ldilineae

Group 3b WP 029008300.1 Proteobacteria Group 3b WP 009540223.1 Proteobacteria Alphaproteobacteria Group 3b WP 014068049.1 NA

Group 3b WP 028322239.1 Proteobacteria Deltaproteobacteria 014433201.1 Chloroexi Ca Group 3b WP 024079550.1 Proteobacteria Alphaproteobacteria Group 3b WP 008887819.1 Proteobacteria Alphaproteobacteria Group 3b WP

Group 3b CUU06124.1 Candidatus Kryptonia NA

Group 3b WP 005006913.1 Nitrospinia Group 3b WP 012637708.1 Proteobacteria Gammaproteobacteria 017290644.1 Cyanobacteria NA 009061764.1 Verrucomicrobia Methylacidiphilae Group 3b WP 007219971.1 Candidatus Brocadiae Group 3b WP Group 3b WP 027877222.1 Deinococcus-Thermus Deinococci Group 3b GroupWP 3b WP 012464725.1 Verrucomicrobia Methylacidiphilae Group 3b WP 027844918.1 Cyanobacteria NA Group 3b WP 012628086.1 Cyanobacteria NA

Group 3b WP 008098529.1 Verrucomicrobia Verrucomicrobiae

Group 3b WP 020864100.1 Thaumarchaeota NA Group 3b WP 013706649.1 Proteobacteria Deltaproteobacteria

Group 3b WP 020412786.1 Proteobacteria Gammaproteobacteria As 181-p 00705 As 178-p 03266

As 169-p 00794 As 048-p 02929 As 005-p 01601 As 171-p 04121 As 173-p 01036 As 047-p 00315

As 080 As 177-p 01484 As 059-p 03203 -p 02085As 179-p 00632 As 046-p 01940 As 086 As 014-p 01481 As 050-p 02570 As 022-p 02179 As 142-p 01058 -p 00810 Group 3b WP 027854910.1 Proteobacteria Gammaproteobacteria

010321726.1 Proteobacteria GammaGroupproteobacteria 3b WP 015258948.1 Proteobacteria Gammaproteobacteria As 144-p 02097

Group 3b WP 006786889.1 Proteobacteria Gammaproteobacteria As 108-pA s00728 123-p 03252 Group 3b WP 025280829.1 Proteobacteria Gammaproteobacteria Group 3b WP As 040-p 02732 Group 3b WP 007021942.1 ProteobacteriaGroup 3b WP Gammaproteobacteria028300037.1 Proteobacteria Gammaproteobacteria As 176-p 00980 As 123 As 176-p 01334 As 156-p 01310 -p 03253 As 144-p 00932

As 076-p 01112 As 100-p 03196 026610481.1 Proteobacteria Gammaproteobacteria

Group 3b WP 014029204.1 Proteobacteria Acidithiobacillia As 036-p 01802

Group 3b WP Group 3bGroup WP 018508430.1 3b WP 011311775.1 Proteobacteria Proteobacteria Betaproteobacteria Betaproteobacteria group 3b Group 3b WP 020680743.1 Proteobacteria Gammaproteobacteria As 088-p 02634 Group 3b WP 020653150.1 Proteobacteria Betaproteobacteria acteria As 081-p 00318 As 115-p 00275 proteob As 148-p 01958 Group 3b WP 027709971.1Group 3bProteobacteria WP 018608874.1 Gammaproteobacteria Proteobacteria Betaproteobacteria Group 3b WP 002708355.1 Proteobacteria Gammaproteobacteria Group 3b WP 014292817.1 Proteobacteria Gammaproteobacteria Group 3b WP 028490240.1 Proteobacteria Gammaproteobacteria As 154-p 00692

019350089.1 Proteobacteria Gammaproteobacteria Group 3b WP 026375374.1 Proteobacteria Gammaproteobacteria 003635857.1 Proteobacteri 028383773.1a ProteobacteriaGammaproteobacteria Gammaproteobacteria

Group 3b WP 028379943.1 Proteobacteria Gammaproteobacteria Group 3b WP As 147 Group 3b WP Group 3bGroup WP 0068714 3b WP 34.1 Proteobacteria Gammaproteobacteria As 174-p 02275 As 094-p 01689 -p 00326 028385762.1 Proteobacteria Gamma Group 3b WP 025386399.1 Proteobacteria Gammaproteobacteria Group 3b WP 027222572.1 Proteobacteria Gammaproteobacteria Group 3b WP 013032567.1 Proteobacteria Gammaproteobacteria As 045-p 01630 Group 3b WP 028989466.1 Proteobacteria Acidithiobacillia Group 3b WP 002691721.1 Proteobacteria Gammaproteobacteria Group 3b WP Group 3b WP 006969455.1 Proteobacteria Deltaproteobacteria As 023-p 00963 Group 3b WP 002644708.1 Planctomycetes Planctomycetia As 152-p 00910 Group 3b WP 013638346.1 Aqui cae Aqui cae As 077-p 01394 Group 3b WP 010585297.1 Planctomycetes Planctomycetia Group 3b WP As 132-p 01631 Group 3b WP 009094014.1 Planctomycetes Planctomycetia As 054

-p 00528 Group 3b WP 008515176.1 Firmicutes Clostridia As 130-p 01133 013537242.1 Aqui cae Aqui cae

Group 3b WP 018300187.1 Proteobacteria Gammaproteobacteria Aqui cae Aqui cae Group 3b WP 012308976.1 Candidatus Korarchaeota NA As 143-p 00397

As 006-p 00782

Group 3b WP 012280244.1 Proteobacteria Gammaproteobacteria Group 3b WP 015051716.1 Firmicutes Clostridia Group 3b W Aqui cae Aqui cae Group 3b WP 012963143.1

Group 3b WP 012646977.1 Proteobacteria Deltaproteobacteria P 011938049.1 Proteobacteri Group 3b WP 012530401.1 Proteobacteria Deltaproteobacteria

Group 3b WP 012676609.1 Aqui cae Aqui cae

Group 3b WP 008286766.1 Aqui cae Aqui cae Group 3bGroup WP 0289498 3b WP 012674555.140.1 Aqui cae Aqui cae Group 3b WP Group 3b WP

Group 3b WP 011012478.1 Euryarchaeota a Deltaproteobacteria 015720 027716793.1 Proteobacteria Deltaproteobacteria

1511 As 062-p 02970 565.1 Proteobacteria Deltaproteobacteria Group 3bGroup WP 3b WP 014734228.1 Euryarchaeota Thermococci As 024-pAs 00015 106-p 00715 1-p 0 Group 3b WP As 136-p 01341 As 116-p 01981 As 053-p 00553 As 01 As 063-p 00245As 134-p 00173 As 135-p 00451 Group 3b WP 013749017.1 Euryarchaeota Thermococci As 034-p 00789 010868072.1 Eu As 133-p 02164

013904776.1 Euryarchaeota

Group 3b W Group 3b WP Asgard archaea cluster Group 3b WP 013466811.1 Euryarchae Group 3b WP 012571009.1 Euryarchaeota Group 3b WP 004069412.1 Euryarchaeota

ryarchaeota Group 3b WP 015849791.1 Euryarchaeota Thermococci Group 3b WP 020953630.1 Eu Group 3b WP 014788486.1 Euryarchaeota Group 3b WP 013905579.1 Euryarchaeota Thermococci Group 3b WP Ther As 051-p 00383 Group 3b WP 010479608.1 Euryarchaeota Group 3b WP P 013466346.1 Euryarchaeota 014012582.1 Euryarchaeota Thermococci mococci As 131-p 00516 Group 3b W As 018-p 00049 As 084-p 00757As 038-p 02645 As 157-p 01221As 071-p 02982 Group 3b WP 010867988.1 Euryarchaeota The As 003-p 00416 As 067-p 00303 rmococci As 068-p 00599As 064-p 02743 014122 Thermococci rmoprotei -p 01526 As 037-p 00540 011712560.1 Proteobacteria Alphaproteobacteria Group 3b WP 010885378.1 EuryarchaeP 015848571.1ota Euryarchaeota The Group 3b WP 109 1 569.1 Euryarcha -p 02309 Group 3b WP 01 As 072 Group 3b W

-p 0 Group 3b W

As 173 Group 3b WP 014788000.1 Euryarchaeota ryarchaeota As 086-p 01631 ota As 171-p 03622 The As 080 As 168-p 01823 01 As 103-p 00373 As 022-p 01731 Ther Thermococcirmococci -p 00147 Group 3b WP P 1012 The 014734005.1 Eu As 095-p 02356 As 171-p 03712 eota Thermococci 156.1 Crenarchaeota As 169-p 02496 P Group 3b WP 1251 mococci As 172-p 03129 010479591.1 Eu rmococci As 167-p 01407 029.1 Eu As 061-p 01329 The As 085 The 019. The 014558 As 075-p 00153 rmococci P rmococci As 053 rmococci Euryarchaeota1 Thermococci The 012571493.1 Euryarchaeota ryarchaeota

Thermococci rmococci 014012081.1 Euryarchaeota Thermococci ryarchaeota Thermococci -p 00438 Group 3b W ryarchaeota Thermococci

Thermococci Thermococci

As 063-p 00195 As 064-p 00705

Thermococci

As 038-p 01098 As 004-p 02894

Thermococci

As 029-p 01339 As 014-p 02021 As 065-p 02124 As 142-p 01601

As 161

As 056-p 02945 AN ac AN As 057-p 00710

As 042-p 02687 -p 01206 As 166-p 00087As 165-p 02110

i As 129-p 00672 As 072

hportinm As 121-p 02042 As 056-p 00966

As 157-p 02646 As 037-p 01771

-p 02197

As 107-p 01801 Thermoprotei As 071-p 02544

O As 096-p 03155 As 026-p 00393 As 121-p 00372 As 078-p 01818 As 066-p 01261

s

roteobacteria utadid

As 055-p 01004 As 104-p 02552 As 069-p 00782 As 008-p 00865 As 060-p 02256 As 020-p 01794 899.1 Chlorobi Chlorobia

As 021-p 00428

aiboro na sA

A

C 1.8 C 20 012498 s

9 P sA l Group 3b WP 028842337.1 Nitrospirae Nitrospira 1 hC iborolhC 1 iborolhC hC As 093-p 00240 As 160-p 01425 2 sA As 141-p 00599 As 099-p 02884 -310 4 Group 3b WP 006926982.1 Calditrichaeota Calditrichae -55 As 159-p 00126

0

80

p p-261 sA As 158-p 01101

0 0 p

8220 0 674.1 Euryarchaeota Halobacteria 88210 p-

Group 3b W 79

9

38110

6281 Group 3b WP 028844226.1 Nitrospirae Nitrospira 5

4 As 087-p 00741

200 p-38 . P 6 As 139-p 02133

651 00 Group 3b WP 014737726.1 Crenarchaeota 008526 W 6

52 b3 Group 3b WP 029915222.1 Proteobacteria Deltaproteobacteria

27

p

Group 3b WP 012970030.1 Proteobacteria Gammaproteobacteria As 138-p 00938

9 6

-

Group 3b WP 013218375.1 Chloroexi Dehalococcoidia p 20

05210 PW PW 05210 0 022363293.1Actinobacteria Coriobacteriia p-9 34910

39 p 710 p-970sA 710

42730 42730 Group 3b WP 015283468.1 Euryarchaeota NA uorG Group 3b WP 008085116.1 Euryarchaeota NA

03900 p-330 sA p-330 03900

0

79500 p-651 sA p-651 79500

-9 7720 s 192

sA

0

40

A

-461 sA

Group 3b WP 017890430.1 CandidatusAtribacteria NA As 125-p 01953 Group 3b WP

0 p

sA

Group 3b WP 015403594.1 Proteobacteria Deltaproteobacteria -8

00 sA 00 b p 3 -91

5

010 p Group 3b WP Group 3b WP 048088262.1 Euryarchaeota Methanomicrobia p Group 3b WP 012506474.1 Chlorobi Chlorobia 0 Group 3b WP 027366709.1 Proteobacteria Deltap As 184-p 01 uorG Group 3b WP 011416040.1 Proteobacteria Deltaproteobacteria 26 Group 3b WP 013979629.1Actinobacteria Coriobacteriia 0 0 s Group 3b WP 009139266.1Actinobacteria Coriobacteriia

01300 p-21 2 4 7 1 0 p-5 5 1 s A A

5 3 2 2 0 p-3 6 1 s A Group 3b WP 006366269.1 Chlorobi Chlorobia Group 3b WP 015760489.1Actinobacteria Coriobacteriia sA

Group 3b WP 012466994.1 Chlorobi Chlorobia Group 3b WP 012501574.1 Chlorobi Chlorobia

Group 3b WP 028027114.1Actinobacteria Coriobacteriia Group 3b WP 011745973.1 Chlorobi Chlorobia

Group 3b WP 022739737.1Actinobacteria Coriobacteriia

Group 3b WP 029912396.1 Proteobacteria Deltaproteobacteria

Group 3b WP 016308610.1Actinoba cteria Coriobacteriia

Group 3b WP 011361294.1 Chlorobi Chlorobia

048090768.1 Euryarchaeota Methanomicrobia

Group 3b WP

various Asgard archaea

-p 00827

As 153 bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

ai

rG dir

aidirtsolC setucimri aidirtsolC

t

solC setucim solC

W a1 puo

a Group 1a WP i

dir a

id aidi

t aidirts irtsolC irtsolC

s

r ri o ts

lC se lC

F

olC

1 aidir G ol Tree scale: 1.25 . F F

s 34083900 P 31331471 r se 013011808.1 Deferribacteres Deferribacteres t setucimriF C 1

1 aid

ucimriF 1. ucimriF pu a1 .587936720PW t

rG 1.78 tucimriF e

Group 1a WP 013885747.1 Deferribacteres tu solC set solC 1.

puo

i

uo rtsolC setucimriF 1.2068085 setucimriF rtsolC

cim

orG a1 p

ri

setucivitageN setucimriF 1.377961120 PW a PW 1.377961120 setucimriF setucivitageN

pu etorP

0 F o 1.548281410 PW PW 1.548281410 micutes Clostridia

u

3 P

cimriF 1.975877 cimriF PW a1 0 PW 265 Group 1a WP ab Group 1a WP 022664892.1 Pr a1 1

0 W .4 c puo a1 W Group 1a WP 015336980.1 ProteobacteriGroup 1aa Deltap WP roteobacteria

Group 1a WP 2

3 2

109410 PW a1 puorG a1 PW 109410 puorG P 639 2 84

0600

8 6 a1 G

1 70010 P 70010 1 008909589.1 Firmicutes Clostridia W a1 puorG

or puorG a1 PW 0

aidirts Group 1a WP 014261065.1 Proteobacteria Deltaproteobacteria r u PW a

G i G 063720 o 17930 004071 r Group 1a WP dir

1.98

10 P Group 1a WP 010890826.1 Firmicutes Clostridia 015851 or G ts

o r p atle D air et

027178 1. u

o

899

l

7 t Group 1a WP 015774148.1 Proteobacteria Deltaproteobacteria . e W o

00 PW a1 a1 PW 00 C a1 lC

p

etorP

set

1 a Group 1a WP 015958012.1 Firmicutes Clostridia o puorG

a s 552.1 Pr puor 1 Group 1a WP 0 Group 1a WP 016206881.1 Firmicutes Clostridia

925.1 Pr 0 PW a1 puorG a1 PW 0

e

uc 658.1 Proteobacteria Deltaproteobacteria t Group 1a WP 028573845.1 Proteobacteria Deltaproteobacte .4029390

im tcab ucim Group 1a WP 003493671.1 Firmicutes Clostridia 027185047.1 Proteobacteria Deltaproteobacteria e Group 1a WP 017209204.1 Fir Group 1a WP 014254247.1 Firmicutes Clostridia 0 PW

airetcabo

r ir G

iF 1. iF oteobacter a oteobacteria Deltaproteobacteria Deferribacteres p

52 riF

iretc a b o etor P oteobacteria Deltaproteobacteria uorG

Group 1a WP 6 tc a b o etor P 1

e etorP 1

eD 9

o

l 4 1

ir 34 1.2 3 0 8 6 3 11 0 P W a 1 p .21 b o etor P 1.7 0 3 5 3 3 4 2 0

rG Group 1a WP 028784867.1 Firmicutes Bacilli a ab Group 1a WP o Group 1a W 933

c u

orpat 8 tca

ia Deltaproteobacte 1.249 563720 PW a1 PW 563720 Group 1a WP 019123746.1 Firmicutes Bacilli

et

2 Group 1a WP 026688533.1 Firmicutes Bacilli atleD

ire a o

airet PW 010

a1 p 1.241493 Group 1a WP 025219113.1 Firmicutes Bacilli 1

rp puorG 029968 riF Group 1a WP 013162757.1 Proteobacteria Deltaproteobacteria D o etorp atle D a

m

cab Group 1a WP 012959609.1 Firmicutes Bacilli

t leD Group 1a WP 017379378.1 Firmicutes Bacilli oeto cab atle PW

P uci 004072606.1 Proteobacteria Deltaproteobacteria at Group 1a WP 024346501.1 Firmicutes Clostridia

b re 014959255.1 Proteobacteria Deltaproteobacteria t Group 1a WP 013272840.1 Firmicutes Clostridia

t

iretc a b o etor P

ai e Group 1a WP 014116604.1 Firmicutes Clostridia Group 1a WP 792.1 Proteobacteria Deltaproteobacteria a1 ire

a 5

2700

etca etorp PW 10

8

p r Group 1a WP 011 o aB s

tleD a uorG Group 1a GroupWP 010244488.1 1a WP 015924586.1 Firmicutes Firmicutes Clostridia Clostridia ai pu Group 1a WP 004615979.1 Firmicutes Clostridia Group 1h WP a 08 ic ab p

ll o c ria r i Group 1h WP 020540932.1 Actinobacteria Actinobacteria rG

.84

Group 1h WP 026402909.1 Actinobacteria Actinobacteria F 1 airetc a b o et o rp 025324287.1 Proteobacteria Deltap airet Group 1h W Group 1h WP

012854 imri Group 1h WP Group023553242.1 1h WP Actinobacteria Actinobacteria Group 1h WP 187404.1 Proteobacteria Deltaproteobacteria Group 1a WP 029915207.1 Pr

tuc Group 1h WP airetc a b o et o e

Group 1h WP 003988175.1 ActinobacteriaP Actinobacteria s

N 014376814.1 Actinobacteria897.1 Actinobacteria Actinobacteria Group 1a WP 015739752.1 Firmicutes Clostridia 027944 Group 1a W Group 1h WP 014150953.1 Actinobacteria Actinobacteria Group 1a WP 015049787.1 Firmicutes Clostridia Group 1a W Group 1a WP Group 1a WP 013217520.1 Chloro exi Dehalococcoidia Group 1h WPGroup 1h WP 009155165.1 Actinobacteria 007516725.1 011603554.1 Actinobacteria Actinobacteria tage Group 1h WP 028059415.1 Actinobacteria Thermoleophilia Group 1a WP

014670577.1 Actinobacteria444.1 Actinobacteria Actinobacteria ria ivi

c

Group 1a W u

t

Group 1h WP Group 1a WP e Group 1h WP P 028052542.1 Firmicutes Clostridia s Group 1h WP 009945368.1 Actinobacteria Actinobacteria Group 1a WP P 015739219.1 Firmicutes Clostridia 028062027.1 Actinobacteria Thermoleophilia Group 1a W Group 1a WP 012032 Actinob Group 1a WP Group 1a WP Group 1a WP 005812775.1 Firmicutes Clostridia Group 1h WP 021597135.1 Actinobacteria Actinobacteria P 013218717.1 Chloro exi Dehalococcoidia 015407385.1 Chloro exi Dehalococcoidia 012796301.1 Actinobacteria Actinobacteria Group 1a W roteobacteria acteria Group 1a WP oteobacteria Deltaproteobacteria 019819110.1 Actinobacteria Actinobacteria 009616 862.1 Firmicutes Clostridia Group 1h WP Actino Group 1h WP 013132859.1 Actinobacteria Actinobacteria P 014901856.1 Firmicutes Clostridia F 1.2 7 5 4 7 1 3 1 0 P W a 1 p u or G Group 1h WP 013755712.1 Fir Group 1a WP 025205 013119172.1213.1 Firmicutes Firmicutes Clostridia Clostridia i Actino 015262 bacteria 014828375.1 Firmicutes Clostridia imr Group 1h WP 008474746.1 Chloro exi Thermomic 484.1 Firmicutes Clostridia c Group 1h WP 003612931.1Group Proteobacteria 1h WP Alphaproteobacteria P 014794526.1 Firmicutes Clostridia 029651 bacteria 015262 308.1 Firmicutes Clostridia 016920676.1 Proteobacteria setu Group 1h WP 011 Actino 411.1 Proteobacteria Alphaproteobacteria 676.1 Fir 012872944.1 Chloro exi Thermomic micutes Clostridia solC bacteria Group 1hGroup W 1h WP 014267363.1 Acidobacteria Acidobacteriia aidirt micutes Clost 153987.1 Proteobacteria Betaproteobacteria

P 010587703.1 Planctomycetes Planctomycetia Group 1h WP 015247797.1 Planctomycetes Planctomycetia Group 1h W Alph Group 1h WP Group 1h WP 041979300.1 Acidobacteria Blastocatellia Group 1a WP a Group 1h WP Group 1h WP 01 aproteobacteria ridia ii r etcab Group 1h WP 018161883.1 Actinobacteria P 003922 020375837.1 Firmicutes Clostridia robia robia 003926809.1 Actinobacteria Actinobacteria ir o Group 1h WP 006334825.1 Actinobacteria Actinobacteria 013 139.1 Actinoba1688cteria Actinobacteria o Group Group1h WP 1h WP 007735075.1 Actinobacteria Actinobacteria C 119653.1 Firmicutes Clostridia 202.1 air Group 1h WP 003801347.1 Actinobacteria Acidobacteria Acidobacteriia etc Group 1h WP Group 1i WP 009141216.1 Actinobacteria Coriobacteriia 005517441.1 Actinobacteria Actinobacteria a b Group 1h WP o itcA1 n Group 1h WP 023543351.1020500148.1 Actinobacteria ActinobacteriaActinobacteria Group 1h WP 013806 Actino . Group 1h WP 9 0 Group 1h WP bacteria 4973 028930304.1771.1 Actinobacteria Actinobacteria Group 1h WP 0132256 018330638.146.1 Actinobacteria Actinobacteria Actinobacteria Actinobacteria Actinobacteria 006361856.1 Actinobacteria Coriobacteriia 2 Group 1h WP 007034607.1 Actinobacteria Actinobacteria Actinobacteria 2 028936 0 Group 1h WP airetc a b o nitc A airetc a b o nitc A 1.0 7 1 5 7 7 7 0 0 P W h 1 p u or G P 742.1 Group 1i WP 022363273.1 Actinobacteria Coriobacteriia Group 1i WP 0227400W 96.1 Actinobacteria Coriobacteriia i1 Group 1h WP013423 006543203.1858.1 Actinobacteria Actinoba Actinobacteriacteria Actinobacteria Actinobacteria Actinobacteria Group 1i WP 016308995.1 Actinobacteria Coriobacteriia group 1a Group 1i WP Group 1i WP 012803243.1p Actinobacteria Coriobacteriia Group 1h WP 012890477.1 u Actino o Group 1i WP 022184401.1 Actinobacteria Coriobacteriia Group 1h WPGroup 009061231.1 1h WP 006608003.1 Verrucomicro Actinobacteriabia Methylacidiphilae Actinobacteria Group 1i WP 012798421.1 Actinobacteria Coriobacteriia rG bacteria Group 1i WP 009139069.1 Actinobacteria Coriobacteriia Group 1i WP 022370065.1 Actinobacteria Coriobacteriia

Group 1i WP 009304598.1 Actinobacteria Coriobacteriia Group 1h WP 015922819.1 Chloro exi Thermomicrobia Group 1i WP 013979648.1 Actinobacteria Coriobacteriia Group 1h WP 007412921.1 Verrucomicro Actinobacteriabia Verrucomicrobiae Actinobacteria

Group 1h WP 007904650.1 Chloro exi Ktedonobacteria

Group 1h WP

012934014.1 Actinobacteria

The rmoleophilia

Group 1h WPGroup 012813797.1 1h WP 014512387.1 Bacteroidetes Crenarchae Flavobacteriotaia

Group 1b WP 027177852.1 Proteobacteria Deltaproteobacteria Group 1b WP 011369403.1 Proteobacteria Deltaproteobacteria Group 1b WP 010939796.1Group 1b WPProteobacteria 015415315.1 Deltaproteobacteria Proteobacteria Deltaproteobacteria group 1h Group 1b WP 018125512.1 Proteobacteria Deltaproteobacteria The rmoprotei

Group 1g WP Group 1g WP 014125970.1 Crenarchaeota Thermoprotei Group 1b WP 013258222.1 Proteobacteria Deltaproteobacteria 013680 997.1 Crenarchaeota Group 1b WP 015904103.1 Proteobacteria Deltaproteobacteria Group 1g WP 011850254.1 Crenarchaeota Thermoprotei Group 1b WP 028320012.1Group Proteobacteria 1b WP 020585310.1 Deltaproteobacteria Proteobacteria Deltaproteobacteria Group 1g WP 013337401.1 Crenarchaeota Thermoprotei Group 1b WP 011699790.1 ProteobacteriaGroup 1b WP Deltaproteobacteria 006964156.1 Proteobacteria Deltaproteobacteria Group 1g WP Thermoprotei Group 1b WP 028324338.1 Proteobacteria DeltaproteobacteriaGroup 1b WP 022666958.1 Proteobacteria Deltaproteobacteria 048100713.1 Crenarchaeota

Group 1g WP 020962524.1 Crenarchaeota Thermoproteiietorp o m re h T ato e a h cra n er C 1.6 5 9 1 6 7 11 0 P W g 1 p u or G Group 1b WP 028586938.1 Proteobacteria Deltaproteobacteria Group 1g WP 048100721.1 Crenarchaeota The Group 1g WPGroup 014025627.1 1g WP Crenarchaeota rmoprotei Group 1h WP 048100378.1 Crenarchaeota Thermoprotei Group 1g WP 01 011752262.1 Crenarchaeota Thermopr 1822 Thermoprotei 511.1 Crenarchaeota Thermoprotei Group 1g WP The Group 1b WP 020879797.1 Proteobacteria Deltaproteobacteria rmoprotei Group 1b WP 020888131.1 Proteobacteria Deltaproteobacteria 012123507.1 Crenarchaeota otei Group 1b WP 029459118.1 Proteobacteria Deltaproteobacteria

Group 1bGroup WP 029966051.1 1b WP 014261062.1 Proteobacteria Proteobacteria Deltaproteobacteria Deltaproteobacteria

Thermoprotei Group 1b WP 027191085.1 Proteobacteria Deltaproteobacteria

Group 1b WP 027182711.1 Proteobacteria Deltaproteobacteria Group 1b WP 015750727.1 Proteobacteria Deltaproteobacteria Group 1bGroup WP 015772985.11b WP 027179541.1 Proteobacteria Proteobacteria Deltaproteobacteria Deltaproteobacteria group 1g Group 1b WP 015337220.1 Proteobacteria Deltaproteobacteria Group 1b WP 015851793.1 Proteobacteria Deltaproteobacteria Group 1b WP 028574549.1 Proteobacteria GroupDeltaproteobacter 1b WP 027188779.1ia Proteobacteria Deltaproteobacteria

Group 1b WP 028572157.1Group 1b WP Proteobacteria 027361001.1 Deltaproteobacteria Proteobacteria Deltaproteobacteria

Group 1b WP 006008218.1 Proteobacteria Deltaproteobacteria Group 1b WP 011368035.1 ProteobacteriaGroup 1b WPDeltaproteobacteria 009380705.1 Proteobacteria Deltaproteobacteria Group 1b WP 010939208.1 Proteobacteria Deltaproteobacteria

Group 1b WP 021760964.1 Proteobacteria Deltaproteobacteria Group 1b WP 013515487.1 Proteobacteria Deltaproteobacteria Group 1b WP 027175156.1Group 1b Proteobacteria WP 018126002.1 Deltaproteobacteria Proteobacteria Deltaproteobacteria Group 1b WP 022660651.1Group Proteobacteria 1b WP 015414692.1 Deltaproteobacteria Proteobacteria Deltaproteobacteria group 1i

025322049.1 Proteobacteria Deltaproteobacteria

Group 1b WP

Group 1b WP 007474315.1 Proteobacteria Epsilonproteobacteria Group 1b WP 012674350.1 Aquicae Aquicae

Group 1b WP 012675774.1 Aquicae Aquicae Group 1b WP 029520216.1 Aquicae Aquicae Group 1b WP 013553045.1 Proteobacteria Epsilonproteobacteria Group 1j WP 013682857.1 Euryarchaeota Archaeoglobi Group 1j WP Group 1b WP 012082337.1 Proteobacteria Epsilonproteobacteria 015591 Group Group1b WP 1b014468937.1 WP 013135164.1 Proteobacteria Proteobacteria Epsilonproteobacteria Epsilonproteobacteria Group 1j WP 0129404799.102.1 Eu Euryarcharyarchaeota Archaeoglobi eota Archaeoglo Group 1b WP 013460908.1 Proteobacteria Epsilonproteobacteria10.1 Proteobacteria Epsilonproteobacteria bi Group 1j WP 012965344.1 Euryarchaeota Group 1b WP 012083422.1 ProteobacteriaGroup 1b WP Epsilonproteobacteria 025391229.1 Proteobacteria Deltaproteobacteria Group 1b WP 014353799.1 Firmicutes Clostridia Group 1j WP 010878877.1 Euryarchaeota Archaeoglobi Group 1b WP 022670762.1 Proteobacteria Deltaproteobacteria Archaeoglobi Group 1b WP 0133271454.1 Proteobacteria Gammaproteobacteria 1371 Group 1b WP 013460913.1 Proteobacteria Epsilonproteobacteria Group 1b WP 01 PW k1puorG Group 1k WP 01 1021171.1 Euryarchaeota Methanomicrobiaaib orci m o n a hte M ato e a h cra yru E 1.7 3 2 4 3 0 1 1 0 Group 1k WP 011306827.1 Euryarchaeota Methanomicrobia Group 1k WP 028115083.1 Proteobacteria Gammaproteobacteria 013898215.1 Euryarchaeota Methanomicrobia Group 1b WP group 1j Group 1b WGroupP 028110915.1 1b WP 023403981.1 Proteobacteria Proteobacteria Gammaproteobacteria Gammaproteobacteria Group 1k WP 01 acteria 1306833.1 Euryarchaeota Methanomicrobia Group 1b WP Group013344606.1 1b WP Proteobacteria005368230.1 Proteobacteria Gammaproteobacteria Gammaproteobacteria Group 1b WP oteobacteria005299657.1 GammaproteobacterProteobacteria Gammaia proteobacteria Group 1k WP 011021166.1 Euryarchaeota Methanomicrobia 560.1 Pr 1637 028762126.1 Proteobacteria Gammaproteobacteria Group 1k WP 898.1 Proteobacteria Gammaproteob 023846630.1 Euryarchaeota Methanomicrobia Group 1b WP 01 Group 1b WPP 028768 Group 1k WP 012901677.1 Euryarchaeota Methanomicrobia Group 1b W Group 1b WP 012277350.1 Proteobacteria Gammaproteobacteria Group 1k WP 014405075.1 Euryarchaeota Methanomicrobia group 1b 014468932.1 Proteobacteria Epsilonproteobacteria Group 1k WP 012036281.1 Euryarchaeota Methanomic robia Group 1b WP group 1k onproteobacteria Group 1b WP 013135169.1 Proteobacteria Epsilonproteobacteria Group 1b WP 021286544.1 Proteobacteria Epsilonproteobacteria As 085-p 01085 Group 1b WP 011373064.1 Proteobacteria Epsilonproteobacteria As 075-p 00539 Group 1b WP 008339404.1 Proteobacteria Epsil oteobacteria

Group 1b WP 012663540.1 Proteobacteria Epsilonproteobacteria Group 1b WP 007473991.1 Proteobacteria Epsilonproteobacteriaonproteobacteria Wukongarchaeota Group 1b WP 024789021.1 Proteobacteria Epsilonpr Group 1b WP 0249540887.107.1 Proteobacteria Proteobacteria Epsilonproteobacteria Epsil 012856 Group 1b WP Group 1b WP 005870439.1 Proteobacteria Epsilonproteobacteria Group 1b WP 012108717.1 Proteobacteria Epsilonproteobacteria 016647003.1 Proteobacteria Epsilonproteobacteria Group 1b WP

Group 1b WP 002849472.1P 012661 Proteobacteria425.1 Proteobacteri Epsilonproteobacteriaa Epsilonproteobacteria Group 1b W 020975057.1 Proteobacteria Epsilonproteobacteria Group 1b WP Group 1b WP 027305891.1 Proteobacteria Epsilonproteobacteria Group 1b WP 002952687.1 Proteobacteria Epsilonproteobacteria 366.1 Proteobacteria Epsilonproteobacteria Group 1b WP 021091 Group 1b WP 018136329.1 Proteobacteria Epsilonproteobacteria

Group 1b WP 013505701.1 Chrysiogenetes Chrysiogenetes Group 1b WP 027390843.1 Chrysiogenetes Chrysiogenetes Group 1b WP 013163129.1 Proteobacteria Deltaproteobacteria Group 1b WP 006655925.1 Proteobacteria Epsilonproteobacteria Group 1b WP 011139497.1 Proteobacteria Epsilonproteobacteria Group 1b WP 002956442.1 Proteobacteria Epsilonproteobacteria Group 1b WP 004087140.1 Proteobacteria Epsilonproteobacteria Group 1b WP 023929963.1 Proteobacteria Epsilonprote obacteria Group 1b WP 023949825.1 Proteobacteria Epsilonproteobacteria Group 1b WP 023926858.1 Proteobacteria Epsilonproteobacteria Group 1b WP 013023319.1 Proteobacteria Epsil onproteobacteria Group 1b WP 011577655.1 Proteobacteria Epsilonproteo Group 1b WP 013469730.1 Proteobacteria Epsilonproteobacteriabacteria Group 1b WP 013890446.1 Proteobacteria Epsilonproteobacteria Group 1b WP 006564 409.1 Proteobacteria Epsilonproteobacteria

Group 1c WP 013010514.1 Deferribacteres Deferribacteres Group 1c WP 022851983.1 Deferribacte

Group 1c WP Group 1c WP 013886108.1res Deferribacteres Deferribacteres Deferribacteres 010941450.1 Proteobacteria Deltaproteobacteria Group 1d WP 011830409.1 Proteobacteria Betaproteobacteria Group 1c WP 012468242.1 Proteobacteria Deltaproteobacteria Group 1d WP 026436975.1 Proteobacteria Betaproteobacteria Group 1c WP 015721386.1 Proteobacteria Deltaproteobacteria Group 1d WP 022981892.1 Proteobacteria Betaproteobacteria Group 1c WP ia Betaproteobacteria 011938839.1 Prote Group 1d WP 028309978.1 Proteobacter obacteria Deltaproteobacteria Group 1d AAB25780.1 Proteobacteria Betaproteobacteria 256.1 Proteobacteria Gammaproteobacteria Group 1c WP 015403419.1 Proteobacteria Deltaproteobacteria Group 1d WP 028007 Group 1d WP 011153927.1 Proteobacteria Betaproteobacteria oteobacteria Betaproteobacteria Group 1c WP 023408348.1 Proteobacteria Deltaproteobacteria Group 1d WP 028999751.1 Pr group 1c Group 1c WP oteobacteria Gammaproteobacteria 023105852.1 Pr 011187820.1 Proteobacteria Deltaproteobacteria Group 1d WP Group 1c WP proteobacteria Group 1c WP 013638066.1 Group 1d WP 028215130.1 Proteobacteria Betaproteobacteria 022846122.1 Aquicae Aquicae Group 1c WP Group 1d WP 020394857.1 Proteobacteria Gamma 013124489.1 Proteobacteria Betaproteobacteria 013537874.1 AquiAqucae Group 1d WP aproteobacteria Group 1c WP icae Aquicae Group 1c WP 022854121.1 025813214.1 Proteobacteria Alph Aquicae Group 1d WP Group 1c WP 01390 The Group 1c WP 028840815.16919.1 Thermodesrmodesulfobacteria Thermodesulfobacteria Group 1d WP 025825897.1 Proteobacteria Alphaproteobacteria WP 018391449.1 01390 Thermodesulfobacteria Thermodesu Group 1d WP 012384150.1 Proteobacteria Alphaproteobacteria 9681.1 Thermodesulfobacteria Thermodesu Group 1d WP 012169130.1 Proteobacteria Alphaproteobacteria Group 1c WP 01 oteobacteria Alphaproteobacteriabacteria Group 1c W16885P ulfobacteria Thermodesulfobacterialfobacteria Group 1c WP 011419537.159.1 Proteobacteria Deltaproteobacteria Group 1d WP 0142493 02900965.1006.1 Pr Proteobacteria Alphaproteobacteriabacteria Acidobacteria Acidobacteriia Group 1c WP 012098324.1 01152503 Proteobacter9.1 Acidobacteriaia Deltaproteobacter Acidobacteriiaia lfobacteria Group 1d WP 013913761.1 Proteobacteria Alphaproteo Group 1d WP Group 1d WP 029356678.1 Proteobacteria Alphaproteobacteria acteria Group 1d WPP 029908 023457248.1155.1 Proteobacteria Proteobacteria Gammaproteobacteria Alphaproteo Group 1c WP Group 1d W group 1f Group 1c 00915 WP 026199142.1 Proteobacteria Gammaproteobacteria Group 1c W 0983.1 Proteobacte Group 1d WP 027853880.1 Proteobacteria Gammaproteobacteria Group 1c WP Group 1d WP 027859843.1 Proteobacteria Gammaproteobacteria P 01478 Group 1c WP ria Gammaproteobacteria Group 1d WP 007019929.1 Proteobacteria Gammaproteob Group 1c WP 028581743.1 014198378.1 Proteobacteria0 Proteobacteria162.1 Proteobacter Deltaproteobacteria Alphiaaprote Gammaproteobacteria Group 1c WP 029009764.1 Proteobacte Group 1d WP 028293254.1 Proteobacteria Gammaproteobacteria 005863657.1 Proteobacteria Alphaproteobacteriabacteria Group 1c WP 027371077.1 Proteobacteria Deltaproteobacteria Group 1d WP 020745479.1 Proteobacteria Gammaproteobacteria 01572 Group 1d WP 5977.1 Proteobacte 020041893.1 Proteobacteria Alphaproteobacteria ria Group 1d WP 009505608.1 Proteobacteria Alphaproteobacteria Alphaproteobacteria obacteria Group 1d WP Group 1c WP ria Deltap Group 1d WP 008282509.1 Proteobacteria Alphaproteobacteria Group 1c W aproteobacteria Group 1c WP 028579802.1 Proteobacterroteobacteria Deltapia roteobacteria Group 1d WP 002720659.1 Proteobacteria AlphaproteobacteriaAlph P Group 1c WP 01171412 02858 2.1 Proteobacteria Alphaproteobacteria Group 1d WP 023664202.1 Proteobacteria Alphaproteo 3198.1 P Group 1c WP 009206448.1 02913 Proteobacter2834.1 Proteobacteria Betaproteobacteriaia Gammaproteobacteria Group 1d WP 028714011.1 Proteobacteria Alphaproteobacteria Group 1c WP roteobacteria Deltap Group 1d WP 028877918.1 Proteobacteria Alphaproteobacteria P W c1 p u orG Group 1c WP 011466351.1 Proteobacteria Betaproteobacteria Group 1d WP 008068006939138.1262.1 PrProteobacteriaoteobacteria Alphaproteobacteria 01 008838370.1 Proteobacteria Alphaproteobacteria 13840 Group 1d WP Group 1c WP 56.1 Prote roteobacteria 645.1 Proteobacteria Alphaproteobacteria Group 1c WP 004378930.1 Proteobacteria Betaproteobacteria Group 1d WP aproteobacteria b o etor P 1.9 1 4 0 8 0 4 2 0 013418 obacteria Group 1d WP 015597373.1 Proteobacteria Alphaproteobacteriaaproteobacteria group 1d Group 1c WP 01543 Alph 7386.1 Proteobacteria Betaproteobacter Alphaproteoiabacteria Group 1d WP GroupGroup 1c W 1c WP 015766767.1 Proteobacteria Betap Group 1d WP 027157817.1 Proteobacteria Gammaproteobacteriaaproteobacteria Group 1c WP 010863700.1 Proteobacteria Gammaproteobacteria aproteobacteria P 02745 01423 air etc a b o et o r p a h pl A air etc a Alph 8024.1 P Group 1c WP 7220.1 P Group 1d WP 020923409.1 Proteobacteria Alph roteobacteria Betap Group 1d WP 018240255.1 Proteobacteria Alphaproteobacteria roteobacteria Betaproteobacteria GroupGroup 1c WP 1c 029095540.1 WP 004236380.1 Proteobacteria P Gammaproteobacteria Group 1d WP 014891816.1 Proteobacteria oteobacteria Alphaproteobacteria acteria Group 1c WP 012135127.1 00737 Proteobacteria Gammaproteobacteria roteobacteria Group 1c WP 027275754.1 Proteobacteria Gammaproteobacteria Group 1d WP 006205723.1 Proteobacteria746.1 Pr Alphaproteobacteria 0146.1 Proteobacteria Gammaproteobacteria Group 1c WP 006659162.1 Proteobacteria Gammaproteobacteria roteobacte Group 1d WP 028738633.1 Proteobacteria Alphaproteobacteria aproteobacteria Group 1d WP 0281356 02833606.1 Proteobacteria Alph 626.1 Proteobacteria Alphaproteobacteria group 1e Gamm Group 1d WP 026600032.1 Proteobacteria Group 1c WP ria roteobacteria Gammaproteobacteria Group 1d WP 016919 Group 1c WP Group 1c WP Group 1d WP 026783388.1 Proteobacteria1130964.1 Proteobacteria Alphaproteobacteria Alphaproteobacteria Group 1c WP 010674Group404.1 P 1croteobacteria WP 012073017.1 00560 Gammaproteobacteria5173.1 Proteobacter Proteobacteria Gammia Gammaproteobacteria Group 1d WP 009540200.1Group 1d Proteobacteria WP 02 Alphaproteobacteria Group 1c WP 00557 GroupGroup 1c W 1c WP 024522506.1 Proteobacteria Gammaproteobacteria Group 1c WP Group 1c WP 005638557.1 Proteobacteria Gammaproteobacteria 6256.1 Proteobacteria Gammaproteobacteria Group 1dGroup WP 011388917.1 1d WP Proteobacteria Alphaproteobacteria Group 1d WP 011156496.1 Proteobacteria949.1 Proteobacteria Alphaproteob Gammaproteobacteria 011201506.1 Prote Group 1c WP 023580917.1P Proteobacteria Gammaproteo 004092218.1 Proteobacter 00576ia Gamm0858.1aproteobacteria Proteobacteria Gammaproteobacteria Group 1d WP 017366017.1 Proteobacteria Group 1c WPGroup 1c WP 013318689.1 Proteobacteria Gammaproteobacteria 009149 Group 1f WP 015898101.1 Acidobacteria Acidobacteriia Group 1d WP 026610556.1 Proteobacteria Gammaproteobacteria Group 1c WPGroup 1c W Group 1c WP 013366452.1 Proteobacteria Gammaproteobacteria aproteobacteria Group 1f WP Group 1f WP Group 1d WP Group 1c WP 011092818.1 00470 Proteobacteria 00243 Gammaproteobacteria obacteria Gammap Group 1f WP Group 1d WP 015282042.1 Proteobacteria Gammaproteobacteria P Group 1d WP 020502968.1 Proteobacteria Gammaproteobacteria 024528473.13021.1 Proteobacteria 00244P 5491.1 P Gammaproteobacteria Group 1d WP 007039718.1 Proteobacteria Gammaproteobacteria Group 1d WP 007190983.1 Proteobacteria Gammaproteobacteria 006562194.1 Chloro exi Ch loro exia026441619.1 ia Betaproteobacteria 3827.1 P roteobacteriaroteobacteria Gammaproteo Gammaproteobacteria Group 1d WP 002706872.1 Proteobacteria Gammaproteobacteria 012121 roteobacteria Group 1d WP 011767509.1 Proteobacteria Betaproteobacteria Group 1f WP Group 1f WP 028845527.1 Nitrospirae Nitrospira Group 1fGroup WP 1f WP roteobacteria Gammaproteobacteria Group 1f WP Group 1d WP 011629764.1 Proteobacteria Gammaproteobacteria 625.1 Chloro exi Chloro exiaGroup 1f WP Acidobacteria 018502767.1 Acidobacteriia Actinobacteria oteobacteria Gammaproteobacteria Group 1f WPG Group 1f WP 011937483.1 Proteobacteria Deltaproteobacteria 028842875.1 Nitrospirae Nitrospira 872.1 Proteobacteria Gammaproteobacteria 015004151.1 Proteobacteria Betaproteobacteria 020678 027713 Group 1f WP 006544185.1 Actinobacteria Actinobacteria bacteria Group 1f WP 013450229.1 Deferribacteres Defer 003830 014450582.1 Nitrospirae Nitrospira 004512555.1 Pr 214.1 Proteobacteria781.1 Pr Deltaproteobacteria bacteria Group 1d WP 010863916.1 Proteobacteria Gammaproteobacteria Group 1f WP Group 1d WP Group 1f WP 015718812.1 Proteobacteria Deltaproteobacte 012470110.1 Proteobacteria Deltaproteobacteria oteobacteria Deltaproteobacteria Group 1d WP 026685892.1 Proteobacter Group 1d 0 11526524.1WP Proteobacteria Deltaproteobacteria Group 1f WP Group 1d WP Group 1dGroup WP 023226188.1 1d WP 0026831Group Proteobacteri84.1 1d WPrP 028993715.1a Gammaproteoba Proteobacteriacteria Betaproteobacteria 013006836.1 Deferribacte oteobacter Group 1f WP 022851572.1 Deferribacteres Defer

Group 1d WP Group 1f WP Group 1d WP 009378513.1 Proteobacteria Deltaproteobacteria 013885 o etorP 1.4 2 5 4 2 3 5 2 0 ActinoP W f1 p u or Group 1d WP 012463656.1 Verrucomicrobia Methylacidiphilae

ia Deltaproteobacteria Group 1d WP 009059758.1 Verrucomicrobia Methylacidiphilae tcabe bacteria 838.1 Deferribacteres Defer D air Group 1f WP Group 1f WP 0130

Group 1e WP ridia 11747.1 Deferribacteres Deferribacteres res Deferribacteres torpatlee

ea Group 1e WP 013755279.1 Proteobacteria Gamma WP 006192752.1 Group 1e WP Group 1e WP 029133715.1 Proteobacteria Gammaproteobacteria Group 1e WP 012992451.1 Aquicae Aquicae

h abo Group 1e WP 015282140.1 Proteobacteria Gamma Group 1e WP 010880593.1 Aquicae Aquicae 012529798.1 Proteobacteria Deltaproteobacteria ribacteres

c Group 1e WP 012421152.1 V i Group 1e WP

rtidl

Group 1e WP 014779054.1 Proteobacteria Gammaproteobacteria Group 1e WP 007191025.1Group 1e Proteobacteria WP 020508075.1 Gammaproteobacteria Proteobacteria Gammaproteobacteria Group 1e WP 028490337.1 Pr aiborol airetc Group 1e WP 012962 a Group 1e WP

i

a 568.1 Firmicutes Negativicutes micutes Clostridia aibor a borol

ib C C

019542170.1 Firmicutes Negativicutes a Group 1f WP iborolhC iborolhC 1.235205210 PW d1 puorG d1 PW 1.235205210 iborolhC iborolhC

at ribacter a o iborolh 008287448.1 Aquicae Aquicae

r rG

h

o oeah 005840788.1 Firmicutes Negativicutes o orG lh o hC 745.1 Aquicae l 019554 illi 1.3 iborolhC C 012971 009147 er ria 029541750.1 Firmicutes Negativicutes h 600.1 Fir Group 1f WP 004007712.1 Actinobacteria Actinobacteria C iborol rucomicrobia Verrucomicrobiae ibor C Group 1f WP 022865058.1 Actinobacteria Actinobacteria uorG 1 pu es ibo uorG PW c

puorG Group 1f WP 005509950.1 Actinobacteria Actinobacteria irtidla Group 1f WP p e 007039064.1 Proteobacteria Gammaproteobacteria illi 002684413.1 Proteobacteria Gammaproteobacteria ribacteres

007954318.1 Firmicutes Negativicutes C orG e1 pu iborolhC 1.871853110 PW d1 d1 PW 1.871853110 iborolhC Group 1d WP orG 1 W

ro 020489980.1 Fir 015261076.1 Firmicutes Clost 230.1 Pr o 2510 062.1 Proteobacteria Gammaproteobacteria Group 1f WP Group 1f WP 014310124.1 Actinobacteria Actinobacteria lhC 014794 h pu Group 1d WP lhC 014826810.1 Firmicutes Clostridia airet pu Group 1f WP Group 1f WP 005388873.1 Actinobacteria

C

uorG 1 010880393.1 Aquicae Aquicae C 0 P Group 1d WP 012992146.1 Aquicae Aquicae

Group 1d WP 1.15

p

1.73 1

1

1.4496 027937761.1 Firmicutes Negativicutes e 3

c

1 0 PW e Aquic

9 .41856421 58100 PW P W e 1 p u or G aboetorpammaG aire aboetorpammaG oteobacteria Gammaproteobacteria 004013385.1 Actinobacteria Actinobacteria Group 1d WP 021167302.1 Firmicutes Negativicutes 10 PW e1 p Group 1d WP 012282567.1 Firmicutes Clostridia 11 Group 1d WP 1.36797

e 02 Group 1dGroup WP 007290417.1 1d WP 004573252.1 Firmicutes Firmicutes Negativicutes Negativicutes 4

W 40 Group 1f WP 025252525.1 Actinobacteria Actinobacteria 0 P W e 1 p u or G Group 1d WP 025206080.1 Firmicutes Clostridia PW 5

5 0 74

006555093.1 Firmicutes Negativicutes 00 Group 1d WP 014186363.1 Firmicutes Clostridia oteobacteria Gammaproteobacteria Group 1d WP 026762838.1 Firmicutes Negativicutes 0 Group 1d WP Group 1d WP 012281726.1 Firmicutes Clostridia puorG d1 PW 0 5210 00 8 027012481.1 Actinobacteria Actinobacteria 5080 Group 1d WP 006305436.1 Firmicutes Negativicutes Group 1d WP 025306136.1 Aquicae Aquicae 3 6 3 Group 1e WP 018610610.1 Proteobacteria Betaproteobacteria ae 5210 P 5210

Group 1d WP P Group 1d WP 014904046.1 Firmicutes Clostridia 6 0 006062526.1 Actinobacteria Actinobacteria

9 84792 Group 1d WP 0700 PW e1 P 028552966.1 Firmicutes Bacilli 0 3600 9 micutes Clostridia

7 3

Group 1d WP Group 1d WP roidetes Chitinophagia 6 8 Group 1d WP 027889016.1 Firmicutes Negativicutes Group 1d WP 008286892.1 Aquicae Aquicae 1.2 8

00 028257253.1 Firmicutes Negativicutes 69 7259810 P W e1 Group 1d WP 00877

P

. 673521

.

06 1.0 010631937.1 Firmicutes Bac 0 9 r Group 1d WP 003397243.1 Firmicutes Bacilli 5251 PW

P P 8 PW

P 1 005376164.1 Firmicutes Negativicutes Group 1d CUU03002.1 Candidatus Kryptonia NA 027414715.1 Firmicutes Bacilli 8 W d W

eto r Group 1d WP Group 1d WP 003330153.1 Firmicutes Bacilli W Group 1d WP 012513102.1 Aquicae Aquicae orP Group 1d WP 028254735.1 Firmicutes Negativicutes puorG d1 proteobacteria

d

bo Group 1d WP 015594636.1 Firmicutes Bacilli pu 1 884.1 Bacteroidetes Flavobacteriia Group 1d W d1 1 rP 1.4520

.248 rP 1.4

o Group 1d WP 020062534.1 Firmicutes Bacilli 1 proteob

puorG

puo boeto

Group 1d WP Group 1d WP 019241710.1 Firmicutes Bacilli

P

a etor P 1.7 2 0 9 3 aboet torP 1.8etca

t

or

e

o

caboeto r

rG Group 1d W p air etc a b o et o r P 1.

etc G

tor bo 007653239.1 Bacteroidetes Bacteroidia u 012025 Group 1d WP e Group 1d WP

a o retc a b o etor P 1

014217548.1 Bacte ir

022602547.1 Bacteroidetes Bacteroidia rG a acteria tcaboet c

eB e 008510449.1 Bacteroidetes Sphingobacteriia 008862337.1 BacteroidetesP Bacteroidia t Group 1d WP 023510524.1 Firmicutes Bac ai G t

e G Group 1d WP 013764260.1 Bacteroidetes Saprospiria a

iretc a biretcab o eto r

a r pateBa ai

tcabo

m P 1 P Actino Group 1d WP 005643376.1 Bacteroidetes Bacteroidia G orp ai

re G lA air

m

p

.

a h

838043620 PW e1 e1 PW 838043620 oet

m a

b Group 1d WP Group 1d WP

p htidicA a rpa bacteria

m a Group 1d WP Group 1d W r orpamma

maG ai

to

a Group 1d WP Group 1d WP t

boi

etcaboetor m etc Group 1d WP 025279286.1 Bacteroidetes Bacteroidia o etorp a m m a G airetc

orpamma

a b boe air air

p

torp

lica

e

l

i

a caboeto

tca

bo t e

retcaboe

a

i

oetor re ir a airetca a

b ai

tc

e

airetc a b o et

air

p

etca

uorG

r

i

a bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

- Carbohydrates CO2 O2 Nitrogen SulfurCl H2

As_083 As_078 Thorarchaeota As_029 As_028 As_114 Hermodarchaeota As_086 As_022 Odinarchaeota As_006 As_143 As_130 Baldrarchaeota As_129 As_127 As_077 Lokiarchaeota As_004 As_152 As_128 As_098 Helarchaeota As_050 As_048 As_183 Borrarchaeota As_181 As_178 As_099 Heimdallarchaeota As_139 As_138 As_030 Kariarchaeota As_140 As_002 As_017 As_116 Gerdarchaeota As_053 As_024 As_003 Hodarchaeota As_027 As_131 Wukongarchaeota As_085 Nitrogen fixation: nifH Sulfate reduction: sat Sulfate reduction: cysH Sulfate reduction: cysC Reductive Dehalogenase NiFe hydrogenase Group 1 NiFe hydrogenase Group 3 NiFe hydrogenase Group 4 Complex I: nuoD Complex II: sdhA Complex II: sdhC Complex II: sdhD Complex IV: Complex IV: cyoC Complex IV: cydA Complex V: atpA Complex V: atpB Complex V: atpC Nitrate reductase: narGYJI narK Nitrate/nitrite transporter: Nitrous oxide reductase: nosZ Compl Glycolysis Pentose phosphate pathway, oxidative Wood Wood Methane oxidation: mcrA Methane oxidation: mcrB Methane oxidation: mcrC Methane oxidation: mcrD Methane oxidation: mcrG Complex I: nuoA Complex I: nuoB Complex I: nuoC Acetate synthesis: acs/acdA Wood Gluconeogenesis Pyruvate oxidation Citrate cycle Pentose phosphate pathway, non-oxidative Pentose phosphate pathway, rev. RuMP beta As_075 Oxidation ex IV: cyoB Ljungdahl pathway: cdhD Ljungdahl pathway: cdhE Ljungdahl pathway: cooS cyoA

Absence in this phylum Presence in MAG Presence in other MAGs of this phylum bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Wukongarchaeota Ftr FwdD FwdC HdrC

As_075 contig_145_6336 Ftr FwdB FwdA Ftr HyaB

Ftr FwdD FwdC HdrC

As_085 contig_145_13955 FwdB FwdA Ftr HyaB HyaA