<<

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

1 Expanding diversity of and the elusive ancestry of

2

3 Yang Liu1†, Kira S. Makarova2†, Wen-Cong Huang1†, Yuri I. Wolf2, Anastasia Nikolskaya2, Xinxu 4 Zhang1, Mingwei Cai1, Cui-Jing Zhang1, Wei Xu3, Zhuhua Luo3, Lei Cheng4, Eugene V. Koonin2*, Meng 5 Li1*

6 1 Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen 7 University, Shenzhen, Guangdong, 518060, P. R. China

8 2 National Center for Biotechnology Information, National Library of Medicine, National Institutes of 9 Health, Bethesda, Maryland 20894, USA

10 3 State Key Laboratory Breeding Base of Marine Genetic Resources, Key Laboratory of Marine Genetic 11 Resources, Fujian Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, State 12 Oceanic Administration, Xiamen 361005, P. R. China

13 4 Key Laboratory of Development and Application of Rural Renewable Energy, Biogas Institute of 14 Ministry of Agriculture, Chengdu 610041, P.R. China

15 † These authors contributed equally to this work.

16 *Authors for correspondence: [email protected] or [email protected]

17

18

19 Running title: Asgard archaea genomics

20 Keywords:

1

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

21 Abstract

22 Comparative analysis of 162 (nearly) complete of Asgard archaea, including 75 not reported 23 previously, substantially expands the phylogenetic and metabolic diversity of the Asgard superphylum, 24 with six additional phyla proposed. Phylogenetic analysis does not strongly support origin of eukaryotes 25 from within Asgard, leaning instead towards a three- topology, with eukaryotes branching outside 26 archaea. Comprehensive protein domain analysis in the 162 Asgard genomes results in a major expansion 27 of the set of signature proteins (ESPs). The Asgard ESPs show variable phyletic distributions 28 and domain architectures, suggestive of dynamic evolution via horizontal transfer (HGT), gene loss, 29 gene duplication and domain shuffling. The results appear best compatible with the origin of the 30 conserved core of eukaryote from an unknown ancestral lineage deep within or outside the extant 31 archaeal diversity. Such hypothetical ancestors would accumulate components of the mobile archaeal 32 ‘eukaryome’ via extensive HGT, eventually, giving rise to eukaryote-like cells.

2

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

33 Introduction

34 The Asgard archaea are a recently discovered archaeal superphylum that is rapidly expanding, thanks to 35 metagenomic sequencing (1–5). The Asgard genomes encode a diverse repertoire of eukaryotic signature 36 proteins (ESPs) that far exceeds the diversity of ESPs in other archaea. The Asgard ESPs are particularly 37 enriched in proteins involved in membrane trafficking, vesicle formation and transport, 38 formation and the network, suggesting that these archaea possess a eukaryote-type cytoskeleton 39 and an intracellular membrane system (2).

40 The discovery of the Asgard archaea rekindled the decades old but still unresolved fundamental debate on 41 the evolutionary relationship between eukaryotes and archaea that has shaped around the ‘2-domain (2D) 42 versus 3-domain (3D) tree of ’ theme (6–8). The central question is whether the eukaryotic nuclear 43 lineage evolved from a common ancestor shared with archaea, as in the 3D tree, or from within the 44 archaea, as in the 2D tree. The discovery and phylogenomic analysis of Asgard archaea yielded strong 45 evidence in support of the 2D tree, in which eukaryotes appeared to share common ancestry with one of 46 the Asgard lineages, Heimdallarchaeota (1, 2, 5). However, the debate is not over as arguments have been 47 made for the 3D topology, in particular, based on the phylogenetic analysis of RNA polymerases, some of 48 the most highly conserved, universal proteins (9, 10).

49 Molecular phylogenetic methods alone might be insufficient to resolve the ancient ancestral relationship 50 between archaea and eukarya. To arrive at a compelling solution, supporting biological evidence is crucial 51 (11), because, for example, the transition from archaeal, ether-linked membrane to eukaryotic, 52 ester-linked lipids that constitute eukaryotic (and bacterial) membranes (12) and the apparent lack of the 53 capacity in Asgard archaea (13) are major problems for scenarios of the origin of eukaryotes 54 from Asgard or any other archaeal lineage. Some biochemical evidence indicates that Asgard archaea 55 possess an cytoskeleton regulated by accessory proteins, such as and gelsolins, and the 56 endosomal sorting complex required for transport machinery (ESCRT) that can be predicted to function 57 similarly to the eukaryotic counterparts (14–16). Generally, however, the biology of Asgard archaea 58 remains poorly characterized, in large part, because of their recalcitrance to growth in culture (17). To 59 date, only one Asgard archaeon, Candidatus Prometheoarchaeum syntrophicum strain MK-D1, has been 60 isolated and grown in culture (17). This organism has been reported to form extracellular protrusions that 61 are involved in its interaction with syntrophic , but no visible -like structure and, 62 apparently, little intracellular complexity.

63 The only complete, closed of an Asgard archaeon also comes from Candidatus P. syntrophicum 64 strain MK-D1 (17) whereas all other genome sequences were obtained by binning multiple

3

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

65 contigs. Furthermore, these genome sequences represent but a small fraction of the Asgard diversity that 66 has been revealed by 16S rRNA sequencing (18, 19).An additional challenge to the study of the 67 relationship between archaea and eukaryotes is that identification and analysis of the archaeal ESPs are 68 non-trivial tasks due to the high sequence divergence of many if not most of these proteins. At present, 69 the most efficient, realistic approach to the study of Asgard archaea and their eukaryotic connections 70 involves obtaining high quality genome sequences and analyzing them using the most powerful and 71 robust of the available computational methods.

72 Here we describe metagenomic mining of the expanding diversity of the superphylum Asgard, including 73 the identification of six additional -level lineages that thrive in a wide variety of ecosystems and 74 are inferred to possess versatile metabolic capacities. We show that these uncultivated Asgard groups 75 carry a broad repertoire of ESPs many of which have not been reported previously. Our in depth 76 phylogenomic analysis of these genomes provides insights into the evolution of Asgard archaea but calls 77 into question the origin of eukaryotes from within Asgard.

78

79 Results

80 Reconstruction of Asgard archaeal genomes from metagenomics data

81 We reconstructed 75 metagenome-assembled genomes (MAGs) from the Asgard superphylum that have 82 not been reported previously. These MAGs were recovered from various water depths of the Yap trench, 83 intertidal mangrove sediments of Mai Po Reserve (Hong Kong, China) and Futian Mangrove 84 Nature Reserve (Shenzhen, China), seagrass sediments of Lake Nature Reserve (Rongcheng, 85 China) and petroleum samples of Shengli oilfield (Shandong, China) (Supplementary Table 1, 86 Supplementary Figure 1). For all analyses described here, these 75 MAGs were combined with 87 87 publicly available genomes, resulting in a set of 162 Asgard genomes. The 75 genomes reconstructed here 88 were, on average, 82% complete and showed evidence of low contamination of about 3%, on average 89 (Supplementary Figure 2).

90

91 Classification of Asgard genes into clusters of orthologs

92 The previous analyses of Asgard genomes detected a large fraction of “dark matter” genes (20). For 93 example, in the recently published complete genome of Candidatus Prometheoarchaeum syntrophicum, 94 45% of the proteins are annotated as “hypothetical”. We made an effort to improve the annotation of 95 Asgard genomes by investigating this dark matter in greater depth, and developing a dedicated platform

4

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

96 for Asgard comparative genomics. To this end, we constructed Asgard Clusters of Orthologous Genes 97 (asCOGs) and used the most sensitive available methods of sequence analysis to annotate additional 98 Asgard proteins, attempting, in particular, to expand the catalogue of Asgard homologs of ESPs (see 99 Materials and Methods for details).

100 Preliminary clustering by sequence similarity and analysis of the protein cluster representation across the 101 genomes identified the set of 76 most complete Asgard MAGs (46 genomes available previously and 30 102 ones reported here) that cover most of the group diversity (Supplementary Table 1). The first version of 103 the asCOGs presented here consists of 14,704 orthologous protein families built for this 76-genome set. 104 The asCOGs cover from 72% to 98% (92% on average) of the proteins in these 76 genomes (additional 105 data file 1). Many asCOGs include individual domains of large, multidomain proteins.

106 The gene commonality plot for the asCOGs shows an abrupt drop at the right end, which reflects a 107 surprising deficit of nearly universal genes (Fig. 1). Such shape of the gene commonality curve appears 108 anomalous compared to other major groups of archaea or bacteria with many sequenced genomes (21). 109 For example, in the case of the TACK superphylum of archaea, for which the number of genomes 110 available is similar to that for Asgard, with a comparable level of diversity, the commonality plot shows 111 no drop at the right end, but instead, presents a clear uptick, which corresponds to the core of genes 112 represented in (almost) all genomes (Fig. 1). Apparently, most of the Asgard genomes remain incomplete, 113 such that conserved genes were missed randomly. Currently, there are only three gene families that are 114 present in all Asgard MAGs, namely, a Zn-ribbon domain, a Threonyl-tRNA synthetase and an 115 aminotransferase (additional data file 1).

116 We employed the asCOG profiles to annotate the remaining 86 Asgard MAGs, including those that were 117 sequenced in the later stages of this work (Supplementary Table 1). On average, 89% of the proteins 118 encoded in these genomes were covered by asCOGs (Supplementary Table 1). Thus, the asCOGs 119 database appears to be an efficient tool for annotation and comparative genomic analysis of Asgard 120 MAGs and complete genomes.

121

122 Expanding the phylogenetic diversity of Asgard archaea

123 Phylogenetic analysis of the Asgard MAGs based on a concatenated alignment of 209 core asCOGs (see 124 Methods and additional data file 2) placed many of the genomes reported here into the previously 125 delineated major Asgard lineages (Fig. 2a, Supplementary Table 1, additional data file 2), namely, 126 (n=20), (n=18), Hermodarchaeota (n=9), Gerdarchaeota (n=3),

5

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

127 Helarchaeota (n=2), and Odinarchaeota (n=1). Additionally, we identified 6 previously unknown major 128 Asgard lineages that appear to be strong candidates to become additional phyla (Fig. 2a and b, 129 Supplementary Table 1; see also Taxonomic description of new taxa in the Supplementary Information 130 and additional data file 1). A formed by As_085 and As_075 is a deeply branching sister group to 131 the previously recognized Heimdallarchaeota(2). Furthermore, our phylogenetic analysis supported the 132 further split of “Heimdallarchaeota” into 4 phylum-level lineages according to the branch length in the 133 concatenated phylogeny (see Materials and Methods; see also Taxonomic description of new taxa in the 134 Supplementary Information and additional data file 1). The putative phyla within the old 135 Heimdallarchaeota included the previously defined Gerdarchaeota (4), and three additional phyla that 136 could be represented by As_002 (LC2) , As_003 (LC3) and AB_125 (As_001), respectively. Another 3 137 previously undescribed lineages were related, respectively, to -, -, - and Thorarchaeota. 138 Specifically, As_181, As_178 and As_183 formed a clade that was deeply rooted at the Hel-Loki-Odin- 139 clade; As_129 and As_130 formed a sister group to Odinarchaeota; and a lineage represented by 140 As_086 was a sister group to Thorarchaeota. These results were buttressed by the 16S rRNA gene 141 phylogeny, comparisons of the mean identity and 16S rRNA sequence identity (Fig. 2b, 142 Supplementary Figure 3, Supplementary Figure 4, Supplementary Table 2 and Supplementary Table 3).

143 We propose the name Wukongarchaeota after Wukong, a Chinese legendary figure who caused havoc in 144 the heavenly palace, for the putative phylum represented by MAGs As_085 and As_075 (Candidatus 145 Wukongarchaeum yapensis), and names of Asgard deities in the for the other 5 146 proposed phyla: (1) Hodarchaeota, after Hod, the god of darkness, for MAG As_027 (Candidatus 147 Hodarchaeum mangrove); (2) Kariarchaeota, after Kari, the god of the North wind, for MAG As_030 148 (Candidatus Kariarchaeum pelagius); (3) Borrarchaeota after , the creator god and father of Odin, for 149 MAG As_133 (Candidatus Borrarchaeum yapensis); (4) Baldrarchaeota, after , the god of light and 150 brother of Thor, for MAG As_130 (Candidatus Baldrarchaeum yapensis); (5) and Hermodarchaeota after 151 Hermod, the messenger of the gods, son of Odin and brother of Baldr, for MAG As_086 (Candidatus 152 Hermodarchaeum yapensis) (Fig. 2a). For details, see Taxonomic Description of new taxa in the 153 Supplementary Information.

154 The gene content of Asgard MAGs agrees well with the phylogenetic structure of the group. The phyletic 155 patterns of the asCOG form clusters that generally correspond to the identified by phylogenetic 156 analysis (Fig. 3a), suggesting that gene gain and loss within Asgard archaea largely proceeded in a clock- 157 like manner and/or that horizontal gene exchange preferentially occurred between genomes within the 158 same clade.

159

6

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

160 Phylogenomic analysis and positions of Asgard archaea and eukaryotes in the tree of life

161 The outcome of phylogenetic reconstruction, especially, when deep branchings are involved, such as 162 those that are relevant for the 3D vs 2D conundrum, depends on the phylogenetic methods employed, the 163 selection of genes for phylogeny construction and, perhaps most dramatically, on the sampling 164 (22–24). In many cases, initially uncertain positions of lineages in a tree settle over time once more 165 representatives of the groups in question and their relatives become available.

166 In our analysis of the universal phylogeny, we aimed to make the species set for phylogenetic 167 reconstruction as broadly representative as possible, while keeping its size manageable, to allow the use 168 of powerful phylogenetic methods. The tree was constructed from alignments of conserved proteins of 169 162 Asgard archaea, 286 other archaea, 98 bacteria and 72 eukaryotes (see Supplementary Material and 170 Methods for details of the procedure including the selection of a representative species set and 171 Supplementary Table 4). Members of 30 families of (nearly) universal proteins that appear to have 172 evolved without much HGT and have been previously employed for the reconstruction of the tree of life 173 (25) were used to generate a concatenated alignment of 7411 positions, after removing low information 174 content positions (Supplementary Table 4). For the phylogenetic reconstruction, we used the IQ-tree 175 program with several phylogenetic models (see Methods and Supplementary Table 5 for details). 176 Surprisingly, the resulting trees had the 3D topology, with high support values for all key bifurcations 177 (Fig. 2c, additional data file 2).

178 A full investigation of the effects of different factors, in particular, the marker gene selection, on the tree 179 topology is beyond the scope of this work. Nonetheless, we addressed the possibility that the 3D topology 180 resulted from the model used for the tree reconstruction and/or the species selection. To this end, we 181 constructed 100 trees from the same alignment by randomly sampling 5 representatives of Asgard 182 archaea, other archaea, bacteria and eukaryotes each. For these smaller sets of species, the best model 183 identified by Williams et al. (LG+C60+G4+F) could be employed, resulting in 50 3D and 50 2D trees 184 (Supplementary Table 5, additional data file 2). Because IQ-tree identified this model as over-specified 185 for such a small alignment, we also tested a more restricted model (LG+C20+G4+F), obtaining 58 3D and 186 42 2D trees for the same set of 100 samples (Supplementary Table 5, additional data file 2).

187 The results of our phylogenetic analysis indicate that: 1) species sampling substantially affects the tree 188 topology; 2) even the set of most highly conserved genes that appear to be minimally prone to HGT, 189 yields conflicting signals for different species sets. Additional markers, less highly conserved and more 190 prone to HGT, are unlikely to improve phylogenetic resolution and might cause systematic error. Notably, 191 the topology of our complete (Fig. 2a) within the archaeal clade is mostly consistent

7

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

192 with the tree obtained in a preliminary analysis of a larger set of archaeal genomes and a larger marker 193 gene set (26). Taking into account these observations and the fact that we used the largest set of Asgard 194 archaea and other archaea compared to all previous phylogenetic analyses, the appearance of the 3D 195 topology in our tree indicates that the origin of eukaryotes from within Asgard cannot be considered a 196 settled issue. Various factors affecting the tree topology, including further increased species 197 representation, particularly, of Asgard and the deep branches of the TACK superphylum, such as 198 Bathyarchaeota and , remain to be explored in order to definitively resolve the evolutionary 199 relationship between archaea and eukaryotes.

200

201 The core gene set of Asgard archaea

202 We next analyzed the core set of conserved Asgard genes which we arbitrarily defined as all asCOGs that 203 are present at least in one third of the MAGs in each of the 12 phylum-level lineages, with the mean 204 representation across lineages >75%. Under these criteria, the Asgard core includes 378 asCOGs 205 (Supplementary Table 6). As expected, most of these protein families, 293 (77%), are universal (present 206 in bacteria, other archaea and eukaryotes), 62 (16%) are represented in other archaea and eukaryotes, but 207 not in bacteria, 15 (4%) are found in other archaea and bacteria, but not in eukaryotes, 7 (2%) are archaea- 208 specific, and only 1 (0.003%) is shared exclusively with eukaryotes (Supplementary Figure 5). Most of 209 the core asCOGs show comparable levels of similarity to homologs from two or all three domains of life. 210 The second largest fraction of the core asCOGs shows substantially greater sequence similarity (at least, 211 25% higher similarity score) to homologous proteins from archaea than to those from eukaryotes and/or 212 bacteria (Supplementary Table 6). Compared with the 219 genes that comprise the pan-archaeal core (27), 213 the Asgard core set lacks 12 genes, each of which, however, is present in some subset of the Asgard 214 genomes. These include three genes of diphthamide and 2 ribosomal proteins, L40E and 215 L37E. The intricate evolutionary history of gene encoding translation elongation factors and enzymes of 216 diphthamide biosynthesis in Asgard has been analyzed previously (28). Also of note is the displacement 217 of the typical archaeal glyceraldehyde-3-phosphate dehydrogenase (type II) by a bacterial one (type I) in 218 most of the Asgard genomes (cog.001204, additional data file 1).

219 Functional distribution of the core asCOGs is shown in Fig. 3b (also see additional data file 1). For 220 comparison, we also derived an extended gene core for the TACK superphylum, using similar criteria (at 221 least 50% in each of the 6 lineages and 75% of the genomes overall, Fig 3b). For at least half of the 222 Asgard core genes, across most functional classes, there were no orthologs in the TACK core. The most 223 pronounced differences were found, as expected, in the category U (intracellular trafficking, secretion,

8

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

224 and vesicular transport). In Asgard archaea, this category includes 19 core genes compared with 7 genes 225 in TACK; 13 of these genes are specific to the Asgard archaea and include components of ESCRT I and 226 II, 3 distinct Roadblock/longin families, 2 distinct families of small GTPases, and a few other genes 227 implicated in related processes (Supplementary Table 6).

228 We compared the protein annotation obtained using asCOGs with the available annotation of ‘Candidatus 229 Prometheoarchaeum syntrophicum’ and found that using asCOGs allowed at least a general functional 230 prediction for 649 of the 1756 (37%) ‘hypothetical proteins’ in this organism, the only one in Asgard with 231 a closed genome. We also identified 139 proteins, in addition to the 80 described originally, that can be 232 considered Eukaryotic Signature Proteins, or ESPs (see next section).

233

234 Eukaryotic features of Asgard archaea gleaned from genome analysis

235 The enrichment of Asgard proteomes with homologs of eukaryote signature proteins (ESPs), such as 236 ESCRTs, components of protein sorting complexes including coat proteins, complete ubiquitin 237 machinery, and actin-binding proteins gelsolins and profilins, might be the strongest argument in 238 support of a direct evolutionary relationship between Asgard archaea and eukaryotes (2, 29). However, 239 the definition of ESPs is fuzzy because many of these proteins, in addition to their occurrence in Asgard, 240 are either scattered among several other archaeal genomes, often, from diverse groups (30), or consist of 241 promiscuous domains that are common in archaea, bacteria and eukaryotes, such as WD40 (after the 242 conserved terminal amino acids of the repeat units, also known as beta-transducin repeats), LRR 243 (Leucine-Rich Repeats), TPR (TetratricoPeptide Repeats), HEAT (Huntingtin-EF3-protein phosphatase 244 2A-TOR1) and other, largely, repetitive domains (31, 32). Furthermore, the sequences of some ESPs have 245 diverged to the extent that they become hardly detectable with standard computational methods. Our 246 computational strategy for delineating an extensive yet robust ESP set is described under Materials and 247 Methods. The ESP set we identified contained 505 asCOGs, including 238 that were not closely similar 248 (E-value=10-10, length coverage 75%) to those previously described by Zaremba-Niedzwiedzka et al. 249 (2)(Supplementary Table 7). In a general agreement with previous observations, the majority of these 250 ESPs, 329 of the 505, belonged to the ‘Intracellular trafficking, secretion, and vesicular transport’ (U) 251 functional class, followed by ‘Posttranslational modification, protein turnover, chaperones’ (O), with 101 252 asCOGs (Supplementary Table 7). Among the asCOGs in the U class, 130 were Roadblock/LC7 253 superfamily proteins, including longins, sybindin and profilins, and 94 were small GTPases of several 254 families, such as RagA-like, Arf-like and Rab-like ones, as discussed previously (33).

9

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

255 The phyletic patterns of ESP asCOGs in Asgard archaea are extremely patchy and largely lineage-specific 256 (Fig. 4), indicating that most of the proteins in this set are not uniformly conserved throughout Asgard 257 evolution, but rather, are prone to frequent HGT, gene losses and duplications. These evolutionary 258 processes are correlated in , resulting in the overall picture of highly dynamic evolution (34). 259 Even the most highly conserved ESP asCOG are missing in some Asgard lineages but show multiple 260 duplications in others (Fig. 4 and Supplementary Table 7). Surprisingly, many gaps in the ESPs 261 distribution were detected in the Heimdallarchaeota that include the suspected ancestors of eukaryotes.

262 Characteristically, many ESPs are multidomain proteins, with 37% assigned to more than one asCOG, 263 compared to 17% among non-ESP proteins (Supplementary Table 7). Some multidomain ESPs in Asgard 264 archaea have the same domain organizations as their homologs in eukaryotes, but these are a minority and 265 typically contain only two domains. Examples include the fusion of two EAP30/Vps37 domains (35), and 266 Vps23 and E2 domains (35) in ESCRT complexes, multiple Rag family GTPases, in which longin domain 267 is fused to the GTPase domain, and several others. By contrast, most of the domain architectures of the 268 multidomain ESP proteins were not detected in eukaryotes and often are found only in a narrow subset of 269 Asgard archaea, suggesting extensive domain shuffling during Asgard evolution (Fig. 5a). For example, 270 we identified many proteins containing a fusion of Vps28/Vps23 from ESCRT I complex (35) with C- 271 terminal domains of several homologous subunits of adaptin and COPI coatomer complexes (36, 37), and 272 E3 UFM1-protein ligase 1, which is involved in the UFM1 ubiquitin pathway (38) (Fig. 5a). Generally, a 273 protein with such a combination of domains can be predicted to be involved in ubiquitin-dependent 274 membrane remodeling but, because its domain architecture is unique, the precise function cannot be 275 inferred.

276 The majority of the ESP genes of Asgard archaea do not belong to conserved genomic neighborhoods, but 277 several such putative operons were detected. Perhaps, the most notable one is the ESCRT neighborhood 278 which includes genes coding for subunits of ESCRT I, II and III, and often, components of the ubiquitin 279 system (2), suggesting an ancient link between the two systems that persists in eukaryotes (35). We 280 predicted another operon that is conserved in most Asgard archaea and consists of genes encoding a 281 LAMTOR1-like protein of the Roadblock superfamily, a Rab-like small GTPase, and a protein containing 282 the DENN (differentially expressed in normal and neoplastic cells) domain that so far has been identified 283 only in eukaryotes (Fig. 5b). Two proteins consisting of a DENN domain fused to longin are subunits of 284 the folliculin (FLCN) complex that is conserved in eukaryotes. The FLCN complex is the sensor of amino 285 acid starvation interacting with Rag GTPase and Ragulator lysosomal complex, and a key component of 286 the mTORC1 pathway, the central regulator of cell growth in eukaryotes (39). Some Heimdallarchaea 287 encode several proteins with the exact same domain organization as FLCN (Fig. 5b). Ragulator is a

10

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

288 complex that consists of 5 subunits, each containing the Roadblock domain. In Asgard archaea, however, 289 the GTPase present in the operon is from a family that is distinct from the Rag GTPases, which interact 290 with both FLCN and Ragulator complexes in eukaryotes, despite the fact that Rag family GTPases are 291 abundant in Asgards (33) (Supplementary Table 7). Nevertheless, this conserved module of Asgard 292 proteins is a strong candidate to function as a guanine exchange factor for Rab and Rag 293 GTPases, analogously to the eukaryotic FLCN. In eukaryotes, the DENN domain is present in many 294 proteins with different domain architectures that interact with different partners and perform a variety of 295 functions (40, 41). The Asgard archaea also encode other DENN domain proteins, and the respective 296 genes form expanded families of paralogs in Loki, Hel and Heimdall lineages, again, with domain 297 architectures distinct from those in eukaryotes (Fig. 5b) (42).

298 Prompted by the identification of a FLCN-like complex, we searched for other components of the 299 mTORC1 regulatory pathway in Asgard archaea. The GATOR1 complex that consists of three subunits, 300 Depdc5, Nprl2, and Nprl3, is another amino acid starvation sensor that is involved in this pathway in 301 eukaryotes (43). Nitrogen permease regulators 2 and 3 (NPRL2 and NPRL3) are homologous GATOR1 302 subunits that contain a longin domain and a small NPRL2-specific C-terminal domain (43). We identified 303 a protein family with this domain organization in most Thor MAGs and a few Loki MAGs. Several other 304 ESP asCOGs include proteins with high similarity to the longin domain of NPRL2. Additionally, we 305 identified many fusions of the NPRL2-like longin domain with various domains related to prokaryotic 306 two-component signal transduction system (Fig. 5c). Considering the absence of a homolog of 307 phosphatidylinositol 3-kinase, the catalytic domain of the mTOR protein, it seems likely that, in Asgard 308 archaea, the key growth regulation pathway remains centered at typical prokaryotic two-component signal 309 transduction systems whereas at least some of the regulators and sensors in this pathway are “eukaryotic”. 310 The abundance of NPRL2-like longin domains in Asgard archaea implies that the link between this 311 domain and amino acid starvation regulation emerged at the onset of Asgard evolution if not earlier.

312

313 Diverse metabolic repertoires, ancestral metabolism of Asgard archaea, and syntrophic evolution

314 Examination of the distribution of the asCOGs among the 12 Asgard archaeal phyla showed that the 315 metabolic pathway repertoire was conserved among the MAGs of each phylum but differed between the 316 phyla (Fig. 3a). Three distinct lifestyles were predicted by the asCOG analysis for different major 317 branches of Asgard archaea, namely, anaerobic heterotrophy, facultative aerobic heterotrophy, and 318 chemolithotrophy (Fig. 6, Supplementary Figure 9). For the last Asgard archaeal common ancestor

319 (LAsCA), a mixotrophic life style, including both production and consumption of H2, can be inferred

11

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

320 from parsimony considerations (Fig. 6, Supplementary Table 8; see Materials and Methods for further 321 details). Loki-, Thor-, Hermod-, Baldr- and Borrarchaeota encode all enzymes for the complete (archaeal) 322 Wood-Ljungdahl pathway (WLP) and are predicted to oxidize organic substrates, likely, by using the 323 reverse WLP, given the lack of enzymes for oxidation of inorganic compounds (e.g., hydrogen, 324 sulfur/sulfide and nitrogen/). The genomes of these five Asgard phyla encode homologues of

325 membrane-bound respiratory H2-evolving Group 4 [NiFe] hydrogenase (Supplementary Figure 6) and/or 326 cytosolic cofactor-coupled bidirectional Group 3 [NiFe] hydrogenase (44) (Supplementary Figure 7). 327 Phylogenetic analysis of both group 3 and group 4 [NiFe] hydrogenases showed that Asgard archaea form 328 distinct clades well separated from the functionally characterized hydrogenases, hampering the prediction 329 of their specific functions in Asgard archaea. The functionally characterized group 4 [NiFe] hydrogenases

330 in the are involved in the of organic substrates to H2, acetate and carbon 331 dioxide (45, 46). The presence of group 3 [NiFe] hydrogenases suggests that these Asgard archaea cannot

332 use H2 as an electron donor because they lack the enzyme complex coupling H2 oxidation to membrane

333 potential generation. Thus, in these organisms, bifurcate electrons from H2 are likely to be used to support 334 the fermentation of organic substrates exclusively (45–47).

335 Both Wukongarchaeota genomes (As_075 and As_085) encode a bona fide membrane-bound Group 1k 336 [NiFe] hydrogenase that could mediate hydrogenotrophic respiration using heterodisulfide as the terminal 337 electron acceptor (48, 49) (Fig. 6, Supplementary Figure 8, Supplementary Figure 10). The group 1k 338 [NiFe] hydrogenase is exclusively found in of the order (Euryarchaoeta) 339 (50), and it is the first discovery of the group 1 [NiFe] hydrogenase in the Asgard archaea. 340 Wukongarchaeota also encode all enzymes for a complete WLP and a putative ADP-dependent acetyl- 341 CoA synthetase for acetate synthesis. Unlike all other Asgard archaea, Wukongarchaeota lack genes for 342 citrate cycle and beta-oxidation. Thus, Wukongarchaeota appear to be obligate chemolithotrophic 343 acetogens. The genomes of Wukongarchaeota were discovered only in seawater of the euphotic zone of

344 the Yap trench (0 m and 125 m). Dissolved H2 concentration is known to be the highest in surface 345 seawater, where the active microbial fermentation, compared to deep sea (51), could produce sufficient 346 amounts of hydrogen for the growth of Wukongarchaeota. Hodarchaeota, Gerdarchaeota, Kariarchaeota, 347 and Heimdallarchaeota share a common ancestor with Wukongarchaeota (Fig. 6). However, genome 348 analysis implies different lifestyles for these organisms. Hod-, Gerd- and Kariarchaeota encode various 349 electron transport chain components, including heme/copper-type cytochrome/quinol oxidase, nitrate 350 reductase, and NADH dehydrogenase, most likely, allowing the use of oxygen and nitrate as electron 351 acceptors during aerobic and anaerobic respiration, respectively (44). In addition, Hod-, Gerd- and 352 Heimdallarchaeota encode phosphoadenosine phosphosulfate (PAPS) reductase and adenylylsulfate 353 kinase for sulfate reduction, enabling the use of sulfate as electron acceptor during anaerobic respiration.

12

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

354 Gerd-, Heimdall-, and Hodarchaeota are only found in coastal and deep-sea sedimentary environments, 355 whereas Kariarchaeota were found also in marine water. The versatile predicted metabolic capacities of 356 these groups suggest that Hod-, Gerd- and Kariarchaeota might occupy both anoxic and oxic niches. In 357 contrast, Heimdallarchaeota appear to be able to thrive only in anoxic environments.

358 In the widely considered scenarios (52), eukaryogenesis has been proposed to involve 359 metabolic (syntrophy) between an archaeon and one or two bacterial partners which, in the

360 original hydrogen/syntrophy hypothesis, were postulated to donate H2 for methane or hydrogen sulfide 361 production by the consortium (53, 54). The syntrophic scenarios were boosted by the discovery of 362 apparent syntrophy between Candidatus P. syntrophicum and Deltaproteobacteria which led to the 363 proposal of the Etangle-Engulf-Endogenize (E3) model of eukaryogenesis (17, 55). Reconstruction of the 364 Lokiarchaeon metabolism has suggested that this organism was hydrogen-dependent, in accord with the 365 hydrogen-syntrophic scenarios (54). In contrast, subsequent analysis of the metabolic potentials of 4

366 Asgard phyla has led to the inference that these organisms were primarily organoheterotrophic and H2- 367 producing, the ‘reverse flow model’ model of protoeukaryote energy metabolism that involves electron or 368 hydrogen flow from an Asgard archaeon to the alphaproteobacterial ancestor of mitochondria, in the 369 opposite direction from that in the original hydrogen-syntrophy hypotheses (44). Here, we discovered a 370 deeply branching Asgard group, Wukongarchaeota, which appears to include obligate hydrogenotrophic 371 acetogens, suggesting the possibility of the LAsCA being a hydrogen-dependent (Fig. 6). This

372 finding suggests that LAsCA both produced and consumed H2. Thus, depending on the exact relationship 373 between Asgard archaea and eukaryotes that remains to be elucidated, our findings could be compatible

374 with different syntrophic scenarios that postulate H2 transfer from bacteria to the archaeal symbiont or in 375 the opposite direction.

376

377 Conclusions

378 The Asgard archaea that were discovered only 5 years ago as a result of the painstaking assembly of 379 several Loki Castle metagenomes have grown into a highly diverse archaeal superphylum. The most 380 remarkable feature of the Asgards is their apparent evolutionary affinity with eukaryotes that has been 381 buttressed by two independent lines of evidence: phylogenetic analysis of highly conserved genes and 382 detection of multiple ESPs that are absent or far less common in other archaea. The 75 MAGs added here 383 substantially expand the phylogenetic and metabolic diversity of the Asgard superphylum. The extended 384 set of Asgard genomes provides for a phylogeny based on a far more representative species sampling than 385 available previously and a substantially expanded ESP analysis employing powerful computational

13

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

386 methods. This extended analysis reveals 6 putative additional Asgard phyla but does not immediately 387 clarify the Asgard-eukaryote relationship. Indeed, phylogenetic analysis of conserved genes in an 388 expanded set of archaea, bacteria and eukaryotes yields conflicting 3D and 2D signals, with an 389 unexpected preference for the 3D topology. Thus, the conclusion that eukaryotes emerged from within 390 Asgard archaea, in particular, from the Heimdall lineage, appears to be premature. Further phylogenomic 391 study with an even broader representation of diverse archaeal lineages as well as, possibly, even more 392 sophisticated evolutionary models are required to clarify the relationships between archaea and 393 eukaryotes.

394 Our analysis of Asgard genomes substantially expanded the set of ESPs encoded by this group of archaea 395 and revealed numerous, complex domain architectures of these proteins. These results further emphasize 396 the excess of ESPs in Asgards compared to other archaea and provide additional support to the conclusion 397 that most of the Asgard ESPs are involved in membrane remodeling and intracellular trafficking. 398 However, in parallel with the phylogenomic results, detailed analysis of the ESPs reveals a complex 399 picture. Most of the multidomain Asgard ESPs possess domain architectures distinct from typical 400 eukaryotic ones and some of these arrangements include signature prokaryotic domains, suggesting 401 substantial functional differences from the respective eukaryotic systems. Furthermore, virtually all the 402 ESPs show patchy distributions in Asgard and other archaea, indicative of a history of extensive HGT, 403 gene losses and paralogous family expansion. All these findings seem to be best compatible with the 404 model of a dispersed, dynamic archaeal ‘eukaryome’ (30) that widely spreads among archaea via HGT, so 405 far reaching the highest ESP density in the Asgard archaea.

406 The results of this work cannot rule out the possibility of the emergence of eukaryotes from within the 407 Asgard but seem to be better compatible with a different evolutionary scenario under which the conserved 408 core of eukaryote genes involved in informational processes originates from an as yet unknown ancestor 409 group that might be a deep archaeal branch or could lie outside the presently characterized archaeal 410 diversity. These hypothetical ancestral forms might have accumulated components of the mobile archaeal 411 ‘eukaryome’ to an even greater extent than the Asgards archaea, eventually, giving rise to eukaryote-like 412 cells, likely, via a form of syntrophy with one or more bacterial partners. Combined genomic and 413 (undoubtedly, far more challenging) biological study of diverse archaea is essential for further advancing 414 our understanding of eukaryogenesis.

415 416 Author contributions

14

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

417 ML, EVK, KSM and YL initiated the study;; YL performed metagenomic assembly, binning, metabolism 418 analysis; KSM, AN and YIW performed comparative genomic analysis; YL, KSM, YIW and WCH 419 performed phylogenetic analysis; KSM and YIW constructed asCOGs; KSM, YIW, YL, ML, and EVK 420 analyzed the data; YL, KSM, WCH, EVK, and ML wrote the manuscript that was read, edited and 421 approved by all authors.

422

423 Acknowledgements

424 ML and YL are supported by National Natural Science Foundation of China (Grant No. 91851105, 425 31970105 and 31700430), the Key Project of Department of Education of Guangdong Province 426 (No.2017KZDXM071), and the Shenzhen Science and Technology Program (Grant no. 427 JCYJ20170818091727570 and KQTD20180412181334790). KSM, YIW, SAS and EVK are supported 428 by the Intramural Research Program of the National Institutes of Health of the USA (National Library of 429 Medicine).

430

431 Competing interests

432 The authors declare no competing interests

15

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

433 References

434 1. A. Spang, J. H. Saw, S. L. Jørgensen, K. Zaremba-Niedzwiedzka, J. Martijn, A. E. Lind, 435 R. van Eijk, C. Schleper, L. Guy, T. J. G. Ettema, Complex archaea that bridge the gap between 436 prokaryotes and eukaryotes. Nature. 521, 173–179 (2015).

437 2. K. Zaremba-Niedzwiedzka, E. F. Caceres, J. H. Saw, D. B ckstr m, L. Juzokaite, E. 438 Vancaester, K. W. Seitz, K. Anantharaman, P. Starnawski, K. U. Kjeldsen, M. B. Stott, T. 439 Nunoura, J. F. Banfield, A. Schramm, B. J. Baker, A. Spang, T. J. G. Ettema, Asgard archaea 440 illuminate the origin of eukaryotic cellular complexity. Nature. 541, 353–358 (2017).

441 3. F. MacLeod, G. S. Kindler, H. L. Wong, R. Chen, B. P. Burns, Asgard archaea: 442 Diversity, function, and evolutionary implications in a range of microbiomes. AIMS Microbiol. 443 5, 48–61 (2019).

444 4. M. Cai, Y. Liu, X. Yin, Z. Zhou, M. W. Friedrich, T. Richter-Heitmann, R. Nimzyk, A. 445 Kulkarni, X. Wang, W. Li, J. Pan, Y. Yang, J.-D. Gu, M. Li, Diverse Asgard archaea including 446 the novel phylum Gerdarchaeota participate in organic matter degradation. Sci. China Life Sci. 447 63, 886–897 (2020).

448 5. 449 provides robust support for a two-domains tree of life. Nat Ecol Evol (2019), 450 doi:10.1038/s41559-019-1040-x.

451 6. T. A. Williams, P. G. Foster, C. J. Cox, T. M. Embley, An archaeal origin of eukaryotes 452 supports only two primary domains of life. Nature. 504, 231–236 (2013).

453 7. C. J. Cox, P. G. Foster, R. P. Hirt, S. R. Harris, T. M. Embley, The archaebacterial origin 454 of eukaryotes. Proc Natl Acad Sci U S A. 105, 20356–20361 (2008).

455 8. N. Yutin, K. S. Makarova, S. L. Mekhedov, Y. I. Wolf, E. V. Koonin, The deep archaeal 456 roots of eukaryotes. Mol. Biol. Evol. 25, 1619–1630 (2008).

457 9. V. Da Cunha, M. Gaia, D. Gadelle, A. Nasir, P. Forterre, Lokiarchaea are close relatives 458 of , not bridging the gap between prokaryotes and eukaryotes. PLoS Genet. 13, 459 e1006810 (2017).

460 10. V. D. Cunha, M. Gaia, A. Nasir, P. Forterre, Asgard archaea do not close the debate 461 about the universal tree of life topology. PLoS Genet. 14, e1007215 (2018).

462 11. P. Forterre, The universal tree of life: an update. Front. Microbiol. 6, 717 (2015).

463 12. J. Lombard, P. López-García, D. Moreira, The early evolution of membranes and 464 the three domains of life. Nat. Rev. Microbiol. 10, 507–515 (2012).

465 13. J. A. Burns, A. A. Pittis, E. Kim, Gene-based predictive models of trophic modes suggest 466 Asgard archaea are not phagocytotic. Nat. Ecol. Evol. 6, a016006 (2018).

16

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

467 14. 468 Nature. 562, 439–443 (2018).

469 15. -Prioux, Y. Baskaran, E. Manser, L. Blanchoin, R. C. 470 Robinson, Insights into the evolution of regulated actin dynamics via characterization of 471 primitive /cofilin proteins from Asgard archaea. Proc Natl Acad Sci U S A. 117, 19904– 472 19913 (2020).

473 16. Z. Lu, T. Fu, T. Li, Y. Liu, S. Zhang, J. Li, J. Dai, E. V. Koonin, G. Li, H. Chu, M. Li, 474 Coevolution of Eukaryote-like Vps4 and ESCRT-III Subunits in the Asgard Archaea. mBio. 11, 475 e00417-20, /mbio/11/3/mBio.00417-20.atom (2020).

476 17. H. Imachi, M. K. Nobu, N. Nakahara, Y. Morono, M. Ogawara, Y. Takaki, Y. Takano, 477 K. Uematsu, T. Ikuta, M. Ito, Y. Matsui, M. Miyazaki, K. Murata, Y. Saito, S. Sakai, C. Song, E. 478 Tasumi, Y. Yamanaka, T. Yamaguchi, Y. Kamagata, H. Tamaki, K. Takai, Isolation of an 479 archaeon at the -eukaryote interface. Nature. 577, 519–525 (2020).

480 18. S. M. Karst, M. S. Dueholm, S. J. McIlroy, R. H. Kirkegaard, P. H. Nielsen, M. 481 Albertsen, Nat. Biotechnol., in press, doi:10.1038/nbt.4045.

482 19. R.-Y. Zhang, B. Zou, Y.-W. Yan, C. O. Jeon, M. Li, M. Cai, Z.-X. Quan, Design of 483 targeted primers based on 16S rRNA sequences in meta-transcriptomic datasets and 484 identification of a novel taxonomic group in the Asgard archaea. BMC microbiology. 20, 25 485 (2020).

486 20. K. S. Makarova, Y. I. Wolf, E. V. Koonin, Towards functional characterization of 487 archaeal genomic dark matter. Biochem Soc Trans. 47, 389–398 (2019).

488 21. E. V. Koonin, Y. I. Wolf, Genomics of bacteria and archaea: the emerging dynamic view 489 of the prokaryotic world. Nucleic Acids Res. 36, 6688–6719 (2008).

490 22. A. R. Nabhan, I. N. Sarkar, The impact of taxon sampling on phylogenetic inference: a 491 review of two decades of controversy. Brief Bioinform. 13, 122–134 (2012).

492 23. R. Gouy, D. Baurain, H. Philippe, Rooting the tree of life: the phylogenetic jury is still 493 out. Philos Trans R Soc Lond B Biol Sci. 370, 20140329 (2015).

494 24. J. Felsenstein, Inferring Phylogenies (Oxford University Press, Oxford, New York, 2nd 495 Edition., 2003).

496 25. F. D. Ciccarelli, T. Doerks, C. von Mering, C. J. Creevey, B. Snel, P. Bork, Toward 497 automatic reconstruction of a highly resolved tree of life. Science (New York, N.Y.). 311, 1283– 498 1287 (2006).

499 26. C. Rinke, M. Chuvochina, A. J. Mussig, P.-A. Chaumeil, D. W. Waite, W. B. Whitman, 500 D. H. Parks, P. Hugenholtz, bioRxiv, in press, doi:10.1101/2020.03.01.972265.

17

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

501 27. K. S. Makarova, Y. I. Wolf, E. V. Koonin, Archaeal Clusters of Orthologous Genes 502 (arCOGs): An Update and Application for Analysis of Shared Features between 503 , , and . Life (Basel). 5, 818–840 (2015).

504 28. A. B. Narrowe, A. Spang, C. W. Stairs, E. F. Caceres, B. J. Baker, C. S. Miller, T. J. G. 505 Ettema, Complex Evolutionary History of Translation Elongation Factor 2 and Diphthamide 506 Biosynthesis in Archaea and Parabasalids. Genome Biol Evol. 10, 2380–2393 (2018).

507 29. L. Eme, A. Spang, J. Lombard, C. W. Stairs, T. J. G. Ettema, Archaea and the origin of 508 eukaryotes. Nat. Rev. Microbiol. 15, 711–723 (2017).

509 30. E. V. Koonin, N. Yutin, The dispersed archaeal eukaryome and the complex archaeal 510 ancestor of eukaryotes. Cold Spring Harb Perspect Biol. 6, a016188 (2014).

511 31. R. L. Tatusov, N. D. Fedorova, J. D. Jackson, A. R. Jacobs, B. Kiryutin, E. V. Koonin, D. 512 M. Krylov, R. Mazumder, S. L. Mekhedov, A. N. Nikolskaya, B. S. Rao, S. Smirnov, A. V. 513 Sverdlov, S. Vasudevan, Y. I. Wolf, J. J. Yin, D. A. Natale, The COG database: an updated 514 version includes eukaryotes. BMC Bioinformatics. 4, 41 (2003).

515 32. E. V. Koonin, N. D. Fedorova, J. D. Jackson, A. R. Jacobs, D. M. Krylov, K. S. 516 Makarova, R. Mazumder, S. L. Mekhedov, A. N. Nikolskaya, B. S. Rao, I. B. Rogozin, S. 517 Smirnov, A. V. Sorokin, A. V. Sverdlov, S. Vasudevan, Y. I. Wolf, J. J. Yin, D. A. Natale, A 518 comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. 519 Genome Biol. 5, R7 (2004).

520 33. C. M. Klinger, A. Spang, J. B. Dacks, T. J. G. Ettema, Tracing the Archaeal Origins of 521 Eukaryotic Membrane-Trafficking System Building Blocks. Mol Biol Evol. 33, 1528–1541 522 (2016).

523 34. P. Puigbò, A. E. Lobkovsky, D. M. Kristensen, Y. I. Wolf, E. V. Koonin, Genomes in 524 turmoil: quantification of genome dynamics in prokaryote supergenomes. BMC Biol. 12, 66 525 (2014).

526 35. L. Christ, C. Raiborg, E. M. Wenzel, C. Campsteijn, H. Stenmark, Cellular Functions and 527 Molecular Mechanisms of the ESCRT Membrane-Scission Machinery. Trends Biochem Sci. 42, 528 42–56 (2017).

529 36. N. Gomez-Navarro, E. Miller, Protein sorting at the ER-Golgi interface. J Cell Biol. 215, 530 769–778 (2016).

531 37. K. M. K. Kibria, J. Ferdous, R. Sardar, A. Panda, D. Gupta, A. Mohmmed, P. Malhotra, 532 A genome-wide analysis of coatomer protein (COP) subunits of apicomplexan parasites and their 533 evolutionary relationships. BMC Genomics. 20, 98 (2019).

534 38. Y. Wei, X. Xu, UFMylation: A Unique & Fashionable Modification for Life. Genomics 535 Proteomics Bioinformatics. 14, 140–146 (2016).

536 39. R. E. Lawrence, S. A. Fromm, Y. Fu, A. L. Yokom, D. J. Kim, A. M. Thelen, L. N. 537 Young, C.-Y. Lim, A. J. Samelson, J. H. Hurley, R. Zoncu, Structural mechanism of a Rag

18

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

538 GTPase activation checkpoint by the lysosomal folliculin complex. Science. 366, 971–977 539 (2019).

540 40. M.-Y. Su, S. A. Fromm, R. Zoncu, J. H. Hurley, Structure of the C9orf72 ARF GAP 541 complex that is haploinsufficient in ALS and FTD. Nature. 585, 251–255 (2020).

542 41. N. de Martín Garrido, C. H. S. Aylett, Nutrient Signaling and Lysosome Positioning 543 Crosstalk Through a Multifunctional Protein, Folliculin. Front Cell Dev Biol. 8, 108 (2020).

544 42. A. L. Marat, H. Dokainish, P. S. McPherson, DENN domain proteins: regulators of Rab 545 GTPases. J Biol Chem. 286, 13791–13800 (2011).

546 43. K. Shen, R. K. Huang, E. J. Brignole, K. J. Condon, M. L. Valenstein, L. Chantranupong, 547 A. Bomaliyamu, A. Choe, C. Hong, Z. Yu, D. M. Sabatini, Architecture of the human GATOR1 548 and GATOR1-Rag GTPases complexes. Nature. 556, 64–69 (2018).

549 44. A. Spang, C. W. Stairs, N. Dombrowski, L. Eme, J. Lombard, E. F. Caceres, C. 550 Greening, B. J. Baker, T. J. G. Ettema, Proposal of the reverse flow model for the origin of the 551 eukaryotic cell based on comparative analyses of Asgard archaeal metabolism. Nat. Microbiol. 4, 552 1138–1148 (2019).

553 45. H. Yu, C.-H. Wu, G. J. Schut, D. K. Haja, G. Zhao, J. W. Peters, M. W. W. Adams, H. 554 Li, Structure of an Ancient Respiratory System. Cell. 173, 1636-1649.e16 (2018).

555 46. G. J. Schut, E. S. Boyd, J. W. Peters, M. W. W. Adams, The modular respiratory 556 complexes involved in hydrogen and sulfur metabolism by heterotrophic hyperthermophilic 557 archaea and their evolutionary implications. FEMS Microbiol Rev. 37, 182–203 (2013).

558 47. F. O. Bryant, M. W. Adams, Characterization of hydrogenase from the hyperthermophilic 559 archaebacterium, furiosus. J Biol Chem. 264, 5070–5079 (1989).

560 48. U. Deppenmeier, M. Blaut, B. Schmidt, G. Gottschalk, Purification and properties of a 561 F420-nonreactive, membrane-bound hydrogenase from strain Gö1. Arch 562 Microbiol. 157, 505–511 (1992).

563 49. R. K. Thauer, A.-K. Kaster, M. Goenrich, M. Schick, T. Hiromoto, S. Shima, 564 Hydrogenases from methanogenic archaea, nickel, a novel cofactor, and H2 storage. Annu Rev 565 Biochem. 79, 507–536 (2010).

566 50. D. Søndergaard, C. N. S. Pedersen, C. Greening, HydDB: A web tool for hydrogenase 567 classification and analysis. Scientific Reports. 6, 34212 (2016).

568 51. R. Conrad, W. Seiler, Methane and hydrogen in seawater (Atlantic Ocean). Deep Sea 569 Res. Part I Oceanogr. Res. Pap. 35, 1903–1917 (1988).

570 52. P. López-García, D. Moreira, The Syntrophy hypothesis for the origin of eukaryotes 571 revisited. Nat Microbiol. 5, 655–667 (2020).

572 53. W. Martin, M. Müller, The hydrogen hypothesis for the first eukaryote. Nature. 392, 37– 573 41 (1998).

19

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

574 54. D. Moreira, P. López-- 575 as the Origin of Eukaryotes: The Syntrophic Hypothesis. J Mol Evol. 47, 517–530 576 (1998).

577 55. P. López-García, D. Moreira, Cultured Asgard Archaea Shed Light on Eukaryogenesis. 578 Cell. 181, 232–235 (2020). 579

580

20

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

581

582 .

583

21

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

584

585 Figures

586

587 Figure 1. Gene commonality plots for Asgard archaea and the TACK superphylum.

588 The gene commonality plot showing the number of asCOGs in log scale (Y-axis) that include the given 589 fraction of analyzed genomes (X-axis). The Asgard plot is compared with the TACK superphylum plot 590 based on assignment of TACK genomes to arCOGs.

591

592 Figure 2. Phylogenetic analysis of Asgard archaea and their relationships with eukaryotes.

593 (a) Maximum likelihood tree, inferred with IQ-tree and LG+F+R10 model, constructed from 594 concatenated alignments of the protein sequences from 209 core Asgard Clusters of Orthologs (asCOGs). 595 Only the 12 phylum-level clades are shown, with species within each clade collapsed. See supplementary 596 Methods and Supplementary Table 5 for details.

597 (b) Maximum likelihood tree, inferred with IQ-tree and SYM+R8 model, based on 16S rRNA gene 598 sequences. Red stars in (b) denote MAGs reconstructed in the current study.

599 (c) Phylogenetic tree of bacteria, archaea and eukaryotes, inferred with IQ-tree under LG+R10 model, 600 constructed from concatenated alignments of the protein sequences of 30 universally conserved genes 601 (see Material and Methods for details). The tree shows the relationships between the major clades.

602 The trees are unrooted and are shown in a pseudorooted form for visualization purposes only. The actual 603 trees and alignments are in Additional data file 2 and list of the trees are provided in the Supplementary 604 Table 4 and 5.

605

606 Figure 3. Phyletic patterns of asCOGs and functional distribution of Asgard core genes.

607 (a) Classical Multidimensional Scaling analysis of binary presence-absence phyletic patterns for 13,939 608 asCOGs that are represented in at least two genomes (see Material and Methods for details).

609 (b) Functional breakdown of Asgard core genes (378 asCOGs) compared with TACK superphylum core 610 genes (489 arCOGs). The values were normalized as described in Materials and Methods. Functional 611 classes of genes: J, Translation, ribosomal structure and biogenesis; K, Transcription; L, Replication,

22

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

612 recombination and repair; D, Cell cycle control, cell division, chromosome partitioning; V, Defense 613 mechanisms; T, Signal transduction mechanisms; M, Cell wall/membrane/envelope biogenesis; N, Cell 614 motility; U, Intracellular trafficking, secretion, and vesicular transport; O, Posttranslational modification, 615 protein turnover, chaperones; X, Mobilome: prophages, transposons; C, Energy production and 616 conversion; G, Carbohydrate transport and metabolism; E, Amino acid transport and metabolism; F, 617 Nucleotide transport and metabolism; H, Coenzyme transport and metabolism; I, Lipid transport and 618 metabolism; P, Inorganic ion transport and metabolism; Q, Secondary metabolites biosynthesis, transport 619 and catabolism; R, General function prediction only; S, Function unknown;

620

621 Figure 4. Phyletic patterns of Eukaryotic Signature Proteins (ESPs) encoded in Asgard genomes.

622 All 505 ESP asCOGs are grouped by distance between binary presence-absence phyletic patterns. The 623 most highly conserved ESP asCOGs are shown within the red rectangle. Below the plot of the number of 624 ESP domains in each genome is shown. For details, see Supplementary Table 7. 625

626 Figure 5. Domain architectures of selected Asgard ESPs.

627 (a) ESPs with unique domain architectures. The schematic of each multidomain protein is roughly 628 proportional to the respective protein length. The identified domains are shown inside the arrows 629 approximately according to their location and are briefly annotated. Homologous domains are shown by 630 the same color or pattern.

631 (b) DENN domain proteins in Asgards. Upper part of the figure above the dashed line shows putative 632 operon encoding DENN domain proteins. Genes are shown by block arrows with the length proportional 633 to the size of the corresponding protein. For each protein, the nucleotide contig or genome partition 634 accession number, Asgard genome ID and lineage are indicated. The part of the figure below the dashed 635 line shows domain organization of diverse proteins containing DENN domain. Homologous domains are 636 shown by the same color. The inset on the right side explains identity of the domains

637 (c) NPRL2-like proteins in Asgards. Designations are the same as in Figure 4a.

638

639 Figure 6. Reconstruction and evolution of the key metabolic processes in Asgard archaea.

640 The schematic phylogeny of Asgard archaea is from Figure 1a. 641 LAsCA, Last Asgard Common Ancestor; WLP, Wood-Ljungdahl pathway.

23

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

642 Supplementary Materials 643 Taxonomic description 644 Materials and Methods 645 Supplementary References 646 Tables S1-S8 647 Figures S1-S10 648 Additional Data files are available at ftp://ftp.ncbi.nih.gov/pub/wolf/_suppl/asgard20/

649 Additional data file 1 (Additional_data_file_1.tgz): Complete asCOG data archive

650 Additional data file 2 (Additional_data_file_2.tgz): Phylogenetic trees and alignments archive

651

24

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0.1

0.01

0.001

0.0001 Asgard TACK a b 99.8 Euryarchaeota (231) + DPANN (22) Korarchaeota_Candidatus_Korarchaeum_cryptofilum_OPF8 As_130 Baldrarchaeota Bathyarchaeota (2) + (13) + (1) 91.8 100 Thorarchaeota 100 (41)

100 As_143 Odinarchaeota As_006_00990 100 100 As_098_03035 Helarchaeota Hela_A_Ga0180301_100789461 100 Lokiarchaeota (28) As_168 100 100 100 Hermodarchaeota As_095_02125 Hermodarchaeota As_086_02466 100 Thorarchaeota (28) 100 Odinarchaeota KP091041 99 JX000838 100 100 JQ817966 100 AB797473 Baldrarchaeota 96 97.4 As_104_02716 As_139 92 Heimdallarchaeota 99.9 As_138 AB797480 94.2 DQ640139 99 OBEP010863780 90.9 98.8 OBEP011359260 100 Lokiarchaeota As_001_00761 99.6 As_141 94.9 97.9 GU553642 100 100 100 Asgard-2 (4) 99.8 As_030_00678 OBEP010743007 100 OBEP010855942 Kariarchaeota 100 100 Helarchaeota 92.3 OBEP010696748 OBEP010845664 100 Gerdarchaeota (11) 100 Borrarchaeota 100 Asgard-4 (9) Former Heimdallarchaeota clade As_105_01019 100 Heimdallarchaeota 100 As_090_03462 99.9 As_109_01519 100 100 OBEP011316084 100 OBEP010918415 100 Kariarchaeota JQ989558 100 100 JX000774 As_003_05426 MDVS01000157 100 100 Gerdarchaeota OBEP010704239 98.5 100 OBEP010794268 Hodarchaeota 99.5 99.4 OBEP010919744 100 OBEP010663114 Hodarchaeota 94.3 As_091_00788

91.3 OBEP011300513 99.7 GQ410928 99.3 100 Wukongarchaeota As_031_03504 OBEP010852077 As_085_00438 Wukongarchaeota

Scale: 0.1 c

Euryarchaea Eukaryotes

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400DPANN; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

100 Bacteria 100 100 TACK

Scale: 0.5 Asgard bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made A available under aCC-BY-ND 4.0 International license.

0.6

Heimdall 0.5

Kari Gerd 0.4

Hod 0.3

Wukong 0.2 Borr 0.1 A2 Hermod Odin Baldr 0 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 Loki -0.1 Hel

-0.2 Thor -0.3

-0.4 A1

B

0.0 0.1 0.2 0.3 0.4 J K L O D N U C 378 E F asCOGs G H I M P Q T V X R S

Asgard only Shared TACK only bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Thor Loki Borr Heimdall Herd Hermod Odin Hel Kari Hod Baldr Wukong

600

500

400

300

200

100

0 bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made A available under aCC-BY-ND 4.0 International license.

ZnRx3 Coiled-coil PH vps23 core Loki (MK-D1) QEE16267.1

ZnRx3 E3 UFM1-protein ligase 1/PCI domain adaptin C-terminal Vps28 C-terminal QEE15500.1 Loki (MK-D1)

E3 UFM1-protein ligase 1/PCI domain adaptin C-terminal Vps28 C-terminal longin As_098-p_00119 Hel

TPR repeats adaptin C-terminal Vps28 C-terminal longin As_047-p_01971 Hel PCI-like WD-40 repeats 2 HTH only adaptin C-terminal EAP30 As_050p_03916 Hel

Integrin repeats adaptin C-terminal E3 UFM1-protein ligase 1/PCI domain As_098-p_04464 Hel

Integrin repeats adaptin C-terminal (TRAP_B) Rab GTPase As_043-p_00502 Thor

E3 UFM1-protein ligase 1/PCI domain Znr Znr PCI domain? RING As_051-p_00042 Hod

B

As_006 (565499..563083) Odin DENN Loki MK-D1 As_047 (462..2957) As_128 (726660..729529) Hel Lomgin As_071 (32004..29543) Thor Rab-like GTPAse

Roadblock family LAMTOR1-like

HTH Thor Gerd Heimdall As_071-p_03822 As_053-p_02405 As_001-p_01989 7TM

Ig-like domain Gerd Gerd Gerd As_053-p_01688 As_062-p_00879 As_011-p_01894

C

Hel NPRL2 Loki Hel subfamily As_048 Histidine As_040 As_050 longin kinase 24903..23378 4909..2036 99086..101594 NPRL2 C-terminal Rec domain 7TM Gerd Hod longin As_053 As_051 PAS 160513..162485 572349..567823

Rag GTPase MASE3 Hel Arf GTPase As_047 KaiC 280..5396 Ras GTPase HTH

Odin 7TM sensor As_006 WsaG-like 52317..45192 Membrane protein, possibly involved in polysaccharide utilization

PAS Loki Loki Loki As_004-p_02022 Loki (MK-D1) As_119-p_00144 As_028-p_0166 QEE16479.1 As_094-p_01664 bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

CO 2 Thor-, Hermod-, Odin-, Baldr-, Loki-, Hel-, Borr-, Heimdallarchaeota Acetate Reverse WLP CO H+ 2 Thorarchaeota Acetate Kari-, Gerd-, Hodarchaeota Acetyl-CoA H2 H2 e- e- Hermodarchaeota Wukongarchaeota WLP Odinarchaeota Lipids, fatty acids, sugars, alcohols, alkane/aromatic compounds Baldrarchaeota + CO2 H2 H CO2

Acetate Lokiarchaeota Acetate WLP + Reverse WLP CO H Alkane metabolism H+ 2 H uptake Acetate 2 Helarchaeota Acetyl-CoA H Acetyl-CoA 2 H H CO2 2 2 - - H2 production e e Borrarchaeota Glycolysis Glycolysis WLP, Glycolysis, H2 uptake, Organics H production(?), Heimdallarchaeota O 2 O metabolism Lipids, fatty acids, sugars, alcohols 2 2 acetogenesis WLP Kariarchaeota

+ CO2 H2 H

O metabolism Gerdarchaeota 2 e- Acetate H uptake 2 WLP H2 LAsCA CO Hodarchaeota 2 Acetyl-CoA production Loss of metabolic potential 2 Wukongarchaeota Gluconeogenesis Gain of metabolic potential Organics Lipids, fatty acids, sugars, alcohols bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

1 Expanding diversity of Asgard archaea and the elusive ancestry of eukaryotes

2 Yang Liu1†, Kira S. Makarova2†, Wen-Cong Huang1†, Yuri I. Wolf2, Anastasia Nikolskaya2, Xinxu 3 Zhang1, Mingwei Cai1, Cui-Jing Zhang1, Wei Xu3, Zhuhua Luo3, Lei Cheng4, Eugene V. Koonin2*, Meng 4 Li1*

5 1 Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen 6 University, Shenzhen, Guangdong, 518060, P. R. China

7 2 National Center for Biotechnology Information, National Library of Medicine, National Institutes of 8 Health, Bethesda, Maryland 20894, USA

9 3 State Key Laboratory Breeding Base of Marine Genetic Resources, Key Laboratory of Marine Genetic 10 Resources, Fujian Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, State 11 Oceanic Administration, Xiamen 361005, P. R. China

12 4 Key Laboratory of Development and Application of Rural Renewable Energy, Biogas Institute of 13 Ministry of Agriculture, Chengdu 610041, P.R. China

14 † These authors contributed equally to this work.

15 *Authors for correspondence: [email protected] or [email protected]

16

1

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

17 Supplementary Information

18 Taxonomic Description of new taxa…………………………………………………………………..….1

19 Materials and Methods…………………………………………………………………………..………..9

20 Supplementary references…...……………………………………………………………………………15

21 Supplementary Figures…………………………………………………………………………………...19

22 Supplementary Tables………………………………………………………………………...…….…….21

23

24

2

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

25 Taxonomic Description of new taxa

26 ‘Candidatus Wukongarchaeum’ (Wu.kong.ar.chae’um. N. L. n. Wukong a legendary Chinese figure, 27 also known as Monkey King, who caused havoc in the heavenly palace); N.L. neut. N. archaeum (from 28 Gr. adj. archaios ancient) archaeon; N. L. neut. N. Wukongarchaeum.

29 ‘Candidatus Wukongarchaeum yapensis’ (yap’ensis N. L. masc. adj. pertaining to Yap trench, which is 30 the geographical position where the first type material of this species was obtained). Type material is the 31 genome designated as As_085 (Yap4.bin4.70) representing ‘Candidatus Wukongarchaeum yapensis’. The 32 genome “As_085” represents a MAG consisting of 2.16 Mbps in 277 contigs with an estimated 33 completeness of 92.52%, an estimated contamination of 4.05%, a 16S and 23S rRNA gene and 14 tRNAs. 34 The MAG recovered from a marine water metagenome (Yap trench, Western Pacific), with an estimated 35 depth of coverage of 31.4, has a GC content of 38%.

36 ‘Candidatus Hodarchaeum’ (Hod.ar.chae’um. N. L. n. Hod a son of Odin in Norse mythology); N.L. 37 neut. N. archaeum (from Gr. adj. archaios ancient) archaeon; N. L. neut. N. Hodarchaeum.

38 ‘Candidatus Hodarchaeum mangrovi’ (man.gro’vi N.L. fem. n. of a mangrove, referring to the isolation 39 of the type material from mangrove soil). Type material is the genome designated as As_027 40 (FT2_5_011) representing ‘Candidatus Hodarchaeum mangrovi’. The genome “As_027” represents a 41 MAG consisting of 4.01 Mbps in 348 contigs with an estimated completeness of 93.61%, an estimated 42 contamination of 0.93%, a 23S rRNA gene and 14 tRNAs. The MAG recovered from mangrove sediment 43 metagenomes (Futian Nature Reserve, China), with an estimated depth of coverage of 17.9, has a GC 44 content of 32.9%.

45 ‘Candidatus Kariarchaeum’ (Ka.ri.ar.chae’um. N. L. n. Kari the god of wind and brother to Aegir in 46 Norse mythology); N.L. neut. N. archaeum (from Gr. adj. archaios ancient) archaeon; N. L. neut. N. 47 Kariarchaeum.

48 ‘Candidatus Kariarchaeum pelagius’ (pe.la’gi.us. L. masc. adj. of or belonging to the sea, referring to 49 the isolation of the type material from the Ocean). Type material is the genome designated as As_030 50 (RS678) representing ‘Candidatus Kariarchaeum pelagius’. The genome “As_030” represents a MAG 51 consisting of 1.41 Mbps in 76 contigs, an estimated completeness of 83.18%, with an estimated 52 contamination of 1.87%, a 23S, 16S and 5S rRNA genes and 18 tRNAs. The MAG recovered from a 53 marine metagenome (Saudi Arabia: Red Sea) has a GC content of 30.11%.

54 ‘Candidatus Borrarchaeum’ (Borr.ar.chae’um. N. L. n. Borr a creator god and father of Odin); N.L. 55 neut. N. archaeum (from Gr. adj. archaios ancient) archaeon; N. L. neut. N. Borrarchaeum.

56 ‘Candidatus Borrarchaeum yapensis’ (yap’ensis N. L. masc. adj. pertaining to Yap trench, which is the 57 geographical position where the first type material of this species was obtained). Type material is the 58 genome designated as As_181 (Yap2000.bin9.141) representing ‘Candidatus Borrarchaeum yapensis’. 59 The genome “As_181” represents a MAG consisting of 3.63 Mbps in 125 contigs, with an estimated 60 completeness of 95.02%, an estimated contamination of 5.61% and 11 tRNAs. The MAG, recovered from 61 a marine water metagenome (Yap trench, Western Pacific) with an estimated depth coverage of 15.04, has 62 a GC content of 37.1%.

3

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

63 ‘Candidatus Baldrarchaeum’ (Bal.dr.ar.chae’um. N. L. n. Baldr the god of light and son of Odin and 64 borther of Thor in Norse mythology); N.L. neut. N. archaeum (from Gr. adj. archaios ancient) archaeon; 65 N. L. neut. N. Baldrarchaeum.

66 ‘Candidatus Baldrarchaeum yapensis’ (yap’ensis N. L. masc. adj. pertaining to Yap trench, which is the 67 geographical position where the first type material of this species was obtained). Type material is the 68 genome designated as As_130 (Yap30.bin9.72) representing ‘Candidatus Baldrarchaeum yapensis’. The 69 genome “As_130” represents a MAG consisting of 2.27 Mbps in 100 contigs, with an estimated 70 completeness of 93.93%, an estimated contamination of 3.74%, a 23S and 16S rRNA gene and 15 tRNAs. 71 The MAG, recovered from a marine water metagenome (Yap trench, Western Pacific) with an estimated 72 depth coverage of 39.99, has a GC content of 45.95%.

73 ‘Candidatus Hermodarchaeum’ (Her.mod.ar.chae’um. N. L. n. Hermod, messengers of the gods in the 74 Norse mythology and son of Odin and brother of Baldr in the Norse mythology); N.L. neut. N. archaeum 75 (from Gr. adj. archaios ancient) archaeon; N. L. neut. N. Hermodarchaeum.

76 ‘Candidatus Hermodarchaeum yapensis’ (yap’ensis N. L. masc. adj. pertaining to Yap trench, which is 77 the geographical position where the first type material of this species was obtained). Type material is the 78 genome designated as As_086 (Yap4.bin9.105) representing ‘Candidatus Hermodarchaeum yapensis’. 79 The genome ‘As_086’ represent a MAG consisting of 2.71 Mbps in 77 contigs, with an estimated 80 completeness of 92.99%, an estimated contamination of 1.87%, a 23S and 16S rRNA gene and 16 tRNAs. 81 The MAG, recovered from a marine water metagenome (Yap trench, Western Pacific) with an estimated 82 depth coverage of 19.24, has a GC content of 44.69%.

83 Candidatus Wukongarchaeaceae (Wu.kong.ar.chae.a.ce’ae. N.L. neut. n. Wukongarchaeum a 84 (Candidatus) type of the family; -aceae ending to denote the family; N.L. fem. pl. n. 85 Wukongarchaeaceae the Wukongarchaeum family).

86 The family is delineated based on 209 concatenated Asgard Cluster of Orthologs (AsCOGs) and 16S 87 rRNA gene phylogeny. The description is the same as that of its sole genus and species. Type genus is 88 Candidatus Wukongarchaeum.

89 Candidatus Wukongarchaeales (Wu.kong.ar.chae.a’les. N.L. neut. n. Wukongarchaeum a (Candidatus) 90 type genus of the order; -ales ending to denote the order; N.L fem. pl. n. Wukongarchaeales the 91 Wukongarchaeum order).

92 The order is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 93 description is the same as that of its sole genus and species. Type genus is Candidatus Wukongarchaeum.

94 Candidatus Wukongarchaeia (Wu.kong.ar.chae’i.a. N.L. neut. n. Wukongarchaeum a (Candidatus) type 95 genus of the order of the class; -ia ending to denote the class; N.L fem. pl. n. Wukongarchaeia the 96 Wukongarchaeum class).

97 The class is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 98 description is the same as that of its sole and type order Candidatus Wukongarchaeales.

4

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

99 Candidatus Wukongarchaeota (Wu.kong.ar.chae.o’ta. N.L. neut. n. Wukongarchaeum a (Candidatus) 100 type genus of the class of the phylum; -ota ending to denote the phylum; N.L neut. pl. n. 101 Wukongarchaeota the Wukongarchaeum phylum)

102 The phylum is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 103 description is the same as that of its sole and type class Candidatus Wukongarchaeia.

104 Candidatus Hodarchaeaceae (Hod.ar.chae.a.ce’ae. N.L. neut. n. Hodarchaeum a (Candidatus) type 105 genus of the family; -aceae ending to denote the family; N.L. fem. pl. n. Hodarchaeaceae the 106 Hodarchaeum family).

107 The family is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 108 description is the same as that of its sole genus and species. Type genus is Candidatus Hodarchaeum.

109 Candidatus Hodarchaeales (Hod.ar.chae.a’les. N.L. neut. n. Hodarchaeum a (Candidatus) type genus of 110 the order; -ales ending to denote the order; N.L fem. pl. n. Hodarchaeales the Hodarchaeum order).

111 The order is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 112 description is the same as that of its sole genus and species. Type genus is Candidatus Hodarchaeum.

113 Candidatus Hodarchaeia (Hod.ar.chae’i.a. N.L. neut. n. Hodarchaeum a (Candidatus) type genus of the 114 order of the class; -ia ending to denote the class; N.L fem. pl. n. Hodarchaeia the Hodarchaeum class).

115 The class is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 116 description is the same as that of its sole and type order Candidatus Hodarchaeales.

117 Candidatus Hodarchaeota (Hod.ar.chae.o’ta. N.L. neut. n. Hodarchaeum a (Candidatus) type genus of 118 the class of the phylum; -ota ending to denote the phylum; N.L neut. pl. n. Hodarchaeota the 119 Hodarchaeum phylum)

120 The phylum is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 121 description is the same as that of its sole and type class Candidatus Hodarchaeia.

122 Candidatus Hodarchaeaceae (Hod.ar.chae.a.ce’ae. N.L. neut. n. Hodarchaeum a (Candidatus) type 123 genus of the family; -aceae ending to denote the family; N.L. fem. pl. n. Hodarchaeaceae the 124 Hodarchaeum family).

125 The family is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 126 description is the same as that of its sole genus and species. Type genus is Candidatus Hodarchaeum.

127 Candidatus Hodarchaeales (Hod.ar.chae.a’les. N.L. neut. n. Hodarchaeum a (Candidatus) type genus of 128 the order; -ales ending to denote the order; N.L fem. pl. n. Hodarchaeales the Hodarchaeum order).

129 The order is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 130 description is the same as that of its sole genus and species. Type genus is Candidatus Hodarchaeum.

131 Candidatus Hodarchaeia (Hod.ar.chae’i.a. N.L. neut. n. Hodarchaeum a (Candidatus) type genus of the 132 order of the class; -ia ending to denote the class; N.L fem. pl. n. Hodarchaeia the Hodarchaeum class).

5

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

133 The class is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 134 description is the same as that of its sole and type order Candidatus Hodarchaeales.

135 Candidatus Hodarchaeota (Hod.ar.chae.o’ta. N.L. neut. n. Hodarchaeum a (Candidatus) type genus of 136 the class of the phylum; -ota ending to denote the phylum; N.L neut. pl. n. Hodarchaeota the 137 Hodarchaeum phylum)

138 The phylum is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 139 description is the same as that of its sole and type class Candidatus Hodarchaeia.

140 Candidatus Kariarchaeaceae (Ka.ri.ar.chae.a.ce’ae. N.L. neut. n. Kariarchaeum a (Candidatus) type 141 genus of the family; -aceae ending to denote the family; N.L. fem. pl. n. Kariarchaeaceae the 142 Kariarchaeum family).

143 The family is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 144 description is the same as that of its sole genus and species. Type genus is Candidatus Kariarchaeum.

145 Candidatus Kariarchaeales (Ka.ri.ar.chae.a’les. N.L. neut. n. Kariarchaeum a (Candidatus) type genus of 146 the order; -ales ending to denote the order; N.L fem. pl. n. Kariarchaeales the Kariarchaeum order).

147 The order is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 148 description is the same as that of its sole genus and species. Type genus is Candidatus Kariarchaeum.

149 Candidatus Kariarchaeia (Ka.ri.ar.chae’i.a. N.L. neut. n. Kariarchaeum a (Candidatus) type genus of the 150 order of the class; -ia ending to denote the class; N.L fem. pl. n. Kariarchaeia the Kariarchaeum class).

151 The class is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 152 description is the same as that of its sole and type order Candidatus Kariarchaeales.

153 Candidatus Kariarchaeota (Ka.ri.ar.chae.o’ta. N.L. neut. n. Kariarchaeum a (Candidatus) type genus of 154 the class of the phylum; -ota ending to denote the phylum; N.L neut. pl. n. Kariarchaeota the 155 Kariarchaeum phylum)

156 The phylum is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 157 description is the same as that of its sole and type class Candidatus Kariarchaeia.

158 Candidatus Borrarchaeaceae (Borr.ar.chae.a.ce’ae. N.L. neut. n. Borrarchaeum a (Candidatus) type 159 genus of the family; -aceae ending to denote the family; N.L. fem. pl. n. Borrarchaeaceae the 160 Borrarchaeum family).

161 The family is delineated based on 209 concatenated AsCOGs phylogeny. The description is the same as 162 that of its sole genus and species. Type genus is Candidatus Borrarchaeum.

163 Candidatus Borrarchaeales (Borr.ar.chae.a’les. N.L. neut. n. Borrarchaeum a (Candidatus) type genus of 164 the order; -ales ending to denote the order; N.L fem. pl. n. Borrarchaeales the Borrarchaeum order).

165 The order is delineated based on 209 concatenated AsCOGs phylogeny. The description is the same as 166 that of its sole genus and species. Type genus is Candidatus Borrarchaeum.

6

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

167 Candidatus Borrarchaeia (Borr.ar.chae’i.a. N.L. neut. n. Borrarchaeum a (Candidatus) type genus of the 168 order of the class; -ia ending to denote the class; N.L fem. pl. n. Borrarchaeia the Borrarchaeum class).

169 The class is delineated based on 209 concatenated AsCOGs phylogeny. The description is the same as 170 that of its sole and type order Candidatus Borrarchaeales.

171 Candidatus Borrarchaeota (Borr.ar.chae.o’ta. N.L. neut. n. Borrarchaeum a (Candidatus) type genus of 172 the class of the phylum; -ota ending to denote the phylum; N.L neut. pl. n. Borrarchaeota the 173 Borrarchaeum phylum)

174 The phylum is delineated based on 209 concatenated AsCOGs phylogeny. The description is the same as 175 that of its sole and type class Candidatus Borrarchaeia.

176

177 Candidatus Baldrarchaeaceae (Bal.dr.ar.chae.a.ce’ae. N.L. neut. n. Baldrarchaeum a (Candidatus) type 178 genus of the family; -aceae ending to denote the family; N.L. fem. pl. n. Baldrarchaeaceae the 179 Baldrarchaeum family).

180 The family is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 181 description is the same as that of its sole genus and species. Type genus is Candidatus Baldrarchaeum.

182 Candidatus Baldrarchaeales (Bal.dr.ar.chae.a’les. N.L. neut. n. Bladrarchaeum a (Candidatus) type 183 genus of the order; -ales ending to denote the order; N.L fem. pl. n. Baldrarchaeales the Baldrarchaeum 184 order).

185 The order is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 186 description is the same as that of its sole genus and species. Type genus is Candidatus Baldrarchaeum.

187 Candidatus Baldrarchaeia (Bal.dr.ar.chae’i.a. N.L. neut. n. Baldrarchaeum a (Candidatus) type genus of 188 the order of the class; -ia ending to denote the class; N.L fem. pl. n. Baldrarchaeia the Baldrarchaeum 189 class).

190 The class is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 191 description is the same as that of its sole and type order Candidatus Baldrarchaeales.

192 Candidatus Baldrarchaeota (Bal.dr.ar.chae.o’ta. N.L. neut. n. Baldrarchaeum a (Candidatus) type genus 193 of the class of the phylum; -ota ending to denote the phylum; N.L neut. pl. n. Baldrarchaeota the 194 Baldrarchaeum phylum)

195 The phylum is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 196 description is the same as that of its sole and type class Candidatus Baldrarchaeia.

197

198 Candidatus Hermodarchaeaceae (Her.mod.ar.chae.a.ce’ae. N.L. neut. n. Hermodarchaeum a 199 (Candidatus) type genus of the family; -aceae ending to denote the family; N.L. fem. pl. n. 200 Hermodarchaeaceae the Hermodarchaeum family).

7

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

201 The family is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 202 description is the same as that of its sole genus and species. Type genus is Candidatus Hermodarchaeum.

203 Candidatus Hermodarchaeales (Her.mod.ar.chae.a’les. N.L. neut. n. Hermodarchaeum a (Candidatus) 204 type genus of the order; -ales ending to denote the order; N.L fem. pl. n. Hermodarchaeales the 205 Hermodarchaeum order).

206 The order is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 207 description is the same as that of its sole genus and species. Type genus is Candidatus Hermodarchaeum.

208 Candidatus Hermodarchaeia (Her.mod.ar.chae’i.a. N.L. neut. n. Hermodarchaeum a (Candidatus) type 209 genus of the order of the class; -ia ending to denote the class; N.L fem. pl. n. Hermodarchaeia the 210 Hermodarchaeum class).

211 The class is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 212 description is the same as that of its sole and type order Candidatus Hermodarchaeales.

213 Candidatus Hermodarchaeota (Her.mod.ar.chae.o’ta. N.L. neut. n. Hermodarchaeum a (Candidatus) 214 type genus of the class of the phylum; -ota ending to denote the phylum; N.L neut. pl. n. 215 Hermodarchaeota the Hermodarchaeum phylum)

216 The phylum is delineated based on 209 concatenated AsCOGs and 16S rRNA gene phylogeny. The 217 description is the same as that of its sole and type class Candidatus Hermodarchaeia.

218

8

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

219 Materials and Methods

220 Sampling collections and DNA sequencing

221 YT samples were obtained from the Rongcheng Nation Swan Nature Reserve (Rongcheng, China) in 222 November 2018. The sediment cores were collected using columnar samplers at depth intervals of 0–2, 223 21–26, and 36–41 cm at a seagrass meadow and a non-seagrass–covered site nearby. After collection, 224 bulk sediments were immediately sealed in plastic bags, placed in a pre-cooled icebox, and transported to 225 laboratory within 4 hours. For each sample, DNA was extracted from 10 g sediment using PowerSoil 226 DNA Isolation kit (QIAGEN, Germany), according to the manufacturer’s protocol. Following extraction, 227 nucleic acids were sequenced using Illumina HiSeq2500 (Illumina, USA) PE150 by Novogene (Nanjing, 228 China).

229 MP5 samples were obtained from Mai Po Nature Reserve (Hong Kong, China) in September 2014. Three 230 subsurface sediment samples were collected from a site covering with mangrove forest at depth intervals 231 of 0-2, 10-15 and 20-25 cm. Two subsurface sediment samples were taken at an intertidal mudflat with 232 depths of 0-5 and 13-16 cm. Samples were transported back to laboratory as described for YT 233 metagenomes. DNA was extracted from 5g wet sediment per sample using the PowerSoil DNA Isolation 234 Kit (MO BIO, USA) following the manufacturer’s protocol. Metagenomic sequencing data were 235 generated using Illumina HiSeq2500 (Illumina, USA) PE150 by Novogene (Tianjin, China).

236 FT samples were taken from Futian Nature Reserve (Shenzhen, Guangdong, China) in April 2017. 237 Sediment samples were collected as described for YT samples at depth intervals of 0-2, 6-8, 12-14, 20-22, 238 and 28-30 cm. DNA was extracted from 5g wet sediment per sample using DNeasy PowerSoil kit 239 (Qiagen, Germany) as per manufacturer’s instructions. Nucleic acids were sequenced using Illumina 240 HiSeq2000 (Illumina, USA) PE150 by Novogene (Tianjin, China).

241 The surface sediment sample of the CJE metagenome was collected from Changjiang estuary during a 242 cruise in August 2016. The sample was grabbed from the water bed, sealed immediately in a 50 ml tubes 243 and stored in liquid nitrogen onboard. After transportation to laboratory, 10g wet sediments were used for 244 DNA extraction as per manufacturer’s protocol. Nucleic acids were sequenced using Illumina HiSeq2000 245 (Illumina, USA) PE150 by Novogene (Tianjin, China).

246 An oil sand sample was collected from Shengli Oilfield (Shandong, China) into bottles and they were 247 transported to laboratory where they were stored at 4 . The sample was used as inoculum to perform 248 enrichment with anaerobic medium in vials as described elsewhere (1). After 253d of enrichment, the 249 genomic DNA was extracted as described elsewhere (2). Nucleic acids were sequenced Illumina 250 HiSeq2000 (Illumina, USA) PE150 by Novogene (Tianjin, China).

251 The seawater samples of Yap metagenomes were collected at Yap trench region by CTD SBE911plus 252 (Sea-Bird Electronics, USA) during the 37th Dayang cruise in 2016. 8L of seawater per sample was 253 filtered through a 0.22 µm-mesh membrane filter immediately after recovery onboard. The membrane 254 was then cut into ~0.2 cm2 pieces with a flame-sterilized scissors and added to a PowerBead Tube (MO 255 BIO, USA) and the subsequent steps were implemented according to the manufacturer’s protocol to 256 extract DNA. The DNA per sample was amplified in five separative reactions using REPLI-g Single Cell 257 Kits (Qiagen, Germany) following the manufacturer's protocol, given to the challenging nature of sample 258 retrieval and DNA recovery. The products were pooled together and purified using QIAamp DNA Mini

9

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

259 Kit (Qiagen, Germany) according to the manufacturer’s recommendations. Parallel blank controls were 260 set for sampling, DNA extraction and ampliation with 0.22 µm-mesh membrane filtering Milli-Q water 261

262 Metagenomic assembly, binning and gene calling

263 FT, MP5 and YT metagenomes. These three sets of metagenomes were assembled and binned using the 264 same method. Raw shotgun metagenomic sequencing reads were trimmed with “read_qc” module from 265 metaWRAP (v.1.1) (3). All clean reads from the same set were pooled together prior to de novo assemble 266 to one co-assembly. Clean reads were sent out to MEGAHIT (v1.1.2) with flag “--presets meta-large” for 267 co-assembling job (4). Sequencing coverage was determined for each assembled scaffold by mapping 268 reads from each sample to the co-assembly using Bowtie2 (5). The binning analysis was carried out 8 269 times with 8 different combinations of specificity and sensitivity parameters using MetaBAT2(52) (“-- 270 maxP 60 or 95” AND “--minS 60 or 95” AND “--maxEdges 200 or 500”) on the assembly with a 271 minimum length of 2000 bp (6). DAS Tool (v1.0) was used as a dereplication and aggregation strategy on 272 those eight binning results to construct accurate bins (7). Manual curation was used for reducing the 273 genome contamination based on differential coverages, GC contents, and the presence of duplicate genes.

274 The depth coverage and N50 statistics of 38 Asgard MAGs recovered from YT metagenomes range from 275 7.72 to 298.86 (median: 21.77) and from 6658 to 381755 bps (median: 26337.5 bps) respectively; for 13 276 Asgard MAGs recovered from FT metagenomes, the values range from 6.98 to 33.22 (median 16.89) and 277 from 3889 to 8957 bps (median: 5000 bps), respectively; and for 11 Asgard MAGs recovered from MP5 278 metagenomes, the values range from 7.06 to 54.42 (median: 10) and from 3898 to 19362 bps (median: 279 7581 bps) respectively. (Supplementary Figure 2).

280 CJE metagenome. Raw metagenomic shotgun sequencing reads were trimmed using Sickle 281 (https://github.com/najoshi/sickle) with default settings. The trimmed reads were de novo assembled 282 using IDBA-UD (v 1.1.1) with the parameters: “-mink 65, -maxk 145, -step 10” (8). Sequencing coverage 283 was determined as described above. The binning analysis were performed with MetaBAT2 12 times, with 284 12 combinations of specificity and sensitivity parameters (--m 1500, 2000, or 2500” AND “--maxP 85 or 285 90” AND “--minS 80, 85 and 90”) for further refinement (6). All binning results were merged and refined 286 using DAS Tool (v1.0) (7).

287 2 Asgard MAGs recovered from CJE metagenome have a depth coverage of 19.16 and 20.56, and a N50 288 statistics of 8740 and 5246 bps (Supplementary Figure 2).

289 J65 metagenome. Raw metagenomic shotgun sequencing reads were trimmed with Trimmomatic (v0.38) 290 (9). The clean reads were then fed to SPAdes (v 3.12.0) for de novo assembly with the parameters: “-- 291 meta -k 21, 33, 55, 77” (10). Sequencing coverage was determined using BBMap (v 38.24) toolkit with 292 the parameters: “bbmap.sh minid=0.99” (https://github.com/BioInfoTools/BBMap). MetaBAT2 (v 2.12.1) 293 was used to perform binning analysis with the parameter: “-m 2000” (6).

294 One Asgard MAG recovered from J65 metagenome had a depth coverage of 14.52 and a N50 statistics of 295 10460 bps (Supplementary Figure 2).

10

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

296 Yap metagenome. For each Yap metagenome, raw metagenomic shotgun sequencing reads were trimmed 297 with Trimmomatic (v0.38) (9). Assembly and binning analysis were performed as described for CJE 298 metagenome for each Yap metagenome.

299 The depth coverage and N50 statistics of 24 Asgard MAGs recovered from Yap metagenomes range from 300 7.78 to 82.33 (median: 13.91) and from 5097 to 889102 bps (median: 18155.5 bps) respectively. 301 (Supplementary Figure 2)

302 A total of 89 Asgard MAGs were reconstructed in this work. Additional 95 Asgard MAGs were 303 downloaded from public databases (e.g. NCBI FTP site). For all 184 genomes, a uniform gene calling 304 protocol was applied. Specifically, the completeness, contamination, and strain heterogeneity of the 305 genomes were estimated by using CheckM (v.1.0.12) (11) and DAS Tool under the taxonomic scope of 306 domain (i.e., Bacteria and Archaea). Protein-coding genes were predicted using Prodigal (v 2.6.3) (12) 307 embedded in Prokka (v 1.13) (13). Transfer RNAs (tRNAs) were identified with tRNAscan-SE (v1.23) 308 using the archaeal tRNA model (14). After quality screening, further analysis focused on 162 high quality 309 Asgard MAGs.

310 Genome set

311 The Asgard archaea genome set analyzed in this work consisted of 161 Asgard MAGs and one complete 312 Asgard genome (Supplementary Table 1). For comparison, selected representative genomes of archaea 313 (296), bacteria (100) and eukaryotes (76) were downloaded from Refseq and Genbank using the NCBI 314 FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/).

315 Average amino-acid identity

316 The average amino-acid identity (AAI) across TACK archaeal reference genomes and the 184 Asgard 317 genomes was calculated using compareM (v0.0.23) with the “aai_wf” at default settings 318 (https://github.com/dparks1134/CompareM).

319 asCOGs construction

320 Initial clustering of 250,634 proteins encoded in 76 Asgard MAGs was performed using two approaches: 321 first, footprints of arCOG profiles were obtained by running PSI-BLAST (15), initiated with arCOG 322 alignments, against the set of predicted Asgard proteins. The footprint sequences were extracted and 323 clustered according to the arCOG best hit. The remaining protein sequences (both full-length proteins and 324 the sequence fragments outside of the footprints, if longer than 60 aa) were clustered using MMseqs2 325 (16), with similarity threshold of 0.5. Sequences within clusters were aligned using MUSCLE (17); the 326 resulting alignments were passed through several rounds of merging and splitting. The merging phase 327 involved comparing alignments to each other using HHSEARCH (18), finding full-length cluster-to- 328 cluster matches, merging the sequence sets and re-aligning the new clusters. The splitting phase consisted 329 of the construction of an approximate phylogenetic tree of the sequences using FastTree (19) (gamma- 330 distributed site rates, WAG evolutionary model) with balanced mid-point tree rooting, identification of 331 subtrees maximizing the fraction of species (MAGs) representation and minimizing the number of 332 paralogs, and pruning such subtrees as separate clusters of putative orthologs. Clusters, derived from 333 arCOGs, were prohibited from merging across distinct arCOGs to prevent distant paralogs from forming 334 mixed clusters.

11

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

335 Phylogenetic analysis

336 16S rRNA gene phylogenetic analysis. 16S rRNA gene sequences were identified in 73 genomes of 337 Asgard archaea (26 generated in this study, 47 from public database) using Barrnap (v 0.9) with the “-- 338 arc” option (https://github.com/tseemann/barrnap). These sequences were combined with 46 339 published 16S rRNA sequences of Asgard archaea, to assess the novelty of the sequences obtained in this 340 work. The novelty of 16S rRNA gene sequences was measured in terms of their sequence identity to 341 previously identified Asgard archaeal 16S rRNA gene and phylogenetic relationships. Specifically, the 342 pairwise sequence identity of two Asgard archaeal 16S rRNA gene sequences (>1300bp) was obtained by 343 first globally aligning the sequences with “Stretcher” in EMBOSS package and then calculating the 344 percent identity excluding gaps (20). The Asgard archaeal 16S rRNA gene sequences were aligned with 345 311 reference sequences from Euryarchaeota (n=231), DPANN (n=22), Korarchaeota (n=1), 346 Crenarchaeota (n=41), Bathyarchaeota (n=2), Thaumarchaeota (n=13) and Aigarchaeota (n=1) using 347 mafft-linsi (v7.471) (21) and trimmed with BMGE (v1.12) (settings: -m DNAPAM250:4, -g 0.5) (22).The 348 alignment was used for phylogenetic inference with IQ-Tree (v2.0.6) based on SYM+R8 (selected by 349 ModelFinder) to generate a maximum-likelihood tree (23).

350 Asgard phylogeny. A set of asCOGs that were considered most suitable as phylogenetic markers for 351 Asgard archaea was selected using the preliminary classification of the 76 genomes in AsCOGs into 352 previously described lineages: Loki, Thor, Odin, Hel and Heimdall. The following criteria were adopted: 353 the asCOG have to be i) present in at least half of the genomes in all lineages, ii) present in at least 75% 354 among the 76 genomes, iii) the mean number of paralogs per genome not to exceed 1.25. For the 209 355 asCOGs matching these criteria, the corresponding protein sequences were obtained from the extended set 356 of 162 MAGs and aligned using MUSCLE (17); the ‘index’ paralog to include in the phylogeny was 357 selected for each MAG based on the similarity to the alignment consensus. Alignments were trimmed to 358 exclude columns containing more than 2/3 gap characters and with homogeneity below 0.05 359 (homogeneity is calculated from the score of the consensus amino acid against the alignment column, 360 compared to the score of the perfect match (24) and concatenated, resulting in an alignment of 50,706 361 characters from 162 sequences (one sequence per MAG). Phylogeny was reconstructed using FastTree 362 (gamma-distributed site rates, WAG evolutionary model) (19) and IQ-Tree (LG+F+R10 model, selected 363 by ModelFinder) (23), producing very similar tree topologies.

364 Tree of life. To elucidate the relationships between the Asgard archaea and other major clades of archaea, 365 bacteria and eukaryotes, 30 families of conserved proteins were selected that appear to have evolved 366 mostly vertically (25, 26). The set of 162 MAGs of Asgard archaea was supplemented with 66 TACK 367 archaea and 220 non-TACK archaea (the former, having been described as the closest archaeal relatives 368 of the Asgard archaea (27), were sampled more densely), 98 bacteria and 72 eukaryotes. Prokaryotic 369 genomes were sampled from the set of completely sequenced genomes to represent the maximum 370 diversity within their respective clades (briefly, all proteins, encoded by genomes within a groups were 371 clustered at 75% identity level; distances between genomes were estimated from the number of shared 372 proteins within these clusters; UPGMA trees were reconstructed from the distances, and genome sets 373 maximizing the total tree branch length were selected to represent the groups). The set of eukaryotic 374 genomes was manually selected to represent the maximum possible variety of eukaryotic taxa. Genomes, 375 in which more than four markers were missing, were excluded from the bacterial and eukaryotic sets. 376 When multiple paralogs of a marker were present in a genome, preliminary phylogenetic trees were

12

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

377 constructed from protein sequence alignments, and paralogs with the shortest branches were selected to 378 represent the corresponding genomes in the set. Sequences were aligned using MUSCLE (17), and 379 alignment columns, containing more than 2/3 gap characters or with alignment column homogeneity 380 below 0.05 were removed (24). The resulting concatenated alignments of the 30 markers consisted of 381 7411 sites. Phylogeny was reconstructed using FastTree (gamma-distributed site rates, WAG evolutionary 382 model) (19) and IQ-Tree (23) with three models: LG+R10, selected by IQ-tree ModelFinder as the best 383 fit, GTR20+F+R10 (following the suggestion of Williams et al. (28) to use GTR, we let IQ-tree to select 384 the best version of the GTR model), and LG+C20+G4+F (again, the mixture model was used following 385 the suggestion of Williams et al. (28); we were unable to use the higher-specified C60 model due to 386 memory limitations of our hardware and used the C20 model instead).

387 Ordination of asCOG phyletic patterns using Classical Multidimensional Scaling

388 Binary asCOG presence-absence patterns were compared between pairs of Asgard MAGs using the

389 following procedure: first, similarity between asCOG sets {A} and {B} was calculated as , = 390 | |/|||| (the number of shared AsCOGs normalized by the geometric mean of the number of

391 AsCOGs in the two MAGs); then, the distance between the patterns was calculated as , = ln (,). 392 The 162x162 distance matrix was embedded into a 2-dimensional space using Classical Multidimensional 393 Scaling analysis implemented as the cmdscale function in R. The projection retained 89% of the original 394 datapoint inertia.

395 Identification and analysis of ESPs

396 To identify eukaryotic signature proteins (ESPs), several strategies were employed. First, ESPs reported 397 by Zaremba-Niedzwiedzka et al. (29) were mapped to asCOGs using PSI-BLAST (15). These asCOGs 398 were additionally examined case by case using HHpred (30) with representative of the respective asCOG 399 or the respective asCOG alignment used as the query. Second, all asCOGs were mapped to CDD profiles 400 (31) using PSI-BLAST, hits to eukaryote-specific domains were selected. Most of the putative ESP 401 asCOGs identified in this search, and all with E-value >1e-10, were additionally examined using HHpred, 402 with a representative of the respective asCOG or the respective asCOG alignment as the query. Third, we 403 analyzed frequently occurring asCOGs (present in at least 50% of Asgard genomes and in at least 30% of 404 Heimdall genomes) that were not annotated automatically with the above two approaches using HHpred 405 with a representative of the respective asCOG or the respective asCOG alignment as the query . Fourth, 406 most of the putative ESP asCOGs detected with these approaches were used as queries for a PSI-BLAST 407 search that was run for three iterations (with inclusion threshold E-value=0.0001) against an Asgard only 408 protein sequence database. Additional unannotated asCOGs with similarity to the (putative) ESPs 409 identified in this search were further examined using HHpred. Fifth, the genomic neighborhoods or all 410 ESPs were examined, and proteins encoded by unannotated neighbor genes were analyzed using HHpred 411 server.

412 Metabolic pathway reconstruction

413 The patterns of gene presence-absence in the asCOGs were used to reconstruct the metabolic pathways of 414 Asgard archaea (Supplementary Table 8). The asCOGs were linked to the KEGG database and to the list 415 of predicted metabolic enzymes of Asgard archaea reported by Spang et al. (32) (Supplementary Table 8). 416 The classification of [NiFe] hydrogenases was performed by comparing the Asgard proteins of

13

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

417 cog.001539, cog.002254, cog.010021, cog.011939 and cog.012499 to HydDB (33). For phylogenetic 418 analysis, the reference sequences of group 1, 3 and 4 [NiFe] hydrogenases were retrieved from HydDB. 419 The sequences were filtered using cd-hit with a sequence identity cut-off of 90% prior to adding 420 orthologous genes of cog.001539, cog.002254, cog.010021, cog.011939 and cog.012499 of Asgard 421 archaea. All sequences for group 1, group 3 and group 4 [NiFe] hydrogenases were aligned using mafft- 422 LINSI (21) and trimmed with BMGE (-m BLOSUM30 -h 0.6) (22). Maximum-likelihood phylogenetic 423 analyses were performed using IQ-tree (23) with the best-fit model (group 1:LG + C60 + R + F, group 3: 424 LG + C60 + R +F and group 4 LG + C50 + R + F), respectively, according to Bayesian information 425 criterion (BIC). Support values were estimated using the SH-like approximate-likelihood ratio test and 426 ultrafast bootstraps, respectively. We adapted a relaxed common denominator approach to determine the 427 presence of certain pathway in one Asgard phyla (32), and combined with maximum parsimony principle 428 (34) to infer the metabolisms of major ancestral forms.

429

430

14

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

431 References

432 1. L. Cheng, T.-L. Qiu, X. Li, W.-D. Wang, Y. Deng, X.-B. Yin, H. Zhang, Isolation and 433 characterization of receptaculi sp. nov. from Shengli oil field, China. FEMS 434 Microbiology Letters. 285, 65–71 (2008).

435 2. J. Peng, Z. Lü, J. Rui, Y. Lu, Dynamics of the Methanogenic Archaeal Community during 436 Residue Decomposition in an Anoxic Rice Field Soil. Appl. Environ. Microbiol. 74, 2894–2901 (2008).

437 3. G. V. Uritskiy, J. DiRuggiero, J. Taylor, MetaWRAP—a flexible pipeline for genome-resolved 438 metagenomic data analysis. Microbiome. 6, 158 (2018).

439 4. D. Li, C. M. Liu, R. Luo, K. Sadakane, T. W. Lam, MEGAHIT: an ultra-fast single-node solution 440 for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 31, 1674– 441 1676 (2015).

442 5. B. Langmead, S. L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nat Methods. 9, 357– 443 359 (2012).

444 6. D. D. Kang, F. Li, E. Kirton, A. Thomas, R. Egan, H. An, Z. Wang, MetaBAT 2: an adaptive 445 binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ. 7, 446 e7359 (2019).

447 7. C. M. K. Sieber, A. J. Probst, A. Sharrar, B. C. Thomas, M. Hess, S. G. Tringe, J. F. Banfield, 448 Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat 449 Microbiol. 3, 836–843 (2018).

450 8. Y. Peng, H. C. M. Leung, S. M. Yiu, F. Y. L. Chin, IDBA-UD: a de novo assembler for single- 451 cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 28, 1420–1428 (2012).

452 9. A. M. Bolger, M. Lohse, B. Usadel, Trimmomatic: a flexible trimmer for Illumina sequence data. 453 Bioinformatics. 30, 2114–2120 (2014).

454 10. S. Nurk, D. Meleshko, A. Korobeynikov, P. A. Pevzner, metaSPAdes: a new versatile 455 metagenomic assembler. Genome Res. 27, 824–834 (2017).

456 11. D. H. Parks, M. Imelfort, C. T. Skennerton, P. Hugenholtz, G. W. Tyson, CheckM: assessing the 457 quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research. 458 25, 1043–1055 (2015).

459 12. D. Hyatt, G.-L. Chen, P. F. LoCascio, M. L. Land, F. W. Larimer, L. J. Hauser, Prodigal: 460 prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 11, 119 461 (2010).

462 13. T. Seemann, Prokka: rapid prokaryotic genome annotation. Bioinformatics. 30, 2068–2069 463 (2014).

464 14. P. P. Chan, T. M. Lowe, tRNAscan-SE: Searching for tRNA genes in genomic sequences. 465 Methods Mol Biol. 1962, 1–14 (2019).

15

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

466 15. A. A. Schäffer, L. Aravind, T. L. Madden, S. Shavirin, J. L. Spouge, Y. I. Wolf, E. V. Koonin, S. 467 F. Altschul, Improving the accuracy of PSI-BLAST protein database searches with composition-based 468 statistics and other refinements. Nucleic Acids Res. 29, 2994–3005 (2001).

469 16. M. Steinegger, J. Söding, MMseqs2 enables sensitive protein sequence searching for the analysis 470 of massive data sets. Nat Biotechnol. 35, 1026–1028 (2017).

471 17. R. C. Edgar, MUSCLE: a multiple sequence alignment method with reduced time and space 472 complexity. BMC Bioinformatics. 5, 113 (2004).

473 18. J. Söding, Protein homology detection by HMM-HMM comparison. Bioinformatics. 21, 951–960 474 (2005).

475 19. M. N. Price, P. S. Dehal, A. P. Arkin, FastTree 2--approximately maximum-likelihood trees for 476 large alignments. PLoS One. 5, e9490 (2010).

477 20. P. Rice, I. Longden, A. Bleasby, EMBOSS: the European Molecular Biology Open Software 478 Suite. Trends Genet. 16, 276–277 (2000).

479 21. K. Katoh, D. M. Standley, MAFFT Multiple Sequence Alignment Software Version 7: 480 Improvements in Performance and Usability. Molecular Biology and Evolution. 30, 772–780 (2013).

481 22. A. Criscuolo, S. Gribaldo, BMGE (Block Mapping and Gathering with Entropy): a new software 482 for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evolutionary 483 Biology. 10, 210 (2010).

484 23. L.-T. Nguyen, H. A. Schmidt, A. von Haeseler, B. Q. Minh, IQ-TREE: A Fast and Effective 485 Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol Biol Evol. 32, 268–274 486 (2015).

487 24. E. Esterman, Y. I. Wolf, R. Kogay, E. V. Koonin, O. Zhaxybayeva, bioRxiv, in press, 488 doi:10.1101/2020.10.08.331884.

489 25. F. D. Ciccarelli, T. Doerks, C. von Mering, C. J. Creevey, B. Snel, P. Bork, Toward automatic 490 reconstruction of a highly resolved tree of life. Science (New York, N.Y.). 311, 1283–1287 (2006).

491 26. P. Puigbò, Y. I. Wolf, E. V. Koonin, Search for a “Tree of Life” in the thicket of the phylogenetic 492 forest. Journal of biology. 8, 59 (2009).

493 27. A. Spang, J. H. Saw, S. L. Jørgensen, K. Zaremba-Niedzwiedzka, J. Martijn, A. E. Lind, R. van 494 Eijk, C. Schleper, L. Guy, T. J. G. Ettema, Complex archaea that bridge the gap between prokaryotes and 495 eukaryotes. Nature. 521, 173–179 (2015).

496 28. 497 robust support for a two-domains tree of life. Nat Ecol Evol. 4, 138–147 (2020).

498 29. K. Zaremba-Niedzwiedzka, E. F. Caceres, J. H. Saw, D. B ckstr m, L. Juzokaite, E. Vancaester, 499 K. W. Seitz, K. Anantharaman, P. Starnawski, K. U. Kjeldsen, M. B. Stott, T. Nunoura, J. F. Banfield, A.

16

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

500 Schramm, B. J. Baker, A. Spang, T. J. G. Ettema, Asgard archaea illuminate the origin of eukaryotic 501 cellular complexity. Nature. 541, 353–358 (2017).

502 30. L. Zimmermann, A. Stephens, S.-Z. Nam, D. Rau, J. Kübler, M. Lozajic, F. Gabler, J. Söding, A. 503 N. Lupas, V. Alva, A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred 504 Server at its Core. J Mol Biol. 430, 2237–2243 (2018).

505 31. M. Yang, M. K. Derbyshire, R. A. Yamashita, A. Marchler-Bauer, NCBI’s Conserved Domain 506 Database and Tools for Protein Domain Analysis. Curr Protoc Bioinformatics. 69, e90 (2020).

507 32. A. Spang, C. W. Stairs, N. Dombrowski, L. Eme, J. Lombard, E. F. Caceres, C. Greening, B. J. 508 Baker, T. J. G. Ettema, Proposal of the reverse flow model for the origin of the eukaryotic cell based on 509 comparative analyses of Asgard archaeal metabolism. Nature Microbiology. 4, 1138–1148 (2019).

510 33. D. Søndergaard, C. N. S. Pedersen, C. Greening, HydDB: A web tool for hydrogenase 511 classification and analysis. Scientific Reports. 6, 34212 (2016).

512 34. D. L. Swofford, W. P. Maddison, Reconstructing ancestral character states under Wagner 513 parsimony. Mathematical Biosciences. 87, 199–229 (1987).

514

515

17

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

516 Data availability

517 Asgard archaea genomes generated in this study are available in eLMSG (an eLibrary of Microbial 518 Systematics and Genomics, https://www.biosino.org/elmsg/index) under accession numbers from 519 LMSG_G000000521.1 to LMSG_G000000609.1 and NCBI database under the project XXXXX.

520 Additional Data files are available at ftp://ftp.ncbi.nih.gov/pub/wolf/_suppl/asgard20/

521 Additional data file 1 (Additional_data_file_1.tgz): Complete asCOG data archive

522 Additional data file 2 (Additional_data_file_2.tgz): Phylogenetic trees and alignments archive

523

524

525

18

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

526 Supplementary Figures

527 Supplementary Figure 1 Global distribution of Asgard genomes analyzed in the study. The pie chart 528 shows the proportion of Asgard genomes found in a biotope. Bold letters in the map show the sampling 529 locations.

530 Supplementary Figure 2 a. Distribution of completeness and contamination for 76 Asgard MAGs 531 assessed by CheckM (v 1.0.12). Distribution of depth coverage (b) and N50 statistics (c) for Asgard 532 MAGs reconstructed in this study. The numbers in parentheses indicate the number of Asgard genomes 533 recovered from a given sampling location. The data for this plot can be found in Supplementary Table 1.

534 Supplementary Figure 3 Comparison of the mean amino-acid identity (AAI) of Asgard and TACK 535 superphyla. a. Shared AAI across Asgard and TACK lineages. Background: comparing all Asgard and 536 TACK lineages included in the analyses but excluding archaea belonging to the same lineages and the six 537 phyla proposed in the current work to investigate the distribution of AAI that defines a phylum. The AAI 538 comparison of (b) Thorarchaeota, (c) Hermodarchaeota (d) Odinarchaeota, (e) Baldrarchaeota, (f) 539 Lokiarchaeota, (g) Helarchaeota, (h) Borrarchaeota, (i) Heimdallarchaeota, (j) Kariarchaeota, (k) 540 Gerdarchaeota, (l) Hodarchaeota and (m) Wukongarchaeota to other Asgad and TACK lineages. The 541 lower and upper hinges of the boxplot correspond to the first and third quartiles. Data beyond the 542 whiskers are shown as individual data points. Number in the parenthesis indicates the number of genomes 543 in the lineages. Data for this plot is included in Supplementary Table 2.

544 Supplementary Figure 4 Comparison of the 16S rRNA gene sequence (>1300 bp) identity of Asgard 545 and TACK lineages. a. Shared 16S rRNA gene sequence identity across Asgard and TACK lineages. 546 Background:comparing all Asgard and TACK lineages included in the analyses but excluding archaea 547 belonging to the same lineages and the six phyla proposed in the current study to investigate the 548 distribution of 16S rRNA gene identity that defines a phylum. 16S rRNA gene identity comparison of (b) 549 Thorarchaeota, (c) Hermodarchaeota (d) Odinarchaeota, (e) Lokiarchaeota, (f) Helarchaeota, (g) 550 Heimdallarchaeota, (h) Kariarchaeota, (i) Gerdarchaeota, (j) Hodarchaeota and (k) Wukongarchaeota to 551 other Asgard and TACK lineages. The lower and upper hinges of the boxplot correspond to the first and 552 third quartiles. Data beyond the whiskers are shown as individual data points. Number in the parenthesis 553 indicates the number of 16S rRNA gene sequences compared in the lineages. Line represents a 16S rRNA 554 gene identity of 75%. Data for this plot could be found in Supplementary Table 3.

555 Supplementary Figure 5 Presence-absence of orthologs of Asgard core genes in other archaea, bacteria 556 and eukaryotes.

557 Supplementary Figure 6 Phylogenetic analysis of group 4 [NiFe] hydrogenases in the Asgard archaea. 558 The unrooted maximum-likelihood phylogenetic tree was built from an alignment of 425 sequences 559 including 110 sequences of Asgard archaea, with 308 amino-acid positions.

560 Supplementary Figure 7 Phylogenetic analysis of group 3 [NiFe] hydrogenases in the Asgard archaea. 561 The unrooted maximum-likelihood phylogenetic tree was built from an alignment of 813 sequences 562 including 335 sequences of Asgard archaea, 514 amino-acid positions.

19

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

563 Supplementary Figure 8 Phylogenetic analysis of group 1 [NiFe] hydrogenases in the Asgard archaea. 564 The unrooted maximum-likelihood phylogenetic tree was built from an alignment of 541 sequences 565 including 2 sequences of Wukongarchaeaota, with 745 amino-acid positions.

566 Supplementary Figure 9 A schematic representation of the presence and absence of selected metabolic 567 features in all (putative) phyla of Asgard archaea.

568 Supplementary Figure 10 Gene structure of the contig encoding Group 1 [Ni,Fe]-hydrogenase in 569 Wukongarchaeota. Abbreviations: Ftr, Formylmethanofuran:tetrahydromethanopterin formyltransferase; 570 FwdD Formylmethanofuran dehydrogenase subunit D; FwdB, Formylmethanofuran dehydrogenase 571 subunit B; FwdA, Formylmethanofuran dehydrogenase subunit A; FwdC, Formylmethanofuran 572 dehydrogenase subunit C; HdrC, Heterodisulfide reductase, subunit C; HyaB, Ni,Fe-hydrogenase I large 573 subunit; HyaA, Ni,Fe-hydrogenase I small subunit.

574

575

20

bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

576 Supplementary Tables

577 Supplementary Table 1 Genome information, proposed and isolation data

578 Supplementary Table 2 Mean amino-acid identity values (in %) comparing 66 TACK genomes and 184 579 Asgard genomes (162 high quality and 22 low-quality)

580 Supplementary Table 3 The 16S rRNA gene sequence identity (in %) comparing TACK lineages and 581 Asgard lineages. The identity was calculated using sequences longer than 1300 bps

582 Supplementary Table 4 Species and phyletic markers used for the tree of life reconstruction

583 Supplementary Table 5 Data for phylogenetic trees: the trees in the Newick format and the underlying 584 alignments

585 Supplementary Table 6 The asCOGs annotation

586 Supplementary Table 7 Eukaryotic signature proteins in Asgard archaea

587 Supplementary Table 8 The presence-absence of metabolic enzymes in Asgard archaea.

588

21

● ● ● 50°N bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. ● ●● ● ● YT J65 ● ● ● ● CJE ● ●● ● ● ● ● MP5YT ● ● ● ● ●

● ● Yap

3.3% 0.5 % 0°

Latitude 14.2 % 51.3%

14.2 %

50°S 0.5 %

7.1 %

0.5 % 8.1% N

120°W 60°W 0° 60°E 120°E Longitude

● Coastal sediment ● Hot spring ● Hypersaline lake sediment ● Marine water ● Petroleum seep (Marine) Biotope ● Freshwater sediment ● ● Marine sediment ● Petroleum field bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

a 100 b c 300

275

750000 250

75 225

200

175 500000 50 150 N50 (bp) value (%) value 125 depth of coverage (1X) depth of coverage 100 250000 25 75

50

25

0 0 0 Completeness Contamination CJE (n=2) FT (n=13) J65 (n=1) MP5 (n=11) Yap (n=24) YT (n=38) CJE (n=2) FT (n=13) J65 (n=1) MP5 (n=11) Yap (n=24) YT (n=38) a 100

80 60 age amino acid identity (%) r e v A

40

Background yarchaeota(5) rarchaeota(2) rarchaeota(2) rarchaeota(3) iarchaeota(4) h o rarchaeota(57) r Aigarchaeota(1) K Helarchaeota(5) Ka ongarchaeota(3) Odinarchaeota(2) Bald Bor k Crenarchaeota(45) Bat Tho rmodarchaeota(17) Lokiarchaeota(52) Gerdarchaeota(14) Hodarchaeota(16) u W Thaumarchaeota(13) He Heimdallarchaeota(9)

b 100 c 100

80 80

60 60 age amino acid identity (%) r age amino acid identity (%) r e e v v A A

40 40

yarchaeota (5) rarchaeota (2) rarchaeota (2) rarchaeota (3) iarchaeota (4) h o rarchaeota (57) r archaeota (5) rarchaeota (2) rarchaeota (2) rarchaeota (3) iarchaeota (4) K ongarchaeota (3) y o r rarchaeota (57) k modarchaeota (17) h K u r kongarchaeota (3) W rmodarchaeota (17) u W

archaeota (57) _vs_ Aigarchaeota (1) archaeota (57) _vs_ archaeota (57) _vs_ Helarchaeota (5) r r archaeota (57) _vs_ Odinarchaeota (2) r rarchaeota (57) _vs_ Bor rarchaeota (57) _vs_ Ka rmodarchaeota (17) _vs_ Aigarchaeota (1) modarchaeota (17) _vs_ modarchaeota (17) _vs_ Helarchaeota (5) modarchaeota (17) _vs_ Ka archaeota (57) _vs_archaeota Crenarchaeota (57) _vs_ (45)Bat r rarchaeota (57)r archaeota_vs_ Bald (57) _vs_ Lokiarchaeota (52) rarchaeota (57) _vs_ Hodarchaeotaarchaeota (16) (57) _vs_ Tho r modarchaeota modarchaeota(17) _vs_ Odinarchaeota (17) _vs_ Bald (2) r rmodarchaeota (17) _vs_ Bor r r r rarchaeota (57) _vs_ Gerdarchaeota (14) r modarchaeota r(17)modarchaeota _vs_ Crenarchaeota (17) _vs_ (45)Bat modarchaeota (17)r _vs_ Tho r rmodarchaeota (17) _vs_ Lokiarchaeota (52) modarchaeota (17)rmodarchaeota _vs_ Gerdarchaeota (17) _vs_ (14)Hodarchaeota (16) Tho archaeota (57) _vs_Tho Thaumarchaeota (13) Tho Tho archaeota (57) _vs_ He r He r He He r r archaeota (57) Tho_vs_ He Tho Tho rarchaeota (57) _vs_ Heimdallarchaeota (9) r modarchaeota (17) _vs_ Thaumarchaeota (13)He He He modarchaeota (17) _vs_ Heimdallarchaeota (9) rmodarchaeota (17) _vs_ Tho Tho r Tho Tho Tho Tho He He r He He r He He rmodarchaeota (17) _vs_ He Tho He Tho Tho Tho He He He

d 100 e 100

80 80

60 60 age amino acid identity (%) age amino acid identity (%) r r e e v v A A

40 40

archaeota (5) archaeota (2) archaeota (3) archaeota (2) yarchaeota (5) rarchaeota (2) rarchaeota (2) rarchaeota (3) iarchaeota (4) y r archaeota (57) r riarchaeota (4) r h o rarchaeota (57) r h o r K ongarchaeota (3) K ongarchaeota (3) k modarchaeota (17) k rmodarchaeota (17) u r u W W

rarchaeota (2) _vs_ Aigarchaeota (1) rarchaeota (2) _vs_ rarchaeota (2) _vs_archaeota Helarchaeota (2) _vs_ (5)Bor archaeota (2) _vs_ Ka archaeota (2) _vs_ Bat archaeota (2) _vs_ Tho rarchaeota (2) _vs_archaeota Odinarchaeota (2) _vs_ Lokiarchaeota (2) r (52) r archaeota (2) _vs_ Hodarchaeotaarchaeota (16) (2) _vs_ Bald rarchaeota (2) _vs_r Crenarchaeota (45) r r rarchaeota (2) _vs_r Gerdarchaeota (14) r Odinarchaeota (2) _vs_ Aigarchaeota (1) Odinarchaeota (2) _vs_ Odinarchaeota (2) _vs_ Helarchaeota (5) Bald archaeota (2) _vs_Bald Thaumarchaeota (13) Bald archaeota (2) _vs_ Heimdallarchaeota (9) rarchaeota (2) _vs_ Odinarchaeota (2) _vs_ Bor Odinarchaeota (2) _vs_ Ka Odinarchaeota (2) _vs_ Odinarchaeota (2) r rarchaeota (2) _vs_Bald He Bald r Bald Odinarchaeota (2)Odinarchaeota _vs_ Crenarchaeota (2) _vs_ (45)Bat Odinarchaeota (2) _vs_ Tho Odinarchaeota Odinarchaeota(2) _vs_ Bald (2) _vs_ Lokiarchaeota (52) Odinarchaeota (2)Odinarchaeota _vs_ Gerdarchaeota (2) _vs_ (14)Hodarchaeota (16) Bald Bald Bald Bald Bald Bald Bald Odinarchaeota (2) _vs_ Bald Odinarchaeota (2) _vs_ Thaumarchaeota (13)Odinarchaeota (2) _vs_ He Odinarchaeota (2) _vs_ Heimdallarchaeota (9) Bald Bald Bald

100 100 f g

80 80

60 60 age amino acid identity (%) age amino acid identity (%) r r e e v v A A 40 40

archaeota (5) archaeota (2) archaeota (2) archaeota (3) y r archaeota (57) r r riarchaeota (4) h o r yarchaeota (5) rarchaeota (2) rarchaeota (2) rarchaeota (3) iarchaeota (4) K ongarchaeota (3) h o rarchaeota (57) r modarchaeota (17) k K r u kongarchaeota (3) W rmodarchaeota (17) u W

Lokiarchaeota (52) _vs_ Aigarchaeota (1) Lokiarchaeota (52) _vs_ Lokiarchaeota (52) _vs_ Helarchaeota (5) Helarchaeota (5) _vs_ Aigarchaeota (1) Helarchaeota (5) _vs_ Helarchaeota (5) _vs_ Helarchaeota (5) Lokiarchaeota (52) _vs_ Odinarchaeota (2) Lokiarchaeota (52) _vs_ Bor Lokiarchaeota (52) _vs_ Ka Helarchaeota (5) _vs_ Odinarchaeota (2) Helarchaeota (5) _vs_ Bor Helarchaeota (5) _vs_ Ka Lokiarchaeota (52)Lokiarchaeota _vs_ Crenarchaeota (52) _vs_ (45)Bat Lokiarchaeota (52) _vs_ Tho Lokiarchaeota (52) _vs_ Bald Lokiarchaeota (52)Lokiarchaeota _vs_ Gerdarchaeota (52) _vs_ (14)HodarchaeotaLokiarchaeota (16) (52) _vs_ Lokiarchaeota (52) Helarchaeota (5)Helarchaeota _vs_ Crenarchaeota (5) _vs_ (45)Bat Helarchaeota (5) _vs_ Tho Helarchaeota (5)Helarchaeota _vs_ Bald (5) _vs_ Lokiarchaeota (52) Helarchaeota (5)Helarchaeota _vs_ Gerdarchaeota (5) _vs_ (14)Hodarchaeota (16) Lokiarchaeota (52) _vs_ Helarchaeota (5) _vs_ Lokiarchaeota (52) _vs_ Thaumarchaeota (13)Lokiarchaeota (52) _vs_ He Lokiarchaeota (52) _vs_ Heimdallarchaeota (9) Helarchaeota (5) _vs_ Thaumarchaeota (13)Helarchaeota (5) _vs_ He Helarchaeota (5) _vs_ Heimdallarchaeota (9)

h 100 i 100

80 80

60

60 age amino acid identity (%) r age amino acid identity (%) e r v e v A

A

40 40

yarchaeota (5) rarchaeota (2) rarchaeota (2) rarchaeota (3) iarchaeota (4) h o rarchaeota (57) r K ongarchaeota (3) yarchaeota (5) rarchaeota (2) rarchaeota (2) iarchaeota (4) rarchaeota (3) modarchaeota (17) k h o rarchaeota (57) r r u K W kongarchaeota (3) rmodarchaeota (17) u W

rarchaeota (3) _vs_ Aigarchaeota (1) rarchaeota (3) _vs_ rarchaeota (3) _vs_ Helarchaeotaarchaeota (5) (3) _vs_ Ka archaeota (3) _vs_ Bor archaeota (3) _vs_ Bat archaeota (3) _vs_ Tho rarchaeota (3) _vs_archaeota Odinarchaeota (3) _vs_archaeota Bald (2) (3) _vs_ Lokiarchaeota (52) r archaeota (3) _vs_ Hodarchaeotar (16) rarchaeota (3) _vs_r Crenarchaeota (45) r r r rarchaeota (3) _vs_r Gerdarchaeota (14) Bor Bor Bor archaeota (3) _vs_ Heimdallarchaeota (9) _vs_ Aigarchaeota (1) Heimdallarchaeota (9) _vs_ Heimdallarchaeota (9) _vs_ Helarchaeota (5) rarchaeota (3) _vs_ Thaumarchaeota (13) archaeota (3) _vs_Bor He archaeota (3) _vs_Bor Heimdallarchaeota (9) r Bor Heimdallarchaeota (9) _vs_ Odinarchaeota (2) HeimdallarchaeotaHeimdallarchaeota (9) _vs_ Bor (9) _vs_ Ka Bor Bor Bor r Bor Bor r Bor Bor HeimdallarchaeotaHeimdallarchaeota (9) _vs_ Crenarchaeota (9) _vs_ (45)Bat Heimdallarchaeota (9) _vs_ Tho HeimdallarchaeotaHeimdallarchaeota (9) _vs_ Bald (9) _vs_ Lokiarchaeota (52) HeimdallarchaeotaHeimdallarchaeota (9) _vs_ Gerdarchaeota (9) _vs_ (14)Hodarchaeota (16) Bor Heimdallarchaeota (9) _vs_ Bor Bor Bor Heimdallarchaeota (9) _vs_ ThaumarchaeotaHeimdallarchaeota (13) (9) _vs_ He Heimdallarchaeota (9) _vs_ Heimdallarchaeota (9)

100 100 j k

80 80

bioRxiv preprint60 doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint 60

(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made age amino acid identity (%) age amino acid identity (%) r r

available under aCC-BY-ND 4.0 International license. e e v v A A 40 40

archaeota (5) rarchaeota (2) rarchaeota (2) rarchaeota (3) iarchaeota (4) y o rarchaeota (57) r yarchaeota (5) rarchaeota (2) rarchaeota (2) rarchaeota (3) iarchaeota (4) h h o rarchaeota (57) r K ongarchaeota (3) K modarchaeota (17) k kongarchaeota (3) r u rmodarchaeota (17) u W W

riarchaeota (4) _vs_ Aigarchaeota (1) riarchaeota (4) _vs_ riarchaeota (4) iarchaeota_vs_ Helarchaeota (4) _vs_ (5)Bor iarchaeota (4) _vs_ Ka iarchaeota (4) _vs_iarchaeota Odinarchaeota (4) _vs_iarchaeota Bald (2) (4) _vs_ Lokiarchaeotar (52) iarchaeota (4) _vs_ Hodarchaeotar (16) iarchaeota (4) _vs_riarchaeota Crenarchaeota (4) _vs_ (45)Bat riarchaeota (4) _vs_ Tho r r r iarchaeota (4) _vs_r Gerdarchaeota (14) Ka r Ka Ka Ka r Ka Gerdarchaeota (14) _vs_ Aigarchaeota (1) Gerdarchaeota (14) _vs_ GerdarchaeotaGerdarchaeota (14) _vs_ Helarchaeota (14) _vs_ (5)Bor Gerdarchaeota (14) _vs_ Ka iarchaeota (4) _vs_ Thaumarchaeota (13) Ka Ka Ka iarchaeota (4) _vs_ HeimdallarchaeotaKa (9) riarchaeota (4) _vs_ GerdarchaeotaGerdarchaeota (14) _vs_ Odinarchaeota Gerdarchaeota(14) _vs_ Bald (2) (14) _vs_ Lokiarchaeota (52) Gerdarchaeota (14) _vs_ Hodarchaeota (16) Ka Ka r Ka riarchaeota (4) _vs_ He r Ka Gerdarchaeota (14)Gerdarchaeota _vs_ Crenarchaeota (14) _vs_ (45)Bat Gerdarchaeota (14) _vs_ Tho Gerdarchaeota (14) _vs_ Gerdarchaeota (14) Ka Gerdarchaeota (14) _vs_ Ka Ka Ka Gerdarchaeota (14) _vs_ Thaumarchaeota (13)Gerdarchaeota (14) _vs_ He Gerdarchaeota (14) _vs_ Heimdallarchaeota (9)

l 100 m 100

80 80

60 60 age amino acid identity (%) r age amino acid identity (%) r e e v v A A

40 40

yarchaeota (5) rarchaeota (2) rarchaeota (2) rarchaeota (3) iarchaeota (4) h o rarchaeota (57) r yarchaeota (5) rarchaeota (2) rarchaeota (2) rarchaeota (3) iarchaeota (4) K ongarchaeota (3) h o rarchaeota (57) r k K rmodarchaeota (17) u kongarchaeota (3) rmodarchaeota (17) u W W

ongarchaeota (3) _vs_ Aigarchaeota (1) ongarchaeota (3) _vs_ ongarchaeota (3) _vs_ Helarchaeota (5) k k ongarchaeota (3) _vs_ Odinarchaeota (2) k ongarchaeota (3) _vs_ Bor ongarchaeota (3) _vs_ Ka u ongarchaeota (3)ongarchaeota _vs_ Crenarchaeota (3) _vs_ (45)Bat u ongarchaeota (3) _vs_ Tho k ongarchaeota (3)ongarchaeota _vs_ Bald (3)u _vs_ Lokiarchaeotak (52) k ongarchaeota (3) _vs_ Hodarchaeota (16) k k k u k k u u kongarchaeota (3)k _vs_ Gerdarchaeota (14) Hodarchaeota (16) _vs_ Aigarchaeota (1) Hodarchaeota (16) _vs_ Hodarchaeota Hodarchaeota(16) _vs_ Helarchaeota (16) _vs_ (5)Bor Hodarchaeota (16) _vs_ Ka W u u ongarchaeota (3)W _vs_ Thaumarchaeotau (13) u u W W W u u ongarchaeota (3) _vs_ Hodarchaeota Hodarchaeota(16) _vs_ Odinarchaeota (16)Hodarchaeota _vs_ Bald (2) (16) _vs_ Lokiarchaeota (52) Hodarchaeota (16) _vs_ Hodarchaeota (16) k ongarchaeota (3)W _vs_ He W W kongarchaeota (3) _vs_ Heimdallarchaeota (9)W k Hodarchaeota (16)Hodarchaeota _vs_ Crenarchaeota (16) _vs_ (45)Bat Hodarchaeota (16) _vs_ Tho Hodarchaeota (16) _vs_ Gerdarchaeota (14) W W u W k u W u Hodarchaeota (16) _vs_ u W Hodarchaeota (16) _vs_ Thaumarchaeota (13)Hodarchaeota (16) _vs_ He Hodarchaeota (16) _vs_ Heimdallarchaeota (9) W W W a 100 ● ● ● ● ● ● ● ● ● ● ● ● ● ●

90

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

80

70 16S rRNA gene identity (%)

● ●

60

● ● ● ● ● ● ● ● ● ● ● ● ●

Background yarchaeota(2) rarchaeota(1) iarchaeota(4) h o rarchaeota(10) r Aigarchaeota(1) K Helarchaeota(1) ongarchaeota(1) modarchaeota(1) Odinarchaeota(2) Ka Gerdarchaeota(8) k Crenarchaeota(38) Bat Tho r Lokiarchaeota(12) Hodarchaeota(13) u Thaumarchaeota(13) He W Heimdallarchaeota(12)

b 100 c 100

90 90

80 80

● ● ● ● ● ● ●

16S rRNA gene identity (%) ● 16S rRNA gene identity (%) 70 ● ● ● 70 ● ● ● ● ● ● ● ●

● 60 ● 60 ●

archaeota (2) archaeota (1) y r archaeota (10) riarchaeota (4) archaeota (2) rarchaeota (1) iarchaeota (4) h o r y o r rarchaeota (10) K ongarchaeota (1)modarchaeota (1) h K k r modarchaeota (1) kongarchaeota (1) u r u W W

rarchaeota (10) _vs_ Aigarchaeota (1) rarchaeota (10) _vs_ rarchaeota (10) _vs_ Helarchaeotaarchaeota (1) (10) _vs_ Ka modarchaeota (1) _vs_ Aigarchaeota (1) modarchaeota (1) _vs_ modarchaeota (1) _vs_ Helarchaeota (1) archaeota (10) _vs_ Bat rarchaeota (10) _vs_archaeota Odinarchaeota (10) _vs_ (2) Lokiarchaeota (12) r rarchaeota (10) _vs_archaeota Gerdarchaeota (10) _vs_ (8) Hodarchaeota (13) r r modarchaeota (1) _vs_ Odinarchaeotar (2) modarchaeota (1) _vs_ Ka rarchaeota (10) _vs_r Crenarchaeota (38) r r rarchaeota (10) _vs_ Tho modarchaeota (1) _vs_ Bat modarchaeota (1)r _vs_ Tho modarchaeota (1) _vs_ Lokiarchaeota (12) r rmodarchaeota (1)modarchaeota _vs_ Gerdarchaeota (1) _vs_ (8) Hodarchaeota (13) Tho Tho Tho He rmodarchaeota (1)r _vs_ Crenarchaeota (38) He r r He r rarchaeota (10) _vs_ Thaumarchaeotararchaeota (13) (10) _vs_ He Tho rarchaeota (10) _vs_ modarchaeota (1) _vs_ Thaumarchaeota (13) He modarchaeota (1)modarchaeota _vs_ (1) _vs_ He Tho Tho Tho archaeota (10) _vs_ HeimdallarchaeotaTho (12) Tho Tho He r He He He He He r r Tho r He rmodarchaeota (1) _vs_ Heimdallarchaeota (12) Tho Tho Tho He He He Tho He

d 100 e 100

90 90

● ● 80 80 ●

● ●

● ● 16S rRNA gene identity (%) 16S rRNA gene identity (%) ● ●

● ● ● 70 ● 70

archaeota (2) archaeota (1) archaeota (2) rarchaeota (1) iarchaeota (4) y r archaeota (10) riarchaeota (4) y o rarchaeota (10) r h o r h K K ongarchaeota (1) modarchaeota (1) kongarchaeota (1) rmodarchaeota (1) k r u u W W

Odinarchaeota (2) _vs_ Aigarchaeota (1) Odinarchaeota (2) _vs_ Odinarchaeota (2) _vs_ Helarchaeota (1) Lokiarchaeota (12) _vs_ Aigarchaeota (1) Lokiarchaeota (12) _vs_ Lokiarchaeota (12) _vs_ Helarchaeota (1) Odinarchaeota (2)Odinarchaeota _vs_ Ka (2) _vs_ Gerdarchaeota (8) Odinarchaeota (2) _vs_ Odinarchaeota (2) Lokiarchaeota (12) _vs_ Odinarchaeota (2) Lokiarchaeota (12)Lokiarchaeota _vs_ Ka (12) _vs_ Gerdarchaeota (8) Odinarchaeota (2)Odinarchaeota _vs_ Crenarchaeota (2) _vs_ (38) Bat Odinarchaeota (2) _vs_ Tho Odinarchaeota (2) _vs_ Lokiarchaeota (12) Odinarchaeota (2) _vs_ Hodarchaeota (13) Lokiarchaeota (12)Lokiarchaeota _vs_ Crenarchaeota (12) _vs_ (38) Bat Lokiarchaeota (12) _vs_ Tho Lokiarchaeota (12) _vs_ HodarchaeotaLokiarchaeota (13) (12) _vs_ Lokiarchaeota (12) Odinarchaeota (2) _vs_ Thaumarchaeota (13) Odinarchaeota (2) _vs_ He Odinarchaeota (2) _vs_ Lokiarchaeota (12) _vs_ Thaumarchaeota (13) Lokiarchaeota (12) _vs_ He Lokiarchaeota (12) _vs_ Odinarchaeota (2) _vs_ Heimdallarchaeota (12) Lokiarchaeota (12) _vs_ Heimdallarchaeota (12)

e 100 f 100

90 90

80

80

● ● ● ● ● ● ● ● ● ● ● ● ●

16S rRNA gene identity (%) ● ● ●

16S rRNA gene identity (%) ● 70 ● ● ● ● ● ● ● ● ● ● 70 ● ●

● ● ●

60 ●

60

archaeota (2) rarchaeota (1) iarchaeota (4) y o rarchaeota (10) r h K modarchaeota (1) kongarchaeota (1) r u archaeota (2) rarchaeota (1) iarchaeota (4) y o rarchaeota (10) r W h K modarchaeota (1) kongarchaeota (1) r u W

Helarchaeota (1) _vs_ Aigarchaeota (1) Helarchaeota (1) _vs_ Helarchaeota (1) _vs_ Helarchaeota (1) Heimdallarchaeota (12) _vs_ Aigarchaeota (1) Heimdallarchaeota (12) _vs_ Heimdallarchaeota (12) _vs_ Helarchaeota (1) Helarchaeota (1) _vs_ Odinarchaeota (2) Helarchaeota (1)Helarchaeota _vs_ Ka (1) _vs_ Gerdarchaeota (8) Heimdallarchaeota (12) _vs_ Odinarchaeota (2) HeimdallarchaeotaHeimdallarchaeota (12) _vs_ Ka (12) _vs_ Gerdarchaeota (8) Helarchaeota (1) _vs_Helarchaeota Crenarchaeota (1) _vs_ (38) Bat Helarchaeota (1) _vs_ Tho Helarchaeota (1) _vs_ Lokiarchaeota (12) Helarchaeota (1) _vs_ Hodarchaeota (13) HeimdallarchaeotaHeimdallarchaeota (12) _vs_ Crenarchaeota (12) _vs_ (38) Bat Heimdallarchaeota (12) _vs_ Tho Heimdallarchaeota (12) _vs_ Lokiarchaeota (12) Heimdallarchaeota (12) _vs_ Hodarchaeota (13) Helarchaeota (1) _vs_ Thaumarchaeota (13) Helarchaeota (1) _vs_ He Helarchaeota (1) _vs_ Heimdallarchaeota (12) _vs_ Thaumarchaeota (13) Heimdallarchaeota (12) _vs_ He Heimdallarchaeota (12) _vs_ Helarchaeota (1) _vs_ Heimdallarchaeota (12) Heimdallarchaeota (12) _vs_ Heimdallarchaeota (12)

g 100 h 100 ●

● 90 ● ● 90 ●

80 80

● ● ● 16S rRNA gene identity (%)

16S rRNA gene identity (%) ● 70 ● ● ● ● ● ● 70 ● ● ● ● ● ● ●

60 bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the● preprint in perpetuity. It is made 60 available under aCC-BY-ND 4.0 International license.

archaeota (2) rarchaeota (1) iarchaeota (4) yarchaeota (2) rarchaeota (1) iarchaeota (4) y o rarchaeota (10) r h o rarchaeota (10) r h K K modarchaeota (1) ongarchaeota (1) modarchaeota (1) kongarchaeota (1) r k r u u W W

iarchaeota (5) _vs_ Aigarchaeota (1) iarchaeota (4) _vs_ iarchaeota (4) _vs_ Helarchaeota (1) r r iarchaeota (4) _vs_ Odinarchaeota r(2) iarchaeota (4) _vs_ Gerdarchaeota (8) riarchaeota (4) _vs_ Ka iarchaeota (4) _vs_riarchaeota Crenarchaeota (4) _vs_ (38) Bat iarchaeota (4) _vs_ Tho r riarchaeota (4) _vs_ Lokiarchaeota (12) r riarchaeota (4) _vs_ Hodarchaeota (13) Ka r Ka r Ka Ka Gerdarchaeota (8) _vs_ Aigarchaeota (1) Gerdarchaeota (8) _vs_ Gerdarchaeota (8) _vs_ HelarchaeotaGerdarchaeota (1) (8) _vs_ Ka iarchaeota (4) _vs_ Thaumarchaeota (13) riarchaeota (4) _vs_Ka He Ka iarchaeota (4) _vs_ Gerdarchaeota (8) _vs_ Odinarchaeota (2) Gerdarchaeota (8) _vs_ Gerdarchaeota (8) Ka Ka r Ka Ka iarchaeota (4) _vs_ HeimdallarchaeotaKa (12) r Gerdarchaeota (8)Gerdarchaeota _vs_ Crenarchaeota (8) _vs_ (38) Bat Gerdarchaeota (8) _vs_ Tho Gerdarchaeota (8) _vs_ Lokiarchaeota (12) Gerdarchaeota (8) _vs_ Hodarchaeota (13) Ka Ka r Ka Gerdarchaeota (8) _vs_ Thaumarchaeota (13) Gerdarchaeota (8) _vs_ He Gerdarchaeota (8) _vs_ Ka Gerdarchaeota (8) _vs_ Heimdallarchaeota (12)

i 100 j 100

● ● ● 90 ● 90

80 ● ● 80 ● ● ● 16S rRNA gene identity (%) 16S rRNA gene identity (%)

70

● ● 70 ●

60

yarchaeota (2) rarchaeota (1) iarchaeota (4) h o rarchaeota (10) r archaeota (2) rarchaeota (1) iarchaeota (4) K y o rarchaeota (10) r modarchaeota (1) kongarchaeota (1) h K r u modarchaeota (1) kongarchaeota (1) r u W W

ongarchaeota (1) _vs_ Aigarchaeota (1) ongarchaeota (1) _vs_ ongarchaeota (1) _vs_ Helarchaeota (1) k k ongarchaeota (1) _vs_ Odinarchaeotak (2) ongarchaeota (1)ongarchaeota _vs_ Ka (1) _vs_ Gerdarchaeota (8) u ongarchaeota (1) ongarchaeota_vs_ Crenarchaeota (1) _vs_ (38) Bat u ongarchaeota (1) _vs_ Tho k ongarchaeota (1)u _vs_ Lokiarchaeota (12) k k ongarchaeota (1) _vs_ Hodarchaeota (13) k k k u k u u k Hodarchaeota (13) _vs_ Aigarchaeota (1) Hodarchaeota (13) _vs_ Hodarchaeota (13) _vs_ HelarchaeotaHodarchaeota (1) (13) _vs_ Ka W u u ongarchaeota (1) W_vs_ Thaumarchaeotau (13) ongarchaeota (1) _vs_ He u W W u ongarchaeota (1) _vs_ Hodarchaeota (13)Hodarchaeota _vs_ Odinarchaeota (13) _vs_ (2) Lokiarchaeota (12) Hodarchaeota (13) _vs_ GerdarchaeotaHodarchaeota (8) (13) _vs_ Hodarchaeota (13) k k W W W W k Hodarchaeota (13)Hodarchaeota _vs_ Crenarchaeota (13) _vs_ (38) Bat Hodarchaeota (13) _vs_ Tho W W u W u kongarchaeota (1) _vs_ Heimdallarchaeota (12) u Hodarchaeota (13) _vs_ Thaumarchaeota (13) Hodarchaeota (13) _vs_ He Hodarchaeota (13) _vs_ W W u W Hodarchaeota (13) _vs_ Heimdallarchaeota (12) W bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Universal 293

Archaea & Eukaryotes 62

Archaea & Bacteria 15

Archaea only 7

Eukaryotes only 1

Asgard only 0

0 50 100 150 200 250 300 350 G

Group 4c WP 012528510.1 Proteobacteria Deltaproteobacteria

Group 4c WP 027187553.1 Proteobacteria Deltaproteobacteria Group 4c WP 026783288.1 Proteobacteria Group 4c WP 01860

Group 4c WP 010939566.1 Proteobacteria Deltaproteobacteria Group 4c WP 008388614.1 Pr Group 4c WP 020881764.1 Proteobacteria Deltaproteobacteria

Group 4c WP 007527819.1 Proteobacteria Deltaproteobacteria

Group 4c WP 028535106.1 Proteobacteria Betaproteobacteria Group 4c WP 027191193.1 Proteobacteria Deltaproteobacteria

Group 4c WP 015416298.1 Proteobacteria Deltaproteobacteria Group 4c W Group 4c WP 028108808.1 Proteobacte Group 4c WP 028112273.1 Proteobacteria Gamm

Group 4c WP 01138917

a

a

ire

i

orG

r Group 4c WP 00746

et

tc

ie

cabonahte

i

Tree scale: 1 a 6884.1 P t

Group 4c W P 0 i e

b tor

o

e

r

om t 11474902.1 Proteobacteria Alphaproteobacteria po

o

r p

r

p o

e

mrehT

Group 4c WP 007473061.1 Proteobacte o

roteobacter h m

t Group 4c W r

m

orp r e

M a M Group 4g WP 028841 P 01285 e h oteobacteria Alph a

h T otogae

toeah

o

T T t

8 a m Group 4c WP 01590 C 9.1 Proteo t o 176.1 Proteobacteria Gamma at

oe

Group 4c WP 013121779.1 Clostridia ea

a

t

o 6977.1 P

ore

e a Ther

hc P 01476 ia Betaproteobacteria c Group 4h W

ra Group 4c WP a h

ra

tca cr h

Group 4h W y Group 4h WP 0040761 c

ru Group 4c W r an Group 4c WP 015594087.1 Firmicutes Bacilli ne

bacteria Alphaproteo a

b

r

n erC roteobacte 1 E otogae 9235.1 P o ria er C

1 m

mr

aproteobacteria .407314310 P

C 1. C 01 .6 464.1 e 1 . various Asgard archaea 013809 h P t 6 Group 4h 1832947.1W Euryarchaeota 2 6 013329140.1 Eu Ther 767.1 Pr o 59 5 r P 01 57 oteobacte r

3 Thermodesulfobacteria airetc a b o etorp atle D airetc a b o etor P 1.1 6 2 8 6 3 9 0 0 P W c 4 p u or r p 7 2 ia Epsil 374 aproteo o airetc a b o etorp atle D airetc a b o etor P 1.8 7 8 7 0 0 6 0 0 P W c 4 p u or G Group 4h WP 011844026.1 Euryarchaeota Methanomicrobia 134472 2 6 069.1 airetc a b o etorp atle D airetc a b o etor P 1.5 0 5 5 2 6 2 1 0 P W c 4 p u or G

C 5 9 Group 4g WP 029009221.1 Proteobacteria Alphaproteobacteria 561.1 Firmicutes Clostridia

7 0

1 oteobacte 1 Group 4h W 46.1 Euryarchaeota Methanomicrobia . 1 2 ria Epsilonproteobacteria 0 P 0

P 6 1 Group 4g WP 013756005.1 Firmicutes Clostridia onproteobacteria proteobacteria 0 mococci ococci P r P r 8 0 ia Epsilonproteobacteria bacteria 1 1.1 Firmicutes Clostrid bacteria g4 W atoretc a b o m re htorp o C 1.7 2 2 3 6 9 8 1 0 P W g 4 p u m Group 4h WP 004038499.1 Euryarchaeota1449 Methanomicrobia 5 rmococci 012745 W P

3 g mococci ryarchaeota Methanomicrobia W

p 02096 The r

4 mococci

W W -

puo r

g Group 4g WP 027624751.1 Firmicutes Clostridia 5 The 071.1 Euryarchaeota Methanomicrobia 4 Ther ria Epsilonproteobacteria p ococci g mococci 4 P 014867151.1 Euryarchaeota Methanomicrobia 2 ota The 4 4

1 e Group 4h WP 013644869.1 Euryarchaeota Group 4g WP 027099538.1 Firmicutes Clostridia ota m p p 02846 The ota uo

0 -p 01022

r e Group 4g WP 005210163.1 Firmicutes Clostridia p mococci rgistia u P ergistia -p 01423 Ther r

r u As 129 Group 4h WP 004031618.1 Euryarchaeota Methanobacteria or G eota Thermococciota eota rgistia rgistia ergistia o Thermococci G Ther e e r

W G The yarcha Group 4g WP 027308223.1 Firmicutes Clostridia ta G r Group 4h WP 013825480.1 Euryarchaeota Methanobacteria Group 4g WP eota

g As 143 o Group 4g WP 008908605.1 Firmicutes Clostridia As 129- p 02277 yarchae ota ryarcha As 006 - r 4 ota

Group 4h WP 013295617.1 Euryarchaeota Methanobacteria yarcha yarcha As 168-p 02492 r -p 00069 r Group 4h WP 016358530.1 Euryarchaeota Methanobacteria p yarchae p 04484 Group 4h WP 019266264.1 Euryarchaeota Methanobacteria u rgistetes Synergistia - gistetes Synergistia ia yarcha gistetes Syne gistetes Synergistia or 60.1 Eu r p 01531 r r p 00452 ergistetes gistetes Syn airetc a b o m re htorp o C gistetes Syn - 7 yarchae As 168 p 01727 41.1 Eu ryarchae n gistetes Syn p 00419 r yarchae G 571.1 Eu y p 00367 r As 101 58.1 Eu 55.1 Eu yner - 86.1 Eur 5 mococci Group 4h WP 012956197.1 Euryarchaeota Methanobacteria As 171 0 1.1 Syne Syner r Group 4h WP 011018833.1 Euryarchaeota As 080- As 130 412.1 Eu As 086-p 01626 112510 81.1 Eu 50.1 Syne 41 rmococci As 022- 013467 03.1 Eu 48.1 Syne The As 095- 5 81.1 Syner P 0 5 1161.1 S 6 ia As 103 527.1 S 84.1 P 523.1 Syner b P 015849 9 ota The P 0040664 P 012572 e 014013 1012 icro ia P 1 ota b 014734 00630 m P P 029166 013747 0141633 P bia P 0 P P 013048 005662 icro P 024821 yarcha P 013905871.1 Eu P r m Group 4h WP 011972581.1 Euryarchaeota WP 009200 Group 4d W d ryarchae micro Group 4d W ota NA o Group 4h WP 012193964.1 Euryarchaeota Methanococci Group 4d W e omicrobia Group 4d W obia Group 4h WP 013180984.1 Euryarchaeota Methanococci Group 4d W ta Methano Group 4d W o Group 4d W 437.1 Eu Group 4d W Group 4d W Group 4d W Group 4d W Group 4d W Group 4d WP 0091641 omicr Group 4d W Group 4 Group 4d WP 072.1 Eu yarcha Group 4d W Group 4d W ota Methano Group 4hGroup WP 011973090.1 4h WP 013867209.1 Euryarchaeota Euryarchaeota Methanococci Methanococci ryarchaeota NA yarchae 004066 P eota Methan 014013 83.1 Eur 4 ryarchae 54.1 Eu Group 4h WP 018154556.1 Euryarchaeota Methanococci 6 32.1 Eur ryarchaeota Methan eota Methan cteria 3 a ia 44.1 Eu Group 4h WP 013799750.1 Euryarchaeota Methanococci 008083 1833 ia Group 4d W 015283 43.1 Eu ryarcha roteob r P 4 10.1 Euryarcha group 4c Group 4d WP P 01 14495 Group 4h WP 010870018.1 Euryarchaeota Methanococci oteobacter Group 4h WP 013100690.1 Euryarchaeota Methanococci 01 507.1 Eu 004038 oteobacte P 0040774 a Gammap r Group 4d WP i a Betapr acteria Group 4d W b Group 4h WP 004592818.1 Euryarchaeota Methanococci Group 4d W 013329 ia Betap Group 4d WP oteobacter aproteo Group 4i WP 011972283.1 Euryarchaeota Methanococci Group 4d W Group 4d WP teobacteri a Group 4i WP 012194251.1 Euryarchaeota Methanococci teobacter Alph r Group 4d WP 80.1 Pr o a Group 4i WP 013180047.1 Euryarchaeota Methanococci 7 ra 52.1 Pro p 01818 1 itrospi 88.1 Pr 2 trospi 020505 teobacteri Group 4i WP 018154032.1 Euryarchaeota Methanococci P As 135- irae Ni Group 4i WP 011973026.1 Euryarchaeota Methanococci group 4g P 028451 69.1 Pro group 4g rmococcimococci Group 4i WP 013866214.1 Euryarchaeota Methanococci 81.1 N Group 4b W The tei 113880 48.1 Nitrosp Group 4b WP 028450 0 P 0 ota ota Ther opro Group 4i WP 007044316.1 Euryarchaeota Methanococci e e m As 176-p 02784 Group 4i WP 013099747.1 Euryarchaeota Methanococci Group 4b W 0288427 Group 4i WP 004590220.1 Euryarchaeota Methanococci P 028845 ota Thermococci P ryarcha yarcha moproteitei ota Ther Ther tei Group 4b W o yarchae Group 4i WP 019262598.1 Euryarchaeota Methanobacteria 22.1 Eu83.1 Eur r archae eota tei Group 4i WP 010870540.1 Euryarchaeota Methanococci Group 4b W n Thermopromopr Group 4i WP 012956799.1 Euryarchaeota Methanobacteria Group 4b W narcha eota Ther rmopro Group 4i WP 016359143.1 Euryarchaeota Methanobacteria P 0134674P 0125719 50.1 Cre ota The Group 4i WP 011407013.1 Euryarchaeota Methanobacteria ota P 014123101.118393 Eu 01 narchae As 003-p 00361 28.1 Crenarcha narchae mococci Group 4bGroup W 4b W Thermococci Group 4i WP 013643876.1 Euryarchaeota Methanobacteria P 014737671.1 Cre07.1 Cre Ther ota rmococci Group 4b W 0131289 e mococci Group 4b WP P The r Group 4i WP 013826680.1 Euryarchaeota Methanobacteria 0147679 The ryarchaeotaryarcha mococci Group 4i WP 004029221.1 Euryarchaeota Methanobacteria Group 4b W Eu r The Group 4b W 373.1 Eu ryarchaeota rmococci group 4h ryarchaeotaeota Group 4i WP 010876862.1 Euryarchaeota Methanobacteria Group 4b WP Eu The Group 4b WP 013561596.1 014012072.1 Cre 34.1 Eu ryarcha eota P 591.1 u yarcha Thermococci 0125712 r mococci 010868 Ther Group 4bGroup WP 0040684b W 84.1 Eu idia P 015857865.1 E eota bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint Euryarchaeota As 002-p 02316 group 4d Group 4b WP 0158576 yarcha Group 4b WP P 846.1 idia (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Group 4b W 31.1 Eur ridia 5 ridia available under aCC-BY-ND 4.0As 116-p International 00058 As 106-p 00094 license. Group 4b W 611.1 Firmicutesmicutes Clostr Clostr ridia As 062-p 01097 P 012572 micutes Clost 423.1 Crenarchaeota 52.1 Fir Group 4b WP 013905 icutes Closts Negativicutes As 135-p 00999 369.1 Fir 422.1 Firmicutes Clost lli As 063-p 01967 Group 4b W Group 4f WP 007780 216.1 Firm As 011-p 00088 acilli P 013623 icutes Baci P 013843 Asgard archaea cluster As 136-p 01227 Group 4f WP 0141837 rmicutes B Group 4a WP 014558 micutes Bacilli Group 4f W ridia Group 4f WP 015756 6298.1 Fi micutes Clostridia Group 4f W idia 02898 96.1 Fir r Group 4f WP 019553945.1P 006323220.1 Firmicute Fir group 4b icutes Clost P 0147953149.1 Firmicutesm Clost Group 4f WP 018133527.1Group 4f WP Firm Group 4f W P 015945 micutes Clostridia As 133-p 02944As 053-p 00934 Group 4f W Group 4f W group 4i Group 4f WP 014902103.1 Fir ativicutes Group 4f WP 009619272.1 Fir As 134-p 02543 ativicutes As 034-p 00456 701.1 Firmicutes Neg Group 4f WPP 004094 027938372.1 Firmicutes Negativicutes Group 4f W 70.1 Firmicutes Negativicutes Group 4f WP 021167282.1Group 4f WPOGENASE Firmicutes 019880045.1 WP Negativicutes 027397032.1 Firmicutes Negativicutes Firmicutes Neg As 126-p 02361 YDR P 0217188 As 091-p 03606 NONH E W As 117-p 00130 As 144-p 01628 As 031-p 00867 As 112-p 03635 As 122-p 00152 group 4f NONHYDROGENAS As 027-p 01766

Group 4a WP 011751858.1 Crenarchaeota Thermoprotei

P 015051661.1 Firmicutes Clostridia Group 4a W As 149-p 01698 As 110-p 03657 Group 4a WP 025773400.1 Firmicutes Clostridia Group 4a WP 013400781.1 Firmicutes Bacilli As 150-p 01260 Group 4a WP 022587809.1 Firmicutes Clostridia Group 4a WP 007473606.1 Proteobacteria Epsilonproteobacteria As 151-p 01362 Group 4a WP 013134545.1 Proteobacteria Epsilonproteobacteria Group 4a WP 024953588.1 Proteobacteria Epsilonproteobacteria As 128-p 01290 Group 4a WP 021091787.1 Proteobacteria Epsilonproteobacteria Group 4a WP 023384105.1 Proteobacteria Epsilonproteobacteria Group 4a WP 024955559.1 Proteobacteria Epsilonproteobacteria Group 4a WP 014770116.1 Proteobacteria Epsilonproteobacteria Group 4a WP 011139635.1 Proteobacteria Epsilonproteobacteria

Group 4a WP 022739022.1 Coriobacteriia Group 4a WP 012802521.1 Actinobacteria Coriobacteriia Group 4a WP 006234109.1 Actinobacteria Coriobacteriia Group 4a WP 006362270.1 Actinobacteria Coriobacteriia

Group 4a WP 011474970.1 Proteobacteria Alphaproteobacteria Group 4a WP 008384313.1 Proteobacteria Alphaproteobacteria Group 4a WP 014249328.1 Proteobacteria Alphaproteobacteria Group 4a WP 029010114.1 Proteobacteria Alphaproteobacteria group 4a Group 4a WP 028536385.1 Proteobacteria Betaproteobacteria Group 4a WP 018609272.1 Proteobacteria Betaproteobacteria Group 4a WP 009113583.1 Proteobacteria Gammaproteobacteria Group 4a WP 013317233.1 Proteobacteria Gammaproteobacteria Group 4a WP 005187619.1 Proteobacteria Gammaproteobacteria Group 4a WP 025797024.1 P Group 4a WP 007370867.1 Proteobacteriaroteobacteria GammaGammaproteobacteria Group 4a WP 001102339.1 Proteobacteria Gammaproteobacteria Group 4e WP 022184688.1 Actinobacteria Coriobacteriia Group 4a WP 006818557.1 Proteobacteria Gammaproteobacteriaproteobacteria Group 4e WP 022379545.1 Actinobacteria Coriobacteriia Group 4a WP 0024362 Group 4a WP 025151932.1 Proteobacteri Group 4e WP 022363091.1 Actinobacteria Coriobacteriia 55.1 Pr Group 4a WP oteobacteria Gammap Group 4e WP 012803120.1 Actinobacteria Coriobacteriia Group 4a WP 021804850.1 0042456 Proteobacteria Gammaproteob 89.1 Proteobacteria Gammaproteobacteriaroteobacteria Group 4e WP 015760641.1 Actinobacteria Coriobacteriia Group 4a WP 0 a Gammap roteob Group 4a WP 110928 acteria Group 4e WP 022739883.1 Actinobacteria Coriobacteriia 31.1 Proteobacteria Gammaproteobacteria Group 4a WP 0178026 002445 Group 4e WP 009138996.1 Actinobacteria Coriobacteriia Group 4a WP 020826 316.1 Pr oteobacteri acteria Group 4a WP 025801 Group 4e WP 012798463.1 Actinobacteria Coriobacteriia 32.1 Pro a Gammaproteob Group 4a W 906.1 teobacteria Gamma Group 4e WP 006361427.1 Actinobacteria Coriobacteriia Pro Group 4a W P 013511773.1577.1 Proteobacteriateobacteria Gammaproteobacteria Gammaproteobacteriaacteria Proteobacteria Gammaproteobacteria Group 4e WP 009142078.1 Actinobacteria Coriobacteriia Group 4a WP P029095 020845 proteobacteria Group 4e WP 006720658.1 Actinobacteria Coriobacteriia Group 4a WP 027275 056.1 Proteobacteria Gammaproteoba Group 4a W Group 4e WP 022349360.1 Actinobacteria Coriobacteriia 340.1 Pro Group 4a WP 005542 P 005764 601.1 Proteobacteriateobacter Gammap Group 4e WP 006235754.1 Actinobacteria Coriobacteriia Group 4a WP Group 4e WP 027399701.1 Firmicutes Clostridia Group 4a WP 017781 546.1 Proteobacter ia Gammap cteria Group 4a WP 005760 188.1 Proteobacteria Gammaproteobroteob 524.1 Pro a Group 4e WP 020449518.1 Euryarchaeota Group 4a WP 025820124.1 Proteobacteria Gammaia Gammaproteobroteob cteria 015466 337.1 Pro Group 4a WP 0074697 teobacter a Group 4e WP 008860581.1 Firmicutes Negativicutes cteria Group 4a WP 004726128.1266.1 Pr Proteobacteriateobacteri Gammaproteob ia Gammaproteobacteriaacteria Group 4e WP 022106764.1 Firmicutes Negativicutes Group 4a WP a Gamma a Group 4e WP 007070913.1 Firmicutes Negativicutes cteria 05.1 Pr proteoba 021019734.1 Pro Group 4e WP 027871860.1 Firmicutes Clostridia o teobacter cteria oteobacter Group 4e WP 027203889.1 Firmicutes Clostridia ia Gammap acteria teobacteria Gamma proteob Group 4e WP 027431522.1 Firmicutes Clostridia ia Gammaproteobacteria roteoba acteria Group 4e WP 026651247.1 Firmicutes Clostridia Group 4e WP 028241974.1 Firmicutes Clostridia cteria proteob Group 4e WP 026835345.1 Firmicutes Clostridia acteria

Group 4e WP 022790212.1 Firmicutes Erysipelotrichia Group 4e WP Group015536221.1 4e WP 007047619.1Firmicutes Erysipelotrichia Firmicutes Clostridia group 4e Group 4e WP 014254276.1 Firmicutes Clostridia Group 4g WP 013560598.1 Chlor Group 4g W Group 4e WP 010251151.1 Firmicutes Clostridia Group 4g W P 011737 Group 4g W P 012415 331.1 Pr As 037 Group 4g WP o"exi Group 4e WP 011937735.1Group Proteobacteria 4e WP 005213155.1 Deltaproteobacteria Firmicutes Clostridia P 297.1 E oteobacteri 0085282 Anae -p 00889 lus Group 4e WP 027716617.1Group Proteobacteria 4e WP 024346172.1 Deltaproteobacteria Firmicutes Clostridia rolineae 015284192.133.1 EEu imicrobia Elua Deltapr Group 4e WP 026892028.1 Firmicutes Clostridia As 135- ryarcha oteobacteri Group 4e WP 012199760.1 Firmicutes Clostridia group 4g simicr As 134-p 00139 u e Group 4e WP 003514863.1 Firmicutes Clostridia p 01313 ryarchaeotaota Methan Halob obia As 024- a Group 4e WP 013625216.1 Firmicutes Clostridia As acteria 118-p 0p 00917 Group 4e WP 019226287.1 Firmicutes Clostridia As As 1 116-p 0 0580 omicr 18-p 02 2491 obia As 106As 133- 013 -p 00181p 02734

A Group 4e WP 012619310.1 Euryarchaeota Methanomicrobia s 101 A Group 4e WP 011991040.1 Euryarchaeota Methanomicrobia s 006- As 169 -p 00691 As 143-p 010 Group 4e WP 004039320.1 Euryarchaeota Methanomicrobia As 172 p 01413 As 173 -p 01856 Group 4e WP 011843203.1 Euryarchaeota Methanomicrobia -p 03633 As 1 11 -p 01569 Group 4e WP 014867437.1 Euryarchaeota Methanomicrobia 14-p 00864 Group 4e WP 013328417.1 Euryarchaeota Methanomicrobia As 179 airetcaboe As 080 Group 4e WP 004077817.1 Euryarchaeota Methanomicrobia As 086-p 00898- As 172-p 01874 p 00390 Group 4e WP 015284201.1 Euryarchaeota Methanomicrobia - As 181 p 00859 t As 169- or As 178 Group 4e WP 011448732.1 Euryarchaeota Methanomicrobia p As 022 atleD airetcaboetorP 1.922585020 PW e4puorG PW 1.922585020 airetcaboetorP atleD -p 03248 As 095- As 129 p 01202 -p 01409 Group 4e WP 012376920.1 Opitutae Group 4e WP 014777057.1 Proteobacteria Gammaproteobacteria As 130-p 00329 -p 02023 -p 01460 p 02348 Group 4e WP 012969568.1 Proteobacteria Gammaproteobacteria Group 4e WP 015282154.1 Proteobacteria Gammaproteobacteria

As 049-p 00393 As 052-p 01560 Group 4e WP 014323623.1 Proteobacteria Deltaproteobacteria As 071 As 157-p 02708 Group 4e WP 013515288.1 Proteobacteria Deltaproteobacteria - As 158 p 01758 Group 4e WP 018125731.1 Proteobacteria Deltaproteobacteria As 160-p 01454

As 012 Group 4e WP 022661459.1 Proteobacteria Deltaproteobacteria As 064 -p 00251 As 021-p 00409 As 156 As 072- Group 4e WP 015336113.1 Proteobacteria Deltaproteobacteria As 161 -p 01476 Group 4e WP 027178908.1 Proteobacteria Deltaproteobacteria -p 00591 As 029 - As 043- p 00963 As 043 p 00743 - p 02052

- p 01314 - p 00187 Group 4e WP 015720170.1Group 4e Proteobacteria WP 020879492.1 Deltaproteobacteria Proteobacteria Deltaproteobacteria p 02777

82300 p-830 sA p-830 82300 Group 4e WP 020887613.1 Proteobacteria Deltaproteobacteria Group 4e WP 029967123.1 Proteobacteria Deltaproteobacteria sA p-590 24010

Group 4e WP 027182881.1 Proteobacteria Deltaproteobacteria Group 4e WP 028572004.1 Proteobacteria Deltaproteobacteria As 075 Group 4e WP 005986921.1 Proteobacteria Deltaproteobacteria puorG e4 PW 1.283671910 atoeahcrayruE atamsalpomrehT As 085 Group 4e WP 021758533.1 Proteobacteria Deltaproteobacteria

-p 00728 As 099 -p 01234 Group 4e WP 010937737.1 Proteobacteria Deltaproteobacteria Group 4e WP 007524167.1 Proteobacteria Deltaproteobacteria As 109 As 001 As 090-p 01015

A A As 090-p 02671

ietorpomrehT atoeahcranerC 1.294921310 PW g4 puorG g4 PW 1.294921310 atoeahcranerC ietorpomrehT A

A As 031-p 02 As 139

s

s Group 4e WP 003868941.1 Firmicutes Clostridia 3 5 2 0 0 p-6 0 0 s A s As 002 Group 4e WP 012995592.1 Firmicutes Clostridia s As 137- - As 141- p 02586 0 As 126- Group 4e WP 014757614.1 Firmicutes Clostridia 31 0 As 107 Group 4e WP 011024634.1 Firmicutes Clostridia As 122 As 104 1 As 091 -p 02628 As 087- -p 00152 9 Group 4e WP 027187946.1 Proteobacteria Deltaproteobacteria 150 As 138-p 02965 8

8 Group 4e WP 026486855.1 Firmicutes Clostridia - - Group 4e WP 028587759.1 Proteobacteria Deltaproteobacteria - p 01638 - Group 4e WP 012625292.1 Proteobacteria Deltaproteobacteria p Group 4e WP 027190463.1 Proteobacteria Deltaproteobacteria -p 02309 Group 4e WP 012749853.1 Proteobacteria Deltaproteobacteria p-1 p

0 p p 00310

0 p 00737 Group 4e WP 015926708.1 Firmicutes Clostridia 00 0 - Group 4e WP 029228227.1 Firmicutes Clostridia - p 03708 - 3 - p 00856 p 02344 Group 4e WP 028573897.1 Proteobacteria Deltaproteobacteria p 01751 Group 4e WP 027631079.1 Firmicutes Clostridia 1 p 00679 1 p 02378 Group 4e WP 028576381.1 Proteobacteria Deltaproteobacteria 3 1

07 4

0 1

9 9 9750

5 7 various Asgard archaea

Group 4e WP 012302199.1 Firmicutes Clostridia

Group 4e WP 015051201.1 Firmicutes Clostridia Group 4e WP 015738579.1 Firmicutes Clostridia Group 4e WP 013174844.1 Firmicutes Clostridia Group 4e WP 018962513.1 Coprothermobacterota Coprothermobacteria

Group 4e WP 010936594.1 Chloro"exi Dehalococcoidia

Group 4e WP 011034248.1 Euryarchaeota Methanomicrobia

Group 4e WP 011305188.1 Euryarchaeota Methanomicrobia

Group 4e WP 013638341.1Aqui !caeAqui !cae

Group 4e WP 013537236.1Aqui !caeAqui !cae

Group 4e WP 012035507.1 Euryarchaeota Methanomicrobia

Group 4e WP 012899275.1 Euryarchaeota Methanomicrobia Group 4e WP 012899230.1 Euryarchaeota Methanomicrobia

Group 4e WP 014405402.1 Euryarchaeota Methanomicrobia

Group 4e WP 012663581.1 Proteobacteria Epsilonproteobacteria

P 013460916.1 Proteobacteria Epsilonproteobacteria Group 4e WP 024789181.1 ProteobacteriaGroup 4e WP Epsilonproteobacteria 007474009.1 Proteobacteria Epsilonproteobacteria

Group 4e WP 008340210.1 Proteobacteria Epsilonproteobacteria

Group 4e W eotaArchaeoglobi

a

rG

puo

Archaeoglobi Group 3c WP 02718

PW c3

ia 510 r ia r

32

3

Group 3c WP

.617 oteobacte ria Group 3c WP 010878869.1 Euryarch oteobacte 7 r

Tree scale: 0.1 949.1 P

ruE 1

modesulfobacteria ria Deltapr 027191 a i r roteobacte

Ther ia Deltap Aqui!cae roteobacte r ia ria

993.1 Proteobacteria Deltapr

ia r ato e a h cra y

Group 3c WP 026167551.1 Proteobacte oteobacte

Group 3c WP 020882283.1 Proteobacteria Deltaproteobacteria ia Deltap r oteobacte ria Deltaproteobacteria 980 laH P r o

roteobacte obia Group 3c WP 012965352.1 Euryarchaeota Group 3c WP 027368703.1 Proteobacteria Deltaproteobacteria 4 r

4

72 2

889

airetcaboetorpatleD ai 3 airetcaboetorpatleD

0 0 ia Thermodesulfobacteria ria 3 20 20 bi 5331.1 79 p-81 129.1 P 20

6

98300 p-110 sA p-110 98300 o 3 1 9 Group 3c WP 0 modesulfobacteria nomic 7

airetcab roteobacteria

0 p-811 sA p-811 0 1 p-4

000 roteobacter 59 p

10 p-360 sA p-360 10

-

31 sA 1

430

5 Ther 01 G 31 Group 3c WP 00887 eota NA s Group 3c WP 0 P spirae Nitrospira lfobacter 01209 a r p Group 3c WP 027371 A

- o A h

sA sA roteobacte s ria Deltaproteobacte

63 Group 3c WP u 901.1 P rc

A

s a Group 3c WP 028318776.1 Proteobacteria Deltaproteobacte 908.1 6940 tap r 601

1 s 1

600 p- 3 p o

oteobacteria Deltap c 811

oteobacteria 32

Group 3c WP 11698648.1 Proteobacteria Deltaproteobacteria A up 3c WP W Proteobacteria Deltaproteobacter p- aeotaArchaeogl us K 0 ria Be micutes Clostridia t 029898 Group 3c W

0 r Group 3c WP 013637988.1Aqui !cae p Gro

P 0 -260 sA -260 r Group 3c WP 028580885.1 Proteobacte ida

Group 3c WP 02340 117007 7 etcaboetorP 04420 p- oteobacteria Deltap 551.1 Pr d 015724 820 3464.1

A 79 008869690.1 Proteobacteria Deltaproteobacte 5

0928.1 Proteobacteria Deltaproteobacteria P 8 012545782.1 Nitro

8379.1 Thermodesu 4 0 7 5 0 0 p-3 3 1 s A 553.1 Proteobacte roteobacte 5 3 7 Group 3c W 0 047.1 P 2.1 Can r moplasmata 8.1 Proteobacteria Deltaproteo ia Deltaproteobacteria 350 s 435.1 P 11188256.1 Proteob Group 3c WP 19 P 02718 76 - oteobacteria 9 Group 3c W Group 3c WP 015903 Group 3c WP 028840 0675.1 Euryarch r

Ther bi 381.1 P P1. 1.386 o Group 3c WP 01 r 5 P 02831 r oteobacte Group 3c W Thermoplasmata r o oteobacteria Deltap 8600 p eota ogl 01230 t Group 3c WP pitutae 0 a P P 2277 8290.1 Pr ae

25751 O 02266 stridia h W Group 3c WP 011700332.1 Proteo ria Deltap ichae As 037-p 02044 Group 3c WP 028574 Group 3c W r aboe P 0 P te c 013179 yarch c Arc obia t r Group 3c WP 01390 Group 3c WP e ria Deltap Group 3c WP 013756665.1 Fi c ba Group 3c W 8 r 0 Group 3c WP 028580860.1 Proteobacteria Deltaproteobacteria yarchaeota i Group 3c WP 022532976.1 Euryarchaeota Metha o Caldit 938.1 Proteobacteria Deltaproteobacteria a eota oteobacte acteria D puorG c3 PW Group 3c WP 029966039.1 Pr a Group 3c 117076 D Group 3c WP Group 3c WP 01559 ota rote 971.1 Eu 075.1 P e ucomi Group 3c W As 072-p 03930 p 03413 e r l As 087-p 01315 Group 3c W Group 3c WP t Group 3c W a yarch er r p r V oteobacter 7 roteobacte r 1 Group 3c WP 8391.1 Eur 016359076.1 Euryarch Group 3c W o eltaproteobacteria Group 3c W .1 Eurya r 885.1 P itricha roteobacter ia Deltaproteobacte t ria Deltapr bacteria As 139- d ryarchaeota Methan 4

1 sA P 019177457.1 Eur 6486. 01295 P Group 3c W 02044 As 139-p 02380 micutes Clostridia 0 p 02321 4228.1 Eu P r 0 r - Group 3c WP P 02889 chaeota 1 etcaboe 114059 01 Group 3c W ia P 1699072.1 Proteo 0 p-40 ria 015739967.1 Firmicutes Clo 005.1 Cal Group 3c W r 01237 P 01093 ria i 0 0 Group 3c W 6 P a oteobacter micutes Clostridia P 1972821.1 Eury r 704.1 Eu 013218 ia Deltaproteobacte 3 s 002 Group 3c W 018154 01368 4 W Group 3c WP 01364 A P

7 Group 3c W p 05004 p 3c W Group 3c W 2 bacteria D M Group 3c WP 6.1 Eurya P u 00693 019264 Group 3c W 1 4 5 0 0 p-7 0 1 s A p 00605 ethanococci Group 3c W - 3c W P 6 p 02526 - 178.1 Chl 262.1 Euryarch 390.1 Chlo Gro r 01573 P r p 00426 p 01049 P yarchaeota Methanaeota Methanobacter ococci ia - - 013799334.1 Euryarch -p 02889 P 0 ia Group 3c WP As 105- Group 3c 013826546.1 Eur p 00736 P - 118438 Group 01486 891.1 Eu As 141 Group 3c WP Group 3c W rchaeota eltaproteobacteria As 087 a As 075-p 01598 P 2794.1 Eu bacteria D rchaeota As 085-p 01362 As 003 As 107 00403 As 099 various Asgard archaea o 00403 P 012979 Group 3c W ro"exi Dehalococc As 087 ro p 01061 7 51.1 Eurya Group 3c W Group 3c W ria 732.1 Euryarcha 3973.1 Eu Group 3c WP 0 " ia r exi Dehalococc 9570.1 Eu yarch Group 3c W M aeota Metha 1 Group 3c WP 012302733.1 Fi ethanoba ryarch cteria P 31 M eltaproteobacteria cter a 012901 013330 819.1 Eur As 027- ethanococci 1.1 Eurya yarch aeota Methan obacteria Group 3c WP 015617925.1 Fi oba rchaeota ryarch i a P P 01329 a W ryarcha eota Methanococci acteria roteob 01440 aeota Metha aeota Metha rote P b acteria p P 01203 238.1 Eur 634.1 Eu cteria 0138663 p Group 3c W b 110186 yarcha n eota Metha rchaeota M a ococci eota Metha oidia o 4 eota Metha M 6307.1 Eur idia 721.1 Euryarcha Actino ethanomicrob 6601.1 Eur Actino r obacter yarch yarch 3 eota Methan a Gamma a Gamma i 8.1 Eury 62.1 i nobacteria n cteria ococci nomic P ethanob a ter a a 01341 NA acteria c eota Methan eota Metha n a yarch ia b n obacteri yarch omic inob archaeota eoba teobacter r ct t acteria ob o i 3 a A ctino r b eota Metha a a ococci robia 806.1 Eu eota Metha P acteria Group 3c W a ia cteria A Pro eota Metha b a teobacteri .1 91.1 0.1 eo ia n o 4 68.1 r 9 proteo omic omicr 31 3 Pr a te Group 3c W Methanopyri 5 ryarch prot ac ia a acteria lph ia n r h b obia obia 17812 A ia r omic nobacteri 1 acteria P 014208 a e n i eo acteria teria b t 013644 6268.1 0 omic 0278565 oteob c c aeota Methan P P 00550 r a a WP P ia Alp rot robia W p b Group 3c W 151 p eria o P a robia 1 cteria ct e oteobacter 004032 proteo a t r 0 teobacter a o p 241.1 Eu a P 3d W proteob ba r a o cteria teobacter Alph proteob r ob p eta up o eo a ia Gamma Group 3d o teobacter lph a proteobacteria B Group 3c W rote rot m 005.1 Eu Group 3d Group 3d W A Gr 1 Pr p m obacteria a P r i ia Gamm a a ia Group 3c W yarch r teria roteoba 01382 63. c a G p 9 a Gamma Gammap ia 4 i a Gamma i acteria Group 3d W 38.1 Pro acter r acteria Group 3c W 7 teobacter a r a b i e ob eoba yarcha teobacteri Gammap t t 6 eota Metha teri a Gamm e o ter ia i c P 227.1 Eur Pro r a As 091 01290 Pr teobacter oteo e bo rot o oteobacte ia Gamma roteob P P 02017 1 r p p 01440 P teobacter eota Metha 012591 31.1 teobacter te o W o teobac P 8 03. teobac o 29.1 Pro 1 o r 9 -p 01314 P 1 o 01203 516.1 Eu 41.1 Pr Pr teobact P Betapr yarch 1 Pr a As 1 n . 1 o oteobacter 1 acteria 7 obacter 30 3 1848.1 . 035.1 Eu 1394 a Gamma teobacter 06931 2 1 Pr 01 i a Gamma 4 3.1 Pr r o i ia Zetapr 0 5 18-p 01 a 01 296. 8 e 028451 cteria 6509.1 Eu Group 3d P 7 6 t a eota Metha n P 9 P ryarch 00819 4 75.1 Pr ac 1 Pr obacteri 023 642.1 7 6 proteob Group 3d W P ia P 7 teobacteri r 050 1 yarch W 337. o 389 020410 1 teob teobacter roteob 029133 0 teobacter a 02 o o p eria 07191 o r eota Metha P P Pr r ct yarch a WP 0 roup 3d W P a a P P 028864 a Gamma eota Methanomic n Group 3d W W G 14.1 Pr i obacteri Group 3d W 026596 1 cteria d 84.1 P cteria 3 71.1 Pr a a As 169 a Group 3d Group 3d W 4 7 435.1 eota Methan 3d W p a Gamma roteob 59 i As 022 p u p Group 3d WP o As u rG nomic Group 3d Group 3d W teobacter roteob roteob a P 013966 p cteria ria Group 3c K -p 02083 1 o p a e Gro Group 3d W 0109 020393 009849 14-p 00 P ct - p 00078 Group 3d W a robia teobacter ba cteria o ia Gamma o a eri ria 95.1 Pr roteob cteria t omicrobia 3d WP 3d WP 9 a robia p ote ac 674 ia Gamma a Gamma K up Group 3d W up K40 As 170 81.1 Pr roteob eob 3 cter p As 086-p 02212 Gro Group 3d WGro roteob rot 6 As 179 teobacter p ba o Gamma p roteobacte 81.1 Cand a p As 101 eo Pr i a - teobacteri p 0 ot o a Gammapr 020565 Group 3c W -p 00531 Pr eri mm 2317 P ia Gamma As 172 As 168 095.1 a Gamma Ga -p i ia Gamma idatus Lokia 76.1 r a 01887 3d W 80.1 Pr teobacter Group 3d WP 4020161 1 o eri 7159 r teobact -p 00 -p 01774 P o ia oup 02 a Group 3c W P 01246 5370 P Pr teobacter i teobacter Group 3c W Gr 41.1 .1 teobacte 538 00 020158 6 ro o teobact P P 9 o r 0 P 9 chaeota NA As 105 2 Pr ia 6 58.1 Pro teobacter r As 068 11.1 Proteo 7 o teobacter As 004 3d W 24299 69.1 65.1 Pr ia o P 0 9 8 61.1 r ia 02735 5 r As 06 P - Group 3d W P 16 00451 p 04491 027148 8 etapr -p 03211 P B ia - Group 3d W Group 019865 a oteobacte r Betapr 4 p 04576 P r ria -p 02031 8538.1 Pr 3d W 013 017839 ia bacteria D P 026601 teri r 2544.1 P As 090 oteobacte As 093 P r ia roteobacte r bacte As Group p 3d W a Betap teobacte bacteria o ia As 098 i o obacte e etap r oteobacte 1 Group 3d W teobac ro - r As 102- 12-p 0 -p 01439 ou te rot B p 03438 oteobacte e ia Betap o ote a ltaprote Gr Pro i Group 3d WGroup 3d WP oteobacte ia a -p 00186 Group 3d W As As 089 0898 94.1 a Betap etapr er p 02448 bacter 9 teobacter ri 48.1 Pr ia Betap oteobacte ria Deltap o e 1 ria o 112-p 01 80 ia B etapr r bacteria teo B ia Deltapr o a oteobacter W -p 00722 r 1765 r eobact 113 obact 1 cter P P 0 06.1 Pr ia Betapr ot roteobacteri 012532 As 091 1 P 4 te r p r As 126 P teobacter teobacte A oteobacte 564 o 21.1 s 093- 22. ro 5 ro oteobacte 6 oteoba P 2 Betapr - r teobacteri a p 00152 05 P o 3 - 004174 42.1 Pr p 027 12.1 p 02983 P Pr 93.1 841 eri a Gamma 7 oteobacte a Betap i As 122 44.1 .1 01 i ria 0092 Group 3d W 29 6 P ter ia 26 P Group 3d WP 0 36 obact r 3d W 2 ac bacteria ia 17.1 Pr te d W 0130 92 bacter o teo -p 03157 P 0227718 3d W r cteria As 109-p 01894 3 roup p teob a o p W u teo o pr G P 015436 14661 ro a ou d 0189 1 P 87.1 P h oteobacter As 105 3 P Gro 0 6 Gr P lp 93 35.1 Pr proteob A 1 ia 14 Group Group 3d WP 1 pha ia Betapr -p 04203 0 Al Group 3d Woup 3d W 346489.1 P 23490 r 0 ia cter G a ia 012 P teobacter ria Group 3d WP o d W acter acter 3 roteob bacte up teob teob ia Group 3d W ro 05.1 Pr 1 P r ro P 2 oteo As 098- G 01. r 7 ia Group 3d W etapro r As 048 07.1 ria 008384 ia B As 050 p 04914 P ia Betap roteobacte ria W 028994 eria As 047- -p 00197 d P tap 3 roteobacte teobacte - P 0086169 Be o As 178 p 004 teobacter ia tap r teobacter r e p 01223 o B oteobact roteobacte Group Pro r a r ia 48 P acte i p r -p 04000 Group 3d W ia Betap te As 181 42.1 eta 0 71.1 B ia Betap Group 3d W 5 roteob ia teobacter oteobac - o r p 01329 teobacter -p 00285 014238 ro As 057-p 00855 P 015765 3541.1 P teobacter etap Lokiarchaeota teobacter o 9 o r a B s 023 097.1 Pr P i A 7 59.1 P As 056 0132 78 7.1 P 668 45 As 069- 153.1 Pr Group 3d W teobacter -p 02668 Group 3d WP P 02 o P 02745 r As 055 p 01656 W P 002937 1 P 3d W P 004322 As 057 Group 3d W 763. - p 3d p 02462 Group ou 52 - Gr oup 3d W As 019 p 03929 roup 3d W Gr 0042 G P -p 02026 As 125 Group 3d W -p 0 A 1 As 007 s 008 155 -p 01622 -p 02515 As 016-

p 00138 As 065 e As 019 -p 01584 rolinea

-p 02307 Anae "exi ro As 083-

391.1 Chlo As 058 p 03351 rgistia 13559 mi Dictyoglomia e - o p 00402 P 0 As 165 p 3d W ergistia gistetes Syn a stia -p 01307 rou 75.1 Dictyogl r G 3 ia ergi As 165 nergisti st yn 1.1 Syne rgi 012548 28 rgistetes Syn ne tes S -p 02389 P te As 162 200 yne 9 S tes Sy rgis 0 rgistetes Sy e rgistia - P 0 93.1 yn As 029 p 02317 4 yne rgiste S cteria Group 3d W S rgistia 42.1 e -p 00896 45.1 8 yn As 021 7 S 014807 rgistetes Syne Group 3d W 455.1 Syne ota Haloba yne e cteria -p 00359 P 005661 S istetes a a As 078 P 014163 rg ne yarcha eri 029165 933.1 r act -p 02217 Group 3d WP Sy proteob eob As 020- up 3d W 67.1 t cteria o Group 3d W 6 412.1 Eu ro a Gr P 013048 p a Gamma ma p 02253 Group 3d WP i 024821 proteob acteria As 155- P P 008526 W roteob cteria es p 01988 Group 3d W oteobacter acteria Gam Gamma p a As 013 Pr ia roup 3d ria ibacter G Group 3d W oteob mma cteria r 973.1 proteob -p 02 ia Ga ba As 160 158 oteobacter er res Defer As 068- acteria roteobacte P 009150 .1 Pr obact a Gamma obacteria -p 00727 6 i ibacte As 058-p 03852 06 ote oteob r p 02547 3d W 015281226.1 Pr pr ia Deltap ia Gammaproteo onprote 62.1 Pr teobacter il Group 012971 1 o 71.1 Defer Eps P Gamma 6 ia 0504 ia oteobacter Group 3d WP 645.1 Pr oteobacter er 02 013451 As 093- As 067 WP 42.1 Pr P teobacter Group 3d W 59.1 Pr teobact 6 o 014777 o 3d W r p 01707 -p 00631 P 013163 oup 26.1 P ia As Group 3d 13.1 Pr P Gr 2 r 092 3d W P 0071918 6 -p 0053 Group P 021287 P 007042 roteobacte ia 8 W Group 3d W r As Group 3d W up 3d W 049-p 03013 o ia Deltap As 164 Gr ia Group 3d ae roteobacte er -p 00800 ine rol oteobacter ia e Pr ia Deltap r roteobact As 163 Ana ap exi 309.1 - " oteobacte a Delt p 02 ro oteobacter r i ia 374 Pr acteriia r P 028317 58.1 a Deltap bacte 77.1 Chlo 7 i teobacter teo 6 a Acidob ro bacter As 009 015724 o 78.1 Pro 013559 Group 3d W P bacteri te 6 ia P o o ia Deltap -p 02443 cid .1 Pr 28582 As 012 A 38 0 cter As 037 up 3d W 2 P teobacter Gro 721.1 teoba -p 01779 -p 02082 Group 3d W Pro 11686 P 014808 Deltapro P 0 Group 3d W 846.1 a 85 roup 3d W obacteri As 072 oup 3d W G P 0285 Gr Prote -p 01614 3.1 ria 45 cte Group 3d W 189 11 eoba As 038 P 0 ot a ria -p 00713 teri ia Deltapr As 071 teobac Group 3d W o roteobacte -p 01625 e ria roteobacter Deltapr ilinea P a a Deltap Cald 64.1 roteobacte exi teobacteri teobacteri ro" 208762 o P 0 ia Deltap As 155 561.1 Pr 769.1 Pro ia 13.1 Chlo -p 01678 1 up 3d W 11699 oteobacter 34 Gro P 0 P 004513 oteobacter As 013 44 r P 01 3d W 946.1 Pr - a Deltap p 00597 roup i netes up 3d W G Group 3d WP 014551 e As 021 Gro oteobacter hrysiog - netes p 00022 oge Group 3d W 833.1 Pr rysi As 007- iogenetes C h 12470 p 01615 P 0 Chrys 897.1 iogenetes C As 093 Group 3d W -p 00608 608.1 Chrys As 065 P 013506 Group 3d WP 027388 ia -p 02861 "ex ia loro "ex As 02 Group 3d W loro exi Ch h 8-p 01 ro" exi C As 028 ro" 009 ia -p 0 473.1 Chlo "ex 1 06.1 Chlo 114 412 loro P 006562 exi Ch o"exia As 029- P 0159 ro" r p 3d W Chlo A p 01521 "exi Chlo s 157 Grou Group 3d W 902.1 -p 02084 028457 3.1 Chloro P 1989 As 071 1 acteria NA - b As 072 p 02924 Group 3d W eria NA act -p 03276 Group 3d WP 012117.1 Cyano b group 3c a NA i P 006512 20.1 Cyano As 0 bacter acteria NA 37 -p 00296 P 0065129 anob ia NA Group 3d W d W 98.1 Cyano .1 Cy acter p 3 6 543 b ria NA bacte Grou P 012167 As 038 P 017324 803.1 Cyano p 3d W 3d W 37.1 Cyano - 024545 8 p 00989 Grou Group P P 012305 As roup 3d W 3d W 068- G p 0 Group 0674 As 064-p 02310 bacteria NA 85.1 Cyano bacteria NA As 067 7 a NA - P 009756 bacteri p 02304 5766.1 Cyano ano 619 97.1 Cy As 164 Group 3d W P 00 8 acteria NA As 15 -p 00543 P 015153 b 2 Group 3d W -p 01686 p 3d W 26.1 Cyano As 163 Grou 9 ria NA As 144 -p 002 016874 - P p 01782 77 3d W .1 Cyanobacte oup 5 Gr 11318 015 bacteria NA As 045- ia NA 24.1 Cyano bacter p 02964 Group 3d WP 13213 P 01 1 Cyano 057. bacteria NA Group 3d W P 015138 NA As 407.1 Cyano 115- obacteria p 00036 Group 3d W P 016952 A 9811.1 Cyan a N Group 3d W bacteri As 094 P 01507 -p 0 up 3d W 232.1 Cyano 1207 Gro 5216 bacteria NA P 01 As 044- 675.1 Cyano p 00301 Group 3d W P 006276 Group 3d W ia NA As 044 bacter -p 00697 As 127 892.1 Cyano -p 0233 P 029552 bacteria NA 3 Cyano Group 3d W 040.1 NA As 127 ria WP 006170 -p 01966 p 3d 00.1 Cyanobacte Grou 1109 As 154- P 015 bacteria NA p 0 p 3d W 1139 Grou 395.1 Cyano A P 006910 N group 3d obacteria roup 3d W G 516.1 Cyan As 036- 11378 p 00882 0 Group 3d WP bacteria NA As 036- 09.1 Cyano 6294 p 00083 P 012 bacteria NA 978.1 Cyano Group 3d W 015171 As 088 P bacteria NA -p 02393 Group 3d W 972.1 Cyano 015182 P bacteria NA As 039 Group 3d W -p 0 034.1 Cyano 2722 P 017305 bacteria NA Group 3d W 67.1 Cyano As 023 As 040- 0151442 -p 00284 p 00781 3d WP acteria NA Group b 061.1 Cyano d WP 019509 bacteria NA Group 3 425.1 Cyano 019498 bacteria NA Cyano As Group 3d WP 694.1 054-p P 008273 01255 3d W Group bacteria NA As 132 478.1 Cyano - 012593 p 03976 P teria NA Group 3d W anobac 15.1 Cy As 077- WP 0159561 bacteria NA p 01641 Group 3d 514321.1 Cyano P 016 ria NA As 176 Group 3d W bacte -p 00301 33.1 Cyano P 0289474 Group 3d W As 004 bacteria NA -p 04217 190.1 Cyano 006669 up 3d WP bacteria NA Gro 29.1 Cyano As 175 2 -p 01075 WP 015221 Group 3d bacteria NA 94725.1 Cyano 3d WP 0172 As 015-p Group 03118 acteria NA 067990.1 Cyanob d WP 023 As 174 Group 3 bacteria NA -p 00972 29.1 Cyano P 0152270 Group 3d W bacteria NA 07.1 Cyano As 119-p 0 P 0176593 0225 Group 3d W bacteria NA P 006101686.1 Cyano As 111-p 01 Group 3d W 764 acteria NA 79.1 Cyanob Group 3d WP 0151519 As 124-p 0 bacteria NA 2519 017715405.1 Cyano Group 3d WP As 142-p 01660

As 137-p 00534

As 046-p 02481

As 100-p 00502

As 178-p 04240

As 073-p 00518

As 076-p 01403

As 076-p 00959

As 073-p 01465

As 023-p 03063

As 041-p 01275

As 025-p 01167

As 145-p 01590

As 151-p 01275 ri archaeota Methanopy 11018550.1 Eury As 150-p 01868 Group 3c WP 0

aeota Methanococci As 128-p 03923 9800.1 Euryarch Group 3c WP 01309 a Methanococci 824.1 Euryarchaeot Group 3c WP 004589 nococci 0819.1 Euryarchaeota Metha Group 3c WP 01579 ryarchaeota Methanococci Group 3c WP 015733560.1 Eu

Euryarchaeota Methanococci Group 3c WP 013799074.1 aeota Methanococci Group 3c WP 013866983.1 Euryarch nococci Group 3c WP 018154225.1 Euryarchaeota Metha

Group 3c WP 011973924.1 Euryarchaeota Methanococci

Group 3c WP 011171638.1 Euryarchaeota Methanococci

Group 3c WP 012065828.1 Euryarchaeota Methanococci group 3c Group 3c WP 013180651.1 Euryarchaeota Methanococci

Group 3a WP 012107592.1 Euryarchaeota Methanomicrobia

Group 3a WP 011307257.1 Euryarchaeota Methanomicrobia Group 3a WP 011021012.1 Euryarchaeota Methanomicrobia Group 3a WP 011305486.1 Euryarchaeota Methanomicrobia Group 3a WP 011034943.1 Euryarchaeota Methanomicrobia

WP 015286660.1 Group 3a WP 012616797.1 Euryarcha eota Methanomicrobia Group 3a WP 004076756.1 Euryarchaeota Methanomicrobia Group 3a WP 011449294.1 Euryarchaeota Methanomicrobia Group 3a WP 011832394.1 Euryarchaeota Methanomicrobia Group 3a WP 014867 277.1 Euryarchaeota Methanomicr Group 3a WP 004037261.1 Euryarchaeota Methanomicrobiaobia

Group 3a WP 013414 105.1 Euryarchaeota Methanobacteria Group 3a WP 010876 group 3a 915.1 Euryarchaeota Methanobacteria Group 3a WP 004029354.1 Euryarcha Group 3a WP eota Methanobacteria 013643821.1 Group 3a W Euryarchaeota Methanobacteria P 0138267 Group 3a WP 012956 70.1 Euryarchaeota Methan obacteria 862.1 Euryarchaeota Methanobacteria

Group 3a WP 011406878.1 Euryarchaeota Methano Group 3a WP 019262 Group 3a W 624.1 Euryarcha P 0163 eota Methano bacteria 59193.1 E u bacteria Group 3a W ryarchaeota Metha Group 3a WP 01290 P 012035102.1 Euryarchaeota Methanomicr nobacter ia 1 155.1 Euryarchaeota Methanomic obia Group 3a WP 012035 robia 310.1 Euryarchaeota Methano Group 3a WP 0144055 microbia Group 3a WP 94.1 Euryarchaeota Methanomicrobia 012901198.1 E Group 3a W uryarch P 007044 aeota Metha 698.1 Eur nomic Group 3a W yarcha robia P 013866 eota Methanococci Group 3a WP 359.1 Eur yarcha 018154 eota Methanococci 259.1 Eu Group 3a W ryarcha P 01 eota Methano Group 3a WP1971891.1 Eu ryarchae cocci 011170764.1 Euryarchaeota Methanoc Group 3a WP occi 013179 ota Methanoco 974.1 Euryarcha cci eota Methan ococci bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

available under aCC-BY-ND 4.0 International license. Group 3a W P 022846160.1 Group 3a W Group 3a WP Aqui!c Group 3a W P 0135378 0136380 ae Aqui! Group 3 69.1 cae P 0 Aqui 61.1 Aqui! a WP 11019 !c Group 3 013099 2 ae A cae 99.1 Eu qui!c Aqui! Group 3a W a 931.1 Eu ae ca WP 004591 ryarchaeota Methanopyri e ryarchae P 015791 340.1 E ota Methanoc 7 uryarchaeota Methanococci Group 3a W87.1 Eu ryarcha occi Group 3a WP P e 011973 ota Methan Group 3a WP 0138670 755.1 ococci Euryarchae Group 3a W 007043 54.1 Eur o 840.1 Eu yarcha ta Methanoc As 012 P 0181546 eota Methano Group 3a WP -p 02210 ryarchaeota o 55.1 Eur cci cocci 014012057.1 Eur Group 3a Wyarcha Methan Group 3a W eota Methanococci yarchaeota Group 3a WP 013180376.1 GroupP 012572 3a W Ther Group 3a WP 0 ococci 521.1 E mococci P 011171326.1 E P 013905ur Eu yarcha ryarcha Group 3a WP833.1 Euryarchaeota 1 Group 3a W The 1972 eota Methano rmococci 500.1 Eu urya 015857 rchae P 014788 693.1 Eureota ryarchaeotao Methanococci Thermococci Group 3b WP 024444559.1 Actinobacteria Actinobacteria ta Methanococci 989.1 Eu yarcha ryarcha Group 3b WP 014814971.1 Actinobacteria Actinobacteria eota cocci eota Thermococci Thermococci Group 3b WP 003895383.1 Actinobacteria Actinobacteria Group 3b WP 003924002.1 Actinobacteria Actinobacteria

Group 3b WP 003918688.1 Actinobacteria Actinobacteria As 170- p 00319 Group 3b WP 007168195.1 Actinobacteria Actinobacteria As 168 Group 3b WP 020730473.1 Actinobacteria Actinobacteria -p 00720 Group 3b WP 023365427.1 Actinobacteria Actinobacteria As 172 -p 01717 Group 3b WP 005514364.1 Actinobacteria Actinobacteria As 167 Group 3b WP 006331106.1 Actinobacteria Actinobacteria -p 00495 As 172-p 00628 Group 3b WP 003938391.1 Actinobacteria Actinobacteria As 171 Group 3b WP 007298856.1 Actinobacteria Actinobacteria -p 04703 Group 3b WP 021333622.1 Actinobacteria Actinobacteria As 080- Group 3b WP 018160400.1 Actinobacteria Actinobacteria p 00137 Group 3b WP 025357598.1 Actinobacteria Actinobacteria As 022-p 01446 retcabai Group 3b WP 007033070.1 Actinobacteria Actinobacteria aireA itcon cAt 1.3ni aboct As 169 10 231310 As 095 puorGb3 PW -p 00152 -p 00463 As 103 Group 3b WP 012891383.1 Actinobacteria Actinobacteria As -p 00526 Group 3b WP 026214329.1 Actinobacteria Actinobacteria 114-p 0 sA 2177 17 1- As 173-p 01352p 0 Group 3b WP 015662841.1 Actinobacteria Actinobacteria 147 As 080-p 00428 1 Group 3b WP 020123404.1 Actinobacteria Actinobacteria Group 3b WP 010360429.1 Actinobacteria Actinobacteria As 177 - Group 3b WP 012998941.1 Actinobacteria Actinobacteria As 086 p 00194 Group 3b WP 004946981.1 Actinobacteria Actinobacteria -p 00403 Group 3b WP 004936990.1 Actinobacteria Actinobacteria Group 3b WP 018384886.1 Actinobacteria Actinobacteria Group 3b WP 003980535.1 Actinobacteria Actinobacteria

Group 3b WP 023544405.1 Actinobacteria Actinobacteria Group 3b WP 014677317.1 Actinobacteria Actinobacteria

Group 3b WP 020874487.1 Actinobacteria Actinobacteria

Group 3b WP 013018709.1 Actinobacteria Actinobacteria Group 3b WP 026405334.1 Actinobacteria Actinobacteria Group 3b WP 020524726.1 Actinobacteria Actinobacteria

Group 3b WP 024287469.1 Actinobacteria Actinobacteria

Group 3b WP 027140596.1 Actinobacteria Actinobacteria

Group 3b WP 026311168.1Group 3b WP Actinobacteria 011438850.1 Actinobacteria Actinobacteria Actinobacteria Group 3b WP 011719855.1 Actinobacteria Actinobacteria

Group 3b WP 003611589.1 Proteobacteria Alphaproteobacteria Group 3b WP 018265836.1 Proteobacteria Alphaproteobacteria Group 3b WP 018407885.1 Proteobacteria Alphaproteobacteria Group 3b WP 016920195.1 Proteobacteria Alphaproteobacteria

Group 3b WP 008939163.1 Proteobacteria Gammaproteobacteria

Group 3b WP 012136580.1 Proteobacteria Gammaproteobacteria Group 3b WP 005000139.1 Proteobacteria Gammaproteobacteria

errucomicrobiae Group 3b WP 028214587.1Group 3b WP Proteobacteria 017774733.1 Betaproteobacteria Proteobacteria Betaproteobacteria Group 3b WP 012383518.1 Proteobacteria Alphaproteobacteria Group 3b WP 012699124.1 Proteobacteria Gammaproteobacteria Group 3b WP 024807859.1 Verrucomicrobia NA Group 3b WP 018290138.1 Verrucomicrobia V

Group 3b WP 009021278.1 Proteobacteria Gammaproteobacteria

Group 3b WP 012237207.1 Proteobacteria Deltaproteobacteria

Group 3b WP 029008300.1 Proteobacteria Alphaproteobacteria Group 3b WP 009540223.1 Proteobacteria Alphaproteobacteria Group 3b WP 014068049.1 NA

Group 3b WP 028322239.1 Proteobacteria Deltaproteobacteria Group 3b WP 024079550.1 Proteobacteria Alphaproteobacteria Group 3b WP 008887819.1 Proteobacteria Alphaproteobacteria Group 3b WP 014433201.1 Chloro"exi Caldilineae

Group 3b CUU06124.1 Candidatus Kryptonia NA

Group 3b WP 005006913.1 Nitrospinae Nitrospinia Group 3b WP 012637708.1 Proteobacteria Gammaproteobacteria Group 3b WP 007219971.1 Candidatus Brocadiae Group 3b WP 017290644.1 NA Group 3b WP 027877222.1 Deinococcus-Thermus Deinococci Group 3b GroupWP 009061764.1 3b WP 012464725.1 Verrucomicrobia Verrucomicrobia Methylacidiphilae Methylacidiphilae Group 3b WP 027844918.1 Cyanobacteria NA Group 3b WP 012628086.1 Cyanobacteria NA

Group 3b WP 008098529.1 Verrucomicrobia Verrucomicrobiae

Group 3b WP 020864100.1 Thaumarchaeota NA Group 3b WP 013706649.1 Proteobacteria Deltaproteobacteria

Group 3b WP 020412786.1 Proteobacteria Gammaproteobacteria As 181 As 178 -p 00705 -p 03266

As 169 As 048 As 005 -p 00794 -p 02929 As 171 -p 01601 As 173 - As 047 p 04121 -p 01036 -p 00315 As 080- As 059-p 03203 As 177 A A As 086 p 02085s 179 As 014- s 046 - As 022-p 02179 As 050- p 01484 -p 00632 A - s 142-p 01058 -p 01940 p 00810 p 01481 Group 3b WP 027854910.1 Proteobacteria Gammaproteobacteria p 02570 Group 3b WP 015258948.1 Proteobacteria Gammaproteobacteria As 144 Group 3b WP 006786889.1 Proteobacteria Gammaproteobacteria A A -p 02097 Group 3b WP 025280829.1 Proteobacteria Gammaproteobacteria s 108 s 123 Group 3b WP 010321726.1 Proteobacteria Gammaproteobacteria A - Group 3b WP 007021942.1 ProteobacteriaGroup 3b WP Gammaproteobacteria028300037.1 Proteobacteria Gammaproteobacteria As 176 s 040 -p 00728p 03252 A -p 02732 s 123 -p 00980 As 176 As 156 - p 03253 As 144 -p 01334 -p 01310 -p 00932 As 076 As 100- -p 0 p 03196 Group 3b WP 014029204.1 Proteobacteria Acidithiobacillia group 3b A s 036 1 112 Group 3b WP 026610481.1 Proteobacteria Gammaproteobacteria -p 01802 Group 3bGroup WP 018508430.1 3b WP 011311775.1 Proteobacteria Proteobacteria Betaproteobacteria Betaproteobacteria Group 3b WP 020680743.1 Proteobacteria Gammaproteobacteria As 088- Group 3b WP 020653150.1 Proteobacteria Betaproteobacteria As 081 As p 02634 1 -p 00318 As 148 Group 3b WP 027709971.1Group 3bProteobacteria WP 018608874.1 Gammaproteobacteria Proteobacteria Betaproteobacteria 15-p 0 Group 3b WP 002708355.1 Proteobacteria Gammaproteobacteria 0 - Group 3b WP 014292817.1 Proteobacteria Gammaproteobacteria 275 p 01958 Group 3b WP 028490240.1 Proteobacteria Gammaproteobacteria As 154

-p Group 3b WP 026375374.1 Proteobacteria Gammaproteobacteria 00692

Group 3b WP 028379943.1 Proteobacteria Gammaproteobacteria Group 3b WP 019350089.1 Proteobacteria Gammaproteobacteria As 147 Group 3b WP 003635857.1 Proteobacteria Gammaproteobacteria Group 3bGroup WP 006871434.1 3b WP 028383773.1 Proteobacteria Proteobacteria Gammaproteobacteria Gammaproteobacteria A A - s 174 s 094 p 00326 Group 3b WP 025386399.1 Proteobacteria Gammaproteobacteria Group 3b WP 027222572.1 Proteobacteria Gammaproteobacteria -p 01689 Group 3b WP 013032567.1 Proteobacteria Gammaproteobacteria As 045 -p 02275 Group 3b WP 028989466.1 Proteobacteria Acidithiobacillia Group 3b WP 002691721.1 Proteobacteria Gammaproteobacteria Group 3b WP 028385762.1 Proteobacteria Gammaproteobacteria Group 3b WP 006969455.1 Proteobacteria Deltaproteobacteria - As 023 Group 3b WP 002644708.1 Planctomycetes Planctomycetia p 01630 As 152 Group 3b W As 077-p 013 Group 3b WP 010585297.1 Planctomycetes Planctomycetia -p 00963 Group 3b W As 132-p 01631 - Group 3b WP 009094014.1 Planctomycetes Planctomycetia p 00910 As 054

P 013638 -p 00528 94 Group 3b WP 008515176.1 Firmicutes Clostridia P As 130-p 01133 013537 3 46.1 A 2 42.1 Group 3b WP 018300187.1 Proteobacteria Gammaproteobacteria qui Group 3b W A ! As 1 c qui a e ! 43 c Aqu a -p e A i! 003 A qui ca s 006 P 012308 e 97 ! c a Group 3b WP 012280244.1 Proteobacteria Gammaproteobacteria -p 00782 e Group 3b W Group 3b WP 015051716.1 Firmicutes Clostridia 9 76.1 Cand Group 3b WP 012963143.1 Aqui!cae Aqui!cae

Group 3b W P idatus 01 Group 3b WP 012530 1938 Group 3b WP 012676609.1 Aqui!cae Aqui!cae K o 0 rarcha 49.1 P P 012646977.1 Pr Group 3b WP 008286766.1 Aqui!cae Aqui!cae Group 3bGroup WP 028949840.1 3b WP 012674555.1 Aqui!cae Aqui Aqui!cae!cae Aqui!cae Group 3b WP e ro ota NA teobacteri Group 3b W 4 01.1 Pr

o Group 3b W a teobacteri 01572 Deltapr o P teobacter 027716 05 As 062-p 02970 Group 3b W o 65.1 Group 3b WP teobacter a Deltap 793.1 As 024-pAs 00015 106-p 00715 i P Pro a Deltap Group 3b W 0 As 136-p 01341 1 As 116-p 01981 1012 teobacter Pr As 053-p 00553 As 011-p 01511 o As 063-p 00245As 134-p 00173 roteobacter teobacter As 135-p 00451 Group 3 i r P 478.1 a oteobacte As 034-p 00789 014734228.1 Asgard archaea cluster 010868 As 133-p 02164 P ia Deltap 013904 E b u ia Deltap WP r ia 0 yarchae r Group 3b W Group 3b W 72.1 Eu ia Group 3b W 0137490 r Group 3b W Group 3b W 7 oteobacte 76.1 Eur E roteobacte u o ryarcha ryarcha ta Group 3b W Group 3b W Gro Therm Group 3b WP Gr 17.1 E As 051-p 00383 Group 3b P Group 3b W P oup 3b W 013466 yarchae r P 014012 u e ia As 131-p 00516 Group 3b W 013466 p 3b W P e ota ria As 018-p 00049 P 012571 004069 u ota ococci r As 084-p 00757As 038-p 02645 yarchae The As 157-p 01221As 071-p 02982 Group 3b W The As 003-p 00416 P 3 ota W 5 P 014788 015849 As 067-p 00303 P P 46.1 Eu r 020953 8 82.1 Eur 014 rmococci mococci As 068-p 00599As 064-p 02743 013905 1 P 412.1 Eur The 1.1 Eu P 0 010479608.10 As 037-p 00540 09.1 Eu ota The Group 3b W P 1 122 r Group 3b W 015848 1712 r mococci 7 yarcha yarcha 4 569.1 E 91.1 P 6 ryarchae Group 3b WGroup 3b W 5 30.1 Eu 86.1 Eu As 072-p 01526 010867 r yarcha r 79.1 5 yarchae mococci 60.1 Pr E Group 3b W 5 u e e r ota Eu 71.1 Eu u yarcha As 173-p 02309 E ota r r e Group 3b W P r ota u yarcha ya As 086-p 01631 P r yarcha ota The As 171-p 03622 010885 9 yarchae oteobacteri r As 080-p 01109 As 168-p 01823 0 88.1 E yarc ota T rchae As 022-p 01731As 103-p 00373 Th her P 1 e The r Group 3b W P 1012 r Ther mococci 0 014734 yarcha h ota As 095-p 02356 As 171-p 03712 er mococci e 1 e ae ota As 169-p 02496 P ota r Group 3b W 1251 u ota m mococci As 172-p 03129 010479 3 ota The As 167-p 01407 0 r ococci ota mococci 78.1 E yarcha The As 061-p 01329 29.1 Eu The As 085-p 00147 P e The 014788 The a r As 075-p 00153 019.1 ota The mococci 0 Alph r r 05.1 Eu r mococci mococci uryarcha As 053-p 00438 r moco r P 5 e mococci The mococci 91.1 Eu r ota 012571 yarchae a E pr P 0 ury rmococci The 014012 00.1 Eu r cci oteo yarcha a e r rchae r Group 3b WP 014558156.1 Crenarchaeota Thermoprotei ota moco 4 yarcha o b 93.1 Eu ta acteri r 0 yarcha e Ther T ota ota he 81.1 cci eota As 063-p 00195 rmoc a As 064-p 00705 Ther r The m Eu yarcha e ococci The ryarcha ota r o m mococci cci ococci The r e mococci As 038-p 01098 ota As 004-p 02894 r e mococci ota The

As 029-p 01339 As 014-p 02021 The r As 065-p 02124 As 142-p 01601 mococci

As 161 r mococci

As 056-p 02945 AN ac AN As 057-p 00710

As 042-p 02687 -p 01206 As 166-p 00087As 165-p 02110

i As 129-p 00672 As 072-p 02197

hportinm As 121-p 02042 As 056-p 00966

As 157-p 02646 As 037-p 01771

As 107-p 01801 As 071-p 02544

O As 096-p 03155 As 026-p 00393 As 121-p 00372 As 078-p 01818 As 066-p 01261

s

utadid

As 055-p 01004 As 104-p 02552 As 069-p 00782 As 008-p 00865 As 060-p 02256 As 020-p 01794

As 021-p 00428

aiboro na sA

A

C 1.8 C 20 s

9

sA l Group 3b WP 028842337.1 Nitrospirae Nitrospira 1 hC iborolhC 1 iborolhC hC ia As 093-p 00240 As 160-p 01425 2 sA As 141-p 00599 As 099-p 02884 -310 4 Group 3b WP 006926982.1 Calditrichaeota Calditrichae -55 As 159-p 00126

0

80

p p-261 sA As 158-p 01101

0 0 p

8220 0 88210 p-

Group 3b WP 012498899.1 Chlorobi Chlorobia 79

9

38110

6281 Group 3b WP 028844226.1 Nitrospirae Nitrospira 5

4 As 087-p 00741

200 p-38 . P 6 As 139-p 02133

651 00 Group 3b WP 014737726.1 Crenarchaeota Thermoprotei W 6

52 Group 3b WP 029915222.1 Proteobacteria Deltaproteobacteria

27

p

b3 Group 3b WP 012970030.1 Proteobacteria Gammaproteobacteria As 138-p 00938

9 6

-

Group 3b WP 013218375.1 Chloro"exi Dehalococcoidia p 20

05 0 34910 p-9 34910

39 p 710 p-970sA 710

42730 42730 Group 3b WP 015283468.1 Euryarchaeota NA 21 uorG Group 3b WP 008085116.1 Euryarchaeota NA

03900 p-330 sA p-330 03900

0

79500 p-651 sA p-651 79500

-9 7720 s 192

0 PW PW 0 sA

0

461 sA

40

A

-

Group 3b WP 017890430.1 CandidatusAtribacteria NA As 125-p 01953 Group 3b WP 008526674.1 Euryarchaeota Halobacteria

0 p

sA

Group 3b WP 015403594.1 Proteobacteria Deltaproteobacteria -8

00 sA 00 b p 3 -91

5

010 p Group 3b WP 022363293.1Actinobacteria Coriobacteriia Group 3b WP 048088262.1 Euryarchaeota Methanomicrobia p Group 3b WP 012506474.1 Chlorobi Chlorobia 0 Group 3b WP 027366709.1 Proteobacteria Deltaproteobacteria As 184-p 01 uo Group 3b WP 011416040.1 Proteobacteria Deltaproteobacteria 26 Group 3b WP 013979629.1Actinobacteria Coriobacteriia 0 0 s Group 3b WP 009139266.1Actinobacteria Coriobacteriia

01300 p-21 2 4 7 1 0 p-5 5 1 s A rG A 5 3 2 2 0 p-3 6 1 s A Group 3b WP 006366269.1 Chlorobi Chlorobia Group 3b WP 015760489.1Actinobacteria Coriobacteriia sA

Group 3b WP 012466994.1 Chlorobi Chlorobia Group 3b WP 012501574.1 Chlorobi Chlorob

Group 3b WP 028027114.1Actinobacteria Coriobacteriia Group 3b WP 011745973.1 Chlorobi Chlorobia

Group 3b WP 022739737.1Actinobacteria Coriobacteriia

Group 3b WP 029912396.1 Proteobacteria Deltaproteobacteria

Group 3b WP 016308610.1Actinobacteria Coriobacteriia

Group 3b WP 011361294.1 Chlorobi Chlorobia

Group 3b WP 048090768.1 Euryarchaeota Methanomicrobia

various Asgard archaea

As 153-p 00827 bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

ai

rG di

o

aidirtsolC setucimri r aidirtsolC

t

solC setuc solC

W a1 pu

a Group 1a WP 013011808.1 Deferribacteres Deferribacteres i

d a

ir id aidi

i

t aidirts irtsolC irtsolC

s m

r ri o ts

lC se lC

F

olC

1 aidir G ol Tree scale: 1.25 . F F

s 34083900 P 31331471 r se t C 1 setu 1 aid

ucimriF 1. ucimriF pu a1 .587936720PW t

uo rG t e

Group 1a WP 013885747.1 Deferribacteres tu o set solC 1.

p uc i

u rtsolC setucimriF 1.2068085 setucimriF rtsolC

i c

im

orG a1 ci p mriF 1.78 mriF

W r

setucivitageN setucimriF 1.377961120 PW a PW 1.377961120 setucimriF setucivitageN i mriF pu etorP

0 F o W a1 P

G 1.548

u

3 P

cimriF 1 cimriF P 0 265 Group 1a WP 004071552.1 Proteobacteria Deltaproteobacteria ab Group 1a WP 022664892.1 Pr a1 1

0 W .4 c W Group 1a WP 015336980.1 ProteobacteriaGroup 1a Deltaproteobacteria WP 015851925.1 Proteobacteria Deltaproteobacteria uor

Group 1a WP 2 puo a1

3 2

1 puorG P 639 2 2 84

09410 PW PW 09410 0600

8 81 6 a1 G

.975877

1 70010 P 70010 a1 p 1

W W 4 or puorG a1 PW 0

eD airet aidirts Group 1a WP 014261065.1 Proteobacteria Deltaproteobacteria r u PW 10

P l a

.98 p G i G

063720 1 o 17930 r Group 1a WP 027185047.1 Proteobacteria Deltaproteobacteria dir pat

1

10 P Group 1a WP 010890826.1 Firmicutes Clostridia

or G a ts

or

u 027178 1. rP

o

899

l

7 t Group 1a WP 015774148.1 Proteobacteria Deltaproteobacteria . e a W o

to a1 PW 00 C a lC

p

e

set puorG 1

1 a 533420 1 Group 1a WP 015958012.1 Firmicutes Clostridia o pu

a s 1 pu 1 Group 1a WP 008909589.1 Firmicutes Clostridia 0 Group 1a WP 016206881.1 Firmicutes Clostridia

03

0 PW a1 puorG a1 PW 0

e

uc 658.1 Proteobacteria Deltaproteobacteria t orG Group 1a WP 028573845.1 Proteobacteria Deltaproteobacteria .7 .4029390 im tcab ucim o Group 1a WP 003493671.1 Firmicutes Clostridia e r Group 1a WP 017209204.1 Firmicutes Clostridia Group 1a WP 014254247.1 Firmicutes Clostridia 0 PW

airetcabo

r ir G

iF 1. iF a oteobacteria Deltaproteobacteria Deferribacteres rP 1 p

52 r

iretc a b o etor P i

uorG

Group 1a WP 029968792.1 Proteobacteria Deltaproteobacteria 6 tc a b o etor P 1

F e etorP 1

eD 9

r o

l 4 1

i 34 1.2 3 0 8 6 3t 11 0 P W .21 boeto

rG Group 1a WP 028784867.1 Firmicutes Bacilli a ab Group 1a WP 004072606.1 Proteobacteria Deltaproteobacteria o Group 1a WP 014959255.1 Proteobacteria Deltaproteobacteria 933

c u rP

orpa 8 tca

1.249 a1 PW 563720 Group 1a WP 019123746.1 Firmicutes Bacilli

et eto

2 Group 1a WP 026688533.1 Firmicutes Bacilli atleD

ire

o a F

airet 01

a1 p 1.241493 Group 1a WP 025219113.1 Firmicutes Bacilli 1

rp puo

ri D o etorp atle D a Group 1a WP 013162757.1 Proteobacteria Deltaproteobacteria W

m

0 PW 0

cab cabo Group 1a WP 012959609.1 Firmicutes Bacilli

P

t leD Group 1a WP 017379378.1 Firmicutes Bacilli oeto cab atle at uci r Group 1a WP 024346501.1 Firmicutes Clostridia

b re t Group 1a WP 013272840.1 Firmicutes Clostridia

t

iret

G ai e Group 1a WP 014116604.1 Firmicutes Clostridia Group 1a WP 025324287.1 Proteobacteria Deltaproteobacteria a1 ire

D a a 5

2700

etca B s etorp PW 10 e 8

p r Group 1a WP 011 o a l

t uorG Group 1a GroupWP 010244488.1 1a WP 015924586.1 Firmicutes Firmicutes Clostridia Clostridia ai pu Group 1a WP 004615979.1 Firmicutes Clostridia Group 1h WP 012854897.1 Actinobacteria Actinobacteria a 08 ic ab p

l

l o c r i Group 1h WP 0205409 rG

.84

Group 1h WP 026402909.1 Actinobacteria Actinobacteria F 1 airetc a b o et o rp Group 1h WP 014376814.1 Actinobacteria Actinob airet Group 1h WP 027944444.1 Actinobacteria Actinobacteria

imri Group 1h WP Group023553242.1 1h WP 007516725.1Actinobacteria Actinobacteria Actinobacteria Group 1h WP 187404.1 Proteobacteria Deltaproteobacteria Group 1a WP 029915207.1 Proteobacteria Deltaproteobacteria

tuc Group 1h WP 014670577.1 Actinobacteria Actinobacteria airetc a b o et o e

Group 1h WP 003988175.1 Actinobacteria Actinobacteria s

N Group 1a WP 015739752.1 Firmicutes Clostridia

Group 1a WP 028052542.1 Firmicutes Clostridia Group 1h WP 014150953.1 Group 1a WP 015049787.1 Firmicutes Clostridia Group 1a WP 015739219.1 Firmicutes Clostridia 32.1 Group 1a WP 012032862.1 Firmicutes Clostridia Group 1h WPGroup 028062027.1 1h WP 0091551 Actinobacteria65.1 Thermoleophilia Group 1a WP 013217520.1 Chloro!exi Dehalococcoidia Group 1h WP 028059415.1 Actinoba 011603554.1cteria Actinobacteria Actinobacteria tage Actinobacteria Actinobacteria Group 1a WP 015407385.1 Chloro!exi Dehalococcoidia

ivi

c

Group 1a WP 013218717.1 Chloro! u

0 P W a1 p u or G t Group 1h WP 012796301.1 Group 1a WP 009616484.1 Firmicutes Clostridia e

s Group 1h WP 019819110.1 Actinob 31 Group 1h WP 009945368.1 Actinobacteria Actinobacteria Group 1a WP 014901856.1 Firmicutes Clostridia Group 1a WP 013755712.1 Firmicutes Clostridia Group 1a WP 025205213.1 Firmicutes Clostridia Group 1a WP 015262308.1 Firmicutes Clostridia Group 1a WP 014828375.1 Firmicutes Clostridia Group 1a WP 005812775.1 Firmicutes Clostridia Group 1h WP 021597135.1 Actinobacteria Actinobacteria Group 1a WP 014794526.1 Firmicutes Clostridia A Group 1a WP 015262676.1 Firmicutes Clostridia Group 1h WP 029651411.1 Proteobacteria Alphaproteobacteria ctinobacteria Actinob 1.275471 Group 1h WP 013132859.1 Actinobacteria Actinobacteria Actinobacteria Actinobacteria F Group 1h WP 016920676.1 Proteobacteria Group 1a WP 013119172.1 Firmicutes Clostridia i Actino mr

acteria i Group 1h WP 008474746.1 Chloro!exi c Group 1h WP 003612931.1Group Proteobacteria 1h WP 012872944.1 Alphaproteobacteria Chloro Actinobacteria!exi Thermomicrobia Actinobacteria

bacteria etu

Group 1h WP 011 s Thermoleophilia lC acteria exi De o acteria s

Group 1hGroup WP 010587703.11h WP 014267363.1 Planctomycetes Planctomycetia Acidobacteriia Actinob halococcoidia aidirt

153987.1 Proteobacteria acteria Group 1h WP 015247797.1 Planctomycetes Planctomycetia Group 1h WP 020375837.1 FirmicutesAlphaproteobacteria Clostridia Group 1h WP 003922139.1Group Actinoba 1h WPcteria 041979300.1 Acidobacteria Blastocatellia Group 1a WP 013119653.1 Firmicutes Clostridia a Group 1h WP 003926809.1Group 1hActinobacteria WP 01 Actinobacteria ii r Group 1h WP 018161883.1 Actinobacteria et cab Betaproteobacter o iia Group 1h WP 006334825.1 Actinobacteria A ir 1688202.1 Acidobacteria Acidobacteriia o Group Group1h WP 1h005517441.1 WP 007735075.1 Actinobacteria Actinobacteria Actinobacteria Actinobacteria C air Group 1h WP 003801347.1 Actinobacteria Actinobacteria iobacter etc r Group 1h WP 020500148.1 Actinobacteria Actinobacteria Group 1i WP 009141216.1 Actinobacteria Coriobacteriia ia a b Group 1h WP 013806771.1 Actinobacteria Actinobacteria o Actinobacteria itcA1 n Group 1h WP 023543351.1 Actinobacteria Actinobacteria Group 1h WP 028930304.1 Actinobacteria Actinobacteria Actino . Group 1h WP 018330638.1 Actinobacteria Actinobacteria 9 0 Group 1h WP 028936742.1 Actinobacteria Actinobacteria bacteria 4973 Group 1h WP 013225646.1 ctinobacteria 2 Group 1h WP 007034607.1 Actinobacteria Actinobacteria 2 0 Group 1h WP 013423858.1 airetc a b o nitc A airetc a b o nitc A 1.0 7 1 5 7 7 7 0 0 P W h 1 p u or G P Group 1i WP 022363273.1 Actinobacteria Coriobacteriia Group 1i WP 022740096.1W Actinobacteria Coriobacteriia group 1a i Group 1i WP 016308995.1 Actinobacteria Coriobacteriia Group 1h WP 006543203.1 Actinobacteria 1 Group 1i WP 006361856.1 Actinobacteria CoriobacteriiaGroup 1i WP 012803243.1 Actinobacteria Coriobacteriia p Group 1h WP 012890477.1 ActinobacteriaActinobacteria Actinobacteria u o Group 1i WP 022184401.1 Actinobacteria Coriobacteriia Group 1h WP 009061231.1 Verrucomicrobia Methylacidiphilae rG Group 1h WP 006608003.1 Actinobacteria Actinobacteria Actinobacteria Actinobacteria Group 1i WP 012798421.1 Actinobacteria Coriobacteriia Group 1i WP 009139069.1 Actinobacteria Coriobacteriia Group 1i WP 022370065.1 Actinobacteria Co

Group 1i WP 009304598.1 Actinobacteria Coriobacteriia Group 1hGroup WP 007412921.1 1h WP 015922819.1 Verrucomicrobia Chloro!exi Verrucomicrobiae Thermomicrobia Group 1i WP 013979648.1 Actinobacteria Coriobacteriia Actinobacteria Group 1h WP 007904650.1 Chloro!exi Ktedonobacteria

Group 1h WP 012934014.1 Actinobacteria

The rmoleophilia

Group 1h WPGroup 012813797.1 1h WP 014512387.1 Bacteroidetes Crenarchae Flavobacteriotaia

Group 1b WP 027177852.1 Proteobacteria Deltaproteobacteria Group 1b WP 011369403.1 Proteobacteria Deltaproteobacteria Group 1b WP 010939796.1Group 1b WPProteobacteria 015415315.1 Deltaproteobacteria Proteobacteria Deltaproteobacteria group 1h Group 1b WP 018125512.1 Proteobacteria Deltaproteobacteria The rmoprotei

Group 1g WP 013680997.1 Crenarchaeota Thermoprotei Group 1g WP 014125970.1 Crenarchaeota Thermoprotei Group 1b WP 013258222.1 Proteobacteria Deltaproteobacteria

Group 1b WP 015904103.1 Proteobacteria Deltaproteobacteria Group 1g WP 011850254.1 Crenarchaeota Thermoprotei Group 1b WP 028320012.1Group Proteobacteria 1b WP 020585310.1 Deltaproteobacteria Proteobacteria Deltaproteobacteria Group 1g WP 013337401.1 Crenarchaeota Thermoprotei Group 1b WP 011699790.1 ProteobacteriaGroup 1b WP Deltaproteobacteria 006964156.1 Proteobacteria Deltaproteobacteria Group 1g WP 048100713.1 Crenarchaeota Group 1b WP 028324338.1 Proteobacteria DeltaproteobacteriaGroup 1b WP 022666958.1 Proteobacteria Deltaproteobacteria ia ia Deltaproteobacteria

oteobacter Group 1g WP 020962524.1 Crenarchaeota Thermoproteiietorp o m re h T ato e a h cra n er C 1.6 5 9 1 6 7 11 0 P W g 1 p u or G Group 1b WP 028586938.1 Proteobacteria Deltaproteobacteriateobacter Group 1g WP 048100721.1 Crenarchaeota The o ia Group 1g WPGroup 014025627.1 1g WP Crenarchaeota rmoprotei Group 1h WP 048100378.1 Crenarchaeota Thermoprotei Group 1g WP 011822511.1 Crenarchaeota 011752262.1 ThermoproteiCrenarchaeota Thermoprotei

roteobacter Thermoprotei Group 1g WP 012123507.1 CrenarchaeThe ota Group 1b WP 020879797.1 Proteobacteria Deltaproteobacteria rmoprotei Group 1b WP 020888131.1 Proteobacteria Deltaproteobacteria Group 1b WP 029459118.1 Proteobacteria Deltaproteobacteria a Deltap

Group 1bGroup WP 029966051.1 1b WP 014261062.1 Proteobacteria Proteobacteria Deltaproteobacteria Deltaproteobacteria

Thermoprotei Group 1b WP 027191085.1 Pr ia Group 1b WP 027182711.1 Proteobacteria Deltaproteobacteria Group 1b WP 015750727.1 Proteobacteria Deltaproteobacteria group 1g Group 1b WP 015772985.1 Proteobacteria Deltaproteobacteria Group 1b WP 027179541.1 Proteobacteria Deltapr Group 1b WP 015337220.1 Proteobacteria Deltaproteobacteria Group 1b WP 015851793.1 Proteobacteria Deltaproteobacteria Group 1b WP 028574549.1 Proteobacteria GroupDeltaproteobacteria 1b WP 027188779.1 Proteobacteria Deltaproteobacteria

Group 1b WP 028572157.1Group 1b WP Proteobacteri 027361001.1 Proteobacteria Deltaproteobacteria

Group 1b WP 006008218.1 Proteobacteria Deltaproteobacteria Group 1b WP 011368035.1 ProteobacteriaGroup 1b WPDeltaproteobacteria 009380705.1 Proteobacteria Deltaproteobacteria Group 1b WP 010939208.1 Proteobacteria Deltaproteobacteria

Group 1b WP 021760964.1 Proteobacteria Deltaproteobacter Group 1b WP 013515487.1 Proteobacteria Deltaproteobacteria Group 1b WP 027175156.1Group 1b Proteobacteria WP 018126002.1 Deltaproteobacteria Proteobacteria Deltaproteobacteria Group 1b WP 022660651.1Group Proteobacteria 1b WP 015414692.1 Deltaproteobacteria Proteobacteria Deltaproteobacteria

group 1i roteobacteria

Group 1b WP 025322049.1 Proteobacteria Deltap

Group 1b WP 007474315.1 Proteobacteria Epsilonproteobacteria Group 1b WP 012674350.1 Aqui"cae Aqui"cae obacteria

Group 1b WP 012675774.1 Aqui"cae Aqui"cae onprote Group 1b WP 029520216.1 Aqui"cae Aqui"cae a Epsil Group 1b WP 013553045.1 Proteobacteria Epsilonproteobacteria Group 1j WP 013682857.1 Euryarchaeota Archaeoglobi Group 1j WP 015591799.1 Euryarchaeota Archaeoglobi Group 1b WP 012082337.1 Proteobacteria Epsilonproteobacteria Group Group1b WP 1b014468937.1 WP 013135164.1 Proteobacteria Proteobacteria Epsilonproteobacteria Epsilonproteobacteria Group 1j WP 012940402.1 Euryarchaeota Archaeoglobi Group 1b WP 013460908.1 Proteobacteria Epsilonproteobacteria Group 1j WP 012965344.1 Euryarchaeota Archaeoglobi Group 1b WP 012083422.1 ProteobacteriGroup 1b WP 025391229.1 Proteobacteria Deltaproteobacteria Group 1b WP 014353799.1 Firmicutes Clostridia Group 1j WP 010878877.1 Euryarchaeota Archaeoglobi Group 1b WP 022670762.1 Proteobacteria Deltaproteobacteria Group 1b WP 013327110.1 Proteobacteria Epsilonproteobacteria

Group 1b WP 013460913.1 Proteobacteria Epsilonproteobacteria Group 1b WP 011371454.1 Proteobacteria Gammaproteobacteria PW k1puorG Group 1k WP 011021171.1 Euryarchaeota Methanomicrobia aib orci m o n a hte M ato e a h cra yru E 1.7 3 2 4 3 0 1 1 0 Group 1k WP 011306827.1 Euryarchaeota Methanomicrobia Group 1k WP 013898215.1 Euryarchaeota Methanomicrobia group 1j Group 1b WP 028115083.1 Proteobacteria Gammaproteobacteria Group 1b WPGroup 028110915.1 1b WP 023403981.1 Proteobacteria Proteobacteria Gammaproteobacteria Gammaproteobacteria Group 1k WP 011306833.1 Euryarchaeota Methanomicrobia Group 1b WP Group013344606.1 1b WP Proteobacteria005368230.1 Proteobacteria Gammaproteobacteria Gammaproteobacteria Group 1b WP 005299657.1 Proteobacteria Gammaproteobacteria Group 1k WP 011021166.1 Euryarchaeota Methanomicrobia

Group 1k WP 023846630.1 Euryarchaeota Methanomicrobia Group 1b WP 011637560.1 Proteobacteria Gammaproteobacteria Group 1b WP 028762126.1 Proteobacteria Gammaproteobacteria Group 1k WP 012901677.1 Euryarchaeota Methanomicrobia Group 1b WP 028768898.1 Proteobacteria Gammaproteobacteria Group 1b WP 012277350.1 Proteobacteria Gammaproteobacteria Group 1k WP 014405075.1 Euryarchaeota Methanomicrobia group 1b Group 1k WP 012036281.1 Euryarchaeota Methanomicrobia group 1k Group 1b WP 014468932.1 Proteobacteria Epsilonproteobacteria Group 1b WP 013135169.1 Proteobacteria Epsilonproteobacteria Group 1b WP 021286544.1 Proteobacteria Epsilonproteobacteria As 085-p 01085 Group 1b WP 011373064.1 Proteobacteria Epsilonproteobacteria As 075-p 00539 Group 1b WP 008339404.1 Proteobacteria Epsilonproteobacteria oteobacteria

Group 1b WP 012663540.1 Proteobacteria Epsilonproteobacteria Group 1b WP 007473991.1 Proteobacteria Epsilonproteobacteria Wukongarchaeota Group 1b WP 024789021.1 Proteobacteria Epsilonpr Group 1b WP 024954007.1 Proteobacteria Epsilonproteobacteria Group 1b WP 012856887.1 Proteobacteria Epsilonproteobacteria Group 1b WP 005870439.1 Proteobacteria Epsilonproteobacteria Group 1b WP 012108717.1 Proteobacteria Epsilonproteobacteria Group 1b WP 016647003.1 Proteobacteria Epsilonproteobacteria Group 1b WP 002849472.1 Proteobacteria Epsilonproteobacteria Group 1b WP 012661425.1 Proteobacteria Epsilonproteobacteria Group 1b WP 020975057.1 Proteobacteria Epsilonproteobacteria Group 1b WP 027305891.1 Proteobacteria Epsilonproteobacteria Group 1b WP 002952687.1 Proteobacteria Epsilonproteobacteriaonproteobacteria 366.1 Proteobacteria Epsil Group 1b WP 021091 Group 1b WP 018136329.1 Proteobacteria Epsilonproteobacteria

Group 1b WP 013505701.1 Chrysiogenetes Chrysiogenetes Group 1b WP 027390843.1 Chrysiogenetes Chrysiogenetes Group 1b WP 013163129.1 Proteobacteria Deltaproteobacteria Group 1b WP 006655925.1 Proteobacteria Epsilonproteobacteria Group 1b WP 011139497.1 Proteobacteria Epsilonproteobacteria Group 1b WP 002956442.1 Proteobacteria Epsilonproteobacteria Group 1b WP 004087140.1 Proteobacteria Epsilonproteobacteria Group 1b WP 023929963.1 Proteobacteria Epsilonproteobacteria Group 1b WP 023949825.1 Proteobacteria Epsilonproteobacteria Group 1b WP 023926858.1 Proteobacteria Epsilonproteobacteria Group 1b WP 013023319.1 Proteobacteria Epsilonproteobacteria

Group 1b WP 011577655.1 Proteobacteria Epsilonproteobacteria Group 1b WP 013469730.1 Proteobacteria Epsilonproteobacteria Group 1b WP 013890446.1 Proteobacteria Epsilonproteobacteria Group 1b WP 006564409.1 Proteobacteria Epsilonproteobacteria

Group 1c WP 013010514.1 Deferribacteres Deferribacteres Group 1c WP 022851983.1 Deferribacteres Deferribacte

Group 1c WP 010941450.1 GroupProteobacteria 1c WP 013886108.1 Deltaproteobacteria Deferribacteres Deferribacteres res Group 1d WP 011830409.1 Proteobacteria Betaproteobacteria Group 1c WP 012468242.1 Proteobacteria Deltaproteobacteria Group 1d WP 026436975.1 Proteobacteria Betaproteobacteria Group 1c WP 015721386.1 Proteobacteria Deltaproteobacteria Group 1d WP 022981892.1 Proteobacteria Betaproteobacteria Group 1c WP 0 Group 1d WP 028309978.1 Proteobacteria Betaproteobacteria 11938839.1 Proteobacteria Deltaprote

Group 1d AAB25780.1 Proteobacteria Betaproteobacteria Group 1c WP 015403419.1 Proteobacteria Deltaproteobacteria Group 1d WP 028007256.1 Proteobacteria Gammaproteobacteria obacteria Group 1d WP 011153927.1 Proteobacteria Betaproteobacteria group 1c Group 1c WP 023408348.1 Proteobacteria Deltaproteobacteria Group 1d WP 028999751.1 Proteobacteria Betaproteobacteria Group 1c WP 011187820 Group 1d WP 023105852.1 Proteobacteria Gammaproteobacteria Group 1c WP 022846122.1 Aqui"cae Aqui" Group 1c WP 013638066.1 Aqui.1" caeProteobacteria Aqui"cae D Group 1d WP 028215130.1 Proteobacteria Betaproteobacteria Group 1c WP 013537874.1 Aqui"cae Aqui"cae eltaproteobacteria Group 1d WP 020394857.1 Proteobacteria Gammaproteobacteria Group 1d WP 013124489.1 Proteobacteria Betaproteobacteria Group 1c WP 022854121.1 Thermodesulfobacteriacae Thermodesulfobacteria Group 1c WP 013906919.1 Group 1d WP 025813214.1 Proteobacteria Alphaproteobacteria Group 1c WP 013909681.1 Thermodesulfobacteria Thermodesulfobacteria Group 1d WP 025825897.1 Proteobacteria Alphaproteobacteria Group 1c WP 028840815.1 Thermodesulfobacteria Thermodesulfobacteria WP 018391449.1 Thermodesu Group 1d WP 012384150.1 Proteobacteria Alphaproteobacteria lfobacteria Thermodesulfobacteria Group 1d WP 012169130.1 Proteobacteria Alphaproteobacteria Group 1c WP 011688559.1 Acidobacteria Acidobacteriia bacteria Group 1cGroup WP 011419537.1 1c WP Proteobacteria Deltaproteobacteria Group 1d WP 014249365.1 Proteobacteria Alphaproteobacteriabacteria 0115250 Group 1d WP 029009006.1 Proteobacteria Alphaproteobacteria Group 1c WP 012098324.1 Proteobacteria Deltaproteobacteria 39.1 Group 1d WP 013913761.1 Proteobacteria Alphaproteo Acidobacteria Acidobacteriia Group 1d WP 029356678.1 Proteobacteria Alphaproteobacteria Group 1d WP 023457248.1 Proteobacteria Alphaproteo Group 1c WP 009150983.1 Proteobacteria Gammaproteobacteria Group 1d WP 029908155.1 Proteobacteria Gammaproteobacteria group 1f Group 1c WP 026199142.1 Proteobacteria Gammaproteobacteria Group 1c WP 014780162.1 Proteobacteria Gammaproteobacteria Group 1d WP 027853880.1 Proteobacteria Gammaproteobacteria Group 1c WP 014198378.1 Proteobacteria Alphaproteobacteria Group 1d WP 027859843.1 Proteobacteria Gammaproteobacteria Group 1c WP 029009764.1 Proteobacteria Group 1d WP 007019929.1 Proteobacteria Gammaproteobacteria Group 1c WP 028581743.1 Proteobacteria Deltaproteobacteria Group 1c WP 015725977.1 Proteobacteria Deltaproteobacteria Group 1d WP 028293254.1 Proteobacteria Gammaproteobacteria bacteria Group 1c WP 027371077.1 Proteobacteria Deltaproteobacteria Group 1d WP 020745479.1 Proteobacteria Gammaproteobacteria Group 1d WP 005863657.1 Proteobacteria Alphaproteobacteria

Group 1d WP 009505608.1 Proteobacteria Alphaproteobacteria Alphaproteobacteria Group 1d WP 020041893.1 Proteobacteria Alphaproteobacteria Group 1c WP 028579802.1 Proteobacteria Deltaproteobacteria Group 1d WP 008282509.1 Proteobacteria Alphaproteobacteria Group 1c WP 028583198.1 Proteobacteria Deltaproteobacteria Group 1c WP 029132834.1 Proteobacteria Gammaproteobacteria Group 1d WP 002720659.1 Proteobacteria Alphaproteobacteria Group 1c WP 011714122.1 Proteobacteria Alphaproteobacteria Group 1d WP 023664202.1 Proteobacteria Alphaproteo Group 1c WP 009206448.1orG Proteobacteria Betaproteobacteria Group 1d WP 028714011.1 Proteobacteria Alphaproteobacteria Group 1c1pu WP Group 1d WP 028877918.1 Proteobacteria Alphaproteobacteria PW c Group 1c WP 011466351.120 Proteobacteria Betaproteobacteria Group 1d WP 006939138.1 Proteobacteria Alphaproteobacteria 01 1384056.1 Prote Group 1c WP 015437386.1 Proteobacteria1.9140804 Betaproteobacteria Group 1d WP 008068262.1 Proteobacteria Alphaproteobacteria bacteria Group 1c WP 004378930.1 ProteobacteriaorP Betaproteobacteret ia Group 1d WP 008838370.1 Proteobacteria Alphaproteobacteria bo aproteo o group 1d Group 1c WP 015766767.1 Proteobacteriabacteria Betaproteobacteria Group 1d WP 015597373.1 Proteobacteria Alphaproteobacteriaia Alph Alp Group 1d WP 013418645.1 Proteobacteria Alphaproteobacteria GroupGroup 1c WP 1c WP 014238024.1 Proteobacteria Betap haproteo Group 1d WP 027157817.1 Proteobacteria Gammaproteobacteria Group 1c WP 010863700.1 Proteobacteria Gammaproteobacteria bacteria 255.1 Proteobacter 02745 air etc a b o et o r p a h pl A air etc a

Group 1c WP 007370146.1 Proteobacteria7220.1 P Gammaproteobacteria Group 1d WP 020923409.1 Proteobacteria Alphaproteobacteria Group 1d WP 018240 roteobacteria Betaproteobacteria GroupGroup 1c WP 1c 029095540.1 WP 004236380.1 Proteobacteria Proteobacteria Gammaproteobacteria Gammaproteobacteria Group 1d WP 014891816.1 Proteobacteria Alphaproteobacteria Group 1c W Group 1d WP 006205723.1 Proteobacteria Alphaproteobacteria Group 1c WP 027275754.1 Proteobacteria Gammaproteobacteria Group 1c WP 006659162.1 Proteobacteria Gammaproteobacteria teobacteria Alphaproteobacteria roteobacteria Group 1d WP 028738633.1 Proteobacteria Alphaproteobacteria group 1e P 012135127.1 Pr Group 1d WP 028135606.1 Proteobacteria Alphaproteobacteria 200.1 Pro Group 1d WP 026600032.1 Proteobacteria Alphaproteobacteria Group 1c WP 005605173.1 Proteobacteria Gammaproteobacteria Group 1d WP 028336746.1 Proteobacteria Alphaproteobacteria Group 1c WP 005576256.1 Proteobacteria Gammaproteobacteria oteobacte Group 1c WP 005638557.1 Proteobacteria Gammaproteobacteria Group 1d WP 026783388.1 Proteobacteria Alphaproteobacteria Group 1c WP 010674404.1Group Proteobacteria 1c WP 012073017.1 Gammaproteobacteria Proteobacteria Gammaproteobacteria Group 1d WP 009540Group 1d WP 016919626.1 Proteobacteria Alphaproteobacteria Group 1c WP ria Gammaproteobacteria GroupGroup 1c WP 1c 004092218.1 WP 024522506.1 Proteobacteria Proteobacteria Gammaproteobacteria Gammaproteobacteria Group 1c WP 013318689.1 ProteobacteriaGroup 1c WP 005760858.1 Gammaproteobacteria Proteobacteria Gammaproteobacteria Group 1dGroup WP 011388917.1 1d WP 021130964.1 Proteobacteria Proteobacteria Alphaproteobacteria Alphaproteobacteria 011201506.1 Proteobacteria Gammaproteobacteria Group 1d WP 011156496.1 Proteobacteria Alphaproteobacteria Group 1c WP 023580917.1 Proteobacteria Gammaproteobacteria Group 1d WP 017366017.1 Proteobacteria Gammaproteobacteria Group 1c WPGroup 004703021.1 1c WP 002435491.1 Proteobacteria Proteobacteria Gammaproteobacteria Gammaproteobacteria Group 1f WP 015898101.1 Acidobacteria Acidobacteriia Group 1d WP 026610556.1 Proteobacteria Gammaproteobacteria Group 1c WP Group024528473.1 1c WP Proteobacteria002443827.1 Proteobacteria Gammaproteobacteria Gammaproteobacteria Group 1c WP 013366452.1 Proteobacteria Gammaproteobacteria Group 1f WP 006562194.1 ChloroGroup! 1fexi WP Chloro 0264416!exia19.1 Acidobacteria Acidobacteriia Group 1d WP 009149949.1 Proteobacteria Gammaproteobacteria Group 1c WP 011092818.1 Proteobacteria Gammaproteobacteria Group 1f WP 012121625.1 Chloro!exi Chloro!exia Group 1d WP 015282042.1 Proteobacteria Gammaproteobacteria Group 1d WP 020502968.1 Proteobacteria Gammaproteobacteria Group 1d WP 007039718.1 Proteobacteria Gammaproteobacteria Group 1d WP 007190983.1 Proteobacteria Gammaproteobacteria Group 1d WP 002706872.1 Proteobacteria Gammaproteobacteria Group 1d WP 011767509.1 Proteobacteria Betaproteobacteria Group 1f WP 028842875.1 Nitrospirae Nitrospira Group 1f WP 028845527.1 Nitrospirae Nitrospira Group 1fGroup WP 020678214.1 1f WP 027713781.1 Proteobacteria Proteobacteria Deltaproteobacteria Deltaproteobacteria Group 1f WP 006544185.1 Actinobacteria Actinobacteria Group 1d WP 011629764.1 Proteobacteria Gammaproteobacteria Group 1f WP 018502767.1 Actinobacteria Actinobacteria

Group 1f WPG 004512555.1 Proteobacteria Deltaproteobacteria Group 1f WP 011937483.1 Proteobacteria Deltaproteobacteria

Group 1f WP 012470110.1 Proteobacteria Deltaproteobacteria

Group 1f WP 013450229.1 Deferribacteres Deferribacteres

Group 1d WP 010863916.1 Proteobacteria Gammaproteobacteria Group 1f WP 013006836.1 Deferribacteres Deferribacteres Group 1d WP 015004151.1 Proteobacteria Betaproteobacteria Group 1f WP 015718812.1 Proteobacteria Deltaproteobacteria

Group 1d WP 003830872.1 Proteobacteria GammaproteobacteriaGroup 1d WP 026685892.1 Proteobacteria Betaproteobacteria Group 1f WP 013885838.1 Deferribacteres Deferribacteres Group 1d WP 014450582.1 Nitrospirae Nitrospira Group 1dGroup WP 023226188.1 1d WP 002683184.1Group Proteobacteria 1d WPProteobacteria 028993715.1 Gammaproteobacteria Gammaproteobacteria Proteobacteria Betaproteobacteria

Group 1f WP 022851572.1 Deferribacteres Deferribacteres

Group 1d WP 011526524.1 Proteobacteria Deltaproteobacteria Group 1f WP 013011747.1 Deferribacteres Deferribacteres Group 1d WP 009378513.1 Proteobacteria Deltaproteobacteria o etorP 1.4 2 5 4 2 3 5 2 0 P W f1 p u or Group 1d WP 012463656.1 Verrucomicrobia Methylacidiphilae

Group 1d WP 009059758.1 Verrucomicrobia Methylacidiphilae tcabe

D air Group 1f WP 012529798.1 Proteobacteria Deltaproteobacteria Group 1f WP 012421152.1 V

Group 1e WP 012962745.1 Aqui"cae

torpatlee

ea Group 1e WP 013755279.1 Proteobacteria Gamma WP 006192752.1 Group 1e WP 008287448.1 Aqui"cae Aqui"cae Group 1e WP 029133715.1 Proteobacteria Gammaproteobacteria Group 1e WP 012992451.1 Aqui"cae Aqui"cae

h abo Group 1e WP 015282140.1 Proteobacteria Gammaproteob Group 1e WP 010880593.1

c Group 1e WP 012971230.1 Proteobacteria Gammaproteobacteria i Group 1e WP 009147062.1 Proteobacteria Gammaproteobacteria

rtidl

Group 1e WP 014779054.1 Proteobacteria Gammaproteobacteria Group 1e WP 007191025.1Group 1e Proteobacteria WP 020508075.1 Gammaproteobacteria Proteobacteria Gammaproteobacteria Group 1e WP 028490337.1 Proteobacteria Gammaproteobacteria aiborol airetc ae Group 1e WP 007039064.1 Proteobacteria Gammaproteobacteria c a Group 1e WP 002684413.1 Proteobacteria Gammaproteobacteria i

a

aibor

a borol

ib C C

a Group 1f WP 020489980.1 Fir G iborol

iria at a Aqui"cae Aqui"cae o iborolh

r rG

h

o oeah o or lh o Aqui"cae Aqui"cae hC

l ae Aqui" i C errucomic h pu Group 1f WP 004007712.1 Actinobacteria Actinobacteria C iborol "c boro

ibor C hC hC Group 1f WP 022865058.1 Actinobacteria Actinobacteria 1 uorG uorG PW c ibo

puorG Group 1f WP 005509950.1 i Group 1f WP 004013385.1 Actinobacteria Actinobacteria p illi e rt

rG orG C i e1 pu

idl iborolhC iborolhC lhC Group 1d WP 019542170.1 Firmicutes Negativicutes o 1 W

borolhC

ro

o 2510 Group 1f WP 027012481.1 ActinobaGroupcteria 1f WP 014310124.1 Actinobacteria Actinobacteria lhC

h

pu Group 1d WP 005840788.1 Firmicutes Negativicutes lhC a airet pu Group 1f WP 006062526.1 Group 1f WP 005388873.1 Actinobacteria Actinobacteria robia Verrucomicrobiae 1

C

uorG 1 0 PW C 0 P Group 1d WP 019554568.1 Firmicutes Negativicutes Aqui"cae Aqui"cae

Group 1d WP 029541750.1 Firmicutes Negativicutes 1.15 .3 PW e

p

1.73 1

1

1.4496 e 3

c

1 0 Aqui

9 .41 5810 P W e 1 p u or G

aboetor

Group 1d WP 021167302.1 Firmicutes Negativicutes PW10 e1PW e1 p 1.235205210 PW d1 puorG d1 PW 1.235205210 1 Group 1d WP 012282567.1 Firmicutes Clostridia 2 Group 1d WP 007954318.1 Firmicutes Negativicutes 1.36797

e 02

Group 1d WP 004573252.1 Firmicutes Negativicutes 4 1 1.8 Group 1d WP 007290417.1 Firmicutes Negativicutes

W 40 Group 1f WP 025252525.1 0 P W e 1 p u or G Group 1d WP 025206080.1 Firmicutes Clostridia PW 5 5 74 0 " 00 Group 1d WP 014186363.1 Firmicutes Clostridia 856421 Group 1d WP 026762838.1 Firmicutes Negativicutes 0 Group 1d WP 015261076.1 Firmicutes Clostridia Group 1d WP 012281726.1 Firmicutes Clostridia puorG d1 PW 0 5210 c 00 8 5080 Group 1d WP 006305436.1 Firmicutes Negativicutes Group 1d WP 025306136.1 P 71853110 3 6 3 Group 1e WP 018610610.1 Proteobacteria Betaproteobacteria ae 5210 P 5210

Group 1d WP 014794600.1 Firmicutes Clostridia P Group 1d WP 014904046.1 Firmicutes Clostridia 6 0

9 8479 Group 1d WP 014826810.1 Firmicutes Clostridia 521 0700 3600 0 9 micutes Clostridia

7 3

Group 1d WP 012992146.1 Group 1d WP 010880393.1 Aqui"cae Aqui"cae 6

pammaG aire pammaG 8 Group 1d WP 027889016.1 Firmicutes Negativicutes Group 1d WP 008286892.1 Aqui"cae Aqui"cae 1.2 8

00

69 7259810 P W e1

Group 1d WP 027937761.1 Firmicutes Negativicutes 00877

P

. 673

.

06 1.0 0

9 r Group 1d WP 003397243.1 Firmicutes Bacilli 5251 P

P

8 Actinobacteria PW

P 1 Group 1d CUU03002.1 Candidatus Kryptonia NA 8 P 1

W d W

eto r etorP 1. Group 1d WP 003330153.1 Firmicutes Bacilli W Group 1d WP 006555093.1 Firmicutes Negativicutes rP Group 1d WP 012513102.1 Aqui o d1 puorG d1 Group 1d WP 028254735.1 Firmicutes Negativicutes W proteobacteria

d t bo bo Group 1d WP 015594636.1 Firmicutes Bacilli pu 1 Group 1d WP 028552966.1 Firmicutes Bacilli d1 1 rP 1.4520

W .248 Actinobacteria rP 1.4

o Group 1d WP 020062534.1 Firmicutes Bacilli d1 1

puorG

puo boeto

Group 1d WP 028257253.1 Firmicutes Negativicutes Group 1d WP 019241710.1 Firmicutes Bacilli

P

a etor P 1.7 2 0 9 3 aboe torP 1.8etcaetca t

o

e

o r

c r

rG p Group 1d WP 010631937.1 Firmicutes Bacilli tc a air tcaboetor

e G etc

tor bo u

e

Group 1d WP 005376164.1 Firmicutes Negativicutes boeto

Group 1d WP 027414715.1 Firmicutes Bacilli r a B ai o re Actinob

ir Actinobacteria i

rG a acteria tcaboet c

eB e

a t Group 1d WP 023510524.1 Firmicutes Bac ai G t

e G Group 1d WP 013764260.1 Bacteroidetes Saprosp retcaboetoa

i iretcab r

a r pate a Actinobacteria

tcabo

m

P 1 P Actino

Group 1d WP 005643376.1 Bacteroidetes Bacteroidia G orp ai

re G acteria lA air

m

p

.

a h

838043620 PW e1 e1 PW 838043620 oet

m

G ai a

b Group 1d WP 007653239.1 Bacteroidetes Bacteroidia Group 1d WP 012025884.1 Bacteroidetes Flavobacteriia

p htidicA a

rpa

m bacteria a Group 1d WP 014217548.1 Bacteroidetes Chitinophagia o Group 1d WP 022602547.1 Bacteroidetes Bacteroidia r orpamma

ma

t Actinobacteria

a Group 1d WP 008862337.1 Bacteroidetes Bacteroidia Group 1d WP 008510449.1 Bacteroidetes Sphingobacteriia t tc

boi

etcaboetor m e Group 1d WP 025279286.1 Bacteroidetes Bacteroidia oetorpa m m a G

orpamma rp

a b boe air air

p

to

lica

e

l

i

or a caboeto

tca

etcaboe

bo t e

et

r

a

i

o re ir a airetca a

b ai

tc

e

airetc a b o et

air

p

etca

uorG

r

i

a bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

- Carbohydrates CO2 O2 Nitrogen SulfurCl H2

As_083 As_078 Thorarchaeota As_029 As_028 As_114 Hermodarchaeota As_086 As_022 Odinarchaeota As_006 As_143 As_130 Baldrarchaeota As_129 As_127 As_077 Lokiarchaeota As_004 As_152 As_128 As_098 Helarchaeota As_050 As_048 As_183 Borrarchaeota As_181 As_178 As_099 Heimdallarchaeota As_139 As_138 As_030 As_140 Kariarchaeota As_002 As_017 As_116 Gerdarchaeota As_053 As_024 As_003 Hodarchaeota As_027 As_131 Wukongarchaeota As_085 As_075 : nifH Sulfate reduction: sat Sulfate reduction: cysH Sulfate reduction: cysC Reductive Dehalogenase NiFe hydrogenase Group 1 NiFe hydrogenase Group 3 NiFe hydrogenase Group 4 Complex I: nuoD Complex II: sdhA Complex II: sdhC Complex II: sdhD Complex IV: Complex IV: cyoB Complex IV: cyoC Complex IV: cydA Complex V: atpA Complex V: atpB Complex V: atpC Nitrate reductase: narGYJI narK Nitrate/nitrite transporter: reductase: nosZ Glycolysis Pentose phosphate pathway, oxidative Wood Wood Methane oxidation: mcrA Methane oxidation: mcrB Methane oxidation: mcrC Methane oxidation: mcrD Methane oxidation: mcrG Complex I: nuoA Complex I: nuoB Complex I: nuoC Acetate synthesis: acs/acdA Wood Gluconeogenesis Pyruvate oxidation Citrate cycle Pentose phosphate pathway, non-oxidative Pentose phosphate pathway, rev. RuMP beta Oxidation Ljungdahl pathway: cdhD Ljungdahl pathway: cdhE Ljungdahl pathway: cooS cyoA

Absence in this phylum Presence in MAG Presence in other MAGs of this phylum bioRxiv preprint doi: https://doi.org/10.1101/2020.10.19.343400; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Wukongarchaeota Ftr FwdD FwdC HdrC

As_075 contig_145_6336 Ftr FwdB FwdA Ftr HyaB

Ftr FwdD FwdC HdrC

As_085 contig_145_13955 FwdB FwdA Ftr HyaB HyaA