Supplementary information for

Early acquisition of conserved, lineage-specific proteins currently lacking functional predictions were central to the rise and diversification of

Raphaël Méheust+,1,4, Cindy J. Castelle1,2, Alexander L. Jaffe5 and Jillian F. Banfield+,1,2,3,4,5

+Corresponding authors: [email protected], raphael.meheust@berkeley.edu

1Department of Earth and Planetary Science, University of California, Berkeley, CA
2Chan Zuckerberg Biohub, San Francisco, CA
3Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA
4Innovative Genomics Institute, University of California, Berkeley, CA
5Department of Plant and Microbial Biology, University of California, Berkeley, CA

Do cluster together, despite their phylogenetic diversity? (Full analysis). To test whether our approach detects expected patterns of protein family distribution across lineages, we examined the distribution of genes involved in methanogenesis, including mcrA, which encodes the alpha subunit of methyl-coenzyme M reductase and is a key gene in methane production (Ermler et al. 1997). The mcrA protein family (fam05485) was identified in module 65, which comprises 128 protein families and is highly enriched in , and Methanomassiliicoccales (Supplementary Figure 7). Along with mcrA, all the other subunits (BCDG) of methyl-coenzyme M reductase (Mcr) were identified in module 65 (fam02993, fam03716, fam15638 and fam04416). Five subunits (BCDEF) of the methyl-tetrahydromethanopterin (methyl-H4MPT):coenzyme M methyltransferase (Mtr) were also found (fam06037, fam06163, fam06462, fam06013 and fam02119, respectively), although the Mtr genes were absent in Methanomassiliicoccales (Borrel et al. 2014) (Supplementary Dataset - Table S3). Using an HMM-HMM comparison method against the eggNOG database (Huerta-Cepas et al. 2019) (see materials and methods), we also detected five conserved hypothetical protein families that are associated with the core proteins of methanogenesis (fam02372, fam05394, fam06062, fam06600 and fam12037) (Borrel et al. 2014) (Supplementary Dataset - Table S4). The co-occurrence of the Mcr subunits and the five methanogenesis markers of unknown function is striking (Supplementary Figure 7). Further, genes for the transport of iron, magnesium, cobalt and nickel and for the synthesis of key cofactors required for methanogen growth were also found in module 65.
We identified two other modules: module 129, enriched in subunits of the energy-converting hydrogenases A (fam10726, fam14360, fam17633, fam06266, fam13666, fam08995, fam03063, fam02679, fam22457 and fam06367) and B (fam06098, fam06875 and fam32156), and module 184, enriched in enzymes for the utilization of methylamine (fam02336 and fam03937), dimethylamine (fam03076 and fam05873), and trimethylamine (fam04092 and fam21299) as substrates for methanogenesis (Burke et al. 1998). The enzymes for methylamine, dimethylamine and trimethylamine utilization each fall into two families rather than one because of mispredicted coding sequences (each gene was split into two consecutive genes by the gene prediction software Prodigal). We found methanol---5-hydroxybenzimidazolylcobamide Co-methyltransferase in two distinct families (fam05405 and fam04064) in modules 184 and 72 that are specific to the Methanomassiliicoccales (Supplementary Figure 7 and Supplementary Dataset - Table S3). When the distribution of the signature families is rendered on the phylogenetic tree of Archaea, the correspondence between families, modules and annotations is apparent (Supplementary Figure 7). We also recovered mcr subunits in lineages that are not considered canonical methanogenic lineages (Evans et al. 2019). These include two genomes of Bathyarchaeota related to BA1 and BA2 (GCA_002509245.1 and GCA_001399805.1) (Evans et al. 2015), and one Archaeoglobi genome related to JdFR-42 (GCA_002010305) (Boyd et al. 2019; Wang et al. 2019). These genomes have been described as having divergent mcr genes, so their recovery indicates that our method is sensitive enough to detect distant homology (Supplementary Figure 7).
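The co-occurrence of families across genomes described above can be illustrated with a small sketch. It is not the authors' module-detection method; it only computes a Jaccard index between the genome sets of each pair of families, assuming a presence/absence table is available. The family identifiers come from the text, while the genome names (`g1` to `g4`) and the distributions are hypothetical toy data.

```python
from itertools import combinations

# Hypothetical presence/absence data: family -> set of genomes encoding it.
# Family IDs appear in the text; genome names and distributions are made up.
presence = {
    "fam05485": {"g1", "g2", "g3", "g4"},  # mcrA family
    "fam02993": {"g1", "g2", "g3", "g4"},  # another Mcr subunit family
    "fam02372": {"g1", "g2", "g3"},        # conserved hypothetical marker
    "fam06037": {"g1", "g2"},              # an Mtr subunit family
}

def jaccard(a, b):
    """Jaccard index of two genome sets (1.0 means identical distribution)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_cooccurrence(presence):
    """Jaccard index for every unordered pair of families."""
    return {
        (f1, f2): jaccard(presence[f1], presence[f2])
        for f1, f2 in combinations(sorted(presence), 2)
    }

scores = pairwise_cooccurrence(presence)
# Print pairs from most to least similar distribution.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Families with near-identical distributions score close to 1.0, which is the kind of pattern that would place them in the same module under a co-occurrence criterion.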

Functions specific to Thalassoarchaea (Full analysis). Modules 32 and 71, encompassing 199 families, were consistently associated with Thalassoarchaea genomes. A recent comparative genomic analysis of 250 Thalassoarchaea genomes revealed the ecological roles of these archaea in protein and saccharide degradation (Tully 2019). The protein-degrading enzymes found in modules 32 and 71 (several different classes of peptidases and one oligotransporter) include some previously linked to Thalassoarchaea (fam05120, fam05272 and fam01092) (Tully 2019). We also identified two new families of well-conserved peptidases that are seemingly unique to Thalassoarchaea (peptidase M17; PF00883; fam03211 (module 32) and peptidase M6; PF05547; fam00840 (module 71)). As reported by Tully (2019), peptidase S15 (PF02129; fam03321) and peptidase M60-like (PF13402; fam05454) have a narrow distribution within Thalassoarchaea and were therefore assigned to another module (module 738). Interestingly, we identified modules specific to Thalassoarchaea subgroup IIa (module 135, containing 99 families) and Thalassoarchaea subgroup IIb (module 45, containing 39 families). Together, the two modules contain four protein families with calcium-binding domains (fam02857, fam00852 and fam02101 in module 135 and fam01838 in module 45) (Supplementary Figure 8). These proteins may be involved in signaling and regulation of protein-protein interactions in the cell (Michiels et al. 2002).
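The distinction drawn above between families conserved across a lineage and families with a narrow distribution within it can be made concrete with a short sketch. This is an illustration under stated assumptions rather than the procedure used in the study: the function `lineage_prevalence`, the genome names and the lineage assignments are all hypothetical.

```python
def lineage_prevalence(family_genomes, lineage_of, lineage):
    """Return (in_frac, out_frac): the fraction of genomes inside and
    outside `lineage` that encode the family."""
    inside = [g for g, name in lineage_of.items() if name == lineage]
    outside = [g for g, name in lineage_of.items() if name != lineage]
    in_frac = sum(g in family_genomes for g in inside) / len(inside) if inside else 0.0
    out_frac = sum(g in family_genomes for g in outside) / len(outside) if outside else 0.0
    return in_frac, out_frac

# Hypothetical genome-to-lineage assignments (toy data).
lineage_of = {
    "t1": "Thalassoarchaea", "t2": "Thalassoarchaea",
    "t3": "Thalassoarchaea", "t4": "Thalassoarchaea",
    "m1": "Methanobacteria", "c1": "Thermococci",
}

# A family found in every Thalassoarchaea genome and nowhere else.
conserved = lineage_prevalence({"t1", "t2", "t3", "t4"}, lineage_of, "Thalassoarchaea")
# A family restricted to the lineage but present in only one genome.
narrow = lineage_prevalence({"t1"}, lineage_of, "Thalassoarchaea")

print("conserved:", conserved)
print("narrow:", narrow)
```

Both toy families are lineage-specific (absent outside the lineage), but only the first is conserved across it, which is why a narrowly distributed family can end up in a different module from the lineage-wide ones.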

Functions specific to the six genomes (Full analysis). Module 48 contains 42 families that are specific to and conserved in the six genomes of the superphylum Asgard (four genomes of Thorarchaeota and two genomes of Heimdallarchaeota). Of these, 33 lack both KEGG and PFAM functional predictions (Supplementary Dataset - Table S3). The Asgard archaea, which affiliate with eukaryotes in the tree of life (Cox et al. 2008), encode proteins that they share only with eukaryotes (Hartman and Fedorov 2002). We detected six eukaryotic signature protein families (ESPs) in module 48 (Supplementary Figure 13). These ESPs include the ESCRT-III Snf7 family (fam04979) and the ESCRT-II Vps25-like family (fam12378). These two families are part of the ESCRT system, and the genes are found in the same genomic region in the Asgard genomes (Zaremba-Niedzwiedzka et al. 2017). Three cytoskeleton-related families were found in module 48. The first two are the gelsolin-domain protein family (fam03231) (Zaremba-Niedzwiedzka et al. 2017) and fam04420, which matches the PDB sequence of the Loki profilin-1 crystal structure (accession: 5yee) (Akıl and Robinson 2018). Interestingly, the third cytoskeleton-related family (fam15271) shows similarity to integrin beta 4. To the best of our knowledge, integrin genes have not been described in archaea, and fam15271 may represent a new ESP. The genes of fam15271 are always located next to genes (fam00241) in the five Asgard genomes (Figure 4A and Supplementary Dataset - Table S8). This is particularly interesting because recent studies have shed light on the crosstalk between integrin and the microtubule cytoskeleton (LaFlamme et al. 2018). Finally, one family in module 48 (fam18955) is annotated as the DNA excision repair protein ERCC-3 in three Asgard genomes and three Theionarchaea genomes.
The genes neighboring the genes of fam18955 differ between the two lineages (Figure 4B and Supplementary Dataset - Table S8), and the three Asgard sequences share only 20 to 23% protein identity with the three Theionarchaea sequences. These differences may indicate two distinct functions for this family. Fam18955 shows distant homology with the protein RAD25 of Saccharomyces cerevisiae. RAD25 is a DNA helicase required for DNA repair and RNA polymerase II transcription in S. cerevisiae (Guzder et al. 1994). RAD25 is also one of the six subunits of the transcription factor IIH (TFIIH) in S. cerevisiae (Sung et al. 1996). Consistent with the role of RAD25 in S. cerevisiae, the genes of fam18955 are found next to replication factor C small subunit genes in the three Asgard genomes (Figure 4 and Supplementary Dataset - Table S8).

[Figure: phylogenetic tree of the archaeal genomes analyzed (tree scale: 0.1), with a lineage legend and superphylum labels including DPANN, Asgard and TACK. Individual tip labels (ggkbase, genasci, BrettBaker and GCA genome identifiers) are not reproduced here.]
002687355.1 unclassi ed Euryarchaeota GCA 002731905.1 unclassi ed Euryarchaeota GCA 002720095.2 unclassi ed Euryarchaeota GCA 002696315.1 unclassi ed Euryarchaeota GCA 002698725.1 unclassi ed Euryarchaeota GCA 002720275.1 unclassi ed Euryarchaeota GCA 002692465.1 unclassi ed Euryarchaeota GCA 002503395.1 unclassi ed Euryarchaeota GCA 002719475.1 unclassi ed Euryarchaeota GCA 002505935.1 unclassi ed Euryarchaeota GCA 002699425.1 unclassi ed Euryarchaeota GCA 002695145.1 unclassi ed Euryarchaeota GCA 002706615.1 unclassi ed Euryarchaeota GCA 002701145.1 unclassi ed Euryarchaeota GCA 002507425.1 unclassi ed Euryarchaeota GCA 002710015.1 unclassi ed Euryarchaeota GCA 002694245.1 unclassi ed Euryarchaeota GCA 002715725.1 unclassi ed Euryarchaeota GCA 002506705.1 unclassi ed Euryarchaeota GCA 002720115.1 unclassi ed Euryarchaeota GCA 002710205.1 unclassi ed Euryarchaeota GCA 002709705.1 unclassi ed Euryarchaeota GCA 002498185.1 unclassi ed Euryarchaeota GCA 002712495.1 unclassi ed Euryarchaeota GCA 002507165.1 unclassi ed Euryarchaeota GCA 002170075.1 unclassi ed Euryarchaeota GCA 002505155.1 unclassi ed Euryarchaeota GCA 002694585.1 unclassi ed Euryarchaeota GCA 002497645.1 unclassi ed Euryarchaeota GCA 002496635.1 unclassi ed Euryarchaeota GCA 002713665.1 unclassi ed Euryarchaeota GCA 002726845.1 unclassi ed Euryarchaeota GCA 002706065.1 unclassi ed Euryarchaeota GCA 002703075.2 unclassi ed Euryarchaeota GCA 002715245.1 unclassi ed Euryarchaeota GCA 002502215.1 unclassi ed Euryarchaeota GCA 002698145.1 unclassi ed Euryarchaeota GCA 002494645.1 unclassi ed Euryarchaeota GCA 002505405.1 unclassi ed Euryarchaeota GCA 002726495.1 unclassi ed Euryarchaeota GCA 002728035.1 unclassi ed Euryarchaeota GCA 002495645.1 unclassi ed Euryarchaeota GCA 002495105.1 unclassi ed Euryarchaeota GCA 002692625.1 unclassi ed Euryarchaeota GCA 002694525.1 unclassi ed Euryarchaeota GCA 002502295.1 unclassi ed Euryarchaeota GCA 002704515.1 unclassi ed Euryarchaeota GCA 002714305.1 unclassi ed 
Euryarchaeota GCA 002499015.1 unclassi ed Euryarchaeota GCA 000246735.1 unclassi ed Euryarchaeota GCA 002708275.1 unclassi ed Euryarchaeota GCA 002170315.1 unclassi ed Euryarchaeota GCA 002499705.1 unclassi ed Euryarchaeota GCA 002687075.1 unclassi ed Euryarchaeota GCA 002506275.1 unclassi ed Euryarchaeota GCA 002499545.1 unclassi ed Euryarchaeota GCA 002701025.1 unclassi ed Euryarchaeota GCA 002697705.1 unclassi ed Euryarchaeota GCA 002702945.1 unclassi ed Euryarchaeota GCA 002729095.1 unclassi ed Euryarchaeota GCA 002494685.1 unclassi ed Euryarchaeota GCA 002506025.1 unclassi ed Euryarchaeota GCA 002496955.1 unclassi ed Euryarchaeota GCA 002494865.1 unclassi ed Euryarchaeota GCA 002505495.1 unclassi ed Euryarchaeota GCA 002722595.1 unclassi ed Euryarchaeota GCA 002502335.1 unclassi ed Euryarchaeota GCA 002724815.1 unclassi ed Euryarchaeota GCA 002712285.1 unclassi ed Euryarchaeota GCA 002506875.1 unclassi ed Euryarchaeota GCA 002713585.1 unclassi ed Euryarchaeota GCA 002708695.1 unclassi ed Euryarchaeota GCA 002708015.1 unclassi ed Euryarchaeota GCA 002503045.1 unclassi ed Euryarchaeota GCA 002502625.1 unclassi ed Euryarchaeota GCA 002718215.1 unclassi ed Euryarchaeota GCA 002505455.1 unclassi ed Euryarchaeota GCA 002497295.1 unclassi ed Euryarchaeota GCA 002504845.1 unclassi ed Euryarchaeota GCA 002717835.1 unclassi ed Euryarchaeota GCA 002497125.1 unclassi ed Euryarchaeota GCA 002507175.1 unclassi ed Euryarchaeota GCA 002495675.1 unclassi ed Euryarchaeota GCA 002507345.1 unclassi ed Euryarchaeota GCA 002498525.1 unclassi ed Euryarchaeota GCA 002498725.1 unclassi ed Euryarchaeota GCA 002699665.1 unclassi ed Euryarchaeota GCA 002499365.1 unclassi ed Euryarchaeota GCA 002708385.1 unclassi ed Euryarchaeota GCA 002502285.1 unclassi ed Euryarchaeota GCA 002457555.1 unclassi ed Euryarchaeota GCA 002497025.1 unclassi ed Euryarchaeota GCA 002499785.1 unclassi ed Euryarchaeota GCA 002494975.1 unclassi ed Euryarchaeota GCA 002497245.1 unclassi ed Euryarchaeota GCA 
002686525.1 unclassi ed Euryarchaeota GCA 002494965.1 unclassi ed Euryarchaeota GCA 002496435.1 unclassi ed Euryarchaeota GCA 002495525.1 unclassi ed Euryarchaeota GCA 002726275.1 unclassi ed Euryarchaeota GCA 002504435.1 unclassi ed Euryarchaeota GCA 002499715.1 unclassi ed Euryarchaeota GCA 002718995.2 unclassi ed Euryarchaeota GCA 002730095.1 unclassi ed Euryarchaeota GCA 002505775.1 unclassi ed Euryarchaeota GCA 002499265.1 unclassi ed Euryarchaeota GCA 002496725.1 unclassi ed Euryarchaeota GCA 002704585.1 unclassi ed Euryarchaeota GCA 002725245.1 unclassi ed Euryarchaeota GCA 002499865.1 unclassi ed Euryarchaeota GCA 002501805.1 unclassi ed Euryarchaeota GCA 002502095.1 unclassi ed Euryarchaeota GCA 002499325.1 unclassi ed Euryarchaeota GCA 003193925.1 unclassi ed Euryarchaeota GCA 002721855.1 unclassi ed Euryarchaeota GCA 002506005.1 unclassi ed Euryarchaeota GCA 002727275.1 unclassi ed Euryarchaeota GCA 002496735.1 unclassi ed Euryarchaeota GCA 002171445.1 unclassi ed Euryarchaeota GCA 002719615.1 unclassi ed Euryarchaeota GCA 002698465.1 unclassi ed Euryarchaeota GCA 002712575.1 unclassi ed Euryarchaeota GCA 001629205.1 unclassi ed Euryarchaeota GCA 002698805.1 unclassi ed Euryarchaeota GCA 002497545.1 unclassi ed Euryarchaeota GCA 002725315.1 unclassi ed Euryarchaeota GCA 002728015.1 unclassi ed Euryarchaeota GCA 002506755.1 unclassi ed Euryarchaeota GCA 002719965.1 unclassi ed Euryarchaeota GCA 002716085.1 unclassi ed Euryarchaeota GCA 002720055.1 unclassi ed Euryarchaeota GCA 002689165.1 unclassi ed Euryarchaeota GCA 001629245.1 unclassi ed Euryarchaeota GCA 001628485.1 unclassi ed Euryarchaeota GCA 002683155.1 unclassi ed Euryarchaeota GCA 002723045.1 unclassi ed Euryarchaeota GCA 002457155.1 unclassi ed Euryarchaeota GCA 002727815.1 unclassi ed Euryarchaeota GCA 002170935.1 unclassi ed Euryarchaeota GCA 002011165.1 unclassi ed Euryarchaeota GCA 002010305.1 Archaeoglobi Euryarchaeota GCA 000025285.1 Archaeoglobi GCA 002010285.1 Archaeoglobi GCA 
000008665.1 Archaeoglobi GCA 002507545.1 Archaeoglobi GCA 000025505.1 Archaeoglobi GCA 000789255.1 Archaeoglobi GCA 002494725.1 Archaeoglobi GCA 001006045.1 Archaeoglobi GCA 002010215.1 Archaeoglobi GCA 000194625.1 Archaeoglobi GCA 002010045.1 Archaeoglobi GCA 002011215.1 Archaeoglobi GCA 000385565.1 Archaeoglobi GCA 002011235.1 Archaeoglobi GCA 002010195.1 Archaeoglobi GCA 002494625.1 Archaeoglobi GCA 002503135.1 Methanomicrobia GCA 002503105.1 Methanomicrobia GCA 900025055.1 Methanomicrobia GCA 000304355.2 Methanomicrobia GCA 001571405.1 Methanomicrobia GCA 002499455.1 Methanomicrobia GCA 002508805.1 Methanomicrobia GCA 002506585.1 Methanomicrobia GCA 002501655.1 Methanomicrobia GCA 002497965.1 Methanomicrobia GCA 002503885.1 Methanomicrobia GCA 002839645.1 Methanomicrobia GCA 002839575.1 Methanomicrobia GCA 002498065.1 Methanomicrobia GCA 900095385.1 Methanomicrobia GCA 001602375.1 Methanomicrobia GCA 001017125.1 Methanomicrobia GCA 000691865.1 Methanomicrobia GCA 000015825.1 Methanomicrobia GCA 001412355.1 Methanomicrobia GCA 000711215.1 Methanomicrobia GCA 000243255.1 Methanomicrobia GCA 000147875.1 Methanomicrobia GCA 000784355.1 Methanomicrobia GCA 000275865.1 Methanomicrobia GCA 002498315.1 Methanomicrobia GCA 001571385.1 Methanomicrobia GCA 001508695.1 Methanomicrobia GCA 002496395.1 Methanomicrobia GCA 001940805.1 Methanomicrobia GCA 002506085.1 Methanomicrobia GCA 000015765.1 Methanomicrobia GCA 002498375.1 Methanomicrobia GCA 002287215.1 Methanomicrobia GCA 003173355.1 Methanomicrobia GCA 003251955.1 Methanomicrobia GCA 000013445.1 Methanomicrobia GCA 003173335.1 Methanomicrobia GCA 000021965.1 Methanomicrobia GCA 002503825.1 Methanomicrobia GCA 002499905.1 Methanomicrobia GCA 000235685.3 Methanomicrobia GCA 002496855.1 Methanomicrobia GCA 000017625.1 Methanomicrobia GCA 003170075.1 Methanomicrobia GCA 003157895.1 Methanomicrobia GCA 002506785.1 Methanomicrobia GCA 002839605.1 Methanomicrobia GCA 000327485.1 Methanomicrobia GCA 002839675.1 
Methanomicrobia GCA 003153835.1 Methanomicrobia GCA 002495065.1 Methanomicrobia GCA 003141335.1 Methanomicrobia GCA 002495055.1 Methanomicrobia GCA 002495885.1 Methanomicrobia GCA 000063445.1 Methanomicrobia GCA 000251105.1 Methanomicrobia GCA 000011005.1 Methanomicrobia GCA 001766825.1 Methanomicrobia GCA 001766815.1 Methanomicrobia GCA 003170935.1 unclassi ed Euryarchaeota GCA 003171015.1 unclassi ed Euryarchaeota GCA 002509385.1 Methanomicrobia GCA 002010075.1 Methanomicrobia GCA 000711905.1 Methanomicrobia GCA 000014945.1 Methanomicrobia GCA 002496465.1 Methanomicrobia GCA 002256595.1 Methanomicrobia GCA 002506335.1 Methanomicrobia GCA 003162615.1 Methanomicrobia GCA 002501765.1 Methanomicrobia GCA 002506535.1 Methanomicrobia GCA 001508615.1 Methanomicrobia GCA 002503085.1 Methanomicrobia GCA 000235565.1 Methanomicrobia GCA 001602645.1 Methanomicrobia GCA 002839545.1 Methanomicrobia GCA 002634415.1 Methanomicrobia GCA 000685155.1 Methanomicrobia GCA 002926195.1 Methanomicrobia GCA 000970045.1 Methanomicrobia GCA 000970005.1 Methanomicrobia GCA 000979425.1 Methanomicrobia GCA 000970265.1 Methanomicrobia GCA 000970145.1 Methanomicrobia GCA 002507065.1 Methanomicrobia GCA 000970285.1 Methanomicrobia GCA 001714685.2 Methanomicrobia GCA 003164755.1 Methanomicrobia GCA 002494845.1 Methanomicrobia GCA 000969885.1 Methanomicrobia GCA 001304615.1 Methanomicrobia GCA 002495725.1 Methanomicrobia GCA 000970305.1 Methanomicrobia GCA 002287235.1 Methanomicrobia GCA 002505485.1 Methanomicrobia GCA 000969945.1 Methanomicrobia GCA 000969985.1 Methanomicrobia GCA 000970025.1 Methanomicrobia GCA 000217995.1 Methanomicrobia GCA 000196655.1 Methanomicrobia GCA 900177455.1 Methanomicrobia GCA 900106905.1 Methanomicrobia GCA 000025865.1 Methanomicrobia GCA 900215215.1 Methanomicrobia GCA 000013725.1 Methanomicrobia GCA 000970325.1 Methanomicrobia GCA 000765475.1 Methanomicrobia GCA 900111645.1 Methanomicrobia GCA 000328665.1 Methanomicrobia GCA 001896725.1 Methanomicrobia GCA 
002508425.1 Methanomicrobia GCA 001577565.1 Methanomicrobia GCA 002508635.1 Methanomicrobia GCA 000306725.1 Methanomicrobia GCA 002501695.1 Methanomicrobia GCA 002243045.1 Methanomicrobia GCA 900114835.1 Methanomicrobia GCA 000504205.1 Methanomicrobia GCA 900100715.1 Methanomicrobia GCA 002153915.1 Methanonatronarchaeia GCA 001914405.1 Methanonatronarchaeia GCA 002503845.1 unclassi ed Euryarchaeota GCA 000485535.1 Halobacteria GCA 001593955.1 Halobacteria GCA 000474235.1 Halobacteria GCA 001305655.1 Halobacteria GCA 001886955.1 Halobacteria GCA 900110535.1 Halobacteria GCA 000069025.1 Halobacteria GCA 000230955.3 Halobacteria GCA 001485535.1 Halobacteria GCA 900005775.1 Halobacteria GCA 000496195.1 uncultured archaeon A07HB70 GCA 900107665.1 Halobacteria GCA 000427685.1 Halobacteria GCA 003342695.1 Halobacteria GCA 003336245.1 Halobacteria GCA 002906575.1 Halobacteria GCA 002844195.1 Halobacteria GCA 001469955.1 Halobacteria GCA 900103715.1 Halobacteria GCA 900110465.1 Halobacteria GCA 900114455.1 Halobacteria GCA 900109695.1 Halobacteria GCA 000336755.1 Halobacteria GCA 001469865.1 Halobacteria GCA 000337815.1 Halobacteria GCA 000685635.1 Halobacteria GCA 001368915.1 Halobacteria GCA 001469875.2 Halobacteria GCA 001190965.1 Halobacteria GCA 001482285.1 Halobacteria GCA 000025685.1 Halobacteria GCA 000337835.1 Halobacteria GCA 900100875.1 Halobacteria GCA 900113245.1 Halobacteria GCA 900112175.1 Halobacteria GCA 000337855.1 Halobacteria GCA 000337095.1 Halobacteria GCA 900115785.1 Halobacteria GCA 000739575.1 Halobacteria GCA 900107195.1 Halobacteria GCA 900108165.1 Halobacteria GCA 000416005.1 Halobacteria GCA 000415985.1 Halobacteria GCA 000237865.1 Halobacteria GCA 000415965.1 Halobacteria GCA 000495475.1 Halobacteria GCA 900129775.1 Halobacteria GCA 003020945.1 Halobacteria GCA 000224475.1 unclassi ed Archaea miscellaneous GCA 900115675.1 Halobacteria GCA 000739555.1 Halobacteria GCA 002025255.1 Halobacteria GCA 001282785.1 Halobacteria GCA 900109065.1 
Halobacteria GCA 001462205.1 Halobacteria GCA 900108505.1 Halobacteria GCA 002355635.1 Halobacteria GCA 900107205.1 Halobacteria GCA 900188065.1 Halobacteria GCA 000496175.1 uncultured archaeon A07HR67 GCA 001542905.1 Halobacteria GCA 002252985.1 Halobacteria GCA 000337035.1 Halobacteria GCA 900188075.1 Halobacteria GCA 003023405.1 Halobacteria GCA 000337415.1 Halobacteria GCA 000337075.1 Halobacteria GCA 001280455.1 Halobacteria GCA 000336875.1 Halobacteria GCA 000337015.1 Halobacteria GCA 900111935.1 Halobacteria GCA 002355655.1 Halobacteria GCA 000296615.1 Halobacteria GCA 002252895.1 Halobacteria GCA 000336995.1 Halobacteria GCA 001564205.1 Candidatus Nanohaloarchaeota GCA 000337355.1 Halobacteria GCA 002727125.1 Halobacteria GCA 000337915.1 Halobacteria GCA 000337375.1 Halobacteria GCA 002215305.1 Halobacteria GCA 000746205.1 Halobacteria GCA 000739595.1 Halobacteria GCA 002286985.1 Halobacteria GCA 900103505.1 Halobacteria GCA 000328525.1 Halobacteria GCA 000337515.1 Halobacteria GCA 001564275.1 Halobacteria GCA 001564285.1 Halobacteria GCA 000517625.1 Halobacteria GCA 900116205.1 Halobacteria GCA 000691505.2 Halobacteria GCA 003430825.1 Halobacteria GCA 002156705.1 Halobacteria GCA 000455345.1 Halobacteria GCA 000337215.1 Halobacteria GCA 000337675.1 Halobacteria GCA 000337695.1 Halobacteria GCA 000328685.1 Halobacteria GCA 000337495.1 Halobacteria GCA 000025325.1 Halobacteria GCA 900156475.1 Halobacteria GCA 000337735.1 Halobacteria GCA 000337715.1 Halobacteria GCA 000383975.1 Halobacteria GCA 900100335.1 Halobacteria GCA 900108095.1 Halobacteria GCA 900156445.1 Halobacteria GCA 002177135.1 Halobacteria GCA 000337535.1 Halobacteria GCA 000337595.1 Halobacteria GCA 000337555.1 Halobacteria GCA 000337575.1 Halobacteria GCA 000025625.1 Halobacteria GCA 000337135.1 Halobacteria GCA 001861355.1 Halobacteria GCA 000217715.1 Halobacteria GCA 000455365.1 Halobacteria GCA 900114025.1 Halobacteria GCA 900104065.1 Halobacteria GCA 000337895.1 Halobacteria GCA 
900112205.1 Halobacteria GCA 000336655.1 Halobacteria GCA 900110455.1 Halobacteria GCA 900111485.1 Halobacteria GCA 000337475.1 Halobacteria GCA 900110865.1 Halobacteria GCA 002572525.1 Halobacteria GCA 000710605.1 Halobacteria GCA 000690595.2 Halobacteria GCA 001953745.1 Halobacteria GCA 000337195.1 Halobacteria GCA 002494345.1 Halobacteria GCA 000493245.1 Halobacteria GCA 000337615.1 Halobacteria GCA 000337155.1 Halobacteria GCA 003023565.1 Halobacteria GCA 900215575.1 Halobacteria GCA 003369835.1 Halobacteria GCA 003382685.1 Halobacteria GCA 003022025.1 Halobacteria GCA 003298465.1 Halobacteria GCA 900142335.1 Halobacteria GCA 001625445.1 Halobacteria GCA 900156425.1 Halobacteria GCA 000710615.1 Halobacteria GCA 003021085.1 Halobacteria GCA 000403645.1 Halobacteria GCA 003022865.1 Halobacteria GCA 003023195.1 Halobacteria GCA 003021705.1 Halobacteria GCA 003022925.1 Halobacteria GCA 003021235.1 Halobacteria GCA 003021045.1 Halobacteria GCA 000336675.1 Halobacteria GCA 000755245.1 Halobacteria GCA 000336695.1 Halobacteria GCA 000336715.1 Halobacteria GCA 000336935.1 Halobacteria GCA 000336915.1 Halobacteria GCA 000334895.1 Halobacteria GCA 003021145.1 Halobacteria GCA 003023525.1 Halobacteria GCA 003021975.1 Halobacteria GCA 000026045.1 Halobacteria GCA 000591055.1 Halobacteria GCA 003021005.1 Halobacteria GCA 003020965.1 Halobacteria GCA 003022005.1 Halobacteria GCA 003020985.1 Halobacteria GCA 001950595.1 Halobacteria GCA 003023215.1 Halobacteria GCA 003021175.1 Halobacteria GCA 003023185.1 Halobacteria GCA 001989615.1 Halobacteria GCA 900102305.1 Halobacteria GCA 900110215.1 Halobacteria GCA 900114435.1 Halobacteria GCA 000023965.1 Halobacteria GCA 003020925.1 Halobacteria GCA 003021755.1 Halobacteria GCA 001280425.1 Halobacteria GCA 900106715.1 Halobacteria GCA 000336615.1 Halobacteria GCA 000337755.1 Halobacteria GCA 001485575.1 Halobacteria GCA 001647155.1 Halobacteria GCA 000336895.1 Halobacteria GCA 000336635.1 Halobacteria GCA 000470655.1 Halobacteria 
GCA 000023945.1 Halobacteria GCA 003058365.1 Halobacteria GCA 000755225.1 Halobacteria GCA 003022045.1 Halobacteria GCA 000337455.1 Halobacteria GCA 003023785.1 Halobacteria GCA 003021655.1 Halobacteria GCA 003021305.1 Halobacteria GCA 003022745.1 Halobacteria GCA 003021015.1 Halobacteria GCA 003022725.1 Halobacteria GCA 001564135.1 Halobacteria GCA 000416085.1 Halobacteria GCA 003021285.1 Halobacteria GCA 900100385.1 Halobacteria Supplementary Figure S1. Taxonomic assessment and distribution of the 1,179 representative genomes. Maximum-likelihood phylogeny based on a 14-ribosomal-protein concatenated alignment using the LG plus gamma model of evolution. Scale bar indicates the average substitutions per site.

[Figure S2: pipeline diagram.]

Step 1. Sequences vs sequences comparison (MMseqs2): clustering sequences into subfamilies. Input: 2,336,157 protein sequences from 1,179 MAGs. All-vs-all comparison (E-value < 0.001, sensitivity: 7.5, coverage > 0.50) followed by greedy set cover clustering. Output: 98,243 subfamilies (>= 2 protein sequences) comprising 2,134,968 protein sequences; 201,189 sequences with no hits.

Step 2. HMM vs HMM comparison (HHpred suite): merging distant homologous subfamilies into families. HMM profiles building (98,243 HMMs), HMM-HMM comparison with HHblits (coverage > 0.50, probabilities > 0.95) and MCL clustering (-I 2). Output: 10,866 families (>= 5 genomes) comprising 2,075,863 protein sequences.

Supplementary Figure S2. The protein clustering pipeline used in the study. MAGs: metagenome-assembled genomes.
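
The greedy set cover step named in the pipeline can be illustrated with a minimal Python sketch. It is a simplified illustration of the general technique, not the MMseqs2 implementation: the input dictionary `hits` and the sequence names are hypothetical, and the hits are assumed to have already passed the E-value and coverage thresholds. At each round the sequence that covers the most still-unassigned sequences becomes a cluster representative; clusters with at least two members are then kept as subfamilies.

```python
def greedy_set_cover(neighbors):
    """Cluster sequences by repeatedly choosing the sequence that covers the
    largest number of still-unassigned sequences (itself plus its hits)."""
    unassigned = set(neighbors)
    clusters = {}
    while unassigned:
        def coverage(seq):
            # Number of unassigned sequences this candidate would absorb.
            return len((neighbors[seq] | {seq}) & unassigned)

        # Iterating in sorted order makes ties resolve to the smallest name.
        representative = max(sorted(unassigned), key=coverage)
        members = (neighbors[representative] | {representative}) & unassigned
        clusters[representative] = members
        unassigned -= members
    return clusters


# Hypothetical toy input: each sequence maps to the set of sequences it hits.
hits = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a"},
    "d": {"e"},
    "e": {"d"},
    "f": set(),
}

clusters = greedy_set_cover(hits)

# Keep clusters with at least two sequences, mirroring the ">= 2 protein
# sequences" criterion; the remaining sequences are singletons.
subfamilies = {rep: m for rep, m in clusters.items() if len(m) >= 2}
singletons = sorted(rep for rep, m in clusters.items() if len(m) < 2)
```

In this toy example, sequences a, b and c form one subfamily, d and e form another, and f remains a singleton.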

Supplementary Figure S3. Quality assessment of the protein clustering. A. Consistency between the KEGG annotations and the protein families. For each of the 6,482 annotations, we reported the family that contains the highest percentage of protein members annotated with that KEGG annotation. Each dot represents a KEGG annotation; the y-axis shows this highest percentage. B. Contamination of the protein families. For each family containing proteins with KEGG annotations, we computed the percentage of proteins whose KEGG annotation differs from the most abundant one; this percentage defines the annotation admixture (y-axis). Each dot represents a protein family.
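
The two quality measures in this caption can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used in the study: the family names and KEGG identifiers are hypothetical, the admixture denominator is assumed to be the annotated proteins of a family only, and the consistency percentage is assumed to be the share of all proteins carrying a given annotation that fall into a single family.

```python
from collections import Counter

# Hypothetical toy data: family -> KEGG annotation of each member protein
# (None marks a protein without a KEGG annotation).
families = {
    "famA": ["K1", "K1", "K2", None],
    "famB": ["K1", "K3", "K3"],
}


def annotation_admixture(annotations):
    """Percentage of annotated proteins whose annotation differs from the
    most abundant annotation of the family (assumed denominator: annotated
    proteins only)."""
    counts = Counter(a for a in annotations if a is not None)
    total = sum(counts.values())
    if total == 0:
        return None  # family without any annotated protein
    most_abundant = counts.most_common(1)[0][1]
    return 100.0 * (total - most_abundant) / total


def annotation_consistency(families):
    """For each annotation, the family holding the largest share of the
    proteins carrying it, together with that share in percent."""
    per_annotation = {}
    for family, annotations in families.items():
        for annotation in annotations:
            if annotation is not None:
                per_annotation.setdefault(annotation, Counter())[family] += 1
    best = {}
    for annotation, counts in per_annotation.items():
        family, count = counts.most_common(1)[0]
        best[annotation] = (family, 100.0 * count / sum(counts.values()))
    return best


admixture = {fam: annotation_admixture(a) for fam, a in families.items()}
consistency = annotation_consistency(families)
```

A consistency of 100% and an admixture of 0% would correspond to a one-to-one match between an annotation and a family.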

Supplementary Figure S4. Correlation plot of four trees: three obtained from different hierarchical clustering methods (complete linkage, average linkage and single linkage) and the maximum-likelihood tree inferred with RAxML ("Phylogenetic tree"). Correlations are based on cophenetic distance matrices between pairs of trees. Positive correlations are displayed in blue and negative correlations in red. Color intensity is proportional to the correlation coefficient.
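
A correlation between two trees based on cophenetic distances can be sketched in plain Python as below. The sketch is illustrative only and is not the software used for the figure: the four leaves and their distances are hypothetical, the clustering is a naive agglomerative procedure, and the Pearson coefficient is assumed as the correlation measure. The cophenetic distance of two leaves is the height at which they are first joined in the same cluster; for a phylogenetic tree, a matrix of leaf-to-leaf tree distances could be passed to `cophenetic_correlation` in the same way.

```python
from itertools import combinations
from math import sqrt


def pair(i, j):
    """Order-independent key for a pair of leaves."""
    return (i, j) if i < j else (j, i)


def cophenetic(dist, method):
    """Naive agglomerative clustering; returns the cophenetic distance
    (height of the first common cluster) for every pair of leaves."""
    leaves = sorted({leaf for key in dist for leaf in key})
    clusters = [frozenset([leaf]) for leaf in leaves]
    coph = {}

    def linkage(a, b):
        d = [dist[pair(i, j)] for i in a for j in b]
        if method == "single":
            return min(d)
        if method == "complete":
            return max(d)
        if method == "average":
            return sum(d) / len(d)
        raise ValueError(f"unknown linkage method: {method}")

    while len(clusters) > 1:
        # Merge the two closest clusters under the chosen linkage.
        a, b = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        height = linkage(a, b)
        for i in a:
            for j in b:
                coph[pair(i, j)] = height
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return coph


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cophenetic_correlation(coph_a, coph_b):
    """Correlation between two cophenetic distance matrices."""
    keys = sorted(coph_a)
    return pearson([coph_a[k] for k in keys], [coph_b[k] for k in keys])


# Hypothetical distances between four leaves.
toy = {
    ("A", "B"): 1.0, ("A", "C"): 4.0, ("A", "D"): 5.0,
    ("B", "C"): 3.0, ("B", "D"): 6.0, ("C", "D"): 2.0,
}

single = cophenetic(toy, "single")
complete = cophenetic(toy, "complete")
average = cophenetic(toy, "average")
r_single_complete = cophenetic_correlation(single, complete)
```

On this toy input the three linkage methods produce the same topology but different merge heights, so the correlation between the single-linkage and complete-linkage trees is high without being exactly 1.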

Supplementary Figure S5. The distribution of 10,866 widely distributed protein families (columns) in 1,179 representative genomes (rows) from Archaea. Families belonging to the 19 modules discussed in the study are colored. Data are clustered based on presence (black) and absence (white) profiles (Jaccard distance, complete linkage).
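
The Jaccard distance used to compare presence/absence profiles can be illustrated with a short Python sketch. The genome and family names below are hypothetical, and returning 0 for two empty profiles is a convention assumed here. Each genome is represented by the set of families present in it, and the distance is one minus the ratio of shared families to the families found in either genome.

```python
def jaccard_distance(profile_a, profile_b):
    """Jaccard distance between two presence/absence profiles given as sets
    of the families present in each genome."""
    union = profile_a | profile_b
    if not union:
        return 0.0  # assumed convention: two empty profiles are identical
    return 1.0 - len(profile_a & profile_b) / len(union)


# Hypothetical genomes and the protein families detected in each of them.
genomes = {
    "genome1": {"fam1", "fam2", "fam3"},
    "genome2": {"fam2", "fam3", "fam4"},
    "genome3": {"fam5"},
}

d12 = jaccard_distance(genomes["genome1"], genomes["genome2"])
d13 = jaccard_distance(genomes["genome1"], genomes["genome3"])
```

Here genome1 and genome2 share two of four families (distance 0.5), whereas genome1 and genome3 share none (distance 1.0). Shared absences do not contribute to the distance, which suits sparse presence/absence matrices.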

Supplementary Figure S6. Number of genomes per family in 14 selected modules. The x-axis represents the families and the y-axis the number of genomes. For each family, the number of genomes of Methanomicrobia, Methanobacteria, Halobacteria, Crenarchaeota, Thalassoarchaea, Thermococci, Archaeoglobi, Thaumarchaeota, Thermoplasmata and Methanomassiliicoccales is shown by a colored dot.

[Figure S7: phylogenetic tree (tree scale: 0.1) with a presence/absence matrix of protein families. Column groups: Methyl-coenzyme M reductase; tetrahydromethanopterin S-methyltransferase; mono-, di-, tri-methylamine and methanol methyltransferases; Energy-converting hydrogenase A; Energy-converting hydrogenase B; Methanogenesis markers. Lineage labels: DPANN, ASGARD, Crenarchaeota, Verstraetearchaeota, Thaumarchaeota, Bathyarchaeota, Hydrothermarchaeota, Methanococci, Thermococci, Theionarchaea, Methanofastidiosales, Hadesarchaea, Methanobacteria, Izemarchaea, Pontarchaea, Marine Group II, Thermoplasmata, Methanomassiliicoccales, Archaeoglobi, Methanonatronarchaeia, Halobacteria, Methanomicrobia, Methanocellales, Syntrophoarchaeales, Methanoflorentaceae. Tip labels are GenBank assembly accessions or genome bin names with their taxonomic assignments.]

Supplementary Figure S7. Presence and absence of 37 families of modules 65, 72, 129 and 184 in genomes of archaea. Scale bar indicates the average substitutions per site.

[Figure: column labels include Peptidase M14, Peptidase M28, Oligopeptide transporter, Peptidase M17, Peptidase M6, Peptidase S15, Peptidase M60-like, Calmodulin (three columns) and Calcium-binding protein CML. Lineage labels: Pontarchaea (Marine Group III), Thalassoarchaea (Marine Group II). Tip labels are GenBank assembly accessions with their taxonomic assignments.]
MGIIa GCA 002728035.1 unclassi ed Euryarchaeota GCA 002495645.1 unclassi ed Euryarchaeota GCA 002495105.1 unclassi ed Euryarchaeota GCA 002692625.1 unclassi ed Euryarchaeota GCA 002694525.1 unclassi ed Euryarchaeota GCA 002502295.1 unclassi ed Euryarchaeota GCA 002704515.1 unclassi ed Euryarchaeota GCA 002714305.1 unclassi ed Euryarchaeota GCA 002499015.1 unclassi ed Euryarchaeota GCA 000246735.1 unclassi ed Euryarchaeota GCA 002708275.1 unclassi ed Euryarchaeota GCA 002170315.1 unclassi ed Euryarchaeota GCA 002499705.1 unclassi ed Euryarchaeota GCA 002687075.1 unclassi ed Euryarchaeota GCA 002506275.1 unclassi ed Euryarchaeota GCA 002499545.1 unclassi ed Euryarchaeota GCA 002701025.1 unclassi ed Euryarchaeota GCA 002697705.1 unclassi ed Euryarchaeota GCA 002702945.1 unclassi ed Euryarchaeota GCA 002729095.1 unclassi ed Euryarchaeota GCA 002494685.1 unclassi ed Euryarchaeota GCA 002506025.1 unclassi ed Euryarchaeota GCA 002496955.1 unclassi ed Euryarchaeota GCA 002494865.1 unclassi ed Euryarchaeota GCA 002505495.1 unclassi ed Euryarchaeota GCA 002722595.1 unclassi ed Euryarchaeota GCA 002502335.1 unclassi ed Euryarchaeota GCA 002724815.1 unclassi ed Euryarchaeota GCA 002712285.1 unclassi ed Euryarchaeota GCA 002506875.1 unclassi ed Euryarchaeota GCA 002713585.1 unclassi ed Euryarchaeota GCA 002708695.1 unclassi ed Euryarchaeota GCA 002708015.1 unclassi ed Euryarchaeota GCA 002503045.1 unclassi ed Euryarchaeota GCA 002502625.1 unclassi ed Euryarchaeota GCA 002718215.1 unclassi ed Euryarchaeota GCA 002505455.1 unclassi ed Euryarchaeota GCA 002497295.1 unclassi ed Euryarchaeota GCA 002504845.1 unclassi ed Euryarchaeota GCA 002717835.1 unclassi ed Euryarchaeota GCA 002497125.1 unclassi ed Euryarchaeota GCA 002507175.1 unclassi ed Euryarchaeota GCA 002495675.1 unclassi ed Euryarchaeota GCA 002507345.1 unclassi ed Euryarchaeota GCA 002498525.1 unclassi ed Euryarchaeota GCA 002498725.1 unclassi ed Euryarchaeota GCA 002699665.1 unclassi ed Euryarchaeota GCA 002499365.1 
unclassi ed Euryarchaeota GCA 002708385.1 unclassi ed Euryarchaeota GCA 002502285.1 unclassi ed Euryarchaeota GCA 002457555.1 unclassi ed Euryarchaeota GCA 002497025.1 unclassi ed Euryarchaeota GCA 002499785.1 unclassi ed Euryarchaeota GCA 002494975.1 unclassi ed Euryarchaeota GCA 002497245.1 unclassi ed Euryarchaeota GCA 002686525.1 unclassi ed Euryarchaeota GCA 002494965.1 unclassi ed Euryarchaeota GCA 002496435.1 unclassi ed Euryarchaeota GCA 002495525.1 unclassi ed Euryarchaeota GCA 002726275.1 unclassi ed Euryarchaeota GCA 002504435.1 unclassi ed Euryarchaeota Thalassoarchaea GCA 002499715.1 unclassi ed Euryarchaeota GCA 002718995.2 unclassi ed Euryarchaeota (Marine Group II) GCA 002730095.1 unclassi ed Euryarchaeota GCA 002505775.1 unclassi ed Euryarchaeota MGIIb GCA 002499265.1 unclassi ed Euryarchaeota GCA 002496725.1 unclassi ed Euryarchaeota GCA 002704585.1 unclassi ed Euryarchaeota GCA 002725245.1 unclassi ed Euryarchaeota GCA 002499865.1 unclassi ed Euryarchaeota GCA 002501805.1 unclassi ed Euryarchaeota GCA 002502095.1 unclassi ed Euryarchaeota GCA 002499325.1 unclassi ed Euryarchaeota GCA 003193925.1 unclassi ed Euryarchaeota GCA 002721855.1 unclassi ed Euryarchaeota GCA 002506005.1 unclassi ed Euryarchaeota GCA 002727275.1 unclassi ed Euryarchaeota GCA 002496735.1 unclassi ed Euryarchaeota GCA 002171445.1 unclassi ed Euryarchaeota GCA 002719615.1 unclassi ed Euryarchaeota GCA 002698465.1 unclassi ed Euryarchaeota GCA 002712575.1 unclassi ed Euryarchaeota GCA 001629205.1 unclassi ed Euryarchaeota GCA 002698805.1 unclassi ed Euryarchaeota GCA 002497545.1 unclassi ed Euryarchaeota GCA 002725315.1 unclassi ed Euryarchaeota GCA 002728015.1 unclassi ed Euryarchaeota GCA 002506755.1 unclassi ed Euryarchaeota GCA 002719965.1 unclassi ed Euryarchaeota GCA 002716085.1 unclassi ed Euryarchaeota GCA 002720055.1 unclassi ed Euryarchaeota GCA 002689165.1 unclassi ed Euryarchaeota Tree scale: 0.1 GCA 001629245.1 unclassi ed Euryarchaeota GCA 001628485.1 unclassi ed 
Euryarchaeota GCA 002683155.1 unclassi ed Euryarchaeota GCA 002723045.1 unclassi ed Euryarchaeota GCA 002457155.1 unclassi ed Euryarchaeota GCA 002727815.1 unclassi ed Euryarchaeota GCA 002170935.1 unclassi ed Euryarchaeota

Supplementary Figure S8. Presence and absence of 11 families of modules 32, 45, 71, 135 in the genomes of Thalassoarchaea. Scale bar indicates the average substitutions per site.

[Figure S9 image: phylogenetic tree of Crenarchaeota genomes with presence/absence columns labeled Cmr3, Cmr5, Cmr2, Cmr1, CsaX, Csa5, Cas3-HD, Cas5, RpoG/Rpb8, Rpo13, Thermopsin (two families), PBP2, ThermoDBP and PriX. Tree scale: 0.1.]

Supplementary Figure S9. Presence and absence of 15 families of modules 2 and 66 in genomes of Crenarchaeota. Scale bar indicates the average substitutions per site.

[Figure S10 image: phylogenetic tree of Thaumarchaeota genomes (Groups I.1a, I.1b, I.1c and I.1d, with an outgroup that includes Candidatus Geothermarchaeota and unclassified Archaea) with presence/absence columns labeled amoA, amoB, amoC and amoX. Tree scale: 0.1.]

Supplementary Figure S10. Presence and absence of 4 families of module 142 in genomes of Thaumarchaeota. Scale bar indicates the average substitutions per site.

[Figure S11 image: phylogenetic tree of Thermococci genomes with presence/absence columns whose labels include 5-oxoprolinase complex, ribosomal protein L41e, valosin-containing protein-like, S component of an energy-coupling factor transporter, sulfhydrogenase subunit, hydrogen gas-evolving membrane-bound hydrogenases, PCNA, and Na+-pumping methylmalonyl-coenzyme A (CoA) decarboxylase subunits alpha and gamma. Tree scale: 0.1.]

Supplementary Figure S11. Presence and absence of 14 families of module 8 in genomes of Thermococci. Scale bar indicates the average substitutions per site.

[Figure S12 image: phylogenetic tree of Halobacteria genomes, with Methanonatronarchaeia genomes also shown, and presence/absence columns whose labels include cytochrome bd oxidase subunits I and II, divergent heme-copper oxygen reductase subunit I, cytochrome c biogenesis factor, catalase-peroxidase, cytochrome C and quinol oxidase polypeptide I, gas vesicle protein families, and bacterioruberin 2'',3''-hydratase. Tree scale: 0.1.]

Supplementary Figure S12. Presence and absence of 11 families of modules 13 and 108 in genomes of Halobacteria. Scale bar indicates the average substitutions per site.

[Figure S13 image: phylogenetic tree of Candidatus Heimdallarchaeota and Candidatus Thorarchaeota genomes with presence/absence columns labeled ESCRT-III Snf7-domain, ESCRT-II Vps25-like, Gelsolin, Loki profilin-1, Integrin-like and TFIIH. Tree scale: 0.1.]

Supplementary Figure S13. Presence and absence of 6 families of module 48 in genomes of Asgard. Scale bar indicates the average substitutions per site.

[Figure S14 image: y-axis, protein length (amino acids), from 0 to 1000; x-axis categories: Asgard, Halobacteria, Thermococci, Crenarchaeota, Archaeoglobi, Module 1 (core), Thalassoarchaea, Thaumarchaeota, Thermoplasmata, Methanomicrobia, Methanobacteria and Methanomassiliicoccales.]

Supplementary Figure S14. The length distribution of hypothetical proteins (in amino acids).

Supplementary dataset

Table S1. List of the 3,197 genomes used in this study. For each genome (column A), the old name, the number of scaffolds, the genome size and the number of CDS are given in columns B, C, D and E. The genome source is in column F and the dRep cluster in column G. Genome completeness and contamination, based on single-copy genes, are given in columns H and I respectively. Column J reports on the concatenated ribosomal proteins. The 1,179 representative genomes are indicated in column K. The phylum and superphylum (DPANN and non-DPANN) of the representative genomes are provided in columns L and M. The taxonomy assigned by the databases from which the genomes were retrieved is shown in column N.

Table S2. Taxonomic distribution of the 113 modules. The module name is indicated in column A and the number of families in column B. The suggested taxonomic distribution is indicated in column C. Column D details the genomes used to define the taxonomic distribution (phylum, number of genomes).

Table S3. Annotation of the 10,866 families. Column A: module number. Column B: family accession. Column C: number of proteins in the family. Column D: median length of the proteins. Column E: ratio of proteins predicted to contain a signal peptide. Column F: median number of predicted transmembrane helices per protein. Column G: domain architecture reported by Pfam. Columns H, I, J, K, L: KEGG annotations. Column M: CAZy annotation. Columns N to AC indicate the ratio of genomes having the given family in the given archaeal phylum. Columns AD to CK indicate the ratio of genomes having the given family in the given bacterial phylum.

Table S4. Annotation of the subfamilies (column C) based on hmmsearch against the PDB database (columns D and E) and on HMM-HMM comparison against the arCOGs of the eggNOG database (columns F, G, H and I).

Table S5. Genes neighboring the four genes encoding the subunits of the ammonia monooxygenase. The four genes downstream and upstream of each amoA, amoB, amoC and amoX gene (column H) were identified and annotated using the protein clustering (column E), the Pfam database (column G) and the KEGG database (column F).

Table S6. Genes neighboring the three genes encoding the subunits of the 5-oxoprolinase complex. The three genes downstream and upstream of each pxpA, pxpB and pxpC gene (column H) were identified and annotated using the protein clustering (column E), the Pfam database (column G) and the KEGG database (column F).

Table S7. Genes neighboring the three genes encoding the enzymes of the biosynthetic pathway of the C50 carotenoid bacterioruberin and the gene encoding a distant homolog of the catalytic subunit I of heme-copper oxygen reductase (fam02696). The four genes downstream and upstream of each LyeJ, CruF, CrtD and fam02696 gene (column H) were identified and annotated using the protein clustering (column E), the Pfam database (column G) and the KEGG database (column F).

Table S8. Genes neighboring the two genes encoding the integrin beta 4 and the TFIIH. The five genes downstream and upstream of each integrin and TFIIH gene (column H) were identified and annotated using the protein clustering (column E), the Pfam database (column G) and the KEGG database (column F).
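
The neighborhood extraction described for Tables S5 to S8 (a fixed number of genes on either side of each gene of interest) can be illustrated with a minimal Python sketch. This is not the code used in the study: the function name flanking_genes, the (scaffold, start, gene_id) input format and the variable names are assumptions made for illustration only. The sketch also orders genes by start coordinate and ignores strand, so "before" and "after" refer to position on the scaffold rather than to transcriptional direction.

```python
from collections import defaultdict


def flanking_genes(genes, targets, k):
    """Return up to k genes on each side of every target gene.

    genes   : iterable of (scaffold, start, gene_id) tuples
              (hypothetical input format, not the study's own)
    targets : set of gene_ids of interest (e.g. amoA, amoB, amoC, amoX hits)
    k       : number of flanking genes to report on each side
    """
    # Group genes by scaffold so neighbors never cross scaffold boundaries.
    by_scaffold = defaultdict(list)
    for scaffold, start, gene_id in genes:
        by_scaffold[scaffold].append((start, gene_id))

    neighbors = {}
    for entries in by_scaffold.values():
        entries.sort()  # order genes by start coordinate
        ordered = [gene_id for _, gene_id in entries]
        for i, gene_id in enumerate(ordered):
            if gene_id in targets:
                before = ordered[max(0, i - k):i]
                after = ordered[i + 1:i + 1 + k]
                neighbors[gene_id] = (before, after)
    return neighbors
```

With k set to 4, 3 or 5, this corresponds to the window sizes stated for Tables S5 and S7, S6, and S8 respectively; genes near a scaffold end simply return fewer neighbors.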