Supplementary Tables and Figures

Table S19 | Selected representatives. The r89 dataset was subsampled to one representative per genus, resulting in 440 genus representatives. Abbreviations: completeness (Com), contamination (Cont).

#GENOME_ID GENUS COM CONT QUALITY DESCRIPTION
RS_GCF_000745485.1 g__Methanobacterium_D 99.2 0.8 95.2 NCBI representative genome
RS_GCF_000023945.1 g__Halorhabdus 98.25 0.87 93.9 Complete NCBI representative genome
RS_GCF_000013445.1 g__Methanospirillum 99.02 0.33 97.37 Complete NCBI representative genome
RS_GCF_000012285.1 g__Sulfolobus 99.4 0 99.4 Complete NCBI representative genome
RS_GCF_900095295.1 g__Methanobacterium_C 99.73 0 99.73 Complete NCBI representative genome
RS_GCF_000191585.1 g__Methanobacterium_B 99.2 0 99.2 Complete NCBI representative genome
GB_GCA_000802205.2 g__Nitrosocosmicus 98.06 0.97 93.21 GTDB representative genome
RS_GCF_000217995.1 g__Methanosalsum 99.35 0 99.35 Complete NCBI representative genome
GB_GCA_000375685.1 g__AAA261-N23 73.65 2.21 62.6 GTDB representative genome
RS_GCF_000016525.1 g__Methanobrevibacter_A 100 0 100 Complete NCBI representative genome
RS_GCF_000621965.1 g__Methanobrevibacter_B 100 0 100 NCBI representative genome
RS_GCF_000513315.1 g__Methanobrevibacter_C 99.2 0 99.2 NCBI representative genome
RS_GCF_001639295.1 g__Methanobrevibacter_D 97.6 0 97.6 NCBI representative genome
GB_GCA_002497075.1 g__UBA472 98.39 0.97 93.54 GTDB representative genome
GB_GCA_001775955.1 g__RBG-13-38-9 81.31 2.8 67.31 GTDB representative genome
RS_GCF_000337215.1 g__Natronolimnobius 98.92 0.46 96.62 NCBI representative genome
RS_GCF_000455365.1 g__Halopiger_B 99.24 0 99.24 NCBI representative genome
RS_GCF_000455345.1 g__Halopiger_A 99.62 0.24 98.42 NCBI representative genome
GB_GCA_002839705.1 g__UBA349 100 1.6 92 GTDB representative genome
GB_GCA_002503395.1 g__UBA60 68.84 0.8 64.84 GTDB representative genome
RS_GCF_000226975.2 g__Halobiforma 99.43 0 99.43 Complete NCBI representative genome
GB_GCA_002720095.1 g__UBA68 86 5.6 58 GTDB representative genome
RS_GCF_000023965.1 g__Halomicrobium 99.93 0 99.93 Complete NCBI representative genome
RS_GCF_000337515.1 g__Halovivax 98.65 0 98.65 NCBI representative genome
GB_GCA_900177045.1 g__Nitrosotalea 99.51 0 99.51 GTDB representative genome
RS_GCF_000223905.1 g__Haloarcula 99.93 0.4 97.93 Complete NCBI representative genome
GB_GCA_002503015.1 g__UBA618 83.42 0 83.42 GTDB representative genome
GB_GCA_000965745.1 g__GCA-965745 97.94 0 97.94 GTDB representative genome
GB_GCA_001587575.1 g__Methanofastidiosum 90.98 2.72 77.38 GTDB representative genome
RS_GCF_000196895.1 g__Halalkalicoccus 99.84 0 99.84 Complete NCBI representative genome
GB_GCA_002792685.1 g__CG1-02-32-21 66.2 0.93 61.55 GTDB representative genome
RS_GCF_000337895.1 g__Halobiforma_A 86.94 0 86.94 NCBI representative genome
GB_GCA_002686295.1 g__GCA-2686295 77.57 0 77.57 GTDB representative genome
RS_GCF_000022205.1 g__Halorubrum 99.76 0 99.76 Complete NCBI representative genome
RS_GCF_000020905.1 g__Desulfurococcus 100 0 100 Complete NCBI representative genome
U_76166 g__UBA11855 71.35 2.8 57.35 GTDB representative genome
RS_GCF_000283335.1 g__Halogranum 98.61 0.76 94.81 NCBI representative genome
U_70249 g__UBA11716 81.07 0.93 76.42 GTDB representative genome
GB_GCA_002785105.1 g__CG03 71.12 0 71.12 GTDB representative genome
RS_GCF_000495475.1 g__Halobonum 97.54 1.52 89.94 GTDB representative genome
GB_GCA_002254545.1 g__ex4484-224 68.22 0.47 65.87 GTDB representative genome
GB_GCA_002505585.1 g__UBA489 79.44 0 79.44 GTDB representative genome
U_71212 g__UBA9989 85.28 0 85.28 GTDB representative genome
GB_GCA_002687825.1 g__GCA-2687825 65.42 0 65.42 GTDB representative genome
U_75797 g__UBA12501 73.68 1.87 64.33 GTDB representative genome
RS_GCF_000179575.2 g__Methanothermococcus 99.05 0 99.05 Complete NCBI representative genome
GB_GCA_001515185.1 g__DG-33 63.87 0.8 59.87 GTDB representative genome
U_71538 g__UBA10219 71.31 2.8 57.31 GTDB representative genome
RS_GCF_000264495.1 g__Thermogladius 100 0 100 Complete NCBI representative genome
U_71529 g__UBA10210 83.02 2.8 69.02 GTDB representative genome
U_71533 g__UBA10214 79.63 0 79.63 GTDB representative genome
U_71535 g__UBA10216 66.28 0 66.28 GTDB representative genome
GB_GCA_001563305.1 g__ISO4-G1 96.37 0.81 92.32 GTDB representative genome
RS_GCF_001971705.1 g__Natronorubrum 99.11 0.19 98.16 Complete NCBI representative genome
GB_GCA_002505525.1 g__UBA431 73.36 0 73.36 GTDB representative genome
U_67580 g__UBA8695 70.76 0 70.76 GTDB representative genome

GB_GCA_002727675.1 g__UBA8690 68.69 0 68.69 GTDB representative genome
GB_GCA_001775965.1 g__RBG-16-57-9 76.13 0 76.13 GTDB representative genome
GB_GCA_001595885.1 g__SG8-5 72.24 0.8 68.24 GTDB representative genome
GB_GCA_002699425.1 g__UBA12002 78.93 0.93 74.28 GTDB representative genome
GB_GCA_002495885.1 g__UBA148 96.24 2.29 84.79 GTDB representative genome
GB_GCA_002496385.1 g__UBA147 89.6 3.2 73.6 GTDB representative genome
GB_GCA_001761425.1 g__SG9 87.77 0 87.77 GTDB representative genome
GB_GCA_001775995.1 g__RBG-16-48-13 84.11 1.44 76.91 GTDB representative genome
GB_GCA_002495315.1 g__UBA141 91.91 0 91.91 GTDB representative genome
RS_GCF_900110215.1 g__Halorientalis 99.93 1.7 91.43 NCBI representative genome
RS_GCF_000220175.1 g__Nitrosoarchaeum 100 0 100 GTDB representative genome
RS_GCF_000517625.1 g__Halostagnicola 100 0 100 Complete NCBI representative genome
RS_GCF_000152265.2 g__Ferroplasma 98.74 0.81 94.69 Complete NCBI representative genome
GB_GCA_002505935.1 g__UBA226 71.07 0 71.07 GTDB representative genome
RS_GCF_000195935.2 g__Pyrococcus 100 0 100 Complete NCBI representative genome
GB_GCA_002495525.1 g__UBA9562 78.9 0.8 74.9 GTDB representative genome
GB_GCA_002457195.1 g__UBA501 57.26 0 57.26 GTDB representative genome
RS_GCF_000015205.1 g__Pyrobaculum 100 0 100 GTDB representative genome
GB_GCA_000008085.1 g__Nanoarchaeum 73.13 0 73.13 GTDB representative genome
GB_GCA_002763025.1 g__1-14-0-10-32-24 64.49 0.93 59.84 GTDB representative genome
RS_GCF_001748385.1 g__Vulcanisaeta_A 100 2.21 88.95 NCBI representative genome
RS_GCF_000017945.1 g__Ignicoccus_A 99.37 0.84 95.17 Complete NCBI representative genome
GB_GCA_002718195.1 g__UBA15 68.8 1.52 61.2 GTDB representative genome
GB_GCA_001786415.1 g__RBG-13-33-26 65.65 0 65.65 GTDB representative genome
GB_GCA_002762795.1 g__1-14-0-10-37-12 78.5 0 78.5 GTDB representative genome
RS_GCF_000008265.1 g__Picrophilus 99.59 0 99.59 Complete NCBI representative genome
RS_GCF_000196655.1 g__Methanohalobium 100 1.96 90.2 Complete NCBI representative genome
GB_GCA_002687795.1 g__GCA-2687795 72.04 0.93 67.39 GTDB representative genome
RS_GCF_000304355.2 g__Methanoculleus 98.37 0.65 95.12 Complete NCBI representative genome
GB_GCA_002010075.1 g__JdFR-19 96.73 0 96.73 GTDB representative genome
GB_GCA_002011125.1 g__JdFR-18 98.13 1.87 88.78 GTDB representative genome
RS_GCF_002177135.1 g__Natronolimnobius_A 98.92 0.38 97.02 NCBI representative genome
GB_GCA_001723845.1 g__WOR-SM1-SCG-A 79.06 5.14 53.36 GTDB representative genome
GB_GCA_002011035.1 g__JdFR-11 100 0.93 95.35 GTDB representative genome
GB_GCA_002011075.1 g__JdFR-13 93.2 0.97 88.35 GTDB representative genome
GB_GCA_000389735.1 g__Acd1 99.4 0 99.4 GTDB representative genome
GB_GCA_001800745.1 g__COMBO-69-17 77.47 0.4 75.47 GTDB representative genome
GB_GCA_002779595.1 g__CG07-land 76.25 0 76.25 GTDB representative genome
GB_GCA_000402775.1 g__SCGC-AAA252-I15 62.31 0.93 57.66 GTDB representative genome
RS_GCF_001469955.1 g__Haloprofundus 99.24 0.1 98.74 NCBI representative genome
GB_GCA_001717015.1 g__Methanosuratus 91.59 0 91.59 GTDB representative genome
GB_GCA_002688265.1 g__GCA-2688265 74.3 0.93 69.65 GTDB representative genome
RS_GCF_000224475.1 g__Halolamina 97.28 1.71 88.73 Complete NCBI representative genome
RS_GCF_000213215.1 g__Acidianus 99.4 0 99.4 Complete NCBI representative genome
RS_GCF_000006805.1 g__Halobacterium 99.42 0 99.42 Complete NCBI representative genome
U_68175 g__UBA10521 79.66 5.23 53.51 GTDB representative genome
RS_GCF_900095815.1 g__Methanothermobacter 100 0 100 Complete NCBI representative genome
GB_GCA_001563905.1 g__B1-Br10-U2g19 68.61 0 68.61 GTDB representative genome
RS_GCF_000215995.1 g__Pyrococcus_A 100 0 100 Complete NCBI representative genome
GB_GCA_001412335.1 g__SD8 97.53 2.83 83.38 GTDB representative genome
GB_GCA_000405685.1 g__JGI-0000106-J15 56.55 0.49 54.1 GTDB representative genome
GB_GCA_002713205.1 g__GCA-2713205 81.07 1.94 71.37 GTDB representative genome
GB_GCA_002728275.1 g__GCA-2728275 83.18 1.87 73.83 GTDB representative genome
GB_GCA_002855745.1 g__Thermofilum_A 99.26 0.74 95.56 GTDB representative genome
RS_GCF_001412615.1 g__Pyrodictium 99.37 3.8 80.37 Complete NCBI representative genome
GB_GCA_000496195.1 g__A07HB70 85.2 0.2 84.2 GTDB representative genome
RS_GCF_000025325.1 g__Haloterrigena 99.49 0.95 94.74 Complete NCBI representative genome
RS_GCF_000015145.1 g__Hyperthermus 98.73 1.42 91.63 Complete NCBI representative genome
GB_GCA_002509225.1 g__UBA102 80.76 2.78 66.86 GTDB representative genome
GB_GCA_001516585.1 g__Thermocladium 97.43 0.74 93.73 GTDB representative genome
GB_GCA_001940705.1 g__AB-25 90.12 3.19 74.17 GTDB representative genome
RS_GCF_001950595.1 g__CBA1134 98.8 0.4 96.8 GTDB representative genome
GB_GCA_001515205.2 g__YNP-45 89.72 1.87 80.37 GTDB representative genome
RS_GCF_001886955.1 g__Halodesulfurarchaeum 96.09 1.06 90.79 Complete NCBI representative genome
U_71513 g__GW2011-AR9 75.49 0.93 70.84 GTDB representative genome

GB_GCA_000806115.1 g__GW2011-AR5 76.64 0.93 71.99 GTDB representative genome
GB_GCA_002762735.1 g__GW2011-AR1 68.46 0 68.46 GTDB representative genome
GB_GCA_000447225.1 g__A-plasma 97.18 1.61 89.13 GTDB representative genome
GB_GCA_001563995.1 g__PL-Br10-U2g16 73.99 2.34 62.29 GTDB representative genome
GB_GCA_002204705.1 g__B-DKE 98.61 0 98.61 GTDB representative genome
RS_GCF_900090055.1 g__Cuniculiplasma 96.33 0 96.33 Complete NCBI representative genome
GB_GCA_002762705.1 g__CG08-08-20-14 78.19 0 78.19 GTDB representative genome
GB_GCA_002502135.1 g__UBA543 79.91 0 79.91 GTDB representative genome
RS_GCF_000235685.2 g__Methanolinea 99.02 0 99.02 NCBI representative genome
GB_GCA_001940755.1 g__UBA460 79.37 2.34 67.67 GTDB representative genome
RS_GCF_000334895.1 g__Halococcus 98.51 0.04 98.31 NCBI representative genome
GB_GCA_002501805.1 g__UBA463 80.93 0 80.93 GTDB representative genome
RS_GCF_000230735.2 g__Natrinema 99.95 0 99.95 Complete NCBI representative genome
RS_GCF_000025285.1 g__Archaeoglobus_B 99.84 0 99.84 Complete NCBI representative genome
RS_GCF_000385565.1 g__Archaeoglobus_A 100 0 100 Complete NCBI representative genome
GB_GCA_002503825.1 g__UBA467 99.67 1.31 93.12 GTDB representative genome
RS_GCF_000685155.1 g__Methanoperedens 99.67 1.31 93.12 GTDB representative genome
GB_GCA_002254885.1 g__ex4572-165 70.13 2.4 58.13 GTDB representative genome
GB_GCA_002731905.1 g__UBA8886 72.13 0.8 68.13 GTDB representative genome
GB_GCA_001412355.1 g__SDB 98.37 1.31 91.82 GTDB representative genome
GB_GCA_000270325.1 g__Caldiarchaeum 98.06 0 98.06 GTDB representative genome
RS_GCF_000243255.1 g__Methanoplanus 100 0.65 96.75 NCBI representative genome
GB_GCA_002494485.1 g__UBA213 86.83 0.97 81.98 GTDB representative genome
GB_GCA_002494785.1 g__UBA117 100 0 100 GTDB representative genome
GB_GCA_002762785.1 g__UBA11998 85.51 0 85.51 GTDB representative genome
GB_GCA_001421175.1 g__RumEn-M2 94.09 0 94.09 GTDB representative genome
GB_GCA_001784635.1 g__RBG-16-49-10 74.92 0 74.92 GTDB representative genome
RS_GCF_000190315.1 g__Vulcanisaeta 100 0.74 96.3 Complete NCBI representative genome
GB_GCA_002254405.1 g__ex4484-96 66.77 0.93 62.12 GTDB representative genome
GB_GCA_002255025.1 g__ex4484-135 73.83 0.93 69.18 GTDB representative genome
RS_GCF_000949015.1 g__Acidiplasma 98.78 2.44 86.58 GTDB representative genome
RS_GCF_000204415.1 g__Methanothrix 99.35 0.65 96.1 Complete NCBI representative genome
GB_GCA_002794155.1 g__CG1-02-35-32 66.98 0 66.98 GTDB representative genome
GB_GCA_002791855.1 g__CG10238-14 79.44 0 79.44 GTDB representative genome
GB_GCA_001564115.1 g__Tc-Br11-E2g1 74.09 0.71 70.54 GTDB representative genome
RS_GCF_000022545.1 g__Thermococcus_A 98.51 0 98.51 Complete NCBI representative genome
RS_GCF_000151105.2 g__Thermococcus_B 99.5 0 99.5 Complete NCBI representative genome
GB_GCA_002254785.1 g__ANME-1ex4572 69.91 1.96 60.11 GTDB representative genome
GB_GCA_001563325.1 g__SMTZ1-83 90.19 6.54 57.49 GTDB representative genome
RS_GCF_000144915.1 g__Acidilobus 99.37 0 99.37 Complete NCBI representative genome
U_66361 g__UBA12276 70.92 1.87 61.57 GTDB representative genome
RS_GCF_000504205.1 g__Methanolobus 99.67 0 99.67 NCBI representative genome
RS_GCF_000217715.1 g__Halopiger 99.57 0.38 97.67 Complete NCBI representative genome
GB_GCA_002505495.1 g__UBA59 73.37 0.93 68.72 GTDB representative genome
GB_GCA_000806135.1 g__GW2011-AR11 64.49 0 64.49 GTDB representative genome
GB_GCA_002495905.1 g__UBA57 87.08 0 87.08 GTDB representative genome
GB_GCA_002687935.1 g__UBA55 89.72 1.87 80.37 GTDB representative genome
GB_GCA_000496235.1 g__J07HR59 70.42 1.82 61.32 GTDB representative genome
GB_GCA_001443365.1 g__CSP1-1 99.03 0 99.03 GTDB representative genome
GB_GCA_002509405.1 g__VadinCA11 93.15 0 93.15 GTDB representative genome
U_71259 g__UBA10191 68.81 0 68.81 GTDB representative genome
GB_GCA_000145985.1 g__Ignisphaera 100 0 100 GTDB representative genome
RS_GCF_000813245.1 g__Thermofilum 99.26 0.74 95.56 GTDB representative genome
GB_GCA_002688355.1 g__GCA-2688355 78.5 1.87 69.15 GTDB representative genome
RS_GCF_000725425.1 g__Palaeococcus 99.5 0.5 97 Complete NCBI representative genome
RS_GCF_000350305.1 g__Methanomethylophilus 97.98 0.81 93.93 Complete NCBI representative genome
RS_GCF_001011115.1 g__Halanaeroarchaeum 98.23 1.06 92.93 Complete NCBI representative genome
GB_GCA_002496625.1 g__UBA10452 72.65 1.94 62.95 GTDB representative genome
RS_GCF_000012545.1 g__Methanosphaera 97.6 0 97.6 Complete NCBI representative genome
U_75421 g__UBA11576 70.11 3.34 53.41 GTDB representative genome
GB_GCA_001766815.1 g__Syntrophoarchaeum 96.08 0.33 94.43 GTDB representative genome
RS_GCF_000092465.1 g__Staphylothermus 99.37 0 99.37 GTDB representative genome
GB_GCA_002504725.1 g__UBA587 86.4 1.6 78.4 GTDB representative genome
GB_GCA_002506365.1 g__UBA583 64.95 0 64.95 GTDB representative genome
GB_GCA_002505655.1 g__UBA581 84.11 0.93 79.46 GTDB representative genome

RS_GCF_000021965.1 g__Methanosphaerula 99.84 0 99.84 Complete NCBI representative genome
GB_GCA_002509085.1 g__UBA588 80.55 0 80.55 GTDB representative genome
RS_GCF_000308215.1 g__Methanomassiliicoccus 98.39 0 98.39 NCBI representative genome
RS_GCF_900100385.1 g__Halovenus 97.65 0.4 95.65 NCBI representative genome
GB_GCA_000014945.1 g__Methanosaeta 100 0 100 GTDB representative genome
GB_GCA_002457555.1 g__UBA252 71.73 0 71.73 GTDB representative genome
U_67070 g__UBA8516 85.6 0 85.6 GTDB representative genome
GB_GCA_002503705.1 g__UBA153 78.45 0 78.45 GTDB representative genome
GB_GCA_002838935.1 g__hermoplasmata-1 93.8 3.2 77.8 GTDB representative genome
U_71035 g__UBA9915 74.25 1.77 65.4 GTDB representative genome
GB_GCA_002254415.1 g__ex4484-52 59.81 0 59.81 GTDB representative genome
GB_GCA_002254665.1 g__ex4484-58 84.81 1.48 77.41 GTDB representative genome
RS_GCF_000230715.2 g__Natronobacterium 99.62 0 99.62 Complete NCBI representative genome
RS_GCF_000018305.1 g__Caldivirga 99.26 0 99.26 Complete NCBI representative genome
GB_GCA_001273385.1 g__AD8-1 95.79 4.21 74.74 GTDB representative genome
GB_GCA_002763265.1 g__1-14-0-10-43-11 82.71 0.93 78.06 GTDB representative genome
GB_GCA_000007185.1 g__Methanopyrus 96.74 1.6 88.74 GTDB representative genome
U_73928 g__UBA11057 63.75 0.97 58.9 GTDB representative genome
U_69440 g__UBA9210 75.8 0.98 70.9 GTDB representative genome
U_69053 g__UBA9212 82.93 2.8 68.93 GTDB representative genome
GB_GCA_001563965.1 g__PL-Br10-E2g29 60.26 1.33 53.61 GTDB representative genome
GB_GCA_001918745.1 g__40CM-2-53-6 91.12 0.93 86.47 GTDB representative genome
GB_GCA_002687275.1 g__GCA-2687275 72.66 0 72.66 GTDB representative genome
U_65457 g__UBA7935 71.15 0 71.15 GTDB representative genome
GB_GCA_002688775.1 g__GCA-2688775 68.69 1.87 59.34 GTDB representative genome
GB_GCA_001399805.1 g__BA1 91.59 2.8 77.59 GTDB representative genome
GB_GCA_001399795.1 g__BA2 93.77 3.74 75.07 GTDB representative genome
GB_GCA_001872315.1 g__CG1-02-39-14 68.38 0 68.38 GTDB representative genome
RS_GCF_000092185.1 g__Thermosphaera 98.73 0 98.73 GTDB representative genome
GB_GCA_002011165.1 g__JdFR-21 96.08 0.65 92.83 GTDB representative genome
GB_GCA_002010045.1 g__JdFR-22 98.69 0 98.69 GTDB representative genome
GB_GCA_002011215.1 g__JdFR-24 100 8.17 59.15 GTDB representative genome
GB_GCA_002508545.1 g__UBA10881 97.31 0.81 93.26 GTDB representative genome
RS_GCF_000214415.1 g__Methanotorris 99.52 0 99.52 Complete NCBI representative genome
GB_GCA_001871595.1 g__UBA97 83.8 0.93 79.15 GTDB representative genome
GB_GCA_002499405.1 g__UBA95 84.11 0.93 79.46 GTDB representative genome
RS_GCF_000711215.1 g__Methanomicrobium 97.39 0.65 94.14 NCBI representative genome
GB_GCA_002763225.1 g__1-14-0-10-31-34 82.71 1.87 73.36 GTDB representative genome
GB_GCA_002790055.1 g__CG49143 86.42 3.6 68.42 GTDB representative genome
GB_GCA_002785505.1 g__Altiarchaeum 95.02 0.93 90.37 GTDB representative genome
RS_GCF_002214585.1 g__Thermococcus 100 0 100 Complete NCBI representative genome
GB_GCA_002495625.1 g__UBA419 99.94 0 99.94 GTDB representative genome
U_68780 g__UBA9134 56.38 1.26 50.08 GTDB representative genome
GB_GCA_002494765.1 g__UBA186 92.53 3.2 76.53 GTDB representative genome
GB_GCA_000496135.1 g__E-plasma 94.72 0 94.72 GTDB representative genome
GB_GCA_002503985.1 g__UBA184 98 1.72 89.4 GTDB representative genome
GB_GCA_002763115.1 g__CG1-02-31-27 72.43 0.93 67.78 GTDB representative genome
GB_GCA_002204695.1 g__Micrarchaeum 81.78 0.93 77.13 GTDB representative genome
RS_GCF_000008665.1 g__Archaeoglobus 100 0 100 Complete NCBI representative genome
RS_GCF_000404225.1 g__Methanomassiliicoccus_A 98.79 0.81 94.74 GTDB representative genome
GB_GCA_002825515.1 g__SMTZ1-45 90.19 2.34 78.49 GTDB representative genome
GB_GCA_001768965.1 g__RBG-16-50-20 72.26 1.94 62.56 GTDB representative genome
RS_GCF_000063445.1 g__Methanocella_A 100 0 100 Complete NCBI representative genome
RS_GCF_900109065.1 g__Halohasta 99.19 2.76 85.39 NCBI representative genome
GB_GCA_002255065.1 g__ex4484-204 63.36 0 63.36 GTDB representative genome
GB_GCA_002255045.1 g__ex4484-205 91.75 7.99 51.8 GTDB representative genome
GB_GCA_001564035.1 g__B1-Br10-U2g21 70.25 0.93 65.6 GTDB representative genome
GB_GCA_002825465.1 g__MP8T-1 92.06 5.3 65.56 GTDB representative genome
GB_GCA_002506605.1 g__UBA160 69.74 0.97 64.89 GTDB representative genome
RS_GCF_000025685.1 g__Haloferax 99.57 0 99.57 Complete NCBI representative genome
GB_GCA_002505945.1 g__UBA119 69.25 0.93 64.6 GTDB representative genome
GB_GCA_002506335.1 g__UBA114 98.37 0.65 95.12 GTDB representative genome
GB_GCA_000494145.1 g__JGI-OTU-1 82.65 4.37 60.8 GTDB representative genome
GB_GCA_001871475.1 g__CG1-02-47-40 92.21 0 92.21 GTDB representative genome
GB_GCA_002494645.1 g__UBA253 84.13 0 84.13 GTDB representative genome

GB_GCA_001552015.1 g__Nanopusillus 74.07 0 74.07 GTDB representative genome
GB_GCA_002778455.1 g__1-14-0-10-45-29 91.04 0 91.04 GTDB representative genome
GB_GCA_002687185.1 g__GCA-2687185 69.16 0 69.16 GTDB representative genome
RS_GCF_900107665.1 g__Haloplanus 98.8 0.19 97.85 NCBI representative genome
RS_GCF_900100875.1 g__Halopelagius 99.38 0.22 98.28 NCBI representative genome
GB_GCA_002497645.1 g__UBA559 80.53 0.8 76.53 GTDB representative genome
GB_GCA_002838995.1 g__hermoplasmata-2 68.32 0.86 64.02 GTDB representative genome
RS_GCF_000235565.1 g__Methanothrix_A 100 0 100 Complete NCBI representative genome
GB_GCA_002503285.1 g__UBA557 84.53 0 84.53 GTDB representative genome
GB_GCA_001593935.1 g__B26-1 92.99 3.81 73.94 GTDB representative genome
GB_GCA_002686855.1 g__GCA-2686855 74.3 0 74.3 GTDB representative genome
RS_GCF_900107195.1 g__Halobellus 98.92 0.38 97.02 NCBI representative genome
GB_GCA_002495685.1 g__UBA412 100 0.8 96 GTDB representative genome
GB_GCA_002688925.1 g__GCA-2688925 84.03 0 84.03 GTDB representative genome
U_77789 g__UBA9642 83.64 0 83.64 GTDB representative genome
U_70251 g__UBA9640 80.37 1.87 71.02 GTDB representative genome
GB_GCA_002507085.1 g__UBA168 88.4 0 88.4 GTDB representative genome
GB_GCA_002502685.1 g__UBA202 67.99 1.6 59.99 GTDB representative genome
GB_GCA_002506015.1 g__UBA203 87.25 3.27 70.9 GTDB representative genome
GB_GCA_002501765.1 g__UBA204 96.22 0.66 92.92 GTDB representative genome
GB_GCA_002503105.1 g__UBA206 94.12 0.65 90.87 GTDB representative genome
GB_GCA_002503135.1 g__UBA207 93.14 0 93.14 GTDB representative genome
RS_GCF_900108505.1 g__Halopenitus 98.41 0 98.41 NCBI representative genome
GB_GCA_002496425.1 g__UBA162 86.08 0 86.08 GTDB representative genome
GB_GCA_002499005.1 g__UBA164 84.14 0.97 79.29 GTDB representative genome
GB_GCA_001871495.1 g__UBA8480 79.44 1.87 70.09 GTDB representative genome
GB_GCA_002499905.1 g__UBA9949 99.67 1.31 93.12 GTDB representative genome
GB_GCA_002010215.1 g__JdFR-37 94.77 0.65 91.52 GTDB representative genome
RS_GCF_000166095.1 g__Methanothermus 100 0 100 Complete NCBI representative genome
RS_GCF_000970285.1 g__Methanosarcina 100 0 100 Complete NCBI representative genome
GB_GCA_002496395.1 g__Methanocalculus 98.69 0 98.69 GTDB representative genome
GB_GCA_000416025.1 g__Halonotius 92.47 3.85 73.22 GTDB representative genome
RS_GCF_000026045.1 g__Natronomonas 99.29 1.28 92.89 Complete NCBI representative genome
GB_GCA_002498125.1 g__UBA525 82.24 0.93 77.59 GTDB representative genome
GB_GCA_000830315.1 g__GW2011-AR20 75.47 0 75.47 GTDB representative genome
GB_GCA_002688035.1 g__GCA-2688035 74.69 0.93 70.04 GTDB representative genome
RS_GCF_000328665.1 g__Methanomethylovorans 99.84 0.33 98.19 Complete NCBI representative genome
GB_GCA_001800815.1 g__COMBO-56-21 92.8 0.8 88.8 GTDB representative genome
GB_GCA_002762915.1 g__21-14-0-10-32-9 69.19 0 69.19 GTDB representative genome
GB_GCA_002254595.1 g__ex4484-15 61.54 0 61.54 GTDB representative genome
GB_GCA_001940665.1 g__LCB-4 96.26 1.4 89.26 GTDB representative genome
GB_GCA_002779075.1 g__0-14-0-20-30-16 71.57 0.93 66.92 GTDB representative genome
RS_GCF_000011205.1 g__Sulfolobus_C 100 0 100 Complete NCBI representative genome
RS_GCF_001316045.1 g__Sulfolobus_B 93.21 0.63 90.06 NCBI representative genome
RS_GCF_900079115.1 g__Sulfolobus_A 100 0 100 Complete NCBI representative genome
U_71480 g__UBA10161 90.11 0 90.11 GTDB representative genome
GB_GCA_001742785.1 g__IMC4 79.12 2.81 65.07 GTDB representative genome
GB_GCA_001871415.1 g__CG1-02-57-44 79.91 0 79.91 GTDB representative genome
GB_GCA_002496015.1 g__UBA73 59.5 0.47 57.15 GTDB representative genome
RS_GCF_000025665.1 g__Aciduliprofundum 100 0 100 Complete NCBI representative genome
RS_GCF_001316065.1 g__Aeropyrum 94.81 0.7 91.31 NCBI representative genome
GB_GCA_000402515.1 g__SCGC-AAA011-G17 73.11 0 73.11 GTDB representative genome
GB_GCA_002779235.1 g__0-14-0-80-44-23 84.11 0.93 79.46 GTDB representative genome
GB_GCA_002687815.1 g__GCA-2687815 73.36 1.87 64.01 GTDB representative genome
U_71436 g__UBA10117 76.17 0 76.17 GTDB representative genome
RS_GCF_000147875.1 g__Methanolacinia 99.67 1.31 93.12 Complete NCBI representative genome
RS_GCF_001006045.1 g__Geoglobus 100 0 100 Complete NCBI representative genome
GB_GCA_001940655.1 g__CR-4 80.45 2.34 68.75 GTDB representative genome
RS_GCF_000723185.1 g__Nitrosotenuis 100 0.97 95.15 NCBI representative genome
GB_GCA_002509385.1 g__UBA590 92.32 0.65 89.07 GTDB representative genome
RS_GCF_000223395.1 g__Pyrolobus 99.05 0.79 95.1 Complete NCBI representative genome
U_73180 g__UBA10834 89.79 2.4 77.79 GTDB representative genome
RS_GCF_000328685.1 g__Natronococcus 99.73 0.8 95.73 Complete NCBI representative genome
RS_GCF_000011185.1 g__Thermoplasma 97.97 0 97.97 Complete NCBI representative genome
GB_GCA_002779555.1 g__20-14-0-80-47-9 90.31 3.88 70.91 GTDB representative genome

GB_GCA_001940725.1 g__LC-2 72.43 4.21 51.38 GTDB representative genome
GB_GCA_001940645.1 g__LC-3 91.59 5.61 63.54 GTDB representative genome
GB_GCA_002688965.1 g__GCA-2688965 76.79 3.27 60.44 GTDB representative genome
GB_GCA_001717035.1 g__Methanomethylicus 99.07 0.93 94.42 GTDB representative genome
GB_GCA_002507425.1 g__UBA120 80.22 0 80.22 GTDB representative genome
GB_GCA_002763335.1 g__CG1-02-47-18 80.61 0 80.61 GTDB representative genome
GB_GCA_002497685.1 g__CG-Epi1 56.98 0 56.98 GTDB representative genome
GB_GCA_002779065.1 g__0-14-0-20-34-12 81 0.93 76.35 GTDB representative genome
RS_GCF_000317795.1 g__Caldisphaera 98.73 0.63 95.58 Complete NCBI representative genome
RS_GCF_000474235.1 g__Halarchaeum 96.47 1.51 88.92 NCBI representative genome
GB_GCA_000494205.1 g__SCGC-AAA471-B05 88.24 1.47 80.89 GTDB representative genome
RS_GCF_000800805.1 g__Methanoplasma 97.85 1.61 89.8 GTDB representative genome
GB_GCA_001512965.1 g__DTU008 98.06 0 98.06 GTDB representative genome
GB_GCA_002254565.1 g__ex4484-2 81.8 1.98 71.9 GTDB representative genome
GB_GCA_000246735.1 g__UBA562 83.2 0 83.2 GTDB representative genome
RS_GCF_000015765.1 g__Methanocorpusculum 99.54 0 99.54 Complete NCBI representative genome
GB_GCA_002503205.1 g__UBA447 96.46 0.81 92.41 GTDB representative genome
GB_GCA_001564275.1 g__Tc-Br11 94.93 1.6 86.93 GTDB representative genome
GB_GCA_002507245.1 g__UBA233 97.35 2.8 83.35 GTDB representative genome
GB_GCA_001593855.1 g__B25 74.3 1.32 67.7 GTDB representative genome
GB_GCA_001593865.1 g__B24 96.73 2.8 82.73 GTDB representative genome
RS_GCF_000327485.1 g__Methanoregula 99.87 0 99.87 Complete NCBI representative genome
GB_GCA_002504405.1 g__UBA71 97.98 0 97.98 GTDB representative genome
RS_GCF_000025625.1 g__Natrialba 99.17 0 99.17 Complete NCBI representative genome
GB_GCA_002762985.1 g__1-14-0-10-44-13 85.98 0.93 81.33 GTDB representative genome
GB_GCA_002688095.1 g__GCA-2688095 67.76 0 67.76 GTDB representative genome
GB_GCA_002505985.1 g__UBA623 71.32 0.8 67.32 GTDB representative genome
RS_GCF_001889405.1 g__Methanohalophilus 99.51 0 99.51 Complete NCBI representative genome
U_74861 g__UBA11384 70.17 1.32 63.57 GTDB representative genome
RS_GCF_000711905.1 g__Methermicoccus 100 0 100 NCBI representative genome
RS_GCF_000698785.1 g__Nitrososphaera 100 0.97 95.15 Complete NCBI representative genome
RS_GCF_001571385.1 g__Methanofollis 100 0.65 96.75 NCBI representative genome
GB_GCA_000402355.1 g__Iainarchaeum 90.19 0 90.19 GTDB representative genome
RS_GCF_002156965.1 g__Nitrosopumilus 100 0 100 GTDB representative genome
GB_GCA_002506905.1 g__UBA328 98.69 0 98.69 GTDB representative genome
RS_GCF_000762265.1 g__Methanobacterium 100 0 100 Complete NCBI representative genome
RS_GCF_000194625.1 g__Archaeoglobus_C 99.35 0 99.35 Complete NCBI representative genome
GB_GCA_000416105.1 g__J07HB67 68.53 1.48 61.13 GTDB representative genome
RS_GCF_000739065.1 g__Methanocaldococcus 99.52 0 99.52 Complete NCBI representative genome
GB_GCA_002011395.1 g__JdFR-45 91.61 1.61 83.56 GTDB representative genome
RS_GCF_900104065.1 g__Natronobacterium_A 99.57 0.24 98.37 GTDB representative genome
GB_GCA_002010305.1 g__JdFR-42 99.84 1.96 90.04 GTDB representative genome
GB_GCA_002011355.1 g__JdFR-43 78.48 0 78.48 GTDB representative genome
GB_GCA_002010285.1 g__JdFR-41 95.92 0 95.92 GTDB representative genome
GB_GCA_002762865.1 g__CG1-02-33-12 80.84 1.87 71.49 GTDB representative genome
RS_GCF_000306725.1 g__Methanolobus_A 99.84 0 99.84 Complete NCBI representative genome
GB_GCA_002841105.1 g__ltiarchaeales-1 77.41 1.87 68.06 GTDB representative genome
GB_GCA_000258425.1 g__Fervidicoccus 99.53 0.63 96.38 GTDB representative genome
RS_GCF_002153915.1 g__Methanonatronarchaeum 97.22 2.61 84.17 NCBI representative genome
GB_GCA_002502625.1 g__UBA496 80.51 0.04 80.31 GTDB representative genome
GB_GCA_002762975.1 g__UBA493 85.98 1.4 78.98 GTDB representative genome
RS_GCF_000812185.1 g__Nitrosopelagicus 99.51 0 99.51 GTDB representative genome
RS_GCF_000970325.1 g__Methanococcoides 100 0 100 Complete NCBI representative genome
GB_GCA_002792115.1 g__um-filter-33-13 80.37 0 80.37 GTDB representative genome
U_65474 g__ANME-3 77.05 0.65 73.8 GTDB representative genome
GB_GCA_002499185.1 g__UBA284 71.26 3.74 52.56 GTDB representative genome
GB_GCA_002495025.1 g__UBA285 98.42 1.27 92.07 GTDB representative genome
GB_GCA_002495235.1 g__UBA287 65.03 0.86 60.73 GTDB representative genome
GB_GCA_001766825.1 g__Syntrophoarchaeum_A 88.89 0.98 83.99 GTDB representative genome
U_77879 g__UBA12515 96 0.8 92 GTDB representative genome
RS_GCF_000016605.1 g__Metallosphaera 100 0 100 Complete NCBI representative genome
GB_GCA_002726865.1 g__GCA-2726865 92.99 3.31 76.44 GTDB representative genome
RS_GCF_000376965.1 g__Methanothermococcus_A 100 0 100 NCBI representative genome
GB_GCA_000565255.1 g__AZ1 99.88 0 99.88 GTDB representative genome
RS_GCF_000220645.1 g__Methanococcus 99.5 0 99.5 Complete NCBI representative genome

U_71519 g__UBA10200 66.92 0.93 62.27 GTDB representative genome
U_71523 g__UBA10204 73.62 3.74 54.92 GTDB representative genome
GB_GCA_002763075.1 g__1-14-0-10-34-76 65.11 0 65.11 GTDB representative genome
GB_GCA_002762845.1 g__1-14-0-10-36-11 74.3 0 74.3 GTDB representative genome
GB_GCA_002490245.1 g__B24-2 96.88 1.01 91.83 GTDB representative genome
U_70264 g__UBA9653 88.67 0.8 84.67 GTDB representative genome
GB_GCA_002792955.1 g__UBA8471 82.17 0.07 81.82 GTDB representative genome
RS_GCF_000755225.1 g__Halapricum 99.5 0.85 95.25 NCBI representative genome
GB_GCA_001593845.1 g__B63 59.81 0.93 55.16 GTDB representative genome
GB_GCA_002254825.2 g__ex4572-44 75.82 0 75.82 GTDB representative genome
RS_GCF_000092305.1 g__Methanocaldococcus_A 99.05 0 99.05 Complete NCBI representative genome
GB_GCA_000806155.1 g__GW2011-AR18 73.29 0 73.29 GTDB representative genome
U_71234 g__GW2011-AR10 90.65 1.87 81.3 GTDB representative genome
RS_GCF_000025505.1 g__Ferroglobus 100 0 100 Complete NCBI representative genome
U_76853 g__GW2011-AR13 70.33 0 70.33 GTDB representative genome
GB_GCA_000830295.1 g__GW2011-AR15 86.92 0.93 82.27 GTDB representative genome
U_71455 g__GW2011-AR17 77.57 0 77.57 GTDB representative genome
RS_GCF_000337455.1 g__Halosimplex 99 2 89 NCBI representative genome
GB_GCA_002722595.1 g__GCA-2722595 77.16 1.6 69.16 GTDB representative genome
RS_GCF_000403645.1 g__Salinarchaeum 97.53 0 97.53 GTDB representative genome
GB_GCA_002495675.1 g__UBA11751 73.2 0 73.2 GTDB representative genome
RS_GCF_002214165.1 g__Mia14 82.4 0 82.4 GTDB representative genome
GB_GCA_002502045.1 g__Parvarchaeum 81.8 0.97 76.95 GTDB representative genome
RS_GCF_002156705.1 g__B1-Br10-E2g2 99.53 0.4 97.53 Complete NCBI representative genome
RS_GCF_000237865.1 g__Haloquadratum 99.3 0.38 97.4 Complete NCBI representative genome
RS_GCF_000019605.1 g__Korarchaeum 93.39 2.8 79.39 GTDB representative genome
GB_GCA_002688315.1 g__UBA492 86.21 0 86.21 GTDB representative genome
GB_GCA_002255135.1 g__ex4484-138 64.12 0.98 59.22 GTDB representative genome
RS_GCF_001563245.1 g__Methanobrevibacter 100 0.8 96 Complete NCBI representative genome
GB_GCA_002685855.1 g__GCA-2685855 72.43 0.93 67.78 GTDB representative genome
GB_GCA_002010975.1 g__JdFR-07 64.49 0.74 60.79 GTDB representative genome
RS_GCF_000172995.2 g__Halogeometricum 99.92 0.05 99.67 Complete NCBI representative genome
RS_GCF_000710615.1 g__Haladaptatus 99.57 0.28 98.17 NCBI representative genome
GB_GCA_000200715.1 g__Cenarchaeum 99.03 0 99.03 GTDB representative genome
GB_GCA_002509175.1 g__SM1-50 86.7 3.2 70.7 GTDB representative genome
GB_GCA_002254745.1 g__ex4484-217-1 78.57 2.38 66.67 GTDB representative genome
GB_GCA_001723855.1 g__WOR-SM1-SCG 76.88 3.12 61.28 GTDB representative genome
GB_GCA_001856825.1 g__I-plasma 97.13 0.81 93.08 GTDB representative genome
GB_GCA_002763345.1 g__0-14-0-20-59-11 75.92 0 75.92 GTDB representative genome
RS_GCF_900103505.1 g__Haloarchaeobius 99.53 0.4 97.53 NCBI representative genome
RS_GCF_001481685.1 g__Ignicoccus 99.37 1.48 91.97 Complete NCBI representative genome
RS_GCF_000970045.1 g__MTP4 99.84 0 99.84 GTDB representative genome
U_72525 g__UBA10536 82.13 0.65 78.88 GTDB representative genome
RS_GCF_001462205.1 g__Haloparvum 97.54 0.51 94.99 NCBI representative genome
GB_GCA_002839545.1 g__operedenaceae-1 96.41 1.96 86.61 GTDB representative genome
RS_GCF_000017185.1 g__Methanococcus_A 99.03 0 99.03 Complete NCBI representative genome
RS_GCF_000251105.1 g__Methanocella 100 0 100 Complete NCBI representative genome
GB_GCA_002503845.1 g__UBA12382 89.24 0 89.24 GTDB representative genome
U_71426 g__UBA10107 63.01 0 63.01 GTDB representative genome
RS_GCF_900129775.1 g__Halobaculum 98.3 1.33 91.65 NCBI representative genome
GB_GCA_001914405.1 g__Methanohalarchaeum 97.71 0.65 94.46 GTDB representative genome
GB_GCA_001800735.1 g__RBG-16-62-10 74.45 1.6 66.45 GTDB representative genome
GB_GCA_002792915.1 g__UBA93 85.83 0.93 81.18 GTDB representative genome
RS_GCF_000193375.1 g__Thermoproteus 100 0 100 GTDB representative genome
GB_GCA_001595915.1 g__SG8-52-3 85.2 2.4 73.2 GTDB representative genome
GB_GCA_001774245.1 g__UBA8941 80.58 2.91 66.03 GTDB representative genome
GB_GCA_001595785.1 g__SM23-78 82.24 0 82.24 GTDB representative genome
U_71425 g__UBA12030 71.18 0.54 68.48 GTDB representative genome
GB_GCA_000220375.1 g__Nanosalina 75.39 3.74 56.69 GTDB representative genome
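
The QUALITY values listed in Table S19 are consistent with completeness minus five times contamination (for example, 66.92 − 5 × 0.93 = 62.27 for U_71519). The following minimal Python sketch checks this relation on three rows copied from the table. The formula is inferred from the table values rather than stated in this supplement, and the function name `quality_score` is hypothetical.

```python
def quality_score(completeness, contamination):
    """Assumed relation: quality = completeness - 5 * contamination."""
    return round(completeness - 5 * contamination, 2)


# Rows copied from Table S19: (genus, Com, Cont, reported Quality)
rows = [
    ("g__UBA10200", 66.92, 0.93, 62.27),
    ("g__UBA10204", 73.62, 3.74, 54.92),
    ("g__Halosimplex", 99.0, 2.0, 89.0),
]

for genus, com, cont, reported in rows:
    # Each recomputed score should match the QUALITY column of the table.
    assert abs(quality_score(com, cont) - reported) < 0.01, genus
```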

Figure S1 | Marker gene distribution in major archaeal lineages. Shown is the average marker gene copy number per lineage. Copy numbers were ranked for each lineage, using three categories per genome: 0 (gene is missing), 1 (gene is present in a single copy), 2 (gene is present in multiple copies). Note that if multiple copies of a particular marker gene were detected in a genome then this gene was not included in the multiple sequence alignment of this genome.
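
The three-category ranking described in this caption can be illustrated with a short Python sketch. The function names and the copy counts are hypothetical; only the categories 0, 1 and 2 and the per-lineage averaging come from the caption.

```python
from statistics import mean


def copy_category(copies):
    """Map a raw copy count to 0 (missing), 1 (single copy) or 2 (multiple copies)."""
    return min(copies, 2)


def lineage_average(copy_counts):
    """Average category of one marker gene across the genomes of one lineage."""
    return mean(copy_category(c) for c in copy_counts)


# Hypothetical copy counts of one marker gene in four genomes of a lineage.
example_counts = [1, 1, 0, 3]
print(lineage_average(example_counts))
```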

Figure S2 | Seven standard ranks versus other classifications, including no rank information, in the NCBI taxonomy. (a) Shown are the seven standard ranks (green) and all other classifications (including names with no rank; orange) for all 1248 archaeal taxa in the NCBI taxonomy, grouped by GTDB phylum. The value on the x-axis shows the average number of ranks, with a maximum of seven ranks. For example, the GTDB phylum Crenarchaeota includes the genome HRBIN01 sp002898355 (GCA_002898355.1) with the GTDB tax string “d__Archaea; p__Crenarchaeota; c__Nitrososphaeria; o__Caldiarchaeales; f__Caldiarchaeaceae; g__HRBIN01; s__HRBIN01 sp002898355” and the corresponding NCBI tax string “d__Archaea; x__unclassified ; x__unclassified Archaea DPANN (miscellaneous); s__archaeon HR01”. Translating the NCBI tax string into seven standard ranks yields “d__Archaea; p__; c__; o__; f__; g__; s__”, which results in a value of 1 in the bar graph since this genome has only one assigned standard rank (the domain Archaea). Repeating this exercise with all genomes in the GTDB phylum Crenarchaeota yields an average value of 2.3 for all canonical ranks and 4.9 for all other classifications with a rank. (b) Percentage of non-canonical ranks in defined NCBI phyla for the NCBI ranks class, order, family, genus, and species. The scale spans from 0% (light blue) to 100% (purple) non-canonical ranks.
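
The counting step in the worked example of panel (a) can be sketched in Python as follows. This is a minimal illustration that operates on a string already translated into the seven standard ranks; the function name is hypothetical, and the translation from the raw NCBI string is not reproduced here.

```python
STANDARD_PREFIXES = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")


def assigned_standard_ranks(tax_string):
    """Count the standard ranks that carry a name in a seven-rank taxonomy string."""
    count = 0
    for field in tax_string.split(";"):
        field = field.strip()
        # A rank counts as assigned when a name follows its three-character prefix.
        if field[:3] in STANDARD_PREFIXES and len(field) > 3:
            count += 1
    return count


# Strings taken from the caption of Figure S2.
ncbi_translated = "d__Archaea; p__; c__; o__; f__; g__; s__"
gtdb = ("d__Archaea; p__Crenarchaeota; c__Nitrososphaeria; o__Caldiarchaeales; "
        "f__Caldiarchaeaceae; g__HRBIN01; s__HRBIN01 sp002898355")

print(assigned_standard_ranks(ncbi_translated), assigned_standard_ranks(gtdb))
```

Averaging this count over all genomes of a GTDB phylum would give the per-phylum value plotted on the x-axis.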

Figure S3 | Taxonomy of DPANN lineages in NCBI vs GTDB. Shown is a section of the ar122.r89 tree decorated with the NCBI r89 taxonomy. Taxa are coloured according to their NCBI taxonomy: taxa assigned to the phylum Woesearchaeota are shown in red, taxa assigned to Nanoarchaeota in blue, and taxa assigned to Aenigmarchaeota in green. The GTDB taxonomy (R04-RS89) is provided on the right, with brackets indicating the range of named lineages. Abbreviations: o__Parv. (order Parvarchaeales).

Figure S4 | Taxonomy of the class Methanomicrobia. Shown are sections of the ar122.r89 tree decorated with (a) NCBI taxonomy and (b) GTDB taxonomy. (a) The class Methanomicrobia comprised three polyphyletic orders in NCBI (blue boxes, red font), two of which (Methanosarcinales and Methanocellales) are separated from the type order Methanomicrobiales. (b) In GTDB we re-assigned two of the orders to class ranks (Methanosarcinia and Methanocellia) and preserved the class Methanomicrobia (blue box) for the lineage containing the type material: Methanomicrobium is the type genus of the order Methanomicrobiales and the type genus of the family Methanomicrobiaceae [4, 5]. The type species is Methanomicrobium mobile [1].

Figure S5 | Resolving the polyphyly of the genus Thermococcus. A section of the ar122.r89 tree decorated with the NCBI taxonomy is shown, with all taxa assigned to the genus Thermococcus in NCBI colored in red. The r89 GTDB taxonomy resolving the Thermococcus polyphyly is shown on the right. The monophyletic clade containing the type species Thermococcus celer (highlighted by a green box) retains the name Thermococcus, whereas the remaining polyphyletic “Thermococcus” groups obtained alphabetical suffixes indicating polyphyly (e.g. Thermococcus_A, Thermococcus_B, etc.). The misclassified species Thermococcus chitonophagus was reclassified as Pyrococcus chitonophagus since it branches within the genus Pyrococcus with the type species Pyrococcus furiosus (blue box).

Figure S6 | Rank normalization of the genus Methanobrevibacter. Section of the ar122.r89 tree showing all genomes assigned to the genus Methanobrevibacter in NCBI. Based on the rank normalization in GTDB, the genus Methanobrevibacter was split into five genera, one of which includes the type material Methanobrevibacter ruminantium (red font) and therefore retained the name Methanobrevibacter (green box), whereas for lineages without type material (rose boxes) the present names were retained as placeholders with alphabetical suffixes (Methanobrevibacter_A, Methanobrevibacter_B, etc).
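
The placeholder naming convention described here can be sketched in Python. This is an illustrative sketch only: the function name and the clade records are hypothetical, and the order in which suffixes are assigned to clades is an assumption.

```python
from string import ascii_uppercase


def assign_genus_names(genus, clades):
    """The clade holding type material keeps the name; the others get _A, _B, ..."""
    suffixes = iter(ascii_uppercase)
    names = []
    for clade in clades:
        if clade["has_type_material"]:
            names.append(genus)
        else:
            names.append(f"{genus}_{next(suffixes)}")
    return names


# Five hypothetical clades; only the second contains the type material.
clades = [
    {"has_type_material": False},
    {"has_type_material": True},
    {"has_type_material": False},
    {"has_type_material": False},
    {"has_type_material": False},
]
print(assign_genus_names("Methanobrevibacter", clades))
```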

Figure S7 | Dendrograms representing the pairwise normalized Robinson-Foulds (nRF) distance. The dendrograms were constructed using the Neighbour-Joining algorithm on pairwise distances. (a) The nRF distance was calculated on the full set of taxa, without tree pruning. (b) The nRF distance was calculated after pruning each tree to the subset of common taxa (n=643), which was limited by the SSU selection criteria. Coloured boxes highlight similar approaches: SSU-based trees (light orange), rp1 and rp2 trees (blue), 122+ marker-based trees (green), and ASTRAL supertrees (purple). The ar122.r89 GTDB reference tree is highlighted in bold. More details about tree inference methods are provided in Table S10.

Figure S8 | Heatmap based on the normalized Robinson-Foulds distance. All inference methods and marker sets are shown. Note that the RF distance is calculated as the sum of partitions in one tree that are not shared by the other. The ar122.r89 GTDB reference tree is highlighted in bold.
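
The RF definition given in this caption can be illustrated with a small Python sketch in which each tree is represented by its set of non-trivial bipartitions. The function names and the toy trees are hypothetical, and the normalization by the total number of bipartitions in both trees is an assumption, since the supplement does not spell out the normalization used.

```python
def rf_distance(splits_a, splits_b):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(splits_a ^ splits_b)


def normalized_rf(splits_a, splits_b):
    """RF distance divided by the total number of bipartitions in both trees."""
    total = len(splits_a) + len(splits_b)
    return rf_distance(splits_a, splits_b) / total if total else 0.0


# Toy five-taxon trees. Each split is written as the frozenset of taxa on the
# side of the bipartition that does not contain taxon "A".
tree1 = {frozenset({"D", "E"}), frozenset({"C", "D", "E"})}  # ((A,B),(C,(D,E)))
tree2 = {frozenset({"D", "E"}), frozenset({"B", "C"})}       # (A,((B,C),(D,E)))

print(rf_distance(tree1, tree2), normalized_rf(tree1, tree2))
```

Both trees must be pruned to the same taxon set before their bipartitions are compared, which corresponds to the pruning step described for Figure S7b.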

Figure S9 | Individual protein phylogeny of the 122 markers. All graphs are based on the F measure calculated for the single gene phylogeny (IQTREE, C10, PMSF) for each marker protein. The upper plots show the results for all GTDB taxa (genus to phylum). The lower plots show the same results but exclude taxa consisting of only a single genome. (a) The average F measure for each marker protein comparing the ranks of genus, family, order, class, and phylum. Each dot represents the average F measure of a rank in an individual protein phylogeny, derived from a single marker. The ranks are color coded, ranging from phylum (dark blue) to genus (gray). The red line indicates the 95% F measure mark, defining all internal nodes above this line as “operationally monophyletic”. (b) Average single-gene tree F measure per rank. As expected, the highest F measure is reported for lower ranks from genus to order. (c) Percentage of the GTDB taxa per rank (genus to phylum) recovered as monophyletic groups in ≥50% of single protein trees. The average across all taxa is shown as a red line at 91.3%. (d) Average percentage at which a taxon was resolved as monophyletic in all single marker trees, categorized per rank (genus to phylum) in GTDB. The average across all taxa is shown as a red line at 89.6%. Note: the rank of species was excluded, since the 1248 archaeal GTDB taxa are dereplicated on the species level, which results in a default F measure of 1 for each species.
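
A minimal Python sketch of the F measure and the 95% threshold is given below. It assumes the usual definition of the F measure as the harmonic mean of precision and recall between the genomes of a taxon and the genomes below an internal node. The function names and genome labels are hypothetical, and treating a value of exactly 0.95 as passing is also an assumption.

```python
def f_measure(taxon_genomes, node_genomes):
    """Harmonic mean of precision and recall of a node with respect to a taxon."""
    shared = len(taxon_genomes & node_genomes)
    if shared == 0:
        return 0.0
    precision = shared / len(node_genomes)
    recall = shared / len(taxon_genomes)
    return 2 * precision * recall / (precision + recall)


def is_operationally_monophyletic(taxon_genomes, tree_nodes, threshold=0.95):
    """True if the best-matching internal node reaches the F measure threshold."""
    best = max(f_measure(taxon_genomes, node) for node in tree_nodes)
    return best >= threshold


# Hypothetical taxon of 20 genomes, plus nodes that add one or three intruders.
taxon = {f"genome_{i}" for i in range(20)}
node_one_intruder = taxon | {"intruder_1"}
node_three_intruders = taxon | {"intruder_1", "intruder_2", "intruder_3"}

print(is_operationally_monophyletic(taxon, [node_one_intruder]))
print(is_operationally_monophyletic(taxon, [node_three_intruders]))
```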

Figure S10 | Percentage of operationally monophyletic taxa in the SSU trees. The average percentage across all ranks (genus to phylum) is shown as a red line at 84.6% (IQ-TREE) and 85.7% (FastTree). Note that only SSU genes of at least 900 bp after trimming were included in the analysis. Taxa comprising only a single genome were excluded from the analysis.

Figure S11 | Percentage of operationally monophyletic taxa in all compared trees. (a) The average percentage of operationally monophyletic (yellow bars) and polyphyletic (red bars) taxa across all ranks (genus to phylum). The dotted red line depicts the overall average of 97.7% calculated across all ranks. (b-f) Number of operationally monophyletic (yellow bars) and polyphyletic (orange bars) taxa across different inference methods and markers, shown for the ranks of phylum (b), class (c), order (d), family (e), and genus (f). Note that only taxa with two or more genomes were included. Also note that the data set (order representatives) used for PhyloBayes restricts the analysis to the ranks of phylum and class. Monophyly and operational monophyly were determined based on the F measure of decorated internal nodes. Abbreviations: “comp. bias” = tools to remove compositional bias.

BS taxon
100 c__Methanocellia; o__Methanocellales
86 c__Thermoplasmata
70 o__Desulfurococcales
88 o__SCGC-AAA252-I15
86 o__Thermofilales
97 o__WOR-SM1-SCG
82 f__GW2011-AR4
100 f__Halococcaceae
100 f__NZ13-MGT
89 g__Acidianus
93 g__Acidilobus
82 g__Halogranum
99 g__MGIIa-L1
82 g__MGIIb-O3
91 g__UBA493
100 g__UBA71
90.3 Average
8.82 Stdev

Figure S12 | Bootstrap support values in the ar122.r89 tree of taxa resolved as polyphyletic in rp1 and rp2 trees. With the exception of the order Desulfurococcales (70% bootstrap support), all lineages have a bootstrap value of 81% or higher, with an average of 90.3%. Taxa are ordered by rank; bar charts indicate bootstrap values from 70 to 100%. Abbreviations: BS = bootstrap support value.
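
The summary statistics reported with Figure S12 can be recomputed from the 16 listed bootstrap values. The short Python sketch below does so; it assumes the reported "Stdev" is the sample standard deviation, which is what reproduces the value of 8.82.

```python
from statistics import mean, stdev

# Bootstrap support values as listed with Figure S12.
bootstrap = {
    "c__Methanocellia; o__Methanocellales": 100,
    "c__Thermoplasmata": 86,
    "o__Desulfurococcales": 70,
    "o__SCGC-AAA252-I15": 88,
    "o__Thermofilales": 86,
    "o__WOR-SM1-SCG": 97,
    "f__GW2011-AR4": 82,
    "f__Halococcaceae": 100,
    "f__NZ13-MGT": 100,
    "g__Acidianus": 89,
    "g__Acidilobus": 93,
    "g__Halogranum": 82,
    "g__MGIIa-L1": 99,
    "g__MGIIb-O3": 82,
    "g__UBA493": 91,
    "g__UBA71": 100,
}

values = list(bootstrap.values())
average = round(mean(values), 1)      # compare with the reported 90.3
sample_sd = round(stdev(values), 2)   # compare with the reported 8.82
print(average, sample_sd)
```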

Figure S13 | RED comparisons of different phylogenetic trees decorated with the R04-RS89 GTDB taxonomy. RED distributions for taxa at each rank are shown as the relative distance from the median RED value of the rank. (a) phylum, (b) class, (c) order, (d) family, (e) genus. The phylogenetic trees are defined in Table S10.
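
The quantity plotted in Figure S13, the distance of each taxon's RED value from the median RED of its rank, can be sketched in Python as follows. The function name, taxon labels and RED values are hypothetical placeholders, not data from the study.

```python
from statistics import median


def red_offsets(red_by_taxon):
    """Distance of each taxon's RED value from the median RED of its rank."""
    rank_median = median(red_by_taxon.values())
    return {taxon: red - rank_median for taxon, red in red_by_taxon.items()}


# Hypothetical RED values for three taxa of the same rank in one tree.
class_red = {"c__ExampleA": 0.30, "c__ExampleB": 0.35, "c__ExampleC": 0.45}
offsets = red_offsets(class_red)
print(offsets)
```

A taxon with a strongly negative or positive offset sits closer to or further from the root than is typical for its rank in that tree.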

Figure S14 | Taxonomic differences of phylogenetic trees decorated with the R04-RS89 GTDB taxonomy. The plots show the named taxonomic ranks which differ in their taxa composition in a given tree compared to the ar122.r89 tree. Each tree plot illustrates the number of expected taxa for a given rank (red box) based on the ar122.r89 tree. The total number of taxa that cluster unexpectedly within or outside this rank in all the trees included in a plot is shown in a purple box, and the number of taxa that cluster unexpectedly within or outside this rank in a specific tree is shown in green, yellow, and blue boxes. The comparison includes trees calculated from (a) 16 and 23 ribosomal proteins, (b) small subunit rRNA sequences, (c) 122 markers with alternative methods, (d) 122 markers after applying tools to resolve compositional bias, (e) 122 and 253 markers via a supertree approach, and (f) 122 markers and a reduced dataset of 96 order representatives inferred with PhyloBayes. Abbreviations: “No. Common In” = in all trees, the same x taxa are unexpectedly descendants of this specific rank; “No. Common Out” = in all trees, the same y taxa are no longer descendants of this specific rank; “No. In” = in a given tree, x taxa are unexpectedly descendants of this specific rank; “No. Out” = in a given tree, y taxa are no longer descendants of this specific rank. Note that in (c) in tree 2.3.iqtree.c10.slow (2.3.iqtreeC10) a single taxon (GB_GCA_002789275.1; Candidatus Huberarchaea) was placed within the p__Nanoarchaeota.
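
The "In" and "Out" counts defined in the abbreviations can be expressed as set differences. The Python sketch below is illustrative only: the function names and the genome labels `nano_1` and `nano_2` are hypothetical, while GB_GCA_002789275.1 is the taxon named in the caption.

```python
def in_out(expected, observed):
    """Taxa unexpectedly inside a rank ("In") and taxa missing from it ("Out")."""
    taxa_in = observed - expected
    taxa_out = expected - observed
    return taxa_in, taxa_out


def common(per_tree_sets):
    """Taxa shared by every tree, as used for "No. Common In" and "No. Common Out"."""
    return set.intersection(*per_tree_sets) if per_tree_sets else set()


# Expected members of p__Nanoarchaeota in the reference tree (hypothetical labels).
expected = {"nano_1", "nano_2"}
# Members observed in a compared tree, where one extra taxon is placed inside.
observed = {"nano_1", "nano_2", "GB_GCA_002789275.1"}

taxa_in, taxa_out = in_out(expected, observed)
print(len(taxa_in), len(taxa_out))
```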

Figure S15 | Phylogenetic supertree calculated with the ASTRAL approach using 253 proteins. The major difference to the ar122.r89 tree taxonomy is that 48 taxa cluster outside of the phylum Euryarchaeota; these comprise the classes Thermococci (46 taxa) and Methanopyri (2 taxa), both highlighted in burgundy. Note that if you are French you most likely call this shade of red after a different wine region, Bordeaux, although if you speak Quebec French it is Bourgogne for you.

Figure S16 | Impacts of the root placement on the relative evolutionary divergence (RED). A fixed root pulls the taxa within the rooted lineage towards the root and causes slightly lower RED values for these taxa. The plot compares the RED values of the archaeal classes across all rooting scenarios, including the GTDB rooting approach (GTDB; blue), the midpoint root between the DPANN superphylum and the rest of the Archaea (DPANN; light blue), and the midpoint rooting within the NCBI phylum Euryarchaeota (EURY; orange). The affiliation of each class to the lineages used for the different rooting scenarios is indicated by blue boxes and the name of the lineage (DPANN = NCBI superphylum DPANN; EURY = NCBI phylum Euryarchaeota, containing the three GTDB phyla Thermoplasmatota, Halobacteriota, and Euryarchaeota). Note that in cases where DPANN (blue box labelled “DPANN”) was used to root the tree (light blue dots), the classes within this lineage received the lowest RED values; the same holds true for the other rootings, e.g. rooting within the NCBI phylum Euryarchaeota (yellow dots) results in lower RED values for classes within these lineages.

Figure S17 | Section of the ar122.r89 tree showing the lineages in the DPANN superphylum. The taxonomic groups are collapsed on the order level. GTDB phyla are indicated in red, and the internal nodes representing lineages containing the Candidatus type material are highlighted in blue. Note the low bootstrap values for some bifurcations in this section of the tree. Scale bar represents 0.1 substitutions. Note that the taxon ID GCA_002254415.1 of the GTDB representative of the species s__EX4484-52 sp002254415 was omitted from the Figure due to space limitations.

Figure S18 | The polyphyletic genus Sulfolobus. Shown is a section of the ar122.r89 tree containing the family Sulfolobaceae (purple font). All genomes assigned to the polyphyletic genus Sulfolobus in NCBI are highlighted in red. In GTDB Sulfolobus was split into 4 genera. Next to the true Sulfolobus, with the type species Sulfolobus acidocaldarius DSM 639 [7] (DSM 639; approved list 1980; bold blue font), three genera with placeholder names (Sulfolobus_A, Sulfolobus_B, Acd1) were created. Scale bar represents 0.1 substitutions.

Figure S19 | The genera Natrinema, Halopiger, Haloterrigena, Natronolimnobius and Natronorubrum. A section of the ar122.r89 tree shows the family Natrialbaceae and its genera as defined in GTDB R04-RS89. For each taxon the NCBI ID is provided, followed by the NCBI species name and the GTDB species name. Type material, i.e. the type strain of the species, is highlighted in bold and indicated with an asterisk. The relative evolutionary divergence (RED) is provided as a scale bar below the tree. The tree scale bar shows 0.05 substitutions.

Figure S20 | Total number of genomes in GTDB releases. Shown is the total number of genomes that passed QC in the current and previous releases of the GTDB taxonomy for the archaeal (blue) and the bacterial (orange) domain.

References

1. Paynter MJB, Hungate RE. Characterization of Methanobacterium mobilis, sp. n., Isolated from the Bovine Rumen. J Bacteriol 1968; 95: 1943–1951.
