<<

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Growth temperature is the principal driver of chromatinization in

Antoine Hocher1,2*, Guillaume Borrel3, Khaled Fadhlaoui4, Jean-François Brugère4, Simonetta Gribaldo3, Tobias Warnecke1,2*

1Medical Research Council London Institute of Medical Sciences, London, United 2Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom 3Department of Microbiology, Unit “Evolutionary Biology of the Microbial Cell”, Institut Pasteur, Paris, France 4Université Clermont Auvergne, CNRS, Laboratoire Microorganismes: Génome et Environnement LMGE, Clermont-Ferrand, France

*corresponding author : [email protected] [email protected]

Running title: A quantitative view of archaeal chromatin

1 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

ABSTRACT

Across the tree of life, DNA in living cells is associated with proteins that coat chromosomes, constrain their structure and influence DNA-templated processes such as transcription and replication. In bacteria and , HU and histones, respectively, are the principal constituents of chromatin, with few exceptions. Archaea, in contrast, have more diverse repertoires of nucleoid-associated proteins (NAPs). The evolutionary and ecological drivers behind this diversity are poorly understood. Here, we combine a systematic phylogenomic survey of known and predicted NAPs with quantitative protein abundance data to shed light on the forces governing the evolution of archaeal chromatin. Our survey highlights the Diaforarchaea as a hotbed of NAP innovation and turnover. Loss of histones and Alba in the ancestor of this clade was followed by multiple lineage-specific horizontal acquisitions of DNA-binding proteins from other . Intriguingly, we find that one family of Diaforarchaea, the Methanomethylophilaceae, lacks any known NAPs. Comparative analysis of quantitative proteomics data across a panel of 19 archaea revealed that investment in NAP production varies over two orders of magnitude, from <0.02% to >5% of total protein. Integrating genomic and ecological data, we demonstrate that growth temperature is an excellent predictor of relative NAP investment across archaea. Our results suggest that high levels of chromatinization have evolved as a mechanism to prevent uncontrolled helix opening and runaway denaturation – rather than, for example, to globally orchestrate gene expression – with implications for the origin of chromatin in both archaea and eukaryotes.

Keywords: chromatin, histones, archaea, nucleoid,

2 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

INTRODUCTION

Archaeal genomes are coated by small, abundant and often basic proteins that bind DNA with low sequence specificity. As in bacteria, these are collectively referred to as nucleoid- associated proteins (NAPs). A variety of proteins that fit this loose description have been described in archaeal model organisms, and include histones, Alba, Cren7, and MC1 (Zhang et al. 2012). Whereas histones provide the backbone of chromatin across eukaryotes, the repertoire of major chromatin proteins in archaea is considerably more diverse and fluid. Histones are common but were lost along several lineages, including the / and (Adam et al. 2017). On the flipside, several NAPs are abundant but lineage-specific, including HTa in the

(Hocher et al. 2019) and Sul7 in the Sulfolobales (Zhang et al. 2012).

The evolutionary and ecological drivers of NAP turnover in archaea are poorly understood. In fact, the diversity of NAPs in archaea has been described as “puzzling” in light of otherwise highly conserved information processing pathways (Visone et al. 2014). Do different NAPs represent adaptations to specific niches? If so, what factors determine the presence or absence of a given NAP in a given genome? Alternatively, are NAPs in archaea diverse because several different proteins can do the same job, rendering them exchangeable? In other words, are these NAPs functional analogues that can replace each other without serious repercussions for genome function?

Phylogenomic surveys provide a useful starting point to tackle these questions. They chart the distribution of homologous genes across a set of genomes and allow gain and loss events to be traced along the phylogeny. The resulting presence/absence patterns, considered in ecological context, might then reveal clues as to why a particular gene is found in one set of genomes but not another. Phyletic comparisons, however, can be treacherous. The presence of a specific gene in two genomes does not necessarily imply that the protein product is doing the same job in both. Indeed, in the case of NAPs, we have reason to suspect that this assumption – known as the ortholog conjecture – is not always met. Histones, for example, are highly abundant at the protein level in Thermococcus kodakarensis (1.76% of total protein) but only weakly expressed in salinarum (0.02% of total protein) (Rojec et al. 2019; Müller et al. 2020). Given the difference in abundance, histones are unlikely to play the same roles in nucleoid biology in these two . In line with this, the

3 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

single H. salinarum histone gene (hpyA) is dispensable for growth (Dulmage et al. 2015), whereas retention of at least one of its two histone genes (htkA, htkB) is essential in T. kodakarensis (Cubonova et al. 2012). Alba too is highly expressed in many archaea, including shibatae (1.6% of total protein) but >100-fold less abundant in others, e.g. Methanococcus maripaludis (0.01% of total protein) (Liu et al. 2009). These large differences in abundance are indicative of cryptic functional diversity that is not directly accessible via comparative genomics.

Here, we combine a systematic phylogenomic survey of NAPs with quantitative mass spectrometry data on NAP abundance to shed light on the evolutionary drivers of chromatin diversity in archaea. Our survey highlights the Diaforarchaea as a group that has experienced exceptional NAP fluidity. Most surprisingly, we find that the Methanomethylophilaceae, a family in the order Methanomassiliicoccales, are devoid of any known NAPs. We generate quantitative proteomics data for two members of the Methanomassiliicoccales, Methanomethylophilus alvus and Methanomassiliicoccus luminyensis, and develop a simple pipeline to identify novel NAP candidates in these species. Applying this pipeline to proteomic data from an additional 17 archaea, we propose several new proteins with the capacity to play global roles in nucleoid organization, including highly abundant but previously uncharacterized proteins in model archaea. Pan-archaeal analysis revealed substantial quantitative variation in NAP abundance. Total investment in NAPs varies over two orders of magnitude, from as little as 0.014% to 5.38% of total protein. Exploring potential ecological and genomic covariates of differential investment in NAPs, we pinpoint growth temperature as the principal driver of NAP abundance across archaea. Our results suggest that high levels of chromatinization in archaea are first and foremost an adaptation to thermal stress. We speculate that chromatin in eukaryotes might constitute an evolutionary hangover from their thermophilic past: unlike many of their archaeal cousins, eukaryotes were unable to reduce their large investment in chromatin proteins as they transitioned to a mesophilic lifestyle, presumably because histones had assumed important functions other than thermoprotection.

4 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

RESULTS

A phylogenomic survey uncovers archaea without known nucleoid-associated proteins

To provide an up-to-date view of NAP diversity across the archaea, we first collated a list of previously described archaeal NAPs (Fig 1A) and used sensitive Hidden Markov Model (HMM) scans to establish the presence/absence of NAP homologs in 1419 archaeal genomes representative of known archaeal diversity (Table S1, see Methods). As highlighted previously (Zhang et al. 2012), archaeal chromatin is not dominated by a single protein but by small cliques of typically two (and sometimes three or more) abundant proteins (Fig 1B/C, Table S1). Different NAPs from a pan-archaeal repertoire can co-occur in most any clique, which are frequently dismantled by gene loss and absorb new members via horizontal gene transfer (HGT). Across our sample, any given NAP can be found partnering any other (Fig 1B, Fig S1), suggestive of functional promiscuity. While some NAPs are phylogenetically widespread, none are universal to archaea (Fig 1B/C). Histones and Alba are the most common and were likely present in the last archaeal common ancestor (LACA), but both have been lost along different lineages (Fig 1C). Conversely, gains are common and frequently driven by HGT (see below).

One clade where fluidity in NAP repertoire is particularly striking is the Diaforarchaea (Fig 2A). Both histones and Alba were lost at the root of this clade. Several lineages, including the Thermoplasmatales, later re-acquired Alba from different sources, as supported by the polyphyletic distribution of diaforarchaeal homologs on a pan-archaeal Alba tree (Fig 2B). Subsequent to alba/histone loss, NAP repertoires evolved in a highly idiosyncratic fashion along different diafoarchaeal lineages. We previously described the presence of an HU homolog (HTa) in acidophilum, which is highly expressed, exhibits histone- like binding behaviour, and was likely acquired via HGT from bacteria (Searcy 1975; Hocher et al. 2019). HU homologs, however, are confined to the Thermoplasmatales and absent from the remainder of the Diaforarchaea (Fig 2A). Similarly, most members of the marine group II (MG-II) archaea encode MC1, a NAP best known from spp. (Chartier et al. 1988) and widespread amongst (Fig 1C). Again, MC1 is only present in MG-II but absent from other diaforarchaeal lineages, and was likely acquired via HGT (Fig 2A, S2). Most curiously, we find that the members of one diaforarchaeal lineage, the Methanomethylophilaceae, encode no known NAPs whatsoever (Fig 2A).

5 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Predicting novel candidate NAPs

Do members of the Methanomethylophilaceae make do without major nucleoid-associated proteins? Or do these organisms encode as yet uncharacterized NAPs? To begin to address these questions, we generated quantitative mass spectrometry data for two members of the Methanomassiliicoccales, both isolated from the human gut: Methanomassiliicoccus luminyensis (Dridi et al. 2012) and Methanomethylophilus alvus (Fadhlaoui, Ben Hania et al., in preparation). We detected and quantified 72% of the predicted proteome in M. alvus and 67% in M. luminyensis, which compares favourably to a recent state-of-the-art effort to catalogue proteins across the tree of life (Müller et al. 2020) (Fig S3). Alba, though present in the genome of M. luminyensis, was not expressed at detectable levels. We then developed a simple pipeline to predict proteins that might play a global role in nucleoid organization similar to known NAPs (Fig 3A, Methods). To qualify as a candidate NAP, proteins had to meet four criteria: First, they could not exceed the size of previously characterized NAPs, which are typically short. Permissively, we only considered proteins smaller than 290 amino acids, 110% the size of TrmBL2 in T. kodakarensis (see below). Second, they had to contain either a known DNA-binding or be predicted to bind DNA (see Methods). Third, they needed to be expressed at a level that makes them high-abundance outliers compared to other predicted transcription factors, objectively determined using Rosner tests (see Methods). Fourth, they had to be encoded as single-gene operons. The latter criterion was adopted following the observation that known NAPs are usually present in single-gene operons (Fig 3B, see Methods). Two proteins in M. luminyensis met these criteria. One of them (WP_019177984.1) is a small (74 amino acids), basic (pI: 9.64) and lysine-rich protein, which constitutes 1.34% of the M. luminyensis proteome (Fig 3C, Table S1), making it the 12th most highly expressed protein in our sample. Its homolog in M. alvus (AGI86273.1) was independently identified as the sole candidate NAP in this organism, where it is less strongly expressed (0.14% of total protein, ranking 123rd out of 1220 proteins). Neither protein contains a known DNA-binding domain. Phylogenomic analysis reveals orthologs of WP_019177984.1/AGI86273.1 throughout the Methanomassiliicoccales and in several bacterial genomes, particularly amongst Sphingomonadales and Rhodobacterales (Fig S4). Monophyly of the Methanomassiliicoccales homologs suggests a single acquisition event at the root of this clade, which preceded the loss of Alba (Fig S4, Fig 2). Experimental investigation will be required to confirm or refute the association of

6 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

WP_019177984.1/AGI86273.1 with the nucleoid, but we suggest – based on this preliminary evidence – that it is unlikely that Methanomethylophilaceae make do entirely without NAPs. Relaxing criteria on single-gene operon status did not reveal additional candidates for M. alvus. Two additional candidates were recovered in M. luminyensis but their quantitative contribution to overall NAP investment is minor (0.4%).

Detection of novel candidate NAPs in several model archaea

To see whether the same approach might reveal additional candidate NAPs in other archaea, including well-characterized model species, we applied the same pipeline to a compendium of 17 archaea for which we could retrieve published proteome-scale quantitative mass spectrometry data. Quantitative inventories for 13 of these species were recently published as part of a cross-kingdom proteome survey (Müller et al. 2020) and generated using the same protocol that we used for M. luminyensis and M. alvus (see Methods).

Starting from 22643 proteins across 17 species, and excluding known NAPs, we retrieved 22 candidate hits (Fig 3D; Fig S5). Manual inspection revealed one obvious false positive, AAY80109.1 in Sulfolobus acidocaldarius, a truncated Lrp/AsnC transcriptional regulator that, unlike in other Sulfolobales, lacks a DNA-binding domain. Reassuringly, we recover TrmBL2, a known constituent of chromatin in T. kodakarensis where it is unusually abundant compared to TrmB homologs in other archaea (Fig S6). For some species (e.g., Methanothermobacter marburgensis and Archaeoglobus fulgidus) we identified no additional candidates, suggesting that our pipeline is not excessively greedy (Fig 3D). For others (e.g., Sulfolobus acidocaldarius), we only retrieved candidates that are much less abundant than known NAPs in the same organism. In contrast, we also observed species where novel candidates make up a substantial portion of the overall investment in NAPs, rivalling or even dwarfing the abundance of known NAPs. Notably, this list includes the model archaeon volcanii, where the two candidate NAPs (HVO_1577, HVO_2029) are considerably more abundant than either histones or MC1 (Fig 3D), a finding we confirm in an independently generated proteomics dataset (Fig S7). We note that, unlike the NAP candidate WP_019177984.1/AGI86273.1 in M. luminyensis/M. alvus, most of these hits are larger than well-known NAPs and might therefore, like TrmBL2, have originated from normally less abundant transcription factors.

7 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Investment in NAPs varies extensively across archaea

With or without candidate NAPs, the comparison above revealed striking variation in total NAP investment across species, ranging over two orders of magnitude, from 0.014% of total protein in the magadii to 5.38% in Archaeoglobus profundus (Fig 3D). Individual NAPs can vary over a similar range: relative histone abundance, for example, varies up to 400-fold (Fig S8), from 3.2% of the proteome in A. profundus to <0.06% in Nitrosopumilus maritimus, Methanosarcina barkeri and the , where abundance is indistinguishable from that of sequence-specific transcription factors (Fig S8). Note here that we operationally define % total protein as the fraction of total intensity in the mass spectrometry data attributable to NAPs.

What are the reasons for this wide variability in NAP investment? We initially considered the possibility that variability is not, in fact, biologically meaningful but attributable to experimental factors. It is conceivable, for example, that a NAP, once detected, might represent an artificially high proportion of a proteome simply because comparatively few proteins were quantified. This is clearly not the case in H. volcanii: less than 60% of protein- coding genes were quantified in this species (Fig S3) but histones make up only a small proportion of this total. It is also not generally true: we find no correlation between fractional coverage of the predicted proteome and the proportion allocated to NAPs (rho=-0.31, P=0.19). Further, differential investment is also evident when relative abundance is scaled not to the total proteome but to the abundance of house-keeping genes (tRNA synthetases) that show low cross-species variability (Fig S9, see Methods).

Next, to confirm that fractional protein abundances can be reasonably compared across species, we asked whether the relative abundance of a protein in one species is usually predictive of the relative abundance of its homolog in another species. Considering reciprocal best-blast hits (RBBHs) between species as an indicator of homology, we find this to be the case. Organisms that are phylogenetically or ecologically close tend to have more correlated abundance profiles (Fig S10). This is particularly evident when we consider not individual RBBHs but instead aggregate protein abundance by Pfam domain content or gene ontology category (see Methods, Fig S10). These results indicate that we can make informative quantitative comparison across species using the proportion of total protein as a metric. The

8 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

results also advocate the use of lower granularity. Below, we therefore consider the abundance of all NAPs collectively.

Growth temperature predicts NAP investment

We next asked whether relative NAP investment is a (somewhat trivial) function of genome size, whereby organisms with larger genomes need to make a greater relative investment in NAPs because they have more DNA to coat. We find this not to be the case (rho=-0.3, P=0.21). To gain clues into potential ecological drivers of NAP investment, we then determined which proteins (or protein domains/functional categories) quantitatively covaried with NAP investment across species (see Methods). Amongst the most highly correlated domains, we find several that are classically associated with heat stress, including the protein chaperones Hsp20 and prefoldin but also RTCB, which has been implicated in recovery from stress-induced RNA damage (Tanaka and Shuman 2011) (Fig 4A). Prompted by these findings, we examined several environmental and phenotypic variables, including optimal growth temperature (OGT), pH and doubling time. We find that relative abundance of NAPs is uniquely, and strongly, associated with OGT (rho=0.83, P=8e-6, Fig 4B, Table S2). This finding is robust to inclusion/exclusion of candidate NAPs (Fig S11) and holds true for individual NAPs, notably histones and Alba, which are sufficiently widespread to allow cross-species comparisons (Fig S11). Importantly, the relationship between NAP abundance and OGT is preserved when controlling for phylogenetic non-independence (see Methods). The relative abundance of sequence-specific transcription factors, on the other hand, does not covary with temperature (Fig 4C).

Is it unexpected to find a strong relationship between the abundance of a certain class of proteins and OGT? To address this question, we computed the correlation between OGT and the relative abundance of 1154 Pfam domains (and 297 gene ontology categories) across the 19 species in our analysis. Strikingly, NAPs (considered as an aggregate class) have the strongest relationship with growth temperature (Fig 4D).

NAP investment increases with temperature on physiological time scales

9 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

If growth temperature is a driver of NAP abundance over evolutionary time scales, this might also be true for physiological time scales. Do we see NAP abundance go up following heat stress and decrease in response to cold shock? Although no systematic data exist that span the diversity of species examined above, prior temperature shift experiments from various archaea support this hypothesis: first, NAP abundance is affected by temperature in T. kodakarensis (Fig 4E) (Sas-Chen et al. 2020), with reduced levels of histones and Alba driving a 13.5% relative drop in total NAP investment at 65ºC compared 85ºC (Fig 4F). Similarly, histone transcripts are downregulated upon cold shock in M. jannaschii (Boonyaratanakornkit et al. 2005), as are histones and MC1 in burtonii (Campanaro et al. 2011), while Sul7 expression increases upon heat shock in (Tachdjian and Kelly 2006). Taken together, these observations suggest that NAP investment varies with growth temperature not only over evolutionary but also physiological time scales.

DISCUSSION

Temperature is a powerful driving force for molecular evolution. Selection for increased thermostability has left conspicuous footprints on the composition of proteins and RNAs in many species. Proteins from are, for example, enriched in charged and hydrophobic amino acids while their structural RNAs (tRNAs, rRNAs) exhibit higher than average GC content, consistent with the need for stronger base-pair bonds at higher temperatures (Hickey and Singer 2004; Zeldovich et al. 2007). Similar compositional hard- coding was suspected to occur at the DNA level: as G-C base-pairs provide greater stability than A-T pairs, it was reasonable to suspect that thermophiles have genomes with elevated GC content. However, this turned out not to be the case (Hickey and Singer 2004). As demonstrated, for example, by T. kodakarensis (52% GC, OGT: 85ºC) and Pyrococcus furiosus (41% GC, OGT: 100ºC), average or even low genomic GC content is clearly compatible with growth at high temperatures (Grogan 1998).

NAPs, along with polyamines, have been suggested as alternative solutions, which can be deployed dynamically and on physiological time scales. A number of in vitro studies on archaeal histones (Sandman et al. 1990), Sul7 (Baumann et al. 1994; Guagliardi et al. 1997), HTa (Stein and Searcy 1978; Searcy 1986), and MC1 (Chartier et al. 1988) have shown that NAP binding can significantly increase melting temperature, reduce the risk of DNA

10 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

denaturation, and/or promote strand re-annealing. The results reported here unify these findings to establish temperature as a universal quantitative driver of investment in NAPs across the archaea.

NAPs likely play a role in reducing the risk of accidental as well as programmed opening events, which occur in the context of transcription, replication, and repair. A particular risk in this regard emanates from promoters, which are AT-rich and poised to open to enable transcription. Thermophiles appear to have reduced this liability in part through a simple strategy: losing promoters. The number of genes per transcription unit, co-expressed from a single upstream promoter, increases with temperature (Fig S12). In addition, the proportion of the genome dedicated to intergenic regions decreases with temperature across archaea (Sabath et al. 2013). Pinning the promoter on either side with DNA-binding proteins – as observed for histones (Nalabothula et al. 2013) but also HTa in T. acidophilum (Hocher et al. 2019) – might have evolved in parallel to prevent uncoordinated promoter melting and runaway extension of the resulting denaturation bubbles.

Protection from denaturation as the principal function of NAPs is consistent with the high fluidity of NAPs across archaeal evolution, epitomized by the Diaforarchaea. Proteins of several different folds will probably do a serviceable job of raising DNA melting temperature and curbing denaturation if they found themselves transplanted into a different genomic context. This model of NAP evolution is further consistent with limited quantitative covariation between the abundance of NAPs and other chromatin factors across evolution and with NAPs being encoded as single-gene operons. Both observations suggest a scarcity of intimate dependencies.

At the same time, we must highlight that our results do not exclude specific adaptive roles that may drive NAP diversity across the archaea. One such adaptive role might be in the prevention, detection, and repair of DNA damage (Grogan 1998). Mutagenic challenges differ considerably across environments and might favour some NAPs over others. MC1, for example, protects against radiation damage (Isabelle et al. 1993), a frequent insult for that live in shallow aquatic environments. Conversely, Cren7 binds to T:G mismatches produced by cytosine deamination events (Tian et al. 2016), which become more common at higher temperature. Thus, we suggest that while denaturation concerns shape total

11 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

NAP abundance, NAP diversity is likely a product of both exchangeability and species- specific requirements for nucleoid function and maintenance.

New chromatin components can be acquired from other archaea or bacteria, as illustrated by the eventful natural history of the Diaforarchaea where HTa, Alba, MC1 and the newly identified Methanomassiliicoccales protein can trace their origin to horizontal transfers. In addition, new components can arise through innovation/repurposing of proteins already present in the proteome, as transcription factors, like TrmBL2 in T. kodakarensis, become global chromatin constituents. Conversely, proteins can lose their global architectural roles. In the most extreme case, once abundant NAPs can be lost entirely – as observed in the Methanomethylophilaceae. They can also undergo significant reductions in abundance. This is what appears to have happened to histones in halophiles and other lineages, consistent with their non-essential status in H. salinarum (Dulmage et al. 2015) and Methanosarcina mazei (Weidenbach et al. 2008). The low abundance of HstA in H. volcanii, at both the transcript (Rojec et al. 2019) and protein level (Fig 3D), is hard to reconcile with its purported role as major architectural factors (Ammar et al. 2011). We therefore suggest that prior findings of widespread protection from micrococcal nuclease digestion in this species might, in fact, be caused not by histones but by an as yet uncharacterized protein or set of proteins. Our de novo prediction pipeline suggests HVO_1577, a protein that contains an HrcA DNA-binding domain, as a candidate. Quantitative chromatin mass spectrometry and/or histone/HVO_1577 deletions in H. volcanii will help to clarify this issue.

Our data show that, when moving from a thermophilic to a mesophilic niche, different lineages of archaea have reduced their investment in NAPs – shedding a cost that can now be spared. Does this imply that all archaeal mesophiles exhibit reduced investments in chromatin? We suspect not. For example, histones are very highly expressed, at least at the transcript level, in some mesophilic members of the , notably Methanobrevibacter smithii, which grows at 37ºC (its three histones are ranked 1st, 8th, and 282nd most highly expressed) (Rojec et al. 2019). Whether this also holds true at the protein level remains to be established, but we suggest that high levels of chromatinization might be obligatory for thermophiles but facultative for mesophiles.

Finally, eukaryotes are (in the vast majority) mesophiles, yet their DNA is ubiquitously wrapped in nucleosomes and histones have become a linchpin whose removal leads to the

12 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

collapse of controlled gene expression (Hennig and Fischer 2014). The acquisition of histone tails and their subsequent use for signalling might have been one of the factors driving entrenchment, generating a thick top layer of cellular machinery that acts on, modifies, and remodels nucleosomes to orchestrate gene expression, DNA repair and replication. Over time, the evolution of cryptic promoters, rendered inaccessible by nucleosomes but activated following their removal, might also have contributed to the retention of global chromatinization (Hennig and Fischer 2014). Irrespective of the factors that first rendered eukaryotic histones indispensable, we speculate that high levels of chromatinization in eukaryotes might represent an evolutionary hangover from their thermophilic ancestry, and that eukaryotes – unlike many of their archaeal cousins – evolved a dependency on global chromatinization that they were unable to break when moving into a more temperate niche.

METHODS

Culture of Methanomassiliicoccales M. luminyensis was obtained from DSMZ (DSM 25720) and M. alvus (isolate Mx-05) had been isolated previously by one of us (JFB). Both strains were grown in strictly anaerobic

conditions (2 atm. of H2/CO2 – 20/80) using 10 mL of growth medium in 50 mL glass bottles according to DSMZ recommendations for M. luminyensis with one exception: ruminal fluid (200 µL) was added for M. alvus. Cultures were performed without shaking at 37°C using 60 mM of methanol as electron acceptor for methanogenesis.

Protein pellet preparation and mass spectrometry 50 mL aliquots of 10-day (3-day) cultures of M. luminyensis (M. alvus) were pelleted with care to preserve anoxic conditions as much as possible and stored at -80°C. Protein pellets were prepared following PreOmics iST kit guidelines. 10mg of frozen pellet were resuspended in lysis buffer and volumes adjusted after the heating step to load 100µg as measured by nanodrop absorbance at 205nm. A DNA sonication step was included (Diagenode Bioruptor) followed by digestion for 1.5 hrs. The whole procedure was carried out without interruption and pellets stored at -80°C in MS-LOAD buffer before being processed by mass spectrometry (two biological replicates with technical replicates for each).

Genomes database

13 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

All genomes and proteomes were obtained from NCBI assembly (https://www.ncbi.nlm.nih.gov/assembly) accessed on 2021-05-21. Proteomes that were not available from NCBI were predicted using Prodigal v2.6.3 using default parameters.

Species tree and The archaeal species tree and taxonomic groups were obtained from GTDB (https://gtdb.ecogenomic.org) accessed on 2020-09-23, with some species names updated to reflect current use in the literature (Table S1).

Processing of public proteomics data We only included proteomes that were derived a) from whole cell extract, b) without size selection and c) comprised more than 500 identified proteins. Data for H. volcanii was obtained from Jevtić et al. (2019) and (for Fig S7) Knüppel et al. (2021), T. kodakarensis from Sas-Chen et al. (2020), N. magadii from Cerletti et al. (2018), and N. maritimus from Qin et al. (2018). For each dataset, measurements that did not correspond to the Genbank complete genome of the strain/species were discarded. Correspondence between Uniprot and Genbank ID was established using the Uniprot Retrieve/ID mapping tool. For each dataset, normalized intensity (in %) was computed as the ratio of each protein intensity over the total intensity for the species of interest.

Protein annotations HMM models were downloaded from PFAM (PFAM-A, accessed on 2020-01-20) and TIGR (TIGRFAMs 15.0, accessed on 2020-05-15) and sequences were searched using hmmsearch (version 3.1b2). All searches were carried out using the gathering thresholds provided for each models (option -ga) to ensure reproducibility. No further threshold was applied unless mentioned otherwise. Results were robust to application of an alternative, stricter threshold of 1e-3. As no HMM model existed for C1, we searched for sequences homologous to tenax Cc1 (Uniprot ID : G4RKF6) using jackhmmer (version 3.1b2), applying an e-value threshold of 1e-5. A list of DNA-binding protein PFAM domains was obtained from (Malhotra and Sowdhamini 2015) and domains found in transcription factors were manually annotated from this list. Detailed tables and full sequences of all NAPs and candidate NAPs discussed in this study are available as supplemental material (Table S1).

Gene ontology

14 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Gene ontologies were obtained from pfam2go tables, available at http://current.geneontology.org/ontology/external2go/pfam2go. When computing correlations between environmental variables and gene ontology, each protein was only counted once per ontology.

Habitat and phenotypes Phenotypic data was obtained from Madin et al. (2020) and habitat data from ProGenome Mende et al. (2020). When multiple strains per species or multiple sources per strain were available, optimal growth temperature was averaged, with the exception of H. volcanii for which the average was seemingly too low (38°C) and was thus set at 45°C, based on the description of the original isolate. Growth rates for M. luminyensis and M. alvus (absent from the database) were added manually.

Operon prediction Operons were predicted using Operon Mapper (https://biocomputo.ibt.unam.mx/operon_mapper/) with default settings.

DNA binding prediction DNA binding was predicted using DNAbinder (https://webs.iiitd.edu.in/raghava/dnabinder/) using the SVM model trained on a realistic dataset. Proteins whose score was higher than 0 were considered a possible DNA binding proteins.

Protein alignments and phylogenetic trees Proteins sequences were aligned using MAFFT (option -linsi). With the exception of the species tree (see above), all trees were built using RAXML-NG, model LG+R6. Best maximum likelihood midpoint rooted trees are shown along with the results of 100 non- parametric bootstraps. Trees were visualized using iTol (https://itol.embl.de/).

Phylogenetic linear regression To control for phylogenetic non-independence, phylogenetic linear regression were carried using the R package phylolm, Model “BM” with 10000 bootstraps or ‘OUrandomroot’. Variables were log transformed before regression.

Scripts

15 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

All scripts are available at https://github.com/hocherantoine/NAPsQuant

ACKNOWLEDGEMENTS

We thank Holger Kramer and the LMS Proteomics Facility for generating mass spectrometry data and the Molecular Systems Group for discussions. This work was funded by Medical Research Council core funding (TW), an EMBO Short-Term Fellowship 8472 (AH) and the French National Agency for Research (SG; Grant ArchEvol ANR-16-CE02-0005-01).

CONFLICT OF INTEREST

The authors declare that no conflict of interest exists.

REFERENCES

Adam P. S., Borrel G., Brochier-Armanet C., Gribaldo S., 2017 The growing tree of Archaea: new perspectives on their diversity, evolution and ecology. ISME J 11: 2407–2425.

Ammar R., Torti D., Tsui K., Gebbia M., Durbic T., Bader G. D., Giaever G., Nislow C., 2011 Chromatin is an ancient innovation conserved between Archaea and Eukarya. eLife 1: e00078.

Baumann H., Knapp S., Lundbäck T., Ladenstein R., Härd T., 1994 Solution structure and DNA-binding properties of a thermostable protein from the archaeon Sulfolobus solfataricus. Nat Struct Mol Biol 1: 808–819.

Boonyaratanakornkit B. B., Simpson A. J., Whitehead T. A., Fraser C. M., El-Sayed N. M. A., Clark D. S., 2005 Transcriptional profiling of the hyperthermophilic methanarchaeon Methanococcus jannaschii in response to lethal heat and non-lethal cold shock. Environ Microbiol 7: 789–797.

Campanaro S., Williams T. J., Burg D. W., De Francisci D., Treu L., Lauro F. M., Cavicchioli R., 2011 Temperature‐dependent global gene expression in the Antarctic archaeon Methanococcoides burtonii. Environ Microbiol 13: 2018–2038.

Cerletti M., Giménez M. I., Tröetschel C., D’ Alessandro C., Poetsch A., De Castro R. E., Paggi R. A., 2018 Proteomic Study of the Exponential–Stationary Growth Phase Transition in the Haloarchaea Natrialba magadii and . Proteomics 18: 1800116.

16 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Chartier F., Laine B., Sautiere P., 1988 Characterization of the chromosomal protein MC1 from the thermophilic archaebacterium Methanosarcina sp. CHTI 55 and its effect on the thermal stability of DNA. Biochim. Biophys. Acta 951: 149–156.

Cubonova L., Katano M., Kanai T., Atomi H., Reeve J. N., Santangelo T. J., 2012 An Archaeal Histone Is Required for Transformation of Thermococcus kodakarensis. Journal of Bacteriology 194: 6864–6874.

Dridi B., Fardeau M.-L., Ollivier B., Raoult D., Drancourt M., 2012 Methanomassiliicoccus luminyensis gen. nov., sp. nov., a methanogenic archaeon isolated from human faeces. International Journal of Systematic and Evolutionary Microbiology 62: 1902–1907.

Dulmage K. A., Todor H., Schmid A. K., 2015 Growth-Phase-Specific Modulation of Cell Morphology and Gene Expression by an Archaeal Histone Protein. mBio 6: 781.

Grogan D. W., 1998 Hyperthermophiles and the problem of DNA instability. Molecular Microbiology 28: 1043–1049.

Guagliardi A., Napoli A., Rossi M., Ciaramella M., 1997 Annealing of complementary DNA strands above the melting point of the duplex promoted by an archaeal protein. Journal of Molecular Biology 267: 841–848.

Hennig B. P., Fischer T., 2014 The great repression. Transcription 4: 97–101.

Hickey D. A., Singer G. A., 2004 Genomic and proteomic adaptations to growth at high temperature. Genome Biol. 5: 1–7.

Hocher A., Rojec M., Swadling J. B., Esin A., Warnecke T., 2019 The DNA-binding protein HTa from is an archaeal histone analog. eLife 8: e52542.

Isabelle V, Franchet-Beuzit J., Sabattier R., Laine B., Spotheim-Maurizot M., Charlier M., 1993 Radioprotection of DNA by a DNA-binding protein: MC1 chromosomal protein from the archaebacterium Methanosarcina sp. CHTI55. Int J Radiat Biol 63: 749–758.

Jevtić Ž., Stoll B., Pfeiffer F., Sharma K., Urlaub H., Marchfelder A., Lenz C., 2019 The Response of Haloferax volcanii to Salt and Temperature Stress: A Proteome Study by Label-Free Mass Spectrometry. Proteomics 19: e1800491.

Knüppel R., Trahan C., Kern M., Wagner A., Grünberger F., Hausner W., Quax T. E. F., Albers S.-V., Oeffinger M., Ferreira-Cerca S., 2021 Insights into synthesis and function of KsgA/Dim1-dependent rRNA modifications in archaea. Nucleic Acids Research 49: 1662–1687.

Liu Y., Guo L., Guo R., Wong R. L., Hernandez H., Hu J., Chu Y., Amster I. J., Whitman W. B., Huang L., 2009 The Sac10b homolog in Methanococcus maripaludis binds DNA at specific sites. Journal of Bacteriology 191: 2315–2329.

Madin J. S., Nielsen D. A., Brbić M., Corkrey R., Danko D., Edwards K., Engqvist M. K. M., Fierer N., Geoghegan J. L., Gillings M., Kyrpides N. C., Litchman E., Mason C. E., Moore L., Nielsen S. L., Paulsen I. T., Price N. D., Reddy T. B. K., Richards M. A., Eduardo P. C. Rocha, Schmidt T. M., Shaaban H., Shukla M., Supek F., Tetu S. G.,

17 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Vieira-Silva S., Wattam A. R., Westfall D. A., Westoby M., 2020 A synthesis of bacterial and archaeal phenotypic trait data. Sci Data 7: 1–8.

Malhotra S., Sowdhamini R., 2015 Collation and analyses of DNA-binding protein domain families from sequence and structural databanks. Mol. BioSyst. 11: 1110–1118.

Mende D. R., Letunic I., Maistrenko O. M., Schmidt T. S. B., Milanese A., Paoli L., Hernández-Plaza A., Orakov A. N., Forslund S. K., Sunagawa S., Zeller G., Huerta- Cepas J., Coelho L. P., Bork P., 2020 proGenomes2: an improved database for accurate and consistent habitat, taxonomic and functional annotations of prokaryotic genomes. Nucleic Acids Research 48: D621–D625.

Müller J. B., Geyer P. E., Colaço A. R., Treit P. V., Strauss M. T., Oroshi M., Doll S., Winter S. V., Bader J. M., Köhler N., Theis F., Santos A., Mann M., 2020 The proteome landscape of the kingdoms of life. Nature 582: 592–596.

Nalabothula N., Xi L., Bhattacharyya S., Widom J., Wang J.-P., Reeve J. N., Santangelo T. J., Fondufe-Mittendorf Y. N., 2013 Archaeal nucleosome positioning in vivo and in vitro is directed by primary sequence motifs. BMC Genomics 14: 391.

Qin W., Amin S. A., Lundeen R. A., Heal K. R., Martens Habbena W., Turkarslan S., Urakawa H., Costa K. C., Hendrickson E. L., Wang T., Beck D. A., Tiquia-Arashiro S. M., Taub F., Holmes A. D., Vajrala N., Berube P. M., Lowe T. M., Moffett J. W., Devol A. H., Baliga N. S., Arp D. J., Sayavedra-Soto L. A., Hackett M., Armbrust E. V., Ingalls A. E., Stahl D. A., 2018 Stress response of a marine ammonia-oxidizing archaeon informs physiological status of environmental populations. ISME J 12: 508–519.

Rojec M., Hocher A., Stevens K. M., Merkenschlager M., Warnecke T., 2019 Chromatinization of Escherichia coli with archaeal histones. eLife 8: e49038.

Sabath N., Ferrada E., Barve A., Wagner A., 2013 Growth temperature and genome size in bacteria are negatively correlated, suggesting genomic streamlining during thermal adaptation. Genome Biol Evol 5: 966–977.

Sandman K., Krzycki J. A., Dobrinski B., Lurz R., Reeve J. N., 1990 HMf, a DNA-binding protein isolated from the hyperthermophilic archaeon Methanothermus fervidus, is most closely related to histones. Proceedings of the National Academy of Sciences of the United States of America 87: 5788–5791.

Sas-Chen A., Thomas J. M., Matzov D., Taoka M., Nance K. D., Nir R., Bryson K. M., Shachar R., Liman G. L. S., Burkhart B. W., Gamage S. T., Nobe Y., Briney C. A., Levy M. J., Fuchs R. T., Robb G. B., Hartmann J., Sharma S., Lin Q., Florens L., Washburn M. P., Isobe T., Santangelo T. J., Shalev-Benami M., Meier J. L., Schwartz S., 2020 Dynamic RNA acetylation revealed by quantitative cross-evolutionary mapping. Nature 583: 638–643.

Searcy D. G., 1975 Histone-like protein in the Thermoplasma acidophilum. Biochim. Biophys. Acta 395: 535–547.

Searcy D. G., 1986 The Archaebacterial Histone “HTa.” In: Bacterial Chromatin, Proceedings in Life Sciences. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 175– 184.

18 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Stein D. B., Searcy D. G., 1978 Physiologically important stabilization of DNA by a prokaryotic histone-like protein. Science 202: 219–221.

Tachdjian S., Kelly R. M., 2006 Dynamic Metabolic Adjustments and Genome Plasticity Are Implicated in the Heat Shock Response of the Extremely Thermoacidophilic Archaeon Sulfolobus solfataricus. Journal of Bacteriology 188: 4553–4559.

Tanaka N., Shuman S., 2011 RtcB is the RNA ligase component of an Escherichia coli RNA repair operon. Journal of Biological Chemistry 286: 7727–7731.

Tian L., Zhang Z., Wang H., Zhao M., Dong Y., Gong Y., 2016 Sequence-Dependent T:G Base Pair Opening in DNA Double Helix Bound by Cren7, a Chromatin Protein Conserved among Crenarchaea. PLoS ONE 11: e0163361.

Visone V., Vettone A., Serpe M., Valenti A., Perugino G., Rossi M., Ciaramella M., 2014 Chromatin structure and dynamics in hot environments: architectural proteins and DNA topoisomerases of thermophilic archaea. International Journal of Molecular Sciences 15: 17162–17187.

Weidenbach K., Glöer J., Ehlers C., Sandman K., Reeve J. N., Schmitz R. A., 2008 Deletion of the archaeal histone in Methanosarcina mazei Gö1 results in reduced growth and genomic transcription. Molecular Microbiology 67: 662–671.

Zeldovich K. B., Berezovsky I. N., Shakhnovich E. I., 2007 Protein and DNA sequence determinants of thermophilic adaptation. PLoS Comput Biol 3: e5.

Zhang Z., Guo L., Huang L., 2012 Archaeal chromatin proteins. Sci China Life Sci 55: 377– 385.

SUPPLEMENTARY DATA

Table S1. Known and candidate NAPs across 1419 archaeal genomes.

Table S2. Correlations between NAP abundance and ecological covariates.

19 A B Length NAP Organism Name (amino acids) pI complete genome 250 Alba bioRxivS. acidocaldarius preprint doi: https://doi.org/10.1101/2021.07.08.451601albA 97 10.4 ; this version posted July 8, 2021. The copyright holderincomplete for this genome preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is CC1 T. tenax CC1 made56 available10 under aCC-BY 4.0 International100 license. Cren7 S. acidocaldarius creN7 59 9.99 # of genomes 0 Histone M. fervidus HMfA/B 69 9.59/8.06 Alba ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● HU T. acidophilum HTa 90 10.74 Histone ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● MC1 M. thermophila MC1 93 10.32 MC1 Cren7 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Sul7 S. acidocaldarius Sso7d 64 9.68 CC1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Sul7 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● HU ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

C TACK Methanotecta Diaforarchaea

Alba Histone MC1 Cren7 CC1 Sul7 HU

Methanomethylophilaceae

Figure 1. Distribution of nucleoid-associated proteins (NAPs) across archaea. (A) Names and properties of previously characterized archaeal NAPs and (B) their co-occurrence in 1419 sequenced archaeal genomes. (C) Presence/absence of NAPs in phylogenetic context, highlighting the Methanomethylophilaceae as a family without any previously characterized NAPs. The species-level phylogeny is based on GTDB (see Methods). See Table S1 and Fig S1 for species-level information. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. A Methanomethylophilaceae Histone CC1 HU MC1 Alba

Methanomassilicoccales-related

NM2 MGBD MG-III MG-II Thermoplasmatales Methanomassilicoccales fer1 DSM 1728 GSS1 DSM 9790 s B10 Euryarchaeota archaeon UBA15 Euryarchaeota archaeon UBA21 Euryarchaeota archaeon UBA90 Euryarchaeota archaeon UBA229 Euryarchaeota archaeon UBA110 Euryarchaeota archaeon UBA549 Euryarchaeota archaeon UBA124 Euryarchaeota archaeon UBA495 Euryarchaeota archaeon UBA461 Euryarchaeota archaeon UBA559 Euryarchaeota archaeon UBA82 Euryarchaeota archaeon TMED132 Euryarchaeota archaeon UBA12 Euryarchaeota archaeon UBA562 Euryarchaeota archaeon UBA123 Euryarchaeota archaeon TMED252 Euryarchaeota archaeon UBA618 Euryarchaeota archaeon UBA53 Euryarchaeota archaeon UBA85 Euryarchaeota archaeon UBA545 Euryarchaeota archaeon TMED141 Euryarchaeota archaeon UBA550 Thermoplasmatales archaeon Gpl Thermoplasmatales archaeon E-plasma Euryarchaeota archaeon UBA68 acidarmanus Euryarchaeota archaeon UBA87 Euryarchaeota archaeon UBA596 Euryarchaeota archaeon UBA60 Ferroplasma sp. Type II Euryarchaeota archaeon UBA226 T469 Thermoplasmatales archaeon B_DKE uncultured archaeon torridus Thermoplasmatales archaeon UBA613 Thermoplasmatales archaeon BRNA1 Euryarchaeota archaeon UBA462 sp. MBA-1 Marine Group II euryarchaeote MED-G38 Euryarchaeota archaeon TMED129 Marine group II euryarchaeote REDSEA-S43_B8 Candidatus Methanomethylophilus sp. UBA78 Marine group II euryarchaeote REDSEA-S11_B3N4 Candidatus Methanomethylophilus alvus Mx1201 Euryarchaeota archaeon TMED173 Marine group II euryarchaeote REDSEA-S30_B12 Marine group II euryarchaeote REDSEA-S42_B7 Marine Group II euryarchaeote MED-G35 Euryarchaeota archaeon UBA196 Euryarchaeota archaeon UBA58 methanogenic archaeon ISO4-H5 Methanomassiliicoccaceae archaeon UBA593 Euryarchaeota archaeon UBA558 Euryarchaeota archaeon UBA36 Thermoplasma sp. Kam2015 Euryarchaeota archaeon UBA557 Euryarchaeota archaeon UBA14 Euryarchaeota archaeon UBA182 Thermoplasmatales archaeon I-plasma Euryarchaeota archaeon UBA501 Thermoplasma sp. Euryarchaeota archaeon UBA602 Euryarchaeota archaeon UBA221 Euryarchaeota archaeon UBA84 Euryarchaeota archaeon UBA125 T hermoplasma acidophilum Euryarchaeota archaeon UBA133 Euryarchaeota archaeon UBA622 Euryarchaeota archaeon UBA222 Marine Group II euryarchaeote MED-G33 Euryarchaeota archaeon UBA20 Euryarchaeota archaeon UBA39 Methanomassiliicoccaceae archaeon UBA537 Euryarchaeota archaeon UBA496 T hermoplasma volcanium Euryarchaeota archaeon UBA225 Euryarchaeota archaeon UBA22 Euryarchaeota archaeon UBA465 Euryarchaeota archaeon TMED192 Euryarchaeota archaeon UBA101 Euryarchaeota archaeon UBA174 Marine Group III euryarchaeote SCGC AAA288-E19 Euryarchaeota archaeon UBA59 Euryarchaeota archaeon UBA488 Euryarchaeota archaeon UBA584 Euryarchaeota archaeon UBA195 Euryarchaeota archaeon UBA252 Euryarchaeota archaeon UBA250 Euryarchaeota archaeon UBA52 Euryarchaeota archaeon UBA251 Marine Group III euryarchaeote CG-Bathy2 Euryarchaeota archaeon UBA40 Marine Group III euryarchaeote CG-Epi2 Euryarchaeota archaeon UBA193 Euryarchaeota archaeon UBA31 Euryarchaeota archaeon UBA491 uncultured marine group III euryarchaeote Euryarchaeota archaeon UBA34 Euryarchaeota archaeon UBA171 Methanomassiliicoccaceae archaeon UBA306 Methanomassiliicoccaceae archaeon UBA354 Methanomassiliicoccales archaeon Mx-06 Euryarchaeota archaeon UBA106 Methanomassiliicoccaceae archaeon UBA271 Euryarchaeota archaeon UBA102 Marine Group III euryarchaeote CG-Epi4 Euryarchaeota archaeon UBA105 Marine Group III euryarchaeote CG-Epi1 Marine Group III euryarchaeote CG-Bathy1 Marine Group III euryarchaeote CG-Epi6 Euryarchaeota archaeon UBA597 Methanomassiliicoccaceae archaeon UBA113 Methanomassiliicoccaceae archaeon UBA71 Euryarchaeota archaeon UBA287 Methanomassiliicoccales archaeon RumEn M2 methanogenic archaeon mixed culture ISO4-G1 Methanomassiliicoccales archaeon Mx-03 Candidatus Methanoplasma termitum Methanomassiliicoccaceae archaeon UBA408 Methanomassiliicoccales archaeon Mx-02 Thermoplasmatales archaeon SW_10_69_26 Candidatus Methanomassiliicoccus intestinalis Issoire-Mx1 Euryarchaeota archaeon RBG_16_62_10 Aciduliprofundum sp. archaeon DTU008 archaeon HGW-Thermoplasmata-1 Thermoplasmatales archaeon SM1-50 Euryarchaeota archaeon RBG_13_57_23 Candidatus Methanomethylophilaceae archaeon Methanomassiliicoccaceae archaeon UBA472 Euryarchaeota archaeon RBG_19FT_COMBO_56_21 Thermoplasmatales archaeon ex4572_165 Methanomassiliicoccales archaeon PtaB.Bin215 Methanomassiliicoccales archaeon RumEn M1 Euryarchaeota archaeon UBA33 Methanomassiliicoccus sp. Methanomassiliicoccus sp. UBA382 Methanomassiliicoccales archaeon PtaB.Bin134 Methanomassiliicoccaceae archaeon Methanomassiliicoccales archaeon Thermoplasmatales archaeon SG8-52-3 Euryarchaeota archaeon RBG_19FT_COMBO_69_17 Euryarchaeota archaeon RBG_16_68_13 Methanomassiliicoccales archaeon PtaU1.Bin124 Thermoplasmata archaeon M11B2D Thermoplasmatales archaeon SG8-52-2 Thermoplasmata archaeon M9B1D Thermoplasmata Euryarchaeota archaeon RBG_16_67_27 Methanomassiliicoccus luminyensi Thermoplasmatales archaeon SG8-52-4 Thermoplasmatales archaeon ex4484_6 Methanomassiliicoccus sp. UBA6 Thermoplasmata archaeon HGW-Thermoplasmata-2 Euryarchaeota archaeon 13_1_20CM_4_64_14 Methanomassiliicoccales archaeon UBA280 Methanomassiliicoccales archaeon UBA147 Aciduliprofundum sp. MAR08-339 Thermoplasmatales archaeon ex4484_30

B MGBD

Thermoplasmatales

MG-III

MG-II

Phylum

Aenigmarchaeota Hydrothermarchaeota

Altiarchaeota Iainarchaeota

Methanomassilicoccales Asgardarchaeota Micrarchaeota

Euryarchaeota Nanohaloarchaeota Hadarchaeota Thermoplasmatota

Halobacterota Not assigned

Diaforachaea Alba

Figure 2. Nucleoid-associated proteins (NAPs) in the Diaforarchaea. (A) Presence/absence of NAPs in phylogenetic context, highlighting the absence of known NAPs in the Methanomethylophilaceae, the lineage-restricted distribution of HU and MC1, and the patchy distribution of Alba. The species-level phylogeny is based on GTDB (see Methods). The two species for which proteomics data were generated are marked with asterisks. (B) Maximum likelihood protein tree (see Methods) of archaeal Alba homologs. Alba sequences from the Diaforarchaea are not monophyletic, suggesting multiple independent acquisition events. MG-II (III): Marine group II (III); NM2: New lineage 2; MGBD: Marine benthic group D. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. A B *** Pipeline to predict novel NAPs

19 Species, 25k mass spec measurements 30

Smaller than largest known NAP (TrmBL2) + 10% (≤ 290 AA) *** 20

PFAM-A hit (HmmSearch)

# genes in operon 10 known TF neither NAP Predicted to bind DNA (DNAbinder) 0 other TF NAP

Species-specific outlier prediction, (Rosner test, k=5, alpha < 0.01%)

single-gene operon (OperonMapper)

Candidate NAP

C

Candidate MC1 Cren7 4 Sul7 CC1 2 HU

% of proteome Histone Alba 0 M. alvus A. fulgidus M. barkeri P. torridus P. H. volcanii N. magadii F. placidus F. M. kandleri A. profundus H. salinarium N. maritimus T. volcanium T. M. jannaschii T. acidophilum T. M. luminyensis T. kodakarensis T. M. marburgensis S. acidocaldicarius M. thermautotrophicus

specie tree (Gtdb)

Figure 3. Quantitative variation in the abundance of known and predicted nucleoid-associated proteins (NAPs). (A) Outline of the bioinformatic pipeline to predict novel NAPs. Proteins detected by mass spectrometry need to pass several successive filters to be considered as a candidate NAP (see main text and Methods for a detailed description of the individual steps). (B) Sizes (number of genes) for operons that contain proteins classified as NAPs, transcription factors (TF) or “other” domains (neither NAP nor TF) based on Pfam annotations. (C) Large variation in the relative abundance (% of proteome) of known and candidate NAPs in 19 species of archaea for which quantitative mass spectrometry data was analyzed. The species tree is taken from GTDB, with P. torridus and H. salinarum added manually. A D E investment dropswhen growth temperature. We obtain similarresults whenconsidering gene ontology categoriesinstead ofPfam domains.(E,F ) Relative NAP abundance of1154 Pfamdomains. The relativeabundanceof NAPs, consideredas an aggregateclass,exhibits the strongestcorrelationwith is notcorrelated withoptimalgrowthtemperature. ( NAP abundanceandoptimalgrowth temperatureacrossthesame set ofarchaea.(C) The aggregateabundance oftranscriptionfactors(TF) across thesampleof19archaea showninFig3C.Examplesofindividual correlationsareshownontheright.(B)Relationshipbetween relative abundanceismostpositively (negatively)correlatedwithrelative NAP abundance(aggregatedacrossknownandcandidate NAPs) Figure 4. The relationship between growthtemperatureandNAP investmentacrossarchaea. (A) Top (bottom)13Pfamdomainswhose Spearman’s rho # PFAM domains 30 20 10 Formyl_trans_N 0

NAPs (% of measured proteome) (which wasnotcertifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.It

−0.5 0.0 0.5 bioRxiv preprint 0 2 4 DnaJ N. magadii N. maritimus | ● M. luminyensis P <1e−05 rho =0.82 || ● THF_DHG_CYH Porphobil_deamC Porphobil_deam Na_H_Exchanger DnaJ DnaJ_C Ribosomal_L32e AAA_29 FbpA NFACT HHH_5 TRAM GrpE THF_DHG_CYH_C Prefoldin eIF Memo HSP20 DUF211 ArsA_HSP20 tRNA.Thr_ED ATPase Ribosomal_L5e Molybdop_Fe4S4 RtcB Alba ● ● ● ● PFAM domain HAS.barrel 40 M. alvus ||| H. salinarium M. barkeri .5a | ● || || | | T. acidophilum H. volcanii P. torridus Gro .R_1 T. volcanium M. thermautotrophicus doi: wth temper ● ● T. kodakarensis isgrown at65ºCinstead ofits optimumgrowth temperature of85ºC. 60 ● M. marburgensis T.kodakarensis (65°C) https://doi.org/10.1101/2021.07.08.451601 −0.5 ● ● ● ature (°C) S. acidocaldarius Correlation withgrowthtemperature(rho) M. jannaschii 0100 80 F. placidus ● ● T. kodakarensis (85°C) A. fulgidus ● ● ● ● A. profundus M. kandleri made availableundera ● RtcB abundance (%)

0.00 Pref0.05 oldin 0.10 abundancy0.15 (%) 0.20 0.0 0.1 0.2 0.0 HSP200.2 abundance0.4 (%) 0.6 adj.p-value <0.05 Significant hits 0 0 N. magadii H. salinarium H. salinarium 0 M. alvus N. magadii M. barkeri ● ● M. alvus ● ● M. barkeri ● P <0.008 rho =0.6 H. salinarium ● M. alvus ● ● M. marburgensis ● N. maritimus N. magadii ● ● T. acidophilum N. maritimus H. volcanii ● 0.0 M. barkeri H. volcanii ● ● ● D) Distribution of correlationcoefficients betweengrowth temperatureandtherelative N. maritimus M. jannaschii H. volcanii M. marburgensis F ● ● ● M. luminyensis M. jannaschii ● ● ● M. thermautotrophicus M. marburgensis NAP ab NAP ab T.acidophilum M. luminyensis M. luminyensis NAP ab T. acidophilum M. jannaschii ● ● P. torridus P. torridus ● ● ● P. torridus ● ● M. thermautotrophicus ● ● M. kandleri ● ● M. thermautotrophicus ● 2 2 2 M. kandleri ● ● S. acidocaldarius A. fulgidus ● ● ● ● M. kandleri

% proteome undance (%) 0 1 2 3 4 S. acidocaldarius undancy (%) A. fulgidus undance (%) ● ● A. fulgidus T. volcanium ● ● ● S. acidocaldarius ● T. volcanium T. volcanium ● ● T. kodakarensis ● T. kodakarensis CC-BY 4.0Internationallicense T. kodakarensis ; 4 4 ● ● this versionpostedJuly8,2021. 4 ● P <0.002 rho =0.7 A. profundus P <0.007 rho =0.6 A. profundus F. placidus F. placidus 65°C F. placidus ● ● ● ● ● A. profundus ● T. kodakarensis 0.5 RtcB B Alba | F C er4_11 85°C || | (wCandidates) % | NAP TF abundance (%) NAP abundance (%) 0.5 1.0 1.5 | | 0 2 4 . % NAP N. magadii M. luminyensis N. maritimus M. barkeri N. maritimus H. salinarium ● The copyrightholderforthispreprint ● P <9e−06 rho =0.84 M. luminyensis ● ● N. magadii ● ● ● ● M. alvus ● ● ● ● 40 M. alvus 40 H. salinarium M. barkeri ● ● T. acidophilum P. torridus H. volcanii H. volcanii P. torridus Alba hpkA hpkB Candidate 1 Candiate 2 TrmBl2 Gro Gro T. acidophilum T. volcanium T. volcanium ● ● ● ● wth temper wth temper M. thermautotrophicus 60 M. thermautotrophicus 60 ● ● M. marburgensis M. marburgensis ● ● ● ● M. jannaschii S. acidocaldarius ature (°C) ature (°C) A. profundus S. acidocaldarius 0100 80 0100 80 M. jannaschii ● ● F. placidus ● ● T. kodakarensis T. kodakarensis A. fulgidus A. fulgidus ● ● ● ● P <0.19 rho =−0.31 ● ● ● ● F. placidus M. kandleri A. profundus M. kandleri ● ● bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is

A Alba Histone MC1 Cren7 Cc1 Sul7 HU made available under aCC-BY 4.0 International license. Candidatus^Altiar chaeales^archaeon^HGW-Altiarchaeales-1 Candidatus^Altiarchaeale s^archaeon^A3 Candidatus^Altiarchaeales^archaeon^WOR_SM1_79 Candidatus^Altiar chaeales^archaeon^IMC4 28 Candidatus^Altiar chaeales^archaeon^WOR_SM1_SCG 36 Euryarchaeota^archaeon^SCGC^AAA252-I15 assembly level 66 Candidatus^Altiar chaeales^archaeon 99 Candidatus^Altiar chaeales^archaeon^ex4484_2 Candidatus^Altiar chaeales^archaeon^ex4484_96 Candidatus^Diapherotrites^archaeon^CG08_land_8_20_14_0_20_30_16 Candidatus^Diapherotrites^archaeon^CG08_land_8_20_14_0_20_34_12 100 Candidatus^Diapherotrites^archaeon^CG10_big_fil_rev_8_21_14_0_10_31_34 96 archaeon^GW2011_AR10 90 Candidatus^Iainarchaeum^andersonii^SCGC^AAA011-E11 Candidatus^Diapherotrites^archaeon^CG11_big_fil_rev_8_21_14_0_20_37_9 Candidatus^Diapherotrites^archaeon^UBA493 93 Candidatus^Micrarchaeota^archaeon^CG08_land_8_20_14_0_20_59_11 38 Candidatus^Micrarchaeota^archaeon^CG1_02_55_22 48 Candidatus^Micrarchaeota^archaeon^CG1_02_51_15 Candidatus^Mancarchaeum^acidiphilum 99 100 97 Candidatus^Micrarchaeum^acidiphilum^ARMAN-1 Candidatus^Micrarchaeota^archaeon 93 Candidatus^Micrarchaeota^archaeon^CG1_02_47_40 99 Candidatus^Micrarchaeota^archaeon^CG10_big_fil_rev_8_21_14_0_10_45_29 archaeon^UBA543 Chromosome Contig archaeon^UBA55 Candidatus^Aenigmarchaeota^archaeon^ex4484_52 archaeon^GW2011_AR5 6 Candidatus^Aenigmarchaeum^subterraneum^SCGC^AAA011-O16 85 65 Candidatus^Aenigmarchaeota^archaeon^ex4484_224 Candidatus^Micrarchaeota^archaeon^RBG_16_49_10 Candidatus^Aenigmarchaeota^archaeon 45 53 Candidatus^Nanosalinarum^sp.^J07AB56 ^archaeon^B1-Br10_U2g19 Nanohaloarchaea^archaeon^PL-Br10_U2g16 98 99 95 100 Nanohaloarchaea^archaeon^SW_4_43_9 84 Nanohaloarchaea^archaeon^SW_7_43_1 Candidatus^Nanosalina^sp.^J07AB43 60 Nanohaloarchaea^archaeon^SG9 89 Nanohaloarchaea^archaeon^B1-Br10_U2g29 Nanohaloarchaea^archaeon^PL-Br10_U2g27 100 Complete Genome Scaffold Nanohaloarchaea^archaeon^B1-Br10_U2g21 100 archaeon^CG10_big_fil_rev_8_21_14_0_10_43_11 Nanoarchaeum^equitans^Kin4-M Candidatus^Nanoclepta^minutus Candidatus^Nanopusillus^acidilobi Candidatus^Nanobsidianus^stetteri archaeon^UBA431 44 archaeon^UBA583 Candidatus^Pacearchaeota^archaeon^ex4484_31 63 Candidatus^Pacearchaeota^archaeon^CG06_land_8_20_14_3_00_35_12 66 Candidatus^Pacearchaeota^archaeon^UBA284 63 Candidatus^Pacearchaeota^archaeon^CG10_big_fil_rev_8_21_14_0_10_35_13 37 Candidatus^Pacearchaeota^archaeon^RBG_13_36_9 100 Candidatus^Pacearchaeota^archaeon^CG10_big_fil_rev_8_21_14_0_10_34_76 9 100 Candidatus^Pacearchaeota^archaeon^CG1_02_39_14 43 Candidatus^Pacearchaeota^archaeon^UBA73 16 Candidatus^Pacearchaeota^archaeon 45 Candidatus^Pacearchaeota^archaeon^CG10_big_fil_rev_8_21_14_0_10_30_48 Candidatus^Pacearchaeota^archaeon^CG1_02_32_21 93 Candidatus^Pacearchaeota^archaeon^CG10_big_fil_rev_8_21_14_0_10_32_14 archaeon^BMS3Abin17 46 100 Nanoarchaeota^archaeon^SCGC^AAA011-D5 archaeon^GW2011_AR19 94 Candidatus^Pacearchaeota^archaeon^ex4484_71 74 archaeon^GW2011_AR18 55 archaeon^GW2011_AR20 Candidatus^Woesearchaeota^archaeon^UBA119 Candidatus^Woesearchaeota^archaeon^CG08_land_8_20_14_0_20_47_9 39 Candidatus^Woesearchaeota^archaeon^CG10_big_fil_rev_8_21_14_0_10_37_12 10 Candidatus^Woesearchaeota^archaeon^B 3_Woes Candidatus^Woesearchaeota^archaeon^CG10_big_fil_rev_8_21_14_0_10_44_13 1 archaeon^GW2011_AR15 1441 Euryarchaeota^archaeon^SM23-78 63 99 Candidatus^Woesearchaeota^archaeon^CG10_big_fil_rev_8_21_14_0_10_32_9 2 Candidatus^Woesearchaeota^archaeon 59 Candidatus^Woesearchaeota^archaeon^UBA153 Candidatus^Woesearchaeota^archaeon^UBA525 Candidatus^Woesearchaeota^archaeon^CG11_big_fil_rev_8_21_14_0_20_43_8 5 Nanoarchaeota^archaeon Candidatus^Woesearchaeota^archaeon^CG_4_10_14_0_2_um_filter_33_13 69 Candidatus^Woesearchaeota^archaeon^CG10_big_fil_rev_8_21_14_0_10_36_11 49 Candidatus^Woesearchaeota^archaeon^CG10_big_fil_rev_8_21_14_0_10_32_24 49 Candidatus^Woesearchaeota^archaeon^CG10_big_fil_rev_8_21_14_0_10_45_16 archaeon Candidatus^Heimdallarchaeota^archaeon^B3_Heim Candidatus^Heimdallarchaeota^archaeon^LC_3 Candidatus^Heimdallarchaeota^archaeon 100 Candidatus^Heimdallarchaeota^archaeon^LC_2 archaeon^UBA460 Candidatus^Heimdallarchaeota^archaeon^AB_125 Candidatus^^archaeon^SMTZ1-83 Candidatus^Thorarchaeota^archaeon 94 100 92 Candidatus^Thorarchaeota^archaeon^AB_25 100 Candidatus^Thorarchaeota^archaeon^MP11T_1 Candidatus^Odinarchaeota^archaeon^LCB_4 Candidatus^Helarchaeota^^archaeon^Hel_GB_B Candidatus^Helarchaeota^ASGARD^archaeon^Hel_GB_A 56 Candidatus^^archaeon^CR_4 Candidatus^Lokiarchaeota^archaeon^Loki_b32 Candidatus^Lokiarchaeota^archaeon^Loki_b31 Candidatus^Methanodesulfokores^washbu rnensis Candidatus^^archaeon^NZ13-K Candidatus^Korarchaeum^cryptofilum^OPF8 Candidatus^Bathyarchaeota^archaeon^B 24 Candidatus^Bathyarchaeota^archaeon^ex4484_218 Candidatus^Bathyarchaeota^archaeon^JdFR-11 88 Candidatus^Bathyarchaeota^archaeon^B 25 Candidatus^Bathyarchaeota^archaeon^RBG_16_48_13 49100 Candidatus^Bathyarchaeota^archaeon^RBG_13_38_9 archaeon^RBG_16_50_20 100 54 Candidatus^Bathyarchaeota^archaeon^UBA185 Crenarchaeota^archaeon^13_1_40CM_3_52_10 100 Crenarchaeota^archaeon^13_1_40CM_3_52_17 Candidatus^Bathyarchaeota^ar chaeon^B23 100 Candidatus^Bathyarchaeota^archaeon 100 100 Candidatus^Bathyarchaeota^archaeon^RBG_16_57_9 100 Candidatus^Bathyarchaeota^archaeon^RBG_13_52_12 Candidatus^Bathyarchaeota^archaeon^ex4484_135 100 Candidatus^Bathyarchaeota^archaeon^B 24-2 Candidatus^Bathyarchaeota^archaeon^B 63 Candidatus^Bathyarchaeota^archaeon^B 26-2 100 Candidatus^Bathyarchaeota^archaeon^B 26-1 Candidatus^Bathyarchaeota^archaeon^B A1 100 Candidatus^Bathyarchaeota^archaeon^B A2 Candidatus^Bathyarchaeota^archaeon^UBA233 Candidatus^Bathyarchaeota^archaeon^CG07_land_8_20_14_0_80_47_9 100 miscellaneous^Crenarchaeota^group-6^archaeon^AD8-1 100 miscellaneous^Crenarchaeota^group-1^archaeon^SG8-32-3 Candidatus^Bathyarchaeota^archaeon^ex4484_205 archaeon^HR02 93 100 ^archaeon^JGI^OTU-3 archaeon^HR01 Candidatus^Caldiarchaeum^subterraneum 92 Aigarchaeota^archaeon^NZ13_MG1 88 Aigarchaeota^archaeon^JGI^0000106-J15 100 Thaumarchaeota^archaeon^ex4484_121 Candidatus^Geothermarchaeota^archaeon^JdFR-13 Candidatus^Geothermarchaeota^archaeon^JdFR-14 71 Thaumarchaeota^archaeon^UBA160 Thaumarchaeota^archaeon^UBA164 Thaumarchaeota^archaeon^UBA57 100 Thaumarchaeota^archaeon^UBA141 88 65 Thaumarchaeota^archaeon^UBA183 Candidatus^Nitrosocaldus^cavascurensis archaeon^HR04 100 Candidatus^Nitrosocosmicus^oleophilus 100 Candidatus^Nitrosocosmicus^franklandus 100 Candidatus^Nitrosocosmicus^exaquare Nitrososphaera^sp.^UBA210 98 B Candidatus^Nitrososphaera^gargensis^Ga9.2 100 Candidatus^Nitrososphaera^evergladensis^SR1 100 Nitrososphaera^viennensis^EN76 Candidatus^Nitrosotalea^bavarica Candidatus^Nitrosotalea^sinensis 100 82 90 Candidatus^Nitrosotalea^okcheonensis 92 Candidatus^Nitrosotalea^devanaterra Candidatus^Nitrosotenuis^sp. 100 Candidatus^Nitrosotenuis^chungbukensis Candidatus^Nitrosotenuis^uzonensis 72 Candidatus^Nitrosotenuis^cloacae 100 100 Candidatus^Nitrosotenuis^aquarius 100 ^archaeon^UBA218 Thaumarchaeota^archaeon^CSP1-1 Thaumarchaeota^archaeon^casp-thauma4 100 100 Candidatus^Nitrosopelagicus^brevis 100 Marine^Group^I^thaumarchaeote^SCGC^AAA799-O18 100 75 Thaumarchaeota^archaeon^SCGC^AAA007-O23 73 Euryarchaeota^archaeum^SCGC^AAA287-E17 Cenarchaeum^symbiosum^A Nitrosopumilales^archaeon^CG15_BIG_FIL_POST_REV_8_21_14_020_37_12 100 Nitrosarchaeum^sp. 94 Nitrosarchaeum^koreense^MY1 97 Candidatus^Nitrosarchaeum^limnium^SFB1 100 ^sp.^D6 96 Nitrosopumilus^sp.^H13 Thaumarchaeota^archaeon^casp-thauma3 Thaumarchaeota^archaeon^SCGC^AAA282-K18 100 Thaumarchaeota^archaeon 100100 Nitrosopumilus^sp.^LS_AOA Nitrosopumilus^piranensis 31100 87 Candidatus^Nitrosopumilus^sp.^SW 44 Nitrosopumilus^maritimus^SCM1 complete genome Nitrosopumilus^sp.^BACL13^MAG-121220-bin23 Candidatus^Nitrosomarinus^catalina 100100 Candidatus^Nitrosomarinus^sp. 66 98 Nitrosopumilus^sp.^Nsub Candidatus^Nitrosopumilus^sp.^NM25 56100 Nitrosopumilus^adriaticus Candidatus^Nitrosopumilus^salaria^BD31 88 Nitrosopumilales^archaeon^CG_4_9_14_0_2_um_filter_34_16 66 Nitrosopumilus^sp.^UBA526 100 Candidatus^Nitrosopumilus^sediminis Candidatus^Nezhaarchaeota^archaeon^WYZ-LMO8 Candidatus^Verstraetearchaeota^archaeon Candidatus^Methanomethylicus^oleusabulum 100 Candidatus^Methanosuratus^subterraneum 96 Candidatus^Methanosuratus^petracarbonis ^sp.^ex4484_15 Thermofilum^adornatus^1505 100 Thermofilum^pendens^Hrk^5 95 Thermofilum^sp.^NZ13 Thermofilum^uzonense ^archaeon Thermoproteus^sp.^AZ2

TACK Thermoproteus^tenax^Kra^1 100 Thermoproteus^sp.^UBA157 100 Thermoproteus^uzoniensis^768-20 96 ^calidifontis^JCM^11548 Pyrobaculum^arsenaticum^DSM^13514 10064 Pyrobaculum^ferrireducens 100 98 Pyrobaculum^aerophilum^str.^IM2 Pyrobaculum^sp.^WP30 69 Pyrobaculum^neutrophilum^V24Sta 84 Pyrobaculum^islandicum^DSM^4184 50 Thermoprotei^archaeon^ex4572_64 ^sp.^UBA161 100 Caldivirga^maquilingensis^IC-167 100 ^sp.^ECH_B Thermocladium^sp.^UBA166 74 100 Thermocladium^modestius^JCM^10088 100 ^thermophila 100 Vulcanisaeta^sp.^UBA163 100 Vulcanisaeta^souniana^JCM^11219 97 Vulcanisaeta^moutnovskia^768-28 52 Vulcanisaeta^distributa 52 Vulcanisaeta^sp.^JCM^14467 57 Vulcanisaeta^sp.^JCM^16161 42 Vulcanisaeta^distributa^JCM^11213 97 Vulcanisaeta^distributa^DSM^14429 41 Vulcanisaeta^distributa^JCM^11217 82 Vulcanisaeta^sp.^JCM^16159 100 Vulcanisaeta^distributa^JCM^11216 archaeon^UBA168 Candidatus^Marsarchaeota^G1^archaeon^BE_D 78 Desulfurococcales^archaeon^ex4484_217_1 Sulfodiicoccus^acidiphilus Sulfolobales^ar chaeon^Acd1 ^tokodaii^str.^7 100 96 Sulfolobus^sp.^SCGC^AB-777_G06 92 Sulfolobus^acidocaldarius^SUSAZ # of genomes Sulfolobus^acidocaldarius^DSM^639 95 Sulfolobus^sp.^A20 100 Sulfolobus^islandicus^L.S.2.15 Sulfuracidifex^metallicus^DSM^6482^=^JCM^9184 99 100 Sulfolobus^sp.^UBA167 91 Sulfuracidifex^tepidarius ^yellowstonensis^MK1 Metallosphaera^cuprina^Ar-4 100 Metallosphaera^hakonensis^JCM^8857^=^DSM^7519 100 100 Metallosphaera^sedula^DSM^5348 100 ^brierleyi 98100 Candidatus^Aramenus^sulfurataquae Candidatus^Acidianus^copahuensis Acidianus^hospitalis^W1 92 Acidianus^sulfidivorans^JP7 100 Acidianus^manzaensis ^archaeon^AG1 60 ^aggregans^DSM^17230 100 Zestosphaera^tikiterensis Desulfurococcales^archaeon^ex4484_204 Thermogladius^calderae^1633 100 Thermosphaera^aggregans^DSM^11486 26 74 ^mucosus^DSM^2162 0 Desulfurococcales^archaeon^ex4484_58 100 ^hellenicus^DSM^12710 Staphylothermus^marinus^F1 ^islandicus^DSM^13165 Ignicoccus^hospitalis^KIN4/I ^fumarii^1A 21 ^sp. 6 100 Hyperthermus^butylicus^DSM^5456 97 ^occultum Pyrodictium^delaneyi ^sp. Fervidicoccus^fontis^Kam940 74 ^camini^SY1^=^JCM^12091 Aeropyrum^pernix^K1 Acidilobaceae^archaeon^UBA158 100 ^lagunensis^DSM^15908 Alba 100 ● ● ● ● ● ● ● ● ● ● ● ● ● Acidilobaceae^archaeon^UBA162 84 ^sp.^7A 100 Acidilobus^saccharovorans^345-15 archaeon^BMS3Abin16 Candidatus^Hydrothermarchaeota^archaeon^JdFR-18 100 archaeon^BMS3Bbin15 Hadesarchaea^archaeon^YNP_45 Hadesarchaea^archaeon 89 Hadesarchaea^archaeon^DG-33 Candidatus^Methanofastidiosum^methylthiophilus Candidatus^Methanofastidiosa^archaeon 78 Candidatus^Methanofastidiosum^methylthiophilus 100 Candidatus^Methanofastidiosum^methylthiophilus 90 Candidatus^Methanofastidiosum^methylthiophilus Histone Pyrococcus^yayanosii^CH1 ● ● ● ● ● ● ● ● ● ● ● ● ● Pyrococcus^furiosus^DSM^3638 100 Pyrococcus^sp.^ST04 91100 Thermococcus^chitonophagus 100 Pyrococcus^kukulkanii 100 Pyrococcus^abyssi^GE5 32 Pyrococcus^sp.^NA2 75 Pyrococcus^horikoshii^OT3 Thermococcus^paralvinellae Thermococcus^barophilus^MP Thermococcus^sp.^JdFR-02 85 Palaeococcus^ferrophilus^DSM^13482 56 Palaeococcus^pacificus^DY20341 76 Thermococcus^sp.^EP1 100 Thermococcus^sibiricus^MM^739 MC1 Thermococcus^sp.^PK ● ● ● ● ● ● ● ● ● ● ● ● ● 100 64 Thermococcus^sp.^2319x1 100 Thermococcus^litoralis^DSM^5473 Thermococcus^piezophilus 99 Thermococcus^onnurineus^NA1 Thermococcus^siculi 95 Thermococcus^pacificus 6492 Thermococcus^cleftensis 32 49 Thermococcus^barossii 99 Thermococcus^sp.^P6 52 Thermococcus^sp.^IOH1 100 Thermococcus^celericrescens 51 Thermococcus^sp.^4557 Thermococcus^kodakarensis^KOD1 Thermococcus^eurythermalis Cren7 54100 ● ● ● ● ● ● ● ● ● ● ● ● ● 99 Thermococcus^sp.^EXT12c 100 Thermococcus^nautili Thermococcus^guaymasensis^DSM^11113 99 Thermococcus^sp.^AM4 3798 Thermococcus^gammatolerans^EJ3 Thermococcus^profundus 61 Thermococcus^gorgonarius 100 Thermococcus^zilligii^AN1 Methanopyrus^kandleri^AV19 Methanocaldococcus^infernus^ME 93 Methanocaldococcus^vulcanius^M7 Methanocaldococcus^fervens^AG86 72 Methanocaldococcus^bathoardescens 49 Methanocaldococcus^jannaschii^DSM^2661 54 Methanocaldococcus^sp.^FS406-22 CC1 ● ● ● ● ● ● ● ● ● ● ● ● ● Methanotorris^formicicus^M c-S-70 Methanotorris^igneus^Kol^5 Methanofervidicoccus^abyssi 100 Methanothermococcus^okinawensis^IH1 90 Methanococcus^aeolicus^Nankai-3 60 Methanothermococcus^thermolithotrophicus^DSM^2095 100 Methanococcus^voltae^A3 Methanococcus^vannielii^SB 100 Methanococcus^mar ipaludis^C5 100 Methanococcus^mar ipaludis^C6 100 Methanococcus^maripalud is^C7 Methanothermus^fervidus^DSM^2088 Methanothermobacter^sp.^MT-2 Methanothermobacter^tenebrarum 100 Methanobacteriaceae^archaeon^41_258 Sul7 ● ● ● ● ● ● ● ● ● ● ● ● ● Methanothermobacter^wolfeii 74 Methanothermobacter^marburgensis^str.^Marburg 100 Methanothermobacter^sp.^CaT2 100 100 Methanothermobacter^thermautotrophicus^str.^Delta^H Methanobrevibacter^curvatus Methanobrevibacter^filiformis Methanobrevibacter^cuticularis 95 Methanobrevibacter^sp.^NOE 100 100 Methanobrevibacter^arboriphilus^ANOR1 Methanobrevibacter^ruminantium 100 Methanobrevibacter^ruminantium^M1 100 100 Methanobrevibacter^sp.^87.7 100 Methanobrevibacter^wolinii^SH 67 Methanobrevibacter^woesei 90 Methanobrevibacter^sp.^UBA586 HU ● ● ● ● ● ● ● ● ● ● ● ● ● Methanobrevibacter^smithii^ATCC^35061 100 Methanobrevibacter^millerae 100 Methanobrevibacter^thaueri 50 Methanobrevibacter^gottschalkii^DSM^11977 72 Methanobrevibacter^sp.^YE315 Methanobacteriaceae^archaeon Methanobacteriaceae^archaeon^UBA587 100100 Methanobacteriales^archaeon^HGW-Methanobacteriales-1 74 Methanobacteriaceae^archaeon^UBA349 Methanobacterium^sp.^BRmetb2 Methanobacterium^sp.^UBA8 100 Methanobacterium^bryantii 96 Methanobacterium^sp.^UBA176 100 Methanobacterium^sp.^UBA44 Methanosphaera^stadtmanae^DSM^3091 70 Methanosphaera^sp.^WGK6 96 93 Methanosphaera^sp.^rholeuAM270 100 Methanosphaera^sp.^rholeuAM6 Methanosphaera^sp.^rholeuAM74 82 Methanosphaera^sp.^SHI613 100 Methanosphaera^sp.^BMS 95 89 Methanosphaera^sp.^rholeuAM130 Methanobacterium^lacus Methanobacterium^sp.^SMA-27 92 Methanobacterium^congolense 68 Methanobacterium^paludis Methanobacterium^sp.^PtaU1.Bin242 Methanobacterium^sp.^UBA279 77 Methanobacterium^sp.^UBA353 92 Methanobacterium^sp.^PtaB.Bin024 Methanobacterium^sp.^Maddingley^MBC34 100100 Methanobacterium^formicicum^DSM^3637 Methanobacterium^sp.^MB1 100 Methanobacterium^sp.^BAmetb5 Euryarchaeota^archaeon^UBA287 Euryarchaeota^archaeon^UBA186 Thermoplasmatales^archaeon^SW_10_69_26 Euryarchaeota^archaeon^UBA202 Thermoplasm atales^archaeon^JdFR-43 Thermoplasmata^archaeon^HGW-Thermoplasmata-1 66 Thermoplasmatales^archaeon^SM1-50 Euryarchaeota^archaeon^UBA33 Thermoplasmatales^archaeon^ex4572_165 34 84 Thermoplasmata^archaeon^M9B1D 100 Thermoplasmatales^archaeon^SG8-52-3 Thermoplasm ata^archaeon^UBA61 Thermoplasmata^archaeon^UBA184 Aciduliprofundum^sp.^MAR08-339 Aciduliprofundum^boonei^T469 100 Aciduliprofundum^sp. 49 95 Thermoplasmatales^archaeon^I-plasma Thermoplasma^volcanium^GSS1 Thermoplasma^sp.^Kam2015 100 Thermoplasma^acidophilum^DSM^1728 Ferroplasm a^acidarmanus^fer1 38 Thermoplasmatales^archaeon^B_DKE 32 65 uncultured^archaeon 40 Thermogymnomonas^acidicola^JCM^13583 6 Thermoplasmatales^archaeon^UBA582 34 Thermoplasmatales^archaeon^E-plasma Euryarchaeota^archaeon^RBG_19FT_COMBO_69_17 100 Euryarchaeota^archaeon^RBG_16_67_27 Euryarchaeota^archaeon^SG8-5 94 Methanomassiliicoccales^archaeon^UBA280 100 Methanomassiliicoccales^archaeon^UBA147 77 Euryarchaeota^archaeon^RBG_19FT_COMBO_56_21 Euryarchaeota^archaeon^RBG_13_57_23 Methanomassiliicoccaceae^archaeon Figure S1. Methanomassiliicoccales^archaeon^PtaU1.Bin124 Distribution of nucleoid-associated proteins (NAPs) across the archaea. 100 Methanomassiliicoccus^sp. 100 Methanomassiliicoccales^archaeon^PtaB.Bin134 100 Methanomassiliicoccaceae^archaeon^UBA472 Methanomicrobia^archaeon^DTU008 100 Methanomassiliicoccus^sp.^UBA386 100 Methanomassiliicoccales^archaeon^RumEn^M1 70 Candidatus^Methanomassiliicoccus^intestinalis^Issoire-Mx1 85 Methanomassiliicoccus^luminyensis^B10 Methanomassiliicoccus^sp.^UBA6 100 Methanomassiliicoccales^archaeon^PtaU1.Bin030 Methanomassiliicoccales^archaeon^Mx-02 Methanomassiliicoccaceae^archaeon^UBA367 93 87 Methanomassiliicoccales^archaeon^RumEn^M2 100 Methanomassiliicoccaceae^archaeon^UBA404 (A) Name and assembly level are provided for each species in phylogenetic context. Methanomassiliicoccaceae^archaeon^UBA72 Methanomassiliicoccaceae^archaeon^UBA113 100 methanogenic^archaeon^mixed^culture^ISO4-G1 Methanomassiliicoccaceae^archaeon^UBA537 100 100 Methanomassiliicoccales^archaeon^Mx-06 94 Methanomassiliicoccales^archaeon^Mx-03 100 Methanomassiliicoccaceae^archaeon^UBA71 Candidatus^Methanoplasma^termitum 100 Methanomassiliicoccaceae^archaeon^UBA593 methanogenic^archaeon^ISO4-H5 100 76 Thermoplasmatales^archaeon^BRNA1 100 Candidatus^Methanomethylophilus^sp.^1R26 100 Candidatus^Methanomethylophilus^sp.^UBA78 100 Candidatus^Methanomethylophilus^alvus^Mx1201 The species-level phylogeny is based on GTDB (see Methods). (B) Co-occurrence of Thermoplasmatales^archaeon^ex4484_36 Thermoplasmatales^archaeon^ex4484_6 56 Euryarchaeota^archaeon^UBA102 Marine^Group^III^euryarchaeote^CG-Bathy1 Euryarchaeota^archaeon^UBA488 100 Euryarchaeota^archaeon^UBA169 78 Marine^Group^III^euryarchaeote^CG-Epi4 100 Marine^Group^III^euryarchaeote^CG-Epi1 Euryarchaeota^archaeon^UBA596 Euryarchaeota^archaeon^UBA226 73 Euryarchaeota^archaeon^UBA60 100 Euryarchaeota^archaeon^UBA53 97 Euryarchaeota^archaeon^UBA550 100 Euryarchaeota^archaeon^UBA15 Euryarchaeota^archaeon^UBA495 NAPs. This panel is equivalent to Fig 1B but only considers co-occurrence in 77 Euryarchaeota^archaeon^UBA124 100 Euryarchaeota^archaeon^UBA82 97 Euryarchaeota^archaeon^TMED117 89 Candidatus^Poseidonia^alphae Euryarchaeota^archaeon^UBA256 56 100 Euryarchaeota^archaeon^UBA549 100 Euryarchaeota^archaeon^UBA561 100 Euryarchaeota^archaeon^UBA242 Euryarchaeota^archaeon^TMED132 Euryarchaeota^archaeon^UBA229 Diaforarchaea 100 66 Euryarchaeota^archaeon^UBA111 50 uncultured^Candidatus^Poseidoniales^archaeon 100 Euryarchaeota^archaeon^UBA433 100 83 Euryarchaeota^archaeon^UBA258 complete genomes. Information on assembly level is based on metadata from NCBI. Euryarchaeota^archaeon^UBA88 100 Euryarchaeota^archaeon^UBA259 100 Euryarchaeota^archaeon^UBA91 100 Euryarchaeota^archaeon^TMED99 Euryarchaeota^archaeon^UBA59 Marine^Group^II^euryarchaeote^MED-G33 100 71 Euryarchaeota^archaeon^UBA501 100 Euryarchaeota^archaeon^UBA602 Euryarchaeota^archaeon^UBA40 86 Euryarchaeota^archaeon^UBA251 Euryarchaeota^archaeon^UBA222 100 Euryarchaeota^archaeon^UBA39 100 Euryarchaeota^archaeon^UBA125 Euryarchaeota^archaeon^UBA182 100 100 Euryarchaeota^archaeon^UBA558 Euryarchaeota^archaeon^UBA52 100 Euryarchaeota^archaeon^UBA250 91 Euryarchaeota^archaeon^UBA171 99 Euryarchaeota^archaeon^TMED192 100 Euryarchaeota^archaeon^UBA58 Marine^Group^II^euryarchaeote^MED-G36 100 Euryarchaeota^archaeon^UBA106 100 Euryarchaeota^archaeon^UBA105 Euryarchaeota^archaeon^UBA99 100 Euryarchaeota^archaeon^UBA556 100 Euryarchaeota^archaeon^UBA89 88100 Euryarchaeota^archaeon^UBA257 Euryarchaeota^archaeon^UBA605 50 Euryarchaeota^archaeon^UBA439 Euryarchaeota^archaeon^UBA557 100 Marine^Group^II^euryarchaeote^MED-G38 100 100 Euryarchaeota^archaeon^TMED129 Euryarchaeota^archaeon^UBA36 100 Marine^group^II^euryarchaeote^REDSEA-S30_B12 100 Euryarchaeota^archaeon^UBA462 100 Euryarchaeota^archaeon^TMED173 75 62 Marine^Group^II^euryarchaeote^MED-G35 100 Marine^group^II^euryarchaeote^REDSEA-S42_B7 100 Marine^group^II^euryarchaeote^REDSEA-S43_B8 54 Marine^group^II^euryarchaeote^REDSEA-S11_B3N4 Euryarchaeota^archaeon^UBA54 Euryarchaeota^archaeon^UBA623 31 100 100 Marine^Group^II^euryarchaeote^MED-G34 Euryarchaeota^archaeon^UBA4 100 Euryarchaeota^archaeon^UBA438 100 Euryarchaeota^archaeon^UBA173 Euryarchaeota^archaeon^UBA28 100 Euryarchaeota^archaeon^UBA181 96 Euryarchaeota^archaeon^UBA62 100 Euryarchaeota^archaeon^UBA103 100 Euryarchaeota^archaeon^UBA170 Methanonatronarchaeum^thermophilum archaeon^SCG-AAA382B04 100 Candidatus^Methanohalarchaeum^thermophilum Archaeoglobi^archaeon^JdFR-42 Archaeoglobus^sulfaticallidus^PM70-1 Archaeoglobus^sp.^JdFR-24 76 Archaeoglobus^sp.^UBA234 Archaeoglobus^sp.^JdFR-37 Ferroglobu s^placidus^DSM^10642 79100 Geoglobus^ahangari 35 Geoglobus^acetivorans Archaeoglobus^sp. 35 Archaeoglobus^veneficus^SNP6 Archaeoglobus^profundus^DSM^5631 48 Archaeoglobus^sp.^JdFR-41 100 Archaeoglobus^fulgidus^DSM^4304 Candidatus^Methanolliviera^hydrocarbonicum 84 Candidatus^Methanoliparum^thermophilum Methanophagales^archaeon^ANME-1-THS 38 ^archaeon Candidatus^Syntrophoarchaeum^butanivorans Candidatus^Syntrophoarchaeum^caldarius Methanocellaceae^archaeon^UBA148 ^conradii^HZ254 Methanocella^paludicola^SANAE Methanocella^sp.^PtaU1.Bin125 Methanocella^ar voryzae^MRE50 Methermicoccus^shengliensis^DSM^18856 Methanomicrobia^archaeon^JdFR-19 ^sp.^UBA204 50 100 ^archaeon^Methan_01 100 Methanosaeta^harundinacea^6Ac 49 ^thermoacetoph ila^PT 100 Methanosaeta^sp.^UBA114 100 Methanosaeta^sp.^PtaB.Bin018 100 Methanothrix^soehngenii^GP6 100 Methanosaeta^sp.^PtaU1.Bin112 100 Methanosaeta^sp.^UBA80 Methanosarcinales^archaeon^UBA261 Methanosarcinales^archaeon^UBA203 100 ANME-2^cluster ^archaeon 28 ANME-2^cluster^archae on^HR1 Candidatus^Argoarchaeum^ethanivorans 36 Methanosarcinales^archaeon^ex4572_44 84 Candidatus^Methanoperedenaceae^archaeon^HGW-Methanoperedenaceae-1 77 100 Candidatus^Methanoperedenaceae^archaeon 57 Candidatus^Methanoperedens^nitroreducens ^blatticola Methanosarcina^sp.^MTP4 100 Methanosarcina^lacustris^Z-7289 100 Methanosarcina^horonobensis^HB-1^=^JCM^15518 10050 100 Methanosarcina^siciliae^T4/M 100 Methanosarcina^acetivorans^C2A Methanosarcina^sp.^Ant1 100 Methanosarcina^sp. 100 Methanosarcina^sp.^UBA402 100 Methanosarcina^sp.^MSH10X1 56 Methanosarcina^thermophila^TM-1 53 Methanosarcina^sp.^UBA5 100 Methanosarcina^barkeri^3 67 Methanosarcina^barkeri^MS 99 Methanosarcina^barkeri^str.^Fusaro ^evestigatum^Z-7303 100 ^sp. 92 Methanosalsum^zhilinae^DSM^4017 ^halophilus 100 Methanohalophilus^mahii^DSM^5219 35 Methanohalophilus^sp.^RSK 44 Methanohalophilus^sp.^SLHTYRO Methanococcoides^burtonii^DSM^6242 100 Methanococcoides^methylutens^MM1 100 Methanococcoides^vulcani 100 51 78 Methanococcoides^sp.^AM1 83 Methanococcoides^methylutens ^archaeon^UBA470 100 ^sp.^PtaU1.Bin073 Methanomethylovorans^hollandica^DSM^15978 92 Methanomethylovorans^sp.^PtaU1.Bin093 100 98 Methanococcoides^sp.^EBM-47 ^profundi 88 Methanolobus^tindarius^DSM^2278 65 Methanolobus^sp.^UBA32 99 Methanolobus^psychrotolerans Methanolobus^sp.^UBA474 100100 Methanolobus^psychrophilus^R15 Methanolobus^sp.^SY-01 100 Methanolobus^sp.^UBA118 100 Methanolobus^sp.^T82-4 ^sp.^FWC-SCC2 Methanofollis^liminatans^DSM^4140 56 Methanofollis^sp.^UBA420 100 Methanofollis^ethanolicus ^sp.^SDB ^cariaci^JCM^10550 100 ^limicola^DSM^2279 100 ^mobile^BP 56 ^paynteri Methanolacinia^petrolearia^DSM^11571 Methanoculleus^taiwanensis 71 ^archaeon^UBA206 100 81 Methanomicrobiaceae^archaeon^UBA436 Methanoculleus^sp.^UBA430 31 Methanomicrobiales^archaeon^HGW-Methanomicrobiales-2 Methanoculleus^sp.^UBA303 2456 Candidatus^Methanoculleus^thermohydrogenotrophicum 27 100 Methanoculleus^bourgensis^MS2 Methanoculleus^sp.^UBA77 25 31 Methanoculleus^sp.^UBA208 96 Methanomicrobiales^archaeon^HGW-Methanomicrobiales-6 76 Methanoculleus^sediminis 4786 Methanoculleus^sp.^MH98A 100 Methanoculleus^marisnigri^JR1 Methanoculleus^horonobensis 89 Methanoculleus^chikugoensis 100 Methanoculleus^chikugoensis^JCM^10825 ^archaeon^Phil4 Methanocorpusculaceae^archaeon^UBA425 81 100 ^sp.^MCE 97 Methanocorpusculum^labreanum^Z ^sp. 35 Methanocorpusculaceae^archaeon^UBA456 100 Methanocalculus^sp.^MSAO_Arc1 100 Methanocalculus^sp.^MSAO_Arc2 ^stamsii 100 Methanospirillum^hungatei^JF-1 Methanospirillum^hungatei 100 Methanospirillum^lacunae 52 Methanosphaerula^palustris^E1-9c Methanoregulaceae^archaeon^UXB-2016 Methanoregulaceae^archaeon^UBA288 Methanoregulaceae^archaeon^UBA467 81 Methanolinea^sp.^UBA477 Methanolinea^sp.^SDB 100 42 Methanolinea^sp.^UBA277 32 Methanolinea^sp.^UBA473 Methanolinea^tarda^NOBI-1 100 Methanoregulaceae^archaeon^P taB.Bin108 100 Methanoregulaceae^archaeon^P taB.Bin056 100 Methanolinea^sp.^UBA286 72 Methanoregulaceae^archaeon^P taB.Bin152 ^sp.^PtaU1.Bin051 100 Methanoregula^sp.^UBA64 100 Methanoregula^boonei^6A8 Methanoregula^sp.^UBA247 100 Methanoregula^sp.^UBA471 100 Methanomicrobiales^archaeon^HGW-Methanomicrobiales-1 Methanoregula^sp.^PtaB.Bin085 67 10090 Methanoregula^sp.^UBA278 100 Methanoregula^sp.^UBA154 Methanoregula^sp.^UBA276 92100 Methanomicrobiales^archaeon^HGW-Methanomicrobiales-3 Methanoregula^sp. 100 Methanoregula^formicica^SMSP Euryarchaeota^archaeon^UBA24 Salinarchaeum^sp. Halobacteriales^archaeon^QS_1_68_20 ^subterraneus Halalkalicoccus^paucihalophilus 95 Halalkalicoccus^jeotgali^B3 Halobacteriales^archaeon^QS_8_65_32 64 Halobacteriales^archaeon^QH_6_64_20 100 Halobacteriales^archaeon^QH_8_64_26 100 Halobacteriales^archaeon^QS_3_64_16 55 Halobacteriales^archaeon^QS_4_69_34 100 ^agarilyticus 10059 Halococcus^salifodinae^DSM^8989 100 Halococcus^saccharolyticus^DSM^5350 Halococcus^sp.^IIIV-5B 100 Halococcus^sediminicola 100 Halococcus^thailandensis^JCM^13552 100 Halococcus^morrhuae^DSM^1307 34 Halomarina^oriensis Halobacteriales^archaeon^QS_5_70_17 Halobacteriales^archaeon^QS_5_70_15 Halobacteriales^archaeon^QS_4_70_19 9555 Halobacteriales^archaeon^QH_7_65_31 100 Halobacteriales^archaeon^QS_9_67_17 99 100 Halobacteriales^archaeon^QH_7_66_36 ^sp. Natronomonas^pharaonis^DSM^2160 81 Natronomonas^moolapensis^8.8.11 100100 Natronomonas^sp.^F20-122 81 Halobacteriales^archaeon^QS_9_68_42 100 Halobacteriales^archaeon^SW_5_68_122 100 Halobacteriales^archaeon^QS_9_70_65 10063 Halobacteriales^archaeon^QS_8_69_73 67 Halobacteriales^archaeon^QS_4_69_225 96 Halobacteriales^archaeon^QH_10_70_21 22 50 Halobacteriales^archaeon^QH_7_69_31 100 Halobacteriales^archaeon^QS_1_69_70 Halobacteriales^archaeon^QS_1_68_17 halophilic^archaeon^NEN8 100 Halorientalis^sp.^IM1011 100 Halorientalis^persicus 96 Halorientalis^regularis 89 Halorientalis^sp.^F13-25 Halobacteriales^archaeon^SW_10_68_16 100 Halobacteriales^archaeon^SW_9_67_25 71 halophilic^archaeon^J07HX64 87 ^archaeon^Tc-Br11_E2 g27 100 Halobacteriaceae^bacterium^SHR40 66 Halobacteriales^archaeon^QH_10_67_13 54100 Halobacteriales^archaeon^QH_10_65_19 18 Halobacteriales^archaeon^SW_7_65_23 97 Halobacteriales^archaeon^QH_2_65_14 100 Halovenus^aranensis 98 Halovenus^sp.

Halobacteriota Haloar culaceae^archaeon^HArcel1 ^utahensis^DSM^12940 87 Halorhabdus^rudnickae 100 Halorhabdus^sp.^H27 48 ^sp.^YPL4 Halosimplex^sp.^TH32 100 Halosimplex^carlsbadense^2-9-1 100 Halobacteriales^archaeon^QS_5_68_33 100 Halobacteriales^archaeon^QS_4_69_31 64 79 Halobacteriales^archaeon^QH_10_67_22 ^zhouii 100 Halobacteriales^archaeon^QS_4_62_28 45 Halomicrobium^mukohataei^DSM^12286 100 Halomicroarcula^sp.^LR21 100 Halobacteriales^archaeon^QH_8_67_36 100 Halobacteriales^archaeon^QS_6_64_34 ^vall ismortis^ATCC^29715 Haloarcula^taiwa nensis 81 Haloarcula^ru bripromontorii 61 Haloarcula^sp.^MD-130 86 Haloarcula^hisp anica^ATCC^33960 57 Haloarcula^jap onica^DSM^6131 96 Haloarcula^ar gentinensis^DSM^12282 50 Haloarcula^sp.^CBA1127 92 Haloarcula^ma rismortui^ATCC^43049 halophilic^archaeon ^sp.^R4 100 Haladaptatus^litoreus 100 Haladaptatus^cibarius^D43 Halorussus^sp.^LYG-36 98 Halorussus^amylolyticus Halorussus^sp.^HD8-83 100 Halorussus^sp.^HD8-51 95 Halobacteriales^archaeon^SW_6_65_15 41 Halorussus^rarus 100 halophilic^archaeon^LT46 41 Halorussus^ruber 81 Halorussus^sp.^RC-68 Halobacteriales^archaeon^QS_8_69_26 Natronoarchaeum^philippinense 79 30 Salinarchaeum^sp.^Harcht-Bsk1 Halostella^litorea 24 100 Halostella^salina Halostella^pelagica 24 89 100 Halostella^sp.^LT12 55 Halobacteriales^archaeon^QS_9_68_17 Haloarchaeob ius^iranensis Saliphagus^sp.^LR7 Halovivax^sp. 100 Halovivax^asiaticus^JCM^14624 100 100 Halovivax^ruber^XH-70 Natrialbaceae^archaeon^Tc-Br11_E2 g1 4740 Natrialbaceae^archaeon^Tc-Br11_E2 g8 Natrialbaceae^archaeon^Tc-Br11_E2 g14 26 Natrialbaceae^archaeon^Tc-Br11_E2 g28 Halostagnicola^larsenii^XH-48 25 Halostagnicola^kamekurae 100 Halostagnicola^sp.^A56 ^amylolyticus^DSM^10524 Natronococcus^jeotgali^DSM^18795 98 53 Natronococcus^occultus^SP4 Natrarchaeobius^chitinivorans 24 Natrarchaeobius^halalkaliphilus 94 Natrarchaeobius^chitinivorans 99 Natronolimnobius^sp.^XQ-INN^246 54 Halopiger^goleimassiliensis 3852 Natrialbaceae^archaeon^B1-Br10_E2g27 Natronolimnobius^aegyptiacus 82 Natrialbaceae^archaeon^B1-Br10_E2g2 Halopiger^aswanensis 80 Halopiger^xanaduensis^SH-6 ^nitratireducens^JCM^10879 100 Halobiforma^lacisalsi^AJ5 59 ^texcoconense 73 Natronobacterium^gregoryi^SP2 Halopiger^djelfimassiliensis Haloterrigena^sp.^H1 55 Haloterrigena^hispanica 10074 Haloterrigena^limicola^JCM^13563 98 34 Haloterrigena^mahii 10054 Haloterrigena^saccharevitans 91 ^pellirubrum^DSM^15624 Natrinema^sp.^CBA1119 98 Natrinema^ejinorense 23 5779 Natrinema^altunense 64 Natrinema^pallidum^DSM^3751 Natrinema^salaciae 44 Natrinema^versiforme 100 Natrinema^versiforme^JCM^10478 Natrialba^asiatica^DSM^12278 100 Natrialba^taiwanensis^DSM^12281 Natrialba^chahannaoensis^JCM^10990 100 Natrialba^magadii^ATCC^43099 50 54 Natrialba^sp.^SSL1 91 Natrialba^hulunbeirensis^JCM^10989 Natronolimnobius^innermongolicus^JCM^12255 Haloterrigena^turkmenica^DSM^5511 98 Haloterrigena^salifodinae 72100 Haloterrigena^salina^JCM^13891 ^sediminis 100 Haloterrigena^daqingensis Natronorubrum^thiooxidans 4076 Natronorubrum^sulfidifaciens^JCM^14089 Halobacteriaceae^archaeon^SHR37 78 Natronorubrum^texcoconense Halanaeroarchaeum^sulfurireducens 60 Halarchaeum^sp.^CBA1220 100 Halobacterium^sp.^DL1 78 Halobacterium^jilantaiense Halobacterium^hubeiense 100 Halobacterium^sp.^CBA1132 uncultured^archaeon^A07HB70 Haloplanus^vescus Haloplanus^natans^DSM^17983 100 56 Haloplanus^sp.^CBA1113 96 Haloplanus^salinus Haloprofundus^m arisrubri Halogranum^gelatinilyticum 100 Halogranum^amylolyticum 99 ^archaeon^SYSU^A00711 100 Halegenticoccus^soli Salinigranum^rubrum Haloferax^larsenii^JCM^13917 100 Haloferax^elongans^ATCC^BAA-1513 96 Haloferax^sp.^SB29 95 94 Haloferax^mucosum^ATCC^BAA-1512 100 Haloferax^mediterranei^ATCC^33500 100 Haloferax^sp.^SB3 100 Haloferax^sulfurifontis^ATCC^BAA-897 85 44 Haloferax^volcanii^DS2 Halopelagius^inordinatus ^rufum 99 99 Halogeometricum^borinquense^DSM^11551 Halogeometricum^limi 57 100 92 Halogeometricum^pallidum^JCM^14848 Halobellus^clavatus 76 Halobellus^rufus Halobellus^sp.^Atlit-31R 100100 Halobellus^sp.^Atlit-38R ^sp.^J07HQX50 Haloquadratum^walsbyi^J07HQW2 100 Haloquadratum^walsbyi^J07HQW1 99 Haloquadratum^walsbyi^C23 Candidatus^Halobonum^tyrrellensis^G22 ^gomorrense 100 Halobacteriales^archaeon^QS_6_71_20 halophilic^archaeon^DL31 100 Halolamina^pelagica 55 100 Halolamina^rubra 94 Halolamina^sp.^CBA1230 100 Halolamina^sediminis ^archaeon Haloferacaceae^archaeon^N1521 100 halophilic^archaeon^SHRA6 Haloferacaceae^archaeon^PL-Br10_E2g29 100 91 Haloferacaceae^archaeon^B1-Br10_E2g22 100 Halonotius^terrestris 82 39 uncultured^archaeon^A07HN63 Halonotius^roseus 100 Halonotius^pteroides 100 Halonotius^sp.^F13-13 Halalkaliarchaeum^desulfuricum Haloparvum^sedimenti 100 Halopenitus^malekzadehii 56 Halopenitus^persicus ^vacuolatum 100 uncultured^archaeon^A07HR67 Halobellus^sp.^ZY21 100 100 Halorubrum^sp.^48-1-W 100 Halorubrum^cibi 93 Halorubrum^halodurans 100 Halorubrum^aethiopicum Halorubrum^sp.^BOL3-1 100 Halobacteriales^archaeon^SW_8_68_21 8597 Halorubrum^tebenquichense^DSM^14210 Halorubrum^sodomense 10067 Halorubrum^trapanicum 3976 Halorubrum^sp.^Atlit-26R 77 Halorubrum^sp.^T3 100 Halorubrum^tropicale 2540 Halorubrum^coriense^DSM^10284 Halorubrum^sp.^GN11_10-6_MGM 97 Halorubrum^californiensis^DSM^19288 Halorubrum^sp.^BV1 53 Nanohaloarchaea^archaeon^PL-Br10_U2g5 98 Halorubrum^aidingense^JCM^13560 78 Halorubrum^sp.^SD626R 10067 Halorubrum^sp.^CBA1229 31 Halorubrum^sp.^Hd13 Halorubrum^sp.^PV6 99 Halorubrum^lipolyticum^DSM^21995 81 Halorubrum^kocurii^JCM^14978 87 Halorubrum^lacusprofundi^ATCC^49239 42 Halorubrum^amylolyticum 26 Halorubrum^saccharovorum^DSM^1137 40 Halorubrum^persicum 14 Halorubrum^halophilum 13 Halorubrum^sp.^CSM-61 79 Halorubrum^sp.^WN019 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Diaforarchaea (Marine Group II)

Outer ring: order Candidatus Altiarchaeales Inner ring: pI Candidatus Poseidoniales Halobacteriales Min. (4.4) Methanocellales Midpoint Methanosarcinales Max. (11.9)

MC1

Figure S2. Maximum likelihood phylogeny of archaeal MC1 homologs. MC1 sequences found in the Diaforarchaea are restricted to marine group II archaea and branch as a monophyletic group within the Methanosarcinales consistent with a single horizontal transfer event. Note also that sequences from haloarchaea have dramatically lower isoelectric points, making them unlikely donors. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. A B

2000 60 1500 40 1000 20 500

# proteins identified 0 0 % of proteome measured Natrialba magadii Haloferax volcanii Natrialba magadii Haloferax volcanii Haloferax Picrophilus torridus Pyrococcus furiosus Ferroglobus placidus Ferroglobus Ferroglobus placidus Ferroglobus Thermoproteus tenax Methanopyrus kandleri Methanopyrus Thermococcus litoralis Haloarculamarismortui Haloferax mediterranei Haloferax Archaeoglobus fulgidus Archaeoglobus Methanopyrus kandleri Methanopyrus Sulfolobus solfaticarius Sulfolobus Archaeoglobus fulgidus Archaeoglobus Methanosarcina barkeri Nitrosopumilus maritimus Sulfolobus acidocaldarius Sulfolobus Halobacterium salinarium Archaeoglobus profundus Archaeoglobus Nitrosopumilus maritimus Halobacterium salinarium Archaeoglobus profundus Archaeoglobus Thermoplasma volcanium Sulfolobus acidocaldicarius Sulfolobus Thermoplasma acidophilum Thermoplasma acidophilum Methanomethylophilus alvus Thermococcus kodakarensis Methanomethylophilus alvus Thermococcus kodakarensis Mathanocaldococcus jannaschii Mathanocaldococcus jannaschii Methanomassiliicoccus luminyensis Methanomassiliicoccus luminyensis Methanothermobacter marburgensis Methanothermobacter marburgensis Methanothermobacter thermautotrophicus Methanothermobacter thermautotrophicus

Jevtic et al. (2019) Cerlettii et al. (2018) Müller et al. (2020) this study Qin et al. (2018) Sas-Chen et al. (2020)

Figure S3. Absolute (A) and relative (B) coverage of the predicted proteome in 19 archaea. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is A made available under aCC-BY 4.0 International license. Order Bacteria Superkingdom Alteromonadales Methanomassiliicoccales Archaea Flavobacteriales Methanomicrobiales Rhodobacterales Sphingomonadales B Hmmsearch hit against pI NAP candidate from M. alvus (AGI86273.1) Min. (4.3) Protein length (70-580 amino acid) Max. (10.9)

Diaforarchaea Methanomssiliicoccales

Superkingdom pI

Figure S4. The novel candidate NAP WP_019177984.1/AGI86273.1 in phylogenetic context. (A) Maximum likelihood phylogeny of bacterial and archaeal WP_019177984.1/AGI86273.1 homologs. Homologs were obtained by running jackhmm using AGI86273.1 as a seed (e-value 1e-5). The subtree indicated by dashed lines was re-aligned and a tree computed using RAxML-NG. Bootstrap (n=100) values superior to 50 are shown. (B) Distribution of jackhmm hits using AGI86273.1 as a seed across archaea, displayed on a species tree obtained from GTDB. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

PFAM domain

5 6 7 8 9 AraC_binding AsnC_trans_reg GntR HTH_10 HTH_11 HTH_20 HTH_23 HTH_24 HTH_27 HTH_38 HTH_5 NikR_C PaaX Regulator_TrmB RHH_1 0 100 200 0.0 0.5 1.0 T. kodakarensis (BAD86252.1) T. kodakarensis (BAD84660.1) TrmBl2 S. acidocaldarius (AAY80109.1) x structure known, no dbd N. maritimus (ABX13658.1) N. maritimus (ABX13420.1) N. magadii (ADD07095.1) N. magadii (ADD06615.1) N. magadii (ADD04528.2) M. barkeri (AKB56076.1) M. kandleri (AAM01353.1) M. luminyensis (CAJE01000005.1−32) M. jannaschii (AAB99060.1) M. jannaschii (AAB98541.1) H. volcanii (ADE04502.1) H. volcanii (ADE03043.1) T. volcanium (BAB59768.1) T. volcanium (BAB59743.1) T. acidophilum (CAC12274.1) T. acidophilum (CAC11687.1) T. kodakarensis (BAD86187.1) M. alvus (AGI86273.1) no known domain M. luminyensis (CAJE01000015.1−144) M. luminyensis (CAJE01000001.1−33) H. salinarium (CAP14014.1) A. profundus (ADB57099.1) x CRISPR

pI CD-hit Length % of proteome (amino acids)

Figure S5. Overview of candidate NAPs, their domain content, size, isoelectric point (pI) and fractional abundance in the proteome of the species in which they were identified. Likely orthologs amongst this set were identified using CD-hit (see Methods) and bear the same colour. Two likely false positives are marked with an “x”. The pipeline used to predict novel candidate NAPs is described in the main text and Methods. Also see Table S1 for further details. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

● TrmBl2

0.6

0.4

% of proteome ● 0.2

● ●

● ● ● ● ● ● ● ● ● ● ● 0.0 ● ● ●● ● ● ● ● ●● Natrialba magadii Halobacterium sp Haloferax volcanii Haloferax Picrophilus torridus Ferroglobus placidus Ferroglobus Methanosarcina barkeri Thermoplasma volcanium Sulfolobus acidocaldicarius Sulfolobus Thermoplasma acidophilum Mathanocaldococcus jannaschii Nitrosopumilus maritimus SCM1 Thermococcus KOD1 kodakarensis Methanomassiliicoccus luminyensis

Figure S6. Relative abundance of TrmBL orthologs in different proteomes. T. kodakarensis TrmBL2 stands out as having a much higher relative abundance than its homologs in other proteomes. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

HVO_1577 ● ● 1 ● ● ● ● ● ● ● ● HVO_2029 ● ●● ● ● ●●● ● ● ● ● ● ●● ●●● ●● ●● ● ● ●●●●● ● ● ● ● ● ●●● ●● ●● ● ● ●● ● ● ● ● ●●● ●● ● ●● ● ● ●●●● ● ● ●● ●●● ● ● ● ● ● ●●●●●● ●●● ● ●● ● ● ●●● ●●●●●●●● ● ● ● ●● ● ●●●● ●●● ●● ● ● ● ●●●●●●●● ●●●●●●●● ● ● ● ● ● ● ● ● ● ●● ●●●●●●● ● ● ●●● ●● ● ● ● ●●●●●●● ●●●● ● ● ● ● ● ●● ●●● ●●● ●●● ●●●●●● ● ● mc1B (MC1) ●●● ●●●●●●●● ● ● ●●● ● ●●● ● ● ●● ●● ● ●●●●●●● ●●●● ● ● ● ● ●●●● ● ●●●●●● ●●●● 0.1 ● ● ● ●●●●●●●●●●● ●●●●●● ●●●●●●● ●●● ● ● ● ● ● ● ● ● ●● ●● ●●●●●●●●●●●●●●●●●●●●●● ●●● ● ● ● ●●● ●● ●● ●●●●●●●●●●●● ●●●●●● ● ●● ● ●● ●●● ●●●●●●●●●●●●●●●●●●● ● ●●●● ●●● ● ● ●●●●●●● ●●●●●●● ●●●●●●●●●●●● ●●● ●● ● ● ● ●●● ●● ●● ●●●● ●●● ●●● ●●●● ● ● ●● ●●● ● ● ● ●●●● ●● ●●●●●●●●●●●●● ●●●●●●●●●● HstA (histone) ● ● ● ● ●●● ●● ●●●●●●●●● ●●●● ●●●●●●●● ●●●●●●● ● ● ● ● ●●●●● ●●●●●●● ●●●●●●●●●●●●●●●●●● ●● ●● ● ●● ●●●●●●● ●●●●●● ●●●●● ●●●●● ●●●●● ● ●● ● ● ● ● ● ●● ● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ● ● ● ●● ● ●●● ●●●●●●●●● ●●●● ●●●●●●●●●●●●●●●●●●●●● ●●● ● ●● ● ● ●●● ●●●●●●●●●● ●●●●●●●●●●●●●● ●●●● ● ● ● ● ● ● ●●●●●●● ●●●●●●●●●●●●●●●●●●● ●●● ●● ●● ●● ●●● ● ● ● ● ●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●● ●●● ● ● ●● ● ● ● ● ●● ●● ●● ●●●●●●● ●●●●●●●●●●●●●●● ●●● ●●● ●● ●●●●●● ● ●●● ● ● ●● ● ●●●●●●●●●●●●●●●●●●●●●● ●●●● ●●●●●● ● ● ● ● ● ● ●●●● ●●● ●●● ● ●●●●●●● ●●●●●●●●●● ●●●●●●●● ●● ● ● ● ● ● ●● ●●●● ● ●●●●●●●● ●●●●●● ● ● ●●● ●● ● ●● ● ●●●● ●●●● ● ●●● ●●●●●●●●● ●● ●● ● ●● ● ●● ● ●● ● ●● ● ●● ●● ●● ●●●●●● ● ●●●●●●●● ● ●●●●●● ●● ● ● ● ● ● ● ● ●●●● ● ●● ● ●●● ● ●●●● ●●●●●●●● ●●●● ● ●● ● ● 0.01 ● ● ● ● ● ●●●●●● ●●●●● ●●● ●●●●● ● ● ●● ● ● ● ●● ● ● ●●●● ● ● ●●●● ● ● ●●●●●●●●●●● ●●●●● ● ●● ●● ● ● ● ● ● ●● ● ●●● ●●● ● ●● ●● ●●●●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●●●●● ● ●●●●●●●● ● ● ●●● ●● ● ● ● ● ●● ● ●● ●●●●●●● ●● ● ●●●●●●● ●● ●● ●●● ●● ●● ● ● ● ●●● ●● ● ●● ●●● ●●●●● ●●● ●●●● ●●●● ● ● ●● ● ● ●● ● ●● ●●●● ● ● ●●●●●●●●●●● ● ● ● ● ● ● ●●● ● ● ● ● ● ●● ●●●● ● ● ● ● ● ● ● ●●●● ● ● ● ● ●● ●●● ● ● ● ●● ● ●● ● ●● ● ● ● ● ● ●● ●●●● ● ● ●● ● ● ●●●● ●●●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●●● ● ●● ● ● ● ●●● ●● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ●●●● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ●● ●● ● ● ● 0.001 ● ●● ● ● ● ●● ●●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● Knuppel et al., % of proteome Knuppel ● ● ● ● ● ● ● ● rho = 0.78 ● ● P = 0

0.001 0.01 0.1 1 Jevtic et al., % of proteome

Figure S7. Relative abundance of known and candidate NAPs in Haloferax volcanii, as determined in two other quantitative proteomics (Jevtić et al. 2019; Knüppel et al. 2021). Note that the single H. volcanii histone (HstA) and MC1 are >10-fold less abundant than the novel NAP candidate HVO_1577. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is A made available under aCC-BY 4.0 InternationalB license. ● TF Histones ● Histone ● 0.9 3 ● ● 0.6

● 2 ● 0.3 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1 ●● ● ●● ●● ● ● % of proteome ●● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● % of proteome ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ●● ●● ● ● ● ●●●● ● ● ● ● ●● ● ●● ● ●● ●● ●●● ●●●●● ● ●● ● ● ● ● ● ● ●● ● ● ●●● ●●●● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●●● ● ● ●● ● ● ● ●●●●● ●●●● ● ● ● ●● ● ●●●● ●● ● ●●●● ● ●● ● ●●●● ●● ● ●●● ●●●● ● ●● ● ●●● ●●●● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ●● ● ●●●●●● ●●●●●●●●●● 0 ● ● ● ● ● ● M. alvus P. torridus P. M. barkeri H. volcanii A. fulgidus F. placidus F. N. magadii M. kandleri T. volcanium T. N. maritimus A. profundus H. salinarium M. jannaschii T. acidophilum T. M. luminyensis T. kodakarensis T. Natrialba magadii Haloferax volcanii Haloferax M. marburgensis S. acidocaldicarius Methanosarcina barkeri M. thermautotrophicus Nitrosopumilus maritimus Halobacterium salinarium

Figure S8. Quantitative variation of histone abundance across 19 archaea. (A) Bar heights represent the summed abundance of all detected histone paralogs in a given species, while the grey dots mark the abundance of individual paralogs. (B) The relative abundance of histone proteins in halophilic archaea compared to the abundance of transcription factors (TF) measured in the same experiment. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is A made available under aCC-BY 4.0 International licenseB .

● ●

1.5 ● ● 6 ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● 4 1.0 ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● Coefficient of variation ● ●● ●●● ● ● ●● ● ●● ● ●●●● ● ● ● ●● ● ●●● 2 ●● ●● ●● ● ● ● ● ●●● ● ● ●● ●●● ● ●● NAP abundance (%) NAP abundance ● ● 0.5 ●● ● ● ●● ● tRNA synthetases ● ●● ● GTP_EFTU_D2 ● ● ● ●● ●●● GTP_EFTU (tRNA synthetase normalized) 0 0 1 2 3 4 5 Average abundance (%) M. alvus P. torridus P. M. barkeri H. volcanii A. fulgidus F. placidus F. N. magadii M. kandleri T. volcanium T. A. profundus H. salinarium N. maritimus M. jannaschii T. acidophilum T. M. luminyensis T. kodakarensis T. M. marburgensis S. acidocaldicarius M. thermautotrophicus

Figure S9. (A) Relative NAP abundance normalized to the abundance of tRNA synthetases in the same proteome. (B) Compared to other Pfam domains, proteins classified as tRNA synthetases are both highly abundant and different organisms dedicate a similar fraction of their protein budget to their production (as evident from a low coefficient of variation across species), making them a suitable choice for normalization. The idea of normalization here is to consider the investment in NAPs relative to some baseline investment in basic cellular processes (such as charging amino acids). The results indicate that large quantitative variability in NAP abundance is also evident when Sjudged against this baseline rather than as a fractional allocation across the entire proteome. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is A made available under aCC-BY 4.0 International license.

● ● ● 35 ● ● ● 10 ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●●●●● ● ● ● ● ● ● ● ● ● 10 ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●●●● ● ●●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ●●● ●●● ● ● ●●●● ● ● ● ● ● ● ●●● ● ● ● ● ●● ●● ●●● ● ● ● ●● ●●●● ● ● ● ● ● ●● ● ● ● ● ●● ●●● ●●●● ● ● ● ● ● ● ●● ● ●● ●●● ● ● ● ● ● ●● ●●●●●●●●●● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ●●●● ● ●●● ● ● ● ● ●●●●● ● ●●●●●● ● ● ● ● ● ●● ●●●● ●●●● ● ● ● ● ● ●● ● ●● ●●● ● ●● ● ● ● ● ● ●●● ● ● ● ● ●● ● ● ● ● ● ●●● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●●●● ● ● ●● ●● ● ● ● ● ● ● ● ●● ●●● ● 30 ● ●● ●● ● ●● ●● ●● ● ● ● ● ● ● ● ● ●● ● ●●●●●●●●●●● ● ●●●●● ● ● ● ●● ● ● ● ● ●● ●● ●● ● ●● ●● ● ● ●● ●●● ●● ●●●● ●● ● ● ●● ●● ● ●● ● ● ● ●●●●●●● ●●●●● ●● ● ● ● ●● ● ● ●● ● ● ● ● ●● ●●● ● ● ●● ● ● ● ● ● ●● ● ●●● ● ●●● ● ●●● ● ● ● ● ● ●● ● ●● ●●●● ●● ● ●●●●●●●●● ●● ● ● ● ● ● ● ● ● ●●●●●●●●●●●●● ●● ●●● ● ● ● ● ● ● ●●● ●● ● ● ● ●●● ● ● ● ● ● ●● ●● ●● ● ●●● ●●●●●● ● ● ● ●● ● ● ● ● ●●● ● ●●●● ●● ●●● ● ● ● ● ● ● ●●●● ●●●●●●●●●●●● ●●● ● ● ●●● ● ● ● ● ● ●●●●●● ●● ●●● ●● ● ● ●●● ●● ● ● ●● ●●●●●● ●●● ● ●● ● ●● ●● ●●● ● ● ●●● ●● ● ● ● 0.1 ● ● ● ● ●●●●●●●●● ●●●●● ●● ●●● ● ● ●●●● ● ● ● ● ● ●● ● ● ● ●● ●● ●●● ●● ● ● ● ● ● ●●●● ●●●●●●●●●● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● , % of proteome ● ● ● ● ●● ●●●●●●●●●●● ●● ● ● ● ●●●● ●● ● ●● ●●● ● ● ●● ● ● ● ● ● ●●● ●●●● ●●● ●● ●●●●●● ● ● ● ● ● ●● ● ● ● ● ● ● ●●● ●● ● ●● ● ● ●● ●●● ●●●●●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ●●● ●●●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ●●●● ● ● ●●● ●●● ●●●●●● ●●●●● ●● ●●● ● ● ●● ● ● ● ● ● ● ● ● ●● ●●●● ● ●● ●●●● ● ●● ● ●●●● ●●●●● ● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ●●● ●●● ● ● ●●●● 0.1 ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ●●● ● ● ● ● ● ● ●●● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● , % of proteome ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ●●● ●●●● ● ● ● ●●●●●● ● ●● ●● ●● ● ● ● ● ● ● ● ●● ●●● ●●● ● ●●●● ● ● ●● ● ●● ● ● ● ●●●●●●●● ●●●● ●●● ● ● ●● ●● ● ●● ● ●●● ● ● ● ● ● ● ● ● ●●● ●●● ● ● ● ● ●●●● ●● ●●● ● ● ● ● ● ● ● ●●●● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●●●● ● ●● ●● ●●●●● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ●● ● ● ●● ● ●●● ● ● ●● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ●●● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ●● ●●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●●●●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● 25 ● ● ● ● ● ● ● ● ● ● per G.O ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● domain per PFAM ● ● ● ● ● ● ● ● Intensity (log2) ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● 0.001 ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.001 ● ● ● ● ● ● ● ● ● ● ● ● ● M. luminyensis ● ● ● ● ● rho = 0.8 rho = 0.68 ● rho = 0.7 ● ● ● ● M. luminyensis ● 20 ● ● P=0 P=0 P=0 ● ● ● ●● ●●●● ● ● ●● ● ●●●● ● ● ● ● ●● ● ●● ●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●● ● ●

B alvus Candidatus Methanomethylophilus 20 25 30 35 1e−04 0.01 1 0.001 0.1 10 Intensity (log2) M. alvus, % of proteome M. alvus, % of proteome Methanomassiliicoccus luminyensis B10 per GO Species tree (Gtdb) per PFAM domain H. salinarium 1 H. volcanii N. magadii 0.9 M. barkeri 0.8 A. fulgidus Correlation matrix (Spearman Rho) (Spearman A. profundus 0.7 F. placidus M. alvus 0.6 M. luminyensis T. acidophilum 0.5 T. volcanium P. torridus 0.4 M. thermautotrophicus 0.3 M. marburgensis M. jannaschii 0.2 M. kandleri T. kodakarensis 0.1 N. maritimus S. acidocaldarius 0 Best reciprocal blast hits PFAM domain content GO content

Jevtic et al. (2019) Cerlettii et al. (2018) Müller et al. (2020) this study Qin et al. (2018) Sas-Chen et al. (2020)

Figure S10. Comparing relative protein abundances across proteomes. (A) Correlation of relative protein abundances in Methanomassiliicoccus luminyensis versus Methanomethylophilus alvus when comparing reciprocal best BLAST hits (left panel) or proteins aggregated by Pfam domain content (middle panel) or by gene ontology class (right panel). (B) Visualization of pairwise correlation coefficients for the same comparisons across all 19 proteomes. The species tree is taken from GTDB, with P. torridus and H. salinarum added manually. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. A B C

A. profundus F. placidus rho =0.63 rho =0.85 ● rho =0.75 ● ● F. placidus P < 0.005 A. profundus 5 P<5e−06 ● P<0.0003 3

4 A. profundus 2 ● ● ● S. acidocaldarius ● F. placidus 2 M. thermautotrophicus A. fulgidus T. kodakarensis ● 3 ● T. kodakarensis ● A. fulgidus ● ● ● ● A. fulgidus M. thermautotrophicus S. acidocaldarius ● M. marburgensis ● T. volcanium M. kandleri ● M. kandleri P. torridus 2 ● ● ● 1 ● ●

NAPs (% of proteome) P. torridus M. marburgensis T. kodakarensis 1 T. volcanium M. jannaschii Alba (% of proteome) ● ● ● T. acidophilum M. kandleri 1 ● Histones (% of proteome) H. salinarium ● T. acidophilum ● M. jannaschii N. maritimus M. barkeri N. magadii M. thermautotrophicus ● ● ● ● H. volcanii N.● maritimus● H. volcanii N. magadii H. salinarium T. acidophilum S. acidocaldarius ● M. marburgensis ● ● ● ● ● ● ● ● ● ● ● ● 0 M. alvus M. luminyensis 0 M. barkeriM. alvusluminyensis H. volcanii M. jannaschii 0 P. torridus T. volcanium 40 60 80 100 40 60 80 100 40 60 80 100 Growth temperature (°C) Growth temperature (°C) Growth temperature (°C)

Figure S11. Relative abundance of individual NAPs as a function of optimal growth temperature. Strong correlations are evident not only for (A) NAPs considered as an aggregate class but also for (B) Alba and (C) histones considered individually. Note that Alba and histones are the only NAPs that are sufficiently widespread across our sample of archaea to allow meaningful correlations to be computed. bioRxiv preprint doi: https://doi.org/10.1101/2021.07.08.451601; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. A B

N. magadii ● ● A. profundus rho =−0.7 ● rho =−0.73 ● M. barkeri F. placidus P < 0.0012 70 P < 0.0005 ● H. volcanii ● T. acidophilum 4 T. kodakarensis ● M. luminyensis ● ● ● A. fulgidus T. volcanium H. salinarium 60 ● ● ● S. acidocaldarius N. maritimus T. volcanium ● ● S. acidocaldarius M. thermautotrophicus ● ● M. kandleri M. alvus ● ● P. torridus M. jannaschii M. luminyensis M. jannaschii 2 ● ● ● M. kandleri ● ● M. marburgensis T. acidophilum ● 50 T. kodakarensis ● P. torridus ● ● M. marburgensis N. maritimus H. volcanii ● ● M. barkeri ● A. fulgidus ● NAPs (% of measured proteome) H. salinarium ● M. thermautotrophicus F. placidus ● ● ● ● N. magadii Transcriptional Units / Total Genes (%) Units / Total Transcriptional A. profundus 0 M. alvus 40 60 80 100 50 60 70 Growth temperature (°C) Transcriptional Units / Genes (%)

Figure S12. The relationship between growth temperature and genome compactness. (A) The genomes of organisms growing at higher temperatures tend to be organized in such a manner that transcript units harbour more genes on average. There are therefore fewer promoters per gene in archaea that grow at higher temperature. (B) As predicted from the respective relationships with optimal growth temperature, genomes where genes are contained in more independent transcriptional units have a lower investment in NAPs.