bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Ancestral reconstructions decipher major adaptations of ammonia 2 oxidizing upon radiation into moderate terrestrial and 3 marine environments 4

5 Sophie S. Abby1,2,+, Melina Kerou1,+, Christa Schleper1,* 6 1 University of Vienna, Archaea Biology and Ecogenomics Division, Dep. of Ecogenomics 7 and Systems Biology, Althanstrasse 14, 1090 Vienna, Austria. 8 2 University Grenoble Alpes, CNRS, Grenoble INP, TIMC-IMAG, 38000 Grenoble, France. 9

10 + these authors contributed equally 11 * corresponding author: [email protected] 12 13 14 15 Abstract 16 Unlike all other archaeal lineages, ammonia oxidizing archaea (AOA) of the phylum 17 Thaumarchaea are widespread and abundant in all moderate and oxic environments on Earth. 18 The evolutionary adaptations that led to such unprecedented ecological success of a microbial 19 clade characterized by highly conserved energy and carbon metabolisms have, however, 20 remained underexplored. Here we reconstructed the genomic content and growth temperature 21 of the ancestor of all AOA as well as the ancestors of the marine and soil lineages based on 22 39 available complete or nearly complete genomes of AOA. Our evolutionary scenario depicts 23 an extremely thermophilic autotrophic, aerobic ancestor from which three independent 24 lineages of a marine and two terrestrial groups radiated into moderate environments. Their 25 emergence was paralleled by (I) a continuous acquisition of an extensive collection of stress 26 tolerance genes mostly involved in redox maintenance and oxygen detoxification, (II) an 27 expansion of regulatory capacities in transcription and central metabolic functions and (III) an 28 extended repertoire of cell appendages and modifications related to adherence and 29 interactions with the environment. Our analysis provides insights into the evolutionary 30 transitions and key processes that enabled the conquest of the diverse environments in which 31 contemporary AOA are found. 32

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

33 Introduction 34 35 Ammonia oxidizing Archaea of the phylum have a very broad ecological 36 distribution on Earth and are key players in the global Nitrogen cycle. They are generally found 37 as a stable part of the microbial community in studies of aerobic environments, including soils, 38 ocean waters, marine sediments, freshwater environments, hot springs, plants and animals,

39 including humans 1–3. This distribution is particularly impressive when considering that even 40 though the Archaea includes a large variety of energy and central carbon metabolisms

41 4, no other lineage is known so far to be represented this widely in oxic environments. Other 42 widespread functional guilds of archaea are rather confined to anoxic environments and 43 include methanogenic archaea (of the Classes I & II) and potentially also the so far 44 uncultivated archaea of the novel lineages Bathyarchaeota and Verstraetearchaeota, as well

45 as possibly some lineages of the and DPANN superphyla 5–9. 46 In contrast to methanogens, AOA form a monophyletic lineage now classified as

47 Nitrososphaeria within the phylum Thaumarchaeota 10. Based on environmental studies and

48 genomic data, they comprise thermophilic species growing around 70° 11–13, but most

49 cultivated organisms stem from ocean waters and soils 1,3,14,15. Phylogenetic and 50 phylogenomic studies have shown that the thermophilic group of Ca. Nitrosocaldales from hot

51 springs form a sister group to all AOA adapted to lower temperatures 11,13,16. The latter are 52 split into two major lineages, one encompassing the order Nitrososphaerales with cultivated

53 (and non-cultivated) representatives residing mostly in soils 17. The second major clade is 54 further divided into the orders Ca. Nitrosotaleales that are mostly found in acidic soils, as well

55 as the large group of Nitrosopumilales that are mostly found in marine environments 18,19. 56 Several cultivated species of AOA have greatly contributed to a better understanding of their 57 physiology and have revealed that AOA as opposed to their bacterial counterparts are adapted

58 to far lower ammonia concentrations20 which might explain their ecological success in so many 59 oligotrophic environments. However, a recent meta-analysis based on more than 30,000 60 amoA genes deposited in the public databases revealed that the cultivated strains and 61 enrichments represent only 7 out of 19 total AOA clades, indicating that the (eco-)physiological

62 potential of AOA could be far greater than assumed 14. Nevertheless, genomes of cultivated 63 species from diverse environments and a large number of fully or partially assembled (meta- 64 )genomes of AOA are now available and reveal a very well-conserved core of energy and

65 carbon metabolism 21,22, with ammonia oxidation catalysed by an ammonia monooxygenase

66 (AMO) and CO2 fixed via an extremely efficient version of the 3-hydroxypropionate/4-

67 hydroxybutyrate pathway 23. This finding confirms the general assumption that AOA play an 68 important role in the global nitrogen cycle, as ammonia oxidation represents the first step of 69 nitrification, which is an essential process to eventually enable the conversion of reactive

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

70 nitrogen species via denitrification into the inert gaseous compound N2, which is then released 71 to the atmosphere. Their broad occurrence, well-conserved central metabolism as well as their 72 monophyly render AOA an excellent model for elucidating the evolution and adaptations of a

73 microbial lineage that is very likely to have originated in hot springs 11,12,24–27 and from there

74 successfully radiated, probably between 1 and 2.1 billion years ago 28,29, into most if not all 75 moderate oxic ecosystems on Earth. 76 77 In this work we have investigated in a robust phylogenetic framework, 39 complete (or nearly 78 complete) genomes from cultivated or environmental strains of AOA that stem from ocean 79 waters, soils, estuarine, sediments, wastewater, a marine sponge and hot springs. This 80 enabled us to reconstruct the optimal growth temperatures and genome contents of the last 81 common ancestor of all ammonia oxidizing archaea, as well as the ancestors of major AOA 82 lineages to reveal the paths of adaptations that gave rise to radiations into oxic marine and 83 terrestrial environments, respectively.

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

84 Results 85 A molecular thermometer suggests a thermophilic ancestor for AOA and subsequent 86 parallel adaptations to mesophily 87 We compiled a 76 genomes dataset comprising publicly available complete genomes, and 88 high-quality metagenome-assembled or single-cell assembled genomes (see Materials and 89 Methods for selection criteria) from 39 AOA, 13 non-ammonia-oxidizing Thaumarchaeota, 8 90 Aigarchaeota, 5 Bathyarchaeota and 11 representative genomes of orders within 91 to serve as outgroups. We built a species tree from the concatenation of a 92 selection of 33 protein families that were found in at least 74 out of the 76 genomes to serve 93 as a backbone tree for our ancestral reconstructions of growth temperature and genome 94 content (see below) (see Fig. 1, Table S1 and Materials and Methods). 95 Our extended tree recovered the major clades of AOA as described earlier with a smaller

96 dataset 30, separating all AOA into four orders: Ca. Nitrosopumilales from mostly marine 97 environments and sediments and with rather smaller genomes (< 2 Mbp, see Fig. 2), Ca. 98 Nitrosotaleales from acidic soils, Nitrososphaerales with representatives exclusively from soils 99 and sediments exhibiting genomes of double the size than all other groups (around 3 Mbp) 100 and Ca. Nitrosocaldales with organisms exclusively from hot environments. (Fig. 1). 101 In order to approach the optimal growth temperatures (OGTs) of the ancestor of all AOA and 102 of ancestors of terrestrial and marine lineages, we used a “molecular thermometer” to perform 103 a linear regression that enabled to predict the OGTs from the predicted ancestral 16S rRNA

104 G+C stem compositions as in 31,32 (see Table S2, Fig. S1, Materials and Methods and 105 Supplementary Information). Not all genomes could be represented as 20 of them were 106 missing assigned 16S rRNA genes (Table S1). The OGTs of AOA decreased along time, from 107 around 76°C for the LAst Common Ancestor of AOA (LACAOA), to 67°C for the Common 108 Ancestor of the Mesophilic clades of AOA (CAMA), to 64°C and 52°C for the ancestors of 109 Nitrososphaerales, and Ca. Nitrosopumilales plus Ca. Nitrosotaleales (the ancestor of the 110 latter two hereafter called “CARA”, for Common Ancestor of Rod-shaped AOA including 111 marine and acid soil AOA), respectively (Fig. 2). It thus seems that the ancestors of the three 112 different major orders of AOA were all thermophilic, and parallel adaptations to lower 113 temperatures occurred along time in these lineages. Interestingly, the order 114 Nitrososphaerales, containing moderately thermophilic species like Ca. Nitrososphaera 115 gargensis (OGT = 46°C) and N. viennensis (OGT = 42°C), seemed to have preserved a 116 relatively high G+C content in their 16S rRNA, especially in the genus Nitrososphaera 117 (predicted ancestral OGT of 64 - 68°C), which might result from a recent or still ongoing 118 adaptation to lower temperatures from thermophilic ancestors. Based on this analysis, Ca. 119 Nitrosopumilales have most drastically adapted to lower temperatures, which is in line with 120 their adaptation to colder, marine environments. Since we infer an ancestor for all AOA with

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

121 an OGT around 76°C (73°C-78°C), we conclude that the ability to oxidize ammonia for energy 122 acquisition in Archaea was most probably acquired in thermophilic or even hyperthermophilic 123 organisms, as has been suggested since their discovery, and also through subsequent

124 findings of deeply branching thermophilic lineages within Thaumarchaeota and AOA 11–13,24–

125 27. 126 127 Genome dynamics and reconstruction of ancestral AOA genomes reveal distinct 128 evolutionary paths from thermophilic to moderate AOA lineages 129 In order to investigate the evolutionary path of metabolic and adaptive features of AOA, we 130 reconstructed the gene repertoire of LACAOA and of subsequent ancestors by analyzing the 131 distribution of families of homologous proteins from our 76 genomes dataset, and inferring 132 their gains and losses along the species tree with a birth-and-death model for gene family size

133 evolution implemented in the Count program 33 (Table S3). Our analysis enabled us to 134 reconstruct key ancestral states on the evolutionary path of AOA (Fig. 1, 2), such as that of 135 LACAOA, that of CAMA, the Common Ancestor of Mesophilic AOA (after the divergence of 136 Ca. Nitrosocaldales), that of CARA, the Common Ancestor of Rod-shaped AOA (the common 137 ancestor of Ca. Nitrosotaleales and the mostly marine Nitrosopumilales), and the ancestor of 138 Nitrososphaerales (encompassing mostly terrestrial lineages), as well as the ancestors of the 139 genera Ca. Nitrosocosmicus and Nitrososphaera. 140 We infer that at least 1265 protein families were present in LACAOA, the ancestor of AOA. 141 Along the transition from non-AOA to AOA, 289 families were inferred to be gained, and 267 142 lost (Fig. 1, 2 and Tables S3, S4). The huge metabolic transition provoked by the adoption of 143 an ammonia-oxidizing metabolism corresponded to a slightly increased gains’ rate for 144 LACAOA (i.e. number of gains events divided by branch length) when compared to other 145 Thaumarchaeota ancestors (Fig. S2, Wilcoxon signed rank test, p-value ~ 0.001), yet the 146 general gains’ rate trend was consistent with an overall continuous flux of genes along AOA 147 evolution rather than a massive influx of genes that would have marked the onset of this 148 lineage (Sup. Info., Fig. S2). 149 Despite the streamlined genomes of contemporary Nitrosopumilales (1.6 Mbp on average) 150 compared to Nitrososphaerales (3.0 Mbp on average), their ancestors had very similar 151 predicted family content (1534 and 1661, respectively) (Fig. 2 and Table S3). Therefore, most 152 of the differences in genome sizes can be attributed to the massive gains that occurred rather 153 continuously within Nitrososphaerales, with 312 gains by the ancestor of Nitrososphaerales 154 and after the split between Ca. Nitrosocosmicus and Nitrososphaera, respectively (595 and 155 256 gains in the ancestors of the Ca. Nitrosocosmicus clade and Nitrososphaera clade, 156 respectively). This parallel increase in genome sizes of the two ‘soil’ clades led to distinct 157 genomic repertoires and physiologies (see following sections). Our inferences of ancestral

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

158 genome sizes differed from those recently obtained in 24, resulting in apparent distinct genome 159 dynamics with larger ancestral genomes and more downstream losses in the latter. The Dollo 160 parsimony used in that study to infer gains and losses with Count was earlier deemed

161 inappropriate for studying prokaryotic genomes where lateral gene transfers are common 34. 162 Therefore, this difference in Count settings is likely to account for the differences observed 163 between the studies. Furthermore, the probabilistic setting we used, where gains and losses 164 rates were estimated for each family separately in this study, is likely to be more realistic. 165 In the following sections, we detail the evolutionary path followed by AOA towards adaptations 166 to moderate environments, starting by describing the ancestor of all AOA. In order to confirm 167 the evolutionary scenarios inferred by the Count program we used comprehensive 168 phylogenetic reconstructions for approximately 80 crucial families discussed in the rest of this 169 manuscript (see Table S5). This allowed us to correct - where needed - the evolutionary 170 scenarios inferred by the program Count, and in some cases to make assumptions on the 171 donating lineage (see Materials and Methods, Data S1 and Table S5). 172 173 The last common ancestor of AOA was a thermophilic, autotrophic aerobic organism 174 Metabolic acquisitions by LACAOA 175 As expected, LACAOA acquired the three genes for the ammonia monooxygenase complex

176 (amoABC) and the fourth candidate subunit (amoX) 35, the urease gene set and two urea

177 transporters (SSS and UT types)12,24,36 and expanded its set of pre-existing ammonia (Amt) 178 transporters (from one to two, as in extant AOA) (Fig. 3 and Table S4). Although the 179 evolutionary history of the archaeal AMO is not trivial to resolve, recent phylogenetic analyses 180 indicated that it is more closely related to actinobacterial hydrocarbon monooxygenases than

181 the bacterial AMO14. It should also be noted that none of the non-AOA Thaumarchaeal 182 genomes to date encode any genes related to ammonia oxidation or urea utilization. However, 183 one has to note that the pathway has not been fully elucidated yet. Meanwhile, a number of 184 cupredoxin-domain families were acquired, equipping it with the unique copper-reliant 185 biochemistry encountered in contemporary AOA. The heavy reliance on copper necessitated

186 the acquisition of copper uptake systems, such as the CopC/CopD family proteins 37. 187 Curiously, glutamate synthase (GOGAT), which in most Archaea is participating in the high 188 affinity ammonia assimilation pathway, is inferred to be lost at this stage, leaving the glutamate

189 dehydrogenase (usually exhibiting lower affinity 38) as the primary ammonia assimilation route 190 in contemporary AOA. This could be regarded as an adaptation to an energy and carbon-

191 limited lifestyle as less carbon and ATP are consumed during NH3 assimilation. It is also the

192 preferred strategy under these conditions in some bacteria 39. 193 Similarly, at least two genes for crucial steps in the 3-hydroxypropionate/4-hydroxybutyrate

194 (HP/HB) CO2 fixation pathway conserved in all AOA were acquired by LACAOA. These were

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

195 an ADP-forming 3-hydroxypropionyl-CoA synthetase (hpcs), responsible for the energy

196 efficiency of the cycle 23, possibly acquired from (Data S1), and methylmalonyl- 197 CoA epimerase (mce). A third component of the pathway, the ADP-forming 4-hydroxybutyryl- 198 CoA synthetase, was replaced at the stage of mesophilic AOA (i.e. in CAMA). LACAOA might

199 also have newly acquired the O2-tolerant enzyme 4-hydroxybutyryl-CoA dehydratase, (see

200 also Fig. S2 in 12). Overall, the carbon fixation pathway found in extant AOA seems to be the 201 result of the integration of mostly vertically inherited, and some more recently acquired 202 enzymes. 203 The key enzyme for production of polyhydroxyalcanoates (PHAs), PHA synthase, was also 204 acquired by LACAOA (Data S1), equipping AOA with the ability to produce carbon storage

205 compounds typically synthesized in response to unbalanced growth conditions 40. Additionally, 206 LACAOA acquired the families associated with the biosynthesis of the vitamin B cofactors

207 cobalamin (B12), biotin (B7) and riboflavin (B2) as pointed out in earlier studies 29,41. 208 209 Genes for aerobic autotrophic growth were already present in LACAOA. 210 The gene families inferred to be present already, and not gained by LACAOA (labeled light 211 and dark grey in Fig. 3), illustrate the genetic background of the archaeon which first acquired 212 the metabolism of ammonia oxidation (Tables S3, S4). The ancestor of LACAOA was capable

213 of using O2 as an electron acceptor as suggested by the presence of a heme-copper oxidase

214 (routinely used to infer an aerobic metabolism 24,42) and the absence of an alternative electron

215 acceptor. The inference of an aerobic ancestor stands in contrast to a recent study 29 that 216 concluded LACAOA arose from an anaerobic ancestor, based on the assumption that all 217 lineages basal to AOA were anaerobic (i.e. non-AOA Thaumarchaeota, Aigarchaeota and 218 Bathyarchaeota). However, most Aigarchaeota and at least two non-AOA Thaumarchaeota 219 (BS4 and pSL12) included in our dataset were predicted to be facultative aerobes – the latter 220 being recently further supported by a study reporting metagenomic-assembled genomes from

221 the pSL12 lineage with potential for aerobic respiration 24,43–46. In our ancestral 222 reconstructions, genes involved in anaerobic metabolisms were inferred as gains in the 223 respective lineages, such as the codH subunits in the ancestor of Bathyarchaeota, or nitrate 224 reductase and adenylylsulfate reductase families in Thaumarchaea BS4 and DS1. 225 Additionally, the presence of most enzyme families of the HP/HB carbon fixation pathway 226 (including the key enzymes acetyl-CoA/ propionyl-CoA carboxylase and 4-hydroxybutyryl- 227 CoA dehydratase) indicate the potential for autotrophic growth. They are present in all known 228 AOA and shared with a number of aerobic Crenarchaeota and Aigarchaeota, albeit with

229 distinct differences (see paragraph above and 23,47,48). 230 Concurrently, we observe a loss of gene families associated with heterotrophic growth, in 231 particular those involved in glycolysis and connecting pools of C3 and C4 compounds. Among

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

232 them are two phosphofructokinases, the 2-oxoacid dehydrogenase complex (OADHC), and

233 an (abcd)2-type 2-oxoacid:ferredoxin oxidoreductase (OFOR, while an (ab)2-type OFOR 234 exclusively found in aerobes is retained in LACAOA and present in all extant AOA (see 235 Supplementary Information and Data S1). Losses and gene families’ contractions (loss of 236 gene copies in an ancestrally multi-copy family) were also observed in families involved in 237 amino-acid catabolism such as the glycine cleavage system and amino-acid and sugar 238 transporter families (see Supplementary Information for details). 239 240 Overall, our analysis depicts an autotrophic aerobic ancestor, which accompanied the switch 241 to ammonia oxidation with an apparent minimization of the ability to grow heterotrophically. 242 Whether this ability was completely lost in the emerging AOA lineages remains an open 243 question, as many genomic studies have indicated the potential for mixotrophic growth and 244 many isolates benefit from the addition of organics (although this effect was later attributed to

245 ROS detoxification49). Nevertheless, assimilation of organic carbon has so far not been 246 demonstrated in AOA. 247 248 Radiations to moderate environments were paralleled by independent adaptations in 249 three major functional categories 250 In accordance with an inferred adaptation to lower temperatures, several traits involved in the 251 thermophilic lifestyle of LACAOA were lost in subsequent evolutionary steps: reverse gyrase,

252 the hallmark enzyme of organisms living at extremely high temperatures 50,51 and cyclic 2, 3-

253 diphosphoglycerate (cDPG, thermoprotectant 52) synthetase were lost in CAMA, while the 254 thermoprotectant mannosyl-3-phosphoglycerate synthase (MPGS) was lost in CARA (Tables

255 S3, S4, Fig. 4). In contrast, a homologue of the “cold-shock protein” CspA 53 was found to be 256 acquired from bacteria in an ancestor of Ca. Nitrosopumilales (to the exception of the soil- 257 residing genus Ca. Nitrosotenuis) (Fig. 4, Fig. S3, Table S3), corroborating the generally lower 258 OGT of the marine lineages, as inferred by our analysis and existing pure cultures. 259 When surveying the functions of families acquired and lost along the evolution of mesophilic 260 AOA, we observed that the majority of acquisitions in all examined ancestors, ranging from 261 18-34% of all functionally annotated families (and a few key losses), could be classified into 262 three major categories, discussed further below: i) adaptations to various kinds of abiotic 263 stress, in particular oxygen-related stress ii) specific metabolic regulations and general 264 increase of the regulatory potential, and iii) extension of capacities to engage in complex 265 interactions with the environment (Fig. 4, Fig. S3 Table S4). 266 267 1. Stress adaptations were crucial to colonize ‘moderate environments’ 268

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

269 Strategies against oxidative and nitrosative stress 270 Although LACAOA, being an aerobe, was already equipped with a set of proteins dedicated 271 to reactive oxygen species (ROS) detoxification, redox homeostasis and ROS damage repair 272 (Fig. 3 and Tables S3, S4), a number of additional genes involved in these processes were 273 frequently and continuously acquired along all lines of AOA evolution. Some of these specific 274 acquisitions also shaped in distinct ways the genomic repertoire of entire clades, determining 275 their ecological boundaries. 276 277 Thiol oxidoreductases of the thioredoxin (Trx) and disulfide-bond reductase proteins (Dsb)

278 family, involved in reducing oxidized cysteine pairs, 54 were continuously acquired throughout 279 all AOA lineages (Fig. 4, Fig. S3, Supplementary Information), while the methionine sulfoxide 280 reductase B (MsrB) family, responsible for the reduction of oxidized methionines, was 281 expanded only in Nitrososphaerales. Thioredoxin reductase and MsrA families were present 282 in LACAOA and inferred to be vertically inherited. Dsb and Msr family proteins may also be

283 involved in detoxification of reactive nitrogen species (RNS) such as NO 55,56, produced by

284 AOA (as in other ammonia oxidizing bacteria) as an intermediate of ammonia oxidation 57–59. 285 Interestingly, only the ancestor of Nitrososphaerales acquired a Mn-catalase, apparently from 286 Terrabacteria (as suggested by phylogenetic analyses, see Data S1), which was lost in N. 287 viennensis), whereas Ca. Nitrosopumilales, Ca. Nitrosocaldales and Ca. Nitrosotaleales all 288 lack this enzyme, considered to be a hallmark enzyme for aerobic metabolisms (Fig. 5, Fig. 289 S3). Although this absence remains puzzling, in Bacteria catalases are the primary

290 scavengers only at high H2O2 concentrations, while the activity of alkyl hydroperoxide 291 reductase (Ahp, an enzyme present in LACAOA and vertically inherited in all extant AOA) is

292 sufficient during logarithmic growth 60. Nevertheless, most AOA lacking catalase have been

293 shown to be dependent on, or stimulated by, the presence of an external H2O2 scavenger,

294 such as catalase, dimethylthiourea or -ketoacids 49,61, or were/are dependent on co-cultures

295 with bacteria 62,63. Stress reduction is a known factor shaping microbial communities64, and 296 these interdependencies often are the reason behind the low success rates in isolating and 297 characterizing novel species, as is the case with AOA. 298 299 The role of low molecular weight thiols (LMWT) in intracellular redox control and oxidative

300 stress defense is undisputed 65. While the most well-known LMWT, glutathione, has not been 301 detected in Archaea, they have nevertheless been shown to accumulate a variety of analogs

302 such as thiosulfate and Coenzyme A 66. The key genes (egtB and egtD) for the synthesis of 303 the LMTW ergothioneine (EGT), a histidine betaine derivative with a thiol group synthesized 304 by fungi and bacteria, were gained in LACAOA. In other archaeal phyla they are only found in

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

305 a few representatives of and . A donor could not be

306 identified, but the origin of the genes possibly lies within Actinobacteria 67. 307 308 Another notable acquisition by the ancestor of Nitrososphaerales is the protein family 309 encoding for a gamma-glutamylcysteine ligase (gshA), the evolutionary origin of which could 310 not be unambiguously traced. This enzyme catalyzes the formation of gamma- 311 glutamylcysteine from glutamate and cysteine, typically a precursor of glutathione

312 biosynthesis in Bacteria and Eukaryotes 68. Accumulation of this compound was shown in

313 69,70, making Nitrososphaerales the second archaeal group to follow this strategy. 314 The second enzyme responsible for glutathione biosynthesis, gshB is missing in both 315 Haloarchaea and Thaumarchaeota, while candidates for a putative bis- gamma- 316 glutamylcystine reductase can be found among the pyridine nucleotide-disulfide reductase 317 family homologs present in all AOA. 318 319 Iron, copper and redox homeostasis 320 Systems that control the levels of unincorporated iron and copper in the cell are upregulated

321 upon oxidative stress in Archaea and Bacteria 71 (Supplementary Information). Both Fe2+ and

322 Cu(I) can react with H2O2 and generate hydroxyl radicals (HO•) via the Fenton and Haber-

323 Weiss reactions respectively 72, which can subsequently cause severe cellular damage, 324 including DNA lesions. Dps family proteins (DNA-binding proteins from starved cells), known 325 archaeal antioxidants consuming both substrates of the Fenton reaction and physically

326 shielding the DNA 71,73, were acquired by LACAOA and expanded in Nitrososphaerales. Dps 327 and ferritin-like superfamily proteins were also gained as part of the parallel expansion of the 328 soil lineages in the Ca. Nitrosocosmicus and Nitrososphaera ancestors (Fig. 4, Fig. S3). Cu(I)

329 can also cause the displacement of iron in enzymatic Fe-S clusters 74. Interestingly, a Cys-

330 rich four-helix bundle Copper storage protein (Csp), recently identified in methanotrophs 75, 331 was acquired in the ancestor of Nitrososphaerales, probably from Bacteria. 332 333 Enzymes involved in maintaining redox homeostasis, an essential function to prevent or 334 alleviate oxidative stress, were gained throughout the evolution of AOA. In particular, a 335 ferredoxin:NADP(H) oxidoreductase (FNR) family was gained by LACAOA and expanded in 336 CAMA, while WrbA flavodoxins were respectively acquired by CARA, the ancestor of Ca. 337 Nitrosopumilales, the ancestor of Nitrososphaerales and the subsequent terrestrial lineages 338 (Table S4, Fig. 4, Fig. S3, Supplementary Information). 339 340 Frequent exchanges of pathways involved in desiccation, protein stability and osmotic stress 341 relief occurred during the colonization of osmotically variable environments.

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

342 343 LACAOA acquired the key enzyme mannosyl-3-phospholgycerate synthase (MPGS) (Data 344 S1), responsible for the first of the two-step synthesis pathway of the compatible solute 345 mannosylglycerate (MG), whose role beyond osmotic stress involves maintenance of protein

346 structure and activity during freezing, desiccation, ROS detox, and thermal denaturation 76,77. 347 Interestingly, the pathway for MG synthesis was lost in CARA. The emerging, mostly marine 348 lineages instead acquired the ability to synthesize or take up ectoine/hydroxyectoine (gained

349 by the ancestor of the genus 78) and proline/glycine-betaine (gained in CARA) 350 during the adaptation to osmotically variable (e.g. estuarine) and high salinity (e.g. open 351 ocean) environments (see Supplementary Information, Fig. 4, Fig. S3). Additional small 352 conductance mechanosensitive channel families (MscS) enabling the rapid outflux of solutes

353 in response to excessive turgor caused by hypo-osmotic shock 79 were acquired in CARA and 354 subsequent lineages, while the large conductance MscL was lost in marine lineages adapted

355 to high salinity, as is frequently the case in marine-dwelling Bacteria 79 (Table S4). 356 357 LEA14-like protein families containing a conserved WHy (water stress and hypersensitive

358 response) domain 80 were acquired in LACAOA, in CARA, and repeatedly along AOA's 359 diversification, ending up with multiple copies in extant genomes (Fig. 4, Fig. S3, Tables S3, 360 S4, Data S1), thus equipping AOA with proteins that act as “molecular shields”, preventing 361 denaturation and inactivation of cellular components under conditions of osmotic imbalance. 362 363 Gain of common and rare DNA repair systems in mesophilic AOA 364 Oxidative DNA lesions are hypothesized to be the primary source of DNA damage in oxic 365 habitats and LACAOA was already equipped with the basic components of most DNA repair 366 systems. Interestingly, the bacterial-type UvrABC system common in mesophilic and

367 thermophilic Bacteria 81, but not found in the hyperthermophilic AOA ancestors or in non-AOA 368 Thaumarchaeota, was acquired by CAMA, possibly from bacteria or by multiple transfers from 369 Bacteria first to other archaeal clades (methanogens, halophiles, TACK) and then to AOA 370 (Data S1). Genetic exchanges continued to shape the DNA repair arsenal of the different 371 thaumarchaeal lineages (see Fig. 4, Fig. S3 and Supplementary Information). In particular, an 372 impressive inflow of DNA repair families took place into the ancestor of Nitrososphaerales with 373 the acquisition of a complete non-homologous end joining (NHEJ) complex from Bacteria 374 (Data S1) while a UvdE endonuclease family involved in the alternate UV damage repair 375 pathway (UVDR) was acquired in Ca. Nitrosocosmicus. The former is extremely rare in

376 Archaea 82, with a full complex described so far only from Methanocella paludicola 83. 377 378 2. Expansion of transcriptional and redox-based metabolic regulation

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

379 380 An intriguing enzyme family exchange took place in CAMA, the last common ancestor of all 381 mesophilic AOA, in which the AcnA/IRP type aconitase, present in Ca. Nitrosocaldus 382 genomes, other Archaea, the eukaryotic cytosol and most Bacteria, was exchanged for a 383 mitochondrial-like aconitase (mAcn) present in eukaryotic mitochondria, Bacteroides (as

384 previously observed in 84), and only a few other lineages of Bacteria (Spirochaetes, 385 Fibrobacteres, Deltaproteobacteria and Nitrospirae; Fig. 5 and Data S1). The key to this 386 exchange may lie in the differential sensitivity and recovery rate of these Acn types to oxidative 387 inactivation of their Fe-S clusters. While the AcnA/IRP group aconitases are more resistant to 388 oxidative damage and are implicated in oxidative stress responses, their repair is slower as it

389 might require complete cluster disassembly 85. On the other hand, the eukaryotic mAcn and 390 bacterial AcnB aconitases, although distantly related, are easily inactivated by ROS, but can

391 be readily reactivated when iron homeostasis is restored 85,86. This makes them the enzymes 392 of choice for aerobic respiration in Eukaryotes and some Bacteria as it provides a

393 sophisticated redox regulation mechanism of the central carbon metabolism 87–89. 394 Interestingly, two non-AOA Thaumarchaeota genome bins from oceans, likely to be from 395 mesophilic organisms as well, also harbour this mitochondrial-like version of the aconitase, 396 while a replacement has taken place in the moderate thermophile N. gargensis (Fig. 5). To 397 our knowledge, this is the first observation of a mitochondrial-type aconitase in Archaea and 398 can be interpreted as a clearly beneficial, perhaps crucial adaptation to an increasingly oxic 399 environment, although this remains to be experimentally validated. 400 401 The two basic transcription factors of Archaea (TFB and TBP) which are homologs of the 402 eukaryotic TBP and TFIID are found in extended numbers in AOA (as noted earlier for smaller

403 datasets 90,91)(Fig. S3). Our ancestral reconstructions, supported by phylogenetic trees, 404 indicate that LACAOA already encoded multiple TFB copies (probably around three), and 405 extant mesophilic AOA genomes encode 4-11 copies. Interestingly, TBP expansions occurred 406 in CARA with extant genomes encoding 2-5 copies (Table S3, Fig. 4, S3). These expansions 407 enabled multiple potential combinations of TFBs-TBPs, resulting in the emergence of global 408 regulatory networks and rapid physiological adaptation in changing environments, as noted

409 earlier for halophilic Archaea and AOA 92–94. 410 411 3. Interactions with the environment: extracellular structures, cell wall 412 modifications 413

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

414 A variety of enzymes involved in acetamidosugar biosynthesis, EPS production, cell envelope 415 biogenesis and adhesion were gained by every ancestor we reconstructed, with an apparent 416 enhancement of the process in Nitrososphaerales (Fig. 4, Tables S3, S4), reflecting their

417 demonstrated ability to form biofilms 21,95. Formation of single or multi-species biofilms is an

418 understudied but very successful ecological adaptation in Archaea (reviewed in 96), as these

419 structures not only offer protection against environmental stress and nutrient limitation 97,98, 420 but can also provide favorable conditions for direct nutrient or electron exchange that facilitate 421 biogeochemical cycling. 422 Cellular structures play crucial roles in biofilm formation, motility, and in mediating various 423 forms of environmental and cell-cell interactions. Our analysis indicates that at least four 424 different types of archaeal Type IV pili (T4P) have been gained along AOA diversification (Fig. 425 6 and Supplementary Information), and were sometimes subsequently lost, leading to a 426 complex distribution pattern (Fig. S4). The AOA ancestor already possessed at least one T4P, 427 as indicated by the families inferred to be present and acquired (T4P biosynthesis and

428 chemotaxis genes). AOA possess the archaeal flagellum (“archaellum”) 99, a pilus of unknown 429 function related to Ups and Bas pili (involved in cell-cell contact and DNA repair and sugar

430 metabolism respectively in ) 100,101, as well as two other adhesion-like pili that

431 could play a role in biofilm formation or biotic interactions (Fig. 6) 12,102. Interestingly, Ca. 432 Nitrosocosmicus spp. did not harbor any T4P, unlike the soil-associated sister-group 433 Nitrososphaera, suggesting a different adaptive strategy to terrestrial environments, as 434 pointed out by a number of differential gains/losses in the two lineages (see above). The 435 precise functions of these four pili remain to be elucidated but their abundance and diversity 436 indicate frequent and varied environmental interactions and exchanges across AOA. Such 437 level of diversity in the pili repertoire of AOA even exceeds the diversity observed in 438 Sulfolobales, and is only paralleled by the diversity we find here in Bathyarchaeota (Fig. 6). 439 440 General Discussion and Conclusion 441 442 Our analysis highlights the many obstacles that AOA might have endured in the course of their 443 diversification and adaptation from hot to moderate temperature environments. 444 Overall, a continuous acquisition of crucial components, many of which are implicated in 445 coping with stress derived from oxygen exposure, seem to have been the key to the ecological 446 success in the different lineages of AOA. We assume that the higher solubility of oxygen at 447 lower temperatures (which is about 4-fold when shifting from 80° to 20°C), together with a 448 metabolic optimization resulting in higher production of ROS and possibly reactive nitrogen 449 species through ammonia oxidation, might have necessitated these extensive adaptations. In 450 summary, we observe a higher proportion of families implicated in oxygen tolerance compared

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

451 to thermophilic sister lineages in our dataset (Aigarchaeota, Crenarchaeota and some non- 452 AOA Thaumarchaeota) (Fig. S3, Table S4). In addition, moderate habitats are discontinuous 453 and unstable in terms of nutrient availability, hydration content and radiation exposure, in 454 drastically different ways than a hot spring. This is reflected in regulatory adaptations in non- 455 thermophilic AOA, including the acquisition of the regulatable mitochondrial-type aconitase in 456 the common mesophilic ancestor (CAMA), that is found in every mesophilic AOA to date, as 457 well as a considerable expansion of basic transcription factors in all AOA lineages. Thirdly, 458 AOA had to cope increasingly with competition at lower temperatures resulting from higher 459 overall microbial diversity and abundance, which might have triggered their diversification of 460 the outer cell wall and the acquisition of different kinds of type IV pili in most lineages. 461 The specific differences in adaptations along the evolutionary lineages of marine AOA (Ca. 462 Nitrosopumilales) and the two divergently evolved terrestrial lineages (Ca. Nitrosocosmicus) 463 and Nitrososphaera) reflect their current major ecological distribution. While the ancestors of 464 Ca. Nitrosopumilales acquired strategies to cope with higher osmotic pressures and lower 465 temperatures in the ocean, the ancestor of the terrestrial group (Nitrososphaerales) acquired 466 a range of families to cope with oxygen, fluctuating nutrient availability and other stressors as 467 well as DNA damage, while it diversified its cell wall modifications and EPS forming capacities. 468 The path into moderate environments of AOA, looks similar to that of halophilic Archaea, 469 where continuous acquisitions of bacterial genes (to Haloarchaea) seem to have contributed 470 to a metabolic switch from anaerobic methanogenic autotrophs to heterotrophy and aerobic

471 respiration, as well as to osmotic adaptations 93,103–105. These gradual adaptations contradict

472 the earlier idea of massive lateral gene transfers at the onset of major archaeal clades 93,103–

473 105(see also below). Further, we see clear parallels between halophilic Archaea and AOA with 474 respect to their stress adaptations, especially regarding oxidative stress (this study, and

475 93,103,106) and both lineages also have expanded sets of basic transcription factors indicating a

476 possibility to build complex regulatory networks 94. 477 Besides Thaumarchaeota and Halobacteria, Ca. Posidoniales (formerly Marine Group II

478 Archaea) and Class II Methanogens () thrive in oxic habitats 42,107,108. 479 Although strictly anaerobic, the latter show particular adaptations in the methanogenesis 480 pathway that result in decreased ROS production together with an enrichment in protein

481 families involved in ROS detoxification and DNA/protein repair 108, i.e. similar to adaptations 482 outlined here for AOA, but to a smaller extent. 483 While we agree that oxygen’s exposure was a driving evolutionary pressure, we here depict 484 a more complex scenario for adaptation of AOA to moderate environments than described in

485 a recent study 29. We show that adaptation to higher levels of oxygen occurred in addition to 486 adaptations to various types of stress, and that these gradually occurred involving both

487 innovations, but also pre-existing ancestral sets of genes. In contrast to Ren et al. 29 we infer

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

488 an already aerobic ancestor for all AOA which is in agreement with sister lineages to AOA 489 being at least facultative aerobes (Bathy-, Aig- and non-AOA Thaumarchaeota). This is also 490 in line with a more recent diversification of AOA than suggested by the use of molecular clock

491 in Ren et al., 2019 29. The diversification of mesophilic AOA (CAMA) has indeed been 492 estimated to be not older than 950 My based on the acquisition of the fused version of DnaJ-

493 Ferredoxin genes from the ancestor of Viridiplantae 109, an acquisition which we could confirm 494 with our comprehensive genomic dataset (DnaJ-Fd on Fig. 4). LACAOA, and at least major 495 mesophilic lineages of AOA might thus have diversified well after the Great Oxidation Event 496 2.3 Gy ago. 497 Although the phylogenetic trees we obtained could support Count scenarios and clarify at what 498 point a given family was acquired, in many cases they could not help to identify the donor

499 lineage. This is consistent with another recent study 29, and this inability to precisely identify 500 donors persists in spite of the huge increase in taxon sampling since the first studies of lateral

501 gene transfers in Thaumarchaeota 25,110,111. It may not only be caused by a limited taxon 502 sampling, but could also result from shifts in evolutionary rates of the acquired families, and 503 the resulting difficulties in solving the gene phylogenies. It is, however, clear that a mix of pre- 504 adaptations (vertical inheritance), and transfers of both bacterial and archaeal genes 505 altogether fell into place to allow the successful radiation of ammonia oxidizing archaea into 506 so many different environments. The complexity of this “genetic cocktail” could explain why 507 the ecological success as seen for AOA appears unique in the domain Archaea. 508 509 Materials & methods 510 511 Genome dataset 512 We gathered complete genomes of AOA, and enriched the dataset with complete or 513 metagenome-/single-cell- assembled genomes of high quality. We used CheckM (version

514 1.0.7) 112 to assess the completeness and contamination of the genomes gathered 515 (“lineage_wf” parameter), and used a >80% completeness, <5% contamination threshold to 516 include MAGs or SAGs. Yet these criteria were sometimes relaxed in order to broaden the 517 phylogenetic diversity included (see Table S1). In total, we selected 76 genomes for the 518 analysis, including 39 AOA, 13 non-AOA Thaumarchaeota, 8 Aigarchaeota, 5 Bathyarchaeota 519 and 11 Crenarchaeota (Table S1). When no annotations were available for the selected

520 genomes, the Prokka suite (v1.12)113 was used to annotate genes and proteins. 521 522 Protein families construction

15 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

523 Protein families were built for the set of genomes selected. A Blast “all sequences against all”

524 was performed, and the sequences representing hits with an e-value below 10-4 were used as 525 input for the Silix (v1.2.11) and Hifix (v1.0.6) programs to cluster the sets of similar sequences

526 into families 114,115. Two sequences were clustered in a same Silix family when they shared at 527 least 35% of identity and their blast alignment covered at least 70% of the two sequences 528 lengths. The Hifix program was then used on the Silix families for refinement. We obtained 529 37517 families, of which 12367 had more than one sequence. 530 531 Phylogenomic tree inference

532 Universal archaeal families described in 116 were annotated in our genome dataset using

533 HMMER (v3.1b2) 117 as described in 12. The best hit was selected in each genome, and a 534 subset of 33 informational protein families that were present in 74 out of 76 genomes of our 535 dataset was selected. The sequences from each family were extracted and aligned using

536 MAFFT (linsi, v7.313) 118, and the alignments filtered using BMGE (BLOSUM30, v1.12) 119. 537 The filtered alignments were then concatenated, and a maximum-likelihood tree was inferred 538 with IQ-Tree (v1.6.11) using the LG+C60+F model of sequence evolution, and 1000 Ultrafast

539 bootstraps were performed 120. 540 The 33 families were the following: PF00177 Ribosomal protein S7p/S5e, 541 PF00189 Ribosomal protein S3, PF00203 Ribosomal protein S19, PF00237 Ribosomal 542 protein L22p/L17e, PF00238 Ribosomal protein L14p/L23e, PF00252 Ribosomal protein 543 L16p/L10e, PF00276 Ribosomal protein L23, PF00347 Ribosomal protein L6, PF00366 544 Ribosomal protein S17, PF00380 Ribosomal protein S9/S16, PF00410 Ribosomal protein S8, 545 PF00411 Ribosomal protein S11, PF00416 Ribosomal protein S13/S18, PF00466 Ribosomal 546 protein L10, PF00572 Ribosomal protein L13, PF00573 Ribosomal protein L4/L1 family, 547 PF00673 ribosomal L5P family, PF00687 Ribosomal protein L1p/L10e family, PF00831 548 Ribosomal L29 protein, PF01090 Ribosomal protein S19e, PF01157 Ribosomal protein L21e 549 PF01200 Ribosomal protein S28e, PF01201 Ribosomal protein S8e, PF01280 Ribosomal 550 protein L19e, PF01667 Ribosomal protein S27, PF01000 RNA polymerase Rpb3/RpoA insert 551 domain, PF01191 RNA polymerase Rpb5, C-terminal domain, PF01192 RNA polymerase 552 Rpb6, PF01912 eIF-6 family, PF09173 Initiation factor eIF2 gamma, PF03439 Early 553 transcription elongation factor of RNA pol II, PF04406 Type IIB DNA topoisomerase, PF11987 554 Translation-initiation factor 2. 555 556 Inference of ancestral Optimal Growth Temperature

557 A dataset of 16S rRNA sequences and corresponding OGT was extracted from 121 and from 558 the literature (see Table S2). The stem position predictions were obtained from the RNAStrand 559 database (version 2.0) for all the sequences in our dataset that were available (20 rRNA

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

560 molecules). The 16S rRNA sequences that were available (56 out of 76 genomes, see Table

561 S1) were all aligned together with RNAStrand sequences with SSU-ALIGN (v0.1.1) 122, and 562 the consensus of the positions found in stems were conservatively selected to be part of stem 563 regions (878 positions). The GC% of the predicted stems was computed for each sequence. 564 A linear regression between the stem GC% and the OGT was computed. A non-homogeneous

565 model was used in bppml (BppSuite, v2.4.1 123) to estimate the evolutionary model of the 16S

566 rRNA sequences along the reference tree, as in 31. The estimated parameters were then used 567 in the program BppAncestor to reconstruct 100 replicates of ancestral sequences along the 568 reference tree. The OGT of the ancestor of extant AOA was inferred using the linear 569 regression coefficients and the bootstrap replicates to estimate a confidence interval, as

570 described in 31. 571 572 Count analysis and ancestral genomes reconstruction 573 A matrix of occurrences of each protein family in extant genomes was created for all the

574 genomes under analysis. This matrix was used as an input for the Count program (v10.04) 33 575 along with the reference phylogeny to compute 1) the rates of gains and losses along the tree 576 independently for each family given a phylogenetic birth-and-death model (gain-loss- 577 duplication model, default parameters), and 2) the posterior probabilities for each family to 578 have been gained and lost given the inferred rates, in each branch of the tree. (v3.5.1) and 579 Python (v2.7) scripts were used to analyse and extract the resulting data. Evolutionary events 580 inferred on each branch of the reference tree (family gains, losses, expansions: number of 581 family members increase, contractions: number of family members decrease) were selected 582 when their probability was above a fixed threshold of 0.5. This threshold was also used to 583 define the presence and multi-copy state of each family within each node. The sets of families 584 inferred to be present at a given node constituted the ancestral genome of the branches 585 emerging from this node. We therefore could extract the sets of evolutionary events (gains, 586 losses, expansions and contractions) before a lineage diversification, together with the 587 families inferred to have been present in the ancestor of the lineage (Table S3). 588 589 Annotation of protein families and quantification of functional categories 590 Aside from the annotations provided with the genomes, we obtained annotations using PFAM

591 (version 28.0 PFAM-A, 'cut-tc' option, best i-evalue match selected with HMMER) 124,

592 TIGRFAM (version 15, 'cut-tc' option, best i-evalue match) 125 using the corresponding HMM

593 protein profiles and the HMMER program version 3.1b2 117. arCOGs (2014 version) 126 and

594 COGs (2014 version) 127 were assigned using the COGnitor scripts, with an e-value threshold

595 set to 10-10. The MacSyFinder program (v1.0.5) 128 was also used to annotate type IV pili in

596 genomes, using models and HMM profiles described earlier 12. Manual inspection of the

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

597 protein families resulted in grouping into custom functional categories, which were then 598 quantified in extant genomes as in Table S4. 599 600 Phylogenetic analyses of gained genes 601 We first gathered a representative dataset of 281 genomes from Bacteria, Archaea and 602 Eukaryotes (see Table S1). This dataset, composed of complete genomes (Refseq database, 603 NCBI), was selected to cover a vast range of organismal diversity, and was enriched to cover 604 the diversity of organisms from the TACK super-phylum by adding 68 highly complete 605 metagenomic bins (Table S1). 606 For protein families of interest, we retrieved all the sequences classed as part of the family in 607 our 76 genomes dataset, and used them to query the representative dataset of genomes using

608 Blast (v2.6.0+) 129. The first 250 sequences corresponding to the best-score hits with an e-

609 value lower than 10-10 were selected. The corresponding sequences were extracted,

610 dereplicated with Uclust ('-cluster_fast', 100% identity level, Usearch v10.0.240)130 , and 611 aligned to the sequences from the original protein family using the MAFFT program ('linsi'

612 algorithm, version 7.397)118. The alignments were then filtered using the BMGE program

613 (BLOSUM30)119. The filtered alignments were used to build phylogenetic trees by maximum- 614 likelihood (IQ-tree, “-m TESTNEW”, best evolutionary model selected by Bayesian Information

615 Criterion, 1000 UF-Bootstrap and aLRT replicates)120. Upon manual inspection of the trees, a 616 few sequences were excluded during protein family phylogenies’ reconstructions, when the 617 corresponding branches had a length in the tree > 1 substitutions/site. In those few cases, 618 alignments and trees were re-constructed without the corresponding sequences. All resulting 619 trees are provided in Data S1, and a list of the generated trees can be found in Table S5. 620 621 Generation of figures

622 Drawings of trees and genes were generated with iTOl (v4) 131 and GeneSpy (v1.1) 132 623 respectively. 624 625 Data availability 626 The genomes analyzed in this study are publicly available on NCBI or IMG/JGI, and all 627 accession numbers are given in Table S1. All data generated in this study are provided in the 628 Supplementary Tables 2-5 and generated phylogenetic trees in Supplementary Data S1. 629 630 Acknowledgements: 631 This project was supported by ERC Advanced Grant TACKLE (695192) and project P27017 632 of the Austrian Science Fund (FWF). SSA was funded by a “Marie-Curie Action” fellowship, 633 grant number THAUMECOPHYL 701981. We are thankful to Thomas Rattei and Florian

18 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

634 Goldenberg for providing access to the CUBE servers (U. Vienna), to Céline Brochier-Armanet 635 for early discussions on the project, and to Simon Rittmann and Lisa-Maria Mauerhofer for 636 providing optimal growth temperatures of methanogens. 637 638 Author contributions: 639 SSA and CS conceived the study. SSA performed the bioinformatic and 640 phylogenetic/phylogenomic analyses. MK analysed the reconstructed genomes. All authors 641 analysed and discussed results and wrote the manuscript.

19 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

642 References: 643 644 1. Schleper, C. & Nicol, G. W. Ammonia-oxidising archaea--physiology, ecology and 645 evolution. Adv Microb Physiol 57, 1–41 (2010). 646 2. Stahl, D. A. & de la Torre, J. R. Physiology and diversity of ammonia-oxidizing 647 archaea. Annu. Rev. Microbiol. 66, 83–101 (2012). 648 3. Hatzenpichler, R. Diversity, Physiology, and Niche Differentiation of Ammonia- 649 Oxidizing Archaea. Appl. Environ. Microbiol. 78, 7501–7510 (2012). 650 4. Offre, P., Spang, A. & Schleper, C. Archaea in biogeochemical cycles. Annu. Rev. 651 Microbiol. 67, 437–457 (2013). 652 5. Manoharan, L. et al. Metagenomes from Coastal Marine Sediments Give Insights into 653 the Ecological Role and Cellular Features of Loki- and . MBio 10, 654 (2019). 655 6. Dombrowski, N., Lee, J.-H., Williams, T. A., Offre, P. & Spang, A. Genomic diversity, 656 lifestyles and evolutionary origins of DPANN archaea. FEMS Microbiol. Lett. (2019). 657 doi:10.1093/femsle/fnz008 658 7. Kubo, K. et al. Archaea of the Miscellaneous Crenarchaeotal Group are abundant, 659 diverse and widespread in marine sediments. ISME J. 6, 1949–65 (2012). 660 8. Pan, J. et al. Genomic and transcriptomic evidence of light-sensing, porphyrin 661 biosynthesis, Calvin-Benson-Bassham cycle, and urea production in Bathyarchaeota. 662 Microbiome 8, 43 (2020). 663 9. Borrel, G. et al. Wide diversity of methane and short-chain alkane metabolisms in 664 uncultured archaea. Nat. Microbiol. 1 (2019). doi:10.1038/s41564-019-0363-3 665 10. Kerou, M., Alves, R. J. E. & Schleper, C. Class Nitrososphaeria. Bergey’s Man. Syst. 666 Archaea Bact. 2015–2016 (2016). doi:10.1002/9781118960608.fbm00265. 667 11. de la Torre, J. R., Walker, C. B., Ingalls, A. E., Könneke, M. & Stahl, D. A. Cultivation 668 of a thermophilic ammonia oxidizing archaeon synthesizing crenarchaeol. Environ. 669 Microbiol. 10, 810–818 (2008). 670 12. Abby, S. S. et al. Candidatus Nitrosocaldus cavascurensis, an ammonia oxidizing, 671 extremely thermophilic archaeon with a highly mobile genome. Front. Microbiol. 9, 672 (2018). 673 13. Daebeler, A. et al. Cultivation and Genomic Analysis of “Candidatus Nitrosocaldus 674 islandicus,” an Obligately Thermophilic, Ammonia-Oxidizing Thaumarchaeon from a 675 Hot Spring Biofilm in Graendalur Valley, Iceland. Front. Microbiol. 9, (2018). 676 14. Alves, R. J. E., Minh, B. Q., Urich, T., von Haeseler, A. & Schleper, C. Unifying the 677 global phylogeny and environmental distribution of ammonia-oxidising archaea based 678 on amoA genes. Nat. Commun. 9, 1517 (2018).

20 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

679 15. Stahl, D. A. & de la Torre, J. R. Physiology and diversity of ammonia-oxidizing 680 archaea. Annu Rev Microbiol 66, 83–101 (2012). 681 16. Abby, S. S. et al. Candidatus Nitrosocaldus cavascurensis, an Ammonia Oxidizing, 682 Extremely Thermophilic Archaeon with a Highly Mobile Genome. Front Microbiol 9, 28 683 (2018). 684 17. Kerou, M., Alves, R. J. E. & Schleper, C. Nitrososphaerales . in Bergey’s Manual of 685 Systematics of Archaea and Bacteria (2018). doi:10.1002/9781118960608.obm00124 686 18. Prosser, J. I. & Nicol, G. W. Candidatus Nitrosotaleales . in Bergey’s Manual of 687 Systematics of Archaea and Bacteria (2016). doi:10.1002/9781118960608.obm00123 688 19. Qin, W., Martens-Habbena, W., Kobelt, J. N. & Stahl, D. A. Candidatus 689 Nitrosopumilales . in Bergey’s Manual of Systematics of Archaea and Bacteria (2016). 690 doi:10.1002/9781118960608.obm00122 691 20. Kits, K. D. et al. Kinetic analysis of a complete nitrifier reveals an oligotrophic lifestyle. 692 Nature 549, pages 269–272 (2017). 693 21. Kerou, M. et al. Proteomics and comparative genomics of Nitrososphaera viennensis 694 reveal the core genome and adaptations of archaeal ammonia oxidizers. Proc. Natl. 695 Acad. Sci. U. S. A. 113, E7937–E7946 (2016). 696 22. Herbold, C. W. et al. Ammonia-oxidising archaea living at low pH: Insights from 697 comparative genomics. Environ. Microbiol. 19, 4939–4952 (2017). 698 23. Könneke, M. et al. Ammonia-oxidizing archaea use the most energy-efficient aerobic 699 pathway for CO2 fixation. Proc. Natl. Acad. Sci. U. S. A. 111, 8239–8244 (2014). 700 24. Hua, Z.-S. et al. Genomic inference of the metabolism and evolution of the archaeal 701 phylum Aigarchaeota. Nat. Commun. 9, 2832 (2018). 702 25. Lopez-Garcia, P., Brochier, C., Moreira, D. & Rodriguez-Valera, F. Comparative 703 analysis of a genome fragment of an uncultivated mesopelagic crenarchaeote reveals 704 multiple horizontal gene transfers. Env. Microbiol 6, 19–34 (2004). 705 26. Brochier-Armanet, C., Boussau, B., Gribaldo, S. & Forterre, P. Mesophilic 706 Crenarchaeota: proposal for a third archaeal phylum, the Thaumarchaeota. Nat Rev 707 Microbiol 6, 245–252 (2008). 708 27. Eme, L. et al. Metagenomics of Kamchatkan hot spring filaments reveal two new 709 major (hyper)thermophilic lineages related to Thaumarchaeota. Res. Microbiol. 164, 710 425–38 (2013). 711 28. Petitjean, C., Moreira, D., López-García, P. & Brochier-Armanet, C. Horizontal gene 712 transfer of a chloroplast DnaJ-Fer protein to Thaumarchaeota and the evolutionary 713 history of the DnaK chaperone system in Archaea. BMC Evol. Biol. 12, 226 (2012). 714 29. Ren, M. et al. Phylogenomics suggests oxygen availability as a driving force in 715 Thaumarchaeota evolution. ISME J. 13, 2150–2161 (2019).

21 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

716 30. Kerou, M., Eloy Alves, R. J. & Schleper, C. Nitrososphaeria . in Bergey’s Manual of 717 Systematics of Archaea and Bacteria (2016). doi:10.1002/9781118960608.cbm00055 718 31. Groussin, M. & Gouy, M. Adaptation to environmental temperature is a major 719 determinant of molecular evolutionary rates in archaea. Mol Biol Evol 28, 2661–2674 720 (2011). 721 32. Galtier, N. & Lobry, J. R. Relationships between genomic G+C content, RNA 722 secondary structures, and optimal growth temperature in . J Mol Evol 44, 723 632–636 (1997). 724 33. Csuros, M. Count: evolutionary analysis of phylogenetic profiles with parsimony and 725 likelihood. 26, 1910–1912 (2010). 726 34. Rogozin, I. B., Wolf, Y. I., Babenko, V. N. & Koonin, E. V. Dollo parsimony and the 727 reconstruction of genome evolution. in Parsimony, Phylogeny, and Genomics (2007). 728 doi:10.1093/acprof:oso/9780199297306.003.0011 729 35. Bartossek, R., Spang, A., Weidler, G., Lanzen, A. & Schleper, C. Metagenomic 730 Analysis of Ammonia-Oxidizing Archaea Affiliated with the Soil Group. Front. 731 Microbiol. 3, 1–14 (2012). 732 36. Carini, P., Dupont, C. L. & Santoro, A. E. Patterns of thaumarchaeal gene expression 733 in culture and diverse marine environments. Environ. Microbiol. (2018). 734 doi:10.1111/1462-2920.14107 735 37. Lawton, T. J., Kenney, G. E., Hurley, J. D. & Rosenzweig, A. C. The CopC Family: 736 Structural and Bioinformatic Insights into a Diverse Group of Periplasmic Copper 737 Binding Proteins. Biochemistry 55, 2278–2290 (2016). 738 38. Reitzer, L. Nitrogen assimilation and global regulation in Escherichia coli. Annu. Rev. 739 Microbiol. 57, 155–176 (2003). 740 39. van Heeswijk, W. C., Westerhoff, H. V. & Boogerd, F. C. Nitrogen Assimilation in 741 Escherichia coli: Putting Molecular Data into a Systems Perspective. Microbiol. Mol. 742 Biol. Rev. 77, 628–695 (2013). 743 40. Schlegel, H. G., Gottschalk, G. & von Bartha, R. Formation and utilization of poly- 744 beta-hydroxybutyric acid by Knallgas bacteria (Hydrogenomonas). Nature 191, 463–5 745 (1961). 746 41. Doxey, A. C., Kurtz, D. A., Lynch, M. D., Sauder, L. A. & Neufeld, J. D. Aquatic 747 metagenomes implicate Thaumarchaeota in global cobalamin production. ISME J 9, 748 461–71 (2015). 749 42. Adam, P. S., Borrel, G., Brochier-Armanet, C. & Gribaldo, S. The growing tree of 750 Archaea: new perspectives on their diversity, evolution and ecology. ISME J. 1–19 751 (2017). doi:10.1038/ismej.2017.122 752 43. Beam, J. P. et al. Ecophysiology of an uncultivated lineage of Aigarchaeota from an

22 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

753 oxic, hot spring filamentous ‘streamer’ community. ISME J. 10, 210–224 (2016). 754 44. Takami, H., Arai, W., Takemoto, K., Uchiyama, I. & Taniguchi, T. Functional 755 classification of uncultured ‘Candidatus caldiarchaeum subterraneum’ using the maple 756 system. PLoS One 10, 1–18 (2015). 757 45. Aylward, F. O. & Santoro, A. E. Heterotrophic Thaumarchaea with Small Genomes 758 Are Widespread in the Dark Ocean. mSystems 5, (2020). 759 46. Reji, L. & Francis, C. A. Metagenome-assembled genomes reveal unique metabolic 760 adaptations of a basal marine Thaumarchaeota lineage. ISME J. 1–11 (2020). 761 doi:10.1038/s41396-020-0675-6 762 47. Berg, I. A. et al. Autotrophic carbon fixation in archaea. Nat. Rev. Microbiol. 8, 447– 763 460 (2010). 764 48. Beam, J. P., Jay, Z. J., Kozubal, M. A. & Inskeep, W. P. Niche specialization of novel 765 Thaumarchaeota to oxic and hypoxic acidic geothermal springs of Yellowstone 766 National Park. ISME J. 8, 938–51 (2014). 767 49. Kim, J.-G. et al. Hydrogen peroxide detoxification is a key mechanism for growth of 768 ammonia-oxidizing archaea. Proc. Natl. Acad. Sci. 113, 7888–93 (2016). 769 50. Bouthier de la Tour, C. et al. Reverse gyrase, a hallmark of the hyperthermophilic 770 archaebacteria. J Bacteriol 172, 6803–6808 (1990). 771 51. Forterre, P. A hot story from comparative genomics: reverse gyrase is the only 772 hyperthermophile-specific protein. Trends Genet. 18, 236–237 (2002). 773 52. Shima, S., Herault, D. A., Berkessel, A. & Thauer, R. K. Activation and 774 thermostabilization effects of cyclic 2, 3-diphosphoglycerate on enzymes from the 775 hyperthermophilic Methanopyrus kandleri. Arch Microbiol 170, 469–472 (1998). 776 53. López-García, P., Zivanovic, Y., Deschamps, P. & Moreira, D. Bacterial gene import 777 and mesophilic adaptation in archaea. Nat. Rev. Microbiol. 13, 447–56 (2015). 778 54. Imlay, J. A. The molecular mechanisms and physiological consequences of oxidative 779 stress: lessons from a model bacterium. Nat. Rev. Microbiol. 11, 443–54 (2013). 780 55. Lee, W. L. et al. Mycobacterium tuberculosis expresses methionine sulfoxide 781 reductases A and B that protect from killing by nitrite and hypochlorite. Mol Microbiol 782 71, 583–593 (2009). 783 56. Sliskovic, I., Raturi, A. & Mutus, B. Characterization of the S-denitrosation activity of 784 protein disulfide isomerase. J. Biol. Chem. (2005). doi:10.1074/jbc.M408080200 785 57. Stein, L. Y. & Klotz, M. G. The nitrogen cycle. Curr. Biol. 26, R94–R98 (2016). 786 58. Kozlowski, J. A., Stieglmeier, M., Schleper, C., Klotz, M. G. & Stein, L. Y. Pathways 787 and key intermediates required for obligate aerobic ammonia-dependent 788 chemolithotrophy in bacteria and Thaumarchaeota. ISME J. 10, 1836–45 (2016). 789 59. Caranto, J. D. & Lancaster, K. M. Nitric oxide is an obligate bacterial nitrification

23 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

790 intermediate produced by hydroxylamine oxidoreductase. Proc Natl Acad Sci U S A 791 114, 8217–8222 (2017). 792 60. Seaver, L. C., Imlay, J. A. & Loewen, P. Alkyl Hydroperoxide Reductase Is the 793 Primary Scavenger of Endogenous Hydrogen Peroxide in Escherichia coli. J Bacteriol 794 183, 7173–7181 (2001). 795 61. Qin, W. et al. Marine ammonia-oxidizing archaeal isolates display obligate mixotrophy 796 and wide ecotypic variation. Proc Natl Acad Sci U S A 111, 12504–12509 (2014). 797 62. Tourna, M. et al. Nitrososphaera viennensis, an ammonia oxidizing archaeon from 798 soil. Proc. Natl. Acad. Sci. 108, 8420–8425 (2011). 799 63. Bayer, B. et al. Proteomic Response of Three Marine Ammonia-Oxidizing Archaea to 800 Hydrogen Peroxide and Their Metabolic Interactions with a Heterotrophic 801 Alphaproteobacterium. mSystems 4, e00181-19 (2019). 802 64. Morris, J. J., Johnson, Z. I., Szul, M. J., Keller, M. & Zinser, E. R. Dependence of the 803 Cyanobacterium Prochlorococcus on Hydrogen Peroxide Scavenging Microbes for 804 Growth at the Ocean’s Surface. PLoS One 6, e16805 (2011). 805 65. Fahey, R. C. Glutathione analogs in prokaryotes. Biochimica et Biophysica Acta - 806 General Subjects (2013). doi:10.1016/j.bbagen.2012.10.006 807 66. Rawat, M. & Maupin-furlow, J. A. Redox and Thiols in Archaea. 1–20 (2020). 808 67. Jones, G. W., Doyle, S. & Fitzpatrick, D. A. The evolutionary history of the genes 809 involved in the biosynthesis of the antioxidant ergothioneine. Gene 549, 161–170 810 (2014). 811 68. Copley, S. D. & Dhillon, J. K. Lateral gene transfer and parallel evolution in the history 812 of glutathione biosynthesis genes. Genome Biol. 3, research0025.1-research0025.16 813 (2002). 814 69. Malki, L. et al. Identification and characterization of gshA, a gene encoding the 815 glutamate-cysteine ligase in the halophilic archaeon Haloferax volcanii. J. Bacteriol. 816 191, 5196–5204 (2009). 817 70. Sundquist, A. R. & Fahey, R. C. The function of gamma-glutamylcysteine and bis- 818 gamma-glutamylcystine reductase in Halobacterium halobium. J. Biol. Chem. 264, 819 719–725 (1989). 820 71. Wiedenheft, B. et al. An archaeal antioxidant: Characterization of a Dps-like protein 821 from Sulfolobus solfataricus. Proc. Natl. Acad. Sci. 102, 10551–10556 (2005). 822 72. Doerrer, L. H. Cu in biology: Unleashed by O2and now irreplaceable. Inorganica 823 Chim. Acta 481, 4–24 (2018). 824 73. Calhoun, L. N. & Kwon, Y. M. Structure, function and regulation of the DNA-binding 825 protein Dps and its role in acid and oxidative stress resistance in Escherichia coli: A 826 review. J. Appl. Microbiol. 110, 375–386 (2011).

24 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

827 74. Macomber, L. & Imlay, J. A. The iron-sulfur clusters of dehydratases are primary 828 intracellular targets of copper toxicity. Proc. Natl. Acad. Sci. 106, 8344–8349 (2009). 829 75. Vita, N. et al. A four-helix bundle stores copper for methane oxidation. Nature 525, 830 140–143 (2015). 831 76. Empadinhas, N. & da Costa, M. S. Diversity, biological roles and biosynthetic 832 pathways for sugar-glycerate containing compatible solutes in bacteria and archaea. 833 Environ. Microbiol. 13, 2056–2077 (2011). 834 77. Borges, N. et al. Mannosylglycerate: structural analysis of biosynthesis and 835 evolutionary history. Extremophiles 18, 835–852 (2014). 836 78. Widderich, N. et al. Strangers in the archaeal world: osmostress-responsive 837 biosynthesis of ectoine and hydroxyectoine by the marine thaumarchaeon 838 Nitrosopumilus maritimus. Environ. Microbiol. (2015). 839 79. Booth, I. R., Miller, S., Müller, A. & Lehtovirta-Morley, L. The evolution of bacterial 840 mechanosensitive channels. Cell Calcium 57, 140–150 (2015). 841 80. Mertens, J., Aliyu, H. & Cowan, D. A. LEA Proteins and the Evolution of the WHy 842 Domain. Appl. Environ. Microbiol. 84, 1–8 (2018). 843 81. Morita, R. et al. Molecular Mechanisms of the Whole DNA Repair System: A 844 Comparison of Bacterial and Eukaryotic Systems. J. Nucleic Acids 2010, 1–32 (2010). 845 82. Aravind, L. & Koonin, E. Prokaryotic Homologs of the Eukaryotic DNA-End-Binding 846 Protein Ku, Novel Domains in the Ku Protein and Prediction of a Prokaryotic Double- 847 Strand Break Repair System. Genome Res. 11, 1365–1374 (2002). 848 83. Bartlett, E. J., Brissett, N. C. & Doherty, A. J. Ribonucleolytic resection is required for 849 repair of strand displaced nonhomologous end-joining intermediates. Proc. Natl. 850 Acad. Sci. 110, E1984–E1991 (2013). 851 84. Baughn, A. D. & Malamy, M. H. A mitochondrial-like aconitase in the bacterium 852 Bacteroides fragilis: Implications for the evolution of the mitochondrial Krebs cycle. 853 Proc. Natl. Acad. Sci. 99, 4662–4667 (2002). 854 85. Varghese, S., Tang, Y. & Imlay, J. A. Contrasting sensitivities of Escherichia coli 855 aconitases A and B to oxidation and iron depletion. J. Bacteriol. 185, 221–230 (2003). 856 86. Gardner, P. R. Aconitase: Sensitive target and measure of superoxide. Methods 857 Enzymol. 349, 9–23 (2002). 858 87. Walden, W. E. From bacteria to mitochondria: Aconitase yields surprises. Proc. Natl. 859 Acad. Sci. 99, 4138–4140 (2002). 860 88. Tang, Y., Quail, M. A., Artymiuk, P. J., Guest, J. R. & Green, J. Escherichia coli 861 aconitases and oxidative stress: post-transcriptional regulation of sodA expression. 862 Microbiology 148, 1027–1037 (2002). 863 89. Scandroglio, F., Tórtora, V., Radi, R. & Castro, L. Metabolic control analysis of

25 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

864 mitochondrial aconitase: Influence over respiration and mitochondrial superoxide and 865 hydrogen peroxide production. Free Radic. Res. 48, 684–693 (2014). 866 90. Walker, C. B. et al. Nitrosopumilus maritimus genome reveals unique mechanisms for 867 nitrification and autotrophy in globally distributed marine crenarchaea. Proc Natl Acad 868 Sci U S A 107, 8818–8823 (2010). 869 91. Spang, A. et al. The genome of the ammonia-oxidizing Candidatus Nitrososphaera 870 gargensis: insights into metabolic versatility and environmental adaptations. Environ. 871 Microbiol. 14, 3122–3145 (2012). 872 92. Turkarslan, S. et al. Niche adaptation by expansion and reprogramming of general 873 transcription factors. Mol. Syst. Biol. 7, 554–554 (2011). 874 93. Becker, E. A. et al. Phylogenetically Driven Sequencing of Extremely Halophilic 875 Archaea Reveals Strategies for Static and Dynamic Osmo-response. PLoS Genet. 10, 876 (2014). 877 94. Martinez-Pastor, M., Tonner, P. D., Darnell, C. L. & Schmid, A. K. Transcriptional 878 Regulation in Archaea: From Individual Genes to Global Regulatory Networks. Annu. 879 Rev. Genet. 51, 143–170 (2017). 880 95. Jung, M.-Y. et al. A hydrophobic ammonia-oxidizing archaeon of the Nitrosocosmicus 881 clade isolated from coal tar-contaminated sediment. Environ. Microbiol. Rep. 11, 882 E7937–E7946 (2016). 883 96. van Wolferen, M., Orell, A. & Albers, S.-V. Archaeal biofilm formation. Nat. Rev. 884 Microbiol. 1 (2018). doi:10.1038/s41579-018-0058-4 885 97. Dang, H. & Lovell, C. R. Microbial Surface Colonization and Biofilm Development in 886 Marine Environments. Microbiol. Mol. Biol. Rev. 80, 91–138 (2016). 887 98. Gambino, M. & Cappitelli, F. Mini-review: Biofilm responses to oxidative stress. 888 Biofouling 32, 167–178 (2016). 889 99. Albers, S.-V. & Jarrell, K. F. The Archaellum: An Update on the Unique Archaeal 890 Motility Structure. Trends Microbiol. 0, (2018). 891 100. Ajon, M. et al. UV-inducible DNA exchange in hyperthermophilic archaea mediated by 892 type IV pili. Mol Microbiol 82, 807–817 (2011). 893 101. Zolghadr, B., Klingl, A., Rachel, R., Driessen, A. & Albers, S. The bindosome is a 894 structural component of the Sulfolobus solfataricus cell envelope. Extremophiles 15, 895 235–244 (2011). 896 102. Makarova, K. S., Koonin, E. V & Albers, S. V. Diversity and Evolution of Type IV pili 897 Systems in Archaea. Front Microbiol 7, 667 (2016). 898 103. Nelson-Sathi, S. et al. Acquisition of 1,000 eubacterial genes physiologically 899 transformed a methanogen at the origin of Haloarchaea. Proc. Natl. Acad. Sci. U. S. 900 A. 109, 20537–42 (2012).

26 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

901 104. Nelson-Sathi, S. et al. Origins of major archaeal clades correspond to gene 902 acquisitions from bacteria. Nature 1–15 (2014). 903 105. Groussin, M. et al. Gene Acquisitions from Bacteria at the Origins of Major Archaeal 904 Clades Are Vastly Overestimated. Mol Biol Evol 33, 305–310 (2016). 905 106. Méheust, R. et al. Hundreds of novel composite genes and chimeric genes with 906 bacterial origins contributed to haloarchaeal evolution. Genome Biol. 19, 1–12 (2018). 907 107. Rinke, C. et al. A phylogenomic and ecological analysis of the globally abundant 908 Marine Group II archaea ( Ca . Poseidoniales ord . nov .). ISME J. 663–675 (2019). 909 doi:10.1038/s41396-018-0282-y 910 108. Lyu, Z. & Lu, Y. Metabolic shift at the class level sheds light on adaptation of 911 methanogens to oxidative environments. ISME J. 12, 411–423 (2018). 912 109. Petitjean, C., Moreira, D., López-García, P. & Brochier-Armanet, C. Horizontal gene 913 transfer of a chloroplast DnaJ-Fer protein to Thaumarchaeota and the evolutionary 914 history of the DnaK chaperone system in Archaea. BMC Evol. Biol. 12, 226 (2012). 915 110. Brochier-Armanet, C. et al. Complete-fosmid and fosmid-end sequences reveal 916 frequent horizontal gene transfers in marine uncultured planktonic archaea. ISME J 5, 917 1291–1302 (2011). 918 111. Deschamps, P., Zivanovic, Y., Moreira, D., Rodriguez-Valera, F. & López-García, P. 919 Pangenome evidence for extensive interdomain horizontal transfer affecting lineage 920 core and shell genes in uncultured planktonic thaumarchaeota and euryarchaeota. 921 Genome Biol. Evol. 6, 1549–63 (2014). 922 112. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: 923 assessing the quality of microbial genomes recovered from isolates, single cells, and 924 metagenomes. Genome Res. 25, 1043–1055 (2015). 925 113. Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068– 926 2069 (2014). 927 114. Miele, V., Penel, S. & Duret, L. Ultra-fast sequence clustering from similarity networks 928 with SiLiX. BMC Bioinformatics 12, 116 (2011). 929 115. Miele, V. et al. High-quality sequence clustering guided by network topology and 930 multiple alignment likelihood. Bioinformatics 28, 1078–1085 (2012). 931 116. Rinke, C. et al. Insights into the phylogeny and coding potential of microbial dark 932 matter. Nature 499, 431–437 (2013). 933 117. Eddy, S. R. Accelerated Profile HMM Searches. PLoS Comput. Biol. 7, e1002195 934 (2011). 935 118. Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid 936 multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30, 937 3059–3066 (2002).

27 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

938 119. Criscuolo, A. & Gribaldo, S. BMGE (Block Mapping and Gathering with Entropy): a 939 new for selection of phylogenetic informative regions from multiple sequence 940 alignments. BMC Evol Biol 10, 210 (2010). 941 120. Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and 942 effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol 943 Evol 32, 268–274 (2015). 944 121. Eme, L. et al. Metagenomics of Kamchatkan hot spring filaments reveal two new 945 major (hyper)thermophilic lineages related to Thaumarchaeota. Res Microbiol 164, 946 425–438 (2013). 947 122. Nawrocki, E. P. Structural RNA Homology Search and Alignment using Covariance 948 Models. Ph.D. thesis (2009). 949 123. Gueguen, L. et al. Bio++: efficient extensible libraries and tools for computational 950 molecular evolution. Mol Biol Evol 30, 1745–1750 (2013). 951 124. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res 42, D222-30 952 (2014). 953 125. Haft, D. H., Selengut, J. D. & White, O. The TIGRFAMs database of protein families. 954 Nucleic Acids Res 31, 371–373 (2003). 955 126. Makarova, K. S., Wolf, Y. I. & Koonin, E. V. Archaeal Clusters of Orthologous Genes 956 (arCOGs): An Update and Application for Analysis of Shared Features between 957 , , and . Life (Basel, 958 Switzerland) 5, 818–840 (2015). 959 127. Galperin, M. Y., Makarova, K. S., Wolf, Y. I. & Koonin, E. V. Expanded microbial 960 genome coverage and improved protein family annotation in the COG database. 961 Nucleic Acids Res. 43, D261-9 (2015). 962 128. Abby, S. S., Neron, B., Menager, H., Touchon, M. & Rocha, E. P. MacSyFinder: a 963 program to mine genomes for molecular systems with an application to CRISPR-Cas 964 systems. PLoS One 9, e110726 (2014). 965 129. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein 966 database search programs. Nucleic Acids Res 25, 3389–3402 (1997). 967 130. Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. 968 Bioinformatics 26, 2460–2461 (2010). 969 131. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL): an online tool for phylogenetic 970 tree display and annotation. Bioinformatics 23, 127–128 (2007). 971 132. Garcia, P. S., Jauffrit, F., Grangeasse, C. & Brochier-Armanet, C. GeneSpy, a user- 972 friendly and flexible genomic context visualizer. Bioinformatics (2019). 973 doi:10.1093/bioinformatics/bty459 974 133. Lam-Tung Nguyen, H. A. S. A. von H. B. Q. M. IQ-TREE: A Fast and Effective

28 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

975 Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol. Biol. Evol. 976 32, 268 (2015). 977 134. Saier Jr., M. H., Reddy, V. S., Tamang, D. G. & Vastermark, A. The transporter 978 classification database. Nucleic Acids Res 42, D251-8 (2014). 979 980 981 982 983 984 985 986 987 988 989

29 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

990 Figures

30 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 1. Phylogenetic tree (maximum likelihood) of the 76 genomes analysed, from the concatenation of 33 conserved protein families that were present in 74 out of our 76 genomes dataset (see Table S1). The Crenarchaeota outgroup (11 genomes) is not displayed here, and Bathyarchaeota (5 genomes) and Aigarchaeota (8 genomes) are collapsed. Ammonia oxidizing archaea (AOA) lineages and non-AOA Thaumarchaeota are represented by different colored shades, and their isolation sources are displayed with colored boxes. The tree was built with IQ-tree, using the LG+C60+F model. Supports at nodes are Ultrafast bootstrap

supports 133.

31 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 2. Reconstruction of optimal growth temperatures (OGT) and genome repertoires of AOA ancestors. Numbers of families inferred as having a probability higher than 0.5 to be present, gained or lost are displayed in circles at each node symbolizing the corresponding ancestral genomes and their dynamics (see main text). Ancestral OGT are displayed at the corresponding nodes (top number) together with their confidence interval (number range below). As there was no 16S rRNA sequences available for the J079 genomic bin, we could not infer OGT for the ancestor of Nitrosocaldales. The color of the bubbles represents the transition from hot (red) to colder (blue) growth temperature. For each candidate order of AOA, the number of protein families found on average in their respective genomes is indicated, as well as the average genome size (along with standard deviations). LACAOA stands for “Last Common Ancestor of AOA”, CAMA for “Common Ancestor of Mesophilic AOA” and CARA for “Common Ancestor of Rod-shaped AOA”.

32 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

NH3 NH2OH Urea + 2+ Na / + NH + + K+ Cu 3+ 2+ 2- H 4 H solute Fe Fe HPO4 e- Amo NDH M+ - P- III IV V Amt Amt SSS FeCT VIT PhoT PiT UT SSS ABCX (I) PPase ATPases

2+ 2+ ATP NH3 Fe / Mn ZIP Urea aa/ polyamines APC - 2+ O2 Me CDF NO 2+ 2+ Urease Mn / Zn MZT 3-hydroxypropionate / H2O2 peptides / Ni2+ PepT 4-hydroxybutyrate pathway ROS detox & repair: CO2 taurine / vit / S* TauT Sod, Ahp,TrxA/B, DsbD, NH3 HO• Co2+ CbtAB CO2 DsbA, Dps, Csd 2x Acetyl- Cu2+ CopD CoA MsrA/B, Grx Na+ : anion2- DASS Malonyl-CoA TrxA, Dps, Fnr, Csd, IscS sugars Mtn3 Acetoacyl-CoA DNA repair: Te Malonate HR, NER, BER, OsmC, EgtB/D TerC semialdehyde PolY, PolX MFS PHB Me2+ synthesis 3-hydroxybutyryl-CoA MIT 3-hydroxypropionate S* TSUP Thermoprotection: Cofactors HPCS Ars ArsAB Crotonyl-CoA Polyamines F420 MOCO MCP 3-hydroxypropionyl-CoA cDPG ions, MscS ? reverse gyrase Vitamin synthesis: osmolytes MscL 4-hydroxybutyryl-CoA B1, B5, B6, B9 Acrolyl-CoA Cl- ClC H+ Non-oxidative Pentose B12, B7 (biotin), B2 Phosphate pathway aa RhtB H+ 4-hydroxybutyrate Propionyl-CoA K+ CPA2 2- Osmoprotection HPO4 / C-PO(OH)2 PhnT CO2 Gluconeogenesis mngA LEA14 glycerol MIP Succinate Methylmalonyl- 2+ semialdehyde CoA Me Nramp Fla drugs DrugE1 MCE Succinyl-CoA TCA cycle Lipid biosynthesis mevalonate pathway drugs MacB antibiotics DHA2

gains (core) gains (not core) vertically inherited (core) vertically inherited (not core)

Figure 3. Reconstruction of main metabolic and physiological features and transport capabilities of the LAst Common ancestor of AOA (LACAOA). Metabolic modules inferred to be vertically inherited are in light gray, if present in all currently known extant AOA, or light blue if lost in certain lineages (not core). Newly acquired capacities that became core components for all extant AOA are depicted in blue, or in turquoise if they were more scattered throughout extant AOA genomes (not core). Gray lines indicate reactions for which a candidate enzyme is not identified. Due to the inherent difficulty in annotation and substrate prediction, transporters were considered as part of the core genome if found in all ancestors examined in this study, or if a functionally equivalent family exchange took place in the ancestors, but their distribution in extant genomes was not taken into account. Gradient boxes are used in the case of multiple families belonging to different categories. Transporters

are named according to the TCDB classification (Table S4 134), with indicative substrates and directionality of transport. Question mark indicates an unclear family history, see text for details. Abbreviations (in addition to the main text): S*, organo-sulfur

compounds/sulfite/sulfate; Me2+, divalent cations; aa, aminoacids; AcnA, AcnA/IRP group aconitase; MCP, methyl-accepting chemotaxis proteins; Fla, archaeal flagellum (or “archaellum”).

33 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 4. Summary of crucial gene gains and losses in ancestors of AOA and subsequently evolved lineages. The evolutionary events (gains/expansions in black, losses/contractions in grey) related to each stress category are displayed next to the corresponding symbol. Ancestral cells inferred to have been flagellated harbor a schematic flagellum and chemotaxis receptors. For discussion of functions see main text and Table S4 & Supplementary Information). Abbreviations: acnA, AcnA/IRP type aconitase; mAcn, mitochondrial-like aconitase; ahp, alkyl hydroperoxide reductase; ca, carbonic anhydrase; cat, catalase; cDPG, cyclic 2, 3-diphosphoglycerate synthetase; cofG/H, FO synthase subunits 1/2; copZ, copper binding protein; copC/D, copper resistance family proteins; csd/iscS, cysteine desulfurase families; csp, four-helix bundle copper storage protein; dps, DNA protection during starvation family protein; dsbA/D, disulfide bond oxidoreductase A/D; dyp, dyp-type peroxidase; EPS, extracellular polymeric substances synthesis families; fqr, F420H(2)-dependent quinone reductase; fno, F420H2:NADP oxidoreductase; fnr, flavodoxin reductase (ferredoxin-NADPH reductase); fprA/norV, flavorubredoxin; ftn, ferritin-like protein; gloA/B, glyoxalase I/II;

34 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

GOGAT, glutamate synthase; grx, glutaredoxin; gshA, glutamate-cysteine ligase; gsv, gas- vesicle formation genes; herA, HR helicase; hjc, holliday-junction resolvase; lea14-like, LEA14-like desiccation related protein; llht, luciferase-like hydride transferase; mpgs, mannosyl-3-phosphoglycerate synthase; mscL/S, large- and small-conductance mechanosensitive channel respectively; msrA/B, methionine sulfoxide reductase A/B; mut-L, putative DNA mismatch repair enzyme MutL-like; NHEJ, non-homologous end joining; nurA, HR nuclease; OFOR, 2-oxoacid:ferredoxin oxidoreductase; OsmC, peroxiredoxin; P450, cytochrome p450-domain protein; phaC/E, poly(R)-hydroxyalkanoic acid synthase subunits C/E; phr, photolyase; ppi, peptidylprolyl isomerase; polY/X, DNA repair polymerases Y & X; ProP (proline/glycine-betaine):(H+/Na+) symporter; radA/B, DNA repair and recombination proteins; rgy, reverse gyrase; tbp, TATA-box binding protein; tfb, transcription initiation factor TFIIB homolog; trxA, thioredoxin; trxB, thioredoxin reductase; wrbA, multimeric flavodoxin WrbA; dnaJ/dnaK/grpE/dnaJ-ferredoxin, chaperones; uspA, universal stress protein family A; uvrABC/uvdE, ultraviolet radiation repair excinuclease ABC system/UV damage endonuclease uvdE; xpb/bax1, NER helicase/nuclease pair; xpd, NER helicase.

35 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 5. A. Phylogenetic tree of the catalase family. A maximum-likelihood phylogenetic tree was obtained for FAM002068. AOA sequences are shown in green, bacterial ones are in black. Archaeal clades are indicated in bold font. B. Phylogenetic tree of the Aconitase family. A maximum-likelihood phylogenetic tree was obtained for FAM001301. AOA sequences are shown in green, and eukaryotic ones in blue. Grey triangles correspond to collapsed groups of archaeal and bacterial aconitases. In both panels, branches with UF-Boot support above 95% are indicated by a red circle.

36 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.28.176255; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 6. Phylogenetic tree of the Type IV pili ATPase family and genetic structure of AOA pili. A maximum-likelihood phylogenetic tree was obtained for the T4P ATPase in AOA (FAM001536) and representative genomes of other Archaea. Four distinct types of T4P are found in AOA (green branches). Clades other than AOA and non-AOA Thaumarchaeota (blue branches) were collapsed, and colored by . Annotations are based on the position of experimentally validated pili (archaeal flagella/archaella, Aap, Ups and Bas pili), otherwise, the generic term “pilus” was used. The genetic architecture of the different types of T4P found in AOA is displayed in green boxes in front of their corresponding position in the ATPase tree. Branches with UF-Boot support above 95% are indicated by a red circle. See also Fig. S4 for the precise genomic distribution within AOA of these different pili.

37