
bioRxiv preprint doi: https://doi.org/10.1101/2021.07.07.451387; this version posted July 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Phylogenomics Suggests the Origin of Obligately Anaerobic Anammox Bacteria Correlates with the Great Oxidation Event

Tianhua Liao^, Sishuo Wang^, Haiwei Luo*

Simon F. S. Li Marine Science Laboratory, School of Life Sciences and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR

^These authors contributed equally to this work.

*Corresponding author:
Haiwei Luo
The Chinese University of Hong Kong
Shatin, Hong Kong SAR
Phone: (+852) 39436121
E-mail: [email protected]

Running title: The origin time and phylogeny of anammox bacteria

Keywords: anammox bacteria, molecular dating analysis,


Abstract

Anaerobic ammonium oxidation (anammox) bacteria transform ammonium and nitrite to dinitrogen gas, and this obligately anaerobic process accounts for nearly half of global nitrogen loss. Yet its origin and evolution, which may give important insights into the biogeochemistry of early Earth, remain enigmatic. Here, we compile a comprehensive sequence data set of anammox bacteria and confirm their single origin within the phylum Planctomycetes. After accommodating the uncertainties and factors influencing time estimates with different statistical methods, we estimate that anammox bacteria originated at around the so-called Great Oxidation Event (GOE; 2.32 to 2.5 billion years ago [Gya]), which is thought to have fundamentally changed global biogeochemical cycles. We further show that during the origin of anammox bacteria, genes involved in oxidative stress, bioenergetics and anammox granule formation were recruited, which may have contributed to their survival on an increasingly oxic Earth. Our findings suggest that the rising level of atmospheric oxygen, which made nitrite increasingly available, was a potential driving force for the emergence of anammox bacteria. This is one of the first studies to link the GOE to the evolution of obligately anaerobic bacteria.


Introduction

Accounting for up to 50% of the removal of fixed nitrogen (N) in nature, anaerobic ammonium oxidation (anammox, $\mathrm{NH_4^+ + NO_2^- \rightarrow N_2 + 2\,H_2O}$) is recognized as one of the two important biological processes that lead to N loss, the other being denitrification [1]. Compared with denitrification, anammox is cost-effective and environmentally friendly thanks to its negligible emissions of greenhouse gases like N2O, and consequently anammox bacteria, the organisms that perform anammox, have been widely used in wastewater treatment [2]. Previous studies [3, 4] investigated the roles of key genes driving anammox, including hzsABC (hydrazine synthase), hdh (hydrazine dehydrogenase) and hao (hydroxylamine oxidoreductase). Further, all members of anammox bacteria are found within the deeply branching phylum Planctomycetes. Six candidate genera of anammox bacteria, namely ‘Candidatus Brocadia’, ‘Candidatus Kuenenia’, ‘Candidatus Jettenia’, ‘Candidatus Scalindua’, ‘Candidatus Anammoximicrobium’ and ‘Candidatus Anammoxoglobus’, have been proposed based on 16S ribosomal RNA (rRNA) gene sequences [5], but none of them has been obtained as a typical pure culture. The habitat of anammox bacteria requires the simultaneous presence of reduced (ammonia) and oxidized (nitrite) inorganic N compounds. Such habitats are often found at the aerobic-anaerobic interface in aquatic ecosystems, including the margins of oxygen minimum zones (OMZs) in the ocean and sediment-water interfaces, where ammonium originates from the anaerobic degradation of organic matter and nitrite can be produced by aerobic ammonia oxidation [6]. It is believed that the early Earth was largely anaerobic and deficient in suitable niches for aerobic nitrogen cycles generating nitrite/nitrate until around 2.4 billion years ago (Gya), the so-called Great Oxidation Event (GOE), plausibly as a result of the emergence of oxygenic photosynthesis [7]. On the other hand, ammonium, the other substrate of anammox metabolism, might have accumulated during the mid-to-late Archaean [8]. It is therefore tempting to hypothesize that the origin of anammox coincided with the GOE, a revolutionary event that raised molecular oxygen (O2) to a biologically meaningful concentration, thereby creating the (micro)environments that satisfy anammox metabolism.
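As a small sanity check on the anammox reaction quoted in the first paragraph of the Introduction, the sketch below verifies that NH4+ + NO2- → N2 + 2 H2O is balanced in both atoms and charge. The species table and the helper names (`SPECIES`, `side_totals`, `is_balanced`) are illustrative assumptions and not part of the study.

```python
from collections import Counter

# Element counts and net charge for each species in the reaction
# NH4+ + NO2- -> N2 + 2 H2O (formulas as quoted in the Introduction).
SPECIES = {
    "NH4+": (Counter({"N": 1, "H": 4}), +1),
    "NO2-": (Counter({"N": 1, "O": 2}), -1),
    "N2": (Counter({"N": 2}), 0),
    "H2O": (Counter({"H": 2, "O": 1}), 0),
}


def side_totals(side):
    """Sum element counts and charge over (coefficient, species) pairs."""
    atoms, charge = Counter(), 0
    for coeff, name in side:
        elements, q = SPECIES[name]
        for element, n in elements.items():
            atoms[element] += coeff * n
        charge += coeff * q
    return atoms, charge


def is_balanced(lhs, rhs):
    """True when both sides carry the same atoms and the same net charge."""
    return side_totals(lhs) == side_totals(rhs)


reactants = [(1, "NH4+"), (1, "NO2-")]
products = [(1, "N2"), (2, "H2O")]
print(is_balanced(reactants, products))
```

Both sides carry two N, four H and two O atoms with zero net charge. The two nitrogen atoms of N2 come from one reduced (ammonium) and one oxidized (nitrite) source, which is why the habitat must supply both compounds at once.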

The link between the evolution of metabolic pathways and associated geological factors can be inferred by investigating their timelines. One way to explore the timescale of a particular metabolic pathway is based on biogeochemical evidence, which, however, is often affected by the poor preservation of biomarkers over long periods of time [9]. For instance, ladderanes, a type of lipid unique to anammox bacteria [10], are rarely preserved to a level that can be used to date their origin. Further, the nitrogen isotope fractionations from sedimentary records are usually taken as evidence to determine the presence or absence of N cycling metabolisms in the past [8, 11, 12]. However, the ambiguous differences in isotopic fractionation between metabolic pathways make it difficult to single out and elucidate the evolution of the different redox reactions within the N cycle [13]. Alternatively, molecular dating, which estimates the age of the last common ancestor (LCA) of analyzed lineages by comparing their sequences based on the molecular clock theory [14], provides a powerful strategy to investigate this issue. Here, we hypothesize that the rising level of O2 during the GOE, which made nitrite available to power anammox [15], was a driving force of the origin of anammox bacteria. To test this hypothesis and to obtain more insights into the evolution of anammox bacteria, we compiled an up-to-date genomic data set of Planctomycetes (see Supplementary Text section 1; Dataset S1.1), placed the evolution of anammox bacteria in the context of geological events, and investigated genomic changes underlying the origin of this ecologically important bacterial lineage.

Results and discussion

Monophyletic origins of anammox bacteria and anammox genes

The anammox bacteria form a monophyletic group embedded within other lineages of Planctomycetes in a comprehensive phylogenomic tree of 881 Planctomycetes genomes (Fig. S1). The anammox clade in this tree comprises four known genera, including the early-branching Ca. Scalindua and Ca. Kuenenia and the late-branching Ca. Jettenia and Ca. Brocadia. It also includes two understudied lineages, which we named the ‘basal lineage’ and the ‘hzsCBA-less lineage’ (Fig. S1); both have relatively low genome completeness compared to the other anammox bacteria and are represented by metagenome-assembled genomes (MAGs) sampled from groundwater (see Supplementary Text section 2.1). Two described genera (Ca. Anammoximicrobium and Ca. Anammoxoglobus) had no genomes available at the time of the present study (last accessed: April 2021) and are therefore not included in the phylogenomic analysis. Furthermore, using the most recently updated set of 2,077 Planctomycetes genomes (retrieved in April 2021) from NCBI, we obtained a consistent topology of anammox bacteria (Fig. S2).

In addition, to check whether important groups, particularly early-branching lineages, are encompassed in our analyzed genome data sets, we built a 16S rRNA gene tree (Fig. S3) using the 16S rRNA genes identified from genomic sequences of anammox bacteria and the 16S rRNA gene amplicons deposited in SILVA from the class Brocadiae, to which both anammox and non-anammox bacteria belong (see Supplementary Text section 2.2). Since the anammox bacteria with genome sequences in this 16S rRNA gene tree (Fig. S3) showed a branching order congruent with the topology of the phylogenomic tree (Fig. S1), and since the 16S rRNA gene tree included 913 sequences (clustered from 20,142) sampled from a wide array of habitats (marine, sediment, man-made reactor, freshwater and terrestrial ecosystems), the 16S rRNA gene phylogeny likely encompasses the early-branching lineages of anammox bacteria and is thus suitable for studying the evolutionary origin of anammox bacteria (Dataset S1.2).

We further analyzed the evolution (Fig. S4) of the genes (hzsCBA) critical to the anammox reaction [4, 16]. Reconciliation (see Supplementary Text section 2.3) of the gene and genome-based phylogenies pointed to a single origin of anammox metabolism at the LCA of anammox bacteria (Fig. S4). Taken together, the above analyses suggest that our genome data sets likely encompass the earliest-branching lineages and the deep phylogenetic diversity of anammox bacteria, and indicate their monophyletic origin, thereby providing a foundation for estimating the origin of anammox metabolism by estimating the origin time of anammox bacteria.

The evolutionary origin of anammox bacteria coincided with the rising O2

We employed MCMCTree to estimate the divergence times of the anammox bacteria with i) soft bounds (time constraints allowing a small probability of violation) based on cyanobacterial fossils, and ii) relaxed molecular clock approaches allowing substitution rates to vary among branches. Using a best-practice dating scheme (Fig. 1; see Supplementary Text section 3), we dated the origin of anammox bacteria to 2,117 million years ago (Mya) (95% highest posterior density [HPD] interval, 2,002 - 2,226 Mya). The analysis with the expanded set of Planctomycetes genomes (see Supplementary Text section 3) yielded a similar time estimate of 2,105 Mya (95% HPD: 1,961 - 2,235 Mya).
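For readers unfamiliar with the 95% HPD intervals reported above, the following minimal sketch shows one common way to compute such an interval from posterior samples: sort the draws and take the narrowest window that contains 95% of them. The function name `hpd_interval` and the simulated node ages are assumptions for illustration only; they are not the study's MCMC output, and this is not a description of how MCMCTree summarizes its samples internally.

```python
import math
import random


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples (unimodal case)."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Hypothetical posterior draws of a node age in Mya (illustrative values only).
rng = random.Random(0)
ages = [rng.gauss(2100.0, 60.0) for _ in range(20000)]
low, high = hpd_interval(ages)
print(f"95% HPD: {low:.0f} - {high:.0f} Mya")
```

Unlike an equal-tailed credible interval, the HPD interval is by construction the shortest interval with the requested posterior mass, which matters when the posterior of a node age is skewed.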

Analyses with different combinations of calibrations and parameters broadly converge on similar time estimates (Fig. S5; Dataset S2.1). Specifically, assigning a more ancient time constraint to the total group of oxygenic cyanobacteria, based on biogeochemical evidence for the presence of O2 near 3.0 billion years ago (Gya) [17] instead of the GOE (Dataset S2.1), shifted the posterior dates of the LCA of anammox bacteria into the past by ~300 million years (Myr) (e.g., C1 versus C10; Fig. S5). Varying the maximum bound of the root among 4.5, 3.8 and 3.5 Gyr had a minor impact (around 100 Myr) on the time estimates of anammox bacteria (e.g., C1 versus C3; Fig. S5). Although the auto-correlated rates (AR) model was less favored than the independent rates (IR) model based on model comparison (see Supplementary Text section 3.3; Dataset S2.2), it is worth noting that the posterior ages estimated by the AR model were 1-12% younger than those estimated by the IR model (Dataset S2.1). Moreover, there remains a possibility that unsampled or even extinct lineages of anammox bacteria diverged earlier than all anammox bacteria analyzed in the present study. This scenario, if true, implies that the first anammox bacterium could have originated before the LCA of sequenced anammox bacteria but later than their split from the sister lineage (Node X in Fig. 1; up to 2,600 Mya). Accommodating the above uncertainties, our analyses suggest that the origin of anammox bacteria, and hence the origin of anammox, was roughly around the GOE and most likely lies between 2,000 and 2,600 Mya.
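The age ranges discussed above can be compared with a short arithmetic sketch. It uses only numbers stated in the text (the GOE window of 2.32 to 2.5 Gya from the Abstract, the 95% HPD of the LCA, and the 2,000 to 2,600 Mya range that extends back to Node X); the helper `overlap` and the constant names are hypothetical.

```python
def overlap(a, b):
    """Overlap of two age intervals given as (young, old) in Mya, else None."""
    young = max(a[0], b[0])
    old = min(a[1], b[1])
    return (young, old) if young <= old else None


GOE = (2320, 2500)           # 2.32 to 2.5 Gya, as given in the Abstract
LCA_HPD = (2002, 2226)       # 95% HPD of the LCA of sequenced anammox bacteria
ORIGIN_RANGE = (2000, 2600)  # range extending back to the split at Node X

print(overlap(LCA_HPD, GOE))       # crown-group estimate versus the GOE window
print(overlap(ORIGIN_RANGE, GOE))  # range including the stem lineage
```

Under these numbers the crown-group HPD ends 94 Myr after the younger edge of the GOE window, whereas the wider range that allows for unsampled or extinct stem lineages contains the whole window. This is why the text describes the origin as "roughly around" the GOE.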

The geochemical context of the origin of anammox bacteria

We speculate that the coincidence in time between the origin of an obligate anaerobe and the GOE might be ascribed to the increasing availability of nitrite, whose accumulation in turn depends on the availability of O2 for biological nitrification [15, 18, 19]. A low concentration of nitrite has been shown to significantly decrease the N removal rate of anammox in reactors [20]. In modern OMZs, which presumably resemble the primordial Earth environment, the concentration of nitrite is the rate-limiting factor for anammox metabolism [21]. In addition, it has been suggested that nitrite might not have reached a biologically meaningful (micromolar) level on Earth until the GOE, likely due to a few nitrite sinks related to the pervasive UV photolysis and Fe(II) reduction on early Earth [18]. Note that a prior study suggested that seawater nitrite could have accumulated up to micromolar concentrations via lightning and photochemical processes [22], which implies the possibility of an earlier origin of anammox bacteria. However, this argument was made under a model assuming high amounts of CO2, which has been disputed because the concentration of CO2 in the late Archean was likely a hundredth of that estimate [23]. The above arguments suggest that the concentration of nitrite before the GOE was likely too low to support the origin of anammox bacteria.

In modern environments, the majority of nitrite used in anammox is derived from nitrate reduction or aerobic ammonia oxidation [24], the latter performed by ammonia-oxidizing archaea (AOA) or bacteria (AOB). Multiple studies [15, 25, 26] proposed that AOA and/or AOB likely appeared on Earth at about 2.7 Gya according to elevated δ15N values (Fig. 1). In support of this idea, a recent study dated the LCA of AOA to ~2,300 Mya, driven by the increase of O2 concentration in terrestrial ecosystems, and found that the expansion of AOA from land to the ocean did not occur until nearly 1.0 Gya [27]. This hints at a dominant role of AOB in the early ocean. Consistent with this idea, the vast majority of the early-branching lineages of anammox bacteria (Ca. Scalindua) and the sister Planctomycetes lineages of the anammox clade in the phylogenomic tree (Fig. 1) are found in marine environments. Furthermore, in the 16S rRNA gene tree, which represents a greater diversity than the phylogenomic tree, marine lineages still account for the majority of the earliest-branching lineages of anammox bacteria (Fig. S3). Thus, it seems likely that the LCA of anammox bacteria originated in the early ocean, which ammonia-oxidizing organisms, particularly AOB, had likely already inhabited. The above arguments, although speculative, tentatively suggest that the nitrite demand of anammox could have been readily met by the time of the GOE owing to the significant amount of newly available O2, which is broadly consistent with our molecular dating results suggesting an origin of anammox around the GOE. Note that recently reported anammox lineages show less dependence on nitrite and may use nitric oxide or hydroxylamine as the electron acceptor in anammox [5]. These lineages are affiliated with Ca. Kuenenia and Ca. Brocadia, which evolved recently, implying that the ability to use alternative electron acceptors is likely not ancestral to all anammox bacteria.

Genomic changes related to anammox metabolism upon the origin of anammox bacteria

We further explored the genomic changes characterizing the origin of anammox bacteria (see Supplementary Text section 4). These include the gains of the aforementioned genes that directly participate in anammox, viz., hzsABC, hdh, and hao. The current view of anammox metabolism posits that electrons consumed by anammox for carbon fixation are replenished by the oxidation of nitrite to nitrate by nitrite oxidoreductase (NXR) [28], which was inferred to have been acquired upon the origin of anammox bacteria (Fig. 2). Moreover, multiple auxiliary genes for N assimilatory pathways, including those encoding the N regulatory protein (glnB), the assimilatory nitrite reductase large subunit (nirB), and transporters such as amt (ammonium), nirC (nitrite) and NRT family proteins (nitrate/nitrite), were also gained at the origin of anammox bacteria (Fig. 2). However, genes for dissimilatory nitrogen pathways encoding nitrite reduction to nitric oxide (nirK or nirS) and to ammonium (nrfAH) were likely acquired after the origin of anammox bacteria (Fig. 2). Anammox occurs at the membrane of the anammoxosome, an organelle whose membrane is predominantly composed of a special type of lipid called ladderane. Ladderane membranes show low proton permeability, which helps maintain the proton motive force during anammox metabolism [10]. Although the biosynthetic pathway of ladderane is yet to be characterized, a previous study [29] predicted 34 candidate genes responsible for the synthesis of ladderane lipids, four of which were potentially acquired during the origin of anammox bacteria (see Supplementary Text section 4.2; Fig. 2), hinting at their potential roles in ladderane synthesis and providing clues for future experimental investigations.

Anammox bacteria are detected in anoxic habitats and exhibit low oxygen tolerance [5]. Specifically, the O2-sensitive intermediates generated by anammox, e.g., hydrazine, a powerful reductant, call for the maintenance of anaerobic environments. Accordingly, the likely acquired peroxidases include cytochrome c peroxidase (Ccp), which can scavenge hydrogen peroxide in the periplasm [30], and the most prevalent peroxidase [31], thioredoxin-dependent peroxiredoxin (AhpC). Besides peroxide scavengers, desulfoferrodoxin (Dfx), which functions as a superoxide reductase (SOR) to reduce superoxide to hydrogen peroxide and is broadly distributed among anaerobic bacteria [31], was likely acquired during the origin of anammox bacteria (Fig. 2). Another acquired gene is fprA, encoding flavo-diiron proteins that can scavenge O2. In addition, cbi genes, which are exclusively involved in the anaerobic biosynthesis of vitamin B12 [32], are widely present in the genomes of anammox bacteria, suggesting their acquisition during the early evolution of anammox bacteria (Fig. 2).

Investigating other metabolisms that were also acquired upon the origin of anammox bacteria could help us understand the coeval ecology. Here, we inferred that sat (Fig. 2), which encodes sulfate adenylyltransferase for the assimilatory incorporation of sulfate into bioavailable adenylyl sulfate, and aprAB and fsr (Fig. 2), encoding dissimilatory sulfite reductase, were acquired upon the origin of anammox bacteria. The likely acquired sulfur metabolism coincides with the increase in seawater sulfate concentration from ~100 µM throughout much of the Paleoproterozoic to over 1 mM after the rise of atmospheric O2 [33]. In addition, unlike other Planctomycetes, which are heterotrophs, anammox bacteria are generally autotrophs that use the Wood-Ljungdahl pathway for carbon assimilation [34]. Consequently, they potentially lost many genes involved in carbohydrate utilization (Fig. 2). The metabolic loss is further supported by the enrichment analysis, in which the genes predicted to have been lost by the LCA of anammox bacteria are enriched in pathways involving activity, intramolecular oxidoreductase activity and carbohydrate activity (Fig. 2; see Data and Code availability), hinting at a decreased ability of anammox bacteria to degrade organics. Also lost are the genes (mnhABCDEG) encoding a Na+/H+ antiporter (Fig. 2), which usually provides sodium tolerance and pH homeostasis in marine environments [35]. Only members of Ca. Scalindua, which is dominant in marine environments, retain this gene cluster. Further, although the way the proton motive force is built up is vital to energy harvesting, it remains unexplored in anammox bacteria [36]. Here, we found that nuo genes encoding NADH-quinone oxidoreductase, which can build up the proton motive force, were likely acquired by the LCA of anammox bacteria (Fig. 2). In addition, the presence of genes coding for Na+-coupled respiratory enzymes, for example rnf (ferredoxin-NADH oxidoreductase) and nqr (ubiquinone-NADH oxidoreductase), suggests that anammox bacteria could also use the sodium motive force for ATP synthesis (Fig. 2).

Additionally, iron is a vital element for HZS [3] and HDH [37], and increased iron concentrations can promote the growth of anammox bacteria in vitro [38]. The ancient ocean is thought to have been ferruginous (high iron availability) according to iron isotope and other sedimentary records across several coeval marine settings [39], which could have provided large amounts of soluble iron for anammox bacteria. Interestingly, a series of iron-related genes that make use of the abundant iron in the environment were likely acquired upon the origin of anammox bacteria. First, anammox bacteria likely acquired the gene fur, encoding the ferric uptake regulator for iron assimilation. Moreover, iron-containing cytochromes are encoded by the acquired genes petBC, potentially encoding Rieske-heme b [28], and CcsAB, encoding a protein involved in the cytochrome c maturation system (Fig. 2). Notably, ahbABCD, which encodes proteins synthesizing b-type hemes in anammox bacteria [40], was likely acquired before the origin of anammox bacteria (Fig. 2). Furthermore, we found that the gene Bfr, encoding the oligomeric protein bacterioferritin, which is involved in the uptake and storage of iron [41], was likely acquired during the origin of anammox bacteria (Fig. 2), which may have helped to hoard iron in the expanding euxinic (low soluble iron availability) ocean. Our results tentatively support the hypothesis that the microenvironments colonized by those primordial anammox bacteria, potentially near the anoxic-oxic interface where nitrite was more available, could have been iron limited.

Taken together, the timeline and corresponding genomic changes of anammox bacteria have implications for the physiological characteristics of descendant anammox bacteria and for the origin of other nitrogen transformation pathways. For example, since anammox bacteria emerged at a time when nitrite was barely available, the earlier-branching lineage Ca. Scalindua exhibits a high nitrite affinity (0.45 μM) compared to other anammox genera (up to 370 μM) [5, 42]. Likewise, the gains (Fig. S7; see Supplementary Text section 4.4) of the nrfAH genes for dissimilatory nitrate reduction to ammonium (DNRA) in the late-branching lineages (Ca. Brocadia and Ca. Kuenenia) around 800 Mya are consistent with the time when nitrite/nitrate availability was increased by the Neoproterozoic Oxygenation Event [43]. The loss of a large number of genes involved in carbohydrate metabolism and the iron dependence upon their origin might limit the distribution of anammox bacteria to oligotrophic environments. The time estimates and genomic changes of anammox bacteria go some way towards enhancing the understanding of the physiological characteristics of modern anammox bacteria and the historical N cycle.

Concluding remarks

Traditionally, metabolic evolution during the GOE has mainly been investigated in the phylum Cyanobacteria [44], likely because oxygenic photosynthesis is believed to be a major driver of the GOE. Although time estimates from the cyanobacteria-based calibration strategy can vary considerably with different maximum constraints at the root, as shown for Alphaproteobacteria in a recent study [45], the relatively consistent time estimates shown here (Fig. S5) imply that this uncertainty was not significant for anammox bacteria in our study, which might be related to the different phylogenetic relationships of Alphaproteobacteria and Planctomycetes with cyanobacteria. More recently, scientists have increasingly realized that the impact of the GOE on the bacterial world may not be limited to cyanobacteria. This is well illustrated by Ren et al. [27], who showed that the GOE was likely a driving force of the origin of the aerobic AOA and thus of an aerobic nitrogen cycle on Earth. Furthermore, another molecular dating study suggested that arsenic resistance systems expanded as biological adaptations to the increased toxicity of oxidized arsenic species after the GOE [46]. However, to our knowledge, little is known about the relationship between the GOE and obligately anaerobic metabolism.

Here, using molecular dating and comparative genomics approaches, we link the emergence of an obligately anaerobic bacterial group, which drives the loss of fixed N in many environments, to the rise of O2, and highlight its evolutionary responses to major environmental disturbances. The GOE might have opened novel niches for the origin and subsequent expansion of diverse aerobic prokaryotes such as AOA and AOB, which might in turn have facilitated the origin of other bacterial lineages, including some anaerobes like the anammox bacteria, by providing the resources for their energy conservation. Hence, our study suggests that the impact of the GOE extends well beyond aerobic prokaryotes. Thus, the origin of anammox bacteria, which we propose was driven by the prevalence of aerobic nitrogen transformation, provides biological support for isotopic evidence from sedimentary records. In particular, we show that molecular dating analysis is a powerful approach to complement isotopic evidence in resolving the origin time of anammox, and it has the potential to be extended to the investigation of other ecologically important metabolisms.


Materials and Methods

A phylogenomic tree (Fig. S1) was generated from the concatenated alignment of 120 ubiquitous proteins (bac120; Dataset S1.3) proposed for tree inference by the Genome Taxonomy Database [47]. All published genomic sequences (952 in total) affiliated with the phylum Planctomycetes in NCBI GenBank up to December 2019 were retrieved (Dataset S1.1). To examine whether the patterns obtained with this dataset still hold with broader sampling, we constructed an expanded set of 2,077 Planctomycetes genomes released at GenBank by April 2021 (Fig. S2; see Data and Code availability). The key anammox genes hzsCBA and hdh were identified by BLASTP searches against manually curated reference sequences. For each gene, the identified protein sequences were aligned using MAFFT (v7.222) [48], and the alignments were refined with trimAl (v1.4) [49]. All phylogenies were constructed with IQ-TREE (v1.6.2) [50], with substitution models selected automatically by ModelFinder [51] and branch support assessed with 1,000 ultrafast bootstrap replicates. Following manual curation, four well-recognized genera and two separate lineages of anammox bacteria were highlighted in different colours, and the presence of key genes was annotated with filled symbols beside the labels (Fig. S1). Furthermore, a 16S rRNA gene tree was generated with similar methods using 16S rRNA genes identified from the downloaded genomes and those retrieved from the SILVA database (Fig. S3; Dataset S1.2). All trees in our study (both phylogenomic and 16S rRNA gene trees) were visualized with iTOL [52]. More details are given in the Supplementary Text, section 2.
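
As an illustration, the search, alignment, trimming and tree-building steps described above could be scripted as follows. This is only a sketch under stated assumptions, not the commands used in this study: the file names, the working directory, the E-value cutoff and the trimAl mode (`-automated1`) are hypothetical choices, and the function only assembles command lines without running them.

```python
def build_gene_tree_commands(gene, workdir="work", threads=4):
    """Assemble (but do not run) the command lines for one marker gene.

    All paths are hypothetical placeholders. The E-value cutoff and the
    trimAl mode are illustrative assumptions, not settings from the study.
    """
    ref = f"{workdir}/{gene}.ref.faa"          # manually curated reference proteins
    table = f"{workdir}/{gene}.blastp.tsv"     # tabular BLASTP output
    hits = f"{workdir}/{gene}.hits.faa"        # hit sequences extracted from the table
    aln = f"{workdir}/{gene}.aln.faa"          # MAFFT alignment
    trimmed = f"{workdir}/{gene}.trimmed.faa"  # trimAl-refined alignment
    return [
        # Search the references against a protein database of all genomes.
        ["blastp", "-query", ref, "-db", f"{workdir}/genomes",
         "-evalue", "1e-5", "-outfmt", "6", "-out", table],
        # MAFFT prints the alignment to stdout; the caller redirects it to `aln`.
        ["mafft", "--auto", "--thread", str(threads), hits],
        ["trimal", "-in", aln, "-out", trimmed, "-automated1"],
        # -m MFP: ModelFinder followed by tree search; -bb 1000: ultrafast bootstrap.
        ["iqtree", "-s", trimmed, "-m", "MFP", "-bb", "1000", "-nt", str(threads)],
    ]


if __name__ == "__main__":
    for command in build_gene_tree_commands("hzsA"):
        print(" ".join(command))
```

Keeping the commands as plain lists makes them easy to inspect or log before passing each one to `subprocess.run`.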

Molecular dating analysis was carried out using the program MCMCTree from the PAML package (v4.9j) [53]. We used 13 calibration sets constructed with different time constraints on the root and on three calibration nodes within the cyanobacterial lineage (see Supplementary Text, section 3.2). The topology constraint for the dating analysis was generated from 85 genomes sampled from the phylogenomic tree of the phylum Planctomycetes (see Supplementary Text, section 3.1). To perform the dating analysis, clock models and calibration sets were compared in 26 schemes (see Supplementary Text, section 3; Dataset S2.1) [54, 55]. For each scheme, MCMCTree was run in duplicate using the approximate likelihood method [56] with identical iteration parameters (burn-in: 10,000; sampling frequency: 20; number of samples: 20,000). The convergence of each scheme was evaluated by comparing the posterior dates of the two independent runs. The repeat analysis using the latest Planctomycetes genomes was conducted with the best-practice dating strategy (IR model and calibration set C1).
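
The iteration settings and the convergence check described above might be sketched as follows. The first function renders only part of an MCMCTree control file: the priors (for example `BDparas`, `rgene_gamma` and `sigma2_gamma`) and the calibrations are omitted, and the file names are hypothetical. The second function is one simple way to compare two runs; the text does not state which statistic or threshold was used, so the relative-difference measure and the tolerance below are assumptions.

```python
def render_mcmctree_ctl(seqfile, treefile, clock=2, usedata=2,
                        burnin=10_000, sampfreq=20, nsample=20_000):
    """Render part of an MCMCTree control file as text (priors omitted)."""
    settings = {
        "seqfile": seqfile,    # hypothetical alignment path
        "treefile": treefile,  # hypothetical calibrated tree path
        "seqtype": 2,          # 2 = amino acid data (assumed for protein alignments)
        "usedata": usedata,    # 2 = approximate likelihood
        "clock": clock,        # 2 = independent rates (IR), 3 = autocorrelated rates
        "burnin": burnin,
        "sampfreq": sampfreq,
        "nsample": nsample,
    }
    return "\n".join(f"{key:>10} = {value}" for key, value in settings.items()) + "\n"


def max_relative_difference(run1, run2):
    """Largest relative gap between the posterior mean ages of two runs.

    Each argument maps a node label to its posterior mean age.
    """
    shared = sorted(run1.keys() & run2.keys())
    if not shared:
        raise ValueError("the two runs share no node labels")
    return max(abs(run1[n] - run2[n]) / ((run1[n] + run2[n]) / 2) for n in shared)


if __name__ == "__main__":
    print(render_mcmctree_ctl("concat.phy", "calibrated.nwk"))
    # Toy posterior mean ages (Ga) for two replicate runs; not values from the study.
    run_a = {"n86": 2.40, "n87": 2.00}
    run_b = {"n86": 2.42, "n87": 2.00}
    # 0.01 is an arbitrary illustrative tolerance.
    print(max_relative_difference(run_a, run_b) < 0.01)
```

A scatter plot of the posterior means of one run against the other conveys the same information graphically, with converged runs falling on the diagonal.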

The phylogenetic differences between gene trees and the species tree were reconciled with GeneRax [57], using unrooted gene trees as input and automatically optimized duplication, transfer and loss (DTL) rates. For each gene, we used as the reference a species tree comprising all anammox bacteria, pruned from the phylogenomic tree. We used the recommended parameters, including the SPR strategy, the undated DTL model and a maximum SPR radius of five.
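
The reconciliation settings named above could be assembled as in the sketch below. It reflects our understanding of the GeneRax command-line options and families-file layout, which should be checked against the GeneRax documentation for the version in use. All file names, the family name and the substitution model are hypothetical placeholders.

```python
def render_families_file(families):
    """Render a GeneRax families file from {family: {key: path_or_model}}."""
    lines = ["[FAMILIES]"]
    for name, entry in families.items():
        lines.append(f"- {name}")
        for key in ("starting_gene_tree", "alignment", "mapping", "subst_model"):
            lines.append(f"{key} = {entry[key]}")
    return "\n".join(lines) + "\n"


def build_generax_command(families_file, species_tree, prefix, max_spr_radius=5):
    """Assemble (but do not run) a GeneRax call with the settings from the text."""
    return [
        "generax",
        "--families", families_file,
        "--species-tree", species_tree,   # anammox-only tree pruned from the species tree
        "--rec-model", "UndatedDTL",      # undated duplication-transfer-loss model
        "--strategy", "SPR",
        "--max-spr-radius", str(max_spr_radius),
        "--prefix", prefix,
    ]


if __name__ == "__main__":
    example = {
        "hzsA": {  # hypothetical family and paths
            "starting_gene_tree": "hzsA.treefile",
            "alignment": "hzsA.trimmed.faa",
            "mapping": "hzsA.link",
            "subst_model": "LG+G",  # placeholder model, not taken from the study
        }
    }
    print(render_families_file(example))
    print(" ".join(build_generax_command("families.txt", "anammox_species.nwk", "generax_out")))
```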

For comparative genomics analysis, the protein-coding sequences were annotated against KEGG [58], CDD [59], InterPro [60], Pfam [61], TIGRFAM [62] and TCDB [63] individually (see Supplementary Text, section 4). Following annotation, potentially gained or lost genes between groups (anammox bacteria versus non-anammox bacteria; one anammox bacterial genus versus all other anammox bacteria) were identified with two-sided Fisher's exact tests. The resulting p-values were corrected with the Benjamini-Hochberg false discovery rate (FDR) procedure. Genes with corrected p-values smaller than 0.05 and a larger ratio in the study group were defined as potentially gained genes, and genes with corrected p-values smaller than 0.05 and a smaller ratio in the study group were defined as potentially lost genes. The functions of these potentially gained or lost genes are summarized in detail (see Data and Code availability).
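
The gain/loss test described above can be illustrated with a short, dependency-free sketch. It is not the deposited code. It assumes that "ratio" means the proportion of genomes in a group that carry the gene, and the gene names and counts in the usage example are toy values rather than results.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: study-group genomes with / without the gene
    c, d: reference-group genomes with / without the gene
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the study group.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_genes(counts, n_study, n_ref, alpha=0.05):
    """counts maps gene -> (carriers in study group, carriers in reference group)."""
    genes = list(counts)
    pvals = [fisher_exact_two_sided(s, n_study - s, r, n_ref - r)
             for s, r in (counts[g] for g in genes)]
    calls = {}
    for gene, q in zip(genes, benjamini_hochberg(pvals)):
        s, r = counts[gene]
        if q < alpha and s / n_study > r / n_ref:
            calls[gene] = "potentially gained"
        elif q < alpha and s / n_study < r / n_ref:
            calls[gene] = "potentially lost"
        else:
            calls[gene] = "not significant"
    return calls


if __name__ == "__main__":
    toy = {"geneA": (10, 0), "geneB": (0, 10), "geneC": (5, 5)}  # toy counts
    print(classify_genes(toy, n_study=10, n_ref=10))
```

Because the correction is applied across all tested genes at once, the same gene can change category when the set of tested genes changes.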

Acknowledgements

We thank Bite Pei for building the basic dataset for this project, Yang Qian for his help in editing the earlier version of the manuscript, Jinjin Tao for her helpful discussion, and Mario dos Reis for his suggestion on the dating analysis. This work is funded by the National Science Foundation of China (92051113), the Hong Kong Research Grants Council Area of Excellence Scheme (AoE/M-403/16), the Direct Grant of CUHK (4053495), and the CUHK Impact Postdoctoral Fellowship Scheme (to S.W.).

Data and Code availability

All genomic sequences used, the phylogenetic trees generated, the estimated divergence times and the Python code used in this study are deposited in the online repository https://github.com/luolab-cuhk/anammox-origin.

Conflict of interest

The authors declare that they have no conflict of interest.

References

1. Oshiki M, Satoh H, Okabe S. Ecology and physiology of anaerobic ammonium oxidizing bacteria. Environ Microbiol 2016; 18: 2784-2796.

2. Li L, Ling Y, Wang HY, Chu ZS, Yan GK, Li ZW, et al. N2O emission in partial nitritation-anammox process. Chinese Chemical Letters 2020; 31: 28-38.

3. Dietl A, Ferousi C, Maalcke WJ, Menzel A, de Vries S, Keltjens JT, et al. The inner workings of the hydrazine synthase multiprotein complex. Nature 2015; 527: 394-397.

4. Kartal B, Maalcke WJ, de Almeida NM, Cirpus I, Gloerich J, Geerts W, et al. Molecular mechanism of anaerobic ammonium oxidation. Nature 2011; 479: 127-130.

5. Zhang L, Okabe S. Ecological niche differentiation among anammox bacteria. Water Res 2020; 171: 115468.

6. Sonthiphand P, Hall MW, Neufeld JD. Biogeography of anaerobic ammonia-oxidizing (anammox) bacteria. Front Microbiol 2014; 5: 399.

7. Lyons TW, Reinhard CT, Planavsky NJ. The rise of oxygen in Earth's early ocean and atmosphere. Nature 2014; 506: 307-315.

8. Yang J, Junium CK, Grassineau NV, Nisbet EG, Izon G, Mettam C, et al. Ammonium availability in the Late Archaean nitrogen cycle. Nature Geoscience 2019; 12: 553-557.

9. Hallmann C, Kelly AE, Gupta SN, Summons RE. Reconstructing Deep-Time Biology with Molecular Fossils. Quantifying the Evolution of Early Life: Numerical Approaches to the Evaluation of Fossils and Ancient Ecosystems 2011; 36: 355-401.

10. Moss FR, Shuken SR, Mercer JAM, Cohen CM, Weiss TM, Boxer SG, et al. Ladderane phospholipids form a densely packed membrane with normal hydrazine and anomalously low proton/hydroxide permeability. Proc Natl Acad Sci U S A 2018; 115: 9098-9103.

11. Mettam C, Zerkle AL, Claire MW, Prave AR, Poulton SW, Junium CK. Anaerobic nitrogen cycling on a Neoarchaean ocean margin. Earth and Planetary Science Letters 2019; 527.

12. Ader M, Thomazo C, Sansjofre P, Busigny V, Papineau D, Laffont R, et al. Interpretation of the nitrogen isotopic composition of Precambrian sedimentary rocks: Assumptions and perspectives. Chemical Geology 2016; 429: 93-110.

13. Fuchsman CA, Stueken EE. Using modern low-oxygen marine ecosystems to understand the nitrogen cycle of the Paleo- and Mesoproterozoic oceans. Environ Microbiol 2021; 23: 2801-2822.

14. Yang Z, Rannala B. Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Mol Biol Evol 2006; 23: 212-226.

15. Stueken EE, Kipp MA, Koehler MC, Buick R. The evolution of Earth's biogeochemical nitrogen cycle. Earth-Science Reviews 2016; 160: 220-239.

16. Harhangi HR, Le Roy M, van Alen T, Hu BL, Groen J, Kartal B, et al. Hydrazine synthase, a unique phylomarker with which to study the presence and biodiversity of anammox bacteria. Appl Environ Microbiol 2012; 78: 752-758.

17. Betts HC, Puttick MN, Clark JW, Williams TA, Donoghue PCJ, Pisani D. Integrated genomic and fossil evidence illuminates life's early evolution and eukaryote origin. Nat Ecol Evol 2018; 2: 1556-1562.

18. Ranjan S, Todd ZR, Rimmer PB, Sasselov DD, Babbin AR. Nitrogen Oxide Concentrations in Natural Waters on Early Earth. Geochemistry Geophysics Geosystems 2019; 20: 2021-2039.

19. Cheng C, Busigny V, Ader M, Thomazo C, Chaduteau C, Philippot P. Nitrogen isotope evidence for stepwise oxygenation of the ocean during the Great Oxidation Event. Geochimica Et Cosmochimica Acta 2019; 261: 224-247.

20. Li W, Zhuang JL, Zhou YY, Meng FG, Kang D, Zheng P, et al. Metagenomics reveals microbial community differences lead to differential nitrate production in anammox reactors with differing nitrogen loading rates. Water Res 2020; 169: 115279.

21. Zhang L, Narita Y, Gao L, Ali M, Oshiki M, Ishii S, et al. Microbial competition among anammox bacteria in nitrite-limited bioreactors. Water Res 2017; 125: 249-258.

22. Wong ML, Charnay BD, Gao P, Yung YL, Russell MJ. Nitrogen Oxides in Early Earth's Atmosphere as Electron Acceptors for Life's Emergence. Astrobiology 2017; 17: 975-983.

23. Charnay B, Le Hir G, Fluteau F, Forget F, Catling DC. A warm or a cold early Earth? New insights from a 3-D climate-carbon model. Earth and Planetary Science Letters 2017; 474: 97-109.

24. Dalsgaard T, Thamdrup B, Farias L, Revsbech NP. Anammox and denitrification in the oxygen minimum zone of the eastern South Pacific. Limnology and Oceanography 2012; 57: 1331-1346.

25. Canfield DE, Glazer AN, Falkowski PG. The evolution and future of Earth's nitrogen cycle. Science 2010; 330: 192-196.

26. Zerkle AL, Poulton SW, Newton RJ, Mettam C, Claire MW, Bekker A, et al. Onset of the aerobic nitrogen cycle during the Great Oxidation Event. Nature 2017; 542: 465-467.

27. Ren M, Feng X, Huang Y, Wang H, Hu Z, Clingenpeel S, et al. Phylogenomics suggests oxygen availability as a driving force in Thaumarchaeota evolution. ISME J 2019; 13: 2150-2161.

28. Kartal B, Keltjens JT. Anammox Biochemistry: a Tale of Heme c Proteins. Trends Biochem Sci 2016; 41: 998-1011.

29. Rattray JE, Strous M, Op den Camp HJ, Schouten S, Jetten MS, Damste JS. A comparative genomics study of genetic products potentially encoding ladderane lipid biosynthesis. Biol Direct 2009; 4: 8.

30. Van Vliet AH, Ketley JM, Park SF, Penn CW. The role of iron in Campylobacter gene regulation, metabolism and oxidative stress defense. FEMS Microbiol Rev 2002; 26: 173-186.

31. Johnson LA, Hug LA. Distribution of reactive oxygen species defense mechanisms across bacteria. Free Radical Biology and Medicine 2019; 140: 93-102.

32. Moore SJ, Warren MJ. The anaerobic biosynthesis of vitamin B12. Biochem Soc Trans 2012; 40: 581-586.

33. Fakhraee M, Hancisse O, Canfield DE, Crowe SA, Katsev S. Proterozoic seawater sulfate scarcity and the evolution of ocean-atmosphere chemistry. Nature Geoscience 2019; 12: 375-380.

34. Kartal B, Keltjens JT, Jetten MSM. Metabolism and Genomics of Anammox Bacteria. Nitrification 2011: 181-200.

35. Ito M, Morino M, Krulwich TA. Mrp Antiporters Have Important Roles in Diverse Bacteria and Archaea. Front Microbiol 2017; 8: 2325.

36. Peeters SH, van Niftrik L. Trending topics and open questions in anaerobic ammonium oxidation. Curr Opin Chem Biol 2019; 49: 45-52.

37. Akram M, Dietl A, Mersdorf U, Prinz S, Maalcke W, Keltjens J, et al. A 192-heme electron transfer network in the hydrazine dehydrogenase complex. Sci Adv 2019; 5: eaav4310.

38. Bi Z, Qiao S, Zhou J, Tang X, Zhang J. Fast start-up of Anammox process with appropriate ferrous iron concentration. Bioresour Technol 2014; 170: 506-512.

39. Poulton SW, Canfield DE. Ferruginous Conditions: A Dominant Feature of the Ocean through Earth's History. Elements 2011; 7: 107-112.

40. Ferousi C, Lindhoud S, Baymann F, Kartal B, Jetten MS, Reimann J. Iron assimilation and utilization in anaerobic ammonium oxidizing bacteria. Curr Opin Chem Biol 2017; 37: 129-136.

41. Keren N, Aurora R, Pakrasi HB. Critical roles of bacterioferritins in iron storage and proliferation of cyanobacteria. Plant Physiol 2004; 135: 1666-1673.

42. Wu Z, Meng H, Huang X, Wang Q, Chen WH, Gu JD, et al. Salinity-driven heterogeneity toward anammox distribution and growth kinetics. Appl Microbiol Biotechnol 2019; 103: 1953-1960.

43. Ader M, Sansjofre P, Halverson GP, Busigny V, Trindade RIF, Kunzmann M, et al. Ocean redox structure across the Late Neoproterozoic Oxygenation Event: A nitrogen isotope perspective. Earth and Planetary Science Letters 2014; 396: 1-13.

44. Schopf JW. Geological evidence of oxygenic photosynthesis and the biotic response to the 2400-2200 Ma "great oxidation event". Biochemistry (Mosc) 2014; 79: 165-177.

45. Wang S, Meade A, Lam HM, Luo H. Evolutionary Timeline and Genomic Plasticity Underlying the Lifestyle Diversity in Rhizobiales. mSystems 2020; 5.

46. Chen SC, Sun GX, Yan Y, Konstantinidis KT, Zhang SY, Deng Y, et al. The Great Oxidation Event expanded the genetic repertoire of arsenic metabolism and cycling. Proc Natl Acad Sci U S A 2020; 117: 10414-10421.

47. Parks DH, Rinke C, Chuvochina M, Chaumeil PA, Woodcroft BJ, Evans PN, et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat Microbiol 2017; 2: 1533-1542.

48. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 2013; 30: 772-780.

49. Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 2009; 25: 1972-1973.

50. Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 2015; 32: 268-274.

51. Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods 2017; 14: 587-589.

52. Letunic I, Bork P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res 2019; 47: W256-W259.

53. Yang Z. PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol 2007; 24: 1586-1591.

54. Dos Reis M, Gunnell GF, Barba-Montoya J, Wilkins A, Yang Z, Yoder AD. Using Phylogenomic Data to Explore the Effects of Relaxed Clocks and Calibration Strategies on Divergence Time Estimation: Primates as a Test Case. Syst Biol 2018; 67: 594-615.

55. Dos Reis M, Donoghue PC, Yang Z. Bayesian molecular clock dating of species divergences in the genomics era. Nat Rev Genet 2016; 17: 71-80.

56. Dos Reis M, Yang Z. Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times. Mol Biol Evol 2011; 28: 2161-2172.

57. Morel B, Kozlov AM, Stamatakis A, Szollosi GJ. GeneRax: A Tool for Species-Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene Duplication, Transfer, and Loss. Mol Biol Evol 2020; 37: 2763-2774.

58. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 2000; 28: 27-30.

59. Lu S, Wang J, Chitsaz F, Derbyshire MK, Geer RC, Gonzales NR, et al. CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res 2020; 48: D265-D268.

60. Mitchell AL, Attwood TK, Babbitt PC, Blum M, Bork P, Bridge A, et al. InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res 2019; 47: D351-D360.

61. El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, et al. The Pfam protein families database in 2019. Nucleic Acids Res 2019; 47: D427-D432.

62. Haft DH, Loftus BJ, Richardson DL, Yang F, Eisen JA, Paulsen IT, et al. TIGRFAMs: a resource for the functional identification of proteins. Nucleic Acids Res 2001; 29: 41-43.

63. Saier MH, Jr., Reddy VS, Tsu BV, Ahmed MS, Li C, Moreno-Hagelsieb G. The Transporter Classification Database (TCDB): recent advances. Nucleic Acids Res 2016; 44: D372-D379.

64. Kipp MA, Stueken EE, Yun M, Bekker A, Buick R. Pervasive aerobic nitrogen cycling in the surface ocean across the Paleoproterozoic Era. Earth and Planetary Science Letters 2018; 500: 117-126.

Figure legends

Figure 1. The evolutionary timeline of anammox bacteria estimated using MCMCTree. The chronogram was estimated with calibration set C1 (see Supplementary Text, section 3). The distributions of the posterior dates estimated using other calibration sets are shown above the last common ancestor (LCA) of anammox bacteria. These three alternative calibration sets were chosen because they accommodate different calibration constraints (see Supplementary Text, section 3.3). More alternative time estimates are provided in Figure S4. The vertical grey bar represents the period of the GOE, from 2,500 to 2,320 Mya. The calibration constraints at the root and within the phylum Cyanobacteria are marked with orange text: the LCA of Planctomycetes and Cyanobacteria (Root), the total group of oxygenic Cyanobacteria (Node 2), the total group of Nostocales (Node 3), and the total group of Pleurocapsales (Node 4). Genome completeness estimated by CheckM is visualized with a gradient color strip. The adjacent color strip indicates the type of genomic sequence used in our study: metagenome-assembled genomes (MAGs) or whole-genome sequences (WGS) from either enriched culture samples or isolates. The diagram below the dated tree illustrates the changes in the atmospheric partial pressure of O2 (PO2) and in nitrogen isotope fractionation. The blue line shows the model proposed by Lyons et al. [7]. The green arrow marks the earliest evidence for an aerobic nitrogen cycle, at around 2.7 Ga. PAL on the right axis denotes PO2 relative to the present atmospheric level. The black dots denote nitrogen isotope (δ15N) values according to Kipp et al. [64].

Figure 2. The phyletic pattern of ecologically relevant genes in anammox bacteria compared with non-anammox bacteria. The phylogenomic tree on the left was constructed with the 887 genomic sequences described in Supplementary Text, section 2.1. Solid circles at the nodes indicate ultrafast bootstrap values from 1,000 replicates. Planctomycetes genomes not used for comparative genomics analyses are collapsed into grey triangles, and the numbers of collapsed genomes are labelled next to the triangles. The target group and the reference group for comparative genomic analysis are enclosed in an orange and a blue box, respectively. For each genome, the genome completeness estimated by CheckM is visualized with a color strip and labelled beside the leaf nodes. The adjacent color strip represents the type of genomic sequence used in our study: metagenome-assembled genomes (MAGs) or whole-genome sequences (WGS) from either enriched culture samples or isolates. Filled and empty circles represent the presence and absence, respectively, of particular genes in the corresponding genomes. For gene clusters, a filled circle is shown only for genomes carrying at least half of the members of the cluster. The classifications of annotated genes are labelled above the gene names. Abbreviations: hzsCBA, hydrazine synthase subunits C, B and A; hdh, hydrazine dehydrogenase; hao, hydroxylamine dehydrogenase; nxrAB, nitrite oxidoreductase subunits A and B; nirK, copper-containing NO-forming nitrite reductase; nirS, cytochrome NO-forming nitrite reductase; nrfAH, ammonia-forming nitrite reductase subunits A and H; glnB, nitrogen regulatory protein P-II; nirC, nitrite transporter; NRT, nitrate/nitrite transporter; amt, ammonium transporter; kuste2805, 3603, 3605-3606, genes proposed by Rattray et al. [29] to be related to the ladderane synthesis pathway; cbiG, cobalt-precorrin 5A hydrolase; cbiD, cobalt-precorrin-5B (C1)-methyltransferase; cbiOMQ, cobalt/nickel transport system; AhpC, peroxiredoxin; Ccp, cytochrome c peroxidase; dfx, superoxide reductase; fprA, H2O-forming flavoprotein; CcsAB, cytochrome c maturation system; petB, cytochrome b subunit of the bc complex; petC, Rieske Fe-S protein; ahbABCD, heme biosynthesis; sat, sulfate adenylyltransferase; aprAB, adenylylsulfate reductase subunits A and B; fsr, sulfite reductase (coenzyme F420); higB-1, toxin; higA-1, antitoxin; mnhABCDEG, multicomponent Na+/H+ antiporter; fdhAB, formate dehydrogenase subunits A and B; nuo(A-N), NADH-quinone oxidoreductase; ndh(A-N), NAD(P)H-quinone oxidoreductase; nqrABCDEF, Na+-transporting NADH:ubiquinone oxidoreductase; rnfABCDEG, Na+-translocating ferredoxin:NAD+ oxidoreductase; atp(A-H), F-type H+-transporting ATPase; lacAZ, beta-galactosidase; melA, galA, alpha-galactosidase; ebgA, beta-galactosidase; galK, galactokinase; scrK, fructokinase; fruK, 1-phosphofructokinase; rhaB, rhamnulokinase; rbsK, ribokinase; araB, L-ribulokinase; xylB, xylulokinase; kdgK, 2-dehydro-3-deoxygluconokinase; GALK2, N-acetylgalactosamine kinase; cah, cephalosporin-C deacetylase; argE, acetylornithine deacetylase; nagA, N-acetylglucosamine-6-phosphate deacetylase; HDAC11, histone deacetylase 11.
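
The display rule for gene clusters stated in the Figure 2 legend (a filled circle only when at least half of the cluster members are found) can be written as a one-line check. The sketch below is illustrative only; the function name is hypothetical and the gene lists in the usage example are toy inputs.

```python
def cluster_is_present(detected_genes, cluster_members):
    """Return True if at least half of the cluster members were detected."""
    members = set(cluster_members)
    found = members & set(detected_genes)
    # Compare 2 * found against the cluster size to avoid fractional thresholds.
    return 2 * len(found) >= len(members)


if __name__ == "__main__":
    hzs = ["hzsC", "hzsB", "hzsA"]
    print(cluster_is_present(["hzsA", "hzsB"], hzs))  # two of three members detected
    print(cluster_is_present(["hzsA"], hzs))          # one of three members detected
```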

[Figure 1 graphic: dated phylogeny of anammox bacteria, non-anammox Planctomycetes, Verrucomicrobia and Cyanobacteria. Top axis: divergence time (Gyr). Color strips: genome completeness and genome type (WGS, MAGs). Node symbols: ultrafast bootstrap value (>95, 85-95, 65-85). Lower panel: pO2 (PAL) and δ15N (‰) versus age (Gyr).]

[Figure: phylogenomic tree of anammox bacteria and related Planctomycetes (tree scale: 1), with tips labeled by genome name and completeness (%), shown alongside gene columns grouped by function. Verrucomicrobia genomes (for example Akkermansia muciniphila and Opitutus terrae) also appear as tips.

Column group labels recoverable from the figure: Anammox; Nitrogen metabolism; transporter; formate dehydrogenase; biosynthesis of VB12; Carbohydrate utilization; Bioenergetic; Oxidative stress; ladderane; Iron & heme related; Sulfur-cycle.

Gene columns: amt, nirK, nirS, nrfA, nirC, nrfH, hdh, hao, nirB, glnB, NRT, nxrB, nxrA, hzsB, hzsA, hzsC, fdhA, fdhB, galK, cbiG, cbiD, cbiO, cbiM, cbiQ, lacA, lacZ, melA, galA, ebgA, scrK, rhaB, rbsK, araB, xylB, fruK, cah, kdgK, argE, nagA, GALK2, nuoABCDHIJKLMN, nqrABCDEF, ndhABCDEFGHIJKLMN, rnfABCDEG, atpABCDEFGH, HDAC11, dfx, AhpC, ahbAB, ahbC, ahbD, kuste2805, kuste3605, kuste3606, kuste3603, Ccp, fprA, CcsAB, bfr, fur, petB, petC, sat, aprA, aprB, fsr, mnhABCDEG.

Legend: Completeness; Genome Type (WGS, MAGs); Anammox Lineages (Basal lineage, Ca. Scalindua, Ca. Kuenenia, Ca. Jettenia, hzsCBA-less lineage, Ca. Brocadia); Ultrafast bootstrap value (>95, 85-95, 65-85). A collapsed clade of 689 genomes is labeled "Non-Anammox Planctomycetes for comparative analysis"; further collapsed clades contain 80, 8, 3, 4, and 3 genomes.]