bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Horizontal gene transfer of a unique nif island drives convergent evolution of free-living

2 N2-fixing 3

4 Jinjin Tao^, Sishuo Wang^, Tianhua Liao, Haiwei Luo*

5

6 Simon F. S. Li Marine Science Laboratory, School of Life Sciences and State Key Laboratory of

7 Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR 8 9 ^These authors contribute equally to this work. 10

11 *Corresponding author:

12 Haiwei Luo 13 School of Life Sciences, The Chinese University of Hong Kong 14 Shatin, Hong Kong SAR 15 Phone: (+852) 39436121 16 E-mail: [email protected] 17 18 Running Title: Free-living Bradyrhizobium evolution

19 Keywords: free-living Bradyrhizobium, nitrogen fixation, lifestyle, HGT

1

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

20 Summary

21 The alphaproteobacterial genus Bradyrhizobium has been best known as N2-fixing members that

22 nodulate legumes, supported by the nif and nod gene clusters. Recent environmental surveys

23 show that Bradyrhizobium represents one of the most abundant free-living bacterial lineages in

24 the world’s soils. However, our understanding of Bradyrhizobium comes largely from symbiotic

25 members, biasing the current knowledge of their ecology and evolution. Here, we report the

26 genomes of 88 Bradyrhizobium strains derived from diverse soil samples, including both

27 nif-carrying and non-nif-carrying free-living (nod free) members. Phylogenomic analyses of

28 these and 252 publicly available Bradyrhizobium genomes indicate that nif-carrying free-living

29 members independently evolved from symbiotic ancestors (carrying both nif and nod) multiple

30 times. Intriguingly, the nif phylogeny shows that all nif-carrying free-living members comprise a

31 cluster which branches off earlier than most symbiotic lineages. These results indicate that

32 horizontal gene transfer (HGT) promotes nif expansion among the free-living Bradyrhizobium

33 and that the free-living nif cluster represents a more ancestral version compared to that in

34 symbiotic lineages. Further evidence for this rampant HGT is that the nif in free-living members

35 consistently co-locate with several important genes involved in coping with oxygen tension

36 which are missing from symbiotic members, and that while in free-living Bradyrhizobium nif and

37 the co-locating genes show a highly conserved gene order, they each have distinct genomic

38 context. Given the dominance of Bradyrhizobium in world’s soils, our findings have implications

2

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

39 for global nitrogen cycles and agricultural research.

40 Introduction

41 Members of the slow-growing genus Bradyrhizobium constitute an important group of

42 [1, 2]. They symbiotically fix N2 with diverse legume tribes and are the predominant

43 symbionts of a wide range of nodulating legumes [3, 4]. Recent studies, however, have provided

44 accumulating evidence for the existence of close relatives of rhizobia including Bradyrhizobium

45 that do not carry nod genes which are essential for the establishment of nodulation, and thus are

46 not able to nodulate legumes [5]. Increasingly, Bradyrhizobium strains without nod have been

47 found to be particularly abundant in soils [6-8]. VanInsberghe et al [8] showed that

48 non-symbiotic Bradyrhizobium was the dominant bacterial lineage in North America coniferous

49 soils. A global atlas of the dominant soil-dwelling based upon 16S rRNA gene amplicon

50 sequencing indicated that the genus Bradyrhizobium was the most abundant bacterial lineage in

51 soils across the world [6]. A recent study suggested that ancestral lineages of Bradyrhizobium

52 might adapt to a free-living lifestyle, highlighting the importance of the free-living lifestyle in

53 the evolution of Bradyrhizobium [9].

54 Although the majority of N2 fixation in terrestrial ecosystems is generally thought to be

55 performed by symbiotic rhizobia [10-12], free-living N2-fixing bacteria may be important

56 contributors to the nitrogen budgets in a number of environments, for example soil ecosystems

57 that lack leguminous plants [13]. Free-living N2 fixation also occurs in understudied ecosystems

3

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

58 such as deep soil and canopy soil [14, 15], which may lead to the underestimation of its global

59 rates. Rhizobia are generally thought to be capable of N2 fixation only in nodules [16, 17]. The

60 exceptions are found in some members of Bradyrhizobium and Azorhizobium which were shown

61 to be able to fix N2 in the free-living state [18, 19]. Such cases in Bradyrhizobium were mostly

62 explored in photosynthetic members [18, 19], but have recently been found in other strains. For

63 instance, Bradyrhizobium sp. AT1, a non-symbiotic strain isolated from the root of sweet potato

64 was reported to fix N2 with a rate comparable to the photosynthetic strain Bradyrhizobium sp.

65 ORS278 under the molecular oxygen (O2) concentrations at 1-5% (for both) in the free-living

66 state [20]. Another example is Bradyrhizobium sp. DOA9, which harbors a nif cluster on its

67 chromosome and another on a symbiotic plasmid: while both could participate in N2 fixation

68 during symbiosis, only the chromosomal one participated in N2 fixation in the free-living state

69 [21].

70 Compared with the nodules, the condition of soils is harsh for N2 fixation for several

71 reasons. First, in contrast to being at nanomolar levels in nodules, the O2 concentration in soils

72 could be up to atmospheric levels [22], which is highly detrimental to N2 fixation, a process that

73 requires an anaerobic microenvironment [23]. Second, unlike symbiotic rhizobia, free-living

74 rhizobia lack readily available organic matter provided by the host plants in exchange for fixed

75 nitrogen. Last, free-living soil bacteria are frequently exposed to stresses like droughts, high

4

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

76 osmolarity, and high temperature. It is therefore imperative to investigate the strategies that

77 free-living N2-fixing Bradyrhizobium may use to deal with the harsh condition in soils.

78 Despite the potential ecological significance, free-living Bradyrhizobium, particularly

79 those able to fix N2, have been poorly characterized. To this end, we isolated 88 Bradyrhizobium

80 and five related strains from soils of soybean cropland, artificial park, forest, and grassland at

81 several geographic locations in China, and had their genomes sequenced. By building a

82 phylogenomic tree and reconstructing the ancestral lifestyles with additional 252 genomes where

83 42 are free-living Bradyrhizobium strains deposited in public databases, we revealed complex

84 lifestyle shifts along the evolutionary history of Bradyrhizobium. Notably, we showed that

85 horizontal gene transfer (HGT) of a unique nif island among free-living members may have

86 facilitated their transitions from symbiotic lifestyle and adaptation to soil habitats.

87

88 Methods and Materials

89 Detailed methods are described in Supplementary Text S1. In brief, we collected soil

90 samples from five different sites in China (Fig. S1A) that cover several ecosystem types: soybean

91 cropland (Heihe and Lvliang), artificial park (Shenzhen), undeveloped forest (Lanzhou), and

92 grassland (Hefei) (see Fig. S1A and Data Set S1 for details). Five different media were applied to

93 retrieve Bradyrhizobium from the samples. We obtained 93 isolates that likely resembled

94 Bradyrhizobium, and had their genomes sequenced with the Hiseq X platform (Illumina),

5

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

95 followed by genome assembly and gene identification using SPAdes v3.10.1 [24] and Prokka

96 v1.12 [25], respectively. Completeness of these genomes was calculated using CheckM v1.0.7

97 [26] with default parameters.

98 The phylogenomic tree of Bradyrhizobium was constructed using IQ-Tree v1.6.2 [27]

99 with the 93 strains isolated in the current study and 252 genomes deposited in the NCBI

100 Genbank database based on amino acid sequences of 123 shared single-copy genes identified by

101 OrthoFinder v2.2.7 [28]. Bradyrhizobium members were divided into three lifestyle types based

102 on the presence or absence of nod and nif genes (Data Sets S1 and S2). Strains that possess

103 nifBDEHKN and nodABCIJ genes (at most one missing gene allowed; the same thereafter) were

104 classified as symbiotic (Sym), strains possessing only nifBDEHKN genes were classified as

105 free-living N2-fixing (FLnif), and those lacking both nod and nif genes were classified as

106 free-living non-N2-fixing (FLnonnif). The lifestyle of ancestral nodes in the phylogenomic tree was

107 inferred using Mesquite v3.61 [29] by the maximum parsimony reconstruction method, which

108 infers the ancestral lifestyles by minimizing the number of steps of lifestyle change along the

109 phylogenomic tree. Similar to a recent study [9], we did not apply the methodologically complex

110 maximum likelihood method because the frequent lifestyle transitions across short branches in

111 the Bradyrhizobium phylogeny likely leads to overestimates of the transition rate and hence

112 inaccurate reconstruction of ancestral lifestyles.

6

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

113 Metadata of 4,958 sequence files (runs) were retrieved from the NCBI Sequence Read

114 Archive (SRA) using the term “nifH metagenome” (last accessed in December 2020). Raw data

115 were downloaded using the SRA Toolkit (https://github.com/ncbi/sra-tools) and were further

116 processed with QIIME2 [30]. We applied evolutionary placement methods to assign the short

117 sequence reads to the four phylogenetically distinguishable types of nif clusters (i.e., FL, PB,

118 Sym and SymBasal; see Fig. 2) using PaPaRa v2.5 [31] and EPA-ng v0.3.8 [32]. The normalized

119 abundance of each type of Bradyrhizobium nifH genes was calculated as the number of reads

120 assigned to the corresponding nifH type divided by the total number of reads in the amplicon

121 sequencing data set assigned to Bradyrhizobium.

122

123 Results and discussion

124 Newly sequenced free-living strains expand ecological diversity of Bradyrhizobium

125 We isolated and sequenced the genomes of 93 Bradyrhizobium strains initially identified

126 by 16S rRNA gene (five were later shown to belong to Afipia based on phylogenomic

127 construction; see below) isolated from diverse types of soils, i.e., lateritic red earths, black soils,

128 brown coniferous forest soils, cinnamon soils, and yellow-brown earths in different sites of

129 China (Figure S1). Among these isolates, 81 were non-symbiotic strains, as evidenced by the

130 lack of the nodulation-determining nod genes, four of which carried nif genes. The remaining

131 twelve were potential symbiotic strains that were likely released from decaying nodules: 11 of

7

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

132 them carried nod genes and the other one phylogenetically clustered with photosynthetic

133 members which use a nod-independent strategy for nodulation [33] (Fig. 1); all of them were

134 isolated from the rhizosphere or bulk soil of leguminous plants (Data Set S1). Together with the

135 215 publicly available Bradyrhizobium genomes, our data set included 167 symbiotic (Sym), 22

136 free-living N2-fixing (FLnif), and 119 free-living non-N2-fixing (FLnonnif) strains (Data Sets S1

137 and S2).

138 Despite a significantly expanded data set, all the Bradyrhizobium isolates fell into seven

139 phylogenetic supergroups (Fig. 1) named in a recent study [2]. These were the Soil 1, Soil 2, B,

140 jicamae, B. elkanii, Kakadu, Photosynthetic, and B. japonicum supergroups (Fig. 1). Among the

141 93 newly isolated strains, 88 contributed to all these supergroups except for the Soil 2 and

142 Kakadu supergroups, and the remaining five strains were related to the genus Afipia in the

143 outgroup. The newly isolated strains were phylogenetically nested within those that have

144 publicly available genomes, except for the 13 strains in the B. jicamae supergroup which formed

145 a sister clade to the previously sequenced strains of that supergroup (Fig. 1). The phylogenetic

146 distribution of our newly sequenced genomes indicates a high diversity of soil-dwelling

147 Bradyrhizobium.

148 Our ancestral lifestyle reconstruction analysis inferred that the last common ancestor

149 (LCA) of Bradyrhizobium adapted to a free-living lifestyle, consistent with our prior study based

150 on a very limited taxon sampling [9] (Fig. 1). This was evident by the pattern that a vast majority

8

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

151 of the outgroup lineages as well as all members of the basal Bradyrhizobium clade (Soil 1 and

152 Soil 2 supergroups) were non-symbiotic lineages. The symbiotic lifestyle independently

153 originated eight times based on the sampled lineages, including three major origins: one in the B.

154 jicamae supergroup (FLnonnif-Sym1), one in the B. elkanii supergroup (FLnonnif-Sym3), and

155 another one at the LCA of the Kakadu, Photosynthetic, and B. japonicum supergroups

156 (FLnonnif-Sym4). These transitions occurred in relatively deep phylogenetic positions, suggesting

157 that they represent evolutionarily ancient events. Free-living lifestyles originated from symbiotic

158 strains 23 times: 13 of these events gave rise to non-N2-fixing strains and 10 gave rise to

159 N2-fixing members (Fig. 1). Transitions from symbiotic to free-living lifestyle (both Sym-FLnif

160 and Sym-FLnonnif) likely occurred recently as indicated by their shallow phylogenetic positions

161 (Fig. 1). Hence, the loss of the ability to nodulate legume plants occurred more frequently than

162 the reverse process in Bradyrhizobium, consistent with a prior study based on the analysis of

163 phenotypic markers [7]. This pattern remained when we combined FLnif and FLnonnif as a single

164 free-living lifestyle (Fig. S2). Furthermore, the free-living N2-fixing lifestyle originated from

165 free-living non-N2-fixing ancestors four times (Fig. 1). The infrequency of this type of lifestyle

166 transition does not necessarily mean that this type of transition is rare, as it may result from a

167 potential bacterial cultivation bias against those carrying the nif cluster under aerobic conditions

168 with readily available N sources.

169

9

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

170 HGT of the nif island drives the expansion of the free-living N2-fixing lifestyle

171 Free-living N2-fixing Bradyrhizobium is of much interest, because these Bradyrhizobium

172 members may play a previously unrecognized role in the N cycle in soils, especially given the

173 reportedly high abundance of Bradyrhizobium in soils [6]. We asked where the N2-fixing (nif)

174 genes in free-living Bradyrhizobium come from and whether the nif genes differ between

175 free-living and symbiotic members. Since most free-living strains evolved from their symbiotic

176 ancestors, one possibility is that free-living N2-fixing strains inherited the same version of nif

177 genes from their nodulating ancestors. Alternatively, free-living N2-fixing Bradyrhizobium might

178 have lost the entire symbiosis island which includes both nod and nif clusters among other genes,

179 and instead recruited a new set of nif genes from an external donor by HGT. To test these

180 competing hypotheses, we constructed the phylogenetic tree of the nif cluster based on the

181 concatenated alignment of the genes nifABDEHKNVX for those from the Bradyrhizobium and

182 several other rhizobia including Azorhizobium, Rhizobium, Sinorhizobium and Mesorhizobium

183 from , and Burkholderia and Cupriavidus from Betaproteobacteria (see

184 Materials and methods).

185 The nif genes of all Bradyrhizobium formed a monophyletic group (Fig. S3), indicating a

186 single origin of nif in this genus. Within the Bradyrhizobium clade, the nif gene tree (Fig. 2)

187 displayed substantial topological incongruence with the species tree (Fig. 1), suggesting different

188 evolutionary histories between nif genes and the core genome. In the Bradyrhizobium part of the

10

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

189 nif gene tree, the Photosynthetic Bradyrhizobium supergroup together with a strain from the B.

190 jicamae supergroup (SZCCT0283) branched first, followed by a few strains of the B. japonicum

191 supergroup (Fig. 2). The latter consisted of two clades, one exclusively constituted by symbiotic

192 strains (SymBasal; Fig. 2), some of which were early-split lineages of the B. japonicum

193 supergroup (Fig. 1), and the other mostly composed of free-living N2-fixing members (Cluster

194 FL) (Fig. 2). Nif genes of the B. jicamae and B. elkanii supergroups, which together with B.

195 japonicum comprised the three largest clades in the nif gene tree (Fig. 2), were nested within the

196 B. japonicum supergroup, suggesting that they were horizontally transferred from the latter.

197 Despite much inconsistency between the species and nif trees, but because the basal positions in

198 the nif phylogeny mostly included members from Photosynthetic, Kakadu, and B. japonicum

199 supergroups, it is likely that the nif cluster in Bradyrhizobium was first acquired by the LCA of

200 the Photosynthetic, Kakadu and B. japonicum supergroups. This is consistent with the ancestral

201 lifestyle reconstruction shown in Fig. 1 which inferred a symbiotic ancestor of the corresponding

202 ancestral node (FLnonnif-Sym4).

203 Strikingly, despite a scattered distribution in the species phylogeny (Fig. 1), 17

204 free-living N2-fixing strains grouped on the nif phylogeny (Cluster FL in Fig. 2). Also grouping

205 in Cluster FL were the nif genes from three symbiotic strains (DOA9, CCBAU 53363 and p9-20).

206 These three strains encoded another nif cluster located in the symbiotic plasmid (DOA9) or

207 symbiosis island on the chromosome (CCBAU 53363 and p9-20), which grouped with other

11

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

208 symbiotic strains in the nif phylogeny (Fig. 2). Clearly, the grouping of the nif genes belonging

209 to Cluster FL was the result of HGT. This point was further evidenced by the conserved genomic

210 context around nif genes in Cluster FL (see the next section). Moreover, the geographic locations

211 of isolates belonging to Cluster FL spans three continents (North America, South America and

212 Asia) (Fig. 1B). Remarkably, as introduced above, two strains (Bradyrhizobium sp. AT1 and

213 Bradyrhizobium sp. DOA9) whose nif genes from Cluster FL in the nif phylogeny were

214 previously shown to perform N2 fixation under micro-aerobic conditions (marked by a box

215 around the strain name in Fig. 2) [20, 21]. The above evidence together indicates that nif genes

216 from Cluster FL, which were mainly comprised by free-living Bradyrhizobium, are likely to be a

217 special group of nif genes that specifically take part in free-living N2 fixation.

218

219 A unique nif island likely contributes to the oxygen tolerance of free-living N2-fixing

220 Bradyrhizobium

221 We sought more evidence for HGT of the nif cluster by analyzing genes surrounding it.

222 This led to the identification of a ~50 kb genomic region containing nif genes (nif island)

223 conserved among all members of Cluster FL (Fig. 3A). The regions flanking the nif island were

224 not conserved across most free-living strains (Fig. S4), providing further evidence for HGT of

225 the nif island. Note that four strains of Cluster FL (BM-T, Y-H1, R2.2-H and W) shared highly

226 conserved genomic contexts (Fig. S4). These strains were closely related and comprised a

12

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

227 monophyletic group in the phylogenomic tree (marked by a box around the strain names in Fig.

228 S5), indicating that the shared genomic context of their nif islands was inherited from their LCA.

229 Additionally, the conserved genomic context of the nif island among strains from the

230 Photosynthetic Bradyrhizobium supergroup (Fig. S4) was consistent with their vertical descent

231 shown in the species phylogeny (Fig. 1). We further compared this unique nif island with the nif

232 clusters of other strains in this genus (Fig. 3).

233 The upstream boundary of the nif island in Cluster FL Bradyrhizobium (Fig. 3A) was

234 marked by nifA, which codes for a transcriptional activator required for the expression of nif

235 operons [34]. Adjacent to nifA was a suf cluster, which is responsible for synthesizing and

236 inserting Fe-S clusters into nitrogenase [35]. The core genes encoding the nitrogenase were

237 located downstream of the suf cluster, and were mainly separated into two operons, nifDKENX

238 and nifHQ. FixABCX which encode a membrane complex involved in electron transfer to

239 nitrogenase were located downstream of the nif genes [36]. Extensively studied in rhizobia, the

240 role of fixABCX in N2 fixation has been suggested for free-living diazotrophs [37]. At the

241 downstream boundary of the nif island was a mod cluster encoding molybdate transporter.

242 In general, the nif island conserved in free-living Bradyrhizobium was similar to that in

243 photosynthetic strains (Fig. 3A), as also noted in previous studies [38, 39], and that in a newly

244 sequenced soil-dwelling strain (SZCCT0283) from the B. jicamae supergroup (Fig. 3A),

245 consistent with the evolutionary relatedness of these nif islands (Fig. 2). A few differences were

13

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

246 that those from Cluster PB carried an additional copy of nifH but fewer genes in the mod operon

247 compared with those from Cluster FL (Fig. 3). Further, the early-split positions of Cluster FL and

248 PB nif genes in the phylogeny (Fig. 2) suggest an ancient evolutionary origin. Because nif genes

249 in the symbiosis island of most symbiotic strains branched later than those from the nif island

250 (Fig. 2), presumably, the nif island found in photosynthetic and free-living Bradyrhizobium

251 represent the ancestral status of symbiotic Bradyrhizobium, and it was later expanded to a

252 symbiosis island by integrating nod as well as other genes in non-photosynthetic symbiotic

253 Bradyrhizobium.

254 The nif clusters of the nif island conserved in Cluster PB and FL (Fig. 3A) shared many

255 genes with those on the symbiosis island (Fig. 3B), such as nifDKENBH and fixBCX, agreeing

256 with their essential roles in N2 fixation. However, they also displayed notable differences.

257 Apparently, the gene arrangement of the free-living nif island was more compact and more

258 conserved across different strains (Fig. 3). In contrast, in the nif cluster from the symbiosis island,

259 several genes involved in N2 fixation like nifA and fixA are often linked to nod genes and

260 separated from the nif cluster [40]. The nif island in free-living and photosynthetic members

261 additionally carried a nifV gene, which encodes homocitrate synthase to synthesize homocitrate,

262 a component of the Fe-Mo cofactor of nitrogenase [41]. The vast majority of symbiotic rhizobia

263 (not limited to Bradyrhizobium) do not harbor nifV [41], which is known to be compensated for

264 by the homocitrate provided by their legume host during the symbiotic state [42]. Some studies

14

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

265 attributed the ability of free-living N2 fixation to nifV, because this gene was detected in the

266 genomes of the few rhizobial strains able to perform N2 fixation in the free-living state [41, 43,

267 44]. Here, we found that nifV was present in around half of Bradyrhizobium genomes, not

268 restricted to the free-living and photosynthetic strains (Fig. 2). Phylogenetic analysis suggests

269 that nifV was presumably present at the LCA of Bradyrhizobium, but lost within the B.

270 japonicum supergroup (Node 1 in Fig. 2; Fig. S6). Further, early studies showed that

271 Bradyrhizobium sp. CB756, a strain carrying nifV on the symbiosis island (Fig. 3B), fixed N2 at a

272 rate more than a magnitude lower than the nif island-carrying photosynthetic Bradyrhizobium

273 (strains ORS310 and ORS322) and Azorhizobium caulinodans ORS571 in agar culture under air

274 or O2 concentrations resembling soil environments [19, 45]. This hints that the absence of nifV

275 might not be the only reason accounting for the decreased efficiency of free-living N2 fixation of

276 certain rhizobia strains, at least at higher O2 concentrations.

277 We further identified that several genes involved in O2 tolerance and stress response were

278 also specific to the nif island of free-living and photosynthetic members (Fig. 3), including glbO,

279 hspQ, and the mod operon. Specifically, glbO encodes a 17-kDa group-II truncated hemoglobin

280 that binds O2 with high affinity [46, 47]. The expression of truncated hemoglobin genes is

281 induced upon exposure to oxidative stress generated by H2O2 and hypoxia [48]. The product of

282 glbO might therefore play an important role in protecting nitrogenase from O2 inactivation,

283 endowing the free-living Bradyrhizobium with higher tolerance to O2. HspQ encodes a

15

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

284 chaperone protein, which combats the detrimental effects on proteins caused by stressors such as

285 increased temperatures, oxidative stress, and heavy metals [49]. The mod operon transports

286 extracellular molybdate with high affinity [50]. We also observed copy number differences

287 between the free-living and symbiotic members, such as the nifZ which has a function in the

288 maturation of the Fe-Mo protein nitrogenase [51], but whether the additional copies contribute to

289 adaptation to the free-living lifestyle is still unknown. In summary, we predict that in addition to

290 nifV, the genes involved in oxygen tolerance and stress response in the nif island (e.g., glbO,

291 hspQ and mod) may play a role in the adaptation to free-living lifestyle. Intriguingly, the nif

292 island from Azorhizobium, which is able to perform free-living N2 fixation, shared some of the

293 above genes like nifV, glbO and mod, and a generally similar gene arrangement with the

294 free-living nif island in Bradyrhizobium (Fig. 3C).

295 Note that, for simplicity, we defined free-living strains based on the absence of nod in the

296 current study. Four free-living N2-fixing strains, namely B. mercantei SEMIA 6399, B.

297 yuanmingense P10 130, B. liaoningense CCBAU 83689 and B. mercantei SEMIA 6399, were

298 originally isolated from nodules (Data Set S2). Interestingly, except the last isolate whose nif

299 genes were found in Cluster FL, the nif genes of the other three were embedded in symbiotic

300 lineages in the nif phylogeny (Fig. 2), indicating that the nif genes were inherited from

301 nodulating ancestors. These three strains were also found to carry genes encoding components of

302 T3SS and the effectors it injects, as evident by the genes gained or lost shown in Fig. S7C

16

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

303 (Sym-FLnonnif transitions 2, 5 and 6; Text S2), suggesting that they are likely nodule-inhabiting

304 strains that do not nodulate but with the potential to interact with plants. This is in contrast to

305 other free-living N2-fixing strains where symbiosis genes were commonly lost during lifestyle

306 transition (Sym-FLnonnif transitions 1, 3, 4 and 7-10 in Fig. S7C; Text S2). In general, the gene

307 arrangement of nif genes and surrounding regions for these strains were similar to those of

308 symbiotic strains, particularly when compared with their closest relatives, but the degree of

309 similarity varied: while P10 130 showed a very high similarity to its closest nod-carrying relative

310 CCGE-LA001 in the genomic context of the nif genes, SEMIA 6399 showed a limited similarity

311 to its closest nod-carrying relative NAS96.2, likely due to a recent insertion in SEMIA 6399 of

312 several genes between nifZ and nifH (Fig. 3B). We did not analyze CCBAU 83689 as the contig

313 carrying nif genes was too short. The results indicate the different origins of the free-living

314 N2-fixing Bradyrhizobium: while some were derived from their symbiotic ancestors, most

315 analyzed strains of this lifestyle originated via HGT of the free-living nif island (Fig. 3). It would

316 be interesting to conduct experiments to assess the detailed functions of the genes on the nif

317 island under O2 concentrations resembling those of soil environments in future.

318

319 Widespread distribution of the unique free-living nif island in the environment

320 We further asked how prevalent the nif island specific to free-living Bradyrhizobium is in

321 the environments and how it compares to the nif genes of symbiotic members. We therefore

17

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

322 assessed the normalized abundances of nifH, the most commonly used marker gene to identify

323 N2-fixing microbes [52], that potentially correspond to free-living (Cluster FL in Fig. 2),

324 photosynthetic (Cluster PB in Fig. 2), and symbiotic Bradyrhizobium (Sym and SymBasal in Fig.

325 2) in 4,958 amplicon sequencing data sets (see Materials and methods). We divided these data

326 sets into four categories according to the environments where the samples were collected: soils

327 (e.g., bulk soil, rhizosphere and freshwater lake sediment), plants (root and phyllosphere

328 samples), marine (including seawater, marine sediment, coral and mangrove samples), and others

329 (e.g., bioreactor and unidentified samples) (Data Set S3). Amplicon reads were assigned to

330 different types of Bradyrhizobium nifH according to their phylogenetic placements (see

331 Materials and methods).

332 Amplicon analysis showed that Bradyrhizobium constitute an important group of

333 diazotrophic communities in different habitats, and the relative abundance of Bradyrhizobium

334 was 7.52±18.57%, 6.67±12.75%, 17.39±18.32, 14.54±18.85% in marine, soil, plant, and other

335 environmental types, respectively (Data Set S3). The nifH that resembles those belonging to

336 Cluster FL displayed the highest normalized abundance in samples from marine, soil, and

337 habitats classified as “others” (Fig. 4; p < 0.001 for all comparisons, paired

338 Wilcoxon-Mann-Whitney test). Specifically, Cluster FL accounted for 56±47%, 46±38%, and 51

339 ±39% (mean±standard deviation) of reads that were assigned to Bradyrhizobium in these three

340 types of habitats (marine, soil, and others), respectively. In particular, for marine samples, nearly

18

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

341 all Bradyrhizobium nifH were assigned to Cluster FL (Data Set S3). For the data sets of plant

342 samples, while symbiotic nifH displayed the highest abundance (Fig. 4), nifH assigned to Cluster

343 FL also showed a considerably high normalized abundance in plants (29±39%), and was even

344 the dominant ecotype in diazotrophic communities associated with some Asteraceae species and

345 the roots of several perennial grass species (Data Set S3). Hence, though the presence of

346 free-living Bradyrhizobium-specific nifH does not necessarily indicate the occurrence of a

347 complete nif island, it is tempting to suggest that Bradyrhizobium members carrying the

348 free-living nif island i) were distributed in a variety of environments (Fig. S8), and ii) were in

349 general more abundant than that of symbiotic strains in the genus in most habitats except plants,

350 hinting at a previously unrecognized role of their nif island in the free-living lifestyle.

351 Nevertheless, it should be noted that Cluster FL nifH also displayed large variations in regard to

352 the normalized abundance between different diazotrophic communities of the same type of

353 habitat despite the general trend described above.

354

355 Caveats and concluding remarks

356 Adding the 93 soil-dwelling strains greatly expanded our knowledge of the ecology and

357 evolution of Bradyrhizobium. However, because of the bias towards symbiotic strains in

358 previous genomics research and the difficulties in cultivating those dwelling soil environments,

359 the free-living strains analyzed here may underestimate the real diversity of wild diazotrophic

19

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

360 Bradyrhizobium that adapt to a free-living lifestyle. For example, prior studies have shown that

361 Bradyrhizobium is the dominant genus in the diazotrophic communities in many soil ecosystems,

362 most of which have not been sampled for Bradyrhizobium cultivation however [53, 54]. Of the

363 limited types of soil ecosystems sampled here, the Bradyrhizobium strains we collected are not

364 necessarily representative of the wild populations in these ecosystems owing to the potential

365 cultivation bias. For example, our cultivation process was under aerobic condition and the

366 cultivation medium was rich in fixed nitrogen, conditions disfavoring the isolation of

367 diazotrophic members. Therefore, it is not clear whether important free-living diazotrophic

368 lineages occupying distinct phylogenetic positions are missing and how this complements the

369 current conclusion we reached. Another caveat to some of our findings is that some of our

370 analyses built on an over-simplified classification of Bradyrhizobium lifestyles, which was based

371 on the presence/absence of certain genes like nif and nod. In fact, isolates with a symbiosis island

372 could be ineffective (i.e., those form nodules but fix little N2 within nodules) [55]. Likewise,

373 whether the free-living strains possessing the nif island could perform N2 fixation, and if they

374 could, the N2 fixation efficiency under different O2 concentrations, remains to be studied.

375 In spite of the above caveats, our study reveals an interesting pattern of convergent

376 evolution of free-living N2-fixing Bradyrhizobium from their symbiotic ancestors. Though the

377 nested phylogenetic positions within symbiotic lineages (Fig. 1) might leave an impression that

378 free-living N2-fixing members could have inherited the nif genes from their symbiotic ancestors,

20

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

379 we provided compelling evidence that this is unlikely the case. Rather, it is the HGT of a

380 conserved nif island, which presumably represents the ancestral state of all types of nif found in

381 the Bradyrhizobium, that drives the independent transitions from symbiotic to free-living

382 N2-fixing lineages. This serves as a classic example of convergent shifts in lifestyle driven by

383 HGT of certain genes in bacteria, which, although mostly explored in pathogens or symbionts in

384 prior studies [56-58], may play a prominent role in free-living bacteria. It further hints that the

385 symbiosis island common to the vast majority of symbiotic Bradyrhizobium members could be

386 derived from the ancestral nif island potentially by recruiting symbiosis genes. This may be

387 different from the traditional view that the symbiosis island, including nif, nod and T3SS among

388 other genes, was acquired in its entirety to enable ancestral interactions between Bradyrhizobium

389 and legumes [9, 59]. Given the global dominance of Bradyrhizobium in the soil microbiota, even

390 a small proportion of them being diazotrophic members could potentially bring a considerable

391 amount of fixed nitrogen to the bulk soils and other nitrogen-limited terrestrial ecosystems. Our

392 results therefore have important implications for understanding terrestrial nitrogen cycles.

393 Because one of the major aims of modern agricultural research is to transfer the N2 fixation

394 ability to non-legume crops [60, 61] , the capacity of certain free-living Bradyrhizobium to fix N2

395 and their association with non-leguminous plants may imply potential application in agriculture.

396

397 Acknowledgements

21

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

398 We thank Xiaoyuan Feng for help with genome assembly. We appreciate Shuang Wang

399 and Jie Liu for providing soil samples and Jianjun Xu for assisting in sample collection. We are

400 grateful to Xingqin Lin for her helpful suggestions in medium design. We also thank Qin Li, Yan

401 Li, Xiaojun Wang, Xiao Chu, Hao Zhang and Danli Luo for their helpful discussion. This work

402 was supported by the Hong Kong Research Grants Council Area of Excellence Scheme

403 (AoE/M-403/16).

404

405 Data and codes availability

406 The genomic sequences of the 88 sequenced Bradyrhizobium and five Afipia strains have

407 been submitted to the NCBI databases GenBank (accessions: PRJNA698083), which will be

408 made publicly available upon the acceptance of the manuscript. The Python [62] and Ruby [63]

409 codes used for phylogenomic and comparative genomics analyses are deposited in the online

410 repository https://github.com/luolab-cuhk/Bradyrhizobium-Nif-HGT.

411

412 Competing interests

413 The authors declare no competing interests in relation to this work.

414

22

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

415 References

416 1. Ormeno-Orrillo E, Martinez-Romero E. A genomotaxonomy view of the Bradyrhizobium

417 genus. Frontiers in microbiology 2019; 10: 1334.

418 2. Avontuur JR, Palmer M, Beukes CW, Chan WY, Coetzee MPA, Blom J, et al.

419 Genome-informed Bradyrhizobium taxonomy: where to from here? Systematic and

420 Applied Microbiology 2019; 42: 427-439.

421 3. Andrews M, Andrews ME. Specificity in legume-rhizobia symbioses. International

422 journal of molecular sciences 2017; 18: 705.

423 4. Parker MA. The spread of Bradyrhizobium lineages across host legume clades: from

424 Abarema to Zygia. Microb Ecol 2015; 69: 630-640.

425 5. Garrido-Oter R, Nakano RT, Dombrowski N, Ma KW, McHardy AC, Schulze-Lefert P,

426 et al. Modular Traits of the Rhizobiales Root Microbiota and Their Evolutionary

427 Relationship with Symbiotic Rhizobia. Cell Host & Microbe 2018; 24: 155-167.

428 6. Delgado-Baquerizo M, Oliverio AM, Brewer TE, Benavent-Gonzalez A, Eldridge DJ,

429 Bardgett RD, et al. A global atlas of the dominant bacteria found in soil. Science 2018;

430 359: 320-+.

431 7. Hollowell AC, Regus JU, Gano KA, Bantay R, Centeno D, Pham J, et al. Epidemic

432 Spread of Symbiotic and Non-Symbiotic Bradyrhizobium Genotypes Across California.

433 Microbial Ecology 2016; 71: 700-710.

23

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

434 8. VanInsberghe D, Maas KR, Cardenas E, Strachan CR, Hallam SJ, Mohn WW.

435 Non-symbiotic Bradyrhizobium ecotypes dominate North American forest soils. ISME J

436 2015; 9: 2435-2441.

437 9. Wang S, Meade A, Lam H, Luo H. Evolutionary Timeline and Genomic Plasticity

438 Underlying the Lifestyle Diversity in Rhizobiales. Msystems 2020: 5(4).

439 10. Peoples MB, Craswell ET. Biological Nitrogen-Fixation - Investments, Expectations and

440 Actual Contributions to Agriculture. Plant and Soil 1992; 141: 13-39.

441 11. Cleveland CC, Townsend AR, Schimel DS, Fisher H, Howarth RW, Hedin LO, et al.

442 Global patterns of terrestrial biological nitrogen (N-2) fixation in natural ecosystems.

443 Global Biogeochemical Cycles 1999; 13: 623-645.

444 12. Herridge DF, Peoples MB, Boddey RM. Global inputs of biological nitrogen fixation in

445 agricultural systems. Plant and Soil 2008; 311: 1-18.

446 13. Vitousek PM, Menge DN, Reed SC, Cleveland CC. Biological nitrogen fixation: rates,

447 patterns and ecological controls in terrestrial ecosystems. Philos Trans R Soc Lond B Biol

448 Sci 2013; 368: 20130119.

449 14. Reed SC, Cleveland CC, Townsend AR. Relationships among phosphorus, molybdenum

450 and free-living nitrogen fixation in tropical rain forests: results from observational and

451 experimental analyses. Biogeochemistry 2013; 114: 135-147.

24

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

452 15. Matson AL, Corre MD, Burneo JI, Veldkamp E. Free-living nitrogen fixation responds to

453 elevated nutrient inputs in tropical montane forest floor and canopy soils of southern

454 Ecuador. Biogeochemistry 2015; 122: 281-294.

455 16. Lee KB, De Backer P, Aono T, Liu CT, Suzuki S, Suzuki T, et al. The genome of the

456 versatile nitrogen fixer Azorhizobium caulinodans ORS571. BMC genomics 2008; 9:

457 1-14.

458 17. Dreyfus B, Garcia JL, Gillis M. Characterization of Azorhizobium-Caulinodans Gen-Nov,

459 Sp-Nov, a Stem-Nodulating Nitrogen-Fixing Bacterium Isolated from Sesbania-Rostrata.

460 International Journal of Systematic Bacteriology 1988; 38: 89-98.

461 18. Dreyfus BL, Elmerich C, Dommergues YR. Free-Living Rhizobium Strain Able to Grow

462 on N-2 as the Sole Nitrogen-Source. Applied and Environmental Microbiology 1983; 45:

463 711-713.

464 19. Alazard D. Nitrogen-Fixation in Pure Culture by Rhizobia Isolated from Stem Nodules of

465 Tropical Aeschynomene Species. Fems Microbiology Letters 1990; 68: 177-182.

466 20. Terakado-Tonooka J, Fujihara S, Ohwaki Y. Possible contribution of Bradyrhizobium on

467 nitrogen fixation in sweet potatoes. Plant and Soil 2013; 367: 639-650.

468 21. Wongdee J, Boonkerd N, Teaumroong N, Tittabutr P, Giraud E. Regulation of nitrogen

469 fixation in Bradyrhizobium sp. strain DOA9 involves two distinct NifA regulatory

25

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

470 proteins that are functionally redundant during symbiosis but not during free-living

471 growth. Frontiers in microbiology 2018; 9: 1644.

472 22. Hunt S, Layzell DB. Gas-Exchange of Legume Nodules and the Regulation of

473 Nitrogenase Activity. Annual Review of Plant Physiology and Plant Molecular Biology

474 1993; 44: 483-511.

475 23. Gallon JR. Reconciling the Incompatible - N-2 Fixation and O-2. New Phytologist 1992;

476 122: 571-609.

477 24. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes:

478 a new genome assembly algorithm and its applications to single-cell sequencing. J

479 Comput Biol 2012; 19: 455-477.

480 25. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014; 30:

481 2068-2069.

482 26. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the

483 quality of microbial genomes recovered from isolates, single cells, and metagenomes.

484 Genome Res 2015; 25: 1043-1055.

485 27. Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective

486 stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol

487 2015; 32: 268-274.

26

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

488 28. Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative

489 genomics. Genome biology 2019; 20: 1-14.

490 29. Maddison WP, Maddison DR (2019). Mesquite: A modular system for evolutionary

491 analysis. Version 3.61. 2019.

492 30. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al.

493 QIIME allows analysis of high-throughput community sequencing data. Nature Methods

494 2010; 7: 335-336.

495 31. Berger SA, Stamatakis A. Aligning short reads to reference alignments and trees.

496 Bioinformatics 2011; 27: 2068-2075.

497 32. Barbera P, Kozlov AM, Czech L, Morel B, Darriba D, Flouri T, et al. EPA-ng: Massively

498 Parallel Evolutionary Placement of Genetic Sequences. Systematic Biology 2019; 68:

499 365-369.

500 33. Giraud E, Moulin L, Vallenet D, Barbe V, Cytryn E, Avarre JC, et al. Legumes

501 symbioses: Absence of Nod genes in photosynthetic bradyrhizobia. Science 2007; 316:

502 1307-1312.

503 34. Dixon R, Kahn D. Genetic regulation of biological nitrogen fixation. Nature Reviews

504 Microbiology 2004; 2: 621-631.

505 35. Outten FW, Djaman O, Storz G. A suf operon requirement for Fe-S cluster assembly

506 during iron starvation in Escherichia coli. Molecular Microbiology 2004; 52: 861-872.

27

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

507 36. Edgren T, Nordlund S. The fixABCX genes in Rhodospirillum rubrum encode a putative

508 membrane complex participating in electron transfer to nitrogenase. Journal of

509 Bacteriology 2004; 186: 2052-2060.

510 37. Ledbetter RN, Costas AMG, Lubner CE, Mulder DW, Tokmina-Lukaszewska M, Artz

511 JH, et al. The Electron Bifurcating FixABCX Protein Complex from Azotobacter

512 vinelandii: Generation of Low-Potential Reducing Equivalents for Nitrogenase Catalysis.

513 Biochemistry 2017; 56: 4177-4190.

514 38. Okubo T, Piromyou P, Tittabutr P, Teaumroong N, Minamisawa K. Origin and Evolution

515 of Nitrogen Fixation Genes on Symbiosis Islands and Plasmid in Bradyrhizobium.

516 Microbes and Environments 2016; 31: 260-267.

517 39. Okubo T, Tsukui T, Maita H, Okamoto S, Oshima K, Fujisawa T, et al. Complete

518 Genome Sequence of Bradyrhizobium sp S23321: Insights into Symbiosis Evolution in

519 Soil Oligotrophs. Microbes and Environments 2012; 27: 306-315.

520 40. Lamb JW, Hennecke H. In Bradyrhizobium-Japonicum the Common Nodulation Genes,

521 Nodabc, Are Linked to Nifa and Fixa. Molecular & General Genetics 1986; 202:

522 512-517.

523 41. Nouwen N, Arrighi JF, Cartieaux F, Chaintreuil C, Gully D, Klopp C, et al. The role of

524 rhizobial (NifV) and plant (FEN1) homocitrate synthases in

28

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

525 Aeschynomene/photosynthetic Bradyrhizobium symbiosis. Scientific reports 2017; 7:

526 1-10.

527 42. Hakoyama T, Niimi K, Watanabe H, Tabata R, Matsubara J, Sato S, et al. Host plant

528 genome overcomes the lack of a bacterial gene for symbiotic nitrogen fixation. Nature

529 2009; 462: 514-517.

530 43. Agarwal AK, Keister DL. Physiology of Ex-Planta Nitrogenase Activity in

531 Rhizobium-Japonicum. Applied and Environmental Microbiology 1983; 45: 1592-1601.

532 44. Terpolilli JJ, Hood GA, Poole PS. What Determines the Efficiency of N-2-Fixing

533 Rhizobium-Legume Symbioses? Advances in Microbial Physiology, Vol 60 2012; 60:

534 325-389.

535 45. Bender GL, Plazinski J, Rolfe BG. Asymbiotic Acetylene-Reduction by a Fast-Growing

536 Cowpea Rhizobium Strain with Nitrogenase Structural Genes Located on a Symbiotic

537 Plasmid. Applied and Environmental Microbiology 1986; 51: 868-871.

538 46. Liu C, He Y, Chang Z. Truncated hemoglobin o of Mycobacterium tuberculosis: the

539 oligomeric state change and the interaction with membrane components. Biochemical

540 and Biophysical Research Communications 2004; 316: 1163-1172.

541 47. Pathania R, Navani NK, Rajomohan G, Dikshit KL. Mycobacterium tuberculosis

542 hemoglobin HbO associates with membranes and stimulates cellular respiration of

543 recombinant Escherichia coli. Journal of Biological Chemistry 2002; 277: 15293-15302.

29

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

544 48. Ascenzi P, De Marinis E, Coletta M, Visca P. H(2)O(2) and (center dot)NO scavenging

545 by Mycobacterium leprae truncated hemoglobin O. Biochemical and Biophysical

546 Research Communications 2008; 373: 197-201.

547 49. Shimuta T, Nakano K, Yamaguchi Y, Ozaki S, Fujimitsu K, Matsunaga C, et al. Novel

548 heat shock protein HspQ stimulates the degradation of mutant DnaA protein in

549 Escherichia coli. Genes to Cells 2004; 9: 1151-1166.

550 50. Grunden AM, Shanmugam KT. Molybdate transport and regulation in bacteria. Archives

551 of Microbiology 1997; 168: 345-354.

552 51. Jimenez-Vicente E, Yang ZY, del Campo JSM, Cash VL, Seefeldt LC, Dean DR. The

553 NifZ accessory protein has an equivalent function in maturation of both nitrogenase

554 MoFe protein P-clusters. Journal of Biological Chemistry 2019; 294: 6204-6213.

555 52. Gaby JC, Buckley DH. A global census of nitrogenase diversity. Environmental

556 Microbiology 2011; 13: 1790-1799.

557 53. Han LL, Wang Q, Shen JP, Di HJ, Wang JT, Wei WX, et al. Multiple factors drive the

558 abundance and diversity of the diazotrophic community in typical farmland soils of China.

559 FEMS microbiology ecology 2019; 95: fiz113.

560 54. Meng H, Zhou Z, Wu R, Wang Y, Gu J. Diazotrophic microbial community and

561 abundance in acidic subtropical natural and re-vegetated forest soils revealed by

30

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

562 high-throughput sequencing of nifH gene. Applied Microbiology and Biotechnology 2019;

563 103: 995-1005.

564 55. Gano-Cohen KA, Wendlandt CE, Stokes PJ, Blanton MA, Quides KW, Zomorrodian A,

565 et al. Interspecific conflict and the evolution of ineffective rhizobia. Ecology Letters 2019;

566 22: 914-924.

567 56. Remigi P, Zhu J, Young JPW, Masson-Boivin C. Symbiosis within Symbiosis: Evolving

568 Nitrogen-Fixing Legume Symbionts. Trends in Microbiology 2016; 24: 63-75.

569 57. Novick RP, Ram G. The Floating (Pathogenicity) Island: A Genomic Dessert. Trends in

570 Genetics 2016; 32: 114-126.

571 58. Gal-Mor O, Finlay BB. Pathogenicity islands: a molecular toolbox for bacterial virulence.

572 Cellular Microbiology 2006; 8: 1707-1719.

573 59. Provorov NA, Andronov EE. Evolution of root nodule bacteria: Reconstruction of the

574 speciation processes resulting from genomic rearrangements in a symbiotic system.

575 Microbiology 2016; 85: 131-139.

576 60. Soumare A, Diedhiou AG, Thuita M, Hafidi M, Ouhdouch Y, Gopalakrishnan S, et al.

577 Exploiting Biological Nitrogen Fixation: A Route Towards a Sustainable Agriculture.

578 Plants-Basel 2020; 9: 1011.

579 61. Santi C, Bogusz D, Franche C. Biological nitrogen fixation in non-legume plants. Annals

580 of Botany 2013; 111: 743-767.

31

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

581 62. Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, et al. Biopython: freely

582 available Python tools for computational molecular biology and bioinformatics.

583 Bioinformatics 2009; 25: 1422-1423.

584 63. Goto N, Prins P, Nakao M, Bonnal R, Aerts J, Katayama T. BioRuby: bioinformatics

585 software for the Ruby programming language. Bioinformatics 2010; 26: 2617-2619.

586 64. Zulkower V, Rosser S. DNA Features Viewer: a sequence annotation formatting and

587 plotting library for Python. Bioinformatics 2020; 36: 4350-4352.

588

589

32

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

590 Legends

591 Figure 1. The maximum-likelihood phylogenomic tree of Bradyrhizobium and inferred lifestyle

592 evolutionary history. Strains from Xanthobacteraceae were used as an outgroup. Ancestral

593 lifestyles were inferred using the parsimony method in Mesquite. Branches in black, green, red

594 and gray represent free-living non-N2-fixing, free-living N2-fixing, symbiotic and uncertain

595 lifestyles, respectively. The color strips in the outer layer indicates different supergroups. Red

596 and green solid circles denote the presence of nod and nif genes, respectively. Black stars

597 adjacent to the tips of the phylogeny indicate strains isolated and sequenced in the present study.

598 We inferred 8, 13, 10 and 4 times FLnonnif-Sym, Sym-FLnonnif, Sym-FLnif and FLnonnif-FLnif

599 transitions respectively (noted with arrows). Purple circles on the nodes indicate ultrafast

600 bootstrap values higher than or equal to 95% calculated by IQ-Tree. The red and green triangles

601 on the outer layer indicate symbiotic members of the basal lineages of the B. japonicum

602 supergroup and free-living N2-fixing strains, respectively.

603

604 Figure 2. Nif gene phylogeny of Bradyrhizobium. The phylogeny was constructed using the

605 concatenated alignments of nifABDEHKNVX. Nif genes from Rhizobium, Sinorhizobium,

606 Mesorhizobium, Azorhizobium and Beta-rhizobia (rhizobia from Betaproteobacteria) are used as

607 the outgroup. The color strips on the outer layer indicates different supergroups, and the

608 background colors of the species name represent different lifestyles. The purple circles on the

33

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

609 nodes indicate ultrafast bootstrap values higher than or equal to 95% calculated by IQ-Tree. The

610 three symbiotic strains (also mentioned in Fig. 1) with nif genes on both the free-living nif island

611 and symbiosis island are indicated by black triangles. The tips with a black box around denote

612 the strains from Cluster FL with experimental evidence for their N2 fixation in the free-living

613 state. Note that three symbiotic strains, namely DOA9, p9-20 and CCBAU 53363, carried a nif

614 cluster similar to the one on the free-living nif island (NI) in addition to another one on the

615 symbiosis island (SI). The geographic locations of free-living N2-fixing strains are displayed in

616 Fig. S1B.

617

618 Figure 3. The comparison of the gene arrangement of the nif gene cluster located in the nif island

619 and in the symbiosis island. Gene functions are distinguished by different colors. The

620 visualization of gene arrangement is performed with dna-features-viewer v3.0.3 [64]. The

621 structures of the gene arrangement classified the nif clusters into three types: those belonging to

622 the Cluster FL or Cluster PB (A), those found in the symbiosis island in most symbiotic strains

623 or in FLnif strains that cluster with symbiotic strains (B), and those from outgroup species (C). In

624 panel B, two out of the three free-living strains whose phylogenetic positions are nested within

625 symbiotic strains, labeled as “FLnifwithinSym”, are shown together with the symbiotic strains

626 closest to them in the nif tree (see also Fig. 2): SEMIA 6399 vs. NAS96.2, and CCGE-LA001 vs.

34

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

627 P10 130 (the other one, strain CCBAU 83689, is not shown as the assembly of the contig where

628 the nif island is located is incomplete).

629

630 Figure 4. Normalized abundance of free-living and symbiotic nifH of Bradyrhizobium in

631 amplicon sequencing samples collected from different types of environments. The values shown

632 as percentage in the y-axis in the violin plots denote the normalized abundance (see Materials

633 and methods) of different types of nifH. The width of each curve represents the approximate

634 frequency of data points in the corresponding region. The P-value is derived from a paired

635 Wilcoxon-Mann-Whitney test (two-sided).

636

35

Isolated and sequenced in this study bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Colorstrip > Supergroups Soil 1 supergroup Soil 2 supergroup Outgroup Bradyrhizobium jicamae supergroup 1 Bradyrhizobium elkanii supergroup Kakadu supergroup 1 1 Photosynthetic supergroup Bradyrhizobium japonicum supergroup

Dots > Gene clusters SEMIA 6399 nifBDEHKN 2 nodABCIJ 2 8 Branch color > Lifestyles 3 Symbiotic (Sym) 7 2

Free-living N2-fixing (FLnif) Free-living non-N2-fixing (FLnonnif) 4 Uncertain Bradyrhizobium S23321 4 Arrow > Lifestyle transitions 6 3 Ec3.3

nonnif 4 BR 10245 FL -Sym (8 times) BR 10247 Cp5.3 Sym-FLnonnif (13 times) 3 5 6 Sym-FLnif (10 times) 13 8 7 SZCCT0346 FLnonnif-FLnif (4 times) 9 5 SZCCT0007 12 4 p9-20 BK707

6 11 7

5 9 SZCCT0231 10 3 8

10 1 DOA9 P10 130 2

CCH5-F6

CCBAU 83689 R2.2-H W BM-T Y-H1 CCBAU 53363 UNPA324

22

TSA1 UNPF46 AT1 39S1MB

cf659 Colorstrip > Supergroups

CCBAU 53363 (NI) Soil 1 supergroup

SZCCT0007

SZCCT0346

SZCCT0231 p9-20 (NI) Soil 2 supergroup DOA9 (NI)

BK707 UNPA324

Bradyrhizobium jicamae supergroup 39S1MB 1TA S23321 cf659

UNPF46 TSA1 CCH5-F6 22 BM-T BR 446 Y-H1 R2.2-H W BR3351 Tv2a-2 DOA1 Ai1a-2 Bradyrhizobium elkanii supergroup BR 10247 PAC68 ARR65 BR 10245 Ec3.3 WSM2254 WSM3983 STM 3809 NAS96.2 bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint SEMIA 6399 Kakadu supergroup Cp5.3 P10 130 (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprintORS375 in perpetuity. It is made CCGE-LA001 available under aCC-BY-NC-NDAzorh 4.0Azorh International license. ORS285 PAC 48 Photosynthetic supergroup ORS278 USDA 3259 izobium izobium SZCCT0094 USDA 3254 R5 BTAi1 3458 Bradyrhizobium japonicum supergroup doeber caulino BR3262 STM 3843 BLY3-8 SZCCT0283 S58 einerae dansUFLA ORS571 BLY6-1 CI-1B Outgroup CI-41S Semia 938 Background > Lifestyle 1-100 SZCCT0424 C9 Symbiotic NBRC 14791 Sym Basal USDA 76 Free-living N2-fixing USDA 94 Cluster FL CCBAU 05737 CCBAU 43297 Cluster PB 587 Dots > Genes BR29 TnphoA33 CCBAU 15615 UFLA 03-321 nifV SZCCT0284 UFLA03-84 SEMIA 690 USDA 4 SEMIA 6208 SZCCT0046 WSM2793 Strains have two nif clusters OO99 Rc2d CCBAU 15544 SA-3 Rp7b CCBAU 15635 BR3267 USDA 124 p9-20 (SI) Genomes displayed in Figure 3 CCBAU 43298 CCBAU 10071 CCBAU 25435 Sym CCBAU 25021 CCBAU 35157 Is-34 CCBAU 05623 USDA 38 WSM1743 J5 Bradyrhizobium CCBAU 51781 NBRC 14783 CCBAU 51770 FN1 CCBAU 51778 CCBAU 15618 LMG 28791 USDA 6 Ghvi E109 Rc3b ERR11 SEMIA 5079 Node 1 AC87j1 USDA 110 Is-1 INPA54B CNPSo 3426 110spc4 XF7 CNPSo 3424 Y21 LVM 105 CNPSo 3448 LMG 26795 CCBAU 41267 CB756 SEMIA 5080 USDA 3456 USDA 122 USDA 3384 SZCCT0233 WSM2783 Leo121 CCBAU 15517 Leo170 CCBAU 83623 LMTR 13 SZCCT0285 LMTR 3 CCBAUUSDA 15354 123 WSM1741 LMTR 21 SZCCT0133 CCBAU 23086 LmjM6 SZCCT0293 LmjM3 SZCCT0342 RST89 RST91 CCBAU 83689 DOA9 plasmid USDA 135 CCBAU 05525 Ro19 CCBAU 51649 SZCCT0287 NK6 CCBAU 51670 CCBAU 53424 SZCCT0340 CCBAU 53426 SZCCT0341 Gha CCBAU 53363 (SI) UBMA197 RP6 WSM1417 L2 WSM4349 UBMA192 UBMA182 UBMA171 CCBAU 53390 UBMA195 UBMA183 UBMA181 CCBAU 51757 SUTN9-2 WSM1253 UBMA122 UBMAN05 UBMA510

Th.b2 174MSW

AS23.2

NAS80.1 CCBAU 51787 SEMIA 6148 CCNWSX0360 BR 10303 UBMA051 UBMA050 UBMA060 UBMA052 UBMA061 nifA sufBCDS nifZ nif(H)DKENX hspQ nifT nifB nifZ glbO nifHQ nifV nifW fixABCX modECBAD bolA nifZ iscA nifZ paaD grxD ahpD modE (A) glbO fabG sufE nifX hspQ K13819 nifT iscA nifZ fdx E3.1.1.45 hndC cheY nifW fixX PRDX2_4 S23321 nifA sufB sufC sufD sufS nifH nifD nifK nifE nifN iscS nifB per nifH nifQ irr nifV cysE fixA fixB fixC modC modB modA 0 6,000 12,000 18,000 24,000 30,000 bolA 36,000 42,000 48,000 nifZ iscA nifZ paaD grxD fixX sufE nifX hspQ hcaC K13819 nifT iscA nifZ fdx glbO E3.1.1.45 hndC cheY nifW modE DOA9 (NI) fabG nifA sufB sufC sufD sufS nifD nifK nifE nifN iscS nifB per nifH nifQ irr nifV cysE fixA fixB fixC modC modB modA modD TC.DME fabG 0 6,000 12,000 18,000 24,000 30,000 36,000 42,000 48,000 nifZ iscA nifZ paaD bolA hylB fixX sufB sufE nifX hspQ hcaC K13819 nifT iscA nifZ fdx glbO E3.1.1.45 grxD rssB nifW modE CCBAU 53363 (NI) fabG nifA sufB sufC sufD sufS nifD nifK nifE nifN iscS nifB per nifH nifQ irr nifV cysE fixA fixB fixC modC modB modA modD ABC.SP.S 0 6,000 12,000 18,000 24,000 30,000 36,000 42,000 48,000 sufE iscA nifZ paaD bolA hylB fixX Cluster FL nifZ narL nifX hspQ K13819 nifT iscA nifZ fdx glbO E3.1.1.45 grxD rssB nifW modE SZCCT0007 sufB sufC sufD sufS nifD nifK nifE nifN iscS nifB per nifH nifQ irr nifV cysE fixA fixB fixC modC modB modA modD hyuA 0 6,000 12,000 18,000 24,000 30,000 hylB 36,000 42,000 48,000 nifZ iscA nifZ paaD bolA fixX glbO sufE nifX hcaC K13819 nifT iscA nifZ fdx E3.1.1.45 grxD algB nifW modE p9-20 (NI) sufB sufC sufD sufS nifD nifK nifE nifN iscS nifB per nifH nifQ irr nifV cysE fixA fixB fixC modC modB modA crt 0 6,000 12,000 18,000 24,000 30,000 hylB 36,000 42,000 48,000 nifZ iscA nifZ paaD bolA fixX glbO sufE nifX hspQ K13819 nifT iscA nifZ fdx E3.1.1.45 grxD devR nifW modE AT1 fabG nifA sufB sufC sufD sufS nifD nifK nifE nifN iscS nifB per nifH nifQ irr nifV fixA fixB fixC modB modA ABC.SP.S 0 6,000 12,000 18,000 24,000 30,000 36,000 42,000 48,000 bolA bfd iscA nifZ paaD grxD ahpD nifX hspQ K13819 nifT iscA nifZ fdx hcaC glbO E3.1.1.45 hndC cheY nifW fixX PRDX2_4 mhuD SZCCT0283 nifH nifD nifK nifE nifN TST MAO VCP iscS nifB per yghU glpE nifH nifQ irr nifV cysE fixA fixB fixC modC modB modA modD IBA57 0 6,000 12,000 18,000 24,000 30,000 36,000 42,000 48,000 54,000 grxD fixJ iscA nifZ paaD hndC cheY fixX modC sufE nifX hspQ K13819 nifT iscA nifZ fdx glbO E3.1.1.45 bolA arsC1 irr nifW modE modD ORS278 fabG nifA sufB sufC sufD sufS nifH nifD nifK nifE nifN TST iscS nifB per K07487 nifH nifQ nifV cysE fixA fixB fixC livK 0 6,000 12,000 18,000 24,000 30,000 36,000 grxD 42,000 48,000 54,000 fixJ iscA nifZ paaD hndC cheY grxD modE ahpD Cluster PB sufE nifX hspQ K13819 nifT iscA nifZ fdx glbO E3.1.1.45 bolA arsC1 irr nifW fixX modC PRDX2_4 ORS285 fabG nifA sufB sufC sufD sufS nifH nifD nifK nifE nifN TST iscS nifB per nifH nifQ nifV cysE fixA fixB fixC modD livK 0 6,000 12,000 18,000 24,000 30,000 36,000 42,000 48,000 54,000 hya,hyp sufBCDS nifDKENX nifBnifTnifZ nifHQ nifV nifW fixBCX (B) nifZ parE1_3_4 hyaC hypA hypC nifX iscA nifT nifZ parD1_3_4 nifW fixX ahpD SEMIA 6399 hyaA hyaB hyaC hyaD hyaF hupV hypB hypF hypD hypE ald sufB sufC sufD sufS nifD nifK nifE nifN VCP iscS nifB per entS opuC opuA moeB ompW nifH nifQ nifV fixB fixC 0 8,000 16,000 24,000 32,000 40,000 48,000 56,000 64,000 72,000 fixJ nifZ fixX ahpD modC bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. ItdctR is made hypC hypA nifX iscA nifT nifZ ompW nifW K07482 PRDX2_4 FLnif within Symavailable under aCC-BY-NC-ND 4.0 International license. P10 130 fixL dctA hyaA hyaB hyaC hyaD hyaF hupV hypB hypD hypE ald sufB nifD nifK nifE nifN iscS nifB nifH nifQ nifV fixB fixC 0 8,000 16,000 24,000 32,000 40,000 0 2,500 5,000 7,500 10,000 12,500 iscA K07497 E3.1.1.11 nifX K13819 nifT nifQ nifW fixX DOA1 parA K00666 E6.4.1.4B bccA glmS E3.2.1.15 fabG nifH nifD nifK nifE nifN nifB nifV fixB fixC omp31 0 8,000 16,000 E3.2.1.15 24,000 32,000 40,000 iscA 48,000 56,000irr 64,000 Sym Basal parA K07497 E3.1.1.11 K07484 nifX K13819 nifZ nifQ nifW fixX BR 446 K00666 E6.4.1.4B bccA gumF glmS tgnB fabG nifH nifD nifK nifE nifN iscS nifB nifV fixB fixC 0 8,000 16,000 24,000 nifZ 32,000 40,000 48,000 56,000 64,000 nifX iscA nifT nifZ fixX PRDX2_4 fer E1.17.4.1B NAS96.2 ald sufB sufC sufD sufS nifD nifK nifE nifN davA iscS nifB per ompW nifH nifQ nifV fixB fixC ahpD ispA CYP117A fabG CYP114 CYP112A metE modC 0 8,000 16,000 24,000 32,000 40,000 nifZ 48,000 56,000 64,000 fixJ hypC hyaF hypA hypC nifX iscA nifT nifZ hylB nifW fixX PRDX2_4 TC.DME CCGE-LA001 dctR fixL dctA hyaA hyaB hyaC hyaD hupV hypB hypF hypD hypE ald sufB nifD nifK nifE nifN iscS nifB ompW nifH nifQ nifV fixB fixC 0 8,000 16,000 24,000 32,000 40,000 48,000 56,000 64,000

fixJ hypC hyaF hypA hypC sufS nifX iscA nifT nifZ hndC arsC1 nifW fixX PRDX2_4 p9-20 (SI) fixJ fixL dctA modB hyaA hyaB hyaC hyaD hupV hypB hypF hypD hypE ald nifD nifK nifE nifN iscS nifB ompW nifH nifQ nifV fixB fixC ahpD modC Sym 0 8,000 16,000 24,000 32,000 40,000 48,000 56,000 64,000 fixJ hypC hypA hypC nifX iscA nifT nifZ hcaC hndC nifW fixX PRDX2_4 CB756 fixJ fixL dctA modB K07148 hyaA hyaB hyaC hyaD hyaF hupV hypB hypF hypD hypE ald sufB sufC sufD sufS nifD nifK nifE nifN iscS nifB per ompW nifH nifQ nifV fixB fixC ahpD 0 8,000 16,000 24,000 32,000 40,000 48,000 56,000 64,000 nifT iscA K07497 virD4 ompR nifB fixX E3.1.30.1 ABC.MS.P K07497 DOA9 plasmid fabG nifD nifK ltrA bchE E3.2.1.15 envZ omp31 K07493 K07485 nifH fixA fixB fixC mcp ABC.MS.S K07486 K07485 VCP ybjD 0 8,000 TSTA3 16,000 K07483 24,000 32,000 nifT 40,000 48,000 56,000 64,000 ABC.SN.P TSTA3 E3.1.1.11 nifX iscA nifZ nifW fixX E3.1.30.1 ABC.SN.A korD CCBAU 53363 (SI) nodA nodB nodC ubiG E2.1.3.- nodI nodJ asm21 gmd nifA E3.2.1.15 K07486 K07497 fabG nifD nifK nifE nifN nifB nifH nifQ fixA fixB fixC panB panC ppaX smf ntaA atsK ABC.SN.S frdA 0 8,000 16,000 24,000 32,000 40,000 48,000 56,000 64,000

modE ahpD iscA nifZ paaD bolA cheY (C) PRDX2_4 sufE K07006 phnG phnO hcaC K13819 nifT iscA nifZ fdx glbO E3.1.1.45 grxD arsC1 nifW fixX Azorhizobium caulinodans phnE modD nifA sufS sufD sufC sufB nifH nifD nifK nifE nifN cysE nifV phnF phnH phnI phnJ phnK phnL phnM phnN TST MAO VCP iscS nifB per cysJ yghU glpE nifH nifQ irr fixA fixB fixC ORS 571 0 2,500 5,000 7,500 10,000 12,500 15,000 17,500 20,000 0 5,000 10,000 15,000 20,000 25,000 30,000 35,000 40,000 45,000 nifT nifZ iscA Outgroup iscA nifZ nifX nifW fixX K06995 E3.2.1.123 Rhodopseudomonas palustris fabG nifA nifB nifH nifD nifK nifE nifN nifQ K13819 iscS nifV fixA fixB fixC metA TC.PST K16150 exoP exoQ tagA UXE ictB degP bmpA nupC nupB nupA degP otsA ppk2 fabI E2.3.1.8 ackA mucR paaI DX-1 0 8,000 16,000 24,000 32,000 40,000 48,000 56,000 64,000 bioRxiv preprint doi: https://doi.org/10.1101/2021.02.03.429501; this version posted February 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Marine Soil Plant Others *** *** *** *** *** *** *** *** ns *** *** ***

100

80

60

40

Normalized abundance (%) 20

0

Cluster FL Cluster PB Sym Cluster SymBasalCluster FL Cluster PB Sym Cluster SymBasalCluster FL Cluster PB Sym Cluster SymBasalCluster FL Cluster PB Sym Cluster SymBasal