1 Diversity and distribution of Nitrogen Fixation Genes in the Oxygen Minimum Zones of the

2 World Oceans

3 1Amal Jayakumar and 1Bess B. Ward

4 5 1Department of Geosciences 6 Princeton University 7 Princeton, NJ 08544 8

9 Abstract

10 Diversity and community composition of nitrogen fixing microbes in the three main oxygen

11 minimum zones (OMZs) of the world ocean were investigated using operational taxonomic unit

12 (OTU) analysis of nifH clone libraries. Representatives of the all four main clusters of nifH genes

13 were detected. Cluster I sequences were most diverse in the surface waters and the most abundant

14 OTUs were affiliated with Alpha- and . Cluster II, III, IV assemblages were

15 most diverse at oxygen depleted depths and none of the sequences were closely related to sequences

16 from cultivated organisms. The OTUs were biogeographically distinct for the most part – there was

17 little overlap among regions, between depths or between cDNA and DNA. Only a few

18 cyanobacterial sequences were detected. The prevalence and diversity of microbes that harbour nifH

19 genes in the OMZ regions, where low rates of N fixation are reported, remains an enigma.

20

21 Introduction

22 Nitrogen fixation is the biological process that introduces new biologically available

23 nitrogen into the ocean, and thus constrains the overall productivity of large regions of the ocean

24 where N is limiting to primary production. The most abundant and most important diazotrophs

25 in the ocean are cyanobacteria, members of the filamentous genus Trichodesmium and several

1

26 unicellular genera, including Chrocosphaera sp. and the symbiotic genus Candidatus

27 Atelocyanobacterium thalassa (UCYN-A). Although these cyanobacterial species are wide

28 spread and have different biogeographical distributions (Moisander et al. 2010), they are

29 restricted to surface waters, mainly in tropical or subtropical regions.

30 Because diazotrophs have an ecological advantage in N depleted waters, and because those

31 conditions occur in the vicinity of oxygen minimum zones, due to the loss of fixed N by

32 denitrification, it has been proposed that N fixation should be favoured in regions of the ocean

33 influenced by OMZs (Deutsch et al. 2007). It has also been suggested that the energetic constraints49 Moved down [1]: The search for non cyanobacterial 50 diazotrophs has resulted in discovery of diverse nifH genes, 51 but they have not been associated with significant rates of N 34 on N fixation might be partially alleviated under reducing, i.e., anoxic, conditions (Großkopf and 52 fixation (Moisander et al. 2017).

35 LaRoche 2012). In response to these ideas, the search for organisms with the capacity to fix

36 nitrogen has been focused recently in regions of the ocean that contain OMZs. That search usually

37 takes the form of characterizing and quantifying one of the genes involved in the fixation reaction,

38 nifH, which encodes the dinitrogenase reductase enzyme. Diverse nifH assemblages have been

39 reported from the oxygen minimum zone of the Eastern Tropical South Pacific (Turk-Kubo et al.

40 2014, Loescher et al. 2016, Fernandez et al. 2011) and the Costa Rica Dome, at the edge of the OMZ

41 in the Eastern Tropical North Pacific (Cheung et al 2016). The search for non cyanobacterial Moved (insertion) [1]

42 diazotrophs has resulted in discovery of diverse nifH genes, but they have not been associated with

43 significant rates of N fixation (Moisander et al. 2017). Here we report on the distribution and

44 diversity of nifH genes in all three of the world ocean’s major OMZs, including samples from both

45 surface and anoxic depths, and both DNA and cDNA (i.e., both presence and expression of the nifH

46 genes).

47

48 Materials and Methods:

2

53 Samples analysed for this study were collected from the three major OMZ regions of the

54 world oceans (32 total samples, Table1 and 2) from surface, oxycline and oxygen depleted zone

55 (ODZ) depths. Particulate material from water samples (5 – 10 L), collected using Niskin samplers,

56 mounted on a CTD (Conductivity-Temperature-Depth) rosette system (Sea-Bird Electronics), was

57 filtered onto Sterivex capsules (0.2 µm filter, Millipore, Inc., Bedford, MA) immediately after

58 collection using peristaltic pumps. The filters were flash frozen in liquid nitrogen and stored at -

59 80°C until DNA and RNA could be extracted. For samples from the Arabian Sea, DNA extraction

60 was carried out using the PUREGENETM Genomic DNA Isolation Kit (Qiagen, Germantown, MD)

61 and the RNA was extracted using the ALLPrep DNA/RNA Mini Kit (Qiagen, Germantown, MD).

62 For samples collected from ETNP and ETSP DNA and RNA were simultaneously extracted using

63 the ALLPrep DNA/RNA Mini Kit (Qiagen, Germantown, MD). SuperScript III First Strand

64 Synthesis System (Invitrogen, Carlsbad, CA, USA) was used to synthesise cDNA immediately after

65 extraction following purification of RNA using the procedure described by the manufacturer,

66 including RT controls. DNA was quantified using PicoGreen fluorescence (Molecular Probes,

67 Eugene, OR) calibrated with several dilutions of phage lambda standards.

68 PCR amplification of nifH genes from environmental sample DNA and cDNA was done on

69 an MJ100 Thermal Cycler (MJ Research) using Promega PCR kit following the nested reaction

70 (Zehr et al. 1998), with slight modification as in Jayakumar et al. (2017). Briefly, 25µl PCR

71 reactions containing 50 pmoles each of outer primer and 20-25ng of template DNA, were amplified

72 for 30 cycles (1 min at 98°C, 1 min at 57°C, 1 min at 72°C), followed by amplification with the

73 inner PCR primers 50 pmoles each (Zehr and McReynolds 1989). Water for negative controls and

74 PCR was freshly autoclaved and UV-irradiated every day. Negative controls were run with every

75 PCR experiment, to minimize the possibility of amplifying contaminants (Zehr et al. 2003). The

3

76 PCR preparation station was also UV irradiated for 1 hour before use each day and the number of

77 amplification cycles was limited to 30 for each reaction. Each reagent was tested separately for

78 amplification in negative controls. nifH bands were excised from PCR products after electrophoresis

79 on 1.2% agarose gel, and were cleaned using a QIAquick Nucleotide Removal Kit (Qiagen). Clean

80 nifH products were inserted into a pCR®2.1-TOPO® vector using One Shot® TOP10 Chemically

81 Competent E. coli, TOPO TA Cloning® Kit (Invitrogen) according to manufacturer’s specifications.

82 Inserted fragments were amplified with M13 Forward (-20) and M13 Reverse primers from Formatted: Superscript Formatted: Font: +Body (Calibri) 83 randomly picked clones. PCR products were sequenced at Macrogen DNA Analysis Facility using Formatted: Font: Times New Roman Formatted: Normal, None, Indent: First line: 0.5", Line TM spacing: Double, No widow/orphan control, Don't adjust 84 Big Dye terminator chemistry (Applied Biosystems, Carlsbad, CA, USA). Sequences were edited space between Latin and Asian text, Don't adjust space between Asian text and numbers 85 using FinchTV ver. 1.4.0 (Geospiza Inc.), and checked for identity using BLAST. Consensus nifH 99 Deleted: Standardization and verification of specificity for Q- 100 PCR assays was performed as described previously 86 sequences (359 bp) were translated to amino acid (aa) sequences (108 aa after trimming the primer101 (Jayakumar et al. 2009). Primers nifHfw and nifHrv (Mehta et 102 al. 2003, Dang et al. 2013) forward 5- 103 GGHAARGGHGGHATHGGNAARTC-3 and reverse 5- 87 region) and aligned using ClustalX (Thompson et al. 1997) along with published nifH sequences 104 GGCATNGCRAANCCVCCRCANAC-3, which correspond 105 to the amino acid positions 10 to 17 (GKGGIGKS) and 132 to 106 139 (VCGGFAMP) of Klebsiella pneumoniae numbering 88 from the NCBI database. Neighbor-joining trees were produced from the alignment using distance107 (Mehta et al. 2003), were used (100 pmoles per 25 mL 108 reaction) to amplify a ~400 bp region of the nifH gene for 89 matrix methods (PAUP 4.0, Sinauer Associates). Bootstrap analysis was used to estimate the 109 nifH quantification. Assays were carried out with Qiagen 110 master mix (Qiagen Sciences, Maryland, USA) at an 111 annealing temperature of 56 °C. Amplification conditions 90 reliability of phylogenetic reconstruction (1000 iterations). The nifH sequence from Methanosarcina112 were chosen based on amplification efficiency and 113 reproducible results with a single product, after test assays on 114 a Stratagene MX3000P (Agilent Technologies, La Jolla, CA, 91 lacustris (AAL02156) was used as an outgroup. The accession numbers from GenBank for the nifH115 USA). The amplified products were visualized after 116 electrophoresis in a 1.0% agarose gels stained with ethidium 92 sequences in this study are Arabian Sea DNA sequences JF429940– JF429973 and cDNA sequences117 bromide. Standards for quantification were prepared by 118 amplifying a constructed plasmid containing the nifH gene 119 fragment, followed by quantification and serial dilution. 93 accession numbers JQ358610–JQ358707, ETNP DNA sequences KY967751-KY967929 and cDNA120 Assays for all depths were carried out within a single assay 121 plate (Smith et al. 2006). Each assay included triplicates of the 122 no template controls (NTC), no primer control (NPR), four or 94 sequence KY967930-KY968089, and ETSP DNA sequences MK408165–MK408307 and cDNA 123 more standards, and 20-25ng of template DNA of the 124 environmental DNA samples. A subset of samples from the 125 previous run was included in subsequent assays, as well as a 95 sequences MK408308–MK408422. 126 new dilution series for standard curves on every assay. These 127 new dilution series were produced immediately following re- 128 quantification of plasmid DNA concentrations to verify gene 96 129 abundance (because concentrations declined upon storage and 130 freeze–thaw cycles). Automatic analysis settings were used to 131 determine the threshold cycle (Ct) values. The copy numbers 97 The nifH nucleotide alignment (of 787 sequences) was used to define operational 132 were calculated according to: Copy number = (ng * 133 number/mole) / (bp * ng/g * g/ mole of bp) and then converted 134 to copy number per ml seawater filtered, assuming 100% 98 taxonomic units (OTUs) on the basis of DNA sequence identity. Distance matrices based on this 135 extraction efficiency.¶ 136 ¶

4

137 nucleotide alignment were generated in MOTHUR (Schloss and Handlesman 2009). The

138 relative nifH richness within each clone library was evaluated using rarefaction analysis. OTUs

139 were defined as sequences which differed by ≤3% using the furthest neighbor method in the

140 MOTHUR program (Schloss and Handlesman 2009). The 3% OTU definition is similar to the

141 level at which species are conventionally defined using 16S rDNA sequences, so it may

142 overestimate the meaningful diversity of the functional gene. Redundancy analysis was

143 performed in R using the vegan package. Environmental variables were transformed using

144 decostand.

145

146 Results and Discussion:

147 DNA and cDNA sequences (787 in total) derived from the OMZ regions of the Arabian

148 Sea (AS), Eastern Tropical North Pacific (ETNP) and Eastern Tropical South Pacific (ETSP)

149 were subjected to OTU and phylogenetic analyses to compare the diversity and community

150 composition, biogeography and gene expression, of nifH possessing microbes among the three

151 OMZ regions. Phylogenetic analysis of the sequences from the AS, ETNP and ETSP were

152 reported previously (Jayakumar et al. 2012, Jayakumar et al. 2017, Chang et al. 2019), but the

153 sequences have been combined for additional analyses here. We compared the threshold OTU

154 definitions at 3 and 10% and found that the number of OTUs decreased, as expected, as the

155 resolution decreased. Even at the 3% threshold, however, OTUs tended to separate by depth and

156 location, indicating a functionally useful distinction at this level. Thresholds of 3 – 5% as the

157 OTU definition correspond to within and between species level distinctions for nifH (Gaby et al.

158 2018). The sequences from the OMZ regions represented all four sequence clusters (I, II, III, IV)

159 described by Zehr et al. (1998).

5

160

161 Cluster I nifH OTU distributions: Diversity analysis of the nifH cluster 1 sequences

162 for the three OMZs based on OTUs using MOTHUR identified 41 OTUs at a distance threshold

163 of 3% (Supplemental Table 1A and B). The number of sequences and the number of OTUs

164 varied widely among depths and stations, so the results are grouped by region (AS, ETNP,

165 ETSP) or depth horizon (surface or OMZ, including upper oxycline depths) or cDNA vs DNA

166 (Table 2). The OTUs are numbered in order of decreasing abundance in the clone library, i.e., 183 Deleted: 1).

167 OTU-1 was the most common OTU.

168 For all regions and depths combined, the number of OTUs detected (41) was less than the

169 sum of OTUs detected when each region was analyzed separately (45), indicating that there was

170 some overlap of OTUs among regions. The overlap was not large, however. Only three of the 12

171 most abundant OTUs contained sequences from more than one region and none contained

172 sequences from all three regions (Figure 1A). When sequences for all three regions were

173 combined, only four of the 12 most abundant OTUs contained sequences from both depth

174 horizons (Figure 1B). Most OTUs represented a single depth, and many a single sample.

175 Interestingly, Cheung et al. (2016) used 454-pyrosequencing to obtain a similar number of OTUs

176 (37 total) from the Costa Rica Dome, and all of the 15 samples investigated by Cheung et al.

177 (2016) were dominated (>50%) by one of five major OTUs.

178 The Arabian Sea was strikingly less diverse than other regions and sample subsets

179 (Figure 2). For example, when all DNA and cDNA sequences for all depths are grouped

180 together, the Arabian Sea (OTUs = 14, Chao = 21) contains less species richness than the

181 combined surface samples from all three regions (OTUs = 25, Chao = 52), despite having a

182 similar number of total sequences (178 for the Arabian Sea, 198 for all surface samples

6

184 combined). This lack of diversity in the AS data may be partly due to the preponderance of

185 cDNA sequences, which generally contained less diversity than a similar number of DNA

186 sequences (see below).

187 Although similar numbers of sequences were obtained for cDNA (255) vs DNA (257),

188 the OTU “density”, i.e., number of OTUs per number of sequences analyzed, was higher for

189 DNA (0.136 for DNA, 0.094 for cDNA). The Chao statistic verified this observation for the

190 combined data from each region in predicting higher total numbers of OTUs for DNA (Chao =

191 42) than for cDNA (Chao = 24). This difference could indicate that some of the nifH genes

192 present were not expressed at the time of sampling, but the cDNA sequences were not simply a

193 subset of the DNA community. Half of the 12 most abundant OTUs contained either cDNA or

194 DNA (Figure 1C), meaning that some genes were never expressed and some expressed genes

195 could not be detected in the DNA.

196 For all regions combined, similar numbers of OTUs were detected in surface waters

197 (OTUs = 25) and in OMZ samples (OTUs = 23), although a larger number of sequences was

198 analyzed for the OMZ environment (198 vs. 314 sequences for surface and OMZ depths,

199 respectively). It might be expected that the presence of phototrophic diazotrophs in the surface

200 water would lead to greater diversity there, but only one OTU representing a known

201 cyanobacterial phototroph (OTU-12 = Katagmynene spiralis or Trichodesmium) was identified,

202 so most of the additional diversity must be present in heterotrophic or unknown sequences.

203 Rarefaction curves (Figure 2) indicate that sampling did not approach saturation either for

204 region or depth. The Chao statistic also indicated that much diversity remains to be explored,

205 despite the great uncertainty in these estimates. The total number of OTUs detected, the shape of

206 the rarefaction curve and the diversity indicators (Figure 2, Table 2) all indicate that the greatest 207 Deleted: 1

7

208 nifH diversity occurred in surface waters, and much of that diversity was in singletons, i.e., not

209 represented in the 12 most abundant OTUs, which represented 441 (86 %) of the total 512 nifH

210 Cluster 1 sequences analyzed. Most of that diversity was contained in the ETNP, not solely a

211 function of number of sequences analyzed (Figure 2).

212 Cluster I nifH Phylogeny: Phylogenetic affiliations at both DNA and protein level are

213 shown for the 12 most abundant OTUs in Table 3. The most abundant OTU (129 sequences), 230 Deleted: 2

214 OTU-1, contained Gammaproteobacterial DNA and cDNA sequences from both surface and

215 OMZ depths of the ETNP and cDNA sequences from oxycline and OMZ depths in the Arabian

216 Sea (Figure 3). Although very similar to each other, none of these sequences had higher than

217 91% identity at the DNA level (96% at AA level) with cultivated strains and were most closely

218 related to stutzeri. P. stutzeri is a commonly isolated marine denitrifier, but it is

219 also known to possess the capacity for N fixation (Krotzky and Werner 1987). OTU-4, OTU-6

220 and OTU-8 also contained Gammaproteobacterial sequences. All had high identity with

221 cultivated strains at the protein level but none were >91% identical to cultivated strains at the

222 DNA level.

223 Gammaproteobacterial sequences with very close identities to Azotobacter vinelandii have

224 been reported from the Arabian Sea ODZ and also from the ETSP (Turk-Kubo et al. 2014). This

225 group of nifH sequences with close identities to A. vinelandii was also retrieved from the English

226 Channel, Himalayan soil, South Pacific gyre, Gulf of Mexico, mangrove soil and many other

227 environments (Figure 3). Azotobacter- like sequences were included in OTU-6 but were not closest

228 identity at the DNA level. Although a large number of clones were analyzed here, no sequence that

229 was closely associated with A. vinelandii was retrieved from the three regions. None of the g-

8

231 244774A11 sequences, Gammaproteobacterial relatives that were abundant in the South Pacific

232 (Moisander et al. 2014), were detected in this study.

233 OTUs-2, 3, 5, 10, and 11 all represented Alphaproteobacterial sequences, with closest

234 identities to various Bradyrhizobium, Sphingomonas and Methylosinus species. Thus,

235 Alphaproteobacterial sequences (206 sequences) were the most abundant in the clone library. OTU-2

236 contained almost exclusively ETSP ODZ DNA and cDNA sequences (plus one AS ODZ DNA

237 sequence). OTU-3 contained DNA sequences from ETNP surface waters. OTU-5 contained

238 exclusively Arabian Sea DNA sequences from Station 3, while OTU-10 contained only surface

239 samples from the ETNP. An OTU threshold of 11% grouped all (179 sequences in five OTUs) of

240 these Alphaproteobacterial sequences together, but the 3% threshold is consistent with the

241 phylogenetic tree, which shows small scale biogeographical separation of sequence groups.

242 OTUs-7 and -9 were identified as Betaproteobacteria with closest identities to Rubrivivax

243 gelatinosum and Burkholderia, 91 and 90% respectively at the DNA level. However, at the AA

244 level, these sequences were 99 and 100% identical to Novosphingobium malaysiense and S.

245 azotifigens, both Alphaproteobacteria, and again were biogeographically distinct. OTU-7 contained

246 25 DNA sequences from the ODZ depths in the Arabian Sea, and OTU-9 contained 17

247 Burkholderia-like sequences from the oxycline at Station 1 in the Arabian Sea. No

248 Betaproteobacterial nifH sequences were detected in the ETNP or ETSP, but sequences similar to

249 Burkholderia phymatum, Cupriavidus sp. and Sinorhizobium meliloti were reported from ETSP

250 previously (Fernandez et al. 2015). Consistent with our previous report, however, there is no clear

251 separation between the alfa and the beta groups in nifH phylogeny (Jayakumar et al 2017).

252 Most of the Cluster I ETSP sequences from this study were contained in two OTUs (2 and 4).

253 OTU-2 contained 89 Alphaproteobacterial sequences with >98% identity to nifH sequences from

9

254 Bradyrhizobium sp. Uncultured bacterial sequences retrieved from the South China Sea, English

255 Channel, mangrove sediment, wastewater treatment and grassland soil were related to these ETSP

256 sequences. OTU-4 contained 29 Gammaproteobacterial sequences retrieved from both surface and

257 ODZ depths. Four of the remaining ETSP Cluster I sequences were grouped together as OTU-17

258 (Alphaproteobacteria, 89 and 96% identities with Methyloceanibacter sp. and Bradyrhizobium sp. at

259 the DNA and AA level respectively), three were in OTU-23 (Bradyrhizobium 100% identity) and

260 two were singletons. One of the singletons was most closely related to uncultured soil and sediment

261 sequences and to Azorhizobium sp. (86%) and one had 97% identity with Bradyrhizobium

262 denitrificans and many sequences from marine sediments.

263 OTU-22 represents the Deltaproteobacterial group. This novel group was reported

264 previously from the ETNP (Jayakumar et al. 2017) and has three sequences from Arabian sea (OTU-

265 22) and two singletons from ETNP surface waters. nifH possessing Deltaproteobacteria have been

266 reported not only from all the three ODZs but also in several other marine environments including

267 Chesapeake Bay water column, microbial mats from intertidal sandy beach in a Dutch barrier island,

268 Jiaozhou Bay sediment, Rongcheng Bay sediment, Bohai Sea, Mediterranean Sea, Narragansett Bay,

269 and the south Pacific gyre.

270 -like sequences are the most frequently reported nifH sequences from the

271 OMZs studied here and similar environments. Thirty one of 37 OTUs detected by Cheung et al

272 (2016) in the Costa Rica Dome OMZ were Proteobacteria, the two most common OTUs being

273 closely related to Alphaproteobacterium Methylocella palustris and the Gammaproteobacterium

274 Vibrio diazotrophicus. Loescher et al. (2014, 2016) also found V. diazotrophicus-like sequences, as

275 well as several other Gammaproteobacteria in the ETSP. V. diazotrophicus was reported previously

276 in the Arabian Sea (Jayakumar et al. 2012) but was not prominent in the present study.

10

277 Bradyrhizobium spp., one of the most common genera reported here and in surface waters of the

278 Arabian Sea (Bird and Wyman 2013) and by Fernandez et al. (2011) in the ETSP, were also detected

279 in the Costa Rica Dome OMZ and were the dominant OTU at 1000 m at one station (Cheung et al.

280 2016). In addition to Bradyrhizobium-like and Teredinibacter-like nifH sequences, Turk-Kubo et al.

281 (2014) found four other abundant Gammaproteobactera-like nifH sequences, which were entirely

282 novel. The “Gamma A”, which are commonly reported non-cyanobacteria diazotroph nifH

283 sequences from non-OMZ environments (Langlois et al. 2015, Moisander et al. 2017), were

284 represented by a singleton from the ETNP in the present study.

285 nifH sequences related to various Alphaproteobacterial methylotrophs are commonly found

286 in OMZs: Methylosinus trichosporium-like sequences, which are reported here in OTU-5 from the

287 Arabian Sea at both surface and ODZ depths, were also reported by Fernandez et al. (2011) in the

288 ETSP. Methylocella palustris-like nifH genes comprised the most common OTU in the ODZ core

289 depths in the Costa Rica Dome (Cheung et al. 2016). M trichosporium and M. palustris represent

290 obligate and facultative methanotrophs, respectively, both also obligately aerobic. Detection of nifH

291 genes closely related to those of methanotrophs does not prove that methanotrophy is present or

292 important in the anoxic environment of the ODZ but the consistency of this finding across sites

293 motivates further investigation on the potential for methane production and consumption in ODZs.

294 The pattern of high diversity of nifH-bearing mostly heterotrophic microbes, but dominance

295 in each sample by one or a small number of nifH OTUs, suggests a bloom and bust pattern of

296 organic matter-supported growth. That is, we suggest that organic matter, which is supplied

297 episodically in the upwelling regimes, stimulates the growth of copiotrophic microbes that respond

298 rapidly in bloom like fashion. This bloom scenario has been described for denitrifying

299 based on the OTU patterns observed in the nirS and nirK genes as a function of the stage of

11

300 denitrification in both natural assemblages and incubated samples from OMZs (Jayakumar et al.

301 2009). The role of nifH in these heterotrophic microbes is unclear, especially because rates of

302 nitrogen fixation in these locations in the absence of cyanobacteria is often very low (Turk-Kubo et

303 al. 2014, Loescher et al. 2016, Chang et al. 2019).

304 Although Trichodesmium-like clones have been retrieved from the surface waters of the 323 Deleted:

305 Arabian Sea and the ETNP OMZs, only ten clones (OTU-12) in the combined clone library analyzed

306 here were related to Trichodesmium (98% identity), including both cDNA and DNA from the

307 Arabian Sea and cDNA from the ETNP. These sequences were actually 100% identical to

308 Katagnymene spiralis, a close relative of Trichodesmium isolated from the South Pacific Ocean.

309 Turk-Kubo et al. (2014) also retrieved only a few cyanobacterial sequences from the ETSP. No other

310 cyanobacterial nifH sequences were identified.

311 Clusters II, III, IV nifH OTU distributions: The other three nifH clusters were combined

312 for OTU analysis due to the limited number of sequences and OTUs obtained. A total of 18 OTUs

313 were identified in the combined set of 275 sequences with a 3% distance threshold (Table 3). Most324 Deleted: 2

314 of the Cluster II, III, IV sequences were from the ETNP and ETSP. As with the Cluster I sequences,

315 there was very little geographic and depth overlap among these OTUs (Figure 4A, 4B). Only OTU-

316 1 contained sequences from more than one site, the ETNP and the ETSP. OTU-2 contained only

317 cDNA sequences representing ODZ depths at both ETNP stations. OTU-3 contained exclusively

318 ETSP DNA sequences from surface and cDNA sequences from ODZ depths. Only 10 of the Cluster

319 II, III, IV sequences were from the Arabian Sea, and they formed three separate OTUs, a greater

320 “OTU density” than was present at either of the Pacific sites. As observed for Cluster I, most of the

321 OTUs that were detected in the DNA were not being expressed, and those that were expressed were

322 not detected in the DNA (Figure 4C).

12

325 Rarefaction curves (Figure 5) indicate that sampling for Cluster II, III, IV did not

326 approach saturation. The Chao statistic also indicated that much diversity remains to be

327 explored, despite the great uncertainty in these estimates. Unlike the Cluster I analysis, there

328 were relatively few singletons in the Cluster II, III, IV data and the assemblages were dominated

329 by a few types.

330 Clusters II, III, IV nifH phylogeny: Three large OTUs (OTU-1, -4 and -6) in Clusters II,

331 III, IV belonged to nifH Cluster IV and Alphaproteobacteria/Spirochaeta and Deltaproteobacteria

332 were the dominant phylogenies (Table 3, Figure 6). The largest OTU, OTU-1, contained 88 DNA348 Deleted: 2

333 sequences from the ETNP ODZ depths from both stations and from both depths in the ETSP. This

334 OTU had no similarity to any cultured microbe. OTU-4 contained 30 sequences from the ETSP, all

335 cDNA from one surface station, in nifH Cluster IV.

336 OTU-2 (75 sequences) in Cluster II contained only cDNA sequences, all from ODZ

337 samples in the ETNP (both stations), and had no close relatives among cultivated species. Turk-

338 Kubo et al. (2014) also retrieved a few clones identified as belonging to Cluster II from the

339 euphotic zone of the ETSP. OTU-3 contained 35 sequences in Cluster III and was dominated by

340 DNA sequences from surface depths of the ETSP. OTU-5 represented Deltaproteobacteria in

341 nifH Cluster III and contained 18 identical DNA sequences from 90 m at Station BB1 in the

342 ETNP. Thus, of the five most common OTUs (89% of the total Cluster II, III, IV sequences

343 analyzed), only one could be identified to a closely related genus (i.e., OTU-4 with 90% identity

344 with R. palustris) and there was no overlap between DNA and cDNA OTUs from the same

345 depths.

346 The other 13 OTUs in the Cluster II, III, IV sequences represented either Cluster III or IV.

347 None of these were very closely related to any cultivated sequences. OTU-6 contained both DNA

13

349 and cDNA from the OMZ at one ETSP station. OTU-7 contained four sequences from ETNP

350 surface waters with close identities with a sequence retrieved from Bohai sea. OTU-11, had one

351 DNA and one cDNA sequences from the ETSP. All of the other sequences were less than 84%

352 identical to any sequence in the database and could only be loosely identified as Firmicutes or

353 Proteobacteria.

354 Although there were few high identities with known species, many of the Cluster II, III, IV372 Deleted:

355 sequences (OTUs -2, -5, -7, -9, -10) were most closely affiliated with sulfate reducing clades at

356 either the DNA or protein level. Four OTUs with highest identity to known sulfate reducers were

357 reported by Cheung et al. (2016) and one of them comprised nearly 40% of the sequences in one

358 anoxic sample. nifH sequences that cluster with Desulfovibrio spp. are often reported from ODZ

359 samples (Turk-Kubo et al. 2014, Loescher et al. 2014, Fernandez et al. 2011). Consistent reports of

360 nifH genes associated with obligate anaerobes involved in sulfate reduction suggests a role for this

361 metabolism in the ODZ, again motivating further research on the significance of both sulfate

362 reduction and associated N fixation in ODZ waters.

363 Biogeography and Environmental Correlations: The dominant factor determining OTU

364 composition and distribution is clearly biogeography (Figure 4). That geographical factor is also

365 evident in the redundancy analysis (Figure 7). (Only sites that contained sequences from one of the

366 top OTUs are represented in the plots, so the number of site symbols is less than 32 for both plots.)

367 For example, Cluster I OTU-5 containing only Arabian Sea surface sequences was positively

368 correlated with both T and S and all of the Arabian Sea samples clustered in the quadrant associated

369 with high T and S (Figure 7A). Surface samples from the ETSP were also in that quadrant, but

370 surface ETNP samples were negatively correlated with S. The surface ETNP samples correlated

371 with OTUs-3. -6, -10 and -11, all of which contained exclusively surface samples. The two largest

14

373 Cluster I OTUs were associated with the deep samples from the ETNP and ETSP and correlated

374 positively with nitrite concentration and negatively with oxygen – a signature of the ODZ. Nitrate

375 concentration and depth did not increase the power of the analysis and were omitted from the Cluster

376 I RDA. Most of the sites and five of the most common Cluster I OTUs were not well differentiated

377 by any of the usual environmental parameters.

378 The Arabian Sea contained very few sequences in Clusters II, III, IV and none of them were

379 in the top six OTUs, so only ETNP and ETSP samples are represented in the RDA for these clusters

380 (Figure 7B). The two largest OTUs in Clusters II, III, IV were negatively correlated with T and S

381 but separated along the second RDA axis, demonstrating opposite relationships with oxygen, nitrite,

382 and nitrate concentrations. OTU-1 included ETSP surface sequences, as well as ODZ sequences

383 from both ETNP and ETSP, while OTU-2 contained only ODZ sequences but both OTUs were

384 phylogenetically related to anaerobic clades (Table 2). Inclusion of all six environmental variables

385 was necessary to obtain maximum separation of the sites and OTUs for Clusters II, III, IV.

386

387 Conclusions

388 The OMZ regions of the world ocean contain substantial nifH diversity, both in surface

389 waters and oxygen depleted intermediate depths. Surface waters contained greater diversity for

390 Cluster I, but the ODZ held the highest diversity for Clusters II, III, IV. Cyanobacterial sequences

391 were rare in the combined dataset and were not detected in the ETSP. The ETSP contained the least

392 diversity of Cluster I sequences, while Cluster II, III, IV were least abundant and least diverse in the

393 Arabian Sea. Most of the sequences in all four Clusters of the conventional nifH phylogeny were not

394 closely related to any sequences from cultivated Bacteria or Archaea. The most abundant OTUs in

395 Cluster I and in Clusters II, III, and IV could be assigned to the Alphaproteobacteria, followed by the

15

396 Gammaproteobacteria for Cluster I and Deltaproteobacteria accounted for Clusters II, III, IV

397 sequences. Most of the OTUs were not shared among regions, depths or DNA vs cDNA and

398 sometimes were restricted to individual samples. Some Cluster I sequences had high identity to

399 known species (e.g., Bradyrhizobium, Trichodesmium) but most of the Cluster II, III, IV sequences

400 were only distantly related to any cultured species.

401 The assemblage composition of nifH-bearing microbes is mainly explained by region, but

402 OTU composition was also consistent with the influence of key environmental parameters such as

403 oxygen and temperature, and reflects association with the secondary nitrite maximum for deep

404 samples. Most of the sites/depths, both in this study and in others from OMZ regions, are

405 dominated by one or a few OTUs, which suggests bloom-type dynamics within a diverse

406 background assemblage. Microbes occupying very similar niches and present at low population

407 levels might respond differentially to episodic inputs of organic matter, resulting in spatially and

408 temporally varying dominance by a few clades. Thus we find similar metabolic types represented

409 across all the OMZs, although the specific species and genus level affiliations differ. The consistent

410 detection of nifH sequences related to those found in known sulfate reducers and methanotrophs

411 suggests the need for further investigation of these pathways in ODZs.

412 While measurements of N2 fixation rates are not reported here, the abundance of cDNA

413 sequences suggests that the cells harboring these genes are active. Low, but analytically significant,

414 rates have been detected in ODZ depths in the ETNP (Jayakumar et al. 2017) and ETSP (Chang et

415 al. 2019), which suggests that non-cyanobacterial N fixation could make a minor contribution to the

416 nitrogen budget of the ocean. It is therefore important in future work to determine how the diversity

417 described here actually contributes to biogeochemically significant reactions and what

16

418 environmental and biotic factors might influence or control the activity of diazotrophs in the dark

419 ocean.

420

421

17

422 Figure Legends

423 Figure 1. Histogram of the 12 most common OTUs from Cluster I nifH clone libraries from the

424 three OMZ regions. OTUs were considered common if the total number of sequences in an

425 OTU was ≥2% of the total number of nifH clones analyzed (The common OTUs contained 441

426 of the 512 Cluster I sequences). OTUs were defined according to 3% nucleotide sequence

427 difference using the furthest neighbor method. OTU designation is from most common (OTU-1)

428 to least. A) OTU distribution among regions. B) OTU distribution between OMZ (including

429 core of the ODZ and the upper oxycline depths) and surface depths (oxygenated water). C)

430 OTU distribution of cDNA vs DNA clones.

431

432

433 Figure 2. Rarefaction curve displaying observed OTU richness versus the number of clones

434 sequenced for Cluster I nifH sequences (cDNA and DNA). OTUs were defined and designated as

435 in Figure 1. Chao estimators (individual symbols) are shown for each of the same subsets

436 represented in the rarefaction curves.

437

438 Figure 3. Phylogenetic tree of Cluster 1 based on amino acid sequences. Positions of the OTUs

439 are shown relative to their nearest neighbors from the database. Individual sequence identities

440 comprising each OTU are listed in Supplemental Table 2.

441

442 Figure 4. Histogram of the 6 most common OTUs from Cluster II, III, IV nifH clone libraries 445 Deleted: I

443 from the three OMZ regions. OTUs were considered common if the total number of sequences

444 in an OTU was ≥2% of the total number of nifH clones analyzed (the common OTUs contained

18

446 252 of the 275 Cluster II, III, IV sequences). OTUs were defined according to 3% nucleotide

447 sequence difference using the furthest neighbor method. OTU designation is from most common

448 (OTU-1) to least. A) OTU distribution among regions. B) OTU distribution between OMZ

449 (including core of the ODZ and the upper oxycline depths) and surface depths (oxygenated

450 water). C) OTU distribution of cDNA vs DNA clones.

451

452 Figure 5. Rarefaction curve displaying observed OTU richness versus the number of clones

453 sequenced for Cluster II, III, IV nifH sequences (cDNA and DNA). OTUs were defined and

454 designated as in Figure 4. Chao estimators (individual symbols) are shown for each of the same

455 subsets represented in the rarefaction curves.

456

457 Figure 6. Phylogenetic tree of Clusters II, III, IV based on amino acid sequences. Positions of

458 the OTUs are shown relative to their nearest neighbors from the database. Individual sequence

459 identities comprising each OTU are listed in Supplemental Table 2.

460

461 Figure 7. RDA plots for (A) Cluster I and (B) Clusters II, III, IV illustrating the relationships

462 among OTUs (green circles) and sites. DNA = squares; cDNA = circles. Arabian Sea = cyan

463 (surface) and blue (OMZ); ETNP = pink (surface) and red (deep); ETSP = yellow (surface) and

464 orange (deep). (A) Twelve most abundant OTUs for Cluster I and the four most independent

465 environmental variables. T = temperature, S = salinity, NO2 = nitrite concentration, O2 = oxygen

466 concentration. (B) Six most abundant OTUs for Clusters II, III, IV and all six environmental

467 variables. NO3 = nitrate concentration, Z = depth.

468

19

469

470 Tables

471 Table 1. Sampling regions and depths and sequences derived from each depth

472 Table 2. OTU summary for both clusters

473 Richness and diversity statistics for nifH clone libraries from three OMZ regions. ACE and

474 Chao are non-parametric estimators that predict the total number of OTUs in the original sample.

475

476 Table 3. OTU identities for both clusters 490 Deleted: 2

477 Cultivated species with closest nucleotide identity to the OTUs identified in the nifH clone

478 libraries from three OMZ regions. Only the 12 most common OTUs (out of 41 total) are listed

479 for Cluster 1 sequences, and the six most common (out of 18 total) for the Clusters II, III, IV

480 libraries.

481

482 Supplemental

483

484 S Table 1A and B. List of sequences in each OTU for both clusters

485 S Table 2

486 487

488

489

20

491 References

492

493 1. Bird, C. and Wyman, M., 2013. Transcriptionally active heterotrophic diazotrophs are 533 Deleted: 1. 494 widespread in the upper water column of the Arabian Sea. Fems Microbiology Ecology. 495 84, 189-200. 496 2. Chang, B. X., Jayakumar, A., Widner, B., Bernhardt, P., Mordy, C. M., Mulholland, M. R. and 497 Ward, B. B., 2019. Low rates of dinitrogen fixation in the eastern tropical South Pacific. 498 Limnology And Oceanography. 64, 1913-1923. 499 3. Cheung, S. Y., Xia, X. M., Guoand, C. and Liu, H. B., 2016. Diazotroph community structure in 534 Deleted: 2. Dang, H. Y., Yang, J. Y., Li, J., Luan, X. W., 500 the deep oxygen minimum zone of the Costa Rica Dome. Journal of Plankton Research. 535 Zhang, Y. B., Gu, G. Z., Xue, R. R., Zong, M. Y. and Klotz, M. 536 G., 2013. Environment-Dependent Distribution of the 501 38, 380-391. 537 Sediment nifH-Harboring Microbiota in the Northern 502 4. Deutsch, C., Sarmiento, J. L., Sigman, D. M., Gruber, N. and Dunne, J. P., 2007. Spatial 538 South China Sea. Applied and Environmental 503 coupling of nitrogen inputs and losses in the ocean. Nature. 445, 163-167. 539 Microbiology. 79, 121-132.¶ 540 3 504 5. Fernandez, C., Farias, L. and Ulloa, O., 2011. Nitrogen Fixation in Denitrified Marine Waters. 505 Plos One. 6. 541 Deleted: 4 506 6. Fernandez, C., Lorena Gonzalez, M., Munoz, C., Molina, V. and Farias, L., 2015. Temporal and 507 spatial variability of biological nitrogen fixation off the upwelling system of central Chile 508 (35-38.5 degrees S). Journal of Geophysical Research-Oceans. 120, 3330-3349. 509 7. Gaby, J. C., Rishishwar, L., Valderrama-Aguirre, L. C., Green, S. J., Valderrama-Aguirre, A., 542 Deleted: 5 510 Jordan, I. L. and Kostka, J. E., 2018. Diazotroph community characterization via a high- 511 throughput nifH amplicon sequencing and analysis pipeline. Applied And Environmental 512 Microbiology. 84, eO1512-01517. 513 8. Großkopf, T. and LaRoche, J., 2012. Direct and indirect costs of dinitrogen fixation in 543 Deleted: 6 514 Crocosphaera watsonii WH8501 and possible implications for the nitrogen cycle. 515 Frontiers in Microbiology. 3. 516 9. Jayakumar, A., Al-Rshaidat, M. M. D., Ward, B. B. and Mulholland, M. R., 2012. Diversity, 544 Deleted: 7 517 distribution, and expression of diazotroph nifH genes in oxygen-deficient waters of the 518 Arabian Sea. Fems Microbiology Ecology. 82, 597-606. 519 10. Jayakumar, A., Chang, B. N. X., Widner, B., Bernhardt, P., Mulholland, M. R. and Ward, B. B., 545 Deleted: 8 520 2017. Biological nitrogen fixation in the oxygen-minimum region of the eastern tropical 546 Deleted: 9 521 North Pacific ocean. Isme Journal. 11, 2356-2367. 522 11. Jayakumar, A., O'Mullan, G. D., Naqvi, S. W. A. and Ward, B. B., 2009. Denitrifying bacterial 547 Deleted: Distribution and relative quantification 523 community composition changes associated with stages of denitrification in oxygen 548 Deleted: key genes involved 524 minimum zones. Microbial Ecology. 58, 350-362. 549 Deleted: fixed nitrogen loss from the Arabian Sea 525 12. Krotzky, A. and Werner, D., 1987. NITROGEN-FIXATION IN PSEUDOMONAS-STUTZERI. 550 Deleted: zone. Indian Ocean Biogeochemical Processes 526 Archives of Microbiology. 147, 48-57. 551 and Ecological Variability. (J. D. Wiggert and R. R. Hood). 552 American Geophysical Union, Washington, D. C.: 187-203 527 13. Langlois, R., Grokopf, T., Mills, M., Takeda, S. and LaRoche, J., 2015. Widespread 528 Distribution and Expression of Gamma A (UMB), an Uncultured, Diazotrophic, gamma- 553 Deleted: 10 529 Proteobacterial nifH Phylotype. Plos One. 10. 554 Deleted: 11. Mehta, M. P., Butterfield, D. A. and Baross, J. 555 A., 2003. Phylogenetic diversity of nitrogenase (nifH) 530 14. Loescher, C. R., Bange, H. W., Schmitz, R. A., Callbeck, C. M., Engel, A., Hauss, H., Kanzow, T., 556 genes in deep-sea and hydrothermal vent environments 531 Kiko, R., Lavik, G., Loginova, A., Melzner, F., Meyer, J., Neulinger, S. C., Pahlow, M., 557 of the Juan de Fuca ridge. Applied and Environmental 532 Riebesell, U., Schunck, H., Thomsen, S. and Wagner, H., 2016. Water column 558 Microbiology. 69, 960-970.¶ 559 12

21

560 biogeochemistry of oxygen minimum zones in the eastern tropical North Atlantic and 561 eastern tropical South Pacific oceans. Biogeosciences. 13, 3585-3606. 562 15. Loescher, C. R., Grosskopf, T., Desai, F. D., Gill, D., Schunck, H., Croot, P. L., Schlosser, C., 563 Neulinger, S. C., Pinnow, N., Lavik, G., Kuypers, M. M. M., LaRoche, J. and Schmitz, R. A., 564 2014. Facets of diazotrophy in the oxygen minimum zone waters off Peru. Isme Journal. 565 8, 2180-2192. 566 16. Moisander, P. H., Beinart, R. A., Hewson, I., White, A. E., Johnson, K. S., Carlson, C. A., 567 Montoya, J. P. and Zehr, J. P., 2010. Unicellular Cyanobacterial Distributions Broaden the 568 Oceanic N-2 Fixation Domain. Science. 327, 1512-1514. 569 17. Moisander, P. H., Benavides, M., Bonnet, S., Berman-Frank, I., White, A. E. and Riemann, L., 596 Deleted: 13 570 2017. Chasing after Non-cyanobacterial Nitrogen Fixation in Marine Pelagic 571 Environments. Frontiers in Microbiology. 8. 572 18. Moisander, P. H., Serros, T., Paerl, R. W., Beinart, R. A. and Zehr, J. P., 2014. 597 Deleted: 14 573 Gammaproteobacterial diazotrophs and nifH gene expression in surface waters of the 574 South Pacific Ocean. The ISME Journal 8, 1962–1973. 575 19. Schloss, P. D. and Handlesman, J., 2009. Introducing DOTUR, a computer program for 598 Deleted: 15 576 defining operational taxonomic units and estimating species richness. Applied and 577 Environmental Microbiology. 71, 1501-1506. 578 20. Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. and Higgins, D. G., 1997. The 599 Deleted: 16. Smith, C. J., Nedwell, D. B., Dong, L. F. and 579 CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided 600 Osborn, A. M., 2006. Evaluation of quantitative 601 polymerase chain reaction-based approaches for 580 by quality analysis tools. Nucleic Acids Research. 25, 4876-4882. 602 determining gene copy and gene transcript numbers in 581 21. Turk-Kubo, K. A., Karamchandani, M., Capone, D. G. and Zehr, J. P., 2014. The paradox of 603 environmental samples. Environmental Microbiology. 8, 582 marine heterotrophic nitrogen fixation: abundances of heterotrophic diazotrophs do not 604 804-815.¶ 605 17 583 account for nitrogen fixation rates in the Eastern Tropical South Pacific. Environmental 606 Deleted: 18 584 Microbiology. 16, 3095-3114. 585 22. Zehr, J. P., Crumbliss, L. L., Church, M. J., Omoregie, E. O. and Jenkins, B. D., 2003. 607 Deleted: 19 586 Nitrogenase genes in PCR and RT-PCR reagents: implications for studies of diversity of 587 functional genes. Biotechniques. 35, 996-1005. 588 23. Zehr, J. P. and McReynolds, L. A., 1989. Use of degenerate oligonucleotides for amplification 608 Deleted: 20 589 of the nifH gene from the marine cyanobacterium Trichodesmium theiebautii. Applied 590 and Environmental Microbiology. 55, 2522-2526. 591 24. Zehr, J. P., Mellon, M. T. and Zani, S., 1998. New nitrogen-fixing microorganisms detected in 609 Deleted: 21 592 oligotrophic oceans by amplification of nitrogenase (nifH) genes. Applied and 593 Environmental Microbiology. 6, 3444-3450. 594

595

22

610 611 Figure.1

100

90 AS 80 ETNP 70 ETSP 60

50

40 OTU Abundance OTU

30

20

10

0 1 2 3 4 5 6 7 8 9 10 11 12 140 OTU Number

120 Surface OMZ 100

80

60 OTU Abundance OTU

40

20

0 1 2 3 4 5 6 7 8 9 10 11 12 120 OTU Number

100 DNA

cDNA

80

60 OTU Abundance OTU 40

20

0 1 2 3 4 5 6 7 8 9 10 11 12 OTU Number

612

23

613 Figure. 2

60 Surface ETNP 50 ODZ AS ETSP 40 Surface ETNP ODZ AS 30 ETSP Number of OTUs 20

10

0 0 50 100 150 200 250 300 350 400 Number of Clones Sampled

614 615

24

616 Figure. 3

UC Neuse River Estuary AAS22073 p UB Pacific Ocean AAZ31163

Katagnymene spiralis AAL36032 p UB tropical/subtropical Atlantic AAY59933 UC Arabian Sea surface AAV51281 s

Trichodesmium erythraeum IMS101 YP723617 UB North Pacific AAS98181 m OTU-12 u UB Eastern Tropical Atlantic ABV00643 a i UB South China Sea ABX39913 i

UB Gulf of Aqaba ABH02967 r

UB Western Tropical North Atlantic ABD62927 m 100 UB South China Sea surface ACV95413 Trichodesmium thiebautii AAA27368

UB Pacific Ocean-ETSP DNA AHY22772 es 100 Cyanothece sp. WH8902 AAV28035 acte Gloeothece sp. SK40 BAF51659 d

100 Cyanobacterium UCYN-A YP003421696 Group A b UB Pacific Ocean-ETSP DNA AHY23112

UB Pacific Ocean-ETSP DNA AHY22750 ho Lyngbya lagerheimii UTEX1930 AAA19029 no c Nostoc commune AAA21838 i

Richelia sp. SC01 ABB00036 a 52 r 50 UC Arabian Sea 10m AAV51306 60 UC Arabian Sea 25m AAV51296 T

Lyngbya sp. CCAP1446/10 ABQ65208 Cy Anabaena sp. A2 AAD30985 Aphanizomenon sp. KAC15 ABB96256 Synechococcus sp. JA-3-3Ab YP_475238 Calothrix sp. MCC-3A ABF81863 Crocosphaera watsonii WH850 AAP48976 Group B Bradyrhizobium sp. BTAi 1 BAC07281 Bradyrhizobium sp.AFQ33589 OTU-20 Bradyrhizobium denitrificans ADJ18188 OTU-2 OTU-10 OTU-14 OTU-26 Methylocystis rosea AAY89023 UB field soil Ithaca NY ABQ01506 Takara Reagent AAO73607 Arabian Sea UB JX064482 Bradyrhizobium sp. BBC54938 OTU-21 OTU-3 OTU-18 UB Pacific Ocean-ETSP DNA AHY22743 Bradyrhizobium japonicum ACT67975 Bradyrhizobium sp. CCBAU 33011 ABV44667 UB South Pacific Upwelling ABW80585 OTU-7 52 UB Pacific Ocean-ETSP DNA AHY22754 UB South China Sea ABX39742 Novosphingobium malaysiense WP039285924 53 UB Western English Channel ABP37994 UBSanya Mangrove sediment ABM67093 OTU-5 UB paper wastewater AAV52938 Marine Prot. bacterium Sippewissett AAD23112 UB Chazy NY grassland soil ACI26161 UB pulp and paper wastewater AAV52969 Sphingomonas azotifigens BAE71134 α turfgrass established soil ABY55141 OTU-9 OTU-23 OTU-17 Xanthobacter autotrophicus Py2 ABS65347 r I OTU-11 Bradyrhizobium sp. NPK+ROM8 AMX27928 e Bradyrhizobium diazoefficiens AIF73149 OTU-15

PCR Promega Rgnt AY225106 st OTU-19

PCR Reagent EU16668 u

UB South Pacific Upwelling AJC52151 l PCR Reagent ACI14248

UB South Pacific Upwelling AJC52156 C 96 UB South Pacific Upwelling AJC52161 UB South Pacific Upwelling AJC52149 UB South Pacific Upwelling AJC52154 UB South Pacific Upwelling AJC52150 OTU-25

58 Burkholderia phymatum STM815 CAD43950 a

UB South Pacific Upwelling AJC52155 i 50 Cupriavidus sp. mpp1.6 ADM25248 r Cupriavidus taiwanensis AAU85621 Paraburkholderia ferrariae ABO64209 61 Burkholderia sp. CACua-88 ADM13690 UB Pacific Ocean-ETSP DNA AHY22726

60 97 Methylosinus trichosporium OB3b ATQ67965 acte Methylosinus trichosporium AAO49392 β b UB South Pacific Upwelling AJC52153 PCRRgntContACI13947 o

76 UB South Pacific Upwelling AJC52158 e

Rubrivivax gelatinosus AHB60523 t mRNA eastern Mediterranean Sea ABQ50643 75 UB South Pacific Upwelling AJC52152 o 88 OTU-16 r 73 OTU-24 Sinorhizobium meliloti AAP48966 P UB copepod assoc. AAC36031 UB Copepod assoc. AF016602 UB Dutch barrier island-sandy intertidal beach ADD26542 OTU-27 Pseudomonas stutzeri CBL85101 bacterium CZ152S ADQ38905 64 Pseudomonas stutzeri AVX12447 Pseudomonas sp. CEG62552 Pseudomonas azotifigens BAD90116 Marinobacterium lutimaris DSM 22012 AHN13828 OTU-1 UB English Channel ABP37981 95 UB Pacific Ocean-ETSP DNA AHY22856 OTU-8 Pseudomonas oryzae SDS70900 UB Indian Ocean ADA57502 59 Azotobacter vinelandii 1NIP_B 51 UB Damma glacier soil ABU86698 Agrobacterium tumefaciens ACN88695 56 UB Himalayayan soil ACZ57832 Azotobacter chroococcum AYE20514 UB Pacific Ocean-ETSP DNA AHY23148 Arabian Sea UB JX064474 OTU-4 UB South Pacific Gyre ADO20650 UB Bahama Islands AAC36088 Oleibacter sp. MAK91178 Arabian Sea UB JX064473 88 UB South Pacific Gyre ADO20653 UB South Pacific Gyre ADO20643 UB South China Sea ABX39720 UB Chesapeake Bay AAO67664 68 UB Arabian Sea AAV68764 90 UB Arabian Sea AAV68768 70 Arabian Sea UB AY800134 UB South Pacific Gyre ADO20634 γ UB North Pacific ALOHA DNA ABI 18637 79 UB South Pacific Gyre ADO20632 61 OTU-6 UB Pacific Ocean-ETSP DNA AHY22799 UB South Pacific Gyre ADO20629 UB Eastern Tropical Atlantic ABV00665 UB Subtropical North Pacific AAP42775 83 Vibrio diazotrophicus AAA65435 Marine Prot. bacterium Sippewissett AAD23115 63 UB South Pacific Gyre ADO20645 Arabian Sea UB JX064475 UB Guangzhou mangrove sediment ADF32036 UB Gulf of Mexico ABF21178 Klebsiella sp. AAA16804 Teredinibacter turnerae AT7901 ARC12580 Magnetococcus marinus ABK43713 UB Pacific Ocean-ETSP DNA AHY23015 Arabian Sea UB JX064470 82 OTU-13 100 56 Arabian Sea UB JX064467 Marichromatium purpuratum AAC36094 Marichromatium purpuratum AAC36093 85 Methylomonas methanica AAK97417 Methylomonas sp. LW13 QBC29047 75 UB South Pacific Gyre ADO20614 96 UB South Pacific Gyre ADO20616 UB Pacific Ocean-ETSP DNA AHY22786 UB Intertidal microbial mat ADD26204 100 UB Chesapeake Bay Water AAZ06765 96 80 OTU-22 Geobacter sp. ABR01121 Geobacter pickeringii AJE04162 UB Jiaozhou Bay sediment ACN77120 UB Huanghai Sea and Bohai Sea AKS11123 86 UB Chesapeakwater AAZ06675 UB Dutch barrier island-sandy intertidal beach ADD26540 UB Pacific Ocean-ETSP DNA AHY22894 UB MD soil AHN50541 UB oil contaminated marine sediment AAY85432 100 δ UB Mediterranean Sea ABQ50817 UBNarragansett Bay AIL28609 63 UB Rongcheng Bay sediment AKV16181 UB Intertidal microbial mat ACY26274 UB copepod assoc. AF016608 Cluster II 96 UB copepod assoc. AF016603 Methanosarcina lacustris AAL02156 Cluster III YP002505948CcelH10 Methanosarcina lacustris AAL02156

0.01 changes Cluster IV

617 618

25

619 Figure. 4

80

70 AS 60 ETNP ETSP 50

40

OTU Abundance OTU 30

20

10

0 1 2 3 4 5 6 80 OTU Number

70 Surface OMZ 60

50

40

OTU Abundance OTU 30

20

10

0 1 2 3 4 5 6 100 OTU Number

90 DNA 80 cDNA 70

60

50

40 OTU Abundance OTU

30

20

10

0 1 2 3 4 5 6 OTU Number

620 621

26

622 Figure. 5.

40

35 All Cluster234 ODZ ETNP 30 Surface ETSP 25 AS All Cluster234 ODZ 20 ETNP Surface ETSP

Number of OTUs 15 AS

10

5

0 0 50 100 150 200 250 300 350 Number of Clones Sampled

623 624

27

625 Figure 6.

91 Rhodoplanes elegans WP111356811 100 OTU-4 Rhodopseudomonas palustris DX-1 ADU44011 Lachnospiraceae bacterium ND2006]WP035638048 Methanopyrus kandleri AV19 AAM02629 Ruminococcus sp. NK3A76 WP028510298 Clostridium intestinale WP021804255 V

Clostridium aerotolerans WP026891450 I Pelosinus fermentans JBW45 AJQ28834 Catabacter hongkongensis KKI51118 ster lu

Termite gut UB BAF52265 C Termite gut BAA28487 UB PNN32 100 UB Soil AHN51251 85 UB Soil AHN51375 OTU-1 70 Enhygromyxa salina PRP94280 54 OTU-6 Leptolyngbya boryana BAA00565 Coraliomargarita akajimensis WP013044574 UB marine anoxic enrichment AAC36034 100 OTU-3 UB ETSP denitrified waters AEA49118 66 UB MD Soil AHN50444 57 Verrucomicrobiae bacterium DG1235 EDY82020 UB South China Seawater ADT89936 UB GBR Seawater AEU12186 UB South China Seawater ADT89913 52 UB MD Soil AHN50902 61 99 Opitutaceae bacterium TAV5 WP_009512762 Didymococcus colitermitum WP043581909 UB Chesapeake Bay water AAZ06728 UB Dongkemadi Glacier foreland AJP08283 Desulfovibrio salexigens DSM2638 ACS78514 58 Pseudodesulfovibrio piezophilus C1TLV30 CCH50130 Desulfovibrio vulgaris str. Hildenborough YP_009055 Lentisphaerae bacterium GWF2_57_35 OGV45350 OTU-9 100 UB Pacific Ocean-ETSP DNA AHY22887 UB Chesapeake Bay water AAZ06701 Desulfatitalea tepidiphila WP054032742 64 Desulfobacteraceae bacterium RJP78566 UB MD Soil AHN50858 51 Desulfobacter curvatus AAD21801 Bact. Isolate Cyano.mat assoc. AAA65421 100 UB Marginal Sea China AKS11464 UB Marginal Sea China AKS11076 UB Forest soil ACI32145 100 OTU-7 76 UB Huanghai Sea and Bohai Sea AKS11225 Desulfovibrio inopinatus WP027183856 Desulfonatronum thioautotrophicum WP045221084 97 Desulfonatronum thiodismutans WP031386911 Desulfonatronum lacustre WP028571559 UB soil Mexico AER93044 Desulfovibrio gigas AAB09059 87 UB Tallgrass prairie Oklahoma AIE38864 Spirochaeta aurantia AAK01220 UB Pacific Ocean-ETSP DNA AHY22891 100 Desulfovibrio magneticus RS-1 BAH75547 57 Desulfovibrio carbinolicus QAZ67507 UB Pacific Ocean-ETSP DNA AHY22914 63 UB Marginal Sea China AKS11333 UB copepod assoc. GM214 AAC36022 96 78 OTU-11 UB Mediterranean Sponge -Spain: Montgri Coast AKP55777 I 52 Verrucomicrobiae bacterium S94 QBG48103 II

Kiritimatiellales bacterium WP 136081448 ster UB South China Sea ADT89822 lu UB copepod assoc. GM29 AAC36028 C 100 Prosthecochloris s sp. V1 WP110023904 64 UB Sanya Bay Coral AHI71637 75 Chlorobaculum macestae ABQ42889 Chloroherpeton thalassium ATCC 35110 ACF134991 82 OTU-5 UB Mediterranean Seawater-Spain: Montgri Coast AKP55968 70 UB Florida Keys Reef water ADV35081 73 UB MD Soil AHN51089 UB Mediterranean Seawater AKP55957 93 UB copepod assoc. GM212 AAC36023 UB marine anoxic enrichment AAC36036 UB Pacific Ocean-ETSP DNA AHY22827 UB marine anoxic enrichment AAC36035 63 mRNA Termite intestine assoc. NKNRT16 BAA86270 91 92 Treponema azotonutricium AAK01231 94 Treponema primitia ZAS-2 AF325801 Spirochaeta zuelzerae AAK01223 86 mRNA Termite intestine assoc. NKNRT6 BAA86276 UB Pacific Ocean ETSP DNA AHY23003 UB Termite intestine assoc. TKG6 BAA11789 Treponema bryantii AAK01224 100 Spirochaeta stenostrepta AAK01222 58 64 UB copepod assoc. GM216 AAC36027 Clostridium pasteurianum AAT37643 UB marine anoxic enrichment AAC36033 96 Sporobacter termitidis WP073078008 60 OTU-8 86 Massilibacillus massiliensis WP110955331 Desulfovibrio desulfuricans QCC85382 I 95 Proteobacteria

95 Cyanobacteria ster lu

Treponema denticola AAK01225 C Treponema primitia ZAS-1 AAK01227 88 Treponema pectinovorum AAK01226 100 Spirochaeta stenostrepta AAK01221 UB Termite intestine assoc. BAA28442 90 Rhodopseudomonas palustris CGA009 CAE26881 Clostridium pasteurianum CAA30361 94 UB Pacific Ocean-ETSP DNA AHY22721 I I Methanosarcina barkeri CAA39552 Methanoregula boonei 6A8 ABS565971 ster lu

70 Desulfitobacterium hafniense Y51 BAE858001 C OTU-2 77 Clostridium cellulolyticum str. H10 YP_002505948 83 OTU-10 Marine Surface water UB ADI24166 Methanocaldococcus villosus WP00459175 Methanobacterium ivanovii CAA30384 Methanococcus voltae CAA27407 100 Treponema primitia ZAS-2 AAK01230 77 Treponema primitia ZAS-1 AAK01228 97 Termite gut UB BAC210431 Termite gut UB BAF522131 Methanosarcina barkeri BAA34069 Methanosarcina lacustri AAL02156 0.05 changes

626 627

28

628 Figure 7A

Cluster 1 Symmetric

S

5+

+ 8+7 T 9+ 4+ 12+ NO2 O2 1+

RDA2 11+

10 1 2 10+ −

2+ RDA2 (36.3 %) (36.3 RDA2 2 − 3+ 6+

−20 2 4 6

RDA1 RDA1 (54.7 %)

29

629 Figure 7B

Cluster 234 Symmetric

1+ 4+ O2 3+

6+ RDA2 5+ S Z NO3 T 20 2 4 2+ −

RDA2 (32.7 %) (32.7 RDA2 NO2 4 −

−5 0 5 10

RDA1 RDA1 (57.7 %)

30

630 Table 1 Sampling position details Formatted: Font: 12 pt OMZ Region Station Latitude Longitude Depth (m) DNA cDNA Seqs Seqs Arabian Sea S1 19°N 67°E 10 3 0 Arabian Sea S1 19°N 67°E 60 20 0 Arabian Sea S1 19°N 67°E 150 23 25 Arabian Sea S1 19°N 67°E 175 10 22 Arabian Sea S2 15°N 64°E 150 4 25 Arabian Sea S3 12°N 64°E 10 25 4 Arabian Sea S3 12°N 64°E 110 4 23 ETNP BB1 20 9.6°N 106°W 0 26 5 ETNP BB1 20 9.6°N 106°W 18 24 17 ETNP BB1 20 9.6°N 106°W 90 42 38 ETNP BB2 16 31°N 107 6.8°W 0 40 35 ETNP BB2 16 31°N 107 6.8°W 150 47 67 ETSP BB1 13 59.9°S 81 12.0°W 2 29 1 ETSP BB1 13 59.9°S 81 12.0°W 130 46 44 ETSP BB2 20. 46.1°S 70 39. 5°W 20 45 30 ETSP BB2 20. 46.1°S 70 39. 5°W 115 23 40 631

31

632 Table 2 OTU Summary Depths, No. of No. of regions No. of Unique OTUs OTU Sample subset included Sequences Sequences (cutoff ~3) /seq Shannon Simpson Chao Ace Cluster I AS Arabian Sea, 178 36 14 0.079 1.8 0.22 21 45 all depths

ETNP ETNP, all 207 80 25 0.121 2.37 0.14 37 34 depths ETSP ETSP, OMZ 127 51 6 0.047 0.87 0.53 7 8 depths

All ClusterI Three 512 165 41 0.080 2.7 0.11 59 67 regions, all depths All ClusterI DNA Three 257 97 35 0.136 2.8 0.08 42 45 regions, all depths All ClusterI cDNA Three 255 75 24 0.094 1.7 0.25 24 27 regions, all depths All ClusterI Surface Three 198 73 25 0.126 2.5 0.10 52 75 regions, surface depths All ClusterI OMZ Three 314 98 23 0.073 0.9 0.23 30 37 regions, all depths

Clusters II, III, IV AS Arabian Sea, 10 6 3 0.300 1.09 0.27 3 3 all depths ETNP ETNP, all 134 49 8 0.060 1.19 0.39 14 38 depths ETSP ETSP, all 131 64 8 0.061 1.37 0.30 9 19 depths All Clusters II,III,IV Three 275 117 18 0.065 1.88 0.21 28 26 regions, all depths All Clusters II,III,IV Three 155 65 12 0.077 1.20 0.37 22 17 DNA regions, all depths All Clusters II,III,IV Three 120 56 9 0.075 1.11 0.45 12 15 cDNA regions, all depths All Clusters II,III,IV Three 86 46 6 0.070 1.32 0.29 7 13 Surface regions, surface depths All Clusters II,III IV Three 189 76 15 0.079 1.57 0.29 46 24 OMZ regions, OMZ depths 633 634 635 636

32

637 Table 3 639 Deleted: 2 638

No. of Phylogenetic Closest cultured Identity Coverage Closest cultured Identity Coverage Sequences Affiliation relative (DNA) DNA % % relative (Protein) AA % % Cluster I OTU-1 129 Gamma Psuedomonas 91 98 Pseudomonas 95.8 99 stutzeri stutzeri strain SGAir0442 OTU-2 89 Alpha Bradyrhizobium sp 99 100 Bradyrhizobium 99 99 denitrificans strain LMG 8443 OTU-3 40 Alpha Bradyrhizobium 94 98 Bradyrhizobium sp. 99 98 sp. TM124 MAFF 210318 OTU-4 29 Gamma Marinobacterium 87 100 Oleibacter sp 100 99 lutimaris OTU-5 29 Alpha Methylosinus 92 99 Sphingomonas 99 100 trichosporium azotifigens OTU-6 25 Gamma Azotobacter 81 99 Psuedomonas 94 99 chroococcum stutzeri strain B3 OTU-7 25 Beta/Alpha Rubrivivax 91 99 Novosphingobium 99 100 gelatinosus malasiense OTU-8 17 Gamma Psuedomonas 91 98 Azotobacter 97 100 stutzeri chroococcum strain B3 OTU-9 17 Beta/Alfa Burkholderia 90 100 Sphingomonas 100 100 azotifigens OTU-10 16 Alpha Bradyrhizobium 97 98 Bradyrhizobium sp. 99 99 ORS 285 OTU-11 15 Alpha Bradyrhizobium 97 98 Bradyrhizobium 98 99 diazoefficiens OTU-12 10 Cyanobacterium Katagnymene 100 99 Trichodesmium 100 99 spiralis erythraeum

Clusters II, III IV

OTU-1 88 Alpha/Spirochaet Rhizobium sp 74 59 Treponema primitia 55 98 aceae ZAS-1] OTU-2 75 Delta/Firmicutes Geobacter 73 43 Desulfitobacterium 98 61 hafniense OTU-3 35 Verrumicrobia Opitutaceae 82 99 Coraliomargarita 95 99 bacterium akajimensis OTU-4 30 Alpha Rhodopseudomon 90 98 Rhodoplanes 96 99 as palustris elegans OTU-5 18 Delta/Chlorobi Desulfovibrio 79 99 Prosthecochloris 92 99 piezophilus sp. V1, Chloroherpeton thalassium, Chloroherpeton thalassium OTU-6 6 Beta/Delta Azoarcus 70 88 Enhygromyxa salina 70 74 communis OTU-7 4 Delta Desulfovibrio 81 99 Desulfovibrio 90 99 carbinolicus strain DSM 3852 inopinatus OTU-8 4 Delta/Firmicutes Desulfovibrio 77 100 Sporobacter 99 99 desulfuricans termitidis strain IC1

33

OTU-9 3 Delta/Lentisphae Desulfovibrio 84 100 Lentisphaerae 84 100 rae magneticus RS-1 bacterium DNA GWF2_57_35, Desulfatitalea tepidiphila, Desulfobacteraceae bacterium OTU-10 3 Delta/Methanoc Desulfovibrio 77 100 Methanocaldococcus 65 99 occi desulfuricans villosus strain IC1 OTU-11 2 Verrucomicrobi Verrucomicrobia 87 100 Verrucomicrobia 97 99 bacterium S94 bacterium S94 a 640 641 642

34 Interactive comment on “Diversity and distribution of Nitrogen Fixation Genes in the Oxygen Minimum Zones of the World Oceans” by Amal Jayakumar and Bess B. Ward

Anonymous Referee #1 Received and published: 27 January 2020

The study of Jayakumar and Ward investigates the microbial nitrogen fixing community in three major oxygen minimum zones in the worlds oceans. They took samples in the ETNP, ETSP and IO where they analysed OTUs using up-to-date DNA / RNA ex-traction and amplification methods. The specificity of the Q-PCR assays is described in detail and the construction of phylogenetic trees is fine. All methods are described extensively and the data set allows for the first time a solid comparison among sites. Moreover, the paper is clearly written and presents interesting conclusions such as low diversity in the Arabian Sea compared to other sites. The manuscript can be published as is.

I would only appreciate some details like the station number and depths sampled in the material and methods section. Otherwise it is a timely and very valuable contribution to our understanding of the diversity of nitrogen fixing microbes in the ocean.

Response: Thank you very much for your constructive comments, we now have Inserted new table with the information

Anonymous Referee #2 Received and published: 2 March 2020

In this study, the authors have reported the diversity and phylogeny of putative diazotrophs in the three major OMZs of the Ocean, including ETNP, ETSP and Arabian Sea. By analysing the clone libraries of nifH gene fragments derived from DNA and RNA samples (787 sequences in total), the authors compared the putative diazotroph communities in surface, oxycline and oxygen depleted water of the three OMZs. Basically, my major concerns are the significance of the finding and validity of the approach in this study. It should be noted that the phylogenetic diversity of putative diazotrophs in these three major OMZs have been reported detailedly in previous studies (Jayakumar et al., 2012 & 2017; Loescher et al., 2014; Cheung et al., 2016). Therefore, I afraid that the current study does not provide significant amount of new knowledge to the field. For the approach, 787 nifH sequences are definitely insufficient to reconstruct the diazotroph communities in different layers of the three OMZs. The authors can calculate the coverage indices to evaluate whether the sequencing depths are enough. With such limited dataset, I doubt if it is meaningful and convincing to compare the diversity and community composition of putative diazotrophs in different waters. The authors stated that most of the OTUs were not shared among the regions (L 297), while it could also be the result of limited sequencing depth. Given that this study is mainly about phylogenetic diversity, the authors should consider using high-throughput next generation sequencing of nifH amplicons to provide more convincing findings.

Response: Thank you very much for your detailed constructive comments. We are addressing the specific comments below. In previous studies, what has been reported, describes the distribution of phylotypes and rates of dinitrogen fixation in the ETNP and ETSP, no rates for nitrogen fixation were made for the Arabian Sea. In this study we are comparing the distribution of the phylotypes of nitrogen fixing microbes and their expression, amongst the three major OMZs and also between the oxic and anoxic depths and their biogeography. We agree that number of sequences obtained from some of the depths are low compared to what could be obtained using next generation sequencing on new samples (which are not available for the Arabian Sea). True, the rarefaction curves (Figure 2 and Figure 5) indicate that sampling did not approach saturation, and for Cluster I there were many singletons, indicating more unexplored diversity. Next generation sequencing of nifH amplicons would provide a better understanding of the distributions but that would be a future study in itself -- here we are presenting the synthesis that is possible for all three locations with the currently available data.

In the revised manuscript, we have compared our results more broadly to the few other available data sets from similar locations, and that synthesis points to some useful conclusions, which we now include in the discussion – see new text starting at L267 and L349. All the previous studies except Cheung et al. (2016) were based on clone libraries and they all, including Cheung et al (2016) find (where the data are available) 1) the same basic metabolic types and 2) dominance by a few OTUS in each sample. We interpret this pattern to reflect the presence of diverse assemblages that essentially bloom episodically in response to organic matter input. This interpretation makes sense for non-cyanobacterial clades in both clusters. The consistent detection of nifH genes from e.g., sulfate reducers and methanotrophs motivates further investigation of those metabolisms in ODZs.

Thus we think the conclusions from this study are indeed robust, and hypothesize that this new synthesis would be corroborated by more in depth sequencing.

Specific comments: 1) L30: The authors should briefly talk about the previous studies about the putative diazotrophs in OMZs, including the works done by other teams.

We have introduced the other relevant studies in the introduction and incorporate comparisons with those studies in the results and discussion.

2) L45: Please provide detailed information about the sampling locations and depths.

Response: Thank you, Referee#1 also suggested this and we now have Inserted new table with the information

3) L89-111: Details of qPCR assay were listed in the methodology, while the relevant result was not mentioned at all.

Response: removed this section from methodology

4) L123: Please specify the sequence number of each sample.

The accession number ranges are now provided.

5) L137: The diverse nifH phylotypes of 4 different clusters and their affiliated strains have already been discussed in the previous studies. Is there any the new finding worth elaborating? How about the correlation between nifH phylotypes and environmental variables?

The new discussion sections starting at L L267 and L349 point out the new interpretations and ideas that are made possible by this synthesis of data from multiple sites and in comparison with previous studies. This is also highlighted in the conclusions.

6) L281: How about the other stations? How many stations in total? It may be easier to follow if the authors show the diaoztroph community composition in each station clearly.

Response: A new table 1 is now included which lists out the stations, positions and number of sequences obtained from each station and depths.

Interactive comment on Biogeosciences Discuss., https://doi.org/10.5194/bg-2019-445, 2020