bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Genomic evidence for the degradation of terrestrial organic matter by pelagic Arctic

2 Ocean Chloroflexi

3

4 David Colatriano1, Patricia Tran1, Celine Guéguen2, William J. Williams3, Connie Lovejoy4, 5, and

5 David A. Walsh1*

6

7 1 Department of Biology, Concordia University, 7141 Sherbrooke St. West, Montreal,

8 Quebec,H4B 1R6, Canada

9 2 Department of Chemistry and School of the Environment, Trent University, 1600 West bank

10 Drive, Peterborough, Ontario, K9J 7B8, Canada

11 3 Fisheries and Oceans Canada, Institute of Ocean Sciences, 9860 West Saanich Road,

12 Sidney, British Columbia, V8V 4L1, Canada

13 4 Département de biologie, Institut de Biologie Intégrative et des Systèmes (IBIS)and Québec-

14 Océan, Université Laval, Québec, G1K 7P4, Canada

15 5.Takuvik Joint International Laboratory (UMI 3376), Université Laval (Canada) - CNRS

16 (France), Université Laval, Québec QC G1V 0A6, Canada

17

18

19 *Corresponding author: Phone: (514) 848-2424 (ext. 3477)

20 E-mail: [email protected]

21

22

23

1 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

24 Abstract

25 The Arctic Ocean currently receives a large supply of global river discharge and terrestrial

26 dissolved organic matter. Moreover, an increase in freshwater runoff and riverine transport of

27 organic matter to the Arctic Ocean is a predicted consequence of thawing permafrost and

28 increased precipitation. The fate of the terrestrial humic-rich organic material and its impact on

29 the marine carbon cycle are largely unknown. Here, the first metagenomic survey of the Canada

30 Basin in the Western Arctic Ocean showed that pelagic Chloroflexi from the Arctic Ocean are

31 replete with aromatic compound degradation genes, acquired in part by lateral transfer from

32 terrestrial bacteria. Our results imply marine Chloroflexi have the capacity to use terrestrial

33 organic matter and that their role in the carbon cycle may increase with the changing

34 hydrological cycle.

35

36 Introduction

37 The Arctic Ocean accounts for 1.4% of global ocean volume but receives 11 % of global

38 river discharge 1. Up to 33 % of the dissolved organic matter in the Arctic Ocean is of terrestrial

39 origin 2 and a major fraction of this terrestrial dissolved organic matter (tDOM) originates from

40 carbon-rich soils and peatlands 3,4. With thawing permafrost and increased precipitation

41 occurring across the Arctic 5, increases in freshwater runoff and riverine transport of organic

42 matter to the Arctic Ocean are predicted, which will increase tDOM fluxes and loadings 6,7. The

43 additional tDOM may represent new carbon and energy sources for the Arctic Ocean microbial

44 community and contribute to increased respiration, which would result in the Arctic being a

45 source of dissolved inorganic carbon to the ocean. Alternatively, as it moves from its source of

46 origin to the Arctic Ocean tDOM could become more recalcitrant to bacterial metabolism and

47 represent a long term sequestration of the newly released carbon making the Arctic more

48 carbon neutral 8,9 . However, an estimated 50% of Arctic Ocean tDOM is removed before being

2 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

49 released to the Atlantic, at least in part by microbial processes 10. As input of tDOM increases,

50 knowledge on its microbial transformation will be critical for understanding changes in Arctic

51 carbon cycling.

52 The marine SAR202 is a diverse and uncultivated clade of Chloroflexi bacteria that

53 comprise roughly 10% of planktonic cells in the dark ocean 11-15. SAR202 is also common in

54 marine sediments and deep lakes 16-18. It has long been speculated that SAR202 may play a

55 role in the degradation of recalcitrant organic matter 12,15, and the recent analysis of SAR202

56 single-cell amplified genomes (SAGs) lends support to this notion19. More generally,

57 Chloroflexi, including those in the SAR202 clade, are also present in the upper layers of the

58 Arctic Ocean 20, leading to the hypothesis that recalcitrant organic compounds present in high

59 Arctic tDOM could be utilized by this group.

60

61 Results

62 In this study, we analyzed Chloroflexi metagenome assembled genomes (MAGs)

63 generated from samples collected from the vertically stratified waters of the Canada Basin in the

64 Western Arctic Ocean (Fig. 1a). A metagenomic co-assembly was generated from samples

65 originating from the surface layer (5 m to 7 m), the subsurface chlorophyll maximum (25 m to 79

66 m) and a layer corresponding to the terrestrially-derived DOM fluorescence (FDOM) maximum

67 previously described within the cold CB halocline comprised of Pacific-origin waters (177 m to

68 213 m) 21. The Pacific-origin FDOM maximum is due to sea ice formation and interactions with

69 bottom sediments on the Beaufort and Chukchi shelves, which themselves are influenced by

70 coastal erosion and river runoff 21. Binning based on tetranucleotide frequency and coverage

71 resulted in 360 MAGs from a diversity of marine microbes (Fig. 1b). Six near complete

72 Chloroflexi MAGs were identified. Based on 16S rRNA gene phylogeny, these MAGs

73 represented 3 distinct SAR202 subclades (SAR202-II, -VI, -VII), the AncK29 clade and the TK10

3 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

74 clade (Fig. 2a). Estimated MAG completeness ranged from 77 % to 99 %, while contamination

75 ranged from 0 % to 2.3 % (Table 1). All MAGs exhibited highest coverage just below the

76 subsurface chlorophyll maximum (Fig. 2b) which is consistent with earlier findings on SAR202

77 distribution in the North Pacific Ocean 13. However, the concentration and composition of the

78 FDOM maximum in the CB is significantly different compared to the North Pacific Ocean 22 and

79 the North Atlantic Subtropical Gyre 23. A concatenated protein phylogeny demonstrated that the

80 SAR202 MAGs were distinct from previously published MAGs from the deep ocean 19 and

81 oxygen minimum zones 24 (Supplementary Fig. 1). Fragment recruitment of 21 TARA Ocean

82 metagenomic datasets spanning epipelagic to mesopelagic waters at 7 locations and 4 separate

83 bathypelagic metagenomes indicated that the CB Chloroflexi MAGs were not widely distributed

84 in the oceans (Fig. 2c, Supplementary Data 1). These findings are evidence that the Chloroflexi

85 MAGs represent genotypes that are rare outside Arctic marine waters.

86 The Chloroflexi MAGs contained many genes implicated in the degradation of aromatic

87 compounds typically associated with humic-rich tDOM (Supplementary Data 2). A single MAG

88 (SAR202-VII-2) from a previously undescribed clade (SAR202-VII) exhibited a striking

89 enrichment in these genes (Fig. 3a). Partial pathways for the catabolism of aromatic compounds

90 were recently reported from deep ocean SAR202 SAGs 19. To assess whether the abundance

91 and diversity of SAR202-VII-2 genes involved in aromatic compound catabolism is unique to

92 Arctic Ocean MAGs or is a more broad characteristic of marine Chloroflexi, we compared gene

93 content between SAR202-VII-2 and two SAGs (SAR202-V-AB-629-P13 and SAR202-III-

94 AAA240-O15) reported in 19. Of the 117 SAR202-VII-2 orthologs implicated in aromatic

95 compound degradation, 12 were identified in SAR202-III-AAA240-O15 and only one was

96 identified in SAR202-V-AB-629-P13, implying distinct and less diverse pathways in deep ocean

97 SAR202 compared to the Arctic Ocean populations (Supplementary Data 2).

4 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

98 Proteins for the modification and degradation of monoaryl and biaryl compounds were

99 predicted, including a diversity of aromatic ring-cleaving dioxygenases 25-27. A total of 42 ring-

100 cleaving dioxygenases targeting compounds related to catechol, protocatechuate and gentisate

101 were present in the six MAGs, with 25 dioxygenases predicted in SAR202-VII-2 alone (Fig. 3a-

102 b). Ring demethylation, hydroxylation and decarboxylation are important prerequisite steps to

103 prime diverse aromatic compounds for downstream oxidative cleavage 28,29. Thirty ring-

104 demethylating monooxygenases, ten ring-hydroxylating dioxygenases, and eleven ring-

105 decarboxylases were annotated in the SAR202-VII-2 MAG (Fig. 3c, Supplementary Data 2).

106 Proteins involved in the conversion of ring-cleavage products to central intermediates of the

107 citric acid cycle were also present in the SAR202-VII-2 MAG, including dehydrogenases (i.e.

108 2,3-dihydroxy-2,3-dihydrophenylpropionate dehydrogenase), decarboxylases (i.e. oxaloacetate

109 B-decarboxylase), aldolases (i.e. HMG aldolase and 4-carboxymuconolactone decarboxylase),

110 hydratases (i.e. 4-oxalmescanoate hydratase and 2-oxopent-4-enoate hydratase), isomerases

111 (i.e. mycothiol maleulpyruvate isomerase and muconolactone isomerase) and hydrolases (i.e. 3-

112 oxoadipate enol-lactonase and β-ketoadipate enol-lactone hydrolase) (Supplementary Data 2,

113 Supplementary Fig. 2). We note that we were unable to identify a single complete reference

114 pathway for humic-like aromatic compound degradation. Since estimated genome

115 completeness for SAR202-VII-2 was 99%, it is unlikely the genes were missed due to an

116 incomplete genome. Another explanation is that marine Chloroflexi genomes encode novel

117 pathway variants. Indeed, numerous metal-dependent hydrolases, hydrolases of the HAD family

118 and NAD(P)-dependent dehydrogenase were clustered in genomic regions with the ring-

119 modifying oxygenases, decarboxylases, and demethylases described above. In addition to the

120 array of aromatic compound degradation genes, the SAR202-VII-2 MAG also contained 34

121 copies of the flavin mononucleotide (FM)/F420-dependent monooxygenase catalytic subunit

122 (FMNO) proteins previously implicated in activation of recalcitrant organic compounds in the

123 deep ocean 19. These results are consistent with Chloroflexi in the Arctic Ocean having the

5 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

124 metabolic potential to access carbon and energy available in aromatic compounds typically

125 associated with tDOM.

126 The diversity of SAR202-VII-2 genes implicated in aromatic compound degradation lead

127 us to hypothesize that they may have originated by lateral gene transfer from terrestrial bacteria.

128 To test this, we targeted aromatic compound degradation genes (the ring-cleaving

129 dioxygenases, specifically) in the Chloroflexi MAGs for in-depth phylogenetic analyses. The

130 genomic diversity of marine Chloroflexi was expanded in our analysis by including 130

131 Chloroflexi MAGs recently assembled and binned from the TARA Oceans project 30. A number

132 of the SAR202-VII-2 ring-cleaving dioxygenase homologs were most closely related to proteins

133 from the TARA Ocean Chloroflexi MAGs and other marine originating genomes, particularly the

134 catechol dioxygenases (Supplementary Fig. 3), 3 gentisate 1,2 dioxygenases (Fig. 4) and

135 methylgallate dioxygenases (Supplementary Fig. 4) indicating that aromatic compound

136 degradation in Chloroflexi is not restricted to the Arctic Ocean. However, lateral gene transfer

137 from terrestrial bacteria was also evident. For example, an annotated gentisate dioxygenase

138 gene was positioned within a clade of terrestrial Actinomycetes (Fig. 4). Additional genes

139 involved in the degradation of structures related to catechol, protocatechuate and gentisate

140 were phylogenetically associated with homologs from terrestrial Acidobacteria (Supplementary

141 Fig. 5), Actinobacteria (Supplementary Fig. 6), Armatimonadetes (Fig. 4), Delta-

142 (Supplementary Fig. 3 and 6), Beta-proteobacteria (Supplementary Fig. 7) and a clade of

143 diverse terrestrial phyla (Supplementary Fig. 8). Additionally, 2 gentisate 1,2-dioxygenase genes

144 and 1 protocatechuate dioxygenase ligB gene were phylogenetically associated to a clade of

145 genes from both terrestrial Delta-proteobacteria and marine microbes (Fig. 4 and

146 Supplementary Fig. 8). These putative gene acquisitions were unlikely due to contaminating

147 scaffolds because the genes were located on long scaffolds that were assigned to Chloroflexi

148 with high confidence based on tetranucleotide frequencies and the phylogenetic identity of

6 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

149 house-keeping genes. Such a phylogenetic pattern supports the hypothesis that marine

150 Chloroflexi acquired the capacity for aromatic compound degradation, at least in part, by lateral

151 gene transfer from terrestrial bacteria.

152 Conclusion

153 In total, these results are consistent with Chloroflexi playing a role in tDOM

154 transformation in waters of the Arctic Ocean. This is the first study to our knowledge to

155 associate a specific microbial group with tDOM metabolism in the Arctic Ocean and it expands

156 on recent studies contributing to our understanding of the metabolic diversity of the abundant

157 yet uncultivated marine Chloroflexi 19,24. Moreover, lateral gene transfer from terrestrial bacteria

158 appears to have contributed to the evolution of aromatic compound degradation capabilities

159 within marine Chloroflexi, particularly in regions of the Arctic Ocean impacted by terrestrial input.

160 The majority of MAGs were restricted to the humic-rich Pacific-origin halocline of the CB,

161 however it is the surface waters that will be most immediately affected by increased freshwater

162 input 1. Hence, our initial observations suggest a need for further research on the distribution of

163 tDOM-utilizing microbes in other Arctic water masses with an aim to establish how common and

164 phylogenetically widespread tDOM metabolism is in the Arctic Ocean. These water masses

165 could include coastal surface waters at the mouth of the Mackenzie River, as well as regions of

166 differing DOM composition such as the East Siberian Sea 21. Moreover, metagenomic studies

167 such as this are, in essence, hypothesis-generating and future work that includes targeted

168 cultivation, in situ gene expression analysis, and rate measurement-based approaches are

169 required to validate and quantify microbial metabolic contributions to nutrient cycling. Overall it

170 is likely that marine Chloroflexi have the capacity to degrade tDOM, and their role in the Arctic

171 carbon cycle may increase as Arctic warming leads to greater inputs of terrestrial organic

172 matter.

173

7 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

174 Methods

175

176 Sampling and DNA extraction

177 Twelve samples for metagenomics were collected in September 2015 during the Joint

178 Ocean Ice Study cruise to the Canada Basin . For each sample, 4-8 L of seawater was

179 sequentially filtered through a 50 μm pore mesh, followed by a 3 μm pore size polycarbonate

180 filter and a 0.22 μm pore size Sterivex filter (Durapore; Millipore, Billerica, MA, USA). Filters

181 were preserved in RNAlater and stored at -80 °C until processed in the laboratory. DNA was

182 extracted from the Sterivex filter using the following method: filters were thawed on ice and

183 RNAlater was removed. The Sterivex was then rinsed twice with a sucrose-based lysis buffer,

184 and filled with 1.8 mL of the lysis buffer. Filters were treated with 100 μL of 125 mg mL-1

185 lysozyme and 20 μL of 10 μg mL-1 RNAse A and left to rotate at 37 °C for 1 hour. After

186 incubation, 100 μL of 10 mg mL-1 proteinase K and 100 μL of 20% SDS was added. Filters were

187 left to rotate for 2 hours at 55 °C. Lysate was removed from the filters. Protein was precipitated

188 and removed with 0.583 volumes of MPC Protein Precipitation Reagent (Epicentre, Madison,

189 WI, USA) and centrifugation at 10,000 g at 4 °C for 10 minutes. The supernatant was

190 transferred to a clean tube. DNA was precipitated with cold isopropanol, and resuspended in

191 low TE buffer.

192

193 Metagenomic sequencing, assembly, annotation, and binning

194 DNA sequencing of 12 samples was performed at the Department of Energy Joint

195 Genome Institute (Walnut Creek, CA, USA) on the HiSeq 2500-1TB (Illumina) platform. Paired-

196 end sequences of 2 × 150bp were generated for all libraries. A metagenome coassembly of all

197 raw reads was generated using MEGAHIT 31 with kmer sizes of 23,43,63,83,103,123. Gene

198 prediction and annotation was performed using the DOE Joint Genome Institute's Integrated

199 Microbial Genomes (IMG) database tool (version 4.11.0) 32. Metagenomic binning was

8 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

200 performed on scaffolds greater than or equal to 10 Kb in length using MetaWatt 33. Relative

201 weight of coverage binning was set to 0.75 and the optimize bins and polish bins options were

202 set to on. The taxonomic identity of MAGs was assessed using a concatenated phylogenetic

203 tree based on 138 single copy conserved genes as implemented in MetaWatt 33. Visualization of

204 the tree and mapping of data on to taxa was performed with iTOL 34. Estimation of MAG

205 completeness and contamination was performed using CheckM 35. Six Chloroflexi MAGs were

206 selected for further analysis based on the presence of a 16S rRNA gene, high completeness

207 and low contamination. Manual curation of the six Chloroflexi MAGs was performed and

208 suspected contaminating scaffolds (single copy genes most similar to non-Chloroflexi taxa)

209 were removed prior to further analysis of MAGs.

210

211 16S rRNA phylogenetic analysis

212 Chloroflexi diversity in the metagenomic assembly was assessed by 16S rRNA gene

213 analysis. All 16S rRNA genes in the co-assembly were assigned to taxonomic groups using

214 mothur 36 and the Wang method with a bootstrap value cutoff of 60 % 37. Chloroflexi 16S rRNA

215 genes greater than 360 bp were included in a phylogenetic analysis with Chloroflexi reference

216 sequences. A multiple sequence alignment was generated using MUSCLE (implemented in

217 MEGA6) 38. Phylogenetic reconstructions were conducted by maximum likelihood using

218 MEGA6-v.0.6 and the following settings: general time reversible model, gamma distribution

219 model for the rate variation with four discrete gamma categories, and the nearest-neighbour

220 interchange (NNI) heuristic search method 39 with a bootstrap analysis using 100 replicates.

221

222 Single protein and concatenated protein phylogenies

223 A concatenated protein phylogeny was constructed using 30 Chloroflexi reference

224 genomes and the 6 Canada Basin Chloroflexi MAGs. Orthologous genes in the 36 Chloroflexi

225 genomes were identified using ProteinOrtho 40. Fifty orthologs present in at least 34 of the 36

9 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

226 genomes were selected for concatenated phylogenetic analysis (Supplemetary Data 3). Each

227 orthologous protein family was aligned using MUSCLE (implemented in MEGA6) and alignment

228 positions were masked using the probabilistic masker ZORRO 41 , masking columns with

229 weights less than 0.5. The concatenated alignment consisted of 14,815 amino acid positions.

230 Phylogenetic reconstructions were conducted by maximum likelihood using MEGA6-v.0.6 and

231 the following settings: JTT substitution model, gamma distribution with invariant sites model for

232 the rate variation with four discrete gamma categories, and the nearest-neighbour interchange

233 (NNI) heuristic search method 39 with a bootstrap analysis using 100 replicates.

234 For phylogenetic analysis of ring-cleaving dioxygenase sequences identified in SAR202-

235 VII-2, query sequences were searched against UniRef90 and 130 Chloroflexi MAGs constructed

236 from the TARA Oceans dataset 30 . The TARA Ocean Chloroflexi MAGs were used as is with no

237 manual curation. UniRef90 sequences and the top TARA Ocean MAG hits for each

238 dioxygenase were aligned with their respective SAR202-VII-2 homologs with MUSCLE

239 (implemented in MEGA6) and alignment positions were masked using the probabilistic masker

240 ZORRO 41, masking columns with weights less than 0.5. Phylogenetic reconstruction was

241 conducted using the same settings as the concatenated phylogeny.

242

243 Comparative genomics and metabolic reconstruction

244 The distribution of orthologs across Arctic Ocean genomes, as well as the identification

245 of orthologs shared with the deep ocean SAGs, was determined using proteinortho 40. Inference

246 of protein function and metabolic reconstruction was based on the IMG annotations provided by

247 the JGI, including KEGG, Pfam, EC numbers, and Metacyc annotations. Metabolic

248 reconstruction was also facilitated by generated pathway genome databases for each MAG

249 using the pathologic software available through Pathway Tools 42.

250

10 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

251 Metagenomic fragment recruitment

252 The distribution of the Canada Basin MAGs in the global ocean was determined using

253 best-hit reciprocal blast analysis similar to Landry et. al 2017 19 . Unassembled metagenomic

254 data from 25 samples (Supplementary Data 1) was first recruited to the six Canada Basin

255 Chloroflexi MAGS as well as two SAR202 SAGs originating from the deep North Pacific Ocean

256 from the Hawaiian ocean time series (HOTS) and the deep North Atlantic Ocean 19.

257 Metagenomes from the TARA Ocean project used here were representative of the surface,

258 chlorophyll maximum and mesopelagic waters from the North Atlantic, South Atlantic, North

259 Pacific, Coastal North Pacific, South Pacific, Coast of Brazil and the Antarctic peninsula. To

260 reduce computational demand, only part 1 (1 Gbp of a random subset of reads) of each

261 metagenomic dataset available at EBI was used (Supplementary Data 1). Additional

262 bathypelagic metagenomes from the North Pacific and South Atlantic Oceans (LineP P04, P12

263 and P26, and Knorr S15 2500 m) were also included. All hits from the initial blast were then

264 reciprocally queried against the Canada Basin Chloroflexi MAGs, bathypelagic SAR202 SAGs,

265 and 130 Chloroflexi MAGs constructed from the TARA oceans data. The best hit was reported.

266 Only hits with an alignment length greater than or equal to 100 bp and a percent identity of 95%

267 or more were counted (lower % identity cut-offs did not alter the number of reads recruited in

268 any significant manner). To compare the results among the different data sets, the number of

269 recruited reads was normalized to total number of reads in each sample. The final coverage

270 results were expressed as the number of reads per kilobase of the MAG per gigabase of

271 metagenome (rpkg).

272

273 Data Availability

274 The metagenomic data generated in this study are available in the Integrated Microbial

275 Genomes database at the Joint Genome Institute at https://img.jgi.doe.gov, GOLD Project ID:

276 Ga0133547. Metagenome assembled genomes are available at NCBI under the Bioproject

11 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

277 XXXXXXXXX and accession numbers XXXXXX (for SAR202-II-3), XXXXXX (for SAR202-II-

278 177A), XXXXXXX (for SAR202-VI-29A), XXXXXXX (for SAR202-VII-2), XXXXXX (for Anck29-

279 46) and XXXXXX (for TK10-74A).

280 Acknowledgments

281 The data were collected aboard the CCGS Louis S. St-Laurent in collaboration with researchers

282 from Fisheries and Oceans Canada at the Institute of Ocean Sciences and Woods Hole

283 Oceanographic Institution’s Beaufort Gyre Exploration Program and are available at

284 http://www.whoi.edu/beaufortgyre. We would like to thank both the Captain and crew of the

285 CCGS Louis S. St-Laurent and the scientific teams aboard. The work was conducted in

286 collaboration with the U.S. Department of Energy Joint Genome Institute, a DOE Office of

287 Science User Facility, and was supported under Contract No. DE-AC02-05CH11231. Funding

288 from the Canadian Natural Science and Engineering Research Council (NSERC) Discovery

289 grants (D.W., C.G. and C.L.) and the Canada Research Chair Program (D.W., C.G.) are

290 acknowledged. D.C. was supported by FRQNT and Concordia’s Institute for Water, Energy and

291 Sustainable Systems.

292

293 Authors contributions

294 D.C. and D.W. designed and carried out the metagenomic experiments. W. J. W., C. L. and C.

295 G. designed the sampling strategy. W. J. W. provided oceanographic data. D.W. collected

296 samples. D.C. extracted the environmental genomic DNA and analyzed sequencing data. P. T.

297 contributed to the bioinformatic analyses. D.C. and D.W. with contributions from C. G. and C. L.

298 wrote the manuscript. All authors commented on the manuscript.

299

300 Competing interests

12 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

301 The authors declare no competing interests.

302

303 References

304 1. Carmack, E. C., Yamamoto-Kawai, M., Haine, T. W. N., Bacon, S., Bluhm, B. A., Lique, C.,

305 Melling, H., Polyakov, I. V., Straneo, F., Timmermans, M.L. and Williams, W. J. Freshwater and

306 its role in the Arctic Marine System: Sources, disposition, storage, export, and physical and

307 biogeochemical consequences in the Arctic and global oceans. J. Geophys. Res. Biogeosci.

308 121,675–717 (2016).

309 2. Opsahl, S. and Benner, R. Distribution and cycling of terrigenous dissolved organic matter in

310 the ocean. Nature 386, 480-482 (1997).

311 3. Benner, R., Benitez-Nelson, B., Kaiser, K. and Amon, R.M.W. Export of young terrigenous

312 dissolved organic carbon from rivers to the Arctic Ocean. Geophys. Res. Lett 31, 1-4 (2004).

313 4. Opsahl, S., Benner, R. and Amon, R.M.W. Major flux of terrigenous dissolved organic matter

314 through the Arctic Ocean. Limnol. Oceanogr. 44, 2017-2022 (1999).

315 5. Bintanja, R., Andry, O. Towards a rain-dominated Arctic. Nat. Clim. Change 7, 263-267

316 (2017).

317 6. Frey, K.E. and McCelland, J.W. Impacts of permafrost degradation on arctic river

318 biogeochemistry. Hydrol. Process 23, 169-182 (2009).

319 7. Vonk, J. E., Sánchez-García, L., van Dongen, B. E., Alling, V., Kosmach, D., Charkin, A.,

320 Semiletov, I. P., Dudarev, O. V., Shakhova, N., Roos, P., Eglinton, T. I., Andersson, A. and

321 Gustafsson, Ö. Activation of old carbon by erosion of coastal and subsea permafrost in Arctic

322 Siberia. Nature 489, 137-140 (2012).

13 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

323 8. Jiao, N., Herndle, G. J., Hansell, D. A., Benner, R., Kattner, G., Wilhelm, S. W., Kirchman, D.

324 L., Weinbauer, M. G., Luo, T., Chen, F. and Azam, F. Microbial production of recalcitrant

325 dissolved organic matter: long-term carbon storage in the global ocean. Nat. Rev. Microbiol. 8,

326 593-599 (2010).

327 9. Hansell, D. A. and Carlson, C. A. Biochemistry of Marine Dissolved Organic Matter, 2nd ed.

328 Academic Press, Amsterdam, Netherlands (2015).

329 10. Kaiser, K., Benner, R. and Amon, R.M.W. The fate of terrigenous dissolved organic carbon

330 on the Eurasian shelves and export to the North Atlantic. J. Geophy. Res.: Oceans 122, 4-20

331 (2017).

332 11. Giovannoni, S. J., Rappé, M. S., Vergin, K. L., Adair, N. L. 16S rRNA genes reveal stratified

333 open ocean bacterioplankton populations related to the green non-sulfur bacteria. Proc. Natl.

334 Acad. Sci. U S A 93, 7979-7984 (1996).

335 12. DeLong E. F., Preston, C. M., Mincer, T., Rich, V., Hallam, S. J., Frigaard, N. U., Martinez,

336 A., Sullivan, M. B., Edwards, R., Brito, B.R., Chisholm, S.W., Karl, D. M. Community genomics

337 among stratified microbial assemblages in the ocean’s interior. Science 311, 496-503 (2006)

338 13. Morris, R. M., Rappé, M. S., Urbach E., Conon, S. A. and Giovannoni, S. J. Prevalence of

339 the chloroflexi-related SAR202 bacterioplankton cluster throughout the mesopelagic zone and

340 deep ocean. Appl. Environ. Microbiol. 70, 2836-2842 (2004).

341 14. Schattenhofer, M., Fuchs, B. M., Amann, R., Zubkov, M. V., Tarran, G. A., Pernthaler, J.

342 Latitudinal distribution of prokaryotic picoplankton populations in the Atlantic Ocean. Environ.

343 Microbiol. 11, 2078-2093 (2009).

344 15. Varela, M. M., van Aken, H. M., Herndl, G. J. Abundance and activity of Chloroflexi-type

345 SAR202 bacterioplankton in the meso- and bathypelagic waters of the (sub)tropical Atlantic.

346 Environ. Microbiol. 10, 1903-1911 (2008).

14 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

347 16. Urbach, E., Vergin, K. L., Young, L., Morse, A., Larson, G. L., Giovannoni, S. J. Unusual

348 bacterioplankton community structure in ultra-oligotrophic Crater Lake. Limnol. Oceanogr. 46,

349 557-572 (2001).

350 17. Urbach, E., Vergin, K. L., Larson, G. L., Giovannoni, S. J. Bacterioplankton communities of

351 Crater Lake, OR: dynamic changes with euphotic zone food web structure and stable deep

352 water populations. Hydrobiologia 574, 161-177 (2007).

353 18. Yamada, T., Sekiguchi, Y. Cultivation of uncultured Chloroflexi subphyla: significance and

354 ecophysiology of formerly uncultured Chloroflexi “subphylum I” with natural and biotechnological

355 relevance. Microbes Environ. 24, 205-216 (2009).

356 19. Landry, Z., Swan, B. K., Herndl, G. J., Stepanauskas, R. and Giovannoni, S. J. SAR202

357 genomes from the dark ocean predict pathways for the oxidation of recalcitrant dissolved

358 organic matter. mBio 8:e00413-17 (2017).

359 20. Bano, N., Hollibaugh, J. T., Phylogenetic composition of bacterioplankton assemblages from

360 the Arctic Ocean. Appl. Environ. Microbiol. 68, 505-528 (2002).

361 21. Guéguen, C., McLaughlin, F. A., Carmack, E. C., Itoh, M., Narita, H. and Nishino, S. The

362 nature of colored dissolved organic matter in the southern Canada Basin and East Siberian

363 Sea. Deep Sea Res. Part 2 81-84, 102-113 (2012).

364 22. Dainard, P. G., Guéguen, C., Distribution of PARAFAC modeled CDOM components in the

365 North Pacific Ocean, Bering, Chukchi and Beaufort Seas. Mar. Chem. 157, 216-223 (2013).

366 23. Dainard, P. G., Guéguen, C., McDonald, N. and Williams, W. J. Photobleaching of

367 fluorescent dissolved organic matter in Beaufort Sea and North Atlantic Subtropical Gyre. Mar.

368 Chem. 177, 630-637 (2015).

15 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

369 24. Thrash, J. C., Seitz, K. W., Baker, B. J., Temperton, B., Gillies, L. E., Rabalais, N. N.,

370 Henrissat, B. and Mason, O. U. Metabolic roles of uncultivated bacterioplankton lineages in the

371 Northern Gulf of Mexico “dead zone”. mBio 8:e01017-17 (2017).

372 25. Barry, K. P. and Taylor, E. A. Characterizing the promiscuity of LigAB, a lignin catabolite

373 degrading extradiol dioxygenase from Sphingomonas paucimobilis SYK-6. Biochemistry 52,

374 6724-6736 (2013).

375 26. Kasai, D., Masai, E., Miyauchu, K., Katayama, Y. and Fukuda, M. Characterization of the

376 gallate dioxygenases are involved in syringate degradation by Sphingomonas paucimobilis

377 SYK-6. J. Bacteriol. 187, 5067-5074 (2005).

378 27. Werwath, J., Arfmann, H-A., Pieper, D. H., Timmis, K. N. and Wittich, R-M. Biochemical and

379 genetic characterization of a gentisate 1,2-dioxygenase from Sphingomonas sp. Strain RW5. J.

380 Bacteriol. 180, 4171-4176 (1998).

381 28. Fetzner, S. Ring-cleaving dioxygenases with a cupin fold. Appl. Environ. Microbiol. 78,

382 2505-2514 (2012).

383 29. Fuchs, G., Boll, M. and Heider, J. Microbial degradation of aromatic compounds - from one

384 strategy to four. Nat. Rev. Microbiol. 9, 803-816 (2011).

385 30. Tully, B., Graham, E. D. and Heidelberg, J. F. The reconstruction of 2,631 draft

386 metagenome-assembled-genomes from the global oceans. bioRxiv pre-print (2017).

387 31. Li, D., Liu, C.-M., Luo, R., Sadakane, K. and Lam, T.-W. MEGAHIT: an ultra-fast single-node

388 solution for large and complex metagenomics assembly via succinct de Bruijn graph.

389 Bioinformatics 31, 1674-1676 (2015).

390 32. Huntemann, M., Ivanova, N. N., Mavromatis, K., Tripp, H. J., Paez-Espino, D., Tennessen,

391 K., Palaniappan, K., Szeto, E., Pillay, M., Chen, I-M. A., Pati, A., Nielsen, T., Markowitz, V. M.

16 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

392 and Kyrpides, N. C. The standard operating procedure of the DOE-JGI Metagenome Annotation

393 Pipeline (MAP v.4). Stand. Genomic Sci. 11, 1-17 (2016).

394 33. Strous, M., Kraft, B., Bisdorf, R. and Tegetmeyer, H. E. The binning of metagenomic contigs

395 for microbial physiology of mixed cultures. Front. Microbiol. 3, 1-11 (2012).

396 34. Letunic, I., Bork, P. Interactive tree of life (iTOL) v3: an online tool for display and annotation

397 of phylogenetic and other trees. Nucleic Acids Res. 44, 242-245 (2016).

398 35. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. and Tyson, G. W. CheckM:

399 assessing the quality of microbial genomes recovered from isolates, single cells, and

400 metagenomes. Genome Res. 25, 1043-1055 (2015).

401 36. Schloss, P. D., Westcott, S. L., Ryabin, T., Hall, J. R., Hartmann, M., Hollister, E. B.,

402 Lesniewski, R. A., Oakley, B. B., Parks, D. H., Robinson, C. J., Sahl, J. W., Stres, B., Thallinger,

403 G. G., Van Horn, D. J. and Weber, C. F. Introducing mothur: open-source, platform-

404 independent, community-supported software for describing and comparing microbial

405 communities. Appl. Environ. Microbiol. 75, 7537-7541 (2009).

406 37. Wang, Q., Garrity, G. M., Tiedje, J. M. and Cole, J. R. Naïve bayesian classifier for rapid

407 assignment of rRNA sequences into the new bacterial . Appl. Environ. Microbiol. 73,

408 5261-5267 (2007).

409 38. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high

410 throughput. Nucleic Acids Res. 32, 1792-1797 (2004).

411 39. Tamura, K., Stecher, G., Peterson, D., Filipski, A. and Kumar, S. MEGA6: molecular

412 evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725-2729 (2013).

413 40. Lechner, M., Findeiß, S., Steiner, L. Marz, M. Stadler, P. F. and Prohaska, S. J.

414 Proteinortho: detection of (co-)orthologs in large-scale analysis. BMC Bioinform. 12 (2011).

17 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

415 41. Wu, M. Chatterji, S. and Eisen, J. A. Accounting for alignment uncertainty in phylogenomics.

416 PLoS ONE 7, e30288 (2012).

417 42. Karp, P. D., Latendresse, M. and Caspi, R. The Pathway Tools pathway prediction

418 algorithm. Stand. Genomic Sci. 5, 424-429 (2011).

419

420 Table 1. Genomic characteristics of MAGs

Size Completeness Contamination N50 # of (Mb) Cov (x) GC (%) (%) (%) (kb) Contigs SAR202-II-3 1.36 68 39 80 0 35 46 SAR202-II-177A 1.62 101 42 82 1 36 52 SAR202-VII-2 2.78 16 59 99 0 246 8 SAR202-VI-29A 1.52 16 46 97 2 59 2 TK10-74A 2.45 12 69 81 2.3 38 76 Anck29-46 1.11 11 32 77 0 75 22 421

422 Figure titles

423 Fig.1 | Metagenomic survey of microbial diversity in the Canada Basin. (A) Sampling

424 locations and environmental profiles of the Canada Basin generated with Ocean Data Viewer.

425 (B) Concatenated protein phylogeny of 360 Arctic Ocean MAGs inferred by MetaWatt and

426 visualized in iTOL. The three inner tracks present relative coverage of MAGs averaged across

427 samples collected from surface waters, subsurface chlorophyll maximum (SCM) and the

428 fluorescent dissolved organic matter maximum (FDOM max). The outer track presents

429 estimated MAG completeness as inferred by MetaWatt. MAG completeness ranged from 25-94

430 %.

431

432 Fig. 2| Diversity and distribution of Arctic Ocean Chloroflexi MAGs. (A) Maximum

433 likelihood tree of Chloroflexi based on partial 16S rRNA gene sequences. Blue taxa are from

18 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

434 Canada Basin metagenomes and red taxa are from the six Chloroflexi MAGs. Bootstrap values

435 of >70% are shown (100 replicates) (B) Distribution of MAGs in the Canada Basin based on

436 metagenome coverage at the surface, subsurface chlorophyll maximum (SCM) and fluorescent

437 dissolved organic matter maximum (FDOM max) (C) Distribution of MAGs in global ocean

438 metagenomes based on fragment recruitment. Two deep ocean SAR202 SAGs from Landry et

439 al. 11 were included for comparison. Location of TARA ocean metagenomes (red) and

440 bathypelagic metagenomes (green) are shown on the map.

441

442 Fig. 3| Aromatic compound degradation genes and pathways in Arctic Ocean Chloroflexi

443 MAGs (A) Abundance of aromatic compound degradation genes in Chloroflexi MAGs from the

444 Canada Basin (pie chart) and a breakdown of those found specifically in SAR202-VII-2 (column

445 plot) (B) Predicted aromatic ring-opening enzymatic reactions identified in SAR202-VII-2 with

446 gene loci displayed in the coloured boxes. Genes in green and blue boxes were most closely

447 related to homologs from terrestrial or marine bacteria, respectively. Genes in the blue/green

448 boxes were in clades containing diverse environmental bacteria. (C) Examples of predicted

449 aromatic ring-modifying enzymatic reactions identified in SAR202-VII-2.

450

451 Fig. 4| Maximum likelihood tree of predicted gentisate 1,2-dioxygenases. Bootstrap values

452 of >60% are shown (100 replicates) . Homologs from SAR202-VII-2 are highlighted in red and

453 homologs from TARA Ocean MAGs are highlighted in blue.

19 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuseCB2 allowed CB4 without permission. CB8 CB11 A o 0 80 N 0.75 o C) CB11 0 CB8 300 -0.75 Temperature ( -1.5 CB4 0 33 CB2 30 300 Salinity (PSU) 70oN 27 0 3 ) USA 0.3 Depth (m) 0.2 Chlorophyll CANADA 300 fluorescence (mg/m 60oN 0.1

0 4.5 3 ) 50oN 3.75 FDOM (mg/m 300 3

40oN Ocean View Data 2.25 150oW 120oW 90oW 60oW 72oN 73oN 74oN 75oN 76oN 77oN 78oN 79oN 80oN Latitude B Completeness

FDOM max SCM Surface

Archaea Euk aryote a s teri bac Cya ido no Ac ba cte teo ri ro a -p ta el D

C h lo ro e x i

A c a t i i r n e o t c b a a b c t o e

e r t i a o r

p -

a

h

p l

A

B

Relative Coverage a

c

t

0 0.2 0.4 0.6 0.8 1 e

r

o

i

d

e

t

e

s

M

G

A

V

e

r

r

u

c

o

P

l

a

n

c

t

o

a

i

r

e

t

c

a

b

o

e

t

o

r

p -

a

t

e Surface B

SCM

FDOM max

G

a

m

m

a

- p

r

o

a t i e r o e t b c a Completeness

Figure 1 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A 92 SAR202-II (Ga0133547_10000520) SAR202-II (Ga0133547_11207734) 91 SAR202-II (Ga0133547_10125571) MAG SAR202-II-3 (Ga0133547_10001736) 100 Sargasso Sea SAR261 (AY534091.1) SAR202-II (Ga0133547_10453494) Sargasso Sea SAR194 (AY534098.1) SAR202-II (Ga0133547_10310572) II SAR202-II (Ga0133547_10341800) SAR202-II (Ga0133547_10001767) 91 100 Sargasso Sea SAR317 (AY534093.1) B SAR202-II-3SAR202-II-177ASAR202-VII-2SAR202-VI-29ATK10-74AAnck29-46 MAG SAR202-II-177A (Ga0133547_1000294) CB2 SAR202-II (Ga0133547_11014312) 50 SAR202-II Ga0133547_10000170 100 Crater Lake CL500-10 (AF316764.1) CB4 Crater Lake CL500-9 (AF316763.1) 20 99 Arctic Ocean Arctic95A18 (AF355054.1) 100 Surface CB8 SAR202-I (Ga0133547_11227957) I 10 Sargasso Sea SAR272 (AY534088.1) CB11 100 Sargasso Sea SAR188 (AY534087.1) 5 100 North Aegean Sea AEGEAN_116 (AF406538.1) CB2 100 SAR202-I (Ga0133547_11319260)

Coverage (x) 2 MAG SAR202-VII-2 (Ga0133547_10000360) 100 VII Sponge Klon-I-93-voll (HF912758.1) CB4 1 SAR202 (Ga0133547_10004625) SCM 97 Sargasso Sea SAR242 (AY534089.1) Sargasso Sea CB8 North Atlantic Ocean AB-629-P13 v2 (2640444700) V 0 Western Arctic Ocean Ga0133547_10010958 CB11 Western Arctic Ocean Ga0133547_10017273 85 100 Western Arctic Ocean Ga0133547_100409911 VI SAR202 Cluster 72 MAG SAR202-VI-29A (Ga0133547 10002442) CB2 Western Arctic Ocean Ga0133547_10035846 Trichloroethene-contaminated site FTL276 (AF529110.1) 100 CB4 Western Arctic Ocean Ga0133547_10016871 VIII 100 Seawater clone SAR259 (AY534094.1) Sargasso Sea CB8 100 Western Arctic Ocean Ga0133547_10002546 IV FDOM max Pacific Ocean sediment MBMPE42 (AJ567556.1) Pacific Ocean sediment MBAE74 (AJ567614.1) CB11 Seawater clone D92_22 (AY534099.1) Sargaso Sea 91 100 Western Arctic Ocean Ga0133547_10472218 Forrest soil C083 (AF507700.1) 87 Sargasso Sea SAR202 (U20797.1) Station Aloha AAA240-N13 (2264260034) 95 III 84 Station Aloha AAA240-O15 (HQ675645.1) Sargasso Sea SAR307 (U20798.1) 89 Arctic Ocean Arctic96BD6 (AF355053.1) Atlantic Ocean AAA001-F05 (HQ675371.1) 90 Atlantic Ocean AAA007-M09 (HQ675482.1) 100 Dehalococcoidia Anck29-46SAR202-V-AB-629-P13SAR202-III-AAA240-O15 Sponge TK17 (AJ347036.1) SAR202-II-3SAR202-II-177ASAR202-VII-2SAR202-VI-29ATK10-74A 98 C Southeast India sediment bac140 (JX138712.1) 83 TK10 Cluster Surface 99 MAG SAR202-TK10-74A (Ga0133547_10008775) 5 100 Sponge XC1005 (JN596666.1) SCM Red Sea Sponge C433/GW1046 (HG423495.1) 2.5 Chloroflexi (Ga0133547_10005725) Chloroflexi (Ga0133547_10000969) Mesopelagic RPKG Chloroflexi (Ga0133547_10365966) 0.5 100 Anck29 (Ga0133547_10009981) Bathypelagic 100 Sponge-associated clone AncK29 (FJ900348.1) 0 94 Hotoke-Ike Lake MPB1-195 (AB630577.1) Anck29 (Ga0133547_10000780) Gulf of Mexico DS34_F10 (KU578722.1) Anck29 Gulf of Mexico 86 (HQ015617.1) 99 Atlantic Ocean MOC-1-361 (KX427939.1) Cluster 86 Anck29 (Ga0133547_10183849) MAG SAR202 Anck29-46 (Ga0133547_10007268) Saanich Inlet SGPZ645 (GQ346975.1) Saanich Inlet SGPZ590 (GQ346947.1) Thermoflexus hugenholtzii JAD2 (KC526151.1) 100 Caldilineaeceae 84 Ardenticatena maritima (AB576167.1) Station Aloha HF500_13N24 (EU361068.1) 99 Station Aloha AAA240O05 (HQ675640.1) Deep Ocean 100 Canada Basin CB0563b.27 (GQ337107.1) Cluster Station Aloha HF770_09E03 (EU361081.1) 100 Gulf of Mexico 52331 (JN018849.1) 100 100 DOC (Ga0133547_10351421) Crater Lak CL500-11 (AF316759.1) Crater Lake, 500 m 100 100 CL500-11 Lake Michigan CL500-11LM (Denef et al. 2016) Lake Michigan Cluster 100 Anaerolineae 100 Lake Zurich ZS4311 (FN668371.2) Lake Gatun 5C231021 (EU903451.1) 100 Ktedonobacteria 100 Thermomicrobia

100 Chloroflexia 100 Chloroflexia

0.08

Figure 2 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A 40

20 3 SAR202-VII-2 9 30 SAR202-II-3 13 SAR202-II-177A 117 8 SAR202-VI-29A Anck29-46 20 TK10-74A No. of genes 10

0

O-demethylases Ring-decarboxylases

Gentisate 1,2-dioxygenaseCatechol 2,3-dioxygenase

Protocatechuate 4,5-dioxygenase Ring-hydroxylating dioxygenase Flavin-dependent monooxygenases

B C Dearomatization Decarboxylation

ligA ligB 2_0234 2_0233 4,5-dihydroxyphthalate 4_0522 4_0523 decarboxylase 4_0200 4_0237

4_0238 8610_0002

8610_0007 Demethylation

Gentisate dioxygenase 2_0146 5,5’-dehydrodivanillate 4_0244 O-demethylase 60_0179 103_0019

360_0073 2_0528

4_0172 Hydroxylation

Catechol dioxygenase

2_0074 3-phenylpropionate/ 2_0240 trans-cinnamate dioxygenase

2_0408 Terrestrial clade 2_0449 Marine clade 45_0109 Mixed clade 60_0003 60_0202 336_0078 biphenyl 2,3-dioxygenase 297_0090

Figure 3 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

74 Streptomyces avicenniae | Terrestrial, Avicennia mariana rhizosphere (UPI00069C8037) Actinomycetales Allosalinactinospora lopnorensis | Terrestrial, Tamarisk tree rhizosphere (UPI000623DDAE) Actinoplanes globisporus | Terrestrial, Soil (UPI00035C3EFA) Pseudonocardia sp. CNS139 | Marine, Sediment (A0A1Q9SZW9) 100 Pseudonocardia spinosispora | Terrestrial, Soil (UPI0003FB56B0) Pseudonocardia acaciae | Terrestrial, Acacia auriculiformis rhizosphere (UPI00048CFA4D) 60 Streptomyces sp. MnatMPM17 | Terrestrial, Soil (A0A1C5CS47) MAG SAR202-VII-2 (Ga0133547_10000004_0224) 87 100 Streptomyces odonnellii | Terrestrial, Soil (UPI0004C7D6A2) Mycobacterium sp. JS623 | Human, Soft tissue lesion (L0ITX6) Streptomyces sp. | Experimentally validated (Q7X284) 98 Leadbetterella sediminis | Freshwater sediment (A0A0P7C872) Chryseobacterium piperi | Freshwater, Creek (A0A086BJU5) 100 Solirubrobacter sp. URHD0082 | Terrestrial, Soil (UPI000422E0D1) 73 Chloroflexi bacterium 5419 | Bioreactor (A0A1Q3SS95) Acidobacteria bacterium | Terrestrial, Soil (A0A1Q6XSE4) Deltaproteobacteria bacterium | Terrestrial, Soil (A0A1F9KR35) Ca. Rokubacteria bacterium | Terrestrial, Soil (A0A1Q7GJ09) 100 TOBG EAC-693 Hydrothermal vent metagenome | Marine, Hydrothermal vent (A0A160V8U3) Acidobacteria bacterium | Terrestrial, Soil (A0A1Q7MJD6) 100 Burkholderiaceae arationis LMG 29324 | Terrestrial, Soil (A0A158B6L5) Paracoccus versutus | (A0A099FP16)

100 bacterium | Freshwater, Groundwater (A0A1F3ZF54) Deltaproteobacteria bacterium | Terrestrial, Soil (A0A1Q7J4E1) 92 Ca. Rokubacteria | Terrestrial, Soil (A0A1Q6Z3E1) Sulfitobacter geojensis | Marine, Coastal waters (UPI00046AA492) Kyrpidia tusciae strain DSM 2912 | Freshwater, Pond (D5WS41) Pseudomonas | Experimnetally validated (Q9S3U6) Chloroflexi bacterium | Terrestrial, Soil (A0A1Q7QL69) Conexibacter woesei | Terrestrial, Soil (D3FAR9) Nocardia arthritidis | Human (UPI0007A3EF410 Hydrothermal vent metagenome | Marine, Hydrothermal vent (A0A160V9N6) 100 Hydrothermal vent metagenome | Marine, Hydrothermal vent (A0A160V9W9) 100 TOBG NAT-93

100 TOBG SP-320 100 MAG SAR202-VII-2 (Ga0133547_10000002_0146) MAG SAR202-VII-2 (Ga0133547_10000360_0073) Magnaporthe oryzae | Terrestrial (Q2KEQ4) Hydrogenophaga pseudoflava | (UPI00082636C0) Haloferax sp. | Experimentally validated (Q330M9) 100 100 TOBG SAT-91 MAG SAR202-VII-2 (Ga0133547_10000103_0019) Dyadobacter beijingensis | Terrestrial, Soil (UPI00037A3AB9) 95 Stigmatella aurantiaca | (A0A1H8DFG8)

83 Photorhabdus temperata | Insect symbiont (W3V0B8) Chromobacterium sp. LK11 | Terrestrial (A0A0J6NGJ2) 100 Micromonospora rhizosphaerae | Terrestrial, Mangrove rhizosphere (A0A1C6SGZ1) Actinoplanes globisporus | Terrestrial, Soil (UPI000371AF25) 100 lata | (UPI000829F41D) Nodosilinea nodulosa | Marine (UPI0003025B02) Betaproteobacteria bacterium | Freshwater, Groundwater (A0A1F3ZZ05) Chloroflexi bacterium | Terrestrial, Soil (A0A1Q8AAT0) 98 Cohnella kolymensis | Terrestrial, Permafrost soil (A0A0C2QCW1) Acidobacteria bacterium | Freshwater, Groundwater (A0A1F2T6K4) 100 Armatimonadetes bacterium | Terrestrial, Soil (A0A1Q7Q8F3) 96 Deltaproteobacteria bacterium | Terrestrial, Soil (A0A1F9KKT3) 64 Chloroflexi bacterium | Terrestrial, Soil (A0A1Q7MBJ3) 100 Frankia sp. Iso899 | Terrestrial, Soil (UPI0003B34178) 94 MAG SAR202-VII-2 (Ga0133547_10000060_0179) Armatimonadetes bacterium CSP13 | Freshwater sediments (A0A0T6ART8) Microbacterium sp. Root53 | Terrestrial, Arabidopsis thaliana root (A0A0Q7UW44) 75 Rhizobiales bacterium 6217 | Bioreactor (A0A1Q4BBC1) 74 100 Pusillimonas noertemannii | Freshwater, River water (UPI0002D6E76C) Puniceibacterium sp. IMCC21224 | Marine, Coastal waters (A0A0J5QC59) 78 bacterium | Terrestrial, Soil (A0A1Q7BII2) Streptomyces sp. AcH 505 | Terrestrial, Soil (A0A0C2AZL0) 97 Deltaproteobacteria bacterium | Freshwater, Groundwater (A0A1F8XVR5) 99 Deltaproteobacteria bacterium | Terrestrial, Soil (A0A1F9K2A2) 93 Deltaproteobacteria bacterium | Terrestrial, Soil (A0A1F9KQT30 Marine sediment metagenome | Marine sediments (A0A0F9CXY8) MAGSAR202-VII-2 (Ga0133547_10000002_0528) 100 MAG SAR202-VII-2 (Ga0133547_10000004_0172) TOBG SP-253 100 TOBG SP-2981 Marine microorganism HF4000ANIW93N21 | Marine (B3T345) 93 Deltaproteobacteria bacterium | Terrestrail, Soil (A0A1F9JXN1) Deltaproteobacteria bacterium | Terrestrial, Soil (A0A1F9KR04) Marinobacter algicola DG893 | Marine (A6F042) 65 Deltaproteobacteria bacterium | Terrestrial, Soil (A0A1Q7JAY6)

100 Metallosphaera yellowstonensis MK1 | Freshwater, Thermal spring (H2C979) Sulfolobales archaeon AZ1 | Freshwater, Acidic hot spring (W7KL35) Ca. Acidianus copahuensis | Freshwater, Acidic hot spring (A0A031LPX2) Figure 4 0.4 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not Supp. Figure 1 certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Station Aloha SCGC AAA240-O15 80 MAG SAR202-II-3 100 MAG SAR202-II-177A Gulf of Mexico OMZ SAR202 43 100 100 Gulf of Mexio OMZ SAR202 43-1 SAR202 cluster Gulf of Mexico OMZSAR202 43-2 100 MAG SAR202-VI-29A MAG SAR202-VII-2 98 North Atlantic Ocean SCGC AB-629-P13 80 100 Dehalococcoides ethenogenes 195

100 Dehalococcoides mccartyi GY50 Dehalogenimonas alkenigignens 100 Dehalococcoidia 100 100 Dehalogenimonas lykanthroporepellens BL-DC-9

98 Dehalococcoidia bacterium DscP2 Dehalococcoidia bacterium SCGC_AB-539-J10 98 MAG TK10-74A 98 Non-SAR202 Marine Chloroflexi MAG Anck29-46 Thermogemmatispora carboxidivorans PM5 97 100 Ktedonobacter racemifer SOSP1-21 DSM 44963 Thermoflexus hugenholtzii JAD2 Caldilinea aerophila 100 100 100 Lake Michigan CL500-11LM Anaerolinea thermophila UNI-1 Thermorudis peleae 100

100 Thermomicrobium roseum DSM 5159 Sphaerobacter thermophilus 100 Nitrolancea hollandica Lb Roseiflexus sp. RS-1 100 100 Roseiflexus castenholzii HLO8 DSM 13941 Kouleothrix aurantiaca COM-B 100 Candidatus Chlorothrix halophila Oscillochloris trichoides DG6 100 100 Chloroflexus aurantiacus J-10-fl Chloroflexus aggregans DSM 9485 Herpetosiphon geysericola DSM 7119 100 Herpetosiphon aurantiacus DSM 785

0.2 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Fig 1. A concatenated protein phylogeny of Chloroflexi genomes

including 30 Chloroflexi reference genomes and the 6 Canada Basin Chloroflexi MAGs.

This is a maximum likelihood tree based on concatenated sequence alignment of fifty amino

acid sequences (representing of 14,815 amino acid positions), present in at least 34 of the 36

Chloroflexi genomes. Bootstrap values are based on 100 replicates.

bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supp. Figure 2

3-phenylpropanoate/ 3-hydroxycinnamate degradation Phthalate degradation 5,5’-dehydrodivanillate degradation

Catechol degradation (III) or

1.14.12.7 1.14.13.127 8 1 1 1.13.11.1 Protocatechuate degradation (II) 30 1.14.13.- or

7 Protocatechuate degradation (I) 1.3.1.64 1 1 5.51.1 1.13.11.3 1.3.1.87 4.1.1.55 Gentisate degradation 8 1 or Catechol degradation (II) 2 1.13.11.8 2 1 1 5.5.1.2 5.3.3.4 11 1.13.11.4 1.13.11.16 3 2 8 4 4.1.1.44 1.13.11.2 or 1 6 2 1 1 3 3 3.7.1.14 Spontaneous 2 2 2 3.1.1.24 1 1 3 4 5.2.1.4 4 1.2.1.85

Fumarate Succinate Methylgallate degradation 2.8.3.6 1.1.1.312 1 1 3.7.1.20 TCA Spontaneous TCA 5.3.2.6 1.13.11 1 4.1.1.77 3

2.3.1.174 1 3 3 1 3.1.1.57 4.2.1.80 Gallate degradation 4.1.1.- 3 3 1.13.11.57

TCA 4.1.3.39 Spontaneous

1.2.1.10 Vanillate degradation

4.2.1.83 4 SAR202-II-3SAR202-II-177ASAR202-VII-2SAR202-III-AAA240-O15 Acetyl CoA TCA TCA 4.1.3.17 SAR202-VI-29ATK10-74AAnck29-46SAR202-V-AB-629-P13 1 Pyruvate 1.1.1.38/ 1.1.1.40/4.1.1.3

3 4 16 TCA 4 1 1 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Fig 2. Aromatic compound degradation pathways identified in the Arctic

Ocean and deep ocean Chloroflexi genomes. Circles represent the six Arctic Ocean MAGs

and two deep ocean SAGs. The values in the circles represent the number of homologs

identified in each respective genome. These reference metabolic pathways from MetaCyc were

populated with proteins based on functional annotations available at the Integrated Microbial

Genomes database.

bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not Supp. Figure 3 certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

100 Aliiroseovarius crassostreae | Marine (A0A0P7KJ16) 9 8 Nitratireductor aquibidomus RA22 | Marine (I5BUF3) 9 8 Gordonia otitidis NBRC 100426 | (H5TP87)

9 1 Paenibacillus sp. VKM B-2647 | Frozen volcanic ash (A0A0C2UQ70) 100 Chlamydia trachomatis | (A0A0E9G1Q0) unclassified Chloroflexi | Soil (A0A1Q7QMG5) 7 7 100 Streptomyces sp. MnatMP-M27 | Terrestrial, Bioprocessing plant (A0A1C5BR02) Arthrobacter sp. 161MFSha2.1 | Terrestrial, Rhizosphere (UPI00039C2D77) Rhodoplanes sp. Z2-YC6860 | Isolate from agricultural samples (A0A140GUJ7) 8 5 Mesorhizobium bacterium | Drinking water, Freshwater (A0A090EV43) 6 3 Tardiphaga robiniae | Terrestrial, Vavilovia formosa symbiont (A0A161QNM3) 100 Bradyrhizobium valentinum | Terrestrial, Lupinus mariae-josephae symbiont (A0A0R3L1I1) Ca. Rokubacteria | Soil (A0A1Q6YUZ9) Pseudonocardia spinosispora | Terrestrial, Gold mine (UPI0004169A7F) 8 8 Betaproteobacteria bacterium | Freshwater, Groundwater (A0A1F4A0U7) Betaproteobacteria bacterium | Freshwater, Groundwater (A0A1F3ZXD0) Alphaproteobacteria bacterium | Soil (A0A1F2ZXE7) 100 Deltaproteobacteria bacterium | Marine (A0A1F9KEB4) Rhodospirillales bacterium URHD0088 | Soil (UPI000483E998) Ca. Entotheonella sp. TSY1 | Marine, Sponge symbiont (W4LS37) Betaproteobacteria bacterium SG8 41 | Marine, White Oak River Estuary (A0A0S8C180) 9 9 Betaproteobacteria bacterium | Freshwater, Groundwater (A0A1F4AIL6) Ca. Rokubacteria | Soil (A0A1Q7G0Z7) 9 3 MAG SAR202-VII-2 (Ga0133547_10000002_0074) 100 unclassified Deltaproteobacteria | Groundwater (A0A1F9A6F9) Pseudomonas putida | Experimentally validated (P06622) Cyanobacteria bacterium | (A0A1Q7WSB7) 9 9 Acidobacteria bacterium | Freshwater, Groundwater (A0A1F2TRP4) 6 3 Acidobacteria bacterium | Terrestrial, Soil (A0A1F2U4F4) 9 6 Novosphingobium sp. AAP1 | Lake, Freshwater (A0A0N0JJN4) 100 Sphingobium xenophagum | Marine, Atlantic Ocean (Q50912) Sphingomonas paucimobilis NBRC 13935 | Hospital ventilator (A0A0C9NLQ6) TOBG RS-416 9 0 SAR202 cluster bacterium sp. SCGC AAA240-O15 | Marine, HOTS (2521613138) 8 0 SAR202 cluster bacterium sp. SCGC AAA240-O15 | Marine, HOTS (2521612387) 6 0 SAR202 cluster bacterium sp. SCGC AAA240-O15 | Marine, HOTS (2521612051) Deltaproteobacteria bacterium | Soil (A0A1F9KCU9) Nocardia nova SH22a | Terrestrial, Rhyxoplane (W5TJ40) 100 Mycobacterium sp. VKM Ac-1817D | Terrestrial, Soil (A0A0C5Y2E9) Sphingomonas sp. 67-36 | Ammonium sulfate bioreactor (A0A1M3D7P9) 9 9 Ca. Entotheonella | Marine, Theonella swinhoei symbiont (W4L5U7) 8 8 uncultured bacterium CSL144 | (G4WVR1) 100 Ferriphaselus amnicola | Freshwater, Groundwater (A0A0E9M983) Ferriphaselus sp. R-1 | Freshwater, Groundwater, Microbial mat (UPI000556BCD2) 8 3 Ca. Rokubacteria bacterium | Sediment (A0A0T5ZU72) 9 6 Ca. Rokubacteria bacterium | Terrestrial, Soil (A0A1F7MKX5) Nitrospira defluvii | Freshwater (D8P7G2) Chloroflexi bacterium 54-19 | Ammonium Sulfate bioreactor (A0A1Q3TUP7) 9 1 Ca. Giovannonibacteria bacterium | Freshwater, Groundwater (A0A1F5X5L8) Frankia inefficax | Terrestrial, Plant symbiont (E3IZS7) 7 2 MAG SAR202-VII-2 | (Ga0133547 10000045_0109) 7 5 TOBG SP-320 Betaproteobacteria bacterium | Fershwater, Groundwater (A0A1F4BZV0) 100 Burkholderia bacterium | (Q2T2L8) 8 0 Ca. Burkholderia humilis | Terrestrial, leaf nodule (A0A0K9JVQ6) SAR202 cluster bacterium sp. SCGC AAA240-O15 | Marine, HOTS (2521612191) Carnobacterium sp. CP1 | Marine, Antarctic (A0A0U3DRF6) 100 Halobacillus sp. BBL2006 | Sulfur-rich salt spring (A0A0B0D3E9) Acinetobacter sp. TGL-Y2 | Frozen soil (A0A143G2H2) 9 3 MAG SAR202-VII-2 | (Ga0133547_10000002_0449) TOBG EAC-693 9 7 Halomonas sp. PR-M31 | Marine, Sea cucumber symbiont (UPI0006519CB3) Halomonas subglaciescola | Antarctic Saline Lake (A0A1M7HAS5) MAG SAR202-VII-2 | (Ga0133547_10000297_0090) TOBG RS-362 TOBG SP-253 100 MAG SAR202-VII-2 | (Ga0133547_10000336_0078) 7 6 MAG SAR202-VII-2 | (Ga0133547_10000060_0003) MAG SAR202-VII-2 | (Ga0133547_10000002_0240) SAR202 cluster bacterium sp. SCGC AAA240-O15 | Marine, HOTS (2521613060) 9 0 SAR202 cluster bacterium sp. SCGC AAA240-O15 | Marine, HOTS (2521612251) SAR202 cluster bacterium SCGC AB-629-P13 | Marine, North Atlantic Ocean (2640444676) MAG SAR202-VII-2 | (Ga0133547_10000060_0202) 100 MAG SAR202-VII-2 | (Ga0133547_10000002_0408) 6 6 TOBG NAT-123

0.50 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Fig 3. Phylogenetic analysis of predicted catechol 2,3-dioxygenase (pfam

00903) protein sequences identified in SAR202-VII-2 . This is a maximum likelihood tree

based the alignment of catechol 2,3-dioxygenase sequences identified in SAR202-VII-2,

UniRef90 sequences and Chloroflexi MAGs constructed from the TARA Oceans dataset.

Bootstrap values of >60% are shown (100 replicates). SAR202-VII-2 sequences are shown in

red while sequences originating from the TARA Oceans MAGs are shown in blue.

bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Supp. Figure 4

Alicyclobacillus kakegawensis | (UPI00082FCD86) 66

83 Alicyclobacillus acidoterrestris strain ATCC 49025/DSM 3922/CIP 106132/NCIMB 13137/GD3B | Terrestrial, Soil (T0BNY6)

100 Paenibacillus naphthalenovorans | Terrestrial, Soil (A0A0U2IMX2)

Alicyclobacillus hesperidum URH17-3-68 | Freshwater, Thermal Spring (J9HCL8)

86 Bacillus thermotolerans | Terrestrial, Soil (A0A0F5HP79)

86 Cupriavidus sp. UYMMa02A | Terrestrial, Mimosa magentea nodule (A0A1E4QNX7)

Nocardia jiangxiensis | Terrestrial, Soil (UPI0002E47CB9) 100 Nocardioides sp. Iso805N | Terrestrial, Soil (UPI000377A736)

Pseudonocardia spinosispora | Terrestrial, Soil (UPI0004000C03)

Deltaproteobacteria bacterium | Terrestrial, Soi (A0A1F9KFR3)

Betaproteobacteria bacterium | Terrestrial, Soil (A0A1F4F7S0)

TOBG NAT-575

MAG SAR202-VII (Ga0133547_10000004_0237) 98 MAG SAR202-VII (Ga0133547_10008610_0007) 66 99 TOBG SP-320 85 Ca. Rokubacteria bacterium 13_1_40CM_68_15 | Terrestrial, Soil (A0A1Q7D3F6) 76 Pseudonocardia thermophila | Terrestrial, Animal manure (A0A1M6UG12)

MAG SAR202-VII (Ga0133547_10000004_0238) 65 Ca. Entotheonella sp. TSY1 | Marine sponge symbiont (W4LAU4)

Caballeronia sordidicola | Terrestrial (UPI0004D02645) 89

Burkholderiaceae sp. | Freshwater, Drinking water (A0A1A9MY11) 100 Cupriavidus sp. UYMMa02A | Terrestrial, Mimosa magentea nodule (A0A1E4QSE4) 100 Burkholderiaceae bacterium 16 | Terrestrial, Soil (A0A0F0FCF7) 85 Pseudomonas azotifigens | Terrestrial, Compost material (UPI0003FE4389)

100 Sphingomonas mali | Estuary sediment (UPI0008299C14)

Skermanella stibiiresistens | Terrestrial, Soil (W9H036)

Sphingomonas paucimobilis GN_desZ_PE_4_SV_1 | Experimentally tested (Q7WYU8)

99 Streptomyces lushanensis | Terrestrial, Soil (UPI0008300D2F) 100 Streptomyces sp. AcH_505 | Terrestrial, Soil, Mycorrhizosphere (A0A0C2AZK0) 100 Pseudonocardia acaciae | Terrestrial, Acacia auriculiformis roots (UPI0004920F1A)

Rhodococcus yunnanensis | Terrestrial, Soil (UPI00093405DA)

92 Microbacterium azadirachtae | Terrestrial, Soil (A0A0F0LPY2) 100 Actinobacteria sp. M39 | Termite tissue (A0A178X6B3) 76 Sciscionella marina | Marine, sand sediments (UPI00037CB6FA)

0.2 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Fig 4. Phylogenetic analysis of predicted 3-O-methylgallate dioxygenase

pfam 02900) protein sequences identified in SAR202-VII-2. Method and description as

described in Supp Figure 2.

bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Supp. Figure 5

100 Caballeronia sordidicola | Terrestrial (UPI0004D01EFA) 62 Paraburkholderia caribensis MBA4 | Terrestrial, Soil (A0A0P0RIQ4)

100 xylosoxidans | Human specimen (A0A1Q9X8R3) Bordetella sp. SCN 76-23 | Bioreactor (A0A1E3ZM49)

100 84 Variovorax soli | Terrestrial, Soil (UPI000838866B)

Cyanobacteria bacterium | Terrestrial, Soil (A0A1Q7XRG7) 100 63 Bradyrhizobium erythrophlei | Terrestrial (A0A1M5GHD9)

93 Rhodoplanes sp. Z2-YC6860 | Terrestrial (A0A127EWV7)

Ca. Rokubacteria bacterium | Terrestrial, Soil (A0A1Q7U362)

Ca. Rokubacteria bacterium | Terrestrial, Soil (A0A1Q7D4T9)

Acidobacteria bacterium | Terrestrial, Soil (A0A1Q8A4G9) 91 97 Acidobacteria bacterium | Terrestrial, Soil (A0A1Q7AKK7)

MAG SAR202-VII (Ga0133547_10000004_0200 99 Pseudonocardia acaciae | Terrestrial, Acacia auriculiformis roots (UPI00048BCA37) 100 Streptomyces sp. CNQ865 | (UPI00042957DF)

Frankia sp. NRRL B-16219 | Terrestrial, Soil (A0A1S1Q0L2) 100 Microtetraspora malaysiensis | Terrestrial, Soil (UPI000835F5F5)

Ca. Rokubacteria bacterium | Terrestrial, Soil (A0A1Q7PCG3)

Sulfolobus acidocaldarius SUSAZ | Freshwater, Geothermal hot spring (V9S8X5) 100 Metallosphaera yellowstonensis MK1 | Freshwater, Hot spring (H2C1X5)

Deltaproteobacteria bacterium | Freshwater, Groundwater (A0A1F9AAA4)

Pyrobaculum ferrireducens | Freshwater (G7VHK0)

Paenibacillus chondroitinus | Terrestrial, Populus root endosphere (UPI000648F4AD)

TOBG RS-394 100 TOBG RS-394

Hydrothermal vent metagenome | Marine, Hydrothermal vent (A0A160V8E9)

Deltaproteobacteria bacterium | Terrestrial, Soil (A0A1F9KHN7) 61 Pseudorhodoferax sp. | Terrestrial, Arabidopsis thaliana leaf (A0A0S9MUU2) 92 Alicyclobacillus kakegawensis | (UPI00082FCD86) 96 Cupriavidus sp. UYMMa02A | Terrestrial, Mimosa magentea symbiont (A0A1E4QNX7)

Sphingomonas sp. KA1 | (Q0KJ68) 92 Streptomyces violaceoruber | Terrestrial, Soil (UPI0004C7DB91) 100 Ca. Entotheonella sp. TSY1 | Marine, Marine sponge symbiont (W4L634) 100 | Activated sludge (UPI000416C2F8) 84 Betaproteobacteria bacterium | Terrestrial, Soil (A0A1F3YKZ2)

Rhodococcus jostii | Experimentally validated (Q0S034) 100 Pseudomonas nitroreducens | Experimentally validated (O06646)

0.5 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Fig 5. Phylogenetic analysis of a predicted merged LigA/LigB fusion

protein sequence identified in SAR202-VII-2. Method and description as described in Supp

Figure 2.

bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Supp. Figure 6

Pelagibacterium halotolerans | Marine, Marginal sea (G4RAM3) 100 76 Polymorphum gilvum | Terrestrial, Soil (F2IZI1)

Solirubrobacterales bacterium 70-9 | Bioreactor

Streptomyces varsoviensis | (A0A0L8QM23) 82 Streptomyces bacterium | (UPI0004C8D7FA)

MAG SAR202-VII (Ga0133547_10000002_0234) 98 Deltaproteobacteria bacterium | Terrestrial, Soil (A0A1Q7J251)

Pseudonocardia asaccharolytica | Terrestrial (UPI00041AEE9E) 86 Nakamurella multipartita Y-104 | Bioreactor wastewater (C8XHT9)

Sphingomonas paucimobilis | Experimentally validated (P22635)

Ca. Rokubacteria bacterium | Terrestrial, Soil (A0A1Q6VXE5)

Leucobacter sp. 7_1 | Cheese (A0A1R4HKQ4) 82 Salinibacterium sp. PAMC_21357 | Terrestrial, Antarctic Soil (UPI000288FD34) 99 Streptomyces sp. NBRC_109706 | (UPI0007840008)

Actinomycetales bacterium JB111 | Cheese (A0A1R4FA68)

MAG SAR202-VII (Ga0133547_10000004_0522) 92 Streptomyces sp. WM6386 | Terrestrial, Soil (A0A0F5VUF5)

Deltaproteobacteria bacterium | Terrestrial, Soil (A0A1F9KRG8)

Betaproteobacteria bacterium | Freshwater, Groundwater (A0A1F3ZC27)

Rhodospirillaceae bacterium CCH5-H10 | Drinking water (UPI00082F2864)

Herbaspirillum autotrophicum | Marine, Surface seawater (UPI00067ACA93) 65 Pseudomonas aeruginosa BL04 | Human specimen (A0A1C7BX42) 90 Variovorax paradoxus | (UPI000369FD25) 100 | (A0A072TD22)

Croceicoccus sp. H4 | Marine, Surface seawater (UPI0008368C73)

Caulobacter sp. AP07 | Terrestrial, Populus deltoides rhizosphere (J2PST9)

65 Sphingopyxis terrae | (UPI000789526F) 94 Sphingomonas sp. 67-36 | Bioreactor (A0A1M3DEG2) 70 Hydrothermal vent metagenome | Marine Hydrothermal vent (A0A161KCY9)

Novosphingobium malaysiense | Terrestrial, Mangrove soil (A0A0B1ZQR4) 76 Sphingomonas bacterium | (UPI00082AAD55)

Alcaligenes faecalis | (UPI00068A0F4F) 100 Burkholderia sp. OLGA172 | Terrestrial, Soil (A0A160FT67)

0.3 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Fig6. Phylogenetic analysis of predicted LigA dioxygenase (pfam 07746)

protein sequences identified in SAR202-VII-2. Method and description as described in Supp

Figure 2.

bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not Supp. Figure 7 certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

75 Frankia bacterium | Terrestrial, Datisca glomerata Dg1 symbiont (F8AXH0) 87 Pseudonocardia sp. SCN_72-86 | Bioreactor (A0A1E4IDA7) Pseudonocardia spinosispora | Terrestrial, Soil (UPI0004124EC2) 74 Patulibacter minatonensis | Terrestrial, Soil (UPI00047A058A) Rhodococcus opacus B4 | (C1BB36) Mycobacterium rhodesiae JS60 | Terrestrial, Soil (G4HS34) 99 98 Mycobacteriaceae hassiacum DSM 44199 | Human Urine (K5BHL7)

85 Amycolatopsis methanolica 239 | Terrestrial, Soil (A0A076MQZ5) 97 Pseudonocardia autotrophica | (UPI000569C792) Streptomyces sp. MMG1533 | Terrestrial, Soil (A0A0M8TIB1)

100 Streptomyces prunicolor | (UPI0004779436) 94 Streptomyces acidiscabies | Terrestrial, Soil (A0A0L0JU72) 100 Nocardioides sp. Soil797 | Terrestrial, soil (A0A0Q9RWI1) 85 Nocardioides sp. J54 | (UPI00048F3FE2) 89 100 Gordonia polyisoprenivorans Kd2 | Freshwater, Pond (H0RMP9) Sciscionella marina | Marine sediment (UPI000370C781) 100 Nocardia vaccinii | (UPI000831AC97)

65 Pseudonocardia sp. SCN 72-86 | Bioreactor (A0A1E4I8U8)

65 Pseudonocardia acaciae | Terrestrial, Acacia auriculiformis root (UPI00048D6F7E) Mycobacterium mucogenicum | (UPI00076A6EB1) Streptacidiphilus oryzae | Terrestrial, Soil (UPI00055FA1E2) 84 97 Betaproteobacteria bacterium | Terrestrial, Soil (A0A1F4F092)

92 Noviherbaspirillum sp. Root189 | Terrestrial, Arabidopsis thaliana root (A0A0Q8QLF5) Cupriavidus necator | Terrestrial, Soil (UPI0005B45146) Frankia sp. Iso899 | Terrestrial, Soil (UPI0003B53B15)

92 Betaproteobacteria bacterium SG8 41 | Estuary sediment (A0A0S8C399) sp. SCN_67-23 | Bioreactor (A0A1E3Z5Z8)

82 MAG SAR202-VII (Ga0133547_10008610_0002) Betaproteobacteria bacterium | Freshwater, Groundwater (A0A1F3ZTR8)

100 Hydrothermal vent metagenome | (A0A160V6A6) Ca. Entotheonella sp. TSY2 | Marine Sponge symbiont (W4L8S2) 100 Ca. Entotheonella | Marine Sponge symbiont (W4M9A7)

78 Hydrothermal vent metagenome | (A0A160V9P6) 84 TOBG NAT-93 100 Hydrothermal vent metagenome | (A0A160V8E9) 100 Hydrothermal vent metagenome | (A0A161KFT0) Betaproteobacteria bacterium | Freshwater, Groundwater (A0A1F3ZZE6)

74 Burkholderiales sp. SCN 67-23 | Bioreactor (A0A1E3Z9R3) Immundisolibacter cernigliae | Terrestrial, Soil (A0A1B1YRT3)

0.3 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Fig 7. Phylogenetic analysis of a predicted LigB dioxygenase (pfam 02900) protein sequence identified in SAR202-VII-2 (Ga0133547_100086102). Method and description as described in Supp Figure 2. bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Supp. Figure 8

69 Tetrasphaera sp. Soil756 | Terrestrial, Arabidopsis thaliana symbiont (A0A0Q9MC26) 88 Streptomyces sp. WMMB 322 | (A0A1C6M2I8) Micromonospora echinaurantiaca DSM 43904 | Terrestrial, Soil (A0A1C5HIZ1) 65 Actinoplanes missouriensis NBRC 102363 | Terrestrial, Soil (I0H9W5) Ktedonobacter sp. 13_2_20cm_2_56_8 | Terrestrail, Soil (A0A1Q7A4Z9)

100 Actinospica robiniae | Terrestrial, Soil (UPI0003FBAA54) Streptomyces nodosus | Terrestrial, Soil (A0A0B5DVZ7) 100 Ferrimicrobium acidiphilum DSM 19497 | Terrestrial, Acid mine drainage (A0A0D8FRJ8) Rhizobium sp. WYCCWR10014 | Terrestrial, White clover nodule (A0A198YS13)

100 Actinomycetales bacterium JB111 | (A0A1R4EXL8) 71 100 Mesorhizobium sp. Root157 | Terrestrial, Arabidopsis thaliana symbiont (A0A0Q8BBX8) 60 Microbacterium sp. B19 | (UPI00034563B1)

71 Deltaproteobacteria bacterium | Terrestrial, Soil (A0A1F9KAK3) 97 Deltaproteobacteria bacterium | Terrestrial, Soil (A0A1F9KBX7) 94 Chloroflexi bacterium 13_1_20CM_2_70_9 | Terrestrial, Soil (A0A1Q8A6N3) MAG SAR202-VII (Ga0133547_10000004_0523) Herbiconiux sp. YR403 | Terrestrial, Populus root (UPI0006899857) 100 Agromyces italicus | Terrestrail, Tomb wall (UPI0003B77B21) 98 69 Microterricola viridarii | Glacial stream (A0A0Y0NZ61) 100 Marine actinobacterium PHSC20C1 | Marine (A4AEC9)

100 Rhizobiales bacterium 68-8 | Bioreactor (A0A1Q3KUR9) 100 Agrobacterium radiobacter strain K84/ATCC BAA-868 | Terrestrial, Plant pathogen (B9JQ82)

100 Streptomyces acidiscabies a10 | Terrestrial, Soil (A0A1Q2Z314) Cryptosporangium aurantiacum | (A0A1M7PEB8)

100 TOBG EAC-42 100 TOBG SP-2981 100 Deltaproteobacteria bacterium 13_1_40CM_4_54_4 | Terrestrial, Soil (A0A1Q7J241) MAG SAR202-VII (Ga0133547_10000002_0233) 70 84 Bradyrhizobium canariense | Terrestrial, Soil (A0A1H1S9Z6) 100 Bradyrhizobium sp. WSM1253 | Terrestrial, Ornithopud compressus symbiont (I2QDY1)

68 Labrenzia alba | Marine (A0A0M7A984)

100 Polymorphum gilvum SL003B-26A1 | Terrestrial, Crud oil-polluted saline soil (F2IZI2) Pelagibacterium halotolerans strain DSM 22347/JCM 15775/B2 | Marine (G4RAM4) Solirubrobacterales bacterium 70-9 | Bioreactor (A0A1M3BNN9) Streptomyces ciscaucasus DSM 40277 | Terrestrial, Soil (UPI000995FF84) 100 100 Streptomyces sp. NRRL S-1868 | Terrestrial, Soil (UPI0006896678) Actinoplanes sp. N902-109 | (R4LIE0) Sphingomonas paucimobilis | Experimentally validated (P22636)

0.3 bioRxiv preprint doi: https://doi.org/10.1101/325027; this version posted May 17, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Fig 8. Phylogenetic analysis of predicted beta subunits of the protocatechuate 4,5-dioxygenase (pfam 02900) protein sequences identified in SAR202- VII-2 (Ga0133547_100086102). Method and description as described in Supp Figure 2.