bioRxiv preprint doi: https://doi.org/10.1101/2021.08.26.457739; this version posted August 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Subsurface Hydrocarbon Degradation Strategies in Low- and High-Sulfate Coal Seam Communities Identified with Activity-Based Metagenomics

Authors: Hannah Schweitzer1,2,†§**, Heidi Smith1,2,§, Elliott P. Barnhart1,3, Luke McKay1,4, Robin Gerlach1,5,6, Alfred B. Cunningham1,5,7, Rex R. Malmstrom8, Danielle Goudeau8, and Matthew W. Fields1,2,5**

Affiliations:
1Center for Biofilm Engineering, Montana State University, Bozeman, MT 59717, USA
2Department of Microbiology & Cell Biology, Montana State University, Bozeman, MT 59717, USA
3US Geological Survey, Wyoming-Montana Water Science Center, Helena, MT 59601, USA
4Department of Land Resources and Environmental Sciences, Montana State University, Bozeman, MT 59717, USA
5Energy Research Institute, Montana State University, Bozeman, MT 59717, USA
6Department of Biological and Chemical Engineering, Montana State University, Bozeman, MT 59717, USA
7Department of Civil Engineering, Montana State University, Bozeman, MT 59717, USA
8DOE Joint Genome Institute, Berkeley, CA 94720, USA

§Indicates both authors contributed equally to this work
†Now at UiT - The Arctic University of Norway, 9019 Tromsø, Norway

**Corresponding authors
H.D. Schweitzer, Post Doctoral Researcher
UiT - The Arctic University of Norway
The Norwegian College of Fishery Science
Muninbakken 21
9019 Tromsø, Norway
[email protected]

M.W. Fields, Professor
Montana State University
Center for Biofilm Engineering
366 EPS Building
Bozeman, MT 59717, USA
[email protected]


One Sentence Summary:
Identifying hydrocarbon degradation strategies across redox gradients via metagenomic analysis of environmental and translationally active (BONCAT-FACS) samples from subsurface coal beds.

Abstract

Environmentally relevant metagenomes and BONCAT-FACS-derived translationally active metagenomes from Powder River Basin coal seams were investigated. The aim was to elucidate potential genes and functional groups involved in hydrocarbon degradation to methane in coal seams with high and low sulfate levels. An advanced subsurface environmental sampler allowed coal-associated microbial communities to establish under in situ conditions for metagenomic analyses of environmental and translationally active populations. Metagenomic sequencing demonstrated that genes for biosurfactant production, aerobic dioxygenases, and anaerobic phenol degradation pathways were present in active populations across the sampled redox gradient. In particular, results suggested the importance of anaerobic degradation pathways under high-sulfate conditions, with an emphasis on fumarate addition. Under low-sulfate conditions, a mixture of aerobic and anaerobic pathways was observed, with a predominance of aerobic dioxygenases. The putative low-molecular-weight biosurfactant lichenysin appeared to play a more important role than rhamnolipids. The novel methods used in this study (subsurface environmental samplers combined with metagenomic sequencing of both translationally active metagenomes and environmental genomes) offer a deeper and environmentally relevant perspective on community genetic potential in coal seams poised at different redox potentials. They thereby broaden the understanding of degradation strategies for subsurface carbon.


Introduction

The terrestrial subsurface contains the majority of Earth's organic carbon (~90%)1. Much of this carbon can be converted to methane under anaerobic conditions through biogasification (i.e., biological decomposition of organic matter into methane and secondary gases). Biogasification can take place in coal, black shale, and petroleum reservoirs and is estimated to account for over 20% of the world's natural gas resources2. Factors influencing biogasification include coal rank, redox conditions (e.g., presence or absence of oxygen and oxyanions), and the genetic potential and activity of the microbial community. Coal is a heterogeneous and highly complex hydrocarbon consisting of polycyclic aromatic hydrocarbons, alkylated benzenes, and long- and short-chain n-alkanes3. Despite its recalcitrant nature, coal degradation by microbial consortia has been shown in a variety of coal formations4. It is generally accepted that shallower coal beds that contain sulfate do not produce methane, because sulfate-reducing bacteria (SRB) outcompete methanogens for substrates (e.g., acetate, CO2, and hydrogen)5,6. In methanogenic coal beds6,7, hydrogenotrophic and acetoclastic methanogens are commonly identified. These include different types of acetoclastic methanogens (e.g., Methanothrix, Methanosarcina) that have distinct pathways for acetate utilization. It remains unknown which type of methanogenesis predominates in situ in different coal seams under different physicochemical conditions8,9.

New coal degradation pathways are still being discovered, and the involvement of different pathways in the turnover of refractory carbon under various redox conditions remains largely unresolved10–12. The majority of coal degradation research has focused on fumarate addition. Less is known about alternate coal degradation strategies such as phenol degradation by carboxylation, and hydroxylation of alkanes, benzene, and ethylbenzene13,14. The fumarate addition pathway activates n-alkanes by adding their terminal or subterminal carbon across the double bond of fumarate13,15–19. Several fumarate addition genes are often used as catabolic biomarkers for anaerobic hydrocarbon degradation15–17. Examples are ass (alkylsuccinate synthase, for alkanes), bss (benzylsuccinate synthase, for alkylbenzenes), and nms (naphthylmethylsuccinate synthase). These genes have been characterized from many subsurface hydrocarbon-containing environments13,16,20–25, but their importance under different redox conditions is still unclear. Carboxylation and hydroxylation are less well-documented mechanisms of anaerobic degradation, although recent work has begun to suggest their importance in anaerobic hydrocarbon degradation14,18,26,27. How these strategies vary across redox transition zones in situ, and which organisms are responsible for degradation, warrant further investigation.
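The fumarate addition step can be summarized as a net addition reaction. The following is an illustrative sketch only: the general formula is our own bookkeeping rather than a result from this study, and it is written for neutral acids, ignoring ionization and stereochemistry. An n-alkane with n carbons adds across the double bond of fumaric acid to give an alkylsuccinic acid, with all atoms conserved. The second line works the example for hexadecane (n = 16).

```latex
\begin{aligned}
\mathrm{C}_{n}\mathrm{H}_{2n+2} + \mathrm{C}_{4}\mathrm{H}_{4}\mathrm{O}_{4} &\longrightarrow \mathrm{C}_{n+4}\mathrm{H}_{2n+6}\mathrm{O}_{4} \\
\mathrm{C}_{16}\mathrm{H}_{34} + \mathrm{C}_{4}\mathrm{H}_{4}\mathrm{O}_{4} &\longrightarrow \mathrm{C}_{20}\mathrm{H}_{38}\mathrm{O}_{4} \quad (n = 16,\ \text{hexadecane})
\end{aligned}
```

Subterminal addition to hexadecane gives (1-methylpentadecyl)succinic acid, and terminal addition would give hexadecylsuccinic acid. Both have the same molecular formula, C20H38O4.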

Biosurfactants have not been identified in situ in coal seams, and biosurfactant genes are not considered necessary for hydrocarbon degradation. Even so, previous laboratory-based research demonstrates a potentially important role for these compounds in decreasing the hydrophobicity of the solid coal surface, allowing for cellular and/or protein interactions at the coal surface28. Biosurfactant-producing microorganisms likely play direct and indirect roles in hydrocarbon degradation28–31. The accumulation of the esterase hydrolase enzyme has been correlated with biosurfactant production, and the enzyme is often used as a biomarker for biosurfactant production32. Biosurfactants are routinely observed in environments that consist of complex hydrocarbons. They are therefore hypothesized to be an interdependent, complex, and coordinated means of increasing coal bioavailability. However, studies demonstrating active biosurfactant-producing microorganisms in coal environments are lacking.


Aerobic hydrocarbon degradation is the most documented form of hydrocarbon degradation. It proceeds via aerobic activation of alkanes by dioxygenase enzymes that use oxygen both as an electron acceptor and as a reactant in hydroxylation33. Aerobic hydrocarbon degradation in coal environments is often disputed owing to uncertainty about the presence of oxygen, and more research has been performed on anaerobic hydrocarbon degradation strategies over the last few decades. Recent metagenomic analyses, however, have indicated the presence of aerobic hydrocarbon degradation genes in coal beds across Canada, although it is unknown whether these genes were present in active organisms in situ23.

The rate-limiting step in coal biogasification has been attributed to the initial biological breakdown of the refractory hydrocarbon matrix34. As discussed above, several degradation strategies have been observed, including the role of biosurfactants and aerobic hydrocarbon degradation. Nevertheless, it remains challenging to link genetic potential to activity using traditional sequencing methodologies35. These methodologies provide an environmental snapshot but do not distinguish between dead, dormant, and active genomes in the environment. We procured environmentally relevant samples from three separate coal seams in the Powder River Basin (PRB) near Birney, Montana, using a Subsurface Environmental Sampler (SES; Patent # US10704993B2)36. The SES is a novel sampling tool that enabled harvesting of in situ coal-associated biofilms representative of local environmental conditions. The SES-retrieved samples were analyzed via total and activity-based metagenomes, using BONCAT (bioorthogonal noncanonical amino acid tagging). The goal was to examine metabolic potential across hydrologically isolated coal seams that spanned vertical redox gradients. BONCAT methods have previously been used in combination with fluorescence-activated cell sorting (BONCAT-FACS) and SSU rRNA gene sequencing37,38, but metagenomic sequencing of active cells from BONCAT-FACS had not yet been performed. From these samples, we applied gene- and genome-centric approaches to assess the total hydrocarbon degradation/surfactant potential of each coal seam community and to examine changes in microbial diversity among the communities. Investigation of the active microbial groups and functional potential in coalbed methane (CBM) habitats provided new insight into hydrocarbon degradation strategies. It also shed light on the associated capacities for carbon cycling that vary with depth and sulfate conditions through shallow coal seams.

Results and Discussion

Microbial communities and diversity under high- and low-sulfate conditions

An SES36 was used to collect in situ coal-associated microbial communities for shotgun metagenomic sequencing in three geochemically diverse coal seams. Samples from one high-sulfate coal seam (Nance, N-H) and two low-sulfate coal seams (Flowers Goodale, FG-L; Terret, T-L) were co-assembled. The co-assembly was used to determine differences in hydrocarbon degradation genes and in the overall communities based on metagenome-assembled genomes (MAGs) (Fig. 1). After refinement, 86 MAGs from the co-assembled environmental shotgun metagenomes had >80% estimated completeness (Fig. 2, Table S1). The environmental metagenomes contained 769,799 genes on 92,729 contigs.
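The MAG counts above depend on quality cutoffs. Fig. 2 shows MAGs with >80% estimated completeness, all of them below 10% redundancy. Below is a minimal sketch of such a filter. It assumes each MAG is summarized by hypothetical `completeness` and `redundancy` percentages, as reported by binning tools such as Anvi'o. The record type, function name, and bin names are illustrative and are not the study's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mag:
    """Hypothetical summary record for one metagenome-assembled genome."""
    bin_id: str
    completeness: float  # estimated completeness, percent
    redundancy: float    # estimated redundancy, percent


def high_quality(mags, min_completeness=80.0, max_redundancy=10.0):
    """Keep MAGs strictly above the completeness cutoff and strictly below the redundancy cutoff."""
    return [m for m in mags
            if m.completeness > min_completeness and m.redundancy < max_redundancy]


if __name__ == "__main__":
    # Illustrative values only; not data from this study.
    mags = [
        Mag("Bin_A", 92.1, 3.4),
        Mag("Bin_B", 78.5, 1.2),   # fails the completeness cutoff
        Mag("Bin_C", 85.0, 12.7),  # fails the redundancy cutoff
    ]
    print([m.bin_id for m in high_quality(mags)])
```

Strict inequalities are used here to mirror the ">80%" and "below 10%" wording. A real pipeline might treat the boundaries differently.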

MAGs most closely related to organisms from the Desulfobacteraceae family were observed across all sampled coal seams regardless of sulfate level, although coverage was highest in the high-sulfate coal seam (Fig. 2). The Desulfobacteraceae family includes many well-studied sulfate-reducing bacteria (SRB) and is known to be metabolically versatile. Its members exhibit multiple electron transfer complexes, suggesting alternative metabolic mechanisms in low-sulfate environments39. Previous research has demonstrated increased abundance of SRB in high-sulfate coal seams compared with low-sulfate coal seams6. However, dsrA genes, a commonly used marker for SRB, have also been detected in acetate-amended, methane-producing coal seams. This suggests that organisms carrying dsrA may play an important role in syntrophic biogasification40. Total dsrA gene coverage in the high-sulfate coal seam (N-H) was 2.2- and 2.4-fold greater than in the low-sulfate coal seams T-L and FG-L, respectively (Table S2). In the high-sulfate coal seam (N-H), the dsrA gene was identified in MAGs taxonomically similar to Desulfobacteraceae (Bin 372) and Desulfatibacillum alkenivorans (Bin 369). In the low-sulfate FG-L and T-L coal seams, dsrA was found predominantly in MAGs with <80% completion. It also occurred in MAGs related to Desulfococcus (Bin 21), Desulfuromonas (Bin 275), and Myxococcaceae (Bin 1 and Bin 37) (Fig. 3, Supplemental Data 1). Previous research on the Desulfococcus clade and on Desulfatibacillum alkenivorans reported the ability to anaerobically degrade alkanes41,42. Desulfatibacillum alkenivorans has also been described as capable of using hexadecane as the sole carbon source in the presence of sulfate42. Anaeromyxobacter dehalogenans (family Myxococcaceae) is a facultative anaerobe that utilizes acetate with nitrate, oxygen, or fumarate as electron acceptors43.

Figure 1. Schematic of the experimental workflow. It begins with in situ sampling of individual coal seams using a Subsurface Environmental Sampler (SES) to retrieve CBM samples of the attached fraction under environmentally relevant conditions. Following retrieval, samples were prepared for either BONCAT incubation or shotgun metagenomics. For shotgun metagenomics, DNA was extracted and sequenced. BONCAT incubations lasted 24 hours. They were followed by cell removal from coal, click chemistry, staining, cell sorting of BONCAT+ and BONCAT-Total (SYTO-stained) cell fractions, DNA extraction, and amplified metagenomic sequencing.

Figure 2. Comparison of coverages from the high-sulfate coal seam (N-H, blue) and the two low-sulfate coal seams (FG-L and T-L, orange). Each bar is a metagenome-assembled genome (MAG) with greater than 80% completion (green line graph). Coverage was calculated as the average depth of coverage across all contigs, using Anvi'o-compiled contigs. The coverage bar graph axis is scaled to a maximum sequencing depth of 550. All MAGs were below 10% redundancy (purple line graph).
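The coverage comparisons above rest on two simple calculations. The first is a MAG's coverage, taken as the average depth across its contigs (Fig. 2). The second is a fold difference between coal seams, as in the dsrA comparison. A minimal sketch follows, assuming an unweighted mean over per-contig depths; a length-weighted mean is a common alternative and may not match Anvi'o's exact output. The function names and all numbers are hypothetical and are not values from Table S2.

```python
from statistics import mean


def mag_mean_coverage(contig_depths):
    """Unweighted mean of per-contig average depths for one MAG.

    Assumption: every contig counts equally, regardless of its length.
    """
    if not contig_depths:
        raise ValueError("a MAG needs at least one contig")
    return mean(contig_depths)


def fold_difference(reference, other):
    """Return how many times larger the `reference` coverage is than `other`."""
    if other == 0:
        raise ValueError("fold difference is undefined against zero coverage")
    return reference / other


if __name__ == "__main__":
    # Hypothetical summed dsrA coverages per coal seam; NOT the study's values.
    dsrA = {"N-H": 30.0, "T-L": 15.0, "FG-L": 12.0}
    for seam in ("T-L", "FG-L"):
        print(f"N-H vs {seam}: {fold_difference(dsrA['N-H'], dsrA[seam]):.1f}-fold")
```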

The most predominant MAGs in the low-sulfate FG-L and T-L wells were most closely related to three groups. These were Anaeromyxobacter (Myxococcaceae), Desulfococcus (Desulfobacteraceae), and methylotrophic organisms from the Methylocystaceae family, namely Methylocystis and Methylosinus (Fig. 2 and Table S1). Methane-producing organisms were also detected in the low-sulfate coal seams (Fig. 2 and Table S1). They included the acetoclastic methanogen Methanothrix (Methanosaetaceae), the methylotrophic methanogen Methanococcoides (Methanosarcinaceae), and the hydrogenotrophic methanogen Methanoregula formicica (Methanoregulaceae). Hydrogenotrophic and acetoclastic methanogens are commonly identified in methanogenic coal beds6,7, and different acetoclastic methanogens (e.g., Methanothrix, Methanosarcina) use different pathways for acetate utilization. Active acetate utilization has been demonstrated in PRB coal beds. However, the dominant microbial communities and the metabolic pathways that generate or utilize acetate in these environments are generally unknown35. As stated above, Anaeromyxobacter dehalogenans typically thrives in environments with acetate, further supporting the possibility of acetate in the low-sulfate communities43. Previous laboratory investigations have demonstrated that acetoclastic methanogens produce a large fraction of biogenic methane in coal enrichments. It is therefore hypothesized that high-methane-producing coal seams are abundant in acetoclastic methanogens44,45. Compared with other methanogens, Methanothrix spp. have been reported to be better scavengers under low-acetate conditions9. However, recent isotopic analysis of PRB samples revealed that hydrogenotrophic methanogenesis can also be a significant methane-producing pathway44.

[Figure 3 image. Heatmap panels: Hydroxylation, Carboxylation, and Fumarate Addition (anaerobic); Dioxygenases; Biosurfactant; Sulfate Reduction/Methanogenesis. Columns: N-H, T-L, FG-L. Rows: abundant MAGs labeled by family and bin number. Color scale: 0% to 100%.]

Figure 3. Heatmaps for abundant MAGs (>10% of the community) from shotgun metagenomic samples. They show the relative abundances of individual genes involved in hydrocarbon degradation for the high-sulfate coal seam (N-H) and the two low-sulfate coal seams (T-L and FG-L). All MAGs below 80% completion were grouped together as "<80% Completion". The six heatmaps display the genes of interest, grouped as follows:
- anaerobic degradation via hydroxylation, carboxylation, and fumarate addition (green heatmaps);
- aerobic degradation via dioxygenases (blue heatmap);
- biosurfactant production (purple heatmap);
- sulfate reduction/methanogenesis (red heatmap).
The relative abundance of each gene in the corresponding MAGs is represented by a color gradient (legend below the heatmap). The darkest color indicates the highest relative abundance in each sample, and white indicates absence of the gene.

To further investigate the dominant methanogenic populations in the low-sulfate coal seams, the BONCAT-positive (BONCAT+) metagenome from the FG-L coal seam was analyzed, yielding 24 MAGs (Fig. 4, Table S3). The BONCAT+ library had 31,748 genes on 888 contigs. The total SYTO-stained population (BONCAT-Total) from the same experiment yielded 71,455 genes on 1,748 contigs (Fig. S1, Table S4). Comparison of the BONCAT-Total and BONCAT+ samples demonstrated that 29.3–36.3% of the recovered genes belonged to populations that were BONCAT+ (Fig. S2). Six of the 24 BONCAT+ MAGs were taxonomically similar to known methanogens (Fig. 4, Table S3): Methanothrix, an acetoclastic methanogen, and Methanobacterium, a hydrogenotrophic methanogen. These results suggest that multiple methanogenic pathways (hydrogenotrophic and acetoclastic) are likely active in situ6,44, although the Methanothrix MAG was of substantially higher quality (Fig. 4).

Notably, three BONCAT+ MAGs had >70% completeness and were related to Bacteroidetes, Chlorobi, and Methanothrix (Fig. 4, Table S3). The Chlorobi and Methanothrix MAGs overlapped between the BONCAT+ and BONCAT-Total metagenomes, as determined by average nucleotide identity (Fig. 4, Fig. S1, Table S3, Table S4, and Supplemental Data 2). Bacteroidetes and Chlorobi have previously been observed to dominate acetate-amended coal seams. Furthermore, organisms belonging to the Bacteroidetes phylum have been identified as a key lineage in hydrocarbon degradation40,46. These high-quality BONCAT+ MAGs are examined in a separate study (McKay and Smith et al., in review) for potential involvement in coal degradation, acetate production, and subsequent acetoclastic methanogenesis.
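The overlap between BONCAT+ and BONCAT-Total MAGs was determined by average nucleotide identity (ANI), with Fig. 4 marking pairs above 99%. A minimal sketch of that matching step follows. It assumes pairwise ANI values were already computed by some tool (for example FastANI; this section does not name one) and arrive as `(boncat_plus_mag, boncat_total_mag, ani_percent)` tuples. The function name, the tuple layout, and the MAG names are all hypothetical.

```python
def match_by_ani(ani_pairs, threshold=99.0):
    """Map each BONCAT+ MAG to its best BONCAT-Total match with ANI strictly above `threshold`.

    ani_pairs: iterable of (boncat_plus_mag, boncat_total_mag, ani_percent) tuples.
    Returns {boncat_plus_mag: (boncat_total_mag, ani_percent)}.
    """
    best = {}
    for plus_mag, total_mag, ani in ani_pairs:
        if ani <= threshold:
            continue  # not similar enough to count as the same population
        if plus_mag not in best or ani > best[plus_mag][1]:
            best[plus_mag] = (total_mag, ani)  # keep the highest-ANI partner
    return best


if __name__ == "__main__":
    # Hypothetical ANI values; not data from this study.
    pairs = [
        ("plus_1", "total_7", 99.6),
        ("plus_1", "total_9", 82.0),
        ("plus_2", "total_3", 99.2),
        ("plus_3", "total_5", 97.8),
    ]
    print(match_by_ani(pairs))
```

Keeping only the best partner per BONCAT+ MAG is a design assumption. It avoids double-counting when one active population resembles several bins in the total metagenome.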

Presence of Biosurfactant Genes

Presumptive biosurfactant genes were detected with the highest gene coverage in the FG-L coal seam (Fig. 5, Table S2). These encoded an esterase hydrolase enzyme (est), lichenysin synthetase (lch), and surfactin synthetase (srf). The est gene had the highest coverage of any investigated biosurfactant gene (21.2–32.3 RPKM). It was found primarily in MAGs from the environmental metagenomes with <80% estimated completeness (Fig. 3, Fig. 5, Table S2, and Supplemental Data 1). The est gene was also observed in three of the 24 BONCAT+ MAGs, two most closely related to Geobacter and one to Proteobacteria (Fig. 4, Table S3). Previous work has suggested that esterases are important in biosurfactant production. This is based on the correlation between accumulation of the esterase hydrolase enzyme and production of lichenysin and surfactin32. Lichenysin, surfactin, and esterase enzyme genes had the highest coverage in the low-sulfate coal seams and were identified in the BONCAT+ metagenome from FG-L (Fig. 4, Fig. 5, Table S2, and Table S3).

Figure 4. Evaluation of BONCAT-positive metagenome-assembled genomes (MAGs) for the presence of four gene groups: anaerobic hydrocarbon degradation (green), aerobic hydrocarbon degradation (blue), biosurfactant (purple), and sulfate reduction/methanogenesis (red). Percent completion (green line graph, out of 100%) and redundancy (purple line graph, out of 10%) are shown for each MAG. The three MAGs highlighted with the symbol (■) have >99% average nucleotide identity to the BONCAT-Total MAGs marked with the same symbol.
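The coverages above are reported in RPKM (reads per kilobase per million mapped reads). This unit normalizes for both gene length and sequencing depth, so genes and samples can be compared. A minimal sketch of the standard definition follows, assuming raw read counts per gene are available. The Fig. 5 caption describes deriving gene coverage from contig depth before normalization, and the exact inputs used in this study are not specified here. The function name is illustrative.

```python
def rpkm(mapped_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads.

    Equivalent to: mapped_reads / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return mapped_reads * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical counts: 500 reads on a 1,000 bp gene in a 20-million-read library.
    print(rpkm(500, 1_000, 20_000_000))
```

Multiplying by 1e9 in a single step combines the per-kilobase (1e3) and per-million (1e6) scalings. This avoids two separate divisions.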

In the low-sulfate environmental metagenomes (FG-L and T-L), the lch and srf biosurfactant genes occurred most often in MAGs with <80% estimated completeness (Fig. 3). They were also found in one highly complete (>80%) MAG taxonomically similar to Legionella (Supplemental Data 1). Legionella spp. have previously been shown to produce lipid-containing biosurfactants whose environmental response resembled that of surfactin produced by Bacillus subtilis47. Lichenysin and surfactin are produced by operons of the surfactin family, each containing four open reading frames (the lch and srf genes, respectively). Much like surfactin, lichenysin is a low-molecular-weight anionic cyclic lipopeptide biosurfactant, previously shown to be produced by Bacillus licheniformis isolates48. Previous research has also demonstrated that lichenysins can enhance oil recovery and the degradation of naphthalene and crude oil48,49. Biosurfactants have been suggested to play a potentially important role in decreasing the hydrophobicity of the solid coal surface, allowing for cellular and/or protein interactions at the coal surface28. Both lichenysin and surfactin are nonribosomal peptides. They are therefore less energy intensive to produce, which could make them well suited to oligotrophic conditions such as those in coal seams48,50,51. These results suggest that biosurfactants are likely important in coal biogasification in low-sulfate environments and could facilitate the initial steps of hydrocarbon breakdown.


267 The rhl gene (rhamnolipids) had the lowest coverage of any of the investigated

268 biosurfactant-related genes. The greatest rhl coverage was observed in the high-sulfate

269 coal seam with little to no coverage in the low-sulfate coal seams (Fig. 5, Table S2). In

270 addition, rhl was not detected in the BONCAT+ sample, although it was found in one

271 MAG from the environmental shotgun metagenomes related to Pseudomonadaceae

272 (Bin 302) (Fig. 3, Fig. 4, Table S2). Rhamnolipid biosurfactants produced by

273 Pseudomonas aeruginosa and Pseudomonas stutzeri have previously shown the ability

274 to enhance anaerobic degradation of the PAHs, phenanthrene and pyrene52–54.

Pseudomonas stutzeri, a model biosurfactant-producing organism, was detected in formation water originating from volatile bituminous coal in the Appalachian Basin coal bed and is suggested to play an important role in enhanced oil recovery53–55. The greater prevalence of lch and srf suggests that rhamnolipids may not be the dominant biosurfactants in all coal environments, as previously hypothesized from benchtop coal enrichment experiments52–55.

[Figure 5: gene coverage (RPKM) grouped into Anaerobic Hydrocarbon Degradation (Hydroxylation, Carboxylation, Fumarate Addition), Aerobic Dioxygenases, and Biosurfactants; main axis 0–160 RPKM, separate axis 0–1,400 RPKM for DsrA; asterisks mark genes present in the BONCAT+ sample.]

Figure 5. The coverage of each gene was calculated as the average depth of coverage across the contig on which the gene was identified; the coverages of all non-duplicate contigs containing the gene of interest at above 40% identity were summed and are reported here. Protein gene sequence coverage was normalized as reads per kilobase of transcript per million mapped reads (RPKM) for each gene of interest in each shotgun metagenomic sample. Each of the databases tested is represented, and the genes of interest shown either had the highest coverage in one of the samples and/or were present in the BONCAT+ sample. Genes are grouped into anaerobic hydrocarbon degradation strategies (green), aerobic dioxygenases (blue), biosurfactants (purple), and sulfate reduction/methanogenesis genes (red). The three environmental samples are shown as a color gradient: the high-sulfate coal seam N-H in the darkest shade (top), and the two low-sulfate coal seams T-L (middle, medium shade) and FG-L (bottom, lightest shade). Stars indicate the presence of the individual gene in the BONCAT+ sample. DsrA is plotted on its own axis because it had a much greater overall coverage.
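The per-gene coverage summation described in the Figure 5 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes hits are available as simple records (gene, contig, percent identity, e-value, average contig depth), and the `Hit` layout and the function name `summed_gene_coverage` are hypothetical. It applies the thresholds stated in the Methods (e-value of at most 1E-10, identity above 40%), counts each contig once per gene (keeping its best-e-value hit, our reading of "non-duplicate"), and sums the average depth of the retained contigs.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    gene: str            # gene of interest, e.g. "ppcC" (hypothetical record layout)
    contig: str          # contig on which the protein was predicted
    identity: float      # percent identity to the reference protein
    evalue: float        # BLAST e-value
    contig_depth: float  # average depth of coverage across the contig

def summed_gene_coverage(hits, min_identity=40.0, max_evalue=1e-10):
    """Sum average contig depth per gene over non-duplicate contigs passing the thresholds."""
    best = {}  # (gene, contig) -> best hit, so each contig is counted once per gene
    for h in hits:
        if h.identity <= min_identity or h.evalue > max_evalue:
            continue
        key = (h.gene, h.contig)
        if key not in best or h.evalue < best[key].evalue:
            best[key] = h
    totals = defaultdict(float)
    for (gene, _contig), h in best.items():
        totals[gene] += h.contig_depth
    return dict(totals)

# Toy input (illustrative values, not data from this study)
hits = [
    Hit("ppcC", "c1", 72.0, 1e-40, 12.5),
    Hit("ppcC", "c1", 55.0, 1e-20, 12.5),  # same contig again: counted once
    Hit("ppcC", "c2", 48.0, 1e-15, 3.0),
    Hit("bssD", "c3", 38.0, 1e-30, 9.0),   # not above 40% identity: dropped
]
print(summed_gene_coverage(hits))  # {'ppcC': 15.5}
```

The summed values would then be normalized to RPKM as described in the Methods.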

Hydrocarbon Degradation in High-Sulfate Environments

The high-sulfate coal seam (N-H) had a higher average coverage of anaerobic hydrocarbon degradation genes than the low-sulfate coal seams (Fig. 5, Table S2). The phenylphosphate carboxylase gene (ppcC) had the highest coverage across all coal seam metagenomes regardless of sulfate concentration (Fig. 5, Table S2). Phenylphosphate carboxylase and phenylphosphate synthase can contribute to the anaerobic degradation of phenol; phenylphosphate carboxylase is responsible for the conversion of phenylphosphate and CO2 into 4-hydroxybenzoate and has previously been identified in Geobacter metallireducens56. Consistent with this, one of the most prominent populations in the high-sulfate coal seam (N-H), represented by MAGs with > 80% estimated completeness (Bin 360, Bin 373, Bin 34, and Bin 184), was taxonomically similar to Geobacter and contained the ppcC gene (Fig. 2, Table S1 and Supplemental Data 1).
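For reference, the two steps of anaerobic phenol activation mentioned above can be written schematically as follows. This is general pathway chemistry as described in the literature on anaerobic phenol metabolism, not a result of this study; charges, protons, and Mg2+ are omitted.

$$\text{phenol} + \text{ATP} + \text{H}_2\text{O} \xrightarrow{\text{phenylphosphate synthase}} \text{phenylphosphate} + \text{AMP} + \text{P}_\text{i}$$

$$\text{phenylphosphate} + \text{CO}_2 \xrightarrow{\text{phenylphosphate carboxylase (Ppc)}} \text{4-hydroxybenzoate} + \text{P}_\text{i}$$

The second reaction is the one attributed to ppcC in the text, which is why CO2 availability appears alongside phenol as a requirement.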

The gene with the next highest coverage in the high-sulfate coal seam was alkylsuccinate synthase subunit D (assD), followed by hydroxybenzylsuccinate synthase (hbsD) and benzylsuccinate synthase subunit D (bssD) (Fig. 5, Table S2). These genes (assD, hbsD, and bssD) are all involved in hydrocarbon degradation via fumarate addition, which is currently the only well-described oxygen-independent mechanism of alkane activation57. Gene coverages for fumarate addition were higher in the high-sulfate coal seam (N-H) than in the low-sulfate coal seams (Fig. 5, Table S2). In the high-sulfate coal seam, fumarate addition genes were detected in MAGs that were most taxonomically similar to populations from the Desulfobacteraceae family (Bin 372) as well as others, such as Desulfatibacillum alkenivorans (Bin 56) and the Desulfococcus clade (Bin 50, Bin 368) (Fig. 3). Previous research has reported these alkane-degrading organisms in the presence of sulfate41,42. Our results indicate that fumarate addition could play a more important role in hydrocarbon degradation in high-sulfate than in low-sulfate coal seam environments.
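As a general illustration of fumarate addition (textbook chemistry, not a result of this study), these glycyl-radical enzyme systems add the hydrocarbon substrate across the double bond of fumarate: toluene reacts at its methyl group, and n-alkanes typically at a subterminal (C-2) carbon. The genes named above (assD, hbsD, bssD) encode components of such enzyme systems; the reactions below are written for the overall enzymes.

$$\text{toluene} + \text{fumarate} \xrightarrow{\text{benzylsuccinate synthase (Bss)}} (R)\text{-benzylsuccinate}$$

$$n\text{-alkane} + \text{fumarate} \xrightarrow{\text{alkylsuccinate synthase (Ass)}} (1\text{-methylalkyl})\text{succinate}$$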

Low-Sulfate Hydrocarbon Degradation: Aerobic and Anaerobic Strategies

Previously reported methane concentrations for the low-sulfate FG-L and T-L coal seams (33–67 mg/L7) are consistent with our detection of higher mcrA gene coverages than in the high-sulfate coal seam (< 0.15 mg/L methane) (Fig. 3, Fig. 5, Table S2). The rate-limiting step in coal biogasification has been attributed to the initial biological breakdown of the refractory hydrocarbon matrix34. Analysis of the hydrocarbon degradation genes showed a mix of aerobic and anaerobic strategies, yet aerobic hydrocarbon degradation genes had higher coverage in the low-sulfate than in the high-sulfate coal seam. Homoprotocatechuate dioxygenase (LigB) was present in the BONCAT+ metagenome (Fig. 4 and Table S3), and in the environmental metagenomes LigB had the highest coverage of all investigated aerobic aromatic hydrocarbon degradation genes58 regardless of sulfate level. Moreover, LigB had the highest coverage of any gene detected in the low-sulfate coal seams (Fig. 5, Table S2). Homoprotocatechuate dioxygenase is involved in the initial steps of the aerobic degradation of 4-hydroxyphenylacetate via homoprotocatechuate, a pathway that eventually leads to fumarate. The monocyclic (Mono_DIOX) and bicyclic extradiol dioxygenases and the gentisate dioxygenase (Gen) from the cupin superfamily were also present in the BONCAT+ metagenome (Fig. 4, Table S3), although all of them had lower coverages than the LigB dioxygenase across all shotgun metagenomes (Fig. 5, Table S2). The LigB dioxygenase was found in the putative methanogenic BONCAT+ MAG most closely related to Methanothrix (discussed above), indicating possible aerobic hydrocarbon degradation (LigB) in addition to anaerobic hydrocarbon degradation (via ppsAB and bssD). Possible oxygen-dependent or oxygen-tolerant properties of this putative methanogenic population are discussed in detail in a separate study (McKay and Smith et al., in review). Further, the bicyclic dioxygenases and the gentisate dioxygenases (Gen) were found in a BONCAT+ MAG taxonomically affiliated with the Proteobacteria, while the monocyclic dioxygenases were detected in a different MAG within the Betaproteobacteria (Fig. 4, Table S3), indicating further aerobic degradation capabilities.

The anaerobic hydrocarbon degradation gene with the greatest coverage in the low-sulfate environments was ppcC, which was identified in the BONCAT+ MAG most closely related to Geobacter (Fig. 4, Fig. 5, Table S2 and Table S3). The high coverage of ppcC in both high- and low-sulfate environments indicates that anaerobic phenol degradation is likely crucial in the degradation of PRB coal. Phenylphosphate carboxylase genes are often overlooked in favor of the better documented fumarate addition genes13,14,18,26,27, but the results presented here suggest that phenylphosphate carboxylation should be considered a possible mechanism for anaerobic hydrocarbon degradation. Phenol degradation via activation of phenol to phenylphosphate followed by carboxylation has previously been shown to occur only in the presence of phenol and CO2 and is believed to be a strictly anaerobic process56. However, previous research suggests that phenylphosphate carboxylase can exist in an inactive, oxidized form until it is reduced and activated under anaerobic conditions59. Therefore, phenylphosphate carboxylase may play an important role under fluctuating oxygen conditions.

While all the fumarate addition genes of interest were identified in the environmental shotgun metagenomes (Fig. 5) and belonged to predominant populations related to Geobacter (Bin 34), Syntrophus aciditrophicus (Bin 84), Anaeromyxobacter (Bin 37), Luteitalea pratensis (Bin 10), and Desulfococcus (Bin 368 and Bin 50) (Fig. 3 and Supplemental Data 1), the coverage of these genes was much lower in the low-sulfate than in the high-sulfate coal seam. Only two of the 24 BONCAT+ MAGs (related to Methanothrix and Betaproteobacteria) contained fumarate addition genes, and of the seven fumarate addition genes investigated, only three (assD, bssD, and tutE) were present (Fig. 4, Fig. 5, Table S2, Table S3). It is possible that low-sulfate, methanogenic conditions could not support fumarate addition owing to a lack of available electron acceptors, which are necessary for fumarate addition, so that populations had to rely on other carbon degradation pathways.

Together, our results suggest that fumarate is an important intermediate substrate in hydrocarbon degradation in both high- and low-sulfate environments but likely plays a more important role in high-sulfate PRB environments. Previous laboratory investigations have demonstrated an increase in assA gene abundance during carbon degradation in sulfate-reducing environments, with no such increase in corresponding methanogenic cultures14,18,20. In contrast, other oil field surveys and hydrocarbon degradation experiments demonstrate that fumarate addition is an important pathway under methanogenic conditions, and organisms carrying assA, such as members of the Anaerolineaceae, have shown the ability to degrade n-alkanes to acetate, potentially forming a syntrophic relationship with the acetoclastic methanogen Methanothrix22,25,60. Although fumarate addition likely plays a role in both high- and low-sulfate environments, our observations suggest that in low-sulfate coal seams the prerequisite methanogenic substrates may result from both anaerobic and aerobic (homoprotocatechuate dioxygenase) hydrocarbon degradation, as genes for both were detected at higher coverages across all coal-dependent and active populations (Fig. 4, Fig. 5 and Table S2).

Conclusion

The microbial environment of the terrestrial subsurface is one of the most complex and poorly understood microbiomes on Earth61. Owing to its diversity and high functional capacity, there have been many calls to better understand the diversity62 and activity of in situ microorganisms and the potential implications for the cycling of old and new carbon between the lithosphere and atmosphere63. There has been much debate about the pathways actively involved in the subsurface carbon cycle, specifically the degradation of complex hydrocarbons to methanogenic substrates under varying conditions (e.g., sulfate, oxygen). The subsurface is a challenging environment to access, and prior to this study, conclusions had been based primarily on laboratory benchtop enrichments, geochemical predictions, or shotgun metagenomic models2,6,23. The novel methods used in this work, namely sampling environmentally relevant in situ subsurface communities with the SES and integrating metagenomes from active communities obtained by BONCAT-FACS, expand both the gene and genome catalog of the subsurface environment and our knowledge of the activity of key carbon cycle genes indicative of hydrocarbon degradation. The active groups were putatively involved in both aerobic and anaerobic hydrocarbon degradation and in biosurfactant production. The results presented here demonstrate that active populations have the ability to produce diverse biosurfactants and could play an important role in coal-dependent methanogenesis in low-sulfate coal seams. While this study proposes that biosurfactants are important in low-sulfate coal seams, more work is needed to determine the role of, and environmental triggers for, biosurfactants across coal ranks and sulfate concentrations.

Hydrocarbon degradation in coal seam environments is often assumed to be strictly anaerobic, although, with advances in metagenomics, more research is beginning to suggest mixed aerobic and anaerobic hydrocarbon degradation23,64. Our results indicate that aerobic hydrocarbon degradation genes are associated with active populations in coal seam environments that span redox transition zones and are more prevalent in low-sulfate coal seams. Enzymes belonging to the LigB superfamily had previously not been readily detected with conventional screening techniques. With the advent of metagenomics, these enzymes have been broadly identified in the environment, with an enrichment in hydrocarbon-rich samples, suggesting enhanced importance in hydrocarbon degradation65,66. The BONCAT+ metagenomes were dominated by both aerobic and anaerobic degradation genes, including ppcC, ppsA, ebdB, cmdB, bssD, ligB, and monocyclic and bicyclic dioxygenases. While it is possible that active populations may not be using aerobic degradation genes, fluctuations in dissolved oxygen have been reported in different coal seams, and coals have the ability to absorb free oxygen (Meslé and Fields, data not shown). Therefore, given the active MAGs that contain aerobic genes, it is possible that microaerophilic conditions could promote aerobic hydrocarbon degradation via dioxygenases such as the homoprotocatechuate dioxygenases of the LigB family. The source of oxygen in these environments is unknown and likely depends on groundwater flow/recharge, surface water intrusion, water level changes, coal seam depth, and associated microbial activities. Further work should determine the extent of oxic conditions, potential oxygen sources, and the potential segregation of hydrocarbon degradation and methanogenesis along heterogeneous critical zone transitions in the shallow subsurface.

Methods

Site Description and Sample Collection

The USGS Birney sampling site, located in the PRB of southeastern Montana (45.435606, -106.393309), provides access to four subbituminous coal seams at different depths. Metagenomic samples were collected from a high-sulfate (23.78 – 26.04 mM) coal seam (N-H) at 65 m depth and from two low-sulfate (0.01 – 0.38 mM) coal seams (FG-L and T-L) located at depths of 117 m and 161 m. The low-sulfate coal seams have measurable methane (33 – 67 mg/L), while the high-sulfate coal seam has low methane (<0.15 mg/L), as previously measured7. Coal-associated microbial assemblages were collected with a DMS as previously described7 or with a next-generation subsurface environmental sampler (SES; Patent# US10704993B2) following previous field protocols. The DMSs were incubated down-well for three months and retrieved on June 12th, 2017. Once retrieved, the slurry (coal and groundwater) from the DMSs was aseptically removed and stored on dry ice in sterile Falcon tubes until returned to the lab, where it was stored at -80°C until DNA extraction.

DNA Extraction, Concentration and Sequencing for Shotgun Metagenomics

DNA was extracted from DMS slurry as previously described in Schweitzer et al.6 using a FastDNA Spin Kit for Soil (MP Biomedicals). DNA was purified using One Step PCR Clean Up (Zymo Research) and quantified using the Qubit dsDNA HS Assay Kit (Invitrogen) before being sent to the MBL Keck sequencing facility (The University of Chicago, IL). The genomic DNA was concentrated using a SpeedVac (Thermo Scientific) and quantified using PicoGreen (Invitrogen). DNA amounts ranged from 138.6 ng (FG-L) to 1,450 ng (T-L). DNA was sheared to ~400 bp using a Covaris ultrasonicator, and libraries were constructed with the NuGEN Ovation® Ultralow Library protocol (Tecan Genomics, CA). The amplified libraries were visualized on a Caliper HiSense Bioanalyzer and pooled to equimolar concentrations. The DNA was further size-selected using a Sage PippinPrep 2% cassette. The pooled library was quantified using Kapa Biosystems qPCR library quantification before being sequenced in a 2x150 paired-end run on an Illumina NextSeq. The sequences were demultiplexed and adapter sequences were trimmed using the bcl2fastq conversion software. The sequences were archived in the SRA (SRP292435) under BioProject ID PRJNA678021.
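Pooling libraries to equimolar concentrations requires converting each library's mass concentration to molarity using its average fragment length. The sketch below assumes the common approximation of 660 g/mol per base pair of double-stranded DNA; the function names, the example concentrations, and the use of 400 bp (the shearing target above; final libraries with adapters are longer) are illustrative assumptions rather than the facility's protocol.

```python
# Average molar mass of one base pair of double-stranded DNA (g/mol): a standard approximation.
BP_MASS_G_PER_MOL = 660.0

def library_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/uL) to nanomolar."""
    # ng/uL equals mg/L; mg/L divided by g/mol gives mmol/L; multiplying by 1e6 gives nmol/L (nM).
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * mean_fragment_bp)

def equimolar_volumes(concs_ng_per_ul: dict, mean_fragment_bp: float, fmol_each: float) -> dict:
    """Volume (uL) of each library that contributes `fmol_each` femtomoles to the pool."""
    # 1 nM equals 1 fmol/uL, so volume = amount / molarity.
    return {name: fmol_each / library_nM(c, mean_fragment_bp)
            for name, c in concs_ng_per_ul.items()}

# Hypothetical libraries; 400 bp is the shearing target, not a measured library size.
vols = equimolar_volumes({"lib_A": 2.0, "lib_B": 0.5}, mean_fragment_bp=400, fmol_each=10)
print({name: round(v, 2) for name, v in vols.items()})  # {'lib_A': 1.32, 'lib_B': 5.28}
```

A library four times more dilute contributes four times the volume, which is the point of normalizing by molarity rather than by mass.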

BONCAT Incubations, Fluorescent Labeling, Cell Sorting and Metagenomic Sequencing

The following methods are also presented in our companion study, which examines high-quality BONCAT+ MAGs for their involvement in acetate production and acetoclastic methanogenesis (McKay and Smith et al., in review). Subsurface environmental samplers (SES, Patent # US10704993B2) containing UV-sterilized, crushed coal (size range 0.85–2.00 mm) were deployed at the screened interval of a methane-producing coal seam (FG-L; September 2017) and, after approximately nine months of down-well incubation, were retrieved from CBM well FG11 at the USGS Birney Field Site7. In contrast to the DMSs used for shotgun sequencing, the SES allowed samples to be kept under anaerobic conditions and at formation pressure, an essential prerequisite for conducting representative BONCAT incubations. Upon retrieval, 10 mL of SES slurry was sampled via sealed ports (Swagelok) and anaerobically transferred into gassed-out, sterile Balch tubes. Triplicate tubes were prepared for incubation with 250 μM L-homopropargylglycine (HPG, Click Chemistry Tools, Scottsdale, AZ, USA) prepared in sterile, degassed, diethyl pyrocarbonate (DEPC)-treated water (pH 7). Control samples were prepared identically except that HPG was not added (HPG-negative control). Samples were incubated in the dark at 20 °C for 24 h. At the end of the incubation period, cells were detached from coal by removing 1 mL of slurry and adding 5 mL of Tween® 20 (Sigma-Aldrich) dissolved in 1X PBS, to a final concentration of 0.02%. Samples were vortexed at maximum speed for 5 min, followed by centrifugation at 500 × g for 5 min (Couradeau et al., 2019)37. The supernatant (containing the detached cells) was immediately cryopreserved at −20 °C in sterile 55% glycerol TE (11X) solution.

Translationally active cells were identified through a click reaction that attached a fluorescent dye to HPG molecules incorporated into newly synthesized protein. The BONCAT click reaction consisted of 5 mM sodium ascorbate, 5 mM aminoguanidine HCl, 500 µM THPTA, 100 µM CuSO4, and 5 µM FAM picolyl azide in 1X phosphate-buffered saline. The reaction was incubated for 30 minutes, followed by three 5-minute washes in 20 mL of 1X PBS. Cells were recovered from the filter by vortexing in 0.02% Tween for 5 minutes and then stained with 0.5 µM SYTO™59 DNA stain (ThermoFisher Scientific, Invitrogen, Eugene OR, USA) (Couradeau et al., 2019)37.

Cells were sorted using a BD Influx™ (BD Biosciences, San Jose, CA, USA) configured to capture total cells labeled with SYTO™59 DNA stain and excited with a 640 nm red laser, and BONCAT-positive cells labeled with FAM picolyl azide dye and excited with a 488 nm blue laser. BONCAT-positive cells were a subset of total SYTO™59-stained cells and were distinguished from BONCAT-negative cells by their FAM fluorescence (530/40 BP) relative to the background fluorescence of identical cells incubated without HPG (the HPG-negative control), which also underwent the same click reaction to add a FAM label. Two populations were sorted: the first contained all DNA+ cells (BONCAT-total), and the second contained only those cells that were BONCAT+ relative to the control. For each sample, we sorted 4 wells containing 5,000 cells each and 20 wells containing 300 cells each into 384-well plates. Plates were frozen at −80 °C until further processing.
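The BONCAT+ gate is defined relative to the FAM fluorescence of the HPG-negative control. A minimal sketch of that logic follows, assuming per-cell FAM intensities are available as lists and that the gate is placed at a high percentile of the control distribution. The 99.5th-percentile choice, the function names, and the simulated log-normal intensities are hypothetical illustrations, not the sort settings used in this study.

```python
import math
import random

def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (pct in (0, 100]) of a non-empty sequence."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]

def boncat_gate(control_fam, sample_fam, pct=99.5):
    """Set a FAM threshold from the HPG-negative control and return sample cells above it."""
    threshold = nearest_rank_percentile(control_fam, pct)
    positives = [x for x in sample_fam if x > threshold]
    return threshold, positives

# Simulated intensities (hypothetical): background-only control vs. a sample with 20% labeled cells.
rng = random.Random(0)
control = [rng.lognormvariate(0.0, 0.5) for _ in range(10_000)]
sample = ([rng.lognormvariate(0.0, 0.5) for _ in range(8_000)]     # unlabeled cells
          + [rng.lognormvariate(2.5, 0.5) for _ in range(2_000)])  # HPG-labeled cells
threshold, positives = boncat_gate(control, sample)
print(f"gate = {threshold:.2f}; BONCAT+ fraction = {len(positives) / len(sample):.1%}")
```

Because the gate is anchored to the control, a small fraction of unlabeled cells (about 0.5% here, by construction) still falls above it; a stricter percentile trades sensitivity for fewer false positives.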

Cells in wells containing 5,000 sorted cells were pelleted prior to sequencing by centrifugation at 6,000 x g for 1 hour at 10 °C. The supernatant was discarded by a brief inverted spin at <60 x g so as not to interfere with the subsequent whole genome amplification chemistry. Plates containing only 300 sorted cells were not pelleted. Sorted cells were lysed and amplified using 5 µl WGA-X reactions following the optimized conditions described in Stepanauskas et al. (2017)67. Briefly, cells were lysed in 650 nl of lysis buffer for 10 minutes at room temperature. The lysis buffer consisted of 300 nl TE + 350 nl of 400 mM KOH, 10 mM EDTA, 100 mM DTT. Lysis reactions were neutralized by the addition of 350 nl of 315 mM HCl in Tris-HCl. Amplification reactions were brought to 5 µl with final concentrations of 1X EquiPhi29 reaction buffer (Thermo), 0.2 U/µl EquiPhi29 polymerase (Thermo), 0.4 mM dNTPs, 50 µM random heptamers, 10 mM DTT, and 0.5 µM SYTO13. Plates were incubated at 45 °C for 13 hours.
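Because lysis (650 nl) and neutralization (350 nl) already occupy 1 µl of the 5 µl reaction, the amplification reagents are added in the remaining 4 µl, so each must be at 5/4 of its stated final concentration in that addition. The sketch below shows this C1V1 = C2V2 arithmetic under the assumption that the lysis and neutralization solutions contribute none of these components; DTT is left out because the lysis buffer already supplies some. The helper name is hypothetical.

```python
from fractions import Fraction

TOTAL_UL = Fraction(5)                                # final WGA reaction volume (uL)
LYSIS_UL = Fraction(650, 1000) + Fraction(350, 1000)  # lysis + neutralization volumes (uL)
ADDED_UL = TOTAL_UL - LYSIS_UL                        # volume available for amplification reagents

# Final concentrations from the Methods as (value, unit); DTT omitted (partly supplied by lysis buffer).
FINAL = {
    "EquiPhi29 buffer": (Fraction(1), "X"),
    "EquiPhi29 polymerase": (Fraction(2, 10), "U/ul"),
    "dNTPs": (Fraction(4, 10), "mM"),
    "random heptamers": (Fraction(50), "uM"),
    "SYTO13": (Fraction(5, 10), "uM"),
}

def premix_concentrations(final, total_ul, added_ul):
    """Concentration each component needs in the added volume: C1 = C2 * V2 / V1."""
    return {name: (conc * total_ul / added_ul, unit) for name, (conc, unit) in final.items()}

for name, (conc, unit) in premix_concentrations(FINAL, TOTAL_UL, ADDED_UL).items():
    print(f"{name}: {float(conc):g} {unit}")
```

Exact fractions avoid rounding noise when scaling small volumes; for this reaction every component comes out at 1.25 times its final concentration.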

Libraries for BONCAT+ and BONCAT-total metagenomic sequencing were created using the Nextera XT v2 kit (Illumina) with 12 rounds of PCR amplification. All volumes and inputs to the Nextera reactions were reduced 10-fold from the manufacturer's recommendations. Libraries were sequenced in 2x150 bp mode on Illumina's NextSeq platform. The sequences were archived under JGI GOLD Study ID Gs0141001.

Metagenomic Analysis with Anvi'o

NextSeq-generated fastq files were quality-filtered at a minimum quality read length of 0.75 using Anvi'o for Illumina (http://merenlab.org). The filtered reads were co-assembled using MEGAHIT v1.1.2 (The University of Hong Kong and L3 Bioinformatics Limited). Replicates of the shotgun, BONCAT+, and BONCAT-Total samples were co-assembled, and the three sample types were kept separate for subsequent comparisons. Alignments were generated using bowtie2 v2.3.6 and samtools v1.9. All BAM files were merged into individual profile databases for each sample prior to binning in Anvi'o. Contigs were binned using tetranucleotide frequency into 376 genomic clusters for the shotgun metagenomics samples, 20 bins for the BONCAT+ amplified metagenome, and 38 bins for the BONCAT-Total amplified metagenome. To determine completeness, each cluster was scanned in Anvi'o for ribosomal single-copy genes of bacteria (Campbell et al. 201368) and archaea (Rinke et al. 201369). Open reading frames in contigs were identified with Prodigal within Anvi'o. Functions were assigned in Anvi'o using NCBI's Clusters of Orthologous Groups (COGs), and a Hidden Markov Model hit database was generated with Anvi'o. Coverage estimates, GC content, and N50 were calculated. The phylogenetic tree was created using MUSCLE (drive5.com/muscle) for multiple sequence alignment. The Anvi'o Interactive Server (anvi-server.org) was used for phylogenomic evaluation of the genomic parameters of each MAG. All high-quality (>80% completion) shotgun metagenome MAGs and all BONCAT+ and BONCAT-Total MAGs are archived under JGI GOLD (available upon publication).
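Tetranucleotide frequency (TNF), the compositional signal used for binning above, is the normalized count of each 4-mer in a contig. A minimal sketch follows, assuming each 4-mer is pooled with its reverse complement into canonical 4-mers (136 of them), as is common in binning tools; it is not Anvi'o's implementation, and the function names are hypothetical.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]

# Each 4-mer is pooled with its reverse complement; 136 canonical 4-mers remain.
CANONICAL_4MERS = sorted({min(k, revcomp(k)) for k in ("".join(p) for p in product("ACGT", repeat=4))})

def tetranucleotide_frequency(contig: str) -> list[float]:
    """Relative frequency of each canonical 4-mer, skipping windows that contain non-ACGT bases."""
    counts = dict.fromkeys(CANONICAL_4MERS, 0)
    seq = contig.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):
            counts[min(kmer, revcomp(kmer))] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in CANONICAL_4MERS]

def tnf_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two TNF profiles; small distances are consistent with a shared genome."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

profile_1 = tetranucleotide_frequency("ATGCGCGTATAGCGCATTAGCGGCATCG")
profile_2 = tetranucleotide_frequency("ATGCGCGTATAGCGCATTAGCGGCATCC")
print(round(tnf_distance(profile_1, profile_2), 3))
```

Pooling reverse complements makes the profile independent of which strand a contig was assembled on; in practice TNF is combined with differential coverage, and short contigs give noisy profiles.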

Metagenomic Analysis with Curated Database

Taxonomic identity was determined using NCBI BLASTn of all contigs in each bin. The BLAST hits with the best e-value and percent identity for all contigs were then compared, and the most common ancestor was selected to represent the most likely taxonomic identification across all contigs for MAGs with > 20X coverage. Genes were analyzed using curated databases via direct protein sequence comparison. Curated databases for hydrocarbon degradation genes included the aerobic (AromaDeg)58 and anaerobic (AnHyDeg)19 protein databases. Other functional gene databases (e.g., biosurfactant genes, dsrA, and mcrA) were generated from protein sequences by extracting HMM files from the Pfam database (EMBL-EBI) and from McKay et al. (2017)12. These protein databases were BLAST-searched against each protein sequence with an e-value cutoff of 1E-10 and a minimum of 40% identity. All duplicate gene hits were removed, keeping the hit with the best e-value. All protein gene sequence coverages were normalized as reads (coverage) per kilobase of transcript per million mapped reads (RPKM).

$$\mathrm{RPKM} = \frac{\text{Gene Coverage}}{\left(\dfrac{\text{Gene Length}}{1000}\right) \times \left(\dfrac{\text{Total \# of Reads}}{1{,}000{,}000}\right)}$$
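The RPKM normalization can be expressed as a short function. This is a minimal sketch implementing the equation as written, where gene coverage is the read count (or coverage value) assigned to the gene, gene length is in base pairs, and total reads is the number of mapped reads in the sample; the function name and example numbers are hypothetical.

```python
def rpkm(gene_coverage: float, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads (or coverage) per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    per_kb = gene_length_bp / 1_000                   # gene length in kilobases
    per_million = total_mapped_reads / 1_000_000      # sequencing depth in millions of reads
    return gene_coverage / (per_kb * per_million)

# Hypothetical example: 500 reads on a 1,500 bp gene in a 20-million-read sample.
print(round(rpkm(500, 1_500, 20_000_000), 2))  # 16.67
```

Dividing by both terms is what lets coverages be compared across genes of different lengths and across samples sequenced to different depths.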

Genes of interest for further analyses were selected from the complete list of all genes (Table S1). Selection first ensured that each of the four databases tested (AnHyDeg, AromaDeg, Biosurfactant, and Redox) was represented and that only one subunit representative from the same operon family was further analyzed. To be used as a representative, a gene had to meet at least one of the following criteria: it was identified in the BONCAT+ sample, or it had the highest gene coverage in one of the samples (FG-L, T-L, or N-H).
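The selection rule can be written as a filter over per-gene summaries. This is a minimal sketch under our own reading of the criteria: "highest gene coverage in one of the samples" is interpreted as the highest coverage within the gene's database in at least one sample, and ties between qualifying subunits of one operon family are broken by BONCAT+ presence and then peak coverage. The `Candidate` fields, this tie-breaking, and the toy values are hypothetical, not taken from Table S1 or S2.

```python
from dataclasses import dataclass, field

SAMPLES = ("FG-L", "T-L", "N-H")

@dataclass
class Candidate:
    gene: str
    database: str         # "AnHyDeg", "AromaDeg", "Biosurfactant", or "Redox"
    operon_family: str    # hypothetical label shared by subunits of one operon
    coverage: dict = field(default_factory=dict)  # sample -> RPKM
    in_boncat_plus: bool = False

    def peak(self) -> float:
        return max(self.coverage.values(), default=0.0)

def select_genes_of_interest(candidates):
    """Return one representative per operon family, plus any databases left unrepresented."""
    databases = {c.database for c in candidates}
    # Criterion 2 (our reading): highest coverage within its database in at least one sample.
    top_genes = set()
    for db in databases:
        members = [c for c in candidates if c.database == db]
        for s in SAMPLES:
            best = max(members, key=lambda m: m.coverage.get(s, 0.0))
            if best.coverage.get(s, 0.0) > 0:
                top_genes.add(best.gene)
    # At least one criterion must hold: present in BONCAT+, or a per-sample maximum.
    qualifying = [c for c in candidates if c.in_boncat_plus or c.gene in top_genes]
    # Keep a single subunit per operon family: prefer BONCAT+ presence, then peak coverage.
    chosen = {}
    for c in qualifying:
        current = chosen.get(c.operon_family)
        if current is None or (c.in_boncat_plus, c.peak()) > (current.in_boncat_plus, current.peak()):
            chosen[c.operon_family] = c
    missing = databases - {c.database for c in chosen.values()}
    return list(chosen.values()), missing

# Toy candidates (illustrative values only)
candidates = [
    Candidate("assD", "AnHyDeg", "ass", {"N-H": 40.0, "T-L": 2.0}),
    Candidate("assA", "AnHyDeg", "ass", {"N-H": 10.0}),
    Candidate("ppcC", "AnHyDeg", "ppc", {"FG-L": 60.0, "T-L": 90.0, "N-H": 150.0}, True),
    Candidate("ligB", "AromaDeg", "lig", {"FG-L": 80.0, "T-L": 120.0}, True),
    Candidate("srfAA", "Biosurfactant", "srf", {"FG-L": 5.0}),
    Candidate("dsrA", "Redox", "dsr", {"N-H": 1300.0}),
]
selected, missing = select_genes_of_interest(candidates)
print(sorted(c.gene for c in selected), missing)  # ['dsrA', 'ligB', 'ppcC', 'srfAA'] set()
```

Returning the set of unrepresented databases makes the "each of the four databases" requirement checkable rather than implicit.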

Acknowledgements

Metagenome sequencing was made possible by the Deep Carbon Observatory's Census of Deep Life and was performed at the Marine Biological Laboratory (Woods Hole, MA, United States) and the Joint Genome Institute (CSP 503725). We are grateful for the assistance of Mitch Sogin, Joseph Vineis, Amelia Brumbaugh, and Hilary Morrison at MBL. The work conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported under contract No. DE-AC02-05CH11231. Any use of trade, firm, or product name is for descriptive purposes only and does not imply endorsement by the U.S. Government. The US Geological Survey (E.P.B.) supported field work and sampling. The work was further supported by NSF 1736255 (L.J.M., R.G., and M.W.F.). The authors would also like to acknowledge Dr. Katie Davis and George Platt for their help with sample collection in the Powder River Basin.

595 596 References (Limited to 70 references) 597 1. Javoy, M. The major volatile elements of the Earth: Their origin, behavior, fate. 598 Geophys. Res. Lett. 24, 177–180 (1997). 599 2. Ritter, D. et al. Enhanced microbial coalbed methane generation: A review of research, 600 commercial activity, and remaining challenges. Int. J. Coal Geol. 146, 28–41 (2015). 601 3. Fakoussa, R. M. & Hofrichter, M. Biotechnology and microbiology of coal 602 degradation. Appl. Microbiol. Biotechnol. 52, 25–40 (1999). 603 4. Elder, D. J. E. & Kelly, D. J. The bacterial degradation of benzoic acid and benzenoid 604 compounds under anaerobic conditions: Unifying trends and new perspectives. FEMS 605 Microbiol. Rev. 13, 441–468 (1994). 606 5. Muyzer, G. & Stams, A. J. M. The ecology and biotechnology of sulphate-reducing

27 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.26.457739; this version posted August 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

bacteria. Nat. Rev. Microbiol. 6, 441–454 (2008).
6. Schweitzer, H. et al. Changes in microbial communities and associated water and gas geochemistry across a sulfate gradient in coal beds: Powder River Basin, USA. Geochim. Cosmochim. Acta 245, (2019).
7. Barnhart, E. P. et al. Hydrogeochemistry and coal-associated bacterial populations from a methanogenic coal bed. Int. J. Coal Geol. 162, 14–26 (2016).
8. Barnhart, E. P. et al. Potential Role of Acetyl-CoA Synthetase (acs) and Malate Dehydrogenase (mae) in the Evolution of the Acetate Switch in Bacteria and Archaea. Sci. Rep. 5, 12498 (2015).
9. Min, H. & Zinder, S. H. Kinetics of Acetate Utilization by Two Thermophilic Acetotrophic Methanogens: Methanosarcina sp. Strain CALS-1 and Methanothrix sp. Strain CALS-1. Appl. Environ. Microbiol. 55, 488–491 (1989).
10. Borrel, G. et al. Comparative genomics highlights the unique biology of Methanomassiliicoccales, a Thermoplasmatales-related seventh order of methanogenic archaea that encodes pyrrolysine. BMC Genomics 15, 679 (2014).
11. Haroon, M. F. et al. Anaerobic oxidation of methane coupled to nitrate reduction in a novel archaeal lineage. Nature 500, 567–570 (2013).
12. McKay, L. J., Hatzenpichler, R., Inskeep, W. P. & Fields, M. W. Occurrence and expression of novel methyl-coenzyme M reductase gene (mcrA) variants in hot spring sediments. Sci. Rep. 7, (2017).
13. Callaghan, A. V. et al. Diversity of benzyl- and alkylsuccinate synthase genes in hydrocarbon-impacted environments and enrichment cultures. Environ. Sci. Technol. 44, 7287–7294 (2010).
14. Callaghan, A. V., Tierney, M., Phelps, C. D. & Young, L. Y. Anaerobic biodegradation of n-hexadecane by a nitrate-reducing consortium. Appl. Environ. Microbiol. 75, 1339–1344 (2009).
15. Beller, H. R. & Edwards, E. A. Anaerobic toluene activation by benzylsuccinate synthase in a highly enriched methanogenic culture. Appl. Environ. Microbiol. 66, 5503–5505 (2000).
16. von Netzer, F. et al. Enhanced gene detection assays for fumarate-adding enzymes allow uncovering of anaerobic hydrocarbon degraders in terrestrial and marine systems. Appl. Environ. Microbiol. 79, 543–552 (2013).
17. Acosta-González, A., Rosselló-Móra, R. & Marqués, S. Diversity of benzylsuccinate synthase-like (bssA) genes in hydrocarbon-polluted marine sediments suggests substrate-dependent clustering. Appl. Environ. Microbiol. 79, 3667–3676 (2013).
18. Callaghan, A. V., Gieg, L. M., Kropp, K. G., Suflita, J. M. & Young, L. Y. Comparison of mechanisms of alkane metabolism under sulfate-reducing conditions among two bacterial isolates and a bacterial consortium. Appl. Environ. Microbiol. 72, 4274–4282 (2006).
19. Callaghan, A. V. & Wawrik, B. AnHyDeg: A Curated Database of Anaerobic Hydrocarbon Degradation Genes. (2016).
20. Aitken, C. M. et al. Evidence that crude oil alkane activation proceeds by different mechanisms under sulfate-reducing and methanogenic conditions. Geochim. Cosmochim. Acta 109, 162–174 (2013).
21. Tan, B., Dong, X., Sensen, C. W. & Foght, J. Metagenomic analysis of an anaerobic alkane-degrading microbial culture: Potential hydrocarbon-activating pathways and inferred roles of community members. Genome 56, 599–611 (2013).
22. Berdugo-Clavijo, C. & Gieg, L. M. Conversion of crude oil to methane by a microbial consortium enriched from oil reservoir production waters. Front. Microbiol. 5, 197 (2014).


23. An, D. et al. Metagenomics of hydrocarbon resource environments indicates aerobic taxa and genes to be unexpectedly common. Environ. Sci. Technol. 47, 10708–10717 (2013).
24. Johnson, J. M., Wawrik, B., Isom, C., Boling, W. B. & Callaghan, A. V. Interrogation of Chesapeake Bay sediment microbial communities for intrinsic alkane-utilizing potential under anaerobic conditions. FEMS Microbiol. Ecol. 91, 1–14 (2015).
25. Gieg, L. M. & Toth, C. R. A. Signature Metabolite Analysis to Determine In Situ Anaerobic Hydrocarbon Biodegradation. in Anaerobic Utilization of Hydrocarbons, Oils, and Lipids (2020). doi:10.1007/978-3-319-50391-2_19.
26. Zhou, J. et al. Synthesis and Characterization of Anaerobic Degradation Biomarkers of n-Alkanes via Hydroxylation/Carboxylation Pathways. Eur. J. Mass Spectrom. 22, 31–37 (2016).
27. Meckenstock, R. U. & Mouttaki, H. Anaerobic degradation of non-substituted aromatic hydrocarbons. Curr. Opin. Biotechnol. 22, 406–414 (2011).
28. Yin, S., Tao, X. & Shi, K. The role of surfactants in coal bio-solubilisation. Fuel Process. Technol. 92, 1554–1559 (2011).
29. Makkar, R. S. & Rockne, K. J. Comparison of synthetic surfactants and biosurfactants in enhancing biodegradation of polycyclic aromatic hydrocarbons. Environ. Toxicol. Chem. 22, 2280–2292 (2003).
30. Barin, R., Talebi, M., Biria, D. & Beheshti, M. Fast bioremediation of petroleum-contaminated soils by a consortium of biosurfactant/bioemulsifier producing bacteria. Int. J. Environ. Sci. Technol. 11, 1701–1710 (2014).
31. Gojgic-Cvijovic, G. D. et al. Biodegradation of petroleum sludge and petroleum polluted soil by a bacterial consortium: A laboratory study. Biodegradation 23, 1–14 (2012).
32. Sekhon, K. K., Khanna, S. & Cameotra, S. S. Enhanced biosurfactant production through cloning of three genes and role of esterase in biosurfactant release. Microb. Cell Fact. 10, 49 (2011).
33. Wentzel, A., Ellingsen, T. E., Kotlar, H. K., Zotchev, S. B. & Throne-Holst, M. Bacterial metabolism of long-chain n-alkanes. Appl. Microbiol. Biotechnol. 76, 1209–1221 (2007).
34. Wawrik, B. et al. Field and laboratory studies on the bioconversion of coal to methane in the San Juan Basin. FEMS Microbiol. Ecol. 81, 26–42 (2012).
35. Ulrich, G. & Bower, S. Active methanogenesis and acetate utilization in Powder River Basin coals, United States. Int. J. Coal Geol. 76, 25–33 (2008).
36. Barnhart, E., Hyatt, R., Fields, M. W. & Cunningham, A. B. Subsurface environment sampler with actuator movable collection chamber. (2020).
37. Couradeau, E. et al. Probing the active fraction of soil microbiomes using BONCAT-FACS. Nat. Commun. 10, (2019).
38. Hatzenpichler, R. et al. Visualizing in situ translational activity for identifying and sorting slow-growing archaeal−bacterial consortia. Proc. Natl. Acad. Sci. U. S. A. 113, E4069–E4078 (2016).
39. Plugge, C. M., Zhang, W., Scholten, J. C. M. & Stams, A. J. M. Metabolic flexibility of sulfate-reducing bacteria. Front. Microbiol. 2, (2011).
40. Beckmann, S. et al. Long-term succession in a coal seam microbiome during in situ biostimulation of coalbed-methane generation. ISME J. 13, 632–650 (2019).
41. Kleindienst, S. et al. Diverse sulfate-reducing bacteria of the Desulfosarcina/Desulfococcus clade are the key alkane degraders at marine seeps. ISME J. 8, 2029–2044 (2014).
42. Shin, B. et al. Anaerobic degradation of hexadecane and phenanthrene coupled to sulfate reduction by enriched consortia from northern Gulf of Mexico seafloor sediment. Sci. Rep. 9, (2019).
43. Sanford, R. A., Cole, J. R. & Tiedje, J. M. Characterization and description of Anaeromyxobacter dehalogenans gen. nov., sp. nov., an Aryl-halorespiring facultative anaerobic myxobacterium. Appl. Environ. Microbiol. 68, 893–900 (2002).
44. Vinson, D. S., Blair, N. E., Ritter, D. J., Martini, A. M. & McIntosh, J. C. Carbon mass balance, isotopic tracers of biogenic methane, and the role of acetate in coal beds: Powder River Basin (USA). Chem. Geol. 530, 119329 (2019).
45. Jones, E. J. P., Voytek, M. A., Corum, M. D. & Orem, W. H. Stimulation of methane generation from nonproductive coal by addition of nutrients or a microbial consortium. Appl. Environ. Microbiol. 76, 7013–7022 (2010).
46. Strapoć, D. et al. Biogeochemistry of microbial coal-bed methane. Annu. Rev. Earth Planet. Sci. 39, (2011).
47. Stewart, C. R., Burnside, D. M. & Cianciotto, N. P. The surfactant of Legionella pneumophila is secreted in a TolC-dependent manner and is antagonistic toward other Legionella species. J. Bacteriol. 193, 5971–5984 (2011).
48. Suthar, H. & Nerurkar, A. Characterization of Biosurfactant Produced by Bacillus licheniformis TT42 Having Potential for Enhanced Oil Recovery. Appl. Biochem. Biotechnol. 180, 248–260 (2016).
49. Kumar, A. P. et al. Evaluation of orange peel for biosurfactant production by Bacillus licheniformis and their ability to degrade naphthalene and crude oil. 3 Biotech 6, 43 (2016).
50. McIntosh, J. A., Donia, M. S. & Schmidt, E. W. Ribosomal peptide natural products: Bridging the ribosomal and nonribosomal worlds. Nat. Prod. Rep. 26, 537–559 (2009).
51. Strieker, M., Tanović, A. & Marahiel, M. A. Nonribosomal peptide synthetases: Structures and dynamics. Curr. Opin. Struct. Biol. 20, 234–240 (2010).
52. Yu, H., Huang, G. H., Xiao, H., Wang, L. & Chen, W. Combined effects of DOM and biosurfactant enhanced biodegradation of polycylic armotic hydrocarbons (PAHs) in soil-water systems. Environ. Sci. Pollut. Res. 21, 10536–10549 (2014).
53. Zhao, F. et al. Heterologous production of Pseudomonas aeruginosa rhamnolipid under anaerobic conditions for microbial enhanced oil recovery. J. Appl. Microbiol. 118, 379–389 (2015).
54. Singh, D. N. & Tripathi, A. K. Coal induced production of a rhamnolipid biosurfactant by Pseudomonas stutzeri, isolated from the formation water of Jharia coalbed. Bioresour. Technol. 128, 215–221 (2013).
55. Ross, D. E. & Gulliver, D. Reconstruction of a Nearly Complete Pseudomonas Draft Genome Sequence from a Coalbed Methane-Produced Water Metagenome. Genome Announc. 4, e01024-16 (2016).
56. Schleinitz, K. M. et al. Phenol degradation in the strictly anaerobic iron-reducing bacterium Geobacter metallireducens GS-15. Appl. Environ. Microbiol. 75, 3912–3919 (2009).
57. Wilkes, H., Buckel, W., Golding, B. T. & Rabus, R. Metabolism of Hydrocarbons in n-Alkane-Utilizing Anaerobic Bacteria. J. Mol. Microbiol. Biotechnol. 26, 138–151 (2016).
58. Duarte, M., Jauregui, R., Vilchez-Vargas, R., Junca, H. & Pieper, D. H. AromaDeg, a novel database for phylogenomics of aerobic bacterial degradation of aromatics. Database (Oxford) (2014). doi:10.1093/database/bau118.
59. Schmeling, S. & Fuchs, G. Anaerobic metabolism of phenol in proteobacteria and further studies of phenylphosphate carboxylase. Arch. Microbiol. 191, 869–878 (2009).
60. Liang, B. et al. Anaerolineaceae and Methanosaeta turned to be the dominant microorganisms in alkanes-dependent methanogenic culture after long-term of incubation. AMB Express 5, (2015).
61. Momper, L., Jungbluth, S. P., Lee, M. D. & Amend, J. P. Energy and carbon metabolisms in a deep terrestrial subsurface fluid microbial community. ISME J. 11, 2319–2333 (2017).
62. Lemos, L. N., Mendes, L. W., Baldrian, P. & Pylro, V. S. Genome-Resolved Metagenomics Is Essential for Unlocking the Microbial Black Box of the Soil. Trends Microbiol. 29, 279–282 (2021).
63. Schweitzer, H. et al. Innovating carbon-capture biotechnologies through ecosystem-inspired solutions. One Earth 4, 49–59 (2021).
64. da Cruz, G. F. et al. Could petroleum biodegradation be a joint achievement of aerobic and anaerobic microrganisms in deep sea reservoirs? AMB Express 1, (2011).
65. Suenaga, H., Ohnuki, T. & Miyazaki, K. Functional screening of a metagenomic library for genes involved in microbial degradation of aromatic compounds. Environ. Microbiol. 9, 2289–2297 (2007).
66. Terrón-González, L., Martín-Cabello, G., Ferrer, M. & Santero, E. Functional metagenomics of a biostimulated petroleum-contaminated soil reveals an extraordinary diversity of extradiol dioxygenases. Appl. Environ. Microbiol. 82, 2467–2478 (2016).
67. Stepanauskas, R. et al. Improved genome recovery and integrated cell-size analyses of individual uncultured microbial cells and viral particles. Nat. Commun. 8, (2017).
68. Campbell, J. H. et al. UGA is an additional glycine codon in uncultured SR1 bacteria from the human microbiota. Proc. Natl. Acad. Sci. U. S. A. 110, 5540–5545 (2013).
69. Rinke, C. et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431–437 (2013).


Supplemental Information Contents:
Supplemental Figures S1–S2
Supplemental Tables S1–S4 and Supplemental Data 1–2

Supplemental Figures:

Supplemental Figure S1.) Evaluation of SYTO-Total metagenome-assembled genomes (MAGs) and the presence or absence of anaerobic (green) and aerobic (blue) hydrocarbon degradation genes, biosurfactant genes (purple), and sulfate reduction/methanogenesis genes (red) in each MAG. The AnHyDeg and AromaDeg databases were used to select the hydrocarbon degradation genes compared. Percent completion (green line, scale 0–100%) and redundancy (purple line, scale 0–10%) are shown for each MAG. MAGs with putative Methanothrix, Bacteria, and Bacteroidetes taxonomy were all greater than 92% complete and share >99% average nucleotide identity with their matching BONCAT MAGs, indicated by the symbol (■).


[Supplemental Figure S2 plots for replicates 1–3. Top row: DNA Stain (670/30 {640}) vs. Forward Scatter {488nm}, with DNA+ gates labeled 17.3, 24.9, and 21.9. Bottom row: BONCAT Fluorescence (530/40 {488}) vs. Forward Scatter {488nm}, with BONCAT+ gates labeled 36.2, 29.3, and 34.3.]

Supplemental Figure S2.) Two-step gating of the biological replicates of the FG-11 BONCAT-positive samples. In the top panels (1A, 2A, and 3A), total cells were separated from background particles using gates (black) based on a DNA stain (SYTO 59; ex: 640 nm, em: 655–685 nm). Following this initial gating, SYTO+ cells were further analyzed for BONCAT fluorescence with the FAM picolyl dye (ex: 488 nm, em: 530 nm) in the bottom panels (1B, 2B, and 3B), where BONCAT+ populations were gated by comparing each replicate with both HPG-negative and water controls (overlaid in red on each graph). The number in the top left of each bottom-panel graph indicates the percentage of BONCAT+ cells relative to total cells (SYTO+) for that replicate.

Supplemental Tables/Data:
*Uploaded as data files; table descriptions are provided below.

Supplemental Table S1.) List of environmental metagenome-assembled genomes (MAGs) from the high-sulfate and low-sulfate coal seams of the Powder River Basin, with the corresponding sequencing and analysis parameters.

Supplemental Table S2.) Complete list of the four databases and all genes compared in this study. The function of each gene and its coverage in each sample are listed, and the presence of each gene in the BONCAT+ sample is indicated. Genes analyzed further are marked as genes of interest.


Supplemental Table S3.) List of BONCAT+ metagenome-assembled genomes (MAGs) with the corresponding sequencing and analysis parameters. Genes present in each MAG are listed under their source database.

Supplemental Table S4.) List of SYTO-Total metagenome-assembled genomes (MAGs) with their corresponding sequencing and analysis parameters. Genes present in each MAG are listed under their source database.

Supplemental Data 1.) Relative abundance of individual genes of interest involved in hydrocarbon degradation for the shotgun environmental MAGs from the high-sulfate coal seam (N-H) and the two low-sulfate coal seams (T-L and FG-L). All MAGs below 80% completion were grouped together as <80% Completion.

Supplemental Data 2.) Average nucleotide identity (ANI) comparison of all shotgun, BONCAT+, and SYTO-Total metagenomes.
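The "<80% Completion" pooling rule in Supplemental Data 1 can be illustrated with a short Python sketch. It assumes, as a hypothetical reading not stated by the authors, that a MAG's relative abundance for one gene within one coal seam is its share of that gene's summed coverage. The function name, input layout, and bin labels below are illustrative only and are not the authors' pipeline.

```python
from collections import defaultdict

# Hypothetical cutoff and label mirroring the "<80% Completion" grouping
COMPLETION_CUTOFF = 80.0
POOLED_LABEL = "<80% Completion"


def pool_low_completion(mags, cutoff=COMPLETION_CUTOFF):
    """Return relative abundance per MAG label, pooling MAGs below `cutoff`.

    mags: iterable of (mag_label, completion_percent, gene_coverage) tuples
    for ONE gene of interest in ONE coal seam (assumed input layout).
    MAGs with completion < cutoff are summed under POOLED_LABEL; the
    returned fractions sum to 1 unless all coverage is zero.
    """
    totals = defaultdict(float)
    for label, completion, coverage in mags:
        # MAGs at or above the cutoff keep their own label
        key = label if completion >= cutoff else POOLED_LABEL
        totals[key] += coverage
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Avoid division by zero when the gene has no coverage at all
        return {key: 0.0 for key in totals}
    return {key: value / grand_total for key, value in totals.items()}


# Hypothetical example: one complete MAG and two incomplete MAGs
example = [("Bin_A", 95.0, 30.0), ("Bin_B", 60.0, 10.0), ("Bin_C", 70.0, 10.0)]
print(pool_low_completion(example))
```

Here Bin_B and Bin_C fall below the cutoff and are reported together, so the output has one entry for Bin_A (0.6) and one pooled entry (0.4). Pooling before normalizing keeps the fractions summing to 1 while hiding per-bin detail for incomplete genomes.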


[Figure: workflow schematic (Laboratory Incubation, Labeling, Fluorescence Cell Sorting, DNA Extraction). Labels include: FG, 24 hrs incubation, HPG / No HPG, Triplicates; BONCAT+ active cells (FAM fluorescence vs. FSC, 34.3) and DNA+ total cells (SYTO 59 fluorescence vs. FSC, 21.9); Amplified Metagenomic Sequencing; SES, Nance, Terret; Shotgun Metagenomic Sequencing; Metagenomic Analysis.]
[Figure legend: Sandstone (Ss), Siltstone (Slts.), Mudstone/Shale, Coal; high-sulfate wells; low-sulfate wells.]

[Figure: environmental MAGs from the N-H, T-L, and FG-L coal seams with completion (%), redundancy (%), coverage, and putative family-level taxonomy (e.g., Syntrophaceae, Geobacteraceae, Desulfobacteraceae, Pseudomonadaceae, Anaerolineaceae, Methanosaetaceae). Accompanying panels show the relative abundance (0–100%) of genes of interest per MAG in each seam, grouped as Dioxygenases, Hydroxylation, Phthalate, Carboxylation, Biosurfactant, Fumarate Addition, and Sulfate Reduction/Methanogenesis; MAGs below 80% completion are pooled as <80% Completion.]
[Figure: MAGs with putative taxonomy (Unknown, Geobacter, Desulfobacteraceae, Deltaproteobacteria, Comamonadaceae, Betaproteobacteria, Proteobacteria, Chlorobi, Bacteroidetes, Methanothrix, Methanobacterium), Completion (%) and Redundancy (%), and presence or absence of srf, lch, est, tutE, LigB, dsrA, assD, ahyA, bssD, mcrA, ppsA, ppsB, abcA, ebdB, ppcC, cmdB, Phthalate, Gentisate, EXDO bicyclic, and EXDO monocyclic.]

[Figure: coverage (RPKM) of genes of interest (Gen, Srf, Est, Rhl, Lch, LigB, IbsD, TutE, BssD, AssD, PpcC, PpsA, PcmJ, ApcC, Pedh, EbdB, AbcA, AhyA, HbsD, McrA, DsrA, NapA, MasC, CmdB, NmsD), with category labels Anaerobic Hydrocarbon Degradation, Aerobic, Biosurfactants, Dioxygenases, Hydroxylation, Carboxylation, Fumarate Addition, Phthalate, and Mono_DIOX.]