bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1

1 ARTICLE

2 Climate change negatively impacts dominant

3 microbes in the sediments of a High Arctic

1 1 2 4 Graham A. Colby , Matti O. Ruuskanen , Kyra A. St. Pierre , Vincent

2 1 1,3 5 L. St. Louis , Alexandre J. Poulain , St´ephane Aris-Brosou

1 6 Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada

2 7 University of Alberta, Department of Biological Sciences, Edmonton, AB T6G 2E9,

8 Canada

3 9 Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON K1N

10 6N5, Canada

11 B: St´ephane Aris-Brosou ([email protected]).

12 Running head: Climate change impacts on High Arctic microbes bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

2

13 Abstract

14 Temperatures in the Arctic are expected to increase dramatically over the next century,

15 yet little is known about how microbial communities and their underlying metabolic pro-

16 cesses will be a↵ected by these environmental changes in freshwater sedimentary systems.

17 To address this knowledge gap, we analyzed sediments from , NU Canada.

18 Here, we exploit the spatial heterogeneity created by varying runo↵ regimes across the

19 watershed of this uniquely large lake at these latitudes to test how a transition from low

20 to high runo↵, used as one proxy for climate change, a↵ects the community structure and

21 functional potential of dominant microbes. Based on metagenomic analyses of lake sedi-

22 ments along these spatial gradients, we show that increasing runo↵ leads to a decrease in

23 taxonomic and functional diversity of sediment microbes. Our findings are likely to apply

24 to other, smaller, glacierized watersheds typical of polar or high latitude / high altitudes

25 ecosystems; we can predict that such changes will have far reaching consequences on these

26 ecosystems by a↵ecting nutrient biogeochemical cycling, the direction and magnitude of

27 which are yet to be determined.

28 Keywords: High Arctic, microbial ecology, metagenome assembled genomes

29 (MAGs), high-throughput sequencing bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

3

30 Main

31 Climate change is amplified in polar regions, where near-surface temperatures have in-

32 creased almost twice as fast as elsewhere on Earth over the last decade [1, 2, 3]. Climate

33 models predict that temperature will increase in the Arctic by as much as 8°Cby2100

34 [4]. These changes are already having dramatic consequences on physical [5, 6, 7], bio-

35 geochemical [8, 9], and ecological [10, 11] processes across Arctic ecosystems. Yet, while

36 we are starting to understand the e↵ect of thawing permafrost on microbial communities

37 [12, 13, 14], we know very little about how microbes in lentic ecosystems such as

38 respond to environmental changes – even though microbes mediate most global biogeo-

39 chemical cycles [15, 16]. Furthermore, lakes are broadly considered sentinels of climate

40 change, as they integrate physical, chemical and biological changes happening through

41 their watersheds [17]; however, their microbial community structure and function are

42 relatively understudied, in particular in the Arctic.

43 To date, much of the research performed on microbial communities in Arctic lakes

44 has been limited to studies that were mostly based on partial 16S rRNA gene sequencing

45 [18, 19, 20, 21, 22]. While these studies are useful to understand the structure of these

46 microbial communities, they provide limited functional insights and can be biased as they

47 often rely on sequence databases where environmental microbes, specifically from the

48 Arctic, may be underrepresented [23, 24]. More critically, being circumscribed both in

49 space and in time, previous studies only o↵er snapshots of microbial communities and

50 hence, have a limited power to predict how microbial communities might respond to

51 climate change.

52 To predict the e↵ect of climate change on microbial functional diversity in Arctic lake

53 sediments, we focused on Lake Hazen, the world’s largest High Arctic lake by volume bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

4

54 (82°N) [25]. In this work, we exploited two important properties of Lake Hazen. First, its

55 watershed is already experiencing the e↵ects of climate change, as increasing temperatures

56 there are leading to more glacial melt, permafrost thaw, and increased runo↵ from the

57 watershed into the lake in warmer years relative to cooler ones citelehnherr2018world.

58 Second, its tributaries are highly heterogeneous, fed by eleven glaciers ranging from 6 to

2 59 1041 km in surface area, and annual runo↵ volume approximately scaling with their size

3 60 (from <0.001 to 0.080 km in 2016) [32].

61 It is this temporal and spatial heterogeneity in runo↵ that we used to evaluate the

62 possible consequences of climate change on High Arctic sediment microbial functional

63 diversity, acknowledging that the consequences of increasing temperature are likely slightly

64 more plural and complex. To this e↵ect, we sampled lake sediments along two transects

65 representing low (L transect: samples L1 [shallow] and L2 [deep]) and high (H: samples H1

66 [shallow] and H2, [deep]) seasonal runo↵ volume, as well as at a single site that received

67 negligible runo↵ (C site; Figure 1A). We also collected soil samples (S sites) from three

68 sites in the dried streambeds of the tributaries, on the northern shore between the two

69 transects to assess soil influence on microbial communities present in the sediments. We

70 then resorted to untargeted metagenomics analyses to draw an inventory of dominant

71 microbes, assumed to be the most critical to nutrient cycling and the most relevant to

72 the dynamics of microbial communities. These reconstructed Metagenome Assembled

73 Genomes (MAGs) [26] allowed us to assess the quantitative impact of a change of runo↵

74 regime, from low to high, on both the structure of sediment microbial communities and

75 their functional potential. We show that an increase in runo↵ volume and resultant

76 sedimentation rates, as predicted under climate change scenarios for the region, could

77 lead to a reduced diversity of the dominant microbial community and of their functional

78 potential. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

5

79 Characterization of the physical and geochemical environments

80 We first characterized how geochemical properties of the sediments varied along and

81 between the two transects. Sediment samples from these five sites clustered into four

82 distinct geochemical groups (Figure 1B) that reflect spatial variability in glacial runo↵,

83 the primary hydrological input to the lake. Indeed, PC1 explained 43% of the total

2 84 variance ( ), and di↵erentiated the L and high H runo↵ transects, while PC2 (29.9%)

85 separated each transect according to their depth.

2 86 Along PC1, higher concentrations of ammonia (NH3)andsulfate(SO4)inthepore-

87 waters, and a greater percentage of calcium carbonate in the sediments, were present

88 in the H transect. However, higher concentrations of dioxygen (O2), nitrates / nitrites

89 (NO3/NO2), and greater redox potential were present in the L transect and the control

90 (C) sites. Along PC2, sediment organic carbon (OC), and porewater pH and Cl,were

91 more determinant when discriminating between the shallow (L1 and H1) and deep (L2

92 and H2) sites of both transects (Supplementary Figures 4-5). Rather than grouping spa-

93 tially with the H transect, the C sites were most chemically similar to L1 (Figure 1C,

94 Supplementary Figure 6). The shallow sites were not significantly di↵erent from each

95 other in pH or OC concentrations, but were both significantly di↵erent from the deeper

96 sites suggesting that although most chemical features were similar within each transect,

97 some features might still be influenced by their spatial proximity to the shoreline or depth

98 of the overlying water column (Figure 1C). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

6

99 Contrasting low vs. high runo↵ transects revealed a decrease in

100 biodiversity

2 101 With such a clear geochemical separation of the transects along PC1 (43% of )and

102 significant spatial contrasts (Figure 1C), we had the right context to evaluate the influ-

103 ence of runo↵ gradients on sediment microbial diversity. We assembled a total of 300

104 (290 bacterial and 10 archaeal) MAGs that were >50% complete and with <10% con-

105 tamination (Supplementary Tables 6-7). By constructing phylogenetic trees for Bacteria

106 and Archaea, we noted that while most major phyla were represented in the MAGs, no

107 Firmicutes and only a small number of Archaea were identified (Figure 2). In contrast,

108 Gammaproteobacteria (n = 50), Actinobacteria (n = 31), Alphaprobacteria (n =24),

109 Chloroflexoata (n =30),Planctomycetota(n = 24), and Acidobacteriota (n =19)

110 were the most commonly recovered taxa across the entire watershed. Uncultured phyla

111 comprised 11% of reconstructed MAGs, including representatives from multiple taxa: ⇠

112 Eisenbacteria (n =12),Patescibacteria(n =9),Omnitrophica(n =5),KSB1(n =1),

113 Armatimonadota (n =1),Lindowbacteria(n = 1), USBP1 (n = 1), UBP10 (n =1),and

114 Zixibacteria (n =1).

115 However, these MAGs were not evenly distributed across all sites (Figure 2, inset;

116 Supplementary Figure 7). To quantify this uneven distribution, we determined the site

117 where each genome was most abundant. Based solely on this information, we performed an

118 unsupervised clustering (t-SNE), and found that the directions defined by sediment-laden

119 water flowing from the shallow to the deep site within each transect in the projection space

120 were almost orthogonal between transects (see arrows in Figure 3). This orthogonality

121 suggests that transitioning from the L to the H transect could lead to a dramatic shift in

122 microbial communities. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

7

123 To assess the significance of these shifts at the phylum level, we calculated the relative

124 proportions of each of the reconstructed 300 MAGs at each site, and tallied these numbers

125 by phylum, over the 43 phyla represented in our data. We did this along each transect –

126 essentially pooling sites H1/H2 together to represent the H transect, and doing the same

127 for sites L1/L2 (the L transect), while keeping proportions for the S and C sites separate.

128 Hierarchical clustering on this table of MAGs proportions by phyla vs. sites showed a

129 divergence from the L to H transects (following the (((L,C),H),S) clustering pattern;

130 Figure 4A, inset), confirming the clear contrast between the two transects in terms of

131 taxa proportions (see Figure 3). To test if these taxa proportions tended to increase or

132 decrease when transitioning from L to H along the (((L,C),H),S) clustering pattern, we

133 fitted linear models (ANOVA) regressing the proportions of each of the 43 phyla against

134 sites, ordered as per their hierarchical clustering (L C H S). Essentially, we regressed ! ! !

135 a single data point for each of the four classes (L, C, H, and S), so that P -values could not

136 be obtained, but slope could be estimated (Figure 4A). Strikingly, most of these slopes

8 137 were negative (binomial test: P =7.8 10 ), demonstrating a significant decrease in ⇥

138 diversity at the phylum level as one goes from low to high runo↵ regimes.

139 An NMDS ordination allowed us to detect the geochemical features associated with

140 this shift in microbial communities (Supplementary Figure 8). In the sediments, NH3

141 concentrations (P =0.03), NO2 / NO3 concentrations (P =0.03), and redox potential

142 (P =0.03) were significant in determining the distribution of MAGs (permutation test:

143 P<0.05). We further observed that the sites with the greatest diversity (L/C sites) were

144 also those with the greatest redox potential, and O2 and NO3/NO2 concentrations. Sites

2 145 with the lowest microbial diversity (H sites), contained greater NH3 and SO4 concentra-

146 tions, and lower redox potential. In addition to gradients shaped by the interplay between

147 microbial metabolism and local geochemical constraints, the physical disturbances asso- bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

8

148 ciated with high sedimentation rates also likely contributed to the homogenization of the

149 microbial community structure; however, we cannot quantify the relative importance of

150 each of these processes here.

151 Contrasting low vs. high runo↵ transects also revealed a loss of

152 functional potential

153 To assess the functional implications of this decrease of biodiversity, we assigned metabolic

154 functions and pathways to proteins in each MAG. We focused on genes and pathways in-

155 volved in key elements, targeting carbon, nitrogen, and sulfur cycling (Supplementary

156 Figures 9-10). Only the most abundant genomes per site were reported within each phy-

157 lum (Supplementary Figure 11), allowing us to compute the proportions of functions and

158 pathways in each of the 43 phyla present in reconstructed MAGs across the hydrological

159 regimes. Their hierarchical clustering (Supplementary Figures 12-14) led to a picture con-

160 sistent with the ones derived from both geochemical (Figure 1) and taxonomic abundances

161 (Figure 4A). Indeed, the two transects were again clearly separated (clustering pattern

162 (((L,C),S),H); Figure 4B, inset), and fitting linear models regressing function/pathway

163 proportions against sites showed that, again, most of these slopes were negative (bino-

164 mial test: P =0.0010). Forcing the same site ordering as for the taxonomic abundances

5 165 (L C H S as in Figure 4A, inset) led to similar results (binomial test: P =7.8 10 ), ! ! ! ⇥

166 demonstrating a significant decrease in metabolic diversity when going from the L to the

167 Htransect.

168 More specifically, we found that marker genes whose product is implicated in carbon

169 and sulfur metabolisms significantly decreased when going from the L to H, while nitrogen

170 metabolism was una↵ected (Supplementary Table 8; see Supplementary Text for details). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

9

171 When considering the individual functions present or absent across the transects, we noted

172 that most oxidative pathways (CO, methane, formaldehyde, sulfide, sulfite) appeared less

173 common in the H transect (Supplementary Figure 9), corresponding to lower oxygen

174 concentrations and constraints on aerobic metabolism. Furthermore, while most carbon

175 fixation processes were shared between the two transects, carbon oxidation and reduction

176 reactions regulated through Wood-Ljungdahl pathway were only observed in the H tran-

177 sect, where sedimentary conditions were anoxic throughout the first 5cm (Supplementary

178 Figures 4-5), consistent with a more reductive environment.

179 Discussion

180 Even if Arctic microbial communities are changing rapidly [13], there is still a dearth of

181 long-term time series observations. To address this point, we used Lake Hazens spatial

182 geochemical heterogeneity to evaluate the structural and functional response of lake sedi-

183 ment microbial communities to varying runo↵, already shown to increase in this warming

184 High Arctic environment [9]. Such an approach can reasonably be interpreted from the

185 lens of a space-for-time design, which assumes that spatial and temporal variations are

186 not only equivalent [27, 28], but also stationary [29]. Whether this latter condition is met

187 cannot be known, but in the absence of any time-series documenting the e↵ect of climate

188 change on lake sediment microbial communities in the High Arctic, the space-for-time

189 design becomes a convenience, if not a necessity [30].

190 Using metagenomics along two transects experiencing heterogeneous runo↵ conditions,

191 we presented evidence that climate change, as it drives increasing runo↵ and sediment

192 loading to glacial lakes, will likely lead to a decrease in both diversity and functional

193 potential of the dominant microbial communities residing in lake sediments. Note that bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

10

194 we specifically focused here on the dominant microbes, that is those for which we could

195 reconstruct the MAGs, in order to (i) have a phylogenetic placement of the corresponding

196 organisms based on a large number of marker genes (Figure 2), rather than partial 16S

197 rRNA gene sequences as usually done [23], and (ii) be able to predict almost complete

198 functional pathways for each of these organisms to test the impact of a change of runo↵

199 (Figure 4), rather than inferring function from taxonomic aliation [23].

200 Such a decrease in taxonomic and functional diversity may not be unique to Lake

201 Hazen, where rising temperatures have resulted in increasing glacial melt and associated

202 runo↵. Although such a pattern was not observed in other regions of the globe where

203 runo↵ is predicted to decrease [31, 32], our finding are likely to apply to other, smaller,

204 glacierized watersheds typical of high latitudes or altitudes. Indeed, at least in the Arctic,

205 freshwater discharge is broadly expected to increase with increasing temperatures and

206 precipitation loadings [33, 34, 35]. It would thus be immensely valuable to conduct sim-

207 ilar studies, replicating where appropriate a similar space-for-time design, at other lakes

208 throughout the world. Additional sampling e↵orts should carefully consider the spatial

209 heterogeneity of runo↵ regimes leading to divergent sedimentation rates (Supplementary

210 Table 2), limiting our ability to make temporal predictions.

211 Despite lacking geochemical measurements for the soil samples, we found that the

212 microbial communities in the sediments at the high runo↵ sites clustered most frequently

213 with those in the soil sites (Figure 4), highlighting a connection between terrestrial and

214 aquatic sediment communities as a function of the runo↵ volume, consistent with previous

215 findings [36, 37]. Unsurprisingly, as the soil is likely a source of nutrients (e.g.,DOC)

216 and organic and inorganic particles, we would expect increased runo↵ to the aquatic

217 ecosystems to alter microbial community structure [38]. Some of these structural changes

218 may then alter the functional capacity to metabolize carbon, nitrogen, sulfur compounds bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

11

219 and process toxins such as metals and antibiotics (Supplementary Figure 9). A more

220 experimentally-driven approach, based for instance on in situ incubation and geochemi-

221 cal tracers, would have been necessary to quantify such an interplay between microbial

222 metabolism and geochemical features. Yet, as sediments and nutrients are mostly de-

223 posited during the summer melt months, it can be expected that lake sediments record

224 microbe-driven seasonal changes in their geochemistry. Indeed, high glacial runo↵ is

225 known to bring dense, oxygenated river waters with OC directly to the bottom of the lake

226 [32], stimulating aerobic microbial activity. As a result, the geochemistry recorded along

227 the high runo↵ transect may first reflect a period of greater microbial metabolism, which

228 may actually exceed those in temperate systems [39], eventually followed by low oxygen,

229 low redox, and high NH3 conditions observed here (Figure 1) as oxygen is depleted and

230 anaerobic metabolisms allowed to proceed.

231 At a larger temporal scale, a key question that arises from these results is how changes

232 in hydrological regimes will alter the evolutionary dynamics of microbial communities in

233 lake sediments. Niche di↵erentiation, where the coexistence of ecological opportunities

234 can facilitate species diversification, may explain why sediments along the low runo↵

235 transect hosts a more diverse microbial community than sediments along the high runo↵

236 transect [40]. Presently, climate change is predicted to increase runo↵ in this High Arctic

237 environment [9], and we found evidence suggesting that the increased runo↵ homogenizes

238 community structure. This can be expected to disrupt niche di↵erentiation, and hence to

239 reduce the overall and long-term metabolic capacity in lake sediments. It is currently hard

240 to predict the future microbial ecology of these systems. On the one hand, climate change

241 may diminish species diversification, and lead to highly specialized microbial communities

242 adapted to a homogeneous ecological niche characterized by low oxygen, low redox, and

243 high NH3 concentrations. On the other hand, the seasonal and rapid changes in redox bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

12

244 conditions, predicted to follow the strong but punctual input of oxygen and nutrients

245 during springtime may allow for the development of a short-lived community that eluded

246 our sampling and analysis.

247 The rapid changes that a↵ect Lake Hazen’s watershed in response to climate warming

248 were already known to directly alter its hydrological regime. Here we further provide

249 evidence that a combination of increasing runo↵ and changing geochemical conditions

250 are associated with the reduced diversity and metabolic potential of its dominant micro-

251 bial communities. While longitudinal studies are needed to confirm these patterns, it is

252 still unclear how such losses in biodiversity and metabolic potential in Arctic ecosystems

253 will impact key biogeochemical cycles, potentially creating feedback loops of uncertain

254 direction and magnitude.

255 References

256 1. Overpeck, J. et al. Arctic environmental change of the last four centuries. Science

257 278,1251–1256(1997).

258 2. Serreze, M. C. & Francis, J. A. The Arctic amplification debate. Climatic change

259 76,241–264(2006).

260 3. Screen, J. A. & Simmonds, I. The central role of diminishing sea ice in recent Arctic

261 temperature amplification. Nature 464,1334(2010).

262 4. IPCC. Summary for Policymakers, book section SPM, 1–30 (Cambridge University

263 Press, Cambridge, United Kingdom and New York, NY, USA, 2013). Available at

264 www.climatechange2013.org. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

13

265 5. Laudon, H. et al. Save northern high-latitude catchments. Nature Geoscience 10,

266 324 (2017).

267 6. Bliss, A., Hock, R. & Radi´c, V. Global response of glacier runo↵ to twenty-first

268 century climate change. Journal of Geophysical Research: Earth Surface 119,717–

269 730 (2014).

270 7. O’Reilly, C. M. et al. Rapid and highly variable warming of lake surface waters

271 around the globe. Geophysical Research Letters 42,10–773(2015).

272 8. Frey, K. E. & McClelland, J. W. Impacts of permafrost degradation on Arctic river

273 biogeochemistry. Hydrological Processes: An International Journal 23,169–182

274 (2009).

275 9. Lehnherr, I. et al. The world’s largest High Arctic lake responds rapidly to climate

276 warming. Nature Communications 9,1290(2018).

277 10. Smol, J. P. et al. Climate-driven regime shifts in the biological communities of

278 Arctic lakes. Proceedings of the National Academy of Sciences 102,4397–4402

279 (2005).

280 11. Wrona, F. J. et al. Transitions in Arctic ecosystems: Ecological implications of

281 achanginghydrologicalregime. Journal of Geophysical Research: Biogeosciences

282 121,650–674(2016).

283 12. McCalley, C. K. et al. Methane dynamics regulated by microbial community re-

284 sponse to permafrost thaw. Nature 514,478(2014).

285 13. Hultman, J. et al. Multi-omics of permafrost, active layer and thermokarst bog soil

286 microbiomes. Nature 521,208(2015). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

14

287 14. Mackelprang, R., Saleska, S. R., Jacobsen, C. S., Jansson, J. K. & Ta¸s, N. Per-

288 mafrost meta-omics and climate change. Annual Review of Earth and Planetary

289 Sciences 44,439–462(2016).

290 15. Falkowski, P. G., Fenchel, T. & Delong, E. F. The microbial engines that drive

291 Earth’s biogeochemical cycles. Science 320,1034–1039(2008).

292 16. Fuhrman, J. A. Microbial community structure and its functional implications.

293 Nature 459,193(2009).

294 17. Williamson, C. E., Saros, J. E., Vincent, W. F. & Smol, J. P. Lakes and reser-

295 voirs as sentinels, integrators, and regulators of climate change. Limnology and

296 Oceanography 54,2273–2282(2009).

297 18. Stoeva, M. K. et al. Microbial community structure in lake and wetland sediments

298 from a High Arctic polar desert revealed by targeted transcriptomics. PLoS One

299 9,e89531(2014).

300 19. Thaler, M., Vincent, W. F., Lionard, M., Hamilton, A. K. & Lovejoy, C. Microbial

301 community structure and interannual change in the last epishelf lake ecosystem in

302 the north polar region. Frontiers in Marine Science 3,275(2017).

303 20. Mohit, V., Culley, A., Lovejoy, C., Bouchard, F. & Vincent, W. F. Hidden biofilms

304 in a far northern lake and implications for the changing Arctic. npj Biofilms and

305 Microbiomes 3,17(2017).

306 21. Ruuskanen, M. O., St. Pierre, K. A., St. Louis, V. L., Aris-Brosou, S. & Poulain,

307 A. J. Physicochemical drivers of microbial community structure in sediments of

308 Lake Hazen, , Canada. Frontiers in Microbiology 9,1138(2018). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

15

309 22. Cavaco, M. A. et al. Freshwater microbial community diversity in a rapidly changing

310 high arctic watershed. FEMS Microbiol Ecol 95 (2019).

311 23. Ruuskanen, M. O., St Pierre, K. A., St Louis, V. L., Aris-Brosou, S. & Poulain,

312 A. J. Physicochemical drivers of microbial community structure in sediments of

313 lake hazen, nunavut, canada. Front Microbiol 9,1138(2018).

314 24. Ruuskanen, M. O. et al. Microbial genomes retrieved from high arctic lake sedi-

315 ments encode for adaptation to cold and oligotrophic environments. Limnology and

316 Oceanography (2019).

317 25. K¨ock, G. et al. Bathymetry and sediment geochemistry of Lake Hazen (Quttinir-

318 paaq National Park, , Nunavut). Arctic 56–66 (2012).

319 26. Bowers, R. M. et al. Minimum information about a single amplified genome

320 (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea.

321 Nature biotechnology 35,725(2017).

322 27. Blois, J. L., Williams, J. W., Fitzpatrick, M. C., Jackson, S. T. & Ferrier, S.

323 Space can substitute for time in predicting climate-change e↵ects on biodiversity.

324 Proceedings of the National Academy of Sciences 110,9374–9379(2013).

325 28. Lester, R. E., Close, P. G., Barton, J. L., Pope, A. J. & Brown, S. C. Predicting

326 the likely response of data-poor ecosystems to climate change using space-for-time

327 substitution across domains. Global change biology 20,3471–3481(2014).

328 29. Damgaard, C. A critique of the space-for-time substitution practice in community

329 ecology. Trends Ecol Evol 34,416–421(2019). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

16

330 30. Pickett, S. T. Space-for-time substitution as an alternative to long-term studies.

331 In Long-term studies in ecology, 110–135 (Springer, 1989).

332 31. Huss, M. & Hock, R. Global-scale hydrological response to future glacier mass loss.

333 Nature Climate Change 8,135(2018).

334 32. Pierre, K. S. et al. Contemporary limnology of the rapidly changing glacierized

335 watershed of the world’s largest high arctic lake. Scientific reports 9,4447(2019).

336 33. Peterson, B. J. et al. Increasing river discharge to the arctic ocean. science 298,

337 2171–2173 (2002).

338 34. Rawlins, M. A. et al. Analysis of the arctic system for freshwater cycle intensifi-

339 cation: Observations and expectations. Journal of Climate 23,5715–5737(2010).

340 Available at http://dx.doi.org/10.1175/2010JCLI3421.1.

341 35. Bring, A. et al. Arctic terrestrial hydrology: A synthesis of processes, regional

342 e↵ects, and research challenges. Journal of Geophysical Research: Biogeosciences

343 121,621–649(2016).

344 36. Comte, J., Culley, A. I., Lovejoy, C. & Vincent, W. F. Microbial connectivity and

345 sorting in a High Arctic watershed. The ISME journal 12,2988–3000(2018).

346 37. Ruiz-Gonz´alez, C., Ni˜no-Garc´ıa, J. P. & del Giorgio, P. A. Terrestrial origin of

347 bacterial communities in complex boreal freshwater networks. Ecology letters 18,

348 1198–1206 (2015).

349 38. Le, H. T. et al. Responses of aquatic bacteria to terrestrial runo↵: e↵ects on

350 community structure and key taxonomic groups. Frontiers in microbiology 7,889

351 (2016). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

17

352 39. Probst, A. J. et al. Di↵erential depth distribution of microbial function and puta-

353 tive symbionts through sediment-hosted aquifers in the deep terrestrial subsurface.

354 Nature microbiology 3,328(2018).

355 40. Cordero, O. X. & Polz, M. F. Explaining microbial genomic diversity in light of

356 evolutionary ecology. Nature Reviews Microbiology 12,263(2014).

357 Acknowledgements

358 This study was made possible through a collaborative e↵ort undertaken by Igor Lehn-

359 herr, Stephanie Varty, Victoria Wisniewski (University of Toronto, Mississauga), Charles

360 Talbot (Environment and Climate Change Canada), and Maria Cavaco (University of

361 Alberta). We thank Linda Bonen, Marina Cvetkovska, Manon Ragonnet and Alex Wong

362 for comments and discussions. Funding support was provided by the Natural Science and

363 Engineering Research Council of Canada (VSL, AJP, SAB), ArcticNet Network Centre

364 of Excellence (VSL, AJP), and the Polar Continental Shelf Program (VSL) in Resolute,

365 Nunavut, which provided logistical and financial support.

366 Contributions

367 G.C. and V.S.L. performed sampling, whereas G.C. conducted laboratory analyses. G.C.

368 and S.A.B. performed data analyses. G.C., S.A.B., V.S.L., and A.J.P. designed the study

369 and wrote the manuscript. V.S.L. conducted the microsensor profiles and porewater

370 extractions. G.C., S.A.B., A.J.P., M.R., K.S.P., and V.S.L. reviewed the manuscript.

371 Ethics declarations / Competing interests

372 The authors declare no competing interests. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

18

373 Supplementary information

374 Supplementary information is available for this paper. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

19

375 Figure captions bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

20 ), 2 ). L1 L2 3 C C L2 L1 H2 H1 * Sample PAM PAM Cluster

H1H2

01010200 150 100 50 0

2

- 3 NO NO

µmol·L-

+ sites and dotted 3 - - NO ↵ 2 + - 2 2 NO O L1 L2 Redox C * *

H1H2 Water Depth Water 600 500 400 300 200 100 0

-

0 Cl 3 3 NH OC µg·L- PC1 (42.6%) PC1 4 * pH L1 L2 SO 3 C 3 * CaCO -2

NH ), and ammonia concentration (NH

H1H2

. . . 2.5 2.0 1.5 1.0 4

3 CaCO % ), water depth, percent organic carbon (OC), 3 3 2 1 0 -2 -1 L1 L2

*

(B) PC2 (29.9%) PC2 025 (Dunn Test). If branch tips form a dichotomy or C . 0 * + NO

↵ . (B) Principal component analysis (PCA) showing the

H1H2 2

4 67 3 5

OC P< % 2) concentration (SO 4 L1 L2 Tu r n a b o u t Abbé Abbé Very Watercourse Gilman Snowgoose Snowgoose Henrietta Nesmith Nesmith Henrietta Blister 70°W C Runoff Runoff

* ° Ruggles River 0

C

W

°

°E 30 H1H2 30 600 500 400 300 C-Soil H-Soil

N Redox ° mv H2

60 H1 High runoff

), sulfate (SO

N W °

° ° L2

3

60 E 60 70 L1

N L1 L2 * 72°W

° L-Soil

80 C Low runoff * km

10

W

°

°

H1H2

24 10 8 6

90 E 0 90 sites. There was cient insu data to include soil sites in B and C.

2 2 O µmol·L - N

0

82°N 81.75°N ° W ° *

L1 L2 2 E 120 120 C *

Lake Hazen sampling design and chemical composition. (A) Location of Lake Hazen (red box). Inset

° W H1H2

5 E 150 °

150

. 7.4 7.2 7.6 7.0

° 180 pH (A) (C) ↵ erences in physical and chemical composition of the sediment sites. Vectors display pH, dissolved dioxygen (O Figure 1. map: soil (orange dots)(purple), and low sediment (green), (black and dots)di negligible/control sample (blue) sites runo areredox separated potential, into hydrological nitrates regimes andpercent of nitrites calcium carbonate high (CaCO concentration (NO Individual points represent themedoids mean was values used using to 1 cmasterisks identify intervals indicate clusters. measured significant (C) in di ↵ erences Distributiontrichotomy, the between the of top sites interactions chemical 5 within features cm. thatline for Partitioning group separates sediment around is low sites. not runo ↵ Branches significant. and Long dashes separate high runo bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

21 L2 Samples 0.1 L1 C eota r Thermoplasmatota 2 H Crenoarchaeota* aeota h Halobacte aeota* c r h c 1 r da H r il Asga Nanoa o L-S il o aeota h C-S aeota c r Patescibacteria* h c il r oa o r Dormibacterota H-S Mic Altia UAP2 0.3

0.00

0.75 0.50 1.00

undances b Relative a Relative Chloroflexota* aeota h aeota c r h c r -18 R ya Actinobacteriota* r Eu JdF Hadesa Armatimonadota* Firmicutes Gammaproteobacteria* Cyanobacteria* Dictyoglomota Caldisericota UBP7 Thermotogota Synergistota Bipolaricaulota Deinococcota ia* ota* r t e e t c c y obiota Alphaproteobacteria* r m ophota* r o wba t allbacteria o c W Riflebacteria Dependentiae Fusobacteriota Spirochaetota* errucomicrobiota* lan Lind Elusimic Omnit P V ota r e abilota* t r ionota c ota* r ota* omonadota r cc r o oba vib f c ylomi o Marinisomatota KSB1* Bacteroidota* o h ospinota ospi x r r y et esulfu esul it it Maximum likelihood phylogenetic trees of Lake Hazen genomes based on 120 concatenated bacteria and dell Campylobacterota Acidobacteria* M N N UBP10* M B D D Fermentibacterota WOR-3 Zixibacteria* Latescibacterota Edwardsbacteria UBP1* Eisenbacteria* Gemmatimonadota* Figure 2. 122 concatenated archaea protein-codingcontain genes. Lake Red Hazen Dots: genomes.Euryarchaeota. Lake Bacteria See Hazen tree GitHub genomes. is account rooted Asterisksand for with (*) 3). full Patescibacteria indicate Inset taxonomy and phyla shows tree100%. Archaea that MAG files tree abundance and is across for rooted sites, original with in tree the files 300 (Supplemental high Data quality File genomes 2 for each sample normalized to bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

22

● ● ● ● ● ● 10 ● ● ● ● ● H transect ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● H1 ● ● ● ● ● ● ● ● ● ● ● ● H2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● L1 5 ● ● ● ● ● ● ● ● L2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● C ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● H soil ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 ● ● ● ●● ● ● ● ● ● ● ● L soil ● ● ● ● ● ● ● ● ● ● ● ● ● -SNE 2 ● ● ● ● t ● ● ● ● ● C soil ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5 ● ● ● − ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● L transect ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 10 ● ● ●

− ● ● ● ● ● ● ● ● ●● ● ●● ●

−10 −5 0 5 10 t-SNE 1

Figure 3. t-SNE analysis of genome abundance for each sediment sample. Each of the 300 shown genomes was assigned to the sample where it has the greatest abundance. Shaded arrows display the approximate direction of water flow, from upstream to downstream, for the high (green) and low (purple) transects. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

23 0 s (vertical

L C S H

xidation o CO CO −50

Urease

Cu resistance Cu

xidation o yde h malde r o f

adation r deg acid Amino

DNRA

ate f sul to xidation o Sulfite

xidation o mate r o f

xidation o ate f thiosul

Calvin cycle Calvin

Methanogenesis

adation r deg compound Aromatic

ification r Denit

xidation o Sulfide

y resistance y r Mercu

Amino acid biosynthesis acid Amino

ate reduction ate f Sul

Arsenate reduction Arsenate

gluconeogenesis

ase r e f ans r t Aminoglycoside

N fixation N

ate reduction ate r Nit

ammonia assimilation ammonia

Arsenic resistance Arsenic

TCA cycle TCA

iron reduction iron

Sulfite reduction to sulfide to reduction Sulfite −100

ylation h met y r Mercu

acycline resistance acycline r et T

Slope of pathway counts from Low to High runoff ood Ljungdahl ood W

ification r Nit

Cd Pb resistance Pb Cd

xidation o iron

amphenicol resistance amphenicol r Chlo

glycolysis

Pb Zn resistance or homeostasis or resistance Zn Pb

ite reduction ite r Nit

yde production yde h malde r o f

ugs r d ultiple m

xidation o Arsenite

xidation o methane

Beta lactamase Beta Cd Co Cu Pb Zn resistance Zn Pb Cu Co Cd = 0.0010 −150 P

0

20 10 Frequency (B) 5 os Neg Null P Sign of slope leads to a decrease in diversity. (A) Distribution of the slopes of ↵ 0

L C H S

Nitrospirota

Myxococcota

UBP10

Gemmatimonadota

Crenarchaeota

abilota r ylomi h Met Nanoarchaeota

-values: one-sided binomial test for enrichment in negative slopes. KSB1 Omnitrophota

P

ia r wbacte o Lind

−5

ia r Zixibacte

iota r anobacte y C

matimonadota r A

UBP1

Bacteroidota

ia r Eisenbacte

Slope of taxa counts from Low to High runoff

ucomicrobiota r er V

Spirochaetota

ia r atescibacte P

ia r Alphaproteobacte

iota r Acidobacte

ycetota m Plancto

xota e Chlorofl Transition from low to high runo

ia r Gammaproteobacte

iota r Actinobacte −10 -5 = 7.8x10 P

6 4 2 0 Frequency (A) Figure 4. taxonomic counts as aboth function cases, of counts sites. were (B)models aggregated Distribution (ANOVA) by were of location fitted the types totables; slopes (L estimate leftmost of [Low], the dendrograms pathway C counts slope show [Control],triangle as how of S pointing the a each [Soil], down). location function regression. and of types Insets: H sites. cluster, [High] heatmap transitioning In sites), representations from and of L linear count to H runo ↵ bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1

1 Online methods

2 Sample collection and processing

3 Sediment and soil cores were collected from Lake Hazen (82°N, 71°W: Figure 1A), located

4 within Quttinirpaaq National Park, on northern Ellesmere Island, Nunavut. Sampling

5 took place between May 10 and June 10, 2017, when the lake was still completely ice-

6 covered (Supplementary Table 1). Within the watershed, runo↵ flows from the outlet

7 glaciers along the northwestern shoreline through poorly consolidated river valleys, de-

8 positing sediments at the bottom of Lake Hazen along two transects, the H1/H2 and

9 L1/L2 sites, respectively. The lake then drains via the Ruggles River along its south-

10 eastern shoreline (C sites). The surrounding glacial rivers deliver di↵erent amounts of

11 sediments, nutrients and organic carbon unevenly to the lake as a consequence of hetero-

12 geneous sedimentation rates (Supplementary Table 2). More specifically, the top 5cm of

13 sediments from the deeper low (L2) and high (H2) runo↵ sites represented depositional

14 periods of 30 years (1987-2017) and 6 years (2011-2017), respectively (Supplementary

15 Table 3).

16 Samples were collected along two transects and can be separated into three hydrologi-

17 cal regimes by seasonal runo↵ volume: low (L transect), high (H transect), and negligible

18 runo↵ (C sites) summarized in Supplementary Table 3. Contamination of samples was

19 minimized by wearing non-powdered latex gloves during sample handling and sterilizing

20 all equipment with 10% bleach and 90% ethanol before sample collection. Sediment cores

21 approximately 30cm in length were collected with an UWITEC (Mondsee, Austria) grav-

22 ity corer from five locations: C (overlying water depth: 50 m) far from the direct influence

23 of glacial inflows serving as a control site; L1 (water depth: 50m) and L2 (water depth:

3 24 251 m), at variable distances from a small glacial inflow (Blister Creek, <0.001 km in bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

2

25 summer 2016); and, H1 (water depth: 21m) and H2 (water depth: 253m), located ad-

3 26 jacent to several larger glacial inflows (i.e., the Abb´eRiver, 0.015 km and Snow Goose,

3 27 0.006 km in 2016). The soil samples (S sites) were collected from three sites in the dried

28 streambeds of the tributaries, on the northern shore between the two transects. At each

29 site, for both sediments and soil, five cores were sampled, 3m apartforthesediment ⇠

30 cores, and approximately 1m aparttoaccountforsiteheterogeneity. ⇠

31 For sediment core, one of the five cores were used for microprofiling of oxygen (O2),

32 redox and pH, as well as one core for porewater chemistry and loss on ignition (see [1] for

33 details), and the remaining three cores were combined, prior to their genomic analysis, here

34 again to account for site heterogeneity. For soil samples, three cores per site were collected

35 for DNA analysis, but no additional cores were collected for chemical analyses. As we

36 were mostly interested in the community composition through space, we combined the

37 top 5 cm of sediment and 10 cm of soil for sample extraction and subsequent sequencing.

38 Any remaining length of cores that were used for DNA analysis were discarded. These

39 uppermost layers in the sediment correspond to both the most recent sediment deposition

40 dates [2] and the region of greatest microbial activity [3]. The top of each core was

41 sectioned and placed into Whirlpack bags. These slices were homogenized manually inside

42 of the bags and stored in a 20°C freezer until shipment back to the University of Ottawa 43 where they were then stored at 80°C. Soil samples were transferred into falcon tubes,

44 homogenized, and stored as described above for the lake sediment samples.

45 Samples were thawed overnight and 250-400 mg (wet weight; Supplementary Table

46 4) were then washed in a sterile salt bu↵er (10mM EDTA, 50mM Tris-HCl, 50mM

47 Na2 HPO4 7H2O at pH 8.0) to remove PCR inhibitors [4, 5]. All sample handling was

48 conducted in a stainless-steel laminar flow hood (HEPA 100) treated with UVC radiation

49 and bleach before use. DNA extractions were performed using the DNeasy PowerSoil Kit bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

3

50 (MO BIO Laboratories Inc, Carlsbad, CA, USA), following the kit guidelines, except that

51 the final elution volume was 30 µlinsteadof100µl. DNA integrity was validated with

52 a NanoDrop Spectrometer and PCR combined with electrophoresis of the Glutamine

53 synthetase gene (glnA) as this gene is ubiquitous across microbial life (Supplementary

54 Figure 1 and Supplementary Table 5). Adequate DNA concentrations for sequencing were

55 reached by combining triplicate extractions for a total volume of 45 µlandaconcentration

56 50 ng/µl(SupplementaryTable4).Positiveandnegativecontrolswereusedtoverify

57 the integrity of the PCR amplification. Two kit extraction blanks contained no trace of

58 DNA and were not sequenced.

59 Chemical analyses

60 Redox potential, pH, and dissolved O2 concentration profiles were measured at 100 µM

61 intervals in the field within an hour of collection, using Unisense (Aarhus, Denmark) mi-

62 crosensors connected to a Unisense Field Multimeter. Cores used for porewater chemistry

63 analysis were sectioned in 1 cm intervals into 50 mL falcon tubes, followed by flushing

64 of any headspace with ultra-high-purity nitrogen (N2)beforecapping.Sedimentpore-

65 water was extracted following centrifugation at 4,000rpm. The supernatant was then

66 filtered through 0.45 µmcelluloseacetatefiltersinto15mltubes,andwerefrozenuntil

67 analysis. Concentrations of nitrates and nitrites (NO2 + NO3), and ammonia (NH3),

68 chloride (Cl) were measured in the sediment porewater using a Lachat QuickChem 8500

2 69 FIA Ion Analyzer, while total dissolved phosphorus (TDP) and SO4 were measured in

70 the sediment porewater using an ion chromatograph at the Biogeochemical Analytical

71 Service Laboratory (Department of Biological Sciences, University of Alberta). However,

72 TDP was removed from data analysis because insucient porewater was collected to mea-

73 sure TDP at site C. The centrifuged sediments were retained and percentages of calcium bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

4

74 carbonate (CaCO3)andorganiccarbon(OC)wereestimatedthroughlossonignition[6].

75 The chemical features of the top 5cm of the sediment cores were derived from mea-

76 surements performed at 1 cm intervals throughout the cores. The geochemical properties

77 of each sediment site were summarized using a Principle Component Analysis (PCA)

78 and projections were clustered using Partitioning Around Medoids [7]. The appropriate

79 number of clusters was determined from silhouettes with the R package hopach [8]. The

80 Dunn test [9] was used to compare samples, controlling for multiple comparisons with the

81 Benjamini-Hochberg adjustment.

82 Sequencing and data processing

83 Metagenomic libraries were prepared and sequenced by Genome Quebec on an Illumina

84 HiSeq 2500 platform (Illumina, San Diego, CA, USA; Supplementary Figure 2) on a

85 paired-end 125 bp configuration using Illumina TruSeq LT adapters (read 1: AGATCG-

86 GAAGAGCACACGTCTGAACTCCAGTCAC, and read 2: AGATCGGAAGAGCGTCGT-

87 GTAGGGAAAGAGTGT). The DNA from the eight sites (five sediments, three soils) was

88 sequenced, generating over 150 GB of data. Read count summaries were tracked through-

89 out each step of the pipeline for quality control (Supplementary Figure 3). Low quality

90 reads, adapters, unpaired reads, and low quality bases at the ends of reads were removed to

91 generate quality controlled reads with Trimmomatic (v0.36) [10] using the following argu-

92 ments: phred33, ILLUMINACLIP:TruSeq3-PE-2.fa:3:26:10, LEADING:3 TRAILING:3,

93 SLIDINGWINDOW:4:20, MINLEN:36, CROP:120, HEADCROP:20, AVGQUAL:20. FASTQC

94 (v0.11.8) [11] was then used to confirm that the Illumina adapters were removed and that

95 trimmed sequence lengths were at least 90bp in length with a Phred score of at least 33. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

5

96 Reconstruction of environmental genomes and annotation

97 To reconstruct environmental genomes, metagenomic quality-controlled reads from all

98 samples were coassembled using Megahit [12] software with a k-mer size of 31 and “meta-

99 large” setting (see Supplementary Table 6 for additional summary statistics). EukRep

100 [13] was used to remove any eukaryotic DNA from the contigs prior to the formation

101 of an Anvio (v5) [14] contig database. The contig database was generated by removing

102 contigs under 1000bp, and gene prediction was performed in the Anvio environment.

103 Sequence coverage information was determined for each assembled sca↵old by mapping

104 reads from each sample to the assembled contig database using Bowtie2 [15] with default

105 settings. The resulting SAM files were sorted and converted to BAM files using sam-

106 tools (v0.1.19) [16]. Each BAM file was prepared for Anvio using the “anvi-init-bam”

107 and contig database generated using “anvi-gen-contigs-database”. The contig database

108 and BAM mapping files were further used as input for “anvi-profile”, which generated

109 individual sample profiles for each contig over the minimum length of 2500 bp. These

110 profiles were then combined using “anvi-merge” and summary statistics for abundance

111 and coverage were generated with “anvi-summarise.” Automated binning was performed

112 using CONCOCT [17]. Sca↵olds were binned on the basis of GC content and di↵erential

113 coverage abundance patterns across all eight samples. Manual refinement was done using

114 Anvio’s refine option (Supplementary Table 7). Kaiju [18] was used to classify taxon-

115 omy of the assembled contigs with “anvi-import-taxonomy-for-genes” and aided in the

116 manual refinement process. Open reading frames were predicted with Prodigal (v2.6.3)

117 [19]. Anvio’s custom Hidden Markov models were run, along with NCBIs COG [20] an-

118 notation to identify protein-coding genes. PFAM [21], TIGRFAM [22], GO terms [23],

119 KEGG enzymes and pathways [24], and Metacyc pathways [25] were predicted with Inter-

120 proscan (v5) [26]. These annotations were then combined with the Anvio database with bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

6

121 “anvi-import-functions”.

122 Genome completeness and contamination were evaluated on the presence of a core set

123 of genes using CheckM (v1.0.5) “lineage wf” (Supplementary Table 7) [27]. Only genomes

124 that were at least 50% complete and with less than 10% contamination were further anal-

125 ysed – meeting the MIMAG standard for medium or high-quality genomes [28]. All recov-

126 ered genomes were used to calculate an average amino acid identity across all genomes us-

127 ing compareM (v0.0.23, function “aai wf”; https://github.com/dparks1134/CompareM)

128 [29]. CheckM was used again to identify contigs that were not contained in any of the

129 300 high-quality genomes, that is those whose size ranges from 1000–2500bp. As an

130 attempt to “rescue” these unbinned contigs, an alternative binning algorithm MaxBin

131 (v2.0) [30] was employed. An additional 481 genomes were recovered, but were not in-

132 cluded in further analysis as only 21 genomes were of average completion >65% (Sup-

133 plementary Data 1: https://github.com/colbyga/hazen_metagenome_publication/

134 blob/master/Supplemental_Data_1_maxbin2_unbinned_contigs_summary.csv).

135 Phylogenetic placement of the MAGs

136 Phylogenetic analyses were performed using two di↵erent sets of marker genes from the

137 Genome Taxonomy Database (GTDB): one for bacteria (120 marker genes) and one for

138 archaea (122 marker genes), as previously been used to assign taxonomy to MAGs [31].

139 The marker genes were extracted from each genome by matching Pfam72 (v31) [21] and

140 TIGRFAMs73 (v15.0) [22] annotations from GTDB (v86) [31]. Marker genes from each

141 of the 300 genomes were translated using seqinr [32], selecting the genetic code that

142 returned no in-frame stop codon. As some genomes had multiple copies of a marker

143 gene, duplicated copies were filtered out by keeping the most complete sequence. Marker

144 genes that were missing from some genomes were replaced by indel (gap) characters, bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

7

145 and their concatenated sequences were added those from the reference GTDB sequences.

146 MUSCLE (v3.8.31) [33] was employed to construct the alignment in R (v 3.5.1) [34].

147 Archaeal sequences were removed from the bacterial alignment on the basis of results

148 from CheckM [27] and independently verified using a custom list of archaea specific marker

149 genes. Alignments were then refined using trimAI [35] and the “-gappyout” parameter.

150 FastTree2 [36], recompiled with double precision to resolve short branch lengths, was used

151 to infer maximum likelihood phylogenetic trees from protein sequence alignments under

152 the WAG + model [37, 38, 39]. The archaeal tree was rooted with Euryarchaeota and

153 the bacterial tree was rooted with Patescibacteria using APE [40]. Trees were visualized

154 and colored by phylum with ggtree [41].

155 Community composition of the MAGs

156 To determine the relative abundance of each genome in the eight samples, sample-specific

157 genome abundances were normalized by sequencing depth [(reads mapped to a genome) /

158 (total number of reads mapped)], making comparisons across samples possible. Genome

159 abundances were generated using the CheckM “profile” function [27]. To determine the

160 average abundance of major taxonomic groups across sites (determined by the phyloge-

161 netic analysis described above), the abundances for genomes from the same taxonomic

162 group were summed and visualized using phyloseq [42] (usually at the phylum level, un-

163 less otherwise stated). These same abundance values were the basis for a community

164 composition analysis. The t-SNE plots were constructed by assigning each genome to a

165 site based on where it was most abundant using Rtsne [43]. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

8

166 Metabolic potential of the MAGs

167 To analyze functional marker genes in the metagenomes, we used a custom database of

168 reference proteins sequences (COG, PFAM, TIGRFAM, KEGG) based on the marker

169 genes used in other studies [44, 45] (Supplementary Data Files on GitHub). Pathways

170 were also predicted using MinPath [46] to map all identified KEGG enzymes to the most

171 parsimonious MetaCyc pathways [25]. As these MAGs were incomplete, some genes in

172 pathways may be absent. MinPath presented only parsimonious pathways represented by

173 multiple genes. As most genomes were present even at low abundances across all sites,

174 acut-o↵valueof 0.25 (on a log scale) was set for a genome to be included in the  10

175 functional analyses at any site, so that only the most abundant genomes for each site

176 were considered. We aggregated marker genes and pathways by function, summarizing

177 the results by phyla, except for Proteobacteria that was separated by class. We further

178 grouped all taxa together at each site to test for significant di↵erences in major nutri-

179 ent cycling processes (carbon, nitrogen, and sulphur) among sites using a hierarchical

180 clustering; significance was derived from the Approximately Unbiased bootstrap [47] and

181 Fisher’s exact test.

182 Data availability

183 Scripts and supplemental data files can be accessed from https://github.com/colbyga/

184 hazen_metagenome_publication. Raw sequence reads of the shotgun metagenomic data

185 were submitted to the sequence read archive (SRA) under accession no. SRP218124 and

186 under Bioproject PRJNA556841. The geochemical data were submitted to the National

187 Science Foundation’s (NSF) Arctic Data Center repository under doi:10.18739/A20R9M41W. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

9

188 References

189 1. Ruuskanen, M. O., St. Pierre, K. A., St. Louis, V. L., Aris-Brosou, S. & Poulain,

190 A. J. Physicochemical drivers of microbial community structure in sediments of

191 Lake Hazen, Nunavut, Canada. Frontiers in Microbiology 9,1138(2018).

192 2. Pierre, K. S. et al. Contemporary limnology of the rapidly changing glacierized

193 watershed of the world’s largest high arctic lake. Scientific reports 9,4447(2019).

194 3. Haglund, A.-L., Lantz, P., T¨ornblom, E. & Tranvik, L. Depth distribution of active

195 bacteria and bacterial activity in lake sediment. FEMS Microbiology Ecology 46,

196 31–38 (2003).

197 4. Zhou, J., Bruns, M. A. & Tiedje, J. M. DNA recovery from soils of diverse compo-

198 sition. Applied and environmental microbiology 62,316–322(1996).

199 5. Poulain, A. J. et al. Microbial DNA records historical delivery of anthropogenic

200 mercury. The ISME journal 9,2541(2015).

201 6. Heiri, O., Lotter, A. F. & Lemcke, G. Loss on ignition as a method for estimating

202 organic and carbonate content in sediments: reproducibility and comparability of

203 results. Journal of paleolimnology 25,101–110(2001).

204 7. Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M. & Hornik, K. cluster: Cluster

205 Analysis Basics and Extensions (2019). R package version 2.1.0 — For new features,

206 see the ’Changelog’ file (in the package source).

207 8. van der Laan, M. J. & Pollard, K. S. Hybrid clustering of gene expression data with

208 visualization and the bootstrap. Journal of Statistical Planning and Infererence

209 117,275–303(2003). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

10

210 9. Dinno, A. Dunn’s Test of Multiple Comparisons Using Rank Sums (2017). R

211 package version 1.3.5 — For new features, see the ’Changelog’ file (in the package

212 source).

213 10. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina

214 sequence data. Bioinformatics 30,2114–2120(2014).

215 11. Andrews, S. et al. Fastqc: a quality control tool for high throughput sequence data

216 (2010).

217 12. Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast

218 single-node solution for large and complex metagenomics assembly via succinct de

219 bruijn graph. Bioinformatics 31,1674–1676(2015).

220 13. West, P. T., Probst, A. J., Grigoriev, I. V., Thomas, B. C. & Banfield, J. F.

221 Genome-reconstruction for eukaryotes from complex natural microbial communi-

222 ties. Genome Res 28,569–580(2018).

223 14. Eren, A. M. et al. Anvio: an advanced analysis and visualization platform for omics

224 data. PeerJ 3,1319(2015).

225 15. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature

226 Methods 9,357(2012).

227 16. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics

228 25,2078–2079(2009).

229 17. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition.

230 Nature methods 11,1144(2014). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

11

231 18. Menzel, P., Ng, K. L. & Krogh, A. Fast and sensitive taxonomic classification for

232 metagenomics with Kaiju. Nature Communications 7,11257(2016).

233 19. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation

234 site identification. BMC bioinformatics 11,119(2010).

235 20. Tatusov, R. L. et al. The COG database: an updated version includes eukaryotes.

236 BMC bioinformatics 4,41(2003).

237 21. Finn, R. D. et al. The Pfam protein families database: towards a more sustainable

238 future. Nucleic acids research 44,D279–D285(2015).

239 22. Haft, D. H., Selengut, J. D. & White, O. The TIGRFAMs database of protein

240 families. Nucleic acids research 31,371–373(2003).

241 23. Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nature

242 genetics 25,25(2000).

243 24. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as

244 areferenceresourceforgeneandproteinannotation. Nucleic acids research 44,

245 D457–D462 (2015).

246 25. Caspi, R. et al. The MetaCyc database of metabolic pathways and enzymes and

247 the BioCyc collection of pathway/genome databases. Nucleic acids research 36,

248 D623–D631 (2007).

249 26. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioin-

250 formatics 30,1236–1240(2014). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

12

251 27. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W.

252 CheckM: assessing the quality of microbial genomes recovered from isolates, single

253 cells, and metagenomes. Genome research 25,1043–1055(2015).

254 28. Bowers, R. M. et al. Minimum information about a single amplified genome

255 (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea.

256 Nature biotechnology 35,725(2017).

257 29. Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes sub-

258 stantially expands the tree of life. Nature microbiology 2,1533(2017).

259 30. Wu, Y.-W., Simmons, B. A. & Singer, S. W. MaxBin 2.0: an automated binning

260 algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics

261 32,605–607(2015).

262 31. Parks, D. H. et al. Astandardizedbacterialtaxonomybasedongenomephylogeny

263 substantially revises the tree of life. Nature biotechnology 36,996–1004(2018).

264 32. Charif, D. & Lobry, J. SeqinR 1.0.2: a contributed package to the R project for

265 statistical computing devoted to biological sequences retrieval and analysis. In

266 Bastolla, U., Porto, M., Roman, H. & Vendruscolo, M. (eds.) Structural approaches

267 to sequence evolution: Molecules, networks, populations,BiologicalandMedical

268 Physics, Biomedical Engineering, 207–232 (Springer Verlag, New York, 2007). ISBN

269 :978-3-540-35305-8.

270 33. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high

271 throughput. Nucleic acids research 32,1792–1797(2004). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

13

272 34. RDevelopmentCoreTeam.R: A Language and Environment for Statistical Com-

273 puting. R Foundation for Statistical Computing, Vienna, Austria (2008). Available

274 at http://www.R-project.org.ISBN3-900051-07-0.

275 35. Capella-Guti´errez, S., Silla-Mart´ınez, J. M. & Gabald´on, T. trimAl: a tool for

276 automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics

277 25,1972–1973(2009).

278 36. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2: approximately maximum-

279 likelihood trees for large alignments. PloS one 5,e9490(2010).

280 37. Whelan, S. & Goldman, N. A general empirical model of protein evolution derived

281 from multiple protein families using a maximum-likelihood approach. Molecular

282 biology and evolution 18,691–699(2001).

283 38. Aris-Brosou, S. & Rodrigue, N. The essentials of computational molecular evolu-

284 tion. In Evolutionary Genomics, 111–152 (Springer, 2012).

285 39. Aris-Brosou, S. & Rodrigue, N. A not-so-long introduction to computational molec-

286 ular evolution. Methods Mol Biol 1910,71–117(2019).

287 40. Paradis, E., Claude, J. & Strimmer, K. APE: analyses of phylogenetics and evolu-

288 tion in R language. Bioinformatics 20,289–290(2004).

289 41. Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T.-Y. ggtree: an R package for

290 visualization and annotation of phylogenetic trees with their covariates and other

291 associated data. Methods in Ecology and Evolution 8,28–36(2017).

292 42. McMurdie, P. J. & Holmes, S. phyloseq: an R package for reproducible interactive

293 analysis and graphics of microbiome census data. PloS one 8,e61217(2013). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

14

294 43. Krijthe, J., van der Maaten, L. & Krijthe, M. J. Package ‘Rtsne’ (2018).

295 44. Anantharaman, K. et al. Thousands of microbial genomes shed light on intercon-

296 nected biogeochemical processes in an aquifer system. Nature Communications 7,

297 13219 (2016).

298 45. Dombrowski, N., Teske, A. P. & Baker, B. J. Expansive microbial metabolic versa-

299 tility and biodiversity in dynamic Guaymas Basin hydrothermal sediments. Nature

300 Communications 9,4999(2018).

301 46. Ye, Y. & Doak, T. A parsimony approach to biological pathway reconstruc-

302 tion/inference for genomes and metagenomes. PLoS Computational Biology 5,

303 e1000465 (2009).

304 47. Suzuki, R. & Shimodaira, H. Pvclust: an R package for assessing the uncertainty

305 in hierarchical clustering. Bioinformatics 22,1540–1542(2006). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

Supplementary Text

Nutrient cycles a↵ected when transitioning from low to high Overall, markers of carbon and sulphur metabolism significantly decreased when tran- sitioning from the L to H sites, even if nitrogen metabolism was not (Tab. S8). Most carbon pathways, such as carbon fixation through the Calvin-Benson-Bassham (CBB) pathway, as well as the capacity for simple carbon metabolism, were shared across all runo↵ regimes. In contrast, carbon oxidation and reduction reactions regulated through Wood-Ljungdahl pathway were only observed in the H sites, where sedimentary conditions were anoxic throughout the first 5cm (Figs.S4, S9). Here, Spirochaetota were likely per- forming anaerobic respiration and carbon fixation producing acetate as an end product. Methanogenesis pathways were present across all sites, but notably, methane oxidation pathways were absent from high runo↵ sites, where oxygen is limited. Greater concentrations of ammonia in the high runo↵ regimes may suggest that N- containing organic matter was mineralised through ammonification (Fig.S8). In the high runo↵ regime, there was both an absence of nitrification and a greater presence of mark- ers for ammonia assimilation. Markers for dissimilarity nitrate reduction (DNRA) were present in multiple genomes across all runo↵ regimes (Fig.S8). In contrast, urease mark- ers were found more abundantly in low runo↵ regimes, where ammonia concentration was lower (Fig.S5). The functional ability of microbes to cycle sulphur between oxidised and reduced forms was significantly di↵erent between the high and low runo↵ regimes (Tab. S8). In the high runo↵ regime, Gammaproteobactiera were the only organisms with the metabolic capacity to expansively utilise sulphur, performing sulphide oxidation and thiosulphate reduction. Whereas, sulphate reduction was predominantly found in the soil, control, and low runo↵ regimes. Aside from nutrient cycling, we also assessed the capacity of microbial communities to process metals and antibiotics. Metal resistance and cycling was mostly ubiquitous throughout all of the sites, regardless of runo↵. Methyl mercury production, identified by the presence of both hgcA and hgcB genes [1], was only implicated in the high runo↵ sites, in Spirocheatoa and Chloroflexoata. However, genes conferring mercury resistance involved in the conversion of inorganic HgII to the less toxic Hg0 –wereevenlydistributed throughout the sites. There was a broad presence of metal tolerance that was indicated by genetic determinants related to heavy metal resistance of cadmium, cobalt, copper, lead and zinc (Fig.S9). Furthermore, antimicrobial resistance genes specifically -lactamases, were ubiquitous across all genomes at all sites. We identified 90 genomes with antibiotic resistance genes (Fig.S??). Drug resistance was prevalent throughout all phyla, including one Creanarchaeota (archaeal) genome (Figs.S9, S??). Finally, we found that while amino acids were readily synthesised and degraded by most organisms (Fig.S9), the degradation of polycyclic aromatic compounds appeared to be least prevalent in the high runo↵ sites.

Page S1 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

Sites clustered by runo↵ regime To identify potential drivers of this reduction in diversity when going from the L to the H transects, we bi-clustered genomes by their normalised abundances (on a -log10 scale to reduce skew) and by sample site (Fig. S12), and found that sites clustered following a similar pattern to geochemical features (see Fig. 1B), with H sites grouping separately from L sites (Fig. 3A). The normalised abundances of MAGs showed no strong phylogeographic pattern, in that we did not observe an assemblage of MAGs solely representative of a given site (Fig. S13). In spite of this absence of phylogeographic pattern, the tanglegram suggests that the beta diversity of highly abundant MAGs in the L/C sites was greater than at the H sites (see the distribution of green lines connecting the phylogenetic and clustered trees in Tab. S13). This di↵erence in diversity between samples is further supported by an NDMS ordination (Fig. S14) and a PERMANOVA test on a PCoA ordination (Fig. S14).

References

1. Parks JM, Johs A, Podar M, Bridou R, Hurt RA, Smith SD, et al. The genetic basis for bacterial mercury methylation. Science. 2013;339(6125):1332–1335.

2. Pierre KS, Louis VS, Lehnherr I, Schi↵ S, Muir D, Poulain A, et al. Contemporary limnology of the rapidly changing glacierized watershed of the world’s largest High Arctic lake. Scientific reports. 2019;9(1):4447.

Page S2 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

Supplementary Figures Space (A) Space Space Space Space E.coli

H20 H2 L2 H1 L1 L-Soil H-Soil C-Soil Mix C B1 B2 1 23 1 23 1 23 1 23 1 23 1 23 1 23 1 23 1350 1350

1350 1350 150 150 100 100 50 50 150 150 100 100 50 50 (B) L-Soil L-Soil H2 H2 L-Soil H2 1 2 E.coli3 H20 12 3 1 1 2 3 1 2 3 12 3

1350 1350

150 150 100 100 50 50

Supplementary Figure 1. Gel electrophoresis images of the glnA gene for each sample extracted in triplicate. (A) Initial gel results using 1 µl of DNA for each PCR reaction. Every sample contains glnA except L-Soil and H2. (B) Repeated PCR reaction for L-Soil and H2. Diluted DNA concentrations for PCR from 1 µlto0.1µland0.5µl.

Page S3 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

L-Soil H-Soil C-Soil

L2 H2 C

L1 H1

Supplementary Figure 2. Illumina HiSeq DNA sequencing library validation from an Agilent Technology 2100 Bioanalyzer. All libraries were constructed using Illumina TruSeq DNA PCR-Free Library Prep. The library results for each of the eight samples is presented and labelled above the panel: L-Soil, H-Soil, C-Soil, L2, H2, C, L1, H1. Peaks between 300-600bp indicate the presence of DNA. Peaks below 1bp and above 1000bp are internal standards.

Page S4 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

150,000,000

100,000,000 Reads

50,000,000

0 C initial C L1 initial L1 initial L2 H1 initial H1 initial H2 C trimmedC L1 trimmedL1 trimmedL2 L Soil initial Soil L H1 trimmedH1 trimmedH2 C Soil initial Soil C initial Soil H L Soil trimmed Soil L C Soil trimmed Soil C trimmed Soil H C mapped anvio mapped C L1 mapped anvio mapped L1 anvio mapped L2 H1 mapped anvio mapped H1 anvio mapped H2 C mapped checkm mapped C L1 mapped checkm mapped L1 checkm mapped L2 H1 mapped checkm mapped H1 checkm mapped H2 L Soil mapped anvio mapped Soil L C Soil mapped anvio mapped Soil C anvio mapped Soil H L Soil mapped checkm mapped Soil L C Soil mapped checkm mapped Soil C checkm mapped Soil H Stages of Metagenome Processing

Supplementary Figure 3. Number of reads throughout analysis. Reads are successively decreasing, except for the reads mapped to the Anvio contig database, as each read can map to more than one contig and be counted more than once. The reads mapped using checkM include only reads that map to the 300 high quality reconstructed genomes, opposed to all reconstructed genomes in the Anvio mapping.

Page S5 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

0 0

−1 −1

−2 −2

depth cm depth −3 cm depth −3

−4 −4

−5 −5 0 5 10 7.00 7.25 7.50 7.75

O2 µmol/L pH

0

● H1● H2● L1● L2 ● C −1

−2

depth cm depth −3

−4

−5 200 300 400 500 600 redox mV

Supplementary Figure 4. Sediment microprobe profiles for oxygen (O2), pH, and redox measured in 100 µm intervals. Legend (bottom right) indicates the colour of each sediment site in the profiles.

Page S6 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

0 0 ● ● ● ●● ●● ● ● ● −1 −1 ● ● ● ● ● ●● ● ● ● −2 −2 ● ● ● ●● ● ● ● ● ● −3 −3 ● ● ● ● ● ● ● ● ●

depth cm depth −4 cm depth −4 ● ● ● ● ● ●● ● ● −5 −5 34567 0 50 100 150 200 -- Organic Carbon % NO2 + NO3 ug/L

0 0 ●●● ● ● ●● ● ● ● −1 −1 ● ● ● ● ● ● ● ● ● ● −2 −2 ● ● ● ● ● ●● ● ● ● −3 −3 ● ● ● ● ● ● ● ● ● depth cm depth depth cm depth −4 −4 ● ● ● ● ● ●● ● ● −5 −5 1.0 1.5 2.0 2.5 0 250 500 750 1000 NH ug/L CaCO3 % 3 0 0 ● ● ● ● ● ● ● ● ● −1 −1 ● ● ● ● ● ●●● ● ● −2 −2 ● ● ● ● ● ● ● ● ●● −3 −3 ● ● ● ●● ●

depth cm depth −4 cm depth −4 ● ● ● ● ● ● ● −5 −5 0.2 0.4 0.6 0 10 20 30 - Cl mg/L SO4 mg/L

0 ● ● ● ● ● −1 ● ● ● ● −2 ● H1● H2● L1● L2 ● C ● ● ● ● −3 ● ● ●

depth cm depth −4 ● ● ● ● −5 10 20 30 40 TDP ug/L

Supplementary Figure 5. Sediment porewater profiles for organic carbon, nitrite and nitrates, calcium carbonate, ammonia, chlorine, sulphate, and total dissolved phosphorus (TDP). TDP was removed when producing PCA and boxplots in Figure 1 because of incomplete measurements in C and L2. Legend (bottom right) indicates the colour of each sediment site in the profiles.

Page S7 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

* * (A) * *

* 600 7.6

O 6 8 10 pH 2 Redox µmol·L- mv 300 400 500 7.0 7.2 7.4 024

H1 H2 C L1 L2 H1 H2 C L1 L2 H1 H2 C L1 L2

* (B) * * * * * 2.5 0.5 0.6 20 500 600 67

* * 150 200 - - - NH3 NO2 +NO3 Cl * SO4 %OC %CaCO 3 µg·L- µmol·L- mg·L- mg·L- 10 15

* 1.0 1.5 2.0 0.1 0.2 0.3 0.4 345 0 50 100 0 100 200 300 400 H1 H2 C L1 L2 H1 H2 C L1 L2 H1 H2 C L1 L2 H1 H2 C L1 L2 H1 H2 C L1 L2 H1 H2 C L1 L2 Supplementary Figure 6. Distribution of all chemical features for sediment sites. Includes chlorine and sulphate measurements absent from Figure 1. Branches and asterisks indicate significant di↵erences between sites P<0.025 (Dunn Test). If branches form a dichotomy or trichotomy, the interactions within that group is not significant. (A) Microprobe measurements collected at every 100 µm. (B) Porewater measurements collected from bulk 1 cm intervals.

Page S8 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

Acidobacteriota

Actinobacteriota 60.7 (A) 1.00 (B) 60 Alphaproteobacteria 55.4 Armatimonadota Bacteroidota 51.3 49.6 Chloroflexota Crenarchaeota 0.75 Cyanobacteriota Eisenbacteria 40 Gammaproteobacteria 36.1 Gemmatimonadota

0.50 KSB1 Lindowbacteria Methylomirabilota 20.3 20.9 Abundance Myxococcota 20 Nanoarchaeota 14.7 0.25 Nitrospirota Omnitrophota % of Reads Mapped to MAGs to Mapped Reads of % Patescibacteria Planctomycetota Spirochaetota 0.00 0 UBP1 H-Soil C-Soil L-Soil H1 H2 C L1 L2 UBP10 C-SoilH-Soil L-Soil H1 H2 C L1 L2 Sample Verrucomicrobiota Sample Zixibacteria

Supplementary Figure 7. Reconstructed genome abundance across sites. (A) Amount of reads mapped to the 300 high quality genomes for each sample normalised to 100%. Only reads that were mapped to genomes are shown and all unmapped reads have been excluded. (B) Amount of reads mapped to the 300 high quality genomes for each sample.

Page S9 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

● Samples ●● ● ● ● Taxa ● ● ● Acidobacteriota ● ● ● ● Actinobacteriota ● ● H2 ● ● OC ● Alphaproteobacteria 0.3 ● ● ● ● ● ● Bacteroidota ●● Water ● ● Chloroflexota ● Depth SO4 ● ● ● ● Eisenbacteria ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● Gammaproteobacteria NH3 ● ●● ●● ● ●● ● ● ●● ● ●● ● Gemmatimonadota ● ● C ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● Methylomirabilota ● ● ● ● 0.0 ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Myxococcota ● ● ● ● NMDS2 ● ● ● ● ● ● ● ● ● ● ● ● CaCO3 ● ● ● ● ● ● ● Nitrospirota ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●●● ● ● ● ● ● ● O ● ● ● ● ● ● ● ●● ●● 2 ● ● ●● ● ● ● pH ● ● ● ● ● ● Omnitrophota ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ●● ● ● ● ● Patescibacteria ● ●● ●●●●● ●● ● ● ● ● ● ●●● ●● ● ● ●●● ●● ●● ●●● ● ● ● ● ●● H1 ●●● ● ●●●●● Planctomycetota ● L2 ●● ● Spirochaetota −0.3 L1 Redox ● UBP1 - - NO3 +NO2 ● UBP10 Cl- ● Verrucomicrobiota

● Crenarchaeota −0.6 −0.3 0.0 0.3 ● Nanoarchaeota NMDS1 Supplementary Figure 8. NMDS analysis of genomes from 20 most abundant phyla in sediment sites. Physical and chemical vectors are fitted to data. Red vectors are significant (permutation test: P<0.05).

Page S10 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

Soils: High Runoff: Low Runoff: Control: C H-Soil, L-Soil, H1 & H2 L1 & L2 & C-Soil - 50-100% of MAGs 30-49% of MAGs 0-29% of MAGs Actinobacter-iota(1) Alphaproteobacteria(4) Actinobacteriota(3) Acidobacteriota(3) Alphaproteobacteria(2) Actinobacteriota(1) Acidobacteriota(3) Alphaproteobacteria(1) Actinobacteriota(12) Alphaproteobacteria(1) Verrucomicrobiota(3) Verrucomicrobiota(2) Verrucomicrobiota(1) Chloroflexota(3) Chloroflexota(2) Chloroflexota(2) Crenarchaeota(2) Spirochaetota(3) Gammaproteobacteria(8) Gemmatimonadota(3) Gammaproteobacteria(4) Gemmatimonadota(2) Gammaproteobacteria(1) Chloroflexota(2) Crenarchaeota(1) Gammaproteobacteria(5) Bacteroidota(2) Myxococcota(3) Methylomirabilota(2) Myxococcota(2) KSB1(1) Nitrospirota(2) Nitrospirota(2) Bacteroidota(1) Methylomirabilota(2) UBP1(1) UBP10(4) UBP1(1) UBP10(2) Patescibacteria(5) Eisenbacteria(1) Planctomycetota(7) Eisenbacteria(1) Planctomycetota(5) Planctomycetota(3) UBP10(1) Wood–Ljungdahl Calvin cycle TCA cycle CO oxidation Gluconeogenesis Glycolysis Methane oxidation Carbon Methanogenesis Formaldehyde oxidation Formaldehyde production Formate oxidation N fixation Nitrification Denitrification DNRA Nitrate reduction Nitrogen Nitrite reduction Ammonia Assimilation Urease Sulfate reduction Sulfide oxidation Sulfite oxidation Sulfur Thiosulfate reduction Sulfite reduction (reversible) Iron oxidation Iron reduction Arsenite oxidation Arsenate reduction orr 1.0

0.5

0.0 −0.5 Arsenic resistance −1.0 Cd/Co/Cu/Pb/Zn resistance Metals Cd/Pb resistance Cu resistance Pb/Zn resistance Mercury methylation Mercury reduction Aminoglycoside transferase Tetracycline resistance Beta−lactamase Drugs Multiple drugs Chloramphenicol resistance Amino acid biosynthesis Amino acid degradation Other Aromatic compound degradation Supplementary Figure 9. Metabolic capacity of genomes separated by hydrological

regime. Genomes are only considered to contribute to a site if 0.56% ( log10 0.25) of reads per sample mapped to a genome. Presence of core metabolic genes involved in carbon metabolism, nitrogen metabolism, sulphur metabolism, metal cycling, antibiotic resistance, and other metabolism are shown. Number of genomes for each taxa are shown in parentheses. Blank: genes predicting function are absent or in low abundance. Shaded colours: genes predicting function are present in 30-50% of genomes per phylogenetic group. Dark colours: genes predicting function are present in >50% of genomes per phylogenetic group.

Page S11 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

au bp edge # 0.20

95 46 6 h1

Height 97 33 96 71 5 4 0.10 0.15

94 53 90 70 h2 3 2 h_soil

100 95 control

l_soil 1 c_soil l1 l2 0.00 0.05

Supplementary Figure 10. Metabolic capacity for carbon, sulphur, and nitrogen cycles clustered with P -values via multiscale bootstrap resampling. AU (approximately unbiased) P -values are shown in red, with any value > 0.95 (significance level 0.05) being significant. BP (bootstrap proportions) values are shown in green. Hierarchical clustering was completed using “correlation” as distance method and “average” as clustering method.

Page S12 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

80 80

75 60 60

50 40 40

25 Number of Bins of Number 20 20

0 0 0 −20 24 −20 24 −20 24 H−Soil Abundance L−Soil Abundance C−Soil Abundance

60 40

30 30 40

20 20

20

Number of Bins of Number 10 10

0 0 0 −20 24 −20 24 −20 24 L2 Abundance L1 Abundance C Abundance

30 30

20 20

10 Number of Bins of Number 10

0 0 −20 24 −20 24 H2 Abundance H1 Abundance

Supplementary Figure 11. Genome abundance per sample ( log scale). 10

Page S13 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

(A) Count

Abundance 0 80 Phylogeny −202 Row Z−Score

Bin_22_6 Bin_16_6 Bin_26_16 Bin_54_2 Bin_33_2 Bin_26_3 Bin_26_5 Bin_4_2 Bin_25_6 Bin_4_24 Bin_44_5 Bin_17_14 Bin_13_8 Bin_8_9 Bin_17_4 Bin_8_7_2 Bin_48_8 Bin_48_7 Bin_48_3 Bin_48_4 Bin_6_9 Bin_6_12 Bin_6_5 Bin_9_15 Bin_6_4 Bin_9_7 Bin_23_3 Bin_6_2 Bin_9_12 Bin_56_2 Bin_40_4 Bin_42_5 Bin_45_3 Bin_45_2 Bin_4_34 Bin_13_10 Bin_36_7 Bin_13_6 Bin_48_2 Bin_43_8 Bin_2_22 Bin_1_10 Bin_3_9

Bin_63_6 Bin_3_15 Bin_72_7 Bin_25_3 Bin_28_13 Bin_13_5 Bin_3_17 Bin_4_10

Bin_4_20_2 Bin_72_9 Bin_4_30 Bin_1_17 Bin_8_15 Bin_63_5 Bin_40_5 Bin_40_6 Bin_22_5 Bin_22_10 Bin_22_18 Bin_22_14 Bin_25_12 Bin_56_7 Bin_64_3 Bin_64_5 Bin_56_4 Bin_56_3 Bin_62_6 Bin_64_7 Bin_64_4 Bin_64_8 Bin_37_3 Bin_39_4 Bin_26_14 Bin_39_7 Bin_39_6

Bin_64_2 Bin_52_2 Bin_4_3 Bin_17_15 Bin_73_3 Bin_73_4 Bin_28_3 Bin_74_3 Bin_79_3 Bin_23_8 Bin_28_8 Bin_28_5 Bin_68_6 Bin_28_9 Bin_28_10 Bin_25_4 Bin_28_7 Bin_25_9 Bin_28_2

Bin_73_7 Bin_16_2 Bin_42_2 Bin_35_3 Bin_12_6 Bin_35_4 Bin_30_2 Bin_24_8 Bin_24_3 Bin_12_12 Bin_35_6 Bin_12_13 Bin_59_2 Bin_46_7 Bin_22_16 Bin_73_2 Bin_70_4 Bin_65_2 Bin_79_2 Bin_79_4 Bin_5_12 Bin_24_4 Bin_65_4 Bin_42_10 Bin_65_7 Bin_59_3 Bin_9_14 Bin_43_6 Bin_16_5 Bin_76_2 Bin_20_9 Bin_5_3 Bin_13_9 Bin_73_6 Bin_71_2_2 Bin_9_16 Bin_68_3 Bin_32_7 Bin_32_5 Bin_32_8 Bin_32_6 Bin_4_14 Bin_4_12 Bin_38_2 Bin_71_4 Bin_11_4 Bin_61_2 Bin_71_3 Bin_46_6 Bin_11_9 Bin_46_5 Bin_46_3 Bin_70_7 Bin_18_4 Bin_70_6 Bin_11_5 Bin_18_8 Bin_18_7 Bin_49_3 Bin_49_5 Bin_70_5 Bin_75_5 Bin_51_4 Bin_51_6 Bin_36_5 Bin_36_2 Bin_13_3 Bin_36_15 Bin_73_5 Bin_32_4 Bin_68_5 Bin_66_2 Bin_34_5 Bin_19_3 Bin_34_7 Bin_69_2 Bin_44_3 Bin_68_4 Bin_68_2 Bin_44_2 Bin_58_3 Bin_68_9 Bin_68_8 Bin_58_8 Bin_58_4 Bin_17_12 Bin_70_8 Bin_23_5 Bin_12_2 Bin_70_3 Bin_21_5 Bin_72_3 Bin_58_2 Bin_75_4 Bin_21_4 Bin_30_7 Bin_18_2_2 Bin_21_8 Bin_21_12

Bin_49_2 Bin_12_8 Bin_21_3_2 Bin_31_7 Bin_57_6 Bin_47_5 Bin_44_4 Bin_75_6 Bin_12_3 Bin_29_9 Bin_15_10 Bin_58_9 Bin_15_16 Bin_58_5 Bin_29_2 Bin_29_3 Bin_50_4 Bin_7_3_2 Bin_33_15 Bin_31_4 Bin_27_2

Bin_11_7 Bin_43_4 Bin_43_3 Bin_42_14 Bin_47_4 Bin_41_2 Bin_30_4 Bin_53_5 Bin_31_3 Bin_79_5 Bin_15_3 Bin_15_14 Bin_15_15 Bin_15_9 Bin_79_6 Bin_78_2 Bin_72_5 Bin_15_7 Bin_15_12 Bin_15_5 Bin_58_7 Bin_53_2 Bin_29_4 Bin_33_6 Bin_15_6 Bin_7_14 Bin_31_6 Bin_11_2

0.0 0.5 1.0 1.5 2.0 C L2 L1

H2 H1 0.5 soil soil soil − − − L (B) H C Count 02 4

−202 Row Z−Score

Bin_37_6

Bin_37_9

Bin_37_10

Bin_37_2

Bin_80_8

Bin_80_4

Bin_80_9

Bin_80_10

Bin_54_9

Bin_54_7

0.0 0.5 1.0 1.5 2.0

C 0.5 L1 L2 H1 H2 soil soil soil − − − L H C

Supplementary Figure 12. Genome abundance heatmap, and tanglegram separated by (A) bacterial genomes and (B) archaeal genomes. Left: The heatmap displays genome abundance per site normalised by amount of reads in each sample and

transformed to a log10 scale. Dotted yellow lines represent mean abundance values, and yellow traces represent the raw z-scores above (red) and below (blue) the mean. The abundance values are grouped both by sites (top dendrogram) and genomes (left dendrogram). Right: Tanglegram between dendrogram clustered by similar abundances and phylogenetic tree. Highlights in tanglegram: orange lines are genomes abundant in soil, green lines are genomes abundant in low runo↵ sediment, purple lines are genomes abundant in high runo↵ sediment, black lines are genomes shared in multiple environments.

Page S14 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

100 100 88 100 100 100 96 100 100 100 100

100 100 87 100

34 100 100 100 100 100 100 96 100 100 77

85 100 100 100 100 100 93

78 100

100 100 100 100

100 96 100

95 100 100 Gammaproteobacteria 100 100 40 100 Alphaproteobacteria

100 90 54 UBP10 100 100 100 100 88 100 Myxococcota 100 100 63 100 100 Nitrospirota

33 100 100 100 Methylomirabilota 70 100

87 100 100 100 100 Acidobacteriota

100 28 95 Eisenbacteria 98 100 36 100 UBP1 100 100 100 Gemmatimonadota 99 100 46 70 100 100 100 Zixibacteria 100 100 100 Bacteroidota 99 100 100 100 KSB1 100 100

100

73 Verrucomicrobiota 100 100 100 100 100 Planctomycetota 58 100

100 98 Omnitrophota 100 100 100

100 92 Lindowbacteria 74 100 98 100 6 Spirochaetota 100 72 100 100 Cyanobacteriota 100 82

77 98 Armatimonadota 96 100 100 100 Actinobacteriota 100 100 100 100

100 Chloroflexota 100

100 100 100 Patescibacteria 100 100 100 100 100

70 100 100 100 5

100 95 4 100

100 88 100 3 100 100 98 2 100 100 100 100 1 99 100 100 100 100 0

100 100 100 100 -log10 abundance 100 100 100 100 100 100 100

100 100 100 100 77 100 100

100 100 100 44 100 100 100

100 100 100 100 100 100 100

74 100 100 100 100 100 100 100 100 100

53 100 100 100 100 100

100 100

100 100 100 100 100

100 100 100 100

100 100 100 100 100

100 100

62 100 100 100 100

100 100 100 100 100 15

100 58 100 100 100

100 100 100 100 100

100 98 100 100 100 100

100 100

100 100 100 100 100 100 100

100 100 100 97 100 100 100

100 100

100 33 76 100 100

100 100

100 H1 H2 L2 L1 C H L C − − − soil soil soil 0.2

Bin_37_6 100 Bin_37_9 100 Bin_37_10 100 Bin_37_2 73 Crenarchaeota Bin_80_8 100 100 Bin_80_4 100 100 Bin_80_9 Bin_80_10 Nanoarchaeota Bin_54_9 100 Bin_54_7 H1 H2 L2 L1 C H L C − − − 0.1 soil soil soil

Supplementary Figure 13. Abundance of genomes are presented on a log10 scale were more negative values (red) are more abundant than positive values (blue). The abundance values correspond by row with the genomes phylogenetic assignment. Support values for phylogenetic tree are shown at each node. Page S15 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

P = 0.005 H1 F = 3.5242 High H2 0.2 0.4

Soil H-Soil PCoA 2 PCoA L2 L-Soil Low

0.2C-Soil 0.0 − L1 C

−0.6 −0.4 −0.2 0.0 0.2 0.4 PCoA 1 method = "bray" Supplementary Figure 14. Principal coordinate analysis (PCoA) on reconstructed genome abundances for all sites. Grouping the sites into soil, high, and low runo↵ was proven to be significant (PERMANOVA test: F =3.52, P =0.005).

Page S16 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

Supplementary Tables

Supplementary Table 1. Coordinates of Lake Hazen sediment and soil sites with temperature and date at time of sampling.

Sample Location Temperature (C)Date H-Soil 81 84’ 840” N; 70 83’ 849” W -5 June 3, 2017 L-Soil 81 80’ 332” N; 71 54’ 239” W 0 June 7, 2017 C-Soil 81 79’ 382” N; 70 44’ 486” W -2 June 3, 2017 H1 81 84’ 150” N; 70 85’ 175” W not measured May 24, 2017 H2 81 82’ 493” N; 70 71’ 498” W not measured May 29, 2017 C 81 80’ 343” N; 70 50’ 447” W not measured May 27, 2017 L1 80 80’ 521” N; 70 52’699”W notmeasured June1,2017 L2 81 79’ 171” N; 71 46’926”W notmeasured June2,2017

Supplementary Table 2. Sediment deposition dates and rates for the two deep sites, H2 and L2, is based on 210Pb constant rate of supply (CRS) dating model. Analysis was completed on 0.5cm core intervals. Data adapted from previous study [2].

H2 (Abbe Deepsite) L2 (Blister Deep Site) Interval Midpoint CRS date (CRS year) Sedimentation rate (g/cm2/yr) Interval Midpoint CRS date (CRS year) Sedimentation rate (g/cm2/yr) 0-0.5 2017.1 0.349 0-0.5 2016.5 0.0724 0.5-1 2016.6 0.292 0.5-1 2014.0 0.0701 1-1.5 2015.9 0.111 1-1.5 2011.3 0.1163 1.5-2 2014.1 0.671 1.5-2 2009.2 0.1759 2-2.5 2012.7 0.671 2-2.5 2007.3 0.1232 2.5-3 2012.7 0.227 2.5-3 2004.7 0.0909 3-3.5 2012.0 5.563 3-3.5 2001.9 0.0967 3.5-4 2011.3 5.563 3.5-4 1998.8 0.0854 4-4.5 2011.3 5.563 4-4.5 1994.0 0.0513 4.5-5 2011.3 5.563 4.5-5 1987.2 0.0375 5-5.5 2011.3 5.563 5-5.5 1976.7 0.0220 5.5-6 2011.3 5.563 5.5-6 1966.51 0.0515 6-6.5 2011.3 5.563 6-6.5 1957.66 0.0339 6.5-7 2011.3 5.563 6.5-7 1949.00 0.0537 7-7.5 2011.3 5.563 7-7.5 1940.41 0.0451 7.5-8 2011.3 5.563 7.5-8 1931.79 0.0668 8-8.5 2011.3 5.563 8-8.5 1925.07 0.0553 8.5-9 2011.3 0.441 8.5-9 1918.10 0.0553 9-9.5 2010.9 0.224 9-9.5 1911.08 0.0553 9.5-10 2009.8 0.123 9.5-10 1904.42 0.0553

Page S17 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

Supplementary Table 3. Glacial runo↵ to the Lake Hazen Watershed. Lengths of rivers in km are shown in parentheses. Mass balance modelled runo↵ for years 2015 and 2016. Sampling dates in 2017 were prior to the summer runo↵. Data adapted from previous study [2].

Surface area (km2) Runo↵ volume (km3) River Catchment Glacier River 2015 2016 High Runo↵:Abb´e(AB) 390 204 7.9(21) 0.061 0.015 Low Runo↵: Blister (BR) n/a 6 2.5 (11) 0.002 <0.001 Gilman (GL) 992 708 5.1 (22) 0.192 0.08 Henrietta Nesmith (HN) 1274 1041 9.6 (4.6) 0.291 0.075 Snowgoose (SG) 222 87 5.6 (17) 0.026 0.006 Turnabout(TN) 678 259 13.4(42) 0.082 0.024 Very (VR) 1035 269 32.9(39) 0.165 0.08 Watershed total 7516 3078 91.2 0.979 0.291

Supplementary Table 4. DNA extraction masses. DNA was extracted in triplicate for each sample and then combined prior to sequencing.

Lake Hazen Sample Location Tube ID Wet Weight (g) PCR with glnA 1 2 3Total(grams) Sediment H2:Deephole S1 0.416 0.455 0.499 1.370 yes(diluted) L2: Blister Deep S2 0.497 0.455 0.389 1.341 yes H1: Abbe S3 0.469 0.514 0.552 1.535 yes L1: Blister Shallow S4 0.429 0.402 0.361 1.192 yes C: Ruggles S5 0.518 0.447 0.331 1.296 yes Soil L-Soil:BlisterSoil BS3 0.527 0.535 0.561 1.623 yes(diluted) H-Soil:AbbeSoil AS3 0.443 0.537 0.417 1.397 yes C-Soil: Ruggles Soil RS3 0.447 0.500 0.447 1.394 yes

Page S18 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

Supplementary Table 5. DNA fluorescence assay quantification for each sample. Note: H2 sequencing required using full extraction volume of 65 µltoreachan appropriate concentration. Volume Concentration Total DNA NanoDrop Sample (µl) (ng/µl) (ng) (concentration ng/µl) H2 65 0.08 3.36 4.6 L2 42 2.55 107.1 7.6 H1 42 4.33 181.86 10.2 L2 42 17.98 755.16 32.8 C4221.2890.436.4 H-Soil 42 38.14 1601.88 65.2 L-Soil 47 69.88 3284.36 149.0 C-Soil 47 74.38 3495.86 110.0

Supplementary Table 6. Contigs were assembled with Megahit and used in Anvio database. Only contigs that were greater than 2500 bp in length were used to form MAGs, however, contigs with a length of 1000 bp or more were used to form the initial database.

Contigs Total Length (bp) Min Contig Length (bp) Max Contig Length (bp) Avg Contig Length (bp) N50 (bp) # of genes prodigal Megahit 12026467 8477069127 200 792468 705 756 NA Anvio 1455655 3414974759 1000 792468 NA 285024 4254625

Supplementary Table 7. Number of bins recovered at each assembly step prior to and after manual refinement with Anvio.

Anvio-Output Anvio-Output CheckM Final CheckM (CONCOCT binning) (manually refined) Total Bins 850 850 877 877 Completion, Redundancy C: >90, R: <10 52 74 52 72 C: >70, R: <10 180 199 178 198 C: >50, R: <10 321 324 309 300 C: <50, R: any 529 526 568 577 Contamination <1233261255286 <5544632590676 >10 8 51 7 27 Good bins but contaminated C: >50, R: >10 2 27 0 2 C: >70, R: >10 1 6 0 0 C: >90, R: >10 0 2 0 0

Page S19 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Colby et al.SupplementaryInformation

Supplementary Table 8. P -values for marker gene and pathway distribution between sites. A Fisher’s exact test was used in the place of 2 test for small count numbers. P -values of less than 0.05 are considered significant (labelled with an asterisks).

Functional Marker High vs Low High vs Control High vs Soil Low vs Control Low vs Soil Control vs Soil Carbon Cycle 0.03298 * 0.0004998 * 0.0004998 * 0.07746 0.1384 0.001499 * NitrogenCycle 0.07596 0.001999* 0.0004998* 0.7006 0.03448* 0.1859 sulphurCycle 0.001999* 0.002999* 0.0004998* 0.4913 0.02999* 0.3823

Page S20