bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
1
1 ARTICLE
2 Climate change negatively impacts dominant
3 microbes in the sediments of a High Arctic lake
1 1 2 4 Graham A. Colby , Matti O. Ruuskanen , Kyra A. St. Pierre , Vincent
2 1 1,3 5 L. St. Louis , Alexandre J. Poulain , St´ephane Aris-Brosou
1 6 Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada
2 7 University of Alberta, Department of Biological Sciences, Edmonton, AB T6G 2E9,
8 Canada
3 9 Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON K1N
10 6N5, Canada
11 B: St´ephane Aris-Brosou ([email protected]).
12 Running head: Climate change impacts on High Arctic microbes bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
2
13 Abstract
14 Temperatures in the Arctic are expected to increase dramatically over the next century,
15 yet little is known about how microbial communities and their underlying metabolic pro-
16 cesses will be a↵ected by these environmental changes in freshwater sedimentary systems.
17 To address this knowledge gap, we analyzed sediments from Lake Hazen, NU Canada.
18 Here, we exploit the spatial heterogeneity created by varying runo↵ regimes across the
19 watershed of this uniquely large lake at these latitudes to test how a transition from low
20 to high runo↵, used as one proxy for climate change, a↵ects the community structure and
21 functional potential of dominant microbes. Based on metagenomic analyses of lake sedi-
22 ments along these spatial gradients, we show that increasing runo↵ leads to a decrease in
23 taxonomic and functional diversity of sediment microbes. Our findings are likely to apply
24 to other, smaller, glacierized watersheds typical of polar or high latitude / high altitudes
25 ecosystems; we can predict that such changes will have far reaching consequences on these
26 ecosystems by a↵ecting nutrient biogeochemical cycling, the direction and magnitude of
27 which are yet to be determined.
28 Keywords: High Arctic, microbial ecology, metagenome assembled genomes
29 (MAGs), high-throughput sequencing bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
3
30 Main
31 Climate change is amplified in polar regions, where near-surface temperatures have in-
32 creased almost twice as fast as elsewhere on Earth over the last decade [1, 2, 3]. Climate
33 models predict that temperature will increase in the Arctic by as much as 8°Cby2100
34 [4]. These changes are already having dramatic consequences on physical [5, 6, 7], bio-
35 geochemical [8, 9], and ecological [10, 11] processes across Arctic ecosystems. Yet, while
36 we are starting to understand the e↵ect of thawing permafrost on microbial communities
37 [12, 13, 14], we know very little about how microbes in lentic ecosystems such as lakes
38 respond to environmental changes – even though microbes mediate most global biogeo-
39 chemical cycles [15, 16]. Furthermore, lakes are broadly considered sentinels of climate
40 change, as they integrate physical, chemical and biological changes happening through
41 their watersheds [17]; however, their microbial community structure and function are
42 relatively understudied, in particular in the Arctic.
43 To date, much of the research performed on microbial communities in Arctic lakes
44 has been limited to studies that were mostly based on partial 16S rRNA gene sequencing
45 [18, 19, 20, 21, 22]. While these studies are useful to understand the structure of these
46 microbial communities, they provide limited functional insights and can be biased as they
47 often rely on sequence databases where environmental microbes, specifically from the
48 Arctic, may be underrepresented [23, 24]. More critically, being circumscribed both in
49 space and in time, previous studies only o↵er snapshots of microbial communities and
50 hence, have a limited power to predict how microbial communities might respond to
51 climate change.
52 To predict the e↵ect of climate change on microbial functional diversity in Arctic lake
53 sediments, we focused on Lake Hazen, the world’s largest High Arctic lake by volume bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
4
54 (82°N) [25]. In this work, we exploited two important properties of Lake Hazen. First, its
55 watershed is already experiencing the e↵ects of climate change, as increasing temperatures
56 there are leading to more glacial melt, permafrost thaw, and increased runo↵ from the
57 watershed into the lake in warmer years relative to cooler ones citelehnherr2018world.
58 Second, its tributaries are highly heterogeneous, fed by eleven glaciers ranging from 6 to
2 59 1041 km in surface area, and annual runo↵ volume approximately scaling with their size
3 60 (from <0.001 to 0.080 km in 2016) [32].
61 It is this temporal and spatial heterogeneity in runo↵ that we used to evaluate the
62 possible consequences of climate change on High Arctic sediment microbial functional
63 diversity, acknowledging that the consequences of increasing temperature are likely slightly
64 more plural and complex. To this e↵ect, we sampled lake sediments along two transects
65 representing low (L transect: samples L1 [shallow] and L2 [deep]) and high (H: samples H1
66 [shallow] and H2, [deep]) seasonal runo↵ volume, as well as at a single site that received
67 negligible runo↵ (C site; Figure 1A). We also collected soil samples (S sites) from three
68 sites in the dried streambeds of the tributaries, on the northern shore between the two
69 transects to assess soil influence on microbial communities present in the sediments. We
70 then resorted to untargeted metagenomics analyses to draw an inventory of dominant
71 microbes, assumed to be the most critical to nutrient cycling and the most relevant to
72 the dynamics of microbial communities. These reconstructed Metagenome Assembled
73 Genomes (MAGs) [26] allowed us to assess the quantitative impact of a change of runo↵
74 regime, from low to high, on both the structure of sediment microbial communities and
75 their functional potential. We show that an increase in runo↵ volume and resultant
76 sedimentation rates, as predicted under climate change scenarios for the region, could
77 lead to a reduced diversity of the dominant microbial community and of their functional
78 potential. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
5
79 Characterization of the physical and geochemical environments
80 We first characterized how geochemical properties of the sediments varied along and
81 between the two transects. Sediment samples from these five sites clustered into four
82 distinct geochemical groups (Figure 1B) that reflect spatial variability in glacial runo↵,
83 the primary hydrological input to the lake. Indeed, PC1 explained 43% of the total
2 84 variance ( ), and di↵erentiated the L and high H runo↵ transects, while PC2 (29.9%)
85 separated each transect according to their depth.
2 86 Along PC1, higher concentrations of ammonia (NH3)andsulfate(SO4 )inthepore-
87 waters, and a greater percentage of calcium carbonate in the sediments, were present
88 in the H transect. However, higher concentrations of dioxygen (O2), nitrates / nitrites
89 (NO3 /NO2 ), and greater redox potential were present in the L transect and the control
90 (C) sites. Along PC2, sediment organic carbon (OC), and porewater pH and Cl ,were
91 more determinant when discriminating between the shallow (L1 and H1) and deep (L2
92 and H2) sites of both transects (Supplementary Figures 4-5). Rather than grouping spa-
93 tially with the H transect, the C sites were most chemically similar to L1 (Figure 1C,
94 Supplementary Figure 6). The shallow sites were not significantly di↵erent from each
95 other in pH or OC concentrations, but were both significantly di↵erent from the deeper
96 sites suggesting that although most chemical features were similar within each transect,
97 some features might still be influenced by their spatial proximity to the shoreline or depth
98 of the overlying water column (Figure 1C). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
6
99 Contrasting low vs. high runo↵ transects revealed a decrease in
100 biodiversity
2 101 With such a clear geochemical separation of the transects along PC1 (43% of )and
102 significant spatial contrasts (Figure 1C), we had the right context to evaluate the influ-
103 ence of runo↵ gradients on sediment microbial diversity. We assembled a total of 300
104 (290 bacterial and 10 archaeal) MAGs that were >50% complete and with <10% con-
105 tamination (Supplementary Tables 6-7). By constructing phylogenetic trees for Bacteria
106 and Archaea, we noted that while most major phyla were represented in the MAGs, no
107 Firmicutes and only a small number of Archaea were identified (Figure 2). In contrast,
108 Gammaproteobacteria (n = 50), Actinobacteria (n = 31), Alphaprobacteria (n =24),
109 Chloroflexoata (n =30),Planctomycetota(n = 24), and Acidobacteriota (n =19)
110 were the most commonly recovered taxa across the entire watershed. Uncultured phyla
111 comprised 11% of reconstructed MAGs, including representatives from multiple taxa: ⇠
112 Eisenbacteria (n =12),Patescibacteria(n =9),Omnitrophica(n =5),KSB1(n =1),
113 Armatimonadota (n =1),Lindowbacteria(n = 1), USBP1 (n = 1), UBP10 (n =1),and
114 Zixibacteria (n =1).
115 However, these MAGs were not evenly distributed across all sites (Figure 2, inset;
116 Supplementary Figure 7). To quantify this uneven distribution, we determined the site
117 where each genome was most abundant. Based solely on this information, we performed an
118 unsupervised clustering (t-SNE), and found that the directions defined by sediment-laden
119 water flowing from the shallow to the deep site within each transect in the projection space
120 were almost orthogonal between transects (see arrows in Figure 3). This orthogonality
121 suggests that transitioning from the L to the H transect could lead to a dramatic shift in
122 microbial communities. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
7
123 To assess the significance of these shifts at the phylum level, we calculated the relative
124 proportions of each of the reconstructed 300 MAGs at each site, and tallied these numbers
125 by phylum, over the 43 phyla represented in our data. We did this along each transect –
126 essentially pooling sites H1/H2 together to represent the H transect, and doing the same
127 for sites L1/L2 (the L transect), while keeping proportions for the S and C sites separate.
128 Hierarchical clustering on this table of MAGs proportions by phyla vs. sites showed a
129 divergence from the L to H transects (following the (((L,C),H),S) clustering pattern;
130 Figure 4A, inset), confirming the clear contrast between the two transects in terms of
131 taxa proportions (see Figure 3). To test if these taxa proportions tended to increase or
132 decrease when transitioning from L to H along the (((L,C),H),S) clustering pattern, we
133 fitted linear models (ANOVA) regressing the proportions of each of the 43 phyla against
134 sites, ordered as per their hierarchical clustering (L C H S). Essentially, we regressed ! ! !
135 a single data point for each of the four classes (L, C, H, and S), so that P -values could not
136 be obtained, but slope could be estimated (Figure 4A). Strikingly, most of these slopes
8 137 were negative (binomial test: P =7.8 10 ), demonstrating a significant decrease in ⇥
138 diversity at the phylum level as one goes from low to high runo↵ regimes.
139 An NMDS ordination allowed us to detect the geochemical features associated with
140 this shift in microbial communities (Supplementary Figure 8). In the sediments, NH3
141 concentrations (P =0.03), NO2 / NO3 concentrations (P =0.03), and redox potential
142 (P =0.03) were significant in determining the distribution of MAGs (permutation test:
143 P<0.05). We further observed that the sites with the greatest diversity (L/C sites) were
144 also those with the greatest redox potential, and O2 and NO3 /NO2 concentrations. Sites
2 145 with the lowest microbial diversity (H sites), contained greater NH3 and SO4 concentra-
146 tions, and lower redox potential. In addition to gradients shaped by the interplay between
147 microbial metabolism and local geochemical constraints, the physical disturbances asso- bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
8
148 ciated with high sedimentation rates also likely contributed to the homogenization of the
149 microbial community structure; however, we cannot quantify the relative importance of
150 each of these processes here.
151 Contrasting low vs. high runo↵ transects also revealed a loss of
152 functional potential
153 To assess the functional implications of this decrease of biodiversity, we assigned metabolic
154 functions and pathways to proteins in each MAG. We focused on genes and pathways in-
155 volved in key elements, targeting carbon, nitrogen, and sulfur cycling (Supplementary
156 Figures 9-10). Only the most abundant genomes per site were reported within each phy-
157 lum (Supplementary Figure 11), allowing us to compute the proportions of functions and
158 pathways in each of the 43 phyla present in reconstructed MAGs across the hydrological
159 regimes. Their hierarchical clustering (Supplementary Figures 12-14) led to a picture con-
160 sistent with the ones derived from both geochemical (Figure 1) and taxonomic abundances
161 (Figure 4A). Indeed, the two transects were again clearly separated (clustering pattern
162 (((L,C),S),H); Figure 4B, inset), and fitting linear models regressing function/pathway
163 proportions against sites showed that, again, most of these slopes were negative (bino-
164 mial test: P =0.0010). Forcing the same site ordering as for the taxonomic abundances
5 165 (L C H S as in Figure 4A, inset) led to similar results (binomial test: P =7.8 10 ), ! ! ! ⇥
166 demonstrating a significant decrease in metabolic diversity when going from the L to the
167 Htransect.
168 More specifically, we found that marker genes whose product is implicated in carbon
169 and sulfur metabolisms significantly decreased when going from the L to H, while nitrogen
170 metabolism was una↵ected (Supplementary Table 8; see Supplementary Text for details). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
9
171 When considering the individual functions present or absent across the transects, we noted
172 that most oxidative pathways (CO, methane, formaldehyde, sulfide, sulfite) appeared less
173 common in the H transect (Supplementary Figure 9), corresponding to lower oxygen
174 concentrations and constraints on aerobic metabolism. Furthermore, while most carbon
175 fixation processes were shared between the two transects, carbon oxidation and reduction
176 reactions regulated through Wood-Ljungdahl pathway were only observed in the H tran-
177 sect, where sedimentary conditions were anoxic throughout the first 5cm (Supplementary
178 Figures 4-5), consistent with a more reductive environment.
179 Discussion
180 Even if Arctic microbial communities are changing rapidly [13], there is still a dearth of
181 long-term time series observations. To address this point, we used Lake Hazens spatial
182 geochemical heterogeneity to evaluate the structural and functional response of lake sedi-
183 ment microbial communities to varying runo↵, already shown to increase in this warming
184 High Arctic environment [9]. Such an approach can reasonably be interpreted from the
185 lens of a space-for-time design, which assumes that spatial and temporal variations are
186 not only equivalent [27, 28], but also stationary [29]. Whether this latter condition is met
187 cannot be known, but in the absence of any time-series documenting the e↵ect of climate
188 change on lake sediment microbial communities in the High Arctic, the space-for-time
189 design becomes a convenience, if not a necessity [30].
190 Using metagenomics along two transects experiencing heterogeneous runo↵ conditions,
191 we presented evidence that climate change, as it drives increasing runo↵ and sediment
192 loading to glacial lakes, will likely lead to a decrease in both diversity and functional
193 potential of the dominant microbial communities residing in lake sediments. Note that bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
10
194 we specifically focused here on the dominant microbes, that is those for which we could
195 reconstruct the MAGs, in order to (i) have a phylogenetic placement of the corresponding
196 organisms based on a large number of marker genes (Figure 2), rather than partial 16S
197 rRNA gene sequences as usually done [23], and (ii) be able to predict almost complete
198 functional pathways for each of these organisms to test the impact of a change of runo↵
199 (Figure 4), rather than inferring function from taxonomic a liation [23].
200 Such a decrease in taxonomic and functional diversity may not be unique to Lake
201 Hazen, where rising temperatures have resulted in increasing glacial melt and associated
202 runo↵. Although such a pattern was not observed in other regions of the globe where
203 runo↵ is predicted to decrease [31, 32], our finding are likely to apply to other, smaller,
204 glacierized watersheds typical of high latitudes or altitudes. Indeed, at least in the Arctic,
205 freshwater discharge is broadly expected to increase with increasing temperatures and
206 precipitation loadings [33, 34, 35]. It would thus be immensely valuable to conduct sim-
207 ilar studies, replicating where appropriate a similar space-for-time design, at other lakes
208 throughout the world. Additional sampling e↵orts should carefully consider the spatial
209 heterogeneity of runo↵ regimes leading to divergent sedimentation rates (Supplementary
210 Table 2), limiting our ability to make temporal predictions.
211 Despite lacking geochemical measurements for the soil samples, we found that the
212 microbial communities in the sediments at the high runo↵ sites clustered most frequently
213 with those in the soil sites (Figure 4), highlighting a connection between terrestrial and
214 aquatic sediment communities as a function of the runo↵ volume, consistent with previous
215 findings [36, 37]. Unsurprisingly, as the soil is likely a source of nutrients (e.g.,DOC)
216 and organic and inorganic particles, we would expect increased runo↵ to the aquatic
217 ecosystems to alter microbial community structure [38]. Some of these structural changes
218 may then alter the functional capacity to metabolize carbon, nitrogen, sulfur compounds bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
11
219 and process toxins such as metals and antibiotics (Supplementary Figure 9). A more
220 experimentally-driven approach, based for instance on in situ incubation and geochemi-
221 cal tracers, would have been necessary to quantify such an interplay between microbial
222 metabolism and geochemical features. Yet, as sediments and nutrients are mostly de-
223 posited during the summer melt months, it can be expected that lake sediments record
224 microbe-driven seasonal changes in their geochemistry. Indeed, high glacial runo↵ is
225 known to bring dense, oxygenated river waters with OC directly to the bottom of the lake
226 [32], stimulating aerobic microbial activity. As a result, the geochemistry recorded along
227 the high runo↵ transect may first reflect a period of greater microbial metabolism, which
228 may actually exceed those in temperate systems [39], eventually followed by low oxygen,
229 low redox, and high NH3 conditions observed here (Figure 1) as oxygen is depleted and
230 anaerobic metabolisms allowed to proceed.
231 At a larger temporal scale, a key question that arises from these results is how changes
232 in hydrological regimes will alter the evolutionary dynamics of microbial communities in
233 lake sediments. Niche di↵erentiation, where the coexistence of ecological opportunities
234 can facilitate species diversification, may explain why sediments along the low runo↵
235 transect hosts a more diverse microbial community than sediments along the high runo↵
236 transect [40]. Presently, climate change is predicted to increase runo↵ in this High Arctic
237 environment [9], and we found evidence suggesting that the increased runo↵ homogenizes
238 community structure. This can be expected to disrupt niche di↵erentiation, and hence to
239 reduce the overall and long-term metabolic capacity in lake sediments. It is currently hard
240 to predict the future microbial ecology of these systems. On the one hand, climate change
241 may diminish species diversification, and lead to highly specialized microbial communities
242 adapted to a homogeneous ecological niche characterized by low oxygen, low redox, and
243 high NH3 concentrations. On the other hand, the seasonal and rapid changes in redox bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
12
244 conditions, predicted to follow the strong but punctual input of oxygen and nutrients
245 during springtime may allow for the development of a short-lived community that eluded
246 our sampling and analysis.
247 The rapid changes that a↵ect Lake Hazen’s watershed in response to climate warming
248 were already known to directly alter its hydrological regime. Here we further provide
249 evidence that a combination of increasing runo↵ and changing geochemical conditions
250 are associated with the reduced diversity and metabolic potential of its dominant micro-
251 bial communities. While longitudinal studies are needed to confirm these patterns, it is
252 still unclear how such losses in biodiversity and metabolic potential in Arctic ecosystems
253 will impact key biogeochemical cycles, potentially creating feedback loops of uncertain
254 direction and magnitude.
255 References
256 1. Overpeck, J. et al. Arctic environmental change of the last four centuries. Science
257 278,1251–1256(1997).
258 2. Serreze, M. C. & Francis, J. A. The Arctic amplification debate. Climatic change
259 76,241–264(2006).
260 3. Screen, J. A. & Simmonds, I. The central role of diminishing sea ice in recent Arctic
261 temperature amplification. Nature 464,1334(2010).
262 4. IPCC. Summary for Policymakers, book section SPM, 1–30 (Cambridge University
263 Press, Cambridge, United Kingdom and New York, NY, USA, 2013). Available at
264 www.climatechange2013.org. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
13
265 5. Laudon, H. et al. Save northern high-latitude catchments. Nature Geoscience 10,
266 324 (2017).
267 6. Bliss, A., Hock, R. & Radi´c, V. Global response of glacier runo↵ to twenty-first
268 century climate change. Journal of Geophysical Research: Earth Surface 119,717–
269 730 (2014).
270 7. O’Reilly, C. M. et al. Rapid and highly variable warming of lake surface waters
271 around the globe. Geophysical Research Letters 42,10–773(2015).
272 8. Frey, K. E. & McClelland, J. W. Impacts of permafrost degradation on Arctic river
273 biogeochemistry. Hydrological Processes: An International Journal 23,169–182
274 (2009).
275 9. Lehnherr, I. et al. The world’s largest High Arctic lake responds rapidly to climate
276 warming. Nature Communications 9,1290(2018).
277 10. Smol, J. P. et al. Climate-driven regime shifts in the biological communities of
278 Arctic lakes. Proceedings of the National Academy of Sciences 102,4397–4402
279 (2005).
280 11. Wrona, F. J. et al. Transitions in Arctic ecosystems: Ecological implications of
281 achanginghydrologicalregime. Journal of Geophysical Research: Biogeosciences
282 121,650–674(2016).
283 12. McCalley, C. K. et al. Methane dynamics regulated by microbial community re-
284 sponse to permafrost thaw. Nature 514,478(2014).
285 13. Hultman, J. et al. Multi-omics of permafrost, active layer and thermokarst bog soil
286 microbiomes. Nature 521,208(2015). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
14
287 14. Mackelprang, R., Saleska, S. R., Jacobsen, C. S., Jansson, J. K. & Ta¸s, N. Per-
288 mafrost meta-omics and climate change. Annual Review of Earth and Planetary
289 Sciences 44,439–462(2016).
290 15. Falkowski, P. G., Fenchel, T. & Delong, E. F. The microbial engines that drive
291 Earth’s biogeochemical cycles. Science 320,1034–1039(2008).
292 16. Fuhrman, J. A. Microbial community structure and its functional implications.
293 Nature 459,193(2009).
294 17. Williamson, C. E., Saros, J. E., Vincent, W. F. & Smol, J. P. Lakes and reser-
295 voirs as sentinels, integrators, and regulators of climate change. Limnology and
296 Oceanography 54,2273–2282(2009).
297 18. Stoeva, M. K. et al. Microbial community structure in lake and wetland sediments
298 from a High Arctic polar desert revealed by targeted transcriptomics. PLoS One
299 9,e89531(2014).
300 19. Thaler, M., Vincent, W. F., Lionard, M., Hamilton, A. K. & Lovejoy, C. Microbial
301 community structure and interannual change in the last epishelf lake ecosystem in
302 the north polar region. Frontiers in Marine Science 3,275(2017).
303 20. Mohit, V., Culley, A., Lovejoy, C., Bouchard, F. & Vincent, W. F. Hidden biofilms
304 in a far northern lake and implications for the changing Arctic. npj Biofilms and
305 Microbiomes 3,17(2017).
306 21. Ruuskanen, M. O., St. Pierre, K. A., St. Louis, V. L., Aris-Brosou, S. & Poulain,
307 A. J. Physicochemical drivers of microbial community structure in sediments of
308 Lake Hazen, Nunavut, Canada. Frontiers in Microbiology 9,1138(2018). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
15
309 22. Cavaco, M. A. et al. Freshwater microbial community diversity in a rapidly changing
310 high arctic watershed. FEMS Microbiol Ecol 95 (2019).
311 23. Ruuskanen, M. O., St Pierre, K. A., St Louis, V. L., Aris-Brosou, S. & Poulain,
312 A. J. Physicochemical drivers of microbial community structure in sediments of
313 lake hazen, nunavut, canada. Front Microbiol 9,1138(2018).
314 24. Ruuskanen, M. O. et al. Microbial genomes retrieved from high arctic lake sedi-
315 ments encode for adaptation to cold and oligotrophic environments. Limnology and
316 Oceanography (2019).
317 25. K¨ock, G. et al. Bathymetry and sediment geochemistry of Lake Hazen (Quttinir-
318 paaq National Park, Ellesmere Island, Nunavut). Arctic 56–66 (2012).
319 26. Bowers, R. M. et al. Minimum information about a single amplified genome
320 (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea.
321 Nature biotechnology 35,725(2017).
322 27. Blois, J. L., Williams, J. W., Fitzpatrick, M. C., Jackson, S. T. & Ferrier, S.
323 Space can substitute for time in predicting climate-change e↵ects on biodiversity.
324 Proceedings of the National Academy of Sciences 110,9374–9379(2013).
325 28. Lester, R. E., Close, P. G., Barton, J. L., Pope, A. J. & Brown, S. C. Predicting
326 the likely response of data-poor ecosystems to climate change using space-for-time
327 substitution across domains. Global change biology 20,3471–3481(2014).
328 29. Damgaard, C. A critique of the space-for-time substitution practice in community
329 ecology. Trends Ecol Evol 34,416–421(2019). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
16
330 30. Pickett, S. T. Space-for-time substitution as an alternative to long-term studies.
331 In Long-term studies in ecology, 110–135 (Springer, 1989).
332 31. Huss, M. & Hock, R. Global-scale hydrological response to future glacier mass loss.
333 Nature Climate Change 8,135(2018).
334 32. Pierre, K. S. et al. Contemporary limnology of the rapidly changing glacierized
335 watershed of the world’s largest high arctic lake. Scientific reports 9,4447(2019).
336 33. Peterson, B. J. et al. Increasing river discharge to the arctic ocean. science 298,
337 2171–2173 (2002).
338 34. Rawlins, M. A. et al. Analysis of the arctic system for freshwater cycle intensifi-
339 cation: Observations and expectations. Journal of Climate 23,5715–5737(2010).
340 Available at http://dx.doi.org/10.1175/2010JCLI3421.1.
341 35. Bring, A. et al. Arctic terrestrial hydrology: A synthesis of processes, regional
342 e↵ects, and research challenges. Journal of Geophysical Research: Biogeosciences
343 121,621–649(2016).
344 36. Comte, J., Culley, A. I., Lovejoy, C. & Vincent, W. F. Microbial connectivity and
345 sorting in a High Arctic watershed. The ISME journal 12,2988–3000(2018).
346 37. Ruiz-Gonz´alez, C., Ni˜no-Garc´ıa, J. P. & del Giorgio, P. A. Terrestrial origin of
347 bacterial communities in complex boreal freshwater networks. Ecology letters 18,
348 1198–1206 (2015).
349 38. Le, H. T. et al. Responses of aquatic bacteria to terrestrial runo↵: e↵ects on
350 community structure and key taxonomic groups. Frontiers in microbiology 7,889
351 (2016). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
17
352 39. Probst, A. J. et al. Di↵erential depth distribution of microbial function and puta-
353 tive symbionts through sediment-hosted aquifers in the deep terrestrial subsurface.
354 Nature microbiology 3,328(2018).
355 40. Cordero, O. X. & Polz, M. F. Explaining microbial genomic diversity in light of
356 evolutionary ecology. Nature Reviews Microbiology 12,263(2014).
357 Acknowledgements
358 This study was made possible through a collaborative e↵ort undertaken by Igor Lehn-
359 herr, Stephanie Varty, Victoria Wisniewski (University of Toronto, Mississauga), Charles
360 Talbot (Environment and Climate Change Canada), and Maria Cavaco (University of
361 Alberta). We thank Linda Bonen, Marina Cvetkovska, Manon Ragonnet and Alex Wong
362 for comments and discussions. Funding support was provided by the Natural Science and
363 Engineering Research Council of Canada (VSL, AJP, SAB), ArcticNet Network Centre
364 of Excellence (VSL, AJP), and the Polar Continental Shelf Program (VSL) in Resolute,
365 Nunavut, which provided logistical and financial support.
366 Contributions
367 G.C. and V.S.L. performed sampling, whereas G.C. conducted laboratory analyses. G.C.
368 and S.A.B. performed data analyses. G.C., S.A.B., V.S.L., and A.J.P. designed the study
369 and wrote the manuscript. V.S.L. conducted the microsensor profiles and porewater
370 extractions. G.C., S.A.B., A.J.P., M.R., K.S.P., and V.S.L. reviewed the manuscript.
371 Ethics declarations / Competing interests
372 The authors declare no competing interests. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
18
373 Supplementary information
374 Supplementary information is available for this paper. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
19
375 Figure captions bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
20 ), 2 ). L1 L2 3 C C L2 L1 H2 H1 * Sample PAM PAM Cluster
H1H2
01010200 150 100 50 0
2
- 3 NO NO
µmol·L-
+ sites and dotted 3 - - NO ↵ 2 + - 2 2 NO O L1 L2 Redox C * *
H1H2 Water Depth Water 600 500 400 300 200 100 0
-
0 Cl 3 3 NH OC µg·L- PC1 (42.6%) PC1 4 * pH L1 L2 SO 3 C 3 * CaCO -2
NH ), and ammonia concentration (NH
H1H2
. . . 2.5 2.0 1.5 1.0 4
3 CaCO % ), water depth, percent organic carbon (OC), 3 3 2 1 0 -2 -1 L1 L2
*
(B) PC2 (29.9%) PC2 025 (Dunn Test). If branch tips form a dichotomy or C . 0 * + NO
↵ . (B) Principal component analysis (PCA) showing the
H1H2 2
4 67 3 5
OC P< % 2) concentration (SO 4 L1 L2 Tu r n a b o u t Abbé Abbé Very Watercourse Gilman Snowgoose Snowgoose Henrietta Nesmith Nesmith Henrietta Blister 70°W C Runoff Runoff
* ° Ruggles River 0
C
W
°
°E 30 H1H2 30 600 500 400 300 C-Soil H-Soil
N Redox ° mv H2
60 H1 High runoff
), sulfate (SO
N W °
° ° L2
3
60 E 60 70 L1
N L1 L2 * 72°W
° L-Soil
80 C Low runoff * km
10
W
°
°
H1H2
24 10 8 6
90 E 0 90 sites. There was cient insu data to include soil sites in B and C.
2 2 O µmol·L - N
0
82°N 81.75°N ° W ° *
L1 L2 2 E 120 120 C *
Lake Hazen sampling design and chemical composition. (A) Location of Lake Hazen (red box). Inset
° W H1H2
5 E 150 °
150
. 7.4 7.2 7.6 7.0
° 180 pH (A) (C) ↵ erences in physical and chemical composition of the sediment sites. Vectors display pH, dissolved dioxygen (O Figure 1. map: soil (orange dots)(purple), and low sediment (green), (black and dots)di negligible/control sample (blue) sites runo areredox separated potential, into hydrological nitrates regimes andpercent of nitrites calcium carbonate high (CaCO concentration (NO Individual points represent themedoids mean was values used using to 1 cmasterisks identify intervals indicate clusters. measured significant (C) in di ↵ erences Distributiontrichotomy, the between the of top sites interactions chemical 5 within features cm. thatline for Partitioning group separates sediment around is low sites. not runo ↵ Branches significant. and Long dashes separate high runo bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
21 L2 Samples 0.1 L1 C eota r Thermoplasmatota 2 H Crenoarchaeota* aeota h Halobacte aeota* c r h c 1 r da H r il Asga Nanoa o L-S il o aeota h C-S aeota c r Patescibacteria* h c il r oa o r Dormibacterota H-S Mic Altia UAP2 0.3
0.00
0.75 0.50 1.00
undances b Relative a Relative Chloroflexota* aeota h aeota c r h c r -18 R ya Actinobacteriota* r Eu JdF Hadesa Armatimonadota* Firmicutes Gammaproteobacteria* Cyanobacteria* Dictyoglomota Caldisericota UBP7 Thermotogota Synergistota Bipolaricaulota Deinococcota ia* ota* r t e e t c c y obiota Alphaproteobacteria* r m ophota* r o wba t allbacteria o c W Riflebacteria Dependentiae Fusobacteriota Spirochaetota* errucomicrobiota* lan Lind Elusimic Omnit P V ota r e abilota* t r ionota c ota* r ota* omonadota r cc r o oba vib f c ylomi o Marinisomatota KSB1* Bacteroidota* o h ospinota ospi x r r y et esulfu esul it it Maximum likelihood phylogenetic trees of Lake Hazen genomes based on 120 concatenated bacteria and dell Campylobacterota Acidobacteria* M N N UBP10* M B D D Fermentibacterota WOR-3 Zixibacteria* Latescibacterota Edwardsbacteria UBP1* Eisenbacteria* Gemmatimonadota* Figure 2. 122 concatenated archaea protein-codingcontain genes. Lake Red Hazen Dots: genomes.Euryarchaeota. Lake Bacteria See Hazen tree GitHub genomes. is account rooted Asterisksand for with (*) 3). full Patescibacteria indicate Inset taxonomy and phyla shows tree100%. Archaea that MAG files tree abundance and is across for rooted sites, original with in tree the files 300 (Supplemental high Data quality File genomes 2 for each sample normalized to bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
22
● ● ● ● ● ● 10 ● ● ● ● ● H transect ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● H1 ● ● ● ● ● ● ● ● ● ● ● ● H2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● L1 5 ● ● ● ● ● ● ● ● L2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● C ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● H soil ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 ● ● ● ●● ● ● ● ● ● ● ● L soil ● ● ● ● ● ● ● ● ● ● ● ● ● -SNE 2 ● ● ● ● t ● ● ● ● ● C soil ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5 ● ● ● − ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● L transect ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 10 ● ● ●
− ● ● ● ● ● ● ● ● ●● ● ●● ●
−10 −5 0 5 10 t-SNE 1
Figure 3. t-SNE analysis of genome abundance for each sediment sample. Each of the 300 shown genomes was assigned to the sample where it has the greatest abundance. Shaded arrows display the approximate direction of water flow, from upstream to downstream, for the high (green) and low (purple) transects. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
23 0 s (vertical
L C S H
xidation o CO CO −50
Urease
Cu resistance Cu
xidation o yde h malde r o f
adation r deg acid Amino
DNRA
ate f sul to xidation o Sulfite
xidation o mate r o f
xidation o ate f thiosul
Calvin cycle Calvin
Methanogenesis
adation r deg compound Aromatic
ification r Denit
xidation o Sulfide
y resistance y r Mercu
Amino acid biosynthesis acid Amino
ate reduction ate f Sul
Arsenate reduction Arsenate
gluconeogenesis
ase r e f ans r t Aminoglycoside
N fixation N
ate reduction ate r Nit
ammonia assimilation ammonia
Arsenic resistance Arsenic
TCA cycle TCA
iron reduction iron
Sulfite reduction to sulfide to reduction Sulfite −100
ylation h met y r Mercu
acycline resistance acycline r et T
Slope of pathway counts from Low to High runoff ood Ljungdahl ood W
ification r Nit
Cd Pb resistance Pb Cd
xidation o iron
amphenicol resistance amphenicol r Chlo
glycolysis
Pb Zn resistance or homeostasis or resistance Zn Pb
ite reduction ite r Nit
yde production yde h malde r o f
ugs r d ultiple m
xidation o Arsenite
xidation o methane
Beta lactamase Beta Cd Co Cu Pb Zn resistance Zn Pb Cu Co Cd = 0.0010 −150 P
0
20 10 Frequency (B) 5 os Neg Null P Sign of slope leads to a decrease in diversity. (A) Distribution of the slopes of ↵ 0
L C H S
Nitrospirota
Myxococcota
UBP10
Gemmatimonadota
Crenarchaeota
abilota r ylomi h Met Nanoarchaeota
-values: one-sided binomial test for enrichment in negative slopes. KSB1 Omnitrophota
P
ia r wbacte o Lind
−5
ia r Zixibacte
iota r anobacte y C
matimonadota r A
UBP1
Bacteroidota
ia r Eisenbacte
Slope of taxa counts from Low to High runoff
ucomicrobiota r er V
Spirochaetota
ia r atescibacte P
ia r Alphaproteobacte
iota r Acidobacte
ycetota m Plancto
xota e Chlorofl Transition from low to high runo
ia r Gammaproteobacte
iota r Actinobacte −10 -5 = 7.8x10 P
6 4 2 0 Frequency (A) Figure 4. taxonomic counts as aboth function cases, of counts sites. were (B)models aggregated Distribution (ANOVA) by were of location fitted the types totables; slopes (L estimate leftmost of [Low], the dendrograms pathway C counts slope show [Control],triangle as how of S pointing the a each [Soil], down). location function regression. and of types Insets: H sites. cluster, [High] heatmap transitioning In sites), representations from and of L linear count to H runo ↵ bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
1
1 Online methods
2 Sample collection and processing
3 Sediment and soil cores were collected from Lake Hazen (82°N, 71°W: Figure 1A), located
4 within Quttinirpaaq National Park, on northern Ellesmere Island, Nunavut. Sampling
5 took place between May 10 and June 10, 2017, when the lake was still completely ice-
6 covered (Supplementary Table 1). Within the watershed, runo↵ flows from the outlet
7 glaciers along the northwestern shoreline through poorly consolidated river valleys, de-
8 positing sediments at the bottom of Lake Hazen along two transects, the H1/H2 and
9 L1/L2 sites, respectively. The lake then drains via the Ruggles River along its south-
10 eastern shoreline (C sites). The surrounding glacial rivers deliver di↵erent amounts of
11 sediments, nutrients and organic carbon unevenly to the lake as a consequence of hetero-
12 geneous sedimentation rates (Supplementary Table 2). More specifically, the top 5cm of
13 sediments from the deeper low (L2) and high (H2) runo↵ sites represented depositional
14 periods of 30 years (1987-2017) and 6 years (2011-2017), respectively (Supplementary
15 Table 3).
16 Samples were collected along two transects and can be separated into three hydrologi-
17 cal regimes by seasonal runo↵ volume: low (L transect), high (H transect), and negligible
18 runo↵ (C sites) summarized in Supplementary Table 3. Contamination of samples was
19 minimized by wearing non-powdered latex gloves during sample handling and sterilizing
20 all equipment with 10% bleach and 90% ethanol before sample collection. Sediment cores
21 approximately 30cm in length were collected with an UWITEC (Mondsee, Austria) grav-
22 ity corer from five locations: C (overlying water depth: 50 m) far from the direct influence
23 of glacial inflows serving as a control site; L1 (water depth: 50m) and L2 (water depth:
3 24 251 m), at variable distances from a small glacial inflow (Blister Creek, <0.001 km in bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
2
25 summer 2016); and, H1 (water depth: 21m) and H2 (water depth: 253m), located ad-
3 26 jacent to several larger glacial inflows (i.e., the Abb´eRiver, 0.015 km and Snow Goose,
3 27 0.006 km in 2016). The soil samples (S sites) were collected from three sites in the dried
28 streambeds of the tributaries, on the northern shore between the two transects. At each
29 site, for both sediments and soil, five cores were sampled, 3m apartforthesediment ⇠
30 cores, and approximately 1m aparttoaccountforsiteheterogeneity. ⇠
31 For sediment core, one of the five cores were used for microprofiling of oxygen (O2),
32 redox and pH, as well as one core for porewater chemistry and loss on ignition (see [1] for
33 details), and the remaining three cores were combined, prior to their genomic analysis, here
34 again to account for site heterogeneity. For soil samples, three cores per site were collected
35 for DNA analysis, but no additional cores were collected for chemical analyses. As we
36 were mostly interested in the community composition through space, we combined the
37 top 5 cm of sediment and 10 cm of soil for sample extraction and subsequent sequencing.
38 Any remaining length of cores that were used for DNA analysis were discarded. These
39 uppermost layers in the sediment correspond to both the most recent sediment deposition
40 dates [2] and the region of greatest microbial activity [3]. The top of each core was
41 sectioned and placed into Whirlpack bags. These slices were homogenized manually inside
42 of the bags and stored in a 20°C freezer until shipment back to the University of Ottawa 43 where they were then stored at 80°C. Soil samples were transferred into falcon tubes,
44 homogenized, and stored as described above for the lake sediment samples.
45 Samples were thawed overnight and 250-400 mg (wet weight; Supplementary Table
46 4) were then washed in a sterile salt bu↵er (10mM EDTA, 50mM Tris-HCl, 50mM
47 Na2 HPO4 7H2O at pH 8.0) to remove PCR inhibitors [4, 5]. All sample handling was
48 conducted in a stainless-steel laminar flow hood (HEPA 100) treated with UVC radiation
49 and bleach before use. DNA extractions were performed using the DNeasy PowerSoil Kit bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
3
50 (MO BIO Laboratories Inc, Carlsbad, CA, USA), following the kit guidelines, except that
51 the final elution volume was 30 µlinsteadof100µl. DNA integrity was validated with
52 a NanoDrop Spectrometer and PCR combined with electrophoresis of the Glutamine
53 synthetase gene (glnA) as this gene is ubiquitous across microbial life (Supplementary
54 Figure 1 and Supplementary Table 5). Adequate DNA concentrations for sequencing were
55 reached by combining triplicate extractions for a total volume of 45 µlandaconcentration
56 50 ng/µl(SupplementaryTable4).Positiveandnegativecontrolswereusedtoverify
57 the integrity of the PCR amplification. Two kit extraction blanks contained no trace of
58 DNA and were not sequenced.
59 Chemical analyses
60 Redox potential, pH, and dissolved O2 concentration profiles were measured at 100 µM
61 intervals in the field within an hour of collection, using Unisense (Aarhus, Denmark) mi-
62 crosensors connected to a Unisense Field Multimeter. Cores used for porewater chemistry
63 analysis were sectioned in 1 cm intervals into 50 mL falcon tubes, followed by flushing
64 of any headspace with ultra-high-purity nitrogen (N2)beforecapping.Sedimentpore-
65 water was extracted following centrifugation at 4,000rpm. The supernatant was then
66 filtered through 0.45 µmcelluloseacetatefiltersinto15mltubes,andwerefrozenuntil
67 analysis. Concentrations of nitrates and nitrites (NO2 + NO3 ), and ammonia (NH3),
68 chloride (Cl ) were measured in the sediment porewater using a Lachat QuickChem 8500
2 69 FIA Ion Analyzer, while total dissolved phosphorus (TDP) and SO4 were measured in
70 the sediment porewater using an ion chromatograph at the Biogeochemical Analytical
71 Service Laboratory (Department of Biological Sciences, University of Alberta). However,
72 TDP was removed from data analysis because insu cient porewater was collected to mea-
73 sure TDP at site C. The centrifuged sediments were retained and percentages of calcium bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
4
74 carbonate (CaCO3)andorganiccarbon(OC)wereestimatedthroughlossonignition[6].
75 The chemical features of the top 5cm of the sediment cores were derived from mea-
76 surements performed at 1 cm intervals throughout the cores. The geochemical properties
77 of each sediment site were summarized using a Principle Component Analysis (PCA)
78 and projections were clustered using Partitioning Around Medoids [7]. The appropriate
79 number of clusters was determined from silhouettes with the R package hopach [8]. The
80 Dunn test [9] was used to compare samples, controlling for multiple comparisons with the
81 Benjamini-Hochberg adjustment.
82 Sequencing and data processing
83 Metagenomic libraries were prepared and sequenced by Genome Quebec on an Illumina
84 HiSeq 2500 platform (Illumina, San Diego, CA, USA; Supplementary Figure 2) on a
85 paired-end 125 bp configuration using Illumina TruSeq LT adapters (read 1: AGATCG-
86 GAAGAGCACACGTCTGAACTCCAGTCAC, and read 2: AGATCGGAAGAGCGTCGT-
87 GTAGGGAAAGAGTGT). The DNA from the eight sites (five sediments, three soils) was
88 sequenced, generating over 150 GB of data. Read count summaries were tracked through-
89 out each step of the pipeline for quality control (Supplementary Figure 3). Low quality
90 reads, adapters, unpaired reads, and low quality bases at the ends of reads were removed to
91 generate quality controlled reads with Trimmomatic (v0.36) [10] using the following argu-
92 ments: phred33, ILLUMINACLIP:TruSeq3-PE-2.fa:3:26:10, LEADING:3 TRAILING:3,
93 SLIDINGWINDOW:4:20, MINLEN:36, CROP:120, HEADCROP:20, AVGQUAL:20. FASTQC
94 (v0.11.8) [11] was then used to confirm that the Illumina adapters were removed and that
95 trimmed sequence lengths were at least 90bp in length with a Phred score of at least 33. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
5
96 Reconstruction of environmental genomes and annotation
97 To reconstruct environmental genomes, metagenomic quality-controlled reads from all
98 samples were coassembled using Megahit [12] software with a k-mer size of 31 and “meta-
99 large” setting (see Supplementary Table 6 for additional summary statistics). EukRep
100 [13] was used to remove any eukaryotic DNA from the contigs prior to the formation
101 of an Anvio (v5) [14] contig database. The contig database was generated by removing
102 contigs under 1000bp, and gene prediction was performed in the Anvio environment.
103 Sequence coverage information was determined for each assembled sca↵old by mapping
104 reads from each sample to the assembled contig database using Bowtie2 [15] with default
105 settings. The resulting SAM files were sorted and converted to BAM files using sam-
106 tools (v0.1.19) [16]. Each BAM file was prepared for Anvio using the “anvi-init-bam”
107 and contig database generated using “anvi-gen-contigs-database”. The contig database
108 and BAM mapping files were further used as input for “anvi-profile”, which generated
109 individual sample profiles for each contig over the minimum length of 2500 bp. These
110 profiles were then combined using “anvi-merge” and summary statistics for abundance
111 and coverage were generated with “anvi-summarise.” Automated binning was performed
112 using CONCOCT [17]. Sca↵olds were binned on the basis of GC content and di↵erential
113 coverage abundance patterns across all eight samples. Manual refinement was done using
114 Anvio’s refine option (Supplementary Table 7). Kaiju [18] was used to classify taxon-
115 omy of the assembled contigs with “anvi-import-taxonomy-for-genes” and aided in the
116 manual refinement process. Open reading frames were predicted with Prodigal (v2.6.3)
117 [19]. Anvio’s custom Hidden Markov models were run, along with NCBIs COG [20] an-
118 notation to identify protein-coding genes. PFAM [21], TIGRFAM [22], GO terms [23],
119 KEGG enzymes and pathways [24], and Metacyc pathways [25] were predicted with Inter-
120 proscan (v5) [26]. These annotations were then combined with the Anvio database with bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
6
121 “anvi-import-functions”.
122 Genome completeness and contamination were evaluated on the presence of a core set
123 of genes using CheckM (v1.0.5) “lineage wf” (Supplementary Table 7) [27]. Only genomes
124 that were at least 50% complete and with less than 10% contamination were further anal-
125 ysed – meeting the MIMAG standard for medium or high-quality genomes [28]. All recov-
126 ered genomes were used to calculate an average amino acid identity across all genomes us-
127 ing compareM (v0.0.23, function “aai wf”; https://github.com/dparks1134/CompareM)
128 [29]. CheckM was used again to identify contigs that were not contained in any of the
129 300 high-quality genomes, that is those whose size ranges from 1000–2500bp. As an
130 attempt to “rescue” these unbinned contigs, an alternative binning algorithm MaxBin
131 (v2.0) [30] was employed. An additional 481 genomes were recovered, but were not in-
132 cluded in further analysis as only 21 genomes were of average completion >65% (Sup-
133 plementary Data 1: https://github.com/colbyga/hazen_metagenome_publication/
134 blob/master/Supplemental_Data_1_maxbin2_unbinned_contigs_summary.csv).
135 Phylogenetic placement of the MAGs
136 Phylogenetic analyses were performed using two di↵erent sets of marker genes from the
137 Genome Taxonomy Database (GTDB): one for bacteria (120 marker genes) and one for
138 archaea (122 marker genes), as previously been used to assign taxonomy to MAGs [31].
139 The marker genes were extracted from each genome by matching Pfam72 (v31) [21] and
140 TIGRFAMs73 (v15.0) [22] annotations from GTDB (v86) [31]. Marker genes from each
141 of the 300 genomes were translated using seqinr [32], selecting the genetic code that
142 returned no in-frame stop codon. As some genomes had multiple copies of a marker
143 gene, duplicated copies were filtered out by keeping the most complete sequence. Marker
144 genes that were missing from some genomes were replaced by indel (gap) characters, bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
7
145 and their concatenated sequences were added those from the reference GTDB sequences.
146 MUSCLE (v3.8.31) [33] was employed to construct the alignment in R (v 3.5.1) [34].
147 Archaeal sequences were removed from the bacterial alignment on the basis of results
148 from CheckM [27] and independently verified using a custom list of archaea specific marker
149 genes. Alignments were then refined using trimAI [35] and the “-gappyout” parameter.
150 FastTree2 [36], recompiled with double precision to resolve short branch lengths, was used
151 to infer maximum likelihood phylogenetic trees from protein sequence alignments under
152 the WAG + model [37, 38, 39]. The archaeal tree was rooted with Euryarchaeota and
153 the bacterial tree was rooted with Patescibacteria using APE [40]. Trees were visualized
154 and colored by phylum with ggtree [41].
155 Community composition of the MAGs
156 To determine the relative abundance of each genome in the eight samples, sample-specific
157 genome abundances were normalized by sequencing depth [(reads mapped to a genome) /
158 (total number of reads mapped)], making comparisons across samples possible. Genome
159 abundances were generated using the CheckM “profile” function [27]. To determine the
160 average abundance of major taxonomic groups across sites (determined by the phyloge-
161 netic analysis described above), the abundances for genomes from the same taxonomic
162 group were summed and visualized using phyloseq [42] (usually at the phylum level, un-
163 less otherwise stated). These same abundance values were the basis for a community
164 composition analysis. The t-SNE plots were constructed by assigning each genome to a
165 site based on where it was most abundant using Rtsne [43]. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
8
166 Metabolic potential of the MAGs
167 To analyze functional marker genes in the metagenomes, we used a custom database of
168 reference proteins sequences (COG, PFAM, TIGRFAM, KEGG) based on the marker
169 genes used in other studies [44, 45] (Supplementary Data Files on GitHub). Pathways
170 were also predicted using MinPath [46] to map all identified KEGG enzymes to the most
171 parsimonious MetaCyc pathways [25]. As these MAGs were incomplete, some genes in
172 pathways may be absent. MinPath presented only parsimonious pathways represented by
173 multiple genes. As most genomes were present even at low abundances across all sites,
174 acut-o↵valueof 0.25 (on a log scale) was set for a genome to be included in the 10
175 functional analyses at any site, so that only the most abundant genomes for each site
176 were considered. We aggregated marker genes and pathways by function, summarizing
177 the results by phyla, except for Proteobacteria that was separated by class. We further
178 grouped all taxa together at each site to test for significant di↵erences in major nutri-
179 ent cycling processes (carbon, nitrogen, and sulphur) among sites using a hierarchical
180 clustering; significance was derived from the Approximately Unbiased bootstrap [47] and
181 Fisher’s exact test.
182 Data availability
183 Scripts and supplemental data files can be accessed from https://github.com/colbyga/
184 hazen_metagenome_publication. Raw sequence reads of the shotgun metagenomic data
185 were submitted to the sequence read archive (SRA) under accession no. SRP218124 and
186 under Bioproject PRJNA556841. The geochemical data were submitted to the National
187 Science Foundation’s (NSF) Arctic Data Center repository under doi:10.18739/A20R9M41W. bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
9
188 References
189 1. Ruuskanen, M. O., St. Pierre, K. A., St. Louis, V. L., Aris-Brosou, S. & Poulain,
190 A. J. Physicochemical drivers of microbial community structure in sediments of
191 Lake Hazen, Nunavut, Canada. Frontiers in Microbiology 9,1138(2018).
192 2. Pierre, K. S. et al. Contemporary limnology of the rapidly changing glacierized
193 watershed of the world’s largest high arctic lake. Scientific reports 9,4447(2019).
194 3. Haglund, A.-L., Lantz, P., T¨ornblom, E. & Tranvik, L. Depth distribution of active
195 bacteria and bacterial activity in lake sediment. FEMS Microbiology Ecology 46,
196 31–38 (2003).
197 4. Zhou, J., Bruns, M. A. & Tiedje, J. M. DNA recovery from soils of diverse compo-
198 sition. Applied and environmental microbiology 62,316–322(1996).
199 5. Poulain, A. J. et al. Microbial DNA records historical delivery of anthropogenic
200 mercury. The ISME journal 9,2541(2015).
201 6. Heiri, O., Lotter, A. F. & Lemcke, G. Loss on ignition as a method for estimating
202 organic and carbonate content in sediments: reproducibility and comparability of
203 results. Journal of paleolimnology 25,101–110(2001).
204 7. Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M. & Hornik, K. cluster: Cluster
205 Analysis Basics and Extensions (2019). R package version 2.1.0 — For new features,
206 see the ’Changelog’ file (in the package source).
207 8. van der Laan, M. J. & Pollard, K. S. Hybrid clustering of gene expression data with
208 visualization and the bootstrap. Journal of Statistical Planning and Infererence
209 117,275–303(2003). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
10
210 9. Dinno, A. Dunn’s Test of Multiple Comparisons Using Rank Sums (2017). R
211 package version 1.3.5 — For new features, see the ’Changelog’ file (in the package
212 source).
213 10. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina
214 sequence data. Bioinformatics 30,2114–2120(2014).
215 11. Andrews, S. et al. Fastqc: a quality control tool for high throughput sequence data
216 (2010).
217 12. Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast
218 single-node solution for large and complex metagenomics assembly via succinct de
219 bruijn graph. Bioinformatics 31,1674–1676(2015).
220 13. West, P. T., Probst, A. J., Grigoriev, I. V., Thomas, B. C. & Banfield, J. F.
221 Genome-reconstruction for eukaryotes from complex natural microbial communi-
222 ties. Genome Res 28,569–580(2018).
223 14. Eren, A. M. et al. Anvio: an advanced analysis and visualization platform for omics
224 data. PeerJ 3,1319(2015).
225 15. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature
226 Methods 9,357(2012).
227 16. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics
228 25,2078–2079(2009).
229 17. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition.
230 Nature methods 11,1144(2014). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
11
231 18. Menzel, P., Ng, K. L. & Krogh, A. Fast and sensitive taxonomic classification for
232 metagenomics with Kaiju. Nature Communications 7,11257(2016).
233 19. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation
234 site identification. BMC bioinformatics 11,119(2010).
235 20. Tatusov, R. L. et al. The COG database: an updated version includes eukaryotes.
236 BMC bioinformatics 4,41(2003).
237 21. Finn, R. D. et al. The Pfam protein families database: towards a more sustainable
238 future. Nucleic acids research 44,D279–D285(2015).
239 22. Haft, D. H., Selengut, J. D. & White, O. The TIGRFAMs database of protein
240 families. Nucleic acids research 31,371–373(2003).
241 23. Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nature
242 genetics 25,25(2000).
243 24. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as
244 areferenceresourceforgeneandproteinannotation. Nucleic acids research 44,
245 D457–D462 (2015).
246 25. Caspi, R. et al. The MetaCyc database of metabolic pathways and enzymes and
247 the BioCyc collection of pathway/genome databases. Nucleic acids research 36,
248 D623–D631 (2007).
249 26. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioin-
250 formatics 30,1236–1240(2014). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
12
251 27. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W.
252 CheckM: assessing the quality of microbial genomes recovered from isolates, single
253 cells, and metagenomes. Genome research 25,1043–1055(2015).
254 28. Bowers, R. M. et al. Minimum information about a single amplified genome
255 (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea.
256 Nature biotechnology 35,725(2017).
257 29. Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes sub-
258 stantially expands the tree of life. Nature microbiology 2,1533(2017).
259 30. Wu, Y.-W., Simmons, B. A. & Singer, S. W. MaxBin 2.0: an automated binning
260 algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics
261 32,605–607(2015).
262 31. Parks, D. H. et al. Astandardizedbacterialtaxonomybasedongenomephylogeny
263 substantially revises the tree of life. Nature biotechnology 36,996–1004(2018).
264 32. Charif, D. & Lobry, J. SeqinR 1.0.2: a contributed package to the R project for
265 statistical computing devoted to biological sequences retrieval and analysis. In
266 Bastolla, U., Porto, M., Roman, H. & Vendruscolo, M. (eds.) Structural approaches
267 to sequence evolution: Molecules, networks, populations,BiologicalandMedical
268 Physics, Biomedical Engineering, 207–232 (Springer Verlag, New York, 2007). ISBN
269 :978-3-540-35305-8.
270 33. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high
271 throughput. Nucleic acids research 32,1792–1797(2004). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
13
272 34. RDevelopmentCoreTeam.R: A Language and Environment for Statistical Com-
273 puting. R Foundation for Statistical Computing, Vienna, Austria (2008). Available
274 at http://www.R-project.org.ISBN3-900051-07-0.
275 35. Capella-Guti´errez, S., Silla-Mart´ınez, J. M. & Gabald´on, T. trimAl: a tool for
276 automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics
277 25,1972–1973(2009).
278 36. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2: approximately maximum-
279 likelihood trees for large alignments. PloS one 5,e9490(2010).
280 37. Whelan, S. & Goldman, N. A general empirical model of protein evolution derived
281 from multiple protein families using a maximum-likelihood approach. Molecular
282 biology and evolution 18,691–699(2001).
283 38. Aris-Brosou, S. & Rodrigue, N. The essentials of computational molecular evolu-
284 tion. In Evolutionary Genomics, 111–152 (Springer, 2012).
285 39. Aris-Brosou, S. & Rodrigue, N. A not-so-long introduction to computational molec-
286 ular evolution. Methods Mol Biol 1910,71–117(2019).
287 40. Paradis, E., Claude, J. & Strimmer, K. APE: analyses of phylogenetics and evolu-
288 tion in R language. Bioinformatics 20,289–290(2004).
289 41. Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T.-Y. ggtree: an R package for
290 visualization and annotation of phylogenetic trees with their covariates and other
291 associated data. Methods in Ecology and Evolution 8,28–36(2017).
292 42. McMurdie, P. J. & Holmes, S. phyloseq: an R package for reproducible interactive
293 analysis and graphics of microbiome census data. PloS one 8,e61217(2013). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
14
294 43. Krijthe, J., van der Maaten, L. & Krijthe, M. J. Package ‘Rtsne’ (2018).
295 44. Anantharaman, K. et al. Thousands of microbial genomes shed light on intercon-
296 nected biogeochemical processes in an aquifer system. Nature Communications 7,
297 13219 (2016).
298 45. Dombrowski, N., Teske, A. P. & Baker, B. J. Expansive microbial metabolic versa-
299 tility and biodiversity in dynamic Guaymas Basin hydrothermal sediments. Nature
300 Communications 9,4999(2018).
301 46. Ye, Y. & Doak, T. A parsimony approach to biological pathway reconstruc-
302 tion/inference for genomes and metagenomes. PLoS Computational Biology 5,
303 e1000465 (2009).
304 47. Suzuki, R. & Shimodaira, H. Pvclust: an R package for assessing the uncertainty
305 in hierarchical clustering. Bioinformatics 22,1540–1542(2006). bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
Supplementary Text
Nutrient cycles a↵ected when transitioning from low to high Overall, markers of carbon and sulphur metabolism significantly decreased when tran- sitioning from the L to H sites, even if nitrogen metabolism was not (Tab. S8). Most carbon pathways, such as carbon fixation through the Calvin-Benson-Bassham (CBB) pathway, as well as the capacity for simple carbon metabolism, were shared across all runo↵ regimes. In contrast, carbon oxidation and reduction reactions regulated through Wood-Ljungdahl pathway were only observed in the H sites, where sedimentary conditions were anoxic throughout the first 5cm (Figs.S4, S9). Here, Spirochaetota were likely per- forming anaerobic respiration and carbon fixation producing acetate as an end product. Methanogenesis pathways were present across all sites, but notably, methane oxidation pathways were absent from high runo↵ sites, where oxygen is limited. Greater concentrations of ammonia in the high runo↵ regimes may suggest that N- containing organic matter was mineralised through ammonification (Fig.S8). In the high runo↵ regime, there was both an absence of nitrification and a greater presence of mark- ers for ammonia assimilation. Markers for dissimilarity nitrate reduction (DNRA) were present in multiple genomes across all runo↵ regimes (Fig.S8). In contrast, urease mark- ers were found more abundantly in low runo↵ regimes, where ammonia concentration was lower (Fig.S5). The functional ability of microbes to cycle sulphur between oxidised and reduced forms was significantly di↵erent between the high and low runo↵ regimes (Tab. S8). In the high runo↵ regime, Gammaproteobactiera were the only organisms with the metabolic capacity to expansively utilise sulphur, performing sulphide oxidation and thiosulphate reduction. Whereas, sulphate reduction was predominantly found in the soil, control, and low runo↵ regimes. Aside from nutrient cycling, we also assessed the capacity of microbial communities to process metals and antibiotics. Metal resistance and cycling was mostly ubiquitous throughout all of the sites, regardless of runo↵. Methyl mercury production, identified by the presence of both hgcA and hgcB genes [1], was only implicated in the high runo↵ sites, in Spirocheatoa and Chloroflexoata. However, genes conferring mercury resistance involved in the conversion of inorganic HgII to the less toxic Hg0 –wereevenlydistributed throughout the sites. There was a broad presence of metal tolerance that was indicated by genetic determinants related to heavy metal resistance of cadmium, cobalt, copper, lead and zinc (Fig.S9). Furthermore, antimicrobial resistance genes specifically -lactamases, were ubiquitous across all genomes at all sites. We identified 90 genomes with antibiotic resistance genes (Fig.S??). Drug resistance was prevalent throughout all phyla, including one Creanarchaeota (archaeal) genome (Figs.S9, S??). Finally, we found that while amino acids were readily synthesised and degraded by most organisms (Fig.S9), the degradation of polycyclic aromatic compounds appeared to be least prevalent in the high runo↵ sites.
Page S1 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
Sites clustered by runo↵ regime To identify potential drivers of this reduction in diversity when going from the L to the H transects, we bi-clustered genomes by their normalised abundances (on a -log10 scale to reduce skew) and by sample site (Fig. S12), and found that sites clustered following a similar pattern to geochemical features (see Fig. 1B), with H sites grouping separately from L sites (Fig. 3A). The normalised abundances of MAGs showed no strong phylogeographic pattern, in that we did not observe an assemblage of MAGs solely representative of a given site (Fig. S13). In spite of this absence of phylogeographic pattern, the tanglegram suggests that the beta diversity of highly abundant MAGs in the L/C sites was greater than at the H sites (see the distribution of green lines connecting the phylogenetic and clustered trees in Tab. S13). This di↵erence in diversity between samples is further supported by an NDMS ordination (Fig. S14) and a PERMANOVA test on a PCoA ordination (Fig. S14).
References
1. Parks JM, Johs A, Podar M, Bridou R, Hurt RA, Smith SD, et al. The genetic basis for bacterial mercury methylation. Science. 2013;339(6125):1332–1335.
2. Pierre KS, Louis VS, Lehnherr I, Schi↵ S, Muir D, Poulain A, et al. Contemporary limnology of the rapidly changing glacierized watershed of the world’s largest High Arctic lake. Scientific reports. 2019;9(1):4447.
Page S2 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
Supplementary Figures Space (A) Space Space Space Space E.coli
H20 H2 L2 H1 L1 L-Soil H-Soil C-Soil Mix C B1 B2 1 23 1 23 1 23 1 23 1 23 1 23 1 23 1 23 1350 1350
1350 1350 150 150 100 100 50 50 150 150 100 100 50 50 (B) L-Soil L-Soil H2 H2 L-Soil H2 1 2 E.coli3 H20 12 3 1 1 2 3 1 2 3 12 3
1350 1350
150 150 100 100 50 50
Supplementary Figure 1. Gel electrophoresis images of the glnA gene for each sample extracted in triplicate. (A) Initial gel results using 1 µl of DNA for each PCR reaction. Every sample contains glnA except L-Soil and H2. (B) Repeated PCR reaction for L-Soil and H2. Diluted DNA concentrations for PCR from 1 µlto0.1µland0.5µl.
Page S3 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
L-Soil H-Soil C-Soil
L2 H2 C
L1 H1
Supplementary Figure 2. Illumina HiSeq DNA sequencing library validation from an Agilent Technology 2100 Bioanalyzer. All libraries were constructed using Illumina TruSeq DNA PCR-Free Library Prep. The library results for each of the eight samples is presented and labelled above the panel: L-Soil, H-Soil, C-Soil, L2, H2, C, L1, H1. Peaks between 300-600bp indicate the presence of DNA. Peaks below 1bp and above 1000bp are internal standards.
Page S4 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
150,000,000
100,000,000 Reads
50,000,000
0 C initial C L1 initial L1 initial L2 H1 initial H1 initial H2 C trimmedC L1 trimmedL1 trimmedL2 L Soil initial Soil L H1 trimmedH1 trimmedH2 C Soil initial Soil C initial Soil H L Soil trimmed Soil L C Soil trimmed Soil C trimmed Soil H C mapped anvio mapped C L1 mapped anvio mapped L1 anvio mapped L2 H1 mapped anvio mapped H1 anvio mapped H2 C mapped checkm mapped C L1 mapped checkm mapped L1 checkm mapped L2 H1 mapped checkm mapped H1 checkm mapped H2 L Soil mapped anvio mapped Soil L C Soil mapped anvio mapped Soil C anvio mapped Soil H L Soil mapped checkm mapped Soil L C Soil mapped checkm mapped Soil C checkm mapped Soil H Stages of Metagenome Processing
Supplementary Figure 3. Number of reads throughout analysis. Reads are successively decreasing, except for the reads mapped to the Anvio contig database, as each read can map to more than one contig and be counted more than once. The reads mapped using checkM include only reads that map to the 300 high quality reconstructed genomes, opposed to all reconstructed genomes in the Anvio mapping.
Page S5 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
0 0
−1 −1
−2 −2
depth cm depth −3 cm depth −3
−4 −4
−5 −5 0 5 10 7.00 7.25 7.50 7.75
O2 µmol/L pH
0
● H1● H2● L1● L2 ● C −1
−2
depth cm depth −3
−4
−5 200 300 400 500 600 redox mV
Supplementary Figure 4. Sediment microprobe profiles for oxygen (O2), pH, and redox measured in 100 µm intervals. Legend (bottom right) indicates the colour of each sediment site in the profiles.
Page S6 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
0 0 ● ● ● ●● ●● ● ● ● −1 −1 ● ● ● ● ● ●● ● ● ● −2 −2 ● ● ● ●● ● ● ● ● ● −3 −3 ● ● ● ● ● ● ● ● ●
depth cm depth −4 cm depth −4 ● ● ● ● ● ●● ● ● −5 −5 34567 0 50 100 150 200 -- Organic Carbon % NO2 + NO3 ug/L
0 0 ●●● ● ● ●● ● ● ● −1 −1 ● ● ● ● ● ● ● ● ● ● −2 −2 ● ● ● ● ● ●● ● ● ● −3 −3 ● ● ● ● ● ● ● ● ● depth cm depth depth cm depth −4 −4 ● ● ● ● ● ●● ● ● −5 −5 1.0 1.5 2.0 2.5 0 250 500 750 1000 NH ug/L CaCO3 % 3 0 0 ● ● ● ● ● ● ● ● ● −1 −1 ● ● ● ● ● ●●● ● ● −2 −2 ● ● ● ● ● ● ● ● ●● −3 −3 ● ● ● ●● ●
depth cm depth −4 cm depth −4 ● ● ● ● ● ● ● −5 −5 0.2 0.4 0.6 0 10 20 30 - Cl mg/L SO4 mg/L
0 ● ● ● ● ● −1 ● ● ● ● −2 ● H1● H2● L1● L2 ● C ● ● ● ● −3 ● ● ●
depth cm depth −4 ● ● ● ● −5 10 20 30 40 TDP ug/L
Supplementary Figure 5. Sediment porewater profiles for organic carbon, nitrite and nitrates, calcium carbonate, ammonia, chlorine, sulphate, and total dissolved phosphorus (TDP). TDP was removed when producing PCA and boxplots in Figure 1 because of incomplete measurements in C and L2. Legend (bottom right) indicates the colour of each sediment site in the profiles.
Page S7 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
* * (A) * *
* 600 7.6
O 6 8 10 pH 2 Redox µmol·L- mv 300 400 500 7.0 7.2 7.4 024
H1 H2 C L1 L2 H1 H2 C L1 L2 H1 H2 C L1 L2
* (B) * * * * * 2.5 0.5 0.6 20 500 600 67
* * 150 200 - - - NH3 NO2 +NO3 Cl * SO4 %OC %CaCO 3 µg·L- µmol·L- mg·L- mg·L- 10 15
* 1.0 1.5 2.0 0.1 0.2 0.3 0.4 345 0 50 100 0 100 200 300 400 H1 H2 C L1 L2 H1 H2 C L1 L2 H1 H2 C L1 L2 H1 H2 C L1 L2 H1 H2 C L1 L2 H1 H2 C L1 L2 Supplementary Figure 6. Distribution of all chemical features for sediment sites. Includes chlorine and sulphate measurements absent from Figure 1. Branches and asterisks indicate significant di↵erences between sites P<0.025 (Dunn Test). If branches form a dichotomy or trichotomy, the interactions within that group is not significant. (A) Microprobe measurements collected at every 100 µm. (B) Porewater measurements collected from bulk 1 cm intervals.
Page S8 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
Acidobacteriota
Actinobacteriota 60.7 (A) 1.00 (B) 60 Alphaproteobacteria 55.4 Armatimonadota Bacteroidota 51.3 49.6 Chloroflexota Crenarchaeota 0.75 Cyanobacteriota Eisenbacteria 40 Gammaproteobacteria 36.1 Gemmatimonadota
0.50 KSB1 Lindowbacteria Methylomirabilota 20.3 20.9 Abundance Myxococcota 20 Nanoarchaeota 14.7 0.25 Nitrospirota Omnitrophota % of Reads Mapped to MAGs to Mapped Reads of % Patescibacteria Planctomycetota Spirochaetota 0.00 0 UBP1 H-Soil C-Soil L-Soil H1 H2 C L1 L2 UBP10 C-SoilH-Soil L-Soil H1 H2 C L1 L2 Sample Verrucomicrobiota Sample Zixibacteria
Supplementary Figure 7. Reconstructed genome abundance across sites. (A) Amount of reads mapped to the 300 high quality genomes for each sample normalised to 100%. Only reads that were mapped to genomes are shown and all unmapped reads have been excluded. (B) Amount of reads mapped to the 300 high quality genomes for each sample.
Page S9 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
● Samples ●● ● ● ● Taxa ● ● ● Acidobacteriota ● ● ● ● Actinobacteriota ● ● H2 ● ● OC ● Alphaproteobacteria 0.3 ● ● ● ● ● ● Bacteroidota ●● Water ● ● Chloroflexota ● Depth SO4 ● ● ● ● Eisenbacteria ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● Gammaproteobacteria NH3 ● ●● ●● ● ●● ● ● ●● ● ●● ● Gemmatimonadota ● ● C ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● Methylomirabilota ● ● ● ● 0.0 ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Myxococcota ● ● ● ● NMDS2 ● ● ● ● ● ● ● ● ● ● ● ● CaCO3 ● ● ● ● ● ● ● Nitrospirota ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●●● ● ● ● ● ● ● O ● ● ● ● ● ● ● ●● ●● 2 ● ● ●● ● ● ● pH ● ● ● ● ● ● Omnitrophota ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ●● ● ● ● ● Patescibacteria ● ●● ●●●●● ●● ● ● ● ● ● ●●● ●● ● ● ●●● ●● ●● ●●● ● ● ● ● ●● H1 ●●● ● ●●●●● Planctomycetota ● L2 ●● ● Spirochaetota −0.3 L1 Redox ● UBP1 - - NO3 +NO2 ● UBP10 Cl- ● Verrucomicrobiota
● Crenarchaeota −0.6 −0.3 0.0 0.3 ● Nanoarchaeota NMDS1 Supplementary Figure 8. NMDS analysis of genomes from 20 most abundant phyla in sediment sites. Physical and chemical vectors are fitted to data. Red vectors are significant (permutation test: P<0.05).
Page S10 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
Soils: High Runoff: Low Runoff: Control: C H-Soil, L-Soil, H1 & H2 L1 & L2 & C-Soil - 50-100% of MAGs 30-49% of MAGs 0-29% of MAGs Actinobacter-iota(1) Alphaproteobacteria(4) Actinobacteriota(3) Acidobacteriota(3) Alphaproteobacteria(2) Actinobacteriota(1) Acidobacteriota(3) Alphaproteobacteria(1) Actinobacteriota(12) Alphaproteobacteria(1) Verrucomicrobiota(3) Verrucomicrobiota(2) Verrucomicrobiota(1) Chloroflexota(3) Chloroflexota(2) Chloroflexota(2) Crenarchaeota(2) Spirochaetota(3) Gammaproteobacteria(8) Gemmatimonadota(3) Gammaproteobacteria(4) Gemmatimonadota(2) Gammaproteobacteria(1) Chloroflexota(2) Crenarchaeota(1) Gammaproteobacteria(5) Bacteroidota(2) Myxococcota(3) Methylomirabilota(2) Myxococcota(2) KSB1(1) Nitrospirota(2) Nitrospirota(2) Bacteroidota(1) Methylomirabilota(2) UBP1(1) UBP10(4) UBP1(1) UBP10(2) Patescibacteria(5) Eisenbacteria(1) Planctomycetota(7) Eisenbacteria(1) Planctomycetota(5) Planctomycetota(3) UBP10(1) Wood–Ljungdahl Calvin cycle TCA cycle CO oxidation Gluconeogenesis Glycolysis Methane oxidation Carbon Methanogenesis Formaldehyde oxidation Formaldehyde production Formate oxidation N fixation Nitrification Denitrification DNRA Nitrate reduction Nitrogen Nitrite reduction Ammonia Assimilation Urease Sulfate reduction Sulfide oxidation Sulfite oxidation Sulfur Thiosulfate reduction Sulfite reduction (reversible) Iron oxidation Iron reduction Arsenite oxidation Arsenate reduction orr 1.0
0.5
0.0 −0.5 Arsenic resistance −1.0 Cd/Co/Cu/Pb/Zn resistance Metals Cd/Pb resistance Cu resistance Pb/Zn resistance Mercury methylation Mercury reduction Aminoglycoside transferase Tetracycline resistance Beta−lactamase Drugs Multiple drugs Chloramphenicol resistance Amino acid biosynthesis Amino acid degradation Other Aromatic compound degradation Supplementary Figure 9. Metabolic capacity of genomes separated by hydrological
regime. Genomes are only considered to contribute to a site if 0.56% ( log10 0.25) of reads per sample mapped to a genome. Presence of core metabolic genes involved in carbon metabolism, nitrogen metabolism, sulphur metabolism, metal cycling, antibiotic resistance, and other metabolism are shown. Number of genomes for each taxa are shown in parentheses. Blank: genes predicting function are absent or in low abundance. Shaded colours: genes predicting function are present in 30-50% of genomes per phylogenetic group. Dark colours: genes predicting function are present in >50% of genomes per phylogenetic group.
Page S11 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
au bp edge # 0.20
95 46 6 h1
Height 97 33 96 71 5 4 0.10 0.15
94 53 90 70 h2 3 2 h_soil
100 95 control
l_soil 1 c_soil l1 l2 0.00 0.05
Supplementary Figure 10. Metabolic capacity for carbon, sulphur, and nitrogen cycles clustered with P -values via multiscale bootstrap resampling. AU (approximately unbiased) P -values are shown in red, with any value > 0.95 (significance level 0.05) being significant. BP (bootstrap proportions) values are shown in green. Hierarchical clustering was completed using “correlation” as distance method and “average” as clustering method.
Page S12 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
80 80
75 60 60
50 40 40
25 Number of Bins of Number 20 20
0 0 0 −20 24 −20 24 −20 24 H−Soil Abundance L−Soil Abundance C−Soil Abundance
60 40
30 30 40
20 20
20
Number of Bins of Number 10 10
0 0 0 −20 24 −20 24 −20 24 L2 Abundance L1 Abundance C Abundance
30 30
20 20
10 Number of Bins of Number 10
0 0 −20 24 −20 24 H2 Abundance H1 Abundance
Supplementary Figure 11. Genome abundance per sample ( log scale). 10
Page S13 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
(A) Count
Abundance 0 80 Phylogeny −202 Row Z−Score
Bin_22_6 Bin_16_6 Bin_26_16 Bin_54_2 Bin_33_2 Bin_26_3 Bin_26_5 Bin_4_2 Bin_25_6 Bin_4_24 Bin_44_5 Bin_17_14 Bin_13_8 Bin_8_9 Bin_17_4 Bin_8_7_2 Bin_48_8 Bin_48_7 Bin_48_3 Bin_48_4 Bin_6_9 Bin_6_12 Bin_6_5 Bin_9_15 Bin_6_4 Bin_9_7 Bin_23_3 Bin_6_2 Bin_9_12 Bin_56_2 Bin_40_4 Bin_42_5 Bin_45_3 Bin_45_2 Bin_4_34 Bin_13_10 Bin_36_7 Bin_13_6 Bin_48_2 Bin_43_8 Bin_2_22 Bin_1_10 Bin_3_9
Bin_63_6 Bin_3_15 Bin_72_7 Bin_25_3 Bin_28_13 Bin_13_5 Bin_3_17 Bin_4_10
Bin_4_20_2 Bin_72_9 Bin_4_30 Bin_1_17 Bin_8_15 Bin_63_5 Bin_40_5 Bin_40_6 Bin_22_5 Bin_22_10 Bin_22_18 Bin_22_14 Bin_25_12 Bin_56_7 Bin_64_3 Bin_64_5 Bin_56_4 Bin_56_3 Bin_62_6 Bin_64_7 Bin_64_4 Bin_64_8 Bin_37_3 Bin_39_4 Bin_26_14 Bin_39_7 Bin_39_6
Bin_64_2 Bin_52_2 Bin_4_3 Bin_17_15 Bin_73_3 Bin_73_4 Bin_28_3 Bin_74_3 Bin_79_3 Bin_23_8 Bin_28_8 Bin_28_5 Bin_68_6 Bin_28_9 Bin_28_10 Bin_25_4 Bin_28_7 Bin_25_9 Bin_28_2
Bin_73_7 Bin_16_2 Bin_42_2 Bin_35_3 Bin_12_6 Bin_35_4 Bin_30_2 Bin_24_8 Bin_24_3 Bin_12_12 Bin_35_6 Bin_12_13 Bin_59_2 Bin_46_7 Bin_22_16 Bin_73_2 Bin_70_4 Bin_65_2 Bin_79_2 Bin_79_4 Bin_5_12 Bin_24_4 Bin_65_4 Bin_42_10 Bin_65_7 Bin_59_3 Bin_9_14 Bin_43_6 Bin_16_5 Bin_76_2 Bin_20_9 Bin_5_3 Bin_13_9 Bin_73_6 Bin_71_2_2 Bin_9_16 Bin_68_3 Bin_32_7 Bin_32_5 Bin_32_8 Bin_32_6 Bin_4_14 Bin_4_12 Bin_38_2 Bin_71_4 Bin_11_4 Bin_61_2 Bin_71_3 Bin_46_6 Bin_11_9 Bin_46_5 Bin_46_3 Bin_70_7 Bin_18_4 Bin_70_6 Bin_11_5 Bin_18_8 Bin_18_7 Bin_49_3 Bin_49_5 Bin_70_5 Bin_75_5 Bin_51_4 Bin_51_6 Bin_36_5 Bin_36_2 Bin_13_3 Bin_36_15 Bin_73_5 Bin_32_4 Bin_68_5 Bin_66_2 Bin_34_5 Bin_19_3 Bin_34_7 Bin_69_2 Bin_44_3 Bin_68_4 Bin_68_2 Bin_44_2 Bin_58_3 Bin_68_9 Bin_68_8 Bin_58_8 Bin_58_4 Bin_17_12 Bin_70_8 Bin_23_5 Bin_12_2 Bin_70_3 Bin_21_5 Bin_72_3 Bin_58_2 Bin_75_4 Bin_21_4 Bin_30_7 Bin_18_2_2 Bin_21_8 Bin_21_12
Bin_49_2 Bin_12_8 Bin_21_3_2 Bin_31_7 Bin_57_6 Bin_47_5 Bin_44_4 Bin_75_6 Bin_12_3 Bin_29_9 Bin_15_10 Bin_58_9 Bin_15_16 Bin_58_5 Bin_29_2 Bin_29_3 Bin_50_4 Bin_7_3_2 Bin_33_15 Bin_31_4 Bin_27_2
Bin_11_7 Bin_43_4 Bin_43_3 Bin_42_14 Bin_47_4 Bin_41_2 Bin_30_4 Bin_53_5 Bin_31_3 Bin_79_5 Bin_15_3 Bin_15_14 Bin_15_15 Bin_15_9 Bin_79_6 Bin_78_2 Bin_72_5 Bin_15_7 Bin_15_12 Bin_15_5 Bin_58_7 Bin_53_2 Bin_29_4 Bin_33_6 Bin_15_6 Bin_7_14 Bin_31_6 Bin_11_2
0.0 0.5 1.0 1.5 2.0 C L2 L1
H2 H1 0.5 soil soil soil − − − L (B) H C Count 02 4
−202 Row Z−Score
Bin_37_6
Bin_37_9
Bin_37_10
Bin_37_2
Bin_80_8
Bin_80_4
Bin_80_9
Bin_80_10
Bin_54_9
Bin_54_7
0.0 0.5 1.0 1.5 2.0
C 0.5 L1 L2 H1 H2 soil soil soil − − − L H C
Supplementary Figure 12. Genome abundance heatmap, and tanglegram separated by (A) bacterial genomes and (B) archaeal genomes. Left: The heatmap displays genome abundance per site normalised by amount of reads in each sample and
transformed to a log10 scale. Dotted yellow lines represent mean abundance values, and yellow traces represent the raw z-scores above (red) and below (blue) the mean. The abundance values are grouped both by sites (top dendrogram) and genomes (left dendrogram). Right: Tanglegram between dendrogram clustered by similar abundances and phylogenetic tree. Highlights in tanglegram: orange lines are genomes abundant in soil, green lines are genomes abundant in low runo↵ sediment, purple lines are genomes abundant in high runo↵ sediment, black lines are genomes shared in multiple environments.
Page S14 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
100 100 88 100 100 100 96 100 100 100 100
100 100 87 100
34 100 100 100 100 100 100 96 100 100 77
85 100 100 100 100 100 93
78 100
100 100 100 100
100 96 100
95 100 100 Gammaproteobacteria 100 100 40 100 Alphaproteobacteria
100 90 54 UBP10 100 100 100 100 88 100 Myxococcota 100 100 63 100 100 Nitrospirota
33 100 100 100 Methylomirabilota 70 100
87 100 100 100 100 Acidobacteriota
100 28 95 Eisenbacteria 98 100 36 100 UBP1 100 100 100 Gemmatimonadota 99 100 46 70 100 100 100 Zixibacteria 100 100 100 Bacteroidota 99 100 100 100 KSB1 100 100
100
73 Verrucomicrobiota 100 100 100 100 100 Planctomycetota 58 100
100 98 Omnitrophota 100 100 100
100 92 Lindowbacteria 74 100 98 100 6 Spirochaetota 100 72 100 100 Cyanobacteriota 100 82
77 98 Armatimonadota 96 100 100 100 Actinobacteriota 100 100 100 100
100 Chloroflexota 100
100 100 100 Patescibacteria 100 100 100 100 100
70 100 100 100 5
100 95 4 100
100 88 100 3 100 100 98 2 100 100 100 100 1 99 100 100 100 100 0
100 100 100 100 -log10 abundance 100 100 100 100 100 100 100
100 100 100 100 77 100 100
100 100 100 44 100 100 100
100 100 100 100 100 100 100
74 100 100 100 100 100 100 100 100 100
53 100 100 100 100 100
100 100
100 100 100 100 100
100 100 100 100
100 100 100 100 100
100 100
62 100 100 100 100
100 100 100 100 100 15
100 58 100 100 100
100 100 100 100 100
100 98 100 100 100 100
100 100
100 100 100 100 100 100 100
100 100 100 97 100 100 100
100 100
100 33 76 100 100
100 100
100 H1 H2 L2 L1 C H L C − − − soil soil soil 0.2
Bin_37_6 100 Bin_37_9 100 Bin_37_10 100 Bin_37_2 73 Crenarchaeota Bin_80_8 100 100 Bin_80_4 100 100 Bin_80_9 Bin_80_10 Nanoarchaeota Bin_54_9 100 Bin_54_7 H1 H2 L2 L1 C H L C − − − 0.1 soil soil soil
Supplementary Figure 13. Abundance of genomes are presented on a log10 scale were more negative values (red) are more abundant than positive values (blue). The abundance values correspond by row with the genomes phylogenetic assignment. Support values for phylogenetic tree are shown at each node. Page S15 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
P = 0.005 H1 F = 3.5242 High H2 0.2 0.4
Soil H-Soil PCoA 2 PCoA L2 L-Soil Low
0.2C-Soil 0.0 − L1 C
−0.6 −0.4 −0.2 0.0 0.2 0.4 PCoA 1 method = "bray" Supplementary Figure 14. Principal coordinate analysis (PCoA) on reconstructed genome abundances for all sites. Grouping the sites into soil, high, and low runo↵ was proven to be significant (PERMANOVA test: F =3.52, P =0.005).
Page S16 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
Supplementary Tables
Supplementary Table 1. Coordinates of Lake Hazen sediment and soil sites with temperature and date at time of sampling.
Sample Location Temperature ( C)Date H-Soil 81 84’ 840” N; 70 83’ 849” W -5 June 3, 2017 L-Soil 81 80’ 332” N; 71 54’ 239” W 0 June 7, 2017 C-Soil 81 79’ 382” N; 70 44’ 486” W -2 June 3, 2017 H1 81 84’ 150” N; 70 85’ 175” W not measured May 24, 2017 H2 81 82’ 493” N; 70 71’ 498” W not measured May 29, 2017 C 81 80’ 343” N; 70 50’ 447” W not measured May 27, 2017 L1 80 80’ 521” N; 70 52’699”W notmeasured June1,2017 L2 81 79’ 171” N; 71 46’926”W notmeasured June2,2017
Supplementary Table 2. Sediment deposition dates and rates for the two deep sites, H2 and L2, is based on 210Pb constant rate of supply (CRS) dating model. Analysis was completed on 0.5cm core intervals. Data adapted from previous study [2].
H2 (Abbe Deepsite) L2 (Blister Deep Site) Interval Midpoint CRS date (CRS year) Sedimentation rate (g/cm2/yr) Interval Midpoint CRS date (CRS year) Sedimentation rate (g/cm2/yr) 0-0.5 2017.1 0.349 0-0.5 2016.5 0.0724 0.5-1 2016.6 0.292 0.5-1 2014.0 0.0701 1-1.5 2015.9 0.111 1-1.5 2011.3 0.1163 1.5-2 2014.1 0.671 1.5-2 2009.2 0.1759 2-2.5 2012.7 0.671 2-2.5 2007.3 0.1232 2.5-3 2012.7 0.227 2.5-3 2004.7 0.0909 3-3.5 2012.0 5.563 3-3.5 2001.9 0.0967 3.5-4 2011.3 5.563 3.5-4 1998.8 0.0854 4-4.5 2011.3 5.563 4-4.5 1994.0 0.0513 4.5-5 2011.3 5.563 4.5-5 1987.2 0.0375 5-5.5 2011.3 5.563 5-5.5 1976.7 0.0220 5.5-6 2011.3 5.563 5.5-6 1966.51 0.0515 6-6.5 2011.3 5.563 6-6.5 1957.66 0.0339 6.5-7 2011.3 5.563 6.5-7 1949.00 0.0537 7-7.5 2011.3 5.563 7-7.5 1940.41 0.0451 7.5-8 2011.3 5.563 7.5-8 1931.79 0.0668 8-8.5 2011.3 5.563 8-8.5 1925.07 0.0553 8.5-9 2011.3 0.441 8.5-9 1918.10 0.0553 9-9.5 2010.9 0.224 9-9.5 1911.08 0.0553 9.5-10 2009.8 0.123 9.5-10 1904.42 0.0553
Page S17 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
Supplementary Table 3. Glacial runo↵ to the Lake Hazen Watershed. Lengths of rivers in km are shown in parentheses. Mass balance modelled runo↵ for years 2015 and 2016. Sampling dates in 2017 were prior to the summer runo↵. Data adapted from previous study [2].
Surface area (km2) Runo↵ volume (km3) River Catchment Glacier River 2015 2016 High Runo↵:Abb´e(AB) 390 204 7.9(21) 0.061 0.015 Low Runo↵: Blister (BR) n/a 6 2.5 (11) 0.002 <0.001 Gilman (GL) 992 708 5.1 (22) 0.192 0.08 Henrietta Nesmith (HN) 1274 1041 9.6 (4.6) 0.291 0.075 Snowgoose (SG) 222 87 5.6 (17) 0.026 0.006 Turnabout(TN) 678 259 13.4(42) 0.082 0.024 Very (VR) 1035 269 32.9(39) 0.165 0.08 Watershed total 7516 3078 91.2 0.979 0.291
Supplementary Table 4. DNA extraction masses. DNA was extracted in triplicate for each sample and then combined prior to sequencing.
Lake Hazen Sample Location Tube ID Wet Weight (g) PCR with glnA 1 2 3Total(grams) Sediment H2:Deephole S1 0.416 0.455 0.499 1.370 yes(diluted) L2: Blister Deep S2 0.497 0.455 0.389 1.341 yes H1: Abbe S3 0.469 0.514 0.552 1.535 yes L1: Blister Shallow S4 0.429 0.402 0.361 1.192 yes C: Ruggles S5 0.518 0.447 0.331 1.296 yes Soil L-Soil:BlisterSoil BS3 0.527 0.535 0.561 1.623 yes(diluted) H-Soil:AbbeSoil AS3 0.443 0.537 0.417 1.397 yes C-Soil: Ruggles Soil RS3 0.447 0.500 0.447 1.394 yes
Page S18 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
Supplementary Table 5. DNA fluorescence assay quantification for each sample. Note: H2 sequencing required using full extraction volume of 65 µltoreachan appropriate concentration. Volume Concentration Total DNA NanoDrop Sample (µl) (ng/µl) (ng) (concentration ng/µl) H2 65 0.08 3.36 4.6 L2 42 2.55 107.1 7.6 H1 42 4.33 181.86 10.2 L2 42 17.98 755.16 32.8 C4221.2890.436.4 H-Soil 42 38.14 1601.88 65.2 L-Soil 47 69.88 3284.36 149.0 C-Soil 47 74.38 3495.86 110.0
Supplementary Table 6. Contigs were assembled with Megahit and used in Anvio database. Only contigs that were greater than 2500 bp in length were used to form MAGs, however, contigs with a length of 1000 bp or more were used to form the initial database.
Contigs Total Length (bp) Min Contig Length (bp) Max Contig Length (bp) Avg Contig Length (bp) N50 (bp) # of genes prodigal Megahit 12026467 8477069127 200 792468 705 756 NA Anvio 1455655 3414974759 1000 792468 NA 285024 4254625
Supplementary Table 7. Number of bins recovered at each assembly step prior to and after manual refinement with Anvio.
Anvio-Output Anvio-Output CheckM Final CheckM (CONCOCT binning) (manually refined) Total Bins 850 850 877 877 Completion, Redundancy C: >90, R: <10 52 74 52 72 C: >70, R: <10 180 199 178 198 C: >50, R: <10 321 324 309 300 C: <50, R: any 529 526 568 577 Contamination <1233261255286 <5544632590676 >10 8 51 7 27 Good bins but contaminated C: >50, R: >10 2 27 0 2 C: >70, R: >10 1 6 0 0 C: >90, R: >10 0 2 0 0
Page S19 bioRxiv preprint doi: https://doi.org/10.1101/705178; this version posted January 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Colby et al.SupplementaryInformation
Supplementary Table 8. P -values for marker gene and pathway distribution between sites. A Fisher’s exact test was used in the place of 2 test for small count numbers. P -values of less than 0.05 are considered significant (labelled with an asterisks).
Functional Marker High vs Low High vs Control High vs Soil Low vs Control Low vs Soil Control vs Soil Carbon Cycle 0.03298 * 0.0004998 * 0.0004998 * 0.07746 0.1384 0.001499 * NitrogenCycle 0.07596 0.001999* 0.0004998* 0.7006 0.03448* 0.1859 sulphurCycle 0.001999* 0.002999* 0.0004998* 0.4913 0.02999* 0.3823
Page S20