Articles https://doi.org/10.1038/s41564-017-0098-y

Differential depth distribution of microbial function and putative symbionts through sediment-hosted aquifers in the deep terrestrial subsurface

Alexander J. Probst1,5,7, Bethany Ladd2,7, Jessica K. Jarett3, David E. Geller-McGrath1, Christian M. K. Sieber1,3, Joanne B. Emerson1,6, Karthik Anantharaman1, Brian C. Thomas1, Rex R. Malmstrom3, Michaela Stieglmeier4, Andreas Klingl4, Tanja Woyke 3, M. Cathryn Ryan 2* and Jillian F. Banfield 1*

An enormous diversity of previously unknown bacteria and archaea has been discovered recently, yet their functional capacities and distributions in the terrestrial subsurface remain uncertain. Here, we continually sampled a CO2-driven geyser (Colorado Plateau, Utah, USA) over its 5-day eruption cycle to test the hypothesis that stratified, sandstone-hosted aquifers sampled over three phases of the eruption cycle have microbial communities that differ both in membership and function. Genome-resolved metagenomics, single-cell genomics and geochemical analyses confirmed this hypothesis and linked microorganisms to groundwater compositions from different depths. Autotrophic Candidatus “Altiarchaeum sp.” and phylogenetically deep-branching nanoarchaea dominate the deepest groundwater. A nanoarchaeon with limited metabolic capacity is inferred to be a potential symbiont of the Ca. “Altiarchaeum”. Candidate Phyla Radiation bacteria are also present in the deepest groundwater and they are relatively abundant in water from intermediate depths. During the recovery phase of the geyser, microaerophilic Fe- and S-oxidizers have high in situ genome replication rates. Autotrophic Sulfurimonas sustained by aerobic sulfide oxidation and with the capacity for N2 fixation dominate the shallow aquifer. Overall, 104 different phylum-level lineages are present in water from these subsurface environments, with uncultivated archaea and bacteria partitioned to the deeper subsurface.

Much remains to be learned about how microbial communities in the deep terrestrial subsurface vary with depth due to limited access to samples without contamination from drilling fluids or sampling equipment. Studies to date have analysed samples acquired by drilling1–3, from deep mines4,5, subsurface research laboratories6,7 and sites of groundwater discharge8–11. These investigations have shown that the terrestrial subsurface is populated by a vast array of previously undescribed archaea and bacteria. At one site, an aquifer in Colorado (Rifle, USA), the diversity spans much of the tree of life12 and includes organisms of the Candidate Phyla Radiation (CPR)13, which may comprise more than 50% of all bacterial diversity14, and many other previously undescribed bacterial lineages. Also present in the terrestrial subsurface are previously unknown or little known archaea, including members of the DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, Nanohaloarchaeota)11,15, Altiarchaeum10, Lokiarchaeota16 and Aigarchaeota17.

A major question in subsurface microbiology relates to how organisms, and their capacities for carbon, nitrogen and sulfur cycling, vary along depth transects through the terrestrial subsurface. Some evidence pointing to taxonomic variation between 9 m and 52 m below the surface was obtained via a massive 16S ribosomal RNA gene survey at the Hanford Site18. Similar variation and change of two functional genes were also detected for two shallow aquifers that were accessed via drilling in Germany19. However, major groups of archaea and bacteria may have been overlooked due to sampling13 and primer bias13,20,21 and the spatial variation in metabolic functions over depth transects including the deep subsurface (100 m below the ground) remains unexplored.

Crystal Geyser is a cold-water, CO2-driven geyser located geologically within the Paradox Basin, Colorado Plateau, Utah, USA22. Originally an abandoned oil exploration well, the 800-m deep vertical borehole has served as a geyser conduit whose regular and significant flow rate (since 1936) provides uncontaminated access to organisms present in underlying aquifers. Prior geological studies have defined the region’s hydrostratigraphy, including the transmissive Entrada, Navajo, Wingate and White Rim fractured sandstone aquifers (listed in order of increasing depth), which are separated by low-permeability confining units23,24, through which limited vertical connectivity for CO2, water and microbes is largely restricted to faults and fractures25. A nearby research borehole provided further geologic and aquifer geochemical information to 322 m below ground surface26. Time-series geochemical data collected over the ca. 5-day eruption cycle suggest that Crystal Geyser is primarily sourced from the Navajo Sandstone, with increased contributions from the shallower Entrada Sandstone during major eruptions, and an increased fraction of deeper water during minor eruptions26,27.

A survey of ribosomal proteins predicted from metagenome sequences from Crystal Geyser microbial communities revealed the existence of a large phylogenetic diversity of previously unknown bacteria and archaea in this system8, and a genomic resolution study documented a high incidence of carbon-fixation pathways9.

1Department of Earth and Planetary Science, University of California, Berkeley, CA, USA. 2Department of Geoscience, University of Calgary, Calgary, AB, Canada. 3Department of Energy Joint Genome Institute, Walnut Creek, CA, USA. 4Plant Development and Electron Microscopy, Department of Biology I, Biocenter LMU Munich, Planegg-Martinsried, Germany. 5Present address: Group for Aquatic Microbial Ecology, Biofilm Center, Department of Chemistry, University of Duisburg-Essen, Essen, Germany. 6Present address: Department of Plant Pathology, University of California, Davis, Davis, CA, USA. 7These authors contributed equally: Alexander J. Probst and Bethany Ladd. *e-mail: [email protected]; [email protected]

Nature Microbiology | VOL 3 | MARCH 2018 | 328–336 | www.nature.com/naturemicrobiology © 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

A remaining question relates to the source regions and distributions of these organisms. Here, we tracked the microbiology and the associated geyser discharge geochemistry continuously throughout the full 5-day geyser eruption cycle to test the hypothesis that groundwater from stratified aquifers sampled at different stages of the cycle has microbial communities that differ in both membership and function. Our analyses made use of a comprehensive collection of more than 1,000 newly reconstructed genomes, both from metagenomes and single cells, as well as detailed physical and chemical information that enabled linking of fluids to their groundwater source regions.

Results

Continuous in situ (downhole) monitoring of the geyser water pressure throughout the field campaign defined the regular ~5-day period of the eruption cycle (Supplementary Fig. 1). Sampling was conducted over a complete eruption cycle (24–29 May, 2015) during which microbial cells were continuously collected onto 0.1 µm filters. Time series of downhole temperature, electrical conductivity, total dissolved gas pressure and water samples (for major ion, trace metal and dissolved gas analyses) were collected to associate specific microorganisms with water from different geyser eruption intervals and relative aquifer depths (Supplementary Fig. 1).

Time series of water pressure, electrical conductivity and temperature showed three Crystal Geyser eruption phases previously observed26,27: the recovery (relatively low water level, no eruptions, light CO2 bubbling), minor eruptions (short eruptions of ~10 min every hour with elevated CO2 discharge) and major eruptions (constant eruption and heavy CO2 discharge28; Fig. 1). Average chloride concentrations ([Cl]) and baseline water temperature (16.9 °C; also observed in a year-long monitoring period; Supplementary Fig. 1d) indicate that, overall, the geyser water is primarily sourced from ~320 to 480 m depth, which mainly corresponds to the Navajo aquifer (Supplementary Fig. 1a–d). In minor eruptions, elevated electrical conductivity and [Cl] indicate increased contribution from deeper, more saline water (that is, possibly the Wingate aquifer or Paradox brine sourced from even greater depth; Supplementary Fig. 1c). In the major eruption phase, decreased electrical conductivity and [Cl], and elevated Ca, Sr and Fe concentrations, were consistent with an increased contribution from the shallower Entrada aquifer (Supplementary Fig. 1b). During the eruption-free recovery phase, in which the Crystal Geyser borehole slowly refilled after the major eruption phase, electrical conductivity gradually increased with the relative contribution of deeper groundwater up to (and during) the minor eruptions. During this time the water level increased ~3.5 m over 33.5 h, potentially enabling microbes to thrive in microaerophilic borehole-affected conditions. To simplify terminology, we henceforth refer to the source water compositions as relatively ‘deep’ during the minor eruptions, ‘intermediate’ (and borehole-affected) during the recovery phase and ‘shallow’ during the major eruptions. Similar phase variations in the relative depths of water composition were recently observed27.

Analysis of microbial community composition in the 2015 bulk samples made use of a relatively comprehensive database of genomic information for the Crystal Geyser system (see Supplementary Fig. 2 for sample processing and analysis overview). The genomic dataset included previously reported draft genomes from this system9 and new genomes reconstructed from size-fractionated samples obtained in April and October of 2014 (Supplementary Table 1). Samples included a post-0.2 µm fraction collected onto a 0.1-µm filter to enrich for community members with ultra-small cell sizes. Binning of assembled metagenomes from 27 different samples from five time points in 2014 used seven different algorithms (see Supplementary Methods) and resulted in 30,574 genomic bins (with multiple bins for the same genome generated by different algorithms). Selection of the best quality bin generated for any organism in each sample resulted in 5,795 bins for bacteria and archaea, of which 2,216 were considered to be at least medium quality (>70% completeness based on single-copy genes with less than three multiple single-copy genes). After curation based on guanine-cytosine content, coverage and […], the database contains 1,215 genome sequences for 503 different archaeal and bacterial species (for details on genome numbers for each step, please see Supplementary Fig. 2; genome completeness is provided in Supplementary Table 2).

To augment the genome-resolved metagenomics, we acquired 206 single amplified genomes (SAGs) from cells collected at one time point in the minor eruption and one time point during the recovery phase. SAGs were chosen for full sequencing and analysis if PCR-screening for 16S rRNA genes was positive, agnostic to the specific sequence. Only SAGs with assembly size of >100 kbp after multi-step contamination screening were considered further. This set comprised 183 SAGs, seven of which were of sufficient quality to be classified as medium-quality draft genomes (>70% complete, less than three multiple single-copy genes). We required alignments of ≥98% nucleotide identity over >30% of the SAG to establish a match between SAGs and genomes from metagenomes. This approach was chosen because almost all of the SAGs were less complete than related draft genomes from metagenomes (Supplementary Fig. 3). In general, SAG sequences aligned well to

[Figure 1 graphic: time series over one eruption cycle, divided into recovery phase, minor eruptions and major eruptions. Axes: water pressure (kPa) and EC (mS cm–1) against per cent cycle (time). Numbered bars 01–25 mark the filtration periods for the metagenomic samples.]

Fig. 1 | Crystal Geyser’s 5-day eruption cycle measured during the 2015 sampling period exhibited variations in downhole water pressure and electrical conductivity that define three phases. In each phase, electrical conductivity (EC) and geochemical measurements (6,710 measurements each, no technical replicates; Supplementary Fig. 1) are used to identify relative depths of source water compositions: intermediate for the recovery phase (2,330 measurements), deep for the minor eruptions (2,820 measurements) and shallow for the major eruptions (1,560 measurements). The numbered horizontal grey bars indicate the time periods for each metagenomic sample (25 samples in total) and coloured numbers indicate the grouping of samples from each phase.

the sequences of genomes from the metagenomes (Supplementary Fig. 3). We found that >70% of the SAGs (145, of which five were draft quality) were represented in the set of 503 draft-quality genomes from the metagenomes. Conversely, 63 of the 503 genomes from metagenomes were also detected by single-cell genomics. Two draft-quality SAG genomes were not binned from the metagenomes and thus were added to the database. One SAG is entirely absent in the metagenomes based on sequences of the ribosomal protein S3 and read mapping, and probably derived from a very rare organism.

The 505 genomes in the database (Supplementary Table 2), which were derived via dereplication from a total set of 1,208 genomes (984 genomes from metagenomes, 222 genomes from a previous study9 and two single-cell genomes), represent archaeal and bacterial species that belong to 104 different phylum-level lineages (Fig. 2). Nine lineages were named as they were represented by at least two genomes with significant phylogenetic distance to neighbouring phyla and thus may constitute previously unrecognized phylum-level lineages. In addition, six genomes may be from previously unknown phylum-level lineages but the lineages are currently only represented by a single genome. The majority of diversity was attributed to members of the CPR (Fig. 2).

Mapping of metagenome reads to the set of 505 genomes showed that the genomes account for ~50% of the sequence data collected through the 2015 eruption cycle (Supplementary Table 4) and thus are representative of the community found in the Crystal Geyser ecosystem (Supplementary Fig. 4; morphological diversity of organisms is provided in Supplementary Fig. 5). Analysis of the community structure of the 25 different metagenomes using this approach revealed strong shifts in microbial composition over the cycle (Fig. 3b; Supplementary Table 5). The accuracy of relative abundance measures of individual genomes was confirmed for three species using quantitative digital droplet PCR (Supplementary Fig. 6). No physical or geochemical factor besides time, which corresponds to the source regions of the sampled water, could explain the observed changes in community composition (based on multivariate statistics; Supplementary Table 6). The community was dominated by species of the taxa Candidatus “Altiarchaeum”, Sulfurimonas, Piscirickettsiaceae, Gallionellaceae and […] (in order of decreasing abundance, Fig. 3c). The set of CPR and archaea from the DPANN superphylum showed several peaks in relative abundance at several time points during the eruption cycle (Fig. 3d,e). Both groups had the highest cumulative abundance during the minor eruptions, when groundwater from the deepest source was sampled. The most abundant CPR (Moranbacteria13) was, however, prominent during the recovery phase (Fig. 3d). Overall, the cumulative abundances of DPANN and other archaea were significantly higher in the deep groundwater compared to shallow or intermediate (Supplementary Fig. 7).
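The two numeric criteria quoted in the Results (medium-quality drafts at >70% completeness with fewer than three duplicated single-copy genes, and SAG matches at ≥98% nucleotide identity over >30% of the SAG) can be written down as simple predicates. The following is a minimal sketch only: the class, field and function names are hypothetical and are not taken from the authors' pipeline, and it assumes that completeness and alignment statistics have already been computed by other tools.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; all names below are illustrative assumptions.
MIN_COMPLETENESS = 70.0    # per cent, strictly greater than
MAX_MULTI_COPY_SCGS = 3    # strictly fewer than three duplicated single-copy genes
MIN_IDENTITY = 98.0        # per cent nucleotide identity, at least
MIN_SAG_FRACTION = 0.30    # aligned fraction of the SAG, strictly greater than


@dataclass
class Genome:
    name: str
    completeness: float    # per cent of expected single-copy genes detected
    multi_copy_scgs: int   # single-copy genes found more than once


def is_medium_quality(genome: Genome) -> bool:
    """Apply the medium-quality draft criterion stated in the Results."""
    return (genome.completeness > MIN_COMPLETENESS
            and genome.multi_copy_scgs < MAX_MULTI_COPY_SCGS)


def sag_matches_genome(identity: float, aligned_bp: int, sag_length_bp: int) -> bool:
    """Apply the SAG-to-metagenome-genome match criterion stated in the Results."""
    if sag_length_bp <= 0:
        return False
    aligned_fraction = aligned_bp / sag_length_bp
    return identity >= MIN_IDENTITY and aligned_fraction > MIN_SAG_FRACTION


if __name__ == "__main__":
    draft = Genome("hypothetical_bin", completeness=85.0, multi_copy_scgs=1)
    print(is_medium_quality(draft))
    print(sag_matches_genome(identity=99.1, aligned_bp=40_000, sag_length_bp=120_000))
```

Anchoring the aligned fraction to the SAG length rather than to the metagenome-derived genome mirrors the reasoning given in the text: the SAGs were usually the less complete of the two sequences.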

[Figure 2 graphic: phylogenetic tree with lineages numbered 1–104. Colour key: recovered from both technologies (62), recovered from metagenomics (441), recovered from single-cell genomics (2). Group labels: 277 CPR, 174 OD1, 347 Bacteria, 58 Archaea. Scale bar: 0.1.]

Lineage key ([…] marks a name lost in extraction): 1 Aenigmarchaeota*, 2 Hadesarchaea*, 3 […], 4 Bathyarchaeota*, 5 […], 6 Altiarchaeota*, 7 Diapherotrites*, 8 Micrarchaeota*, 9 Huberarchaea*, 10 Unknown_archaea, 11 Woesearchaeota*, 12 Pacearchaeota*, 13 Hydrogenedentes, 14 Desantisbacteria*, 15 […], 16 […], 17 […], 18 Ratteibacteria*, 19 Omnitrophica, 20 Riflebacteria*, 21 Caldiserica, 22 […], 23 Saganbacteria*, 24 Blackallbacteria*, 25 […], 26 […], 27 […], 28 […], 29 Stahlbacteria*, 30 […]*, 31 […], 32 Novel phylum-level lineage*, 33 SAR406, 34 Ignavibacteria, 35 […], 36 […], 37 Nitrospinae, 38 […], 39 […], 40 […], 41 […], 42 […], 43 […], 44 Betaproteobacteria, 45 […]*, 46 Dojkabacteria*, 47 WWE3*, 48 Gottesmanbacteria*, 49 Levybacteria*, 50 Roizmanbacteria*, 51 Beckwithbacteria*, 52 Collierbacteria*, 53 Pacebacteria*, 54 Woesebacteria*, 55 Shapirobacteria*, 56 […]*, 57 SR1*, 58 Peregrinibacteria*, 59 Novel phylum-level lineage*, 60 Howlettbacteria*, 61 […], 62 […]*, 63 Doudnabacteria*, 64 Moisslbacteria*, 65 Kerfelbacteria*, 66 Parcubacteria, 67 Buchananbacteria*, 68 Komeilibacteria*, 69 Kuenenbacteria*, 70 Falkowbacteria*, 71 Uhrbacteria*, 72 Novel phylum-level lineage*, 73 Magasanikbacteria*, 74 Torokbacteria*, 75 Andersenbacteria*, 76 Moranbacteria*, 77 Yanofskybacteria*, 78 Portnoybacteria*, 79 Novel phylum-level lineage*, 80 Novel phylum-level lineage*, 81 Novel phylum-level lineage*, 82 Nealsonbacteria*, 83 Staskawiczbacteria*, 84 Gribaldobacteria*, 85 Brennerbacteria*, 86 Jorgensenbacteria*, 87 Wolfebacteria*, 88 Liptonbacteria*, 89 Colwellbacteria*, 90 Harrisonbacteria*, 91 Ryanbacteria*, 92 Niyogibacteria*, 93 Tagabacteria*, 94 Terrybacteria*, 95 Novel phylum-level lineage*, 96 Vogelbacteria*, 97 Nomurabacteria*, 98 Lloydbacteria*, 99 Yonathbacteria*, 100 Campbellbacteria*, 101 Novel phylum-level lineage*, 102 Taylorbacteria*, 103 Zambryskibacteria*, 104 Kaiserbacteria*.

Fig. 2 | Diversity of recovered genomes based on 16 concatenated ribosomal proteins. Genomes were reconstructed for organisms from 104 different phylum-level lineages; 503 different lineages are shown (two lineages did not exhibit >50% alignment coverage and are thus not displayed). Phyla in bold were assigned names in this study. The scale corresponds to per cent average amino acid substitution over the alignment. Asterisks mark yet-to-be-cultivated phyla, which thus have a Candidatus status. OD1, Parcubacteria. A full tree with reference sequences can be found in Supplementary File 1.
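The genome counts quoted in the Results and in the Fig. 2 colour key can be tallied with a few lines of arithmetic. This is an illustrative check only; the dictionary keys and variable names are assumptions made for readability, while the numbers are those given in the text and the figure key.

```python
# Counts quoted in the Results and the Fig. 2 colour key; names are illustrative.
fig2_key = {
    "both technologies": 62,
    "metagenomics": 441,
    "single-cell genomics": 2,
}
before_dereplication = {
    "genomes from metagenomes": 984,
    "genomes from previous study": 222,
    "single-cell genomes": 2,
}

database_size = sum(fig2_key.values())                   # genomes in the final database
total_before_dereplication = sum(before_dereplication.values())

# Fraction of quality-screened SAGs represented among the metagenome-derived genomes.
sags_screened = 183
sags_represented = 145
sag_fraction = sags_represented / sags_screened

print(database_size, total_before_dereplication, round(sag_fraction, 3))
```

The sums reproduce the 505 database genomes and the 1,208 genomes entering dereplication, and the SAG fraction is consistent with the ">70%" stated in the text.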


[Figure 3 graphic. Panel a: hydrostratigraphic section against depth (m, 200–800) with labels Entrada Ss, Carmel Fm, Navajo Ss, Kayenta Fm, Wingate Ss, Chinle Fm, Moenkopi Fm, Blackbox dol, White Rim Ss, Little Grand Wash Fault, GW flow (meteoric) and CO2-charged brine. Panel b: NMDS ordination (NMDS1 against NMDS2, stress = 0.07) with samples coloured by per cent cycle. Panel c: relative abundance of the entire community (505), with Sulfurimonas and Ca. “Altiarchaeum” labelled. Panel d: relative abundance of CPR (277), with Moranbacteria (OD1) labelled. Panel e: relative abundance of DPANN (47), with Ca. “Huberarchaeum” labelled. Panel f: EC (mS cm–1) against per cent cycle (time), samples 01–25. Phase key: recovery phase, intermediate groundwater; minor eruptions, deep groundwater; major eruptions, shallow groundwater.]

Fig. 3 | Hydrogeology and community composition of subsurface fluids sourced from Crystal Geyser throughout an entire eruption cycle. a, The Crystal Geyser site lies within one of the several natural CO2 reservoirs within the Paradox Basin. The CO2 was probably generated from thermal decomposition of Pennsylvanian-aged carbonate rocks26,51,52. CO2 gas and brine formed by groundwater dissolution of Paradox evaporites migrate via faults and fractures53,54. b, The community profile of 505 organisms strongly followed the succession of the geyser eruptions (blue lines, NMDS). One data point corresponds to one metagenomic sample. The samples show a clear pattern following the succession of the geyser cycle. c, Entire community profile of 505 organisms tracked across the 5-day cycle of the geyser. Each colour corresponds to one genome. d,e, Profiles of the CPR and DPANN community, respectively, show an increase in the overall abundance during the minor eruptions when groundwater has the deepest source composition. f, Downhole electrical conductivity time series during the sampling of the cycle illustrating the individual phases of the geyser (6,710 samples were measured, see Fig. 1 and Supplementary Fig. 1). Number of biological replicates in panels b–e was 24. EC, electrical conductivity; GW, groundwater.


When analysed one at a time, the majority of organisms (289, ~57%) were significantly enriched (false discovery rate-corrected P value <0.05) in one specific phase of the geyser and could thus be sourced to one of the groundwater depths (Fig. 4a–c). The shallowest groundwater was mainly populated by one Sulfurimonas sp. along with a few other bacteria and some archaea. Based on the genome sequence of Sulfurimonas sp., this organism was inferred to be a chemolithoautotroph, capable of nitrogen and carbon fixation as well as sulfide oxidation through oxygen respiration (Supplementary Table 7). The capacity for carbon fixation via the low-cost reductive TCA cycle (two ATP per pyruvate29) coupled to oxygen respiration may provide an ecological advantage for this species and is also indicative of microaerophilic conditions in the relatively shallow aquifer. In contrast to the shallow source, groundwater from intermediate depths had a great diversity of different organisms, the majority of which belonged to the CPR. The most abundant organism was a member of the Gallionellaceae, a family of bacteria well known for microaerophilic iron and sulfur oxidation at Crystal Geyser8,9. This organism also exhibited the highest genome replication rates of all bacteria in the study (average in situ replication rate (iRep) value of 2.5, maximum iRep value of 4.2; Supplementary Table 8), suggesting that it was also proliferating in the geyser conduit over the 33.5 h of the recovery phase. Its growth was probably favoured by microaerophilic conditions as well as sulfide and reduced iron in the geyser fluids. Potentially, other microorganisms enriched in this fraction may also have favoured the conditions in the borehole over the 33.5-hour recovery phase, during which the geyser had no water discharge. Consequently, the community sampled from the recovery phase represents the community from intermediate depths with distortions from microbial growth in the borehole. When deeper groundwater was discharged, the abundances of different DPANN archaea and Ca. “Altiarchaeum” were significantly increased. Diverse members of the CPR were still present in deep groundwater, although at low relative abundance.

The shallowest groundwater had a substantially higher capacity for microbial sulfide oxidation, nitrogen fixation and oxygen respiration, probably due to the presence of atmospheric gases. In contrast, the intermediate source and borehole community had the highest microbial capacity for reduction of various nitrogen compounds as well as thiosulfate disproportionation, metal reduction and oxidation. The deepest groundwater was enriched in several bacteria with the capacity for sulfite reduction, with carbon fixation mediated by the Ca. “Altiarchaeum”. The capacity for oxygen respiration decreased with increasing depth to the sourced groundwater.

[Figure 4 graphic. Panels a–c: relative abundance of organisms against per cent cycle, with pie charts of CPR, DPANN, other archaea and other bacteria. a, Organisms enriched in shallow groundwater (n = 5): 35, with Sulfurimonas labelled. b, Organisms enriched in intermediate groundwater and conduit (n = 8): 164, with Gallionellaceae labelled. c, Organisms enriched in deep groundwater (n = 4): 90, with Ca. “Altiarchaeum” labelled. Panel d: changes in capacity for carbon and nitrogen fixation (WL pathway, CBB cycle, reverse TCA cycle, nitrogen fixation), plotted as relative abundance of organisms contributing to each pathway against per cent cycle. Panel e: significant changes of metabolic pathways between phases (minor eruptions, major eruptions, recovery phase, minor eruptions) for sulfide oxidation, sulfur oxidation, sulfite reduction, sulfate reduction, thiosulfate disprop., nitrogen fixation, nitrite oxidation, nitrate reduction, nitrite reduction, nitric oxide reduction, nitrous oxide reduction, CBB cycle, reverse TCA cycle, WL pathway, oxygen respiration, Fe/Mn reduction/oxidation and hydrogen metabolism; scale –2,400 to +2,400; arrows mark significant increase or decrease. Colour key: shallow groundwater, intermediate groundwater/conduit, deep groundwater.]

Fig. 4 | Microbial source tracking and changes of metabolic potential. a–c, The abundances of organisms that are significantly enriched in groundwater from the different depths (for details on organisms please see Supplementary Table 5; the number of biological replicates is given in parentheses in panels a–c). The pie charts indicate the diversity of CPR, DPANN, other bacteria, and other archaea associated with each relative depth. d, Different carbon fixation pathways predominate in groundwater from the three different depths. Nitrogen fixation and the reverse TCA cycle occur in one organism, Sulfurimonas (a). e, Metabolic pathway analysis shows distinct metabolic profiles associated with the groundwater from the different depths (individual metabolic capacities of each organism are listed in Supplementary Table 7). Each circle displays the cumulative relative abundance of genomes contributing to this single metabolic process. Arrows display whether an increase or decrease is significant (P < 0.05). CBB, Calvin-Benson-Bassham; disprop., disproportionation.
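The enrichment calls above rest on false discovery rate-corrected P values. The text does not state which correction procedure was used; the sketch below assumes the widely used Benjamini–Hochberg step-up procedure purely for illustration, and the function name and example values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original input order.

    Assumption: BH is used here for illustration only; the source text says
    'false discovery rate-corrected' without naming the procedure.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-organism P values for enrichment in one phase.
    raw = [0.01, 0.04, 0.03, 0.005]
    corrected = benjamini_hochberg(raw)
    enriched = [q < 0.05 for q in corrected]
    print(corrected, enriched)
```

With 505 genomes each tested against the phases, a correction of this kind keeps the expected share of false enrichment calls below the chosen 0.05 level.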


[Figure 5 graphic. Panel a: log10 coverage of Candidatus “Altiarchaeum” (SM1) and Candidatus “Huberiarchaeum” across 25 samples ordered by time; P value < 2.8 × 10E-12; cor. = 0.94. Panel b: scanning electron micrograph with labels “SM1” and “?N”; scale bar, 200 nm.]

Fig. 5 | Putative symbiotic interaction of Ca. “Altiarchaeum” and Ca. “H. crystalense”. a, Linear correlation analysis of relative abundance of the two archaea across 25 metagenome samples (full cycle of the geyser). b, Scanning electron micrograph of what are inferred to be Ca. “Altiarchaeum” cells (“SM1”) taken during the minor eruptions of the geyser. Tiny cell-like structures appear to be attached to the surface (“?N”). This structure was observed in two out of five samples taken for scanning electron microscopy analysis from the geyser fluids. cor., correlation coefficient. More images are available in Supplementary Fig. 9.
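The linear correlation in panel a can be illustrated with a Pearson coefficient computed on log10-transformed coverage. This is a minimal sketch under stated assumptions: the function name and the coverage values are invented for illustration and are not data from the study, and the P value calculation is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-sample coverage values for a host and a putative symbiont.
    host_coverage = [1200.0, 950.0, 400.0, 2100.0, 1800.0]
    symbiont_coverage = [60.0, 45.0, 30.0, 110.0, 85.0]
    log_host = [math.log10(v) for v in host_coverage]
    log_symbiont = [math.log10(v) for v in symbiont_coverage]
    print(pearson(log_host, log_symbiont))
```

A high coefficient across the 25 samples shows co-variation in abundance, which is compatible with a host–symbiont relationship but does not by itself demonstrate one; the text accordingly treats it as one line of support alongside the micrographs.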

Previously, we reported the operation of three carbon fixation pathways in bacteria and archaea from the Crystal Geyser communities, each of which requires substantially different amounts of energy9. While the Wood–Ljungdahl pathway requires approximately 1 mol of ATP for the generation of 1 mol pyruvate, the Calvin–Benson–Bassham cycle and the reverse TCA cycle require 7 mol and 2 mol, respectively29. Here, we show that the three carbon fixation pathways used by Crystal Geyser microorganisms were most abundant in different eruption phases, which reflect varying depths of source water composition (and borehole; Fig. 4e). The reductive TCA cycle was associated with a Sulfurimonas sp. that dominates the shallowest groundwater and also has the capacity for N2 fixation. The Calvin–Benson–Bassham cycle was enriched in bacteria associated with the intermediate groundwater as well as the borehole, and the Wood–Ljungdahl pathway was encoded in Ca. “Altiarchaeum” and Deltaproteobacteria genomes, and thus most highly represented in the deepest groundwater (Fig. 4d).

One previously undescribed archaeal phylum-level lineage within the DPANN branched next to Parvarchaeota (ARMAN-5) and Nanoarchaeum equitans. The 16S rRNA gene sequence of this species shared less than 67% identity with any 16S rRNA gene available in the SILVA database30 (and <78% with 16S rRNA gene fragments from environmental samples in the National Center for Biotechnology Information). We reconstructed 11 genomes for this species (including one from a single isolated cell) and estimated the genome size to be about 0.5 Mbp, which is similar to those of Nanoarchaeum equitans and some other DPANN15,31. We propose the name Candidatus “Huberarchaeum crystalense” (phylum Ca. “Huberarchaea”) for this archaeal lineage in honour of Prof. Robert Huber, pioneer in research on psychrophilic archaea and discoverer of Ca. “Altiarchaeum”.

Although enzymes for modification of purine and pyrimidine bases were encoded in the genome of Ca. “H. crystalense” (for example, via methylated folate), it is predicted to be incapable of de novo nucleotide synthesis (Supplementary File 2). The genome encodes a near-complete set of aminoacyl transfer RNA synthetases, proteins for replication and repair of DNA and translation and transcription machinery. Amino acids, whose biosynthesis pathways were lacking, are probably acquired via five different proteases. It has enzymes for glycosylating proteins and lipids and a near-complete pathway for lipid biosynthesis, observations that support the claim that this is a cellular organism. Protein export was probably accomplished via an encoded sec-pathway. Based on the limited metabolism of Ca. “H. crystalense”, we infer a symbiotic lifestyle. Interaction of the symbiont and a host may be mediated via large surface proteins, some of which are Cys-rich32. One of the extracellular, membrane-anchored Cys-rich proteins is predicted to bind calcium (Supplementary Fig. 8)33, a function also commonly found in hemolysin proteins. Hemolysin proteins destroy cell membranes, an activity that might be pivotal for Ca. “H. crystalense” to access metabolites from its host.

Within the whole geyser community, Ca. “H. crystalense” is the seventh most abundant organism. Notably, its abundance correlated significantly with that of the dominant organism, Ca. “Altiarchaeum” (linear correlation, P value < 2.8 × 10E-12; Fig. 5a). Based on the correlation of abundance patterns, we suggest that Ca. “H. crystalense” is a symbiont of the Ca. “Altiarchaeum”. Some support for this may be provided by scanning electron microscope images, which showed small rounded structures of approximately 0.15-µm diameter attached to larger cells (Fig. 5b, Supplementary Fig. 9). We infer that the larger cells are Ca. “Altiarchaeum”, based on the distinct hami-like appendages10, and that Ca. “H. crystalense” are episymbionts. Interestingly, both genomes exhibited very high levels of fragmentation, an indication of high levels of strain heterogeneity within both populations. The diversification of the Ca. “Altiarchaeum” host in its deep subsurface habitat might drive coevolution of Ca. “H. crystalense”. The shared characteristic of strain heterogeneity may also support the inference of their interaction.

[Figure 6 graphic: conceptual diagram of sandstone pore space with labels CPR/DPANN, other bacteria, SM1 and other archaea, and quartz grain with organic carbon and mineral coatings.]

Fig. 6 | Conceptual representation of a relatively stable microbiome in deeper sandstone aquifer sources. The microbiome is dominated by Ca. “Altiarchaeum” (SM1) and their putative DPANN symbionts and populated by many CPR and other bacteria, some of which are probably symbiotic partners for CPR. We envision facile distribution of the very small CPR and DPANN cells through the sandstone pore spaces, providing periodic opportunities for establishment of the symbiont–host interactions that are probably required for CPR and DPANN cell replication. This figure provides a conceptual diagram of generalized microbial habitats in the aquifer based on an approximate pore size of sandstone. However, we note the subsurface is a heterogeneous three-dimensional system and physical properties will vary substantially55. The Carmel and Kayenta formations are expected to act as aquitards (confining barriers) that separate the high permeability sandstone aquifers (Fig. 3a), with each aquifer largely confined, both hydrologically and microbiologically, from other aquifers by these low-permeability shale/mudstone units56. This physical separation by low-permeability units probably contributes to the distinctive microbial communities associated with the three relative groundwater source depths as documented in the study.

[…] Spectrometry and Inductively Coupled Plasma Mass Spectrometry, respectively38, at the Geological Survey of Canada in Calgary.

Dissolved gas collection and analysis. Water samples for dissolved gas composition were taken simultaneously with water chemistry and isotope samples, collected using the inverted bottle method39 in 12 ml glass bottles capped with precision seal silicone septa caps. All samples were refrigerated until analysis at the University of Calgary. Dissolved gas compositions were determined by gas chromatography using headspace extraction. Due to our primary interest in CO2 gas in this system, we used a headspace to sample water ratio of 3:1 and shaking time of 12 min at 400 rpm40. Headspace samples were injected onto an HP 5890 (Hewlett-Packard) gas chromatograph with a Hamilton gas-tight syringe via a six-port, two-position sampling valve.
The gas chromatograph was outfitted with parallel Rt-Msieve 5 A (Restek, 30 m ×​ 0.32 mm) and Rt-Q-PLOT (Restek, 30 m ×​ 0.53 mm), and data were collected using an HP 3396 Series II Integrator Discussion (Hewlett-Packard). Our microbiological investigation clearly demonstrated a strong stratification of microbial community composition and microbial Genome-resolved metagenomics and single-cell genomics. Methods for genomic function with relative groundwater source depths. Groundwater analysis of the 2014 datasets (including estimation of genome completeness) can be sampled from all three relative depths was dominated by auto­ found in the Supplementary Methods. trophs. The main pathway used for carbon fixation in the deeper Crystal geyser genome database. The genome database was constructed from subsurface is the one with the lowest energy cost, the Wood– genomes, from metagenomes and from single-cell genomes (SAGs) collected Ljungdahl pathway, possibly because the is the in 2014. First, all curated, newly binned genomes from metagenomes (985 in 9 most energy limited. Use of this pathway for provision of organic total) were combined with 222 previously published genomes and clustered based on 98% nucleotide identity. One representative of each genome cluster carbon was reported recently for other deep biosphere communi­ was chosen based on the highest completeness (single-copy genes) and lowest 3,6 ties . This pathway is also central to metabolism of methanogens, amount of contamination (multiple single-copy genes) following the formula: 34 archaeal found in the deep subsurface . Reliance on the score =​ single-copy genes—2x multiple single-copy genes9,12. In cases of ties, the Wood–Ljungdahl pathway for CO2 fixation may be a widespread genome with the highest N50 was chosen. The resulting 503 archaea and bacteria phenomenon in such environments. 
Our results indicate that the were then compared against draft-quality SAGs (at least 70% complete) using 98% nucleotide identity. Two draft-quality SAGs were not covered by the genomes from carbon provided by primary producers operating different carbon metagenomes and were thus added to the Crystal Geyser database that consists fixation pathways sustains a wide variety of bacteria and archaea in of 505 archaeal and bacterial species used for downstream analyses. A schematic the subsurface. overview of the procedure is presented in Supplementary Fig. 2. This study adds to a growing body of literature that suggests that terrestrial subsurface regions are biodiversity hotspots6,12. Comparison of genomes from metagenomes to SAGs. Whole-genome alignment of genomes from metagenomes9 was performed at 98% nucleotide identity. Notably, we find that deeper regions can be particularly enriched If a SAG shared more than 30% of its genomic content with a genome from a in candidate phyla bacteria (especially CPR), DPANN archaea and metagenome (which were at least 70% complete), the SAG was considered to be other deep-branching archaea. The CPR were the most diverse represented by the genome from the metagenome (Supplementary Fig. 4). organisms in the system. Intriguingly, many of these enigmatic CPR and DPANN are inferred to be symbionts13,15, probably epi­ Phylogeny of bacteria and archaea. Phylogenetic placements of the 505 archaea 35 and bacteria in the Crystal Geyser database were determined from a tree computed symbionts of other bacteria or archaea . One highly abundant from 16 ribosomal proteins14 and included 3,609 sequences (including reference DPANN is a putative episymbiont of the most abundant archaeon, sets from previous studies12,14). Bacterial ribosomal proteins were extracted using Ca. “Altiarchaeum”; however, further investigations are neces­ usearch41 against a public database9 (https://github.com/AJProbst/sngl_cp_gn), sary to confirm this association. 
The putative symbiotic relation­ while archaeal ribosomal proteins were first selected by searching against Hidden 42 14 ship between Ca. “Altiarchaeum” and Ca. “H. crystalense” could Markov Models (HMMs) built from a previous dataset (to exclude A/E type) and then annotated against UniRef43. Individually aligned protein sequences44 be analogous to that described between Ignicoccus hospitalis and were end trimmed and gaps ( ​5% coverage) were removed before concatenation 36 < Nanoarchaeum equitans . Although Ca. “Altiarchaeum” is found of protein sequences. Only sequences of genomes that spanned at least 50% of elsewhere in the subsurface10, Ca. “H. crystalense” has not been the alignment were included in the phylogenetic analysis; others were classified detected in other metagenomic studies. The frequent detection of using ribosomal protein S3 or 16S rRNA gene sequences. Trees were computed 14 CPR and DPANN in groundwater, as found in this and other stud­ as described earlier . Taxa from Probst et al., 2016 that were redefined in their taxonomy are listed in Supplementary Table 2. ies6,8,9,12,13,15, may reflect the advantage of existence as ultra-small cells that can be readily distributed through sediment pore spaces, Tracking community members across time. In 2015, time resolution of the allowing periodic encounters with potential host organisms (Fig. 6). Crystal Geyser community was achieved by near-continuous filtration of Highly interdependent lifestyles and intimate metabolic connec­ groundwater onto 0.1-µ​m filters (ZTECG, Graver Technologies, Glasgow, USA) tions among community members may be an adaptation to con­ that we recovered at 25 different time points over a time course of nearly 5 days (114 h). Filtration was performed for an average of 4.6 h per sample and sampling stant low-nutrient conditions at depth. spanned one entire eruption cycle of the geyser (Fig. 1). 
Metagenomic DNA was extracted from the samples8 and paired-end sequenced (Illumina HiSeq 2500, Methods Supplementary Table 1). Reads of 150 bp were quality filtered (see Supplementary Water chemistry and isotopes. Downhole electrical conductivity, water pressure Methods) and mapped onto the de-replicated genome set of 505 organisms using and temperature were monitored using a Solinst LTC logger located in Crystal bowtie245 (default settings), allowing three mismatches per 150-bp read Geyser borehole about 8.5 m below ground surface. Water samples for major ions (98% identity)46. Read coverage was normalized by genome size and relative and trace metals were collected hourly at a pumping rate of 0.2 l min-1 over the abundances of each genome in each sample were normalized by number of -1 eruption cycle from the borehole from a sampling tube inserted to 8.5 m below reads per sample using the equation Ar =​ Nm/Ns * (r * l) g , where Ar is the ground surface, and from the Green River. During two individual minor eruptions, relative abundance of the genome in a particular sample, Nm is the maximum samples were collected approximately every 10 min. Samples were feld fltered to number of reads of all metagenome samples, Ns is the total number of reads 0.2 μm​ by hand using Acrodisc syringe flters and a plastic syringe before collection of that particular sample, r is the number of reads of that particular sample into prerinsed 60 ml scintillation vials and then acidifed to pH 2 with high-purity that mapped to the genome, l is the average read length and g is the length nitric acid for sample preservation37. Te bottles were frozen for transport to of the genome. the University of Calgary. Alkalinity was measured in the laboratory using an iRep values were calculated with one mismatch per read as described Orion Autochemistry 960 Autotitrator with 0.2 N sulfuric acid within 1 month of earlier46. 
Relative abundance measure using metagenomics was confirmed using collection and expressed as HCO3 concentration. Major element and trace metal quantitative digital droplet PCR (ddPCR), for which the method can be found in concentrations were determined using Inductively Coupled Plasma Emission the Supplementary Methods.
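The normalization described under "Tracking community members across time" can be illustrated with a minimal Python sketch. It assumes the equation reads as coverage, (r × l) / g, scaled by the ratio of the largest sample's read count to this sample's read count, N_m / N_s. The function name, argument names and all read counts below are hypothetical and are not taken from the study's own code.

```python
def relative_abundance(reads_mapped, read_length, genome_length,
                       sample_reads, max_sample_reads):
    """Return A_r = (N_m / N_s) * (r * l) / g for one genome in one sample."""
    if genome_length <= 0 or sample_reads <= 0:
        raise ValueError("genome_length and sample_reads must be positive")
    coverage = reads_mapped * read_length / genome_length   # (r * l) / g
    depth_scale = max_sample_reads / sample_reads            # N_m / N_s
    return depth_scale * coverage


# Hypothetical read counts, chosen only to make the arithmetic easy to follow.
total_reads = {"sample_A": 40_000_000, "sample_B": 20_000_000}
n_m = max(total_reads.values())  # N_m: largest metagenome in the set

for sample, n_s in total_reads.items():
    # 10,000 reads of 150 bp mapped to a 0.5-Mbp genome in each sample
    a_r = relative_abundance(10_000, 150, 500_000, n_s, n_m)
    print(sample, a_r)
```

With these illustrative inputs the same raw coverage (3×) yields 3.0 for the deeper-sequenced sample and 6.0 for the sample with half as many reads, which is the intended effect of scaling every sample up to the depth of the largest one.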

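The representative-selection rule given under "Crystal geyser genome database" (score = single-copy genes − 2 × multiple single-copy genes, ties broken by highest N50) can be sketched in the same way. This is only an illustration under stated assumptions: the `GenomeBin` class, its field names and the example bins are hypothetical, and the interpretation of "single-copy genes" as the count of marker genes detected and "multiple single-copy genes" as the count of markers found more than once is an assumption.

```python
from dataclasses import dataclass


@dataclass
class GenomeBin:
    name: str
    single_copy_genes: int      # assumed: marker genes detected (completeness)
    multiple_single_copy: int   # assumed: markers found more than once (contamination)
    n50: int                    # assembly N50 in bp, used only to break ties


def score(genome: GenomeBin) -> int:
    """score = single-copy genes - 2 x multiple single-copy genes."""
    return genome.single_copy_genes - 2 * genome.multiple_single_copy


def pick_representative(cluster: list[GenomeBin]) -> GenomeBin:
    """Choose the highest-scoring genome; ties go to the highest N50."""
    return max(cluster, key=lambda g: (score(g), g.n50))


# Hypothetical cluster of genomes sharing at least 98% nucleotide identity.
cluster = [
    GenomeBin("bin_A", single_copy_genes=48, multiple_single_copy=2, n50=20_000),
    GenomeBin("bin_B", single_copy_genes=46, multiple_single_copy=1, n50=15_000),
    GenomeBin("bin_C", single_copy_genes=50, multiple_single_copy=5, n50=90_000),
]
print(pick_representative(cluster).name)
```

Here bin_A and bin_B tie at a score of 44 and bin_C scores 40 despite its higher completeness, so bin_A is selected on N50. Doubling the contamination term means that a duplicated marker costs more than a missing one.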

Microbial metabolism from genomics. Microbial metabolism from genomics was predicted as described earlier9,12. In brief, genes for each genome were predicted using prodigal47 with the respective genetic code, and key metabolic genes for various sulfur, nitrogen, hydrogen and metal redox processes were predicted using HMMs12,42. In addition, functional predictions against KEGG were performed on the basis of HMMs including hits with e-values <E-10 (ref. 9). As such, the genetic potential of organisms for carbon fixation and oxygen respiration was based on the presence of all key enzymes and a pathway coverage of at least 60% in the KEGG module.

Microbial community statistics. Ordination analyses of microbial community structure were performed using a Bray–Curtis distance measure and non-metric multidimensional scaling (NMDS) in the R programming environment48. The influence of environmental factors as provided in Supplementary Table 1 was determined by BioENV (Bray–Curtis dissimilarity, Spearman correlation) and plotted onto the NMDS49. Microbial source tracking of organisms and changes in microbial metabolism between different groundwater source depths based on cumulative abundance of organisms was performed using analysis of variance coupled to a Tukey honest significant difference post hoc test. Sample designations of the different depths correspond to those provided in Fig. 1. All P values that were affected by multiple testing were corrected for false discovery rate using the Benjamini–Hochberg procedure50.

Scanning electron microscopy. Methods can be found in the supplementary documents.

Data availability. SRA accession numbers for metagenomes of each sample are provided in Supplementary Table 1. All genomes from metagenomes included in this study were deposited at NCBI under Bioprojects PRJNA362739, PRJNA349044 and PRJNA297582. Genomes from metagenomes and single-cell genomes are also available under: http://ggkbase.berkeley.edu/CG_2014_505_non-redundant_genomes/organisms, http://ggkbase.berkeley.edu/CG_2014_genomes_from_metagenomes/organisms and http://ggkbase.berkeley.edu/CG_2014_SAGs/organisms.

Received: 23 February 2017; Accepted: 12 December 2017; Published online: 29 January 2018

References

1. Wrighton, K. C. et al. Fermentation, hydrogen, and sulfur metabolism in multiple uncultivated bacterial phyla. Science 337, 1661–1665 (2012).
2. Pedersen, K., Arlinger, J., Ekendahl, S. & Hallbeck, L. 16S rRNA gene diversity of attached and unattached bacteria in boreholes along the access tunnel to the Äspö hard rock laboratory, Sweden. FEMS Microbiol. Ecol. 19, 249–262 (1996).
3. Magnabosco, C. et al. A metagenomic window into carbon metabolism at 3 km depth in Precambrian continental crust. ISME J. 10, 730–741 (2016).
4. Chivian, D. et al. Environmental genomics reveals a single-species ecosystem deep within Earth. Science 322, 275–278 (2008).
5. Bagnoud, A. et al. Reconstructing a hydrogen-driven microbial metabolic network in Opalinus Clay rock. Nat. Commun. 7, 12770 (2016).
6. Hernsdorf, A. W. et al. Potential for microbial H2 and metal transformations associated with novel bacteria and archaea in deep terrestrial subsurface sediments. ISME J. 11, 1915–1929 (2017).
7. McKelvie, J. R., Korber, D. R. & Wolfaardt, G. M. in Their World: A Diversity of Microbial Environments (ed. Hurst, C. J.) 251–300 (Springer International Publishing, Switzerland, 2016).
8. Emerson, J. B., Thomas, B. C., Alvarez, W. & Banfield, J. F. Metagenomic analysis of a high carbon dioxide subsurface microbial community populated by chemolithoautotrophs and bacteria and archaea from candidate phyla. Environ. Microbiol. 18, 1686–1703 (2016).
9. Probst, A. J. et al. Genomic resolution of a cold subsurface aquifer community provides metabolic insights for novel microbes adapted to high CO2 concentrations. Environ. Microbiol. 19, 459–474 (2017).
10. Probst, A. J. et al. Biology of a widespread uncultivated archaeon that contributes to carbon fixation in the subsurface. Nat. Commun. 5, 5497 (2014).
11. Rinke, C. et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431–437 (2013).
12. Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat. Commun. 7, 13219 (2016).
13. Brown, C. T. et al. Unusual biology across a group comprising more than 15% of Bacteria. Nature 523, 208–211 (2015).
14. Hug, L. A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).
15. Castelle, C. J. et al. Genomic expansion of domain archaea highlights roles for organisms from new phyla in anaerobic carbon cycling. Curr. Biol. 25, 690–701 (2015).
16. Zaremba-Niedzwiedzka, K. et al. Asgard archaea illuminate the origin of eukaryotic cellular complexity. Nature 541, 353–358 (2017).
17. Nunoura, T. et al. Insights into the evolution of Archaea and eukaryotic protein modifier systems revealed by the genome of a novel archaeal group. Nucleic Acids Res. 39, 3204–3223 (2011).
18. Lin, X., Kennedy, D., Fredrickson, J., Bjornstad, B. & Konopka, A. Vertical stratification of subsurface microbial community composition across geological formations at the Hanford Site. Environ. Microbiol. 14, 414–425 (2012).
19. Herrmann, M. et al. Large fractions of CO2-fixing microorganisms in pristine limestone aquifers appear to be involved in the oxidation of reduced sulfur and nitrogen compounds. Appl. Environ. Microbiol. 81, 2384–2394 (2015).
20. Baker, B. J. et al. Lineages of acidophilic archaea revealed by community genomic analysis. Science 314, 1933–1935 (2006).
21. Eloe-Fadrosh, E. A., Ivanova, N. N., Woyke, T. & Kyrpides, N. C. Metagenomics uncovers gaps in amplicon-based detection of microbial diversity. Nat. Microbiol. 1, 15032 (2016).
22. Nuccio, V. F. & Condon, S. M. Burial and thermal history of the Paradox Basin, Utah and Colorado, and petroleum potential of the Middle Pennsylvanian Paradox Formation. US Geol. Surv. Bull. 2000–O (1996).
23. Trimble, L. M. & Doelling, H. H. The geology and uranium vanadium deposits of the San Rafael River Mining Area, Emery County, Utah. Utah Geol. Surv. Bull. 1046-D, 113 (1978).
24. Hood, J. W. & Patterson, D. J. Bedrock Aquifers in the Northern San Rafael Swell Area, Utah, with Special Emphasis on the Navajo Sandstone 139 (Utah Department of Natural Resources, Division of Water Rights, 1984).
25. Freethey, G. W. & Cordy, G. E. Geohydrology of Mesozoic Rocks in the Upper Colorado River Basin in Arizona, Colorado, New Mexico, Utah, and Wyoming, excluding the San Juan Basin. US Geol. Surv. Prof. Pap. 1411-C (1991).
26. Kampman, N. et al. Drilling and sampling a natural CO2 reservoir: implications for fluid flow and CO2-fluid–rock reactions during CO2 migration through the overburden. Chem. Geol. 369, 51–82 (2014).
27. Han, W. S. et al. Periodic changes in effluent chemistry at cold-water geyser: Crystal geyser in Utah. J. Hydrol. 550, 54–64 (2017).
28. Watson, Z. T., Han, W. S., Keating, E. H., Jung, N.-H. & Lu, M. Eruption dynamics of CO2-driven cold-water geysers: Crystal, Tenmile geysers in Utah and Chimayó geyser in New Mexico. Earth Planet. Sci. Lett. 408, 272–284 (2014).
29. Berg, I. A. et al. Autotrophic carbon fixation in archaea. Nat. Rev. Microbiol. 8, 447 (2010).
30. Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2012).
31. Waters, E. et al. The genome of Nanoarchaeum equitans: insights into early archaeal evolution and derived parasitism. Proc. Natl Acad. Sci. USA 100, 12984–12988 (2003).
32. Anantharaman, K. et al. Analysis of five complete genome sequences for members of the class Peribacteria in the recently recognized Peregrinibacteria bacterial phylum. PeerJ 4, e1607 (2016).
33. Sharon, I. et al. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Res. 23, 111–120 (2013).
34. Lau, M. C. Y. et al. An oligotrophic deep-subsurface community dependent on syntrophy is dominated by sulfur-driven autotrophic denitrifiers. Proc. Natl Acad. Sci. USA 113, E7927–E7936 (2016).
35. He, X. et al. Cultivation of a human-associated TM7 phylotype reveals a reduced genome and epibiotic parasitic lifestyle. Proc. Natl Acad. Sci. USA 112, 244–249 (2015).
36. Huber, H. et al. A new phylum of Archaea represented by a nanosized hyperthermophilic symbiont. Nature 417, 63–67 (2002).
37. Standard Guide for Sampling Ground-Water Monitoring Wells ASTM D4448-01 (ASTM International, West Conshohocken, 2013).
38. Rice, E. W., Baird, E. B., Eaton, A. D. & Clesceri, L. S. Standard Methods for the Examination of Water and Wastewater (American Public Health Association, American Water Works Association, Water Environment Federation, 2012).
39. Clark, I. D. & Fritz, P. Environmental Isotopes in Hydrogeology (CRC Press, New York, 1997).
40. Jahangir, M. M. et al. Evaluation of headspace equilibration methods for quantifying greenhouse gases in groundwater. J. Environ. Manag. 111, 208–212 (2012).
41. Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).
42. Johnson, L. S., Eddy, S. R. & Portugaly, E. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics 11, 431 (2010).
43. Suzek, B. E., Huang, H., McGarvey, P., Mazumder, R. & Wu, C. H. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23, 1282–1288 (2007).
44. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
45. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
46. Brown, C. T., Olm, M. R., Thomas, B. C. & Banfield, J. F. Measurement of bacterial replication rates in microbial communities. Nat. Biotechnol. 34, 1256–1263 (2016).
47. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
48. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2016); http://www.r-proj.org
49. Weinmaier, T. et al. A viability-linked metagenomic analysis of cleanroom environments: eukarya, prokaryotes, and viruses. Microbiome 3, 62 (2015).
50. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 57, 289–300 (1995).
51. Mayo, A. L., Shrum, D. B. & Chidsey Jr., T. C. Factors contributing to exsolving carbon dioxide in ground water systems in the Colorado Plateau, Utah. Utah Geol. Assoc. Publ. 19, 335–342 (1991).
52. Shipton, Z. K. et al. Analysis of CO2 leakage through ‘low-permeability’ faults from natural reservoirs in the Colorado Plateau, east-central Utah. Geol. Soc. Lond. Spec. Publ. 233, 43–58 (2004).
53. Allis, R. et al. Implications of results from CO2 flux surveys over known CO2 systems for long-term monitoring. In Fourth Annual Conference on Carbon Capture and Sequestration (DOE/NETL, Alexandria, VA, 2005).
54. Jung, N.-H., Han, W. S., Han, K. & Park, E. Regional-scale advective, diffusive, and eruptive dynamics of CO2 and brine leakage through faults and wellbores. J. Geophys. Res. Solid Earth 120, 2014JB011722 (2015).
55. Stegen, J. et al. Coupling among microbial communities, biogeochemistry, and mineralogy across biogeochemical facies. Sci. Rep. 6, 30553 (2016).
56. Krumholz, L. R., McKinley, J. P., Ulrich, G. A. & Suflita, J. M. Confined subsurface microbial communities in Cretaceous rock. Nature 386, 64 (1997).

Acknowledgements

We thank J. Hinshaw for his contribution to fieldwork logistics and year-long temperature monitoring. We also thank C. Brown, S. Spaulding, S. Clingenpeel, D. Barton, B. Rocha and D. Bethune for logistic support during fieldwork. C. Niemann provided technical assistance regarding scanning electron microscopy. D. Goudeau is acknowledged for help with single-cell lab work. We thank T. G. del Rio for handling of metagenomic samples at JGI. We are grateful to S. Gribaldo for discussion of the phylogenetic placement of bacterial and archaeal phyla. A.J.P. was supported by the German Science Foundation under DFG PR 1603/1-1 and by Lawrence Berkeley National Laboratory’s Sustainable Systems Scientific Focus Area funded by the US Department of Energy, Office of Science, Office of Biological and Environmental Research under contract DE-AC02-05CH11231. Work at UCB was funded by the Sloan Foundation (“Deep Life”, grant no. G-2016-20166041). Funding for hydrogeological and geochemical analyses was provided by a Natural Sciences and Engineering Research Council of Canada Discovery Grant to M.C.R. Development of ggKbase was supported by the Office of Science, Office of Biological and Environmental Research, of the US Department of Energy Grant DOE-SC10010566. Work conducted by the US Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported under Contract No. DE-AC02-05CH11231. We thank DOE’s Emerging Technologies Opportunity Program “Development of a pipeline for high-throughput recovery of near-complete and complete microbial genomes from complex metagenomic datasets” for sequencing.

Author contributions

A.J.P., B.L., J.B.E., K.A., M.C.R. and J.F.B. sampled the ecosystem. B.L. and M.C.R. conducted hydrogeological and geochemical analyses. A.J.P. performed genome-resolved metagenomics, phylogenetic, metabolic and community analyses. D.E.G.M. performed digital droplet PCR. C.M.K.S. and B.C.T. provided software. J.J. and T.W. performed single-cell genomics. A.J.P., J.J. and R.R.M. performed analyses of single-cell genomes. M.S. and A.K. carried out scanning electron microscopy. A.J.P. and J.F.B. designed the study. A.J.P. and J.F.B. wrote the paper with input from B.L. and M.C.R. All authors revised the manuscript.

Competing interests

The authors declare no competing financial interests.

Additional information

Supplementary information is available for this paper at https://doi.org/10.1038/s41564-017-0098-y.

Reprints and permissions information is available at www.nature.com/reprints.

Correspondence and requests for materials should be addressed to M.C.R. or J.F.B.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


Life Sciences Reporting Summary

Corresponding author(s): Jillian Banfield, Cathryn Ryan

Nature Research wishes to improve the reproducibility of the work that we publish. This form is intended for publication with all accepted life science papers and provides structure for consistency and transparency in reporting. Every life science submission will use this form; some list items might not apply to an individual manuscript, but all fields must be completed for clarity. For further information on the points included in this form, see Reporting Life Sciences Research. For further information on Nature Research policies, including our data availability policy, see Authors & Referees and the Editorial Policy Checklist.

Experimental design

1. Sample size
Describe how sample size was determined. Sample size was determined based on hydrogeological properties of subsurface fluids. This is elucidated in Figure 1 and Supplementary Figure 1 in detail.

2. Data exclusions
Describe any data exclusions. The continuous data collected over the eruption cycle of the geyser was categorized based on hydrogeological measurements. Samples taken during the transition between the categories were excluded if they showed properties of both categories (Figure 1 and Supplementary Figure 1).

3. Replication
Describe whether the experimental findings were reliably reproduced. Biological replicates are represented by the samples of the different categories (determined via hydrogeological measurements). At least four samples per category were used to ensure statistical robustness.

4. Randomization
Describe how samples/organisms/participants were allocated into experimental groups. n/a

5. Blinding
Describe whether the investigators were blinded to group allocation during data collection and/or analysis. n/a

Note: all studies involving animals and/or human research participants must disclose whether blinding and randomization were used.

6. Statistical parameters
For all figures and tables that use statistical methods, confirm that the following items are present in relevant figure legends (or in the Methods section if additional space is needed).

The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement (animals, litters, cultures, etc.)
A description of how samples were collected, noting whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
A statement indicating how many times each experiment was replicated
The statistical test(s) used and whether they are one- or two-sided (note: only common tests should be described solely by name; more complex techniques should be described in the Methods section)
A description of any assumptions or corrections, such as an adjustment for multiple comparisons
The test results (e.g. P values) given as exact values whenever possible and with confidence intervals noted
A clear description of statistics including central tendency (e.g. median, mean) and variation (e.g. standard deviation, interquartile range)
Clearly defined error bars

See the web collection on statistics for biologists for further resources and guidance.

Software
Policy information about availability of computer code

7. Software
Describe the software used to analyze the data in this study. Publicly available code written in R, shell, python or ruby.

For manuscripts utilizing custom algorithms or software that are central to the paper but not yet described in the published literature, software must be made available to editors and reviewers upon request. We strongly encourage code deposition in a community repository (e.g. GitHub). Nature Methods guidance for providing algorithms and software for publication provides further information on this topic.

Materials and reagents
Policy information about availability of materials

8. Materials availability
Indicate whether there are restrictions on availability of unique materials or if these materials are only available for distribution by a for-profit company. n/a

9. Antibodies
Describe the antibodies used and how they were validated for use in the system under study (i.e. assay and species). n/a

10. Eukaryotic cell lines
a. State the source of each eukaryotic cell line used. n/a
b. Describe the method of cell line authentication used. n/a
c. Report whether the cell lines were tested for contamination. n/a
d. If any of the cell lines used are listed in the database of commonly misidentified cell lines maintained by ICLAC, provide a scientific rationale for their use. n/a

Animals and human research participants
Policy information about studies involving animals; when reporting animal research, follow the ARRIVE guidelines

11. Description of research animals
Provide details on animals and/or animal-derived materials used in the study. n/a

Policy information about studies involving human research participants

12. Description of human research participants
Describe the covariate-relevant population characteristics of the human research participants. n/a
