Genome-Resolved Metagenomics Reveals Site-Specific Diversity of Episymbiotic CPR Bacteria and DPANN Archaea in Groundwater Ecosystems
Total Page:16
File Type:pdf, Size:1020Kb
ARTICLES https://doi.org/10.1038/s41564-020-00840-5 Genome-resolved metagenomics reveals site-specific diversity of episymbiotic CPR bacteria and DPANN archaea in groundwater ecosystems Christine He 1, Ray Keren 2, Michael L. Whittaker3,4, Ibrahim F. Farag 1, Jennifer A. Doudna 1,5,6,7,8, Jamie H. D. Cate 1,5,6,7 and Jillian F. Banfield 1,4,9 ✉ Candidate phyla radiation (CPR) bacteria and DPANN archaea are unisolated, small-celled symbionts that are often detected in groundwater. The effects of groundwater geochemistry on the abundance, distribution, taxonomic diversity and host associa- tion of CPR bacteria and DPANN archaea has not been studied. Here, we performed genome-resolved metagenomic analysis of one agricultural and seven pristine groundwater microbial communities and recovered 746 CPR and DPANN genomes in total. The pristine sites, which serve as local sources of drinking water, contained up to 31% CPR bacteria and 4% DPANN archaea. We observed little species-level overlap of metagenome-assembled genomes (MAGs) across the groundwater sites, indicating that CPR and DPANN communities may be differentiated according to physicochemical conditions and host populations. Cryogenic transmission electron microscopy imaging and genomic analyses enabled us to identify CPR and DPANN lineages that reproduc- ibly attach to host cells and showed that the growth of CPR bacteria seems to be stimulated by attachment to host-cell surfaces. Our analysis reveals site-specific diversity of CPR bacteria and DPANN archaea that coexist with diverse hosts in groundwater aquifers. Given that CPR and DPANN organisms have been identified in human microbiomes and their presence is correlated with diseases such as periodontitis, our findings are relevant to considerations of drinking water quality and human health. etagenome-enabled phylogenomic analyses have led to despite harbouring an estimated 90% of all bacterial biomass32. CPR/ the classification of two groups of organisms that lack pure DPANN organism abundance is likely to have been underestimated culture representatives—the CPR bacteria and DPANN in genomic surveys because they are small enough to pass through M1–4 archaea . Although diverse, CPR and DPANN organisms share 0.2 µm filters, which are widely used to collect cells. Furthermore, conserved traits that are indicative of a symbiotic lifestyle, being ‘universal’ primers to divergent or intron-containing 16S rRNA ultrasmall in size with small genomes and minimal biosynthetic genes2,12,15 are unlikely to detect many members of both groups. capabilities5–9. Episymbiosis (surface attachment) with bacterial or Most of the available near-complete CPR and DPANN genomes are archaeal hosts has been observed in co-cultures of Saccharibacteria from just two aquifers2,19,22. In this Article, to investigate the roles with Actinobacteria10,11, Nanoarchaeota with Crenarchaeota12–14, that CPR and DPANN organisms may have in groundwater ecosys- and Nanohaloarchaeota and archaeal Richmond Mine acidophilic tems, we applied genome-resolved metagenomics to analyse eight nanoorganisms (ARMAN) with Euryarchaeota15–17, and one case groundwater communities in Northern California, and cryogenic of endosymbiosis has been reported in which a member of the transmission electron microscopy (cryo-TEM) to image the com- CPR superphylum Parcubacteria lives inside a protist18. CPR and munity with the highest abundance of CPR/DPANN organisms. DPANN organisms are ubiquitous and can be abundant in ground- water, in which they are predicted to contribute to biogeochemical Results cycling2,4,8,9,19–24. CPR bacteria can persist in drinking water through Metagenome sampling and MAG assembly. The planktonic frac- multiple treatment methods25–27, posing the question of whether tions of eight groundwater communities in Northern California groundwater is a source of CPR10,11,28–30 and DPANN31 organisms were sampled during 2017–2019 (Fig. 1a) using bulk filtration detected in human microbiomes. (0.1 µm filter). Some sites were also sampled using serial size fil- The variation in the abundance and distribution of CPR and tration (2.5 µm, 0.65 µm, 0.2 µm and 0.1 µm filters) in parallel. This DPANN organisms in groundwater environments, their roles and enterprise required pumping 400–1,200 l of groundwater from their relationships with host organisms are not well characterized. each site through a purpose-built sequential filtration appara- Subsurface environments such as groundwater are difficult to sample tus to recover sufficient biomass for deep sequencing of each size and are poorly characterized compared with surface environments, fraction (Methods). One site (Ag) is an agriculturally impacted, 1Innovative Genomics Institute, University of California, Berkeley, CA, USA. 2Department of Civil and Environmental Engineering, University of California, Berkeley, CA, USA. 3Energy Geoscience Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA. 4Department of Earth and Planetary Sciences, University of California, Berkeley, CA, USA. 5Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA. 6Department of Chemistry, University of California, Berkeley, CA, USA. 7Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA. 8Howard Hughes Medical Institute, University of California Berkeley, Berkeley, CA, USA. 9Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA. ✉e-mail: [email protected] 354 NATURE MICROBIOLOGY | VOL 6 | MARCH 2021 | 354–365 | www.nature.com/naturemicrobiology NATURE MICROBIOLOGY ARTICLES a c 8 8 (i) (ii) Ag (Mar 2017) Ag (Sep 2017) Ep 6 17% CPR, 16% DPANN 6 37% CPR, 24% DPANN Q KJf (i) Qv E KI Tv 4 4 J KI Ep Qv 2 2 (ii) Sacramento Total coverage (%) coverage Total Total coverage (%) coverage Total KJf 0 0 Ep m 0 5 10 15 20 25 30 0 5 10 15 20 25 30 Abundance rank Abundance rank J Ku um KI 8 8 Ag (Nov 2017) Ag (Feb 2018) 6 6 (iii) 40% CPR, 19% DPANN 38% CPR, 19% DPANN Pr1 Pr2 4 4 San Francisco QPc (iii) Q Pr3 Pr4 2 2 Total coverage (%) coverage Total Total coverage (%) coverage Total 0 0 Modesto Pr5 Pr6 0 5 10 15 20 25 30 0 5 10 15 20 25 30 Abundance rank Abundance rank 8 8 San Jose Pr7 Ag Ag (Jun 2018) Pr1 (Dec 2018) 6 37% CPR, 17% DPANN 6 38% CPR, 4.2% DPANN 4 4 2 2 Total coverage (%) coverage Total Total coverage (%) coverage Total 0 0 b 100 0 5 10 15 20 25 30 0 5 10 15 20 25 30 Abundance rank Abundance rank 8 8 Pr2 (May 2019) Pr3 (Nov 2017) 6 6 80 31% CPR, 3.2% DPANN 16% CPR, 3.5% DPANN 4 4 2 2 60 Total coverage (%) coverage Total Total coverage (%) coverage Total 0 0 0 5 10 15 20 25 30 0 5 10 15 20 25 30 Abundance rank 8 Abundance rank 40 8 Total coverage (%) coverage Total Pr3 (Sep 2018) Pr4 (Sep 2018) 6 6 2.7% CPR, 0.35% DPANN 8.5% CPR, 1.2% DPANN 20 4 4 2 2 Total coverage (%) coverage Total Total coverage (%) coverage Total 0 0 0 0 5 10 15 20 25 30 0 5 10 15 20 25 30 Ag Ag Ag Ag Ag Pr1 Pr2 Pr3 Pr3 Pr4 Pr5 Pr6 Pr7 Abundance rank Abundance rank 8 8 Pr5 (May 2019) Pr6 (May 2019) 6 6 6.4% CPR, 0.89% DPANN 7.0% CPR, 2.7% DPANN (Mar 2017) (Sep 2017) (Nov 2017) (Feb 2018) (Jun 2018) (Dec 2018) (May 2019) (Nov 2017) (Sep 2018) (Sep 2018) (May 2019) (May 2019) (Nov 2018) 4 4 Sampling point 2 2 Total coverage (%) coverage Total Total coverage (%) coverage Total 0 0 CPR Alphaproteobacteria Armatimonadetes Melainabacteria Ca. Acetothermia 0 5 10 15 20 25 30 0 5 10 15 20 25 30 DPANN Ca. Tectomicrobia Bathyarchaeota Ca. Eisenbacteria Lambdaproteobacteria Abundance rank Abundance rank Chloroflexi Gemmatimonadetes Ca. Schekmanbacteria TA06 Epsilonproteobacteria Planctomycetes Euryarchaeota Ca. Ratteibacteria Ca. Cloacimonetes Lentisphaerae 8 Ca. Omnitrophica Ca. Handelsmanbacteria Verrucomicrobia Ca. Wallbacteria Ca. Hydrogenedentes Pr7 (Nov 2018) 6 Ca. Rokubacteria Gammaproteobacteria Fusobacteria Spirochaetes Ca. Latescibacteria 4.7% CPR, 0% DPANN DeItaproteobacteria Firmicutes Elusimicrobia Ca. Saganbacteria Ca. Riflebacteria 4 Acidobacteria Ignavibacteria Ca. Lindowbacteria Ca. Margulisbacteria Ca. Dependentiae Unbinned NC10 Ca. Sumerlaeota Ca. Zixibacteria KSB1 Chlamydiae 2 Betaproteobacteria Ca. Dadabacteria Ca. Blackallbacteria Deinococcus–Thermus Oligoflexia Nitrospinae Nitrospirae Korarchaeota Ca. Absconditabacteria Ca. Coatesbacteria (%) coverage Total 0 Thaumarchaeota Actinobacteria Bacteroidetes Ca. Desantisbacteria Ca. Abawacabacteria 0 5 10 15 20 25 30 Bacteria Cyanobacteria Ca. Stahlbacteria Abundance rank Fig. 1 | Sampling and overview of groundwater communities. a, Map of eight Northern California groundwater sites sampled in this study (image from Google Maps). Insets: geological maps (i)–(iii) show the sampled areas (black boxes). J, marine sedimentary and metasedimentary rocks (Jurassic); KJfm, marine sedimentary and metasedimentary rocks (Cretaceous–Jurassic); Q, marine and non-marine (continental) sedimentary rocks (Pleistocene– Holocene); Qv, marine sedimentary and metasedimentary rocks (Cretaceous–Jurassic); um, plutonic (Mesozoic); K, marine sedimentary rocks (Pliocene); Kl, marine sedimentary and metasedimentary rocks (Lower Cretaceous); Ep, marine sedimentary rocks (Paleocene); E, marine sedimentary rocks (Eocene); QPc, non-marine (continental) sedimentary rocks (Pliocene–Pleistocene); Tv, volcanic rocks (Tertiary). Scale bars, 23.3 km (large map), 3.2 km (i and ii) and 9.7 km (iii). b, Phylum-level breakdown (with the exception of CPR and DPANN superphyla) of rpS3 genes detected in each site. The sampling dates for each site are indicated. c, Rank abundance curves showing the 30 rpS3 genes with highest relative coverage identified for each site. The hatched bars indicate an unbinned rpS3 gene. river sediment-hosted aquifer and the remaining sites are pristine genomes across all sites (≥70% completeness and ≤10% contamina- groundwater aquifers hosted in a mixture of sedimentary and volca- tion). The median genome completeness was >90% and up to 58% nic rocks (Pr1–Pr7, numbered in decreasing order of total CPR and of metagenomic reads mapped to each site’s dereplicated genome DPANN organism abundance).