Sequence-based approaches to uncultivated microbes

Susannah Green Tringe
DOE Joint Genome Institute
Metagenome Program Lead

Department of Energy: Mission areas

Bioenergy, Carbon Cycle, Biogeochemistry

DOE Joint Genome Institute
Walnut Creek, CA, since 1999
Mission: To advance genomics in support of the DOE mission

Uncultivated microbial communities

Wetland carbon cycling

Root-associated communities

JGI Programs & Infrastructure

Bioenergy Carbon Cycling Biogeochemistry

Metagenomes Plants Fungi Microbes Synthesis

DNA Sequencing, Genomic Technologies, Computational Analysis, Synthesis

What is metagenomics?

Isolate (pure culture): Genomics (tame)

Microbial community: Metagenomics (wild)

Most bacteria don’t grow on plates: “the great plate count anomaly” 16S ribosomal RNA as a phylogenetic marker

[Figure: 70S ribosome subunits. 30S: 16S rRNA, 21 proteins. 50S: 5S rRNA, 23S rRNA, 34 proteins.]

Escherichia coli 16S rRNA Primary and Secondary Structure Falk Warnecke 16S-based phylogeny

Carl Woese 1928-2012

Woese Microbiol Rev 1987 Norm Pace 16S rRNA in environmental microbiology

Falk Warnecke Bacterial phylogenetic tree expansion

cultured

uncultured

Modified from Baker et al 2013 (Microbe) The situation is similar for

Baker et al 2013 (Microbe) Why metagenomics?

Industrial enzymes

Most microbes are uncultured! Antibiotics

Greenhouse gas cycling Sulzenbacher et al 1997; www.chm.bris.ac.uk; Functional metagenomics

Gillespie 2002

Turbomycin synthesis genes isolated from a soil metagenome library

Schloss & Handelsman 2003 Discovery of proteorhodopsin

de la Torre, J. R. et al. (2003) Proc. Natl. Acad. Sci. USA 100, 12830-12835

Rhodopsin-like gene – never before seen in bacteria! Bacterial rRNA operon

Copyright ©2003 by the National Academy of Sciences Shotgun metagenomics

Shotgun sequencing: Could it be done?

Schloss & Handelsman 2003 Metagenome assembly - like putting together several jigsaw puzzles

Falk Warnecke . . . with some pieces missing

Falk Warnecke Can we still reconstruct?

Falk Warnecke One approach: a simple community

[Figure: EBPR reactor schematic; labels: influent, anaerobic, aerobic, sedimentation tank, effluent, return sludge, waste sludge]

[Figure labels: EUB, PAO]

Enhanced Biological Phosphate Removal (EBPR) reactors are dominated by Candidatus Accumulibacter phosphatis (CAP), but it cannot be grown in pure culture

Crocetti et al., 2001

Proposed biological model for polyP-accumulating organisms (PAOs)

[Figure: cell metabolism in the anaerobic zone vs. the aerobic zone; labels: acetate, acetyl-CoA, PHA, glycogen, TCA cycle, NAD/NADH, ADP/ATP, polyP, PO4, CO2]

CAP metabolic reconstruction

Anaerobic Phase Aerobic Phase

Garcia Martin et al. 2006 What about more complex communities? EBPR sludge Sargasso Sea Soil

[Figure axis: complexity, 1 to 10,000 (log scale)]

Environmental Gene Tags (EGTs)

[Figure: habitats A and B; legend: adaptive gene for habitat A, adaptive gene for habitat B, essential gene]

Comparative metagenomics

COG5524: Bacteriorhodopsin
COG1292: Choline-glycine betaine transporter
COG3459: Cellobiose phosphorylase

Tringe et al 2005

Metagenome sequence output

[Figure: sequence output (Tb) by fiscal year, 2004 to 2015, showing JGI sequenced bases and metagenome bases; annotated values: 40 Gb, 30 Tb, 100 Tb]

meta-GENOMEs

Hess 2012: 15 rumen genomes

Wrighton 2012: 3 uncultivated phyla 49 genomes

Iverson 2012: Marine Group A Annotation of metagenomes

Shotgun data Singlet fasta

ORF prediction Contig fasta

Assembly RNA prediction Contig stats

Contig    Depth   G+C
Contig1   10.1    .49
Contig2    5.6    .35
Contig3   19.8    .38

Jill Banfield, UCB
Chongle Pan, ORNL

Genome binning, extraction, improvement
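As a rough illustration of why depth and G+C are tabulated per contig, the sketch below groups contigs whose coverage and G+C are similar. The slides do not say which binning method was used, so the greedy rule, the thresholds, and the function names here are assumptions made only for illustration. Contig1 to Contig3 reuse the values listed above; Contig4 is hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# (name, mean read depth, G+C fraction)
contigs = [
    ("Contig1", 10.1, 0.49),
    ("Contig2", 5.6, 0.35),
    ("Contig3", 19.8, 0.38),
    ("Contig4", 9.5, 0.50),  # hypothetical contig, not from the slides
]


def bin_contigs(contigs, max_depth_ratio=1.5, max_gc_diff=0.05):
    """Greedy binning: a contig joins the first bin whose seed contig has
    similar depth and G+C; otherwise it starts a new bin.
    Both thresholds are illustrative assumptions."""
    bins = []
    for name, depth, gc in contigs:
        for members in bins:
            _, seed_depth, seed_gc = members[0]
            ratio = max(depth, seed_depth) / min(depth, seed_depth)
            if ratio <= max_depth_ratio and abs(gc - seed_gc) <= max_gc_diff:
                members.append((name, depth, gc))
                break
        else:
            # No existing bin matched, so this contig seeds a new one.
            bins.append([(name, depth, gc)])
    return bins


bins = bin_contigs(contigs)
bin_names = [[name for name, _, _ in members] for members in bins]
print(bin_names)
```

Contigs from the same genome are expected to share coverage (they are equally abundant in the sample) and nucleotide composition, which is the intuition behind using both columns together; real binning tools use richer signals than this two-threshold rule.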

ORF prediction

RNA prediction The single-cell approach: how it works

isolation

lysis

template DNA + random hexamers + dNTPs + polymerase

MDA

Tanja Woyke Single cell genomics: key challenges

Step        Challenge
isolation   Sample contamination (‘hitchhiker’ DNA)
lysis       No universal lysis for all taxa
MDA         Chimerism; reagent contamination; MDA bias

Tanja Woyke JGI single-cell sequencing pipeline

Pipeline steps: Sample; Community profiling; Single cell isolation; Whole genome amplification; 16S rRNA gene identification; Genome sequencing; Assembly; Data QC; Annotation; Data curation; Draft genome

Tanja Woyke Current limitations: 100 cells ≠ 100 SAGs

• Not every cell can be isolated
• Not every cell can be lysed and WGA’d
• Not every cell can be 16S ID’d

Tanja Woyke Recovered diversity: 16S tags vs SAGs

Tanja Woyke Modified from Clingenpeel et al 2014 (Frontiers in Microbiol) marker genes

Approach                 Typical output
culture                  draft/complete genomes
single cell              partial draft genomes, complete genomes
target cell enrichment   draft genomes, complete genomes
metagenomics             unassembled data, genome bins, (rarely) complete genomes

Tanja Woyke

culture, single cell, target cell enrichment, metagenomics

Tanja Woyke metagenomic approach

single-cell approach

Tanja Woyke marker genes Uncultivated microbial communities

Wetland carbon cycling

Root-associated communities Wetland restoration

Global land area: wetlands (9%)
Global carbon: carbon stored by wetlands (35% of all terrestrial carbon)

CH4 CO2 Wetland restoration What determines if a wetland serves as a greenhouse gas source or a greenhouse gas sink?

CH4 CO2 San Francisco Bay wetlands Farmland PAST PRESENT

Wetland Salt pond www.wrmp.org Subsidence and carbon loss

1850s

Elevation loss

Levee failure

Mount and Twiss 2005

Twitchell Island restored wetland
- Est. 1997
- Peat accretion: ~4 cm/year
- Net GHG budget: -494 g CO2 eq/m2

Twitchell wetland

Shaomei He Mark Waldrop Lisamarie Windham-Myers Sampling site gradients

[Figure: gradient from water inlet to water outflow across sites A, B, C/L; variables shown: methane flux, peat accretion, oxygen, nitrate, sulfate]

Do the microbial communities reflect these changes in geochemistry?

What controls methane?

CH4: Abundance, Activity, Species composition

Relative Gene Family Abundances

Samples with more methanogenesis genes have fewer genes in denitrification, dissimilatory sulfate reduction, and metal reduction.
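The inverse relationship described here can be checked with a relative-abundance table and a correlation coefficient. The following is a minimal sketch under stated assumptions: the counts are invented for illustration and are not data from the study, the family names are simplified labels, and Pearson correlation is just one reasonable choice of statistic.

```python
import math

# Hypothetical raw gene counts per sample (NOT data from the study).
counts = {
    "sample_1": {"methanogenesis": 120, "sulfate_reduction": 30, "other": 9850},
    "sample_2": {"methanogenesis": 80, "sulfate_reduction": 60, "other": 9860},
    "sample_3": {"methanogenesis": 40, "sulfate_reduction": 95, "other": 9865},
    "sample_4": {"methanogenesis": 15, "sulfate_reduction": 130, "other": 9855},
}


def relative_abundance(sample_counts):
    """Convert raw counts to fractions of all genes in the sample."""
    total = sum(sample_counts.values())
    return {family: n / total for family, n in sample_counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


rel = {sample: relative_abundance(c) for sample, c in counts.items()}
methanogenesis = [rel[s]["methanogenesis"] for s in counts]
sulfate_reduction = [rel[s]["sulfate_reduction"] for s in counts]

r = pearson(methanogenesis, sulfate_reduction)
print(f"Pearson r = {r:.2f}")
```

Normalizing to relative abundance first matters because metagenomes differ in sequencing depth; comparing raw counts across samples would confound gene frequency with library size.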

Methane oxidation genes were more abundant in rhizomes. Methanogen abundance

[Figure: methanogen marker genes from the metagenome; series: DNA abundance, RNA abundance (÷2), CH4]

- Methane emissions correlated to methanogen ABUNDANCE and ACTIVITY

Methanogen diversity

Hydrogenotrophic Methanogenesis: CO2 + 4 H2 → CH4 + 2 H2O
Acetoclastic Methanogenesis: CH3COOH → CH4 + CO2
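As a quick sanity check on the two reactions, the sketch below confirms that each one is balanced element by element. The helper names are hypothetical, and the formulas are entered by hand rather than parsed.

```python
from collections import Counter

# Elemental composition of each species appearing in the two reactions.
FORMULAS = {
    "CO2": {"C": 1, "O": 2},
    "H2": {"H": 2},
    "CH4": {"C": 1, "H": 4},
    "H2O": {"H": 2, "O": 1},
    "CH3COOH": {"C": 2, "H": 4, "O": 2},
}


def atom_totals(side):
    """Sum atoms over one side of a reaction given as {species: coefficient}."""
    totals = Counter()
    for species, coeff in side.items():
        for element, n in FORMULAS[species].items():
            totals[element] += coeff * n
    return totals


def is_balanced(reactants, products):
    """True when both sides contain the same number of each element."""
    return atom_totals(reactants) == atom_totals(products)


hydrogenotrophic = ({"CO2": 1, "H2": 4}, {"CH4": 1, "H2O": 2})
acetoclastic = ({"CH3COOH": 1}, {"CH4": 1, "CO2": 1})

for label, (reactants, products) in [
    ("hydrogenotrophic", hydrogenotrophic),
    ("acetoclastic", acetoclastic),
]:
    print(label, "balanced:", is_balanced(reactants, products))
```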

[Figure: methanogen community composition (0% to 100%) per sample: Aug_A, Aug_B, Aug_C, Feb_A, Feb_B, Feb_L. Taxa are marked hydrogenotrophic [H] or acetoclastic [A].]

Methanosaetaceae;Methanosaeta [A]
Methanobacteriaceae;Methanobacterium [H]
Methanobacteriales (order) [H]
Methanosarcinaceae;Methanosarcina [A]
Methanocellaceae;Methanocella [H]
MSBL1;SAGMEG-1 [H]
Methanospirillaceae [H]
Methanococcales (order) [H]
Methanosarcinaceae;Methanomethylovorans [A]
Methanobacteriaceae;Methanobrevibacter [H]
Methanobacteriaceae;Methanosphaera [H]
pMC1;pMC1FA [H]

Bay / Delta Salinity Gradient

Historic and restored wetlands sampled along a salinity gradient

Methane flux varies with salinity and wetland age

[Figure axis: salinity (ppt), with average seawater marked]

Susie Theroux

Uncultivated microbial communities

Wetland carbon cycling

Root-associated communities

Rhizosphere Grand Challenge
1) Rhizosphere / endophyte microbes
- provide nutrients
- protect from pathogens and stress
- influence growth
- sequester carbon
2) Challenges
- soil microbial communities are notoriously complex
- plant genomes are complex
- crosses multiple disciplines and programs
- statistical rigor requires high sample numbers

Arabidopsis rhizosphere project

Rhizosphere Endosphere

Soil

Are there unique communities in each compartment? Does the plant control access?

Rhizosphere Grand Challenge

Identifying the major determinants of microbial community assembly

Host factors

Root-associated microbial communities

Variables investigated:
Soil type – Mason Farm vs. Clayton
Sample fraction – Bulk soil vs. rhizosphere vs. endophyte
Plant age – bolting (young) vs. senescent (old)
Genotype – 8 ecotypes
Individual – Aim for 10 individuals per condition

Full factorial design
1117 samples
16S pyrotag profiles

Jeff Dangl

The Arabidopsis microbiome

The endophyte community is unique and reproducible and similar across soil types

Lundberg Nature 2012

[Figure: OTUs by sample type: Rhizosphere/Soil 1, Rhizosphere/Soil 2, Endophyte]

The Arabidopsis Microbiome

Cultured isolates Single cells “Plate scrape” metagenomes Flow-sorted “mini-metagenomes”

Sphingobacteriales OTU 2324

Streptomyces Pseudonocardiaceae OTU 14834 OTU 13797 Dangl lab, UNC Woyke lab, JGI An endophyte genome catalog

99 isolates and 130 SAGs fully sequenced
>50% of target OTUs

[Figure: genome catalog distinguishing single cells from isolates; labeled groups: Actinobacteria, Alphaproteobacteria, Betaproteobacteria, Bacteroidetes, Chloroflexi, Cyanobacteria, Firmicutes]

S. Clingenpeel

Recolonization and RNA-seq

[Figure panels: full phosphate (no bacteria vs. + bacteria) and low phosphate (no bacteria vs. + bacteria); sterile vs. colonized]

Inoculation with root-associated microbes reverses low P phenotype

Conclusions

• Most microorganisms cannot be readily grown in the lab
• Nucleic acid sequencing provides a means to study uncultivated organisms via their genomes
• Next-gen sequencing is opening up new opportunities in metagenomics and single cell genomics
• These techniques are enabling advances in diverse areas of science

Acknowledgments

• EBPR sludge project
  – Trina McMahon, UWM
  – Phil Hugenholtz
• Wetlands project
  – Shaomei He
  – Lisamarie Windham-Myers, USGS
  – Mark Waldrop, USGS
  – Susie Theroux
  – Wyatt Hartman
  – Dennis Baldocchi, UCB
• Rhizosphere project
  – Jeff Dangl, UNC
  – Scott Clingenpeel
  – Tanja Woyke

JGI: Next Generation Genome Science User Facility

• Community Science Program
• JGI-EMSL Collaborative Science Program
• Emerging Technologies Opportunity Program (ETOP)
• Visiting Scientist Program
• Distinguished Postdoctoral Fellow in Genomics

Questions?

Contact: [email protected]

Genomic signatures of plant association

Comparing genomes of phylogenetically related plant-associated and non-plant-associated bacteria to identify gene families overrepresented in plant-associated (PA) organisms

1300 genomes (6.5M genes) included in analysis

Taxon               # PA genomes   # NPA genomes   # total genes
Bacillus                      28             141         852,695
Burkholderiales               60             144       1,072,509
Microbacteriaceae             18              27         140,326
Micrococcaceae                13              22         117,775
Nocardiaceae                  13              37         315,053
Paenibacillus                 11              29         226,631
Pseudomonas                  152              77       1,242,490
Rhizobiales                  170             152       1,717,437
Sphingomonas                   7              12          73,946
Streptomyces                   9              65         513,229
Xanthomonadaceae              93              19         433,488

Recurrent plant-associated functions include chemotaxis, certain transposases, carbohydrate transport and metabolism, type III secretion systems, and nodulation

Asaf Levy
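One way to ask whether a gene family is overrepresented in PA genomes within a taxon is a one-sided Fisher exact test on presence/absence counts. This is only a sketch: the slides do not state which statistical test was used, the function name is hypothetical, and the presence counts (20 and 30) are invented. Only the genome totals, 28 PA and 141 NPA for Bacillus, come from the table.

```python
from math import comb


def fisher_enrichment_p(pa_with, pa_total, npa_with, npa_total):
    """One-sided Fisher exact p-value: the probability of seeing at least
    `pa_with` PA genomes carrying the family if carriage were unrelated
    to plant association (hypergeometric upper tail)."""
    total = pa_total + npa_total
    with_family = pa_with + npa_with
    max_k = min(pa_total, with_family)
    numerator = 0
    for k in range(pa_with, max_k + 1):
        # Ways to pick k carriers and (pa_total - k) non-carriers as PA genomes.
        numerator += comb(with_family, k) * comb(total - with_family, pa_total - k)
    return numerator / comb(total, pa_total)


# Bacillus genome totals from the table; presence counts are hypothetical.
p_value = fisher_enrichment_p(pa_with=20, pa_total=28, npa_with=30, npa_total=141)
print(f"p = {p_value:.2e}")
```

Testing within a taxon, rather than pooling all 1300 genomes, keeps the comparison between phylogenetically related organisms. Because many gene families would be tested, a multiple-testing correction would also be needed in practice.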