Sequence-based approaches to uncultivated microbes
Susannah Green Tringe
DOE Joint Genome Institute, Metagenome Program Lead
Department of Energy: Mission areas
Bioenergy, Carbon Cycle, Biogeochemistry
DOE Joint Genome Institute
Walnut Creek, CA since 1999
Mission: To advance genomics in support of the DOE mission
Uncultivated microbial communities
Wetland carbon cycling
Root-associated communities
JGI Programs & Infrastructure
Bioenergy Carbon Cycling Biogeochemistry
Metagenomes Plants Fungi Microbes Synthesis
DNA Genomic Computational Synthesis Sequencing Technologies Analysis
What is metagenomics?
Isolate (pure culture) ≠ Microbial community
Genomics (tame) vs. Metagenomics (wild)
Most bacteria don’t grow on plates: “the great plate count anomaly”
16S ribosomal RNA as a phylogenetic marker
70S ribosome subunits:
30S: 16S rRNA, 21 proteins
50S: 5S rRNA, 23S rRNA, 34 proteins
Escherichia coli 16S rRNA Primary and Secondary Structure
Falk Warnecke
16S-based phylogeny
Carl Woese 1928-2012
Woese Microbiol Rev 1987
Norm Pace
16S rRNA in environmental microbiology
Falk Warnecke
Bacterial phylogenetic tree expansion
cultured
uncultured
Modified from Baker et al 2013 (Microbe)
The situation is similar for archaea
Baker et al 2013 (Microbe)
Why metagenomics?
Most microbes are uncultured!
Industrial enzymes
Antibiotics
Greenhouse gas cycling
Sulzenbacher et al 1997; www.chm.bris.ac.uk
Functional metagenomics
Gillespie 2002
Turbomycin synthesis genes isolated from a soil metagenome library
Schloss & Handelsman 2003
Discovery of proteorhodopsin
de la Torre, J. R. et al. (2003) Proc. Natl. Acad. Sci. USA 100, 12830-12835
Rhodopsin-like gene – never before seen in bacteria!
Bacterial rRNA operon
Copyright ©2003 by the National Academy of Sciences
Shotgun metagenomics
Shotgun sequencing: Could it be done?
Schloss & Handelsman 2003
Metagenome assembly - like putting together several jigsaw puzzles
Falk Warnecke
. . . with some pieces missing
Falk Warnecke
Can we still reconstruct?
Falk Warnecke
One approach: a simple community
[Diagram labels: influent, anaerobic, aerobic, sedimentation tank, effluent, return sludge, waste sludge]
[Micrograph labels: EUB, PAO]
Enhanced Biological Phosphate Removal (EBPR) reactors are dominated by Candidatus Accumulibacter phosphatis (CAP), but it cannot be grown in pure culture
Crocetti et al., 2001
Proposed biological model for polyP-accumulating organisms (PAOs)
[Figure: PAO cell model in the anaerobic zone and the aerobic zone; labels: PHA, glycogen, acetyl-CoA, TCA cycle, CO2, NAD, NADH, ADP, ATP, polyP, acetate, PO4]
CAP metabolic reconstruction
Anaerobic Phase Aerobic Phase
Garcia Martin et al. 2006
What about more complex communities?
[Figure: species complexity on a log scale (1 to 10000) for EBPR sludge, Sargasso Sea, and soil]
Environmental Gene Tags (EGTs)
[Figure: habitats A and B; legend: adaptive gene for habitat A, adaptive gene for habitat B, essential gene]
Comparative metagenomics
COG5524: Bacteriorhodopsin
COG1292: Choline-glycine betaine transporter
COG3459: Cellobiose phosphorylase
Tringe et al 2005
Metagenome sequence output
[Figure: JGI sequence output in Tb by fiscal year, 2004 to 2015; series: JGI Sequenced Bases, Metagenome bases; annotated values: 40 Gb, 30 Tb, 100 Tb]
meta-GENOMEs
Hess 2012: 15 rumen genomes
Wrighton 2012: 3 uncultivated phyla, 49 genomes
Iverson 2012: Marine Group A
Annotation of metagenomes
Shotgun data
Assembly
Outputs: Singlet fasta, Contig fasta, Contig stats
ORF prediction
RNA prediction
Contig stats:
Contig    Depth   G+C
Contig1   10.1    .49
Contig2   5.6     .35
Contig3   19.8    .38
Jill Banfield, UCB; Chongle Pan, ORNL
Genome binning, extraction, improvement
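Read depth and G+C content per contig are the kind of signals commonly used to group contigs into genome bins. The sketch below is only an illustration under stated assumptions: the greedy grouping rule, the tolerances `gc_tol` and `depth_ratio`, and the extra contig `ContigX` are hypothetical and are not taken from the slides or from any specific binning tool.

```python
def gc_fraction(seq):
    """Return the G+C fraction of a DNA sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    acgt = sum(seq.count(base) for base in "ACGT")
    return gc / acgt if acgt else 0.0


def bin_contigs(contigs, gc_tol=0.05, depth_ratio=1.5):
    """Greedily group (name, depth, gc) tuples with similar depth and G+C.

    Assumes every depth is positive. Each contig is compared with the first
    member of each existing bin; the thresholds are illustrative only.
    """
    bins = []
    for name, depth, gc in contigs:
        for members in bins:
            _, ref_depth, ref_gc = members[0]
            similar_gc = abs(gc - ref_gc) <= gc_tol
            ratio = max(depth, ref_depth) / min(depth, ref_depth)
            if similar_gc and ratio <= depth_ratio:
                members.append((name, depth, gc))
                break
        else:
            # No existing bin matched, so this contig starts a new one.
            bins.append([(name, depth, gc)])
    return bins


# Depth and G+C values for Contig1-3 come from the contig stats table;
# ContigX is a hypothetical extra contig added to show a shared bin.
contigs = [
    ("Contig1", 10.1, 0.49),
    ("Contig2", 5.6, 0.35),
    ("Contig3", 19.8, 0.38),
    ("ContigX", 6.1, 0.36),
]

if __name__ == "__main__":
    for i, members in enumerate(bin_contigs(contigs), start=1):
        print(f"bin {i}:", [name for name, _, _ in members])
```

Real binning tools typically add tetranucleotide frequencies and coverage across multiple samples; two signals and fixed cutoffs are used here only to keep the idea visible.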
ORF prediction
RNA prediction
The single-cell approach: how it works
isolation
lysis
template DNA + random hexamers + dNTPs + polymerase
MDA
Tanja Woyke
Single cell genomics: key challenges
Step: CHALLENGE
isolation: Sample contamination (‘hitchhiker’ DNA)
lysis: No universal lysis for all taxa
MDA: Chimerism; Reagent contamination; MDA bias
Tanja Woyke
JGI single-cell sequencing pipeline
[Pipeline diagram labels: Sample, Community profiling, Single cell isolation, Whole genome amplification, 16S rRNA gene identification, Genome sequencing, Assembly, Data QC, Annotation, Data curation, Draft genome]
Tanja Woyke
Current limitations: 100 cells ≠ 100 SAGs
[Pipeline diagram repeated from the previous slide, with "???" marked]
- Not every cell can be isolated
- Not every cell can be lysed and WGA’d
- Not every cell can be 16S ID’d
Tanja Woyke
Recovered diversity: 16S tags vs SAGs
Tanja Woyke
Modified from Clingenpeel et al 2014 (Frontiers in Microbiol)
marker genes
Approach: typical output
culture: draft/complete genomes
single cell: partial draft genomes, complete genomes
target cell enrichment: draft genomes, complete genomes
metagenomics: unassembled data, genome bins, (rarely) complete genomes
Tanja Woyke
culture, single cell, target cell enrichment, metagenomics
Tanja Woyke
metagenomic approach
single-cell approach
Tanja Woyke
marker genes
Uncultivated microbial communities
Wetland carbon cycling
Root-associated communities
Wetland restoration
Global land area: wetlands (9%)
Global carbon: carbon stored by wetlands (35% of all terrestrial carbon)
[Figure labels: CH4, CO2]
Wetland restoration
What determines if a wetland serves as a greenhouse gas source or a greenhouse gas sink?
[Figure labels: CH4, CO2]
San Francisco Bay wetlands
[Map: PAST vs. PRESENT; legend: Farmland, Wetland, Salt pond; www.wrmp.org]
Subsidence and carbon loss
1850s
Elevation loss
Levee failure
Mount and Twiss 2005
Twitchell Island restored wetland
- Est. 1997
- Peat accretion: ~4 cm/year
- Net GHG budget: -494 g CO2 eq/m2
Twitchell wetland
Shaomei He, Mark Waldrop, Lisamarie Windham-Myers
Sampling site gradients
[Figure: sites A, B, C/L between the water inlet and the water outflow; gradients shown: methane flux, peat accretion, oxygen, nitrate, sulfate]
Do the microbial communities reflect these changes in geochemistry?
What controls methane?
[Figure: CH4; candidate controls: abundance, activity, species composition]
Relative Gene Family Abundances
Samples with more methanogenesis genes have fewer genes in denitrification, dissimilatory sulfate reduction, and metal reduction.
Methane oxidation genes were more abundant in rhizomes.
Methanogen abundance
[Figure: methanogen marker genes from metagenome; series: DNA abundance, RNA abundance (÷2), CH4]
- Methane emissions correlated with methanogen ABUNDANCE and ACTIVITY
Methanogen diversity
Hydrogenotrophic Methanogenesis: CO2 + 4 H2 → CH4 + 2 H2O
Acetoclastic Methanogenesis: CH3COOH → CH4 + CO2
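As a quick sanity check, the two stoichiometries above can be verified to balance atom by atom. This is a minimal sketch: the helper names (`atoms`, `side_total`, `is_balanced`) are hypothetical, and the tiny formula parser only handles flat formulas without parentheses or charges.

```python
import re
from collections import Counter


def atoms(formula, coeff=1):
    """Count atoms in a flat formula such as 'CH3COOH', scaled by coeff."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += coeff * (int(n) if n else 1)
    return counts


def side_total(terms):
    """Sum atom counts over a list of (coefficient, formula) terms."""
    total = Counter()
    for coeff, formula in terms:
        total.update(atoms(formula, coeff))
    return total


def is_balanced(reactants, products):
    """True if both sides of the reaction contain the same atoms."""
    return side_total(reactants) == side_total(products)


# CO2 + 4 H2 -> CH4 + 2 H2O
hydrogenotrophic = ([(1, "CO2"), (4, "H2")], [(1, "CH4"), (2, "H2O")])
# CH3COOH -> CH4 + CO2
acetoclastic = ([(1, "CH3COOH")], [(1, "CH4"), (1, "CO2")])

if __name__ == "__main__":
    print("hydrogenotrophic balanced:", is_balanced(*hydrogenotrophic))
    print("acetoclastic balanced:", is_balanced(*acetoclastic))
```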
[Figure: relative abundance (0% to 100%) of methanogen taxa in samples Aug_C, Feb_L, Aug_B, Feb_B, Aug_A, Feb_A; [H] = hydrogenotrophic, [A] = acetoclastic]
Methanomicrobiales (order) [H]
Methanosaetaceae;Methanosaeta [A]
Methanobacteriaceae;Methanobacterium [H]
Methanobacteriales (order) [H]
Methanosarcinaceae;Methanosarcina [A]
Methanocellaceae;Methanocella [H]
Methanomicrobiaceae;Methanofollis [H]
MSBL1;SAGMEG-1 [H]
Methanospirillaceae [H]
Methanococcales (order) [H]
Methanosarcinaceae;Methanomethylovorans [A]
Methanobacteriaceae;Methanobrevibacter [H]
Methanobacteriaceae;Methanosphaera [H]
pMC1;pMC1FA [H]
Bay / Delta Salinity Gradient
Historic and restored wetlands sampled along a salinity gradient
Methane flux varies with salinity and wetland age
[Figure axis: Salinity (ppt); reference: Average seawater]
Susie Theroux
Uncultivated microbial communities
Wetland carbon cycling
Root-associated communities
Rhizosphere Grand Challenge
1) Rhizosphere / endophyte microbes
- provide nutrients
- protect from pathogens and stress
- influence growth
- sequester carbon
2) Challenges
- soil microbial communities are notoriously complex
- plant genomes are complex
- crosses multiple disciplines and programs
- statistical rigor requires high sample numbers
Arabidopsis rhizosphere project
Rhizosphere Endosphere
Soil
Are there unique communities in each compartment?
Does the plant control access?
Rhizosphere Grand Challenge
Identifying the major determinants of microbial community assembly
Host factors
Root-associated microbial communities
Variables investigated:
- Soil type – Mason Farm vs. Clayton
- Sample fraction – Bulk soil vs. rhizosphere vs. endophyte
- Plant age – bolting (young) vs. senescent (old)
- Genotype – 8 ecotypes
- Individual – Aim for 10 individuals per condition
Full factorial design
1117 samples
16S pyrotag profiles
Jeff Dangl
The Arabidopsis microbiome
The endophyte community is unique and reproducible and similar across soil types
[Figure: OTUs by sample type (Rhizosphere/Soil 1, Rhizosphere/Soil 2, Endophyte)]
Lundberg Nature 2012
The Arabidopsis Microbiome
Cultured isolates; Single cells; “Plate scrape” metagenomes; Flow-sorted “mini-metagenomes”
Sphingobacteriales OTU 2324
Streptomyces OTU 14834
Pseudonocardiaceae OTU 13797
Dangl lab, UNC; Woyke lab, JGI
An endophyte genome catalog
99 isolates and 130 SAGs fully sequenced
>50% of target OTUs
[Tree labels: Actinobacteria, Alphaproteobacteria, Betaproteobacteria, Firmicutes, Cyanobacteria, Bacteroidetes, Chloroflexi; legend: Single Cells, Isolates]
S. Clingenpeel
Recolonization and RNA-seq
Full phosphate: No bacteria full P; + Bacteria full P
Low phosphate: No bacteria low P; + Bacteria low P
[Column labels: Sterile, Colonized]
Inoculation with root-associated microbes reverses low P phenotype
Conclusions
• Most microorganisms cannot be readily grown in the lab
• Nucleic acid sequencing provides a means to study uncultivated organisms via their genomes
• Next-gen sequencing is opening up new opportunities in metagenomics and single cell genomics
• These techniques are enabling advances in diverse areas of science
Acknowledgments
• EBPR sludge project
  – Trina McMahon, UWM
  – Phil Hugenholtz
• Wetlands project
  – Shaomei He
  – Lisamarie Windham-Myers, USGS
  – Mark Waldrop, USGS
  – Susie Theroux
  – Wyatt Hartman
  – Dennis Baldocchi, UCB
• Rhizosphere project
  – Jeff Dangl, UNC
  – Scott Clingenpeel
  – Tanja Woyke
JGI: Next Generation Genome Science User Facility
• Community Science Program
• JGI-EMSL Collaborative Science Program
• Emerging Technologies Opportunity Program (ETOP)
• Visiting Scientist Program
• Distinguished Postdoctoral Fellow in Genomics
Questions?
Contact: [email protected]
Genomic signatures of plant association
Comparing genomes of phylogenetically related plant-associated and non-plant-associated bacteria to identify gene families overrepresented in plant-associated (PA) organisms
1300 genomes (6.5M genes) included in analysis
Taxon               # PA genomes   # NPA genomes   # total genes
Bacillus            28             141             852,695
Burkholderiales     60             144             1,072,509
Microbacteriaceae   18             27              140,326
Micrococcaceae      13             22              117,775
Nocardiaceae        13             37              315,053
Paenibacillus       11             29              226,631
Pseudomonas         152            77              1,242,490
Rhizobiales         170            152             1,717,437
Sphingomonas        7              12              73,946
Streptomyces        9              65              513,229
Xanthomonadaceae    93             19              433,488
Recurrent plant-associated functions include chemotaxis, certain transposases, carbohydrate transport and metabolism, type III secretion systems, and nodulation
Asaf Levy
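One generic way to ask whether a gene family is overrepresented in PA genomes within a taxon is a one-sided hypergeometric (Fisher-type) test on presence/absence counts. The sketch below is an assumption-laden illustration, not the analysis behind the slide: it ignores phylogenetic relatedness among genomes, and the carrier counts in the example (20 PA and 10 NPA genomes carrying the family) are hypothetical. Only the Bacillus genome totals (28 PA, 141 NPA) come from the table.

```python
from math import comb


def hypergeom_enrichment_p(pa_with, pa_total, npa_with, npa_total):
    """One-sided p-value that a gene family is overrepresented in PA genomes.

    Treats the PA genomes as a draw of size pa_total from all genomes and
    returns P(X >= pa_with), where X is the number of carriers in the draw.
    """
    total = pa_total + npa_total
    carriers = pa_with + npa_with
    max_k = min(pa_total, carriers)
    favourable = sum(
        comb(carriers, k) * comb(total - carriers, pa_total - k)
        for k in range(pa_with, max_k + 1)
    )
    return favourable / comb(total, pa_total)


if __name__ == "__main__":
    # Genome totals from the Bacillus row; carrier counts are hypothetical.
    p = hypergeom_enrichment_p(pa_with=20, pa_total=28, npa_with=10, npa_total=141)
    print(f"p = {p:.3g}")
```

In practice such a test would be run per gene family and per taxon, followed by multiple-testing correction, and phylogeny-aware methods are preferable when related genomes are not independent.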