Strains, Functions and Dynamics in the Expanded Human Microbiome Project

OPEN ARTICLE doi:10.1038/nature23889 Strains, functions and dynamics in the expanded Human Microbiome Project Jason Lloyd-Price1,2*, Anup Mahurkar3*, Gholamali Rahnavard1,2, Jonathan Crabtree3, Joshua Orvis3, A. Brantley Hall2, Arthur Brady3, Heather H. Creasy3, Carrie McCracken3, Michelle G. Giglio3, Daniel McDonald4, Eric A. Franzosa1,2, Rob Knight4,5, Owen White3 & Curtis Huttenhower1,2 The characterization of baseline microbial and functional diversity in the human microbiome has enabled studies of microbiome-related disease, diversity, biogeography, and molecular function. The National Institutes of Health Human Microbiome Project has provided one of the broadest such characterizations so far. Here we introduce a second wave of data from the study, comprising 1,631 new metagenomes (2,355 total) targeting diverse body sites with multiple time points in 265 individuals. We applied updated profiling and assembly methods to provide new characterizations of microbiome personalization. Strain identification revealed subspecies clades specific to body sites; it also quantified species with phylogenetic diversity under-represented in isolate genomes. Body-wide functional profiling classified pathways into universal, human-enriched, and body site-enriched subsets. Finally, temporal analysis decomposed microbial variation into rapidly variable, moderately variable, and stable subsets. This study furthers our knowledge of baseline human microbial diversity and enables an understanding of personalized microbiome function and dynamics. The human microbiome is an integral component in the maintenance Nevertheless, technical differences were even lower, indicating a base- of health1,2 and of the immune system3,4. Population-scale studies have line level of intra-individual strain variation over time (Extended aided in understanding the functional consequences of its remarkable Data Fig. 2b). inter-individual diversity, the earliest of which include MetaHIT5,6 and Several species exhibited differentiation into body site-specific the Human Microbiome Project1 (referred to here as HMP1). Studies subspecies clades (Fig. 1c; Extended Data Fig. 2e–u), defined here continue to focus on the gut7–9, with fewer population-scale cohorts as discrete phylogenetically related clusters of strains, according to a investigating vaginal10, oral11, or skin12 microbial communities. HMP1 silhouette-based score of niche association (Methods). This is readily remains the largest body-wide combined amplicon and metagenome visible in extreme cases, such as Haemophilus parainfluenzae survey of the healthy microbiome to date. (Fig. 1d), in which distinct subspecies clades are apparent in the suprag- Here we report on an expanded dataset from the HMP (HMP1-II), ingival plaque, buccal mucosa, and tongue dorsum. Other species with consisting of whole-metagenome sequencing (WMS) of 1,631 new notable site-specific subspecies clades included Rothia mucilaginosa, samples from the HMP cohort13 (for a total of 2,355; Extended Data Neisseria flavescens, and a Propionibacterium species. Some species did Fig. 1a; Extended Data Table 1a; Supplementary Table 1). New samples not sub-speciate within body sites, but instead specialized in clades dif- greatly expand the number of subjects with sequenced second and third fering among individuals (for example, Eubacterium siraeum (Fig. 1e), visits, and primarily target 6 body sites (from 18 total sampled): anterior or Actinomyces johnsonii (Extended Data Fig. 2d)); others showed no nares, buccal mucosa, supragingival plaque, tongue dorsum, stool, and discrete subspecies phylogenetic structure at all in this population (for posterior fornix. After quality control (Methods), the dataset consisted example, Streptococcus sanguinis, Extended Data Fig. 2u). Interestingly, of 2,103 unique metagenomes and 252 technical replicates, which were no subspecies clades were found to be specific to either of the two cities used in all the following analyses. Profiles, raw data, and assemblies in the study (Extended Data Fig. 2a), although geographically locali- are publicly available at http://hmpdacc.org (Extended Data Table 1b) zed subspecies population structure has been observed in cohorts with and https://aws.amazon.com/datasets/human-microbiome-project/. greater geographic range15. Culture-independent strain profiling, in combination with the 16,903 Body-wide strain diversity and ecology NCBI isolate genomes used as references in this analysis19, provided a The diversity and spatiotemporal distributions of strains were first new quantification20 of how well covered human microbial diversity is investigated using StrainPhlAn14 (Fig. 1), which identifies the domi- by these references (Fig. 1f). Well-sequenced species such as Escherichia nant haplotype (‘strain’) of each sufficiently abundant species in a coli (Extended Data Fig. 2c) and the lactobacilli showed little divergence metagenome (Methods, Supplementary Table 2). Most previous from reference isolates. However, many prevalent and abundant spe- culture-independent strain surveys have targeted only the gut15,16, cies in the body-wide microbiome diverged markedly from the closest and body-wide phylogenetic distances (quantified using the Kimura available reference genomes. Notable clades lacking isolate genomes two-parameter distance17) suggest that all other habitats possess representative of those in the microbiome included Actinomyces greater strain diversity (Fig. 1a). Consistent with previous observa- (Fig. 1b), Haemophilus parainfluenzae (Fig. 1d), Eubacterium rectale, tions15,18, strain profiles were stable over time, with differences over and several Streptococcus and Bacteroides species, and these represent time consistently lower than differences between people (Fig. 1a, b). priority targets for isolation. 1Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115, USA. 2The Broad Institute, Cambridge, Massachusetts 02142, USA. 3Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland 21201, USA. 4Department of Pediatrics, University of California San Diego, La Jolla, California 92093, USA. 5Department of Computer Science & Engineering, University of California San Diego, La Jolla, California 92093, USA. *These authors contributed equally to this work. 5 OCTOBER 2017 | VO L 550 | NAT U RE | 61 © 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. RESEARCH ARTICLE Technical replicates acWithin subjects f Reference genome coverage Between subjects 43 Rothia mucilaginosa (4) 0.06 Escherichia coli (2,704) 29 Neisseria flavescens (2) Lactobacillus jensenii (11) Propionibacterium sp. 434 HC2 (3) Bacteroides fragilis (90) Haemophilus parainfluenzae (6) Staphylococcus epidermidis (240) Campylobacter showae (2) Lactobacillus crispatus (9) 11 Staphylococcus epidermidis (3) Propionibacterium acnes (79) 6 Porphyromonas sp. oral taxon 279 (3) Bacteroides dorei (8) 0.04 Fusobacterium nucleatum (2) Bacteroides eggerthii (3) Eubacterium siraeum (4) 25 Streptococcus infantis (2) 4 Propionibacterium acnes (3) Bacteroides ovatus (3) 22 Streptococcus parasanguinis (2) Bacteroides finegoldii (3) Alistipes shahii (1) Fusobacterium periodonticum (2) 6 41 Bacteroides caccae (2) Propionibacterium sp. KPL1854 (2) 0.02 Bacteroides cellulosilyticus (3) Propionibacterium granulosum (2) Bacteroides uniformis (3) Lactobacillus crispatus (2) Mean distance of strains 14 Dialister invisus (1) 3 Corynebacterium matruchotii (2) Actinomyces sp. oral taxon 448 (1) 22 2 Veillonella parvula (4) Streptococcus sanguinis (23) 32 6 Veillonella atypica (3) Streptococcus salivarius (13) 1 21 2 Prevotella melaninogenica (2) Streptococcus parasanguinis (23) 0.00 Rothia dentocariosa (3) Haemophilus parainfluenzae (13) Streptococcus sanguinis (4) Eubacterium rectale (3) s Rothia aeria (2) ucosa sum Stool fornix Odoribacter splanchnicus (1) m plaque 0.0 0.2 0.4 0.6 al Streptococcus infantis (4) AnteriorBuccal nare Tongue dor Posterior Niche-association measure Supragingiv 0.00 0.25 0.50 0.75 1.00 1 – UniFrac G bdActinomyces sp. oral taxon 448 Haemophilus parainfluenzae e Eubacterium siraeum Body site Buccal mucosa ) ) Keratinized gingiva Subgingival plaque Supragingival plaque Tongue dorsum Stool PCo 2 (7.6%) PCo 2 (15.7% Other PCo 2 (10.7% Reference genome Same subject PCo 1 (17.5%) PCo 1 (19.8%) PCo 1 (49.8%) Figure 1 | Personalization, niche association, and reference genome greater phylogenetic separation between body sites. d, PCoA showing coverage in strain-level metagenomic profiles. a, Mean phylogenetic niche association of Haemophilus parainfluenzae, showing subspecies divergences17 between strains of species with sufficient coverage at specialization to three different body sites. e, PCoA for Eubacterium each targeted body site (minimum 2 strain pairs). b, Individuals tended siraeum. f, Coverage of human-associated strains by the current 16,903 to retain personalized strains, as visualized by a principal coordinates reference genome set (Methods). Top 25 species by mean relative analysis (PCoA) plot for Actinomyces sp. oral taxon 448, in which abundance when present (>0.1% relative abundance) are shown lines connect samples from the same individual. c, Quantification of (minimum prevalence of 50 samples). Sample counts in Supplementary niche association (Methods; only species with sufficient coverage in Table 2, and distance matrices are available from Extended Data Table 1b. at least five samples at two or more

Load more