bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Stability and dynamics of the human gut microbiome and its association with systemic immune traits

Allyson L. Byrd1*, Menghan Liu1,2, Kei E. Fujimura1, Svetlana Lyalina3, Deepti R. Nagarkar1, Bruno Charbit4, Jacob Bergstedt5, Etienne Patin5, Oliver J. Harrison6, Lluís Quintana-Murci5,7, Darragh Duffy8*, Matthew L. Albert9 and The Milieu Intérieur Consortium†

1Department of Cancer Immunology, Genentech Inc., San Francisco, CA, 94080, USA 2Sackler Institute of Graduate Biomedical Sciences, New York University School of Medicine, New York, NY, 10016, USA 3Personalized Health Care, Genentech Inc., San Francisco, CA, 94080, USA 4Center for Translation Research, Institut Pasteur, Paris, France 5Human Evolutionary Genetics Unit, Institut Pasteur, UMR 2000, CNRS, Paris, France 6Benaroya Research Institute, Seattle, WA, 98101, USA. 7Collège de France, Paris, France 8Immunobiology of Dendritic Cells laboratory, INSERM U1223, Institut Pasteur, Paris, France 9Insitro, 279 East Grand Avenue, South San Francisco, CA, 94080, USA *Correspondence: [email protected] (A.L.B.), [email protected] (D.D.).

1

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Summary

Analysis of 1,363 deeply sequenced gut microbiome samples from 946 healthy donors of the Milieu Intérieur cohort provides new opportunities to discover how the gut microbiome is associated with host factors and lifestyle parameters. Using a genome- based to achieve higher resolution analysis, we found an enrichment of in males, and that bacterial profiles are dynamic across five decades of life (20-69), with Bacteroidota species consistently increased with age while Actinobacteriota species, including Bifidobacterium, decreased. Longitudinal sampling revealed short-term stability exceeds inter-individual differences; however, the degree of stability was variable between donors and influenced by their baseline community composition. We then integrated the microbiome results with systemic immunophenotypes to show that host/microbe associations discovered in animal models, such as T regulatory cells and short chain fatty acids, could be validated in human data. These results will enable personalized medicine approaches for microbial therapeutics and biomarkers.

2

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Introduction

Microbial therapeutics, including fecal microbiota transplants, bacterial consortia and probiotics are increasingly being tested in patients with difficile infections and other gastrointestinal (GI) disorders (Allegretti et al., 2019), including inflammatory bowel disease, and more recently non-gastrointestinal indications such as autism (Kang et al., 2019) and cancer (Mullard, 2018). While the impact of microbial therapies on the recipients’ immune system has been characterized in animal models for several of these indications; in humans, the specific ways in which microbial therapies impact the recipient’s immune system are poorly understood. In parallel to microbial therapeutics, microbial signatures are being evaluated as a novel class of biomarkers, applied for stratification of efficacy and safety in clinical trials across multiple indications (Ananthakrishnan et al., 2017; Dubin et al., 2016). Notably, this rapid increase in microbial therapeutics and biomarkers justifies a careful reevaluation of the factors influencing an individual’s personal gut microbiome over time, as well as how gut microbes associate with systemic immunity in the steady state.

In this paper, we present a comprehensive assessment of the gut microbiome of 946 well-defined heathy French donors from the Milieu Intérieur (MI) consortium with 1,363 shotgun metagenomic samples. Designed to study the genetic and environmental factors underlying immunological variance between individuals, the MI consortium is comprised of 500 women and 500 men evenly stratified across 5 decades of life from 20 to 69 years of age, for whom extensive metadata, including demographic variables, serological measures, dietary information, as well as systemic immune profiles, are available (Patin et al., 2018; Thomas et al., 2015).

To build on the findings of several landmark microbiome studies (Falony et al., 2016; Human Microbiome Project, 2012; Zhernakova et al., 2016), many of which relied on an older reference library for taxonomic classification of microbial sequence reads (Truong et al., 2015), we leveraged an expanded set of reference genomes with a novel taxonomy that corrects many misclassifications in public databases to discover new biological insights particularly around age and sex (Parks et al., 2018). Notably, an 3

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

independent dataset was used for replication of many of the findings (Zeevi et al., 2015). Longitudinal samplings from half the donors were used to expand on previous work that individuals are more similar to themselves over time, as compared to others (Costello et al., 2009; Flores et al., 2014; Mehta et al., 2018), and to identify factors that influence an individual’s stability over a two week time period. Using this improved annotation, we relate microbial composition to systemic immune profiles to uncover novel /immune interactions which can be experimentally tested and ultimately leveraged to design a microbial therapeutic with a desired immunogenicity. Overall this study utilizes the increased availability of reference genomes, granularity afforded by deep shotgun metagenomic sequencing, and statistical power of a large, well- characterized cohort to provide new insights into host/bacteria biology that will enable personalized medicine approaches for microbial therapeutics and biomarkers. In addition, the rich metadata and 1,000 plus deep shotgun metagenomic samples provided here will be a valuable resource upon which future microbiome studies can test and build new computational tools as well as generate and test new hypotheses.

Results

The Genome Tree Database improves taxonomic resolution of k-mer based approaches

Historically, microbial sequencing efforts focused predominantly on a small number of organisms, often causes of nosocomial infections [Figure S1A,B]. By contrast, reference databases, including NCBI Genbank, are increasingly populated with genomic information of commensal microbes (Browne et al., 2016; Forster et al., 2019; Poyet et al., 2019; Zou et al., 2019). As genome reference databases expand, historical, microbiological-based taxonomic assignments do not reflect population level relationships inferred from genome sequencing. This is particularly problematic for k-mer based analyses which utilize sequence similarity between closely related genomes to infer which taxa are present (Nasko et al., 2018).

4

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

To overcome these issues, in lieu of the traditional NCBI taxonomy, we generated a custom reference database of 23,505 RefSeq genomes with Genome Tree Database (GTDB) taxonomies (See methods)[Table S1]. Briefly, GTDB is a bacterial taxonomy based on a concatenated protein phylogeny in which polyphyletic groups were removed and taxonomic ranks were normalized on the basis of relative evolutionary divergence (Parks et al., 2018). The impact of this procedure was particularly prominent for species of the genus Clostridium, which were split into 121 unique genera spanning 29 families (Parks et al., 2018). This could be especially meaningful for analysis of gut microbiome samples as Clostridium species are a prevalent community members and often emerge in association studies.

The Refseq sequences and taxonomic tree from the GTDB were used to build a reference database for the k-mer based program Kraken2 (Wood and Salzberg, 2014) and read-reassignment step Bracken (Lu et al., 2017). This custom Kraken2/GTDB pipeline was applied to 1,363 quality-controlled samples from 946 MI donors [Figure S1C- F, Table S2,3], and compared using both the marker gene based tool Metaphlan2 (Truong et al., 2015) and Kraken2 with the same 23,505 reference genomes using their original NCBI taxonomies [Figure S2]. Consistently, more bacterial taxa were identified per sample with Kraken2 than Metaphlan2, a result of the updated reference database and higher sensitivity of this k-mer based approach (McIntyre et al., 2017) [Figure S2A-C]. Between the two Kraken databases (GTDB and NCBI), richness varied depending on how taxa were re-distributed by GTDB. For example, GTDB split 2,397 NCBI genera into 3,205, while it collapsed 18,795 NCBI species into 13,446 [Figure S2A,D]. Despite finer level differences, the overall distribution of phyla across the 3 approaches was similar [Figure S2E], indicating that Kraken2/GTDB pipeline results would be consistent with previous analyses. As such, a combination of k-mer based read assignment and genome- based taxonomy allows higher resolution analysis of shotgun metagenomic samples.

5

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Variable gut microbiomes in a restricted geographical region.

To complement our optimized taxa-based approach and further utilize the resolution afforded by shotgun metagenomic sequencing, we applied HUMANn2 to identify the functional potential of microbial pathways present in the MI samples (Franzosa et al., 2018) [Table S4]. Utilizing both the Kraken2/GTDB and HUMANn2 pipelines, we identified a broad range of diversity across the 946 individuals in this geographically-restricted cohort of healthy French adults. This diversity was observed in terms of metabolic pathway richness (282 ± 40), species richness (353 ± 45), as well as Shannon diversity (3.9 ± 0.36), which accounts for both richness and evenness [Figure 1A-C, Table S2]. Across donors, our GTDB pipeline confirmed and Bacteroidota (formerly ) as the most abundant phyla in the gut, but enabled distinction among the original Firmicutes phyla, which was further divided into 12 distinct categories; Firmicutes, Firmicutes_A, Firmicutes_B, … Firmicutes_K [Table S1]. Notably, throughout the GTDB, the group containing type material (if known) kept the original unsuffixed name. Of those, 7 were present in this cohort with Firmicutes_A the most abundant, followed by Firmicutes and Firmicutes_C [Figure 1D, Table S3], highlighting the finer granularity, even at the phylum level, provided by GTDB-based taxonomic calls. Subsequent application of the Bray-Curtis distance metric, as a means to assess species presence/absence in addition to relative abundance across donors, demonstrated that samples fell along a gradient defined by the relative abundances of Firmicutes_A and Bacteroidota [1st dimension of MDS projection, Figure 1E].

Using this extensively characterized cohort we explored how 154 metadata variables [Figure S3, Table S5,6], including 42 laboratory measurements and 43 dietary variables, contributed to overall bacterial community composition. We identified 51 variables (33% of total) that associated with Kraken2-GTDB species (multivariate PERMANOVA test FDR <0.05); all of which replicated at the genera-level (multivariate PERMANOVA test FDR <0.1) [Figure 1F, Table S7]. The top contributors were age and sex with lesser contributions from diet such as consumption of raw fruit, cooked, cured meats, as well as frequency of fast food consumption, in line with previous reports of 16S

6

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

rRNA analyses from this cohort (Partula et al., 2019; Scepanovic et al., 2019). Notably, sex and age were associated with 24 and 44 of the other metadata variables respectively, which confounds our ability to dissociate the individual effects of these variables on microbial community composition [Figure S4]. In total, these factors explained less than 10% of population variability, indicating that the majority of variance in community composition remains unexplained. Drawbacks of this analysis are the absence of Bristol stool score, a measure of stool consistency, and levels of chromogranin A, a protein secreted by enteroendocrine cells, the factors most associated with community composition in previous European cohorts (Falony et al., 2016; Zhernakova et al., 2016). Although genetic data were also available for these donors, they were not considered here based on previous analyses that the effects of host genetics on microbiome are minimal in this (Scepanovic et al., 2019) and other cohorts (Rothschild et al., 2018) in part due to the small population sizes by GWAS standards (Goodrich et al., 2017).

Notably, in this healthy cohort medication usage was low, with only 28% of individuals (n=266) taking medication of any kind. Of all medications, only oral contraception was taken by more than 10% of participants (n=111). In pre-menopausal women, oral contraception was taken by 36% (110/303) and explained 0.005% of the variance (p-value = 0.002). In contrast, relatively common medications, beta-blockers and proton pump inhibitors were taken by only 7 and 4 individuals respectively. Despite this, medication usage was a significant albeit minor contributor (R2 = 0.003) to microbial community composition, highlighting how xenobiotics can, and do influence the gut microbiome (Jackson et al., 2018; Maier et al., 2018).

MI bacterial profiles are comparable to those from Israeli donors

In order to determine whether MI bacterial profiles were unique to this population, or comparable with other non-European healthy cohorts, we ran our Kraken2/GTDB pipeline on 1,161 samples from 851 Israeli donors originally published by Zeevi et al (Zeevi et al., 2015) [Table S8], for whom age, sex, and body mass index (BMI) were provided [Figure S5A,B]. After accounting for sequencing depth (mean read count MI:

7

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

14.2 million vs. Zeevi 13.5 million), we found that richness across taxonomic levels was consistently elevated in the MI samples, even though the percentage of unmapped reads was comparable (MI 42.2% versus Zeevi 39.5%) [Figure S5C-E]. More specifically, we identified on average 75 more species in samples from the MI donors compared to those from the Zeevi cohort. In addition to potential technical and lifestyle reasons, this discrepancy could reflect the stricter inclusion and exclusion criteria, and thus the greater overall health of the MI donors (Thomas et al., 2015; Zeevi et al., 2015).

On the whole, community composition including relative taxa abundances and beta diversity was consistent across both cohorts [Figure S5F-H]. Notably however, the contributions of age and sex to community composition were almost 2 times greater in MI than Zeevi (Age: R2 0.0088 vs. 0.0038, Sex: 0.011 vs. 0.0068)[Figure S5I], highlighting how stratification of age and sex in the MI cohort provided enhanced statistical power to identify new associations [Figure S5A,B]. Despite the geographic and cultural distinctions between these cohorts, our findings demonstrate a comparable makeup of the gut microbiome. This allowed us to utilize the Zeevi samples as a replication cohort to demonstrate the reproducibility of our findings in MI.

Prevotella species are more abundant in male donors

Given that sex and age were the variables most strongly associated with bacterial community composition in healthy individuals, we leveraged the statistical power of the MI cohort to explore which taxa were differentially abundant between sexes and across decades of life. To identify bacteria differentially abundant between the 473 females and 473 males, we conducted DESeq2 analysis (Love et al., 2014) on 547 abundant species (prevalence > 5% and a mean relative abundance > 0.01%) [Table S9]. Of the 130 differentially abundant species (FDR < 0.05) [Figure 2A], 12 were more abundant in

females, while 26 were more abundant in males with log2 fold change > 1 [Figure 2B,C]. In total, 17 out of 45 prevalent Prevotella species were more abundant in males compared to females, corresponding to a greater overall richness of Prevotella species in males [Figure 2D]. Similarly, in the Zeevi cohort, 25 species of Prevotella were more abundant

8

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

in males [Figure S6, Table S10]. Notably, even when a species was significantly differentially abundant between sexes in only one cohort, the direction of this trend was also consistent in the other, indicating that higher Prevotella abundance in males compared to females is a biological phenomenon consistent across multiple species and populations. This information expands the findings of two previous studies, one that identified Bacteroides-Prevotella as broadly more abundant in males than females based on 16S rRNA-targeted oligonucleotide probes (Mueller et al., 2006) and another that found males were three times more likely to have an enterotype consisting of fewer Bacteroides and higher Prevotella (Ding and Schloss, 2014). Although the factors driving preferential colonization of Prevotella in males are unknown, from these data we could generate hypotheses surrounding the roles of gonadal hormones and microbial community composition.

When considering metabolic pathways, we identified 54 (FDR < 0.05) differentially abundant between the sexes [Table S11]. Of those, the pathway CRNFORCAT-PWY: creatinine degradation I was the most strongly enriched in men [Figure 2E]. Biologically, this is consistent with men having greater blood levels of creatinine [Figure 2F]. Across both sexes but not in each individually, circulating creatinine levels were significantly associated with the abundance of the CRNFORCAT-PWY pathway (Both sexes Spearman rho = 0.087, p-value = 0.0014; men rho = 0.028, p-value = 0.47; women rho = -0.047, p-value = 0.22). Overall, this exemplifies how adaptation to utilize available nutrients may influence microbiome composition.

The gut microbiome is dynamic across decades of life

The composition of the gut microbiome differs dramatically between newborns and adults, with the neonatal microbiome transitioning to a more adult-like state upon consumption of solid food and cessation of breast-feeding (Backhed et al., 2015; Stewart et al., 2018). Similarly, comparison of young adults, elderly individuals, and centenarians has identified differences in microbial community by culturing or 16S rRNA (reviewed by (An et al., 2018)). The design of the MI cohort provides a unique opportunity to explore

9

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

how in the absence of underlying disease the gut microbiome is dynamic across the adult decades (20 to 69 years old).

In total, we identified 227 out of 547 abundant species that were differently abundant by donor age (Spearman correlations of age by taxa relative abundance, FDR < 0.05) [Table S12]. Notably, 3 of the top 5 phyla (Bacteroidota, Actinobacteriota, and Proteobacteria), experienced shifts in relative abundance across the decades [Figure 3A]. Transitions were most pronounced around 40-50 years old, a time span when many persons experience the preclinical stages of chronic diseases, and women begin to experience hormonal changes associated with onset of menopause; average age in this cohort 50 ± 4.2 years. Across phyla, the associations with age were conserved across sexes and cohorts [Figure S7A,B]. For example, relative abundances of Proteobacteria and Bacteroidota were primarily increased with age, while Actinobacteriota, including 17 species of Bifidobacterium were decreased with age [Figure 3B]. This gradual decline of Bifidobacterium was true in terms of both relative abundance as well as presence/absence [Figure 3C,D]. Notably, Collinsella species that were positively correlated with Bifidobacterium were also decreased with age [Figure 2B, Figure S7C].

The decline in Bifidobacterium prominence with age is particularly interesting in light of Bifidobacterium being the dominant bacteria in newborns, gradually decreasing as infants cease breastfeeding (Stewart et al., 2018). The association of Bifidobacterium and old age indicates that the loss of Bifidobacterium occurs not only in infants, but continues throughout adulthood (An et al., 2018; Biagi et al., 2016; Biagi et al., 2010; Kato et al., 2017; Mueller et al., 2006). Similar to our findings associating Prevotella and sex [Figure S6B], we built on previous findings and revealed the trend was consistent across species within the genera and across cohorts [Figure S7D, Table S13], highlighting how the phenomenon is intrinsic to this species. Notably, the only exception, Bifidobacterium animalis, is a common probiotic-associated strain (Turroni et al., 2009), rather than a persistent colonizer. Moving forward, comparative genomic analyses between these different species could reveal features associated with colonization in older adults.

10

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

We then focused our attention on 364 prevalent microbial pathways (prevalence > 5%) and identified 108 that correlated with age [Table S14], of which 31 were increased and 77 were decreased [FDR < 0.05], including several lactose and galactose degradation pathways [Figure 3E]. Lower levels of lactose/galactose degradation may explain increased lactose intolerance in older adults and presents a possible opportunity for microbial therapeutic intervention (Gingold-Belfer et al., 2019; Savaiano et al., 2013). Notably in this cohort, the abundance of these pathways was not associated with consumption of dairy products, e.g. milk, cheese, and yogurt (Spearman p-values >0.3).

Other pathways associated with age were related to L-histidine. In this case, pathways for L-histidine biosynthesis were decreased with age while those for degradation were increased [Figure 3F], indicating that gut L-histidine levels may be decreased in older adults which could lead to an altered immune state as L-histidine metabolites have been demonstrated to influence colonic inflammation (Gao et al., 2017). Overall, understanding the multitude of microbial associations with age will become increasingly important as microbial therapeutics are being evaluated as treatments for individuals of all ages.

Short term stability is variable across donors

To complement our cross-sectional study of the microbiome across the decades, we leveraged longitudinal sampling of roughly half the cohort (n = 413) to study short- term (17 ± 3.3 days) dynamics within an individual in the absence of antibiotic exposure. By comparing species Bray-Curtis distances within and between individuals, we found that in the short-term intra-individual differences were less than the inter-individual ones [Figure 4A]. This is consistent with previously published findings (Costello et al., 2009; Flores et al., 2014; Mehta et al., 2018), and also reflected the analysis of relative abundance and metabolic pathways presence/absence [Figure 4A]. Notably, differences between donors were less dramatic at the pathway level, reflecting the more conserved nature of annotated metabolic pathways versus species profiles across individuals (Human Microbiome Project, 2012).

11

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Although stability was the norm, the degree of stability (quantified as 1 - Bray Curtis distance (Mehta et al., 2018)) was variable across the 413 donors [Figure 4B,C], and as such we investigated which microbial and metadata features underlie this personalized stability trait. Using Spearman correlations, we identified the phyla Firmicutes, Firmicutes_A and Actinobacteriota as enriched at baseline in the less stable donors, while Bacteroidota was higher in donors with greater stability over time (Flores et al., 2014) [Figure 4D, TableS15]. This is consistent with observations that spore forming bacteria, including many Firmicutes species, are intrinsically less stable (Kearney et al., 2018). Using generalized linear models, we found BMI and Triglycerides levels were negatively associated with stability, while conversely, consumption of sweet items (e.g. chocolate, sweets, honey, jam) and raw fruit were positively associated [Figure 4E, Table S16], concordant with diet being a key determinant of human gut microbiome variation (Johnson et al., 2019). While the previous results were only marginally significant (p-value < 0.05), consistent with previous findings (Flores et al., 2014; Mehta et al., 2018) and ecological theory (McCann, 2000; Schindler et al., 2010; Tilman, 1999), baseline species Shannon diversity was positively associated with stability (FDR = 0.014) [Figure 4E], i.e. individuals with more diverse communities were more resilient to change than individuals with lower diversity.

We then applied the Bray-Curtis metric to calculate stability (1 – BC) of individual species and pathways [Table S17,18], as previously done (Faith et al., 2013; Franzosa et al., 2015). For both species and pathways, we found that their stability was strongly associated with their mean abundance and prevalence across donors [Figure 4F, G]. For example, the Firmicutes Enterococcus_B_durans had a mean abundance of 0.021%, prevalence 7%, and the third lowest stability 0.085. Additionally, many species known to be present in yogurt and probiotics were also highly unstable, e.g. Lactobacillus_D sakei, Lactococcus lactis, and Bifidobacterium animalis [Figure 4F] (Fijan, 2014); in agreement with previous observations that probiotics often face colonization resistance (Zmora et al., 2018). Moving forward, these data can be leveraged to prioritize microbial

12

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

pathways/species that will make reliable biomarkers as well as persistent colonizers if incorporated into a microbial therapeutic.

Bacterial species and microbial pathways are associated with systemic immunophenotypes

The microbiota has been shown to play a major role in the development, education and function of the mammalian immune system, largely through the use of mouse models (Belkaid and Harrison, 2017). However, there remains a gap in our knowledge concerning the association of microbial signatures and immune activation in healthy humans. Having established high resolution bacterial profiles, we leveraged immunophenotyping of matched peripheral blood samples to explore how gut bacterial species/pathways were associated with 86 systemic immune cell profiles (Patin et al., 2018). Of these immunophenotypes, 50 related to adaptive immunity and 36 with innate immune cell subsets [Figures S8,9, Tables S19,20]. After correcting for age, sex, cytomegalovirus (CMV) status, smoking, genetics, and batch, which were previously shown to associate with immune profiles (Patin et al., 2018), we applied multivariate linear models to associate 547 bacterial species (prevalence >5%, average abundance > 0.01%) and 350 metabolic pathways (prevalence > 20%) with the 86 immunophenotypes.

To evaluate the applicability of this approach, we initially investigated microbial

+ species associated with circulating regulatory T (Treg) and TCRgd T cell populations. Experimentally in animal models, unique microbial species have been previously demonstrated to engage these distinct cellular subsets (Belkaid and Harrison, 2017). In total, 37 pathways (FDR < 0.1, corresponding p-value < 0.005) [Figures 5A, Table S21] and 9 species (FDR < 0.1, corresponding p-value < 4 x 10-4 )[Figure S10A, Table S21]

+ were associated with circulating Treg cells and TCRgd T cells. In line with observations

that supplementation of short chain fatty acids (SCFAs) augments colonic Treg cell frequencies in animal models (Arpaia et al., 2013; Furusawa et al., 2013; Smith et al., 2013), the SCFA pathway “P461-PWY: hexitol fermentation to lactate, formate, ethanol

and acetate” was positively associated with circulating Treg cells numbers [Figure 5B]. As

13

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

such, combined microbial profiling and extensive immunophenotyping illustrated that the

SCFA/Treg cell axis is relevant throughout adult human life. Additionally, pathways in the “Fatty Acid and Lipid Biosynthesis” superfamily, including “PWYG-321 mycolate biosynthesis”, were positively associated with circulating TCRgd+ T numbers [Figure 5C], particularly in individuals in their 60s in whom TCRgd+ T counts are lowest [Figure 5D]. Notably, this correlation is supported by experimental observations that TCRgd+ T cells are capable of recognizing microbial lipid antigens, including mycolic acids (Ridaura et al., 2018). Together these results demonstrate that certain concepts established in animal models can be validated using human subject data. Furthermore, at the species level, the species Terrisporobacter othiniensis, Terrisporobacter glycolicus and Terrisporobacter glycolicus_A were positively associated with circulating TCRgd+ T cell numbers [Figure S10B]. As such, we identify both previously known microbial associations with TCRgd+ T cell responses and novel associations for future mechanistic investigation.

We then extended our analyses to the remaining 83 immunophenotypes, in order to discover additional novel microbial/immune associations. After 45,401 tests (FDR < 0.1, corresponding p-value < 5 x 10-5), 20 species were associated with 17 immunophenotypes. [Figure 6A, Table S22]. Of the 24 pairs, 8 (33%) were positive associations; many of which demonstrated consistency at the phylogenetic level, for example, multiple Actinobacteriota species, including Rothia mucilaginosa, Pauljensenia odontolyticus_A, and Pauljensenia odontolyticus_B were positively associated with expression of the activation marker CD38 by B cells, which was previously strongly correlated with smoking (Patin et al., 2018) [Figure 6B]. Additionally, the Bacteroidota species Bacteroides intestinalis was negatively associated with multiple circulating T cell subsets, including both CD4+ and CD8+ central memory T cells [Figure 6A]. Further studies, requiring gastrointestinal tissue immunophenotyping, could determine whether this is due to accumulation of these cells locally within the intestinal microenvironment.

To further investigate the mechanisms of microbial/immune associations, after 29,050 tests (FDR < 0.1, corresponding p-value < 2.2 x 10-5), 3 pathways were associated

14

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

with 5 immunophenotypes. [Table S22]. Notably, two pathways promoting degradation of the neuromodulator 4-aminobutanoate (GABA) were strongly associated with complement receptor 2 (CD21) expression by peripheral blood B cells [Figure 6C]. In previous association studies, the metabolic pathway “PWY-5022 4-aminobutanoate degradation V”, responsible for the breakdown of GABA into the SCFA butyrate, was identified as causally associated with increased insulin secretion following an oral glucose challenge (Sanna et al., 2019). Additionally, neuro-immunology is an emerging focus for disease pathogenesis, one that might be influenced by commensal metabolic pathways and bears further investigation (Godinho-Silva et al., 2019). Moving forward, as sample sizes increase, new microbe x immune interactions will continue to be uncovered.

Discussion

As the number of microbial intervention trials and biomarker studies continues to grow, it is increasingly important to develop robust understanding of the gut microbiome across individuals in the steady state. In this study, we utilize the statistical power of a large cohort, the resolution afforded by deep shotgun sequencing, and an updated microbial database with genome-based taxonomy to expand on many prior microbiome observations. More specifically, we identified 28 Prevotella species enriched in men compared to women [Figure S6]; many of which were absent in previous databases (Truong et al., 2015) and thus not detectable in prior analyses. Given the recent literature on the strain-level variability within Prevotella species (De Filippis et al., 2019; Fehlner- Peach et al., 2019), particularly P. copri, follow-up analyses should compare if there are also strain-level differences between the sexes.

Additionally, we identified 227 species associated with age [Figure 3B], greatly expanding what was known about the effects of aging on the gut microbiome (reviewed by (An et al., 2018)). The changes seen here are particularly striking because they occur in the absence of underlying diseases or medication usage. Given the cross-sectional nature of this cohort, it is difficult to tease apart which of these associations is mediated by the variables correlated with age [Figure S4B]. For example, the interdependence of age and changes in dietary behavior could not be parsed from dependant physiological

15

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

changes such as thinning of the mucosal layer or altered pH levels. To definitively understand how the gut microbiome matures with age, we will require long-term longitudinal studies. In addition, these conclusions are based on stool samples; to better understand where in the gut these changes are manifesting would require sampling along the gastrointestinal tract, as previously performed in smaller cohorts (Zmora et al., 2018).

Despite these caveats, the knowledge that many bacteria are associated with age and sex could have multiple implications for the interpretation of microbial biomarker studies. For example, several bacterial biomarkers have been reported for response to checkpoint inhibitors in non-GI cancer indications (Frankel et al., 2017; Gopalakrishnan et al., 2018; Matson et al., 2018; Routy et al., 2018). Notably, however, age, a prognostic biomarker of response and strong correlate of microbial composition, remains unaccounted for in those analyses. Additionally, these data will be valuable for designing microbial therapeutics, especially those for indications affecting older individuals. For example, interventions containing Bifidobacterium species may need to be dosed more frequently in individuals older than 50 years of age, in whom Bifidobacterium appears to colonize less effectively [Figure 3C,D, Figure S7D] . Similarly, consortia with species which demonstrated low stability in the short-term may also require additional dosing. With respect to the analysis of host immune system – microbiome associations, here we present the largest study to date to broadly associate gut bacterial profiles with circulating immune cells. We used this information to successfully validate results from animal studies in human subject data, including the association of SCFAs with T regulatory cells as well as microbial fatty acids and TCRgd+ T cells. While this dataset is well powered for targeted questions, additional samples will support the discovery of novel associations with smaller effect sizes. Finally, beyond the findings in this manuscript, the rich metadata and 1,000 plus deep shotgun metagenomic samples provided here will be a valuable resource upon which future microbiome studies can test and build new computational tools as well as generate and test new hypotheses.

16

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Acknowledgements

We would like to thank all the donors for their participation to the study. We thank J. Segre and F. Tamburini for critical reading of the manuscript and helpful comments. Author: The Milieu Intérieur Consortium†.

† The Milieu Intérieur Consortium¶ is composed of the following team leaders: Laurent Abel (Hôpital Necker), Andres Alcover, Hugues Aschard, Kalla Astrom (Lund University), Philippe Bousso, Pierre Bruhns, Ana Cumano, Caroline Demangel, Ludovic Deriano, James Di Santo, Françoise Dromer, Gérard Eberl, Jost Enninga, Jacques Fellay (EPFL, Lausanne), Ivo Gomperts-Boneca, Milena Hasan, Serge Hercberg (Université Paris 13, Paris), Olivier Lantz (Institut Curie, Paris), Hugo Mouquet, Etienne Patin, Sandra Pellegrini, Stanislas Pol (Hôpital Côchin, Paris), Antonio Rausell (INSERM UMR 1163 – Institut Imagine, Paris), Lars Rogge, Anavaj Sakuntabhai, Olivier Schwartz, Benno Schwikowski, Spencer Shorte, Frédéric Tangy, Antoine Toubert (Hôpital Saint-Louis, Paris), Mathilde Trouvier (Université Paris 13, Paris), Marie-Noëlle Ungeheuer, Darragh Duffy§, Matthew L. Albert (In Sitro)§, Lluis Quintana-Murci§,

¶ unless otherwise indicated, partners are located at Institut Pasteur, Paris, France § co-coordinators of the Milieu Intérieur Consortium

Additional information can be found at: http://www.pasteur.fr/labex/milieu-interieur This work benefited from support of the French government's Invest in the Future programme. This programme is managed by the Agence Nationale de la Recherche, reference ANR-10-LABX-69-01.

Author Contributions

Conceptualization, A.L.B., O.J.H., D.D., M.L.A, L.Q-M.; Methodology, A.L.B., M.L., J.B.; Software, A.L.B., M.L.; Formal Analysis, A.L.B., S.L., M.L., K.E.F.; Investigation, B.C.; Resources, D.D., M.L.A., L.Q-M,;Data Curation, A.L.B., J.B.; Writing – Original Draft, A.L.B., O.J.H.; Writing – Review & Editing, D.D, S.L., E.P., D.N., M.L., K.E.F., J.B., M.L.A.; Visualizations, A.L.B., M.L.; Supervision, D.N., D.D., M.L.A.; Project Administration, D.N., D.D.; Funding Acquisition, D.D., M.L.A., L.Q-M. Declaration of Interests

A.L.B., K.E.F., V.R., S.L., D.R.N are employees of Genentech. M.L. and M.L.A. were employees of Genentech. M.L.A. is an employee of Insitro. The remaining authors declare no competing interests.

17

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figures

A B C 150 150 90

100 100 60

Frequency 50 Frequency 50 Frequency 30

0 0 0 200 400 600 200 400 600 3 4 Pathway Richness Species Richness Species Shannon Diversity D E Firmicutes_A 0.2 0.2 Bacteroidota Actinobacteriota 0.0 0.0 Firmicutes −0.2 −0.2 Verrucomicrobiota MDS component:2 MDS component:2 Firmicutes_C −0.4 −0.4 −0.25 0.00 0.25 −0.25 0.00 0.25 Proteobacteria MDS component:1 MDS component:1 Desulfobacterota_A 0 25 50 75 Firmicutes_A Bacteroidota Relative Abundance 25 50 75 25 50 75

F FDR Species Genera Sex Age Raw fruit/mon HDL (mmol/L) ALAT (IU/L) Cook,cured meats/mon MCV (µm3) Glomerular filt rate Cooks <6 times/mon Cooked veggies/mon Takes medication Temperature (°C) Triglycerides (mmol/L) Essential oils BMI (kg/m^2) V1 Bicarbonates (mmol/L) Mumps vaccine Category (Count) Diastolic 1 (mmHg) Doesn't always eat dinner Basic physiological measurements (3) Wears corrective lenses Fried foods/mon Biometrics (3) Current smoker Measles MMRV IgG Concomitant drug treatment (2) Uses reduced sugar products Demographics (3) Fasting glycaemia (mmol/L) Phosphate (mmol/L) Food and nutrition (13) Household income >2000 EUR/mon

Variable Meat/mon Laboratory measure (20) Bread g per day Medical history (4) Total proteins (g/L) Monocytes (G/L) Smoking habits (1) Takes oral contraception Owns housing, landlord Socio−professional information (1) Potassium (mmol/L) Urea (µmol/L) Vaccination history (1) Black hair colour Total cholesterol (mmol/L) Albumin (g/L) Lymphocytes (G/L) Vascular system surgery Chicken pox Eats fast−food Cooked fruit/mon CMV CRP (mg/L) Toxoplasmosis ToRC IgG Reproductive system surgery Leucocytes (G/L) Dark brown hair colour Dried pulses/mon Hepatitis B 0.000 0.005 0.010 0.015 0.000 0.005 0.010 0.015 Explained Variance in BC distance (R2) Species Genera

<0.01 <0.05 <0.1

Figure 1. Inter−individual variation of bacterial composition is associated with many factors.

18

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 1. Inter-individual variation of bacterial composition is associated with many factors.

(A) Distribution of pathway richness across donors. (B) Distribution of species richness across donors. (C) Distribution of species Shannon diversity index across donors. (D) Boxplots of the top 8 phyla. Each dot corresponds to 1 donor. Firmicutes_A, Firmicutes_C, and Firmicutes were split into unique phyla by the Genome Tree Database (GTDB). (E) Multidimensional scaling (MDS) plots of Bray-Curtis distance of bacterial species composition. Ordination was driven by the top two phyla Firmcutes_A and Bacteroidota. Each dot corresponds to 1 donor while color indicates relative abundance of each phyla. (F) In total, 51 factors (Benjamini & Hochberg FDR < 0.05) were associated with inter- individual variation of the gut microbiome. The bar plots indicate the amount of interindividual variance explained by each factor for the species and genera level Bray- Curtis (BC) distance. Variables are ordered by the % variance explained in Kraken species. Colors of the bars correspond to the broad metadata category. The rectangles to the left indicate the statistical strength, as measured by FDR, of the association. For each variable, samples with NA values were excluded. For all these analyses, 1 sample per donor was used, when available Visit 1, if not Visit2. See also Figures S1, S2, S3, S4, and S5 and Tables S1, S2, S3, S4, S5, S6, and S7.

19

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A Phyla Actinobacteriota 10 Bacteroidota Desulfobacterota_A

(FDR) Firmicutes 10 5 Firmicutes_A log

− Firmicutes_C Verrucomicrobiota 0 Nonsignificant −2.5 0.0 2.5 5.0 Sex log2(Female/Male) B F Female enriched M 0.6 1.5 2 1 1.8 1.8 1.1 1.3 1.2 1.1 1.1 1.1 1.7 0.4 Phyla 0.2 Actinobacteriota 0.0 Bacteroidota Firmicutes Relative Abundance Relative Firmicutes_A Verrucomicrobiota

Tidjanibacter inops

Enorma massiliensis Rikenella microfusus

Alistipes sp002161445

Virgibacillus_G profundi Acutalibacter timonensis Bacteroides massiliensis

Barnesiella sp002159975

Bacteroides_A salanitronis Akkermansia sp001580195

Bacteroides_A sp002161565 Lawsonibacter sp002160305 C Male Male enriched 0.12 3 1 1.1 1.7 1.9 1.2 1.2 2.5 1.5 1.2 1.1 1.1 1.6 4.4 1.5 1.2 1.5 1.4 1.9 1.1 1.6 1.3 1 1.1 1.4 1.1 1.3 0.08 2 1 0.04 0 0.00 Relative Abundance Relative

Prevotella bivia Prevotella copri Prevotella jejuni Prevotella ihumii

Prevotella copri_A Prevotella albensis Prevotella denticola Prevotella stercorea Actinotalea carbonis

Prevotella multiformis

Holdemanella biformis

Enterococcus_B durans Prevotella sp002265625 Prevotella sp000834015 Prevotella sp002251295 Prevotella sp002251435 Tyzzerella sp000411335

Libanicoccus massiliensis Prevotella melaninogenica

Senegalimassilia anaerobia Allisonella histaminiformans Megasphaera sp000417505

Prevotellamassilia timonensis Acidaminococcus fermentans

Acidaminococcus massiliensis

Collinsella bouchesdurhonensis

D Prevotella Species E CRNFORCAT−PWY: F creatinine degradation I ** **** **** 40 60 54% 75 100 47% 30 40 50 80 20 CPM Richness 20 Prevalence 60 10 25 Creatinine (µmol/L)

0 0 0 40 Female Male Female Male Female Male Female Male

Figure 2. 130 bacteria were differentially present between males and females.

20

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 2. 130 bacteria were differentially present between males and females.

(A) Volcano plot of 547 abundant species, of which 130 were differentially abundant between males and females based on DESeq2 (FDR < 0.05). Each species is colored by its taxonomic phyla. (B, C) Bars indicate the mean relative abundance of the species in either sex. The left bars with a black border are for females, while those shaded with a gray border are for males. Error bars indicate the mean ± standard error. Absolute log2 fold change for each species is indicated above the bars. (B) 12 bacteria enriched in females with FDR < 0.05 and log2 fold change > 1. (C) 26 bacteria enriched in males with FDR < 0.05 and absolute log2 fold change > 1. (D) Boxplots show richness of Prevotella species in males and females. (E) Abundance measure in Counts Per Million (CPM) and prevalence of the MetaCyc pathway CRNFORCAT-PWY: creatinine degradation I in males and females. (F) Blood Creatinine levels in males and females. (D-F) ** p < 0.01, **** p < 0.0001 by Wilcoxon-Rank Sum. See also Figure S6 and Tables S9 and S11.

21

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A 50 C B. adolescentis B. angulatum 75 45 50 Phyla 25 40 Actinobacteriota* B. animalis B. bifidum 75 35 Bacteroidota* 50 25 8 Firmicutes 6 Firmicutes_A B. breve B. catenulatum 75 4 Proteobacteria* 50

Relative Abundance Relative 25

2 B. dentium B. gallinarum 0 75 50 20 30 40 50 60 70 25 Age B. infantis B. kashiwanohense B 75 50 25

B. kashiwanohense_A B. longum % Prevalence 75 50 25

B. merycicum B. minimum 75 50 25

B. pseudocatenulatum B. ruminantium 75 50 25

B. sp002742445 20 30 40 50 60 75 50 25 20 30 40 50 60 Age D Bifidobacterium Species

15

10

Richness 5

0 20s 30s 40s 50s 60s Age E Lactose/Galactose degradation Pathway 3.3 5.6 3.0 LACTOSECAT−PWY: lactose and galactose degradation I 2.7 5.5 PWY66−422: D−galactose degradation V (Leloir pathway) 2.4

log(Abundance) 20 30 40 50 60 70 20 30 40 50 60 70 Age F L−histidine

Degradation Synthesis Pathway 4.0 5.4 HISDEG−PWY: 3.5 L−histidine degradation I 5.3 PWY−5030: 3.0 L−histidine degradation III 5.2 HISTSYN−PWY: L−histidine biosynthesis

log(Abundance) 20 30 40 50 60 70 20 30 40 50 60 70 Age

Figure 3. Bacterial profiles are dynamic across decades of life.

22

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 3. Bacterial profiles are dynamic across decades of life.

(A) Abundance of the top 5 most abundant phyla across decades of life. Curves show 95% confidence intervals and were modeled with LOESS regression. Stars in the figure legend indicate those phyla statistically associated with age. (B) GraPhlAn taxonomic tree of the 508 species in the top 5 phyla found at significant prevalence and abundance across all donors. Relative abundance of taxa in red were decreased with age, while those in blue increased. Association between taxa relative abundance and age was determined by Spearman correlation (FDR < 0.05). The heatmap on the outer ring indicates the strength of the correlation (Spearman Rho). Genera with at least 5 species associated with age are labeled. (C) Lines indicate the percent prevalence of 17 Bifidobacterium species across the different decades. Only Bifidobacterium species significantly associated with age are shown. (D) Boxplots show richness of Bifidobacterium species in donors grouped by their age. (E) Abundance (CPM) of lactose/galactose degradation pathways decreased with age. Curves show 95% confidence intervals and were modeled with LOESS regression. (F) Abundance (CPM) of L-histidine degradation/synthesis pathways associated with age. Curves show 95% confidence intervals and were modeled with LOESS regression. See also Figure S7 and Tables S12 and S14.

23

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A B C Pathways Species Pathways Binary 0.4 1.00 **** **** **** 40 Stability 0.75 30 0.2 0.8 0.6 0.50 20 0.0

Frequency 0.4 Dissimilarity 0.25 10 −0.2 MDS component:2 0.00 0 0.3 0.5 0.7 −0.4 −0.2 0.0 0.2 0.4 Between Within Stability MDS component:1 donors a donor D E Firmicutes Firmicutes_A

0.8

0.6 Variable Estimate Std. Error z value p value FDR 0.4 Baseline Shannon 0.2600 0.0670 3.9 9.2e−05 0.014 R = − 0.15 , p = 0.0028 R = − 0.15 , p = 0.0024 Raw fruit/mon 0.0034 0.0012 2.9 4.2e−03 0.330 0.2 Triglycerides (mmol/L) −0.1200 0.0430 −2.7 7.5e−03 0.380 0 5 10 25 50 75 Wears corrective lenses 0.1300 0.0510 2.5 1.3e−02 0.500 Actinobacteriota Bacteroidota Salts food 0.1500 0.0650 2.4 1.8e−02 0.560 0.8 Sweet items/mon 0.0034 0.0015 2.2 2.8e−02 0.630 Stability over time Stability over BMI (kg/m^2) V1 −0.0160 0.0077 −2.1 3.4e−02 0.630 0.6 Sodium (mmol/L) −0.0360 0.0170 −2.1 3.6e−02 0.630 Total bilirubin 0.0093 0.0045 2.0 4.0e−02 0.630 0.4 R = − 0.11 , p = 0.022 R = 0.2 , p = 5.7e−05 0.2 0 10 20 30 0 25 50 75 Relative Abundance F Species Phyla (Number species) 0.8 0.8 Actinobacteriota (70) Bacteroidota (140) 0.6 0.6 1: Enterococcus_B durans Desulfobacterota (1) 2: Lactobacillus_D sakei Desulfobacterota_A (5) 0.4 0.4 3: Lactococcus lactis Firmicutes (62) Stability Stability 4: Lactococcus lactis_E 5 5 Firmicutes_A (239) 0.2 4 0.2 4 3 3 Firmicutes_C (13) 5: Bifidobacterium animalis 2 2 1 1 Proteobacteria (31) 0.0 0.0 −12.5 −10.0 −7.5 −5.0 −2.5 0.25 0.50 0.75 1.00 Verrucomicrobiota (6) log(mean relative abundance) Mean Prevalence G Pathways

0.75 0.75

0.50 0.50 Stability Stability

0.25 0.25

0.0 2.5 5.0 0.25 0.50 0.75 1.00 log(mean relative abundance) Mean Prevalence

Figure 4. Bacterial profiles are stable over the short−term, however the degree of stability is variable across donors.

24

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 4. Bacterial profiles are stable over the short-term, however the degree of stability is variable across donors.

(A) Boxplots of species Bray-Curtis distances (left), pathway Bray-Curtis distances (middle), and pathway binary Jaccard distances (right) between donors and within a donor over time (n= 413, time between samples= 17 ± 3.3 days), 1 = samples are completely different, 0 = samples are identical. **** p-value < 2.2e-16 by Wilcoxon-Rank Sum. (B) Histogram of 413 donors' species stability (1 - Bray-Curtis). (C) Multidimensional scaling (MDS) plot of Bray-Curtis distance of bacterial species composition. Each dot corresponds to 1 donor who had a second sample. Color indicates longitudinal stability of that donor (1 - within sample Bray-Curtis). (D) Scatter plots show the top phyla associated with stability. Each point corresponds to 1 donor with a longitudinal sample (n= 413). Colors correspond to those in F. Trend lines show 95% confidence intervals and were modeled with lm. Stats based on Spearman correlation. (E) Results of the series of GLMM fits aimed at identifying factors associated with intra- individual stability. Only factors with p.value < 0.05 are shown. (F) Scatter plots show the stability of individual species (1 - Bray-Curtis) by their mean baseline relative abundance and prevalence. Each point corresponds to a bacterial species and is colored by the species Phyla. (G) Scatter plots show the stability of individual pathways (1 - Bray-Curtis) by their mean baseline relative abundance and prevalence. Each point corresponds to a pathway. See also Tables S15, S16, S17, and S18.

25

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

26

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 5. Metabolic pathways associated with circulating regulatory T (Treg) and gd+ T cells.

(A) Right bars indicate the prevalence of the pathway in the MI donors and are grouped by superpathway designation. In the center, a circle represents each significant immunophenotype x pathway interaction. Fill color indicates whether the interaction was positive/negative and size the -log10 FDR value. (B) Scatter plots show the abundance of the Short-Chain Fatty Acid pathway P461-PWY versus counts of T regulatory cells and memory cells. (C) Scatter plots show the abundance of the pathway PWYG-321 mycolate biosynthesis versus counts of gd+T cells by age of the donor. (D) Scatter plot demonstrates the negative correlation between circulating gd+ T cells and age of the donor. (B-D) Stats based on Spearman correlation. See also Tables S19 and S21.

27

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A B CD4 CD8 DC Mono NKT Other Class_Type 6 Adaptive Count 4 Adaptive MFI 2 Innate Count 0 # Associations Innate MFI Rothia sp001808955 Pauljensenia sp000466265 Pauljensenia odontolyticus_B Actinobacteria Pauljensenia odontolyticus_A Pauljensenia cellulosilytica Bifidobacterium dentium

Absiella dolichum Bacilli Association Negative UBA3382 sp002159555 Prevotella sp900290275 Positive Prevotella sp002251385 Bacteroidia Prevotella disiens

Species −log (FDR) Bacteroides intestinalis 10 Bacteroides fluxus 2

UBA9475 sp002161235 3 Gemmiger formicilis Flavonifractor sp002161215 Clostridia Faecalicatena sp001487105 faecis Clostridium_M bolteae

Acidaminococcus timonensis Negativicutes 0 25 50 75 100 Prevalence cDC1 T cells B cells − naive B cells naive CD4+ T cells CD8+ T cells CD86 in cDC1 CD27 CD38 in B cells − CM CD8+ T cells CD21 in naive B cells CD21 in naive CD21 DR in CD16hi NK cells − CD38 in memory B cells CCR7 in CM CD8+ T cells CD16 in CD16hi monocytes CCR7 in naive CD8+ T cells CCR7 in naive HLA DR in CD8a+ CD16+ NK cells − CD69 in CD8a+ CD16+ NK cells HLA

B Pauljensenia cellulosilytica Pauljensenia odontolyticus_A Pauljensenia odontolyticus_B Pauljensenia sp000466265 Rothia sp001808955 **** **** **** **** **** log(Relative abundance) 4.5 −6 4.0 −7 −8 3.5 −9 (CD38 in B cells) −10 10

log 3.0 Never Ex Active Never Ex Active Never Ex Active Never Ex Active Never Ex Active Smoker

C CD21 in B cells CD21 in CD24hi memory B cells CD21 in marginal zone B cells CD21 in memory B cells

R = 0.064 , p = 0.026 R = 0.11 , p = 0.00027 R = 0.083 , p = 0.0042 R = 0.086 , p = 0.003 GLUDEG 4.00 GABA shunt −

3.75 I − PWY: 3.50 4 − (MFI value) R = 0.087 , p = 0.0027 R = 0.13 , p = 1.3e−05 R = 0.087 , p = 0.0026 R = 0.1 , p = 3e−04 degradation V aminobutanoate PWY 10 4.00 log −

3.75 5022:

3.50

0 2 4 0 2 4 0 2 4 0 2 4 log(Pathway CPM +1)

Figure 6. Bacterial species and metabolic pathways associated with circulating immunophenotypes.

28

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 6. Bacterial species and metabolic pathways associated with circulating immunophenotypes.

(A) Top bars indicate how many species were significantly associated with each immunophenotype. Colors indicate whether the immunophenotype relates to the adaptive or innate immune system and measures cells counts or MFI. Right bars indicate the prevalence of the species in the MI donors and are grouped by bacterial class. In the center, a circle represents each significant immunophenotype x species interaction. Fill color indicates whether the interaction was positive/negative and size the -log10 FDR value. (B) Boxplots indicate the expression of CD38 in B cells as measured by MFI by smoking status. Color corresponds to the log relative abundance of the species in the facet. **** p <= 0.0001 by Kruskal-Wallis test. (C) Scatter plots show the abundance of the pathways PWY-5022: 4-aminobutanoate degradation V and GLUDEG-I-PWY: GABA shunt pathway versus MFI for multiple B cell populations. Stats based on Spearman correlation. See also Tables S19, S20, and S22.

29

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Methods

EXPERIMENTAL MODEL AND SUBJECT DETAILS

The Milieu Intérieur cohort

The 1000 healthy donors of the Milieu Intérieur cohort were recruited by BioTrial in the suburban Rennes area (Ille-et-Vilaine, Bretagne, France). The cohort included 500 men and 500 women, and 200 individuals from each decade of life, between 20 and 69 years of age. Participants were selected based on stringent inclusion and exclusion criteria, detailed elsewhere (Thomas et al., 2015). Donor BMI was restricted to ³ 18.5 and £ 32 kg/m2. Briefly, they had no evidence of any severe/chronic/recurrent pathological conditions. Primary exclusion criteria were seropositivity for human immunodeficiency virus or hepatitis C virus, travel to (sub-) tropical countries within the previous 6 months, recent vaccine administration, and alcohol abuse. Subjects were also excluded if they took nasal, intestinal, or respiratory antibiotics or antiseptics any time in the 3 months preceding enrollment. Additionally, anyone following a doctor or dietician prescribed diet for medical reasons (e.g. calorie-controlled diet in overweight patients) and volunteers with food intolerance or allergy were excluded. To avoid the influence of hormonal fluctuations in women during the peri-menopausal phase, only pre- or post-menopausal women were included. To minimize the influence of population substructure, the study was restricted to individuals of self-reported Metropolitan French origin for three generations (i.e., with parents and grandparents born in continental France).

Demographic, environmental, dietary, and clinical variables

Multiple demographic, environmental, and clinical variables were collected for each of the donors in an electronic case report form (Thomas et al., 2015). For example, donors were asked about their family medical history, smoking habits, sleeping habits, and infection and vaccination history. Additionally, donors completed a food-frequency questionnaire (FFQ) administered by trained investigators and comprising 19 food groups [Table S5]. Participants estimated their “usual consumption” selecting from 6 intake frequencies ranging from “Twice per day or more” to “Never” (except for alcohol, which offered 5 intake frequencies ranging from “Every day” to “Never”). Investigators administering the FFQ invited participants to declare their “usual” diet, rather than focusing on their latest dietary consumption. The detailed FFQ is available in (Partula et al., 2019). For clinical chemistry, hematologic, and serologic assessments, 20mL of blood was collected from each donor and analyzed at the certified Laboratoire de biologie médicale, Centre Eugene Marquis (Rennes, France).

After manual curation and removal of variables that were (i) variable in less than 5% of participants (ii) missing in more than 25% of donors (iii) correlated with another variable Spearman Rho > -0.6 or < 0.6, 154 metadata variables were considered for future associations. In the case of correlated variables [Table S5,6], the variable with fewer missing values was prioritized and kept, while the other variable was removed. When the

30

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

pair had equivalent numbers of missing values, one from the pair was randomly selected. Notably, circulating levels of creatinine were so strongly correlated with sex (Spearman Rho = 0.72, p-value 3.5 e-115), this variable was excluded from the 154.

Ethics Statement

The clinical study was approved by the Comité de Protection des Personnes - Ouest 6 on June 13, 2012, and by the French Agence Nationale de Sécurité du Médicament on June 22, 2012, and was performed in accordance with the Declaration of Helsinki. The study was sponsored by the Institut Pasteur (Pasteur ID-RCB Number 2012-A00238-35) and conducted as a single-center study without any investigational product. The original protocol is registered under ClinicalTrials.gov (study number NCT01699893). Informed consent was obtained from the participants after the nature and possible consequences of the studies were explained. The samples and data used in this study were formally established as the Milieu Interieur biocollection (NCT03905993), with approvals by the Comité de Protection des Personnes – Sud Méditerranée and the Commission nationale de l'informatique et des libertés (CNIL) on April 11, 2018.

METHOD DETAILS

Fecal DNA extraction and shotgun metagenomic sequencing

Subjects were asked to produce stool samples at their home within 24 hours before the scheduled visits (V1, V2). Stool specimens were collected in a double-lined sealable bag containing a GENbag Anaer atmosphere generator (Aerocult, Biomerieux) to maintain anaerobic conditions. Upon reception at the clinical site, fresh samples were aliquoted into cryotubes and stored at -80 °C.

Stool aliquots were shipped to the CRO Diversigen for DNA extraction and shotgun metagenomic sequencing. At Diversigen, genomic DNA was extracted using PowerMag Soil DNA Isolation Kit (Qiagen - MO BIO Laboratories, Catalog No. 27100). Libraries were prepared using Beckman robotic workstations (Biomek FX and FXp models) in batches of 96 samples. DNA (10 ng to 500 ng) was sheared into fragments of approximately 300- 400 bp in a Covaris E210 system (96 well format, Covaris, Inc. Woburn, MA) followed by purification of the fragmented DNA using AMPure XP beads. DNA end repair, 3’- adenylation, ligation to Illumina multiplexing PE adaptors, and ligation-mediated PCR (LM-PCR) was all completed using automated processes. To amplify high GC and low AT rich regions at greater efficiency, KAPA HiFi polymerase (KAPA Biosystems Inc.) was used for PCR amplification (6-10 cycles). Fragment Analyzer (Advanced Analytical Technologies, Inc) electrophoresis system was used for library quantification and size estimation. Prepared libraries were then pooled and sequenced on an Illumina HiSeq 2500.

In the end, we obtained 21 trillion raw paired-end reads from 1,363 samples from 946 of the donors. On average per sample, there were 2.4 Gbp, 15.5 million reads with 358 bp

31

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

insert size. To process the reads, Illumina TruSeq adapters were trimmed with Trimmomatic v0.36 (Bolger et al., 2014); low quality and low complexity reads were removed with prinseq-lite 0.20.4 (Schmieder and Edwards, 2011); and Bowtie2 v2.1.0 (Langmead and Salzberg, 2012) was used to remove reads mapping to PhiX or the PacBio human genome (parameters specified in Fig S1C). After processing, there were on average 14.2 ± 2.8 million reads per sample [Fig S1D, E]. Out of an initial 1,000 recruited donors, 44 donors were excluded from this analysis due to a lack of consent for sharing their data outside of the MI consortium. An additional 10 donors were excluded because of technical issues in the extraction and the sequencing steps (e.g., due to low DNA extraction yield) resulting in a sample size of 946 donors.

QUANTIFICATION AND STATISTICAL ANALYSIS

Building the Kraken-GTDB database

To build the Kraken-GTDB database, first the following files were downloaded ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt and https://data.ace.uq.edu.au/public/gtdb/data/releases/release89/89.0/bac120_taxonomy_ r89.tsv (Parks et al., 2018) on 2019-06-25. These files were merged based on accession number and only those genomes present in both databases were considered, aka RefSeq genomes with a GTDB taxonomy. To avoid biasing the database towards those species with large numbers of genomes [Figure S2A], while balancing the added information provided by additional isolates per species, we selected 5 genomes per GTDB species to include in our database. Genomes were first ordered by their assembly quality, i.e. reference genome, representative genome, Complete Genome, Chromosome, Contig, Scaffold and then randomly selected. Based on this criteria, 23,505 genomes representing 13,446 unique bacterial species were downloaded and formatted into a Kraken2 database (Wood and Salzberg, 2014). To incorporate the GTDB taxonomy into the Kraken2 database, files mimicking the “NCBI-like” taxonomy files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.zip were created for names.dmp, complete_names.dmp, nodes.dmp, and accession2taxid. A matching Bracken database was then generated with bracken-build -k 35 -l 126 (Lu et al., 2017).

Metagenomic data analysis

First, putative reagent contaminants identified by species co-correlation analysis were filtered using Kraken2’s -unclassified-out option and a custom database of contaminant genomes [Figure S11, Table S23]. Using our custom GTDB-Kraken database, Kraken2 v2.0.6-beta (Wood and Salzberg, 2014) and Bracken v2.5 (Lu et al., 2017) were run on the 1,363 quality-controlled samples (parameters specified in Figure S1F) to generate bacterial profiles. With the exception of Figure 4, all analyzes were based on bacterial profiles from the Visit 1 samples. In the case no Visit 1 sample was available, the sample from Visit 2 was used.

32

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

To complement results from the MI donors, this pipeline was run on an additional 1,159 samples from 851 donors downloaded from ENA: PRJEB11532 (Zeevi et al., 2015). Notably, the reagent contaminants identified in the MI donors were nearly absent in the Zeevi samples, so no reagent filtering was performed. For these samples, metadata (Sex, Age, BMI) was obtained by emailing the authors. No timepoint was provided, so an average of the microbial profiles across samples of the same donor were used in further analyzes.

To go beyond taxa-based calls, HUMANn2 v0.11.2 (Franzosa et al., 2018) with default parameters including Uniref90 was run on all MI samples where the forward and reverse reads were concatenated into a single file. Raw output values were converted from rpks to cpms with humann2_renorm_table. Outputs of all samples were joined into a single merged table with the humann2_join_tables function. Using humann2_regroup_table, individual gene families were regrouped with multiple different databases including COG, GO, KEGG, and MetaCyc. Despite varying magnitudes of richness, at a high level, results were consistent across the different databases [Figure S12]. Ultimately, MetaCyc pathways (Caspi et al., 2018) were selected for the associations because of the additional steps implemented in HUMANn2 to check for completeness of the pathways.

Statistical Analysis

All associations and statistical tests were performed in R v3.6.1 (R Core Team, 2019), documented via rmarkdown documents (Allaire et al., 2019), and compiled with knitr (Xie, 2019). Within R, tables were manipulated with functions of the dplyr package (Wickham et al., 2019). The majority of figures were rendered with ggplot2 (Wickham, 2016) and arranged with cowplot (Wilke, 2019). Colors were selected with the help of RColorBrewer (Neuwirth, 2014) and viridis (Garnier, 2018). The cladogram in Figure 3B was generated with GraPhlAn (Asnicar et al., 2015). Correlation plots in Figure S4, S7C were generated with ggcorrplot (Kassambara, 2019). Supplemental tables were generated with Openxlsx (Walker, 2019). When comparing values between two or more groups, Wilcoxon-Rank Sum tests were used.

Bacterial α- and β-diversity measures including Shannon and Bray-Curtis were calculated using the R package vegan (Oksanen et al., 2019). To identify which of the 154 metadata variables were significantly associated with Bray-Curtis β-diversity, we used the adonis function in vegan to run permutational analysis of variance (PERMANOVA) tests with 999 permutations. To identify bacterial species and pathways differentially abundant between males and females, we used DESeq2 (Love et al., 2014). To identify bacterial species and pathways differentially prevalent between males and females, we used prop.test in R. Bacterial taxa and pathways associated with age were identified using Spearman correlations.

To determine the stability of a donor’s bacterial species and pathways between Visit 1 and Visit 2, the Bray-Curtis distance was calculated and subtracted from 1. Individuals with a stability of 1 had samples that were identical across timepoints, while a stability of 0 meant the samples were nothing alike. Similarly, 1 - the Jaccard index was used to

33

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

determine pathway stability based on presence/absence. To identify the phyla associated with stability, Spearman correlation coefficients were calculated. We then used generalized linear mixed models (GLMMs) fit with the R package glmmTMB (v.0.2.3)(Mollie E. Brooks et al., 2017) with a beta response to identify which metadata factors were associated with intra−individual stability. To calculate stability of individual features (species and pathways), we again applied the Bray-Curtis metric but this time compared the relative abundance of a single feature across all donors at Visit 1 and Visit 2.

Bacterial profiles (species and pathways) were associated with circulating immunophenotypes with multivariate linear models, akin to the procedure in (Patin et al., 2018). To reduce the multiple testing burden, of the 166 immunophenotypes originally published (Patin et al., 2018), we prioritized the 40 cell counts that had a mean count greater than 1,000 and their 46 associated MFI intensities [Figures S8,9, Tables S19,20]. Fifty of these immunophenotypes described adaptive immune system, while 36 were for innate cells. With multivariate linear modeling using the lmer function from lmerTest R package (Kuznetsova et al., 2017), correcting for age (spline, df = 3), sex, CMV status, smoking, genetics, and batch which were previously associated with immune profiles (Patin et al., 2018), 547 bacterial species (prevalence >5%, average abundance > 0.01%, TSS normalized) were associated with the 86 immunophenotypes for 47,042 tests. Similarly, 350 metabolic pathways (prevalence > 20%) were correlated with the 86 immunophenotypes with the same linear models for 29,050 tests. More specifically, the formula utilized for each combination was

immunophenotype ~ pathway (or species) + SNPs + ns(Age, df = 3) + Sex + Smoking + TABAC.T1 +AncestryPC1 + AncestryPC2 + HourOfSampling + (1 | DateOfSampling),

where immunophenotypes and species are log-transformed, if contains 0, then 1 unit is added before log-transformation; the pathway is the log(CPM + 0.6884654), where 0.6884654 is the smallest pathway CPM value.

For all statistical tests, p-values were corrected with the R function p.adjust using the Benjamini & Hochberg (FDR) method.

DATA AND SOFTWARE AVAILABILITY

Data Availability

Sequence data will have been deposited in the European Genome-Phenome Archive under accession code EGAXXX prior to acceptance. Donor metadata and code utilized in this paper will also be available.

34

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Supplemental Figures

A B

Species Number Genomes Staphylococcus aureus 8998 Salmonella enterica 8395 (% of all genomes) Streptococcus pneumoniae 8057 8926 species had 1 genomes (7.7%) 3443 species had 2−5 genomes (7.9%) Escherichia flexneri 7622 477 species had 6−10 genomes (3.1%) 457 species had 11−15 genomes (8.4%) Mycobacterium tuberculosis 5510 59 species had 51−100 genomes (3.5%) 55 species had 101−500 genomes (10%) Klebsiella pneumoniae 3834 12 species had 501−1000 genomes (7.8%) Acinetobacter baumannii 2718 17 species had 1000+ genomes (52%) Pseudomonas aeruginosa 2685 Escherichia coli 2102 Mycobacteroides abscessus 1564

C

D E 50

40

30 Sample Min Median Mean Max 20 starting 1922330 15077276 15493648 50490325 10 trimmomatic_passed 1848873 14756334 15134797 48819603

Number Pairs x 1 million Number Pairs prinseq_passed 1717551 13856573 14241724 45019531 0 no_phi_X 1715534 13855281 14236891 45014534 no_human 1703558 13573337 13869341 43854387 starting no_phi_X no_human

prinseq_passed

trimmomatic_passed

F

Figure S1. Quality control and Kraken analysis pipeline, Related to Figure 1.

35

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S1. Quality control and Kraken analysis pipeline, Related to Figure 1

(A) Pie chart indicating the distribution of genomes in RefSeq (as of 2019-06-25) to different species, based on GTDB taxonomy. 29 species represent over 60% of the reference genomes. (B) Table of the 10 species with the greatest number of genomes in RefSeq. (C) Steps taken to quality control shotgun metagenomic reads. Additional details can be found in the Methods Section. (D) Barplots show the number of paired-end reads remaining after each filtration step in (C). Each point corresponds to 1 sample (n = 1363). Colors correspond to steps in (C). (E) Summary table for the number of paired-end reads in (D). (F) Steps taken to run Kraken2 and Bracken on the quality-controlled reads including removal of putative reagent contaminants. Additional details can be found in the Methods Section. See also Table S2.

36

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A Level Metaphlan2−NCBI Kraken2−NCBI Kraken2−GTDB Phylum 15 43 52 Class 26 95 112 Order 46 211 287 Family 164 493 670 Genus 1098 2397 3205 Species 6278 18795 13446

B 100 **** C Phylum Class Order Family Genus Species 25 60 300 25 600 75 20 90 20 40 200 400 50 15 15 60 25 10 20 100 200 Taxa Richness Taxa 10 30 Percent Reads Mapped Percent 5 0 5

Program − Database Metaphlan2−NCBI Kraken2−NCBI Kraken2−GTDB

D Class Order Family Genus Species

Verrucomicrobia: 5 8 11 20 43 Verrucomicrobiota 3 8 14 26 43

10 60 175 1026 8169 Proteobacteria 4 70 175 1144 5357 Firmicutes: 7 16 58 467 3858 Firmicutes_C, 1 6 15 39 102 4 18 60 325 800 Firmicutes_A, 1 11 60 331 1588 Firmicutes Bacteroidetes: 7 10 45 268 1603 Bacteroidota 5 11 49 324 1350

Actinobacteria: 7 29 68 310 3651 NCBI Phyla: GTDB Phyla NCBI Phyla: Actinobacteriota 5 18 47 372 2591 0.0 2.5 5.0 7.5 10.0 0 20 40 60 0 50 100 150 0 300 600 900 12000 2000400060008000 Number of unique taxa

Taxonomy GTDB NCBI

E Metaphlan2−NCBI Kraken2−NCBI Kraken2−GTDB 100

75

50

25

Relative Abundance Relative 0 Sample

Actinobacteria Bacteroidetes Firmicutes Firmicutes_C Verrucomicrobia Actinobacteriota Bacteroidota Firmicutes_A Proteobacteria Verrucomicrobiota

Figure S2. Comparison of all Program−Database combinations tested, Related to Figure 1.

37

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S2. Comparison of all Program-Database combinations tested, Related to Figure 1

(A) Table indicates the richness across taxonomic levels of the different databases tested. The NCBI and Genome Tree Database (GTDB) databases used by Kraken2 include the same 23505 RefSeq genomes with taxonomies assigned either by NCBI or the GTDB respectively. GTDB taxonomic assignments often lead to collapsing or splitting of those taxa originally in NCBI; e.g.Firmicutes was split into Firmicutes, Firmicutes_A, Firmicutes_B, Firmicutes_C, etc. (B) Boxplots compare the percent of shotgun metagenomic reads mapped by the different program-database combinations. Percentage of mapped reads for Metaphlan2 was based on its estimated number of reads from the clade. **** p < 0.0001 by Wilcoxon-Rank Sum. (C) Boxplots compare the richness of taxa identified per sample with the different program-database combinations tested. All pairwise comparisons had p < 0.0001 by Wilcoxon-Rank Sum. (D) Bars indicate the number of unique taxa present in the NCBI (purple) and GTDB (red) databases for each of the top phyla across taxonomic levels. (E) Relative abundance of the top phyla across different program-database combinations. Each bar corresponds to 1 sample. Samples are ordered based on the relative abundance of Firmicutes_A found by Kraken2-GTDB.

38

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A Panel (Count) Basic physiological measurements (4) Medical history (14) Biometrics (6) OmniExpress SNP genotyping array (2) Concomitant drug treatment (2) Personal medical history at birth (3) Demographics (5) Sleep and drug habits, psychological problems (12) Family medical history (8) Smoking habits (2) Food and nutrition (43) Socio−professional information (5) Geographic origin (3) Vaccination history (3) Laboratory measure (42)

B BMI (kg/m^2) V1 Diastolic 1 (mmHg) Heart rate (bpm) Temperature (°C) Age C 60 80 120 40 Wears corrective lenses 40 60 90 90 30 40 60 60 20 Green eyes 20 20 30 30 10 Dark brown hair colour 0 0 0 0 0 Blue eyes 18 21 24 27 30 33 60 80 100 40 60 80 100 36.0 36.5 37.0 37.5 20 30 40 50 60 70 Blond hair colour Black hair colour Hours physical activity/wk Alcohol drinks/week Bread g per day Cheese/mon Cook,cured meats/mon 500 Takes oral contraception 200 400 300 300 300 Takes medication 150 300 200 200 200 100 200 50 100 100 100 100 Sex 0 0 0 0 0 Owns housing, landlord 0 20 40 0 10 20 30 100 200 300 400 0 20 40 60 0 20 40 60 Lives alone Other cancer Cooked fruit/mon Cooked veggies/mon Dairy products/mon Desserts/mon Dried pulses/mon Myocardial infarction 400 500 600 300 300 400 Diabetes 300 300 400 Colorectal cancer 200 200 200 200 100 100 100 100 200 Cerebral accident, haemorrhage 0 0 0 0 0 Breast cancer 0 20 40 60 0 20 40 60 0 20 40 60 0 20 40 60 0 20 40 60 Arterial hypertension Allergic disease Eats fast−food Eggs/mon Fish/mon Fried foods/mon Meat/mon Vitamins 500 Uses reduced sugar products 400 400 600 300 400 300 300 400 Uses reduced salt products 200 200 200 Takes probiotics 200 100 100 200 100 0 0 0 0 0 Takes multiminerals/multivitamins Salts food 0 5 10 15 20 0 20 40 60 0 20 40 60 0 20 40 60 0 20 40 60 Regular mealtimes during workdays Only eats in afternoon Pastries, sweet breads/mon Raw fruit/mon Raw veggies/mon Ready meals/mon Restaurants/mon Nibbles between meals 300 800 Minerals >12 times/mon 400 200 600 600 200 400 400 Essential oils/mon 200 100 100 200 200 Essential oils by oral route 0 0 0 0 0 Essential oils 0 20 40 60 0 20 40 60 0 20 40 60 0 20 40 60 0 5 10 15 20 Eats only in the morning Eats only in evening Sodas, sugary drinks/mon Starchy foods/mon Sweet items/mon Homes before age 13 Pop birthplace (x 100,000) Drinks spirits 400 Drinks beer/cider 600 300 600 400 Doesn't always eat lunch 300 200 400 300 400 200 200 Doesn't always eat dinner 200 100 100 200 100 Cooks <6 times/mon 0 0 0 0 0 0 20 40 60 0 20 40 60 0 20 40 60 2.5 5.0 7.5 0 5 10 15 20 Varicella Zoster Virus MMRV IgG Toxoplasmosis ToRC IgG Mumps MMRV IgG Pop hometown (x 100,000) ALAT (IU/L) Albumin (g/L) Alkaline phosph (IU/L) Basophils (G/L) Measles MMRV IgG 600 120 150 150 90 300 Influenza 400 100 100 60 200 Herpes Simplex Virus 2 IgG 200 50 50 30 100 Herpes Simplex Virus 1 IgG Count 0 0 0 0 0 Helicobacter pylori IgG 0 5 10 15 20 0 30 60 90 40 45 50 55 50 100 150 0.00 0.05 0.10 0.15 0.20 HBV anti−HBs IgG EBV anti−EBNA IgG EBV anti−EA IgG Bicarbonates (mmol/L) Calcium (mmol/L) Chloride (mmol/L) CRP (mg/L) Eosinophils (G/L) CMV 80 200 200 150 60 150 300 150 100 40 100 200 100 Visceral system surgery 50 20 50 100 50 Vascular system surgery 0 0 0 0 0 Tonsillectomy 22.5 25.0 27.5 30.0 32.5 35.0 2.2 2.3 2.4 2.5 2.6 2.7 95 100 105 110 0 5 10 15 20 0.00 0.25 0.50 0.75 1.00 Teeth extraction Rubella Reproductive system surgery Fasting glycaemia (mmol/L) Glomerular filt rate HDL (mmol/L) IgA (g/l) IgE (UI/ml) Orthopedic and maxillofacial surgery 100 600 150 75 90 150 Mumps 100 50 60 100 400 Measles 50 25 30 50 200 Hepatitis B 0 0 0 0 0 Flu 3 4 5 6 7 75 100 125 150 1.0 1.5 2.0 2.5 3.0 0.0 2.5 5.0 7.5 0 1000 2000 3000 Chicken pox Appendicectomy IgG (g/l) IgM (g/l) Leucocytes (G/L) Lymphocytes (G/L) MCHC (g/L) Any surgery 150 200 100 80 Breastfed 100 150 75 90 60 100 50 60 40 Born via C−section 50 50 25 30 20 Born premature 0 0 0 0 0 5 10 15 20 25 0 2 4 6 2.5 5.0 7.5 10.0 12.5 1 2 3 4 31 32 33 34 35 Sleep difficulities >3 nights last 2w Regular sleep difficulities Poor self−image, last 2w MCV (µm3) Monocytes (G/L) Phosphate (mmol/L) Platelets (G/L) Potassium (mmol/L) Major Negative Life event, last 12 month 200 80 100 Little or too much appetite, last 2w 100 150 60 75 100 100 40 50 Light while sleeping 50 50 20 25 50 Lack of pleasure in doing things, last 2w 0 0 0 0 0 Haschich 0_never 1_rarely, regularly 80 90 100 0.0 0.5 1.0 1.5 2.0 0.6 0.8 1.0 1.2 1.4 1.6 100 200 300 400 3.5 4.0 4.5 5.0 5.5 Feeling tired, little energy, last 2w Feeling sad, depressed, last 2w Sodium (mmol/L) Total bilirubin Total cholesterol (mmol/L) Total proteins (g/L) Triglycerides (mmol/L) Difficulty concentrating, last 2w 120 200 150 Exposed to 2nd hand smoke 200 150 90 150 100 60 100 100 Current smoker 100 50 30 50 50 0 0 0 0 0 Steady job Past, current noise exposure 135.0137.5140.0142.5145.0 0 10 20 30 40 4 6 8 10 60 70 80 90 0 1 2 3 4 Past, current exposure to toxic products Household income >2000 EUR/mon Urea (µmol/L) PC1 Ancestry PC2 Ancestry Hours of sleep per day Have higher education 500 300 400 300 75 200 300 200 Yellow fever vaccine 50 200 Whooping cough vaccine 25 100 100 100 0 0 0 0 Mumps vaccine 2.5 5.0 7.5 10.0 −0.04 −0.02 0.00 0.02 0.00 0.05 0.10 2.5 5.0 7.5 10.0 12.5 0 25 50 75 Value Percent of donors Figure S3. 154 variables were associated with bacterial profiles, Related to Figure 1.

39

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S3. 154 variables were associated with bacterial profiles, Related to Figure 1

(A) Distribution of variables across broad categories. (B) Distribution of 64 continuous variables across donors. Colors correspond to those in A. (C) Percent of donors for which each of the 90 binary variables was True. Colors correspond to those in A. 2w = 2 weeks, ALAT = Alanine Aminotransferase, CMV = Cytomegalovirus, CRP = C- Reactive Protein, EBV = Epstein-Barr virus, G/L = billion of cells per liter, HBV = Hepatitis B virus, HDL = high-density lipoproteins, Ig = Immunoglobulin, MCV = Mean corpuscular volume, MCHC = Mean corp hemoglobin concentration, mon = month, PC = Principal Component of genetics SNP array See also Tables S5 and S6.

40

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A 24 variables associated with Sex Bread g per day Drinks beer/cider Monocytes (G/L) Total bilirubin Albumin (g/L) Bicarbonates (mmol/L) Corr: Urea (µmol/L) Female 1 ALAT (IU/L) Triglycerides (mmol/L) Male −1 Fasting glycaemia (mmol/L) 1.0 Diastolic 1 (mmHg) Past, current noise exposure Takes oral contraception 0.5 Takes medication Sex Temperature (°C) 0.0 IgM (g/l) Uses reduced sugar products −0.5 Phosphate (mmol/L) HDL (mmol/L) Cooked fruit/mon −1.0 Chloride (mmol/L) Raw fruit/mon Cooked veggies/mon Cooks <6 times/mon

Sex IgM (g/l) ALAT (IU/L) Raw fruit/monHDL (mmol/L) Urea (µmol/L)AlbuminTotal bilirubin(g/L) MonocytesBread (G/L) g per day ChlorideCooked (mmol/L) fruit/monTemperatureTakes medication(°C) Drinks beer/cider Diastolic 1 (mmHg) CooksCooked <6 times/mon veggies/monPhosphate (mmol/L) TriglyceridesBicarbonates (mmol/L) (mmol/L) Takes oral contraception Fasting glycaemia (mmol/L) Uses reduced sugarPast, products current noise exposure B 44 variables associated with Age

Feeling tired, little energy, last 2w Steady job Total proteins (g/L) Albumin (g/L) Homes before age 13 Eats fast−food Doesn't always eat dinner Current smoker Haschich 0_never 1_rarely, regularly Owns housing, landlord Sodas, sugary drinks/mon Pastries, sweet breads/mon Fried foods/mon Cook,cured meats/mon Ready meals/mon Pop birthplace (x 100,000) Corr Glomerular filt rate Starchy foods/mon 1.0 Restaurants/mon Mumps vaccine Whooping cough vaccine 0.5 HBV anti−HBs IgG Hepatitis B Temperature (°C) Takes oral contraception 0.0 Chicken pox Have higher education Only eats in afternoon −0.5 Age Total cholesterol (mmol/L) Wears corrective lenses −1.0 Toxoplasmosis ToRC IgG Raw fruit/mon Visceral system surgery Fasting glycaemia (mmol/L) Diastolic 1 (mmHg) BMI (kg/m^2) V1 Urea (µmol/L) Any surgery Orthopedic and maxillofacial surgery Household income >2000 EUR/mon Measles MMRV IgG Herpes Simplex Virus 1 IgG Sodium (mmol/L) MCV (µm3)

Age

Hepatitis B Steady job MCV (µm3) Any surgery Chicken pox Urea (µmol/L) Raw fruit/mon Eats fastAlbumin−food (g/L) Mumps vaccine Current smoker Sodium (mmol/L) BMI (kg/m^2) V1 Temperature (°C)Restaurants/mon Fried foods/mon HBV anti−HBsStarchy IgGGlomerular foods/monReady filt meals/monrate Total proteins (g/L) Measles MMRV IgG Diastolic 1 (mmHg) OnlyHave eats higher in afternoon education Homes before age 13 Visceral systemWears surgery corrective lenses Cook,cured meats/monOwns housing, landlord Toxoplasmosis ToRC IgGTakes oral contraceptionWhooping cough vaccine Total cholesterol (mmol/L) Pop birthplace (xSodas, 100,000) sugaryDoesn't drinks/mon always eat dinner Herpes Simplex Virus 1 IgGFasting glycaemia (mmol/L) Pastries, sweet breads/mon

Feeling tired, little energy, last 2w HouseholdOrthopedic income and maxillofacial>2000 EUR/mon surgery Haschich 0_never 1_rarely, regularly

Figure S4. Metadata variables co−correlated with sex and age, Related to Figure 1.

Figure S4. Metadata variables co-correlated with sex and age, Related to Figure 1

(A) 24 variables were associated with sex with Spearman Rho > 0.2 or < -0.2. (B) 44 variables were associated with age with Spearman Rho > 0.2. or < -0.2. See also Table S6.

41

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A 600 B BMI Age 502 0.125 473 473 0.03 400 Gender 0.100 330 cohort Female 0.075 0.02 MI 200 Male 0.050 density 0.01 Zeevi Number people 0.025 0 0.000 0.00 MI Zeevi 20 30 40 50 20 30 40 50 60 70 Cohort value

C D E P C O F G S 30 **** 100 **** **** **** **** **** **** **** 60 25 300 600 25 60 75 20 100 40 20 200 400 50 15 40 15 20 50 Taxa Richness Taxa 10 200 25 20 100

Percent Unmapped Percent 10 Starting Reads x 1 mil 0 5 MI Zeevi MI Zeevi MI Zeevi MI Zeevi MI Zeevi MI Zeevi MI Zeevi MI Zeevi Cohort Cohort Cohort

F P C O F G S 4 4 2 R = 0.97 , p < 2.2e−16 R = 0.9 , p < 2.2e−16 R = 0.93 , p < 2.2e−16 R = 0.92 , p < 2.2e−16 2 R = 0.91 , p < 2.2e−16 R = 0.9 , p < 2.2e−16 2.5 2 2 2 0 0 0 0.0 0 0 −2 −2 −2 −2 −2 −2.5 −4 −4 −4 −4 −4 −5.0 −6

MI log(Relative Abundance) MI log(Relative −6 −6 −4 −2 0 2 4 −4 −2 0 2 4 −6 −4 −2 0 2 4 −6 −4 −2 0 2 4 −5.0 −2.5 0.0 2.5 −6 −4 −2 0 2 Zeevi et al. log(Relative Abudance) I 0.0125 G "R"^"2":0.0327, Pr(>F): 0.001 H

0.00 0.00 0.00 0.0100

−0.25 −0.25 2 0.0075 −0.25 Cohort MI −0.50 −0.50 0.0050 Zeevi MDS component:2 MDS component:2 −0.50 Adonis R MDS component:2 −0.25 0.00 0.25 −0.25 0.00 0.25 0.0025 −0.25 0.00 0.25 MDS component:1 MDS component:1 MDS component:1 0.0000 Firmicutes_A Bacteroidota Age BMI Gender cohort MI Zeevi 25 50 75 25 50 75 Variable

Figure S5. Microbial profiles of MI donors in comparison to those from Zeevi et al., Related to Figure 1.

42

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S5. Microbial profiles of MI donors in comparison to those from Zeevi et al., Related to Figure 1

(A) Bars indicate the number of females and males in both cohorts. (B) Density plots show distribution of Body Mass Index (BMI) and Age in both cohorts. (C-E) **** p < 0.0001 by Wilcoxon-Rank Sum. (C) Bar plots compare sequencing depth across cohorts. (D) Barplots show percent of reads unmapped after the Kraken2-GTDB pipeline. (E) Boxplots show richness across taxonomic levels. Each dot corresponds to 1 donor. (F) Association of relative abundances of each taxon across taxonomic levels with Spearman correlation. Each dot corresponds to 1 taxon. Only taxa present in >5% of either cohort were considered. Log relative abundances are shown. (G) Multidimensional scaling (MDS) ordination plot of all samples in both the MI and Zeevi cohorts. Each dot corresponds to 1 donor and is colored by the cohort. R2 indicates the amount of interindividual variation (calculated with Bray Curtis) explained by the cohort and was calculated with the PERMANOVA test adonis. (H) MDS ordination plot of samples in both the MI and Zeevi cohorts. Each dot corresponds to 1 donor and is colored by the relative abundance of the phyla. (I) Bars indicate the amount of interindividual variation (calculated with Bray Curtis) explained by each of the variables in each of the cohorts. See also Table S8.

43

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A

B Prevotella veroralis Prevotella histicola Prevotellamassilia timonensis Prevotella stercorea Prevotella sp900290275 Prevotella sp002265625 Prevotella sp002251435 Prevotella sp002251385 Prevotella sp002251365 Prevotella sp002251295 Prevotella sp000834015 Prevotella sp000758925 Prevotella seregens

Prevotella salivae Prevotella multisaccharivorax Prevotella multiformis Prevotella melaninogenica Prevotella lascolaii Prevotella jejuni Prevotella intermedia Prevotella ihumii Prevotella denticola Prevotella copri_A Prevotella copri Prevotella buccalis Prevotella bryantii Prevotella bivia Prevotella albensis MI Zeevi MI Zeevi 0 25 50 75 Cohort Cohort Prevalence

Log2FC M:F FDR value gender cohort <0.001 <0.1 F MI 1.5 <0.01 <1 M Zeevi 1.0 <0.05 0.5

Figure S6. Associations of taxa with sex, particularly Prevotella, were consistent in the Zeevi cohort, Related to Figure 2.

44

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S6. Associations of taxa with sex, particularly Prevotella, were consistent in the Zeevi cohort, Related to Figure 2

(A) Venn diagram comparing all the bacterial species statistically significantly (FDR < 0.05) associated with sex in the MI and Zeevi cohorts. (B) 28 Prevotella species were more abundant in males consistently across cohorts. Left panel shows the log2 fold change (Log2FC) of species in males versus females. Middle panel indicates the FDR value. Notably, even when a species was significant in only one cohort the direction was consistent in the other. Right panel shows the prevalence of each species. Color indicates the sex, while shape indicates the cohort. See also Tables S9 and S10.

45

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A 0.2 0.2

0.0 0.0 Male Rho Male Rho −0.2 −0.2

−0.2 0.0 0.2 −0.2 0.0 0.2 Female Rho Female Rho

Significant in Phyla Both Actinobacteriota Firmicutes_A Female Bacteroidota Firmicutes_C Male Desulfobacterota_A Proteobacteria Firmicutes Synergistota

Bifidobacterium merycicum B C Bifidobacterium angulatum Bifidobacterium ruminantium Bifidobacterium adolescentis Corr Bifidobacterium sp002742445 Bifidobacterium pseudocatenulatum 1.0 Bifidobacterium kashiwanohense_A Bifidobacterium kashiwanohense Bifidobacterium catenulatum 0.5 Bifidobacterium longum Bifidobacterium infantis Bifidobacterium breve Bifidobacterium bifidum 0.0 Bifidobacterium dentium Bifidobacterium minimum Collinsella sp002232035 −0.5 Collinsella aerofaciens_E Collinsella aerofaciens_A Collinsella aerofaciens_F −1.0 Collinsella aerofaciens Collinsella sp000763055

Bifidobacterium breve Collinsella aerofaciensBifidobacteriumBifidobacteriumBifidobacterium bifidum infantis longum CollinsellaCollinsellaCollinsella Collinsellasp000763055Collinsella aerofaciens_FBifidobacterium aerofaciens_ABifidobacterium aerofaciens_E sp002232035 minimum dentium BifidobacteriumBifidobacterium angulatum merycicum BifidobacteriumBifidobacterium catenulatumBifidobacterium adolescentis ruminantium Bifidobacterium sp002742445 Bifidobacterium kashiwanohense BifidobacteriumBifidobacterium kashiwanohense_A pseudocatenulatum D Bifidobacterium sp002742445 Bifidobacterium ruminantium Bifidobacterium pseudocatenulatum Bifidobacterium minimum Bifidobacterium merycicum Bifidobacterium longum Bifidobacterium kashiwanohense_A Bifidobacterium kashiwanohense Bifidobacterium infantis Bifidobacterium gallinarum Bifidobacterium dentium Bifidobacterium catenulatum Bifidobacterium breve Bifidobacterium bifidum Bifidobacterium animalis Bifidobacterium angulatum Bifidobacterium adolescentis MI Zeevi MI Zeevi 25 50 75 Cohort Cohort Prevalence

Spearman Rho FDR value Cohort 0.1 <0.001 <0.1 MI 0.0 <0.01 <1 Zeevi −0.1 <0.05 −0.2

Figure S7. Associations of taxa with age, particularly Bifidobacterium, were consistent across sexes and in the Zeevi cohort, Related to Figure 3.

46

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S7. Associations of taxa with age, particularly Bifidobacterium, were consistent across sexes and in the Zeevi cohort, Related to Figure 3

(A) Scatter plots compare the Spearman rho values of bacteria ~ age for males and females, 1 point = 1 species. In the left plot, points are colored based on whether the association was significant (FDR < 0.05) in males, females, or both. In the right plot, points are colored based on the phyla designation of the species. (B) Venn diagram comparing the bacterial species statistically (FDR < 0.05) associated with age in the MI and Zeevi cohorts. (C) Correlation plot of Bifidobacteria species (prevalence > 30%) and their co-correlated Collinsella species (Spearman Rho > 0.4) (D) 17 Bifidobacterium species were associated with age across cohorts. Left panel shows the association of species with age (Spearman Rho). Middle panel indicates the FDR value. Notably, even when a species was significant in only one cohort the trend was consistent in the other. Right panel shows the prevalence of each species. Color indicates the cohort. See also Tables S12 and S13.

47

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A Immune Class (Count) B (21) CD4 (14) CD8 (8) MAIT (3) Other (4)

B B CD21− CD27− B CD24hi memory B marginal zone B memory B naive B 150 150 300 200 150 12486 ± 7119 7934 ± 4910 1158 ± 851 1490 ± 1401 2113 ± 1483 150 6289 ± 3853 100 100 200 150 100 100 100 50 50 50 100 50 50 0 0 0 0 0 0 0 20 40 60 0 10 20 30 40 0 2 4 6 0 5 10 15 0 5 10 15 0 10 20 30

CD4+ T CM CD4+ T EM CD4+ T HLA−DR+ CM CD4+ T memory Treg naive CD4+ T 41196 ± 16924 23285 ± 9803 3311 ± 2624 150 1320 ± 823 1293 ± 658 14097 ± 9104 90 90 200 75 150 100 60 60 100 50 100 50 30 30 50 50 25 0 0 0 0 0 0 0 50 100 0 20 40 60 80 0 10 20 0 2 4 6 8 0 2 4 0 20 40 60

TFH Th1 Th17 like T Treg CCR6+ CD8+ T CD8b+ T 100 2846 ± 1447 300 3522 ± 3563 8457 ± 4350 80 2376 ± 1110 1538 ± 1176 150 16943 ± 8829 60 100 75 200 100 100 50 40 50 50 Count 25 100 20 50 0 0 0 0 0 0 0.0 2.5 5.0 7.5 10.0 0 5 10 15 20 0 10 20 30 0 2 4 6 8 0 2 4 6 8 0 20 40 60 80

CM CD8+ T EMRA CD8+ T naive CD8+ T CD4− CD8− MAIT CD8+ MAIT MAIT 125 300 200 200 6979 ± 3605 1646 ± 2407 200 5968 ± 5035 1015 ± 1020 1364 ± 1240 120 2655 ± 2172 100 150 150 75 200 150 80 100 100 100 50 100 40 25 50 50 50 0 0 0 0 0 0 0 10 20 30 0 5 10 15 20 0 10 20 30 40 50 0 2 4 6 8 0 4 8 12 0 5 10

CD8b− CD4− T conventional T gd+ T T 250 125 200 4177 ± 3293 34710 ± 14652 2592 ± 2603 63096 ± 24975 150 60 200 100 40 150 75 100 100 50 50 20 50 25 0 0 0 0 0 10 20 30 0 25 50 75 100 0 5 10 15 20 0 50 100 150 200 Cell Count (x 1,000)

C CD19 in B CD21 in B CD21 in CD24hi memory B CD21 in marginal zone B CD21 in memory B CD21 in naive B 125 5408 ± 1319 120 5262 ± 891 5401 ± 933 5960 ± 1137 5080 ± 863 100 5410 ± 1030 60 100 100 100 75 40 80 75 50 50 40 50 50 20 25 25 0 0 0 0 0 0 4 6 8 10 2 4 6 8 10 2.5 5.0 7.5 10.0 12.5 5 10 15 2 4 6 8 10 2.5 5.0 7.5 10.0

CD24 in B CD24 in marginal zone B CD24 in memory B CD24 in naive B Cells CD27 in B CD38 in B 100 1121 ± 389 150 2006 ± 735 1515 ± 533 762 ± 256 200 1316 ± 650 400 4666 ± 3516 90 90 75 100 150 300 60 50 60 100 200 50 25 30 30 50 100 0 0 0 0 0 0 1 2 3 0 2 4 6 8 0 1 2 3 4 1 2 0.0 2.5 5.0 7.5 10.0 0 20 40 60

Count CD38 in memory B IgD in B IgM in B CCR7 in CM CD4+ T CCR7 in EM CD4+ T CCR7 in naive CD4+ T 1137 ± 315 80 2327 ± 757 120 520 ± 249 852 ± 432 596 ± 233 2065 ± 718 60 90 100 100 60 80 40 60 40 50 50 20 20 40 30 0 0 0 0 0 0 0.5 1.0 1.5 2.0 1 2 3 4 5 0.0 0.5 1.0 1.5 2.0 0 1 2 3 0.0 0.5 1.0 1.5 2 4 6

ICOS in memory Treg CCR7 in CM CD8+ T CCR7 in EMRA CD8+ T CCR7 in naive CD8+ T 80 111 ± 26 100 592 ± 225 810 ± 259 80 2467 ± 872 100 75 60 60 50 40 40 50 25 20 20 0 0 0 0 0.1 0.2 0.3 0.0 0.5 1.0 0.5 1.0 0 2 4 Cell MFI (x 1,000)

Figure S8. 50 circulating adaptive immunophenotypes were associated with bacterial profiles, Related to Figures 5 and 6.

48

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S8. 50 circulating adaptive immunophenotypes were associated with bacterial profiles, Related to Figures 5 and 6

(A) Distribution of adaptive immunophenotypes across cell type. (B) Distribution of 28 circulating immune cell counts across donors. Colors correspond to those in A. Numbers are mean cell count ± standard deviation. (C) Distribution of 22 Mean Fluorescence Intensities (MFIs) across donors. Colors correspond to those in A. Numbers are mean value ± standard deviation. See also Tables 19 and 20.

49

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A Immune Class (Count) DC (3) Granulo (13) Mono (7) NKT (11) Other (2)

B cDC1 basophils eosinophils neutrophils CD14hi monocytes CD16hi monocytes 200 1220 ± 693 1446 ± 759 150 3500 ± 3306 142610 ± 67483 100 15152 ± 6325 2654 ± 1289 150 100 150 75 100 100 100 100 50 50 50 50 50 50 25 0 0 0 0 0 0 0 2 4 6 0 2 4 0 5 10 15 20 0 200 400 600 800 0 10 20 30 40 0.0 2.5 5.0 7.5 10.0 12.5

Count monocytes CD16hi NK CD8a+ CD16hi NK NK CD45+ total number of 125 17846 ± 7022 120 7872 ± 4849 3884 ± 2670 8780 ± 5037 315634 ± 114588 334541 ± 135423 100 100 150 100 90 150 75 75 75 100 60 100 50 50 50 50 25 30 25 25 50 0 0 0 0 0 0 0 20 40 60 0 10 20 30 0 5 10 15 0 10 20 30 40 0 500 1000 0 500 1000 Cell Count (x 1,000)

C CD86 in cDC1 HLA−DR in cDC1 CD16 in basophils CD16 in eosinophils CD16 in neutrophils CD203c in basophils 500 407 ± 350 100 4116 ± 1050 154 ± 50 755 ± 228 100 40041 ± 14545 1459 ± 1154 400 90 100 150 75 75 300 75 50 60 50 100 200 50 30 50 100 25 25 25 0 0 0 0 0 0 0 2 4 6 2 4 6 8 0.0 0.1 0.2 0.3 0.5 1.0 1.5 2.0 30 60 90 0.0 2.5 5.0 7.5

CD32 in basophils CD62L in eosinophils CD62L in neutrophils FceRI in basophils FceRI in eosinophils FceRI in neutrophils 150 100 7458 ± 2706 4903 ± 1218 11595 ± 3475 6751 ± 4077 150 618 ± 210 114 ± 50 75 100 75 75 100 100 50 50 50 50 50 25 25 50 25 0 0 0 0 0 0 0 5 10 15 20 5 10 0 5 10 15 20 0 5 10 15 20 25 0.5 1.0 1.5 0.0 0.1 0.2 0.3 0.4

Count CD16 in CD14hi monocytes CD16 in CD16hi monocytes CD86 in CD14hi monocytes HLA−DR in CD14hi monocytes CD16 in CD16hi NK CD69 in CD16hi NK 100 1197 ± 494 100 9236 ± 2875 150 5809 ± 1771 1261 ± 481 100 20889 ± 8988 550 ± 261 75 200 75 75 75 100 150 50 50 50 50 100 50 25 25 25 25 50 0 0 0 0 0 0 0 1 2 5 10 15 20 25 5 10 15 20 1 2 3 0 20 40 60 0 1 2 3

CD69 in CD8a+ CD16+ NK CD8a in CD16hi NK CD8a in CD8a+ CD16+ NK HLA−DR in CD16hi NK HLA−DR in CD8a+ CD16+ NK NKp46 in NK 526 ± 245 4207 ± 2483 7896 ± 2777 1000 ± 467 974 ± 448 2578 ± 925 200 60 60 100 100 150 100 40 40 100 50 50 50 20 50 20 0 0 0 0 0 0 0 1 2 3 0 5 10 15 20 10 20 0.0 0.5 1.0 1.5 2.0 0.0 0.5 1.0 1.5 2.0 2.5 5.0 7.5 Cell MFI (x 1,000) Figure S9. 36 circulating innate immunophenotypes were associated with bacterial profiles, Related to Figure 6.

Figure S9. 36 circulating innate immunophenotypes were associated with bacterial profiles, Related to Figures 6

(A) Distribution of innate immunophenotypes across cell type. (B) Distribution of 12 circulating immune cell counts across donors. Colors correspond to those in A. Numbers are mean cell count ± standard deviation. (C) Distribution of 24 Mean Fluorescence Intensities (MFIs) across donors. Colors correspond to those in A. Numbers are mean value ± standard deviation. See also Tables S19 and S20.

50

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S10. Bacteria species associated with circulating regulatory T (Treg) and gd+ T cells, Related to Figure 5

(A) Right bars indicate the prevalence of the species in the MI donors and are grouped by bacterial class. In the center, a circle represents each significant immunophenotype x species interaction. Fill color indicates whether the interaction was positive/negative and size the -log10 FDR value. (B) Boxplots show the number of gd+ T-cells in donors with or without the Terrisporobacter species. ** p < 0.01 by Wilcoxon-Rank Sum.

See also Table S21.

51

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Alistipes onderdonkii Alistipes finegoldii Alistipes_A indistinctus Alistipes_A sp900240235 Alistipes_A ihumii A Tidjanibacter inops Rikenella microfusus Bacteroides ovatus Bacteroides congonensis Bacteroides xylanisolvens Bacteroides caecimuris Bacteroides sp900066265 Bacteroides finegoldii Bacteroides thetaiotaomicron Bacteroides_B dorei distasonis Bacteroides fragilis_A Bacteroides fragilis Parabacteroides goldsteinii Bacteroides salyersiae Bacteroides uniformis Bacteroides fluxus Bacteroides faecichinchillae Bacteroides eggerthii Bacteroides clarus Bacteroides cutis Odoribacter laneus Bacteroides_B vulgatus Bacteroides_B sartorii Bacteroides_B massiliensis Prevotellamassilia sp002933955 Bacteroides nordii Bacteroides bouchesdurhonensis Bacteroides faecis Bacteroides stercoris Parabacteroides johnsonii Parabacteroides timonensis Parabacteroides sp900155425 Bacteroides caccae Bacteroides stercorirosoris Bacteroides oleiciplenus Bacteroides intestinalis Bacteroides timonensis Bacteroides cellulosilyticus Gabonibacter massiliensis Barnesiella viscericola Barnesiella sp002159975 Bacteroides gallinarum Coprobacter secundus Butyricimonas virosa Butyricimonas sp002161485 Odoribacter splanchnicus Alistipes timonensis Alistipes sp900083545 Alistipes senegalensis Alistipes shahii Alistipes obesi Alistipes sp002161445 Alistipes sp002159375 Alistipes sp900290115 Alistipes sp900021155 Barnesiella intestinihominis Alistipes putredinis Parasutterella excrementihominis Prevotella copri_A Prevotella copri Prevotella sp002251295 Prevotella buccalis Prevotella bivia Prevotella sp002251435 Prevotella sp002251365 Prevotella sp900290275 Prevotella sp002265625 Prevotella sp000834015 Prevotella stercorea Prevotellamassilia timonensis Prevotella sp002251385 Prevotella lascolaii Prevotella sp000479005 Prevotella disiens Butyricimonas sp900258545 Paraprevotella xylaniphila Paraprevotella clara Prevotella sp002933775 Bacteroides_A ilei Bacteroides_A barnesiae Bacteroides_A mediterraneensis Bacteroides_A plebeius Bacteroides_A coprocola Bacteroides_A coprophilus Prevotella sp001275135 Bacteroides_A sp002161565 Bacteroides_A salanitronis Bacteroides sp002160055 Bacteroides_A sp002161765 Bacteroides ndongoniae Bacteroides massiliensis Bacteroides togonis Parabacteroides sp002159645 CAG−462 sp900291465 UBA3382 sp002159555 Barnesiella sp002161555 Odoribacter massiliensis Sutterella wadsworthensis_B Desulfovibrio piger_A Flavonifractor sp002161085 Flavonifractor sp002159455 Flavonifractor sp002161215 Flavonifractor sp002159175 Lawsonibacter sp002161175 Flavonifractor sp002159265 Lawsonibacter sp002160305 An200 sp002160025 Pseudoflavonifractor capillosus Anaerofilum sp002160015 Butyricicoccus pullicaecorum Fournierella sp002161595 Anaerotignum lactatifermentans UBA9475 sp002161235 Bittarella massiliensis Fournierella massiliensis Provencibacterium massiliense UBA9475 sp002161675 An172 sp002160515 Marseille−P4683 sp900232885 Faecalicatena sp900120155 Massilimaliae massiliensis Anaeromassilibacillus sp001305115 Traorella massiliensis Faecalicatena sp000509105 Clostridium_M asparagiforme Phocea massiliensis Negativibacillus massiliensis Fournierella sp002159185 Emergencia timonensis Anaerotruncus sp900199635 Lawsonibacter asaccharolyticus Anaerotruncus colihominis Flavonifractor sp900199495 Flavonifractor sp000508885 Oscillibacter sp900066435 Neobitarella massiliensis Akkermansia muciniphila_C Akkermansia muciniphila_B Akkermansia muciniphila_A Akkermansia muciniphila Dorea sp900312975 Anaeromassilibacillus sp002159845 GCA−900066575 sp002160825 GCA−900066575 sp002160765 Enorma massiliensis Clostridium_A leptum Acutalibacter timonensis Merdibacter massiliensis Hungatella hathewayi Corr Clostridium_M sp000155435 Clostridium_Q symbiosum Clostridium_M clostridioforme Eisenbergiella massiliensis Hungatella effluvii Clostridium_M citroniae Clostridium_M bolteae Clostridium_M lavalense Bilophila wadsworthia Holdemania massiliensis Ruminococcus_F champanellensis 1.0 Lachnoanaerobaculum sp000296385 Criibacterium bergeronii Intestinimonas massiliensis Angelakisella massiliensis D5 sp900113995 Oscillibacter valericigenes Oscillibacter ruminantium Oscillibacter sp000403435 Oscillibacter sp900115635 Lawsonibacter sp000492175 Lawsonibacter sp000177015 Intestinimonas butyriciproducens Marseille−P3106 sp900169975 Anaerotruncus rubiinfantis OEMS01 sp900199405 Butyricicoccus_A porcorum Ruminiclostridium_E siraeum Ruminococcus_E bromii_A GCA−900066995 sp900291955 0.5 Massilioclostridium coli Eubacterium_R coprostanoligenes Coprobacillus cateniformis Christensenella minuta Sharpea azabuensis Catenibacterium mitsuokai Holdemanella biformis Phascolarctobacterium_A succinatutens Bacteroides_F pectinophilus Acetivibrio_A ethanolgignens Roseburia sp001940165 14−2 sp001940225 Butyrivibrio_A crossotus TF01−11 sp001414325 Lachnospira eligens_B Lachnospira eligens Lachnospira rogosae KLE1615 sp900066985 Clostridium_Q sp003024715 0.0 Roseburia inulinivorans Roseburia intestinalis Agathobacter faecis Coprococcus eutactus_A ER4 sp000765235 Ruminococcus_C callidus Coprococcus sp000154245 Faecalicatena lactaris Roseburia hominis Eubacterium_G ventriosum Agathobacter rectale Eubacterium_I ramulus_A Eubacterium_I ramulus Agathobaculum butyriciproducens Lactobacillus_G kefiri Gemmiger variabile Gemmiger formicilis Hungatella saccharolyticum −0.5 Clostridium_Q saccharolyticum Faecalibacterium prausnitzii_F Faecalibacterium prausnitzii_E Faecalibacterium prausnitzii_H Faecalibacterium prausnitzii_K Faecalibacterium prausnitzii_A Faecalibacterium sp002160915 Faecalibacterium sp002160895 Faecalibacterium prausnitzii_D Faecalibacterium prausnitzii_C Escherichia flexneri Escherichia coli_D Escherichia coli Clostridium disporicum Clostridium celatum Clostridium saudiense Turicibacter sanguinis Terrisporobacter othiniensis −1.0 Intestinibacter bartlettii Turicibacter sp001543345 Veillonella parvula_A Dialister invisus Ruminococcus_D bicirculans Ruthenibacterium lactatiformans Absiella innocuum Eisenbergiella tayi Faecalicatena torques Dorea scindens Blautia producta Sellimonas intestinalis Faecalicatena contorta_B Blautia_A hydrogenotrophica Absiella sp000163515 Blautia sp001304935 Gordonibacter pamelaeae Eggerthella lenta Gordonibacter urolithinfaciens Adlercreutzia equolifaciens Sellimonas sp002161525 Eubacterium_E sp900016875 Dorea phocaeense OEMR01 sp900199515 Dorea sp002160985 Dorea faecis Dorea sp000765215 Massiliomicrobiota sp002160815 Erysipelatoclostridium ramosum Erysipelatoclostridium spiroforme Streptococcus vestibularis Streptococcus sp000187445 Streptococcus sp001556435 Streptococcus salivarius Streptococcus parasanguinis_D Streptococcus parasanguinis_B Streptococcus parasanguinis Streptococcus sp001587175 Pauljensenia sp000466265 Pauljensenia sp000278725 Blautia_A wexlerae Blautia_A massiliensis Eubacterium_E hallii Anaerostipes hadrus Fusicatenibacter saccharivorans Erysipelatoclostridium saccharogumia Erysipelatoclostridium sp003024675 Erysipelatoclostridium sp000752095 Lactococcus lactis_E Lactococcus lactis Streptococcus thermophilus Dorea sp900240315 Blautia sp900120295 Faecalicatena gnavus Faecalicatena glycyrrhizinilyticum Tyzzerella sp000411335 Blautia sp003287895 Faecalicatena sp001487105 Clostridium_M sp001517625 Anaerotignum sp001304995 Lachnoclostridium_A edouardi Agathobaculum sp900291975 Clostridium_M clostridioforme_A Sellimonas sp002159995 Blautia sp002161285 Lactonifactor longoviformis Tyzzerella nexilis Blautia hansenii Holdemania sp900120005 Ruminococcus_E bromii_B Ruminococcus_E bromii Lactobacillus_B ruminis Bifidobacterium ruminantium Bifidobacterium adolescentis Bifidobacterium angulatum Bifidobacterium bifidum Bifidobacterium longum Bifidobacterium infantis Bifidobacterium breve Bifidobacterium sp002742445 Bifidobacterium pseudocatenulatum Bifidobacterium kashiwanohense_A Bifidobacterium kashiwanohense Bifidobacterium catenulatum Bifidobacterium dentium Anaerostipes hadrus_A Dorea longicatena_B Dorea formicigenerans Faecalicatena faecis Dorea longicatena Coprococcus_B comes Ruminococcus_A sp003011855 Eubacterium_E hallii_A Blautia_A sp900120195 Blautia_A obeum Romboutsia timonensis Faecalitalea cylindroides Christensenella massiliensis Collinsella sp002232035 Collinsella aerofaciens_E Collinsella aerofaciens_A Collinsella aerofaciens_F Collinsella aerofaciens Collinsella sp000763055 Collinsella bouchesdurhonensis Senegalimassilia anaerobia Arthrobacter_I sp900168135 Alteromonas australica Pseudomonas_E sp000282475 Mycolicibacterium phlei Sphingomonas elodea Mesorhizobium muleiense Acidovorax_E sp003053545 Clostridium_H botulinum_B Novosphingobium sp000281975 Burkholderia cepacia Phytoplasma sp000309485 Mesorhizobium sp000502715 Corynebacterium provencense Mesorhizobium sp000502875 Mesorhizobium australicum Mycobacterium gastri Pseudomonas_E asplenii Lactobacillus_C zeae Nocardia otitidiscaviarum Weissella viridescens Amycolatopsis vancoresmycina Mycolicibacterium neoaurum_A Corynebacterium urinapleomorphum Nocardia brevicatena Halomonas sp900155385 Desulfohalovibrio sp000422205 Limnohabitans sp003063265 Amycolatopsis thailandensis Mesorhizobium sp000503055 Paeniclostridium sordellii Pseudomonas_E sp002112885 Olsenella sp001457795 Desulfococcus multivorans Amycolatopsis alba Leucobacter sp000349545 Shewanella decolorationis Negativicoccus massiliensis Rhizorhabdus wittichii Pseudomonas_E tolaasii Pseudomonas_E aeruginosa Mycolicibacterium houstonense Flavonifractor plautii

Dorea faecis Prevotella bivia AlistipesAlistipes obesi shahii Blautia hansenii BlautiaDorea producta scindensDialister invisusEscherichia coli Prevotella copri Blautia_A obeum Tyzzerella nexilis Eggerthella lenta D5 sp900113995 Prevotella disiens Bacteroides cutis Alistipes_A ihumii Dorea longicatena Lactococcus lactis EisenbergiellaAbsiella innocuum tayi Escherichia coli_D RoseburiaER4 hominis sp000765235 Hungatella effluvii Bacteroides_APrevotella ilei lascolaiiPrevotellaPrevotellaAlistipes buccalis copri_A putredinis BacteroidesBacteroides faecisBacteroides nordiiBacteroides clarus fluxus Alistipes finegoldii Flavonifractor plautiiAmycolatopsis alba Blautia_A wexlerae Dorea phocaeense ClostridiumEscherichia celatum flexneri GemmigerGemmiger formicilis variabile AgathobacterLachnospira faecis14−2 eligenssp001940225 Bacteroides togonis ParaprevotellaPrevotella clara stercorea Alistipes timonensis Bacteroides caccae Odoribacter laneusBacteroides fragilisBacteroidesTidjanibacter ovatus inops Nocardia brevicatena Burkholderia cepacia FaecalicatenaDorea longicatena_B faecis Blautia sp002161285Blautia Blautiasp003287895Dorea Lactococcussp900120295 sp900240315AnaerostipesEubacterium_E lactis_E hadrus hallii DoreaDorea sp000765215 sp002160985Blautia sp001304935 Veillonella parvula_A Agathobacter rectale Lachnospira rogosae Sharpea azabuensis EnormaDorea massiliensis sp900312975Holdemania filiformisPhocea massiliensisAn172 sp002160515An200 sp002160025Desulfovibrio piger_A Butyricimonas virosa Bacteroides stercorisBacteroides eggerthiiBacteroides_BRikenella dorei microfususAlistipes onderdonkii Rhizorhabdus wittichii WeissellaLactobacillus_CMycobacterium viridescens zeae gastriSphingomonasCollinsella elodea aerofaciens Bifidobacterium breve Faecalicatena gnavusBlautia_A massiliensis AbsiellaSellimonas sp000163515Faecalicatena intestinalis torquesTuricibacterClostridium sanguinis saudiense Lactobacillus_GFaecalicatena kefiriCoprococcusRoseburia lactaris eutactus intestinalis Holdemanella biformis BilophilaClostridium_M wadsworthiaHungatella bolteaeClostridium_A hathewayi leptum Traorella massiliensisBittarella massiliensis Alistipes senegalensisBarnesiella viscericola Bacteroides_BBacteroides sartorii Bacteroides uniformisBacteroides fragilis_A finegoldii MycolicibacteriumAlteromonas australica phlei RomboutsiaEubacterium_ECoprococcus_B timonensisDoreaAnaerostipes formicigeneranshallii_A comes hadrus_ABifidobacteriumBifidobacteriumBifidobacterium infantis longum bifidum OEMR01 sp900199515 IntestinibacterClostridium bartlettii disporicum Eubacterium_I ramulusRoseburiaKLE1615Lachnospira TF01inulinivorans sp900066985−11 sp001414325 eligens_BChristensenellaMassilioclostridiumOEMS01 minuta sp900199405 coli Clostridium_M citroniae Emergencia timonensis AlistipesAlistipesAlistipesAlistipes sp900021155 sp900290115 Alistipessp002159375 sp002161445 Coprobactersp900083545CoprobacterBacteroidesBacteroides secundus Bacteroidesfastidiosus gallinarum timonensis intestinalis Bacteroides salyersiaeBacteroides caecimurisAlistipes_A indistinctus OlsenellaPaeniclostridium sp001457795 sordellii CollinsellaCollinsella sp000763055FaecalitaleaBlautia_A sp002232035 cylindroides sp900120195Bifidobacterium dentiumLactobacillus_BRuminococcus_E ruminis bromii Tyzzerella sp000411335 Streptococcus salivarius Butyrivibrio_ARoseburia sp001940165crossotus CriibacteriumClostridium_M bergeronii lavalenseMerdibacterAcutalibacter massiliensis timonensis UBA9475FournierellaUBA9475 sp002161675 massiliensis sp002161235 OdoribacterUBA3382CAGBacteroides−462 massiliensisBacteroides sp002159555 sp900291465Prevotella massiliensis ndongoniaeBacteroides_A sp001275135Prevotella Prevotellaplebeius sp002933775PrevotellaPrevotella Prevotellasp000479005Prevotella Prevotellasp002251385Prevotella sp000834015 Prevotellasp002265625 sp900290275 sp002251365 sp002251435 sp002251295 BacteroidesParabacteroides oleiciplenus Bacteroides_B merdae vulgatus Pseudomonas_E tolaasiiHalomonasNocardia sp900155385Pseudomonas_E otitidiscaviarum asplenii CollinsellaCollinsellaCollinsella aerofaciens_F aerofaciens_A aerofaciens_E Sellimonas sp002159995 Sellimonas sp002161525Faecalicatena contorta_BTuricibacter sp001543345 Eubacterium_IRuminococcus_C ramulus_ACoprococcus callidus eutactus_A Oscillibacter ruminantiumHoldemania massiliensis AkkermansiaNeobitarella muciniphila massiliensis Barnesiella sp002161555Bacteroides_ABacteroides_AParaprevotella coprocola barnesiae xylaniphila OdoribacterBarnesiella splanchnicus sp002159975Parabacteroides johnsonii ParabacteroidesBacteroides gordoniiBacteroidesAlistipes_A xylanisolvens congonensis sp900240235 ShewanellaLeucobacterDesulfococcus decolorationis sp000349545 multivorans Mesorhizobium muleiense BifidobacteriumRuminococcus_EHoldemania Lactonifactorangulatum sp900120005 bromii_B longoviformis PauljenseniaPauljensenia sp000278725 sp000466265Streptococcus vestibularisAdlercreutziaGordonibacter equolifaciens pamelaeae CatenibacteriumCoprobacillusRuminococcus_E mitsuokai AnaerotruncuscateniformisOscillibacter Oscillibacterbromii_AOscillibacter rubiinfantisAngelakisella sp900115635 sp000403435 valericigenes massiliensisEisenbergiellaClostridium_Q massiliensis symbiosum OscillibacterAnaerotruncus Fournierellasp900066435 colihominis Massilimaliaesp002159185 massiliensisFournierellaAnaerofilum sp002161595 sp002160015 BacteroidesBacteroides_ABacteroides_A sp002160055 salanitronis coprophilus GabonibacterBacteroidesBacteroides cellulosilyticusmassiliensis stercorirosoris ParabacteroidesParabacteroidesBacteroides goldsteinii distasonis sp900066265 Negativicoccus Amycolatopsismassiliensis thailandensisMesorhizobiumPhytoplasmaClostridium_HAcidovorax_E australicum sp000309485Senegalimassilia botulinum_Bsp003053545Christensenella anaerobia massiliensis AnaerotignumFaecalicatena sp001304995Streptococcus sp001487105 thermophilus Terrisporobacter othiniensis Hungatella saccharolyticumEubacterium_GCoprococcus ventriosum sp000154245 Bacteroides_F pectinophilusButyricicoccus_A porcorumIntestinimonas massiliensis AkkermansiaAkkermansiaAkkermansiaFlavonifractorFlavonifractor muciniphila_A muciniphila_B muciniphila_C sp000508885 sp900199495FaecalicatenaFaecalicatena sp000509105 sp900120155FlavonifractorFlavonifractorFlavonifractorFlavonifractorFlavonifractor sp002159265 sp002159175 sp002161215 sp002159455 sp002161085 Barnesiella intestinihominis ParabacteroidesBacteroides_B timonensisBacteroides massiliensis faecichinchillae Pseudomonas_E aeruginosaMesorhizobiumLimnohabitans sp000503055 sp003063265MesorhizobiumMesorhizobium sp000502875 sp000502715Arthrobacter_I sp900168135 Bifidobacterium catenulatumBifidobacteriumBifidobacterium adolescentis ruminantiumClostridium_M sp001517625 StreptococcusStreptococcusStreptococcusStreptococcus sp001587175 parasanguinis sp001556435 sp000187445 Blautia_A hydrogenotrophicaRuminococcus_D bicirculans Clostridium_Q sp003024715Acetivibrio_A ethanolgignensRuminiclostridium_ELawsonibacterLawsonibacter siraeum sp000177015 sp000492175 Clostridium_M sp000155435 AnaerotruncusNegativibacillusClostridium_M sp900199635 massiliensis asparagiformeButyricicoccusLawsonibacterLawsonibacter pullicaecorumSutterella sp002160305 sp002161175 wadsworthensis_BBacteroides_ABacteroides_A sp002161765 sp002161565ButyricimonasPrevotellamassilia sp900258545 timonensis Butyricimonas sp002161485 Bacteroides thetaiotaomicron Corynebacterium provencense Bifidobacterium sp002742445 AgathobaculumLachnoclostridium_A sp900291975 edouardi Eubacterium_EGordonibacter sp900016875 urolithinfaciens FaecalibacteriumFaecalibacteriumFaecalibacteriumFaecalibacteriumFaecalibacteriumFaecalibacteriumFaecalibacterium prausnitzii_C prausnitzii_D prausnitzii_A prausnitzii_K prausnitzii_H prausnitzii_E prausnitzii_F Clostridium_M clostridioforme Parabacteroides sp002159645 Parabacteroides sp900155425 MycolicibacteriumPseudomonas_E houstonenseDesulfohalovibrioMycolicibacterium Amycolatopsissp002112885 sp000422205 vancoresmycina neoaurum_A Pseudomonas_ECollinsella bouchesdurhonensissp000282475Ruminococcus_A sp003011855 StreptococcusStreptococcusErysipelatoclostridium parasanguinis_B parasanguinis_D ramosum FaecalibacteriumFaecalibacteriumClostridium_Q sp002160895 sp002160915 saccharolyticum GCA−900066995Marseille sp900291955−P3106 sp900169975 GCAGCA−900066575−900066575 sp002160765 sp002160825Lawsonibacter asaccharolyticusMarseilleProvencibacterium−P4683Anaerotignum sp900232885Pseudoflavonifractor massiliense lactatifermentans capillosus Novosphingobium sp000281975 Bifidobacterium kashiwanohense Clostridium_MFaecalicatena clostridioforme_A glycyrrhizinilyticumFusicatenibacter saccharivoransErysipelatoclostridiumMassiliomicrobiota spiroformesp002160815 Ruthenibacterium lactatiformans Intestinimonas butyriciproducens Bacteroides_A mediterraneensis Parasutterella excrementihominis BacteroidesPrevotellamassilia bouchesdurhonensis sp002933955 Agathobaculum butyriciproducens Eubacterium_R coprostanoligenes Ruminococcus_F champanellensis Corynebacterium urinapleomorphum BifidobacteriumBifidobacterium kashiwanohense_A pseudocatenulatum ErysipelatoclostridiumErysipelatoclostridium sp000752095 sp003024675 Anaeromassilibacillus sp002159845 Anaeromassilibacillus sp001305115 Erysipelatoclostridium saccharogumia Lachnoanaerobaculum sp000296385 Phascolarctobacterium_A succinatutens B

30

20

Total % Total 10

0 Sample

C 1.00

0.75

0.50

0.25 Relative Abundance Relative 0.00 Sample

Acidovorax_E sp003053545 Leucobacter sp000349545 Novosphingobium sp000281975 Alteromonas australica Limnohabitans sp003063265 Olsenella sp001457795 Amycolatopsis alba Mesorhizobium australicum Paeniclostridium sordellii Amycolatopsis thailandensis Mesorhizobium muleiense Phytoplasma sp000309485 Amycolatopsis vancoresmycina Mesorhizobium sp000502715 Pseudomonas_E aeruginosa Arthrobacter_I sp900168135 Mesorhizobium sp000502875 Pseudomonas_E asplenii Burkholderia cepacia Mesorhizobium sp000503055 Pseudomonas_E sp000282475 Clostridium_H botulinum_B Mycobacterium gastri Pseudomonas_E sp002112885 Corynebacterium provencense Mycolicibacterium houstonense Pseudomonas_E tolaasii Corynebacterium urinapleomorphum Mycolicibacterium neoaurum_A Rhizorhabdus wittichii Desulfococcus multivorans Mycolicibacterium phlei Shewanella decolorationis Desulfohalovibrio sp000422205 Negativicoccus massiliensis Sphingomonas elodea Halomonas sp900155385 Nocardia brevicatena Weissella viridescens Lactobacillus_C zeae Nocardia otitidiscaviarum

Figure S11. Putative reagent contaminants identified in the MI samples.

52

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S11. Putative reagent contaminants identified in the MI samples.

(A) Correlation plot of 393 species in the MI donors with prevalence > 30%. Boxed are the 41 species identified as putative reagent contaminants. (B) Barplots show the portion of all microbial reads mapping to contaminant organisms. (C) Barplots show how the contaminant reads are distributed across species consistently across samples. See also Table S23.

53

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A Richness across donors COG GO KEGG MetaCyc 125 80 200 75 100 60 75 150 50 40 50 100 25 Frequency 25 50 20 0 0 0 0 10 15 20 25 30 2 4 6 8 2 3 4 5 0.15 0.20 0.25 0.30 0.35 Richness (x1,000) B Variables associated with richness COG GO KEGG MetaCyc ** **

Read Count *** *** . GLUC ** ** ** ** ** **

BMI ***

ALT *** *** *** *** −0.1 0.0 0.1 0.2 −0.1 0.0 0.1 0.2 −0.1 0.0 0.1 0.2 −0.1 0.0 0.1 0.2

Spearman Rho −0.1 0.0 0.1 0.2

C Prevalence across donors COG GO KEGG MetaCyc

30000 39116 20552 3210 6000 3864 2878 1489 3000 4595 3178 1107 150 364 283 114

20000 4000 2000 100

10000 1000 50

Frequency 2000

0 0 0 0 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 % Prevalence D Relative abundance of top 20 functional units COG GO KEGG MetaCyc 1.00 1.00 1.00 1.00

0.75 0.75 0.75 0.75

0.50 0.50 0.50 0.50

0.25 0.25 0.25 0.25 Top 20 pathways Top 0.00 0.00 0.00 0.00 Sample E Longitudinal stability COG GO KEGG MetaCyc 1.00 0.75 0.50 0.25 Dissimilarity 0.00 Bray Jaccard Bray Jaccard Bray Jaccard Bray Jaccard

Between donors Within a donor

Figure S12. Microbial pathway profiles across donors.

54

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure S12. Microbial pathway profiles across donors

(A) Distribution of pathway/gene richness across donors and different functional databases. (B) Variables associated with functional richness based on Spearman. *** p-value < 0.001, ** p-value < 0.01. (C) Distribution of pathway prevalence across donors. The first line and value to the left indicates the number of pathways present in at least 5% of donors, the next 50%, and the final 100%. (D) Relative abundance plots of the top 20 functional units per database based on prevalence and mean abundance. (E) Boxplots of Bray-Curtis distances and binary Jaccard distances between donors and within a donor over time (n= 413, time between samples= 17 ± 3.3 days), 1 = samples are completely different, 0 = samples are identical. For all Wilcoxon-Rank Sum p-values were < 2.2e-16.

55

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

References

Allaire, J., Xie, Y., McPherson, J., Luraschi, J., Ushey, K., Atkins, A., Wickham, H., Cheng, J., Chang, W., and Iannone, R. (2019). rmarkdown: Dynamic Documents for R.

Allegretti, J.R., Mullish, B.H., Kelly, C., and Fischer, M. (2019). The evolution of the use of faecal microbiota transplantation and emerging therapeutic indications. Lancet 394, 420-431.

An, R., Wilms, E., Masclee, A.A.M., Smidt, H., Zoetendal, E.G., and Jonkers, D. (2018). Age- dependent changes in GI physiology and microbiota: time to reconsider? Gut 67, 2213-2222.

Ananthakrishnan, A.N., Luo, C., Yajnik, V., Khalili, H., Garber, J.J., Stevens, B.W., Cleland, T., and Xavier, R.J. (2017). Gut Microbiome Function Predicts Response to Anti-integrin Biologic Therapy in Inflammatory Bowel Diseases. Cell Host Microbe 21, 603-610 e603.

Arpaia, N., Campbell, C., Fan, X., Dikiy, S., van der Veeken, J., deRoos, P., Liu, H., Cross, J.R., Pfeffer, K., Coffer, P.J., et al. (2013). Metabolites produced by commensal bacteria promote peripheral regulatory T-cell generation. Nature 504, 451-455.

Asnicar, F., Weingart, G., Tickle, T.L., Huttenhower, C., and Segata, N. (2015). Compact graphical representation of phylogenetic data and metadata with GraPhlAn. PeerJ 3, e1029.

Backhed, F., Roswall, J., Peng, Y., Feng, Q., Jia, H., Kovatcheva-Datchary, P., Li, Y., Xia, Y., Xie, H., Zhong, H., et al. (2015). Dynamics and Stabilization of the Human Gut Microbiome during the First Year of Life. Cell Host Microbe 17, 690-703.

Belkaid, Y., and Harrison, O.J. (2017). Homeostatic Immunity and the Microbiota. Immunity 46, 562-576.

Biagi, E., Franceschi, C., Rampelli, S., Severgnini, M., Ostan, R., Turroni, S., Consolandi, C., Quercia, S., Scurti, M., Monti, D., et al. (2016). Gut Microbiota and Extreme Longevity. Curr Biol 26, 1480-1485.

Biagi, E., Nylund, L., Candela, M., Ostan, R., Bucci, L., Pini, E., Nikkila, J., Monti, D., Satokari, R., Franceschi, C., et al. (2010). Through ageing, and beyond: gut microbiota and inflammatory status in seniors and centenarians. PLoS One 5, e10667.

Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.

Browne, H.P., Forster, S.C., Anonye, B.O., Kumar, N., Neville, B.A., Stares, M.D., Goulding, D., and Lawley, T.D. (2016). Culturing of 'unculturable' human microbiota reveals novel taxa and extensive sporulation. Nature 533, 543-546.

Caspi, R., Billington, R., Fulcher, C.A., Keseler, I.M., Kothari, A., Krummenacker, M., Latendresse, M., Midford, P.E., Ong, Q., Ong, W.K., et al. (2018). The MetaCyc database of metabolic pathways and enzymes. Nucleic Acids Res 46, D633-D639.

Costello, E.K., Lauber, C.L., Hamady, M., Fierer, N., Gordon, J.I., and Knight, R. (2009). Bacterial community variation in human body habitats across space and time. Science 326, 1694-1697.

56

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

De Filippis, F., Pasolli, E., Tett, A., Tarallo, S., Naccarati, A., De Angelis, M., Neviani, E., Cocolin, L., Gobbetti, M., Segata, N., et al. (2019). Distinct Genetic and Functional Traits of Human Intestinal Prevotella copri Strains Are Associated with Different Habitual Diets. Cell Host Microbe 25, 444-453 e443.

Ding, T., and Schloss, P.D. (2014). Dynamics and associations of microbial community types across the human body. Nature 509, 357-360.

Dubin, K., Callahan, M.K., Ren, B., Khanin, R., Viale, A., Ling, L., No, D., Gobourne, A., Littmann, E., Huttenhower, C., et al. (2016). Intestinal microbiome analyses identify melanoma patients at risk for checkpoint-blockade-induced colitis. Nat Commun 7, 10391.

Faith, J.J., Guruge, J.L., Charbonneau, M., Subramanian, S., Seedorf, H., Goodman, A.L., Clemente, J.C., Knight, R., Heath, A.C., Leibel, R.L., et al. (2013). The long-term stability of the human gut microbiota. Science 341, 1237439.

Falony, G., Joossens, M., Vieira-Silva, S., Wang, J., Darzi, Y., Faust, K., Kurilshikov, A., Bonder, M.J., Valles-Colomer, M., Vandeputte, D., et al. (2016). Population-level analysis of gut microbiome variation. Science 352, 560-564.

Fehlner-Peach, H., Magnabosco, C., Raghavan, V., Scher, J.U., Tett, A., Cox, L.M., Gottsegen, C., Watters, A., Wiltshire-Gordon, J.D., Segata, N., et al. (2019). Distinct Polysaccharide Utilization Profiles of Human Intestinal Prevotella copri Isolates. Cell Host Microbe 26, 680-690 e685.

Fijan, S. (2014). Microorganisms with claimed probiotic properties: an overview of recent literature. Int J Environ Res Public Health 11, 4745-4767.

Flores, G.E., Caporaso, J.G., Henley, J.B., Rideout, J.R., Domogala, D., Chase, J., Leff, J.W., Vazquez-Baeza, Y., Gonzalez, A., Knight, R., et al. (2014). Temporal variability is a personalized feature of the human microbiome. Genome Biol 15, 531.

Forster, S.C., Kumar, N., Anonye, B.O., Almeida, A., Viciani, E., Stares, M.D., Dunn, M., Mkandawire, T.T., Zhu, A., Shao, Y., et al. (2019). A human gut bacterial genome and culture collection for improved metagenomic analyses. Nat Biotechnol 37, 186-192.

Frankel, A.E., Coughlin, L.A., Kim, J., Froehlich, T.W., Xie, Y., Frenkel, E.P., and Koh, A.Y. (2017). Metagenomic Shotgun Sequencing and Unbiased Metabolomic Profiling Identify Specific Human Gut Microbiota and Metabolites Associated with Immune Checkpoint Therapy Efficacy in Melanoma Patients. Neoplasia 19, 848-855.

Franzosa, E.A., Huang, K., Meadow, J.F., Gevers, D., Lemon, K.P., Bohannan, B.J., and Huttenhower, C. (2015). Identifying personal microbiomes using metagenomic codes. Proc Natl Acad Sci U S A 112, E2930-2938.

Franzosa, E.A., McIver, L.J., Rahnavard, G., Thompson, L.R., Schirmer, M., Weingart, G., Lipson, K.S., Knight, R., Caporaso, J.G., Segata, N., et al. (2018). Species-level functional profiling of metagenomes and metatranscriptomes. Nat Methods 15, 962-968.

57

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Furusawa, Y., Obata, Y., Fukuda, S., Endo, T.A., Nakato, G., Takahashi, D., Nakanishi, Y., Uetake, C., Kato, K., Kato, T., et al. (2013). Commensal microbe-derived butyrate induces the differentiation of colonic regulatory T cells. Nature 504, 446-450.

Gao, C., Ganesh, B.P., Shi, Z., Shah, R.R., Fultz, R., Major, A., Venable, S., Lugo, M., Hoch, K., Chen, X., et al. (2017). Gut Microbe-Mediated Suppression of Inflammation-Associated Colon Carcinogenesis by Luminal Histamine Production. Am J Pathol 187, 2323-2336.

Garnier, S. (2018). viridis: Default Color Maps from 'matplotlib'.

Gingold-Belfer, R., Levy, S., Layfer, O., Pakanaev, L., Niv, Y., Dickman, R., and Perets, T.T. (2019). Use of a Novel Probiotic Formulation to Alleviate Lactose Intolerance Symptoms-a Pilot Study. Probiotics Antimicrob Proteins.

Godinho-Silva, C., Cardoso, F., and Veiga-Fernandes, H. (2019). Neuro-Immune Cell Units: A New Paradigm in Physiology. Annu Rev Immunol 37, 19-46.

Goodrich, J.K., Davenport, E.R., Clark, A.G., and Ley, R.E. (2017). The Relationship Between the Human Genome and Microbiome Comes into View. Annu Rev Genet 51, 413-433.

Gopalakrishnan, V., Spencer, C.N., Nezi, L., Reuben, A., Andrews, M.C., Karpinets, T.V., Prieto, P.A., Vicente, D., Hoffman, K., Wei, S.C., et al. (2018). Gut microbiome modulates response to anti-PD-1 immunotherapy in melanoma patients. Science 359, 97-103.

Human Microbiome Project, C. (2012). Structure, function and diversity of the healthy human microbiome. Nature 486, 207-214.

Jackson, M.A., Verdi, S., Maxan, M.E., Shin, C.M., Zierer, J., Bowyer, R.C.E., Martin, T., Williams, F.M.K., Menni, C., Bell, J.T., et al. (2018). Gut microbiota associations with common diseases and prescription medications in a population-based cohort. Nat Commun 9, 2655.

Johnson, A.J., Vangay, P., Al-Ghalith, G.A., Hillmann, B.M., Ward, T.L., Shields-Cutler, R.R., Kim, A.D., Shmagel, A.K., Syed, A.N., Personalized Microbiome Class, S., et al. (2019). Daily Sampling Reveals Personalized Diet-Microbiome Associations in Humans. Cell Host Microbe 25, 789-802 e785.

Kang, D.W., Adams, J.B., Coleman, D.M., Pollard, E.L., Maldonado, J., McDonough-Means, S., Caporaso, J.G., and Krajmalnik-Brown, R. (2019). Long-term benefit of Microbiota Transfer Therapy on autism symptoms and gut microbiota. Sci Rep 9, 5821.

Kassambara, A. (2019). ggcorrplot: Visualization of a Correlation Matrix using 'ggplot2'.

Kato, K., Odamaki, T., Mitsuyama, E., Sugahara, H., Xiao, J.Z., and Osawa, R. (2017). Age- Related Changes in the Composition of Gut Bifidobacterium Species. Curr Microbiol 74, 987-995.

Kearney, S.M., Gibbons, S.M., Poyet, M., Gurry, T., Bullock, K., Allegretti, J.R., Clish, C.B., and Alm, E.J. (2018). Endospores and other lysis-resistant bacteria comprise a widely shared core community within the human microbiota. ISME J 12, 2403-2416.

Kuznetsova, A., Brockhoff, P.B., and Christensen, R.H.B. (2017). {lmerTest} Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software 82, 1-26.

58

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357-359.

Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550.

Lu, J., Breitwieser, F.P., Thielen, P., and Salzberg, S.L. (2017). Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science 3.

Maier, L., Pruteanu, M., Kuhn, M., Zeller, G., Telzerow, A., Anderson, E.E., Brochado, A.R., Fernandez, K.C., Dose, H., Mori, H., et al. (2018). Extensive impact of non-antibiotic drugs on human gut bacteria. Nature 555, 623-628.

Matson, V., Fessler, J., Bao, R., Chongsuwat, T., Zha, Y., Alegre, M.L., Luke, J.J., and Gajewski, T.F. (2018). The commensal microbiome is associated with anti-PD-1 efficacy in metastatic melanoma patients. Science 359, 104-108.

McCann, K.S. (2000). The diversity-stability debate. Nature 405, 228-233.

McIntyre, A.B.R., Ounit, R., Afshinnekoo, E., Prill, R.J., Henaff, E., Alexander, N., Minot, S.S., Danko, D., Foox, J., Ahsanuddin, S., et al. (2017). Comprehensive benchmarking and ensemble approaches for metagenomic classifiers. Genome Biol 18, 182.

Mehta, R.S., Abu-Ali, G.S., Drew, D.A., Lloyd-Price, J., Subramanian, A., Lochhead, P., Joshi, A.D., Ivey, K.L., Khalili, H., Brown, G.T., et al. (2018). Stability of the human faecal microbiome in a cohort of adult men. Nat Microbiol 3, 347-355.

Mollie E. Brooks, Kasper Kristensen, Koen J. {van Benthem}, Arni Magnusson, Casper W. Berg, Anders Nielsen, Hans J. Skaug, Martin Maechler, and Bolker, B.M. (2017). {glmmTMB} Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. In The R Journal, pp. 378-400.

Mueller, S., Saunier, K., Hanisch, C., Norin, E., Alm, L., Midtvedt, T., Cresci, A., Silvi, S., Orpianesi, C., Verdenelli, M.C., et al. (2006). Differences in fecal microbiota in different European study populations in relation to age, gender, and country: a cross-sectional study. Appl Environ Microbiol 72, 1027-1033.

Mullard, A. (2018). Oncologists tap the microbiome in bid to improve immunotherapy outcomes. Nat Rev Drug Discov 17, 153-155.

Nasko, D.J., Koren, S., Phillippy, A.M., and Treangen, T.J. (2018). RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification. Genome Biol 19, 165.

Neuwirth, E. (2014). RColorBrewer: ColorBrewer Palettes.

Oksanen, J., Blanchet, F.G., Friendly, M., Kindt, R., Legendre, P., McGlinn, D., Minchin, P.R., O'Hara, R.B., Simpson, G.L., Solymos, P., et al. (2019). vegan: Community Ecology Package.

59

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Parks, D.H., Chuvochina, M., Waite, D.W., Rinke, C., Skarshewski, A., Chaumeil, P.A., and Hugenholtz, P. (2018). A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996-1004.

Partula, V., Mondot, S., Torres, M.J., Kesse-Guyot, E., Deschasaux, M., Assmann, K., Latino- Martel, P., Buscail, C., Julia, C., Galan, P., et al. (2019). Associations between usual diet and gut microbiota composition: results from the Milieu Interieur cross-sectional study. Am J Clin Nutr 109, 1472-1483.

Patin, E., Hasan, M., Bergstedt, J., Rouilly, V., Libri, V., Urrutia, A., Alanio, C., Scepanovic, P., Hammer, C., Jonsson, F., et al. (2018). Natural variation in the parameters of innate immune cells is preferentially driven by genetic factors. Nat Immunol 19, 302-314.

Poyet, M., Groussin, M., Gibbons, S.M., Avila-Pacheco, J., Jiang, X., Kearney, S.M., Perrotta, A.R., Berdy, B., Zhao, S., Lieberman, T.D., et al. (2019). A library of human gut bacterial isolates paired with longitudinal multiomics data enables mechanistic microbiome research. Nat Med 25, 1442-1452.

R Core Team (2019). R: A Language and Environment for Statistical Computing (Vienna, Austria}: R Foundation for Statistical Computing).

Ridaura, V.K., Bouladoux, N., Claesen, J., Chen, Y.E., Byrd, A.L., Constantinides, M.G., Merrill, E.D., Tamoutounour, S., Fischbach, M.A., and Belkaid, Y. (2018). Contextual control of skin immunity and inflammation by Corynebacterium. J Exp Med 215, 785-799.

Rothschild, D., Weissbrod, O., Barkan, E., Kurilshikov, A., Korem, T., Zeevi, D., Costea, P.I., Godneva, A., Kalka, I.N., Bar, N., et al. (2018). Environment dominates over host genetics in shaping human gut microbiota. Nature 555, 210-215.

Routy, B., Le Chatelier, E., Derosa, L., Duong, C.P.M., Alou, M.T., Daillere, R., Fluckiger, A., Messaoudene, M., Rauber, C., Roberti, M.P., et al. (2018). Gut microbiome influences efficacy of PD-1-based immunotherapy against epithelial tumors. Science 359, 91-97.

Sanna, S., van Zuydam, N.R., Mahajan, A., Kurilshikov, A., Vich Vila, A., Vosa, U., Mujagic, Z., Masclee, A.A.M., Jonkers, D., Oosting, M., et al. (2019). Causal relationships among the gut microbiome, short-chain fatty acids and metabolic diseases. Nat Genet 51, 600-605.

Savaiano, D.A., Ritter, A.J., Klaenhammer, T.R., James, G.M., Longcore, A.T., Chandler, J.R., Walker, W.A., and Foyt, H.L. (2013). Improving lactose digestion and symptoms of lactose intolerance with a novel galacto-oligosaccharide (RP-G28): a randomized, double-blind clinical trial. Nutr J 12, 160.

Scepanovic, P., Hodel, F., Mondot, S., Partula, V., Byrd, A., Hammer, C., Alanio, C., Bergstedt, J., Patin, E., Touvier, M., et al. (2019). A comprehensive assessment of demographic, environmental, and host genetic associations with gut microbiome diversity in healthy individuals. Microbiome 7, 130.

Schindler, D.E., Hilborn, R., Chasco, B., Boatright, C.P., Quinn, T.P., Rogers, L.A., and Webster, M.S. (2010). Population diversity and the portfolio effect in an exploited species. Nature 465, 609- 612.

60

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Schmieder, R., and Edwards, R. (2011). Quality control and preprocessing of metagenomic datasets. Bioinformatics 27, 863-864.

Smith, P.M., Howitt, M.R., Panikov, N., Michaud, M., Gallini, C.A., Bohlooly, Y.M., Glickman, J.N., and Garrett, W.S. (2013). The microbial metabolites, short-chain fatty acids, regulate colonic Treg cell homeostasis. Science 341, 569-573.

Stewart, C.J., Ajami, N.J., O'Brien, J.L., Hutchinson, D.S., Smith, D.P., Wong, M.C., Ross, M.C., Lloyd, R.E., Doddapaneni, H., Metcalf, G.A., et al. (2018). Temporal development of the gut microbiome in early childhood from the TEDDY study. Nature 562, 583-588.

Thomas, S., Rouilly, V., Patin, E., Alanio, C., Dubois, A., Delval, C., Marquier, L.G., Fauchoux, N., Sayegrih, S., Vray, M., et al. (2015). The Milieu Interieur study - an integrative approach for study of human immunological variance. Clin Immunol 157, 277-293.

Tilman, D. (1999). The ecological consequences of changes in biodiversity: A search for general principles. Ecology 80, 1455-1474.

Truong, D.T., Franzosa, E.A., Tickle, T.L., Scholz, M., Weingart, G., Pasolli, E., Tett, A., Huttenhower, C., and Segata, N. (2015). MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat Methods 12, 902-903.

Turroni, F., Foroni, E., Pizzetti, P., Giubellini, V., Ribbera, A., Merusi, P., Cagnasso, P., Bizzarri, B., de'Angelis, G.L., Shanahan, F., et al. (2009). Exploring the diversity of the bifidobacterial population in the human intestinal tract. Appl Environ Microbiol 75, 1534-1545.

Walker, A. (2019). openxlsx: Read, Write and Edit XLSX Files.

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag New York).

Wickham, H., François, R., Henry, L., and Müller, K. (2019). dplyr: A Grammar of Data Manipulation.

Wilke, C.O. (2019). cowplot: Streamlined Plot Theme and Plot Annotations for 'ggplot2'.

Wood, D.E., and Salzberg, S.L. (2014). Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15, R46.

Xie, Y. (2019). knitr: A General-Purpose Package for Dynamic Report Generation in R.

Zeevi, D., Korem, T., Zmora, N., Israeli, D., Rothschild, D., Weinberger, A., Ben-Yacov, O., Lador, D., Avnit-Sagi, T., Lotan-Pompan, M., et al. (2015). Personalized Nutrition by Prediction of Glycemic Responses. Cell 163, 1079-1094.

Zhernakova, A., Kurilshikov, A., Bonder, M.J., Tigchelaar, E.F., Schirmer, M., Vatanen, T., Mujagic, Z., Vila, A.V., Falony, G., Vieira-Silva, S., et al. (2016). Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity. Science 352, 565-569.

Zmora, N., Zilberman-Schapira, G., Suez, J., Mor, U., Dori-Bachash, M., Bashiardes, S., Kotler, E., Zur, M., Regev-Lehavi, D., Brik, R.B., et al. (2018). Personalized Gut Mucosal Colonization

61

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.17.909853; this version posted January 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Resistance to Empiric Probiotics Is Associated with Unique Host and Microbiome Features. Cell 174, 1388-1405 e1321.

Zou, Y., Xue, W., Luo, G., Deng, Z., Qin, P., Guo, R., Sun, H., Xia, Y., Liang, S., Dai, Y., et al. (2019). 1,520 reference genomes from cultivated human gut bacteria enable functional microbiome analyses. Nat Biotechnol 37, 179-185.

62