bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Post-weaning shifts in microbiome composition and metabolism revealed by over 25,000 pig gut metagenome assembled genomes

Daniela Gaio1, Matthew Z. DeMaere1, Kay Anantanawat1, Toni A. Chapman2, Steven P. Djordjevic1, Aaron E. Darling1*

1The ithree institute, University of Technology Sydney, Sydney, NSW, Australia

2NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, NSW, Australia

*Corresponding author: Daniela Gaio, [email protected]

Keywords: shotgun metagenomics; pig; post-weaning; gut microbiome; co-assembly; binning; MAG; time series; CAZy; carbohydrate active enzymes

ABSTRACT

Using a previously described metagenomics dataset of 27 billion reads, we reconstructed over 50,000 metagenome-assembled genomes (MAGs) of organisms resident in the porcine gut, 46.5% of which were classified as >70% complete with a <10% contamination rate, and 24.4% were nearly complete genomes. Here we describe the generation and analysis of those MAGs using time- series samples. The gut microbial communities of piglets appear to follow a highly structured developmental program in the weeks following weaning, and this development is robust to treatments including an intramuscular antibiotic treatment and two probiotic treatments. The high resolution we obtained allowed us to identify specific taxonomic “signatures” that characterize the microbiome development immediately after weaning. Additionally, we characterized the carbohydrate repertoire of the organisms resident in the porcine gut, identifying 294 carbohydrate active enzymes. We tracked the shifts in abundance of these enzymes across time, and identified the and higher-level taxonomic groups carrying each of these enzymes in their MAGs, raising the possibility of modifying the piglet microbiome through the tailored provision of carbohydrate substrates. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

INTRODUCTION

Advances in sequencing technology and computational analysis enable the study of whole microbial communities from a given environment in a high-throughput manner, bypassing the need to cultivate individual organisms. Microbial community profiling is generally approached in a targeted (e.g: 16S rRNA) or untargeted (shotgun) manner. Only a tiny fraction of known bacterial species are represented in genome sequence databases, and in particular, difficult-to-culture and non-pathogenic organisms remain greatly underrepresented 1–3. On the contrary, taxa of interest for their pathogenicity or potential in the biotech sector are more often sequenced and therefore tend to be well represented in public databases. Although an increasing diversity of reference genomes is becoming available, reference- based methods are not suited to discover new genomes, as they depend, directly or indirectly, on existing databases 4. Moreover, proof of the nature of functional redundancy in the microbiome 5, and recent findings highlighting the striking functional implications of small genomic differences between organisms of the same species 6, should make us ever more aware that important compositional and functional features may be masked completely when the view of the microbiome is restricted to a small gene fragment, as in the 16S rRNA amplicon profiling technique. Shotgun metagenomic sequencing, on the other hand, allows a much more complete view of the microbial community. The technique uses assembly and binning for the reconstruction of genomes from metagenomic reads, but it comes with many technical challenges. Among the main challenges are: 1. the detection of less prevalent taxa within the sample 7, and 2. the high sequence similarity among strains of the same species, making assembly a particularly challenging task 8,9. Previous work in computational metagenomics has established that repeated sampling from an environment or subject can facilitate the reconstruction of genomes from species 10,11 and strains 12 in the microbial community. Two major steps are involved in the process: 1. assembly and 2. binning. The joint assembly of samples from an individual host is called co-assembly, while binning consists in the grouping of co-assemblies into MAGs in the process of differential coverage binning 13. The abundance per sample is then inferred from coverage depth and/or k-mer frequencies 14. The rationale behind this type of binning method is based on the observation that the abundance of genetic material from the same organism changes in one subject or environment over time in a highly correlated manner 10,11,13,15. Although over half of all species in the gut are represented by single strains 16, the fact that strains persist within a host for long periods of time (i.e. decades) 17 can be used as leverage (or: a strengthening point) to the scope of resolving genomes. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

The present work builds upon a large collection of post-weaning porcine gut samples for which we previously described the microbial community composition using an assembly-free phylogenetic profiling technique. We reconstruct genomes from this large sample collection and analyse how those genomes change in abundance across time and treatment cohorts. To do so we created co- assemblies for each individual host, pooling all the time point samples available from each subject, thereby increasing the power to reconstruct genomes from low abundance microbes. Here we describe the associations we found between specific genomes and the post-weaning aging process in piglets, as well as treatment effects and carbohydrate metabolism during the post-weaning process.

METHODS

Metagenomic samples Below we briefly summarize the origin of the samples, but we refer to our previous study 18 for a thorough description of the animal trial and the sample processing workflow 18. Subjects were post-weaning piglets (n=126) from which faecal samples were collected between 1 and 10 times during a 40 days period. Piglets were aged 22.5±2.5 days at the start of the trial. The piglets were distributed over 6 treatment cohorts: a placebo group (Control n=29); two probiotic groups (D-Scour n=18; ColiGuard n=18); one antibiotic group (Neomycin n=24) and two antibiotic-then-probiotic treatment groups (Neomycin+D-Scour n=18; Neomycin+ColiGuard n=18). Additionally, 42 faecal samples derived from the piglets’ mothers, 18 samples derived from three distinct positive controls and 20 negative controls were included. All piglets were sampled weekly, while a subset of the piglets (8 per cohort; n=48) was sampled twice weekly. In total, mothers were sampled once, while piglets were sampled between 1 and 10 times (median: 6.0; mean: 6.11). (Supplementary Figure 1) As previously described 18, samples were homogenised immediately after collection and stored at -80oC until thawed for further processing. Extraction of DNA was performed with the PowerMicrobiome DNA/RNA EP kit (Qiagen) and libraries were prepared from genomic DNA using the Hackflex method 19. Libraries were normalised, pooled and sequenced on three Illumina NovaSeq S4 flow cells. The data was deposited to the NCBI Short Read Archive under project PRJNA526405. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Sequence data processing A total of 911 samples were sequenced generating a mean of 7,581,656 (median=3,936,038) paired- end reads per sample for a total of 27.2 billion paired-end reads. bbduk.sh 20 (v38.22) was used for adapter trimming (parameters: k=23 hdist=1 tpe tbo mink=11), PhiX DNA removal (parameters: k=31 hdist=1) and quality filtering (parameters: ftm=0 qtrim=r trimq=20). Quality assessment was performed with FASTQC 21 (v0.11.18) and MULTIQC 22 (v1.8).

Pooled assembly and binning Adapter trimmed and quality filtered paired-end reads from all samples were grouped by subject and assembled using MEGAHIT 23 (v1.1.3) (parameters: --min-contig-len=5 --prune-level=3 --max- tip-len=280), yielding a combined total of 8,389,418 contigs. Reads were mapped back to assembly contigs using BWA MEM 24 (v0.7.17-r1188). Format conversion to bam format was performed with Samtools 25 (v1.9). Metagenome binning of the co-assemblies was performed for each subject with MetaBAT2 26 (v2.12.1) (parameters: --minContig 2500).

MAGs quality assignment Quality analysis of metagenome-assembled genomes (MAGs) was performed with CheckM 27 (v1.0.13) following the lineage specific workflow (lineage_set).

MAGs taxonomic assignment Taxonomic clustering of MAGs was performed with Kraken2 28,29 (v2.0.8) using, in parallel: a prebuilt 8Gb database (MiniKraken DB_8GB) constructed from bacterial, viral, archaeal genomes from RefSeq version Oct. 18, 2017, and the genome database GTDB 30 (release 89.0). All MAGs were assigned a taxon at the species level. Higher taxonomic levels of the GTDB were retrieved from the latest release (release 89.0) of the bacterial and archaeal database publicly available at the Genome Taxonomy Database website (https://gtdb.ecogenomic.org/). Phyla composition profiles were obtained from presence/absence counts of MAGs, ignoring their relative abundances in each sample, using either GTDB- or CheckM- MAGs taxonomic assignments. We refer to these as GTDB clustered MAGs and CheckM clustered MAGs. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

MAGs dereplication Because MAGs were constructed independently for each host, highly similar MAGs are expected to be present in the data, representing the same species or strain recovered from different hosts. To group the MAGs into collections that represent MAGs of the same species or strain we carried out bin dereplication using dRep (v2.3.2) 31. All MAGs derived from pig samples were dereplicated with dRep 31 (parameters: --checkM_method=lineage_wf --length=500000). All other parameters were set to default. The clusters of similar bins produced by dRep at 95% identity (primary clusters) and at 99% identity (secondary clusters) were used for further analysis as described below. We refer to these as 95% ANI MAG clusters and 99% ANI MAG clusters.

Comparison of dRep clusters with GTDB- taxonomic assignments In order to assess the extent of agreement between the ANI clustering with dRep, and the taxonomic clustering with the GTDB, ANI MAG clusters (43.79%; n=22403) were compared to GTDB clustered MAGs. A 100 percent agreement score was given when a single GTDB taxonomic assignment corresponded to a single secondary cluster (99% ANI).

Data analysis Analyses of MAGs were performed for the following sets of MAGs: 1) nearly complete genomes (≥90% completeness and ≤5% contamination) as estimated and taxonomically categorized by CheckM analysis (n=12.4k); 2) length filtered and ANI clustered MAGs from dRep analysis (n=22.4k); 3) all (unfiltered) MAGs taxonomically categorized by GTDB (n=51.2k). Non-metric multidimensional scaling (NMDS) and network analysis were performed with PhyloSeq 32 using the median sequencing depth to normalise samples. Taxa were filtered for abundance and prevalence (criteria for inclusion of MAGs: must be present at least 10 times in at least 30 samples). Bray-Curtis was used as a distance measure. In addition, we ran principal component analysis with data normalised by proportions and transformed with the centered log ratio. In the latter case, the data underwent the following transformations: 1. library size normalisation by proportions; 2. sum of counts from MAGs falling under the same MAG (taxonomic or cluster) assignment; 3. mean by sampling time point and cohort; 4. centered log ratio transformation. The diversity of samples was assessed with PhyloSeq (v1.28.0) using GTDB clustered MAGs. Samples were rarefied to obtain an equal amount of MAGs by running the rarefy_even_depth bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

function. Sample diversity scores from different time points were compared by t-test and significance values were adjusted with the Bonferroni method. Age, breed, co-housing and piglet weight were tested for correlation with microbiome composition. Samples were rarefied to obtain an equal amount of MAGs. For age, breed, and co- housing, taxa were filtered for abundance and prevalence (criteria for inclusion of MAGs: must be present at least 10 times in at least 50% of the samples), and principal component analysis (PCA) was performed with PhyloSeq. For piglet weight, all taxa, independent of prevalence and abundance, were included in the analysis. As breed and age were confounded factors we additionally performed PCA on subsets of equal age or breed. The resulting eigenvectors were tested to assess the significance of the correlations. For the categorical variables “age”, “breed” and “co-housing”, Dunn tests were performed. For the continuous variable “weight”, the Spearman’s rank test was performed. Significance values were adjusted by Bonferroni correction for multiple testing.

Differential abundance analysis SIAMCAT 33 was employed to determine differentially abundant GTDB clustered MAGs between groups. As expected, the data was normalised by proportions prior to analysis with SIAMCAT. Unsupervised abundance and prevalence filtering is run prior to the association testing. The abundance was normalized, and all GTDB clustered MAGs underwent analysis with SIAMCAT. Association of species with groups were found by running the SIAMCAT check.associations function, which finds associations of single species with groups using the nonparametric Wilcoxon test. The p values were corrected for false discovery rate.

Gene function analysis Protein prediction from MAGs was performed with Prodigal 34 (v2.6.3). The predicted proteins were clustered with CD-HIT 35 (v4.6) using 90% and 100% identity. Predicted proteins were mapped against a custom database of the UniRef90 database (release June 2020) using DIAMOND 36 (v0.9.31). Before mapping, the database was filtered to contain only error corrected sequences matching the uniref90_ec_filtered list from HUMAnN2 37 (/utility_mapping) (v2.8.1). To run the filtering, SeqTK 38 (v1.3-r106) was used. Additionally, predicted proteins were mapped against the carbohydrate active enzyme (CAZy) database 39 with dbCAN2 40 (v2.0.11) which utilizes DIAMOND 36 and HMMER 41. The proportion bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

of species per enzyme ID was derived as follows: MAGs were grouped by subject, enzyme ID and GTDB-assigned species; distinct groups were selected and a count was obtained; the sum of distinct species falling within one enzyme ID was obtained and the proportion was derived.

Analysis workflow and scripts In order to manage the processing of the data in the High Performance Computing environment (sequence data processing, pooled assembly, binning, MAGs quality assessment), Nextflow 42 was used. For the installation and management of environments, conda 43 was used (v4.7.12). R 44 and the following R packages were used for the data analysis and visualization: ape 45, circlize 46, cluster 47, cowplot 48, data.table 49, dplyr 50, EnvStats 51, factoextra 52, forcats 53, ggbiplot 54, ggplot2 54, ggpubr 55, ggrepel 56, gplots 57, gridExtra 58, magrittr 59, matrixStats 60, microbiome 61, openxlsx 62, pheatmap 63, phyloseq 32, plyr 64, purr 65, RColorBrewer 66 , readr 67, readxl 68, reshape 69, robCompositions 70, scales 71, seqinr 72, SIAMCAT 33, splitstackshape 73, stringr 74, tidyr 75, and tidyverse 76. A schematic workflow of the data processing and analysis is represented in Supplementary Figure 2. Workflows and scripts used for the data processing are available in the Github repository https://github.com/koadman/metapigs. Scripts used for the data analysis and visualization are available in the Github repository https://github.com/GaioTransposon/metapigs_dry.

RESULTS

A total of 51,170 MAGs from 911 samples were generated in this study, 92.96% (n=47569) of which derived from samples of post-weaning piglets, 5.24% (n=2680) from samples of the piglets’ mothers, and 1.81% (n=926) from negative and positive control samples. The binning method employed here makes use of coverage information across multiple samples, with the potential for increased accuracy with increasing numbers of samples. We observed a positive correlation (Spearman r2=0.92; p value<0.0001) between the number of time points sampled per subject and the number of bins obtained for that subject (Figure 1). The bin count per subject doubled with the number of time point samples available up to 4 time point samples, after which the gains began to diminish (2 time points: 154 MAGs; 4 time points: 296 MAGs; 9 time points: 516 MAGs; 10 time points: 539 MAGs) (Figure 1). The higher number of time point samples (n=10) available from the piglet subset group (n=48) reflects in the higher number of MAGs obtained from these piglets as observable in Supplementary Figure 1. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Completeness and contamination of MAGs According to quality analysis with CheckM 27, 46.5% (n=23,798) of the total MAGs (n=51,170), were classified as ≥70% complete with a ≤10% contamination rate, while 24.4% (n=12486) were classified as nearly complete genomes as by the Bowers et al (2015) definition of MAGs with ≥90% completeness and ≤5% contamination (Figure 2). Three hundred and thirty MAGs were ≥99% complete and had a ≤0.1% contamination rate. The distribution of contig counts (median=126.0; mean=132.6; sd=70.0) per MAG is displayed in Supplementary Figure 3. Contigs had an average N50 of 36,276 nucleotides (median=25,438).

MAG ANI clusters and taxonomic clusters As a reference-free technique of MAG generation was adopted, we grouped the MAGs into clusters of similar genomes using two methods: 1) 95% and 99% sequence similarity using the genome dereplication tool dRep 31; and 2) taxonomic clustering with Kraken2 29 against the GTDB 30. Using the first method, nearly half of the MAGs (43.8%) passed the established length-filtering threshold (500,000 nt) and were assigned a cluster ID based on average nucleotide identity (ANI). A total of 1,267 unique primary clusters (95% ANI) and 4,480 unique secondary clusters (99% ANI) were obtained. Primary or secondary clusters that were found in only one subject were defined as unique, whereas clusters that were found in more than one subject were defined as common. Primary clusters (95% ANI) were common among subjects at a higher rate than secondary clusters (99% ANI): 98.6% and 86.2% among piglets and 91.4% and 72.1% among the mothers, respectively, delineating strain specificity within hosts (Supplementary Figure 4). The second method applies taxonomic clustering using the most comprehensive to date bacterial and archaeal taxonomic database. We assessed the extent of agreement between the first method (dRep - grouping based on ANI) and the second (GTDB taxonomic assignment). The degree of agreement ranged from a mean of 94.4% at the species level (median=100%), 96.7% at the level (median=100%), and 98.3% at the family level (median=100%) to reach a >99.1% agreement at the order, class and phylum level (Supplementary Figure 5). Out of all MAGs assigned a cluster (43.8%), no duplicate secondary clusters were detected within the same host. On the contrary, 24 MAGs from 11 hosts were assigned a total of 7 distinct primary clusters. Among primary clusters found more than once within a host, two were also found more than once among other hosts (primary cluster “838”: 3 hosts; primary cluster “1099”: 4 hosts). The MAGs assigned primary cluster 838 were all assigned to the same species (F23-B02 sp000431075 species; Oscillospiraceae bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

family) by GTDB. The MAGs assigned primary cluster 1099 were assigned to three species by GTDB, falling under the family. Based on GTDB clustering of MAGs derived from positive control samples (3 positive control types; 8 replicates within each), MAG taxonomic assignments matched the expected profile and the profile obtained from analysis of the reads with MetaPhlAn2 77 reported in our previous study 18 (Supplementary Figure 6).

High-level taxonomy of the piglet gut microbiome The quality assessment tool CheckM 27 also provides a taxonomic assignment of the nearly complete genomes. According to CheckM, the post-weaning piglet gut microbiome is composed of the following Phyla in the following proportions: (63.6%), Bacteroidetes (13.2%), Tenericutes (9.2%), (6.8%), Proteobacteria (2.2%), Euryarchaeota (2.1%), Spirochaetes (1.8%), Chlamydiae (0.8%) and Synergistetes (0.5%). CheckM taxonomic clustering resolved 91.28% of the nearly complete MAGs to the Order level, the most common taxonomic orders above 1% being: Clostridiales (56.4%), Bacteroidales (12.9%), Erysipelotrichales (8.2%), (5.4%), Lactobacillales (5.4%), Selenomodales (3.4%) and Methanobacteriales (1.8%). According to taxonomic assessment with Kraken2 29, mapping the MAGs against the GTDB database, The Prevotella genus took up the largest proportion, with 10.1% of the total genera composition (Supplementary Figure 7). Most common phyla, genera and species of the piglet microbiome, grouped by Phylum, Order and Family, respectively, are shown in Supplementary Figure 7.

Time trend The predominant variation we observed in the piglet microbiomes is associated with the aging of the piglets. Both non-metric multidimensional scaling (NMDS) analysis and network analysis of samples with PhyloSeq 32 showed samples clustering by time point of collection (Supplementary Figure 8). Samples were the most scattered in principal component analysis, at the start of the trial (t0) when piglets were aged between 3 and 4 weeks old, and samples tended to cluster more closely at later time points, shifting most prominently along the first NMDS axis. The temporal shift was evident in the results from all of the various approaches we applied to filter and cluster MAGs into bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

groups, hence the final number of MAGs included in the analysis (CheckM: 12.4k; dRep: 22.4k; GTDB: 51k) (Supplementary Figure 8). The application of a different normalisation method (by proportion rather than by median sequencing depth) did not affect the observed time trend. The separation of samples by time persisted along the first four dimensions of the PCA, regardless of the approach used and the filtering criteria (Supplementary Figure 9). Samples from t0, t1 and t2 clearly separated from each other and from other time point samples both with ANI MAG clusters and GTDB clustered MAGs (ANI: PC1: 51.0% PC2: 15.9%; GTDB: PC1: 81.7% PC2: 7.5%). Despite the lower number of MAGs taking part in the dRep analysis (ANI MAG clusters=22.4k), the clearest separation of samples by time was observed along the second principal component (15.9% variation explained). Also GTDB clustered MAGs separated clearly by time point along the third and fourth principal components (PC3: 4.4%; PC4: 1.2%). (Figure 3; Supplementary Figure 9)

A remarkable tightly regulated compositional shift We found the aging-associated compositional shift throughout the length of the study to be consistent among hosts, independent of treatment, and marked by changes that were highly specific in the taxonomy. This tight regulation is observable in Figure 4, where, for visualization purposes, the abundance changes across time in all subjects are displayed for only the 26 most abundant species. A more comprehensive temporal heatmap, comprising the 112 most abundant species in the piglet population is provided in Supplementary Figure 10. Differential abundance between time points and statistical analysis of the changes over time was obtained with SIAMCAT 33. Time points analyzed were at consecutive weeks from the start (t0): t0, t2, t4, t6, t8, and t10. The top 10 significantly shifting species in piglets between t0 and t4, and between t4 and t8, are shown in Figure 5. Correlations with p value < 0.05 after false discovery rate correction were considered significant. Significance values, fold change and other metrics are provided in Supplementary File 1. A total of 143 species were found to significantly shift in abundance (84% positive fold change) in the piglet population between t0 and t2, (piglets aged 3-4 weeks and 4-5 weeks, respectively) (S11-S12), among which we found: Blautia_A wexlerae (p value<0.001; fold change: +1.4), CAG-83 sp003487665 (p value<0.001; fold change: -0.9), Lactobacillus amylovorus (p value<0.001; fold change: +1.3), CAG-45 sp002299665 (p value<0.001; fold change: +1.7), CAG- 110 sp002437585 (p value<0.001; fold change: +1.7). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Ninety species were found significantly shifted in abundance (63% positive fold change) between t2 and t4 (S11; S13), among which we found: Methanobrevibacter_A smithii (p value<0.001; fold change: -1.6), Phil1 sp001940855 (p value<0.001; fold change: -1.3), Prevotella copri (p value<0.001; fold change: +1.0), Prevotella sp000434515 (p value<0.001; fold change: +1.3), Lactobacillus amylovorus (p value<0.001; fold change: +0.7). Fifty-two species significantly shifted in abundance (85% positive fold change) between t4 and t6 (S11; S14), among which we found: Corynebacterium xerosis (p value<0.001; fold change: +1.4), Blautia_A obeum (p value<0.001; fold change: +1.1), Blautia_A sp000285855 (p value<0.001; fold change: +0.8), Clostridium sp000435835 (p value<0.001; fold change: +1.0), Clostridium_P ventriculi (p value<0.001; fold change: +1.4). The smallest number of species (n=4) shifting in abundance between time intervals were found between t6 and t8 (50% positive fold change) (S11; S15). These were: UBA7748 sp900314535 (p value=0.003; fold change: -0.9), Clostridium sp000435835 (p value=0.003; fold change: +0.8), Prevotella copri (p value=0.005; fold change: +0.3), and Bifidobacterium boum (p value=0.04; fold change: -0.8). The number of species shifting in abundance (32% positive fold change) was again higher (n=28) for the last time interval (t8-t10) when piglets aged between 7-8 and 8-9 weeks (S11; S16). During this interval the most significantly changing species were: Clostridium sp000435835 (p value<0.001; fold change: +1.4), Corynebacterium xerosis (p value<0.001; fold change: +1.2), Prevotella copri_A (p value<0.001; fold change: -0.7), Prevotella copri (p value<0.001; fold change: -0.5), Prevotella sp000434515 (p value<0.001; fold change: -0.5). We display all significant abundance shifts (FDR adjusted p value < 0.05) in Supplementary Figure 17. From this plot it is evident that most species increase during the first time interval (red dots). While most species kept increasing the following week (yellow dots; fc > 0), some decreased (yellow dots; fc < 0; cluster on the bottom left). Notably, all Prevotella species (n=16) significantly increased in abundance during the first or the second week (red and yellow dots, respectively), and nearly all of these Prevotella species (n=13) decreased during the last time interval (pink dots; fc < 0; top left). (S17)

Diversity During the first week (t0-t2) (piglet age range: 3.5±0.5 to 4.5±0.5 weeks) species richness significantly increased (Bonferroni adjusted p values: Chao1 p=3.4e-19; Shannon p=1.3e-14) (S18). Between t2 and t4 (piglets age range: 4.5±0.5 to 5.5±0.5 weeks) a loss of species evenness is bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

recorded (Simpson: Bonferroni adjusted p value<0.001). The following week (t4-t6) (piglets age range: 5.5±0.5 to 6.5±0.5 weeks) a mild increment of species richness (Chao1 Bonferroni adjusted p value=0.031) was observed. (S18)

Effects of age, breed, cohousing, and weight Age and breed were found to correlate with microbial composition (S19; Supplementary File 1). Piglets of the same breed (D×L) and two age groups (3 days difference between the two groups) separated by age at t0 in PC1 (18.2 % variation explained; Dunn test; Bonferroni adjusted p value=0.001) and at t2 in PC1 (15.5% variation explained; Dunn test; Bonferroni adjusted p value=0.001) and in PC2 (10.6% variation explained; Dunn test; Bonferroni adjusted p value=0.022). (S19B; Supplementary File 1). These were compared with SIAMCAT 33 to discover taxa that exhibit an age-association in their abundances. Taxa mildly associated with age were found at t0 (5 species; alpha 0.06) and at t2 (5 species; alpha 0.13) (S20). Three of the five taxa found significantly associated with the age groups at t0 (alpha 0.06), were also found significantly changed (alpha 0.05) towards the same direction with time when including all samples to the analysis. These were: UBA1777 sp003150355, CAG−83 sp900313295, and Methanobrevibacter_A gottschalkii. The remaining two were found significantly changed toward the opposite direction (Phil1 sp001940855) and not significantly changed (UBA3388 sp002358835). (S20; Supplementary File 1) Four of the five species found significantly changed between the two age groups at t2 (alpha 0.13), were also found significantly changed (alpha 0.05) towards the same direction with time when including all samples to the analysis. These were: Methanobrevibacter_A smithii, UBA7748 sp900314535, Bifidobacterium thermacidophilum, Bifidobacterium boum. Breed was also found to significantly associate with microbiome composition when piglets of the same age and two breeds (D×L and D×LW) were compared. Significance was reached in PC2 (variation explained: 10.5%) at t0 (Dunn test; Bonferroni adjusted p value=0.009) and in PC1 (variation explained: 14.1%) at t2 (Dunn test; Bonferroni adjusted p value=0.025). (S19C; Supplementary File 1). Piglets of the same age and two breed groups were compared with SIAMCAT to find taxa with abundances that were associated with breed. Neither at 95% confidence (alpha 0.05) or 80% confidence (alpha 0.20) were specific taxa found to be associated with breed. (Supplementary File 1) Weak correlations were found with weight at t0 (Spearman’s rank rho=-0.264 p value=0.01), t2 (Spearman’s rank rho=-0.266 p value=0.001), t4 (Spearman’s rank rho=-0.263 p value=0.007), t8 (Spearman’s rank rho=0.352 p value=0.003), and at t10 (Spearman’s rank rho=-0.346 p bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

value=0.008) (S21). A stronger correlation of weight with composition was found at t6 (Spearman’s rank rho=0.384 p value=0.001). The co-housing factor did not correlate with composition (Supplementary File 1).

The predicted proteome of the piglet Gene prediction from all MAGs (n=51,170) yielded 70,696,284 predicted open reading frames. The predicted protein sequences fell into 24.4 million clusters at 100% amino acid identity and into 6.9 million clusters at 90% amino acid identity. Homology search of all the predicted proteins against the CAZy database with HMMER identified 2,049,008 predicted proteins that are potentially involved in carbohydrate metabolism (highest e-value=1e-15; coverage: median=94.3; mean=87.0). Of these there were 1,023,807 glycoside hydrolases (GHs), 716,644 glycosyl transferases (GTs), 26,544 carbohydrate binding modules (CBMs), 253,525 carbohydrate esterases (CEs), 16,636 polysaccharide lyases (PLs), 8,626 enzymes with auxiliary activity (AA), 3,064 S-layer homology domain (SLH) enzymes and 162 cohesins found. Sequence identity against the CAZy database for each class of enzymes and the distribution of carbohydrate enzymes across phyla are shown in Supplementary Figure 22.

CAZy species-specificity and their distribution across the pig population All 294 unique CAZy enzymes in our dataset are shown in Figure 6. As each enzyme mapped against a MAG and each MAG was mapped separately against the GTDB database, the two sources of information were joined as described in the methods section. We display the prevalence of each enzyme in the pig population (y-axis), while, mapped on the x-axis, we report the highest proportion a distinct species was found for each enzyme ID. Below, we describe each enzyme class in terms of prevalence in the pig population and its distribution among species. Enzymes of the CE class (n=16) frequently occurred among subjects, with > 100 subjects carrying this class of enzymes (median=166; mean=148). These enzymes tend to be species generic (median=5.5; mean=14.2), as they fall on the left side of the plot, showing a broad distribution across multiple species. Out of all genera, Prevotella genus carried the most of these enzymes (11.5%). Similarly, of the PL enzyme class (n=26), nearly half formed a cluster in the center left side of the plot, indicating a high prevalence among subjects (median=93; mean=81) and a broad bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

distribution across species (median=20.8; mean=32.2). PL enzymes were carried mostly by the Prevotella genus (40.6%). Enzymes of GT class (n=63) are prevalent among the pig population (median=121; mean=107) and have a broad distribution across species (median=16.0; mean=25.6). The majority of these enzymes (n=47) were found among species of the Prevotella genus, 27 of which were also found in . Species of these genera had the highest variety of enzymes, as well as the highest number of genes (>200) per genome (Supplementary Figure 23). GH enzymes (n=126) were the most represented enzymes in our dataset and over 70% of these enzymes fell on the top left of the plot, showing a high frequency across subjects (median=154; mean=131) and a high distribution across species (median=11.2; median=20.4). Hundred-thirteen of the 126 GH enzymes are found in the Prevotella genus, 96 of which were shared with the Blautia_A genus. CBM enzymes (n=54) were moderately prevalent enzymes in the pig population (median=41; mean=65), and the most heterogeneous among the enzyme classes in terms of distribution across different species (median=27.0; mean=37.9), therefore they appeared as scattered across the plot. Also for this class of enzymes, the majority (n=30) were found among species of the Prevotella genus, and partially shared with Gemmiger (n=8). The CAG-269 genus carries 3 CBM enzymes neither Prevotella or Gemmiger carry in their MAGs. Five of the seven AA enzymes (n=7) were prevalent in the pig population (>90 subjects). AA10 was majorly found in Enterococcus_B hirae (52.9%), AA2, AA3, and AA6 were found mostly represented by Terrisporobacter othininensis (28.6%), Methanobrevibacter_A gottschalkii (20.1%), and Escherichia flexneri (17.5%), respectively. Cohesin (n=1) and SLH (n=1) were found in 92 and 167 subjects, respectively, and across multiple species. Cohesin was mostly found in Ruminococcus_C sp000433635 (15.4%), while Ruminiclostridium_C sp000435295 carried the most SLH (13.0%). Seven enzymes, present in between 20 and all subjects (n=167, comprising piglets and mothers), were found majorly represented by a single species (>70%) (Figure 6). These were: CBM44 (Clostridium_P perfringens), GT44 (Chlamydia suis), CBM75 (Ruminococcus_F champanellensis), CBM71 (An172 sp002160515), GH44 (Ruminococcus_C sp000433635), GH54 (RC9 sp000432655), CBM83 (Agathobacter sp900317585). In order of number of genes per genome we found: GT (median=6.3; mean=30.0), CE (median=5.5; mean=25.3), GH (median=3.7; mean=16.2), SLH (median=2.0; mean=12.7), AA (median=2.0; mean=11.2), CBM (median=2.0; mean=6.3), PL (median=1.5; mean=5.7), cohesin bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

(median=1.0; mean=3.5). Gene counts of enzymes per GTDB- species are reported in Supplementary File 1.

The piglet carbohydrate proteome across time Of the 294 CAZy enzymes in our dataset, 234 showed a significant change (Bonferroni adjusted p value < 0.05) (Supplementary File 1), when comparing the abundance between any time point by pairwise Wilcox test. We display their trend over time in the piglet population and their abundance in the mothers’ population in Supplementary Figures 24-35. A number of enzymes increased in abundance over time in the piglet population to reach similar abundance levels to the mothers’. Among these were: CE13, GH42, GH32, GT53, GT87, GH1, GT85, CBM25, CBM4, CE5, GT101, CBM26, GT103, and GT85 (S24-S34). Other enzymes decreased in abundance over time in the piglets population and their abundance in the mothers’ population was as low or lower than the abundance recorded at the last time point for the piglets. Among these were: GH123, GH29, GH110, GH33, GH30, GH109, GH2, GH35, CBM35, and CBM9 (S24-S34). A number of enzymes followed an upwards or downwards trend with time in the piglets population while their abundance in the mothers population was lower or higher, respectively, than the abundance recorded at the last time point in the piglets population. Among these were: GT66, CBM51, GH25, GH37, CBM13, CBM67, GH101, CBM44, GT46, GT22, and CBM6 (S24-S34).

DISCUSSION

Previous efforts have applied metagenomics to survey the diversity of pig 78, but have not focused on the post-weaning microbial community nor an approach that generates MAGs. Bowers et al (2017) described genomes with a completeness of at least ninety percent and a contamination rate lower than five percent as nearly complete genomes. In this study we generated over 50,000 MAGs from a dataset of 27 billion read pairs, of which 12,486 MAGs are predicted to represent nearly complete genomes of bacterial or archaeal organisms. While we previously reported the results obtained from an assembly-free analysis of community structure 18, here we report the composition and predicted carbohydrate proteome of the microbial community using MAGs constructed from the dataset. Using the MAGs and their abundance across the samples, we were able to track compositional and functional changes over a 5-week period in 126 post-weaning bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

piglets. In addition, the MAGs obtained from the piglet’s mothers, sampled only at a single time point, were included in the analysis.

More MAGs were obtained from more frequently sampled subjects Compared to other large animal metagenomic studies 79,80, we obtained a larger number of MAGs overall as well as a larger number of nearly complete MAGs. We found the number of MAGs obtained per subject to be proportional to the time points available per subject, up to 8 time points per subject, after which a saturation started to be seen. Beyond 8 time-series samples the recovery of genomes tends not to be limited by sample count or sequencing depth, but instead by other factors such as limitations of analysis software 8 and the finite amount of microbial diversity in the community. The improved recovery of MAGs from samples with more available time points could be due to three factors: 1. a greater combined depth of coverage allowing a better co-assembly of low abundance community members, or 2. more observed fluctuation in relative abundance profiles, which is informative for binning, or 3. new taxa observed at later time points. We did not investigate the extent to which the increased recovery of MAGs with additional time points was driven by greater total depth of coverage versus greater resolving power for coverage binning.

Concordance of taxonomic MAG clusters with ANI clusters Using the grouping of MAGs based on average nucleotide identity and the taxonomic clustering with GTDB, we attempted to establish the degree of nucleotide similarity of MAGs assigned to the same species. From this study, MAGs identical in nucleotide composition at 99%, and therefore assigned the same secondary cluster, were assigned to the same family and order only 98% and 99% of the times, respectively. Faith et al (2013) defined strains as organisms sharing 96% of genome content 17. From our data it can be concluded that two MAGs assigned the same taxonomic order can be considered to be 99% similar in nucleotide composition, while organisms that shared

95% of genome content, fell 99% of the times under the same taxonomic class.

Time dictates composition, down to the species level As previously observed 18, we confirm the large time factor driving the compositional shift of post- weaning piglet gut microbiota samples. The temporal shift appears regardless of the approach- dependent grouping of MAGs (CheckM taxonomic clustering, 99% ANI clustering, or GTDB taxonomic clustering), or of the filtering criteria (completeness and contamination, length filtering, no filtering, respectively), hence the resulting number of MAGs taking part into the analysis (12.4k, bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

22.4k, 51k, respectively). Overall, the first three weeks represent the establishment of a new community during which a large number of species bloom. After a brief transition period (fourth week) during which only a few species are lost and acquired, a microbial community consolidation phase starts, where species start to die off, and a smaller number of dominant taxa remain, reflective of an adult gut 81. We often found species belonging to the same genus to follow a similar trend with time (e.g. Bifidobacterium thermacidophilum, Bifidobacterium boum). This could be due to the fact that similar organisms, by sharing similar niche and nutrient requirements, are affected in a similar manner by the changing physiological environment of the piglet gut during the post-weaning process 82,83. Bifidobacterium species have been previously associated with milk consumption in infants 84. Two Bifidobacterium species dropped significantly in abundance during the last two weeks, between the third and the fourth week post-weaning, suggesting this shift to be due to the diet change from milk to solid food. We also found some species of the same genus to follow very distinct trends with time (e.g.: Methanobrevibacter_A smithii, Methanobrevibacter_A gottschalkii). Other studies showed that closely related organisms have evolved to display large differences in behaviour 6,85. Based on our carbohydrate functional profiling, it becomes clear that members of the same genus can differ significantly in their enzymatic repertoire, both on the total number of genes encoding enzymes per genome, as well as in the number of different enzymes a genome bears. This was particularly evident among species of the Prevotella genus. Over eighty percent of the GTDB clustered MAGs showing significant abundance shifts during the first week of post-weaning were positive fold changes. This is also reflected by the major increase in diversity and richness we measured by compositional analysis of the GTDB clustered MAGs, as well as by phylogenetic diversity analysis of the metagenomic reads in our previous study 18. Nearly forty percent of the species that increased during the first time interval (t0-t2) kept increasing during the following week (t2-t4), reflecting that these species kept taking up a larger portion of the microbial community. This is confirmed by the increase of evenness (as measured by balance-weighted phylogenetic diversity) we recorded for this time interval in our previous study. Among the species significantly decreasing with time during the first two weeks we found Methanobrevibacter smithii, CAG-83 sp003487665, and Phyl1 sp001940855. A large portion (>80%) of the significant shifts detected in the following time interval (t4-t6) were positive fold changes. Of the twenty-eight species shifting in abundance during the last time interval, nineteen were negative fold changes, and twelve of these were Prevotella species. Prevotella are known to be acquired post-weaning as a consequence of a substrate shift from breast-milk to solid food in bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

infants 84. Notably, of the sixteen Prevotella species established during the first and the second week, twelve significantly dropped in abundance during the last time interval.

Strain differences are strongly associated with the timeline of gut development Samples from early time points separated neatly in the second dimension of principal component analysis independently of the MAG clustering approach used and the number of MAGs taking part in the analysis. However, samples from later time points showed a higher degree of separation when using 99% ANI as a clustering approach. Here a journey-like development through stages from 3-4 towards 8-9 weeks old piglets is seen, suggesting that fine-scale genomic differences among organisms may play a larger role at later time points than at earlier time points. Faith et al (2013) suggested that strains, defined as organisms sharing 96% of genome content, remain over the course of 5 years in human faecal microbiota 17. A more recent study showed that strains are shared between individuals for decades 86. New strains can be acquired with age and co-habit an environment alongside existing strains 87,88, or substitute existing strains in the process of species replacement 89, or appear as a result of diversification of an existing strain into new types within a single host 89. Multiple paths can reach this same outcome. Garud et al (2019) suggest that although diversification of strains has been reported to happen in human-relevant timescales, the acquisition of strain diversity is more often a result of species replacement 89. The temporal signal in our dataset was so strong in the first two weeks of the trial that piglets differing by just 3 days of age separate cleanly, in the first principal component. Although the dataset for this analysis was reduced to include only piglets of the same breed, thereby avoiding breed as a confounding factor, a number of GTDB-assigned species were found to be associated with these small age differences. Additionally, we confirm factors such as host weight and breed, together with small differences in age, to correlate with microbial composition, as it was suggested by our previous analysis with unassembled metagenomic reads 18. A larger fraction of unique MAGs (either at 95% or 99% ANI) were found among the mothers than among the piglets. This could be due to the fact that the mother group was not as heavily sampled as the piglet group, unavoidably affecting the overall resolution of the community composition, or it can suggest that a higher strain specificity is reached with age.

Possible drivers of a tight regulation: immune system versus diet The highly structured compositional changes over time raise the question of what factors drive these observed compositional shifts. Changes of diet 84,90,91 and immune system 82,83,92 during weaning are bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

the likely culprits. As piglets in this study were weaned two days prior to the start of sampling and they followed the same dietary regime throughout the length of the study, we refer to changes in

diet as the weaning transition from milk to solid food. As the immune system during weaning is training to recognise pathogens, it initially tolerates a large number of species, so as to develop the necessary antigens upon encounter 93,94. Possibly due to the exposure to a new environment upon the start of the trial, and a higher tolerance for new species immediately after weaning, the piglets were found to experience the largest number of species increasing in abundance during the first two weeks, when piglets aged between 3 and 5 weeks. Thompson et al (2008) report that the microbiota of piglets of 3 weeks of age is particularly dynamic and environmental factors start to be seen at this stage 92. They determined that at 6 weeks of age CD8+ T cells infiltrate the intestinal tissue and the mucosal lining resembles that of an adult pig. The shifts we observed in the microbial community structure could be consistent with these statements. The compositional profile of over hundred most abundant species started to be more visibly stable in 5-6 weeks old piglets. Also, according to differential abundance analysis the smallest numbers of species with significant changes in abundance occurred when the piglets were aged between 6-7 and 7-8 weeks. The piglet microbial communities could potentially be modulated by substrate availability 91,95,96. In this study piglets are weaned at the start of the trial, implying that the transition from milk to solid food took place from the moment we started collecting samples. It is for this reason not possible to discern the diet-driven from the immunological-driven shifts in this study. To this end the collection of immunological data, such as the T-cells repertoire during weaning and its exploration in conjunction with compositional shifts (e.g. network analysis), could provide

interesting answers.

Probiotic strains go missing: did they fail to assemble or fail to colonize? Based on the taxonomic clustering of MAGs with GTDB, a few MAGs were assigned a taxon at the species level that matched the species contained in the probiotic formula, and they were not abundant. This could be due to a lack of successful colonisation in the piglet guts, leading to a low read count in the sample and therefore insufficient coverage to assemble contigs from these organisms, but it could also be due to within-host strain heterogeneity. The expected high sequence similarity of the Lactobacilli strains of the probiotic formulations to the indigenous Lactobacilli population could lead to the well-known difficulty in metagenomics studies of assembling mixtures of highly similar genomes 8,9. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

It is well known that repeat-containing genomic regions, lateral gene transfers, and sequencing errors, jointly contribute to assembly fragmentation and errors in genome binning in a considerable manner 11,97,98. Efforts in the field of metagenomics are currently addressing the problems caused by strain mixtures, working on the hard task of linking single nucleotide variants (SNVs) from around the genome that belong to the same strain, to ultimately recover distinct MAGs from highly similar 99 strains . In the case where the probiotic strains simply did not colonise successfully, other studies suggest that probiotic formulations can exert a beneficial effect in other ways than by colonizing the gut. Šmajs et al (2012) showed E. coli Nissle, administered at a dosage that was not sufficient for colonization, nonetheless induced immune responses in the host that led to a selection against the E. coli population 100. Substantial evidence also exists on beneficial probiotic effects by modulating the immune system 101–105, by exerting niche competition 106,107, or by producing compounds that antagonise the growth of pathogens 108,109. For these reasons, even without successful colonisation of the probiotic strains, a secondary effect on other taxa in the microbial community could be

possible.

Assessment of probiotic treatment effect on other taxa We attempted to assess the probiotic treatment effect by looking for significant differences in the abundance of GTDB clustered MAGs at the species level, between the probiotic treatment cohorts and their respective control cohorts. Although several significant differences in composition were found between cohorts, these were also found before any treatment was commenced. Among the possible underlying causes are the intrinsic differences among the piglets in breed and tiny differences in age, found here and in our previous study 18, and the predominant effect of time, possibly masking milder treatment effects. In conclusion, as we found significant differences between cohorts at time point 0 of sample collection (piglets aged 3-4 weeks), we wondered about the legitimacy of using a simple group comparison approach to evaluate significant differences

between cohorts at later time points. In this study we detected a gradual and consistent increase of Lactobacillus amylovorus among the piglet population that reached its peak in 5-6 weeks old piglets, to later gradually decrease. The prominent trend of L. amylovorus appeared both in differential abundance analysis as well as in compositional analysis. Other studies have reported the health promoting use of L. amylovorus 104,110. After testing for its pH survival and ability to adhere to cells, Kim et al (2007) demonstrated the supernatant of L. amylovorus to be as effective as antibiotics against the growth of certain bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

pathogens 110. This species has also been reported to suppress an immune-modulation cascade initiated by enterotoxigenic Escherichia coli, a well-known deadly pathogen in post-weaning piglets 104. We wonder whether the use of a piglet-derived strain of Lactobacillus instead, such as L.

amylovorus, could have led to a higher success of colonisation if administered as a probiotic.

Carbohydrate metabolism based on predicted protein CAZy profile We predicted open reading frames (ORFs) from all the MAGs, mapped them against the CAZy database, which contains amino acid sequences of known carbohydrate active enzymes, and we found 294 distinct enzymes in our dataset, distributed across eight enzyme classes. We report the representation of these CAZymes within each specific taxonomic group at lower (e.g.: species) and higher levels (e.g.: Phylum) based on GTDB taxonomic clustering. Similarly to the rumen metagenome 80 half of these enzymes were of the glycoside hydrolases class and over a third were of class glycoside transferases, the first known to break down sucrose, lactose and starch 111, the latter known to assemble complex carbohydrates from activated sugar donors 112. We obtained a considerably high mean and median sequence similarity against the CAZy database for all enzyme classes except for the auxiliary enzyme (AA) class. Despite the different taxonomic clustering methods applied in this study and the rumen metagenomic study, a few similarities were found. Similarly to the rumen metagenomic study, we found a larger proportion of GT enzymes in the Euryarchaeota phylum, and a larger proportion of CBM and PL enzymes in the Fibrobacterota

phylum. The proportions of the most abundant enzymatic classes in this study (50% GH; 35% GT; 12% CE; 1% PL) roughly matched the proportions suggested by Kaoutari et al (2013) (57% GH; 35% GT; 6% CE; 2% PL) who generated a profile of the human carbohydrate repertoire by mapping 177 reference human gut microbial genomes against the CAZy database 111. A number of enzymes known to break down animal glycans, which are present in milk (GH2, GH20, GH92, GH29, GH95, GH38, GH88) 111 decreased with time in the piglet population, while, among others, enzymes reported to break down peptidoglycans (GH25, GH73), starch and glycogen (GH13), sucrose and fructose (GH32) 111, followed an upward trend. We found a number of enzymes to decrease or increase gradually with time, to reach abundance levels that are similar to those found in the mothers. As it could be expected due to the large difference in age between the piglets at the last sampling point and the mothers (single time point), the time trend for a minority of the total enzymes changed gradually in the piglets, but did not reach the lower or higher level measured in

the mothers. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

The data we report in this study addresses a knowledge gap in the association of carbohydrate active enzymes to microbial species. We assessed the trend of CAZy enzymes across the piglets and across time, determining for each enzyme the prevalence in the pig population including the mothers and the piglets, and determining what GTDB-assigned species carried each of the enzymes in their metagenome. We found certain enzymes to be common within more closely related species, such as species belonging to the same genus. Examples of these are: AA10 found predominantly in Enterococcus genera, CBM40 in Clostridium, PL31 in , CE5, GT53, GT85 and GT87 in Corynebacterium, GT21 in Desulfovibrio, GT66 and GT81 in Methanobrevibacter, GT6 in Gemmiger, GH68 in Lactobacillus, PL21, GH54 and GH55 in RC9 genera. A large number of enzymes were common to many species and to higher level taxonomic assignments. Kaoutari et al (2013) also reported the frequency of genes from some carbohydrate active enzymes within species and, for distinct species, the proportion of distinct enzymes within an enzymatic class. By following their work, we hereby provide a comprehensive mapping of species to genes and species to

enzymes. The synthesis of knowledge of enzyme function and substrate -based activity, with knowledge of the species carrying these enzymes in their genome can be of high value to the design of probiotic and prebiotic formulations in the livestock (e.g. methane emission control; improvement of substrate energy efficiency) as well as in the human settings (e.g.: diet). However, it must be noted that the method of mapping CAZymes against a reference database, like so many reference- based analyses in microbial metagenomics, may suffer bias due to the limited representation of diversity in the database. In the case of the CAZymes the approach may accurately represent the CAZyme profile of better represented genomes while underestimating the CAZyme profile of genomes for which reference genomes are lacking.

Limitations Obtaining compositional profiles that closely reflect the microbial biomass in the sample is a highly sought after objective in microbiome studies. Because differences in read depth between samples can be minimized but not eliminated with the current sequencing tools, sample size normalisation is a necessary step, but the chosen method is critical to the interpretation of results 113. Among the commonly used methods for compositional data we find: normalisation-dependent 114,115, transformation-dependent (composition) 116, and transformation-independent (proportions) 117 approaches, each carrying its own advantages and limitations. The first method assumes that the absolute counts of some features are equal across samples, an assumption that is not particularly bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

suited to highly heterogeneous environments such as the gut microbiome. The second method assumes that a set of organisms exists that do not change between subjects or over time, and these are used as internal references. This is also a risky assumption in unstable microbial environments such as that of the post-weaning piglet gut microbiota, as we witnessed in this study. The increase or decrease of a certain organism over time does not necessarily cause a decrease or increase of another organism, but the use of proportions (third method) can lead to such conclusions, as correlations “piggyback” on each other. This phenomenon is referred to as spurious correlation of ratios. Despite the latter drawback, normalisation by proportions is currently among the most

widely used normalisation methods in microbiome studies. In our study we applied a number of normalisation methods in parallel, among which we chose median sequencing depth, rarefaction and proportions. Although the temporal shift was evident when applying any of these normalisation methods, we are aware of the fact that the use of

proportions to normalise for library size can lead to a higher discovery of false correlations. Bias can also arise from the nature of organisms in the sample. The content of GC 118 and the genome size 119 of each organism both influence the representation of species in the sample, by preferential affinity of the polymerase and the portion each organism takes up in the sample having

a direct link to its genome size.

Conflicts of interest D-Scour™ was sourced from International Animal Health Products (IAHP). ColiGuard® was developed in a research project with NSW DPI, IAHP and AusIndustry Commonwealth

government funding.

Funding information This work was supported by the Australian Research Council, linkage grant LP150100912. This project was funded via the Australian Centre for Genomic Epidemiological Microbiology (Ausgem), a collaborative partnership between the NSW Department of Primary Industries and the University of Technology Sydney. DG is a recipient of UTS International Research and UTS

President’s Scholarships.

Author contributions Study design: TC, SD, AED bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Sequencing data processing: MZD, DG, AED

Sequencing data submission: KA

Data analysis and visualization: DG

Manuscript writing: DG

Manuscript editing: AED, MZD

Acknowledgements Thank you Simone Arnold for the insightful discussions on the biological content of this manuscript. International Animal Health Products for providing access to the probiotic supplements and the Australian Centre for Genomic Epidemiological Microbiology (Ausgem) for financially supporting this study. Thank you John Webster and Dave Wheeler for providing the DPI internal review of this manuscript.

Supplementary files

Supplementary File 1. Statistical tests output. https://github.com/GaioTransposon/metapigs_dry/blob/master/out/stats.xlsx

REFERENCES

1. Brown, C. T. et al. Unusual biology across a group comprising more than 15% of domain . Nature 523, 208–211 (2015). 2. Hug, L. A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016). 3. Pasolli, E. et al. Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle. Cell 176, 649-662.e20 (2019). 4. Mande, S. S., Mohammed, M. H. & Ghosh, T. S. Classification of metagenomic sequences: methods and challenges. Brief. Bioinform. 13, 669–681 (2012). 5. Dethlefsen, L. & Relman, D. A. Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation. Proc. Natl. Acad. Sci. 108, 4554–4561 (2011). 6. Valguarnera, E. & Wardenburg, J. B. Good Gone Bad: One Toxin Away From Disease for Bacteroides fragilis. J. Mol. Biol. 432, 765–785 (2020). 7. Bowers, R. M. et al. Impact of library preparation protocols and template quantity on the metagenomic reconstruction of a mock microbial community. BMC Genomics 16, 856 (2015). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

8. Sczyrba, A. et al. Critical Assessment of Metagenome Interpretation – a benchmark of computational metagenomics software. http://biorxiv.org/lookup/doi/10.1101/099127 (2017) doi:10.1101/099127. 9. Sanders, J. G. et al. Optimizing sequencing protocols for leaderboard metagenomics by combining long and short reads. Genome Biol. 20, 226 (2019). 10. Alneberg, J. et al. CONCOCT: clustering contigs on coverage and composition. ArXiv Prepr. ArXiv13124038 (2013). 11. Imelfort, M. et al. GroopM: an automated tool for the recovery of population genomes from related metagenomes. PeerJ 2, e603 (2014). 12. Cleary, B. et al. Detection of low-abundance bacterial strains in metagenomic datasets by eigengenome partitioning. Nat. Biotechnol. 33, 1053–1060 (2015). 13. Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533–538 (2013). 14. Garza, D. R. & Dutilh, B. E. From cultured to uncultured genome sequences: metagenomics and modeling microbial ecosystems. Cell. Mol. Life Sci. 72, 4287–4308 (2015). 15. Liu, M. & Darling, A. Metagenomic Chromosome Conformation Capture (3C): techniques, applications, and challenges. F1000Research 4, (2015). 16. Truong, D. T., Tett, A., Pasolli, E., Huttenhower, C. & Segata, N. Microbial strain-level population structure and genetic diversity from metagenomes. Genome Res. 27, 626–638 (2017). 17. Faith, J. J. et al. The Long-Term Stability of the Human Gut Microbiota. Science 341, 1237439 (2013). 18. Gaio, D. et al. Community composition and development of the post-weaning piglet gut microbiome. http://biorxiv.org/lookup/doi/10.1101/2020.07.20.211326 (2020) doi:10.1101/2020.07.20.211326. 19. Gaio, D. et al. Hackflex: low cost Illumina sequencing library construction for high sample counts. bioRxiv 779215 (2019). 20. Bushnell, B. BBDuk: Adapter. Qual. Trimming Filter. Httpssourceforgenetprojectsbbmap. 21. Andrews, S. FastQC: a quality control tool for high throughput sequence data. (2010). 22. Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016). 23. Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics btv033 (2015). 24. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. ArXiv13033997 Q-Bio (2013). 25. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078– 2079 (2009). 26. Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019). 27. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015). 28. Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46 (2014). 29. Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019). 30. Parks, D. H. et al. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat. Biotechnol. (2020) doi:10.1038/s41587-020-0501-8. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

31. Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017). 32. McMurdie, P. J. & Holmes, S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PloS One 8, (2013). 33. Wirbel, J. et al. SIAMCAT: user-friendly and versatile machine learning workflows for statistically rigorous microbiome analyses. http://biorxiv.org/lookup/doi/10.1101/2020.02.06.931808 (2020) doi:10.1101/2020.02.06.931808. 34. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010). 35. Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012). 36. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59 (2015). 37. Franzosa, E. A. et al. Species-level functional profiling of metagenomes and metatranscriptomes. Nat. Methods 15, 962–968 (2018). 38. Li, H. seqtk Toolkit for processing sequences in FASTA/Q formats. (GitHub, 2012). 39. Cantarel, B. L. et al. The Carbohydrate-Active EnZymes database (CAZy): an expert resource for glycogenomics. Nucleic Acids Res. 37, D233–D238 (2009). 40. Zhang, H. et al. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 46, W95–W101 (2018). 41. Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29–W37 (2011). 42. Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316–319 (2017). 43. Grüning, B. et al. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat. Methods 15, 475–476 (2018). 44. R Core Team, R. C. T. R: A language and environment for statistical computing. (2013). 45. Paradis, E., Claude, J. & Strimmer, K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290 (2004). 46. Gu, Z., Gu, L., Eils, R., Schlesner, M. & Brors, B. circlize implements and enhances circular visualization in R. Bioinformatics 30, 2811–2812 (2014). 47. Maechler, M. & others. Finding Groups in Data”: Cluster Analysis Extended Rousseeuw et. R Packag Version 20 6, (2019). 48. Wilke, C. O. cowplot: Streamlined Plot Theme and Plot Annotations for ‘ggplot2’. (2019). 49. Dowle, M. & Srinivasan, A. data.table: Extension of `data.frame`. (2019). 50. Wickham, H., François, R., Henry, L. & Müller, K. dplyr: A Grammar of Data Manipulation. (2019). 51. Millard, S. P. E nv S tats, an RP ackage for E nvironmental S tatistics. Wiley StatsRef Stat. Ref. Online (2014). 52. Kassambara, A. & Mundt, F. Package ‘factoextra’. Extr. Vis. Results Multivar. Data Anal. 76, (2017). 53. Wickham, H. forcats: Tools for Working with Categorical Variables (Factors). (2019). 54. Vu, V. Q. ggbiplot: A ggplot2 based biplot. (2011). 55. Kassambara, A. ggpubr: ‘ggplot2’ Based Publication Ready Plots. (2019). 56. Slowikowski, K. et al. Package ggrepel. Autom. Position Non-Overlapping Text Labels ‘ggplot2 (2018). 57. Warnes, M. G. R. et al. Package ‘gplots’. Var. R Program. Tools Plotting Data (2016). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

58. Auguie, B. gridExtra: Miscellaneous Functions for ‘Grid’ Graphics. (2017). 59. Bache, S. M. & Wickham, H. magrittr: A Forward-Pipe Operator for R. (2014). 60. Bengtsson, H. et al. Package ‘matrixStats’. (2020). 61. Lahti, L., Shetty, S., Blake, T. & Salojarvi, J. Microbiome r package. Tools Microbiome Anal R (2017). 62. Schauberger, P. & Walker, A. openxlsx: Read, Write and Edit xlsx Files. (2019). 63. Kolde, R. pheatmap: Pretty Heatmaps. (2019). 64. Wickham, H. & Wickham, M. H. Package ‘plyr’. Obtenido Httpscran Rproject Orgwebpackagesdplyrdplyr Pdf (2020). 65. Henry, L. & Wickham, H. purrr: Functional Programming Tools. (2019). 66. Neuwirth, E. & Neuwirth, M. E. Package ‘RColorBrewer’. (2011). 67. Wickham, H., Hester, J. & Francois, R. readr: Read Rectangular Text Data. (2018). 68. Wickham, H. & Bryan, J. readxl: Read Excel Files. (2019). 69. Wickham, H., Wickham, M. H. & Rcpp, L. Package ‘reshape’. (2018). 70. Templ, M., Hron, K. & Filzmoser, P. robCompositions: an R-package for robust statistical analysis of compositional data. (2011). 71. Wickham, H. & Seidel, D. scales: Scale Functions for Visualization. (2019). 72. Charif, D. & Lobry, J. R. SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. in Structural approaches to sequence evolution 207–232 (Springer, 2007). 73. Mahto, A. splitstackshape: Stack and Reshape Datasets After Splitting Concatenated Values. (2019). 74. Wickham, H. stringr: Simple, Consistent Wrappers for Common String Operations. (2019). 75. Wickham, H. & Henry, L. tidyr: Tidy Messy Data. (2019). 76. Wickham, H. et al. Welcome to the tidyverse. J. Open Source Softw. 4, 1686 (2019). 77. Truong, D. T. et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods 12, 902–903 (2015). 78. Xiao, L. et al. A reference gene catalogue of the pig gut microbiome. Nat. Microbiol. 1, 16161 (2016). 79. Stewart, R. D. et al. Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen. Nat. Commun. 9, 870 (2018). 80. Stewart, R. D. et al. Compendium of 4,941 rumen metagenome-assembled genomes for rumen microbiome biology and enzyme discovery. Nat. Biotechnol. 37, 953–961 (2019). 81. Aleman, F. D. D. & Valenzano, D. R. Microbiome evolution during host aging. PLOS Pathog. 15, e1007727 (2019). 82. Carroll, J. A., Veum, T. L. & Matteri, R. L. Endocrine Responses to Weaning and Changes in Post-Weaning Diet in the Young Pig. Domest. Anim. Endocrinol. 15, 183–194 (1998). 83. Lallès, J.-P. et al. Gut function and dysfunction in young pigs: physiology. Anim. Res. 53, 301–316 (2004). 84. Tanaka, M. & Nakayama, J. Development of the gut microbiota in infancy and its impact on health in later life. Allergol. Int. 66, 515–522 (2017). 85. Köhler, C.-D. & Dobrindt, U. What defines extraintestinal pathogenic Escherichia coli? Int. J. Med. Microbiol. 301, 642–647 (2011). 86. Koo, H., Hakim, J. A., Crossman, D. K., Lefkowitz, E. J. & Morrow, C. D. Sharing of gut microbial strains between selected individual sets of twins cohabitating for decades. PLOS ONE 14, e0226111 (2019). 87. Siranosian, B. A., Tamburini, F. B., Sherlock, G. & Bhatt, A. S. Acquisition, transmission and strain diversity of human gut-colonizing crAss-like phages. Nat. Commun. 11, 280 (2020). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

88. Vatanen, T. et al. Genomic variation and strain-specific functional adaptation in the human gut microbiome during early life. Nat. Microbiol. 4, 470–479 (2019). 89. Garud, N. R., Good, B. H., Hallatschek, O. & Pollard, K. S. Evolutionary dynamics of bacteria in the gut microbiome within and across hosts. PLOS Biol. 17, e3000102 (2019). 90. Bian, G. et al. Age, introduction of solid feed and weaning are more important determinants of gut bacterial succession in piglets than breed and nursing mother as revealed by a reciprocal cross-fostering model: Gut bacterial succession in piglets. Environ. Microbiol. 18, 1566–1577 (2016). 91. Frese, S. A., Parker, K., Calvert, C. C. & Mills, D. A. Diet shapes the gut microbiome of pigs during nursing and weaning. Microbiome 3, 28 (2015). 92. Thompson, C. L., Wang, B. & Holmes, A. J. The immediate environment during postnatal development has long-term impact on gut community structure in pigs. ISME J. 2, 739 (2008). 93. Kollmann, T. R., Kampmann, B., Mazmanian, S. K., Marchant, A. & Levy, O. Protecting the Newborn and Young Infant from Infectious Diseases: Lessons from Immune Ontogeny. Immunity 46, 350–363 (2017). 94. Yu, J. C. et al. Innate Immunity of Neonates and Infants. Front. Immunol. 9, 1759 (2018). 95. Rinninella et al. Food Components and Dietary Habits: Keys for a Healthy Gut Microbiota Composition. Nutrients 11, 2393 (2019). 96. Heinritz, S. N., Mosenthin, R. & Weiss, E. Use of pigs as a potential model for research into dietary modulation of the human gut microbiota. Nutr. Res. Rev. 26, 191–209 (2013). 97. Luo, C. et al. ConStrains identifies microbial strains in metagenomic datasets. Nat. Biotechnol. 33, 1045–1052 (2015). 98. Kamada, M. et al. Whole Genome Complete Resequencing of Bacillus subtilis Natto by Combining Long Reads with High-Quality Short Reads. PLoS ONE 9, e109999 (2014). 99. Sharon, I. et al. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Res. 23, 111–120 (2013). 100. Šmajs, D. et al. Experimental Administration of the Probiotic Escherichia coli Strain Nissle 1917 Results in Decreased Diversity of E. coli Strains in Pigs. Curr. Microbiol. 64, 205–210 (2012). 101. Valeriano, V. D. V., Balolong, M. P. & Kang, D. K. Probiotic roles of Lactobacillus sp. in swine: insights from gut microbiota. J. Appl. Microbiol. 122, 554–567 (2017). 102. Shimazu, T. et al. Immunobiotic Lactobacillus jensenii Elicits Anti-Inflammatory Activity in Porcine Intestinal Epithelial Cells by Modulating Negative Regulators of the Toll-Like Receptor Signaling Pathway. Infect. Immun. 80, 276–288 (2012). 103. Deng, J., Li, Y., Zhang, J. & Yang, Q. Co-administration of Bacillus subtilis RJGP16 and Lactobacillus salivarius B1 strongly enhances the intestinal mucosal immunity of piglets. Res. Vet. Sci. 94, 62–68 (2013). 104. Roselli, M. et al. The novel porcine Lactobacillus sobrius strain protects intestinal cells from enterotoxigenic Escherichia coli K88 infection and prevents membrane barrier damage. J. Nutr. 137, 2709–2716 (2007). 105. Lessard, M. et al. Administration of Pediococcus acidilactici or Saccharomyces cerevisiae boulardii modulates development of porcine mucosal immunity and reduces intestinal bacterial translocation after Escherichia coli challenge1, 2. J. Anim. Sci. 87, 922 (2009). 106. Deriu, E. et al. Probiotic bacteria reduce salmonella typhimurium intestinal colonization by competing for iron. Cell Host Microbe 14, 26–37 (2013). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

107. Hancock, V., Dahl, M. & Klemm, P. Probiotic Escherichia coli strain Nissle 1917 outcompetes intestinal pathogens during biofilm formation. J. Med. Microbiol. 59, 392–399 (2010). 108. Jin, L.-Z., Marquardt, R. R. & Baidoo, S. K. Inhibition of enterotoxigenic Escherichia coli K88, K99 and 987P by the Lactobacillus isolates from porcine intestine. J Sci Food Agric 6 (2000). 109. Rangan, K. J. et al. A secreted bacterial peptidoglycan hydrolase enhances tolerance to enteric pathogens. science 353, 1434–1437 (2016). 110. Kim, P. I. et al. Probiotic properties of Lactobacillus and Bifidobacterium strains isolated from porcine gastrointestinal tract. Appl. Microbiol. Biotechnol. 74, 1103–1111 (2007). 111. Kaoutari, A. E., Armougom, F., Gordon, J. I., Raoult, D. & Henrissat, B. The abundance and variety of carbohydrate-active enzymes in the human gut microbiota. Nat. Rev. Microbiol. 11, 497–504 (2013). 112. Lairson, L., Henrissat, B., Davies, G. & Withers, S. Glycosyltransferases: structures, functions, and mechanisms. Annu Rev Biochem 77, 521–555 (2008). 113. Quinn, T. P. et al. A field guide for the compositional analysis of any-omics data. GigaScience 8, giz107 (2019). 114. Anders, S. & Huber, W. Differential expression analysis for sequence count data. 12 (2010). 115. Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010). 116. Aitchison, J. A Concise Guide to Compositional Data Analysis. 134. 117. Greenacre, M. Variable Selection in Compositional Data Analysis Using Pairwise Logratios. Math. Geosci. 51, 649–682 (2019). 118. Browne, P. D. et al. GC bias affects genomic and metagenomic reconstructions, underrepresenting GC-poor organisms. GigaScience 9, giaa008 (2020). 119. Oshlack, A. & Wakefield, M. J. Transcript length bias in RNA-seq data confounds systems biology. Biol. Direct 4, 14 (2009).

bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

[1] "Correlaon mepoint sample/ mean bins obtained: " number of mepoints per subject mean_bins median_bins 1 1 62.16667 62.5

● 2 2 154.00000 142.0 800 ● 3 3 239.00000 234.5 4 4 297.60000 296.0 5 6 428.56522 433.0 ● ● ● ● 600 ● ● 6 9 516.00000 516.0 ● ● ● ● ● ● ● ● ● ● ● 7 10 538.22222 532.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

400 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 200 ● ● [1] "tot.bins/cohort: mean and median " ● ● ● ● ● ● # A bble: 7 x 6 ●

MAGs obtained per subject MAGs ● ● ● ● ● ● ● ● ● cohort mean_bins median_bins tot_bins ● ● ● ● 0 mean_sampling_frequency median_sampling_frequency 2.5 5.0 7.5 10.0 timepoints available per subject 1 Control 263. 232 7621 4.45 3 2 DScour 425. 437 7654 7.11 6 3 ColiGuard 422. 453 7605 7.11 6 Figure 1. MAGs obtained versus number of time point samples available per subject. 4 Neomycin 374. 372. 8964 5.58 4.5 The number of MAGs obtained per subject doubled with the number of time point samples available 5 NeoD 437. 451 7861 7.06 6 per subject. The gains of MAGs per subject started to diminish after 4 time point samples per subject. 6 NeoC 437. 436 7864 7 6 7 Mothers 63.8 63 2680 1 1 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Partial quality: 14.09% (n=7213) Medium quality: 14.11% (n=7219) Nearly complete: 24.4% (n=12486)

● ● ● ● ●● ● ● ● A B C ● ● ● ● ● ● 10.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● 400 ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● 1500 ● ● ●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ●● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● 7.5 ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ●● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● 300 ● ●● ●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ●● ● ● ● ●● ●●● ● ●●●●●● ●●●● ● ● ● ●●● ● ● ●●●● ●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●●● ●● ● ●● ● ●● ● ●● ● ● ●● ● ● ● ● ● ●●● ● ● ● ●● ● ● ● ●● ●● ● ● ●●●●● ● ●●● ● ● ●● ●● ●● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ●● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ●● ● ● ● ● ● ●● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ●●●● ● ● ● ● ● ● ● ●● ●● ●●●● ● ● ● ● ● ● ●● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1000 ● ●●●● ●● ●● ● ●● ●● ● ● ● ●●●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●●● ●● ● ● ●●●●● ●●●●● ● ●● ● ●● ● ● ●●● ● ●● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ●●●●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ●● ● ●●●●● ● ● ● ●● ●● ● ●● ●● ●● ● ● ● ● ●●● ●● ● ● ● ●● ● ●● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ● ●● ● ● ● ●● ●● ● ●● ● ●● ● ● ● ● ● ● ●● ● ● ●● ●● ● ●● ●● ● ●● ● ●● ● ● ● ● ●● ●● ●● ●● ● ● ● ● ●● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ● ●● ● ●●● ●●●●●● ●● ● ● ●● ●● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ●● ●●● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ●●●●● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ●●●●●● ● ● ●● ● ● 5.0 ● ● ● ●●● ●●● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ● ● ●● ●●●●●● ●● ● ● ●●● ●●● ● ● ● ● ● ● ●● ●●● ● ● ●●● ●● ●● ●● ● ● ●● ●● ●● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ●● ● ● ●● ● ● ● ● ●● ●● ● ● ●● ●● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ●●●● ●● ●●● ● ● ●● ●●●● ●●● ● ● ●● ● ●● ● ● ● ● ● ● 200 ● ● ●●●● ● ●●●●● ● ●●● ● ●● ● ● ●●● ● ●●●● ● ●●●●● ●● ●●●●●● ●●●●● ● ● ● ●●● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ●● ● ● ● ● ●● ● ● ●● ●●●●●● ● ●● ●●● ●● ● ● ● ●● ● ● ● ●● ●● ●● ● ● ● ● ● ● ●● ●● ●●● ●● ● ● ● ● ●● ●●● ● ● ● ●● ● ● ●● ●● ● ● ●● ● ● ● ●●●●●● ● ●● ●● ●●● ●● ●● ● ●● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ●●● ●● ●●●● ●●●●●●●● ●● ● ● ●●●● ● ● ● ● ● ●● ●●● ● ●● ●● ●● ● ●● ● ● ●●● ● ●●● ● ● ● ●●● ● ●●● ● ● ●● ●● ●●● ● ●●●● ● ● ● ●●●● ● ●● ● ●● ●● ●● ● ●● ●●● ● ● ●●●● ●● ● ●●● ●● ●●●●●● ●●●● ● ●● ● ● ●●● ●● ● ●● ● ●● ●● ● ●● ● ●●● ●●● ● ●●● ●● ● ●● ●●●●●● ●●● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ●●●●● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ●● ●● ●●●● ● ●● ●●● ●● ● ●●●● ● ●● ● ● ● ● ● ●● ●● ●● ●●●● ● ●●●● ●● ●●●●● ●●●●●●●● ●●● ●● ● ● ●●●● ●●● ●● ● ● ●● ●●● ●● ● ●●● ● ● ●● ●● ● ● ● ●●● ●● ●● ● ●● ● ● ● ● ● ●●● ●●●● ●●● ●● ●● ●● ● ●● ● ● ● ●● ● ● ● ● ●● ●● ● ●● ● ● ●●● ● ● ●● ● ● ● ●●●● ● ●●● ●●●● ●●●●● ●●●●● ● ●● ● ● ● ● ● ● ●●● ●●● ● ●● ● ●● ● ●● ●●● ●● ●● ●●●●●● ●●●●● ● ● ● ● ● ●●●●● ● ●● ●● ● ● ●● ● ●●● ● ● ●●●●●● ● ●●●● ● ●● ●●●●● ●● ●●●● ● ● ● ● ● ● ● ● ●●● ●● ● ● ●● ● ●●● ● ●●●●●● ●● ●●● ●●●●●●●● ● ● ●● ●● ●●●● ●● ● ●●● ●●●●●●●● ● ● ●●● ● ●● ●●●● ●●● ●●●●●●● ●●●● ●● ● ● ●● ● ●●●● ● ●● ●●●● ● ● ● ● ●● ● ●●● ●●●● ●●●●●● ●● ● ●●●●●● ●●●●●●● ● ● ● ● ● ● ● ●●● ●● ●●● ●●● ●● ●●● ●●●●● ●● ●●● ●●●●●●●● ●●●●●●●●● ●● ●● ●● ●● ● ●●● ●●●● ● ●● ● ● ● ● ● ●● ● ● ● ●●● ● ●● ●●● ● ● ●● ● ●●● ●●●● ● ● ● ● ● ● ● ● ●●● ●● ●●● ●●●● ●●● ●●●●● ● ● ●● ●● ● ● ●●●●●● ●● ●● ● ● ● ●●● ● ● ● ● ● ●● ● ●●●● ● ● ● ● ● ● ● ●●● ●● ●●●● ●●●● ●●●●●● ● ●● ●●●● ●●●● ● ● ● ● ●● ● ●● ● ● ● ●●●● ● ● ●● ●●●● ● ●● ●● ●●●● ●●●●●● ●●●●●●●●●●●●●●●●●● ● ● ●●● ● ● ● ●● ● ●●●● ●● ● ●●●●● ● ●● ● ● ● ●●● ●●●●● ● ● ● ●●●●●●●●●● ●●●●●●●●●●●●●●●●●●● ● ● ●● ●●●● ●● ●●●●●●●● ● ●●● ● ●● ● ●● ●● ●● ●●● ●● ● ● ●● ●● ●● ●●● ● ●●●●● ● ● ● ● ●●●● ● ● ●●● ●●● ● ●● ●●●●● ● ●● ●●●●●●● ● ● ● ●● ● ●●● ●● ●● ●●●●● ● ●●●● ● ● ●● ●●● ● ●●● ●● ●● ●●●●●● ●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●● ●● ●●● ●●●●●●●●● ● ● ●● ●● ● Frequency Frequency ● ●●●● ● ●●● ● ●●● ● ● ●●● ●● ●●●●● ●●● ●● ●●● ●● ●●● ●●● ● ●● ●●● ●●●●●● ●●● ●●● ●●●●●●●● ● ● ● ● ●●●●●●●● ●● ●●● ●●●● ● ● ● ● ● ● ●● ●●●●●●● ●●●● ●● ● ●●●●●● ●●●● ● ● ● ● ● ●● ● ● ● ●●●●●●● ●●● ●● ●●● ● ● ●●● ●● ●●●● ● ● ● ● ● ●●●● ● ●● ● ● ● ●●●●● ●●●● ●●●●●● ● ●●●●●●●● ●●●●●●●●● ●●●●●●● ●●● ●● ● ● ●● ● ●● ● ● ● ●● ●●●● ●● ●●●● ●● ●●● ● ● ● ●●●● ●● ●● ●●●● ● ● ●●● ●● ●●● ●●● ● ●● ●●● ●● ●● ● ● ● ● ● ● ●●●●● ●●●● ● ● ●●●●●● ●● ●●● ● ● ● ● ●●● ● ●● ●●●● ●●● ● ● ●●● ●● ●●●●●● ●● ●● ●●●● ● ● ●●●● ●● ●●●● ● ● ● ●● ● ● ● ●● ● ●●●●●● ●● ●●●● ● ● ● ●●● ●● ● ● ●● ●●●●● ●● ●●●●● ●● ● ●● ●●● ●● ●●● ● ●●●●●●●●●●● ●●● ●●●● ● ●●●● ●●● ●●●●●●●●●●●● ●● ● ● ●● ● ●● ● ●● ● ● ● ● ● ●● ●●●● ●●● ●●● ●●●● ● ●●●● ●● ●● ● ●●● ●●●●●● ●●● ●●● ●● ● ●● ● ● ●●●●● ●●●●●● ● ●●● ●●●●●● ●●●● ●● ●●●●●●●●●●●●●●●●● ●●●● ●●●●● ●●●● ●●●● ●●●●●●●●● ● ●●●●●●●●●●●●●●●●●● ●● ●● ●● ● ●● ●● ●● ●●●● ● ●●●● ● ●●●● ● ●●●● ● ●●● ●●●●●● ●●●●● ●●●●● ● ●●●●● ●● ●● ●●●●● ● ● ●● ● ● ●● ● ● ● ● ● ●●● ● ● ● ●●● ●●●● ●●●●●●● ●●●● ●●● ●● ●●●●●● ● ● ●●●● ● ●●●● ●●● ● ● ●●● ● ●●●● ●● ●● ●●● ● ● ●●●●●● ●● ●●● ●●●●● ●●● ●● ●● ● ●● ● ● ●●●● ●●● ● ●● ●●● ● ● ● ●●●●● ●●● ●● ●●●●●●●●●●●● ●●●●●● ●● ●●●●● ●● ●●●●●● ● ●●●●●●●●●●●●●●●●●●●● ●● ●● ● ● ●●● ● ●●● ●●●●● ● ●●●●●● ●●●●●● ●●●●●●● ●● ●●●●●●●● ●●●●● ●●●●●●●●●●● ●●●●●●● ●●●●●●●●●●●●● ●●● ●●● ●●●●●●●● ●● ●●●●●● ●● ●●● ● ● ● ● ● ● ●● ●●● ●●● ● ●●●● ● ●● ●●●●● ●● ●● ● ● ●●●●● ●●● ●● ●● ●● ●● ●●●●● ● ● ● ● 500 ● ●● ●● ● ● ● ●● ● ● ●●● ●●●●●● ● ●● ●●●●● ●●●●●●●●●●●●●●●●●● ●●●●●● ●●●●●●●● ●●●●●●●●●●● ● ● ● ● ●● ● ● ●● ● ● ●●●●●● ●●●● ●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ● ●● ●●●●●●●●● ●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ● ●●● ● ● ●●●● ● ● ●●●● ●● ● ●●●●●● ●●● ●● ● ●●●●●● ●●●●● ●●●●●●●●●●● ●●● ●●●●●●●● ●●● ● ● ● ●● ●●●●●●●● ● ● ● ●● ●●●● ●●●●● ●●●●● ●● ●●● ●●●●● ●●●● ●●●●●●●●● ●●●●● ● ●●● ●●●●●●●●● ●●●●● ●●●●●● ●●●●●●●● ●●● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ●●●●●●●●● ●●●●●●●●●●●● ●●●●●● ●●●●●●● ● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●● ● ● ● ●● ●● ●●● ● ●●●●●●●●●● ●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ●● ●●●●● ● ●●● ●● ●● ● ● ●● ●●● ● ● ● ● ●●●●●● ●●● ●● ●●●●●●●●● ●●●●● ●●●●●●● ●●●●● ●●● ●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●● ● ●● ●● ●● ●● ● ●●●● ● ●●● ●● ●●●● ● ●● ●●●●●●●●●● ●●●● ● ●●● ●● ●●●●●●●●●●● ●●●● ●●●●●●●●● ●●●●●●● ●● ●●● ●●●●●●●●●●●●●●●●● ● ●●●● ●● ●● ● ● ● ● ●●●●●●● ● ●●● ●● ●●● ●●●●●●●●● ●●●●●●●●● ●●●●●●●● ●●●● ●●●●●●● ● ● ●●●●●●● ●●●● ●●● ● ● ● ●●●●●●● ●●● ●●●●● ●●● ● ●●●●●●● ●●●●●● ●●●●●●●●● ●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●● ●● ●●●● ●●●●●●●●●●● ● ●●● ● ● ●● ●●●● ●● ●● ● ●●●● ●●●● ●●●●● ●●● ●● ● ●● ●●●●●●●● ●●●● ●●●●●●● ● ●●●●●● ●● ●●●● ●●● ● ●● ●●●●●●● ●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●●●● ●●●●●●●●● ●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●● ● ●● ● 2.5 ●●● ●● ●●● ●● ●● ●● ●●●●●● ●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●● ●● ●●●●● ● ● ●● ● ● ● ● ● ●●● ●● ●● ● ● ● ● ● ●● ●●●● ● ● ●● ●● ● ●● ● ● ● ●●●●● ●● ● ● ●● ● ●● ●●● ●● ●● ●●● ●●●● ●● ●●●● ●●●●● ●●● ● ●●●●● ●●● ●●●● ●● ●●●●●●●●● ●●● ●●●● ●●●●●●●●●●●●● ●●●●●●● ●● ● ● ● ●●● ●●● ●●● ●●● ●●●●●● ●●●● ●●●●●●● ●●●●●●●●● ●●●● ●●● ●●●●●● ●●●●●●●●●● ●●●●●●●● ●●●●●●●● ●●●●●●●●● ●●●●●●● ● 100 Contamination (%) ● ● ● ● ●● ● ● ● ● ●● ● ●●● ● ●●●●●●● ●●● ●●● ●●● ● ●● ● ●●●●● ● ● ● ●●●● ● ● ● ●● ●● ● ● ● ● ●● ● ●● ●● ● ●●● ●●●● ●●● ● ● ● ● ● ● ●● ● ●● ●●●●● ● ●●●● ●●● ●●●●●● ●●● ● ●●●● ● ●●●●●●●● ●●● ●●●●●●●●● ●●●●● ●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●● ● ●●●●●●● ● ●●●●● ●●● ● ●●●● ●●●●● ●●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ● ●●● ●●●●● ●● ●● ● ●●●●●●●● ●●●●●●●●●●● ●● ●●●●● ● ●●●●●●●●● ●●● ●●● ● ●●●●●● ● ●●● ● ● ●●● ● ● ●●●●●●● ●● ●● ●●● ●●●● ●●●●●● ●●●●●●●●●●●● ●● ●●●●●●●●●● ● ● ●● ● ●●●● ●●●●●●●● ●●●● ●●●●●●● ●●●●●●●●● ● ●●●●● ●●●●●●●●●● ●●●●●●●●●●●●●●● ●●●● ●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●●● ●● ●●● ●●● ●●●●●●●●●●●●●●●●●●●● ●●●● ●●●●●●●●●●● ●●●●●●●●● ●●● ●●●●●● ●●●●●●●●●● ●●●●●●●●●●●●●●●●● ● ● ● ● ● ● ●● ● ● ●●●● ● ●●●●● ●●●●●●●●● ●● ●●●● ●●● ●●●● ●●●●●●●●●● ●●● ●●●●●●●●●●●●●●●● ●●● ●●● ●● ●●●●●● ●● ● ● ● ● ●● ●●● ●● ●●● ●●●●●●● ● ●●● ●● ●●●●●●●● ●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●●●●●● ● ●●●●● ●● ●●●●●●●●●●● ● ●● ●●●●●●●● ●● ● ● ● ● ●●●●●●●●●●● ●●●● ● ●●●● ●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●●● ●●●● ●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●● ● ● ● ●●●● ● ● ●● ● ● ● ●●●●●● ●● ●●● ●● ●●●●●● ●●●●●●● ●●● ●●● ●●●● ●●●●●●●●●●●●●●●● ●●●● ●●●●●●●●●●●●●●●●● ●●●●●●●●● ● ● ● ● ●●● ●●● ●●●● ●●●●●● ● ●● ●●●●●●●●●●●●●●●●●● ●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●● ●●●●●●●●● ● ●●● ●●● ●●●● ●●●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●● ●● ●●● ●●● ● ●● ●●●●● ● ● ●●●● ●●●●●●●●●● ●●●●●●●●●●●●●● ●● ● ●●●●●●●●●●●●●●●●●● ●●●●●●●● ● ●●● ● ● ● ● ●●● ●●● ●●●● ● ●●●●● ●● ●●●● ●●● ●●● ● ●● ●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●● ● ●● ●●●● ● ● ●● ●●●● ● ● ●●● ● ●●● ● ●● ● ●●●● ●●●●●● ●●●●●● ●●●●● ●●●●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●● ● ●●●●●● ●●●● ●● ●● ●●●● ●●●●●●●● ●●● ●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ● ● ● ● ● ● ● ● ●●●●●● ●● ●●● ●●●●● ●● ●● ● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●● ●●●●●●●●●●● ●●●●●●●● ●● ●●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●● ● ● ●● ● ●●● ●● ●●●●● ● ● ●●●●●●●●●●●●●● ●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●●●●●●●●●●●●●●●●●●● ●●● ● ● ●● ●● ●● ●●●● ●●●●●●●● ●●●●●●●● ●● ●●●●●●●●●●●●●●●● ●● ●●●●●●●●●●● ●●●●●● ●●●●● ●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●● ● ●●● ●● ● ● ●●● ●●●● ●●● ●● ●●●●● ●●●● ●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ●●●●● ●●●●●●●●● ●●●●●●●● ●●●●●●●●●● ●●●●●●● ●●●●●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ● ●●●●●● ●●●●●●●● ●●●●● ●●● ●●●●●● ●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ●●●● ● ● ● ● ●● ●● ●● ● ●● ●●●●● ●●● ●●● ●●●● ●● ●●● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ● ●●●●●●●●● ● ●●●● ●●●●●●●●●●● ● ● ●● ●●●● ●●● ●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●●●●●●● ●●● ●●● ● ●●●●●●● ●●●●●● ●●●●●●●●●●●●●●●●●●●●● ● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●● ●●● ●● ●●● ●● ● ●● ● ● ● ●● ●●● ●●● ●●●●● ●●●●●● ●●●●●●● ●●● ●●●●●●●●●●●●●●●●● ●●●●●●●●●● ●●●●●●●●●●●●●●●● ●● ● ●●●● ●●●●●●● ●●● ●● ●●●●●●●●●●●●●●●●●●●●● ●● ●● ●●●●● ●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●● ● ●● ●●●●●●●●●● ●●●●● ●●●●● ● ● ●● ●●●● ● ●●●●●●●●●● ●●●●●●●●●●●●●●● ●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●● ●●●● ●●● ●●●●●●● ● ●● ●●● ●●●●●●●● ●●●●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●● ●● ● ● ●● ●● ● ●●●●●●●● ●●●●● ●●●● ●● ●●●●●●●●●●●●● ● ●● ●●●●●●● ●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ● ● ●● ●● ●● ●● ●● ● ●●●●●●●●● ● ●●●● ●● ●●●●●● ●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●● ●● ●●● ● ●●●●● ●●●● ●●●●●●● ● ●● ●●● ●●● ● ● ● ●●● ●●●● ●●●●●●● ●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ● ● ● ●●●● ●●●●●●●● ●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●● ● ● ●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ●●●● ● ●●●●●●●●●●● ●● ●●●●●●●● ●●● ●●●●●●●●●●●●●● ●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●●●● ●●●●●● ●● ● ●●●●●●●●●●●●●● ●●●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ●● ●●● ● ●●● ● ●● ● ● ● ● ● ●●● ●●●● ●●● ● ● ●●●●●● ●● ● ●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●● ● ● ● ●● ● ●●● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●●● ●●● ● ●● ●● ●● ● ● ●● ●● ●● ● ●●●●●●● ●● ● ●●●● ● ●●● ● ●● ●●●●●●● ●●●●●●● ●●●●●●●●●●●● ●● ●●●●●● ● ● ●● ● ● ● ● ● ●● ● ● ●●● ● ● ●● ● ● ●● ●●● ● ●●●●●● ●●●●● ●●●●● ●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●● ●● ●● ●●●● ●●● ●●●●●● ●●● ●● ● ● ● ●●● ● ● ● ●●● ● ● ●●●●●●●●●●● ●●● ●●● ●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ● ●●●●● ●●●●● ●●●●●●●● ●●● ●●●● ●●●●●●●●● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ● ● ●● ●●●● ● ●● ● ●● ●●● ●●●●● ●●●●●●●● ●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●● ●● ● ● ● ● ●●●●●● ●●● ● ●●● ● ●●● ●● ●●●● ● ●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●● ●●●●●●●●●●●●●●●●● ●●● ●● ●●● ● ●● ● ● ● ●●●● ● ● ●●● ●● ●●●●●●● ●●● ●●●●●● ●● ● ●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●● ● ●● ●●● ●● ●● ●● ●● ●●●●● ● ●●●●●●●● ●●● ●●●●●●● ●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●●● ● ● ● ●●● ● ●● ● ● ● ●●●●● ●●●●●●●● ●●●● ●●●●● ●● ●●●●● ●●●●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ● ● ●● ●● ●● ● ● ● ●● ● ●●●● ●● ●●●●●●●●●●● ● ●●●●●●●●●●● ●●●●●●●●●● ●●● ●●●●●●●●●●●● ●●●● ● ● ● ●● ●●● ● ● ● ● ●●●●●●●● ●●●● ●●●●●●●● ●●●●●●● ● ● ●●●●●●●●●●●● ●●● ● ●●● ●●●●●●●●● ● 0 0 0.0 ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 0 200 400 600 1e+03 3e+03 5e+03 75 80 85 90 95 100 scaffolds predicted genes Completeness (%)

Figure 2. Quality of MAGs.

Distribution of scaffold counts (A), distribution of predicted genes counts (B), and completeness and contamination rates over MAGs of partial quality (grey), medium quality (pink) and nearly complete MAGs (blue) (C). Partial and medium quality MAGs have a level of completeness between 60 and 80, and between 80 and 90, respectively, and a rate of contamination of < 10, and ≤ 10, respectively. MAGs with a ≥ 90% completeness and a ≤ 5% contamination are considered nearly complete as by the GSC MIMAG standards (Bowers et al, 2017). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 1 Control ControlNeoDNeoD 2 D−Scour ColiGuardControl timepoint DNeoC−ScourControlNeoD ColiGuardNeomycinNeoDNeoDNeomycinControlNeoD ColiGuardControlNeoCColiGuard ColiGuardNeoCNeoCNeoC 2 ColiGuard Neomycin DD−ColiGuardScour−ScourDNeomycinNeoC−Scour ControlNeoC DNeoD−Scour timepointa t0 DNeomycinD−ColiGuardNeoC−ScourScour NeoCNeoDNeoC D−ScourDColiGuardD−NeoD−ScourScourScour D−Scour ColiGuardDD−DNeoDScour−Neomycin−NeomycinScourScourControl NeoD ColiGuardNeoD Control DNeoDControl−Scour NeoDNeoDNeoDNeoC D−ScourD−Scour a t0a t1 0 NeoCColiGuardNeomycin NeomycinNeoCNeomycinD−Scour 0 NeomycinColiGuardNeoC NeoD ColiGuardControl ControlNeomycinDMothers−Scour NeomycinColiGuardNeoCControlNeoC D−Scour D−NeoCScourD−Scour a t1 MothersNeomycinNeomycinNeomycinNeoDNeoC 1 NeoD DNeoD−Scour a t2 ColiGuardNeoD ColiGuardNeomycinNeoCControl ControlDD−NeoC−ScourScour ColiGuard NeomycinNeomycin NeoD NeoDNeoCNeoC D−NeomycinScourNeoD a NeomycinColiGuard 0 NeoC NeoCDD−−ScourScour NeoC NeoDControl t2a ColiGuard ColiGuardNeoD NeoDNeoDColiGuardD−Scour NeoC NeoCNeoD t3 ControlControlControlControl ControlNeoD Control NeoC ControlD−ScourNeoC ColiGuard ColiGuardNeoDColiGuard ControlNeoD Control D−Scour NeoCControl a t3 ColiGuardNeomycin Neomycin ColiGuard ColiGuardControl NeoD a NeoD D−ScourNeoD NeomycinNeomycinColiGuardColiGuardControl t4 −1 NeoC NeomycinNeomycinNeomycin Control D−Scour 0 NeoC Control a ColiGuardNeoC ColiGuard t4 −1 NeomycinNeomycin D−Scour ControlColiGuard NeoCColiGuard NeoC Neomycin DNeoC−Scour D−Scour Neomycin ColiGuardNeoC Control a t5 D−Scour NeomycinNeoDColiGuard NeoD NeoDNeoC ColiGuard a t5 ColiGuardNeoD Control a t6 NeoC ColiGuardControl ColiGuardNeomycin a Control NeoD t6 Neomycin −12 D−Scour D−Scour Neomycin a t7 −2 NeoD a NeoC t7 ColiGuardNeomycin NeoD Control −2 D−Scour a t8 ColiGuard Control a t8 NeoD NeoDNeoC NeoC Control NeoC −2 a DColiGuardControl−Scour a t9 t9 standardized PC2 (7.5% explained var.) PC2 (7.5% explained standardized var.) PC4 (1.2% explained standardized D−Scour Neomycin

standardized PC4 (4.1% explained var.) PC4 (4.1% explained standardized NeoC

standardized PC2 (15.9% explained var.) PC2 (15.9% explained standardized ColiGuard −3 Neomycin a t10a t10

−4 Control −2 −2 −1 −1 0 0 11 a NA standardized PC1 (81.7% explained var.) −1 0 1 2 standardized PC1 (51.0% explained var.) standardized−3 −2 PC3 (4.4%−1 explained0 var.)1 standardized PC3 (7.1% explained var.)

Figure 3. Temporal shift of samples from ANI clustered MAGs and GTDB taxonomically clustered MAGs.

Principal component analysis from 99% ANI clustered MAGs (22.4k) across the first and second principal component (left) and from all GTDB taxonomically clustered MAGs (51.1k) across the third and fourth principal component (right). Prior to PCA, the data was normalized by proportions, the average proportion was computed by time point and treatment cohort, and the centered log transformation was applied. Labels report the treatment cohorts: Control, D-Scour, ColiGuard, Neomycin, NeoD (Neomycin followed by D-Scour), NeoC (Neomycin followed by ColiGuard). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Diversity of GTDB−predicted species in the piglet population

Phascolarctobacterium_A succinatutens C941 sp002476625 Phil1 sp001940855 Prevotella sp000834015 Prevotella sp002251435 CAG−45 sp002299665 Prevotella sp002251295 CAG−110 sp002437585 Lactobacillus amylovorus Solobacterium sp900315065 Prevotella sp000436915 Abundance Blautia_A wexlerae 256 Prevotella copri

Gemmiger formicilis 16 species prausnitzii_D Holdemanella sp002299315 1 Gemmiger sp003476825 Holdemanella biformis Lactobacillus johnsonii Catenibacterium mitsuokai Collinsella sp002391315 Lactobacillus_H reuteri Mogibacterium sp002299625 Eubacterium_R coprostanoligenes CAG−83 sp003487665 Methanobrevibacter_A smithii

t0 t2 t4 t6 t8 t10 NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA Sample

Figure 4. The twenty-six most abundant taxa in the piglet population and their abundance profile over time.

The heatmap reports the log transformed normalized relative abundance from all piglets. Each panel represents one time point and, within each panel, all samples from all piglets are included. At t0 piglets are aged 3 to 4 weeks old; at t10 piglets are aged 8 to 9 weeks old. Samples are rarefied to obtain equal amount of GTDB clustered MAGs per sample, and subsequently the most prevalent taxa, present in at least 20% of all the samples are included. Analysis is performed with PhyloSeq. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Differentially abundant features Differentially abundant features Significance Fold change Prevalence shift Feature AUCs Significance Fold change Prevalence shift Feature AUCs showing top 10 features showing top 10 features

t4 t8 t0 t4 Faecalibacterium prausnitzii_E__1566 Clostridium_P ventriculi__3047

Lactobacillus amylovorus__4345 Corynebacterium xerosis__7061

Ruminococcus_A sp003011855__2218 Clostridium sp000435835__3041

Faecalibacterium prausnitzii_D__1561 Lactobacillus johnsonii__4373

Gemmiger formicilis__1571 Blautia_A obeum__2183

Blautia_A wexlerae__2186 CAG−170 sp002298695__1663

UBA1417 sp003531055__1350 CAG−83 sp000431575__1625

CAG−83 sp003487665__1630 UBA7748 sp900314535__9466

Phil1 sp001940855__1796 Bifidobacterium thermacidophilum__8868

Methanobrevibacter_A smithii__758 Bifidobacterium boum__8866

1E−06 1E−05 1E−04 1E−03 1E−02 1E−01 1E+00 0 6 12 19 26 −3 −1.5 0 1E−1.506 31E− 05 0 251E−04 50 1E 75−03 100 1E −002 0.25 1E−010.5 0.751E+00 1 0 3 6 9 12 −2 −1 0 1 2 0 25 50 75 100 0 0.25 0.5 0.75 1

Abundance (log10−scale) −log10(adj. p value) Generalized fold change PrevalenceAbundance [%] (log10−scale) AU−ROC −log10(adj. p value) Generalized fold change Prevalence [%] AU−ROC

Figure 5. Significantly changing GTDB-species with time.

Shown are the top 10 significantly changing GTDB taxonomically clustered MAGs at the species level, between time points 0 and 4 (2 weeks interval) (left), and between time points 4 and 8 (2 weeks interval) (right). Points show normalized and log -transformed abundance within each subject and GTDB-predicted species. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●● ● ●● ●●●● ● ● ● ●●●● ● ●●●●●● ● ● ● ●● GT89 150 ●●●● ● ●●●●● ● ● CAG−45 sp002299665 ●●●● ● ● ● ●●● GH54 ●●●●● RC9 sp000432655 ●● ● ● ● GH153 ●●●● ● ● Escherichia flexneri ●●● ● ●●● ●●●● ●● ● ● ●●●●● ● ● ● GT44 ●●● GH81 enzyme class Chlamydia suis ● ● CAG−279 sp000437795 ●● ● ● AA ● ● ● 100 ● ● CE ● ● ● ● ● ● ●● ● SLH ● GT87 CBM71 ● An172 sp002160515 ● ● Corynebacterium xerosis ● GH ● ● ● ● ● ● GT ●● ●● ● ● GH104 ● CBM ● ● Escherichia flexneri ● ● ● ● cohesin ● ● GH44 prevalence in the pig population prevalence Ruminococcus_C sp000433635 ● PL ● ● ● CBM44 50 ● CBM21 ● Clostridium_P perfringens ● Ruminococcus_E bromii_A ●● ● ● CBM83 ● Agathobacter sp900317585 ● ● ● ● ● ● ●● CBM79 ● ●Ruminococcus_F champanellensis ●● ● ● ● ● ● CBM75 ●● ●● Ruminococcus_F champanellensis ● ● ● ● ● ●● ● ● ● PL7 ● ● ● Treponema_D berlinense ● ●● ● ● ● ● 0 ● CBM72● Prevotella sp002251435

0 50 100 percentage of top species bearing the enzyme

Figure 6. CAZy enzymes in the pig population and their species representation.

Two-hundred ninety-four carbohydrate active enzymes are represented. The position along the x-axis indicates the relative abundance (as a percentage) of the species with the highest copy number of genes encoding a specific enzyme, while the y-axis represented the prevalence among the pig population (n=126 piglets; n=42 mothers). Taxonomic assignments of MAGs are based on the GTDB. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

A B sampling frequency

Bins distribution over cohorts and piglets (49185) 2.5 5.0 7.5 10.0

29875 14276 14317 29646 14261 29643 29961 29934 29951 7621 7654 7605 8964 7231 7430 2680 800 ●

● 29752 29742 14159 29667 14295 29703 14263 29715 14260

D−Scour NeoC NeoD ● ● ● 29846 ● ● 14194 14287 ● 14262 29644 29765 14320 600 29679 ● ● 14274 29900 14297 ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ●● ● ● ● ● ● 14193 14298 14174 ● ● 29864 14195 29652 ● ● 14172 1427329778 ● ● ● ● 29943 ● ● ● ● ● ● 29645 14171 ● ● ● ● ● ● ● ● ●● ● ● ● ● 400 14205 14233 29792 29914 14186 29931 29691 29754 14201 ● 29745 ● ● ● ● ● ● 29655 ●● ● ● ● ● ● ● ● "Total bins and sampling frequency per cohort " ● ● ● ● ● ● ● ● 14265 14208 29694 29948 29666 29705 ● 14306 14188 29924 ● ● ● ● cohort mean_bins median_bins tot_bins mean_sampling_frequency ● ● ● ● ● ● ● ● ● ● ● ● ● 200 ● ● median_sampling_frequency number of metagenomes obtained number ● 29753 29847 29668 14187 29654 14184 ● 14308 14183 14209 ● ● ● ● ● ● ● ● 1 Control 262.79310 232.0 7621 4.448276 3.0 ●

● ●● ● ● ●● ● ● ●● ● ● ●● 29631 ● ●●● 14286 ● ● ColiGuard Neomycin Control ● ● ● ●● 14305 29653 ● 29796 ● 14322 ●● 14307 ● ●●● ● 29704 2 DScour 425.22222 437.0 7654 7.111111 6.0 ● ● ●● ● ● ● 0 14160 29912 14271 14161 3 ColiGuard 422.50000 453.0 7605 7.111111 6.0 29718 141922963329797 n=29 n=18 n=18 n=24 n=17 n=17 n=42 14288 14275 14163 29863 29949 29707 1420429945 4 Neomycin 373.50000 371.5 8964 5.583333 4.5 14190 29939 29867 14231 29777 29647 14207 14169 29876 29700 14162 29901 29793 14277 5 NeoD 436.72222 451.0 7861 7.055556 6.0 29913 14200 NeoD NeoC 29743 29693 29946 29898 14182 Control 29692 29804 Mothers D − Scour 14315 29764 Neomycin ColiGuard cohort 6 NeoC 436.88889 436.0 7864 7.000000 6.0 7 Mothers 63.80952 63.0 2680 1.000000 1.0 Supplementary Figure 1. Distribution of bins over subjects and cohorts.

A) In total, mothers have been sampled once, while piglets have been sampled between 1 and 10 times (median: 6.0; mean: 6.18) over a period of 5 weeks. The control and neomycin cohorts, in which 12 and 6 subjects were euthanized during the first week of the study, had a mean sampling of 4.5 and 5.6 time point samples and a mean bin count of 262.8 and 373.5 bins per subject, respectively. In the other cohorts, with a mean of 6.0 time point samples per subject, an average of 430.3 bins per subject was obtained. B) The higher number of time point samples available from the subset group is reflected in the higher number of MAGs obtained from these piglets. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Time-series binning Mapping Pooled assembly Cleaning RAW READS MetaBat2 bwa mem MEGAHIT bbduk samtools

Taxonomic assignment Visualization DEPTH BINS KRAKEN GTDB Krona

Protein prediction associate clean contigs to Prodigal bins Bins quality analysis CheckM

Compositional DA analysis Functional analysis analysis Dereplication SIAMCAT PhyloSeq CAZy DBCAN dRep merge

Transform: for contigs falling cohorts info under the same bin: Transform: weighted_depth = merge all into a counts = BINS sum(depth*weight)/ single dataframe (weighted_depth*median(binLength)) COUNTS sum(weights) / 300 [weight = contig length]

Supplementary Figure 2. Workflow of sequencing data processing and analysis. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Distribution of scaffold N50 across nearly complete genomes

400

300

200 Frequency

100

0 1e+04 3e+04 1e+05 3e+05 scaffold N50

Supplementary Figure 3. Distribution of scaffold N50 across nearly complete MAGs.

Nearly complete metagenome-assembled genomes (MAGs) had an average of 132.6 scaffolds (min.: 3.0; 1st Qu.: 79.0; Median: 126.0; Mean: 132.6; 3rd Qu.: 174.0; Max: 567.0) and an average scaffold N50 of 36,276 nucleotides (min.: 6,234; 1st Qu.: 17,261; Median: 25,438; 3rd Qu.: 41,264; Max: 727,772). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

type common unique

Shared versus host unique primary clusters (95% ANI) Shared versus host unique secondary clusters (99% ANI) distribution among piglets distribution among piglets

300 300

200 200 clustered bins clustered bins

100 100

0 0

piglets piglets

Shared versus host unique primary clusters (95% ANI) Shared versus host unique secondary clusters (99% ANI) distribution among mothers distribution among mothers 40 40

Piglets between 4 30 30 and 5 mes more bins 20 20 per subject: clustered bins clustered bins 2680/42 = 63 10 10 47569/126 = 377

0 0

mothers mothers

Supplementary Figure 4. The fraction of MAGs that are unique to individual subjects when clustered at 95% and 99% ANI.

Primary clusters (95% ANI) (A-B) were shared across subjects at a higher rate than secondary clusters (99% ANI) (B-C): 98.6% and 86.2% among piglets (A-C) and 91.4% and 72.1% among the mothers (B-D), respectively, delineating strain specificity. It appears that each individual subject harbors a specific set of ANI clustered MAGs (95% ANI or 99% ANI) that are unique to the subject, as well as a set ANI clustered MAGs (95% ANI or 99% ANI) that are common among subjects. A larger fraction of unique ANI clustered MAGs (either at 95% or 99% ANI) is seen among the mothers than among the piglets. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

freq 200 400 600 GTDB− MAGs assignments agreement with primary clusters (95% ANI)

species ● ● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 82.57%

genus ●● ●● ●●●● ● ●● ●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 89.2%

family ● ● ●●● ● ●●●●●●●●● ● ●●●●●● ●●●●●● ●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 94.44%

order ● ● ● ● ●● ●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 97.04%

GTDB_taxa_level class ● ● ● ●●●●● ●●● ●● ● ●●● ●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 98.56%

phylum ● ● ● ●●●●● ●●● ●● ●●● ●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 98.58%

0.00 0.25 0.50 0.75 1.00 1.25 agree GTDB− MAGs assignments agreement with secondary clusters (99% ANI)

species ● ● ●● ●●●●●●●●●●●●●●● ●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 94.36%

genus ● ● ●● ●● ● ●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 96.69%

family ● ● ● ● ●● ●●●●● ● ●●●●●● ● ● ●●●●●●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 98.34%

order ● ● ● ● ●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 99.1%

GTDB_taxa_level class ● ● ● ● ●● ● ●●● ●● ● ●● ●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 99.6%

phylum ● ● ● ● ●● ● ●●● ●● ●● ●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 99.61%

0.00 0.25 0.50 0.75 1.00 1.25 agree

Supplementary Figure 5. Extent of agreement between dRep ANI clustering and GTDB taxonomic clustering.

Extent of agreement is shown between GTDB taxonomic clustering and dRep- primary clusters (95% ANI) (top) and secondary clusters (99% ANI) (bottom). Color coding represents MAG frequency (lower: dark; higher: light) bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

[1] " Phyla distribuon : " taxa n perc 1 Firmicutes 7542 63.60 2 Bacteroidetes 1563 13.18 3 Tenericutes 1090 9.19 Bacillus subtilis Bifidobacterium bifidum Angelakisella sp003453215 4 Acnobacteria 801 6.75 Enterobacter himalayensis Enterobacter himalayensis Enterobacter himalayensis 5 Proteobacteria 255 2.15 Enterococcus_B faecium_B Enterococcus_B faecium_B Escherichia flexneri 6 Euryarchaeota 244 2.06 Escherichia flexneri Lactobacillus delbrueckii F23−B02 sp001916715 7 Spirochaetes 210 1.77 Lactobacillus_B salivarius Lactobacillus helveticus Lactobacillus_B salivarius 8 Chlamydiae 90 0.76 Lactobacillus_F plantarum Lactobacillus_C rhamnosus Lactobacillus_F plantarum 9 Synergistetes 63 0.53 Pseudomonas aeruginosa Lactobacillus_F plantarum Oscillibacter sp900115635 10 Verrucomicrobia 1 0.01 Staphylococcus aureus PeH17 sp001940845 Pseudomonas aeruginosa

Staphylococcus epidermidis Streptococcus thermophilus [1] " Order distribuon : " ColiGuard taxa n perc MockCommunity D−Scour 1 Clostridiales 6510 56.37

2.0e+07 2 Bacteroidales 1487 12.88 3 Erysipelotrichales 942 8.16 6e+06 4 Coriobacteriales 628 5.44 1.5e+07 5 Lactobacillales 625 5.41 1.5e+07 6 Selenomonadales 387 3.35

4e+06 7 Methanobacteriales 205

1.0e+07 1.78 1.0e+07 8 Spirochaetales 206 1.78 Relative abundance Relative Relative abundance Relative

2e+06 9 Bifidobacteriales 98 0.85 Relative abundance Relative 10 Enterobacteriales 96 0.83 5.0e+06 5.0e+06 11 Chlamydiales 90 0.78 MockCommunity_R1 MockCommunity_R2 MockCommunity_R3 MockCommunity_R4 MockCommunity_R5 MockCommunity_R6 MockCommunity_R7 MockCommunity_R8 MockCommunity_R9

0e+00 12 Acnomycetales 69 0.60

ColiGuard_R1 ColiGuard_R2 ColiGuard_R3 ColiGuard_R4 ColiGuard_R5 ColiGuard_R6 ColiGuard_R7 ColiGuard_R8 13 Synergistales 63 0.55 D − Scour_R1 D − Scour_R2 D − Scour_R3 D − Scour_R4 D − Scour_R5 D − Scour_R6 D − Scour_R7 D − Scour_R8 0.0e+00

0.0e+00 14 Desulfovibrionales 52 0.45 15 Aeromonadales 48 0.42 16 Campylobacterales 21 0.18 Replicate Replicate Replicate 17 Burkholderiales 12 0.10 18 Bacillales 4 0.03 Supplementary Figure 6. Taxonomic profile of the positive controls. 19 Pseudomonadales 3 0.03 20 Pasteurellales 1 0.01 Taxonomic profile is obtained from PhyloSeq analysis of GTDB clustered MAGs of 21 Rhodocyclales 1 0.01 the positive controls samples. The taxonomic profile within each replicate is 22 Verrucomicrobiales 1 0.01 displayed in relative abundance. Each replicate sample is normalized by median sequencing depth. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Phyla distribution from all MAGs (GTDB) − most common Phyla distribution from all MAGs (GTDB) − least common [1] " Phyla distribuon : " taxa n perc 1 Firmicutes 7542 63.60 Campylobacterota Synergistota Cyanobacteria 0.22 % 0.22 % 2 Bacteroidetes 1563 13.18 0.47 % 3 Tenericutes 1090 9.19 Firmicutes 14.64 % 4 Acnobacteria 801 6.75 5 Proteobacteria 255 2.15 Firmicutes_A 54.52 % Verrucomicrobiota_A Patescibacteria 6 Euryarchaeota 244 2.06 0.21 % 0.2 % 7 Spirochaetes 210 1.77 8 Chlamydiae 90 0.76 Planctomycetota 0.42 % 9 Synergistetes 63 0.53

Actinobacteriota others 10 Verrucomicrobia 1 0.01 Thermoplasmatota 4.07 % 2.73 % Verrucomicrobiota NA 0.14 % 0.13 % 0.09 % [1] " Order distribuon : " taxa n perc Firmicutes_C Spirochaetota Desulfobacterota_A Fusobacteriota Bacteroidota 1.29 % Myxococcota 2.62 % 0.34 % Firmicutes_B 0.02 % 1 Clostridiales 6510 56.37 17.96 % Fibrobacterota 0.04 % 0.13 % 0.06 % Eremiobacterota 0.01 % 2 Bacteroidales 1487 12.88 Proteobacteria Euryarchaeota Elusimicrobiota 0.73 % Thermotogota 1.44 % 0.02 % 0.01 % 3 Erysipelotrichales 942 8.16

Genus distribution from all MAGs (GTDB) − 50 most common Species distribution from all MAGs (GTDB) − 50 most common 4 Coriobacteriales 628 5.44 Oscillospirales Lachnospirales RF39 Bacteroidaceae Lactobacillaceae 5 Lactobacillales 625 5.41 6 Selenomonadales 387 3.35 Prevotella Prevotella sp002251435 0.99 % Prevotella Gemmiger Gemmiger 7 Methanobacteriales 205 Gemmiger sp000436915 sp002251365 Lactobacillus Dorea formicilis 1.02 % sp003476825 CAG−110 2.7 % CAG−83 2.46 % 2.04 % UBA3789 0.91 % 0.8 % amylovorus Blautia_A 3.14 % 1.64 % 0.52 % Lactobacillus_H 1.78 2.28 % 0.89 % reuteri 0.76 % Faecalibacterium Gemmiger Prevotella Prevotella CAG−45 UBA2856 prausnitzii_D variabile Lactobacillus 8 Spirochaetales 206 1.78 copri 0.95 % sp002251385 F23−B02 0.67 % 0.64 % 0.66 % 0.44 % johnsonii 0.61 % 1.33 % Oscillibacter CAG−170 Agathobacter CAG−302 0.76 % 9 Bifidobacteriales 98 0.85 1.17 % 1.08 % 1.33 % CAG−74 CAG−508 UBA932 Faecalibacterium 1.33 % Prevotella 1.7 % CAG−103 Prevotella sp002251295 Prevotella RC9 10 Enterobacteriales 96 0.83 Eubacterium_H CAG−1000 sp002251245 UBA1777 0.57 % Eubacterium_EFaecalicatena sp000834015 0.95 % 0.64 % sp000434935 0.76 % 0.61 % 0.56 % 0.73 % 0.5 % UBA11524 UBA2862 CAG−269 CAG−245 1.28 % Eubacterium_R sp000437595 sp001916055 11 Chlamydiales 90 0.78 sp000435175 0.69 % 0.88 % Angelakisella Christensenellales Lactobacillales Oscillospiraceae sp900317565 0.7 % 0.5 % 0.48 % 0.57 % 0.65 % RC9 DTU024 OEMS01 CAG−269 12 Acnomycetales 69 0.60 UBA2862 sp000431335 sp000432655 Ruminococcus_E 0.81 % Oscillibacter sp900199405 Butyricicoccus_A Intestinimonas CAG−110 UBA1777 0.54 % 0.46 % 0.45 % 1.58 % ER4 1.19 % 0.71 % 0.53 % 1.18 % sp900066435 Lactobacillus 2.19 % sp003150355 Enterobacteriaceae 13 Synergistales 63 0.55 UBA2943 sp002437585 Anaerovoracaceae CAG−138 CAG−302 Bacteroidales 0.85 % 0.53 % 1.28 % 0.7 % CAG−110 UBA7102 PeH17 14 Desulfovibrionales 52 0.45 sp002362145 UBA1191 1.06 % 0.5 % Lactobacillus_H 1.57 % Phil1 CAG−302 Escherichia CAG−145 Actinomycetales Treponematales 0.53 % sp900066305 Erysipelotrichales sp001940855 sp001916775 flexneri 15 Aeromonadales 48 0.42 TANB77 CAG−83 F23−B02 sp002320005 0.64 % 0.48 % 0.68 % 0.61 % 0.58 % sp003487665 sp002472405 ER4 Acidaminococcaceae Acutalibacteraceae Muribaculaceae 16 Campylobacterales 21 0.18 sp000765235 CAG−611 1.15 % 0.51 % 0.48 % Holdemanella Solobacterium Bifidobacterium Treponema_D 0.9 % Lachnospiraceae CAG−877 C941 17 Burkholderiales 12 0.10 0.51 % 0.83 % 0.82 % Eubacterium_R Acidaminococcales Methanobacteriales Enterobacterales Dorea sp000433455 sp002476625 CAG−269 1.94 % Phascolarctobacterium_A coprostanoligenes formicigenerans RC9 3.55 % Peptostreptococcales succinatutens 0.97 % 0.57 % 0.48 % 0.48 % 18 Bacillales 4 0.03 Sphaerochaetaceae Phascolarctobacterium_A 0.49 % Erysipelotrichaceae Clostridiaceae CAG−1000 0.98 % Escherichia Dorea Blautia_A 19 Pseudomonadales 3 0.03 Clostridiales 0.66 % Holdemanella Clostridium UBA636 CAG−1000 UBA5920 CAG−485 C941UBA1191 Coriobacteriales Blautia_A longicatena massiliensis CAG−145 biformis sp002299675 sp000435835 sp000434555 sp002406055 20 Pasteurellales 1 0.01 Prevotella 10.05 % 0.8 % 0.65 % 0.9 % 0.71 % Collinsella 0.64 % wexlerae 1.27 % 0.5 % 0.47 % 0.46 % 0.43 % 0.5 % 0.46 % 0.43 % 21 Rhodocyclales 1 0.01 22 Verrucomicrobiales 1 0.01 Supplementary Figure 7. Taxonomic composition of the piglet gut.

Tree maps were built from MAGs taxonomically clustered with GTDB, without scaling for the relative abundance of each MAG. This accounts for prevalence of the MAG in the pig population (e.g. if the same taxon occurs in multiple piglets it can get counted more than once). The most common Phyla (top left), the least common Phyla (top right), the 50 most common Genera colored by Order (bottom left), and the 50 most common Species colored by Family (bottom right) are shown. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

date ● t0 ● t1 ● t2 ● t3 ● t4 ● t5 ● t6 ● t7 ● t8 ● t9

A 1.0 ● C ● E ● ● 1.0 ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ●●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1 ● ● ● ● ● ●● 0.5 ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● 0.5 ● ● ● ● ● ● ●●● ●● ●● ●●●● ● ● ● ●● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ●●● ●● ● ● ● ●●●● ●●●●● ● ● ● ● ● ● ● ● ● ●●● ● ●● ● ● ●●●●● ● ● ● ●● ● ●●●●●●● ● ● ● ● ●●● ●● ● ●●● ● ● ● ● ●●●●●●●●● ● ● ●● ●●●●● ● ●●●●● ● ● ● ●●● ● ● ●● ● ●●●●●●● ●●●●● ● ● ● ● ●● ● ●● ●●●● ● ● ●● ● ● ●● ● ● ●●●● ●●●●●●●●●●●●●● ● ● ● ● ●● ●●● ●● ●●●●●●● ●● ● ● ● ● ● ● ●●●● ●●●●●●● ●● ● ● ● ● ● ●●●●●●●● ● ● ●● ● ●● ●●●●●●●●●●● ● ● ●● ● ●● ●●●● ● ●●●●● ● ● ● ● ● ●●●●● ● ● ● ●●●●●●●●● ●●●●●● ● ● ● ●● ●●●●● ●●●●●●●●●●● ● ● ● ● ● ●● ● ●● ● ●● ● ●●●●●●●●●●●●●● ●● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●● ● ● ● ● ●● ●●● ● ● ● ● ●●●●●●●●●●●●●● ● ●● ●●●● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●● ● ● ●●●●●● ●● ● ● ● ●●● ● ●●●●●●●●●●● ●● ● ●●● ● ●● ● ●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●● ● ● ●● ●● ●●●●●●●●●●●●●●● ● ●●● ● ●●●●●●●●●●●●●●●●●●●●●●● ●● ● ●●●●●●●●● ●●● ● ● ●●●●●●●● ●●●●●●●●●●●●●●●● ●● ● ●●● ● ●●●●●●●●●●●●●●●●●●●●●●●● ●● ● ●●●●●●●●●●●●●●●●● ●● ● ● ● ●●● ●●●●●●●●●●●●●●●●●●●●●●● ● ● ●●● ●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●● ● ● 0.0 ● ● ● ● ●●●●●●●●●●●●●●● ● ● ● ● ●●●● ●●●●● ●●●●●●●●●●●●●●●●● ● ● ●●●●●●●●●●●● ● ●● ●●●●●●●●●●●●●●●●●●● ● ● ● 0.0 ● ●● ●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●● ● ● ● ●●●● ●●●● ●● ● ● ●●●●●●●●●●●●●●●●●●●●●● ●● ● ● ● ●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ●● ●●●●●●●●●●●●●●●●●● ● ●● ●●● ●●●●●●●●●●●●● ● ● ● ● ●●●●●●●● ●●●●●●●●●●●●●●●●●●● ●● ● ●● ●●●●●●●●●●●●●●●●●●●●● ● ● ●● ●● ● ● ●●●●●●●●●● ● ● ●● ● ● ●● ●●●●● ●●●●●●●●●●●●●●● ● ● ● ●● ●●●●●●●●●●●●●●● ● ● ●●●●●●● ●●●● ●●●●●●●●●●●●●● ● ● ●●● ●●●●●● ●●● ●●●●●●●●●● ●●● ● ● ● ●●● ● ●●●●●●●●●●●●●●●●●● ● ●●●●● ●●● ●●●●●●●●●●●● ● NMDS2 ● ●●●● ● ●●●●●● ●●● NMDS2 ● ● ● ●● ●● ●●●●●●●●●●●●●●● ● NMDS2 ● ● ●●●●●●●●●●●●●● ● ●● ●● ●● ●●●●●●●● ● 0 ● ● ● ●●● ●●●● ●●●●●●●●●●●● ●●● ● ● ● ●●●● ●●●●●●●●●●●● ● ● ● ●● ●●●●● ●● ●●●●● ● ●●●●● ●●●●●●●●●●●●●●●●● ●● ●● ● ●● ● ●●●●●●●●●● ● ● ● ●●● ● ●●●●●●●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ●●●● ●● ● ●●●●●●● ● ● ●●●● ●●●●●●●●●●●● ●● ●●● ●● ●●●●●●●●●●●●●●●●●● ● ● ● ●● ● ●●●●●●●● ●● ● ● ● ● ●●●●●●●●●●●●●●●● ● ● ● ●● ●●● ● ●● ●●●●●●●●●●●●●● ● ●● ●●● ● ● ● ● ● ●● ● ● ●●●●●●● ●●● ● ● ● ● ●●●●● ●●● ●●●●●●●●●●●●●●●● ● ● ●●● ● ● ● ● ● ●●●●●●● ● ●●● ● ● ● ● ●●● ●●●●●●● ●●●●●●●●●●●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●●●● ● ● ● ● ●●●●●●●●●●●●● ●● ●●●●●●●●● ● ● ● ● ● ● ● ●●● ● ●●●●● ● ● ● ●●● ● ● ● ●● ● ● ● ●● ● ●●●●● ● ●●●●● ● ● ● ● ● ●●●●● ● ● ● ● ● ● ● ● ● ● ●●●● ● ●●● ●●●● ●●● ● ● ● ● ● ● ● ● ● ● ●●●● ●●●●●●●● ● ●● ● ● ● ● ● ● ●● ●●●●●● ●● −0.5 ● ● ● ● ● ● ●●● ●● ●●● ● ●●● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ●● ● ● ● ●● ●● ● ●● ●● ●● ● ● ● ● ● −0.5 ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −1 ● ● ● −1.0 ● ●● ● ● ● ● ● −1.0 −1.0 −0.5 0.0 0.5 −2 −1 0 1 −2 −1 0 NMDS1 NMDS1 NMDS1

cohort ● Control D−Scour ColiGuard Neomycin NeoD NeoC B D F ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ●● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●●●● ● ● ●●●● ● ●● ● ●●●●● ● ● ● ● ● ● ● ●● ● ●● ●● ● ●● ● ● ● ● ● ● ●●●●● ● ●●●●● ● ● ● ●● ●●●●● ● ● ●● ●●● ● ●● ●● ● ● ●● ● ●●● ●● ●

● ●

Supplementary Figure 8. Temporal shift of MAGs from PhyloSeq analysis.

NMDS analysis (top) and network analysis (bottom) with PhyloSeq from nearly complete- CheckM clustered MAGs (12.4k) (A-B), 99% ANI clustered MAGs (22.4k) (C-D), and GTDB clustered MAGs (51.2k) (E-F). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

3 2 NeoC D−Scour D−Scour NeomycinColiGuardNeoD ControlNeoCControl timepoint Neomycin 2 DNeoD−Scour NeoCColiGuard Control a t0 Control NeoC 1 D−Scour NeomycinControl ColiGuard a t1 DNeoC−Scour ColiGuard ColiGuard NeoDNeoD ColiGuard ColiGuard 1 D−Scour NeoDColiGuard D−Scour Neomycin ColiGuard a Control NeomycinColiGuard t2 NeoC ColiGuard NeoDColiGuardNeoC ColiGuard NeoD a t3 D−ScourD−Scour Neomycin Neomycin D−Scour 0 ColiGuard D−Scour NeoCNeoD Neomycin 0 ControlNeoD NeoD a t4 NeoC NeoD ControlNeomycin Control ControlNeomycin NeomycinColiGuard DDNeoC−Scour−Scour D−Scour Control a D−Scour Control t5 D−Scour ColiGuardNeoD Neomycin NeoC D−Scour NeoCNeoDNeoC NeomycinNeoC NeoC DColiGuard−NeoCScourNeoC NeoD D−NeoCScour NeoD a t6 ColiGuardColiGuardNeomycinNeoD −1 NeoC Neomycin ControlControl Control NeoC Neomycin NeomycinNeomycinControl NeoDD−Scour −1 ColiGuardNeomycinColiGuardControlNeoD NeoC a t7 ControlColiGuard NeomycinNeoDControl a t8 −2 NeoD a t9 standardized PC2 (6.9% explained var.) PC2 (6.9% explained standardized var.) PC4 (1.1% explained standardized D−NeoDScour 1 Control a t10 NeoCControl −2 −2 −1 0 1 −3 ColiGuard DScourNeoC standardized PC1 (86.8% explained var.) −2 −1 0 1 2 NeoDNeoD Control Neomycin 2standardized PC3 (1.3% explained var.) timepoint NeomycinColiGuardNeoDDScourDScour DScour ColiGuardDScourColiGuardDScourControl ColiGuardNeoD DScourControlNeomycinControl ColiGuardNeomycin DScourNeoDNeoCNeoC a t0 NeoDNeomycinColiGuard 0 DScour1 NeoD NeoC NeomycinControlNeoDControlControl D−Scour NeomycinColiGuardNeoCControlNeoDNeoC NeoDNeoD 2 timepoint NeoC D−ScourControlNeoDControl a ColiGuardNeomycinNeomycinControlNeoCNeoD t1 NeoCColiGuardColiGuardNeoC NeoC ColiGuardNeoC 1 Control Neomycin a D−ColiGuardScourNeoCNeomycinNeoC NeoCDNeoD−ScourNeoC t0 D−ScourDD−−ScourScour ColiGuard NeoD ColiGuardDNeoD−NeomycinScour ColiGuardNeoD Control DControl−Scour NeoCDScour Control a t2 Control NeoD NeoD D−ScourDScourD−Scour a t1 ColiGuard NeoCColiGuard NeomycinNeomycinNeoCNeomycinD−ScourColiGuard 0 DScour NeomycinNeoC DScourNeoD ControlNeoDNeomycinDMothers−Scour NeoD NeoD D−Scour D−NeoCScourColiGuardD−ScourControl Neomycin MothersNeomycin DScourNeoD DNeoD−Scour a t2 a ColiGuardNeomycinColiGuardNeoCControl DScour t3 NeoC ControlNeoC Neomycin NeomycinNeomycinNeomycin NeoD 0 DScourNeoC NeoDDScourNeomycinNeoD Neomycin −1 ColiGuard ColiGuardNeoC NeoCNeoDControlNeomycin a NeoCt3 NeoDControl ColiGuardControl NeoDColiGuardNeoC NeoD 0 NeoC ColiGuardColiGuard NeoCNeoD a Control ColiGuardColiGuard NeomycinControl Control t4 ColiGuardNeoD Control ColiGuardNeomycin a t4 Neomycin Control Control D−Scour NeoC DScour NeoC NeoD NeoC ColiGuard −1 Neomycin ColiGuard ColiGuard D−NeomycinScour NeomycinControlDNeoC−Scour a a DScourNeoDNeoC ColiGuardNeoC t5 t5 Neomycin Neomycin NeoD Control ColiGuardNeoD NeoD Control a t6 NeoC ColiGuard a Control −−21 t6 ColiGuardD−Scour a t7 NeoD NeoCDScour −2 NeomycinColiGuardNeomycin −2 DScour NeoD a t8 a t7

ControlNeoDNeoC NeoD ColiGuard Control a t9 a D−Scour NeoC t8 standardized PC4 (4.1% explained var.) PC4 (4.1% explained standardized NeomycinControl standardized PC2 (15.9% explained var.) PC2 (15.9% explained standardized ColiGuard −2 Neomycin a t10 NeoC NeoD −4 Control a t9 DScour var.) PC4 (2.5% explained standardized a NA standardized PC2 (12.0% explained var.) PC2 (12.0% explained standardized −2 ColiGuardControl−1 0 1 NeoC −3 standardized PC1 (51.0% explained var.) −3 −2 −1 0 1 a t10 standardized PC3 (7.1% explained var.) −1 0 1 2 −2 −1 0 1 1 standardized PC3 (8.0% explained var.) standardized PC1ColiGuard (66.7% explained var.) NeoCNeoD NeoDControl NeoC 2 D−ScourDNeomycin−Scour ColiGuardControlNeoC timepoint DColiGuard−Scour DColiGuardD−NeoD−ScourScour D−Scour D−DScour−NeomycinScourControl NeoD NeoD NeoDNeoC a t0 0 NeomycinColiGuard Control ColiGuardNeomycinColiGuardNeoCControlNeoC a t1 NeomycinNeomycinNeoDNeoC ColiGuardNeoD 1 DD−−ScourScour ColiGuard NeoDNeoCNeoC D−Scour a NeomycinColiGuard NeoC DD−−ScourScour t2 NeoD NeoDNeoD NeoCD−Scour ControlColiGuardControlControlControl Control NeoC ControlD−ScourNeoD Control ColiGuard NeoD NeoC a Neomycin D−Scour ColiGuard NeoD t3 NeomycinD−Scour Control Control −1 NeomycinNeomycinNeoD NeomycinNeomycinColiGuard 0 NeoC Control a ColiGuardNeoC t4 D−Scour ControlColiGuard NeoCColiGuard NeoC Neomycin ColiGuard D−Scour NeomycinNeoD NeoDNeoC a t5 ColiGuardNeoD ColiGuardControl Neomycin a NeoD t6 Neomycin −1 D−Scour Neomycin −2 a NeoC t7 NeoD Control D−Scour ColiGuard Control a t8 NeoCNeoD NeoC −2 DColiGuardControl−Scour a t9 standardized PC2 (7.5% explained var.) PC2 (7.5% explained standardized var.) PC4 (1.2% explained standardized Neomycin NeoC −3 Neomycin a t10 −2 −1 0 1 standardized PC1 (81.7% explained var.) −1 0 1 2 standardized PC3 (4.4% explained var.)

Supplementary Figure 9. Temporal shift of samples from CheckM clustered MAGs, dRep- ANI clustered MAGs, and GTDB clustered MAGs.

Principal component analysis from nearly complete CheckM taxonomically clustered MAGs (12.4k) (top), ANI clustered MAGs (22.4k) (middle), and GTDB taxonomically clustered MAGs (51.2k) (bottom), across the first and the second (left), and the third and fourth principal components (right). Prior to PCA, the data was normalized by proportions, the average proportion was computed by time point and treatment cohort, and the centered log transformation was applied. Labels report the treatment cohorts: Control, D-Scour, ColiGuard, Neomycin, NeoD (Neomycin followed by D-Scour), NeoC (Neomycin followed by ColiGuard). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Diversity of GTDB−predicted species in the piglet population

ER4 sp000765235 PeH17 sp000435055 CAG−110 sp003525905 Phil1 sp001940855 Intestinimonas butyriciproducens UBA1191 sp002310095 UBA1777 sp002320035 OEMS01 sp900199405 UBA738 sp003522945 RC9 sp000433355 CAG−1024 sp000432015 RC9 sp000434935 RUG563 sp900320085 Oscillibacter sp900066435 CAG−170 sp002404795 UBA636 sp002299675 UBA1777 sp003150355 UBA2862 sp900317565 CAG−145 sp002320005 UBA11524 sp000437595 Prevotella sp002251435 CAG−302 sp001916775 RC9 sp000432655 Methanobrevibacter_A gottschalkii CAG−110 sp002362145 CAG−83 sp900313295 Prevotella sp002251365 UBA5920 sp002406055 Dorea longicatena F23−B02 sp002472405 CAG−1000 sp000434555 Prevotella sp002251245 CAG−45 sp002299665 UBA1191 sp900066305 CAG−110 sp002437585 RC9 sp900320185 CAG−877 sp000433455 UBA2834 sp002351365 CAG−594 sp000434835 UBA3789 sp900314225 Lactobacillus amylovorus Bifidobacterium thermacidophilum Solobacterium sp900315065 Eubacterium_E hallii_A Bifidobacterium boum UBA1417 sp003531055 Abundance Lactobacillus_H mucosae RC9 sp000435075 UBA7748 sp900314535 Butyricicoccus_A sp002395695 256 Prevotella sp002251295 Lactobacillus delbrueckii_A CAG−269 sp000431335 F23−B02 sp000431075 Marvinbryantia sp900066075 Blautia_A wexlerae CAG−269 sp001916055 Dorea formicigenerans CAG−269 sp003525075 Fusicatenibacter saccharivorans 16

species Gemmiger sp003476825 Faecalibacterium prausnitzii_E Ruminococcus_A sp003011855 Holdemanella biformis Agathobacter faecis Agathobacter sp000434275 Faecalibacterium prausnitzii_D Senegalimassilia anaerobia Megasphaera elsdenii Blautia_A sp000285855 1 Prevotella copri Holdemanella sp002299315 Agathobacter sp900317585 Ruminococcus_E sp900314705 Faecalicatena faecis Blautia_A massiliensis CAG−279 sp000437795 Olsenella_E sp002440065 Clostridium sp000435835 Prevotella sp000434515 Blautia_A obeum Faecalibacterium sp003449675 Prevotella sp000436915 Lactobacillus johnsonii Prevotella copri_A CAG−245 sp000435175 Catenibacterium mitsuokai Eubacterium_A pyruvativorans Prevotella sp002300055 CAG−103 sp900317855 Lactobacillus_H reuteri Collinsella sp002391315 Eubacterium_R coprostanoligenes Prevotella sp000834015 Prevotella sp900290275 Phascolarctobacterium_A succinatutens Prevotella sp000434975 Mogibacterium sp002299625 Escherichia flexneri Gemmiger variabile RC9 sp000431015 C941 sp002476625 Angelakisella sp003453215 Prevotella sp002251385 CAG−177 sp003514385 CAG−83 sp003487665 Ruminiclostridium_C sp000435295 ER4 sp003522105 Ruminococcus_E bromii_A Methanobrevibacter_A smithii ER4 sp900317525 t0 t2 t4 t6 t8 t10 NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA Sample

Supplementary Figure 10. The most abundant taxa in the piglet population and their abundance profile over time.

The heatmap reports the log transformed normalized relative abundance from all piglets. Each panel represents one time point and, within each panel, all samples from all piglets are included. At t0 piglets are aged 3 to 4 weeks old; at t10 piglets are aged 8 to 9 weeks old. Samples are rarefied to obtain equal amount of GTDB clustered MAGs per sample, and subsequently the most prevalent taxa, present in at least 20% of all the samples are included. Analysis is performed with PhyloSeq. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

125 84% fold change

positive negative

100

75

63%

50 85%

significant abundance shifts (counts) significant abundance 37%

25 16% 68%

15% 32%

50% 50% 0

t0_t2 t2_t4 t4_t6 t6_t8 t8_t10 time interval

Supplementary Figure 11. Number of significant abundance shifts per time interval.

Significant shifts (alpha = 0.05) were determined from comparison of GTDB clustered MAGs abundance in samples from two consecutive time points with SIAMCAT. Significant hits were corrected for false discovery rate. Proportions of negative fold change (blue) and positive fold change (red) shifts are reported on the top of the bar plots. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Differentially abundant features Significance Fold change Prevalence shift Feature AUCs showing top 50 features

Bifidobacterium boum__8866 t2 Solobacterium sp900315065__5521 t0 Bifidobacterium thermacidophilum__8868 UBA7748 sp900314535__9466 CAG−45 sp002299665__2159 CAG−110 sp002437585__1651 Chlamydia suis__11722 CAG−1000 sp000434555__5653 CAG−1000 sp000436975__5654 F23−B02 sp002472405__1686 CAG−877 sp000433455__5608 Butyricicoccus_A sp002395695__1719 Marvinbryantia sp900066075__2239 CAG−110 sp002362145__1650 Blautia_A wexlerae__2186 Faecalibacterium prausnitzii_D__1561 Lactobacillus amylovorus__4345 UBA1417 sp003531055__1350 UBA1191 sp900066305__2705 Gemmiger sp003476825__1573 Faecalibacterium prausnitzii_E__1566 Gemmiger formicilis__1571 UBA11524 sp000437595__1772 Ruminococcus_A sp003011855__2218 Eubacterium_E hallii_A__2369 Megasphaera elsdenii__3655 UBA2862 sp900317565__1786 Fusicatenibacter saccharivorans__2215 Senegalimassilia anaerobia__9534 KLE1615 sp900066985__2558 Olsenella_E sp002440065__9452 Dorea formicigenerans__2276 Catenibacterium mitsuokai__5583 RC9 sp000435075__12952 Holdemanella biformis__5548 CAG−170 sp003516765__1662 Agathobacter sp900317585__2146 Faecalibacterium sp003449675__1560 RC9 sp000432655__12914 CAG−145 sp002320005__2676 Methanobrevibacter_A smithii__758 Angelakisella sp003453215__1545 ER4 sp003522105__1635 Prevotella sp000434975__12307 CAG−83 sp003487665__1630 Escherichia flexneri__16728 RC9 sp000431015__12942 Gemmiger variabile__1572 Lactobacillus_H vaginalis__4223 An172 sp002160515__1285

1E−06 1E−05 1E−04 1E−03 1E−02 1E−01 1E+00 0 5 10 15 20 25 −2 −1 0 1 2 0 25 50 75 100 0 0.25 0.5 0.75 1

Abundance (log10−scale) −log10(adj. p value) Generalized fold change Prevalence [%] AU−ROC

Supplementary Figure 12. Significantly shifting taxa between the first and the second week.

Significant shifts (alpha = 0.05) were determined from the comparison of GTDB clustered MAGs abundance in samples from two consecutive time points with SIAMCAT. Points show normalized and log -transformed abundance within each subject and species. Significant hits were corrected for false discovery rate. Species positively (red) or negatively (blue) shifting in abundance are shown. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Differentially abundant features Significance Fold change Prevalence shift Feature AUCs showing top 50 features

Lactobacillus delbrueckii_A__4367 t4 Prevotella sp000434515__12267 t2 CAG−269 sp003525075__1928 Ruminococcus_E sp900314705__1275 Oribacterium sp002431355__2535 UBA3789 sp900314225__5607 Faecalibacterium prausnitzii_E__1566 Agathobacter sp900317585__2146 Olsenella_E sp002440065__9452 Megasphaera elsdenii__3655 Prevotella copri__12261 Prevotella sp900313215__12270 Blautia_A sp000285855__2187 Faecalibacterium sp003449675__1560 Agathobacter sp000434275__2143 CAG−269 sp001916065__1921 Agathobaculum butyriciproducens__1714 Ruminococcus_A sp003011855__2218 Coprococcus_A catus__2377 CAG−791 sp000431495__2088 Prevotella sp000434975__12307 Blautia_A sp900066205__2177 Prevotella copri_A__12262 Fusicatenibacter saccharivorans__2215 CAG−245 sp000435175__1942 Gemmiger formicilis__1571 CAG−605 sp000433475__5645 CAG−279 sp000437795__12634 Mitsuokella jalaludinii__3542 Lactobacillus amylovorus__4345 Faecalibacterium prausnitzii_D__1561 Lactobacillus_H mucosae__4233 Prevotella sp000834015__12318 Blautia_A wexlerae__2186 CAG−45 sp002299665__2159 CAG−83 sp003487665__1630 UBA2862 sp900315585__1787 UBA1191 sp900066305__2705 CAG−145 sp002320005__2676 ER4 sp003522105__1635 CAG−349 sp003539515__1866 RUG563 sp900320085__1780 RC9 sp000431015__12942 UBA738 sp003522945__1697 UBA1191 sp002310095__2704 CAG−1000 sp000436975__5654 UBA2862 sp900317565__1786 CAG−1024 sp000432015__1810 Phil1 sp001940855__1796 Methanobrevibacter_A smithii__758

1E−06 1E−05 1E−04 1E−03 1E−02 1E−01 1E+00 0 4 8 13 18 23 −2 −1 0 1 2 0 25 50 75 100 0 0.25 0.5 0.75 1

Abundance (log10−scale) −log10(adj. p value) Generalized fold change Prevalence [%] AU−ROC

Supplementary Figure 13. Significantly shifting taxa between the second and the third week.

Significant shifts (alpha = 0.05) were determined from the comparison of GTDB clustered MAGs abundance in samples from two consecutive time points with SIAMCAT. Points show normalized and log -transformed abundance within each subject and species. Significant hits were corrected for false discovery rate. Species positively (red) or negatively (blue) shifting in abundance are shown. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Differentially abundant features Significance Fold change Prevalence shift Feature AUCs showing top 50 features

Corynebacterium xerosis__7061 t6 Clostridium_P ventriculi__3047 t4 UBA1448 sp002329405__1532 Dialister sp000434475__3672 Blautia_A obeum__2183 F23−B02 sp000431075__1687 Corynebacterium stationis__6990 Blautia_A sp900066205__2177 CAG−269 sp001916055__1930 Clostridium sp000435835__3041 CAG−170 sp002298695__1663 Lactobacillus johnsonii__4373 Ruminococcus_E sp900314705__1275 CAG−536 sp900314505__5552 Eubacterium_A pyruvativorans__2663 CAG−791 sp000431495__2088 Oscillibacter sp001916835__1643 CAG−170 sp000432135__1660 CAG−83 sp000431575__1625 Agathobaculum butyriciproducens__1714 CAG−83 sp000435555__1629 Faecalibacterium sp003449675__1560 Blautia_A sp000285855__2187 Terrisporobacter othiniensis__2754 Coprococcus_A catus__2377 Lactobacillus_H reuteri__4220 Catenibacterium mitsuokai__5583 Angelakisella sp003453215__1545 CAG−103 sp900317855__1693 Holdemanella biformis__5548 Senegalimassilia anaerobia__9534 Blautia_A massiliensis__2178 Olsenella_E sp002440065__9452 Ruminococcus_A sp003011855__2218 Faecalibacterium prausnitzii_E__1566 Dorea formicigenerans__2276 Gemmiger sp003476825__1573 Collinsella sp002391315__9500 Faecalibacterium prausnitzii_D__1561 Gemmiger formicilis__1571 Marvinbryantia sp900066075__2239 Phascolarctobacterium_A succinatutens__3625 Chlamydia suis__11722 Methanobrevibacter_A smithii__758 UBA7748 sp900314535__9466 Prevotella sp002251435__12397 Bifidobacterium thermacidophilum__8868 Bifidobacterium boum__8866 Lactobacillus delbrueckii_A__4367 Lactobacillus_H sp002299975__4219

1E−06 1E−05 1E−04 1E−03 1E−02 1E−01 1E+00 0 2 4 6 8 10 −2 −1 0 1 2 0 25 50 75 100 0 0.25 0.5 0.75 1

Abundance (log10−scale) −log10(adj. p value) Generalized fold change Prevalence [%] AU−ROC

Supplementary Figure 14. Significantly shifting taxa between the third and the fourth week.

Significant shifts (alpha = 0.05) were determined from the comparison of GTDB clustered MAGs abundance in samples from two consecutive time points with SIAMCAT. Points show normalized and log -transformed abundance within each subject and species. Significant hits were corrected for false discovery rate. Species positively (red) or negatively (blue) shifting in abundance are shown. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Differentially abundant features Significance Fold change Prevalence shift Feature AUCs

t8 t6

Clostridium sp000435835__3041

Prevotella copri__12261

Bifidobacterium boum__8866

UBA7748 sp900314535__9466

1E−06 1E−05 1E−04 1E−03 1E−02 1E−01 1E+00 0 1 2 3 −1 −0.5 0 0.5 1 0 25 50 75 100 0 0.25 0.5 0.75 1

Abundance (log10−scale) −log10(adj. p value) Generalized fold change Prevalence [%] AU−ROC

Supplementary Figure 15. Significantly shifting taxa between the fourth and the fifth week.

Significant shifts (alpha = 0.05) were determined from the comparison of GTDB clustered MAGs abundance in samples from two consecutive time points with SIAMCAT. Points show normalized and log -transformed abundance within each subject and species. Significant hits were corrected for false discovery rate. Species positively (red) or negatively (blue) shifting in abundance are shown. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Differentially abundant features Significance Fold change Prevalence shift Feature AUCs

Clostridium sp000435835__3041 t10 t8 Corynebacterium xerosis__7061

Terrisporobacter othiniensis__2754

Corynebacterium stationis__6990

Mogibacterium sp002299625__2688

Holdemanella biformis__5548

Gemmiger sp003476825__1573

Collinsella sp002391315__9500

UBA1191 sp900066305__2705

RC9 sp000432655__12914

Faecalibacterium prausnitzii_E__1566

Megasphaera elsdenii__3655

Butyricicoccus_A sp002395695__1719

RC9 sp000434935__12951

Prevotella sp002251295__12392

Prevotella sp900290275__12309

Prevotella sp002251385__12290

Prevotella sp002251435__12397

Oribacterium sp002431355__2535

Prevotella sp002251365__12289

CAG−279 sp000437795__12634

Prevotella copri__12261

Prevotella sp000834015__12318

Prevotella sp000434975__12307

Prevotella sp000436035__12268

Prevotella sp000434515__12267

Prevotella copri_A__12262

Prevotella sp900313215__12270

1E−06 1E−05 1E−04 1E−03 1E−02 1E−01 1E+00 0 2 4 6 8 10 −2 −1 0 1 2 0 25 50 75 100 0 0.25 0.5 0.75 1

Abundance (log10−scale) −log10(adj. p value) Generalized fold change Prevalence [%] AU−ROC

Supplementary Figure 16. Significantly shifting taxa between the fifth and the sixth week.

Significant shifts (alpha = 0.05) were determined from the comparison of GTDB clustered MAGs abundance in samples from two consecutive time points with SIAMCAT. Points show normalized and log -transformed abundance within each subject and species. Significant hits were corrected for false discovery rate. Species positively (red) or negatively (blue) shifting in abundance are shown. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Campylobacter sp002139915__23089 Escherichia flexneri__16728 RC9 sp000435075__12952 RC9 sp000434935__12951 RC9 sp000431015__12942 RC9 sp000433355__12937 RC9 sp000432655__12914 CAG−279 sp000437795__12634 UBA7173 sp001701135__12592 Bacteroides_B vulgatus__12572 Prevotella sp002251435__12397 Prevotella sp002251245__12396 Prevotella sp002251295__12392 Prevotella sp002299275__12333 Prevotella sp000834015__12318 Prevotella sp900290275__12309 Prevotella sp000434975__12307 Prevotella sp002251385__12290 Prevotella sp002251365__12289 Prevotella sp000436915__12288 Prevotella sp900313215__12270 Prevotella sp000436035__12268 Prevotella sp000434515__12267 Prevotella copri_A__12262 Prevotella copri__12261 Prevotella sp002300055__12250 UBA5920 sp002406055__12098 Treponema_D porcinum__11976 RUG369 sp900317365__11779 Chlamydia suis__11722 Cloacibacillus porcorum__11272 UBA2834 sp002351365__10373 Senegalimassilia anaerobia__9534 Senegalimassilia sp002431805__9533 Collinsella sp002391315__9500 RUG001 sp900314575__9468 UBA7748 sp900314535__9466 Olsenella_E sp002440065__9452 Bifidobacterium thermacidophilum__8868 Bifidobacterium boum__8866 Bifidobacterium pseudolongum__8861 Corynebacterium xerosis__7061 Corynebacterium stationis__6990 CAG−1000 sp000436975__5654 CAG−1000 sp000434555__5653 CAG−533 sp002438065__5649 CAG−533 sp000434495__5647 CAG−605 sp000433475__5645 CAG−605 sp000433255__5644 CAG−460 sp000437315__5640 CAG−302 sp001916775__5624 CAG−1193 sp000436255__5619 CAG−594 sp000434835__5618 CAG−710 sp000432595__5613 RUG343 sp900317975__5609 CAG−877 sp000433455__5608 UBA3789 sp900314225__5607 Catenibacterium mitsuokai__5583 UBA636 sp002299675__5570 CAG−536 sp900314505__5552 Holdemanella sp002299315__5549 Holdemanella biformis__5548 Solobacterium sp900315065__5521 Lactobacillus johnsonii__4373 Lactobacillus delbrueckii_A__4367 Lactobacillus amylovorus__4345 Lactobacillus crispatus__4342 Lactobacillus_H mucosae__4233 Lactobacillus_H vaginalis__4223 Lactobacillus_H reuteri__4220 Lactobacillus_H sp002299975__4219 Dialister sp000434475__3672 Dialister succinatiphilus__3670 interval Megasphaera elsdenii__3655 Phascolarctobacterium_A succinatutens__3625 Mitsuokella jalaludinii__3542 Mitsuokella multacida__3541 Clostridium_P ventriculi__3047 Clostridium sp000435835__3041 Terrisporobacter othiniensis__2754 t0_t2 UBA1191 sp900066305__2705 UBA1191 sp002310095__2704 Mogibacterium sp002299625__2688 CAG−145 sp002320005__2676 Eubacterium_A pyruvativorans__2663 KLE1615 sp900066985__2558 t2_t4 Oribacterium sp002431355__2535 Lachnospira sp900316325__2446 Coprococcus_A catus__2377 Eubacterium_E hallii__2370 Eubacterium_E hallii_A__2369 Dorea formicigenerans__2276 t4_t6 Dorea longicatena__2274 Faecalicatena faecis__2253 Marvinbryantia sp900066075__2239 GCA−900066135 sp900066135__2232 Ruminococcus_A sp003011855__2218 Fusicatenibacter saccharivorans__2215 Blautia_A sp000285855__2187 t6_t8 Blautia_A wexlerae__2186 Blautia_A obeum__2183 Blautia_A massiliensis__2178 Blautia_A sp900066205__2177 CAG−45 sp002299665__2159 Agathobacter sp900317585__2146 t8_t10 Agathobacter rectale__2145 Agathobacter sp000434275__2143 Agathobacter faecis__2141 CAG−791 sp000431495__2088 UBA1712 sp002317245__2020 CAG−245 sp000435175__1942 CAG−269 sp001916055__1930 CAG−269 sp003525075__1928 CAG−269 sp000431335__1926 CAG−269 sp001916065__1921 CAG−349 sp003539515__1866 UBA11517 sp003522085__1830 CAG−1024 sp000432015__1810 Phil1 sp001940855__1796 UBA7102 sp002492415__1792 Firm−10 sp001603025__1789 UBA2862 sp900315585__1787 UBA2862 sp900317565__1786 RUG563 sp900320085__1780 UBA11524 sp000437595__1772 UBA644 sp002299265__1759 CAG−390 sp003523225__1732 Butyricicoccus_A sp002395695__1719 Butyricicoccus_A porcorum__1718 Agathobaculum butyriciproducens__1714 UBA738 sp003522945__1697 CAG−103 sp900317855__1693 CAG−103 sp000432375__1692 F23−B02 sp002314685__1689 F23−B02 sp000431075__1687 F23−B02 sp002472405__1686 F23−B02 sp001916715__1685 UBA1777 sp002320035__1682 UBA1777 sp900319835__1674 UBA1777 sp003150355__1671 CAG−170 sp002298695__1663 CAG−170 sp003516765__1662 CAG−170 sp000432135__1660 CAG−170 sp002404795__1659 CAG−110 sp002437585__1651 CAG−110 sp002362145__1650 CAG−110 sp002493965__1648 Oscillibacter sp001916835__1643 ER4 sp900317525__1638 ER4 sp003522105__1635 CAG−83 sp003487665__1630 CAG−83 sp000435555__1629 CAG−83 sp900313295__1627 CAG−83 sp000431575__1625 Lawsonibacter sp000177015__1615 Intestinimonas butyriciproducens__1605 Gemmiger sp003476825__1573 Gemmiger variabile__1572 Gemmiger formicilis__1571 Faecalibacterium sp002160895__1569 Faecalibacterium prausnitzii_E__1566 Faecalibacterium prausnitzii_D__1561 Faecalibacterium sp003449675__1560 Angelakisella sp003453215__1545 UBA1448 sp002329405__1532 CAG−180 sp002490525__1397 CAG−177 sp003514385__1368 UBA1417 sp003531055__1350 Acutalibacter sp000435395__1345 An172 sp002160515__1285 Ruminococcus_E sp900314705__1275 Ruminococcus_E sp003521625__1269 Ruminococcus_E bromii_B__1265 Methanobrevibacter_A smithii__758 Methanobrevibacter_A gottschalkii__748 −1 0 1

Supplementary Figure 17. Species significantly shifting in abundance at all time intervals.

Significant shifts (FDR adjusted p value < 0.05) between time points during the trial are displayed for all GTDB clustered MAGs. A large number of species increased in abundance during the first week (red dots). The following week (yellow dots) several kept increasing the (fc > 0; top and middle right), wheras some descreased (fc < 0; near bottom left). Wheras nearly all Prevotella increased during the first and second week (red and yellow dots, respectively), nearly all decreased during the last time interval (purple dots; fc < 0; top left). Species are sorted by phylogenetic tree node number. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

● t0 ● t4 ● t8 date ● t2 ● t6 ● t10 Chao1: A 1 B 1 C 1 Abundance-based 6 1.1 1 esmator of species Piglets age 1 0.77 0.52 Richness (weeks) 0.23 0.031 0.00045 0.19 0.12 New species acquired 8-9 0.099 Stability acquired ● Shannon: 1.3e−14 1.0 ● ● ● 3.4e−19 ● ● ● ● ● ●●● ● ● ● ●●●●● ● ●●● ● ●● ● ●●●●● ●●● ● ● Esmator of species ●●● ●● ● ● ● ● ●●●●●●● ●● ●● ● ● ● ●● ●● ● ● ● ●● ●●●●● ●● ● ● ● 300 ●● ● ●●● ● ● ● ● ● ● ●● ● ●● No significant changes 7-8 ● ● ●●●●●●●●● ●●●●●● ●●●● ● ● ●●● ● ● ● ● ● ●●●●● ●● ●● ●●● ●● ● ● ●●●●● ●● ●● ●●●●● ● ● ● ●●● ●●●● ● ●● ● 5 ● ●●● ●● ●●●●● ● ● ●● ● ● ● ● richness and species ●● ● ●●● ●● ● ● ● ●●●● ● ● ● ●● ●● ● ●●●● ● ● ● ● ● ●● ● ● ●●● ●● ●● ● ● ●● ●● ● ●● ● ● ● ●●●●●● ●●●● ●●●●●● ● ●● ● ● ● ● ● ●●● ●● ● ● ● ● ●● ● ●●● ●● ●● ● ● ● ●● ● ● ● ● ● ● ●●●● ● ● ● ●● ● ●● ● evenness: more weight New species acquired 6-7 ●●● ●●● ●● ● ● ● ● ●●● ● ● ● ● ●● ● ● ●● ● ● ● ●● ●● ● ● ●●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● Stability acquired ● ● ● ●● ● ● ● on species richness ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● 0.9 ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● Some(?) new species 5-6 ● ● ● ● ● ●● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ●●●● ● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● Loss of evenness ● ● ● ● ●●●● ● ●● ● ● ● ● Simpson: ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● Esmator of species ● ●● ● ●● ●●● ● ● ● ●● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● New species introduced 4-5 ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● 4 ● ● ● ● ● ● ● ● ● richness and species ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● Evenly represented ● ● ● ●● ● ● ● ● ● 200 ●● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● Chao1 ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ●●● ●● ● evenness: more weight ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Simpson ● Shannon ● ● ● ● ●●●● ●● ● ● ● ●● ●●● ● ● ● ●● ● ● ● ●● ●● ● ●● ● ● ●● ● ● ● ●● ● ● ●● ●●● ● ● ●●● ●● ● ●● ●● ● ● ● 0.8 ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● on species evenness ● ● ● ●● ●● ● ● ●●● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ● ● ●● ●●● ● ● ●● ● ● ● ● ● ●● ● ●● ●● ●● ●● ●● ●● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ●● ●● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ●● ● ● ● ● ●●●● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●●●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ●● ●● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ●●● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ●●●●●● ● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ●●● ●● ● ● ● ●●●●●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● 3 ● ● ● ● 0.7 ● ● ● ● ● ● ● ●●● ● ●● ● ●● ●●●● ● ● ● ●●● ● ● ●●● ●● ● ● ● ● ● ●●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ●●●● ● ● 100 ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ●● ● ●●● ● ● ● ●● ●● ● ● ●●● ● ● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● 0.6 ●

●● ● ● ● ● ● ● 2 ● ●●

t0 t2 t4 t6 t8 t10 t0 t2 t4 t6 t8 t10 t0 t2 t4 t6 t8 t10 time point time point time point

Supplementary Figure 18. Sample diversity across time.

Samples were rarefied and diversity was obtained with PhyloSeq. Diversity measures displayed A) Chao1: abundance-based estimator of species richness; B) Shannon: estimator of species richness and species evenness, with more weight on species richness; C) Simpson: estimator of species richness and species evenness, with more weight on species evenness. Sample diversity scores from different time points were compared by t-test and significance values were adjusted with the Bonferroni method. GTDB taxonomic clustering of MAGs was used. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

A breed B birth_day C breed

● DxL ● 2017−01−08 ● DxL

● DxLW ● 2017−01−11 ● DxLW

● LWxD

● LxLWD

t0 t0 t0

2017−01−06 2017−01−07 2017−01−08 DxL 2017−01−08

● ● ● ● ● ● 0.3 ● 0.2 ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ●●●● ● 0.4 ● ● ● ● ●●● ● ● ● ● ● 0.0 ● ● ● ● ● ● ● ● ●● ● ● 0.2 ● ● ● −0.2 ●● ● ● ● ● ● 0.1 ● ● −0.4 0.2 ● ● ● ●● ● 0.0 ●● ● ● ● 2017−01−09 2017−01−10 2017−01−11 ● ● ● ● ● 0.0 ●●● ● ● −0.1 ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● 0.2 ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●●●● ● ● ● ●● ●● ● ● ● ● ● ● ● ● −0.2 ● Axis.2 [11%] Axis.2 0.0 ●●●● ● ●● ●● ●● ● ● ●● ●●● ●●●● ● ● ● ● ● ● ● ● [12.6%] Axis.2 −0.2 [10.5%] Axis.2 ●● ● ●●● ● ● −0.2 ● ● ●● ● ● ●●● ● ● ● ●● ● ● ● ● ● −0.3 ● −0.4 ●● −0.25 0.00 0.25 0.50−0.25 0.00 0.25 0.50−0.25 0.00 0.25 0.50 −0.2 0.0 0.2 0.4 −0.2 0.0 0.2 Axis.1 [12.5%] Axis.1 [18.2%] Axis.1 [13.6%] t2 t2 t2

2017−01−06 2017−01−07 2017−01−08 DxL 2017−01−08 ● ● 0.4 ● ● ●● ● ● ● 0.2 ● ●● ● 0.4 ● ● ● ● ●●● ●● ● ● ● ● ● ● ●● ● ● ● 0.0 ● ● ● ●● ● ● ●● ● ● ●● ●● ● ● ● ● ●● ● ● ● −0.2 ● ● 0.2 ● 0.2 ● ● ● ● ● ● ● ● ● ● 2017−01−09 2017−01−10 2017−01−11 0.0 ●● ● ● ● ● ● ● ● ● ● ● ●● ● 0.0 ● ● 0.2 ● ● ● ●● ●●● ● ● ● ● ●● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● Axis.2 [7.8%] Axis.2 ● ● ● ● ● ●● ● ● ● ● ●●●●● ● ● ● ● 0.0 ● ● ●● [10.6%] Axis.2 [10.3%] Axis.2 ● ●● ● ●● ● ● ● ● ●● ● ● −0.2 ● ● ● ● ● ● ●● ●● ● ● ● ●●●●●● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ●● −0.2 ● ● ●● ● ● ● −0.4 −0.2 0.0 0.2 −0.4 −0.2 0.0 0.2 −0.4 −0.2 0.0 0.2 −0.2 0.0 0.2 −0.2 0.0 0.2 Axis.1 [14.8%] Axis.1 [15.5%] Axis.1 [14.1%] t4 t4 t4

2017−01−06 2017−01−07 2017−01−08 DxL 2017−01−08 0.4 ●● ●● ● 0.2 ● ● ●● ● ● ● ●● 0.2 ● ● ● ●●● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● −0.2 ●● ● ●● ● ● 0.2 ● ● ● ● ● ● ● ●● ● ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.0 ● ● 2017−01−09 2017−01−10 2017−01−11 ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● −0.2 ● 0.2 ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● −0.2 ● ● ● ● ● ● ● ● ●●● ● ●● ● ● 0.0 ● ● ●● ● ● ● ● ● ● ● ● ●● ● Axis.2 [11.3%] Axis.2 ● ● ● ● ●● ●● ●● ● ● ● ● ● [11.4%] Axis.2 [15.1%] Axis.2 ● ● ●●● ● −0.2 ● ●● ● ● ● ●● ● ● ● ● ● ● −0.4 −0.4 −0.2 0.0 0.2 −0.4 −0.2 0.0 0.2 −0.4 −0.2 0.0 0.2 −0.2 0.0 0.2 −0.2 0.0 0.2 Axis.1 [19.5%] Axis.1 [23.7%] Axis.1 [23.4%]

Supplementary Figure 19. Age and breed correlations with composition.

A principal component analysis was performed for all samples (A), for a subset including one breed and two age groups (B), and for a subset including one age group and two breeds (C). Metadata with breed and age was joined to the principal components coordinates generated with Phyloseq. GTDB taxonomic clustering of MAGs was used. Each plot shows the principal components 1 and 2, reporting the percentage of variation explained. Significance of the correlations was determined with the Dunn test and p-values were corrected with the Bonferroni method. We refer to Supplementary File 1 for all the generated p-values. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Differentially abundant features Significance Fold change Prevalence shift Feature AUCs

t0 Duroc x Landrace 2017 01 11 t0 Duroc x Landrace 2017 01 08

Phil1 sp001940855__1796

UBA1777 sp003150355__1671

CAG−83 sp900313295__1627

Methanobrevibacter_A gottschalkii__748

UBA3388 sp002358835__13144

1E−30 1E−27 1E−24 1E−21 1E−18 1E−15 1E−12 1E−09 1E−06 1E−03 1E+00 0 1 2 −22 −11 0 11 22 0 25 50 75 100 0 0.25 0.5 0.75 1

Abundance (log10−scale) −log10(adj. p value) Generalized fold change Prevalence [%] AU−ROC

Differentially abundant features Significance Fold change Prevalence shift Feature AUCs

t2 Duroc x Landrace 2017 01 11 t2 Duroc x Landrace 2017 01 08

Ruminococcus_E sp900319655__1262

Methanobrevibacter_A smithii__758

UBA7748 sp900314535__9466

Bifidobacterium thermacidophilum__8868

Bifidobacterium boum__8866

1E−30 1E−27 1E−24 1E−21 1E−18 1E−15 1E−12 1E−09 1E−06 1E−03 1E+00 0 1 −22 −11 0 11 22 0 25 50 75 100 0 0.25 0.5 0.75 1

Abundance (log10−scale) −log10(adj. p value) Generalized fold change Prevalence [%] AU−ROC

Supplementary Figure 20. Mild taxonomic differences between age groups.

Mild taxonomic differences between subjects of the same breed and two age groups. Differences in abundance at time point 0 (alpha = 0.06; week 1) (top) and at time point 2 (alpha = 0.13; week 2) (bottom). GTDB taxonomic clustering of MAGs was used. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

weight

t0 Rho: x=−0.264 y=−0.232 p−value: x=0.007 y=0.018 t2 Rho: x=0.28 y=0.266 p−value: x=0.004 y=0.007 0.2 ● ● ● 0.2 ● ● ● ●● ●●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●● 0.1 ● ● ● 0.1 ● ●● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●●●● ●● ● ●● ● ● ●● ● ●●● ●●● ● ●● ●●●● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● 0.0 ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ●● ● ●●● ● ● ● 0.0 ●● ● ● ●● ●●● ● ● ●● ● ●● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● −0.1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● Axis.9 [2.8%] Axis.9 [3.4%] Axis.4 −0.1 ● ● ● ● ● ●●● ●● −0.2 ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● −0.2 ● ● −0.3 ● ● −0.2 0.0 0.2 −0.2 −0.1 0.0 0.1 Axis.5 [4.2%] Axis.18 [1.5%]

t4 Rho: x=−0.263 y=0.222 p−value: x=0.007 y=0.023 t6 Rho: x=0.384 y=−0.256 p−value: x=0.001 y=0.034

● ● ● 0.2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● 0.04 ●● 0.1 ●●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ●●●● ● ● ● ● ●● ● ● ● ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● 0.00 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ●●● ● ● ● ● ● ● ● −0.1 ● ● ● ● ● ● ● ● ● ● ●● Axis.4 [4.3%] Axis.4 ● ● ● ● ● [0.4%] Axis.48 ● ●● ● ● ●●● ● ● ● ● −0.04 −0.2 ● ● ● ● ● −0.1 0.0 0.1 −0.10 −0.05 0.00 0.05 0.10 Axis.5 [3.5%] Axis.33 [0.8%]

t8 Rho: x=0.352 y=−0.349 p−value: x=0.003 y=0.003 t10 Rho: x=−0.346 y=−0.326 p−value: x=0.008 y=0.013

● ● ● 0.1 ● ● 0.05 ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.0 ● ● ● ● ● ● ● ●● ●● ●● ● ●● ●●● ● ● ● ● ● ●●● ● ● ● ● ● ● 0.00 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Axis.44 [0.5%] Axis.44 ● [1.7%] Axis.15 ● ● −0.1 ● −0.05 ● ● ● ● ● ● ● −0.03 0.00 0.03 0.06 −0.1 0.0 0.1 Axis.51 [0.3%] Axis.13 [2%]

Supplementary Figure 21. Correlation of weight with composition.

A principal component analysis was performed for all samples with Phyloseq using the GTDB taxonomic clustering of MAGs. Information on the weight of the hosts throughout the trial was joined to the principal components coordinates generated with Phyloseq. A correlation test was run with the Spearman’s rank method. Each plot shows the principal components for which a significant correlation with weight was found (rho describing the degree and direction of the correlation). The variation explained by each principal component is reported. Correction of the p-values with the Bonferroni method is applied. We refer to Supplementary File 1 for all the generated p-values. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Percentage identity against CAZy 6e+08

100 4e+08

2e+08 Total predicted proteins Total 0e+00 75

40

50 20

AA Percentage identity Percentage 0 CE Proportion of CAZymes (%)

SLH 1.00 GH

GT 25 0.75 CBM

0.50 cohesin value PL 0.25

0.00 0 0.42% 12.37% 0.15% 49.97% 34.98% 1.3% 0.01% 0.81% NA Firmicutes Synergistota Bacteroidota Myxococcota Firmicutes_A Firmicutes_B Firmicutes_C PL Spirochaetota AA Thermotogota CE GT GH Cyanobacteria Fibrobacterota Euryarchaeota Proteobacteria Fusobacteriota Elusimicrobiota Patescibacteria SLH Deferribacterota Actinobacteriota Eremiobacterota Planctomycetota CBM Verrucomicrobiota Campylobacterota Thermoplasmatota Desulfobacterota_A Verrucomicrobiota_A cohesin

Supplementary Figure 22. Sequence identity against the CAZy database and proportions across Phyla.

Sequence identity of all predicted proteins against the CAZy database is shown (left). Numbers on the bottom of the whisker plots report the proportion each enzyme class that is present in our dataset. Distribution of predicted proteins (top right), distribution of CAZy enzymes (middle right) and proportions of enzymatic classes (bottom right) across phyla are shown. GTDB taxonomic clustering of MAGs was used. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

● 40 Escherichia flexneri Prevotella copri ●

● Lactobacillus amylovorus

●●● Lactobacillus_H reuteri Prevotella sp002251295 Prevotella sp002251435 ● ● ● ● Prevotella sp002251365 ● ● ● Bacteroides_B vulgatus Prevotella sp000834015 ● ● Lactobacillus_B salivarius Prevotella sp002251385 30 ● ● ●● ● ● ● CAG−269 sp003525075● ● ● ● ●● ●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● phylum ●●●● ● ● ●●● ●● ●a Actinobacteriota ●● ● ●● ● ● ● ●●● ● ● Gemmiger sp003476825 CAG−110 sp002437585 ●●●●●● ●● ● ● ●● ● ●a Bacteroidota ●●●●●●●●●●●●●● ●● ●● ● ● ● 20 ●●●●● ● ● ● ●●● ●● ● ● ● Gemmiger formicilis ● a Firmicutes ●●●●●●●●● ●●●●●●● ●● ●● ● OEMS01 sp900199405 ●a Other Phyla ●●●●●●●●●● ●●●●●● ●●●● ●●● ●● ● ● ●●●●●●●●●●●●●●●●●●●●●● ●● ● ●a Proteobacteria ●●●●●●●●●●●● ●●●● ●● ● ● ●● ● ●●●●●●●●●●●●●●●●●● ●● ● ● ●●●●●●●●● ●●●●● ● ● ● ● ● ● ● ●●●●●●●●●●●●●●●● ●●●●● ● ●●● ●●●●●●●●●●●●● ●●● ●●●●● ● ●●●●●●●●●●●●●● ● ● ● ● ● 10 ●●●●●●●●●●●●●● ● ●●● ● ● ● ●●●●●●●●●●● ● ●●●●●●● ●● ●● Number of different GT class enzymes within a genome Number of different ●●●●●●●●● ● ● ● ● ● ● ●●●●●●●●●● ●● ●●●●●● ●●● ● ● ● ● ●●●●●●●●●●●●● ●●●●●●●● ● ● ●●●●● ● ●●●● ● ●●●●● 0

0 200 400 600 Number of genes encoding GT per genome

Supplementary Figure 23. Diversity of glycoside transferases.

The plot shows the number of genes encoding glycoside transferases (GTs) across species of GTDB clustered MAGs (x-axis) and the number of distinct enzymes within the GT class represented in those species (y-axis). Species of Gemmiger show a particularly large number of genes encoding GTs. Escherichia flexneri has the highest number of distinct GT enzymes among all species. A cluster of Prevotella species display a high count of GT genes (between 100 and 400) and a relatively high number of distinct GT enzymes. Enzymes of class GT are known to assemble complex carbohydrates from activated sugar donors (Lairso et al, 2008). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

AA3 CE13 AA7 AA3 CE13 AA7

● −2.5 ● ● ● Methanobrevibacter_A CAG−791 Clostridium −3 −2 gottschalkii 20 sp000431495 16 sp000435835 5 −5.0

● ● ● −4 Methanobrevibacter_A Eubacterium_H CAG−103 ● ● −6 ●

n=41 n=22 n=42 19 9 6 ● −7.5 ● ● smithii sp002394235 sp000432375 ● ● ●

n=126 n=105 n=126 ● ● −6 −9 ● −10.0 ● ● Escherichia CAG−605 UBA1777 ● ● 18 8 6 ● ● −8 flexneri sp000433255 sp003150355 ● ● ●

CE11 CE12 AA6 CE11 CE12 AA6

● ● −1 ● −2 −3 ● Phascolarctobacterium_A 5 Prevotella 7 Escherichia 18 −2 succinatutens copri flexneri ● −6 −4 ●

n=42 n=38 n=35 Prevotella Prevotella Dorea −3 ● ● ● ● ● ● ● ● 4 7 11 ● ● ● ● ● sp002251435 sp002251435 formicigenerans n=126 n=124 n=125 ● ● ● ● −4 −6 ● −9 ● ● ● RC9 UBA11524 Anaerovibrio ● ● −5 ● ● ● 4 7 7 −12 sp000434935 sp000437595 sp002100115 CE5 CE4 CE10 CE5 CE4 CE10 ● ● −2.5 1.0 ● ● 0.5 −5.0 0.5 Corynebacterium Phascolarctobacterium_A UBA11524 xerosis 37 succinatutens 1 sp000437595 2 0.0 0.0 ● ● −7.5 ● n=96 n=13 n=42 ● n=42 ● ● Bifidobacterium CAG−110 Gemmiger −0.5 ● ● ● ● ● n=126 ● ● n=126 18 2 2 −10.0 −0.5 ● pseudolongum sp002437585 formicilis −1.0 ● ● ● Corynebacterium Blautia_A CAG−110 −12.5 −1.0 ● −1.5 stationis 12 wexlerae 2 sp002437585 2 CE15 CE8 CE2

−1 ● CE15 CE8 CE2 −2.5 ● −2 −2 Firm−10 Prevotella Prevotella ● ● −5.0 ● ● 9 5 4 ● ● sp001603025 sp002251435 sp002251435 ● −3 ●

n=32 n=40 ● n=42 ● ● −4 ● ● ● ● ● ●

−7.5 n=126 n=126 ● n=126 UBA2862 UBA5920 Firm−10 ● ● −4 sp900315585 6 sp002406055 5 sp001603025 3 ● −6 −10.0 ● ● −5 ● ● ● UBA11524 Gemmiger CAG−110 enzymeID sp000437595 8 formicilis 5 sp002437585 3 CE14 CE3 CE6

● ● −2 ● ● −2 ● CE14 CE3 CE6 −2 ● −4

● ● ● ● −4 UBA11524 CAG−177 Faecalibacterium ● ● ● ● −4 ● −6 ● ● 4 11 7 ● n=42 ● n=30 ● ● n=28 ● sp000437595 sp003514385 prausnitzii_D ● ● ● ● ● ● ● ● n=126 n=126 ● ● n=124 −6 −8 ● −6 ● F23−B02 CAG−877 Prevotella ● 4 9 12 ● −10 ● sp002472405 sp000433455 sp002251435 ● ● ● CAG−110 Prevotella UBA11524 sp002437585 5 sp000434975 8 sp000437595 6 AA4 CE9 AA10

● ● ● ● ● 0.0 ● ● ● AA4 CE9 AA10 ● −2 ● −0.5 ● −5 ●

● −1.0 ● ● Megasphaera UBA11524 Enterococcus_B −4 n=39 ● ● n=42 ● ● n=67 n=25 ● ● ● ● 14 2 53 ● ● elsdenii sp000437595 hirae ● −1.5 ● n=126 n=126 ● −10 −6 ● −2.0 Phascolarctobacterium_A 25 CAG−110 2 Enterococcus_B 15 ● ● succinatutens sp002437585 villorum −2.5 Succiniclasticum 3 Blautia_A 2 Enterococcus_B 12 CE7 AA2 CE1 sp900314855 wexlerae durans

● ● ● ● ● −1 ● ● −5.0 0 −2 CE7 AA2 CE1 −7.5 n=8 n=6 −3 ● ● ● ● n=41 n=42 ● ● ● −1 ● ● ● Prevotella Terrisporobacter UBA11524 ● ● ● 5 29 3 −4 n=126 ● n=126 ● −10.0 sp002251435 othiniensis sp000437595

● −5 −2 Prevotella Romboutsia CAG−110 ● −12.5 ● sp002251295 6 timonensis 21 sp002437585 4 Prevotella Clostridium OEMS01 sp002251365 5 sp000435835 21 sp900199405 2 t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 tM tM tM ss_sows ss_sows ss_sows ss_piglets ss_piglets ss_piglets date percentage of species carrying enzyme 10 20 30 40 50

Supplementary Figure 24. AA and CE class enzymes time trend and species representation.

Log-transformed normalized abundance of enzymes over time is represented on the left panels. Time trend is visualized from time point 0 (t0) (week 1) to time point 8 (t8) (week 4) for enzymes in the piglet population. Time point tM is the single time point of the mothers. The number of subjects (piglets and mothers) carrying each enzyme is reported on the left of the boxplots within each panel. Panels on the right report the top three species carrying each enzyme in their MAGs (based on GTDB taxonomic clustering of MAGs). Only enzymes for which a significant change over time was found between any time point (Kruskal pair-wise comparison with Bonferroni correction) are reported. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

PL8 PL27 PL1 PL8 PL27 PL1

Prevotella Firm−10 Prevotella −2.5 18 11 13 −2.5 sp900290275 sp001603025 sp002251435 −5.0 n=8 n=19 n=27 −5.0 n=124 n=116 ● n=122 ● ● −5.0 ● RC9 Fusicatenibacter Prevotella −7.5 ● 5 12 9 ● sp002439805 saccharivorans sp000436915 ● ● −7.5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −7.5 ● ● −10.0 −10.0 ● ● Alistipes Prevotella Prevotella ● ● ● ● ● 9 11 10 ● ● shahii copri sp000834015 ● ● ● ●

PL37 PL10 PL11 PL37 PL10 PL11

−2.5 ● −3 −3 UBA2862 Prevotella Prevotella n=7 n=5 17 27 10 −5.0 sp900315585 sp002251295 sp002251435 n=25

n=111 −6 n=122 n=120 ● ● −6 ● ● ● ● ● ● ● ● −7.5 ● ● UBA11524 RC9 Prevotella ● ● 15 8 10 ● ● ● ● ● ● ● ● ● sp000437595 sp001915575 sp000436915 ● ● ● ● ● −9 ● ● ● ● ● ● ● −9 ● ● ● ● −10.0 ● ● ● ● ● ● ● ● ● ● UBA7102 Bacteroides_B Prevotella ● ● 13 17 8 ● ● −12 ● ● sp002492415 vulgatus sp002251365

PL26 PL9 PL21 −2.5 −2 PL26 PL9 PL21 ●

−5.0 −4 −5.0 n=5 n=3 KLE1615 UBA5920 RC9

n=34 n=82 sp900066985 21 sp002406055 16 sp002440085 18

n=116 −6 n=116 −7.5 ● ● −7.5 ● ● ● ● ● ● ● ● ● −8 RC9 Prevotella RC9 ● −10.0 12 6 14 ● ● sp001915575 sp000834015 sp000434935 ● ● ● ● −10.0 ● ● ● ● ● ● ● ● −10 ●

enzymeID ● ● ● ● ● ● ● ● ● ● ● ● ● −12.5 ● ● ● ● Prevotella Lachnospira RC9 −12.5 sp002251435 12 sp900316325 6 sp002439805 23 PL35 PL12 PL22 −2.5

● PL35 PL12 PL22 ●

● ● −5.0

n=7 −3 n=5 −5 n=82 n=13 n=91 RC9 18 KLE1615 5 KLE1615 45 n=124 −7.5 sp001915575 sp900066985 sp900066985

−6 ● ● ● ● ● ● −10.0 ● ● ● Ruminococcus_E UBA5920 Gemmiger −10 ● ● ● ● ● ● 14 13 20 ● ● bromii_A sp002406055 sp003476825 −9 ● ● ● ● ● ● −12.5 ● ● ● ● ● ● ● UBA4716 RC9 Enterobacter sp002438865 7 sp002439805 5 himalayensis 4 PL4 PL31 PL15 −4

● ● −2.5 PL4 PL31 PL15 −5.0 −6 n=1 n=7 n=7

n=60 n=26 −5.0

n=113 RC9 Ruminococcus_F Prevotella −8 −7.5 sp001915575 40 champanellensis 42 sp900290275 15 −7.5

−10 ● ● ● −10.0 ● ● ● ● −10.0 ● Clostridium_AI Ruminococcus_C UBA7173 ● ● ● ● ● 16 39 15 ● ● ● ● ● sp002297865 sp000433635 sp001701135 ● ● ● ● ● −12 ● ● ● ● −12.5 ● −12.5 ● Prevotella CAG−45 Firm−10 sp000834015 4 sp900315735 3 sp001603025 17 t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 tM tM tM ss_sows ss_sows ss_sows ss_piglets ss_piglets ss_piglets date percentage of species carrying enzyme 10 20 30 40

Supplementary Figure 25. PL class enzymes time trend and species representation.

Log-transformed normalized abundance of enzymes over time is represented on the left panels. Time trend is visualized from time point 0 (t0) (week 1) to time point 8 (t8) (week 4) for enzymes in the piglet population. Time point tM is the single time point of the mothers. The number of subjects (piglets and mothers) carrying each enzyme is reported on the left of the boxplots within each panel. Panels on the right report the top three species carrying each enzyme in their MAGs (based on GTDB taxonomic clustering of MAGs). Only enzymes for which a significant change over time was found between any time point (Kruskal pair-wise comparison with Bonferroni correction) are reported. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

CBM32 CBM13 CBM20 CBM32 CBM13 CBM20

−2.5 ● Prevotella Gemmiger Prevotella sp002251365 5 formicilis 14 copri 36 −2.5 −2 n=5 −5.0 n=34 n=29

● n=126 n=126 ● n=118 ● ● Prevotella Faecalibacterium Prevotella ● ● −5.0 −7.5 ● sp000834015 6 prausnitzii_D 8 copri_A 22 ● −4 ● ● ● ●

● ● ● −10.0 ● ● An172 Gemmiger Parabacteroides −7.5 ● ● ● 4 8 3 −6 ● ● sp002160515 sp003476825 distasonis ● −12.5

CBM35 CBM34 CBM25 CBM35 CBM34 CBM25

−2.5 ● ● −3 −3 CAG−177 Lactobacillus CAG−269 sp003514385 12 amylovorus 21 sp000431335 15

−5.0 n=14 n=23 n=21

● n=124 n=115 n=119

−6 ● ● −6 ● ● −7.5 ● ● ● CAG−180 Blautia_A CAG−269 ● ● ● ● ● 7 5 15 ● ● ● ● sp002490525 wexlerae sp001916055 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −9 ● −10.0 ● ● −9 ● ● ● ● ● ● ● ● Prevotella Fusicatenibacter CAG−269 ● ● 7 4 11 ● ● −12.5 ● ● sp002251365 saccharivorans sp001916065

CBM38 CBM50 CBM48 CBM38 CBM50 CBM48 ● ● −1 ● ● ● ● −2.5 ●

−6 −2 RUG369 Oscillibacter Gemmiger

n=81 n=15 n=42 n=30 sp900317365 37 sp900066435 6 formicilis 4 −5.0 −3 n=126 n=120

● −9 ● ● ● ● ● ● ● ● ● UBA11524 CAG−594 Eubacterium_E ● −4 ● ● 21 6 6 ● ● −7.5 ● sp000437595 sp000434835 hallii_A ● ● ● ● ● ● ● −12 ● −5 ● ●

enzymeID ● ● ● ● ● ● ● ● −10.0 ● ● ● ● UBA3207 CAG−877 Blautia_A −6 sp003501485 11 sp000433455 5 wexlerae 5 CBM26 CBM4 CBM23

−2.5 ● −4 CBM26 CBM4 CBM23

−6 n=2 −5.0 −5 n=20 n=21 n=41 Ruminococcus_E 34 RUG369 24 Anaerovibrio 50 n=114 n=108 sp900314705 sp900317365 sp002100115 −8 −7.5 ●

● −10 UBA7173 Agathobacter Ruminococcus_C ● ● −10 ● ● ● 8 14 11 ● ● sp001701135 sp000434275 sp000433635 −10.0 ● ● ● ● ● ● ● ● −12 ● ● ● ● −12.5 Ruminococcus_E Agathobacter Agathobacter bromii_B 7 sp900317585 9 sp000434275 7 CBM2 CBM40 t0 t2 t4 t6 t8 tM

● −2.5 ● ss_sows ss_piglets

● CBM2 CBM40 ● −5.0

n=7 n=1 ● −5.0 n=58 n=13 −7.5 Clostridium_AI 40 Clostridium_P 47 ● sp002297865 perfringens −7.5

−10.0 −10.0 Eubacterium_F Clostridium sp003491505 11 celatum 18 ● −12.5 −12.5 CAG−95 Clostridium sp000436115 4 nigeriense 6 t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 tM tM ss_sows ss_sows ss_piglets ss_piglets date percentage of species carrying enzyme 10 20 30 40 50

Supplementary Figure 26. CBM class enzymes time trend and species representation.

Log-transformed normalized abundance of enzymes over time is represented on the left panels. Time trend is visualized from time point 0 (t0) (week 1) to time point 8 (t8) (week 4) for enzymes in the piglet population. Time point tM is the single time point of the mothers. The number of subjects (piglets and mothers) carrying each enzyme is reported on the left of the boxplots within each panel. Panels on the right report the top three species carrying each enzyme in their MAGs (based on GTDB taxonomic clustering of MAGs). Only enzymes for which a significant change over time was found between any time point (Kruskal pair-wise comparison with Bonferroni correction) are reported. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

CBM51 CBM71 CBM67 CBM51 CBM71 CBM67

● ● ● −2.5 −2.5 An172 An172 Prevotella sp002160515 13 sp002160515 91 copri 6 ●

−5 n=1 −5.0 ● ● ●

n=35 n=77 ● −5.0 n=28 ● ● ● Clostridium_P Clostridium_P Prevotella ● ● ● ● n=119 −7.5 n=122 9 4 6 ● ● perfringens perfringens copri_A ● ● ● −10 ● −10.0 −7.5 ● ● ● ● Romboutsia Clostridium UBA11524 ● 10 2 5 ● ● ● timonensis celatum sp000437595 ● ● −12.5 ●

CBM9 CBM44 CBM62 CBM9 CBM44 CBM62

−3 ● ● ● −2 ● ● ● −2.5 ● ● Prevotella Clostridium_P Prevotella −6 7 90 16 n=1 sp000434975 perfringens sp002251365 −4 ● −5.0 n=15 n=29 n=10 ● ●

n=126 n=118 Fusicatenibacter Clostridium UBA4372 −9 −7.5 ● ● 7 3 12 ● ● ● saccharivorans disporicum sp002316015 ● ● −6 ● ● ● ● ● ● ● ● ● ● ● ● −10.0 ● ● −12 ● ● ● ● Prevotella CAG−914 UBA6398 ● ● ● ● ● 5 3 10 −8 ● ● −12.5 ● sp900290275 sp000437895 sp003150315 CBM61 CBM6 CBM66 CBM61 CBM6 CBM66 −2.5 −2.5 ● −4 −5.0 Treponema_D CAG−56 UBA7102 −5.0 n=6 −6 n=2 porcinum 14 sp900066615 14 sp002492415 14 n=15 n=76 −7.5 −8 n=109 −7.5 n=112 Agathobacter RC9 Holdemanella ● −10 11 13 12 ● sp000434275 sp000433355 biformis −10.0 ● ● −10.0 ● ● ● ● ● ● −12 ● ● ● ● ● ● ● ● ● ● ● −12.5 ● ● Agathobacter Bacteroides RUG563 ● −12.5 ● ● rectale 8 fragilis 13 sp900320085 10 CBM54 CBM74 CBM57 −2.5 −2.5 CBM54 CBM74 CBM57 −2.5 ● ●

enzymeID −5.0

−5.0 n=6 n=4 −5.0 Agathobaculum 22 Ruminococcus_E 18 RC9 20 n=20 −7.5 n=90 ● butyriciproducens sp003521625 sp001915575 ● ● ● −7.5 n=107 n=122 −7.5 ● ● ● ● ● ● ● ● ● ● ● −10.0 ● Dorea Ruminococcus_E Prevotella ● ● ● ● ● 8 17 13 −10.0 ● ● −10.0 longicatena_B sp900314705 sp002251385 ● ● ● −12.5 ● ● ● ● −12.5 ● −12.5 ● Eubacterium Prevotella Prevotella callanderi 7 sp900313215 7 sp000834015 10 CBM77 CBM22 CBM21 −2.5 −2.5 CBM77 CBM22 CBM21 ● ● −6 ● −5.0 −5.0 ● n=2 n=9 n=3

−8 n=25 −7.5 n=60 n=31 Prevotella Eubacterium_F Ruminococcus_E −7.5 sp002251365 26 sp003491505 22 bromii_A 60

−10 −10.0 −10.0 Prevotella Ruminococcus_F Ruminococcus_E ● 10 17 24 −12 sp002299275 champanellensis sp003521625 −12.5 −12.5 Lachnospira HUN007 Clostridium sp900316325 10 sp002315235 9 sp900317445 9 CBM16 t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 tM tM

● ss_sows ss_sows ss_piglets ss_piglets −5.0 CBM16

−7.5 n=73 n=10 Ruthenibacterium lactatiformans 12 −10.0 Firm−10

● 11 ● sp001603025 −12.5 ● ● OEMS01 sp900199405 10 t0 t2 t4 t6 t8 tM ss_sows ss_piglets date percentage of species carrying enzyme 25 50 75

Supplementary Figure 27. CBM class enzymes time trend and species representation (part 2).

Log-transformed normalized abundance of enzymes over time is represented on the left panels. Time trend is visualized from time point 0 (t0) (week 1) to time point 8 (t8) (week 4) for enzymes in the piglet population. Time point tM is the single time point of the mothers. The number of subjects (piglets and mothers) carrying each enzyme is reported on the left of the boxplots within each panel. Panels on the right report the top three species carrying each enzyme in their MAGs (based on GTDB taxonomic clustering of MAGs). Only enzymes for which a significant change over time was found between any time point (Kruskal pair-wise comparison with Bonferroni correction) are reported. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

GT20 GT19 GT26 GT20 GT19 GT26

−1 −2.5 Escherichia RC9 CAG−110 ● −1 flexneri 38 sp000434935 4 sp002437585 3 ●

−5.0 n=34 −2 n=42 n=42

● n=116 n=126 n=126

● ● Corynebacterium Prevotella Oscillibacter −2 −3 19 4 3 −7.5 ● xerosis sp002251365 sp900066435 ● ● ● ● ● ● ● ● −4 ● −10.0 −3 Bacteroides_B Phascolarctobacterium_A Phascolarctobacterium_A vulgatus 15 succinatutens 5 succinatutens 3

● ● −5

GT101 GT3 GT27 GT101 GT3 GT27

● ● ●

−2 −3 −3 Lactobacillus Prevotella Prevotella johnsonii 22 sp000834015 7 copri 6 n=40 −3 n=39 n=25 n=122 n=126 n=125

● −6 −5 ● ● −4 ● Clostridium Prevotella RC9 ● ● ● 5 6 10 ● ● sp000435835 sp002251365 sp000434935 ● ● ● ● ● ● ● ● ● ● ● −9 ● −5 −7 ● ● ● ●

● ● ● ● Streptococcus Prevotella Prevotella ● −6 ● ● 4 7 6 ● ● suis_P sp002251435 sp002251365 −12 −9 GT14 GT28 GT2 GT14 GT28 GT2 ● ● ● ● ● ● −2 ● ● 2.0 Prevotella Phascolarctobacterium_A Gemmiger 0.0

n=40 n=42 n=42 4 1 3 −3 1.5 copri succinatutens formicilis ●

n=126 n=126 n=126 ● ● ● ● ● ● 1.0 ● ● ● −4 ● ● ● ● −0.5 ● OEMS01 CAG−83 Prevotella ● ● ● ● ● ● 6 1 2 ● ● 0.5 sp900199405 sp003487665 sp002251435 ● ● ● −5 ● ●

enzymeID 0.0 −1.0 −6 ● ● UBA738 UBA2862 Blautia_A sp003522945 4 sp900317565 1 wexlerae 2 GT23 GT10 GT25 −2

● GT23 GT10 GT25 ● −2.5

−4

n=2 −3 −5.0 n=31 n=14 Bacteroides_B 11 CAG−45 10 Succinivibrio 11 −6 n=119 n=125 ● n=101 vulgatus sp002299665 sp900316375 −6 ● −7.5 ● ● ● ● −8 ● ● Prevotella Gemmiger Prevotella ● ● ● −10.0 ● 10 14 12 ● ● sp002251385 sp003476825 copri ● ● ● ● −9 ● ● ● ● −10 ● ● ● ● ● ● ● ● ● ● −12.5 CAG−791 CAG−110 RUG343 sp000431495 8 sp002437585 8 sp900317975 7 GT104 GT11 GT1 −1 −2.5 ● GT104 GT11 GT1 −2 −2.5 −5.0 n=5 n=42 n=42 −3 n=108 n=126 n=123 Treponema_D 24 Gemmiger 5 Clostridium_P 15 ● porcinum sp003476825 ventriculi −7.5 −5.0 ●

● ● ● −4 ●

● ●

● −10.0 ● ● ● ● Succinivibrio CAG−791 CAG−110 ● −5 ● ● ● ● 11 6 8 ● ● −7.5 ● sp900316375 sp000431495 sp002437585

● ● −12.5 ● −6 ● Treponema_D Gemmiger Oscillibacter succinifaciens 11 formicilis 6 sp900066435 6 t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 tM tM tM ss_sows ss_sows ss_sows ss_piglets ss_piglets ss_piglets date percentage of species carrying enzyme 10 20 30

Supplementary Figure 28. GT class enzymes time trend and species representation.

Log-transformed normalized abundance of enzymes over time is represented on the left panels. Time trend is visualized from time point 0 (t0) (week 1) to time point 8 (t8) (week 4) for enzymes in the piglet population. Time point tM is the single time point of the mothers. The number of subjects (piglets and mothers) carrying each enzyme is reported on the left of the boxplots within each panel. Panels on the right report the top three species carrying each enzyme in their MAGs (based on GTDB taxonomic clustering of MAGs). Only enzymes for which a significant change over time was found between any time point (Kruskal pair-wise comparison with Bonferroni correction) are reported. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

GT66 GT76 GT32 GT66 GT76 GT32

● ●

● −2.5 ● ● Methanobrevibacter_A OEMS01 Prevotella −2 −1 gottschalkii 27 sp900199405 8 copri 3

−5.0 n=40 n=42 n=42 ● ● ● n=126 −4 n=126 ● −2 n=126 ● Methanobrevibacter_A F23−B02 Lactobacillus ● 25 8 2 ● ● smithii sp002472405 amylovorus −7.5 ● ● ● ● ● ● −6 ● −3 ● ● ● ● −10.0 ● ● ● Campylobacter Phil1 Lactobacillus ● 9 10 2 ● ● ● −4 sp002139915 sp001940855 delbrueckii_A ● −8 ● ● −12.5

GT56 GT81 GT53 GT56 GT81 GT53

● −2 ● ● ● ● Escherichia Corynebacterium Corynebacterium −5.0 n=8 44 5 27 ● flexneri xerosis stationis

−5 n=34 n=42 n=67 ●

−4 ● n=118 n=126 −7.5 CAG−485 Methanobrevibacter_A Corynebacterium sp002361155 8 gottschalkii 15 xerosis 47 ● ● −6 ● −10.0 ● ● −10 ● ● ● ●

● Enterobacter Methanobrevibacter_A Dietzia ● −12.5 4 14 19 ● −8 ● himalayensis smithii sp002441635

GT6 GT8 GT41 GT6 GT8 GT41 ●

0 ●

−3 n=4 Gemmiger Lactobacillus Polynucleobacter

n=35 n=42 −5 17 3 34 −1 variabile johnsonii rarus

n=122 ● n=126 n=102

● −6 ● ● ● ● ● ● ● −2 ● ● ● Gemmiger 34 CAG−245 3 Anaerovibrio 19 ● formicilis sp000435175 sp002100115 ● ● ● −10 −9 ● ● ● ● −3 ● enzymeID ● ● ●

● ● ● Gemmiger Lactobacillus Neisseria sp003476825 17 amylovorus 3 meningitidis_B 12 GT39 GT30 GT35

● −1 −1 GT39 GT30 GT35 ● 0.0 ● −2

n=42 n=42 n=42 Gemmiger Phascolarctobacterium_A Gemmiger −2 6 6 2 n=126 n=126 −0.5 n=126 formicilis succinatutens formicilis −3

● ● ● ● ● ● ● ● −3 −4 −1.0 ● Phil1 Prevotella CAG−110 ● ● 4 6 1 ● sp001940855 sp002251435 sp002437585 ● ●

● ● −5

● ● −4 −1.5 UBA11524 Prevotella Blautia_A sp000437595 4 sp000834015 5 wexlerae 2 GT51 GT4 GT5

● ● ● 0.0 GT51 GT4 GT5 0.4 1 −0.5 n=42 n=42 n=42

n=126 n=126 n=126 Prevotella Gemmiger CAG−110 ● ● 1 3 2 ● ● ● ● sp002251435 formicilis sp002437585 0.0 ● ● ● ● ● ● −1.0 ●

● ● ● 0 ● ● ● ● ● Prevotella Gemmiger Blautia_A −0.4 −1.5 1 2 2 ● ● ● sp000834015 sp003476825 wexlerae

● ● RC9 CAG−110 CAG−83 sp000434935 1 sp002437585 4 sp003487665 1 t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 tM tM tM ss_sows ss_sows ss_sows ss_piglets ss_piglets ss_piglets date percentage of species carrying enzyme 10 20 30 40

Supplementary Figure 29. GT class enzymes time trend and species representation (part 2).

Log-transformed normalized abundance of enzymes over time is represented on the left panels. Time trend is visualized from time point 0 (t0) (week 1) to time point 8 (t8) (week 4) for enzymes in the piglet population. Time point tM is the single time point of the mothers. The number of subjects (piglets and mothers) carrying each enzyme is reported on the left of the boxplots within each panel. Panels on the right report the top three species carrying each enzyme in their MAGs (based on GTDB taxonomic clustering of MAGs). Only enzymes for which a significant change over time was found between any time point (Kruskal pair-wise comparison with Bonferroni correction) are reported. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

GT89 GT87 GT85 GT89 GT87 GT85

−2.5 ● ●

● CAG−45 Corynebacterium Corynebacterium ● −5.0 ● 61 57 54 ● ● ● sp002299665 xerosis xerosis n=9 −5 −5.0 n=29 n=68 n=66 n=10 ● ● ● ● ● ● n=104 −7.5 Corynebacterium Corynebacterium Corynebacterium −7.5 stationis 23 stationis 27 stationis 31

● −10.0 −10 ● −10.0 Dietzia Dietzia Dietzia ● 5 10 6 ● sp002441635 sp002441635 sp002441635 ● −12.5 ●

GT83 GT9 GT94 GT83 GT9 GT94

● ● −2 −1 ● −1 ● Phascolarctobacterium_A Escherichia RC9 ● 8 5 16 ● succinatutens flexneri sp000434935 n=42 n=41 n=31 −2 −2 n=126 n=126 −4 n=126 −3 Megasphaera Phascolarctobacterium_A RC9 −3 6 9 11 ● ● elsdenii succinatutens sp000432655 ● ● ● ● ● ● −4 −6 ● ● ● ● −4 ● ● ● ● ● ● ● Prevotella Anaerovibrio RC9 ● ● 3 4 7 ● −5 ● ● sp000834015 sp002100115 sp000433355

GT46 GT22 GT103 −2.5 GT46 GT22 GT103 ● −2.5

−5.0 ●

n=8 n=6 ● ● −5.0 n=6 ● UBA1777 F082 Holdemanella ●

n=98 n=37 n=40 14 46 44 −5.0 ● ● ● sp002320035 sp900315175 biformis ● −7.5 ● −7.5 −7.5 CAG−110 Alistipes CAG−269 ● 9 16 4 ● −10.0 ● −10.0 sp002437585 senegalensis sp001916055 ● −10.0 ● ●

● ● enzymeID ● ● ● ● −12.5 −12.5 ● −12.5 Bacteroides Prevotella CAG−269 fragilis 7 sp002251435 20 sp003525075 4 GT92 GT90 GT107 ● −2.5 GT92 GT90 GT107 −2.5 ●

n=2 −5.0 n=3 −5 n=35 n=26 RC9 Prevotella Bilophila −5.0 7 12 24 n=120 n=119 ● sp000434935 sp002251435 wadsworthia −7.5 ●

● ● ● ● −7.5 ● ● −10.0 Prevotella Prevotella Campylobacter_D ● −10 11 14 14 ● ● ● ● sp000434975 sp000436035 coli ● ● ● ● ● ● ●

● ● −10.0 ● ● −12.5 ● ● ● ● Prevotella Alistipes Escherichia sp900290275 10 shahii 8 flexneri 19 GT21 GT12 GT84

−4 ● ● −2.5 ● GT21 GT12 GT84 −6 −5.0 n=5 n=4 n=59 n=46 n=40 −5.0

−8 n=123 Desulfovibrio CAG−273 UBA2862 −7.5 sp000403945 35 sp003507395 18 sp900317565 6 ● ● ● ● ● ● −10 ● ● ● ● −7.5 ● ● −10.0 ● ● ● Desulfovibrio OEMS01 CAG−269 −12 ● 29 15 5 ● ● sp900319575 sp900199405 sp001916065 ●

−10.0 ● −14 −12.5 ● Desulfovibrio UBA791 CAG−269 piger 29 sp002297005 8 sp001916055 5 t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 tM tM tM ss_sows ss_sows ss_sows ss_piglets ss_piglets ss_piglets date percentage of species carrying enzyme 20 40 60

Supplementary Figure 30. GT class enzymes time trend and species representation (part 3).

Log-transformed normalized abundance of enzymes over time is represented on the left panels. Time trend is visualized from time point 0 (t0) (week 1) to time point 8 (t8) (week 4) for enzymes in the piglet population. Time point tM is the single time point of the mothers. The number of subjects (piglets and mothers) carrying each enzyme is reported on the left of the boxplots within each panel. Panels on the right report the top three species carrying each enzyme in their MAGs (based on GTDB taxonomic clustering of MAGs). Only enzymes for which a significant change over time was found between any time point (Kruskal pair-wise comparison with Bonferroni correction) are reported. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

GH123 GH120 GH101 GH123 GH120 GH101

● Firm−10 CAG−791 An172 −2.5 −2.5 5 8 38 −2 ● sp001603025 sp000431495 sp002160515 −5.0 −5.0 RC9 Fusicatenibacter CAG−56 ● ● ● 3 7 9 ● ● −4 ● −7.5 n=6 ● sp000434935 saccharivorans sp900066615 ● ● ● −7.5 ● Dorea Blautia_A Clostridium_P −10.0 ● −6 n=40 n=39 ● 3 8 7 ● −10.0 ● −12.5 ● formicigenerans wexlerae perfringens n=126 n=124 n=117

GH1 GH10 GH136 GH1 GH10 GH136

● ●

● ● ● ● ● ● Catenibacterium Firm−10 An172 1 ● −2 −2.5 9 8 10 0 mitsuokai sp001603025 sp002160515 ● ● ● Solobacterium Prevotella Eubacterium_R ● −5.0 ● ● ● −1 ● −4 ● ● 6 8 10 ● ● ● ● ● sp900315065 sp002251435 coprostanoligenes ● −7.5 −2 n=42 n=38 n=12 UBA636 Prevotella Gemmiger −6 −10.0 ● 6 6 8 −3 n=126 n=125 n=126 sp002299675 sp002251295 formicilis

GH103 GH110 GH109 GH103 GH110 GH109 −2.5 ● ● 0 ● Escherichia Eubacterium_R Firm−10 −2 −1 −5.0 flexneri 26 coprostanoligenes 6 sp001603025 5 ● ● −7.5 ● −4 −2 ● ● ● ● Polynucleobacter Prevotella UBA2862 ● ● ● ● ● ● ● 7 5 3 −10.0 ● ● ● ● rarus sp002251365 sp900317565 n=36 n=28 −3 n=42 ● −6 ● ● −12.5 ● ● ● Succinivibrio RC9 UBA11524 n=120 n=126 n=126 sp900316375 8 sp000433355 5 sp000437595 4 GH102 GH13 GH125 GH102 GH13 GH125 ● ● 1.6 ● ● ● ● −2.5 ● −2 ● ● −5.0 1.2 Escherichia 26 Blautia_A 3 RC9 7 −7.5 ● ● flexneri wexlerae sp000434935 0.8 ● −4 ● ● ● ● ● ● ● ● Bilophila Gemmiger Prevotella ● ● ● −10.0 ● ●

n=28 n=42 ● n=39 12 2 4 ● ● ● ● 0.4 wadsworthia formicilis sp000834015 −12.5 ● ● −6

n=121 n=126 n=126 Polynucleobacter CAG−110 UBA1417 rarus 9 sp002437585 2 sp003531055 3 GH133 GH115 GH106 0 −1 ● GH133 GH115 GH106 −2 ● −2 −2 ● ● Prevotella Prevotella UBA2862 −3 −4 ● ● ● ● 4 8 7 ● ● ● ● ● ● sp002251435 copri sp900315585 −4 −4 ● ● −6 ● Prevotella Prevotella UBA11524 −5 n=41 n=31 ● n=39 ● 3 10 6 ● ● −6 ● sp002251365 sp002251435 sp000437595 n=126 n=126 n=126 RC9 Prevotella Prevotella sp000434935 4 sp002251295 6 sp002251385 4 enzymeID GH129 GH130 GH112

● −1 −2 −1 ● ● ● −2 GH129 GH130 GH112 −3 −2 −3 ● ● ● ● −4 ● −3 ● ● −4 ● Firm−10 UBA11524 Eubacterium_R ● ● ● ● ● ● ● ● 11 4 6 −4 ● ● −5 −5 ● sp001603025 sp000437595 coprostanoligenes ● n=38 ● ● n=38 n=33 ● −5 −6 Solobacterium Fusicatenibacter Blautia_A −6 ● ● 6 4 6 n=125 n=126 n=126 sp900315065 saccharivorans wexlerae GCA−900066135 RC9 OEMS01 sp900066135 6 sp000434935 5 sp900199405 5 GH105 GH116 GH127 ● −2.5 0 ● ● ● −2 −5.0 −1 GH105 GH116 GH127 ● ● ● −2 −7.5 ● ● ● ● ● ● −4 ● −3 Prevotella Firm−10 Firm−10 ● ● ● 5 14 4 ● ● ● ● −10.0 ● n=40 n=10 −4 n=41 sp002251435 sp001603025 sp001603025 ● −6 −12.5 ● −5 UBA11524 UBA2862 UBA11524 n=126 n=117 n=126 sp000437595 10 sp900317565 12 sp000437595 5 Gemmiger 5 UBA11524 9 Blautia_A 3 GH104 GH128 GH113 sp003476825 sp000437595 wexlerae

● ● ● ● ● −2.5 ● −5 −5.0 −5.0 GH104 GH128 GH113 ● ● ● −7.5 ● ● ● n=2 n=5 −7.5 ● ● ● ● ● −10 ● ● ● Escherichia Prevotella Gemmiger ● ● ● n=60 −10.0 n=16 ● 57 15 19 ● ● −10.0 ● ● ● ● ● ● ● ● flexneri sp002300055 variabile

n=117 −12.5 n=123 Bifidobacterium CAG−279 UBA7102 thermacidophilum 2 sp000437795 9 sp002492415 7 Escherichia 3 Prevotella 11 Bifidobacterium 7 GH108 dysenteriae sp002251365 thermacidophilum t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 tM tM

● −2.5 ss_sows ss_sows −5.0 ss_piglets ss_piglets GH108 −7.5

● ● ● −10.0 ● ●

n=26 Prevotella ● ● ● 15 −12.5 ● sp000436915 n=109 Desulfovibrio piger 7 Desulfovibrio sp000403945 5 t0 t2 t4 t6 t8 tM ss_sows ss_piglets date percentage of species carrying enzyme 10 20 30 40 50

Supplementary Figure 31. GH class enzymes time trend and species representation.

Log-transformed normalized abundance of enzymes over time is represented on the left panels. Time trend is visualized from time point 0 (t0) (week 1) to time point 8 (t8) (week 4) for enzymes in the piglet population. Time point tM is the single time point of the mothers. The number of subjects (piglets and mothers) carrying each enzyme is reported on the left of the boxplots within each panel. Panels on the right report the top three species carrying each enzyme in their MAGs (based on GTDB taxonomic clustering of MAGs). Only enzymes for which a significant change over time was found between any time point (Kruskal pair-wise comparison with Bonferroni correction) are reported. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

GH20 GH23 GH154 GH20 GH23 GH154 1 −1 Prevotella Escherichia OEMS01 0 ● ● 5 3 8 ● 0 ● −2 sp000834015 flexneri sp900199405 −1 −1 −3 Prevotella RC9 UBA11524 ● 4 2 6 ● ● ● ● ● sp002251385 sp000434935 sp000437595 −2 ● ● −4 ● ● ● −2 ● ● Prevotella Prevotella Gemmiger ● ● −3 n=42 n=42 −5 n=40 ● 4 3 6 −3 ● ● sp002251365 sp002251435 formicilis

−4 n=126 n=126 n=126

GH153 GH2 GH139 GH153 GH2 GH139

● −2 ● ● 1 ● −4 Escherichia UBA11524 UBA7102 −5 flexneri 56 sp000437595 2 sp002492415 21 −6 ● 0 ● ● Phascolarctobacterium_A Prevotella Prevotella ● ● ● ● ● ● ● ● n=3 ● −8 ● ● ● 24 2 21 ● ● succinatutens sp002251365 sp002251295 −10 ● ● ● ● −1 n=14 ● ● n=42 −10 ● Oxalobacter Prevotella Prevotella ● ● ● ● ● ● 5 3 13

n=101 n=126 n=122 formigenes_B sp002251435 sp000834015

GH141 GH146 GH159 GH141 GH146 GH159 ● ● −3 −2 −5 Prevotella Firm−10 RC9 −3 sp002251295 7 sp001603025 6 sp001915575 12 −6 ● ● ● ● ● ● −4 ● Firm−10 UBA11524 OEMS01 ● ● ● ● ● ● ● −10 ● ● ● 11 6 15 ● ● ● ● −9 −5 ● ● ● ● ● sp001603025 sp000437595 sp900199405 n=19 ● n=38 ● n=11 ● ● ● ● ● ● ● Prevotella Prevotella UBA11524

n=124 −6 n=125 n=106 sp000834015 8 sp002251435 7 sp000437595 14 GH137 GH148 GH165

● −3 GH137 GH148 GH165 ● −3 ● −4 −6 Prevotella UBA1067 RUG563 ● −6 16 20 11 −6 ● ● ● ● ● sp002251295 sp900320855 sp900320085 ● −9 ● ● ● ● n=5 ● ● ● ● ● ● ● ● ● ● −8 ● ● UBA11524 RUG369 UBA644 ● ● ● ● −9 ● ● ● ● ● n=15 n=99 n=15 8 17 10 ● ● −12 ● ● ● ● ● ● sp000437595 sp900317365 sp002299265 ● ● ● −10 ● ●

n=119 n=122 RUG369 Prevotella Bacteroides_B sp900317365 8 sp002251365 9 vulgatus 5 GH158 GH138 GH163

−2.5 ● −2.5 ● GH158 GH138 GH163 ● −2.5 −5.0 −5.0 −5.0 Prevotella Prevotella UBA1067 ● 18 31 5 −7.5 −7.5 ● ● n=9 n=3 ● ● ● ● ● ● ● ● ● sp002409785 sp002251295 sp900320855 ● ● ● ● ● ● ● ● −7.5 ● ● ● ● RC9 Prevotella UBA2862 −10.0 −10.0 ● ● ● n=25 ● ● ● ● 10 18 4 ● ● ● ● sp000432655 sp000834015 sp900317565 n=110 n=116 n=124 RC9 Bacteroides_B Prevotella sp000434935 16 vulgatus 13 sp002251385 8 enzymeID GH156 GH16 GH143

● −2.5 0 ● ● GH156 GH16 GH143 −5.0 −1 −4 −2 −6 ● Firm−10 RUG369 Prevotella −7.5 ● ● ● ● ● ● ● ● ● ● ● −3 −8 ● 24 4 13 ● ● ● ● ● ● ● ● sp001603025 sp900317365 sp002251295 −10.0 ● ● ● ● ●

n=16 −4 n=42 n=20 ● ● ● ● ● ● −10 UBA11524 RC9 UBA11524 ● ● −5 ● ● 13 3 22 n=113 n=126 n=121 sp000437595 sp000434935 sp000437595 Prevotella Prevotella Prevotella sp002251385 13 sp002251365 3 sp002251385 10 GH15 GH142 GH151

● ● ● −3 ● −2.5 −4 GH15 GH142 GH151 −5.0 ● ● ● −6 ● −6 ● ● ● ● −7.5 ● ● ● n=6 ● CAG−269 Treponema_D Firm−10 ● ● −8 ● ● ● ● ● ● ● ● ● ● ● 14 17 15 ● ● ● ● ● −9 ● ● −10.0 ● ● ● ● ● n=24 ● ● n=19 ● sp000431335 porcinum sp001603025 ● ● −10 ● ● ● −12.5 ● ● CAG−245 OEMS01 UBA2862 n=123 n=120 n=118 sp000435175 13 sp900199405 6 sp900317565 16 CAG−269 13 UBA11524 10 UBA11524 7 GH140 GH144 GH147 sp001916055 sp000437595 sp000437595

● −2.5 −2.5 ● −3 −5.0 GH140 GH144 GH147 −5.0 ● ● ● −6 ● −7.5 ● ● −7.5 ● n=3 ● ● ● ● Prevotella RC9 Prevotella −10.0 ● ● ● ● n=23 ● −9 n=33 ● ● 7 13 17 ● −10.0 ● ● ● ● sp002251295 sp002438635 sp002409785 ● ● −12.5 ● ● n=123 n=121 n=103 UBA7748 RC9 CAG−1031 sp900314535 12 sp002314885 11 sp000431215 13 UBA11524 6 RC9 7 UBA7173 11 GH18 sp000437595 sp900318565 sp001701135 t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 tM tM

● ● ● ss_sows ss_sows

−1 ss_piglets ss_piglets GH18 −2 ● ● ● ● ● −3 n=42 Oscillibacter 5 ● sp900066435 n=126 Blautia_A wexlerae 4 CAG−45 sp002299665 2 t0 t2 t4 t6 t8 tM ss_sows ss_piglets date percentage of species carrying enzyme 10 20 30 40 50

Supplementary Figure 32. GH class enzymes time trend and species representation (part 2).

Log-transformed normalized abundance of enzymes over time is represented on the left panels. Time trend is visualized from time point 0 (t0) (week 1) to time point 8 (t8) (week 4) for enzymes in the piglet population. Time point tM is the single time point of the mothers. The number of subjects (piglets and mothers) carrying each enzyme is reported on the left of the boxplots within each panel. Panels on the right report the top three species carrying each enzyme in their MAGs (based on GTDB taxonomic clustering of MAGs). Only enzymes for which a significant change over time was found between any time point (Kruskal pair-wise comparison with Bonferroni correction) are reported. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

GH33 GH42 GH25 GH33 GH42 GH25

0 ● ● ● Firm−10 UBA11524 CAG−110 ● −1 ● 6 7 2 ● 0.5 ● sp001603025 sp000437595 sp002437585 −2 ● UBA11524 Ruminococcus_A Gemmiger ● 0.0 −2 ● ● ● ● ● 4 4 3 ● ● ● ● ● ● ● ● −0.5 sp000437595 sp003011855 formicilis −4 −3 ● ● Prevotella Blautia_A Prevotella ● n=42 n=42 −1.0 n=42 ● ● ● 3 7 2 ● ● ● sp002251385 wexlerae copri

n=126 n=126 −1.5 n=126

GH32 GH29 GH38 GH32 GH29 GH38

● 1 ● 0 ● ● Blautia_A Firm−10 UBA2862 ● −1 0 ● ● ● ● 4 4 6 −1 −1 −2 wexlerae sp001603025 sp900317565 ● Catenibacterium UBA11524 UBA11524 ● ● −2 ● ● ● ● −3 2 4 6 −2 −3 ● ● mitsuokai sp000437595 sp000437595 ● n=42 ● n=42 −4 n=42 Fusicatenibacter Prevotella Eubacterium_R −3 ● −4 2 3 4 n=126 n=126 n=126 saccharivorans sp002251385 coprostanoligenes

GH30 GH5 GH26 GH30 GH5 GH26 ● ● 0 ● ● −1 ● −2 −2 Prevotella RC9 Prevotella −2 ● ● ● 5 3 7 ● sp002251365 sp000434935 sp000434515 −3 ● ● ● −4 ● ● ● ● ● ● ● ● ● ● Eubacterium_R Prevotella RC9 −4 ● ● −4 ● ● coprostanoligenes 4 sp002251295 4 sp000434935 7 −5 n=32 n=41 −6 n=37 ● ● Bacteroides_B Prevotella Prevotella n=126 n=126 n=125 vulgatus 4 sp002251435 5 sp002251295 6 GH37 GH51 GH27

0 ● ● GH37 GH51 GH27 ● −1 ● ● ● ● ● −1 −3 −2 ● Escherichia RC9 RC9 −2 ● ● ● 37 4 3 −6 ● −3 ● flexneri sp000432655 sp000434935 ● ● ● ● ● ● −3 ● ● ● ● ● ● −4 RC9 UBA11524 UBA11524 ● ● ● ● ● −9 ● −4 n=39 n=42 n=39 6 5 4 ● −5 sp900320185 sp000437595 sp000437595 ● ● −5

n=124 n=126 n=126 RC9 Prevotella UBA2862 sp002439805 6 sp002251435 4 sp900315585 3 GH36 GH35 GH50

● ● ● 0.0 0 ● ● GH36 GH35 GH50 ● −2 −0.5 −2 −1.0 −4 UBA11524 UBA11524 RC9 ● 3 7 7 ● sp000437595 sp000437595 sp000433355 −1.5 ● ● ● ● ● ● −4 ● −6 ● ● ● ● ● ● ● ● ● ● ● Blautia_A Prevotella RC9 n=42 ● n=41 n=27 ● ● −2.0 ● ● ● 4 4 14 ● −6 −8 ● wexlerae sp002251435 sp000434935 n=126 n=126 n=124 GCA−900066135 Prevotella UBA11524 sp900066135 2 sp002251365 4 sp000437595 9 enzymeID GH4 GH43 GH31

● ● ● ● ● ● 0 ● 0 GH4 GH43 GH31 −1 −1 −1 ● ● ● ● ● Escherichia UBA11524 UBA11524 −2 ● ● ● 4 4 4 ● ● ● −2 ● ● −2 ● ● ● flexneri sp000437595 sp000437595 n=42 ● n=42 n=42 UBA11524 Prevotella Prevotella −3 ● −3 −3 ● 7 6 3 n=126 n=126 n=126 sp000437595 sp002251435 sp002251435 OEMS01 Prevotella Blautia_A sp900199405 3 sp002251365 4 wexlerae 3 GH54 GH39 GH28

● ● 0 ● −2 ● −5.0 −3 GH54 GH39 GH28 −2 ● ● ● ● −4 ● −7.5 ● ● ● ● ● ● RC9 Firm−10 Firm−10 ● ● ● ● −5 ● ● −4 77 9 6 −10.0 ● n=11 ● −6 n=41 ● n=40 sp000432655 sp001603025 sp001603025 ● ● ● RC9 UBA11524 UBA11524

n=109 −7 n=126 n=126 sp001915575 12 sp000437595 5 sp000437595 4 RC9 6 Eubacterium_H 4 Gemmiger 5 GH55 GH53 GH24 sp900167925 sp900314295 formicilis −2.5 −1 −2 −5.0 −2 −3 GH55 GH53 GH24 ● ● −7.5 −3 ● ● ● n=9 ● ● −4 ● ● ● ● ● ● ● ● −4 ● ● RC9 RC9 F23−B02 −10.0 ● ● ● ● ● ● ● n=42 −5 n=42 21 3 5 −5 ● ● sp000432655 sp000434935 sp002472405 −12.5 ● −6 ● n=117 n=126 n=125 RC9 Prevotella CAG−110 sp000435075 10 sp002251435 6 sp003525905 5 RC9 8 Prevotella 5 CAG−110 8 GH3 sp000434935 sp002251365 sp002437585 t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 tM tM

● 1.0 ● ss_sows ss_sows

0.5 ss_piglets ss_piglets 0.0 GH3 ● ● ● ● ● −0.5 ●

−1.0 n=42 CAG−110 sp002437585 3 n=126 Blautia_A wexlerae 3 Gemmiger formicilis 4 t0 t2 t4 t6 t8 tM ss_sows ss_piglets date percentage of species carrying enzyme 20 40 60

Supplementary Figure 33. GH class enzymes time trend and species representation (part 3).

Log-transformed normalized abundance of enzymes over time is represented on the left panels. Time trend is visualized from time point 0 (t0) (week 1) to time point 8 (t8) (week 4) for enzymes in the piglet population. Time point tM is the single time point of the mothers. The number of subjects (piglets and mothers) carrying each enzyme is reported on the left of the boxplots within each panel. Panels on the right report the top three species carrying each enzyme in their MAGs (based on GTDB taxonomic clustering of MAGs). Only enzymes for which a significant change over time was found between any time point (Kruskal pair-wise comparison with Bonferroni correction) are reported. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

GH8 GH84 GH95 GH8 GH84 GH95

● ● ● Lactobacillus_H Prevotella Prevotella 0 ● ● ● 6 5 4 −2 −2 reuteri sp000834015 sp002251435 ● Prevotella An172 UBA11524 ● ● −2 ● ● ● ● −4 ● ● ● 9 6 6 ● ● −4 ● ● ● ● sp002251435Escherichia sp002160515Prevotella Eubacterium_Rsp000437595 ● −4 −6 ● −6 ● ● 5 6 4 −8 ● ● flexneri sp002251365 coprostanoligenes n=42 n=31 n=40 n=126 n=126 n=126 GH92 GH89 GH63 GH92 GH89 GH63

● ● 0 ● RC9 RC9 Bacteroides_B −2 −2.5 9 6 8 −2 sp000434935Prevotella sp000434935RC9 Escherichiavulgatus ● −5.0 ● ● ● ● ● ● ● ● ● ● 6 5 9 ● ● ● ● −4 ● −4 ● ● ● sp002251385 sp000435075 flexneri ● −7.5 Prevotella Prevotella RC9 −6 −6 ● ● sp002251365 6 sp002251365 5 sp000435075 6 n=40 n=42 n=35 n=126 n=126 n=126 GH97 GH67 GH73 GH97 GH67 GH73 0 ● −2 ● 0 ● −1 −4 RC9 Prevotella Lactobacillus_H ● ● 6 7 4 −2 ● ● −1 ● ● sp000434935Prevotella sp002251295RC9 Blautia_Areuteri ● −6 −3 ● ● ● ● ● 7 11 2 ● ● −2 −4 ● −8 ● ● sp002251435RC9 sp000432655Prevotella CAGwexlerae−269 ● 6 10 2

n=38 n=32 n=42 sp000433355 sp002251435 sp000431335 n=126 GH68 n=124 GH9 n=126 GH81 GH68 GH9 GH81 ● −1 ● −3 −2 ● −5.0 Lactobacillus_H Prevotella CAG−279 −6 −3 −7.5 27 6 56 ● −4 ● reuteri sp002251295 sp000437795 ● ● ● Lactobacillus_H RC9 Clostridium_AI ● ● ● −9 ● ● ● −10.0 ● −5 ● ● 16 8 6 −12 ● −6 ● −12.5 n=5 Lactobacillusvaginalis sp000432655RC9 sp002297865Prevotella

n=40 n=39 n=88 johnsonii 22 sp000433355 4 sp000834015 7 n=123 GH57 n=126 GH66 GH85

−1 −2 ● GH57 GH66 GH85 ● −2.5 −2 ● −4 −5.0 RC9 RUG563 Prevotella ● ● −3 ● ● ● −6 ● ● ● ● ● ● ● ● ● 4 9 19 ● ● ● ● ● ● ● −7.5 ● sp000434935 sp900320085 sp000434975 −4 ● ● ● ● Prevotella CAG−1031 Prevotella −8 ● ● ● −10.0 ● 3 8 16 −5 sp002251365UBA5920 sp000431215RC9 sp000834015Prevotella n=41 n=20 n=27 sp002406055 5 sp000433355 8 sp002251385 9 n=126 GH44 n=126 GH93 n=124 GH98 −4 ● ● −2.5 ● ● GH44 GH93 GH98 −6 ● ● −5 ● −5.0 −8 ● ● ● ● ● −7.5 ● Ruminococcus_C RC9 UBA1691 ● ● −10 ● ● −10 71 39 18 −10.0 ● ● −12 ● ● ● sp000433635 sp000434935 sp002320555 n=4 n=8 ● n=5 HUN007 RC9 Ruminococcus_F −12.5 ● 3 10 19

n=45 n=74 sp002315235 sp002440085 champanellensis Fibrobacter_A 3 Prevotella 5 Acetivibrio_A 8

n=116 intestinalis sp002409785 ethanolgignens enzymeID GH88 GH78 GH76

● −2 0 −2.5 GH88 GH78 GH76

−2 ● −5.0 ● ● −4 ● ● Gemmiger UBA11524 RC9 ● ● −7.5 ● ● ● −4 ● ● 6 4 17 −6 ● −10.0 ● sp003476825OEMS01 sp000437595RC9 sp000434935Prevotella

n=35 n=42 n=25 9 3 9 sp900199405UBA11524 sp000431015UBA2862 sp000834015Prevotella

n=126 n=126 n=126 7 3 7 GH79 GH94 GH74 sp000437595 sp900315585 sp002251365

● −2.5 −3 −1 −5.0 GH79 GH94 GH74 −6 ● −2 ● ● ● −7.5 ● ● ● ● ● ● ● ● ● ● ● ● ● −3 −10.0 ● −9 ● ● Faecalibacterium UBA11524 Prevotella n=7 ● −12.5 ● ● 16 3 53 −4 prausnitzii_ECAG−110 sp000437595Blautia_A Ruminococcus_Fsp002251295 n=42 n=14 15 4 13 sp002362145Gemmiger CAGwexlerae−110 champanellensisBacteroides n=118 n=126 n=106 16 3 7 GH77 GH11 GH48 sp003476825 sp002437585 uniformis

−2.5 ● ● −4 ● 0 ● ● −5.0 ● −6 ● ● ● −1 ● ● ● ● −8 GH77 GH11 GH48 ● ● −7.5 ● ● ● −10 −2 −10.0 ● n=7 −12 n=7 ● Blautia_A Clostridium_AI Ruminococcus_C ● −12.5 2 35 41

n=42 n=56 n=47 wexlerae sp002297865 sp000433635 CAG−110 2 Eubacterium_F 17 Clostridium_AI 20

n=126 sp002437585 sp003491505 sp002297865 Gemmiger 2 Ruminococcus_F 13 Ruminococcus_F 17 GH65 GH126 GH70 formicilis champanellensis champanellensis −1 −2.5 −2 −5 −5.0 −3 −7.5 −10 GH65 GH126 GH70 −4 ● ● −10.0 ● −5 ● −12.5 n=8 Lactobacillus Lactobacillus_B Lactobacillus_H

n=42 n=83 n=20 n=59 4 22 22 UBA11524johnsonii Clostridium_Psalivarius Lactobacillus_Hsp002299975

n=126 6 19 27 sp000437595Escherichia perfringensUBA11524 Lactobacillusreuteri GH161 GH99 flexneri 5 sp000437595 13 delbrueckii_A 4 t0 t2 t4 t6 t8 tM

● −5.0 −5.0 ss_sows ss_piglets −7.5 −7.5 ● ● ● ● GH161 GH99 −10.0 −10.0 ● ● ● ● ● ● ● ●

n=3 n=2 ● ● −12.5 −12.5 ● Roseburia Firm−10 n=78 n=96 21 38 Agathobacterinulinivorans sp001603025Prevotella sp000434275 9 sp000834015 6 CAG−127 8 UBA1067 5

t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 sp900319515 sp900320855 tM tM ss_sows ss_sows ss_piglets ss_piglets date percentage of species carrying enzyme 20 40 60

Supplementary Figure 34. GH class enzymes time trend and species representation (part 4).

Log-transformed normalized abundance of enzymes over time is represented on the left panels. Time trend is visualized from time point 0 (t0) (week 1) to time point 8 (t8) (week 4) for enzymes in the piglet population. Time point tM is the single time point of the mothers. The number of subjects (piglets and mothers) carrying each enzyme is reported on the left of the boxplots within each panel. Panels on the right report the top three species carrying each enzyme in their MAGs (based on GTDB taxonomic clustering of MAGs). Only enzymes for which a significant change over time was found between any time point (Kruskal pair-wise comparison with Bonferroni correction) are reported. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.17.253872; this version posted August 17, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

SLH cohesin SLH cohesin n=8 n=41 n=84 n=126

−2.5

−2 ●

Ruminiclostridium_C Ruminococcus_C sp000435295 13 sp000433635 15

−5.0

−4

CAG−103 CAG−103 −7.5 sp000432375 8 sp900317855 12

● ● enzymeID

● ●

● ●

● −6

● −10.0

● ●

● ● Agathobaculum Eubacterium_E ● 5 9 ● butyriciproducens hallii_A

● −12.5 t0 t2 t4 t6 t8 t0 t2 t4 t6 t8 tM tM ss_sows ss_sows ss_piglets ss_piglets date percentage of species carrying enzyme 7.5 10.0 12.5 15.0

Supplementary Figure 35. Cohesin and S-layer homology (SLH) enzymes time trend and species representation.

Log-transformed normalized abundance of enzymes over time is represented on the left panels. Time trend is visualized from time point 0 (t0) (week 1) to time point 8 (t8) (week 4) for enzymes in the piglet population. Time point tM is the single time point of the mothers. The number of subjects (piglets and mothers) carrying each enzyme is reported on the left of the boxplots within each panel. Panels on the right report the top 3 species carrying each enzyme in their MAGs (based on GTDB taxonomic clustering of MAGs). Only enzymes for which a significant change over time was found between any time point (Kruskal pair-wise comparison with Bonferroni correction) are reported.