<<

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Succession of longum strains in response to the changing early-life

2 nutritional environment reveals specific adaptations to distinct dietary substrates. 3

4 Magdalena Kujawska1, Sabina Leanti La Rosa2, Phillip B. Pope2,3, Lesley Hoyles4, Anne L. 5 McCartney5, Lindsay J Hall1

6 1 Gut Microbes & Health, Quadram Institute Biosciences, Norwich Research Park, Norwich, UK

7 2 Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Aas, 8 Norway

9 3 Faculty of Biosciences, Norwegian University of Life Sciences, Aas, Norway

10 4 Department of Biosciences, Nottingham Trent University, Nottingham, UK

11 5 Department of Food & Nutritional Sciences, University of Reading, Reading, UK

12

13 Abstract

14 Diet-microbe interactions play a crucial role in infant development and modulation of the early-life 15 microbiota. The genus Bifidobacterium dominates the breast-fed infant gut, with strains of B. 16 longum subsp. longum (B. longum) and B. longum subsp. infantis (B. infantis) particularly prevalent. 17 Although transition from milk to a more diversified diet later in infancy initiates a shift to a more 18 complex microbiome, specific strains of B. longum may persist in individual hosts for prolonged 19 periods of time. Here, we sought to investigate the adaptation of B. longum to the changing infant 20 diet. Genomic characterisation of 75 strains isolated from nine either exclusively breast- or formula- 21 fed (pre-weaning) infants in their first 18 months revealed subspecies- and strain-specific intra- 22 individual genomic diversity with respect to glycosyl hydrolase families and enzymes, which 23 corresponded to different dietary stages. Complementary phenotypic growth studies indicated 24 strain-specific differences in human milk oligosaccharide and plant utilisation profiles 25 of isolates between and within individual infants, while proteomic profiling identified active 26 polysaccharide utilisation loci involved in metabolism of selected . Our results indicate 27 a strong link between infant diet and B. longum subspecies/strain genomic and carbohydrate 28 utilisation diversity, which aligns with a changing nutritional environment: i.e. moving from breast 29 milk to a solid food diet. These data provide additional insights into possible mechanisms 30 responsible for the competitive advantage of this Bifidobacterium species and its long-term

1

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

31 persistence in a single host and may contribute to rational development of new dietary therapies for 32 this important developmental window.

33

34 Keywords: Bifidobacterium longum, infant diet, carbohydrates, genomics, proteomics

35

36 Introduction

37 Microbial colonisation shortly after birth is the first step in establishment of the mutualistic 38 relationship between the host and its microbiota (1-3). The microbiota plays a central role in infant 39 development by modulating immune responses, providing resistance to pathogens, and also 40 digesting the early-life diet (4-10). Indeed, diet-microbe interactions are proposed to play a crucial 41 role during infancy and exert health effects that extend to later life stages (11-16). The 42 of vaginally delivered full-term healthy infants harbours a relatively simple 43 microbiota characterised by the dominance of the genus Bifidobacterium (17).

44 Breast milk is considered the gold nutritional standard for infants, which also acts as an important 45 dietary supplement for early-life microbial communities, including Bifidobacterium. The strong diet- 46 microbe association has further been supported by reports of differences in microbial composition 47 between breast- and formula-fed infants (e.g. high versus low Bifidobacterium abundance) and 48 related differential health outcomes between the two groups: e.g. increased instances of asthma, 49 allergy and obesity in formula-fed infants (18-24).

50 The high abundance of Bifidobacterium in breast-fed infants has been linked to the presence of 51 specific carbohydrate utilisation genes and polysaccharide utilisation loci (PULs) in their genomes, 52 particularly the ones involved in the degradation of breast milk-associated human milk 53 oligosaccharides (HMOs) (8). The presence of these genes is often species- and indeed strain- 54 specific, and has been described in B. breve, B. bifidum, B. longum, B. infantis, and more rarely in B. 55 pseudocatenulatum (8, 25-27). However, previous studies have indicated co-existence of 56 Bifidobacterium species and strains in individual hosts, resulting in interaction and metabolic co- 57 operation within a single (HMO-associated) ecosystem (1, 28).

58 Transition from breastfeeding to a more diversified diet and the introduction of solid foods has been 59 considered to initiate the development of a functionally more complex adult-like microbiome with 60 genes responsible for degradation of plant-derived complex carbohydrates, starches, and 61 xenobiotics, as well as production of vitamins (29, 30). Non-digestible complex carbohydrates such 62 as inulin-type fructans (ITF), arabino-xylans (AX) or arabinoxylo-oligosaccharides (AXOS) in

2

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

63 complementary foods have been proposed to potentially exert beneficial health effects through 64 their bifidogenic and prebiotic properties and resulting modulation of the intestinal microbiota and 65 metabolic end-products (31-34).

66 Despite the shift in microbiota composition during weaning, specific strains of Bifidobacterium, and 67 B. longum in particular, have previously been shown to persist in individuals over time (35, 36). B. 68 longum is currently recognised as four subspecies: longum and infantis (characteristic of the human 69 ), and suis and suillum (from animal hosts) (37, 38). It is considered the most common 70 and prevalent species found in the human gut, with B. longum subsp. infantis detected in infants, 71 and B. longum subsp. longum widely distributed in both infants and adults (39, 40). The differences 72 in prevalence between the two subspecies, and the ability of infant, adult and elderly host to acquire 73 new B. longum strains during a lifetime have been attributed to distinct bacterial carbohydrate 74 utilisation capabilities and the overall composition of the resident microbiota (41, 42). However, 75 longitudinal assessments of this species in single hosts over the course of changing dietary patterns 76 are limited, and therefore further detailed studies are required.

77 Here, we investigate the adaptations of Bifidobacterium to the changing infant diet and examine a 78 unique collection of B. longum strains isolated from nine infants across their first 18 months. We 79 probed the genomic and phenotypic similarities between 62 B. longum strains and 13 B. infantis 80 strains isolated from either exclusively breast-fed or formula-fed infants (pre-weaning). Our results 81 indicate a strong link between host diet and Bifidobacterium species/strains, which appears to 82 correspond to the changing nutritional environment. Genome flexibility of B. longum and nutrient 83 preferences of specific strains may aid their establishment within individual infant hosts, and their 84 ability to persist through significant dietary changes. These dietary changes (moving from breast milk 85 to solid food) may also encourage acquisition of new B. longum sp. strains with different nutritional 86 preferences. Overall, our findings provide important additional insights into mechanisms responsible 87 for adaptation to a changing nutritional environment and long-term persistence within the early life 88 gut.

89

90 Results

91 Previous investigations into B. longum across the human lifespan have determined a broad 92 distribution of this species, including prolonged periods of colonisation (35, 36). To gain insight into 93 potential mechanisms facilitating these properties during the early-life window, we investigated the 94 genotypic and phenotypic characteristics of B. longum strains within individual infant hosts in 95 relation to diet (i.e. breast milk vs formula) and dietary stages (i.e. pre-weaning, weaning and post-

3

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

96 weaning), following up on a longitudinal study of the infant faecal microbiota (43, 44). Faecal 97 samples from exclusively breast-fed infants and exclusively formula-fed infants were collected 98 regularly from 1 month to 18 months of age (43). The number of samples obtained from the breast- 99 fed infants during the pre-weaning period was higher than that obtained from the formula-fed 100 group, which may correlate with differences in weaning age (~20.6 vs. ~17 weeks old). Bacterial 101 isolation was carried out on faecal samples, and the isolated colonies identified using ribosomal 102 intergenic spacer analysis (44). Based on these results, 88 isolates identified as Bifidobacterium were 103 selected for this study, 46 from five exclusively breast-fed infants (BF1-BF5, including identical twins 104 BF3 and BF4) and 42 from four exclusively formula-fed infants (FF1-FF3 and FF5). Following 105 sequencing and ANI analysis (Supplementary Tables S1 & S2), 75 strains were identified as B. 106 longum sp. and included in further analysis, with 62 strains identified as B. longum subsp. longum (B. 107 longum) and 13 strains identified as B. longum subsp. infantis (B. infantis) (Figure 1a).

108

109 General features of B. longum genomes

110 To determine possible genotypic factors facilitating establishment and persistence of B. longum in 111 the changing early-life environment, we assessed the genome diversity of our strains. Sequencing 112 generated between 12 and 193 contigs for each B. longum strain, with 98.6% of draft genomes 113 (n=74) containing fewer than 70 contigs and one draft genome containing 193 contigs, yielding a 114 mean of 66.95-fold coverage for strains sequenced on HiSeq (minimum 46-fold, maximum 77-fold) 115 and 231-fold for the strain sequenced on MiSeq (Supplementary Table S1). The predicted genome 116 size for strains identified as B. longum ranged from 2.21 Mb to 2.58 Mb, possessing an average 117 G+C% content of 60.11%, an average predicted ORF number of 2,023 and number of tRNA genes 118 ranging from 55-88. For strains identified as B. infantis, the predicted genome size ranged from 2.51 119 Mb to 2.75 Mb, with an average G+C% content of 59.69%, an average predicted ORF number of 120 2,280 and the number of tRNA genes ranging from 57 to 62.

121

122 Comparative genomics

123 To identify B. longum strains among the sequenced isolates and assess the nucleotide-level genomic 124 differences between isolates, we subjected their genomes to ANI analysis. Results (Supplementary 125 Table S2) indicated that B. longum strains isolated from individual infant hosts displayed higher 126 levels of sequence identity than strains isolated from different hosts. More specifically, pairwise 127 identity values for strains isolated from infant BF3 showed the narrowest range (average value of

4

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

128 99.99±3.15e-5%), followed by infant FF2 strains (99.98±1.12e-4%), with infant BF2 strains having the 129 broadest identity value range (averaging 99.13±7.8e-3%).

130 Next, we examined genetic diversity of newly sequenced B. longum strains and their relatedness to 131 each other, and B. longum type strains, namely B. longum subsp. longum JCM 1217T, B. longum 132 subsp. infantis ATCC 15697T and B. longum subsp. suis LMG 21814T, based on the generated 133 pangenome data. This analysis identified a total of 1002 genes as core genes present in at least 99% 134 of the analysed B. longum subspecies genomes and allowed a clear distinction between B. longum 135 subspecies (i.e. longum vs. infantis) based on the presence/absence of specific genes 136 (Supplementary Table S3). Phylogenetic analysis performed on the B. longum core genome revealed 137 that B. longum strains within each subspecies clustered mainly according to isolation source, i.e. 138 individual infants, rather than dietary stage (i.e. pre-weaning, weaning and post-weaning) (Figure 139 1b). Interestingly, strains isolated from formula-fed baby FF5 clustered into two separate clusters, 140 irrespective of the isolation period, suggesting presence of two highly related B. longum groups 141 within this infant. Furthermore, strains isolated from identical twins BF3 and BF4 clustered together, 142 indicating their close relatedness.

143 We next sought to identify whether specific components of the B. longum subspecies pangenome 144 were enriched in infant hosts. Each candidate gene in the accessory genome was sequentially scored 145 according to its apparent correlation to host diet (breast vs. formula) or dietary stage. A gene 146 annotated as alpha-L-arabinofuranosidase, along with four other genes coding for hypothetical 147 proteins, were predicted to be enriched in B. longum strains isolated from breast-fed infants. Alpha- 148 L-arabinofuranosidases are enzymes involved in hydrolysis of terminal non-reducing alpha-L- 149 arabinofuranoside residues in alpha-L-arabinosides and act on such carbohydrates as (arabino)xylans 150 (45, 46). In addition, two genes coding for hypothetical proteins and a gene coding for Mobility 151 protein A were overrepresented in strains isolated from formula-fed infants. We did not find any 152 associations between specific genes and diet in B. infantis. Furthermore, no associations between 153 genes and dietary stages were found in either B. longum or B. infantis (Supplementary Table S4).

154 As our strains were isolated from individual infants at different time points, we next sought to 155 determine their intra-strain diversity; for this we used the first B. longum isolate from each infant as 156 the ‘reference’ strain to which all other strains from the same infant were compared (Figure 2). 157 Infants BF1, BF3 and FF2 had the lowest strain diversity; with respective mean pairwise SNP 158 distances of 18.7±20.3 SNPs (mean±sd), 10.3±5.0 SNPs and 13.3±5.3 SNPs. These results suggest 159 strains isolated from these infants may be clonal, indicating long-term persistence of B. longum 160 within individual infant hosts despite early-life dietary changes. Surprisingly, analysis of strains

5

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

161 isolated from breast-fed identical twins BF3 and BF4 revealed higher strain diversity in baby BF4 162 (mean pairwise SNP distance of 1034.5±1327.1 SNPs), compared to the highly similar strains in infant 163 BF3 (i.e. 10.3±5.0 SNPs). Based on these results, we conducted SNP analysis on B. longum strains 164 isolated from both babies and found that out of 13 strains analysed (n=8 from BF3 and n=5 from 165 BF4), 12 isolated during pre-weaning, weaning and post-weaning appeared to be clonal (with mean 166 pairwise SNP distance of 10.0±5.5 SNPs) and one strain from baby BF4 isolated post-weaning was 167 more distant, with mean SNP distance of 2595.4±2.8 SNPs. The difference in strain diversity may 168 relate to the fact that infant BF4 received a course of antibiotics during pre-weaning (at 14 weeks). 169 Bifidobacterium counts were not detectable nor was any Bifidobacterium-specific PCR product for 170 DGGE obtained from this infant during the antibiotic administration; however, both were obtained 171 for the sample collected one week after antibiotic treatment completed (44). Furthermore, the 172 presence of clonal strains in both babies suggests vertical transmission of B. longum from mother to 173 both infants, or potential horizontal transmission between babies, consistent with previous reports 174 (42, 47-49). B. infantis strains isolated from infant BF2 showed the highest strain diversity, with the 175 mean pairwise SNP distance of 9030.9±8036.6 SNPs. Seven strains isolated during both pre-weaning 176 and weaning periods appeared to be clonal, with mean pairwise SNP distance of 6.3±1.6 SNPs, while 177 four strains isolated during weaning and post-weaning were more distant, with mean pairwise SNP 178 distance of 14983.5±4658.3 SNPs (Supplementary Table S5).

179

180 Functional annotation of B longum subspecies genomes – carbohydrate utilisation

181 To assess genomic differences between our strains at a functional level, we next assigned functional 182 categories to ORFs of each B. longum genome. Carbohydrate transport and metabolism was 183 identified as the second most abundant category (after unknown function), reflecting the 184 saccharolytic lifestyle of Bifidobacterium (Supplementary Figure 1) (28, 50). B. longum had a slightly 185 higher proportion of carbohydrate metabolism and transport genes (11.39±0.31%) compared to B. 186 infantis (10.20±0.60%), which is consistent with previous reports (51, 52). B. longum strains isolated 187 during pre-weaning had a similar proportion of carbohydrate metabolism genes in comparison with 188 the strains isolated post-weaning: 11.28±0.23% and 11.48±0.38%, respectively. Furthermore, we 189 obtained similar results for B. longum strains isolated from breast- and formula-fed infants, with 190 respective values of 11.41±0.21% and 11.38±0.38%. In contrast, B. infantis strains isolated pre- 191 weaning had a lower proportion of carbohydrate metabolism genes in their genomes compared to 192 the ones isolated post-weaning: 9.90±0.24% and 11.20±0.01%, respectively (Supplementary Table

6

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

193 S6). These findings may indicate evolutionary adaptation of B. infantis strains to the changes in 194 infant diet at early-life stages.

195 One of the major classes of carbohydrate-active enzymes comprises glycosyl hydrolases (GH), which 196 facilitate glycan metabolism in the gastrointestinal tract (53). Bifidobacterium have been shown to 197 possess an extensive repertoire of these enzymes, which allow them to adapt to the host 198 environment through degradation of complex dietary and host-derived carbohydrates (50). We thus 199 sought to investigate and compare the arsenal of GHs in B. longum sp. using dbCAN2. We identified 200 a total of 36 different GH families in all Bifidobacterium strains. B. longum was predicted to contain 201 55 GH genes per genome on average (2.72 % of OFRs), while this number was lower for B. infantis 202 strains - predicted to harbour an average of 37 GH genes per genome (1.62% of ORFs) (Figure 3).

203 The predominant GH family in B. longum strains was GH43, whose members include enzymes 204 involved in metabolism of complex plant carbohydrates such as (arabino)xylans (54), followed by 205 GH13 (starch), GH51 (hemicelluloses) and GH3 (plant glycans) (28, 55).

206 Within the B. longum group, strains isolated during pre-weaning had a slightly lower mean number 207 of GH genes compared to strains isolated post-weaning (54.46±2.81 vs. 56.85±2.77). Moreover, 208 strains isolated from breast-fed babies contained an average of 53.96±3.82 GH genes per genome, 209 while this number was slightly higher for strains isolated from formula-fed infants with 56.47±2.96 210 GH genes per genome. Further analysis revealed that differences in abundance of the predominant 211 GH families in B. longum strains appeared to be intra-host-specific and diet-related. For example, 212 strains isolated from breast-fed twins BF3 and BF4 pre-weaning had 11 GH43 genes per genome, 213 while the pre-weaning strain from formula-fed baby FF3had 13 GH genes per genome predicted to 214 belong to this GH family. Similarly, strains isolated from babies BF3 and BF4 post-weaning had 11 215 predicted GH genes, while the three strains isolated from infant FF3 were predicted to contain 16, 216 16 and 18 GH genes per genome, respectively (Supplementary Table S7).

217 In contrast, the most abundant GH family in B. infantis strains was GH13 (starch), followed by GH42, 218 GH20 and GH38 (Supplementary Table S7). The GH42 family contains beta-galactosidases whose 219 enzymatic activity ranges from lactose present in breast milk to galacto-oligosaccharides and 220 galactans found in plant cell walls (56, 57). Members of GH20 family show hexosaminidase and 221 lacto-N-biosidase activities, while family GH38 contains alpha-mannosidases (28). We also 222 determined that, in contrast to B. longum, B. infantis strains harbour genes predicted to encode 223 members of the GH33 family, which contains exo-sialidsaes (28). This finding suggested that B. 224 infantis strains may have the ability to metabolise sialylated HMOs as well as utilise host mucins to 225 release sialic acids and digest free sialic acid present in the gut.

7

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

226 Within the B. infantis group, strains isolated pre-weaning were predicted to contain an average of 227 34.83±0.4 GH genes per genome, while this number was higher for the strains isolated post-weaning 228 (i.e. 43.00±0.00 GH genes per genome). Further analysis revealed that B. infantis strains isolated 229 post-weaning contained families GH1 and GH43 that were absent in the strains isolated pre- 230 weaning. The GH1 family contains enzymes such as beta-glucosidases, beta-galactosidases and beta- 231 D-fucosidases active on a wide variety of (phosphorylated) disaccharides, oligosaccharides, and 232 sugar–aromatic conjugates (58). In addition, the B. infantis strains isolated post-weaning harboured 233 a higher number of genes predicted to belong to families GH42 and GH2 (enzymes active on a 234 variety of carbohydrates) (59).

235 Members of the genus Bifidobacterium have previously been shown to contain GH genes involved in 236 metabolism of various HMOs present in breast milk (27, 60). Alpha-L-fucosidases belonging to 237 families GH29 and GH95 have been determined to show specificity towards fucosylated HMOs (27, 238 61), while lacto-N-biosidases and galacto-N-biose/lacto-N-biose phosphorylases members of GH20 239 and GH112 have been shown to be involved in degradation of isomeric lacto-N-tetraose (LNT) (62). 240 We identified genes belonging to GH29 and GH95 in all our B. infantis strains, as well as four B. 241 longum strains isolated from formula-fed baby FF3. Furthermore, we found GH20 and GH112 genes 242 in all our B. infantis and B. longum strains (Supplementary Table S7).

243 Overall, these findings suggest differences in general carbohydrate utilisation profiles between B. 244 longum and B. infantis. The presence of genes involved in utilisation of different carbohydrates, 245 including HMOs in our strains, suggests the adaptation of Bifidobacterium to a changing early-life 246 nutritional diet, which may be a factor facilitating establishment of these within individuals 247 during infancy.

248

249 Prediction of gain and loss of GH families in B. longum

250 Given the differences in the carbohydrate utilisation profiles between B. longum and B. infantis, we 251 next investigated the acquisition and loss of GH families within the two subspecies. For this purpose, 252 we additionally predicted the presence of GH families in type strains B. longum subsp. longum JCM 253 1217T, B. longum subsp. infantis ATCC 15697T and B. longum subsp. suis LMG 21814T with dbCAN2 254 and generated a whole genome SNP tree to reflect gene loss/gain events more accurately (Figure 3, 255 Supplementary Table S8). Both B. longum and B. infantis lineages appear to have acquired GH 256 families (when compared to the common ancestor of the phylogenetic group), with the B. longum 257 lineage gaining two GH families (GH121 and GH146) and the B. infantis lineage one GH family

8

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

258 (GH33). Within the B. infantis lineage, which also contains the B. suis type strain, the B. infantis 259 taxon has further acquired two and lost five GH families. These findings suggest that the two human- 260 related subspecies have followed different evolutionary paths, which is consistent with our 261 observation of differences between B. longum and B. infantis resulting from phylogenomic analyses. 262 Intriguingly, strain adaptation to the changing host environment (i.e. individual infant gut) seems to 263 be driven by loss of specific GH families (Figure 3). For example, B. infantis strains isolated during 264 pre-weaning and weaning from baby BF2 appear to be missing up to three GH families (GH1, GH43 265 and GH109) present in strains isolated post-weaning. Lack of family GH43 (containing enzymes 266 involved in metabolism of a variety of complex carbohydrates, including plant-derived 267 polysaccharides) in early-life B. infantis strains may explain nutritional preference of this subspecies 268 for an HMO-rich diet. Similarly, we observed differential gene loss events in B. longum strains from 269 individual hosts. For example, all strains isolated from baby BF5 appear to lack GH families GH1, 270 GH29 and GH95. However, strains isolated pre-weaning additionally lacked GH53 family, which 271 includes endogalactanases shown to be involved in liberating galactotriose from type I 272 arabinogalactans in B. longum (63). In contrast, strain B_38 isolated from this infant (BF5) post- 273 weaning appears to have lost families GH136 and GH146. Interestingly, members of family GH136 274 are lacto-N-biosidases responsible for liberating lacto-N-biose I from LNT, an abundant HMO unique 275 to human milk (64). Overall, the presence of intra-individual and strain-specific GH family repertoires 276 in B. longum suggests their adaptation to host-specific diet. The presence of strains with different 277 GH content at different dietary stages further indicates potential acquisition of new Bifidobacterium 278 strains with nutrient-specific adaptations in response to the changing infant diet.

279

280 Phenotypic characterisation of carbohydrate utilisation

281 Bifidobacterium longum has previously been shown to metabolise a range of carbohydrates, 282 including dietary and host-derived glycans (65, 66). Given the predicted differences in carbohydrate 283 metabolism profiles between B. longum and B. infantis, and to understand strain-specific nutrient 284 preferences of our strains, we next sought to determine their glycan fermentation capabilities. We 285 performed growth assays on 49 representative strains from all nine infants, cultured in modified 286 MRS supplemented with selected carbohydrates as the sole carbon source. For these experiments, 287 we chose both plant- and host-derived glycans that we would expect to constitute components of 288 the early-life infant diet (67). Although all B. longum strains were able to grow on simple 289 carbohydrates (i.e. glucose and lactose), we also observed subspecies-specific complex carbohydrate 290 preferences, consistent with bioinformatic predictions (Figure 4). To represent host-derived

9

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

291 carbohydrates, we selected 2’-fucosyllactose (2’-FL) and lacto-N-neotetraose (LNnT) as examples of 292 HMOs found in breast milk. Out of the tested isolates, all B. infantis strains were able to metabolise 293 2’-FL, as were three B. longum strains isolated from a formula-fed baby FF3 during weaning and 294 post-weaning (Figure 4). These results supported the computational analysis and the identification 295 of genes potentially involved in degradation of fucosylated carbohydrates in the genomes of these 296 isolates (GH29 and GH95). Although bioinformatics identified the presence of genes involved in 297 metabolism of isomeric LNT in all our strains (GH20 and GH112), LNnT metabolism in B. infantis was 298 strain-specific, with most strains showing moderate to high growth rates. Out of B. longum strains, 299 B_25 (isolated during weaning from breast-fed baby BF3) also showed robust growth on LNnT. 300 Furthermore, this strain was the only strain out of the 49 tested that showed growth on cellobiose 301 and, in contrast to all other B. longum strains, was not able to metabolise plant-derived arabinose 302 and xylose despite the predicted presence of genes involved in metabolism of monosaccharides 303 (GH43, GH31, GH2). Additionally, both B. longum and B. infantis strains showed varying degrees of 304 growth performance on mannose, while none of the tested strains were able to grow on 305 arabinogalactan, pectin or rhamnose (Figure 4).

306 To further characterise strains identified above for putative carbohydrate degradation genes, we 307 performed carbohydrate uptake analysis and proteomics. B. longum strain B_25, from one of the 308 breast-fed identical twins that showed growth on LNnT and cellobiose, and formula-fed strain B_71 309 which was able to grow on 2’-FL, were chosen. Supernatant from these cultures was initially 310 subjected to high-performance anion-exchange chromatography (HPAEC) to evaluate the 311 carbohydrate-depletion profiles (Figure 5). In all three cases, the chromatograms showed complete 312 utilisation of the tested carbohydrates and absence of any respective degradation products in the 313 stationary phase culture. The depletion of cellobiose by B_25 and 2’-FL by B_71 occurred in the early 314 exponential phase while LNnT was still detected in the culture supernatant until the late exponential 315 phase of growth, suggesting that cellobiose and 2’-FL were internalised more efficiently than LNnT. 316 We next determined the proteome of B_25 and B_71 when growing on cellobiose, LNnT and 2’-FL 317 compared to glucose (Figure 5a-c & Supplementary Table S9). The top 10 most abundant proteins in 318 the cellobiose proteome of B_25 included three beta-glucosidases belonging to GH3 family, as well 319 as a homologue of transport gene cluster previously shown to be upregulated in B. animalis subsp. 320 lactis Bl-04 during growth on cellobiose (Figure 5a & Supplementary Table S10) (68). Among the 321 three β-glucosidases, B_25_00240 showed 98% sequence identity to the structurally characterized 322 BlBG3 from B. longum, which has been shown to be involved in metabolism of the natural glycosides 323 saponins (69). B_25_01763 and B_25_00262 showed 46% identity to the β-glucosidase Bgl3B from 324 Thermotoga neapolitana (70) and 83% identity to BaBgl3 from B. adolescentis ATCC 15703 (71),

10

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

325 respectively, two enzymes previously shown to hydrolyse cello-oligosaccharides. With respect to 326 LNnT metabolism by the same strain, the most abundant proteins were encoded by genes located in 327 two PULs (B_25_00111-00117 and B_25_00130-00133) with functions compatible with LNnT import, 328 degradation to monosaccharides and further metabolism. The PULs contain the components of an 329 ABC-transporter (B_25_00111-00113), a predicted intracellular GH112 lacto-N-biose phosphorylase 330 (B_25_00114), an N-acetylhexosamine 1-kinase (B_25_00115) and enzymes involved in the Leloir 331 pathway. All these proteins were close homologues to proteins previously implicated in the 332 degradation of LNT/LNnT by type strain B. infantis ATCC 15697T (72) (Figure 5b & Supplementary 333 Table S10). Interestingly, all clonal strains isolated from twin babies BF3 and BF4 also contained 334 close homologues of all the above-mentioned genes in their genomes, in some cases identical to 335 those determined in B_25; however, only strain B_25 was able to grow on cellobiose and LNnT. 336 Growth of B_71 on 2’-FL corresponded to increased abundance of proteins encoded by the PUL 337 B_71_00973-00983. These proteins showed close homology to proteins described for B. longum 338 SC596 and included genes for import of fucosylated oligosaccharides, fucose metabolism and two α- 339 fucosidases belonging to the families GH29 and GH95 (Figure 5c & Supplementary Table S10) (27). 340

341 Discussion

342 High abundance of Bifidobacterium in early infancy is strongly linked to availability of nutrients, with 343 dominance in breast-fed infants correlated with enrichment of genes required for the degradation of 344 HMOs present in breast milk, while the transition to solid foods during weaning has been linked to 345 genes involved in degradation of complex plant-derived carbohydrates (3, 29, 64). Bifidobacterium 346 longum species appear to be widely distributed in individuals at different life stages, which may 347 correlate with the abundance of genes responsible for carbohydrate metabolism (35, 42). In this 348 study, we aimed to investigate the adaptations of B. longum to the changing infant diet during the 349 early-life developmental window. We analysed the intra-subspecies genomic diversity of 75 B. 350 longum strains isolated from nine individual infant hosts at different dietary stages, with focus on 351 their potential carbohydrate metabolism capabilities, and determined their growth performance on 352 different carbohydrates as sole carbon sources. Our results indicate intra-individual and diet-related 353 differences in genomic content of analysed strains, which links to their ability to metabolise specific 354 dietary components.

355 Our comparative genomic analysis indicates that clonal strains of B. longum sp. can persist in 356 individuals through infancy, for at least 18 months, despite significant changes in diet during 357 weaning, which is consistent with previous reports (35, 42). Concurrently, new strains (that display

11

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

358 different genomic content and potential carbohydrate metabolism capabilities) can be acquired, 359 possibly in response to the changing diet. Initial vertical acquisition of Bifidobacterium from mother 360 to newborn babies has been well documented (48, 49, 73); however, details of strain transmission 361 events in later life are currently unclear. Work of Odamaki et al. (42) identified person-to-person 362 horizontal transmission of a particular B. longum strain between members of the same family, and 363 suggested direct transfer, common dietary sources or environmental reservoirs, such as family 364 homes (74), as potential vehicles and routes for strain transmission. Our results showed the 365 presence of clonal strains in identical twins BF3 and BF4, which may have resulted from maternal 366 transfer. However, potential strain transmission between these infants living in the same 367 environment may also occur. Furthermore, wider studies involving both mothers and twin babies 368 (and other siblings) could provide details on the extent, timing and location of transmission events 369 between members of the same household.

370 Another aspect of comparative genomic analysis involved in-silico prediction of genes belonging to 371 GH families. This analysis revealed genome flexibility within B. longum sp., with differences in GH 372 family content between strains belonging to the same subspecies as described previously; B. infantis 373 predominantly enriched in GH families implicated in the degradation of host-derived breast milk- 374 associated dietary components like HMOs and B. longum containing GH families involved in the 375 metabolism of plant-derived substrates (28, 55). Within the B. infantis group, we identified 376 subspecies-specific differences in GH content between pre- and post-weaning strains, which 377 indicates adaptation to the changing infant diet. Moreover, we observed differences in the number 378 of genes belonging to the most abundant GH families (e.g. GH43) between breast-fed and formula- 379 fed strains at different dietary stages, which can be linked to nutrient availability. Surprisingly, we 380 computationally and phenotypically identified closely related weaning and post-weaning B. longum 381 strains capable of metabolising HMOs (i.e. 2’-FL) in a formula-fed baby that only received standard 382 non-supplemented (i.e. no prebiotics or synthetic HMOs) formula. However, these data should be 383 carefully interpreted, since our collection only contains one bacterial strain per time point. In 384 addition, analysis of strains belonging to B. infantis group was performed based on strains from only 385 one breast-fed baby, which is a further caveat of the study.

386 Recorded phenotypic data support the results of genomic analyses and further highlight differences 387 in carbohydrate utilisation profiles between and within B. longum and B. infantis. The ability of 388 Bifidobacterium, especially B. infantis, to grow on different HMOs indicates their adaptation to an 389 HMO-rich diet, which may be a factor facilitating their establishment within hosts at early-life stages. 390 Similarly, B. longum preference for plant-based nutrients may be influencing their ability to persist 391 within individual hosts through significant dietary changes. Differential growth of strains that are

12

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

392 genotypically similar on various carbohydrate substrates and the ability of formula-fed strains to 393 metabolise selected HMOs suggest that Bifidobacterium possess an overall very broad repertoire of 394 genes for carbohydrate acquisition and metabolism that may be differentially switched on and off in 395 response to the presence of specific dietary components (75, 76). Another explanation for these 396 results may be a potential influence of the intra-individual environment on epigenetic mechanisms 397 in these bacteria. One potential factor involved in this process may be a cooperative effort among 398 Bifidobacterium in the early-life microbiota supported by cross-feeding activities between species 399 and strains (1, 28).

400 Glycan uptake analysis and proteomic investigation allowed us to determine mechanisms which 401 selected B. longum strains employ to metabolise different carbohydrates. A common feature, based 402 on the predicted activity of the most abundant proteins detected during grown on the three 403 substrates (cellobiose, LNnT and 2’-FL), was that they were all imported and “selfishly” degraded 404 intracellularly, therefore limiting release of degradation products that could allow cross-feeding by 405 other gut bacteria. This is in line with the carbohydrate uptake analysis, where no peak for 406 cellobiose, LNnT and 2’-FL degradation products could be detected. Cellobiose uptake in B_25 occurs 407 via a mechanism similar to B. animalis subsp. lactis Bl-04 (68); cellobiose hydrolysis appears to be 408 mediated by the activity of three intracellular β-glucosidases, although further confirmatory 409 biochemical characterization of these enzyme is still required. B_25 was observed to utilize LNnT 410 using a pathway similar to that described in B. longum subsp. infantis whereby LNnT is internalized 411 via an ABC-transporter (B_25_00111-00113) followed by intracellular degradation into constituent 412 monosaccharides by a GH112 (B_25_00114) and an N-acetylhexosamine 1-kinase (B_25_00115). 413 LNnT degradation products are further metabolized to fructose-6-phosphate by activities that 414 include B_25_00116-00117 (galactose-1-phosphate urydyltranferase, UDP-glucose 4-epimerase, 415 involved in the Leloir pathway) and B_25_01030-01033 (for metabolism of N-acetylgalactosamine) 416 prior to entering the Bifidobacterium genus-specific fructose-6-phosphate phosphoketolase (F6PPK) 417 pathway (72). B_71 is predicted to deploy an ABC-transporter (B_71_00974-00976) that allows 418 uptake of intact 2’-FL that is subsequently hydrolysed to L-fucose and lactose by the two predicted 419 intracellular α-fucosidases GH29 (B_71_00982) and GH95 (B_71_00983). L-fucose is further 420 metabolized to L-lactate and pyruvate, via a pathway of non-phosphorylated intermediates that 421 include activities of L-fucose mutarotase (B_71_00981), L-fucose dehydrogenase (B_71_00978), L- 422 fuconate hydrolase (B_71_00977) as previously described for B. longum subsp. longum SC596 (27). 423 Considering that the proteins encoded by the aforementioned genes are located in the cellobiose, 424 LNnT and 2’-FL PULs that share high similarity and similar organization with those found in 425 equivalent systems in other B. longum and B. animalis, it is reasonable to suggest that the PULs are

13

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

426 related and may be the results of horizontal gene transfer events between B. longum/B. animalis 427 members residing in the infant gut microbiota. Collectively, these data reflect inter- and intra-host 428 phenotypic diversity of B. longum ssp. in terms of their carbohydrate degradation capabilities and 429 suggest that intra-individual environment may influence epigenetic mechanisms in Bifidobacterium, 430 resulting in differential growth on carbohydrate substrates. 431

432 In conclusion, this research provides new insight into distinct genomic and phenotypic abilities of B. 433 longum species and strains isolated from the same individuals during the early-life developmental 434 window by demonstrating that subspecies- and strain-specific differences between members of B. 435 longum sp. in infant hosts can be correlated to their adaptation at specific age and diet stages.

436

437 Materials and methods

438 Bacterial isolates

439 Infants were recruited between 2005 and 2007: five were exclusively breast-fed and four were 440 exclusively formula-fed. Faecal samples were obtained from infants at specific intervals during the 441 first 18 months of life. For inclusion in the study, infants had to meet the following criteria: have 442 been born at full-term (>37 weeks gestation); be of normal birth weight (>2.5 kg); be <5 weeks old 443 and generally healthy; and be exclusively breast-fed or exclusively formula-fed [SMA Gold or SMA 444 White (Wyeth Pharmaceuticals), to avoid supplemented formulae and to keep consistency within 445 the formula group]. The mothers of the breast-fed infants had not consumed any antibiotics within 446 the 3 months prior to the study and had not taken any prebiotics and/or . Ethical approval 447 was obtained from the University of Reading Ethics Committee (43). Bifidobacterium strains (n=88) 448 were isolated from healthy infants (Supplementary Table S1), either exclusively breast-fed (BF) or 449 formula-fed (FF), and originally identified using ribosomal intergenic spacer analysis (44).

450 DNA extraction, whole-genome sequencing, assembly and annotation

451 Phenol-chloroform method used for genomic DNA extraction as described previously (1). DNA 452 isolated from pure bacterial cultures was subjected to multiplex Illumina library preparation protocol 453 followed by sequencing on Illumina HiSeq 2500 platform (n=87) at the Wellcome Trust Sanger 454 Institute (Hinxton, UK) or Illumina MiSeq (n=1) at Quadram Institute Bioscience (Norwich, UK) with 455 read length of PE125 bp and PE300 bp, respectively, with an average sequencing coverage of 66.95- 456 fold for isolates sequenced on HiSeq (minimum 46-fold, maximum 77-fold) and 231-fold for the 457 isolate sequenced on MiSeq (Supplementary Table S1). Sequencing reads were checked for

14

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

458 contamination using Kraken v1.1 (MiniKraken) (77) and pre-processed with fastp v0.20 (78) before 459 assembling using SPAdes v3.11 with “careful” option (79). Contigs below 500bp were filtered out 460 from the assemblies. Incorrectly assembled sequences were removed from further analysis (n=3). 461 Additionally, publicly available assemblies of Bifidobacterium type strains (n=70) (Supplementary 462 Table S1) were retrieved from NCBI Genome database and all genomes were annotated with Prokka 463 v1.13 (80). The draft genomes of 75 B. longum isolates have been deposited to GOLD database at 464 https://img.jgi.doe.gov, GOLD Study ID: Gs0145337.

465 Phylogenetic analysis

466 Python3 module pyANI v0.2.7 with default BLASTN+ settings was employed to calculate the average 467 nucleotide identity (ANI) (81). Species delineation cut-off was set at 95% identity (82) and based on 468 that only sequences identified as Bifidobacterium longum subspecies were selected for further 469 analysis (n=75) (Supplementary Table S2).

470 General feature format files of B. longum strains were inputted into the Roary pangenome pipeline 471 v.3.12.0 to obtain core-genome data and the multiple sequence alignment (msa) of core genes 472 (Mafft v7.313) (83, 84). SNP analysis of strains from individual infants was performed using Snippy 473 v4.2.1 (85) and the resulting msa was passed to the recombination removal tool Gubbins(86). 474 Alignments resulting from all previous steps were cleaned from poorly aligned positions using 475 manual curation and Gblocks v0.9b where appropriate (87). The core-genome tree was generated 476 using FastTree v2.1.9 using the GTR model with 1000 bootstrap iterations (88). Snp-dists v0.2 was 477 used to generate pairwise SNP distance matrix between strains within individual infants (89). 478 Altogether, the results of the SNP analysis reflected ANI results, showing that pairwise sequence 479 identities were inversely proportional to pairwise SNP distances in B. longum subspecies isolates 480 recovered from individual hosts.

481 Functional annotation and genome-wide association study analysis

482 Scoary v1.6.16 with Benjamini Hochberg correction (90) was used to associate subsets of genes with 483 specific traits – breast-fed, formula-fed, pre-weaning, weaning and post-weaning. The p-value cut- 484 off was set to <1e-5, sensitivity cut-off to ≥70 % and specificity cut-off to ≥90 % to report the most 485 overrepresented genes. Functional categories (COG categories) were assigned to genes using 486 EggNOG-mapper v0.99.3, based on the EggNOG database (bacteria) (91) and the abundance of 487 genes involved in carbohydrate metabolism was calculated. As most B. infantis strains (12 out of 13) 488 were isolated from breast-fed infants, we did not compare abundances of carbohydrate metabolism 489 genes in breast-fed and formula-fed groups for this subspecies. Standalone version of dbCAN2

15

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

490 (v2.0.1) was used for CAZyme annotation (92). Glycosyl hydrolase (GH) gain-loss events were 491 predicted using Dollo parsimony implemented in Count v9.1106 (93). Snippy v4.2.1 with the “--ctgs” 492 option, SNP-sites v2.3.3 (94) and FastTree v2.1.9 (GTR model with 1000 bootstrap iterations) were 493 used to generate the whole genome SNP tree.

494 Carbohydrate utilisation

495 To assess the carbohydrate utilisation profile, Bifidobacterium (1%, v/v) was grown in modified 496 (m)MRS (pH 6.8) supplemented with cysteine HCl at 0.05% and 2% (w/v) of selected carbohydrates 497 (HMOs obtained from Glycom, Hørsholm, Denmark) as described previously (1), except for pectin 498 and mucin which were added at 1% (w/v). Growth was determined over a 48-h period using Tecan

499 Infinite 50 (Tecan Ltd, UK) microplate spectrophotometer at OD595. Experiments were performed in 500 biologically independent triplicates, and the plate reader measurements were taken automatically 501 every 15 min following 60 s of shaking at normal speed. Due to the expected drop in initial OD values

502 (i.e. recorded between T0 and T1) growth data were expressed as mean of the replicates between T2

503 (30 min) and Tend (48-h).

504 High-performance anion-exchange chromatography (HPAEC)

505 Mono-, di- and oligo- saccharides present in the spent media samples were analyzed on a Dionex 506 ICS-5000 HPAEC system operated by the Chromeleon software version 7 (Dionex, Thermo Scientific). 507 Samples were bound to a Dionex CarboPac PA1 (Thermo Scientific) analytical column (2 × 250 mm) 508 in combination with a CarboPac PA1 guard column (2 × 50 mm), equilibrated with 0.1 M NaOH. 509 Carbohydrates were detected by pulsed amperometric detection (PAD). The system was run at a 510 flow rate of 0.25 mL/min. The separation was done using a stepwise gradient going from 0.1 M 511 NaOH to 0.1 M NaOH–0.1 M sodium acetate (NaOAc) over 10 min, 0.1 M NaOH–0.3 M NaOAc over 512 25 min followed by a 5 min exponential gradient to 1 M NaOAc, before reconditioning with 0.1 M 513 NaOH for 10 min. Commercial glucose, cellobiose, fucose, lactose and lacto-N-neotetraose (LNnT) 514 were used as external standards.

515 Proteomics

516 B. longum subsp. longum strain 25 (B_25) was grown in triplicate in mMRS supplemented with 517 cysteine HCl at 0.05% and 2% (w/v) glucose, cellobiose or LNnT as a sole carbon source. B. longum 518 subsp. longum strain 71 (B_71) was grown in triplicate in mMRS supplemented with cysteine HCl at 519 0.05% and either 2% (w/v) glucose or 2’-fucosyllactose (2’-FL) as a sole carbon source. Cell pellets 520 from 50 mL samples (at the mid-exponential growth phase) were collected by centrifugation 521 (4500 × g, 10 min, 4 °C) and washed three times with PBS pH 7.4. Cells were resuspended in 50 mM

16

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

522 Tris-HCl pH 8.4 and disrupted by bead-beating in three 60 s cycles using a FastPrep24 (MP 523 Biomedicals, CA). Protein concentration was determined using a Bradford protein assay (Bio-Rad, 524 Germany). Protein samples, containing 50 μg total protein, were separated by SDS-PAGE with a 10% 525 Mini-PROTEAN gel (Bio-Rad Laboratories, CA) and then stained with Coomassie brilliant blue R250. 526 The gel was cut into five slices, after which proteins were reduced, alkylated, and in-gel digested as 527 previously described (95). Peptides were dissolved in 2% acetonitrile containing 0.1% trifluoroacetic 528 acid and desalted using C18 ZipTips (Merck Millipore, Germany). Each sample was independently 529 analysed on a Q-Exactive hybrid quadrupole-orbitrap mass spectrometer (Thermo Scientific) 530 equipped with a nano-electrospray ion source. MS and MS/MS data were acquired using Xcalibur 531 (v.2.2 SP1). Spectra were analysed using MaxQuant 1.6.1.0 (96) and searched against a sample- 532 specific database generated from the B_25 and B_71 genomes. Proteins were quantified using the 533 MaxLFQ algorithm (97). The enzyme specificity was set to consider tryptic peptides and two missed 534 cleavages were allowed. Oxidation of methionine, N-terminal acetylation and deamidation of 535 asparagine and glutamine and formation of pyro-glutamic acid at N-terminal glutamines were used 536 as variable modifications, whereas carbamidomethylation of cysteine residues was used as a fixed 537 modification. All identifications were filtered in order to achieve a protein false discovery rate (FDR) 538 of 1% using the target-decoy strategy. A protein was considered confidently identified if it was 539 detected in at least two of the three biological replicates in at least one glycan substrate. The 540 MaxQuant output was further explored in Perseus v.1.6.1.1 (98). The proteomics data have been 541 deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via 542 the PRIDE partner repository with dataset identifier PXD017277.

543

544 Acknowledgments

545 We would like to thank Glycom A/S for the kind donation of purified HMOs: 2′-FL and LNnT. The 546 authors would also like to thank Prof Rob Kingsley and Mr Shabhonam Caim for technical support 547 and advice. This work was funded by a Wellcome Trust Investigator Award (no. 100/974/C/13/Z); a 548 BBSRC Norwich Research Park Bioscience Doctoral Training grant no. BB/M011216/1 (supervisor LJH, 549 student MK); an Institute Strategic Programme Gut Microbes and Health grant no. BB/R012490/1 550 and its constituent projects BBS/E/F/000PR10353 and BBS/E/F/000PR10356; and an Institute 551 Strategic Programme Gut Health and Food Safety grant no. BB/J004529/1 to LJH. LH was in receipt of 552 a Medical Research Council Intermediate Research Fellowship in Data Science (UK MED-BIO, grant 553 no. MR/L01632X/1). PBP and SLLR are grateful for support from The Research Council of Norway 554 (FRIPRO program, PBP: 250479), as well as the European Research Commission Starting Grant

17

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

555 Fellowship (awarded to PBP; 336355 - MicroDE). The funding bodies did not contribute to the design 556 of the study, collection, analysis, and interpretation of data or in writing the manuscript.

557

558 Author contributions

559 LJH, LH, ALM and MK designed the overall study. ALM provided the unique B. longum strain 560 collection and extracted the DNA. MK prepared the DNA for WGS, performed all genomic analysis 561 and visualisation, as well as growth studies. SLLR, PBP, LJH and MK planned metabolomics and 562 proteomics studies. MK prepared samples for metabolomics and proteomics. SLLR and MK 563 performed the metabolomics and proteomics experiments and SLLR analysed and visualised the 564 resulting data. LJH and MK analysed the data, with input and discussion from LH, and drafted the 565 manuscript. SLLR, PBP, LH and ALM provided providing further edits and co-writing of the final 566 version. All authors read and approved the final manuscript.

567

568 References

569 1. M. A. E. Lawson et al., Breast milk-derived human milk oligosaccharides promote 570 Bifidobacterium interactions within a single ecosystem. ISME J 14, 635-648 (2020). 571 2. L. Wampach et al., Colonization and Succession within the Human Gut Microbiome by 572 Archaea, Bacteria, and Microeukaryotes during the First Year of Life. Frontiers in 573 microbiology 8, 738 (2017). 574 3. F. Backhed et al., Dynamics and Stabilization of the Human Gut Microbiome during the First 575 Year of Life. Cell Host Microbe 17, 852 (2015). 576 4. M. G. de Aguero et al., The maternal microbiota drives early postnatal innate immune 577 development. Science 351, 1296-1301 (2016). 578 5. A. Sivan et al., Commensal Bifidobacterium promotes antitumor immunity and facilitates 579 anti-PD-L1 efficacy. Science 350, 1084-1089 (2015). 580 6. M. P. Heikkila, P. E. Saris, Inhibition of Staphylococcus aureus by the commensal bacteria of 581 human milk. Journal of applied microbiology 95, 471-478 (2003). 582 7. M. Aaboud et al., Search for High-Mass Resonances Decaying to taunu in pp Collisions at 583 sqrt[s]=13 TeV with the ATLAS Detector. Phys Rev Lett 120, 161802 (2018). 584 8. D. A. Sela et al., The genome sequence of Bifidobacterium longum subsp infantis reveals 585 adaptations for milk utilization within the infant microbiome. P Natl Acad Sci USA 105, 586 18964-18969 (2008). 587 9. A. Marcobal, J. L. Sonnenburg, Human milk oligosaccharide consumption by intestinal 588 microbiota. Clin Microbiol Infec 18, 12-15 (2012). 589 10. T. Thongaram, J. L. Hoeflinger, J. Chow, M. J. Miller, Human milk oligosaccharide 590 consumption by and human-associated bifidobacteria and lactobacilli. Journal of 591 dairy science 100, 7825-7833 (2017). 592 11. H. Renz, P. Brandtzaeg, M. Hornef, The impact of perinatal immune development on 593 mucosal homeostasis and chronic inflammation. Nat Rev Immunol 12, 9-23 (2012). 594 12. P. J. Turnbaugh et al., An obesity-associated gut microbiome with increased capacity for 595 energy harvest. Nature 444, 1027-1031 (2006).

18

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

596 13. T. Olszak et al., Microbial Exposure During Early Life Has Persistent Effects on Natural Killer T 597 Cell Function. Science 336, 489-493 (2012). 598 14. N. A. Bokulich et al., Antibiotics, birth mode, and diet shape microbiome maturation during 599 early life. Sci Transl Med 8, (2016). 600 15. W. H. W. Tang, T. Kitai, S. L. Hazen, Gut Microbiota in Cardiovascular Health and Disease. Circ 601 Res 120, 1183-1196 (2017). 602 16. Q. Feng et al., Gut microbiome development along the colorectal adenoma-carcinoma 603 sequence. Nat Commun 6, (2015). 604 17. Y. Shao et al., Stunted microbiota and opportunistic pathogen colonization in caesarean- 605 section birth. Nature 574, 117-121 (2019). 606 18. A. O'Sullivan, M. Farver, J. T. Smilowitz, The Influence of Early Infant-Feeding Practices on 607 the Intestinal Microbiome and Body Composition in Infants. Nutr Metab Insights 8, 1-9 608 (2015). 609 19. R. Martin et al., Early-Life Events, Including Mode of Delivery and Type of Feeding, Siblings 610 and Gender, Shape the Developing Gut Microbiota. PLoS One 11, e0158498 (2016). 611 20. L. T. Stiemsma, K. B. Michels, The Role of the Microbiome in the Developmental Origins of 612 Health and Disease. Pediatrics 141, (2018). 613 21. S. Ip et al., Breastfeeding and maternal and infant health outcomes in developed countries. 614 Evid Rep Technol Assess (Full Rep), 1-186 (2007). 615 22. U. N. Das, Breastfeeding prevents type 2 diabetes mellitus: but, how and why? Am J Clin 616 Nutr 85, 1436-1437 (2007). 617 23. J. A. Ortega-Garcia et al., Full Breastfeeding and Obesity in Children: A Prospective Study 618 from Birth to 6 Years. Child Obes 14, 327-337 (2018). 619 24. J. D. Forbes et al., Association of Exposure to Formula in the Hospital and Subsequent Infant 620 Feeding Practices With Gut Microbiota and Risk of Overweight in the First Year of Life. Jama 621 Pediatr 172, (2018). 622 25. K. James, M. O. Motherway, F. Bottacini, D. van Sinderen, Bifidobacterium breve UCC2003 623 metabolises the human milk oligosaccharides lacto-N-tetraose and lacto-N-neo-tetraose 624 through overlapping, yet distinct pathways. Sci Rep 6, 38560 (2016). 625 26. T. Katayama, Host-derived glycans serve as selected nutrients for the gut microbe: human 626 milk oligosaccharides and bifidobacteria. Biosci Biotech Bioch 80, 621-632 (2016). 627 27. D. Garrido et al., A novel gene cluster allows preferential utilization of fucosylated milk 628 oligosaccharides in Bifidobacterium longum subsp longum SC596. Sci Rep-Uk 6, (2016). 629 28. C. Milani et al., Bifidobacteria exhibit social behavior through carbohydrate resource sharing 630 in the gut. Sci Rep-Uk 5, (2015). 631 29. J. E. Koenig et al., Succession of microbial consortia in the developing infant gut microbiome. 632 Proc Natl Acad Sci U S A 108 Suppl 1, 4578-4585 (2011). 633 30. S. McKeen et al., Infant Complementary Feeding of Prebiotics for the Microbiome and 634 Immunity. Nutrients 11, (2019). 635 31. M. B. Roberfroid, Inulin-type fructans: functional food ingredients. J Nutr 137, 2493S-2502S 636 (2007). 637 32. W. F. Broekaert et al., Prebiotic and other health-related effects of cereal-derived 638 arabinoxylans, arabinoxylan-oligosaccharides, and xylooligosaccharides. Crit Rev Food Sci 639 Nutr 51, 178-194 (2011). 640 33. S. Hald et al., Effects of Arabinoxylan and Resistant Starch on Intestinal Microbiota and 641 Short-Chain Fatty Acids in Subjects with Metabolic Syndrome: A Randomised Crossover 642 Study. Plos One 11, (2016). 643 34. A. Riviere, M. Selak, D. Lantin, F. Leroy, L. De Vuyst, Bifidobacteria and Butyrate-Producing 644 Colon Bacteria: Importance and Strategies for Their Stimulation in the Human Gut. Frontiers 645 in microbiology 7, 979 (2016).

19

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

646 35. M. X. Maldonado-Gomez et al., Stable Engraftment of Bifidobacterium longum AH1206 in 647 the Human Gut Depends on Individualized Features of the Resident Microbiome. Cell Host 648 Microbe 20, 515-526 (2016). 649 36. K. Oki et al., Long-term colonization exceeding six years from early infancy of 650 Bifidobacterium longum subsp. longum in human gut. Bmc Microbiol 18, (2018). 651 37. P. Mattarelli, C. Bonaparte, B. Pot, B. Biavati, Proposal to reclassify the three biotypes of 652 Bifidobacterium longum as three subspecies: Bifidobacterium longum subsp. longum subsp. 653 nov., Bifidobacterium longum subsp. infantis comb. nov. and Bifidobacterium longum subsp. 654 suis comb. nov. Int J Syst Evol Microbiol 58, 767-772 (2008). 655 38. E. Yanokura et al., Subspeciation of Bifidobacterium longum by multilocus approaches and 656 amplified fragment length polymorphism: Description of B. longum subsp. suillum subsp. 657 nov., isolated from the faeces of piglets. Syst Appl Microbiol 38, 305-314 (2015). 658 39. F. Turroni et al., Exploring the Diversity of the Bifidobacterial Population in the Human 659 Intestinal Tract. Appl Environ Microb 75, 1534-1545 (2009). 660 40. F. Turroni et al., Diversity of Bifidobacteria within the Infant Gut Microbiota. Plos One 7, 661 (2012). 662 41. D. Garrido, D. Barile, D. A. Mills, A molecular basis for bifidobacterial enrichment in the 663 infant gastrointestinal tract. Adv Nutr 3, 415S-421S (2012). 664 42. T. Odamaki et al., Genomic diversity and distribution of Bifidobacterium longum subsp 665 longum across the human lifespan. Sci Rep-Uk 8, (2018). 666 43. L. C. Roger, A. L. McCartney, Longitudinal investigation of the faecal microbiota of healthy 667 full-term infants using fluorescence in situ hybridization and denaturing gradient gel 668 electrophoresis. Microbiol-Sgm 156, 3317-3328 (2010). 669 44. L. C. Roger, A. Costabile, D. T. Holland, L. Hoyles, A. L. McCartney, Examination of faecal 670 Bifidobacterium populations in breast- and formula-fed infants during the first 18 months of 671 life. Microbiol-Sgm 156, 3329-3341 (2010). 672 45. H. Ichinose, M. Yoshida, Z. Fujimoto, S. Kaneko, Characterization of a modular enzyme of 673 exo-1,5-alpha-L-arabinofuranosidase and arabinan binding module from Streptomyces 674 avermitilis NBRC14893. Appl Microbiol Biotechnol 80, 399-408 (2008). 675 46. S. Ahmed et al., A novel alpha-L-arabinofuranosidase of family 43 glycoside hydrolase 676 (Ct43Araf) from Clostridium thermocellum. PLoS One 8, e73575 (2013). 677 47. H. Makino et al., Transmission of intestinal Bifidobacterium longum subsp. longum strains 678 from mother to infant, determined by multilocus sequencing typing and amplified fragment 679 length polymorphism. Appl Environ Microbiol 77, 6788-6793 (2011). 680 48. H. Makino et al., Mother-to-infant transmission of intestinal bifidobacterial strains has an 681 impact on the early development of vaginally delivered infant's microbiota. PLoS One 8, 682 e78331 (2013). 683 49. C. Milani et al., Exploring Vertical Transmission of Bifidobacteria from Mother to Child. Appl 684 Environ Microbiol 81, 7078-7087 (2015). 685 50. K. Pokusaeva, G. F. Fitzgerald, D. van Sinderen, Carbohydrate metabolism in Bifidobacteria. 686 Genes Nutr 6, 285-306 (2011). 687 51. M. Ventura et al., Genome-scale analyses of health-promoting bacteria: probiogenomics. 688 Nat Rev Microbiol 7, 61-71 (2009). 689 52. D. A. Sela, D. A. Mills, Nursing our microbiota: molecular linkages between bifidobacteria 690 and milk oligosaccharides. Trends Microbiol 18, 298-307 (2010). 691 53. G. Davies, B. Henrissat, Structures and Mechanisms of Glycosyl Hydrolases. Structure 3, 853- 692 859 (1995). 693 54. A. H. Viborg et al., Biochemical and kinetic characterisation of a novel xylooligosaccharide- 694 upregulated GH43 beta-D-xylosidase/alpha-L-arabinofuranosidase (BXA43) from the 695 probiotic Bifidobacterium animalis subsp lactis BB-12. Amb Express 3, (2013).

20

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

696 55. C. Milani et al., Genomics of the Genus Bifidobacterium Reveals Species-Specific Adaptation 697 to the Glycan-Rich Gut Environment. Appl Environ Microb 82, 980-991 (2016). 698 56. S. K. Kang et al., Three forms of thermostable lactose-hydrolase from Thermus sp IB-21: 699 cloning, expression, and enzyme characterization. J Biotechnol 116, 337-346 (2005). 700 57. S. W. A. Hinz, L. A. M. van den Broek, G. Beldman, J. P. Vincken, A. G. J. Voragen, Beta- 701 galactosidase from Bifidobacterium adolescentis DSM20083 prefers beta(1,4)-galactosides 702 over lactose. Appl Microbiol Biot 66, 276-284 (2004). 703 58. H. Suzuki, A. Murakami, K. Yoshida, Motif-guided identification of a glycoside hydrolase 704 family 1 alpha-L-arabinofuranosidase in Bifidobacterium adolescentis. Biosci Biotechnol 705 Biochem 77, 1709-1714 (2013). 706 59. V. Ambrogi et al., Characterization of GH2 and GH42 beta-galactosidases derived from 707 bifidobacterial infant isolates. Amb Express 9, 9 (2019). 708 60. D. Garrido et al., Comparative transcriptomics reveals key differences in the response to milk 709 oligosaccharides of infant gut-associated bifidobacteria. Sci Rep 5, 13517 (2015). 710 61. D. A. Sela et al., Bifidobacterium longum subsp. infantis ATCC 15697 alpha-fucosidases are 711 active on fucosylated human milk oligosaccharides. Appl Environ Microbiol 78, 795-803 712 (2012). 713 62. M. Kitaoka, Bifidobacterial enzymes involved in the metabolism of human milk 714 oligosaccharides. Adv Nutr 3, 422S-429S (2012). 715 63. S. W. Hinz, M. I. Pastink, L. A. van den Broek, J. P. Vincken, A. G. Voragen, Bifidobacterium 716 longum endogalactanase liberates galactotriose from type I galactans. Appl Environ 717 Microbiol 71, 5501-5510 (2005). 718 64. C. Yamada et al., Molecular Insight into Evolution of Symbiosis between Breast-Fed Infants 719 and a Member of the Human Gut Microbiome Bifidobacterium longum. Cell Chem Biol 24, 720 515-+ (2017). 721 65. D. Watson et al., Selective carbohydrate utilization by lactobacilli and bifidobacteria. Journal 722 of applied microbiology 114, 1132-1146 (2013). 723 66. S. Arboleya et al., Gene-trait matching across the Bifidobacterium longum pan-genome 724 reveals considerable diversity in carbohydrate catabolism among human infant strains. Bmc 725 Genomics 19, (2018). 726 67. S. Mills, C. Stanton, J. A. Lane, G. J. Smith, R. P. Ross, Precision Nutrition and the Microbiome, 727 Part I: Current State of the Science. Nutrients 11, (2019). 728 68. J. M. Andersen et al., Transcriptional analysis of oligosaccharide utilization by 729 Bifidobacterium lactis Bl-04. Bmc Genomics 14, 312 (2013). 730 69. S. Yan et al., Functional and structural characterization of a beta-glucosidase involved in 731 saponin metabolism from intestinal bacteria. Biochem Biophys Res Commun 496, 1349-1356 732 (2018). 733 70. T. Pozzo, J. L. Pasten, E. N. Karlsson, D. T. Logan, Structural and functional analyses of beta- 734 glucosidase 3B from Thermotoga neapolitana: a thermostable three-domain representative 735 of glycoside hydrolase 3. J Mol Biol 397, 724-739 (2010). 736 71. R. N. Florindo et al., Structural and biochemical characterization of a GH3 beta-glucosidase 737 from the probiotic bacteria Bifidobacterium adolescentis. Biochimie 148, 107-115 (2018). 738 72. E. Ozcan, D. A. Sela, Inefficient Metabolism of the Human Milk Oligosaccharides Lacto-N- 739 tetraose and Lacto-N-neotetraose Shifts Bifidobacterium longum subsp. infantis Physiology. 740 Front Nutr 5, (2018). 741 73. K. Mikami, M. Kimura, H. Takahashi, Influence of maternal bifidobacteria on the 742 development of gut bifidobacteria in infants. Pharmaceuticals (Basel) 5, 629-642 (2012). 743 74. S. Lax et al., Longitudinal analysis of microbial interaction between humans and the indoor 744 environment. Science 345, 1048-1052 (2014). 745 75. J. Dworkin, R. Losick, Linking nutritional status to gene activation and development. Genes 746 Dev 15, 1051-1054 (2001).

21

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

747 76. J. Slager, J. W. Veening, Hard-Wired Control of Bacterial Processes by Chromosomal Gene 748 Location. Trends Microbiol 24, 788-800 (2016). 749 77. D. E. Wood, S. L. Salzberg, Kraken: ultrafast metagenomic sequence classification using exact 750 alignments. Genome Biol 15, (2014). 751 78. S. Chen, Y. Zhou, Y. Chen, J. Gu, fastp: an ultra-fast all-in-one FASTQ preprocessor. 752 Bioinformatics 34, i884-i890 (2018). 753 79. A. Bankevich et al., SPAdes: a new genome assembly algorithm and its applications to single- 754 cell sequencing. J Comput Biol 19, 455-477 (2012). 755 80. T. Seemann, Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068-2069 756 (2014). 757 81. L. Pritchard, R. H. Glover, S. Humphris, J. G. Elphinstone, I. K. Toth, Genomics and 758 in diagnostics for food security: soft-rotting enterobacterial plant pathogens. Anal Methods- 759 Uk 8, 12-24 (2016). 760 82. J. Chun et al., Proposed minimal standards for the use of genome data for the taxonomy of 761 prokaryotes. Int J Syst Evol Micr 68, 461-466 (2018). 762 83. A. J. Page et al., Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics 31, 763 3691-3693 (2015). 764 84. K. Katoh, J. Rozewicki, K. D. Yamada, MAFFT online service: multiple sequence alignment, 765 interactive sequence choice and visualization. Brief Bioinform 20, 1160-1166 (2019). 766 85. T. Seemann. (2015). 767 86. N. J. Croucher et al., Rapid phylogenetic analysis of large samples of recombinant bacterial 768 whole genome sequences using Gubbins. Nucleic Acids Res 43, (2015). 769 87. G. Talavera, J. Castresana, Improvement of phylogenies after removing divergent and 770 ambiguously aligned blocks from protein sequence alignments. Systematic Biol 56, 564-577 771 (2007). 772 88. M. N. Price, P. S. Dehal, A. P. Arkin, FastTree 2-Approximately Maximum-Likelihood Trees for 773 Large Alignments. Plos One 5, (2010). 774 89. T. Seemann, A. J. Page, F. Klotzl. (2017). 775 90. O. Brynildsrud, J. Bohlin, L. Scheffer, V. Eldholm, Rapid scoring of genes in microbial pan- 776 genome-wide association studies with Scoary. Genome Biol 17, 238 (2016). 777 91. J. Huerta-Cepas et al., Fast Genome-Wide Functional Annotation through Orthology 778 Assignment by eggNOG-Mapper. Mol Biol Evol 34, 2115-2122 (2017). 779 92. H. Zhang et al., dbCAN2: a meta server for automated carbohydrate-active enzyme 780 annotation. Nucleic Acids Res 46, W95-W101 (2018). 781 93. M. Csuros, I. Miklos, A probabilistic model for gene content evolution with duplication, loss, 782 and horizontal transfer. Lect Notes Comput Sc 3909, 206-220 (2006). 783 94. A. J. Page et al., SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments. 784 Microb Genomics 2, (2016). 785 95. M. O. Arntzen, I. L. Karlskas, M. Skaugen, V. G. H. Eijsink, G. Mathiesen, Proteomic 786 Investigation of the Response of Enterococcus faecalis V583 when Cultivated in Urine. Plos 787 One 10, (2015). 788 96. J. Cox, M. Mann, MaxQuant enables high peptide identification rates, individualized p.p.b.- 789 range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26, 1367- 790 1372 (2008). 791 97. J. Cox et al., Accurate Proteome-wide Label-free Quantification by Delayed Normalization 792 and Maximal Peptide Ratio Extraction, Termed MaxLFQ. Mol Cell Proteomics 13, 2513-2526 793 (2014). 794 98. S. Tyanova et al., The Perseus computational platform for comprehensive analysis of 795 (prote)omics data. Nat Methods 13, 731-740 (2016). 796

22

a) BF1 * * * * * BF2 + + + + + + + * + + + + BF3 * * * * * * * * BF4 * * * * * BF5 * * + * * * * *

FF1 + * * * * * * * FF2 * * * * * * * FF3 * * * * * * * * * * FF5 * * * * * * * * * * * M M M M 4 6 8 12 16 17 18 19 20 22 23 24 25 26 27 28 29 30 31 32 33 34 36 9 12 15 18

Age at collection in weeks, except where prefixed by M (months)

* B. longum subsp. longum + B. longum subsp. infantis

b)

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 1. Identification and relatedness of B. longum strains. a) Sampling scheme and strain identification within individual breast-fed (BF1-BF5) and formula-fed (FF1-FF3 and FF5) infants based on average nucleotide identity values (ANI). b) Relatedness of B. longum strains based on core proteins. Coloured strips represent isolation period (pre-weaning, weaning and post-weaning) and isolation source (individual infants), respectively. Figure 2. Pairwise SNP distances between B. longum strains of the same subspecies within individual infants. Individual points show data distribution, diamonds indicate the group mean, box plots show group median and interquartile range.

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. GH33

GH109,GH123 GH1, GH109 GH30,GH53,GH85, GH101,GH127

GH1, GH43, GH109

GH1, GH29, GH53, GH95

GH1, GH29, GH95, GH136, GH146

GH121 GH146

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 3. Gene-loss events and abundance of GH families within B. longum subspecies. Pie charts superimposed on the whole genome SNP tree represent predicted GH family gain-loss events within B. longum and B. infantis lineages. Due to the size of the tree, examples of detailed gain loss events have been provided for main lineages, as well as baby BF2 (strains highlighted with light blue) and BF5 (strains highlighted with light purple). Heatmap represents abundance of specific GH families predicted in analysed B. longum strains. BF1 BF2 BF3 BF4 BF5 FF1 FF2 FF3 FF5

Figure 4. Growth performance of B. longum strains on different carbon sources. Heatmap displays the difference in average growth of triplicates between T2 (30 min) and Tend (48 hours).

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555; this version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Figure 5. HPAEC-PAD traces showing mono-, di- and oligo-saccharides detected in the supernatant of either B_25 or B_71 single cultures during growth in mMRS supplemented with (a) cellobiose; (b) LNnT; (c) 2’-FL. The data are representative of biological triplicates. Abbreviations: LNnT, Lacto-N-neotetraose; Glc, glucose; Glc2, cellobiose; 2’-FL, 2′-fucosyllactose. Panel on the right shows (a) cellobiose; (b) LNnT; (c) 2’-FL utilization clusters in B_25 and B_71 and proteomic detection of the corresponding proteins during growth on HMOs. Heat maps above genes show the LFQ detection levels for the corresponding proteins in triplicates grown on glucose (G); cellobiose (C); LNnT (L); and 2’-FL (F). Numbers between genes indicate percent identity between corresponding genes in homologous PULs relative to strains B_25 and B_71. Numbers below each gene show the locus tag in the corresponding genome. Locus tag numbers are abbreviated with the last numbers after the second hyphen (for example B_25_XXXXX). The locus tag prefix for each strain is indicated in parenthesis besidebioRxiv preprint doi: https://doi.org/10.1101/2020.02.20.957555the organism ; thisname. version posted February 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.