Microbial association networks in cheese: a meta-analysis

Eugenio Parente, Teresa Zotta, Annamaria Ricciardi

Scuola di Scienze Agrarie, Forestali ed Alimentari, Università degli Studi della Basilicata, Potenza, Italy

Supplementary material

Index

1. DATA AND SOFTWARE.

2. ANALYSIS WORKFLOW.

3. INFERENCE OF ASSOCIATION NETWORKS AT DIFFERENT TAXONOMIC LEVELS.

4. COMPARING INFERENCE METHODS.

5. COMPARING GLOBAL NETWORK PROPERTIES.

6. NODE PROPERTIES.

7. EDGE PROPERTIES.

REFERENCES


1. Data and software.

The metataxonomic data used for the inference of microbial association networks were extracted from DairyFMBN v2.1.6 (Parente et al., 2020, Parente, 2021), a specialized version of FoodMicrobionet (Parente et al., 2019), which is available on Mendeley Data (https://data.mendeley.com/datasets/3cwf729p34/5). The list of studies used in the analysis is shown in Supplementary Table 1.

2. Analysis workflow.

A demo of the pipeline used for the inference and analysis of microbial association networks is available from GitHub (https://github.com/ep142/MAN_in_cheese) as a .Rmd notebook which, when run, generates a ready-to-use report and a number of publication-quality images and tables. The analysis workflow is summarized in Supplementary Figure 1. The main steps are:

1. general options for plotting, saving and inference (including the inference methods to use) are set;

2. an arbitrary number of phyloseq objects (McMurdie and Holmes, 2013) containing the data are imported in a list. These can be either objects extracted from DairyFMBN or phyloseq objects generated using a suitable bioinformatics pipeline. Study metadata are also imported at this stage as a tab-delimited file;

3. samples with a low number of sequences and studies with a low number of samples are removed; diversity indices are also calculated at this stage;

4. taxonomic filtering (removal of Amplicon Sequence Variants identified as Eukaryotes, mitochondria or chloroplasts, removal of sequences identified above the family level) and agglomeration (no agglomeration, or agglomeration at the species, genus or family level) are performed, mostly using functions from package phyloseq;

5. prevalence and abundance filtering is performed to remove the least prevalent and abundant taxa (although this can also be performed within the netConstruct() function, see below); prevalence and abundance plots and tables are generated; reports on the effect of filtering on the number of taxa and sequences are generated;

6. network inference is performed for all datasets and, within each dataset, for all inference methods, using the netConstruct() function of the NetCoMi package (Peschel et al., 2020); errors are trapped using try(), the results (as microNet or try-error objects) are put in a list, and a report is generated;

7. networks are analysed using the netAnalyze() function; global network properties are extracted and joined with metadata, microbial diversity indices and evenness indices (including Pielou J and average Bray-Curtis dissimilarity, see Parente et al., 2018 for details);

8. node statistics (cluster membership and centrality indices) are extracted from the microNetProps objects and merged with taxonomic, prevalence and abundance information;

9. the networks are extracted as tidygraph objects (Pedersen, 2020) and edge betweenness is calculated; Venn diagrams showing edges in common between different inference methods are optionally generated;

10. global properties of the networks are compared using Principal Component Analysis;

11. networks are plotted using ggraph (Pedersen, 2021) or NetCoMi and optionally compared in grids;

12. node plots are generated for each dataset/inference method;

13. stable edges (edges identified with more than one method within a given dataset or over different datasets) are identified;

14. taxonomic assortativity is tested using the epi.2by2() function of the epiR package (Stevenson et al., 2021).

Several aspects of the workflow can be personalized by setting options (locations of the input and output files, resolution of the graphs, filtering and taxonomic agglomeration options, network inference methods to be used and their parameters).

Supplementary Figure 1. Schematic representation of the workflow used in the meta-analysis.

Supplementary Table 1.

Details of the studies used for inference of microbial association networks. The studies were extracted from DairyFMBN 2.1.6 (Parente et al., 2020, Parente, 2021).

Study | Target | Region | Read length (bp) | NCBI SRA seq. accn. | Samples | Description | Type¹ | Design | Reference
ST1 | 16S RNA gene | V1-V3 | 504 | SRP052240 | 29 | Commercial high-moisture cheese produced with different acidification methods | cs (1) | descriptive | Guidone et al., 2016
ST2 | 16S RNA gene | V1-V3 | 498 | SRP057506 | 24 | Undefined strain starters (milk cultures) for high-moisture Mozzarella cheese | cs (1) | descriptive | Parente et al., 2016
ST3 | 16S RNA gene | V1-V3 | 465 | SRP033419 | 50 | Undefined strain starters (whey cultures) and cheese curds for water-, Padano (and Parmigiano Reggiano cheese | cs (1) | 3x2 groups (small) | De Filippis et al., 2014
ST6 | 16S RNA | V1-V3 | 485 | SRP038100 | 22 | Milk, curd and ewe's milk cheese during ripening | ts (11) | time series + descriptive | De Pasquale et al., 2014a
ST8 | 16S RNA | V1-V3 | 490 | SRP040575 | 27 | Milk (from different lactation stages), curd and cheese from three different dairies | mx (3) | 3x2 groups (small) | Dolci et al., 2014
ST9 | 16S RNA | V1-V3 | 469 | SRP044294 | 39 | Piedmont hard cheese made from raw milk: milk, curd and cheese throughout ripening | mx (7) | time series + descriptive | Alessandria et al., 2016
ST10 | 16S RNA gene and 16S RNA | V1-V3 | 601 | SRP061555 | 67 | Silano cheese manufacture: starter culture, milk, curd and cheese throughout ripening | ts (5) | 2x2x2 groups (small) | De Filippis et al., 2016
ST18 | 16S RNA gene | V1-V3 | 454 | SRP058584 | 45 | Environmental swabs from an Italian dairy plant and different kind of (Mozzarella; ; ; Caciocavallo; Grancacio) produced in the same plant | cs (1) | time series + descriptive | Stellato et al., 2015
ST22 | 16S RNA gene | V1-V3 | 465 | -- | 48 | Environmental swabs from an Italian dairy plant and different kind of cheeses (Caciotta; Caciocavallo Pugliese) and cow milk produced in the same plant | mx (10) | 2 groups | Calasso et al., 2016
ST23 | 16S RNA gene | V3-V4 | 398 | SRP055798 | 37 | cheese samples with or without blowing defect, from different factories, at different ripening times, produced with and without lysozyme | cs (1) | descriptive | Bassi et al., 2015
ST25 | 16S RNA gene | V4-V5 | 232 | ERP009223 | 31 | Continental (Swiss type cheese) produced early and late in the day, core and rind samples included, different ripening times | cs (1) | descriptive | O'Sullivan et al., 2015
ST32 | 16S RNA gene | V4 | 500 | -- | 93 | Artisanal soft, semi-hard and hard cheeses from raw or pasteurized cow, goat, or | cs (1) | descriptive, possibly 3 groups | Quigley et al., 2012
ST33 | 16S RNA gene | V4 | 500 | ERP006630 | 58 | Investigating the role of the microbiota in Pink Cheese. Cheddar, Emmental and cheese coloured with Annatto, either unspoiled or spoiled. | cs (1) | 2x2 groups, small | Quigley et al., 2016
ST34 | 16S RNA | V3-V4 | 422 | SRP060430 | 46 | Bovine ricotta cheese (two lots, winter and spring) without or with pink discoloration, throughout storage at 8°C. | mx (7) | 2 groups, small | Sattin et al., 2016
ST36 | 16S RNA | V1-V3 | 424 | SRP110830 | 50 | Raw cow milk and cheese protective lactobacilli cheese with or without dietary fibres an | mx (4) | 3 groups, small | Minervini et al., 2017
ST39 | 16S RNA gene | V3-V4 | 435 | SRP126475 | 48 | Teat skin, raw cow milk and Cantal cheese) microbiota, as a function of grazing system | mx (3) | 4+ groups, small | Frétin et al., 2018
ST41 | 16S RNA gene | V3-V4 | 425 | SRP071345 | 95 | Microbiota of core and rind of 12 French cheeses | cs (1) | 4+ or 2 groups, small | Dugat-Bony et al., 2016
ST43 | 16S RNA gene | V1-V3 | 420 | SRP051167 | 26 | Cheese with brown defect and cheese environment | cs (1) | 4 groups (small) | Guzzon et al., 2017
ST44 | 16S RNA gene | V1-V3 | 371 | SRP070077 | 38 | Caciocavallo cheese, throughout ripening, with milk obtained under different cow's feeding regimes | mx (7) | 2 groups (small) | Giello et al., 2017
ST45 | 16S RNA gene | V3-V4 | 463 | SRP103624 | 92 | Gouda cheese (15 brands), from pasteurized or raw milk. Spatial (core, surface, middle) variability was assessed. Cheese age (2-18 mo.) confounded with brand. | cs (1) | 2 groups | Salazar et al., 2018
ST48 | 16S RNA gene | V3-V4 | 427 | SRP156292 | 39 | High moisture Mozzarella cheese (cow or buffalo milk), produced with different types of starters | cs (1) | descriptive | Marino et al., 2019
ST49 | 16S RNA gene | V3-V4 | 412 | SRP165151 | 196 | Artisanal raw milk cheeses from Brazil (11 different types from 5 geographical areas) | cs (1) | descriptive | Kamimura et al., 2019
ST74 | 16S RNA gene | V3 | 151 | SRP170819 | 375 | Cow milk, cheese (4 types: Brie, Cheddar, Gruyere, Jarlsberg) and environmental samples, from farm to fork for an artisanal cheese production facility | cs (1) | descriptive | Falardeau et al., 2019
ST106 | 16S RNA | V1-V3 | 369 | SRP059382 | 30 | Active microbiota from Italian PDO ewe's cheese ( toscano; ; Fiore sardo) | cs (1) | descriptive | De Pasquale et al., 2016
ST110 | 16S RNA gene | V3-V4 | 427 | SRP212264 | 112 | Dynamics of the microbiota of semisoft caciotta cheese produced a washed rind protocol, with or without attenuated adjuncts and surface inoculants | mx (3) | 4+ | Calasso et al., 2020
ST115 | 16S RNA gene | V4 | 245 | SRP233045 | 63 | Microbiota of core and rind of Cheddar, and Swiss type cheese produced in Oregon from pasteurized milk | cs (1) | 3x2 groups, small | Choi et al., 2020b
ST131 | 16S RNA gene | V4 | 245 | SRP244702 | 108 | Microbiota of Cheddar cheese during production and aging (26 months). Two batches, raw and pasteurized milk included | ts (24+) | -- | Choi et al., 2020a
ST136 | 16S RNA gene | V3-V4 | 427 | SRP256471 | 47 | Microbiota of Fromadzo cheese (a semi-hard, ripened cheese from Valle d'Aosta) made from milk from two different farms during ripening | mx (4) | 2x2 groups | Dolci et al., 2020
ST146 | 16S RNA gene | V3-V4 | 427 | ERP119763 | 22 | Microbiota of Paipa cheese, a Colombian semi-ripened cheese made from raw cow milk | mx (4) | descriptive | Castellanos-Rozo et al., 2020
ST148 | 16S RNA gene | V3-V4 | 427 | SRP274316 | 67 | Bacterial microbiota of Parmigiano Reggiano throughout cheese ripening (curd-24 mo.) for 6 production batches | mx (5) | descriptive | Bottari et al., 2020
ST149 | 16S RNA gene | V3-V4 | 425 | SRP254987 | 102 | Microbiota of raw cow bulk milk and vat milk, natural whey culture and Trentingrana as affected by chlorine wash of milking equipment | mx (3) | possibly 3 groups | Cremonesi et al., 2020
ST150 | 16S RNA gene | V3-V4 | 426 | SRP257637 | 58 | Evolution of microbiota during production of (milk, natural milk starter, cheese) di Roccaverano, an artisanal Protected Designation of Origin soft cheese made with raw goat milk by addition of a natural milk starter (NMS), from the Piedmont region of Italy | mx (3) | possibly 4 groups | Biolcati et al., 2020
ST157 | 16S RNA gene | V4 | 271 | SRP290895 | 40 | Bacterial microbiota of commercial Australian Cheddar cheese, three brands, at different ripening ages | cs (1) | possibly 3 groups | Afshari et al., 2020
ST165 | 16S RNA gene | V3-V4 | 425 | ERP121277 | 45 | Bacterial microbiota of Pélardon cheese, a French PDO goat milk cheese with white bloomy rind, during ripening | mx (3-7) | | Penland et al., 2021

¹ cs: cross-sectional study; ts: time series (one or very few cheese makings followed over a relatively long period of time); mx: mixed (several cheeses with some time points available for each). The number of time points is shown in parentheses.

3. Inference of association networks at different taxonomic levels.

Microbial association networks were inferred for two studies, ST49 and ST136 (Supplementary Table 1), at three levels of taxonomic resolution: using the phyloseq objects with Amplicon Sequence Variants (ASVs; no taxonomic agglomeration, prior to merging in FoodMicrobionet, Parente et al., 2019), with taxonomic agglomeration at the lowest taxonomic level (this is the level of taxonomic agglomeration of objects stored in DairyFMBN), and after taxonomic agglomeration at the genus level. Prevalence and abundance filtering were performed by removing ASVs or "taxa" which had a prevalence <0.05 and a relative abundance <0.005. Inference of networks was performed using CCREPE and SPIEC-EASI with neighbourhood selection; the networks were then transformed into tidygraph objects and plotted using ggraph. The networks with taxonomic agglomeration at the lowest possible level are shown in Figure 2 of the paper, while those at the ASV and genus level are shown in Supplementary Figures 2 and 3, respectively.
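The prevalence and abundance filter described above can be sketched in a few lines of Python. This is an illustration, not the code used in the pipeline (which is written in R). The function name filter_taxa and the input layout are hypothetical. The sketch also assumes that a taxon is removed only when both conditions hold, and that "relative abundance" means the maximum relative abundance over samples; the text does not specify either point.

```python
def filter_taxa(counts, min_prevalence=0.05, min_rel_abundance=0.005):
    """Drop taxa that are both rare and of low abundance.

    counts: dict mapping taxon name -> list of sequence counts, one per
            sample (all lists in the same sample order).
    Assumption: a taxon is removed only if its prevalence AND its maximum
    relative abundance are both below the thresholds.
    """
    n_samples = len(next(iter(counts.values())))
    # Total number of sequences in each sample, used to get relative abundances.
    sample_totals = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
    kept = {}
    for taxon, row in counts.items():
        # Prevalence: fraction of samples in which the taxon was detected.
        prevalence = sum(1 for c in row if c > 0) / n_samples
        rel = [c / t for c, t in zip(row, sample_totals) if t > 0]
        max_rel = max(rel) if rel else 0.0
        if prevalence < min_prevalence and max_rel < min_rel_abundance:
            continue  # rare and low-abundance: filtered out
        kept[taxon] = row
    return kept


example = {
    "Lactococcus": [900, 950, 800, 990],
    "Rare": [0, 0, 0, 1],
    "Bloom": [100, 50, 200, 9],
}
print(sorted(filter_taxa(example, min_prevalence=0.5)))
```

With only four samples the default prevalence threshold of 0.05 would keep every detected taxon, which is why the usage line raises it to 0.5 for illustration.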

Supplementary Figure 2. Microbial association networks inferred from Amplicon Sequence Variants for studies ST49 (Kamimura et al., 2019) and ST136 (Dolci et al., 2020) using methods CCREPE (Faust et al., 2012) or SPIEC-EASI (Kurtz et al., 2015). For each pane, the colour of the nodes is determined by phylum, and their size by degree; the colour of the edges is red for mutual exclusion relationships and green for copresence relationships; the thickness of the edges is determined by the strength of the association measure, as determined in NetCoMi (Peschel et al., 2020). A force-based layout (Fruchterman–Reingold) was used for positioning nodes and edges. The names of the nodes have been abbreviated to 15 characters.

Supplementary Figure 3. Microbial association networks inferred from studies ST49 (Kamimura et al., 2019) and ST136 (Dolci et al., 2020) extracted from DairyFMBN 2.1.6 (https://data.mendeley.com/datasets/3cwf729p34/5), with taxonomic agglomeration at the genus level, using methods CCREPE (Faust et al., 2012) or SPIEC-EASI (Kurtz et al., 2015). For each pane, the colour of the nodes is determined by phylum, and their size by degree; the colour of the edges is red for mutual exclusion relationships and green for copresence relationships; the thickness of the edges is determined by the strength of the association measure, as determined in NetCoMi (Peschel et al., 2020). A force-based layout (Fruchterman–Reingold) was used for positioning nodes and edges. The names of the nodes have been abbreviated to 15 characters.

4. Comparing inference methods.

Microbial association networks were inferred using four methods (SparCC, Sparse Correlations for Compositional data, Friedman and Alm, 2012; CCREPE, Compositionality Corrected by REnormalization and PErmutation, Faust et al., 2012; SPIEC-EASI, SParse InversE Covariance Estimation for Ecological Association Inference, Kurtz et al., 2015; SPRING, SemiParametric Rank-based approach for INference in Graphical model, Yoon et al., 2019) on 35 datasets from 34 studies from DairyFMBN 2.1.6 (Supplementary Table 1) using an R workflow based mostly on R packages phyloseq, NetCoMi, and tidygraph (see section 2). Taxonomic agglomeration was performed at the genus level using the function tax_glom() of package phyloseq.

The inferred networks for two studies (ST41, Dugat-Bony et al., 2016; ST131, Choi et al., 2020a; see Supplementary Table 1 for details) are shown in Supplementary Figures 4 and 5.

Supplementary Figure 4. Microbial association networks inferred at the genus level for study ST41 (Dugat-Bony et al., 2016) using four methods (SparCC, Sparse Correlations for Compositional data, Friedman and Alm, 2012; CCREPE, Compositionality Corrected by REnormalization and PErmutation, Faust et al., 2012; SPIEC-EASI, SParse InversE Covariance Estimation for Ecological Association Inference, Kurtz et al., 2015; SPRING, SemiParametric Rank-based approach for INference in Graphical model, Yoon et al., 2019). For each pane, the colour of the nodes is determined by phylum, and their size by degree; the colour of the edges is red for mutual exclusion relationships and green for copresence relationships; the thickness of the edges is determined by the strength of the association measure, as determined in NetCoMi (Peschel et al., 2020). A force-based layout (Fruchterman–Reingold) was used for positioning nodes and edges. The names of the nodes have been abbreviated to 15 characters.

A number of comparisons between methods (number of edges, PEP - Positive Edge Proportion, density, clustering coefficient, modularity, all estimated using the netAnalyze() function of NetCoMi) were carried out using the 35 datasets listed in Supplementary Table 1 (two separate datasets, one for DNA and one for RNA, were available for study ST10). The significance of the differences was tested, when appropriate, using a pairwise Wilcoxon test. The distributions of the number of edges, of network density and of positive edge proportion for each method are compared in Supplementary Figures 6, 7 and 8, respectively. The distributions of the global clustering coefficient and of modularity for each method are compared in Supplementary Figures 9 and 10, respectively.
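Two of the global properties compared above, density and positive edge proportion, follow directly from the edge list. The Python sketch below illustrates the definitions only; the values in the analysis were obtained with netAnalyze(). The function name global_edge_stats and the edge-list layout are hypothetical. Density is computed here over all nodes passed in, whereas NetCoMi can also report it on connected nodes only.

```python
def global_edge_stats(n_nodes, edges):
    """Number of edges, density and positive edge proportion (PEP).

    n_nodes: number of nodes considered part of the network.
    edges:   list of (node_a, node_b, association) tuples for an undirected
             network; the sign of the association separates copresence (> 0)
             from mutual exclusion (< 0).
    """
    n_edges = len(edges)
    # Maximum number of edges in an undirected network without self-loops.
    possible = n_nodes * (n_nodes - 1) / 2
    density = n_edges / possible if possible else 0.0
    positive = sum(1 for _, _, w in edges if w > 0)
    # PEP is undefined for a network without edges.
    pep = positive / n_edges if n_edges else float("nan")
    return {"n_edges": n_edges, "density": density, "pep": pep}


toy_edges = [("A", "B", 0.6), ("B", "C", 0.4), ("C", "D", -0.5)]
stats = global_edge_stats(4, toy_edges)
print(stats)
```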

Supplementary Figure 5. Microbial association networks inferred at the genus level for study ST131 (Choi et al., 2020a) using four methods (SparCC, Sparse Correlations for Compositional data, Friedman and Alm, 2012; CCREPE, Compositionality Corrected by REnormalization and PErmutation, Faust et al., 2012; SPIEC-EASI, SParse InversE Covariance Estimation for Ecological Association Inference, Kurtz et al., 2015; SPRING, SemiParametric Rank-based approach for INference in Graphical model, Yoon et al., 2019). Only three methods returned networks with at least one edge. For each pane, the colour of the nodes is determined by phylum, and their size by degree; the colour of the edges is red for mutual exclusion relationships and green for copresence relationships; the thickness of the edges is determined by the strength of the association measure, as determined in NetCoMi (Peschel et al., 2020). A force-based layout (Fruchterman–Reingold) was used for positioning nodes and edges. The names of the nodes have been abbreviated to 15 characters.

Supplementary Figure 6. A comparison of total number of edges in microbial association networks inferred after taxonomic agglomeration at the genus level for the studies listed in Supplementary Table 1.


Supplementary Figure 7. A comparison of network density in microbial association networks inferred after taxonomic agglomeration at the genus level for the studies listed in Supplementary Table 1.

Supplementary Figure 8. A comparison of positive edge proportion in microbial association networks inferred after taxonomic agglomeration at the genus level for the studies listed in Supplementary Table 1.


Supplementary Figure 9. A comparison of global clustering coefficient in microbial association networks inferred after taxonomic agglomeration at the genus level for the studies listed in Supplementary Table 1.

Supplementary Figure 10. A comparison of modularity in microbial association networks inferred after taxonomic agglomeration at the genus level for the studies listed in Supplementary Table 1.

5. Comparing global network properties.

Microbial associations were inferred for all studies in Supplementary Table 1, after filtering and agglomeration at the genus level, using method SPIEC-EASI, and network properties were estimated using the netAnalyze() function of NetCoMi (see Supplementary Figure 1) with the centrLCC parameter set to TRUE. As a result, when a network had more than one connected component (a group of nodes connected to each other but not to other groups of nodes in the same network), centralities were calculated only for the largest connected component (LCC). Furthermore, the global network statistics were merged with metadata (number of taxa, type of study) and with diversity indices calculated as described in Parente et al., 2018 (Chao1, average Bray-Curtis distance, and an evenness index, Pielou J).
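To make the notion of largest connected component concrete, the following Python sketch finds it with a breadth-first search. It is only an illustration of the concept: in the analysis the LCC is handled internally by netAnalyze(). The function name largest_connected_component is hypothetical.

```python
from collections import deque


def largest_connected_component(nodes, edges):
    """Return the set of nodes forming the largest connected component.

    nodes: iterable of node names (isolated nodes included).
    edges: iterable of (node_a, node_b) pairs, treated as undirected.
    """
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


lcc = largest_connected_component(
    ["A", "B", "C", "D", "E", "F"],
    [("A", "B"), ("B", "C"), ("D", "E")],
)
print(sorted(lcc))
```

In this toy network the components are {A, B, C}, {D, E} and the isolated node F, so centralities would be computed on A, B and C only.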

Supplementary Figure 11. Relationship between the size of the networks and the average path length for the largest connected component for microbial association networks inferred with method SPIEC-EASI after taxonomic agglomeration at the genus level for the studies listed in Supplementary Table 1.

The relationship between the number of nodes of the networks and the average path length of the largest connected component is shown in Supplementary Figure 11. The correlation between selected variables was calculated using the Pearson product-moment coefficient. The variables included network properties (average path length, avPath; density, calculated on all connected nodes, density_2; modularity; natural connectivity, i.e. the average eigenvalue of the adjacency matrix, a measure of the robustness of a graph, natConnect; number of connected nodes, nnodes; positive edge proportion, pep) and properties of the datasets (average dissimilarity, avDiss, i.e. the average of the Bray-Curtis distance matrix; average Chao1 index; Pielou J evenness). A principal component analysis with varimax rotation was carried out using function prcomp() and a biplot of the first two components was plotted using the autoplot() function of package ggfortify (Tang et al., 2016). Parallel analysis carried out using package psych (Revelle, 2020) confirmed that three components, explaining 74% of the variance, were sufficient.
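For reference, the usual definitions of two of the quantities listed above are given here. They are the textbook forms and are an assumption about what was computed: the text does not state whether NetCoMi reports a normalised natural connectivity, nor which logarithm base was used for evenness. Here \lambda_i are the eigenvalues of the adjacency matrix of a network with N nodes, p_i is the relative abundance of taxon i, and S is the number of taxa.

```latex
% Natural connectivity: an "average eigenvalue" in the exponential sense
\bar{\lambda} = \ln\left( \frac{1}{N} \sum_{i=1}^{N} e^{\lambda_i} \right)

% Pielou J evenness: Shannon diversity H' relative to its maximum \ln S
J = \frac{H'}{\ln S}, \qquad H' = -\sum_{i=1}^{S} p_i \ln p_i
```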

6. Node properties.

Using the data generated in the previous section, node properties were extracted from the microNetProps objects created by netAnalyze() and further node properties (positive and negative degree, PEP) were calculated for each node. The results for all networks were combined into a data frame, merged with taxonomic metadata and used to generate graphs. An example for study ST49 (Kamimura et al., 2019) is shown in Supplementary Figure 12.
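The additional node properties can be derived from a signed edge list, as in the Python sketch below. This illustrates the calculation and is not the R code of the pipeline. The function name node_sign_degrees and the tuple layout are hypothetical, and association values are assumed to be non-zero.

```python
from collections import defaultdict


def node_sign_degrees(edges):
    """Degree, positive degree, negative degree and PEP for each node.

    edges: list of (node_a, node_b, association) tuples for an undirected
           network; association > 0 is copresence, < 0 is mutual exclusion
           (values are assumed to be non-zero).
    """
    stats = defaultdict(lambda: {"degree": 0, "pos_degree": 0, "neg_degree": 0})
    for a, b, w in edges:
        key = "pos_degree" if w > 0 else "neg_degree"
        # Each edge contributes to the degree of both of its end nodes.
        for node in (a, b):
            stats[node]["degree"] += 1
            stats[node][key] += 1
    for s in stats.values():
        # Node-level positive edge proportion.
        s["pep"] = s["pos_degree"] / s["degree"]
    return dict(stats)


node_stats = node_sign_degrees([("A", "B", 0.5), ("A", "C", -0.3), ("B", "C", 0.2)])
print(node_stats["A"])
```

A node whose degree equals its positive degree (PEP of 1) has no mutual exclusion edges, which is the situation marked by the solid line in the node plot.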

Supplementary Figure 12. Node plot for the microbial association network inferred using SPIEC-EASI after taxonomic agglomeration at the genus level for study ST49 (Kamimura et al., 2019). The name of the genera was abbreviated to 7 characters. The size of the nodes is made proportional to the relative abundance of the genus, and the transparency is made proportional to edge betweenness. The solid black line corresponds to equal values of degree and positive degree (i.e. all the nodes above have at least one negative edge). The dotted line is the linear regression line. The nodes above have a higher than average ratio of degree to positive degree.

7. Edge properties.

Using the microbial association networks generated in section 4, we created a combined data frame with edges obtained from all methods. Taxonomic information was merged for both the "from" and "to" nodes, and flags showing whether both nodes belonged to the same higher taxon (family, order or class) were created.

To evaluate which were the most stable and frequently detected edges, the edges were first grouped by edge name and type (copresence or mutual exclusion), pooling all data for different methods. The number of times and the frequency of detection of each edge was calculated, together with the average frequency of detection within study (a measure of edge stability). The topmost 25 copresence and mutual exclusion edges were plotted. A similar procedure was used to evaluate which were the most frequently detected associations using method SPIEC-EASI. In this case summary statistics were calculated after grouping by edge. The topmost (in terms of absolute frequency of detection) 25 copresence and mutual exclusion associations are shown in Supplementary Figures 13 and 14, respectively.
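The grouping described above can be sketched in Python as follows. This is one possible reading of the procedure, not the authors' code. The names edge_name and edge_stability and the record layout are hypothetical, and the within-study frequency is assumed to be the fraction of inference methods that detected the edge in a given dataset.

```python
from collections import defaultdict


def edge_name(node_a, node_b):
    """Order-independent edge label, e.g. 'A--B' for both (A, B) and (B, A)."""
    return "--".join(sorted((node_a, node_b)))


def edge_stability(records, n_methods):
    """Summarise how often each edge is detected.

    records:   iterable of (dataset, method, node_a, node_b, assoc_type).
    n_methods: number of inference methods applied to every dataset.
    Returns a dict keyed by (edge name, association type) with the number of
    datasets in which the edge was found and the mean fraction of methods
    detecting it within a dataset (assumed measure of stability).
    """
    methods_seen = defaultdict(set)
    for dataset, method, a, b, assoc_type in records:
        methods_seen[(edge_name(a, b), assoc_type, dataset)].add(method)

    fractions = defaultdict(list)
    for (edge, assoc_type, _dataset), methods in methods_seen.items():
        fractions[(edge, assoc_type)].append(len(methods) / n_methods)

    return {
        key: {"n_datasets": len(f), "mean_within_study_freq": sum(f) / len(f)}
        for key, f in fractions.items()
    }


records = [
    ("ST1", "SparCC", "A", "B", "copres"),
    ("ST1", "CCREPE", "B", "A", "copres"),
    ("ST1", "SPIEC-EASI", "A", "B", "copres"),
    ("ST1", "SPRING", "A", "B", "copres"),
    ("ST2", "SparCC", "A", "B", "copres"),
    ("ST2", "SPRING", "A", "B", "copres"),
    ("ST1", "SparCC", "A", "C", "mut_ex"),
]
summary = edge_stability(records, n_methods=4)
print(summary[("A--B", "copres")])
```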

Supplementary Figure 13. The 25 topmost (in terms of number of studies in which the association was detected) copresence associations detected in microbial association networks inferred after aggregation at the genus level with method SPIEC-EASI. The size of the points is made proportional to the median edge betweenness quantile.

Supplementary Figure 14. The 25 topmost (in terms of number of studies in which the association was detected) mutual exclusion associations detected in microbial association networks inferred after aggregation at the genus level with method SPIEC-EASI. The size of the points is made proportional to the median edge betweenness quantile.


Finally, in order to obtain a preliminary evaluation of the most frequently detected associations at the species level, microbial association network inference was performed on the same set of studies with the four methods, without taxonomic agglomeration. Only edges in which both nodes had been identified at the species level were retained, and the 50 most frequent associations were plotted. The list of these edges is shown in Supplementary Table 2. To evaluate if there was evidence of taxonomic assortativity, the odds ratio (OR) and relative risk (RR) of having a copresence association between members of the same higher taxon were calculated, after tabulation, using function epi.2by2() of package epiR (Stevenson et al., 2021). The decimal logarithm of the odds ratio was calculated (using a dummy value of 5 whenever the OR was Inf) and a chi-square test with Yates' correction was used to test the null hypothesis that being members of the same higher taxonomic group did not have any effect on the odds of having a copresence relationship. The results for assortativity at the family level are shown in Supplementary Figure 15.
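The handling of infinite odds ratios can be illustrated with the Python sketch below. It only mirrors the arithmetic described in the text; the actual estimates and tests were obtained with epi.2by2() in R. The function name log10_odds_ratio and the cell labels are hypothetical. The behaviour for an odds ratio of zero or for an undefined ratio is not described in the text and is an assumption of this sketch.

```python
import math


def log10_odds_ratio(a, b, c, d, inf_dummy=5.0):
    """log10 of the odds ratio of a 2x2 table.

    a: same higher taxon, copresence edge     b: same higher taxon, no copresence edge
    c: different taxon, copresence edge       d: different taxon, no copresence edge
    An infinite odds ratio is replaced by the dummy value 5, as in the text.
    Assumption (not in the text): OR = 0 gives -inf and 0/0 gives NaN.
    """
    numerator = a * d
    denominator = b * c
    if denominator == 0:
        if numerator == 0:
            return float("nan")  # odds ratio undefined
        return inf_dummy  # odds ratio is Inf
    if numerator == 0:
        return float("-inf")
    return math.log10(numerator / denominator)


print(log10_odds_ratio(10, 5, 20, 100))
print(log10_odds_ratio(3, 0, 2, 10))
```

In the first call the odds ratio is (10 x 100) / (5 x 20) = 10; in the second it is infinite because no pair from the same taxon lacks a copresence edge, so the dummy value is returned.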

Supplementary Figure 15. Distribution of the logarithm (base 10) of the odds ratio for the occurrence of a copresence association among genera belonging to the same family compared to genera belonging to different families. A dummy value of 5 was used when the odds ratio was infinite. The colour of the points shows whether a chi-square test rejected the null hypothesis that the frequency of copresence associations was not different between members of the same or different families (p<0.05).

Supplementary Table 2. The 50 most frequent associations detected at the species level using 4 inference methods (see section 2). For each edge, n indicates the number of methods which detected the association and med_ebq the median quantile for edge betweenness. ass.type indicates the nature of the association (copres = copresence, mut_ex = mutual exclusion). Some associations were detected in more than one dataset.

dataset | ass.type | edge name | n | med_ebq | Reference
ST23 | copres | Brevibacterium aureum--Staphylococcus equorum | 4 | 0.94 | Bassi et al., 2015
ST49 | copres | Corynebacterium variabile--Levilactobacillus brevis | 4 | 0.76 | Kamimura et al., 2019
ST149 | copres | Caryophanon latum--Porphyromonas levii | 4 | 0.69 | Cremonesi et al., 2020
ST49 | copres | Psychrobacter sanguinis--Vibrio rumoiensis | 4 | 0.68 | Kamimura et al., 2019
ST22 | copres | Staphylococcus equorum--Staphylococcus sciuri | 4 | 0.60 | Calasso et al., 2016
ST49 | copres | Staphylococcus equorum--Staphylococcus sciuri | 4 | 0.59 | Kamimura et al., 2019
ST9 | copres | Lactobacillus delbrueckii--Lactobacillus helveticus | 4 | 0.58 | Alessandria et al., 2016
ST41 | copres | Marinomonas foliarum--Vibrio litoralis | 4 | 0.52 | Dugat-Bony et al., 2016
ST49 | copres | Psychrobacter pacificensis--Psychrobacter sanguinis | 4 | 0.45 | Kamimura et al., 2019
ST49 | copres | Marinilactibacillus psychrotolerans--Psychrobacter pacificensis | 4 | 0.42 | Kamimura et al., 2019
ST43 | copres | Alkalibacterium gilvum--Marinilactibacillus psychrotolerans | 4 | 0.38 | Guzzon et al., 2017
ST49 | copres | Ligilactobacillus acidipiscis--Weissella paramesenteroides | 4 | 0.34 | Kamimura et al., 2019
ST149 | copres | Chryseobacterium anthropi--Chryseobacterium haifense | 4 | 0.11 | Cremonesi et al., 2020
ST41 | copres | Vibrio litoralis--Vibrio rumoiensis | 3 | 0.92 | Dugat-Bony et al., 2016
ST48 | copres | Brochothrix thermosphacta--Rahnella aquatilis | 3 | 0.86 | Marino et al., 2019
ST149 | copres | Acinetobacter guillouiae--Chryseobacterium haifense | 3 | 0.79 | Cremonesi et al., 2020
ST48 | copres | Anoxybacillus flavithermus--Shewanella putrefaciens | 3 | 0.69 | Marino et al., 2019
ST49 | copres | Lactiplantibacillus paraplantarum--Levilactobacillus brevis | 3 | 0.68 | Kamimura et al., 2019
ST48 | copres | Lactobacillus delbrueckii--Lactobacillus helveticus | 3 | 0.67 | Marino et al., 2019
ST18 | copres | Propionibacterium acnes--Pseudomonas mandelii | 3 | 0.66 | Stellato et al., 2015
ST41 | copres | Halomonas zhanjiangensis--Vibrio toranzoniae | 3 | 0.65 | Dugat-Bony et al., 2016
ST22 | copres | Staphylococcus equorum--Streptococcus pneumoniae | 3 | 0.65 | Calasso et al., 2016
ST23 | copres | Lacticaseibacillus rhamnosus--Lentilactobacillus buchneri | 3 | 0.56 | Bassi et al., 2015
ST3 | copres | Lactococcus lactis--Streptococcus thermophilus | 3 | 0.50 | De Filippis et al., 2014
ST6 | copres | Lactiplantibacillus pentosus--Latilactobacillus fuchuensis | 3 | 0.50 | De Pasquale et al., 2014a
ST49 | copres | Corynebacterium stationis--Psychrobacter meningitidis | 3 | 0.47 | Kamimura et al., 2019
ST49 | copres | Corynebacterium stationis--Tetragenococcus halophilus | 3 | 0.45 | Kamimura et al., 2019
ST49 | copres | Lactiplantibacillus paraplantarum--Weissella paramesenteroides | 3 | 0.43 | Kamimura et al., 2019
ST146 | copres | Lactococcus raffinolactis--Streptococcus parauberis | 3 | 0.38 | Castellanos-Rozo et al., 2020
ST49 | mut_ex | Lactiplantibacillus paraplantarum--Leuconostoc mesenteroides | 3 | 0.34 | Kamimura et al., 2019
ST49 | copres | Tetragenococcus halophilus--Weissella paramesenteroides | 3 | 0.32 | Kamimura et al., 2019
ST22 | copres | Corynebacterium tuberculostearicum--Propionibacterium granulosum | 3 | 0.27 | Calasso et al., 2016
ST49 | copres | Acinetobacter johnsonii--Pseudomonas fragi | 3 | 0.21 | Kamimura et al., 2019
ST8 | copres | Enterococcus faecalis--Macrococcus caseolyticus | 3 | 0.21 | Dolci et al., 2014
ST49 | copres | Corynebacterium stationis--Corynebacterium variabile | 3 | 0.18 | Kamimura et al., 2019
ST49 | copres | Ligilactobacillus acidipiscis--Tetragenococcus halophilus | 3 | 0.16 | Kamimura et al., 2019
ST9 | copres | Lacticaseibacillus casei--Limosilactobacillus fermentum | 3 | 0.05 | Alessandria et al., 2016
ST10RNA | copres | Lacticaseibacillus casei--Limosilactobacillus fermentum | 2 | 1.00 | De Filippis et al., 2016
ST8 | copres | Lactobacillus delbrueckii--Streptococcus thermophilus | 2 | 1.00 | Dolci et al., 2014
ST1 | copres | Lactococcus garvieae--Weissella viridescens | 2 | 0.96 | Guidone et al., 2016
ST18 | copres | Lactobacillus delbrueckii--Lactobacillus helveticus | 2 | 0.96 | Stellato et al., 2015
ST18 | copres | Chromohalobacter canadensis--Lactobacillus delbrueckii | 2 | 0.88 | Stellato et al., 2015
ST8 | mut_ex | Acinetobacter johnsonii--Lactobacillus delbrueckii | 2 | 0.84 | Dolci et al., 2014
ST10RNA | copres | Lactobacillus helveticus--Streptococcus thermophilus | 2 | 0.83 | De Filippis et al., 2016
ST23 | copres | Enterococcus casseliflavus--Limosilactobacillus fermentum | 2 | 0.82 | Bassi et al., 2015
ST22 | copres | Staphylococcus sciuri--Streptococcus pneumoniae | 2 | 0.80 | Calasso et al., 2016
ST10RNA | copres | Lacticaseibacillus casei--Lentilactobacillus kefiri | 2 | 0.79 | De Filippis et al., 2016
ST22 | copres | Clostridium difficile--Propionibacterium granulosum | 2 | 0.78 | Calasso et al., 2016
ST49 | copres | Corynebacterium stationis--Psychrobacter celer | 2 | 0.78 | Kamimura et al., 2019
ST22 | copres | Clostridium difficile--Lactiplantibacillus pentosus | 2 | 0.77 | Calasso et al., 2016

References

Afshari, R., Pillidge, C.J., Dias, D.A., Osborn, A.M., Gill, H., 2020. Microbiota and metabolite profiling combined with integrative analysis for differentiating cheeses of varying ripening ages. Front. Microbiol. 11, 592060. https://doi.org/10.3389/fmicb.2020.592060
Alessandria, V., Ferrocino, I., De Filippis, F., Fontana, M., Rantsiou, K., Ercolini, D., Cocolin, L., 2016. Microbiota of an Italian Grana like cheese during manufacture and ripening unraveled by 16S rRNA-based approaches. Appl. Environ. Microbiol. 82, 3988–3995. https://doi.org/10.1128/aem.00999-16
Bassi, D., Puglisi, E., Cocconcelli, P.S., 2015. Understanding the bacterial communities of hard cheese with blowing defect. Food Microbiol. 52, 106–118. https://doi.org/10.1016/j.fm.2015.07.004
Biolcati, F., Ferrocino, I., Bottero, M.T., Dalmasso, A., 2020. Short communication: High-throughput sequencing approach to investigate Italian artisanal cheese production. J. Dairy Sci. 103, 10015–10021. https://doi.org/10.3168/jds.2020-18208
Bottari, B., Levante, A., Bancalari, E., Sforza, S., Bottesini, C., Prandi, B., De Filippis, F., Ercolini, D., Nocetti, M., Gatti, M., 2020. The interrelationship between microbiota and peptides during ripening as a driver for Parmigiano Reggiano cheese quality. Front. Microbiol. 11, 581658. https://doi.org/10.3389/fmicb.2020.581658
Calasso, M., Ercolini, D., Mancini, L., Stellato, G., Minervini, F., Di Cagno, R., De Angelis, M., Gobbetti, M., 2016. Relationships among house, rind and core microbiotas during manufacture of traditional Italian cheeses at the same dairy plant. Food Microbiol. 54, 115–126. https://doi.org/10.1016/j.fm.2015.10.008
Calasso, M., Minervini, F., De Filippis, F., Ercolini, D., De Angelis, M., Gobbetti, M., 2020. Attenuated Lactococcus lactis and surface bacteria as tools for conditioning the microbiota and driving the ripening of semisoft Caciotta cheese. Appl. Environ. Microbiol. 86. https://doi.org/10.1128/aem.02165-19
Castellanos-Rozo, J., Pulido, R.P., Grande, M.J., Lucas, R., Gálvez, A., 2020. Analysis of the bacterial diversity of Paipa cheese (a traditional raw cow’s milk cheese from Colombia) by High-Throughput Sequencing. Microorg. 8, 218. https://doi.org/10.3390/microorganisms8020218
Choi, J., Lee, S.I., Rackerby, B., Frojen, R., Goddik, L., Ha, S.-D., Park, S.H., 2020a. Assessment of overall microbial community shift during Cheddar cheese production from raw milk to aging. Appl. Microbiol. Biot. 104, 6249–6260. https://doi.org/10.1007/s00253-020-10651-7
Choi, J., Lee, S.I., Rackerby, B., Goddik, L., Frojen, R., Ha, S.-D., Kim, J.H., Park, S.H., 2020b. Microbial communities of a variety of cheeses and comparison between core and rind region of cheeses. J. Dairy Sci. https://doi.org/10.3168/jds.2019-17455
Cremonesi, P., Morandi, S., Ceccarani, C., Battelli, G., Castiglioni, B., Cologna, N., Goss, A., Severgnini, M., Mazzucchi, M., Partel, E., Tamburini, A., Zanini, L., Brasca, M., 2020. Raw milk microbiota modifications as affected by chlorine usage for cleaning procedures: the Trentingrana PDO case. Front. Microbiol. 11, 564749. https://doi.org/10.3389/fmicb.2020.564749
De Filippis, F., Genovese, A., Ferranti, P., Gilbert, J.A., Ercolini, D., 2016. Metatranscriptomics reveals temperature-driven functional changes in microbiome impacting cheese maturation rate. Sci. Rep. 6, 21871. https://doi.org/10.1038/srep21871

De Filippis, F., La Storia, A., Stellato, G., Gatti, M., Ercolini, D., 2014. A selected core microbiome drives the early stages of three popular Italian cheese manufactures. PLoS One 9, e89680. https://doi.org/10.1371/journal.pone.0089680
De Pasquale, I., Calasso, M., Mancini, L., Ercolini, D., La Storia, A., De Angelis, M., Di Cagno, R., Gobbetti, M., 2014a. Causal relationship between microbial ecology dynamics and proteolysis during manufacture and ripening of Canestrato Pugliese PDO cheese. Appl. Environ. Microbiol. 80, 4085–4094. https://doi.org/10.1128/aem.00757-14
De Pasquale, I., Di Cagno, R., Buchin, S., De Angelis, M., Gobbetti, M., 2016. Spatial distribution of the metabolically active microbiota within Italian PDO ewes’ milk cheeses. PLoS One 11, e0153213. https://doi.org/10.1371/journal.pone.0153213
Dolci, P., De Filippis, F., La Storia, A., Ercolini, D., Cocolin, L., 2014. rRNA-based monitoring of the microbiota involved in Fontina PDO cheese production in relation to different stages of cow lactation. Int. J. Food Microbiol. 185, 127–135. https://doi.org/10.1016/j.ijfoodmicro.2014.05.021
Dolci, P., Ferrocino, I., Giordano, M., Pramotton, R., Vernetti-Prot, L., Zenato, S., Barmaz, A., 2020. Impact of Lactococcus lactis as starter culture on microbiota and metabolome profile of an Italian raw milk cheese. Int. Dairy J. 110, 104804. https://doi.org/10.1016/j.idairyj.2020.104804
Dugat-Bony, E., Garnier, L., Denonfoux, J., Ferreira, S., Sarthou, A.-S., Bonnarme, P., Irlinger, F., 2016. Highlighting the microbial diversity of 12 French cheese varieties. Int. J. Food Microbiol. 238, 265–273. https://doi.org/10.1016/j.ijfoodmicro.2016.09.026
Falardeau, J., Keeney, K., Trmčić, A., Kitts, D., Wang, S., 2019. Farm-to-fork profiling of bacterial communities associated with an artisan cheese production facility. Food Microbiol. 83, 48–58. https://doi.org/10.1016/j.fm.2019.04.002
Faust, K., Sathirapongsasuti, J.F., Izard, J., Segata, N., Gevers, D., Raes, J., Huttenhower, C., 2012. Microbial Co-occurrence Relationships in the Human Microbiome. PLoS Comp. Biol. 8, e1002606. https://doi.org/10.1371/journal.pcbi.1002606
Frétin, M., Martin, B., Rifa, E., Isabelle, V.-M., Pomiès, D., Ferlay, A., Montel, M.-C., Delbès, C., 2018. Bacterial community assembly from cow teat skin to ripened cheeses is influenced by grazing systems. Sci. Rep. 8, 200. https://doi.org/10.1038/s41598-017-18447-y
Giello, M., La Storia, A., Masucci, F., Francia, A.D., Ercolini, D., Villani, F., 2017. Dynamics of bacterial communities during manufacture and ripening of traditional Caciocavallo of Castelfranco cheese in relation to cows’ feeding. Food Microbiol. 63, 170–177. https://doi.org/10.1016/j.fm.2016.11.016
Guidone, A., Zotta, T., Matera, A., Ricciardi, A., De Filippis, F., Ercolini, D., Parente, E., 2016. The microbiota of high-moisture mozzarella cheese produced with different acidification methods. Int. J. Food Microbiol. 216, 9–17. https://doi.org/10.1016/j.ijfoodmicro.2015.09.002
Guzzon, R., Carafa, I., Tuohy, K., Cervantes, G., Vernetti, L., Barmaz, A., Larcher, R., Franciosi, E., 2017. Exploring the microbiota of the red-brown defect in smear-ripened cheese by 454-pyrosequencing and its prevention using different cleaning systems. Food Microbiol. 62, 160–168. https://doi.org/10.1016/j.fm.2016.10.018
Kamimura, B.A., Filippis, F.D., Sant’Ana, A.S., Ercolini, D., 2019. Large-scale mapping of microbial diversity in artisanal Brazilian cheeses. Food Microbiol. 80, 40–49. https://doi.org/10.1016/j.fm.2018.12.014
Kurtz, Z.D., Müller, C.L., Miraldi, E.R., Littman, D.R., Blaser, M.J., Bonneau, R.A., 2015. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comp. Biol. 11, e1004226. https://doi.org/10.1371/journal.pcbi.1004226
Marino, M., de Wittenau, G.D., Saccà, E., Cattonaro, F., Spadotto, A., Innocente, N., Radovic, S., Piasentier, E., Marroni, F., 2019. Metagenomic profiles of different types of Italian high-moisture Mozzarella cheese. Food Microbiol. 79, 123–131. https://doi.org/10.1016/j.fm.2018.12.007
McMurdie, P.J., Holmes, S., 2013. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS One 8, e61217. https://doi.org/10.1371/journal.pone.0061217
Minervini, F., Conte, A., Del Nobile, M.A., Gobbetti, M., De Angelis, M., 2017. Dietary fibers and protective Lactobacilli drive Burrata cheese microbiome. Appl. Environ. Microbiol. 83, e01494-17–15. https://doi.org/10.1128/aem.01494-17
O’Sullivan, D.J., Cotter, P.D., O’Sullivan, O., Giblin, L., McSweeney, P.L.H., Sheehan, J.J., 2015. Temporal and spatial differences in microbial composition during the manufacture of a continental-type cheese. Appl. Environ. Microbiol. 81, 2525–2533. https://doi.org/10.1128/aem.04054-14
Parente, E., 2021. DairyFMBN: a database of studies on the bacterial microbiota of dairy products, Mendeley Data, V5, doi: 10.17632/3cwf729p34.5
Parente, E., De Filippis, F., Ercolini, D., Ricciardi, A., Zotta, T., 2019. Advancing integration of data on food microbiome studies: FoodMicrobionet 3.1, a major upgrade of the FoodMicrobionet database. Int. J. Food Microbiol. 305, 108249. https://doi.org/10.1016/j.ijfoodmicro.2019.108249
Parente, E., Guidone, A., Matera, A., De Filippis, F., Mauriello, G., Ricciardi, A., 2016. Microbial community dynamics in thermophilic undefined milk starter cultures. Int. J. Food Microbiol. 217, 59–67. https://doi.org/10.1016/j.ijfoodmicro.2015.10.014
Parente, E., Ricciardi, A., Zotta, T., 2020. The microbiota of dairy milk: A review. Int. Dairy J. 107, 104714. https://doi.org/10.1016/j.idairyj.2020.104714
Pedersen, T.L., 2020. tidygraph: A Tidy API for Graph Manipulation. R package version 1.2.0. https://CRAN.R-project.org/package=tidygraph
Pedersen, T.L., 2021. ggraph: An Implementation of Grammar of Graphics for Graphs and Networks. R package version 2.0.5. https://CRAN.R-project.org/package=ggraph
Penland, M., Falentin, H., Parayre, S., Pawtowski, A., Maillard, M.-B., Thierry, A., Mounier, J., Coton, M., Deutsch, S.-M., 2021. Linking Pélardon artisanal goat cheese microbial communities to aroma compounds during cheese-making and ripening. Int. J. Food Microbiol. 345, 109130. https://doi.org/10.1016/j.ijfoodmicro.2021.109130
Peschel, S., Müller, C.L., Mutius, E. von, Boulesteix, A.-L., Depner, M., 2020. NetCoMi: network construction and comparison for microbiome data in R. Brief Bioinform. https://doi.org/10.1093/bib/bbaa290
Quigley, L., O’Sullivan, O., Beresford, T.P., Ross, R.P., Fitzgerald, G.F., Cotter, P.D., 2012. High-throughput sequencing for detection of subpopulations of bacteria not previously associated with artisanal cheeses. Appl. Environ. Microbiol. 78, 5717–5723. https://doi.org/10.1128/aem.00918-12
Quigley, L., O’Sullivan, D.J., Daly, D., O’Sullivan, O., Burdikova, Z., Vana, R., Beresford, T.P., Ross, R.P., Fitzgerald, G.F., McSweeney, P.L.H., Giblin, L., Sheehan, J.J., Cotter, P.D., 2016. Thermus and the pink discoloration defect in cheese. mSystems 1. https://doi.org/10.1128/msystems.00023-16
Salazar, J.K., Carstens, C.K., Ramachandran, P., Shazer, A.G., Narula, S.S., Reed, E., Ottesen, A., Schill, K.M., 2018. Metagenomics of pasteurized and unpasteurized gouda cheese using targeted 16S rDNA sequencing. BMC Microbiol. 18, 189. https://doi.org/10.1186/s12866-018-1323-4
Sattin, E., Andreani, N.A., Carraro, L., Fasolato, L., Balzan, S., Novelli, E., Squartini, A., Telatin, A., Simionati, B., Cardazzo, B., 2016. Microbial dynamics during shelf-life of industrial Ricotta cheese and identification of a Bacillus strain as a cause of a pink discolouration. Food Microbiol. 57, 8–15. https://doi.org/10.1016/j.fm.2015.12.009
Stellato, G., De Filippis, F., La Storia, A., Ercolini, D., 2015. Coexistence of Lactic Acid Bacteria and potential spoilage microbiota in a dairy processing environment. Appl. Environ. Microbiol. 81, 7893–7904. https://doi.org/10.1128/aem.02294-15
Stevenson, M., Sergeant, E., et al., 2021. epiR: Tools for the Analysis of Epidemiological Data. R package version 2.0.26. https://CRAN.R-project.org/package=epiR
Tang, Y., Horikoshi, M., Li, W., 2016. ggfortify: unified interface to visualize statistical result of popular R packages. The R Journal 8.2, 478–489.
Yoon, G., Gaynanova, I., Müller, C.L., 2019. Microbial networks in SPRING - Semi-parametric Rank-Based Correlation and Partial Correlation Estimation for Quantitative Microbiome Data. Front. Gen. 10, 516. https://doi.org/10.3389/fgene.2019.00516
