
Metagenomics in wine fermentation

Klincke, Franziska

Publication date: 2018

Document Version Publisher's PDF, also known as Version of record


Citation (APA): Klincke, F. (2018). Metagenomics in wine fermentation. Technical University of Denmark.



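$$\sum_{i=1}^{D} x_i = c, \qquad x_i > 0, \quad 1 \le i \le D, \qquad c = 100$$

$$\operatorname{clr}(x) = \left( \ln\frac{x_1}{g(x)}, \ldots, \ln\frac{x_D}{g(x)} \right)$$

where g(x) denotes the geometric mean of the components of x.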
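As a minimal sketch of the two expressions above, the following Python snippet rescales a vector of positive counts so that it sums to the constant c and then applies the centred log-ratio transformation. The function names `closure`, `geometric_mean` and `clr` and the example counts are illustrative assumptions, not taken from MicroWineBar, and the natural logarithm is assumed.

```python
import math

def closure(x, c=100.0):
    """Rescale positive components so that they sum to the constant c."""
    total = sum(x)
    return [c * v / total for v in x]

def geometric_mean(x):
    """Geometric mean g(x), computed in log space for numerical stability."""
    return math.exp(sum(math.log(v) for v in x) / len(x))

def clr(x):
    """Centred log-ratio: ln(x_i / g(x)) for every component x_i > 0."""
    g = geometric_mean(x)
    return [math.log(v / g) for v in x]

counts = [10.0, 30.0, 60.0]     # hypothetical counts for three taxa
composition = closure(counts)   # components now sum to c = 100
transformed = clr(composition)  # clr components sum to zero
```

Because the transformation only depends on ratios, applying `clr` to the raw counts or to the closed composition gives the same result, which is why it is suited to data that carry only relative information.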

[Figure: workflow diagram. Labels recovered from the diagram: raw reads; quality control; trimmed & filtered reads. Read-based branch: taxonomic annotation; species abundance table; differential abundance analysis; differentially abundant. Assembly-based branch: de novo assembly (contigs); genome binning; draft genomes; quality control; taxonomic annotation; gene calling; genes.]

1 of 17 Analysing complex metagenomics data with MicroWineBar

Analysing complex metagenomics data with MicroWineBar

Authors

Franziska Klincke¹, Martin Abel-Kistrup², Sofie Maria Gilberte Saerens², Simon Rasmussen³,*

Affiliations

¹ Department of Bio and Health Informatics, Technical University of Denmark, Kongens Lyngby, Denmark
² CHR. Hansen A/S, Hoersholm, Denmark
³ Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark

Author List Footnotes

Correspondence: Simon Rasmussen ([email protected])

Abstract

An important step in metagenomics studies is to identify the microbial organisms that are present in a sample and to compare samples from different environments. Here we introduce MicroWineBar, a graphical tool for analysing and comparing metagenomics samples. MicroWineBar can visualize the abundances of metagenomics samples in line and bar graphs and perform Principal Coordinates Analysis (PCoA) as well as analyses of richness and diversity. Additionally, one can perform a differential abundance analysis between two environments with a compositional data approach using a log-ratio transformation with the ANCOM method. We use MicroWineBar to analyse two different years of wine fermentation as well as data from a human microbiome study of colorectal cancer. Importantly, MicroWineBar does not require any programming skills and is intuitive and user-friendly. MicroWineBar is available at: https://github.com/klincke/MicroWineBar

Keywords

Metagenomics, microbiome, fermentation, wine, metagenomics tools, metagenomics analysis


Introduction

Metagenomics is the application of sequencing techniques to study the communities of microbial organisms in an environment [1]. Examples of environments that have been studied include sea water from the Sargasso Sea [2], acid mine drainage [3] and the human gut [4]. One basic question in metagenomics studies is which species are present in a sample from a specific environment and whether there are differences in species composition between environments. However, analysis of metagenomics data is not straightforward because large numbers of reads must be processed and the microbial composition of the samples is often largely unknown. To investigate these questions, it is important to explore the relative abundances of species or of higher taxonomic ranks. Statistically differentially abundant species between two environments can be determined using either nonparametric methods such as the Wilcoxon rank sum test or a Wald test that controls for heteroscedasticity in the data. Importantly, for the latter, one also has to normalize for the sample sizes, which can be done by taking the geometric mean over all samples to calculate scaling factors. Finally, it is even better to take into account that count data from metagenomics studies are compositional, which can be achieved through log-ratio transformation [5]. Additionally, visualization and dimensionality reduction are needed to get an overview of metagenomics samples.

Many bioinformatic tools exist for visualising metagenomics data. One of the first tools to be developed was MEGAN [6-8]. MEGAN displays the taxonomic hierarchy as node-link diagrams in which each node has a small, log-scaled quantitative chart. The advantage of this approach is that each node is represented in the hierarchy. Another tool for visualising relative abundances is Krona [9]. Krona displays pie charts, which are subdivided into sectors with an embedded hierarchy. This is an easy way to display several taxonomic ranks at the same time. However, it is difficult to compare the relative abundances of two species since the abundances are represented by angle. Cleveland and McGill demonstrated that it is easier to compare values which are represented by length (as in a bar graph) than by angle (as in a pie chart) [10].

We have developed a new tool for statistical analysis, dimensionality reduction and visualization of metagenomics datasets. The tool is called MicroWineBar and displays relative abundances interactively in bar graphs. MicroWineBar not only includes features to compare individual metagenomics samples but also enables the user to compare two groups of metagenomics samples, e.g. from two environments. It supports common data exploration techniques including Principal Coordinates Analysis (PCoA) scatter plots and Shannon diversity index plots to interpret variability in a metagenomics dataset. In addition, it provides both scatter and box plots for analysing the species richness. We have implemented several methods to determine differentially abundant species between two groups of samples: the nonparametric Wilcoxon rank sum test, the commonly used DESeq2 [11] package and a differential abundance test using ANCOM (Analysis of composition of microbiomes) [12]. The latter calculates the pairwise log-ratios between all species and performs a significance test to reduce the False Discovery Rate (FDR), from which many differential abundance analysis methods suffer. ANCOM treats the count data from metagenomics samples as compositional data because the relative abundances within a sample sum to one [5]. For a given species the null hypothesis is that both sample groups have the same mean abundance. Additionally, it is possible to get information about specific species through both Wikipedia and PubMed. We exemplify the use of MicroWineBar using metagenomics samples from wine fermentations of Bobal grapes and by reanalysing a published human microbiome dataset [13]. In both examples we point out the differences between two sample groups. In the wine data we compare samples from two years and can see that there are differences between the vintages, especially regarding richness and diversity. For the human microbiome dataset, we can largely replicate the findings of Zeller et al. between the two sample groups. MicroWineBar is available at https://github.com/klincke/MicroWineBar

Results

Overview of MicroWineBar

MicroWineBar is designed to read simple text input files generated by programs that estimate abundances of microbial organisms from metagenomics samples, such as MGmapper [14], Kraken [15] or MetaPhlAn [16]. These files contain absolute (and relative) abundances with taxonomic annotations determined from mapping to whole genome databases, k-mer based approaches or marker gene databases, respectively. The design of MicroWineBar follows the visual information-seeking mantra: overview first, zoom and filter, then details on demand [17]. These key tasks are recommended for designing advanced graphical user interfaces and are addressed in the following. The main window of MicroWineBar displays relative abundances in bar graphs. By default, it starts by displaying a bar graph of the (relative) abundances of the species present in the first sample that was loaded. Each species is represented by one bar whose height corresponds to the (relative) abundance of the species in that sample (Figure 1). One can change the taxonomic rank, filter out individual species or taxonomic groups, or filter for a minimum (relative) abundance. Additionally, information about the species can be retrieved in the form of a Wikipedia summary or PubMed publications. One can create line and stacked bar graphs for several samples on all taxonomic ranks. In addition, it is possible to compare two groups of samples to identify differentially abundant species. We demonstrate this in the following paragraphs using two example datasets.

Differences in the wine fermentation of two vintages

We tested MicroWineBar using a dataset of 21 metagenomics samples from two different years of wine fermentations from a winery in the La Mancha region in Spain. The aim was to identify differentially abundant species between the fermentations of Bobal grapes from 2012 (6 samples from one fermentation tank with a mean of 17,285,292 reads per sample) and from 2013 (15 samples from two fermentation tanks with a mean of 13,514,142 reads per sample). With a rarefaction curve (Figure 2A) we investigated whether the samples were sequenced deeply enough to capture all microorganisms present in the environment. From this it was apparent that the samples from 2013 in general contained more species than the samples from 2012. When analysing the species richness, we found that samples from 2012 had a significantly lower richness than the samples from 2013 (p-value: 4e-5), with a median of 41 compared to 94 (Figure 2B). Additionally, we found the Shannon diversity index to be significantly higher (p-value: 0.04) for the 2013 samples (median of 2.03) than for the 2012 samples (median of 0.53) (Figure 2C). In the PCoA the samples from the two years were well separated into two groups (Figure 2D), with the first three principal coordinates explaining 44.55%, 25.05% and 13.88% of the variance, respectively.

Using the conservative ANCOM method we found four species to be differentially abundant. Of these, Botrytis cinerea, Bradyrhizobium sp. BTAi1 and Pseudomonas syringae are more abundant in 2013 whereas Hanseniaspora vineae is more abundant in 2012 (Table 1). The presence of H. vineae is in good concordance with previous findings that microorganisms originating from the grapes, such as Candida, Pichia, Brettanomyces, Aureobasidium or Hanseniaspora, are present at the beginning of wine fermentation [19]. Some of these, such as Brettanomyces [20], are considered spoilage microorganisms; however, later during the alcoholic fermentation the principal wine yeast Saccharomyces cerevisiae dominates the community. Additionally, we applied the widely used DESeq2 [11] package to identify significantly differentially abundant species between the two years. We filtered the species for an adjusted p-value < 0.05, an absolute log2 fold change > 2 and a base mean > 300 and obtained 52 differentially abundant species including the 4 found by ANCOM (Supplementary Table S1).

Replicating a human microbiome study

In addition to investigating metagenomics samples from wine fermentations we used MicroWineBar to reanalyse previously published data. Here we investigated 141 human gut microbiome samples of colorectal carcinoma (CRC) patients and tumour-free controls [13]. In the original study the authors created a classifier based on 22 species to distinguish CRC patients from controls based on the microbiome profiles and taxonomic markers. We wanted to replicate their findings of taxonomic markers, namely two Fusobacterium species, Porphyromonas asaccharolytica and Peptostreptococcus stomatis, enriched in the CRC samples. To generate count data to be analysed in MicroWineBar we aligned the reads to reference genome databases and determined species abundance using MGmapper [14]. We then loaded the count data into MicroWineBar for analysis.

Initially we performed a PCoA based on the Bray-Curtis dissimilarity index to identify differences between the two groups (Figure 4A); however, we were not able to identify any clear differences between the phenotypes. We next investigated species richness and found it to be slightly higher in the CRC samples than in the controls, with medians of 180 and 176, respectively. However, this was not statistically significant (p-value: 0.1) (Figure 4B). In contrast, the Shannon diversity index was slightly, but not significantly, higher (p-value: 0.1) in the control samples (median of 5.42) than in the CRC samples (median of 5.22) (Figure 4C). Richness and diversity were likewise not found to be significantly different in the study by Zeller et al. [13]. We found Fusobacterium species to be present in some CRC samples and in none of the control samples. Additionally, we found P. asaccharolytica to be present in 75% of the CRC samples compared to only 60% of the control samples. Finally, many Bacteroides species were present in all CRC samples as well as most of the control samples. Bacteroides are a major component of the human gut microbiota. To investigate differential abundance between the two groups we first used the conservative ANCOM method. This resulted in only one species being differentially abundant, namely P. stomatis, which was more abundant in the CRC samples. With DESeq2 (filtered for an adjusted p-value < 0.05, an absolute log2 fold change > 2 and a base mean > 300) we identified 21 species to be differentially abundant. Among these were three Bacteroides, one Fusobacterium, one Parvimonas and four Veillonella species (Supplementary Table S2).
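The three DESeq2 thresholds used for both datasets (adjusted p-value < 0.05, absolute log2 fold change > 2, base mean > 300) amount to a simple row filter. The sketch below illustrates this on hypothetical result rows; the keys `baseMean`, `log2FoldChange` and `padj` follow the column names of a DESeq2 results table, while the function name `filter_deseq2`, the taxon names and all values are assumptions made for illustration and are not results from the study.

```python
def filter_deseq2(rows, padj_max=0.05, lfc_min=2.0, base_mean_min=300.0):
    """Keep rows passing all three thresholds; rows with a missing padj are dropped."""
    kept = []
    for row in rows:
        padj = row.get("padj")
        if padj is None:  # no adjusted p-value available for this taxon
            continue
        if (padj < padj_max
                and abs(row["log2FoldChange"]) > lfc_min
                and row["baseMean"] > base_mean_min):
            kept.append(row)
    return kept

# Hypothetical result rows, not values from the study.
results = [
    {"species": "taxon_A", "baseMean": 1200.0, "log2FoldChange": 3.1, "padj": 0.001},
    {"species": "taxon_B", "baseMean": 150.0, "log2FoldChange": -4.0, "padj": 0.0004},
    {"species": "taxon_C", "baseMean": 900.0, "log2FoldChange": -2.6, "padj": 0.03},
    {"species": "taxon_D", "baseMean": 5000.0, "log2FoldChange": 0.8, "padj": 0.01},
    {"species": "taxon_E", "baseMean": 700.0, "log2FoldChange": 5.2, "padj": None},
]
significant = [row["species"] for row in filter_deseq2(results)]
```

Here only `taxon_A` and `taxon_C` pass: `taxon_B` fails the base mean threshold, `taxon_D` the fold change threshold, and `taxon_E` has no adjusted p-value.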


Discussion

MicroWineBar is designed to be a generic visualisation tool for metagenomics samples. This means that it is not tied to a specific analysis toolkit for preparing the input. Among other features it displays relative abundances of species of one or several samples in bar graphs without showing the hierarchy of the taxonomic classifications. Additionally, one can compare groups of samples.

As expected, there are more wine spoilage organisms in the wine fermentation samples from 2013 than in those from 2012, and with this an increased species richness in general. This was likely due to differences in weather conditions, as humid conditions favour spoilage organisms such as Hanseniaspora, Aspergillus, Botrytis and Aureobasidium [19,20]. The year 2013 was generally a humid year in Europe and therefore a challenging year for winemaking (Figure 3B). In the La Mancha region, the average humidity was significantly (p-value: 0.049) higher in 2013 than in 2012, especially in the growing season from April to October (p-value: 0.001). This is also reflected in general by wines from the La Mancha region, which were graded much higher in 2012 (excellent) than in 2013 (good) [21].

Non-Saccharomyces wine yeasts were in the past considered spoilage yeasts but have recently attracted more attention as they are believed to positively modify the wine aroma [20,22,23]. It is interesting that the non-Saccharomyces yeast H. vineae is more abundant in 2012 than in 2013 and that in 2012 it is more abundant than H. uvarum. This might have an influence on the wine aroma for the two years even though the relative abundance of H. vineae was low. Furthermore, S. cerevisiae clearly dominates the fermentations from both years and was even more abundant in the 2013 samples than in the 2012 samples.

For the human microbiome dataset, ANCOM identified only one of the two species that contribute most to the classifier in the study by Zeller et al. as differentially abundant, namely P. stomatis. With DESeq2 we obtained more differentially abundant species, among them Fusobacterium nucleatum, which contributes most to the classifier by Zeller et al. This means that we could only partially replicate the results, which might be due to the fact that we used another approach to map the reads to the databases for determining the taxonomic abundance and annotations. In other words, we might have started with a different set of species found in the samples, which would influence the result of the analysis. This also points to a general problem in metagenomics, namely the reproducibility of results from metagenomics studies.

An issue with several methods for differential abundance analysis can be a high FDR. This is reflected in our results, where we identified more significantly differentially abundant species using DESeq2 than using ANCOM. The latter resulted in fewer species being called as differentially abundant in both the wine and the CRC datasets. For both H. vineae and P. stomatis we can be more confident that they are differentially abundant; however, the other species reported by DESeq2 to be differentially abundant are likely false positives.

MicroWineBar enables researchers without any programming skills to perform analysis and visualization of complex metagenomics datasets. We hope that MicroWineBar will contribute to making analysis of metagenomics data more accessible for non-bioinformaticians.

Methods

Implementation of MicroWineBar

MicroWineBar is written in Python 3 (v.3.6.5) and uses the GUI package Tkinter. It needs to be installed locally and runs on Mac OS and Linux. The pandas (v.0.23.0) [26] package is used to represent the data in data frames. The scikit-bio (v.0.5.3) package is used to calculate the Shannon diversity index and the Bray-Curtis dissimilarity, to create the PCoA and to perform the differential abundance test using ANCOM [12] with default parameters. The rpy2 (v.2.9.4) and DESeq2 (v.1.20.0) [11] packages are used for the differential abundance analysis with DESeq2 in R (v.3.5.1) [18]. In general, the Benjamini-Hochberg p-value correction was used for multiple hypothesis testing and a corrected p-value of less than 0.05 was considered statistically significant. For the DESeq2 results we additionally required a log2 fold change of 2 or more to call significance. The rarefaction curve was created using the vegan [27] package in R. The scipy (v.1.1.0) [28] package was used for the nonparametric Wilcoxon rank sum test and the matplotlib (v.2.2.2) [29] package was used to create figures. Welch's unequal variances t-test from the scipy package was used to test for statistical differences between two sample groups, i.e. for richness and Shannon diversity. The module wikipedia (v.1.4.0) was used to retrieve the summary of a Wikipedia entry and the module webbrowser was used to open a website in a browser. MicroWineBar takes files with absolute (and relative) abundances and taxonomic annotations created by Kraken [15], MetaPhlAn [16] and MGmapper [14] as input. For MGmapper, the output files from several databases need to be merged with the build_table_from_mgmapper_positive.py script.
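Several of the quantities named in this section have compact definitions. The sketch below writes out the Shannon diversity index, the Bray-Curtis dissimilarity and the Welch t statistic using only the Python standard library. It is an illustration of the formulas, not the scikit-bio or scipy code that MicroWineBar calls; the function names, the example counts and the choice of base 2 for the logarithm are assumptions, and the p-value (which requires the t distribution) is left out.

```python
import math

def shannon(counts, base=2.0):
    """Shannon diversity H = -sum(p_i * log(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in props)

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variance of group a
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)  # sample variance of group b
    se2 = va / na + vb / nb                        # squared standard error
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical count vectors for two samples over the same four taxa.
sample_1 = [25, 25, 25, 25]
sample_2 = [97, 1, 1, 1]
h_even = shannon(sample_1)    # evenly distributed community
h_skewed = shannon(sample_2)  # community dominated by one taxon
bc = bray_curtis(sample_1, sample_2)

# Hypothetical diversity values for two groups of samples.
t_stat, dof = welch_t([2.0, 2.1, 1.9, 2.2], [0.5, 0.6, 0.4, 0.7])
```

The evenly distributed sample has the higher Shannon index, which mirrors how a fermentation dominated by a single yeast yields a low diversity value.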


Processing of wine samples

The 21 shotgun metagenomics Bobal wine samples were obtained from two studies. From Melkonian et al. (in preparation) we only used the Bobal samples from the two control tanks. The sampling points for the six samples from 2012 were 0h, 16h, 24h, 32h, 48h and 96h after fermentation. After 24h the tank was inoculated with S. cerevisiae to start the alcoholic fermentation. The sequencing reads from both years were generated in the following way.

For DNA isolation, cells were pelleted from 50 ml of wine centrifuged at 4,500 g for 10 minutes and subsequently washed three times with 10 ml of 4 °C phosphate buffered saline. The pellet was mixed with G2-DNA enhancer (Ampliqon, Odense, Denmark) in 2 ml tubes and incubated at room temperature for 5 min. Then 1 ml of lysis buffer (20 mM Tris-HCl, pH 8.0, 2 mM EDTA and 40 mg/ml lysozyme) was added to the tube and incubated at 37 °C for one hour. An additional 1 ml of CTAB/PVP lysis buffer (50) was added to the lysate and incubated at 65 °C for one hour. DNA was purified from 1 ml of lysate with an equal volume of phenol-chloroform-isoamyl alcohol mixture 49.5:49.5:1 and the upper aqueous layer was further purified with a MinElute PCR Purification kit and the QIAvac 24 plus (Qiagen, Hilden, Germany), according to the manufacturer's instructions, before it was eluted in 100 μl DNase-free water. Prior to library building, genomic DNA was fragmented to an average length of ~400 bp using the Bioruptor XL (Diagenode, Inc.), with a profile of 20 cycles of 15 s of sonication and 90 s of rest. Sheared DNA was converted to Illumina compatible libraries using NEBNext library kit E6070L (New England Biolabs) and blunt-ended library adapters. The libraries were amplified in 25-μl reactions, with each reaction containing 5 μl of template DNA, 2.5 U AccuPrime Pfx Supermix (Invitrogen, Carlsbad, CA), 1X AccuPrime Pfx Supermix, 0.2 μM IS4 forward primer and 0.2 μM reverse primer with a sample specific 6 bp index. The PCR conditions were 2 minutes at 95 °C to denature DNA and activate the polymerase, 11 cycles of 95 °C for 15 seconds, 60 °C annealing for 30 seconds and 68 °C extension for 40 seconds, and a final extension at 68 °C for 7 minutes.

Sequence data for 2012 has been deposited at ENA with the accession number ERP111084 and the sequence data for 2013 will be available at ENA with the accession number ERP112039.

The raw reads were trimmed and quality-filtered with cutadapt (v. 1.16) [31]. The first 10 bp were removed from the reads, the phred quality cutoff was set to 20 and the minimum length to 30. To get the taxonomic annotation the remaining reads were then mapped with MGmapper (v. 2.7) to the following databases: VitisVinifera, Plant, Human, , Bacteria_draft, MetaHitAssembly, HumanMicrobiome, Fungi, FungiDB, Brettanomyces, NewWine, Archaea, Virus, Protozoa, Plasmid, Invertebrates. The databases VitisVinifera, FungiDB, Brettanomyces and NewWine were custom databases containing the grape genome and genomes of species which are especially interesting with respect to wine fermentation. MGmapper was run with the option to remove PCR duplicates turned on, the maximum edit distance was set to 0.05 and the minimum number of Matches+Mismatches for a valid read was set to 30. Then the python script build_table_from_mgmapper_positive.py was run for each sample to merge the output of MGmapper so that each sample consists of only one file, which was then imported into MicroWineBar. In MicroWineBar the samples were first filtered for a minimum relative abundance of 0.01 and the following phyla were filtered out: Annelida, Arthropoda, Chordata, Mollusca, Nematoda, Platyhelminthes and Streptophyta. The reason for this is that we are only interested in the microorganisms but still wanted to know from which other organisms DNA was found in the samples.

Processing of colorectal cancer dataset

The shotgun metagenomics dataset analysed here was downloaded from ENA (ERP005534) [13] and all 88 control and 53 CRC shotgun metagenomics paired end samples were processed. The reads were trimmed and quality-filtered with the same settings as the wine samples. To get the taxonomic annotation the reads were then also mapped with MGmapper (v. 2.7) [14] with the same settings as the wine samples to the following databases: Human (Human reference sequence GRCh38.p12) and HumanMicrobiome [4]. Thereafter the python script build_table_from_mgmapper_positive.py was run for each sample to merge the output of MGmapper. In MicroWineBar the reads that mapped to human were filtered out since we are interested in the differentially abundant microorganisms between the CRC and the control samples. Finally, the results were filtered for a minimum relative abundance of 0.01.
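Both datasets are filtered by a minimum relative abundance, and the wine samples additionally exclude a fixed list of phyla. The sketch below shows these two filters on plain Python records. The record keys, the function name `filter_taxa` and the example values are hypothetical and do not reflect MicroWineBar's input format. The text does not state whether 0.01 is a fraction or a percentage, so the threshold is simply applied in the same unit as the input values, and "minimum" is read as "at least".

```python
# Phyla excluded from the wine samples, as listed in the text.
EXCLUDED_PHYLA = {"Annelida", "Arthropoda", "Chordata", "Mollusca",
                  "Nematoda", "Platyhelminthes", "Streptophyta"}

def filter_taxa(records, min_rel_abundance=0.01, excluded_phyla=EXCLUDED_PHYLA):
    """Drop taxa below the abundance threshold or belonging to an excluded phylum."""
    return [
        r for r in records
        if r["rel_abundance"] >= min_rel_abundance
        and r["phylum"] not in excluded_phyla
    ]

# Hypothetical records; keys and values are illustrative only.
sample = [
    {"species": "Saccharomyces cerevisiae", "phylum": "Ascomycota", "rel_abundance": 85.2},
    {"species": "Vitis vinifera", "phylum": "Streptophyta", "rel_abundance": 9.5},
    {"species": "Hanseniaspora vineae", "phylum": "Ascomycota", "rel_abundance": 0.08},
    {"species": "Homo sapiens", "phylum": "Chordata", "rel_abundance": 0.3},
    {"species": "rare_taxon", "phylum": "Proteobacteria", "rel_abundance": 0.002},
]
kept = [r["species"] for r in filter_taxa(sample)]
```

Passing an empty set as `excluded_phyla` applies the abundance threshold only.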

Acknowledgements

This study was funded by the Horizon 2020 Programme of the European Commission within the Marie Skłodowska-Curie Innovative Training Network "MicroWine" (grant number 643063). SR was supported by the Novo Nordisk Foundation grant NNF14CC0001.


Author contributions

FK developed the program MicroWineBar and performed the analysis of the two datasets. MAK and SS collected the wine fermentation data and sequenced the samples. SR guided the design of MicroWineBar. FK and SR wrote the manuscript with input from all authors.

Additional Information

Competing interests

MAK and SS are employed by Chr. Hansen A/S and have commercial interests in wine fermentation. FK and SR declare no competing interests.

Data availability

The read data are available from ENA under the accessions ERP111084 and ERP112039.

References

1. Chen, K. & Pachter, L. Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comput. Biol. 1, 0106–0112 (2005).
2. Venter, J. C. et al. Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science 304, 66–74 (2004).
3. Ram, R. J. et al. Community proteomics of a natural microbial biofilm (Supporting Online Material). Sci. (New York, NY) 308, 1915–1920 (2005).
4. Turnbaugh, P. J. et al. The Human Microbiome Project. Nature 449, 804 (2007).
5. Gloor, G. B., Macklaim, J. M., Pawlowsky-Glahn, V. & Egozcue, J. J. Microbiome datasets are compositional: And this is not optional. Front. Microbiol. 8, 1–6 (2017).
6. Mitra, S., Klar, B. & Huson, D. H. Visual and statistical comparison of metagenomes. Bioinformatics 25, 1849–1855 (2009).
7. Huson, D., Auch, A., Qi, J. & Schuster, S. MEGAN analysis of metagenome data. Genome Res. 17, 377–386 (2007).
8. Huson, D. H. et al. MEGAN Community Edition - Interactive Exploration and Analysis of Large-Scale Microbiome Sequencing Data. PLoS Comput. Biol. 12, 1–12 (2016).
9. Ondov, B. D., Bergman, N. H. & Phillippy, A. M. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics 12, 385 (2011).
10. Cleveland, W. S. & McGill, R. Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods. J. Am. Stat. Assoc. 79, 531–554 (1984).
11. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 1–21 (2014).
12. Mandal, S. et al. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb. Ecol. Health Dis. 26 (2015).
13. Zeller, G. et al. Potential of fecal microbiota for early-stage detection of colorectal cancer. Mol. Syst. Biol. 1–18 (2014).
14. Petersen, T. N. et al. MGmapper: Reference based mapping and annotation of metagenomics sequence reads. PLoS One 12, 1–13 (2017).
15. Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46 (2014).
16. Segata, N. et al. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat. Methods (2012).
17. Shneiderman, B. The eyes have it: a task by data type taxonomy for information visualizations. Proc. 1996 IEEE Symp. Vis. Lang. 336–343 (1996).
18. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/ (2018).
19. Longo, E., Cansado, J., Agrelo, D. & Villa, T. G. Effect of Climatic Conditions on Yeast Diversity in Grape Musts from Northwest Spain. Am. J. Vitic. 42, 141–144 (1991).
20. Loureiro, V. Spoilage yeasts in the wine industry. Int. J. Food. Microbiol. 86, 23–50 (2003).
21. Wines from Spain. 2018. Vintage chart. https://winesfromspainusa.com/vintage-chart/.
22. Padilla, B., Gil, J. V. & Manzanares, P. Past and future of non-Saccharomyces yeasts: From spoilage microorganisms to biotechnological tools for improving wine aroma complexity. Front. Microbiol. 7, 1–20 (2016).
23. Lleixà, J. et al. Comparison of fermentation and wines produced by inoculation of Hanseniaspora vineae and Saccharomyces cerevisiae. Front. Microbiol. 7, 1–12 (2016).
24. Morata, A. et al. Lachancea thermotolerans Applications in Wine Technology. Fermentation 4, 53 (2018).
25. Weather Underground. 2018. Weather History for Valdemoro. https://www.wunderground.com/personal-weather-station/dashboard?ID=IMADRIDV3#history/s20120101/e20130101/myear.
26. McKinney, W. Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference, 51–56 (2010).
27. Oksanen, J. et al. vegan: Community Ecology Package. R package version 2.5-2. https://CRAN.R-project.org/package=vegan (2018).
28. Jones, E. et al. SciPy: Open source scientific tools for Python. http://www.scipy.org/.
29. Hunter, J. D. Matplotlib: A 2D graphics environment. Computing in Science & Engineering 9, 90–95 (2007).
30. Melkonian, C. et al. Finding functional differences between species in a microbial community [in preparation].
31. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 1 (2011).


Figures

Figure 1. The main window of MicroWineBar displaying one metagenomics wine sample. Hanseniaspora species are highlighted in red and wine spoilage organisms are highlighted in yellow.


[Figure 2 image not recoverable. Panel A: rarefaction curves labelled by sample (six Bobal2012 and fifteen Bobal2013 samples); x-axis: Sample Size (0 to 3.0e+07), y-axis: Species (0 to 500). Panels B, C and D: see caption.]

Figure 2. Comparing metagenomics wine samples from Bobal fermentations of 2012 with samples from 2013. A) Rarefaction curve of all samples. B) Boxplot of the richness of the 2012 samples and the 2013 samples, where the 2012 samples show a significantly lower richness. C) Boxplot of the Shannon diversity showing that the 2013 samples are more diverse. For the boxplots the lower and upper hinges correspond to the first and third quartiles and the median is indicated with an orange line. The upper and lower whiskers extend up to 1.5 * interquartile range (IQR). Outliers are represented by circles. D) A Principal Coordinate scatter plot (PCoA) showing that it is possible to distinguish the two vintages when looking at the second and third principal coordinates.


[Figure 3: two panels (A, B). Panel A x-axis tick labels: T0 to T5; T0 to T7; T1 to T7.]

Figure 3. Comparing metagenomics wine samples from Bobal fermentations of 2012 with samples from 2013. A) Abundances of Hanseniaspora vineae in 2012 (left) and 2013 (right). The labels on the x-axis indicate the order in which the samples were taken during the fermentation. The 2013 samples were taken from two fermentation tanks. B) Average humidity in La Mancha, Spain, in 2012 (blue) and 2013 (green). The average humidity in La Mancha was significantly higher (p-value: 0.049) in 2013 compared to 2012. The range of the humidity for both years is indicated by the background colour. Data from Weather Underground [25].


[Figure 4: three panels (A, B, C).]

Figure 4. Reanalysing a human microbiome dataset with colorectal cancer (CRC) and control samples. A) A Principal Coordinate scatter plot (PCoA) displaying the CRC (blue) and the control (green) samples. B) Boxplot of the species richness showing no significant difference between the CRC and the control samples. C) Boxplot of Shannon diversity showing no significant difference between the CRC and the control samples. For the boxplots the lower and upper hinges correspond to the first and third quartiles and the median is indicated with an orange line. The upper and lower whiskers extend up to 1.5 * interquartile range (IQR). Outliers are represented by circles.


Tables

Table 1: ANCOM results for the differential abundance analysis of the wine samples. Only the significantly differentially abundant species are displayed. ANCOM calculates the pairwise log ratios between all species and performs a one-way ANOVA (Analysis of Variance) test to determine if there is a significant difference in the log ratios with respect to the species of interest. "W" is the W-statistic, i.e. the number of pairwise log ratios in which the species differs significantly between the two years. The median and maximum abundance are also reported for both years.

Species                     W     Median abundance        Maximum abundance
                                  2012        2013        2012        2013
Botrytis cinerea            101   7.97E-05    2.3E-02     7.84E-03    2.62E-01
Bradyrhizobium sp. BTAi1    108   7.97E-05    2.77E-02    7.97E-05    2.44E-04
Hanseniaspora vineae        109   8.25E-04    6.14E-05    6.13E-02    5.46E-04
Pseudomonas syringae        108   7.97E-05    2.31E-02    7.97E-05    1.14E-01
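The way the W-statistic in Table 1 is counted can be illustrated with a minimal sketch. It is not the ANCOM implementation used for the table: to stay within the Python standard library, an exact two-sided permutation test on the group means stands in for the one-way ANOVA, and the further steps of the published ANCOM procedure are not reproduced. The function names, the significance level `alpha` and the toy abundances are assumptions made for illustration only.

```python
import math
from itertools import combinations


def exact_permutation_pvalue(a, b):
    """Two-sided exact permutation test on the difference in group means.

    Stand-in for the one-way ANOVA named in the caption (assumption).
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    hits = 0
    n_splits = 0
    # Enumerate every way of assigning n_a of the pooled values to group a.
    for idx in combinations(range(len(pooled)), n_a):
        s = sum(pooled[i] for i in idx)
        diff = abs(s / n_a - (total - s) / n_b)
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
        n_splits += 1
    return hits / n_splits


def w_statistics(year_a, year_b, alpha=0.05):
    """Count, per species, the pairwise log ratios that differ between groups.

    year_a, year_b: dict mapping species name -> list of strictly positive
    abundances, one value per sample (same species keys in both dicts).
    """
    species = list(year_a)
    w = {}
    for s in species:
        n_significant = 0
        for t in species:
            if t == s:
                continue
            ratios_a = [math.log(x / y) for x, y in zip(year_a[s], year_a[t])]
            ratios_b = [math.log(x / y) for x, y in zip(year_b[s], year_b[t])]
            if exact_permutation_pvalue(ratios_a, ratios_b) < alpha:
                n_significant += 1
        w[s] = n_significant
    return w


# Toy data (invented for illustration): sp1 is far more abundant in the second group.
toy_2012 = {"sp1": [1, 2, 1, 2], "sp2": [10, 12, 11, 9], "sp3": [20, 18, 22, 21]}
toy_2013 = {"sp1": [50, 60, 55, 65], "sp2": [10, 12, 11, 9], "sp3": [20, 18, 22, 21]}
print(w_statistics(toy_2012, toy_2013))
```

In this toy example sp1 obtains W = 2, because both of its log ratios shift between the groups, whereas sp2 and sp3 each obtain W = 1, which comes only from their ratio to sp1. A species that truly changes therefore stands out by a W close to the number of other species, as for the species listed in Table 1.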

Supplementary Material

● Supplementary Table S1: The results of the differential abundance analysis with DESeq2 for the wine samples. All species are included.
● Supplementary Table S2: The results of the differential abundance analysis with DESeq2 for the CRC samples. All species are included.

1 of 20 Microbial consistency in spontaneous wine fermentation

Microbial consistency in spontaneous wine fermentation

Authors
Franziska Klincke1, Kimmo Sirén2,3, Sarah S. T. Mak4, Chrats Melkonian5, Simon Rasmussen6, Ulrich Fischer2, M. Thomas P. Gilbert4

Affiliations
1 Department of Bio and Health Informatics, Technical University of Denmark, Kongens Lyngby, Denmark
2 Institute for Viticulture and Oenology, Dienstleistungszentrum Ländlicher Raum Rheinpfalz, Neustadt an der Weinstraße, Germany
3 Department of Chemistry, University of Kaiserslautern, Kaiserslautern, Germany
4 Section for Evolutionary Genomics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
5 Systems Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
6 Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark

Abstract

So far, the microbial community in wine fermentation has mostly been examined by amplicon sequencing and culture-based approaches. Here, we apply shotgun metagenomics to investigate the microbial community associated with spontaneous wine fermentations of Riesling grapes from vineyards in the Pfalz region, Germany. We found a higher microbial consistency within vineyards than across vineyards, and that species abundance, but not species richness, varied between vineyards. In particular, the fermentative yeasts (e.g. Saccharomyces and Hanseniaspora) and filamentous fungi such as Aspergillus were consistent across the vineyards, in contrast to most bacteria (e.g. Gilliamella or Sphingomonas). Furthermore, we show that there were subtle differences in the microbial community during the wine fermentation of grapes from vineyards of the same region.


Introduction

Today, most wines are produced by inoculation with specific Saccharomyces cerevisiae strains. These inoculants reduce the winemaker's risk of off-flavour production and especially of stuck fermentation, as they drive the fermentation process, but they are also assumed to reduce the complexity of the wines because fewer microbial interactions take place [1]. In spontaneous fermentation, where the must is not inoculated prior to fermentation, the alcoholic fermentation is carried out by indigenous yeasts, which increases the aroma and flavour diversity of the wines [2]. Therefore, studying the microbial content of such spontaneous fermentations can provide important knowledge that can improve winemaking.

The majority of the microorganisms present at the beginning of wine fermentation come either from the surface of the grape berries or from the winery environment [3]. These are mainly aerobic and apiculate yeasts and yeast-like fungi [3]. Non-Saccharomyces yeasts such as Hanseniaspora generally initiate the fermentation, whereas the principal wine yeast Saccharomyces cerevisiae takes over as the fermentation proceeds. However, S. cerevisiae is not commonly found on grapes and is therefore present at low abundance at the beginning of the fermentation [3]. Because it could not be isolated from grapes with molecular techniques, winery surfaces were considered to be the main source of S. cerevisiae in spontaneous fermentations [4]. However, it has been shown that spontaneous fermentation is possible in a newly built winery which does not harbour S. cerevisiae, meaning that it is also present on the grapes [5].

The microbial communities during wine fermentation have been studied several times using amplicon sequencing approaches, mostly concentrating on either bacteria or fungi. Bokulich et al. looked at both bacteria and fungi in grape musts from different wine growing regions in California [6]. Piao et al. compared the bacterial populations of organically and conventionally produced wines [7]. Shotgun metagenomics enables us to study the entire microbial community of an environment, including both bacteria and eukaryotes, and makes it possible to identify novel microorganisms which are not in the databases. So far, only two studies have applied shotgun metagenomics to microorganisms during wine fermentation. Sternes et al. investigated spontaneous fermentations by both amplicon and shotgun sequencing and reported a bias in the amplicon analysis [8]. The second study compared fermentations inoculated with Oenococcus oeni or Brettanomyces bruxellensis to study their influence on the fermentations [9]. Additionally, shotgun metagenomics was applied to explore the diversity of soil microbial communities in Chilean vineyards, showing that the vineyards share most of their microorganisms with surrounding native forests [10]. Moreover, the microbiome associated with grape berries withered under two post-harvest conditions was investigated [11].

To investigate the microbial composition during spontaneous wine fermentation, we generated metagenomics data from fermentations of Riesling grapes from seven vineyards in the Pfalz region in Germany. The pilot-scale winemaking was done in a way that avoided the introduction of microorganisms from outside the vineyard, such as from the winery. Furthermore, to ensure that our results were not confounded by factors such as grape variety and vintage [6], the vineyards were all from the same wine growing region and had the same cultivar, Riesling. Our data showed that there were indeed differences in the microbial composition between the vineyards and that these could be detected throughout the fermentation process. However, we also found a high microbial consistency across vineyards, especially for the fermentative yeasts Saccharomyces and Hanseniaspora.

Results

Grape harvest, fermentation and sequencing
The Riesling grapes were collected aseptically at the harvest in 2016 from seven vineyards in the Pfalz region, Germany. The viticulture management systems for the vineyards were as follows: one was organically farmed, two were conventional and four applied biodynamic farming. The seven vineyards were owned and maintained by six different wineries. The grapes from each vineyard were harvested into sterile bags and sealed in the vineyard to avoid contamination from outside the vineyard, before being transported to the research winery. Furthermore, in order to avoid potential contamination from the research winery during grape processing, the spontaneous alcoholic fermentations were carried out in a sanitized temperature-controlled shipping container. We then sampled from each vineyard during pressing and again after sedimentation. Thereafter, each must was racked into three fermentation vessels and samples were collected at the onset of alcoholic fermentation and at 2.5%, 4%, 6%, 8% and 11% alcohol content. Finally, we sampled at the end of the fermentation before racking the wine to new vessels (rack). For details about the number of samples see Table 1. Samples were then processed and sequenced using BGISeq paired-end sequencing with 5–233 million reads per sample (average of 61 million reads).

Consistent fermentation across vineyards


To investigate the abundance of the microorganisms present during the fermentations, we determined the taxonomic annotation of the reads using 168 genomes of wine-related microorganisms as well as the genome of the grape, Vitis vinifera (see Methods). Using a centred log-ratio approach on the estimated read counts, we investigated the variation in the data by performing a principal component analysis. Here, we detected a batch effect (Supplementary Figure S1) where samples from batch 2 were outliers compared to all other batches. We identified the species that contributed to this effect by concentrating on the species with a high abundance in the negative control samples and therefore excluded 11 microorganisms from further analysis (Supplementary Figure S2). These were mainly acetic acid bacteria, and their removal reduced our database to 157 microorganisms (see Supplementary Table S3 for more details). After this filtering, there was a slight separation in the principal components (PC) according to vineyard (Figure 1A); however, the main variation (PC1, 43%) was driven by the early time points (press, sedimentation and onset of fermentation) being gradually different from the later time points (Figure 1B). Saccharomyces, including S. pastorianus, S. cerevisiae, S. paradoxus and S. uvarum, had the highest influence on PC1, with component scores of -0.24, -0.24, -0.23 and -0.21, respectively. This was confirmed in a heatmap of the clr-transformed estimated abundances of the species (Figure 2). Here, the first time points (of five vineyards) from the fermentation clustered separately and the samples from later time points were partly grouped by vineyard.

During the alcoholic fermentation all seven vineyards developed similarly, which was reflected in both species richness and alpha diversity.
For all seven vineyards the species richness was highest before or at the start of the alcoholic fermentation, lowest in the middle (4-8% alcohol content), and increased again during the second half of the fermentation (Figure 1C). Furthermore, alpha diversity was highest for the press samples, except for vineyard 5 where it was slightly higher during the sedimentation phase (Figure 1D). The Shannon diversity index decreased drastically at the beginning of the alcoholic fermentation when S. cerevisiae started to dominate the microbial community (Figure 3A). Therefore, the fermentations seemed to follow a similar pattern in terms of microbial development.

Additionally, we investigated the similarity of the fermentations based on sequence composition. Here, we de novo assembled the reads from each of the samples taken during the fermentation. We then merged all assemblies from each of the fermentation tanks and calculated annotation-independent distances between tanks. These distances correlate with the Average Nucleotide Identity (ANI), which is used to compare prokaryotic genomes [12]. By clustering the pairwise distances we found that the vineyards fell into two main clusters, with vineyards 1, 5, 6 and 7 in one cluster and vineyards 2, 3 and 4 in the second cluster (Figure 3B). For five of the seven vineyards all three biological replicates clustered together. This shows that the main separation in the assemblies was between and not within the vineyards. Only for two vineyards (3 and 4) did the fermentation tanks develop slightly differently, as not all three fermentations (biological replicates) clustered together; however, they fell within the same main cluster. Additionally, the vineyards did not cluster according to vineyard management. Taken together, the overall fermentation process seemed to follow a similar development, however with distinct microbial compositions for each vineyard.

Vineyard specific species
To investigate the microbial differences between the vineyards, we determined differentially abundant species by performing all pairwise comparisons of each vineyard against the remaining six vineyards. We then took, separately for the over- and the underrepresented species, the intersection of the species that were significant in all comparisons (Table 2). It is evident that three Gilliamella and one Snodgrassella species are significantly more abundant in vineyard 1 compared to all other vineyards. Both species belong to the Orbaceae of which we could also reconstruct a draft genome for vineyard 1. In addition, two Bacillus were more abundant in vineyard 3, two Tatumella in vineyard 2 and three Sphingomonas in vineyard 5. Further, Starmerella bacillaris was less abundant in vineyard 5 whereas Alternaria alternata was more abundant. Figure 3C shows that Bacillus cereus is more abundant in vineyard 3 compared to the other vineyards. Only for vineyard 7 could we not find any species that differed significantly from all other vineyards.

Consistent dynamics of yeast across vineyards
Next, we wanted to investigate the consistency of the microbial development during the fermentation.
We therefore computed a consistency index for each species based on the Pearson correlation between the time series profiles of the fermentation tanks. We found that especially the fermentative yeasts Hanseniaspora and Saccharomyces had very high consistency: 7 of the 10 most consistent microorganisms belonged to these two genera. Across all seven vineyards they had a median consistency index of 0.67 and 0.68, respectively (Figure 4A). In addition, Lachancea waltii and Aspergillus carbonarius showed high consistency across the vineyards with median consistency indices of 0.71. In general, fungi had a significantly higher consistency than bacteria, with a median of 0.72 vs. 0.52 (p-value 1.6e-122) for the 20 most consistent fungi and bacteria, respectively. Similarly, we investigated the consistency within the vineyards (Figure 4B) and found that it was higher than across vineyards (p-value 1.7e-127). As with the consistency across vineyards, the consistency within vineyards was remarkably high for Saccharomyces, where especially S. paradoxus, S. eubayanus, S. cerevisiae and S. mikatae had the highest consistencies of all organisms, with a median consistency of 0.94-0.96.

Reconstructing draft wine fermentation genomes
We tried to reconstruct draft genomes to see whether we could find any new microorganisms in our spontaneous fermentations, but we mainly obtained draft genomes of microorganisms which are already reported to be wine related and hence are in our custom database. The only exception was Rickettsia, for which we reconstructed three draft genomes with an estimated completeness of 95.73%, 99.29% and 99.53%, respectively. Most of the high-quality draft genomes (estimated completeness >90%, contamination <10%) are Gluconobacter, Oenococcus, Ralstonia or Pseudomonas. For Saccharomyces, we obtained only one draft genome with an estimated completeness of >72%, while the majority had an estimated completeness of ~50%. See Supplementary Table S4 for more details.

Discussion

Spontaneous fermentations may result in more complex wines due to the presence of non-Saccharomyces species as well as various strains of S. cerevisiae [13]. The microorganisms present in spontaneous fermentations mainly originate from the vineyard or the winery. We wanted to investigate the influence of the vineyard on forming the microbial community during wine fermentation. Further, we analysed both the differences and the similarities in the microbial communities between the vineyards.

We found that fermentative yeasts such as Saccharomyces and Hanseniaspora have a higher consistency across the vineyards, showing that they behave similarly in all seven vineyards.

To perform the species investigations, we would have preferred to use a more general database than our custom Kraken database; however, several of the wine-related microorganisms such as Hanseniaspora are not yet represented in the RefSeq database [14].
Consequently, we decided to build our own custom database. The genome binning results indicated that our database is not comprehensive as we obtained three draft genomes of Rickettsia. So far two plant-associated Rickettsia [15, 16] have been reported but none specifically for grapevine. Consequently, this needs to be investigated in more detail.


The high-quality draft genomes might be contamination. Both Ralstonia and Pseudomonas are common contaminants in next-generation sequencing and are also reported as such for the sequencing kit used [17].

Additionally, diversity measures indicate that the fermentations of all seven vineyards behave similarly. Both the species richness and the Shannon diversity profiles over time are similar across the vineyards. The Shannon diversity is low during the alcoholic fermentation because S. cerevisiae dominates the microbial community. In general, the increase in temperature and alcohol concentration during the fermentation leads to a decrease in population complexity [18]. This can also be seen in our fermentations as a decrease in species richness until around 4-8% alcohol is reached. Many non-Saccharomyces species are more sensitive to ethanol than S. cerevisiae and diminish for that reason [3, 19]. However, some non-Saccharomyces yeasts, such as Candida stellata, were shown to have a high ethanol tolerance [20]. The slight increase of the Shannon diversity index in the second half of the alcoholic fermentation might be due to yeast autolysis, which releases nutrients that other microorganisms can use [3]. However, it is more likely that we see a sequencing effect here. During the fermentation some species are below our threshold of 100 reads but at the end of the fermentation they are slightly above.

However, there are differences in the microbial composition between the seven vineyards. During the fermentation the samples largely cluster according to vineyards. Additionally, the Pearson correlations are much higher if one only compares within the vineyards. The differential abundance analysis highlighted some species which are distinct between the analysed vineyards. Among these are Gilliamella and Snodgrassella, which have been found in the gut of bumblebees and honeybees [21]. This suggests that vineyard 1 hosted more bumblebees and honeybees than the other vineyards. In vineyard 3 Bacillus was found to be more abundant compared to the other vineyards. B. cereus and B. thuringiensis both belong to the Bacillus cereus group due to the high similarity of their genomes [22]. It is even proposed that they should be considered as belonging to the same species [23]. Thus, it might be that we actually only have B. thuringiensis in our samples, which is often applied as a biological fungicide or insecticide in organic vineyards [24]. Vineyard 3 is the only organic vineyard in our analysis and therefore it is likely that B. thuringiensis was used as an insecticide or fungicide. Additionally, Prendes et al. reported S. bacillaris to be an antagonist yeast for A. alternata [25]. This is consistent with our observation for vineyard 5, where S. bacillaris is underrepresented while A. alternata is overrepresented. It must be emphasised that we only identified differences in the abundance of species between the vineyards.


Additionally, it would be beneficial to include other types of data, such as metabolomic data, to investigate how the microorganisms influence the final wine. This could then be related to functional annotations of the called genes.

The fact that the samples at the beginning of the fermentation cluster together suggests that the microbial composition at this time point is distinct from the later time points and at the same time similar across the vineyards. During the alcoholic fermentation, where S. cerevisiae is dominating, most of the vineyards seem to have a distinct microbial composition which might be influenced by the grapes and not so much by the microbial composition at the early time points.

There is a high microbial consistency in the spontaneous fermentation of the seven vineyards. This is especially the case for yeast. At the same time there are subtle differences in the microbial composition between vineyards.

Methods

Experimental design
In the 2016 harvest, Riesling grapes were harvested from seven vineyards in the Pfalz region in Germany. To investigate the impact of vineyard-derived microbes on fermentations, 250 kg of grapes were handpicked at each vineyard into autoclaved, sterile flat plastic bags, which were sealed inside the vineyard to avoid contamination from outside the vineyard. The grapes were processed at Dienstleistungszentrum Ländlicher Raum Rheinpfalz, Neustadt an der Weinstraße, Germany. Pilot-scale vinification took place in a temperature-controlled intermodal container set at 20 °C. The vinification scheme followed direct pressing with overnight sedimentation by gravity, after which the must was divided into three full 34-litre carboys sealed with silicone bungs and airlocks. Ample amounts of ozone dissolved in water were used for the sanitation/sterilization steps.
Neither volatile acid production nor malolactic fermentation was observed. For metagenomic analysis, samples were taken from each vineyard during pressing and again after sedimentation. Then each must was racked into three fermentation vessels. The rest of the samples for metagenomic analysis were taken at the onset of alcoholic fermentation, at 2.5%, 4%, 6%, 8% and 11% alcohol content, and at the end of fermentation before racking the wine to new vessels (rack). For details about the number of samples see Table 1. One sample for vineyard 6 for the rack time point is missing, and one sample for vineyard 7 for the press time point had to be excluded from the analysis because its abundance of Saccharomyces was too high and did not correspond to the later time points from that fermentation. Additionally, the experimental design included negative control samples for the library preparation and DNA extraction steps.

Preparation of samples and sequencing
Prior to DNA extraction, samples (4.5 ml) were spun at 4500 × g for 10 min and, after removal of the supernatant, re-suspended in 1 ml of ice-cold 1X PBS (pH 7.4, Life Technologies, California, U.S.A.). The centrifugation and resuspension were repeated once, and the pellets were subsequently stored at -20 °C until extraction. All pre- and post-PCR steps were carried out in the laboratory of the Natural History Museum of Denmark, University of Copenhagen, under laminar flow hoods. Extractions followed the method described in Mak et al. [26], using the manufacturer's protocol of the FastDNA Spin Kit for Soil (MP Biomedical, California, U.S.A.). Extracted DNA was quantified with a Qubit 1.0 fluorometer and the dsDNA High Sensitivity Assay kit (ThermoFisher Scientific). Extraction blanks (without sample) were included.

The DNA was subsequently fragmented to 300 bp using a Covaris M220 Focused-ultrasonicator (Covaris, USA). Libraries were built from the fragmented DNA using a modified single-tube library building protocol [27]. After library construction, qPCR was run to determine the number of cycles for indexing, using 1 µl of 10-fold diluted library as a template in an MX3005 qPCR machine (Agilent) with the forward primer and one of the indexed reverse primers. The libraries were subsequently indexed with different indices in randomised order and purified of residual adapter dimers using SPRI beads. Eight samples were pooled in equimolar amounts per lane, for a total of 32 lanes.
All pooled libraries were then sent to BGI-Europe for circularization, construction of DNA Nanoballs (DNB) and sequencing on the BGISeq-500 platform in 100 bp paired-end mode.

Bioinformatic analysis
Cutadapt (v1.18) [28] was used to trim and filter the shotgun metagenomic reads. The first 10 bp were removed from the reads, the phred quality cutoff was set to 20 and the minimum length to 30. MetaSPAdes (v3.12.0) [29] was then used with default parameters to de novo assemble each sample individually, unless there were technical replicates (only for the press and sedimentation time points), which were merged before creating the assembly. MASH (v2.0) [12] was used to calculate annotation-independent distances between the merged assembled sequences of two fermentation tanks. First, 'mash sketch' was used to generate reduced representations and then 'mash dist' was run to estimate the distances with MinHash. The Python (v3.6.6) package seaborn (v0.9.0) was used to generate the cluster heatmap.
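The idea behind the two MASH steps can be illustrated with a minimal sketch. It is a simplified re-implementation for explanation only and not the code path of the MASH tool: it hashes k-mers with BLAKE2b from the Python standard library as a stand-in hash function, ignores reverse complements, and uses illustrative values for the k-mer length `k` and the sketch `size`, which are assumptions rather than the settings used in this study. A bottom-s sketch keeps the smallest hash values of a sequence, the Jaccard index j is estimated from two sketches, and the distance is obtained as D = -(1/k) ln(2j / (1 + j)) [12].

```python
import hashlib
import math


def kmer_hashes(seq, k):
    """Hash every k-mer of seq to a 64-bit integer (strand is ignored here)."""
    hashes = set()
    for i in range(len(seq) - k + 1):
        digest = hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return hashes


def sketch(seq, k=21, size=1000):
    """Bottom-s sketch: the `size` smallest distinct k-mer hashes, sorted."""
    return sorted(kmer_hashes(seq, k))[:size]


def mash_distance(sketch_a, sketch_b, k=21, size=1000):
    """Estimate the Jaccard index from two sketches and convert it to a distance."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    # The smallest hashes of the union act as a random sample of all k-mers.
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 1.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    jaccard = shared / len(merged)
    if jaccard == 0:
        return 1.0
    return max(0.0, -math.log(2 * jaccard / (1 + jaccard)) / k)


# Toy sequences (invented): the second differs from the first by one base.
seq_1 = "ACGTTGCAAGGCTTAACCGGTATCGATCGGA"
seq_2 = "ACGTTGCAAGGCTTAGCCGGTATCGATCGGA"
print(mash_distance(sketch(seq_1, k=5), sketch(seq_2, k=5), k=5))
```

Identical inputs give a distance of 0, inputs without shared k-mers give 1, and the single substitution in the toy pair gives a small positive value. The resulting pairwise distances between tanks are what the cluster heatmap groups.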


Taxonomic abundances
A database with genomes of microorganisms related to wine fermentation was created based on the bacteria and fungi mentioned on the wine server from UC Davis [30] and by Sternes et al. [8]. The grape genome was also added to the database, resulting in 169 genomes. This database was used to run Kraken (v1.1) [31] to obtain the taxonomic annotations of the reads in the samples. Bracken (v2.2) [32] was then run with default parameters on those read counts to estimate the abundance of the microorganisms at species level. The multiplicative replacement technique was used to substitute zeros in the compositional read counts with small pseudocounts before a centred log-ratio (clr) transform was performed with the Python package scikit-bio (v0.5.3) to transform the compositional read counts into real space. The clr-transformed data were used to generate the PCA scatter plots with scikit-learn (v0.19.2), the abundance profiles over time, the cluster heatmap and the Pearson correlations with scipy (v1.1.0) [33]. The medians of the Pearson correlations were compared using a one-sided Wilcoxon rank sum test and were considered significantly different if the obtained p-value was < 0.05. The species richness and the Shannon diversity index were calculated on the estimated read counts using scikit-bio.

Differential abundance analysis
An ANOVA-like differential expression analysis was carried out using the R (v3.5.10) [34] package ALDEx2 (v1.12.0) [35, 36, 37]. To generate the distribution of the centred log-ratio (clr) transformed values, 128 Monte-Carlo samples were drawn from a Dirichlet distribution. "iqlr" was used as the denominator for the geometric mean calculation as it is robust. The effect size and the within- and between-condition values were estimated. General linear models (glm) and Kruskal-Wallis tests were performed for a one-way ANOVA.
A Benjamini-Hochberg corrected p-value < 0.05 was considered significant. Only microorganisms with a significant adjusted p-value for both the glm and the Kruskal-Wallis tests were considered. To identify the microorganisms which are distinct for one vineyard, the intersection of the over-/underrepresented microorganisms compared to the remaining six vineyards was calculated.

Genome binning
For each of the seven vineyards, draft genomes were created in the following way. First, all assemblies from a vineyard were merged into a single file, and the contigs were filtered for a minimum length of 1500 bp using awk. Next, the contigs were clustered using CD-HIT (v4.7) [38, 39] with the sequence identity threshold set to 0.99, the word length to 8, the alignment coverage for the shorter sequence to 0.9 and otherwise default parameters to create a reduced contig file. Next, the reads of each sample were mapped to the reduced contig file using bwa mem (v0.7.15). The alignment file was name-sorted with samtools (v1.8) [40] sort, the mate coordinates were filled in with samtools fixmate, and the file was subsequently position-sorted. Then potential PCR duplicates were removed using samtools rmdup and samtools view. A depth file was created using the script jgi_summarize_bam_contig_depths.py with a minimum contig length of 1500 bp before MetaBAT (v2.10.2) [41] was run with the minimum contig length set to 1500 bp to create draft genome bins. The quality of the genome bins was assessed using CheckM (v1.0.12) [42] running the lineage workflow. To obtain an improved taxonomic annotation, the bins with an estimated completeness of >20% were mapped with mmseqs (vbe8f616d7cfd406880608e470a6c7fda45b04c50) [43] against UniProtKB [44]. Additionally, the completeness of the Saccharomyces draft genomes was quantified using Benchmarking Universal Single-Copy Orthologs (BUSCO), as CheckM uses prokaryotic marker genes by default. BUSCO (v3.0.2) [45, 46] was run with saccharomycetales_odb9 as the BUSCO lineage, and the gene finding parameters for saccharomyces were used for AUGUSTUS [47] in BUSCO.

Acknowledgements
This study was funded by the Horizon 2020 Programme of the European Commission within the Marie Skłodowska-Curie Innovative Training Network "MicroWine" (grant number 643063). SR was supported by the Novo Nordisk Foundation grant NNF14CC0001.

Author contributions
KS, UF, SSTM and MTPG designed the study. SSTM carried out the laboratory work and data collection. KS performed the sample preparation and fermentation set up. FK and SR wrote the manuscript with input from all authors.

Additional information

Data availability
The shotgun metagenomics samples will be made publicly available.


References
1. Jolly, N. P., Varela, C. & Pretorius, I. S. Not your ordinary yeast: Non-Saccharomyces yeasts in wine production uncovered. FEMS Yeast Res. 14, 215–237 (2014).
2. Medina, K. et al. Increased flavour diversity of Chardonnay wines by spontaneous fermentation and co-fermentation with Hanseniaspora vineae. Food Chem. 141, 2513–2521 (2013).
3. Fleet, G. H. in Microbiology of Fermented Foods 1, 217–262 (1998).
4. Vaughan-Martini, A. & Martini, A. Facts, myths and legends on the prime industrial microorganism. J. Ind. Microbiol. 14, 514–522 (1995).
5. Beltran, G. et al. Analysis of yeast populations during alcoholic fermentation: A six year follow-up study. Syst. Appl. Microbiol. 25, 2 (2002).
6. Bokulich, N. A., Thorngate, J. H., Richardson, P. M. & Mills, D. A. Microbial biogeography of wine grapes is conditioned by cultivar, vintage, and climate. Proc. Natl. Acad. Sci. 111, E139–E148 (2014).
7. Piao, H. et al. Insights into the bacterial community and its temporal succession during the fermentation of wine grapes. Front. Microbiol. 6 (2015).
8. Sternes, P. R., Lee, D., Kutyna, D. R. & Borneman, A. R. A combined meta-barcoding and shotgun metagenomic analysis of spontaneous wine fermentation. Gigascience 6, 1–10 (2017).
9. Zepeda-Mendoza, M. L. et al. Influence of Oenococcus oeni and Brettanomyces bruxellensis on wine microbial taxonomic and functional potential profiles. Am. J. Enol. Vitic. 69, 321–333 (2018).
10. Castañeda, L. E. & Barbosa, O. Metagenomic analysis exploring taxonomic and functional diversity of soil microbial communities in Chilean vineyards and surrounding native forests. PeerJ 5, e3098 (2017).
11. Salvetti, E. et al. Whole-metagenome-sequencing-based community profiles of Vitis vinifera L. cv. Corvina berries withered in two post-harvest conditions. Front. Microbiol. 7, 1–17 (2016).
12. Ondov, B. D. et al. Mash: Fast genome and metagenome distance estimation using MinHash. Genome Biol. 17, 1–14 (2016).
13. Tello, J., Cordero-Bueso, G., Aporta, I., Cabellos, J. M. & Arroyo, T. Genetic diversity in commercial wineries: Effects of the farming system and vinification management on wine yeasts. J. Appl. Microbiol. 112, 316–328 (2012).
14. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).

15. Davis, M. J., Ying, Z., Brunner, B. R., Pantoja, A. & Ferwerda, F. H. Rickettsia relative associated with papaya bunchy top disease. Curr. Microbiol. 36, 80–84 (1998).
16. Chen, D. Q., Campbell, B. C. & Purcell, A. H. A new Rickettsia from a herbivorous insect, the pea aphid Acyrthosiphon pisum (Harris). Curr. Microbiol. 33, 123–128 (1996).
17. Laurence, M., Hatzis, C. & Brash, D. E. Common contaminants in next-generation sequencing that hinder discovery of low-abundance microbes. PLoS One 9, 1–8 (2014).
18. Stefanini, I. & Cavalieri, D. Metagenomic approaches to investigate the contribution of the vineyard environment to the quality of wine fermentation: Potentials and difficulties. Front. Microbiol. 9, 1–17 (2018).
19. Nissen, P. & Arneborg, N. Characterization of early deaths of non-Saccharomyces yeasts in mixed cultures with Saccharomyces cerevisiae. Arch. Microbiol. 180, 257–263 (2003).
20. Gao, C. & Fleet, G. H. The effects of temperature and pH on the ethanol tolerance of the wine yeasts, Saccharomyces cerevisiae, Candida stellata and Kloeckera apiculata. J. Appl. Bacteriol. 65, 405–409 (1988).
21. Kwong, W. K. & Moran, N. A. Cultivation and characterization of the gut symbionts of honey bees and bumble bees: Description of Snodgrassella alvi gen. nov., sp. nov., a member of the family Neisseriaceae of the betaproteobacteria, and Gilliamella apicola gen. nov., sp. nov., a memb. Int. J. Syst. Evol. Microbiol. 63, 2008–2018 (2013).
22. Liu, Y. et al. Genomic insights into the taxonomic status of the Bacillus cereus group. Sci. Rep. 5, 1–11 (2015).
23. Helgason, E. et al. One Species on the Basis of Genetic Evidence. Appl. Environ. Microbiol. 66, 2627–2630 (2000).
24. Bae, S., Fleet, G. H. & Heard, G. M. Occurrence and significance of Bacillus thuringiensis on wine grapes. Int. J. Food Microbiol. 94, 301–312 (2004).
25. Prendes, L. P. et al. Isolation, identification and selection of antagonistic yeast against Alternaria alternata infection and tenuazonic acid production in wine grapes from Argentina. Int. J. Food Microbiol. 266, 14–20 (2018).
26. Mak, S. S. T. et al. Comparison of DNA extraction methods for use in fungal diversity analyses of Riesling during alcoholic fermentation. Manuscript submitted for publication.
27. Sirén, K. et al. Taxonomic and functional characterisation of the microbial community during spontaneous in vitro fermentation of Riesling must. Manuscript in preparation.


28. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 1 (2011).
29. Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. MetaSPAdes: A new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).
30. UC Davis, Viticulture & Enology. Wine Microbiology. https://wineserver.ucdavis.edu/industry-info/enology/wine-microbiology
31. Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46 (2014).
32. Lu, J., Breitwieser, F. P., Thielen, P. & Salzberg, S. L. Bracken: estimating species abundance in metagenomics data. PeerJ Comput. Sci. 3, e104 (2017).
33. Jones, E. et al. SciPy: Open source scientific tools for Python. http://www.scipy.org/.
34. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/ (2018).
35. Fernandes, A. D., Macklaim, J. M., Linn, T. G., Reid, G. & Gloor, G. B. ANOVA-Like Differential Expression (ALDEx) Analysis for Mixed Population RNA-Seq. PLoS One 8 (2013).
36. Fernandes, A. D. et al. Unifying the analysis of high-throughput sequencing datasets: Characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome 2, 1–13 (2014).
37. Gloor, G. B., Macklaim, J. M. & Fernandes, A. D. Displaying Variation in Large Datasets: Plotting a Visual Summary of Effect Sizes. J. Comput. Graph. Stat. 25, 971–979 (2016).
38. Li, W. & Godzik, A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
39. Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
40. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–9 (2009).
41. Kang, D. D., Froula, J., Egan, R. & Wang, Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3, e1165 (2015).
42. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells and metagenomes. Genome Res. 25, 1043–1055 (2015).
43. Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).


44. Bateman, A. et al. UniProt: The universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).
45. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
46. Waterhouse, R. M. et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol. Biol. Evol. 35, 543–548 (2018).
47. Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19, 215–225 (2003).

Figures


Figure 1. Similarities of the seven vineyards. A) PCA scatter plot of the clr-transformed estimated read counts of 157 microorganisms, coloured by vineyard. B) PCA scatter plot coloured by time point before and during the fermentation. C) Boxplots of the species richness for each vineyard across all nine time points, calculated from the estimated read counts. D) Boxplots of the Shannon diversity index for each vineyard across all nine time points, calculated from the estimated read counts.
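The two diversity measures named in panels C and D can be illustrated with a minimal Python sketch. It assumes that species richness is the number of species with a non-zero estimated read count and that the Shannon index uses the natural logarithm; the text does not state the log base, and the function names are hypothetical.

```python
import math


def species_richness(counts):
    """Number of species with a non-zero estimated read count."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over species with p_i > 0.

    The natural logarithm is an assumption; other bases only rescale H.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# One sample with four species, one of them absent.
sample = [10, 0, 5, 5]
richness = species_richness(sample)
diversity = shannon_index(sample)
```

A sample dominated by a single species has a Shannon index close to zero even if its richness is high, which is why the two measures are reported separately.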

Figure 2. A cluster heatmap of the clr-transformed estimated abundances for 157 microorganisms. Bacteria are highlighted in blue and fungi in pink, and they form several separate clusters. The samples are coloured by vineyard and by time point during the alcoholic fermentation (top rows).
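For readers unfamiliar with the centred log-ratio (clr) transformation used for these abundances, the following Python sketch shows the calculation for one sample: each value is the logarithm of the count divided by the geometric mean of all counts in the sample. The pseudocount used to avoid log(0) is an assumption made only for this illustration; the ALDEx2-based analysis in the paper handles zeros in its own way.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample.

    clr(x)_i = ln(x_i / g(x)), where g(x) is the geometric mean of x.
    The pseudocount is a hypothetical choice to avoid log(0).
    """
    if not counts:
        return []
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # ln of the geometric mean
    return [value - mean_log for value in logs]


# Three species in one sample; the transformed values sum to zero.
transformed = clr([120, 0, 30])
```

Because the values are centred on the geometric mean of the sample, they always sum to zero, which makes samples with different sequencing depths comparable.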


Figure 3. Differences between the seven vineyards. A) Abundance profile of Saccharomyces cerevisiae over time for all vineyards. B) Mash clustering of the fermentation tanks from the vineyards. C) Abundance profile of Bacillus cereus over time for all vineyards.


Figure 4. Consistency indices for each species in the custom database. The boxplots are coloured by the median. A) Consistency index for each species between all fermentation tanks (including within and across vineyards). B) Consistency index for each species between fermentation tanks from the same vineyard. The order of the species is the same as in A.


Tables

Table 1. Number of samples taken at the nine time points for each vineyard. *: vineyard with 3 presses, each with two technical replicates. #: vineyard with only 1 press with two technical replicates. At the sedimentation time point, two technical replicates were taken from the same sedimentation tank before it was split into three fermentation tanks (biological replicates) for the alcoholic fermentation. One sample of time point rack is missing from vineyard 6, and one press sample from vineyard 7 had to be excluded from the analysis. In total there are 185 samples.

          Before fermentation   During alcoholic fermentation
Vineyard  Press  Sedimentation  Onset  2.5  4  6  8  11  rack
1         6*     2              3      3    3  3  3  3   3
2         2#     2              3      3    3  3  3  3   3
3         6*     2              3      3    3  3  3  3   3
4         6*     2              3      3    3  3  3  3   3
5         2#     2              3      3    3  3  3  3   3
6         2#     2              3      3    3  3  3  3   2
7         1#     2              3      3    3  3  3  3   3

Table 2. Differential abundance analysis with ALDEx2 between the vineyards. Each vineyard was compared to each of the other vineyards individually. Only the intersection of the microorganisms that are significantly more or less abundant (adjusted p-value < 0.05 for both the glm and Kruskal-Wallis tests) across these comparisons is reported for each vineyard.

vineyard more abundant less abundant

1 Gilliamella bombi, Gilliamella intestini, Erwinia billingiae, Pantoea vagans, Gilliamella mensalis, Kluyveromyces Pseudomonas syringae, wickerhamii, Snodgrassella alvi Sphingomonas melonis

2 Tatumella morbirosei, Tatumella saanichensis


3 Bacillus cereus, Bacillus thuringiensis Aureobasidium pullulans

4 Curtobacterium flaccumfaciens, Mucor circinelloides, Pichia Erwinia billingiae, Pantoea vagans, kudriavzevii Papiliotrema flavescens, Penicillium expansum

5 Alternaria alternata, Sphingomonas Rhizopus stolonifer, Schizo- melonis, Sphingomonas phyllosphae- saccharomyces pombe, rae, Sphingomonas wittichii Starmerella bacillaris, Tatumella ptyseos

6 Aspergillus niger, Mucor circinelloides

7
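The selection rule behind Table 2 (Benjamini-Hochberg adjustment, then keeping only microorganisms that are significant in every pairwise comparison for a vineyard) can be sketched in Python as follows. This is an illustration under stated assumptions, not the ALDEx2 code used in the study: the function names and the example data are hypothetical, and each pairwise comparison is represented simply as a set of species names that passed the threshold.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def distinct_species(pairwise_hits):
    """Species reported in every pairwise comparison for one vineyard.

    pairwise_hits: one set of species names per comparison against
    another vineyard (hypothetical input format).
    """
    sets = [set(hits) for hits in pairwise_hits]
    return set.intersection(*sets) if sets else set()


# Hypothetical example: species A is significant in all three comparisons.
hits = [{"A", "B"}, {"A", "C"}, {"A"}]
shared = distinct_species(hits)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Taking the intersection is deliberately strict: a microorganism is reported for a vineyard only if it differs from every other vineyard, which is why some rows of the table contain few or no entries.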

Supplementary Material

Figure S1. PCA including the negative control samples, coloured by batch. The orange samples in the upper left corner are suspected of being contaminated, as the negative control samples that belong to those samples (orange cross) are also drawn to that corner.

Figure S2. Abundances of species that are suspected of contaminating some samples of one batch. Displayed are the normal samples as well as the negative control samples for the contaminated batch (in green) and the non-contaminated batches (in grey).

List S3. Wine-related microorganisms used in the custom Kraken database. The species that are contaminants in some samples from the second batch are marked with a *.

Table S4. Overview of the draft genomes for the vineyards. The completeness, contamination and marker lineage were estimated with CheckM. Only draft genomes with an estimated completeness of ≥40% are reported. The number of genes and contigs are given as additional quality measures. The taxonomic annotation from mapping the genes to UniProt with mmseqs is given together with the percentage of genes that had that hit. For the Saccharomyces bins, the completeness was additionally estimated using BUSCO.

Supplementary Table S1: The results of the differential abundance analysis with DESeq2 for the wine samples. All species are included. The baseMean is the mean of the normalized counts of all samples, log2FoldChange is the log2 fold change from the CRC samples to the control samples, lfcSE is the standard error of the log2 fold change, stat is the Wald statistic used to calculate the p-values, pvalue is the p-value and padj is the p-value adjusted for multiple testing.

species  baseMean  log2FoldChange  lfcSE  stat  pvalue  padj
Pseudomonas syringae  52531.36384  18.44426354  1.001569028  18.41536931  9.89E-76  9.99E-74
Bradyrhizobium sp. BTAi1  84530.2595  19.13051999  1.164507309  16.42799477  1.21E-60  6.09E-59
Pseudoalteromonas prydzensis  3102.032676  14.36253196  0.940570727  15.27001803  1.21E-52  4.08E-51
Pseudomonas cerasi  4856.914008  15.00923355  1.062850854  14.12167426  2.79E-45  7.05E-44
Penicillium expansum  2732.550721  14.17940478  1.010518166  14.03181582  9.96E-45  2.01E-43
Starmerella bacillaris  1602.799386  13.40977647  0.983144038  13.6396865  2.33E-42  3.91E-41
Rhodanobacter glycinis  2739.094964  14.18303737  1.050176377  13.50538603  1.45E-41  2.10E-40
Vibrio metschnikovii  470.6511336  11.64290727  0.93583464  12.44120144  1.56E-35  1.97E-34
Hammondia hammondi  1948.194088  13.69087183  1.123681473  12.18394373  3.79E-34  4.25E-33
Rhizopus stolonifer  886.9465427  12.55663674  1.034283424  12.14042152  6.45E-34  6.51E-33
Pseudomonas savastanoi  2385.099625  13.98324299  1.2274657  11.39196231  4.59E-30  4.21E-29
Hanseniaspora vineae  1536.692601  -4.521537088  0.407310034  -11.10097152  1.24E-28  1.04E-27
Aspergillus awamori  2149.244638  13.83298356  1.254563201  11.02613527  2.86E-28  2.22E-27
Botrytis paeoniae  441.366855  11.54978453  1.074143882  10.75254882  5.76E-27  4.16E-26
Aspergillus carbonarius  134626.62  8.96540346  0.836294215  10.72039397  8.17E-27  5.50E-26
Pseudoruegeria sabulilitoris  642.9613632  12.09203067  1.133151395  10.67115191  1.39E-26  8.77E-26
Pseudomonas aeruginosa  1587.302955  13.39585475  1.285582713  10.42006446  2.01E-25  1.19E-24
Aspergillus brasiliensis  624.968032  12.05101152  1.171295255  10.28861977  7.93E-25  4.45E-24
Aspergillus lentulus  355.2334471  11.23612178  1.097878627  10.23439341  1.39E-24  7.39E-24
Pantoea agglomerans  1017.85101  12.75480729  1.287245633  9.908604047  3.82E-23  1.93E-22
Pseudomonas syringae group genomosp. 3  1453.611696  13.26882201  1.396991018  9.49814411  2.14E-21  1.03E-20
Botryosphaeria dothidea  382.0683635  11.34075113  1.222356345  9.277778268  1.73E-20  7.94E-20
Preussia sp. BSL10  259.5711754  10.78404278  1.175616068  9.173099172  4.60E-20  2.02E-19
Leuconostoc mesenteroides  669.719988  19.93539562  2.174330009  9.168523426  4.80E-20  2.02E-19
Aureobasidium pullulans  15654.35761  5.781085472  0.655251676  8.822694673  1.12E-18  4.51E-18
Aureobasidium sp. FSWF8-4  15720.80127  5.742371605  0.659687423  8.704685596  3.18E-18  1.24E-17
Pseudomonas viridiflava  1408.554935  13.22338602  1.521184857  8.692819917  3.54E-18  1.32E-17
Pseudomonas lutea  342.8535492  11.18502979  1.290582395  8.666653003  4.45E-18  1.61E-17
Acanthamoeba palestinensis  288.2875258  10.93295728  1.263684877  8.651648424  5.08E-18  1.77E-17
Komagataeibacter hansenii  412.8137763  19.58648275  2.286239237  8.567118625  1.06E-17  3.57E-17
Rhodopseudomonas palustris  250.6694505  10.73288882  1.303812355  8.231927533  1.84E-16  6.00E-16
Bradyrhizobium elkanii  353.6459744  11.22844399  1.450646768  7.740301936  9.92E-15  3.13E-14
Naganishia albida  161.3736536  18.90915141  2.641091249  7.159597917  8.09E-13  2.48E-12
senegalensis  443.0486888  -19.62875328  2.76072443  -7.110000934  1.16E-12  3.45E-12
Aspergillus kawachii  2814.757159  7.622504157  1.095616967  6.957271005  3.47E-12  1.00E-11
Botrytis cinerea  50763.9135  7.687699945  1.117568338  6.878952887  6.03E-12  1.65E-11
Pseudomonas amygdali  295.5154615  19.34507146  2.811912802  6.879683982  6.00E-12  1.65E-11
Aspergillus luchuensis  3172.571798  7.566238103  1.11164615  6.806336806  1.00E-11  2.66E-11
Talaromyces atroroseus  337.796557  11.16347379  1.662182617  6.716153612  1.87E-11  4.83E-11
Aspergillus tubingensis  24866.8453  4.948335759  0.747218325  6.62234262  3.54E-11  8.93E-11
Pseudomonas protegens  254.0456659  19.23582921  3.117607886  6.17006048  6.83E-10  1.68E-09
Ralstonia insidiosa  581.4982089  11.9466239  1.940565305  6.156259657  7.45E-10  1.79E-09
Staphylococcus epidermidis  115.2221693  9.611785535  1.570955907  6.118431137  9.45E-10  2.22E-09
Aspergillus niger  15275.55201  7.660908857  1.283553298  5.968516359  2.39E-09  5.50E-09
Lactococcus lactis  186.7197966  19.01368872  3.232817497  5.881460595  4.07E-09  9.13E-09
Pyrenophora teres  144.8934971  9.942111942  1.751726086  5.675608775  1.38E-08  3.01E-08
Stenotrophomonas maltophilia  124.8347825  18.72354535  3.300391079  5.673129307  1.40E-08  3.01E-08
Pyrenochaeta sp. UM 256  147.2869195  9.96576833  1.91413884  5.206397843  1.93E-07  4.05E-07
Aspergillus aculeatus  580.8623704  5.920810794  1.155504049  5.124006964  2.99E-07  6.17E-07
Micrococcus luteus  870.2995199  5.868836874  1.300189609  4.513831548  6.37E-06  1.29E-05
Hanseniaspora osmophila  556.1416931  -3.455325629  0.806144902  -4.286233928  1.82E-05  3.60E-05
Puccinia triticina  47.15830028  8.321900808  1.969630942  4.225106659  2.39E-05  4.64E-05
Lachancea thermotolerans  6780.19795  -2.595504135  0.628906011  -4.127014356  3.68E-05  7.00E-05
Dunaliella salina  701.9815931  1.829968276  0.446567088  4.097857466  4.17E-05  7.80E-05
Plasmodium ovale  206.3673184  4.307682495  1.063938315  4.048808503  5.15E-05  9.45E-05
Volvox carteri  414.8851649  1.744197412  0.436144532  3.999127092  6.36E-05  0.000114665
Hanseniaspora guilliermondii  545.6833383  -1.259192174  0.318454324  -3.954074663  7.68E-05  0.00013614
Malassezia restricta  492.9480081  3.964808279  1.02264507  3.877013046  0.000105747  0.000184145
Melampsora pinitorqua  469.2980816  3.002185981  0.929365179  3.230362025  0.001236336  0.002116439
Diplodia seriata  475.621909  5.064038253  1.584855458  3.195268204  0.00139701  0.002351633
Alternaria arborescens  822.2872262  3.171557219  1.004585287  3.157081095  0.00159357  0.002638534
Alternaria alternata  1912.513092  2.704802482  0.92964259  2.909507924  0.003619982  0.005897068
Hanseniaspora uvarum  2389.13341  1.207394393  0.448154783  2.69414595  0.007056926  0.011313484
Puccinia arachidis  702.5557304  0.966016225  0.362311078  2.666261904  0.007669991  0.012104205
Physarum polycephalum  954.425418  1.25889182  0.482551481  2.608823864  0.009085399  0.014117313
Emiliania huxleyi  160.4077479  3.701583514  1.447816913  2.556665474  0.010568079  0.016172364
Tetradesmus obliquus  676.3874395  2.280851108  0.910480582  2.505106811  0.012241442  0.018453517
Balamuthia mandrillaris  180.4091185  2.914559986  1.199994262  2.428811603  0.015148402  0.022499832
Erysiphe pisi  605.8320219  2.109250121  0.87586414  2.408193263  0.016031692  0.023466679
Neocallimastix californiae  1069.39021  -1.127634977  0.475602025  -2.370963363  0.01774179  0.02523832
Lactobacillus uvarum  447.6246457  5.105402459  2.149736711  2.374896625  0.017553866  0.02523832
Gluconobacter oxydans  1151.422504  -4.018871797  1.758369163  -2.285567718  0.022279561  0.031253272
Tatumella ptyseos  1519.918395  2.468319508  1.115788402  2.212175269  0.026954557  0.037293291
Anaeromyces robustus  525.4017129  -1.149767104  0.521898772  -2.203046196  0.027591493  0.037658659
Pecoramyces ruminatium  646.0410101  -1.031219819  0.475171748  -2.1702044  0.029991365  0.040388371
Chrysoporthe austroafricana  466.4843167  2.336267707  1.107261853  2.109950505  0.034862619  0.046330586
Erysiphe necator  22630.34937  1.299247857  0.631071105  2.058797886  0.039513604  0.051829533
Cutibacterium acnes  8745.576011  2.792224739  1.401873503  1.991780809  0.046395113  0.060075723
Torulaspora delbrueckii  2458.290292  -1.296773829  0.663326605  -1.954955251  0.050588352  0.064676248
Acetobacter cerevisiae  281.0553325  -1.633681697  0.908409514  -1.798397828  0.072113987  0.091043908
Saccharomyces sp. 'boulardii'  49733.47628  -1.142387032  0.73986046  -1.544057418  0.122574423  0.151374522
Komagataeibacter europaeus  128.5329763  -4.231003202  2.742556275  -1.542722474  0.122898127  0.151374522
Acetobacter pasteurianus  1458.987913  -1.770209619  1.215102233  -1.456840067  0.145160533  0.17664113
Saccharomyces cerevisiae  2728258.672  -1.045926562  0.736104411  -1.420894301  0.15534749  0.186786863
Saccharomyces cerevisiae x Saccharomyces kudriavzevii  11487.89089  -1.000674465  0.740399483  -1.351533176  0.1765247  0.209752879
Torulaspora microellipsoides  603.9824144  1.408495358  1.117792228  1.26006902  0.207644465  0.243861523
Halomonas desiderata  661.3971917  1.097410879  0.907120931  1.209773517  0.226365811  0.262792493
Variovorax paradoxus  716.8172022  1.054568511  0.902657774  1.168292726  0.242688708  0.278540449
Shewanella benthica  723.5306676  0.99842293  1.063584345  0.938734136  0.347867263  0.394770714
Campylobacter mucosalis  1130.1561  0.799643833  0.900223526  0.888272535  0.37439417  0.420153457
Saccharomyces pastorianus  5829.768543  -0.752830231  0.855705536  -0.879777213  0.378980011  0.420626166
Bacillus weihenstephanensis  2617.366704  1.218832628  1.512625392  0.805772952  0.420373777  0.461497298
Saccharomyces paradoxus  1258.358388  -0.56553136  0.965814221  -0.585548802  0.558178762  0.601388526
Meyerozyma guilliermondii  358.4872109  0.892312873  1.529831977  0.583275083  0.559708134  0.601388526
Bacillus cereus  30526.50244  0.562473299  1.03207053  0.544995019  0.585756943  0.622752119
Malassezia slooffiae  323.0450341  0.487703859  1.162227545  0.41962855  0.674756829  0.709900413
Symbiodinium microadriaticum  667.9845075  -0.144405208  0.373646105  -0.386475881  0.699144262  0.727974953
Vibrio anguillarum  4809.055198  -0.372178662  1.126114025  -0.330498203  0.74102355  0.763707944
Lactobacillus plantarum  257.8501924  -0.757331364  2.902724953  -0.260903591  0.794166855  0.81021063
Hanseniaspora opuntiae  1163.525898  0.057583358  0.43475602  0.132449823  0.894628509  0.894628509
Nitratireductor indicus  550.0180549  -0.143624147  1.05218479  -0.136500877  0.891425331  0.894628509
Puccinia striiformis  586.1147548  4.358420669  1.312601729  3.320444102

Oenococcus oeni  650.8329539  -2.13169349  1.899414962  -1.122289511
Burkholderia cenocepacia  31.77635504  -0.816132156  3.195880285  -0.255370065
Kocuria polaris  181.7017505  4.254689777  1.432416673  2.970287806
Clavaria fumosa  393.8692318  4.510432811  1.099132548  4.103629558
Lactobacillus casei  1216.7488  -18.71377523  3.236593434  -5.781935733
Lactobacillus paracasei  1535.534314  -9.164787211  3.196435681  -2.867189622
Lactobacillus rhamnosus  61.48243984  -22.2596367  3.236863221  -6.876916069
Rhizopus oryzae  389.3725667  3.440226685  1.185389457  2.902191061
Acetobacter ghanensis  55.89998518  -22.39534853  3.236892322  -6.918780826
Propionibacterium acnes  0  0  0  0  1
Burkholderia cenocepacia  32.89054511  -0.732149423  3.195619214  -0.229110346
Kocuria polaris  199.9054523  4.399951778  1.408755574  3.123289704
Clavaria fumosa  406.5662042  4.561254649  1.062611707  4.292494256
Lactobacillus casei  1216.7488  -18.71452994  3.236589173  -5.782176525
Lactobacillus paracasei  1535.464819  -9.179588069  3.196566772  -2.871702275
Lactobacillus rhamnosus  61.48243984  -22.25987044  3.236813378  -6.87709418
Rhizopus oryzae  416.4019036  3.542361587  1.157941211  3.059189493
Acetobacter ghanensis  55.89998518  -22.39556017  3.236837558  -6.918963269
Schistosoma haematobium  0  0  0  0  1
Schistosoma rodhaini  0  0  0  0  1
Propionibacterium acnes  0  0  0  0  1

Supplementary Table S2: The results of the differential abundance analysis with DESeq2 for the CRC samples. All species are included. The baseMean is the mean of the normalized counts of all samples, log2FoldChange is the log2 fold change from the CRC samples to the control samples, lfcSE is the standard error of the log2 fold change, stat is the Wald statistic used to calculate the p-values, pvalue is the p-value and padj is the p-value adjusted for multiple testing.

species  baseMean  log2FoldChange  lfcSE  stat  pvalue  padj
Parvimonas sp. oral taxon 393  182.5638454  -21.12821137  1.734267793  -12.18278484  3.84E-34  1.04E-31
Parvimonas sp. oral taxon 110  188.8697404  -21.08799647  1.82648589  -11.54566624  7.76E-31  1.05E-28
Fusobacterium nucleatum  304.0602796  -20.47357173  1.807434216  -11.3274229  9.60E-30  8.67E-28
Megasphaera micronuciformis  9.40668745  -25.35458986  2.986967878  -8.488403924  2.09E-17  1.42E-15
[Eubacterium] infirmum  50.31698957  -22.94604866  2.732365501  -8.397869412  4.55E-17  2.46E-15
Bacteroides nordii  37464.62034  -2.82934458  0.340006543  -8.321441554  8.69E-17  3.36E-15
Fusobacterium varium  13.29840847  -24.85570463  2.986816311  -8.321805575  8.66E-17  3.36E-15
Klebsiella sp. 1_1_55  7.326364494  -24.62990391  2.987115319  -8.245381005  1.65E-16  5.58E-15
Porphyromonas endodontalis  17.31162925  -24.47591102  2.986731294  -8.194882168  2.51E-16  7.55E-15
Proteus mirabilis  5.472880927  -24.24058359  2.987326557  -8.114473972  4.88E-16  1.32E-14
Peptoniphilus harei  21.52195679  -24.16295336  2.986676203  -8.090248729  5.95E-16  1.47E-14
Desulfovibrio sp. 6_1_46AFAA  27.51074984  -23.81027897  2.986626858  -7.97229788  1.56E-15  3.47E-14
Acinetobacter haemolyticus  3.901023207  -23.79366291  2.98758086  -7.964190436  1.66E-15  3.47E-14
Campylobacter coli  3.215643741  -23.54139924  2.987770026  -7.879254104  3.29E-15  6.38E-14
Bacteroides sp. 2_1_16  44162.10731  -2.249635302  0.294678259  -7.634208601  2.27E-14  4.10E-13
Fusobacterium sp. oral taxon 370  74.70925052  -22.39258214  2.986515277  -7.497896399  6.49E-14  1.10E-12
Butyrivibrio crossotus  102496.1186  2.948381185  0.399687704  7.376712263  1.62E-13  2.59E-12
Flavonifractor plautii  13563.91242  -1.658353012  0.226433639  -7.323792606  2.41E-13  3.63E-12
Fusobacterium necrophorum  117.4375965  -21.77575791  2.986492038  -7.291416696  3.07E-13  4.37E-12
Lactobacillus plantarum  70.11138076  18.39769136  2.600445159  7.074823821  1.50E-12  2.03E-11
Fusobacterium gonidiaformans  225.1754203  -20.95578715  2.986473149  -7.016901241  2.27E-12  2.93E-11
Bacteroides fragilis  183327.6696  -1.761399064  0.261672605  -6.73130864  1.68E-11  2.07E-10
[Clostridium] symbiosum  22568.08032  -1.018462076  0.153321044  -6.642676369  3.08E-11  3.63E-10
Bacteroides sp. 2_1_56FAA  36344.64533  -1.903665756  0.292322929  -6.512201286  7.41E-11  8.36E-10
Lachnospiraceae bacterium 7_1_58FAA  33125.33616  -1.131792404  0.174453756  -6.48763564  8.72E-11  9.45E-10
Hungatella hathewayi  13096.8495  -1.338975576  0.215216473  -6.221529223  4.92E-10  5.13E-09
Lactobacillus johnsonii  90.12158307  18.57508568  2.991710228  6.208851881  5.34E-10  5.36E-09
Succinatimonas hippei  32.75345741  17.84930066  2.991775393  5.966123224  2.43E-09  2.35E-08
Actinomyces sp. oral taxon 175  8.601604985  16.88868307  2.992061565  5.644497183  1.66E-08  1.55E-07
Bacteroides sp. 3_2_5  29467.87082  -2.22474548  0.404390873  -5.501473022  3.77E-08  3.40E-07
Peptostreptococcus stomatis  3987.467277  -8.053209441  1.480174652  -5.440715682  5.31E-08  4.64E-07
Clostridium sp. 7_3_54FAA  11491.13427  -0.807957037  0.148887385  -5.426631912  5.74E-08  4.86E-07
Lactobacillus ruminis  1304.114819  2.035615778  0.383030474  5.314500845  1.07E-07  8.78E-07
Coprococcus eutactus  18641.2596  1.228810205  0.238705933  5.147799177  2.64E-07  2.10E-06
Parvimonas micra  2560.259196  -5.261819623  1.035886496  -5.079532984  3.78E-07  2.93E-06
[Clostridium] citroniae  10826.26926  -0.977816441  0.198845562  -4.917466762  8.77E-07  6.58E-06
Subdoligranulum sp. 4_3_54A2FAA  43856.18049  -1.053949346  0.214536657  -4.912677225  8.98E-07  6.58E-06
Bacteroides sp. 1_1_6  167762.9505  -1.201013552  0.269270181  -4.460254553  8.19E-06  5.84E-05
Bilophila sp. 4_1_30  8597.813852  -1.941318608  0.447680862  -4.336389543  1.45E-05  0.000100647
Bacteroides salyersiae  33409.22955  -1.535703552  0.360920724  -4.254960855  2.09E-05  0.000141655
Clostridium sp. L2-50  49701.64228  1.144004428  0.283956829  4.028797027  5.61E-05  0.000370563
Coprobacillus sp. 8_2_54BFAA  7847.514155  0.611963985  0.155015963  3.947748184  7.89E-05  0.000509027
[Clostridium] bolteae  20844.3076  -0.923168382  0.235356819  -3.922420365  8.77E-05  0.000552486
[Eubacterium] hallii  131420.1755  1.232669112  0.316192231  3.89848008  9.68E-05  0.000582941
Eubacterium ventriosum  48108.27526  0.892183174  0.228638214  3.902161229  9.53E-05  0.000582941
Roseburia intestinalis  134183.1088  1.03040427  0.266654774  3.86418834  0.000111459  0.00065664
Bifidobacterium catenulatum  20317.73542  2.959712107  0.770885306  3.83936765  0.000123352  0.00071124
Bifidobacterium pseudocatenulatum  18141.15149  2.448256539  0.654548579  3.740374083  0.000183747  0.001037403
Parabacteroides distasonis  117153.8112  -0.997705166  0.273468431  -3.648337629  0.000263943  0.001459764
Bacteroides uniformis  539356.0717  -0.851959972  0.242552743  -3.512473051  0.000443957  0.002406247
Bacteroides sp. 3_1_23  60060.95832  -1.137017421  0.324688598  -3.501870496  0.000462004  0.002454963
Bacteroides sp. 4_1_36  200353.4756  -0.894796523  0.255945738  -3.496039937  0.000472218  0.002460982
Bifidobacterium angulatum  1230.385294  3.616940287  1.053920101  3.431892306  0.000599386  0.003064783
Bacteroides sp. 3_1_19  58446.73849  -0.957641328  0.282222172  -3.393217914  0.000690767  0.003466625
Lachnospiraceae bacterium 1_4_56FAA  6964.171417  0.621673919  0.184143678  3.376026407  0.000735409  0.003623559
Parabacteroides sp. D13  66886.56014  -0.962533962  0.289475333  -3.325098379  0.000883873  0.004234376
Bacteroides sp. D20  204281.4461  -0.868813189  0.261819569  -3.318366131  0.000905457  0.004234376
Lachnospiraceae bacterium 6_1_63FAA  5407.710842  0.586876134  0.176869988  3.318121634  0.00090625  0.004234376
[Bacteroides] pectinophilus  34367.07066  0.635722905  0.194901445  3.261765988  0.001107205  0.005085637
Clostridium sp. HGF2  2158.441654  -0.871871995  0.27149338  -3.211393201  0.001320931  0.005966203
Streptococcus salivarius  8013.499718  1.380108508  0.432509809  3.19092996  0.001418157  0.006213833
Bifidobacterium adolescentis  107336.4782  2.314205463  0.725404831  3.190226153  0.001421615  0.006213833
Acidaminococcus sp. D21  6256.126078  -1.212375654  0.386749809  -3.134780224  0.001719829  0.007397995
Veillonella parvula  1746.932289  -2.825394933  0.910462696  -3.103251725  0.001914068  0.00810488
[Clostridium] leptum  15382.16957  0.677783831  0.219636485  3.085934606  0.002029134  0.008331749
Bacteroides intestinalis  44947.61756  -1.141573858  0.369907375  -3.086107318  0.002027956  0.008331749
Bacteroides sp. 2_2_4  78854.73168  -0.874245332  0.28381569  -3.080327706  0.002067729  0.008363503
Escherichia coli  647970.0368  -1.548671858  0.510132599  -3.035822175  0.002398808  0.009559955
[Clostridium] clostridioforme  9801.461214  -0.587272892  0.19434067  -3.021873355  0.002512156  0.009866585
[Clostridium] asparagiforme  7237.766589  -0.701831749  0.232884726  -3.013644395  0.002581302  0.009993325
Gemella morbillorum  618.5470003  -5.338245106  1.781151114  -2.997075916  0.002725828  0.010404218
Lachnospiraceae bacterium 5_1_63FAA  75095.37232  0.925799047  0.30967226  2.989609226  0.002793346  0.010513842
Anaerostipes caccae  1566.70803  -0.648064671  0.217293701  -2.982436521  0.002859639  0.010615922
Blautia obeum  118184.6691  0.746673659  0.251155409  2.972954727  0.00294948  0.010801472
Blautia hansenii  6020.116912  0.523280552  0.178221123  2.936130929  0.00332334  0.012008336
Prevotella multiformis  1667.132452  -1.169646922  0.40223754  -2.907851221  0.003639214  0.012976672
Peptostreptococcus anaerobius  706.9337703  -1.481288018  0.510614255  -2.900992289  0.00371983  0.01309187
Clostridium sp. SS2/1  87085.29037  0.876950252  0.304426807  2.880660418  0.003968429  0.013787749
Mobiluncus curtisii  508.0227259  -2.307167814  0.818837104  -2.817615132  0.004838177  0.016219786
Bacteroides sp. D2  35536.91488  -1.028300117  0.364995067  -2.817298671  0.004842947  0.016219786
Solobacterium moorei  812.0374982  -4.186843005  1.4862957  -2.816965025  0.00484798  0.016219786
Parabacteroides sp. D25  51923.01801  -0.788549243  0.282566027  -2.790672502  0.005259866  0.017383217
Coprococcus comes  120798.237  0.64973991  0.233966346  2.777065676  0.005485209  0.01769633
Veillonella sp. 6_1_27  983.209438  -3.075926087  1.107505321  -2.777346554  0.005480471  0.01769633
Bacteroides coprocola  34953.10948  -0.869419333  0.314698167  -2.7627086  0.005732392  0.018276214
Anaerostipes hadrus  71926.09602  0.815844283  0.297577038  2.74162378  0.006113632  0.019071094
Prevotella nigrescens  415.759503  -4.103567657  1.497024101  -2.741150029  0.006122454  0.019071094
Alistipes sp. HGB5  38565.65376  -0.814592549  0.306969882  -2.653656264  0.007962489  0.024520848
Parabacteroides merdae  186266.7705  -0.726401811  0.274877453  -2.642638762  0.008226275  0.025048545
Veillonella sp. 3_1_44  796.5379932  -2.986280952  1.132500363  -2.636891827  0.008366951  0.025193819
Bacteroides clarus  11047.2196  1.082068794  0.411185585  2.631582508  0.008498823  0.025309681
Marvinbryantia formatexigens  1722.174327  0.643766814  0.245573867  2.621479315  0.008754907  0.025788912
Bacteroides ovatus  364095.1074  -0.809576012  0.312318068  -2.592152341  0.009537753  0.027792806
Enterococcus faecium  5781.495418  0.885500522  0.349870341  2.530939091  0.01137576  0.032796074
Veillonella dispar  531.3715874  -3.462945413  1.397530905  -2.477902565  0.013215723  0.037699589
Lactobacillus reuteri  2006.958674  3.547158779  1.43749192  2.467602586  0.013602124  0.038397662
Blautia hydrogenotrophica  9649.073127  -0.659966913  0.273572189  -2.412404987  0.015847667  0.044200873
Prevotella denticola  1428.201981  -1.35030446  0.560460012  -2.409278862  0.015984079  0.044200873
Pseudoflavonifractor capillosus  7974.683947  -0.368550207  0.154486079  -2.385653197  0.01704882  0.046668992
Intestinibacter bartlettii  13557.46886  0.853646319  0.366262994  2.330692244  0.019769593  0.053045147
Porphyromonas asaccharolytica  576.8332351  -1.392131711  0.597090354  -2.331526042  0.019725639  0.053045147
Bacteroides sp. 2_1_33B  32704.9886  -0.793109931  0.340915252  -2.326413751  0.019996489  0.053127926
Parabacteroides sp. 20_3  67290.10442  -0.602459454  0.26449092  -2.277807698  0.022738037  0.059825322
Lachnospiraceae bacterium 1_1_57FAA  53964.95536  -0.655827694  0.288782713  -2.271007452  0.023146527  0.060314508
Bacteroides sp. D22  51463.13237  -0.606027368  0.268219575  -2.259444967  0.02385572  0.061486523
Lachnospiraceae bacterium 8_1_57FAA  54433.11863  -0.655153128  0.290362503  -2.256328285  0.024050079  0.061486523
Eubacterium sp. 3_1_31  4816.967306  0.578792002  0.26022348  2.224211289  0.026134232  0.066190438
Clostridium sp. M62/1  26441.22791  0.239693093  0.108075394  2.217832243  0.026566273  0.066661666
Streptococcus dysgalactiae  1391.703749  0.673305759  0.31232059  2.155816113  0.031098027  0.077317113
Bacteroides sp.
74482.93613 -0.596363654 0.277351361 -2.15020994 0.031538612 0.077699672 1_1_30 Clostridium sp. D5 4922.795584 -0.350368082 0.16328387 -2.145760522 0.03189209 0.07786267 Alistipes indistinctus 5638.682609 -1.438907155 0.679282626 -2.118274632 0.034151816 0.082635197 Anaerotruncus 12588.95495 -0.398976865 0.189192948 -2.108835813 0.034958756 0.083839141 colihominis Granulicatella 166.6749098 -3.720192576 1.786958495 -2.081857293 0.037355507 0.088801248 adiacens Streptococcus 2121.860918 -1.78420264 0.860316301 -2.073891472 0.038089391 0.089758478 anginosus Paraprevotella 9391.731788 0.746339008 0.368768915 2.023866377 0.042983902 0.100419289 xylaniphila [Eubacterium] siraeum 14829.05594 -0.728356451 0.361976473 -2.012165167 0.044202529 0.102383636 Bacteroides 150482.7384 -0.691752048 0.344669098 -2.007003389 0.044749302 0.102771702 cellulosilyticus Dorea formicigenerans 93321.81254 0.375285174 0.188898116 1.986706816 0.046954896 0.106930898 Johnsonella ignava 1081.773715 -0.45372498 0.229711534 -1.975194589 0.04824607 0.10858226 Megamonas funiformis 1833.72933 5.877512905 2.978784595 1.973124514 0.048481378 0.10858226 Lachnospiraceae 58271.54563 -0.581130057 0.300103171 -1.936434247 0.052814538 0.11731754 bacterium 3_1_46FAA [Ruminococcus] 51293.87197 -0.618238401 0.321671923 -1.921953263 0.054611639 0.120323204 torques Bacteroides sp. 40304.15856 -0.513214806 0.275286954 -1.86429033 0.062280944 0.136113999 2_1_22 Bacteroides sp. D1 41097.06025 -0.50890668 0.274927562 -1.851057331 0.064161306 0.138374668 Streptococcus 480.1665894 1.803968732 0.975202385 1.849840362 0.064336562 0.138374668 infantarius Streptococcus sp. 
939.8149657 1.298205463 0.708101022 1.833361939 0.066748764 0.142432402 C150 Lactobacillus 3.076439871 -5.44217813 2.987779301 -1.821479293 0.068534032 0.145099395 acidophilus Clostridiales bacterium 4276.722123 -0.33271129 0.184876928 -1.799636619 0.071918035 0.151083624 1_7_47FAA Roseburia 140930.0808 -0.380402732 0.212854463 -1.787149427 0.073913341 0.154080887 inulinivorans Klebsiella oxytoca 1442.926909 -1.722355795 0.975038311 -1.766449353 0.077320491 0.159953078 Prevotella bivia 3560.660639 -0.585429906 0.343395769 -1.704825625 0.088226951 0.181132603 Bacteroides caccae 149694.8971 -0.503434725 0.30083691 -1.673447336 0.094239266 0.190687051 Holdemania filiformis 6367.16408 -0.289049866 0.172752798 -1.67319933 0.094288062 0.190687051 Prevotella buccalis 2040.806808 -0.529551233 0.321575788 -1.646738504 0.099611806 0.199961478 Alloprevotella rava 121.3516882 -3.586527538 2.198130832 -1.631626055 0.102758286 0.204760996 Bacteroides 9915.69088 -0.402680411 0.249115864 -1.616438251 0.105999594 0.209678029 oleiciplenus Streptococcus 3844.271043 0.739488162 0.462877319 1.597589968 0.110134261 0.211958597 vestibularis Parabacteroides 10872.97787 -0.52149769 0.326064229 -1.599371058 0.109738179 0.211958597 goldsteinii Lachnospiraceae 2917.793922 0.318189916 0.199250859 1.596931212 0.110281041 0.211958597 bacterium 9_1_43BFAA Streptococcus oralis 590.0981436 -1.631788338 1.015485201 -1.60690509 0.10807519 0.211958597 Enterococcus faecalis 19473.68116 0.532997071 0.339646637 1.569269392 0.116585181 0.222497071 Raoultella 436.5496637 -2.270163197 1.45656156 -1.558576897 0.119096555 0.225700464 ornithinolytica Fusobacterium 3.243257867 4.641635541 2.992601332 1.55103705 0.120892805 0.227513544 ulcerans Prevotella oris 2704.500802 -0.604838594 0.394891748 -1.531656706 0.125607171 0.234755472 Streptococcus sp. 
oral 167.8123432 -2.866438408 1.882433915 -1.5227299 0.127826294 0.237266614 taxon 071 Bifidobacterium breve 1663.964746 1.263522498 0.833846439 1.515293992 0.129697969 0.239103059 Tannerella sp. 20708.74584 -0.436070891 0.290438777 -1.501421042 0.133246695 0.243985502 6_1_58FAA_CT1 Mitsuokella multacida 271.5342394 -3.59585352 2.406953695 -1.493943788 0.135190312 0.245883051 Streptococcus 491.3150656 -1.592787979 1.06946637 -1.489329654 0.136400583 0.246430386 intermedius Prevotella marshii 895.088099 -0.83956958 0.572760712 -1.465829556 0.142694727 0.25609451 Erysipelotrichaceae 3841.982901 -0.413533482 0.285712405 -1.447376716 0.147791443 0.263496587 bacterium 21_3 Escherichia sp. 10363.4785 -0.953534712 0.680406339 -1.401419501 0.161088664 0.283474208 1_1_43 Sutterella parvirubra 2.410523587 4.204428409 2.992962974 1.404771274 0.160089304 0.283474208 Alloprevotella 588.0236217 -1.340850476 0.976500468 -1.373118109 0.169715632 0.296728622 tannerae Lactobacillus gasseri 1046.443311 -1.875729943 1.37060139 -1.368545193 0.171141486 0.297303479 Peptoclostridium 6236.549027 -0.217532452 0.160147276 -1.358327516 0.174359783 0.300964975 difficile Holdemanella biformis 14585.62837 0.402068287 0.297273777 1.352518512 0.176209501 0.302232752 Collinsella tanakaei 547.8985096 -1.393582288 1.058356211 -1.316742202 0.187925052 0.318298058 Oxalobacter 2.00029174 3.946493314 2.99322699 1.318474451 0.187344872 0.318298058 formigenes Coprobacillus sp. 2864.093253 0.426184383 0.329737207 1.292497095 0.196185051 0.330224527 3_3_56FAA Prevotella disiens 2323.625725 -0.466295235 0.364861147 -1.27800737 0.201246828 0.336653644 Bifidobacterium sp. 46636.90126 0.741292978 0.599453543 1.236614558 0.216230213 0.359499311 12_1_47BFAA Alistipes putredinis 182072.44 -0.410287661 0.336125107 -1.220639731 0.222222456 0.363848382 Anaerostipes sp. 503.2393942 -1.297750038 1.063650222 -1.220090977 0.222430389 0.363848382 3_2_56FAA Klebsiella sp. 
757.7510299 -1.364596444 1.119511168 -1.218921689 0.222873917 0.363848382 4_1_44FAA Desulfovibrio piger 4246.527521 -1.373220755 1.132913694 -1.212114182 0.225468676 0.365880306 Veillonella atypica 657.0941929 -1.287238348 1.077471407 -1.194684462 0.232210336 0.372360952 Bacteroides 19414.16007 0.335999831 0.281092977 1.195333424 0.231956786 0.372360952 coprophilus Lactobacillus 370.5002563 1.727379039 1.459566551 1.183487685 0.236615924 0.37719362 paracasei Collinsella intestinalis 69.6974246 3.503564095 2.979803345 1.17577024 0.239686702 0.379854363 Citrobacter youngae 37.940302 -3.486969992 2.979350187 -1.170379369 0.241848334 0.381051736 Bacteroides plebeius 36314.063 0.343605523 0.294485578 1.166799153 0.243291489 0.381109789 Bacteroides 40715.59249 -0.339772604 0.293088361 -1.15928385 0.246340502 0.382655235 xylanisolvens Escherichia sp. 9482.850278 -0.790729246 0.683184489 -1.15741686 0.247102089 0.382655235 4_1_40B Bacteroides sp. 354953.5299 -0.284589224 0.246983338 -1.15226082 0.249213912 0.383732784 3_1_40A Klebsiella sp. MS 92-3 686.1179152 -1.344328972 1.204724241 -1.115881067 0.264473047 0.404927659 Fusobacterium 145.9638764 -3.313432028 2.978718287 -1.112368378 0.265979795 0.404946767 mortiferum [Clostridium] 1751.968295 1.292006876 1.17841435 1.096394384 0.27290622 0.41317087 spiroforme [Eubacterium] 6418.633834 0.151108634 0.139151603 1.085928084 0.277510811 0.413473811 dolichum Dialister 1054.744342 0.76297232 0.710929404 1.073204057 0.283179571 0.413473811 succinatiphilus [Clostridium] 193.9346131 -1.500484124 1.385941098 -1.082646389 0.278965409 0.413473811 hylemonae Leuconostoc 685.8571186 1.122720973 1.044458811 1.074930826 0.282405699 0.413473811 mesenteroides Streptococcus 840.3716651 0.856104619 0.798715227 1.071852132 0.283786453 0.413473811 australis Lactobacillus oris 209.6076835 -2.340494304 2.181646773 -1.072810838 0.283355997 0.413473811 Streptococcus sp. 
102.4751145 -2.522826885 2.337614383 -1.079231418 0.280484577 0.413473811 F0441 Bacteroides finegoldii 40378.52915 -0.299115877 0.280958561 -1.064626316 0.287045066 0.413772409 Eggerthella sp. 8485.956845 -0.433289293 0.406454997 -1.066020336 0.286414449 0.413772409 1_3_56FAA Lachnospiraceae 3329.913001 0.262869334 0.24949434 1.053608406 0.292062239 0.418777073 bacterium 5_1_57FAA Methanobrevibacter 118187.0258 -0.921347374 0.895938648 -1.028359895 0.303780566 0.433287018 smithii Enterococcus 89.4168884 3.03329464 2.9793575 1.018103615 0.308628718 0.437897291 saccharolyticus Haemophilus 818.9358302 -0.980521145 0.96689588 -1.014091761 0.310538969 0.438312816 parainfluenzae Faecalibacterium 259850.9441 0.198812053 0.199760716 0.995251003 0.319614202 0.448784708 prausnitzii Bacteroides sp. 80639.47869 -0.232489899 0.240790588 -0.965527353 0.334280742 0.464564519 3_1_33FAA Pyramidobacter 100.8814643 -2.879790401 2.978758675 -0.96677533 0.33365636 0.464564519 piscolens Eggerthella sp. HGA1 8633.048903 -0.420942334 0.440025721 -0.956631202 0.338753434 0.468378473 Tyzzerella nexilis 27135.6227 0.172026153 0.180810205 0.951418388 0.341392029 0.46963066 Prevotella salivae 678.3792048 -0.579093874 0.615987051 -0.94010722 0.347162566 0.475156845 Bacteroides vulgatus 348424.7284 -0.236308669 0.254951688 -0.926876269 0.35399078 0.481785243 Streptococcus infantis 1192.976535 0.394480755 0.426993566 0.92385644 0.355561065 0.481785243 Bacteroides fluxus 12107.92791 0.276808374 0.301556782 0.917931183 0.358654895 0.483559585 Bacteroides sp. 
333680.1159 -0.226123444 0.251304243 -0.899799544 0.368226937 0.494007425 4_3_47FAA Dialister invisus 15643.95227 1.212367302 1.411509966 0.858915155 0.390387328 0.521157467 Citrobacter freundii 70.554407 -2.452867199 2.882913086 -0.850829396 0.39486413 0.522520029 Klebsiella pneumoniae 411.3098236 -1.33985061 1.576091858 -0.850109468 0.395264228 0.522520029 Burkholderiales 6015.343809 0.572887967 0.70945491 0.807504407 0.419375937 0.551703295 bacterium 1_1_47 Streptococcus 2440.751133 0.274180016 0.348197344 0.787427074 0.431031903 0.564297805 sanguinis Erysipelotrichaceae 13117.86633 0.104266789 0.141071421 0.739106389 0.459842397 0.599121585 bacterium 2_2_44A Parasutterella 6871.770347 0.41421526 0.567842545 0.729454429 0.465723734 0.603011015 excrementihominis Bifidobacterium 2799.864777 -0.46030795 0.633234735 -0.726915195 0.467277908 0.603011015 dentium Streptococcus sp. 838.7389446 0.606336288 0.877568303 0.690927744 0.48961095 0.628836812 F0442 Lactobacillus 161.1498851 -0.913397681 1.335340679 -0.684018464 0.493963506 0.629894795 delbrueckii Slackia piriformis 136.0887838 -1.767849168 2.591219245 -0.682246078 0.495083363 0.629894795 Collinsella aerofaciens 16145.5753 0.399504508 0.6042533 0.661154036 0.508513535 0.643958729 Enterobacter 30.90745226 -1.939758722 2.979104432 -0.651121425 0.514968107 0.649099335 cancerogenus Dorea longicatena 124830.3714 0.166558205 0.262825836 0.633720821 0.526263034 0.660265196 [Clostridium] 3214.053275 -0.156919709 0.249747745 -0.628312818 0.529799037 0.661638428 methylpentosum Streptococcus 82.76421514 1.699660602 2.747815465 0.618549762 0.536213006 0.666576719 gallolyticus Paraprevotella clara 10721.38763 0.205794232 0.340446026 0.604484164 0.545521794 0.675052083 Coprobacillus sp. 
11602.58193 0.161273341 0.271405956 0.594214449 0.55236869 0.677338982 29_1 Actinomyces 277.4198982 -0.64997853 1.091967796 -0.595235988 0.551685738 0.677338982 odontolyticus Clostridium celatum 618.9476423 0.667814707 1.13336525 0.589231677 0.55570587 0.678361671 Prevotella sp. C561 1928.278203 -0.21705399 0.372101812 -0.583318822 0.559678694 0.680147651 Propionibacterium 268.4554764 0.601211505 1.045879937 0.574837975 0.565400881 0.684034102 acnes Bifidobacterium 74271.92905 0.219971636 0.39592366 0.555591034 0.578490462 0.696759623 longum Subdoligranulum 8922.159214 -0.089129882 0.161592508 -0.551571871 0.581241713 0.696975682 variabile Barnesiella 60753.53826 -0.154321137 0.286499868 -0.538642961 0.590133238 0.701904402 intestinihominis Streptococcus 25.6544034 1.60329308 2.97973819 0.538065084 0.590532117 0.701904402 agalactiae Anaerococcus 77.65434596 -1.275195894 2.493009001 -0.511508741 0.60899487 0.720688252 vaginalis Lactobacillus 149.8025225 1.031293702 2.050254967 0.503007537 0.614958973 0.724582094 fermentum Staphylococcus 65.44373828 -1.060569011 2.280451318 -0.465069788 0.64188148 0.749783971 aureus Enterobacter 172.659478 -0.96469128 2.060290139 -0.46823079 0.639619556 0.749783971 hormaechei Phascolarctobacterium 61905.49538 -0.10953845 0.248962571 -0.439979588 0.659951891 0.76758353 succinatutens Lactobacillus 111.4872308 -0.871015934 2.042487523 -0.426448595 0.669780998 0.775686541 rhamnosus Actinomyces 165.1117344 -0.51583025 1.259932092 -0.409411152 0.682237957 0.786751006 graevenitzii Clostridium sp. 
90.70630215 -0.91020262 2.293139737 -0.396924185 0.691423369 0.793964971 7_2_43FAA Lactobacillus crispatus 726.2812601 -0.24180059 0.630733727 -0.383363977 0.701449909 0.802079854 Erysipelotrichaceae 15520.94522 0.048447391 0.131729192 0.367780216 0.713037122 0.811903614 bacterium 6_1_45 Prevotella stercorea 11271.29281 -0.118799984 0.332648112 -0.3571341 0.720991418 0.816564831 Lactobacillus 184.4402837 -0.884275998 2.496243767 -0.354242647 0.723157046 0.816564831 salivarius [Clostridium] scindens 3182.537684 -0.057462006 0.171407352 -0.335236531 0.737446667 0.82545739 Bacteroides dorei 403060.4884 -0.078118897 0.238460217 -0.327597189 0.743216248 0.82545739 Catenibacterium 2197.174597 -0.430005129 1.310230516 -0.328190439 0.742767676 0.82545739 mitsuokai Synergistes sp. 33.1537786 -1.0048816 2.979012001 -0.337320427 0.735875363 0.82545739 3_1_syn1 Rothia mucilaginosa 259.3038135 0.456706802 1.415808161 0.322576755 0.7470158 0.826290946 Peptoniphilus sp. oral 191.7073066 0.251390351 0.820315405 0.306455723 0.759257687 0.836418021 taxon 836 Prevotella timonensis 1790.897269 0.162037508 0.542672121 0.298591916 0.765251434 0.839607849 Prevotella copri 100976.4769 0.198247763 0.693075579 0.286040612 0.774847016 0.846707828 Erysipelatoclostridium 1486.85147 0.250755188 0.911678061 0.275047957 0.783279394 0.852484803 ramosum Finegoldia magna 19.97198953 -0.78791698 2.97934157 -0.264460104 0.791425404 0.857905137 Turicibacter sp. 
HGF1 2814.778215 0.307276114 1.195592765 0.257007339 0.797173108 0.860692878 Odoribacter laneus 2200.674584 -0.112402948 0.457538607 -0.245668773 0.805938647 0.866703862 Lachnospiraceae 34315.94347 0.060987393 0.279174594 0.218456098 0.827073761 0.885916953 bacterium 2_1_58FAA Ruminococcus lactaris 59628.20338 0.066893413 0.346331061 0.193148755 0.846842471 0.894083548 Parabacteroides 11977.0895 0.050194419 0.259801655 0.193202844 0.846800112 0.894083548 johnsonii Streptococcus mitis 752.0995113 0.134875393 0.68739214 0.196213174 0.844443317 0.894083548 Streptococcus equinus 526.6292788 0.281932722 1.469890255 0.191805287 0.84789473 0.894083548 Pseudomonas sp. 19.12810706 -0.407809362 2.979409882 -0.136875884 0.891128901 0.936030745 2_1_26 Bacteroides stercoris 74593.11166 -0.045432288 0.375976883 -0.120837984 0.903819364 0.945695164 Veillonella sp. oral 507.3948404 0.15740417 1.378482977 0.114186517 0.90908994 0.947551437 taxon 158 Bifidobacterium 24500.71458 -0.096291713 1.090448952 -0.08830465 0.929634543 0.954283944 bifidum [Ruminococcus] 30556.70701 0.02757167 0.292636899 0.094218023 0.924935968 0.954283944 gnavus Streptococcus 4450.418692 0.046868347 0.523552333 0.08951989 0.928668747 0.954283944 parasanguinis Hafnia alvei 89.70462023 -0.243631476 2.625499828 -0.092794322 0.926066962 0.954283944 Erysipelotrichaceae 5939.382198 0.009006082 0.155043075 0.058087612 0.953678842 0.97527157 bacterium 3_1_53 Bacteroides eggerthii 88904.52057 -0.003656916 0.285758599 -0.012797222 0.989789573 0.991758181 Lachnospiraceae 1127.184917 -0.002086008 0.196753139 -0.010602157 0.991540861 0.991758181 bacterium 6_1_37FAA Clostridium 184.4746497 -0.032512514 2.005857277 -0.016208787 0.987067825 0.991758181 perfringens Sutterella 17207.38543 -0.02967304 0.946851002 -0.031338659 0.97499946 0.991758181 wadsworthensis Lactobacillus vaginalis 22.51234057 -0.030775798 2.979329957 -0.010329772 0.991758181 0.991758181 Streptococcus sp. 
151.3927168 -0.021334112 1.718393782 -0.012415147 0.9900944 0.991758181 C300 Actinomyces turicensis 0 0 0 0 1

Corynebacterium 0 0 0 0 1 striatum Achromobacter 0 0 0 0 1 piechaudii Peptoniphilus 0 0 0 0 1 lacrimalis
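The numeric columns of the table above are not labelled in this part of the document. The following minimal sketch assumes (this is an assumption, not something stated here) that the six values after each species name are, in order, a mean abundance, a log2 fold change, its standard error, a test statistic, a p-value and an adjusted p-value. Under that reading the statistic should equal the fold change divided by its standard error, which can be checked directly on rows copied from the table. The 0.05 cut-off and the helper names `parse_row` and `check_row` are illustrative only.

```python
# Rows copied verbatim from the table; the species name may contain spaces,
# so the six numeric fields are split off from the right.
ROWS = [
    "Bacteroides sp. 1_1_6 167762.9505 -1.201013552 0.269270181 -4.460254553 8.19E-06 5.84E-05",
    "Escherichia coli 647970.0368 -1.548671858 0.510132599 -3.035822175 0.002398808 0.009559955",
    "Faecalibacterium prausnitzii 259850.9441 0.198812053 0.199760716 0.995251003 0.319614202 0.448784708",
]

ALPHA = 0.05  # assumed significance cut-off, not taken from the source


def parse_row(line):
    """Split 'name v1 ... v6' into the species name and six floats."""
    name, *values = line.rsplit(None, 6)
    return name, [float(v) for v in values]


def check_row(line, tol=1e-6):
    """Return (name, statistic matches effect / SE, adjusted p below ALPHA)."""
    name, (mean, effect, std_err, stat, p_value, p_adjusted) = parse_row(line)
    consistent = abs(effect / std_err - stat) < tol
    return name, consistent, p_adjusted < ALPHA


for row in ROWS:
    print(check_row(row))
```

For the three rows shown, the ratio of the second to the third value reproduces the fourth value to within rounding, which supports the assumed column order. The four all-zero rows at the end of the table carry only five values and would need separate handling.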

Supplementary Figure S1. PCA including the negative control samples, coloured by batch. The orange samples in the upper left corner are suspected of being contaminated, because the negative control sample that belongs to them (orange cross) is also drawn to that corner.

Supplementary Figure S2. Abundances of species suspected of contaminating some samples of one batch. Shown are the regular samples as well as the negative control samples for the contaminated batch (green) and the non-contaminated batches (grey).

Supplementary List S3: Wine-related microorganisms used in the custom Kraken database. Species that are contaminants in some samples from the second batch are marked with a *.

Acetobacter aceti
* Acetobacter cerevisiae
* Acetobacter malorum
* Acetobacter orleanensis
* Acetobacter pasteurianus
Acremonium chrysogenum
Acremonium furcatum
Agrobacterium vitis
Alternaria alternata
Amycolatopsis orientalis
Aspergillus aculeatus
Aspergillus carbonarius
Aspergillus clavatus
Aspergillus flavus
Aspergillus fumigatus
Aspergillus nidulans
Aspergillus niger
Aspergillus terreus
Aureobasidium pullulans
Bacillus cereus
Bacillus subtilis
Bacillus thuringiensis
Baudoinia panamericana
Bifidobacterium bifidum
Brettanomyces anomalus
* Brettanomyces bruxellensis
Burkholderia cenocepacia
Burkholderia pseudomallei
Burkholderia thailandensis
Candida albicans
Candida parapsilosis
Candida tropicalis
Citrobacter freundii
Clavispora lusitaniae
Coniella lustricola
Cryptococcus gattii VGI
Cryptococcus neoformans
Curtobacterium ammoniigenes
Curtobacterium flaccumfaciens
Debaryomyces fabryi
Debaryomyces hansenii
Diaporthe ampelina
Dyella japonica
Dyella thiooxydans
Erwinia billingiae
Erysiphe necator
Fusarium fujikuroi
Fusarium oxysporum
Fusarium verticillioides
Geotrichum fermentans
Gilliamella bombi
Gilliamella intestini
Gilliamella mensalis
* Gluconobacter oxydans
Hanseniaspora guilliermondii
Hanseniaspora opuntiae
Hanseniaspora osmophila
Hanseniaspora uvarum
Hanseniaspora vineae
Kazachstania servazzii
Klebsiella pneumoniae
Kluyveromyces aestuarii
Kluyveromyces lactis
Kluyveromyces marxianus
Kluyveromyces wickerhamii
* Komagataeibacter europaeus
Komagataeibacter hansenii
* Komagataeibacter intermedius
* Komagataeibacter oboediens
* Komagataeibacter xylinus
Lachancea kluyveri
Lachancea lanzarotensis
Lachancea thermotolerans
Lachancea waltii
Lactobacillus brevis
Lactobacillus buchneri
Lactobacillus curvatus
Lactobacillus delbrueckii
Lactobacillus diolivorans
* Lactobacillus fermentum
Lactobacillus fructivorans
Lactobacillus jensenii
Lactobacillus kunkeei
Lactobacillus mali
Lactobacillus nagelii
Lactobacillus paracasei
Lactobacillus plantarum
Lactobacillus vini
Lelliottia amnigena
Leuconostoc citreum
Leuconostoc mesenteroides
Leuconostoc pseudomesenteroides
Mesorhizobium amorphae
Mesorhizobium ciceri
Mesorhizobium japonicum
Methylorubrum extorquens
Metschnikowia sp. 04-218.3
Metschnikowia sp. 13-106.1
Meyerozyma guilliermondii
Micrococcus luteus
Micrococcus lylae
Micrococcus terreus
Mitsuaria chitosanitabida
Mucor circinelloides
Oenococcus oeni
Pantoea vagans
Papiliotrema flavescens
Pediococcus damnosus
Penicillium digitatum
Penicillium expansum
Pichia fermentans
Pichia kluyveri
Pichia kudriavzevii
Pichia membranifaciens
Plasmopara viticola
Propionibacterium acidifaciens
Propionibacterium cyclohexanicum
Propionibacterium freudenreichii
Pseudomonas syringae
Ralstonia solanacearum
Ramularia collo-cygni
Ramularia endophylla
Rhizophagus irregularis
Rhizopus stolonifer
Rhodotorula sp. FNED7-22
Rhodotorula sp. JG-1b
Roseateles aquatilis
Roseateles depolymerans
Saccharomyces bayanus
Saccharomyces cerevisiae
Saccharomyces eubayanus
Saccharomyces kudriavzevii
Saccharomyces mikatae
Saccharomyces paradoxus
Saccharomyces pastorianus
Saccharomyces uvarum
Scheffersomyces lignosus
Scheffersomyces shehatae
Scheffersomyces stipitis
Schizosaccharomyces pombe
Sclerotinia sclerotiorum
Snodgrassella alvi
Sphingomonas melonis
Sphingomonas phyllosphaerae
Sphingomonas wittichii
Staphylococcus aureus
Starmerella bacillaris
Tatumella morbirosei
Tatumella ptyseos
Tatumella saanichensis
Torulaspora delbrueckii
Trichothecium roseum
Variovorax boronicumulans
Variovorax paradoxus
Variovorax soli
Vitis vinifera
Weissella halotolerans
Weissella koreensis
Weissella paramesenteroides
Wickerhamomyces anomalus
Xylella fastidiosa
Yarrowia deformans
Yarrowia keelungensis
Yarrowia lipolytica
Zygosaccharomyces bailii
Zygosaccharomyces rouxii
[Candida] apicola
[Candida] glabrata
[Candida] intermedia

Supplementary Table S4: Overview of the draft genomes for the vineyards. Completeness, contamination and marker lineage were estimated with CheckM. Only draft genomes with an estimated completeness of ≥40% are reported. The numbers of genes and contigs are given as additional quality measures. The taxonomic annotation obtained by mapping the genes to UniProt with mmseqs is given together with the percentage of genes that had that hit. For the Saccharomyces bins the completeness was additionally estimated using BUSCO.

vineyard | bin | completeness (%) | contamination (%) | Marker lineage | # genes | # contigs | mmseqs most hits | BUSCO completeness
1 | bins.27 | 99.53 | 3.32 | o__Rickettsiales | 1289 | 38 | Rickettsia (39%) |
1 | bins.49 | 99.2 | 0 | c__Spirochaetia | 1208 | 9 | bacteria (37%) |
1 | bins.6 | 99.03 | 5.2 | f__Burkholderiaceae | 5028 | 45 | Ralstonia (45%) |
1 | bins.40 | 98.79 | 0.71 | f__Enterobacteriaceae | 3686 | 96 | Enterobacteriaceae (21%) |
1 | bins.9 | 97.51 | 0.25 | o__Rhodospirillales | 2612 | 70 | Gluconobacter (47%) |
1 | bins.30 | 95.92 | 54.39 | k__Bacteria | 16015 | 2417 | Burkholderiaceae (39%) |
1 | bins.2 | 80.11 | 36.05 | o__Burkholderiales | 8775 | 1985 | Burkholderiaceae |
1 | bins.32 | 72.96 | 2.94 | c__Gammaproteobacteria | 2105 | 672 | Orbaceae (36%) |
1 | bins.50 | 60.84 | 1.69 | c__Gammaproteobacteria | 1144 | 332 | Gammaproteobacteria (42%) |
1 | bins.7 | 59.74 | 1.19 | f__Enterobacteriaceae | 2363 | 633 | Enterobacteriaceae (40%) |
1 | bins.34 | 54.86 | 22.61 | k__Bacteria | 5707 | 747 | fungi (32%) |
1 | bins.44 | 52.97 | 64.91 | k__Archaea | 233422 | 41328 | Plantae (32%) |
1 | bins.5 | 48.2 | 22.37 | k__Bacteria | 64623 | 2418 | Sclerotiniaceae (19%) |
1 | bins.46 | 46.63 | 18.05 | k__Bacteria | 5558 | 126 | Saccharomyces (44%) | 72.40%
1 | bins.16 | 45.26 | 46.18 | k__Archaea | 4224 | 1100 | Hanseniaspora (60%) |
1 | bins.13 | 43.15 | 25.25 | k__Archaea | 44093 | 10983 | Pleosporaceae (35%) |
1 | bins.21 | 42.63 | 20.36 | k__Bacteria | 4554 | 368 | Hanseniaspora (62%) |
1 | bins.48 | 41.12 | 0.62 | o__Pseudomonadales | 2653 | 832 | Pseudomonas (39%) |
1 | bins.51 | 40.9 | 26.87 | k__Archaea | 3256 | 112 | Saccharomyces (46%) | 35.30%
2 | bins.11 | 98.66 | 0 | o__Lactobacillales | 1852 | 28 | Oenococcus (46%) |
2 | bins.39 | 95.73 | 0.95 | o__Rickettsiales | 1254 | 57 | Rickettsia (38%) |
2 | bins.19 | 94.3 | 0.64 | f__Enterobacteriaceae | 3242 | 51 | Enterobacteriaceae (36%) |
2 | bins.9 | 88.07 | 9.17 | f__Burkholderiaceae | 4489 | 49 | Ralstonia (43%) |
2 | bins.22 | 86.22 | 8.34 | g__Burkholderia | 8595 | 898 | Burkholderiaceae (43%) |
2 | bins.25 | 80.19 | 22.03 | o__Burkholderiales | 7761 | 970 | Burkholderiaceae (25%) |
2 | bins.18 | 63.25 | 23.95 | k__Archaea | 3950 | 273 | Hanseniaspora (61%) |
2 | bins.3 | 62.39 | 26.22 | k__Archaea | 30734 | 6474 | Eukaryota (20%) |
2 | bins.13 | 56.9 | 1.72 | k__Bacteria | 3070 | 795 | Burkholderiaceae (37%) |
2 | bins.6 | 56.22 | 1.99 | o__Rhodospirillales | 993 | 48 | Acetobacter (50%) |
2 | bins.17 | 52.06 | 98.51 | k__Archaea | 7947 | 265 | Saccharomyces (48%) | 51.30%
2 | bins.33 | 50.79 | 2.07 | o__Burkholderiales | 3605 | 941 | Burkholderiaceae (33%) |
2 | bins.40 | 48.35 | 13.75 | k__Bacteria | 61821 | 2946 | Sclerotiniaceae (19%) |
2 | bins.29 | 47.54 | 39.25 | k__Bacteria | 47903 | 3942 | Ascomycota (24%) |
2 | bins.20 | 46.84 | 20.72 | k__Archaea | 3549 | 90 | Saccharomyces (46%) | 43.50%
2 | bins.16 | 45.81 | 35.56 | k__Archaea | 3871 | 714 | Hanseniaspora (60%) |
3 | bins.49 | 99.5 | 0.5 | o__Rhodospirillales | 2996 | 61 | Gluconobacter (42%) |
3 | bins.32 | 99.35 | 5.74 | o__Pseudomonadales | 5672 | 126 | Pseudomonas (52%) |
3 | bins.24 | 91.04 | 12.48 | o__Rhodospirillales | 2451 | 239 | Acetobacter (44%) |
3 | bins.26 | 84.04 | 5.84 | g__Bacillus | 5555 | 630 | Bacillus (44%) |
3 | bins.29 | 79.54 | 8.96 | g__Lactobacillus | 7125 | 2070 | Plantae (14%) |
3 | bins.1 | 76.35 | 7.76 | k__Bacteria | 9416 | 867 | Burkholderiaceae (40%) |
3 | bins.7 | 57.41 | 121 | k__Archaea | 370451 | 75576 | Plantae (30%) |
3 | bins.4 | 57.05 | 19.97 | k__Archaea | 3054 | 168 | Hanseniaspora (64%) |
3 | bins.22 | 55.05 | 0 | f__Burkholderiaceae | 2357 | 21 | Ralstonia (41%) |
3 | bins.51 | 54.61 | 58.56 | k__Archaea | 5826 | 206 | Saccharomyces (46%) | 49.00%
3 | bins.16 | 53.74 | 0 | o__Lactobacillales | 870 | 5 | Oenococcus (48%) |
3 | bins.8 | 53.45 | 15.52 | k__Bacteria | 2676 | 19 | Ralstonia (40%) |
3 | bins.19 | 52.89 | 83.15 | k__Archaea | 7229 | 1559 | Hanseniaspora (61%) |
3 | bins.6 | 48.92 | 21.61 | k__Archaea | 3748 | 76 | Saccharomyces (48%) | 46.40%
3 | bins.27 | 48.4 | 3.7 | o__Burkholderiales | 2349 | 711 | Comamonadaceae (23%) |
3 | bins.11 | 44.17 | 9 | k__Archaea | 25880 | 2420 | Ascomycota (25%) |
3 | bins.15 | 44.12 | 0 | o__Lactobacillales | 941 | 13 | Oenococcus (44%) |
3 | bins.37 | 40.99 | 14.69 | k__Bacteria | 48052 | 1545 | Sclerotiniaceae (18%) |
3 | bins.21 | 40.31 | 3.45 | k__Bacteria | 5917 | 1067 | Burkholderiaceae (24%) |
4 | bins.29 | 91.78 | 7.67 | o__Pseudomonadales | 5537 | 155 | Pseudomonas (53%) |
4 | bins.1 | 87.21 | 3.2 | o__Pseudomonadales | 5141 | 940 | Pseudomonas (47%) |
4 | bins.11 | 69.71 | 12.08 | g__Burkholderia | 8594 | 1557 | Burkholderiaceae (42%) |
4 | bins.37 | 69.55 | 35.08 | k__Archaea | 38588 | 2588 | Eukaryota (18%) |
4 | bins.21 | 64.18 | 0 | o__Rhodospirillales | 1263 | 63 | Acetobacter (51%) |
4 | bins.14 | 58.72 | 13.06 | o__Burkholderiales | 6913 | 1492 | Burkholderiaceae (23%) |
4 | bins.23 | 55.41 | 39.16 | k__Bacteria | 40933 | 2999 | Eukaryota (20%) |
4 | bins.53 | 54.76 | 14.75 | k__Archaea | 3159 | 166 | Hanseniaspora (61%) |
4 | bins.2 | 54.18 | 67.13 | k__Archaea | 5696 | 888 | Hanseniaspora (61%) |
4 | bins.43 | 51.41 | 19.27 | k__Archaea | 41191 | 7008 | Eukaryota (24%) |
4 | bins.49 | 48.35 | 19.73 | k__Bacteria | 60241 | 2416 | Sclerotiniaceae (19%) |
4 | bins.47 | 47.85 | 70.82 | k__Archaea | 6518 | 152 | Saccharomyces (49%) | 47.10%
4 | bins.39 | 46.55 | 0 | k__Bacteria | 3074 | 22 | Ralstonia (40%) |
4 | bins.6 | 45.99 | 0 | o__Lactobacillales | 652 | 9 | Oenococcus (50%) |
4 | bins.28 | 44.01 | 0 | o__Lactobacillales | 879 | 14 | Oenococcus (51%) |
4 | bins.7 | 42.55 | 13.18 | k__Bacteria | 39649 | 531 | Ascomycota (25%) |
5 | bins.3 | 97.86 | 9.76 | o__Lactobacillales | 1909 | 79 | Oenococcus (45%) |
5 | bins.16 | 75.86 | 0 | k__Bacteria | 1705 | 112 | Acetobacter (54%) |
5 | bins.17 | 71.08 | 13.65 | k__Bacteria | 13057 | 1909 | Burkholderiaceae (45%) |
5 | bins.2 | 64.92 | 14.64 | o__Burkholderiales | 6725 | 1374 | Burkholderiaceae (26%) |
5 | bins.15 | 60.99 | 19.2 | k__Archaea | 3803 | 93 | Saccharomyces (46%) | 50.10%
5 | bins.11 | 60.32 | 0.92 | f__Burkholderiaceae | 2844 | 23 | Ralstonia (41%) |
5 | bins.21 | 46.08 | 19.87 | k__Bacteria | 4314 | 287 | Hanseniaspora (64%) |
5 | bins.18 | 43.89 | 8.13 | k__Bacteria | 30845 | 4394 | Ascomycota (26%) |
5 | bins.19 | 41.85 | 3.45 | k__Bacteria | 2500 | 848 | Comamonadaceae (23%) |
5 | bins.1 | 41.5 | 6.9 | k__Bacteria | 46146 | 8915 | Sclerotiniaceae (20%) |
6 | bins.25 | 98.88 | 0.84 | f__Bacillaceae | 3344 | 225 | Bacillaceae (38%) |
6 | bins.12 | 89.97 | 4.04 | f__Burkholderiaceae | 4638 | 64 | Ralstonia (40%) |
6 | bins.16 | 65.35 | 41.65 | k__Archaea | 35818 | 3188 | Eukaryota (10%) |
6 | bins.11 | 53.51 | 23.51 | g__Burkholderia | 10261 | 3022 | Burkholderiaceae (33%) |
6 | bins.28 | 46.63 | 16.65 | k__Bacteria | 64439 | 5258 | Sclerotiniaceae (20%) |
6 | bins.29 | 45.42 | 18.12 | k__Bacteria | 4600 | 93 | Saccharomyces (48%) | 61.50%
6 | bins.23 | 44.32 | 1.5 | o__Pseudomonadales | 3048 | 1000 | Pseudomonas (54%) |
6 | bins.26 | 42.57 | 18.56 | k__Bacteria | 3232 | 378 | Hanseniaspora (63%) |
6 | bins.9 | 41.99 | 16.49 | k__Archaea | 4005 | 1718 | fungi (29%) |
6 | bins.22 | 41.5 | 17.3 | k__Bacteria | 37292 | 4852 | fungi (30%) |
7 | bins.19 | 99.94 | 3.21 | f__Burkholderiaceae | 5311 | 66 | Ralstonia (39%) |
7 | bins.16 | 99.29 | 0.16 | o__Rickettsiales | 1281 | 63 | Rickettsia (36%) |
7 | bins.12 | 98.13 | 0 | o__Lactobacillales | 1809 | 21 | Oenococcus (48%) |
7 | bins.8 | 93.44 | 7.2 | o__Pseudomonadales | 5926 | 684 | Pseudomonas (49%) |
7 | bins.26 | 92.87 | 4.95 | f__Bacillaceae | 3278 | 619 | Bacillaceae (29%) |
7 | bins.22 | 58.15 | 25.7 | k__Bacteria | 5831 | 440 | fungi (30%) |
7 | bins.27 | 57.72 | 81.35 | k__Archaea | 7516 | 1648 | Hanseniaspora (59%) |
7 | bins.24 | 51.38 | 37.36 | k__Bacteria | 39905 | 3265 | Eukaryota (10%) |
7 | bins.11 | 50.92 | 1.24 | o__Rhodospirillales | 2205 | 641 | (41%) |
7 | bins.29 | 50.25 | 12.28 | k__Archaea | 3117 | 66 | Saccharomyces (45%) | 43.30%
7 | bins.4 | 47.83 | 27.92 | k__Archaea | 3938 | 79 | Saccharomyces (47%) | 45.30%
7 | bins.5 | 42.31 | 11.48 | k__Bacteria | 33473 | 5347 | Ascomycota (26%) |
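
The reporting rule stated in the caption of Supplementary Table S4 (only draft genomes with an estimated completeness of at least 40% are listed) could be expressed along the following lines. This is a minimal sketch and not the pipeline used for the table: the `DraftGenome` class, the `reported` function and the entry `bins.hypothetical` are illustrative names and data introduced here, while the other three entries are copied from the table.

```python
from dataclasses import dataclass

MIN_COMPLETENESS = 40.0  # percent; threshold stated in the Table S4 caption


@dataclass(frozen=True)
class DraftGenome:
    vineyard: int
    bin_id: str
    completeness: float   # CheckM estimate, percent
    contamination: float  # CheckM estimate, percent


def reported(bins, min_completeness=MIN_COMPLETENESS):
    """Keep bins at or above the threshold, ordered by vineyard and then
    by decreasing completeness (the ordering seen in the table)."""
    kept = [b for b in bins if b.completeness >= min_completeness]
    return sorted(kept, key=lambda b: (b.vineyard, -b.completeness))


bins = [
    DraftGenome(1, "bins.51", 40.9, 26.87),
    DraftGenome(2, "bins.11", 98.66, 0.0),
    DraftGenome(1, "bins.27", 99.53, 3.32),
    # Hypothetical bin below the threshold, added only to show the filter.
    DraftGenome(2, "bins.hypothetical", 12.0, 1.0),
]

for genome in reported(bins):
    print(genome.vineyard, genome.bin_id, genome.completeness, genome.contamination)
```

The filter uses completeness only, as in the caption. Several reported bins combine moderate completeness with high estimated contamination, so contamination would have to be checked separately before using a bin downstream.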