bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1

2 Title: A robust metaproteomics pipeline for a holistic taxonomic and functional

3 characterization of microbial communities from marine particles

4 Authors: Doreen Schultz,1 Daniela Zühlke,1 Jörg Bernhardt,1 Thomas Ben Francis,2 Dirk

5 Albrecht,1 Claudia Hirschfeld,1 Stephanie Markert,3 and Katharina Riedel 1*

6

7 Addresses: 1 Institute of Microbiology, University of Greifswald, Greifswald, Germany.

8 2 Max-Planck Institute for Marine Microbiology, Bremen, Germany.

9 3 Institute of Pharmacy, University of Greifswald, Greifswald, Germany.

10

11 *Corresponding author mailing address: Institute of Microbiology, Center for Functional

12 Genomics of Microbes (C_FunGene), University of Greifswald, Felix-Hausdorff-Straße 8, DE-

13 17487 Greifswald, Germany. E-mail [email protected]; Tel (+49) 3834 420 5900

14

15 Running title: Metaproteomic pipeline to analyse marine particles

1

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

16 Originality-Significance Statement

17 Marine particles consist of organic particulate matter (e.g. phyto- or zooplankton) and

18 particle-associated (PA) microbial communities, which are often embedded in a sugary

19 matrix. A significant fraction of the decaying algal biomass in marine ecosystems is expected

20 to be mineralized by PA heterotrophic communities, which are thus greatly contributing to

21 large-scale carbon fluxes. Whilst numerous studies have investigated the succession of

22 planktonic marine along phytoplankton blooms, the community structure and

23 functionality of PA bacterial communities remained largely unexplored and knowledge on

24 specific contributions of these microorganisms to carbon cycling is still surprisingly limited.

25 This has been mostly been due to technical problems, i.e. to the difficulty to retrieve genomic

26 DNA and proteins from these polysaccharide-rich entities, their enormous complexity and the

27 high abundance of eukaryotic microorganisms.

28 Our study presents an innovative, robust, reproducible, and reliable metaproteomics

29 pipeline for marine particles, which will help to address and fill the above-described

30 knowledge gap. Employing the here established workflow enabled us to identify more than

31 5,000 PA proteins, which is, at least to our knowledge, the largest number of protein groups

32 ever assigned to marine particles. Notably, the novel pipeline has been validated by a first,

33 comparative metaproteome analysis of free-living and PA bacterial communities indicating a

34 significant functional shift enabling surface-associated bacteria to adapt to particle-specific

35 living conditions. In conclusion, our novel metaproteomics pipeline presents a solid and

36 promising methodological groundwork for future culture-independent analyses of seasonal

37 taxonomic and functional successions of PA microbial communities in aquatic habitats.

2

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

38 Summary

39 This study aimed to establish a robust, reproducible and reliable metaproteomic pipeline for

40 an in-depth characterization of marine particle-associated (PA) bacteria. To this end, we

41 compared six well-established protein extraction protocols together with different MS-sample

42 preparation techniques using particles sampled during a North Sea spring algae bloom in

43 2009. In this optimized workflow, proteins are extracted using a combination of SDS-

44 containing lysis buffer and cell disruption by bead-beating, separated by SDS-PAGE, in-gel

45 digested and analysed by LC-MS/MS, before MASCOT search against a metagenome-

46 based database and data processing/visualization with the in-house-developed

47 bioinformatics tools Prophane and Paver.

48 As proof of principle, free-living (FL) and particulate communities sampled in April 2009

49 were analysed, resulting in an as yet unprecedented number of 9,354 and 5,034 identified

50 protein groups for FL and PA bacteria, respectively. Our data revealed that FL and PA

51 communities appeared similar in their taxonomic distribution, with notable exceptions:

52 eukaryotic proteins and proteins assigned to , Cyanobacteria, and some

53 proteobacterial genera were found more abundant on particles, whilst overall proteins

54 belonging to were more dominant in the FL fraction. In contrast, significant

55 functional differences including proteins involved in polysaccharide degradation, sugar- and

56 phosphorus uptake, adhesion, motility, and stress response were detected.

3

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

57 Introduction

58 The oceans provide about one half of the global net primary production (NPP) (Field et al.,

59 1998; Falkowski et al., 1998; Azam and Malfatti, 2007), a large part of which is reprocessed

60 by bacterioplankton in the so-called “microbial loop”, a pathway during which dissolved

61 organic carbon is returned to higher trophic levels via its incorporation into bacterial biomass

62 (Azam, 1998). Notably, about 20% of the bacterioplankton lives attached to algae or marine

63 particles (Azam et al., 1983). These marine particles consist of various kinds of organic

64 matter, i.e. dead/dying zoo- or phytoplankton, bacterioplankton, as well as inorganic small

65 particles held together by a sugary matrix consisting of polysaccharide-composed

66 transparent extracellular particles (TEPs) composed of polysaccharides, which are exuded

67 mostly by phytoplankton but also bacteria (Alldredge et al., 1993). These particle-associated

68 (PA) microbial communities present one of the most ancient forms of “multicellularity” (Hall-

69 Stoodley et al., 2004) and have adapted to strive and survive in marine environments. Whilst

70 some bacteria are only loosely associated with algae, others colonize algal surfaces

71 (Grossart, 1999), where they form commensalistic or symbiotic communities with their host

72 or even predate on algae (Sohn et al., 2004; Amin et al., 2012). Marine particles grow while

73 sinking and thus contribute largely to the “biological pump” by transporting carbon to deeper

74 waters and sediments (Volkman and Tanoue, 2002). These aggregates may reach several

75 centimetres in diameter. They are enzymatically well equipped to metabolize high molecular

76 weight substrates, thus providing nutrition to the attached community as well as leaving

77 nutrients to the surrounding water column community (Simon et al., 2002; Grossart, 2010).

78 About one decade ago, scientists started to link molecular systems biology of

79 microorganisms to ecosystem level processes (e.g. reviewed in Raes and Bork, 2008).

80 Metagenomic studies were initiated to provide valuable knowledge about diversity and

81 distribution of microorganisms in natural environments. Moreover, metatranscriptomics and

82 metaproteomics approaches were established to investigate, which genes are expressed at

83 a given time point and which proteins are particularly abundant in complex biological

84 systems. Metaproteomics has meanwhile widely proven its potential to revisit microbial 4

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

85 ecology concepts by linking genetic and functional diversity in microbial communities and

86 relating taxonomic and functional diversity to ecosystem stability (Schneider and Riedel,

87 2010). Numerous studies, describing large-scale proteome analyses of acid-mine drainage

88 (AMD) biofilms (Ram et al., 2005), wastewater treatment plants (Wilmes et al., 2008), and

89 fresh-water stream biofilms (Hall et al., 2012) have demonstrated the power of

90 metaproteomics to unveil molecular mechanisms involved in function, physiology, and

91 evolution of surface-associated aquatic microbial communities. Marine metaproteomics has

92 meanwhile been widely applied (Saito et al., 2019; Wang et al., 2014), in particular in

93 habitats such as ocean scale shifts (Morris et al., 2010), the Atlantic (Bergauer et al., 2017)

94 or Antarctic oceans (Williams et al., 2012), e.g., to investigate Roseobacter clade (Christie-

95 Oleza and Armengaud 2015) and bacterioplankton (e.g. Wöhlbrand et al., 2017a) physiology.

96 Teeling et al. (2012) studied the bacterioplankton response to a bloom in the North

97 Sea by an integrated meta-omics approach employing metagenomics and metaproteomics

98 and provided strong evidence that distinct free-living (FL) populations of ,

99 , and Alphaproteobacteria specialized in a successive decomposition

100 of algal-derived organic matter. As mentioned above, a significant fraction of decaying algal

101 biomass is, however, mineralized by heterotrophic bacteria living on particles, which process

102 a large fraction of the biosynthesized organic matter (Azam, 1998) and are thus greatly

103 contributing to large-scale carbon fluxes (Battin et al., 2003; Bauer et al., 2006).

104 So far, the majority of the published studies focused on FL bacterioplankton, thereby

105 leaving the PA bacterial communities largely unexplored. The particulate fraction of

106 bacterioplankton has proven challenging to comprehensive meta-omics characterization, due

107 to its high complexity, presence of DNA/protein-binding polysaccharides, process-interfering

108 substances and lack of (meta)genomic information on marine particles (e.g. Wöhlbrand et al.,

109 2017b), despite the fact that information on marine metagenomes is constantly growing

110 (reviewed by Mineta and Gojobori 2016; Alma'abadi et al., 2015). Previous experiments also

111 indicate that a high abundance of eukaryotic proteins contributes to these challenges (Smith

112 et al., 2017; Saito et al., 2019). 5

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

113 Our goal was therefore to establish a robust, reproducible and reliable pipeline

114 enabling in-depth metaproteomics analyses of marine particles by testing different

115 established protocols for their applicability for protein extraction from PA bacterioplankton in

116 order to unravel the PA community’s specific contribution to polysaccharide decomposition in

117 marine habitats. As stated above, heterotrophic microbial communities are supposed to be

118 well equipped to metabolize algal high molecular weight substrates (Simon et al., 2002;

119 Grossart, 2010). We thus hypothesize that these communities either differ taxonomically

120 from their FL counterparts (as shown by Bizic-Ionescu et al. (2015)) and/or express different

121 genes to adapt to the sessile life style and the availability of specific polysaccharides such as

122 insoluble glycan fibres (as observed by Ganesh et al. (2014)). Moreover, we postulated that

123 this adaption will mostly affect the expression of proteins involved in motility, adhesion, stress

124 response as well as CAZymes together with appropriate sugar transporters. In order to test

125 these hypotheses, a metaproteomics pipeline for marine particles was established and tested

126 as a proof of concept on spring bloom samples collected in April 2009, for which the FL

127 bacterial fraction had previously been characterized (Teeling et al., 2012).

6

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

128 Results and Discussion

129 Establishment of a metaproteomics pipeline for PA microbial communities

130 As stated above, the metaproteomics analysis of PA microbial communities is severely

131 hampered by their high complexity, the presence of a large proportion of eukaryotic proteins,

132 the sugary particle-matrix as well as the lack of (meta)genomic information on PA-specific

133 pro- and (Wöhlbrand et al., 2017b; Saito et al., 2019). Whilst the metaproteomics

134 analyses by Teeing et al. (2012) and Kappelmann et al. (2019) of FL bacterioplankton

135 (harvested on 0.2 µm filters) sampled during spring blooms from 2009 to 2012 at

136 “Kabeltonne” Helgoland resulted in the identification of several thousand protein groups, the

137 PA microbial communities retained on 3 and 10 µm pore-sized filters emerged as difficult to

138 analyse by the integrated metagenomic/metaproteomic approach employed at that time.

139 We thus aimed to develop a suitable and effective metaproteomics pipeline for in-depth

140 analyses of taxonomic composition and functionality of marine PA microbial communities. To

141 this end, six different well-established protein extraction protocols for various particulate

142 samples were tested using biomass from 3 µm and 10 µm pore-sized filters (assigned as PA

143 fraction) of the MIMAS bio-archive (www.mimas-project.de) covering different stages of the

144 bloom, which had been collected in 2009 during the spring phytoplankton bloom sampling

145 campaign by Teeling et al. (2012). Moreover, various MS sample preparation protocols and

146 different protein sequence databases were evaluated.

147 [Table 1]

148 Protein extraction. Efficient protein extraction is a crucial step for successful

149 metaproteomics analyses of the microbial communities. In a first step, we therefore tested

150 five different protein extraction methods that employ different strategies and that were

151 already successfully applied for metaproteome analyses of microbial communities from

152 different environments, i.e. sewage sludge (phenol extraction; Kuhn et al., 2011), leaf litter

153 (SDS-TCA; Schneider et al., 2012), stream hyporheic biofilms (SDS-acetone; Hall et al.,

7

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

154 2012), hypersaline microbial mats (bead beating; Moog, 2012), and soil (freezing and

155 thawing; Chourey et al., 2010; Thompson et al., 2008). In addition, the commercially

156 available TRI-Reagent® (Sigma Aldrich) for simultaneous isolation of RNA, DNA and proteins

157 was tested (Table 1). Filter samples used for protocol evaluation originated from several

158 sampling dates in the 2009 spring bloom sampling campaign (9th of February, 7th of April, 21st

159 of April, and 16th of June 2009). Filters were cut into pieces and subjected to the respective

160 extraction protocol (Fig. 1A). Total protein amounts extracted from the filters by each of the

161 applied methods were quite variable. Highest protein yield as determined by the Pierce™

162 BCA Protein Assay and 1D SDS-PAGE was obtained using the SDS-acetone or bead

163 beating approach (Table 1; Fig. 1B). In conclusion, SDS-acetone- and bead beating-based

164 protocols turned out to be most efficient for protein extraction from particles and were

165 therefore used for optimizing the downstream MS sample preparation procedure.

166 MS sample preparation. Total protein was extracted by the SDS-acetone and bead beating

167 method from filters collected on 28th of April 2009 and separated by 1D SDS-PAGE (Fig.

168 1B). Even though MS sample preparation via GeLC MS/MS is more time-consuming

169 compared to 1D or 2D-LC approaches, it has been proven valuable to purify protein extracts

170 and remove polymeric contaminants (e.g. Lassek et al. 2015; Keiblinger and Riedel, 2018)

171 and yields comparable results as LC-based peptide fractionation (Hinzke et al., 2019). To

172 determine whether an increase in the total number of individual gel sub-fractions will lead to

173 more protein identifications, gel lanes were cut in either 10 or 20 equal-sized fractions,

174 proteins were in gel trypsin-digested and the resulting peptides were subjected to LC-MS/MS

175 analysis. Moreover, we tested whether reduction and alkylation of the proteins prior to tryptic

176 digestion increased protein identification rates (Fig. 1B). Searching the acquired spectra in

177 the so far available 0.2 µm 2009 (MIMAS) database (Teeling et al., 2012) revealed that the

178 best results (Fig. 1B, Fig. S1) were obtained by higher fractionation (20 gel pieces) without

179 reduction and alkylation.

180 [Figure 1]

8

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

181 Optimizing databases. Metagenomic sequencing, assembling and annotation of FL (0.2 µm

182 pore-sized filters) and PA (3 and 10 µm pore-sized filters) fractions of water samples

183 collected during the Helgoland spring bloom 2009 was performed in parallel to the

184 optimization of the metaproteomics protocol (for details see experimental procedures).

185 Unfortunately, the coverage and quality of the metagenome sequences of the large

186 particulate fraction (10 µm pore-sized filters) was not sufficient to be correctly assembled,

187 annotated and translated. Thus, the metagenomic database used for subsequent database

188 searches was only composed of sequences of FL bacteria (0.2 µm pore-sized filters) and

189 microbial communities present in the medium particulate fraction (3 µm pore-sized filters).

190 The LC-MS/MS spectra obtained with the bead beating protocol were searched against

191 four different databases to identify the database that results in the highest number of reliably

192 identified protein groups (Fig. S2): (I) the non-redundant NCBI database (NCBInr), (II) a

193 database with Uniprot sequences from abundant bacteria and identified by Teeling

194 et al. (2012) (PABD), (III) the database used by Teeling et al. (2012) containing proteins

195 based on translated metagenomes of FL bacteria (0.2 µm pore-sized filters from different

196 sampling time points) of the spring bloom 2009 (MIMAS) and (IV) a database based on the

197 metagenomes of the 0.2 and 3 µm pore-sized filters from samples of the 14th of April 2009

198 (0.2 + 3 µm 2009). Best results were obtained with the 0.2 + 3 µm 2009 database (Fig. S2),

199 which is not surprising as the resolving power of metaproteome analyses relies heavily upon

200 the database used for protein identification (e.g. Schneider and Riedel, 2010; Teeling et al.,

201 2012). It is, moreover, well accepted that metaproteomic data are most informative in

202 combination with complementary omics approaches, i.e. genomics and transcriptomics (e.g.

203 Banfield et al., 2005; Ram et al., 2005).

204 Based on our results, we finally decided to use the bead beating-based protocol for

205 protein extraction since this method resulted in more reproducible protein yields compared to

206 the SDS-acetone extraction protocol (Fig. 1B). Furthermore, this method was less time-

207 intensive compared to the SDS-acetone approach and resulted in the identification of the

9

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

208 highest number of unique protein groups, most probably due to the effective disintegration of

209 the particulate matrix by EDTA added to the extraction buffer (Passow, 2002). In summary,

210 the optimized metaproteomic pipeline for marine particles (Fig. 2) comprises protein

211 extraction by bead beating, protein fractionation by 1D SDS-PAGE (20 fractions), followed by

212 in-gel tryptic digestion, LC-MS/MS and database search against the matching metagenome

213 database (0.2 + 3 µm 2009).

214 [Figure 2]

215 Proof of principle – comparative metaproteome analyses of FL and PA bacterioplankton

216 As proof of principle, the newly established protocol was applied for a comparative

217 metaproteomic analysis of FL and PA microbial communities sampled at the 14th of April

218 2009 in 0.2 µm - 3 µm (= FL), 3 - 10 µm and ≥ 10 µm (= PA) fractions. Five technical

219 replicates of each sample were subjected to the final optimized workflow and the resulting

220 MS/MS-data were searched against the matching metagenome-based database (0.2 + 3 µm

221 2009). Employing our optimized pipeline, we were able to record 460,000, 360,000, and

222 440,000 spectra, which subsequently led to the identification of 9,354 protein groups (19.4%

223 of spectral IDs), 2,263 protein groups (10.2% of spectral IDs), and 2,771 protein groups

224 (10.7% of spectral IDs) for the 0.2 - 3 µm (Table S1), 3 - 10 µm (Table S2), and ≥ 10 µm

225 (Table S3) fractions, respectively. This is, at least to our knowledge, the largest number of

226 protein groups ever identified for marine particles. Comparable studies addressing

227 metaproteomic analyses of marine sediments of the Bering Sea (Moore et al., 2012), the

228 coastal North Sea, and the Pacific Ocean (Wöhlbrand et al., 2017b) identified less than 10%

229 of the protein identification numbers resulting from the here presented novel metaproteomic

230 pipeline.

231 1,956 of the identified protein groups of the two PA fractions were also identified in the

232 FL fraction and only 276 proteins were exclusively found in the PA fractions (Fig. S3). This

233 suggests that protein expression profiles of planktonic and particulate bacteria vary less than

10

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

234 expected. However, this might also be due to the fact that PA bacteria are known to hop on

235 and off particles and, e.g. as offspring cells searching for a place to settle, may thus only

236 temporarily be part of the planktonic community (Ghiglione et al., 2007; Grossart, 2010;

237 Crespo et al., 2013). Moreover, clogging of filter pores by particles may cause retention of FL

238 bacteria thus contaminating the PA fractions by planktonic bacteria. In addition, the lack of ≥

239 10 µm pore-sized filter metagenomic sequences hampers comprehensive protein

240 identifications in this PA fraction that may result in a virtually lower abundance than

241 expected.

242 Taxonomic differences between FL and PA bacterioplankton. Besides the somewhat

243 unexpected similarity of the FL and PA metaproteomic datasets, the phylogenetic

244 assignment of the identified protein groups indicated some notable taxonomic differences

245 between the FL and PA fractions (Fig. 3, Table S4).

246 [Figure 3]

247 The number of eukaryotic protein groups was significantly higher in the PA fractions

248 (Fig. 3A) and comprised 43% and 54% of the protein groups identified for the 3 - 10 µm and

249 ≥ 10 µm fractions, respectively, in contrast to only 11% of the protein groups identified for the

250 FL phytoplankton. Moreover, the number of viral protein groups was found to be almost three

251 times higher in the two particulate fractions when compared to their planktonic counterpart

252 (Fig. 3A). The most abundant phyla within both, the FL and PA fractions, were

253 Proteobacteria (FL 55%; PA 41% and 39%, 3 µm and 10 µm pore-sized filters) and

254 Bacteroidetes (FL 40%; PA 48% and 47%, 3 µm and 10 µm pore-sized filters). Proteins

255 expressed by Alpha-, Beta- and Gammaproteobacteria were generally more dominant in the

256 FL bacteria, whilst proteins assigned to Cyanobacteria (e.g. Synechococcus, Arthrospira),

257 Opitutae, Flavobacteriia (e.g. Arenitalea, Olleya, Algibacter, Lacinutrix), and some

258 proteobacterial genera (e.g. Oceanicoccus, Candidatus Puniceispirillum, Neptuniibacter,

259 Halioglobus, Ramlibacter) were more abundant in the PA fraction (Fig. 3B). This is in good

260 accordance to other studies confirming that Bacteroidetes have been identified in both, FL 11

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

261 and PA, bacterioplankton (DeLong et al., 1993; Eilers et al., 2001; Abell and Bowman, 2005;

262 Alonso et al., 2007). Moreover, Flavobacteriia have been found highly abundant during

263 phytoplankton blooms indicating that they play an important role as consumers of algal-

264 derived organic matter (Simon et al., 1999; Riemann et al., 2000; Pinhassi et al., 2004;

265 Grossart et al., 2005; Teeling et al., 2016; Chafee et al., 2018).

266 Functional differences between FL and PA bacterioplankton. Notably, differences in the

267 protein profiles between FL and PA bacteria were more evident on the functional level (Fig.

268 4). Most importantly, the SusC/D utilization system, specific glycoside hydrolases, i.e. GH

269 family 1, 13, and 16 (including beta-glucosidases, alpha-1,4-amylases, and exo- and endo-

270 1,3-beta-glycanases), glycosyl transferases and TonB-dependent transporters were found

271 with higher overall expression levels in the PA fractions compared to the FL fraction (Fig.

272 4A). This is in good accordance with the high substrate availability (Caron et al., 1982;

273 Grossart et al., 2003; Fernández-Gómez et al., 2013), especially the presence of highly

274 abundant microalgae storage polysaccharides, i.e. alpha- and beta-glucans (Kroth et al.,

275 2008), in the particles. Sulfatases, capable of cleaving sulphate sugar ester bonds, are

276 contributing to the degradation of specific sulphated algal polysaccharides such as mannans

277 and fucans (Gómez-Pereira et al., 2012). This is well supported by our finding that sulfatases

278 are strongly expressed by PA Flavobacteriia, especially Formosa sp. (Fig. 4A & B).

279 [Figure 4]

280 FL and PA bacteria seem to employ different phosphate acquisition strategies: whilst in

281 FL bacteria phosphate and phosphonate ABC-type transporters appeared highly expressed,

282 PA bacteria rather seem to employ phytases and phosphate-selective porins. As expected,

283 various proteins involved in stress response were differentially expressed. Interestingly,

284 functions involved in oxidative stress defense appeared to be less abundant in the PA

285 fraction (maybe due to shading, reducing solar irradiation stress in the particles), whilst

286 proteins for heavy metal and antibiotic resistance were strongly expressed in the ≥ 10 µm

287 fraction, which also contained the highest proportion of eukaryotic proteins (Fig. 4A). This 12

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

288 might be due to the fact that some algae take up and store heavy metals (Gaudry et al.,

289 2007) and are capable of producing antibiotics (Grossart, 1999). This indicates that close

290 -bacterial interactions in particles require such defense strategies of the associated

291 bacteria. As expected, adhesion proteins as well as proteins involved in motility, i.e. flagella

292 and type IV-pili, were more abundant on the particles, emphasizing their importance for

293 biofilm/aggregate formation (O´Toole & Kolter, 1998; Lemon et al., 2007; Houry et al., 2010;

294 Burke et al., 2011). Interestingly, proteorhodopsin, an inner membrane protein involved in

295 light-dependent energy generation, which has been proposed to enable FL bacteria such as

296 Polaribacter (Fernández-Gómez et al., 2013) and Pelagibacter (Giovannoni et al., 2005) to

297 survive under low nutrient conditions, was also abundantly identified in PA bacteria such as

298 Polaribacter, Paraglaciecola, and Marinosulfomonas in our analyses.

299 Eukaryotes are highly abundant and might contribute to polysaccharide degradation

300 on marine particles. Our metaproteomics data demonstrate a high abundance of

301 eukaryotes on the particles (Fig. 5). Preliminary analyses indicate that these include

302 numerous microalgal groups, e.g. diatoms, Pelagophytes, Raphidiophytes, Cryptophytes,

303 Dinoflagellates and Haptophytes, but also fungi and various protozoa (Table S2 and S3).

304 This clearly sets particles apart from the FL fraction and highlights the importance of direct

305 eukaryote-bacterial interactions in particles. Previous work on FL bacteria showed that

306 bacterial succession was largely independent of phytoplankton composition, and instead

307 determined by broad substrate availability (Teeling et al., 2016). PA bacterial composition is

308 more likely to be directly controlled by algal composition due to the intimate nature of their

309 interactions (Grossart et al., 2005), although functional redundancy may be substantial

310 (Burke et al., 2011). Moreover, eukaryotes may also contribute to polysaccharide

311 degradation in concert with bacteria. For example, fungal taxa can be abundant in marine

312 particles and have been shown to utilize algal polysaccharides such as laminarin

313 (Bochdansky et al., 2016; Cunliffe et al., 2017).

314 [Figure 5]

13

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

315 Conclusions and outlook

316 Our comparative metaproteomic analyses of marine microbial communities living either

317 planktonically or attached to particles resulted in an as yet unequalled number of identified

318 protein groups for marine particles. Interestingly, the great overlap between metaproteomes

319 of FL and PA heterotrophic bacterial communities indicates that taxonomic differences

320 between them might be less pronounced than previously thought. This might be due to the

321 fact that (I) FL bacteria can rapidly adapt to the surface-associated life style, as the majority

322 of these bacteria seems to be also present on the particles and proteins important for biofilm-

323 formation, i.e. motility and adhesion proteins, are also expressed when living planktonically,

324 and that (II) FL or PA-specific bacteria are frequently hopping on and off the particles.

325 Notably, there is strong evidence that bacteria, when living on the particles, express life style-

326 specific functions, i.e. special CAZymes, sugar transporters and proteins involved in certain

327 stress responses, which enable them to cope with the unique living conditions on marine

328 particles.

329 Although our optimized metaproteomic workflow significantly improved the identification

330 rate of PA proteins, the number of protein identifications from the particles is still

331 considerably lower compared to FL bacterial communities. We assume that especially the

332 high abundance of eukaryotic proteins poses problems in protein identification due to the

333 complexity and diversity of microbial eukaryote genomes and the presence of introns and

334 repeats in the metagenomic DNA sequence databases, which hinders peptide identification

335 (Saito et al., 2019). Metaproteome coverage of marine particles could be significantly

336 improved by employing customized databases including eukaryotic metatranscriptomic

337 (RNA-based) sequence data (Keeling et al., 2014). This can be achieved by generating

338 metatranscriptomes from the particular fractions. Alternatively, protein identification could

339 also be substantially improved by extracting already existing metatranscriptomic and

340 metagenomic data from relevant eukaryotic taxa from public databases. Key to the latter

341 approach is reliable information on which eukaryotic organisms make up the particles, which

14

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

342 can be attained by 18S rRNA gene amplicon sequencing. Perspectively, we will extend our

343 analyses on eukaryotic taxa and analyse multiple time points during phytoplankton bloom to

344 investigate succession of taxonomical clades and expressed functions of marine particles

345 from pre-bloom to post-bloom conditions.

15

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

346 Experimental Procedures

347 Bacterial biomass samples. Sampling of bacterioplankton was performed as described

348 previously (Teeling et al., 2012). Briefly, surface water samples were taken during spring

349 2009 at the station “Kabeltonne” (50° 11.3’ N, 7° 54.0’ E) between the main island Helgoland

350 and the minor island Düne about 40 km offshore in the southeastern North Sea in the

351 German Bight. Bacterial biomass for protein extraction was sequentially filtered with

352 peristaltic pumps onto 10 µm, 3 µm, and 0.2 µm pore-sized filters (142 mm diameter) to

353 separate PA and FL bacteria. All filters were stored at -80 °C until further analyses.

354 Testing of protein extraction protocols. To test six different existing protein extraction

355 protocols for their applicability on PA bacteria, filters from several time points containing

356 varying amounts of biomass were chosen. Sample preparation for the metaproteomic

357 analysis included cutting the filters into quarters and subsequently into small pieces (1-2 mm

358 in diameter). Pieces of one quarter filter were transferred into 15 ml falcon tubes and treated

359 according to the respective protocol.

360 Protocol 1 - Phenol. Filter pieces were incubated in 2.4 ml of a 0.1 M NaOH solution for 10

361 min at room temperature and were then sonicated three times for 30 s at 20% power output

362 (Sonopuls HD2200 with microtip MS 73; Bandelin electronic, Germany). Subsequently the

363 sample was centrifuged for 15 min at 12,500 x g at 20 °C to separate the supernatant from

364 the filter pieces. The supernatant was transferred into a new tube and protein extraction

365 using phenol was performed according to the protocol published by Kuhn and colleagues

366 (Kuhn et al., 2011).

367 Protocol 2 - SDS-TCA. Filter pieces were mixed with 5 ml extraction buffer (1% (w/v) SDS,

368 50 mM Tris/HCl, pH 7.0) and vigorously shaken for 2 min at room temperature. The cell

369 disruption by sonication, boiling and shaking was performed according to the protein

370 extraction protocol published by Schneider and colleagues (Schneider et al., 2012).

371 Subsequently, proteins were precipitated with 10% TCA over night at 4 °C. The precipitated

16

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

372 proteins were centrifuged for 20 min at 12,500 x g at 4 °C and the pellet was washed two

373 times in ice-cold acetone.

374 Protocol 3 - TRI-Reagent®. The TRI-Reagent® (Sigma-Aldrich, product-number T9424) is

375 used for the simultaneous isolation of RNA, DNA, and proteins. Filter pieces were transferred

376 into 4 ml TRI-Reagent and shaken vigorously for 5 min. Subsequently the proteins were

377 extracted according to the manufacturer’s guidelines.

378 Protocol 4 - Freeze and Thaw. Protein extraction was carried out according to a combination

379 of the extraction protocols of Chourey et al. (2010) and Thompson et al. (2008). To this end,

380 filter pieces were mixed with 4 ml lysis buffer (5% SDS, 50 mM Tris/HCl, 0.1 mM EDTA,

381 0.15 M NaCl, 1 mM MgCl2, 50 mM DTT, pH 8.5) and vigorously shaken for 3 min.

382 Subsequently, the samples were boiled for 10 min, followed by two freezing and thawing

383 cycles with liquid nitrogen. After cooling at 4 °C the samples were vigorously shaken for

384 3 min. To remove cell debris, samples were centrifuged for 20 min at 12,500 x g at 4 °C. The

385 proteins in the supernatant were precipitated with 25% TCA over night at 4 °C. Precipitated

386 proteins were centrifuged for 20 min at 12,500 x g at 4 °C and the resulting protein pellet was

387 washed with ice-cold acetone.

388 Protocol 5 - SDS-Acetone. Filter pieces were mixed with 5 ml extraction buffer (50 mM Tris,

389 1% (w/v) SDS, pH 7.5) and vortexed vigorously. Proteins were extracted by sonication,

390 boiling and shaking as described by Hall and colleagues (Hall et al., 2012). Subsequently,

391 proteins were precipitated with five volumes acetone over night at -20 °C. The precipitated

392 proteins were centrifuged for 20 min at 12,500 x g at 4 °C and the pellet washed two times in

393 ice-cold acetone.

394 Protocol 6 – bead beating. Protein extraction was carried out according to the extraction

395 protocol of Moog (2012), which is based on the protocol of Teeling and colleagues (Teeling

396 et al., 2012). To this end, filter pieces were covered with 4 ml lysis buffer (0.1 M DTT, 0.01 M

397 EDTA, 10% Glycerol (v/v), 1.7 mM PMSF, 5% SDS (w/v), 0.05 M Tris/HCl, pH 6.8) and 2 ml

398 glass beads (0.1 – 0.11 mm diameter) were added. The cells on the filter pieces were 17

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

399 subsequently disrupted four times for 30 s with 6.5 m/s via bead beating with a Fast Prep™-

400 24 (MP Biomedicals, Germany). To remove cell debris and glass beads, samples were

401 centrifuged for 20 min at 12,500 x g at 4 °C and the supernatant was transferred into new

402 tubes. This washing step was repeated 2 to 4 times until the beads were colourless. The

403 glass beads were washed with 3 ml lysis buffer and vigorously shaken. Proteins enriched in

404 the pooled supernatants were precipitated with 1:4 acetone at -20 °C over night. Precipitated

405 proteins were centrifuged for 20 min at 12,500 x g at 4 °C and the resulting protein pellet

406 washed with ice-cold acetone.

407 All resulting protein pellets were air-dried and resolved in 8 M urea / 2 M thiourea.

408 Determination of protein concentrations. . Protein concentrations were determined using

409 the Pierce™ BCA Protein Assay Kit (Thermo Fisher Scientific). Protein extracts were

410 prepared with the Compat-Able™ Protein Assay Preparation Reagent Kit (Thermo Fisher

411 Scientific) according to the manufacturer’s guidelines.

412 SDS-PAGE protein separation. 30 µg protein or 30 µl protein extract was mixed with

413 4x SDS sample buffer (20% glycerol, 100 mM Tris/HCl, 10% (w/v) SDS, 5% β-

414 mercaptoethanol, 0.8% bromphenol blue, pH 6.8) and loaded on TGX precast 4-20% gels

415 (Biorad, Germany). Samples were separated by electrophoresis at 150 V for 45 min. After

416 fixation (10% acetic acid, 40% ethanol, 30 min) the gels were stained with Brilliant Blue G250

417 Coomassie and imaged.

418 Protein digestion and MS-sample preparation. Three different protocols were tested on

419 proteins extracted from 3 µm and 10 µm filters.

420 Protocol 1 - 10 gel pieces. Protein lanes were cut into 10 equal-sized pieces and washed

421 with a buffer containing 50 mM ammoniumbicarbonate and 30% (v/v) acetonitrile. Prior to

422 tryptic digestion, gel pieces were dried in a vacuum concentrator and re-swollen with 2 ng/µl

423 trypsin solution (sequencing grade trypsin, Promega, USA) followed by overnight digestion at

424 37 °C. After digestion the gel pieces were covered with water and peptides were eluted from

18

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

425 the gel in an ultrasonic bath. The eluted peptides were desalted with C18 Millipore® ZipTip

426 columns (Millipore) according to the manufacturer’s guidelines.

427 Protocol 2 - 20 gel pieces. The protein lanes were cut into 20 equal-sized pieces and treated

428 as described above.

429 Protocol 3 - 20 gel pieces with reduction and alkylation. The protein lanes were cut into 20

430 equal pieces and washed with a buffer containing 100 mM ammoniumbicarbonate

431 (NH4HCO3) and 50% (v/v) methanol. Subsequently, proteins were reduced in 50 mM

432 NH4HCO3 containing 10 mM DTT for 30 min at 60 °C, followed by alkylation in 50 mM

433 NH4HCO3 containing 50 mM iodoacetamide (IAA) for 60 min in the dark at room temperature.

434 Prior to tryptic digestion, the gel pieces were dehydrated using 100% acetonitrile and dried,

435 re-swollen with 2 ng/µl trypsin solution and incubated at 37 °C over night. Peptides were

436 eluted from the gel pieces by a six-step procedure, using acetonitrile, 1% (v/v) acetic acid in

437 water, acetonitrile, 10% (v/v) acetic acid and two times acetonitrile. Peptide-containing

438 supernatants were pooled and completely dried in a vacuum concentrator. Samples were

439 subsequently resolved in buffer A (5% (v/v) acetonitrile, 0.1% (v/v) formic acid) and desalted

440 with C18 Millipore® ZipTip columns (Millipore) according to the manufacturer’s guidelines.

441 Constructions of a protein sequence database from marine particle metagenomes.

442 Environmental DNA was extracted from the 0.2 µm, 3 µm and 10 µm pore-sized filters

443 sampled on the 14th of April 2009 by a modified standard protocol of Zhou et al. (1996). In

444 detail one polycarbonate filter was cut into 4 pieces and mixed with 13.5 ml extraction buffer

445 (100 mM Tris-HCl (pH 8.0), 100 mM EDTA (pH 8.0), 100 mM Na-phosphate (pH 8.0), 1.5 M

446 NaCl, 1% CTAB (Hexadecyltrimethylammonium-bromide)). Subsequently 100 µl 10 mg/ml

447 Proteinase K was added and the sample was incubated shaking at 37 °C for 30 min. 1.5 ml

448 20% SDS was added and the sample was incubated shaking at 65 °C for 2 h. The sample

449 was centrifuged at 6,000 x g for 10 min at room temperature and the supernatants were

450 transferred to fresh tubes. Subsequently, an equal volume of chloroform/isoamylalcohol was

451 added and the sample was mixed carefully by shaking and was centrifuged at 10,000 x g for 19

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

452 10 min at room temperature. Afterwards the aqueous upper phase was transferred into a

453 new tube and the DNA was precipitated by addition of 0.6 volumes isopropanol. The sample

454 was moderately shaken over night at 4°C. After centrifugation at 50,000 x g for 20 min at

455 room temperature, the pellet was washed with 10 ml 80% (v/v) ethanol and dried. The pellet

456 was resuspended in 200 µl TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 7) and stored at -20

457 °C until sequencing.

458 DNA was sequenced at the Max Planck Sequencing Centre (Cologne, Germany), using the

459 Illumina HiSeq 2500 platform and 2 x 250 bp chemistry. Sequences were then trimmed using

460 bbduk v35.14 (http://bbtools.jgi.doe.gov) with the following parameters: ktrim = r k = 28

461 mink = 12 hdist = 1 tbo = t tpe = t qtrim = rl trimq = 20 minlength = 100. Read quality for each

462 sample was then confirmed using FastQC v0.11.2 (Andrews, 2010). Trimmed and filtered

463 reads from the three metagenomic datasets were then assembled individually. The 0.2 µm

464 pore-sized filter sample was assembled with metaSPAdes v3.10.1 (Nurk et al., 2017) with

465 kmers of length 21, 33, 55, 77, 99, and 127, and error correction mode switched on.

466 Assembly of the larger size fraction was done with MEGAHIT v1.1.3 (Li et al., 2016) with

467 kmers 21, 33, 55, 77, 99, 127, 155, 183, and 211. Assembled contigs longer than 1500 base

468 pairs were kept for gene predictions. Genes were predicted and annotated using Prokka

469 v1.11 (Seeman, 2014), which implements prodigal v2.6.3 (Hyatt et al., 2010) for ORF

470 prediction.

471 Raw read sequences and assembled contig sequences have been deposited in the

472 European nucleotide archive (ENA) under the project accession number PRJEB2888.

473 LC-MS/MS data acquisition and data analysis. Peptides were separated by reversed-

474 phase chromatography on an Easy-nLC 1000 (Thermo Scientific) with self-packed C18

475 analytical columns (100 µm × 20 cm) and coupled to a LTQ Orbitrap Velos mass

476 spectrometer(Thermo Scientific) using a non-linear binary gradient of 80 minutes from 5 %

477 solvent A (0.1 % (v/v) acetic acid) to 99 % solvent B (0.1 % acetic acid (v/v), 99.9 %

478 acetonitrile (v/v)) and a flow rate of 300 nl/min. Survey scans at a resolution of 30,000 were 20

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

479 recorded in the Orbitrap analyser (m/z 300 - 1700) and the 20 most intense precursor ions

480 were selected for CID fragmentation in the LTQ. Dynamic exclusion of precursor ions was

481 enabled; single-charged ions and ions with unknown charge state were excluded from

482 fragmentation. Internal lock mass calibration was enabled (lock mass 445.120025).

483 The mass spectrometry raw data were converted into mgf files using MSConvert (64-bit,

484 Proteowizard 3) and subsequently subjected to database searching via Mascot (Matrix

485 Science; version 2.6.0). Four different protein sequence databases were used for peptide to

486 spectrum matching: I) the non-redundant NCBI database (NCBI nr - NCBIprot_20171030

487 database (136,216,794 entries)), II) a database containing protein sequences of abundant

488 bacteria and diatoms (PABD) based on the study of Teeling et al. (2012), and retrieved from

489 Uniprot KB (Uniprot_DoS_complete_20170829 database (2,638,314 entries)), III) a database

490 containing protein sequences of the free-living fraction from Teeling et al. (2012) (0.2 µm

491 2009 (MIMAS) - MIMAS_forward_reverse_all_contaminants database (1,579,724 entries)),

492 and IV) a database based on translated metagenomes from 0.2 µm and 3 µm filters (see

493 below for details) (0.2 + 3 µm 2009 - 02_plus_3_POMPU_nr97_fw_cont_20181015 database

494 (1,463,571 entries). Mascot was searched with a fragment ion mass tolerance of 0.80 Da

495 and a parent ion tolerance of 10.0 ppm. Oxidation of methionine was specified as a variable

496 modification, trypsin was set as digestion enzyme and a maximum of two missed cleavages

497 was allowed.

498 Scaffold (version Scaffold_4.8.7, Proteome Software Inc.) was used to validate MS/MS

499 based peptide and protein identifications. Peptide identifications were accepted if they could

500 be established at greater than 95.0% probability by the Peptide Prophet algorithm (Keller et

501 al., 2002) with Scaffold delta-mass correction. Protein identifications were accepted if they

502 could be established at greater than 99.0% probability and contained at least one identified

503 peptide. Protein probabilities were assigned by the Protein Prophet algorithm (Nesvizhskii et

504 al., 2003). Proteins that contained similar peptides and could not be differentiated based on

505 MS/MS analysis alone were grouped to satisfy the principles of parsimony. Peptides that

21

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

506 were only found in one of the replicates were excluded from the following data analysis.

507 Mass spectrometry proteomics data have been deposited to the ProteomeXchange

508 Consortium via the PRIDE partner repository (Perez-Riverol et al., 2019) with the data set

509 identifier PXD12699 (reviewer account details: username [email protected];

510 password 5CKUi0AF).

511 For further data analysis, the software ProPHAnE (Proteomics result Pruning and Homology

512 group Annotation Engine; version 3.1.1) (Schneider et al., 2011) was used. For the

513 taxonomical classification of the identified protein groups the NCBI NR database (version

514 2018-08-02; e-value 0.01, query cover 0.9, max-target-seqs 1) and the diamond blastp

515 algorithm (version 0.8.22) were used. For functional classification of the identified protein

516 groups the eggmap database (version 4.5.1, downloaded at 2018-07-31) and the algorithm

517 e-mapper were used.

518 A list of common contaminants was added to all translated ORF sequences found by

519 metagenome analysis of the 0.2 µm and 3 µm filters from the sampling date 14th of April

520 2009. Redundant sequences were eliminated (97% redundancy, elimination of shorter

521 sequence) using CD-HIT (www.cdhit.org), a program for clustering and comparing protein or

522 nucleotide sequences, resulting in the database

523 02_plus_3_POMPU_nr97_fw_cont_20181015 (1,463,571 entries).

524

525 Acknowledgements.

526 We thank the DFG for financial support (RI 969/9-1) in the frame of the Research Unit

527 “POMPU” (FOR 2406). We are grateful to Sabine Kühn for technical assistance and to

528 Thomas Schweder for helpful discussions.

529

530

22

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

531 References

532 Abell, G.C.J., and Bowman, J.P. (2005) Colonization and community dynamics of class

533 Flavobacteria on diatom detritus in experimental mesocosms based on Southern Ocean

534 seawater. FEMS Microbiol Ecol 53: 379-391.

535 Alma'abadi, A.D., Gojobori, T., and Mineta, K. (2015) Marine metagenome as a resource for

536 novel enzymes. GPB 13: 290-295.

537 Alldredge, A.L., Passow, U., and Logan, B. (1993) The existence, abundance, and

538 significance of large transparent exopolymer particles in the ocean. Deep-Sea Res I 40:

539 1131-1140.

540 Alonso, C., Warnecke, F., Amann, R., and Pernthaler, J. (2007) High local and global

541 diversity of Flavobacteria in marine plankton. Environ Microbiol 9: 1253-1266.

542 Amin, S.A., Parker, M.S., and Armbrust, E.V. (2012) Interactions between Diatoms and

543 Bacteria. MMBR 76: 667-684.

544 Andrews, S. (2010) FastQC: a quality control tool for high throughput sequence data.

545 Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc

546 Azam, F., Fenchel, T., Field, J.G., Gray, J.S., Meyer-Reil, L.A., and Thingstad, F. (1983) The

547 Ecological Role of Water-Column Microbes in the Sea. Mar Ecol Prog Ser 10: 257-263.

548 Azam F. (1998) Microbial control of Oceanic carbon flux: the plot thickens. Science 280: 694-

549 696.

550 Azam, F., and Malfatti, F. (2007) Microbial structuring of marine ecosystems. Nature Reviews

551 5: 782-791.

552 Banfield, J.F., Verberkmoes, N.C., Hettich, R.L., and Thelen, M.P. (2005) Proteogenomic

553 approaches for the molecular characterization of natural microbial communities. OMICS 9:

554 301-333.

555 Battin, T.J., Kaplan, L.A., Denis Newbold, J., and Hansen, C.M. (2003) Contributions of

556 microbial biofilms to ecosystem processes in stream mesocosms. Nature 426: 439-442.

23

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

557 Bauer, M., Kube, M., Teeling, H., Richter, M., Lombardot, T., Allers, E., et al. (2006) Whole

558 genome analysis of the marine Bacteroidetes ‘Gramella frosetii’ reveals adaptations to

559 degradation of polymeric organic matter. Environ Microbiol 8: 2201-2213.

560 Bergauer, K., Fernandez-Guerra, A., Garcia, J.A.L., Sprenger, R.R., Stepanauskas, R.,

561 Pachiadaki, M.G., et al. (2018) Organic matter processing by microbial communities

562 throughout the Atlantic water column as revealed by metaproteomics. Proc Natl Acad Sci

563 U S A 115: E400-E408.

564 Bižić-Ionescu, M., Zeder, M., Ionescu, D., Orlić, S., Fuchs, B.M., Grossart, H.P., et al. (2015)

565 Comparison of bacterial communities on limnic versus coastal marine particles reveals

566 profound differences in colonization. Environ Microbiol 17: 3500-3514.

567 Bochdansky, A.B., Clouse, M.A., and Herndl, G.J. (2016). Eukaryotic microbes, principally

568 fungi and , dominate biomass on bathypelagic marine snow. ISME J

569 11: 362-373.

570 Burke, C., Steinberg, P., Kjelleberg, S., and Thomas, T. (2011) Bacterial community

571 assembly based on functional genes rather than . PNAS 108: 14288-14293.

572 Caron, D.A., Davis, P.G., Madin, L.P., and Sieburth, J.M. (1982) Heterotrophic bacteria and

573 bacterivorous protozoa in oceanic macroaggregates. Science 218: 795-797.

574 Chafee, M., Fernàndez-Guerra, A., Buttigieg, P.L., Gerdts, G., Eren, A.M., Teeling, H., et al.

575 (2018) Recurrent patterns of microdiversity in a temperate coastal marine environment.

576 ISME J 12: 237-252.

577 Chourey, K., Jansson, J., VerBerkmoes, N., Shah, M., Chavarria, K.L., Tom, L.M., et al.

578 (2010) Direct cellular lysis/protein extraction protocol for soil metaproteomics. J Proteome

579 Res 9: 6615-6622.

580 Christie-Oleza, J.A., and Armengaud, J. (2015) Proteomics of the Roseobacter clade, a

581 window to the marine microbiology landscape. Proteomics 15: 3928-3942.

582 Crespo, B.G., Pommier, T., Fernández-Gómez, B., and Pedrós-Alió, C. (2013) Taxonomic

583 composition of the particle-attached and free-living bacterial assemblages in the

584 Northwest Mediterranean Sea. Microbiology Open 2: 541-552.

24

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

585 Cunliffe, M., Hollingsworth, A., Bain, C., Sharma, V., and Taylor, J.D. (2017) Algal

586 polysaccharide utilisation by saprotrophic planktonic marine fungi. Fungal Ecol 30: 135-

587 138.

588 DeLong, E.F., Franks, D.G., and Alldredge, A.L. (1993) Phylogenetic diversity of aggregate-

589 attached vs. free-living marine bacterial assemblages. Limnol Oceanogr 38: 924-934.

590 Eilers, H., Pernthaler, J., Peplies, J., Glöckner, F.O., Gerdts, G., and Amann, R. (2001)

591 Isolation of novel pelagic bacteria from the German Bight and their seasonal contribution

592 to surface picoplankton. AEM 67: 5134-5142.

593 Falkowski, P.G., Barber, R.T., and Smetacek, V. (1998) Biogeochemical controls and

594 feedbacks on ocean primary production. Science 281: 200-206.

595 Fernández-Gómez, B., Richter, M., Schüler, M., Pinhassi, J., Acinas, S., González, J.M. et

596 al. (2013) Ecology of marine Bacteroidetes. A comparative genomics approach. ISME J 7:

597 1026-1037.

598 Field, C.B., Behrenfeld, M.J., Randerson, J.T., and Falkowski, P. (1998) Primary production

599 of the biosphere: integrating terrestrial and oceanic components. Science 281: 237-240.

600 Ganesh, S., Parris, D.J., DeLong, E.F., and Stewart, F.J. (2014) Metagenomic analysis of

601 size-fractionated picoplankton in a marine oxygen minimum zone. ISME J 8: 187-211.

602 Gaudry, A., Zeroual, F., Gaie-Levrel, F., Moskura, M., Boujrhal, F.Z., Cherkaoui, E. et al.

603 (2007) Heavy metals pollution of the atlantic marine environment by the Moroccan

604 phosphate industry, as observed through their bioaccumulation in Ulva lactuca. Water Air

605 Soil Pollut 178: 267-285.

606 Ghiglione, J.F., Mevel, G., Pujo-Pay, M., Mousseau, L., Lebaron, P., and Goutx, M. (2007)

607 Diel and seasonal variations in abundance, activity, and community structure of particle-

608 attached and free living bacteria in NW Mediterranean Sea. Microb Ecol 54: 217-231.

609 Giovannoni, S.J., Tripp, H.J., Givan, S., Podar, M., Vergin, K.L., Baptista, D., et al. (2005)

610 Genome Streamlining in a Cosmopolitan Oceanic Bacterium. Science 309: 1242-1245.

25

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

611 Gómez-Pereira, P.R., Schüler, M., Fuchs, B.M., Bennke, C., Teeling, H., Waldmann, J., et al.

612 (2012) Genomic content of uncultured Bacteroidetes from contrasting oceanic provinces

613 in the North Atlantic Ocean. Environ Microbiol 14: 52-66.

614 Grossart, H.P. (1999) Interactions between marine bacteria and axenic diatoms

615 (Cylindrotheca fusiformis, Nitzschia laevis, and Thalassiosira weissflogii) incubated under

616 various conditions in the lab. Aquat. Microb Ecol 19: 1-11.

617 Grossart, H.P., Hietanen, S., and Ploug, H. (2003) Microbial dynamics on diatom aggregates

618 in Oresund, Denmark. Mar Ecol Prog Ser 249: 69-78.

619 Grossart, H.P., Levold, F., Allgaier, M., Simon, M., and Brinkhoff, T. (2005) Marine diatom

620 species harbour distinct bacterial communities. Environ Microbiol 7: 860-873.

621 Grossart, H.P. (2010) Ecological consequences of bacterioplankton lifestyles: changes in

622 concepts are needed. Environ Microbiol Rep 2: 706-714.

623 Hall, E.K., Besemer, K., Kohl, L., Preiler, C., Riedel, K., Schneider, T., et al. (2012) Effects of

624 resource chemistry on the composition and function of stream hyporheic Biofilms. Front

625 Microbiol 3: 1-14.

626 Hinzke, T., Kouris, A., Hughes, R.A., Strous, M., and Kleiner, M. (2019) More Is Not Always

627 Better: Evaluation of 1D and 2D-LC-MS/MS Methods for Metaproteomics. Front Microbiol

628 10: 238.

629 Houry, A., Briandet, R., Aymerich, S., and Gohar, M. (2010) Involvement of motility and

630 flagella in Bacillus cereus biofilm formation. Microbiology 156: 1009-1018.

631 Hyatt, D., Chen, G.L., LoCascio, P.F., Land, M.L., Larimer, F.W., and Hauser, L.J. (2010)

632 Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC

633 Bioinformatics 11: 119.

634 Kappelmann, L., Krüger, K., Hehemann, J.H., Harder, J., Markert, S., Unfried, F., et al.

635 (2019) Polysaccharide utilization loci of North Sea Flavobacteriia as basis for using

636 SusC/D-protein expression for predicting major phytoplankton glycans. ISME J 13: 76-91

637 Keeling, P.J., Burki, F., Wilcox, H.M., Allam, B., Allen, E.E., Amaral-Zettler, L.A., et al. (2014)

638 The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP):

26

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

639 Illuminating the functional diversity of eukaryotic life in the oceans through transcriptome

640 sequencing. PLoS Biol 12: e1001889-6.

641 Keiblinger, K.M., and Riedel, K. (2018) Sample Preparation for Metaproteome Analyses of

642 Soil and Leaf Litter. Methods Mol Biol 1841: 303-318.

643 Keller, A., Nesvizhskii, A.I., Kolker, E., and Aebersold, R. (2002) Empirical statistical model to

644 estimate the accuracy of peptide identifications made by MS/MS and database search.

645 Anal Chem 74: 5383-5392.

646 Kroth, P.G., Chiovitti, A., Gruber, A., Martin-Jezequel, V., Mock, T., Schnitzler Parker, M., et

647 al. (2008). A model for carbohydrate metabolism in the diatom Phaeodactylum tricornutum

648 deduced from comparative whole genome analysis. PLoS One 3: e1426.

649 Kuhn, R., Benndorf, D., Rapp, E., Reichl, U., Palese, L.L., and Pollice, A. (2011)

650 Metaproteome analysis of sewage sludge from membrane bioreactors. Proteomics 11:

651 2738-2744.

652 Lassek, C., Burghartz, M., Chaves-Moreno, D., Otto, A., Hentschker, C., Fuchs, S., et al.

653 (2015) A metaproteomics approach to elucidate host and pathogen protein expression

654 during catheter-associated urinary tract infections (CAUTIs). Mol Cell Proteomics 14: 989-

655 1008.

656 Lemon, K.P., Higgins, D.E., and Kolter, R. (2007) Flagellar motility is critical for Listeria

657 monocytogenes biofilm formation. J Bacteriol 189: 1009-1018.

658 Li, D., Luo, R., Liu, C.M., Leung, C.M., Ting, H.F., Sadakane, K., et al. (2016) MEGAHIT

659 v1.0: A Fast and Scalable Metagenome Assembler driven by Advanced Methodologies

660 and Community Practices. Methods 102: 3-11.

661 Mineta, K., and Gojobori, T. (2016). Databases of the marine metagenomics. Gene 576: 724-

662 728.

663 Moog, G. (2012) Optimierung der Probenaufbereitung für Metaproteomanalysen mittels

664 Massenspektrometrie. Diploma thesis at the University of Greifswald, Institute of

665 Microbiology.

27

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

666 Moore, E.K., Nunn, B.L., Goodlett, D.R., Harvey, H.R. (2012) Identifying and tracking

667 proteins through the marine water column: insights into the inputs and preservation

668 mechanisms of protein in sediments. Geochim Cosmochim Acta 83: 324-359.

669 Morris, R.M., Nunn, B.L., Frazar, C., Goodlett, D.R., Ting, Y.S., and Rocap, G. (2010)

670 Comparative metaproteomics reveals ocean-scale shifts in microbial nutrient utilization

671 and energy transduction. ISME J 4: 673-685.

672 Nesvizhskii, A.I., Keller, A., Kolker, E., and Aebersold, R. (2003) A statistical model for

673 identifying proteins by tandem mass spectrometry. Anal Chem 75: 4646-4658.

674 Nurk, S., Meleshko, D., Korobeynikov, A., and Pevzner, P.A. (2017) metaSPAdes: a new

675 versatile metagenomic assembler. Genome Res 27: 824-834.

676 O’Toole, G.A., and Kolter, R. (1998) Flagellar and twitching motility are necessary for

677 Pseudomonas aeruginosa biofilm development. Mol Microbiol 30: 295-304.

678 Perez-Riverol, Y., Csordas, A., Bai, J., Bernal-Llinares, M., Hewapathirana, S., Kundu, D.J.,

679 et al. (2019) The PRIDE database and related tools and resources in 2019: improving

680 support for quantification data. Nucleic Acids Res 47: D442-D450.

681 Pinhassi, J., Sala, M.M., Havskum, H., Peters, F., Guadayol, O., Malits, A., et al. (2004)

682 Changes in bacterioplankton composition under different phytoplankton regimens. AEM

683 70: 6753-6766.

684 Raes, J., and Bork, P. (2008) Molecular eco-systems biology: towards an understanding of

685 community function. Nat Rev Microbiol 6: 693-699.

686 Ram, R.J., Verberkmoes, N.C., Thelen, M.P., Tyson, G.W., Baker, B.J., Blake, R.C., et al.

687 (2005) Community proteomics of a natural microbial biofilm. Science 308: 1915-1920.

688 Riemann, L., Steward, G.F., and Azam, F. (2000) Dynamics of bacterial community

689 composition and activity during a mesocosm diatom bloom. AEM 66: 578-587.

690 Saito, M.A., Bertrand, E.M., Duffy, M.E., Gaylord, D.A., Held, N.A., Hervey, W.J., et al.

691 (2019) Progress and challenges in ocean metaproteomics and proposed best practices for

692 data sharing. J Proteome Res 18: 1461-1476.

28

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

693 Schneider, T., and Riedel, K. (2010) Environmental proteomics. Analysis of structure and

694 function of microbial communities. Proteomics 10: 785-798.

695 Schneider, T., Schmid, E., de Castro Junior, J.V., Cardinale, M., Eberl, L., Grube, M., et al.

696 (2011) Structure and function of the symbiosis partners of the lung lichen (Lobaria

697 pulmonaria L. Hoffm.) analyzed by metaproteomics. Proteomics 11: 2752-2756.

698 Schneider, T., Keiblinger, K.M., Schmid, E., Sterflinger-Gleixner, K., Ellersdorfer, G.,

699 Roschitzki, B., et al. (2012) Who is who in litter decomposition? Metaproteomics reveals

700 major microbial players and their biogeochemical functions. ISME J 6: 1749-1762.

701 Seemann, T. (2014) Prokka: rapid prokaryotic genome annotation. Bioinformatics 30: 2068-

702 2069.

703 Simon, M., Grossart, H.P., Schweitzer, B., and Ploug, H. (2002) Microbial ecology of organic

704 aggregates in aquatic ecosystems. Aquat Microb Ecol 28: 175-211.

705 Simon, M., Glöckner, F.O., and Amann, R. (1999) Different community structure and

706 temperature optima of heterotrophic picoplankton in various regions of the Southern

707 Ocean. Aquat Microb Ecol 18: 275-284.

708 Sohn, J.H., Lee, J.H., Yi, H., Chun, J., Bae, T.S., Ahn, T.Y., et al. (2004) Kordia algicida gen.

709 nov., sp. nov., an aligicidal bacterium isolated from red tide. Int J Syst Evol Microbiol 54:

710 675-80.

711 Teeling, H., Fuchs, B.M., Becher, D., Klockow, C., Gardebrecht, A., Bennke, C.M., et al.

712 (2012) Substrate-controlled succession of marine bacterioplankton populations induced by

713 a phytoplankton bloom. Science 336: 608-611.

714 Teeling, H., Fuchs, B.M., Bennke, C.M., Krüger, K., Chafee, M., Kappelmann, L., et al.

715 (2016) Recurring patterns in bacterioplankton dynamics during coastal spring algae

716 blooms. Elife 5: e1188.

717 Thompson, M.R., Chourey, K., Froelich, J.M., Erickson, B.K., VerBerkmoes, N.C., and

718 Hettich, R.L. (2008) Experimental approach for deep proteome measurements from small-

719 scale microbial biomass samples. Anal Chem 80: 9517-9525.

29

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

720 Volkman, J., and Tanoue, E. (2002) Chemical and biological studies of particulate organic

721 matter in the ocean. J Oceanogr 58: 265-279.

722 Wang, D.Z., Xie, Z.X., and Zhang, S.F. (2014). Marine metaproteomics: current status and

723 future directions. J Proteomics 97: 27-35.

724 Williams, T.J., Long, E., Evans, F., Demaere, M.Z., Lauro, F.M., Raftery, M.J., et al. (2012) A

725 metaproteomic assessment of winter and summer bacterioplankton from Antarctic

726 Peninsula coastal surface waters. ISME J 6: 1883-1900.

727 Wilmes, P., Wexler, M., and Bond, P.L. (2008) Metaproteomics provides functional insight

728 into activated sludge wastewater treatment. PLoS One 3: e1778.

729 Wöhlbrand, L., Wemheuer, B., Feenders, C., Ruppersberg, H.S., Hinrichs, C., Blasius, B., et

730 al. (2017a) Complementary metaproteomic approaches to assess the bacterioplankton

731 response toward a phytoplankton spring bloom in the Southern North Sea. Front Microbiol

732 8: 442.

733 Wöhlbrand, L., Feenders, C., Nachbaur, J., Freund, H., Engelen, B., Wilkes, H., et al.

734 (2017b) Impact of extraction methods on the detectable protein complement of

735 metaproteomic analyses of marine sediments. Proteomics 17: 22.

736 Zhou, J., Bruns, M.A., and Tiedje, J.M. (1995) DNA Recovery from Soils of Diverse

737 Composition. AEM 62: 316-322.

30

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

738 Figure legends

739 Figure 1: (A) Sampling strategy and (B) evaluation of protein extraction and MS

740 sample preparation protocols. (A) Water samples collected at “Kabeltonne” Helgoland

741 during the spring bloom 2009 were sequentially filtered to obtain the 0.2 - 3 µm (FL) and two

742 PA (3 - 10 µm = medium, ≥ 10 µm = large) fractions as described in Teeling et al. (2012).

743 Filters were initially cut into three or four pieces, which were subsequently shredded and

744 mixed with the respective extraction buffer. (B) Filters (medium particles = yellow; large

745 particles = blue) from different sampling time points (turquoise, green and red) were

746 processed according to the six different protocols describe in the experimental procedure

747 section. With regard to the extracted protein amount the bead beating and SDS-acetone

748 approaches obviously outcompeted the four other protocols. However, the SDS-acetone

749 protocol was less reproducible than the bead beating protocol. Considering bead beating and

750 SDS-acetone as best performing protocols, they were employed to test different MS sample

751 preparation approaches, i.e., different number of SDS gel fractions for tryptic digestion

752 together with protein reduction (red.) and alkylation (alk.) prior to tryptic digestion. The

753 subsequent LC-MS/MS analyses revealed best results for the bead beating protocol followed

754 by GeLC-MS/MS from 20 fractions without protein reduction and alkylation as shown in the

755 bottom line of the figure. Bubble sizes for the large (blue) and medium (yellow) particles

756 correspond to the number of identified protein groups (see also Fig. S1).

757

758 Figure 2: Final metaproteomics pipeline. Protein extraction from filters was conducted

759 using 5% (w/v) SDS containing lysis buffer, cell disruption by FastPrep-mediated bead

760 beating, separation of proteins by 1D-SDS-PAGE, tryptic in-gel digestion, LC-MS/MS

761 analyses on an Orbitrap VelosTM mass spectrometer, MASCOT database search against the

762 metagenome-based database (0.2 + 3 µm 2009) and data-processing and visualization with

763 the in-house-developed bioinformatics tools Prophane 3.1 and Paver.

764

31

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

765 Figure 3: Taxonomic affiliation of proteins of FL and PA metaproteomes during the

766 spring bloom on 14th of April 2009 at “Kabeltonne” Helgoland. (A) Distribution of pro-

767 and eukaryotes in the FL (0.2 - 3 µm) and PA (3 - 10 µm, ≥ 10 µm) fractions based on the

768 relative abundance of protein groups assigned to the different phylogenetic groups. (B)

769 Voronoi treemaps visualizing the phylogenetic assignment of bacterial protein groups

770 identified in FL (red) and PA (yellow and blue) fractions. Cell size corresponds to the relative

771 abundance of the respective bacterial genus on protein level. Proteins of Reinekea for

772 example are most abundant in the FL fraction and are therefore encoded by a large red

773 treemap cell. In the PA fractions they can be detected only in traces resulting in very small

774 cell sizes (coloured in yellow and blue). Algibacter protein abundance, on the other hand,

775 was notably higher in the PA fractions, compared to the FL fraction.

776

777 Figure 4: Functional assignment of proteins in FL and PA metaproteomes during the

778 spring bloom on 14th of April 2009 at “Kabeltonne” Helgoland. (A) Total abundance of

779 selected protein groups with assigned functions in the FL (0.2 – 3 µm = small) and PA (3 - 10

780 µm = medium and ≥ 10 µm = large) fractions. (B) Voronoi treemaps showing the

781 phylogenetic assignment of selected functional protein groups identified in FL (red) and PA

782 (yellow and blue) fractions. Cell size corresponds to the relative abundance of the respective

783 genus within specific functional categories.

784

785 Figure 5: Phylogenetic assignment of eukaryotic proteins present in the FL and PA

786 fractions during the spring bloom on 14th of April 2009 at “Kabeltonne” Helgoland. (A)

787 Distribution of different eukaryotes in the FL (0.2 µm) and PA (0.2 - 3 µm and ≥ 10 µm)

788 fractions as shown by the relative protein abundances assigned to the different eukaryotic

789 phylogenetic groups. (B) Voronoi treemap visualizing the relative abundance of eukaryotic

790 taxa based on the abundance of assigned proteins extracted from the FL (red) and PA

791 (yellow and blue) fractions. Cell size corresponds to the relative abundance of the respective

792 genus. In this preliminary analysis, protein identification is based on metagenomic (DNA-

32

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

793 based) information from the filtered fractions, which suffers limitations for eukaryotic protein

794 identification, probably resulting in incomplete functional and taxonomic profiles.

795

796

33

bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

797 Supplemental Material

798 Figure S1. Protein identifications obtained by different extraction and protein pre-

799 fractionation protocols. For the medium particle size fraction (3 - 10 µm, yellow), 20 gel

800 fractions after standard treatment, i.e. without protein reduction (red.) and alkylation (alk.),

801 resulted in the highest number of identified protein groups, no matter which protein extraction

802 protocol (SDS-acetone (red) or bead beating (green)) was applied. For the large particle

803 fraction (≥ 10 µm, blue) the general trend was similar. However, the beat beating protocol

804 performed better compared to the SDS-acetone protocol.

805

806 Figure S2: Number of identified protein groups obtained with different databases: (I)

807 the non-redundant NCBI database (NCBInr, 136,216,794 entries), (II) a database with

808 Uniprot sequences of known abundant bacteria and diatoms identified by the study of

809 Teeling et al. (2012) (PABD, 2,638,314 entries), (III) a metagenome-based database

810 employed for the FL bacterial fraction within the study of Teeling et al. (2012) (MIMAS,

811 1,579,724 entries) and (IV) a database based on translated metagenomes of the FL fraction

812 on the 0.2 µm filters and particles on the 3 µm filters sampled on the 14th of April 2009 (0.2 +

813 3 µm 2009, 1,463,572 entries).

814

815 Figure S3: Venn diagram of overlapping and fraction-specific protein sets.

816

817 Table S1: Prophane output for proteins extracted from 0.2 µm pore-sized filters

818 Table S2: Prophane output for proteins extracted from 3 µm pore-sized filters

819 Table S3: Prophane output for proteins extracted from 10 µm pore-sized filters

820 Table S4: Distribution of phylogenetic groups within proteins extracted from the 0.2 µm, 3

821 µm and 10 µm pore-sized filters.

34 bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Table 1: Comparison of the six tested protein extraction protocols

Protocol Nr. 1 - Phenol 2 - SDS-TCA 3 - TRI-reagent® 4 - Freeze and thaw 5 - SDS-acetone 6 - Bead beating

Reference Kuhn et al. (2011) Schneider et al. (2012) Sigma Aldrich Chourey et al. (2010) Hall et al. (2012) Moog (2012) Thompson et al. (2008) Originally used for sewage sludge from leaf litter simultaneous extraction soil stream hyporheic hypersaline microbial biomembrane reactors of RNA, DNA, and biofilms mats proteins Composition of 0.1 M NaOH 1% (w/v) SDS, 50 mM TRI-Reagent® 5% (w/v) SDS, 50 mM 1% (w/v) SDS, 50 mM 5% (w/v) SDS, 0.05 the protein Tris/HCl, pH 7 (guanidine thiocyanate Tris/HCl, 0.1 mM Tris/HCl, pH 6.8 mM Tris/HCl, 0.1 M extraction buffer and phenol monophase EDTA, 0.15 M NaCl, 1 DTT, 0.01 M EDTA,

solution) mM MgCl2, 50mM DTT, 10% (v/v) glycerol, 1.7 pH 8.5 mM PMSF, pH 6.8 Cell disruption sonication sonication TRI-Reagent® 2 freeze and thaw sonication FastPrep® 6.5 m/s, methodology 3 x 30 s (20% power 3 x 40 s (20% power cycles (liquid nitrogen, 5 x 1 min (20% power 4 x 30 s output) output) rt), 10 min boiling output), 15 min boiling, procedure repeated on the pellet Additional protein phenol extraction (2x) chloroform extraction, / / / / purification ethanol extraction Protein 0.1 M ammonium- 10% TCA 2-propanol (1:1.5) 25% TCA acetone (1:5) acetone (1:4) precipitation acetate in methanol (1:5) Mean total protein amount [µg] 8.9 22.7 16.1 25.2 38.6 27.3 3 - 10 µm fraction Mean total protein amount [µg] 8.8 12.8 38.2 24.3 102.1 114.2 ≥ 10 µm fraction

35

A Kabeltonne Helgoland ≥10µm large (pa*) ≥3µmmedium (pa) bioRxiv preprint doi: https://doi.org/10.1101/667428≥0.2µm small (f*); this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. sampling fltering >10 µm cutting cutting >3 µm +buffer free water flters flter pieces sample ready for sample with cells for comparative cell disruption and analyses protein extraction

B protocol development and validation samples on flters 9.2.2009 16.6.2009 7.4.2009 21.4.2009

protein phenol SDS SDS bead TRI freezing extraction TCA acetone beating Reagent thawing protein amount [µg] legend samples area ≙ 1µg on flters ≥ 3µm 28.4.2009 ≥10µm replicates

protein SDS bead extraction acetone beating

protein amount [µg]

SDS PAGE

classifed as inadequate * * classifed as inadequate protein 20 10 20 20 10 20 fractions pieces pieces pieces pieces pieces pieces bubble size ≙# red. & alk. red. & alk. of identifed protein groups

LC- MS/MS and data base search classifed as inadequate fnal classifed as inadequate protocol bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Kabeltonne Helgoland ≥10µm large (pa*) ≥3µmmedium (pa) ≥0.2µm small (f*) cell sampling fltering >10 µm cutting disruption >3 µm +glassbeads 6.5 m/s free water flters lysis buffer sample ready for 4x30s cell lysate sample with cells cell disruption and and debris protein extraction centrifuging washing pooling centri- protein drying fugation precipitation add 1:4 acetone overnight -20 °C dry protein protein precipitated pooled pellet pellet protein supernatants with proteins re- 8M urea 2M thio- suspending urea 1D SDS tryptic PAGE cutting digestion ...... 30 µg/lane one lane add trypsin 20 fractions protein in 1D gel gel fractions digest sample buffer ZipTip purifcation Mascot Server Prophane Paver signal signal ...

m/z m/z data analysis raw mass spec LC-MS/MS and visualization data measurement digest bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A 100% proteins assigned particle fraction to main taxa color code 80% >0.2 µm >3 µm >10 µm Archaea 0.05 0.12 0.04 60% Bacteria 84.74 44.57 30.84 40% Eukaryota 11.08 43.27 54.32 unclassified 2.76 8.84 11.75 20% various 0.04 0.23 0.26 Viruses 1.33 2.97 2.79 ≥0.2µm 0% free living (f) particle associated (pa) >0.2 µm >3 µm >10 µm particle sizes small medium large

Catenovulum Lacimicrobium Salmonella Candidatus Pseudobacteriovorax Bacteriovorax Enterococcus Bowmanella Riesia Thauera Staphylococcus Citrobacter Lactobacillus Anaerococcus var Myxococcus uc Brachyspira Azovibrio Alteromonadaceae Clostridiales

Salinimonas Calditerricola

Weissella

Streptococcus

Plesiocystis Anaeromyxobacter Clostridium uc Marinomonas Deltaproteobacteria Marinobacterium Glaciecola Alishewanella Escherichia Motiliproteus Colwellia Bacillus

Chondromyces Desulfopila B Candidatus Butyricicoccus Faecalibacterium Candidatus Paenibacillus Babela Magnetomorum Eubacterium

Saccharibacillus Leptospira Desulfatibacillum Thermoactinomyces

uc Kosakonia Melittangium Enhygromyxa Geothermobacter uc Haliangium Desulfuromonas Minicystis Agarivorans Geoalkalibacter Syntrophobacter Proteobacteria Geobacter Megasphaera Thalassomonas

uc Betaproteobacteria Aquaspirillum Yersiniaceae var Ferrimonas Thalassotalea Serratia Marinobacter Proteobacteria Neisseria

Oligofexia Polynucleobacter

Tissierellia uc Psychrosphaera Paraferrimonas Methylococcaceae var Snodgrassella Alteromonas Burkholderiaceae Paraglaciecola Caballeronia Methylomonas

Burkholderia

Algicola Clostridia Neptuniibacter Methylomagnum

Deltaproteobacteria Bacilli Bordetella Cupriavidus Amphritea Luminiphilus Plasticicumulans

Spirochaetia uc Pseudoalteromonas Advenella Paraburkholderia Proteobacteria Negativicutes Reinekea var Balneatrix Limnohabitans Burkholderiales Ramlibacter Collimonas Candidatus Thioglobus Oceanospirillum Massilia var Proteobacteria uc

uc Alteromonadales Oceanospirillaceae Betaproteobacteria Bermanella Duganella Thiolapillus Comamonas Pseudospirillum Ferrovum Hydrogenophaga

Variovorax Simplicispira Candidatus Halomonas Halioglobus Rhodoferax Symbiobacter Aestuariibacter Lampropedia Neptunomonas Pseudorhodoferax uc Polaromonas

Thalassolituus Bacterioplanes uc Hylemonella Halieaceae Comamonadaceae Porticoccus Marinospirillum Aliiglaciecola uc Alteromonadaceae Acidovorax Curvibacter Nitrincola Malikia Macromonas Loktanella Ascidiaceihabitans Oceanobacter Salinicola Gynuella var uc Xylophilus Pseudohaliea Burkholderiales Tepidimonas Ketobacter Idiomarina Thiomonas uc Psychromonas var uc Burkholderiales Marinicella Nitrosomonadales Methylotenera Candidatus Alteromonadales Methylopumilus

uc uc Methylophilaceae Alcanivorax Spongiibacteraceae Gammaproteobacteria Leptothrix Methyloversatilis Haliea Chromatocurvus Corallincola Sulfuricella Endozoicomonas Nitrosomonas Saccharospirillum Thiobacillus

var var Shewanella Betaproteobacteria Spongiibacteraceae uc

Congregibacter Candidatus Kangiella uc Candidatus Lambdaproteobacteria uc Lambdaproteobacteria Oceanospirillales Dasania Gammaproteobacteria Terasakiispira Acidithiobacillia

Zetaproteobacteria Thiohalobacter uc Acidithiobacillales

uc Hahella Zetaproteobacteria

Planktomarina var

Oceanospirillales Haemophilus Oleiphilus uc Oceanicoccus Porticoccaceae Spongiibacter

Sulftobacter MonaibacteriumRoseivivax Arenicella Zhongshania var Yoonia Pseudohongiella Aquimixticola Cellvibrionales Sedimenticola

Thiolinea var Gilvimarinus Cycloclasticus Rhodobacteraceae Methylophaga Thiomicrospira

Microbulbifer Marinimicrobium Thiothrix Shimia Cellvibrio Teredinibacter

uc Leucothrix Francisellaceae

Ruegeria Saccharophagus

Fangia

var Phaeobacter Pseudooctadecabacter Marinagarivorans

Jannaschia Planktotalea Amylibacter uc Agarilytica Pseudomonas Candidatus Simiduia Endobugula Rhodobacter Cellvibrionales

Azotobacter

Pseudopelagicola Emcibacter Roseobacter Polaribacter uc Thiorhodovibrio Alphaproteobacteria Leisingera Candidatus Rhodobacterales Tenderia Leeuwenhoekiella Epsilonproteobacteria Pseudophaeobacter uc Chromatiaceae Thalassobius Psychrobacter

Rheinheimera Ectothiorhodospira Nitrosococcus Acinetobacter Seonamhaeicola Roseovarius Thioalkalivibrio

var uc Moraxellaceae Chromatiales Moraxella Xanthomonas Perlucidibaca Pseudoxanthomonas Aeromonas uc Aquisalimonas Moraxellaceae

Granulosicoccus

Stenotrophomonas

Thiohalomonas

Oceanimonas Labrenzia Actibacterium Algiphilus

Marinosulfonomonas Rhodanobacter Oleiagrimonas Sinimarinibacterium var uc Gammaproteobacteria Marivita Thalassococcus Sphingopyxis uc Pelagimonas Pseudoruegeria Alphaproteobacteria Sphingobium Vibrio Photobacterium Dyella

Maritimibacter Legionella Nereida Sulfurospirillum Litorimicrobium

Rhodobacteraceae Tropicibacter Sphingomonas

Pseudodonghicola Aliiroseovarius Primorskyibacter var Sphingomonadales Arcobacter

var uc Confuentimicrobium Alphaproteobacteria Marinovum Hasllibacter Sphingorhabdus Flaviramulus Flavobacteria Cognatiyoonia Palleronia Albidovulum Helicobacter Roseibium Gramella

Litoreibacter var Sulfurimonas

Natronohydrobacter Pelagicola Acuticoccus Pseudaestuariivita Celeribacter Rhodobacterales Altererythrobacter Nautilia Sedimentitalea Nautella Marivivens Aestuariivita

Pseudovibrio

Tropicimonas Maribius Antarctobacter

Oceaniovalibus Neptunicoccus Pseudooceanicola Croceicoccus Puniceibacterium Candidatus uc Cribrihabitans Boseongicola Hyphomonadaceae Rhodosalinus Winogradskyella Oceanicola Porphyrobacter

Donghicola Albimonas

Dinoroseobacter Epibacterium Jhaorihella

Marinibacterium

Ahrensia Candidatus Puniceispirillum Pseudozobellia Thalassobacter Flavimaricola Robiginitomaculum Jidaibacter

uc Tateyamaria Paracoccus Sphingomonadales Halocynthiibacter Defuviimonas

Thermoleophilia Rhodovulum Candidatus Midichloria

var

Actinobacteria Chryseobacterium Chlorofexia Hyphobacterium Salipiger uc Actinobacteria Salinimicrobium

uc Tritonibacter Cognatishimia Hyphomonas Chlorofexi Thalassobium Mollicutes Lentibacter

Octadecabacter Sediminimonas Nioella Deinococci Lutibacter Acidimicrobiia uc Mesofavibacter

uc Chlorobia Candidatus Rifebacteria Rickettsiales Zobellia Ulvibacter Hanstruepera Psychroserpens Aureitalea Anaplasma uc Tamlana Confuentibacter uc Salegentibacter Kordia

uc Nitrospirae Actinobacteria Verrucomicrobiae Rhodospirillaceae Kiloniella Nitrospira Gilvibacter uc uc Tenacibaculum var Sediminicola uc Pelagibacterales Rhodospirillales Flagellimonas Bacteria Bacteroidetes Candidatus Muricauda Marinirhabdus Phycisphaerae uc Planctomycetes uc var Pelagibacter Pelagibacteraceae Candidatus Oceanibaculum Arenibacter

uc Opitutae Lentisphaerae Endolissoclinum Flavobacteriaceae Magnetospira SkermanellaFodinicurvata

Lentisphaeria Magnetovibrio uc Kordiimonadales Thalassospira

uc Caulobacterales Phenylobacterium Candidatus Thalassobaculum Rhizobium Planctomycetia Tistrella Candidatus Fonsibacter Aquiluna Ensifer Azospirillum Magnetospirillum uc Sinorhizobium

Verrucomicrobia Martelella Robiginitalea uc Aureimonas Phyllobacteriaceae Aurantimonas

Mesorhizobium uc Hyphomicrobiaceae Nisaea

Tistlia

uc Devosia Holosporaceae Formosa

uc Maritalea var

Microbacteriaceae Bradyrhizobium uc Alphaproteobacteria Arenitalea uc Rhodoluna Actinobacteria Salinispora

uc Candidatus Bathyarchaeota var Janibacter Pelagibacterales Actinoplanes Lacinutrix uc Thaumarchaeota var Patulibacter uc Pelagibacteraceae Microvirga uc Cytophagia uc Mycobacteroides Archaea uc var uc Actinobacteria Acidimicrobiales Herpetosiphon Acidobacteria Methanobacteria Bifdobacterium Rhizobiales

uc Candidatus Bacteria Wolfebacteria Acidimicrobium Saprospiria uc Chlamydiia var Chlorofexi Microbacteriaceae Mycobacterium Mycoplasma Maribacter Cyanobacteria Cohaesibacter Methyloceanibacter Lysinimicrobium Cellulomonas uc Pontimonas Rhodococcus Corynebacterium Breoghania uc Microbacterium Dietzia Nakamurella Candidatus uc Candidatus Nonomuraea Olleya Handelsmanbacteria Fibrobacteria Rathayibacter Deinococcus Amycolatopsis uc Candidatus uc var Bacteroidetes Euryarchaeota Marinimicrobia Acidobacteriia Atribacteria uc Candidatus Candidatus Kerfeldbacteria Deferribacteres uc Methylorubrum uc uc Candidatusuc Candidatus uc Candidatus Fusobacteriia Rhodothermaeota Parcubacteria Ignavibacteria PortnoybacteriaUhrbacteria Gemmatimonadetes Saccharibacteria Bacteroidia Chitinophagia Nitrospinae Methanomicrobia Actinophytocola Bauldia Hartmannibacter uc uc Prosthecochloris Pseudorhodoplanes Candidatus Acidimicrobiaceae Rifebacteria uc

Chloroherpeton

Methylobacterium

Streptomyces uc Terasakiella Crocinitomicaceae Actinomycetales Rubritalea Gaetbulibacter

Candidatus Microthrix

var Aquimarina Actinobacteria uc Jiangella Verrucomicrobiaceae Spongiibacterium Dokdonia Bizionia Altibacter Hyunsoonleella

uc Haloferula

Verrucomicrobium

uc Nitrospirae Actinobacteria Jejuia

Nitrospira Pedosphaera uc Cellulophaga

var Eudoraea

Imtechella

uc Xanthomarina bacterial groups Phycisphaeraceae Bacteria Bacteroidetes Riemerella

uc Verrucomicrobiales Akkermansia Algibacter Psychrofexus Flavivirga Phycisphaera Zhouia uc Planctomycetes uc

var Pricia Nonlabens uc Verrucomicrobiae Siansivirga Aureicoccus uc legend of hierarchical top level Lentisphaerae var Opitutae Verrucomicrobiae Lentisphaera

Gemmata Rubinisphaera Bergeyella

Opitutus Mangrovimonas uc Opitutaceae var uc Ochrovirga Capnocytophaga Planctomycetaceae Fuerstia Ereboglobus Flavobacterium Crocinitomix Moheibacter uc

uc Joostella Opitutales Schleiferia Verrucomicrobia Mesonia Elizabethkingia Nodosilinea Coraliomargarita Brumimicrobium Owenweeksia

Gillisia

Thermobaculum Fluviicola Wenyingzhuangia Oceanihabitans Aequorivita Marixanthomonas Gelidibacter Rhodopirellula Leptolyngbya

Lishizhenia uc Puniceicoccaceae uc Phaeodactylibacter

uc Rhodothermaceae Cryomorphaceae

Candidatus Scalindua Rhodothermus proteins assigned Phormidesmis uc Flavobacteriales Haliscomenobacter Planktothricoides Arthrospira uc uc Saprospira Marivirga

uc Candidatus uc Bathyarchaeota

Trichodesmium

Flavobacteriia var uc Synechococcus Thaumarchaeota uc uc uc Candidatus Altiarchaeales Flammeovirgaceae Flavobacteriia Methanobrevibacter Archaea Dyadobacter

to bacterial genera which have been Roseivirga var Saprospiraceae Ekhidna

Acidobacteria Candidatus Eisenibacter Nephrothrix Methanosphaera

Sporocytophaga Aureispira uc Candidatus Bacteria Mucilaginibacter Wolfebacteria

Moorea Neochlamydia

Limnoraphis Lyngbya Cyanothece Spirosoma Leadbetterella Phormidium Lewinella uc Persicobacter Flectobacillus Cecembia Pedobacter uc Chryseolinea uc Candidatus Candidatus uc Prochlorococcus Marinilabiliales Handelsmanbacteria Fibrobacter Echinicola uc uc Candidatus Fabibacter uc Anaerophaga var Euryarchaeota Algoriphagus var Bacteroidetes Bacteroidia Marinimicrobia uc Granulicella Atribacteria uc Candidatus uc

Porphyromonas Candidatus Kerfeldbacteria Hymenobacter Prolixibacter Saccharicrinis Geovibrio Catalinimonas Mongoliibacter uc Cyanobacterium Candidatus Sphingobacterium Lewinellaceae Oscillatoriales uc Hydrobacter uc uc uc Candidatusuc Candidatus uc Candidatus uc Candidatus found in all respective fractions (colors) Cyanobacteria Aphanocapsa TrichormusNostoc Fusobacterium Rhodothermaeota Parcubacteria Ignavibacteriales PortnoybacteriaUhrbacteria Gemmatirosa Saccharibacteria HugenholtziaRufbacter Amoebophilus Aquifexum Parapedobacter uc Sphingobacteriia Bacteroides Marinilabilia Salinivirgaceae Flavihumibacter Nitrospinae Methanoperedens Prevotella bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A fold ch. to smallest fold ch. to smallest abundance frac�ons small medium large abundance frac�ons small medium large particle fraction color code carbon compound utilization adhesion Glycoside hydrolase family 1 2.1 1.0 6.7 Pili 1.0 3.7 3.3 Glycoside hydrolase family 13 1.0 1.5 1.1 other Adhesion proteins 1.0 3.0 2.0 Glycoside hydrolase family 16 1.0 1.3 1.4 mo�lity Glycoside hydrolase family 45 +++ - ++ Starch utilization system SusD 1.0 1.8 1.0 Chemotaxisproteins ++++ - - Starch utilization system SusC 1.0 2.5 1.9 Gliding motility ++++ - + Glycosyl transferases 1.0 6.4 21.5 Flagellum 1.0 4.0 1.7 Sulfatases 2.0 1.0 4.1 other Motilityproteins ++++ - - transporters stress TonB-dependent transporter 1.0 2.4 1.6 Oxidative stress 3.0 1.0 1.1 ABC-transporter 3.2 1.2 1.0 Heavy metal resistance 1.0 1.4 3.8 TRAP-transporter 8.8 1.8 1.0 Antibiotic resistance 1.0 1.6 1.0 Antibiotic biosynthesis - - +++ phosphorus supply light driven energy genera�on Phosphonate ABC-type transport 1.8 1.3 1.0 Phosphate sodium symporter 1.1 1.0 1.1 Proteorhodopsin 1.0 1.1 1.1 ≥0.2µm Phosphate-selective porins 1.0 9.4 6.3 free living (f) particle associated (pa) Phosphate ABC-type transport 2.0 1.2 1.0 Phytase 1.0 2.7 3.1 particle sizes small medium large

uc uc uc Nitrosomonadales Lutibacter Flavobacteriaceae Crocinitomicaceae uc Flammeovirgaceae

uc Limnohabitans Polaribacter Crocinitomicaceae uc Marinirhabdus uc Porticoccaceae Hyunsoonleella Bacteria Seonamhaeicola uc Bacteroidetes

Flavobacterium Joostella

Flaviramulus Methyloversatilis Flavobacterium Flavivirga uc Thiobacillus uc uc uc Comamonadaceae Betaproteobacteria uc Lewinellaceae Crocinitomicaceae

Bacteria Tenacibaculum Formosa uc B Bacteroidetes

Aureicoccus var Algibacter Proteobacteria

var Sinorhizobium Psychroserpens Polaribacter Bacteria Olleya uc Rhodobacteraceae Synechococcus Lutibacter Lewinella Glycoside uc Crocinitomicaceae hydrolase uc family 45 Maribacter Flavobacteriia

uc Formosa Σof Flavobacteriaceae Pseudomonas uc sulfatases Polaribacter Ochrovirga Amylibacter Flavobacterium phytase Flavobacterium Bacteria Roseovarius Sulftobacter Paraglaciecola Polaribacter Gilvibacter Formosa uc uc Flammeovirgaceae Kordia Flavobacteriia

uc Kordia Bacteroidetes Cellulophaga Colwellia

uc Gammaproteobacteria Starch utilization Winogradskyella uc Gammaproteobacteria

heavy Saccharospirillum system SusD Tenacibaculum Reinekea metal uc Vibrio Clostridiales resistance Cycloclasticus var Flagellum Proteobacteria uc

uc Rhodospirillaceae uc Bacteroidetes Ochrovirga Lutibacter Ruegeria Flammeovirgaceae Algibacter Reinekea uc uc Porticoccaceae Muricauda Formosa Flavobacteriaceae Cellvibrio Cecembia Lacinutrix Leisingera Yoonia Polaribacter Starch utilization Gynuella Motility phosphonate uc ABC-type Bacteria Muricauda transport Litoreibacter Hahella Amylibacter uc Cellvibrionales Ulvibacter Psychroserpens uc

Polaribacter system SusC Aurantimonas Rhodobacterales Methylomagnum uc uc Halieaceae Pseudoruegeria Loktanella Pelagimonas uc Rhodospirillaceae Rhizobiales Algoriphagus Candidatus Marinirhabdus

Planktotalea Pelagibacter

Phaeobacter Agarivorans

uc Loktanella Jejuia Nitrospirae Pseudomonas Hartmannibacter uc uc Rhodobacteraceae Cryomorphaceae Roseobacter Candidatus Endolissoclinum

Breoghania Cohaesibacter uc Rhodoferax uc Chlorofexi Thalassobacter Rhodobacterales

var Flavobacteriaceae Octadecabacter Marivirga Polaribacter ABC-transporter uc Deltaproteobacteria uc uc Candidatus Alphaproteobacteria Rifebacteria Sulftobacter var Rhodobacteraceae uc uc Nitrospinae var Candidatus Puniceispirillum Candidatus Arenitalea Aureitalea Rhodobacteraceae Pelagibacter Bacteria Curvibacter Glycoside uc Flammeovirgaceae Schleiferia

uc hydrolase Bacteroidetes uc Rhodospirillales family 1 Malikia Formosa

Roseovarius uc Flavobacteriales Ruegeria Arenitalea

Thalassococcus uc uc Rhodospirillaceae Shimia Marinirhabdus

uc Betaproteobacteria Porticoccaceae Labrenzia

Gynuella Pseudooceanicola

uc Loktanella Chromatiales Planktomarina uc Thalassobium Pelagibacteraceae

Marinobacterium uc Maribacter uc Oceanospirillales

var uc Rhodobacterales var Bacteria Thioalkalivibrio Pontimonas Candidatus Oceanobacter Thioglobus Flavobacteriaceae Bacteria Leucothrix

Reinekea Colwellia Aquimarina

uc Reinekea Cryomorphaceae uc Candidatus Porticoccaceae

var Microbacteriaceae Agarilytica

uc Actinobacteria Algibacter

uc Aquiluna Actinomycetales Flaviramulus

Lysinimicrobium

uc Zhouia Ulvibacter Flavobacteriaceae

uc Acidimicrobium phosphate-selective Microbacteriaceae Halomonas

uc uc oxidative Flavobacteriia porins Bacteroidetes Pseudophaeobacter uc uc stress Acidimicrobiaceae var Alphaproteobacteria Bacteroidetes Gaetbulibacter

uc TonB-dependent Polaribacter Lutibacter Crocinitomicaceae

Kordia Flaviramulus Flavobacterium uc uc Alteromonadales Coraliomargarita Formosa Lacimicrobium Actinobacteria

uc transporter Flavobacteriales Lacinutrix Ruegeria

uc Bacteroidetes Imtechella Formosa

var Flavobacteriia gliding Polaribacter Arenitalea uc motility Flammeovirgaceae Pseudoalteromonas uc uc Deltaproteobacteria Rhodothermaceae uc

Winogradskyella Tateyamaria Antibiotic uc uc Opitutae uc Hyunsoonleella Alphaproteobacteria Halieaceae Gilvibacter var Bacteria Rhodobacteraceae resistance Haliscomenobacter Lewinella

Xanthomarina var Chromatocurvus Bacteria uc Tenacibaculum var Robiginitomaculum Rhodobacterales Algibacter phosphate uc Flavobacteriia Crocinitomicaceae uc TRAP-transporter Rhodothermaeota var Rhodobacteraceae uc Actibacterium Flammeovirgaceae Paraglaciecola Marivirga Arenicella

uc var Cellvibrionales Sulftobacter

uc ABC-type Rhodobacteraceae uc Betaproteobacteria Rhodobacteraceae uc uc Porticoccaceae Gammaproteobacteria uc Halieaceae uc uc Aliiroseovarius Betaproteobacteria Rickettsiales

Xylophilus Glycoside transport Psychrobacter Ruegeria Alteromonas Hasllibacter hydrolase Neisseria

family 16 phosphate Tistrella Reinekea Planktomarina Paraglaciecola Glaciecola Phycisphaera sodium Hydrogenophaga fagellum Oceaniovalibus symporter Glaciecola Idiomarina uc Nitrosomonadales Aquimarina uc Proteobacteria

Polaribacter Colwellia uc Saccharospirillum Pelagibacteraceae uc Betaproteobacteria adhesion Motiliproteus Snodgrassella Pilus Proteorhodopsin var Candidatus more motility Proteobacteria uc uc uc uc Porticoccaceae proteins Glycoside hydrolase Algibacter Pelagibacter Flavobacterium Cellvibrionales Glycosyl uc Flammeovirgaceae uc transferases family 13 Antibiotic Crocinitomicaceae Rhizobiales biosynthesis Porticoccaceae proteins uc Chemotaxis Bowmanella Porticoccaceae uc

Rhodobacteraceae Skermanella Neptunicoccus uc uc uc Rhodospirillaceae Gammaproteobacteria Arenicella Gammaproteobacteria Oceanicoccus Polaribacter Reinekea Robiginitomaculum Tateyamaria Albimonas Haliea Dokdonia uc Oceanospirillales Flaviramulus Shimia Alcanivorax

Litorimicrobium

uc Pelagibacteraceae Marinobacterium

Planktotalea

uc Thalassococcus Flavobacteriia Candidatus Thiohalobacter var uc Roseobacter

Sphingomonadales uc Crocinitomicaceae Pelagibacter Roseovarius uc Polaribacter Formosa Phaeobacter uc Gammaproteobacteria Rhodobacterales Flavobacteriaceae uc uc Marivita Litoreibacter Sulftobacter Gammaproteobacteria Colwellia uc Rhodobacterales Methylophaga Ruegeria Flavobacteriia Amylibacter uc uc Planktomarina uc Roseibium Alphaproteobacteria Flammeovirgaceae Bacteroidetes Reinekea Reichenbachiella Sphingomonas Colwellia Planktomarina Litorimicrobium Motiliproteus Roseovarius Sulftobacter Candidatus

var uc Neptuniibacter Rhodobacteraceae Pelagibacter

uc Rhodobacteraceae Pelagibacterales Candidatus Octadecabacter Puniceispirillum Epibacterium Pelagimonas Amylibacter Pseudoalteromonas uc Paraglaciecola Simiduia uc Pelagibacteraceae Candidatus Rhodospirillaceae functional groups the given Halocynthiibacter Puniceispirillum Formosa

uc Bacteria

Marinomonas uc uc Rhodospirillales var Flavobacteriia Candidatus uc Saccharospirillum Pelagibacter Rhodobacteraceae uc Spongiibacteraceae Alphaproteobacteria Ascidiaceihabitans uc uc Oceanospirillaceae uc Rhodobacterales Nitrosomonadales Planktomarina Neptunomonas Reichenbachiella Saprospira

uc Algibacter Cellvibrionales uc uc Rubritalea proteins were assigned to Rhizobiales Reinekea Chromatiales Jannaschia Primorskyibacter Oceanicoccus uc uc MarinilabilialesFormosa Porticoccaceae uc PaenibacillusAureispira Hydrogenophaga uc Actinobacteria Nitrosomonadales uc Rhodospirillaceae

uc uc Gammaproteobacteria Polaribacter uc Betaproteobacteria Arenicella Simiduia uc Bacteria

uc Betaproteobacteria Acidovorax uc uc Rhodospirillales

uc Cryomorphaceae uc Alphaproteobacteria Ekhidna Flavobacteriia Kordia uc Betaproteobacteria uc Microbacteriaceae Flammeovirgaceae Algoriphagus Candidatus Puniceispirillum Polaribacter uc Bacteria uc Bacteria Candidatus Pelagibacter uc uc Winogradskyella Cryomorphaceae Polaribacter uc Flammeovirgaceae

uc Alphaproteobacteriauc Motiliproteus Reinekea Pelagibacteraceae uc

Porticoccaceae uc Lyngbya Planctomycetaceae Candidatus Pelagibacter

Bacteria Colwellia Acidovorax uc Polaribacter Glaciecola Gammaproteobacteria

Serratia Sulfuricella uc uc uc Oceanicoccus Gammaproteobacteria proteins assigned Rhodobacterales uc Thalassotalea uc var Mucilaginibacter Bacteroidetes uc Bacteriovorax Gammaproteobacteria Citrobacter Bacteria uc Crocinitomicaceae Betaproteobacteria Porticoccaceae Alteromonas Formosa Saccharophagus uc Aestuariibacter Flavobacteriia Polaribacter Thiomicrospira

Winogradskyella Algibacter uc uc Pseudoalteromonas Alphaproteobacteria var Paraglaciecola Rubritalea Verrucomicrobiae Bergeyella uc Bacteroidetes Cyanobacteria Thalassomonas Caballeronia uc Porticoccaceae Candidatus uc Flavobacteriales Oceanicoccus to bacterial genera which have been Algibacter Candidatus Puniceispirillum Endolissoclinum Candidatus Polaribacter Polaribacter Sphingomonas Pelagibacter Paraglaciecola Candidatus Mesofavibacter Aquiluna Arenitalea uc

uc Pelagibacteraceae Candidatus uc Porticoccaceae var

uc Gammaproteobacteria Flavobacteriales uc Muricauda uc Alteromonadaceae Alteromonas Staphylococcus Gammaproteobacteria uc Pelagibacter Bacteroidetes Planktomarina Wenyingzhuangia uc Rhodospirillales Marinosulfonomonas uc Flammeovirgaceae uc Reinekea uc Bacteroidetes Bacteroidetes Rhodospirillaceae Ochrovirga Opitutus uc uc uc found in all respective fractions (colors) Reinekea Bacteria Rhizobialesuc Salinivirgaceae Psychroserpens Reinekea Methylococcaceae Enhygromyxa Lutibacter bioRxiv preprint doi: https://doi.org/10.1101/667428; this version posted June 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A 100% proteins assigned particle fraction to Eukaryotes color code 80% >0.2 µm >3 µm >10 µm Phytoplankton 67.08 72.41 69.78 60% Macroalgae 10.68 9.74 9.97 40% Fungi 2.23 1.50 1.48 Animals 3.78 2.48 3.87 20% Land Plants 1.37 0.70 1.64 other Eukaryotes 14.86 13.16 13.26 ≥0.2µm 0% free living (f) particle associated (pa) >0.2 µm >3 µm >10 µm particle sizes small medium large

Heterotrichida ucAconoidasida NeuropteraAraneae Galliformes Pterioida Octopoda Diplostraca Calanoida

Scorpiones Passeriformes Parachela Bucerotiformes Pectinoida Ichthyophonida Squamata Trichinellida Anthoathecata Xiphosura Anura Rhabditida Schizopyrenida Haemosporida Ostreoida uc Orectolobiformes Ichthyosporea Trombidiformes uc Actinopteri uc Kinetoplastida Sarcoptiformes var var uc Petromyzontiformes Cypriniformes Lepidoptera Testudines Gastropoda Choanofagellida Pleuronectiformes Diplogasterida Chordata Enterogona Actiniaria uc uc Other Actinophryidae Hemiptera Eucoccidiorida Mesostigmata Siluriformes Blattodea Chiroptera uc Architaenioglossa Mammalia Osteoglossiformes Gobiiformes Coleoptera B Characiformes Scleractinia

Dasyuromorphia Clupeiformes Bacillariales Beloniformes

uc Diptera Hymenoptera Enteropneusta Euglyphida Carangiformes Rodentia Golfngiida

Lingulida

Entomobryomorpha Acytosteliales unclassifed uc Placozoa

Longamoebia Primates Capitellida

Aspidochirotida Vitales Ranunculales Perkinsida Valvatida

Proboscidea

Sirenia Trichomonadida Cyprinodontiformes Scandentia Echinoida Animals Tritrichomonadida Eukaryotes Cucurbitales

Leptocylindrales Heteronemertea Haplosclerida

Strigeidida

Macrostomida Plagiorchiida

Rosales Solanales Fagales Asterales

Sapindales Opisthorchiida Gentianales

Funariales Fabales

Lamiales Selaginellales Chlorokybales Ericales Coscinodiscales Apiales Brassicales Charales Land Poales uc plants Bacillariophyceae Naviculales Cymbellales

Oomycetes Chaetocerotales Arecales

Klebsormidiales

Marchantiales Albuginales Pythiales

Philasterida Thalassiosirales

Peniculida

Myzocytiopsidales Eunotiales Hymenostomatida uc Labyrinthulomycetes Bacillariophyta Rhizosoleniales

Cryptomonadales Phyto- Sporadotrichida Pyrenomonadales

Licmophorales Compsopogonales

Toxariales Erythropeltidales Cymatosirales plankton Euplotida Mediophyceae

Lithodesmiales Anaulales Fragilariales

Phaeocystales Hemiaulales Triceratiales

var uc Biddulphiales Phaeophyceae

Ulvales

Pavlovales Pelagomonadales Isochrysidales Pyramimonadales Eustigmatales Macroalgae Synurales

Gloeochaetales Fungi Prymnesiales

Chromulinales Mamiellales

Florenciellales uc Microsporidia Monomastigales generic eukaryotic groups Sphaeropleales Blastocladiales

Pucciniales Chlamydomonadales

Cystoflobasidiales Ustilaginales Cantharellales legend of hierarchical top level Hymenochaetales Microstromatales Violaceomycetales Gigartinales Auriculariales Sebacinales Tilletiales

Agaricales Dacrymycetales

Acrosymphytales

Basidiomycota Chattonellales Polyporales Corticiales Gloeophyllales Chlorellales Microbotryales Schizosaccharomycetales

uc Ascomycota Gracilariales uc Nephroselmidophyceae Rhodellales Pleosporales

uc Trebouxiophyceae Porphyridiales Dothideales uc protein assigned Chromerida Hypocreales Suessiales Euglenales Gymnodiniales Cyanidiales Sordariales Saccharomycetales uc Dothideomycetes Ceramiales Dixoniellales Glomerales Batrachospermales Glomerellales Peridiniales

uc Eutreptiales Leotiomycetes Gelidiales Mortierellales Laminariales to orders of Eukaryotes which have been Mischococcales Chaetothyriales Mucorales Helotiales Botrydiales Eurotiales Pneumocystidales uc var uc Xanthophyceae Bangiales Cryptomycota Rhizophydiales Basidiobolales Nemaliales Monoblepharidales Vaucheriales Plocamiales Dictyosteliales Gonyaulacales Chytridiales Neocallimastigales found in all respective fractions (colors) Protosteliales Harpellales